subscribe to arXiv mailings

IFTR: An Instance-Level Fusion Transformer for Visual Collaborative Perception

Authors: Shaohong Wang, Lu Bin, Xinyu Xiao, Zhiyu Xiang, Hangguan Shan, Eryun Liu

Abstract: Multi-agent collaborative perception has emerged as a widely recognized technology in the field of autonomous driving in recent years. However, current collaborative perception predominantly relies on LiDAR point clouds, with significantly less attention given to methods using camera images. This severely impedes the development of budget-constrained collaborative systems and the exploitation of t… ▽ More Multi-agent collaborative perception has emerged as a widely recognized technology in the field of autonomous driving in recent years. However, current collaborative perception predominantly relies on LiDAR point clouds, with significantly less attention given to methods using camera images. This severely impedes the development of budget-constrained collaborative systems and the exploitation of the advantages offered by the camera modality. This work proposes an instance-level fusion transformer for visual collaborative perception (IFTR), which enhances the detection performance of camera-only collaborative perception systems through the communication and sharing of visual features. To capture the visual information from multiple agents, we design an instance feature aggregation that interacts with the visual features of individual agents using predefined grid-shaped bird eye view (BEV) queries, generating more comprehensive and accurate BEV features. Additionally, we devise a cross-domain query adaptation as a heuristic to fuse 2D priors, implicitly encoding the candidate positions of targets. Furthermore, IFTR optimizes communication efficiency by sending instance-level features, achieving an optimal performance-bandwidth trade-off. We evaluate the proposed IFTR on a real dataset, DAIR-V2X, and two simulated datasets, OPV2V and V2XSet, achieving performance improvements of 57.96%, 9.23% and 12.99% in AP@70 metrics compared to the previous SOTAs, respectively. Extensive experiments demonstrate the superiority of IFTR and the effectiveness of its key components. The code is available at https://github.com/wangsh0111/IFTR. △ Less

Submitted 13 July, 2024; originally announced July 2024.

arXiv:2407.08351 [pdf, other]

AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models

Authors: Xiang Lisa Li, Evan Zheran Liu, Percy Liang, Tatsunori Hashimoto

Abstract: Evaluation is critical for assessing capabilities, tracking scientific progress, and informing model selection. In this paper, we present three desiderata for a good benchmark for language models: (i) salience (e.g., knowledge about World War II is more salient than a random day in history), (ii) novelty (i.e., the benchmark reveals new trends in model rankings not shown by previous benchmarks), a… ▽ More Evaluation is critical for assessing capabilities, tracking scientific progress, and informing model selection. In this paper, we present three desiderata for a good benchmark for language models: (i) salience (e.g., knowledge about World War II is more salient than a random day in history), (ii) novelty (i.e., the benchmark reveals new trends in model rankings not shown by previous benchmarks), and (iii) difficulty (i.e., the benchmark should be difficult for existing models, leaving headroom for future improvement). We operationalize these three desiderata and cast benchmark creation as a search problem, that of finding benchmarks that that satisfy all three desiderata. To tackle this search problem, we present AutoBencher, which uses a language model to automatically search for datasets that meet the three desiderata. AutoBencher uses privileged information (e.g. relevant documents) to construct reliable datasets, and adaptivity with reranking to optimize for the search objective. We use AutoBencher to create datasets for math, multilingual, and knowledge-intensive question answering. The scalability of AutoBencher allows it to test fine-grained categories and tail knowledge, creating datasets that are on average 27% more novel and 22% more difficult than existing benchmarks. A closer investigation of our constructed datasets shows that we can identify specific gaps in LM knowledge in language models that are not captured by existing benchmarks, such as Gemini Pro performing much worse on question answering about the Permian Extinction and Fordism, while OpenAGI-7B performing surprisingly well on QA about COVID-19. △ Less

Submitted 11 July, 2024; originally announced July 2024.

Comments: preprint

arXiv:2407.00945 [pdf, other]

Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs

Authors: Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

Abstract: The rapid advancement of large language models (LLMs) has led to architectures with billions to trillions of parameters, posing significant deployment challenges due to their substantial demands on memory, processing power, and energy consumption. Sparse Mixture-of-Experts (SMoE) architectures have emerged as a solution, activating only a subset of parameters per token, thereby achieving faster in… ▽ More The rapid advancement of large language models (LLMs) has led to architectures with billions to trillions of parameters, posing significant deployment challenges due to their substantial demands on memory, processing power, and energy consumption. Sparse Mixture-of-Experts (SMoE) architectures have emerged as a solution, activating only a subset of parameters per token, thereby achieving faster inference while maintaining performance. However, SMoE models still face limitations in broader deployment due to their large parameter counts and significant GPU memory requirements. In this work, we introduce a gradient-free evolutionary strategy named EEP (Efficient Expert P}runing) to enhance the pruning of experts in SMoE models. EEP relies solely on model inference (i.e., no gradient computation) and achieves greater sparsity while maintaining or even improving performance on downstream tasks. EEP can be used to reduce both the total number of experts (thus saving GPU memory) and the number of active experts (thus accelerating inference). For example, we demonstrate that pruning up to 75% of experts in Mixtral $8\times7$B-Instruct results in a substantial reduction in parameters with minimal performance loss. Remarkably, we observe improved performance on certain tasks, such as a significant increase in accuracy on the SQuAD dataset (from 53.4% to 75.4%), when pruning half of the experts. With these results, EEP not only lowers the barrier to deploying SMoE models,but also challenges the conventional understanding of model pruning by showing that fewer experts can lead to better task-specific performance without any fine-tuning. Code is available at https://github.com/imagination-research/EEP. △ Less

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.18389 [pdf, ps, other]

doi 10.1109/ICC40277.2020.9149366

Blockchain Based Zero-Knowledge Proof of Location in IoT

Authors: Wei Wu, Erwu Liu, Xinglin Gong, Rui Wang

Abstract: With the development of precise positioning technology, a growing number of location-based services (LBSs) facilitate people's life. Most LBSs require proof of location (PoL) to prove that the user satisfies the service requirement, which exposes the user's privacy. In this paper, we propose a zero-knowledge proof of location (zk-PoL) protocol to better protect the user's privacy. With the zk-PoL… ▽ More With the development of precise positioning technology, a growing number of location-based services (LBSs) facilitate people's life. Most LBSs require proof of location (PoL) to prove that the user satisfies the service requirement, which exposes the user's privacy. In this paper, we propose a zero-knowledge proof of location (zk-PoL) protocol to better protect the user's privacy. With the zk-PoL protocol, the user can choose necessary information to expose to the server, so that hierarchical privacy protection can be achieved. The evaluation shows that the zk-PoL has excellent security to resist main attacks, moreover the computational efficiency is independent of input parameters and the zk-PoL is appropriate to delay-tolerant LBSs. △ Less

Submitted 26 June, 2024; originally announced June 2024.

Comments: Published on ICC 2020-2020 IEEE International Conference on Communications (ICC)

arXiv:2406.11830 [pdf, other]

Language Modeling with Editable External Knowledge

Authors: Belinda Z. Li, Emmy Liu, Alexis Ross, Abbas Zeitoun, Graham Neubig, Jacob Andreas

Abstract: When the world changes, so does the text that humans write about it. How do we build language models that can be easily updated to reflect these changes? One popular approach is retrieval-augmented generation, in which new documents are inserted into a knowledge base and retrieved during prediction for downstream tasks. Most prior work on these systems have focused on improving behavior during pre… ▽ More When the world changes, so does the text that humans write about it. How do we build language models that can be easily updated to reflect these changes? One popular approach is retrieval-augmented generation, in which new documents are inserted into a knowledge base and retrieved during prediction for downstream tasks. Most prior work on these systems have focused on improving behavior during prediction through better retrieval or reasoning. This paper introduces ERASE, which instead improves model behavior when new documents are acquired, by incrementally deleting or rewriting other entries in the knowledge base each time a document is added. In two new benchmark datasets evaluating models' ability to answer questions about a stream of news articles or conversations, ERASE improves accuracy relative to conventional retrieval-augmented generation by 7-13% (Mixtral-8x7B) and 6-10% (Llama-3-8B) absolute. Code and data are available at https://github.com/belindal/ERASE △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.09181 [pdf, other]

A Large-scale Universal Evaluation Benchmark For Face Forgery Detection

Authors: Yijun Bei, Hengrui Lou, Jinsong Geng, Erteng Liu, Lechao Cheng, Jie Song, Mingli Song, Zunlei Feng

Abstract: With the rapid development of AI-generated content (AIGC) technology, the production of realistic fake facial images and videos that deceive human visual perception has become possible. Consequently, various face forgery detection techniques have been proposed to identify such fake facial content. However, evaluating the effectiveness and generalizability of these detection techniques remains a si… ▽ More With the rapid development of AI-generated content (AIGC) technology, the production of realistic fake facial images and videos that deceive human visual perception has become possible. Consequently, various face forgery detection techniques have been proposed to identify such fake facial content. However, evaluating the effectiveness and generalizability of these detection techniques remains a significant challenge. To address this, we have constructed a large-scale evaluation benchmark called DeepFaceGen, aimed at quantitatively assessing the effectiveness of face forgery detection and facilitating the iterative development of forgery detection technology. DeepFaceGen consists of 776,990 real face image/video samples and 773,812 face forgery image/video samples, generated using 34 mainstream face generation techniques. During the construction process, we carefully consider important factors such as content diversity, fairness across ethnicities, and availability of comprehensive labels, in order to ensure the versatility and convenience of DeepFaceGen. Subsequently, DeepFaceGen is employed in this study to evaluate and analyze the performance of 13 mainstream face forgery detection techniques from various perspectives. Through extensive experimental analysis, we derive significant findings and propose potential directions for future research. The code and dataset for DeepFaceGen are available at https://github.com/HengruiLou/DeepFaceGen. △ Less

Submitted 13 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: This is a paper about constructing a large-scale universal evaluation benchmark for face forgery detection.The full text is 30 pages

arXiv:2406.02540 [pdf, other]

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

Authors: Tianchen Zhao, Tongcheng Fang, Enshu Liu, Rui Wan, Widyadewi Soedarmadji, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang

Abstract: Diffusion transformers (DiTs) have exhibited remarkable performance in visual generation tasks, such as generating realistic images or videos based on textual instructions. However, larger model sizes and multi-frame processing for video generation lead to increased computational and memory costs, posing challenges for practical deployment on edge devices. Post-Training Quantization (PTQ) is an ef… ▽ More Diffusion transformers (DiTs) have exhibited remarkable performance in visual generation tasks, such as generating realistic images or videos based on textual instructions. However, larger model sizes and multi-frame processing for video generation lead to increased computational and memory costs, posing challenges for practical deployment on edge devices. Post-Training Quantization (PTQ) is an effective method for reducing memory costs and computational complexity. When quantizing diffusion transformers, we find that applying existing diffusion quantization methods designed for U-Net faces challenges in preserving quality. After analyzing the major challenges for quantizing diffusion transformers, we design an improved quantization scheme: "ViDiT-Q": Video and Image Diffusion Transformer Quantization) to address these issues. Furthermore, we identify highly sensitive layers and timesteps hinder quantization for lower bit-widths. To tackle this, we improve ViDiT-Q with a novel metric-decoupled mixed-precision quantization method (ViDiT-Q-MP). We validate the effectiveness of ViDiT-Q across a variety of text-to-image and video models. While baseline quantization methods fail at W8A8 and produce unreadable content at W4A8, ViDiT-Q achieves lossless W8A8 quantization. ViDiTQ-MP achieves W4A8 with negligible visual quality degradation, resulting in a 2.5x memory optimization and a 1.5x latency speedup. △ Less

Submitted 30 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: Project Page: https://a-suozhang.xyz/viditq.github.io/

arXiv:2406.01103 [pdf, other]

Advancing DRL Agents in Commercial Fighting Games: Training, Integration, and Agent-Human Alignment

Authors: Chen Zhang, Qiang He, Zhou Yuan, Elvis S. Liu, Hong Wang, Jian Zhao, Yang Wang

Abstract: Deep Reinforcement Learning (DRL) agents have demonstrated impressive success in a wide range of game genres. However, existing research primarily focuses on optimizing DRL competence rather than addressing the challenge of prolonged player interaction. In this paper, we propose a practical DRL agent system for fighting games named Shūkai, which has been successfully deployed to Naruto Mobile, a p… ▽ More Deep Reinforcement Learning (DRL) agents have demonstrated impressive success in a wide range of game genres. However, existing research primarily focuses on optimizing DRL competence rather than addressing the challenge of prolonged player interaction. In this paper, we propose a practical DRL agent system for fighting games named Shūkai, which has been successfully deployed to Naruto Mobile, a popular fighting game with over 100 million registered users. Shūkai quantifies the state to enhance generalizability, introducing Heterogeneous League Training (HELT) to achieve balanced competence, generalizability, and training efficiency. Furthermore, Shūkai implements specific rewards to align the agent's behavior with human expectations. Shūkai's ability to generalize is demonstrated by its consistent competence across all characters, even though it was trained on only 13% of them. Additionally, HELT exhibits a remarkable 22% improvement in sample efficiency. Shūkai serves as a valuable training partner for players in Naruto Mobile, enabling them to enhance their abilities and skills. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: Accept at ICML 2024

arXiv:2405.20220 [pdf, other]

BeerReview: A Blockchain-enabled Peer Review Platform

Authors: Guodong Jin, Zihan Zhou, Wenzheng Tang, Kanglei Yu, Hao Xu, Erwu Liu

Abstract: In an era of increasing concerns over intellectual property rights, traditional peer review systems face challenges including plagiarism, malicious attacks, and unauthorized data access. BeerReview, a blockchain-enabled peer review platform, offers a robust solution, enabling experts and scholars to participate actively in the review process without concerns about plagiarism or security threats. F… ▽ More In an era of increasing concerns over intellectual property rights, traditional peer review systems face challenges including plagiarism, malicious attacks, and unauthorized data access. BeerReview, a blockchain-enabled peer review platform, offers a robust solution, enabling experts and scholars to participate actively in the review process without concerns about plagiarism or security threats. Following the completion of its alpha testing, BeerReview demonstrates the potential for expanded deployment. This platform offers improved convenience and more robust intellectual property protection within the peer review process with open source initiative. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.17873 [pdf, other]

MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization

Authors: Tianchen Zhao, Xuefei Ning, Tongcheng Fang, Enshu Liu, Guyue Huang, Zinan Lin, Shengen Yan, Guohao Dai, Yu Wang

Abstract: Diffusion models have achieved significant visual generation quality. However, their significant computational and memory costs pose challenge for their application on resource-constrained mobile devices or even desktop GPUs. Recent few-step diffusion models reduces the inference time by reducing the denoising steps. However, their memory consumptions are still excessive. The Post Training Quantiz… ▽ More Diffusion models have achieved significant visual generation quality. However, their significant computational and memory costs pose challenge for their application on resource-constrained mobile devices or even desktop GPUs. Recent few-step diffusion models reduces the inference time by reducing the denoising steps. However, their memory consumptions are still excessive. The Post Training Quantization (PTQ) replaces high bit-width FP representation with low-bit integer values (INT4/8) , which is an effective and efficient technique to reduce the memory cost. However, when applying to few-step diffusion models, existing quantization methods face challenges in preserving both the image quality and text alignment. To address this issue, we propose an mixed-precision quantization framework - MixDQ. Firstly, We design specialized BOS-aware quantization method for highly sensitive text embedding quantization. Then, we conduct metric-decoupled sensitivity analysis to measure the sensitivity of each layer. Finally, we develop an integer-programming-based method to conduct bit-width allocation. While existing quantization methods fall short at W8A8, MixDQ could achieve W8A8 without performance loss, and W4A8 with negligible visual degradation. Compared with FP16, we achieve 3-4x reduction in model size and memory cost, and 1.45x latency speedup. △ Less

Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: Project Page: https://a-suozhang.xyz/mixdq.github.io/

arXiv:2405.09757 [pdf, other]

Give and Take: An End-To-End Investigation of Giveaway Scam Conversion Rates

Authors: Enze Liu, George Kappos, Eric Mugnier, Luca Invernizzi, Stefan Savage, David Tao, Kurt Thomas, Geoffrey M. Voelker, Sarah Meiklejohn

Abstract: Scams -- fraudulent schemes designed to swindle money from victims -- have existed for as long as recorded history. However, the Internet's combination of low communication cost, global reach, and functional anonymity has allowed scam volumes to reach new heights. Designing effective interventions requires first understanding the context: how scammers reach potential victims, the earnings they mak… ▽ More Scams -- fraudulent schemes designed to swindle money from victims -- have existed for as long as recorded history. However, the Internet's combination of low communication cost, global reach, and functional anonymity has allowed scam volumes to reach new heights. Designing effective interventions requires first understanding the context: how scammers reach potential victims, the earnings they make, and any potential bottlenecks for durable interventions. In this short paper, we focus on these questions in the context of cryptocurrency giveaway scams, where victims are tricked into irreversibly transferring funds to scammers under the pretense of even greater returns. Combining data from Twitter, YouTube and Twitch livestreams, landing pages, and cryptocurrency blockchains, we measure how giveaway scams operate at scale. We find that 1 in 1000 scam tweets, and 4 in 100,000 livestream views, net a victim, and that scammers managed to extract nearly \$4.62 million from just hundreds of victims during our measurement window. △ Less

Submitted 15 May, 2024; originally announced May 2024.

Comments: Under review

arXiv:2405.09276 [pdf, other]

Dual-Segment Clustering Strategy for Federated Learning in Heterogeneous Environments

Authors: Pengcheng Sun, Erwu Liu, Wei Ni, Kanglei Yu, Rui Wang, Abbas Jamalipour

Abstract: Federated learning (FL) is a distributed machine learning paradigm with high efficiency and low communication load, only transmitting parameters or gradients of network. However, the non-independent and identically distributed (Non-IID) data characteristic has a negative impact on this paradigm. Furthermore, the heterogeneity of communication quality will significantly affect the accuracy of param… ▽ More Federated learning (FL) is a distributed machine learning paradigm with high efficiency and low communication load, only transmitting parameters or gradients of network. However, the non-independent and identically distributed (Non-IID) data characteristic has a negative impact on this paradigm. Furthermore, the heterogeneity of communication quality will significantly affect the accuracy of parameter transmission, causing a degradation in the performance of the FL system or even preventing its convergence. This letter proposes a dual-segment clustering (DSC) strategy, which first clusters the clients according to the heterogeneous communication conditions and then performs a second clustering by the sample size and label distribution, so as to solve the problem of data and communication heterogeneity. Experimental results show that the DSC strategy proposed in this letter can improve the convergence rate of FL, and has superiority on accuracy in a heterogeneous environment compared with the classical algorithm of cluster. △ Less

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.08651 [pdf, other]

BeACONS: A Blockchain-enabled Authentication and Communications Network for Scalable IoV

Authors: Qi Shi, Jingyi Sun, Hanwei Fu, Peizhe Fu, Jiayuan Ma, Hao Xu, Erwu Liu

Abstract: This paper introduces a novel blockchain-enabled authentication and communications network for scalable Internet of Vehicles, which aims to bolster security and confidentiality, diminish communications latency, and reduce dependence on centralised infrastructures like Certificate Authorities and Public Key Infrastructures by leveraging Blockchain-enabled Domain Name Services and Blockchain-enabled… ▽ More This paper introduces a novel blockchain-enabled authentication and communications network for scalable Internet of Vehicles, which aims to bolster security and confidentiality, diminish communications latency, and reduce dependence on centralised infrastructures like Certificate Authorities and Public Key Infrastructures by leveraging Blockchain-enabled Domain Name Services and Blockchain-enabled Mutual Authentication. The proposed network is structured into a primary layer, consisting of Road Side Units and edge servers as servers of Blockchain-enabled Domain Name Services for managing inter-vehicle communications identities, and a sub-layer within each vehicle for intra-vehicle communications via the Blockchain-enabled Mutual Authentication Protocol. This design facilitates secure connections across vehicles by coordinating between the layers, significantly improving communications security and efficiency. This study also evaluates Road Side Unit availability against the random distribution of Road Side Units along the route of different vehicles. The proposed model presents a novel pathway towards a decentralised, secure, and efficient Internet of Vehicles ecosystem, contributing to the advancement of autonomous and trustworthy vehicular networks. △ Less

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.02320 [pdf, other]

A SER-based Device Selection Mechanism in Multi-bits Quantization Federated Learning

Authors: Pengcheng Sun, Erwu Liu, Rui Wang

Abstract: The quality of wireless communication will directly affect the performance of federated learning (FL), so this paper analyze the influence of wireless communication on FL through symbol error rate (SER). In FL system, non-orthogonal multiple access (NOMA) can be used as the basic communication framework to reduce the communication congestion and interference caused by multiple users, which takes a… ▽ More The quality of wireless communication will directly affect the performance of federated learning (FL), so this paper analyze the influence of wireless communication on FL through symbol error rate (SER). In FL system, non-orthogonal multiple access (NOMA) can be used as the basic communication framework to reduce the communication congestion and interference caused by multiple users, which takes advantage of the superposition characteristics of wireless channels. The Minimum Mean Square Error (MMSE) based serial interference cancellation (SIC) technology is used to recover the gradient of each terminal node one by one at the receiving end. In this paper, the gradient parameters are quantized into multiple bits to retain more gradient information to the maximum extent and to improve the tolerance of transmission errors. On this basis, we designed the SER-based device selection mechanism (SER-DSM) to ensure that the learning performance is not affected by users with bad communication conditions, while accommodating as many users as possible to participate in the learning process, which is inclusive to a certain extent. The experiments show the influence of multi-bit quantization of gradient on FL and the necessity and superiority of the proposed SER-based device selection mechanism. △ Less

Submitted 20 April, 2024; originally announced May 2024.

arXiv:2404.03028 [pdf, other]

An Incomplete Loop: Deductive, Inductive, and Abductive Learning in Large Language Models

Authors: Emmy Liu, Graham Neubig, Jacob Andreas

Abstract: Modern language models (LMs) can learn to perform new tasks in different ways: in instruction following, the target task is described explicitly in natural language; in few-shot prompting, the task is specified implicitly with a small number of examples; in instruction inference, LMs are presented with in-context examples and are then prompted to generate a natural language task description before… ▽ More Modern language models (LMs) can learn to perform new tasks in different ways: in instruction following, the target task is described explicitly in natural language; in few-shot prompting, the task is specified implicitly with a small number of examples; in instruction inference, LMs are presented with in-context examples and are then prompted to generate a natural language task description before making predictions. Each of these procedures may be thought of as invoking a different form of reasoning: instruction following involves deductive reasoning, few-shot prompting involves inductive reasoning, and instruction inference involves abductive reasoning. How do these different capabilities relate? Across four LMs (from the gpt and llama families) and two learning problems (involving arithmetic functions and machine translation) we find a strong dissociation between the different types of reasoning: LMs can sometimes learn effectively from few-shot prompts even when they are unable to explain their own prediction rules; conversely, they sometimes infer useful task descriptions while completely failing to learn from human-generated descriptions of the same task. Our results highlight the non-systematic nature of reasoning even in some of today's largest LMs, and underscore the fact that very different learning mechanisms may be invoked by seemingly similar prompting procedures. △ Less

Submitted 10 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

arXiv:2404.02241 [pdf, other]

Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better

Authors: Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Sergey Yekhanin, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

Abstract: Diffusion Models (DM) and Consistency Models (CM) are two types of popular generative models with good generation quality on various tasks. When training DM and CM, intermediate weight checkpoints are not fully utilized and only the last converged checkpoint is used. In this work, we find that high-quality model weights often lie in a basin which cannot be reached by SGD but can be obtained by pro… ▽ More Diffusion Models (DM) and Consistency Models (CM) are two types of popular generative models with good generation quality on various tasks. When training DM and CM, intermediate weight checkpoints are not fully utilized and only the last converged checkpoint is used. In this work, we find that high-quality model weights often lie in a basin which cannot be reached by SGD but can be obtained by proper checkpoint averaging. Based on these observations, we propose LCSC, a simple but effective and efficient method to enhance the performance of DM and CM, by combining checkpoints along the training trajectory with coefficients deduced from evolutionary search. We demonstrate the value of LCSC through two use cases: $\textbf{(a) Reducing training cost.}$ With LCSC, we only need to train DM/CM with fewer number of iterations and/or lower batch sizes to obtain comparable sample quality with the fully trained model. For example, LCSC achieves considerable training speedups for CM (23$\times$ on CIFAR-10 and 15$\times$ on ImageNet-64). $\textbf{(b) Enhancing pre-trained models.}$ Assuming full training is already done, LCSC can further improve the generation quality or speed of the final converged models. For example, LCSC achieves better performance using 1 number of function evaluation (NFE) than the base model with 2 NFE on consistency distillation, and decreases the NFE of DM from 15 to 9 while maintaining the generation quality on CIFAR-10. Our code is available at https://github.com/imagination-research/LCSC. △ Less

Submitted 7 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.01817 [pdf, other]

Tensorized NeuroEvolution of Augmenting Topologies for GPU Acceleration

Authors: Lishuang Wang, Mengfei Zhao, Enyu Liu, Kebin Sun, Ran Cheng

Abstract: The NeuroEvolution of Augmenting Topologies (NEAT) algorithm has received considerable recognition in the field of neuroevolution. Its effectiveness is derived from initiating with simple networks and incrementally evolving both their topologies and weights. Although its capability across various challenges is evident, the algorithm's computational efficiency remains an impediment, limiting its sc… ▽ More The NeuroEvolution of Augmenting Topologies (NEAT) algorithm has received considerable recognition in the field of neuroevolution. Its effectiveness is derived from initiating with simple networks and incrementally evolving both their topologies and weights. Although its capability across various challenges is evident, the algorithm's computational efficiency remains an impediment, limiting its scalability potential. In response, this paper introduces a tensorization method for the NEAT algorithm, enabling the transformation of its diverse network topologies and associated operations into uniformly shaped tensors for computation. This advancement facilitates the execution of the NEAT algorithm in a parallelized manner across the entire population. Furthermore, we develop TensorNEAT, a library that implements the tensorized NEAT algorithm and its variants, such as CPPN and HyperNEAT. Building upon JAX, TensorNEAT promotes efficient parallel computations via automated function vectorization and hardware acceleration. Moreover, the TensorNEAT library supports various benchmark environments including Gym, Brax, and gymnax. Through evaluations across a spectrum of robotics control environments in Brax, TensorNEAT achieves up to 500x speedups compared to the existing implementations such as NEAT-Python. Source codes are available at: https://github.com/EMI-Group/tensorneat. △ Less

Submitted 11 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: Genetic and Evolutionary Computation Conference (GECCO '24)

arXiv:2403.13574 [pdf, other]

A Large Language Model Enhanced Sequential Recommender for Joint Video and Comment Recommendation

Authors: Bowen Zheng, Zihan Lin, Enze Liu, Chen Yang, Enyang Bai, Cheng Ling, Wayne Xin Zhao, Ji-Rong Wen

Abstract: In online video platforms, reading or writing comments on interesting videos has become an essential part of the video watching experience. However, existing video recommender systems mainly model users' interaction behaviors with videos, lacking consideration of comments in user behavior modeling. In this paper, we propose a novel recommendation approach called LSVCR by leveraging user interactio… ▽ More In online video platforms, reading or writing comments on interesting videos has become an essential part of the video watching experience. However, existing video recommender systems mainly model users' interaction behaviors with videos, lacking consideration of comments in user behavior modeling. In this paper, we propose a novel recommendation approach called LSVCR by leveraging user interaction histories with both videos and comments, so as to jointly conduct personalized video and comment recommendation. Specifically, our approach consists of two key components, namely sequential recommendation (SR) model and supplemental large language model (LLM) recommender. The SR model serves as the primary recommendation backbone (retained in deployment) of our approach, allowing for efficient user preference modeling. Meanwhile, we leverage the LLM recommender as a supplemental component (discarded in deployment) to better capture underlying user preferences from heterogeneous interaction behaviors. In order to integrate the merits of the SR model and the supplemental LLM recommender, we design a twostage training paradigm. The first stage is personalized preference alignment, which aims to align the preference representations from both components, thereby enhancing the semantics of the SR model. The second stage is recommendation-oriented fine-tuning, in which the alignment-enhanced SR model is fine-tuned according to specific objectives. Extensive experiments in both video and comment recommendation tasks demonstrate the effectiveness of LSVCR. Additionally, online A/B testing on the KuaiShou platform verifies the actual benefits brought by our approach. In particular, we achieve a significant overall gain of 4.13% in comment watch time. △ Less

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.02768 [pdf, other]

An Empirical Analysis on the Use and Reporting of National Security Letters

Authors: Alex Bellon, Miro Haller, Andrey Labunets, Enze Liu, Stefan Savage

Abstract: National Security Letters (NSLs) are similar to administrative subpoenas and can be issued directly by elements of the executive branch without requiring prior approval from a court or grand jury. Importantly, NSLs authorize the imposition of nondisclosure orders (aka "gag orders") on the receiving party. Controversy about potential abuses of this authority has driven a range of legal and policy d… ▽ More National Security Letters (NSLs) are similar to administrative subpoenas and can be issued directly by elements of the executive branch without requiring prior approval from a court or grand jury. Importantly, NSLs authorize the imposition of nondisclosure orders (aka "gag orders") on the receiving party. Controversy about potential abuses of this authority has driven a range of legal and policy discussions. To address these concerns, both the public sector and the private sector have sought to document the usage of NSLs in aggregated form. However, each data source is limited in scope, time, and kind. In this paper, we consolidate the available data around NSLs and answer two questions: (1) what can the public effectively learn from the reported data and does this information suffice to assess the NSL usage? (2) how accessible is this data collection? We show that longitudinal trends in the usage of NSLs can be observed. For instance, we find a significant increase in NSL requests for non-US persons and that the policy reforms to decrease the mandated nondisclosure period appear to be effective. The observed trends suggest that the current transparency mechanisms are viable safeguards against the excessive use of NSLs. However, aggregating and normalizing the data requires manual reviewing, parsing, and validating. We even find inconsistencies within and across official data sources. Overall, the laborious data collection process hinders external and internal auditing efforts and demonstrates the need for a unified and more usable dataset for NSLs. △ Less

Submitted 10 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: Accepted as short presentation at CSLAW 2024 (not in proceeding): https://computersciencelaw.org/2024/accepted-papers/

arXiv:2402.19348 [pdf, other]

Deep Learning for Cross-Domain Data Fusion in Urban Computing: Taxonomy, Advances, and Outlook

Authors: Xingchen Zou, Yibo Yan, Xixuan Hao, Yuehong Hu, Haomin Wen, Erdong Liu, Junbo Zhang, Yong Li, Tianrui Li, Yu Zheng, Yuxuan Liang

Abstract: As cities continue to burgeon, Urban Computing emerges as a pivotal discipline for sustainable development by harnessing the power of cross-domain data fusion from diverse sources (e.g., geographical, traffic, social media, and environmental data) and modalities (e.g., spatio-temporal, visual, and textual modalities). Recently, we are witnessing a rising trend that utilizes various deep-learning m… ▽ More As cities continue to burgeon, Urban Computing emerges as a pivotal discipline for sustainable development by harnessing the power of cross-domain data fusion from diverse sources (e.g., geographical, traffic, social media, and environmental data) and modalities (e.g., spatio-temporal, visual, and textual modalities). Recently, we are witnessing a rising trend that utilizes various deep-learning methods to facilitate cross-domain data fusion in smart cities. To this end, we propose the first survey that systematically reviews the latest advancements in deep learning-based data fusion methods tailored for urban computing. Specifically, we first delve into data perspective to comprehend the role of each modality and data source. Secondly, we classify the methodology into four primary categories: feature-based, alignment-based, contrast-based, and generation-based fusion methods. Thirdly, we further categorize multi-modal urban applications into seven types: urban planning, transportation, economy, public safety, society, environment, and energy. Compared with previous surveys, we focus more on the synergy of deep learning methods with urban computing applications. Furthermore, we shed light on the interplay between Large Language Models (LLMs) and urban computing, postulating future research directions that could revolutionize the field. We firmly believe that the taxonomy, progress, and prospects delineated in our survey stand poised to significantly enrich the research community. The summary of the comprehensive and up-to-date paper list can be found at https://github.com/yoshall/Awesome-Multimodal-Urban-Computing. △ Less

Submitted 16 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.17219 [pdf, other]

Blockchain for Finance: A Survey

Authors: Hanjie Wu, Qian Yao, Zhenguang Liu, Butian Huang, Yuan Zhuang, Huayun Tang, Erwu Liu

Abstract: As an innovative technology for enhancing authenticity, security, and risk management, blockchain is being widely adopted in trade and finance systems. The unique capabilities of blockchain, such as immutability and transparency, enable new business models of distributed data storage, point-to-point transactions, and decentralized autonomous organizations. In this paper, we focus on blockchain-bas… ▽ More As an innovative technology for enhancing authenticity, security, and risk management, blockchain is being widely adopted in trade and finance systems. The unique capabilities of blockchain, such as immutability and transparency, enable new business models of distributed data storage, point-to-point transactions, and decentralized autonomous organizations. In this paper, we focus on blockchain-based securities trading, in which blockchain technology plays a vital role in financial services as it ultimately lifts trust and frees the need for third-party verification by using consensus-based verification. We investigate the 12 most popular blockchain platforms and elaborate on 6 platforms that are related to finance, seeking to provide a panorama of securities trading practices. Meanwhile, this survey provides a comprehensive summary of blockchain-based securities trading applications. We gather numerous practical applications of blockchain-based securities trading and categorize them into four distinct categories. For each category, we introduce a typical example and explain how blockchain contributes to solving the key problems faced by FinTech companies and researchers. Finally, we provide interesting observations ranging from mainstream blockchain-based financial institutions to security issues of decentralized finance applications, aiming to picture the current blockchain ecosystem in finance. △ Less

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.01115 [pdf, other]

Interpretation of Intracardiac Electrograms Through Textual Representations

Authors: William Jongwon Han, Diana Gomez, Avi Alok, Chaojing Duan, Michael A. Rosenberg, Douglas Weber, Emerson Liu, Ding Zhao

Abstract: Understanding the irregular electrical activity of atrial fibrillation (AFib) has been a key challenge in electrocardiography. For serious cases of AFib, catheter ablations are performed to collect intracardiac electrograms (EGMs). EGMs offer intricately detailed and localized electrical activity of the heart and are an ideal modality for interpretable cardiac studies. Recent advancements in artif… ▽ More Understanding the irregular electrical activity of atrial fibrillation (AFib) has been a key challenge in electrocardiography. For serious cases of AFib, catheter ablations are performed to collect intracardiac electrograms (EGMs). EGMs offer intricately detailed and localized electrical activity of the heart and are an ideal modality for interpretable cardiac studies. Recent advancements in artificial intelligence (AI) has allowed some works to utilize deep learning frameworks to interpret EGMs during AFib. Additionally, language models (LMs) have shown exceptional performance in being able to generalize to unseen domains, especially in healthcare. In this study, we are the first to leverage pretrained LMs for finetuning of EGM interpolation and AFib classification via masked language modeling. We formulate the EGM as a textual sequence and present competitive performances on AFib classification compared against other representations. Lastly, we provide a comprehensive interpretability study to provide a multi-perspective intuition of the model's behavior, which could greatly benefit the clinical use. △ Less

Submitted 11 April, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: 18 pages, 9 figures; Accepted to CHIL 2024

ACM Class: I.2.7; J.3

arXiv:2312.12634 [pdf, other]

MotionScript: Natural Language Descriptions for Expressive 3D Human Motions

Authors: Payam Jome Yazdian, Eric Liu, Li Cheng, Angelica Lim

Abstract: This paper proposes MotionScript, a motion-to-text conversion algorithm and natural language representation for human body motions. MotionScript aims to describe movements in greater detail and with more accuracy than previous natural language approaches. Many motion datasets describe relatively objective and simple actions with little variation on the way they are expressed (e.g. sitting, walking… ▽ More This paper proposes MotionScript, a motion-to-text conversion algorithm and natural language representation for human body motions. MotionScript aims to describe movements in greater detail and with more accuracy than previous natural language approaches. Many motion datasets describe relatively objective and simple actions with little variation on the way they are expressed (e.g. sitting, walking, dribbling a ball). But for expressive actions that contain a diversity of movements in the class (e.g. being sad, dancing), or for actions outside the domain of standard motion capture datasets (e.g. stylistic walking, sign-language), more specific and granular natural language descriptions are needed. Our proposed MotionScript descriptions differ from existing natural language representations in that it provides direct descriptions in natural language instead of simple action labels or high-level human captions. To the best of our knowledge, this is the first attempt at translating 3D motions to natural language descriptions without requiring training data. Our experiments show that when MotionScript representations are used in a text-to-motion neural task, body movements are more accurately reconstructed, and large language models can be used to generate unseen complex motions. △ Less

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.11084 [pdf, other]

Multi-Agent Reinforcement Learning for Connected and Automated Vehicles Control: Recent Advancements and Future Prospects

Authors: Min Hua, Dong Chen, Xinda Qi, Kun Jiang, Zemin Eitan Liu, Quan Zhou, Hongming Xu

Abstract: Connected and automated vehicles (CAVs) have emerged as a potential solution to the future challenges of developing safe, efficient, and eco-friendly transportation systems. However, CAV control presents significant challenges, given the complexity of interconnectivity and coordination required among the vehicles. To address this, multi-agent reinforcement learning (MARL), with its notable advance… ▽ More Connected and automated vehicles (CAVs) have emerged as a potential solution to the future challenges of developing safe, efficient, and eco-friendly transportation systems. However, CAV control presents significant challenges, given the complexity of interconnectivity and coordination required among the vehicles. To address this, multi-agent reinforcement learning (MARL), with its notable advancements in addressing complex problems in autonomous driving, robotics, and human-vehicle interaction, has emerged as a promising tool for enhancing the capabilities of CAVs. However, there is a notable absence of current reviews on the state-of-the-art MARL algorithms in the context of CAVs. Therefore, this paper delivers a comprehensive review of the application of MARL techniques within the field of CAV control. The paper begins by introducing MARL, followed by a detailed explanation of its unique advantages in addressing complex mobility and traffic scenarios that involve multiple agents. It then presents a comprehensive survey of MARL applications on the extent of control dimensions for CAVs, covering critical and typical scenarios such as platooning control, lane-changing, and unsignalized intersections. In addition, the paper provides a comprehensive review of the prominent simulation platforms used to create reliable environments for training in MARL. Lastly, the paper examines the current challenges associated with deploying MARL within CAV control and outlines potential solutions that can effectively overcome these issues. Through this review, the study highlights the tremendous potential of MARL to enhance the performance and collaboration of CAV control in terms of safety, travel efficiency, and economy. △ Less

Submitted 16 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.07243 [pdf, other]

A Unified Sampling Framework for Solver Searching of Diffusion Probabilistic Models

Authors: Enshu Liu, Xuefei Ning, Huazhong Yang, Yu Wang

Abstract: Recent years have witnessed the rapid progress and broad application of diffusion probabilistic models (DPMs). Sampling from DPMs can be viewed as solving an ordinary differential equation (ODE). Despite the promising performance, the generation of DPMs usually consumes much time due to the large number of function evaluations (NFE). Though recent works have accelerated the sampling to around 20 s… ▽ More Recent years have witnessed the rapid progress and broad application of diffusion probabilistic models (DPMs). Sampling from DPMs can be viewed as solving an ordinary differential equation (ODE). Despite the promising performance, the generation of DPMs usually consumes much time due to the large number of function evaluations (NFE). Though recent works have accelerated the sampling to around 20 steps with high-order solvers, the sample quality with less than 10 NFE can still be improved. In this paper, we propose a unified sampling framework (USF) to study the optional strategies for solver. Under this framework, we further reveal that taking different solving strategies at different timesteps may help further decrease the truncation error, and a carefully designed \emph{solver schedule} has the potential to improve the sample quality by a large margin. Therefore, we propose a new sampling framework based on the exponential integral formulation that allows free choices of solver strategy at each step and design specific decisions for the framework. Moreover, we propose $S^3$, a predictor-based search method that automatically optimizes the solver schedule to get a better time-quality trade-off of sampling. We demonstrate that $S^3$ can find outstanding solver schedules which outperform the state-of-the-art sampling methods on CIFAR-10, CelebA, ImageNet, and LSUN-Bedroom datasets. Specifically, we achieve 2.69 FID with 10 NFE and 6.86 FID with 5 NFE on CIFAR-10 dataset, outperforming the SOTA method significantly. We further apply $S^3$ to Stable-Diffusion model and get an acceleration ratio of 2$\times$, showing the feasibility of sampling in very few steps without retraining the neural network. △ Less

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2311.09553 [pdf, other]

Program-Aided Reasoners (better) Know What They Know

Authors: Anubha Kabra, Sanketh Rangreji, Yash Mathur, Aman Madaan, Emmy Liu, Graham Neubig

Abstract: Prior work shows that program-aided reasoning, in which large language models (LLMs) are combined with programs written in programming languages such as Python, can significantly improve accuracy on various reasoning tasks. However, while accuracy is essential, it is also important for such reasoners to "know what they know", which can be quantified through the calibration of the model. In this pa… ▽ More Prior work shows that program-aided reasoning, in which large language models (LLMs) are combined with programs written in programming languages such as Python, can significantly improve accuracy on various reasoning tasks. However, while accuracy is essential, it is also important for such reasoners to "know what they know", which can be quantified through the calibration of the model. In this paper, we compare the calibration of Program Aided Language Models (PAL) and text-based Chain-of-thought (COT) prompting techniques over 5 datasets and 2 model types: LLaMA models and OpenAI models. Our results indicate that PAL leads to improved calibration in 75% of the instances. Our analysis uncovers that prompting styles that produce lesser diversity in generations also have more calibrated results, and thus we also experiment with inducing lower generation diversity using temperature scaling and find that for certain temperatures, PAL is not only more accurate but is also more calibrated than COT. Overall, we demonstrate that, in the majority of cases, program-aided reasoners better know what they know than text-based counterparts. △ Less

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.09308 [pdf, other]

Divergences between Language Models and Human Brains

Authors: Yuchen Zhou, Emmy Liu, Graham Neubig, Michael J. Tarr, Leila Wehbe

Abstract: Do machines and humans process language in similar ways? Recent research has hinted in the affirmative, finding that brain signals can be effectively predicted using the internal representations of language models (LMs). Although such results are thought to reflect shared computational principles between LMs and human brains, there are also clear differences in how LMs and humans represent and use… ▽ More Do machines and humans process language in similar ways? Recent research has hinted in the affirmative, finding that brain signals can be effectively predicted using the internal representations of language models (LMs). Although such results are thought to reflect shared computational principles between LMs and human brains, there are also clear differences in how LMs and humans represent and use language. In this work, we systematically explore the divergences between human and machine language processing by examining the differences between LM representations and human brain responses to language as measured by Magnetoencephalography (MEG) across two datasets in which subjects read and listened to narrative stories. Using a data-driven approach, we identify two domains that are not captured well by LMs: social/emotional intelligence and physical commonsense. We then validate these domains with human behavioral experiments and show that fine-tuning LMs on these domains can improve their alignment with human brain responses. △ Less

Submitted 4 February, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.08706 [pdf, other]

Aligned: A Platform-based Process for Alignment

Authors: Ethan Shaotran, Ido Pesok, Sam Jones, Emi Liu

Abstract: We are introducing Aligned, a platform for global governance and alignment of frontier models, and eventually superintelligence. While previous efforts at the major AI labs have attempted to gather inputs for alignment, these are often conducted behind closed doors. We aim to set the foundation for a more trustworthy, public-facing approach to safety: a constitutional committee framework. Initial… ▽ More We are introducing Aligned, a platform for global governance and alignment of frontier models, and eventually superintelligence. While previous efforts at the major AI labs have attempted to gather inputs for alignment, these are often conducted behind closed doors. We aim to set the foundation for a more trustworthy, public-facing approach to safety: a constitutional committee framework. Initial tests with 680 participants result in a 30-guideline constitution with 93% overall support. We show the platform naturally scales, instilling confidence and enjoyment from the community. We invite other AI labs and teams to plug and play into the Aligned ecosystem. △ Less

Submitted 15 November, 2023; originally announced November 2023.

Comments: 11 pages, 7 figures. For associated public report, see https://energize.ai/openai

arXiv:2311.03707 [pdf, other]

The NeurIPS 2022 Neural MMO Challenge: A Massively Multiagent Competition with Specialization and Trade

Authors: Enhong Liu, Joseph Suarez, Chenhui You, Bo Wu, Bingcheng Chen, Jun Hu, Jiaxin Chen, Xiaolong Zhu, Clare Zhu, Julian Togelius, Sharada Mohanty, Weijun Hong, Rui Du, Yibing Zhang, Qinwen Wang, Xinhang Li, Zheng Yuan, Xiang Li, Yuejia Huang, Kun Zhang, Hanhui Yang, Shiqi Tang, Phillip Isola

Abstract: In this paper, we present the results of the NeurIPS-2022 Neural MMO Challenge, which attracted 500 participants and received over 1,600 submissions. Like the previous IJCAI-2022 Neural MMO Challenge, it involved agents from 16 populations surviving in procedurally generated worlds by collecting resources and defeating opponents. This year's competition runs on the latest v1.6 Neural MMO, which in… ▽ More In this paper, we present the results of the NeurIPS-2022 Neural MMO Challenge, which attracted 500 participants and received over 1,600 submissions. Like the previous IJCAI-2022 Neural MMO Challenge, it involved agents from 16 populations surviving in procedurally generated worlds by collecting resources and defeating opponents. This year's competition runs on the latest v1.6 Neural MMO, which introduces new equipment, combat, trading, and a better scoring system. These elements combine to pose additional robustness and generalization challenges not present in previous competitions. This paper summarizes the design and results of the challenge, explores the potential of this environment as a benchmark for learning methods, and presents some practical reinforcement learning training approaches for complex tasks with sparse rewards. Additionally, we have open-sourced our baselines, including environment wrappers, benchmarks, and visualization tools for future research. △ Less

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2310.07081 [pdf, other]

Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting

Authors: Emmy Liu, Aditi Chaudhary, Graham Neubig

Abstract: Idioms are common in everyday language, but often pose a challenge to translators because their meanings do not follow from the meanings of their parts. Despite significant advances, machine translation systems still struggle to translate idiomatic expressions. We provide a simple characterization of idiomatic translation and related issues. This allows us to conduct a synthetic experiment reveali… ▽ More Idioms are common in everyday language, but often pose a challenge to translators because their meanings do not follow from the meanings of their parts. Despite significant advances, machine translation systems still struggle to translate idiomatic expressions. We provide a simple characterization of idiomatic translation and related issues. This allows us to conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations. To expand multilingual resources, we compile a dataset of ~4k natural sentences containing idiomatic expressions in French, Finnish, and Japanese. To improve translation of natural idioms, we introduce two straightforward yet effective techniques: the strategic upweighting of training loss on potentially idiomatic sentences, and using retrieval-augmented models. This not only improves the accuracy of a strong pretrained MT model on idiomatic sentences by up to 13% in absolute accuracy, but also holds potential benefits for non-idiomatic sentences. △ Less

Submitted 20 October, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: EMNLP 2023

arXiv:2310.03303 [pdf, other]

A Two-stage Based Social Preference Recognition in Multi-Agent Autonomous Driving System

Authors: Jintao Xue, Dongkun Zhang, Rong Xiong, Yue Wang, Eryun Liu

Abstract: Multi-Agent Reinforcement Learning (MARL) has become a promising solution for constructing a multi-agent autonomous driving system (MADS) in complex and dense scenarios. But most methods consider agents acting selfishly, which leads to conflict behaviors. Some existing works incorporate the concept of social value orientation (SVO) to promote coordination, but they lack the knowledge of other agen… ▽ More Multi-Agent Reinforcement Learning (MARL) has become a promising solution for constructing a multi-agent autonomous driving system (MADS) in complex and dense scenarios. But most methods consider agents acting selfishly, which leads to conflict behaviors. Some existing works incorporate the concept of social value orientation (SVO) to promote coordination, but they lack the knowledge of other agents' SVOs, resulting in conservative maneuvers. In this paper, we aim to tackle the mentioned problem by enabling the agents to understand other agents' SVOs. To accomplish this, we propose a two-stage system framework. Firstly, we train a policy by allowing the agents to share their ground truth SVOs to establish a coordinated traffic flow. Secondly, we develop a recognition network that estimates agents' SVOs and integrates it with the policy trained in the first stage. Experiments demonstrate that our developed method significantly improves the performance of the driving policy in MADS compared to two state-of-the-art MARL algorithms. △ Less

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2309.01415 [pdf, other]

A generalized vector-field framework for mobility

Authors: Erjian Liu, Mattia Mazzoli, Xiao-Yong Yan, Jose J. Ramasco

Abstract: Trip flow between areas is a fundamental metric for human mobility research. Given its identification with travel demand and its relevance for transportation and urban planning, many models have been developed for its estimation. These models focus on flow intensity, disregarding the information provided by the local mobility orientation. A field-theoretic approach can overcome this issue and hand… ▽ More Trip flow between areas is a fundamental metric for human mobility research. Given its identification with travel demand and its relevance for transportation and urban planning, many models have been developed for its estimation. These models focus on flow intensity, disregarding the information provided by the local mobility orientation. A field-theoretic approach can overcome this issue and handling both intensity and direction at once. Here we propose a general vector-field representation starting from individuals' trajectories valid for any type of mobility. By introducing four models of spatial exploration, we show how individuals' elections determine the mesoscopic properties of the mobility field. Distance optimization in long displacements and random-like local exploration are necessary to reproduce empirical field features observed in Chinese logistic data and in New York City Foursquare check-ins. Our framework is an essential tool to capture hidden symmetries in mesoscopic urban mobility, it establishes a benchmark to test the validity of mobility models and opens the doors to the use of field theory in a wide spectrum of applications. △ Less

Submitted 4 September, 2023; originally announced September 2023.

Comments: 13 pages, 8 figures, Appendices

arXiv:2308.11849 [pdf, other]

A Mobile Data-Driven Hierarchical Deep Reinforcement Learning Approach for Real-time Demand-Responsive Railway Rescheduling and Station Overcrowding Mitigation

Authors: Enze Liu, Zhiyuan Lin, Judith Y. T. Wang, Hong Chen

Abstract: Real-time railway rescheduling is an important technique to enable operational recovery in response to unexpected and dynamic conditions in a timely and flexible manner. Current research relies mostly on OD based data and model-based methods for estimating train passenger demands. These approaches primarily focus on averaged disruption patterns, often overlooking the immediate uneven distribution… ▽ More Real-time railway rescheduling is an important technique to enable operational recovery in response to unexpected and dynamic conditions in a timely and flexible manner. Current research relies mostly on OD based data and model-based methods for estimating train passenger demands. These approaches primarily focus on averaged disruption patterns, often overlooking the immediate uneven distribution of demand over time. In reality, passenger demand deviates significantly from predictions, especially during a disaster. Disastrous situations such as flood in Zhengzhou, China in 2022 has created not only unprecedented effect on Zhengzhou railway station itself, which is a major railway hub in China, but also other major hubs connected to Zhengzhou, e.g., Xi'an, the closest hub west of Zhengzhou. In this study, we define a real-time demand-responsive (RTDR) railway rescheduling problem focusing two specific aspects, namely, volatility of the demand, and management of station crowdedness. For the first time, we propose a data-driven approach using real-time mobile data (MD) to deal with this RTDR problem. A hierarchical deep reinforcement learning (HDRL) framework is designed to perform real-time rescheduling in a demand-responsive manner. The use of MD has enabled the modelling of passenger dynamics in response to train delays and station crowdedness, and a real-time optimisation for rescheduling of train services in view of the change in demand as a result of passengers' behavioural response to disruption. Results show that the agent can steadily satisfy over 62% of the demand with only 61% of the original rolling stock, ensuring continuous operations without overcrowding. Moreover, the agent exhibits adaptability when transferred to a new environment with increased demand, highlighting its effectiveness in addressing unforeseen disruptions in real-time settings. △ Less

Submitted 6 November, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: 42 pages,20 figures

arXiv:2308.06829 [pdf, other]

When Web 3.0 Meets Reality: A Hyperdimensional Fractal Polytope P2P Ecosystems

Authors: Hao Xu, Yunqing Sun, Xiaoshuai Zhang, Erwu Liu, Chih-Lin I

Abstract: Web 3.0 opens the world of new existence of the crypto-network-entity, which is independently defined by the public key pairs for entities and the connection to the Web 3.0 cyberspace. In this paper, we first discover a spacetime coordinate system based on fractal polytope in any dimensions with discrete time offered by blockchain and consensus. Second, the novel network entities and functions are… ▽ More Web 3.0 opens the world of new existence of the crypto-network-entity, which is independently defined by the public key pairs for entities and the connection to the Web 3.0 cyberspace. In this paper, we first discover a spacetime coordinate system based on fractal polytope in any dimensions with discrete time offered by blockchain and consensus. Second, the novel network entities and functions are defined to make use of hyperdimensional deterministic switching and routing protocols and blockchain-enabled mutual authentication. In addition to spacetime network architecture, we also define a multi-tier identity scheme which extends the native Web 3.0 crypto-network-entity to outer cyber and physical world, offering legal-compliant anonymity and linkability to all derived identifiers of entities. In this way, we unify the holistic Web 3.0 network based on persistent spacetime and its entity extension to our cyber and physical world. △ Less

Submitted 13 August, 2023; originally announced August 2023.

arXiv:2306.13023 [pdf, other]

AugDMC: Data Augmentation Guided Deep Multiple Clustering

Authors: Jiawei Yao, Enbei Liu, Maham Rashid, Juhua Hu

Abstract: Clustering aims to group similar objects together while separating dissimilar ones apart. Thereafter, structures hidden in data can be identified to help understand data in an unsupervised manner. Traditional clustering methods such as k-means provide only a single clustering for one data set. Deep clustering methods such as auto-encoder based clustering methods have shown a better performance, bu… ▽ More Clustering aims to group similar objects together while separating dissimilar ones apart. Thereafter, structures hidden in data can be identified to help understand data in an unsupervised manner. Traditional clustering methods such as k-means provide only a single clustering for one data set. Deep clustering methods such as auto-encoder based clustering methods have shown a better performance, but still provide a single clustering. However, a given dataset might have multiple clustering structures and each represents a unique perspective of the data. Therefore, some multiple clustering methods have been developed to discover multiple independent structures hidden in data. Although deep multiple clustering methods provide better performance, how to efficiently capture the alternative perspectives in data is still a problem. In this paper, we propose AugDMC, a novel data Augmentation guided Deep Multiple Clustering method, to tackle the challenge. Specifically, AugDMC leverages data augmentations to automatically extract features related to a certain aspect of the data using a self-supervised prototype-based representation learning, where different aspects of the data can be preserved under different data augmentations. Moreover, a stable optimization strategy is proposed to alleviate the unstable problem from different augmentations. Thereafter, multiple clusterings based on different aspects of the data can be obtained. Experimental results on three real-world datasets compared with state-of-the-art methods validate the effectiveness of the proposed method. △ Less

Submitted 22 June, 2023; originally announced June 2023.

arXiv:2306.08860 [pdf, other]

OMS-DPM: Optimizing the Model Schedule for Diffusion Probabilistic Models

Authors: Enshu Liu, Xuefei Ning, Zinan Lin, Huazhong Yang, Yu Wang

Abstract: Diffusion probabilistic models (DPMs) are a new class of generative models that have achieved state-of-the-art generation quality in various domains. Despite the promise, one major drawback of DPMs is the slow generation speed due to the large number of neural network evaluations required in the generation process. In this paper, we reveal an overlooked dimension -- model schedule -- for optimizin… ▽ More Diffusion probabilistic models (DPMs) are a new class of generative models that have achieved state-of-the-art generation quality in various domains. Despite the promise, one major drawback of DPMs is the slow generation speed due to the large number of neural network evaluations required in the generation process. In this paper, we reveal an overlooked dimension -- model schedule -- for optimizing the trade-off between generation quality and speed. More specifically, we observe that small models, though having worse generation quality when used alone, could outperform large models in certain generation steps. Therefore, unlike the traditional way of using a single model, using different models in different generation steps in a carefully designed \emph{model schedule} could potentially improve generation quality and speed \emph{simultaneously}. We design OMS-DPM, a predictor-based search algorithm, to optimize the model schedule given an arbitrary generation time budget and a set of pre-trained models. We demonstrate that OMS-DPM can find model schedules that improve generation quality and speed than prior state-of-the-art methods across CIFAR-10, CelebA, ImageNet, and LSUN datasets. When applied to the public checkpoints of the Stable Diffusion model, we are able to accelerate the sampling by 2$\times$ while maintaining the generation quality. △ Less

Submitted 15 June, 2023; originally announced June 2023.

Comments: Accepted by ICML2023

arXiv:2306.08400 [pdf, other]

Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning

Authors: Evan Zheran Liu, Sahaana Suri, Tong Mu, Allan Zhou, Chelsea Finn

Abstract: Whereas machine learning models typically learn language by directly training on language tasks (e.g., next-word prediction), language emerges in human children as a byproduct of solving non-language tasks (e.g., acquiring food). Motivated by this observation, we ask: can embodied reinforcement learning (RL) agents also indirectly learn language from non-language tasks? Learning to associate langu… ▽ More Whereas machine learning models typically learn language by directly training on language tasks (e.g., next-word prediction), language emerges in human children as a byproduct of solving non-language tasks (e.g., acquiring food). Motivated by this observation, we ask: can embodied reinforcement learning (RL) agents also indirectly learn language from non-language tasks? Learning to associate language with its meaning requires a dynamic environment with varied language. Therefore, we investigate this question in a multi-task environment with language that varies across the different tasks. Specifically, we design an office navigation environment, where the agent's goal is to find a particular office, and office locations differ in different buildings (i.e., tasks). Each building includes a floor plan with a simple language description of the goal office's location, which can be visually read as an RGB image when visited. We find RL agents indeed are able to indirectly learn language. Agents trained with current meta-RL algorithms successfully generalize to reading floor plans with held-out layouts and language phrases, and quickly navigate to the correct office, despite receiving no direct language supervision. △ Less

Submitted 14 June, 2023; originally announced June 2023.

Comments: International Conference on Machine Learning (ICML), 2023

arXiv:2305.18185 [pdf, other]

Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity

Authors: Lindia Tjuatja, Emmy Liu, Lori Levin, Graham Neubig

Abstract: Recent advances in large language models have prompted researchers to examine their abilities across a variety of linguistic tasks, but little has been done to investigate how models handle the interactions in meaning across words and larger syntactic forms -- i.e. phenomena at the intersection of syntax and semantics. We present the semantic notion of agentivity as a case study for probing such i… ▽ More Recent advances in large language models have prompted researchers to examine their abilities across a variety of linguistic tasks, but little has been done to investigate how models handle the interactions in meaning across words and larger syntactic forms -- i.e. phenomena at the intersection of syntax and semantics. We present the semantic notion of agentivity as a case study for probing such interactions. We created a novel evaluation dataset by utilitizing the unique linguistic properties of a subset of optionally transitive English verbs. This dataset was used to prompt varying sizes of three model classes to see if they are sensitive to agentivity at the lexical level, and if they can appropriately employ these word-level priors given a specific syntactic context. Overall, GPT-3 text-davinci-003 performs extremely well across all experiments, outperforming all other models tested by far. In fact, the results are even better correlated with human judgements than both syntactic and semantic corpus statistics. This suggests that LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery than select corpora for certain tasks. Code is available at https://github.com/lindiatjuatja/lm_sem △ Less

Submitted 10 July, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

arXiv:2305.16567 [pdf, other]

Structured Latent Variable Models for Articulated Object Interaction

Authors: Emily Liu, Michael Noseworthy, Nicholas Roy

Abstract: In this paper, we investigate a scenario in which a robot learns a low-dimensional representation of a door given a video of the door opening or closing. This representation can be used to infer door-related parameters and predict the outcomes of interacting with the door. Current machine learning based approaches in the doors domain are based primarily on labelled datasets. However, the large qua… ▽ More In this paper, we investigate a scenario in which a robot learns a low-dimensional representation of a door given a video of the door opening or closing. This representation can be used to infer door-related parameters and predict the outcomes of interacting with the door. Current machine learning based approaches in the doors domain are based primarily on labelled datasets. However, the large quantity of available door data suggests the feasibility of a semisupervised approach based on pretraining. To exploit the hierarchical structure of the dataset where each door has multiple associated images, we pretrain with a structured latent variable model known as a neural statistician. The neural satsitician enforces separation between shared context-level variables (common across all images associated with the same door) and instance-level variables (unique to each individual image). We first demonstrate that the neural statistician is able to learn an embedding that enables reconstruction and sampling of realistic door images. Then, we evaluate the correspondence of the learned embeddings to human-interpretable parameters in a series of supervised inference tasks. It was found that a pretrained neural statistician encoder outperformed analogous context-free baselines when predicting door handedness, size, angle location, and configuration from door images. Finally, in a visual bandit door-opening task with a variety of door configuration, we found that neural statistician embeddings achieve lower regret than context-free baselines. △ Less

Submitted 25 May, 2023; originally announced May 2023.

arXiv:2305.16171 [pdf]

Multi-lingual and Multi-cultural Figurative Language Understanding

Authors: Anubha Kabra, Emmy Liu, Simran Khanuja, Alham Fikri Aji, Genta Indra Winata, Samuel Cahyawijaya, Anuoluwapo Aremu, Perez Ogayo, Graham Neubig

Abstract: Figurative language permeates human communication, but at the same time is relatively understudied in NLP. Datasets have been created in English to accelerate progress towards measuring and improving figurative language processing in language models (LMs). However, the use of figurative language is an expression of our cultural and societal experiences, making it difficult for these phrases to be… ▽ More Figurative language permeates human communication, but at the same time is relatively understudied in NLP. Datasets have been created in English to accelerate progress towards measuring and improving figurative language processing in language models (LMs). However, the use of figurative language is an expression of our cultural and societal experiences, making it difficult for these phrases to be universally applicable. In this work, we create a figurative language inference dataset, \datasetname, for seven diverse languages associated with a variety of cultures: Hindi, Indonesian, Javanese, Kannada, Sundanese, Swahili and Yoruba. Our dataset reveals that each language relies on cultural and regional concepts for figurative expressions, with the highest overlap between languages originating from the same region. We assess multilingual LMs' abilities to interpret figurative language in zero-shot and few-shot settings. All languages exhibit a significant deficiency compared to English, with variations in performance reflecting the availability of pre-training and fine-tuning data, emphasizing the need for LMs to be exposed to a broader range of linguistic and cultural variation during training. △ Less

Submitted 25 May, 2023; originally announced May 2023.

Comments: ACL 2023 Findings

arXiv:2305.10830 [pdf]

Constructing a personalized AI assistant for shear wall layout using Stable Diffusion

Authors: Lufeng Wang, Jiepeng Liu, Guozhong Cheng, En Liu, Wei Chen

Abstract: Shear wall structures are widely used in high-rise residential buildings, and the layout of shear walls requires many years of design experience and iterative trial and error. Currently, there are methods based on heuristic algorithms, but they generate results too slowly. Those based on Generative Adversarial Networks (GANs) or Graph Neural Networks (GNNs) can only generate single arrangements an… ▽ More Shear wall structures are widely used in high-rise residential buildings, and the layout of shear walls requires many years of design experience and iterative trial and error. Currently, there are methods based on heuristic algorithms, but they generate results too slowly. Those based on Generative Adversarial Networks (GANs) or Graph Neural Networks (GNNs) can only generate single arrangements and require large amounts of training data. At present, Stable Diffusion is being widely used, and by using the Low-Rank Adaptation (LoRA) method to fine-tune large models with small amounts of data, good generative results can be achieved. Therefore, this paper proposes a personalized AI assistant for shear wall layout based on Stable Diffusion, which has been proven to produce good generative results through testing. △ Less

Submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.00955 [pdf, other]

Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation

Authors: Patrick Fernandes, Aman Madaan, Emmy Liu, António Farinhas, Pedro Henrique Martins, Amanda Bertsch, José G. C. de Souza, Shuyan Zhou, Tongshuang Wu, Graham Neubig, André F. T. Martins

Abstract: Many recent advances in natural language generation have been fueled by training large language models on internet-scale data. However, this paradigm can lead to models that generate toxic, inaccurate, and unhelpful content, and automatic evaluation metrics often fail to identify these behaviors. As models become more capable, human feedback is an invaluable signal for evaluating and improving mod… ▽ More Many recent advances in natural language generation have been fueled by training large language models on internet-scale data. However, this paradigm can lead to models that generate toxic, inaccurate, and unhelpful content, and automatic evaluation metrics often fail to identify these behaviors. As models become more capable, human feedback is an invaluable signal for evaluating and improving models. This survey aims to provide an overview of the recent research that has leveraged human feedback to improve natural language generation. First, we introduce an encompassing formalization of feedback, and identify and organize existing research into a taxonomy following this formalization. Next, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed to use feedback (either for training or decoding): directly using the feedback or training feedback models. We also discuss existing datasets for human-feedback data collection, and concerns surrounding feedback collection. Finally, we provide an overview of the nascent field of AI feedback, which exploits large language models to make judgments based on a set of principles and minimize the need for human intervention. △ Less

Submitted 31 May, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

Comments: Work in Progress

arXiv:2304.12821 [pdf, other]

Zero-shot Transfer Learning of Driving Policy via Socially Adversarial Traffic Flow

Authors: Dongkun Zhang, Jintao Xue, Yuxiang Cui, Yunkai Wang, Eryun Liu, Wei Jing, Junbo Chen, Rong Xiong, Yue Wang

Abstract: Acquiring driving policies that can transfer to unseen environments is challenging when driving in dense traffic flows. The design of traffic flow is essential and previous studies are unable to balance interaction and safety-criticism. To tackle this problem, we propose a socially adversarial traffic flow. We propose a Contextual Partially-Observable Stochastic Game to model traffic flow and assi… ▽ More Acquiring driving policies that can transfer to unseen environments is challenging when driving in dense traffic flows. The design of traffic flow is essential and previous studies are unable to balance interaction and safety-criticism. To tackle this problem, we propose a socially adversarial traffic flow. We propose a Contextual Partially-Observable Stochastic Game to model traffic flow and assign Social Value Orientation (SVO) as context. We then adopt a two-stage framework. In Stage 1, each agent in our socially-aware traffic flow is driven by a hierarchical policy where upper-level policy communicates genuine SVOs of all agents, which the lower-level policy takes as input. In Stage 2, each agent in the socially adversarial traffic flow is driven by the hierarchical policy where upper-level communicates mistaken SVOs, taken by the lower-level policy trained in Stage 1. Driving policy is adversarially trained through a zero-sum game formulation with upper-level policies, resulting in a policy with enhanced zero-shot transfer capability to unseen traffic flows. Comprehensive experiments on cross-validation verify the superior zero-shot transfer performance of our method. △ Less

Submitted 25 April, 2023; originally announced April 2023.

arXiv:2304.06286 [pdf, other]

Automated Cardiovascular Record Retrieval by Multimodal Learning between Electrocardiogram and Clinical Report

Authors: Jielin Qiu, Jiacheng Zhu, Shiqi Liu, William Han, Jingqi Zhang, Chaojing Duan, Michael Rosenberg, Emerson Liu, Douglas Weber, Ding Zhao

Abstract: Automated interpretation of electrocardiograms (ECG) has garnered significant attention with the advancements in machine learning methodologies. Despite the growing interest, most current studies focus solely on classification or regression tasks, which overlook a crucial aspect of clinical cardio-disease diagnosis: the diagnostic report generated by experienced human clinicians. In this paper, we… ▽ More Automated interpretation of electrocardiograms (ECG) has garnered significant attention with the advancements in machine learning methodologies. Despite the growing interest, most current studies focus solely on classification or regression tasks, which overlook a crucial aspect of clinical cardio-disease diagnosis: the diagnostic report generated by experienced human clinicians. In this paper, we introduce a novel approach to ECG interpretation, leveraging recent breakthroughs in Large Language Models (LLMs) and Vision-Transformer (ViT) models. Rather than treating ECG diagnosis as a classification or regression task, we propose an alternative method of automatically identifying the most similar clinical cases based on the input ECG data. Also, since interpreting ECG as images is more affordable and accessible, we process ECG as encoded images and adopt a vision-language learning paradigm to jointly learn vision-language alignment between encoded ECG images and ECG diagnosis reports. Encoding ECG into images can result in an efficient ECG retrieval system, which will be highly practical and useful in clinical applications. More importantly, our findings could serve as a crucial resource for providing diagnostic services in underdeveloped regions. △ Less

Submitted 6 November, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

Comments: Accepted to the ML4H 2023 Proceedings track

arXiv:2304.06177 [pdf, other]

Visual based Tomato Size Measurement System for an Indoor Farming Environment

Authors: Andy Kweon, Vishnu Hu, Jong Yoon Lim, Trevor Gee, Edmond Liu, Henry Williams, Bruce A. MacDonald, Mahla Nejati, Inkyu Sa, Ho Seok Ahn

Abstract: As technology progresses, smart automated systems will serve an increasingly important role in the agricultural industry. Current existing vision systems for yield estimation face difficulties in occlusion and scalability as they utilize a camera system that is large and expensive, which are unsuitable for orchard environments. To overcome these problems, this paper presents a size measurement met… ▽ More As technology progresses, smart automated systems will serve an increasingly important role in the agricultural industry. Current existing vision systems for yield estimation face difficulties in occlusion and scalability as they utilize a camera system that is large and expensive, which are unsuitable for orchard environments. To overcome these problems, this paper presents a size measurement method combining a machine learning model and depth images captured from three low cost RGBD cameras to detect and measure the height and width of tomatoes. The performance of the presented system is evaluated on a lab environment with real tomato fruits and fake leaves to simulate occlusion in the real farm environment. To improve accuracy by addressing fruit occlusion, our three-camera system was able to achieve a height measurement accuracy of 0.9114 and a width accuracy of 0.9443. △ Less

Submitted 12 April, 2023; originally announced April 2023.

Comments: 10 Pages, 12 Figures

arXiv:2304.03708 [pdf, other]

Efficient automatic segmentation for multi-level pulmonary arteries: The PARSE challenge

Authors: Gongning Luo, Kuanquan Wang, Jun Liu, Shuo Li, Xinjie Liang, Xiangyu Li, Shaowei Gan, Wei Wang, Suyu Dong, Wenyi Wang, Pengxin Yu, Enyou Liu, Hongrong Wei, Na Wang, Jia Guo, Huiqi Li, Zhao Zhang, Ziwei Zhao, Na Gao, Nan An, Ashkan Pakzad, Bojidar Rangelov, Jiaqi Dou, Song Tian, Zeyu Liu , et al. (5 additional authors not shown)

Abstract: Efficient automatic segmentation of multi-level (i.e. main and branch) pulmonary arteries (PA) in CTPA images plays a significant role in clinical applications. However, most existing methods concentrate only on main PA or branch PA segmentation separately and ignore segmentation efficiency. Besides, there is no public large-scale dataset focused on PA segmentation, which makes it highly challengi… ▽ More Efficient automatic segmentation of multi-level (i.e. main and branch) pulmonary arteries (PA) in CTPA images plays a significant role in clinical applications. However, most existing methods concentrate only on main PA or branch PA segmentation separately and ignore segmentation efficiency. Besides, there is no public large-scale dataset focused on PA segmentation, which makes it highly challenging to compare the different methods. To benchmark multi-level PA segmentation algorithms, we organized the first \textbf{P}ulmonary \textbf{AR}tery \textbf{SE}gmentation (PARSE) challenge. On the one hand, we focus on both the main PA and the branch PA segmentation. On the other hand, for better clinical application, we assign the same score weight to segmentation efficiency (mainly running time and GPU memory consumption during inference) while ensuring PA segmentation accuracy. We present a summary of the top algorithms and offer some suggestions for efficient and accurate multi-level PA automatic segmentation. We provide the PARSE challenge as open-access for the community to benchmark future algorithm developments at \url{https://parse2022.grand-challenge.org/Parse2022/}. △ Less

Submitted 7 April, 2023; originally announced April 2023.

arXiv:2303.14347 [pdf, other]

Vision-based Vineyard Navigation Solution with Automatic Annotation

Authors: Ertai Liu, Josephine Monica, Kaitlin Gold, Lance Cadle-Davidson, David Combs, Yu Jiang

Abstract: Autonomous navigation is the key to achieving the full automation of agricultural research and production management (e.g., disease management and yield prediction) using agricultural robots. In this paper, we introduced a vision-based autonomous navigation framework for agriculture robots in trellised cropping systems such as vineyards. To achieve this, we proposed a novel learning-based method t… ▽ More Autonomous navigation is the key to achieving the full automation of agricultural research and production management (e.g., disease management and yield prediction) using agricultural robots. In this paper, we introduced a vision-based autonomous navigation framework for agriculture robots in trellised cropping systems such as vineyards. To achieve this, we proposed a novel learning-based method to estimate the path traversibility heatmap directly from an RGB-D image and subsequently convert the heatmap to a preferred traversal path. An automatic annotation pipeline was developed to form a training dataset by projecting RTK GPS paths collected during the first setup in a vineyard in corresponding RGB-D images as ground-truth path annotations, allowing a fast model training and fine-tuning without costly human annotation. The trained path detection model was used to develop a full navigation framework consisting of row tracking and row switching modules, enabling a robot to traverse within a crop row and transit between crop rows to cover an entire vineyard autonomously. Extensive field trials were conducted in three different vineyards to demonstrate that the developed path detection model and navigation framework provided a cost-effective, accurate, and robust autonomous navigation solution in the vineyard and could be generalized to unseen vineyards with stable performance. △ Less

Submitted 24 March, 2023; originally announced March 2023.

Comments: Submission to IROS 2023

arXiv:2303.14240 [pdf, other]

Adaptive Base-class Suppression and Prior Guidance Network for One-Shot Object Detection

Authors: Wenwen Zhang, Xinyu Xiao, Hangguan Shan, Eryun Liu

Abstract: One-shot object detection (OSOD) aims to detect all object instances towards the given category specified by a query image. Most existing studies in OSOD endeavor to explore effective cross-image correlation and alleviate the semantic feature misalignment, however, ignoring the phenomenon of the model bias towards the base classes and the generalization degradation on the novel classes. Observing… ▽ More One-shot object detection (OSOD) aims to detect all object instances towards the given category specified by a query image. Most existing studies in OSOD endeavor to explore effective cross-image correlation and alleviate the semantic feature misalignment, however, ignoring the phenomenon of the model bias towards the base classes and the generalization degradation on the novel classes. Observing this, we propose a novel framework, namely Base-class Suppression and Prior Guidance (BSPG) network to overcome the problem. Specifically, the objects of base categories can be explicitly detected by a base-class predictor and adaptively eliminated by our base-class suppression module. Moreover, a prior guidance module is designed to calculate the correlation of high-level features in a non-parametric manner, producing a class-agnostic prior map to provide the target features with rich semantic cues and guide the subsequent detection process. Equipped with the proposed two modules, we endow the model with a strong discriminative ability to distinguish the target objects from distractors belonging to the base classes. Extensive experiments show that our method outperforms the previous techniques by a large margin and achieves new state-of-the-art performance under various evaluation settings. △ Less

Submitted 24 March, 2023; originally announced March 2023.

arXiv:2303.01502 [pdf, other]

Computational Language Acquisition with Theory of Mind

Authors: Andy Liu, Hao Zhu, Emmy Liu, Yonatan Bisk, Graham Neubig

Abstract: Unlike current state-of-the-art language models, young children actively acquire language through interactions with their surrounding environment and caretakers. One mechanism that has been argued to be critical to language learning is the ability to infer the mental states of other agents in social environments, coined Theory of Mind (ToM) by Premack & Woodruff (1978). Drawing inspiration from th… ▽ More Unlike current state-of-the-art language models, young children actively acquire language through interactions with their surrounding environment and caretakers. One mechanism that has been argued to be critical to language learning is the ability to infer the mental states of other agents in social environments, coined Theory of Mind (ToM) by Premack & Woodruff (1978). Drawing inspiration from the modern operationalized versions of ToM implemented in Rabinowitz et al. (2018) and Zhu et al. (2021), we build language-learning agents equipped with ToM, and measure its effects on the learning process. We model ToM by giving the speaker agent an internal listener model that is trained alongside the speaker and used to rerank potential utterances. We experiment with varying task difficulty, hypothesizing that models will acquire more complex language to adapt to stronger environmental pressures. We find that training speakers with a highly weighted ToM listener component leads to performance gains in our image referential game setting. We also find some evidence that increasing task difficulty in the training process results in more fluent and precise utterances in evaluation. This suggests the potential utility of further incorporating ToM, as well as other insights from child language acquisition, into computational models of language acquisition. △ Less

Submitted 2 March, 2023; originally announced March 2023.

Comments: 9 pages, 3 figures. To be published in the 11th International Conference on Learning Representations, ICLR 2023, Conference Track Proceedings

arXiv:2302.07287 [pdf, other]

Forward Pass: On the Security Implications of Email Forwarding Mechanism and Policy

Authors: Enze Liu, Gautam Akiwate, Mattijs Jonker, Ariana Mirian, Grant Ho, Geoffrey M. Voelker, Stefan Savage

Abstract: The critical role played by email has led to a range of extension protocols (e.g., SPF, DKIM, DMARC) designed to protect against the spoofing of email sender domains. These protocols are complex as is, but are further complicated by automated email forwarding -- used by individual users to manage multiple accounts and by mailing lists to redistribute messages. In this paper, we explore how such em… ▽ More The critical role played by email has led to a range of extension protocols (e.g., SPF, DKIM, DMARC) designed to protect against the spoofing of email sender domains. These protocols are complex as is, but are further complicated by automated email forwarding -- used by individual users to manage multiple accounts and by mailing lists to redistribute messages. In this paper, we explore how such email forwarding and its implementations can break the implicit assumptions in widely deployed anti-spoofing protocols. Using large-scale empirical measurements of 20 email forwarding services (16 leading email providers and four popular mailing list services), we identify a range of security issues rooted in forwarding behavior and show how they can be combined to reliably evade existing anti-spoofing controls. We further show how these issues allow attackers to not only deliver spoofed email messages to prominent email providers (e.g., Gmail, Microsoft Outlook, and Zoho), but also reliably spoof email on behalf of tens of thousands of popular domains including sensitive domains used by organizations in government (e.g., state.gov), finance (e.g., transunion.com), law (e.g., perkinscoie.com) and news (e.g., washingtonpost.com) among others. △ Less

Submitted 19 April, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Comments: The paper appeared at the 8th IEEE European Symposium on Security and Privacy

Journal ref: The 8th IEEE European Symposium on Security and Privacy, 2023

Showing 1–50 of 105 results for author: Liu, E