-
STS MICCAI 2023 Challenge: Grand challenge on 2D and 3D semi-supervised tooth segmentation
Authors:
Yaqi Wang,
Yifan Zhang,
Xiaodiao Chen,
Shuai Wang,
Dahong Qian,
Fan Ye,
Feng Xu,
Hongyuan Zhang,
Qianni Zhang,
Chengyu Wu,
Yunxiang Li,
Weiwei Cui,
Shan Luo,
Chengkai Wang,
Tianhao Li,
Yi Liu,
Xiang Feng,
Huiyu Zhou,
Dongyun Liu,
Qixuan Wang,
Zhouhao Lin,
Wei Song,
Yuanlin Li,
Bing Wang,
Chunshi Wang
, et al. (2 additional authors not shown)
Abstract:
Computer-aided design (CAD) tools are increasingly popular in modern dental practice, particularly for treatment planning or comprehensive prognosis evaluation. In particular, the 2D panoramic X-ray image efficiently detects invisible caries, impacted teeth and supernumerary teeth in children, while the 3D dental cone beam computed tomography (CBCT) is widely used in orthodontics and endodontics d…
▽ More
Computer-aided design (CAD) tools are increasingly popular in modern dental practice, particularly for treatment planning or comprehensive prognosis evaluation. In particular, the 2D panoramic X-ray image efficiently detects invisible caries, impacted teeth and supernumerary teeth in children, while the 3D dental cone beam computed tomography (CBCT) is widely used in orthodontics and endodontics due to its low radiation dose. However, there is no open-access 2D public dataset for children's teeth and no open 3D dental CBCT dataset, which limits the development of automatic algorithms for segmenting teeth and analyzing diseases. The Semi-supervised Teeth Segmentation (STS) Challenge, a pioneering event in tooth segmentation, was held as a part of the MICCAI 2023 ToothFairy Workshop on the Alibaba Tianchi platform. This challenge aims to investigate effective semi-supervised tooth segmentation algorithms to advance the field of dentistry. In this challenge, we provide two modalities including the 2D panoramic X-ray images and the 3D CBCT tooth volumes. In Task 1, the goal was to segment tooth regions in panoramic X-ray images of both adult and pediatric teeth. Task 2 involved segmenting tooth sections using CBCT volumes. Limited labelled images with mostly unlabelled ones were provided in this challenge prompt using semi-supervised algorithms for training. In the preliminary round, the challenge received registration and result submission by 434 teams, with 64 advancing to the final round. This paper summarizes the diverse methods employed by the top-ranking teams in the STS MICCAI 2023 Challenge.
△ Less
Submitted 18 July, 2024;
originally announced July 2024.
-
Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression
Authors:
Hao Feng,
Boyuan Zhang,
Fanjiang Ye,
Min Si,
Ching-Hsiang Chu,
Jiannan Tian,
Chunxing Yin,
Summer Deng,
Yuchen Hao,
Pavan Balaji,
Tong Geng,
Dingwen Tao
Abstract:
DLRM is a state-of-the-art recommendation system model that has gained widespread adoption across various industry applications. The large size of DLRM models, however, necessitates the use of multiple devices/GPUs for efficient training. A significant bottleneck in this process is the time-consuming all-to-all communication required to collect embedding data from all devices. To mitigate this, we…
▽ More
DLRM is a state-of-the-art recommendation system model that has gained widespread adoption across various industry applications. The large size of DLRM models, however, necessitates the use of multiple devices/GPUs for efficient training. A significant bottleneck in this process is the time-consuming all-to-all communication required to collect embedding data from all devices. To mitigate this, we introduce a method that employs error-bounded lossy compression to reduce the communication data size and accelerate DLRM training. We develop a novel error-bounded lossy compression algorithm, informed by an in-depth analysis of embedding data features, to achieve high compression ratios. Moreover, we introduce a dual-level adaptive strategy for error-bound adjustment, spanning both table-wise and iteration-wise aspects, to balance the compression benefits with the potential impacts on accuracy. We further optimize our compressor for PyTorch tensors on GPUs, minimizing compression overhead. Evaluation shows that our method achieves a 1.38$\times$ training speedup with a minimal accuracy impact.
△ Less
Submitted 11 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources
Authors:
Xiyuan Wei,
Fanjiang Ye,
Ori Yonay,
Xingyu Chen,
Baixi Sun,
Dingwen Tao,
Tianbao Yang
Abstract:
Existing studies of training state-of-the-art Contrastive Language-Image Pretraining (CLIP) models on large-scale data involve hundreds of or even thousands of GPUs due to the requirement of a large batch size. However, such a large amount of resources is not accessible to most people. While advanced compositional optimization techniques for optimizing global contrastive losses have been demonstra…
▽ More
Existing studies of training state-of-the-art Contrastive Language-Image Pretraining (CLIP) models on large-scale data involve hundreds of or even thousands of GPUs due to the requirement of a large batch size. However, such a large amount of resources is not accessible to most people. While advanced compositional optimization techniques for optimizing global contrastive losses have been demonstrated effective for removing the requirement of large batch size, their performance on large-scale data remains underexplored and not optimized. To bridge the gap, this paper explores several aspects of CLIP training with limited resources (e.g., up to tens of GPUs). First, we introduce FastCLIP, a general CLIP training framework built on advanced compositional optimization techniques while designed and optimized for the distributed setting. Our framework is equipped with an efficient gradient reduction strategy to reduce communication overhead. Second, to further boost training efficiency, we investigate three components of the framework from an optimization perspective: the schedule of the inner learning rate, the update rules of the temperature parameter and the model parameters, respectively. Experiments on different strategies for each component shed light on how to conduct CLIP training more efficiently. Finally, we benchmark the performance of FastCLIP and the state-of-the-art training baseline (OpenCLIP) on different compute scales up to 32 GPUs on 8 nodes, and three data scales ranging from 2.7 million, 9.1 million to 315 million image-text pairs to demonstrate the significant improvement of FastCLIP in the resource-limited setting. We release the code of FastCLIP at https://github.com/Optimization-AI/fast_clip .
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Explicit Performance Bound of Finite Blocklength Coded MIMO: Time-Domain versus Spatiotemporal Channel Coding
Authors:
Feng Ye,
Xiaohu You,
Jiamin Li,
Chuan Zhang,
Chen Ji
Abstract:
In the sixth generation (6G), ultra-reliable low-latency communications (URLLC) will further develop to achieve TKu extreme connectivity, and multiple-input multiple-output (MIMO) is expected to be a key enabler for its realization. Since the latency constraint can be represented by the blocklength of a codeword, it is essential to analyze different coded MIMO schemes under finite blocklength regi…
▽ More
In the sixth generation (6G), ultra-reliable low-latency communications (URLLC) will further develop to achieve TKu extreme connectivity, and multiple-input multiple-output (MIMO) is expected to be a key enabler for its realization. Since the latency constraint can be represented by the blocklength of a codeword, it is essential to analyze different coded MIMO schemes under finite blocklength regime. In this paper, we analyze the statistical characteristics of information density of time-domain coding and spatiotemporal coding MIMO, compute the channel capacity and dispersion, and present new explicit performance bounds of finite blocklength coded MIMO for different coding modes via normal approximation. As revealed by the analysis and simulation, spatiotemporal coding can effectively mitigate the performance loss induced by short blocklength by increasing the spatial degree of freedom (DoF). However, for time-domain coding, each spatial link is encoded independently, and the performance loss will be more severe with short blocklength under any spatial DoF. These results indicate that spatiotemporal coding can optimally exploit the spatial dimension advantages of MIMO systems compared with time-domain coding, and it has the potential to support URLLC transmission, enabling very low error-rate communication under stringent blocklength constraint.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
R^2AG: Incorporating Retrieval Information into Retrieval Augmented Generation
Authors:
Fuda Ye,
Shuangyin Li,
Yongqi Zhang,
Lei Chen
Abstract:
Retrieval augmented generation (RAG) has been applied in many scenarios to augment large language models (LLMs) with external documents provided by retrievers. However, a semantic gap exists between LLMs and retrievers due to differences in their training objectives and architectures. This misalignment forces LLMs to passively accept the documents provided by the retrievers, leading to incomprehen…
▽ More
Retrieval augmented generation (RAG) has been applied in many scenarios to augment large language models (LLMs) with external documents provided by retrievers. However, a semantic gap exists between LLMs and retrievers due to differences in their training objectives and architectures. This misalignment forces LLMs to passively accept the documents provided by the retrievers, leading to incomprehension in the generation process, where the LLMs are burdened with the task of distinguishing these documents using their inherent knowledge. This paper proposes R$^2$AG, a novel enhanced RAG framework to fill this gap by incorporating Retrieval information into Retrieval Augmented Generation. Specifically, R$^2$AG utilizes the nuanced features from the retrievers and employs a R$^2$-Former to capture retrieval information. Then, a retrieval-aware prompting strategy is designed to integrate retrieval information into LLMs' generation. Notably, R$^2$AG suits low-source scenarios where LLMs and retrievers are frozen. Extensive experiments across five datasets validate the effectiveness, robustness, and efficiency of R$^2$AG. Our analysis reveals that retrieval information serves as an anchor to aid LLMs in the generation process, thereby filling the semantic gap.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Synergizing Foundation Models and Federated Learning: A Survey
Authors:
Shenghui Li,
Fanghua Ye,
Meng Fang,
Jiaxu Zhao,
Yun-Hin Chan,
Edith C. -H. Ngai,
Thiemo Voigt
Abstract:
The recent development of Foundation Models (FMs), represented by large language models, vision transformers, and multimodal models, has been making a significant impact on both academia and industry. Compared with small-scale models, FMs have a much stronger demand for high-volume data during the pre-training phase. Although general FMs can be pre-trained on data collected from open sources such…
▽ More
The recent development of Foundation Models (FMs), represented by large language models, vision transformers, and multimodal models, has been making a significant impact on both academia and industry. Compared with small-scale models, FMs have a much stronger demand for high-volume data during the pre-training phase. Although general FMs can be pre-trained on data collected from open sources such as the Internet, domain-specific FMs need proprietary data, posing a practical challenge regarding the amount of data available due to privacy concerns. Federated Learning (FL) is a collaborative learning paradigm that breaks the barrier of data availability from different participants. Therefore, it provides a promising solution to customize and adapt FMs to a wide range of domain-specific tasks using distributed datasets whilst preserving privacy. This survey paper discusses the potentials and challenges of synergizing FL and FMs and summarizes core techniques, future directions, and applications. A periodically updated paper collection on FM-FL is available at https://github.com/lishenghui/awesome-fm-fl.
△ Less
Submitted 18 June, 2024;
originally announced June 2024.
-
Dynamic Multi-Objective Lion Swarm Optimization with Multi-strategy Fusion: An application in 6R robot trajectory planning
Authors:
Bao Liu,
Tianbao Liu,
Zhongshuo Hu,
Fei Ye,
Lei Gao
Abstract:
The advancement of industrialization has spurred the development of innovative swarm intelligence algorithms, with Lion Swarm Optimization (LSO) notable for its robustness, parallelism, simplicity, and efficiency. While LSO excels in single-objective optimization, its multi-objective variants face challenges such as poor initialization, local optima entrapment, and so on. This study proposes Dynam…
▽ More
The advancement of industrialization has spurred the development of innovative swarm intelligence algorithms, with Lion Swarm Optimization (LSO) notable for its robustness, parallelism, simplicity, and efficiency. While LSO excels in single-objective optimization, its multi-objective variants face challenges such as poor initialization, local optima entrapment, and so on. This study proposes Dynamic Multi-Objective Lion Swarm Optimization with Multi-strategy Fusion (MF-DMOLSO) to address these limitations. MF-DMOLSO comprises three key components: initialization, swarm position update, and external archive update. The initialization unit employs chaotic mapping for uniform population distribution. The position update unit enhances behavior patterns and step size formulas for cub lions, incorporating crowding degree sorting, Pareto non-dominated sorting, and Levy flight to improve convergence speed and global search capabilities. Reference points guide convergence in higher-dimensional spaces, maintaining population diversity. An adaptive cold-hot start strategy generates a population responsive to environmental changes. The external archive update unit re-evaluates solutions based on non-domination and diversity to form the new population. Evaluations on benchmark functions showed MF-DMOLSO surpassed multi-objective particle swarm optimization, non-dominated sorting genetic algorithm II, and multi-objective lion swarm optimization, exceeding 90% accuracy for two-objective and 97% for three-objective problems. Compared to non-dominated sorting genetic algorithm III, MF-DMOLSO showed a 60% improvement. Applied to 6R robot trajectory planning, MF-DMOLSO optimized running time and maximum acceleration to 8.3s and 0.3pi rad/s^2, achieving a set coverage rate of 70.97% compared to 2% by multi-objective particle swarm optimization, thus improving efficiency and reducing mechanical dither.
△ Less
Submitted 7 June, 2024; v1 submitted 31 May, 2024;
originally announced June 2024.
-
Autonomous Workflow for Multimodal Fine-Grained Training Assistants Towards Mixed Reality
Authors:
Jiahuan Pei,
Irene Viola,
Haochen Huang,
Junxiao Wang,
Moonisa Ahsan,
Fanghua Ye,
Jiang Yiming,
Yao Sai,
Di Wang,
Zhumin Chen,
Pengjie Ren,
Pablo Cesar
Abstract:
Autonomous artificial intelligence (AI) agents have emerged as promising protocols for automatically understanding the language-based environment, particularly with the exponential development of large language models (LLMs). However, a fine-grained, comprehensive understanding of multimodal environments remains under-explored. This work designs an autonomous workflow tailored for integrating AI a…
▽ More
Autonomous artificial intelligence (AI) agents have emerged as promising protocols for automatically understanding the language-based environment, particularly with the exponential development of large language models (LLMs). However, a fine-grained, comprehensive understanding of multimodal environments remains under-explored. This work designs an autonomous workflow tailored for integrating AI agents seamlessly into extended reality (XR) applications for fine-grained training. We present a demonstration of a multimodal fine-grained training assistant for LEGO brick assembly in a pilot XR environment. Specifically, we design a cerebral language agent that integrates LLM with memory, planning, and interaction with XR tools and a vision-language agent, enabling agents to decide their actions based on past experiences. Furthermore, we introduce LEGO-MRTA, a multimodal fine-grained assembly dialogue dataset synthesized automatically in the workflow served by a commercial LLM. This dataset comprises multimodal instruction manuals, conversations, XR responses, and vision question answering. Last, we present several prevailing open-resource LLMs as benchmarks, assessing their performance with and without fine-tuning on the proposed dataset. We anticipate that the broader impact of this workflow will advance the development of smarter assistants for seamless user interaction in XR environments, fostering research in both AI and HCI communities.
△ Less
Submitted 5 June, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
Dynamic Deep Factor Graph for Multi-Agent Reinforcement Learning
Authors:
Yuchen Shi,
Shihong Duan,
Cheng Xu,
Ran Wang,
Fangwen Ye,
Chau Yuen
Abstract:
This work introduces a novel value decomposition algorithm, termed \textit{Dynamic Deep Factor Graphs} (DDFG). Unlike traditional coordination graphs, DDFG leverages factor graphs to articulate the decomposition of value functions, offering enhanced flexibility and adaptability to complex value function structures. Central to DDFG is a graph structure generation policy that innovatively generates…
▽ More
This work introduces a novel value decomposition algorithm, termed \textit{Dynamic Deep Factor Graphs} (DDFG). Unlike traditional coordination graphs, DDFG leverages factor graphs to articulate the decomposition of value functions, offering enhanced flexibility and adaptability to complex value function structures. Central to DDFG is a graph structure generation policy that innovatively generates factor graph structures on-the-fly, effectively addressing the dynamic collaboration requirements among agents. DDFG strikes an optimal balance between the computational overhead associated with aggregating value functions and the performance degradation inherent in their complete decomposition. Through the application of the max-sum algorithm, DDFG efficiently identifies optimal policies. We empirically validate DDFG's efficacy in complex scenarios, including higher-order predator-prey tasks and the StarCraft II Multi-agent Challenge (SMAC), thus underscoring its capability to surmount the limitations faced by existing value decomposition algorithms. DDFG emerges as a robust solution for MARL challenges that demand nuanced understanding and facilitation of dynamic agent collaboration. The implementation of DDFG is made publicly accessible, with the source code available at \url{https://github.com/SICC-Group/DDFG}.
△ Less
Submitted 7 June, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
-
Imitation Learning for Adaptive Video Streaming with Future Adversarial Information Bottleneck Principle
Authors:
Shuoyao Wang,
Jiawei Lin,
Fangwei Ye
Abstract:
Adaptive video streaming plays a crucial role in ensuring high-quality video streaming services. Despite extensive research efforts devoted to Adaptive BitRate (ABR) techniques, the current reinforcement learning (RL)-based ABR algorithms may benefit the average Quality of Experience (QoE) but suffers from fluctuating performance in individual video sessions. In this paper, we present a novel appr…
▽ More
Adaptive video streaming plays a crucial role in ensuring high-quality video streaming services. Despite extensive research efforts devoted to Adaptive BitRate (ABR) techniques, the current reinforcement learning (RL)-based ABR algorithms may benefit the average Quality of Experience (QoE) but suffers from fluctuating performance in individual video sessions. In this paper, we present a novel approach that combines imitation learning with the information bottleneck technique, to learn from the complex offline optimal scenario rather than inefficient exploration. In particular, we leverage the deterministic offline bitrate optimization problem with the future throughput realization as the expert and formulate it as a mixed-integer non-linear programming (MINLP) problem. To enable large-scale training for improved performance, we propose an alternative optimization algorithm that efficiently solves the MINLP problem. To address the issues of overfitting due to the future information leakage in MINLP, we incorporate an adversarial information bottleneck framework. By compressing the video streaming state into a latent space, we retain only action-relevant information. Additionally, we introduce a future adversarial term to mitigate the influence of future information leakage, where Model Prediction Control (MPC) policy without any future information is employed as the adverse expert. Experimental results demonstrate the effectiveness of our proposed approach in significantly enhancing the quality of adaptive video streaming, providing a 7.30\% average QoE improvement and a 30.01\% average ranking reduction.
△ Less
Submitted 12 March, 2024;
originally announced May 2024.
-
Task-Aware Low-Rank Adaptation of Segment Anything Model
Authors:
Xuehao Wang,
Feiyang Ye,
Yu Zhang
Abstract:
The Segment Anything Model (SAM), with its remarkable zero-shot capability, has been proven to be a powerful foundation model for image segmentation tasks, which is an important task in computer vision. However, the transfer of its rich semantic information to multiple different downstream tasks remains unexplored. In this paper, we propose the Task-Aware Low-Rank Adaptation (TA-LoRA) method, whic…
▽ More
The Segment Anything Model (SAM), with its remarkable zero-shot capability, has been proven to be a powerful foundation model for image segmentation tasks, which is an important task in computer vision. However, the transfer of its rich semantic information to multiple different downstream tasks remains unexplored. In this paper, we propose the Task-Aware Low-Rank Adaptation (TA-LoRA) method, which enables SAM to work as a foundation model for multi-task learning. Specifically, TA-LoRA injects an update parameter tensor into each layer of the encoder in SAM and leverages a low-rank tensor decomposition method to incorporate both task-shared and task-specific information. Furthermore, we introduce modified SAM (mSAM) for multi-task learning where we remove the prompt encoder of SAM and use task-specific no mask embeddings and mask decoder for each task. Extensive experiments conducted on benchmark datasets substantiate the efficacy of TA-LoRA in enhancing the performance of mSAM across multiple downstream tasks.
△ Less
Submitted 16 March, 2024;
originally announced March 2024.
-
Better Understandings and Configurations in MaxSAT Local Search Solvers via Anytime Performance Analysis
Authors:
Furong Ye,
Chuan Luo,
Shaowei Cai
Abstract:
Though numerous solvers have been proposed for the MaxSAT problem, and the benchmark environment such as MaxSAT Evaluations provides a platform for the comparison of the state-of-the-art solvers, existing assessments were usually evaluated based on the quality, e.g., fitness, of the best-found solutions obtained within a given running time budget. However, concerning solely the final obtained solu…
▽ More
Though numerous solvers have been proposed for the MaxSAT problem, and the benchmark environment such as MaxSAT Evaluations provides a platform for the comparison of the state-of-the-art solvers, existing assessments were usually evaluated based on the quality, e.g., fitness, of the best-found solutions obtained within a given running time budget. However, concerning solely the final obtained solutions regarding specific time budgets may restrict us from comprehending the behavior of the solvers along the convergence process. This paper demonstrates that Empirical Cumulative Distribution Functions can be used to compare MaxSAT local search solvers' anytime performance across multiple problem instances and various time budgets. The assessment reveals distinctions in solvers' performance and displays that the (dis)advantages of solvers adjust along different running times. This work also exhibits that the quantitative and high variance assessment of anytime performance can guide machines, i.e., automatic configurators, to search for better parameter settings. Our experimental results show that the hyperparameter optimization tool, i.e., SMAC, generally achieves better parameter settings of local search when using the anytime performance as the cost function, compared to using the fitness of the best-found solutions.
△ Less
Submitted 11 March, 2024;
originally announced March 2024.
-
Simulating Family Conversations using LLMs: Demonstration of Parenting Styles
Authors:
Frank Tian-fang Ye,
Xiaozi Gao
Abstract:
This study presents a framework for conducting psychological and linguistic research through simulated conversations using large language models (LLMs). The proposed methodology offers significant advantages, particularly for simulating human interactions involving potential unethical language or behaviors that would be impermissible in traditional experiments with human participants. As a demonst…
▽ More
This study presents a framework for conducting psychological and linguistic research through simulated conversations using large language models (LLMs). The proposed methodology offers significant advantages, particularly for simulating human interactions involving potential unethical language or behaviors that would be impermissible in traditional experiments with human participants. As a demonstration, we employed LLMs to simulate family conversations across four parenting styles (authoritarian, authoritative, permissive, and uninvolved). In general, we observed that the characteristics of the four parenting styles were portrayed in the simulated conversations. Several strategies could be used to improve the simulation quality, such as including context awareness, employing a few-shot prompting approach or fine-tuning models to cater to specific simulation requirements. Overall, this study introduces a promising methodology for conducting psychological and linguistic research through simulated conversations, while acknowledging the current limitations and proposing potential solutions for future refinement and improvement.
△ Less
Submitted 10 March, 2024;
originally announced March 2024.
-
Graph Learning for Parameter Prediction of Quantum Approximate Optimization Algorithm
Authors:
Zhiding Liang,
Gang Liu,
Zheyuan Liu,
Jinglei Cheng,
Tianyi Hao,
Kecheng Liu,
Hang Ren,
Zhixin Song,
Ji Liu,
Fanny Ye,
Yiyu Shi
Abstract:
In recent years, quantum computing has emerged as a transformative force in the field of combinatorial optimization, offering novel approaches to tackling complex problems that have long challenged classical computational methods. Among these, the Quantum Approximate Optimization Algorithm (QAOA) stands out for its potential to efficiently solve the Max-Cut problem, a quintessential example of com…
▽ More
In recent years, quantum computing has emerged as a transformative force in the field of combinatorial optimization, offering novel approaches to tackling complex problems that have long challenged classical computational methods. Among these, the Quantum Approximate Optimization Algorithm (QAOA) stands out for its potential to efficiently solve the Max-Cut problem, a quintessential example of combinatorial optimization. However, practical application faces challenges due to current limitations on quantum computational resource. Our work optimizes QAOA initialization, using Graph Neural Networks (GNN) as a warm-start technique. This sacrifices affordable computational resource on classical computer to reduce quantum computational resource overhead, enhancing QAOA's effectiveness. Experiments with various GNN architectures demonstrate the adaptability and stability of our framework, highlighting the synergy between quantum algorithms and machine learning. Our findings show GNN's potential in improving QAOA performance, opening new avenues for hybrid quantum-classical approaches in quantum computing and contributing to practical applications.
△ Less
Submitted 5 March, 2024;
originally announced March 2024.
-
Diffusion Language Models Are Versatile Protein Learners
Authors:
Xinyou Wang,
Zaixiang Zheng,
Fei Ye,
Dongyu Xue,
Shujian Huang,
Quanquan Gu
Abstract:
This paper introduces diffusion protein language model (DPLM), a versatile protein language model that demonstrates strong generative and predictive capabilities for protein sequences. We first pre-train scalable DPLMs from evolutionary-scale protein sequences within a generative self-supervised discrete diffusion probabilistic framework, which generalizes language modeling for proteins in a princ…
▽ More
This paper introduces diffusion protein language model (DPLM), a versatile protein language model that demonstrates strong generative and predictive capabilities for protein sequences. We first pre-train scalable DPLMs from evolutionary-scale protein sequences within a generative self-supervised discrete diffusion probabilistic framework, which generalizes language modeling for proteins in a principled way. After pre-training, DPLM exhibits the ability to generate structurally plausible, novel, and diverse protein sequences for unconditional generation. We further demonstrate the proposed diffusion generative pre-training makes DPLM possess a better understanding of proteins, making it a superior representation learner, which can be fine-tuned for various predictive tasks, comparing favorably to ESM2 (Lin et al., 2022). Moreover, DPLM can be tailored for various needs, which showcases its prowess of conditional generation in several ways: (1) conditioning on partial peptide sequences, e.g., generating scaffolds for functional motifs with high success rate; (2) incorporating other modalities as conditioner, e.g., structure-conditioned generation for inverse folding; and (3) steering sequence generation towards desired properties, e.g., satisfying specified secondary structures, through a plug-and-play classifier guidance.
△ Less
Submitted 28 February, 2024;
originally announced February 2024.
-
A Hierarchical Dataflow-Driven Heterogeneous Architecture for Wireless Baseband Processing
Authors:
Limin Jiang,
Yi Shi,
Haiqin Hu,
Qingyu Deng,
Siyi Xu,
Yintao Liu,
Feng Yuan,
Si Wang,
Yihao Shen,
Fangfang Ye,
Shan Cao,
Zhiyuan Jiang
Abstract:
Wireless baseband processing (WBP) is a key element of wireless communications, with a series of signal processing modules to improve data throughput and counter channel fading. Conventional hardware solutions, such as digital signal processors (DSPs) and more recently, graphic processing units (GPUs), provide various degrees of parallelism, yet they both fail to take into account the cyclical and…
▽ More
Wireless baseband processing (WBP) is a key element of wireless communications, with a series of signal processing modules to improve data throughput and counter channel fading. Conventional hardware solutions, such as digital signal processors (DSPs) and more recently, graphic processing units (GPUs), provide various degrees of parallelism, yet they both fail to take into account the cyclical and consecutive character of WBP. Furthermore, the large amount of data in WBPs cannot be processed quickly in symmetric multiprocessors (SMPs) due to the unpredictability of memory latency. To address this issue, we propose a hierarchical dataflow-driven architecture to accelerate WBP. A pack-and-ship approach is presented under a non-uniform memory access (NUMA) architecture to allow the subordinate tiles to operate in a bundled access and execute manner. We also propose a multi-level dataflow model and the related scheduling scheme to manage and allocate the heterogeneous hardware resources. Experiment results demonstrate that our prototype achieves $2\times$ and $2.3\times$ speedup in terms of normalized throughput and single-tile clock cycles compared with GPU and DSP counterparts in several critical WBP benchmarks. Additionally, a link-level throughput of $288$ Mbps can be achieved with a $45$-core configuration.
△ Less
Submitted 28 February, 2024;
originally announced February 2024.
-
Impact of spatial transformations on landscape features of CEC2022 basic benchmark problems
Authors:
Haoran Yin,
Diederick Vermetten,
Furong Ye,
Thomas H. W. Bäck,
Anna V. Kononova
Abstract:
When benchmarking optimization heuristics, we need to take care to avoid an algorithm exploiting biases in the construction of the used problems. One way in which this might be done is by providing different versions of each problem but with transformations applied to ensure the algorithms are equipped with mechanisms for successfully tackling a range of problems. In this paper, we investigate sev…
▽ More
When benchmarking optimization heuristics, we need to take care to avoid an algorithm exploiting biases in the construction of the used problems. One way in which this might be done is by providing different versions of each problem but with transformations applied to ensure the algorithms are equipped with mechanisms for successfully tackling a range of problems. In this paper, we investigate several of these problem transformations and show how they influence the low-level landscape features of a set of 5 problems from the CEC2022 benchmark suite. Our results highlight that even relatively small transformations can significantly alter the measured landscape features. This poses a wider question of what properties we want to preserve when creating problem transformations, and how to fairly measure them.
△ Less
Submitted 12 February, 2024;
originally announced February 2024.
-
Anchor-based Large Language Models
Authors:
Jianhui Pang,
Fanghua Ye,
Derek Fai Wong,
Xin He,
Wanshun Chen,
Longyue Wang
Abstract:
Large language models (LLMs) predominantly employ decoder-only transformer architectures, necessitating the retention of keys/values information for historical tokens to provide contextual information and avoid redundant computation. However, the substantial size and parameter volume of these LLMs require massive GPU memory. This memory demand increases with the length of the input text, leading t…
▽ More
Large language models (LLMs) predominantly employ decoder-only transformer architectures, necessitating the retention of keys/values information for historical tokens to provide contextual information and avoid redundant computation. However, the substantial size and parameter volume of these LLMs require massive GPU memory. This memory demand increases with the length of the input text, leading to an urgent need for more efficient methods of information storage and processing. This study introduces Anchor-based LLMs (AnLLMs), which utilize an innovative anchor-based self-attention network (AnSAN) and also an anchor-based inference strategy. This approach enables LLMs to compress sequence information into an anchor token, reducing the keys/values cache and enhancing inference efficiency. Experiments on question-answering benchmarks reveal that AnLLMs maintain similar accuracy levels while achieving up to 99% keys/values cache reduction and up to 3.5 times faster inference. Despite a minor compromise in accuracy, the substantial enhancements of AnLLMs employing the AnSAN technique in resource utilization and computational efficiency underscore their potential for practical LLM applications.
△ Less
Submitted 1 June, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
SEER: Facilitating Structured Reasoning and Explanation via Reinforcement Learning
Authors:
Guoxin Chen,
Kexin Tang,
Chao Yang,
Fuying Ye,
Yu Qiao,
Yiming Qian
Abstract:
Elucidating the reasoning process with structured explanations from question to answer is crucial, as it significantly enhances the interpretability, traceability, and trustworthiness of question-answering (QA) systems. However, structured explanations demand models to perform intricately structured reasoning, which poses great challenges. Most existing methods focus on single-step reasoning throu…
▽ More
Elucidating the reasoning process with structured explanations from question to answer is crucial, as it significantly enhances the interpretability, traceability, and trustworthiness of question-answering (QA) systems. However, structured explanations demand models to perform intricately structured reasoning, which poses great challenges. Most existing methods focus on single-step reasoning through supervised learning, ignoring logical dependencies between steps. Moreover, existing reinforcement learning (RL) based methods overlook the structured relationships, underutilizing the potential of RL in structured reasoning. In this paper, we propose SEER, a novel method that maximizes a structure-based return to facilitate structured reasoning and explanation. Our proposed structure-based return precisely describes the hierarchical and branching structure inherent in structured reasoning, effectively capturing the intricate relationships between different reasoning steps. In addition, we introduce a fine-grained reward function to meticulously delineate diverse reasoning steps. Extensive experiments show that SEER significantly outperforms state-of-the-art methods, achieving an absolute improvement of 6.9% over RL-based methods on EntailmentBank, a 4.4% average improvement on STREET benchmark, and exhibiting outstanding efficiency and cross-dataset generalization performance. Our code is available at https://github.com/Chen-GX/SEER.
△ Less
Submitted 4 June, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
Benchmarking LLMs via Uncertainty Quantification
Authors:
Fanghua Ye,
Mingming Yang,
Jianhui Pang,
Longyue Wang,
Derek F. Wong,
Emine Yilmaz,
Shuming Shi,
Zhaopeng Tu
Abstract:
The proliferation of open-source Large Language Models (LLMs) from various institutions has highlighted the urgent need for comprehensive evaluation methods. However, current evaluation platforms, such as the widely recognized HuggingFace open LLM leaderboard, neglect a crucial aspect -- uncertainty, which is vital for thoroughly assessing LLMs. To bridge this gap, we introduce a new benchmarking…
▽ More
The proliferation of open-source Large Language Models (LLMs) from various institutions has highlighted the urgent need for comprehensive evaluation methods. However, current evaluation platforms, such as the widely recognized HuggingFace open LLM leaderboard, neglect a crucial aspect -- uncertainty, which is vital for thoroughly assessing LLMs. To bridge this gap, we introduce a new benchmarking approach for LLMs that integrates uncertainty quantification. Our examination involves eight LLMs (LLM series) spanning five representative natural language processing tasks. Our findings reveal that: I) LLMs with higher accuracy may exhibit lower certainty; II) Larger-scale LLMs may display greater uncertainty compared to their smaller counterparts; and III) Instruction-finetuning tends to increase the uncertainty of LLMs. These results underscore the significance of incorporating uncertainty in the evaluation of LLMs.
△ Less
Submitted 25 April, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
Parsimony or Capability? Decomposition Delivers Both in Long-term Time Series Forecasting
Authors:
Jinliang Deng,
Feiyang Ye,
Du Yin,
Xuan Song,
Ivor W. Tsang,
Hui Xiong
Abstract:
Long-term time series forecasting (LTSF) represents a critical frontier in time series analysis, characterized by extensive input sequences, as opposed to the shorter spans typical of traditional approaches. While longer sequences inherently offer richer information for enhanced predictive precision, prevailing studies often respond by escalating model complexity. These intricate models can inflat…
▽ More
Long-term time series forecasting (LTSF) represents a critical frontier in time series analysis, characterized by extensive input sequences, as opposed to the shorter spans typical of traditional approaches. While longer sequences inherently offer richer information for enhanced predictive precision, prevailing studies often respond by escalating model complexity. These intricate models can inflate into millions of parameters, resulting in prohibitive parameter scales. Our study demonstrates, through both analytical and empirical evidence, that decomposition is key to containing excessive model inflation while achieving uniformly superior and robust results across various datasets. Remarkably, by tailoring decomposition to the intrinsic dynamics of time series data, our proposed model outperforms existing benchmarks, using over 99 \% fewer parameters than the majority of competing methods. Through this work, we aim to unleash the power of a restricted set of parameters by capitalizing on domain characteristics--a timely reminder that in the realm of LTSF, bigger is not invariably better.
△ Less
Submitted 23 May, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
A First-Order Multi-Gradient Algorithm for Multi-Objective Bi-Level Optimization
Authors:
Feiyang Ye,
Baijiong Lin,
Xiaofeng Cao,
Yu Zhang,
Ivor Tsang
Abstract:
In this paper, we study the Multi-Objective Bi-Level Optimization (MOBLO) problem, where the upper-level subproblem is a multi-objective optimization problem and the lower-level subproblem is for scalar optimization. Existing gradient-based MOBLO algorithms need to compute the Hessian matrix, causing the computational inefficient problem. To address this, we propose an efficient first-order multi-…
▽ More
In this paper, we study the Multi-Objective Bi-Level Optimization (MOBLO) problem, where the upper-level subproblem is a multi-objective optimization problem and the lower-level subproblem is for scalar optimization. Existing gradient-based MOBLO algorithms need to compute the Hessian matrix, causing the computational inefficient problem. To address this, we propose an efficient first-order multi-gradient method for MOBLO, called FORUM. Specifically, we reformulate MOBLO problems as a constrained multi-objective optimization (MOO) problem via the value-function approach. Then we propose a novel multi-gradient aggregation method to solve the challenging constrained MOO problem. Theoretically, we provide the complexity analysis to show the efficiency of the proposed method and a non-asymptotic convergence result. Empirically, extensive experiments demonstrate the effectiveness and efficiency of the proposed FORUM method in different learning problems. In particular, it achieves state-of-the-art performance on three multi-task learning benchmark datasets. The code is available at https://github.com/Baijiong-Lin/FORUM.
△ Less
Submitted 10 July, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models
Authors:
Jianhui Pang,
Fanghua Ye,
Longyue Wang,
Dian Yu,
Derek F. Wong,
Shuming Shi,
Zhaopeng Tu
Abstract:
The evolution of Neural Machine Translation (NMT) has been significantly influenced by six core challenges (Koehn and Knowles, 2017), which have acted as benchmarks for progress in this field. This study revisits these challenges, offering insights into their ongoing relevance in the context of advanced Large Language Models (LLMs): domain mismatch, amount of parallel data, rare word prediction, t…
▽ More
The evolution of Neural Machine Translation (NMT) has been significantly influenced by six core challenges (Koehn and Knowles, 2017), which have acted as benchmarks for progress in this field. This study revisits these challenges, offering insights into their ongoing relevance in the context of advanced Large Language Models (LLMs): domain mismatch, amount of parallel data, rare word prediction, translation of long sentences, attention model as word alignment, and sub-optimal beam search. Our empirical findings indicate that LLMs effectively lessen the reliance on parallel data for major languages in the pretraining phase. Additionally, the LLM-based translation system significantly enhances the translation of long sentences that contain approximately 80 words and shows the capability to translate documents of up to 512 words. However, despite these significant improvements, the challenges of domain mismatch and prediction of rare words persist. While the challenges of word alignment and beam search, specifically associated with NMT, may not apply to LLMs, we identify three new challenges for LLMs in translation tasks: inference efficiency, translation of low-resource languages in the pretraining phase, and human-aligned evaluation. The datasets and models are released at https://github.com/pangjh3/LLM4MT.
△ Less
Submitted 17 January, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
MA-BBOB: A Problem Generator for Black-Box Optimization Using Affine Combinations and Shifts
Authors:
Diederick Vermetten,
Furong Ye,
Thomas Bäck,
Carola Doerr
Abstract:
Choosing a set of benchmark problems is often a key component of any empirical evaluation of iterative optimization heuristics. In continuous, single-objective optimization, several sets of problems have become widespread, including the well-established BBOB suite. While this suite is designed to enable rigorous benchmarking, it is also commonly used for testing methods such as algorithm selection…
▽ More
Choosing a set of benchmark problems is often a key component of any empirical evaluation of iterative optimization heuristics. In continuous, single-objective optimization, several sets of problems have become widespread, including the well-established BBOB suite. While this suite is designed to enable rigorous benchmarking, it is also commonly used for testing methods such as algorithm selection, which the suite was never designed around.
We present the MA-BBOB function generator, which uses the BBOB suite as component functions in an affine combination. In this work, we describe the full procedure to create these affine combinations and highlight the trade-offs of several design decisions, specifically the choice to place the optimum uniformly at random in the domain. We then illustrate how this generator can be used to gain more low-level insight into the function landscapes through the use of exploratory landscape analysis.
Finally, we show a potential use-case of MA-BBOB in generating a wide set of training and testing data for algorithm selectors. Using this setup, we show that the basic scheme of using a set of landscape features to predict the best algorithm does not lead to optimal results, and that an algorithm selector trained purely on the BBOB functions generalizes poorly to the affine combinations.
△ Less
Submitted 18 December, 2023;
originally announced December 2023.
-
Training-free Zero-shot Composed Image Retrieval with Local Concept Reranking
Authors:
Shitong Sun,
Fanghua Ye,
Shaogang Gong
Abstract:
Composed image retrieval attempts to retrieve an image of interest from gallery images through a composed query of a reference image and its corresponding modified text. It has recently attracted attention due to the collaboration of information-rich images and concise language to precisely express the requirements of target images. Most current composed image retrieval methods follow a supervised…
▽ More
Composed image retrieval attempts to retrieve an image of interest from gallery images through a composed query of a reference image and its corresponding modified text. It has recently attracted attention due to the collaboration of information-rich images and concise language to precisely express the requirements of target images. Most current composed image retrieval methods follow a supervised learning approach to training on a costly triplet dataset composed of a reference image, modified text, and a corresponding target image. To avoid difficult to-obtain labeled triplet training data, zero-shot composed image retrieval (ZS-CIR) has been introduced, which aims to retrieve the target image by learning from image-text pairs (self-supervised triplets), without the need for human-labeled triplets. However, this self-supervised triplet learning approach is computationally less effective and less understandable as it assumes the interaction between image and text is conducted with implicit query embedding without explicit semantical interpretation. In this work, we present a new training-free zero-shot composed image retrieval method which translates the query into explicit human-understandable text. This helps improve model learning efficiency to enhance the generalization capacity of foundation models. Further, we introduce a Local Concept Re-ranking (LCR) mechanism to focus on discriminative local information extracted from the modified instructions. Extensive experiments on four ZS-CIR benchmarks show that our method achieves comparable performances to that of the state of-the-art triplet training based methods, but significantly outperforms other training-free methods on the open domain datasets (CIRR, CIRCO and COCO), as well as the fashion domain dataset (FashionIQ).
△ Less
Submitted 24 March, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
A Unified Framework for Unsupervised Domain Adaptation based on Instance Weighting
Authors:
Jinjing Zhu,
Feiyang Ye,
Qiao Xiao,
Pengxin Guo,
Yu Zhang,
Qiang Yang
Abstract:
Despite the progress made in domain adaptation, solving Unsupervised Domain Adaptation (UDA) problems with a general method under complex conditions caused by label shifts between domains remains a formidable task. In this work, we comprehensively investigate four distinct UDA settings including closed set domain adaptation, partial domain adaptation, open set domain adaptation, and universal doma…
▽ More
Despite the progress made in domain adaptation, solving Unsupervised Domain Adaptation (UDA) problems with a general method under complex conditions caused by label shifts between domains remains a formidable task. In this work, we comprehensively investigate four distinct UDA settings including closed set domain adaptation, partial domain adaptation, open set domain adaptation, and universal domain adaptation, where shared common classes between source and target domains coexist alongside domain-specific private classes. The prominent challenges inherent in diverse UDA settings center around the discrimination of common/private classes and the precise measurement of domain discrepancy. To surmount these challenges effectively, we propose a novel yet effective method called Learning Instance Weighting for Unsupervised Domain Adaptation (LIWUDA), which caters to various UDA settings. Specifically, the proposed LIWUDA method constructs a weight network to assign weights to each instance based on its probability of belonging to common classes, and designs Weighted Optimal Transport (WOT) for domain alignment by leveraging instance weights. Additionally, the proposed LIWUDA method devises a Separate and Align (SA) loss to separate instances with low similarities and align instances with high similarities. To guide the learning of the weight network, Intra-domain Optimal Transport (IOT) is proposed to enforce the weights of instances in common classes to follow a uniform distribution. Through the integration of those three components, the proposed LIWUDA method demonstrates its capability to address all four UDA settings in a unified manner. Experimental evaluations conducted on three benchmark datasets substantiate the effectiveness of the proposed LIWUDA method.
△ Less
Submitted 8 December, 2023;
originally announced December 2023.
-
Turn-Level Active Learning for Dialogue State Tracking
Authors:
Zihan Zhang,
Meng Fang,
Fanghua Ye,
Ling Chen,
Mohammad-Reza Namazi-Rad
Abstract:
Dialogue state tracking (DST) plays an important role in task-oriented dialogue systems. However, collecting a large amount of turn-by-turn annotated dialogue data is costly and inefficient. In this paper, we propose a novel turn-level active learning framework for DST to actively select turns in dialogues to annotate. Given the limited labelling budget, experimental results demonstrate the effect…
▽ More
Dialogue state tracking (DST) plays an important role in task-oriented dialogue systems. However, collecting a large amount of turn-by-turn annotated dialogue data is costly and inefficient. In this paper, we propose a novel turn-level active learning framework for DST to actively select turns in dialogues to annotate. Given the limited labelling budget, experimental results demonstrate the effectiveness of selective annotation of dialogue turns. Additionally, our approach can effectively achieve comparable DST performance to traditional training approaches with significantly less annotated data, which provides a more efficient way to annotate new dialogue data.
△ Less
Submitted 22 October, 2023;
originally announced October 2023.
-
Enhancing Conversational Search: Large Language Model-Aided Informative Query Rewriting
Authors:
Fanghua Ye,
Meng Fang,
Shenghui Li,
Emine Yilmaz
Abstract:
Query rewriting plays a vital role in enhancing conversational search by transforming context-dependent user queries into standalone forms. Existing approaches primarily leverage human-rewritten queries as labels to train query rewriting models. However, human rewrites may lack sufficient information for optimal retrieval performance. To overcome this limitation, we propose utilizing large languag…
▽ More
Query rewriting plays a vital role in enhancing conversational search by transforming context-dependent user queries into standalone forms. Existing approaches primarily leverage human-rewritten queries as labels to train query rewriting models. However, human rewrites may lack sufficient information for optimal retrieval performance. To overcome this limitation, we propose utilizing large language models (LLMs) as query rewriters, enabling the generation of informative query rewrites through well-designed instructions. We define four essential properties for well-formed rewrites and incorporate all of them into the instruction. In addition, we introduce the role of rewrite editors for LLMs when initial query rewrites are available, forming a "rewrite-then-edit" process. Furthermore, we propose distilling the rewriting capabilities of LLMs into smaller models to reduce rewriting latency. Our experimental evaluation on the QReCC dataset demonstrates that informative query rewrites can yield substantially improved retrieval performance compared to human rewrites, especially with sparse retrievers.
△ Less
Submitted 18 October, 2023; v1 submitted 14 October, 2023;
originally announced October 2023.
-
Lending Interaction Wings to Recommender Systems with Conversational Agents
Authors:
Jiarui Jin,
Xianyu Chen,
Fanghua Ye,
Mengyue Yang,
Yue Feng,
Weinan Zhang,
Yong Yu,
Jun Wang
Abstract:
Recommender systems trained on offline historical user behaviors are embracing conversational techniques to online query user preference. Unlike prior conversational recommendation approaches that systemically combine conversational and recommender parts through a reinforcement learning framework, we propose CORE, a new offline-training and online-checking paradigm that bridges a COnversational ag…
▽ More
Recommender systems trained on offline historical user behaviors are embracing conversational techniques to online query user preference. Unlike prior conversational recommendation approaches that systemically combine conversational and recommender parts through a reinforcement learning framework, we propose CORE, a new offline-training and online-checking paradigm that bridges a COnversational agent and REcommender systems via a unified uncertainty minimization framework. It can benefit any recommendation platform in a plug-and-play style. Here, CORE treats a recommender system as an offline relevance score estimator to produce an estimated relevance score for each item; while a conversational agent is regarded as an online relevance score checker to check these estimated scores in each session. We define uncertainty as the summation of unchecked relevance scores. In this regard, the conversational agent acts to minimize uncertainty via querying either attributes or items. Based on the uncertainty minimization framework, we derive the expected certainty gain of querying each attribute and item, and develop a novel online decision tree algorithm to decide what to query at each turn. Experimental results on 8 industrial datasets show that CORE could be seamlessly employed on 9 popular recommendation approaches. We further demonstrate that our conversational agent could communicate as a human if empowered by a pre-trained large language model.
△ Less
Submitted 6 October, 2023;
originally announced October 2023.
-
FedLPA: One-shot Federated Learning with Layer-Wise Posterior Aggregation
Authors:
Xiang Liu,
Liangxi Liu,
Feiyang Ye,
Yunheng Shen,
Xia Li,
Linshan Jiang,
Jialin Li
Abstract:
Efficiently aggregating trained neural networks from local clients into a global model on a server is a widely researched topic in federated learning. Recently, motivated by diminishing privacy concerns, mitigating potential attacks, and reducing communication overhead, one-shot federated learning (i.e., limiting client-server communication into a single round) has gained popularity among research…
▽ More
Efficiently aggregating trained neural networks from local clients into a global model on a server is a widely researched topic in federated learning. Recently, motivated by diminishing privacy concerns, mitigating potential attacks, and reducing communication overhead, one-shot federated learning (i.e., limiting client-server communication into a single round) has gained popularity among researchers. However, the one-shot aggregation performances are sensitively affected by the non-identical training data distribution, which exhibits high statistical heterogeneity in some real-world scenarios. To address this issue, we propose a novel one-shot aggregation method with layer-wise posterior aggregation, named FedLPA. FedLPA aggregates local models to obtain a more accurate global model without requiring extra auxiliary datasets or exposing any private label information, e.g., label distributions. To effectively capture the statistics maintained in the biased local datasets in the practical non-IID scenario, we efficiently infer the posteriors of each layer in each local model using layer-wise Laplace approximation and aggregate them to train the global parameters. Extensive experimental results demonstrate that FedLPA significantly improves learning performance over state-of-the-art methods across several metrics.
△ Less
Submitted 21 May, 2024; v1 submitted 30 September, 2023;
originally announced October 2023.
-
Chinese herb medicine in augmented reality
Authors:
Qianyun Zhu,
Yifeng Xie,
Fangyang Ye,
Zhenyuan Gao,
Binjie Che,
Zhenglin Chen,
Dongmei Yu
Abstract:
Augmented reality becomes popular in education gradually, which provides a contextual and adaptive learning experience. Here, we develop a Chinese herb medicine AR platform based the 3dsMax and the Unity that allows users to visualize and interact with the herb model and learn the related information. The users use their mobile camera to scan the 2D herb picture to trigger the presentation of 3D A…
▽ More
Augmented reality becomes popular in education gradually, which provides a contextual and adaptive learning experience. Here, we develop a Chinese herb medicine AR platform based the 3dsMax and the Unity that allows users to visualize and interact with the herb model and learn the related information. The users use their mobile camera to scan the 2D herb picture to trigger the presentation of 3D AR model and corresponding text information on the screen in real-time. The system shows good performance and has high accuracy for the identification of herbal medicine after interference test and occlusion test. Users can interact with the herb AR model by rotating, scaling, and viewing transformation, which effectively enhances learners' interest in Chinese herb medicine.
△ Less
Submitted 25 September, 2023;
originally announced September 2023.
-
A Hierarchy-based Analysis Approach for Blended Learning: A Case Study with Chinese Students
Authors:
Yu Ye,
Gongjin Zhang,
Hongbiao Si,
Liang Xu,
Shenghua Hu,
Yong Li,
Xulong Zhang,
Kaiyu Hu,
Fangzhou Ye
Abstract:
Blended learning is generally defined as the combination of traditional face-to-face learning and online learning. This learning mode has been widely used in advanced education across the globe due to the COVID-19 pandemic's social distance restriction as well as the development of technology. Online learning plays an important role in blended learning, and as it requires more student autonomy, th…
▽ More
Blended learning is generally defined as the combination of traditional face-to-face learning and online learning. This learning mode has been widely used in advanced education across the globe due to the COVID-19 pandemic's social distance restriction as well as the development of technology. Online learning plays an important role in blended learning, and as it requires more student autonomy, the quality of blended learning in advanced education has been a persistent concern. Existing literature offers several elements and frameworks regarding evaluating the quality of blended learning. However, most of them either have different favours for evaluation perspectives or simply offer general guidance for evaluation, reducing the completeness, objectivity and practicalness of related works. In order to carry out a more intuitive and comprehensive evaluation framework, this paper proposes a hierarchy-based analysis approach. Applying gradient boosting model and feature importance evaluation method, this approach mainly analyses student engagement and its three identified dimensions (behavioral engagement, emotional engagement, cognitive engagement) to eliminate some existing stubborn problems when it comes to blended learning evaluation. The results show that cognitive engagement and emotional engagement play a more important role in blended learning evaluation, implying that these two should be considered to improve for better learning as well as teaching quality.
△ Less
Submitted 18 September, 2023;
originally announced September 2023.
-
MPAI-EEV: Standardization Efforts of Artificial Intelligence based End-to-End Video Coding
Authors:
Chuanmin Jia,
Feng Ye,
Fanke Dong,
Kai Lin,
Leonardo Chiariglione,
Siwei Ma,
Huifang Sun,
Wen Gao
Abstract:
The rapid advancement of artificial intelligence (AI) technology has led to the prioritization of standardizing the processing, coding, and transmission of video using neural networks. To address this priority area, the Moving Picture, Audio, and Data Coding by Artificial Intelligence (MPAI) group is developing a suite of standards called MPAI-EEV for "end-to-end optimized neural video coding." Th…
▽ More
The rapid advancement of artificial intelligence (AI) technology has led to the prioritization of standardizing the processing, coding, and transmission of video using neural networks. To address this priority area, the Moving Picture, Audio, and Data Coding by Artificial Intelligence (MPAI) group is developing a suite of standards called MPAI-EEV for "end-to-end optimized neural video coding." The aim of this AI-based video standard project is to compress the number of bits required to represent high-fidelity video data by utilizing data-trained neural coding technologies. This approach is not constrained by how data coding has traditionally been applied in the context of a hybrid framework. This paper presents an overview of recent and ongoing standardization efforts in this area and highlights the key technologies and design philosophy of EEV. It also provides a comparison and report on some primary efforts such as the coding efficiency of the reference model. Additionally, it discusses emerging activities such as learned Unmanned-Aerial-Vehicles (UAVs) video coding which are currently planned, under development, or in the exploration phase. With a focus on UAV video signals, this paper addresses the current status of these preliminary efforts. It also indicates development timelines, summarizes the main technical details, and provides pointers to further points of reference. The exploration experiment shows that the EEV model performs better than the state-of-the-art video coding standard H.266/VVC in terms of perceptual evaluation metric.
△ Less
Submitted 14 September, 2023;
originally announced September 2023.
-
Dual-Balancing for Multi-Task Learning
Authors:
Baijiong Lin,
Weisen Jiang,
Feiyang Ye,
Yu Zhang,
Pengguang Chen,
Ying-Cong Chen,
Shu Liu,
James T. Kwok
Abstract:
Multi-task learning (MTL), a learning paradigm to learn multiple related tasks simultaneously, has achieved great success in various fields. However, task balancing problem remains a significant challenge in MTL, with the disparity in loss/gradient scales often leading to performance compromises. In this paper, we propose a Dual-Balancing Multi-Task Learning (DB-MTL) method to alleviate the task b…
▽ More
Multi-task learning (MTL), a learning paradigm to learn multiple related tasks simultaneously, has achieved great success in various fields. However, task balancing problem remains a significant challenge in MTL, with the disparity in loss/gradient scales often leading to performance compromises. In this paper, we propose a Dual-Balancing Multi-Task Learning (DB-MTL) method to alleviate the task balancing problem from both loss and gradient perspectives. Specifically, DB-MTL ensures loss-scale balancing by performing a logarithm transformation on each task loss, and guarantees gradient-magnitude balancing via normalizing all task gradients to the same magnitude as the maximum gradient norm. Extensive experiments conducted on several benchmark datasets consistently demonstrate the state-of-the-art performance of DB-MTL.
△ Less
Submitted 29 September, 2023; v1 submitted 23 August, 2023;
originally announced August 2023.
-
AltDiffusion: A Multilingual Text-to-Image Diffusion Model
Authors:
Fulong Ye,
Guang Liu,
Xinya Wu,
Ledell Wu
Abstract:
Large Text-to-Image(T2I) diffusion models have shown a remarkable capability to produce photorealistic and diverse images based on text inputs. However, existing works only support limited language input, e.g., English, Chinese, and Japanese, leaving users beyond these languages underserved and blocking the global expansion of T2I models. Therefore, this paper presents AltDiffusion, a novel multil…
▽ More
Large Text-to-Image(T2I) diffusion models have shown a remarkable capability to produce photorealistic and diverse images based on text inputs. However, existing works only support limited language input, e.g., English, Chinese, and Japanese, leaving users beyond these languages underserved and blocking the global expansion of T2I models. Therefore, this paper presents AltDiffusion, a novel multilingual T2I diffusion model that supports eighteen different languages. Specifically, we first train a multilingual text encoder based on the knowledge distillation. Then we plug it into a pretrained English-only diffusion model and train the model with a two-stage schema to enhance the multilingual capability, including concept alignment and quality improvement stage on a large-scale multilingual dataset. Furthermore, we introduce a new benchmark, which includes Multilingual-General-18(MG-18) and Multilingual-Cultural-18(MC-18) datasets, to evaluate the capabilities of T2I diffusion models for generating high-quality images and capturing culture-specific concepts in different languages. Experimental results on both MG-18 and MC-18 demonstrate that AltDiffusion outperforms current state-of-the-art T2I models, e.g., Stable Diffusion in multilingual understanding, especially with respect to culture-specific concepts, while still having comparable capability for generating high-quality images. All source code and checkpoints could be found in https://github.com/superhero-7/AltDiffuson.
△ Less
Submitted 23 August, 2023; v1 submitted 19 August, 2023;
originally announced August 2023.
-
Whether you can locate or not? Interactive Referring Expression Generation
Authors:
Fulong Ye,
Yuxing Long,
Fangxiang Feng,
Xiaojie Wang
Abstract:
Referring Expression Generation (REG) aims to generate unambiguous Referring Expressions (REs) for objects in a visual scene, with a dual task of Referring Expression Comprehension (REC) to locate the referred object. Existing methods construct REG models independently by using only the REs as ground truth for model training, without considering the potential interaction between REG and REC models…
▽ More
Referring Expression Generation (REG) aims to generate unambiguous Referring Expressions (REs) for objects in a visual scene, with a dual task of Referring Expression Comprehension (REC) to locate the referred object. Existing methods construct REG models independently by using only the REs as ground truth for model training, without considering the potential interaction between REG and REC models. In this paper, we propose an Interactive REG (IREG) model that can interact with a real REC model, utilizing signals indicating whether the object is located and the visual region located by the REC model to gradually modify REs. Our experimental results on three RE benchmark datasets, RefCOCO, RefCOCO+, and RefCOCOg show that IREG outperforms previous state-of-the-art methods on popular evaluation metrics. Furthermore, a human evaluation shows that IREG generates better REs with the capability of interaction.
△ Less
Submitted 19 August, 2023;
originally announced August 2023.
-
Pseudo-Bag Mixup Augmentation for Multiple Instance Learning-Based Whole Slide Image Classification
Authors:
Pei Liu,
Luping Ji,
Xinyu Zhang,
Feng Ye
Abstract:
Given the special situation of modeling gigapixel images, multiple instance learning (MIL) has become one of the most important frameworks for Whole Slide Image (WSI) classification. In current practice, most MIL networks often face two unavoidable problems in training: i) insufficient WSI data and ii) the sample memorization inclination inherent in neural networks. These problems may hinder MIL m…
▽ More
Given the special situation of modeling gigapixel images, multiple instance learning (MIL) has become one of the most important frameworks for Whole Slide Image (WSI) classification. In current practice, most MIL networks often face two unavoidable problems in training: i) insufficient WSI data and ii) the sample memorization inclination inherent in neural networks. These problems may hinder MIL models from adequate and efficient training, suppressing the continuous performance promotion of classification models on WSIs. Inspired by the basic idea of Mixup, this paper proposes a new Pseudo-bag Mixup (PseMix) data augmentation scheme to improve the training of MIL models. This scheme generalizes the Mixup strategy for general images to special WSIs via pseudo-bags so as to be applied in MIL-based WSI classification. Cooperated by pseudo-bags, our PseMix fulfills the critical size alignment and semantic alignment in Mixup strategy. Moreover, it is designed as an efficient and decoupled method, neither involving time-consuming operations nor relying on MIL model predictions. Comparative experiments and ablation studies are specially designed to evaluate the effectiveness and advantages of our PseMix. Experimental results show that PseMix could often assist state-of-the-art MIL networks to refresh their classification performance on WSIs. Besides, it could also boost the generalization performance of MIL models in special test scenarios, and promote their robustness to patch occlusion and label noise. Our source code is available at https://github.com/liupei101/PseMix.
△ Less
Submitted 2 November, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
MA-BBOB: Many-Affine Combinations of BBOB Functions for Evaluating AutoML Approaches in Noiseless Numerical Black-Box Optimization Contexts
Authors:
Diederick Vermetten,
Furong Ye,
Thomas Bäck,
Carola Doerr
Abstract:
Extending a recent suggestion to generate new instances for numerical black-box optimization benchmarking by interpolating pairs of the well-established BBOB functions from the COmparing COntinuous Optimizers (COCO) platform, we propose in this work a further generalization that allows multiple affine combinations of the original instances and arbitrarily chosen locations of the global optima. We…
▽ More
Extending a recent suggestion to generate new instances for numerical black-box optimization benchmarking by interpolating pairs of the well-established BBOB functions from the COmparing COntinuous Optimizers (COCO) platform, we propose in this work a further generalization that allows multiple affine combinations of the original instances and arbitrarily chosen locations of the global optima. We demonstrate that the MA-BBOB generator can help fill the instance space, while overall patterns in algorithm performance are preserved. By combining the landscape features of the problems with the performance data, we pose the question of whether these features are as useful for algorithm selection as previous studies suggested. MA-BBOB is built on the publicly available IOHprofiler platform, which facilitates standardized experimentation routines, provides access to the interactive IOHanalyzer module for performance analysis and visualization, and enables comparisons with the rich and growing data collection available for the (MA-)BBOB functions.
△ Less
Submitted 18 June, 2023;
originally announced June 2023.
-
Modeling User Satisfaction Dynamics in Dialogue via Hawkes Process
Authors:
Fanghua Ye,
Zhiyuan Hu,
Emine Yilmaz
Abstract:
Dialogue systems have received increasing attention while automatically evaluating their performance remains challenging. User satisfaction estimation (USE) has been proposed as an alternative. It assumes that the performance of a dialogue system can be measured by user satisfaction and uses an estimator to simulate users. The effectiveness of USE depends heavily on the estimator. Existing estimat…
▽ More
Dialogue systems have received increasing attention while automatically evaluating their performance remains challenging. User satisfaction estimation (USE) has been proposed as an alternative. It assumes that the performance of a dialogue system can be measured by user satisfaction and uses an estimator to simulate users. The effectiveness of USE depends heavily on the estimator. Existing estimators independently predict user satisfaction at each turn and ignore satisfaction dynamics across turns within a dialogue. In order to fully simulate users, it is crucial to take satisfaction dynamics into account. To fill this gap, we propose a new estimator ASAP (sAtisfaction eStimation via HAwkes Process) that treats user satisfaction across turns as an event sequence and employs a Hawkes process to effectively model the dynamics in this sequence. Experimental results on four benchmark dialogue datasets demonstrate that ASAP can substantially outperform state-of-the-art baseline estimators.
△ Less
Submitted 21 May, 2023;
originally announced May 2023.
-
When to be Discrete: Analyzing Algorithm Performance on Discretized Continuous Problems
Authors:
André Thomaser,
Jacob de Nobel,
Diederick Vermetten,
Furong Ye,
Thomas Bäck,
Anna V. Kononova
Abstract:
The domain of an optimization problem is seen as one of its most important characteristics. In particular, the distinction between continuous and discrete optimization is rather impactful. Based on this, the optimizing algorithm, analyzing method, and more are specified. However, in practice, no problem is ever truly continuous. Whether this is caused by computing limits or more tangible propertie…
▽ More
The domain of an optimization problem is seen as one of its most important characteristics. In particular, the distinction between continuous and discrete optimization is rather impactful. Based on this, the optimizing algorithm, analyzing method, and more are specified. However, in practice, no problem is ever truly continuous. Whether this is caused by computing limits or more tangible properties of the problem, most variables have a finite resolution.
In this work, we use the notion of the resolution of continuous variables to discretize problems from the continuous domain. We explore how the resolution impacts the performance of continuous optimization algorithms. Through a mapping to integer space, we are able to compare these continuous optimizers to discrete algorithms on the exact same problems. We show that the standard $(μ_W, λ)$-CMA-ES fails when discretization is added to the problem.
△ Less
Submitted 25 April, 2023;
originally announced April 2023.
-
Learning Harmonic Molecular Representations on Riemannian Manifold
Authors:
Yiqun Wang,
Yuning Shen,
Shi Chen,
Lihao Wang,
Fei Ye,
Hao Zhou
Abstract:
Molecular representation learning plays a crucial role in AI-assisted drug discovery research. Encoding 3D molecular structures through Euclidean neural networks has become the prevailing method in the geometric deep learning community. However, the equivariance constraints and message passing in Euclidean space may limit the network expressive power. In this work, we propose a Harmonic Molecular…
▽ More
Molecular representation learning plays a crucial role in AI-assisted drug discovery research. Encoding 3D molecular structures through Euclidean neural networks has become the prevailing method in the geometric deep learning community. However, the equivariance constraints and message passing in Euclidean space may limit the network expressive power. In this work, we propose a Harmonic Molecular Representation learning (HMR) framework, which represents a molecule using the Laplace-Beltrami eigenfunctions of its molecular surface. HMR offers a multi-resolution representation of molecular geometric and chemical features on 2D Riemannian manifold. We also introduce a harmonic message passing method to realize efficient spectral message passing over the surface manifold for better molecular encoding. Our proposed method shows comparable predictive power to current models in small molecule property prediction, and outperforms the state-of-the-art deep learning models for ligand-binding protein pocket classification and the rigid protein docking challenge, demonstrating its versatility in molecular representation learning.
△ Less
Submitted 27 March, 2023;
originally announced March 2023.
-
Towards Self-adaptive Mutation in Evolutionary Multi-Objective Algorithms
Authors:
Furong Ye,
Frank Neumann,
Jacob de Nobel,
Aneta Neumann,
Thomas Bäck
Abstract:
Parameter control has succeeded in accelerating the convergence process of evolutionary algorithms. While empirical and theoretical studies have shed light on the behavior of algorithms for single-objective optimization, little is known about how self-adaptation influences multi-objective evolutionary algorithms. In this work, we contribute (1) extensive experimental analysis of the Global Simple…
▽ More
Parameter control has succeeded in accelerating the convergence process of evolutionary algorithms. While empirical and theoretical studies have shed light on the behavior of algorithms for single-objective optimization, little is known about how self-adaptation influences multi-objective evolutionary algorithms. In this work, we contribute (1) extensive experimental analysis of the Global Simple Evolutionary Multi-objective Algorithm (GSEMO) variants on classic problems, such as OneMinMax, LOTZ, COCZ, and (2) a novel version of GSEMO with self-adaptive mutation.
To enable self-adaptation in GSEMO, we explore three self-adaptive mutation techniques from single-objective optimization and use various performance metrics, such as hypervolume and inverted generational distance, to guide the adaptation. Our experiments show that adapting the mutation rate based on single-objective optimization and hypervolume can speed up the convergence of GSEMO. Moreover, we propose a GSEMO with self-adaptive mutation, which considers optimizing for single objectives and adjusts the mutation rate for each solution individually. Our results demonstrate that the proposed method outperforms the GSEMO with static mutation rates across all the tested problems.
This work provides a comprehensive benchmarking study for MOEAs and complements existing theoretical runtime analysis. Our proposed algorithm addresses interesting issues for designing MOEAs for future practical applications.
△ Less
Submitted 8 May, 2023; v1 submitted 8 March, 2023;
originally announced March 2023.
-
Using Affine Combinations of BBOB Problems for Performance Assessment
Authors:
Diederick Vermetten,
Furong Ye,
Carola Doerr
Abstract:
Benchmarking plays a major role in the development and analysis of optimization algorithms. As such, the way in which the used benchmark problems are defined significantly affects the insights that can be gained from any given benchmark study. One way to easily extend the range of available benchmark functions is through affine combinations between pairs of functions. From the perspective of lands…
▽ More
Benchmarking plays a major role in the development and analysis of optimization algorithms. As such, the way in which the used benchmark problems are defined significantly affects the insights that can be gained from any given benchmark study. One way to easily extend the range of available benchmark functions is through affine combinations between pairs of functions. From the perspective of landscape analysis, these function combinations smoothly transition between the two base functions.
In this work, we show how these affine function combinations can be used to analyze the behavior of optimization algorithms. In particular, we highlight that by varying the weighting between the combined problems, we can gain insights into the effects of added global structure on the performance of optimization algorithms. By analyzing performance trajectories on more function combinations, we also show that aspects such as the scaling of objective functions and placement of the optimum can greatly impact how these results are interpreted.
△ Less
Submitted 8 March, 2023;
originally announced March 2023.
-
Structure-informed Language Models Are Protein Designers
Authors:
Zaixiang Zheng,
Yifan Deng,
Dongyu Xue,
Yi Zhou,
Fei YE,
Quanquan Gu
Abstract:
This paper demonstrates that language models are strong structure-based protein designers. We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs), that have learned massive sequential evolutionary knowledge from the universe of natural protein sequences, to acquire an immediate capability to design preferable protein sequences for given folds. We co…
▽ More
This paper demonstrates that language models are strong structure-based protein designers. We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs), that have learned massive sequential evolutionary knowledge from the universe of natural protein sequences, to acquire an immediate capability to design preferable protein sequences for given folds. We conduct a structural surgery on pLMs, where a lightweight structural adapter is implanted into pLMs and endows it with structural awareness. During inference, iterative refinement is performed to effectively optimize the generated protein sequences. Experiments show that LM-Design improves the state-of-the-art results by a large margin, leading to up to 4% to 12% accuracy gains in sequence recovery (e.g., 55.65%/56.63% on CATH 4.2/4.3 single-chain benchmarks, and >60% when designing protein complexes). We provide extensive and in-depth analyses, which verify that LM-Design can (1) indeed leverage both structural and sequential knowledge to accurately handle structurally non-deterministic regions, (2) benefit from scaling data and model size, and (3) generalize to other proteins (e.g., antibodies and de novo proteins)
△ Less
Submitted 9 February, 2023; v1 submitted 3 February, 2023;
originally announced February 2023.
-
Benchmarking Algorithms for Submodular Optimization Problems Using IOHProfiler
Authors:
Frank Neumann,
Aneta Neumann,
Chao Qian,
Viet Anh Do,
Jacob de Nobel,
Diederick Vermetten,
Saba Sadeghi Ahouei,
Furong Ye,
Hao Wang,
Thomas Bäck
Abstract:
Submodular functions play a key role in the area of optimization as they allow to model many real-world problems that face diminishing returns. Evolutionary algorithms have been shown to obtain strong theoretical performance guarantees for a wide class of submodular problems under various types of constraints while clearly outperforming standard greedy approximation algorithms. This paper introduc…
▽ More
Submodular functions play a key role in the area of optimization as they allow to model many real-world problems that face diminishing returns. Evolutionary algorithms have been shown to obtain strong theoretical performance guarantees for a wide class of submodular problems under various types of constraints while clearly outperforming standard greedy approximation algorithms. This paper introduces a setup for benchmarking algorithms for submodular optimization problems with the aim to provide researchers with a framework to enhance and compare the performance of new algorithms for submodular problems. The focus is on the development of iterative search algorithms such as evolutionary algorithms with the implementation provided and integrated into IOHprofiler which allows for tracking and comparing the progress and performance of iterative search algorithms. We present a range of submodular optimization problems that have been integrated into IOHprofiler and show how the setup can be used for analyzing and comparing iterative search algorithms in various settings.
△ Less
Submitted 2 February, 2023;
originally announced February 2023.
-
On Pre-trained Language Models for Antibody
Authors:
Danqing Wang,
Fei Ye,
Hao Zhou
Abstract:
Antibodies are vital proteins offering robust protection for the human body from pathogens. The development of general protein and antibody-specific pre-trained language models both facilitate antibody prediction tasks. However, there have been limited studies that comprehensively explore the representation capability of distinct pre-trained language models on different antibody tasks. To investig…
▽ More
Antibodies are vital proteins offering robust protection for the human body from pathogens. The development of general protein and antibody-specific pre-trained language models both facilitate antibody prediction tasks. However, there have been limited studies that comprehensively explore the representation capability of distinct pre-trained language models on different antibody tasks. To investigate the problem, we aim to answer several key questions in this paper, such as how pre-trained language models perform in antibody tasks with different specificity and how introducing specific biological mechanisms to the pre-training process can benefit the model. Additionally, we evaluate if the learned antibody pre-trained representations can be applied to real-world antibody problems, like drug discovery and immune process understanding. Previously, no benchmark available largely hindered the study to answer these questions. To aid in our investigation, we provide an AnTibody Understanding Evaluation (ATUE) benchmark. We comprehensively evaluate the performance of protein pre-trained language models by empirical study along with conclusions and new insights. Our ATUE and code are released at https://github.com/dqwang122/EATLM.
△ Less
Submitted 1 March, 2023; v1 submitted 28 January, 2023;
originally announced January 2023.
-
Learning to Compress Unmanned Aerial Vehicle (UAV) Captured Video: Benchmark and Analysis
Authors:
Chuanmin Jia,
Feng Ye,
Huifang Sun,
Siwei Ma,
Wen Gao
Abstract:
During the past decade, the Unmanned-Aerial-Vehicles (UAVs) have attracted increasing attention due to their flexible, extensive, and dynamic space-sensing capabilities. The volume of video captured by UAVs is exponentially growing along with the increased bitrate generated by the advancement of the sensors mounted on UAVs, bringing new challenges for on-device UAV storage and air-ground data tran…
▽ More
During the past decade, the Unmanned-Aerial-Vehicles (UAVs) have attracted increasing attention due to their flexible, extensive, and dynamic space-sensing capabilities. The volume of video captured by UAVs is exponentially growing along with the increased bitrate generated by the advancement of the sensors mounted on UAVs, bringing new challenges for on-device UAV storage and air-ground data transmission. Most existing video compression schemes were designed for natural scenes without consideration of specific texture and view characteristics of UAV videos. In this work, we first contribute a detailed analysis of the current state of the field of UAV video coding. Then we propose to establish a novel task for learned UAV video coding and construct a comprehensive and systematic benchmark for such a task, present a thorough review of high quality UAV video datasets and benchmarks, and contribute extensive rate-distortion efficiency comparison of learned and conventional codecs after. Finally, we discuss the challenges of encoding UAV videos. It is expected that the benchmark will accelerate the research and development in video coding on drone platforms.
△ Less
Submitted 15 January, 2023;
originally announced January 2023.
-
SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph
Authors:
Yuxing Long,
Binyuan Hui,
Fulong Ye,
Yanyang Li,
Zhuoxin Han,
Caixia Yuan,
Yongbin Li,
Xiaojie Wang
Abstract:
Existing multimodal conversation agents have shown impressive abilities to locate absolute positions or retrieve attributes in simple scenarios, but they fail to perform well when complex relative positions and information alignments are involved, which poses a bottleneck in response quality. In this paper, we propose a Situated Conversation Agent Petrained with Multimodal Questions from INcrement…
▽ More
Existing multimodal conversation agents have shown impressive abilities to locate absolute positions or retrieve attributes in simple scenarios, but they fail to perform well when complex relative positions and information alignments are involved, which poses a bottleneck in response quality. In this paper, we propose a Situated Conversation Agent Petrained with Multimodal Questions from INcremental Layout Graph (SPRING) with abilities of reasoning multi-hops spatial relations and connecting them with visual attributes in crowded situated scenarios. Specifically, we design two types of Multimodal Question Answering (MQA) tasks to pretrain the agent. All QA pairs utilized during pretraining are generated from novel Incremental Layout Graphs (ILG). QA pair difficulty labels automatically annotated by ILG are used to promote MQA-based Curriculum Learning. Experimental results verify the SPRING's effectiveness, showing that it significantly outperforms state-of-the-art approaches on both SIMMC 1.0 and SIMMC 2.0 datasets.
△ Less
Submitted 5 January, 2023;
originally announced January 2023.
-
Accelerating Antimicrobial Peptide Discovery with Latent Structure
Authors:
Danqing Wang,
Zeyu Wen,
Fei Ye,
Lei Li,
Hao Zhou
Abstract:
Antimicrobial peptides (AMPs) are promising therapeutic approaches against drug-resistant pathogens. Recently, deep generative models are used to discover new AMPs. However, previous studies mainly focus on peptide sequence attributes and do not consider crucial structure information. In this paper, we propose a latent sequence-structure model for designing AMPs (LSSAMP). LSSAMP exploits multi-sca…
▽ More
Antimicrobial peptides (AMPs) are promising therapeutic approaches against drug-resistant pathogens. Recently, deep generative models are used to discover new AMPs. However, previous studies mainly focus on peptide sequence attributes and do not consider crucial structure information. In this paper, we propose a latent sequence-structure model for designing AMPs (LSSAMP). LSSAMP exploits multi-scale vector quantization in the latent space to represent secondary structures (e.g. alpha helix and beta sheet). By sampling in the latent space, LSSAMP can simultaneously generate peptides with ideal sequence attributes and secondary structures. Experimental results show that the peptides generated by LSSAMP have a high probability of antimicrobial activity. Our wet laboratory experiments verified that two of the 21 candidates exhibit strong antimicrobial activity. The code is released at https://github.com/dqwang122/LSSAMP.
△ Less
Submitted 20 August, 2023; v1 submitted 28 November, 2022;
originally announced December 2022.
-
AdvMIL: Adversarial Multiple Instance Learning for the Survival Analysis on Whole-Slide Images
Authors:
Pei Liu,
Luping Ji,
Feng Ye,
Bo Fu
Abstract:
The survival analysis on histological whole-slide images (WSIs) is one of the most important means to estimate patient prognosis. Although many weakly-supervised deep learning models have been developed for gigapixel WSIs, their potential is generally restricted by classical survival analysis rules and fully-supervised learning requirements. As a result, these models provide patients only with a c…
▽ More
The survival analysis on histological whole-slide images (WSIs) is one of the most important means to estimate patient prognosis. Although many weakly-supervised deep learning models have been developed for gigapixel WSIs, their potential is generally restricted by classical survival analysis rules and fully-supervised learning requirements. As a result, these models provide patients only with a completely-certain point estimation of time-to-event, and they could only learn from the labeled WSI data currently at a small scale. To tackle these problems, we propose a novel adversarial multiple instance learning (AdvMIL) framework. This framework is based on adversarial time-to-event modeling, and integrates the multiple instance learning (MIL) that is much necessary for WSI representation learning. It is a plug-and-play one, so that most existing MIL-based end-to-end methods can be easily upgraded by applying this framework, gaining the improved abilities of survival distribution estimation and semi-supervised learning. Our extensive experiments show that AdvMIL not only could often bring performance improvement to mainstream WSI survival analysis methods at a relatively low computational cost, but also enables these methods to effectively utilize unlabeled data via semi-supervised learning. Moreover, it is observed that AdvMIL could help improving the robustness of models against patch occlusion and two representative image noises. The proposed AdvMIL framework could promote the research of survival analysis in computational pathology with its novel adversarial MIL paradigm.
△ Less
Submitted 5 April, 2023; v1 submitted 13 December, 2022;
originally announced December 2022.