Showing 1–50 of 382 results for author: Zheng, K

  1. arXiv:2407.09024  [pdf, other]

    cs.LG

    Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control

    Authors: Huayu Chen, Kaiwen Zheng, Hang Su, Jun Zhu

    Abstract: Drawing upon recent advances in language model alignment, we formulate offline Reinforcement Learning as a two-stage optimization problem: First pretraining expressive generative policies on reward-free behavior datasets, then fine-tuning these policies to align with task-specific annotations like Q-values. This strategy allows us to leverage abundant and diverse behavior data to enhance generaliz…

    Submitted 12 July, 2024; originally announced July 2024.

  2. arXiv:2406.15735  [pdf, other]

    cs.CV cs.AI

    Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model

    Authors: Min Zhao, Hongzhou Zhu, Chendong Xiang, Kaiwen Zheng, Chongxuan Li, Jun Zhu

    Abstract: Diffusion models have obtained substantial progress in image-to-video (I2V) generation. However, such models are not fully understood. In this paper, we report a significant but previously overlooked issue in I2V diffusion models (I2V-DMs), namely, conditional image leakage. I2V-DMs tend to over-rely on the conditional image at large time steps, neglecting the crucial task of predicting the clean…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Project page: https://cond-image-leak.github.io/

  3. arXiv:2406.14015  [pdf, other]

    cs.LG

    CohortNet: Empowering Cohort Discovery for Interpretable Healthcare Analytics

    Authors: Qingpeng Cai, Kaiping Zheng, H. V. Jagadish, Beng Chin Ooi, James Yip

    Abstract: Cohort studies are of significant importance in the field of healthcare analysis. However, existing methods typically involve manual, labor-intensive, and expert-driven pattern definitions or rely on simplistic clustering techniques that lack medical relevance. Automating cohort studies with interpretable patterns has great potential to facilitate healthcare analysis but remains an unmet need in p…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 10 pages, 12 figures

  4. arXiv:2406.12707  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

    Authors: Haoqiu Yan, Yongxin Zhu, Kai Zheng, Bing Liu, Haoyu Cao, Deqiang Jiang, Linli Xu

    Abstract: Large Language Model (LLM)-enhanced agents become increasingly prevalent in Human-AI communication, offering vast potential from entertainment to professional domains. However, current multi-modal dialogue systems overlook the acoustic information present in speech, which is crucial for understanding human communication nuances. This oversight can lead to misinterpretations of speakers' intentions…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 9 pages, 3 figures, ACL24 accepted

  5. arXiv:2406.11357  [pdf, other]

    cs.CL cs.AI

    Refiner: Restructure Retrieval Content Efficiently to Advance Question-Answering Capabilities

    Authors: Zhonghao Li, Xuming Hu, Aiwei Liu, Kening Zheng, Sirui Huang, Hui Xiong

    Abstract: Large Language Models (LLMs) are limited by their parametric knowledge, leading to hallucinations in knowledge-extensive tasks. To address this, Retrieval-Augmented Generation (RAG) incorporates external document chunks to expand LLM knowledge. Furthermore, compressing information from document chunks through extraction or summarization can improve LLM performance. Nonetheless, LLMs still struggle…

    Submitted 17 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 8 pages

  6. arXiv:2406.09305  [pdf, other]

    cs.CV

    Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

    Authors: Yufan Zhou, Ruiyi Zhang, Kaizhi Zheng, Nanxuan Zhao, Jiuxiang Gu, Zichao Wang, Xin Eric Wang, Tong Sun

    Abstract: In subject-driven text-to-image generation, recent works have achieved superior performance by training the model on synthetic datasets containing numerous image pairs. Trained on these datasets, generative models can produce text-aligned images for a specific subject from an arbitrary test image in a zero-shot manner. They even outperform methods which require additional fine-tuning on testing imag…

    Submitted 13 June, 2024; originally announced June 2024.

  7. arXiv:2406.08407  [pdf, other]

    cs.CV cs.AI cs.CL

    MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

    Authors: Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang

    Abstract: Multimodal Large Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate rich representations of real-world dynamics and causalities. To this end, we introduce MMWorld, a new benchmark for multi-discipline, multi-faceted multi…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  8. arXiv:2406.07146  [pdf, other]

    cs.CV cs.AI

    Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images

    Authors: Che Liu, Zhongwei Wan, Yuqi Wang, Hui Shen, Haozhe Wang, Kangyu Zheng, Mi Zhang, Rossella Arcucci

    Abstract: Automatic radiology report generation can significantly benefit the labor-intensive process of report writing by radiologists, especially for 3D radiographs like CT scans, which are crucial for broad clinical diagnostics yet underexplored compared to 2D radiographs. Existing methods often handle 3D volumes either slice-wise or with aggressive downsampling due to current GPU memory limitations, whi…

    Submitted 12 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  9. arXiv:2406.06776  [pdf, other]

    cs.CV cs.LG

    SeeFar: Satellite Agnostic Multi-Resolution Dataset for Geospatial Foundation Models

    Authors: James Lowman, Kelly Liu Zheng, Roydon Fraser, Jesse Van Griensven The, Mojtaba Valipour

    Abstract: SeeFar is an evolving collection of multi-resolution satellite images from public and commercial satellites. We specifically curated this dataset for training geospatial foundation models, unconstrained by satellite type. In recent years, advances in technology have made satellite imagery more accessible than ever. More earth-observing satellites have been launched in the last five years than in t…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Work in Progress!

  10. arXiv:2406.03403  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Structure-based Drug Design Benchmark: Do 3D Methods Really Dominate?

    Authors: Kangyu Zheng, Yingzhou Lu, Zaixi Zhang, Zhongwei Wan, Yao Ma, Marinka Zitnik, Tianfan Fu

    Abstract: Currently, the field of structure-based drug design is dominated by three main types of algorithms: search-based algorithms, deep generative models, and reinforcement learning. While existing works have typically focused on comparing models within a single algorithmic category, cross-algorithm comparisons remain scarce. In this paper, to fill the gap, we establish a benchmark to evaluate the perfo…

    Submitted 4 June, 2024; originally announced June 2024.

  11. arXiv:2406.01844  [pdf, other]

    astro-ph.IM

    The Simons Observatory: Studies of Detector Yield and Readout Noise From the First Large-Scale Deployment of Microwave Multiplexing at the Large Aperture Telescope

    Authors: Thomas P. Satterthwaite, Zeeshan Ahmed, Kyuyoung Bae, Mark Devlin, Simon Dicker, Shannon M. Duff, Daniel Dutcher, Saianeesh K. Haridas, Shawn W. Henderson, Johannes Hubmayr, Bradley R. Johnson, Anna Kofman, Jack Lashner, Michael J. Link, Tammy J. Lucas, Alex Manduca, Michael D. Niemack, John Orlowski-Scherer, Tristan Pinsonneault-Marotte, Max Silva-Feaver, Suzanne Staggs, Eve M. Vavagiakis, Yuhan Wang, Kaiwen Zheng

    Abstract: The Simons Observatory is a new ground-based cosmic microwave background experiment, which is currently being commissioned in Chile's Atacama Desert. During its survey, the observatory's small aperture telescopes will map 10% of the sky in bands centered at frequencies ranging from 27 to 280 GHz to constrain cosmic inflation models, and its large aperture telescope will map 40% of the sky in the s…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figures, 1 table. To be presented at SPIE Astronomical Telescopes + Instrumentation 2024

  12. arXiv:2405.18756  [pdf, other]

    cs.LG cs.AI cs.CV stat.AP stat.ML

    Provable Contrastive Continual Learning

    Authors: Yichen Wen, Zhiquan Tan, Kaipeng Zheng, Chuanlong Xie, Weiran Huang

    Abstract: Continual learning requires learning incremental tasks with dynamic data distributions. So far, it has been observed that employing a combination of contrastive loss and distillation loss for training in continual learning yields strong performance. To the best of our knowledge, however, this contrastive continual learning framework lacks convincing theoretical explanations. In this work, we fill…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  13. arXiv:2405.16928  [pdf]

    cs.SI cs.GT

    TopoLa: a novel embedding framework for understanding complex networks

    Authors: Kai Zheng, Qilong Feng, Yaohang Li, Qichang Zhao, Jinhui Xu, Jianxin Wang

    Abstract: Complex networks, which are the abstractions of many real-world systems, present a persistent challenge across disciplines for people to decipher their underlying information. Recently, hyperbolic geometry of latent spaces has gained traction in network analysis, due to its ability to preserve certain local intrinsic properties of the nodes. In this study, we explore the problem from a much broade…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 85 pages, 17 figures

  14. arXiv:2405.15885  [pdf, other]

    cs.LG stat.ML

    Diffusion Bridge Implicit Models

    Authors: Kaiwen Zheng, Guande He, Jianfei Chen, Fan Bao, Jun Zhu

    Abstract: Denoising diffusion bridge models (DDBMs) are a powerful variant of diffusion models for interpolating between two arbitrary paired distributions given as endpoints. Despite their promising performance in tasks like image translation, DDBMs require a computationally intensive sampling process that involves the simulation of a (stochastic) differential equation through hundreds of network evaluatio…

    Submitted 24 May, 2024; originally announced May 2024.

  15. arXiv:2405.15325  [pdf, other]

    cs.LG stat.ML

    On the Identification of Temporally Causal Representation with Instantaneous Dependence

    Authors: Zijian Li, Yifan Shen, Kaitao Zheng, Ruichu Cai, Xiangchen Song, Mingming Gong, Zhengmao Zhu, Guangyi Chen, Kun Zhang

    Abstract: Temporally causal representation learning aims to identify the latent causal process from time series observations, but most methods require the assumption that the latent causal processes do not have instantaneous relations. Although some recent methods achieve identifiability in the instantaneous causality case, they require either interventions on the latent variables or grouping of the observa…

    Submitted 7 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  16. arXiv:2405.10951  [pdf, other]

    cs.CV cs.LG

    Block Selective Reprogramming for On-device Training of Vision Transformers

    Authors: Sreetama Sarkar, Souvik Kundu, Kai Zheng, Peter A. Beerel

    Abstract: The ubiquity of vision transformers (ViTs) for various edge applications, including personalized learning, has created the demand for on-device fine-tuning. However, training with the limited memory and computation power of edge devices remains a significant challenge. In particular, the memory required for training is much higher than that needed for inference, primarily due to the need to store…

    Submitted 25 March, 2024; originally announced May 2024.

  17. arXiv:2405.06868  [pdf, other]

    astro-ph.IM astro-ph.CO

    Simons Observatory: Pre-deployment Performance of a Large Aperture Telescope Optics Tube in the 90 and 150 GHz Spectral Bands

    Authors: Carlos E. Sierra, Kathleen Harrington, Shreya Sutariya, Thomas Alford, Anna M. Kofman, Grace E. Chesmore, Jason E. Austermann, Andrew Bazarko, James A. Beall, Tanay Bhandarkar, Mark J. Devlin, Simon R. Dicker, Peter N. Dow, Shannon M. Duff, Daniel Dutcher, Nicholas Galitzki, Joseph E. Golec, John C. Groh, Jon E. Gudmundsson, Saianeesh K. Haridas, Erin Healy, Johannes Hubmayr, Jeffrey Iuliano, Bradley R. Johnson, Claire S. Lessler , et al. (20 additional authors not shown)

    Abstract: The Simons Observatory will map the temperature and polarization over half of the sky, at millimeter wavelengths in six spectral bands from the Atacama Desert in Chile. These data will provide new insights into the genesis, content, and history of our Universe; the astrophysics of galaxies and galaxy clusters; objects in our solar system; and time-varying astrophysical phenomena. This ambitious ne…

    Submitted 10 May, 2024; originally announced May 2024.

  18. arXiv:2405.05550  [pdf, other]

    astro-ph.IM astro-ph.CO

    The Simons Observatory: Design, integration, and testing of the small aperture telescopes

    Authors: Nicholas Galitzki, Tran Tsan, Jake Spisak, Michael Randall, Max Silva-Feaver, Joseph Seibert, Jacob Lashner, Shunsuke Adachi, Sean M. Adkins, Thomas Alford, Kam Arnold, Peter C. Ashton, Jason E. Austermann, Carlo Baccigalupi, Andrew Bazarko, James A. Beall, Sanah Bhimani, Bryce Bixler, Gabriele Coppi, Lance Corbett, Kevin D. Crowley, Kevin T. Crowley, Samuel Day-Weiss, Simon Dicker, Peter N. Dow , et al. (55 additional authors not shown)

    Abstract: The Simons Observatory (SO) is a cosmic microwave background (CMB) survey experiment that includes small-aperture telescopes (SATs) observing from an altitude of 5,200 m in the Atacama Desert in Chile. The SO SATs will cover six spectral bands between 27 and 280 GHz to search for primordial B-modes to a sensitivity of $σ(r)=0.002$, with quantified systematic errors well below this value. Each SAT…

    Submitted 10 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  19. arXiv:2405.04844  [pdf, ps, other]

    cs.IR

    Full Stage Learning to Rank: A Unified Framework for Multi-Stage Systems

    Authors: Kai Zheng, Haijun Zhao, Rui Huang, Beichuan Zhang, Na Mou, Yanan Niu, Yang Song, Hongning Wang, Kun Gai

    Abstract: The Probability Ranking Principle (PRP) has been considered as the foundational standard in the design of information retrieval (IR) systems. The principle requires an IR module's returned list of results to be ranked with respect to the underlying user interests, so as to maximize the results' utility. Nevertheless, we point out that it is inappropriate to indiscriminately apply PRP through eve…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted by WWW 2024

  20. arXiv:2405.04233  [pdf, other]

    cs.CV cs.LG

    Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models

    Authors: Fan Bao, Chendong Xiang, Gang Yue, Guande He, Hongzhou Zhu, Kaiwen Zheng, Min Zhao, Shilong Liu, Yaole Wang, Jun Zhu

    Abstract: We introduce Vidu, a high-performance text-to-video generator that is capable of producing 1080p videos up to 16 seconds in a single generation. Vidu is a diffusion model with U-ViT as its backbone, which unlocks the scalability and the capability for handling long videos. Vidu exhibits strong coherence and dynamism, and is capable of generating both realistic and imaginative videos, as well as un…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Project page at https://www.shengshu-ai.com/vidu

  21. arXiv:2405.03409  [pdf, other]

    cs.LG

    LightTR: A Lightweight Framework for Federated Trajectory Recovery

    Authors: Ziqiao Liu, Hao Miao, Yan Zhao, Chenxi Liu, Kai Zheng, Huan Li

    Abstract: With the proliferation of GPS-equipped edge devices, huge amounts of trajectory data are generated and accumulated in various domains, motivating a variety of urban applications. Due to the limited acquisition capabilities of edge devices, many trajectories are recorded at a low sampling rate, which may reduce the effectiveness of urban applications. We aim to recover a high-sampled trajectory based…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: The paper was accepted by ICDE 2024

  22. arXiv:2404.18055  [pdf, other]

    cond-mat.mtrl-sci cond-mat.mes-hall cond-mat.str-el

    Enhanced torque efficiency in ferromagnetic multilayers by introducing naturally oxidized Cu

    Authors: Kun Zheng, Cuimei Cao, Yingying Lu, Jing Meng, Junpeng Pan, Zhenjie Zhao, Yang Xu, Tian Shang, Qingfeng Zhan

    Abstract: Spin-orbit torque (SOT) in the heavy elements with a large spin-orbit coupling (SOC) has been frequently used to manipulate the magnetic states in spintronic devices. Recent theoretical works have predicted that the surface oxidized light elements with a negligible SOC can yield a sizable orbital torque (OT), which plays an important role in switching the magnetization. Here, we report anomalous-H…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 10 pages, 5 figures, accepted by Appl. Phys. Lett

    Journal ref: Appl. Phys. Lett. 124, 192408 (2024)

  23. arXiv:2404.14999  [pdf, other]

    cs.DB cs.LG

    A Unified Replay-based Continuous Learning Framework for Spatio-Temporal Prediction on Streaming Data

    Authors: Hao Miao, Yan Zhao, Chenjuan Guo, Bin Yang, Kai Zheng, Feiteng Huang, Jiandong Xie, Christian S. Jensen

    Abstract: The widespread deployment of wireless and mobile devices results in a proliferation of spatio-temporal data that is used in applications, e.g., traffic prediction, human mobility mining, and air quality prediction, where spatio-temporal prediction is often essential to enable safety, predictability, or reliability. Many recent proposals that target deep learning for spatio-temporal prediction suff…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by ICDE 2024

  24. arXiv:2404.11450  [pdf, other]

    cs.DB cs.CR

    Real-Time Trajectory Synthesis with Local Differential Privacy

    Authors: Yujia Hu, Yuntao Du, Zhikun Zhang, Ziquan Fang, Lu Chen, Kai Zheng, Yunjun Gao

    Abstract: Trajectory streams are being generated from location-aware devices, such as smartphones and in-vehicle navigation systems. Due to the sensitive nature of the location data, directly sharing user trajectories suffers from privacy leakage issues. Local differential privacy (LDP), which perturbs sensitive data on the user side before it is shared or analyzed, emerges as a promising solution for priva…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by ICDE 2024. Code is available at: https://github.com/ZJU-DAILY/RetraSyn

  25. arXiv:2404.10232  [pdf, other]

    eess.SP

    Channel Estimation for AFDM With Superimposed Pilots

    Authors: Kai Zheng, Miaowen Wen, Tianqi Mao, Lixia Xiao, Zhaocheng Wang

    Abstract: The recently proposed affine frequency division multiplexing (AFDM) employing a multi-chirp waveform has shown its reliability and robustness in doubly selective fading channels. In the existing embedded pilot-aided channel estimation methods, the presence of guard symbols in the discrete affine Fourier transform (DAFT) domain causes inevitable degradation of the spectral efficiency (SE). To improve…

    Submitted 15 April, 2024; originally announced April 2024.

  26. arXiv:2404.09520  [pdf, other]

    cs.IR

    UniSAR: Modeling User Transition Behaviors between Search and Recommendation

    Authors: Teng Shi, Zihua Si, Jun Xu, Xiao Zhang, Xiaoxue Zang, Kai Zheng, Dewei Leng, Yanan Niu, Yang Song

    Abstract: Nowadays, many platforms provide users with both search and recommendation services as important tools for accessing information. The phenomenon has led to a correlation between user search and recommendation behaviors, providing an opportunity to model user interests in a fine-grained way. Existing approaches either model user search and recommendation behaviors separately or overlook the differe…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGIR 2024

  27. arXiv:2404.07441  [pdf, ps, other]

    cs.CC cs.DS

    Near Optimal Alphabet-Soundness Tradeoff PCPs

    Authors: Dor Minzer, Kai Zhe Zheng

    Abstract: We show that for all $\varepsilon>0$, for sufficiently large prime power $q$, for all $\delta>0$, it is NP-hard to distinguish whether a 2-Prover-1-Round projection game with alphabet size $q$ has value at least $1-\delta$, or value at most $1/q^{(1-\varepsilon)}$. This establishes a nearly optimal alphabet-to-soundness tradeoff for 2-query PCPs with alphabet size $q$, improving upon a result of [Chan 2016]. Our resu…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: STOC 2024, 91 pages

  28. arXiv:2404.05673  [pdf, other]

    cs.CV

    CoReS: Orchestrating the Dance of Reasoning and Segmentation

    Authors: Xiaoyi Bao, Siyang Sun, Shuailei Ma, Kecheng Zheng, Yuxin Guo, Guosheng Zhao, Yun Zheng, Xingang Wang

    Abstract: The reasoning segmentation task, which demands a nuanced comprehension of intricate queries to accurately pinpoint object regions, is attracting increasing attention. However, Multi-modal Large Language Models (MLLM) often find it difficult to accurately localize the objects described in complex reasoning contexts. We believe that the act of reasoning segmentation should mirror the cognitive stage…

    Submitted 10 July, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted at ECCV 2024

  29. arXiv:2404.01177  [pdf, other]

    cs.CR cs.IR

    Poisoning Decentralized Collaborative Recommender System and Its Countermeasures

    Authors: Ruiqi Zheng, Liang Qu, Tong Chen, Kai Zheng, Yuhui Shi, Hongzhi Yin

    Abstract: To make room for privacy and efficiency, the deployment of many recommender systems is experiencing a shift from central servers to personal devices, where the federated recommender systems (FedRecs) and decentralized collaborative recommender systems (DecRecs) are arguably the two most representative paradigms. While both leverage knowledge (e.g., gradients) sharing to facilitate learning local m…

    Submitted 1 April, 2024; originally announced April 2024.

  30. arXiv:2403.17688  [pdf, other]

    cs.IR

    Large Language Models Enhanced Collaborative Filtering

    Authors: Zhongxiang Sun, Zihua Si, Xiaoxue Zang, Kai Zheng, Yang Song, Xiao Zhang, Jun Xu

    Abstract: Recent advancements in Large Language Models (LLMs) have attracted considerable interest among researchers to leverage these models to enhance Recommender Systems (RSs). Existing work predominantly utilizes LLMs to generate knowledge-rich texts or utilizes LLM-derived embeddings as features to improve RSs. Although the extensive world knowledge embedded in LLMs generally benefits RSs, the applicat…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 11 pages

  31. arXiv:2403.17007  [pdf, other]

    cs.CV

    DreamLIP: Language-Image Pre-training with Long Captions

    Authors: Kecheng Zheng, Yifei Zhang, Wei Wu, Fan Lu, Shuailei Ma, Xin Jin, Wei Chen, Yujun Shen

    Abstract: Language-image pre-training largely relies on how precisely and thoroughly a text describes its paired image. In practice, however, the contents of an image can be so rich that well describing them requires lengthy captions (e.g., with 10 sentences), which are usually missing in existing datasets. Consequently, there is currently no clear evidence on whether and how language-image pre-training c…

    Submitted 25 March, 2024; originally announced March 2024.

  32. arXiv:2403.14151  [pdf, other]

    cs.LG cs.AI cs.CY cs.DB

    Deep Learning for Trajectory Data Management and Mining: A Survey and Beyond

    Authors: Wei Chen, Yuxuan Liang, Yuanshao Zhu, Yanchuan Chang, Kang Luo, Haomin Wen, Lei Li, Yanwei Yu, Qingsong Wen, Chao Chen, Kai Zheng, Yunjun Gao, Xiaofang Zhou, Yu Zheng

    Abstract: Trajectory computing is a pivotal domain encompassing trajectory data management and mining, garnering widespread attention due to its crucial role in various practical applications such as location services, urban traffic, and public safety. Traditional methods, focusing on simplistic spatio-temporal features, face challenges of complex calculations, limited scalability, and inadequate adaptabili…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 25 pages, 12 figures, 5 tables

  33. arXiv:2403.12995  [pdf, other]

    q-bio.BM cs.CE cs.LG

    ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling

    Authors: Kangjie Zheng, Siyu Long, Tianyu Lu, Junwei Yang, Xinyu Dai, Ming Zhang, Zaiqing Nie, Wei-Ying Ma, Hao Zhou

    Abstract: Protein language models have demonstrated significant potential in the field of protein engineering. However, current protein language models primarily operate at the residue scale, which limits their ability to provide information at the atom level. This limitation prevents us from fully exploiting the capabilities of protein language models for applications involving both proteins and small mole…

    Submitted 12 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: ICML2024 camera-ready, update some experimental results, add github url, fix some typos

  34. arXiv:2403.12922  [pdf, other]

    cs.CV

    Contextual AD Narration with Interleaved Multimodal Sequence

    Authors: Hanlin Wang, Zhan Tong, Kecheng Zheng, Yujun Shen, Limin Wang

    Abstract: The Audio Description (AD) task aims to generate descriptions of visual elements for visually impaired individuals to help them access long-form video content, such as movies. With video features, text, a character bank, and context information as inputs, the generated ADs are able to correspond to the characters by name and provide reasonable, contextual descriptions to help the audience understand the stor…

    Submitted 19 March, 2024; originally announced March 2024.

  35. arXiv:2403.12422  [pdf, other]

    cs.LG

    Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization

    Authors: Haocheng Xi, Yuxiang Chen, Kang Zhao, Kaijun Zheng, Jianfei Chen, Jun Zhu

    Abstract: Pretraining transformers is generally time-consuming. Fully quantized training (FQT) is a promising approach to speed up pretraining. However, most FQT methods adopt a quantize-compute-dequantize procedure, which often leads to suboptimal speedup and significant performance degradation when used in transformers due to the high memory access overheads and low-precision computations. In this work,…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 14 pages, 8 figures

  36. arXiv:2403.09167  [pdf, other]

    cs.CL

    Dial-insight: Fine-tuning Large Language Models with High-Quality Domain-Specific Data Preventing Capability Collapse

    Authors: Jianwei Sun, Chaoyang Mei, Linlin Wei, Kaiyu Zheng, Na Liu, Ming Cui, Tianyi Li

    Abstract: The efficacy of large language models (LLMs) is heavily dependent on the quality of the underlying data, particularly within specialized domains. A common challenge when fine-tuning LLMs for domain-specific applications is the potential degradation of the model's generalization capabilities. To address these issues, we propose a two-stage approach for the construction of production prompts designe…

    Submitted 14 March, 2024; originally announced March 2024.

  37. arXiv:2403.08447  [pdf, other]

    physics.med-ph

    Generating Synthetic Computed Tomography for Radiotherapy: SynthRAD2023 Challenge Report

    Authors: Evi M. C. Huijben, Maarten L. Terpstra, Arthur Jr. Galapon, Suraj Pai, Adrian Thummerer, Peter Koopmans, Manya Afonso, Maureen van Eijnatten, Oliver Gurney-Champion, Zeli Chen, Yiwen Zhang, Kaiyi Zheng, Chuanpu Li, Haowen Pang, Chuyang Ye, Runqi Wang, Tao Song, Fuxin Fan, Jingna Qiu, Yixing Huang, Juhyung Ha, Jong Sung Park, Alexandra Alain-Beaudoin, Silvain Bériault, Pengxin Yu , et al. (34 additional authors not shown)

    Abstract: Radiation therapy plays a crucial role in cancer treatment, necessitating precise delivery of radiation to tumors while sparing healthy tissues over multiple days. Computed tomography (CT) is integral for treatment planning, offering electron density data crucial for accurate dose calculations. However, accurately representing patient anatomy is challenging, especially in adaptive radiotherapy, wh…

    Submitted 11 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Preprint submitted to Medical Image Analysis

  38. arXiv:2402.11769  [pdf, other]

    eess.SY cs.GT math.OC

    Connection-Aware P2P Trading: Simultaneous Trading and Peer Selection

    Authors: Cheng Feng, Kedi Zheng, Lanqing Shan, Hani Alers, Lampros Stergioulas, Hongye Guo, Qixin Chen

    Abstract: Peer-to-peer (P2P) trading is seen as a viable solution to handle the growing number of distributed energy resources in distribution networks. However, when dealing with large-scale consumers, there are several challenges that must be addressed. One of these challenges is limited communication capabilities. Additionally, prosumers may have specific preferences when it comes to trading. Both can re…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: Submitted to IEEE PES Transactions

  39. arXiv:2402.11148  [pdf, other]

    cs.LG cs.CV

    Knowledge Distillation Based on Transformed Teacher Matching

    Authors: Kaixiang Zheng, En-Hui Yang

    Abstract: As a technique to bridge logit matching and probability distribution matching, temperature scaling plays a pivotal role in knowledge distillation (KD). Conventionally, temperature scaling is applied to both teacher's logits and student's logits in KD. Motivated by some recent works, in this paper, we instead drop temperature scaling on the student side, and systematically study the resulting varia…

    Submitted 7 March, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Published as a conference paper at ICLR 2024

  40. arXiv:2402.10398  [pdf, ps, other]

    cs.SE

    Prompt Learning for Multi-Label Code Smell Detection: A Promising Approach

    Authors: Haiyang Liu, Yang Zhang, Vidya Saikrishna, Quanquan Tian, Kun Zheng

    Abstract: Code smells indicate the potential problems of software quality so that developers can identify refactoring opportunities by detecting code smells. State-of-the-art approaches leverage heuristics, machine learning, and deep learning to detect code smells. However, existing approaches have not fully explored the potential of large language models (LLMs). In this paper, we propose \textit{PromptSmel…

    Submitted 15 February, 2024; originally announced February 2024.

  41. arXiv:2402.09165  [pdf, other]

    cs.LG

    Unifying Invariance and Spuriousity for Graph Out-of-Distribution via Probability of Necessity and Sufficiency

    Authors: Xuexin Chen, Ruichu Cai, Kaitao Zheng, Zhifan Jiang, Zhengting Huang, Zhifeng Hao, Zijian Li

    Abstract: Graph Out-of-Distribution (OOD) generalization, which requires that models trained on biased data generalize to unseen test data, has a large number of real-world applications. One of the most mainstream methods is to extract the invariant subgraph by aligning the original and augmented data with the help of environment augmentation. However, these solutions might lead to the loss or redundancy of semantic subgraph an…

    Submitted 14 February, 2024; originally announced February 2024.

  42. arXiv:2402.05154  [pdf, other]

    cs.SI cs.AI

    Adaptive Hypergraph Network for Trust Prediction

    Authors: Rongwei Xu, Guanfeng Liu, Yan Wang, Xuyun Zhang, Kai Zheng, Xiaofang Zhou

    Abstract: Trust plays an essential role in an individual's decision-making. Traditional trust prediction models rely on pairwise correlations to infer potential relationships between users. However, in the real world, interactions between users are usually complicated rather than pairwise only. Hypergraphs offer a flexible approach to modeling these complex high-order correlations (not just pairwise connect…

    Submitted 7 February, 2024; originally announced February 2024.

  43. arXiv:2401.15824  [pdf, other]

    eess.SY

    Innovation-triggered Learning for Data-driven Predictive Control: Deterministic and Stochastic Formulations

    Authors: Kaikai Zheng, Dawei Shi, Sandra Hirche, Yang Shi

    Abstract: Data-driven control has attracted considerable attention in recent years, especially for plants that are difficult to model from first principles. In particular, a key issue in data-driven approaches is how to make efficient use of data as the abundance of data becomes overwhelming. To address this issue, this work proposes an innovation-triggered learning framework and a corresponding data-driven…

    Submitted 28 January, 2024; originally announced January 2024.

  44. arXiv:2401.14583  [pdf, other]

    cs.IR

    Physical Trajectory Inference Attack and Defense in Decentralized POI Recommendation

    Authors: Jing Long, Tong Chen, Guanhua Ye, Kai Zheng, Nguyen Quoc Viet Hung, Hongzhi Yin

    Abstract: As an indispensable personalized service within Location-Based Social Networks (LBSNs), the Point-of-Interest (POI) recommendation aims to assist individuals in discovering attractive and engaging places. However, the accurate recommendation capability relies on the powerful server collecting a vast amount of users' historical check-in data, posing significant risks of privacy breaches. Although s…

    Submitted 25 January, 2024; originally announced January 2024.

  45. arXiv:2401.04916  [pdf, other]

    cs.NI

    Digital Retina for IoV Towards 6G: Architecture, Opportunities, and Challenges

    Authors: Kan Zheng, Jie Mei, Haojun Yang, Lu Hou, Siwei Ma

    Abstract: Vehicles are no longer isolated entities in traffic environments, thanks to the development of IoV powered by 5G networks and their evolution into 6G. However, it is not enough for vehicles in a highly dynamic and complex traffic environment to make reliable and efficient decisions. As a result, this paper proposes a cloud-edge-end computing system with multi-streams for IoV, referred to as Vehicu…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 8 pages, 4 figures

  46. arXiv:2312.15430  [pdf, other]

    cs.CV

    Make-A-Character: High Quality Text-to-3D Character Generation within Minutes

    Authors: Jianqiang Ren, Chao He, Lin Liu, Jiahao Chen, Yutong Wang, Yafei Song, Jianfang Li, Tangli Xue, Siqi Hu, Tao Chen, Kunkun Zheng, Jianjing Xiang, Liefeng Bo

    Abstract: There is a growing demand for customized and expressive 3D characters with the emergence of AI agents and Metaverse, but creating 3D characters using traditional computer graphics tools is a complex and time-consuming task. To address these challenges, we propose a user-friendly framework named Make-A-Character (Mach) to create lifelike 3D avatars from text descriptions. The framework leverages th…

    Submitted 24 December, 2023; originally announced December 2023.

    Comments: Technical Report

  47. arXiv:2312.14149  [pdf, other]

    cs.CV cs.AI

    TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification

    Authors: Qinying Liu, Wei Wu, Kecheng Zheng, Zhan Tong, Jiawei Liu, Yu Liu, Wei Chen, Zilei Wang, Yujun Shen

    Abstract: The crux of learning vision-language models is to extract semantically aligned information from visual and linguistic data. Existing attempts usually face the problem of coarse alignment, e.g., the vision encoder struggles in localizing an attribute-specified object. In this work, we propose an embarrassingly simple approach to better align image and text features with no need of additional data f…

    Submitted 26 March, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

  48. arXiv:2312.11935  [pdf, other]

    cs.AI

    Parameterized Decision-making with Multi-modal Perception for Autonomous Driving

    Authors: Yuyang Xia, Shuncheng Liu, Quanlin Yu, Liwei Deng, You Zhang, Han Su, Kai Zheng

    Abstract: Autonomous driving is an emerging technology that has advanced rapidly over the last decade. Modern transportation is expected to benefit greatly from a wise decision-making framework for autonomous vehicles, including the improvement of mobility and the minimization of risks and travel time. However, existing methods either ignore the complexity of environments, fitting only straight roads, or igno…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: IEEE International Conference on Data Engineering (ICDE2024)

  49. arXiv:2312.08397  [pdf, other]

    cs.LG cs.AI cs.HC

    Personalized Decision Supports based on Theory of Mind Modeling and Explainable Reinforcement Learning

    Authors: Huao Li, Yao Fan, Keyang Zheng, Michael Lewis, Katia Sycara

    Abstract: In this paper, we propose a novel personalized decision support system that combines Theory of Mind (ToM) modeling and explainable Reinforcement Learning (XRL) to provide effective and interpretable interventions. Our method leverages DRL to provide expert action recommendations while incorporating ToM modeling to understand users' mental states and predict their future actions, enabling appropria…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted to IEEE SMC 2023

  50. arXiv:2312.07125  [pdf, other]

    cs.CV

    TransMed: Large Language Models Enhance Vision Transformer for Biomedical Image Classification

    Authors: Kaipeng Zheng, Weiran Huang, Lichao Sun

    Abstract: Few-shot learning has been studied to adapt models to tasks with very few samples. It holds profound significance, particularly in clinical tasks, due to the high annotation cost of medical images. Several works have explored few-shot learning on medical images, yet they still require a large number of medical images for pre-training models to gain domain-specific priors. Vision foundation models…

    Submitted 4 February, 2024; v1 submitted 12 December, 2023; originally announced December 2023.