Showing 1–50 of 53 results for author: Wan, G

  1. arXiv:2406.18937  [pdf, other]

    cs.LG cs.AI

    Federated Graph Semantic and Structural Learning

    Authors: Wenke Huang, Guancheng Wan, Mang Ye, Bo Du

    Abstract: Federated graph learning collaboratively learns a global graph neural network with distributed graphs, where the non-independent and identically distributed property is one of the major challenges. Most related work focuses on traditional distributed tasks such as images and voices and is incapable of handling graph structures. This paper first reveals that local client distortion is brought by both node-level sem…

    Submitted 29 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Journal ref: International Joint Conference on Artificial Intelligence (IJCAI), 2023

  2. arXiv:2406.18301  [pdf, other]

    eess.AS cs.CL cs.SD

    MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research

    Authors: Song Li, Yongbin You, Xuezhi Wang, Zhengkun Tian, Ke Ding, Guanglu Wan

    Abstract: Recently, multilingual artificial intelligence assistants, exemplified by ChatGPT, have gained immense popularity. As a crucial gateway to human-computer interaction, multilingual automatic speech recognition (ASR) has also garnered significant attention, as evidenced by systems like Whisper. However, the proprietary nature of the training data has impeded researchers' efforts to study multilingua…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024

  3. arXiv:2406.06016  [pdf, other]

    cs.LG

    EpiLearn: A Python Library for Machine Learning in Epidemic Modeling

    Authors: Zewen Liu, Yunxiao Li, Mingyang Wei, Guancheng Wan, Max S. Y. Lau, Wei Jin

    Abstract: EpiLearn is a Python toolkit developed for modeling, simulating, and analyzing epidemic data. Although there exist several packages that also deal with epidemic modeling, they are often restricted to mechanistic models or traditional statistical tools. As machine learning continues to shape the world, the gap between these packages and the latest models has become larger. To bridge the gap and ins…

    Submitted 10 June, 2024; originally announced June 2024.

  4. arXiv:2405.17233  [pdf, other]

    cs.LG

    CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs

    Authors: Haoyu Wang, Bei Liu, Hang Shao, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian

    Abstract: Parameter quantization for Large Language Models (LLMs) has recently attracted increasing attention for reducing memory costs and improving computational efficiency. Early approaches have been widely adopted. However, the existing methods suffer from poor performance in low-bit (such as 2 to 3 bits) scenarios. In this paper, we present a novel and effective Column-Level Adaptive weight Quantizatio…

    Submitted 2 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  5. arXiv:2405.01649  [pdf, other]

    cs.CL

    Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning

    Authors: Tianle Xia, Liang Ding, Guojia Wan, Yibing Zhan, Bo Du, Dacheng Tao

    Abstract: Answering complex queries over incomplete knowledge graphs (KGs) is a challenging job. Most previous works have focused on learning entity/relation embeddings and simulating first-order logic operators with various neural networks. However, they are bottlenecked by the inability to share world knowledge to improve logical reasoning, thus resulting in suboptimal performance. In this paper, we propo…

    Submitted 8 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  6. arXiv:2404.14296  [pdf, other]

    cs.SE cs.AI

    Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach

    Authors: Yao Wan, Guanghua Wan, Shijie Zhang, Hongyu Zhang, Yulei Sui, Pan Zhou, Hai Jin, Lichao Sun

    Abstract: Recent years have witnessed significant progress in developing deep learning-based models for automated code completion. Although using source code in GitHub has been a common practice for training deep-learning-based models for code completion, it may induce some legal and ethical issues such as copyright infringement. In this paper, we investigate the legal and ethical issues of current neural c…

    Submitted 22 April, 2024; originally announced April 2024.

  7. arXiv:2404.09515  [pdf, other]

    cs.CV

    Revealing the structure-property relationships of copper alloys with FAGC

    Authors: Yuexing Han, Guanxin Wan, Tao Han, Bing Wang, Yi Liu

    Abstract: Understanding how the structure of materials affects their properties is a cornerstone of materials science and engineering. However, traditional methods have struggled to accurately describe the quantitative structure-property relationships for complex structures. In our study, we bridge this gap by leveraging machine learning to analyze images of materials' microstructures, thus offering a novel…

    Submitted 18 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  8. arXiv:2403.19852  [pdf, other]

    cs.LG cs.SI physics.soc-ph q-bio.PE

    A Review of Graph Neural Networks in Epidemic Modeling

    Authors: Zewen Liu, Guancheng Wan, B. Aditya Prakash, Max S. Y. Lau, Wei Jin

    Abstract: Since the onset of the COVID-19 pandemic, there has been a growing interest in studying epidemiological models. Traditional mechanistic models mathematically describe the transmission mechanisms of infectious diseases. However, they often suffer from limitations of oversimplified or fixed assumptions, which could cause sub-optimal predictive power and inefficiency in capturing complex relation inf…

    Submitted 21 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  9. arXiv:2402.18243  [pdf, other]

    cs.CL

    Learning or Self-aligning? Rethinking Instruction Fine-tuning

    Authors: Mengjie Ren, Boxi Cao, Hongyu Lin, Cao Liu, Xianpei Han, Ke Zeng, Guanglu Wan, Xunliang Cai, Le Sun

    Abstract: Instruction Fine-tuning (IFT) is a critical phase in building large language models (LLMs). Previous works mainly focus on the IFT's role in the transfer of behavioral norms and the learning of additional world knowledge. However, the understanding of the underlying mechanisms of IFT remains significantly limited. In this paper, we design a knowledge intervention framework to decouple the potentia…

    Submitted 2 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  10. arXiv:2402.11068  [pdf, other]

    cs.CL cs.AI

    Bridging Causal Discovery and Large Language Models: A Comprehensive Survey of Integrative Approaches and Future Directions

    Authors: Guangya Wan, Yuqi Wu, Mengxuan Hu, Zhixuan Chu, Sheng Li

    Abstract: Causal discovery (CD) and Large Language Models (LLMs) represent two emerging fields of study with significant implications for artificial intelligence. Despite their distinct origins, CD focuses on uncovering cause-effect relationships from data, and LLMs on processing and generating humanlike text, the convergence of these domains offers novel insights and methodologies for understanding complex…

    Submitted 16 February, 2024; originally announced February 2024.

  11. arXiv:2402.06099  [pdf, other]

    cs.NI

    CATO: End-to-End Optimization of ML-Based Traffic Analysis Pipelines

    Authors: Gerry Wan, Shinan Liu, Francesco Bronzino, Nick Feamster, Zakir Durumeric

    Abstract: Machine learning has shown tremendous potential for improving the capabilities of network traffic analysis applications, often outperforming simpler rule-based heuristics. However, ML-based solutions remain difficult to deploy in practice. Many existing approaches only optimize the predictive performance of their models, overlooking the practical challenges of running them against network traffic…

    Submitted 8 February, 2024; originally announced February 2024.

  12. arXiv:2402.03694  [pdf, other]

    cs.NI cs.AI

    ServeFlow: A Fast-Slow Model Architecture for Network Traffic Analysis

    Authors: Shinan Liu, Ted Shaowang, Gerry Wan, Jeewon Chae, Jonatas Marques, Sanjay Krishnan, Nick Feamster

    Abstract: Network traffic analysis increasingly uses complex machine learning models as the internet consolidates and traffic gets more encrypted. However, over high-bandwidth networks, flows can easily arrive faster than model inference rates. The temporal nature of network flows limits simple scale-out approaches leveraged in other high-traffic machine learning applications. Accordingly, this paper presen…

    Submitted 5 February, 2024; originally announced February 2024.

  13. arXiv:2312.14518  [pdf, other]

    q-bio.NC cs.CV eess.IV

    Joint Learning Neuronal Skeleton and Brain Circuit Topology with Permutation Invariant Encoders for Neuron Classification

    Authors: Minghui Liao, Guojia Wan, Bo Du

    Abstract: Determining the types of neurons within a nervous system plays a significant role in the analysis of brain connectomics and the investigation of neurological diseases. However, the efficiency of utilizing anatomical, physiological, or molecular characteristics of neurons is relatively low and costly. With the advancements in electron microscopy imaging and analysis techniques for brain tissue, we…

    Submitted 25 March, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  14. arXiv:2312.03325  [pdf, other]

    cs.CV cs.LG

    FAGC:Feature Augmentation on Geodesic Curve in the Pre-Shape Space

    Authors: Yuexing Han, Guanxin Wan, Bing Wang

    Abstract: Deep learning has yielded remarkable outcomes in various domains. However, the challenge of requiring large-scale labeled samples still persists in deep learning. Thus, data augmentation has been introduced as a critical strategy to train deep learning models. However, data augmentation suffers from information loss and poor performance in small sample environments. To overcome these drawbacks, we…

    Submitted 25 December, 2023; v1 submitted 6 December, 2023; originally announced December 2023.

  15. arXiv:2311.06750  [pdf, other]

    cs.LG cs.AI

    Federated Learning for Generalization, Robustness, Fairness: A Survey and Benchmark

    Authors: Wenke Huang, Mang Ye, Zekun Shi, Guancheng Wan, He Li, Bo Du, Qiang Yang

    Abstract: Federated learning has emerged as a promising paradigm for privacy-preserving collaboration among different parties. Recently, with the popularity of federated learning, an influx of approaches has addressed different realistic challenges. In this survey, we provide a systematic overview of the important and recent developments of research on federated learning. Firstly, we introduce the…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: 22 pages, 4 figures

  16. arXiv:2310.09499  [pdf, other]

    cs.CL cs.AI

    One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models

    Authors: Hang Shao, Bei Liu, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian

    Abstract: Various Large Language Models (LLMs) from the Generative Pretrained Transformer (GPT) family have achieved outstanding performances in a wide range of text generation tasks. However, the enormous model sizes have hindered their practical use in real-world applications due to high inference latency. Therefore, improving the efficiencies of LLMs through quantization, pruning, and other means has been…

    Submitted 23 April, 2024; v1 submitted 14 October, 2023; originally announced October 2023.

    Comments: Accepted to ICASSP2024

  17. arXiv:2310.00597  [pdf, other]

    cs.CL

    A Task-oriented Dialog Model with Task-progressive and Policy-aware Pre-training

    Authors: Lucen Zhong, Hengtong Lu, Caixia Yuan, Xiaojie Wang, Jiashen Sun, Ke Zeng, Guanglu Wan

    Abstract: Pre-trained conversation models (PCMs) have achieved promising progress in recent years. However, existing PCMs for Task-oriented dialog (TOD) are insufficient for capturing the sequential nature of the TOD-related tasks, as well as for learning dialog policy information. To alleviate these problems, this paper proposes a task-progressive PCM with two policy-aware pre-training tasks. The model is…

    Submitted 1 October, 2023; originally announced October 2023.

    Comments: Accepted at NLPCC 2023

  18. arXiv:2309.09443  [pdf, other]

    eess.AS cs.CL cs.SD

    Enhancing Multilingual Speech Recognition through Language Prompt Tuning and Frame-Level Language Adapter

    Authors: Song Li, Yongbin You, Xuezhi Wang, Ke Ding, Guanglu Wan

    Abstract: Multilingual intelligent assistants, such as ChatGPT, have recently gained popularity. To further expand the applications of multilingual artificial intelligence assistants and facilitate international communication, it is essential to enhance the performance of multilingual speech recognition, which is a crucial component of speech interaction. In this paper, we propose two simple and parameter-e…

    Submitted 19 September, 2023; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP2024

  19. arXiv:2309.07413  [pdf, other]

    cs.CL cs.SD eess.AS

    CPPF: A contextual and post-processing-free model for automatic speech recognition

    Authors: Lei Zhang, Zhengkun Tian, Xiang Chen, Jiaming Sun, Hongyu Xiang, Ke Ding, Guanglu Wan

    Abstract: ASR systems have become increasingly widespread in recent years. However, their textual outputs often require post-processing tasks before they can be practically utilized. To address this issue, we draw inspiration from the multifaceted capabilities of LLMs and Whisper, and focus on integrating multiple ASR text processing tasks related to speech recognition into the ASR model. This integration n…

    Submitted 20 September, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP2024

  20. arXiv:2308.14638  [pdf, other]

    eess.AS cs.SD

    The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge

    Authors: Ruoyu Wang, Maokui He, Jun Du, Hengshun Zhou, Shutong Niu, Hang Chen, Yanyan Yue, Gaobin Yang, Shilong Wu, Lei Sun, Yanhui Tu, Haitao Tang, Shuangqing Qian, Tian Gao, Mengzhi Wang, Genshun Wan, Jia Pan, Jianqing Gao, Chin-Hui Lee

    Abstract: This technical report details our submission system to the CHiME-7 DASR Challenge, which focuses on speaker diarization and speech recognition under complex multi-speaker scenarios. Additionally, it also evaluates the efficiency of systems in handling diverse array devices. To address these issues, we implemented an end-to-end speaker diarization system and introduced a rectification strategy base…

    Submitted 10 October, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted by 2023 CHiME Workshop, Oral

  21. arXiv:2307.08991  [pdf, other]

    cs.CV cs.RO

    EgoVM: Achieving Precise Ego-Localization using Lightweight Vectorized Maps

    Authors: Yuzhe He, Shuang Liang, Xiaofei Rui, Chengying Cai, Guowei Wan

    Abstract: Accurate and reliable ego-localization is critical for autonomous driving. In this paper, we present EgoVM, an end-to-end localization network that achieves comparable localization accuracy to prior state-of-the-art methods, but uses lightweight vectorized maps instead of heavy point-based maps. To begin with, we extract BEV features from online multi-view images and LiDAR point cloud. Then, we em…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: 8 pages

  22. arXiv:2306.15376  [pdf, other]

    cs.CL

    Exploiting Pseudo Future Contexts for Emotion Recognition in Conversations

    Authors: Yinyi Wei, Shuaipeng Liu, Hailei Yan, Wei Ye, Tong Mo, Guanglu Wan

    Abstract: With the extensive accumulation of conversational data on the Internet, emotion recognition in conversations (ERC) has received increasing attention. Previous efforts of this task mainly focus on leveraging contextual and speaker-specific features, or integrating heterogeneous external commonsense knowledge. Among them, some heavily rely on future contexts, which, however, are not always available…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: 15 pages, accepted by ADMA 2023

  23. Reducing the gap between streaming and non-streaming Transducer-based ASR by adaptive two-stage knowledge distillation

    Authors: Haitao Tang, Yu Fu, Lei Sun, Jiabin Xue, Dan Liu, Yongchao Li, Zhiqiang Ma, Minghui Wu, Jia Pan, Genshun Wan, Ming'en Zhao

    Abstract: Transducer is one of the mainstream frameworks for streaming speech recognition. There is a performance gap between the streaming and non-streaming transducer models due to limited context. To reduce this gap, an effective way is to ensure that their hidden and output distributions are consistent, which can be achieved by hierarchical knowledge distillation. However, it is difficult to ensure the…

    Submitted 26 June, 2023; originally announced June 2023.

  24. arXiv:2304.00884  [pdf, other]

    cs.CL

    Dialog-to-Actions: Building Task-Oriented Dialogue System via Action-Level Generation

    Authors: Yuncheng Hua, Xiangyu Xi, Zheng Jiang, Guanwei Zhang, Chaobo Sun, Guanglu Wan, Wei Ye

    Abstract: End-to-end generation-based approaches have been investigated and applied in task-oriented dialogue systems. However, in industrial scenarios, existing methods face the bottlenecks of controllability (e.g., domain-inconsistent responses, repetition problem, etc) and efficiency (e.g., long computation time, etc). In this paper, we propose a task-oriented dialogue system via action-level generation.…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted at SIGIR 2023 Industry Track

  25. arXiv:2303.07830  [pdf]

    q-bio.NC cs.AI

    Emergent Bio-Functional Similarities in a Cortical-Spike-Train-Decoding Spiking Neural Network Facilitate Predictions of Neural Computation

    Authors: Tengjun Liu, Yansong Chua, Yiwei Zhang, Yuxiao Ning, Pengfu Liu, Guihua Wan, Zijun Wan, Shaomin Zhang, Weidong Chen

    Abstract: Despite its better bio-plausibility, goal-driven spiking neural network (SNN) has not achieved applicable performance for classifying biological spike trains, and showed little bio-functional similarities compared to traditional artificial neural networks. In this study, we proposed the motorSRNN, a recurrent SNN topologically inspired by the neural motor circuit of primates. By employing the moto…

    Submitted 14 March, 2023; originally announced March 2023.

  26. arXiv:2302.14511  [pdf, other]

    cs.CV

    A Unified BEV Model for Joint Learning of 3D Local Features and Overlap Estimation

    Authors: Lin Li, Wendong Ding, Yongkun Wen, Yufei Liang, Yong Liu, Guowei Wan

    Abstract: Pairwise point cloud registration is a critical task for many applications, which heavily depends on finding correct correspondences from the two point clouds. However, the low overlap between input point clouds causes the registration to fail easily, leading to mistaken overlapping and mismatched correspondences, especially in scenes where non-overlapping regions contain similar structures. In th…

    Submitted 14 March, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: 8 pages. Accepted by ICRA-2023

  27. arXiv:2212.03482  [pdf, other]

    eess.AS cs.SD

    Improved Speech Pre-Training with Supervision-Enhanced Acoustic Unit

    Authors: Pengcheng Li, Genshun Wan, Fenglin Ding, Hang Chen, Jianqing Gao, Jia Pan, Cong Liu

    Abstract: Speech pre-training has shown great success in learning useful and general latent representations from large-scale unlabeled data. Based on a well-designed self-supervised learning pattern, pre-trained models can be used to serve lots of downstream speech tasks such as automatic speech recognition. In order to take full advantage of the labeled data in low-resource tasks, we present an improved pre-t…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: Submitted to ICASSP 2023

  28. arXiv:2212.03480  [pdf, other]

    eess.AS cs.SD

    Progressive Multi-Scale Self-Supervised Learning for Speech Recognition

    Authors: Genshun Wan, Tan Liu, Hang Chen, Jia Pan, Cong Liu, Zhongfu Ye

    Abstract: Self-supervised learning (SSL) models have achieved considerable improvements in automatic speech recognition (ASR). In addition, ASR performance could be further improved if the model is dedicated to audio content information learning theoretically. To this end, we propose a progressive multi-scale self-supervised learning (PMS-SSL) method, which uses fine-grained target sets to compute SSL loss…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: Submitted to ICASSP 2023

  29. arXiv:2212.03476  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Improved Self-Supervised Multilingual Speech Representation Learning Combined with Auxiliary Language Information

    Authors: Fenglin Ding, Genshun Wan, Pengcheng Li, Jia Pan, Cong Liu

    Abstract: Multilingual end-to-end models have shown great improvement over monolingual systems. With the development of pre-training methods on speech, self-supervised multilingual speech representation learning like XLSR has shown success in improving the performance of multilingual automatic speech recognition (ASR). However, similar to the supervised learning, multilingual pre-training may also suffer fr…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: Submitted to ICASSP 2023

  30. arXiv:2212.03090  [pdf, other]

    cs.SD eess.AS

    Label-free Knowledge Distillation with Contrastive Loss for Light-weight Speaker Recognition

    Authors: Zhiyuan Peng, Xuanji He, Ke Ding, Tan Lee, Guanglu Wan

    Abstract: Very deep models for speaker recognition (SR) have demonstrated remarkable performance improvement in recent research. However, it is impractical to deploy these models for on-device applications with constrained computational resources. On the other hand, light-weight models are highly desired in practice despite their sub-optimal performance. This research aims to improve light-weight SR models…

    Submitted 6 December, 2022; originally announced December 2022.

  31. arXiv:2212.03039  [pdf, ps, other]

    cs.SD eess.AS

    Covariance Regularization for Probabilistic Linear Discriminant Analysis

    Authors: Zhiyuan Peng, Mingjie Shao, Xuanji He, Xu Li, Tan Lee, Ke Ding, Guanglu Wan

    Abstract: Probabilistic linear discriminant analysis (PLDA) is commonly used in speaker verification systems to score the similarity of speaker embeddings. Recent studies improved the performance of PLDA in domain-matched conditions by diagonalizing its covariance. We suspect such a brutal pruning approach could eliminate its capacity in modeling dimension correlation of speaker embeddings, leading to inadequ…

    Submitted 6 December, 2022; originally announced December 2022.

  32. arXiv:2212.02782  [pdf, other]

    eess.AS cs.SD

    Self-Supervised Audio-Visual Speech Representations Learning By Multimodal Self-Distillation

    Authors: Jing-Xuan Zhang, Genshun Wan, Zhen-Hua Ling, Jia Pan, Jianqing Gao, Cong Liu

    Abstract: In this work, we present a novel method, named AV2vec, for learning audio-visual speech representations by multimodal self-distillation. AV2vec has a student and a teacher module, in which the student performs a masked latent feature regression task using the multimodal target features generated online by the teacher. The parameters of the teacher model are a momentum update of the student. Since…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: submitted to ICASSP 2023

  33. arXiv:2211.13896  [pdf, other]

    cs.CL

    MUSIED: A Benchmark for Event Detection from Multi-Source Heterogeneous Informal Texts

    Authors: Xiangyu Xi, Jianwei Lv, Shuaipeng Liu, Wei Ye, Fan Yang, Guanglu Wan

    Abstract: Event detection (ED) identifies and classifies event triggers from unstructured texts, serving as a fundamental task for information extraction. Despite the remarkable progress achieved in the past several years, most research efforts focus on detecting events from formal texts (e.g., news articles, Wikipedia documents, financial announcements). Moreover, the texts in each dataset are either from…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: Accepted at EMNLP 2022

  34. arXiv:2211.03284  [pdf, other]

    eess.AS cs.SD

    Peak-First CTC: Reducing the Peak Latency of CTC Models by Applying Peak-First Regularization

    Authors: Zhengkun Tian, Hongyu Xiang, Min Li, Feifei Lin, Ke Ding, Guanglu Wan

    Abstract: The CTC model has been widely applied to many application scenarios because of its simple structure, excellent performance, and fast inference speed. There are many peaks in the probability distribution predicted by the CTC models, and each peak represents a non-blank token. The recognition latency of CTC models can be reduced by encouraging the model to predict peaks earlier. Existing methods to…

    Submitted 15 March, 2023; v1 submitted 6 November, 2022; originally announced November 2022.

    Comments: Accepted by ICASSP 2023(5 pages, 2 figures)

  35. arXiv:2209.14613  [pdf, other]

    cs.LG cs.CY

    Fair admission risk prediction with proportional multicalibration

    Authors: William La Cava, Elle Lett, Guangya Wan

    Abstract: Fair calibration is a widely desirable fairness criterion in risk prediction contexts. One way to measure and achieve fair calibration is with multicalibration. Multicalibration constrains calibration error among flexibly-defined subpopulations while maintaining overall calibration. However, multicalibrated models can exhibit a higher percent calibration error among groups with lower base rates tha…

    Submitted 31 August, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: Published in the 2023 Conference on Health, Inference, and Learning (CHIL). Best paper award

    Journal ref: Proceedings of Machine Learning Research 209 (2023) 350-378

  36. arXiv:2205.14295  [pdf, other]

    cs.CV eess.AS

    Is Lip Region-of-Interest Sufficient for Lipreading?

    Authors: Jing-Xuan Zhang, Gen-Shun Wan, Jia Pan

    Abstract: Lip region-of-interest (ROI) is conventionally used for visual input in the lipreading task. Few works have adopted the entire face as visual input because lip-excluded parts of the face are usually considered to be redundant and irrelevant to visual speech recognition. However, faces contain much more detailed information than lips, such as speakers' head pose, emotion, identity etc. We argue tha…

    Submitted 1 June, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

    Comments: preprint

  37. arXiv:2205.06436  [pdf, other]

    cs.CL

    A Low-Cost, Controllable and Interpretable Task-Oriented Chatbot: With Real-World After-Sale Services as Example

    Authors: Xiangyu Xi, Chenxu Lv, Yuncheng Hua, Wei Ye, Chaobo Sun, Shuaipeng Liu, Fan Yang, Guanglu Wan

    Abstract: Though widely used in industry, traditional task-oriented dialogue systems suffer from three bottlenecks: (i) difficult ontology construction (e.g., intents and slots); (ii) poor controllability and interpretability; (iii) annotation-hungry. In this paper, we propose to represent utterance with a simpler concept named Dialogue Action, upon which we construct a tree-structured TaskFlow and further…

    Submitted 12 May, 2022; originally announced May 2022.

    Comments: Accepted at SIGIR Industry Track 2022

  38. arXiv:2204.10523  [pdf, other]

    cs.SD eess.AS

    Unifying Cosine and PLDA Back-ends for Speaker Verification

    Authors: Zhiyuan Peng, Xuanji He, Ke Ding, Tan Lee, Guanglu Wan

    Abstract: State-of-the-art speaker verification (SV) systems use a back-end model to score the similarity of speaker embeddings extracted from a neural network model. The commonly used back-end models are the cosine scoring and the probabilistic linear discriminant analysis (PLDA) scoring. With the recently developed neural embeddings, the theoretically more appealing PLDA approach is found to have no advantage…

    Submitted 22 April, 2022; originally announced April 2022.

    Comments: submitted to interspeech2022

  39. arXiv:2203.16776  [pdf, ps, other]

    eess.AS cs.CL cs.LG

    An Empirical Study of Language Model Integration for Transducer based Speech Recognition

    Authors: Huahuan Zheng, Keyu An, Zhijian Ou, Chen Huang, Ke Ding, Guanglu Wan

    Abstract: Utilizing text-only data with an external language model (ELM) in end-to-end RNN-Transducer (RNN-T) for speech recognition is challenging. Recently, a class of methods such as density ratio (DR) and internal language model estimation (ILME) have been developed, outperforming the classic shallow fusion (SF) method. The basic idea behind these methods is that RNN-T posterior should first subtract th…

    Submitted 3 August, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted into INTERSPEECH 2022

  40. arXiv:2203.16758  [pdf, other]

    eess.AS cs.CL

    CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR

    Authors: Keyu An, Huahuan Zheng, Zhijian Ou, Hongyu Xiang, Ke Ding, Guanglu Wan

    Abstract: History and future contextual information are known to be important for accurate acoustic modeling. However, acquiring future context brings latency for streaming ASR. In this paper, we propose a new framework - Chunking, Simulating Future Context and Decoding (CUSIDE) for streaming speech recognition. A new simulation module is introduced to recursively simulate the future contextual frames, with…

    Submitted 2 August, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted into INTERSPEECH 2022

  41. arXiv:2203.09278  [pdf, other]

    cs.CL

    Confidence Calibration for Intent Detection via Hyperspherical Space and Rebalanced Accuracy-Uncertainty Loss

    Authors: Yantao Gong, Cao Liu, Fan Yang, Xunliang Cai, Guanglu Wan, Jiansong Chen, Weipeng Zhang, Houfeng Wang

    Abstract: Data-driven methods have achieved notable performance on intent detection, which is a task to comprehend user queries. Nonetheless, they are controversial for over-confident predictions. In some scenarios, users care not only about the accuracy but also about the confidence of the model. Unfortunately, mainstream neural networks are poorly calibrated, with a large gap between accuracy and confidence. To…

    Submitted 17 March, 2022; originally announced March 2022.

  42. arXiv:2202.08880  [pdf, other]

    eess.IV cs.GR physics.optics

    Ray-transfer functions for camera simulation of 3D scenes with hidden lens design

    Authors: Thomas Goossens, Zheng Lyu, Jamyuen Ko, Gordon Wan, Joyce Farrell, Brian Wandell

    Abstract: Combining image sensor simulation tools (e.g., ISETCam) with physically based ray tracing (e.g., PBRT) offers possibilities for designing and evaluating novel imaging systems as well as for synthesizing physically accurate, labeled images for machine learning. One practical limitation has been simulating the optics precisely: Lens manufacturers generally prefer to keep lens design confidential. We…

    Submitted 23 February, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

  43. arXiv:2112.12743  [pdf, other]

    eess.AS cs.SD

    Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios

    Authors: Qicong Xie, Tao Li, Xinsheng Wang, Zhichao Wang, Lei Xie, Guoqiao Yu, Guanglu Wan

    Abstract: In the existing cross-speaker style transfer task, a source speaker with multi-style recordings is necessary to provide the style for a target speaker. However, it is hard for one speaker to express all expected styles. In this paper, a more general task, which is to produce expressive speech by combining any styles and timbres from a multi-speaker corpus in which each speaker has a unique style,…

    Submitted 23 December, 2021; originally announced December 2021.

    Comments: Submitted to ICASSP 2022

  44. Density-Based Dynamic Curriculum Learning for Intent Detection

    Authors: Yantao Gong, Cao Liu, Jiazhen Yuan, Fan Yang, Xunliang Cai, Guanglu Wan, Jiansong Chen, Ruiyao Niu, Houfeng Wang

    Abstract: Pre-trained language models have achieved noticeable performance on the intent detection task. However, due to assigning an identical weight to each sample, they suffer from the overfitting of simple samples and the failure to learn complex samples well. To handle this problem, we propose a density-based dynamic curriculum learning model. Our model defines the sample's difficulty level according t…

    Submitted 24 August, 2021; originally announced August 2021.

  45. arXiv:2008.04265  [pdf, other]

    eess.AS cs.SD

    Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training

    Authors: Jian Cong, Shan Yang, Lei Xie, Guoqiao Yu, Guanglu Wan

    Abstract: Data efficient voice cloning aims at synthesizing target speaker's voice with only a few enrollment samples at hand. To this end, speaker adaptation and speaker encoding are two typical methods based on base model trained from multiple speakers. The former uses a small set of target speaker data to transfer the multi-speaker model to target speaker's voice through direct model update, while in the…

    Submitted 10 August, 2020; v1 submitted 10 August, 2020; originally announced August 2020.

    Comments: Accepted to INTERSPEECH 2020

  46. arXiv:2003.03026  [pdf, other]

    cs.CV cs.RO

    DA4AD: End-to-End Deep Attention-based Visual Localization for Autonomous Driving

    Authors: Yao Zhou, Guowei Wan, Shenhua Hou, Li Yu, Gang Wang, Xiaofei Rui, Shiyu Song

    Abstract: We present a visual localization framework based on novel deep attention aware features for autonomous driving that achieves centimeter level localization accuracy. Conventional approaches to the visual localization problem rely on handcrafted features or human-made objects on the road. They are known to be either prone to unstable matching caused by severe appearance or lighting changes, or too s…

    Submitted 13 July, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

    Comments: 19 pages, 4 figures, Accepted by ECCV 2020

  47. arXiv:2001.01986  [pdf, other]

    cs.CL cs.LG eess.AS

    Learning Speaker Embedding with Momentum Contrast

    Authors: Ke Ding, Xuanji He, Guanglu Wan

    Abstract: Speaker verification can be formulated as a representation learning task, where speaker-discriminative embeddings are extracted from utterances of variable lengths. Momentum Contrast (MoCo) is a recently proposed unsupervised representation learning framework, and has shown its effectiveness for learning good feature representation for downstream vision tasks. In this work, we apply MoCo to learn…

    Submitted 6 September, 2020; v1 submitted 7 January, 2020; originally announced January 2020.

  48. arXiv:1911.08024  [pdf, ps, other]

    cs.LG stat.ML

    A Bias Trick for Centered Robust Principal Component Analysis

    Authors: Baokun He, Guihong Wan, Haim Schweitzer

    Abstract: Outlier based Robust Principal Component Analysis (RPCA) requires centering of the non-outliers. We show a "bias trick" that automatically centers these non-outliers. Using this bias trick we obtain the first RPCA algorithm that is optimal with respect to centering.

    Submitted 18 November, 2019; originally announced November 2019.

  49. Object Detection in Optical Remote Sensing Images: A Survey and A New Benchmark

    Authors: Ke Li, Gang Wan, Gong Cheng, Liqiu Meng, Junwei Han

    Abstract: Substantial efforts have been devoted more recently to presenting various methods for object detection in optical remote sensing images. However, the current survey of datasets and deep learning based methods for object detection in optical remote sensing images is not adequate. Moreover, most of the existing datasets have some shortcomings, for example, the numbers of images and object categories…

    Submitted 21 September, 2019; v1 submitted 31 August, 2019; originally announced September 2019.

    Journal ref: ISPRS Journal of Photogrammetry and Remote Sensing, 159: 296-307, 2020

  50. arXiv:1907.11094  [pdf, ps, other]

    stat.ML cs.LG

    Improving the Accuracy of Principal Component Analysis by the Maximum Entropy Method

    Authors: Guihong Wan, Crystal Maung, Haim Schweitzer

    Abstract: Classical Principal Component Analysis (PCA) approximates data in terms of projections on a small number of orthogonal vectors. There are simple procedures to efficiently compute various functions of the data from the PCA approximation. The most important function is arguably the Euclidean distance between data items. This can be used, for example, to solve the approximate nearest neighbor problem…

    Submitted 23 July, 2019; originally announced July 2019.