
Hybrid Temporal Computing for Lower Power Hardware Accelerators

Authors: Maliha Tasnim, Sachin Sachdeva, Yibo Liu, Sheldon X. -D. Tan

Abstract: In this paper, we propose a new hybrid temporal computing (HTC) framework that leverages both pulse rate and temporal data encoding to design ultra-low energy hardware accelerators. Our approach is inspired by the recently proposed temporal computing, or race logic, which encodes data values as single delays, leading to significantly lower energy consumption due to minimized signal switching. However, race logic is limited in its applications due to inherent restrictions. The new HTC framework overcomes these limitations by encoding signals in both temporal and pulse rate formats for multiplication and in temporal format for propagation. This approach maintains reduced switching energy while being general enough to implement a wide range of arithmetic operations. We demonstrate how HTC multiplication is performed for both unipolar and bipolar data encoding and present the basic designs for multipliers, adders, and MAC units. Additionally, we implement two hardware accelerators: a Finite Impulse Response (FIR) filter and a Discrete Cosine Transform (DCT)/iDCT engine for image compression and DSP applications. Experimental results show that the HTC MAC has a significantly smaller power and area footprint than the Unary MAC design and is orders of magnitude faster. Compared to the CBSC MAC, the HTC MAC reduces power consumption by $45.2\%$ and area footprint by $50.13\%$. For the FIR design, the HTC design significantly outperforms the Unary design on all metrics.
Compared to the CBSC design, the HTC-based FIR filter reduces power consumption by $36.61\%$ and area cost by $45.85\%$. The HTC-based DCT filter retains the quality of the original image with a decent PSNR while consuming $23.34\%$ less power and occupying $18.20\%$ less area than the CBSC MAC-based DCT filter.
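The abstract does not spell out the HTC multiplier circuit, so the following is only a minimal behavioral sketch of the general idea, under stated assumptions: unipolar operands a/n and b/n with n-cycle resolution, where one operand is encoded as an evenly spaced pulse stream (pulse rate) and the other as the length of a time window (temporal, single-delay). Counting the pulses that arrive while the window is open yields the product. The names `pulse_stream`, `htc_multiply`, and `htc_mac` are hypothetical, and bipolar encoding is not shown.

```python
def pulse_stream(a: int, n: int) -> list[int]:
    """Pulse-rate encoding of a/n: exactly a pulses spread as evenly as possible over n cycles."""
    if not 0 <= a <= n:
        raise ValueError("a must lie in [0, n]")
    # A pulse fires at cycle t whenever floor((t + 1) * a / n) steps up by one.
    return [((t + 1) * a) // n - (t * a) // n for t in range(n)]


def htc_multiply(a: int, b: int, n: int) -> int:
    """Count pulses of operand a that arrive while the temporal window of length b is open.

    The count telescopes to floor(a * b / n), i.e. (a/n) * (b/n) quantized to steps of 1/n.
    """
    if not 0 <= b <= n:
        raise ValueError("b must lie in [0, n]")
    window = [1 if t < b else 0 for t in range(n)]  # temporal encoding of b: open for b cycles
    # Gate the pulse stream with the window (an AND per cycle) and count surviving pulses.
    return sum(p & w for p, w in zip(pulse_stream(a, n), window))


def htc_mac(xs: list[int], ws: list[int], n: int) -> int:
    """Accumulate HTC products as a MAC unit would; the result is in units of 1/n."""
    return sum(htc_multiply(x, w, n) for x, w in zip(xs, ws))


if __name__ == "__main__":
    n = 16
    print(htc_multiply(8, 8, n) / n)  # 0.5 * 0.5 -> 0.25
```

In this idealized model a single pass of n cycles suffices for one product; real hardware timing, bipolar handling, and rounding in the authors' design may differ.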

Submitted 12 July, 2024; originally announced July 2024.

Comments: 7 pages, 8 figures and 3 tables

arXiv:2406.19958

The Computational Curse of Big Data for Bayesian Additive Regression Trees: A Hitting Time Analysis

Authors: Yan Shuo Tan, Omer Ronen, Theo Saarinen, Bin Yu

Abstract: Bayesian Additive Regression Trees (BART) is a popular Bayesian non-parametric regression model that is commonly used in causal inference and beyond. Its strong predictive performance is supported by theoretical guarantees that its posterior distribution concentrates around the true regression function at optimal rates under various data generative settings and for appropriate prior choices. In this paper, we show that the BART sampler often converges slowly, confirming empirical observations by other researchers. Assuming discrete covariates, we show that, while the BART posterior concentrates on a set comprising all optimal tree structures (smallest bias and complexity), the Markov chain's hitting time for this set increases with $n$ (training sample size), under several common data generative settings. As $n$ increases, the approximate BART posterior thus becomes increasingly different from the exact posterior (for the same number of MCMC samples), contrasting with earlier concentration results on the exact posterior. This contrast is highlighted by our simulations showing worsening frequentist undercoverage for approximate posterior intervals and a growing ratio between the MSE of the approximate posterior and that obtainable by artificially improving convergence via averaging multiple sampler chains. Finally, based on our theoretical insights, we discuss possibilities for improving the convergence of the BART sampler.

Submitted 28 June, 2024; originally announced June 2024.

MSC Class: 62G08; 65C40

arXiv:2406.18897

Resilience of the surface code to error bursts

Authors: Shi Jie Samuel Tan, Christopher A. Pattison, Matt McEwen, John Preskill

Abstract: Quantum error correction works effectively only if the error rate of gate operations is sufficiently low. However, some rare physical mechanisms can cause a temporary increase in the error rate that affects many qubits; examples include ionizing radiation in superconducting hardware and large deviations in the global control of atomic systems. We refer to such rare transient spikes in the gate error rate as error bursts. In this work, we investigate the resilience of the surface code to generic error bursts. We assume that, after appropriate mitigation strategies, the spike in the error rate lasts for only a single syndrome extraction cycle; we also assume that the enhanced error rate is uniform across the code block. Under these assumptions, and for a circuit-level depolarizing noise model, we perform Monte Carlo simulations to determine the regime in burst error rate and background error rate for which the memory time becomes arbitrarily long as the code block size grows. Our results indicate that suitable hardware mitigation methods combined with standard decoding methods may suffice to protect against transient error bursts in the surface code.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.13124

Learning to Generate Answers with Citations via Factual Consistency Models

Authors: Rami Aly, Zhiqiang Tang, Samson Tan, George Karypis

Abstract: Large Language Models (LLMs) frequently hallucinate, impeding their reliability in mission-critical situations. One approach to address this issue is to provide citations to relevant sources alongside generated content, enhancing the verifiability of generations. However, citing passages accurately in answers remains a substantial challenge. This paper proposes a weakly-supervised fine-tuning method leveraging factual consistency models (FCMs). Our approach alternates between generating texts with citations and supervised fine-tuning with FCM-filtered citation data. Focused learning is integrated into the objective, directing the fine-tuning process to emphasise the factual unit tokens, as measured by an FCM. Results on the ALCE few-shot citation benchmark with various instruction-tuned LLMs demonstrate superior performance compared to in-context learning, vanilla supervised fine-tuning, and state-of-the-art methods, with an average improvement of $34.1$, $15.5$, and $10.5$ citation F$_1$ points, respectively. Moreover, in a domain transfer setting we show that the obtained citation generation ability robustly transfers to unseen datasets. Notably, our citation improvements contribute to the lowest factual error rate across baselines.
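The gains above are reported in citation F$_1$ points. Assuming citation F$_1$ is the usual harmonic mean of citation precision and citation recall (both expressed in percentage points), it can be computed with a small helper like the one below; the function name is illustrative, and the benchmark's own scoring of individual citations is not reproduced.

```python
def citation_f1(precision: float, recall: float) -> float:
    """Harmonic mean of citation precision and recall, both given in percentage points."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when no citation is supported
    return 2 * precision * recall / (precision + recall)


print(citation_f1(60.0, 40.0))  # 48.0
```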

Submitted 18 June, 2024; originally announced June 2024.

Comments: Accepted to ACL 2024. Code release will follow

arXiv:2406.12800

Supporting Human Raters with the Detection of Harmful Content using Large Language Models

Authors: Kurt Thomas, Patrick Gage Kelley, David Tao, Sarah Meiklejohn, Owen Vallis, Shunwen Tan, Blaž Bratanič, Felipe Tiengo Ferreira, Vijay Kumar Eranti, Elie Bursztein

Abstract: In this paper, we explore the feasibility of leveraging large language models (LLMs) to automate or otherwise assist human raters with identifying harmful content including hate speech, harassment, violent extremism, and election misinformation. Using a dataset of 50,000 comments, we demonstrate that LLMs can achieve 90% accuracy when compared to human verdicts. We explore how to best leverage these capabilities, proposing five design patterns that integrate LLMs with human rating, such as pre-filtering non-violative content, detecting potential errors in human rating, or surfacing critical context to support human rating. We outline how to support all of these design patterns using a single, optimized prompt. Beyond these synthetic experiments, we share how piloting our proposed techniques in a real-world review queue yielded a 41.5% improvement in optimizing available human rater capacity, and a 9–11% increase (absolute) in precision and recall for detecting violative content.
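One of the listed design patterns, pre-filtering non-violative content, can be sketched as a simple router. This assumes the LLM returns a score in [0, 1] for how likely a comment is to be violative; the `Comment` type, the `llm_score` field, and the 0.1 threshold are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    llm_score: float  # hypothetical LLM-estimated probability that the comment is violative


def prefilter(comments: list[Comment],
              threshold: float = 0.1) -> tuple[list[Comment], list[Comment]]:
    """Auto-clear comments scored below the threshold; queue the rest for human raters."""
    cleared: list[Comment] = []
    to_review: list[Comment] = []
    for comment in comments:
        if comment.llm_score < threshold:
            cleared.append(comment)
        else:
            to_review.append(comment)
    return cleared, to_review


cleared, to_review = prefilter([Comment("hello", 0.02), Comment("threat", 0.93)])
print(len(cleared), len(to_review))  # 1 1
```

Lowering the threshold sends more comments to human raters, trading rater capacity for a lower chance of auto-clearing violative content.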

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12649

Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models

Authors: Hengyi Wang, Shiwei Tan, Hao Wang

Abstract: Vision transformers (ViTs) have emerged as a significant area of focus, particularly for their capacity to be jointly trained with large language models and to serve as robust vision foundation models. Yet, the development of trustworthy explanation methods for ViTs has lagged, particularly in the context of post-hoc interpretations of ViT predictions. Existing sub-image selection approaches, such as feature-attribution and conceptual models, fall short in this regard. This paper proposes five desiderata for explaining ViTs (faithfulness, stability, sparsity, multi-level structure, and parsimony) and demonstrates the inadequacy of current methods in meeting these criteria comprehensively. We introduce a variational Bayesian explanation framework, dubbed ProbAbilistic Concept Explainers (PACE), which models the distributions of patch embeddings to provide trustworthy post-hoc conceptual explanations. Our qualitative analysis reveals the distributions of patch-level concepts, elucidating the effectiveness of ViTs by modeling the joint distribution of patch embeddings and ViT's predictions. Moreover, these patch-level explanations bridge the gap between image-level and dataset-level explanations, thus completing the multi-level structure of PACE. Through extensive experiments on both synthetic and real-world datasets, we demonstrate that PACE surpasses state-of-the-art methods in terms of the defined desiderata.

Submitted 18 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

Comments: Accepted at ICML 2024

arXiv:2406.12313

A framework for developing a knowledge management platform

Authors: Marie Lisandra Zepeda Mendoza, Sonali Agarwal, James A. Blackshaw, Vanesa Bol, Audrey Fazzi, Filippo Fiorini, Amy Louise Foreman, Nancy George, Brett R. Johnson, Brian Martin, Dave McComb, Euphemia Mutasa-Gottgens, Helen Parkinson, Martin Romacker, Rolf Russell, Valérien Ségard, Shawn Zheng Kai Tan, Wei Kheng Teh, F. P. Winstanley, Benedict Wong, Adrian M. Smith

Abstract: Knowledge management (KM) involves collecting, organizing, storing, and disseminating information to improve decision-making, innovation, and performance. Implementing KM at scale has become essential for organizations to effectively leverage vast accessible data. This paper is a compilation of concepts that emerged from KM workshops hosted by EMBL-EBI, attended by SMEs and industry. We provide guidance on envisioning, executing, evaluating, and evolving knowledge management platforms. We emphasize essential considerations such as setting knowledge domain boundaries and measuring success, as well as the importance of making knowledge accessible for downstream applications and non-computational users, and we highlight the personal and organizational skills necessary for success. We stress the importance of collaboration and the need for convergence on shared principles and commitment to provide or seek resources to advance KM. The community is invited to join the journey of KM and contribute to the advancement of the field by applying and improving on the guidelines described.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 18 pages, 1 figure

arXiv:2406.11230

Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models

Authors: Hengyi Wang, Haizhou Shi, Shiwei Tan, Weiyi Qin, Wenyuan Wang, Tunyu Zhang, Akshay Nambi, Tanuja Ganu, Hao Wang

Abstract: Multimodal Large Language Models (MLLMs) have shown significant promise in various applications, leading to broad interest from researchers and practitioners alike. However, a comprehensive evaluation of their long-context capabilities remains underexplored. To address this gap, we introduce the MultiModal Needle-in-a-haystack (MMNeedle) benchmark, specifically designed to assess the long-context capabilities of MLLMs. Besides multi-image input, we employ image stitching to further increase the input context length, and develop a protocol to automatically generate labels for sub-image level retrieval. Essentially, MMNeedle evaluates MLLMs by stress-testing their capability to locate a target sub-image (needle) within a set of images (haystack) based on textual instructions and descriptions of image contents. This setup necessitates an advanced understanding of extensive visual contexts and effective information retrieval within long-context image inputs. With this benchmark, we evaluate state-of-the-art MLLMs, encompassing both API-based and open-source models. The findings reveal that GPT-4o consistently surpasses other models in long-context scenarios, but suffers from hallucination problems in negative samples, i.e., when needles are not in the haystacks. Our comprehensive long-context evaluation of MLLMs also sheds light on the considerable performance gap between API-based and open-source models.
All the code, data, and instructions required to reproduce the main results are available at https://github.com/Wang-ML-Lab/multimodal-needle-in-a-haystack.
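Because each stitched image packs several sub-images, a sub-image retrieval label needs both the image index and the position inside that image. A minimal sketch of such a labeling scheme, assuming each haystack image is a grid × grid mosaic filled row by row and that the needle is identified by a flat index (the function name and conventions are assumptions, not the benchmark's actual code):

```python
def needle_label(flat_index: int, num_images: int, grid: int) -> tuple[int, int, int]:
    """Map a flat sub-image index to (image, row, column) in a haystack of stitched images.

    Each of the num_images images is assumed to be a grid x grid mosaic filled row by row.
    """
    per_image = grid * grid
    if not 0 <= flat_index < num_images * per_image:
        raise ValueError("needle index is outside the haystack")
    image, within = divmod(flat_index, per_image)
    row, col = divmod(within, grid)
    return image, row, col


print(needle_label(17, num_images=10, grid=4))  # (1, 0, 1)
```

For a negative sample, where the needle is absent, the expected answer would be a "not found" label rather than a position.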

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.10290

MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases

Authors: Rithesh Murthy, Liangwei Yang, Juntao Tan, Tulika Manoj Awalgaonkar, Yilun Zhou, Shelby Heinecke, Sachin Desai, Jason Wu, Ran Xu, Sarah Tan, Jianguo Zhang, Zhiwei Liu, Shirley Kokane, Zuxin Liu, Ming Zhu, Huan Wang, Caiming Xiong, Silvio Savarese

Abstract: The deployment of Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices has gained significant attention due to the benefits of enhanced privacy, stability, and personalization. However, the hardware constraints of mobile devices necessitate the use of models with fewer parameters and model compression techniques like quantization. Currently, there is limited understanding of quantization's impact on various task performances, including LLM tasks, LMM tasks, and, critically, trust and safety. There is a lack of adequate tools for systematically testing these models on mobile devices. To address these gaps, we introduce MobileAIBench, a comprehensive benchmarking framework for evaluating mobile-optimized LLMs and LMMs. MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices. Our two-part open-source framework includes a library for running evaluations on desktops and an iOS app for on-device latency and hardware utilization measurements. Our thorough analysis aims to accelerate mobile AI research and deployment by providing insights into the performance and feasibility of deploying LLMs and LMMs on mobile platforms.
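To make "quantization levels" concrete: one common scheme, not necessarily the one used by any model MobileAIBench evaluates, is symmetric per-tensor integer quantization, where weights are scaled so that the largest magnitude maps to $2^{b-1}-1$ and then rounded. The sketch below uses hypothetical function names and shows how a lower bit width enlarges the rounding error.

```python
def quantize_symmetric(weights: list[float], bits: int = 8) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization of real weights to signed integers of the given width."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the symmetric range [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map integer levels back to approximate real-valued weights."""
    return [v * scale for v in q]


weights = [0.91, -0.42, 0.07, -1.30]
for bits in (8, 4):
    q, scale = quantize_symmetric(weights, bits)
    error = max(abs(w - r) for w, r in zip(weights, dequantize(q, scale)))
    print(bits, q, round(error, 4))
```

The worst-case reconstruction error is half a quantization step (scale / 2), and the step grows roughly 16-fold when going from 8 to 4 bits.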

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07866

Asymptotically Optimal Regret for Black-Box Predict-then-Optimize

Authors: Samuel Tan, Peter I. Frazier

Abstract: We consider the predict-then-optimize paradigm for decision-making in which a practitioner (1) trains a supervised learning model on historical data of decisions, contexts, and rewards, and then (2) uses the resulting model to make future binary decisions for new contexts by finding the decision that maximizes the model's predicted reward. This approach is common in industry. Past analysis assumes that rewards are observed for all actions for all historical contexts, which is possible only in problems with special structure. Motivated by problems from ads targeting and recommender systems, we study new black-box predict-then-optimize problems that lack this special structure and where we only observe the reward from the action taken. We present a novel loss function, which we call Empirical Soft Regret (ESR), designed to significantly improve reward when used in training compared to classical accuracy-based metrics like mean-squared error. This loss function targets the regret achieved when taking a suboptimal decision; because the regret is generally not differentiable, we propose a differentiable "soft" regret term that allows the use of neural networks and other flexible machine learning models dependent on gradient-based training. In the particular case of paired data, we show theoretically that optimizing our loss function yields asymptotically optimal regret within the class of supervised learning models.
We also show that our approach significantly outperforms state-of-the-art benchmark methods from contextual bandits and conditional average treatment effect estimation on real-world decision-making problems in news recommendation and personalized healthcare.
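The abstract describes replacing the non-differentiable regret with a "soft" version. The snippet below is not the paper's ESR loss; it is a generic illustration of the idea for a single binary decision, assuming (purely for illustration) that the rewards of both actions are known for the context. The hard decision (argmax of the predicted rewards) is replaced by a sigmoid with a temperature parameter, which makes the expected regret smooth in the predictions. All function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hard_regret(pred_1: float, pred_0: float, reward_1: float, reward_0: float) -> float:
    """Regret of choosing the action with the higher predicted reward (ties go to action 1)."""
    chosen = reward_1 if pred_1 >= pred_0 else reward_0
    return max(reward_1, reward_0) - chosen


def soft_regret(pred_1: float, pred_0: float, reward_1: float, reward_0: float,
                temperature: float = 1.0) -> float:
    """Smooth surrogate: expected regret when action 1 is chosen with a sigmoid probability."""
    p_1 = sigmoid((pred_1 - pred_0) / temperature)
    expected_reward = p_1 * reward_1 + (1.0 - p_1) * reward_0
    return max(reward_1, reward_0) - expected_reward


print(hard_regret(0.2, 0.8, 1.0, 0.0))  # 1.0: the wrong action is chosen
print(soft_regret(0.2, 0.8, 1.0, 0.0))  # strictly between 0 and 1, and differentiable
print(soft_regret(0.2, 0.8, 1.0, 0.0, temperature=0.01))  # close to the hard regret
```

As the temperature shrinks, the soft regret approaches the hard regret, while larger temperatures give smoother gradients for training.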

Submitted 12 June, 2024; originally announced June 2024.

Comments: 15 pages, 2 figures, 3 tables

arXiv:2405.16003

Disentangling Heterogeneous Knowledge Concept Embedding for Cognitive Diagnosis on Untested Knowledge

Authors: Kui Xiao, Runtian Xing, Miao Zhang, Shunfeng Tan, Ziming Wang, Xiaolian Zhu

Abstract: Cognitive diagnosis is a fundamental and critical task in learning assessment, which aims to infer students' proficiency on knowledge concepts from their response logs. Current works assume each knowledge concept will certainly be tested and covered by multiple exercises. However, in both online and offline courses, it is hardly feasible to cover all knowledge concepts in several exercises. Restricted tests lead to undiscovered knowledge deficits, especially in untested knowledge concepts (UKCs). In this paper, we propose a novel Disentangling Heterogeneous Knowledge Cognitive Diagnosis framework on untested knowledge (DisKCD). Specifically, we leverage course grades, exercise questions, and resources to learn the potential representations of students, exercises, and knowledge concepts. In particular, knowledge concepts are disentangled into tested and untested ones based on the limited set of exercises actually administered. We construct a heterogeneous relation graph network via students, exercises, tested knowledge concepts (TKCs), and UKCs. Then, through a hierarchical heterogeneous message-passing mechanism, the fine-grained relations are incorporated into the embeddings of the entities. Finally, the embeddings are applied to multiple existing cognitive diagnosis models to infer students' proficiency on UKCs. Experimental results on real-world datasets show that the proposed model can effectively improve the performance of the task of diagnosing students' proficiency on UKCs.
Our anonymous code is available at https://anonymous.4open.science/r/DisKCD.
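The split of knowledge concepts into tested and untested ones can be illustrated with a Q-matrix-style mapping from exercises to the concepts they cover. This is a minimal sketch under that assumption, with hypothetical names and toy data; DisKCD's construction of the heterogeneous relation graph and its message passing are not reproduced here.

```python
def split_concepts(q_matrix: dict[str, set[str]],
                   all_concepts: set[str]) -> tuple[set[str], set[str]]:
    """Return (tested, untested) knowledge concepts given exercise -> concept coverage."""
    covered = set().union(*q_matrix.values())  # concepts probed by at least one exercise
    tested = covered & all_concepts
    untested = all_concepts - tested  # UKCs: no administered exercise covers them
    return tested, untested


q_matrix = {"e1": {"fractions", "ratios"}, "e2": {"ratios"}}
concepts = {"fractions", "ratios", "percentages"}
tested, untested = split_concepts(q_matrix, concepts)
print(sorted(tested), sorted(untested))  # ['fractions', 'ratios'] ['percentages']
```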

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.14782

Lessons from the Trenches on Reproducible Evaluation of Language Models

Authors: Stella Biderman, Hailey Schoelkopf, Lintang Sutawika, Leo Gao, Jonathan Tow, Baber Abbasi, Alham Fikri Aji, Pawan Sasanka Ammanamanchi, Sidney Black, Jordan Clive, Anthony DiPofi, Julen Etxaniz, Benjamin Fattori, Jessica Zosa Forde, Charles Foster, Jeffrey Hsu, Mimansa Jaiswal, Wilson Y. Lee, Haonan Li, Charles Lovering, Niklas Muennighoff, Ellie Pavlick, Jason Phang, Aviya Skowron, Samson Tan, et al. (5 additional authors not shown)

Abstract: Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the lack of reproducibility and transparency. In this paper we draw on three years of experience in evaluating large language models to provide guidance and lessons for researchers. First, we provide an overview of common challenges faced in language model evaluation. Second, we delineate best practices for addressing or lessening the impact of these challenges on research. Third, we present the Language Model Evaluation Harness (lm-eval): an open source library for independent, reproducible, and extensible evaluation of language models that seeks to address these issues. We describe the features of the library as well as case studies in which the library has been used to alleviate these methodological concerns.

Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.12462

Boosting X-formers with Structured Matrix for Long Sequence Time Series Forecasting

Authors: Zhicheng Zhang, Yong Wang, Shaoqi Tan, Bowei Xia, Yujie Luo

Abstract: Transformer-based models for long sequence time series forecasting (LSTF) problems have gained significant attention due to their exceptional forecasting precision. As the cornerstone of these models, the self-attention mechanism poses a challenge to efficient training and inference due to its quadratic time complexity. In this article, we propose a novel architectural design for Transformer-based models in LSTF, leveraging a substitution framework that incorporates Surrogate Attention Blocks and Surrogate FFN Blocks. The framework aims to boost any well-designed model's efficiency without sacrificing its accuracy. We further establish the equivalence of the Surrogate Attention Block to the self-attention mechanism in terms of both expressiveness and trainability. Through extensive experiments encompassing nine Transformer-based models across five time series tasks, we observe an average performance improvement of 9.45% while achieving a significant reduction in model size by 46%.

Submitted 22 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: We believe this work is premature and requires further study

arXiv:2405.09386

Quantum vertex algebra associated to quantum toroidal $\mathfrak{gl}_N$

Authors: Fulin Chen, Xin Huang, Fei Kong, Shaobin Tan

Abstract: In this paper, we associate the quantum toroidal algebra $\mathcal{E}_N$ of type $\mathfrak{gl}_N$ with quantum vertex algebras through equivariant $\varphi$-coordinated quasi modules. More precisely, for every $\ell\in \mathbb{C}$, by deforming the universal affine vertex algebra of $\mathfrak{sl}_\infty$, we construct an $\hbar$-adic quantum $\mathbb{Z}$-vertex algebra $V_{\widehat{\mathfrak{sl}}_{\infty},\hbar}(\ell,0)$. Then we prove that the category of restricted $\mathcal{E}_N$-modules of level $\ell$ is canonically isomorphic to that of equivariant $\varphi$-coordinated quasi $V_{\widehat{\mathfrak{sl}}_{\infty},\hbar}(\ell,0)$-modules.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.05413

Digital Evolution: Novo Nordisk's Shift to Ontology-Based Data Management

Authors: Shawn Zheng Kai Tan, Shounak Baksi, Thomas Gade Bjerregaard, Preethi Elangovan, Thrishna Kuttikattu Gopalakrishnan, Darko Hric, Joffrey Joumaa, Beidi Li, Kashif Rabbani, Santhosh Kannan Venkatesan, Joshua Daniel Valdez, Saritha Vettikunnel Kuriakose

Abstract: Biomedical data is growing exponentially, and managing it is increasingly challenging. While the Findable, Accessible, Interoperable and Reusable (FAIR) data principles provide guidance, their adoption has proven difficult, especially in larger enterprises like pharmaceutical companies. In this manuscript, we describe how we leverage an Ontology-Based Data Management (OBDM) strategy for digital transformation in Novo Nordisk Research & Early Development. Here, we include both our technical blueprint and our approach for organizational change management. We further discuss how such an OBDM ecosystem plays a pivotal role in the organization's digital aspirations for data federation and discovery fuelled by artificial intelligence. Our aim for this paper is to share the lessons learned in order to foster dialogue with parties navigating similar waters while collectively advancing the efforts in the fields of data management, semantics and data-driven drug discovery.

Submitted 10 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

Comments: 14 pages, 2 figures

arXiv:2405.02213

Automatic Programming: Large Language Models and Beyond

Authors: Michael R. Lyu, Baishakhi Ray, Abhik Roychoudhury, Shin Hwei Tan, Patanamon Thongtanunam

Abstract: Automatic programming has seen increasing popularity due to the emergence of tools like GitHub Copilot which rely on Large Language Models (LLMs). At the same time, automatically generated code faces challenges during deployment due to concerns around quality and trust. In this article, we study automated coding in a general sense and examine the concerns around code quality, security and related issues of programmer responsibility. These are key issues for organizations deciding on the usage of automatically generated code. We discuss how advances in software engineering such as program repair and analysis can enable automatic programming. We conclude with a forward-looking view, focusing on the programming environment of the near future, where programmers may need to switch to different roles to fully utilize the power of automatic programming. Automated repair of automatically generated programs from LLMs can help produce higher-assurance code from LLMs, along with evidence of assurance.

Submitted 15 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

arXiv:2405.01548 [pdf]

doi 10.1109/JLT.2023.3304659

Foundry's perspective on laser and SOA module integration with silicon photonics

Authors: James Y. S. Tan, Shawn Xie Wu, Salih Yanikgonul, Chao Li, Patrick Guo-Qiang Lo

Abstract: Silicon photonic integrated circuits (PICs) build on the demand for a low-cost approach from established silicon-based manufacturing infrastructure traditionally built for electronics. Besides its natural abundance, silicon has desirable properties such as low optical loss (at certain critical wavelengths) and a small form factor that enables high-density, scaled-up on-chip optical circuitry. However, given its indirect bandgap, the platform is typically integrated with other direct-bandgap (e.g., III-V semiconductor) platforms for an on-chip light source. An effective solution for integrating a light source onto the silicon photonics platform is integral to a practical, scaled-up and full-fledged integrated photonics implementation. Here, we discuss the integration solutions and present our foundry's perspective toward realizing it.

Submitted 20 February, 2024; originally announced May 2024.

Comments: 14 pages

Journal ref: IEEE J Lightwave Technol. vol. 42, no. 3, pp. 1062-1074, 2024

arXiv:2405.01350 [pdf, other]

Community-Invariant Graph Contrastive Learning

Authors: Shiyin Tan, Dongyuan Li, Renhe Jiang, Ying Zhang, Manabu Okumura

Abstract: Graph augmentation has received great attention in recent years for graph contrastive learning (GCL) to learn well-generalized node/graph representations. However, mainstream GCL methods often favor randomly disrupting graphs for augmentation, which shows limited generalization and inevitably leads to the corruption of high-level graph information, i.e., the graph community. Moreover, current knowledge-based graph augmentation methods can only focus on either topology or node features, causing the model to lack robustness against various types of noise. To address these limitations, this research investigated the role of the graph community in graph augmentation and identified its crucial advantage for learnable graph augmentation. Based on our observations, we propose a community-invariant GCL framework to maintain graph community structure during learnable graph augmentation. By maximizing the spectral changes, this framework unifies the constraints of both topology and feature augmentation, enhancing the model's robustness. Empirical evidence on 21 benchmark datasets demonstrates the distinct merits of our framework. Code is released on GitHub (https://github.com/ShiyinTan/CI-GCL.git).

Submitted 2 May, 2024; originally announced May 2024.

Comments: This paper is accepted by ICML-2024

arXiv:2404.19179 [pdf, other]

On the Determining Physical Factor of Jet-Related Coronal Mass Ejection's Morphology in the High Corona

Authors: Yadan Duan, Yuandeng Shen, Zehao Tang, Chenrui Zhou, Song Tan

Abstract: A solar jet can often cause coronal mass ejections (CMEs) with different morphologies in the high corona, for example, jet-like CMEs, bubble-like CMEs, and so-called twin CMEs that include a pair of simultaneous jet-like and bubble-like CMEs. However, what determines the morphology of a jet-related CME is still an open question. Using high spatiotemporal resolution stereoscopic observations taken by the Solar Dynamics Observatory (SDO) and the Solar Terrestrial Relations Observatory (STEREO) from October 2010 to December 2012, we performed a statistical study of jet-related CMEs to investigate the potential physical factors that determine the morphology of CMEs in the outer corona. Our statistical sample includes 16 jet-related CME events, of which 7 are twin CME events and 9 are jet-like narrow CMEs. We find that all CMEs in our sample were accompanied by filament-driven blowout jets and Type III radio bursts during their initial formation and involved magnetic reconnection between filament channels and the surrounding magnetic fields. Most of our cases occurred in a fan-spine magnetic configuration. Our study suggests that the bubble-like components of twin CMEs lacking an obvious core are related to the expansion of the closed-loop systems next to the fan-spine topology, while the jet-like component is from the coronal extension of the jet plasma along open fields.
Based on the statistical results, we conclude that the morphology of jet-related CMEs in the high corona may be related to the filament length and the initial magnetic null point height of the fan-spine structures.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 15 pages, 15 figures, 2 tables, accepted by ApJ

arXiv:2404.18391 [pdf, other]

Broad and Bi-directional narrow quasi-periodic fast-propagating wave trains associated with a filament-driven halo CME on 2023 April 21

Authors: Xinping Zhou, Yuandeng Shen, Yihua Yan, Ke Yu, Zhining Qu, Ahmed Ahmed Ibrahim, Zehao Tang, Chengrui Zhou, Song Tan, Ye Qiu, Hongfei Liang

Abstract: This paper presents three distinct wave trains that occurred on 2023 April 21: a broad quasi-periodic fast-propagating (QFP) wave train and a pair of bi-directional narrow QFP wave trains. The broad QFP wave train expands outward in a circular wavefront, while the bi-directional narrow QFP wave trains propagate in the northward and southward directions, respectively. The concurrent presence of the wave trains offers a remarkable opportunity to investigate their respective triggering mechanisms. Measurements show that the broad QFP wave train's speed is 300-1100 km/s in different propagating directions. There is a significant difference in the speed of the bi-directional narrow QFP wave trains: the southward propagation reaches 1400 km/s, while the northward propagation only reaches about 550 km/s, accompanied by a deceleration of about 1-2 km s$^{-2}$. Using wavelet analysis, we find that the periodicity of the propagating wave trains in the southward and northward directions closely matches the quasi-periodic pulsations (QPPs) exhibited by the flares. Based on these results, the narrow QFP wave trains were most likely excited by the intermittent energy release in the accompanying flare. In contrast, the broad QFP wave train had a tight relationship with the erupting filament, probably attributable to the unwinding motion of the erupting filament or the leakage of the fast sausage wave train inside the filament body.

Submitted 28 April, 2024; originally announced April 2024.

Comments: 11 pages, 5 figures, accepted by ApJ

arXiv:2404.17126 [pdf, other]

Deep Evidential Learning for Dose Prediction

Authors: Hai Siong Tan, Kuancheng Wang, Rafe Mcbeth

Abstract: In this work, we present a novel application of an uncertainty-quantification framework called Deep Evidential Learning in the domain of radiotherapy dose prediction. Using medical images of the Open Knowledge-Based Planning Challenge dataset, we found that this model can be effectively harnessed to yield uncertainty estimates that inherited correlations with prediction errors upon completion of network training. This was achieved only after reformulating the original loss function for a stable implementation. We found that (i) epistemic uncertainty was highly correlated with prediction errors, with various association indices comparable to or stronger than those for Monte-Carlo Dropout and Deep Ensemble methods, (ii) the median error varied with uncertainty threshold much more linearly for epistemic uncertainty in Deep Evidential Learning relative to these other two conventional frameworks, indicative of a more uniformly calibrated sensitivity to model errors, and (iii) relative to epistemic uncertainty, aleatoric uncertainty demonstrated a more significant shift in its distribution in response to Gaussian noise added to CT intensity, compatible with its interpretation as reflecting data noise. Collectively, our results suggest that Deep Evidential Learning is a promising approach that can endow deep-learning models in radiotherapy dose prediction with statistical robustness. Towards enhancing its clinical relevance, we demonstrate how such a model can be used to construct confidence intervals for the predicted Dose-Volume Histograms.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 24 pages, 8 figures

arXiv:2404.15163 [pdf, other]

Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment

Authors: Tianwei Zhou, Songbai Tan, Wei Zhou, Yu Luo, Yuan-Gen Wang, Guanghui Yue

Abstract: With the increasing maturity of text-to-image and image-to-image generative models, AI-generated images (AGIs) have shown great application potential in advertisement, entertainment, education, social media, etc. Although remarkable advancements have been achieved in generative models, little effort has been devoted to designing relevant quality assessment models. In this paper, we propose a novel blind image quality assessment (IQA) network, named AMFF-Net, for AGIs. AMFF-Net evaluates AGI quality from three dimensions, i.e., "visual quality", "authenticity", and "consistency". Specifically, inspired by the characteristics of the human visual system and motivated by the observation that "visual quality" and "authenticity" are characterized by both local and global aspects, AMFF-Net scales the image up and down and takes the scaled images and the original-sized image as inputs to obtain multi-scale features. After that, an Adaptive Feature Fusion (AFF) block is used to adaptively fuse the multi-scale features with learnable weights. In addition, considering the correlation between the image and prompt, AMFF-Net compares the semantic features from the text encoder and image encoder to evaluate the text-to-image alignment. We carry out extensive experiments on three AGI quality assessment databases, and the experimental results show that our AMFF-Net obtains better performance than nine state-of-the-art blind IQA methods. The results of ablation experiments further demonstrate the effectiveness of the proposed multi-scale input strategy and AFF block.

Submitted 23 April, 2024; originally announced April 2024.

Comments: IEEE Transactions on Broadcasting (TBC)

arXiv:2404.13818 [pdf, other]

Joint Liability Model with Adaptation to Climate Change

Authors: Jiayue Zhang, Ken Seng Tan, Tony S. Wirjanto, Lysa Porth

Abstract: This paper extends the application of ESG score assessment methodologies from large corporations to individual farmers' production, within the context of climate change. Our proposal involves the integration of crucial agricultural sustainability variables into conventional personal credit evaluation frameworks, culminating in the formulation of a holistic sustainable credit rating referred to as the Environmental, Social, Economics (ESE) score. This ESE score is integrated into theoretical joint liability models to gain valuable insights into optimal group sizes and individual-ESE score relationships. Additionally, we adopt a mean-variance utility function for farmers to effectively capture the risk associated with anticipated profits. Through a set of simulation exercises, the paper investigates the implications of incorporating ESE scores into credit evaluation systems, offering a nuanced understanding of the repercussions under various climatic conditions.

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.11201 [pdf, other]

Neuron Specialization: Leveraging intrinsic task modularity for multilingual machine translation

Authors: Shaomu Tan, Di Wu, Christof Monz

Abstract: Training a unified multilingual model promotes knowledge transfer but inevitably introduces negative interference. Language-specific modeling methods show promise in reducing interference. However, they often rely on heuristics to distribute capacity and struggle to foster cross-lingual transfer via isolated modules. In this paper, we explore intrinsic task modularity within multilingual networks and leverage these observations to circumvent interference under multilingual translation. We show that neurons in the feed-forward layers tend to be activated in a language-specific manner. Meanwhile, these specialized neurons exhibit structural overlaps that reflect language proximity and progress across layers. Based on these findings, we propose Neuron Specialization, an approach that identifies specialized neurons to modularize feed-forward layers and then continuously updates them through sparse networks. Extensive experiments show that our approach achieves consistent performance gains over strong baselines, with additional analyses demonstrating reduced interference and increased knowledge transfer.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.08877 [pdf, other]

Aligning LLMs for FL-free Program Repair

Authors: Junjielong Xu, Ying Fu, Shin Hwei Tan, Pinjia He

Abstract: Large language models (LLMs) have achieved decent results on automated program repair (APR). However, the next-token prediction training objective of decoder-only LLMs (e.g., GPT-4) is misaligned with the masked span prediction objective of current infilling-style methods, which impedes LLMs from fully leveraging pre-trained knowledge for program repair. In addition, while some LLMs are capable of locating and repairing bugs end-to-end when using related artifacts (e.g., test cases) as input, existing methods regard these as separate tasks and ask LLMs to generate patches at fixed locations. This restriction hinders LLMs from exploring potential patches beyond the given locations. In this paper, we investigate a new approach to adapt LLMs to program repair. Our core insight is that LLMs' APR capability can be greatly improved by simply aligning the output to their training objective and allowing them to refine the whole program without first performing fault localization. Based on this insight, we designed D4C, a straightforward prompting framework for APR. D4C can repair 180 bugs correctly in Defects4J, with each patch being sampled only 10 times. This surpasses the SOTA APR methods with perfect fault localization by 10% and reduces the patch sampling number by 90%. Our findings reveal that (1) objective alignment is crucial for fully exploiting LLMs' pre-trained capability, and (2) replacing the traditional localize-then-repair workflow with direct debugging is more effective for LLM-based APR methods.
Thus, we believe this paper introduces a new mindset for harnessing LLMs in APR.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.07979 [pdf, other]

LLoCO: Learning Long Contexts Offline

Authors: Sijun Tan, Xiuyu Li, Shishir Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa

Abstract: Processing long contexts remains a challenge for large language models (LLMs) due to the quadratic computational and memory overhead of the self-attention mechanism and the substantial KV cache sizes during generation. We propose a novel approach to address this problem by learning contexts offline through context compression and in-domain parameter-efficient finetuning. Our method enables an LLM to create a concise representation of the original context and efficiently retrieve relevant information to answer questions accurately. We introduce LLoCO, a technique that combines context compression, retrieval, and parameter-efficient finetuning using LoRA. Our approach extends the effective context window of a 4k-token LLaMA2-7B model to handle up to 128k tokens. We evaluate our approach on several long-context question-answering datasets, demonstrating that LLoCO significantly outperforms in-context learning while using $30\times$ fewer tokens during inference. LLoCO achieves up to a $7.62\times$ speed-up and substantially reduces the cost of long-document question answering, making it a promising solution for efficient long-context processing. Our code is publicly available at https://github.com/jeffreysijuntan/lloco.

Submitted 11 April, 2024; originally announced April 2024.

Comments: The first two authors contributed equally to this work

arXiv:2404.05200 [pdf]

Quasicrystal bulk and surface energies from density functional theory

Authors: Woohyeon Baek, Sambit Das, Shibo Tan, Vikram Gavini, Wenhao Sun

Abstract: Are quasicrystals stable or metastable? Density functional theory (DFT) is often used to evaluate thermodynamic stability, but quasicrystals are long-range aperiodic and their energies cannot be calculated using conventional ab initio methods. Here, we perform first-principles calculations on quasicrystal nanoparticles of increasing sizes, from which we can directly extrapolate their bulk and surface energies. Using this technique, we determine with high confidence that the icosahedral quasicrystals ScZn$_{7.33}$ and YbCd$_{5.7}$ are ground-state phases, revealing that translational symmetry is not a necessary condition for the $T = 0$ K stability of inorganic solids. Although we find the ScZn$_{7.33}$ quasicrystal to be thermodynamically stable, we show on a mixed thermodynamic and kinetic phase diagram that its solidification from the melt is nucleation-limited, which illustrates why even stable materials may be kinetically challenging to grow. Our techniques here broadly open the door to first-principles investigations into the structure-bonding-stability relationships of aperiodic materials.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.01647 [pdf, other]

EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis

Authors: Shuai Tan, Bin Ji, Mengxiao Bi, Ye Pan

Abstract: Achieving disentangled control over multiple facial motions and accommodating diverse input modalities greatly enhances the applicability and entertainment value of talking head generation. This necessitates a deep exploration of the decoupling space for facial features, ensuring that they a) operate independently without mutual interference and b) can be preserved and shared across different input modalities, both aspects often neglected in existing methods. To address this gap, this paper proposes a novel Efficient Disentanglement framework for Talking head generation (EDTalk). Our framework enables individual manipulation of mouth shape, head pose, and emotional expression, conditioned on video or audio inputs. Specifically, we employ three lightweight modules to decompose the facial dynamics into three distinct latent spaces representing mouth, pose, and expression, respectively. Each space is characterized by a set of learnable bases whose linear combinations define specific motions. To ensure independence and accelerate training, we enforce orthogonality among bases and devise an efficient training strategy to allocate motion responsibilities to each space without relying on external knowledge. The learned bases are then stored in corresponding banks, enabling shared visual priors with audio input. Furthermore, considering the properties of each space, we propose an Audio-to-Motion module for audio-driven talking head synthesis. Experiments are conducted to demonstrate the effectiveness of EDTalk.
We recommend visiting the project website: https://tanshuai0219.github.io/EDTalk/

Submitted 2 April, 2024; originally announced April 2024.

Comments: 22 pages, 15 figures

arXiv:2403.18927 [pdf, other]

Optimal Coherent Quantum Phase Estimation via Tapering

Authors: Dhrumil Patel, Shi Jie Samuel Tan, Yigit Subasi, Andrew T. Sornborger

Abstract: Quantum phase estimation is one of the fundamental primitives that underpins many quantum algorithms, including quantum amplitude estimation, the HHL algorithm for solving linear systems of equations, and quantum principal component analysis. Due to its significance as a subroutine, in this work we study the coherent version of the phase estimation problem, where, given an arbitrary input state and black-box access to unitaries $U$ and controlled-$U$, the goal is to estimate the phases of $U$ in superposition. Most existing phase estimation algorithms employ intermediate measurement steps that inevitably destroy coherence; only a couple of algorithms, including the well-known standard quantum phase estimation algorithm, consider this coherent setting. In this work, we propose an improved version of this standard algorithm that utilizes tapering/window functions. Our algorithm, which we call the tapered quantum phase estimation algorithm, achieves the optimal query complexity (total number of calls to $U$ and controlled-$U$) without requiring a computationally expensive quantum sorting network for median computation, which the standard algorithm uses to boost the success probability arbitrarily close to one. We also show that the tapering functions that we use are optimal by formulating optimization problems with different optimization criteria. Beyond the asymptotic regime, we also provide the non-asymptotic query complexity of our algorithm, as it is crucial for practical implementation.
Finally, we also propose an efficient algorithm that prepares the quantum state corresponding to the optimal tapering function.

Submitted 27 March, 2024; originally announced March 2024.

Comments: 23 pages, 6 figures

Report number: LA-UR-23-30410

arXiv:2403.15132 [pdf, other]

Transfer CLIP for Generalizable Image Denoising

Authors: Jun Cheng, Dong Liang, Shan Tan

Abstract: Image denoising is a fundamental task in computer vision. While prevailing deep learning-based supervised and self-supervised methods have excelled in eliminating in-distribution noise, their susceptibility to out-of-distribution (OOD) noise remains a significant challenge. The recent emergence of the contrastive language-image pre-training (CLIP) model has showcased exceptional capabilities in open-world image recognition and segmentation. Yet, the potential for leveraging CLIP to enhance the robustness of low-level tasks remains largely unexplored. This paper uncovers that certain dense features extracted from the frozen ResNet image encoder of CLIP exhibit distortion-invariant and content-related properties, which are highly desirable for generalizable denoising. Leveraging these properties, we devise an asymmetrical encoder-decoder denoising network, which incorporates dense features, including the noisy image and its multi-scale features from the frozen ResNet encoder of CLIP, into a learnable image decoder to achieve generalizable denoising. A progressive feature augmentation strategy is further proposed to mitigate feature overfitting and improve the robustness of the learnable decoder. Extensive experiments and comparisons conducted across diverse OOD noises, including synthetic noise, real-world sRGB noise, and low-dose CT image noise, demonstrate the superior generalization ability of our method.

Submitted 22 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR2024

arXiv:2403.09917 [pdf]

doi 10.1016/j.pss.2024.105863

The Equilibrium Vapor Pressures of Ammonia and Oxygen Ices at Outer Solar System Temperatures

Authors: B. P. Blakley, Will M. Grundy, Jordan K. Steckloff, Sugata P. Tan, Jennifer Hanley, Anna E. Engle, Stephen C. Tegler, Gerrick E. Lindberg, Shae M. Raposa, Kendall J. Koga, Cecilia L. Thieberger

Abstract: Few laboratory studies have investigated the vapor pressures of the volatiles that may be present as ices in the outer solar system; even fewer studies have investigated these species at the temperatures and pressures relevant to the surfaces of icy bodies in the Saturnian and Uranian systems ($<100$ K, $<10^{-9}$ bar). This study adds to the work of Grundy et al. (2024) in extending the known equilibrium vapor pressures of outer solar system ices through laboratory investigations at very low temperatures. Our experiments with ammonia and oxygen ices provide new thermodynamic models for these species' respective enthalpies of sublimation. We find that ammonia ice, and to a lesser degree oxygen ice, are stable at higher temperatures than extrapolations in previous literature have predicted. Our results show that these ices should be retained over longer periods of time than previous extrapolations would predict, and that a greater amount of these solids is required to support observation in exospheres of airless bodies in the outer solar system.

Submitted 14 March, 2024; originally announced March 2024.

Comments: 29 pages, 9 figures, to be published in Planetary and Space Science

arXiv:2403.08245 [pdf, other]

Scattered Mixture-of-Experts Implementation

Authors: Shawn Tan, Yikang Shen, Rameswar Panda, Aaron Courville

Abstract: We present ScatterMoE, an implementation of Sparse Mixture-of-Experts (SMoE) on GPUs. ScatterMoE builds upon existing implementations, overcoming some of their limitations to improve inference and training speed and memory footprint. This implementation achieves this by avoiding padding and excessive copies of the input. We introduce ParallelLinear, the main component we use to build our implementation, and the various kernels used to speed up the operation. We benchmark our implementation against Megablocks and show that it enables higher throughput and a lower memory footprint. We also show how ParallelLinear enables extension of the Mixture-of-Experts concept, demonstrated with an implementation of Mixture of Attention.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.06375 [pdf, other]

FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization

Authors: Shuai Tan, Bin Ji, Ye Pan

Abstract: Generating emotional talking faces is a practical yet challenging endeavor. To create a lifelike avatar, we draw upon two critical insights from a human perspective: 1) The connection between audio and the non-deterministic facial dynamics, encompassing expressions, blinks, and poses, should exhibit a synchronous, one-to-many mapping. 2) Vibrant expressions are often accompanied by emotion-aware high-definition (HD) textures and finely detailed teeth. However, both aspects are frequently overlooked by existing methods. To this end, this paper proposes FlowVQTalker, which uses normalizing Flow and Vector-Quantization modeling to produce emotional talking faces that satisfy both insights concurrently. Specifically, we develop a flow-based coefficient generator that encodes the dynamics of facial emotion into a multi-emotion-class latent space represented as a mixture distribution. The generation process commences with random sampling from the modeled distribution, guided by the accompanying audio, enabling both lip synchronization and the generation of uncertain nonverbal facial cues. Furthermore, our vector-quantization image generator treats the creation of expressive facial images as a code query task, utilizing a learned codebook to provide rich, high-quality textures that enhance the emotional perception of the results. Extensive experiments are conducted to showcase the effectiveness of our approach.

Submitted 22 April, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

Comments: 11 pages, 11 figures, conference

arXiv:2403.06365 [pdf, other]

Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style

Authors: Shuai Tan, Bin Ji, Ye Pan

Abstract: Although automatically animating audio-driven talking heads has recently received growing interest, previous efforts have mainly concentrated on achieving lip synchronization with the audio, neglecting two crucial elements for generating expressive videos: emotion style and art style. In this paper, we present an innovative audio-driven talking face generation method called Style2Talker. It involves two stylized stages, namely Style-E and Style-A, which integrate text-controlled emotion style and picture-controlled art style into the final output. To prepare the scarce emotional text descriptions corresponding to the videos, we propose a labor-free paradigm that employs large-scale pretrained models to automatically annotate emotional text labels for existing audiovisual datasets. Incorporating the synthetic emotion texts, the Style-E stage utilizes a large-scale CLIP model to extract emotion representations, which are combined with the audio, serving as the condition for an efficient latent diffusion model designed to produce emotional motion coefficients of a 3DMM model. Moving on to the Style-A stage, we develop a coefficient-driven motion generator and an art-specific style path embedded in the well-known StyleGAN. This allows us to synthesize high-resolution artistically stylized talking head videos using the generated emotional motion coefficients and an art style source picture.
Moreover, to better preserve image details and avoid artifacts, we provide StyleGAN with the multi-scale content features extracted from the identity image and refine its intermediate feature maps by the designed content encoder and refinement network, respectively. Extensive experimental results demonstrate that our method outperforms existing state-of-the-art methods in terms of audio-lip synchronization and performance of both emotion style and art style.

Submitted 11 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

Comments: 9 pages, 5 figures, conference

arXiv:2403.06363 [pdf, other]

Say Anything with Any Style

Authors: Shuai Tan, Bin Ji, Yu Ding, Ye Pan

Abstract: Generating stylized talking heads with diverse head motions is crucial for achieving natural-looking videos but remains challenging. Previous works either adopt a regressive method to capture the speaking style, resulting in a coarse style that is averaged across all training data, or employ a universal network to synthesize videos with different styles, which causes suboptimal performance. To address these issues, we propose a novel dynamic-weight method, namely Say Anything with Any Style (SAAS), which queries the discrete style representation via a generative model with a learned style codebook. Specifically, we develop a multi-task VQ-VAE that incorporates three closely related tasks to learn a style codebook as a prior for style extraction. This discrete prior, along with the generative model, enhances the precision and robustness when extracting the speaking styles of the given style clips. By utilizing the extracted style, a residual architecture comprising a canonical branch and a style-specific branch is employed to predict the mouth shapes conditioned on any driving audio while transferring the speaking style from the source to any desired one. To adapt to different speaking styles, we steer clear of employing a universal network by exploring an elaborate HyperStyle to produce the style-specific weight offsets for the style branch. Furthermore, we construct a pose generator and a pose codebook to store the quantized pose representation, allowing us to sample diverse head motions aligned with the audio and the extracted style.
Experiments demonstrate that our approach surpasses state-of-the-art methods in terms of both lip synchronization and stylized expression. In addition, we extend SAAS to the field of video-driven style editing and achieve satisfactory performance.

Submitted 12 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

Comments: 9 pages, 5 figures, conference

arXiv:2403.06217 [pdf, ps, other]

Non-existence of Shimura curves of Mumford type generically in the non-hyperelliptic locus

Authors: Xin Lu, Shengli Tan, Kang Zuo

Abstract: We show that there does not exist any Shimura curve with strictly maximal Higgs field generically in the Torelli locus of non-hyperelliptic curves of genus $g\geq 4$. In particular, Shimura curves of Mumford type are not generically in the Torelli locus of non-hyperelliptic curves of genus $g\geq 4$.

Submitted 10 March, 2024; originally announced March 2024.

Comments: Any comment is welcome

MSC Class: 14J10; 14E30

arXiv:2403.04446 [pdf, other]

Weak Hopf symmetry and tube algebra of the generalized multifusion string-net model

Authors: Zhian Jia, Sheng Tan, Dagomir Kaszlikowski

Abstract: We investigate the multifusion generalization of string-net ground states and lattice Hamiltonians, delving into its associated weak Hopf symmetry. For the multifusion string-net, the gauge symmetry manifests as a general weak Hopf algebra, leading to a reducible vacuum string label; the charge symmetry, serving as a quantum double of gauge symmetry, constitutes a connected weak Hopf algebra. This implies that the associated topological phase retains its characterization by a unitary modular tensor category (UMTC). The bulk charge symmetry can also be captured by a weak Hopf tube algebra. We offer an explicit construction of the weak Hopf tube algebra structure and thoroughly discuss its properties. The gapped boundary and domain wall models are extensively discussed, with these $1d$ phases characterized by unitary multifusion categories (UMFCs). We delve into the gauge and charge symmetries of these $1d$ phases, as well as the construction of the boundary and domain wall tube algebras. Additionally, we illustrate that the domain wall tube algebra can be regarded as a cross product of two boundary tube algebras. As an application of our model, we elucidate how to interpret the defective string-net as a restricted multifusion string-net.

Submitted 14 May, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

Comments: v1: 64 pages

arXiv:2403.04133 [pdf, other]

Towards learning-based planning: The nuPlan benchmark for real-world autonomous driving

Authors: Napat Karnchanachari, Dimitris Geromichalos, Kok Seang Tan, Nanxiang Li, Christopher Eriksen, Shakiba Yaghoubi, Noushin Mehdipour, Gianmarco Bernasconi, Whye Kit Fong, Yiluan Guo, Holger Caesar

Abstract: Machine Learning (ML) has replaced traditional handcrafted methods for perception and prediction in autonomous vehicles. Yet for the equally important planning task, the adoption of ML-based techniques is slow. We present nuPlan, the world's first real-world autonomous driving dataset and benchmark. The benchmark is designed to test the ability of ML-based planners to handle diverse driving situations and to make safe and efficient decisions. To that end, we introduce a new large-scale dataset that consists of 1282 hours of diverse driving scenarios from 4 cities (Las Vegas, Boston, Pittsburgh, and Singapore) and includes high-quality auto-labeled object tracks and traffic light data. We exhaustively mine and taxonomize common and rare driving scenarios which are used during evaluation to get fine-grained insights into the performance and characteristics of a planner. Beyond the dataset, we provide a simulation and evaluation framework that enables a planner's actions to be simulated in closed-loop to account for interactions with other traffic participants. We present a detailed analysis of numerous baselines and investigate gaps between ML-based and traditional methods. Find the nuPlan dataset and code at nuplan.org.

Submitted 6 March, 2024; originally announced March 2024.

Comments: ICRA 2024 camera ready incl. supplementary material

arXiv:2403.02593 [pdf, ps, other]

The Ramsey numbers for trees of order $n$ with maximum degree at least $n-5$ versus the wheel graph of order nine

Authors: Zhi Yee Chng, Thomas Britz, Ta Sheng Tan, Kok Bin Wong

Abstract: The Ramsey numbers $R(T_n,W_8)$ are determined for each tree graph $T_n$ of order $n\geq 7$ and maximum degree $Δ(T_n)$ equal to either $n-4$ or $n-5$. These numbers indicate strong support for the conjecture, due to Chen, Zhang and Zhang and to Hafidh and Baskoro, that $R(T_n,W_m) = 2n-1$ for each tree graph $T_n$ of order $n\geq m-1$ with $Δ(T_n)\leq n-m+2$ when $m\geq 4$ is even.
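The conjecture's hypotheses are plain arithmetic conditions on $n$, $Δ(T_n)$ and $m$; for $m=8$ the degree condition reads $Δ(T_n)\leq n-6$. The hypothetical helpers below (not from the paper) check these conditions for a tree given as an edge list and return the conjectured value $2n-1$; they do not compute Ramsey numbers, and the tree check is only the necessary edge-count test.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]


def order_and_max_degree(edges: List[Edge]) -> Tuple[int, int]:
    """Return (n, Delta) for a tree given as a non-empty edge list."""
    degree: Dict[int, int] = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(degree)
    # Necessary (not sufficient) tree check: n vertices and n - 1 edges.
    if not edges or len(edges) != n - 1:
        raise ValueError("expected a tree with at least one edge")
    return n, max(degree.values())


def conjectured_value(edges: List[Edge], m: int) -> Optional[int]:
    """2n - 1 if the conjecture's hypotheses hold for (T_n, W_m), else None."""
    n, delta = order_and_max_degree(edges)
    if m >= 4 and m % 2 == 0 and n >= m - 1 and delta <= n - m + 2:
        return 2 * n - 1
    return None


path9 = [(i, i + 1) for i in range(8)]  # path on 9 vertices, Delta = 2
star9 = [(0, i) for i in range(1, 9)]   # star on 9 vertices, Delta = 8
print(conjectured_value(path9, 8))  # 17
print(conjectured_value(star9, 8))  # None
```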

Submitted 4 March, 2024; originally announced March 2024.

MSC Class: 05C55; 05D10

arXiv:2403.01229 [pdf, other]

REWIND Dataset: Privacy-preserving Speaking Status Segmentation from Multimodal Body Movement Signals in the Wild

Authors: Jose Vargas Quiros, Chirag Raman, Stephanie Tan, Ekin Gedik, Laura Cabrera-Quiros, Hayley Hung

Abstract: Recognizing speaking in humans is a central task towards understanding social interactions. Ideally, speaking would be detected from individual voice recordings, as done previously for meeting scenarios. However, individual voice recordings are hard to obtain in the wild, especially in crowded mingling scenarios, due to cost, logistics, and privacy concerns. As an alternative, machine learning models trained on video and wearable sensor data make it possible to recognize speech by detecting its related gestures in an unobtrusive, privacy-preserving way. These models themselves should ideally be trained using labels obtained from the speech signal. However, existing mingling datasets do not contain high-quality audio recordings. Instead, speaking status annotations have often been inferred by human annotators from video, without validation of this approach against audio-based ground truth. In this paper we revisit no-audio speaking status estimation by presenting the first publicly available multimodal dataset with high-quality individual speech recordings of 33 subjects in a professional networking event. We present three baselines for no-audio speaking status segmentation: a) from video, b) from body acceleration (chest-worn accelerometer), and c) from body pose tracks. In all cases we predict a 20 Hz binary speaking status signal extracted from the audio, a time resolution not available in previous datasets.
In addition to providing the signals and ground truth necessary to evaluate a wide range of speaking status detection methods, the availability of audio in REWIND makes it suitable for cross-modality studies not feasible with previous mingling datasets. Finally, our flexible data consent setup creates new challenges for multimodal systems under missing modalities.

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2402.18600 [pdf]

Artificial Intelligence and Diabetes Mellitus: An Inside Look Through the Retina

Authors: Yasin Sadeghi Bazargani, Majid Mirzaei, Navid Sobhi, Mirsaeed Abdollahi, Ali Jafarizadeh, Siamak Pedrammehr, Roohallah Alizadehsani, Ru San Tan, Sheikh Mohammed Shariful Islam, U. Rajendra Acharya

Abstract: Diabetes mellitus (DM) predisposes patients to vascular complications. Retinal images and vasculature reflect the body's micro- and macrovascular health. They can be used to diagnose DM complications, including diabetic retinopathy (DR), neuropathy, nephropathy, and atherosclerotic cardiovascular disease, as well as to forecast the risk of cardiovascular events. Artificial intelligence (AI)-enabled systems developed for high-throughput detection of DR using digitized retinal images have been adopted clinically. Beyond DR screening, AI integration also holds immense potential to address challenges associated with the holistic care of the patient with DM. In this work, we aim to comprehensively review the literature for studies on AI applications based on retinal images related to DM diagnosis, prognostication, and management. We will describe the findings of holistic AI-assisted diabetes care, including but not limited to DR screening, and discuss barriers to implementing such systems, including issues concerning ethics, data privacy, equitable access, and explainability. With the ability to evaluate the patient's health status vis-à-vis DM complications as well as to prognosticate the risk of future cardiovascular complications, AI-assisted retinal image analysis has the potential to become a central tool for modern personalized medicine in patients with DM.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 44 pages, 6 figures, 1 table, 166 references

ACM Class: J.3.2; J.3.3

arXiv:2402.18592 [pdf, other]

A$^3$PIM: An Automated, Analytic and Accurate Processing-in-Memory Offloader

Authors: Qingcai Jiang, Shaojie Tan, Junshi Chen, Hong An

Abstract: The performance gap between memory and processor has grown rapidly. Consequently, the energy and wall-clock time costs associated with moving data between the CPU and main memory dominate the overall computational cost. The Processing-in-Memory (PIM) paradigm emerges as a promising architecture that mitigates the need for extensive data movements by strategically positioning computing units close to the memory. Despite the abundant efforts devoted to building a robust and highly available PIM system, identifying PIM-friendly segments of applications poses significant challenges due to the lack of a comprehensive tool to evaluate the intrinsic memory access pattern of each segment. To tackle this challenge, we propose A$^3$PIM: an Automated, Analytic and Accurate Processing-in-Memory offloader. We systematically consider the cross-segment data movement and the intrinsic memory access pattern of each code segment via a static code analyzer. We evaluate A$^3$PIM across a wide range of real-world workloads, including the GAP and PrIM benchmarks, and achieve an average speedup of 2.63x and 4.45x (up to 7.14x and 10.64x) compared to CPU-only and PIM-only executions, respectively.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 6 pages, 4 figures, accepted for presentation at Design, Automation and Test in Europe Conference | The European Event for Electronic System Design & Test (DATE 2024), conference to be held in March 2024

arXiv:2402.17509 [pdf, other]

Extreme Miscalibration and the Illusion of Adversarial Robustness

Authors: Vyas Raina, Samson Tan, Volkan Cevher, Aditya Rawal, Sheng Zha, George Karypis

Abstract: Deep learning-based Natural Language Processing (NLP) models are vulnerable to adversarial attacks, where small perturbations can cause a model to misclassify. Adversarial Training (AT) is often used to increase model robustness. However, we have discovered an intriguing phenomenon: deliberately or accidentally miscalibrating models masks gradients in a way that interferes with adversarial attack search methods, giving rise to an apparent increase in robustness. We show that this observed gain in robustness is an illusion of robustness (IOR), and demonstrate how an adversary can perform various forms of test-time temperature calibration to nullify the aforementioned interference and allow the adversarial attack to find adversarial examples. Hence, we urge the NLP community to incorporate test-time temperature scaling into their robustness evaluations to ensure that any observed gains are genuine. Finally, we show how the temperature can be scaled during \textit{training} to improve genuine robustness.
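The gradient-masking effect and its test-time temperature fix can be seen in a toy, framework-free example. It assumes miscalibration is modelled as multiplying well-behaved logits by a large constant (1000 here, chosen for illustration); it is not the authors' experimental setup. The gradient of cross-entropy on $\mathrm{softmax}(z/T)$ with respect to the logits $z$ is $(\mathrm{softmax}(z/T) - y)/T$.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max for numerical stability; very negative shifts underflow to 0.0.
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ce_grad_wrt_logits(logits: List[float], target: int,
                       temperature: float = 1.0) -> List[float]:
    """d/dz of cross-entropy(softmax(z / T), target), i.e. (softmax(z / T) - onehot) / T."""
    probs = softmax([v / temperature for v in logits])
    return [(p - (1.0 if i == target else 0.0)) / temperature
            for i, p in enumerate(probs)]


# Hypothetical miscalibrated model: well-behaved logits scaled up by 1000.
base_logits = [2.0, 1.0, 0.5]
miscalibrated = [1000.0 * v for v in base_logits]

masked = ce_grad_wrt_logits(miscalibrated, target=0)  # softmax saturates
restored = ce_grad_wrt_logits(miscalibrated, target=0, temperature=1000.0)
print(masked)                           # [0.0, 0.0, 0.0]
print(any(g != 0.0 for g in restored))  # True
```

Dividing logits by a positive temperature never changes the predicted class, so the rescaling alters only the signal a gradient-based attack receives, which is why it can expose robustness that is merely apparent.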

Submitted 30 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.14366 [pdf, other]

Understanding and Detecting Annotation-Induced Faults of Static Analyzers

Authors: Huaien Zhang, Yu Pei, Shuyun Liang, Shin Hwei Tan

Abstract: Static analyzers can reason about the properties and behaviors of programs and detect various issues without executing them. Hence, they should extract the necessary information to understand the analyzed program well. Annotations have been widely used for different purposes in Java since their introduction in Java 5. Annotations can change program structures and convey semantic information without the awareness of static analyzers, consequently leading to imprecise analysis results. This paper presents the first comprehensive study of annotation-induced faults (AIF) by analyzing 246 issues in six popular open-source static analyzers (i.e., PMD, SpotBugs, CheckStyle, Infer, SonarQube, and Soot). We analyzed the issues' root causes, symptoms, and fix strategies and derived ten findings and some practical guidelines for detecting and repairing annotation-induced faults. Moreover, we developed an automated testing framework called AnnaTester based on three metamorphic relations originating from the findings. AnnaTester generated new tests based on the official test suites of static analyzers and unveiled 43 new faults, 20 of which have been fixed. The results confirm the value of our study and its findings.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 23 pages, 16 figures

arXiv:2402.10551 [pdf, other]

Personalised Drug Identifier for Cancer Treatment with Transformers using Auxiliary Information

Authors: Aishwarya Jayagopal, Hansheng Xue, Ziyang He, Robert J. Walsh, Krishna Kumar Hariprasannan, David Shao Peng Tan, Tuan Zea Tan, Jason J. Pitt, Anand D. Jeyasekharan, Vaibhav Rajan

Abstract: Cancer remains a global challenge due to its growing clinical and economic burden. Its uniquely personal manifestation, which makes treatment difficult, has fuelled the quest for personalized treatment strategies. Thus, genomic profiling is increasingly becoming part of clinical diagnostic panels. Effective use of such panels requires accurate drug response prediction (DRP) models, which are challenging to build due to limited labelled patient data. Previous methods to address this problem have used various forms of transfer learning. However, they do not explicitly model the variable-length sequential structure of the list of mutations in such diagnostic panels. Further, they do not utilize auxiliary information (like patient survival) for model training. We address these limitations through a novel transformer-based method, which surpasses the performance of state-of-the-art DRP models on benchmark data. We also present the design of a treatment recommendation system (TRS), which is currently deployed at the National University Hospital, Singapore and is being evaluated in a clinical trial.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.04983 [pdf, ps, other]

Broadband squeezed light field by magnetostriction in an opto-magnomechanical

Authors: Ke Di, Shuai Tan, Anyu Cheng, Yinxue Zhao, Yu Liu, Jiajia Du

Abstract: We present a novel mechanism for generating a wide bandwidth squeezed optical output field in an opto-magnomechanical system. In this system, the magnon (mechanical) mode in the yttrium-iron-garnet crystal is coupled to the microwave field (optical field) through magnetic dipole (radiation pressure) interaction. The magnetostrictive force induced by the yttrium-iron-garnet crystal causes a mechanical displacement and creates a quadrature squeezed magnon mode. Eventually, this quadrature squeezed mechanical mode is transferred to the output optical field through state-swap interaction. Our results demonstrate the optimal parameter range for obtaining a stable squeezed optical output field with a wide bandwidth. Moreover, the squeezed light field exhibits strong robustness to environmental temperature. The new scheme we propose has potential applications in quantum precision measurements, quantum wireless networks, quantum radar, etc.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.02478 [pdf, other]

Why are hyperbolic neural networks effective? A study on hierarchical representation capability

Authors: Shicheng Tan, Huanjing Zhao, Shu Zhao, Yanping Zhang

Abstract: Hyperbolic Neural Networks (HNNs), operating in hyperbolic space, have been widely applied in recent years, motivated by the existence of an optimal embedding in hyperbolic space that can preserve data hierarchical relationships (termed Hierarchical Representation Capability, HRC) more accurately than Euclidean space. However, there is no evidence to suggest that HNNs can achieve this theoretical optimal embedding, leading to much research being built on flawed motivations. In this paper, we propose a benchmark for evaluating HRC and conduct a comprehensive analysis of why HNNs are effective through large-scale experiments. Inspired by the analysis results, we propose several pre-training strategies to enhance HRC and improve the performance of downstream tasks, further validating the reliability of the analysis. Experiments show that HNNs cannot achieve the theoretical optimal embedding. The HRC is significantly affected by the optimization objectives and hierarchical structures, and enhancing HRC through pre-training strategies can significantly improve the performance of HNNs.

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2402.02202 [pdf, other]

Three-body scattering area for particles with infinite or zero scattering length in two dimensions

Authors: Junjie Liang, Shina Tan

Abstract: We derive the asymptotic expansions of the wave function of three particles having equal mass with finite-range interactions and infinite or zero two-dimensional scattering length colliding at zero energy and zero orbital angular momentum, from which a three-body parameter $D$ is defined. The dimension of $D$ is length squared, and we call $D$ three-body scattering area. We find that the ground state energy per particle of a zero-temperature dilute Bose gas with these interactions is approximately $\frac{\hbar^2 D }{6m}ρ^2$, where $ρ$ is the number density of the bosons, $m$ is the mass of each boson, and $\hbar$ is Planck's constant over $2π$. Such a Bose gas is stable at $D\geq 0$ in the thermodynamic limit, and metastable at $D<0$ in the harmonic trap if the number of bosons is less than $N_{cr}\approx 3.6413 \sqrt{\frac{\hbar}{mω|D|}}$, where $ω$ is the angular frequency of the harmonic trap. If the two-body interaction supports bound states, $D$ typically acquires a negative imaginary part, and we find the relation between this imaginary part and the amplitudes of the pair-boson production processes. We derive a formula for the three-body recombination rate constant of the many-boson system in terms of the imaginary part of $D$.
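The two closed-form results quoted above, the energy per particle $\frac{\hbar^2 D}{6m}ρ^2$ and the critical number $N_{cr}\approx 3.6413\sqrt{\hbar/(mω|D|)}$, are straightforward to evaluate. A minimal sketch, assuming oscillator units ($\hbar=m=ω=1$) and purely illustrative values of $D$ and $ρ$; the function names are hypothetical.

```python
import math


def energy_per_particle(D: float, rho: float, hbar: float = 1.0, m: float = 1.0) -> float:
    """Leading-order ground-state energy per particle, hbar^2 * D * rho^2 / (6 m)."""
    return hbar ** 2 * D * rho ** 2 / (6.0 * m)


def critical_number(D: float, omega: float, hbar: float = 1.0, m: float = 1.0) -> float:
    """Approximate maximum boson number for a metastable trapped gas when D < 0."""
    if D >= 0:
        raise ValueError("the metastability bound applies only for D < 0")
    # hbar / (m * omega) is the squared harmonic-oscillator length and D is an
    # area, so hbar / (m * omega * |D|) and hence N_cr are dimensionless.
    return 3.6413 * math.sqrt(hbar / (m * omega * abs(D)))


# Oscillator units (hbar = m = omega = 1) with illustrative values.
print(energy_per_particle(D=6.0, rho=2.0))  # 4.0
print(critical_number(D=-0.01, omega=1.0))  # about 36.4
```

Because $N_{cr}$ scales as $|D|^{-1/2}$, halving $|D|$ raises the metastability limit by a factor of $\sqrt{2}$.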

Submitted 28 April, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

arXiv:2402.01150 [pdf]

Entanglement enhancement of two different magnon modes via nonlinear effect in cavity magnomechanics

Authors: Ke Di, Xi Wang, Shuai Tan, Yinxue Zhao, Yu Liu, Anyu Cheng, Jiajia Du

Abstract: We present a scheme to enhance the entanglement of two different magnon modes in cavity magnomechanics via nonlinear effects. The scheme demonstrates that nonlinear effects enhance the entanglement of the two magnon modes. Moreover, the entanglement of the two magnon modes is also significantly enhanced by microwave parametric amplification (PA) and magnon self-Kerr nonlinearity. Not only do nonlinear effects enhance the strength of entanglement, but they also increase the robustness of entanglement against temperature. Our proposed scheme plays an important role in the research of fundamental theories of quantum physics and quantum information processing theory.

Submitted 2 February, 2024; originally announced February 2024.

Comments: arXiv admin note: text overlap with arXiv:1903.00221 by other authors

arXiv:2401.15234 [pdf, other]

Moving beyond Deletions: Program Simplification via Diverse Program Transformations

Authors: Haibo Wang, Zezhong Xing, Zheng Wang, Chengnian Sun, Shin Hwei Tan

Abstract: To reduce the complexity of software, developers manually simplify programs (known as developer-induced program simplification in this paper), reducing code size while preserving functionality; however, manual simplification is time-consuming and error-prone. To reduce manual effort, rule-based approaches (e.g., refactoring) and deletion-based approaches (e.g., delta debugging) could potentially be applied to automate developer-induced program simplification. However, as there has been little study of how developers simplify programs in open-source software (OSS) projects, it is unclear whether these approaches can be used effectively for developer-induced program simplification. Hence, we present the first study of developer-induced program simplification in OSS projects, focusing on the types of program transformations used, the motivations behind simplifications, and the set of program transformations covered by existing refactoring types. Our study of 382 pull requests from 296 projects reveals gaps in applying existing approaches to automate developer-induced program simplification, and outlines criteria for designing automatic program simplification techniques. Inspired by our study, and to reduce the manual effort in developer-induced program simplification, we propose SimpT5, a tool that automatically produces simplified programs (semantically equivalent programs with fewer source lines of code). SimpT5 is trained on our collected dataset of 92,485 simplified programs with two heuristics: (1) simplified line localization, which encodes the lines changed in simplified programs, and (2) checkers that measure the quality of generated programs. Our evaluation shows that SimpT5 is more effective than prior approaches in automating developer-induced program simplification.
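To make the distinction between deletion and other transformations concrete, here is a hypothetical before/after pair (invented for illustration, not taken from the paper's dataset). The simplified version has fewer source lines and the same behaviour, but deleting lines from the loop version cannot produce it, because the `sum(...)` expression does not appear in the original. The equivalence check is a simple spot test on sample inputs and is an assumption of this sketch; it is not the checkers used by SimpT5, and testing can only find counterexamples, not prove semantic equivalence.

```python
def total_even_before(values):
    # Original version: explicit loop with a conditional accumulator.
    total = 0
    for v in values:
        if v % 2 == 0:
            total += v
    return total


def total_even_after(values):
    # Simplified version: same behaviour expressed as sum() over a generator.
    return sum(v for v in values if v % 2 == 0)


def agree_on(f, g, inputs):
    # Spot-check that two versions return the same result on every sample input.
    return all(f(x) == g(x) for x in inputs)


samples = [[], [1, 2, 3, 4], [-2, -3, 10], list(range(100))]
print(agree_on(total_even_before, total_even_after, samples))  # True
```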

Submitted 26 January, 2024; originally announced January 2024.

Showing 1–50 of 742 results for author: Tan, S