Showing 1–50 of 60 results for author: Tan, D

  1. arXiv:2407.12404  [pdf, other]

    cs.LG

    Analyzing the Generalization and Reliability of Steering Vectors -- ICML 2024

    Authors: Daniel Tan, David Chanin, Aengus Lynch, Dimitrios Kanoulas, Brooks Paige, Adria Garriga-Alonso, Robert Kirk

    Abstract: Steering vectors (SVs) are a new approach to efficiently adjust language model behaviour at inference time by intervening on intermediate model activations. They have shown promise in terms of improving both capabilities and model alignment. However, the reliability and generalisation properties of this approach are unknown. In this work, we rigorously investigate these properties, and show that s…

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2406.09801  [pdf, other]

    cs.CV

    RaNeuS: Ray-adaptive Neural Surface Reconstruction

    Authors: Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

    Abstract: Our objective is to leverage a differentiable radiance field, e.g., NeRF, to reconstruct detailed 3D surfaces in addition to producing the standard novel view renderings. There have been related methods that perform such tasks, usually by utilizing a signed distance field (SDF). However, the state-of-the-art approaches still fail to correctly reconstruct the small-scale details, such as the leaves, ro…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 3DV 2024, oral. In: Proceedings of the IEEE/CVF International Conference on 3D Vision (2023)

  3. arXiv:2406.08989  [pdf, other]

    eess.AS cs.SD

    ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis

    Authors: Dehua Tao, Daxin Tan, Yu Ting Yeung, Xiao Chen, Tan Lee

    Abstract: Representing speech as discretized units has numerous benefits in supporting downstream spoken language processing tasks. However, the approach has been less explored in speech synthesis of tonal languages like Mandarin Chinese. Our preliminary experiments on Chinese speech synthesis reveal the issue of "tone shift", where a synthesized speech utterance contains correct base syllables but incorrec…

    Submitted 13 June, 2024; originally announced June 2024.

  4. arXiv:2405.15095  [pdf, other]

    cs.ET quant-ph

    Compilation for Dynamically Field-Programmable Qubit Arrays with Efficient and Provably Near-Optimal Scheduling

    Authors: Daniel Bochen Tan, Wan-Hsuan Lin, Jason Cong

    Abstract: Dynamically field-programmable qubit arrays based on neutral atoms have high fidelity and highly parallel gates for quantum computing. However, it is challenging for compilers to fully leverage the novel flexibility offered by such hardware while respecting its various constraints. In this study, we break down the compilation for this architecture into three tasks: scheduling, placement, and routi…

    Submitted 23 May, 2024; originally announced May 2024.

  5. arXiv:2404.19664  [pdf, other]

    cs.RO cs.LG

    Towards Generalist Robot Learning from Internet Video: A Survey

    Authors: Robert McCarthy, Daniel C. H. Tan, Dominik Schmidt, Fernando Acero, Nathan Herr, Yilun Du, Thomas G. Thuruthel, Zhibin Li

    Abstract: This survey presents an overview of methods for learning from video (LfV) in the context of reinforcement learning (RL) and robotics. We focus on methods capable of scaling to large internet video datasets and, in the process, extracting foundational knowledge about the world's dynamics and physical human behaviour. Such methods hold great promise for developing general-purpose robots. We open w…

    Submitted 7 June, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: Updated formatting. Reduced paper length and made other minor improvements

  6. arXiv:2404.18369  [pdf, other]

    quant-ph cs.ET

    A SAT Scalpel for Lattice Surgery: Representation and Synthesis of Subroutines for Surface-Code Fault-Tolerant Quantum Computing

    Authors: Daniel Bochen Tan, Murphy Yuezhen Niu, Craig Gidney

    Abstract: Quantum error correction is necessary for large-scale quantum computing. A promising quantum error correcting code is the surface code. For this code, fault-tolerant quantum computing (FTQC) can be performed via lattice surgery, i.e., splitting and merging patches of code. Given the frequent use of certain lattice-surgery subroutines (LaS), it becomes crucial to optimize their design in order to m…

    Submitted 17 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: To appear in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)

  7. arXiv:2404.06224  [pdf, other]

    cs.CL cs.AI cs.LG

    Low-Cost Generation and Evaluation of Dictionary Example Sentences

    Authors: Bill Cai, Clarence Boon Liang Ng, Daniel Tan, Shelvia Hotama

    Abstract: Dictionary example sentences play an important role in illustrating word definitions and usage, but manually creating quality sentences is challenging. Prior works have demonstrated that language models can be trained to generate example sentences. However, they relied on costly customized models and word sense datasets for generation and evaluation of their work. Rapid advancements in foundationa…

    Submitted 9 April, 2024; originally announced April 2024.

  8. arXiv:2403.04765  [pdf, other]

    cs.CV

    Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed

    Authors: Yifan Wang, Xingyi He, Sida Peng, Dongli Tan, Xiaowei Zhou

    Abstract: We present a novel method for efficiently producing semi-dense matches across images. Previous detector-free matcher LoFTR has shown remarkable matching capability in handling large-viewpoint change and texture-poor scenarios but suffers from low efficiency. We revisit its design choices and derive multiple improvements for both efficiency and accuracy. One key observation is that performing the t…

    Submitted 11 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: CVPR 2024; Project page: https://zju3dv.github.io/efficientloftr

  9. arXiv:2402.10551  [pdf, other]

    cs.LG q-bio.QM

    Personalised Drug Identifier for Cancer Treatment with Transformers using Auxiliary Information

    Authors: Aishwarya Jayagopal, Hansheng Xue, Ziyang He, Robert J. Walsh, Krishna Kumar Hariprasannan, David Shao Peng Tan, Tuan Zea Tan, Jason J. Pitt, Anand D. Jeyasekharan, Vaibhav Rajan

    Abstract: Cancer remains a global challenge due to its growing clinical and economic burden. Its uniquely personal manifestation, which makes treatment difficult, has fuelled the quest for personalized treatment strategies. Thus, genomic profiling is increasingly becoming part of clinical diagnostic panels. Effective use of such panels requires accurate drug response prediction (DRP) models, which are chall…

    Submitted 16 February, 2024; originally announced February 2024.

  10. arXiv:2402.03046  [pdf, other]

    cs.LG

    Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

    Authors: Shengyi Huang, Quentin Gallouédec, Florian Felten, Antonin Raffin, Rousslan Fernand Julien Dossa, Yanxiao Zhao, Ryan Sullivan, Viktor Makoviychuk, Denys Makoviichuk, Mohamad H. Danesh, Cyril Roumégous, Jiayi Weng, Chufan Chen, Md Masudur Rahman, João G. M. Araújo, Guorui Quan, Daniel Tan, Timo Klein, Rujikorn Charakorn, Mark Towers, Yann Berthelot, Kinal Mehta, Dipam Chakraborty, Arjun KG, Valentin Charraut, et al. (8 additional authors not shown)

    Abstract: In many Reinforcement Learning (RL) papers, learning curves are useful indicators to measure the effectiveness of RL algorithms. However, the complete raw data of the learning curves are rarely available. As a result, it is usually necessary to reproduce the experiments from scratch, which can be time-consuming and error-prone. We present Open RL Benchmark, a set of fully tracked RL experiments, i…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Under review

  11. arXiv:2401.16356  [pdf, other]

    physics.ins-det cs.LG gr-qc

    cDVGAN: One Flexible Model for Multi-class Gravitational Wave Signal and Glitch Generation

    Authors: Tom Dooney, Lyana Curier, Daniel Tan, Melissa Lopez, Chris Van Den Broeck, Stefano Bromuri

    Abstract: Simulating realistic time-domain observations of gravitational waves (GWs) and GW detector glitches can help in advancing GW data analysis. Simulated data can be used in downstream tasks by augmenting datasets for signal searches, balancing data sets for machine learning, and validating detection schemes. In this work, we present Conditional Derivative GAN (cDVGAN), a novel conditional model in th…

    Submitted 5 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: 20 pages, 17 figures, 5 tables

  12. arXiv:2401.13807  [pdf, other]

    cs.ET quant-ph

    Depth-Optimal Addressing of 2D Qubit Array with 1D Controls Based on Exact Binary Matrix Factorization

    Authors: Daniel Bochen Tan, Shuohao Ping, Jason Cong

    Abstract: Reducing control complexity is essential for achieving large-scale quantum computing. However, reducing control knobs may compromise the ability to independently address each qubit. Recent progress in neutral atom-based platforms suggests that rectangular (row-column) addressing may strike a balance between control granularity and flexibility for 2D qubit arrays. This scheme allows addressing qubi…

    Submitted 22 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  13. arXiv:2311.16241  [pdf, other]

    cs.CV

    SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance

    Authors: Lukas Hoyer, David Joseph Tan, Muhammad Ferjad Naeem, Luc Van Gool, Federico Tombari

    Abstract: In semi-supervised semantic segmentation, a model is trained with a limited number of labeled images along with a large corpus of unlabeled images to reduce the high annotation effort. While previous methods are able to learn good segmentation boundaries, they are prone to confuse classes with similar visual appearance due to the limited supervision. On the other hand, vision-language models (VLMs…

    Submitted 27 November, 2023; originally announced November 2023.

  14. arXiv:2311.16190  [pdf, other]

    quant-ph cs.AR cs.ET

    Q-Pilot: Field Programmable Qubit Array Compilation with Flying Ancillas

    Authors: Hanrui Wang, Daniel Bochen Tan, Pengyu Liu, Yilian Liu, Jiaqi Gu, Jason Cong, Song Han

    Abstract: Neutral atom arrays have become a promising platform for quantum computing, especially the field programmable qubit array (FPQA) endowed with the unique capability of atom movement. This feature allows dynamic alterations in qubit connectivity during runtime, which can reduce the cost of executing long-range gates and improve parallelism. However, this added flexibility introduces new challenges i…

    Submitted 6 May, 2024; v1 submitted 25 November, 2023; originally announced November 2023.

    Comments: 10 pages, 16 figures; Published as a conference paper at DAC 2024

  15. arXiv:2311.15123  [pdf, other]

    quant-ph cs.AR cs.DC

    Atomique: A Quantum Compiler for Reconfigurable Neutral Atom Arrays

    Authors: Hanrui Wang, Pengyu Liu, Daniel Bochen Tan, Yilian Liu, Jiaqi Gu, David Z. Pan, Jason Cong, Umut A. Acar, Song Han

    Abstract: The neutral atom array has gained prominence in quantum computing for its scalability and operation fidelity. Previous works focus on fixed atom arrays (FAAs) that require extensive SWAP operations for long-range interactions. This work explores a novel architecture reconfigurable atom arrays (RAAs), also known as field programmable qubit arrays (FPQAs), which allows for coherent atom movements du…

    Submitted 2 May, 2024; v1 submitted 25 November, 2023; originally announced November 2023.

    Comments: 17 pages, 26 figures; Published as a conference paper at ISCA 2024

  16. arXiv:2309.15487  [pdf, other]

    cs.CV

    Tackling VQA with Pretrained Foundation Models without Further Training

    Authors: Alvin De Jun Tan, Bingquan Shen

    Abstract: Large language models (LLMs) have achieved state-of-the-art results in many natural language processing tasks. They have also demonstrated ability to adapt well to different tasks through zero-shot or few-shot settings. With the capability of these LLMs, researchers have looked into how to adopt them for use with Visual Question Answering (VQA). Many methods require further training to align the i…

    Submitted 27 September, 2023; originally announced September 2023.

  17. arXiv:2309.15486  [pdf, other]

    cs.CV

    Transferability of Representations Learned using Supervised Contrastive Learning Trained on a Multi-Domain Dataset

    Authors: Alvin De Jun Tan, Clement Tan, Chai Kiat Yeo

    Abstract: Contrastive learning has been shown to learn better quality representations than models trained using cross-entropy loss. They also transfer better to downstream datasets from different domains. However, little work has been done to explore the transferability of representations learned using contrastive learning when trained on a multi-domain dataset. In this paper, a study has been conducted using th…

    Submitted 27 September, 2023; originally announced September 2023.

  18. arXiv:2306.04026  [pdf, other]

    cs.LG cs.AI cs.RO

    Value Functions are Control Barrier Functions: Verification of Safe Policies using Control Theory

    Authors: Daniel C. H. Tan, Fernando Acero, Robert McCarthy, Dimitrios Kanoulas, Zhibin Li

    Abstract: Guaranteeing safe behaviour of reinforcement learning (RL) policies poses significant challenges for safety-critical applications, despite RL's generality and scalability. To address this, we propose a new approach to apply verification methods from control theory to learned value functions. By analyzing task structures for safety preservation, we formalize original theorems that establish links b…

    Submitted 5 December, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

  19. Compiling Quantum Circuits for Dynamically Field-Programmable Neutral Atoms Array Processors

    Authors: Daniel Bochen Tan, Dolev Bluvstein, Mikhail D. Lukin, Jason Cong

    Abstract: Dynamically field-programmable qubit arrays (DPQA) have recently emerged as a promising platform for quantum information processing. In DPQA, atomic qubits are selectively loaded into arrays of optical traps that can be reconfigured during the computation itself. Leveraging qubit transport and parallel, entangling quantum operations, different pairs of qubits, even those initially far away, can be…

    Submitted 1 July, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: Version accepted by Quantum. 21 pages, 9 figures, 7 tables. An extended abstract was presented at the 41st International Conference on Computer-Aided Design (ICCAD '22)

    Journal ref: Quantum 8, 1281 (2024)

  20. arXiv:2303.17408  [pdf, other]

    cs.CL

    P-Transformer: A Prompt-based Multimodal Transformer Architecture For Medical Tabular Data

    Authors: Yucheng Ruan, Xiang Lan, Daniel J. Tan, Hairil Rizal Abdullah, Mengling Feng

    Abstract: Medical tabular data, abundant in Electronic Health Records (EHRs), is a valuable resource for diverse medical tasks such as risk prediction. While deep learning approaches, particularly transformer-based models, have shown remarkable performance in tabular data prediction, problems remain for existing work to be effectively adapted to the medical domain, such as under-utilization…

    Submitted 9 January, 2024; v1 submitted 30 March, 2023; originally announced March 2023.

  21. arXiv:2301.10894  [pdf, other]

    cs.RO

    Perceptive Locomotion with Controllable Pace and Natural Gait Transitions Over Uneven Terrains

    Authors: Daniel Chee Hian Tan, Jenny Zhang, Michael Chuah, Zhibin Li

    Abstract: This work developed a learning framework for perceptive legged locomotion that combines visual feedback, proprioceptive information, and active gait regulation of foot-ground contacts. The perception requires only one forward-facing camera to obtain the heightmap, and the active regulation of gait paces and traveling velocity are realized through our formulation of CPG-based high-level imitation o…

    Submitted 30 January, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

  22. arXiv:2212.06145  [pdf, other]

    cs.LG cs.AI cs.CV

    AP: Selective Activation for De-sparsifying Pruned Neural Networks

    Authors: Shiyu Liu, Rohan Ghosh, Dylan Tan, Mehul Motani

    Abstract: The rectified linear unit (ReLU) is a highly successful activation function in neural networks as it allows networks to easily obtain sparse representations, which reduces overfitting in overparameterized networks. However, in network pruning, we find that the sparsity introduced by ReLU, which we quantify by a term called dynamic dead neuron rate (DNR), is not beneficial for the pruned network. I…

    Submitted 9 December, 2022; originally announced December 2022.

    Comments: 16 Pages

  23. arXiv:2212.03398   

    eess.AS cs.CL cs.SD

    Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue

    Authors: Daxin Tan, Nikos Kargas, David McHardy, Constantinos Papayiannis, Antonio Bonafonte, Marek Strelec, Jonas Rohnke, Agis Oikonomou Filandras, Trevor Wood

    Abstract: Entrainment is the phenomenon by which an interlocutor adapts their speaking style to align with their partner in conversations. It has been found in different dimensions, such as acoustic, prosodic, lexical, or syntactic. In this work, we explore and utilize the entrainment phenomenon to improve spoken dialogue systems for voice assistants. We first examine the existence of the entrainment phenomenon in…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: This version has been removed by arXiv administrators because the submitter did not have the right to assign a license at the time of submission

  24. arXiv:2211.11674  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion

    Authors: Dario Pavllo, David Joseph Tan, Marie-Julie Rakotosaona, Federico Tombari

    Abstract: Neural Radiance Fields (NeRF) coupled with GANs represent a promising direction in the area of 3D reconstruction from a single view, owing to their ability to efficiently model arbitrary topologies. Recent work in this area, however, has mostly focused on synthetic datasets where exact ground-truth poses are known, and has overlooked pose estimation, which is important for certain downstream appli…

    Submitted 20 March, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: CVPR 2023. Code and models are available at https://github.com/google-research/nerf-from-image

  25. arXiv:2209.13112  [pdf, other]

    eess.AS cs.SD

    Automated Sex Classification of Children's Voices and Changes in Differentiating Factors with Age

    Authors: Fuling Chen, Roberto Togneri, Murray Maybery, Diana Weiting Tan

    Abstract: Sex classification of children's voices allows for an investigation of the development of secondary sex characteristics which has been a key interest in the field of speech analysis. This research investigated a broad range of acoustic features from scripted and spontaneous speech and applied a hierarchical clustering-based machine learning model to distinguish the sex of children aged between 5 a…

    Submitted 26 September, 2022; originally announced September 2022.

  26. arXiv:2209.12213  [pdf, other]

    cs.CV

    ECO-TR: Efficient Correspondences Finding Via Coarse-to-Fine Refinement

    Authors: Dongli Tan, Jiang-Jiang Liu, Xingyu Chen, Chao Chen, Ruixin Zhang, Yunhang Shen, Shouhong Ding, Rongrong Ji

    Abstract: Modeling sparse and dense image matching within a unified functional correspondence model has recently attracted increasing research interest. However, existing efforts mainly focus on improving matching accuracy while ignoring its efficiency, which is crucial for real-world applications. In this paper, we propose an efficient structure named Efficient Correspondence Transformer (ECO-TR) by finding…

    Submitted 25 September, 2022; originally announced September 2022.

    Comments: Accepted by ECCV 2022

  27. arXiv:2209.06429  [pdf, other]

    cs.LG

    A Hybrid Deep Learning Model-based Remaining Useful Life Estimation for Reed Relay with Degradation Pattern Clustering

    Authors: Chinthaka Gamanayake, Yan Qin, Chau Yuen, Lahiru Jayasinghe, Dominique-Ea Tan, Jenny Low

    Abstract: Reed relay serves as the fundamental component of functional testing, which closely relates to the successful quality inspection of electronics. To provide accurate remaining useful life (RUL) estimation for reed relay, a hybrid deep learning network with degradation pattern clustering is proposed based on the following three considerations. First, multiple degradation behaviors are observed for r…

    Submitted 14 September, 2022; originally announced September 2022.

    Comments: This paper has been accepted by IEEE Transactions on Industrial Informatics

  28. SoftPool++: An Encoder-Decoder Network for Point Cloud Completion

    Authors: Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

    Abstract: We propose a novel convolutional operator for the task of point cloud completion. One striking characteristic of our approach is that, in contrast to related work, it does not require any max-pooling or voxelization operation. Instead, the proposed operator used to learn the point cloud embedding in the encoder extracts permutation-invariant features from the point cloud via a soft-pooling of featur…

    Submitted 8 May, 2022; originally announced May 2022.

    Comments: Accepted in International Journal of Computer Vision

    Journal ref: Int J Comput Vis 130, 1145-1164 (2022)

  29. arXiv:2204.05460  [pdf, other]

    eess.AS cs.CL cs.SD

    CorrectSpeech: A Fully Automated System for Speech Correction and Accent Reduction

    Authors: Daxin Tan, Liqun Deng, Nianzu Zheng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee

    Abstract: This study proposes a fully automated system for speech correction and accent reduction. Consider the application scenario that a recorded speech audio contains certain errors, e.g., inappropriate words, mispronunciations, that need to be corrected. The proposed system, named CorrectSpeech, performs the correction in three steps: recognizing the recorded speech and converting it into time-stamped s…

    Submitted 13 October, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted by ISCSLP 2022

  30. arXiv:2203.17190  [pdf, other]

    eess.AS cs.CL

    Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech

    Authors: Guangyan Zhang, Kaitao Song, Xu Tan, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao

    Abstract: Recently, leveraging BERT pre-training to improve the phoneme encoder in text to speech (TTS) has drawn increasing attention. However, the works apply pre-training with character-based units to enhance the TTS phoneme encoder, which is inconsistent with the TTS fine-tuning that takes phonemes as input. Pre-training only with phonemes as input can alleviate the input mismatch but lack the ability t…

    Submitted 19 July, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted by Interspeech 2022

  31. arXiv:2203.16600  [pdf, other]

    cs.CV cs.AI

    Learning Local Displacements for Point Cloud Completion

    Authors: Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

    Abstract: We propose a novel approach aimed at object and semantic scene completion from a partial scan represented as a 3D point cloud. Our architecture relies on three novel layers that are used successively within an encoder-decoder structure and specifically developed for the task at hand. The first one carries out feature extraction by matching the point features to a set of pre-trained local descripto…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Conference on Computer Vision and Pattern Recognition (CVPR) 2022

  32. arXiv:2201.05675  [pdf, other]

    cs.CV cs.LG

    Transformers in Action: Weakly Supervised Action Segmentation

    Authors: John Ridley, Huseyin Coskun, David Joseph Tan, Nassir Navab, Federico Tombari

    Abstract: The video action segmentation task is regularly explored under weaker forms of supervision, such as transcript supervision, where a list of actions is easier to obtain than dense frame-wise labels. In this formulation, the task presents various challenges for sequence modeling approaches due to the emphasis on action transition points, long sequence lengths, and frame contextualization, making the…

    Submitted 20 January, 2022; v1 submitted 14 January, 2022; originally announced January 2022.

    Comments: Under Review

  33. arXiv:2201.01669  [pdf, other]

    eess.AS cs.LG cs.SD

    Using Deep Learning with Large Aggregated Datasets for COVID-19 Classification from Cough

    Authors: Esin Darici Haritaoglu, Nicholas Rasmussen, Daniel C. H. Tan, Jennifer Ranjani J., Jaclyn Xiao, Gunvant Chaudhari, Akanksha Rajput, Praveen Govindan, Christian Canham, Wei Chen, Minami Yamaura, Laura Gomezjurado, Aaron Broukhim, Amil Khanzada, Mert Pilanci

    Abstract: The Covid-19 pandemic has been one of the most devastating events in recent history, claiming the lives of more than 5 million people worldwide. Even with the worldwide distribution of vaccines, there is an apparent need for affordable, reliable, and accessible screening techniques to serve parts of the World that do not have access to Western medicine. Artificial Intelligence can provide a soluti…

    Submitted 29 March, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

  34. arXiv:2112.13384  [pdf, other]

    cs.LG cs.MM cs.SI

    Will You Dance To The Challenge? Predicting User Participation of TikTok Challenges

    Authors: Lynnette Hui Xian Ng, John Yeh Han Tan, Darryl Jing Heng Tan, Roy Ka-Wei Lee

    Abstract: TikTok is a popular new social media, where users express themselves through short video clips. A common form of interaction on the platform is participating in "challenges", which are songs and dances for users to iterate upon. Challenge contagion can be measured through replication reach, i.e., users uploading videos of their participation in the challenges. The uniqueness of the TikTok platform…

    Submitted 26 December, 2021; originally announced December 2021.

    Comments: Accepted at ASONAM 2021

  35. arXiv:2110.03887  [pdf, other]

    eess.AS cs.SD

    Environment Aware Text-to-Speech Synthesis

    Authors: Daxin Tan, Guangyan Zhang, Tan Lee

    Abstract: This study aims at designing an environment-aware text-to-speech (TTS) system that can generate speech to suit specific acoustic environments. It is also motivated by the desire to leverage massive data of speech audio from heterogeneous sources in TTS system development. The key idea is to model the acoustic environment in speech audio as a factor of data variability and incorporate it as a condi…

    Submitted 6 August, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: Accepted by Interspeech 2022

  36. arXiv:2110.03857  [pdf, other]

    eess.AS cs.CL cs.SD

    A study on the efficacy of model pre-training in developing neural text-to-speech system

    Authors: Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan, Sheng Zhao, Tan Lee

    Abstract: In the development of neural text-to-speech systems, model pre-training with a large amount of non-target speakers' data is a common approach. However, in terms of ultimately achieved system performance for target speaker(s), the actual benefits of model pre-training are uncertain and unstable, depending very much on the quantity and text content of training data. This study aims to understand bet…

    Submitted 7 October, 2021; originally announced October 2021.

  37. arXiv:2108.02821  [pdf, other]

    eess.AS cs.SD

    Applying the Information Bottleneck Principle to Prosodic Representation Learning

    Authors: Guangyan Zhang, Ying Qin, Daxin Tan, Tan Lee

    Abstract: This paper describes a novel design of a neural network-based speech generation model for learning prosodic representation. The problem of representation learning is formulated according to the information bottleneck (IB) principle. A modified VQ-VAE quantized layer is incorporated in the speech generation model to control the IB capacity and adjust the balance between reconstruction power and dise…

    Submitted 5 August, 2021; originally announced August 2021.

    Comments: To appear in Interspeech 2021

  38. arXiv:2107.01554  [pdf, other]

    eess.AS cs.SD

    EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion

    Authors: Daxin Tan, Liqun Deng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee

    Abstract: This paper presents the design, implementation and evaluation of a speech editing system, named EditSpeech, which allows a user to perform deletion, insertion and replacement of words in a given speech utterance, without causing audible degradation in speech quality and naturalness. The EditSpeech system is developed upon a neural text-to-speech (NTTS) synthesis framework. Partial inference and bi…

    Submitted 7 October, 2021; v1 submitted 4 July, 2021; originally announced July 2021.

    Comments: Accepted by ASRU 2021

  39. arXiv:2103.17086  [pdf, other]

    cs.CV cs.AI

    Deep adaptive fuzzy clustering for evolutionary unsupervised representation learning

    Authors: Dayu Tan, Zheng Huang, Xin Peng, Weimin Zhong, Vladimir Mahalec

    Abstract: Cluster assignment of large and complex images is a crucial but challenging task in pattern recognition and computer vision. In this study, we explore the possibility of employing fuzzy clustering in a deep neural network framework. Thus, we present a novel evolutionary unsupervised learning representation model with iterative optimization. It implements the deep adaptive fuzzy clustering (DAFC) s…

    Submitted 31 March, 2021; originally announced March 2021.

  40. arXiv:2103.04699  [pdf, other]

    eess.AS cs.SD

    CUHK-EE Voice Cloning System for ICASSP 2021 M2VoC Challenge

    Authors: Daxin Tan, Hingpang Huang, Guangyan Zhang, Tan Lee

    Abstract: This paper presents the CUHK-EE voice cloning system for ICASSP 2021 M2VoC challenge. The challenge provides two Mandarin speech corpora: the AIShell-3 corpus of 218 speakers with noise and reverberation and the MST corpus including high-quality speech of one male and one female speaker. 100 and 5 utterances of 3 target speakers in different voice and style are provided in track 1 and 2 respectiv…

    Submitted 5 July, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

  41. arXiv:2102.07982  [pdf, other]

    cs.SD eess.AS

    Voice Gender Scoring and Independent Acoustic Characterization of Perceived Masculinity and Femininity

    Authors: Fuling Chen, Roberto Togneri, Murray Maybery, Diana Tan

    Abstract: Previous research has found that voices can provide reliable information to be used for gender classification with a high level of accuracy. In social psychology, perceived masculinity and femininity (masculinity and femininity rated by humans) has often been considered an important feature when investigating the influence of vocal features on social behaviours. While previous studies have charact…

    Submitted 4 August, 2022; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: 24 pages, 7 figures, journal

  42. arXiv:2012.14629  [pdf, other]

    cs.CV

    TrustMAE: A Noise-Resilient Defect Classification Framework using Memory-Augmented Auto-Encoders with Trust Regions

    Authors: Daniel Stanley Tan, Yi-Chun Chen, Trista Pei-Chun Chen, Wei-Chao Chen

    Abstract: In this paper, we propose a framework called TrustMAE to address the problem of product defect classification. Instead of relying on defective images that are difficult to collect and laborious to label, our framework can accept datasets with unlabeled images. Moreover, unlike most anomaly detection methods, our approach is robust against noises, or defective images, in the training dataset. Our f…

    Submitted 29 December, 2020; originally announced December 2020.

    Comments: Accepted at WACV 2021

  43. arXiv:2011.08534  [pdf, other]

    cs.CV

    A Divide et Impera Approach for 3D Shape Reconstruction from Multiple Views

    Authors: Riccardo Spezialetti, David Joseph Tan, Alessio Tonioni, Keisuke Tateno, Federico Tombari

    Abstract: Estimating the 3D shape of an object from a single or multiple images has gained popularity thanks to the recent breakthroughs powered by deep learning. Most approaches regress the full object shape in a canonical pose, possibly extrapolating the occluded parts based on the learned priors. However, their viewpoint invariant technique often discards the unique structures visible from the input imag…

    Submitted 18 November, 2020; v1 submitted 17 November, 2020; originally announced November 2020.

    Comments: Accepted to 3DV 2020 as oral

  44. Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement

    Authors: Daxin Tan, Tan Lee

    Abstract: This paper presents a novel design of neural network system for fine-grained style modeling, transfer and prediction in expressive text-to-speech (TTS) synthesis. Fine-grained modeling is realized by extracting style embeddings from the mel-spectrograms of phone-level speech segments. Collaborative learning and adversarial learning strategies are applied in order to achieve effective disentangleme…

    Submitted 7 October, 2021; v1 submitted 8 November, 2020; originally announced November 2020.

    Comments: Accepted by Interspeech 2021

  45. arXiv:2008.07358  [pdf, other]

    cs.CV eess.IV

    SoftPoolNet: Shape Descriptor for Point Cloud Completion and Classification

    Authors: Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

    Abstract: Point clouds are often the default choice for many applications as they exhibit more flexibility and efficiency than volumetric data. Nevertheless, their unorganized nature -- points are stored in an unordered way -- makes them less suited to be processed by deep learning pipelines. In this paper, we propose a method for 3D object completion and classification based on point clouds. We introduce a…

    Submitted 17 August, 2020; originally announced August 2020.

    Comments: accepted in ECCV 2020 as oral

  46. arXiv:2001.07108  [pdf, other]

    cs.CV

    Spectral Pyramid Graph Attention Network for Hyperspectral Image Classification

    Authors: Tinghuai Wang, Guangming Wang, Kuan Eeik Tan, Donghui Tan

    Abstract: Convolutional neural networks (CNN) have made significant advances in hyperspectral image (HSI) classification. However, standard convolutional kernel neglects the intrinsic connections between data points, resulting in poor region delineation and small spurious predictions. Furthermore, HSIs have a unique continuous data distribution along the high dimensional spectrum domain - much remains to be…

    Submitted 20 January, 2020; originally announced January 2020.

    Comments: 7 pages, 6 figures, 4 tables

  47. arXiv:1910.09156  [pdf]

    physics.ed-ph cs.CY

    Easy Java/JavaScript Simulations as a tool for Learning Analytics

    Authors: Francisco Esquembre, Félix J. García Clemente, Rafael Chicón, Lawrence Wee, Leong Tze Kwang, Darren Tan

    Abstract: In this paper we introduce the new and planned features of Easy Java/JavaScript Simulations (EJS) to support Learning Analytics (LA) and Educational Data Mining (EDM) research and practice in the use of simulations for the teaching and self-learning of natural sciences and engineering. Simulations created with EJS can now be easily embedded in a popular Learning Management System using a new plug-…

    Submitted 21 October, 2019; originally announced October 2019.

    Comments: 7 pages, 4 figures, conference paper Shih, J. L. et al. (Eds.) (2019). Proceedings of the 27th International Conference on Computers in Education. Taiwan: Asia-Pacific Society for Computers in Education

  48. arXiv:1909.01106  [pdf, other]

    cs.CV cs.AI cs.CG cs.LG eess.IV

    ForkNet: Multi-branch Volumetric Semantic Completion from a Single Depth Image

    Authors: Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

    Abstract: We propose a novel model for 3D semantic completion from a single depth image, based on a single encoder and three separate generators used to reconstruct different geometric and semantic representations of the original and completed scene, all sharing the same latent space. To transfer information between the geometric and semantic branches of the network, we introduce paths between them concaten…

    Submitted 3 September, 2019; originally announced September 2019.

    Comments: Accepted in International Conference on Computer Vision 2019

  49. arXiv:1903.03804  [pdf, other]

    cs.AI

    Program Classification Using Gated Graph Attention Neural Network for Online Programming Service

    Authors: Mingming Lu, Dingwu Tan, Naixue Xiong, Zailiang Chen, Haifeng Li

    Abstract: Online programming services, such as GitHub, TopCoder, and EduCoder, have promoted a lot of social interactions among the service users. However, the existing social interactions are rather limited and inefficient due to the rapid increase in source-code repositories, which are difficult to explore manually. The emergence of source-code mining provides a promising way to analyze those source cod…

    Submitted 9 March, 2019; originally announced March 2019.

    Comments: 12 pages, 27 figures

  50. Adversarial Semantic Scene Completion from a Single Depth Image

    Authors: Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

    Abstract: We propose a method to reconstruct, complete and semantically label a 3D scene from a single input depth image. We improve the accuracy of the regressed semantic 3D maps by a novel architecture based on adversarial learning. In particular, we suggest using multiple adversarial loss terms that not only enforce realistic outputs with respect to the ground truth, but also an effective embedding of th…

    Submitted 25 October, 2018; originally announced October 2018.

    Comments: 2018 International Conference on 3D Vision (3DV)

    Journal ref: 2018 International Conference on 3D Vision (3DV), Verona, Italy, 2018, pp. 426-434