Showing 1–50 of 100 results for author: Dang, J

  1. arXiv:2407.07209  [pdf]

    cond-mat.mes-hall

    Electrical switching of spin-polarized light-emitting diodes based on a 2D CrI3/hBN/WSe2 heterostructure

    Authors: Jianchen Dang, Tongyao Wu, Shuohua Yan, Kenji Watanabe, Takashi Taniguchi, Hechang Lei, Xiao-Xiao Zhang

    Abstract: Spin-polarized light-emitting diodes (spin-LEDs) convert the electronic spin information to photon circular polarization, offering potential applications including spin amplification, optical communications, and advanced imaging. The conventional control of the emitted light's circular polarization requires a change in the external magnetic field, limiting the operation conditions of spin-LEDs. He…

    Submitted 9 July, 2024; originally announced July 2024.

  2. arXiv:2407.02552  [pdf, other]

    cs.CL cs.AI cs.LG

    RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs

    Authors: John Dang, Arash Ahmadian, Kelly Marchisio, Julia Kreutzer, Ahmet Üstün, Sara Hooker

    Abstract: Preference optimization techniques have become a standard final stage for training state-of-art large language models (LLMs). However, despite widespread adoption, the vast majority of work to-date has focused on first-class citizen languages like English and Chinese. This captures a small fraction of the languages in the world, but also makes it unclear which aspects of current state-of-the-art r…

    Submitted 2 July, 2024; originally announced July 2024.

  3. arXiv:2407.00743  [pdf, other]

    cs.MM cs.AI cs.CL eess.AS

    AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations

    Authors: Sheng Wu, Jiaxing Liu, Longbiao Wang, Dongxiao He, Xiaobao Wang, Jianwu Dang

    Abstract: Emotion Recognition in Conversations (ERC) is a popular task in natural language processing, which aims to recognize the emotional state of the speaker in conversations. While current research primarily emphasizes contextual modeling, there exists a dearth of investigation into effective multimodal fusion methods. We propose a novel framework called AIMDiT to solve the problem of multimodal fusion…

    Submitted 12 April, 2024; originally announced July 2024.

  4. arXiv:2407.00281  [pdf]

    cond-mat.str-el cond-mat.mes-hall

    Distinguishing Surface and Bulk Electromagnetism via Their Dynamics in an Intrinsic Magnetic Topological Insulator

    Authors: Khanh Duy Nguyen, Woojoo Lee, Jianchen Dang, Tongyao Wu, Gabriele Berruto, Chenhui Yan, Chi Ian Jess Ip, Haoran Lin, Qiang Gao, Seng Huat Lee, Binghai Yan, Chaoxing Liu, Zhiqiang Mao, Xiao-Xiao Zhang, Shuolong Yang

    Abstract: The indirect exchange interaction between local magnetic moments via surface electrons has been long predicted to bolster the surface ferromagnetism in magnetic topological insulators (MTIs), which facilitates the quantum anomalous Hall effect. This unconventional effect is critical to determining the operating temperatures of future topotronic devices. However, the experimental confirmation of th…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: 19 pages, 4 figures

  5. arXiv:2406.08911  [pdf, other]

    cs.CL eess.AS

    An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios

    Authors: Cheng Gong, Erica Cooper, Xin Wang, Chunyu Qiang, Mengzhe Geng, Dan Wells, Longbiao Wang, Jianwu Dang, Marc Tessier, Aidan Pine, Korin Richmond, Junichi Yamagishi

    Abstract: Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks. Despite advancements, language adaptation in TTS systems remains an open problem. This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system proposed in our previous work. We conducted experiments on…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  6. arXiv:2405.15032  [pdf, other]

    cs.CL

    Aya 23: Open Weight Releases to Further Multilingual Progress

    Authors: Viraat Aryabumi, John Dang, Dwarak Talupuru, Saurabh Dash, David Cairuz, Hangyu Lin, Bharat Venkitesh, Madeline Smith, Jon Ander Campos, Yi Chern Tan, Kelly Marchisio, Max Bartolo, Sebastian Ruder, Acyr Locatelli, Julia Kreutzer, Nick Frosst, Aidan Gomez, Phil Blunsom, Marzieh Fadaee, Ahmet Üstün, Sara Hooker

    Abstract: This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-art language modelin…

    Submitted 31 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  7. arXiv:2404.11129  [pdf, other]

    cs.CV

    Fact: Teaching MLLMs with Faithful, Concise and Transferable Rationales

    Authors: Minghe Gao, Shuang Chen, Liang Pang, Yuan Yao, Jisheng Dang, Wenqiao Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang, Tat-Seng Chua

    Abstract: The remarkable performance of Multimodal Large Language Models (MLLMs) has unequivocally demonstrated their proficient understanding capabilities in handling a wide array of visual tasks. Nevertheless, the opaque nature of their black-box reasoning processes persists as an enigma, rendering them uninterpretable and struggling with hallucination. Their ability to execute intricate compositional rea…

    Submitted 17 April, 2024; originally announced April 2024.

  8. arXiv:2404.10542  [pdf, other]

    astro-ph.HE

    Statistical analysis of pulsar flux density distribution

    Authors: H. W. Xu, R. S. Zhao, Erbil Gugercinoglu, H. Liu, D. Li, P. Wang, C. H. Niu, C. Miao, X. Zhu, R. W. Tian, W. L. Li, S. D. Wang, Z. F. Tu, Q. J. Zhi, S. J. Dang, L. H. Shang, S. Xiao

    Abstract: This study presents a comprehensive analysis of the spectral properties of 886 pulsars across a wide frequency range from 20MHz to 343.5GHz, including a total of 86 millisecond pulsars. The majority of the pulsars exhibit power-law behavior in their spectra, although some exceptions are observed. Five different spectral models, namely simple power-law, broken power-law, low-frequency turn-over, hi…

    Submitted 16 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: 39 pages, 17 figures

  9. arXiv:2404.03216  [pdf, other]

    cs.CR

    Accurate Low-Degree Polynomial Approximation of Non-polynomial Operators for Fast Private Inference in Homomorphic Encryption

    Authors: Jianming Tong, Jingtian Dang, Anupam Golder, Callie Hao, Arijit Raychowdhury, Tushar Krishna

    Abstract: As machine learning (ML) permeates fields like healthcare, facial recognition, and blockchain, the need to protect sensitive data intensifies. Fully Homomorphic Encryption (FHE) allows inference on encrypted data, preserving the privacy of both data and the ML model. However, it slows down non-secure inference by up to five magnitudes, with a root cause of replacing non-polynomial operators (ReLU…

    Submitted 7 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Proceedings of the 5th MLSys Conference, Santa Clara, CA, USA, 2024. Copyright 2024 by the author(s)

  10. arXiv:2402.15069  [pdf, other]

    astro-ph.HE

    Investigation of profile shifting and subpulse movement in PSR J0344-0901 with FAST

    Authors: H. M. Tedila, R. Yuen, N. Wang, D. Li, Z. G. Wen, W. M. Yan, J. P. Yuan, X. H. Han, P. Wang, W. W. Zhu, S. J. Dang, S. Q. Wang, J. T. Xie, Q. D. Wu, Sh. Khasanov, FAST Collaboration

    Abstract: We report two phenomena detected in PSR J0344$-$0901 from two observations conducted at frequency centered at 1.25 GHz using the Five-hundred-meter Aperture Spherical radio Telescope (FAST). The first phenomenon manifests as shifting in the pulse emission to later longitudinal phases and then gradually returns to its original location. The event lasts for about 216 pulse periods, with an average s…

    Submitted 22 February, 2024; originally announced February 2024.

  11. arXiv:2401.13475  [pdf, other]

    hep-lat hep-ex hep-ph

    Lattice QCD calculation of the $D_s^{*}$ radiative decay with (2+1)-flavor Wilson-clover ensembles

    Authors: Yu Meng, Jin-Long Dang, Chuan Liu, Zhaofeng Liu, Tinghong Shen, Haobo Yan, Ke-Long Zhang

    Abstract: We perform a lattice calculation on the radiative decay of $D_s^*$ using the (2+1)-flavor Wilson-clover gauge ensembles generated by CLQCD collaboration. A method allowing us to calculate the form factor with zero transfer momentum is proposed and applied to the radiative transition $D_s^*\rightarrow D_sγ$ and the Dalitz decay $D_s^*\rightarrow D_s e^+e^-$. After a continuum extrapolation using th…

    Submitted 29 April, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: 8 pages, 6 figures, published version

    Journal ref: Physical Review D 109, 074511 (2024)

  12. arXiv:2401.12426  [pdf, other]

    astro-ph.HE

    Pulse Jitter and Single-pulse Variability in Millisecond Pulsars

    Authors: S. Q. Wang, N. Wang, J. B. Wang, G. Hobbs, H. Xu, B. J. Wang, S. Dai, S. J. Dang, D. Li, Y. Feng, C. M. Zhang

    Abstract: Understanding the jitter noise resulting from single-pulse phase and shape variations is important for the detection of gravitational waves using pulsar timing array. We presented measurements of jitter noise and single-pulse variability of 12 millisecond pulsars that are part of the International Pulsar Timing Array sample using the Five-hundred-meter Aperture Spherical radio Telescope (FAST). We…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 14 pages, 9 figures, Accepted for publication in ApJ

  13. arXiv:2401.02081  [pdf, ps, other]

    cs.IT eess.SP

    Performance Trade-off and Joint Waveform Design for MIMO-OFDM DFRC Systems

    Authors: Tianchen Liu, Liang Wu, Bo An, Zaichen Zhang, Jian Dang, Jiangzhou Wang

    Abstract: Dual-functional radar-communication (DFRC) has attracted considerable attention. This paper considers the frequency-selective multipath fading environment and proposes DFRC waveform design strategies based on multiple-input and multiple-output (MIMO) and orthogonal frequency division multiplexing (OFDM) techniques. In the proposed waveform design strategies, the Cramer-Rao bound (CRB) of the radar…

    Submitted 4 January, 2024; originally announced January 2024.

  14. arXiv:2312.14398  [pdf, other]

    cs.SD eess.AS

    ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations

    Authors: Cheng Gong, Xin Wang, Erica Cooper, Dan Wells, Longbiao Wang, Jianwu Dang, Korin Richmond, Junichi Yamagishi

    Abstract: Neural text-to-speech (TTS) has achieved human-like synthetic speech for single-speaker, single-language synthesis. Multilingual TTS systems are limited to resource-rich languages due to the lack of large paired text and studio-quality audio data. In most cases, TTS systems are built using a single speaker's voice. However, there is growing interest in developing systems that can synthesize voices…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: 13 pages, 5 figures

  15. arXiv:2312.11201  [pdf, other]

    eess.AS cs.SD eess.SP

    A Refining Underlying Information Framework for Monaural Speech Enhancement

    Authors: Rui Cao, Tianrui Wang, Meng Ge, Longbiao Wang, Jianwu Dang

    Abstract: Supervised speech enhancement has gained significantly from recent advancements in neural networks, especially due to their ability to non-linearly fit the diverse representations of target speech, such as waveform or spectrum. However, these direct-fitting solutions continue to face challenges with degraded speech and residual noise in hearing evaluations. By bridging the speech enhancement and t…

    Submitted 24 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: 5 pages

  16. arXiv:2312.07032  [pdf, ps, other]

    cs.LG stat.ML

    Ahpatron: A New Budgeted Online Kernel Learning Machine with Tighter Mistake Bound

    Authors: Yun Liao, Junfan Li, Shizhong Liao, Qinghua Hu, Jianwu Dang

    Abstract: In this paper, we study the mistake bound of online kernel learning on a budget. We propose a new budgeted online kernel learning model, called Ahpatron, which significantly improves the mistake bound of previous work and resolves the open problem posed by Dekel, Shalev-Shwartz, and Singer (2005). We first present an aggressive variant of Perceptron, named AVP, a model without budget, which uses a…

    Submitted 12 December, 2023; originally announced December 2023.

  17. arXiv:2311.00370  [pdf]

    astro-ph.HE astro-ph.GA hep-ph

    Discovery of four pulsars in a pilot survey at intermediate Galactic latitudes with FAST

    Authors: Q. J. Zhi, J. T. Bai, S. Dai, X. Xu, S. J. Dang, L. H. Shang, R. S. Zhao, D. Li, W. W. Zhu, N. Wang, J. P. Yuan, P. Wang, L. Zhang, Y. Feng, J. B. Wang, S. Q. Wang, Q. D. Wu, A. J. Dong, H. Yang, J. Tian, W. Q. Zhong, X. H. Luo, Miroslav D. Filipović, G. J. Qiao

    Abstract: We present the discovery and timing results of four pulsars discovered in a pilot survey at intermediate Galactic latitudes with the Five-hundred Aperture Spherical Telescope (FAST). Among these pulsars, two belong to the category of millisecond pulsars (MSPs) with spin periods of less than 20 ms. The other two fall under the classification of "mildly recycled" pulsars, with massive white dwarfs a…

    Submitted 28 December, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: 7 pages, 4 figures, 2 tables, accepted to ApJ

  18. arXiv:2310.11523  [pdf, other]

    cs.LG cs.AI cs.CL

    Group Preference Optimization: Few-Shot Alignment of Large Language Models

    Authors: Siyan Zhao, John Dang, Aditya Grover

    Abstract: Many applications of large language models (LLMs), ranging from chatbots to creative writing, require nuanced subjective judgments that can differ significantly across different groups. Existing alignment algorithms can be expensive to align for each group, requiring prohibitive amounts of group-specific preference data and computation for real-world use cases. We introduce Group Preference Optimi…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: 24 pages, 12 figures

  19. arXiv:2309.15512  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models

    Authors: Chunyu Qiang, Hao Li, Yixin Tian, Yi Zhao, Ying Zhang, Longbiao Wang, Jianwu Dang

    Abstract: Text-to-speech (TTS) methods have shown promising results in voice cloning, but they require a large number of labeled text-speech pairs. Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations(semantic \& acoustic) and using two sequence-to-sequence tasks to enable training with minimal supervision. However, existing methods suffer from inform…

    Submitted 18 December, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024. arXiv admin note: substantial text overlap with arXiv:2307.15484; text overlap with arXiv:2309.00424

  20. arXiv:2309.00424  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Learning Speech Representation From Contrastive Token-Acoustic Pretraining

    Authors: Chunyu Qiang, Hao Li, Yixin Tian, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang

    Abstract: For fine-grained generation and recognition tasks such as minimally-supervised text-to-speech (TTS), voice conversion (VC), and automatic speech recognition (ASR), the intermediate representations extracted from speech should serve as a "bridge" between text and acoustic information, containing information from both modalities. The semantic content is emphasized, while the paralinguistic informati…

    Submitted 18 December, 2023; v1 submitted 1 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024

  21. arXiv:2308.15812  [pdf, other]

    cs.LG cs.AI cs.CL

    Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models

    Authors: Hritik Bansal, John Dang, Aditya Grover

    Abstract: Aligning large language models (LLMs) with human values and intents critically involves the use of human or AI feedback. While dense feedback annotations are expensive to acquire and integrate, sparse feedback presents a structural design choice between ratings (e.g., score Response A on a scale of 1-7) and rankings (e.g., is Response A better than Response B?). In this work, we analyze the effect…

    Submitted 5 February, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: 31 pages, Accepted to ICLR 2024

  22. arXiv:2307.15484  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding

    Authors: Chunyu Qiang, Hao Li, Hao Ni, He Qu, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang

    Abstract: Recently, there has been a growing interest in text-to-speech (TTS) methods that can be trained with minimal supervision by combining two types of discrete speech representations and using two sequence-to-sequence tasks to decouple TTS. However, existing methods suffer from three problems: the high dimensionality and waveform distortion of discrete speech representations, the prosodic averaging pr…

    Submitted 18 December, 2023; v1 submitted 28 July, 2023; originally announced July 2023.

    Comments: Accepted by ICASSP 2024

  23. arXiv:2307.06657  [pdf, other]

    cs.IT eess.SP

    Downlink Precoding for Cell-free FBMC/OQAM Systems With Asynchronous Reception

    Authors: Yuhao Qi, Jian Dang, Zaichen Zhang, Liang Wu, Yongpeng Wu

    Abstract: In this work, an efficient precoding design scheme is proposed for downlink cell-free distributed massive multiple-input multiple-output (DM-MIMO) filter bank multi-carrier (FBMC) systems with asynchronous reception and highly frequency selectivity. The proposed scheme includes a multiple interpolation structure to eliminate the impact of response difference we recently discovered, which has bette…

    Submitted 13 July, 2023; originally announced July 2023.

    Comments: 16 pages, 4 figures

  24. arXiv:2306.02625  [pdf, other]

    cs.SD eess.AS

    Rethinking the visual cues in audio-visual speaker extraction

    Authors: Junjie Li, Meng Ge, Zexu Pan, Rui Cao, Longbiao Wang, Jianwu Dang, Shiliang Zhang

    Abstract: The Audio-Visual Speaker Extraction (AVSE) algorithm employs parallel video recording to leverage two visual cues, namely speaker identity and synchronization, to enhance performance compared to audio-only algorithms. However, the visual front-end in AVSE is often derived from a pre-trained model or end-to-end trained, making it unclear which visual cue contributes more to the speaker extraction p…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Accepted in Interspeech 2023

  25. arXiv:2305.17860  [pdf, other]

    cs.SD eess.AS

    Speech and Noise Dual-Stream Spectrogram Refine Network with Speech Distortion Loss for Robust Speech Recognition

    Authors: Haoyu Lu, Nan Li, Tongtong Song, Longbiao Wang, Jianwu Dang, Xiaobao Wang, Shiliang Zhang

    Abstract: In recent years, the joint training of speech enhancement front-end and automatic speech recognition (ASR) back-end has been widely used to improve the robustness of ASR systems. Traditional joint training methods only use enhanced speech as input for the backend. However, it is difficult for speech enhancement systems to directly separate speech from input due to the diverse types of noise with d…

    Submitted 30 May, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

  26. arXiv:2305.10821  [pdf, other]

    eess.AS

    Locate and Beamform: Two-dimensional Locating All-neural Beamformer for Multi-channel Speech Separation

    Authors: Yanjie Fu, Meng Ge, Honglong Wang, Nan Li, Haoran Yin, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Chengyun Deng, Fei Wang

    Abstract: Recently, stunning improvements on multi-channel speech separation have been achieved by neural beamformers when direction information is available. However, most of them neglect to utilize speaker's 2-dimensional (2D) location cues contained in mixture signal, which limits the performance when two sources come from close directions. In this paper, we propose an end-to-end beamforming network for…

    Submitted 2 June, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech 2023. arXiv admin note: substantial text overlap with arXiv:2212.03401

  27. arXiv:2304.13904  [pdf, other]

    cond-mat.mes-hall

    Asymmetric Chiral Coupling in a Topological Resonator

    Authors: Shushu Shi, Xin Xie, Sai Yan, Jingnan Yang, Jianchen Dang, Shan Xiao, Longlong Yang, Danjie Dai, Bowen Fu, Yu Yuan, Rui Zhu, Xiangbin Su, Hanqing Liu, Zhanchun Zuo, Can Wang, Haiqiao Ni, Zhichuan Niu, Qihuang Gong, Xiulai Xu

    Abstract: Chiral light-matter interactions supported by topological edge modes at the interface of valley photonic crystals provide a robust method to implement the unidirectional spin transfer. The valley topological photonic crystals possess a pair of counterpropagating edge modes. The edge modes are robust against the sharp bend of $60^{\circ}$ and $120^{\circ}$, which can form a resonator with whisperin…

    Submitted 26 April, 2023; originally announced April 2023.

    Comments: 13 pages, 4 figures

    Journal ref: Applied Physics Letters (2023)

  28. arXiv:2303.14593  [pdf, other]

    cs.SD eess.AS

    Time-domain Speech Enhancement Assisted by Multi-resolution Frequency Encoder and Decoder

    Authors: Hao Shi, Masato Mimura, Longbiao Wang, Jianwu Dang, Tatsuya Kawahara

    Abstract: Time-domain speech enhancement (SE) has recently been intensively investigated. Among recent works, DEMUCS introduces multi-resolution STFT loss to enhance performance. However, some resolutions used for STFT contain non-stationary signals, and it is challenging to learn multi-resolution frequency losses simultaneously with only one output. For better use of multi-resolution frequency information,…

    Submitted 25 March, 2023; originally announced March 2023.

  29. arXiv:2303.05070  [pdf, other]

    cs.IT

    Pilot-Free Unsourced Random Access Via Dictionary Learning and Error-Correcting Codes

    Authors: Zhentian Zhang, Jian Dang, Zaichen Zhang, Liang Wu, Bingcheng Zhu, Lei Wang

    Abstract: Massive machine-type communications (mMTC) or massive access is a critical scenario in the fifth generation (5G) and the future cellular network. With the surging density of devices from millions to billions, unique pilot allocation becomes inapplicable in the user ID-incorporated grant-free random access protocol. Unsourced random access (URA) manifests itself by focusing only on unwrapping the r…

    Submitted 9 March, 2023; originally announced March 2023.

  30. arXiv:2302.11399  [pdf, other]

    cond-mat.mes-hall

    Controllable Spin-Resolved Photon Emission Enhanced by Slow-Light Mode in Photonic Crystal Waveguides on Chip

    Authors: Shushu Shi, Shan Xiao, Jingnan Yang, Shulun Li, Xin Xie, Jianchen Dang, Longlong Yang, Danjie Dai, Bowen Fu, Sai Yan, Yu Yuan, Rui Zhu, Bei-Bei Li, Zhanchun Zuo, Can Wang, Haiqiao Ni, Zhichuan Niu, Kuijuan Jin, Qihuang Gong, Xiulai Xu

    Abstract: We report the slow-light enhanced spin-resolved in-plane emission from a single quantum dot (QD) in a photonic crystal waveguide (PCW). The slow light dispersions in PCWs are designed to match the emission wavelengths of single QDs. The resonance between two spin states emitted from a single QD and a slow light mode of a waveguide is investigated under a magnetic field with Faraday configuration.…

    Submitted 22 February, 2023; originally announced February 2023.

    Comments: 7 pages, 5 figures

    Journal ref: Optics Express, 31, 10348 (2023)

  31. arXiv:2302.11254  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS eess.IV

    Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification

    Authors: Meng Liu, Kong Aik Lee, Longbiao Wang, Hanyi Zhang, Chang Zeng, Jianwu Dang

    Abstract: Visual speech (i.e., lip motion) is highly related to auditory speech due to the co-occurrence and synchronization in speech production. This paper investigates this correlation and proposes a cross-modal speech co-learning paradigm. The primary motivation of our cross-modal co-learning method is modeling one modality aided by exploiting knowledge from another modality. Specifically, two cross-mod…

    Submitted 22 February, 2023; originally announced February 2023.

  32. arXiv:2302.09208  [pdf, other]

    cs.CV

    Bridge Damage Cause Estimation Using Multiple Images Based on Visual Question Answering

    Authors: Tatsuro Yamane, Pang-jo Chun, Ji Dang, Takayuki Okatani

    Abstract: In this paper, a bridge member damage cause estimation framework is proposed by calculating the image position using Structure from Motion (SfM) and acquiring its information via Visual Question Answering (VQA). For this, a VQA model was developed that uses bridge images for dataset creation and outputs the damage or member name and its existence based on the images and questions. In the developed…

    Submitted 17 February, 2023; originally announced February 2023.

  33. arXiv:2212.03401  [pdf, other]

    eess.AS cs.LG cs.SD

    MIMO-DBnet: Multi-channel Input and Multiple Outputs DOA-aware Beamforming Network for Speech Separation

    Authors: Yanjie Fu, Haoran Yin, Meng Ge, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Chengyun Deng, Fei Wang

    Abstract: Recently, many deep learning based beamformers have been proposed for multi-channel speech separation. Nevertheless, most of them rely on extra cues known in advance, such as speaker feature, face image or directional information. In this paper, we propose an end-to-end beamforming network for direction guided speech separation given merely the mixture signal, namely MIMO-DBnet. Specifically, we d…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: Submitted to ICASSP 2023

  34. arXiv:2211.01046  [pdf, other]

    eess.AS cs.CL cs.SD

    Monolingual Recognizers Fusion for Code-switching Speech Recognition

    Authors: Tongtong Song, Qiang Xu, Haoyu Lu, Longbiao Wang, Hao Shi, Yuqin Lin, Yanbing Yang, Jianwu Dang

    Abstract: The bi-encoder structure has been intensively investigated in code-switching (CS) automatic speech recognition (ASR). However, most existing methods require the structures of two monolingual ASR models (MAMs) should be the same and only use the encoder of MAMs. This leads to the problem that pre-trained MAMs cannot be timely and fully used for CS ASR. In this paper, we propose a monolingual recogn…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  35. arXiv:2210.10401  [pdf, other]

    cs.IT eess.SP

    Asynchronous RIS-assisted Localization: A Comprehensive Analysis of Fundamental Limits

    Authors: Ziyi Gong, Liang Wu, Zaichen Zhang, Jian Dang, Yongpeng Wu, Jiangzhou Wang

    Abstract: The reconfigurable intelligent surface (RIS) has drawn considerable attention for its ability to enhance the performance of not only the wireless communication but also the indoor localization with low-cost. This paper investigates the performance limits of the RIS-based near-field localization in the asynchronous scenario, and analyzes the impact of each part of the cascaded channel on the locali…

    Submitted 26 March, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

  36. arXiv:2210.06177  [pdf, other]

    cs.CV cs.CL cs.SD eess.AS

    VCSE: Time-Domain Visual-Contextual Speaker Extraction Network

    Authors: Junjie Li, Meng Ge, Zexu Pan, Longbiao Wang, Jianwu Dang

    Abstract: Speaker extraction seeks to extract the target speech in a multi-talker scenario given an auxiliary reference. Such reference can be auditory, i.e., a pre-recorded speech, visual, i.e., lip movements, or contextual, i.e., phonetic sequence. References in different modalities provide distinct and complementary information that could be fused to form top-down attention on the target speaker. Previou…

    Submitted 9 October, 2022; originally announced October 2022.

  37. arXiv:2210.05254  [pdf, other]

    cs.SD cs.AI eess.AS

    Deep Spectro-temporal Artifacts for Detecting Synthesized Speech

    Authors: Xiaohui Liu, Meng Liu, Lin Zhang, Linjuan Zhang, Chang Zeng, Kai Li, Nan Li, Kong Aik Lee, Longbiao Wang, Jianwu Dang

    Abstract: The Audio Deep Synthesis Detection (ADD) Challenge has been held to detect generated human-like speech. With our submitted system, this paper provides an overall assessment of track 1 (Low-quality Fake Audio Detection) and track 2 (Partially Fake Audio Detection). In this paper, spectro-temporal artifacts were detected using raw temporal signals, spectral features, as well as deep embedding featur…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: 7 pages, 1 figure, Accepted by Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia

  38. arXiv:2209.15401  [pdf]

    cond-mat.mes-hall

    Single charge control of localized excitons in heterostructures with ferroelectric thin films and two-dimensional transition metal dichalcogenides

    Authors: Danjie Dai, Xinyan Wang, Jingnan Yang, Jianchen Dang, Yu Yuan, Bowen Fu, Xin Xie, Longlong Yang, Shan Xiao, Shushu Shi, Sai Yan, Rui Zhu, Zhanchun Zuo, Can Wang, Kuijuan Jin, Qihuang Gong, Xiulai Xu

    Abstract: Single charge control of localized excitons (LXs) in two-dimensional transition metal dichalcogenides (TMDCs) is crucial for potential applications in quantum information processing and storage. However, traditional electrostatic doping method with applying metallic gates onto TMDCs may cause the inhomogeneous charge distribution, optical quench, and energy loss. Here, by locally controlling the f…

    Submitted 30 September, 2022; originally announced September 2022.

    Comments: 13 pages, 5 figures

    Journal ref: Nanoscale, 2022, 14, 14537-14543

  39. MIMO-DoAnet: Multi-channel Input and Multiple Outputs DoA Network with Unknown Number of Sound Sources

    Authors: Haoran Yin, Meng Ge, Yanjie Fu, Gaoyan Zhang, Longbiao Wang, Lei Zhang, Lin Qiu, Jianwu Dang

    Abstract: Recent neural network based Direction of Arrival (DoA) estimation algorithms have performed well on unknown number of sound sources scenarios. These algorithms are usually achieved by mapping the multi-channel audio input to the single output (i.e. overall spatial pseudo-spectrum (SPS) of all sources), that is called MISO. However, such MISO algorithms strongly depend on empirical threshold settin…

    Submitted 16 November, 2022; v1 submitted 15 July, 2022; originally announced July 2022.

    Comments: Accepted by Interspeech 2022

  40. arXiv:2206.14580  [pdf, other]

    cs.CL eess.AS

    Language-specific Characteristic Assistance for Code-switching Speech Recognition

    Authors: Tongtong Song, Qiang Xu, Meng Ge, Longbiao Wang, Hao Shi, Yongjie Lv, Yuqin Lin, Jianwu Dang

    Abstract: Dual-encoder structure successfully utilizes two language-specific encoders (LSEs) for code-switching speech recognition. Because LSEs are initialized by two pre-trained language-specific models (LSMs), the dual-encoder structure can exploit sufficient monolingual data and capture the individual language attributes. However, most existing methods have no language constraints on LSEs and underutili…

    Submitted 11 July, 2022; v1 submitted 29 June, 2022; originally announced June 2022.

    Comments: Accepted by Interspeech 2022

  41. arXiv:2206.12273  [pdf, other]

    eess.AS cs.LG

    Iterative Sound Source Localization for Unknown Number of Sources

    Authors: Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang

    Abstract: Sound source localization aims to seek the direction of arrival (DOA) of all sound sources from the observed multi-channel audio. For the practical problem of unknown number of sources, existing localization algorithms attempt to predict a likelihood-based coding (i.e., spatial spectrum) and employ a pre-determined threshold to detect the source number and corresponding DOA value. However, these t…

    Submitted 24 June, 2022; originally announced June 2022.

    Comments: Accepted by Interspeech 2022

  42. Arecibo and FAST Timing Follow-up of twelve Millisecond Pulsars Discovered in Commensal Radio Astronomy FAST Survey

    Authors: C. C. Miao, W. W. Zhu, D. Li, P. C. C. Freire, J. R. Niu, P. Wang, J. P. Yuan, M. Y. Xue, A. D. Cameron, D. J. Champion, M. Cruces, Y. T. Chen, M. M. Chi, X. F. Cheng, S. J. Dang, M. F. Ding, Y. Feng, Z. Y. Gan, G. Hobbs, M. Kramer, Z. J. Liu, Y. X. Li, Z. K. Luo, X. L. Miao, L. Q. Meng, et al. (24 additional authors not shown)

    Abstract: We report the phase-connected timing ephemeris, polarization pulse profiles, Faraday rotation measurements, and Rotating-Vector-Model (RVM) fitting results of twelve millisecond pulsars (MSPs) discovered with the Five-hundred-meter Aperture Spherical radio Telescope (FAST) in the Commensal radio Astronomy FAST survey (CRAFTS). The timing campaigns were carried out with FAST and Arecibo over three…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: 11 pages, 5 figures, MNRAS accepted

  43. arXiv:2205.02999  [pdf, ps, other]

    eess.SP

    Fast and Arbitrary Beam Pattern Design for RIS-Assisted Terahertz Wireless Communication

    Authors: Jian Dang, Zaichen Zhang, Yewei Li, Liang Wu, Bingcheng Zhu, Lei Wang

    Abstract: Reconfigurable intelligent surface (RIS) can assist terahertz wireless communication to restore the fragile line-of-sight links and facilitate beam steering. Arbitrary reflection beam patterns are desired to meet diverse requirements in different applications. This paper establishes relationship between RIS beam pattern design with two-dimensional finite impulse response filter design and proposes…

    Submitted 5 May, 2022; originally announced May 2022.

    Comments: 5 pages, 5 figures

  44. arXiv:2205.01407  [pdf]

    hep-th astro-ph.HE

    Emission Variation of a Long-period Pulsar Discovered by the Five-hundred-meter Aperture Spherical Radio Telescope (FAST)

    Authors: H. M. Tedila, R. Yuen, N. Wang, J. P. Yuan, Z. G. Wen, W. M. Yan, S. Q. Wang, S. J. Dang, D. Li, P. Wang, W. W. Zhu, J. R. Niu, C. C. Miao, M. Y. Xue, L. Zhang, Z. Y. Tu, R. Rejep, J. T. Xie, FAST Collaboration

    Abstract: We report on the variation in the single-pulse emission from PSR J1900+4221 (CRAFTS 19C10) observed at frequency centered at 1.25 GHz using the Five-hundred-meter Aperture Spherical radio Telescope. The integrated pulse profile shows two distinct components, referred to here as the leading and trailing components, with the latter component also containing a third weak component. The single-pulse s…

    Submitted 3 May, 2022; originally announced May 2022.

    Journal ref: The Astrophysical Journal, 929:171T (10pp), 2022

  45. arXiv:2205.00256  [pdf, other]

    cs.LG

    Heterogeneous Graph Neural Networks using Self-supervised Reciprocally Contrastive Learning

    Authors: Cuiying Huo, Dongxiao He, Yawen Li, Di Jin, Jianwu Dang, Weixiong Zhang, Witold Pedrycz, Lingfei Wu

    Abstract: Heterogeneous graph neural network (HGNN) is a very popular technique for the modeling and analysis of heterogeneous graphs. Most existing HGNN-based approaches are supervised or semi-supervised learning methods requiring graphs to be annotated, which is costly and time-consuming. Self-supervised contrastive learning has been proposed to address the problem of requiring annotated data by mining in…

    Submitted 16 November, 2023; v1 submitted 30 April, 2022; originally announced May 2022.

  46. arXiv:2203.15134  [pdf, other]

    astro-ph.HE astro-ph.SR

    Detection of strong scattering close to the eclipse region of PSR B1957+20

    Authors: J. T. Bai, S. Dai, Q. J. Zhi, W. A. Coles, D. Li, W. W. Zhu, G. Hobbs, G. J. Qiao, N. Wang, J. P. Yuan, M. D. Filipovic, J. B. Wang, Z. C. Pan, L. H. Shang, S. J. Dang, S. Q. Wang, C. C. Miao

    Abstract: We present the first measurement of pulse scattering close to the eclipse region of PSR B1957+20, which is in a compact binary system with a low-mass star. We measured pulse scattering time-scales up to 0.2 ms close to the eclipse and showed that it scales with the dispersion measure (DM) excess roughly as $\tau \propto \Delta{\rm DM}^{2}$. Our observations provide the first evidence of strong scattering du…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: 8 pages, 4 figures, MNRAS accepted

  47. arXiv:2203.09098  [pdf, other]

    cs.SD cs.LG eess.AS

    TMS: A Temporal Multi-scale Backbone Design for Speaker Embedding

    Authors: Ruiteng Zhang, Jianguo Wei, Xugang Lu, Wenhuan Lu, Di Jin, Junhai Xu, Lin Zhang, Yantao Ji, Jianwu Dang

    Abstract: Speaker embedding is an important front-end module to explore discriminative speaker features for many speech applications where speaker information is needed. Current SOTA backbone networks for speaker embedding are designed to aggregate multi-scale features from an utterance with multi-branch network architectures for speaker representation. However, naively adding many branches of multi-scale f…

    Submitted 17 March, 2022; originally announced March 2022.

    Comments: Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file

  48. arXiv:2203.01501  [pdf, other]

    cond-mat.mes-hall physics.optics

    Strong light-matter interactions between gap plasmons and two-dimensional excitons at ambient condition in a deterministic way

    Authors: Longlong Yang, Xin Xie, Jingnan Yang, Mengfei Xue, Shiyao Wu, Shan Xiao, Feilong Song, Jianchen Dang, Sibai Sun, Zhanchun Zuo, Jianing Chen, Yuan Huang, Xingjiang Zhou, Kuijuan Jin, Can Wang, Xiulai Xu

    Abstract: Strong exciton-plasmon interaction between the layered two-dimensional (2D) semiconductors and gap plasmons shows a great potential to implement cavity quantum-electrodynamics in ambient condition. However, achieving a robust plasmon-exciton coupling with nanocavity is still very challenging, because the layer area is usually small with conventional approaches. Here, we report on a robust strong e…

    Submitted 2 March, 2022; originally announced March 2022.

    Comments: 21 pages, 5 figures

    Journal ref: Nano Letters, 2022, 22, 2177-2186

  49. arXiv:2202.09995  [pdf, other]

    eess.AS cs.SD

    L-SpEx: Localized Target Speaker Extraction

    Authors: Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

    Abstract: Speaker extraction aims to extract the target speaker's voice from a multi-talker speech mixture given an auxiliary reference utterance. Recent studies show that speaker extraction benefits from the location or direction of the target speaker. However, these studies assume that the target speaker's location is known in advance or detected by an extra visual cue, e.g., face image or video. In this…

    Submitted 21 February, 2022; originally announced February 2022.

    Comments: Accepted in ICASSP 2022

  50. arXiv:2201.11893  [pdf, ps, other]

    cs.IT

    A New High Energy Efficiency Scheme Based on Two-Dimension Resource Blocks in Wireless Communication Systems

    Authors: Kang Liu, Zaichen Zhang, Jian Dang, Liang Wu, Bingchen Zhu, Lei Wang, Chuan Zhang

    Abstract: Energy efficiency (EE) plays a key role in future wireless communication networks, and it is easy to achieve high EE performance in the low SNR regime. In this paper, a new high EE scheme is proposed for a MIMO wireless communication system working in the low SNR regime by using two dimension resource allocation. First, we define the high EE area based on the relationship between the transmission powe…

    Submitted 27 January, 2022; originally announced January 2022.