Showing 1–18 of 18 results for author: Jang, J R

  1. arXiv:2407.00657  [pdf, other]

    cs.SD cs.LG eess.AS

    Improving Real-Time Music Accompaniment Separation with MMDenseNet

    Authors: Chun-Hsiang Wang, Chung-Che Wang, Jun-You Wang, Jyh-Shing Roger Jang, Yen-Hsun Chu

    Abstract: Music source separation aims to separate polyphonic music into different types of sources. Most existing methods focus on enhancing the quality of separated results by using a larger model structure, rendering them unsuitable for deployment on edge devices. Moreover, these methods may produce low-quality output when the input duration is short, making them impractical for real-time applications. T…

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2406.15751  [pdf, other]

    cs.SD eess.AS

    Improving Unsupervised Clean-to-Rendered Guitar Tone Transformation Using GANs and Integrated Unaligned Clean Data

    Authors: Yu-Hua Chen, Woosung Choi, Wei-Hsiang Liao, Marco Martínez-Ramírez, Kin Wai Cheuk, Yuki Mitsufuji, Jyh-Shing Roger Jang, Yi-Hsuan Yang

    Abstract: Recent years have seen increasing interest in applying deep learning methods to the modeling of guitar amplifiers or effect pedals. Existing methods are mainly based on the supervised approach, requiring temporally-aligned data pairs of unprocessed and rendered audio. However, this approach does not scale well, due to the complicated process involved in creating the data pairs. A very recent work…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted to DAFx 2024

  3. arXiv:2406.04582  [pdf, other]

    eess.AS cs.SD

    Neural Codec-based Adversarial Sample Detection for Speaker Verification

    Authors: Xuanjun Chen, Jiawei Du, Haibin Wu, Jyh-Shing Roger Jang, Hung-yi Lee

    Abstract: Automatic Speaker Verification (ASV), increasingly used in security-critical applications, faces vulnerabilities from rising adversarial attacks, with few effective defenses available. In this paper, we propose a neural codec-based adversarial sample detection method for ASV. The approach leverages the codec's ability to discard redundant perturbations and retain essential information. Specificall…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  4. arXiv:2406.03111  [pdf, other]

    eess.AS eess.SP

    Singing Voice Graph Modeling for SingFake Detection

    Authors: Xuanjun Chen, Haibin Wu, Jyh-Shing Roger Jang, Hung-yi Lee

    Abstract: Detecting singing voice deepfakes, or SingFake, involves determining the authenticity and copyright of a singing voice. Existing models for speech deepfake detection have struggled to adapt to unseen attacks in this unique singing voice domain of human vocalization. To bridge the gap, we present a groundbreaking SingGraph model. The model synergizes the capabilities of the MERT acoustic music unde…

    Submitted 9 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024; Our code is available at https://github.com/xjchenGit/SingGraph.git

  5. arXiv:2402.13018  [pdf, other]

    eess.AS cs.SD

    EMO-SUPERB: An In-depth Look at Speech Emotion Recognition

    Authors: Haibin Wu, Huang-Cheng Chou, Kai-Wei Chang, Lucas Goncalves, Jiawei Du, Jyh-Shing Roger Jang, Chi-Chun Lee, Hung-Yi Lee

    Abstract: Speech emotion recognition (SER) is a pivotal technology for human-computer interaction systems. However, 80.77% of SER papers yield results that cannot be reproduced. We develop EMO-SUPERB, short for EMOtion Speech Universal PERformance Benchmark, which aims to enhance open-source initiatives for SER. EMO-SUPERB includes a user-friendly codebase to leverage 15 state-of-the-art speech self-supervi…

    Submitted 12 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: webpage: https://emosuperb.github.io/

  6. arXiv:2311.16267  [pdf, other]

    cs.CL cs.SE

    Novel Preprocessing Technique for Data Embedding in Engineering Code Generation Using Large Language Model

    Authors: Yu-Chen Lin, Akhilesh Kumar, Norman Chang, Wenliang Zhang, Muhammad Zakir, Rucha Apte, Haiyang He, Chao Wang, Jyh-Shing Roger Jang

    Abstract: We present four main contributions to enhance the performance of Large Language Models (LLMs) in generating domain-specific code: (i) utilizing LLM-based data splitting and data renovation techniques to improve the semantic representation of embeddings' space; (ii) introducing the Chain of Density for Renovation Credibility (CoDRC), driven by LLMs, and the Adaptive Text Renovation (ATR) algorithm…

    Submitted 30 January, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  7. arXiv:2311.12488  [pdf, other]

    eess.AS cs.SD

    Adapting pretrained speech model for Mandarin lyrics transcription and alignment

    Authors: Jun-You Wang, Chon-In Leong, Yu-Chen Lin, Li Su, Jyh-Shing Roger Jang

    Abstract: The tasks of automatic lyrics transcription and lyrics alignment have witnessed significant performance improvements in the past few years. However, most of the previous works only focus on English in which large-scale datasets are available. In this paper, we address lyrics transcription and alignment of polyphonic Mandarin pop music in a low-resource setting. To deal with the data scarcity issue…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: Accepted by ASRU 2023

  8. arXiv:2307.15293  [pdf, other]

    cs.CL cs.AI

    WC-SBERT: Zero-Shot Text Classification via SBERT with Self-Training for Wikipedia Categories

    Authors: Te-Yu Chi, Yu-Meng Tang, Chia-Wen Lu, Qiu-Xia Zhang, Jyh-Shing Roger Jang

    Abstract: Our research focuses on solving the zero-shot text classification problem in NLP, with a particular emphasis on innovative self-training strategies. To achieve this objective, we propose a novel self-training strategy that uses labels rather than text for training, significantly reducing the model's training time. Specifically, we use categories from Wikipedia as our training set and leverage the…

    Submitted 28 July, 2023; originally announced July 2023.

  9. arXiv:2302.08130  [pdf, other]

    cs.SD cs.LG eess.AS

    Personalized Audio Quality Preference Prediction

    Authors: Chung-Che Wang, Yu-Chun Lin, Yu-Teng Hsu, Jyh-Shing Roger Jang

    Abstract: This paper proposes to use both audio input and subject information to predict the personalized preference of two audio segments with the same content in different qualities. A siamese network is used to compare the inputs and predict the preference. Several different structures for each side of the siamese network are investigated, and an LDNet with PANNs' CNN6 as the encoder and a multi-layer pe…

    Submitted 16 February, 2023; originally announced February 2023.

  10. arXiv:2210.15563  [pdf, other]

    cs.CV cs.IR cs.SD eess.AS

    Multimodal Transformer Distillation for Audio-Visual Synchronization

    Authors: Xuanjun Chen, Haibin Wu, Chung-Che Wang, Hung-yi Lee, Jyh-Shing Roger Jang

    Abstract: Audio-visual synchronization aims to determine whether the mouth movements and speech in the video are synchronized. VocaLiST reaches state-of-the-art performance by incorporating multimodal Transformers to model audio-visual interact information. However, it requires high computing resources, making it impractical for real-world applications. This paper proposed an MTDVocaLiST model, which is tra…

    Submitted 18 March, 2024; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: Accepted by ICASSP 2024

  11. arXiv:2210.00753  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection

    Authors: Xuanjun Chen, Haibin Wu, Helen Meng, Hung-yi Lee, Jyh-Shing Roger Jang

    Abstract: Audio-visual active speaker detection (AVASD) is well-developed, and now is an indispensable front-end for several multi-modal applications. However, to the best of our knowledge, the adversarial robustness of AVASD models hasn't been investigated, not to mention the effective defense against such attacks. In this paper, we are the first to reveal the vulnerability of AVASD models under audio-only…

    Submitted 3 October, 2022; originally announced October 2022.

    Comments: Accepted by SLT 2022

  12. arXiv:2203.17031  [pdf, other]

    cs.SD cs.LG eess.AS

    Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification

    Authors: Yen-Lun Liao, Xuanjun Chen, Chung-Che Wang, Jyh-Shing Roger Jang

    Abstract: The countermeasure (CM) model is developed to protect ASV systems from spoof attacks and prevent resulting personal information leakage in Automatic Speaker Verification (ASV) system. Based on practicality and security considerations, the CM model is usually deployed on edge devices, which have more limited computing resources and storage space than cloud-based systems, confining the model size un…

    Submitted 2 October, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted by ISCA SPSC 2022

  13. arXiv:2202.09907  [pdf, other]

    cs.SD eess.AS

    Towards automatic transcription of polyphonic electric guitar music: a new dataset and a multi-loss transformer model

    Authors: Yu-Hua Chen, Wen-Yi Hsiao, Tsu-Kuang Hsieh, Jyh-Shing Roger Jang, Yi-Hsuan Yang

    Abstract: In this paper, we propose a new dataset named EGDB, that contains transcriptions of the electric guitar performance of 240 tablatures rendered with different tones. Moreover, we benchmark the performance of two well-known transcription models proposed originally for the piano on this dataset, along with a multi-loss Transformer model that we newly propose. Our evaluation on this dataset and a se…

    Submitted 20 February, 2022; originally announced February 2022.

    Comments: to be published at ICASSP 2022

  14. arXiv:2110.06707  [pdf, other]

    cs.SD cs.MM eess.AS

    Singer separation for karaoke content generation

    Authors: Hsuan-Yu Chen, Xuanjun Chen, Jyh-Shing Roger Jang

    Abstract: Due to the rapid development of deep learning, we can now successfully separate singing voice from mono audio music. However, this separation can only extract human voices from other musical instruments, which is undesirable for karaoke content generation applications that only require the separation of lead singers. For this karaoke application, we need to separate the music containing male and f…

    Submitted 12 January, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP 2022

  15. arXiv:1812.01269  [pdf, other]

    cs.SD eess.AS

    Learning to match transient sound events using attentional similarity for few-shot sound recognition

    Authors: Szu-Yu Chou, Kai-Hsiang Cheng, Jyh-Shing Roger Jang, Yi-Hsuan Yang

    Abstract: In this paper, we introduce a novel attentional similarity module for the problem of few-shot sound recognition. Given a few examples of an unseen sound event, a classifier must be quickly adapted to recognize the new sound event without much fine-tuning. The proposed attentional similarity module can be plugged into any metric-based learning method for few-shot learning, allowing the resulting mo…

    Submitted 18 February, 2019; v1 submitted 4 December, 2018; originally announced December 2018.

    Comments: This is a pre-print version of an ICASSP 2019 paper

  16. arXiv:1807.02254  [pdf, other]

    cs.SD cs.AI eess.AS

    Singing Style Transfer Using Cycle-Consistent Boundary Equilibrium Generative Adversarial Networks

    Authors: Cheng-Wei Wu, Jen-Yu Liu, Yi-Hsuan Yang, Jyh-Shing R. Jang

    Abstract: Can we make a famous rap singer like Eminem sing whatever our favorite song? Singing style transfer attempts to make this possible, by replacing the vocal of a song from the source singer to the target singer. This paper presents a method that learns from unpaired data for singing style transfer using generative adversarial networks.

    Submitted 6 July, 2018; originally announced July 2018.

    Comments: 3 pages, 3 figures, demo website: http://mirlab.org/users/haley.wu/cybegan

    Journal ref: ICML Workshop 2018 (Joint Music Workshop)

  17. arXiv:1805.09621  [pdf, other]

    cs.LG cs.CV stat.ML

    Backpropagation with N-D Vector-Valued Neurons Using Arbitrary Bilinear Products

    Authors: Zhe-Cheng Fan, Tak-Shing T. Chan, Yi-Hsuan Yang, Jyh-Shing R. Jang

    Abstract: Vector-valued neural learning has emerged as a promising direction in deep learning recently. Traditionally, training data for neural networks (NNs) are formulated as a vector of scalars; however, its performance may not be optimal since associations among adjacent scalars are not modeled. In this paper, we propose a new vector neural architecture called the Arbitrary BIlinear Product Neural Netwo…

    Submitted 24 May, 2018; originally announced May 2018.

    Comments: 14 pages, 8 figures, 3 tables

  18. arXiv:1710.11428  [pdf, other]

    cs.SD cs.LG eess.AS

    SVSGAN: Singing Voice Separation via Generative Adversarial Network

    Authors: Zhe-Cheng Fan, Yen-Lin Lai, Jyh-Shing Roger Jang

    Abstract: Separating two sources from an audio mixture is an important task with many applications. It is a challenging problem since only one signal channel is available for analysis. In this paper, we propose a novel framework for singing voice separation using the generative adversarial network (GAN) with a time-frequency masking function. The mixture spectra is considered to be a distribution and is map…

    Submitted 13 November, 2017; v1 submitted 31 October, 2017; originally announced October 2017.

    Comments: 5 pages, 4 figures, 1 table. Demo website: http://mirlab.org/demo/svsgan