Showing 1–31 of 31 results for author: Yoshii, K

  1. arXiv:2404.02840

    cs.DC

    A Survey on Error-Bounded Lossy Compression for Scientific Datasets

    Authors: Sheng Di, Jinyang Liu, Kai Zhao, Xin Liang, Robert Underwood, Zhaorui Zhang, Milan Shah, Yafan Huang, Jiajun Huang, Xiaodong Yu, Congrong Ren, Hanqi Guo, Grant Wilkins, Dingwen Tao, Jiannan Tian, Sian Jin, Zizhe Jian, Daoce Wang, MD Hasanur Rahman, Boyuan Zhang, Jon C. Calhoun, Guanpeng Li, Kazutomo Yoshii, Khalid Ayed Alharthi, Franck Cappello

    Abstract: Error-bounded lossy compression has been effective in significantly reducing the data storage/transfer burden while preserving the reconstructed data fidelity very well. Many error-bounded lossy compressors have been developed for a wide range of parallel and distributed use cases for years. These lossy compressors are designed with distinct compression models and design principles, such that each…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: submitted to ACM Computing journal, required to be 35 pages including references

  2. arXiv:2311.01739

    cs.DC cs.PF

    Efficient Algorithms for Monte Carlo Particle Transport on AI Accelerator Hardware

    Authors: John Tramm, Bryce Allen, Kazutomo Yoshii, Andrew Siegel, Leighton Wilson

    Abstract: The recent trend toward deep learning has led to the development of a variety of highly innovative AI accelerator architectures. One such architecture, the Cerebras Wafer-Scale Engine 2 (WSE-2), features 40 GB of on-chip SRAM, making it a potentially attractive platform for latency- or bandwidth-bound HPC simulation workloads. In this study, we examine the feasibility of performing continuous ener…

    Submitted 6 November, 2023; v1 submitted 3 November, 2023; originally announced November 2023.

    ACM Class: D.1.3; J.2

  3. arXiv:2306.10240

    cs.SD cs.LG eess.AS

    Neural Fast Full-Rank Spatial Covariance Analysis for Blind Source Separation

    Authors: Yoshiaki Bando, Yoshiki Masuyama, Aditya Arie Nugraha, Kazuyoshi Yoshii

    Abstract: This paper describes an efficient unsupervised learning method for a neural source separation model that utilizes a probabilistic generative model of observed multichannel mixtures proposed for blind source separation (BSS). For this purpose, amortized variational inference (AVI) has been used for directly solving the inverse problem of BSS with full-rank spatial covariance analysis (FCA). Althoug…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: 5 pages, 2 figures, accepted to EUSIPCO 2023

  4. arXiv:2305.04447

    eess.AS cs.SD

    Neural Steerer: Novel Steering Vector Synthesis with a Causal Neural Field over Frequency and Source Positions

    Authors: Diego Di Carlo, Aditya Arie Nugraha, Mathieu Fontaine, Kazuyoshi Yoshii

    Abstract: We address the problem of accurately interpolating measured anechoic steering vectors with a deep learning framework called the neural field. This task plays a pivotal role in reducing the resource-intensive measurements required for precise sound source separation and localization, essential as the front-end of speech recognition. Classical approaches to interpolation rely on linear weighting of…

    Submitted 1 March, 2024; v1 submitted 7 May, 2023; originally announced May 2023.

    Comments: Camera ready version for HSCMA 24 at ICASSP 24

  5. arXiv:2302.06751

    cs.AR cs.LG

    OpenHLS: High-Level Synthesis for Low-Latency Deep Neural Networks for Experimental Science

    Authors: Maksim Levental, Arham Khan, Ryan Chard, Kazutomo Yoshii, Kyle Chard, Ian Foster

    Abstract: In many experiment-driven scientific domains, such as high-energy physics, material science, and cosmology, high data rate experiments impose hard constraints on data acquisition systems: collected data must either be indiscriminately stored for post-processing and analysis, thereby necessitating large storage capacity, or accurately filtered in real-time, thereby necessitating low-latency process…

    Submitted 15 March, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

  6. arXiv:2207.10934

    eess.AS cs.SD

    DNN-Free Low-Latency Adaptive Speech Enhancement Based on Frame-Online Beamforming Powered by Block-Online FastMNMF

    Authors: Aditya Arie Nugraha, Kouhei Sekiguchi, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii

    Abstract: This paper describes a practical dual-process speech enhancement system that adapts environment-sensitive frame-online beamforming (front-end) with help from environment-free block-online source separation (back-end). To use minimum variance distortionless response (MVDR) beamforming, one may train a deep neural network (DNN) that estimates time-frequency masks used for computing the covariance ma…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: IWAENC 2022

  7. arXiv:2207.07296

    eess.AS cs.LG cs.SD

    Direction-Aware Adaptive Online Neural Speech Enhancement with an Augmented Reality Headset in Real Noisy Conversational Environments

    Authors: Kouhei Sekiguchi, Aditya Arie Nugraha, Yicheng Du, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii

    Abstract: This paper describes the practical response- and performance-aware development of online speech enhancement for an augmented reality (AR) headset that helps a user understand conversations made in real noisy echoic environments (e.g., cocktail party). One may use a state-of-the-art blind source separation method called fast multichannel nonnegative matrix factorization (FastMNMF) that works well i…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: IEEE/RSJ IROS 2022

  8. arXiv:2207.07273

    eess.AS cs.LG cs.SD

    Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments

    Authors: Yicheng Du, Aditya Arie Nugraha, Kouhei Sekiguchi, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii

    Abstract: This paper describes noisy speech recognition for an augmented reality headset that helps verbal communication within real multiparty conversational environments. A major approach that has actively been studied in simulated environments is to sequentially perform speech enhancement and automatic speech recognition (ASR) based on deep neural networks (DNNs) trained in a supervised manner. In our ta…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: INTERSPEECH 2022

  9. arXiv:2205.05330

    cs.SD eess.AS eess.SP stat.ML

    Generalized Fast Multichannel Nonnegative Matrix Factorization Based on Gaussian Scale Mixtures for Blind Source Separation

    Authors: Mathieu Fontaine, Kouhei Sekiguchi, Aditya Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii

    Abstract: This paper describes heavy-tailed extensions of a state-of-the-art versatile blind source separation method called fast multichannel nonnegative matrix factorization (FastMNMF) from a unified point of view. The common way of deriving such an extension is to replace the multivariate complex Gaussian distribution in the likelihood function with its heavy-tailed generalization, e.g., the multivariate…

    Submitted 11 May, 2022; originally announced May 2022.

    Journal ref: IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2022, pp.1-1

  10. arXiv:2111.05999

    cs.SE

    What Does the Post-Moore Era Mean for Research Software Engineering?

    Authors: Kazutomo Yoshii

    Abstract: We are entering the post-Moore era where we no longer enjoy the free ride of the performance growth from simply shrinking the transistor features. However, this does not necessarily mean that we are entering a dark era of computing. On the contrary, sustaining the performance growth of computing in the post-Moore era itself is cutting-edge research. Concretely, heterogeneity and hardware specializ…

    Submitted 10 November, 2021; originally announced November 2021.

    Comments: Research Software Engineers in HPC (RSE-HPC-2021) https://us-rse.org/rse-hpc-2021/

  11. arXiv:2105.05791

    cs.SD cs.LG eess.AS

    Global Structure-Aware Drum Transcription Based on Self-Attention Mechanisms

    Authors: Ryoto Ishizuka, Ryo Nishikimi, Kazuyoshi Yoshii

    Abstract: This paper describes an automatic drum transcription (ADT) method that directly estimates a tatum-level drum score from a music signal, in contrast to most conventional ADT methods that estimate the frame-level onset probabilities of drums. To estimate a tatum-level score, we propose a deep transcription model that consists of a frame-level encoder for extracting the latent features from a music s…

    Submitted 12 May, 2021; originally announced May 2021.

    Comments: Submitted to Signals (ISSN 2624-6120)

  12. arXiv:2010.03749

    cs.SD cs.LG eess.AS

    Tatum-Level Drum Transcription Based on a Convolutional Recurrent Neural Network with Language Model-Based Regularized Training

    Authors: Ryoto Ishizuka, Ryo Nishikimi, Eita Nakamura, Kazuyoshi Yoshii

    Abstract: This paper describes a neural drum transcription method that detects from music signals the onset times of drums at the $\textit{tatum}$ level, where tatum times are assumed to be estimated in advance. In conventional studies on drum transcription, deep neural networks (DNNs) have often been used to take a music spectrogram as input and estimate the onset times of drums at the $\textit{frame}$ lev…

    Submitted 7 October, 2020; originally announced October 2020.

    Comments: Accepted to APSIPA 2020

  13. arXiv:2010.00059

    cs.SD cs.LG eess.AS

    The MIDI Degradation Toolkit: Symbolic Music Augmentation and Correction

    Authors: Andrew McLeod, James Owers, Kazuyoshi Yoshii

    Abstract: In this paper, we introduce the MIDI Degradation Toolkit (MDTK), containing functions which take as input a musical excerpt (a set of notes with pitch, onset time, and duration), and return a "degraded" version of that excerpt with some error (or errors) introduced. Using the toolkit, we create the Altered and Corrupted MIDI Excerpts dataset version 1.0 (ACME v1.0), and propose four tasks of incre…

    Submitted 30 September, 2020; originally announced October 2020.

    Comments: Authors 1 and 2 contributed equally to this work. Accepted by International Society for Music Information Retrieval Conference (ISMIR), 2020

  14. Non-Local Musical Statistics as Guides for Audio-to-Score Piano Transcription

    Authors: Kentaro Shibata, Eita Nakamura, Kazuyoshi Yoshii

    Abstract: We present an automatic piano transcription system that converts polyphonic audio recordings into musical scores. This has been a long-standing problem of music information processing, and recent studies have made remarkable progress in the two main component techniques: multipitch detection and rhythm quantization. Given this situation, we study a method integrating deep-neural-network-based mult…

    Submitted 3 April, 2021; v1 submitted 28 August, 2020; originally announced August 2020.

    Comments: 16 pages, 7 figures, typos corrected

    Journal ref: Information Sciences, vol. 566, p. 262, 2021

  15. arXiv:2005.07091

    cs.SD cs.LG eess.AS

    Semi-supervised Neural Chord Estimation Based on a Variational Autoencoder with Latent Chord Labels and Features

    Authors: Yiming Wu, Tristan Carsault, Eita Nakamura, Kazuyoshi Yoshii

    Abstract: This paper describes a statistically-principled semi-supervised method of automatic chord estimation (ACE) that can make effective use of music signals regardless of the availability of chord annotations. The typical approach to ACE is to train a deep classification model (neural chord estimator) in a supervised manner by using only annotated music signals. In this discriminative approach, prior k…

    Submitted 8 September, 2020; v1 submitted 14 May, 2020; originally announced May 2020.

  16. arXiv:2004.03811

    cs.CV

    MirrorNet: A Deep Bayesian Approach to Reflective 2D Pose Estimation from Human Images

    Authors: Takayuki Nakatsuka, Kazuyoshi Yoshii, Yuki Koyama, Satoru Fukayama, Masataka Goto, Shigeo Morishima

    Abstract: This paper proposes a statistical approach to 2D pose estimation from human images. The main problems with the standard supervised approach, which is based on a deep recognition (image-to-pose) model, are that it often yields anatomically implausible poses, and its performance is limited by the amount of paired data. To solve these problems, we propose a semi-supervised method that can make effect…

    Submitted 8 April, 2020; originally announced April 2020.

    Comments: 19 pages

  17. arXiv:1911.04972

    cs.LG cs.SD eess.AS stat.ML

    Multi-Step Chord Sequence Prediction Based on Aggregated Multi-Scale Encoder-Decoder Network

    Authors: Tristan Carsault, Andrew McLeod, Philippe Esling, Jérôme Nika, Eita Nakamura, Kazuyoshi Yoshii

    Abstract: This paper studies the prediction of chord progressions for jazz music by relying on machine learning models. The motivation of our study comes from the recent success of neural networks for performing automatic music composition. Although high accuracies are obtained in single-step prediction scenarios, most models fail to generate accurate multi-step chord predictions. In this paper, we postulat…

    Submitted 12 November, 2019; originally announced November 2019.

    Comments: Accepted for publication in MLSP, 2019

  18. arXiv:1909.03632

    cs.DC

    Improving the scalability of neutron cross-section lookup codes on multicore NUMA system

    Authors: Kazutomo Yoshii, John Tramm, Andrew Siegel, Pete Beckman

    Abstract: We use the XSBench proxy application, a memory-intensive OpenMP program, to explore the source of on-node scalability degradation of a popular Monte Carlo (MC) reactor physics benchmark on non-uniform memory access (NUMA) systems. As background, we present the details of XSBench, a performance abstraction "proxy app" for the full MC simulation, as well as the internal design of the Linux kernel. W…

    Submitted 9 September, 2019; originally announced September 2019.

  19. arXiv:1908.11307

    cs.SD cs.LG eess.AS stat.ML

    Deep Bayesian Unsupervised Source Separation Based on a Complex Gaussian Mixture Model

    Authors: Yoshiaki Bando, Yoko Sasaki, Kazuyoshi Yoshii

    Abstract: This paper presents an unsupervised method that trains neural source separation by using only multichannel mixture signals. Conventional neural separation methods require a lot of supervised data to achieve excellent performance. Although multichannel methods based on spatial information can work without such training data, they are often sensitive to parameter initialization and degraded with the…

    Submitted 29 August, 2019; originally announced August 2019.

    Comments: 6 pages, 2 figures, accepted for publication in 2019 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)

  20. arXiv:1908.06969

    cs.SD cs.AI cs.LG eess.AS

    Musical Rhythm Transcription Based on Bayesian Piece-Specific Score Models Capturing Repetitions

    Authors: Eita Nakamura, Kazuyoshi Yoshii

    Abstract: Most work on musical score models (a.k.a. musical language models) for music transcription has focused on describing the local sequential dependence of notes in musical scores and failed to capture their global repetitive structure, which can be a useful guide for transcribing music. Focusing on rhythm, we formulate several classes of Bayesian Markov models of musical scores that describe repetiti…

    Submitted 16 February, 2021; v1 submitted 18 August, 2019; originally announced August 2019.

    Comments: Title changed; change in organizations of sections; appendix added; some explanations added; 14 pages, 9 figures (supplemental material: 11 pages)

  21. arXiv:1904.10237

    cs.LG cs.SD eess.AS

    Statistical Learning and Estimation of Piano Fingering

    Authors: Eita Nakamura, Yasuyuki Saito, Kazuyoshi Yoshii

    Abstract: Automatic estimation of piano fingering is important for understanding the computational process of music performance and applicable to performance assistance and education systems. While a natural way to formulate the quality of fingerings is to construct models of the constraints/costs of performance, it is generally difficult to find appropriate parameter values for these models. Here we study…

    Submitted 1 January, 2020; v1 submitted 23 April, 2019; originally announced April 2019.

    Comments: 30 pages, 8 figures, tex style changed, minor modifications

  22. arXiv:1903.09341

    cs.SD cs.LG eess.AS stat.ML

    Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition

    Authors: Kazuki Shimada, Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

    Abstract: This paper describes multichannel speech enhancement for improving automatic speech recognition (ASR) in noisy environments. Recently, the minimum variance distortionless response (MVDR) beamforming has widely been used because it works well if the steering vector of speech and the spatial covariance matrix (SCM) of noise are given. To estimate such spatial information, conventional studies take…

    Submitted 31 March, 2019; v1 submitted 21 March, 2019; originally announced March 2019.

  23. arXiv:1903.03269

    cs.SD cs.LG eess.AS stat.ML

    A Deep Generative Model of Speech Complex Spectrograms

    Authors: Aditya Arie Nugraha, Kouhei Sekiguchi, Kazuyoshi Yoshii

    Abstract: This paper proposes an approach to the joint modeling of the short-time Fourier transform magnitude and phase spectrograms with a deep generative model. We assume that the magnitude follows a Gaussian distribution and the phase follows a von Mises distribution. To improve the consistency of the phase values in the time-frequency domain, we also apply the von Mises distribution to the phase derivat…

    Submitted 7 March, 2019; originally announced March 2019.

  24. arXiv:1903.03237

    cs.SD cs.LG eess.AS stat.ML

    Fast Multichannel Source Separation Based on Jointly Diagonalizable Spatial Covariance Matrices

    Authors: Kouhei Sekiguchi, Aditya Arie Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii

    Abstract: This paper describes a versatile method that accelerates multichannel source separation methods based on full-rank spatial modeling. A popular approach to multichannel source separation is to integrate a spatial model with a source model for estimating the spatial covariance matrices (SCMs) and power spectral densities (PSDs) of each sound source in the time-frequency domain. One of the most succe…

    Submitted 7 March, 2019; originally announced March 2019.

  25. arXiv:1808.05006

    cs.AI cs.SD eess.AS

    Statistical Piano Reduction Controlling Performance Difficulty

    Authors: Eita Nakamura, Kazuyoshi Yoshii

    Abstract: We present a statistical-modelling method for piano reduction, i.e. converting an ensemble score into piano scores, that can control performance difficulty. While previous studies have focused on describing the condition for playable piano scores, it depends on the player's skill and can change continuously with the tempo. We thus computationally quantify performance difficulty as well as musical fide…

    Submitted 25 October, 2018; v1 submitted 15 August, 2018; originally announced August 2018.

    Comments: 12 pages, 7 figures, version accepted to APSIPA Transactions on Signal and Information Processing

  26. arXiv:1710.11439

    cs.SD cs.LG eess.AS stat.ML

    Statistical Speech Enhancement Based on Probabilistic Integration of Variational Autoencoder and Non-Negative Matrix Factorization

    Authors: Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

    Abstract: This paper presents a statistical method of single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech. A standard approach to speech enhancement is to train a deep neural network (DNN) to take noisy speech as input and output clean speech. Although this supervised approach requires a very large amount of paired data for training, it is not ro…

    Submitted 19 March, 2018; v1 submitted 31 October, 2017; originally announced October 2017.

    Comments: 5 pages, 3 figures, version in which Eqs. (9), (19), and (20) in v2 (submitted to ICASSP 2018) are corrected. Samples available here: http://sap.ist.i.kyoto-u.ac.jp/members/yoshiaki/demo/vae-nmf/

  27. arXiv:1708.02255

    cs.AI cs.CL cs.SD

    Generative Statistical Models with Self-Emergent Grammar of Chord Sequences

    Authors: Hiroaki Tsushima, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii

    Abstract: Generative statistical models of chord sequences play crucial roles in music processing. To capture syntactic similarities among certain chords (e.g. in C major key, between G and G7 and between F and Dm), we study hidden Markov models and probabilistic context-free grammar models with latent variables describing syntactic categories of chord symbols and their unsupervised learning techniques for…

    Submitted 2 March, 2018; v1 submitted 7 August, 2017; originally announced August 2017.

    Comments: 22 pages, 14 figures, version accepted to JNMR, minor revision

  28. Note Value Recognition for Piano Transcription Using Markov Random Fields

    Authors: Eita Nakamura, Kazuyoshi Yoshii, Simon Dixon

    Abstract: This paper presents a statistical method for use in music transcription that can estimate score times of note onsets and offsets from polyphonic MIDI performance signals. Because performed note durations can deviate largely from score-indicated values, previous methods had the problem of not being able to accurately estimate offset score times (or note values) and thus could only output incomplete…

    Submitted 7 July, 2017; v1 submitted 23 March, 2017; originally announced March 2017.

    Comments: 13 pages, 16 figures, version accepted to IEEE/ACM TASLP, minor revision

  29. arXiv:1701.08343

    cs.AI cs.SD

    Rhythm Transcription of Polyphonic Piano Music Based on Merged-Output HMM for Multiple Voices

    Authors: Eita Nakamura, Kazuyoshi Yoshii, Shigeki Sagayama

    Abstract: In a recent conference paper, we reported a rhythm transcription method based on a merged-output hidden Markov model (HMM) that explicitly describes the multiple-voice structure of polyphonic music. This model solves a major problem of conventional methods that could not properly describe the nature of multiple voices as in polyrhythmic scores or in the phenomenon of loose synchrony between v…

    Submitted 28 January, 2017; originally announced January 2017.

    Comments: 13 pages, 13 figures, version accepted to IEEE/ACM TASLP

  30. arXiv:1610.02606

    cs.OH

    Doing Moore with Less -- Leapfrogging Moore's Law with Inexactness for Supercomputing

    Authors: Sven Leyffer, Stefan M. Wild, Mike Fagan, Marc Snir, Krishna Palem, Kazutomo Yoshii, Hal Finkel

    Abstract: Energy and power consumption are major limitations to continued scaling of computing systems. Inexactness, where the quality of the solution can be traded for energy savings, has been proposed as an approach to overcoming those limitations. In the past, however, inexactness required highly customized or specialized hardware. The current evolution of commercial off-the-shelf (COTS)…

    Submitted 12 October, 2016; v1 submitted 8 October, 2016; originally announced October 2016.

    Comments: 9 pages, 12 figures, PDFLaTeX. 12 Oct 2016: Corrected author Hal Finkel's affiliation to show ALCF/Argonne

    ACM Class: F.2.1; G.1.5

  31. Singing Voice Separation and Vocal F0 Estimation based on Mutual Combination of Robust Principal Component Analysis and Subharmonic Summation

    Authors: Yukara Ikemiya, Katsutoshi Itoyama, Kazuyoshi Yoshii

    Abstract: This paper presents a new method of singing voice analysis that performs mutually-dependent singing voice separation and vocal fundamental frequency (F0) estimation. Vocal F0 estimation is considered to become easier if singing voices can be separated from a music audio signal, and vocal F0 contours are useful for singing voice separation. This calls for an approach that improves the performance o…

    Submitted 1 April, 2016; originally announced April 2016.

    Comments: 11 pages