Kyushu University Academic Staff Educational and Research Activities Database
List of Presentations
Kohei WAKAMIYA Last modified date:2023.11.27

Assistant Professor / Information and acoustics system / Department of Acoustic Design / Faculty of Design


Presentations
1. Interrupted and stretched mosaic speech: Dissociating the effect of interruption from the temporal resolution degradation on intelligibility in interrupted and stretched mosaic speech.
2. Classification of vocal fold vibration patterns based on high-speed digital imaging in whistle register.
3. Automatic Evaluation of Pathological Voice Quality using a Small Set of Labeled Voice Samples.
4. Kazuo UEDA, Hiroshige TAKEICHI, Kohei WAKAMIYA, Gerard B. REMIJN, Auditory grouping by stretching: Regaining intelligibility of interrupted mosaic speech stimuli, 2022 Autumn Meeting of the Acoustical Society of Japan, 2022.09, The intelligibility of interrupted speech stimuli has been known to be almost perfect when segment duration is shorter than 80 ms, which means that the interrupted segments are perceptually organized into a coherent stream under this condition. However, why listeners can successfully group the interrupted segments into a coherent stream has been largely unknown. Here we show that the intelligibility of mosaic speech, in which original speech was segmented in frequency and time and noise-vocoded with the average power in each unit, was largely reduced by periodic interruption. The interruption was devastating for mosaic speech, very likely because the deprivation of periodicity and temporal fine structure with mosaicking prevented successful auditory grouping of the interrupted segments. At the same time, the intelligibility could be recovered by promoting auditory grouping of the interrupted segments by stretching the segments up to 40 ms and reducing the gaps, provided that the number of frequency bands was sufficient (≥ 4) and the original segment duration was equal to or less than 40 ms. These results suggest that a grouping cue may play an important role in the perception of normal speech under adverse conditions.
5. Kazuo Ueda, Hiroshige Takeichi, Kohei Wakamiya, AUDITORY GROUPING FACILITATES UNDERSTANDING INTERRUPTED MOSAIC SPEECH STIMULI, 38th Annual Meeting of the International Society for Psychophysics, 2022.08, The intelligibility of interrupted speech stimuli has been known to be almost perfect when segment duration is shorter than 80 ms, which means that the interrupted segments are perceptually organized into a coherent stream under this condition. However, why listeners can successfully group the interrupted segments into a coherent stream has been largely unknown. Here we show that the intelligibility of mosaic speech, in which original speech was segmented in frequency and time and noise-vocoded with the average power in each unit, was largely reduced by periodic interruption. At the same time, the intelligibility could be recovered by promoting auditory grouping of the interrupted segments by stretching the segments up to 40 ms and reducing the gaps, provided that the number of frequency bands was sufficient (≥ 4) and the original segment duration was equal to or less than 40 ms. The interruption was devastating for mosaic speech stimuli, very likely because a poor grouping cue, which resulted from the deprivation of periodicity and temporal fine structure with mosaicking, prevented successful auditory grouping of the interrupted segments. These results suggest that a grouping cue should play an important role in the perception of normal speech under adverse conditions.
6. Shunsuke HIDAKA, Kohei WAKAMIYA, Tokihiko KABURAGI, AN INVESTIGATION OF THE EFFECTIVENESS OF PHASE FOR AUDIO CLASSIFICATION, 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2022), 2022.05, While the log-amplitude mel-spectrogram has been widely used as the feature representation for processing speech based on deep learning, the effectiveness of another aspect of the speech spectrum, i.e., phase information, was shown recently for tasks such as speech enhancement and source separation. In this study, we extensively investigated the effectiveness of including phase information of signals for eight audio classification tasks. We constructed a learnable front-end that can compute the phase and its derivatives based on a time-frequency representation with a mel-like frequency axis. Experimental results showed significant performance improvement for musical pitch detection, musical instrument detection, language identification, speaker identification, and birdsong detection. On the other hand, overfitting to the recording condition was observed for some tasks when the instantaneous frequency was used. The results implied that the relationship between the phase values of adjacent elements is more important than the phase itself in audio classification.
7. An Investigation of Effective Representation of Phase Information for Audio Classification.
8. On the Reproducibility of Auditory Perceptual Evaluation of Pathological Voice.
9. Shunsuke Hidaka, Yogaku Lee, Kohei Wakamiya, Takashi Nakagawa, Tokihiko Kaburagi, Automatic Estimation of Pathological Voice Quality based on Recurrent Neural Network using Amplitude and Phase Spectrogram, Interspeech 2020, 2020.10, Perceptual evaluation of voice quality is widely used in laryngological practice, but it lacks reproducibility because of inter- and intra-rater variability. This problem can be solved by automatic estimation of voice quality using machine learning. In previous studies, conventional acoustic features, such as jitter, have often been employed as inputs. However, many of them are vulnerable to severe hoarseness because they assume a quasiperiodicity of voice. This paper investigated non-parametric features derived from amplitude and phase spectrograms. We applied the instantaneous phase correction proposed by Yatabe et al. (2018) to extract features that could be interpreted as indicators of non-sinusoidality. Specifically, we compared log amplitude, temporal phase variation, temporal complex value variation, and mel-scale versions of them. A deep neural network with a bidirectional GRU was constructed for each item of the GRBAS scale, a hoarseness evaluation method. The dataset was composed of 2545 samples of sustained vowel /a/ with the GRBAS scores labeled by an otolaryngologist. The results showed that the Hz-mel conversion improved the performance in almost all cases. The best scores were obtained when using temporal phase variation along the mel scale for Grade, Rough, Breathy, and Strained, and when using log mel amplitude for Asthenic.
10. Evaluation of Pathological Voice Quality considering Amplitude and Phase Spectra.
11. Shunsuke Hidaka, Yogaku Lee, Kohei Wakamiya, Takashi Nakagawa, Tokihiko Kaburagi, Automatic Evaluation of Voice Severity using Deep Neural Network, The Voice Foundation's VIRTUAL VOICE SYMPOSIUM Care of the Professional Voice, 2020.05, Introduction: Perceptual evaluation of voice quality (e.g., the GRBAS scale or CAPE-V) is used widely in laryngological practice. However, this method suffers from a lack of reproducibility caused by inter- and intra-rater variability. To date, how to improve the reliability of judgement has been a topic of discussion among clinicians. Objective: The purpose of this study was to solve the inevitable problem of perceptual evaluation by building an automatic evaluation system. Understandably, automatic evaluation is surely reproducible (i.e., reliable). Moreover, the system was required to output meaningful judgements (i.e., to be valid). Methods: We constructed a deep neural network (DNN) that estimated all the scores of the GRBAS scale. The DNN was composed of bidirectional GRUs and fully connected layers. As the acoustic feature, we compared the spectrogram and mel-spectrogram of speech samples obtained using sustained vowel /a/. The dataset for supervised learning was composed of 3118 samples. All true labels were given by an otolaryngologist. Results: The performance of the system was measured in terms of accuracy and a statistical agreement index, Cohen’s linearly weighted Kappa. Five-fold cross validation showed an accuracy of 60% on average. The Kappa scores of G, B, A, and S were “moderate” and that of R was “fair.” For all the GRBAS items, the performance was higher when using the mel-spectrogram. Conclusions: Our study showed the feasibility of automatic evaluation. In order to indicate how valid the system performance is, future studies could investigate inter- and intra-rater variability for our dataset.
12. Evaluation of Pathological Voice Quality using Deep Neural Network.
13. Acoustics-related demonstrations at the open campus of the School of Design, Kyushu University – case study of the year 2019 –.
14. Collection of large-scale Japanese articulatory-acoustic parallel data.
15. Kohei WAKAMIYA, Hidetsugu UCHIDA, Tokihiko Kaburagi, ALIGNMENT OF THE TRANSMITTER COILS IN THE THREE-DIMENSIONAL ELECTROMAGNETIC ARTICULOGRAPHY HAVING EIGHT TRANSMISSION CHANNELS, Youngnam-Kyushu Joint Conference on Acoustics 2017, 2017.02, The relationship between position-estimation performance and transmitter-coil alignment in three-dimensional electromagnetic articulography (3D-EMA) having eight transmission channels was investigated. 3D-EMA is a measurement system used to observe articulatory movements and consists of transmitter coils as magnetic field generators and receiver coils as position markers. In this system, the state (position and orientation) of each receiver coil is estimated by minimizing the signal difference between the measured and predicted receiver signals. The magnetic field distribution determined by the transmitter-coil alignment has a strong influence on the position-estimation performance. In a previous study, we proposed a method for transmitter-coil alignment using a criterion expressing the minimum signal difference between two points in the measurement region. We have here increased the number of transmission channels from six to eight and investigated the position-estimation performance for various combinations of transmitter-coil position using computer simulations. As a result, we found that two combinations of transmitter-coil positions produced good results for the position estimation. One combination allocated coils at the vertices of a rectangular parallelepiped; in another the transmitter coils were allocated at the vertices of a cube. For both, the spacing of the coils was reduced from the ordinary size.
16. Examination of the electromagnetic articulograph having eight transmission channels.
17. Construction of a three-dimensional electromagnetic articulograph system having an optimal transmitter alignment.
18. Kohei WAKAMIYA, Hidetsugu UCHIDA, Tokihiko Kaburagi, The Effect of Additional Transmission Channels in Three-dimensional Electromagnetic Articulography, Kyushu-Youngnam Joint Conference on Acoustics 2015, 2015.01, The relationship between position-estimation performance and the number of transmission channels in three-dimensional electromagnetic articulography (3D-EMA), a measurement system used to observe articulatory movements, was investigated. The receiver coils of the 3D-EMA are used as position markers and are placed in an alternating magnetic field produced by multiple transmitter coils. The state (position and orientation) of each receiver coil is estimated by minimizing the signal error between the measured and predicted receiver signals using the magnetic field model. In our previous study, we proposed a method for aligning the transmitter coils that uses as its criterion the minimum value of the difference between predicted receiver signals for any two states in the measurement region. This method was developed to resolve the problem that, in a specific zone of the measurement region, the position estimation error increased noticeably despite a small signal error. In this study, we added transmitter coils to the 3D-EMA system, optimized the transmitter alignment, and investigated the estimation accuracy by computer simulation. As a result, we found that increasing the number of transmission channels improves the estimation accuracy, and that the maximum estimation error might become less than 1 mm when the number is 8.
19. Hidetsugu UCHIDA, Kohei WAKAMIYA, Tokihiko Kaburagi, A Study on the Improvement of Measurement Accuracy of the Three-Dimensional Electromagnetic Articulography, Interspeech 2014, 2014.09, The alignment of the transmitter coils for the three-dimensional electromagnetic articulography (3D-EMA), an instrument used to measure articulatory movements, was studied. The receiver coils of the 3D-EMA are used as position markers and are placed in an alternating magnetic field produced by multiple transmitter coils. The estimation of state (the position and orientation) of each receiver coil is based on the minimization of signal error between the measured and predicted receiver signals using a model of the magnetic field. Previous studies report a noticeable increase in the position estimation error at a specific portion of the measurement region irrespective of small signal error values. The existence of non-uniqueness in the position estimation problem is hypothesized to be the cause of this problem. To resolve the problem, we optimized the alignment of the transmitter coils by maximizing the difference between the receiver signals at any two states in the measurement region and evaluated the alignment using a computer simulation and an experiment. As a result, a measurement accuracy of approximately 0.4 mm was obtained.
20. A study about the number of the transmission channels in the three-dimensional electromagnetic articulography.
21. Hidetsugu Uchida, Kohei Wakamiya, Tokihiko Kaburagi, A study on the improvement of measurement accuracy of the three-dimensional electromagnetic articulography, 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014, 2014.01, The alignment of the transmitter coils for the three-dimensional electromagnetic articulography (3D-EMA), an instrument used to measure articulatory movements, was studied. The receiver coils of the 3D-EMA are used as position markers and are placed in an alternating magnetic field produced by multiple transmitter coils. The estimation of state (the position and orientation) of each receiver coil is based on the minimization of signal error between the measured and predicted receiver signals using a model of the magnetic field. Previous studies report a noticeable increase in the position estimation error at a specific portion of the measurement region irrespective of small signal error values. The existence of non-uniqueness in the position estimation problem is hypothesized to be the cause of this problem. To resolve the problem, we optimized the alignment of the transmitter coils by maximizing the difference between the receiver signals at any two states in the measurement region and evaluated the alignment using a computer simulation and an experiment. As a result, a measurement accuracy of approximately 0.4 mm was obtained.
22. A study on the optimal alignment of transmitter coils for a three-dimensional electromagnetic articulography.
23. A study on the alignment of transmitter coils for three-dimensional articulography and accuracy evaluation.
24. A Study on the alignment of transmitter coils for three-dimensional articulography.
25. Case study of introduction and planning ``2003 Spring Exhibition: Sound and Technology'' at The Saga Pref. Space & Science Museum.
26. [Tutorial Invited Lecture] An Introductory Workshop on Acoustics for High School Students.
27. Continuum model of the vocal fold based on a particle method.
28. Construction of a continuum model of the vocal fold using a particle method.
29. A performance evaluation of the three-dimensional electromagnetic articulographic system.
30. An evaluation of the three-dimensional electromagnetic articulographic system.
31. Articulatory-to-Acoustic Mapping Using Phoneme-Dependent Neural Networks: An Investigation on the Parameter Learning Method.
32. Generation of the vocal-tract spectrum using a dynamic articulatory model.
33. Kohei Wakamiya, Tokihiko Kaburagi, Takuya Tsuji, Jiji Kim, Estimation of the vocal tract spectrum from articulatory movements using phoneme-dependent neural networks, 8th International Conference on Spoken Language Processing, ICSLP 2004, 2004.01, This paper presents an estimation method of the vocal tract spectrum from articulatory movements. The method is based on the interpolation of spectra obtained by phoneme-dependent neural networks. Given the phonemic context and articulation timing corresponding to each phoneme, the proposed method first transforms articulator positions to phoneme-dependent spectra. Then the vocal tract spectrum is estimated by the interpolation of transformed spectra. This interpolation is based on the distance among the input articulator position and that of the preceding and succeeding phonemes. Also, a training procedure of the neural networks is presented while taking the spectral interpolation into account. Articulatory and acoustic data pairs collected by a simultaneous recording of articulator positions and speech were used as the training and test data. Finally, we showed an estimation result using the proposed method.
34. ``On the convergence of position estimation on the three-dimensional electromagnetic articulography,'' Kohei WAKAMIYA, Tokihiko KABURAGI and Masaaki HONDA, The 2003 autumn meeting of the acoustical society of Japan.
35. ``Articulatory-to-acoustic Mapping using Phoneme-based Neural Networks,'' Takuya TSUJI, Jiji KIM, Kohei WAKAMIYA and Tokihiko KABURAGI, IEICE Speech conference.
36. ``A factor of position estimation error on a three-dimensional electromagnetic articulography,'' Kohei WAKAMIYA, Tokihiko KABURAGI, Kohtaro SAWADA, Masaaki HONDA and Masashi SAWADA, The 2002 autumn meeting of the acoustical society of Japan.
37. ``A study on the source of position estimation errors in a three-dimensional electromagnetic articulography,'' Kohei WAKAMIYA, Tokihiko KABURAGI, Kohtaro SAWADA, Masaaki HONDA and Masashi SAWADA, IEICE Speech conference.
38. ``A study on the accuracy of position estimates on a three-dimensional electromagnetic articulograph system,'' Kohei WAKAMIYA, Tokihiko KABURAGI, Kohtaro SAWADA and Masaaki HONDA, The 2002 spring meeting of the acoustical society of Japan.
39. ``Three-dimensional electromagnetic articulography using a spline representation of the magnetic field,'' Tokihiko KABURAGI, Kohei WAKAMIYA, Kohtaro SAWADA and Masaaki HONDA, IEICE Speech conference.
40. Tokihiko Kaburagi, Kohei Wakamiya, Masaaki Honda, Three-dimensional electromagnetic articulograph based on a nonparametric representation of the magnetic field, 7th International Conference on Spoken Language Processing, ICSLP 2002, 2002.01, A measurement method of the three-dimensional electromagnetic articulograph system is presented to investigate the dynamic behavior of articulatory organs which can include lateral or rotational movements. To accurately represent the spatial pattern of the magnetic field, we use a multivariate B-spline function, which smoothly interpolates a given set of calibration data samples. The strength of the received signal is predicted based on the spline field function while considering the tilting effect of the receiver coil relative to the direction of the magnetic field. The position and orientation of the receiver coil are then estimated using an iterative procedure so that the difference between the measured and predicted signal strengths is minimized. Preliminary experiments showed that the mean estimation error of the receiver position is about 0.5 mm when the axis of the receiver coil is parallel with one of the axes of the coordinate system.
41. ``Estimable Limitation of High Linear Density Waveforms Superposing Reproduced Isolated Pulses,'' Kohei WAKAMIYA and Toshiyuki SUZUKI, IEICE Magnetic Recording Conference.
42. ``A Study of Superposing Reproduced Isolated Pulses,'' Kohei WAKAMIYA, Kazutaka TAKENAGA and Toshiyuki SUZUKI, 2000 Joint Conference of Electrical and Electronics Engineers in Kyushu.
43. ``A study of neural network equalization using error confinement,'' Kohei WAKAMIYA and Toshiyuki SUZUKI, 1999 Joint Conference of Electrical and Electronics Engineers in Kyushu.
44. ``Head fringe field effects on thin film disk performance,'' Terumitsu TANAKA, Kohei WAKAMIYA and Toshiyuki SUZUKI, The 23rd Annual Conference on Magnetics in Japan.
45. ``Performance improvement of M-H loop tracer for soft-magnetic thin films using virtual instrument,'' Kohei WAKAMIYA, Shomin JAMG and Toshiyuki SUZUKI, 1998 Joint Conference of Electrical and Electronics Engineers in Kyushu.
46. ``Appearance Mechanisms on Read/Write Track Fringe Widths in Thin Film Disks,'' Terumitsu TANAKA, Kohei WAKAMIYA and Toshiyuki SUZUKI, 1998 Joint Conference of Electrical and Electronics Engineers in Kyushu.
47. ``Considerations on Read/Write Track Fringe Widths in Thin Film Disks,'' Terumitsu TANAKA, Kohei WAKAMIYA and Toshiyuki SUZUKI, The 22nd Annual Conference on Magnetics in Japan.