Kyushu University Academic Staff Educational and Research Activities Database
Kohei WAKAMIYA Last modified date:2023.11.27

Assistant Professor / Information and acoustics system
Department of Acoustic Design
Faculty of Design

Homepage
https://kyushu-u.elsevierpure.com/en/persons/kohei-wakamiya
 Researcher Profiling Tool Kyushu University Pure
Phone
092-553-4547
Fax
092-553-4520
Academic Degree
Ph.D. in Engineering
Country of degree conferring institution (Overseas)
No
Field of Specialization
Speech Science
Total period of education and research career in foreign countries
0 years 0 months
Outline Activities
Spoken conversation is one of the most fundamental means of communication between human beings. When speaking, people move their articulators and can produce speech sounds of various qualities. I would like to investigate how human beings produce these speech sounds. For this purpose, I would like to build an artificial speech production model analogous to the human speech mechanism. Currently, my research focuses on three-dimensional electromagnetic articulography, which is used to observe the inside and outside of the vocal tract in order to analyze the speech production mechanism. I am also interested in source separation for monaural recordings and in the auditory-perceptual evaluation of hoarse voices. In addition, I am in charge of classes including Acoustics Lab I and II, Electric Lab, Information Theory, Speech Informatics, Acoustics Programming Practice, and Seminar.
Research
Research Interests
  • Study on the observation and analysis of the speech production mechanism
    keyword : Speech production, Three-dimensional electromagnetic articulography, Articulator movement, Speech synthesis
    2001.04 Observation instruments and methods for, and analysis of, the human speech production mechanism are studied.
  • Study on the auditory-perceptual evaluation of hoarseness
    keyword : Hoarseness, GRBAS scale, automatic estimation, DNN, voice quality evaluation, auditory perceptual evaluation, reliability
    2017.04 Observation instruments and methods for, and analysis of, the human speech production mechanism are studied.
  • Study on source separation for monaural recordings
    keyword : monaural recording sound, source separation
    2013.04~2015.05.
  • Study on high-density digital magnetic recording
    keyword : HDD, PRML, Equalization, Encoding, Head/disk interaction
    1997.04~2000.03 Head/disk interaction and signal processing for high-density hard disk drives are studied.
Academic Activities
Books
1. ``Introduction of Acoustic Design -Sound, Music and Technology,'' edited by Department of Acoustic Design, Kyushu Institute of Design; authors: Kimiko INOUE, Shinichiro IWAMIYA, Yuichi UEDA, Toshio OGATA, Akira OMOTO, Kazuhiko KAWAHARA, Tetsuji KAWABE, Toshiyuki SUZUKI, Hideyuki TAKAGI, Takashi TSUMURA, Hideo TORIHARA, Yoshitaka NAKAJIMA, Kimitoshi FUKUDOME, Kyoji FUJIWARA, Ken MATSUNAGA, Toshio MATSUMOTO, Tsuneo MIZUNO, Masato YAKO, Shigeru YOSHIKAWA and Kohei WAKAMIYA (Kyushu University Press).
Papers
1. Shunsuke HIDAKA, Yogaku LEE, Moe NAKANISHI, Kohei WAKAMIYA, Takashi NAKAGAWA, Tokihiko KABURAGI, Automatic GRBAS Scoring of Pathological Voices using Deep Learning and a Small Set of Labeled Voice Data, Journal of Voice, https://doi.org/10.1016/j.jvoice.2022.10.020, 2022.10.

Objectives
Auditory-perceptual evaluation frameworks, such as the grade-roughness-breathiness-asthenia-strain (GRBAS) scale, are the gold standard for the quantitative evaluation of pathological voice quality. However, the evaluation is subjective; thus, the ratings lack reproducibility due to inter- and intra-rater variation. Prior researchers have proposed deep-learning-based automatic GRBAS score estimation to address this problem. However, these methods require large amounts of labeled voice data. Therefore, this study investigates the potential of automatic GRBAS estimation using deep learning with smaller amounts of data.

Methods
A dataset consisting of 300 pathological sustained /a/ vowel samples was created and rated by eight experts (200 for training, 50 for validation, and 50 for testing). A neural network model that predicts the probability distribution of GRBAS scores from an onset-to-offset waveform was proposed. Random speed perturbation, random crop, and frequency masking were investigated as data augmentation techniques, and power, instantaneous frequency, and group delay were investigated as time-frequency representations.

Results
Five-fold cross-validation was conducted, and the automatic scoring performance was evaluated using the quadratic weighted Cohen's kappa. The results showed that the kappa values of the automatic scoring performance were comparable to those of the inter-rater reliability of experts for all GRBAS items and the intra-rater reliability of experts for items G, B, A, and S. Random speed perturbation was the most effective data augmentation technique overall. When data augmentation was applied, power was the most effective for items G, R, A, and S; for Item B, combining group delay and power yielded additional performance gains.

Conclusion
The automatic GRBAS scoring achieved by the proposed model using scant labeled data was comparable to that of experts. This suggests that the challenges resulting from insufficient data can be alleviated. The findings of this study can also contribute to performance improvements in other tasks such as automatic voice disorder detection.
2. Kazuo UEDA, Hiroshige TAKEICHI, Kohei WAKAMIYA, Auditory grouping is necessary to understand interrupted mosaic speech stimuli, The Journal of the Acoustical Society of America, 10.1121/10.0013425, 152, 2, 970-980, 2022.08, The intelligibility of interrupted speech stimuli has been known to be almost perfect when segment duration is shorter than 80 ms, which means that the interrupted segments are perceptually organized into a coherent stream under this condition. However, why listeners can successfully group the interrupted segments into a coherent stream has been largely unknown. Here, we show that the intelligibility for mosaic speech in which original speech was segmented in frequency and time and noise-vocoded with the average power in each unit was largely reduced by periodical interruption. At the same time, the intelligibility could be recovered by promoting auditory grouping of the interrupted segments by stretching the segments up to 40 ms and reducing the gaps, provided that the number of frequency bands was enough (≥4) and the original segment duration was equal to or less than 40 ms. The interruption was devastating for mosaic speech stimuli, very likely because the deprivation of periodicity and temporal fine structure with mosaicking prevented successful auditory grouping for the interrupted segments.
3. Hidetsugu UCHIDA, Kohei WAKAMIYA, Tokihiko KABURAGI, Improvement of measurement accuracy for the three-dimensional electromagnetic articulograph by optimizing the alignment of the transmitter coils, Acoustical Science and Technology, 37, 3, 106-114, 2016.05, The alignment of transmitter coils for the three-dimensional electromagnetic articulograph (3D-EMA), an instrument used to measure articulatory movements, was studied. Receiver coils of the 3D-EMA are used as position markers and are placed in an alternating magnetic field produced by multiple transmitter coils. The estimation of the state (the position and orientation) of each receiver coil is based on the minimization of the signal error between the measured and predicted receiver signals using a model of the magnetic field. Previous studies report a noticeable increase in the position estimation error, despite a small signal error, at a specific portion of the measurement region. The existence of the non-uniqueness problem in the position estimation is hypothesized to be the cause of this problem. To resolve the problem, we optimized the alignment of the transmitter coils by maximizing the difference between the receiver signals for any pair of states in the measurement region and evaluated the alignment by performing computer simulations and actual measurements. As a result, a measurement accuracy of approximately 0.4 mm was obtained.
4. Tokihiko KABURAGI, Kohei WAKAMIYA, Masaaki HONDA, Three-dimensional electromagnetic articulography: A measurement principle, J. Acoust. Soc. Am., 10.1121/1.1928707, 118, 1, 428-443, 2005.07.
5. Kohei WAKAMIYA, Takuya TSUJI and Tokihiko KABURAGI, Estimation of the vocal tract spectrum from the articulatory movement using phoneme-dependent neural networks, Proceedings of International Conference on Spoken Language Processing 2004, TuB603-15, 2004.10.
6. Kohei WAKAMIYA, Tokihiko KABURAGI and Masaaki HONDA, An investigation of the measurement accuracy on the three-dimensional electromagnetic articulography, Proceedings of the 6th International Seminar on Speech Production, 301-307, 2003.12.
7. Tokihiko KABURAGI, Kohei WAKAMIYA and Masaaki HONDA, Three-dimensional electromagnetic articulograph based on a nonparametric representation of the magnetic field, Proc. International Conference on Spoken Language Processing, 2297-2300, 2002.09.
8. Terumitsu TANAKA, Kohei WAKAMIYA and Toshiyuki SUZUKI, Read/Write Track Fringe Effect of Thin Film and MR Head with Different Pole Shapes, IEICE Transactions on Electronics, E82-C/12, 2165-2170, 1999.12.
9. ``Error Rate Performance of Viterbi Detection with Path-feedback in Digital Magnetic Recording,'' Yoshihiro OKAMOTO, Yusuke INAI, Kohei WAKAMIYA, Hidetoshi SAITO and Hisashi OSAWA, Memoirs of the Faculty of Engineering, Ehime University.
Educational
Educational Activities
Basically Dynamics and Practice, Design Literacy Basics, Design Case Study I, Acoustics Lab I and II, Electrical Lab, Information Theory, Basic Practice of Design, Information Processing Practice V, Digital Signal Processing Practice, Speech Informatics, Acoustical Programming Practice, Seminar, Special seminar of Acoustic Information Transmission, PBL of Acoustic Information Transmission, Advanced Speech Production
Social
Professional and Outreach Activities
A Japanese Scholarship Association interview committee (1999).