Kyushu University Researcher Information
List of Papers
Kohei WAKAMIYA (若宮 幸平)  Data last updated: 2023.11.27

Assistant Professor / Faculty of Design, Department of Acoustic Design, Information and Acoustic Systems


Original Papers
1. Shunsuke HIDAKA, Yogaku Lee, Moe NAKANISHI, Kohei WAKAMIYA, Takashi NAKAGAWA, Tokihiko KABURAGI, Automatic GRBAS Scoring of Pathological Voices using Deep Learning and a Small Set of Labeled Voice Data, Journal of Voice, https://doi.org/10.1016/j.jvoice.2022.10.020, 2022.10.

Objectives
Auditory-perceptual evaluation frameworks, such as the grade-roughness-breathiness-asthenia-strain (GRBAS) scale, are the gold standard for the quantitative evaluation of pathological voice quality. However, the evaluation is subjective; thus, the ratings lack reproducibility due to inter- and intra-rater variation. Prior researchers have proposed deep-learning-based automatic GRBAS score estimation to address this problem. However, these methods require large amounts of labeled voice data. Therefore, this study investigates the potential of automatic GRBAS estimation using deep learning with smaller amounts of data.

Methods
A dataset consisting of 300 pathological sustained /a/ vowel samples was created and rated by eight experts (200 for training, 50 for validation, and 50 for testing). A neural network model that predicts the probability distribution of GRBAS scores from an onset-to-offset waveform was proposed. Random speed perturbation, random crop, and frequency masking were investigated as data augmentation techniques, and power, instantaneous frequency, and group delay were investigated as time-frequency representations.

Results
Five-fold cross-validation was conducted, and the automatic scoring performance was evaluated using the quadratic weighted Cohen's kappa. The results showed that the kappa values of the automatic scoring performance were comparable to those of the inter-rater reliability of experts for all GRBAS items and the intra-rater reliability of experts for items G, B, A, and S. Random speed perturbation was the most effective data augmentation technique overall. When data augmentation was applied, power was the most effective for items G, R, A, and S; for Item B, combining group delay and power yielded additional performance gains.

Conclusion
The automatic GRBAS scoring achieved by the proposed model using scant labeled data was comparable to that of experts. This suggests that the challenges resulting from insufficient data can be alleviated. The findings of this study can also contribute to performance improvements in other tasks such as automatic voice disorder detection.
2. Kazuo UEDA, Hiroshige TAKEICHI, Kohei WAKAMIYA, Auditory grouping is necessary to understand interrupted mosaic speech stimuli, The Journal of the Acoustical Society of America, 10.1121/10.0013425, 152, 2, 970-980, 2022.08, The intelligibility of interrupted speech stimuli has been known to be almost perfect when segment duration is shorter than 80 ms, which means that the interrupted segments are perceptually organized into a coherent stream under this condition. However, why listeners can successfully group the interrupted segments into a coherent stream has been largely unknown. Here, we show that the intelligibility for mosaic speech in which original speech was segmented in frequency and time and noise-vocoded with the average power in each unit was largely reduced by periodical interruption. At the same time, the intelligibility could be recovered by promoting auditory grouping of the interrupted segments by stretching the segments up to 40 ms and reducing the gaps, provided that the number of frequency bands was enough (≥4) and the original segment duration was equal to or less than 40 ms. The interruption was devastating for mosaic speech stimuli, very likely because the deprivation of periodicity and temporal fine structure with mosaicking prevented successful auditory grouping for the interrupted segments.
3. Hidetsugu UCHIDA, Kohei WAKAMIYA, Tokihiko KABURAGI, Improvement of measurement accuracy for the three-dimensional electromagnetic articulograph by optimizing the alignment of the transmitter coils, Acoustical Science and Technology, 37, 3, 106-114, 2016.05, The alignment of transmitter coils for the three-dimensional electromagnetic articulograph (3D-EMA), an instrument used to measure articulatory movements, was studied. Receiver coils of the 3D-EMA are used as position markers and are placed in an alternating magnetic field produced by multiple transmitter coils. The estimation of the state (the position and orientation) of each receiver coil is based on the minimization of the signal error between the measured and predicted receiver signals using a model of the magnetic field. Previous studies report a noticeable increase in the position estimation error irrespective of small signal error at a specific portion of the measurement region. The existence of the non-uniqueness problem in the position estimation is hypothesized to be the cause of this problem. To resolve the problem, we optimized the alignment of the transmitter coils by maximizing the difference between the receiver signals for any pair of states in the measurement region and evaluated the alignment by performing computer simulations and actual measurements. As a result, a measurement accuracy of approximately 0.4 mm was obtained.
4. Tokihiko KABURAGI, Kohei WAKAMIYA, Masaaki HONDA, Three-dimensional electromagnetic articulography: A measurement principle, The Journal of the Acoustical Society of America, 10.1121/1.1928707, 118, 1, 428-443, 2005.07.
5. Kohei WAKAMIYA, Takuya TSUJI and Tokihiko KABURAGI, Estimation of the vocal tract spectrum from the articulatory movement using phoneme-dependent neural networks, Proceedings of the International Conference on Spoken Language Processing 2004, TuB603-15, 2004.10.
6. Kohei WAKAMIYA, Tokihiko KABURAGI and Masaaki HONDA, An investigation of the measurement accuracy on the three-dimensional electromagnetic articulography, Proceedings of the 6th International Seminar on Speech Production, 301-307, 2003.12.
7. Tokihiko KABURAGI, Kohei WAKAMIYA and Masaaki HONDA, Three-dimensional electromagnetic articulograph based on a nonparametric representation of the magnetic field, Proc. International Conference on Spoken Language Processing, 2297-2300, 2002.09.
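8. Terumitsu TANAKA, Kohei WAKAMIYA, Toshiyuki SUZUKI, Effect of the head fringe field in narrow-track recording (in Japanese), The Magnetics Society of Japan, 24/4-2, 375-378, 2000.04.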
9. Terumitsu TANAKA, Kohei WAKAMIYA and Toshiyuki SUZUKI, Read/Write Track Fringe Effect of Thin Film and MR Head with Different Pole Shapes, IEICE Transactions on Electronics, E82-C/12, 2165-2170, 1999.12.
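10. Yoshihiro OKAMOTO, Yusuke INAI, Kohei WAKAMIYA, Hidetoshi SAITO, Hisashi OSAWA, Error rate performance of path-feedback Viterbi decoding in digital magnetic recording (in Japanese), Memoirs of the Faculty of Engineering, Ehime University, XVII, 101-110, 1998.02.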
