Kyushu University Researcher Information
Kohei WAKAMIYA (若宮 幸平, わかみや こうへい)    Data last updated: 2024.04.25

Assistant Professor / Faculty of Design, Department of Acoustic Design, Acoustic Information Systems

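Keywords: speech production, three-dimensional magnetic sensor system, articulatory movement, speech synthesis
Keywords: hoarseness, GRBAS scale, automatic estimation, DNN, voice quality assessment, auditory-perceptual evaluation, reliability
Keywords: sound source separation, monaural recording
Keywords: HDD, PRML, equalization, coding, head-disk interaction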
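Books
1. Department of Acoustic Design, Kyushu Institute of Design (ed.), 井上公子, 岩宮眞一郎, 上田裕市, 緒方敏郎, 尾本章, 河原一彦, 河辺哲次, 鈴木俊行, 高木英行, 津村尚志, 鳥原秀男, 中島祥好, 福留公利, 藤原恭司, 松永建, Introduction to Acoustic Design: Sound, Music, and Technology (音響設計学入門 音・音楽・テクノロジー), Kyushu University Press, 2000.12.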
Papers
1. Shunsuke HIDAKA, Yogaku LEE, Moe NAKANISHI, Kohei WAKAMIYA, Takashi NAKAGAWA, Tokihiko KABURAGI, Automatic GRBAS Scoring of Pathological Voices using Deep Learning and a Small Set of Labeled Voice Data, Journal of Voice, https://doi.org/10.1016/j.jvoice.2022.10.020, 2022.10.
Objectives
Auditory-perceptual evaluation frameworks, such as the grade-roughness-breathiness-asthenia-strain (GRBAS) scale, are the gold standard for the quantitative evaluation of pathological voice quality. However, the evaluation is subjective; thus, the ratings lack reproducibility due to inter- and intra-rater variation. Prior researchers have proposed deep-learning-based automatic GRBAS score estimation to address this problem. However, these methods require large amounts of labeled voice data. Therefore, this study investigates the potential of automatic GRBAS estimation using deep learning with smaller amounts of data.

A dataset consisting of 300 pathological sustained /a/ vowel samples was created and rated by eight experts (200 for training, 50 for validation, and 50 for testing). A neural network model that predicts the probability distribution of GRBAS scores from an onset-to-offset waveform was proposed. Random speed perturbation, random crop, and frequency masking were investigated as data augmentation techniques, and power, instantaneous frequency, and group delay were investigated as time-frequency representations.

Five-fold cross-validation was conducted, and automatic scoring performance was evaluated using the quadratic weighted Cohen's kappa. The kappa values for automatic scoring were comparable to the experts' inter-rater reliability for all GRBAS items and to their intra-rater reliability for items G, B, A, and S. Random speed perturbation was the most effective data augmentation technique overall. When data augmentation was applied, power was the most effective representation for items G, R, A, and S; for item B, combining group delay and power yielded additional performance gains.

The automatic GRBAS scoring achieved by the proposed model using a small amount of labeled data was comparable to that of experts. This suggests that the challenges resulting from insufficient data can be alleviated. The findings of this study may also contribute to performance improvements in other tasks, such as automatic voice disorder detection.
2. Kazuo UEDA, Hiroshige TAKEICHI, Kohei WAKAMIYA, Auditory grouping is necessary to understand interrupted mosaic speech stimuli, The Journal of the Acoustical Society of America, 10.1121/10.0013425, 152, 2, 970-980, 2022.08, The intelligibility of interrupted speech stimuli is known to be almost perfect when the segment duration is shorter than 80 ms, which means that the interrupted segments are perceptually organized into a coherent stream under this condition. However, why listeners can successfully group the interrupted segments into a coherent stream has remained largely unknown. Here, we show that the intelligibility of mosaic speech, in which the original speech was segmented in frequency and time and each unit was noise-vocoded with its average power, was greatly reduced by periodic interruption. At the same time, intelligibility could be recovered by promoting auditory grouping of the interrupted segments, that is, by stretching the segments up to 40 ms and reducing the gaps, provided that there were enough frequency bands (≥ 4) and the original segment duration was 40 ms or less. The interruption was devastating for mosaic speech stimuli, very likely because mosaicking removes periodicity and temporal fine structure, which prevented successful auditory grouping of the interrupted segments.
3. Hidetsugu UCHIDA, Kohei WAKAMIYA, Tokihiko KABURAGI, Improvement of measurement accuracy for the three-dimensional electromagnetic articulograph by optimizing the alignment of the transmitter coils, Acoustical Science and Technology, 37, 3, 106-114, 2016.05, The alignment of transmitter coils for the three-dimensional electromagnetic articulograph (3D-EMA), an instrument used to measure articulatory movements, was studied. The receiver coils of the 3D-EMA serve as position markers and are placed in an alternating magnetic field produced by multiple transmitter coils. The state (position and orientation) of each receiver coil is estimated by minimizing the error between the measured receiver signals and those predicted by a model of the magnetic field. Previous studies reported a noticeable increase in the position estimation error, despite a small signal error, in a specific portion of the measurement region. Non-uniqueness of the position estimate was hypothesized to be the cause of this problem. To resolve it, we optimized the alignment of the transmitter coils by maximizing the difference between the receiver signals for any pair of states in the measurement region, and evaluated the alignment through computer simulations and actual measurements. As a result, a measurement accuracy of approximately 0.4 mm was obtained.
4. Tokihiko KABURAGI, Kohei WAKAMIYA, Masaaki HONDA, Three-dimensional electromagnetic articulography: A measurement principle, The Journal of the Acoustical Society of America, 10.1121/1.1928707, 118, 1, 428-443, 2005.07.
5. Kohei WAKAMIYA, Takuya TSUJI, Tokihiko KABURAGI, Estimation of the vocal tract spectrum from the articulatory movement using phoneme-dependent neural networks, Proceedings of the International Conference on Spoken Language Processing 2004, TuB603-15, 2004.10.
6. Kohei WAKAMIYA, Tokihiko KABURAGI, Masaaki HONDA, An investigation of the measurement accuracy on the three-dimensional electromagnetic articulography, Proceedings of the 6th International Seminar on Speech Production, 301-307, 2003.12.
7. Tokihiko KABURAGI, Kohei WAKAMIYA, Masaaki HONDA, Three-dimensional electromagnetic articulograph based on a nonparametric representation of the magnetic field, Proceedings of the International Conference on Spoken Language Processing, 2297-2300, 2002.09.
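8. Terumitsu TANAKA, Kohei WAKAMIYA, Toshiyuki SUZUKI, Effect of the head fringe field in narrow-track recording (狭トラック記録におけるヘッドフリンジ磁界の影響), The Magnetics Society of Japan (日本応用磁気学会), 24, 4-2, 375-378, 2000.04.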
9. Terumitsu TANAKA, Kohei WAKAMIYA, Toshiyuki SUZUKI, Read/Write Track Fringe Effect of Thin Film and MR Head with Different Pole Shapes, IEICE Transactions on Electronics, E82-C, 12, 2165-2170, 1999.12.
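10. Yoshihiro OKAMOTO, Yusuke INAI, Kohei WAKAMIYA, Hidetoshi SAITO, Hisashi OSAWA, Error rate performance of the path-feedback Viterbi decoding method in digital magnetic recording (ディジタル磁気記録におけるパス帰還ビタビ復号法の誤り率特性), Memoirs of the Faculty of Engineering, Ehime University, XVII, 101-110, 1998.02.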
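Paper 1 compares automatic GRBAS scoring with expert ratings using the quadratic weighted Cohen's kappa, κ = 1 − Σ w_ij O_ij / Σ w_ij E_ij. Here O is the observed rater-by-rater count matrix, E is the count matrix expected if the raters were independent, and w_ij = (i − j)² / (k − 1)². The NumPy sketch below illustrates that standard statistic. It is not the authors' evaluation code. The function name `quadratic_weighted_kappa` is hypothetical, and it assumes each GRBAS item is scored as an integer from 0 to 3.

```python
import numpy as np


def quadratic_weighted_kappa(scores_a, scores_b, num_categories=4):
    """Quadratic weighted Cohen's kappa between two sets of integer ratings.

    Assumption: ratings are integers in 0..num_categories-1 (0 to 3 for a
    GRBAS item). Returns NaN when the expected disagreement is zero (e.g. both
    raters always give the same single score), where kappa is undefined.
    """
    a = np.asarray(scores_a, dtype=int)
    b = np.asarray(scores_b, dtype=int)
    if a.shape != b.shape or a.ndim != 1 or a.size == 0:
        raise ValueError("expected two non-empty 1-D rating sequences of equal length")
    k = num_categories
    if k < 2:
        raise ValueError("need at least two rating categories")
    if min(a.min(), b.min()) < 0 or max(a.max(), b.max()) >= k:
        raise ValueError("ratings must lie in 0..num_categories-1")

    # Observed matrix: observed[i, j] counts items scored i by rater A and j by rater B.
    observed = np.zeros((k, k))
    np.add.at(observed, (a, b), 1.0)

    # Expected matrix under rater independence: outer product of the marginals / n.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / a.size

    # Quadratic disagreement weights: w[i, j] = (i - j)^2 / (k - 1)^2.
    idx = np.arange(k)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (k - 1) ** 2

    expected_disagreement = np.sum(weights * expected)
    if expected_disagreement == 0:
        return float("nan")
    return float(1.0 - np.sum(weights * observed) / expected_disagreement)
```

For example, identical score lists give 1.0. `quadratic_weighted_kappa([0, 1, 2, 3], [3, 2, 1, 0])` gives -1.0 (up to floating-point rounding), because every score is fully reversed. Passing the full label range explicitly avoids silently dropping unused categories. With that range, the result should agree with scikit-learn's `cohen_kappa_score(a, b, weights="quadratic", labels=[0, 1, 2, 3])`.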
Patent applications: 2
Patents granted: 0
Academic society memberships
International Speech Communication Association
Institute of Electrical and Electronics Engineers
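Roles in academic societies
2019.04~2021.03, Acoustical Society of Japan, Kyushu Chapter, Councilor.
2016.07~2016.07, Joint Conference of Electrical, Electronics and Information Engineers in Kyushu, Program Editing Committee member.
2016.04~2018.03, Acoustical Society of Japan, Kyushu Chapter, Secretary.
2004.04~2005.03, Joint Association of the Kyushu Chapters of Electrical Engineering Societies, Officer.
2003.04~2005.03, Acoustical Society of Japan, Kyushu Chapter, Secretary.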
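Roles at conferences and symposia
2016.09.29~2016.09.30, Joint Conference of Electrical, Electronics and Information Engineers in Kyushu, Program Editing Committee member.
2008.09.10~2008.09.20, Acoustical Society of Japan Autumn Meeting, Executive Committee member.
2004.09~2004.09, Joint Conference of Electrical, Electronics and Information Engineers in Kyushu, Officer.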
Overseas travel and overseas education and research experience
Pukyong National University, Korea, 2017.02~2017.02.
Max Atria at Singapore EXPO, Singapore, 2014.09~2014.09.
International College of Tourism and Hotel Management, Australia, 2003.12~2003.12.
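Research funding
FY2023~FY2025, Grant-in-Aid for Scientific Research (C), Principal Investigator, Research on a speech training system for hearing-impaired people using the movement trajectories of the speech organs.
FY2023~FY2023, The Telecommunications Advancement Foundation, FY2022 Research Grant, Principal Investigator, Construction of an articulatory and speech database.
FY2019~FY2021, Soda Toyoji Memorial Foundation (曽田豊二記念財団) Research Grant, Principal Investigator, Research on articulatory observation using a three-dimensional magnetic sensor system and its applications.
2007.09~2008.03, Co-investigator, Construction of a three-dimensional magnetic sensor system.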

