1. |
Kazuo Ueda, Masashi Hashimoto, Hiroshige Takeichi, and Kohei Wakamiya, Interrupted mosaic speech revisited: Gain and loss in intelligibility by stretching, The Journal of the Acoustical Society of America, https://doi.org/10.1121/10.0025132, 155, 3, 1767-1779, 2024.03, Our previous investigation on the effect of stretching spectrotemporally degraded and temporally interrupted speech stimuli showed remarkable intelligibility gains [Ueda, Takeichi, and Wakamiya (2022). J. Acoust. Soc. Am. 152(2), 970–980]. In this previous study, however, gap durations and temporal resolution were confounded. In the current investigation, we therefore observed the intelligibility of so-called mosaic speech while dissociating the effects of interruption and temporal resolution. The intelligibility of mosaic speech (20 frequency bands and 20 ms segment duration) declined from 95% to 78% and 33% when it was interrupted with 20 and 80 ms gaps, respectively. Intelligibility improved, however, to 92% and 54% (14% and 21% gains for 20 and 80 ms gaps, respectively) by stretching mosaic segments to fill silent gaps (n = 21). By contrast, intelligibility fell to a minimum of 9% (7% loss) when stimuli interrupted with 160 ms gaps were stretched. Explanations based on auditory grouping, modulation unmasking, or phonemic restoration may account for the intelligibility improvement by stretching, but not for the loss. The probability summation model accounted for “U”-shaped intelligibility curves and the gain and loss of intelligibility, suggesting that perceptual unit length and speech rate may affect the intelligibility of spectrotemporally degraded speech stimuli. |
2. |
Hikari KATO, Yogaku LEE, Kohei WAKAMIYA, Takashi NAKAGAWA and Tokihiko KABURAGI, Vocal Fold Vibration of the Whistle Register Observed by High-Speed Digital Imaging, Journal of Voice, https://doi.org/10.1016/j.jvoice.2023.08.026, 2023.06, Introduction Singers use a whistle register to sing at a fundamental frequency above 1000 Hz. In previous studies, vocal fold vibrations with or without complete closure and partial vocal fold vibrations were observed depending on the subject. However, the production mechanism of the whistle register is not yet clearly understood because of limitations in imaging the glottis and in the subjects examined. Objectives This study aims to examine vocal fold vibrations in the whistle register. Methods The dynamic behavior of the glottis was recorded for six singers (four females and two males) using a high-speed digital imaging device with a frame rate above 10,000 fps. Audio signals were recorded simultaneously. The data were analyzed in the form of topography, glottal area waveforms, spectrograms, and phonovibrography to examine spatiotemporal patterns of glottal motion. Results The vibratory motion of the vocal folds was classified into six patterns. The first pattern was vibration of the entire vocal fold with complete closure during the closed phase. The second to fifth patterns were vibrations of the entire vocal fold without complete closure, with a gap observed along the full length of the vocal folds in the second, at the posterior part of the glottis in the third, at the anterior part in the fourth, and at both ends in the fifth. In the sixth pattern, the vocal folds vibrated only partially. Our results support previous findings on vocal fold vibration. In addition, we identified novel vibratory patterns of the vocal folds.
Conclusion We conclude that the production of the whistle register is not just an extension of the falsetto register to a higher fundamental-frequency region; rather, the whistle register appears to be produced by diverse mechanisms. |
3. |
Shunsuke HIDAKA, Yogaku LEE, Moe NAKANISHI, Kohei WAKAMIYA, Takashi NAKAGAWA, Tokihiko KABURAGI, Automatic GRBAS Scoring of Pathological Voices using Deep Learning and a Small Set of Labeled Voice Data, Journal of Voice, https://doi.org/10.1016/j.jvoice.2022.10.020, 2022.10, Objectives Auditory-perceptual evaluation frameworks, such as the grade-roughness-breathiness-asthenia-strain (GRBAS) scale, are the gold standard for the quantitative evaluation of pathological voice quality. However, the evaluation is subjective; thus, the ratings lack reproducibility due to inter- and intra-rater variation. Prior researchers have proposed deep-learning-based automatic GRBAS score estimation to address this problem. However, these methods require large amounts of labeled voice data. Therefore, this study investigates the potential of automatic GRBAS estimation using deep learning with smaller amounts of data.
Methods A dataset consisting of 300 pathological sustained /a/ vowel samples was created and rated by eight experts (200 for training, 50 for validation, and 50 for testing). A neural network model that predicts the probability distribution of GRBAS scores from an onset-to-offset waveform was proposed. Random speed perturbation, random crop, and frequency masking were investigated as data augmentation techniques, and power, instantaneous frequency, and group delay were investigated as time-frequency representations.
Results Five-fold cross-validation was conducted, and the automatic scoring performance was evaluated using the quadratic weighted Cohen's kappa. The results showed that the kappa values of the automatic scoring performance were comparable to those of the inter-rater reliability of experts for all GRBAS items and the intra-rater reliability of experts for items G, B, A, and S. Random speed perturbation was the most effective data augmentation technique overall. When data augmentation was applied, power was the most effective for items G, R, A, and S; for Item B, combining group delay and power yielded additional performance gains.
Conclusion The automatic GRBAS scoring achieved by the proposed model using scant labeled data was comparable to that of experts. This suggests that the challenges resulting from insufficient data can be alleviated. The findings of this study may also contribute to performance improvements in other tasks such as automatic voice disorder detection. |
4. |
Kazuo UEDA, Hiroshige TAKEICHI, Kohei WAKAMIYA, Auditory grouping is necessary to understand interrupted mosaic speech stimuli, The Journal of the Acoustical Society of America, https://doi.org/10.1121/10.0013425, 152, 2, 970-980, 2022.08, The intelligibility of interrupted speech stimuli has been known to be almost perfect when segment duration is shorter than 80 ms, which means that the interrupted segments are perceptually organized into a coherent stream under this condition. However, why listeners can successfully group the interrupted segments into a coherent stream has been largely unknown. Here, we show that the intelligibility of mosaic speech, in which the original speech was segmented in frequency and time and noise-vocoded with the average power in each unit, was largely reduced by periodic interruption. At the same time, intelligibility could be recovered by promoting auditory grouping of the interrupted segments, that is, by stretching the segments up to 40 ms and reducing the gaps, provided that the number of frequency bands was sufficient (≥4) and the original segment duration was equal to or less than 40 ms. The interruption was devastating for mosaic speech stimuli, very likely because the removal of periodicity and temporal fine structure by mosaicking prevented successful auditory grouping of the interrupted segments. |
5. |
Hidetsugu UCHIDA, Kohei WAKAMIYA, Tokihiko KABURAGI, Improvement of measurement accuracy for the three-dimensional electromagnetic articulograph by optimizing the alignment of the transmitter coils, Acoustical Science and Technology, 37, 3, 106-114, 2016.05, The alignment of transmitter coils for the three-dimensional electromagnetic articulograph (3D-EMA), an instrument used to measure articulatory movements, was studied. Receiver coils of the 3D-EMA are used as position markers and are placed in an alternating magnetic field produced by multiple transmitter coils. The state (position and orientation) of each receiver coil is estimated by minimizing the error between the measured receiver signals and those predicted by a model of the magnetic field. Previous studies report a noticeable increase in the position estimation error, despite a small signal error, in a specific portion of the measurement region. This problem is hypothesized to be caused by non-uniqueness in the position estimation. To resolve it, we optimized the alignment of the transmitter coils by maximizing the difference between the receiver signals for any pair of states in the measurement region, and evaluated the alignment through computer simulations and actual measurements. As a result, a measurement accuracy of approximately 0.4 mm was obtained. |
6. |
Tokihiko KABURAGI, Kohei WAKAMIYA, Masaaki HONDA, Three-dimensional electromagnetic articulography: A measurement principle, J. Acoust. Soc. Am., https://doi.org/10.1121/1.1928707, 118, 1, 428-443, 2005.07. |
7. |
Kohei WAKAMIYA, Takuya TSUJI and Tokihiko KABURAGI, Estimation of the vocal tract spectrum from the articulatory movement using phoneme-dependent neural networks, Proceedings of the International Conference on Spoken Language Processing 2004, TuB603-15, 2004.10. |
8. |
Kohei WAKAMIYA, Tokihiko KABURAGI and Masaaki HONDA, An investigation of the measurement accuracy on the three-dimensional electromagnetic articulography, Proceedings of the 6th International Seminar on Speech Production, 301-307, 2003.12. |
9. |
Tokihiko KABURAGI, Kohei WAKAMIYA and Masaaki HONDA, Three-dimensional electromagnetic articulograph based on a nonparametric representation of the magnetic field, Proc. International Conference on Spoken Language Processing, 2297-2300, 2002.09. |
10. |
Terumitsu TANAKA, Kohei WAKAMIYA and Toshiyuki SUZUKI, Read/Write Track Fringe Effect of Thin Film and MR Head with Different Pole Shapes, IEICE Transactions on Electronics, E82-C/12, 2165-2170, 1999.12. |
11. |
Yoshihiro OKAMOTO, Yusuke INAI, Kohei WAKAMIYA, Hidetoshi SAITO and Hisashi OSAWA, Error Rate Performance of Viterbi Detection with Path-feedback in Digital Magnetic Recording, Memoirs of the Faculty of Engineering, Ehime University. |
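Entry 3 evaluates automatic GRBAS scoring with the quadratic weighted Cohen's kappa. For readers unfamiliar with that metric, the following is a minimal Python sketch of its standard definition. It is not the authors' implementation: the function name `quadratic_weighted_kappa` is hypothetical, and integer grades on a fixed ordinal range (0 to 3 is typical for GRBAS items) are assumed.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_rating=0, max_rating=3):
    """Cohen's kappa with quadratic disagreement weights for ordinal grades.

    Hypothetical helper for illustration only; assumes integer ratings in
    [min_rating, max_rating] (0-3 is typical for GRBAS items).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and equal in length")
    k = max_rating - min_rating + 1
    if k < 2:
        raise ValueError("need at least two rating categories")
    for r in list(rater_a) + list(rater_b):
        if not min_rating <= r <= max_rating:
            raise ValueError(f"rating {r} outside [{min_rating}, {max_rating}]")
    n = len(rater_a)

    # Observed matrix: observed[i][j] counts items graded i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    # Marginal histograms, used to build the chance-agreement (expected) matrix.
    hist_a = Counter(a - min_rating for a in rater_a)
    hist_b = Counter(b - min_rating for b in rater_b)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic weight: 0 on the diagonal, 1 for maximal disagreement.
            w = (i - j) ** 2 / (k - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n
            weighted_observed += w * observed[i][j]
            weighted_expected += w * expected

    # Both raters used one identical category: treated here as perfect agreement.
    if weighted_expected == 0:
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3]))  # 1.0
```

The quadratic weights penalize a two-grade disagreement four times as heavily as a one-grade disagreement, which suits ordinal scales such as GRBAS; the division by (k - 1)^2 cancels in the final ratio and only keeps each weight in [0, 1].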