|Kazuo UEDA||Last modified date: 2019.05.09|
Associate Professor / Perceptual Psychology / Department of Human Science / Faculty of Design
|1.||Kazuo Ueda, Tomoya Araki, Yoshitaka Nakajima, Frequency specificity of amplitude envelope patterns in noise-vocoded speech, Hearing Research, 10.1016/j.heares.2018.06.005, 367, 169-181, 2018.08, We examined the frequency specificity of amplitude envelope patterns in 4 frequency bands, which universally appeared through factor analyses applied to power fluctuations of critical-band filtered speech sounds in 8 different languages/dialects [Ueda and Nakajima (2017). Sci. Rep., 7 (42468)]. A series of 3 perceptual experiments with noise-vocoded speech of Japanese sentences was conducted. Nearly perfect (92–94%) mora recognition was achieved, without any extensive training, in a control condition in which 4-band noise-vocoded speech was employed (Experiments 1–3). Blending amplitude envelope patterns of the frequency bands, which resulted in reducing the number of amplitude envelope patterns while keeping the average spectral levels unchanged, revealed a clear deteriorating effect on intelligibility (Experiment 1). Exchanging amplitude envelope patterns brought generally detrimental effects on intelligibility, especially when involving the 2 lowest bands (≲1850 Hz; Experiment 2). Exchanging spectral levels averaged in time had a small but significant deteriorating effect on intelligibility in a few conditions (Experiment 3). Frequency specificity in low-frequency-band envelope patterns thus turned out to be conspicuous in speech perception.|
|2.||Yoshitaka Nakajima, Mizuki Matsuda, Kazuo Ueda, and Gerard B. Remijn, Temporal Resolution Needed for Auditory Communication: Measurement with Mosaic Speech, Frontiers in Human Neuroscience, 10.3389/fnhum.2018.00149, 12, 149, 2018.04, Temporal resolution needed for Japanese speech communication was measured. A new experimental paradigm that can reflect the spectro-temporal resolution necessary for healthy listeners to perceive speech is introduced. As a first step, we report listeners' intelligibility scores of Japanese speech with a systematically degraded temporal resolution, so-called "mosaic speech": speech mosaicized in the coordinates of time and frequency. The results of two experiments show that mosaic speech cut into short static segments was almost perfectly intelligible with a temporal resolution of 40 ms or finer. Intelligibility dropped for a temporal resolution of 80 ms, but was still around 50%-correct level. The data are in line with previous results showing that speech signals separated into short temporal segments of <100 ms can be remarkably robust in terms of linguistic-content perception against drastic manipulations in each segment, such as partial signal omission or temporal reversal. The human perceptual system thus can extract meaning from unexpectedly rough temporal information in speech. The process resembles that of the visual system stringing together static movie frames of ~40 ms into vivid motion.|
|3.||Kazuo UEDA, Yoshitaka NAKAJIMA, Wolfgang ELLERMEIER, Florian KATTNER, Intelligibility of locally time-reversed speech: A multilingual comparison, Scientific Reports, 10.1038/s41598-017-01831-z, 7, 2017.05, A set of experiments was performed to make a cross-language comparison of intelligibility of locally time-reversed speech, employing a total of 117 native listeners of English, German, Japanese, and Mandarin Chinese. The experiments enabled us to examine whether languages of three timing types (stress-, syllable-, and mora-timed languages) exhibit different trends in intelligibility depending on the duration of the segments that were temporally reversed. The results showed a strikingly similar trend across languages, especially when the time axis of segment duration was normalised with respect to the deviation of a talker's speech rate from the average in each language.
This similarity is somewhat surprising given the systematic differences in vocalic proportions characterising the languages studied, which had been shown in previous research and were largely replicated with the present speech material. These findings suggest that a universal temporal window shorter than 20–40 ms plays a crucial role in perceiving locally time-reversed speech by working as a buffer in which temporal reorganisation can take place with regard to lexical and semantic processing.|
|4.||Yoshitaka NAKAJIMA, Kazuo UEDA, Shota FUJIMARU, Hirotoshi MOTOMURA, Yuki OHSAKA, English phonology and an acoustic language universal, Scientific Reports, 10.1038/srep46049, 7, 46049, 1-6, 2017.04, Acoustic analyses of eight different languages/dialects had revealed a language universal: Three spectral factors consistently appeared in analyses of power fluctuations of spoken sentences divided by critical-band filters into narrow frequency bands. Examining linguistic implications of these factors seems important to understand how speech sounds carry linguistic information. Here we show the three general categories of the English phonemes, i.e., vowels, sonorant consonants, and obstruents, to be discriminable in the Cartesian space constructed by these factors: A factor related to frequency components above 3,300 Hz was associated only with obstruents (e.g., /k/ or /z/), and another factor related to frequency components around 1,100 Hz only with vowels (e.g., /a/ or /i/) and sonorant consonants (e.g., /w/, /r/, or /m/). The latter factor highly correlated with the hypothetical concept of sonority or aperture in phonology. These factors turned out to connect the linguistic and acoustic aspects of speech sounds systematically.|
|5.||Kazuo UEDA, Yoshitaka NAKAJIMA, An acoustic key to eight languages/dialects: Factor analyses of critical-band-filtered speech, Scientific Reports, 10.1038/srep42468, 7, 42468, 1-4, 2017.02, The peripheral auditory system functions like a frequency analyser, often modelled as a bank of non-overlapping band-pass filters called critical bands; 20 bands are necessary for simulating frequency resolution of the ear within an ordinary frequency range of speech (up to 7,000 Hz). A far smaller number of filters seemed sufficient, however, to re-synthesise intelligible speech sentences with power fluctuations of the speech signals passing through them; nevertheless, the number and frequency ranges of the frequency bands for efficient speech communication are yet unknown. We derived four common frequency bands, covering approximately 50–540, 540–1,700, 1,700–3,300, and above 3,300 Hz, from factor analyses of spectral fluctuations in eight different spoken languages/dialects. The analyses robustly led to three factors common to all languages/dialects investigated: the low & mid-high factor, related to the two separate frequency ranges of 50–540 and 1,700–3,300 Hz; the mid-low factor, related to the range of 540–1,700 Hz; and the high factor, related to the range above 3,300 Hz. This commonality suggests a language universal.|
|6.||Takuya KISHIDA, Yoshitaka NAKAJIMA, Kazuo UEDA, Gerard Remijn, Three Factors Are Critical in Order to Synthesize Intelligible Noise-Vocoded Japanese Speech, Frontiers in Psychology, 26 April 2016, 10.3389/fpsyg.2016.00517, 7, 517, 1-9, 2016.04.|
|7.||Wolfgang Ellermeier, Florian Kattner, Kazuo UEDA, Kana Doumoto, Yoshitaka NAKAJIMA, Memory disruption by irrelevant noise-vocoded speech: Effects of native language and the number of frequency bands, Journal of the Acoustical Society of America, 10.1121/1.4928954, 138, 3, 1561-1569, 2015.09, To investigate the mechanisms by which unattended speech impairs short-term memory performance, speech samples were systematically degraded by means of a noise vocoder. For Experiment 1, recordings of German and Japanese sentences were passed through a filter bank dividing the spectrum between 50 and 7000 Hz into 20 critical-band channels or combinations of those, yielding 20, 4, 2, or just 1 channel(s) of noise-vocoded speech. Listening tests conducted with native speakers of both languages showed a monotonic decrease in speech intelligibility as the number of frequency channels was reduced. For Experiment 2, 40 native German and 40 native Japanese participants were exposed to speech processed in the same manner while trying to memorize visually presented sequences of digits in the correct order. Half of each sample received the German speech samples, and the other half received the Japanese ones. The results show large irrelevant-speech effects increasing in magnitude with the number of frequency channels. The effects are slightly larger when subjects are exposed to their own native language. The results are predicted well neither by the speech transmission index nor by psychoacoustical fluctuation strength, most likely because both metrics fail to disentangle amplitude and frequency modulations in the signals.
(C) 2015 Acoustical Society of America.|
|8.||Yoshitaka NAKAJIMA, Takayuki SASAKI, Kazuo UEDA, Gerard B. REMIJN, Auditory Grammar, Acoustics Australia, 42, 2, 97-101, 2014.08.|
|9.||Emi HASUO, Yoshitaka NAKAJIMA, Erika TOMIMATSU, Simon GRONDIN, Kazuo UEDA, The occurrence of the filled duration illusion: A comparison of the method of adjustment with the method of magnitude estimation, Acta Psychologica, 147, 111-121, (Accepted 4 October 2013; Available online 5 November 2013), 2014.02, A time interval between the onset and the offset of a continuous sound (filled interval) is often perceived to be longer than a time interval between two successive brief sounds (empty interval) of the same physical duration. The present study examined whether and how this phenomenon, sometimes called the filled duration illusion (FDI), occurs for short time intervals (40–520 ms). The investigation was conducted with the method of adjustment (Experiment 1) and the method of magnitude estimation (Experiment 2). When the method of adjustment was used, the FDI did not appear for the majority of the participants, but it appeared clearly for some participants. In the latter case, the amount of the FDI increased as the interval duration lengthened. The FDI was more likely to occur with magnitude estimation than with the method of adjustment. The participants who showed clear FDI with one method did not necessarily show such clear FDI with the other method.|
|10.||Yuko Yamashita, Yoshitaka Nakajima, Kazuo Ueda, Yohko Shimada, David Hirsh, Takeharu Seno and Benjamin Alexander Smith, Acoustic analyses of speech sounds and rhythms in Japanese- and English-learning infants, Frontiers in Language Sciences, 10.3389/fpsyg.2013.00057, 4, 57, 2013.02, The purpose of this study was to explore developmental changes in spectral fluctuations and temporal periodicity in Japanese- and English-learning infants. Three age groups (15, 20, and 24 months) were selected because infants diversify their phonetic inventories with age. Natural speech of the infants was recorded. We utilized a critical-band-filter bank, which simulated the frequency resolution of the adult auditory periphery. First, the correlations between the power fluctuations of the critical-band outputs were examined by factor analysis in order to see how the critical bands should be connected to each other if a listener is to differentiate sounds in infants’ speech. Next, we analyzed the temporal fluctuations of the factor scores by calculating autocorrelations. At 24 months of age, the analysis identified the same three factors that had been observed in adult speech, in both linguistic environments. These three factors were shifted to a higher frequency range, corresponding to the smaller vocal tract size of the infants. The results suggest that the vocal tract structures of the infants had developed into an adult-like configuration by 24 months of age in both language environments. The amount of utterances with periodicity at shorter time scales increased with age in both environments. This trend was clearer in the Japanese environment.|
|11.||Emi Hasuo, Yoshitaka Nakajima, and Kazuo Ueda, Does filled duration illusion occur for very short time intervals?, Acoustical Science and Technology, 32, 2, 82-85, 2011.03.|
|12.||Takayuki Sasaki, Yoshitaka Nakajima, Gert ten Hoopen, Edwin van Buuringen, Bob Massier, Taku Kojo, Tsuyoshi Kuroda, and Kazuo Ueda, Time-stretching: Illusory lengthening of filled auditory durations, Attention, Perception, & Psychophysics, 72, 1404-1421, 2010.07.|
|13.||Kazuo Ueda, Reiko Akahane-Yamada, Ryo Komaki, and Takahiro Adachi, Identification of English /r/ and /l/ in noise: the effects of baseline performance, Acoustical Science and Technology, vol. 28, no. 4, 251-259, 2007.07.|
|14.||Takahiro Adachi, Reiko Akahane-Yamada, and Kazuo Ueda, Intelligibility of English phonemes in noise for native and non-native listeners, Acoustical Science and Technology, vol. 27, no. 5, 285-289, 2006.09.|
|15.||Kazuo Ueda, Yoshitaka Nakajima, and Reiko Akahane-Yamada, An artificial environment is often a noisy environment: Auditory scene analysis and speech perception in noise, Journal of Physiological Anthropology and Applied Human Science, vol. 24, no. 1, 129-133, 2005.02.|
|16.||Nakajima, Y., Sasaki, T., Remijn, G. B., and Ueda, K., Perceptual organization of onsets and offsets of sounds, Journal of Physiological Anthropology and Applied Human Science, vol. 23, no. 6, 345-349, 2004.12.|
|17.||Ueda, K., Short-term auditory memory interference: the Deutsch demonstration revisited, Acoustical Science and Technology, vol. 25, no. 6, 457-467, 2004.11.|
|18.||Ueda, K., Akahane-Yamada, R., and Komaki, R., Identification of English /r/ and /l/ in white noise by native and non-native listeners, Acoustical Science and Technology, vol. 23, no. 6, 336-338, 2002.11.|
|19.||Semal, C., Demany, L., Ueda, K., and Hallé, P., Speech versus nonspeech in pitch memory, Journal of the Acoustical Society of America, vol. 100, no. 2, 1132-1140, 1996.08.|
|20.||Ueda, K., and Ohtsuki, M., The effect of sound pressure level difference on filled duration extension, Journal of the Acoustical Society of Japan (E), vol. 17, no. 3, 159-161, 1996.05.|
|21.||Ueda, K., and Hirahara, T., Frequency response of headphones measured in free field and diffuse field by loudness comparison, Journal of the Acoustical Society of Japan (E), vol. 12, no. 3, 131-138, 1991.05.|
|22.||Ueda, K., and Akagi, M., Sharpness and amplitude envelopes of broadband noise, Journal of the Acoustical Society of America, vol. 87, no. 2, 814-819, 1990.02.|
|23.||Ueda, K., and Ohgushi, K., Perceptual components of pitch: Spatial representation using a multidimensional scaling technique, Journal of the Acoustical Society of America, vol. 82, no. 4, 1193-1200, 1987.10.|
|24.||Should we assume a hierarchical structure for adjectives describing timbre?.|
|25.||Spatial representations of two components of pitch using multidimensional scaling technique.|