||Yoshitaka NAKAJIMA, Mizuki MATSUDA, Kazuo UEDA, Gerard B. REMIJN, Temporal resolution needed for auditory communication: Measurement with mosaic speech, Frontiers in Human Neuroscience, 10.3389/fnhum.2018.00149, 12, 149, 1-8, 2018.06, Temporal resolution needed for Japanese speech communication was measured. A new experimental paradigm that can reflect the spectro-temporal resolution necessary for
healthy listeners to perceive speech is introduced. As a first step, we report listeners’ intelligibility scores of Japanese speech with a systematically degraded temporal resolution, so-called “mosaic speech”: speech mosaicized in the coordinates of time and frequency. The results of two experiments show that mosaic speech cut into short static segments was almost perfectly intelligible with a temporal resolution of 40 ms or finer. Intelligibility dropped for a temporal resolution of 80 ms, but was still around 50%-correct level. The data are in line with previous results showing that speech signals separated into short temporal segments of <100 ms can be remarkably robust in terms of linguistic-content perception against drastic manipulations in each segment, such as partial signal omission or temporal reversal. The human perceptual system thus can extract meaning from unexpectedly rough temporal information in speech. The process resembles that of the visual system stringing together static movie frames of 40 ms into vivid motion..
||Kazuo UEDA, Yoshitaka NAKAJIMA, An acoustic key to eight languages/dialects: Factor analyses of critical-band-filtered speech, SCIENTIFIC REPORTS, 10.1038/srep42468, 7, 42468, 1-4, 2017.02, The peripheral auditory system functions like a frequency analyser, often modelled as a bank of nonoverlapping
band-pass filters called critical bands; 20 bands are necessary for simulating frequency
resolution of the ear within an ordinary frequency range of speech (up to 7,000 Hz). A far smaller
number of filters seemed sufficient, however, to re-synthesise intelligible speech sentences with power
fluctuations of the speech signals passing through them; nevertheless, the number and frequency
ranges of the frequency bands for efficient speech communication are yet unknown. We derived four
common frequency bands—covering approximately 50–540, 540–1,700, 1,700–3,300, and above
3,300 Hz—from factor analyses of spectral fluctuations in eight different spoken languages/dialects.
The analyses robustly led to three factors common to all languages investigated—the low & mid-high
factor related to the two separate frequency ranges of 50–540 and 1,700–3,300 Hz, the mid-low factor
the range of 540–1,700 Hz, and the high factor the range above 3,300 Hz—in these different languages/
dialects, suggesting a language universal..
||Yoshitaka NAKAJIMA, Kazuo UEDA, Shota FUJIMARU, Hirotoshi MOTOMURA, Yuki OHSAKA, English phonology and an acoustic language universal, SCIENTIFIC REPORTS, 10.1038/srep46049, 7, 46049, 1-6, 2017.04, Acoustic analyses of eight different languages/dialects had revealed a language universal: Three
spectral factors consistently appeared in analyses of power fluctuations of spoken sentences divided
by critical-band filters into narrow frequency bands. Examining linguistic implications of these factors
seems important to understand how speech sounds carry linguistic information. Here we show the
three general categories of the English phonemes, i.e., vowels, sonorant consonants, and obstruents,
to be discriminable in the Cartesian space constructed by these factors: A factor related to frequency
components above 3,300 Hz was associated only with obstruents (e.g., /k/ or /z/), and another factor
related to frequency components around 1,100 Hz only with vowels (e.g., /a/ or /i/) and sonorant
consonants (e.g., /w/, /r/, or /m/). The latter factor highly correlated with the hypothetical concept
of sonority or aperture in phonology. These factors turned out to connect the linguistic and acoustic
aspects of speech sounds systematically..
||Takuya KISHIDA, Yoshitaka NAKAJIMA, Kazuo UEDA, Gerard B. REMIJN, Three factors are critical in order to synthesize intelligible noise-vocoded Japanese speech, Frontiers in Psychology, 10, 3389, 2016.04, The method of factor analysis was modified to obtain factors suitable for resynthesizing speech sounds as 20-critical-band noise-vocoded speech. If the number of factors is 3 or more, elementary linguistic information is preserved..
||Wolfgang Ellermeier, Florian Kattner, Kazuo UEDA, Kana Doumoto, Yoshitaka NAKAJIMA, Memory disruption by irrelevant noise-vocoded speech: Effects of native language and the number of frequency bands, Journal of the Acoustical Society of America, 138, 1561-1569, 2015.08, To investigate the mechanisms by which unattended speech impairs short-term memory performance,
speech samples were systematically degraded by means of a noise vocoder. For experiment 1,
recordings of German and Japanese sentences were passed through a filter bank dividing the spectrum
between 50 and 7000 Hz into 20 critical-band channels or combinations of those, yielding 20,
4, 2, or just 1 channel(s) of noise-vocoded speech. Listening tests conducted with native speakers
of both languages showed a monotonic decrease in speech intelligibility as the number of frequency
channels was reduced. For experiment 2, 40 native German and 40 native Japanese participants
were exposed to speech processed in the same manner while trying to memorize visually presented
sequences of digits in the correct order. Half of each sample received the German, the other half
received the Japanese speech samples. The results show large irrelevant-speech effects increasing
in magnitude with the number of frequency channels. The effects are slightly larger when subjects
are exposed to their own native language. The results are neither predicted very well by the speech
transmission index, nor by psychoacoustical fluctuation strength, most likely, since both metrics
fail to disentangle amplitude and frequency modulations in the signals..
||Emi HASUO, Yoshitaka NAKAJIMA, Michiko WAKASUGI, Takuya FUJIOKA, Effects of sound marker durations on the perception of inter-onset time intervals: a study with instrumental sounds, 基礎心理学研究（The Japanese Journal of Psychonomic Science）, 34, 1, 2-16, 2015.12, Previous studies have shown that the time interval marked by the onsets of two successive pure tone bursts is perceived to be longer when the second sound marker is lengthened. The present study examined whether this phenomenon appeared in a more natural setting in which the time interval was marked by instrumental sounds with complex temporal and spectral structures. Real piano sounds and synthesized sounds that simulated either just the temporal structure of the piano sound or both its harmonic and temporal structures were used as sound markers. Lengthening the second marker increased the perceived duration of the interval, as in previous studies, but only in limited cases, and this did not occur in an experiment in which only the synthesized piano sounds were used. Thus, the effect of sound durations was weakened with the new series of sounds. Characteristics of piano sounds that were not captured in the synthesized sounds seem to have played an important role in duration perception..
||Yoshitaka NAKAJIMA, Emi HASUO, Miki YAMASHITA, Yuki HARAGUCHI, Overestimation of the second time interval replaces time-shrinking when the difference between two adjacent time intervals increases, Frontiers in Human Neuroscience, 10.3389/fnhum.2014.00281, 8, 281, 1-12, 2014.05, When the onsets of three successive sound bursts mark two adjacent time intervals, the second time interval can be underestimated when it is physically longer than the first time interval by up to 100 ms. This illusion, time-shrinking, is very stable when the first time interval is 200 ms or shorter (Nakajima et al., 2004, Perception, 33). Time-shrinking had been considered a kind of perceptual assimilation to make the first and the second time interval more similar to each other. Here we investigated whether the underestimation of the second time interval was replaced by an overestimation if the physical difference between the neighboring time intervals was too large for the assimilation to take place; this was a typical situation in which a perceptual contrast could be expected. Three experiments to measure the overestimation/underestimation of the second time interval by the method of adjustment were conducted. The first time interval was varied from 40 to 280 ms, and such overestimations indeed took place when the first time interval was 80–280 ms. The overestimations were robust when the second time interval was longer than the first time interval by 240 ms or more, and the magnitude of the overestimation was larger than 100 ms in some conditions. Thus, a perceptual contrast to replace time-shrinking was established. An additional experiment indicated that this contrast did not affect the perception of the first time interval substantially: The contrast in the present conditions seemed unilateral..
||Yuko YAMASHITA, Yoshitaka NAKAJIMA, Kazuo UEDA, Yohko SHIMADA, David HIRSH, Takeharu SENO, Benjamin A. SMITH, Acoustic analyses of speech sounds and rhythms in Japanese- and English-learning infants, Frontiers in Psychology, 10.3389/fpsyg.2013.00057, 4, 57, 1-10, 2013.02, The purpose of this study was to explore developmental changes, in terms of spectral fluctuations
and temporal periodicity with Japanese- and English-learning infants. Three age
groups (15, 20, and 24 months) were selected, because infants diversify phonetic inventories
with age. Natural speech of the infants was recorded. We utilized a critical-band-filter
bank, which simulated the frequency resolution in adults’ auditory periphery. First, the
correlations between the power fluctuations of the critical-band outputs represented by
factor analysis were observed in order to see how the critical bands should be connected
to each other, if a listener is to differentiate sounds in infants’ speech. In the following
analysis, we analyzed the temporal fluctuations of factor scores by calculating autocorrelations.
The present analysis identified three factors as had been observed in adult speech
at 24 months of age in both linguistic environments. These three factors were shifted to
a higher frequency range corresponding to the smaller vocal tract size of the infants. The
results suggest that the vocal tract structures of the infants had developed into an
adult-like configuration by 24 months of age in both language environments. The amount
of utterances with periodic nature of shorter time increased with age in both environments.
This trend was clearer in the Japanese environment..
||Yoshitaka NAKAJIMA, Hiroshige TAKEICHI, Human processing of short temporal intervals as revealed by an ERP waveform analysis, Frontiers in Integrative Neuroscience, 10.3389/fnint.2011.00074, 2011.12, To clarify the time course over which the human brain processes information about durations up to ∼300 ms, we reanalyzed the data that were previously reported by Mitsudo et al. (2009) using a multivariate analysis method. Event-related potentials were recorded from 19 scalp electrodes on 11 (nine original and two additional) participants while they judged whether two neighboring empty time intervals – called t1 and t2 and marked by three tone bursts – had equal durations. There was also a control condition in which the participants were presented the same temporal patterns but without a judgment task. In the present reanalysis, we sought to visualize how the temporal patterns were represented in the brain over time. A correlation matrix across channels was calculated for each temporal pattern. Geometric separations between the correlation matrices were calculated, and subjected to multidimensional scaling. We performed such analyses for a moving 100-ms time window after the t1 presentations. In the windows centered at <100 ms after the t2 presentation, the analyses revealed the local maxima of categorical separation between temporal patterns of perceptually equal durations versus perceptually unequal durations, both in the judgment condition and in the control condition. Such categorization of the temporal patterns was prominent only in narrow temporal regions. The analysis indicated that the participants determined whether the two neighboring time intervals were of equal duration mostly within 100 ms after the presentation of the temporal patterns. A very fast brain activity was related to the perception of elementary temporal patterns without explicit judgments. This is consistent with the findings of Mitsudo et al. 
and it is in line with the processing time hypothesis proposed by Nakajima et al. (2004). The correlation matrix analyses turned out to be an effective tool for grasping the overall responses of the brain to temporal patterns..
||Tsuyoshi Kuroda, Yoshitaka NAKAJIMA, Shuntarou EGUCHI, Illusory continuity without sufficient sound energy to fill a temporal gap: Examples of crossing glide tones, Journal of Experimental Psychology: Human Perception and Performance, 10.1037/a0026629, 38, 1254-1267, 2012.10, The gap transfer illusion is an auditory illusion where a temporal gap inserted in a longer glide tone is perceived as if it were in a crossing shorter glide tone. Psychophysical and phenomenological experiments were conducted to examine the effects of sound-pressure-level (SPL) differences between crossing glides on the occurrence of the gap transfer illusion. We found that the subjective continuity-discontinuity of the crossing glides changed as a function of the relative level of the shorter glide to the level of the longer glide. When the relative level was approximately between −9 and +2 dB, listeners perceived the longer glide as continuous and the shorter glide as discontinuous, that is, the gap transfer illusion took place. The glides were perceived veridically below this range, that is, gap transfer did not take place, whereas above this range the longer glide and the shorter glide were both perceived as continuous. The fact that the longer glide could be perceived as continuous even when the crossing shorter glide was 9 dB weaker indicates that the longer glide's subjective continuity cannot be explained within the conventional framework of auditory organization, which assumes reallocation of sound energy from the shorter to the longer glide. The implicated mechanisms are discussed in terms of the temporal configuration of onsets and terminations and the time-frequency distribution of sound energy..
||Emi HASUO, Yoshitaka NAKAJIMA, Satoshi OSAWA, Hiroyuki FUJISHIMA, Effects of temporal shapes of sound markers on the perception of inter-onset time intervals, Attention, Perception, & Psychophysics, 10.3758/s13414-011-0236-1, 74, 430-445, 2012.03, This study investigated how the temporal characteristics,
particularly durations, of sounds affect the
perceived duration of very short interonset time intervals
(120–360 ms), which is important for rhythm perception in
speech and music. In four experiments, the subjective
duration of single time intervals marked by two sounds was
measured utilizing the method of adjustment, while the
markers’ durations, amplitude difference (which accompanied
the duration change), and sound energy distribution in
time were varied. Lengthening the duration of the second
marker in the range of 20–100 ms increased the subjective
duration of the time interval in a stable manner. Lengthening
the first marker tended to increase the subjective
duration, but unstably; an opposite effect sometimes
appeared for the shortest time interval of 120 ms. The
effects of varying the amplitude and the sound energy
distribution in time of either marker were very small in the
present experimental conditions, thus proving the effects of
marker durations per se..
||Takayuki Sasaki, Yoshitaka Nakajima, Gert ten Hoopen, Edwin van Buuringen, Bob Massier, Taku Kojo, Tsuyoshi Kuroda, and Kazuo Ueda, Time-stretching: Illusory lengthening of filled auditory durations, Attention, Perception, & Psychophysics, in press, 1404-1421, 2010.05.
||Tsuyoshi Kuroda, Yoshitaka Nakajima, Shimpei Tsunashima, and Tatsuro Yasutake, Effects of spectra and sound pressure levels on the occurrence of the gap transfer illusion, Perception, 2009.04.
||Takako Mitsudo, Yoshitaka Nakajima, Gerard B. Remijn, Hiroshige Takeichi, Yoshinobu Goto, and Shozo Tobimatsu, Electrophysiological evidence of auditory temporal perception related to the assimilation between two neighboring time intervals, NeuroQuantology, 2009.03.
||Y. Nakajima, G. ten Hoopen, T. Sasaki, K. Yamamoto, M. Kadota, M. Simons, D. Suetomi, Time-shrinking: the process of unilateral temporal assimilation, Perception, vol. 33, 1061-1079, 2004.12.
||Yoshitaka Nakajima, Demonstrations of the gap transfer illusion, Acoustical Science and Technology, 2006.06.
||Y. Nakajima, T. Sasaki, K. Kanafuka, A. Miyamoto, G. Remijn, G. ten Hoopen, Illusory recouplings of onsets and terminations of glide tone components, Perception & Psychophysics, vol. 62, 1413-1425, 2000.07.
||Yoshitaka Nakajima, Gert ten Hoopen, Rene van der Wilk, A new illusion of time perception, Music Perception, vol. 8, 431-448, 1991.06.