Kyushu University Academic Staff Educational and Research Activities Database
Yoshitaka NAKAJIMA    Last modified date: 2018.06.13

Professor / perceptual psychology
Department of Human Science
Faculty of Design


Materials on auditory perception.
Academic Degree
PhD in Design
Country of degree conferring institution (Overseas)
Field of Specialization
perceptual psychology, speech signal processing, English phonology
Total Period of education and research career in foreign countries
Outline of Activities
I have three main branches of investigation, 1) the relationship
between psychophysics and phenomenology, 2) auditory
organization, and 3) perceptual foundations of linguistic

Editorial Board Member, Journal of Music Perception and Cognition
Consulting Editor, Music Perception

Research on auditory organization (with Miyagi Gakuin Women's University)
Research on rhythm perception (within Kyushu University and with RIKEN and Laval University)
Research on applied perceptual psychology (with National University of Ireland, Galway)
Research on speech perception (with Technische Universität Darmstadt)
Research Interests
  • English phonology
    keyword: sonority, phoneme
  • Speech signal processing
    keyword: speech enhancement
  • Perceptual psychology
    keyword: language, audition, time perception
    Since 1979.05: perceptual psychology.
Current and Past Projects
  • The 21st Century COE Program: Design of artificial environments on the basis of human sensibility
Academic Activities
Books
1. Y. Nakajima & K. Ueda, Study Manual for University Students: Welcome to Owl University, Nakanishiya, Kyoto, 2006.
Papers
1. Yoshitaka NAKAJIMA, Kazuo UEDA, Gerard B. REMIJN, Yuko YAMASHITA, Takuya KISHIDA, How sonority appears in speech analyses, Acoustical Science and Technology, 2018.05, Sonority is a subjective or linguistic property of speech sounds closely related to syllable formation. We showed that it was highly correlated with one of the 3 or 4 factors extracted to describe spectral changes of speech, and this factor was closely related to a frequency range around 540–1,700 Hz. The factor scores of this factor were high in vowels, lower in sonorant consonants, and even lower in obstruents in British English. Another factor, related to a range above 3,300 Hz, was found to be negatively correlated with sonority; its factor scores were high in obstruents, in which high-frequency components were conspicuous.
1. Yoshitaka NAKAJIMA, Mizuki MATSUDA, Kazuo UEDA, Gerard B. REMIJN, Temporal resolution needed for auditory communication: Measurement with mosaic speech, Frontiers in Human Neuroscience, 10.3389/fnhum.2018.00149, 12, 149, 1-8, 2018.06, The temporal resolution needed for Japanese speech communication was measured. A new experimental paradigm that can reflect the spectro-temporal resolution necessary for healthy listeners to perceive speech is introduced. As a first step, we report listeners’ intelligibility scores for Japanese speech with a systematically degraded temporal resolution, so-called “mosaic speech”: speech mosaicized in the coordinates of time and frequency. The results of two experiments show that mosaic speech cut into short static segments was almost perfectly intelligible with a temporal resolution of 40 ms or finer. Intelligibility dropped for a temporal resolution of 80 ms, but was still around the 50%-correct level. The data are in line with previous results showing that speech signals separated into short temporal segments of <100 ms can be remarkably robust in terms of linguistic-content perception against drastic manipulations in each segment, such as partial signal omission or temporal reversal. The human perceptual system thus can extract meaning from unexpectedly rough temporal information in speech. The process resembles that of the visual system stringing together static movie frames of 40 ms into vivid motion.
2. Kazuo UEDA, Yoshitaka NAKAJIMA, An acoustic key to eight languages/dialects: Factor analyses of critical-band-filtered speech, Scientific Reports, 10.1038/srep42468, 7, 42468, 1-4, 2017.02, The peripheral auditory system functions like a frequency analyser, often modelled as a bank of non-overlapping band-pass filters called critical bands; 20 bands are necessary for simulating the frequency resolution of the ear within an ordinary frequency range of speech (up to 7,000 Hz). A far smaller number of filters seemed sufficient, however, to re-synthesise intelligible speech sentences with power fluctuations of the speech signals passing through them; nevertheless, the number and frequency ranges of the frequency bands for efficient speech communication are yet unknown. We derived four common frequency bands (covering approximately 50–540, 540–1,700, 1,700–3,300, and above 3,300 Hz) from factor analyses of spectral fluctuations in eight different spoken languages/dialects. The analyses robustly led to three factors common to all the languages/dialects investigated: the low & mid-high factor, related to the two separate frequency ranges of 50–540 and 1,700–3,300 Hz; the mid-low factor, related to the range of 540–1,700 Hz; and the high factor, related to the range above 3,300 Hz. This suggests a language universal.
3. Yoshitaka NAKAJIMA, Kazuo UEDA, Shota FUJIMARU, Hirotoshi MOTOMURA, Yuki OHSAKA, English phonology and an acoustic language universal, Scientific Reports, 10.1038/srep46049, 7, 46049, 1-6, 2017.04, Acoustic analyses of eight different languages/dialects had revealed a language universal: three spectral factors consistently appeared in analyses of power fluctuations of spoken sentences divided by critical-band filters into narrow frequency bands. Examining the linguistic implications of these factors seems important for understanding how speech sounds carry linguistic information. Here we show that the three general categories of English phonemes, i.e., vowels, sonorant consonants, and obstruents, are discriminable in the Cartesian space constructed by these factors: a factor related to frequency components above 3,300 Hz was associated only with obstruents (e.g., /k/ or /z/), and another factor related to frequency components around 1,100 Hz only with vowels (e.g., /a/ or /i/) and sonorant consonants (e.g., /w/, /r/, or /m/). The latter factor correlated highly with the hypothetical concept of sonority or aperture in phonology. These factors turned out to connect the linguistic and acoustic aspects of speech sounds systematically.
4. Takuya KISHIDA, Yoshitaka NAKAJIMA, Kazuo UEDA, Gerard B. REMIJN, Three factors are critical in order to synthesize intelligible noise-vocoded Japanese speech, Frontiers in Psychology, 10, 3389, 2016.04, The method of factor analysis was modified to obtain factors suitable for resynthesizing speech sounds as 20-critical-band noise-vocoded speech. If the number of factors is 3 or more, elementary linguistic information is preserved.
5. Wolfgang Ellermeier, Florian Kattner, Kazuo UEDA, Kana Doumoto, Yoshitaka NAKAJIMA, Memory disruption by irrelevant noise-vocoded speech: Effects of native language and the number of frequency bands, Journal of the Acoustical Society of America, 138, 1561-1569, 2015.08, To investigate the mechanisms by which unattended speech impairs short-term memory performance, speech samples were systematically degraded by means of a noise vocoder. For experiment 1, recordings of German and Japanese sentences were passed through a filter bank dividing the spectrum between 50 and 7000 Hz into 20 critical-band channels or combinations of those, yielding 20, 4, 2, or just 1 channel(s) of noise-vocoded speech. Listening tests conducted with native speakers of both languages showed a monotonic decrease in speech intelligibility as the number of frequency channels was reduced. For experiment 2, 40 native German and 40 native Japanese participants were exposed to speech processed in the same manner while trying to memorize visually presented sequences of digits in the correct order. Half of each sample received the German speech samples, and the other half received the Japanese ones. The results show large irrelevant-speech effects increasing in magnitude with the number of frequency channels. The effects are slightly larger when subjects are exposed to their own native language. The results are predicted well neither by the speech transmission index nor by psychoacoustical fluctuation strength, most likely because both metrics fail to disentangle amplitude and frequency modulations in the signals.
6. Emi HASUO, Yoshitaka NAKAJIMA, Michiko WAKASUGI, Takuya FUJIOKA, Effects of sound marker durations on the perception of inter-onset time intervals: a study with instrumental sounds, The Japanese Journal of Psychonomic Science (Kiso Shinrigaku Kenkyu), 34, 1, 2-16, 2015.12, Previous studies have shown that the time interval marked by the onsets of two successive pure tone bursts is perceived to be longer when the second sound marker is lengthened. The present study examined whether this phenomenon appeared in a more natural setting in which the time interval was marked by instrumental sounds with complex temporal and spectral structures. Real piano sounds and synthesized sounds that simulated either just the temporal structure of the piano sound or both its harmonic and temporal structures were used as sound markers. Lengthening the second marker increased the perceived duration of the interval, as in previous studies, but only in limited cases, and this did not occur in an experiment in which only the synthesized piano sounds were used. Thus, the effect of sound durations was weakened with the new series of sounds. Characteristics of piano sounds that were not captured in the synthesized sounds seem to have played an important role in duration perception.
7. Yoshitaka NAKAJIMA, Emi HASUO, Miki YAMASHITA, Yuki HARAGUCHI, Overestimation of the second time interval replaces time-shrinking when the difference between two adjacent time intervals increases, Frontiers in Human Neuroscience, 10.3389/fnhum.2014.00281, 8, 281, 1-12, 2014.05, When the onsets of three successive sound bursts mark two adjacent time intervals, the second time interval can be underestimated when it is physically longer than the first time interval by up to 100 ms. This illusion, time-shrinking, is very stable when the first time interval is 200 ms or shorter (Nakajima et al., 2004, Perception, 33). Time-shrinking had been considered a kind of perceptual assimilation that makes the first and the second time interval more similar to each other. Here we investigated whether the underestimation of the second time interval was replaced by an overestimation if the physical difference between the neighboring time intervals was too large for the assimilation to take place; this was a typical situation in which a perceptual contrast could be expected. Three experiments to measure the overestimation/underestimation of the second time interval by the method of adjustment were conducted. The first time interval was varied from 40 to 280 ms, and such overestimations indeed took place when the first time interval was 80–280 ms. The overestimations were robust when the second time interval was longer than the first time interval by 240 ms or more, and the magnitude of the overestimation was larger than 100 ms in some conditions. Thus, a perceptual contrast to replace time-shrinking was established. An additional experiment indicated that this contrast did not affect the perception of the first time interval substantially: The contrast in the present conditions seemed unilateral.
8. Yuko YAMASHITA, Yoshitaka NAKAJIMA, Kazuo UEDA, Yohko SHIMADA, David HIRSH, Takeharu SENO, Benjamin A. SMITH, Acoustic analyses of speech sounds and rhythms in Japanese- and English-learning infants, Frontiers in Psychology, 10.3389/fpsyg.2013.00057, 4, 57, 1-10, 2013.02, The purpose of this study was to explore developmental changes, in terms of spectral fluctuations and temporal periodicity, in Japanese- and English-learning infants. Three age groups (15, 20, and 24 months) were selected, because infants diversify their phonetic inventories with age. Natural speech of the infants was recorded. We utilized a critical-band-filter bank, which simulated the frequency resolution of the adult auditory periphery. First, the correlations between the power fluctuations of the critical-band outputs, as represented by factor analysis, were examined in order to see how the critical bands should be connected to each other if a listener is to differentiate sounds in infants’ speech. In the following analysis, we analyzed the temporal fluctuations of factor scores by calculating autocorrelations. At 24 months of age, the analysis identified the same three factors that had been observed in adult speech, in both linguistic environments. These three factors were shifted to a higher frequency range, corresponding to the smaller vocal tract size of the infants. The results suggest that the vocal tract structures of the infants had developed into an adult-like configuration by 24 months of age in both language environments. The amount of utterances with shorter-period periodicity increased with age in both environments. This trend was clearer in the Japanese environment.
9. Yoshitaka NAKAJIMA, Hiroshige TAKEICHI, Human processing of short temporal intervals as revealed by an ERP waveform analysis, Frontiers in Integrative Neuroscience, 10.3389/fnint.2011.00074, 2011.12, To clarify the time course over which the human brain processes information about durations up to ∼300 ms, we reanalyzed the data previously reported by Mitsudo et al. (2009) using a multivariate analysis method. Event-related potentials were recorded from 19 scalp electrodes on 11 (nine original and two additional) participants while they judged whether two neighboring empty time intervals, called t1 and t2 and marked by three tone bursts, had equal durations. There was also a control condition in which the participants were presented with the same temporal patterns but without a judgment task. In the present reanalysis, we sought to visualize how the temporal patterns were represented in the brain over time. A correlation matrix across channels was calculated for each temporal pattern. Geometric separations between the correlation matrices were calculated and subjected to multidimensional scaling. We performed such analyses for a moving 100-ms time window after the t1 presentations. In the windows centered at <100 ms after the t2 presentation, the analyses revealed local maxima of categorical separation between temporal patterns of perceptually equal durations versus perceptually unequal durations, both in the judgment condition and in the control condition. Such categorization of the temporal patterns was prominent only in narrow temporal regions. The analysis indicated that the participants determined whether the two neighboring time intervals were of equal duration mostly within 100 ms after the presentation of the temporal patterns. Very fast brain activity was related to the perception of elementary temporal patterns without explicit judgments. This is consistent with the findings of Mitsudo et al. and in line with the processing time hypothesis proposed by Nakajima et al. (2004). The correlation matrix analysis turned out to be an effective tool for grasping the overall responses of the brain to temporal patterns.
10. Tsuyoshi Kuroda, Yoshitaka NAKAJIMA, Shuntarou EGUCHI, Illusory continuity without sufficient sound energy to fill a temporal gap: Examples of crossing glide tones, Journal of Experimental Psychology: Human Perception and Performance, 10.1037/a0026629, 38, 1254-1267, 2012.10, The gap transfer illusion is an auditory illusion in which a temporal gap inserted in a longer glide tone is perceived as if it were in a crossing shorter glide tone. Psychophysical and phenomenological experiments were conducted to examine the effects of sound-pressure-level (SPL) differences between crossing glides on the occurrence of the gap transfer illusion. We found that the subjective continuity-discontinuity of the crossing glides changed as a function of the level of the shorter glide relative to the level of the longer glide. When the relative level was approximately between −9 and +2 dB, listeners perceived the longer glide as continuous and the shorter glide as discontinuous, that is, the gap transfer illusion took place. The glides were perceived veridically below this range, that is, gap transfer did not take place, whereas above this range the longer glide and the shorter glide were both perceived as continuous. The fact that the longer glide could be perceived as continuous even when the crossing shorter glide was 9 dB weaker indicates that the longer glide's subjective continuity cannot be explained within the conventional framework of auditory organization, which assumes reallocation of sound energy from the shorter to the longer glide. The implicated mechanisms are discussed in terms of the temporal configuration of onsets and terminations and the time-frequency distribution of sound energy.
11. Emi HASUO, Yoshitaka NAKAJIMA, Satoshi OSAWA, Hiroyuki FUJISHIMA, Effects of temporal shapes of sound markers on the perception of inter-onset time intervals, Attention, Perception, & Psychophysics, 10.3758/s13414-011-0236-1, 74, 430-445, 2012.03, This study investigated how the temporal characteristics, particularly durations, of sounds affect the perceived duration of very short interonset time intervals (120–360 ms), which is important for rhythm perception in speech and music. In four experiments, the subjective duration of single time intervals marked by two sounds was measured utilizing the method of adjustment, while the markers’ durations, amplitude difference (which accompanied the duration change), and sound energy distribution in time were varied. Lengthening the duration of the second marker in the range of 20–100 ms increased the subjective duration of the time interval in a stable manner. Lengthening the first marker tended to increase the subjective duration, but unstably; an opposite effect sometimes appeared for the shortest time interval of 120 ms. The effects of varying the amplitude and the sound energy distribution in time of either marker were very small in the present experimental conditions, thus demonstrating effects of the marker durations per se.
12. Takayuki Sasaki, Yoshitaka Nakajima, Gert ten Hoopen, Edwin van Buuringen, Bob Massier, Taku Kojo, Tsuyoshi Kuroda, and Kazuo Ueda, Time-stretching: Illusory lengthening of filled auditory durations, Attention, Perception, & Psychophysics, in press, 1404-1421, 2010.05.
13. Tsuyoshi Kuroda, Yoshitaka Nakajima, Shimpei Tsunashima, and Tatsuro Yasutake, Effects of spectra and sound pressure levels on the occurrence of the gap transfer illusion, Perception, 2009.04.
14. Takako Mitsudo, Yoshitaka Nakajima, Gerard B. Remijn, Hiroshige Takeichi, Yoshinobu Goto, and Shozo Tobimatsu, Electrophysiological evidence of auditory temporal perception related to the assimilation between two neighboring time intervals, NeuroQuantology, 2009.03.
15. Y. Nakajima, G. ten Hoopen, T. Sasaki, K. Yamamoto, M. Kadota, M. Simons, D. Suetomi, Time-shrinking: the process of unilateral temporal assimilation, Perception, vol. 33, 1061-1079, 2004.12.
16. Yoshitaka Nakajima, Demonstrations of the gap transfer illusion, Acoustical Science and Technology, 2006.06.
17. Y. Nakajima, T. Sasaki, K. Kanafuka, A. Miyamoto, G. Remijn, G. ten Hoopen, Illusory recouplings of onsets and terminations of glide tone components, Perception & Psychophysics, vol. 62, 1413-1425, 2000.07.
18. Yoshitaka Nakajima, Gert ten Hoopen, Rene van der Wilk, A new illusion of time perception, Music Perception, Vol. 8, 431-448, 1991.06.
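The “mosaic speech” manipulation in the Frontiers in Human Neuroscience (2018) paper listed above replaces fine time-frequency detail with coarse, static blocks. The sketch below only illustrates that idea and is not the authors' procedure. It assumes the speech has already been converted into a grid of band powers on fixed time frames. The block sizes `t_block` and `f_block` are hypothetical parameters, and resynthesising sound from the mosaicized grid is not shown.

```python
from typing import List

# grid[t][f] = power of frequency band f in time frame t (assumed to be
# precomputed, e.g. from a critical-band filter bank; not part of this sketch).
Grid = List[List[float]]


def mosaicize(grid: Grid, t_block: int, f_block: int) -> Grid:
    """Replace every cell with the mean power of the time-frequency block it falls in.

    t_block: block length in frames (hypothetical; with 10-ms frames,
             t_block=4 would correspond to 40-ms segments).
    f_block: block height in frequency bands.
    Edge blocks may be smaller than nominal and are averaged as they are.
    """
    n_t, n_f = len(grid), len(grid[0])
    out = [[0.0] * n_f for _ in range(n_t)]
    for t0 in range(0, n_t, t_block):
        for f0 in range(0, n_f, f_block):
            cells = [
                (t, f)
                for t in range(t0, min(t0 + t_block, n_t))
                for f in range(f0, min(f0 + f_block, n_f))
            ]
            mean = sum(grid[t][f] for t, f in cells) / len(cells)
            for t, f in cells:
                out[t][f] = mean
    return out


if __name__ == "__main__":
    # Two bands over four frames; 2-frame blocks flatten each band in time.
    demo = [[1.0, 4.0], [3.0, 2.0], [5.0, 0.0], [7.0, 2.0]]
    for row in mosaicize(demo, t_block=2, f_block=1):
        print(row)  # [2.0, 3.0] twice, then [6.0, 1.0] twice
```

Increasing `t_block` coarsens the temporal resolution of the grid, which is the variable the mosaic-speech experiments manipulated.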
Presentations
1. Yoshitaka NAKAJIMA, Kazuo UEDA, Gerard Remijn, Yuko Yamashita, Takuya KISHIDA, Phonology and psychophysics: Is sonority real?, 33rd Annual Meeting of the International Society for Psychophysics, 2017.10.
2. Yoshitaka NAKAJIMA, Perceptual interactions between adjacent time intervals marked by sound bursts, 5th Joint Meeting: Acoustical Society of America and Acoustical Society of Japan, 2016.11, Perceptual interactions take place between adjacent time intervals up to ~600 ms even in simple contexts. Let us suppose that two adjacent time intervals, T1 and T2 in this order, are marked by sound bursts. Their durations are perceptually assimilated in a bilateral manner if the difference between them is up to ~50 ms. When T1 ≤ 200 ms and T1 ≤ T2 < T1 + 100 ms, T2 is underestimated systematically, and the underestimation is roughly a function of T2 − T1. Except when T1 ≈ T2, this is assimilation of T2 to T1, partially in a unilateral manner. This systematic underestimation, time-shrinking, disappears when T1 > 300 ms. When T2 = 100 or 200 ms and T1 = T2 + 100 or T2 + 200 ms, T1 is perceptually contrasted against T2: T1 is overestimated. When 80 ≤ T1 ≤ 280 ms and T2 ≥ T1 + 300 ms, T2 is contrasted against T1: in this case, T2 is overestimated. Assimilation and contrast are more conspicuous in T2 than in T1. For three adjacent time intervals, T1, T2, and T3, the perception of T3 can be affected by both T1 and T2, and the perception of T2 by T1.
3. Yoshitaka NAKAJIMA, Mizuki MATSUDA, Gerard B. REMIJN, Kazuo UEDA, Temporal resolution needed to hear out Japanese morae in mosaic speech, Meeting of the Technical Committee of Psychological and Physiological Acoustics, Acoustical Society of Japan, 2014.05.
4. Yoshitaka NAKAJIMA, Hiroshige TAKEICHI, Takako MITSUDO, Shozo TOBIMATSU, Perceptual processing of pairs of acoustically marked time intervals: Correspondence between psychophysical and electrophysiological data, Fechner Day 2013, The 29th Annual Meeting of the International Society for Psychophysics, 2013.10, Event-related potentials (ERPs) elicited by pairs of successive time intervals marked by sound bursts were recorded in our previous study [1], and the data were reanalyzed utilizing a new multivariate method. Successive time intervals t1 and t2 are often perceived as equal in duration when t2 is shorter than 300 ms and up to 50 ms shorter or up to 80 ms longer than t1; the subjective equality holds even if the physical difference is larger than the just noticeable difference obtained for t1 and t2 separated in time. This phenomenon is called auditory temporal assimilation. ERPs were registered in two types of sessions: J sessions, where the participants judged whether the two intervals were subjectively equal or not, and NJ sessions, where no judgments were required. Slow negative components occurred in brain activities in the J sessions and were more conspicuous when inequality between t1 and t2 was perceived, in agreement with our earlier study. An experiment in which t2 was fixed at 200 ms was chosen for the present analysis. For a moving 100-ms time window, a correlation matrix across the 19 electrodes was calculated for each temporal pattern, and the correlation matrix distance (CMD = Euclidean distance between the respective correlation matrices) between each two patterns was evaluated. The patterns for which subjective equality dominated were classified as equal (E) patterns, and those for which subjective inequality dominated as unequal (UE) patterns. There were four E patterns and three UE patterns, and no patterns fell outside these two classes. A measure of separation of E vs. UE patterns in terms of brain activities was calculated as the sum of squared CMDs between E and UE patterns, and expressed as relative separation (proportionally to the total squared CMD). The relative separation was a function of time, represented by the temporal midpoint of the moving window. The relative separation in the J sessions showed a peak around 70 ms after t2, similarly to our earlier findings [2]. A process related to E-UE judgment is thus likely to take place within 100 ms after t2. Peaks within 100 ms after t2 were observed also in the NJ sessions, suggesting that implicit judgments, although not required, may have occurred at a very early stage. The perceptual separation between the E and the UE patterns can thus be related to dynamic aspects of brain activities, critical factors of which we are trying to identify and locate.
5. Temporal structures of temporal perception: How time intervals marked by very short sounds are perceived.
6. Auditory grammar and auditory organization.
7. Musical tonality.
8. Perceptual organization of onsets and offsets in speech.
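The correlation matrix distance (CMD) and relative separation defined in the Fechner Day 2013 abstract above can be sketched in a few lines. This is a minimal illustration under assumptions the abstract does not state. Each time window is given as one list of samples per electrode, CMD is taken over all matrix entries, and the total squared CMD is summed over all pairs of patterns. Every correlation matrix is symmetric with ones on its diagonal, so using the full matrix rather than its upper triangle doubles every squared distance and leaves the ratio unchanged.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set

Matrix = List[List[float]]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equally long sample sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def corr_matrix(window: Sequence[Sequence[float]]) -> Matrix:
    """Electrode-by-electrode correlation matrix; window[e] holds electrode e's samples."""
    return [[pearson(a, b) for b in window] for a in window]


def cmd(m1: Matrix, m2: Matrix) -> float:
    """Correlation matrix distance: Euclidean distance over all matrix entries."""
    return math.sqrt(
        sum((x - y) ** 2 for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))
    )


def relative_separation(mats: Dict[str, Matrix], equal: Set[str]) -> float:
    """Squared CMDs between E and UE patterns as a share of all squared CMDs.

    mats maps a pattern label to its correlation matrix for one time window;
    labels in `equal` are E patterns, and all other labels are UE patterns.
    """
    total = between = 0.0
    for p, q in combinations(mats, 2):
        d2 = cmd(mats[p], mats[q]) ** 2
        total += d2
        if (p in equal) != (q in equal):  # one E and one UE pattern
            between += d2
    return between / total
```

Evaluating `relative_separation` once per position of the moving 100-ms window would give a time course like the one whose peak the abstract reports.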
Membership in Academic Society
  • The Japanese Psychonomic Society
  • The International Society for Psychophysics
  • The Japanese Psychological Association
  • The Japanese Society for Music Perception and Cognition
  • The Acoustical Society of Japan
Educational Activities
Introduction to Psychology
Auditory Psychology
Auditory Physiology
Time Perception
Advanced Scientific English
Acoustic Experiments
Introduction to Psychology
Other Educational Activities
  • 2010.03, Masako Asano, Madoka Konegawa, Yoshitaka Nakajima, & Emi Hasuo, Current trends in music psychology: Music perception, emotion in music, and music therapy, Journal of Design, 12, 83-95 (in Japanese).
  • 2008.03, Tsuyoshi Kuroda, Ryota Miyauchi, Emi Hasuo, & Yoshitaka Nakajima, What one should care about when conducting auditory psychological experiments, Journal of Design, 9, 63-66 (in Japanese).
  • 2006.12, Yoshitaka Nakajima & Kazuo Ueda, A case of faculty development at Owl University, Journal of Design, 6, 91-99 (in Japanese).
Professional and Outreach Activities
Science education for high school students.