Kyushu University Academic Staff Educational and Research Activities Database
Paul Joseph Vitta (Last modified date: 2023.10.02)

Associate Professor / Faculty of Languages and Cultures
Department of Linguistic Environment
Faculty of Languages and Cultures

Administration Post

Researcher Profiling Tool Kyushu University Pure
Academic Degree
Doctorate - Education/TESOL - Queen's University Belfast - UK - 2019, Masters - TESOL - Sookmyung Women's University - Korea - 2012, BA - Sociology & Anthropology - Washington & Lee - USA - 2003
Country of degree conferring institution (Overseas)
Yes Bachelor Master Doctor
Field of Specialization
TESOL, L2 Research synthesis, linguistic complexity, instructed vocabulary (L2), EAP curricula design
ORCID(Open Researcher and Contributor ID)
Total Period of education and research career in the foreign country
Outline Activities
--teach EAP courses in the Q-LEAP3 English Program within the University's Kikan Scheme
--research activity here:
--ad hoc reviewer for 20 SLA/AL/TESOL journals
--*Associate Editor*, Vocabulary Learning and Instruction
--*Associate Editor*, International Journal of TESOL Studies
--*Editorial Board*, Studies in Educational Evaluation
Research Interests
  • I am interested in second language acquisition, applied linguistics, and TESOL. My research often lies at the intersection of practice and theory and I specialize in research synthesis and quantitative methods in social science.
    keyword : TESOL
Academic Activities
1. Joseph P. Vitta, Christopher Nicklin, Simon W. Albright, Academic word difficulty and multidimensional lexical sophistication: An English‐for‐academic‐purposes‐focused conceptual replication of Hashimoto and Egbert (2019), The Modern Language Journal, 10.1111/modl.12835, 2023.03, This article presents a conceptual replication of Hashimoto and Egbert (2019), a study that featured multivariate models where lexical sophistication variables accounted for word difficulty (yes-no recognition) better than frequency alone among learners of English as a second or foreign language from North America. This current study (n words = 88; n people = 128) conceptually replicated Hashimoto and Egbert with data from three Asian university English-for-academic-purposes sites. Methodological differences included a more conservative lexical sophistication operationalization process and avoidance of stepwise regression. Like the original study, the replication's findings favored multivariate models over frequency, which predicted 36% of word difficulty's variance alone. In a multiple regression model accounting for word difficulty, R² = .52, frequency accounted for 17% of the predicted variance with age of acquisition (AoA: 18%) and word naming reaction time (WN_RT: 16%) also being significant predictors. This replication also extended the testing approach by using a mixed-effect model, involving person and site intercepts as random effects. The model's ability to predict word difficulty fell, marginal R² = .22, conditional R² = .40, but frequency, AoA, and WN_RT remained the strongest predictors. Taken together, this replication successfully supports the original study's more-than-frequency conclusion while highlighting the need for further research into the area.
2. Ali H. Al-Hoorie, W.L. Quint Oga-Baldwin, Phil Hiver, Joseph P. Vitta, Self-determination mini-theories in second language learning: A systematic review of three decades of research, Language Teaching Research, 10.1177/13621688221102686, 136216882211026-136216882211026, 2022.06, Self-determination theory is one of the most established motivational theories both within second language learning and beyond. This theory has generated several mini-theories, namely: organismic integration theory, cognitive evaluation theory, basic psychological needs theory, goal contents theory, causality orientations theory, and relationships motivation theory. After providing an up-to-date account of these mini-theories, we present the results of a systematic review of empirical second language research into self-determination theory over a 30-year period (k = 111). Our analysis of studies in this report pool showed that some mini-theories were well-represented while others were underrepresented or absent from the literature. We also examined this report pool to note trends in research design, operationalization, measurement, and application of self-determination theory constructs. Based on our results, we highlight directions for future research in relation to theory and practice.
3. Christopher Nicklin, Joseph P. Vitta, Assessing Rasch measurement estimation methods across R packages with yes/no vocabulary test data, Language Testing, 10.1177/02655322211066822, 026553222110668-026553222110668, 2022.02, Instrument measurement conducted with Rasch analysis is a common process in language assessment research. A recent systematic review of 215 studies involving Rasch analysis in language testing and applied linguistics research reported that 23 different software packages had been utilized. However, none of the analyses were conducted with one of the numerous R-based Rasch analysis software packages, which generally employ one of the three estimation methods: conditional maximum likelihood estimation (CMLE), joint maximum likelihood estimation (JMLE), or marginal maximum likelihood estimation (MMLE). For this study, eRm, a CMLE-based R package, was utilized to conduct a dichotomous Rasch analysis of a Yes/No vocabulary test based on the academic word list. The resulting parameters and diagnostic statistics were compared with the equivalent results from four other R-based Rasch measurement software packages and Winsteps. Finally, all of the packages were utilized in the analysis of 1000 simulated datasets to investigate the extent to which results generated from the contrasting estimation methods converged or diverged. Overall, the differences between the results produced with the three estimation methods were negligible, and the discrepancies observed between datasets were attributable to the software choice as opposed to the estimation method.
4. Jeffrey Stewart, Joseph P. Vitta, Christopher Nicklin, Stuart McLean, Geoffrey G. Pinchbeck, Brandon Kramer, The Relationship between Word Difficulty and Frequency: A Response to Hashimoto (2021), Language Assessment Quarterly, 10.1080/15434303.2021.1992629, 1-12, 2021.10, brief report/no abstract.
5. Joseph P. Vitta, The Functions and Features of ELT Textbooks and Textbook Analysis: A Concise Review, RELC Journal, 10.1177/00336882211035826, 003368822110358-003368822110358, 2021.08, Decades of research have established that most English Language Teaching (ELT) contexts rely on textbooks and corresponding materials to drive the education process. Textbook analysis as a vital quality control check of these products has also become a popular trend in applied linguistics and second language (L2) research. In the main, ELT textbooks have been conceptualized in past literature as supporting target language proficiency attainment while also giving the teacher everything required to conduct a lesson. Textbook analysis has subsequently checked products’ utility and quality in relation to such proficiency development. Over time, the scope of textbook analyses has expanded to include issues such as cultural representations in these products to address how the products help students become better at English in a multicultural and globalized world. While teachers and researchers can reference numerous well-known books on these phenomena, a concise summary of how textbooks function within ELT contexts and the defining features of textbook analysis research appears to be lacking. This brief report meets this need and is useful to stakeholders in the ELT community, such as teachers and program managers, who might not have the time nor the resources to consult such extensive sources of information.
6. Joseph P. Vitta, Ali H. Al-Hoorie, Measurement and sampling recommendations for L2 flipped learning experiments: A bottom-up methodological synthesis, Journal of Asia TEFL, 10.18823/asiatefl.2021., 18, 2, 682-692, 2021.06, Brief report/no abstract.
7. Joseph P. Vitta, Ali H. Al-Hoorie, The flipped classroom in second language learning: A meta-analysis, LANGUAGE TEACHING RESEARCH, 10.1177/1362168820981403, 2020.12.
8. Joseph P. Vitta, Christopher Nicklin, Stuart McLean, EFFECT SIZE–DRIVEN SAMPLE-SIZE PLANNING, RANDOMIZATION, AND MULTISITE USE IN L2 INSTRUCTED VOCABULARY ACQUISITION EXPERIMENTAL SAMPLES, Studies in Second Language Acquisition, 10.1017/s0272263121000541, 1-25, 2021.09, In this focused methodological synthesis, the sample construction procedures of 110 second language (L2) instructed vocabulary interventions were assessed in relation to effect size–driven sample-size planning, randomization, and multisite usage. These three areas were investigated because inferential testing makes better generalizations when researchers consider them during the sample construction process. Only nine reports used effect sizes to plan or justify sample sizes in any fashion, with only one engaging in an a priori power procedure referencing vocabulary-centric effect sizes from previous research. Randomized assignment was observed in 56% of the reports while no report involved randomized sampling. Approximately 15% of the samples observed were constructed from multiple sites and none of these empirically investigated the effect of site clustering. Leveraging the synthesized findings, we conclude by offering suggestions for future L2 instructed vocabulary researchers to consider a priori effect size–driven sample planning processes, randomization, and multisite usage when constructing samples.
9. Phil Hiver, Ali H. Al-Hoorie, Joseph P. Vitta, Janice Wu, Engagement in language learning: A systematic review of 20 years of research methods and definitions, LANGUAGE TEACHING RESEARCH, 10.1177/13621688211001289, 2021.03.
10. Christopher Nicklin, Joseph P. Vitta, Effect-Driven Sample Sizes in Second Language Instructed Vocabulary Acquisition Research, MODERN LANGUAGE JOURNAL, 10.1111/modl.12692, 105, 1, 218-236, 2021.03.
11. Ali H. Al-Hoorie, Joseph P. Vitta, The seven sins of L2 research: A review of 30 journals' statistical quality and their CiteScore, SJR, SNIP, JCR Impact Factors, LANGUAGE TEACHING RESEARCH, 10.1177/1362168818767191, 23, 6, 727-744, 2019.11.
12. Joseph P. Vitta, Dayna Jost, Alexis Pusina, A Case Study Inquiry into the Efficacy of Four East Asian EAP Writing Programmes: Presenting the Emergent Themes, RELC JOURNAL, 10.1177/0033688217730145, 50, 1, 71-85, 2019.04.
1. Pablo Robles-García, Christopher Nicklin, Joseph P. Vitta, Jeffrey Stewart, Exploring Teacher Judgements as a Predictor of Students’ Vocabulary Knowledge, American Association for Applied Linguistics (AAAL), 2023.03, [URL], For the last few decades, word-frequency has been widely used to identify which words L2 learners are more or less likely to know (Hashimoto, 2021). However, research indicates that teachers often prefer to rely on their own intuition rather than using corpus-based vocabulary lists for making decisions about the words they want to teach in the classroom (Dang & Webb, 2020; Sánchez-Gutiérrez et al., 2022). Although teacher judgments are a commonly used strategy for vocabulary selection in the L2 classroom, little is known about the accuracy of such judgments when predicting L2 learners’ vocabulary knowledge. This study investigated the effectiveness of word-frequency and teacher judgments in determining students’ vocabulary knowledge and compared the predictive powers of both approaches when estimating word difficulty. Twenty-nine L2 Spanish teachers were asked to predict how likely their students would know words from 3K-LEx (Robles-García, 2020), a 216-word Yes/No test that measures knowledge of the first 3,000 words in Spanish. The accuracy of their responses was compared with the 3K-LEx results of 1,075 L2 Spanish learners. To examine if the results could apply to other L2 settings, 15 L2 English language instructors and 394 L2 English students completed a 70-word Yes/No test measuring knowledge of the first 14,000 words in English. 
Results showed that for both language contexts, (1) the median teacher rater could assess difficulty with an accuracy roughly comparable to frequency, (2) the combination of teachers’ judgments (minimum of three teachers as determined via bootstrapping) displayed a stronger relationship with word difficulty than frequency, and (3) using teacher judgments and frequency together in multiple regressions did not substantially improve the prediction of word difficulty compared to models with teacher judgments as the lone predictor. These findings suggest the need to develop vocabulary lists that acknowledge teachers’ judgments as a major source of information.
2. Joseph P. Vitta, Aaron Hahn, Christopher Nicklin, Exploring the Sampling Crisis in L2 Quantitative Research: a Predictive Model and Future Directions, American Association for Applied Linguistics (AAAL), 2023.03, [URL], This paper considers the extent to which L2 subfield, author collaboration, and bibliometric and citation analysis metrics predict the observed frequencies of power analysis and multi-site sample use in L2 quantitative research. Recent L2 sampling guidance has called for power-analysis-derived sample sizes and multi-site samples as a compromise between ideal but impractical randomized sampling and biased single-site samples (Vitta et al., 2021). The predictors were chosen by referencing L2 sampling literature (Morgan-Short et al., 2018) and past research predicting study quality (Al-Hoorie & Vitta, 2019). To investigate L2 sampling practices, 230 papers featuring L2 research and utilizing inferential testing published in 2020 by 46 international journals were coded. Unlike some past sampling reviews that were bound to a handful of journals and/or subfields, the report pool was created to capture a wide breadth of L2 quantitative research. The search was confined to 2020 as bibliometric and citation analysis metrics were intended as predictors and such metrics are published with a ‘look back’ approach. The results highlighted that the field still overlooks power-analysis-derived samples, with only six of 230 (2.61%) reports considering power in any fashion, and thus predictive models involving power as the response variable were omitted. Furthermore, just 36.95% of the reports (k = 85) featured multi-site samples. During the bivariate screening process, bibliometric and citation analysis metrics (e.g., CiteScore) were discounted as significant predictors of multi-site sampling practices.
Multivariate generalized modeling (pseudo R² = .28) demonstrated that instructed SLA-focused reports (OR = .11) and single-authored reports (OR = .36) had an 89% and 64% probability, respectively, of being single-site papers. A sensitivity GLMM analysis was conducted and the clustering (random) effect of journals was found to be inconsequential. Finally, the authors will discuss how to apply the findings to reform future L2 sampling and research practices.
3. Joseph P. Vitta, BAAL Researcher Development Workshop Series: Professional Development in Applied Linguistics for Graduate Students and Early Career Researchers -- Navigating the academic job market, British Association for Applied Linguistics, 2022.03, [URL], Workshop objectives

This series aims to serve as a platform to instigate dialogues between doctoral students/early career researchers (ECRs) with senior academics in the field of Applied Linguistics on various important topics relevant to professional development in academia. Specifically, through hosting semi-formal online workshops and informal discussion forums for doctoral students and ECRs, and launching a researcher development website, the organiser aims to:

Demystify areas related to professional development in Applied Linguistics which are often hidden and misconceived through experience sharing by senior academics;
Provide opportunities for doctoral students/ECRs to network with each other to share information and their own experiences related to different facets of professional development in Applied Linguistics.
4. Joseph P. Vitta, Christopher Nicklin, Simon W. Albright, Academic word difficulty and multidimensional lexical sophistication: A multi-site replication of Hashimoto & Egbert (2019), American Association for Applied Linguistics (AAAL), 2022.03, [URL], Hashimoto and Egbert (2019) demonstrated that multivariate models featuring lexical sophistication predictors predict L2 word difficulty best. Their ‘more than frequency’ conclusion contributed to the ongoing debate regarding the extent to which frequency should be considered “the single most important characteristic of lexis” (Schmitt, 2010, p. 63). This current study (n words = 91; n people = 171) conceptually replicated Hashimoto and Egbert with data from three Asian university EAP sites. This conceptual replication featured two main departures from the original study’s methodology. First, the target words came from Coxhead’s Academic Word List (2000). Second, an alternative testing approach was undertaken featuring a theory-driven selection of predictors and the avoidance of stepwise regression, which applied statistics literature has challenged (Smith, 2018). Like the original study, the replication’s findings favored multivariate models. In the final model, frequency was replaced by range and the predictor accounted for approximately 21% (observed Pratt value) of the total variance predicted in the multiple regression model (R² = .60). Age of acquisition (AoA: 18%) and word naming reaction time (WN_RT: 15%) were the other significant lexical sophistication predictors. Extending from the original study’s use of fixed-effect modeling, the word difficulty predicted by lexical sophistication hypothesis was addressed with a generalized mixed-effects model, where person and site intercepts were random effects and false alarm rate was added as a fixed-effect covariate.
As expected, the model’s ability to predict word difficulty fell (conditional R² = .45), but AoA, range, and WN_RT remained the strongest predictors with all other predictors having less of an association with the DV than false alarms. Our study’s main implications are that multivariate lexical sophistication models appear ideal for predicting word difficulty across functional domains, and that false alarm rate might be an important covariate in word difficulty research, although it has recently been overlooked (e.g., Hashimoto, 2021).
5. Joseph P. Vitta, Christopher Nicklin, Improving L2 instructed vocabulary experimental designs’ samples: A methodological synthesis with recommendations, American Association for Applied Linguistics (AAAL), 2021.03, In this presentation, we will present the results of a methodological synthesis investigating the state of second language instructed vocabulary (L2 IV) research sampling procedures. A sample of 82 (quasi-)experimental IV reports were systematically collected from five SSCI journals and then synthesized over a two-stage process with the aim of improving future L2 IV research and enhancing its usefulness to researchers and teachers alike.
Phase I entailed an analysis of the reports’ sample design choices. The results revealed that none of the reports conducted a priori sample-size planning nor random sampling, and approximately 83% of the studies drew their samples from a single education setting, which potentially affects the generalizability of the results (Morgan-Short et al., 2018). Of the 14 studies employing multi-site samples, none empirically checked for possible clustering effects of the different locations (Al-Hoorie & Vitta, 2019). Half (k = 41) of the reports included random assignment at either the class or participant level. Based on these findings, we suggest more widespread collaboration among colleagues and institutions. We also encourage researchers to engage in random/probability sampling with demographic considerations that are representative of meaningful populations. This addresses the observed over-reliance on smaller samples from a single setting. Additionally, we highlight how multivariate testing can parsimoniously consider random/clustering effects in a sample.

Phase I’s finding regarding sample-size planning was addressed in Phase II. The effect sizes observed in our sample were aggregated according to group comparison, as such summative effect sizes are required for sample-size planning. Utilizing effect sizes (between-subject gs = .62 [medium], and .33 [small]; within-subject counterbalancing gav = .25) that operationalized meaningful IV hypotheses (i.e., which teaching method is more effective), power simulations are presented to provide suggestions on the required sample sizes needed to detect the aggregated effects that we observed.
Membership in Academic Society
  • Korea TESOL
  • American Association for Applied Linguistics
  • Asia TEFL - Lifetime Member
  • JALT
Educational Activities
Teach academic English courses:

Theme English C
Academic Issues
Global Issues
Professional and Outreach Activities
Through my publishing and editorial work, I have collaborated with academics from the United Kingdom, Saudi Arabia, the Republic of Korea, Canada, and the United States.