TOEIC® test design and development processes are systematic and rigorous, leading to score interpretations that are meaningful, fair and relevant to the real world.
Evidence: Research in this category examines:
- whether scores mean what they are intended to mean
- whether TOEIC score interpretations about someone's English skills are unbiased and fair (e.g., with regard to gender or disability)
- whether scores reflect ability in real-world settings
Validity: What Does It Mean for the TOEIC® Tests?
This paper provides a nontechnical overview of test development and research projects undertaken to ensure that TOEIC test scores serve as valid indicators of test takers' ability to communicate in English in global workplace environments.
The main value of the TOEIC tests lies in their validity, which can be defined as the extent to which the tests do what we claim they can do. The TOEIC tests yield valid scores in part because of the careful way in which they are designed. The brief overview of the design process in this paper highlights the training of TOEIC test developers, the development process, test specifications and the scoring process. By following standardized procedures and using highly qualified, trained test developers, the test development process helps ensure that test content:
- is focused only on the relevant abilities
- is not biased against any group of test takers
- results in forms that are highly similar or parallel
Standardized and rigorous statistical procedures are used to routinely monitor scores to ensure that they are consistent, or reliable, and do not include items that are unexpectedly biased against groups of test takers. Further evidence of the validity of TOEIC test scores comes from special studies such as can-do self-assessment studies. These studies provide evidence that higher TOEIC test scores are associated with an increased likelihood that someone can perform a variety of everyday and workplace tasks in English.
The Relationship Among TOEIC® Listening, Reading, Speaking and Writing Skills
Through examination of test scores, this research found that the TOEIC tests measure distinct but related skills and that, taken together, they provide a reasonably complete picture of English-language proficiency. This finding provides additional evidence that a four-skill approach to language proficiency assessment is crucial. The research also explored the potential role of receptive skills (i.e., listening and reading) in the improvement of productive skills (i.e., speaking and writing): test takers at similar levels of speaking and writing ability showed more improvement over time when they had higher reading and listening ability, highlighting the potential importance of receptive skills in developing productive skills.
Measuring English-language workplace proficiency across subgroups: Using CFA models to validate test score interpretation
This study used a statistical technique called "factor analysis" to determine which statistical model best explained performance on the TOEIC® Listening and Reading test. Researchers found that a two-factor model, in which reading and listening skills were represented as distinct abilities, best accounted for performance, consistent with how scores are intended to be interpreted. In addition, researchers examined whether the two-factor model held across different subgroups of test takers (e.g., gender, age, employment status) in order to investigate whether scores could be interpreted in the same manner for all groups of test takers. Results indicated that TOEIC test scores have the same meaning across the subgroups included in this study, which provides evidence of fairness.
Expanding the Question Formats of the TOEIC® Speaking Test
Traditionally, researchers have used the term "authenticity" to refer to the degree to which tasks on a language test correspond to those used in the real world, with authenticity being a desired characteristic of tasks and tests. This white paper explains how the format of several questions in the TOEIC® Speaking test was expanded to include a greater variety of real-world situations. This systematic process helped TOEIC test developers produce a more authentic test without altering its overall difficulty, further strengthening the real-world interpretation of speaking proficiency scores.
The Incremental Contribution of TOEIC® Listening, Reading, Speaking and Writing Tests to Predicting Performance on Real-Life English-Language Tasks
This study investigated whether proficiency in a particular language skill (e.g., speaking) could be better estimated by considering not only the TOEIC test scores corresponding to that skill, but also TOEIC test scores for other skills. The results supported this approach, suggesting that scores on the four-skill TOEIC tests together provide a more valid measurement of English-language proficiency than any single skill in isolation. (The findings of this externally published report are consistent with the ETS-published report The TOEIC® Listening, Reading, Speaking and Writing Tests: Evaluating Their Unique Contribution to Assessing English-Language Proficiency.)
The TOEIC® Listening, Reading, Speaking and Writing Tests: Evaluating Their Unique Contribution to Assessing English-Language Proficiency
This study investigates:
- The extent to which TOEIC test scores of one ability correlate with test takers' self-assessments of their English abilities across all four skills
- Whether one English skill (e.g., reading) can be more accurately estimated or predicted using multiple other TOEIC test scores (i.e., listening, speaking and writing)
For the first point, researchers found significant correlations between self-assessments and TOEIC test scores across all four skills, which provides evidence that TOEIC test scores are valid and meaningful indicators of English-language proficiency. For the second, researchers concluded that more accurate predictions of English-language ability were obtained when multiple TOEIC test scores were used. This can be explained by a conceptualization of language proficiency in which language knowledge (e.g., grammar, vocabulary) is utilized across language skills, and demonstrates that four-skill assessment can be used to measure any single language skill even more accurately. (This outcome is consistent with the results of the externally conducted study The Incremental Contribution of TOEIC Listening, Reading, Speaking and Writing Tests to Predicting Performance on Real-Life English Language Tasks.)
Constructed-Response (CR) Differential Item Functioning (DIF) Evaluations for TOEIC Speaking and Writing Tests
Differential item functioning (DIF) is a statistical procedure used to identify items or tasks that are unexpectedly biased in some way, inappropriately favoring one group of test takers over another. One of the challenges for speaking and writing tests is the lack of proven, practical DIF techniques that can be used to analyze performance-based or "constructed-response" tests. This paper investigates several such techniques and illustrates how research is being conducted to ensure the fairness of score interpretations.
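As background to the DIF concept described above, the sketch below illustrates one classic DIF statistic for multiple-choice (dichotomously scored) items, the Mantel-Haenszel common odds ratio; it is offered only to show the general idea of comparing matched groups, and all data are invented. The techniques investigated in the paper for constructed-response tasks are different and more sophisticated.

```python
# Illustrative sketch of DIF detection via the Mantel-Haenszel common
# odds ratio for a dichotomously scored item. Test takers are first
# matched (stratified) on total score, then the odds of answering the
# item correctly are compared between a reference and a focal group.
# All counts below are made up for illustration.

def mantel_haenszel_odds_ratio(strata):
    """Each stratum is a matched score level with counts
    (ref_correct, ref_wrong, focal_correct, focal_wrong)."""
    num = den = 0.0
    for a, b, c, d in strata:      # a, b: reference group; c, d: focal group
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n           # reference-correct x focal-wrong
        den += b * c / n           # reference-wrong x focal-correct
    return num / den

# Hypothetical item responses at three matched score levels.
# A common odds ratio near 1.0 suggests the item favors neither group.
strata = [
    (30, 20, 28, 22),   # low-scoring test takers
    (45, 15, 44, 16),   # mid-scoring test takers
    (55,  5, 54,  6),   # high-scoring test takers
]

odds_ratio = mantel_haenszel_odds_ratio(strata)
print(round(odds_ratio, 3))   # prints 1.15 -- close to 1, little sign of DIF
```

Operational DIF screening would also attach a significance test and effect-size classification to this ratio before flagging an item for expert review.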
Validating TOEIC Bridge™ Scores Against Teacher and Student Ratings: A Small-Scale Study
This study sought to assess the degree to which TOEIC Bridge™ scores correspond to student self-assessments and teacher assessments of students, two measurements of English-language proficiency. TOEIC Bridge scores were found to be moderately correlated with these measurements, a finding which provides validity evidence that TOEIC Bridge scores can be meaningfully interpreted as indicators of English-language proficiency.
TOEIC Bridge™ Scores: Validity Evidence from Korea and Japan
This study sought to compare TOEIC Bridge scores to test takers' self-evaluations of their own abilities to perform everyday language tasks in English. The results suggest that the test scores correlated well with test takers' self-evaluations, providing further evidence in support of TOEIC Bridge scores as valid and fair indicators of English-language proficiency.
Background and Goals of the TOEIC® Listening and Reading Test Redesign Project
As time progresses, it becomes important to revisit the design of a test to ensure that its conceptualization of language proficiency aligns with current theory and that test tasks continue to be indicative of real-world tasks. This report outlines the goals, theoretical alignment, procedures and outcomes of a redesign effort for the TOEIC Listening and Reading test. The updated test design and specifications helped ensure that the revised test continued to provide meaningful interpretations of listening and reading proficiency that generalize to real-world ability.
Comparison of Content, Item Statistics, and Test Taker Performance on the Redesigned and Classic TOEIC® Listening and Reading Test
This paper compares the content, reliability and difficulty of the classic and 2006 redesigned TOEIC Listening and Reading tests. Although the redesigned tests included slightly different item (question) types to better reflect current models of language proficiency, the tests were judged to be similar across versions. The results provide evidence for the reliability and consistency of the redesigned TOEIC Listening and Reading tests, and suggest that the redesigned tests can be meaningfully interpreted and used to make decisions in line with the classic tests.
Evidence-centered Design: The TOEIC® Speaking and Writing Tests
Evidence-centered design (ECD) is an assessment development methodology that makes explicit what an assessment measures and supports skill interpretations based on test scores. This paper describes the ECD processes used to develop the TOEIC® Speaking and Writing tests. Evidence collected through the test design process provides foundational support for the validity of TOEIC Speaking and Writing test score interpretations.
Field Study Results for the Redesigned TOEIC® Listening and Reading Test
This paper describes the results of a field study for the 2006 redesigned TOEIC Listening and Reading tests, including analyses of item and test difficulty, reliability, and correlations between sections of the redesigned and classic TOEIC Listening and Reading tests. The results are consistent with another comparability study (Liao, Hatrak, & Yu, 2010), which found evidence of the reliability of the redesigned tests and suggested that scores on the redesigned test could be interpreted and used in similar ways to classic TOEIC Listening and Reading test scores. This provides evidence for the reliability and consistency of TOEIC Listening and Reading test scores, as well as for the validity and fairness of score interpretations comparable to the previous version.
Statistical Analyses for the TOEIC® Speaking and Writing Pilot Study
This paper reports the results of a pilot study that contributed to TOEIC Speaking and Writing test development. The analysis of the reliability of test scores found evidence of several types of score consistency, including inter-rater reliability (agreement of several raters on a score) and internal consistency (a measure based on correlation between items on the same test). The correlational analysis found evidence that each test section supported three distinct claims about speaking or writing proficiency, as intended. These results helped the development of final specifications for the TOEIC Speaking and Writing tests in addition to providing evidence of score reliability and the validity of score interpretations.
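To make the two reliability notions mentioned above concrete, the sketch below computes a simple exact-agreement rate between two raters and Cronbach's alpha, a standard internal-consistency coefficient. All ratings and scores are invented; the pilot study's actual analyses were more elaborate.

```python
# Illustrative sketch of two common reliability statistics:
# exact inter-rater agreement and Cronbach's alpha (internal consistency).
# Data below are made up for illustration.
from statistics import pvariance

def exact_agreement(rater_a, rater_b):
    """Proportion of responses given the identical score by both raters."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

def cronbach_alpha(item_scores):
    """item_scores: one list of scores per item, aligned by test taker."""
    k = len(item_scores)
    item_var = sum(pvariance(item) for item in item_scores)
    totals = [sum(person) for person in zip(*item_scores)]
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Hypothetical ratings of five spoken responses (0-4 scale) by two raters.
print(exact_agreement([3, 2, 4, 1, 3], [3, 2, 3, 1, 3]))   # prints 0.8

# Hypothetical scores of four test takers on three items.
items = [[2, 3, 4, 5], [1, 3, 4, 4], [2, 2, 5, 5]]
print(round(cronbach_alpha(items), 3))                      # prints 0.944
```

Higher values of both statistics indicate more consistent scoring: agreement reflects how interchangeable raters are, while alpha reflects how coherently the items measure a common ability.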
The Redesigned TOEIC® Listening and Reading Test: Relations to Test Taker Perceptions of Proficiency in English
After any test redesign project — such as the redesign of the TOEIC Listening and Reading test in 2006 — it is important to provide evidence that test scores can still be meaningfully interpreted. This study examined the relationship between scores on the redesigned TOEIC Listening and Reading test and test takers' perceptions of their own English proficiency. Researchers found moderate correlations between the test scores and test takers' perceptions, providing evidence that scores on the redesigned TOEIC Listening and Reading tests are meaningful indicators of English ability.
The Relationships of Test Scores Measured by the TOEIC® Listening and Reading Test and TOEIC® Speaking and Writing Test
This study examines the relationship between TOEIC Listening and Reading scores and TOEIC Speaking and Writing scores to determine whether Listening and Reading scores should be used as predictors of Speaking and Writing scores, and vice versa. Findings support the validity of test scores for the measured skills (e.g., Listening and Reading test scores provide meaningful interpretations of listening and reading skills). The findings also suggest that the tests assess different abilities, and researchers concluded that predictions of speaking and writing scores based on reading and listening were relatively imprecise. This research supports the argument that four-skill testing provides the most accurate picture of a test taker's language proficiency.
TOEIC® Speaking and Writing Tests: Relations to Test Taker Perceptions of Proficiency in English
This study sought to compare scores on the TOEIC Speaking and Writing tests to students' self-evaluations of their abilities to perform everyday English-language tasks. The researchers reported relatively strong correlations between test scores and the self-evaluations. This finding contributes further evidence in support of TOEIC Speaking and Writing test scores as indicators of English-language proficiency. This study was also published as Powers, Kim, Weng, and Van Winkle (2009).
TOEIC® Listening and Reading Test Scale Anchoring Study
Scale anchoring is a process that groups test scores into score ranges or proficiency levels. It uses a combination of statistical methods and expert judgment to produce descriptions of the skills and knowledge typically exhibited by test takers at each proficiency level. This research report describes the scale anchoring process for TOEIC Listening and Reading tests, which facilitates meaningful score interpretations.
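The statistical half of scale anchoring reduces, mechanically, to mapping score ranges onto level descriptors (the descriptors themselves come from expert judgment). The sketch below shows that mapping step only; the bands and descriptors are invented for illustration and are not the actual TOEIC Listening and Reading anchor levels.

```python
# Illustrative sketch of mapping scaled scores to proficiency-level
# descriptors, as produced by a scale anchoring study. The bands and
# descriptor text below are hypothetical, not actual TOEIC levels.
from bisect import bisect_right

# (lower bound of band, descriptor) -- invented bands on a 5-495 scale
LEVELS = [
    (5,   "Can understand isolated words and set phrases"),
    (175, "Can understand the main idea of short, simple texts"),
    (325, "Can understand most routine workplace correspondence"),
    (425, "Can understand complex, extended workplace texts"),
]

def describe(score):
    """Return the descriptor for the band containing `score`."""
    bounds = [lo for lo, _ in LEVELS]
    return LEVELS[bisect_right(bounds, score) - 1][1]

print(describe(200))   # falls in the second band
print(describe(480))   # falls in the top band
```

In an actual anchoring study, item-level statistics determine which items typify each band, and panels of experts write the descriptors from those items; the lookup above is only the final delivery step.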
Relating Scores on the TOEIC Bridge Test to Student Perceptions of Proficiency in English
This study investigated the relationship between TOEIC Bridge scores and students' evaluations of their own English-language proficiency. The TOEIC Bridge test scores were found to be correlated with self-reported reading and listening skills, providing evidence that TOEIC Bridge test scores are valid or meaningful indicators of English-language reading and listening proficiency.
Validating TOEIC Bridge™ Scores Against Teacher Ratings for Vocational Students in China
This study compared TOEIC Bridge scores with teachers' assessments of test takers' abilities to perform everyday language tasks in English. The authors reported moderate correlations between these assessments and test scores, which provide supporting evidence of the validity of TOEIC Bridge test scores as indicators of English-language proficiency.