The argument-based approach to justifying test use presumes that test developers must convince stakeholders (i.e., anyone affected by the test) that the intended use of the test is justified. To this end, the test developer makes explicit claims regarding how test scores should be interpreted and used to make decisions. These claims are supported or undermined by evidence, which may include documentation from the test development process and/or ongoing research. By examining the test developer's claims and the evidence for them, stakeholders may arrive at a global evaluation of whether the intended use of the test is justified. This approach is used to:
- guide test development
- provide direction for ongoing research
- serve as an accountability tool for different stakeholder groups
An Assessment Use Argument (AUA) is "a conceptual framework for guiding the development and use of a particular language assessment, including the interpretations and uses we make on the basis of the assessment" (Bachman & Palmer, 2010, p. 99). The framework is structured as a hierarchical set of claims made by the test developer regarding how test scores should be interpreted and used to make decisions. It takes the following general form:
Each component in the figure above represents a claim. At the highest level, the test developer may claim that the consequences of the decisions made on the basis of the test are beneficial for all stakeholder groups (e.g., decision errors have been minimized). This presumes a claim regarding the decisions that follow from score interpretations — specifically, that decisions are equitable and sensitive to the values of relevant institutions (educational, societal, organizational, legal). To justify interpretations about test-taker abilities based on scores, the test developer makes claims about the meaningfulness, impartiality, generalizability, relevance and sufficiency of those interpretations. Finally, all of these claims rest on the foundational claim that scores based on test-taker performances are consistent across test forms, administrations and raters. Thus, each claim in an AUA consists of:
- an outcome of test use (e.g., the decisions that follow from interpretations about test-taker abilities)
- qualities of that outcome (e.g., decisions are values-sensitive and equitable)
Decision makers and test developers share responsibility for justifying assessment use. Test developers are expected to provide evidence that test scores are consistent and that scores may be used to make interpretations about test-taker abilities. Decision makers need to demonstrate that decisions are values-sensitive and equitable, and that the consequences of those decisions are beneficial. Unfortunately, decision makers may lack the expertise needed to provide adequate backing for these claims (e.g., documentation from standard setting, estimates of decision errors). Consequently, an AUA may be strengthened through collaboration between decision makers and test developers. At the very least, test developers should seek feedback from decision makers in order to determine whether claims about the decisions and consequences of test use may be justified.
As a whole, the structure of an AUA provides a basis for a comprehensive justification of test use that links real-world concerns about decisions and their consequences with the traditional concerns of test developers — reliability and validity. As a comprehensive list of claims, warrants, backing and rebuttals, it can be used to identify weaknesses in the overall argument for test use and prioritize research or test development projects.
Finally, as a simple hierarchical set of claims (as shown in the figure above), an AUA can be used as a communication tool that illustrates the key issues determining a test's usefulness, including fairness, impact, reliability and validity. The concerns of individuals and stakeholder groups vary, and one of the challenges for research is to address these concerns in a coherent manner while enhancing stakeholders' assessment literacy. Concerns can include:
- Score consistency: "How can you make sure that all raters follow the scoring guides?"
- The interpretation of scores: "When we calculate criterion validity, who or what is the criterion?"
- The decisions based on these interpretations: "What are the cutscores in other institutions?"
- Consequences of test use: "How have the TOEIC® tests been helpful for job seekers?"
- Test use that relates to a number of these issues: "How can recruiters know that TOEIC scores meet the needs of the market?"
By delivering versions of an AUA oriented toward specific stakeholder groups, a test developer with a strong research program may be able to help stakeholders find answers to their questions and become more sophisticated consumers of assessment products.
Bachman, L. F., & Palmer, A. (2010). *Language assessment in practice*. Oxford: Oxford University Press.