High Stakes, but Low Validity? A Case Study of Standardized Tests and Admissions into New York City Specialized High Schools

This is a study of the admissions process at a select group of New York City public high schools. It offers the first detailed look at the admissions practices of this highly regarded and competitive group of schools, and also provides a window into the broader national debate about the use of standardized tests in school admissions. According to New York State law, admission to these schools must be based solely on an exam. The exam used is called the Specialized High Schools Admissions Test (SHSAT). This study makes use of the individual test results from 2005 and 2006.

Several key findings emerge:

1. The SHSAT has an unusual scoring feature that is not widely known and may give an edge to those who have access to expensive test-prep tutors. Other reasonable scoring systems could be constructed that would yield different results for many students, and there is no evidence offered to support the validity of the current system.

2. Thousands of students who are not being accepted have scores that are statistically indistinguishable from thousands who are granted admission. And these estimates are derived using the less precise, classical-test-theory-based measures of statistical uncertainty, which may understate the problem. The New York City Department of Education (NYCDOE) fails to provide the more accurate, item-response-theory-based estimates of the SHSAT's standard error of measurement (SEM) near the admission cutoff scores, which would offer a clearer picture of how well the test is able to differentiate among students who score close to the admission/rejection line. This omission violates generally accepted testing standards and practices.

3. Students who receive certain versions of the test may be more likely to gain admission than students who receive other versions. No evidence is offered on how accurate the statistical equating of different test versions is. The mean scaled scores vary across versions much more than would be expected given the chance distribution of ability across large random samples of students, suggesting that the scoring system may not be completely eliminating differences among test versions.

4. No studies have ever been done to see if the SHSAT is subject to prediction bias across gender and ethnic groups (i.e., if SHSAT scores predict different things for different groups).
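
To make the second finding concrete, the following minimal Python sketch shows how a classical-test-theory SEM yields a band of scores that cannot be statistically separated from a cutoff. It uses the standard classical formula, SEM = SD × sqrt(1 − reliability). The score standard deviation, reliability, cutoff, and student scores are hypothetical values chosen for illustration; they are not SHSAT figures and are not taken from the study, and the function names are likewise assumptions.

```python
import math


def classical_sem(score_sd: float, reliability: float) -> float:
    """Classical-test-theory standard error of measurement.

    SEM = SD * sqrt(1 - reliability), a single value applied to every
    score level (unlike IRT-based conditional SEMs).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return score_sd * math.sqrt(1.0 - reliability)


def indistinguishable_from_cutoff(score: float, cutoff: float,
                                  sem: float, z: float = 1.96) -> bool:
    """True if the score lies within z * SEM of the cutoff.

    With z = 1.96 this is an approximate 95% band: scores inside it
    cannot be confidently placed on either side of the cutoff.
    """
    return abs(score - cutoff) <= z * sem


# Hypothetical inputs for illustration only (not SHSAT values).
HYPOTHETICAL_SD = 100.0
HYPOTHETICAL_RELIABILITY = 0.91
HYPOTHETICAL_CUTOFF = 500.0

sem = classical_sem(HYPOTHETICAL_SD, HYPOTHETICAL_RELIABILITY)
for score in (470.0, 560.0):
    flag = indistinguishable_from_cutoff(score, HYPOTHETICAL_CUTOFF, sem)
    print(f"score={score:.0f} within band of cutoff: {flag}")
```

Under these assumed inputs, a rejected student scoring 30 points below the cutoff falls inside the band, while a student 60 points above it falls just outside. Because the classical SEM is one number for the whole score range, it says nothing about whether precision is better or worse near the cutoff itself, which is why the finding calls for item-response-theory-based estimates at the cutoff scores.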

https://nepc.colorado.edu/publication/high-stakes-but-low-validity