I’ve lived and taught in New York State since 1993, when I moved from Chicago to be a professor at Syracuse University; I moved to New York City in 1997. I follow state and city educational policy closely, and when Mayor Michael Bloomberg initiated giving letter grades to restaurants and schools, I began to investigate the formula and the New York State achievement tests that counted so heavily toward the final grade. A few names continually popped up as offering expert analysis of the flaws of the tests and thus the fatal flaws of the evaluation formulas. One is a colleague of mine at Teachers College — Professor Aaron Pallas — who often blogs at the Hechinger Report. Another name that kept surfacing was Fred Smith, who worked as a testing specialist for the New York City public school system. Smith retired as an administrative analyst and has since served as a consultant on test projects. He is a member of Change the Stakes, a parent advocacy group opposed to the harmful impact that testing has had on education.
When the news broke last night of the massive Pearson server error in Colorado for their online achievement tests, I immediately wrote to Fred to ask what the implications would be for the validity and reliability of the test results. How standardized can the standardized measure be when some of the students got only part way through the tests? Why is Pearson taking on so many contracts for state (and PARCC) testing when it keeps having such troubles? Here is his reply:
How much more can test publisher Pearson do wrong and keep getting lucrative contracts to furnish statewide testing programs? Today it’s Colorado. Tomorrow Minnesota. And we in New York have seen repeat performances of poorly conceived and implemented exams supposedly aligned to the Common Core.
Pearson’s slogan, “Always Learning,” takes on new meaning in the context of its obvious trial-and-error approach to testing. This was true even before the latest push to initiate computer-based exams. Now we’re on the cusp of the new and improved, latest and greatest adaptive assessment era.
Having delivered paper-and-pencil pineapples and age-inappropriate reading and math material leading us to the dawn of the coming age, the publisher — and the states, which become Pearson’s defenders — will never admit that the tests themselves and the measurement process have virtually become a clumsy experiment, one that stumbles along as mistakes are made and compounded but are always explained away as a few glitches.
The disruption of Colorado’s attempt to inaugurate computerized science and social studies exams in no fewer than 179 school districts is the latest example. Bugs — excuse me, “technical difficulties” and “functionality challenges” — caused computerized testing in Colorado to “not operate optimally” for several hours Tuesday morning, according to the Colorado Department of Education. But CDE seemed to have answers even before it understood the extent of the problem.
So some 34,000 children had started taking the social studies exams in grades 4 and 7, and others had started on science in grades 5 and 8, before the interruption. They were allowed to continue the exams. The remaining 28,000 students were told not to begin. Meanwhile, CDE hadn’t figured out which schools were which.
Teachers spoke about how the system completely crashed, stranding students in the middle of the exam. What happens to them? Do they come back and take a new test (unlikely), or do they re-take the exam? Do they pick up where they left off? These questions and the decisions that are made about resuming the tests will confound their results.
Having seen parts of the exams and later coming back to finish them will make the experience of the interrupted children different from that of the rest of the test population and will introduce factors that may impinge on their scores. Can these children go back and change their answers? Will they return to their classrooms, talk to other kids in the same situation, and gain knowledge of the correct answers? Will they talk with children waiting to take the exam and give them intelligence about what’s coming? Who knows? The crashing of the system invites variables into the mix that are irrelevant to student knowledge of science and social studies and that muddle the meaning of the results (that is, the test’s validity).
In addition, there were reportedly other mechanical difficulties beyond log-in problems, including the inability to drag and drop items as certain questions required. You can be sure that CDE will assert that nothing in the administration of the test and the eventual generation of results has been compromised. But these are supposed to be standardized tests — given, as such, under uniform conditions to all students. To the degree that the problems cited interfere with that objective, questions will endure about the meaning of the results for each student, about measuring the achievement of each, and about setting a baseline by which to evaluate growth over time. There’s no getting around the fact that the issues involved negatively affect the validity of the exams.