Year after year, Mario takes district, state and national tests. Each year Mario’s individual scores are combined with others in his class, school, district, and state. The scores are sent home to parents and analyzed by teachers, districts and departments of education. Decisions are made about Mario, his teachers and his school. Belief in the validity of the scores is so strong that most people uncritically accept their truth.
All high-stakes testing is based on the paradigm that learning can be ‘measured’ by using a device that produces a number. Tests play the role of this measuring device and the resulting numbers are translated into scores. These scores are then compared and contrasted and, through arbitrarily selected criteria, used to categorize students, teachers, schools, districts and states. But what if the paradigm is wrong? What if learning cannot be ‘measured’?
Under the current line of thinking we have had tests for a long time in our classrooms and schools. Every such test has supported the idea that once Mario’s test is scored it can be used as the basis for judgments about his progress and comprehension of the taught concepts. The idea appears to be very simple: ask Mario a set of questions, arrive at a number for each correct answer, add up these numbers and there is his score.
There is a fatal flaw in this line of thought. The process of adding scores must be based on a simple scientific principle: Items can only be added if they have the same units. One apple plus one apple is two apples. We can add one plus one and arrive at two because of the same units: apple.
One apple plus one orange has no sum because they are different items. Attempting to collect them into a new entity is contradictory to their essence. The combination of one apple plus one orange does not produce an ‘apple-orange’. In reality this mathematical computation does not produce one or two of anything. In fact this process cannot be done.
In current high-stakes test construction each test question is based on a single standard. For example, let’s say that the standard is: “Students will understand the slope of a line.” There is an infinite set of questions that can address this standard, but each question will be different from the others or otherwise they would be identical questions. If the test asks five diverse questions on the slope of a line and Mario gets three of them correct we cannot say that his score on slope of a line is three. Adding up five different questions is like adding 1 apple to 1 orange to 1 banana. Three correct answers cannot produce a score of 3. Each question is really a test unto itself and cannot be combined with others. Each question is unique; it stands alone and cannot be added to another unique question.
Imagine a single test that has questions from mathematics, English, science, and social studies. It is quite obvious that combining the number correct from these different disciplines provides no clarity as to what this score would mean. What is not so obvious is that the same problem arises when the exam is a ‘math’ test with questions on slope added to those on geometry, to those on equations, and so on. This same concept holds true for tests in any subject with differing standards and an infinite set of questions for each.
The very act of counting the number of Mario’s correct responses in the category ‘questions’ can only specify that the number of questions correct is ‘such and such’ and not that this number defines any type of conceptual understanding.
We delude ourselves into thinking we have measured learning because we uncritically accept the premise that ‘learning is measurable’. Adding the number of correct responses along with some mathematical formulation cannot produce a score. We have been duped!
Therefore, if it is impossible to arrive at a score for Mario on any compilation of questions we call a ‘test’, then what can be done to find out what he really knows? Answering a question with a correct choice does not mean he has correct understanding. Not only can Mario guess, but he can also have wrong reasons for the correct answer. Surely, if Mario’s score is without merit, combining it with other invalid scores in the classroom, in the school, in the state tells us nothing. Can there be evidence of Mario’s learning there? Yes.
The evaluator of the test, usually the teacher, can describe the student’s level of understanding by using words to articulate their comprehension of each question.
Well, can’t numerical scores also describe? No. A score is a number… is a number… is a number. It is not a description. It is the interpretation of the scored number that forms a description in words. I am suggesting that we significantly reduce the number of questions on a test to provide the time for a knowledgeable evaluator or teacher to discuss with each student the justification for their answers.
To do this they need to be in dialogue with each student about their answers and record the justification for their commentary. Describing learning uses words just as an artist uses media of varying colors and types as the means to paint the picture of the learning. “Helen, you have great skills in calculating the slope of a line but you are not yet able to explain its meaning.” See http://www.learningrecord.org/compare.html for one example of such a process.
“This is where the Learning Record shines. Because of its structure, information about student learning, no matter how diverse, is organized in consistent, meaningful sections that can be quickly accessed and understood by readers across all disciplines.”
‘Forgiving Learning’ as explained in the last chapter of my book is another:
“If not high-stakes testing, then how else are students, parents and to determine what students know and are able to do? What system can be placed within the structure of a classroom, school and district to provide authentic information about what students have learned? This assessment can take place in a mastery conference with the teacher in which the student must demonstrate their understanding in any form of presentation, demonstration, portfolio defense, etc. The key to any assessment is the requirement that the student is to justify their understanding.”
What should be the upshot of all of this? Our confidence in high-stakes testing scores should take a significant plunge. We should no longer believe that state and national test scores can measure learning. We may have thought we were measuring learning, but now we know that no measurement has ever taken place. We were performing mathematical manipulations that had no meaning in the real world. We thought we could extend these scores to teacher effectiveness, school and district rankings and comparisons across the US and the world. With invalid scores, all of this is nullified. Some schools have created ‘data’ walls, but now we know they are bogus: there was really no valid data to display.
And so, it is finally over. The tyranny of high-stakes test scores is laid to rest. We cannot accept purported test scores and the impact they have on individual students, teachers and schools without being grounded in a sound understanding of what they are and what they are not. All are now released from the paradigm that student learning can be measured. We are now free to describe student learning as we have done throughout history: “Mario, your paragraph is clear, concise and shows your mastery of English form and content. A terrific job.”