Standardized Tests: Have we gone too far?

Andy Resnick · May 11, 2015

stevendaryl said:

<snip>

To give you a counter-example: You can go to a website such as: http://www.sheppardsoftware.com/African_Geography.htm to test your knowledge of the countries in the continent of Africa. <snip> It is NOT to come up with a numerical score: 0 to 100 (what percentage of the countries in Africa can you name). <snip>

I don't understand your point- taking those 'tests' absolutely results in a numerical score. Your comments regarding the SAT underscore my point that there is only partial agreement about how 'learning outcomes' can be tested in the first place. How does one design a test to evaluate how well a student has learned to fashion a logical argument? To critically read an editorial column?

This thread is about 'standardized tests'- not 'testing'.

stevendaryl · May 11, 2015

Andy Resnick said:

I don't understand your point- taking those 'tests' absolutely results in a numerical score.

Yes, but the numerical score is for the benefit of the test-taker. The point of those self-tests is to get 100%. The scores are not for comparison between students.

JoeyCentral · May 11, 2015

jbunniii said:

Which shows that the correct answer is ##7 \times 8 = 8! / 6!##

You could also denote it as:

∫⁷₀8dx

Andy Resnick · May 12, 2015

stevendaryl said:

Yes, but the numerical score is for the benefit of the test-taker. The point of those self-tests is to get 100%. The scores are not for comparison between students.

I'm not sure what to say- standardized tests are called that ("standardized") because they are specifically designed to compare students. And compare their teachers. And compare their schools. And this comparison is used to determine the funding received by those schools.

stevendaryl · May 12, 2015

Andy Resnick said:

I'm not sure what to say- standardized tests are called that ("standardized") because they are specifically designed to compare students.

Being a standardized test means that the questions and answers are standardized. That's independent of whether it is used for self-assessment or for comparison between students, isn't it?

votingmachine · Jun 30, 2015

stevendaryl said:

I certainly agree that things like home life affect a student's performance on tests, but why does that make the test results not statistically significant? Certainly, tests can't accurately measure inherent ability, but that's only relevant if you're trying to use the test to decide a student's entire future. But if you're only trying to decide what courses the student should take next, and whether the student needs additional help in a subject, then I think a test can give you a lot of information about that. That's why I advocate lots of small, low-stakes tests. They would just be a snapshot of where the student is, academically, not some kind of Tarot reading of what they are capable of next year or 10 years from now.

Your point about external factors such as a home life that is not conducive to learning is very good, but I'm not sure how schools should address those kinds of inequalities, other than to give students lots of opportunities for extra help.

Intuitively, I think that there is too much variation to get meaningful statistics. The tests are typically given with a dual purpose: to assess the student performance and to assess the education system performance. As you point out, it is fairly reasonable to use the tests for student performance.

The larger problem is in assessing the education system. A single brilliant student raises the average and you look like a brilliant teacher. A few well prepared students from affluent homes make you look great. And with a large variation, it might take longer than we want to wait to actually measure the thing accurately. And if we determine a school is bad after 10 years ... there was an entire cohort damaged by that, and the school is unlikely to be the same, as there are always changes being implemented.

Currently there are a lot of problems with education in the US. Using data and measurements to inform us seems a good idea. I'm not sure it does anything other than move things around randomly.

I remember a story once about a hypothetical company that had everyone flip 3 coins, and ordered them to get 3 heads. Now a few succeeded and were promptly held up as the "star" flippers. The company then asked them to explain how they did it to the rest (I relax my arm ... so everyone: relax your arms). Then the next day they flip again. And maybe a few repeat and a few new ones are "stars". Meanwhile a few of the really bad ones (the guy who had 3 tails, TWICE) get fired.

It sounds like process control. It passes the ordinary management requirements for a data-driven process change, and quality metrics. But it is still just using garbage data. Relaxing the arm made no difference.

I'm not opposed to testing. But it should be sensible testing that actually is useful. If it helps assess a student, and determine what class they need to be in next year, that seems fine. If it truly does inform about system performance, that also is great. But the general sense of teachers and schools is that the test results are largely not representative of the performance of the educational system. They are the equivalent of being the lucky triple-head flipper, or the unlucky triple-tail flipper.

I am doubtful that test scores really will show much about how education should be done. Student success will likely not correlate with system success all that strongly. There will be some improvements that can help, but a truly statistically significant system evaluation really is fairly complex, and needs a lot of data.

HomogenousCow · Jun 30, 2015

votingmachine said:

Intuitively, I think that there is too much variation to get meaningful statistics. The tests are typically given with a dual purpose: to assess the student performance and to assess the education system performance. As you point out, it is fairly reasonable to use the tests for student performance.

The larger problem is in assessing the education system. A single brilliant student raises the average and you look like a brilliant teacher. A few well prepared students from affluent homes make you look great. And with a large variation, it might take longer than we want to wait to actually measure the thing accurately. And if we determine a school is bad after 10 years ... there was an entire cohort damaged by that, and the school is unlikely to be the same, as there are always changes being implemented.

Currently there are a lot of problems with education in the US. Using data and measurements to inform us seems a good idea. I'm not sure it does anything other than move things around randomly.

I remember a story once about a hypothetical company that had everyone flip 3 coins, and ordered them to get 3 heads. Now a few succeeded and were promptly held up as the "star" flippers. The company then asked them to explain how they did it to the rest (I relax my arm ... so everyone: relax your arms). Then the next day they flip again. And maybe a few repeat and a few new ones are "stars". Meanwhile a few of the really bad ones (the guy who had 3 tails, TWICE) get fired.

It sounds like process control. It passes the ordinary management requirements for a data-driven process change, and quality metrics. But it is still just using garbage data. Relaxing the arm made no difference.

I'm not opposed to testing. But it should be sensible testing that actually is useful. If it helps assess a student, and determine what class they need to be in next year, that seems fine. If it truly does inform about system performance, that also is great. But the general sense of teachers and schools is that the test results are largely not representative of the performance of the educational system. They are the equivalent of being the lucky triple-head flipper, or the unlucky triple-tail flipper.

I am doubtful that test scores really will show much about how education should be done. Student success will likely not correlate with system success all that strongly. There will be some improvements that can help, but a truly statistically significant system evaluation really is fairly complex, and needs a lot of data.

That's an unreasonable comparison, the quality of the teachers does impact the test results of the students.

Dale · Jun 30, 2015

votingmachine said:

But the general sense of teachers and schools is that the test results are largely not representative of the performance of the educational system

Most people have the opinion that whatever metric is currently used to measure their performance is not representative of their performance.

jedishrfu · Jun 30, 2015

There is another thing to consider that whatever metric is chosen skews the results in a certain way as people try to optimize their score for their performance appraisal thus invalidating the metric.

Dr. Deming often said that a metric shouldn't be tied to an individual's performance for that very reason.

Instead it should be used to discover those teachers who are naturally better at teaching so that you can learn from them and train other teachers to do the same.

votingmachine · Jun 30, 2015

I agree that teacher quality matters and does impact the test results of students. What I said was INTUITIVELY, I think the data has too much randomness to allow easy statistical conclusions. The apocryphal story was clearly an exaggeration to show why we don't want to base process changes on bad data.

I also agree that people often think that whatever metric they are measured by misses some elusive qualities that make them special. But then again, some metrics DO miss the important thing. Adding testing is the frequent attempt to get a meaningful metric.

To draw conclusions from data, you need good data. And an understanding of the thing you are measuring. I might be wrong, and it may only take a dozen test scores and a single year to discover that a teacher needs more instruction on the craft of teaching. My perception is that it will take more scores and more time. But that is an intuitive perception, based on being a parent, and seeing student variation, seeing the occasional sick kid tested, and just having my own perception of populations and variations. There were comments here about how easy the tests were. Those comments also did not really endorse their teachers. But their test results would specifically support whatever the education system did.

I think tests can be a valuable part of measuring student performance and measuring system performance. But I think that bringing the tests into the system side needs to be done carefully. I thought that about the initial comment I was reading:

"I certainly agree that things like home life affect a student's performance on tests, but why does that make the test results not statistically significant?"

The answer is that anything that increases the variance makes it harder to draw statistical conclusions. If one teacher has a class with test scores of 50, with a sigma of 20, and another has a class with test scores of 60, with sigma of 23, then the variations from other environmental factors makes comparison of the two teaching styles a bit difficult. One might be wildly better. Or it might be small datasets like:

50, 40, 30, 50, 60, 70, 80, 20
average=50, std=20

50, 40, 30, 50, 60, 70, 80, 100
average=60, std=23

I took out the worst student score and stuck in a bright score. Or maybe the 20 was a kid who was sick on testing day (and there are generally no excused absences).

I don't know if that is realistic. But I think that for results to be statistically significant, it will take some good data. And strictly INTUITIVELY, I think that is difficult to get quickly and easily.

Dale · Jun 30, 2015

Statistical methods for dealing with those kinds of issues are well known. As long as the analysts are competent (not a given) I don't see any of those as being a real problem. Furthermore, available statistical methods are quite good at testing the data itself to determine if these issues are even problems for a given data set.

votingmachine · Jun 30, 2015

That is the big thing. Unfortunately you can't assign kids randomly to placebo groups that get no instruction and different treatment groups. Bugger that whole problem of human test subjects.

Good statisticians are important. I wish I was one, but I'm strictly low level. I agree that the analysts have to be competent, and they have to be alert to some crazy potential confounding variables. Home life, which was mentioned, would have to be looked at.

Do you think my "intuition" is wildly wrong or on the mark about the variation issue in student populations? I recall my brother relating the story of how his rural school tried very hard to convince him to find a way to keep his two very bright kids in their system. He was aware that they were aware of the tumble in their metrics from the transfer of two excellent students (and high test results). Likewise, there is a whole bunch of news around how schools try to transfer out any student that has poor test scores. If you have that pregnant teen ... you know what will happen.

I see now the following post was made already:
"There is another thing to consider that whatever metric is chosen skews the results in a certain way as people try to optimize their score for their performance appraisal this invalidating the metric." (Jedishrfu)

That is what I was getting at.

Dale · Jun 30, 2015

votingmachine said:

Unfortunately you can't assign kids randomly to placebo groups that get no instruction and different treatment groups.

But you don't need to if you use the right methods.

votingmachine said:

Do you think my "intuition" is wildly wrong or on the mark about the variation issue in student populations?

I think your intuition is wildly wrong about the importance of the issue. It is simply something that the statisticians need to account for in their methodology, not something that fundamentally precludes analysis.

votingmachine · Jun 30, 2015

DaleSpam said:

I think your intuition is wildly wrong about the importance of the issue. It is simply something that the statisticians need to account for in their methodology, not something that fundamentally precludes analysis.

Cool. Although I guess what I am saying is not that you can't do the analysis. As you point out the analysis tells you if there is a significant difference. I was getting at whether the randomness matters so much that the data set has to be extraordinarily large to draw conclusions. A grade-school teacher might have 25 students and teach several subjects to them, every year. If it ends up that you need 10 years of data to get to statistical significance, then that would accord with my first intuition that variation ends up making too much difference. 10 years is 250 kids, and that might let you start to control for socioeconomic factors, and confounding variables.

I recognize the analysis is its own thing, and it tells you what it tells you. My intuition was that the annual data sets would tell you that there was not a lot to conclude about individual teachers. And then schools systems need to have the teacher factor removed, and control for socioeconomic factors again. And of course everyone will try to game the system for better metrics as they move along.

Standardized Tests: Have we gone too far?

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

Some thoughts about self-education

Questions about a parallel plate capacitor apparatus for lab experiments

Where Can Teachers Access and Collect Classic PSSC Physics Films from the 1960s?

How can we teach students the difference between sequences and series?

How to teach errors and uncertainty in an undergraduate setting?

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight