Standardized Tests: Have we gone too far?

  • Thread starter micromass
AI Thread Summary
The discussion highlights concerns about the excessive reliance on standardized testing in the U.S. education system, with many feeling it detracts from genuine learning and understanding. Participants express frustration over the pressure on students and teachers to perform well on these tests, often at the expense of deeper educational goals. There is a call for a significant overhaul of the testing system, as current practices are seen as promoting rote memorization rather than critical thinking. Additionally, the debate touches on the varying testing requirements across districts and the implications for educational equity. Overall, there is a growing sentiment against standardized testing, with calls for more meaningful assessment methods.
  • #51
rollingstein said:
True. Even a madman can speak out slivers of sense sometimes.

I guess the point behind a comedian is "Don't take me seriously". So although, you are right that just because Oliver says the "Sun rises in the East" that doesn't automatically make it false.

But OTOH, a comedian isn't the source you ought to be looking for your dose of facts. Though it doesn't harm if he motivates you to look deeper into something.
That's not at all the argument I'm making. I'm saying that John Oliver is likely to get it right because of the research that his staff is doing to put the story together. It's obviously not at the level of science, but unless you're actually studying the science on this (if it exists at all), you will have a hard time finding a better source.

Edit: I watched the clip again, and I should add that it's not hard to find some flaws in Oliver's reporting. In particular, why does it matter that when Pearson wanted to hire some test scorers, one of the places they advertised was craigslist? I also looked up the story about the hare and the pineapple, which according to Oliver "doesn't remotely work as a test question" and was so bad that he and his staff weren't able to answer the questions. There's nothing really wrong with the text. Some of the questions are kind of bad, but I found it easy enough to answer them all.

This illustrates one thing that's problematic when comedians do reports like this. If something can be made fun of, they can't resist, even if it means including a flawed argument in the report.
 
  • #52
Fredrik said:
That's not at all the argument I'm making. I'm saying that John Oliver is likely to get it right because of the research that his staff is doing to put the story together. It's obviously not at the level of science, but unless you're actually studying the science on this (if it exists at all), you will have a hard time finding a better source.

Okay. I just think it's unfair to quote him as a source because if he does get a fact wrong people can always say "Oh, but he's just a comedian".

Personally I like his show a lot and also Jon Stewart's too & they have very scathing commentary on a lot of the stupidities & incongruities of the world we live in. I just would not use them as a source.
 
  • #53
Evo said:
What I don't understand is why these tests are needed. Wouldn't the child's every day school work be the best indicator?
For individual teachers, perhaps. But what about when you want to compare the performance of students from different schools?
 
  • #54
rollingstein said:
Okay. I just think it's unfair to quote him as a source because if he does get a fact wrong people can always say "Oh, but he's just a comedian".

Personally I like his show a lot and also Jon Stewart's too & they have very scathing commentary on a lot of the stupidities & incongruities of the world we live in. I just would not use them as a source.

I don't think that's their goal either. He succeeded in starting a discussion on this forum. And that appears to be the goal to me.
It worked at least once, so to speak; all we can hope for is that this brings back critical thinking.
It's clear from social media (to me at least) that a lot of people just buy everything traditional media feeds them.
 
  • Like
Likes micromass
  • #55
Andy Resnick said:
This is the essence of the "pro-test" argument. But the reality is that it's impossible to generate statistically significant results in the first place- there are too many uncontrollable variables (e.g. home life) that strongly impact student ability. Further, while there is growing agreement with what minimum content constitutes a 'proper' STEM course, this concordance has not yet been reached for subjects like history, languages, art, etc.

This, and also, as I said, in my school nobody takes it seriously. There are maybe 4 or 5 kids in the whole school who actually try their best on the PARCC.
 
  • #56
Evo said:
What I don't understand is why these tests are needed. Wouldn't the child's every day school work be the best indicator? Many kids have test anxiety and will do worse on tests like this than in normal school work. Of course some kids hate school and don't do well even though they are very bright and capable but a special test isn't going to change that.

I think that one-on-one interaction with a teacher would certainly give as good an indication of a student's competence as a test would, or a better one. But I don't think that that happens on a regular basis. The problem with everyday school work (to me) is this: What is the interpretation of homework that contains mistakes? Was the student just being careless, or does the mistake indicate some gap in the student's knowledge or skills that needs to be addressed? In mathematics and other "hard" sciences, the topics build on each other. If a student's understanding of, say, arithmetic, is faulty, then he is going to have trouble with fractions. If he has trouble with fractions, then he's going to have trouble with algebra, and with trigonometry and with calculus. So what I think happens (in my observation) is that children miss key concepts or skills at an early age, but still manage to do well enough to pass. But then each new mathematical topic puts them farther behind their peers until eventually they give up and decide mathematics is beyond their abilities. My feeling is that if gaps in understanding can be caught and addressed at the earliest moment, then this can be avoided.

I actually don't care about tests, but what I care about is preventing a certain kind of complacency on the part of students and teachers, which is for them to feel that, even though something is not understood, it's good enough, and it's time to move on to something else. I think it's bad for kids to have the gnawing feeling that they don't understand something that they are expected to understand, and that they have to pretend to understand it, because it would be too embarrassing to admit that they don't.

Ultimately, what I want to instill in students is an internal drive to understand things deeply, not to be satisfied with superficial understanding. Because the superficial understanding is just pretend understanding.
 
  • #57
Here is an actual example of the challenge of writing a good test question, taken from an hour exam I was given at Harvard in about 1963. The class was linear algebra and I had neglected to learn the tedious looking and unmotivated Gram-Schmidt formula for reducing a spanning set to an orthonormal one. On the test however there was a question meant to measure command of this topic: "find a maximal orthonormal set in the space of polynomials of degree at most 2, with respect to the pairing <f,g> = f(0)g(0) + f(1)g(1)." The average student would begin with the standard basis {1, x, x^2} and perform Gram-Schmidt on it, getting some hideous answer. My paper had only a few stray marks and calculations on it for that question, and the professor initially gave me a zero for it.

Afterwards I pointed out to him that prominently displayed among those marks was the simplest possible maximal orthonormal set {x, 1-x}, and he was forced to raise my score to 20 points for that question. I had simply noticed that in his example, I only needed a couple of functions f, g that vanished respectively at zero or at one, and equaled one at the other point. I also had an intuition that two was the maximal number of orthonormal functions for that pairing. So his question seemed reasonable but failed to measure what he wanted it to in this case. This changed my grade from a D to a B, and for years after that I was still innocent of the general Gram-Schmidt process, until in graduate school someone drew a picture showing how it is just a simple projection process. This ability to pass tests that I did not know the material for was actually quite harmful, since it allowed me to bump along for years without learning anything before hitting the wall at a certain point, having wasted many years of "schooling". It also hurt me on placement tests when I regularly placed into classes well over my head.
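For anyone who wants to check the {x, 1-x} claim, here is a minimal sketch in Python. The function names and the coefficient-list representation are just choices made for this illustration, not anything from the exam. It computes the pairing <f,g> = f(0)g(0) + f(1)g(1) exactly, confirms that {x, 1-x} is orthonormal, and shows that x - x^2 pairs to zero with every standard basis element, which is why no third orthonormal function can be added.

```python
from fractions import Fraction

def evaluate(coeffs, t):
    """Evaluate c0 + c1*x + c2*x^2, given as [c0, c1, c2], at the point t."""
    return sum(Fraction(c) * Fraction(t) ** k for k, c in enumerate(coeffs))

def pairing(f, g):
    """The pairing from the exam question: <f,g> = f(0)g(0) + f(1)g(1)."""
    return evaluate(f, 0) * evaluate(g, 0) + evaluate(f, 1) * evaluate(g, 1)

x = [0, 1, 0]             # the polynomial x
one_minus_x = [1, -1, 0]  # the polynomial 1 - x
q = [0, 1, -1]            # the polynomial x - x^2, which vanishes at 0 and at 1

candidate = (x, one_minus_x)

# Gram matrix of the candidate set; orthonormal means this is the identity.
gram = [[pairing(f, g) for g in candidate] for f in candidate]

# Pair q with the standard basis 1, x, x^2; all zeros means q is
# "invisible" to the pairing, so it can never be normalized to length one.
standard_basis = ([1, 0, 0], [0, 1, 0], [0, 0, 1])
q_pairings = [pairing(q, b) for b in standard_basis]

print(gram)
print(q_pairings)
```

Since the space of polynomials of degree at most 2 is three-dimensional and one direction (x - x^2) has zero pairing with everything, two is the most orthonormal functions one can hope for, which matches the intuition described above.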

I agree by the way that one-on-one oral exams are potentially the most accurate way to measure understanding, except however for the crucial fact that they can be very intimidating, and hence the nervousness factor can enter significantly and hinder a student's performance unfairly. I have had impatient professors bark at me when I hesitated at choosing the right words, "well if you don't know just say so!" and I would say well ok, I don't know, when actually I did know, but was trying to phrase my answer precisely.
 
  • Like
Likes atyy
  • #58
A more common challenge for the professor than writing questions one cannot get right without knowing the material is writing ones that a student who does "know" the material cannot get wrong. E.g. I gave a calculus class a problem to maximize the area of some figure subject to certain conditions and it came down to multiplying 13 by 65 to get the answer. No calculators were allowed and one B student did not know how to multiply two digit numbers without one, so she added up a column of thirteen 65's. I was scandalized at such basic ignorance and felt guilty at giving her full credit, since in my opinion, contrary to some expressed above, knowing how to do basic arithmetic and utilize positional notation is much more important than, say, knowing differential calculus. You really cannot always google every question safely.

E.g. when purchasing a load of soil recently I measured my needs at 95 cubic feet and the salesman told me that would be about 1.5 cubic yards. I objected that since one cubic yard equals 27 cubic feet it should be more like 3.5. He said "no, I typed it into my calculator twice and it still says 1.5", so that was that. This conversation went on a depressingly long time in that vein until he eventually found another calculator on which he knew better how to manage the parentheses function and got the right answer. Out of fairness I might say he had actually taught me what calculation to do, he just didn't know how to multiply. So we made an adequate team eventually.
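The conversion in the soil story can be written out in a few lines. This is only a small sketch; the constant and function names are made up for the illustration. The point is that a cubic yard is 3 x 3 x 3 = 27 cubic feet, so 95 cubic feet is a bit more than 3.5 cubic yards.

```python
FEET_PER_YARD = 3
# A cubic yard is a cube one yard (3 feet) on each side.
CUBIC_FEET_PER_CUBIC_YARD = FEET_PER_YARD ** 3

def cubic_feet_to_cubic_yards(cubic_feet):
    """Convert a volume in cubic feet to cubic yards."""
    return cubic_feet / CUBIC_FEET_PER_CUBIC_YARD

soil_cubic_yards = cubic_feet_to_cubic_yards(95)
print(round(soil_cubic_yards, 2))  # 3.52
```

A quick sanity check without any calculator: 3 cubic yards is 81 cubic feet and 4 cubic yards is 108, so 95 cubic feet has to land between 3 and 4.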

By the way, in controversies like this one, I have learned that jumping up and down and screaming "I have a PhD in mathematics!" does not help the situation.
 
  • #59
Well, mainly I am so bad at arithmetic I always use a calculator (yes, even my literature teacher complained to my parents that I have to use my fingers and toes).

Anyway, another interesting example is whether there are cases where all the logic up to the last step is right, but the final answer being wrong can be argued to indicate a gross lack of understanding. I have a friend who got zero marks on a physics problem, because he failed to include an "i" in the final answer, just out of carelessness. All his steps were right up to that point. The lecturer said a zero was justified, because an exponentially decaying or exploding solution is qualitatively different from an oscillating one, and that missing an "i" showed that he had not understood the physics at all. I think the lecturer went overboard on this, but I concede he had a point.
 
  • #60
atyy said:
<Snip>

Anyway, another interesting example is whether there are cases where all the logic up to the last step is right, but the final answer being wrong can be argued to indicate a gross lack of understanding. I have a friend who got zero marks on a physics problem, because he failed to include an "i" in the final answer, just out of carelessness. All his steps were right up to that point. The lecturer said a zero was justified, because an exponentially decaying or exploding solution is qualitatively different from an oscillating one, and that missing an "i" showed that he had not understood the physics at all. I think the lecturer went overboard on this, but I concede he had a point.
Unless this is advanced level, it seems unfair to expect a student to put things in context given the time constraints and nervousness of an exam. Besides, this can be corrected much more easily than structural misunderstandings.

When I taught as an adjunct, for undergraduate classes, I either assigned problems that did not require much thought, to be done in class, or, for problems requiring more thinking or creativity, I gave the assignments to be done at home. It does not seem fair, at this level, to expect students to perform creatively while under pressure, time constraints and nervousness.
 
  • #61
atyy, would you say giving a negative answer (with the correct absolute value) to a physics question whose answer must be a positive quantity would justify a zero score?

The whole question of how to assess something or someone correctly is a very hard one in my experience and requires diligent attention. The personal interview method is one that is used by people I respect in some technical business situations. E.g. I know someone with responsibility for recruiting, hiring, and when necessary firing, for a global high tech company. He always does all three of these tasks in person face to face, even if he has to fly 5 or 10 thousand miles to do so. Certainly he never hires anyone based on performance on a standardized written test.
 
  • #62
mathwonk said:
He always does all three of these tasks in person face to face, even if he has to fly 5 or 10 thousand miles to do so. Certainly he never hires anyone based on performance on a standardized written test.

Which makes a lot of sense. Standardized testing for the masses & for initial screening followed by more expensive, time consuming methods for the really important decisions.
 
  • #63
As an HS student, I can't agree more. I can feel the immense pressure of the need to perform well in these standardized tests. It's like everyone around me is emphasizing how much of a passport high scores are to a good college, but I can't help but feel that the focus is less on learning and more on blind analysis of performance under stress. Everyone is not built out of the same wood you know.
 
  • #64
mathwonk said:
atyy, would you say giving a negative answer (with the correct absolute value) to a physics question whose answer must be a positive quantity would justify a zero score?

My personal view is that as long as the steps are correct, I would only take off the mark for the final answer in a classroom test situation. Generally I would grade according to the marking scheme that is determined before the test.

Another friend of mine took an economics class and ended up with a complex profit in a test. She left the answer intact and got partial credit. It clearly didn't mean anything about her deep understanding of economics, since even a complete idiot knows a profit cannot be complex.

mathwonk said:
The whole question of how to assess something or someone correctly is a very hard one in my experience and requires diligent attention. The personal interview method is one that is used by people I respect in some technical business situations. E.g. I know someone with responsibility for recruiting, hiring, and when necessary firing, for a global high tech company. He always does all three of these tasks in person face to face, even if he has to fly 5 or 10 thousand miles to do so. Certainly he never hires anyone based on performance on a standardized written test.

I think it is impossible to have a fail-safe algorithm in real life! Personal interactions and luck are so essential.

But most of the time, I don't know if one is trying to do things that require luck in the classroom. Thinking a bit about your example of 13 X 65, I do find it a bit perturbing too. However, what part of arithmetic is really "essential"? To do the calculation by hand, one only needs to memorize 3 X 6 and 3 X 5, or more generally working in base 10, I guess one only needs the multiplication table up to 9 X 9. Naively, is there anything deep about 9 X 9 = 81, or would it be ok to know how to set up the calculation by hand, but use a multiplication table?
 
  • #65
I want to repeat my point about one positive aspect to standardized tests, namely they allow anonymous students with no connections or social status to make an argument that they belong up there with the privileged few. In the 1960's, SAT scores and the consequent merit scholarships they brought sent hundreds of relatively poor boys from low socioeconomic families like mine to schools like Harvard, where we met sons of famous wealthy people, and future national politicians and scions of business. And the test itself is relatively cheap and easy to prepare for; all you need is a $20 prep book with old tests in it to practice on. I.e. anyone can afford the test, anyone can afford the practice materials, and the benefit is a chance to compare yourself favorably with much more privileged students, and possibly get accepted at top schools with financial assistance. I.e. the very definition of a standardized test is one that tests everyone the same, and hence lets you compete against people you have never met and who go to schools you can't afford. This is essentially the only way a high school boy from Tennessee can show he is comparable in ability and potential to a prep school boy from Connecticut. This is a very useful tool for social advancement. But of course you have to prepare or you don't benefit.
 
  • Like
Likes rollingstein and atyy
  • #66
atyy said:
But most of the time, I don't know if one is trying to do things that require luck in the classroom. Thinking a bit about your example of 13 X 65, I do find it a bit perturbing too. However, what part of arithmetic is really "essential"? To do the calculation by hand, one only needs to memorize 3 X 6 and 3 X 5, or more generally working in base 10, I guess one only needs the multiplication table up to 9 X 9. Naively, is there anything deep about 9 X 9 = 81, or would it be ok to know how to set up the calculation by hand, but use a multiplication table?

To me, the critical thing about arithmetic is the relationship between addition and multiplication, understanding integer multiplication as repeated addition, and understanding multiplication of a pair of positive reals as the area of a rectangle. The specific times facts are not as important as understanding the meaning of place notation for decimals.
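To make the contrast concrete with the 13 X 65 example from earlier in the thread, here is a rough sketch in Python (the function names are invented for this illustration). The first function is multiplication as repeated addition, which is what the student with the column of thirteen 65's did. The second uses place notation: it splits 13 into its digits and adds up 3 X 65 and 1 X 65 X 10.

```python
def multiply_by_repeated_addition(a, b):
    """Add b to itself a times (a must be a non-negative integer)."""
    total = 0
    for _ in range(a):
        total += b
    return total

def multiply_by_place_value(a, b):
    """Multiply using the decimal digits of a, as in schoolbook long multiplication."""
    total = 0
    place = 1  # 1 for the ones digit, 10 for the tens digit, and so on
    while a > 0:
        a, digit = divmod(a, 10)      # peel off the lowest decimal digit of a
        total += digit * b * place    # partial product, shifted to its place
        place *= 10
    return total

print(multiply_by_repeated_addition(13, 65))  # 845
print(multiply_by_place_value(13, 65))        # 845
```

Both give 845, but the second needs only two partial products (195 and 650) instead of thirteen additions, and that saving comes entirely from understanding what the digit positions mean.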
 
  • #67
TheDemx27 said:
Ha, I took the SAT last week, and will take the SAT Math 2 subject test and SAT physics subject test less than a few months from now. What annoys me the most is that the SAT claims to be an aptitude test, yet you can significantly improve your score by preparing for it.
Actually, they do not make that claim about the SAT anymore. SAT no longer stands for "scholastic aptitude test", it is simply the name of the test and does not stand for anything.
 
  • #68
I don't know if any of you remember what it's like being 17 and in school, but the majority of people would not study AT ALL if not for exam pressure. The sad truth is that you need a short term incentive for teenagers to learn; a long term incentive such as a "good job" is simply too far off to significantly motivate the majority of people.
 
  • #69
HomogenousCow said:
I don't know if any of you remember what it's like being 17 and in school, but the majority of people would not study AT ALL if not for exam pressure. The sad truth is that you need a short term incentive for teenagers to learn; a long term incentive such as a "good job" is simply too far off to significantly motivate the majority of people.

I think that's absolutely right. If you aren't challenged to do something (such as pass a test) with knowledge, it tends to go in one ear and out the other. I don't remember the reference, but there was a study that showed that frequent quizzes on what someone has learned tend to improve his ability to recall what he learned.
 
  • #70
The simple truth is that most high school students do not care about the intrinsic value of whatever they are being taught (assuming there is any, I'm looking at you Samuel Beckett). There seems to be some grand fantasy that students are inquisitive angels, oppressed by "The Man" and his weapon of choice-the SAT.
 
  • Like
Likes TheDemx27
  • #71
HomogenousCow said:
The simple truth is that most high school students do not care about the intrinsic value of whatever they are being taught (assuming there is any, I'm looking at you Samuel Beckett). There seems to be some grand fantasy that students are inquisitive angels, oppressed by "The Man" and his weapon of choice-the SAT.

I think that a certain amount of drive to understand does exist in most (all?) people. Outside of the realm of academics, people are driven to understand how to win video games, they are driven to understand why Dumbledore told Snape to kill him, they are driven to understand why this batch of cookies turned out worse than the last batch. People really do have a drive to understand. The problem (for school purposes) is that the average person, if led by his own curiosity, would get around to understanding how to do calculus some time like never. What most people are naturally curious about isn't what schools want to teach.
 
  • #72
stevendaryl said:
I certainly agree that things like home life affect a student's performance on tests, but why does that make the test results not statistically significant? <snip>

It goes to the most basic reason for tests: the purpose of a test is to measure "something" (more on this later). So you start with some average score and then try and figure out interventions that result in higher test scores. And this is the basic problem: trying to establish statistically significant results showing how some specific intervention (just in time teaching, flipped classroom, think-pair-share, problem-based learning, etc.) results in improved test scores. That's done (at best) by having the identical instructor teach multiple sections, some of which are negative controls and some are with the intervention. That assumes that every section is made of interchangeable students, for example. And then the process has to be repeated to ensure reproducibility. And then the intervention has to be performed at different schools, using different teachers. In the end, the statistical error associated with the study results is typically as large as the effect.

Now, what does a specific test actually measure? Much ink has been spilled on this topic. For example, the strongest correlation with SAT scores is family income, not 'aptitude'.

http://blogs.wsj.com/economics/2014...me-inequality-how-wealthier-kids-rank-higher/
 
  • #73
Andy Resnick said:
It goes to the most basic reason for tests: the purpose of a test is to measure "something" (more on this later). So you start with some average score and then try and figure out interventions that result in higher test scores. And this is the basic problem: trying to establish statistically significant results showing how some specific intervention (just in time teaching, flipped classroom, think-pair-share, problem-based learning, etc.) results in improved test scores. That's done (at best) by having the identical instructor teach multiple sections, some of which are negative controls and some are with the intervention. That assumes that every section is made of interchangeable students, for example. And then the process has to be repeated to ensure reproducibility. And then the intervention has to be performed at different schools, using different teachers. In the end, the statistical error associated with the study results is typically as large as the effect.

Now, what does a specific test actually measure? Much ink has been spilled on this topic. For example, the strongest correlation with SAT scores is family income, not 'aptitude'.

I'm not going to argue about whether there even is such a thing as "aptitude", much less whether SAT measures it. However, it seems to me that if someone is tested on vocabulary (for instance), you can find out whether that person knows what a word means. If someone is tested on solving algebraic equations in one variable, you can find out whether they know how to do that. It certainly may be the case that finding out that someone doesn't know how to solve an equation in one variable doesn't necessarily say what should be done about it.

I see your first paragraph as simply about the difficulties of figuring out effective teaching methods. That certainly is a hard problem, but it seems orthogonal to the issue of testing.
 
  • #74
stevendaryl said:
I'm not going to argue about whether there even is such a thing as "aptitude", much less whether SAT measures it. However, it seems to me that if someone is tested on vocabulary (for instance), you can find out whether that person knows what a word means. If someone is tested on solving algebraic equations in one variable, you can find out whether they know how to do that. It certainly may be the case that finding out that someone doesn't know how to solve an equation in one variable doesn't necessarily say what should be done about it.

I see your first paragraph as simply about the difficulties of figuring out effective teaching methods. That certainly is a hard problem, but it seems orthogonal to the issue of testing.

Just to make it clear, I'm not in favor of tests as "measurements". I don't think that an overall numeric result, from 0 to 100 or 0 to 1500 (or whatever the range is for SATs) means much at all. But the fact that a student is able or unable to answer specific questions certainly is meaningful. I favor tests as diagnostics or assessment, not as measures of quality of the student.
 
  • #75
stevendaryl said:
<snip>I see your first paragraph as simply about the difficulties of figuring out effective teaching methods. That certainly is a hard problem, but it seems orthogonal to the issue of testing.

But you would hopefully agree that there should be a reason to test and that the test should evaluate how effectively students achieve some specified learning objective.

Learning involves much more than rote memorization. A good example of a (relatively) new standardized test is the Force Concept Inventory:

http://www.flaguide.org/tools/diagnostic/force_concept_inventory.php

This is the key innovation: "Each question offers only one correct Newtonian solution, with common-sense distractors (incorrect possible answers) that are based upon student's misconceptions about that topic, gained from interviews."

This exam is an attempt to directly measure student learning, not simple recall.
 
  • #76
PWiz said:
As an HS student, I can't agree more. I can feel the immense pressure of the need to perform well in these standardized tests. It's like everyone around me is emphasizing how much of a passport high scores are to a good college, but I can't help but feel that the focus is less on learning and more on blind analysis of performance under stress. Everyone is not built out of the same wood you know.

But this stress mimics real life situations pretty well. Most jobs you get will test your decision making skills under pressure.

So might as well embrace the testing stress & try to thrive & perform better under stress. And I can tell you that the skill is learned to a large extent. The more tests you take the less the stress will impact your scores adversely.

Yes, not everyone is built of the same wood, but that is partly what the tests are trying to discern.
 
  • #77
Andy Resnick said:
But you would hopefully agree that there should be a reason to test and that the test should evaluate how effectively students achieve some specified learning objective.

To the extent that there is a reason to teach something at all, there is a reason to see whether you've accomplished it. If you don't care whether a student learns arithmetic, why teach it?

Learning involves much more than rote memorization. A good example of a (relatively) new standardized test is the Force Concept Inventory:

http://www.flaguide.org/tools/diagnostic/force_concept_inventory.php

This is the key innovation: "Each question offers only one correct Newtonian solution, with common-sense distractors (incorrect possible answers) that are based upon student's misconceptions about that topic, gained from interviews."

This exam is an attempt to directly measure student learning, not simple recall.

Yeah, sure. Tests shouldn't test memorization. (Or at least shouldn't test ONLY memorization. It's possible that there is a benefit to committing some things to memory.)
 
  • #78
stevendaryl said:
What most people are naturally curious about isn't what schools want to teach.

Should schools be teaching what kids are curious about or what is more likely to be useful to them to earn a living or to contribute to the skillset that society demands from them?

I think you are perfectly right that kids tend to be naturally driven & curious about certain things. But the whole point behind schooling & discipline is to teach people stuff they may not enjoy doing on their own but they ought to know. And funnily enough there are a range of activities that are not enjoyable in their initial learning curve that subsequently do become enjoyable & it takes something like school to take you through that initial uggh drudgery.

No one ever enjoyed learning multiplication tables. But having learned them we now find them pretty useful.

I think one of the under-appreciated functions of school is to get us to endure the boredom of things that we do not like to do.
 
  • #79
rollingstein said:
Should schools be teaching what kids are curious about or what is more likely to be useful to them to earn a living or to contribute to the skillset that society demands from them?

I'm not making a claim about that, I'm just pointing out that the fact that students don't naturally want to learn what is taught in schools does not mean that they aren't naturally curious and driven to understand things. Just that there is a mismatch between what they are curious about and what schools teach. That might be inevitable.
 
  • Like
Likes rollingstein
  • #80
stevendaryl said:
I'm not making a claim about that, I'm just pointing out that the fact that students don't naturally want to learn what is taught in schools does not mean that they aren't naturally curious and driven to understand things. Just that there is a mismatch between what they are curious about and what schools teach. That might be inevitable.

Ah ok. My bad then. I thought you were being critical of schools for teaching what they teach.
 
  • #81
stevendaryl said:
Just to make it clear, I'm not in favor of tests as "measurements". I don't think that an overall numeric result, from 0 to 100 or 0 to 1500 (or whatever the range is for SATs) means much at all. <snip>

But this is, in fact, precisely what the function of a standardized test is: to provide numerical comparisons across the student population.

The only test I am aware of that is not associated with a numerical score is the Rorschach test.
 
  • #82
Andy Resnick said:
But this is, in fact, precisely what the function of a standardized test is: to provide numerical comparisons across the student population.

The only test I am aware of that is not associated with a numerical score is the Rorschach test.

I accept your point. But part of the video and discussion in this topic was related not only to the volume/usefulness of these tests but the quality.
As far as I can tell a lot/some of these tests require signing an agreement stating that you will not discuss the questions.
Someone also mentioned the problem with quality from personal experience with plain wrong questions in there.

Another point I find interesting is the example in the video where a student was expected to get more than 100% (486 points was the goal with only 483 points available I believe)
How can such expectations be assessed? Clearly the method of measurement as a whole has some serious flaws (using the language of physics).

At this point the discussion can take several directions e.g. are (well-designed) standardized tests useful, How can we assure well-designed tests, ...
 
  • Like
Likes micromass
  • #83
JorisL said:
As far as I can tell a lot/some of these tests require signing an agreement stating that you will not discuss the questions.

Has anyone gotten sued for discussing questions from the SAT / ACT etc? Just curious. Otherwise I think we should just disregard those agreements as unenforceable BS boilerplate. I would like to see the legal precedent on this.

You can put what you want in an agreement but ultimately you've got to find a sympathetic court that will enforce it. I want to see a jury find a kid guilty for violating Pearson's test confidentiality.

JorisL said:
Someone also mentioned the problem with quality from personal experience with plain wrong questions in there.

What test doesn't have some wrong questions on it?!
 
  • #84
rollingstein said:
Has anyone gotten sued for discussing questions from the SAT / ACT etc? Just curious. Otherwise I think we should just disregard those agreements as unenforceable BS boilerplate. I would like to see the legal precedent on this.

You can put what you want in an agreement but ultimately you've got to find a sympathetic court that will enforce it. I want to see a jury find a kid guilty for violating Pearson's test confidentiality.

Well do you see people discussing questions anywhere? The post I referenced is post #28 by Fredrik, check it out. They hide behind the agreement.
And it works, which kid thinks about this stuff? They probably just keep it going long enough for the other party to either get fed up or out of money.

Point is it's rotten, and if they feel their position is threatened, I'm certain something will happen.
Money makes things happen, remember?

rollingstein said:
What test doesn't have some wrong questions on it?!

And which teacher doesn't agree when you explain in detail why it is wrong? More importantly how can you without risk of prosecution get a second opinion?

Even when there are vague parts of a question you either ask for clarification or get back to the teacher afterwards if this is somehow impossible.

Finally, these are standardised tests.
Isn't the written test you take when getting your driver's license standardised? Here it is, and I have never heard of an error in the tests, even right after changes in the law (when one would think it is easier for errors to slip in).
Point is if a lot of students take these tests isn't it absolutely necessary to check, double-check, ... the tests?
Also I'm sure they have a lot of questions used for many years if not decades. Shouldn't those be 100% correct?
 
  • Like
Likes micromass
  • #85
I should perhaps clarify that my experience isn't with the tests made for school kids. It's with a couple of professional certification exams (made by one of the Pearson companies). What they can do if you violate the agreement is to kick you out of the certification program. Now you have spent at the very least a few hundred dollars (possibly many thousands, if you took classes or bought hardware to practice on), and you're no longer certified, even though you passed the exam. I don't know what they would do if a high school kid would violate the confidentiality agreement.

Even if you want to violate the confidentiality agreement, it's pretty difficult to do that, especially for the kind of tests I did. It's difficult to remember the questions exactly, and you have to hand in all the notes you've made at the end of the exam. You're not even allowed to erase them. You're also not allowed to look at a question again once the test is over, not even to provide feedback about possible issues with a question. And you don't really have time to try to memorize the questions. The exam I did was extremely difficult to complete on time. You basically had to cheat, or remember what you did last time you took the test.

What I found especially bizarre about my experience was that they were completely unwilling to discuss any specific points I had made. Instead of trying to refute my arguments, they just said that there's nothing wrong with the test, even though I know for sure that some of the questions were bad.

It certainly seems to me that (at least with these professional exams), the rules are in place to ensure that they don't have to make the tests good.
 
  • #86
Andy Resnick said:
But this is, in fact, precisely what the function of a standardized test is: to provide numerical comparisons across the student population.

The only test I am aware of that is not associated with a numerical score is the Rorschach test.

If we are talking about SAT or ACT, then I agree with you that the point is to get a numerical score for the purpose of comparison between students. But is the word "standardized test" limited to those sorts of measurements?

To give you a counter-example: You can go to a website such as: http://www.sheppardsoftware.com/African_Geography.htm to test your knowledge of the countries in the continent of Africa. Now, you might object that there is no reason to know the names, locations and capitals of the countries in Africa, but it's just a simple example of knowledge that can be tested through a standardized test. The point of such a test is certainly NOT to compare the student to other students. It is NOT to come up with a numerical score: 0 to 100 (what percentage of the countries in Africa can you name). The point of such a test is to see if you DO know the countries in Africa. If you get them all right, then you do. If you miss even one then you don't. You can retake the test as often as you like, until you get 100%. Then you know all the countries in Africa (well, at least until you forget them).

To me, the proper goal of a test is to assess how well a student understands a subject. That can be done, at least with some subjects, using standardized tests.

Now, the SAT has all these questions that are not actually about understanding a subject, but seem to be some kind of measurement of mental fitness. It's been a long time since I've taken it, but back in the day, there were questions along the lines of:

Here is a sequence of pictures. Based on the pattern, what is the next picture in the sequence?

There were questions along the lines of:

Mustard is to hot dog as pickles are to what?

These questions were sort of interesting to me, because it was a challenge to figure out what the test-creators had in mind. I was pretty good at that sort of thing, but I'm not convinced that there is a strong point in asking those types of questions.
 
  • Like
Likes PWiz
  • #87
stevendaryl said:
<Snip>:

Here is a sequence of pictures. Based on the pattern, what is the next picture in the sequence?

There were questions along the lines of:

Mustard is to hot dog as pickles are to what?

These questions were sort of interesting to me, because it was a challenge to figure out what the test-creators had in mind. I was pretty good at that sort of thing, but I'm not convinced that there is a strong point in asking those types of questions.

Ah, yes, what is _the_ next figure. If you do not fit into their narrow world/experiential view, you are wrong. Same with sequences of numbers, other than obvious ones like 1,2,3,4,... I am remembering the phrase "limit your imagination, keep you where they must".
 
  • Like
Likes PWiz
  • #88
Andy Resnick said:
Now, what does a specific test actually measure? Much ink has been spilled on this topic. For example, the strongest correlation with SAT scores is family income- not 'aptitude'.

http://blogs.wsj.com/economics/2014...me-inequality-how-wealthier-kids-rank-higher/

(The article that you linked just shows that kids of richer parents get better scores, without analysing any other factors like aptitude or heritability of IQ. I'm not saying that you are wrong, I'm merely pointing out that the source that you linked does not prove your point)

Fredrik said:
I should perhaps clarify that my experience isn't with the tests made for school kids. It's with a couple of professional certification exams (made by one of the Pearson companies). What they can do if you violate the agreement is to kick you out of the certification program. Now you have spent at the very least a few hundred dollars (possibly many thousands, if you took classes or bought hardware to practice on), and you're no longer certified, even though you passed the exam. I don't know what they would do if a high school kid would violate the confidentiality agreement.

Even if you want to violate the confidentiality agreement, it's pretty difficult to do that, especially for the kind of tests I did. It's difficult to remember the questions exactly, and you have to hand in all the notes you've made at the end of the exam. You're not even allowed to erase them. You're also not allowed to look at a question again once the test is over, not even to provide feedback about possible issues with a question. And you don't really have time to try to memorize the questions. The exam I did was extremely difficult to complete on time. You basically had to cheat, or remember what you did last time you took the test.

What I found especially bizarre about my experience was that they were completely unwilling to discuss any specific points I had made. Instead of trying to refute my arguments, they just said that there's nothing wrong with the test, even though I know for sure that some of the questions were bad.

It certainly seems to me that (at least with these professional exams), the rules are in place to ensure that they don't have to make the tests good.

They behave even better than infallible beings that my gov put into examination boards. Except that now such boards in my country are being challenged as unconstitutional.
 
  • #89
Andy Resnick said:
Now, what does a specific test actually measure? Much ink has been spilled on this topic. For example, the strongest correlation with SAT scores is family income- not 'aptitude'.

http://blogs.wsj.com/economics/2014...me-inequality-how-wealthier-kids-rank-higher/

The data in that article is (are?) interesting...

Certainly I would think writing would scale with aptitude or intelligence. If one ignores spelling or simple grammar mistakes, the quality of writing is going to scale with ability. And writing is surely something that can't be standardized, it needs someone to mark it. So it seems reasonable prima facie to be able to look at the writing scores and read off how ability trends with wealth.

And looking at the data, writing scores ramp to $100k, are flat to $200k, and ramp thereafter. It's reasonable to assume those >$200k households have children who were privately tutored or have parents who are doctors; they are the geniuses and score very well. But we see that most middle class households score the same. I think it's reasonable to suppose that the sub $100k households are predominantly in poorer neighborhoods and have schools that aren't as good, or that there could be a language bias in the writing scores for poorer households. So there's no real evidence from the writing scores that ability trends with wealth.

Reading is the same: a ramp to $100k, flat to $200k, ramping thereafter. It's clear the reading questions are sufficiently elementary that the same is true, any language bias only shows up in poorer households.

Math, however, is a consistent ramp, rising to the right. We know that this isn't measuring ability because it differs from the writing and reading scores. And math performance in general is contingent on quality of teaching/schooling. So for me it says more about the quality of the schools than any kind of proportion between ability and wealth.

So I see no evidence in the data that there is a trend between ability and wealth. The claim that the SAT is more a measure of affluence than ability would seem to be on point.

I apologize, the math score is not a good measure of ability but the reading and writing scores seem to be pretty good; a decent score on the reading section is quite a reliable indicator of ability. Perhaps this is more toward the concept of emotional IQ.
 
  • #90
Interesting, since one of the students in my friend's class got stressed during class (during a proof on logic or something like that) and vomited. But we only take at most two midterms and one final. I think the stress on a person depends on the student, rather than what the student is doing. What's the difference between standardized tests, and regular chapter tests like the ones I did in high school?

Also common core is fairly new. Of course the students that have to join the program abruptly will suffer, but students that will grow up with the common core system may do better in things like math and science. At least that is the idea.
 
  • #91
stevendaryl said:
<snip>

To give you a counter-example: You can go to a website such as: http://www.sheppardsoftware.com/African_Geography.htm to test your knowledge of the countries in the continent of Africa. <snip> It is NOT to come up with a numerical score: 0 to 100 (what percentage of the countries in Africa can you name). <snip>

I don't understand your point- taking those 'tests' absolutely results in a numerical score. Your comments regarding the SAT underscore my point that there is only partial agreement about how 'learning outcomes' can be tested in the first place. How does one design a test to evaluate how well a student has learned to fashion a logical argument? To critically read an editorial column?

This thread is about 'standardized tests'- not 'testing'.
 
  • #92
Andy Resnick said:
I don't understand your point- taking those 'tests' absolutely results in a numerical score.

Yes, but the numerical score is for the benefit of the test-taker. The point of those self-tests is to get 100%. The scores are not for comparison between students.
 
  • #93
jbunniii said:
Which shows that the correct answer is ##7 \times 8 = 8! / 6!##
You could also denote it as:

##\int_0^7 8 \, dx##
 
  • #94
stevendaryl said:
Yes, but the numerical score is for the benefit of the test-taker. The point of those self-tests is to get 100%. The scores are not for comparison between students.

I'm not sure what to say- standardized tests are called that ("standardized") because they are specifically designed to compare students. And compare their teachers. And compare their schools. And this comparison is used to determine the funding received by those schools.
 
  • #95
Andy Resnick said:
I'm not sure what to say- standardized tests are called that ("standardized") because they are specifically designed to compare students.

Being a standardized test means that the questions and answers are standardized. That's independent of whether it is used for self-assessment or for comparison between students, isn't it?
 
  • #96
stevendaryl said:
I certainly agree that things like home life affect a student's performance on tests, but why does that make the test results not statistically significant? Certainly, tests can't accurately measure inherent ability, but that's only relevant if you're trying to use the test to decide a student's entire future. But if you're only trying to decide what courses the student should take next, and whether the student needs additional help in a subject, then I think a test can give you a lot of information about that. That's why I advocate lots of small, low-stakes tests. They would just be a snapshot of where the student is, academically, not some kind of Tarot reading of what they are capable of next year or 10 years from now.

Your point about external factors such as a home life that is not conducive to learning is very good, but I'm not sure how schools should address those kinds of inequalities, other than to give students lots of opportunities for extra help.

Intuitively, I think that there is too much variation to get meaningful statistics. The tests are typically given with a dual purpose: to assess the student performance and to assess the education system performance. As you point out, it is fairly reasonable to use the tests for student performance.

The larger problem is in assessing the education system. A single brilliant student raises the average and you look like a brilliant teacher. A few well prepared students from affluent homes make you look great. And with a large variation, it might take longer than we want to wait to actually measure the thing accurately. And if we determine a school is bad after 10 years ... there was an entire cohort damaged by that, and the school is unlikely to be the same, as there are always changes being implemented.

Currently there are a lot of problems with education in the US. Using data and measurements to inform us seems a good idea. I'm not sure it does anything other than move things around randomly.

I remember a story once about a hypothetical company that had everyone flip 3 coins, and ordered them to get 3 heads. Now a few succeeded and were promptly held up as the "star" flippers. The company then asked them to explain how they did it to the rest (I relax my arm ... so everyone: relax your arms). Then the next day they flip again. And maybe a few repeat and a few new ones are "stars". Meanwhile a few of the really bad ones (the guy who had 3 tails, TWICE) get fired.

It sounds like process control. It passes the ordinary management requirements for a data-driven process change, and quality metrics. But it is still just using garbage data. Relaxing the arm made no difference.

I'm not opposed to testing. But it should be sensible testing that actually is useful. If it helps assess a student, and determine what class they need to be in next year, that seems fine. If it truly does inform about system performance, that also is great. But the general sense of teachers and schools is that the test results are largely not representative of the performance of the educational system. They are the equivalent of being the lucky triple-head flipper, or the unlucky triple-tail flipper.

I am doubtful that test scores really will show much about how education should be done. Student success will likely not correlate with system success all that strongly. There will be some improvements that can help, but a truly statistically significant system evaluation really is fairly complex, and needs a lot of data.
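To make the coin-flip story concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration: the function names `flip_three_heads` and `simulate`, the fair coin, the number of employees, and the random seed. It only shows the point of the story: if the outcome is pure chance, day-one "stars" should do no better on day two than anyone else.

```python
import random


def flip_three_heads(rng):
    """Return True if three fair coin flips all come up heads."""
    return all(rng.random() < 0.5 for _ in range(3))


def simulate(num_employees, seed=0):
    """Simulate two days of flipping.

    Returns a tuple of three rates:
      - fraction of employees who are "stars" (3 heads) on day 1
      - fraction of day-1 stars who are stars again on day 2
      - fraction of day-1 non-stars who become stars on day 2
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    day1 = [flip_three_heads(rng) for _ in range(num_employees)]
    day2 = [flip_three_heads(rng) for _ in range(num_employees)]

    stars = sum(day1)
    others = num_employees - stars
    repeat_stars = sum(1 for a, b in zip(day1, day2) if a and b)
    new_stars = sum(1 for a, b in zip(day1, day2) if not a and b)

    star_rate = stars / num_employees
    repeat_rate = repeat_stars / stars if stars else 0.0
    new_star_rate = new_stars / others if others else 0.0
    return star_rate, repeat_rate, new_star_rate


if __name__ == "__main__":
    star_rate, repeat_rate, new_star_rate = simulate(100_000)
    print(f"day-1 stars:                {star_rate:.3f}")
    print(f"day-1 stars repeating:      {repeat_rate:.3f}")
    print(f"day-1 non-stars succeeding: {new_star_rate:.3f}")
```

All three rates should land near 1/8, the chance of three heads with a fair coin. The "stars" have no edge, so copying their technique (or firing the triple-tail flippers) changes nothing.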
 
  • #97
votingmachine said:
Intuitively, I think that there is too much variation to get meaningful statistics. The tests are typically given with a dual purpose: to assess the student performance and to assess the education system performance. As you point out, it is fairly reasonable to use the tests for student performance.

The larger problem is in assessing the education system. A single brilliant student raises the average and you look like a brilliant teacher. A few well prepared students from affluent homes make you look great. And with a large variation, it might take longer than we want to wait to actually measure the thing accurately. And if we determine a school is bad after 10 years ... there was an entire cohort damaged by that, and the school is unlikely to be the same, as there are always changes being implemented.

Currently there are a lot of problems with education in the US. Using data and measurements to inform us seems a good idea. I'm not sure it does anything other than move things around randomly.

I remember a story once about a hypothetical company that had everyone flip 3 coins, and ordered them to get 3 heads. Now a few succeeded and were promptly held up as the "star" flippers. The company then asked them to explain how they did it to the rest (I relax my arm ... so everyone: relax your arms). Then the next day they flip again. And maybe a few repeat and a few new ones are "stars". Meanwhile a few of the really bad ones (the guy who had 3 tails, TWICE) get fired.

It sounds like process control. It passes the ordinary management requirements for a data-driven process change, and quality metrics. But it is still just using garbage data. Relaxing the arm made no difference.

I'm not opposed to testing. But it should be sensible testing that actually is useful. If it helps assess a student, and determine what class they need to be in next year, that seems fine. If it truly does inform about system performance, that also is great. But the general sense of teachers and schools is that the test results are largely not representative of the performance of the educational system. They are the equivalent of being the lucky triple-head flipper, or the unlucky triple-tail flipper.

I am doubtful that test scores really will show much about how education should be done. Student success will likely not correlate with system success all that strongly. There will be some improvements that can help, but a truly statistically significant system evaluation really is fairly complex, and needs a lot of data.

That's an unreasonable comparison, the quality of the teachers does impact the test results of the students.
 
  • #98
votingmachine said:
But the general sense of teachers and schools is that the test results are largely not representative of the performance of the educational system
Most people have the opinion that whatever metric is currently used to measure their performance is not representative of their performance.
 
  • Like
Likes PWiz and HomogenousCow
  • #99
There is another thing to consider: whatever metric is chosen skews the results, because people try to optimize their score for their performance appraisal, thus invalidating the metric.

Dr. Deming often said that a metric shouldn't be tied to an individual's performance for that very reason.

Instead it should be used to discover those teachers who are naturally better at teaching so that you can learn from them and train other teachers to do the same.
 
  • #100
I agree that teacher quality matters and does impact the test results of students. What I said was INTUITIVELY, I think the data has too much randomness to allow easy statistical conclusions. The apocryphal story was clearly an exaggeration to show why we don't want to base process changes on bad data.

I also agree that people often think that whatever metric they are measured by misses some elusive qualities that make them special. But then again, some metrics DO miss the important thing. Adding testing is the frequent attempt to get a meaningful metric.

To draw conclusions from data, you need good data. And an understanding of the thing you are measuring. I might be wrong, and it may only take a dozen test scores and a single year to discover that a teacher needs more instruction on the craft of teaching. My perception is that it will take more scores and more time. But that is an intuitive perception, based on being a parent, and seeing student variation, seeing the occasional sick kid tested, and just having my own perception of populations and variations. There were comments here about how easy the tests were. Those comments also did not really endorse their teachers. But their test results would specifically support whatever the education system did.

I think tests can be a valuable part of measuring student performance and measuring system performance. But I think that bringing the tests into the system side needs to be done carefully. I thought that about the initial comment I was reading:

"I certainly agree that things like home life affect a student's performance on tests, but why does that make the test results not statistically significant?"

The answer is that anything that increases the variance makes it harder to draw statistical conclusions. If one teacher has a class with test scores of 50, with a sigma of 20, and another has a class with test scores of 60, with sigma of 23, then the variations from other environmental factors make comparison of the two teaching styles a bit difficult. One might be wildly better. Or it might be small datasets like:

50, 40, 30, 50, 60, 70, 80, 20
average=50, std=20

50, 40, 30, 50, 60, 70, 80, 100
average=60, std=23

I took out the worst student score and stuck in a bright score. Or maybe the 20 was a kid who was sick on testing day (and there are generally no excused absences).

I don't know if that is realistic. But I think that for results to be statistically significant, it will take some good data. And strictly INTUITIVELY, I think that is difficult to get quickly and easily.
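For anyone who wants to check those two small datasets, here is a minimal sketch in Python using only the standard library. The names `class_a`, `class_b`, `summarize`, and `welch_t` are made up for illustration, and using Welch's t statistic as the comparison is an assumption on my part, not the only reasonable choice. The standard deviation here is the sample standard deviation (dividing by n - 1), which is what reproduces the 20 and 23 quoted above.

```python
import math
import statistics

# The two hypothetical classes from the example above.
class_a = [50, 40, 30, 50, 60, 70, 80, 20]
class_b = [50, 40, 30, 50, 60, 70, 80, 100]


def summarize(scores):
    """Return (mean, sample standard deviation) for a list of scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def welch_t(x, y):
    """Welch's t statistic for the difference in means (y minus x).

    It divides the difference in means by its estimated standard error,
    without assuming the two groups have equal variance.
    """
    mean_x, sd_x = summarize(x)
    mean_y, sd_y = summarize(y)
    standard_error = math.sqrt(sd_x**2 / len(x) + sd_y**2 / len(y))
    return (mean_y - mean_x) / standard_error


if __name__ == "__main__":
    for name, scores in (("class A", class_a), ("class B", class_b)):
        mean, sd = summarize(scores)
        print(f"{name}: average={mean:.1f}, std={sd:.1f}")
    print(f"Welch t statistic: {welch_t(class_a, class_b):.2f}")
```

The t statistic comes out a little under 1. As a rough rule of thumb, a value of about 2 or more is what you would want to see before calling a difference significant with samples this small, so a 10-point gap between these two classes is well within what one swapped score can produce.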
 
  • Like
Likes Silicon Waffle