ArXiv crackpot filter developed by accident

mfb · May 19, 2016

A very interesting blog post (from @hossi).

A program that helps sort arXiv submissions into categories frequently struggles with crackpot submissions, because they do not fit in anywhere. The program was never designed for it, but it helps find them.

Drakkith · May 19, 2016

Haha! Nice!

DennisN · May 19, 2016

mfb said:

A very interesting blog post (from @hossi).

A program that helps sort arXiv submissions into categories frequently struggles with crackpot submissions, because they do not fit in anywhere. The program was never designed for it, but it helps find them.

Fun! And interesting too...

fresh_42 · May 19, 2016

I almost automatically have to think of the documentary about Darwin's life I saw today. Or of the Japanese mathematician whose proof nobody else in the world can understand. (Sorry, I've forgotten the name and the conjecture.) And Einstein was lucky that a solar eclipse came around. And his "biggest stupidity" now has value.

Psinter · May 19, 2016

mfb said:

A very interesting blog post (from @hossi).

A program that helps sort arXiv submissions into categories frequently struggles with crackpot submissions, because they do not fit in anywhere. The program was never designed for it, but it helps find them.

Nice indeed. Over time I have come to loathe crackpottery because it does so much damage to those who are still learning. They come to believe things that haven't been proven, that on occasion can be proved wrong, and that sometimes are extremely biased toward a subject with more crackpottery.

However, this is another reason to stick to my own language when writing a scientific paper. Otherwise I think I would fall into that spat-out category, even though I'm not a crackpot, just because English is not my first language: if I were to write a paper in English, I would 100% certainly use words not used by native English speakers. Not because I'm not trained in science, but because I'm not natively trained in English.

It is interesting to note that when I once took an IQ test at a doctor's office, the test was in my language and I scored above average. Yet in the crappy online IQ tests, which are given in English, I score average and on occasion even below average, because sometimes I don't even understand the instructions correctly (what they are asking me to do). So in my language I'm above average; in English, I'm average at best and sometimes below. Another reason for me to believe that online IQ tests are biased in favor of some groups at the expense of others.

So sorry beforehand, my PF fellows, if I sometimes sound below average in my posts.

http://backreaction.blogspot.de/2016/05/the-holy-grail-of-crackpot-filtering.html said:

It doesn’t surprise me much – you can see this happening in comment sections all over the place: The “insiders” can immediately tell who is an “outsider.” Often it doesn’t take more than a sentence or two, an odd expression, a term used in the wrong context, a phrase that nobody in the field would ever use. It is only consequential that with smart software you can tell insiders from outsiders even more efficiently than humans.

It is therefore also consequential that a non-native English speaker runs a higher chance of being classified as an outsider by an English-speaking community. :sorry:

Not because the person doesn't know science, but because the person doesn't communicate the same way, even if they apply the same scientific method.

mfb · May 20, 2016

Most scientists learn English as a foreign language. Scientific English is not so hard to learn I think. You don't want to use complicated grammar or unusual words* anyway.

*unusual as in "not used in everyday language AND not a scientific expression".

fresh_42 · May 20, 2016

mfb said:

Most scientists learn English as a foreign language. Scientific English is not so hard to learn I think. You don't want to use complicated grammar or unusual words* anyway.

*unusual as in "not used in everyday language AND not a scientific expression".

I've once been told by a scientist: (with a staccato accent) "Scientific English is broken English."

MathematicalPhysicist · May 26, 2016

mfb said:

Most scientists learn English as a foreign language. Scientific English is not so hard to learn I think. You don't want to use complicated grammar or unusual words* anyway.

*unusual as in "not used in everyday language AND not a scientific expression".

This week, while reading Principles of Algebraic Geometry by Griffiths and Harris, I came across the word "abut", which I had never encountered before in my life.

So I guess that scientific English also uses quite a lot of unusual words; it depends on when the book was published. The further back in the past, the more the language will differ from present-day English.

ShayanJ · May 26, 2016

Psinter said:

It is therefore also consequential that a non-native English speaker runs a higher chance of being classified as an outsider by an English-speaking community.

Not because the person doesn't know science, but because the person doesn't communicate the same way, even if they apply the same scientific method.

As someone who speaks English as a second language, I've never encountered such a thing! I've been here since high school, and my English surely wasn't as good then as it is now. I can even say that I got better at English partly because of my involvement with Physics Forums, and during all these years I don't remember being labeled a crackpot because of not being good enough at English. So I'm sure you don't need to worry about this.
When you regularly read scientific writing, either forum posts or papers, you can easily recognize a crackpot. I myself have experienced it several times. After reading even a sentence or two of the post, I said to myself that this person is surely a crackpot, and it had nothing to do with the way s/he used English. In fact, as far as I can remember, all of the crackpots I recognized were native speakers.

Planobilly · May 26, 2016

mfb said:

Most scientists learn English as a foreign language. Scientific English is not so hard to learn I think. You don't want to use complicated grammar or unusual words* anyway.

*unusual as in "not used in everyday language AND not a scientific expression".

The fact that other people may not have a good command of a certain language should not preclude the use of less common words. This is especially true if the word in question has a precisely defined meaning. It should be self-evident that one should give consideration to the intended recipient of a communication and select words accordingly.
I do not think it is appropriate that I should limit your vocabulary due to my lack of understanding of the language being used.

Cheers,

Billy

mfb · May 26, 2016

That was not my point. Taken randomly from a random theoretical particle physics preprint on arXiv:

Using concrete examples, we demonstrate the most sensitive channels and relevant bounds, as well as the required integrated luminosity to rule out particular models explaining the diphoton excess. For concreteness, we assume throughout the paper that the resonance is a scalar singlet under the SM gauge group. Hence, its interactions with SM particles are captured at leading order by a set of dimension-5 operators suppressed by a new physics scale ##\Lambda## [11]. We further assume that the new resonance does not mix with the SM Higgs boson, as existing and projected limits from Higgs coupling measurements set strong indirect constraints [12].

You don't want to write literature - the physics is complicated enough, adding complicated grammar doesn't help anyone. And you also want to use words with a clear meaning to everyone, which usually means you use the words everyone else uses. I'm sure you can find different ways to say "leading order", for example, but why should you? There is certainly a synonym for "resonance", but everyone calls it resonance because everyone knows what everyone else means by that word.

Taken from a random crackpot webpage:

Energy and solid material are night and day. One is never like the other. Yet here, they say you get what is widely believed to be physical matter that does not behave the same in all frames of reference. They say we're all seeing solid material behaving like energy -- This mystery cannot be. It slaps us all across the face, demanding satisfaction, demanding a sensible explanation to prevent our imaginations from getting in the way of what is really happening. Here tiny bits of physical matter appear to imaginative theories to not act the same in all frames of reference. They say solid material creates a pattern of waves and no fact explains why, only theory; only guesses, through which our imagination's desperate appeals for attention and affirmation thrive in the face of no sustained argument to the contrary -- science fails where over imagination takes root.

See the difference?
That is not literature either, but the choice of words is completely different.
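
The contrast in word choice can be made concrete with a toy sketch. This is not the arXiv's actual classifier, whose workings the thread does not describe; the category names, sample vocabularies, stopword list, and the 0.2 threshold below are all assumptions chosen for illustration. The only idea shown is that a text whose vocabulary overlaps with no category ends up fitting nowhere.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical stopword list: common words that say nothing about the field.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "under", "we"}

# Hypothetical per-category vocabularies (not real arXiv training data).
SAMPLES = {
    "hep-ph": "resonance scalar singlet gauge group diphoton excess higgs boson coupling luminosity",
    "astro-ph": "galaxy redshift supernova star cluster telescope",
}

INSIDER = "We assume the resonance is a scalar singlet under the gauge group"
OUTSIDER = "Energy and solid material are night and day"


def bag_of_words(text):
    """Count the lower-cased alphabetic words of a text, skipping stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(text, samples, threshold=0.2):
    """Return (best_category, score); the category is None if nothing fits."""
    words = bag_of_words(text)
    scores = {name: cosine(words, bag_of_words(s)) for name, s in samples.items()}
    best = max(scores, key=scores.get)
    score = scores[best]
    return (best if score >= threshold else None), score


if __name__ == "__main__":
    print(classify(INSIDER, SAMPLES))
    print(classify(OUTSIDER, SAMPLES))
```

With these made-up vocabularies, the first sentence (adapted from the preprint excerpt) is assigned to "hep-ph", while the second (from the crackpot excerpt) shares no words with any category and comes back as None. A real system would need far more data, and a low score alone says nothing about whether the science is sound.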

Planobilly · May 26, 2016

Hi mfb,
I don't think we are in any disagreement. Most anyone not directly involved in physics would likely not understand the word "diphoton" used in the first text for example. I myself have only a very limited understanding of that word and I assume it means a resonance particle.

My comment was less about syntax and more about the use of words that clearly define something in standard language. For example, the word "bruise" is a word most anyone would understand and communicates well enough in some cases. It is not as descriptive as the term "subcutaneous hematoma" a word that many people would consider perhaps unusual.

I 100% agree with your example of the use of the word resonance; why would one call that term anything else? I don't see any synonym that I could use to replace that word. I also agree that the use of "literary" devices has little or no place in the language of science. While the phrase "the force that through the green fuse drives the flower" may produce a smile, it would get in the way of someone understanding biology...lol

I had little issue understanding both of the texts you posted, but I don't think either one was very well written. Both communicated. The first one was based on the laws of physics as we currently understand them. The second was pure speculation and opinion with no basis in fact, written in a manner to elicit an emotional response.

Real "crackpots" are easy to spot. They have a personal agenda inconsistent with generally accepted facts and are highly emotionally involved in their beliefs. On the other hand, it is sometimes very difficult to spot people who are uninformed and are stating something as fact. I hear so-called subject matter experts on the TV news, in print, and on the internet who have very little idea what they are talking about. This includes very well educated people who should know better. These folks are more problematic than the crackpots.

More than a crackpot algorithm, we need a fact-checking algorithm. Until that happens, those people who actually know and understand things will be burdened with the responsibility of exposing and correcting the nonsense that gets written by crackpots and the uninformed alike.

Cheers,

Billy

TeethWhitener · May 26, 2016

mfb said:

That was not my point. Taken randomly from a random theoretical particle physics preprint on arXiv:

You don't want to write literature - the physics is complicated enough, adding complicated grammar doesn't help anyone. And you also want to use words with a clear meaning to everyone, which usually means you use the words everyone else uses. I'm sure you can find different ways to say "leading order", for example, but why should you? There is certainly a synonym for "resonance", but everyone calls it resonance because everyone knows what everyone else means by that word.

Taken from a random crackpot webpage:

See the difference?
That is not literature either, but the choice of words is completely different.

This is one of the best examples I've ever seen.

1oldman2 · May 26, 2016

http://www.nasa.gov/feature/nasa-response-to-recent-paper-on-neowise-asteroid-size-results
Not crackpot but...

e.bar.goum · May 26, 2016

mfb said:

Most scientists learn English as a foreign language. Scientific English is not so hard to learn I think. You don't want to use complicated grammar or unusual words* anyway.

*unusual as in "not used in everyday language AND not a scientific expression".

Indeed. As a native English speaker, I've had to make a definite effort to simplify my choice of words and sentence structure for Scientific English.

BiGyElLoWhAt · May 27, 2016

You know what? I actually really like the conclusion. That's been something that I've been struggling with for a bit. It seems like a lot of stuff is really quick to get tossed out if it doesn't have support by Kaku, or Hawking, or ...

Drakkith · May 27, 2016

BiGyElLoWhAt said:

It seems like a lot of stuff is really quick to get tossed out if it doesn't have support by Kaku, or Hawking, or ...

What?

sophiecentaur · May 28, 2016

Can there really be any surprise about all this? If we used a machine-like language to live our lives, then we could only communicate simple machine-like ideas. Language is greater than this.
Science is not easy, even less at the cutting edge and we must expect confusion whenever everyday language is used as a description.
Maths does a good job, here.
I detect notes of complaint about the fact that non-native English speakers find difficulty and demands for non-mathematical explanations.
We are stuck with the way language is so rich and the way it is used. No complaints.

BiGyElLoWhAt · May 28, 2016

Drakkith said:

What?

I don't know, maybe it's just because I'm an undergrad and I'm basing this on personal experience (people within the department). Also, there seem to be some (in my opinion) interesting and neat theories that have fizzled out. Maybe there's a reason for that, as I'm not necessarily versed in any of these in any sort of depth.
What I was referencing was the last couple paragraphs:

Blog said:

Conventional science isn’t bad science. But we also need unconventional science, and we should be careful to not assign the label “crackpottery” too quickly. If science is what scientists do, scientists should pay some attention to the science of what they do.

Maybe it's, again, just not accessible to me as an undergrad, but I don't tend to see many "out there" theories. I even did a google search yesterday trying to find some, and the most "out there" thing that I came across was Unparticle theory.

He also referenced a stat that the rate of production of these "out there" theories has declined by about a percent, from 3 point something to 2 point something, and the given reason for this was conformism. Not sure if that was the researchers' conclusion or the blogger's, though.

They found that having previously unlikely combinations in the quoted literature is positively correlated with the later impact of a paper. They also note that the fraction of papers with such ‘unconventional’ combinations has decreased from 3.54% in the 1980s to 2.67% in the 1990s, “indicating a persistent and prominent tendency for high conventionality.”

mfb · May 28, 2016

I guess 2.67% of 1990s' publication rate is still more than 3.54% of 1980s' publication rate. Science is getting more complex, so you need more and more papers to cover the same fields and also new fields. You can't have something like special relativity in every new publication.
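
As a rough check of that guess, write ##N_{80}## and ##N_{90}## for the total number of papers published in the two decades (symbols introduced here for illustration; the thread gives no actual counts). The absolute number of papers with unconventional combinations grew if

$$0.0267\,N_{90} > 0.0354\,N_{80} \quad\Longleftrightarrow\quad \frac{N_{90}}{N_{80}} > \frac{3.54}{2.67} \approx 1.33,$$

so the statement holds as long as total output grew by more than about a third from the 1980s to the 1990s.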

BiGyElLoWhAt said:

but I don't tend to see many "out there" theories

They are "out there" for a good reason, and they are rare for the same good reason.

BiGyElLoWhAt · May 28, 2016

That very well may be. I suppose I would just like to see more of it. Of course, with rigor and all of the necessary things, but crazy ideas like the invariance of the speed of light.

Nugatory · May 28, 2016

BiGyElLoWhAt said:

It seems like a lot of stuff is really quick to get tossed out if it doesn't have support by Kaku, or Hawking, or ...

For example?

BiGyElLoWhAt · May 28, 2016

Nugatory said:

For example?

I know generalizing is always bad, but I was pretty much generalizing to anything that isn't generally accepted. As an example, let's just drop any type of preon theory. I found those via the internet, and they seem to make sense (to me). However, there are many things like this that seem to have been "swept under the rug", so to speak. Again, this is based on my personal experience in my physics department, and the general attitude of other students as well as the profs, as well as what gets talked about in class and out.

I suppose I should mention that I'm sure there were at least some problems with it, either not coinciding with what we have (I know I myself couldn't resolve Rishon theory with the generation of mesons conceptually), or other reasons. I just feel like, if nothing else, these theories have value in a timeline sort of way, a history lesson so to speak. Not to derail this further, but I read a very recent psych article where they showed one group of physics students a bunch of failure stories from Einstein, Galileo, and Curie. Those students' marks went up relative to both the control group and the students who were read success stories. This is the article. One important detail is that the primary attribute was motivation, but improved scores are improved scores, IMO.

Dr. Courtney · May 28, 2016

Well, that certainly explains why some of my ballistics papers are often set aside for a week or so for moderation. I think the computer may be spitting them out because there are so few submissions in ballistics on the arXiv. Once a human looks at it, it makes sense and they quickly discover that I am one of the most widely published physicists in ballistics in the past decade, but there are simply too few submissions in ballistics for the computer to make sense of.

Something different is going on in Physics Education (physics.ed-ph). We actually had a paper that had already been accepted and appeared in an educational journal (Eur J Phys) reclassified as popular physics (physics.pop-ph). Still, we got a lot of great press on that one, with a mention in the MIT Tech Review blog and Physics World. See:

https://arxiv.org/ftp/arxiv/papers/1305/1305.0966.pdf

Dale · May 29, 2016

BiGyElLoWhAt said:

He also referenced a stat that the rate of production of these "out there" theories has declined by about a percent, from 3 point something to 2 point something, and the given reason for this was conformism.

You appear to be substantially misunderstanding what the author is talking about in the referenced statistic. He is not talking about crackpots, he is talking about scientific authors who use unusual combinations of references. In other words, authors who establish new connections between existing disciplines. Crackpots generally don't even know the literature of a single discipline, let alone multiple.

ShayanJ · May 29, 2016

BiGyElLoWhAt said:

He

Dale said:

He

The author is a female!

Dr. Courtney · May 29, 2016

To me, the most interesting part of the linked blog post is the following two paragraphs. Since they are worthy of much more discussion, I'll quote them here and add some comments:

Indeed, if you look at the scientific enterprise today, almost all of its institutionalized procedures are methods not for testing hypotheses, but for filtering hypotheses: Degrees, peer reviews, scientific guidelines, reproduction studies, measures for statistical significance, and community quality standards. Even the use of personal recommendations works to that end. In theoretical physics in particular the prevailing quality standard is that theories need to be formulated in mathematical terms. All these are requirements which have evolved over the last two centuries – and they have proved to work very well. It’s only smart to use them.

But the business of hypotheses filtering is a tricky one and it doesn’t proceed by written rules. It is a method that has developed through social demarcation, and as such it has its pitfalls. Humans are prone to social biases and every once in a while an idea gets dismissed not because it’s bad, but because it lacks community support. And there is no telling how often this happens because these are the stories we never get to hear.

To me, it is a shame that hypothesis filtering so often comes down to variations of "expert opinion" rather than hard data. I try hard to use the following rules and process:

1. What testable predictions does the hypothesis make?
2. What testable predictions may already have support in reliable published data?
3. What testable predictions are already disproven or have doubt cast upon them in reliable published data?
4. Does the available data make the room for the hypothesis to be true so narrow that its truth is unlikely?
5. Is the hypothesis more elegant, simpler, or more general than existing explanations of the data?
6. How can new experiments be performed that provide more direct tests of the hypothesis and its predictions than existing data?
7. Is the cost and effort of these new experiments justified by the possibility of learning something new and interesting by their execution?

Some of my best accomplishments as a scientist have come from recognizing when ideas were rejected (or accepted) based on expert opinion alone. In those cases, considerable effort on my part to find hard data disproving (or supporting) the ideas failed to turn any up, and the case instead devolved into something of a circular argument, with the experts citing each other in support of their views rather than citing data.

Carpe datum.

Sure, there is a lot of crackpottery out there. But there is also a substantial amount of argument from authority attempting to dismiss new ideas as crackpottery in order to avoid serious consideration based on the seven questions above.

mfb · May 29, 2016

Dr. Courtney said:

I try hard to use the following rules and process:

Those are the components of the "expert opinion" you don't like. The evaluation of those bullet points is not always completely obvious, therefore different experts can disagree on their evaluation.

Dr. Courtney · May 29, 2016

mfb said:

Those are the components of the "expert opinion" you don't like. The evaluation of those bullet points is not always completely obvious, therefore different experts can disagree on their evaluation.

Sometimes they are, sometimes they are not.

They are only believably the components of the "expert opinion" when the experts clearly explain their reasoning and cite the original sources of published data upon which they are basing their opinions.

I've gotten in the habit of looking hard at expert opinions to see if there is hard data backing them up or not. In many cases, it has turned out that the "expert opinions" were only citing other experts and/or competing hypotheses rather than directly applicable hard data. In other cases, it has turned out that the data they cited was ambiguous and did not really represent as compelling disproof of the hypothesis as suggested by the experts.

In still other cases, the experts are confounding absence of supporting evidence with evidence of absence. The claim, "If the hypotheses were true, surely someone would have noticed by now" usually needs careful attention. Was the predicted or expected effect being looked for? Could the experiments reasonably be expected to have seen it given their sensitivity and accuracy? Is there really hard data to falsify the predictions of the hypothesis? Null results need careful scrutiny.

It's not that I don't like expert opinion, it's that I work hard to recognize when it is based in data and when the experts are pulling a shell game. If the experts fail to meet their burden of citing the data disproving the hypothesis, reducing its likelihood, or reducing its range of applicability, then the rest of the community should keep an open mind to the hypothesis until they do or until other relevant evidence (hard data) is presented. Citing the expert rather than the data is intellectually lazy and prone to propagate the human errors of which all experts are capable.

arivero · May 29, 2016

mfb said:

The program was never designed for it, but it helps find them.

This particular program was not designed for it, but of course the arXiv has always woven crackpot-catching nets.
