Can Mathematical Symbols Be Extracted from Scanned Journal Pages?

qnach · Oct 19, 2021

Many old journals have been scanned into PDF files. What I don't understand is how these pages can become markable and searchable.
Shouldn't they just be images?

pbuk · Oct 19, 2021

Optical character recognition (OCR).
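To make the idea concrete: at its core, OCR turns each glyph image into a character, and the result is stored as a hidden text layer behind the scanned image. That layer is what makes the page searchable. The toy sketch below classifies tiny 3x3 bitmaps by counting mismatched pixels against stored templates. The templates are hypothetical, and real engines use far more robust features and trained models.

```python
# Toy illustration of the core OCR step: classify a glyph bitmap by
# comparing it with stored templates. The 3x3 templates are hypothetical.
TEMPLATES = {
    "I": ("010",
          "010",
          "010"),
    "L": ("100",
          "100",
          "111"),
    "T": ("111",
          "010",
          "010"),
}


def mismatches(a, b):
    """Count pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize(glyph):
    """Return the template character with the fewest mismatched pixels."""
    return min(TEMPLATES, key=lambda ch: mismatches(TEMPLATES[ch], glyph))


# A noisy "T" (one pixel flipped in the last row) is still read as "T".
noisy_t = ("111",
           "010",
           "011")
print(recognize(noisy_t))
```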

jedishrfu · Oct 19, 2021

In addition, PDF tools may allow for edits of the searchable text when OCR fails to recognize some scripts.

sophiecentaur · Oct 20, 2021

pbuk said:

Optical character recognition (OCR).

It works quite well these days, but I read a lot of (obviously scanned) books on my Kindle and most of them contain one or two mistakes. Context usually digs me out of the problem, but the Maths in some papers could produce undetectable errors (undetectable by my Maths brain, at least).

sophiecentaur · Oct 20, 2021

jedishrfu said:

In addition, PDF tools may allow for edits of the searchable text when OCR fails to recognize some scripts.

That sounds very intelligent. Do you mean an improved bolt-on for when the context reads as garbage? The hard part is spotting an error in the first place.

robphy · Oct 20, 2021

PDF editors like Adobe Acrobat (https://www.adobe.com/acrobat/) and ABBYY (https://pdf.abbyy.com/) can show the text layer. I'm sure there are tools for spell-checking and maybe even grammar-checking that layer.
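A minimal sketch of what checking that text layer could look like, assuming the layer has already been extracted as a plain string. It flags tokens that mix letters and digits (a common OCR confusion, such as "1" for "l") and words missing from a word list. The tiny VOCABULARY set is a hypothetical stand-in for a real dictionary.

```python
import re

# Hypothetical word list; a real checker would load a full dictionary.
VOCABULARY = {"the", "field", "equation", "is", "given", "by", "energy"}


def suspicious_tokens(text, vocabulary=VOCABULARY):
    """Return tokens that look like OCR errors.

    A token is flagged if it mixes letters and digits (e.g. "fie1d")
    or is an all-letter word missing from the vocabulary.
    """
    flagged = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        has_alpha = any(c.isalpha() for c in token)
        has_digit = any(c.isdigit() for c in token)
        if has_alpha and has_digit:
            flagged.append(token)
        elif token.isalpha() and token.lower() not in vocabulary:
            flagged.append(token)
    return flagged


print(suspicious_tokens("The fie1d equation is glven by E = mc2"))
```

The output also flags "E" and "mc2", which are perfectly good maths. A plain word-level checker cannot tell legitimate notation from OCR noise, which is why errors inside equations are so hard to spot.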

There are also tools that can extract the whole text layer into a text file (e.g. https://en.wikipedia.org/wiki/Pdftotext ).
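For example, pdftotext can be driven from a script. This sketch assumes pdftotext (from Poppler or Xpdf) is installed and on the PATH. The -layout, -f and -l options and the "-" output name (write to stdout) are standard pdftotext options; the function names are made up for illustration. Note that pdftotext only copies an existing text layer, so on a scanned page that has never been through OCR it returns little or nothing.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, first_page=None, last_page=None, layout=True):
    """Build a pdftotext command that writes the text layer to stdout ("-")."""
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")  # keep the physical page layout
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]  # last page to convert
    cmd += [pdf_path, "-"]
    return cmd


def extract_text_layer(pdf_path, **options):
    """Run pdftotext (must be installed) and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext is not installed")
    result = subprocess.run(pdftotext_command(pdf_path, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Usage, with a hypothetical file name: extract_text_layer("scan.pdf", first_page=1, last_page=2)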

A new technology is trying to extract mathematical symbols from handwriting and from scanned images
(e.g.
https://mathpix.com/
https://photomath.com/en/
https://webdemo.myscript.com/
https://socratic.org/ (from Google)
http://write-math.com/
https://www.i2ocr.com/free-online-math-equation-ocr
https://www.cs.rit.edu/~dprl/software.html
https://www.xthink.com/mathjournal.html (once promising)
https://www.inftyproject.org/en/software.html InftyReader
)

Maybe AI can help learn the appropriate context to improve recognition.
https://www.searchonmath.com/
https://approach0.xyz/search/
https://mathdeck.org/

I haven't tried all of these sites.
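One way context can help, sketched under simplifying assumptions: when the OCR engine offers several candidate readings for a word, a language model can pick the combination that reads most naturally. Here the "model" is a hypothetical table of bigram counts with add-one smoothing, and every combination of candidates is scored by brute force.

```python
from itertools import product

# Hypothetical bigram counts gathered from correctly transcribed text.
BIGRAMS = {("the", "modern"): 3, ("modern", "physics"): 5}


def score(words, bigrams=BIGRAMS):
    """Score a reading by multiplying add-one-smoothed bigram counts."""
    total = 1
    for prev, cur in zip(words, words[1:]):
        total *= bigrams.get((prev, cur), 0) + 1
    return total


def best_reading(candidates, bigrams=BIGRAMS):
    """Pick the combination of per-word OCR candidates that context favours."""
    return max(product(*candidates), key=lambda words: score(words, bigrams))


# Suppose the OCR engine could not decide between "rnodern" and "modern".
print(best_reading([["the"], ["rnodern", "modern"], ["physics"]]))
```

Brute force grows exponentially with the number of ambiguous words; practical systems use dynamic programming (e.g. Viterbi decoding) and much larger language models.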

sophiecentaur · Oct 20, 2021

The weak link is the OCR, though. OCR utilities can distinguish text from non-text, but some old papers give OCR a hard time. I imagine any system would sometimes need to ask a human for advice, which could mean a lot of (specialist) man-hours across millions of documents.
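A minimal sketch of the human-in-the-loop idea, assuming the OCR engine reports a per-word confidence between 0 and 100 (Tesseract, for instance, can output one per word). Words below a threshold, an arbitrary 80 here, are queued for a specialist instead of being accepted silently.

```python
# Assumed threshold; in practice it would be tuned per collection.
REVIEW_THRESHOLD = 80


def route_for_review(words, threshold=REVIEW_THRESHOLD):
    """Split (text, confidence) pairs into accepted words and words a human should check."""
    accepted, needs_review = [], []
    for text, confidence in words:
        if confidence >= threshold:
            accepted.append(text)
        else:
            needs_review.append(text)
    return accepted, needs_review


# Hypothetical OCR output: the Greek alpha was recognized with low confidence.
ocr_output = [("Let", 96), ("\u03b1", 41), ("be", 93), ("small", 88)]
accepted, needs_review = route_for_review(ocr_output)
print(needs_review)
```

Raising the threshold catches more errors but sends more words to people, which is exactly the man-hours trade-off.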

sophiecentaur · Oct 21, 2021

robphy said:

A new technology is trying to extract mathematical symbols from handwriting and from scanned images

That could be very demanding. It could involve parsing an equation and working out its meaning from the context, the sort of thing only an expert in the paper's field could do. But never say never, about computing.
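A small part of that parsing problem is recovering the 2D layout of an equation. The sketch below assumes a recognizer has already produced symbols with bounding boxes (image coordinates, y growing downward) and applies one crude, hypothetical rule: a symbol whose bottom edge lies above the vertical midpoint of the preceding base symbol is a superscript. Real systems also handle subscripts, fractions and radicals, and use context to resolve ambiguity.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    """A recognized symbol with its bounding box (y grows downward, as in images)."""
    char: str
    x: float
    top: float
    bottom: float


def to_latex(symbols):
    """Emit LaTeX, grouping raised symbols after a base symbol into one superscript."""
    out = []
    base = None
    sup = []  # characters of the superscript currently being collected
    for s in sorted(symbols, key=lambda sym: sym.x):
        if base is not None and s.bottom <= (base.top + base.bottom) / 2:
            sup.append(s.char)  # raised above the base symbol's midpoint
            continue
        if sup:
            out.append("^{" + "".join(sup) + "}")
            sup = []
        out.append(s.char)
        base = s
    if sup:
        out.append("^{" + "".join(sup) + "}")
    return "".join(out)


# "x" on the baseline, a small raised "2", then "+ y".
line = [Symbol("x", 0, 10, 20), Symbol("2", 8, 4, 12),
        Symbol("+", 14, 12, 18), Symbol("y", 20, 10, 22)]
print(to_latex(line))
```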
