Can Mathematical Symbols Be Extracted from Scanned Journal Pages?

  • Thread starter qnach
In summary, OCR technology has improved greatly and can now accurately extract text from scanned images, including mathematical symbols. However, there are still some errors, especially in old papers, and tools like PDF editors and spell-checkers can help correct these mistakes. New technology is being developed to specifically extract mathematical symbols from handwriting and scanned images, but it may still require human input to accurately understand the context.
  • #1
qnach
TL;DR Summary
Many old journals have been scanned into PDF files. What I don't understand is how these pages can be markable and searchable.

They should only be images.
 
  • #2
Optical character recognition (OCR).
 
  • #3
In addition, PDF tools may allow for edits of the searchable text when OCR fails to recognize some scripts.
 
  • #4
pbuk said:
Optical character recognition (OCR).
It works quite well these days, but I read a lot of (obviously scanned) books on my Kindle and most of them contain one or two mistakes. Context usually digs me out of the problem, but the maths in some papers could produce undetectable errors (undetectable by my maths brain, at least).
 
  • #5
jedishrfu said:
In addition, PDF tools may allow for edits of the searchable text when OCR fails to recognize some scripts.
That sounds very intelligent. You mean an improved bolt-on for when the context reads as garbage? It's a matter of spotting an error in the first place.
 
  • #6
PDF editors like Adobe Acrobat (https://www.adobe.com/acrobat/) and ABBYY (https://pdf.abbyy.com/) can show the text layer. I'm sure there are tools for spell-checking, and maybe even grammar-checking, that layer.

There are also tools that can extract the whole text layer into a text file (e.g. https://en.wikipedia.org/wiki/Pdftotext ).
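As a rough illustration of spell-checking an extracted text layer, here is a minimal Python sketch. It assumes the `pdftotext` command-line tool is installed and on the PATH. The function names, the tiny vocabulary and the sample sentence are made up for the example, and a real check would need a proper word list.

```python
import difflib
import re
import subprocess


def extract_text_layer(pdf_path: str) -> str:
    """Return the text layer of a PDF by calling the pdftotext CLI.

    Assumes pdftotext is installed; "-" asks it to write to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def flag_unknown_words(text, vocabulary):
    """List (word, suggestions) for every word missing from the vocabulary."""
    known = sorted(vocabulary)  # stable order for the suggestion search
    flagged = []
    for word in re.findall(r"[A-Za-z]+", text):
        lower = word.lower()
        if lower not in vocabulary:
            # Suggest up to three vocabulary entries that look similar.
            suggestions = difflib.get_close_matches(lower, known, n=3, cutoff=0.6)
            flagged.append((word, suggestions))
    return flagged


if __name__ == "__main__":
    # Hypothetical sample: "rn" misread in place of "m" is a classic OCR slip.
    vocabulary = {"the", "model", "is", "linear"}
    sample = "The rnodel is linear"
    for word, suggestions in flag_unknown_words(sample, vocabulary):
        print(word, "->", suggestions)
```

A check like this only catches non-words. A misread symbol that still produces plausible text, which is the worry with equations, would pass unnoticed.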

A new technology is trying to extract mathematical symbols from handwriting and from scanned images
(e.g.
https://mathpix.com/
https://photomath.com/en/
https://webdemo.myscript.com/
https://socratic.org/ (from Google)
http://write-math.com/
https://www.i2ocr.com/free-online-math-equation-ocr
https://www.cs.rit.edu/~dprl/software.html
https://www.xthink.com/mathjournal.html (once promising)
https://www.inftyproject.org/en/software.html InftyReader
)

Maybe AI can help learn the appropriate context to improve recognition
https://www.searchonmath.com/
https://approach0.xyz/search/
https://mathdeck.org/

I haven't tried all of these sites.
 
  • #7
The weak link is in the OCR though. OCR utilities can spot text and non-text but some old papers give OCR a hard time. I should imagine any system might need to ask a human for advice. This could involve a lot of (specialist) man-hours for millions of documents.
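One way to keep the human workload down would be to send only the doubtful words to a reviewer. The sketch below assumes the OCR engine reports a confidence score for each word (on a 0 to 100 scale here). The `OcrWord` type, the threshold of 80 and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str
    confidence: float  # assumed scale: 0 (no idea) to 100 (certain)


def triage(words, threshold=80.0):
    """Split words into (accepted, needs_review) by confidence."""
    accepted = [w for w in words if w.confidence >= threshold]
    needs_review = [w for w in words if w.confidence < threshold]
    return accepted, needs_review


if __name__ == "__main__":
    # Made-up recognition results for one line of a scanned paper.
    page = [
        OcrWord("The", 96.0),
        OcrWord("integral", 91.5),
        OcrWord("∫", 42.0),
        OcrWord("converges", 88.0),
    ]
    accepted, needs_review = triage(page)
    print("accepted:", [w.text for w in accepted])
    print("ask a human:", [w.text for w in needs_review])
```

The threshold sets the trade-off: raise it and more words go to the specialist, lower it and more errors slip through unreviewed.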
 
  • #8
robphy said:
A new technology is trying to extract mathematical symbols from handwriting and from scanned images
That could be very demanding. It could involve parsing an equation and working out its meaning from the context: the sort of thing that only an expert in the field of the paper could do. But never say never about computing.
 

1. How are journal pages scanned?

Journal pages are typically scanned using a flatbed scanner or a document scanner. The pages are placed face-down on the scanner and the scanner uses a light source and sensors to create a digital image of the page.

2. What file format is used for scanned journal pages?

The most common file format for scanned journal pages is PDF (Portable Document Format). This format allows for high-quality images and text to be preserved and easily shared.

3. How can I improve the quality of scanned journal pages?

To improve the quality of scanned journal pages, use a high resolution (at least 300 dpi) and make sure the pages lie flat, free of wrinkles and creases. Adjusting the scanner's brightness and contrast settings can also help.
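To get a feel for what 300 dpi means in practice, the pixel dimensions of a scan follow directly from the page size. A short sketch, assuming an A4 page (210 × 297 mm) and an uncompressed 8-bit grayscale image; the function name is made up for the example.

```python
MM_PER_INCH = 25.4


def scan_size_pixels(width_mm, height_mm, dpi):
    """Return (width, height) in pixels for a page scanned at the given dpi."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px


if __name__ == "__main__":
    width_px, height_px = scan_size_pixels(210, 297, 300)  # A4 at 300 dpi
    print(width_px, height_px)  # 2480 3508
    # One byte per pixel for 8-bit grayscale, before any compression.
    print(width_px * height_px, "bytes uncompressed")  # 8699840 bytes
```

Doubling the resolution to 600 dpi quadruples the pixel count, so file size grows quickly with scan quality.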

4. Can handwritten notes or annotations on journal pages be scanned?

Yes, handwritten notes or annotations on journal pages can be scanned. However, the quality of the scanned notes may vary depending on the legibility of the handwriting and the quality of the scanner.

5. Are there any copyright issues with scanning journal pages?

It is important to consider copyright laws when scanning journal pages. If the journal pages are still under copyright, permission may be required from the publisher or author before scanning and sharing the pages. It is always best to consult with legal counsel before scanning copyrighted material.
