How to Automatically OCR PDF Files in a Given Folder?

  • Thread starter: NeoDevin
  • Tags: files, pdf

Discussion Overview

The discussion revolves around methods to automatically perform Optical Character Recognition (OCR) on PDF files located in a specified folder. Participants explore various software options and scripting methods applicable across different operating systems, including Windows and Linux.

Discussion Character

  • Exploratory, Technical explanation, Debate/contested, Homework-related

Main Points Raised

  • One participant requests recommendations for OCR methods for PDF files to make them searchable.
  • Another participant suggests that the operating system is relevant, mentioning Adobe Acrobat as a solution.
  • A participant indicates they are using both Windows and Linux and prefers a free option for OCR.
  • Several OCR options are mentioned, with a suggestion to use a Google search for Linux-specific solutions and to consider writing a script to process multiple PDFs.
  • A later reply proposes "tesseract" as a suitable OCR tool, noting it has a GUI and can also be run from the command line or scripted.

Areas of Agreement / Disagreement

Participants express varying preferences for operating systems and software solutions, with no consensus on a single method or tool for performing OCR on PDFs.

Contextual Notes

Some suggestions depend on specific operating systems, and the effectiveness of different OCR tools may vary based on user needs and preferences.

NeoDevin
Can anyone recommend a method to automatically OCR all the PDF files in a given folder?

My scanner saves files as PDF, but I would like them to be searchable.

Thanks in advance.
 
It would help to know what operating system you are using. Mac OS X or Linux?

Adobe Acrobat will do what you want.
 
I have computers running Windows and Linux; a method for either would be fine, preferably a free one.
 
There are several OCR options available to you. I did a Google search for 'linux ocr pdf' and this was the first hit on the list:
http://ubuntuforums.org/showthread.php?t=1456756

If the program doesn't have flags that let you process multiple PDFs at once, you can write a small script with a for loop that goes through the contents of a directory and OCRs all the PDF files.
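The for-loop idea above could look roughly like this in Python. This is only a sketch: it assumes the free `ocrmypdf` command-line tool (not mentioned in the thread) is installed and on the PATH, and it uses only its basic `ocrmypdf INPUT OUTPUT` form. The function names `find_pdfs` and `ocr_folder` and the `run` parameter are made up for this example.

```python
import subprocess
from pathlib import Path


def find_pdfs(folder):
    """Return the PDF files directly inside `folder`, sorted by name."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() == ".pdf")


def ocr_folder(folder, out_dir, run=subprocess.run):
    """OCR every PDF in `folder`, writing searchable copies to `out_dir`.

    Assumption: the `ocrmypdf` command is installed and on PATH.
    `run` is injectable so the loop can be tried without running OCR.
    Returns the list of output paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for pdf in find_pdfs(folder):
        target = out_dir / pdf.name
        # Basic invocation: ocrmypdf INPUT.pdf OUTPUT.pdf
        run(["ocrmypdf", str(pdf), str(target)], check=True)
        outputs.append(target)
    return outputs
```

Calling `ocr_folder("scans", "scans_ocr")` would then leave the originals untouched and put the searchable copies in a separate folder, which seems safer than overwriting the scanner's output in place.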
 
Sorry, I thought I had replied to this days ago. It seems the way to go is "tesseract": http://code.google.com/p/tesseract-ocr/
It has its own GUI, but there are also third-party GUIs, or you can run it from the command line or a script.
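For the scripted route, here is a hedged sketch of driving tesseract from Python. As far as I know tesseract reads images rather than PDFs, so this assumes Poppler's `pdftoppm` is also installed to rasterise the pages first, and a tesseract version recent enough to accept a text file listing the page images and to produce a searchable PDF via the trailing `pdf` config. The function name `pdf_to_searchable` and its parameters are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def pdf_to_searchable(pdf, out_base, dpi=300, run=subprocess.run):
    """Rasterise `pdf` and OCR its pages into `<out_base>.pdf`.

    Assumptions: `pdftoppm` (Poppler) and `tesseract` are on PATH.
    """
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        # pdftoppm writes page-1.png, page-2.png, ... (zero-padded as needed)
        run(["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)],
            check=True)
        pages = sorted(Path(tmp).glob("page-*.png"))
        # tesseract accepts a text file naming one input image per line
        listing = Path(tmp) / "pages.txt"
        listing.write_text("\n".join(str(p) for p in pages) + "\n")
        # The trailing "pdf" selects the config that emits a searchable PDF
        run(["tesseract", str(listing), str(out_base), "pdf"], check=True)
    return Path(str(out_base) + ".pdf")
```

Something like `pdf_to_searchable("scan.pdf", "scan_ocr")` would write `scan_ocr.pdf`. The 300 dpi default is just a common starting point for scanned text, not a value taken from the thread.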
 
