Automatically OCR PDF files


by NeoDevin
Tags: ocr, pdf
NeoDevin
#1
May6-13, 11:42 AM
P: 685
Can anyone recommend a way to automatically OCR all the PDF files in a given folder?

My scanner saves files as PDF, but I would like them to be searchable.

Thanks in advance.
ChrisJA
#2
May9-13, 01:00 AM
P: 38
It would help to know what operating system you are using. Mac OS X or Linux?

Adobe Acrobat will do what you want.
NeoDevin
#3
May11-13, 02:09 AM
P: 685
I have computers running Windows and Linux, so a method for either would be fine, preferably a free one.

Routaran
#4
May16-13, 11:38 AM
P: 271

There are several OCR options available to you. I did a Google search for 'linux ocr pdf', and this was the first hit on the list:
http://ubuntuforums.org/showthread.php?t=1456756

If the program doesn't have flags that let it process multiple PDFs at once, you can write a small script with a for loop that goes through the contents of a directory and OCRs all the PDF files.
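
The loop described above could look something like this Python sketch. It assumes an OCR command that takes an input PDF and an output PDF as its two arguments. `ocrmypdf` is used only as an example of such a tool (it is not mentioned in the linked thread), and the function names and the `_ocr` suffix are made up for illustration.

```python
import subprocess
from pathlib import Path

# Assumption: the OCR tool is invoked as "<tool> input.pdf output.pdf".
# ocrmypdf works this way; substitute whatever program you actually use.
OCR_COMMAND = ["ocrmypdf"]


def plan_jobs(folder, suffix="_ocr"):
    """Return (source, destination) pairs for PDFs that still need OCR."""
    jobs = []
    for src in sorted(Path(folder).glob("*.pdf")):
        if src.stem.endswith(suffix):
            continue  # this file is itself an OCR output
        dst = src.with_name(src.stem + suffix + ".pdf")
        if dst.exists():
            continue  # already processed on an earlier run
        jobs.append((src, dst))
    return jobs


def ocr_folder(folder, command=OCR_COMMAND):
    """Run the OCR command once per pending PDF; return the sources that failed."""
    failed = []
    for src, dst in plan_jobs(folder):
        result = subprocess.run([*command, str(src), str(dst)])
        if result.returncode != 0:
            failed.append(src)
    return failed
```

Calling `ocr_folder("/path/to/scans")` writes a `name_ocr.pdf` next to each `name.pdf` and leaves the originals alone. Because files that already have an output are skipped, the script can be re-run (from cron or Task Scheduler, say) each time new scans arrive.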
ChrisJA
#5
May16-13, 11:54 AM
P: 38
Sorry, I +thought+ I had replied to this days ago. It seems the way to go is "tesseract": http://code.google.com/p/tesseract-ocr/
It has its own GUI, but there are other third-party GUIs, or you can run it from the command line or a script.
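
As a rough idea of what running it from a script might involve: Tesseract reads images rather than PDFs, so each page has to be rendered to an image first. The sketch below assumes Poppler's `pdftoppm` is installed for that step and that the Tesseract build is recent enough to support the `pdf` output config. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def rasterize_command(pdf, image_prefix, dpi=300):
    """Command that renders each page of `pdf` to <image_prefix>-N.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(image_prefix)]


def tesseract_command(image, output_base, lang="eng"):
    """Command that OCRs one image into <output_base>.pdf."""
    # The trailing "pdf" selects Tesseract's searchable-PDF output config.
    return ["tesseract", str(image), str(output_base), "-l", lang, "pdf"]


def ocr_pdf_pages(pdf, workdir):
    """Rasterize `pdf` and OCR each page; return the per-page PDF paths."""
    workdir = Path(workdir)
    prefix = workdir / Path(pdf).stem
    subprocess.run(rasterize_command(pdf, prefix), check=True)
    outputs = []
    for image in sorted(workdir.glob(prefix.name + "-*.png")):
        base = image.with_suffix("")  # Tesseract appends ".pdf" itself
        subprocess.run(tesseract_command(image, base), check=True)
        outputs.append(image.with_suffix(".pdf"))
    return outputs
```

This produces one searchable PDF per page; joining them back into a single file is a separate step that is not shown here.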

