Can't see PDF contents as well as Google does

Stephen Tashi · Dec 13, 2018

There are some PDF files in which Google finds keywords, but when I open the link Google gives, the PDF isn't searchable with any tool I've tried (such as the search feature of Firefox). If I save the PDF to a file, I can't search it with the Linux commands grep and pdfgrep either.

Example: http://www.rld.state.nm.us/uploads/...d5a3b8f137/Recent_Liquor_License_Sales_19.pdf

Is Google finding the keywords in some source other than the PDF itself?

jtbell · Dec 13, 2018

I opened your example file in my (paid) Acrobat Pro, and tried to search for a word in it. It contains scanned images with no actual text. In order to do a text search, you have to run OCR software on it to extract the text. Acrobat Pro can do this, and then it can search for text (e.g. "Albuquerque" which I tried just now).
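A quick way to check for this case without Acrobat is to see whether the file has any extractable text at all. The following is only a minimal sketch, assuming the `pdftotext` tool from poppler-utils is installed; the function names and the 20-character threshold are illustrative choices, not anything taken from the thread.

```python
import shutil
import subprocess


def looks_like_scan(extracted_text, min_chars=20):
    """Return True if the extracted text is too short to be a real text layer.

    min_chars is an arbitrary threshold chosen for this sketch.
    """
    # Drop all whitespace, including the form feeds pdftotext puts between pages.
    visible = "".join(extracted_text.split())
    return len(visible) < min_chars


def extract_text(pdf_path):
    """Run pdftotext and return whatever text layer the PDF contains."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (poppler-utils) is not installed")
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],  # "-" sends the text to stdout
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Usage (assumes a locally saved copy of the PDF):
#   text = extract_text("Recent_Liquor_License_Sales_19.pdf")
#   print("image-only scan?", looks_like_scan(text))
```

If `looks_like_scan` returns True, tools such as grep, pdfgrep, and the browser's find feature have nothing to match against, which would be consistent with the behaviour described in the first post.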

Apparently Google does OCR on scanned documents in PDFs.

[added] My website has some files which I scanned and converted to PDF without doing OCR on them. I've just now verified that Google can find text strings in them. For example, searching for "rapid rider special" (with quotes) gives me a result where the searched-for text is upside down (!) at the bottom right of the first page of the PDF.

Stephen Tashi · Dec 13, 2018

Can you find the word "Mariah" in the PDF? In my Google search, Google quoted that word in the excerpt it gave with the link, but I don't see the word as an image.

jtbell · Dec 13, 2018

It's on page 7, line 4. That's the only occurrence in the document.
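For anyone without Acrobat Pro, a similar lookup can be approximated with free tools. This is a rough sketch, assuming the `ocrmypdf` command-line tool is installed to add a text layer and that the text is then extracted with `pdftotext`, which separates pages with form-feed characters. The function names are made up for illustration, and the line numbers depend on how the OCR engine splits lines, so they may not match what Acrobat reports.

```python
import subprocess


def ocr_pdf(src, dst):
    """Write a copy of src with an OCR text layer, using the ocrmypdf CLI."""
    subprocess.run(["ocrmypdf", src, dst], check=True)


def find_word(text, word):
    """Return 1-based (page, line) pairs where word occurs, ignoring case.

    Assumes pages are separated by form feeds, as in pdftotext output.
    """
    hits = []
    for page_no, page in enumerate(text.split("\f"), start=1):
        for line_no, line in enumerate(page.splitlines(), start=1):
            if word.lower() in line.lower():
                hits.append((page_no, line_no))
    return hits


# Usage (file names are placeholders):
#   ocr_pdf("scan.pdf", "scan_ocr.pdf")
#   text = subprocess.run(["pdftotext", "scan_ocr.pdf", "-"],
#                         capture_output=True, text=True, check=True).stdout
#   print(find_word(text, "Mariah"))
```

Once the OCR'd copy exists, pdfgrep should also be able to search it directly, since the file then carries real text alongside the page images.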
