Can't see PDF contents as well as Google does

Stephen Tashi · Dec 13, 2018

There are some PDF files in which Google finds keywords, but when I open the link Google gives, the PDF isn't searchable by any tool I've tried (such as Firefox's search feature). If I save the PDF to a file, I can't search it with the Linux commands grep and pdfgrep either.

Example: http://www.rld.state.nm.us/uploads/...d5a3b8f137/Recent_Liquor_License_Sales_19.pdf

Is Google finding the keywords in some source other than the PDF?
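One way to check whether a saved PDF has any text layer for tools like pdfgrep to search is to extract its text and see whether anything comes out. The following is only a minimal sketch: it assumes poppler's `pdftotext` is installed, and the helper names `extract_text` and `has_text_layer` and the `min_chars` threshold are made up for illustration.

```python
import shutil
import subprocess


def extract_text(pdf_path: str) -> str:
    """Return whatever text layer pdftotext can pull out of the PDF."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (poppler-utils) is not installed")
    # Passing "-" as the output file makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def has_text_layer(extracted: str, min_chars: int = 20) -> bool:
    """Heuristic: call the PDF searchable only if enough visible characters came out.

    min_chars is an arbitrary threshold (an assumption), chosen so that a few
    stray characters do not count as a real text layer.
    """
    visible = [ch for ch in extracted if not ch.isspace()]
    return len(visible) >= min_chars
```

Calling `has_text_layer(extract_text("saved.pdf"))` (with a hypothetical file name) returning `False` would suggest the pages are images only, in which case grep and pdfgrep have nothing to match against.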

jtbell · Dec 13, 2018

I opened your example file in my (paid) Acrobat Pro, and tried to search for a word in it. It contains scanned images with no actual text. In order to do a text search, you have to run OCR software on it to extract the text. Acrobat Pro can do this, and then it can search for text (e.g. "Albuquerque" which I tried just now).

Apparently Google does OCR on scanned documents in PDFs.
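For anyone without Acrobat Pro, the same OCR step can probably be done with free tools. The sketch below assumes the open-source `ocrmypdf` command is installed; it is a swapped-in alternative, not what Acrobat or Google uses. The function names are hypothetical, and `--rotate-pages` is included on the assumption that some scanned pages may be sideways or upside down.

```python
import shutil
import subprocess
from typing import List


def build_ocr_command(src: str, dst: str, language: str = "eng") -> List[str]:
    """Build an ocrmypdf command that adds a searchable text layer to a scanned PDF."""
    # --rotate-pages asks ocrmypdf to detect and correct page orientation first.
    return ["ocrmypdf", "--rotate-pages", "-l", language, src, dst]


def add_text_layer(src: str, dst: str, language: str = "eng") -> None:
    """Run OCR on src and write a searchable copy to dst."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed")
    subprocess.run(build_ocr_command(src, dst, language), check=True)
```

After something like `add_text_layer("scan.pdf", "scan_ocr.pdf")` (hypothetical file names), pdfgrep or a browser's search should be able to find words in the output file, subject to how well the OCR recognized them.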

[added] My website has some files which I scanned and converted to PDF without doing OCR on them. I've just now verified that Google can find text strings in them. For example, searching for "rapid rider special" (with quotes) gives me a result in which the searched-for text is upside down (!) at the bottom right of the first page of the PDF.

Stephen Tashi · Dec 13, 2018

Can you find the word "Mariah" in the PDF? In my Google search, Google quoted that word in the excerpt it gave with the link, but I don't see the word as an image.

jtbell · Dec 13, 2018

It's on page 7, line 4. That's the only occurrence in the document.
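Once a PDF has a text layer (for example after OCR), finding the page and line of a word can be scripted. This is a rough sketch that assumes the text was extracted with `pdftotext`, which by default separates pages with a form feed character; `find_word` is a hypothetical helper name. Line numbers here count every line of extracted text, blank ones included, so they may not match a count done by eye.

```python
from typing import List, Tuple


def find_word(extracted: str, word: str) -> List[Tuple[int, int]]:
    """Return 1-based (page, line) pairs where word occurs, ignoring case."""
    needle = word.lower()
    hits = []
    # pdftotext puts a form feed ("\f") between pages by default.
    for page_no, page in enumerate(extracted.split("\f"), start=1):
        for line_no, line in enumerate(page.splitlines(), start=1):
            if needle in line.lower():
                hits.append((page_no, line_no))
    return hits
```

A single hit in the returned list would correspond to a word that occurs only once in the document.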
