Can't see PDF contents as well as Google does

  • Thread starter Stephen Tashi
  • Tags
    Google Pdf
In summary, the conversation discusses PDF files in which Google finds keywords but which cannot be searched with other tools such as Firefox, grep, and pdfgrep. The reason is that these PDFs contain scanned images with no actual text, so OCR software is needed to extract the text before it can be searched. Google apparently runs OCR on scanned documents in PDFs, which lets it find text in files that were never OCR'd by their authors. The conversation ends with the word "Mariah" being located on page 7, line 4 of the example PDF.
  • #1
Stephen Tashi
There are some PDF files where Google finds keywords, but when I open the link Google gives, the PDF isn't searchable by any tools I've tried (such as the search feature of Firefox). If I save the PDF to a file, I can't search it with the Linux commands grep and pdfgrep.

Example: http://www.rld.state.nm.us/uploads/...d5a3b8f137/Recent_Liquor_License_Sales_19.pdf

Is Google finding keywords in some source other than the PDF itself?
 
  • #2
I opened your example file in my (paid) Acrobat Pro, and tried to search for a word in it. It contains scanned images with no actual text. In order to do a text search, you have to run OCR software on it to extract the text. Acrobat Pro can do this, and then it can search for text (e.g. "Albuquerque" which I tried just now).

Apparently Google does OCR on scanned documents in PDFs.
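
A rough way to check whether a saved PDF has any text layer at all, before reaching for OCR, is to look for font resources in the file. This is only a sketch under assumptions: the function names are made up for illustration, the check is a heuristic (a page that shows text must reference a font, but a font can be present without any useful text), and it does not handle encrypted files. It uses only the Python standard library.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of a PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflated_streams(pdf_bytes):
    """Yield the contents of every stream that inflates as zlib/Flate data."""
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the newline left before "endstream".
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data, e.g. a JPEG page scan


def probably_has_text_layer(pdf_bytes):
    """Heuristic: True if a /Font entry appears anywhere in the file.

    Newer PDFs may keep their dictionaries inside compressed object
    streams, so the inflated streams are checked as well.
    """
    if b"/Font" in pdf_bytes:
        return True
    return any(b"/Font" in chunk for chunk in inflated_streams(pdf_bytes))


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        print(probably_has_text_layer(handle.read()))
```

If this prints False for a file, grep and pdfgrep have no text to match, and the file needs to go through OCR first.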

[added] My website has some files which I scanned and converted to PDF without doing OCR on them. I've just now verified that Google can find text strings in them. For example, searching for "rapid rider special" (with quotes) gives me

rapidriderspecial.gif


where the searched-for text is upside down (!) at the bottom right of the first page of the PDF.
 

  • #3
Can you find the word "Mariah" in the PDF? In my Google search, Google quoted that word in the excerpt it gave with the link, but I don't see the word as an image.
 
  • #4
It's on page 7, line 4. That's the only occurrence in the document.
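
Once a scanned PDF has been through OCR and its text exported, a page-and-line lookup like the one above can be done with a few lines of Python. This is a minimal sketch that assumes the text is a single string with pages separated by form-feed characters, which is how pdftotext writes its output. The function name and the sample string are made up for illustration, and blank lines are counted, so the line number may not match how a person counts rows on the page.

```python
def find_word(text, word):
    """Return 1-based (page, line) pairs for every line containing `word`.

    Pages are assumed to be separated by form feeds ("\f").
    Matching is case-insensitive.
    """
    needle = word.casefold()
    hits = []
    for page_no, page in enumerate(text.split("\f"), start=1):
        for line_no, line in enumerate(page.splitlines(), start=1):
            if needle in line.casefold():
                hits.append((page_no, line_no))
    return hits


if __name__ == "__main__":
    # Made-up sample text, not taken from the PDF discussed in this thread.
    sample = "first page\fline one\nline two Mariah\f"
    print(find_word(sample, "mariah"))  # [(2, 2)]
```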
 

What are the common reasons for not being able to see PDF contents as well as Google does?

The most common reason, and the one in this thread, is that the PDF consists of scanned page images with no actual text, so browser search and tools such as grep and pdfgrep have nothing to match. Google runs OCR on such documents and indexes the recognized text.

How can I ensure that I can see PDF contents as well as Google does?

Run OCR software on the PDF to extract the text. As noted in the thread, Acrobat Pro can do this, after which the text can be searched. Updating the viewer or browser does not help when the file contains only images.

Why do I see distorted or incomplete PDF contents when opening them?

This could be due to a corrupted or damaged PDF file. Try re-downloading the file or opening it on a different device to see if the issue persists.

Is it possible to see PDF contents as well as Google does on all devices?

Yes, as long as your device has the necessary software and settings to view PDFs properly, you should be able to see the contents as well as Google does.

What can I do if I still can't see PDF contents as well as Google does?

If you are still experiencing issues with viewing PDF contents, try clearing your browser's cache and cookies and restarting your device. You can also try using a different browser or downloading a different PDF viewer software.
