Compressing PDF documents to oblivion

  • Thread starter Wrichik Basu
  • Tags: PDF
In summary, the student is applying to several universities for a Master's degree and must submit all the required documents as a single PDF of at most 5 MB. The combined documents total around 15 MB, and Adobe Acrobat's "Optimize" feature with default settings reduced the size by only 100 KB. Forcing further compression by downscaling the images to 100 ppi made the scanned documents illegible. The admissions team, when contacted, told the student to try harder. The documents had already been OCR'ed and the text is selectable. The student eventually reached 4116 KB by converting each PDF to low-quality PNG images, converting those back to PDF, and combining the results.
  • #1

Wrichik Basu

I am applying to several universities for a Master's degree. The universities are asking me to compress all the required documents into a single 5 MB file.

I have to submit my 10th and 12th standard marksheets and gradesheets, along with their reverse sides. These documents are around 5 MB each because they are digitally signed and downloaded from the official Govt. website. I distilled these using Adobe Acrobat's Print to PDF (I have the paid version of Adobe), and the total size decreased to 12 MB. Add to these the scans of my UG semester transcripts, which are 1.3 MB in total. On top of that comes a photo ID card, for a total of around 15 MB.

After combining all these documents into a single file, I tried compressing using the "Optimize" feature in Adobe. With the default settings, the size decreased by just 100 KB. If I force further compression by downscaling all images to 100 ppi, the scanned documents become illegible. I also tried to compress each file separately, but without success.
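As a rough illustration of why 100 ppi is such a drastic setting, here is a minimal sketch of how the pixel count of a page image scales with resolution. The A4 page size and the 300 ppi starting point are assumptions for illustration only; the thread does not state the actual scan resolution.

```python
# Rough estimate of how downsampling changes the pixel count of a scanned page.
# Assumptions (not from the thread): A4 page, originally scanned at 300 ppi.

A4_INCHES = (8.27, 11.69)  # width, height


def pixels_per_page(ppi, size_inches=A4_INCHES):
    """Return the number of pixels in one page image at the given resolution."""
    width_px = round(size_inches[0] * ppi)
    height_px = round(size_inches[1] * ppi)
    return width_px * height_px


def reduction_factor(src_ppi, dst_ppi):
    """Pixel count shrinks with the square of the resolution ratio."""
    return (src_ppi / dst_ppi) ** 2


if __name__ == "__main__":
    for ppi in (300, 200, 150, 100):
        print(ppi, "ppi ->", pixels_per_page(ppi), "pixels")
    print("300 -> 100 ppi keeps 1/%.0f of the pixels" % reduction_factor(300, 100))
```

Because the reduction is quadratic, an intermediate resolution such as 150 ppi may be a gentler trade-off than jumping straight to 100 ppi, though whether the text stays legible depends on the documents.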

I tried a few online services as well, but in vain.

I contacted the admissions team regarding the issue. The reply was pretty straightforward: "You say you can't do it, but a lot of students are doing it. We are confident you can do it if you try hard enough. If you still can't, please don't apply."

Any assistance is appreciated.
 
  • #2
Provide a link to documents that are available on the web.
You may have to OCR some of the documents.
 
  • #3
Are you using ZIP tools like 7-Zip or WinZip?
 
  • #4
Baluncore said:
Provide a link to documents that are available on the web.
You may have to OCR some of the documents.
No links accepted. Even if they did, the documents have to be downloaded by logging in and are inaccessible to the public.

All documents have already been OCR'ed.
Borg said:
Are you using ZIP tools like 7-Zip or WinZip?
Zip files not accepted. Only PDF, max 5 MB.
 
  • #5
Wrichik Basu said:
If I force further compression by downscaling all images to 100 ppi, the scanned documents become illegible.
What types of images are these? If they are .bmp images that were generated from a Microsoft copy command, they could be 2 MB each (it drives me crazy when people send me massive emails like that). Changing the resolution won't affect the size of those very much because they have a bunch of OLE code associated with them. You might try capturing a screenshot of the images and saving them as jpeg or png files which would be much smaller. Just be sure to delete the old MS code.
 
  • #7
Borg said:
What types of images are these?
No idea. I downloaded the files as PDF; I did not create them. I verified the signature on the documents, and then distilled them. Regarding downscaling images, I am talking about this option in Adobe: Optimize > Advanced Optimization.

[Attached screenshot: 1680539683773.png]
 
  • #8
Wrichik Basu said:
All documents have already been OCR'ed.
I don't think that's true. I think when you print to PDF you rasterize the entire document, including the text. This makes a massive difference in file size. Like, an order of magnitude or more. You need the text stored as text. The easy way to check is whether the text in the docs is selectable.

Some of what you scanned, too: is it available as a digital-to-digital PDF from the original source? OCR is limited, and graphics on original docs are often vector as well (so, much smaller).
Wrichik Basu said:
Zip files not accepted. Only PDF, max 5 MB.
Zip files don't help here anyway - they do very little on pictures because they have to be lossless.
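The point about ZIP can be checked with a small experiment. The sketch below uses Python's `zlib` (the same DEFLATE family that ZIP tools use) on two inputs: repetitive text, and seeded pseudo-random bytes standing in for an image stream that has already been compressed. Both inputs are synthetic stand-ins, not the actual documents from this thread.

```python
import random
import zlib


def deflate_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Repetitive text stands in for an uncompressed, text-heavy document.
text = b"Marks obtained in Physics, Chemistry and Mathematics. " * 2000

# Seeded pseudo-random bytes stand in for already-compressed image data
# (for example a JPEG stream inside a PDF), which has little redundancy left.
rng = random.Random(0)
image_like = bytes(rng.getrandbits(8) for _ in range(len(text)))

if __name__ == "__main__":
    print("text:       %.3f" % deflate_ratio(text))
    print("image-like: %.3f" % deflate_ratio(image_like))
```

The text shrinks to a small fraction of its size, while the image-like data stays essentially unchanged. A lossless compressor cannot discard detail, and compressed image data leaves it almost nothing to remove.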
 
  • #9
russ_watters said:
I don't think that's true. When you print to PDF you rasterize the entire document, including the text. This makes a massive difference in file size. Like, an order of magnitude or more. You need the text stored as text. The easy way to check is whether the text in the docs is selectable.
The text is selectable. I re-did the OCR after distilling.
 
  • #10
Can you indicate roughly how many pages of each kind of material the total package represents? PDF is usually pretty good... something seems amiss. I would also look at which docs are least compressed by the advanced compression routines and concentrate on those first.
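One way to act on that suggestion is to tabulate each document's size before and after optimization and sort by what remains. This is only a sketch: the file names and sizes below are hypothetical placeholders, not the real files from this thread.

```python
# Rank documents by how much size remains after optimization, so that effort
# goes to the files that dominate the total.
# The names and sizes in `example` are hypothetical placeholders.


def rank_by_remaining_size(sizes_kb):
    """sizes_kb maps name -> (size_before_kb, size_after_kb).

    Returns (name, size_after_kb, ratio) tuples, largest remaining size first.
    """
    rows = [
        (name, after, after / before)
        for name, (before, after) in sizes_kb.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


example = {
    "marksheet_10.pdf": (5000, 3000),
    "marksheet_12.pdf": (5000, 2900),
    "ug_transcripts.pdf": (1300, 1250),
    "photo_id.pdf": (900, 850),
}

if __name__ == "__main__":
    for name, after, ratio in rank_by_remaining_size(example):
        print(f"{name:20s} {after:5d} KB  ({ratio:.0%} of original)")
```

Sorting by remaining size rather than by ratio alone keeps attention on the files that actually contribute most to the 5 MB budget.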
Of course on a more humorous note it would seem prudent to compress the life out of any parts of your record you would like to de-emphasize. Maybe they will only look at the really good parts if the rest is nearly illegible!!
 
  • #11
Did it. Finally.

Converted each downloaded PDF to PNG with low quality setting. Converted those PNG files using an online service (https://png2pdf.com) and then combined those PDF files in Adobe. The final file size is 4116 KB.
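A quick check of the final size against the limit can be sketched as follows. The thread does not say whether the upload form counts 1 MB as 1,000,000 or 1,048,576 bytes, so the helper below (a hypothetical function, not part of any tool mentioned here) supports both readings and defaults to the stricter one.

```python
import os


def fits_limit(size_bytes: int, limit_mb: float = 5.0, binary: bool = False) -> bool:
    """Return True if size_bytes is within limit_mb.

    binary=False treats 1 MB as 1,000,000 bytes (the stricter reading);
    binary=True treats 1 MB as 1,048,576 bytes.
    """
    unit = 1024 * 1024 if binary else 1000 * 1000
    return size_bytes <= limit_mb * unit


def file_fits(path: str, limit_mb: float = 5.0) -> bool:
    """Check a file on disk against the stricter decimal limit."""
    return fits_limit(os.path.getsize(path), limit_mb)


if __name__ == "__main__":
    # 4116 KB as reported in the post, assuming 1 KB = 1024 bytes.
    final_size = 4116 * 1024
    print(fits_limit(final_size))
```

Under either reading of "MB", 4116 KB is below the 5 MB limit.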

And now, the university website is down.

Anyway, a pretty bad method, but the documents are still legible.
 

What is the purpose of compressing PDF documents?

The purpose of compressing PDF documents is to reduce their file size, making them easier to share and store. Compressing a PDF can also improve its loading and downloading speed.

How does compressing a PDF document affect its quality?

Compressing a PDF document can affect its quality by reducing its resolution and image quality. However, there are different levels of compression that can be applied, so the impact on quality can vary.

What are the different methods of compressing PDF documents?

There are two main methods of compressing PDF documents: lossless and lossy compression. Lossless compression reduces file size without sacrificing quality, while lossy compression sacrifices quality for a smaller file size.

Can a compressed PDF document be uncompressed?

Only partly. Lossless compression can be reversed exactly, but detail discarded by lossy compression (for example, downsampled or heavily compressed images) cannot be recovered, so the original quality may not be restored.

Are there any risks associated with compressing PDF documents?

There are minimal risks associated with compressing PDF documents. However, if the compression level is too high, it may result in a significant loss of quality. It is always recommended to save a copy of the original document before compressing it.
