Compressing PDF documents to oblivion

  • Thread starter: Wrichik Basu
  • Tags: PDF
AI Thread Summary
The discussion revolves around the challenges of compressing multiple academic documents into a single PDF file under a strict 5 MB limit for university applications. The user faced difficulties with the size of their 10th and 12th standard marksheets, which were digitally signed and large in size. Attempts to compress the documents using Adobe Acrobat's optimization features yielded minimal results, and other methods, including online services, were unsuccessful. The admissions team emphasized that many students manage to meet the requirements, suggesting persistence is key. Suggestions included using different image formats, ensuring text is stored as text rather than rasterized, and focusing on the least compressed documents. Ultimately, the user found success by converting PDFs to PNGs with lower quality settings and then recombining them, achieving a final file size of 4116 KB, although the university's website was down at that moment.
Wrichik Basu
I am applying to several universities for a Master's degree. The universities are asking me to combine all the required documents into a single file of at most 5 MB.

I have to submit my 10th and 12th standard marksheets and gradesheets, along with their reverse sides. These documents are around 5 MB each because they are digitally signed and downloaded from the official Govt. website. I distilled these using Adobe Acrobat's Print to PDF (I have the paid version of Adobe), and their total size decreased to 12 MB. Add to these the scans of my UG semester transcripts, which are 1.3 MB total. On top of that, a photo ID card, so a total of around 15 MB.

After combining all these documents into a single file, I tried compressing using the "Optimize" feature in Adobe. With the default settings, the size decreased by just 100 KB. If I force further compression by downscaling all images to 100 ppi, the scanned documents become illegible. I also tried to compress each file separately, but without success.
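To put rough numbers on the 100 ppi setting, here is a minimal sketch. It assumes an A4 page (8.27 x 11.69 in) and 24-bit colour, neither of which is stated above, and the function name `raster_size` is made up for illustration. It only shows how the pixel count, and so the raw image size before any JPEG or Flate compression, scales with the square of the resolution.

```python
def raster_size(ppi, width_in=8.27, height_in=11.69, bytes_per_pixel=3):
    """Return (width_px, height_px, raw_bytes) for one page rendered at `ppi`.

    The A4 page size and 24-bit colour defaults are assumptions,
    not values taken from the thread.
    """
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px, height_px, width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    # Example resolutions only; the original scan resolution is not known.
    for ppi in (300, 150, 100):
        w, h, raw = raster_size(ppi)
        print(f"{ppi:>3} ppi: {w} x {h} px, about {raw / 1e6:.1f} MB uncompressed")
```

Dropping from 300 to 100 ppi leaves about one ninth of the pixels, which would explain both the size reduction and why small print stops being readable.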

I tried a few online services as well, but in vain.

I contacted the admissions team regarding the issue. The reply was pretty straightforward: "You say you can't do it, but a lot of students are doing it. We are confident you can do it if you try hard enough. If you still can't, please don't apply."

Any assistance is appreciated.
 
Provide a link to documents that are available on the web.
You may have to OCR some of the documents.
 
Are you using ZIP tools like 7-Zip or WinZip?
 
Baluncore said:
Provide a link to documents that are available on the web.
You may have to OCR some of the documents.
No links accepted. Even if they did, the documents have to be downloaded by logging in and are inaccessible to the public.

All documents have already been OCR'ed.
Borg said:
Are you using ZIP tools like 7-Zip or WinZip?
Zip files not accepted. Only PDF, max 5 MB.
 
Wrichik Basu said:
If I force further compression by downscaling all images to 100 ppi, the scanned documents become illegible.
What types of images are these? If they are .bmp images that were generated from a Microsoft copy command, they could be 2 MB each (it drives me crazy when people send me massive emails like that). Changing the resolution won't affect the size of those very much because they have a bunch of OLE code associated with them. You might try capturing a screenshot of the images and saving them as jpeg or png files which would be much smaller. Just be sure to delete the old MS code.
 
Borg said:
What types of images are these?
No idea. I downloaded the files as PDFs; I did not create them. I verified the signature on the documents, and then distilled them. Regarding downscaling images, I am talking about this option in Adobe: Optimize > Advanced Optimization.

[Attached screenshot: 1680539683773.png]
 
Wrichik Basu said:
All documents have already been OCR'ed.
I don't think that's true. I think when you print to PDF you rasterize the entire document, including the text. This makes a massive difference in file size. Like, an order of magnitude or more. You need the text stored as text. The easy way to check is whether the text in the docs is selectable.

Some of what you scanned, too: is it available as a digitally generated PDF from the original source? OCR is limited, and graphics in original docs are often vector as well (so, much smaller).
Wrichik Basu said:
Zip files not accepted. Only PDF, max 5 MB.
Zip files don't help here anyway - they do very little on pictures, because ZIP compression has to be lossless and the images are usually already compressed.
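A quick way to see this, as a sketch rather than a measurement on the actual files: ZIP normally uses Deflate, which Python's `zlib` module also implements. The seeded word list below is invented stand-in data; the second pass plays the role of zipping image data that is already compressed.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more was saved)."""
    return len(zlib.compress(data, 9)) / len(data)


# Stand-in data: compressible text built from a small vocabulary (seeded, so repeatable).
rng = random.Random(0)
words = ["marksheet", "transcript", "semester", "grade", "signature", "page", "scan", "pdf"]
text = " ".join(rng.choice(words) for _ in range(100_000)).encode()

# Stand-in for image data that has already been compressed once.
already_compressed = zlib.compress(text, 9)

first_pass = compression_ratio(text)
second_pass = compression_ratio(already_compressed)

if __name__ == "__main__":
    print(f"raw text:           {first_pass:.2f} of original size")
    print(f"already compressed: {second_pass:.2f} of original size")
```

The first pass shrinks the data a lot; the second pass has almost nothing left to remove, which is the situation a ZIP tool faces with compressed scans.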
 
russ_watters said:
I don't think that's true. When you print to PDF you rasterize the entire document, including the text. This makes a massive difference in file size. Like, an order of magnitude or more. You need the text stored as text. The easy way to check is whether the text in the docs is selectable.
The text is selectable. I re-did the OCR after distilling.
 
  • #10
Can you indicate roughly how many pages of each kind of material the total package contains? PDF is usually pretty good... something seems amiss. I would also look at which documents are least compressed by the advanced compression routines and concentrate on those first.
On a more humorous note, it would seem prudent to compress the life out of any parts of your record you would like to de-emphasize. Maybe they will only look at the really good parts if the rest is nearly illegible!!
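The "which documents compress least" check can be sketched as below. The function names and the sample sizes are hypothetical, chosen only for illustration; the real page count and per-file results are not given in the thread.

```python
def least_compressed_first(sizes_before, sizes_after):
    """Return (name, ratio) pairs with the least-compressed document first.

    ratio = size after optimisation / size before; 1.0 means nothing was saved.
    """
    ratios = {name: sizes_after[name] / sizes_before[name] for name in sizes_before}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


def per_page_budget(limit_bytes, page_count):
    """Average number of bytes each page may use if the whole file must fit the limit."""
    return limit_bytes // page_count


if __name__ == "__main__":
    # Hypothetical sizes in KB, for illustration only.
    before = {"marksheets.pdf": 12000, "transcripts.pdf": 1300, "photo_id.pdf": 1700}
    after = {"marksheets.pdf": 11900, "transcripts.pdf": 900, "photo_id.pdf": 600}
    for name, ratio in least_compressed_first(before, after):
        print(f"{name}: {ratio:.0%} of original size")
    print(per_page_budget(5_000_000, 12), "bytes per page for a hypothetical 12 pages")
```

Whatever sits at the top of that list with a ratio near 100% is where further effort is most likely to pay off.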
 
  • #11
Did it. Finally.

Converted each downloaded PDF to PNG with a low quality setting. Converted those PNG files back to PDF using an online service (https://png2pdf.com) and then combined those PDF files in Adobe. The final file size is 4116 KB.

And now, the university website is down.

Anyway, a pretty bad method, but the documents are still legible.
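For what it's worth, a small sanity check on that figure. It is only a sketch: whether the upload form counts a megabyte as 10^6 or 2^20 bytes is not known, so the hypothetical helper below takes the pessimistic reading (file size in units of 1024 bytes, limit in units of 1,000,000 bytes).

```python
def worst_case_headroom(size_kb, limit_mb):
    """Bytes to spare under the most pessimistic reading of the units.

    Assumes the reported size is in KB of 1024 bytes (largest plausible file)
    and the limit is in MB of 1,000,000 bytes (smallest plausible cap).
    A negative result would mean the file might be rejected.
    """
    size_bytes = size_kb * 1024
    limit_bytes = limit_mb * 1_000_000
    return limit_bytes - size_bytes


if __name__ == "__main__":
    spare = worst_case_headroom(4116, 5)
    print(f"{spare} bytes to spare in the worst case")
```

Even on that reading, 4116 KB stays under the 5 MB cap.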
 