Code language for editing PDF files

SUMMARY

The discussion centers on the most convenient coding language for editing PDF files, specifically for deleting the first page of each file or scanning for and removing specific text. Python is recommended for this task, with libraries such as PyPDF2 handling the PDF manipulation. BeautifulSoup is mentioned only in connection with web scraping: it parses HTML and XML, not PDF files. JavaScript is suggested if the files are being viewed in a browser. If the only goal is to delete the first page of each file, a PDF editor or previewer is noted as likely the fastest route.

PREREQUISITES
  • Basic understanding of Python programming (version 3.x recommended)
  • Familiarity with PDF manipulation libraries such as PyPDF2
  • Knowledge of web scraping techniques using BeautifulSoup
  • Command line usage for executing scripts and managing files
NEXT STEPS
  • Learn how to use PyPDF2 for PDF file manipulation
  • Explore BeautifulSoup for web scraping and text extraction
  • Research additional Python libraries for PDF editing, such as PDFMiner
  • Practice command line operations to run Python scripts efficiently
USEFUL FOR

This discussion is beneficial for amateur coders, particularly those with a basic understanding of Python, who are interested in learning how to manipulate PDF files programmatically.

linag96
I have about 24 PDF files I want to edit. I want to either delete the first page of each file or scan each file for certain text and delete that text (whichever is easier to implement). What would be the most convenient and straightforward coding language for this, and what would be a good starting guide for creating the program?
I am an amateur coder and I love learning to code. I have about 4 months of experience with C++. I used MATLAB a lot last year for my math classes, and I have also learned some command line. I'm open to learning anything new. Thank you for your help.
 
Like web crawling (or scraping)?

If you are viewing them in a browser, I guess JavaScript would be the best way to crawl. If not, probably Python.

Some interesting resources here and http://www.nyu.edu/projects/politicsdatalab/localdata/workshops/BeautifulSoup.pdf.

However, if you only want to delete the first page of each file, then the editor or previewer you are using to view them will likely be the fastest way.
 
