Is There a Tool to Efficiently Download Articles from a Newspaper's Website?

  • Thread starter: mech-eng
  • Tags: Articles
SUMMARY

The discussion centers on the need for tools to efficiently download articles from a newspaper's website, specifically for archiving purposes. Users mention command line tools such as curl and wget as effective solutions for pulling down webpages and their references. The conversation emphasizes the importance of adhering to copyright laws while considering the preservation of articles in case the websites become unavailable. Users express a desire to archive content without infringing on copyright, particularly when authors change publications.

PREREQUISITES
  • Familiarity with command line interfaces
  • Understanding of copyright laws related to digital content
  • Basic knowledge of web scraping techniques
  • Experience with tools like curl and wget
NEXT STEPS
  • Research advanced usage of curl for web scraping
  • Explore wget options for recursive downloads (see the command-line sketch after this list)
  • Learn about ethical web scraping practices and copyright compliance
  • Investigate alternatives for archiving web content, such as the Internet Archive
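
As a starting point for the wget step above, a recursive download of a single author's index page might look like the following sketch. The URL is a placeholder, and the options shown are standard wget flags chosen to keep the crawl shallow, polite, and confined to that author's section of the site:

    # Mirror one author's article listing plus the pages and assets it references,
    # staying within that section of the site and pausing between requests.
    wget --recursive --level=2 --no-parent \
         --page-requisites --convert-links --adjust-extension \
         --wait=2 \
         https://www.example-newspaper.com/authors/favorite-journalist/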
USEFUL FOR

This discussion is beneficial for content archivists, web developers, and anyone interested in preserving online articles while navigating copyright considerations.

mech-eng
I would like to download the articles of my favorite journalists, but they are on their newspaper's website, and there are hundreds of articles per author. Is there a practical way to download an author's articles without clicking hundreds of times? Is there a program for this? Years ago I heard of a type of program called "vampire" programs that could download an entire website so users could browse it offline. Do you have any ideas?

Thank you.
 
jedishrfu said:
There are command-line tools for Linux, such as curl and wget, that can pull down webpages and what they reference.

https://en.wikipedia.org/wiki/CURL

https://en.wikipedia.org/wiki/Wget

Remember, though, that we don't want to capture copyrighted material.
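
For a single page, a minimal curl sketch could look like the line below; the article URL is a placeholder, -L follows redirects, and -o names the local output file:

    # Save one article page as a local HTML file.
    curl -L -o article.html https://www.example-newspaper.com/2020/some-article.html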

If I just read them in the future, is that still a copyright issue? Those websites might be shut down in the future, so having an archive would be better. I use Windows; can I still use these tools?

Thank you.
 
That's for you to answer, not me. If you feel it's wrong, then don't do it.

If you plan to post or republish them, then that may go over the line.
 
jedishrfu said:
That's for you to answer, not me. If you feel it's wrong, then don't do it.

If you plan to post or republish them, then that may go over the line.

No, I just want to keep them from disappearing in the future, and of course to read them. If the website shuts down or the author leaves the newspaper, the articles will be deleted.

Thank you.
 
Most newspapers I know of allow non-subscribers to access a certain number of articles per time period.
Once you access an article, you could print it or save it as a PDF for later (a command-line sketch follows at the end of this post).

If the author were to leave a newspaper, they would not throw away their old articles.
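
If you would rather script the "save as PDF" step than do it by hand, headless Chromium can print a page to PDF from the command line. This is a sketch under the assumption that a Chromium or Chrome binary is installed (the binary name varies by system), and the URL is a placeholder:

    # Render one article page and save it as a PDF.
    chromium --headless --print-to-pdf=article.pdf \
        https://www.example-newspaper.com/2020/some-article.html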
 
