Is there any way to web-scrape a website that's down?

  • Context: Python 
  • Thread starter Eclair_de_XII
SUMMARY

The discussion centers on how to retrieve content from a website that is currently down, specifically the effbot.org Tkinter documentation. Users suggest the Internet Archive's Wayback Machine for archived snapshots, although many of those snapshots capture only the site's directory pages and not the individual widget pages. A working mirror of the site, itself built from Wayback Machine captures, turned up through a Google search, which highlights the value of searching for alternative sources. The consensus is that a site that cannot be reached cannot be scraped directly, but archived copies may still provide some of its content.

PREREQUISITES
  • Familiarity with web-scraping concepts and tools
  • Understanding of the Internet Archive's Wayback Machine
  • Knowledge of Tkinter and its documentation resources
  • Basic skills in using search engines effectively
NEXT STEPS
  • Explore the Internet Archive's Wayback Machine for archived web pages
  • Learn about web-scraping libraries such as Beautiful Soup and Scrapy
  • Investigate alternative Tkinter documentation sources and mirrors
  • Research effective search techniques for finding cached versions of websites
USEFUL FOR

Developers, researchers, and anyone seeking to access Tkinter documentation or similar resources when original websites are down.

Eclair_de_XII
TL;DR
I used to go to effbot.org for documentation on tkinter, but now it seems to be down. I've thought about writing a web-scraping script to save all the pages explaining tkinter's widgets and so on, but I'm wondering whether that is even possible when I cannot even access the pages normally.
I tried Google-searching the site and found several archive sites. Each one has archived the main site and the page directory, but every one of them seems to have failed to capture the pages on the individual tkinter objects. I confess that I had taken the site for granted. I'm aware of other tkinter documentation sites on the internet, and I'm also aware that other ways to build an interface exist, like Flask (a web framework rather than a GUI toolkit), which one user on here mentioned to me once. All the same, I found effbot the most valuable source for tkinter documentation.
 
Short answer: no. If you can't see it, how can you scrape it?

There is another way, though: try the Internet Archive's Wayback Machine. It may have taken a snapshot of the site.

https://web.archive.org
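If you want to check a specific page from a script rather than clicking through the calendar, the Internet Archive also offers a public availability API at `https://archive.org/wayback/available`, which returns JSON describing the closest capture of a URL. The sketch below assumes that response shape (`archived_snapshots` → `closest` → `url`); the helper names are hypothetical, and only `find_snapshot` touches the network.

```python
import json
import urllib.parse
import urllib.request

# Internet Archive availability endpoint: reports the closest archived capture of a URL.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL for the availability API."""
    params = {"url": page_url}
    if timestamp:
        # Optional YYYYMMDDhhmmss (or a prefix of it); the API picks the closest capture.
        params["timestamp"] = timestamp
    return AVAILABILITY_API + "?" + urllib.parse.urlencode(params)


def closest_snapshot(response_json):
    """Extract the closest snapshot URL from an API response, or return None."""
    data = json.loads(response_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_snapshot(page_url, timestamp=None, timeout=30):
    """Query the live API (needs network access); return a replay URL or None."""
    with urllib.request.urlopen(availability_url(page_url, timestamp), timeout=timeout) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

For example, `find_snapshot("effbot.org/tkinterbook/button.htm")` (a hypothetical page path) should return a replay URL if any capture exists, or `None` otherwise. Keeping the parsing separate from the request makes the logic easy to test without a connection.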
 
https://web.archive.org/web/20200801000000*/effbot.org

I've found plenty of archives of the site, but the ones I have checked do not seem to have the instruction pages available. Frankly, checking every single one would be a hassle, so I'm considering writing a web-scraping script to search for a working link. As mentioned earlier, the web archive seems to have the page directories but not the pages themselves. For example:

https://web.archive.org/web/20200703091947/http://effbot.org/tkinterbook
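Rather than checking each snapshot by hand, a script can ask the Wayback Machine's CDX server for every capture under a URL prefix, keeping only those that returned HTTP 200. The sketch below assumes the CDX server's `output=json` format, in which the first row holds the field names; `cdx_query_url`, `snapshot_links`, and `list_archived_pages` are hypothetical helper names, and only the last one makes a network request.

```python
import json
import urllib.parse
import urllib.request

CDX_API = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(prefix):
    """Build a CDX query listing successful (HTTP 200) captures under a URL prefix."""
    params = {
        "url": prefix,
        "matchType": "prefix",       # every URL that starts with `prefix`
        "output": "json",            # JSON array of rows, header row first
        "fl": "timestamp,original",  # only the fields we need
        "filter": "statuscode:200",  # skip redirects and error captures
        "collapse": "urlkey",        # one capture per distinct URL
    }
    return CDX_API + "?" + urllib.parse.urlencode(params)


def snapshot_links(response_json):
    """Turn CDX JSON rows into Wayback replay URLs, one per captured page."""
    rows = json.loads(response_json) if response_json.strip() else []
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    ts_i, url_i = header.index("timestamp"), header.index("original")
    return [f"https://web.archive.org/web/{r[ts_i]}/{r[url_i]}" for r in records]


def list_archived_pages(prefix, timeout=60):
    """Query the live CDX server (needs network access)."""
    with urllib.request.urlopen(cdx_query_url(prefix), timeout=timeout) as resp:
        return snapshot_links(resp.read().decode("utf-8"))
```

Calling `list_archived_pages("effbot.org/tkinterbook/")` would return one replay link per distinct archived page. If the widget pages were never captured, the list simply will not contain them; no script can recover a page the archive does not have.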
 
Is that a mirror or a Google cache of the site (aka snapshot)?
 
jedishrfu said:
Is that a mirror or a Google cache of the site (aka snapshot)?
According to the message on the site, it is a scrape from the Wayback Machine.
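A mirror like that can be built by downloading each archived page in its original form. The sketch below assumes the Wayback Machine's `id_` URL modifier (as in `/web/20200703091947id_/http://...`), which serves the capture without the archive's toolbar and rewritten links. The output directory `effbot_mirror`, the flat file-naming scheme, and the helper names are hypothetical choices for illustration.

```python
import re
import time
import urllib.request
from pathlib import Path

# Replay URLs look like https://web.archive.org/web/<timestamp>[modifier]/<original-url>,
# where the optional modifier is two letters plus "_" (for example "id_").
REPLAY_RE = re.compile(r"^(https?://web\.archive\.org/web/)(\d+)(?:[a-z]{2}_)?/(.+)$")


def _split_replay(replay_url):
    """Return (prefix, timestamp, original_url) or raise ValueError."""
    m = REPLAY_RE.match(replay_url)
    if not m:
        raise ValueError(f"not a Wayback replay URL: {replay_url}")
    return m.groups()


def raw_url(replay_url):
    """Rewrite a replay URL to use the "id_" modifier (original bytes, no toolbar)."""
    prefix, timestamp, original = _split_replay(replay_url)
    return f"{prefix}{timestamp}id_/{original}"


def local_name(replay_url):
    """Hypothetical naming scheme: last path segment of the original URL."""
    original = _split_replay(replay_url)[2]
    name = original.rstrip("/").rsplit("/", 1)[-1]
    name = re.sub(r"[^\w.-]", "_", name)  # keep file names filesystem-safe
    return name if "." in name else name + ".html"


def save_pages(replay_urls, out_dir="effbot_mirror", delay=1.0, timeout=60):
    """Download each page's raw capture into out_dir (needs network access)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url in replay_urls:
        try:
            with urllib.request.urlopen(raw_url(url), timeout=timeout) as resp:
                (out / local_name(url)).write_bytes(resp.read())
        except OSError as exc:  # URLError and HTTPError both subclass OSError
            print(f"skipped {url}: {exc}")
        time.sleep(delay)  # pause between requests to go easy on the archive
```

The one-second `delay` is an arbitrary, conservative choice to avoid hammering the archive. Links inside the saved pages still point at their original URLs, so navigation between saved pages may need fixing up separately.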
 
