Python Is there any way to web-scrape a website that's down?

  • Thread starter: Eclair_de_XII
AI Thread Summary
The discussion centers on the challenges of accessing archived pages from the Effbot site, particularly its tkinter documentation. Users noted that while several archive sites have captured the main site and directory, they often lack the specific pages detailing tkinter objects. The Wayback Machine was suggested as a potential resource for finding snapshots of the site, with links provided for users to explore. Some participants mentioned the difficulty of checking multiple archives and considered using web-scraping scripts to locate functioning links. Additionally, a working mirror of the Effbot site was discovered through a simple search, which was confirmed to be a scrape from the Wayback Machine. The conversation also clarified that Flask is not a GUI library, emphasizing the focus on tkinter resources.
Eclair_de_XII
TL;DR Summary
I used to go to effbot.org for documentation on tkinter. But now it seems to be down. Sometimes I thought about writing a web-scraping script to record all the pages explaining the widgets and what-not of tkinter, but I'm wondering if that is even possible. I cannot even access the pages normally.
I tried Google-searching the site and found several archive sites. Each of them has archived the main site and the page directory, yes. But every single one seems to have failed to capture the pages on the tkinter objects. I confess that I had taken the site for granted. I'm aware of other tkinter documentation sites on the internet, and I am also aware that other GUI modules exist, like Flask; one user on here mentioned it to me once. All the same, I found effbot the most valuable for tkinter documentation.
 
Short answer: no. If you can't see it, how can you scrape it?

There is another way, though. Try the Internet Archive's Wayback Machine. They may have taken a snapshot of the site.

https://web.archive.org
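
If you would rather check for a snapshot from a script than by hand, here is a minimal sketch. It assumes the Internet Archive's public availability endpoint at https://archive.org/wayback/available and the JSON layout it returns (an "archived_snapshots" object with a "closest" entry). The function names are made up for illustration and nothing here comes from the thread itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build the query URL; timestamp is an optional YYYYMMDD-style string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return the closest snapshot URL from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_snapshot(page_url, timestamp=None):
    """Ask the archive whether it holds a snapshot of page_url (needs network)."""
    query = build_availability_url(page_url, timestamp)
    with urlopen(query, timeout=30) as response:
        return closest_snapshot(json.load(response))
```

Calling find_snapshot("effbot.org/tkinterbook", "20200801") should return either a web.archive.org link or None. A hit only means that something was captured at that address, which may be a directory page rather than the page you want.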
 
https://web.archive.org/web/20200801000000*/effbot.org

I've found plenty of archives of the site, but the ones I have checked do not seem to have the instruction pages available. Frankly, it would be a bit hasslesome to check every single one; I'm considering using a web-scraping script to search for a working link. As mentioned earlier, the web archive seems to have the page directories but not the pages themselves. For example:

https://web.archive.org/web/20200703091947/http://effbot.org/tkinterbook
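
The "script that searches for a working link" idea could look roughly like the sketch below. It assumes the Wayback Machine's CDX search endpoint and its url, matchType, output, filter and collapse parameters as I understand them from the CDX server documentation. It also assumes that output=json returns a list of rows whose first row is a header. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine CDX search server.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(prefix):
    """Query for every capture under prefix that was stored with HTTP 200."""
    params = {
        "url": prefix,
        "matchType": "prefix",       # match everything below the prefix
        "output": "json",
        "filter": "statuscode:200",  # skip redirects and error captures
        "collapse": "urlkey",        # keep one row per distinct page
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_links(rows):
    """Turn CDX rows (header row first) into replayable snapshot links."""
    if not rows:
        return []
    header, *records = rows
    ts = header.index("timestamp")
    original = header.index("original")
    return [
        f"https://web.archive.org/web/{row[ts]}/{row[original]}"
        for row in records
    ]


def list_captured_pages(prefix):
    """Fetch and decode the capture list for prefix (needs network)."""
    with urlopen(build_cdx_url(prefix), timeout=60) as response:
        body = response.read().decode("utf-8")
    return snapshot_links(json.loads(body) if body.strip() else [])
```

Something like list_captured_pages("effbot.org/tkinterbook/") would then give one link per page the archive reports as successfully captured, which saves clicking through every snapshot by hand. Whether the widget pages show up in that list is exactly the open question.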
 
Is that a mirror or a Google cache of the site (aka snapshot)?
 
jedishrfu said:
Is that a mirror or a Google cache of the site (aka snapshot)?
According to the message on the site it is a scrape from Wayback Machine.
 