Discussion Overview
The discussion concerns how to scrape, or otherwise recover content from, a website that is currently down: specifically, a tkinter documentation site. Participants explore alternatives, including archived copies of the site and other sources of tkinter documentation.
Discussion Character
- Exploratory
- Technical explanation
- Debate/contested
Main Points Raised
- One participant notes that archive sites have captured the site's main page and page directory but not the individual pages documenting tkinter objects.
- Another participant points out that a site that is down cannot be scraped directly, and suggests the Internet Archive's Wayback Machine instead.
- A participant shares a Wayback Machine link for the effbot site, noting that although snapshots exist, they may not include the specific instructional pages being sought.
- Participants provide several links to alternative tkinter resources and documentation.
- One participant clarifies that Flask is not a GUI library, so it is not a substitute for tkinter.
- A participant reports finding a working mirror of the site through a Google search and expresses relief at the discovery.
- Another participant asks whether the shared link is a mirror or a Google cache copy, since the nature of the resource is unclear.
- Another participant suggests that the recovered site is a scrape of the Wayback Machine archive, but this is not verified.
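Using the Wayback Machine as suggested starts with checking whether a given page was archived at all. The following is a minimal Python sketch, assuming the Internet Archive's public availability API at `https://archive.org/wayback/available` (not mentioned in the thread); the helper names are hypothetical. A `None` result corresponds to the situation participants describe, where individual pages were never captured.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoint: the Internet Archive's Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url: str, timestamp: Optional[str] = None) -> str:
    """Hypothetical helper: build the query asking for the snapshot of
    `page_url` closest to `timestamp` (YYYYMMDDhhmmss, may be truncated)."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(response_text: str) -> Optional[str]:
    """Hypothetical helper: return the closest archived snapshot URL from
    the API's JSON response, or None if the page was never archived."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_snapshot(page_url: str, timestamp: Optional[str] = None,
                  timeout: float = 10.0) -> Optional[str]:
    """Query the live API (requires network access) and return the
    closest snapshot URL, or None."""
    query = build_availability_url(page_url, timestamp)
    with urllib.request.urlopen(query, timeout=timeout) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

For example, `find_snapshot("effbot.org/tkinterbook/", timestamp="2020")` would return the URL of the snapshot closest to 2020 if one exists; that page path is an assumed example. Checking each instruction page this way shows which ones the archive actually holds before any scraping is attempted.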
Areas of Agreement / Disagreement
Participants disagree on how effective the various methods for accessing a downed website are: some advocate the Wayback Machine, while others point to the gaps in its archived pages. The best way to retrieve specific content from the downed site remains unresolved.
Contextual Notes
Participants note that specific pages are missing from archive sites, and that it is unclear whether some recovered copies are mirrors or cached versions of the site. There is also no consensus on how reliable the shared resources are.