Wayback Machine to the rescue (almost)

  • Thread starter: jtbell
  • Tags: Machine
SUMMARY

The discussion centers on the use of the Wayback Machine to recover lost web pages after a server crash and accidental deletion of backups. The user successfully retrieved missing text content from their website but faced challenges due to a robots.txt file that restricted access to image directories. To resolve this, they modified their robots.txt to allow the Wayback Machine's crawler to access images while still blocking other crawlers. This adjustment facilitates easier recovery of images in the future.

PREREQUISITES
  • Understanding of robots.txt file configuration
  • Familiarity with the Wayback Machine and its functionalities
  • Basic knowledge of web server management
  • Experience with image editing software like Photoshop
NEXT STEPS
  • Research how to effectively configure robots.txt for web crawlers
  • Explore advanced features of the Wayback Machine for archiving
  • Learn about best practices for web server backups and recovery
  • Investigate image management strategies for web galleries
USEFUL FOR

Web developers, content creators, and digital archivists who need to recover lost web content or manage web crawlers effectively.

jtbell
Last December, my college's Web server crashed because of a hard-disk failure. As I was fiddling with my most recent backup of my own Web pages, I clumsily managed to delete that, too. I had to resort to a much older backup that was missing several pages that I had created since then. I put the missing pages on my "to do" list to re-create, but never got around to doing it.

Just today in a thread elsewhere on PF, someone mentioned the Wayback Machine, which I had forgotten about. I used it to search for my Web site, and voilà, there were the missing pages! :!)

There was just one problem. The pages are part of a large photo gallery, and all the images are in a directory that I've forbidden to Web crawlers via a robots.txt file. (Some people were slurping hundreds of pictures at once, and bogging the server down.) So the Wayback Machine has the Web page text, including the picture captions, but not the images. :cry:

At least I can find the images again in my collection, based on the captions and URLs, but it will take some time to track them down and fix them up in Photoshop again. To make things easier in the future, I've added an entry to robots.txt that allows the Wayback Machine's crawler to fetch my images, while still forbidding other crawlers from doing so.
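The post does not show the robots.txt entry itself, so the following is only a sketch of what such a rule set might look like, together with a quick check using Python's standard `urllib.robotparser`. Two things here are assumptions: the user-agent token `ia_archiver` (the name historically associated with the Internet Archive's crawler in robots.txt rules) and the directory `/gallery/images/`, which is a made-up stand-in for the real image directory.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the crawler name and the path are assumptions,
# not the actual entry from the original site.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow:

User-agent: *
Disallow: /gallery/images/
"""


def is_allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    image = "/gallery/images/photo.jpg"
    # An empty Disallow line means "nothing is disallowed" for that agent.
    print("ia_archiver ->", is_allowed("ia_archiver", image))
    # Any other crawler falls through to the catch-all "*" entry.
    print("SomeOtherBot ->", is_allowed("SomeOtherBot", image))
    # Pages outside the image directory stay open to everyone.
    print("SomeOtherBot, page ->", is_allowed("SomeOtherBot", "/gallery/index.html"))
```

The specific entry for the archive crawler comes first and leaves everything open to it, while the catch-all entry keeps the image directory closed to every other crawler that honors robots.txt. Crawlers that ignore the file are unaffected either way.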
 
"About the Wayback Machine:
Browse through 85 billion web pages archived from 1996 to a few months ago."

"Thank you very much for considering us in your charitable giving. We appreciate and rely on donations from people like yourselves. The Internet Archive is a 501(c)(3) non-profit organization, therefore your donations are tax deductible as allowed by law."
 