Replace broken links with archived sources

Manasan3010
I've seen some broken external links on Physics Forums which have been changed to "BROKEN" by moderators. Is replacing the broken links with archived links (e.g. archive.org) a bad idea?
 
No, it's not a bad idea, but it's a matter of scale. We have nearly 700k threads. We'd need an army to go through them, check the links and then replace the broken ones with archived links.
 
Greg Bernhardt said:
No it's not a bad idea, but it's a matter of scale. We have near 700k threads. We'd require an army to go through them, check links and then replace broken ones with archived links.
Is there a bot changing the text of broken links to "broken"? If so, can't you make the bot check the availability of the link on Archive.org through their API and route the link to the archived copy?
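For what it's worth, archive.org does expose a small public endpoint for this, the Wayback Availability API at `https://archive.org/wayback/available`. It takes a `url` parameter and an optional `timestamp` (1 to 14 digits, `YYYYMMDDhhmmss`). With a timestamp it returns the capture closest to that date, without one the most recent capture, and an empty `archived_snapshots` object if nothing is archived. Here is a minimal Python sketch assuming that response shape. The function names are made up for illustration, and passing a post's date as the timestamp is just one idea for picking a snapshot.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"

def availability_query(url, timestamp=None):
    """Build the Availability API request URL (hypothetical helper).

    timestamp: optional 'YYYYMMDDhhmmss' prefix, e.g. the post date '20190315'.
    Without it, the API is documented to return the most recent capture.
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)

def closest_snapshot(response_json):
    """Return the archived URL from an API response, or None if not archived."""
    data = json.loads(response_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None

def lookup(url, timestamp=None, timeout=10):
    """Query the live API (needs network access; one request per call)."""
    with urlopen(availability_query(url, timestamp), timeout=timeout) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Splitting the query building and the parsing from the network call makes it easy to test the parsing offline and to wrap `lookup` in whatever throttling the bot needs.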
 
Manasan3010 said:
Is there a bot changing the text of broken links to "broken"? If so, can't you make the bot check the availability of the broken link's page and route the link to the archived copy?
That was automated, but a one-time thing. It's my understanding that archive.org doesn't archive everything and is organized by snapshot date. How would a bot know what date the page was archived on, if it was archived at all? Sure, it's likely programmatically possible, but it's a lot of work, and we'd likely be blocked after sending archive.org hundreds of thousands of requests.

Also, during that first run there were false positives. Servers can respond with less-than-standard responses and confuse our simple checker. It's not something I want to rely on all the time.
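On the false positives: one common mitigation is to treat only unambiguous status codes as "broken" and to require the same verdict on several separate runs before touching a post. The sketch below assumes that policy; the cut-offs and the three-run rule are my own choices, not how the PF checker worked. Codes like 403, 405 or 429 often mean a server is blocking bots, rejecting a `HEAD` request or rate limiting, not that the page is gone.

```python
def classify_status(status):
    """Map an HTTP status code (None = connection failure) to a verdict.

    Hypothetical, deliberately conservative policy: only codes that clearly
    mean "gone" count as broken; anything ambiguous is left for a human.
    """
    if status is None:
        return "unreachable"   # DNS failure, timeout, refused connection
    if 200 <= status < 400:
        return "ok"            # success or redirect
    if status in (404, 410):
        return "broken"        # Not Found / Gone
    return "uncertain"         # 401, 403, 405, 429, 5xx...: often bot blocking or temporary

def is_confirmed_broken(history, runs=3):
    """True only if the last `runs` verdicts (oldest first) were all 'broken'."""
    recent = history[-runs:]
    return len(recent) == runs and all(v == "broken" for v in recent)
```

Requiring agreement across runs spread over days or weeks filters out temporary outages, at the cost of reacting more slowly to links that really are dead.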
 
Greg Bernhardt said:
That was automated, but a one-time thing. It's my understanding that archive.org doesn't archive everything and is organized by snapshot date. How would a bot know what date the page was archived on, if it was archived at all? Sure, it's likely programmatically possible, but it's a lot of work, and we'd likely be blocked after sending archive.org hundreds of thousands of requests.
Maybe you don't need the date. You could make a bot that takes a link from PF, uses the search option in the Wayback Machine, and, if the search returns any results (i.e. not null), copies the URL of the latest snapshot and places it in PF.
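The loop Wrichik describes could be sketched like this, with a fixed minimum delay between requests to keep the load on archive.org polite. `lookup` is any function that returns an archived URL or `None` (for example a wrapper around the Wayback Availability API); the function name and the one-second default interval are assumptions, not a published archive.org limit. Passing `sleep` and `clock` in as parameters keeps it testable without real waiting.

```python
import time

def archive_links(urls, lookup, min_interval=1.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Look up each URL, waiting at least `min_interval` s between requests.

    Hypothetical helper. Returns {url: archived_url} for hits only; URLs
    with no snapshot (lookup returned None) are simply left out.
    """
    results = {}
    last = None
    for url in urls:
        if last is not None:
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)    # throttle: never faster than one request per interval
        last = clock()
        archived = lookup(url)
        if archived:           # null result: nothing archived, keep the link as is
            results[url] = archived
    return results
```

At one request per second, 100,000 lookups take 100,000 s, a bit under 28 hours, so a full pass over a large forum would be a multi-day background job rather than a quick script.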
 
Wrichik Basu said:
Maybe you don't need the date. You could make a bot that takes a link from PF, uses the search option in the Wayback Machine, and, if the search returns any results (i.e. not null), copies the URL of the latest snapshot and places it in PF.

Let me know when it's ready :-p

Easiest solution is if a broken link is found, simply include a link to archive.org and they can do the rest :wink:
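Greg's zero-request version can be automated too, without calling any API. As far as I know, the Wayback Machine's own URL scheme lists all captures of a page at `https://web.archive.org/web/*/<url>` and redirects `https://web.archive.org/web/<timestamp>/<url>` to the capture nearest that timestamp. A tiny sketch that builds such a link; both function names and the "BROKEN:" note format are hypothetical.

```python
def wayback_link(url, timestamp=None):
    """Build a Wayback Machine link for `url` without sending any request.

    No timestamp: link to the capture list for the URL.
    With a 'YYYYMMDD...' timestamp (e.g. the post date): link to the nearest capture.
    """
    if timestamp:
        return f"https://web.archive.org/web/{timestamp}/{url}"
    return f"https://web.archive.org/web/*/{url}"

def broken_link_note(url, post_date=None):
    """Hypothetical replacement text for a dead link; the marker format is made up."""
    return f"BROKEN: {url} (archived copies: {wayback_link(url, post_date)})"
```

Whether a given page was ever captured is left for the reader to find out on archive.org, which is exactly the trade-off Greg suggests.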
 
