Replace broken links with archived sources

Manasan3010 · Jul 28, 2019

I've seen some broken external links on Physics Forums whose text has been changed to BROKEN by the moderators. Is replacing the broken links with archived links (e.g. archive.org) a bad idea?

Greg Bernhardt · Jul 28, 2019

No, it's not a bad idea, but it's a matter of scale. We have nearly 700k threads. We'd need an army to go through them, check the links, and then replace the broken ones with archived links.

Manasan3010 · Jul 28, 2019

Greg Bernhardt said:

No, it's not a bad idea, but it's a matter of scale. We have nearly 700k threads. We'd need an army to go through them, check the links, and then replace the broken ones with archived links.

Is there a bot changing the text of broken links to "broken"? If so, can't you make the bot check the availability of the link on archive.org through their API and route the link to the archived copy?

Greg Bernhardt · Jul 28, 2019

Manasan3010 said:

Is there a bot changing the text of broken links to "broken"? If so, can't you make the bot check the availability of the broken link's page and route the link to the archived copy?

That was automated, but it was a one-time thing. It's my understanding that archive.org doesn't archive everything and is organized by snapshot date. How would a bot know what date a page was archived on, if it was archived at all? Sure, it's likely programmatically possible, but it's a lot of work, and we'd likely be blocked after sending archive.org hundreds of thousands of requests.

Also, during that first run some false positives were found. Servers can return some less-than-standard responses that confuse our simple checker. It's not something I want to rely on doing all the time.
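
To illustrate the false-positive problem, here is a minimal sketch of a more cautious status check. It is not the checker that was actually run on PF; the names `LinkStatus` and `classify_status` are made up for this example. The assumption is that only 404 and 410 are treated as definitely broken, and anything unusual (timeouts, 403, 5xx and so on) is left for a later retry or manual review rather than being marked BROKEN.

```python
from enum import Enum


class LinkStatus(Enum):
    OK = "ok"
    BROKEN = "broken"
    UNCERTAIN = "uncertain"


def classify_status(status_code):
    """Map an HTTP status code (or None for a network error) to a verdict.

    Hypothetical policy: only "Not Found" (404) and "Gone" (410) count as
    broken. Everything else that is not a success or redirect is uncertain,
    because servers sometimes answer bots with 403, 429, 5xx, etc.
    """
    if status_code is None:
        return LinkStatus.UNCERTAIN
    if 200 <= status_code < 400:
        return LinkStatus.OK
    if status_code in (404, 410):
        return LinkStatus.BROKEN
    return LinkStatus.UNCERTAIN
```

With a policy like this, a link would only be rewritten when the verdict is `LinkStatus.BROKEN`, which trades some missed dead links for fewer false positives.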

Wrichik Basu · Jul 28, 2019

Greg Bernhardt said:

That was automated, but it was a one-time thing. It's my understanding that archive.org doesn't archive everything and is organized by snapshot date. How would a bot know what date a page was archived on, if it was archived at all? Sure, it's likely programmatically possible, but it's a lot of work, and we'd likely be blocked after sending archive.org hundreds of thousands of requests.

Maybe you don't need the date. You could make a bot that takes a link from PF and uses the search option in the Wayback Machine. If the search returns a result (i.e. not null), the bot copies the URL of the latest snapshot and places it in the PF post.
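
A rough sketch of that lookup, assuming the Wayback Machine availability endpoint at `https://archive.org/wayback/available` and its JSON shape (`archived_snapshots` → `closest` → `available`/`url`) as I understand them. The function names are made up for illustration, and this is untested against PF's actual data.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url):
    """Build the availability query URL for one (possibly dead) link."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": url})


def closest_snapshot(payload):
    """Return the snapshot URL from a decoded JSON response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url, timeout=10):
    """Ask the API about one link (performs a network request)."""
    with urllib.request.urlopen(availability_query(url), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

A bot built on this would call `lookup` only for links already flagged as broken, and would need to pause between requests (for example with `time.sleep`) to reduce the chance of being blocked.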

Greg Bernhardt · Jul 28, 2019

Wrichik Basu said:

Maybe you don't need the date. You could make a bot that takes a link from PF and uses the search option in the Wayback Machine. If the search returns a result (i.e. not null), the bot copies the URL of the latest snapshot and places it in the PF post.

Let me know when it's ready.

The easiest solution is, when a broken link is found, to simply include a link to archive.org and let readers do the rest.

Manasan3010 · Jul 28, 2019

Greg Bernhardt said:

How would a bot know what date a page was archived on, if it was archived at all?

I am not sure what you mean by that.
Normally, if you want to see the last available copy of a page like google.com, you can get it through https://web.archive.org/web/https://www.google.com/
And you can replace https://www.google.com/ with the broken link.
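
In code, that rewrite is just a string prefix. A minimal sketch (the helper name `wayback_latest` is made up, and it relies on the `web.archive.org/web/<url>` pattern above redirecting to the most recent snapshot):

```python
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_latest(url):
    """Return a Wayback Machine link for the latest copy of `url`.

    No request is made, so this does not tell you whether a snapshot
    actually exists. Links that already point at the archive are
    returned unchanged.
    """
    if url.startswith(WAYBACK_PREFIX):
        return url
    return WAYBACK_PREFIX + url


print(wayback_latest("https://www.google.com/"))
```

Since no request goes to archive.org when the post is rewritten, this also avoids the risk of being blocked for sending too many queries.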

Greg Bernhardt · Jul 28, 2019

Manasan3010 said:

I am not sure what you mean by that.
Normally, if you want to see the last available copy of a page like google.com, you can get it through https://web.archive.org/web/https://www.google.com/
And you can replace https://www.google.com/ with the broken link.

That would be better, and I wish I had thought of it when I ran the checker.
