Browsing Usenet Archives: 1996-1999 on Google

  • Thread starter: davee123
  • Tags: Google
SUMMARY

This discussion focuses on the challenges of browsing Usenet archives from 1996-1999 via Google Groups. Users experience slow navigation, with only 30 topics displayed at a time, and encounter bot detection that blocks access after repeated requests. Alternatives like contacting Usenet servers for historical data are considered, but these typically do not maintain complete archives. The conversation highlights the unique situation of Google Groups, especially after its acquisition of DejaNews, and the uncertainty surrounding the availability of a downloadable archive.

PREREQUISITES
  • Understanding of Usenet and NNTP (Network News Transfer Protocol)
  • Familiarity with Google Groups and its historical context
  • Knowledge of web scraping and bot detection mechanisms
  • Basic awareness of data archiving practices in online forums
NEXT STEPS
  • Research alternatives to Google Groups for accessing Usenet archives
  • Explore NNTP clients that allow for bulk downloading of Usenet data
  • Investigate the history and features of DejaNews and its transition to Google Groups
  • Look into academic institutions that may have archived Usenet data
USEFUL FOR

This discussion is beneficial for researchers, historians, and anyone interested in accessing historical Usenet data, as well as developers and data analysts exploring web scraping and data retrieval challenges.

davee123
I'm interested in browsing Usenet archives for a couple of groups from roughly 1996-1999.

Google has them, and lets me browse through them terribly slowly, viewing 30 or so topics at a time, ordered by the most recent post in each topic (i.e., if someone started a topic in 1996 and someone else replied to it in 2003, the topic shows up under 2003, not 1996). What's worse, after enough page hits from clicking "show older" repeatedly, Google decides I'm a bot caching its content and blocks me.
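To make the ordering problem concrete, here is a minimal sketch, assuming you had already collected topic listings with both dates attached. The `Topic` record and the `started_between` helper are hypothetical names made up for illustration, not anything Google Groups provides. It picks topics by when they were started rather than by when they were last replied to:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Topic:
    # Hypothetical record for one topic in a listing.
    title: str
    first_post: date   # when the topic was started
    last_reply: date   # what the listing is actually sorted by


def started_between(topics, start, end):
    """Keep topics whose FIRST post falls in [start, end], oldest first."""
    hits = [t for t in topics if start <= t.first_post <= end]
    return sorted(hits, key=lambda t: t.first_post)


# Made-up sample data: topic "a" started in 1996 but was revived in 2003,
# so a listing sorted by last reply would file it under 2003.
topics = [
    Topic("a", date(1996, 5, 1), date(2003, 2, 1)),
    Topic("b", date(2001, 1, 1), date(2001, 1, 2)),
    Topic("c", date(1996, 1, 15), date(1996, 1, 20)),
]
for t in started_between(topics, date(1996, 1, 1), date(1999, 12, 31)):
    print(t.first_post, t.title)
```

The catch is that a revived topic can sit anywhere in a listing sorted by last reply, so you would still have to page through everything to be sure of finding all the topics started in 1996-1999.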

Anyone know if Google has any way to get a full archive in a nice downloadable chunk?

I guess the other option is to go after one of the Usenet servers and request as much history as they have, but I don't think they usually keep full histories, do they?

DaveE
 
If Google are preventing bots from taking the data then I wouldn't think they'd have an open link allowing you to download it - you'd just open it up to bots again.
 
jarednjames said:
If Google are preventing bots from taking the data then I wouldn't think they'd have an open link allowing you to download it - you'd just open it up to bots again.

That's what I suspect. It's an odd situation, though, because NNTP content is typically available in downloadable swaths: you say how many days back you want to go, and bingo! But Google Groups is ... different. Further, since they effectively purchased the archive from DejaNews, and (I believe) DejaNews offered it for download (I'm not sure whether they charged, or how much history was available back when they existed), I wondered if Google simply had a different interface that I wasn't seeing.
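For anyone unfamiliar with the "how many days back" part, this is roughly what it looks like at the protocol level. The NNTP `NEWNEWS` command (RFC 3977) takes a group pattern plus a date and time, and the server replies with the message-IDs it has received since then. Below is a minimal sketch that only builds the command string; the helper name `newnews_command` and the group `misc.test` are just placeholders for illustration, and it doesn't open a connection:

```python
from datetime import datetime, timedelta, timezone


def newnews_command(group, days_back, now=None):
    """Build an NNTP NEWNEWS command for the last `days_back` days.

    Format per RFC 3977: NEWNEWS wildmat yyyymmdd hhmmss [GMT]
    """
    if now is None:
        now = datetime.now(timezone.utc)
    since = now - timedelta(days=days_back)
    # The trailing GMT tells the server the timestamp is in UTC.
    return f"NEWNEWS {group} {since:%Y%m%d} {since:%H%M%S} GMT"


# Placeholder group name; substitute the group you actually care about.
print(newnews_command("misc.test", 30))
```

Two caveats: `NEWNEWS` is an optional capability and many servers disable it, and in any case a server can only list articles it still retains, so this doesn't help with history the server has already expired.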

It's entirely in the realm of possibility that they don't want web spiders searching all their content, but that they're perfectly content to deliver the data in other formats. I don't really know.

I have found out that other NNTP services don't even come close to offering the history necessary. So unless some J. Random institution out there has decided to keep a bountiful history (it's POSSIBLE in the academic world), it seems like Google is the only source.

But of course Google just shut down their Google Groups discussion forums for help-- presumably because they're about to launch a new version of Google Groups. I'm left not really sure where to go next, other than send some hopeful emails into the black, bottomless pit of Google user requests. I wouldn't mind paying a small fee and agreeing not to publish the data-- but I doubt it's worth Google's time to set up such a transaction for the amount I'd be willing to pay.

DaveE
 
