How to download entire Forums in PF

  • Thread starter Rainbows_
In summary, there are software programs available that can download and save individual forum threads, but downloading the entire Special and General Relativity forum archives would take a significant amount of time and use up a large amount of bandwidth. This is not encouraged by web owners, and manually saving each thread would be extremely time-consuming. It is suggested to not download the entire forum and instead stay online for access.
  • #1
Rainbows_
I want to download the entire Special and General Relativity forum message archives so I can read them offline and search them, as there are so many gems inside. What software should I use to download them? Manually saving each thread would take too long. Thanks.
 
  • #3
rootone said:
Something like this should do what you want.
https://www.httrack.com/

Have you done it successfully? It says "HTTrack has detected that the mirror is empty".

Isn't this illegal or discouraged by web owners? If it is, then let's move our messages to a private conversation. If anyone has successfully downloaded an entire forum, please private message me if you don't want to share it publicly. Thanks.
 
  • #4
here's the error log:

HTTrack3.49-2+htsswf+htsjava launched on Thu, 27 Jul 2017 10:48:10 at https://www.physicsforums.com +*.png +*.gif +*.jpg +*.jpeg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar
(winhttrack -qYC2%Ps2u1%s%uN0%I0p3DaK0H0%kf2A25000%f#f -F "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" -%F "<!-- Mirrored from %s%s by HTTrack Website Copier/3.x [XR&CO'2014], %s -->" -%l "en, *" -Y https://www.physicsforums.com -O1 "d:\My Web Sites\p6" +*.png +*.gif +*.jpg +*.jpeg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar )
Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may contain sensitive information,
such as username/password authentication for websites mirrored in this project
do not share these files/folders if you want these information to remain private
10:48:11 Warning: Moved Permanently for www.physicsforums.com/robots.txt
10:48:11 Warning: Redirected link is identical because of 'URL Hack' option: www.physicsforums.com/robots.txt and https://www.physicsforums.com/robots.txt
10:48:11 Warning: Warning moved treated for www.physicsforums.com/robots.txt (real one is https://www.physicsforums.com/robots.txt)
10:48:11 Warning: Moved Permanently for www.physicsforums.com/
10:48:11 Warning: Redirected link is identical because of 'URL Hack' option: www.physicsforums.com/ and https://www.physicsforums.com/
10:48:11 Warning: File has moved from www.physicsforums.com/ to https://www.physicsforums.com/
10:48:11 Warning: No data seems to have been transferred during this session! : restoring previous one!
 
  • #5
No it's not illegal, in fact the original internet encouraged that sort of thing.
I haven't tried it on this site, but I have on others.
You can't download the database of a site, but you can download all the HTML, JavaScript, images, etc.
Downloading the actual database of a site is not something most site admins would want to agree to.
 
  • #6
Please don't do this. It can kill our bandwidth.
 
  • Like
Likes davenn and symbolipoint
  • #7
Greg Bernhardt said:
Please don't do this. It can kill our bandwidth.

Ok. And I think your robots are guarding the place to avoid any downloading... anyway. Hehe...
 
  • Like
Likes Greg Bernhardt
  • #8
rootone said:
No it's not illegal, in fact the original internet encouraged that sort of thing.
I haven't tried it on this site, but I have on others.
You can't download the database of a site, but you can download all the HTML, JavaScript, images, etc.
Downloading the actual database of a site is not something most site admins would want to agree to.

You mean even in other web sites with forums.. you can't download the messages too?

I hope there is an option, even a paid one, for retrieving an archive collection. Maybe Greg Bernhardt can offer this someday?
 
  • #9
Rainbows_ said:
I hope there is option even for a paid archive collection retrieval.. Maybe Greg Bernhardt can offer this someday?
What prevents you from staying online?
 
  • #10
Greg Bernhardt said:
What prevents you from staying online?

Just for backup. In case the entire database gets wiped out.. for example from EMP from north korea or other events you never expected (like CME burst).
 
  • #11
Rainbows_ said:
Just for backup. In case the entire database gets wiped out.. for example from EMP from north korea or other events you never expected (like CME burst).
If something like that happens you have more important things to think about than loading up your backup of PF ;)
 
  • #12
Greg Bernhardt said:
If something like that happens you have more important things to think about than loading up your backup of PF ;)

Or just a virus or hack that can destroy the database (don't you get worried?). The contents are gems and they could recreate 21st century physics if we were back in, say, the time of Newton :)
 
  • #13
Rainbows_ said:
Or just a virus or hack that can destroy the database (don't you get worried?). The contents are gems and they could recreate 21st century physics if we were back in, say, the time of Newton :)
Don't worry, I have backups :)
 
  • Like
Likes Rainbows_
  • #14
We have 17,400 threads in the special relativity section, many of them with multiple pages. Downloading their HTML view would be many gigabytes of traffic (or even more if the script would just follow every link). They wouldn't be very useful as backup either, because they don't have all the relevant data, and they have it in a format not useful for backups.
Rainbows_ said:
You mean even in other web sites with forums.. you can't download the messages too?
I don't think any forum likes a huge amount of unnecessary extra traffic.
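
To put rough numbers on the estimate above, here is a minimal back-of-the-envelope sketch. Only the 17,400 thread count comes from the post; the average of 2 pages per thread and 150 KB per HTML page are assumptions picked purely for illustration, and the function name is hypothetical.

```python
def estimate_traffic_gb(threads, pages_per_thread, kb_per_page):
    """Rough total download size in decimal gigabytes (1 GB = 1,000,000 KB)."""
    total_kb = threads * pages_per_thread * kb_per_page
    return total_kb / 1_000_000


# 17,400 threads is from the post above; the other two figures are assumed.
THREADS = 17_400
ASSUMED_PAGES_PER_THREAD = 2
ASSUMED_KB_PER_PAGE = 150

total_gb = estimate_traffic_gb(THREADS, ASSUMED_PAGES_PER_THREAD, ASSUMED_KB_PER_PAGE)
print(f"Estimated traffic: {total_gb:.2f} GB")
```

Even with these modest assumptions the total lands in the multi-gigabyte range, and a crawler that follows every link (images, scripts, member pages) would fetch considerably more.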
 
  • Like
Likes symbolipoint
  • #15
mfb said:
We have 17,400 threads in the special relativity section, many of them with multiple pages. Downloading their HTML view would be many gigabytes of traffic (or even more if the script would just follow every link). They wouldn't be very useful as backup either, because they don't have all the relevant data, and they have it in a format not useful for backups. I don't think any forum likes a huge amount of unnecessary extra traffic.

I think the following would be reasonable.

Is there any script or software that can open each thread one at a time and save every page? This is not only for Physics Forums but for countless other forum sites out there.
 
  • #16
You can manually open every thread and manually save it if you like. It will take you something like a week - just for the relativity section.
 
  • #17
Rainbows_ said:
I think the following would be reasonable.

Is there any script or software that can open each thread one at a time and save every page? This is not only for Physics Forums but for countless other forum sites out there.

Yes plenty of programs exist. I thought you agreed you would not do this? You would use up a good chunk of our bandwidth that we pay for.
 
  • Like
Likes Vanadium 50
  • #18
Greg Bernhardt said:
Yes plenty of programs exist. I thought you agreed you would not do this? You would use up a good chunk of our bandwidth that we pay for.

There is no software that can do this.. that's why mfb suggested to manually save it one by one for a week.
 
  • #19
I didn't suggest it. I said it is possible, but a bad idea.
 
  • #20
I think the OP should first try to contribute something to the forum rather than seeing how much he can get from it.
 
  • Like
Likes S.G. Janssens
  • #21
Charles Link said:
I think the OP should first try to contribute something to the forum rather than seeing how much he can get from it.

Yup. Anyway just install a bandwidth limiter so it can avoid any similar attempts in the future by others. I'm very poor in computers and others may be more clever to do it. And it's ok if this thread is deleted to avoid encouraging others. Thanks.
 
  • #22
Greg Bernhardt said:
Yes plenty of programs exist. I thought you agreed you would not do this? You would use up a good chunk of our bandwidth that we pay for.

By the way, just out of curiosity: do you have a fixed bandwidth allocation per month (say, 3 gigabytes for all access) that forum-retrieval software could exceed? Or is the allocation unlimited, and your concern is only that the site becomes very slow while people are downloading forums? In an age of 20 Mbps fiber connections, one can download gigabytes in under 10 minutes, and if this happens at midnight when most members are asleep, the effect won't be felt.

Well. Just asking. I believe in karma and I don't want you to shoulder additional cost (or lose money) for an excellent service.

(I thought this thread would be deleted.. but it's ok too if this thread would be visible only to the participants (of this thread) or become a private conversation due to some classified data within).
 
  • #23
Most websites other than giant corporations exist on what are called server farms.
I am pretty sure that is the case with PF.
The site owner pays a monthly or something fee to rent some of that server capacity.
There isn't any politics about it, you pay the server farm for a service, and they supply it,
(unless the site breaks rules of the server farm, like porn for instance in a lot of cases, or criminal activity)
Site admins do of course have rules for their own site, but on PF I only have seen threads deleted because of crackpot nonsense.
 
  • #24
Rainbows_ said:
By the way, just out of curiosity: do you have a fixed bandwidth allocation per month (say, 3 gigabytes for all access) that forum-retrieval software could exceed? Or is the allocation unlimited, and your concern is only that the site becomes very slow while people are downloading forums? In an age of 20 Mbps fiber connections, one can download gigabytes in under 10 minutes, and if this happens at midnight when most members are asleep, the effect won't be felt.
It's not about Mbps but the total bandwidth served.
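
The distinction between connection speed and total data served can be made concrete with a little arithmetic. The sketch below only converts the 20 Mbps figure from the quoted post into transfer times; the 5 GB mirror size is an assumed example, not a measured number, and the function name is hypothetical.

```python
def transfer_minutes(gigabytes, megabits_per_second):
    """Minutes needed to move `gigabytes` (decimal GB) at the given line rate."""
    megabits = gigabytes * 8_000  # 1 GB = 1,000 MB = 8,000 megabits
    return megabits / megabits_per_second / 60


LINE_RATE_MBPS = 20     # figure quoted in the post above
ASSUMED_MIRROR_GB = 5   # hypothetical size of a forum mirror

print(f"1 GB at 20 Mbps: {transfer_minutes(1, LINE_RATE_MBPS):.1f} minutes")
print(f"{ASSUMED_MIRROR_GB} GB at 20 Mbps: "
      f"{transfer_minutes(ASSUMED_MIRROR_GB, LINE_RATE_MBPS):.1f} minutes")
```

A fast link shortens the wait for the downloader, but the host still serves (and pays for) the same number of gigabytes whether the transfer takes minutes or hours.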
 
  • #25
Rainbows_ said:
Anyway just install a bandwidth limiter...
From what I've seen, I think a bandwidth limiter is already installed... :-p :biggrin:

Rainbows1.jpg
 
  • Like
Likes dlgoff and Greg Bernhardt
  • #26
I thought you might...
Greg Bernhardt likes this.
I mean... I hoped you might... :nb)
 
  • #27
Bandwidth edit 2.jpg


Hey, c'mon guys... that isn't funny... :frown:
 
  • #28
I'd no longer save the entire web site... if it is even possible.. because I don't want Greg to lose money.

I just want to save all the messages of Arnold Neumaier because he is the most brilliant and talented person on the net. The way he writes, and his mathematical equations, don't seem to have been written (or thought up) by a mere human, but by the harbinger of a new breed of human, and I think he could be a Nobel Prize recipient someday. So I'll just save each of his messages, though a script to browse the site and search for and save only Neumaier's messages would be helpful.
 
  • #30

1. How do I download an entire forum in PF?

To download an entire forum in PF, you will need to use a web scraping tool or program. This will allow you to automatically extract all the content from the forum and save it onto your computer.

2. What is the best web scraping tool for downloading forums in PF?

There are many great web scraping tools available, but some popular options for downloading forums in PF include Scrapy, Beautiful Soup, and Octoparse. It is recommended to do some research and choose the tool that best fits your needs and technical abilities.

3. Can I download forums in PF without using a web scraping tool?

It is possible to manually download forum content in PF, but it can be very time-consuming and tedious. Using a web scraping tool will save you a lot of time and effort.

4. Is it legal to download entire forums in PF?

Downloading forums in PF is generally considered legal, as long as you are not using the content for commercial purposes or violating any copyright laws. However, it is always best to check the forum's terms of use or contact the forum owner for permission before downloading any content.

5. How can I ensure that I am not violating any forum rules or policies when downloading content in PF?

Before downloading any forum content in PF, it is important to review the forum's rules and policies. Make sure to follow any guidelines or restrictions that the forum has in place, such as not downloading content for commercial use or not redistributing the downloaded content without permission.
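
One machine-readable place where a site states its crawling policy is its robots.txt file. As a minimal sketch, Python's standard `urllib.robotparser` module can check whether a given URL is allowed for a given user agent. The robots.txt text, the `forum.example` URLs, and the `MyArchiver` user agent below are all made up for illustration and are not Physics Forums' actual rules. A robots.txt file also does not replace the site's terms of use or the owner's explicit wishes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Crawl-delay: 10
"""


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "MyArchiver" and the forum.example URLs are placeholders.
print(parser.can_fetch("MyArchiver", "https://forum.example/search/?q=relativity"))
print(parser.can_fetch("MyArchiver", "https://forum.example/threads/1/"))
print(parser.crawl_delay("MyArchiver"))
```

For a real site, the parser would instead be pointed at the live file with `set_url()` and `read()`. Respecting any crawl delay it reports is the minimum courtesy, and asking the owner first remains the safer route.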
