
How to download entire Forums in PF

  1. Jul 26, 2017 #1
    I want to download the entire Special and General Relativity forum message archives so I can read them offline and search them, as there are so many gems inside. What software should I use to download them? Manually saving each thread would take too long. Thanks.
     
  3. Jul 26, 2017 #2
  4. Jul 26, 2017 #3
    Have you done it successfully? It says "HTTrack has detected that the mirror is empty".

    Isn't this illegal or discouraged by web owners? If it is, then let's move our messages to a private conversation. If anyone has successfully downloaded an entire forum, please private message me if you don't want to share it publicly. Thanks.
     
  5. Jul 26, 2017 #4
    here's the error log:

    HTTrack3.49-2+htsswf+htsjava launched on Thu, 27 Jul 2017 10:48:10 at http://www.physicsforums.com +*.png +*.gif +*.jpg +*.jpeg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar
    (winhttrack -qYC2%Ps2u1%s%uN0%I0p3DaK0H0%kf2A25000%f#f -F "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" -%F "<!-- Mirrored from %s%s by HTTrack Website Copier/3.x [XR&CO'2014], %s -->" -%l "en, *" -Y http://www.physicsforums.com -O1 "d:\My Web Sites\p6" +*.png +*.gif +*.jpg +*.jpeg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar )
    Information, Warnings and Errors reported for this mirror:
    note: the hts-log.txt file, and hts-cache folder, may contain sensitive information,
    such as username/password authentication for websites mirrored in this project
    do not share these files/folders if you want these information to remain private
    10:48:11 Warning: Moved Permanently for www.physicsforums.com/robots.txt
    10:48:11 Warning: Redirected link is identical because of 'URL Hack' option: www.physicsforums.com/robots.txt and https://www.physicsforums.com/robots.txt
    10:48:11 Warning: Warning moved treated for www.physicsforums.com/robots.txt (real one is https://www.physicsforums.com/robots.txt)
    10:48:11 Warning: Moved Permanently for www.physicsforums.com/
    10:48:11 Warning: Redirected link is identical because of 'URL Hack' option: www.physicsforums.com/ and https://www.physicsforums.com/
    10:48:11 Warning: File has moved from www.physicsforums.com/ to https://www.physicsforums.com/
    10:48:11 Warning: No data seems to have been transferred during this session! : restoring previous one!
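
Reading the log above: every warning concerns the redirect from the http address to the https one, and the final line reports that no data was transferred, which appears to match the "mirror is empty" message. Below is a minimal sketch that pulls those two facts out of such a log. The function name `summarize_log` is hypothetical, and the patterns are based only on the wording of the lines in this particular log, not on HTTrack documentation.

```python
import re

# Pattern based only on the "File has moved from A to B" line in the log above;
# other HTTrack versions may word this differently (assumption).
MOVED_RE = re.compile(r"File has moved from (\S+) to (\S+)")
NO_DATA_MARKER = "No data seems to have been transferred"


def summarize_log(log_text):
    """Return (redirects, empty_mirror) for an hts-log.txt style text."""
    redirects = MOVED_RE.findall(log_text)
    empty_mirror = NO_DATA_MARKER in log_text
    return redirects, empty_mirror


# Two lines copied from the log posted above.
sample = """\
10:48:11 Warning: File has moved from www.physicsforums.com/ to https://www.physicsforums.com/
10:48:11 Warning: No data seems to have been transferred during this session! : restoring previous one!
"""

redirects, empty_mirror = summarize_log(sample)
for old, new in redirects:
    print(f"redirect: {old} -> {new}")
print("mirror empty:", empty_mirror)
```

This only reads a log that already exists on disk; it makes no requests to any site.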
     
  6. Jul 26, 2017 #5
    No, it's not illegal; in fact, the original internet encouraged that sort of thing.
    I haven't tried it on this site, but I have on others.
    You can't download the database of a site, but you can download all the HTML, JavaScript, images, etc.
    Downloading the actual database of a site is not something most site admins would want to agree to.
     
    Last edited: Jul 26, 2017
  7. Jul 26, 2017 #6
    Please don't do this. It can kill our bandwidth.
     
  8. Jul 26, 2017 #7
    Ok. And I think your robots are guarding the place to avoid any downloading... anyway. Hehe...
     
  9. Jul 26, 2017 #8
    You mean that even on other websites with forums, you can't download the messages either?

    I hope there is an option, even a paid one, for retrieving the archive collection. Maybe Greg Bernhardt can offer this someday?
     
  10. Jul 26, 2017 #9
    What prevents you from staying online?
     
  11. Jul 26, 2017 #10
    Just for backup, in case the entire database gets wiped out, for example by an EMP from North Korea or other events you never expected (like a CME burst).
     
  12. Jul 26, 2017 #11
    If something like that happens you have more important things to think about than loading up your backup of PF ;)
     
  13. Jul 26, 2017 #12
    Or just a virus or hack that could destroy the database (don't you worry about that?). The contents are gems, and they could recreate 21st-century physics if we were sent back to, say, the time of Newton :)
     
  14. Jul 26, 2017 #13
    Don't worry, I have backups :)
     
  15. Jul 26, 2017 #14

    mfb

    2016 Award

    Staff: Mentor

    We have 17,400 threads in the special relativity section, many of them with multiple pages. Downloading their HTML view would be many gigabytes of traffic (or even more if the script just followed every link). The pages wouldn't be very useful as a backup either: they don't contain all the relevant data, and what they do contain is in a format unsuited to backups.
    I don't think any forum likes a huge amount of unnecessary extra traffic.
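
The "many gigabytes" figure can be sanity-checked with rough arithmetic. This is a minimal sketch, not a measurement: the thread count of 17,400 comes from the post above, while the pages-per-thread and kilobytes-per-page values are purely illustrative assumptions, and `estimate_traffic_gb` is a hypothetical helper name.

```python
def estimate_traffic_gb(threads, pages_per_thread, kb_per_page):
    """Rough total download size in gigabytes (1 GB = 1,000,000 kB)."""
    return threads * pages_per_thread * kb_per_page / 1_000_000


THREADS = 17_400  # thread count quoted in the post above

# Assumed scenarios: (average pages per thread, average kB per HTML page).
# These numbers are guesses for illustration, not data from the site.
scenarios = [(1, 100), (2, 150), (3, 300)]

for pages, kb in scenarios:
    total = estimate_traffic_gb(THREADS, pages, kb)
    print(f"{pages} page(s)/thread at {kb} kB/page: about {total:.1f} GB")
```

Even under the smallest assumption here the total exceeds a gigabyte, and the estimate ignores images, scripts, and any extra links a crawler might follow.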
     
  16. Jul 27, 2017 #15
    I think the following would be reasonable.

    Is there any script or software that can open each thread one at a time and then save every page? This is not only for Physics Forums but for countless other forum sites out there.
     
  17. Jul 27, 2017 #16

    mfb

    2016 Award

    Staff: Mentor

    You can manually open every thread and manually save it if you like. It will take you something like a week - just for the relativity section.
     
  18. Jul 27, 2017 #17
    Yes, plenty of programs exist. I thought you agreed you would not do this? You would use up a good chunk of the bandwidth that we pay for.
     
  19. Jul 27, 2017 #18
    There is no software that can do this. That's why mfb suggested manually saving them one by one for a week.
     
  20. Jul 27, 2017 #19

    mfb

    2016 Award

    Staff: Mentor

    I didn't suggest it. I said it is possible, but a bad idea.
     
  21. Jul 27, 2017 #20

    Charles Link

    Homework Helper

    I think the OP should first try to contribute something to the forum rather than seeing how much he can get from it.
     