
Wikipedia Dumping

  1. Jul 22, 2006 #1
    I am trying to import the 5.6 GB Wikipedia XML dump into MySQL 5.0 using mwdumper. The problem is that the temperature of my hard disk reaches 50°C (122°F) after just 30,000 pages.

    Is there any way to prevent this or to import the wikipedia dump into MySQL in parts?
    Speed is not an issue so if there is a solution which can solve my problem but imports at a low speed, I wouldn't mind it.

    PS - I am getting about 35 pages/sec with mwdumper and 8 pages/sec with import.php.
    Last edited: Jul 22, 2006
  3. Jul 22, 2006 #2


    Science Advisor

    That sounds very interesting. What kind of data are you importing?
  4. Jul 22, 2006 #3


    Science Advisor
    Homework Helper

    Buy a fan for the HD.
  5. Jul 22, 2006 #4


    Science Advisor

    Or put your computer in your refrigerator temporarily, if it's at all feasible. Server rooms are usually refrigerated for this reason.
  6. Jul 22, 2006 #5
    The English Wikipedia (en.wikipedia.org) encyclopedia, nothing special.

    There is no way to route the computer's cables out of the refrigerator without keeping the door open, so that is not possible.

    Do you mean that there is no way to import the data in parts? Maybe I could modify the script to filter pages by the first letter of the page title and then run it once for each of the 26 letters?
  7. Jul 22, 2006 #6
    Or modify the script so you can pause it, or even pause the process itself. I don't know much about the import method, so I can't be much help.
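    Pausing the process can be sketched roughly as below. This is only an illustration under stated assumptions: it assumes a POSIX system (SIGSTOP and SIGCONT do not exist on Windows), and the name `duty_cycle` and its parameters are hypothetical, not part of mwdumper or import.php. It lets the import run for a while, suspends it so the disk can cool, and then resumes it.

```python
import os
import signal
import time


def duty_cycle(pid, run_seconds, pause_seconds, cycles,
               send=os.kill, sleep=time.sleep):
    """Alternately let process `pid` run and keep it suspended.

    `send` and `sleep` are injectable so the loop can be exercised
    without touching a real process.
    """
    for _ in range(cycles):
        sleep(run_seconds)              # importer works, disk heats up
        send(pid, signal.SIGSTOP)       # suspend the importer
        try:
            sleep(pause_seconds)        # disk is idle and cools down
        finally:
            send(pid, signal.SIGCONT)   # always resume, even on Ctrl-C
```

    For example, `duty_cycle(importer_pid, 600, 300, 100)` would give ten minutes of importing followed by five minutes of rest, a hundred times over, where `importer_pid` is whatever process ID the importer happens to have. Very long pauses may run into idle timeouts on the database connection, so short cycles are probably safer.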
  8. Jul 22, 2006 #7


    Science Advisor

    There's a way to do it so that it doesn't have to be done all at once, but exactly how depends on the structure of the XML file. It's not a terribly complicated process: you read from the XML file and insert into the database server. One way to do this is to split the XML file into smaller pieces by hand. That way you can keep using whatever script you're using now rather than writing a new one, but again, that depends on the structure of the XML.
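    A minimal sketch of the splitting idea, in Python. It assumes the dump is laid out the way MediaWiki exports usually are: a `<mediawiki>` root element, a header block before the first page, and each `<page>` and `</page>` tag on a line of its own. If the file is structured differently this will not work as written. The function name `iter_chunks` is hypothetical.

```python
from typing import Iterable, Iterator, List


def iter_chunks(lines: Iterable[str], pages_per_chunk: int) -> Iterator[str]:
    """Yield complete XML documents holding at most `pages_per_chunk` pages.

    Every chunk repeats the header (all lines before the first <page>)
    and is closed with </mediawiki>, so it can be imported on its own.
    """
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    header: List[str] = []
    body: List[str] = []
    pages = 0
    in_header = True
    for line in lines:
        tag = line.strip()
        if in_header:
            if tag != "<page>":
                header.append(line)
                continue
            in_header = False
        if tag == "</mediawiki>":
            break                       # end of the dump
        body.append(line)
        if tag == "</page>":
            pages += 1
            if pages == pages_per_chunk:
                yield "".join(header + body) + "</mediawiki>\n"
                body, pages = [], 0
    if body:                            # last, possibly shorter, chunk
        yield "".join(header + body) + "</mediawiki>\n"
```

    Passing an open file object as `lines` streams the dump line by line, so only one chunk is held in memory at a time. Each chunk can be written to its own file and fed to the same importer, with as long a break between files as the disk needs.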
  9. Jul 23, 2006 #8



    Staff: Mentor

    Box fan..........