Import Wikipedia Dump into MySQL 5.0 w/o Heat Issues

  • Thread starter: sid_galt
  • Tags: Wikipedia
AI Thread Summary
Importing a 5.6 GB Wikipedia XML dump into MySQL 5.0 using mwdumper can lead to high hard disk temperatures, reaching 50°C (122°F) after processing 30,000 pages. To mitigate this issue, users can consider several strategies. One option is to manually split the XML file into smaller segments, allowing for a more manageable import process. This can be done by filtering pages based on the first letter of the title and executing the import script for each segment. Additionally, implementing cooling solutions, such as using a fan, can help maintain acceptable hard drive temperatures during the import. While speed is not a primary concern, the current import rates are 35 pages per second with mwdumper and 8 pages per second with import.php.
sid_galt
I am trying to import a 5.6 GB Wikipedia XML dump into MySQL 5.0 using mwdumper. The problem is that the temperature of my hard disk reaches 50°C (122°F) after just 30,000 pages.

Is there any way to prevent this, or to import the Wikipedia dump into MySQL in parts?
Speed is not an issue, so I wouldn't mind a solution that imports slowly as long as it solves the problem.

PS - I am getting 35 pages/sec with mwdumper and 8 pages/sec with import.php.
 
That sounds very interesting. What kind of data are you importing?
 
Buy a fan for the HD.
 
Or put your computer in your refrigerator temporarily, if it's at all feasible. Server rooms are usually refrigerated for this reason.
 
-Job- said:
That sounds very interesting. What kind of data are you importing?

The Wikipedia (en.wikipedia.org) encyclopedia, nothing special.

-Job- said:
Or put your computer in your refrigerator temporarily, if it's at all feasible. Server rooms are usually refrigerated for this reason.

There is no way to route the computer's cables out of the refrigerator without keeping it open, so that's not possible.

Do you mean, then, that there is no way to import the data in parts? Maybe by modifying the script to filter pages according to the first letter of the title and then executing the script for each of the 26 letters?
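
Something like this is what I had in mind, as a rough, untested sketch: it assumes a MediaWiki-style dump where each article is a <page> element containing a <title>, and it streams the file so the whole 5.6 GB never sits in memory. The dump path and output file names are made up.

```python
import xml.etree.ElementTree as ET

DUMP = "enwiki-pages-articles.xml"   # hypothetical path to the dump


def localname(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def split_by_first_letter(dump_path):
    outputs = {}                             # first letter -> open file handle
    context = ET.iterparse(dump_path, events=("start", "end"))
    _, root = next(context)                  # grab the root <mediawiki> element
    for event, elem in context:
        if event != "end" or localname(elem.tag) != "page":
            continue
        title = ""
        for child in elem:
            if localname(child.tag) == "title":
                title = child.text or ""
                break
        # Bucket by first letter of the title; anything non-alphabetic goes to "_"
        letter = title[:1].upper() if title[:1].isalpha() else "_"
        if letter not in outputs:
            outputs[letter] = open(f"pages_{letter}.xml", "wb")
        outputs[letter].write(ET.tostring(elem))
        root.clear()                         # drop processed pages to keep memory flat
    for fh in outputs.values():
        fh.close()


if __name__ == "__main__":
    split_by_first_letter(DUMP)
```

Each per-letter file would presumably still need the <mediawiki> wrapper and <siteinfo> header copied from the original dump before mwdumper would accept it.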
 
Or modify the script so you can pause it, or even pause the process. I don't know much about the import method, so I can't be much help.
 
There's a way to do it so that it doesn't have to be done all at once, but exactly how depends on the structure of the XML file. It's not a terribly complicated process: you read from the XML file and insert into the database server. One way would be to manually split the XML file into smaller pieces, so you can still use whatever script you're already using rather than write a new one, though again that depends on the structure of the XML.
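
As a rough sketch of that splitting, assuming (as is typical of MediaWiki exports) that <page> and </page> each appear on their own line; the path, chunk size, and file names here are arbitrary:

```python
DUMP = "enwiki-pages-articles.xml"   # hypothetical path to the dump
PAGES_PER_CHUNK = 100_000            # arbitrary chunk size


def split_dump(dump_path, pages_per_chunk=PAGES_PER_CHUNK):
    header = []          # everything before the first <page>: <mediawiki>, <siteinfo>, ...
    chunk_index = 0
    page_count = 0
    out = None
    collecting_header = True

    with open(dump_path, encoding="utf-8") as src:
        for line in src:
            stripped = line.lstrip()
            if collecting_header:
                if stripped.startswith("<page>"):
                    collecting_header = False    # fall through and write this page
                else:
                    header.append(line)
                    continue
            if stripped.startswith("</mediawiki>"):
                break                            # end of the dump
            if out is None:                      # start a new chunk at a page boundary
                chunk_index += 1
                out = open(f"chunk_{chunk_index:03d}.xml", "w", encoding="utf-8")
                out.writelines(header)           # each chunk repeats the dump header
            out.write(line)
            if stripped.startswith("</page>"):
                page_count += 1
                if page_count % pages_per_chunk == 0:
                    out.write("</mediawiki>\n")  # close the chunk so it is valid XML
                    out.close()
                    out = None

    if out is not None:                          # close the final, partial chunk
        out.write("</mediawiki>\n")
        out.close()


if __name__ == "__main__":
    split_dump(DUMP)
```

Each chunk repeats the original header and ends with </mediawiki>, so it should be importable on its own; you could then run mwdumper on one chunk at a time and let the disk cool off in between.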
 