Import Wikipedia Dump into MySQL 5.0 w/o Heat Issues

  • Thread starter: sid_galt
  • Tags: Wikipedia

Discussion Overview

The discussion revolves around the challenges of importing a large Wikipedia XML dump into MySQL 5.0, particularly focusing on the overheating of the hard disk during the process. Participants explore potential solutions for importing the data in parts or at a slower rate to mitigate heat issues.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Exploratory

Main Points Raised

  • One participant reports that their hard disk temperature reaches 50C after importing 30,000 pages and seeks solutions to prevent overheating or to import the dump in parts.
  • Some participants suggest practical solutions like buying a fan or placing the computer in a refrigerator to cool it down.
  • Another participant questions the feasibility of modifying the import script to filter pages by the first letter of the title, allowing for segmented imports.
  • There is a suggestion to manually split the XML file into smaller pieces for easier importing, although this depends on the XML structure.
  • One participant expresses uncertainty about the import method and suggests the possibility of pausing the import process.

Areas of Agreement / Disagreement

Participants present multiple competing views on how to address the overheating issue and whether the import can be done in parts. No consensus is reached on a specific solution.

Contextual Notes

Participants mention limitations related to the structure of the XML file and the feasibility of certain cooling methods, but do not resolve these issues.

sid_galt
I am trying to import a 5.6 GB Wikipedia XML dump into MySQL 5.0 using mwdumper. The problem is that the temperature of my hard disk reaches 50C (122F) after just 30,000 pages.

Is there any way to prevent this, or to import the Wikipedia dump into MySQL in parts?
Speed is not an issue, so I wouldn't mind a solution that fixes the problem even if it imports slowly.

PS - I am getting 35 pages/sec with mwdumper and 8 pages/sec with import.php.
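One slow-but-simple option, since mwdumper's SQL output is normally piped straight into mysql: put a throttling filter in the middle of the pipe so the disk gets regular idle time. A minimal Python sketch; the script name, burst size, and pause length are all assumptions to tune:

```python
#!/usr/bin/env python3
"""throttle.py -- pass stdin through to stdout in bursts, sleeping in
between so the disk gets idle time to cool. The numbers here are
guesses; tune them for your hardware."""
import sys
import time

LINES_PER_BURST = 500   # SQL statements to pass through per burst
PAUSE_SECONDS = 2.0     # cool-down pause between bursts

count = 0
for line in sys.stdin:
    sys.stdout.write(line)
    count += 1
    if count % LINES_PER_BURST == 0:
        sys.stdout.flush()          # push the burst downstream
        time.sleep(PAUSE_SECONDS)   # let the disk idle briefly
```

Assumed usage, with a typical mwdumper invocation: java -jar mwdumper.jar --format=sql:1.5 dump.xml | python3 throttle.py | mysql wikidb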
 
That sounds very interesting. What kind of data are you importing?
 
Buy a fan for the HD.
 
Or put your computer in your refrigerator temporarily, if it's at all feasible. Server rooms are usually refrigerated for this reason.
 
-Job- said:
That sounds very interesting. What kind of data are you importing?

The Wikipedia (en.wikipedia.org) encyclopedia, nothing special.

-Job- said:
Or put your computer in your refrigerator temporarily, if it's at all feasible. Server rooms are usually refrigerated for this reason.

There is no way to route the computer's cables out of the refrigerator without keeping it open, so that's not possible.

Do you mean then that there is no way to import the data in parts? Maybe by modifying the script to filter pages according to the first letter of the title, and then running the script once for each of the 26 letters?
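That kind of filter is easy to prototype outside the import script. A minimal Python sketch, assuming the standard MediaWiki export layout (a flat list of <page> elements, each containing a <title>); it just prints matching titles to show the logic, while a real segmented import would write the matching pages back out to a new dump file:

```python
#!/usr/bin/env python3
"""Filter a MediaWiki XML dump by the first letter of the page title.
The export namespace URI varies between dump versions, so tags are
matched by local name only."""
import sys
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit('}', 1)[-1]

def titles_starting_with(dump_path, letter):
    for _, elem in ET.iterparse(dump_path, events=('end',)):
        if local_name(elem.tag) == 'page':
            title = next((c.text for c in elem
                          if local_name(c.tag) == 'title'), None)
            if title and title.upper().startswith(letter.upper()):
                yield title
            elem.clear()  # essential: keeps memory flat on a 5.6 GB file

if __name__ == '__main__':
    # Assumed usage: python3 filter_by_letter.py pages.xml A
    for t in titles_starting_with(sys.argv[1], sys.argv[2]):
        print(t)
```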
 
Or modify the script so you can pause it, or even pause the process itself. I don't know much about the import method, so I can't be much help.
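For what it's worth, on Linux/Unix any process can be paused and resumed from outside with SIGSTOP/SIGCONT, with no changes to the import script. A sketch of a duty-cycle supervisor along those lines; the pipeline command and the run/cool durations are assumptions:

```python
#!/usr/bin/env python3
"""Run the import pipeline as a child process group and freeze it
periodically with SIGSTOP so the disk can cool, then resume it with
SIGCONT. Linux/Unix only."""
import os
import signal
import subprocess
import time

RUN_SECONDS = 60     # how long to let the import run
COOL_SECONDS = 120   # how long to freeze it between runs

# Hypothetical pipeline; substitute your real command.
cmd = 'java -jar mwdumper.jar --format=sql:1.5 dump.xml | mysql wikidb'

# start_new_session puts the whole pipeline in its own process group,
# so the signals reach java and mysql, not just the shell.
proc = subprocess.Popen(['sh', '-c', cmd], start_new_session=True)

while proc.poll() is None:
    time.sleep(RUN_SECONDS)
    if proc.poll() is not None:
        break
    os.killpg(proc.pid, signal.SIGSTOP)  # freeze: disk goes idle
    time.sleep(COOL_SECONDS)
    os.killpg(proc.pid, signal.SIGCONT)  # resume where it left off
```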
 
There's a way to do it so that it doesn't have to be done all at once, but exactly how depends on the structure of the XML file. It's not a terribly complicated process: you read from the XML file and insert into the database server. One way is to manually split the XML file into smaller pieces; then you can still use whatever script you're already using rather than write a new one, but again that depends on the structure of the XML.
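In the common case the export format is a flat list of <page> elements under one root, which makes a streaming splitter short to write. A minimal sketch assuming that layout; the chunk size and file names are arbitrary, and ElementTree re-serializes pages with an explicit namespace prefix, which is valid XML but worth checking against your importer. If the importer also needs the <siteinfo> header, copy that block in from the original dump:

```python
#!/usr/bin/env python3
"""Split a MediaWiki XML dump into chunks of N pages each, so the
chunks can be imported one at a time."""
import sys
import xml.etree.ElementTree as ET

PAGES_PER_CHUNK = 10000  # arbitrary; tune to taste

def write_chunk(pages, index):
    with open('chunk_%03d.xml' % index, 'w', encoding='utf-8') as f:
        f.write('<mediawiki>\n')   # minimal wrapper root
        f.writelines(pages)
        f.write('</mediawiki>\n')

def split(dump_path):
    pages, n_files = [], 0
    for _, elem in ET.iterparse(dump_path, events=('end',)):
        if elem.tag.rsplit('}', 1)[-1] == 'page':
            pages.append(ET.tostring(elem, encoding='unicode'))
            elem.clear()  # keep memory use flat
            if len(pages) == PAGES_PER_CHUNK:
                write_chunk(pages, n_files)
                pages, n_files = [], n_files + 1
    if pages:  # last, partial chunk
        write_chunk(pages, n_files)

if __name__ == '__main__':
    split(sys.argv[1])
```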
 
