Need Help Turning One File into Thousands of Files (in Python)

  1. Feb 15, 2012 #1
    1. The problem statement, all variables and given/known data

    I'm not sure if this is quite where this belongs since it's not homework really, but I just need help programming and I figured this would be the place to ask. It's related to undergraduate research I'm doing so if that's not kosher just say so!

    I want to write a Python script that takes a certain file as input, reads it into a list where each line is one element, and then writes every 567 lines (so something like [0:566], [567:1133], etc.) to new separate files. There are well over a million lines, if I remember correctly, so doing this without a script would be extremely inefficient (unless I've overlooked some super simple way to do it in the bash shell). I only need to do this for one file, so technically I don't need an extremely general solution, just one that works with this file.

    The problem is that I have very little programming experience, so a little guidance would be super helpful (even just basic programming advice is certainly welcome). I don't want someone to do it for me because I need to be able to do this myself, I just need to know if my ideas will possibly work and what general functions I should read up on to get this accomplished if I'm on the complete wrong track.

    If this is the wrong place for this I apologize!

    2. List of Things that Must Be Done

    1. Accept name of text file as an argument when executing file.

    2. Turn Text File type into a python list. (I'm not sure if this is even necessary. Are text files already considered as a "list" type when they are opened in a python environment? Are they already indexed based on line?)

    3. Output every 567 lines to separate text files. (Could technically do this some way by using a keyword that will start every separate file, but I'd assume that would require more work)
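
    On item 2, here is a minimal sketch of how Python treats an open text file. The file object is not a list and cannot be indexed directly, but it can be looped over line by line, and readlines() turns it into a real list. The file name sample.txt and its contents are made up for illustration.

```python
# Create a small sample file so the sketch is self-contained (hypothetical name).
with open("sample.txt", "w") as f:
    f.write("first\nsecond\nthird\n")

# An open file can be looped over one line at a time...
with open("sample.txt") as f:
    for line in f:
        print(line.rstrip("\n"))

# ...but indexing needs a real list, which readlines() provides.
with open("sample.txt") as f:
    lines = f.readlines()

print(len(lines))   # number of lines in the file
print(lines[1])     # second line, including its trailing newline
```

    So the conversion is only needed if the lines have to be indexed or sliced; a plain loop over the file works without it.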

    3. The attempt at a solution

    1. This one I basically understand. Simply have to import the necessary packages and set the first argument equal to the file being split

    2. I'm not really sure yet if this is even necessary because I don't know if the file, after being called on the command line, is already indexed by line. Any clues on this?

    3. Perhaps a for loop (something like for line [i,j] in file) with i = 0 and j = 566. If I can find a way within the loop to use i and j as indices, I could write those lines to a file named something like FILE#, and with every iteration of the loop increase the # in the file name by 1 and increase i and j by 567.

    Am I on the right track? Or should I abandon ship on this approach and start from scratch?
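
    For reference, the index idea in step 3 could be sketched roughly like this, assuming the lines are already in a list. The helper split_into_chunks, the stand-in data, and the FILE#.txt naming are all hypothetical. One detail worth knowing: a Python slice leaves out its end index, so the first block of 567 lines is lines[0:567], not lines[0:566].

```python
def split_into_chunks(lines, size=567):
    """Return consecutive slices of `lines`, each at most `size` items long."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]

# Stand-in data: 1200 fake lines instead of the real file.
lines = ["line %d\n" % n for n in range(1200)]
chunks = split_into_chunks(lines)

# Write each chunk to its own numbered file (hypothetical naming scheme).
for number, chunk in enumerate(chunks, start=1):
    name = "FILE" + str(number) + ".txt"
    with open(name, "w") as out:
        out.writelines(chunk)
```

    The last chunk is simply shorter if the line count is not a multiple of 567.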
  3. Feb 15, 2012 #2


    Science Advisor
    Gold Member

    You'll want a loop, but you may find a while loop easier to work with than a for loop, with exception handling around the file operations:

    Code (Text):

    import sys
    from itertools import islice

    LINES_PER_FILE = 567
    k = 0
    try:
        with open(sys.argv[1]) as fIn:   # input file name from the command line
            while True:
                # Read up to 567 lines. At end of file this is an empty list;
                # reading past the end does not raise an error.
                chunk = list(islice(fIn, LINES_PER_FILE))
                if not chunk:
                    break
                k += 1
                with open('filename' + str(k) + '.txt', 'w') as fOut:
                    fOut.writelines(chunk)
    except IOError as err:
        # A real file problem (missing input, unwritable output), not end of file.
        print("File error: " + str(err))
        sys.exit(1)

    print("All Done!")

    Note that to use a numerical index in file names you can use the str function and string concatenation:
    e.g. open('filename'+str(k)+'.txt', 'w') will, when k=5, give you: open('filename5.txt', 'w')

    EDIT: You may want to include some debugging progress code, say printing a message like "opening output file "+filename each time you open a new file. You might also want to be more precise with the exception handling, at least printing out the error type. Note that reaching the end of the file is not itself an error in Python: the read calls simply return nothing, so test for that rather than waiting for an exception. Check the online Python documentation for this.
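
    As a rough sketch of the progress message and error reporting suggested in the EDIT: the helper names output_name and write_chunk, and the zero-padded naming scheme, are assumptions for illustration only. Zero padding is optional; it just keeps thousands of files in order when sorted by name.

```python
import sys

def output_name(k, prefix="filename", width=4):
    """Build a zero-padded name such as filename0005.txt (hypothetical scheme)."""
    return prefix + str(k).zfill(width) + ".txt"

def write_chunk(k, chunk):
    """Write one chunk of lines, reporting progress and any file error."""
    name = output_name(k)
    print("opening output file " + name)
    try:
        with open(name, "w") as fOut:
            fOut.writelines(chunk)
    except (IOError, OSError) as err:
        # Report the error type so it is clear what actually went wrong.
        sys.stderr.write(type(err).__name__ + ": " + str(err) + "\n")
        return False
    return True

# Example call with a tiny made-up chunk.
ok = write_chunk(1, ["a\n", "b\n"])
```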
  4. Feb 15, 2012 #3
    Thanks, I really appreciate the help!