Python: Need Help Turning One File into Thousands of Files

AI Thread Summary
The discussion revolves around creating a Python script to process a large text file by splitting it into smaller files, each containing 567 lines. The user seeks guidance on several programming aspects, including accepting a file name as an argument, determining if a text file is treated as a list in Python, and implementing a loop to output the lines to separate files. Key points include the need to import necessary packages and handle file operations correctly. The user is advised to utilize a while loop for reading lines and managing file outputs, with exception handling to catch end-of-file errors. Suggestions for naming output files using numerical indices are provided, along with recommendations for debugging progress messages. The conversation emphasizes learning through the process rather than seeking a complete solution, highlighting the importance of understanding basic programming concepts and Python documentation.
ADCooper

Homework Statement



I'm not sure this is quite where this belongs, since it's not really homework, but I need help programming and figured this would be the place to ask. It's related to undergraduate research I'm doing, so if that's not kosher, just say so!

I want to write a Python script that takes a certain file as input, turns the file into a list where each line is one element, and then writes every 567 lines (so lines 0 to 566, 567 to 1133, etc., which in Python slice notation is [0:567], [567:1134], ...) to new, separate files. There are well over a million lines if I remember correctly, so doing this without a script would be extremely inefficient (unless I've overlooked some super simple way to do it in the bash shell). I only need to do this for one file, so technically I don't need an extremely general solution, just one that works with this file.
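As a quick check on those boundaries: a Python slice excludes its end index, so stepping the start by 567 and taking lines[start:start + 567] gives chunks of exactly 567 lines, with any remainder in the last chunk. A minimal sketch on a made-up 1200-line list (the names CHUNK_SIZE, lines and chunks are illustrative, not from the thread):

```python
CHUNK_SIZE = 567

# Hypothetical stand-in for the file's lines: 1200 numbered strings.
lines = [f"line {n}\n" for n in range(1200)]

# range(0, len(lines), CHUNK_SIZE) yields each chunk's start: 0, 567, 1134.
# The end index of a slice is excluded, so each full chunk has 567 lines.
chunks = [lines[start:start + CHUNK_SIZE]
          for start in range(0, len(lines), CHUNK_SIZE)]

print([len(c) for c in chunks])  # [567, 567, 66]
print(repr(chunks[1][0]))        # 'line 567\n'
```

The last chunk simply holds whatever is left over (here 1200 - 1134 = 66 lines); slicing past the end of a list does not raise an error.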

The problem is that I have very little programming experience, so a little guidance would be super helpful (even just basic programming advice is certainly welcome). I don't want someone to do it for me, because I need to be able to do this myself; I just need to know whether my ideas could work and, if I'm on completely the wrong track, what general functions I should read up on to get this done.

If this is the wrong place for this, I apologize!

List of Things that Must Be Done

1. Accept the name of the text file as an argument when executing the script.

2. Turn the text file into a Python list. (I'm not sure this is even necessary. Are text files already treated as a "list" when they are opened in Python? Are they already indexed by line?)

3. Output every 567 lines to a separate text file. (I could technically do this by inserting a keyword that marks the start of each separate file, but I'd assume that would require more work.)
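On the question in step 2: a file opened in Python is not a list, but it is iterable line by line, and list(f) (or f.readlines()) turns it into an ordinary indexable list. A minimal sketch of steps 1 and 2, assuming a hypothetical script name split_file.py and hypothetical helpers read_lines and main:

```python
import sys


def read_lines(path):
    """Return the lines of the text file at path as a list of strings.

    A file object is not itself a list, but iterating over it yields one
    line at a time (newline included); list(f) collects those lines into
    an ordinary list that can be indexed and sliced.
    """
    with open(path) as f:
        return list(f)


def main(argv):
    """argv is sys.argv: argv[0] is the script name, argv[1] the first argument."""
    if len(argv) != 2:
        print("usage: python split_file.py BIGFILE", file=sys.stderr)
        return 1
    lines = read_lines(argv[1])
    print(len(lines), "lines read")
    return 0
```

To run it from the shell as python split_file.py mybigfile.txt, the script would end with sys.exit(main(sys.argv)). Keeping the argument handling in main(argv) rather than reading sys.argv deep inside the code makes the function easy to try out from an interactive session.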


The Attempt at a Solution



1. This one I basically understand: I simply have to import the necessary packages and set the first argument equal to the file being split.

2. I'm not really sure yet whether this is even necessary, because I don't know whether the file, after being passed on the command line, is already indexed by line. Any clues on this?

3. Perhaps a for loop (something like for line [i,j] in file) with i = 0 and j = 566. If I can find a way within the loop to use i and j as the indices, I could then output those lines to a file named something like FILE#, and with every iteration of the loop increase the # in the file name by 1 and increase i and j by 567. Am I on the right track? Or should I abandon ship on this approach and start from scratch?
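An alternative to tracking i and j by hand (a swapped-in technique, not the approach proposed in the thread) is itertools.islice, which pulls the next 567 lines from the open file on each pass without reading the whole file into a list; an empty result signals the end of the file. The function name split_file and the FILE prefix (following the FILE# naming above) are hypothetical:

```python
import itertools

LINES_PER_FILE = 567


def split_file(path, prefix="FILE", lines_per_file=LINES_PER_FILE):
    """Write consecutive blocks of lines_per_file lines from path to
    prefix1.txt, prefix2.txt, ... and return the number of files written."""
    k = 0  # number of the current output file
    with open(path) as f_in:
        while True:
            # Take the next block of lines; the file keeps its position,
            # so each call continues where the previous one stopped.
            chunk = list(itertools.islice(f_in, lines_per_file))
            if not chunk:  # nothing left: end of file
                break
            k += 1
            with open(prefix + str(k) + ".txt", "w") as f_out:
                f_out.writelines(chunk)
    return k
```

Because only one 567-line block is held in memory at a time, this works the same way whether the input has a thousand lines or many millions.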
 
You'll want a loop, but you may find a while loop easier to work with and use exception handling:

Code:
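import sys

LINES_PER_FILE = 567

fIn = open(sys.argv[1])      # big file named on the command line, e.g. mybigfile.txt
fOut = None                  # current output file
k = 0                        # number used in the output file names
count = 0                    # lines written to the current output file
while True:
    try:
        line = next(fIn)     # next line of input; raises StopIteration at end of file
    except StopIteration:
        # End of input reached: close the last output file and break out of the loop.
        if fOut is not None:
            fOut.close()
        break
    if count == 0:           # start a new output file: filename1.txt, filename2.txt, ...
        k += 1
        fOut = open('filename' + str(k) + '.txt', 'w')
    fOut.write(line)
    count += 1
    if count == LINES_PER_FILE:  # current file is full; close it
        fOut.close()
        fOut = None
        count = 0

fIn.close()
print("All Done!")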

Note: to use a numerical index in the file names, you can use the str function and string concatenation. For example, when k = 5, open('filename' + str(k) + '.txt', 'w') is the same as open('filename5.txt', 'w').
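For reference, the concatenation above alongside two zero-padded variants. The padding width of 4 (enough for up to 9999 files) is an assumption, chosen so that a plain alphabetical listing keeps the files in numeric order:

```python
k = 5
plain = 'filename' + str(k) + '.txt'            # 'filename5.txt'
padded = 'filename' + str(k).zfill(4) + '.txt'  # 'filename0005.txt'
padded_f = f'filename{k:04d}.txt'               # same, using an f-string (Python 3.6+)
print(plain, padded, padded_f)

# Why pad: strings sort character by character, so without padding
# 'filename10.txt' comes before 'filename2.txt'.
print(sorted(['filename10.txt', 'filename2.txt']))  # ['filename10.txt', 'filename2.txt']
```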

EDIT: You may want to include some debugging progress code; for example, print a message such as "opening output file " + filename each time you open a new file. You might also want to be more precise with the exception handling, making sure it is the end of the file and not some other file error, or at least print out the error type. Check the online Python documentation for this.
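A small sketch of both suggestions, assuming Python 3. Reaching the end of a file does not raise an exception at all (readline() returns an empty string and a for loop simply stops), so the except clause only needs to handle genuine file errors, which are OSError subclasses (IOError is an alias of OSError in Python 3). The function copy_with_progress is hypothetical:

```python
def copy_with_progress(path, out_name):
    """Copy path to out_name, printing progress and the type of any file error.

    Returns True on success, False if a file error occurred.
    """
    try:
        with open(path) as f_in, open(out_name, "w") as f_out:
            print("opening output file " + out_name)
            while True:
                line = f_in.readline()
                if line == "":  # end of file: readline() returns '' and raises nothing
                    break
                f_out.write(line)  # a blank line in the file is '\n', not ''
        return True
    except OSError as err:
        # Print the specific subclass, e.g. FileNotFoundError or PermissionError.
        print("file error:", type(err).__name__, err)
        return False
```

Calling copy_with_progress('missing.txt', 'out.txt') on a nonexistent input prints a line starting with "file error: FileNotFoundError" and returns False, while a normal end of file just ends the loop.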
 
Thanks, I really appreciate the help!
 