
Text File Editing

  1. Jan 12, 2012 #1
    Hello! I have a file that looks like this:

    Where every line beginning with a T is a new data set. So basically I need each line in the file to look like this:

    I need to merge every other line so that each data set lies on the same line. There can just be a space in between the two data points; a comma is not necessary.

    So what's the easiest way to go about doing this? I was thinking something along the lines of sed, awk, or grep. Help is greatly appreciated.
     
  3. Jan 12, 2012 #2

    jhae2.718

    Gold Member

    Code (Text):

    sed -e '{N; s/\n/ / }' FILENAME > OUTPUTFILENAME
     
    produces a file containing
    Code (Text):

    T,2009,05,06,15,54,34 D,NBXX,500,1.032e-4,7.657e-5,4.956e-5,9.404e-6,7.269e-6,6.667e-6
    T,2009,05,06,15,55,34 D,NBXX,440,1.032e-4,7.665e-5,4.911e-5,9.303e-6,7.292e-6,6.507e-6
    T,2009,05,06,15,56,34 D,NBXX,380,1.041e-4,7.743e-5,5.073e-5,9.635e-6,7.639e-6,6.867e-6
    T,2009,05,06,15,57,34 D,NBXX,320,1.027e-4,7.628e-5,4.938e-5,9.491e-6,7.596e-6,6.489e-6
    T,2009,05,06,15,58,34 D,NBXX,260,1.014e-4,7.490e-5,4.852e-5,9.352e-6,7.325e-6,6.657e-6
    T,2009,05,06,15,59,34 D,NBXX,200,1.010e-4,7.408e-5,4.763e-5,9.408e-6,7.383e-6,6.794e-6
     
    Is that the desired output?
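    For anyone unsure what the one-liner does: N appends the next input line to the pattern space, and s/\n/ / then replaces the newline between the two lines with a space. Below is a minimal sketch on the first two records shown above, with an awk equivalent for comparison. The file names are placeholders, not anything from the original data.

```shell
# Two records from the output above, split back into their T and D lines.
printf '%s\n' \
  'T,2009,05,06,15,54,34' \
  'D,NBXX,500,1.032e-4,7.657e-5,4.956e-5,9.404e-6,7.269e-6,6.667e-6' \
  'T,2009,05,06,15,55,34' \
  'D,NBXX,440,1.032e-4,7.665e-5,4.911e-5,9.303e-6,7.292e-6,6.507e-6' > sample.txt

# sed: N appends the next line to the pattern space (joined by a newline),
# then s/\n/ / turns that embedded newline into a space.
sed 'N; s/\n/ /' sample.txt > merged_sed.txt

# awk: print each T line followed by a space and no newline,
# then print the line after it normally.
awk '/^T/ { printf "%s ", $0; next } { print }' sample.txt > merged_awk.txt
```

    Both commands assume the file strictly alternates T and D lines.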
     
  4. Jan 12, 2012 #3

    DaveC426913

    Gold Member

    Jeez, I would have just opened the file in Word, then done a Replace of ^pD (^p is the paragraph-break character) with ,D and then saved out as .txt.
     
  5. Jan 12, 2012 #4

    AlephZero

    Science Advisor
    Homework Helper

    The easiest way depends on what tools you have. If it's a "one off" task, it doesn't matter much whether it takes 1 sec or 100 sec to do it, compared with being sure that you did what you wanted to do.

    DaveC's solution might be the best for DaveC, because (1) he already has Word and (2) he can remember Microsoft's pointlessly arcane (IMHO) abbreviations for regular expressions and non-printable characters.

    Doing it with sed (or several other unix tools, or a good programmer's text editor) is just as trivially simple if you already have some of those tools and you know how to use them.
     
  6. Jan 17, 2012 #5
    jhae2.718: Yes, that is the desired output. It worked with 3 out of the 5 files... the other ones turned out like this:

    Code (Text):
    T,2009,08,18,15,41,28
     D,ZBXX,420,3.972e-5,2.358e-5,3.176e-5,1.940e-5,1.304e-5,2.451e-5
    T,2009,08,18,15,42,28
     D,ZBXX,360,3.922e-5,2.326e-5,3.163e-5,1.932e-5,1.300e-5,2.447e-5
    T,2009,08,18,15,43,28
     D,ZBXX,300,3.886e-5,2.296e-5,3.147e-5,1.921e-5,1.295e-5,2.447e-5
    T,2009,08,18,15,44,28
     D,ZBXX,240,3.857e-5,2.273e-5,3.131e-5,1.926e-5,1.295e-5,2.448e-5
    Any reason as to why, and how to fix it?

    DaveC426913: AlephZero is right. Each of my files is hundreds of thousands of lines (it's basically 2 lines for every minute of the day for a whole month). If I had pasted that into a Word file, that would easily be more than 500 pages. I don't think Word would like that (it would be way too slow). Although sed can be slow at times, it is still much, much faster than Word.
     
  7. Jan 17, 2012 #6

    DaveC426913

    Gold Member


    Ah, yes, well, specifying the number and size of files in your OP would certainly get you higher-quality answers off the bat. :wink:
     
  8. Jan 17, 2012 #7
    Hmm. I guess I don't see the point in specifying how many lines. Whether it's 5 or 5 million, the operation in sed is essentially the same.
     
  9. Jan 17, 2012 #8

    jhae2.718

    Gold Member

    Are all the files formatted the same initially?
     
  10. Jan 17, 2012 #9
    I thought they were. They look identical to the files that worked. I had to download the data from a website, and it's quite possible they are different in some way I cannot see for these two months.
     
  11. Jan 17, 2012 #10

    AlephZero

    Science Advisor
    Homework Helper

    You started by saying "my file" and then changed to "each of my files".

    A decent programmer's editor will let you open a few hundred files all at once, do the same edit(s) to all of them, and then "save all"...

    And so will the Unix command line, of course.
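    As a rough illustration of the command-line route, a shell loop can apply the same merge to every file in a directory. This is only a sketch: the month_*.txt names and the merged_ prefix are made up for the example, and stripping carriage returns with tr is an extra precaution in case some files have Windows line endings.

```shell
# Two stand-in input files (hypothetical names); the second uses CRLF endings.
printf '%s\n' \
  'T,2009,05,06,15,54,34' \
  'D,NBXX,500,1.032e-4,7.657e-5,4.956e-5,9.404e-6,7.269e-6,6.667e-6' > month_1.txt
printf '%s\r\n' \
  'T,2009,08,18,15,41,28' \
  'D,ZBXX,420,3.972e-5,2.358e-5,3.176e-5,1.940e-5,1.304e-5,2.451e-5' > month_2.txt

# Apply the same edit to every matching file, writing each result to a new file
# so the originals stay untouched.
for f in month_*.txt; do
  tr -d '\r' < "$f" | sed 'N; s/\n/ /' > "merged_$f"
done
```

    Writing to a separate output name, rather than editing in place, makes it easy to compare a few lines before discarding the originals.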
     
  12. Jan 17, 2012 #11
    Yes... but what if the files that look the same are different? This is my problem now. The last two files are different.

    I had no problem duplicating the commands and changing one character to operate on different files. I thought all the files were the same. Now I know that they are not for some reason.

    I'm just not sure what the reason was for your last post, AlephZero.
     
  13. Jan 17, 2012 #12
    Problem is solved. My problem in the last two files was that I also had \r's and \n's at the end of each line, instead of just the \n for the other 3 files. If anyone is curious, this is the code I used (and it worked perfectly):

    Code (Text):
    sed 's/\r//; /^T/ {N; s/\n/ /}' <input.txt >output.txt
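    One caveat with the command above, as far as I can tell: s/\r// runs before N pulls in the D line, so only the carriage return on the T line is removed and the one at the end of the D line likely stays in the output. A hedged alternative that does not rely on sed understanding \r is to strip every carriage return with tr first and then join line pairs with paste. The file names here are placeholders.

```shell
# A small CRLF sample built from the first record of the problem file.
printf '%s\r\n' \
  'T,2009,08,18,15,41,28' \
  'D,ZBXX,420,3.972e-5,2.358e-5,3.176e-5,1.940e-5,1.304e-5,2.451e-5' \
  'T,2009,08,18,15,42,28' \
  'D,ZBXX,360,3.922e-5,2.326e-5,3.163e-5,1.932e-5,1.300e-5,2.447e-5' > crlf_sample.txt

# tr deletes every carriage return; paste then joins each pair of
# consecutive lines with a single space.
tr -d '\r' < crlf_sample.txt | paste -d ' ' - - > crlf_merged.txt
```

    Like the sed version, this assumes the file strictly alternates T and D lines.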
     
  14. Jan 17, 2012 #13

    jhae2.718

    Gold Member

    Those must have been written on Windows. Unix uses \n as a line terminator; Windows uses \r\n.
     
  15. Jan 17, 2012 #14
    Oh, I see. Thanks for the explanation. I was wondering why this was so!

    Glad everything worked out in the end :)
     
  16. Jan 17, 2012 #15

    jhae2.718

    Gold Member

    For further reference,
    Code (Text):

    Line Endings
    -------------------------------------------------
    Windows                       \r\n        (CR LF)
    Unix/Linux/Mac OS X           \n          (LF)
    Mac OS (before OS X)          \r          (CR)
     
    CR is a carriage return, LF a line feed.
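    Since the difference is invisible in most viewers, one quick check is to count the lines that end in a carriage return. The sketch below uses two tiny stand-in files (dos.txt and unix.txt are made-up names with made-up contents); a nonzero count suggests Windows line endings.

```shell
# Stand-in files: one with CRLF line endings, one with LF only.
printf 'a\r\nb\r\n' > dos.txt
printf 'a\nb\n' > unix.txt

# Hold a literal carriage return in a variable so the pattern is portable.
cr=$(printf '\r')

# grep -c prints the number of lines ending in CR. It exits nonzero when
# the count is 0, hence the "|| true".
dos_count=$(grep -c "${cr}\$" dos.txt || true)
unix_count=$(grep -c "${cr}\$" unix.txt || true)
echo "dos.txt: $dos_count  unix.txt: $unix_count"

# od -c shows the raw bytes, with \r and \n spelled out.
od -c dos.txt
```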
     