Editing Text Files - Easiest Way To Merge Every Other Line

Sam032789 · Jan 12, 2012

Hello! I have a file that looks like this:

T,2009,05,06,15,54,34
D,NBXX,500,1.032e-4,7.657e-5,4.956e-5,9.404e-6,7.269e-6,6.667e-6
T,2009,05,06,15,55,34
D,NBXX,440,1.032e-4,7.665e-5,4.911e-5,9.303e-6,7.292e-6,6.507e-6
T,2009,05,06,15,56,34
D,NBXX,380,1.041e-4,7.743e-5,5.073e-5,9.635e-6,7.639e-6,6.867e-6
T,2009,05,06,15,57,34
D,NBXX,320,1.027e-4,7.628e-5,4.938e-5,9.491e-6,7.596e-6,6.489e-6
T,2009,05,06,15,58,34
D,NBXX,260,1.014e-4,7.490e-5,4.852e-5,9.352e-6,7.325e-6,6.657e-6
T,2009,05,06,15,59,34
D,NBXX,200,1.010e-4,7.408e-5,4.763e-5,9.408e-6,7.383e-6,6.794e-6
.
.
.
(etc.)

Where every line beginning with a T is a new data set. So basically I need each line in the file to look like this:

T,2009,05,06,15,59,34 D,NBXX,200,1.010e-4,7.408e-5,4.763e-5,9.408e-6,7.383e-6,6.794e-6

I need to merge every other line so that each data set lies on the same line. There can just be a space in between the two data points, a comma is not necessary.

So what's the easiest way to go about doing this? I was thinking something along the lines of sed, awk, or grep. Help is greatly appreciated.

jhae2.718 · Jan 12, 2012

Code:

sed -e '{N; s/\n/ / }' FILENAME > OUTPUTFILENAME

produces a file containing

Code:

T,2009,05,06,15,54,34 D,NBXX,500,1.032e-4,7.657e-5,4.956e-5,9.404e-6,7.269e-6,6.667e-6
T,2009,05,06,15,55,34 D,NBXX,440,1.032e-4,7.665e-5,4.911e-5,9.303e-6,7.292e-6,6.507e-6
T,2009,05,06,15,56,34 D,NBXX,380,1.041e-4,7.743e-5,5.073e-5,9.635e-6,7.639e-6,6.867e-6
T,2009,05,06,15,57,34 D,NBXX,320,1.027e-4,7.628e-5,4.938e-5,9.491e-6,7.596e-6,6.489e-6
T,2009,05,06,15,58,34 D,NBXX,260,1.014e-4,7.490e-5,4.852e-5,9.352e-6,7.325e-6,6.657e-6
T,2009,05,06,15,59,34 D,NBXX,200,1.010e-4,7.408e-5,4.763e-5,9.408e-6,7.383e-6,6.794e-6

Is that the desired output?
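For anyone following along: `N` appends the next input line to the pattern space, and `s/\n/ /` replaces the embedded newline with a space. Below is a minimal sketch of the same pairing done with `paste` as well, assuming the file strictly alternates T and D lines. The sample input is the first two records from the question; the variable names are only for illustration.

```shell
# Two records from the original post, used as sample input.
input='T,2009,05,06,15,54,34
D,NBXX,500,1.032e-4,7.657e-5,4.956e-5,9.404e-6,7.269e-6,6.667e-6
T,2009,05,06,15,55,34
D,NBXX,440,1.032e-4,7.665e-5,4.911e-5,9.303e-6,7.292e-6,6.507e-6'

# sed: N appends the next line to the pattern space; s/\n/ / joins the pair.
sed_out=$(printf '%s\n' "$input" | sed -e 'N; s/\n/ /')

# paste: each "-" reads one line from stdin, so two of them pair up lines.
paste_out=$(printf '%s\n' "$input" | paste -d ' ' - -)

printf '%s\n' "$paste_out"
```

Both variants rely on the T/D lines never getting out of step; a stray blank or missing line would shift every pair after it.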

DaveC426913 · Jan 12, 2012

Jeez, I would have just opened the file in Word, done a Replace of ^pD (^p is Word's code for the paragraph mark) with ,D, then saved out as .txt

AlephZero · Jan 12, 2012

DaveC426913 said:

Jeez, I would have just opened the file in Word, done a Replace of ^pD (^p is Word's code for the paragraph mark) with ,D, then saved out as .txt

The easiest way depends on what tools you have. If it's a "one off" task, it doesn't matter much whether it takes 1 sec or 100 sec to do it, compared with being sure that you did what you wanted to do.

DaveC's solution might be the best for DaveC, because (1) he already has Word and (2) he can remember Microsoft's pointlessly arcane (IMHO) abbreviations for regular expressions and non-printable characters.

Doing it with sed (or several other Unix tools, or a good programmer's text editor) is just as trivially simple if you already have some of those tools and you know how to use them.

Sam032789 · Jan 17, 2012

jhae2.718: Yes, that is the desired output. It worked with 3 out of the 5 files... the other ones turned out like this:

Code:

T,2009,08,18,15,41,28
 D,ZBXX,420,3.972e-5,2.358e-5,3.176e-5,1.940e-5,1.304e-5,2.451e-5
T,2009,08,18,15,42,28
 D,ZBXX,360,3.922e-5,2.326e-5,3.163e-5,1.932e-5,1.300e-5,2.447e-5
T,2009,08,18,15,43,28
 D,ZBXX,300,3.886e-5,2.296e-5,3.147e-5,1.921e-5,1.295e-5,2.447e-5
T,2009,08,18,15,44,28
 D,ZBXX,240,3.857e-5,2.273e-5,3.131e-5,1.926e-5,1.295e-5,2.448e-5

Any reason as to why, and how to fix it?

DaveC426913: AlephZero is right. Each of my files is hundreds of thousands of lines (it's basically 2 lines for every minute of the day for a whole month). If I pasted that into a Word file, it would easily be more than 500 pages. I don't think Word would like that (it would be way too slow). Although sed can be slow at times, it is still much, much faster than Word.

DaveC426913 · Jan 17, 2012

Sam032789 said:

DaveC426913: Each of my files is hundreds of thousands of lines

Ah, yes, well, specifying the number and size of files in your OP would certainly get you higher-quality answers off the bat.

Hello! I have a file ...

Sam032789 · Jan 17, 2012

Hmm. I guess I don't see the point in specifying how many lines. Whether it's 5 or 5 million, the operation in sed is essentially the same.

jhae2.718 · Jan 17, 2012

Are all the files formatted the same initially?

Sam032789 · Jan 17, 2012

I thought they were. They look identical to the files that worked. I had to download the data from a website, and it's quite possible they are different in some way I cannot see for these two months.

AlephZero · Jan 17, 2012

You started by saying "my file" and then changed to "each of my files".

A decent programmer's editor will let you open a few hundred files all at once, do the same edit(s) to all of them, and then "save all"...

And so will the Unix command line, of course.
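On the command line that could be a simple loop. A rough sketch, assuming the input files share a `.txt` suffix and sit in one directory; the scratch directory, the file names, and the `.merged` suffix are all made up for illustration.

```shell
# Hypothetical setup: a scratch directory with two small sample files.
workdir=$(mktemp -d)
printf 'T,1\nD,a\nT,2\nD,b\n' > "$workdir/may.txt"
printf 'T,3\nD,c\n' > "$workdir/june.txt"

# Run the same sed command on every .txt file, writing NAME.txt.merged
# next to each input so the originals stay untouched.
for f in "$workdir"/*.txt; do
    sed -e 'N; s/\n/ /' "$f" > "$f.merged"
done

cat "$workdir/may.txt.merged"
```

Writing to a separate output file rather than editing in place makes it easy to compare against the original if one of the files turns out to be formatted differently.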

Sam032789 · Jan 17, 2012

Yes... but what if the files that look the same are different? This is my problem now. The last two files are different.

I had no problem duplicating the commands and changing one character to operate on different files. I thought all the files were the same. Now I know that they are not for some reason.

I'm just not sure what the point of your last post was, AlephZero.

Sam032789 · Jan 17, 2012

Problem is solved. The issue with the last two files was that each line ended in \r\n, instead of just the \n used in the other 3 files. If anyone is curious, this is the code I used (and it worked perfectly):

Code:

sed 's/\r//; /^T/ {N; s/\n/ /}' <input.txt >output.txt
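One thing that may be worth checking: as far as I can tell, `s/\r//` runs before `N`, so it only strips the carriage return from the T line, and the D line keeps its trailing `\r`. That is invisible in most terminals but could surprise a later tool. Here is a sketch that deletes every carriage return with `tr` first and then merges; the sample input is one record from the problem files with `\r\n` endings added, and the helper name `crlf_input` is just for this example.

```shell
# Sample input with Windows (\r\n) line endings.
crlf_input() {
    printf 'T,2009,08,18,15,41,28\r\n'
    printf 'D,ZBXX,420,3.972e-5,2.358e-5,3.176e-5,1.940e-5,1.304e-5,2.451e-5\r\n'
}

# tr deletes every carriage return; sed then merges each T line with its D line.
merged=$(crlf_input | tr -d '\r' | sed -e '/^T/ {N; s/\n/ /;}')

# Count carriage returns left in the result.
cr_left=$(printf '%s' "$merged" | tr -cd '\r' | wc -c | tr -d ' ')

printf '%s\n' "$merged"
echo "carriage returns remaining: $cr_left"
```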

jhae2.718 · Jan 17, 2012

Those must have been written on Windows. Unix uses \n as a line terminator; Windows uses \r\n.

Sam032789 · Jan 17, 2012

Oh, I see. Thanks for the explanation. I was wondering why this was so!

Glad everything worked out in the end :)

jhae2.718 · Jan 17, 2012

For further reference,

Code:

Line Endings
-------------------------------------------------
Windows                       \r\n        (CR LF)
Unix/Linux/Mac OS X           \n          (LF)
Mac OS (before OS X)          \r          (CR)

CR is a carriage return, LF a line feed.
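A quick way to check which of these a file uses is to count its carriage returns. A minimal sketch with standard tools; the temporary files and the `count_cr` helper are placeholders for this example.

```shell
# Placeholder files: one with Unix endings, one with Windows endings.
unix_file=$(mktemp)
dos_file=$(mktemp)
printf 'T,1\nD,a\n' > "$unix_file"
printf 'T,1\r\nD,a\r\n' > "$dos_file"

# Count carriage-return bytes: tr keeps only \r, wc counts what is left.
count_cr() {
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

unix_cr=$(count_cr "$unix_file")
dos_cr=$(count_cr "$dos_file")
echo "unix file: $unix_cr CR, dos file: $dos_cr CR"

# od -c prints the raw bytes, so \r \n pairs are visible directly.
od -c "$dos_file"
```

A count of zero suggests plain `\n` endings; a count equal to the number of lines suggests `\r\n`.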
