Order CSV Table - Programming Language Code

  • Thread starter hilbert2
  • Start date
  • Tags
    csv table
In summary, the person asked for a solution to a problem that they did not state in their question, and provided a summary of a program that can be written in different languages to solve a problem.
  • #1
hilbert2
Science Advisor
Insights Author
Gold Member
1,598
605
I have a little question related to reading, manipulating and writing .CSV table files... Suppose I have a table that has two columns and some number N of rows. The elements of the table have short character strings in them. How would I, with any common programming language, make a code that reads such a CSV file, orders the rows of the table so that the elements of 2nd column are in alphabetical order, and then writes the resulting table to another CSV file? So both columns are reordered but the new order depends only on the elements of the second column.

The reason why I'm asking is that such a code would help a relative of mine in things she does at work, and I though that someone here could have an example code that does something like that. It can be written with C, C#, R-Code, Python or anything.
 
Technology news on Phys.org
  • #2
The CSV file doesn't really contain a table. It contains lines, with each line terminated by a newline character (which can be a single character in Linux/Unix or two characters in Windows). Within each line are fields that are separated by, typically, a comma.

One possible solution would be to create a two-dimensional array with the same number of rows as the number of lines in the file. Then read each line of the CSV file, storing the first field (a string) of a line in the column-0 part of a given row in the array, and storing the second field (also a string) in the column-1 part of the given array row.
After all of the data is read from the file, sort the second column (column 1) of the array, making sure to move the entry in column 0 whenever there is a change in column 1. Both C and C++ have a standard library sort function, qsort(). Other languages usually have some capability of sorting a list of things.

After all of the elements in column 1 are sorted, write the contents of the array to a differen CSV file
 
  • Like
Likes hilbert2
  • #3
Thanks, I'll try to do that. The number of rows may not be the same in every CSV file, so I need to somehow read the number of rows from the file before allocating an array of strings. I haven't really needed to write code that manipulates files containing text, as the arrays in computational science usually contain only numbers.
 
  • #4
in C++, use vectors, that will take care of the varying number of rows in different files.
 
  • Like
Likes hilbert2
  • #5
If the first column has always the same width, you can just use command line's sort command. On Windows, something like
Code:
sort file1.csv -options >file2.txt
You'll need to replace -options with correct options. I don't have access to Windows right now. You should read help for sort (given by sort /?), there is an option to start comparison at a given column.
There might be more options to actually start at first comma but there wasn't one last time I looked. You might find some freeware sort on the internet that has this option, it's probably easier than writing your own.

Some keywords for search if you have no idea what I'm talking about: windows command line, windows batch file
 
  • Like
Likes hilbert2
  • #6
Sometimes other characters are chosen for field separation such as the vertical bar character "|" especially if the text contains a comma.
 
  • #7
If you go the programming route and you have access to the .NET framework, you may want to make use of the TextFieldParser class. Also, if I'm understanding you correctly, what you described can be done within Exel and OpenOffice. A macro (not sure if OpenOffice has macros) could be set up to make it a one click operation for each file.
 
  • #8
Thanks, I was just reading about the sort and grep commands in Unix, and it looks like I can make a .sh batch file that automatically sorts the text in a set of files according to some criterion and ignores rows that don't contain some given keyword. The person I'm supposed to teach this to doesn't have Linux, but I guess there's some kind of Unix emulator for Win7 that allows the use of these commands.
 
  • #9
Windows 7 has a sort program that you can access from the command prompt. However, unless the columns are fixed width, it can only sort by the first column. At the command prompt type "sort /?" without quotes to see a list of options.
 
Last edited:
  • #10
The CSV files that are to be arranged here are tables of food products that are sold in a store and are divided to different classes by some logic, like in this simplified example where a comma has been used as a separator:

Code:
milk,Class B
rice flour, Class A
curry,Class C
beef,Class B
chocolate,Class A
pepper,Class C

On Unix command line, this can be sorted according to the second column with command "sort -k 2 -t $',' test-table.csv", which produces the output:

Code:
chocolate,Class A
rice flour,Class A
beef,Class B
milk,Class B
curry,Class C
pepper,Class C

If I want to include only the products of "Class C" in the output, I can pipe the output to grep command like

"sort -k 5 -t $',' test-table.csv | grep 'Class C' > out.txt",

which produces an output file named "out.txt" which contains the text:

Code:
curry,Class C
pepper,Class C

I also started making a Python app that makes GUI dialog boxes asking for the input and output file names and the sorting/filtering criteria and then executes the Unix terminal commands with the given parameters, using the "subprocess.call()" function.
 
  • #11
A Perl program to do it. I am confused about which OS you are running on. I originally programmed this for Windows and tested it. For Unix, the only change was that I replaced the 'type' command with 'cat', but I could not test it. It should work. So this is for Unix and has not been run. If you are on Windows, replace 'cat' with 'type'. I have tested that.
Perl:
# define a subroutine to compare the second columns of lines $a and $b
# and return +1, 0, -1 appropriately
sub byCol2 {
    $a =~ /,(.+)/;
    $a2=$1;
    $b =~ /,(.+)/;
    $b2=$1;
    return $a2 cmp $b2;
}

@inputLines = `cat temp.csv`; # read the file into an array
@sortedLines = sort byCol2 @inputLines; # sort the array with byCol2

# print sorted array output to file tempSortedCSV.csv
open(OUT, ">tempSortedCSV.csv");
print OUT  @sortedLines;
close OUT;
 
Last edited:
  • Like
Likes Hajytel, jim mcnamara and hilbert2
  • #12
Thanks, I'm doing this on Linux at the moment, but I may have to change it to a Windows program later. Ideally, the program would let you choose several files from an "Open File" GUI window by keeping CTRL down, then choose the sorting criteria in another window and finally produce sorted output files that are named by some obvious logic (original file table1.csv becoming table1_sorted.csv or something).
 
  • #13
Perl runs with identical source code for non-system operations, like your example, on either Windows or Linux. How you run it, in a DOS batch file or shell script, is what changes. Also note that Excel and most other spreadsheet programs have a sort function. The reason I mention spreadsheets is the csv format is sort of a Lingua Franca for moving data from program to other program. Everyone messing with data seems to use them - IMO.

For the reasons mentioned and that you kind of waffled on requirements, consider the @FactChecker solution. It might help if you two were to collude on your requirements :woot:

Of course if you are going to linux definitely shell and sort are the way to go. Beware of csv files that use double quoted text streams with embedded commas.
 

1. What is a "CSV" file?

A CSV (Comma Separated Values) file is a type of plain text file that contains data in a tabular format. It is a common file format used for storing and exchanging data between different applications and systems.

2. How do I open a CSV file?

You can open a CSV file using a variety of programs, such as Microsoft Excel, Google Sheets, or any text editor. Simply double-click on the file to open it in the default program, or right-click on the file and choose "Open With" to select a specific program to open it with.

3. What is the purpose of the "Order CSV Table - Programming Language Code"?

The purpose of the "Order CSV Table - Programming Language Code" is to provide a standardized and organized way to order and display data related to programming languages. This can be helpful for data analysis, comparisons, and other purposes.

4. Can I edit the data in a CSV file?

Yes, you can edit the data in a CSV file using a text editor or a spreadsheet program. However, it is important to be careful when editing a CSV file as any changes made can affect the data formatting and structure.

5. How do I create a CSV file?

You can create a CSV file using a text editor or a spreadsheet program. Simply enter the data in a tabular format, with each column separated by a comma. Then, save the file with a ".csv" extension to convert it into a CSV file.

Similar threads

  • Programming and Computer Science
Replies
9
Views
2K
  • Programming and Computer Science
Replies
14
Views
4K
  • Programming and Computer Science
Replies
7
Views
2K
  • Programming and Computer Science
Replies
4
Views
1K
  • Programming and Computer Science
Replies
2
Views
21K
  • Programming and Computer Science
Replies
3
Views
806
  • Programming and Computer Science
Replies
7
Views
412
Replies
3
Views
761
  • Programming and Computer Science
4
Replies
107
Views
5K
  • Programming and Computer Science
Replies
19
Views
3K
Back
Top