Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Ordering of CSV table

  1. May 14, 2017 #1

    hilbert2

    User Avatar
    Science Advisor
    Gold Member

    I have a little question related to reading, manipulating and writing .CSV table files... Suppose I have a table that has two columns and some number N of rows. The elements of the table have short character strings in them. How would I, with any common programming language, make a code that reads such a CSV file, orders the rows of the table so that the elements of 2nd column are in alphabetical order, and then writes the resulting table to another CSV file? So both columns are reordered but the new order depends only on the elements of the second column.

    The reason why I'm asking is that such a code would help a relative of mine in things she does at work, and I though that someone here could have an example code that does something like that. It can be written with C, C#, R-Code, Python or anything.
     
  2. jcsd
  3. May 14, 2017 #2

    Mark44

    Staff: Mentor

    The CSV file doesn't really contain a table. It contains lines, with each line terminated by a newline character (which can be a single character in Linux/Unix or two characters in Windows). Within each line are fields that are separated by, typically, a comma.

    One possible solution would be to create a two-dimensional array with the same number of rows as the number of lines in the file. Then read each line of the CSV file, storing the first field (a string) of a line in the column-0 part of a given row in the array, and storing the second field (also a string) in the column-1 part of the given array row.
    After all of the data is read from the file, sort the second column (column 1) of the array, making sure to move the entry in column 0 whenever there is a change in column 1. Both C and C++ have a standard library sort function, qsort(). Other languages usually have some capability of sorting a list of things.

    After all of the elements in column 1 are sorted, write the contents of the array to a differen CSV file
     
  4. May 14, 2017 #3

    hilbert2

    User Avatar
    Science Advisor
    Gold Member

    Thanks, I'll try to do that. The number of rows may not be the same in every CSV file, so I need to somehow read the number of rows from the file before allocating an array of strings. I haven't really needed to write code that manipulates files containing text, as the arrays in computational science usually contain only numbers.
     
  5. May 14, 2017 #4

    Dr Transport

    User Avatar
    Science Advisor
    Gold Member

    in C++, use vectors, that will take care of the varying number of rows in different files.
     
  6. May 14, 2017 #5
    If the first column has always the same width, you can just use command line's sort command. On Windows, something like
    Code (Text):
    sort file1.csv -options >file2.txt
    You'll need to replace -options with correct options. I don't have access to Windows right now. You should read help for sort (given by sort /?), there is an option to start comparison at a given column.
    There might be more options to actually start at first comma but there wasn't one last time I looked. You might find some freeware sort on the internet that has this option, it's probably easier than writing your own.

    Some keywords for search if you have no idea what I'm talking about: windows command line, windows batch file
     
  7. May 14, 2017 #6

    jedishrfu

    Staff: Mentor

    Sometimes other characters are chosen for field separation such as the vertical bar character "|" especially if the text contains a comma.
     
  8. May 14, 2017 #7
    If you go the programming route and you have access to the .NET framework, you may want to make use of the TextFieldParser class. Also, if I'm understanding you correctly, what you described can be done within Exel and OpenOffice. A macro (not sure if OpenOffice has macros) could be set up to make it a one click operation for each file.
     
  9. May 14, 2017 #8

    hilbert2

    User Avatar
    Science Advisor
    Gold Member

    Thanks, I was just reading about the sort and grep commands in Unix, and it looks like I can make a .sh batch file that automatically sorts the text in a set of files according to some criterion and ignores rows that don't contain some given keyword. The person I'm supposed to teach this to doesn't have Linux, but I guess there's some kind of Unix emulator for Win7 that allows the use of these commands.
     
  10. May 14, 2017 #9
    Windows 7 has a sort program that you can access from the command prompt. However, unless the columns are fixed width, it can only sort by the first column. At the command prompt type "sort /?" without quotes to see a list of options.
     
    Last edited: May 14, 2017
  11. May 14, 2017 #10

    hilbert2

    User Avatar
    Science Advisor
    Gold Member

    The CSV files that are to be arranged here are tables of food products that are sold in a store and are divided to different classes by some logic, like in this simplified example where a comma has been used as a separator:

    Code (Text):
    milk,Class B
    rice flour, Class A
    curry,Class C
    beef,Class B
    chocolate,Class A
    pepper,Class C
    On Unix command line, this can be sorted according to the second column with command "sort -k 2 -t $',' test-table.csv", which produces the output:

    Code (Text):

    chocolate,Class A
    rice flour,Class A
    beef,Class B
    milk,Class B
    curry,Class C
    pepper,Class C
    If I want to include only the products of "Class C" in the output, I can pipe the output to grep command like

    "sort -k 5 -t $',' test-table.csv | grep 'Class C' > out.txt",

    which produces an output file named "out.txt" which contains the text:

    Code (Text):
    curry,Class C
    pepper,Class C
    I also started making a Python app that makes GUI dialog boxes asking for the input and output file names and the sorting/filtering criteria and then executes the Unix terminal commands with the given parameters, using the "subprocess.call()" function.
     
  12. May 14, 2017 #11

    FactChecker

    User Avatar
    Science Advisor
    Gold Member

    A Perl program to do it. I am confused about which OS you are running on. I originally programmed this for Windows and tested it. For Unix, the only change was that I replaced the 'type' command with 'cat', but I could not test it. It should work. So this is for Unix and has not been run. If you are on Windows, replace 'cat' with 'type'. I have tested that.
    Code (Perl):

    # define a subroutine to compare the second columns of lines $a and $b
    # and return +1, 0, -1 appropriately
    sub byCol2 {
        $a =~ /,(.+)/;
        $a2=$1;
        $b =~ /,(.+)/;
        $b2=$1;
        return $a2 cmp $b2;
    }

    @inputLines = `cat temp.csv`; # read the file into an array
    @sortedLines = sort byCol2 @inputLines; # sort the array with byCol2

    # print sorted array output to file tempSortedCSV.csv
    open(OUT, ">tempSortedCSV.csv");
    print OUT  @sortedLines;
    close OUT;
     
    Last edited: May 14, 2017
  13. May 14, 2017 #12

    hilbert2

    User Avatar
    Science Advisor
    Gold Member

    Thanks, I'm doing this on Linux at the moment, but I may have to change it to a Windows program later. Ideally, the program would let you choose several files from an "Open File" GUI window by keeping CTRL down, then choose the sorting criteria in another window and finally produce sorted output files that are named by some obvious logic (original file table1.csv becoming table1_sorted.csv or something).
     
  14. May 15, 2017 #13

    jim mcnamara

    User Avatar

    Staff: Mentor

    Perl runs with identical source code for non-system operations, like your example, on either Windows or Linux. How you run it, in a DOS batch file or shell script, is what changes. Also note that Excel and most other spreadsheet programs have a sort function. The reason I mention spreadsheets is the csv format is sort of a Lingua Franca for moving data from program to other program. Everyone messing with data seems to use them - IMO.

    For the reasons mentioned and that you kind of waffled on requirements, consider the @FactChecker solution. It might help if you two were to collude on your requirements :woot:

    Of course if you are going to linux definitely shell and sort are the way to go. Beware of csv files that use double quoted text streams with embedded commas.
     
Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook

Have something to add?
Draft saved Draft deleted



Similar Discussions: Ordering of CSV table
  1. Textread of csv-file (Replies: 9)

  2. Csv reader (Replies: 1)

  3. C++ csv file reading (Replies: 2)

Loading...