Looking for the fastest open source library to parse CSV

AI Thread Summary
The discussion centers on the need for a fast open-source library to parse large CSV files containing millions of records from communications carriers. The user currently employs JavaCSV but finds it inadequate and is considering uniVocity-parsers. They require custom parsing logic due to complex business needs, which cannot be handled efficiently by Excel or other spreadsheet software because of row limits and performance issues. A suggestion is made to utilize MySQL's LOAD DATA command for importing CSV files, allowing for SQL-based data manipulation. Overall, the focus is on finding a robust solution for processing extensive datasets efficiently.
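For reference, the MySQL LOAD DATA approach mentioned above might look roughly like the following. The table name, file path, and header handling are hypothetical, for illustration only:

```sql
-- Hypothetical table and file path; adjust to the real schema.
LOAD DATA LOCAL INFILE '/tmp/records.csv'
INTO TABLE carrier_records
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;   -- skip a header row, if present
```

Once the raw rows are in MySQL, deduplication and combination can be done with SQL (e.g. `INSERT ... SELECT DISTINCT` into a cleaned table), which can be faster than row-by-row processing in application code.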
Jerry Sivan
I would like to read and process huge CSV files with millions of records, collected from the communications carriers' networks. Here is my logic as a simplified procedure:
1) Fetch the CSV file over FTP.
2) Parse the CSV file with my own logic, such as record combination, duplicate deletion, and so on.
3) Store the parsed data in a MySQL database.
4) Run analysis against the MySQL database.
5) Generate reports as Excel, PPT, and Word documents.

Currently we are using the library JavaCSV, but it's not good enough for my project. The fastest library I could find recently is uniVocity-parsers. Do you have any other suggestions? Thanks.
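As an illustration of the step-2 logic, here is a minimal, stdlib-only Java sketch of streaming a CSV and dropping duplicates keyed on the first column (assumed here to be a record ID; that assumption and all names are illustrative). The naive field splitter below does not handle quoted fields or embedded commas, which is precisely what a dedicated parser like uniVocity-parsers provides:

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CsvDedup {
    // Minimal CSV field splitter. No support for quoted fields or
    // embedded commas; real carrier files need a proper CSV parser.
    static List<String> splitFields(String line) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= line.length(); i++) {
            if (i == line.length() || line.charAt(i) == ',') {
                fields.add(line.substring(start, i));
                start = i + 1;
            }
        }
        return fields;
    }

    // Streams the input line by line (so the whole file never has to
    // fit in memory as a String) and drops records whose first column
    // has already been seen.
    static List<List<String>> parseUnique(BufferedReader reader) throws Exception {
        Set<String> seenIds = new LinkedHashSet<>();
        List<List<String>> records = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            List<String> fields = splitFields(line);
            if (seenIds.add(fields.get(0))) {  // add() returns false for duplicates
                records.add(fields);
            }
        }
        return records;
    }

    public static void main(String[] args) throws Exception {
        String csv = "id1,alice\nid2,bob\nid1,alice\n";
        List<List<String>> rows =
                parseUnique(new BufferedReader(new StringReader(csv)));
        System.out.println(rows.size());        // 2 — the duplicate id1 row was dropped
        System.out.println(rows.get(1).get(1)); // bob
    }
}
```

For millions of rows, the in-memory `Set` of IDs is the main scaling concern; if the IDs do not fit in heap, deduplication is often better pushed into MySQL (a unique key plus `INSERT IGNORE`) after a fast bulk load.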
 
Is there any reason you can't do the entire process through Excel?
 
russ_watters said:
Is there any reason you can't do the entire process through Excel?
Thanks for your quick reply.
Sorry, but in step 2 I need to program my own business logic (which is complex and interacts with other business modules in my system), so a spreadsheet won't work.
 
The maximum number of rows in MS Excel is 1,048,576, and the same limit applies in OpenOffice/LibreOffice. LibreOffice tends to crash if you try to graph very large datasets; Excel handles that better but is still rather slow. I don't know how other spreadsheet programs respond to very large datasets.

Once you get to tens or hundreds of thousands of rows, spreadsheets tend to be suboptimal for any kind of data analysis. I have heard that "R" is good for that sort of thing, but I have never learned to use it properly, so I can't give a useful report on it.

BoB
 