How can I read CSV format in C/C++?

  • Context: C/C++
  • Thread starter: leon1127
  • Tags: CSV format

Discussion Overview

The discussion revolves around reading CSV formatted data in C/C++. Participants share code snippets, address issues related to memory management, and explore different methods for parsing CSV files. The conversation includes both theoretical and practical aspects of handling CSV data, with a focus on specific code implementations.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Homework-related
  • Exploratory

Main Points Raised

  • One participant shares their code for reading CSV data but encounters memory leaks and issues with parsing the data correctly.
  • Another participant suggests that memory leak detection may be misleading due to STL behavior and points out the need to close the file pointer.
  • Some participants propose using stringstream and getline() for more effective parsing of CSV data, highlighting its flexibility in handling delimiters.
  • A later reply mentions issues with the format of the CSV file, specifically the absence of line returns, which causes data to run together.
  • Participants discuss potential solutions for handling different line separators in the CSV file.
  • Another participant introduces a new problem related to reading a different text file format and seeks advice on parsing it correctly.

Areas of Agreement / Disagreement

There is no consensus on the best method for reading CSV files, as participants suggest different approaches and face varying issues with their implementations. The discussion remains unresolved regarding the optimal solution for all scenarios presented.

Contextual Notes

Participants express concerns about memory management and file handling, but specific assumptions about the CSV file format and its consistency are not fully addressed. The discussion includes various code snippets that may not be universally applicable due to differing file formats.

Who May Find This Useful

Readers interested in C/C++ programming, particularly those working with file I/O and data parsing, may find the insights and code examples shared in this discussion beneficial.

leon1127
Hi Guys,

I have been trying to read CSV-formatted data with C/C++, with no luck...

The format is as follows:
8/29/2008,19.54,19.6,19.28,19.38,11204900,19.38
8/28/2008,19.48,19.76,19.38,19.65,11729500,19.65
8/27/2008,19.08,19.45,18.93,19.37,9300100,19.37
8/26/2008,19.12,19.2,19,19.09,8770500,19.09
8/25/2008,19.34,19.4,19.05,19.09,13779300,19.09
8/22/2008,19.11,19.68,19.1,19.53,11087500,19.53
8/21/2008,19.06,19.18,18.87,19.11,16995100,19.11
8/20/2008,19.57,19.65,19.1,19.17,16336900,19.17
8/19/2008,19.78,19.91,19.41,19.42,12837300,19.42

and my code is
// csv_read.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <cstring>
#include <stdio.h>
#include <stdlib.h>
#include <vector>
#include <iostream>
using namespace std;

struct yahoo_data
{
    char date[9];
    double open;
    double high;
    double low;
    double close;
    float volume;
    double adj_close;
};

vector<yahoo_data> yhoo;

typedef struct yahoo_data data_format;

int _tmain(int argc, _TCHAR* argv[])
{

    FILE *fp;
    fp = fopen("table.csv", "r");
    char bufferCustNum[9];
    char bufferCost[40];
    int i = 0;

    data_format temp;

    cout << "before while" << endl;

    while (fgets(bufferCustNum, sizeof(bufferCustNum), fp) != NULL)
    {
        cout << "i = " << i << endl;
        if (i > 0) {

            strcpy(temp.date, strtok(bufferCustNum, ","));
            cout << temp.date << endl;
            /*
            strcpy(temp.open, (double)strtok(NULL,","));
            strcpy(temp.high, strtok(NULL,","));
            strcpy(temp.low, strtok(NULL,","));
            strcpy(temp.close, strtok(NULL,","));
            strcpy(temp.volume, strtok(NULL,","));
            strcpy(temp.adj_close, strtok(NULL,","));
            */

            temp.open = (double)atof(strtok(NULL, ","));
            cout << "after open" << endl;
            temp.high = atof(strtok(NULL, ","));
            cout << "after high" << endl;
            temp.low = atof(strtok(NULL, ","));
            cout << "after low" << endl;
            temp.close = atof(strtok(NULL, ","));
            cout << "after close" << " " << temp.close << endl;
            // cout << atof(strtok(NULL,",")) << endl;
            temp.volume = (int)atof(strtok(NULL, ","));
            cout << "after adjust close" << endl;
            temp.adj_close = atof(strtok(NULL, ","));
            cout << "before push" << endl;
            yhoo.push_back(temp);
            cout << "before push" << endl;
        }
        ++i;
    }

    for (int j = 0; j < 10; j++)
    {

    }
    return 0;
}


There are some unnecessary parts in the code, but it should compile under g++. I get some memory leaks... Does anyone have an idea?
 
How are you identifying memory leaks? Some implementations of the STL will cache small blocks of memory. And upon program termination, if those blocks are not used, they are not explicitly deallocated -- which is fine behavior, but will confuse some memory leak detectors.

That said, I notice you forgot to close the FILE* you created... (an fstream would automatically close upon going out of scope)

As an aside... strtok always scares me. Have you considered using stringstream to help with parsing? Or a string processing library, such as spirit from the boost libraries?
 
Yes, a stringstream and getline() will do the job. Remember, getline() isn't just for reading lines. You can specify a different "line terminator" instead of the default '\n'. getline(mystream, mystring, ',') reads from mystream into mystring, stopping at the next ',' (and discarding the ','). Here's a sample program based on your data file:

Code:
#include <iostream>
#include <sstream>
#include <fstream>
#include <vector>
#include <string>

using namespace std;

int main ()
{
    ifstream inFile ("csv.dat");
    string line;
    int linenum = 0;
    while (getline (inFile, line))
    {
        linenum++;
        cout << "\nLine #" << linenum << ":" << endl;
        istringstream linestream(line);
        string item;
        int itemnum = 0;
        while (getline (linestream, item, ','))
        {
            itemnum++;
            cout << "Item #" << itemnum << ": " << item << endl;
        }
    }

    return 0;
}

It produces the following output:

Code:
Line #1:
Item #1: 8/29/2008
Item #2: 19.54
Item #3: 19.6
Item #4: 19.28
Item #5: 19.38
Item #6: 11204900
Item #7: 19.38

Line #2:
Item #1: 8/28/2008
Item #2: 19.48
Item #3: 19.76
Item #4: 19.38
Item #5: 19.65
Item #6: 11729500
Item #7: 19.65
(etc.)
 
Thank you very much. That solved my problem!
 
I've tried the above code and it doesn't seem to work quite the same. I've learned that my .csv file ignores the white space and has no line returns. Therefore, it seems that the last number on the first line and the date on the next line run together, e.g.

19.38
8/28/2008

Is there any way to adjust the file type to add a line return after each line? Any other suggestions to solve this problem?
 
What character does your file use to separate lines? Tell the getline() that controls the while-loop to use that as the separator. For example, if it's a '|', the beginning of the while-loop would look like this:

Code:
while (getline (inFile, line, '|'))
 
Thanks! The code does exactly what I'm looking for too.
 
Hi all,

I'm trying to read a text file which has the following format. What I would like to do is:
- Put "25" into a number_data variable, "6" into a number_place variable, and "1" into a timestep variable.
- Separate the year (e.g. 1990), month (e.g. 01), and day (e.g. 01) into 3 variables.
- Handle the mix of numbers and NA, especially in the Place3 data.

I tried jtbell's code and it worked for the first three lines, but I have no idea how to write out just what I need and how to handle the rest of the data file.
Could anyone tell me how to do this? Thank you very much.

Number of data points 25
Number of places 6
Timestep 1 hour
Date Hour Place1 Place2 Place3 Place4 Place5 Place6
1990-01-01 1 25.002 NA NA 16.265 6.231 9.680
1990-01-01 2 24.449 NA NA 16.265 6.231 9.551
1990-01-01 3 24.449 NA NA 16.265 6.231 9.551
1990-01-01 4 24.550 NA NA 16.265 6.231 9.551
1990-01-01 5 24.851 NA NA 16.265 6.130 9.551
1990-01-01 6 25.002 NA NA 16.099 6.130 9.421
1990-01-01 7 25.306 NA 29.540 15.933 6.130 9.421
1990-01-01 8 25.357 NA 29.197 15.933 6.130 9.421
1990-01-01 9 25.357 NA 28.856 15.933 6.029 9.389
1990-01-01 10 25.306 NA 28.477 15.769 6.029 9.260
1990-01-01 11 25.306 NA 28.176 15.769 6.029 9.260
1990-01-01 12 25.103 NA 27.913 15.605 6.029 9.132
1990-01-01 13 24.952 NA 27.651 15.605 6.029 9.132
1990-01-01 14 24.901 NA 27.464 15.605 5.905 9.132
1990-01-01 15 24.851 NA 27.315 15.442 6.004 9.132
1990-01-01 16 24.801 NA 27.240 15.442 5.905 9.003
1990-01-01 17 24.700 NA 27.240 15.442 5.905 9.003
1990-01-01 18 24.550 NA 27.278 15.442 5.905 9.003
1990-01-01 19 24.350 NA NA 15.442 5.905 9.003
1990-01-01 20 24.150 NA NA 15.279 5.905 9.003
1990-01-01 21 23.952 NA NA 15.239 5.806 8.875
1990-01-01 22 23.803 NA NA 15.078 5.806 8.875
1990-01-01 23 23.704 NA NA 14.758 5.806 8.875
1990-01-01 24 23.557 NA NA 14.758 5.806 8.875
1990-01-02 1 23.458 NA NA 14.758 5.684 8.875

Regards,

MT
 
