Comp Sci Java - Importing .csv file for calculating selected averages

  • Thread starter Thread starter SalfordPhysics
  • Start date Start date
  • Tags Tags
    File Java
AI Thread Summary
The discussion focuses on importing a CSV file containing monthly weather data to calculate yearly averages for rainfall and temperatures using Java. The user has successfully calculated overall averages but seeks to refine the program to compute averages for specific years. Suggestions include reading the file twice: the first pass to count the total months and the second to store data in an array for easier calculations. The conversation also touches on how to implement indexing to access specific year data, such as calculating the average temperature for 1994. Overall, the user is encouraged to experiment with the provided code structures to achieve the desired functionality.
SalfordPhysics
Messages
68
Reaction score
1
1. Problem: The task is to import an excel (.csv) file containing monthly weather data and produce a programme that can execute different averages on a yearly basis. i.e.;, produce average rainfall, maximum or minimum temperatures for desired years.

I have thus far managed to calculate average values for entire rows, but I want the programme to only calculate averages of desired values (yearly basis), so that this year's average data can be requested.

2. Relevant Information:

The format of the excel file is (representing a column each): {Year, Month, Max Temp, Min Temp, Rainfall, ...}

In the attempt below, avMonth is showing all the months listed then the average at the bottom as I expected. I have no idea where to go now to complete the task. 3. Attempt:
Mod note: Added [ code ] tags to restore indentation.[/color]
Code:
package weatherPackage;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Main {
	
	public static void main(String[] args) {
		String fileName = "sheffielddata.csv";
		File file = new File(fileName);
		try {
			Scanner inputStream = new Scanner(file);
			double sum = 0;
			double numberofMonths = 0;
			while (inputStream.hasNext()){
				String data = inputStream.next();
				String[] values = data.split(",");
				double avMonth = Double.parseDouble(values [1]);
				sum += avMonth;
				numberofMonths++;
				System.out.println(avMonth);
			}
			inputStream.close();
			System.out.println("Average - " + (sum / numberofMonths));
			
		} catch (FileNotFoundException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
		
	}
}
 
Last edited by a moderator:
Physics news on Phys.org
Hint: You're allowed to read the file twice - and after the first read, you'll know how many data values there are.
 
.Scott said:
Hint: You're allowed to read the file twice - and after the first read, you'll know how many data values there are.

Thanks for the response Scott, although I can't say I am any the wiser. I am struggling a lot due to next-to-no java experience. Any kind of elaboration would be much appreciated.

My numberOfMonths gives me total number of rows, which calculates me my overall averages. But I want to just calculate averages on a yearly basis as chosen by indexing so...

1) Set up variables for calculating averages
a) Since the divisor is over a year on a monthly chart it is just 12, so I can just put this in.
b) To be thorough, I could divide my numberOfMonths by the actual number of years, but this I am not sure how to do. It would need to only register every 12 rows (since each row is a month). Could this be done by use of indexOf? If so, how do I go about this on the multiple basis I need?

2) Are all the rows already indexed? When I println(numberOfMonths) I get 1 through to 804 (which is the number of rows). Can I implement where by inserting these values the corresponding row of data can be displayed?

So many questions and I apologise that I aren't as clear as could be.
Many thanks again, and to all in advance.
 
The first thing you need to do is decide how you going to access the data.
1) One way would be to keep the data in the file and only read what you need when you need it. That would be a bit complicated and perhaps not that efficient.
2) Another would be to try to collect and compute everything you needed in a single pass through the data. If possible, that would be the most efficient use of system resources.
3) Less complicated would be to read the entire file into a "List" and then copy the list into an array.
4) The least complicated is to read through your data twice (two passes). On the first pass, you just count the months. On the second pass, you read everything into a convenient array. Since that's the simplest for coding, that's what I show in the examples below.

Also in my examples below, I am assuming that the first two value are the year and month.

Here's one way of getting the data into an array:
1) Define a structure for my weather data, for example:
Code:
  public class Weather {
    public int nYear;
    public int nMonth;
    public double dAvTemp;
    public double dMaxTemp;
    public double dMinTemp;
    public double dAvHumidity;
    ... whatever ...
  }
2) As you did in your code, read the data set and determine numberOfMonths.
3) Declare your array, as with:
Code:
  Weather[] WeatherArray = new Weather [numberOfMonths];
4) Read through the file again, this time storing everything away in the array:
Code:
  for(int nMonth=0; nMonth<numberOfMonths; nMonth++) {
    String data = inputStream.next();
    String[] values = data.split(",");
    WeatherArray[nMonth].nYear      = Integer.parseInt(values[0]);
    WeatherArray[nMonth].nMonth    = Integer.parseInt(values[1]);
    WeatherArray[nMonth].dAvTemp = Double.parseDouble(values [2]);
    ... whatever ...
  }

With all the data in an array, I guessing you would know how to compute any statistics. For example, average March temperatures or maximum January temperature.

Without knowing the details of the csv file, it's hard to suggest anything further.

If you don't actually have year information in the row, then you can do "nFullYears = numberOfMonths/12" as you suggested. If you have the option, put the year and month with the data.

If you want to find a break from one year to the next, you can do this:
Code:
  int nThisMonth = 0;
  int nTotalMonths = 0;
  double dAvTemp = 0.0;
  double dMaxTemp = -999.9;
  for(int nRow=0; nRow<nomberOfMonths; nRow++) {
    if(WeatherArray[nRow].nMonth != nThisMonth) {
      if(nTotalMonths>0) {
        dAvTemp = dAvTemp/nTotalMonths;
        print everything for this year ...
      }
      nThisMonth = nMonth;
      nTotalMonths = 0;
      dAvTemp = 0.0;
      dMaxTemp = -999.9;
    }
    nTotalMonths++;
    dAvTemp += WeatherArray[nRow].dTemp;
    double dMax = WeatherArray[nRow].dMaxTemp;
    if(dMaxTemp<dMax) dMaxTemp = dMax;
  }
  if(nTotalMonths>0) {
    dAvTemp = dAvTemp/nTotalMonths;
    print everything for this year ...
  }

I don't think I've answered all your questions. But I think you have enough to experiment.
As you experiment, you should discover different ways of doing different things.
I hope you have a build environment handy.
 
  • Like
Likes 1 person
Thanks again Scott great help and given me a much better perspective of things, I will follow through as you suggested.

.Scott said:
With all the data in an array, I guessing you would know how to compute any statistics. For example, average March temperatures or maximum January temperature.


Unfortunately, I am again at a loss with the computations. I have been researching using indexOf, but have not had any success when using within [code/] println [code/] to get index values of first occurrence of, for example, 1994.

How would I, for example, calculate the average temperature of all months in 1994, given that the data is given monthly? This would also help me in figuring out how to select by year for computations.

Again, thanks so much for the help.
 
SalfordPhysics said:
How would I, for example, calculate the average temperature of all months in 1994, given that the data is given monthly? This would also help me in figuring out how to select by year for computations.

Again, thanks so much for the help.
Let's assume that the year and month are included each line of the *.csv file - as I described above. And also assume that we have read the data into the array as described above.
Then:
Code:
  double av1994 = 0.0;
  int nM1994 = 0;
  for(int nMonth=0; nMonth<numberOfMonths; nMonth++) {
    if(WeatherArray[nMonth].nYear==1994) {
      nM1994++;
      av1994 += WeatherArray[nMonth].dTemp;
    }
  }
  if(nM1994>0) {
    av1994 = av1994/nM1994;
    ... output the result ...
  } else {
    ... report that there is no data for 1994 ...
  }
 
.Scott said:
Let's assume that the year and month are included each line of the *.csv file - as I described above. And also assume that we have read the data into the array as described above.
Then:
Code:
  double av1994 = 0.0;
  int nM1994 = 0;
  for(int nMonth=0; nMonth<numberOfMonths; nMonth++) {
    if(WeatherArray[nMonth].nYear==1994) {
      nM1994++;
      av1994 += WeatherArray[nMonth].dTemp;
    }
  }
  if(nM1994>0) {
    av1994 = av1994/nM1994;
    ... output the result ...
  } else {
    ... report that there is no data for 1994 ...
  }

Thanks again Scott for all your help, this is an edit after working back through again.

I have compiled as you instructed, and up to where you left of after constructing WeatherArray.
At this point, I can get the earlier print out of average over a column (that helps in getting numberOfMonths) but I can't get a print out of anything after WeatherArray, not even numberofMonths, it is returning
"Exception in thread "main" java.lang.IllegalStateException: Scanner closed
at java.util.Scanner.ensureOpen(Scanner.java:1070)
at java.util.Scanner.next(Scanner.java:1358)
at sheffieldWeatherPackage.Weather.main(Weather.java:37)


Here is my code, much replicating yours, for reference;


package sheffieldWeatherPackage;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Weather {
public int nYear;
public int nMonth;
public double maxTemp;
public double minTemp;
public double afDays;
public double rainFall;
public double sunHrs;


public static void main(String[] args) {
String fileName = "sheffielddata.csv";
File file = new File(fileName);
try {
Scanner inputStream = new Scanner(file);
double sum = 0;
int numberofMonths = 0;
while (inputStream.hasNext()){
String data = inputStream.next();
String[] values = data.split(",");
double avRain = Double.parseDouble(values[5]);
sum += avRain;
numberofMonths++;
System.out.println(numberofMonths);
}
inputStream.close();
System.out.println("Average - " + (sum / numberofMonths));

Weather[] WeatherArray = new Weather [numberofMonths];
for (int nMonth = 0; nMonth<numberofMonths; nMonth++) {
String data = inputStream.next();
String[] values = data.split(",");
WeatherArray[nMonth].nYear = Integer.parseInt(values[0]);
WeatherArray[nMonth].nMonth = Integer.parseInt(values[1]);
WeatherArray[nMonth].maxTemp = Integer.parseInt(values[2]);
WeatherArray[nMonth].minTemp = Integer.parseInt(values[3]);
WeatherArray[nMonth].afDays = Integer.parseInt(values[4]);
WeatherArray[nMonth].rainFall = Integer.parseInt(values[5]);
WeatherArray[nMonth].sunHrs = Integer.parseInt(values[6]);



System.out.println(numberofMonths);


}

} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}

}
}
 
Last edited:
SalfordPhysics said:
Thanks again Scott great help and given me a much better perspective of things, I will follow through as you suggested.




Unfortunately, I am again at a loss with the computations. I have been researching using indexOf, but have not had any success when using within [code/] println [code/] to get index values of first occurrence of, for example, 1994.
The code tags should look like this:
[code[/color]] your code [/code]
SalfordPhysics said:
How would I, for example, calculate the average temperature of all months in 1994, given that the data is given monthly? This would also help me in figuring out how to select by year for computations.

Again, thanks so much for the help.
 
I would like to provide the following code in calculating average rainfall, maximum or minimum temperatures.

Java:
public static void readCSV() throws FileNotFoundException {
        // 1st, config the CSV reader, such as line separator, column separator and so on
        CsvParserSettings settings = new CsvParserSettings();
        settings.getFormat().setLineSeparator("\n");

        // 2nd, creates a CSV parser with the configs
        CsvParser parser = new CsvParser(settings);

        // 3rd, parses all rows from the CSV file into a 2-dimensional array
        List<String[]> resolvedData = parser.parseAll(new FileReader("/examples/example.csv"));

        // 4th, fill the data list with 1st column in all rows in the csv file.
        List<Integer> data = new ArrayList<Integer>();
        for (String[] row : resolvedData) {
            data.add(Integer.parseInt(row[0]));
        }

        // 5th, find out the smallest, largest, average and sum
        int smallest = Integer.MAX_VALUE, largest = 0, average = 0, sum = 0;
        for (Integer val : data) {
            sum += val;
            smallest = val <= smallest ? val : smallest;
            largest = val > largest ? val : largest;
        }

        average = data.isEmpty() ? 0 : sum / data.size();
    }

In this code example, I used the open source library uniVocity-parsers to parse CSV files into list of arrays just in few lines of code. You can try to extend from this code to fullfill your requirements better.
 

Similar threads

Replies
7
Views
2K
Replies
10
Views
11K
Replies
6
Views
3K
Replies
2
Views
3K
Replies
1
Views
2K
Replies
1
Views
2K
Replies
2
Views
4K
Back
Top