Java - Importing .csv file for calculating selected averages

  • Context: Comp Sci 
  • Thread starter Thread starter SalfordPhysics
  • Start date Start date
  • Tags Tags
    File Java
Click For Summary

Discussion Overview

The discussion revolves around importing a CSV file containing monthly weather data into a Java program to calculate yearly averages for rainfall and temperatures. Participants explore various methods for processing the data efficiently and addressing specific programming challenges related to indexing and data retrieval.

Discussion Character

  • Technical explanation
  • Exploratory
  • Homework-related

Main Points Raised

  • One participant describes the initial problem of calculating averages from a CSV file and expresses a desire to compute averages on a yearly basis instead of overall averages.
  • Another participant suggests reading the file twice to first determine the number of data values and then to process the data into an array for easier access.
  • There are suggestions for structuring the weather data into a class to facilitate calculations, including storing year, month, and temperature data.
  • One participant raises a question about how to calculate averages for specific years, such as 1994, and discusses the challenges of using indexing to access the correct data.
  • Multiple methods for accessing and processing the data are proposed, including reading the entire file into a list or array and using loops to compute statistics based on year.
  • Some participants express confusion regarding the implementation of their ideas and seek further clarification on specific coding techniques, such as using indexOf.

Areas of Agreement / Disagreement

Participants generally agree on the need to structure the data for efficient access and computation, but there is no consensus on the best approach to implement the calculations or handle specific programming challenges. The discussion remains unresolved regarding the most effective coding strategies.

Contextual Notes

Participants mention limitations related to their Java experience and the complexity of the task, indicating that further experimentation and clarification are needed to fully understand the implementation details.

Who May Find This Useful

This discussion may be useful for individuals interested in Java programming, data processing, and statistical analysis, particularly those working with CSV files and weather data.

SalfordPhysics
Messages
68
Reaction score
1
1. Problem: The task is to import an excel (.csv) file containing monthly weather data and produce a programme that can execute different averages on a yearly basis. i.e.;, produce average rainfall, maximum or minimum temperatures for desired years.

I have thus far managed to calculate average values for entire rows, but I want the programme to only calculate averages of desired values (yearly basis), so that this year's average data can be requested.

2. Relevant Information:

The format of the excel file is (representing a column each): {Year, Month, Max Temp, Min Temp, Rainfall, ...}

In the attempt below, avMonth is showing all the months listed then the average at the bottom as I expected. I have no idea where to go now to complete the task. 3. Attempt:
Mod note: Added [ code ] tags to restore indentation.[/color]
Code:
package weatherPackage;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Main {
	
	public static void main(String[] args) {
		String fileName = "sheffielddata.csv";
		File file = new File(fileName);
		try {
			Scanner inputStream = new Scanner(file);
			double sum = 0;
			double numberofMonths = 0;
			while (inputStream.hasNext()){
				String data = inputStream.next();
				String[] values = data.split(",");
				double avMonth = Double.parseDouble(values [1]);
				sum += avMonth;
				numberofMonths++;
				System.out.println(avMonth);
			}
			inputStream.close();
			System.out.println("Average - " + (sum / numberofMonths));
			
		} catch (FileNotFoundException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
		
	}
}
 
Last edited by a moderator:
Physics news on Phys.org
Hint: You're allowed to read the file twice - and after the first read, you'll know how many data values there are.
 
.Scott said:
Hint: You're allowed to read the file twice - and after the first read, you'll know how many data values there are.

Thanks for the response Scott, although I can't say I am any the wiser. I am struggling a lot due to next-to-no java experience. Any kind of elaboration would be much appreciated.

My numberOfMonths gives me total number of rows, which calculates me my overall averages. But I want to just calculate averages on a yearly basis as chosen by indexing so...

1) Set up variables for calculating averages
a) Since the divisor is over a year on a monthly chart it is just 12, so I can just put this in.
b) To be thorough, I could divide my numberOfMonths by the actual number of years, but this I am not sure how to do. It would need to only register every 12 rows (since each row is a month). Could this be done by use of indexOf? If so, how do I go about this on the multiple basis I need?

2) Are all the rows already indexed? When I println(numberOfMonths) I get 1 through to 804 (which is the number of rows). Can I implement where by inserting these values the corresponding row of data can be displayed?

So many questions and I apologise that I aren't as clear as could be.
Many thanks again, and to all in advance.
 
The first thing you need to do is decide how you going to access the data.
1) One way would be to keep the data in the file and only read what you need when you need it. That would be a bit complicated and perhaps not that efficient.
2) Another would be to try to collect and compute everything you needed in a single pass through the data. If possible, that would be the most efficient use of system resources.
3) Less complicated would be to read the entire file into a "List" and then copy the list into an array.
4) The least complicated is to read through your data twice (two passes). On the first pass, you just count the months. On the second pass, you read everything into a convenient array. Since that's the simplest for coding, that's what I show in the examples below.

Also in my examples below, I am assuming that the first two value are the year and month.

Here's one way of getting the data into an array:
1) Define a structure for my weather data, for example:
Code:
  public class Weather {
    public int nYear;
    public int nMonth;
    public double dAvTemp;
    public double dMaxTemp;
    public double dMinTemp;
    public double dAvHumidity;
    ... whatever ...
  }
2) As you did in your code, read the data set and determine numberOfMonths.
3) Declare your array, as with:
Code:
  Weather[] WeatherArray = new Weather [numberOfMonths];
4) Read through the file again, this time storing everything away in the array:
Code:
  for(int nMonth=0; nMonth<numberOfMonths; nMonth++) {
    String data = inputStream.next();
    String[] values = data.split(",");
    WeatherArray[nMonth].nYear      = Integer.parseInt(values[0]);
    WeatherArray[nMonth].nMonth    = Integer.parseInt(values[1]);
    WeatherArray[nMonth].dAvTemp = Double.parseDouble(values [2]);
    ... whatever ...
  }

With all the data in an array, I guessing you would know how to compute any statistics. For example, average March temperatures or maximum January temperature.

Without knowing the details of the csv file, it's hard to suggest anything further.

If you don't actually have year information in the row, then you can do "nFullYears = numberOfMonths/12" as you suggested. If you have the option, put the year and month with the data.

If you want to find a break from one year to the next, you can do this:
Code:
  int nThisMonth = 0;
  int nTotalMonths = 0;
  double dAvTemp = 0.0;
  double dMaxTemp = -999.9;
  for(int nRow=0; nRow<nomberOfMonths; nRow++) {
    if(WeatherArray[nRow].nMonth != nThisMonth) {
      if(nTotalMonths>0) {
        dAvTemp = dAvTemp/nTotalMonths;
        print everything for this year ...
      }
      nThisMonth = nMonth;
      nTotalMonths = 0;
      dAvTemp = 0.0;
      dMaxTemp = -999.9;
    }
    nTotalMonths++;
    dAvTemp += WeatherArray[nRow].dTemp;
    double dMax = WeatherArray[nRow].dMaxTemp;
    if(dMaxTemp<dMax) dMaxTemp = dMax;
  }
  if(nTotalMonths>0) {
    dAvTemp = dAvTemp/nTotalMonths;
    print everything for this year ...
  }

I don't think I've answered all your questions. But I think you have enough to experiment.
As you experiment, you should discover different ways of doing different things.
I hope you have a build environment handy.
 
  • Like
Likes   Reactions: 1 person
Thanks again Scott great help and given me a much better perspective of things, I will follow through as you suggested.

.Scott said:
With all the data in an array, I guessing you would know how to compute any statistics. For example, average March temperatures or maximum January temperature.


Unfortunately, I am again at a loss with the computations. I have been researching using indexOf, but have not had any success when using within [code/] println [code/] to get index values of first occurrence of, for example, 1994.

How would I, for example, calculate the average temperature of all months in 1994, given that the data is given monthly? This would also help me in figuring out how to select by year for computations.

Again, thanks so much for the help.
 
SalfordPhysics said:
How would I, for example, calculate the average temperature of all months in 1994, given that the data is given monthly? This would also help me in figuring out how to select by year for computations.

Again, thanks so much for the help.
Let's assume that the year and month are included each line of the *.csv file - as I described above. And also assume that we have read the data into the array as described above.
Then:
Code:
  double av1994 = 0.0;
  int nM1994 = 0;
  for(int nMonth=0; nMonth<numberOfMonths; nMonth++) {
    if(WeatherArray[nMonth].nYear==1994) {
      nM1994++;
      av1994 += WeatherArray[nMonth].dTemp;
    }
  }
  if(nM1994>0) {
    av1994 = av1994/nM1994;
    ... output the result ...
  } else {
    ... report that there is no data for 1994 ...
  }
 
.Scott said:
Let's assume that the year and month are included each line of the *.csv file - as I described above. And also assume that we have read the data into the array as described above.
Then:
Code:
  double av1994 = 0.0;
  int nM1994 = 0;
  for(int nMonth=0; nMonth<numberOfMonths; nMonth++) {
    if(WeatherArray[nMonth].nYear==1994) {
      nM1994++;
      av1994 += WeatherArray[nMonth].dTemp;
    }
  }
  if(nM1994>0) {
    av1994 = av1994/nM1994;
    ... output the result ...
  } else {
    ... report that there is no data for 1994 ...
  }

Thanks again Scott for all your help, this is an edit after working back through again.

I have compiled as you instructed, and up to where you left of after constructing WeatherArray.
At this point, I can get the earlier print out of average over a column (that helps in getting numberOfMonths) but I can't get a print out of anything after WeatherArray, not even numberofMonths, it is returning
"Exception in thread "main" java.lang.IllegalStateException: Scanner closed
at java.util.Scanner.ensureOpen(Scanner.java:1070)
at java.util.Scanner.next(Scanner.java:1358)
at sheffieldWeatherPackage.Weather.main(Weather.java:37)


Here is my code, much replicating yours, for reference;


package sheffieldWeatherPackage;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Weather {
public int nYear;
public int nMonth;
public double maxTemp;
public double minTemp;
public double afDays;
public double rainFall;
public double sunHrs;


public static void main(String[] args) {
String fileName = "sheffielddata.csv";
File file = new File(fileName);
try {
Scanner inputStream = new Scanner(file);
double sum = 0;
int numberofMonths = 0;
while (inputStream.hasNext()){
String data = inputStream.next();
String[] values = data.split(",");
double avRain = Double.parseDouble(values[5]);
sum += avRain;
numberofMonths++;
System.out.println(numberofMonths);
}
inputStream.close();
System.out.println("Average - " + (sum / numberofMonths));

Weather[] WeatherArray = new Weather [numberofMonths];
for (int nMonth = 0; nMonth<numberofMonths; nMonth++) {
String data = inputStream.next();
String[] values = data.split(",");
WeatherArray[nMonth].nYear = Integer.parseInt(values[0]);
WeatherArray[nMonth].nMonth = Integer.parseInt(values[1]);
WeatherArray[nMonth].maxTemp = Integer.parseInt(values[2]);
WeatherArray[nMonth].minTemp = Integer.parseInt(values[3]);
WeatherArray[nMonth].afDays = Integer.parseInt(values[4]);
WeatherArray[nMonth].rainFall = Integer.parseInt(values[5]);
WeatherArray[nMonth].sunHrs = Integer.parseInt(values[6]);



System.out.println(numberofMonths);


}

} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}

}
}
 
Last edited:
SalfordPhysics said:
Thanks again Scott great help and given me a much better perspective of things, I will follow through as you suggested.




Unfortunately, I am again at a loss with the computations. I have been researching using indexOf, but have not had any success when using within [code/] println [code/] to get index values of first occurrence of, for example, 1994.
The code tags should look like this:
[code[/color]] your code [/code]
SalfordPhysics said:
How would I, for example, calculate the average temperature of all months in 1994, given that the data is given monthly? This would also help me in figuring out how to select by year for computations.

Again, thanks so much for the help.
 
I would like to provide the following code in calculating average rainfall, maximum or minimum temperatures.

Java:
public static void readCSV() throws FileNotFoundException {
        // 1st, config the CSV reader, such as line separator, column separator and so on
        CsvParserSettings settings = new CsvParserSettings();
        settings.getFormat().setLineSeparator("\n");

        // 2nd, creates a CSV parser with the configs
        CsvParser parser = new CsvParser(settings);

        // 3rd, parses all rows from the CSV file into a 2-dimensional array
        List<String[]> resolvedData = parser.parseAll(new FileReader("/examples/example.csv"));

        // 4th, fill the data list with 1st column in all rows in the csv file.
        List<Integer> data = new ArrayList<Integer>();
        for (String[] row : resolvedData) {
            data.add(Integer.parseInt(row[0]));
        }

        // 5th, find out the smallest, largest, average and sum
        int smallest = Integer.MAX_VALUE, largest = 0, average = 0, sum = 0;
        for (Integer val : data) {
            sum += val;
            smallest = val <= smallest ? val : smallest;
            largest = val > largest ? val : largest;
        }

        average = data.isEmpty() ? 0 : sum / data.size();
    }

In this code example, I used the open source library uniVocity-parsers to parse CSV files into list of arrays just in few lines of code. You can try to extend from this code to fullfill your requirements better.
 

Similar threads

  • · Replies 7 ·
Replies
7
Views
3K
  • · Replies 10 ·
Replies
10
Views
12K
  • · Replies 11 ·
Replies
11
Views
3K
  • · Replies 6 ·
Replies
6
Views
3K
  • · Replies 2 ·
Replies
2
Views
3K
  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 4 ·
Replies
4
Views
3K
  • · Replies 2 ·
Replies
2
Views
7K
  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 2 ·
Replies
2
Views
4K