Java - Importing .csv file for calculating selected averages

  • Comp Sci
  • Thread starter SalfordPhysics
  • Start date
  • Tags
    File Java
In summary, the task is to import an excel (.csv) file containing monthly weather data and produce a programme that can execute different averages on a yearly basis. i.e.;, produce average rainfall, maximum or minimum temperatures for desired years.
  • #1
SalfordPhysics
69
1
1. Problem: The task is to import an excel (.csv) file containing monthly weather data and produce a programme that can execute different averages on a yearly basis. i.e.;, produce average rainfall, maximum or minimum temperatures for desired years.

I have thus far managed to calculate average values for entire rows, but I want the programme to only calculate averages of desired values (yearly basis), so that this year's average data can be requested.

2. Relevant Information:

The format of the excel file is (representing a column each): {Year, Month, Max Temp, Min Temp, Rainfall, ...}

In the attempt below, avMonth is showing all the months listed then the average at the bottom as I expected. I have no idea where to go now to complete the task. 3. Attempt:
Mod note: Added [ code ] tags to restore indentation.
Code:
package weatherPackage;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Main {
	
	public static void main(String[] args) {
		String fileName = "sheffielddata.csv";
		File file = new File(fileName);
		try {
			Scanner inputStream = new Scanner(file);
			double sum = 0;
			double numberofMonths = 0;
			while (inputStream.hasNext()){
				String data = inputStream.next();
				String[] values = data.split(",");
				double avMonth = Double.parseDouble(values [1]);
				sum += avMonth;
				numberofMonths++;
				System.out.println(avMonth);
			}
			inputStream.close();
			System.out.println("Average - " + (sum / numberofMonths));
			
		} catch (FileNotFoundException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
		
	}
}
 
Last edited by a moderator:
Physics news on Phys.org
  • #2
Hint: You're allowed to read the file twice - and after the first read, you'll know how many data values there are.
 
  • #3
.Scott said:
Hint: You're allowed to read the file twice - and after the first read, you'll know how many data values there are.

Thanks for the response Scott, although I can't say I am any the wiser. I am struggling a lot due to next-to-no java experience. Any kind of elaboration would be much appreciated.

My numberOfMonths gives me total number of rows, which calculates me my overall averages. But I want to just calculate averages on a yearly basis as chosen by indexing so...

1) Set up variables for calculating averages
a) Since the divisor is over a year on a monthly chart it is just 12, so I can just put this in.
b) To be thorough, I could divide my numberOfMonths by the actual number of years, but this I am not sure how to do. It would need to only register every 12 rows (since each row is a month). Could this be done by use of indexOf? If so, how do I go about this on the multiple basis I need?

2) Are all the rows already indexed? When I println(numberOfMonths) I get 1 through to 804 (which is the number of rows). Can I implement where by inserting these values the corresponding row of data can be displayed?

So many questions and I apologise that I aren't as clear as could be.
Many thanks again, and to all in advance.
 
  • #4
The first thing you need to do is decide how you going to access the data.
1) One way would be to keep the data in the file and only read what you need when you need it. That would be a bit complicated and perhaps not that efficient.
2) Another would be to try to collect and compute everything you needed in a single pass through the data. If possible, that would be the most efficient use of system resources.
3) Less complicated would be to read the entire file into a "List" and then copy the list into an array.
4) The least complicated is to read through your data twice (two passes). On the first pass, you just count the months. On the second pass, you read everything into a convenient array. Since that's the simplest for coding, that's what I show in the examples below.

Also in my examples below, I am assuming that the first two value are the year and month.

Here's one way of getting the data into an array:
1) Define a structure for my weather data, for example:
Code:
  public class Weather {
    public int nYear;
    public int nMonth;
    public double dAvTemp;
    public double dMaxTemp;
    public double dMinTemp;
    public double dAvHumidity;
    ... whatever ...
  }
2) As you did in your code, read the data set and determine numberOfMonths.
3) Declare your array, as with:
Code:
  Weather[] WeatherArray = new Weather [numberOfMonths];
4) Read through the file again, this time storing everything away in the array:
Code:
  for(int nMonth=0; nMonth<numberOfMonths; nMonth++) {
    String data = inputStream.next();
    String[] values = data.split(",");
    WeatherArray[nMonth].nYear      = Integer.parseInt(values[0]);
    WeatherArray[nMonth].nMonth    = Integer.parseInt(values[1]);
    WeatherArray[nMonth].dAvTemp = Double.parseDouble(values [2]);
    ... whatever ...
  }

With all the data in an array, I guessing you would know how to compute any statistics. For example, average March temperatures or maximum January temperature.

Without knowing the details of the csv file, it's hard to suggest anything further.

If you don't actually have year information in the row, then you can do "nFullYears = numberOfMonths/12" as you suggested. If you have the option, put the year and month with the data.

If you want to find a break from one year to the next, you can do this:
Code:
  int nThisMonth = 0;
  int nTotalMonths = 0;
  double dAvTemp = 0.0;
  double dMaxTemp = -999.9;
  for(int nRow=0; nRow<nomberOfMonths; nRow++) {
    if(WeatherArray[nRow].nMonth != nThisMonth) {
      if(nTotalMonths>0) {
        dAvTemp = dAvTemp/nTotalMonths;
        print everything for this year ...
      }
      nThisMonth = nMonth;
      nTotalMonths = 0;
      dAvTemp = 0.0;
      dMaxTemp = -999.9;
    }
    nTotalMonths++;
    dAvTemp += WeatherArray[nRow].dTemp;
    double dMax = WeatherArray[nRow].dMaxTemp;
    if(dMaxTemp<dMax) dMaxTemp = dMax;
  }
  if(nTotalMonths>0) {
    dAvTemp = dAvTemp/nTotalMonths;
    print everything for this year ...
  }

I don't think I've answered all your questions. But I think you have enough to experiment.
As you experiment, you should discover different ways of doing different things.
I hope you have a build environment handy.
 
  • Like
Likes 1 person
  • #5
Thanks again Scott great help and given me a much better perspective of things, I will follow through as you suggested.

.Scott said:
With all the data in an array, I guessing you would know how to compute any statistics. For example, average March temperatures or maximum January temperature.


Unfortunately, I am again at a loss with the computations. I have been researching using indexOf, but have not had any success when using within [code/] println [code/] to get index values of first occurrence of, for example, 1994.

How would I, for example, calculate the average temperature of all months in 1994, given that the data is given monthly? This would also help me in figuring out how to select by year for computations.

Again, thanks so much for the help.
 
  • #6
SalfordPhysics said:
How would I, for example, calculate the average temperature of all months in 1994, given that the data is given monthly? This would also help me in figuring out how to select by year for computations.

Again, thanks so much for the help.
Let's assume that the year and month are included each line of the *.csv file - as I described above. And also assume that we have read the data into the array as described above.
Then:
Code:
  double av1994 = 0.0;
  int nM1994 = 0;
  for(int nMonth=0; nMonth<numberOfMonths; nMonth++) {
    if(WeatherArray[nMonth].nYear==1994) {
      nM1994++;
      av1994 += WeatherArray[nMonth].dTemp;
    }
  }
  if(nM1994>0) {
    av1994 = av1994/nM1994;
    ... output the result ...
  } else {
    ... report that there is no data for 1994 ...
  }
 
  • #7
.Scott said:
Let's assume that the year and month are included each line of the *.csv file - as I described above. And also assume that we have read the data into the array as described above.
Then:
Code:
  double av1994 = 0.0;
  int nM1994 = 0;
  for(int nMonth=0; nMonth<numberOfMonths; nMonth++) {
    if(WeatherArray[nMonth].nYear==1994) {
      nM1994++;
      av1994 += WeatherArray[nMonth].dTemp;
    }
  }
  if(nM1994>0) {
    av1994 = av1994/nM1994;
    ... output the result ...
  } else {
    ... report that there is no data for 1994 ...
  }

Thanks again Scott for all your help, this is an edit after working back through again.

I have compiled as you instructed, and up to where you left of after constructing WeatherArray.
At this point, I can get the earlier print out of average over a column (that helps in getting numberOfMonths) but I can't get a print out of anything after WeatherArray, not even numberofMonths, it is returning
"Exception in thread "main" java.lang.IllegalStateException: Scanner closed
at java.util.Scanner.ensureOpen(Scanner.java:1070)
at java.util.Scanner.next(Scanner.java:1358)
at sheffieldWeatherPackage.Weather.main(Weather.java:37)


Here is my code, much replicating yours, for reference;


package sheffieldWeatherPackage;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Weather {
public int nYear;
public int nMonth;
public double maxTemp;
public double minTemp;
public double afDays;
public double rainFall;
public double sunHrs;


public static void main(String[] args) {
String fileName = "sheffielddata.csv";
File file = new File(fileName);
try {
Scanner inputStream = new Scanner(file);
double sum = 0;
int numberofMonths = 0;
while (inputStream.hasNext()){
String data = inputStream.next();
String[] values = data.split(",");
double avRain = Double.parseDouble(values[5]);
sum += avRain;
numberofMonths++;
System.out.println(numberofMonths);
}
inputStream.close();
System.out.println("Average - " + (sum / numberofMonths));

Weather[] WeatherArray = new Weather [numberofMonths];
for (int nMonth = 0; nMonth<numberofMonths; nMonth++) {
String data = inputStream.next();
String[] values = data.split(",");
WeatherArray[nMonth].nYear = Integer.parseInt(values[0]);
WeatherArray[nMonth].nMonth = Integer.parseInt(values[1]);
WeatherArray[nMonth].maxTemp = Integer.parseInt(values[2]);
WeatherArray[nMonth].minTemp = Integer.parseInt(values[3]);
WeatherArray[nMonth].afDays = Integer.parseInt(values[4]);
WeatherArray[nMonth].rainFall = Integer.parseInt(values[5]);
WeatherArray[nMonth].sunHrs = Integer.parseInt(values[6]);



System.out.println(numberofMonths);


}

} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}

}
}
 
Last edited:
  • #8
SalfordPhysics said:
Thanks again Scott great help and given me a much better perspective of things, I will follow through as you suggested.




Unfortunately, I am again at a loss with the computations. I have been researching using indexOf, but have not had any success when using within [code/] println [code/] to get index values of first occurrence of, for example, 1994.
The code tags should look like this:
[code] your code [/code]
SalfordPhysics said:
How would I, for example, calculate the average temperature of all months in 1994, given that the data is given monthly? This would also help me in figuring out how to select by year for computations.

Again, thanks so much for the help.
 
  • #9
I would like to provide the following code in calculating average rainfall, maximum or minimum temperatures.

Java:
public static void readCSV() throws FileNotFoundException {
        // 1st, config the CSV reader, such as line separator, column separator and so on
        CsvParserSettings settings = new CsvParserSettings();
        settings.getFormat().setLineSeparator("\n");

        // 2nd, creates a CSV parser with the configs
        CsvParser parser = new CsvParser(settings);

        // 3rd, parses all rows from the CSV file into a 2-dimensional array
        List<String[]> resolvedData = parser.parseAll(new FileReader("/examples/example.csv"));

        // 4th, fill the data list with 1st column in all rows in the csv file.
        List<Integer> data = new ArrayList<Integer>();
        for (String[] row : resolvedData) {
            data.add(Integer.parseInt(row[0]));
        }

        // 5th, find out the smallest, largest, average and sum
        int smallest = Integer.MAX_VALUE, largest = 0, average = 0, sum = 0;
        for (Integer val : data) {
            sum += val;
            smallest = val <= smallest ? val : smallest;
            largest = val > largest ? val : largest;
        }

        average = data.isEmpty() ? 0 : sum / data.size();
    }

In this code example, I used the open source library uniVocity-parsers to parse CSV files into list of arrays just in few lines of code. You can try to extend from this code to fullfill your requirements better.
 

1. What is the purpose of importing a .csv file for calculating selected averages?

The purpose of importing a .csv file for calculating selected averages is to easily and accurately perform calculations on large sets of data. .csv files are commonly used for storing and organizing data, and by importing them into a program like Java, we can quickly analyze and manipulate the data to calculate desired averages.

2. How do I import a .csv file in Java?

To import a .csv file in Java, you can use the built-in BufferedReader class. This class allows you to read the contents of a file line by line. You will also need to use a FileReader object to specify the location of the .csv file. Once you have read the data, you can store it in an array or other data structure to perform calculations.

3. What are some common methods for calculating averages in Java?

There are several methods for calculating averages in Java, including using a for loop to iterate through the data and calculate the average, using the built-in Math library to calculate the average, or using a third-party library such as Apache Commons Math. The method you choose will depend on the complexity of your data and the specific average you are trying to calculate.

4. How can I handle errors or missing data when importing a .csv file in Java?

When importing a .csv file in Java, it is important to handle errors and missing data appropriately. One way to do this is by using try-catch blocks to catch any errors that may occur during the import process. You can also use conditional statements to check for missing data and handle it accordingly, such as skipping over the missing data or replacing it with a default value.

5. Can I import a .csv file with non-numeric data in Java?

Yes, you can import a .csv file with non-numeric data in Java. However, you will need to use different methods to calculate averages for non-numeric data. For example, you may need to use a HashMap to store and count the occurrences of each data point, and then calculate the average based on the frequency of each data point.

Similar threads

  • Engineering and Comp Sci Homework Help
Replies
7
Views
1K
  • Engineering and Comp Sci Homework Help
Replies
10
Views
10K
  • Engineering and Comp Sci Homework Help
Replies
11
Views
2K
  • Engineering and Comp Sci Homework Help
Replies
6
Views
2K
  • Engineering and Comp Sci Homework Help
Replies
1
Views
2K
  • Engineering and Comp Sci Homework Help
Replies
2
Views
6K
  • Programming and Computer Science
Replies
3
Views
2K
  • Engineering and Comp Sci Homework Help
Replies
31
Views
11K
  • Engineering and Comp Sci Homework Help
Replies
5
Views
8K
  • Engineering and Comp Sci Homework Help
Replies
21
Views
5K
Back
Top