Solving R Homework: Sunspots Data

  • Thread starter: _N3WTON_
  • Tags: Data, Homework

Homework Help Overview

The discussion revolves around an assignment for a Statistics/Probability class that involves analyzing sunspots data using R programming. The tasks include calculating sample means and variances, creating histograms, and conducting simulations with varying sample sizes.

Discussion Character

  • Exploratory, Conceptual clarification, Mathematical reasoning, Problem interpretation

Approaches and Questions Raised

  • Participants discuss the original poster's code and the errors encountered, particularly regarding data types and column labeling. There are suggestions to print values for debugging and to check if the data is numeric. Some participants share their own programming experiences and offer encouragement.

Discussion Status

The discussion includes various attempts to troubleshoot the code, with some participants providing insights into potential issues with data types and column definitions. There is a mix of personal anecdotes about learning programming and advice on debugging techniques. No explicit consensus has been reached, but several constructive suggestions have been made.

Contextual Notes

Participants note that the data may be incorrectly formatted, with strings instead of numeric values, which is causing issues with calculations. There is also mention of the original poster's limited experience with R programming, which may influence their understanding of the problem.

_N3WTON_

Homework Statement


I have an upcoming assignment for a Statistics/Probability class that requires me to write a program in R. The assignment requires me to do the following:
1. Obtain the sample mean x̄ and sample variance s² of the sunspots data.
2. Provide a histogram of the data.
3. For 10000 replications, randomly sample n sunspots observations from the given dataset. For each replication, obtain the sample mean. That is, you will have 10000 sample means. Compute the sample variance of these sample means.
4. Repeat the above process with n=10, 20, 30, 40, 50, 60, 70, 80, 90, and 100.
5. Obtain juxtaposed plots of the histograms of the means corresponding to n=10 and n=100.
6. Plot the variances as the function of n. What are your observations?

Homework Equations

The Attempt at a Solution


This is the code I have come up with thus far:
Code:
filename <- "C://Users//Colin//Desktop//Project1Data.txt"
data <- read.table(filename)
colnames(data) <- c("id","x")

#Questions 1 and 2
x <- data$x
mean(x)
var(x)
summary(data)
hist(x)

#partial answer to Questions 3 and 4
p <- numeric()

for (i in 1:10000){
  s <- data[sample(1:1053,10,replace=FALSE),]
  x <- s$x
  y[i] <- mean(x)
  p[i] <- y[i]
  assign(paste("sample",i,sep=""),s)
  assign(paste("mean",i,sep=""),y)
}
Unfortunately, this code generates a number of errors that I am unsure how to deal with, and I was hoping somebody here could help. I suppose I should begin with the first one I am receiving:
Warning message:
In mean.default(x) : argument is not numeric or logical: returning NA
Any help or advice would be greatly appreciated, thanks.
 
Can you print the x values? That might give you a clue.

When you read in the file, the data object has three columns, so data$x is an array of values. The question is whether they are strings or numbers; mean() is looking for numbers.
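One way to act on this suggestion, as a minimal sketch: the data frame `df` below is a made-up stand-in (it is not the poster's file), built so that its `x` column holds text. Printing the values and checking their class shows quickly whether `mean()` has numbers to work with.

```r
# Made-up stand-in for the data frame read from the file (not the real data).
# The x column deliberately holds text, as suspected in the thread.
df <- data.frame(id = 1:3, x = c("33", "81", "7"), stringsAsFactors = FALSE)

print(head(df$x))                 # look at the first few values
print(class(df$x))                # "character" means mean() will not work directly
x_is_numeric <- is.numeric(df$x)  # FALSE for a text column
str(df)                           # type of every column at a glance

# If the text really encodes numbers, convert explicitly before averaging.
x_num <- as.numeric(df$x)
print(mean(x_num))
```

`str()` is often the quickest check, since it lists the type of every column in one call.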
 
jedishrfu said:
Can you print the x values? That might give you a clue.

When you read in the file, the data object has three columns, so data$x is an array of values. The question is whether they are strings or numbers; mean() is looking for numbers.
Here is a sample of the data set:
"x" "id"
"1" 33
"2" 81
"3" 7
"4" 38
"5" 113
"6" 92
"7" 18
"8" 24
"9" 100
"10" 89
"11" 14
"12" 26
"13" 19
"14" 32
"15" 7
"16" 58
"17" 1
"18" 30
"19" 41
"20" 32
The "x" values go up to 1053, so I don't want to post them all here, but I have the data saved in the above format in a .txt file.
 
Okay, you can see right there that the x value is a string, not a number, so that's why it's failing. I think they want you to average the second column, which is numeric. It makes no sense to find the mean of x here since it is just a row counter.
 
I see a mismatch: you defined your columns as id and x, whereas your data says x and id. Try switching the column labels in your program at line 3.
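An alternative to relabeling by hand, sketched here under assumptions: since the file already starts with a header line, `read.table()` can take the column names from it via `header = TRUE`. The string `sample_text` is a stand-in that copies a few rows of the posted sample; the real script would pass the file name instead of `text =`.

```r
# A few rows in the same layout as the posted sample (stand-in for the real file).
sample_text <- '"x" "id"
"1" 33
"2" 81
"3" 7
"4" 38'

# header = TRUE takes the column names from the first line
# instead of treating that line as data.
sunspots <- read.table(text = sample_text, header = TRUE)
str(sunspots)

# Per the thread, the second column (labelled "id" in the file) holds the values.
counts <- sunspots$id
print(mean(counts))
print(var(counts))
```

With the header consumed as names, the value column is read as numeric, so `mean()` and `var()` both work on it directly.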
 
Thank you for that. However, I'm a little confused as to why I am getting results for variance but not mean despite the code being the same for both...
 
I apologize if some of these questions are sort of elementary; my only real knowledge of R comes from a crash course a few days ago from a tutorial I found online :/
 
  • #10
Don't feel bad, we all start programming somewhere, and that means we get tripped up by some very simple things. My first programming was at my high school on a fancy programmable desktop calculator, and I couldn't figure out how to turn it on. The teacher had a chuckle but was impressed with my first program to compute the nth root of any number.
 
  • #11
jedishrfu said:
Don't feel bad, we all start programming somewhere, and that means we get tripped up by some very simple things. My first programming was at my high school on a fancy programmable desktop calculator, and I couldn't figure out how to turn it on. The teacher had a chuckle but was impressed with my first program to compute the nth root of any number.
That's impressive. In my high school programming class we had a project to write a program on a TI-84 that finds the area under the curve using Riemann Sums. It finds the left sum, right sum, midpoint sum, trapezoidal sum and the definite integral. I still use it to this day :D
 
  • #12
My project was on a very limited desktop calculator circa 1970 that had programmable features for math only. You were really limited in what it could do and how much memory it had. In my program, I ran out of registers, so I had to hit the Enter key repeatedly for the next iteration because I had no register for the loop counter. It used something akin to the Newton approximation technique optimized for the machine.
 
  • #13
jedishrfu said:
You were really limited in what it could do and how much memory it had.
I always forget how spoiled we are today: for most basic programs memory really doesn't enter into the equation at all. I have a lot of respect for people good at programming, it takes a ton of patience. Right now I'm about ready to throw my computer out a window into the snow because I can't get this program working haha :D
 
  • #14
Learn how to use the print statement. It's one of the best debugging tools in a new environment like this. Don't trust your code: write a few lines, test them, and eventually you'll get through it.
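Applied to the sampling part of the assignment, that advice might look like the sketch below. It is not the poster's final solution: `counts` is simulated with `rpois()` purely as a stand-in for the 1053 real observations, and the names `means`, `sample_sizes`, and `var_of_means` are illustrative. Each step is printed and checked before the next one is added.

```r
set.seed(1)
# Simulated stand-in for the 1053 sunspot observations (an assumption, not the real data).
counts <- rpois(1053, lambda = 50)

# Step 1: try a single replication and print it before writing any loop.
one_sample <- sample(counts, size = 10, replace = FALSE)
print(one_sample)
print(mean(one_sample))

# Step 2: a tiny loop. The result vector is created before the loop;
# assigning into a vector that does not exist yet is an error in R.
n_reps <- 5
means <- numeric(n_reps)
for (i in seq_len(n_reps)) {
  means[i] <- mean(sample(counts, size = 10, replace = FALSE))
}
print(means)

# Step 3: once the small version behaves, scale up to 10000 replications
# for each sample size n = 10, 20, ..., 100.
sample_sizes <- seq(10, 100, by = 10)
var_of_means <- sapply(sample_sizes, function(n) {
  var(replicate(10000, mean(sample(counts, size = n, replace = FALSE))))
})
print(data.frame(n = sample_sizes, var_of_means = var_of_means))
```

Storing only the means avoids creating thousands of separate objects with `assign()`, and the final table is what a plot of variance against n would be drawn from.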
 
  • #15
jedishrfu said:
Learn how to use the print statement. It's one of the best debugging tools in a new environment like this. Don't trust your code: write a few lines, test them, and eventually you'll get through it.
Thanks again for the advice. After a lot of trying I was able to successfully complete this assignment a few minutes ago :D
 
  • #16
That's great. Welcome to the programmers guild!
 
