Which statistical test should I use?

In summary, the question at hand is whether the amount of chocolate consumption is associated with decreased levels of a certain cytokine. Statistics cannot answer this question definitively, because its conclusions are limited by the given information and the assumptions made. However, a common approach in life science journals is to use frequentist statistics, specifically hypothesis testing, which computes the probability of the data (or of more extreme data) under a null hypothesis. How much weight such evidence carries varies among individuals and journals.
  • #1
Kvm90
I have a data set consisting of ~10,000 patients. Each patient answered how often they had chocolate and we assigned the following values to their responses:


Never = 0
1-6 times per year = 0.01
7-11 times per year = 0.028
1 time per month = 0.033
2-3 times per month = 0.08
1 time per week = 0.14
2 times per week = 0.29
3-4 times per week = 0.5
5-6 times per week = 0.79
1 time per day = 1
2 or more times per day = 2

We also have data describing the levels of a certain cytokine in their blood.


The question I am trying to answer is: Does the amount of chocolate someone eats cause DECREASED levels of this cytokine?


What statistical test should I run in order to answer this question? I am very new to statistics, so any and all information is much appreciated.
 
  • #2
Help would be greatly appreciated! Please post if you can!
 
  • #3
Kvm90 said:
The question I am trying to answer is: Does the amount of chocolate someone eats cause DECREASED levels of this cytokine?

Applying statistics is a subjective matter, so you should clarify whether "trying to answer" means that you want an answer that can be published in an article or an answer for your own personal satisfaction.

Unless you explain your rating system, people can only advise you how to test whether higher levels of your rating are associated with decreased levels of the cytokine. This is not necessarily the same thing as whether higher levels of chocolate are associated with decreased levels.
 
  • #4
Thanks Stephen, I am trying to obtain an answer that can be published in an article.

As for explaining the rating system - the participants are given a questionnaire in which they have the multiple choice options shown above (never, 1-6 times per year, 7-11 times per year, etc...). The database codes these answers to the numbers shown above (0=never, 0.01=1-6 times per year, 0.028=7-11 times per year, etc.)...

Maybe my lack of knowledge in statistics is causing me to be unfortunately vague. I thought that I could describe the data, our ranking system, and the question at hand (Does the amount of chocolate someone eats cause DECREASED levels of this cytokine?) in the hopes that someone would respond with:

"Use a t-test" or "use a chi-squared test"

I'm aware that it's probably going to require more than running one simple test to obtain publishable results, but any step in the right direction (maybe just explaining the general outline or pointing out which statistics concepts I should look into) would be a great help to get the ball rolling for me in this project.

Thanks again!
 
  • #5
Kvm90 said:
Thanks Stephen, I am trying to obtain an answer that can be published in an article.

Then the only reliable way to do the statistics is to understand what types of statistical tests are accepted by the editors of the target publications. Academic journals may have a document describing their guidelines. You can look at other articles that were published and see what tests were used in a similar situation. If you can't find articles about similar situations (Does eating X more often influence the level of Y?), it's unlikely the journal would publish your article anyway.

Kvm90 said:
The database codes these answers to the numbers shown above (0=never, 0.01=1-6 times per year, 0.028=7-11 times per year, etc.)...

Why code 1-6 times a year as 0.01? Why not 0.02 or 177.45? You haven't explained the reason for the ratings.

Kvm90 said:
Maybe my lack of knowledge in statistics is causing me to be unfortunately vague. I thought that I could describe the data, our ranking system, and the question at hand (Does the amount of chocolate someone eats cause DECREASED levels of this cytokine?) in the hopes that someone would respond with:

"Use a t-test" or "use a chi-squared test"

There are people who make responses like that. If you want to believe them, good luck!

I think your lack of knowledge in statistics leads you to attribute too much capability to statistics.

Statistics does not determine cause and effect. (The very definition of a "cause" gets into philosophical discussions.) It's better to ask the question in the form "Is increased chocolate consumption associated with decreased levels of the cytokine".

The conclusions made by mathematics are limited by the given information. For example, if you know one side and one angle of a triangle, there is no mathematics that lets you determine the missing sides and angles. There is not enough given information in your problem (and in most applications of statistics to real world problems) to determine a yes-or-no answer to the question. It is not even possible to determine the probability that the answer is yes or the probability that the answer is no.

The statistics used in most life science journals is called "frequentist statistics", and it is the traditional approach in those fields. A problem like yours is approached by a procedure called a "hypothesis test". We make an assumption that lets us compute the probability of the data. This assumption, called the "null hypothesis", is usually as "empty" as possible. In your case it would say "chocolate consumption has no effect on levels of the cytokine", and that would have to be further elaborated so it provides a mathematical model of how your data were generated. The probability of some function of the data is computed (such functions are "statistics" in the technical sense of the term). If the probability of the statistic being at least as extreme as the value observed in the data is small, then we "reject" the null hypothesis.

The above method is not a mathematical proof that "rejection" is the correct thing to do. It is simply a procedure that is accepted by many people as "evidence" in the non-technical sense of the word "evidence". Some persons may be persuaded by one type of statistic and others may not. Different journals may have different standards about what constitutes a "small" probability (0.05 is a typical value, but this is arbitrary).
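The hypothesis-test procedure described in this post can be sketched in a few lines. This is only an illustration under stated assumptions: the data are synthetic, the statistic (a Pearson correlation between the coded chocolate frequency and a log cytokine level) and the permutation approach are choices made for the example, and none of it is taken from the original poster's study or from any journal's guidelines. The null hypothesis here is "the cytokine values are unrelated to the chocolate codes", which is modelled by shuffling the cytokine values.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """One-sided test for a NEGATIVE association.

    Returns the observed correlation and the fraction of shuffles whose
    correlation is at least as negative as the observed one.
    """
    rng = random.Random(seed)
    observed = pearson_r(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # under the null, any pairing is equally likely
        if pearson_r(xs, shuffled) <= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself, so p is never 0.
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic data, invented for illustration only (not real measurements).
data_rng = random.Random(42)
codes = [0, 0.01, 0.028, 0.033, 0.08, 0.14, 0.29, 0.5, 0.79, 1, 2]
chocolate = [c for c in codes for _ in range(10)]
log_cytokine = [5.0 - 2.0 * c + data_rng.gauss(0, 1) for c in chocolate]

r, p = permutation_p_value(chocolate, log_cytokine)
print(f"observed r = {r:.3f}, one-sided permutation p = {p:.4f}")
```

A small p here would be read as "data this extreme would be rare if the null hypothesis were true", which is evidence in the informal sense described above, not a proof of cause and effect.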
 
  • #6
Wow, thanks a lot for the information. Your explanation of 'frequentist statistics' is really helpful. I've been independently trying to understand the uses of each statistical test, but your explanation of the null hypothesis as it relates to my study really made me understand what I am trying to do (that is, compile an argument that supports the rejection of my null hypothesis).

Stephen Tashi said:
Why code 1-6 times a year to 0.01. Why not 0.02 or 177.45? You haven't explained the reason for the ratings.

My apologies - I didn't understand your question at first. The decimal converts the multiple choice answer into the participant's daily frequency of chocolate consumption.
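As a rough check on that coding, here is a small sketch that converts each answer to occasions per day. It assumes the midpoint of each range is used, and that a week, month and year are 7, 30 and 365 days; these are guesses about how the database was built, not something stated in the thread. Under those assumptions most of the posted codes are reproduced after rounding, but "7-11 times per year" comes out near 0.025 rather than the posted 0.028, so the database evidently uses a slightly different rule for that category.

```python
# Assumed period lengths in days (an assumption, not from the database docs).
DAYS_PER_PERIOD = {"day": 1, "week": 7, "month": 30, "year": 365}


def daily_frequency(low, high, period):
    """Midpoint of the reported range, expressed as occasions per day."""
    midpoint = (low + high) / 2
    return midpoint / DAYS_PER_PERIOD[period]


# (questionnaire answer, low end, high end, period)
CATEGORIES = [
    ("Never", 0, 0, "year"),
    ("1-6 times per year", 1, 6, "year"),
    ("7-11 times per year", 7, 11, "year"),
    ("1 time per month", 1, 1, "month"),
    ("2-3 times per month", 2, 3, "month"),
    ("1 time per week", 1, 1, "week"),
    ("2 times per week", 2, 2, "week"),
    ("3-4 times per week", 3, 4, "week"),
    ("5-6 times per week", 5, 6, "week"),
    ("1 time per day", 1, 1, "day"),
    ("2 or more times per day", 2, 2, "day"),  # open-ended: lower bound used
]

for label, low, high, period in CATEGORIES:
    print(f"{label:26s} {daily_frequency(low, high, period):.3f}")
```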


Stephen Tashi said:
...the only reliable way to do the statistics is to understand what type of statistical tests are accepted by the editors of the target publications. Academic journals may have a document describing their guidelines. You can look at other articles that were published and see what tests were used in a similar situation.

I took your advice and looked at other journal articles that did the same kind of study. According to these studies, cytokine "concentrations were transformed into natural logarithms to reduce their positive skewness and data were reported as geometric means. Differences between chocolate nonconsumers and dark chocolate consumers were evaluated using multivariate analysis of variance or multivariate binomial (Poisson) regression with the log link function." The article continues to address covariates, but I think it's best for me to stop here and make covariates step 2 of my attempts to understand this process.
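The log transform and the geometric mean mentioned in that quotation are two sides of the same calculation: the geometric mean is the exponential of the mean of the natural logs. A minimal sketch, using made-up concentrations (not real data) and a helper name chosen for this example:

```python
import math
import statistics


def geometric_mean_via_logs(values):
    """exp(mean(ln(x))): average on the log scale, then transform back."""
    logs = [math.log(v) for v in values]
    return math.exp(statistics.mean(logs))


# Made-up, positively skewed concentrations (arbitrary units).
cytokine = [0.4, 0.9, 1.3, 2.1, 3.8, 12.5]

print("arithmetic mean:", round(statistics.mean(cytokine), 3))
print("geometric mean: ", round(geometric_mean_via_logs(cytokine), 3))
```

For skewed positive data the geometric mean is pulled less by the few large values than the arithmetic mean is, which is why it is often reported alongside log-transformed analyses.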

From here, I have looked into Poisson regression and multivariate ANOVA. Here, I run into another bit of confusion. When looking into Poisson regression, most instructions say that the outcome is dichotomized/categorical (yes or no, agree or disagree), so I'm not sure how that can be applied since there are various levels of chocolate consumption and various concentrations of CRP.

MANOVA seems complex. Plus, most sites and papers are "controlling for.." this and that (age, sex, race, etc.) which I don't know how to do. "Controlling for..." something makes intuitive sense to me. Obviously, there are more things that influence cytokine levels than chocolate - and I need to control for a convincing amount of these factors. Any help understanding these two tests and if/why they would be useful, what they would show, etc. would be wonderful.
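One simple way to see what "controlling for" a factor means is to remove the part of each variable that a straight-line fit on that factor explains, and then look at the association between what is left over (a partial correlation). The sketch below does this for a single covariate, age. The numbers are constructed by hand so that age pushes both variables up while chocolate pushes the cytokine down; they are not real data, and published analyses usually include covariates in a regression model instead of using this exact recipe.

```python
def mean(values):
    return sum(values) / len(values)


def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def correlation(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def partial_correlation(xs, ys, covariate):
    """Correlation of xs and ys after removing the linear effect of covariate."""
    return correlation(residuals(xs, covariate), residuals(ys, covariate))


# Hand-constructed illustration: age raises both variables.
age = [20, 30, 40, 50, 60, 70]
wiggle = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
chocolate = [a / 100 + w for a, w in zip(age, wiggle)]
log_cytokine = [0.05 * a - 2 * w for a, w in zip(age, wiggle)]

raw = correlation(chocolate, log_cytokine)
adjusted = partial_correlation(chocolate, log_cytokine, age)
print(f"raw r = {raw:.3f}, r after adjusting for age = {adjusted:.3f}")
```

In this constructed example the raw correlation is positive even though, at any fixed age, more chocolate goes with a lower cytokine level, which is the kind of distortion that adjusting for covariates is meant to expose.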
 
  • #7
Kvm90 said:
Differences between chocolate nonconsumers and dark chocolate consumers were evaluated using multivariate analysis of variance or multivariate binomial (Poisson) regression with the log link function."

That's a promising lead. Don't neglect non-chocolate related articles with the same general pattern. (The more X you eat the more your Y level goes down (or up).)

Kvm90 said:
From here, I have looked into Poisson regression and multivariate ANOVA. Here, I run into another bit of confusion. When looking into Poisson regression, most instructions say that the outcome is dichotomized/categorical (yes or no, agree or disagree), so I'm not sure how that can be applied since there are various levels of chocolate consumption and various concentrations of CRP.

I don't see how to apply Poisson regression either, unless you treat the number of times per week a person eats chocolate as the dependent variable. The "number of times" is a count. (Using counts would ignore your rating scale.)

Kvm90 said:
MANOVA seems complex. Plus, most sites and papers are "controlling for.." this and that (age, sex, race, etc.) which I don't know how to do. "Controlling for..." something makes intuitive sense to me. Obviously, there are more things that influence cytokine levels than chocolate - and I need to control for a convincing amount of these factors. Any help understanding these two tests and if/why they would be useful, what they would show, etc. would be wonderful.


The last time I did ANOVA was over 20 years ago when I took a course in it. However, if we discuss it, somebody who knows what they are doing will probably chime in, and it keeps the thread alive.

Before you worry about complicated approaches, do the simplest ANOVA. As I see it, this would use the cytokine level as the response variable and the levels of chocolate eating as the treatments. (It would ignore your rating scale.) This type of test won't provide evidence for "The more chocolate you eat, the lower your cytokine level will be". If it provides evidence, it only provides evidence that eating chocolate has some effect. (For example, perhaps if you eat a certain level of chocolate your cytokine goes low, but at a greater or lesser level of chocolate eating, your cytokine goes high.)
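For reference, the F statistic of that simplest one-way ANOVA can be computed by hand as the ratio of between-group to within-group variability. The sketch below uses three tiny made-up groups standing in for three chocolate categories; it is an illustration of the arithmetic, not an analysis of the real data, and the function name is chosen for this example.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variability of individual values around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Made-up cytokine levels for three consumption categories.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print("F =", one_way_anova_f(groups))
```

Turning F into a p-value needs the F distribution with (k - 1, N - k) degrees of freedom; a statistics package such as SciPy (`scipy.stats.f_oneway`) reports both. As noted above, a large F only says the group means differ somewhere, not that cytokine falls steadily as consumption rises.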

If you plan to "control for factors", you'll have to survey your data to make sure the data for those factors are present. The database may have a field for a datum, but the contents of the field may be null or some placeholder for "missing data" like "99999".
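A survey of that kind can be as simple as counting, per field, how many records hold a null or a placeholder. The sketch below assumes the records are available as dictionaries and that None, an empty string and 99999 mark missing data; the field names, the records and the marker set are all hypothetical and would need to match the real database.

```python
# Assumed missing-data markers; check the database documentation for the real ones.
MISSING_MARKERS = {None, "", 99999}


def count_missing(records, fields):
    """Number of records in which each field is absent or holds a marker."""
    counts = {field: 0 for field in fields}
    for record in records:
        for field in fields:
            if record.get(field) in MISSING_MARKERS:
                counts[field] += 1
    return counts


# Hypothetical records, for illustration only.
records = [
    {"age": 54, "sex": "F", "cytokine": 1.2},
    {"age": 99999, "sex": "M", "cytokine": 0.8},
    {"age": None, "sex": "", "cytokine": 2.4},
    {"age": 61, "sex": "F"},  # cytokine field absent altogether
]

print(count_missing(records, ["age", "sex", "cytokine"]))
```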
 

1. What is the purpose of a statistical test?

A statistical test is used to analyze and evaluate data in order to determine if there is a significant relationship or difference between variables. It helps to make sense of large amounts of data and draw conclusions based on statistical evidence.

2. How do I know which statistical test to use?

The choice of a statistical test depends on the type of data you have, the research question you are trying to answer, and the assumptions of the test. It is important to carefully consider these factors and consult with a statistician if needed to determine the most appropriate test for your specific study.

3. Is there a general rule for choosing a statistical test?

While there are some general guidelines for selecting a statistical test, there is no one-size-fits-all rule. It is important to understand the type of data you have, the goals of your study, and the assumptions of the different tests in order to make an informed decision about which test to use.

4. Can I use the same statistical test for different types of data?

Not all statistical tests are suitable for every type of data. For example, a t-test is used for comparing means of two groups, while ANOVA is used for comparing means of three or more groups. It is important to choose a test that is appropriate for your specific data in order to obtain accurate results.

5. What should I do if my data does not meet the assumptions of the chosen statistical test?

If your data does not meet the assumptions of the chosen statistical test, you may need to use a different test or consider transforming your data to meet the assumptions. It is important to carefully examine the assumptions of the test and decide if they are reasonable for your data before proceeding with the analysis.
