Sensitivity analysis, missing data, and hypothesis testing

In summary: Compare the observed characteristics of the observations with missing responses against the sample mean. If they differ markedly, the missing-at-random assumption may not be valid, and you may want a model that takes the missingness into account. If you see no marked difference, the "missing" observations are more plausibly missing at random.
  • #1
wvguy8258
Hi,

I have a large data set with less than 10% missing values (the response is missing, but all predictor variables are present). It is a near certainty that these values are not missing at random: the missingness depends on the missing value itself. The response is the survival time of a land parcel, with 'death' being development of the parcel. Predictors are things like average topographic slope in the parcel. I plan to fit a hazard model to the data to test hypotheses about the sign and magnitude of slope coefficients. I've read a bit about methods for dealing with missing data, but because I am primarily interested in testing hypotheses, I feel a simpler method may be available that I haven't yet seen in print. I am asking for advice on the feasibility of the simple idea that follows, how it can be improved, and whether anyone has pertinent references to share.

The survival time is bounded. I am taking the beginning of colonization of the area as the start of the study period and the present as its end, so the response variable is bounded between zero and 2009 minus the time of first colonization.

Let's say I have a very simple hypothesis that the slope coefficient of topographic slope is less than zero, so my null hypothesis is that it is greater than or equal to zero. It seems that I could pick values for the missing data so as to minimize the chance of rejecting this null hypothesis. If I still find evidence to reject the null under this extreme scenario, then it is reasonable to conclude that the full data set, had the missing values been observed, would likewise lead to rejection. So, in the example of topographic slope, I would assign missing data values that give the largest slope coefficient possible given the observed data (and the smallest variance at a high parameter estimate? I am less sure how to think about that part).

First, are there any logical pits I am falling into here? This seems rather straightforward with only one predictor in the model, but I suspect that a multivariate model will complicate things. Should each hypothesis (one per slope coefficient of interest) be considered separately? That is, should I concoct a set of missing values that tries to avoid rejecting the null for hypothesis 1, then start over and do the same for hypothesis 2, and so on? Or should this be done all at once? If you think this is all a bad idea, how, in a few words, would you go about modeling the data I've described? Thanks. -seth
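The worst-case idea in the question can be sketched in a much simpler setting than a hazard model. The example below swaps in ordinary least squares with a single predictor purely as a stand-in, because there the slope estimate is linear in each response and the extreme fill is easy to write down. The function name `worst_case_slope`, the bounds, and the toy data are all hypothetical. It bounds only the point estimate and does not address the variance question raised above.

```python
import numpy as np

def worst_case_slope(x, y, lower, upper):
    """Largest OLS slope attainable by filling missing y (NaN) within [lower, upper]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float).copy()
    missing = np.isnan(y)
    centered = x - x.mean()  # all predictors are observed, so the mean uses every row
    # The OLS slope is sum(centered * y) / sum(centered**2), which is linear in
    # each y_i. A missing y_i therefore goes to the upper bound when its weight
    # is positive and to the lower bound otherwise.
    y[missing] = np.where(centered[missing] > 0, upper, lower)
    slope = np.sum(centered * y) / np.sum(centered ** 2)
    return slope, y

# Hypothetical toy data: x is topographic slope, y is survival time in [0, 100].
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([90.0, 80.0, np.nan, 50.0, np.nan, 20.0])

slope, filled = worst_case_slope(x, y, lower=0.0, upper=100.0)
print("filled responses:", filled)
print("largest attainable slope:", slope)
```

If the largest attainable slope is still negative, no choice of the missing responses within the bounds can flip the sign of the OLS estimate. For a hazard model the coefficient is not linear in the responses, so the extreme fill would have to be found numerically rather than by this closed-form rule.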
 
  • #2
Yours is a non-standard method, but I don't see a logical pitfall. For a multivariate analysis, my guess is that you can do it separately for each parameter you are testing. Alternatively, you can set up a joint test (e.g. an F test) that encompasses all of the individual tests, and then assign values to minimize that single, overarching test statistic.

Another approach may be to use a censored data ("Tobit") model.
 
  • #3
Thank you, I have been reading some about models for censored data.
 
  • #4
When you look at the observed characteristics of the "missing" observations, do you see a marked difference from the sample mean?
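One way to run this check is to compare predictor means between the rows with a missing response and the rows with an observed response, scaled by each predictor's standard deviation. This is a minimal sketch under assumptions: the function name `standardized_differences`, the choice of scaling, and the toy data are hypothetical, and what counts as a "marked" difference is left to the analyst.

```python
import numpy as np

def standardized_differences(X, response_missing):
    """Mean of each predictor among missing-response rows minus the mean among
    observed-response rows, divided by the predictor's overall sample SD."""
    X = np.asarray(X, dtype=float)
    m = np.asarray(response_missing, dtype=bool)
    mean_missing = X[m].mean(axis=0)
    mean_observed = X[~m].mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    return (mean_missing - mean_observed) / sd

# Hypothetical toy data: columns are topographic slope and a second predictor.
X = np.array([
    [2.0, 10.0],
    [4.0, 12.0],
    [6.0, 11.0],
    [20.0, 10.0],
    [22.0, 12.0],
])
response_missing = np.array([False, False, False, True, True])

d = standardized_differences(X, response_missing)
print("standardized differences:", d)
```

A large value in some column suggests the missingness is related to that predictor. A small value everywhere does not rule out the case described in the first post, where missingness depends on the unobserved response itself, since that cannot be seen in the predictors alone.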
 

1. What is sensitivity analysis and why is it important?

Sensitivity analysis is a technique used in scientific studies to evaluate how sensitive the results of a study are to changes in the assumptions or variables used in the analysis. It helps researchers understand the robustness of their findings and the potential impact of any uncertainties or errors in the data.

2. How should missing data be addressed in a study?

Missing data can significantly affect the results of a study and can lead to biased conclusions. It is important to carefully consider and address missing data before conducting any hypothesis testing. This can be done by using appropriate statistical methods such as imputation or exclusion, depending on the nature and extent of the missing data.

3. What is the purpose of hypothesis testing in scientific research?

Hypothesis testing is used to determine whether there is a statistically significant relationship between two or more variables. It helps researchers make inferences about a population based on a sample of data, and provides evidence for or against a specific research hypothesis.

4. How do you choose the appropriate statistical test for hypothesis testing?

The appropriate statistical test for hypothesis testing depends on the type of data being analyzed and the research question being addressed. Factors such as the type of variables (categorical or continuous) and the number of groups being compared should be considered when selecting a test. Consulting with a statistician or using statistical software can also help in choosing the right test.

5. What is the significance level in hypothesis testing and how is it determined?

The significance level, also known as alpha, is the probability of rejecting the null hypothesis when it is actually true. It is typically set at 0.05 or 0.01, depending on the level of confidence desired by the researcher. This level can also be adjusted based on the type of study and the potential consequences of making a Type I error (incorrectly rejecting the null hypothesis).
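The definition of alpha can be checked by simulation. The sketch below assumes a one-sided z-test of a zero mean with known unit variance, which is a hypothetical setup chosen for simplicity and not the hazard-model test discussed in the thread. Data are generated with the null hypothesis true, and the fraction of rejections should come out close to the chosen alpha of 0.05.

```python
import numpy as np

rng = np.random.default_rng(0)

n_trials = 20000   # number of simulated studies
n = 30             # observations per study
z_crit = 1.6449    # 95th percentile of the standard normal (one-sided alpha = 0.05)

# Every study is generated with the null true: mean 0, standard deviation 1.
samples = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n))

# z statistic for H0: mean <= 0 against H1: mean > 0, with known variance 1.
z = samples.mean(axis=1) * np.sqrt(n)

type_one_rate = np.mean(z > z_crit)
print("fraction of true nulls rejected:", type_one_rate)
```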
