Sensitivity Analysis for Missing Data: Picking Values to Try & Not Reject Null

  • Context: Graduate
  • Thread starter: wvguy8258
  • Tags: Analysis, Sensitivity
SUMMARY

This discussion focuses on sensitivity analysis for missing data in a survival analysis context, specifically a hazard model used to assess the effect of topographic slope on land parcel development. Fewer than 10% of the responses are missing, and they are not missing at random. The participant, Seth, proposes assigning values to the missing responses so as to minimize the chance of rejecting the null hypothesis that the slope coefficient is greater than or equal to zero; if the null is still rejected under that worst case, the conclusion should hold for the full data. He seeks feedback on the feasibility of this approach and on how it extends to multivariate models.

PREREQUISITES
  • Understanding of survival analysis and hazard models.
  • Familiarity with sensitivity analysis techniques for missing data.
  • Knowledge of hypothesis testing and null hypothesis formulation.
  • Experience with multivariate statistical modeling.
NEXT STEPS
  • Research methods for handling missing data in survival analysis.
  • Explore sensitivity analysis techniques specific to bounded response variables.
  • Learn about multivariate hazard modeling and its complexities.
  • Investigate statistical software options for implementing these analyses, such as R or Python's lifelines library.
USEFUL FOR

Researchers and statisticians involved in survival analysis, data scientists handling missing data, and anyone interested in hypothesis testing within ecological or environmental studies.

wvguy8258
Hi,

I have a large data set with less than 10% missing values (missing response, but all predictor variables present). It is a near certainty that these values are not missing at random: whether a response is missing depends on the missing value itself. The response is the survival time of a land parcel, with 'death' being development of the parcel. Predictors are things like average topographic slope in the parcel, etc. I plan to fit a hazard model to the data to test hypotheses about the sign and magnitude of slope coefficients. I've read a bit about methods for dealing with missing data, but because I am primarily interested in testing hypotheses, I feel that a simpler method may be available that I haven't yet seen in print. I am asking for advice on the feasibility of the simple idea that follows, how it can be improved, and whether anyone has pertinent references to share.

The survival time is bounded. I am taking the beginning of colonization of the area as the beginning of the study period and the present as its end. So, the response variable is bounded between zero and 2009 minus the time of first colonization.

Let's say I have a very simple hypothesis that the coefficient on topographic slope is less than zero, so my null hypothesis is that it is greater than or equal to zero. It seems that I could pick values for the missing data so as to minimize the chance of rejecting this null hypothesis. If I still find evidence to reject the null under this extreme case, then it is reasonable to conclude that the full data set, had the missing values been observed, would likewise lead to rejection. So, in the example of topographic slope, I would assign the missing values that give the largest slope coefficient possible given the observed data (and the smallest variance at a high parameter estimate? I am less sure how to think about this part).

First, are there any logical pits I am falling into here? This seems rather straightforward with only one predictor in the model, but I suspect that a multivariate model will complicate things. Should each hypothesis (one for each slope coefficient of interest) be considered separately? Meaning, should I concoct a set of missing values chosen to avoid rejecting the null for hypothesis 1, then start over and do the same for hypothesis 2, etc.? Or should this be done all at once? If you think this is all a bad idea, in a few words, how would you go about modeling the data I've described? Thanks. -seth
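To make the one-predictor version of this idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data are simulated, the "hazard model" is taken to be a simple exponential proportional hazards model with hazard exp(b0 + b1*slope), and the names `fit_exponential_ph`, `one_sided_p`, `T_MAX` and `T_MIN` are hypothetical. The worst-case fill gives steep parcels with a missing response the earliest development time, and flat ones no development by the end of the study. Splitting at the median slope is a heuristic for pushing b1 upward, not a proven maximizer.

```python
import math
import random

def loglik_parts(b0, b1, times, events, x):
    """Log-likelihood, score, and information for hazard exp(b0 + b1*x)."""
    ll = g0 = g1 = h00 = h01 = h11 = 0.0
    for t, d, xi in zip(times, events, x):
        eta = b0 + b1 * xi
        mu = t * math.exp(eta)        # cumulative hazard for this parcel
        ll += d * eta - mu
        g0 += d - mu
        g1 += (d - mu) * xi
        h00 += mu
        h01 += mu * xi
        h11 += mu * xi * xi
    return ll, (g0, g1), (h00, h01, h11)

def fit_exponential_ph(times, events, x, iters=100, tol=1e-10):
    """Newton-Raphson with step halving; returns (b1, standard error of b1)."""
    b0, b1 = math.log(sum(events) / sum(times)), 0.0   # start at the no-covariate MLE
    for _ in range(iters):
        ll, (g0, g1), (h00, h01, h11) = loglik_parts(b0, b1, times, events, x)
        det = h00 * h11 - h01 * h01
        s0 = (h11 * g0 - h01 * g1) / det               # Newton direction: inverse information times score
        s1 = (h00 * g1 - h01 * g0) / det
        step = 1.0
        while (loglik_parts(b0 + step * s0, b1 + step * s1, times, events, x)[0] < ll
               and step > 1e-8):
            step /= 2                                  # back off if the likelihood got worse
        b0, b1 = b0 + step * s0, b1 + step * s1
        if abs(step * s0) + abs(step * s1) < tol:
            break
    _, _, (h00, h01, h11) = loglik_parts(b0, b1, times, events, x)
    return b1, math.sqrt(h00 / (h00 * h11 - h01 * h01))

def one_sided_p(b1, se):
    """Wald p-value for H0: b1 >= 0 against H1: b1 < 0."""
    return 0.5 * (1.0 + math.erf(b1 / se / math.sqrt(2.0)))

# --- Simulated stand-in data; every number here is made up ---
random.seed(1)
T_MAX, T_MIN = 200.0, 1.0   # assumed study length and earliest development time (years)
slope, time, event, missing = [], [], [], []
for _ in range(400):
    s = random.random()                                # scaled topographic slope in [0, 1)
    t = random.expovariate(0.02 * math.exp(-1.5 * s))  # true b1 = -1.5
    event.append(1 if t < T_MAX else 0)                # undeveloped parcels are censored at T_MAX
    time.append(min(t, T_MAX))
    slope.append(s)
    # Not missing at random: early developers are more likely to lack a response.
    missing.append(random.random() < (0.15 if t < 30 else 0.04))

obs = [i for i in range(400) if not missing[i]]
mis = [i for i in range(400) if missing[i]]

# Complete-case fit: parcels with a missing response are dropped.
b1_cc, se_cc = fit_exponential_ph([time[i] for i in obs],
                                  [event[i] for i in obs],
                                  [slope[i] for i in obs])

# Worst-case fill against H1 (b1 < 0): steep parcels develop at once,
# flat parcels never develop within the study period.
cut = sorted(slope)[len(slope) // 2]                   # median slope (heuristic split)
w_time = [time[i] for i in obs]
w_event = [event[i] for i in obs]
w_slope = [slope[i] for i in obs]
for i in mis:
    steep = slope[i] > cut
    w_time.append(T_MIN if steep else T_MAX)
    w_event.append(1 if steep else 0)
    w_slope.append(slope[i])
b1_worst, se_worst = fit_exponential_ph(w_time, w_event, w_slope)

print("complete case: b1 = %.3f, p = %.4f" % (b1_cc, one_sided_p(b1_cc, se_cc)))
print("worst case:    b1 = %.3f, p = %.4f" % (b1_worst, one_sided_p(b1_worst, se_worst)))
```

If the worst-case p-value is still below the chosen significance level, the rejection does not depend on what the missing responses were, at least under this model and this choice of extreme fill. With several predictors, a fill that is extreme for one coefficient is generally not extreme for another, so a sketch like this would have to be rerun with a separate fill for each hypothesis.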
 


Can someone at least tell me why they read and didn't respond?
 
