# Sensitivity Analysis for Missing Data: Picking Values to Try & Not Reject Null

• wvguy8258
In summary, the conversation discusses the use of a hazard model to test hypotheses about the sign and magnitude of slope coefficients in a data set where less than 10% of the responses are missing, most likely not at random. The response variable is the survival time of a land parcel, with predictors such as topographic slope. The poster proposes assigning values to the missing responses so as to minimize the chance of rejecting the null hypothesis, and asks for advice on this approach. They also ask whether each hypothesis should be handled separately or all at once, and request alternative suggestions for modeling the data.

#### wvguy8258

Hi,

I have a large data set with less than 10% missing values (missing response, but all predictor variables present). It is a near certainty that these values are not missing at random: whether a response is missing depends on the value that would have been observed. The response is survival time of a land parcel, with 'death' being development of the parcel. Predictors are things like average topographic slope in the parcel etc. I plan to fit a hazard model to the data to test hypotheses related to the sign and magnitude of slope coefficients. I've read a bit about methods for dealing with missing data, but because I am primarily interested in testing hypotheses, I feel that a simpler method may be available that I haven't yet seen in print. I am here asking for advice on the feasibility of the simple idea to follow, how it can be improved, and whether anyone has any pertinent references to share.

The survival time is bounded. I am taking the beginning of colonization of the area as the beginning of the study period and the present as its end. So, the response variable is bounded between zero and 2009 minus the time of first colonization.

Let's say I have a very simple hypothesis that the slope coefficient of topographic slope is less than zero, so my null hypothesis is that it is greater than or equal to zero. It seems that I could pick values for the missing data so as to minimize the chance of rejecting this null hypothesis. If I still find evidence to reject the null under this extreme case, then it is reasonable to conclude that the full data set, had the missing values been observed, would likewise lead to rejection. So, in the example of topographic slope, I would assign missing data values that give the largest slope coefficient possible given the observed data (and the smallest variance at a high parameter estimate? I am less sure how to think about this part).

First, are there any logical pitfalls I am falling into here? This seems rather straightforward with only one predictor in the model, but I suspect that a multivariate model will complicate things. Should each hypothesis (corresponding to each slope coefficient of interest) be considered separately? Meaning, should I concoct a set of missing values to try to not reject the null associated with hypothesis 1, then start over and do the same thing for hypothesis 2, etc.? Or should this be done all at once? If you think this is all a bad idea, in a few words, how would you go about modeling the data I've described? Thanks. -seth
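A rough sketch of the worst-case fill described above, using an ordinary least-squares regression of survival time on topographic slope as a stand-in for the hazard model. This is an assumption made purely for illustration: it ignores censoring, and the function names (`ols_slope`, `adversarial_fill`) and the data are hypothetical. In this stand-in, a negative hazard coefficient (steeper parcels are developed later) corresponds to a positive slope of survival time on topographic slope, so the fill that favors the null is the one that makes the time slope as small as possible. Because the least-squares slope is linear in each response, that extreme is always reached at one of the two bounds.

```python
from statistics import mean

def ols_slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

def adversarial_fill(x, y, t_max, minimize=True):
    """Replace each missing response (None) with 0 or t_max, whichever
    pushes the least-squares slope furthest in the requested direction."""
    x_bar = mean(x)  # predictors are fully observed, so this is fixed
    filled = []
    for xi, yi in zip(x, y):
        if yi is not None:
            filled.append(yi)
            continue
        below = xi < x_bar
        # Smallest slope: long survival at low x, zero survival at high x.
        # Largest slope: the reverse.
        filled.append(t_max if below == minimize else 0.0)
    return filled

# Hypothetical data: average topographic slope (%) and survival time (years).
slope_pct = [2.0, 5.0, 8.0, 12.0, 15.0, 20.0, 25.0, 30.0]
years = [10.0, 14.0, None, 22.0, 30.0, None, 41.0, 45.0]
t_max = 50.0  # assumed upper bound: 2009 minus the year of first colonization

low = ols_slope(slope_pct, adversarial_fill(slope_pct, years, t_max, minimize=True))
high = ols_slope(slope_pct, adversarial_fill(slope_pct, years, t_max, minimize=False))
print("time slope lies between", low, "and", high)
```

Two caveats follow from this sketch. The fill that minimizes the coefficient is not guaranteed to minimize the test statistic, since the standard error also changes with the filled values. And the fill depends on which coefficient is targeted: in a linear model with several predictors each coefficient is a different linear combination of the responses, so each hypothesis would need its own set of filled values. For a hazard model there is no such closed form, and the worst case would likely have to be found by search.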

Can someone at least tell me why they read and didn't respond?

## 1. What is sensitivity analysis for missing data?

Sensitivity analysis for missing data is a way of assessing how much a study's conclusions depend on what is assumed about the values that were not observed. It involves repeating the analysis under different imputation methods and different assumptions about why the data are missing, then checking whether the results change.

## 2. Why is sensitivity analysis important for handling missing data?

Sensitivity analysis is important because it allows researchers to evaluate the robustness of their findings and identify potential biases due to missing data. It also helps to determine the best approach for handling missing data in order to obtain accurate and reliable results.

## 3. What is the process of conducting sensitivity analysis for missing data?

The process of conducting sensitivity analysis for missing data involves selecting multiple plausible values for the missing data and comparing the results of the analysis using different imputation methods and assumptions. The goal is to determine the extent to which the results are impacted by missing data.
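As a minimal sketch of that comparison, the snippet below refits a single-predictor least-squares slope under four simple treatments of the missing responses: dropping them, filling with the observed mean, and filling with an assumed lower or upper bound. The function names, the bounds, and the data are hypothetical, and least squares is used only for brevity; the same loop could wrap any model.

```python
from statistics import mean

def ols_slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

def compare_scenarios(x, y, lower, upper):
    """Slope estimate under several treatments of missing responses (None)."""
    obs_x = [xi for xi, yi in zip(x, y) if yi is not None]
    obs_y = [yi for yi in y if yi is not None]

    def fill(value):
        # Replace every missing response with the same constant.
        return [value if yi is None else yi for yi in y]

    return {
        "complete_case": ols_slope(obs_x, obs_y),
        "mean_fill": ols_slope(x, fill(mean(obs_y))),
        "all_lower": ols_slope(x, fill(lower)),
        "all_upper": ols_slope(x, fill(upper)),
    }

# Hypothetical data with two missing responses.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, None, 8.0, 10.0, None]

results = compare_scenarios(x, y, lower=0.0, upper=20.0)
for name, estimate in results.items():
    print(name, round(estimate, 3))
```

If the sign or rough size of the estimate holds across every scenario, the conclusion is not very sensitive to the missing values; if it flips, the assumptions about the missing data are doing most of the work.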

## 4. What are some common challenges in conducting sensitivity analysis for missing data?

Some common challenges in conducting sensitivity analysis for missing data include selecting appropriate imputation methods, dealing with complex missing data patterns, and addressing biases that may arise from the imputation process. It is also important to consider the limitations of the data and potential sources of bias in the original study.

## 5. How can sensitivity analysis for missing data improve the validity of research findings?

Sensitivity analysis for missing data can improve the validity of research findings by providing insights into the robustness of the results and identifying potential sources of bias. It also allows for a more transparent and thorough evaluation of the impact of missing data on study results, leading to more accurate and reliable conclusions.