Getting flagged values in residual plot in R

  • Thread starter: FallenApple
  • Tags: Plot
SUMMARY

The discussion focuses on identifying the points that R flags in a residual plot. In the plot in question, the labels 8 and 50 mark the observations with the lowest and highest residuals, respectively. The corresponding rows of the dataset can be found with which.min(mod$residuals) and which.max(mod$residuals), where mod is the variable holding the fitted regression model. For programming-specific questions, Stack Overflow is recommended as a further resource.

PREREQUISITES
  • Familiarity with R programming language
  • Understanding of regression analysis concepts
  • Knowledge of residual plots and their significance
  • Experience with data manipulation in R dataframes
NEXT STEPS
  • Learn how to create and interpret residual plots in R
  • Explore the use of which.min() and which.max() functions in R
  • Investigate advanced regression techniques in R, such as multiple regression
  • Engage with the R community on Stack Overflow for programming-specific questions
USEFUL FOR

Data analysts, statisticians, and R programmers seeking to understand residual plots and identify flagged values in regression analysis.

FallenApple
ResidualsVsFitted.png

What do the 8 and the 50 mean? I know that they are flagged values. This happens often: the plot is off because of some extreme points, the plot flags them, and when I go into the dataset I cannot find them. Are they the built-in row numbers of the data frame, or are they something else?
 
This is a very program-specific question, which is not really in the main subject area of Physics Forums. If you have no luck here, the best place to ask is Stack Overflow, tagging your question with R, graphics and regression. Stack Overflow is designed for programming-specific questions, and I have always been able to find answers there.

That said, I have one suggestion. Since the points flagged with 8 and 50 have the lowest and highest residuals respectively, you could identify the rows of the data to which they relate with which.min(mod$residuals) and which.max(mod$residuals), where mod is the name you have given to the variable holding the results of the regression.
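As a rough illustration of that suggestion, here is a minimal sketch using R's built-in cars dataset. The dataset, the formula and the names mod, sub and mod_sub are assumptions chosen for the example, not the original poster's model. As far as I know, plot() on an lm object labels the three points with the largest absolute residuals by default (id.n = 3) and takes the labels from the names of the residuals, which are the row names of the data frame. If the model was fitted on a subset, or rows were dropped, those labels are not the same as row positions, which may be why the flagged points seem to be missing.

```r
# Fit a simple regression on a built-in dataset (an assumption for this sketch).
mod <- lm(dist ~ speed, data = cars)

# which.min()/which.max() return a named integer: the name is the row label
# shown on the plot, the value is the position in the residual vector.
lowest  <- which.min(residuals(mod))
highest <- which.max(residuals(mod))

# Look up the corresponding rows by row name rather than by position.
cars[c(names(lowest), names(highest)), ]

# The three observations with the largest absolute residuals, i.e. the ones
# plot(mod, which = 1) would label with its default id.n = 3.
flagged <- names(sort(abs(residuals(mod)), decreasing = TRUE))[1:3]
cars[flagged, ]

# On a subset, row labels and row positions diverge.
sub     <- cars[cars$speed > 10, ]
mod_sub <- lm(dist ~ speed, data = sub)
top     <- which.max(residuals(mod_sub))
names(top)    # label carried over from the original data frame (shown on the plot)
unname(top)   # position within the subset
sub[names(top), ]
```

Indexing with the character label, as in sub[names(top), ], finds the flagged row even when the numeric position differs.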
 