|
| “‘R-sq is a first-blush indicator of a good model. R-sq is often misused as the measure to assess which model produces better predictions... The root mean squared error (RMSE) is the measure for determining the better model. The smaller the RMSE value, the better the model is (the predictions are more precise).’
I sourced this from Dr. Ratner's LinkedIn, so I can't link to it. His website is: dmstat1.com
Be careful with R-sq. The more variables you use the higher your R-sq value will be no matter how good or bad your model is. If you are using more than a few predictor variables you should also look at Adjusted-R-sq.
If you would rather get a measure of accuracy you should look at RMSE. If you are comparing two models also look at Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC).
To answer your question, if the R-sq is 0.7, it is incorrect to say that 70% of the data has been predicted accurately by the model. R-sq is a measure of how much variance is accounted for within the predictor variables you are using. R-sq of 0.7 means that your predictor variables can explain 70% of the variance in the response variable.
For example pretend that the price of a cup of coffee is influenced ONLY by the price of water and the price of coffee beans. If I only know the price of water and I predict the price of coffee using only the price of water my R-sq is 0.5, because I know only 1/2 of the things that influence the price of a cup of coffee.
In reality, it is virtually impossible to know ALL predictors that influence an outcome (R-sq = 1) because we either don't know what variables influence an outcome, or the variables that influence an outcome aren't contained in the dataset.”
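To make the quoted measures concrete, here is a minimal sketch of how R-sq, Adjusted-R-sq, and RMSE are computed. The numbers in `actual` and `predicted` are made up purely for illustration, and the function names are my own, not from any library.

```python
import math

def r_squared(actual, predicted):
    """Share of the variance in `actual` that the predictions account for."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n_obs, n_predictors):
    """R-sq penalized for the number of predictor variables used."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

def rmse(actual, predicted):
    """Typical size of a prediction error, in the units of the response."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))

# Made-up observations and model predictions (illustration only).
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(r2, n_obs=len(actual), n_predictors=1)
error = rmse(actual, predicted)

print(f"R-sq: {r2:.3f}  Adjusted R-sq: {adj_r2:.3f}  RMSE: {error:.3f}")
```

Note that R-sq and Adjusted-R-sq are unitless ratios, while RMSE is in the same units as the thing being predicted, which is why the quote treats RMSE as the measure of how precise the predictions are.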
So from what I gather, 93% of the variance in yield can be explained by crop year.
And as crop year increases in value, we see a steady increase in yield.
So no, it’s not accurate to say there’s a 93% chance of a higher yield, but I think it is accurate to say there’s a high chance that as the year increases we will also see an increase in yield.
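Here is a rough sketch of why R-sq is not a "chance of a higher yield". It fits a straight line of yield against year on made-up numbers (NOT the actual yield data from the chart) and compares the R-sq of the trend with how often yield actually went up from one year to the next. All names and values are hypothetical.

```python
# Made-up yields for ten consecutive years (illustration only, not real data).
years = list(range(2000, 2010))
yields = [100, 104, 101, 108, 110, 107, 115, 113, 120, 122]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(yields) / n

# Ordinary least-squares fit of yield = intercept + slope * year.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, yields))
s_xx = sum((x - mean_x) ** 2 for x in years)
s_yy = sum((y - mean_y) ** 2 for y in yields)

slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# For a one-predictor linear fit, R-sq equals the squared correlation.
r2 = s_xy ** 2 / (s_xx * s_yy)

# A different question: in how many years did yield beat the previous year?
up_years = sum(1 for prev, cur in zip(yields, yields[1:]) if cur > prev)
share_up = up_years / (n - 1)

print(f"slope per year: {slope:.2f}")
print(f"R-sq of the trend: {r2:.2f}")
print(f"years with a higher yield than the year before: {up_years} of {n - 1}")
```

In this made-up series the trend line explains most of the variance, yet yield rose in only 6 of the 9 year-over-year steps. The two numbers answer different questions, so the R-sq value should not be read as a probability.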
I’d also like to point out at the end how the guy says, with his coffee analogy, that in reality you can never know all the variables affecting a thing, so it’s impossible to accurately predict anything and basically all this math is worthless in real-world applications.
And then I’ll post this screenshot of AI saying that R-sq is useless and that you shouldn’t try to use it.
And then you look back at the yields for the past 25 years and see that the model has indeed accurately predicted higher yields.
I mean, between 1960 and 1990 you can see the model has already formed a really good fit.
Edited by Deltamudd 5/5/2025 17:53
(IMG_7593 (full).png)
| |
|