Frank Harrell has a new keynote talk on YouTube. Watch it here.
While the talk is titled "Controversies in Predictive Modeling and Machine Learning," I would say it is also Frank Harrell's philosophy of statistics in a nutshell. Here are my notes on the talk; the same themes are sprinkled throughout his RMS book.
External Validation is Overrated
- Data splitting (using training and testing sets) is bad. Training and testing on a single split is not external validation. Dr. Harrell is a biostatistician, so this stance makes sense in his field; for machine learning it may apply less.
- It's better to use resampling to validate the entire process that produced the model, rather than only the final model under consideration.
- Feature selection is not reliable in most uses. The process is highly unstable.
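A sketch of what "validate the entire process" means in practice, using bootstrap optimism correction (my own minimal illustration, not Harrell's code; the screening threshold and pipeline are hypothetical). The key point is that the feature-selection step is re-run inside every bootstrap resample:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_pipeline(X, y):
    """The WHOLE process: univariate screening, then OLS on the survivors.
    (Hypothetical pipeline, for illustration only.)"""
    r = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    keep = np.where(r > 0.1)[0]
    if keep.size == 0:
        keep = np.array([int(np.argmax(r))])
    Xk = np.column_stack([np.ones(len(y)), X[:, keep]])
    beta, *_ = np.linalg.lstsq(Xk, y, rcond=None)
    return keep, beta

def r2(X, y, keep, beta):
    Xk = np.column_stack([np.ones(len(y)), X[:, keep]])
    resid = y - Xk @ beta
    return 1 - resid.var() / y.var()

n, p = 200, 20
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2.0, size=n)

keep, beta = fit_pipeline(X, y)
apparent = r2(X, y, keep, beta)

# Bootstrap the ENTIRE pipeline (selection + fit), not just the final model.
optimism = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    kb, bb = fit_pipeline(X[idx], y[idx])
    optimism.append(r2(X[idx], y[idx], kb, bb) - r2(X, y, kb, bb))

corrected = apparent - np.mean(optimism)
print(f"apparent R^2 = {apparent:.3f}, optimism-corrected R^2 = {corrected:.3f}")
```

The corrected estimate is lower than the apparent one precisely because the noisy screening step is repeated on each resample; refitting only the chosen model would hide that source of optimism.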
Validate Researchers, Not Models
- The quality of the researchers and of the analysis methodology strongly influences the reliability and usefulness of the resulting research.
Bayesian modeling
- Frequentist penalized methods work well for prediction but not for inference: confidence intervals and \(p\)-values don't extend well to the case where estimates are biased by penalization.
- Horseshoe priors work better than LASSO and elastic net (I’m not sure why???)
- There is a tendency to use methods that are fast, but we don't stop to think about what the population of effects is. In Bayesian modeling you can envision that population and encode it as a prior.
- Ordinal predictors are easier in Bayes.
- Degrees of freedom for non-linear effects can be data-driven and still preserve operating characteristics.
- Imputation and modeling can be done in a single unified model (joint modeling)
- Validation is less necessary: overfitting does not occur in the usual sense; instead, the issue is that the analyst and the reader may hold different priors. The problem of overfitting is translated into a problem of selecting a prior.
- Use forward probabilities (e.g., the probability of disease given the test result) instead of backward probabilities (e.g., sensitivity and specificity, which condition on the unknown).
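On the horseshoe point, one intuition (my own gloss, not from the talk): the horseshoe has both a taller spike at zero and much heavier tails than the Laplace prior implied by the LASSO penalty, so it shrinks noise coefficients hard while leaving large coefficients nearly untouched. A quick Monte Carlo comparison of draws from the two priors:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 100_000

# Horseshoe prior draws: beta = lambda * z, z ~ N(0, 1), lambda ~ Half-Cauchy(0, 1)
lam = np.abs(rng.standard_cauchy(m))
horseshoe = lam * rng.normal(size=m)

# Laplace (double-exponential) draws: the prior implied by the LASSO penalty
laplace = rng.laplace(0.0, 1.0, m)

hs_near = np.mean(np.abs(horseshoe) < 0.1)  # mass in the spike near zero
la_near = np.mean(np.abs(laplace) < 0.1)
hs_tail = np.mean(np.abs(horseshoe) > 10)   # mass in the heavy tails
la_tail = np.mean(np.abs(laplace) > 10)

print(f"P(|b| < 0.1): horseshoe {hs_near:.3f} vs Laplace {la_near:.3f}")
print(f"P(|b| > 10) : horseshoe {hs_tail:.4f} vs Laplace {la_tail:.5f}")
```

The horseshoe wins on both counts, which is the "shrink the noise, spare the signal" behavior that sparsity-agnostic shrinkage wants.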
Mirage of Variable Selection
- Parsimony is the enemy of predictive discrimination. You don’t want to spend data trying to figure out which features to use.
- Maxwell's demon and the second law of thermodynamics: feature selection requires spending information that could be better used for estimation and prediction.
- Pr(selecting the 'right' variables) = 0. You cannot use the data to tell you which elements of the data you should use; you don't have enough information unless you have millions of observations and only a few features.
- Researchers worry about the false discovery rate (FDR), but seldom worry about the false negative rate (FNR).
- Fraction of important features not selected >> 0.
- Fraction of unimportant features selected >> 0.
CI for variable importance quantifies difficulty of variable selection
- Example: \(n=300\), \(12\) features, \(\beta_i=i\), \(\sigma=9\). Rank the features using partial \(\chi^2\). Simulation shows the resulting ranking is too noisy to be trusted.
- The gold standard of variable selection: a full Bayesian model with carefully chosen shrinkage priors, which are not necessarily sparsity priors. Then project the full model onto simpler models, Piironen and Vehtari (2017).
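That instability is easy to reproduce. A sketch of the simulation (my own implementation, not Harrell's code; I use OLS Wald \(z^2\) as a stand-in for partial \(\chi^2\)):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma = 300, 12, 9.0
beta = np.arange(1, p + 1, dtype=float)  # beta_i = i, so feature 12 matters most

def importance_ranks(X, y):
    """Rank predictors by Wald chi-square (z^2); rank 0 = most important."""
    Xd = np.column_stack([np.ones(n), X])
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    b = XtX_inv @ Xd.T @ y
    resid = y - Xd @ b
    s2 = resid @ resid / (n - p - 1)
    z2 = b[1:] ** 2 / (s2 * np.diag(XtX_inv)[1:])
    return np.argsort(np.argsort(-z2))

sims, hits = 200, 0
for _ in range(sims):
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(scale=sigma, size=n)
    # true importance order: last feature strongest ... first feature weakest
    hits += np.array_equal(importance_ranks(X, y), np.arange(p - 1, -1, -1))

print(f"exact true importance ranking recovered in {hits}/{sims} simulations")
```

Even with every feature truly active and a known ordering, the estimated ranking frequently disagrees with the truth, which is the point of putting a confidence interval on variable importance.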
ML vs. Statistical Models
Stats Models
- Probability Distribution for the data.
- Favors Additivity
- Identified parameters of interest
- Inference, Estimation, and Prediction
- Most useful when the signal-to-noise ratio (SNR) is low
Machine Learning
- Algorithmic
- Equal opportunity for interactions and main effects
- Prediction only
- Most useful when SNR is high
Predictive Measures
Gold Standards - not popular anymore
- Smooth, flexible calibration curve
- log likelihood
- log likelihood + log prior
- explained outcome heterogeneity
- heterogeneity of the predictions (Kent & O’Quigley-type measures)
- Relative explained variance (relative \(R^2\)): ratio of variances of \(\hat y\) from a subset model to the full model
- The majority of ML papers do not demonstrate an adequate understanding of predictive accuracy.
Bad Measures
- Pr(classified correctly), sensitivity, specificity, precision, and recall: these are improper, discontinuous scoring rules.
- ROC curves are highly problematic and have a high ink-to-information ratio.
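A toy illustration of why accuracy-style measures are improper (my own example, not from the talk): a forecaster that exaggerates every probability toward 0 or 1 gets exactly the same classification accuracy as the true probabilities, but a proper score such as the Brier score exposes the miscalibration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
p_true = rng.uniform(0.0, 1.0, n)   # true event probabilities
y = rng.binomial(1, p_true)         # observed binary outcomes

# Overconfident forecaster: same side of 0.5, exaggerated to the extremes,
# so threshold-based classification cannot tell the two apart.
p_overconfident = np.where(p_true > 0.5, 0.99, 0.01)

def accuracy(p, y):
    return np.mean((p > 0.5) == (y == 1))

def brier(p, y):
    return np.mean((p - y) ** 2)

print("accuracy:", accuracy(p_true, y), accuracy(p_overconfident, y))
print("Brier   :", brier(p_true, y), brier(p_overconfident, y))
```

Accuracy is identical for both forecasters, while the Brier score is clearly worse for the overconfident one; only the proper score rewards honest probabilities.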
Decision Making
- The optimal Bayes decision maximizes expected utility: combine the posterior distribution with the utility function.
- ROC/AUROC play no role
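A minimal sketch of that idea (the treat/wait framing and all utility numbers are hypothetical): the decision depends only on the posterior probability and the utilities, and no ROC curve appears anywhere:

```python
# Hypothetical utilities for a treat / wait decision
U_treat_sick, U_treat_well = 90.0, -10.0   # benefit if sick, side-effect cost if well
U_wait_sick,  U_wait_well  = -100.0, 0.0   # missing real disease is very costly

def best_action(p_sick):
    """Choose the action with the higher posterior expected utility."""
    eu_treat = p_sick * U_treat_sick + (1 - p_sick) * U_treat_well
    eu_wait  = p_sick * U_wait_sick  + (1 - p_sick) * U_wait_well
    return "treat" if eu_treat > eu_wait else "wait"

# The implied decision threshold solves eu_treat = eu_wait:
p_star = (U_wait_well - U_treat_well) / (
    U_treat_sick - U_treat_well - U_wait_sick + U_wait_well)

print(f"treat when posterior P(sick) > {p_star:.3f}")
print(best_action(0.02), best_action(0.20))  # -> wait treat
```

Changing the utilities moves the threshold; nothing about sensitivity, specificity, or AUROC enters the calculation.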