I saw something interesting on Frank Harrell’s Datamethods forum. A user wanted to know what to do with her ‘big p, small n’ data. She was analyzing data from a clinical trial that had been cut short. The outcome was a binary variable (something like whether the medicine successfully reduced inflammation), but she’d only collected \(20\) obs, and had a handful of predictors.
This is exactly the situation I don’t want to be in. I knew that there was not enough data to support much of an analysis: binary outcomes do not carry very much information. But I was surprised when Frank Harrell responded to the thread saying the minimum sample size for a logistic regression with no predictors is 96 obs (check out page 8-13 of this handout).
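Here is a rough sketch of where a number like 96 can come from. It rests on my own assumptions, not on anything stated in the thread: that a logistic regression with no predictors is just estimating a single proportion, that we want a margin of error of ±0.1 at 0.95 confidence, and that we plan for the worst case of a true proportion of 0.5. The function name `min_n_for_proportion` is made up for illustration.

```python
from statistics import NormalDist


def min_n_for_proportion(margin=0.1, confidence=0.95, p=0.5):
    """Sample size so a normal-approximation CI for a proportion has
    half-width `margin`. Assumes worst-case p = 0.5 by default."""
    # Two-sided critical value, about 1.96 for 0.95 confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve margin = z * sqrt(p * (1 - p) / n) for n
    return z**2 * p * (1 - p) / margin**2


n = min_n_for_proportion()
print(round(n))  # about 96 under these assumptions
```

Tightening the margin to ±0.05 roughly quadruples the requirement, since n scales with one over the margin squared.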
Wow! What a nice number to have in your back pocket!
But when you google “minimum sample size for logistic regression”, you’ll get a very different answer!