Since my wife recently interviewed for a job at Facebook, I’m privy to a slew of FB interview questions. They range from statistics and machine learning to algorithms, but some of the questions were particularly annoying from a statistical perspective.

I’ll share the more extreme examples, but most of the questions were actually pretty modest.

- How do you deal with outliers?

Outliers do not exist. Labeling a point an outlier imposes a false dichotomy on the data, and if the word outlier enters your thoughts, it just means that your data are overdispersed relative to your model and that you have to change the model. For example, move from a Poisson model to a negative binomial. And you certainly do not remove or down-weight outliers.
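To make the Poisson versus negative binomial point concrete, here is a minimal simulation sketch. The mean `mu`, the dispersion parameter `r`, and the helper `dispersion_ratio` are my own illustrative choices, not anything from the interview. Counts that look like "outliers" under a Poisson model are exactly what a negative binomial with the same mean produces routinely.

```python
import numpy as np

rng = np.random.default_rng(42)
size = 100_000
mu = 5.0  # common mean for both count models (illustrative value)
r = 2.0   # negative binomial dispersion parameter (illustrative value)

# Poisson: the variance equals the mean.
poisson_counts = rng.poisson(lam=mu, size=size)

# Negative binomial with mean mu and variance mu + mu**2 / r.
# NumPy parameterizes it by (n, p) with mean n * (1 - p) / p,
# so p = r / (r + mu) gives the desired mean.
nb_counts = rng.negative_binomial(n=r, p=r / (r + mu), size=size)


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson data."""
    return counts.var(ddof=1) / counts.mean()


poisson_ratio = dispersion_ratio(poisson_counts)
nb_ratio = dispersion_ratio(nb_counts)

print(f"Poisson var/mean:           {poisson_ratio:.2f}")
print(f"Negative binomial var/mean: {nb_ratio:.2f}")
print(f"Largest Poisson count:           {poisson_counts.max()}")
print(f"Largest negative binomial count: {nb_counts.max()}")
```

A variance-to-mean ratio well above 1 is the signal to change the model, not to start deleting observations.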

- What would you do if a predictor is skewed?

There is nothing to do. See this post from Frank Harrell. In any regression model, the conditional distribution of \(y\) given \(x\) is modeled, not the joint distribution, so nothing is assumed about the marginal distribution of \(x\). If the distribution of the predictor is bothersome, perhaps there is something wrong with the way you are summarizing the model. Or maybe you are using an improper scoring rule.

From an experimental design perspective, a skewed predictor suggests that the design is likely poor. But from an observational study perspective, a skewed predictor is inevitable.
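A quick sanity check of the "nothing to do" claim, as a sketch under assumptions of my own: a log-normal predictor, a true line \(y = 1 + 2x\), and Gaussian noise. Ordinary least squares recovers the coefficients without any transformation of \(x\), because only the conditional model for \(y\) given \(x\) has to be right.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# A heavily right-skewed predictor (log-normal); the mean sits well above the median.
x = rng.lognormal(mean=0.0, sigma=1.0, size=n)

# The conditional model is correctly specified: y | x is linear with Gaussian noise.
true_intercept, true_slope = 1.0, 2.0
y = true_intercept + true_slope * x + rng.normal(scale=1.0, size=n)

# Ordinary least squares on the untransformed predictor.
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept_hat, slope_hat = coef

print(f"mean(x) = {x.mean():.2f}, median(x) = {np.median(x):.2f}")
print(f"estimated intercept = {intercept_hat:.3f}, estimated slope = {slope_hat:.3f}")
```

The skew in \(x\) only changes where the information about the slope comes from; it does not bias the fit.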

- Given an \(n \times p\) design matrix \(X\), in which scenario will you use Lasso, when \(n < p\) or \(n > p\)?

Why not both? If we have a lot of data and a lot of predictors, then shrinkage of any kind is likely a good thing. Obviously the answer FB is looking for is when \(n < p\). The usual least-squares fit has no unique solution in this situation (\(X^{\intercal}X\) is singular).
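The singularity is easy to see numerically. The sketch below uses made-up dimensions (\(n = 20\), \(p = 50\)) and random data: the rank of \(X^{\intercal}X\) cannot exceed \(n\), and two very different coefficient vectors give exactly the same fitted values. A penalty such as the lasso's is one way to pick among them.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 50  # fewer observations than predictors (illustrative sizes)
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# X^T X is p x p, but its rank is at most n, so it cannot be inverted.
gram = X.T @ X
gram_rank = np.linalg.matrix_rank(gram)
print(f"X^T X is {gram.shape[0]} x {gram.shape[1]} with rank {gram_rank}")

# lstsq still returns an answer: the minimum-norm least-squares solution.
beta_min_norm, *_ = np.linalg.lstsq(X, y, rcond=None)

# The last right singular vector of X lies in its null space (X @ v is ~0),
# so adding any multiple of it leaves the fitted values unchanged.
_, _, vt = np.linalg.svd(X)
null_direction = vt[-1]
beta_other = beta_min_norm + 10.0 * null_direction

same_fit = np.allclose(X @ beta_min_norm, X @ beta_other)
same_coefficients = np.allclose(beta_min_norm, beta_other)
print(f"same fitted values: {same_fit}, same coefficients: {same_coefficients}")
```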

Again, most of the questions were reasonable. I’m just picking out a few poor examples to demonstrate that some things that are issues from an ML perspective (Questions 1 & 2) are just business as usual for statisticians.