# Randomizing run order in a designed experiment

## Introduction

Randomization of an experimental design (DOE) is different from randomization in a clinical trial. In a DOE, the design of the $$n$$ trials is known in advance, but in the clinical trial, it is not.

DOE randomization refers to randomizing the order of runs from the fixed design.
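To make this concrete, here is a minimal Python sketch (the 2³ design and the variable names are purely illustrative): the set of runs is fixed up front, and randomization only shuffles the order in which those runs are executed.

```python
import itertools
import random

# Hypothetical 2^3 full factorial: every combination of three
# two-level factors, coded -1/+1. The design itself is fixed in advance.
design = list(itertools.product([-1, 1], repeat=3))

# DOE randomization: shuffle only the ORDER in which the fixed
# runs are executed; the set of runs itself does not change.
run_order = design.copy()
random.shuffle(run_order)

for i, run in enumerate(run_order, start=1):
    print(f"run {i}: A={run[0]:+d}, B={run[1]:+d}, C={run[2]:+d}")
```

Every run in the design is executed exactly once; only the execution sequence differs from one randomization to the next.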

Clinical trial randomization means that as individuals present themselves to the trial, they are assigned to a treatment group depending on the result of a random number generator.

So the difference is a randomization of order versus a randomization of the actual experiment unit. One consequence is that while a full-factorial is necessarily a balanced and orthogonal design regardless of the run order randomization, the clinical trial might have an unbalanced design due to randomization of treatments.

In this post, I’ll talk about run order / DOE randomization. Clinical trial / treatment randomization is important and interesting, but I’ll discuss that one later.

The main benefit of randomization is that you don’t have to worry about confounders potentially mucking up your experimental results. If serious confounders are present, then the inference is not valid (biased estimates, untrustworthy $$p$$-values, miscalibrated confidence intervals, etc.). But it may behoove us to accept this inferential nastiness on the basis of some decision analysis, which may be separate from the statistical arguments.

Here are my guidelines for deciding when randomization of run order should be respected. They are certainly not complete, but they are what I came up with in one hour on a Saturday morning.

## When the DOE order should be randomized:

• If all runs take a roughly equal amount of time and resources to complete, regardless of run order, OR

• If it is possible (a priori) that lurking variables could have a moderate to severe effect on the study results, OR

• If many different organizations will analyze the experimental data, and there is a desire to have the separate data analyses agree.

• Randomization somewhat ensures that different statisticians will do the ‘obvious’ data analysis and won’t try to get too creative.

## When the DOE order should not be randomized:

• If time and resources can be saved by a specific run order, and these savings outweigh the statistical concerns, AND

• If (a priori) lurking variables can have at most a small or negligible effect on the experimental outcomes, AND

• If the prior probability of missing data is 0 AND

• If the result of conducting $$J$$ identical runs in a row is invariant to resetting the factors after each run, AND

• If a mixed-effects model is deemed sufficient to answer the experimenter’s research questions, or if a continuous time trend can be added to the statistical model without too much loss of power.

• (These are some basic analytic corrections that can be applied when randomization is restricted.)
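As a rough illustration of the time-trend correction, here is a sketch with made-up numbers (noiseless responses, a single coded factor, and a linear drift are assumptions of the example, not a recipe): regressing on the run index alongside the factor separates the drift from the factor effect.

```python
import numpy as np

# Synthetic non-randomized experiment (illustrative numbers):
# 8 runs in a FIXED order, one coded factor x, and a lurking
# linear drift over run index t that randomization did not break up.
t = np.arange(8.0)                              # run order 0..7
x = np.array([-1., 1., -1., 1., -1., 1., -1., 1.])
y = 1.0 + 2.0 * x + 0.5 * t                     # true factor effect is 2.0

# Analytic correction: include the run-order trend t as a covariate
# alongside the factor effect.
X = np.column_stack([np.ones_like(t), x, t])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta)  # beta ≈ (1.0, 2.0, 0.5): intercept, factor effect, trend slope
```

Because the alternating factor column is not collinear with the run index, the least-squares fit recovers the factor effect despite the drift.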

## What do the replications look like?

Conventional frequentist statistics are based on hypothetical replications of an experiment. Regardless of randomization of experimental units, it’s interesting to ask ourselves what the hypothetical replications should look like.

When a DOE is randomized, we might think that every hypothetical replication is the same design, but in a different, random run order.

When a DOE is not randomized, we have to think that the hypothetical replications are also the same design, but the run order is always exactly the same. What effect does this have on frequentist inference?

When the DOE replications are in a fixed order, the DOE no longer has the expected characteristics of an ANOVA DOE; it now behaves (at best) like a split-plot DOE. So when a (say) D-optimal design executed in a non-random order is analyzed, we cannot expect the frequentist operating characteristics of the analysis to be preserved, though a split-plot/mixed-effects approach can sometimes be the solution.
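A small simulation (entirely made-up numbers) illustrates why the fixed order is troublesome. With a lurking linear drift and a factor that truly does nothing, running the design sorted by factor level bakes the drift straight into the effect estimate, while averaging over hypothetical random run orders does not.

```python
import numpy as np

rng = np.random.default_rng(0)

drift = 0.3 * np.arange(8)            # lurking time trend over 8 runs
x_sorted = np.array([-1] * 4 + [1] * 4)  # worst case: runs sorted by level
true_effect = 0.0                     # the factor actually does nothing

def effect_estimate(x):
    # difference of group means; here y is nothing but the drift
    y = true_effect * x + drift
    return y[x == 1].mean() - y[x == -1].mean()

# Fixed (sorted) order: the drift is fully confounded with the factor.
bias_fixed = effect_estimate(x_sorted)

# Hypothetical replications with random run order: the confounding
# averages out across replications.
biases = [effect_estimate(rng.permutation(x_sorted)) for _ in range(10_000)]
bias_random = float(np.mean(biases))

print(bias_fixed, bias_random)
```

In this toy setup the sorted order reports a spurious effect of about 1.2, while the average over random orders is essentially zero; any single random order is still off, but the errors are unbiased across replications.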

Unfortunately, not all non-randomized designs can simply be analyzed as if they came from a split-plot experiment. For example, if all the factors have only two levels, a mixed model with random effects won’t be effective – two levels is not enough data to estimate the variance of a random effect with any confidence.1 In this situation, I don’t have a good answer, but it may be sufficient to include a time trend in the analysis.

Finally, it is interesting to consider what an experimenter should do if the random run order does not appear random, and there is some suspicion that lurking variables will creep in. In this case, randomization is not helpful, and the experimenter should just roll the dice again. But if there are many such ‘bad’ randomizations, it would be unprincipled to keep running a random number generator until a ‘good-looking’ order appears. Again, a split-plot design could be used in these situations, but it is not a general solution.
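One hypothetical way to operationalize “the order looks non-random” is to check the correlation between run index and each factor column, re-drawing a capped number of times. The 0.5 threshold and the cap of 20 draws below are arbitrary assumptions for illustration, not a recommendation.

```python
import itertools
import random

import numpy as np

# Fixed 2^3 full factorial (illustrative design).
design = np.array(list(itertools.product([-1, 1], repeat=3)))

def order_looks_ok(order, threshold=0.5):
    # Flag 'bad' draws in which the run index correlates strongly with
    # any factor column -- an open door for lurking time trends.
    t = np.arange(len(order))
    cols = design[order].T
    return all(abs(np.corrcoef(t, col)[0, 1]) < threshold for col in cols)

rng = random.Random(0)
max_draws = 20  # cap the re-rolls; endless re-drawing is unprincipled
for attempt in range(max_draws):
    order = list(range(len(design)))
    rng.shuffle(order)
    if order_looks_ok(order):
        break

print(attempt, order)
```

Note that the unshuffled standard order fails this check (the first factor column marches in lockstep with the run index), which is exactly the kind of draw the text suggests rejecting.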

## Naive analysis of non-randomized designs

In practice, time trends or mixed models won’t commonly be used to analyze the data from a non-randomized DOE. Chalk it up to operator error.

In this case, the ANOVA may still be okay, but the implicit assumption is that the joint effect of all (un)observed confounders on the experimental outcomes is nil. But this problem can easily be waved away by not thinking too hard about design and analysis. (jokes!)

## Bayesian view of randomization

To the (subjective) Bayesian, run order randomization does not matter. But this should not be too surprising, because Bayesians have a very hard time justifying one experimental design over another. This is because much of the Bayesian advantage comes from conditioning on the data at hand. But when talking about DOE, there are no data to condition on yet, so all Bayesian arguments for one design over another are prequential. Nevertheless, there are some Bayesian arguments in favor of randomization, and some of these arguments come down to the ease of conducting the Bayesian analysis under different missing data assumptions.

1. Another rule of thumb is that at least 6 levels of a factor are needed to estimate the variance.