Overview of topics in experimental design

Part 1 — Plackett-Burman Screening

What is it, in plain language?

Imagine you're trying to bake the perfect cake, and you suspect that seven things might affect how good it turns out: oven temperature, baking time, amount of sugar, type of flour, mixing speed, amount of butter, and egg quantity. You want to figure out which of these actually matter the most — but running experiments for every possible combination would take forever (with 7 factors at 2 levels each, that's 2⁷ = 128 experiments!).

Plackett-Burman design is a clever shortcut. It lets you screen many factors using surprisingly few experiments to figure out which ones are important. You won't get a perfect picture of how all the factors interact — but you'll quickly identify the "big hitters" worth investigating further.

Key facts:

It's a screening design — used early in experimentation to filter many factors down to a few important ones.
Each factor is tested at just two levels: a "low" value (written as - or -1) and a "high" value (written as + or +1).
The number of experiments (called "runs") must be a multiple of 4: 8, 12, 16, 20, 24, etc.
With N runs, you can study up to N − 1 factors.

The Steps

Step 1: List your factors and pick two levels for each

Pick the things you want to test. For each one, choose a sensible "low" and "high" value. The levels should be far enough apart to actually produce a difference, but still realistic.

Step 2: Decide how many runs you need

Use the smallest multiple of 4 that is at least one more than your number of factors.

1–3 factors → 4 runs
4–7 factors → 8 runs
8–11 factors → 12 runs
12–15 factors → 16 runs

Step 3: Build the design matrix

A design matrix is a table that tells you exactly what settings to use in each experiment. You build it from a special starting row called a generator row (pre-tabulated in any DOE reference). Then you "shift" the row to make all the other rows, and finish with a row of all -'s.

Step 4: Randomize the run order

Don't run the experiments in the order they appear in the table — shuffle them. This protects you from time-related effects (equipment warming up, weather changing).

Step 5: Run the experiments and record the response

The "response" is whatever you're measuring — yield, strength, taste score, error rate.

Step 6: Calculate the effect of each factor

Effect = (Average response when factor is at +) − (Average response when factor is at −)

A large positive effect means raising that factor increases the response. Close to zero means the factor probably doesn't matter.

Step 7: Identify the important factors

The factors with the largest absolute effects are your "vital few." Plot them on a Pareto chart or half-normal plot to visually separate signal from noise.

A Fully Worked Example

The scenario: You're a chemist trying to maximize the yield (%) of a chemical reaction. You suspect 7 factors could influence yield.

Step 1 — Factors and levels

Factor	Symbol	Low (−)	High (+)
Temperature	A	60 °C	80 °C
Pressure	B	1 atm	2 atm
Catalyst type	C	Type 1	Type 2
Reaction time	D	1 hour	2 hours
Mixing speed	E	100 rpm	300 rpm
pH	F	6	8
Concentration	G	0.1 M	0.5 M

Step 2 — Number of runs

We have 7 factors, so we need at least 8 runs.

Step 3 — Build the design matrix

For an 8-run design, the generator row is + + + − + − −. Write it as Row 1, then cyclically shift (move the last element to the front) to get Rows 2–7. Row 8 is all minuses.

Run	A	B	C	D	E	F	G
1	+	+	+	−	+	−	−
2	−	+	+	+	−	+	−
3	−	−	+	+	+	−	+
4	+	−	−	+	+	+	−
5	−	+	−	−	+	+	+
6	+	−	+	−	−	+	+
7	+	+	−	+	−	−	+
8	−	−	−	−	−	−	−

Each column has exactly four +'s and four −'s — the balance that makes the math work.

Steps 4–5 — Randomize and run

Run	A	B	C	D	E	F	G	Yield (%)
1	+	+	+	−	+	−	−	78
2	−	+	+	+	−	+	−	65
3	−	−	+	+	+	−	+	70
4	+	−	−	+	+	+	−	72
5	−	+	−	−	+	+	+	68
6	+	−	+	−	−	+	+	80
7	+	+	−	+	−	−	+	75
8	−	−	−	−	−	−	−	55

Step 6 — Calculate the effects

Factor A (Temperature): + runs 1,4,6,7 → avg 76.25 | − runs 2,3,5,8 → avg 64.50 → Effect = +11.75

Factor B (Pressure): + avg 71.50, − avg 69.25 → Effect = +2.25

Factor C (Catalyst): + avg 73.25, − avg 67.50 → Effect = +5.75

Factor D (Time): + avg 70.50, − avg 70.25 → Effect = +0.25

Factor E (Mixing speed): + avg 72.00, − avg 68.75 → Effect = +3.25

Factor F (pH): + avg 71.25, − avg 69.50 → Effect = +1.75

Factor G (Concentration): + avg 73.25, − avg 67.50 → Effect = +5.75

Step 7 — Rank and interpret

Factor	Effect
A — Temperature	+11.75
C — Catalyst	+5.75
G — Concentration	+5.75
E — Mixing speed	+3.25
B — Pressure	+2.25
F — pH	+1.75
D — Time	+0.25

Temperature is the runaway winner. Catalyst type and Concentration are secondary. Mixing, Pressure, pH, and Time show small effects — set them to whichever level is cheapest.

The Pareto chart below visualises this: factors A, G, and C clear the decision cutoff (dashed line), while the rest fall in the noise zone on the half-normal plot.

Stage 1 — Pareto chart and half-normal probability plot from the Plackett-Burman screening

Part 2 — Full Factorial Design

Once screening identifies the vital few (here: A, C, G), a full factorial design tests every possible combination of those factors. No shortcuts — you cover the entire space.

How it's built

With 3 factors at 2 levels each you run 2³ = 8 experiments:

Run	A	C	G
1	−	−	−
2	+	−	−
3	−	+	−
4	+	+	−
5	−	−	+
6	+	−	+
7	−	+	+
8	+	+	+

Every column is balanced, and every pair of columns shows all four combinations equally — that's what lets you estimate interactions cleanly.

Why use it

The big payoff over Plackett-Burman is detecting interactions — when the effect of one factor depends on the level of another. "Raising temperature only helps when the catalyst is Type 2" is an interaction; PB is blind to it, a full factorial sees it clearly.

You compute main effects the same way as before. Interaction effects use a derived column: the A×C column is just A × C row by row, and you average the response at + rows minus the − rows.

What you can model

A 2-level full factorial gives you a linear model with interactions:

Yield = β₀ + β₁·A + β₂·C + β₃·G + β₁₂·AC + β₁₃·AG + β₂₃·CG + β₁₂₃·ACG

There are no squared terms — you can't see curvature with only two levels per factor.

The catch

Runs explode fast. 3 factors = 8 runs (fine). 5 factors = 32 runs (painful). 7 factors = 128 runs (impractical). Screen first, then run a full factorial only on the 2–4 survivors.

The plots below show main effects (top row) and interaction plots (bottom row) for factors A, C, G. Non-parallel lines in the interaction plots are the signature of a real interaction — notice A × C: the two lines cross, confirming these two factors interact strongly.

Stage 2 — Main effects and interaction plots from the full factorial

Part 3 — Response Surface Design

A full factorial at 2 levels tells you the direction of each factor but not the shape. A response surface design adds a third level (and sometimes more), so you can map out curves, peaks, and valleys.

Why you need curvature

Imagine yield as a function of temperature. From a 2-level experiment you might see yield going up and conclude "hotter is better." But the true relationship might be a hill — yield rises, peaks at 75 °C, then falls. With data only at 60 °C and 90 °C you'd miss the peak entirely. A third measurement (75 °C) reveals the curve.

Common designs

Central Composite Design (CCD). Start with a 2-level factorial (corners of a cube). Add a center point — a run with every factor at its midpoint. Then add axial points that push one factor past its high/low while the others stay at the center. For 2 factors the geometry looks like a square with a star superimposed: 4 corners + 1 center + 4 star points = 9 runs. The star points are what detect curvature along each axis.

Box-Behnken Design. Skips the corner and extreme axial runs. More efficient when you have many factors and running every factor at its extreme simultaneously would be dangerous or expensive.

What you can model

The fitted model is a full quadratic — squared terms that capture curvature:

Yield = β₀ + β₁·A + β₂·C + β₃·G + β₁₁·A² + β₂₂·C² + β₃₃·G² + β₁₂·AC + β₁₃·AG + β₂₃·CG

Once you have this equation you can take derivatives, set them to zero, and solve for the optimum. You can also draw contour plots — topographic maps of the response — that show exactly where the peak is and how steep the surrounding slopes are.

The contour plots below show the predicted yield surface for each pair of factors (third factor held at its optimum). The red star marks the predicted optimum.

Stage 3 — Contour plots from the Central Composite Design response surface

The 3D surface below shows the same A vs C slice: a clear hill with a peak around yield = 73.7, confirming the model has captured the curvature well.

Stage 3 — 3D response surface for factors A and C

How the Three Stages Fit Together

Stage	Design	Question answered
1. Screening	Plackett-Burman	Of my 7+ factors, which 2–4 actually matter?
2. Characterization	Full factorial (2-level)	How do those factors and their interactions affect the response?
3. Optimization	Response surface (CCD or Box-Behnken)	Exactly what settings maximize the response?

Each stage uses the results of the previous to focus effort. You start with a wide net and many factors, then progressively zoom in — fewer factors, more levels, more detail.

To close with the cake-baking example: Plackett-Burman might tell you that temperature, time, and sugar are the only factors that really matter. A 2³ full factorial tells you whether temperature and time interact (they almost certainly do). Finally, a response surface design pinpoints the exact combination that produces the perfect cake — including the curvature that says "175 °C is the sweet spot, and going either above or below makes things worse."

Python Simulation

All three stages above are simulated end-to-end in a companion Python project. The script runs the full pipeline — PB screening, full factorial, and CCD response surface — against a known true model so you can see exactly how well each stage recovers reality.

The true underlying process (unknown to the experimenter, known to us) is:

Yield = 70 + 4·A + 0.1·B + 2·C + 0.1·D + 0.2·E + 0.15·F + 3·G + 1.0·A·C − 3·A² − 2·C² − 2·G² + noise (σ = 1.0)

All factors are in coded units ∈ [−1, +1]. The important factors are A, C, and G; the real interaction is A × C.

The screening stage correctly identifies A, C, G as the vital few. The factorial stage picks up the A×C interaction. The response surface stage finds an optimum within ≈ 0.2 yield units of the true analytical maximum.

Run	A	B	C	D	E	F	G
1	+	+	+	−	+	−	−
2	−	+	+	+	−	+	−
3	−	−	+	+	+	−	+
4	+	−	−	+	+	+	−
5	−	+	−	−	+	+	+
6	+	−	+	−	−	+	+
7	+	+	−	+	−	−	+
8	−	−	−	−	−	−	−

Run	A	B	C	D	E	F	G	Yield (%)
1	+	+	+	−	+	−	−	78
2	−	+	+	+	−	+	−	65
3	−	−	+	+	+	−	+	70
4	+	−	−	+	+	+	−	72
5	−	+	−	−	+	+	+	68
6	+	−	+	−	−	+	+	80
7	+	+	−	+	−	−	+	75
8	−	−	−	−	−	−	−	55

Run	A	B	C	D	E	F	G
1	+	+	+	−	+	−	−
2	−	+	+	+	−	+	−
3	−	−	+	+	+	−	+
4	+	−	−	+	+	+	−
5	−	+	−	−	+	+	+
6	+	−	+	−	−	+	+
7	+	+	−	+	−	−	+
8	−	−	−	−	−	−	−

Run	A	B	C	D	E	F	G	Yield (%)
1	+	+	+	−	+	−	−	78
2	−	+	+	+	−	+	−	65
3	−	−	+	+	+	−	+	70
4	+	−	−	+	+	+	−	72
5	−	+	−	−	+	+	+	68
6	+	−	+	−	−	+	+	80
7	+	+	−	+	−	−	+	75
8	−	−	−	−	−	−	−	55

Run	A	B	C	D	E	F	G
1	+	+	+	−	+	−	−
2	−	+	+	+	−	+	−
3	−	−	+	+	+	−	+
4	+	−	−	+	+	+	−
5	−	+	−	−	+	+	+
6	+	−	+	−	−	+	+
7	+	+	−	+	−	−	+
8	−	−	−	−	−	−	−

Run	A	B	C	D	E	F	G	Yield (%)
1	+	+	+	−	+	−	−	78
2	−	+	+	+	−	+	−	65
3	−	−	+	+	+	−	+	70
4	+	−	−	+	+	+	−	72
5	−	+	−	−	+	+	+	68
6	+	−	+	−	−	+	+	80
7	+	+	−	+	−	−	+	75
8	−	−	−	−	−	−	−	55