Surface response design for bond strength & print visual quality optimization

Author

Laura Symul

Published

April 29, 2024

In this document, we plan the experiments and analyze corresponding data to optimize the bond strength and print visual quality of the seal from a machine sealing bags containing sterile material.

1 Central-composite design

We anticipate that both the bond and the print quality will be well described by a second-order polynomial model and we have a budget of 20 trials for finding the optimum. Consequently, we could use a central composite design or a Box-Behnken design. Here, we will use a central composite design. As exercise, you could repeat these experiments and analyses with a Box-Behnken design.

Code
x_formula <- "Temperature + Time + Pressure + Temperature:Time + Temperature:Pressure + Time:Pressure + I(Temperature^2) + I(Time^2) + I(Pressure^2)"

CCD <- rsm::ccd(
  basis = 3, n0 = c(3,0), randomize = F, alpha = "faces",
  coding = list (
    x1 ~ rescale(Temperature, from = c(120, 180), to = c(-1,1)), 
    x2 ~ rescale(Time, from = c(0.2, 2), to = c(-1,1)),
    x3 ~ rescale(Pressure, from = c(50, 150), to = c(-1,1))
  )
) 

The design is:

Code
CCD
   run.order std.order Temperature Time Pressure Block
1          1         1         120  0.2       50     1
2          2         2         180  0.2       50     1
3          3         3         120  2.0       50     1
4          4         4         180  2.0       50     1
5          5         5         120  0.2      150     1
6          6         6         180  0.2      150     1
7          7         7         120  2.0      150     1
8          8         8         180  2.0      150     1
9          9         9         150  1.1      100     1
10        10        10         150  1.1      100     1
11        11        11         150  1.1      100     1
12         1         1         120  1.1      100     2
13         2         2         180  1.1      100     2
14         3         3         150  0.2      100     2
15         4         4         150  2.0      100     2
16         5         5         150  1.1       50     2
17         6         6         150  1.1      150     2

Data are stored in coded form using these coding formulas ...
x1 ~ rescale(Temperature, from = c(120, 180), to = c(-1, 1))
x2 ~ rescale(Time, from = c(0.2, 2), to = c(-1, 1))
x3 ~ rescale(Pressure, from = c(50, 150), to = c(-1, 1))

Note: I chose to do 3 repetitions at the center point, such that the total number of experiment sums to 17. Remember that I had a budget of 20 experiments, so, why not doing more repetitions and better estimate the experimental variability?

Well, I could have done that, but I wanted to keep some experiments for validation. Once we have the 17 initial data points from the CCD, we can fit our model, identify the best combination of factor values, then use the remaining 3 experiments to collect additional data at that point to confirm that the obtained values are indeed optimal.

I used the plan generated by the ccd function to run the experiments on the simulator.

Note that I did not randomize the order of the experiments. This is not a problem because we’re using a simulator and we can assume that the simulator is stable over time. In a real experiment, you should randomize the order of the experiments to avoid any bias due to any change in conditions with time.

Once I’ve used the simulator to “perform the experiments”, I downloaded the results from the simulator using the tab “Report” in the simulator in the “decimal: dot” format. The results are stored in the file data/CCD_17.csv. Let’s load the data and analyze it.

2 Data analysis

Code
ccd_df <- read_delim("data/CCD_17.csv", delim = ";")
Rows: 17 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
dbl (6): Trial_number, Temperature, Time, Pressure, Bond, Print

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
ccd_df |> pander()
Trial_number Temperature Time Pressure Bond Print
1 120 0.2 50 13.65 3.41
2 180 0.2 50 91.46 2.27
3 120 2 50 91.17 2.45
4 180 2 50 44.73 0.95
5 120 0.2 150 8.66 4.09
6 180 0.2 150 93.49 3.07
7 120 2 150 89.58 3.16
8 180 2 150 41.14 1.67
9 150 1.1 100 95.74 3.79
10 150 1.1 100 89.75 3.59
11 150 1.1 100 93.72 3.51
12 120 1.1 100 65.82 3.52
13 180 1.1 100 87.65 2.7
14 150 0.2 100 70.68 3.93
15 150 2 100 81.65 2.51
16 150 1.1 50 89.71 3.1
17 150 1.1 150 91.93 4.04

2.1 Data visualization

We first do some visual exploration of the data.

Code
bond_colors <- colorRampPalette(colors = c("red4","tomato", "steelblue1", "black", "black"))(85*2)
print_colors <- colorRampPalette(colors = c("red4", "tomato", "steelblue1"))(50)

j_bond <- seq(ccd_df$Bond |> min() |> floor(), ccd_df$Bond |> max() |> ceiling())
j_print <- seq((10 * ccd_df$Print) |> min() |> floor(), (10 * ccd_df$Print) |> max() |> ceiling())

ccd_df |>  
  plot_ly(
    type = "scatter3d", mode = "markers",
    x = ~Temperature, y = ~Time, z = ~Pressure,
    color = ~Bond, colors = bond_colors[j_bond],
    text = ~paste0("Bond: ", Bond,"\nPrint: ", Print)
  ) |> 
  add_text(text = ~Trial_number, color = I("black"), showlegend = F) |> 
  layout(
    scene = 
      list(
        xaxis = list(range = c(120, 180)), 
        yaxis = list(range = c(0.2, 2)), 
        zaxis = list(range = c(50, 150), tickvals = seq(50, 150, by = 25))
      )
  )
Code
ccd_df |>  
  plot_ly(
    type = "scatter3d", mode = "markers",
    x = ~Temperature, y = ~Time, z = ~Pressure,
    color = ~Print, colors = print_colors[j_print],
    text = ~paste0("Bond: ", Bond,"\nPrint: ", Print)
  ) |> 
  add_text(text = ~Trial_number, color = I("black"), showlegend = F) |> 
  layout(
    scene = 
      list(
        xaxis = list(range = c(120, 180)), 
        yaxis = list(range = c(0.2, 2)), 
        zaxis = list(range = c(50, 150), tickvals = seq(50, 150, by = 25))
      )
  )

2.2 Response = Bond

We start the statistical analysis of the data by modeling the bond strength.

Code
bond_formula <- "Bond ~ " |> str_c(x_formula)

mod_bond <- lm(bond_formula, data = ccd_df)
mod_bond |> summary() 

Call:
lm(formula = bond_formula, data = ccd_df)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.9470 -1.5851  0.2439  1.5209  3.0430 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)          -5.734e+02  4.066e+01 -14.100 2.14e-06 ***
Temperature           6.796e+00  5.634e-01  12.062 6.14e-06 ***
Time                  2.314e+02  7.423e+00  31.173 9.03e-09 ***
Pressure              5.998e-02  1.677e-01   0.358    0.731    
I(Temperature^2)     -1.742e-02  1.858e-03  -9.376 3.27e-05 ***
I(Time^2)            -2.006e+01  2.065e+00  -9.717 2.58e-05 ***
I(Pressure^2)        -6.389e-04  6.690e-04  -0.955    0.371    
Temperature:Time     -1.192e+00  3.585e-02 -33.256 5.76e-09 ***
Temperature:Pressure  4.183e-04  6.453e-04   0.648    0.537    
Time:Pressure        -6.167e-03  2.151e-02  -0.287    0.783    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.738 on 7 degrees of freedom
Multiple R-squared:  0.996, Adjusted R-squared:  0.9908 
F-statistic: 192.7 on 9 and 7 DF,  p-value: 1.517e-07
Code
# model.matrix(mod_bond)

We observe that the model is significant (very small p-value) and explains a lot of variance in the data (R-squared). We do observe that many parameters are statistically different from 0 except for those including the Pressure. From that, we conclude that the Pressure is not relevant for the strength of the bond.

2.2.1 Goodness of fit / Lack of fit

Let’s further check the quality of our model.

Code
tibble(
  actual_bond = ccd_df$Bond,
  predicted_bond = predict(mod_bond)
) |> 
  ggplot(aes(x = actual_bond, y = predicted_bond)) +
  geom_abline(slope = 1, intercept = 0, color = "black") +
  geom_smooth(col = "black", method = "lm", formula = 'y ~ x') +
  geom_point(col = "steelblue1") +
  xlab("Observed bond") + ylab("Predicted bond")

Predicted values are very close to obseved values, which is good.

Code
center_repetitions <- 
  ccd_df |> 
  group_by(Temperature, Time, Pressure) |> 
  mutate(mean_bond = mean(Bond), residuals = Bond - mean_bond, n = n()) |> 
  filter(n > 1) |> ungroup()

tibble(
  residuals = mod_bond$residuals,
  predicted_bond = predict(mod_bond)
) |> 
  ggplot(aes(x = predicted_bond, y = residuals)) +
  geom_hline(yintercept = 0, color = "black") +
  geom_hline(data = center_repetitions, aes(yintercept = residuals), col = "tomato", linetype = 2) +
  geom_smooth(col = "black", method = "lm", formula = 'y ~ x') +
  geom_point(col = "steelblue1") +
  xlab("Predicted bond") + ylab("Residuals")

We see that the residuals are relatively well distributed around 0, which is good. We also see that they are relatively small compared to the residuals at the center point (= the difference between the observed values at the center point - the mean of these values) shown by the horizontal red lines.

From this, it does not look like there is lack of fit, but let’s formally test for it.

Code
pure_error <- 
  ccd_df |> 
  group_by(Temperature, Time, Pressure) |> 
  mutate(
    mean_bond = mean(Bond),
    residuals = Bond - mean_bond
  ) |> 
  summarise(n = n(), SSE = sum(residuals^2), df = n-1, .groups = "drop") |> 
  filter(n > 1) |> 
  mutate(source = "pure error") |> 
  select(source, df, SSE)

lack_of_fit <- 
  tibble(
    source = "lack of fit",
    df = nrow(ccd_df) - length(mod_bond$coefficients) - pure_error$df,
    SSE = sum(mod_bond$residuals^2) - pure_error$SSE
    )

model_residuals <- 
  tibble(
    source = "model residuals",
    SSE = sum(mod_bond$residuals^2),
    df = mod_bond$df.residual
  )

bind_rows(pure_error, lack_of_fit, model_residuals) |> 
  mutate(MSE = SSE/df) |> 
  mutate(
    F_stat = ifelse(source == "lack of fit", MSE/lag(MSE), NA),
    p_value = pf(F_stat, df[1], df[2], lower.tail = F)
    ) |> pander()
source df SSE MSE F_stat p_value
pure error 2 18.57 9.287 NA NA
lack of fit 5 33.89 6.779 0.7299 0.5271
model residuals 7 52.47 7.495 NA NA

2.2.2 Surface response

Since we saw that the pressure was not very relevant for the bond strength, we can predict the bond for a range of values of the temperature and time, keeping the pressure constant at the center point, and visualize how the bond strength varies with these two factors.

Code
newdata <- 
  expand_grid(
  Temperature = seq(min(ccd_df$Temperature), max(ccd_df$Temperature), length.out = 60),
  Time = seq(min(ccd_df$Time), max(ccd_df$Time), length.out = 100),
  Pressure = 100
) 

newdata <- 
  newdata |> 
  mutate(
    bond = predict(mod_bond, newdata = newdata)
  )



newdata |> 
  ggplot(aes(x = Temperature, y = Time)) +
  geom_tile(aes(fill = bond)) +
  scale_fill_gradient2(low = "tomato", mid = "steelblue1", high = "black", midpoint = 85) +
  geom_contour(aes(z = bond), color = "white", binwidth = 10)

Code
newdata |> 
  ggplot(aes(x = Temperature, y = Time)) +
  geom_contour_filled(aes(z = bond), binwidth = 5)

Code
bond_wide <- 
  newdata |> 
  select(Temperature, Time, bond) |> 
  pivot_wider(names_from = Time, values_from = bond) |>
  select(-Temperature) |> 
  as.matrix()

plot_ly(x = unique(newdata$Temperature), y = unique(newdata$Time), z = bond_wide) |> 
  add_surface() |> 
  layout(
    scene = 
      list(
        xaxis = list(title = "Temperature"),
        yaxis = list(title = "Time"),
        zaxis = list(title = "Bond")
      )
  )

From these plots, we see that there are ranges of coupled values of Temperature & Time that give us the optimal bond.

Let’s now repeat these analyses but with the response variable being the print.

2.3 Response = Print

Code
print_formula <- "Print ~ " |> str_c(x_formula)

mod_print <- lm(print_formula, data = ccd_df)
mod_print |> summary() 

Call:
lm(formula = print_formula, data = ccd_df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.18594 -0.08126 -0.01526  0.08074  0.18806 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)   
(Intercept)          -7.831e+00  2.363e+00  -3.314  0.01288 * 
Temperature           1.574e-01  3.274e-02   4.807  0.00195 **
Time                  1.041e+00  4.314e-01   2.412  0.04665 * 
Pressure              1.122e-02  9.748e-03   1.151  0.28735   
I(Temperature^2)     -5.805e-04  1.080e-04  -5.375  0.00104 **
I(Time^2)            -5.092e-01  1.200e-01  -4.243  0.00383 **
I(Pressure^2)        -2.499e-05  3.888e-05  -0.643  0.54097   
Temperature:Time     -3.843e-03  2.084e-03  -1.844  0.10768   
Temperature:Pressure  1.083e-05  3.750e-05   0.289  0.78106   
Time:Pressure        -1.389e-04  1.250e-03  -0.111  0.91466   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1591 on 7 degrees of freedom
Multiple R-squared:  0.9852,    Adjusted R-squared:  0.9661 
F-statistic: 51.66 on 9 and 7 DF,  p-value: 1.422e-05

Here too, we observe that the model is significant (very small p-value) and explains a lot of variance in the data (R-squared). We observe that many parameters are statistically different from 0 except for those including the Pressure and the interaction. From that, we conclude that the Pressure is also not relevant for the visual quality of the print - lucky us!

2.3.1 Goodness of fit / Lack of fit

Let’s further check the quality of our model.

Code
tibble(
  actual_print = ccd_df$Print,
  predicted_print = predict(mod_print)
) |> 
  ggplot(aes(x = actual_print, y = predicted_print)) +
  geom_abline(slope = 1, intercept = 0, color = "black") +
  geom_smooth(col = "black", method = "lm", formula = 'y ~ x') +
  geom_point(col = "steelblue1") +
  xlab("Observed print") + ylab("Predicted print")

Predicted values are very close to obseved values, which is good.

Code
center_repetitions <- 
  ccd_df |> 
  group_by(Temperature, Time, Pressure) |> 
  mutate(mean_print = mean(Print), residuals = Print - mean_print, n = n()) |> 
  filter(n > 1) |> ungroup()

tibble(
  residuals = mod_print$residuals,
  predicted_print = predict(mod_print)
) |> 
  ggplot(aes(x = predicted_print, y = residuals)) +
  geom_hline(yintercept = 0, color = "black") +
  geom_hline(data = center_repetitions, aes(yintercept = residuals), col = "tomato", linetype = 2) +
  geom_smooth(col = "black", method = "lm", formula = 'y ~ x') +
  geom_point(col = "steelblue1") +
  xlab("Predicted print") + ylab("Residuals")

We see that the residuals are relatively well distributed around 0, which is good. We also see that they are of the same range than the residuals at the center point (= the difference between the observed values at the center point - the mean of these values) shown by the horizontal red lines.

From this, it does not look like there is lack of fit, but let’s formally test for it.

Code
pure_error <- 
  ccd_df |> 
  group_by(Temperature, Time, Pressure) |> 
  mutate(
    mean_print = mean(Print),
    residuals = Print - mean_print
  ) |> 
  summarise(n = n(), SSE = sum(residuals^2), df = n-1, .groups = "drop") |> 
  filter(n > 1) |> 
  mutate(source = "pure error") |> 
  select(source, df, SSE)

lack_of_fit <- 
  tibble(
    source = "lack of fit",
    df = nrow(ccd_df) - length(mod_print$coefficients) - pure_error$df,
    SSE = sum(mod_print$residuals^2) - pure_error$SSE
    )

model_residuals <- 
  tibble(
    source = "model residuals",
    SSE = sum(mod_print$residuals^2),
    df = mod_print$df.residual
  )

bind_rows(pure_error, lack_of_fit, model_residuals) |> 
  mutate(MSE = SSE/df) |> 
  mutate(
    F_stat = ifelse(source == "lack of fit", MSE/lag(MSE), NA),
    p_value = pf(F_stat, df[1], df[2], lower.tail = F)
    ) |> pander()
source df SSE MSE F_stat p_value
pure error 2 0.0416 0.0208 NA NA
lack of fit 5 0.1356 0.02713 1.304 0.3501
model residuals 7 0.1772 0.02532 NA NA

We do not see a significant lack of fit here.

2.3.2 Surface response

Since we saw that the pressure was not very relevant for the print quality, we can do the same as we did for the bond and predict the print for a range of values of the temperature and time, keeping the pressure constant at the center point, and visualize how the print quality varies with these two factors.

Remember that for the print, the higher the better.

Code
newdata <- 
  newdata |> 
  mutate(
    print = predict(mod_print, newdata = newdata)
  )

newdata |> 
  ggplot(aes(x = Temperature, y = Time)) +
  geom_tile(aes(fill = print)) +
  scale_fill_gradient(low = "tomato", high = "steelblue1") +
  geom_contour(aes(z = print), color = "white", binwidth = 0.25)

Code
newdata |> 
  ggplot(aes(x = Temperature, y = Time)) +
  geom_contour_filled(aes(z = print), binwidth = 0.25)

Code
print_wide <- 
  newdata |> 
  select(-Pressure, -bond) |> 
  pivot_wider(names_from = Time, values_from = print) |>
  select(-Temperature) |> 
  as.matrix()

plot_ly(x = unique(newdata$Temperature), y = unique(newdata$Time), z = print_wide) |> 
  add_surface() |> 
  layout(
    scene = 
      list(
        xaxis = list(title = "Temperature"),
        yaxis = list(title = "Time"),
        zaxis = list(title = "Print quality")
      )
  )

From these plots, we see that there is one point at the optimum, which is obtained for the following values of the factors:

Code
newdata |> filter(print == max(print)) |> pander()
Temperature Time Pressure bond print
135.3 0.4909 100 61.59 3.983

2.4 Combining results

We can now combine the results and see what are the optimal values for the factors that give us the best bond and print quality.

Code
g <- 
  newdata |> 
  ggplot(aes(x = Temperature, y = Time)) +
  geom_contour_filled(aes(z = bond), binwidth = 5) +
  geom_contour(aes(z = print), color = "white", binwidth = 0.1) +
  geom_point(data = newdata |> filter(print == max(print)), col = "white", size = 3, shape = 4)

g

From the plot above, we see that the optimum for the print conflicts with the acceptable range for the bond. We will thus pick the values that keep the bond within the acceptable range and maximize the print quality.

We define an acceptable range for the bond as 85 \(\pm \ \delta\).

This is achieve with the following values:

Code
delta <- 0.5

optimum <- 
  newdata |> 
  filter(bond >= 85 - delta, bond <= 85 + delta) |> 
  filter(print == max(print))

optimum <- 
  optimum |> 
  mutate(
    bond_PI_low = predict(mod_bond, newdata = optimum, interval = "prediction")[2],
    bond_PI_high = predict(mod_bond, newdata = optimum, interval = "prediction")[3],
    print_PI_low = predict(mod_print, newdata = optimum, interval = "prediction")[2],
    print_PI_high = predict(mod_print, newdata = optimum, interval = "prediction")[3]
  ) |> 
  select(Temperature, Time, Pressure, starts_with("bond"), starts_with("print"))

optimum |> pander()
Table continues below
Temperature Time Pressure bond bond_PI_low bond_PI_high print
142.4 0.8727 100 84.53 77.49 91.56 3.869
print_PI_low print_PI_high
3.46 4.278
Code
g +
  geom_hline(yintercept = optimum$Time, linetype = 2) +
  geom_vline(xintercept = optimum$Temperature, linetype = 2) +
  geom_point(data = optimum, size = 3)

Let’s now repeat our experiment at this optimum to see if we indeed obtain good values.

3 Validations

Code
val_df <- read_delim("data/Validation_3 .csv", delim = ";")
Rows: 3 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
dbl (6): Trial_number, Temperature, Time, Pressure, Bond, Print

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
df <- 
  bind_rows(
    ccd_df |> mutate(type = "experiment"),
    val_df |> mutate(type = "validation")
  ) |> 
  mutate(trial = row_number() |> factor())

df |> 
  ggplot(aes(x = trial, y = Bond, col = type)) +
  geom_hline(yintercept = 85, linetype = 2) +
  geom_hline(yintercept = c(optimum$bond_PI_low, optimum$bond_PI_high), linetype = 3) +
  geom_point(size = 2)  +

df |> 
  ggplot(aes(x = trial, y = Print, col = type)) +
  geom_hline(yintercept = c(optimum$print_PI_low, optimum$print_PI_high), linetype = 3) +
  geom_point(size = 2)  +
  
  plot_layout(guides = "collect")

Code
# df |> 
#   pivot_longer(cols = c(Bond, Print), names_to = "response", values_to = "value") |>
#   ggplot(aes(x = n, y = value, col = type)) +
#   geom_point() +
#   facet_grid(response ~ ., scales = "free_y")

Did we fail?

Code
t.test(val_df$Bond, mu = 85, alternative = "two.sided", conf.level = 0.95) |> pander()
One Sample t-test: val_df$Bond
Test statistic df P value Alternative hypothesis mean of x
-3.08 2 0.09123 two.sided 82.28

We didn’t :)

(But even if the p-value would have been smaller, this would not have meant that we failed but that the variability is larger than our “delta”)