Surface response design for bond strength & print visual quality optimization
Author
Laura Symul
Published
April 29, 2024
In this document, we plan the experiments and analyze corresponding data to optimize the bond strength and print visual quality of the seal from a machine sealing bags containing sterile material.
1 Central-composite design
We anticipate that both the bond and the print quality will be well described by a second-order polynomial model and we have a budget of 20 trials for finding the optimum. Consequently, we could use a central composite design or a Box-Behnken design. Here, we will use a central composite design. As exercise, you could repeat these experiments and analyses with a Box-Behnken design.
Code
x_formula <-"Temperature + Time + Pressure + Temperature:Time + Temperature:Pressure + Time:Pressure + I(Temperature^2) + I(Time^2) + I(Pressure^2)"CCD <- rsm::ccd(basis =3, n0 =c(3,0), randomize = F, alpha ="faces",coding =list ( x1 ~rescale(Temperature, from =c(120, 180), to =c(-1,1)), x2 ~rescale(Time, from =c(0.2, 2), to =c(-1,1)), x3 ~rescale(Pressure, from =c(50, 150), to =c(-1,1)) ))
Note: I chose to do 3 repetitions at the center point, such that the total number of experiment sums to 17. Remember that I had a budget of 20 experiments, so, why not doing more repetitions and better estimate the experimental variability?
Well, I could have done that, but I wanted to keep some experiments for validation. Once we have the 17 initial data points from the CCD, we can fit our model, identify the best combination of factor values, then use the remaining 3 experiments to collect additional data at that point to confirm that the obtained values are indeed optimal.
I used the plan generated by the ccd function to run the experiments on the simulator.
Note that I did not randomize the order of the experiments. This is not a problem because we’re using a simulator and we can assume that the simulator is stable over time. In a real experiment, you should randomize the order of the experiments to avoid any bias due to any change in conditions with time.
Once I’ve used the simulator to “perform the experiments”, I downloaded the results from the simulator using the tab “Report” in the simulator in the “decimal: dot” format. The results are stored in the file data/CCD_17.csv. Let’s load the data and analyze it.
Rows: 17 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
dbl (6): Trial_number, Temperature, Time, Pressure, Bond, Print
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
We start the statistical analysis of the data by modeling the bond strength.
Code
bond_formula <-"Bond ~ "|>str_c(x_formula)mod_bond <-lm(bond_formula, data = ccd_df)mod_bond |>summary()
Call:
lm(formula = bond_formula, data = ccd_df)
Residuals:
Min 1Q Median 3Q Max
-2.9470 -1.5851 0.2439 1.5209 3.0430
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.734e+02 4.066e+01 -14.100 2.14e-06 ***
Temperature 6.796e+00 5.634e-01 12.062 6.14e-06 ***
Time 2.314e+02 7.423e+00 31.173 9.03e-09 ***
Pressure 5.998e-02 1.677e-01 0.358 0.731
I(Temperature^2) -1.742e-02 1.858e-03 -9.376 3.27e-05 ***
I(Time^2) -2.006e+01 2.065e+00 -9.717 2.58e-05 ***
I(Pressure^2) -6.389e-04 6.690e-04 -0.955 0.371
Temperature:Time -1.192e+00 3.585e-02 -33.256 5.76e-09 ***
Temperature:Pressure 4.183e-04 6.453e-04 0.648 0.537
Time:Pressure -6.167e-03 2.151e-02 -0.287 0.783
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.738 on 7 degrees of freedom
Multiple R-squared: 0.996, Adjusted R-squared: 0.9908
F-statistic: 192.7 on 9 and 7 DF, p-value: 1.517e-07
Code
# model.matrix(mod_bond)
We observe that the model is significant (very small p-value) and explains a lot of variance in the data (R-squared). We do observe that many parameters are statistically different from 0 except for those including the Pressure. From that, we conclude that the Pressure is not relevant for the strength of the bond.
2.2.1 Goodness of fit / Lack of fit
Let’s further check the quality of our model.
Code
tibble(actual_bond = ccd_df$Bond,predicted_bond =predict(mod_bond)) |>ggplot(aes(x = actual_bond, y = predicted_bond)) +geom_abline(slope =1, intercept =0, color ="black") +geom_smooth(col ="black", method ="lm", formula ='y ~ x') +geom_point(col ="steelblue1") +xlab("Observed bond") +ylab("Predicted bond")
Predicted values are very close to obseved values, which is good.
Code
center_repetitions <- ccd_df |>group_by(Temperature, Time, Pressure) |>mutate(mean_bond =mean(Bond), residuals = Bond - mean_bond, n =n()) |>filter(n >1) |>ungroup()tibble(residuals = mod_bond$residuals,predicted_bond =predict(mod_bond)) |>ggplot(aes(x = predicted_bond, y = residuals)) +geom_hline(yintercept =0, color ="black") +geom_hline(data = center_repetitions, aes(yintercept = residuals), col ="tomato", linetype =2) +geom_smooth(col ="black", method ="lm", formula ='y ~ x') +geom_point(col ="steelblue1") +xlab("Predicted bond") +ylab("Residuals")
We see that the residuals are relatively well distributed around 0, which is good. We also see that they are relatively small compared to the residuals at the center point (= the difference between the observed values at the center point - the mean of these values) shown by the horizontal red lines.
From this, it does not look like there is lack of fit, but let’s formally test for it.
Since we saw that the pressure was not very relevant for the bond strength, we can predict the bond for a range of values of the temperature and time, keeping the pressure constant at the center point, and visualize how the bond strength varies with these two factors.
bond_wide <- newdata |>select(Temperature, Time, bond) |>pivot_wider(names_from = Time, values_from = bond) |>select(-Temperature) |>as.matrix()plot_ly(x =unique(newdata$Temperature), y =unique(newdata$Time), z = bond_wide) |>add_surface() |>layout(scene =list(xaxis =list(title ="Temperature"),yaxis =list(title ="Time"),zaxis =list(title ="Bond") ) )
From these plots, we see that there are ranges of coupled values of Temperature & Time that give us the optimal bond.
Let’s now repeat these analyses but with the response variable being the print.
2.3 Response = Print
Code
print_formula <-"Print ~ "|>str_c(x_formula)mod_print <-lm(print_formula, data = ccd_df)mod_print |>summary()
Call:
lm(formula = print_formula, data = ccd_df)
Residuals:
Min 1Q Median 3Q Max
-0.18594 -0.08126 -0.01526 0.08074 0.18806
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.831e+00 2.363e+00 -3.314 0.01288 *
Temperature 1.574e-01 3.274e-02 4.807 0.00195 **
Time 1.041e+00 4.314e-01 2.412 0.04665 *
Pressure 1.122e-02 9.748e-03 1.151 0.28735
I(Temperature^2) -5.805e-04 1.080e-04 -5.375 0.00104 **
I(Time^2) -5.092e-01 1.200e-01 -4.243 0.00383 **
I(Pressure^2) -2.499e-05 3.888e-05 -0.643 0.54097
Temperature:Time -3.843e-03 2.084e-03 -1.844 0.10768
Temperature:Pressure 1.083e-05 3.750e-05 0.289 0.78106
Time:Pressure -1.389e-04 1.250e-03 -0.111 0.91466
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1591 on 7 degrees of freedom
Multiple R-squared: 0.9852, Adjusted R-squared: 0.9661
F-statistic: 51.66 on 9 and 7 DF, p-value: 1.422e-05
Here too, we observe that the model is significant (very small p-value) and explains a lot of variance in the data (R-squared). We observe that many parameters are statistically different from 0 except for those including the Pressure and the interaction. From that, we conclude that the Pressure is also not relevant for the visual quality of the print - lucky us!
2.3.1 Goodness of fit / Lack of fit
Let’s further check the quality of our model.
Code
tibble(actual_print = ccd_df$Print,predicted_print =predict(mod_print)) |>ggplot(aes(x = actual_print, y = predicted_print)) +geom_abline(slope =1, intercept =0, color ="black") +geom_smooth(col ="black", method ="lm", formula ='y ~ x') +geom_point(col ="steelblue1") +xlab("Observed print") +ylab("Predicted print")
Predicted values are very close to obseved values, which is good.
Code
center_repetitions <- ccd_df |>group_by(Temperature, Time, Pressure) |>mutate(mean_print =mean(Print), residuals = Print - mean_print, n =n()) |>filter(n >1) |>ungroup()tibble(residuals = mod_print$residuals,predicted_print =predict(mod_print)) |>ggplot(aes(x = predicted_print, y = residuals)) +geom_hline(yintercept =0, color ="black") +geom_hline(data = center_repetitions, aes(yintercept = residuals), col ="tomato", linetype =2) +geom_smooth(col ="black", method ="lm", formula ='y ~ x') +geom_point(col ="steelblue1") +xlab("Predicted print") +ylab("Residuals")
We see that the residuals are relatively well distributed around 0, which is good. We also see that they are of the same range than the residuals at the center point (= the difference between the observed values at the center point - the mean of these values) shown by the horizontal red lines.
From this, it does not look like there is lack of fit, but let’s formally test for it.
Since we saw that the pressure was not very relevant for the print quality, we can do the same as we did for the bond and predict the print for a range of values of the temperature and time, keeping the pressure constant at the center point, and visualize how the print quality varies with these two factors.
Remember that for the print, the higher the better.
Code
newdata <- newdata |>mutate(print =predict(mod_print, newdata = newdata) )newdata |>ggplot(aes(x = Temperature, y = Time)) +geom_tile(aes(fill = print)) +scale_fill_gradient(low ="tomato", high ="steelblue1") +geom_contour(aes(z = print), color ="white", binwidth =0.25)
From these plots, we see that there is one point at the optimum, which is obtained for the following values of the factors:
Code
newdata |>filter(print ==max(print)) |>pander()
Temperature
Time
Pressure
bond
print
135.3
0.4909
100
61.59
3.983
2.4 Combining results
We can now combine the results and see what are the optimal values for the factors that give us the best bond and print quality.
Code
g <- newdata |>ggplot(aes(x = Temperature, y = Time)) +geom_contour_filled(aes(z = bond), binwidth =5) +geom_contour(aes(z = print), color ="white", binwidth =0.1) +geom_point(data = newdata |>filter(print ==max(print)), col ="white", size =3, shape =4)g
From the plot above, we see that the optimum for the print conflicts with the acceptable range for the bond. We will thus pick the values that keep the bond within the acceptable range and maximize the print quality.
We define an acceptable range for the bond as 85 \(\pm \ \delta\).
Rows: 3 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
dbl (6): Trial_number, Temperature, Time, Pressure, Bond, Print
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.