Data Mining Builds Process Understanding for Vaccine Manufacturing
WCBP 2009
Current Topics in Vaccine Development
January 14, 2009
Julia O'Neill, Principal Engineer
Merck & Co., Inc.
Global Vaccine Technology & Engineering

Merck develops and applies the most powerful data mining techniques to untangle the complexities of manufacturing biologic products.

An Example: Manufacturing History of a Vaccine Bulk
[Figure: Bulk Potency by Lot Sequence; y-axis: Bulk Potency]
The inherent variability of biologics manufacturing presents challenges to developing process understanding.

Traditional Approach to Building Process Understanding: Examine One Change at a Time
[Figure: Bulk Potency by Lot Sequence, annotated with markers 1 through 6]
1. Identify potency shifts.
2. Identify process changes.
3. Match timing of shifts to changes.

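The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the potency values, the change log, and the function names `largest_mean_shift` and `changes_near` are hypothetical and do not come from the presentation, and a simple before/after difference in means stands in for whatever shift detection was actually used.

```python
from statistics import mean

def largest_mean_shift(potency, min_run=3):
    """Step 1: find the lot index where the mean potency of the lots from
    that index onward differs most from the mean of the earlier lots."""
    best_index, best_shift = None, 0.0
    for i in range(min_run, len(potency) - min_run + 1):
        shift = mean(potency[i:]) - mean(potency[:i])
        if abs(shift) > abs(best_shift):
            best_index, best_shift = i, shift
    return best_index, best_shift

def changes_near(shift_index, change_log, window=2):
    """Step 3: list recorded changes within `window` lots of the shift."""
    return [name for lot, name in change_log if abs(lot - shift_index) <= window]

# Hypothetical bulk potency by lot sequence (arbitrary units).
potency = [100, 102, 99, 101, 100, 110, 111, 109, 112, 110]
# Step 2: hypothetical log of known process changes as (lot index, description).
change_log = [(2, "new resin lot"), (5, "new raw material lot")]

index, shift = largest_mean_shift(potency)
candidates = changes_near(index, change_log, window=1)
```

With these made-up numbers the largest shift falls at lot index 5 and lines up with the raw material change. The approach examines one change at a time, so it becomes hard to apply when several changes overlap in time.
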
Example Vaccine Manufacturing Process
Stages shown: Bioreactors, Downstream
- Cell Growth (~3 weeks)
- Cell Growth and Virus Propagation (~4 weeks)
- Purification, Inactivation, etc. (~2 weeks)
- Assay to determine bulk potency
- Dilution to appropriate strength in vials
Simplified schematic of a viral vaccine manufacturing process.

Biologics mantra: the product is the process.*
Stages shown: Bioreactors, Downstream
- Cell Bank lot exhausted; new lot introduced.
- Virus Stock Seed lot exhausted; new lot introduced.
- Chromatography resin lots exhausted and replaced.
- Raw material preparation methods improved by vendor.
- Improved assay implemented.
A fixed process does not guarantee a fixed product.
* Building on Steven Kozlowski's Monday talk.

New Approach to Building Process Understanding: Apply Multivariate Data Mining
X's: inputs; Y = Potency
Investment in creation of electronic database: 900+ X variables
- Raw material lots
- Bioreactor monitored variables
- Time to conduct process steps
- Known changes
- etc.

Tree-Based Predictors
X's (900+ X variables): raw material lots, bioreactor monitored variables, time to conduct process steps, known changes, etc.
Y = Potency
Tree is grown by sequentially splitting Potency on additional input variables.
[Figure: example tree with branches labeled "Lots a,b,c" and "Lots d,e,f"]

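A single splitting step of a regression tree can be sketched as follows. It assumes the common criterion of choosing the variable and threshold that most reduce the sum of squared errors in potency; the presentation does not state which criterion was used. The six lots, the variable names `raw_material_lot` and `day4_glucose`, and all values are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, potency):
    """Search every variable and candidate threshold for the split that
    most reduces the SSE of potency.

    rows: list of dicts mapping variable name -> numeric value for one lot.
    Returns (variable, threshold, sse_reduction)."""
    parent = sse(potency)
    best = (None, None, 0.0)
    for var in rows[0]:
        levels = sorted({r[var] for r in rows})
        # Candidate thresholds are midpoints between adjacent observed values.
        for lo, hi in zip(levels, levels[1:]):
            threshold = (lo + hi) / 2
            left = [y for r, y in zip(rows, potency) if r[var] <= threshold]
            right = [y for r, y in zip(rows, potency) if r[var] > threshold]
            gain = parent - sse(left) - sse(right)
            if gain > best[2]:
                best = (var, threshold, gain)
    return best

# Hypothetical data for lots a through f.
rows = [
    {"raw_material_lot": 1, "day4_glucose": 80},
    {"raw_material_lot": 1, "day4_glucose": 95},
    {"raw_material_lot": 1, "day4_glucose": 88},
    {"raw_material_lot": 2, "day4_glucose": 84},
    {"raw_material_lot": 2, "day4_glucose": 92},
    {"raw_material_lot": 2, "day4_glucose": 86},
]
potency = [100, 101, 99, 110, 111, 109]

variable, threshold, gain = best_split(rows, potency)
```

Here the split on the raw material lot separates lots a, b, c from lots d, e, f. A full tree would repeat the same search within each resulting group of lots.
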
Acknowledgements
Collaboration across many functional areas within Merck:
- Applied Computer Science & Mathematics
- Bioprocess & Bioanalytical Research & Development
- Fermentation & Cell Culture
- Global Vaccine Technology & Engineering
- Merck Lean Six Sigma
- Process Analytical Technology
- Regulatory & Analytical Sciences
- Vaccine Manufacturing Operations
External statistical consultant: Jim Lucas

Random Forests
A collection of trees with controlled variations. Trees vote for the best predictors.
Advantages:
- Consistently matches or outperforms accuracy of other data mining methods.
- Handles a large number of inputs, resistant to over-fitting.
- Robust to outliers.
- Very fast.
- Not confounded by confounding.
- Estimates the importance of variables as predictors of the output.

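The idea of "a collection of trees with controlled variations" that estimates variable importance can be illustrated with a deliberately reduced sketch. It assumes two sources of variation (a bootstrap sample of lots and a random subset of input variables per tree) and uses one-split trees (stumps) instead of the fully grown trees of a real random forest. The function names and the synthetic data are hypothetical; this is not the analysis from the presentation.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_stump(X, y, variables):
    """Best single split among `variables`; returns (variable index, SSE reduction)."""
    parent = sse(y)
    best_var, best_gain = None, 0.0
    for j in variables:
        levels = sorted({row[j] for row in X})
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            gain = parent - sse(left) - sse(right)
            if gain > best_gain:
                best_var, best_gain = j, gain
    return best_var, best_gain

def stump_forest_importance(X, y, n_trees=200, n_try=2, seed=0):
    """Accumulate SSE reduction per variable over many randomized stumps
    and return the totals normalized to sum to 1."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    importance = [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample of lots
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        variables = rng.sample(range(p), n_try)      # random subset of inputs
        var, gain = best_stump(Xb, yb, variables)
        if var is not None:
            importance[var] += gain
    total = sum(importance)
    return [v / total for v in importance] if total else importance

# Synthetic data (assumption): variable 0 is a two-level process change that
# shifts potency by 10 units; variables 1 to 3 are pure noise.
_rng = random.Random(1)
X = [[float(i % 2), _rng.random(), _rng.random(), _rng.random()] for i in range(40)]
y = [100 + 10 * row[0] + _rng.gauss(0, 1) for row in X]

importance = stump_forest_importance(X, y)
```

In this synthetic case variable 0 receives the largest share of the importance. For real work an established implementation would be the more sensible choice, for example `RandomForestRegressor` in scikit-learn, which exposes `feature_importances_` after fitting.
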
Variable Importance for Bulk Potency by Random Forests
[Figure: input variables ranked by importance. Labels in extracted order: process change 1 raw material change Day 4 Glucose DS raw material change 1 Day 1 DO input 1 CE Split II variable 1 CE Split II variable 2 Day 3 pH Day 2 Lactate CE Split I variable 1 timing variable GUR raw material prep CE Split III variable 3 CE Split I variable 1 CE Split III variable 4 CE Split III variable 5 CE Cell Bank lot change CE Split II variable 4 CE Split III variable 1 Day 5 DO CE Split II variable 5 Day 8 DO variable 6 CE Split III input variable Variable 7 Day 2 temperature. Axis label: input variables]
Important variables were suspected in advance of random forests analysis.
Only 1 variable is Downstream; all others are Bioreactor or Cell Expansion.

Simple Regression Model: predictions based on the 1st, 2nd, and 4th variables on the list
Although a large percentage of the variation is explained overall, the predictions are not satisfactory for recent production.

Raw Material Lot Change Timing
[Figure: timeline in weeks of bulk product lots moving through Growth, Propagation, and Purification, with two points marked "New raw material lot"]
Raw material changes may have a creeping impact.

Bioreactor: subtle shifts in Glucose
[Figure: glucose plotted against Lot # (lots 264 to 539). Panel or series labels: Day 3 Glucose (mg/dl) (1); Day 4 Glucose (mg/dl) (2); Day 6 Glucose (mg/dl) (2); "of Bioreactors (2)"]

Partial Least Squares model improves predictions
- Predictions based on 1st, 2nd, and 4th suspect variables alone.
- Partial Least Squares predictions incorporating all bioreactor monitored variables.

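Partial least squares suits monitored variables that move together, because it regresses potency on a few latent scores rather than on each variable separately. The sketch below fits a one-component PLS1 model in plain Python. The function name `pls1_one_component` and the data are hypothetical, and a single component is an assumption made for brevity; the presentation does not say how many components were used.

```python
from math import sqrt

def pls1_one_component(X, y):
    """Fit a one-component PLS1 model and return fitted values for the training lots."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center inputs and response.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: the direction in X-space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: one latent value per lot.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centered response on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return [y_mean + q * ti for ti in t]

# Hypothetical lots: two perfectly correlated monitored variables
# (the second is twice the first) and a potency that depends on them.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [5.0, 10.0]]
y = [103.0, 106.0, 109.0, 112.0, 115.0]

fitted = pls1_one_component(X, y)
```

Ordinary least squares has no unique solution for these two collinear columns, while the single PLS component reproduces the potency values. In practice a library routine such as scikit-learn's `PLSRegression` would normally be used, with the number of components chosen by cross-validation.
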
Causes of Bulk Potency Changes
Stages shown: Bioreactors, Downstream
- Higher output from bioreactors due to known raw material and process changes.
- Yield shifts related to variation across raw material lots.
- Contributing factor: Bioreactor performance cycling.
- Newly discovered pre-existing variability (Kozlowski)

Results
Merck develops and applies the most powerful data mining techniques to untangle the complexities of manufacturing biologic products.
Additional benefits:
Ability to predict potency before assay results are available.
- Monitor against a forecast potency.
- Builds our understanding of the biology.
Basis for revising CPPs.
- Developing new control strategies.