Data Mining Builds Process Understanding for Vaccine Manufacturing


WCBP 2009 Current Topics in Vaccine Development
January 14, 2009
Julia O'Neill, Principal Engineer
Merck & Co., Inc., Global Vaccine Technology & Engineering

Merck develops and applies the most powerful data mining techniques to untangle the complexities of manufacturing biologic products.

An Example Manufacturing History of a Vaccine Bulk
[Figure: Bulk Potency by Lot Sequence]
The inherent variability of biologics manufacturing presents challenges to developing process understanding.

Traditional Approach to Building Process Understanding: Examine One Change at a Time
[Figure: Bulk Potency by Lot Sequence, annotated with markers 1 through 6]
1. Identify potency shifts.
2. Identify process changes.
3. Match timing of shifts to changes.
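The three steps of the traditional approach can be sketched in a few lines. This is only an illustration under stated assumptions: the potency values, the change log, and the helper names `find_shift` and `nearest_change` are hypothetical and do not come from the presentation, and the least-squares change-point search is one simple way to locate a shift, not necessarily the method used at Merck.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations of the values from their own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def find_shift(potency):
    """Step 1: return the lot index that best splits the series into two levels."""
    best_index, best_score = None, float("inf")
    for i in range(2, len(potency) - 1):  # keep at least 2 lots in each segment
        score = sse(potency[:i]) + sse(potency[i:])
        if score < best_score:
            best_index, best_score = i, score
    return best_index


def nearest_change(shift_index, changes):
    """Step 3: return the logged change closest in lot sequence to the shift."""
    return min(changes, key=lambda name: abs(changes[name] - shift_index))


# Hypothetical bulk potency by lot sequence (illustrative numbers only).
potency = [100, 102, 99, 101, 100, 110, 111, 109, 112, 110]
# Step 2: hypothetical process changes, keyed by the first lot affected.
changes = {"new cell bank lot": 5, "resin lot replaced": 8}

shift = find_shift(potency)
print(shift, nearest_change(shift, changes))  # prints: 5 new cell bank lot
```

Matching one shift to one change at a time works when changes are few and well separated. It becomes unreliable when several changes overlap in time.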

Example Vaccine Manufacturing Process
Simplified schematic of a viral vaccine manufacturing process (bioreactors, then downstream):
1. Cell growth: ~3 weeks
2. Cell growth and virus propagation: ~4 weeks
3. Purification, inactivation, etc.: ~2 weeks
4. Assay to determine bulk potency
5. Dilution to appropriate strength in vials

Biologics mantra: the product is the process.*
[Figure: process schematic (Bioreactors, Downstream) annotated with changes over time]
- Cell bank lot exhausted; new lot introduced.
- Virus stock seed lot exhausted; new lot introduced.
- Chromatography resin lots exhausted and replaced.
- Raw material preparation methods improved by vendor.
- Improved assay implemented.
A fixed process does not guarantee a fixed product.
* Building on Steven Kozlowski's Monday talk.

New Approach to Building Process Understanding: Apply Multivariate Data Mining
X's; Y = Potency
Investment in creation of electronic database: 900+ X variables
- Raw material lots
- Bioreactor monitored variables
- Time to conduct process steps
- Known changes
- etc.

Tree-Based Predictors
X's (900+ X variables): raw material lots, bioreactor monitored variables, time to conduct process steps, known changes, etc.
Y = Potency
A tree is grown by sequentially splitting Potency on additional input variables.
[Figure: example split separating lots a, b, c from lots d, e, f]
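A minimal sketch of a single tree split, assuming the common regression-tree criterion of minimizing the squared error of potency within each branch. The variable names (`glucose_day4`, `hold_time_h`), the data, and the function `best_split` are hypothetical. A full tree would repeat the same search inside each branch.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty branch)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, target):
    """Return (variable, threshold, error) for the split that leaves the least squared error."""
    best = None
    for var in rows[0]:
        # Try every observed value except the largest as a "<= threshold" cut.
        for threshold in sorted({r[var] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, target) if r[var] <= threshold]
            right = [y for r, y in zip(rows, target) if r[var] > threshold]
            error = sse(left) + sse(right)
            if best is None or error < best[2]:
                best = (var, threshold, error)
    return best


# Hypothetical lots: two input variables and the measured bulk potency.
rows = [
    {"glucose_day4": 60, "hold_time_h": 5},
    {"glucose_day4": 65, "hold_time_h": 9},
    {"glucose_day4": 70, "hold_time_h": 6},
    {"glucose_day4": 90, "hold_time_h": 8},
    {"glucose_day4": 95, "hold_time_h": 5},
    {"glucose_day4": 100, "hold_time_h": 7},
]
potency = [100, 101, 99, 110, 111, 109]

variable, threshold, error = best_split(rows, potency)
print(variable, threshold)  # prints: glucose_day4 70
```

Here the first three lots fall in one branch and the last three in the other, which mirrors the "lots a, b, c" versus "lots d, e, f" picture on the slide.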

Acknowledgements
Collaboration across many functional areas within Merck:
- Applied Computer Science & Mathematics
- Bioprocess & Bioanalytical Research & Development
- Fermentation & Cell Culture
- Global Vaccine Technology & Engineering
- Merck Lean Six Sigma
- Process Analytical Technology
- Regulatory & Analytical Sciences
- Vaccine Manufacturing Operations
External statistical consultant: Jim Lucas

Random Forests
A collection of trees with controlled variations. Trees vote for the best predictors.
Advantages:
- Consistently matches or outperforms accuracy of other data mining methods.
- Handles a large number of inputs, resistant to over-fitting.
- Robust to outliers.
- Very fast.
- Not confounded by confounding.
- Estimates the importance of variables as predictors of the output.
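The idea of trees with controlled variations voting for predictors can be illustrated with a deliberately simplified sketch. Each "tree" here is a single split fitted to a bootstrap sample of lots, using a random subset of the input variables, and it casts one vote for the variable it chose. This is not a full random forest: real implementations grow deep trees and usually measure importance by permutation or impurity decrease. All data, variable names, and function names below are hypothetical.

```python
import random
from collections import Counter
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty branch)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_variable(rows, target, candidates):
    """Return the candidate variable whose best single split leaves the least squared error."""
    best_var, best_error = None, float("inf")
    for var in candidates:
        for threshold in sorted({r[var] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, target) if r[var] <= threshold]
            right = [y for r, y in zip(rows, target) if r[var] > threshold]
            error = sse(left) + sse(right)
            if error < best_error:
                best_var, best_error = var, error
    return best_var


def vote_for_variables(rows, target, n_trees=200, n_candidates=2, seed=0):
    """Count how often each variable is chosen across randomized one-split trees."""
    rng = random.Random(seed)
    variables = list(rows[0])
    votes = Counter()
    for _ in range(n_trees):
        picks = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample of lots
        sample_rows = [rows[i] for i in picks]
        sample_y = [target[i] for i in picks]
        candidates = rng.sample(variables, n_candidates)  # random variable subset
        winner = best_variable(sample_rows, sample_y, candidates)
        if winner is not None:
            votes[winner] += 1
    return votes


# Hypothetical lots: only glucose_day4 is related to potency in this toy data.
glucose = [60, 62, 65, 68, 70, 72, 90, 92, 95, 97, 100, 102]
do_day1 = [40, 55, 45, 50, 42, 58, 52, 41, 57, 44, 49, 53]
temp_day2 = [36.8, 37.1, 37.0, 36.9, 37.2, 37.0, 37.1, 36.9, 37.0, 37.2, 36.8, 37.1]
rows = [
    {"glucose_day4": g, "do_day1": d, "temp_day2": t}
    for g, d, t in zip(glucose, do_day1, temp_day2)
]
potency = [100, 101, 99, 100, 102, 98, 110, 111, 109, 110, 112, 108]

votes = vote_for_variables(rows, potency)
print(votes.most_common())  # glucose_day4 collects the most votes
```

Restricting each tree to a random subset of variables is what gives the weaker predictors a chance to be selected at all, so the vote counts rank every variable rather than naming only the single best one.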

Variable Importance for Bulk Potency by Random Forests
[Figure: bar chart of importance by input variable. Labels in extracted order: process change 1; raw material change; Day 4 Glucose; DS raw material change 1; Day 1 DO input 1; CE Split II variable 1; CE Split II variable 2; Day 3 pH; Day 2 Lactate; CE Split I variable 1; timing variable; GUR; raw material prep; CE Split III variable 3; CE Split I variable 1; CE Split III variable 4; CE Split III variable 5; CE Cell Bank lot change; CE Split II variable 4; CE Split III variable 1; Day 5 DO; CE Split II variable 5; Day 8 DO variable 6; CE Split III input variable; Variable 7; Day 2 temperature]
Important variables were suspected in advance of random forests analysis.
Only 1 variable is Downstream; all others are Bioreactor or Cell Expansion.

Simple Regression Model
[Figure: predictions based on the 1st, 2nd, and 4th variables on the list]
Although a large percentage of the variation is explained overall, the predictions are not satisfactory for recent production.

Raw Material Lot Change Timing
[Figure: timeline in weeks of bulk product lots passing through Growth, Propagation, and Purification, with the introduction of new raw material lots marked]
Raw material changes may have a creeping impact.

Bioreactor: subtle shifts in Glucose
[Figure: three panels plotted against Lot # (lots 264 through 539): Day 3 Glucose (mg/dl), Day 4 Glucose (mg/dl), and Day 6 Glucose (mg/dl)]

Partial Least Squares model improves predictions
[Figure: two sets of predictions compared]
- Predictions based on the 1st, 2nd, and 4th suspect variables alone.
- Partial Least Squares predictions incorporating all bioreactor monitored variables.
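As a rough illustration of how Partial Least Squares combines many correlated bioreactor readings into one predictor, the sketch below fits a one-component PLS model by hand. It assumes standardized inputs and a single latent component. The readings, variable names, and the function `pls1_one_component` are hypothetical, and the model in the presentation presumably used far more variables and possibly more components.

```python
from statistics import mean, stdev


def standardize(column):
    """Center a column and scale it to unit standard deviation."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]


def pls1_one_component(columns, y):
    """Fit a one-component PLS model; return (fitted values, variable weights)."""
    X = [standardize(c) for c in columns]
    y_mean = mean(y)
    yc = [v - y_mean for v in y]
    # Each variable is weighted by its covariance with the centered response.
    w = [sum(x * r for x, r in zip(col, yc)) for col in X]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Score t: one latent value per lot, a weighted sum of all inputs.
    t = [sum(w[j] * X[j][i] for j in range(len(X))) for i in range(len(y))]
    # Regress the centered response on the single score.
    q = sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t)
    return [y_mean + q * a for a in t], w


# Hypothetical bioreactor readings for eight lots (illustrative numbers only).
columns = {
    "glucose_day3": [60, 64, 70, 75, 82, 88, 93, 99],
    "lactate_day2": [20, 19, 18, 18, 16, 15, 15, 13],
    "do_day1": [50, 48, 52, 49, 51, 50, 47, 53],
}
potency = [100, 101, 103, 104, 106, 108, 109, 111]

fitted, weights = pls1_one_component(list(columns.values()), potency)
residual = sum((f - y) ** 2 for f, y in zip(fitted, potency))
total = sum((y - mean(potency)) ** 2 for y in potency)
print(f"share of potency variation explained: {1 - residual / total:.2f}")
```

Because the weights are shared across all inputs, highly correlated readings such as glucose and lactate reinforce one another instead of competing, which is the usual reason to prefer PLS over ordinary regression on a few hand-picked variables.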

Causes of Bulk Potency Changes
[Figure: process schematic (Bioreactors, Downstream)]
- Higher output from bioreactors due to known raw material and process changes.
- Yield shifts related to variation across raw material lots.
- Contributing factor: bioreactor performance cycling.
- Newly discovered pre-existing variability (Kozlowski).

Results
Merck develops and applies the most powerful data mining techniques to untangle the complexities of manufacturing biologic products.
Additional benefits:
Ability to predict potency before assay results are available.
  - Monitor against a forecast potency.
  - Builds our understanding of the biology.
Basis for revising CPPs.
  - Developing new control strategies.