Protein engineering for structural biology J. Michael Sauder, Ph.D. Lilly Biotechnology Center Eli Lilly and Company San Diego, CA 92121 michael.sauder@lilly.com PSDI 2012, Nov 12-13
Structural Biology at Lilly Impacting all therapeutic areas Informs chemistry and biotherapeutic design Focus on innovation Indianapolis, IN San Diego, CA Chicago, IL 2
Structure/Fragment guided design 3
Structures at Lilly Over 6000 structures determined at Lilly 4
Technology: Protein analysis; Bioinformatics; LIMS database; Molecular Biology (Cloning, Expression, Fermentation); Protein Purification: Crystallization, Assay; Protein Characterization: Mass Spec, LP/MS; MoA & Biophysical Methods: SPR, TDF; Protein Crystallization & X-ray Diffraction; Synchrotron Beamline: Data collection; Structure Solution & Refinement; SB/FB design 5
LRL-CAT at the APS Screen 300 crystals, or collect 100 datasets, per day. 6
LRL-CAT Automated Processes: showering of crystal with liquid N2 to remove surface ice; positioning in X-ray beam; crystal quality evaluation; selection of best crystal from a group; data collection; data processing; quality control; data transmission. (Figure labels: LN2 shower, cryo-stream, crystal) 7
Data management. Challenges: integration is essential (data management, process flow, material hand-offs, project management, project security, tracking historical results); minimal downtime, complete data recovery. >100 active users across multiple divisions and sites; 20,000 clones tracked; 18,000+ protein purifications completed; >5 million initial crystallization conditions screened; 90,000+ crystals harvested and screened; 33,000+ X-ray datasets collected; 6,000+ protein structures 8
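The counts on this slide can be read as a rough pipeline funnel. The sketch below only divides consecutive totals. It assumes the quoted figures are cumulative lower bounds (they carry ">" or "+") and that each stage feeds the next, which the slide does not state explicitly, so the ratios are order-of-magnitude indications only.

```python
# Pipeline totals as quoted on the slide (treated here as approximate lower bounds).
counts = {
    "crystallization conditions screened": 5_000_000,
    "crystals harvested and screened": 90_000,
    "X-ray datasets collected": 33_000,
    "protein structures": 6_000,
}

def stage_ratios(counts):
    """Return (from_stage, to_stage, ratio) for each consecutive pair of stages."""
    names = list(counts)  # dicts preserve insertion order
    return [(a, b, counts[b] / counts[a]) for a, b in zip(names, names[1:])]

for a, b, ratio in stage_ratios(counts):
    print(f"{b} per {a}: {ratio:.4f}")
```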
SB LIMS Experimental data captured includes: DNA sequences, primers, mutations Vectors, expression systems, cell types Expression/solubility results Fermentation conditions Purification steps, results, gel images, chromatograms, etc. MALDI & ESI-MS spectra, AnSEC, proteolysis data, etc. Crystallization conditions, well images, well scores, etc. Tight integration with X-ray data collection at LRL-CAT (APS, Chicago) Structure coordinates and statistics Project management, data mining 9
SB LIMS Data integration Contextual search Dynamic, customized reports 10
Construct design strategies Boundary optimization / termini truncation Mutations Deletions Fusions / insertions Binding partners N C 11
Engineering and Crystal optimization (figure: constructs 8343a11KW, 8343a33SP, 8343a22KW) 12
Protein Family Analysis. Software tools: conservation, domain prediction, secondary structure, disorder, etc. Family analysis, e.g., Kinases: identify functional residues, domain boundaries, activation loop; activation loop length is 25-33 aa for 90% of kinome; identify residues for deletion. E.g., Nuclear hormone receptors: Pfam domain hormone_rec is incomplete due to low homology in helix 1; identify helix 1 and start of ligand binding domain. (Figure labels: DFG, APE, 123) 13
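The activation-loop measurement mentioned for kinases can be sketched as a motif search. This is a minimal illustration and not the Lilly tooling: it assumes the loop is delimited by exact DFG and APE motifs and counts both motifs in the length, whereas the slide does not say how the 25-33 aa range was measured. Real kinases also carry motif variants that an exact string match would miss, and the example sequence is an illustrative fragment only.

```python
def activation_loop(seq, start_motif="DFG", end_motif="APE"):
    """Locate the segment running from start_motif through end_motif.

    Returns (start, end, length) with 0-based start and exclusive end,
    or None if either motif is missing. Length includes both motifs
    (an assumption made for this sketch).
    """
    i = seq.find(start_motif)
    if i < 0:
        return None
    j = seq.find(end_motif, i + len(start_motif))
    if j < 0:
        return None
    end = j + len(end_motif)
    return i, end, end - i

def is_unusual(length, low=25, high=33):
    """Flag loops outside the 25-33 aa range quoted on the slide."""
    return not (low <= length <= high)

fragment = "QVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGY"  # illustrative fragment
hit = activation_loop(fragment)
if hit is not None:
    start, end, length = hit
    print(fragment[start:end], length, "unusual" if is_unusual(length) else "typical")
```

A loop flagged as unusual by such a check would be a candidate for the deletion analysis described on the slide.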
Kinome analysis. Figure: gatekeeper residue and specificity-determining residue by kinase subgroup (labels: TK, TKL, STE, CMGC, CK1, AGC, CAMK) 14
Recent Structure Statistics. To obtain first structure (pie chart; categories: truncation, mutation, deletion/fusion, none / full length; slice values as extracted: 6.6%, 34%, 26%, 1.6%, 2.4%, 4%, 2.4%, 23%). Almost 45% of targets required N- or C-terminal truncations. Over one third required mutations or the use of an ortholog. 10% required deletion of a flexible region or fusion with an interaction partner or crystallization chaperone. One quarter required no modification (full-length protein). 15
Boundary selection Domains Target domain Domain prediction Inter-domain and intermolecular interactions Residue conservation PDB, Pfam, etc. Secondary structure Disorder prediction Number of residues in domain S. Brewerton 16
Domain size S. Brewerton (2004) 17
Domain continuity Discontinuous Continuous S. Brewerton (2004) 18
Mutagenesis. Cysteine modification: reduce oxidation or aggregation / misfolding; create new disulfide bond (stabilization, oligomerization). Surface Entropy Reduction (Derewenda et al.): patches of Lys or Glu to Ala or Tyr. Shrink hydrophobic patches. Loop stabilization (Pro). Reduce phosphorylation heterogeneity (S/T, Y to A, F). Phosphomimicry (S/T/Y to D/E). Improve / disrupt inter-molecular interactions. Chimeras. 19
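The "patches of Lys or Glu to Ala" idea can be sketched as a sliding-window scan over the sequence. This is a simplified illustration, not the published Surface Entropy Reduction procedure: the window size and hit threshold are hypothetical defaults, and a real design would also check that the residues are surface-exposed, poorly conserved, and away from functional sites.

```python
def ser_candidates(seq, window=3, min_hits=2, targets="KE"):
    """List sequence windows rich in Lys/Glu with proposed Ala substitutions.

    Returns tuples of (1-based window start, window segment, mutation labels).
    Window size and threshold are illustrative assumptions.
    """
    results = []
    for i in range(len(seq) - window + 1):
        segment = seq[i:i + window]
        hits = [(i + k, aa) for k, aa in enumerate(segment) if aa in targets]
        if len(hits) >= min_hits:
            mutations = [f"{aa}{pos + 1}A" for pos, aa in hits]
            results.append((i + 1, segment, mutations))
    return results

for start, segment, mutations in ser_candidates("GSKEKLTA"):
    print(start, segment, ",".join(mutations))
```

Overlapping windows report the same residues more than once; merging them into a single patch is left out to keep the sketch short.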
Heterogeneity: Phosphorylation. Mutagenesis is not always necessary to control phosphorylation. (Figure labels: 8343a6KWg1h1, 8343a33SPt3p1; 0P, 1P, 2P; insect cells; E. coli with phosphatase) 20
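The 0P/1P/2P labels correspond to phosphorylation states that differ by one phosphate each, which shows up in intact-protein mass spectra as a ladder of roughly +80 Da steps. The sketch below builds that ladder and assigns an observed mass to the nearest state. The unmodified mass of 40,000 Da and the 2 Da tolerance are hypothetical values chosen for illustration; the per-phosphate increment is the average mass of HPO3.

```python
PHOSPHO_AVG_DA = 79.98  # average mass added per phosphate group (HPO3)

def phospho_series(unmodified_mass, max_sites):
    """Expected average masses for 0..max_sites phosphates, keyed '0P', '1P', ..."""
    return {
        f"{n}P": round(unmodified_mass + n * PHOSPHO_AVG_DA, 2)
        for n in range(max_sites + 1)
    }

def assign_state(observed_mass, unmodified_mass, max_sites, tol=2.0):
    """Return the label of the closest state within tol Da, else None."""
    series = phospho_series(unmodified_mass, max_sites)
    label, mass = min(series.items(), key=lambda kv: abs(kv[1] - observed_mass))
    return label if abs(mass - observed_mass) <= tol else None

# Hypothetical protein of 40,000 Da with up to two phosphates.
print(phospho_series(40000.0, 2))
print(assign_state(40080.5, 40000.0, 2))
```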
Deletions / Insertions Eliminate flexible, disordered loops Isolate soluble domain Remove membrane-interacting regions Co-expression, or express as polyprotein Fusion Insertion of binding partner or peptide Linker optimization 21
Inter-domain linker composition S. Brewerton (2004) 22
Buffer optimization Screen for stabilizing conditions Buffers pH Metals Inhibitors / activators / allosteric binders etc. 23
Software and automation Software to speed target analysis and design Sensitive sequence search Accurate multiple sequence alignments Highlight conservation Secondary structure prediction Disorder prediction Transmembrane helix / signal peptide prediction Combine the available data Annotate alignments 24
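One common way to "highlight conservation" in a multiple sequence alignment is a per-column Shannon entropy score. The sketch below is a generic illustration under that assumption and does not represent the software shown in the talk. It expects pre-aligned sequences of equal length, ignores gap characters, and scales the score so that 1.0 means a fully conserved column.

```python
import math
from collections import Counter

def column_conservation(alignment, gap="-"):
    """Per-column conservation in [0, 1], computed as 1 - H / log2(20).

    H is the Shannon entropy of the residues in the column; gaps are ignored.
    Columns containing only gaps score 0.0.
    """
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(width):
        residues = [row[col] for row in alignment if row[col] != gap]
        if not residues:
            scores.append(0.0)
            continue
        total = len(residues)
        entropy = -sum(
            (n / total) * math.log2(n / total) for n in Counter(residues).values()
        )
        scores.append(1.0 - entropy / math.log2(20))
    return scores

# Toy alignment: the first and last columns are invariant, the middle one varies.
print([round(s, 3) for s in column_conservation(["DFG", "DFG", "DLG"])])
```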
CDC7 Kinator™ 25
CDC7 model Second largest insertion in C-lobe, Preceding subdomain 11 Large insertion in the activation loop Possible locations of other insertions (after subdomain 10) 26
CDC7 Engineering: iterative engineering with experimental analysis. ΔA (activation loop is >150 residues longer than average); ΔN (N-terminal 20-35 residues predicted to be disordered); ΔC (unusual insertion in C-terminal lobe). 45 constructs (different boundaries, deletions, vectors); 574 → 413 → 330 residues. 107 purification experiments. Multiple rounds of limited proteolysis/mass spec to define boundaries of catalytic domain. Protein active, but less than full length CDC7+DBF4. 27
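Combining alternative boundaries with optional internal deletions quickly produces a construct panel like the one described here. The sketch below enumerates such a panel for a toy sequence. The function names, the naming scheme, and all residue numbers are hypothetical; the actual CDC7 boundaries and deleted ranges are not given on the slide.

```python
from itertools import product

def make_construct(seq, start, end, deletions=()):
    """Return residues start..end (1-based, inclusive) minus deleted ranges.

    deletions is an iterable of (first, last) ranges, 1-based and inclusive,
    expressed in full-length numbering.
    """
    dropped = set()
    for first, last in deletions:
        dropped.update(range(first, last + 1))
    return "".join(
        aa
        for pos, aa in enumerate(seq, start=1)
        if start <= pos <= end and pos not in dropped
    )

def enumerate_constructs(seq, starts, ends, deletion_sets):
    """Yield (name, sequence) for every boundary and deletion combination."""
    for start, end, dels in product(starts, ends, deletion_sets):
        name = f"{start}-{end}" + "".join(f"_d{a}-{b}" for a, b in dels)
        yield name, make_construct(seq, start, end, dels)

toy = "MKTAYIAKQR"  # toy 10-residue sequence, not CDC7
for name, construct in enumerate_constructs(toy, [1, 2], [10], [(), ((4, 5),)]):
    print(name, construct, len(construct))
```

Keeping deletions in full-length numbering means a construct name stays meaningful regardless of which termini were chosen.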
CDC7 crystallization: TDF to choose optimal stabilizing compound for co-lysis/purification/crystallization (evaluate internal inhibitors) and analytical GF to assess protein behavior of each construct. Tm increase of 15 °C (not highest ΔTm, but nicest profile). 780 crystallization trays set up (69,000 experiments). 115 crystals harvested for X-ray screening. Brute force molecular replacement. All engineered sites visible. 28
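Ranking stabilizing compounds by ΔTm requires a melting temperature for each curve. A simple estimate, assumed here for illustration, takes Tm as the midpoint of the temperature interval where the signal rises most steeply. The curves below are synthetic sigmoids with midpoints chosen to reproduce a 15 °C shift; they are not the CDC7 data, and real instruments typically fit a model rather than use raw finite differences.

```python
import math

def estimate_tm(temps, signal):
    """Estimate Tm as the midpoint of the interval with the steepest signal rise."""
    if len(temps) != len(signal) or len(temps) < 2:
        raise ValueError("need two equally long series with at least two points")
    best_slope, tm = None, None
    for k in range(len(temps) - 1):
        slope = (signal[k + 1] - signal[k]) / (temps[k + 1] - temps[k])
        if best_slope is None or slope > best_slope:
            best_slope, tm = slope, (temps[k] + temps[k + 1]) / 2
    return tm

def sigmoid_melt(temps, tm, width=2.0):
    """Synthetic two-state melting curve (fraction unfolded) for demonstration."""
    return [1.0 / (1.0 + math.exp((tm - t) / width)) for t in temps]

temps = list(range(25, 96))            # 25..95 degrees C in 1-degree steps
apo = sigmoid_melt(temps, tm=50.5)     # synthetic protein-only curve
bound = sigmoid_melt(temps, tm=65.5)   # synthetic protein-plus-compound curve
delta_tm = estimate_tm(temps, bound) - estimate_tm(temps, apo)
print(f"delta Tm = {delta_tm:.1f} C")
```

As the slide notes, the largest ΔTm is not the only criterion; the shape of the profile also informed the choice of compound.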
Multiple paths Lilly CDC7 structure CDC7+DBF4 structure by Hughes et al Engineered CDC7, DBF4 Co-expressed in E. coli Hughes et al, NSMB (2012) 19:1101 29
PRMT5: Attempt to work with a single (catalytic) domain, residues 275/290-637. Family member structures, secondary structure predictions, LP/MS, ... None of these approaches worked. Can produce lots of soluble protein, but: no activity in biochemical assay; no co-factor binding (SPR); no crystals. Need another approach. Full-length protein? Soluble, active. But... very low yields and aggregation issues. Antonysamy et al., PNAS (2012) 109:17960; 4GQB 30
Maximizing success Rational design Integrate family knowledge / literature Take advantage of all experimental data and other people's brains More shots on goal and better triage Think outside the box 31
Acknowledgements: 50 people in Lilly's structural biology group; CDC7 and PRMT5 teams; LRL-CAT. Use of the Advanced Photon Source at Argonne National Laboratory was supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under Contract DE-AC02-06CH11357. 32