LabGenius. Technical design notes. The world's most advanced synthetic DNA libraries. hi@labgeni.us V1.5 NOV 15

Introduction

OUR APPROACH

LabGenius is a gene synthesis company focussed on the design and manufacture of custom synthetic DNA libraries. We have developed a proprietary technology that, unlike traditional gene synthesis methods, bypasses the need for long sequence hybridisation. This novel approach enables us to construct large and complex libraries that are well beyond the reach of other commercial providers. By complementing this manufacturing capability with advanced sequence design algorithms, we provide researchers with the world's most advanced synthetic DNA library synthesis service.

POPULAR APPLICATIONS

Synthetic DNA libraries are commonly used for protein engineering, expression optimisation and the development of functional nucleic acids. Popular applications include:

Protein engineering
    Antibodies - affinity and specificity
    Solubility
    Enzymes - activity, specificity and enantioselectivity
    Thermostability

Expression optimisation
    Activity of regulatory elements (e.g. promoters, RBSs, transcription factors)
    Optimising codon usage

Functional nucleic acids
    Aptamers - affinity and specificity
    Ribozymes - activity and specificity

Design Guide

LabGenius offers combinatorial, scanning and rationally designed libraries. Please note that we are rapidly developing our technology, so even if your design does not comply with one of the stated design constraints, feel free to send it over and we will quickly evaluate whether it can be made.

COMBINATORIAL LIBRARY

Combinatorial libraries are used to simultaneously vary specific bases in a polynucleotide sequence.

Key features

Scale - we can provide combinatorial libraries that contain up to 10^13 unique sequences.

Diversity - in contrast to other companies, our combinatorial library construction process is PCR-free, which prevents diversity loss through sub-pool amplification. In addition, our approach allows us to guarantee that every variant within a large combinatorial library will be unique.

Flexibility - our combinatorial libraries are highly flexible. For example, it is possible to simultaneously vary up to 450 bases (i.e. 150 amino acids) within a DNA sequence.

Design parameters

1. A library design is abstracted into variable blocks (VB) and non-variable blocks (NVB).
2. A variable block must contain at least one IUPAC-defined variable base (R, Y, S, W, K, M, B, D, H, V, N) and may optionally include up to 149 non-variable bases (A, T, C, G). Variable blocks may be 1-150 bases in length.
3. A non-variable block is a contiguous subsequence composed of at least 30 non-variable bases (A, T, C, G).

4. The sequence must terminate with non-variable blocks. Variable blocks must be separated by non-variable blocks.
5. The GC content of non-variable blocks must be between 40% and 60%.

Fidelity - on average, sequence fidelity drops from 90% to 60% as the length and number of variable blocks increase. Note that, for example, a library of 5E12 variants with a fidelity of 50% still yields 2.5E12 correct variants.

Number of variable positions - the maximum number of variable nucleotide positions in a library design is determined by its total length, as shown in the graph below.

[Graph: max. no. of variable nucleotide positions vs. sequence length (bp)]

Design format

Combinatorial libraries can be designed at either the protein or DNA level.

Protein-level designs

If you are embarking on a protein engineering project, feel free to send us a protein-level design. We will then use our in-house algorithms to compile it down into a DNA-level design. An example of a protein-level design would be TRK[K, S, G, T]T (explained in the table below).

Codon                 1  2  3  4             5
Amino acid(s)         T  R  K  K, S, G, T    T
Protein-level design  T  R  K  [K, S, G, T]  T
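The five design parameters for combinatorial libraries, together with the scale and flexibility limits stated under key features, lend themselves to a mechanical pre-check before a design is sent over. The Python sketch below is an illustration only and is not LabGenius software. All function and constant names are hypothetical, and it makes three assumptions: any run of 30 or more A/C/G/T bases is treated as a non-variable block and everything between such runs as a variable block, "terminate" is read as applying to both ends of the sequence, and the 10^13 scale limit is compared against the theoretical diversity of the design.

```python
import re
from math import prod

# Number of bases each IUPAC code stands for; codes above 1 are "variable".
DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}
MIN_NVB_LENGTH = 30          # design parameter 3
MAX_VB_LENGTH = 150          # design parameter 2
GC_MIN, GC_MAX = 0.40, 0.60  # design parameter 5
MAX_VARIABLE_BASES = 450     # "Flexibility" key feature
MAX_LIBRARY_SIZE = 10**13    # "Scale" key feature (assumed to bound diversity)


def split_blocks(design):
    """Split a DNA-level design into ("NVB" | "VB", subsequence) tuples.

    Assumption: maximal runs of >= 30 A/C/G/T bases are non-variable
    blocks; whatever lies between or around them is a variable block.
    """
    design = design.upper()
    blocks, cursor = [], 0
    for match in re.finditer(r"[ACGT]{%d,}" % MIN_NVB_LENGTH, design):
        if match.start() > cursor:
            blocks.append(("VB", design[cursor:match.start()]))
        blocks.append(("NVB", match.group()))
        cursor = match.end()
    if cursor < len(design):
        blocks.append(("VB", design[cursor:]))
    return blocks


def gc_fraction(seq):
    """Fraction of G and C bases in a non-empty sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def theoretical_diversity(design):
    """Number of distinct sequences the design can encode."""
    return prod(DEGENERACY[base] for base in design.upper())


def validate_design(design):
    """Return a list of rule violations; an empty list means no issue found."""
    design = design.upper()
    unknown = set(design) - set(DEGENERACY)
    if not design or unknown:
        return [f"empty design or unsupported characters: {sorted(unknown)}"]
    problems = []
    blocks = split_blocks(design)
    if blocks[0][0] != "NVB" or blocks[-1][0] != "NVB":
        problems.append("design must start and end with a non-variable block")
    for number, (kind, seq) in enumerate(blocks, start=1):
        if kind == "VB" and all(DEGENERACY[base] == 1 for base in seq):
            problems.append(f"block {number}: constant stretch shorter than "
                            f"{MIN_NVB_LENGTH} bases")
        if kind == "VB" and len(seq) > MAX_VB_LENGTH:
            problems.append(f"block {number}: variable block longer than "
                            f"{MAX_VB_LENGTH} bases")
        if kind == "NVB" and not GC_MIN <= gc_fraction(seq) <= GC_MAX:
            problems.append(f"block {number}: GC content outside 40-60%")
    if sum(DEGENERACY[base] > 1 for base in design) > MAX_VARIABLE_BASES:
        problems.append(f"more than {MAX_VARIABLE_BASES} variable bases")
    if theoretical_diversity(design) > MAX_LIBRARY_SIZE:
        problems.append("theoretical diversity exceeds 10^13 sequences")
    return problems
```

For example, with a 32-base constant flank `flank = "ACGT" * 8`, the call `validate_design(flank + "NNK" + flank)` returns an empty list, while `validate_design(flank + "NNK")` reports that the design does not end with a non-variable block. Because short constant stretches next to variable bases are absorbed into the neighbouring block under the segmentation assumption, a design that fails here may still be worth sending over for evaluation.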

If you would like amino acids at variable positions to occur with defined frequencies, then this information can also be incorporated into a design (see table below). If you do not indicate your desired amino acid frequencies at variable positions, we will assume that you would like equal representation.

Codon                 1  2  3  4                   5
Amino acid(s)         T  R  K  K, S, G, T          T
Frequency             1  1  1  0.1, 0.2, 0.1, 0.6  1
Protein-level design  TRK[K:0.1, S:0.2, G:0.1, T:0.6]T

DNA-level designs

DNA-level designs can be represented using the IUPAC code. An example is shown below.

Base              1  2     3           4     5
Nucleotide(s)     A  A, C  A, T, C, G  A, T  G, C
DNA-level design  A  M     N           W     S

If you would like nucleotides at variable positions to occur at defined frequencies, then this information can also be incorporated into a design (see table below).

Base              1  2         3                   4         5
Nucleotide(s)     A  A, C      A, T, C, G          A, T      G, C
Frequency         1  0.5, 0.5  0.1, 0.2, 0.1, 0.6  0.1, 0.9  0.2, 0.8
DNA-level design  A[A:0.5, C:0.5][A:0.1, T:0.2, C:0.1, G:0.6][A:0.1, T:0.9][G:0.2, C:0.8]

If you have not yet designed your sequence, we can help with our in-house library design algorithms. Email us with high-level design specifications and we'll guide you through the process.
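To make the DNA-level notation concrete, the following Python sketch reads a design string into one frequency table per base position and computes the expected frequency of a single sequence in the library. It is an illustration under stated assumptions, not a description of LabGenius' in-house algorithms: the names `parse_design` and `sequence_probability` are hypothetical, positions are assumed to vary independently, and a bare IUPAC letter is given equal representation across the bases it stands for, mirroring the default described above for protein-level designs.

```python
import re
from math import prod

# Bases represented by each IUPAC nucleotide code.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# A token is either a bracketed frequency list or a single IUPAC letter.
TOKEN = re.compile(r"\[([^\]]*)\]|([A-Za-z])")


def parse_design(design):
    """Return one {base: frequency} dict per position of the design."""
    positions = []
    for bracket, letter in TOKEN.findall(design.replace(" ", "")):
        if letter:
            # Bare IUPAC code: assume equal representation of its bases.
            bases = IUPAC_BASES[letter.upper()]
            positions.append({base: 1 / len(bases) for base in bases})
        else:
            # Bracketed position such as [A:0.1,T:0.9].
            freqs = {}
            for item in bracket.split(","):
                base, _, value = item.partition(":")
                freqs[base.upper()] = float(value)
            if abs(sum(freqs.values()) - 1.0) > 1e-9:
                raise ValueError(f"frequencies in [{bracket}] do not sum to 1")
            positions.append(freqs)
    return positions


def sequence_probability(positions, sequence):
    """Expected frequency of one sequence, assuming independent positions."""
    if len(sequence) != len(positions):
        raise ValueError("sequence length does not match the design")
    return prod(pos.get(base, 0.0)
                for pos, base in zip(positions, sequence.upper()))
```

Applied to the weighted example in the table above, the sequence ACGTC has an expected frequency of 1 x 0.5 x 0.6 x 0.9 x 0.8 = 0.216, whereas any sequence that does not start with A has a frequency of 0. Characters outside brackets that are not letters are silently skipped by this sketch, so a stricter parser would be needed for production use.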

SCANNING LIBRARY

Scanning libraries are often used to investigate the impact of substitution mutations at defined positions in a protein sequence.

Key features

Variability - at a given variable position, the native amino acid can be replaced by any combination of the 64 codons.

Design parameters

Our scanning libraries can cover up to 800 amino acid positions, which do not have to be contiguous.

Design format

Scanning libraries can be designed at either the protein or DNA level.

RATIONALLY DESIGNED LIBRARY

Rationally designed libraries allow thousands of explicitly defined sequences to be tested. Members of a library can be similar in sequence or entirely different. Note that we are still beta testing our rationally designed libraries. If you would like to participate in our beta testing programme, please get in touch.

Design parameters

Individual sequences in the library must not exceed 500 bases in length.

Design format

Every sequence in a rationally designed library must be specified individually.
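As a rough illustration of what a DNA-level scanning library contains, the Python sketch below enumerates variants of a coding sequence in which exactly one codon is replaced at a time. This is a sketch under assumptions rather than the LabGenius design format: the function name `scanning_variants` and its `substitutions` argument are hypothetical, and the one-substitution-per-variant reading of "scanning" is the conventional one, which these notes do not state explicitly.

```python
from itertools import product

# All 64 codons over the DNA alphabet.
ALL_CODONS = ["".join(bases) for bases in product("ACGT", repeat=3)]


def scanning_variants(coding_seq, substitutions):
    """Yield (codon_index, new_codon, variant_sequence) tuples.

    coding_seq    -- in-frame DNA sequence (length divisible by 3)
    substitutions -- dict mapping 0-based codon index to the codons that
                     may replace the native codon at that position;
                     positions do not have to be contiguous
    """
    coding_seq = coding_seq.upper()
    if len(coding_seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [coding_seq[i:i + 3] for i in range(0, len(coding_seq), 3)]
    for index in sorted(substitutions):
        native = codons[index]
        for codon in substitutions[index]:
            codon = codon.upper()
            if codon == native:
                continue  # skip the unchanged sequence
            variant = codons[:index] + [codon] + codons[index + 1:]
            yield index, codon, "".join(variant)
```

For the three-codon sequence ATGGCTAAA, `scanning_variants("ATGGCTAAA", {1: ALL_CODONS, 2: ["GAA", "AAA"]})` yields 63 variants at the second codon (every codon except the native GCT) and one at the third (GAA, since AAA is the native codon). Restricting the codon list per position is how a subset of the 64 codons would be expressed in this sketch.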

Other Considerations

ALGORITHMIC DESIGN

Variability at the nucleotide, codon or domain level can be controlled using any IUPAC-definable nucleotide (e.g. A, T, C, G, R, Y, S, W, K, M, B, D, H, V, N), trinucleotides, rationally designed sequences or any combination of these three technologies. Determining how and when to deploy variability at the nucleotide, codon or domain level is a complex task. For this reason, we have built cutting-edge software tools to automatically compile high-level protein-based designs down into DNA-based designs. If you would like help designing your library, you can provide us with the goal of the experiment and, if appropriate, we will deploy one of our in-house algorithms.

FORMAT

We ship our libraries either in a linear format or scarlessly cloned into a customer's vector of choice. When provided in a linear format, we can also functionalise each sequence within a library with transformation enhancers.

QUALITY CONTROL

Depending on a customer's needs, we can ensure the quality of our libraries using both end-point clonal sequencing and next-generation sequencing.

TURNAROUND TIME

Depending on library complexity, our turnaround time can be as short as three weeks. This is significantly faster than other library suppliers, who can take up to several months to complete orders.