INDEX. I. Lecture abstract




INDEX

I. Lecture abstract
I.1. The gold era of metagenomics. Dr. Manuel Ferrer
I.2. Molecular Methods to construct environmental DNA libraries. Dr. Manuel Ferrer
I.3. High-throughput sequencing: applications and challenges. Dr. Julián Pérez
I.4. Biodiversity and biologically active molecules. Dr. Olga Guenilloud
I.5. Bioinformatics applied to bacterial (meta)genomics. Dr. Javier Tamames
I.6. Revealing the identity of DNA fragments. Dr. Ramón Roselló

II. Experimental procedures
II.1. DNA extraction and pLAFR3 shoulder preparation
Sample preparation
DNA extraction
Gel preparation
pLAFR3 shoulder preparation
II.2. DNA and 16S rRNA gene libraries production (1)
16S rRNA gene libraries construction
CopyControl fosmid library production
pLAFR3 cosmid library production
Lambda phage library production
II.3. DNA and 16S rRNA gene libraries production (2)
II.4. DNA library production (3)
II.5. DNA library production (4) and activity screens

III. In silico procedures
III.1. Bioinformatics for metagenomics. A beginner's guide. Dr. Michael Richter
III.2. Phylogenetic reconstructions. An ARB software introduction. Dr. Pablo Yarza
III.3. Meta(genomics) assembling methodologies. Dr. Giuseppe D'Auria

IV. Contacts

I. LECTURE ABSTRACT

I.1. The gold era of metagenomics
Dr. Manuel Ferrer & Ana Beloqui, CSIC Instituto de Catálisis, Madrid, Spain

Metagenomics (also Environmental Genomics, Ecogenomics or Community Genomics) is an emerging approach to studying microbial communities in the environment. This relatively new technique enables the study of organisms that are not easily cultured in a laboratory, and thus differs from traditional microbiology, which relies on cultured organisms. Metagenomics therefore holds the promise of new depths of understanding of microbes and, importantly, is a new tool for addressing biotechnological problems without tedious cultivation efforts. DNA sequencing technology has already made a significant breakthrough, and generating giga base pairs of microbial DNA sequence no longer poses a challenge. However, conceptual advances in microbial science will rely not only on the availability of innovative sequencing platforms but also on sequence-independent tools for gaining insight into the functioning of microbial communities. This is an important issue, because even the best annotations of genomes and metagenomes only create hypotheses about the functionality and substrate spectra of proteins, and these hypotheses require experimental testing by classical disciplines such as physiology and biochemistry. Here, we address the following question: how can we take advantage of, and improve, metagenomic technology to accommodate the needs of microbial biologists and enzymologists?

I.2. Molecular Methods to construct environmental DNA libraries
Dr. Manuel Ferrer & Ana Beloqui, CSIC Instituto de Catálisis, Madrid, Spain

The recent emergence of metagenomics allows the analysis of microbial communities without tedious cultivation efforts. The metagenomic approach is analogous to genomics, with the difference that it does not deal with the single genome of a clone or microbe cultured or characterized in the laboratory, but rather with the genomes of the entire microbial community present in an environmental sample: the community genome. Global understanding through metagenomics depends essentially on the possibility of isolating the entire bulk DNA and identifying the genomes, genes and proteins most relevant to each environmental sample under investigation. Here, we provide a broad view of current technical issues to illustrate the potential of obtaining appropriate metagenomic material to create representative gene libraries, as the first step in analysing community genomes.

I.3. High-throughput sequencing: applications and challenges
Dr. Julián Pérez, Secugen, Madrid, Spain

I.4. Biodiversity and biologically active molecules
Dr. Olga Guenilloud

I.5. Bioinformatics applied to bacterial (meta)genomics
Dr. Javier Tamames, Cavanilles Institute on Biodiversity and Evolutionary Biology, Valencia, Spain

Metagenomic sequencing yields vast amounts of DNA sequence that must be analysed and annotated. This requires massive computational resources and also the adaptation of existing bioinformatic techniques to the particular characteristics of this kind of data. We will focus on the current state of bioinformatic developments for metagenomics, identifying the main problems that still need to be solved in order to get the most out of the data.

I.6. Revealing the identity of DNA fragments
Dr. Ramón Roselló, Marine Microbiology Group (MMG), IMEDEA, Esporles

The metagenomic approach applied to natural microbial communities has provided important information on the genetic potential of the organisms thriving in the environments studied. However, one of the major drawbacks of the approach is the difficulty of establishing the identity of the cloned DNA fragments. Molecular microbial ecology has long directed its efforts towards describing a largely hidden diversity that could not be reached by classical culturing techniques. Much of this effort has centred on the 16S rRNA gene, which carries a phylogenetic signal that allows the identification of the organisms harbouring it. However, other housekeeping genes also contain a signal that can be useful for identification. Because a given genome carries only a few copies of the 16S rRNA gene, the probability of finding one in a cloned fragment obtained by the metagenomic approach is very low. For this reason, alternative genes may be selected to help in understanding the origin of the DNA. When a phylogenetically valid gene is found, the putative identity of the organism is normally guaranteed. In most cases, however, DNA fragments do not contain any such gene, and alternative approaches are needed to affiliate a DNA fragment with an existing taxon. The talk will discuss what identity means when it is inferred from gene sequences. Different genes with different phylogenetic signals will be discussed in the context of establishing identity. In addition, alternative but less accurate approaches, such as tetranucleotide signals, will be outlined in order to understand the different levels at which a sequence can be assigned to an existing organism.
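The tetranucleotide signal mentioned in abstract I.6 can be illustrated with a minimal Python sketch. It only counts overlapping 4-mers and compares two frequency profiles with a simple sum of absolute differences; the function names and the choice of distance are assumptions made for illustration and are not the method presented in the talk.

```python
from collections import Counter


def tetranucleotide_frequencies(seq):
    """Return relative frequencies of overlapping 4-mers in a DNA sequence."""
    seq = seq.upper()
    kmers = [seq[i:i + 4] for i in range(len(seq) - 3)]
    # Ignore windows that contain ambiguous bases such as N.
    kmers = [k for k in kmers if set(k) <= set("ACGT")]
    counts = Counter(kmers)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}


def signature_distance(freq_a, freq_b):
    """Sum of absolute frequency differences (0 = identical, 2 = no shared 4-mers)."""
    keys = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(k, 0.0) - freq_b.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    fragment = tetranucleotide_frequencies("ACGTACGT")
    print(fragment)
    print(signature_distance(fragment, fragment))
```

In practice a cloned fragment would be compared against profiles computed from reference genomes, and the closest profile would suggest, but not prove, an affiliation.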

II. Experimental procedures
Day 1 (afternoon): DNA extraction and pLAFR3 shoulder preparation

Material
Nycodenz (1.3 g ml⁻¹)
Disruption buffer (0.2 M NaCl, 50 mM Tris-HCl pH 8)
PBS 1x buffer
TE buffer
Sample
Agarose 0.6-0.7% (w/v)
λ-HindIII marker
λ mono-cut marker
LB-agar-Amp50-XGal
HindIII, EcoRI, BamHI and buffers
Shrimp Alkaline Phosphatase
Microcon-100 (Millipore)
E. coli S17-3 (bearing pLAFR3 cosmid)
LBa and LBb
Large construct kit (Qiagen)
GeneClean Kit (BIO101)

Protocol 1 - Sample preparation
[1] Prepare the sample suspension: to 40 g of sample add 140 ml of disruption buffer in a Waring blender.
[2] Blend the suspension on a low speed setting for 3 x 1 min periods, cooling on ice for 1 min between blending periods.
[3] Centrifuge at low speed (approx. 200-400 g for 1-5 min) to eliminate large soil particles, then use the supernatant for biomass separation via Nycodenz.
[4] Transfer 25 ml of the soil homogenate to an ultracentrifuge tube and carefully pipette 9-11 ml of Nycodenz (1.3 g ml⁻¹) so that it forms a layer below the homogenate.
[5] Centrifuge at 10000 g for 20-40 min at 4 °C, preferably in a swing-out rotor.
[6] A faint whitish band containing bacterial cells is resolved at the interface between the Nycodenz and the aqueous layer. Transfer this band into a sterile tube. Note that soils sometimes contain many small particles that cannot be separated: they cover the Nycodenz surface, forming a solid layer mixed with the microbial biomass (this problem is typical of clay soils).
[7] Add approx. 35 ml of phosphate buffered saline (PBS) and pellet the cells by centrifugation at 10000 g for 20 min. The cell pellet, resuspended in 0.5-2.0 ml TE buffer pH 8.0, is then ready for lysis and DNA extraction.

Protocol 2 - DNA extraction
[1] To the above cells, add 1.85 ml Cell Suspension Solution (use a 15 ml clear plastic tube for efficient mixing). Mix until the solution appears homogeneous.
[2] Add 50 μl of RNase Mix and mix thoroughly. Add 100 μl of Cell Lysis/Denaturing Solution and mix well.
[3] Incubate at 55 °C for 15 minutes.
[4] Add 25 μl Protease Mix and mix thoroughly.
[5] Incubate at 55 °C for 30 to 120 minutes (the longer time results in minimal protein carry-over and also allows a substantial reduction in residual protease activity).
[6] Add 500 μl Salt-Out Mixture and mix gently yet thoroughly. Divide the sample into 1.5 ml tubes. Refrigerate at 4 °C for 10 minutes.
[7] Spin for 10 minutes at maximum speed in a microcentrifuge (at least 10000 g). Carefully collect the supernatant, avoiding the pellet. If a precipitate remains in the supernatant, spin again until it is clear. Pool the supernatants in a 15 ml (or larger) clear plastic tube.
[8] To this supernatant, add 2 ml TE buffer and mix. Then add 8 ml of 100% ethanol. If spooling the DNA, add the ethanol slowly and spool the DNA at the interface with a clean glass rod. If centrifuging the DNA, add the ethanol and gently mix the solution by inverting the tube.
[9] Spin for 15 minutes at 10000 g. Eliminate the excess ethanol by blotting or air drying the DNA.
[10] Dissolve the genomic DNA in TE buffer.
[11] Quantify the amount of nucleic acid.
[12] Run an aliquot (about 400 ng) together with markers in an agarose gel (0.7% w/v).

Protocol 3 - Gel preparation
[1] Prepare an agarose gel (0.7%).

[2] Run an aliquot (about 400 ng) together with markers.
[3] Run a 20 cm long 1% agarose gel overnight at 30-35 V and 4 °C.

Protocol 4 - pLAFR3 shoulders preparation
[1] Inoculate 200 ml of LB containing Tc at 10 μg/ml with a single colony of E. coli S17-3 (bearing the pLAFR3 cosmid) and grow it overnight with orbital shaking (250 rpm) at 30 °C. Pellet the cells for 10 min at 7000 g and isolate the pLAFR3 plasmid with the large construct kit (Qiagen), treating the sample with ATP-dependent exonuclease to eliminate chromosomal DNA so that only the cosmid remains.
[2] Take two aliquots of around 10 μg of pLAFR3 and cut one with HindIII (shoulder 1) and the other with EcoRI (shoulder 2) at 37 °C for 1-2 hours. Then run small aliquots in a 0.75% agarose electrophoresis gel to check that the digestion worked properly. Then incubate the samples at 65 °C for 20 min to inactivate the restriction enzymes.

EcoRI digestion:
20 μl pLAFR3 vector (10 μg)
5 μl Buffer NEB2 10X
5 μl BSA 10X
19 μl MilliQ water
1 μl EcoRI 20 U/μl
Total reaction volume: 50 μl

HindIII digestion:
20 μl pLAFR3 vector (10 μg)
5 μl Buffer NEB2 10X
5 μl BSA 10X
19 μl MilliQ water
1 μl HindIII 20 U/μl
Total reaction volume: 50 μl

[3] Add 3 μl of Shrimp Alkaline Phosphatase (SAP, from Biotec ASA) to dephosphorylate the DNA and incubate for 1 h at 37 °C. To avoid shearing the DNA, do not pipette; just stir the tube to mix. Then incubate the samples at 65 °C for 20 min to inactivate the SAP.
[4] Mix the pLAFR3 shoulders at 1:1 and add 400 μl of water to wash the mixture in a Microcon-100 (Millipore). Concentrate to a small volume (around 30-40 μl).

[5] To 37 μl of Microcon-concentrated DNA add 5 μl of 10X NEB3 buffer (New England Biolabs Buffer 3), 5 μl of BSA 10X, 2 μl of MilliQ water and 1 μl of BamHI enzyme, and digest overnight at 37 °C.
[6] Run small aliquots in a 0.75% agarose electrophoresis gel to check that the fragments remain the same size (22 kb) as before BamHI digestion.
[7] Use the GeneClean Kit (BIO101) to inactivate BamHI and to concentrate the pLAFR3 shoulders, as described in steps [8] to [17].
[8] Add 150 μl NaI solution.
[9] Add 5 μl GLASSMILK (vortex it beforehand) and mix.
[10] Incubate at room temperature for 5 min and mix.
[11] Pellet the GLASSMILK with the DNA at 14000 g for 5 s and discard the supernatant.
[12] Add 500 μl NEW Wash and resuspend.
[13] Centrifuge at 14000 g for 5 s and discard the supernatant.
[14] Repeat the washing step.
[15] Dry the pellet to remove residual EtOH.
[16] Add 50-100 μl TE or water and mix.
[17] Centrifuge for 30 s and store the supernatant, which contains the ready-to-use pLAFR3 vector.
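Before pipetting, it can help to check that a digestion mix adds up. The short Python sketch below does this for the EcoRI digestion of pLAFR3 from Protocol 4, using the volumes given there; the function name and dictionary layout are illustrative assumptions, not part of the course material.

```python
def summarize_digest(components_ul, buffer_name, buffer_stock_x,
                     enzyme_name, enzyme_u_per_ul, dna_ug):
    """Return total volume, final buffer strength and enzyme units per ug DNA."""
    total = sum(components_ul.values())
    return {
        "total_ul": total,
        "buffer_final_x": buffer_stock_x * components_ul[buffer_name] / total,
        "units_per_ug": enzyme_u_per_ul * components_ul[enzyme_name] / dna_ug,
    }


# Volumes (in microlitres) taken from the EcoRI digestion in Protocol 4.
ecori_digest = {
    "pLAFR3 vector": 20.0,
    "Buffer NEB2 10X": 5.0,
    "BSA 10X": 5.0,
    "MilliQ water": 19.0,
    "EcoRI": 1.0,
}

summary = summarize_digest(ecori_digest, "Buffer NEB2 10X", 10.0,
                           "EcoRI", 20.0, 10.0)

if __name__ == "__main__":
    print(summary)
```

The same check applies to the HindIII digestion and to the BamHI digestion in step [5], whose components (37 + 5 + 5 + 2 + 1 μl) also add up to 50 μl.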

II. Experimental procedures
Day 2 (morning and afternoon): DNA and 16S rRNA gene libraries production

Material
Samples
16S rRNA primer 16F530 (5'-TTCGTGCCAGCAGCCGCGG-3')
16S rRNA primer 16R1492 (5'-TACGGYTACCTTGTTACGACTT-3')
pGEM-T Easy
T4 DNA ligase
pCC1FOS, Epicentre (Cat. No. CCFOS110), digested pLAFR3 and ZAP Express vector (Stratagene)
0.5 M EDTA pH 8.0 and TE buffer
Agarose 0.6-1.0% (w/v) (normal and low melting point)
λ-HindIII marker, λ mono-cut marker
LB-agar-Amp50-XGal
Sau3A and buffer
Microcon-100 (Millipore)
LBa and LBb and Tc 5-10 mg/ml
GELase (Epicentre)

Protocol 5 - 16S rRNA gene libraries construction
[1] Perform the PCR (50 μl reaction volume) with an annealing temperature of 50 °C for 25 cycles. Purify the PCR products from a 1% agarose gel and insert them into the pGEM-T Easy vector (Promega) as follows:
Reaction 1: 1 μl pGEM-T Easy, 1 μl T4 DNA ligase buffer (10X), 0.5 μl T4 DNA ligase, 3.3 μl PCR product, 4.1 μl MilliQ water
Reaction 2: 1 μl pGEM-T Easy, 1 μl T4 DNA ligase buffer (10X), 0.5 μl T4 DNA ligase, 7.0 μl PCR product, 0.5 μl MilliQ water
[2] Ligate at 4 °C overnight.

Protocol 6 - CopyControl Fosmid Library Production
The CopyControl Fosmid Library Production kit (EPICENTRE) uses a strategy of cloning randomly sheared, end-repaired DNA with an average insert size of 40 kbp. Shearing the DNA into fragments of approximately 40 kb generates highly random DNA fragments, in contrast to the more biased libraries that result from partial restriction endonuclease digestion of the DNA. Frequently, genomic DNA is already sufficiently sheared as a result of the purification process, and additional shearing is not necessary.

Test the extent of shearing by first running a small amount of the DNA (around 100 ng). Run the sample on a 20 cm long 1% agarose gel at 30-35 V overnight at 4 °C and stain.
If 10% or more of the genomic DNA migrates with the Fosmid control DNA provided with the kit (36 kb), you can proceed to the end-repair protocol.
If the genomic DNA migrates slower (higher MW) than the 36 kb fragment, the DNA needs to be sheared. Shear the DNA (2.5 μg) by passing it through a 200 μl small-bore pipette tip: aspirate and expel the DNA from the pipette tip 50-100 times.
If the genomic DNA migrates faster (lower MW) than the 36 kb fragment, it has been sheared too much and should be reisolated.

For the end-repair protocol, take into account these suggestions:

End repair protocol
[1] Thaw and thoroughly mix all of the reagents listed below before dispensing; place on ice. Combine the following on ice:
8 μl 10X End-Repair Buffer
8 μl 2.5 mM dNTP Mix
8 μl 10 mM ATP
32 μl sheared insert DNA (approximately 4.3 μg)*
20 μl sterile water
4 μl End-Repair Enzyme Mix
80 μl Total reaction volume
*The end-repair reaction can be scaled up or down as dictated by the amount of DNA available.
[2] Incubate at room temperature for 45 minutes.
[3] Add gel loading buffer and incubate at 70 °C for 10 min to inactivate the End-Repair Enzyme Mix.
[4] Select the size of the end-repaired DNA by low melting point (LMP) agarose gel electrophoresis. Run the sample on a 20 cm long 1% agarose gel at 30-35 V overnight at 4 °C. Do not stain the DNA with EtBr and do not expose it to UV. Use stained DNA marker lanes as a ruler to cut out the agarose region containing the 25-60 kb DNA, and trim excess agarose.
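The note that the end-repair reaction can be scaled up or down can be made concrete with a small Python sketch. It assumes that every component, including the DNA volume, is scaled by the same factor; the names used are illustrative only.

```python
# Volumes (in microlitres) of the 80 ul end-repair reaction listed in step [1].
END_REPAIR_80UL = {
    "10X End-Repair Buffer": 8.0,
    "2.5 mM dNTP Mix": 8.0,
    "10 mM ATP": 8.0,
    "sheared insert DNA": 32.0,
    "sterile water": 20.0,
    "End-Repair Enzyme Mix": 4.0,
}


def scale_reaction(recipe, factor):
    """Scale every component by the same factor (assumed linear scaling)."""
    return {name: round(volume * factor, 2) for name, volume in recipe.items()}


half_reaction = scale_reaction(END_REPAIR_80UL, 0.5)

if __name__ == "__main__":
    for name, volume in half_reaction.items():
        print(f"{volume:>5} ul  {name}")
    print(f"{sum(half_reaction.values()):>5} ul  total")
```

If the DNA is more dilute than in the reference recipe, the DNA volume can be increased and the water reduced by the same amount so that the total volume stays unchanged.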

Protocol 7 - pLAFR3 Cosmid Library Production
Since the discovery rate of novel proteins using traditional cultivation techniques has decreased significantly during the past couple of years, many different expression hosts, apart from the usual E. coli systems, are currently used for cloning DNA fragments. Of particular interest is the mining and subsequent reconstitution of natural product biosynthetic pathways, where large multienzyme assemblies must be functionally expressed and where the choice of a suitable heterologous host is critical. For this purpose, broad-host-range vectors that replicate in different Gram-negative species have been proposed, such as the pLAFR3 vector, which is able to replicate in Pseudomonas hosts (30). To this end, we are going to prepare metagenomic libraries with the pLAFR3 vector, which allows the cloning of insert DNA of around 23 kb in expression hosts of the genus Pseudomonas.

Partial Sau3AI digestion of insert DNA for pLAFR3 cloning.
In order to obtain DNA fragments of 25-50 kb by partial digestion with Sau3AI, it is recommended to run some pilot reactions using different amounts of enzyme. Set up a series of reactions.
[1] Prepare enzyme dilutions in 1X reaction buffer (the enzyme stock is 10 U/μl): 1/10, 1/20, 1/50, 1/100, 1/200.
[2] Do a trial digestion for 30 min at 37 °C.
2 μl DNA (1 μg)
1 μl Buffer 10X
1 μl BSA 10X
5 μl MilliQ water
1 μl Sau3A diluted
Total reaction volume: 10 μl
[3] Then add 1.5 μl of 0.5 M EDTA pH 8.0 and heat at 65 °C for 20 min.
[4] Then run a 20 cm long 0.7-1% agarose gel and stain. Use the partial digestion conditions that result in a majority of the DNA migrating in the desired size range (25-50 kb).
[5] Make a scale-up reaction. Scale up the amount of Sau3AI enzyme for about 10 μg of DNA. You should choose 2 different restriction conditions, as in the following example:

Reaction 1
20 μl concentrated insert DNA (10 μg)
5 μl Ligation Buffer NEB1 10X
5 μl BSA 10X
X μl MilliQ water
X μl Sau3AI diluted
Total reaction volume: 50 μl

Reaction 2
20 μl concentrated insert DNA (10 μg)
7 μl Ligation Buffer NEB1 10X
7 μl BSA 10X
X μl MilliQ water
X μl Sau3AI diluted
Total reaction volume: 50 μl

[6] Incubate for 20 min at 37 °C.
[7] Stop the reactions by adding 7.5 μl of 0.5 M EDTA pH 8 and heating the samples at 65 °C for 15 min.
[8] Then mix both reactions and load the samples on a 20 cm long preparative low melting point (LMP) 1% agarose gel. Run it at 30-35 V overnight at 4 °C, then cut off and stain the lanes containing the DNA marker. Do not stain the part of the gel containing your DNA for cloning. Under UV light, cut out the part of the marker gel blocks in the range of ca. 20 kbp and use them as a guide to excise the gel region containing the environmental DNA.

Protocol 8 - Lambda phage Library Production
Small-insert expression libraries, especially those made in lambda phage vectors, are specially constructed for activity screens; however, in contrast with cosmid or fosmid vectors, the ZAP Express pBK vector (Stratagene) allows cloning of inserts of only up to 15 kbp (optimal about 8.5-9.5 kbp).

Partial Sau3AI digestion of insert DNA for cloning in the ZAP Express vector.
In order to obtain DNA fragments of about 8.5-9.5 kbp by partial digestion with Sau3AI, it is recommended to run some trial reactions using different amounts of enzyme. Set up a series of reactions starting, for example, from 0.1 to 0.04 U of enzyme per 1 μg of DNA:
[1] Prepare enzyme dilutions in 1X reaction buffer (the enzyme stock is 10 U/μl): 1/10, 1/20, 1/50, 1/100, 1/200.
[2] Do a trial digestion for 30 min at 37 °C.
2 μl DNA (1 μg)
1 μl Buffer 10X
1 μl BSA 10X
5 μl MilliQ water
1 μl Sau3A diluted
Total reaction volume: 10 μl
[3] Incubate for 20 min at 37 °C.
[4] Stop the reactions by adding 1.5 μl of 0.5 M EDTA pH 8 and heating the samples at 65 °C for 15 min.
[5] Then run a 20 cm long 1% agarose gel and stain. Use the partial digestion conditions that result in a majority of the DNA migrating in the desired size range (5-15 kb). For the partial digestion of the DNA, scale up the amount of Sau3AI enzyme for at least 2-10 μg of DNA. Select the two best restriction conditions and scale them up, as in the following example:

Reaction 1
20 μl concentrated insert DNA (10 μg)
5 μl Ligation Buffer NEB1 10X
5 μl BSA 10X
X μl MilliQ water
X μl Sau3AI diluted
Total reaction volume: 50 μl

Reaction 2
20 μl concentrated insert DNA (10 μg)
7 μl Ligation Buffer NEB1 10X
7 μl BSA 10X
X μl MilliQ water
X μl Sau3AI diluted
Total reaction volume: 50 μl

[6] Incubate for 20 min at 37 °C.
[7] Stop the reactions by adding 7.5 μl of 0.5 M EDTA pH 8 and heating the samples at 65 °C for 15 min.
[8] Then mix both reactions and load the samples on a 20 cm long preparative low melting point (LMP) 1% agarose gel. Run it at 30-35 V overnight at 4 °C, then cut off and stain the lanes containing the DNA marker. Do not stain the part of the gel containing your DNA for cloning. Under UV light, cut out the part of the marker gel blocks in the range of ca. 20 kbp and use them as a guide to excise the gel region containing the environmental DNA.
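The pilot digestions in Protocols 7 and 8 use 1 μl of each enzyme dilution on 1 μg of DNA. The Python sketch below converts the dilution series into units of Sau3AI per μg of DNA and into the units needed when the chosen condition is scaled up. It assumes the 10 U/μl stock stated in the protocols and a simple linear scale-up; the function names are illustrative.

```python
STOCK_U_PER_UL = 10.0                 # Sau3AI stock stated in the protocols
DILUTIONS = [10, 20, 50, 100, 200]    # 1/10, 1/20, 1/50, 1/100, 1/200


def units_per_ug(dilution_factor, enzyme_ul=1.0, dna_ug=1.0):
    """Units of enzyme per microgram of DNA in a pilot digestion."""
    return STOCK_U_PER_UL / dilution_factor * enzyme_ul / dna_ug


def scale_up_units(dilution_factor, dna_ug):
    """Units needed to keep the same enzyme:DNA ratio for a larger DNA amount."""
    return units_per_ug(dilution_factor) * dna_ug


if __name__ == "__main__":
    for factor in DILUTIONS:
        pilot = units_per_ug(factor)
        scaled = scale_up_units(factor, 10.0)
        print(f"1/{factor}: {pilot:.2f} U per ug, {scaled:.1f} U for 10 ug")
```

Because partial digestion also depends on time, temperature and DNA purity, the computed units are only a starting point; the gel from the pilot series remains the deciding evidence.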

II. Experimental procedures
Day 3 (morning): DNA and 16S rRNA gene libraries production

Material
T4 DNA ligase
pCC1FOS, Epicentre (Cat. No. CCFOS110)
0.5 M EDTA pH 8.0
Agarose 0.6-1.0% (w/v) (normal and low melting point)
TE buffer
Agarose
λ-HindIII marker
λ mono-cut marker
LB-agar-Amp50-XGal
Sau3A and buffer
Microcon-100 (Millipore)
LBa and LBb
Tc 5-10 mg/ml in ethanol
GELase (Epicentre)
Digested pLAFR3
ZAP Express vector (Stratagene)
E. coli XL1 MRF'
E. coli EPI300
E. coli DH5α
MgSO4 1 M and MgSO4 10 mM

Protocol 9 - 16S rRNA gene libraries construction (cont. protocol 5)
[1] Use 2 μl of the ligation product to transform 50 μl of competent E. coli DH5α cells.
[2] Plate the cells on LB-agar-Amp50-XGal plates and incubate at 37 °C overnight.
[3] Sequence around 100 randomly selected positive clones (white colonies) using the M13f primer.
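The reverse primer 16R1492 listed in the Day 2 material contains the degenerate base Y (C or T). The Python sketch below expands a degenerate primer into its concrete sequences and gives a rough melting temperature with the Wallace rule, Tm = 2(A+T) + 4(G+C). The Wallace rule is only a crude estimate intended for short oligonucleotides, and the function names are illustrative assumptions; the protocol itself prescribes an annealing temperature of 50 °C.

```python
from itertools import product

# IUPAC nucleotide codes (subset that covers the common two-base codes and N).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "N": "ACGT",
}


def expand_degenerate(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(bases) for bases in product(*pools)]


def wallace_tm(seq):
    """Rough melting temperature (deg C) by the Wallace rule."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


PRIMER_16R1492 = "TACGGYTACCTTGTTACGACTT"

if __name__ == "__main__":
    for variant in expand_degenerate(PRIMER_16R1492):
        print(variant, wallace_tm(variant))
```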

Protocol 10 - CopyControl Fosmid Library Production (cont. protocol 6)
DNA fragment size selection
[1] Once the gel has run overnight, proceed to agarose gel digestion using the GELase (EPICENTRE) Agarose Gel-Digesting protocol described in the steps below. Cut out the area > 20-30 kb.
[2] Thoroughly melt the gel slice by incubating at 70 °C for 3 min for each 200 mg of gel.
[3] Transfer the molten agarose immediately to 45 °C and equilibrate for 2 minutes for each 200 mg of gel.
[4] Add 4 μl of 50X GELase buffer per 200 mg of agarose.
[5] Add 2 μl GELase and incubate for 1-4 h at 45 °C.
[6] Centrifuge the tubes in a microcentrifuge at maximum speed (15000 g) for 15 min at 4 °C to pellet any insoluble oligosaccharides. Carefully transfer the upper 90%-95% of the supernatant, which contains the DNA, to a sterile 1.5 ml tube. Be careful to avoid the gelatinous pellet.
[7] Concentrate the DNA in a Microcon-100 (Millipore) concentrator membrane (100 kDa cut-off) at 4 °C to a final volume of 20-50 μl. Be sure to cut off the end of the yellow pipette tip before transferring the supernatant.
[8] Then add 450 μl sterile water and concentrate again to 20-50 μl. This concentrated DNA is the insert to ligate to the pCC1FOS vector.
[9] Quantify the amount of nucleic acid. The DNA concentration should be not less than 75 ng/μl (a total of 3.75 μg in 50 μl).
[10] Run an aliquot (about 400 ng) together with markers in an agarose gel (0.7% w/v).

Protocol 11 - pLAFR3 Cosmid Library Production (cont. protocol 7)
DNA fragment size selection
[1] Once the gel has run overnight, proceed to agarose gel digestion using the GELase (EPICENTRE) Agarose Gel-Digesting protocol described in the steps below. Cut out the area > 20 kb*.
* Check against the control lane that the DNA is no longer intact but already runs as a smear, with the major fraction running above 10-15 kbp. Take the region from 20 kb upwards. The initial DNA will not exceed 30-40 kb anyway, so take everything above 20 kb.
[2] Thoroughly melt the gel slice by incubating at 70 °C for 3 min for each 200 mg of gel.

[3] Transfer the molten agarose immediately to 45 °C and equilibrate for 2 minutes for each 200 mg of gel.
[4] Add 4 μl of 50X GELase buffer per 200 mg of agarose.
[5] Add 2 μl GELase and incubate for 1-4 h at 45 °C.
[6] Centrifuge the tubes in a microcentrifuge at maximum speed (15000 g) for 15 min at 4 °C to pellet any insoluble oligosaccharides. Carefully transfer the upper 90%-95% of the supernatant, which contains the DNA, to a sterile 1.5 ml tube. Be careful to avoid the gelatinous pellet.
[7] Concentrate the DNA in a Microcon-100 (Millipore) concentrator membrane (100 kDa cut-off) at 4 °C to a final volume of 20-50 μl. Be sure to cut off the end of the yellow pipette tip before transferring the supernatant.
[8] Then add 450 μl sterile water and concentrate again to 20-50 μl. This concentrated DNA is the insert to ligate to the pLAFR3 vector.
[9] Quantify the amount of nucleic acid. The DNA concentration should be not less than 75 ng/μl (a total of 3.75 μg in 50 μl).
[10] Run an aliquot (about 400 ng) together with markers in an agarose gel (0.7% w/v).
[11] Ligate the partially Sau3AI-digested DNA and the pLAFR3 shoulders overnight at 14 °C in a ratio of 1:2 or 1:1. The ligation volume must be as low as possible (5-10 μl). If you take 100 ng of both shoulders together, then add 50 or 100 ng of the insert (you may do two separate ligations and see which works better). It is highly recommended to run small aliquots (for example 1 μl) of all your samples after every manipulation and after ligation.
Reaction 1: 1 μl pLAFR3, 1 μl T4 DNA ligase buffer (10X), 0.5 μl T4 DNA ligase, X μl DNA fragment, X μl MilliQ water.

Protocol 12 - Lambda phage Library Production (cont. protocol 8)
DNA fragment size selection
[1] Once the gel has run overnight, proceed to agarose gel digestion using the GELase (EPICENTRE) Agarose Gel-Digesting protocol described in the steps below. Cut out the area < 15 kb.
[2] Thoroughly melt the gel slice by incubating at 70 °C for 3 min for each 200 mg of gel.
[3] Transfer the molten agarose immediately to 45 °C and equilibrate for 2 minutes for each 200 mg of gel.
[4] Add 4 μl of 50X GELase buffer per 200 mg of agarose.

[5] Add 2 μl GELase and incubate for 1-4 h at 45 °C.
[6] Centrifuge the tubes in a microcentrifuge at maximum speed (15000 g) for 15 min at 4 °C to pellet any insoluble oligosaccharides. Carefully transfer the upper 90%-95% of the supernatant, which contains the DNA, to a sterile 1.5 ml tube. Be careful to avoid the gelatinous pellet.
[7] Concentrate the DNA in a Microcon-100 (Millipore) concentrator membrane (100 kDa cut-off) at 4 °C to a final volume of 20-50 μl. Be sure to cut off the end of the yellow pipette tip before transferring the supernatant.
[8] Then add 450 μl sterile water and concentrate again to 20-50 μl. This concentrated DNA is the insert to ligate to the lambda vector.
[9] Quantify the amount of nucleic acid. The DNA concentration should be not less than 75 ng/μl (a total of 3.75 μg in 50 μl).
[10] Run an aliquot (about 400 ng) together with markers in an agarose gel (0.7% w/v).
[11] Ligate the partially Sau3AI-digested DNA and pBK-CMV overnight at 14 °C, using the following ligation conditions (the final volume should not exceed 5.0-5.5 μl):
1 μl ZAP Express vector
0.6 μl T4 ligase buffer (10X)
4 μl concentrated insert
0.6 μl T4 DNA ligase
[12] Inoculate 50 ml of LB, supplemented with 10 mM MgSO4 and 0.2% (w/v) maltose, with a single colony of E. coli XL1 MRF'.
[13] Grow overnight at 30 °C with shaking at 200 rpm.
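The quantification and ligation steps of Protocols 10 to 12 involve a few small calculations: total yield from concentration and volume, the volume that contains a 400 ng gel aliquot, and the insert mass for a given insert:vector mass ratio. A minimal Python sketch, with illustrative function names, reproduces the figures quoted in the protocols.

```python
def total_dna_ug(conc_ng_per_ul, volume_ul):
    """Total DNA in micrograms from concentration (ng/ul) and volume (ul)."""
    return conc_ng_per_ul * volume_ul / 1000.0


def aliquot_volume_ul(target_ng, conc_ng_per_ul):
    """Volume in microlitres that contains the requested mass of DNA."""
    return target_ng / conc_ng_per_ul


def insert_ng_for_mass_ratio(vector_ng, insert_to_vector):
    """Insert mass for a given insert:vector mass ratio (e.g. 0.5 for 1:2)."""
    return vector_ng * insert_to_vector


if __name__ == "__main__":
    # 75 ng/ul in 50 ul corresponds to the 3.75 ug stated in step [9].
    print(total_dna_ug(75.0, 50.0))
    # Volume to load for the 400 ng gel aliquot at the minimum concentration.
    print(round(aliquot_volume_ul(400.0, 75.0), 2))
    # 100 ng of pLAFR3 shoulders at 1:2 and 1:1 (Protocol 11, step [11]).
    print(insert_ng_for_mass_ratio(100.0, 0.5), insert_ng_for_mass_ratio(100.0, 1.0))
```

Note that these are mass ratios, as used in Protocol 11; converting them to molar ratios would additionally require the fragment sizes.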

II. Experimental procedures
Day 4 (morning)
DNA gene library production

Material
pCC1FOS, Epicentre (Cat. No. CCFOS110)
Agarose 0.6-1.0% (w/v) (normal and low melting point)
Microcon-100 (Millipore)
LBa and LBb, NZYa and NZYb
E. coli XL1 MRF, E. coli EPI300, E. coli DH5α
MgSO4 1 M and MgSO4 10 mM
SM buffer
Chloroform
Tc 5-10 mg/ml and Cm 50 mg/ml

Protocol 13 CopyControl Fosmid Library Production (cont. protocol 10)
Ligation reaction in the pCC1FOS fosmid vector. A single ligation reaction will produce 10^3-10^6 clones, depending on the quality of the insert DNA. Based on this information, calculate the number of ligation reactions that you will need to perform. The ligation reaction can be scaled up as needed. A 10:1 molar ratio of pCC1FOS vector to insert DNA is optimal. If we use 0.5 μg of 100 kb DNA insert, we need around 0.5 μg of vector.
[1] Combine the following reagents in the order listed and mix thoroughly after each addition.
1 μl 10X Fast-Link Ligation Buffer
1 μl pCC1FOS (0.5 μg/μl)
1 μl 10 mM ATP
6.8 μl concentrated insert DNA (75 ng/μl)
0.2 μl MilliQ water
1 μl Fast-Link DNA Ligase
10 μl Total reaction volume

[2] Incubate at room temperature for 2 hours and then transfer the reaction to 70 °C for 10 minutes to inactivate the Fast-Link DNA Ligase.

Packaging reaction in the pCC1FOS fosmid vector.
[1] Thaw, on ice, 1 tube of the MaxPlax Lambda Packaging Extracts for every ligation reaction performed in the above step.
[2] When thawed, immediately transfer 25 μl (one-half) of each packaging extract to a second 1.5 ml microfuge tube and place on ice.
[3] Add 10 μl of the ligation reaction to each 25 μl of the thawed extracts being held on ice.
[4] Mix by pipetting the solutions several times. Avoid introducing air bubbles. Briefly centrifuge the tubes to bring all liquid to the bottom.
[5] Incubate the packaging reactions at 30 °C for 90 minutes. After the 90-minute packaging reaction is complete, add the remaining 25 μl of MaxPlax Lambda Packaging Extract to each tube.
[6] Incubate the reactions for an additional 90 minutes at 30 °C.
[7] At the end of the second 90-minute incubation, add Phage Dilution buffer (PD buffer: 10 mM Tris-HCl pH 8.3, 100 mM NaCl, 10 mM MgCl2) to 1 ml final volume in each tube and mix gently. Add 25 μl of chloroform to each. Mix gently and store at 4 °C (up to a month). A viscous precipitate may form after addition of the chloroform. This precipitate will not interfere with library production. Determine the titer of the phage particles (packaged fosmid clones) and then plate the fosmid library. See next day.
[8] On the day of the packaging reactions, inoculate 50 ml of LB broth + 10 mM MgSO4 with 5 ml of the EPI300-T1R overnight culture. Shake at 37 °C to an OD600 of 0.8-1.0. Store the cells at 4 °C until needed (Titering). The cells may be stored for up to 72 hours at 4 °C if necessary.

Protocol 14 pLAFR3 Cosmid Library Production (cont. protocol 11)
Packaging Protocol
[1] Remove the appropriate number of packaging extracts from a -80 °C freezer and place the extracts on dry ice.
[2] Quickly thaw the packaging extract by holding the tube between your fingers until the contents of the tube just begin to thaw.

[3] Add the experimental DNA immediately (1-4 μl containing 0.1-1.0 μg of ligated DNA) to the packaging extract.
[4] Stir the tube with a pipette tip to mix well. Gentle pipetting is allowable provided that air bubbles are not introduced.
[5] Spin the tube quickly (for 3-5 seconds), if desired, to ensure that all contents are at the bottom of the tube.
[6] Incubate the tube at room temperature (22 °C) for 2 hours.
[7] Add 500 μl of SM buffer (50 mM Tris-HCl pH 7.5, 0.1 M NaCl, 8.5 mM MgSO4 and 0.01% (w/v) gelatin) to the tube. The gelatin in SM buffer stabilizes lambda phage particles during storage.
[8] Add 20 μl of chloroform and mix the contents of the tube gently.
[9] Spin the tube briefly to sediment the debris.
[10] The supernatant containing the phage is ready for titering. The supernatant may be stored at 4 °C for up to 1 month.
[11] Streak the bacterial glycerol stock (E. coli DH5α or XL1Blue) onto LB agar plates. Incubate the plates overnight at 37 °C. Do not add antibiotic to the medium in the following step. The antibiotic will bind to the bacterial cell wall and will inhibit the ability of the phage to infect the cell.
[12] Inoculate 50 ml of LB, supplemented with 10 mM MgSO4 and 0.2% (w/v) maltose, with a single colony.
[13] Grow overnight at 30 °C, shaking at 200 rpm.

Protocol 15 Lambda phage Library Production (cont. protocol 12)
Packaging Protocol
[1] Remove the appropriate number of packaging extracts from a -80 °C freezer and place the extracts on dry ice.
[2] Quickly thaw the packaging extract by holding the tube between your fingers until the contents of the tube just begin to thaw.
[3] Add the experimental DNA immediately (1-4 μl containing 0.1-1.0 μg of ligated DNA) to the packaging extract.
[4] Stir the tube with a pipette tip to mix well. Gentle pipetting is allowable provided that air bubbles are not introduced.
[5] Spin the tube quickly (for 3-5 seconds), if desired, to ensure that all contents are at the bottom of the tube.

[6] Incubate the tube at room temperature (22 °C) for 2 hours.
[7] Add 500 μl of SM buffer (50 mM Tris-HCl pH 7.5, 0.1 M NaCl, 8.5 mM MgSO4 and 0.01% (w/v) gelatin) to the tube. The gelatin in SM buffer stabilizes lambda phage particles during storage.
[8] Add 20 μl of chloroform and mix the contents of the tube gently.
[9] Spin the tube briefly to sediment the debris.
[10] The supernatant containing the phage is ready for titering. The supernatant may be stored at 4 °C for up to 1 month.
[11] Inoculate 50 ml of LB, supplemented with 10 mM MgSO4 and 0.2% (w/v) maltose, with a single colony of E. coli XL1 MRF.
[12] Grow overnight at 30 °C, shaking at 200 rpm.
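Supplementing LB with MgSO4 and maltose for the host-cell cultures is a C1·V1 = C2·V2 dilution. The sketch below works out stock volumes for a 50 ml culture, assuming the 1 M MgSO4 stock listed under Material and a hypothetical 20% (w/v) maltose stock (the maltose stock concentration is not given in the protocols). The small volume added by the stocks themselves is ignored.

```python
def stock_volume(final_conc: float, final_volume: float, stock_conc: float) -> float:
    """Volume of stock needed (same units as final_volume), from C1*V1 = C2*V2.

    final_conc and stock_conc must be in the same units (e.g. both mM or both %).
    """
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    if final_conc > stock_conc:
        raise ValueError("cannot reach a concentration above that of the stock")
    return final_conc * final_volume / stock_conc


culture_ml = 50.0

# 10 mM MgSO4 from the 1 M (1000 mM) stock.
mgso4_ml = stock_volume(final_conc=10.0, final_volume=culture_ml, stock_conc=1000.0)

# 0.2% (w/v) maltose from an ASSUMED 20% (w/v) stock.
maltose_ml = stock_volume(final_conc=0.2, final_volume=culture_ml, stock_conc=20.0)

print(f"MgSO4 stock: {mgso4_ml:.2f} ml, maltose stock: {maltose_ml:.2f} ml")
```

The same function covers the 10 mM MgSO4 used to resuspend cells before titering, since that is also prepared from the 1 M stock.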

II. Experimental procedures
Day 5
Activity screens

Protocol 16 CopyControl Fosmid Library Production (cont. protocol 13)
Titering the Packaged Fosmid Clones. Before plating the library, we recommend determining the titer of the packaged fosmid clones. This helps in deciding the number of plates and dilutions needed to obtain a library that meets the needs of the user.
[1] Make serial dilutions of the 1 ml of packaged phage particles in PD buffer in sterile microfuge tubes. For example, use dilutions 1:10^1, 1:10^2, 1:10^4 and 1:10^5.
[2] Add 10 μl of each dilution, individually, to 100 μl of the prepared EPI300-T1R host cells. Incubate each for 20 minutes at 37 °C.
[3] Spread the infected EPI300-T1R cells on an LB plate plus 12.5 μg/ml chloramphenicol and incubate at 37 °C overnight to select for the fosmid clones.
[4] Count colonies and calculate the titer of the packaged clones as follows: if there were 200 colonies on the plate spread with the 1:10^4 dilution, then the titer in cfu/ml (where cfu represents colony-forming units) of this reaction would be:
[5] (# of colonies) × (dilution factor) × (1000 μl/ml) / (volume of phage plated [μl])
[6] That is: (200 cfu) × (10^4) × (1000 μl/ml) / (10 μl) = 2 × 10^8 cfu/ml
Based on the titer determined above, dilute the phage particles with PD buffer to obtain the desired number of clones and clone density on the plate. Mix the diluted phage particles with EPI300-T1R cells in the ratio of 100 μl of cells (prepared as above) for every 10 μl of diluted phage particles. Spread the infected bacteria on an LB plate plus 12.5 μg/ml chloramphenicol and incubate at 37 °C overnight to select for the fosmid clones. Subsequently these clones are plated, with the help of a colony-picker robot, in 384-well plates (LB, 12.5 μg/ml chloramphenicol and 15% glycerol). Plates are incubated overnight without shaking at 37 °C.
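The titer formula of Protocol 16 can be checked with a few lines of code. The sketch below reproduces the worked example (200 colonies, 1:10^4 dilution, 10 μl plated) and then estimates the fold dilution needed to reach a chosen colony density. The function names and the target of 500 colonies per plate are illustrative assumptions.

```python
def titer_cfu_per_ml(colonies: int, dilution_factor: float, plated_volume_ul: float) -> float:
    """Titer = colonies x dilution factor x 1000 ul/ml / volume plated (ul)."""
    if plated_volume_ul <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor * 1000.0 / plated_volume_ul


def dilution_for_target(titer: float, target_colonies: int, plated_volume_ul: float) -> float:
    """Fold dilution so that plated_volume_ul of diluted phage gives target_colonies."""
    colonies_if_undiluted = titer * plated_volume_ul / 1000.0
    return colonies_if_undiluted / target_colonies


# Worked example from the protocol.
titer = titer_cfu_per_ml(colonies=200, dilution_factor=1e4, plated_volume_ul=10)
print(f"{titer:.1e} cfu/ml")  # 2.0e+08 cfu/ml

# Hypothetical target: about 500 colonies from 10 ul of diluted phage.
print(dilution_for_target(titer, target_colonies=500, plated_volume_ul=10))  # 4000.0
```

The same calculation applies to the pfu/ml titers of the cosmid and lambda protocols, which refer back to this formula.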
The colony-picker robot is again used to produce copies of the 384-well plates.

Protocol 17 pLAFR3 Cosmid Library Production (cont. protocol 14)
Titering the cosmid packaging reaction

[1] Pellet the bacteria at 500 g for 10 minutes.
[2] Gently resuspend the cells in half the original volume with sterile 10 mM MgSO4.
[3] Dilute the cells to an OD600 of 0.5 with sterile 10 mM MgSO4. The bacteria should be used immediately following dilution.
[4] Prepare a 1:10 and a 1:50 dilution of the cosmid packaging reaction in SM buffer.
[5] Mix 25 μl of each dilution with 25 μl of the appropriate bacterial cells at an OD600 of 0.5 in a microcentrifuge tube and incubate the tube at room temperature for 30 minutes.
[6] Add 200 μl of LB broth to each sample and incubate for 1 hour at 37 °C, shaking the tube gently once every 15 minutes. This incubation allows time for expression of the antibiotic resistance.
[7] Spin the microcentrifuge tube for 1 minute and resuspend the pellet in 50 μl of fresh LB broth.
[8] Using a sterile spreader, plate the cells on LB agar plus 10 μg/ml tetracycline and incubate at 37 °C overnight to select for the cosmid clones.
[9] Count colonies and calculate the titer of the packaged phage particles as described above.
Based on the titer of the phage particles, dilute the phage particles with SM buffer to obtain the desired number of clones and clone density on the plate. Mix the diluted phage particles with E. coli DH5α or XL1Blue cells in the ratio of 100 μl of cells for every 10 μl of diluted phage particles. Spread the infected bacteria on LB agar plates containing 10 μg/ml tetracycline and 40 μg/ml XGal and incubate at 37 °C overnight to select for the cosmid clones. Subsequently these clones are plated, with the help of a colony-picker robot, in 384-well plates (LB, 10 μg/ml tetracycline and 15% glycerol). Plates are incubated overnight without shaking at 37 °C. The colony-picker robot is again used to produce copies of the 384-well plates.

Protocol 18 Lambda phage Library Production (cont. protocol 15)
Titering the packaging reaction
[1] Pellet the bacteria at 500 g for 10 minutes.
[2] Gently resuspend the cells in half the original volume with sterile 10 mM MgSO4.
[3] Dilute the cells to an OD600 of 0.5 with sterile 10 mM MgSO4. The bacteria should be used immediately following dilution.

[4] Prepare serial 1:10 dilutions, from 1:1 to 1:10^5, of the packaging reaction in SM buffer.
[5] Mix 1 μl of each dilution with 200 μl of the appropriate bacterial cells at an OD600 of 0.5 in a microcentrifuge tube and incubate the tube at 37 °C for 15 minutes, shaking the tube gently.
[6] Add 500 μl of NZY soft agar to each sample and plate on NZY agar plates. Incubate the plates overnight at 37 °C.
[7] Count plaques and calculate the titer of the packaged phage particles as described above.
After titering, which is used to calculate the library size, the library is further amplified. Amplification can be performed either in liquid medium or on agar plates.

For amplification in liquid culture use the following protocol:
[1] Mix 2 ml of a fresh overnight bacterial culture (OD600 0.95) with approximately 10^6 pfu of bacteriophage in a sterile culture tube.
[2] Incubate for 15 minutes at 37 °C to allow the bacteriophage particles to adsorb.
[3] Add 8 ml of pre-warmed LB medium (or NZY) and incubate with vigorous shaking until lysis occurs (6-12 h at 37 °C).
[4] After lysis has occurred, add 2 drops of chloroform and continue incubation for 15 minutes at 37 °C.
[5] Centrifuge at 4,000 g for 10 minutes at 4 °C.
[6] Recover the supernatant, add 1 drop of chloroform, and store at 4 °C. The titer of the stock should be approximately 10^10 pfu/ml, and this usually remains unchanged as long as the stock is stored at 4 °C.

For amplification on solid agar, E. coli XL1 MRF cells are prepared as described above in 10 mM MgSO4 at an OD600 of 0.5. Then proceed as follows:
[1] Prepare two aliquots, each containing approximately 5 × 10^4 pfu and 600 μl of E. coli cells. Do not exceed 300 μl of phage solution per 600 μl of cells.
[2] Incubate for 15 minutes at 37 °C with gentle shaking, then add 3 ml of NZY broth and spread the mixture over NZY agar plates (20 x 20 cm) pre-warmed to 37 °C.
[3] Incubate the plates at 37 °C for about 8-10 h, then add 8-10 ml of SM buffer and shake the plates gently (50 rpm) for an additional 10 h at 4 °C.
[4] Decant the buffer into a Falcon tube. Add a further 2 ml of SM buffer to the agar and pool it with the decanted solution.

[5] Add 5% (v/v) chloroform and incubate for 15 min at 4 °C.
[6] Centrifuge at 500 g for 10 minutes at 4 °C.
[7] Collect the supernatant and store it: one small aliquot at 4 °C for lab use and the rest at -70 °C after addition of 7% dimethyl sulfoxide (DMSO). The library is then ready to use.

Protocol 19 Activity screens
Lambda phage libraries will be used to screen for particular activities. NZYa plates of 22.5 x 22.5 cm, on which 7000-10000 phage particles may be screened, will be used.
[1] Pellet the bacteria at 500 g for 10 minutes.
[2] Gently resuspend the cells in half the original volume with sterile 10 mM MgSO4.
[3] Dilute the cells to an OD600 of 0.5 with sterile 10 mM MgSO4. The bacteria should be used immediately following dilution.
[4] Mix 1 μl of library with 2 ml of the appropriate bacterial cells at an OD600 of 0.5 in a 15 ml Falcon tube and incubate the tube at 37 °C for 15 minutes, shaking the tube gently.
[5] Add 40 ml of NZY soft agar to each sample and plate on NZY agar plates. Incubate the plates overnight at 37 °C.
[6] Spray the plate with substrate and watch for colour development.
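Planning an activity screen comes down to two numbers: how many large plates are needed, and how far the amplified stock must be diluted so that 1 μl carries the plating density quoted for Protocol 19 (7000-10000 phage particles per plate). The sketch below assumes a stock titer of 10^10 pfu/ml (the approximate value given for liquid amplification) and a hypothetical library of 50,000 clones. Function names are illustrative.

```python
import math


def plates_needed(library_size: int, pfu_per_plate: int) -> int:
    """Number of plates required to screen every clone once."""
    return math.ceil(library_size / pfu_per_plate)


def fold_dilution(titer_pfu_per_ml: float, target_pfu: float, plated_volume_ul: float = 1.0) -> float:
    """Fold dilution of the stock so that plated_volume_ul contains target_pfu."""
    pfu_in_volume = titer_pfu_per_ml * plated_volume_ul / 1000.0
    if pfu_in_volume < target_pfu:
        raise ValueError("stock is too dilute to reach the target in this volume")
    return pfu_in_volume / target_pfu


# Hypothetical library of 50,000 clones at the two ends of the quoted density range.
print(plates_needed(50_000, 10_000))  # 5
print(plates_needed(50_000, 7_000))   # 8

# Assumed stock titer of 1e10 pfu/ml, aiming for 8000 pfu in the 1 ul that is plated.
print(fold_dilution(1e10, target_pfu=8000))  # 1250.0
```

Screening each clone only once gives no redundancy. In practice the titer should be measured rather than assumed.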

III. In silico procedures
III.1. Bioinformatics for Metagenomics: A beginner's guide
Dr. Michael Richter. Marine Microbiology Group, IMEDEA

The sequencing of microbial genomes has become a fundamental approach for understanding complex biological networks. Currently, over 900 sequenced bacterial and archaeal genomes are publicly available and many more are on their way to being fully sequenced (www.genomesonline.org). The traditional cultivation-based sequencing approach has been complemented by ground-breaking cultivation-independent approaches, called metagenomics. Novel, cheap and ultra-fast sequencing technologies are generating enormous amounts of sequence data every day. On the one hand, this opens an unprecedented possibility to dig into the gold mine of sequence space; on the other, such large datasets raise several processing problems and drive current bioinformatic tools to their limit. In this practical course, the students will learn the basic bioinformatic concepts of (meta)genome analysis, based on a large genomic fragment recovered from the environment. Independent of the chosen sequencing strategy, all data generated go through similar pipelines based on generic bioinformatic tools and databases, to accumulate knowledge through functional assignments and data integration. The starting point is always the localization of functional regions such as protein-coding genes. These predicted protein-coding genes have to be compared in silico to proteins from a public database. These protein sequence comparisons are used to infer a potential function for newly sequenced genes by propagating information from already published knowledge, a process referred to as gene annotation. Further, it is a common problem in metagenomics that genomic fragments retrieved from environmental samples cannot be related to a specific group, because no phylogenetic marker genes are present.
In this course we will use the freely available software Tetra (www.megx.net/tetra/) to calculate tetranucleotide usage patterns and compare them to whole genome sequences. This method provides valuable information about the relatedness of the compared sequences. The computational needs for genome analysis and comparison are extensive and require a specialized infrastructure. This infrastructure includes powerful hardware systems consisting of a computing cluster and dedicated servers. Moreover, 'large' metagenomic datasets constitute an additional computational load, which must be processed through the same pipeline. To get an overview of the possibilities, the genomic fragment will be analyzed using the online MG-RAST server, a public resource for the automatic phylogenetic and functional analysis of metagenomes (metagenomics.nmpdr.org). This server provides a wide spectrum of tools for the annotation of sequence fragments, their phylogenetic classification and metabolic reconstruction. In summary, accurate and consistent data acquisition and processing is a prerequisite for generating biological understanding from the flood of sequence data. Future conceptual advances in microbial sciences will increasingly rely on the availability of an innovative computational infrastructure to interrogate these growing genomic and metagenomic datasets. But only through a close partnership of biologists and bioinformaticians will we finally be able to understand the complex interplay of biological entities that forms the basis of our planet Earth.
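The idea behind comparing tetranucleotide usage patterns can be illustrated with a deliberately simplified sketch: count all 256 tetranucleotides on both strands, convert them to relative frequencies, and correlate two profiles. This is not the Tetra program's own statistic; plain frequencies and a Pearson correlation are swapped in here as an assumption for illustration, and the two short sequences are made-up examples.

```python
from itertools import product
from math import sqrt

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # 256 words
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def tetra_freqs(seq: str) -> list[float]:
    """Relative tetranucleotide frequencies, counted on both strands."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for strand in (seq, revcomp(seq)):
        for i in range(len(strand) - 3):
            word = strand[i:i + 4]
            if word in counts:  # skips words containing N or other symbols
                counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence too short or contains no valid tetranucleotides")
    return [counts[w] / total for w in TETRAMERS]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant vector")
    return cov / (sx * sy)


# Made-up toy sequences; real fragments should be tens of kilobases long.
fragment = "ATGCGTACGTTAGCCGATAGCTAGGCTAACGTTGCA" * 5
genome = "GGCCGCGCGGCCTTAAGGCCGCGGATCCGGCGCGCC" * 5
print(round(pearson(tetra_freqs(fragment), tetra_freqs(genome)), 3))
```

Counting both strands makes the profile independent of which strand was sequenced, so a fragment and its reverse complement give identical vectors.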

III. In silico procedures
III.2. Phylogenetic reconstructions. An ARB software introduction
Pablo Yarza. Marine Microbiology Group, IMEDEA

Phylogenetic affiliation of the inserts in a metagenomic library is easier once we detect the presence of certain genes with phylogenetic signal (such as 16S and 23S rRNAs) in a given clone. Far from being common, good phylogenetic markers are restricted to a very small group of molecules that must fulfill most of the following requirements: to be ubiquitous, to have enough informational power, to have well-documented orthologs in public databases, and to support the current taxonomic schema. The abundance of these markers and other potentially interesting genes in a metagenomic library depends on the library coverage and the phylotype richness of the sample source. These and other reasons make the construction and analysis of 16S rRNA clone libraries a recommendable step prior to the metagenomic approach in environmental samples. In the best scenario, inserts containing complete or partial SSU/LSU sequences can be optimally affiliated. In the absence of ribosomal markers, a small set of genes from those classified as 'housekeeping genes' can be used, although they may generate low-resolution phylogenetic reconstructions. In the worst case, where no molecule with phylogenetic signal exists, other methods based on sequence composition can be used to hypothesize affiliation to known biodiversity. A phylogenetic reconstruction comprises three main steps: (i) searching and retrieving reference sequences from comprehensive databases, (ii) aligning the sequences to verify positional orthology, and (iii) submitting the final set of sequences to different treeing methodologies to guarantee a stable final topology. Nowadays a broad range of online tools and public databases facilitate phylogenetic inference.
Among them, the following are highly relevant: the SILVA project (http://www.arb-silva.de), which hosts one of the largest curated databases of SSU and LSU genes, with more than 300,000 entries; the All-Species Living Tree Project (http://www.arb-silva.de/projects/living-tree), which has for the past year maintained a curated database built only on type-strain sequences; the online automatic aligner for ribosomal sequences, SINA (http://www.arb-silva.de/aligner); and the free ARB software package (http://www.arb-home.de), which integrates under a single interface all the tools needed for any kind of phylogenetic reconstruction based on either ribosomal markers or coding genes. This practical course will consist of a brief introduction to phylogenetic reconstruction through a number of exercises: retrieving sequences from public repositories, importing them into the ARB software, performing alignments with a secondary-structure-based editor, calculating some trees and evaluating the results.

III. In silico procedures
III.3. (Meta)genomics assembling methodologies
Dr. Giuseppe D'Auria. Cavanilles Institute on Biodiversity and Evolutionary Biology, Valencia, Spain

The exponential improvement of sequencing technologies is outpacing our data analysis skills. The latest high-throughput technologies, such as pyrosequencing (454-Roche), Solexa and SOLiD, together with the still useful Sanger method, give researchers important tools for obtaining sequence information from single cultivated microbes (the best case), from complex communities, which require a metagenomic approach, or from more complex eukaryotic systems. In all these settings, bioinformatics is the key step for reaching the information hidden in the data. The choice of a good sequencing strategy depends first on the budget of the lab and then on the organism studied and its genomic history (sample with single or multiple organisms, genome length, genome plasticity, presence of repeated sequences and mobile elements). In all cases, access to different technologies producing different types of sequences (in terms of length and quality) is extremely helpful for offsetting the pros and cons of each technology. The bioinformatic effort is therefore closely tied to the correct choice of strategy. This section is divided into two parts. The first gives hints about sequence formats, format conversions, accessing sequence quality data, and assembly strategies using the open source Staden Package and MIRA (Mimicking Intelligent Read Assembly). The second part is centered on assembly and on complete genome data visualization and comparison.
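As a small illustration of sequence formats, format conversion and access to quality data, the sketch below reads FASTQ records, decodes their quality strings and writes FASTA. It assumes simple four-line FASTQ records with Sanger-style Phred+33 quality encoding; other encodings exist, and this is neither the Staden Package's nor MIRA's own converter. The two reads are made-up examples.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, phred_scores) from a four-line-per-record FASTQ file."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip()
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        # ASSUMPTION: Phred+33 encoding ('!' = 0, 'I' = 40).
        yield header[1:], seq, [ord(c) - 33 for c in qual]


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one record as FASTA, wrapping the sequence at `width` columns."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


def mean_quality(scores: list[int]) -> float:
    """Arithmetic mean of the Phred scores of one read."""
    return sum(scores) / len(scores)


# Made-up example reads.
example = StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!II\n")
for name, seq, scores in read_fastq(example):
    print(to_fasta(name, seq), end="")
    print(f"# mean Phred quality of {name}: {mean_quality(scores):.1f}")
```

Converting to FASTA discards the quality values, so the original FASTQ files should be kept for any assembler that uses them.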

IV. Contacts
List of participants

Manuel Ferrer, CSIC Institute of Catalysis, Madrid. e-mail: mferrer@icp.csic.es
Ana Beloqui, CSIC Institute of Catalysis, Madrid. e-mail: abeloqui@icp.csic.es
Nieves López-Cortés, CSIC Institute of Catalysis, Madrid. e-mail: nieveslopez@icp.csic.es
José Maria Vieites, CSIC Institute of Catalysis, Madrid. e-mail: vieites@icp.csic.es
María Eugenia Guazzaroni, CSIC Institute of Catalysis, Madrid. e-mail: meugenia@icp.csic.es
Yamal Al-ramahi, CSIC Institute of Catalysis, Madrid. e-mail: yamal_a_g@icp.csic.es
Azam Ghazi, CSIC Institute of Catalysis, Madrid. e-mail: azamghazi@yahoo.com
Javier Tamames, Cavanilles Institute on Biodiversity and Evolutionary Biology, University of Valencia. e-mail: javier.tamames@uv.es
Giuseppe D'Auria, Cavanilles Institute on Biodiversity and Evolutionary Biology, University of Valencia. e-mail: Giuseppe.Dauria@uv.es