Web-based Gene Expression Handling with the Genetic Data Warehouse Jörg Lange, Toralf Kirsten Microarray-Workshop, June 2006
Outline Requirements for Gene Expression Analyses Intensity values MIAME Genetic Data Warehouse Usage Chip Handling Chip and Gene Annotations Analyses and Report
Intensity values Huge amounts of data with every new chip Data type: Numeric Must be interpreted and analyzed with statistical and empirical methods Need annotations for interpretation and verification Gene annotation: Publicly available in databases: GenBank, UniProt, KEGG Chip/experiment annotation: Entered manually by the experimenter; includes sample data, array data and laboratory data
MIAME Minimum Information About a Microarray Experiment needed to enable the interpretation of the results of the experiment unambiguously and potentially to reproduce the experiment Checklist of chip annotations that an experimenter has to capture Standard developed by the Microarray Gene Expression Data Society (MGED) Required for publication, e.g. in Nature Genetics, Bioinformatics http://www.mged.org/workgroups/miame/miame.html
Components of MIAME
MIAME Example Entries Experiment design: Goal of the experiment, experimental factors and design Sample: Origin and characteristics (name, provider, gender, age) and manipulations (growth condition, treatments) Hybridization: Used protocols and conditions (temperature, duration) Measurement data: Raw and normalized data, used image scanning hardware and software and processing procedures Array: Platform, location of each spot [not necessary for standard array]
MIAMExpress Exact implementation of MIAME for annotation of microarray data Developed at the European Bioinformatics Institute (EBI), Hinxton Utilization of controlled vocabularies Annotation export to the public GE-Repository ArrayExpress Disadvantages Many input fields => error-prone, because the same entities can be described in different ways No query function No import function Not extendable for further annotations, e.g. those captured in studies
Genetic Data Warehouse Developed at IZBI Leipzig Handling, analysis and storage of large volumes of chip-based genetic data Microarray-based gene expression data (Affymetrix) Matrix-CGH (Array-CGH) data Web-based interfaces for data import and export and for running analysis methods Load CEL files of chips and preprocessed data Chip annotation using predefined and extendable templates Visualize intensity values Generate statistical reports Integration of public annotation data Data export in tab-delimited form
www.izbi.de/geware Login of a user into a user group to which she has access
Current Applications GeWare is used in two collaborative cancer research studies Molecular Mechanism in Malignant Lymphoma http://www.lymphome.de/projekte/mmml German Glioma Network http://www.gliomnetzwerk.de/ Collaboration: Germany-wide clinical, pathological and molecular-genetics centers Heterogeneous data for hundreds of patients Harmonized study design managed by clinical trial software => integrated as chip annotation GeWare is open to all researchers to share its functions
Apply Chips Apply the prompted number of chips Create or Append Experiments as Collection of Chips
Available Chip Types in GeWare Gene Expression: Affymetrix GeneChips Human: HG-U95A, HG-U95Av2, HG-U133A, HuGeneFl, Hu35KsubA, HG-U133_Plus_2 Mouse: MG_U74Av2, MOE430A, Mouse430_2 Rat: RAE230A C. elegans Further on demand (we load cdf then) Matrix-CGH Laboratory chips produced in Ulm within the MMML project Upload in the form of tab-delimited files by the user
Chip Annotation Goal: Utilization of a uniform and comprehensive annotation for later analysis Focus-dependent annotation data in different clinical studies, e.g. Lymphoma vs. Glioma Annotation templates Collections of annotation categories (parameters) for which the annotation values have to be captured Generic management of metadata and values Hierarchical arrangement of categories Definition of MIAME-compliant templates Controlled vocabularies (predefined terms)
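The template idea above (hierarchical categories, some restricted to a controlled vocabulary) can be sketched as a small data structure. This is only an illustration and not GeWare's actual schema: the names `Category`, `leaf_paths` and `validate`, as well as the example categories, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Category:
    """One annotation category (parameter) in a template (hypothetical model)."""
    name: str
    vocabulary: Optional[List[str]] = None  # None means free text is allowed
    children: List["Category"] = field(default_factory=list)


def leaf_paths(category, prefix=""):
    """Yield (path, category) for every leaf category below `category`."""
    path = f"{prefix}/{category.name}" if prefix else category.name
    if not category.children:
        yield path, category
    for child in category.children:
        yield from leaf_paths(child, path)


def validate(template, values):
    """Return a list of problems found in one chip's annotation values."""
    problems = []
    for path, cat in leaf_paths(template):
        if path not in values:
            problems.append(f"missing: {path}")
        elif cat.vocabulary is not None and values[path] not in cat.vocabulary:
            problems.append(f"not in vocabulary: {path}={values[path]}")
    return problems


# Illustrative template only; the category names are assumptions.
template = Category("Sample", children=[
    Category("Gender", vocabulary=["female", "male", "unknown"]),
    Category("Age"),
    Category("Treatment", children=[Category("Growth condition")]),
])

print(validate(template, {"Sample/Gender": "f", "Sample/Age": "54"}))
```

Restricting a category to predefined terms is what prevents the same entity from being described in different ways, the weakness noted for free-text input fields.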
Chip Annotation (2) List of Chips with filters to decrease the number of entries Navigation bar Select values from specified controlled vocabulary
Browse Chip Annotation Search for relevant molecular-biological data in GeWare using clinical data Group data for later reuse in other analyses
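As a rough sketch of this step, chips can be selected by their clinical annotation and the result stored as a named chip group for later analyses. The function `select_chips`, the annotation keys and the chip ids are hypothetical and do not reflect GeWare's interface.

```python
def select_chips(annotations, **criteria):
    """Return sorted chip ids whose annotation matches every criterion."""
    return sorted(
        chip_id
        for chip_id, ann in annotations.items()
        if all(ann.get(key) == value for key, value in criteria.items())
    )


# Illustrative chip annotations (assumed keys and values).
annotations = {
    "chip_01": {"diagnosis": "lymphoma", "gender": "female"},
    "chip_02": {"diagnosis": "control", "gender": "male"},
    "chip_03": {"diagnosis": "lymphoma", "gender": "male"},
}

# Store the selection under a name so that later analyses can reuse it.
chip_groups = {}
chip_groups["lymphoma_male"] = select_chips(
    annotations, diagnosis="lymphoma", gender="male"
)
print(chip_groups)
```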
Browse Gene Annotation Fetch Probe Set Names and Chip Type by querying gene annotations such as Gene Symbol, Map Location, OMIM Annotation attributes are appended
Preprocessing List of available preprocessing methods Selection of the chips with CEL files
Group Management GE Data CGH Data Chip Annotation (e.g. clinical data) Group Management Chip groups Gene- and clone groups Parameter groups Analyses Visualization Export
Visualization Display signal values of a gene and chip group in a line plot Display an M/A plot of 2 selected chips: signal difference (Chip1 - Chip2) on the y axis and signal sum (Chip1 + Chip2) on the x axis Draw a heatmap of a chip and gene group Computed by the statistical software R Selected chip annotation as class label Output: *.png or *.pdf Also available for Matrix-CGH
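The M/A plot coordinates described above can be computed as follows. This is a minimal sketch that follows the slide's definition (difference on the y axis, sum on the x axis); it assumes the signal values are already on a log scale, which the slides do not state. The function name `ma_points` and the probe names are hypothetical.

```python
def ma_points(chip1, chip2):
    """Return (x, y) pairs per shared probe set: x = sum, y = difference."""
    shared = sorted(set(chip1) & set(chip2))  # only probe sets on both chips
    return [(chip1[p] + chip2[p], chip1[p] - chip2[p]) for p in shared]


# Illustrative signal values (assumed to be log-transformed).
chip1 = {"probe_a": 8.0, "probe_b": 6.5, "probe_c": 10.0}
chip2 = {"probe_a": 7.0, "probe_b": 6.5, "probe_d": 4.0}

print(ma_points(chip1, chip2))
```

Points far from y = 0 mark probe sets whose signal differs strongly between the two chips. Other M/A conventions use the mean rather than the sum on the x axis, which only rescales that axis.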
Visualization in Heatmaps (1) List of gene groups according to the selected chips and chip type e.g. HG-U133A Additional: Chip annotation as class label
Visualization in Heatmaps (2) Heatmap including hierarchical cluster analysis [Figure: heatmap; axes: Chips / Patients and Genes; additional annotation class label: Stage (# infected lymph nodes)]
Statistical Reports Standard error Chip and gene groups to filter Generation of new gene groups Flexible report extension with annotation attributes Available NetAffx annotation attributes Further annotation attributes Downloadable results [Screenshot callouts: Chip group filter (Lymphome, Controls); View further gene annotations; Gene group filter; Selection to store genes of interest in a group; Selected annotation attributes]
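One way such a report row could be computed is sketched below: for each probe set in a gene group, the mean and the standard error of the mean across a chip group. The slides do not specify GeWare's exact formula, so the sample-based standard error used here is an assumption; all names and values are hypothetical.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for a list of signals."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two chips to estimate a standard error")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def report(matrix, chip_group, gene_group):
    """Build one (probe set, mean, standard error) row per probe set."""
    rows = []
    for probe in gene_group:
        values = [matrix[probe][chip] for chip in chip_group]
        mean, se = mean_and_standard_error(values)
        rows.append((probe, mean, se))
    return rows


# Illustrative expression matrix: probe set -> chip -> signal value.
matrix = {
    "p1": {"c1": 1.0, "c2": 3.0},
    "p2": {"c1": 5.0, "c2": 5.0},
}

for row in report(matrix, ["c1", "c2"], ["p1", "p2"]):
    print(row)
```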
Statistical Reports - Correlation Specify one probe set whose correlation with a set of probe sets should be computed Additional annotation Probe sets of the group sorted by correlation
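The correlation report can be sketched as follows. The slides do not name the correlation measure, so Pearson correlation is assumed here; the function names and the example data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def correlation_report(matrix, reference, gene_group, chip_group):
    """Correlate `reference` with each probe set, sorted by correlation."""
    ref = [matrix[reference][c] for c in chip_group]
    rows = [
        (probe, pearson(ref, [matrix[probe][c] for c in chip_group]))
        for probe in gene_group
        if probe != reference
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Illustrative expression matrix: probe set -> chip -> signal value.
matrix = {
    "ref": {"c1": 1.0, "c2": 2.0, "c3": 3.0},
    "p1": {"c1": 2.0, "c2": 4.0, "c3": 6.0},
    "p2": {"c1": 3.0, "c2": 2.0, "c3": 1.0},
    "p3": {"c1": 1.0, "c2": 3.0, "c3": 2.0},
}

for row in correlation_report(matrix, "ref", ["p1", "p2", "p3"], ["c1", "c2", "c3"]):
    print(row)
```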
Data Export All experimental files imported or generated by the user's own group CEL files, Affymetrix Reports (RPT) if available Files that are generated by analyses, e.g. TIF Other data Intensity values (expression matrix) determined by chip and gene group Default: tab-delimited, other separators can be specified Flexible extension by chip annotation data
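A sketch of the expression matrix export described above: rows are restricted to a gene group, columns to a chip group, the separator defaults to a tab, and optional chip annotation rows extend the output. The layout (annotation rows placed below the header) and all names are assumptions and not GeWare's actual file format.

```python
import csv
import io


def export_matrix(matrix, chip_group, gene_group,
                  annotations=None, annotation_keys=(), delimiter="\t"):
    """Return the expression matrix as delimited text (tab by default)."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["probe_set"] + list(chip_group))
    # Optional extension: one row per selected chip annotation attribute.
    for key in annotation_keys:
        writer.writerow(
            [key] + [annotations[chip].get(key, "") for chip in chip_group]
        )
    for probe in gene_group:
        writer.writerow([probe] + [matrix[probe][chip] for chip in chip_group])
    return out.getvalue()


# Illustrative data (assumed names and values).
matrix = {"p1": {"c1": 1.5, "c2": 2.0}}
annotations = {"c1": {"diagnosis": "lymphoma"}, "c2": {"diagnosis": "control"}}

print(export_matrix(matrix, ["c1", "c2"], ["p1"],
                    annotations=annotations, annotation_keys=["diagnosis"]))
```

Passing `delimiter=";"` would produce a semicolon-separated file instead of the tab-delimited default.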
Outlook Integration of further genetic data into GeWare Single nucleotide polymorphisms (SNP) Tiling arrays More analysis methods based on project members' needs Differential gene expression analysis Chip quality control by M. Rosolowski
Thank you for your attention!