Big Data Visualization for Genomics
Luca Vezzadini, Kairos3D

Why GenomeCruzer?
- The amount of data from DNA sequencing is growing
  - Modern hardware produces billions of values per sample
  - Scientists need to handle hundreds of thousands of variables
- Traditional tools are not suited to handle such complexity
  - Partial views of the data set, not easy to correlate different data sets, long computation times, several tools involved in a session, ...
- GenomeCruzer builds on a 3D Big Data analytics platform
  - Interactive 3D scene that enables capturing, visualizing and navigating hundreds of thousands of data items
Kairos3D MIMOS 2012 Conference

Project information
- Developed by Kairos3D
  - Based on a proprietary software platform
  - Derived from experience with Big Data applications
- Designed with the Institute for Cancer Research and Treatment
  - Part of Fondazione Piemontese per la Ricerca sul Cancro
  - Strong knowledge of cancer research and bioinformatics
- Tested on well-known international data sets
- Project web site: genomecruzer.com

Current state of the art
- The 2D heat map
  - Tabular data set where each column is a sample (normally a patient) and each row is a gene
  - Colors represent measurements for each gene in each sample
  - Rows and columns can be grouped in clusters (e.g. male/female patients)
  - Different measurements (e.g. gene expression and copy number) result in more heat maps
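
The heat map layout described above can be sketched in a few lines of Python. This is only an illustration, not GenomeCruzer code: the gene and sample names, the values, and the blue-to-red color scale are all assumptions made for the example.

```python
# Hypothetical toy data set: rows are genes, columns are samples (patients).
genes = ["GENE_A", "GENE_B", "GENE_C"]
samples = ["P1", "P2", "P3", "P4"]
expression = [
    [0.0, 0.5, 1.0, 0.25],
    [2.0, 1.0, 0.0, 1.5],
    [1.0, 1.0, 1.0, 1.0],
]


def value_to_rgb(value, vmin, vmax):
    """Linearly map a value to a blue-to-red color (assumed color scale)."""
    if vmax == vmin:
        t = 0.5  # flat data: use the middle of the scale
    else:
        t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))
    return (round(255 * t), 0, round(255 * (1 - t)))


def heat_map_colors(matrix):
    """Return one RGB color per cell, scaled over the whole matrix."""
    flat = [v for row in matrix for v in row]
    vmin, vmax = min(flat), max(flat)
    return [[value_to_rgb(v, vmin, vmax) for v in row] for row in matrix]


colors = heat_map_colors(expression)
```

Each entry `colors[i][j]` is the color of gene `i` in sample `j`. A second measurement type (e.g. copy number) would need its own matrix and, in the 2D approach, its own heat map.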

GenomeCruzer view [1/2]
- A view from above, which looks like a regular heat map
- Clusters are more visibly separated

GenomeCruzer view [2/2]
- A full 3D view
- Two data sets are now displayed together (one mapped to color, the other to height)
- The user can select which data set to map to each of the two parameters
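
A minimal sketch of the two-parameter mapping, assuming both data sets share the same gene-by-sample layout. The function name `build_bars` and the dictionary keys are hypothetical and chosen for illustration only; they do not describe GenomeCruzer's actual data model.

```python
def build_bars(datasets, color_key, height_key):
    """Combine two equally shaped matrices into one list of 3D bars.

    `datasets` maps a name to a matrix (list of rows). The caller chooses
    which data set drives the color and which drives the height.
    """
    color_data = datasets[color_key]
    height_data = datasets[height_key]
    if len(color_data) != len(height_data):
        raise ValueError("data sets must have the same number of rows")
    bars = []
    for r, (color_row, height_row) in enumerate(zip(color_data, height_data)):
        if len(color_row) != len(height_row):
            raise ValueError("data sets must have the same number of columns")
        for c, (cv, hv) in enumerate(zip(color_row, height_row)):
            bars.append({"row": r, "col": c, "color_value": cv, "height": hv})
    return bars


# Hypothetical values for two measurement types on the same genes and samples.
example_datasets = {
    "expression": [[1, 2], [3, 4]],
    "copy_number": [[2, 2], [1, 3]],
}
bars = build_bars(example_datasets, "expression", "copy_number")
```

Swapping the two keys in the call swaps the roles of the data sets, which mirrors the user-selectable mapping mentioned on the slide.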

Main features [1/2]
- Interactive 3D scene with data walls
  - Each wall displays a different type of information
  - Relations among wall elements are displayed
  - When the user selects items on a wall, the system updates the values of all related items, including those on other walls
- Hierarchical data model to convey clustering information
  - Level of detail (LOD), both automatic and manual
- 3D view is optimized for better readability
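
One way to picture a hierarchical data model with level of detail is a cluster tree in which a collapsed cluster is drawn as a single summary item. The sketch below is an assumption-laden illustration: the `Node` class, the `summarize` function, and the choice of the mean as the summary value are all hypothetical and are not taken from GenomeCruzer.

```python
from statistics import mean


class Node:
    """A cluster (with children) or a single data item (with a value)."""

    def __init__(self, name, value=None, children=None):
        self.name = name
        self.value = value
        self.children = children or []


def leaf_values(node):
    """Collect the values of all leaves below (or at) this node."""
    if not node.children:
        return [node.value]
    values = []
    for child in node.children:
        values.extend(leaf_values(child))
    return values


def summarize(node, max_depth, depth=0):
    """Return the (name, value) pairs visible at the requested level of detail.

    Clusters deeper than `max_depth` are collapsed into one item whose value
    is the mean of their leaves (an assumed aggregation rule).
    """
    if not node.children or depth == max_depth:
        return [(node.name, mean(leaf_values(node)))]
    visible = []
    for child in node.children:
        visible.extend(summarize(child, max_depth, depth + 1))
    return visible


# Hypothetical patient clusters, following the male/female example.
tree = Node("all", children=[
    Node("male", children=[Node("P1", 1.0), Node("P2", 3.0)]),
    Node("female", children=[Node("P3", 5.0)]),
])
```

With this tree, `summarize(tree, 0)` shows one item for the whole data set, `summarize(tree, 1)` shows one item per cluster, and `summarize(tree, 2)` shows every patient. A manual LOD control would set `max_depth` directly; an automatic one could derive it from the camera distance.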

Main features [2/2]
- Statistical analysis functions
  - The user can select items on any wall
  - The system updates the values of all related items, including those on other walls
  - The user can select which operations to apply
  - For example: selecting a group of patients updates the values of all genes and gene groups by computing the average value of each gene over the selected patients
- Available on desktop and laptop computers
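
The patient-selection example can be sketched as follows. The function names, the toy matrix, and the way gene groups are averaged (mean of the per-gene averages) are assumptions for illustration; the slides do not specify how GenomeCruzer computes group values.

```python
from statistics import mean


def gene_averages(matrix, samples, selected):
    """Average each gene (row) over the selected samples (columns)."""
    if not selected:
        raise ValueError("select at least one sample")
    cols = [samples.index(s) for s in selected]
    return [mean(row[c] for c in cols) for row in matrix]


def group_averages(per_gene, genes, groups):
    """Average the per-gene values inside each gene group (assumed rule)."""
    index = {gene: i for i, gene in enumerate(genes)}
    return {
        name: mean(per_gene[index[g]] for g in members)
        for name, members in groups.items()
    }


# Hypothetical data: 3 genes (rows) measured in 3 patients (columns).
genes = ["GENE_A", "GENE_B", "GENE_C"]
samples = ["P1", "P2", "P3"]
matrix = [
    [1.0, 3.0, 5.0],
    [2.0, 2.0, 8.0],
    [0.0, 4.0, 4.0],
]

per_gene = gene_averages(matrix, samples, ["P1", "P3"])
per_group = group_averages(per_gene, genes, {"G1": ["GENE_A", "GENE_B"]})
```

Selecting a different set of patients only changes the `selected` argument; the gene and gene-group values are then recomputed from the same matrix.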

LOD example

Data walls (beyond heat map)

A three-wall view
- A selection on one wall updates the values on the other walls

Scientific applications
- IRCC has prepared 3 case studies using GenomeCruzer
  - More details on genomecruzer.com
  - Based on colon cancer data sets from The Cancer Genome Atlas
- GenomeCruzer greatly simplifies integrative analysis
  - Simultaneous visualization of gene sequencing, copy number, expression or methylation data
  - No other easy way to correlate two linked data sets
  - Fast screening of working hypotheses (search for recurrent patterns)

Future work
- A discovery release is currently available for free
  - Includes the 3 case studies and a video tutorial
- The scientific dissemination process is ongoing
  - Feedback from the international community is needed
- Next planned improvements include:
  - Generalized data input/output system
  - Extended UI interaction to create and edit clusters
  - Port to tablet devices (iOS and Android)
  - Address a wider user base (biologists, pharma industries, ...)

GenomeCruzer video