Visual Mining for Big Data
Big Dive, June 21st, 2013
Alessandro Piglia, Kairos3D
Where do we come from?
  Kairos3D comes from real-time 3D graphics
  Serious Games (virtual visits, training for industry operators, ...)
  Highly immersive visualization (up to CAVE environments)
the BIG idea: Visual Mining
  Use interactive 3D technology to enable Big Data analytics
  Visually represent your Big Data sets
  View thousands of variables at the same time
  Interactively analyze single data items or groups
  Visually discover data patterns and correlations and gain insights
«positioning»
What the project is
  Software platform developed by Kairos3D
  Fully C++, heavily using GPU processing
  Packaged as a generic API (still evolving)
  Based on proprietary code, integrating open-source modules
    Main library: OpenSceneGraph (3D engine)
  Derived from experience on applications with huge data sets
    See examples and demos
And what the project is NOT
  A complete Big Data platform
    We only focus on the presentation layer (visualization and analysis)
    We don't directly access the (big) data store
    We rely on other tools for querying and preliminary normalization
  A commercial tool
    Mainly used internally or by trained partners to create applications
  A cloud tool
    It is a client application (multi-platform); it does not run in a Web browser (yet)
Long-term goals (potential)
  Create a generic data model (and related file format)
    Make it easy to input data from any source
  Generalize visual metaphors and keep adding new ones
    Provide different representations for different data structures
  Generalize analytics functionality
    Open to scripting and/or plugin creation
  Implement wizards to quickly assemble everything
    Which would mean: quickly create any app
Roadmap (potential)
  [Diagram; recoverable labels: any data (txt, xml, xls, db, network), metadata mapping, metaphor library, and a formula graphic]
Example 1: Time Series
  The problem: visualize historical series for road traffic data
    Big volumes: over 16,000 values every minute
    Other info to integrate: event database
  The goals:
    Spot anomalies in traffic flow
    Try to correlate events and anomalies
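To give a sense of scale, 16,000 values per minute is 16,000 x 60 x 24 = 23,040,000 values per day. The slides do not say how anomalies are spotted beyond visual inspection, so the following is only a minimal sketch of one common numerical aid, a rolling z-score, assuming a single sensor's per-minute series held in memory. The function name `flagAnomalies` and its parameters are hypothetical and not part of the Kairos3D API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the Kairos3D platform.
// Returns the indices of samples that deviate from the mean of the
// preceding `window` samples by more than `threshold` standard deviations.
std::vector<std::size_t> flagAnomalies(const std::vector<double>& series,
                                       std::size_t window,
                                       double threshold) {
    assert(window >= 2 && threshold > 0.0);
    std::vector<std::size_t> flagged;
    for (std::size_t i = window; i < series.size(); ++i) {
        // Mean of the trailing window [i - window, i).
        double sum = 0.0;
        for (std::size_t j = i - window; j < i; ++j) {
            sum += series[j];
        }
        const double mean = sum / static_cast<double>(window);

        // Population standard deviation of the same window.
        double squared = 0.0;
        for (std::size_t j = i - window; j < i; ++j) {
            const double d = series[j] - mean;
            squared += d * d;
        }
        const double stddev = std::sqrt(squared / static_cast<double>(window));

        const double deviation = std::fabs(series[i] - mean);
        // A perfectly flat window has zero spread: flag any departure from it.
        const bool anomalous = stddev > 0.0 ? deviation > threshold * stddev
                                            : deviation > 0.0;
        if (anomalous) {
            flagged.push_back(i);
        }
    }
    return flagged;
}
```

Calling `flagAnomalies(series, 4, 3.0)` on a series that alternates between 10 and 11 with a single spike of 50 flags only the spike. The flagged minutes could then be matched against timestamps in the event database to look for correlations.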
Example 2: 3D CMDB
  The problem: visualize a complex IT infrastructure
    Big volumes: thousands of items in a hierarchy
    Clustered in hundreds of groups (subnets, IT processes, ...)
    Other info to integrate: monitoring data (system status)
  The goals:
    Check overall IT organization and spot potential issues
    Try to correlate malfunctions to their system causes
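The kind of data behind this example can be sketched as a tree of configuration items, each carrying a monitoring status. The sketch below is an illustration under assumptions, not the platform's data model: the `ConfigItem` type, the three-level `Status` scale, and both functions are hypothetical. It rolls the worst status up to any group, so a subnet can be colored by its most severe member, and lists the items that are actually failing beneath it.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical status scale; higher values are more severe.
enum class Status { Ok = 0, Warning = 1, Critical = 2 };

// Hypothetical CMDB node: a group (subnet, IT process) or a single item.
// A std::vector of the enclosing type requires C++17.
struct ConfigItem {
    std::string name;
    Status ownStatus = Status::Ok;
    std::vector<ConfigItem> children;
};

// Most severe status found in the subtree rooted at `item`.
Status worstStatus(const ConfigItem& item) {
    Status worst = item.ownStatus;
    for (const ConfigItem& child : item.children) {
        worst = std::max(worst, worstStatus(child));
    }
    return worst;
}

// Appends the names of all items in the subtree whose own status is not Ok,
// in depth-first order, so a flagged group can be traced to its causes.
void collectFailing(const ConfigItem& item, std::vector<std::string>& out) {
    if (item.ownStatus != Status::Ok) {
        out.push_back(item.name);
    }
    for (const ConfigItem& child : item.children) {
        collectFailing(child, out);
    }
}
```

For a root group containing a healthy subnet with one critical host, `worstStatus` on the root returns `Status::Critical` and `collectFailing` yields only that host's name. Recursion keeps the sketch short; a hierarchy with thousands of items is shallow enough for this to be practical.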
Example 3
the problem: genomic data complexity is increasing
  modern DNA sequencing produces billions of values per sample
  new era of cloud-based systems for managing, analyzing and sharing genomic data
    MORE DATA per single sample
    MORE SAMPLES in clouds
    MORE TOOLS for researchers
    MORE data PERSPECTIVES
  the analysis process is fragmented, as relevant resources are scattered among a plethora of different software tools and databases
  scientists need to analyze the structure and dynamics of a number of related variables
who has the problem?
  Genomic research trend
    2000: Human Genome published. The beginning of the digital age of molecular research.
    TODAY: more than 30,000 biomedical workgroups are publishing analyses of genomic data.
    TOMORROW: there will be 10X workgroups switching to genomics for their research.
GenomeCruzer: what is the value
  3D big data visualization has the potential to dramatically increase the volume of cancer research and shorten the path to cures
  makes the analysis process accessible to a wider range of researchers, even those with no bio-software skills, such as biologists and physicians
  the tool ultimately slashes the timelines of analysis and allows unsupervised, fast data analysis
  unique environment where the whole data set can be visualized and explored, together with its data patterns and relations
  expand the current reach of the software to attack new markets / new segments (e.g. agrigenomics / personal genomics)
GenomeCruzer today
  Preliminary release rolled out in production environment @ the Institute for Cancer Research and Treatment at Candiolo (Torino, Italy)
  Free discovery version completed and available for download (includes 3 case studies) @ http://genomecruzer.com
  Patent pending
  "Thanks for showing us the fantastic software"
    黎文雁 Wenyan Li, BGI-Europe
  "I really enjoyed the demo and very much like a 3D approach to looking at this complex data"
    Lukas J Smink, PhD, Manager, Regional Marketing EMEA, Illumina
  "The amazing thing is the speed at which we are exploring huge datasets and discovering features we never noticed before"
    Dr Andrea Bertotti, researcher @ the Institute for Cancer Research and Treatment
  "Dr Enzo Medico has captivated an entire hall with his presentation about GenomeCruzer"
    Dr Ovidiu Balacescu, The Oncology Institute, Cluj-Napoca
GenomeCruzer today
  Award for the AACR Annual Meeting 2013
  Ongoing evaluation of discovery release
  Great feedback received after poster presentation and demo at the second annual TCGA Scientific Symposium, November 27-28, 2012 in Washington, D.C.
  GenomeCruzer evaluation (full TCGA datasets analysis + MBI samples data analysis)
Kairos3D s.r.l.
Corso Casale 297 Bis - 10132 Torino, Italy
VAT number: 10190870013
info@kairos3d.it
thank you!