Big data for better science. Data Science Institute



Sense and Sensibility
Julie A. McCann

Adaptive Embedded Systems
The aim of the Adaptive Emergent Systems Engineering (AESE) group in the Department of Computing is to examine the relationships between embedded systems and their environments (physical and human), to better understand their behaviours and impacts, and to exploit this knowledge to enhance the performance of such systems.

ICRI Cities London Living Lab (L3)
The Hyde Park L3 platform will advance the use of sensing and social platforms deployed in the wild to support research into ecology, air quality, water quality, noise and light pollution, public engagement, and the communication and manageability of sensed data. This will enable, for example, the Royal Parks authority to visualise real-time and near-real-time data through a simple dashboard alongside deeper analysis of the raw data. In the Isis Education Centre, educators can also engage school children and the general public with a better understanding of the park and its ecology, usage and history.

Crowdsourcing and Opportunistic Networking
Future smart cities will require sensing on a scale hitherto unseen. Fixed infrastructures have limitations regarding sensor maintenance, placement and connectivity. Employing the ubiquity of mobile phones is one approach to overcoming some of these problems, whereby the phone carries the data. This work is the first to exploit underlying social networks and financial incentivisation: by combining network science principles with Lyapunov optimisation techniques, we have shown that global social profit across a hybrid sensor and mobile phone network can be maximised. (A minimal sketch of the drift-plus-penalty idea follows this section.)

Smart Water Systems
Water networks are moving away from sparsely instrumented telemetry systems. The vast majority of next-generation approaches to managing such networks rely on denser sensor networking, but these still require data to be sent back to core management servers. Actuation technologies are becoming more on-line and in-line with sensor networking. This brings opportunities to make water networks smarter and, in turn, more resilient and optimal. Such a network is an example of a cyber-physical system (CPS). With sample rates of up to 120/s there is a strong need for big data analytics and adaptive cloud computing.

Acknowledgements
London Living Labs is sponsored by Intel and Future Cities Catapult. Smart Water Systems is sponsored by NEC Japan and FP7 WISDOM; photos by Ivan Stoianov. Opportunistic Sensing is sponsored by the Intel Collaborative Research Institute Sustainable Connected Cities.

Department of Computing, Huxley Building, South Kensington Campus, Imperial College London, SW7 2AZ. jamm@imperial.ac.uk wp.doc.ic.ac.uk/aese
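The following is a minimal, self-contained sketch of the Lyapunov drift-plus-penalty idea behind such incentivised crowdsensing: at each time slot a controller decides whether to pay a passing phone to carry queued sensor data, trading payment cost against queue backlog. The queue model, cost values and the weight V are illustrative assumptions, not the group's actual formulation.

```python
# Minimal drift-plus-penalty sketch for incentivised opportunistic sensing.
# All numbers (arrival rates, payment costs, V) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
V = 5.0                 # weight on payment cost relative to queue backlog
Q = 0.0                 # backlog of sensed data awaiting upload (MB)

for t in range(1000):
    arrivals = rng.poisson(2.0)            # new sensor data generated this slot (MB)
    price = rng.uniform(0.1, 1.0)          # cost of paying a passing phone to carry 5 MB
    # Drift-plus-penalty rule: upload iff backlog pressure outweighs weighted cost.
    upload = 5.0 if Q * 5.0 > V * price else 0.0
    Q = max(Q + arrivals - upload, 0.0)    # queue update

print(f"final backlog: {Q:.1f} MB")
```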

Co-design of Cyber-Physical Systems
Eric Kerrigan

Cyber-Physical Systems
Cyber-physical systems (CPS) are composed of physical systems that affect computations, and vice versa, in a closed loop. By tightly integrating computing with physical systems one can design CPS that are smarter, cheaper, more reliable, more efficient and more environmentally friendly than systems based on physical design alone. Examples include modern automobiles (the 2013 Ford Fusion generates 25 GB of data per hour), aircraft and trains, power systems, medical devices and manufacturing processes. The dramatic increase in sensors and computing power in CPS presents unique big data challenges to the engineer of today and tomorrow.

The key big data questions for CPS are: what, where, when and how accurately to measure, compute, communicate and store? My team is providing answers to these by developing control systems theory and mathematical optimization methods that automatically design the computer architecture and algorithms at the same time as the physical system. This co-design process results in a better overall system than iterative methods, where sub-systems are independently designed and optimized.

[Figure: block diagram of a cyber-physical system in closed loop with its computing system and a co-designer. Given measurements $y$ and disturbances, the computing system selects optimal inputs $u^*(y) := \arg\min_u f(u,y)$ subject to $g(u,y) = 0$ and $h(u,y) \le 0$ (up to numerical errors); the co-designer selects optimal design parameters for the physical and computing systems, $(p^*, c^*) := \arg\min_{p,c} \varphi(p,c)$ subject to $\alpha(p,c) = 0$ and $\beta(p,c) \le 0$.]

By understanding the nature and timescales of the physical dynamics one can dramatically reduce the amount of data needed to make a decision and/or increase the quality and quantity of information extracted from a given data set. Current work is concerned with model-based feedback methods that minimize the measurements and computational resources needed to estimate, in real time, information that can then be used to control and optimize the behaviour of the overall system.

Mathematical Optimization
Most CPS co-design problems can be formulated as multi-objective, constrained mathematical optimization problems. Furthermore, CPS are optimal only if the computing system is executing tasks with the goal of optimising given performance criteria. We are therefore developing methods to: model and solve the non-smooth and uncertain optimization problems that arise during the co-design process, and solve constrained, nonlinear optimization problems in real time on embedded and distributed computing systems.

Control and Dynamical Systems Theory
The main technical challenge in the co-design of CPS is to merge abstractions from physics with computer science: the study of physical systems is based on differential equations, continuous mathematics and analogue data, whereas the study of computing systems is based on logical operations, discrete mathematics and digital data. Furthermore, while a computation is being carried out, time is ticking and the system continues to evolve according to the laws of physics. A designer therefore has to trade off system performance, robustness and physical resources against the timing and accuracy of measurements, communications, computations and model fidelity. We are developing system-theoretic methods to understand and exploit this hybrid, real-time nature of CPS. Current work includes the co-design of parallel computing architectures, linear algebra and optimization algorithms to increase the efficiency of the computations.
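As a toy illustration of the co-design formulation above, the sketch below jointly chooses a physical parameter p and a computing parameter c under a constraint, using a general-purpose solver. The objective, constraint and bounds are invented for illustration and are not the group's models.

```python
# Toy co-design: jointly pick a physical parameter p (e.g. actuator size)
# and a computing parameter c (e.g. sampling rate) minimising a combined
# cost phi(p, c) subject to a performance constraint beta(p, c) <= 0.
# The functions and numbers are illustrative assumptions only.
from scipy.optimize import minimize

def phi(x):
    p, c = x
    return p**2 + 0.1 * c          # hardware cost grows with p, energy with c

def beta(x):
    p, c = x
    return 1.0 - p * c             # need enough combined "capability": p*c >= 1

res = minimize(phi, x0=[1.0, 1.0],
               constraints=[{"type": "ineq", "fun": lambda x: -beta(x)}],
               bounds=[(0.1, 10.0), (0.1, 10.0)])
print("optimal (p, c):", res.x)
```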
Acknowledgements
This research is in collaboration with George Constantinides, Jonathan Morrison, Rafael Palacios, Mike Graham and Jan Maciejowski (Univ. of Cambridge).

Department of Electrical & Electronic Engineering and Department of Aeronautics, Imperial College London, South Kensington Campus, London SW7 2AZ. e.kerrigan@imperial.ac.uk

Crystallisation of Biological Molecules for X-Ray Crystallography
Lata Govada, Sahir Khurshid, Tim Ebbels, Naomi E. Chayen

The Problem
Detailed understanding of protein structure is essential for rational design of therapeutic treatments and for a variety of industrial applications. The most powerful method for determining the structure of proteins is X-ray crystallography, which is totally reliant on the availability of high quality crystals. The crystallisation of proteins involves purified protein undergoing slow precipitation from an aqueous solution, in which the protein molecules organise themselves in a repeating lattice structure.

The Challenge
There is currently no means of predicting suitable crystallisation conditions for a new protein; finding them is like searching for a needle in a haystack. Initial attempts (referred to as screening) involve exploring a multi-dimensional parameter space using thousands of candidate conditions. The miniaturisation and automation of such screening trials has been of great benefit, but crystallisation remains the rate-limiting step of structure determination (Figure 1). Figure 3 illustrates a cross-section of the enormous chemical space explored during screening.

Figure 1. Results from structural genomics centres worldwide (TargetTrack, PSI).
Figure 2. Crystal of the Human Macrophage Migration Inhibitory Factor.
Figure 3. Plot of crystal hits for 269 macromolecules from the structural genomics community. Dark blue indicates five or more crystal hits for that cocktail, medium blue 3-4 and light blue 1-2. White areas are unsampled regions of chemical space.

The relevant parameters include the type and concentration of precipitating agent, the concentration of protein, the type and concentration of a secondary precipitating agent and/or of an additive, the pH and the temperature, amongst others. One or more of these conditions may show some promise, most often in the form of microcrystals, clusters or microcrystalline suspension. The following optimisation step consists of fine-tuning these promising conditions by changing the values of the various parameters, such as concentrations and pH, in small increments until useful crystals are obtained. This common approach fails in 80% of cases even when high-throughput methods are employed. High throughput has not yielded high output, and significant amounts of protein sample, time and resources are wasted.

A wealth of public data (PDB, BMCD) exists which is not being tapped efficiently. The ability to predict crystallisation conditions would revolutionise this field. Addressing this challenge will require two aspects of big data science. Firstly, the data generated from structural genomics projects on crystallisation conditions is huge, with millions of combinations of protein sequence and conditions attempted in high-throughput screens; storing, searching and retrieving this data efficiently will require big database tools. Secondly, discovering the patterns in sequences and other molecular properties that predict optimal crystallisation conditions will require sophisticated statistical and machine learning algorithms in order to make sense of high-dimensional but still sparsely sampled data. The desired result would be a more efficient methodology for conducting crystallisation experiments and an in silico approach to predicting crystallisability. This would save immense amounts of experimental time, protein sample and other resources, and transform the field.

Computational and Systems Medicine, Department of Surgery and Cancer, Imperial College London, South Kensington Campus, London SW7 2AZ. n.chayen@imperial.ac.uk
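A schematic sketch of the second aspect described above: learning to predict crystallisation success from sequence- and condition-derived features. The data file, feature names and labels are hypothetical placeholders, not the group's pipeline.

```python
# Schematic sketch: predict whether a (protein, cocktail) pair yields crystals.
# 'screening_results.csv' and its columns are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("screening_results.csv")    # one row per protein/condition trial
features = ["seq_length", "pI", "hydrophobicity", "precipitant_conc", "pH", "temperature"]
X, y = df[features], df["crystal_hit"]        # y = 1 if the trial produced crystals

clf = RandomForestClassifier(n_estimators=500, class_weight="balanced", random_state=0)
print("cross-validated AUC:", cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())
```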

Global Fits of Dark Matter Theories
Roberto Trotta, Pat Scott, Charlotte Strege

The Dark Matter Mystery
The experimental hunt for dark matter is entering a crucial phase. Decades of astrophysical and cosmological studies have shown almost conclusively that 80% of the matter in the Universe is made of a new type of particle. One of the key questions of cosmology and particle physics today is to determine the nature and characteristics of this particle. The aim of our work is to put constraints on the physical parameters of theoretical models for dark matter (such as supersymmetry) by combining four complementary probes: cosmology, direct detection, indirect detection and colliders. This is the so-called global fits approach.

Experimental Probes of Dark Matter
Cosmology: observations of the relic radiation from the Big Bang, the cosmic microwave background, constrain the amount of dark matter in the Universe with very high precision.
Direct detection: direct detection experiments aim to detect dark matter by measuring the recoil energy of nuclei undergoing a collision with a dark matter particle. Some highly controversial claims of detection are directly contradicted by other experiments, which have not found any statistically significant signal.
Indirect detection: dark matter particles annihilating into Standard Model particles produce high-energy photons and neutrinos, which can be detected using dedicated space- and ground-based observatories.
Colliders: the Large Hadron Collider at CERN is putting strong limits on the properties of putative particles beyond the Standard Model. The recent discovery of the Higgs boson (for which the 2013 Nobel Prize in Physics was awarded) also puts strong constraints on the properties of such speculative theories.

Our work implements, for the first time, the entire spectrum of these constraints in a statistically correct way, in order to extract the maximum information possible about the nature of dark matter.

[Figure: statistical constraints from global fits on the dark matter mass and scattering cross section in a 15-dimensional theory (Strege et al., to appear).]
[Figure: map of the relic radiation from the Big Bang, used to measure the amount of dark matter in the Universe. Credit: Planck/ESA.]

Big Data Challenges
Our group has developed a world-leading Bayesian approach to the problem, allowing us to explore, in a statistically convergent way, theoretical parameter spaces previously inaccessible to detailed numerical study. Our methodology couples advanced Bayesian techniques with fast approximate likelihood evaluations. Even so, it remains computationally very challenging: each likelihood evaluation requires numerical simulation of the ATLAS detector. This involves generating a large number of simulated events, producing a numerical likelihood function based on a binned analysis and evaluating the ensuing constraint. The process is CPU- and disk-space-intensive: our current study required hundreds of terabytes of disk space and 400 CPU-years of computing power. We studied theoretical models with up to 15 free parameters; the most general models have up to 105 parameters, so novel techniques are needed to explore such complex parameter spaces.

Acknowledgements
We thank Imperial High Performance Computing services and the University of Amsterdam for providing computing resources. This project is in collaboration with G. Bertone, R. Ruiz de Austri and S. Caron.

Astrophysics Group, Blackett Laboratory, Imperial College London, Prince Consort Road, London SW7 2AZ. r.trotta@imperial.ac.uk
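As a cartoon of the global-fits idea, the sketch below combines several independent, entirely made-up log-likelihood terms into a joint posterior over a two-parameter toy model and scans it on a grid. Real analyses use high-dimensional models, detector simulation and Bayesian samplers rather than grids; nothing here reflects the group's actual likelihoods.

```python
# Cartoon global fit: combine independent likelihood terms (relic density,
# direct detection, collider) for a toy 2-parameter dark matter model.
# All "data" and model functions are invented for illustration.
import numpy as np

mass = np.linspace(10, 1000, 200)          # toy dark matter mass grid (GeV)
logsig = np.linspace(-47, -43, 200)        # toy log10 cross-section grid (cm^2)
M, S = np.meshgrid(mass, logsig)

def loglike_relic(M, S):    return -0.5 * ((np.log10(M) - 2.3) / 0.3) ** 2
def loglike_direct(M, S):   return -0.5 * np.clip(S - (-45.0), 0, None) ** 2 / 0.25
def loglike_collider(M, S): return -0.5 * np.clip(300.0 - M, 0, None) ** 2 / 100.0**2

logpost = loglike_relic(M, S) + loglike_direct(M, S) + loglike_collider(M, S)
best = np.unravel_index(np.argmax(logpost), logpost.shape)
print("toy best-fit mass, log10(sigma):", M[best], S[best])
```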

Digital City Exchange
David Birch, Yike Guo, Nilay Shah, Orestis Tsinalis, John Polak, Koen van Dam, Eric Yeatman

Context and Challenge
Cities are now home to more than half of the world's population. They face significant challenges, such as congestion, air quality, and the provision of food and electricity, but they also offer opportunities for innovation and collaboration, as well as increased efficiency enabled by their density. A smart city is a connected city: efficient use of resources through interaction and integration. This requires a better understanding of the complexity of cities and urban living.

Approach
A three-tier solution comprising an ontology-supported sensor data store, a workflow engine and a web-based interface to build chains of connected data sets and models enables the creation of services that take advantage of (real-time) data, analytics and predictive models. We have the data, but how can we make the most of city data and cope with its integration and vast scale? City infrastructures are connected and influence one another. Currently data is collected, analysed and used in the traditional silos of energy, transport, education, waste, etc., but the hypothesis of the Digital City Exchange is that better decisions can be made through data integration. We are building the infrastructure to facilitate this and will then test it with analytical and predictive models.

City Data
Data is collected by utility companies, (local) governments and service providers, but also by residents. This includes induction loops in the roads to measure traffic flows, air quality monitors, pothole reporting via smartphone, smart bins that report when they are full, social media messages, and more. Much of this data is closed, with only one party having access to it, while other data is shared (possibly paid for) or even released as open data for anyone to use. Platforms are needed to store, analyse and collaborate using this data.

Acknowledgements
Digital City Exchange is a five-year programme at Imperial College London funded by Research Councils UK's Digital Economy Programme (EPSRC Grant No. EP/I038837/1). D.Stokes@imperial.ac.uk

Astronomically Big Data
David L. Clements, Steve Warren, Daniel Mortlock, Alan Heavens

Large-scale catalogues in astrophysics are already large, but the next generation of surveys will boost that size by orders of magnitude. In particular, the Euclid mission will provide Hubble Space Telescope quality near-IR images across the entire sky, while the Large Synoptic Survey Telescope (LSST) will image the entire (accessible) sky in 5 different colours every 5 days. Conventional methods of classifying objects (using image metrics or citizen science) may be inadequate for fully exploiting the discovery space of these vast surveys. Statistical analysis of these vast datasets, to test Einstein's theory of gravity and shed light on the Big Bang, also presents formidable data analysis challenges which need to be met if the power of the surveys is to be realised.

Current state of the art: SDSS
The Sloan Digital Sky Survey (SDSS) observed a quarter of the sky in 5 optical bands, obtaining imaging and photometry for 500 million sources, and spectroscopy for 1 million. Images and spectra are automatically analysed, but human-eyeball citizen science through Zooniverse has proved useful for finding truly unusual objects, for example Hanny's Voorwerp, the green object shown below: a previously unknown and poorly understood ionised gas cloud in the intergalactic medium, found through the citizen science project Galaxy Zoo. (Source: NASA/ESA/W)

Euclid & LSST: the coming deluge
The forthcoming Euclid and LSST projects will be orders of magnitude beyond the scale of SDSS and similar current projects. Euclid will observe ~40% of the sky at resolutions comparable to the Hubble Space Telescope (HST). Ten billion galaxies will be imaged, each of which will have 100 times the number of pixels of an SDSS image, for roughly 2000x the amount of data per night. LSST will be a wide-field 8 m telescope that will survey about half the sky (20,000 sq. deg.) in 5 colours every 5 days. The exposures can be combined to give time resolution to search for transient sources (e.g. supernovae), stacked to go deep, or some combination of the two. The data rate is 30 terabytes per night, and it will run for more than 10 years. The discovery space for these projects is so big that it cannot be handled by either conventional computing or citizen science approaches.

Physics Department, Imperial College London, South Kensington Campus, London SW7 2AZ. d.clements@imperial.ac.uk
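A back-of-envelope check of the raw data volume implied by the LSST figures above (30 terabytes per night over a ten-year survey); the number of usable nights per year is our own assumption.

```python
# Rough LSST raw-data volume implied by the quoted 30 TB/night over 10 years.
# The 300 usable nights/year figure is an illustrative assumption.
tb_per_night = 30
nights_per_year = 300
years = 10
total_pb = tb_per_night * nights_per_year * years / 1000
print(f"~{total_pb:.0f} PB of raw images over the survey")   # ~90 PB
```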

Future Computational Platforms
Christos Bouganis

Specialized Computational Platforms
The increasing need to process large amounts of data as fast as possible, combined with the development of increasingly complex computational models for more accurate modelling of the underlying processes, has led researchers and practitioners to adopt suboptimal approximation models or, in certain cases, to make heavy use of High-Performance Computing clusters. Neither approach is desirable: the former does not provide the best possible solution, while the latter results in low silicon efficiency and high power consumption, as these systems are not tailored to the structure of a specific application. In the Circuits and Systems group of the Department of Electrical and Electronic Engineering, we conduct research into core computational platforms that can be adapted to specific applications, leading to high performance gains within a power budget compared to classical computer architectures. Our current work involves the design of computational platforms for accelerating the training stage of computationally demanding machine learning algorithms, and for accelerating probabilistic algorithms for Bayesian inference applied to health care.

Machine Learning
Our group has developed a computational platform that accelerates the training stage of a Support Vector Machine algorithm, making it possible to achieve high classification rates within a limited time and power budget. By designing the architecture of the system to match the targeted algorithm, the system has achieved a speed-up of two orders of magnitude while consuming only a fraction of the power footprint of a personal computer. Other key aspects of our research are the optimization of the memory interface, to maximize the bandwidth between computation and SDRAM memory, and data-path optimization, including computer arithmetic, for low power and high performance.

Probabilistic Inference Acceleration
Our work also focuses on the bioinformatics domain, where it is often necessary to analyse large amounts of data using complex probabilistic models. As probabilistic inference algorithms are computationally expensive, our work focuses on the design of computational platforms whose architecture is tuned to the probabilistic inference algorithm. Recent results obtained from the acceleration of population-based MCMC algorithms show that speed-ups of two orders of magnitude over traditional CPU code can be achieved with a minimal power footprint. (A CPU reference sketch of population-based MCMC follows this section.)

Department of Electrical and Electronic Engineering, Imperial College London, South Kensington Campus, London SW7 2AZ. christos-savvas.bouganis@imperial.ac.uk
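The sketch below is a minimal CPU reference implementation of population-based (parallel-tempering) MCMC on a toy bimodal target, i.e. the class of algorithm being mapped onto tailored hardware. The target density, temperatures and step sizes are illustrative assumptions; this is not the group's accelerated code.

```python
# Minimal population-based (parallel-tempering) MCMC on a toy bimodal target.
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Toy bimodal density: mixture of two Gaussians centred at +/-3.
    return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

temps = np.array([1.0, 2.0, 4.0, 8.0])      # one chain per temperature
x = rng.normal(size=temps.size)             # current state of each chain

for it in range(10000):
    # Local Metropolis move within each chain, at its own temperature.
    prop = x + rng.normal(scale=1.0, size=x.size)
    accept = np.log(rng.random(x.size)) < (log_target(prop) - log_target(x)) / temps
    x = np.where(accept, prop, x)
    # Exchange move: propose swapping the states of a random adjacent pair of chains.
    i = rng.integers(temps.size - 1)
    log_ratio = (log_target(x[i + 1]) - log_target(x[i])) * (1 / temps[i] - 1 / temps[i + 1])
    if np.log(rng.random()) < log_ratio:
        x[i], x[i + 1] = x[i + 1], x[i]

print("cold-chain state after sampling:", x[0])
```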

Big Data in Medical Imaging
Daniel Rueckert, Ben Glocker

Overview
In medical imaging, a vast amount of information is collected about individual subjects, groups of subjects or entire populations. A characteristic of medical imaging is that the sensors or devices (e.g. CT or MR machines) can produce 2D, 3D or even 4D datasets. While each dataset is large in itself, the amount of information derived from each dataset is often much larger than the original information. In the following we outline the challenges of big data in the context of medical imaging that are addressed in the Biomedical Image Analysis Group at Imperial College London.

Big data from clinical studies/trials
Over the last few years there has been an explosion of imaging data generated from clinical trials. In addition to imaging data collected for drug development, there is an increasing amount of data available for research purposes. Two of the most prominent examples of this are the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Human Connectome Project (HCP). The latter project is building a comprehensive map of neuronal connections at the macroscale. For this, state-of-the-art diffusion and functional MR imaging (see figure below left) is collected from 1200 subjects, producing more than 25 GB of raw data per subject. The analysed data (see figure below right) requires more than 1 PB of storage.

Machine learning for medical imaging
The use of machine learning in the analysis of medical images plays an increasingly important role in many real-world clinical applications, ranging from the acquisition of images of moving organs such as the heart, liver and lungs to computer-aided detection, diagnosis and therapy. For example, machine learning techniques such as manifold learning can be used to identify classes in the image data, and classifiers may be used to differentiate clinical groups across images (see figure below left). In addition, these approaches allow imaging information to be combined with non-imaging information, e.g. genetics (see figure below right, where special vertices encode non-imaging information such as ApoE genotype). The figure below shows the application of these ideas to the automatic identification of subjects with dementia.

Big data from population studies
An example of big data from population studies is the UK Biobank imaging effort. This project has recently received funding for a large-scale feasibility study which, if successful, will allow it to conduct detailed imaging assessments of 100,000 UK Biobank participants. This more detailed characterisation of the participants will allow scientists to develop an even greater understanding of the causes of a wide range of diseases (including dementia) and of ways to prevent and treat them. The imaging study will involve magnetic resonance imaging of the brain, heart and abdomen (see figure right), low-power X-ray imaging of bones and joints, and ultrasound of the neck arteries.

Biomedical Image Analysis Group, Department of Computing, Huxley Building, Imperial College London, South Kensington Campus, London SW7 2AZ. d.rueckert@imperial.ac.uk, b.glocker@imperial.ac.uk
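A schematic sketch of the manifold-learning idea mentioned above: embed high-dimensional image-derived features into a low-dimensional space, then classify clinical groups there. Synthetic random vectors stand in for real image features and labels; the choice of Isomap and logistic regression is ours, not the group's method.

```python
# Schematic manifold learning + classification on synthetic image-derived features.
import numpy as np
from sklearn.manifold import Isomap
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5000))          # 200 subjects x 5000 voxel/feature values (synthetic)
y = rng.integers(0, 2, size=200)          # 0 = control, 1 = dementia (synthetic labels)

model = make_pipeline(Isomap(n_components=10), LogisticRegression(max_iter=1000))
print("cross-validated accuracy:", cross_val_score(model, X, y, cv=5).mean())
```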

Effects of High-Frequency Company-Specific News on Individual Stocks
Robert Kosowski, Ras Molnar

Research Objectives
The aim of this research is to study the impact of high-frequency company-specific news on individual stocks. The term high-frequency news in this context means news items reported electronically by a news company during the day. Why is high-frequency news interesting to study? It is an important information source for all market participants, and it sheds light on economic transmission mechanisms that cannot be observed at lower frequencies, for example with daily end-of-day closing prices or low-frequency economic indicators. How is our research novel? The contribution of our research lies in the fact that we not only measure the sentiment extracted from news but other news characteristics as well. We also utilise high-frequency data, which has not been studied extensively from this perspective. What are the expected outputs? We expect to find that high-frequency news and novel sentiment measures have an economically significant impact on asset prices. It is likely that the innovations in our methodology will lead to more significant results compared to existing studies.

Big Data
For the purposes of our project, we use two main sources of high-frequency information. Both imply a vast amount of data related to both news and trades. We use a news database based on the Reuters Site Archive. This dataset contains about 5.6 million Reuters news items from the beginning of 2007 onwards. The raw HTML files take up about 426 GB, while the database containing news identifiers and news text is around 31 GB.

[Figure: number of high-frequency news items by year.]

We use the TAQ database for high-frequency stock data. This dataset contains trades and quotes from the major American stock exchanges; in our research we intend to use trades only. Trade data is an example of big data because the number of trades has increased over time from 92 million trades in 1993 to 7.5 billion more recently. The extensive number of trades implies a large database: the cumulative size of the databases containing TAQ trades from the beginning of 2007 until the end of 2012 is expected to be around 4 TB.

[Figure: number of trades by year (in millions).]

Methodology
The methodology we use in this research is in line with the existing literature (for example Gross-Klussmann, A. and N. Hautsch, "When machines read the news: Using automated text analytics to quantify high frequency news-implied market reactions", Journal of Empirical Finance 18(2)). The frequency and amount of data we have to process mean that we pre-process data within the database before proceeding with the analysis. For the news data we calculate the sentiment, relevance and novelty of news using textual analysis similar to Boudoukh et al. ("Which news moves stock prices? A textual analysis", Technical report, National Bureau of Economic Research, 2013). Stock market data are sampled and only the parts necessary for our analysis are selected. The analysis itself consists of two parts: an event study and a vector autoregression model. The goal is to explain the reaction of the stock market given the characteristics of the news. (A schematic event-study sketch follows this section.)

Acknowledgements
Our news database is based on the Reuters News Web Archive.

Finance Group, Imperial College Business School, Imperial College London, South Kensington Campus, London SW7 2AZ. r.kosowski@imperial.ac.uk
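A schematic sketch of the event-study part of the analysis: market-adjusted abnormal returns cumulated around each news arrival. The data files, column names and the +/-10 minute window are illustrative assumptions, not the project's specification.

```python
# Schematic event study: market-adjusted abnormal returns around news arrivals.
import numpy as np
import pandas as pd

returns = pd.read_csv("minute_returns.csv", parse_dates=["timestamp"])  # ticker, timestamp, ret, mkt_ret (hypothetical)
news = pd.read_csv("news_events.csv", parse_dates=["timestamp"])        # ticker, timestamp, sentiment (hypothetical)

paths = []
for _, event in news.iterrows():
    window = returns[(returns["ticker"] == event["ticker"]) &
                     (returns["timestamp"] >= event["timestamp"] - pd.Timedelta(minutes=10)) &
                     (returns["timestamp"] <= event["timestamp"] + pd.Timedelta(minutes=10))]
    if len(window) == 21:                                   # full 21-minute window available
        # Abnormal return = stock return minus market return (simple market-adjusted model).
        paths.append((window["ret"].values - window["mkt_ret"].values).cumsum())

print("mean cumulative abnormal return at window end:", np.mean(paths, axis=0)[-1])
```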

Development of an Ovarian Cancer Database for Translational Research
Haonan Lu, Christina Fotopoulou, Ioannis Pandis, Yike Guo, Hani Gabra

Ovarian cancer is a systemic disease which can be dysregulated through multiple mechanisms, so it is crucial to understand the detailed molecular pathways behind it. Recently, the Cancer Genome Atlas (TCGA) project has generated multiple levels of OMIC data from genome to phenome, giving a comprehensive view of high-grade epithelial ovarian cancer. However, cross-correlation of good quality clinical data with multi-level molecular profiles is required to obtain valid biomarkers. Furthermore, the difficulty of accessing, and also reproducing, the TCGA data has been a known issue impairing interpretation and implementation of the findings.

Multiple molecular profiles constructed for 175 ovarian cancer cases
We have previously systematically collected samples from 175 primary epithelial ovarian cancer patients and obtained molecular information across multiple platforms, including gene expression microarray, SNP array, exome sequencing and Reverse Phase Protein Array (Figure 1). A great advantage of these data is that the samples were collected from a single institute, with much less bias in sample type; the clinical data is therefore cleaner and the molecular data more reliable.

Figure 1. (a) Types of molecular profile data obtained from the 175 ovarian tumour samples, with the coverage of each platform and the collaborating centre: gene expression profile (>47,000 transcripts; Genome Institute of Singapore), DNA copy number variation (5,677 CNV regions; Genome Institute of Singapore), exome sequencing (whole exome; London Research Institute), proteomics (>160 proteins; MD Anderson) and metabolomics (serum and urine, to be done; Imperial College). (b) Published result using part of the gene expression data: we compared the gene expression profile among three subtypes of ovarian cancer (benign, borderline and malignant). We found distinct gene expression patterns between benign and malignant tumours, whereas borderline tumours showed two distinct subgroups, one benign-like and the other malignant-like. Courtesy of Curry EW, Stronach EA, Rama NR, et al., "Molecular subtypes of serous borderline ovarian tumor show distinct expression patterns of benign tumor and malignant tumor-associated signatures", Mod Pathol.

Continuously updated clinical data
In order to place these molecular data within the correct frame of context and to be able to define valid biomarkers of surgical and clinical outcome, we are currently generating robust, updated and detailed surgical and clinical data to be cross-correlated with the molecular biological information (Figure 2).

Figure 2. (a) Comparison of the number of clinical parameters (total, surgical, chemotherapy) collected by TCGA and at Hammersmith. (b) Planned workflow after obtaining the new clinical data: correlate novel clinical parameters with outcome (e.g. overall survival and progression-free survival) to personalise surgical operations, and correlate biomarkers that stratify patients with the molecular profile to personalise drug treatment.

Data interpretation using tranSMART
Apart from generating quality data, we have also been working on making the data more accessible to researchers by collaborating with the tranSMART project. tranSMART is a database platform with built-in analytical tools that is ready to use for all researchers. We are currently creating the Ovarian Cancer Database within the tranSMART platform, which contains our dataset together with other popular datasets (e.g. the Tothill and TCGA datasets) to help researchers perform data analysis across multiple studies (a worked example is shown in Figure 3). We aim to significantly accelerate ovarian cancer research for both clinicians and scientists.

Figure 3. Example workflow of using the Ovarian Cancer Database in tranSMART. (i) Discovering the association between chemotherapy response and overall survival using the GIS dataset: the Kaplan-Meier plot shows that patients who respond well to chemotherapy (blue) have a significantly higher survival rate than chemo-resistant patients (red). (ii) Differential gene expression between the two patient cohorts (complete response and progressive disease): as the corresponding gene expression profiles are available for these patients, differential gene expression analysis can be performed to discover potential marker genes for chemo-resistance. (iii) Cross-validating genes of interest in multiple datasets to guide subsequent experimental research. All the analysis shown is performed within tranSMART.

Acknowledgements
We especially thank Prof. Yike Guo, Dr. Ioannis Pandis and other group members for their help with the tranSMART platform.

Ovarian Cancer Action Research Centre, Department of Surgery and Cancer, Imperial College London, Hammersmith Campus, London W12 0NN. h.gabra@imperial.ac.uk
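The sketch below illustrates the kind of Kaplan-Meier comparison shown in Figure 3(i). The data file, column names and the use of the lifelines library are illustrative assumptions, not the actual tranSMART workflow.

```python
# Schematic Kaplan-Meier comparison of chemo-sensitive vs chemo-resistant patients.
# 'clinical.csv' and its columns are hypothetical placeholders for a database export.
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

df = pd.read_csv("clinical.csv")          # columns: os_months, death_observed, chemo_response
kmf = KaplanMeierFitter()

groups = {"responder": df[df.chemo_response == "complete_response"],
          "resistant": df[df.chemo_response == "progressive_disease"]}
for name, g in groups.items():
    kmf.fit(g["os_months"], event_observed=g["death_observed"], label=name)
    kmf.plot_survival_function()

result = logrank_test(groups["responder"]["os_months"], groups["resistant"]["os_months"],
                      groups["responder"]["death_observed"], groups["resistant"]["death_observed"])
print("log-rank p-value:", result.p_value)
```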

Institute for Security Science & Technology
Donal Simmie, Maria Grazia Vigliotti, Erwan Le Martelot, Chris Hankin

Influence in Social Networks
Influential agents in networks play a pivotal role in information diffusion. Influence may rise or fall quickly over time, so capturing this evolution of influence is of benefit to a wide range of application domains. We propose a new model for capturing both time-invariant and temporal influence. We performed a primary survey of our population of users to elicit their views on influential users; the survey allowed us to validate the results of our classifier. We introduce a novel reward-based transformation of the Viterbi path of the observed sequences which provides an overall ranking for users. Our results show an improvement in ranking accuracy over purely topology-based methods for the particular area of interest we sampled. Utilising the evolutionary aspect of the HMM, we predict future states using current evidence. Our prediction algorithm significantly outperforms a collection of baseline models, especially in the short term (1-3 weeks). (A minimal Viterbi sketch follows this section.)

Automated Sensemaking Recovery
Complex data analysis is often multi-modal, incorporating visualisations and structured and unstructured data, possibly from numerous disparate sources. Making sense of the presented data and interrogating it successfully to form hypotheses and conclusions are non-trivial tasks, but they are aided by applications and bespoke tools designed for exactly this purpose. Humans are skilled at solving difficult problems and at exploring data to discover new insights; computers, however, can help by improving our memory and recall and by presenting data in a manner that leads to insight and/or questions our decisions for a more positive outcome. Sensemaking provenance captures the reasoning flow of an analyst during a specific task. We perform machine learning on the interactions of the analyst with the computer, and the context of those actions, to determine their probable reasoning.

Fast Multiscale Community Detection
Many systems can be described using graphs, or networks. Detecting communities in these networks can provide information about the underlying structure and functioning of the original systems. Yet this detection is a complex task, and a large amount of work has been dedicated to it in the past decade. One important feature is that communities can be found at several scales, or levels of resolution, indicating several levels of organisation; therefore solutions to the community structure may not be unique. Networks also tend to be large and hence require efficient processing. In this work, we present a new algorithm for the fast detection of communities across scales using a local criterion. We exploit the local aspect of the criterion to enable parallel computation and improve the algorithm's efficiency further.

Acknowledgements
Influence in Social Networks and Fast Multiscale Community Detection are supported by the Making Sense project under EPSRC grant EP/H023135/1. Automated Sensemaking Recovery is supported by the UKVAC project, funded by the US DHS and the UK Home Office.

Institute for Security Science and Technology, Imperial College London, South Kensington Campus, London SW7 2AZ. d.simmie@imperial.ac.uk
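A minimal sketch of Viterbi decoding for a two-state "influential"/"not influential" hidden Markov model, the building block behind the ranking described above. The transition and emission probabilities and the observation sequence are made up; the reward-based transformation itself is not shown.

```python
# Minimal Viterbi decoding for a two-state influence HMM (toy numbers).
import numpy as np

states = ["not_influential", "influential"]
start = np.log([0.8, 0.2])
trans = np.log([[0.9, 0.1],    # row: current state, column: next state
                [0.3, 0.7]])
emit = np.log([[0.7, 0.3],     # P(observation | state); obs 0 = low activity, 1 = high
               [0.2, 0.8]])
obs = [0, 1, 1, 1, 0, 1]       # toy observed activity levels for one user

v = start + emit[:, obs[0]]
back = []
for o in obs[1:]:
    scores = v[:, None] + trans            # scores[i, j]: best path ending in i, then moving to j
    back.append(scores.argmax(axis=0))
    v = scores.max(axis=0) + emit[:, o]

path = [int(v.argmax())]
for b in reversed(back):                   # backtrack the most likely state sequence
    path.append(int(b[path[-1]]))
print([states[s] for s in reversed(path)])
```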

Future Science on Exabytes of Climate Data
David Ham

When climate models execute on 100 million cores and generate exabytes of data, how will we work with this data? How will we account for the diverse numerical schemes used to produce it? How will the users of climate research know that our calculations were valid and that our results can be relied on?

Climate Model Intercomparison
The Climate Model Intercomparison Project (CMIP) provides the basis for the UN Intergovernmental Panel on Climate Change (IPCC) assessment reports, and a very large component of modern climate science is based on the analysis of data from the CMIP simulations. As computing power increases, climate model resolutions become ever finer, and the resulting data sets grow exponentially:
CMIP Phase 3 (2006) produced 36 terabytes;
CMIP Phase 5 (2011) produced 3.3 petabytes;
CMIP Phase 6 (~2020) is expected to yield hundreds of petabytes to 1 exabyte.

Climate science queries
Climate science questions typically require mathematical functions to be applied to reduce vast spatial and temporal field data sets to meaningful climate statistics. Across the vast field of climate science, each research project has its own specialised questions to ask. For example: Which models predict an increase in coastal flooding for the UK? How does Atlantic sea surface temperature differ in different simulations? What is the strength of the Gulf Stream in all of the CMIP simulations?

Current methodology
Data is downloaded by each researcher, and custom analysis scripts are developed for each query. This is:
Labour-intensive: researchers, often PhD students and postdocs, around the world are constantly re-implementing very similar work.
Error-prone: every query script is bespoke and is a new source of errors. There is no systematic mechanism for finding errors.
Untraceable and unverifiable: there is no effective mechanism to publish the actual techniques applied to the data, and verifying their correctness is next to impossible. The results published in the literature must currently be taken on trust, as there is no mechanism for establishing their provenance.

A proposed toolchain for high-productivity, scalable and verifiable climate data science
Rather than hand-writing bespoke low-level processing tools, climate researchers need to be able to state their questions in high-level mathematical form. The code implementing the query will be automatically generated by Firedrake, an Imperial-developed system for the automatic generation of high-performance, parallel numerical code from the mathematical query, and applied to the climate model data. Different numerics will be generated to execute the same mathematics on the outputs of different models, and the code generator can be extensively tested to provide verifiably correct results. Generated code will be applied to the data using cloud resources attached to the archive site, so the original data is not downloaded by the user. The original query is short and expressive and can therefore be included in publications, enabling verification and reproduction of results, which is currently effectively impossible. A query for the mean sea surface temperature in the North Atlantic might appear as:

    north_atlantic = domain(latitude=(0., 60.), longitude=(-60., 0.))
    for date in <list of dates>:
        atlantic_multidecadal_oscillation = \
            integral(sea_surface_temperature * dx(north_atlantic)) / area(north_atlantic)

Departments of Mathematics and Computing, Imperial College London. david.ham@imperial.ac.uk

Intelligent Neural Interfacing Systems
Amir Eftekhar, Sivylla Paraskevopoulou, Timothy Constandinou, Christofer Toumazou

Bio-Inspired Paradigm
Within the Centre for Bio-Inspired Technology we utilise biological principles and mechanisms to create more efficient healthcare technology. This bio-inspired paradigm allows for (1) learning from biology to create more efficient healthcare technologies and (2) modelling biology to understand it better. Expanding this principle, we apply local intelligence to our devices to create more efficient data transmission and to implement closed-loop protocols. Examples from our group include a closed-loop artificial pancreas, a cochlear implant and a retina chip. Some of our more recent work applies local intelligence to neural interfacing.

Brain Interfacing
The brain is a complex network of 100 billion neurons. To transmit the full quantity of data it produces would require nearly 16,000 Tb/s per person. In a chronic disease population of 1 million people, monitoring what can be achieved with modern electrodes and communication (100 electrodes) equates to 16 Tb/s. The same is true for other monitoring schemes: heart activity (ECG, 2-3 channels) and non-invasive brain recording (EEG, up to 64 channels). Although lower in sampling frequency, these still equate to 3 Gb/s per channel for a population of 1 million, or 11 Tb/hour.

Closed-Loop Appetite Control
Obesity is one of the greatest public health challenges of the 21st century. Affecting over half a billion people worldwide, it increases the risk of stroke, heart disease, diabetes, cancers, depression and complications in pregnancy. Bariatric surgery is currently the only effective treatment available, but it is associated with significant risks of mortality and long-term complications. The peripheral nervous system is a complex network of over 45 miles of nerve carrying impulses at speeds of up to 275 mph. In this project we are tapping into the vagus nerve to extract the signals that control appetite, and electrically stimulating it to regulate appetite. The gut is densely innervated by the vagus nerve, so its signals represent an integrated response to nutrients, gut physiology and hormones, and have a powerful effect on appetite. The nerve is a complex structure, so it requires interfacing with dozens of electrodes monitoring chemical and electrical activity. Here we are utilising real-time, self-learning algorithms for closed-loop control of appetite.

Towards Intelligent Next-Generation Neural Interfaces
[Figure: typical neural recording and stimulation chain - amplification, conditioning and pre-processing, spike detection, spike sorting, analysis, stimulation - with applications including prosthetic control, brain-computer interfaces and spinal stimulation, using nerve cuff and microspike electrodes and an external transponder/power unit.]
With the advent of high-density microelectrode arrays we can tap into a subset of these signals. Neural activity can be monitored from hundreds of channels, but with data rates exceeding 20 Mbps this is not possible in medical implants. Local, intelligent processing of neural signals can reduce this to less than 1 Mbps, which facilitates closed-loop systems such as spinal cord stimulation. We have developed low-power, real-time spike detection and sorting algorithms, part of the process of processing neural signals, i.e. identifying which neuron has fired in the vicinity of the electrode. We are currently developing the final generation of microchip with this processing embedded. With it, we can reduce 1 Tb/s to less than a Mb/s for 500 neurons. (A simplified threshold-based spike-detection sketch follows this section.)

Acknowledgements
This work is primarily a multi-disciplinary effort among many researchers and students at the Centre for Bio-Inspired Technology and collaborators.

Centre for Bio-Inspired Technology, Dept. of Electrical and Electronic Engineering, Imperial College London, South Kensington Campus, London SW7 2AZ. amir.eftekhar@imperial.ac.uk
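A simplified amplitude-threshold spike-detection sketch, illustrating the first stage of the on-chip processing described above. The synthetic signal and the threshold rule are illustrative assumptions, not the group's algorithm.

```python
# Simplified amplitude-threshold spike detection on a synthetic recording.
import numpy as np

fs = 24_000                                   # sampling rate (Hz)
rng = np.random.default_rng(0)
signal = rng.normal(0, 1, fs)                 # 1 s of synthetic background noise
spike_times = rng.choice(fs - 48, size=20, replace=False)
for t in spike_times:                         # inject 20 crude spike shapes
    signal[t:t + 48] += 8 * np.hanning(48)

# Robust noise estimate and threshold (a common choice: ~5 x sigma from the median).
sigma = np.median(np.abs(signal)) / 0.6745
threshold = 5 * sigma
crossings = np.flatnonzero((signal[1:] > threshold) & (signal[:-1] <= threshold))

print(f"detected {len(crossings)} threshold crossings (20 spikes injected)")
```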

Digital Money
Llewellyn Thomas, Antoine Vernet, David Gann

Project Context and Goals
Money is one of the most influential factors shaping human history, driving not only wealth creation and socio-economic development but also religion, ethics, morality and fine art (Eagleton & Williams, 2011). Some have argued that digital money, as distinct from earlier forms of money, has the potential to provide major economic and social benefits, for example by removing friction from transactions or enabling inclusive innovation (Dodgson et al., 2012). Moreover, the big data generated by digital money can be used to improve business operating efficiency, develop novel business models, and complement or even extend the notion of identity. However, there is little, if any, systematic research into digital money, its adoption and its impact. Given this gap, it is our ambition to address the following: Does digital money adoption make a difference? What are the big data implications of digital money? Is it possible to quantify the benefits to governments, corporations and individuals? What are the factors that affect the outcome of a digital money initiative?

Conceptualizing Digital Money
We define digital money as currency exchange by electronic means. Digital money is a socio-technical system that fulfils societal functions through technological production, diffusion and use (Geels, 2004). It is a system of value interchange relying on information and communication technologies that themselves form a system. As a result, and given the importance of regulation to digital money, we conceptualised the digital money system as four interacting components: the national institutional context, the enabling technological and financial infrastructure, the demand for digital money, and the industries that drive digital money supply.

Digital Money Readiness
To provide better insight into the differing readiness of countries for digital money, we have developed a Digital Money Readiness Index. By readiness we mean the level of development of a country with respect to the institutional, financial, technological and economic factors that underpin digital money. Taking the four components above as the pillars of the composite index, we selected a range of indicators which measure progress along each pillar, ranked countries according to their digital money readiness, and, using cluster analysis, identified four stages of readiness. We also correlated our index with existing cashlessness measures and found that, although there is strong correlation, there are also developed- and developing-world outliers that reflect the social and cultural aspects of money. (A schematic sketch of such a composite index follows this section.)

Future Directions
This research has begun to widen the discussion of digital money to a broader academic audience. It has also provided a comprehensive definition of digital money that encompasses both the wide variety of existing digital means of exchange and the future technologies that are undoubtedly to come. Our Digital Money Readiness Index also has important implications for policy makers. Moving forward, we intend to: improve the transparency of the index; include measures of digital currencies, such as Bitcoin; implement a penalty for bottlenecks to improve the policy implications of the index; investigate the big data implications of digital money; and investigate whether the claimed economic and social benefits of digital money are indeed present.
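The sketch below shows the general mechanics of a composite readiness index: normalise indicators, average them into pillar scores, combine the pillars, then cluster countries into stages. The data file, indicator names, equal weighting and use of k-means are illustrative assumptions, not the actual index methodology.

```python
# Schematic composite-index construction and clustering into readiness stages.
import pandas as pd
from sklearn.cluster import KMeans

df = pd.read_csv("indicators.csv", index_col="country")   # hypothetical indicator table
pillars = {
    "institutions":   ["rule_of_law", "regulatory_quality"],
    "infrastructure": ["mobile_penetration", "bank_branches"],
    "demand":         ["internet_users", "urbanisation"],
    "supply":         ["card_payments", "fintech_firms"],
}

norm = (df - df.min()) / (df.max() - df.min())             # min-max normalise each indicator
scores = pd.DataFrame({p: norm[cols].mean(axis=1) for p, cols in pillars.items()})
scores["index"] = scores.mean(axis=1)                      # equal-weighted composite index
scores["stage"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores[["index"]])
print(scores.sort_values("index", ascending=False).head())
```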
Acknowledgements
We gratefully acknowledge both the financial and intellectual support of Citigroup, and would particularly like to thank Greg Baxter, Sandeep Dave and Ashwin Shirvaikar. We also thank Lazlo Szerb and Erkko Autio for their suggestions on composite indices.

Business School, Imperial College London, South Kensington Campus, London SW7 2AZ. llewellyn.thomas@imperial.ac.uk

Impact of Changes in Primary Health Care Provision
Elizabeth Cecil, Alex Bottle, Mike Sharland, Sonia Saxena

Unplanned hospital admissions in children have been rising across England over the last decade [1]. Access to timely and effective primary care for minor or non-urgent conditions prevents potentially avoidable hospital admission [2]. GPs' withdrawal from out-of-hours care in 2004 may have resulted in children being seen in hospital emergency departments where previously parents would have contacted their GP, particularly for acute infectious illness. The Quality and Outcomes Framework (QOF) has been successful in incentivising primary care to improve adult health outcomes for chronic disease. Yet children, who make up 25% of GP workload, are under-represented in quality improvement targets in primary care. Hence children may access hospital-based alternatives to primary care (e.g. walk-in centres, telecare, A&E) for acute exacerbations of chronic conditions [3].

Aim: to investigate whether changes to GP services have had an impact on unplanned and short-stay hospital admissions in children for infectious and chronic disease.
Design: national population-based time trends study.

Methods
We used Hospital Episode Statistics (HES) data from all English hospitals on children aged <15 years to calculate age- and sex-standardised admission rates for all unplanned admissions, short stays (<=2 days with no readmission) and very short stays (no overnight stay), adjusting for deprivation. The interrupted time series analysis allowed for a step change at 2004 and a gradient change post-2004 in the rate of unplanned hospital admissions in children. Outcomes: total unplanned, short-stay and very-short-stay hospital admission rates for all-cause, infectious and chronic disease. Exposure: post-2004. (A schematic interrupted-time-series model follows this section.)

Results
Crude unplanned admission rates increased between 2000/01 and 2010/11 in all developmental age bands in children aged <15 years. The adjusted rate of all-cause unplanned admissions increased by 2% per year after the introduction of the GP service changes in 2004, compared with the trend in previous years (rate ratio (RR) = 1.02, 95% CI: 1.02 to 1.03). The biggest changes were observed in very short stay admissions, i.e. unplanned admissions with no overnight stay. There was an estimated step change of 8.5% (RR = 1.08, 95% CI: 1.07 to 1.10) in adjusted unplanned admission rates for all chronic diseases in 2004. There was no evidence of a step change in the adjusted unplanned admission rates for infectious disease, but the rate of increase doubled after 2004, from 1.2% to 2.3% per year.

[Figure: standardised and fitted unplanned admission rates over time for all-cause, chronic disease and infectious disease admissions.]

Department of Primary Care and Public Health, Imperial College London, South Kensington Campus, London SW7 2AZ. Paediatric Infectious Diseases Unit, St George's, University of London, Cranmer Terrace, London SW17 0RE. e.cecil@imperial.ac.uk
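A schematic segmented (interrupted-time-series) regression with a step change and a slope change after the 2004 service change, of the kind described in the Methods. The simulated counts, data frame and negative binomial family choice are illustrative placeholders, not the study's data or final model specification.

```python
# Schematic interrupted-time-series regression: level and slope change after 2004.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({"quarter": np.arange(44)})            # 11 years of quarterly data
df["post"] = (df.quarter >= 16).astype(int)              # 1 after the 2004 change
df["time_post"] = np.maximum(df.quarter - 16, 0)         # slope-change term
df["population"] = 9_000_000                             # child population denominator (toy)
df["admissions"] = rng.poisson(2000 + 10 * df.quarter + 150 * df.post + 5 * df.time_post)

model = smf.glm("admissions ~ quarter + post + time_post", data=df,
                family=sm.families.NegativeBinomial(),
                offset=np.log(df["population"])).fit()
print(np.exp(model.params))                               # rate ratios: step change and trend change
```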

Early In-Hospital Mortality Following Trainee Doctors' First Day at Work
Min Hua Jen, Alex Bottle, Azeem Majeed, Derek Bell, Paul Aylin

There is a commonly held assumption that early August is an unsafe period to be admitted to hospital in England, as newly qualified doctors start work in NHS hospitals on the first Wednesday of August. A previous UK study using national death certificate data found no effect, but could not discriminate between in- and out-of-hospital deaths. US studies have suggested an equivalent "July effect". We investigate whether in-hospital mortality is higher in the week following the first Wednesday in August than in the previous week, using national hospital administrative data.

Methods
We constructed two retrospective cohorts of all emergency patients admitted on the last Wednesday in July and the first Wednesday in August for 2000 to 2008, each followed up for one week. If a patient had died in hospital by the end of the following Tuesday, we counted them as a death; otherwise we presumed them to have survived. We calculated the odds of death for admissions occurring in the week after the first Wednesday in August compared with those in the week before, adjusted for age (20 groups: <1 year, 1-4, 5-9, and five-year bands up to 90+), sex, area-level socio-economic deprivation (quintile of the Carstairs index of deprivation), year (NHS financial year of discharge, from 1 April each year to 31 March the next year) and comorbidity (Charlson index of co-morbidity, ranging from 0 to 6+). (A schematic version of this adjusted model follows this section.)

Results
[Table: odds ratios comparing the odds of death for patients admitted on the first Wednesday in August with those admitted on the last Wednesday in July, unadjusted and adjusted.]

Discussion
Strengths: a large national study covering nine years; only deaths in hospital were included; a well-defined denominator; no overlap in care.
Limitations: we only looked at those admitted on a single day; our figures equate to just 11 extra deaths per year; follow-up was short, so how long the effect lasts is unknown.

Patients admitted on the first Wednesday in August have a higher death rate than those admitted on the last Wednesday in July in hospitals in England. There is also a statistically significantly higher death rate for medical patients that was not evident for surgical admissions or patients with malignancy. If this effect is due to the changeover of junior hospital staff, then it has potential implications not only for patient care but also for NHS management approaches to delivering safe care. We suggest further work to look at other measures, such as patient safety, quality of care, process measures or medical chart review to identify preventable deaths rather than overall early mortality, to further evaluate the effect of junior doctor changeover.

Acknowledgements
PA, MHJ and AB are employed within the Dr Foster Unit at Imperial College London. The Unit is funded by a research grant from Dr Foster Intelligence (an independent health service research organisation). The Unit is also affiliated with the CPSSQ at Imperial College Healthcare NHS Trust, which is funded by the NIHR. The funders had no role in study design, data collection and analysis, the decision to publish, or preparation of the manuscript or poster.

Dr Foster Unit at Imperial College & Department of Primary Care and Public Health, School of Public Health, Imperial College London, South Kensington Campus, London SW7 2AZ. Department of Medicine, Imperial College London, Chelsea and Westminster Campus, 369 Fulham Road, London SW10 9NH.
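A schematic version of the adjusted analysis: logistic regression for in-hospital death within one week, comparing August-changeover admissions with late-July admissions. The data file and column names are hypothetical placeholders for the HES extract, not the study's actual variables.

```python
# Schematic adjusted logistic regression for the August-changeover comparison.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("cohort.csv")   # one row per emergency admission on the two index days (hypothetical)
model = smf.logit(
    "died_in_week ~ august_week + C(age_band) + C(sex) + C(carstairs_quintile)"
    " + C(fin_year) + C(charlson_band)",
    data=df).fit()
print("adjusted odds ratio for the August week:", np.exp(model.params["august_week"]))
```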

20 Interrupted time-series analysis of London stroke services re-organisation Roxana Alexandrescu, John Tayu Lee, Alex Bottle, Paul Aylin

Stroke accounts for around 11% of all deaths in England. Most people survive a first stroke, but often with significant morbidity. In England, approximately 110,000 people have a first or recurrent stroke each year, and stroke is estimated to cost the economy around £7 billion per year, of which £2.8 billion is a direct cost to the NHS. Prior to 2010, provision of stroke care in London was complex, with care spread across a number of units and only 53% of patients treated on a dedicated stroke ward.1 To improve the quality of service, eight Hyper Acute Stroke Units (HASUs) were established in London from February 2010. The units, which are dedicated to treating stroke patients, are open 24 hours a day, seven days a week, to offer immediate access to stroke investigations and imaging, including CT brain scans and clot-busting thrombolysis drugs. Our aim was to assess the impact of the HASU policy using established stroke performance indicators based on national routine hospital administrative data.

Methods. We used Hospital Episode Statistics (HES) from April 2006 to March 2012 to cover a period before and after the policy's introduction. We identified all admissions with a primary diagnosis of stroke in any episode of care, based on ICD-10 disease codes I60, I61, I62, I63 and I64. We examined six previously defined indicators: brain scan on the day of admission; thrombolysis treatment; diagnosis of aspiration pneumonia in hospital; seven-day in-hospital mortality; discharge to usual place of residence within 56 days; and thirty-day emergency readmission (all causes). We plotted the unadjusted rates for the process and outcome indicators over time (by quarter of year). We tested for linear trends pre and post intervention (excluding a six-month intervention period, January 2010 to June 2010) and for a step change at the time of the intervention for each indicator, using an interrupted time series (ITS) negative binomial regression model. The model also included a seasonal effect (a dummy variable for each month) and patient characteristics including age (six categories: 0-44, 45-54, 55-64, 65-74, 75-84, and 85 years or over), sex and socio-economic deprivation status (Carstairs deprivation quintiles).

Results. During the six-year period April 2006 to March 2012, we identified 536,034 stroke admissions to hospitals in England, 61,643 of them (11.5%) in the London area. Compared with areas outside London, the seven-day in-hospital death rate reduced significantly following the restructuring of services, as did the rate of aspiration pneumonia. However, same-day brain scans showed a small but significant reduction following the intervention, as well as a slowing in their rate of increase. This study suggests that the HASU policy was effective in improving the treatment of stroke patients in the London area, the intervention being associated with decreasing in-hospital mortality and decreasing rates of aspiration pneumonia in the post-intervention period.

Figure 1. Unadjusted temporal changes in the performance indicators for stroke care by study area (London versus England without London), by quarter of year, April 2006 to March 2012: rates of same-day brain scan, thrombolysis, aspiration pneumonia, deaths within seven days, discharge to usual place of residence, and emergency readmission, with the "Reorganisation of stroke services: London, Jan.-July" intervention period marked on each panel.

Acknowledgements. This poster represents independent research supported by the NIHR Patient Safety Translational Research Centre. Dr Foster Unit at Imperial College & Department of Primary Care and Public Health, School of Public Health, Imperial College London, South Kensington Campus, London SW7 2AZ. Department of Medicine, Imperial College London, Chelsea and Westminster Campus, 369 Fulham Road, London SW10 9NH.
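As a rough illustration of the segmented regression used here, the sketch below fits a negative binomial model with a pre-intervention linear trend, a step change at the intervention and a post-intervention change in slope to a hypothetical quarterly series. The file and variable names are assumptions, and the sketch omits the seasonal dummies, case-mix adjustment and London-versus-rest-of-England comparison described in the methods.

```python
# Minimal interrupted time-series sketch, assuming a hypothetical quarterly file with
# one row per quarter containing 'quarter_start' (ISO date string), 'events' (e.g.
# aspiration pneumonia cases) and 'admissions' (stroke admissions) for one area.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

ts = pd.read_csv("stroke_indicator_quarterly_london.csv")  # hypothetical extract

# Drop the six-month intervention window and build the segmented-regression terms:
# an overall linear trend, a step change after the intervention, and a slope change.
ts = ts[~ts["quarter_start"].between("2010-01-01", "2010-06-30")].copy()
ts["time"] = np.arange(len(ts))                                 # linear trend
ts["post"] = (ts["quarter_start"] >= "2010-07-01").astype(int)  # level (step) change
ts["time_post"] = ts["time"] * ts["post"]                       # post-intervention slope

# Negative binomial regression of event counts with admissions as the exposure,
# so the coefficients describe rates per admission.
model = smf.glm(
    "events ~ time + post + time_post",
    data=ts,
    family=sm.families.NegativeBinomial(),
    offset=np.log(ts["admissions"]),
).fit()

# exp(coefficient) gives rate ratios: 'post' is the step change at the intervention,
# 'time_post' is the change in the underlying trend after it.
print(np.exp(model.params[["post", "time_post"]]))
```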
