ebook: Utilizing MapReduce to Address Big Data Enterprise Needs
Leveraging Big Data to shorten drug development cycles in the pharmaceutical industry


Contents

From the Vantage Point
What is in it for me?
Use Case: How Pharmaceutical Companies Can Leverage Big Data to Shorten Drug Development Cycles
Digital Microscopy: What is it and what are its challenges?
MapReduce and the Building Blocks for Solving the Business Problem
The Solution Architecture
Key Features
Flow Chart
Business Opportunities
Conclusion

From the Vantage Point

According to IDC, digital content will grow to 2.7 zettabytes (ZB), up 48% from the previous year (IDC Predictions 2012). Over 90% of this information will be unstructured (e.g., images, videos, MP3 files, and files based on social media and Web-enabled workloads): full of rich information, but challenging to understand and analyze. With this exponentially growing data, enterprises are struggling with information overload and are turning to Big Data technologies to address the challenge of transforming data into opportunity. This is especially critical in the pharmaceutical industry, where a shorter time to market can make a significant difference to patients' lives, academic research and the healthcare community at large.

In this ebook you will find use cases of histology (microscopy) data that exemplify the need for an enterprise software platform to upload, store, visualize and analyze enormous amounts of data in a high performance environment. This document explains the architecture of, and the need for, a unified technology platform that makes managing, storing, processing and analyzing Big Data faster and more efficient.

We introduce the High Performance (GPU) & Cloud Computing Enterprise Solution, which uses the MapReduce paradigm and is designed to solve relevant workflows and provide new insights into the increasing data available during different process stages. Due to the strategic nature of the data, high computing capabilities act as the foundation for other applications such as Business Intelligence and Dashboards, and in turn help in critical business decision making. Major IT trends such as GPGPU (General-Purpose Graphics Processing Units), Hadoop (MapReduce) and the Cloud make this solution possible. The High Performance (GPU) & Cloud Computing Enterprise Solution can be applied to a wide range of areas and industries.
This solution is industry agnostic and is beneficial to verticals such as telecom, retail, pharmaceutical, banking, financial services and insurance, which not only generate huge amounts of data but also need to process this data to ensure continuous growth and performance. This solution:
- Handles huge amounts of data
- Provides data analytics in a parallel computing manner
- Enables business solutions
- Generates critical reports solving underlying business problems

Here we provide an example of a Pharma-specific paradigm that can easily be extended to other industries. For example, there are underlying similarities between the following case story and personalized healthcare: both require collecting, managing and processing valuable medical information such as electronic health records, blood samples and DNA sequencing data to allow physicians to make the best possible recommendations to patients.

What is in it for me?

Faced with an ever-increasing amount of data, you will learn how to leverage the MapReduce concept to manage and analyze that data, and how to use the latest IT trends, such as Graphical Processing Units (GPUs) deployed on a Cloud infrastructure, to gain enormous compute capacity. Together with a well-defined distributed processing layer, this capacity can handle the data and bring out the intelligence hidden within it.

At a high level, we will:
- Chart out the architectural decisions to be made by an enterprise system leveraging the latest trends in Cloud Computing, Distributed Processing, Mobility, Collaboration and High Performance Computing
- Show how the High Performance (GPU) & Cloud Computing Enterprise Solution creates value for its stakeholders (i.e., Business Head, VP Engineering, Architect and Solution Specialist)

Even though this ebook focuses on a Pharmaceutical use case, the information herein is helpful to any stakeholder involved with an enterprise implementation: executives, technical consultants, business analysts, users, and implementation partners, particularly those responsible for the overall success of systems that require high performance computing to process enormous amounts of data in close to real time.

Persistent Systems Ltd. All rights reserved.

Use Case: How Pharmaceutical Companies Can Leverage Big Data to Shorten Drug Development Cycles

The data explosion occurred early on in the Life Sciences and Healthcare sector. The industry has recognized the need for technologies that can mine, analyze and translate the archipelago of data from the Human Genome Project into specific therapeutic drug targets. The deepening R&D productivity crisis that characterizes today's pharmaceutical development pipelines requires the industry to validate and predict the clinical attributes of a drug earlier in the product lifecycle. Predictive modelling technologies and other BI tools are helping the Pharma industry reduce the attrition rate of its drug pipelines.

Translational medicine aims to address the imbalance between the number of disease targets and therapeutic agents, and to enrich the drug pipeline by allowing scientists and clinicians to make associations between drug and disease earlier in the drug development process. Toxicology departments of Pharmaceutical companies utilize translational medicine and integrate technologies earlier in the product life cycle to predict benign as well as harmful effects of chemicals during the developmental phase.

Digital Microscopy: What is it and what are its challenges?

There is a critical need in the preclinical toxicology departments of Pharmaceutical companies to retrieve, digitize and store all the histological slides of the various organs from different animal models. Currently most of this analysis happens manually, and it is the most time-consuming portion of drug development, especially in generic drug development.

Pharmaceutical companies are moving from slides to digital microscopy thanks to the advent of high-throughput scanners. But this adds up to an enormous amount of digital data kept for regulatory and analysis purposes, as slides are stored at various resolutions from 5X to 100X, ranging from 1.6 GB to 20 GB per image. Automated analysis involves segmentation, image processing and classification engines to aid in supervised feature detection and reporting. Moreover, the image and associated analytical procedures need to be archived and maintained post necropsy for regulatory reasons. The problem is further compounded by the fact that a feature, and the abnormal effect of a chemical that it indicates, has to be cross-verified across various organs on animal models to make a correct histopathological analysis.

[Figure: Digital Microscopy Application. Recoverable step text, with the leading label of each step lost: "... each image in various resolution (5X to 100X) along with its thumbnail"; "... data format and store when available"; "... generated catalogue to a central location"; "... & Algorithms process Data"; "... classified engine separating normal tissues vs. abnormal tissues"; "... various images, add manual annotations & do peer reviews"]

The amount of digital data created by Digital Microscopy is enormous, and the datasets created in this business case are around 2-3 TB. This kind of dataset would previously have been very challenging and expensive to take on with a traditional RDBMS using standard bulk load and ETL approaches. The solution to this problem needs to efficiently combine multiple data sources, such as multiple sites across several countries simultaneously, or data residing on multiple machines (often dozens). MapReduce platforms handle this effectively by using a distributed file system that is specifically designed to handle datasets residing across distributed server farms. Distributed file systems should also be fault resilient and not impose the requirement of RAID drives on individual nodes.
One scanner handles around 300 slides at a time, generating 0.5 TB of data for each digital scan.

High Performance Computing (GPU), Cloud and MapReduce analytics technologies enable the study of larger quantities of biological data, with higher precision and in shorter periods of time, ultimately helping to accelerate advancements in personalized medicine. These technologies are expected to be applied to molecular medicine, pharmaceuticals, biomedicine, and industrial biotechnologies.
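The scanner figures quoted above (around 300 slides and 0.5 TB per run, datasets of 2-3 TB) can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope illustration; the function names and the use of decimal units (1 TB = 1000 GB) are assumptions, not part of the original material.

```python
import math

# Figures taken from the text; everything else is an illustrative assumption.
SLIDES_PER_RUN = 300   # "around 300 slides at a time"
TB_PER_RUN = 0.5       # "0.5 TB" per digital scan
GB_PER_TB = 1000       # assumption: decimal units


def avg_gb_per_slide(tb_per_run=TB_PER_RUN, slides=SLIDES_PER_RUN):
    """Average size of one scanned slide in GB."""
    return tb_per_run * GB_PER_TB / slides


def runs_for_dataset(dataset_tb, tb_per_run=TB_PER_RUN):
    """Number of full scanner runs needed to produce a dataset of the given size."""
    return math.ceil(dataset_tb / tb_per_run)


if __name__ == "__main__":
    print(f"average slide size: {avg_gb_per_slide():.2f} GB")
    for size_tb in (2, 3):
        runs = runs_for_dataset(size_tb)
        print(f"{size_tb} TB dataset: {runs} runs, about {runs * SLIDES_PER_RUN} slides")
```

Under these assumptions the average slide comes to roughly 1.7 GB, which sits at the low end of the 1.6 GB to 20 GB per-image range quoted earlier, and a 2-3 TB dataset corresponds to four to six scanner runs.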

MapReduce and the Building Blocks for Solving the Business Problem

The diagram below highlights the building blocks for solving the business problem associated with managing data created by Digital Microscopy.

The Service Platform depicts the middleware in a Pharma organization. The platform is augmented by a data acquisition component in order to acquire OEM vendor-specific data formats, ensuring that data compression and security requirements are handled at the network layer. Each lab has its corresponding workflow requirements, orchestrating the data analysis procedures based on its own study and biomedical requirements. The Reviewing/Monitoring and Reporting component helps technicians and physicians across sites to validate the decision making process and ensures that patients and insurance agencies receive well-documented reports utilizing healthcare protocols.

[Figure: Generic Solution Platform for a Pharmaceutical Company]

Copyright 2015 Persistent Systems Ltd. All rights reserved.

MapReduce Component

The open source implementation of MapReduce, called Hadoop, pairs a MapReduce processing component with a distributed file system. This component processes data from multiple inputs (the "map" step) and then reduces it using an image processing function specific to a given organ (as defined in the workflow component), which distills and extracts the desired results. The MapReduce component is planned to scale over thousands of nodes and tends to have high latency. GPGPU-enabled nodes allow MapReduce nodes to process large volumes of data in a write-once data format. Hadoop provides efficient data file processing across various organs and across various sites. This enables distributed data processing without forcing data to be collected and processed in a central location. Compute Unified Device Architecture (CUDA) parallelizes the program at a second level, with the MapReduce framework regarded as the first level of parallelization.

MapReduce: MapReduce is a simple yet very powerful method for processing and analyzing extremely large data sets, reaching up to the multi-petabyte level.

Algorithm Component

This represents the implementation of learning algorithms based on organ histology, as defined by a pathologist for a given animal model. GPU-based implementations are expected to surpass the computational capabilities of multicore CPUs and have the potential to revolutionize the applicability of deep unsupervised learning methods. It is presumed that GPU-based machine learning tasks will be designed with the GPU architecture's constraints on instruction types and memory accesses in mind.
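To make the MapReduce Component description more concrete, the following is a minimal single-process sketch of the map, shuffle and organ-specific reduce steps. It is not Hadoop code and not the solution's implementation: the tile records, the `ORGAN_THRESHOLDS` values and all function names are hypothetical, and a simple per-organ threshold stands in for the real image processing function.

```python
from collections import defaultdict

# Hypothetical input: (site, organ, fraction of the tile flagged by image analysis).
TILES = [
    ("site_a", "liver", 0.02),
    ("site_a", "liver", 0.40),
    ("site_b", "kidney", 0.01),
    ("site_b", "liver", 0.35),
    ("site_b", "kidney", 0.03),
]

# Hypothetical organ-specific cutoffs standing in for organ-specific reduce logic.
ORGAN_THRESHOLDS = {"liver": 0.30, "kidney": 0.10}


def map_tile(tile):
    """Map step: emit (organ, flagged_fraction) for one tile, regardless of site."""
    _site, organ, flagged_fraction = tile
    yield organ, flagged_fraction


def shuffle(pairs):
    """Group mapped values by key, as a MapReduce framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_organ(organ, fractions):
    """Reduce step: apply the organ-specific threshold and summarize the tiles."""
    threshold = ORGAN_THRESHOLDS[organ]
    abnormal = sum(1 for f in fractions if f >= threshold)
    return organ, {"tiles": len(fractions), "abnormal_tiles": abnormal}


def run_job(tiles):
    """Run map, shuffle and reduce in one process for illustration."""
    pairs = (pair for tile in tiles for pair in map_tile(tile))
    return dict(reduce_organ(organ, values) for organ, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(TILES))
```

In a real cluster the map calls would run on the nodes that hold the image data and only the grouped intermediate values would move across the network, which is what allows processing without collecting everything in a central location.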

The Solution Architecture

The solution proposes the use of an Enterprise Information Architecture modelled on a High Performance Computing grid utilizing GPU nodes on a Hadoop MapReduce architecture.

Key Features

[Figure: Key Features]

Flow Chart

1. Digital microscopy, with the help of scanners, generates the image.
2. Automated analysis, involving image processing, segmentation and classification engines, aids in supervised feature detection.
3. Feature identification and its corresponding abnormal effect due to a chemical is verified across various organs through a conclusion engine on animal models to make a correct histopathological analysis.
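The third step of the flow, cross-verification across organs, can be pictured with a small sketch. The rule used here (a feature counts as cross-verified when it is classified abnormal in at least two organs of the same animal) is an assumption chosen for illustration, as are the `Finding` record, the `conclude` function and the sample data; the source does not specify how the conclusion engine decides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical output of the classification step for one organ slide."""
    animal_id: str
    organ: str
    feature: str
    abnormal: bool


def conclude(findings, min_organs=2):
    """Return features seen as abnormal in at least `min_organs` organs per animal.

    The result maps (animal_id, feature) to the sorted list of affected organs.
    """
    organs_by_key = defaultdict(set)
    for finding in findings:
        if finding.abnormal:
            organs_by_key[(finding.animal_id, finding.feature)].add(finding.organ)
    return {
        key: sorted(organs)
        for key, organs in organs_by_key.items()
        if len(organs) >= min_organs
    }


if __name__ == "__main__":
    sample = [
        Finding("rat-01", "liver", "necrosis", True),
        Finding("rat-01", "kidney", "necrosis", True),
        Finding("rat-01", "liver", "fibrosis", True),
        Finding("rat-02", "liver", "necrosis", False),
    ]
    print(conclude(sample))
```

With the sample data only necrosis in rat-01 is reported, because it is the only feature flagged in more than one organ of the same animal.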

Business Opportunities

Due to security and data constraints across sites, the case study solution is an on-premise solution for Pharmaceutical companies. However, enterprises in other industries are now willing to move data into private-public hybrid cloud environments once the de-identification and compliance requirements are fulfilled.

Very low cost commodity hardware can be used to power MapReduce clusters, since redundancy and fault resistance are built into the software platform, offering an alternative to expensive enterprise hardware or proprietary software solutions. A public-hybrid cloud solution based on OpenStack can be commercialized for this purpose. This makes it easier to add more capacity (and therefore scale), making the above solution/platform an affordable and very granular way to scale out instead of up.

With public cloud vendors providing options to choose GPU capabilities and computational power down to the node level, processing enormous amounts of data becomes feasible. It also enables companies to carry out detailed analysis of business data that would take too long or would be too expensive to carry out using a traditional RDBMS. The ability to take mountains of inbound or existing business data, spread the work over a large distributed cloud, add structure (workflow and GPU power), and import the result into an RDBMS makes this solution very generic across various industries.

Many organizations already have proven code that is tested, hardened and ready to use but is limited without an enabling framework. The enterprise platform solution depicted above, along with a mature distributed computing layer, can transition these assets to a much larger and more powerful environment.
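The last step mentioned above, importing the distilled result into an RDBMS, might look like the sketch below. It uses Python's built-in `sqlite3` module purely as a stand-in for an enterprise database; the `organ_summary` table, its columns and the sample `RESULTS` are hypothetical and not taken from the source.

```python
import sqlite3

# Hypothetical per-organ summaries, as a distributed job might emit them.
RESULTS = {
    "liver": {"tiles": 3, "abnormal_tiles": 2},
    "kidney": {"tiles": 2, "abnormal_tiles": 0},
}


def load_results(conn, results):
    """Create the summary table if needed and upsert one row per organ."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS organ_summary ("
        "organ TEXT PRIMARY KEY, "
        "tiles INTEGER NOT NULL, "
        "abnormal_tiles INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO organ_summary VALUES (?, ?, ?)",
        [(organ, s["tiles"], s["abnormal_tiles"]) for organ, s in results.items()],
    )
    conn.commit()


def abnormal_rate(conn, organ):
    """Return the share of abnormal tiles for an organ, or None if it is unknown."""
    row = conn.execute(
        "SELECT CAST(abnormal_tiles AS REAL) / tiles FROM organ_summary WHERE organ = ?",
        (organ,),
    ).fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_results(connection, RESULTS)
    print(abnormal_rate(connection, "liver"))
```

The design point is that only the small, structured summary reaches the relational database, where existing BI and reporting tools can query it, while the bulky image data stays in the distributed layer.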

Conclusion

Graphical processors have emerged as a commodity platform for parallel computation. However, the development team needs knowledge of GPU architecture and must invest effort in tuning performance. The High Performance (GPU) & Cloud Computing Enterprise Solution uses a GPU-based MapReduce implementation which is scaled over public and private cloud. The platform will be extended in the future with data mining capabilities to utilize datasets shared across private and public domains.

Already, some academic implementations (e.g., Mars, a MapReduce framework on graphics processors) have proven successful. Amazon.com is working aggressively to convert the same architecture into commercially available solutions. Taking projects like Mars, expanding the network fabric (over public and private cloud) and adding more power at the node level through GPUs will allow solutions to be used across different industries.

In the future, projects will see various levels of maturity, where operations will monitor power, cooling (for example, GreenHDFS) and the reliability of a problem, and automatically orchestrate the components of a class of service based on accounting or rating rules. Rules will process the problem across a portfolio of sensors, network fabrics, storage fabrics, desktops and servers. Massively parallel methods and supervised engines that intelligently process the data will help resolve Big Data problems across industries. Surpassing the computational capabilities of multicore CPUs, modern graphics processors will revolutionize the applicability of deep unsupervised learning methods.

About Persistent Systems

Persistent Systems (BSE & NSE: PERSISTENT) builds software that drives the business of our customers: enterprises and software product companies with software at the core of their digital transformation. For more information, please visit:

India
Persistent Systems Limited
Bhageerath, 402, Senapati Bapat Road
Pune
Tel: +91 (20)
Fax: +91 (20)

USA
Persistent Systems, Inc
Laurelwood Road, Suite 210
Santa Clara, CA
Tel: +1 (408)
Fax: +1 (408)

References

- IDC Predictions 2012: Competing for 2020 (Doc #231720).
- David Garlan et al.: Architectural Description of Component-Based Systems. White paper, research community project at Wright Laboratory, Aeronautical Systems Center, Air Force Materiel Command, USAF.
- Dahman, K. et al.: Generation of Component Based Architecture from Business Processes: Model Driven Engineering for SOA. IEEE 8th European Conference.
- Hadoop-GIS: a High Performance Query System for Analytical Medical Imaging. Interdisciplinary biomedical research, which accelerates the diagnosis and understanding of brain tumor for better cure.
- J. Dean and S. Ghemawat: MapReduce: Simplified data processing on large clusters. OSDI.
- CUDA.
- Shubin Zhang et al.: SJMR: Parallelizing Spatial Join with MapReduce on Clusters. IEEE Clusters Computing.


More information

BIG DATA-AS-A-SERVICE

BIG DATA-AS-A-SERVICE White Paper BIG DATA-AS-A-SERVICE What Big Data is about What service providers can do with Big Data What EMC can do to help EMC Solutions Group Abstract This white paper looks at what service providers

More information

ANALYTICS BUILT FOR INTERNET OF THINGS

ANALYTICS BUILT FOR INTERNET OF THINGS ANALYTICS BUILT FOR INTERNET OF THINGS Big Data Reporting is Out, Actionable Insights are In In recent years, it has become clear that data in itself has little relevance, it is the analysis of it that

More information

Building a Scalable Big Data Infrastructure for Dynamic Workflows

Building a Scalable Big Data Infrastructure for Dynamic Workflows Building a Scalable Big Data Infrastructure for Dynamic Workflows INTRODUCTION Organizations of all types and sizes are looking to big data to help them make faster, more intelligent decisions. Many efforts

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

Integrate Big Data into Business Processes and Enterprise Systems. solution white paper

Integrate Big Data into Business Processes and Enterprise Systems. solution white paper Integrate Big Data into Business Processes and Enterprise Systems solution white paper THOUGHT LEADERSHIP FROM BMC TO HELP YOU: Understand what Big Data means Effectively implement your company s Big Data

More information

From Data to Foresight:

From Data to Foresight: Laura Haas, IBM Fellow IBM Research - Almaden From Data to Foresight: Leveraging Data and Analytics for Materials Research 1 2011 IBM Corporation The road from data to foresight is long? Consumer Reports

More information

IBM Deep Computing Visualization Offering

IBM Deep Computing Visualization Offering P - 271 IBM Deep Computing Visualization Offering Parijat Sharma, Infrastructure Solution Architect, IBM India Pvt Ltd. email: parijatsharma@in.ibm.com Summary Deep Computing Visualization in Oil & Gas

More information

IO Informatics The Sentient Suite

IO Informatics The Sentient Suite IO Informatics The Sentient Suite Our software, The Sentient Suite, allows a user to assemble, view, analyze and search very disparate information in a common environment. The disparate data can be numeric

More information

IBM Netezza High Capacity Appliance

IBM Netezza High Capacity Appliance IBM Netezza High Capacity Appliance Petascale Data Archival, Analysis and Disaster Recovery Solutions IBM Netezza High Capacity Appliance Highlights: Allows querying and analysis of deep archival data

More information

MicroStrategy Cloud Reduces the Barriers to Enterprise BI...

MicroStrategy Cloud Reduces the Barriers to Enterprise BI... MicroStrategy Cloud Reduces the Barriers to Enterprise BI... MicroStrategy Cloud reduces the traditional barriers that organizations face when implementing enterprise business intelligence solutions. MicroStrategy

More information

Big Data for Investment Research Management

Big Data for Investment Research Management IDT Partners www.idtpartners.com Big Data for Investment Research Management Discover how IDT Partners helps Financial Services, Market Research, and Investment firms turn big data into actionable research

More information

Augmented Search for Web Applications. New frontier in big log data analysis and application intelligence

Augmented Search for Web Applications. New frontier in big log data analysis and application intelligence Augmented Search for Web Applications New frontier in big log data analysis and application intelligence Business white paper May 2015 Web applications are the most common business applications today.

More information

Cray: Enabling Real-Time Discovery in Big Data

Cray: Enabling Real-Time Discovery in Big Data Cray: Enabling Real-Time Discovery in Big Data Discovery is the process of gaining valuable insights into the world around us by recognizing previously unknown relationships between occurrences, objects

More information

Understanding Object Storage and How to Use It

Understanding Object Storage and How to Use It SWIFTSTACK WHITEPAPER An IT Expert Guide: Understanding Object Storage and How to Use It November 2014 The explosion of unstructured data is creating a groundswell of interest in object storage, certainly

More information

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms

More information

Using Tableau Software with Hortonworks Data Platform

Using Tableau Software with Hortonworks Data Platform Using Tableau Software with Hortonworks Data Platform September 2013 2013 Hortonworks Inc. http:// Modern businesses need to manage vast amounts of data, and in many cases they have accumulated this data

More information

Flash Memory Arrays Enabling the Virtualized Data Center. July 2010

Flash Memory Arrays Enabling the Virtualized Data Center. July 2010 Flash Memory Arrays Enabling the Virtualized Data Center July 2010 2 Flash Memory Arrays Enabling the Virtualized Data Center This White Paper describes a new product category, the flash Memory Array,

More information

Big Data - Infrastructure Considerations

Big Data - Infrastructure Considerations April 2014, HAPPIEST MINDS TECHNOLOGIES Big Data - Infrastructure Considerations Author Anand Veeramani / Deepak Shivamurthy SHARING. MINDFUL. INTEGRITY. LEARNING. EXCELLENCE. SOCIAL RESPONSIBILITY. Copyright

More information

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014 5 Keys to Unlocking the Big Data Analytics Puzzle Anurag Tandon Director, Product Marketing March 26, 2014 1 A Little About Us A global footprint. A proven innovator. A leader in enterprise analytics for

More information

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System By Jake Cornelius Senior Vice President of Products Pentaho June 1, 2012 Pentaho Delivers High-Performance

More information

Using In-Memory Computing to Simplify Big Data Analytics

Using In-Memory Computing to Simplify Big Data Analytics SCALEOUT SOFTWARE Using In-Memory Computing to Simplify Big Data Analytics by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T he big data revolution is upon us, fed

More information

Extend your analytic capabilities with SAP Predictive Analysis

Extend your analytic capabilities with SAP Predictive Analysis September 9 11, 2013 Anaheim, California Extend your analytic capabilities with SAP Predictive Analysis Charles Gadalla Learning Points Advanced analytics strategy at SAP Simplifying predictive analytics

More information

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect on AWS Services Overview Bernie Nallamotu Principle Solutions Architect \ So what is it? When your data sets become so large that you have to start innovating around how to collect, store, organize, analyze

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

Big Data and Natural Language: Extracting Insight From Text

Big Data and Natural Language: Extracting Insight From Text An Oracle White Paper October 2012 Big Data and Natural Language: Extracting Insight From Text Table of Contents Executive Overview... 3 Introduction... 3 Oracle Big Data Appliance... 4 Synthesys... 5

More information

ENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE. October 2013

ENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE. October 2013 ENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE October 2013 Introduction As sequencing technologies continue to evolve and genomic data makes its way into clinical use and

More information

Big Data in Enterprise challenges & opportunities. Yuanhao Sun 孙 元 浩 yuanhao.sun@intel.com Software and Service Group

Big Data in Enterprise challenges & opportunities. Yuanhao Sun 孙 元 浩 yuanhao.sun@intel.com Software and Service Group Big Data in Enterprise challenges & opportunities Yuanhao Sun 孙 元 浩 yuanhao.sun@intel.com Software and Service Group Big Data Phenomenon 1.8ZB in 2011 2 Days > the dawn of civilization to 2003 750M Photos

More information

Clodoaldo Barrera Chief Technical Strategist IBM System Storage. Making a successful transition to Software Defined Storage

Clodoaldo Barrera Chief Technical Strategist IBM System Storage. Making a successful transition to Software Defined Storage Clodoaldo Barrera Chief Technical Strategist IBM System Storage Making a successful transition to Software Defined Storage Open Server Summit Santa Clara Nov 2014 Data at the core of everything Data is

More information

APPROACHABLE ANALYTICS MAKING SENSE OF DATA

APPROACHABLE ANALYTICS MAKING SENSE OF DATA APPROACHABLE ANALYTICS MAKING SENSE OF DATA AGENDA SAS DELIVERS PROVEN SOLUTIONS THAT DRIVE INNOVATION AND IMPROVE PERFORMANCE. About SAS SAS Business Analytics Framework Approachable Analytics SAS for

More information

Big Data and Apache Hadoop Adoption:

Big Data and Apache Hadoop Adoption: Expert Reference Series of White Papers Big Data and Apache Hadoop Adoption: Key Challenges and Rewards 1-800-COURSES www.globalknowledge.com Big Data and Apache Hadoop Adoption: Key Challenges and Rewards

More information

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

Is a Data Scientist the New Quant? Stuart Kozola MathWorks Is a Data Scientist the New Quant? Stuart Kozola MathWorks 2015 The MathWorks, Inc. 1 Facts or information used usually to calculate, analyze, or plan something Information that is produced or stored by

More information

High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances

High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances Highlights IBM Netezza and SAS together provide appliances and analytic software solutions that help organizations improve

More information

Big Data 101: Harvest Real Value & Avoid Hollow Hype

Big Data 101: Harvest Real Value & Avoid Hollow Hype Big Data 101: Harvest Real Value & Avoid Hollow Hype 2 Executive Summary Odds are you are hearing the growing hype around the potential for big data to revolutionize our ability to assimilate and act on

More information

Tap into Big Data at the Speed of Business

Tap into Big Data at the Speed of Business SAP Brief SAP Technology SAP Sybase IQ Objectives Tap into Big Data at the Speed of Business A simpler, more affordable approach to Big Data analytics A simpler, more affordable approach to Big Data analytics

More information

Evolution from Big Data to Smart Data

Evolution from Big Data to Smart Data Evolution from Big Data to Smart Data Information is Exploding 120 HOURS VIDEO UPLOADED TO YOUTUBE 50,000 APPS DOWNLOADED 204 MILLION E-MAILS EVERY MINUTE EVERY DAY Intel Corporation 2015 The Data is Changing

More information

BSC vision on Big Data and extreme scale computing

BSC vision on Big Data and extreme scale computing BSC vision on Big Data and extreme scale computing Jesus Labarta, Eduard Ayguade,, Fabrizio Gagliardi, Rosa M. Badia, Toni Cortes, Jordi Torres, Adrian Cristal, Osman Unsal, David Carrera, Yolanda Becerra,

More information

OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT

OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT WHITEPAPER OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT A top-tier global bank s end-of-day risk analysis jobs didn t complete in time for the next start of trading day. To solve

More information

IBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS!

IBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS! The Bloor Group IBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS VENDOR PROFILE The IBM Big Data Landscape IBM can legitimately claim to have been involved in Big Data and to have a much broader

More information

Investor Presentation. Second Quarter 2015

Investor Presentation. Second Quarter 2015 Investor Presentation Second Quarter 2015 Note to Investors Certain non-gaap financial information regarding operating results may be discussed during this presentation. Reconciliations of the differences

More information

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of

More information

The big data revolution

The big data revolution The big data revolution Friso van Vollenhoven (Xebia) Enterprise NoSQL Recently, there has been a lot of buzz about the NoSQL movement, a collection of related technologies mostly concerned with storing

More information

The Impact of PaaS on Business Transformation

The Impact of PaaS on Business Transformation The Impact of PaaS on Business Transformation September 2014 Chris McCarthy Sr. Vice President Information Technology 1 Legacy Technology Silos Opportunities Business units Infrastructure Provisioning

More information

Advanced In-Database Analytics

Advanced In-Database Analytics Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??

More information

Discover more, discover faster. High performance, flexible NLP-based text mining for life sciences

Discover more, discover faster. High performance, flexible NLP-based text mining for life sciences Discover more, discover faster. High performance, flexible NLP-based text mining for life sciences It s not information overload, it s filter failure. Clay Shirky Life Sciences organizations face the challenge

More information

Data Storage At the Heart of any Information System. Ken Claffey, VP/GM - June 2015

Data Storage At the Heart of any Information System. Ken Claffey, VP/GM - June 2015 Data Storage At the Heart of any Information System Ken Claffey, VP/GM - June 2015 Seagate: A Unique Vantage Point on the Data Centre Evolution of the world s digital information End-to-end cloud solutions:

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

SQL Server 2012 Performance White Paper

SQL Server 2012 Performance White Paper Published: April 2012 Applies to: SQL Server 2012 Copyright The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication.

More information

Software-defined Storage Architecture for Analytics Computing

Software-defined Storage Architecture for Analytics Computing Software-defined Storage Architecture for Analytics Computing Arati Joshi Performance Engineering Colin Eldridge File System Engineering Carlos Carrero Product Management June 2015 Reference Architecture

More information

Deploying an Operational Data Store Designed for Big Data

Deploying an Operational Data Store Designed for Big Data Deploying an Operational Data Store Designed for Big Data A fast, secure, and scalable data staging environment with no data volume or variety constraints Sponsored by: Version: 102 Table of Contents Introduction

More information

Hadoop in the Hybrid Cloud

Hadoop in the Hybrid Cloud Presented by Hortonworks and Microsoft Introduction An increasing number of enterprises are either currently using or are planning to use cloud deployment models to expand their IT infrastructure. Big

More information

Big Data Are You Ready? Thomas Kyte http://asktom.oracle.com

Big Data Are You Ready? Thomas Kyte http://asktom.oracle.com Big Data Are You Ready? Thomas Kyte http://asktom.oracle.com The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated

More information

NetApp Big Content Solutions: Agile Infrastructure for Big Data

NetApp Big Content Solutions: Agile Infrastructure for Big Data White Paper NetApp Big Content Solutions: Agile Infrastructure for Big Data Ingo Fuchs, NetApp April 2012 WP-7161 Executive Summary Enterprises are entering a new era of scale, in which the amount of data

More information

GRIDS IN DATA WAREHOUSING

GRIDS IN DATA WAREHOUSING GRIDS IN DATA WAREHOUSING By Madhu Zode Oct 2008 Page 1 of 6 ABSTRACT The main characteristic of any data warehouse is its ability to hold huge volume of data while still offering the good query performance.

More information