AEGLE: a reference big data architecture for the healthcare sector
1 AEGLE: a reference big data architecture for the healthcare sector jos.dumortier@timelex.eu Amsterdam, 7 June 2016
2 Introduction
- Horizon 2020 Innovation Action
- Partners from Belgium, France, Greece, Italy, Netherlands, Portugal, Sweden & UK
- Started in March 2015 (42 months)
4 Objective
- Reference big-data architecture
- Covering a large part of the health spectrum:
  - Malignant chronic diseases
  - Non-malignant chronic diseases
  - Acute care
5 AEGLE Principles: Alignment with the data value chain
[Diagram] Data value chain stages: collect, prepare, organize, integrate, analyze, visualize, decide.
[Diagram labels] Access control rules; Common representation of data; Analytic results to decision makers; Inventory of available data; Syntax structure and semantics; Integrated data analysis; Informed based decisions.
[Diagram labels] Users, Stakeholders, Providers: Hospitals, Specialists, Patients, Researchers. Use cases: DII, ICU, CLL. Product Oriented / Research Oriented.
6 Users Challenges
- Understand the exploitation perspective of real-life big bioclinical data for diverse use cases
- Provide a framework to accommodate diverse big bioclinical data management and analytics requirements according to the data type and the application domain
- Enable scientific question answering by exploiting big bioclinical data in a way not possible until now
- Provide an adaptable and user-friendly working end-user environment
- Rely on clear and comprehensive end-user agreements for service provision
7 Technical Challenges
- Address the requirements posed by different types of big bioclinical data:
  - Biomolecular and clinical data (the CLL use case)
  - Real-time streaming and clinical events data (the ICU use case)
  - Large observational healthcare databases (the DII use case)
- Implement the necessary mechanisms for comprehensive data management and analytics, fully compliant with privacy, legal and ethical norms (e.g. anonymisation, policies, etc.)
- Provide efficient response time (when needed) via acceleration technologies
- Offer a scalable & sustainable IT solution
8 Business Challenges
- Create sustainable business models taking into account the needs of the three scenarios to which AEGLE is applied
- Accurately identify the exploitable items
- Define a strategy for the long-term viability of the platform
- Create an ecosystem of stakeholders
9 AEGLE Cases
10 Intensive Care Unit (ICU)
Challenges
- Mechanical ventilation & patient-ventilator interactions
- Personalization of Patient Care & Early Identification of Deterioration
Questions to be answered
- Ineffective Efforts (IE) characteristics
- Recognition, Incidence, Significance, Prediction of IE
- Tools to guide and monitor nutrition
- Tools to identify early deteriorating trends
11 Chronic Lymphocytic Leukemia (CLL)
Challenges
- Clinically and biologically heterogeneous
- Optimal care and treatment decisions depend on the integration of tumor- and host-derived variables
Questions to be answered
- Identification of novel prognostic markers
- Prediction model for Monoclonal B Lymphocytosis evolution
- Prediction model for CLL natural course
12 Type 2 Diabetes (T2DM)
Challenges
- Long-term condition
- Increasing in prevalence
- Increasing in morbidity and mortality
- Improving disease management
Questions to be answered
- Define why some cases do better than others: prognostic indicators
- Improve patient outcomes: methods of intervention
- Define an accurate cohort and the feasibility of a clinical trial
- Identify potential points for intervention and types of intervention available
13 Where do we stand now?
- User-centered design approach for developing a big bioclinical data analytics platform
- First release of the AEGLE system architecture
- Rapid prototyping for proof-of-concept illustration and user engagement: data, analytics and data management mechanisms already in place
- Beginning of first validation phase
- Initial legal and ethical assessment
- First steps taken on the business landscape for AEGLE
14 AEGLE Big Data Framework Software Stack
[Diagram] Components, from the integration layer down to the infrastructure:
- SQOOP: data transfer to HDFS
- WebHDFS: HDFS REST API
- RM RT: YARN REST API
- LIVY: SPARK REST API
- HIVE: Hadoop SQL API
- PIG: scripting, workflow management
- Pydoop: Python Hadoop API
- SPARK SQL: SQL API for Spark
- MLlib: machine learning library
- HADOOP MAPREDUCE: distributed engine for batch job processing
- SPARK: distributed engine for fast in-memory processing
- YARN: virtual cluster resource manager
- HDFS2: virtual cluster distributed file system
- VM Nodes (seven shown in the diagram)
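The stack above exposes HDFS and YARN over REST (WebHDFS and the YARN ResourceManager API). As a minimal sketch of how a web client might address those endpoints, the helpers below only build the request URLs and send nothing. The host names, the `/data/icu` path, and the helper names `webhdfs_url` and `yarn_apps_url` are hypothetical; the ports are the usual Hadoop 2 defaults and may differ in the AEGLE deployment.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=50070, params=None):
    """Build a WebHDFS URL: http://<host>:<port>/webhdfs/v1/<path>?op=<OP>.

    `path` is an absolute HDFS path such as "/data/icu" (hypothetical).
    50070 is the default NameNode HTTP port in Hadoop 2 (assumed here).
    """
    query = {"op": op}
    query.update(params or {})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"


def yarn_apps_url(host, port=8088, **filters):
    """Build a YARN ResourceManager URL that lists cluster applications.

    Keyword filters (for example states="RUNNING") become query parameters.
    8088 is the default ResourceManager web port (assumed here).
    """
    base = f"http://{host}:{port}/ws/v1/cluster/apps"
    return f"{base}?{urlencode(filters)}" if filters else base


if __name__ == "__main__":
    # Hypothetical hosts; a real client would issue HTTP GET requests to these URLs.
    print(webhdfs_url("namenode.example", "/data/icu", "LISTSTATUS"))
    print(yarn_apps_url("rm.example", states="RUNNING"))
```

A GET on the first URL would list a directory, and a GET on the second would return the running applications as JSON, assuming the cluster allows unauthenticated access, which a clinical deployment would normally not.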
15 Big Data Framework Features
- Dockerized Hadoop/SPARK cluster
- Dockerized MySQL cluster
- REST APIs for web-based integration
- Platform-independent automatic deployment
- Current deployment: 1 Master node, 2 SQL nodes, 4 Data nodes (~okeanos); 8-node Hadoop/SPARK cluster (~okeanos)
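To illustrate what "REST APIs for web-based integration" could look like for Spark jobs, here is a hedged sketch that prepares a Livy batch submission (a POST to `/batches`) without sending it. The Livy host, the HDFS path of the job file, the arguments, and the function name `build_livy_batch_request` are assumptions for illustration; the slides do not show how AEGLE actually submits jobs.

```python
import json
from urllib.request import Request


def build_livy_batch_request(livy_host, job_file, args=None, port=8998):
    """Prepare (but do not send) a POST /batches request for Apache Livy.

    `job_file` is the location of the Spark application, typically on HDFS.
    8998 is Livy's default port (assumed to be unchanged here).
    """
    body = {"file": job_file, "args": list(args or [])}
    return Request(
        url=f"http://{livy_host}:{port}/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host, script location and argument.
    req = build_livy_batch_request(
        "livy.example", "hdfs:///jobs/cohort_stats.py", args=["--cohort", "t2dm"]
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Passing `req` to urllib.request.urlopen would submit the batch.
```

Keeping request construction separate from sending makes the payload easy to inspect and test before it reaches a live cluster.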
16 Legal Challenges
- Identify and incorporate all ethical and regulatory issues underlying the realization of the project aims
- Contribute to the definition of a common ethical and regulatory framework for big bioclinical data management and analytics at the European level
17 Two legal perspectives
1. Short term: legal compliance of research & innovation activities in the scope of the AEGLE action (all phases of the data value chain)
2. Long term: legal framework for the AEGLE start-up (and for other European big data initiatives in the health sector)
18 1. Short term
- Focus on compliance of AEGLE RIA with current data protection law ("avoid doing something illegal")
- Core provision (Directive 95/46, article 6.1(b)): "Further processing of data for historical, statistical or scientific purposes shall not be considered as incompatible provided that Member States provide appropriate safeguards"
- AEGLE: further processing of clinical data for scientific purposes
- Main question to examine: which appropriate safeguards of (which) Member States have to be taken into account? (example: need of approval of DPA if re-use is not based on patient consent)
19 2. Longer term: EU legal framework for big data processing in the health sector
- Context: AEGLE start-up
- Initial legal framework: General Data Protection Regulation
- Core provision: art. 6.1(b) in conjunction with art. 83
- Objective: analyse the (current/developing) legal situation in the 28 Member States + recommend possible EU initiatives
20 EU Data Protection Regulation 2016/679
- Art. 6.1(b) : [Personal data must be] collected for specified, explicit and legitimate purposes and not further processed in a way incompatible with those purposes; further processing of personal data for archiving purposes in the public interest, or scientific and historical research purposes or statistical purposes shall, in accordance with Article 83(1), not be considered incompatible with the initial purposes;
- Art : Where personal data are processed for scientific, statistical or historical purposes Union or Member State law may, subject to appropriate safeguards for the rights and freedoms of the data subject, provide for derogations from Articles 14a(1) and (2), 15, 16, 17, 17a, 17b, 18 and 19, insofar as such derogation is necessary for the fulfilment of the specific purposes (...)
- Art : The appropriate safeguards referred to in paragraphs 1 and 1a shall be laid down in Union or Member State law and be such to ensure that technological and/or organisational protection measures pursuant to this Regulation are applied to the personal data (...), to minimise the processing of personal data in pursuance of the proportionality and necessity principles, such as pseudonymising the data, unless those measures prevent achieving the purpose of the processing and such purpose cannot be otherwise fulfilled within reasonable means.
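The provision above names pseudonymisation as one possible safeguard. As a minimal sketch of one common technique, the code below replaces a direct identifier with a keyed hash (HMAC-SHA256), so that records of the same patient stay linkable while the identifier itself is not exposed. This is an illustrative assumption, not the method described by AEGLE; the field name `patient_id` and the sample record are hypothetical, and whether such a measure is a sufficient safeguard is a legal question the code does not answer.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a keyed hash of the identifier (HMAC-SHA256, hex encoded).

    The same identifier and key always give the same pseudonym, which keeps
    records linkable; without the key the mapping cannot be recomputed.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_record(record: dict, secret_key: bytes, id_field: str = "patient_id") -> dict:
    """Return a copy of the record with the identifier field pseudonymised."""
    result = dict(record)
    result[id_field] = pseudonymise(str(record[id_field]), secret_key)
    return result


if __name__ == "__main__":
    # Hypothetical key and record; a real key would be stored separately
    # from the data, under the control of the data provider.
    key = b"example-key-kept-by-the-hospital"
    print(pseudonymise_record({"patient_id": "GR-000123", "hba1c": 7.4}, key))
```

Because the key holder can regenerate the mapping, the output remains personal data in the legal sense: this is pseudonymisation, not anonymisation.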
21 Co-funded by the Horizon 2020 Framework Programme of the European Union under Grant Agreement nº
Partners: EXUS AE (Coordinator), ICCS, KINGSTON, CERTH, Maxeler Technologies Limited, UPPSALA UNIVERSITET, UNISR, Time.Lex, EUR, CHS, LOBA, PAGNI, GNUBILA FRANCE
More informationBig Data Analytics Platform @ Nokia
Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform
More informationOutline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging
Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging
More informationBig data platform for IoT Cloud Analytics. Chen Admati, Advanced Analytics, Intel
Big data platform for IoT Cloud Analytics Chen Admati, Advanced Analytics, Intel Agenda IoT @ Intel End-to-End offering Analytics vision Big data platform for IoT Cloud Analytics Platform Capabilities
More informationAzure Data Lake Analytics
Azure Data Lake Analytics Compose and orchestrate data services at scale Fully managed service to support orchestration of data movement and processing Connect to relational or non-relational data
More informationNative Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy
Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics
More informationBig Data Analytics with Spark and Oscar BAO. Tamas Jambor, Lead Data Scientist at Massive Analytic
Big Data Analytics with Spark and Oscar BAO Tamas Jambor, Lead Data Scientist at Massive Analytic About me Building a scalable Machine Learning platform at MA Worked in Big Data and Data Science in the
More informationBig Data Explained. An introduction to Big Data Science.
Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of
More informationBuild a Streamlined Data Refinery. An enterprise solution for blended data that is governed, analytics-ready, and on-demand
Build a Streamlined Data Refinery An enterprise solution for blended data that is governed, analytics-ready, and on-demand Introduction As the volume and variety of data has exploded in recent years, putting
More informationBig Data and Industrial Internet
Big Data and Industrial Internet Keijo Heljanko Department of Computer Science and Helsinki Institute for Information Technology HIIT School of Science, Aalto University keijo.heljanko@aalto.fi 16.6-2015
More information