Thomas Baumann Swiss Mobiliar Bern, Switzerland
Transcription
1 Thomas Baumann Swiss Mobiliar Bern, Switzerland
2 WHAT IS BIG DATA (1/3): 3-5 V = 11 Volume: from terabytes to exabytes to process. Velocity: from milliseconds to minutes to respond. Variety: from structured to unstructured to store and query. Veracity: from ACID to inconsistent to manage. Value: from data to insight to transform.
3 WHAT IS BIG DATA (2/3): THE IT PERSPECTIVE Distributed, scalable, fault-tolerant technologies. Query languages: Pig Latin, Hive QL, Impala, CQL, Cypher, Map/Reduce. Process and resource managers: YARN, Cassandra Kernel, Neo4j Kernel. Data stores: HDFS (Hadoop Distributed File System), Cassandra, Neo4j.
4 WHAT IS BIG DATA (3/3) Big Data := Gaining actionable insights to create competitive advantage and to mitigate risks from combining new data sources by using scalable technologies. Th.Baumann 2015
5 WHAT IS BIG DATA (3/3) Big Data Ecosystem Actionable Insights Actionable Information IoT, Sensors Events Data «Lake» OLTP data Data Warehouse event processing OLTP transactions data processing (ETL)
6 CONTENT Swiss Mobiliar Company Introduction What Is Big Data In More Detail Auditing Big Data: What Is Specific For Big Data? Using Big Data Tools and Technologies For Yourself
7 SWISS MOBILIAR Switzerland's most personal insurer. Legal form of a cooperative association (mutual company). Switzerland's number one insurer for household contents, business and pure risk life insurance. Close to customers throughout the country thanks to around 80 general agencies at 160 locations. Over 1.7 million insured persons or firms. 13x continuously over 4,400 employees and 325 trainees.
8 INSURANCE MARKET GROWTH IN SWITZERLAND Close to 2/3 of Market Growth to Swiss Mobiliar Growth Mobiliar Market Growth in Mio CHF. Source: Schweizerischer Versicherungsverband
9 THE SPEAKER Born in 1963. MSc from the Swiss Federal Institute of Technology (ETH Zurich): computer science combined with probability theory and statistics. These days, we would call this mix Big Data or Data Science. Has focused on DBMS and performance since 1992. Internationally recognized database expert and speaker at numerous conferences. Minister of Performance at Swiss Mobiliar. "dedicated to performance since 1963" also produces this search result:
10 CONTENT Swiss Mobiliar Company Introduction What Is Big Data In More Detail Auditing Big Data: What Is Specific For Big Data? Using Big Data Tools and Technologies For Yourself
11 NIST BIG DATA REFERENCE ARCHITECTURE Source: NIST National Institute of Standards and Technology, U.S. Department of Commerce
12 BIG DATA ARCHITECTURE PRINCIPLES Scale Out Use Commodity Hardware (see example on next slide) Scalable Redundancy Duplicate Data to provide data safety Fault tolerance for both data and jobs Data Locality Minimize amount of network traffic
13 SAMPLE HARDWARE TYPE (COURTESY OF HP)
14 DATA PROCESSING ARCHITECTURE FOR BIG DATA incoming data Master Data (immutable, append-only, schema-on-read) Streaming Data (real-time processing) Precomputed Data (completely re-calculated data) Real-Time Views (read/write database systems) Query (merges precomputed data with real-time views)
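The query step on this slide can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that both the precomputed data and the real-time view hold claim counts per region; the names `batch_view`, `realtime_view`, `record_event` and `query` are hypothetical and not taken from the talk.

```python
from collections import Counter

# Precomputed data: result of the last complete recalculation over the
# immutable master data (hypothetical example values).
batch_view = Counter({"Bern": 120, "Zurich": 95})

# Real-time view: increments for events that arrived after that batch run.
realtime_view = Counter()


def record_event(region: str) -> None:
    """Streaming path: update the real-time view for one incoming event."""
    realtime_view[region] += 1


def query(region: str) -> int:
    """Query path: merge the precomputed value with the real-time increment."""
    return batch_view[region] + realtime_view[region]


# Three events arrive after the last batch run.
for region in ["Bern", "Bern", "Basel"]:
    record_event(region)

bern_total = query("Bern")
basel_total = query("Basel")
```

After the next full recalculation, the real-time view would be reset, since its events are then contained in the precomputed data.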
15 COMMON TOOLS YOU MIGHT HAVE HEARD ABOUT incoming data Master Data (immutable, append-only, schema-on-read) Streaming Data (real-time processing) Precomputed Data (completely re-calculated data) Real-Time Views (read/write database systems) Query (merges precomputed data with real-time views) Tools shown on the slide: Cypher, Gremlin, Spark MLlib, Spark GraphX, R, CQL
16 SCHEMA-ON-WRITE VS. SCHEMA-ON-READ Traditional RDBMS is Schema-On-Write Data persisted in tabular, agreed and consistent form Structure must be decided before writing Data integration happens in ETL Big Data is Schema-On-Read Data persisted without any checking Interpretation of data captured in code by each program accessing the data Data quality depends on code quality
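A small Python sketch of the schema-on-read idea, under stated assumptions: the record layout, the field names `wind_kmh` and `wind`, and the function `read_wind` are invented for illustration. The store accepts anything; each reading program decides how to interpret it, so data quality depends on this code.

```python
import json

# Raw records are persisted without any checking (schema-on-read).
raw_store = [
    '{"station": "ABO", "wind_kmh": "42.5"}',
    '{"station": "BER", "wind": 17}',   # different field name, still stored
    'not even JSON',                    # malformed, still stored
]


def read_wind(raw: str):
    """Interpret one raw record at read time.

    Returns (station, wind) or None if this reader cannot interpret it.
    """
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or "station" not in record:
        return None
    value = record.get("wind_kmh", record.get("wind"))
    try:
        return record["station"], float(value)
    except (TypeError, ValueError):
        return None


parsed = [r for r in map(read_wind, raw_store) if r is not None]
rejected = len(raw_store) - len(parsed)
```

A second program with a different `read_wind` could reach different results from the same stored data, which is the audit-relevant difference to schema-on-write.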
17 NO-SQL DATABASE OVERVIEW Data Volume KeyValue DB Wide Column DB Document Store RDBMS Graph DB Transactional Properties RDBMS CAP Text Search DB Data Structure Complexity ACID
18 Degree of Data Relationship SWEET SPOT FOR DBMS Graph DB RDBMS RDBMS incl. Column Store (IBM DB2 Analytics Accelerator, Oracle DB In-Memory, SAP HANA) NoSQL databases (Cassandra, Oracle NoSQL, Redis, Riak, HBase, etc.) 100 TB, 500 K inserts/sec 1000 TB, 3 M inserts/sec Volume Velocity
19 CONTENT Swiss Mobiliar Company Introduction What Is Big Data In More Detail Auditing Big Data: What Is Specific For Big Data? Using Big Data Tools and Technologies For Yourself
20 ISSUES OF INTEREST Business IT Alignment Deployment Model Privacy Backup and Recovery Detecting Data Manipulation DLP (Data Loss Prevention)
21 WHY IS BIG DATA SECURITY DIFFERENT? Data might be gathered from different end points. Data search and selection can lead to privacy and security policy concerns. Privacy-preserving mechanisms are needed for Big Data, such as for Personally Identifiable Information (PII). Big Data is pushing beyond traditional definitions for information trust, openness, and responsibility. Information assurance and disaster recovery may require unique and emergent practices. Big Data creates targets of increased value. Risks have increased for de-anonymization and transfer of PII without consent traceability. Source: NIST National Institute of Standards and Technology, U.S. Department of Commerce. Big Data Interoperability Framework, Volume 1: Definitions
22 DATA PRIVACY VERSUS BIG DATA Data Privacy Principles: Targeted use of data gathered. Consent required. Transparent usage of data. Limited amount of data stored. Proven necessity of data store. Big Data Principles: Analytics of heterogeneous sources. Consent not traceable. Undefined purpose of data store. Unlimited data storage. Data usable for future use.
23 AUDITING BIG DATA OPERATIONS A good source for operations, but also for auditors. Covers Metadata, Backup and Recovery, Tasks for Security and Availability, Performance Management and Monitoring, Patching, Troubleshooting. Check if these points are addressed in the target environment to be audited.
24 DETECTING DATA MANIPULATION Requirement for Data(base) Activity Monitoring Even more important than in traditional world Fast data processing requires short time to react Act before React How is DAM organized in your Big Data ecosystem?
25 CONTENT Swiss Mobiliar Company Introduction What Is Big Data In More Detail Auditing Big Data: What Is Specific For Big Data? Using Big Data Tools and Technologies For Yourself
26 MOTIVATION Big Data is about Volume, Velocity, Variety, Veracity of Data. Are these Vs familiar to you in your daily work as an auditor? If yes, Big Data tools and technologies might help you in your job. All tools and frameworks are Open Source. Most of them are easy to use. Cloud Services available (usually for free while working on small data). Many of those tools are really cool. There is more out there than just Microsoft Excel. The tools on the following pages are arbitrarily selected by the author and do not necessarily represent best-of-class tools.
27 A VARIETY OF COMPANIES AND PRODUCTS Source:
28 USE CASES Hadoop, Hive and Impala to Analyze Open Data: Are there any insurance claims for damages due to storm or strong winds where meteo data shows that the maximum wind was insufficient to cause damage? Impact Analysis using Connected Data with Neo4j Graph DB: Suppose we immerse a large porous stone in a bucket of water. Will the center of the stone be wetted? Analogous problems to which this algorithm applies: Objects an administrator might reach? Spread of (computer) viruses? Impact of unavailability of a component? Did the people involved in a claim know each other beforehand?
29 USE CASE 1 Hadoop, Hive and Impala to Analyze Open Data: Are there any insurance claims for damages due to storm or strong winds where meteo data shows that the maximum wind was insufficient to cause damage? Master Data (immutable, append-only, schema-on-read) Precomputed Data (completely re-calculated data) incoming data hands-on exercise/demo: step-by-step implementation
30 USE CASE 1 LOAD INTERMED_METEODATA Claims caused by wind, but the wind was not strong enough that day in that region TRANSFORM INSERT METEODATA Claims DB Station Measurement 1 Measurement 2 *)
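The check described on this slide can be sketched in plain Python before moving to Hive. Everything specific here is an assumption for illustration: the `Claim` fields, the mapping of a claim to a weather station, the sample values, and the threshold of 75 km/h are not from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    claim_id: str
    station: str  # nearest weather station (hypothetical mapping)
    day: str      # YYYYMMDD
    cause: str


WIND_THRESHOLD_KMH = 75.0  # assumed minimum gust speed that can cause damage


def max_wind_per_station_day(measurements):
    """Reduce (station, timestamp, wind) rows to the daily maximum per station."""
    peaks = {}
    for station, timestamp, wind in measurements:
        key = (station, timestamp[:8])  # first 8 characters are YYYYMMDD
        peaks[key] = max(wind, peaks.get(key, float("-inf")))
    return peaks


def suspicious_claims(claims, measurements, threshold=WIND_THRESHOLD_KMH):
    """Return ids of wind claims where the measured daily peak was below threshold."""
    peaks = max_wind_per_station_day(measurements)
    flagged = []
    for claim in claims:
        if claim.cause != "wind":
            continue
        peak = peaks.get((claim.station, claim.day))
        # Claims without any measurement are not flagged; they need manual review.
        if peak is not None and peak < threshold:
            flagged.append(claim.claim_id)
    return flagged


measurements = [
    ("ABO", "201507011200", 32.0),
    ("ABO", "201507011210", 41.5),
    ("BER", "201507011200", 96.0),
]
claims = [
    Claim("C1", "ABO", "20150701", "wind"),
    Claim("C2", "BER", "20150701", "wind"),
    Claim("C3", "ABO", "20150701", "hail"),
    Claim("C4", "ZER", "20150701", "wind"),
]
flagged = suspicious_claims(claims, measurements)
```

A flagged claim is only a candidate for closer inspection, since a station measurement does not necessarily reflect local gusts at the claim site.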
31 USE CASE 1: HIVE/IMPALA IMPLEMENTATION
$ wget
$ tail -n +4 VQHA69.txt > tmp.txt && mv tmp.txt VQHA69.txt
hive> use meteodaten;
hive> LOAD DATA LOCAL INPATH 'VQHA69.txt' OVERWRITE INTO TABLE intermed_meteodata;
hive> INSERT INTO TABLE meteodata
      SELECT REGEXP_EXTRACT(VALUE, '^(?:([^\ ]*)\.){1}', 1) station,
             REGEXP_EXTRACT(VALUE, '^(?:([^\ ]*)\.){2}', 1) timestamp_gmt,
             REGEXP_EXTRACT(VALUE, '^(?:([^\ ]*)\.){3}', 1) temp,
             REGEXP_EXTRACT(VALUE, '^(?:([^\ ]*)\.){5}', 1) regen,
             REGEXP_EXTRACT(VALUE, '^(?:([^\ ]*)\.){8}', 1) luftdruck,
             REGEXP_EXTRACT(VALUE, '^(?:([^\ ]*)\.){9}', 1) wind
      FROM intermed_meteodata;
hive> SELECT tag, SUM(regen) AS totalregen
      FROM (SELECT SUBSTR(timestamp_gmt,5,4) AS tag, regen, station FROM meteodata) AS T1
      GROUP BY tag ORDER BY totalregen DESC LIMIT 2
faster with Impala
hive> SELECT timestamp_gmt, temp FROM meteodata WHERE station='abo' AND SUBSTR(timestamp_gmt,1,8)= ' ORDER BY timestamp_gmt
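The `REGEXP_EXTRACT` calls above use a repeated group with a counter `{n}` to pick the n-th field of a delimited line. The following Python sketch shows the same trick with the standard `re` module. The pipe delimiter and the sample line are assumptions, since the delimiter in the transcribed pattern appears to have been damaged; `nth_field` is a hypothetical helper.

```python
import re


def nth_field(line: str, n: int, sep: str = "|"):
    """Return the n-th (1-based) field of a line whose fields each end with sep.

    The group ([^sep]*)sep is repeated n times; a repeated capture group
    keeps only its last iteration, which is the n-th field.
    """
    esc = re.escape(sep)
    pattern = rf"^(?:([^{esc}]*){esc}){{{n}}}"
    match = re.match(pattern, line)
    return match.group(1) if match else None


# Hypothetical sample row: station, timestamp, temperature, rain.
sample = "ABO|201501011200|3.4|0.0|"
station = nth_field(sample, 1)
temperature = nth_field(sample, 3)
missing = nth_field(sample, 5)  # fewer than five terminated fields
```

Note that the pattern only matches fields followed by a separator, so the last field of a line without a trailing separator is not returned.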
32 USE CASE 2 Neo4j Graph DB to Analyze Connected Data Suppose we immerse a large porous stone in a bucket of water. Will the center of the stone be wetted? Analogous problems where to apply this algorithm: Objects an administrator might reach? Spread of (computer) viruses? Impact of unavailability of a component? incoming data Cypher Query Language MATCH allshortestpaths ((a)-[*]-(z)) WHERE a.name="alpha" and z.name="omega" RETURN COUNT(*) actionable insights hands-on exercise/demo: step-by-step implementation
33 USE CASE 2 α Ω Is there any connection between α and Ω?
34 USE CASE 2: RDBMS SOLUTION
Example: 35x25 percolation matrix
create table percolation (x1 integer, y1 integer, x2 integer, y2 integer);
a) Recursive query:
with zw (n,m,depth) as (
  select x2,y2,1 from percolation where x1=1
  union all
  select x2,y2,depth+1 from zw, percolation
  where x1=n and y1=n and y2=m+1
     or x1=n and y1=m and x2=n+1
     or x1=n and y1=m and x2=n-1
     or x1=n and y1=m and y2=m-1 and depth < 875) -- max 25x35 iterations
select max(n) from zw
b) Explicit joins
b1) Init:
insert into zw select x2,y2,1 from percolation where x1=1;
b2) Loop (until n=35 or convergence of min(level)):
insert into zw select x2, y2, min(level) from (
  select distinct x2,y2,depth+1 as level from zw, percolation
  where x1=n and y1=n and y2=m+1
     or x1=n and y1=m and x2=n+1
     or x1=n and y1=m and x2=n-1
     or x1=n and y1=m and y2=m-1) t
group by x2,y2;
b3) Final analysis:
select * from zw where x1=35
runs eternally sec, dependent on graph density
35 USE CASE 2: GRAPH DBMS SOLUTION
Node definitions:
CREATE (sx_y:site{name:"sx_y"})
CREATE (s0:start{name:"alpha"})
CREATE (s9:ende{name:"omega"})
Edge definitions:
MATCH (sx1_y1:site {name:'sx1_y1'}), (sx1_y2:site {name:'sx1_y2'})
CREATE (sx1_y1) - [:bond] -> (sx1_y1+1), (sx1_y2) - [:bond] -> (sx1_y2+1)
Query:
MATCH allshortestpaths((a)-[*]-(z)) WHERE a.name="alpha" and z.name="omega" RETURN COUNT(*)
3-5 sec, dependent on graph density
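For readers without a graph database at hand, the underlying question (is "omega" reachable from "alpha" through open bonds?) can be sketched with a breadth-first search in Python. This is a simplified stand-in, not the Cypher query: it returns the length of one shortest path rather than the number of all shortest paths, and the small edge list is made up for illustration.

```python
from collections import defaultdict, deque


def shortest_path_length(edges, start, goal):
    """Breadth-first search over undirected bonds.

    Returns the number of bonds on a shortest path from start to goal,
    or None if the two nodes are not connected.
    """
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical bonds between sites (x, y) of a tiny percolation grid.
bonds = [
    ("alpha", (1, 1)),
    ((1, 1), (2, 1)),
    ((2, 1), (2, 2)),
    ((2, 2), "omega"),
    ((1, 3), (2, 3)),  # isolated cluster, not connected to alpha
]
connected = shortest_path_length(bonds, "alpha", "omega")

# Removing one bond on the only path disconnects alpha from omega.
broken = [b for b in bonds if b != ((2, 1), (2, 2))]
disconnected = shortest_path_length(broken, "alpha", "omega")
```

Each node is visited at most once, which is why this kind of traversal avoids the repeated self-joins of the relational variants on the previous slide.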
36 THANK YOU FOR YOUR ATTENTION Dress up and get ready for the Super Spy Event, buses leaving at 6:10 PM
