Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014

Table of Contents
I. Introduction & Motivation
  o What is Big Data Analytics?
  o Why is it so important?
II. Techniques & Solutions
  o Business Strategies
  o Data Storage
  o Data Diversity
      o Information Filtering
      o Real-Time Data
      o Analysis Techniques
III. Conclusion

PART I: Introduction & Motivation

Big Data Analytics in a Cloud.

What is Big Data Analytics?
Buzzword for a combination of:
  o Big Data
  o Advanced Analytics
Not just one data type and not just one technique.
But we will see this in a minute!

Big Data: The Three Vs (I)
Most definitions focus on the data size
  o NOT SUFFICIENT!
Big Data can be defined using the three Vs:
  o Volume
  o Velocity
  o Variety
The measurements for each V are highly diverse

Big Data: The Three Vs (II)
Volume:
  o Gigabytes, terabytes or petabytes
  o Number of files or records
Velocity:
  o Real-time (as a stream)
  o Batches
Variety:
  o Structure of data (unstructured, semi-structured or structured)
  o Web data
  o Real-time data

Advanced Analytics (I)
"Advanced Analytics", like "Big Data Analytics", is a buzzword!
It stands for a collection of different analysis techniques
  o All techniques are suited to dealing with unknown data sets
A.k.a. Discovery Analytics

Advanced Analytics (II)
Some techniques:
  o Predictive analytics
  o Data mining
  o Statistical analysis
  o Natural language processing
  o Database capabilities
      o MapReduce
      o In-database analytics
      o In-memory databases
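To make the MapReduce item above concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps, using word counting as the example task. It is an illustration only, not how Hadoop is implemented; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are assumptions made for this sketch.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit one (word, 1) pair per word of a single document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce one after another in one process."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input for illustration.
counts = word_count(["big data analytics", "big data is not only big"])
```

In a real cluster the map and reduce calls would run in parallel on different nodes; here they run sequentially so the data flow stays visible.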

Importance of Big Data Analytics (I)
Big Data Analytics is seen as one of the most profound trends in Business Intelligence, according to TDWI
Today more and more data is collected by enterprises
  o See Big Data
To gain new insights, this data has to be analysed
  o Not possible with standard analytic platforms

Importance of Big Data Analytics (II)
The 5 main benefits are:
1. Better targeted social influencer marketing (61%)
2. More numerous and accurate business insights (45%)
3. Segmentation of customer base (41%)
4. Recognition of sales and market opportunities (38%)
5. Automated decisions for real-time processes (37%)

Importance of Big Data Analytics (III)
The 5 main barriers are:
1. Inadequate staffing or skills for big data analytics (46%)
2. Cost, overall (42%)
3. Lack of business sponsorship (38%)
4. Difficulty of architecting a big data analytics system (33%)
5. Current database software lacks in-database analytics (32%)

PART II: Techniques & Solutions

Business Strategies: Problems
A strategy or architecture for dealing with Big Data Analytics is needed
Problems:
  o Different programming abstractions (compared to a desktop environment)
  o Every choice has direct dollar costs, regardless of the field:
      o Computation
      o Upload / download
      o Data storage

Business Strategies: Cloud Computing
Every choice directly affects the computation time!
Supports many virtual machines
Paying more correlates with more computation power
  o But scaling is not linear: doubling memory or speed does not halve the computation time!
There are many vendor-based solutions for uploading data into cloud databases
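One common way to see why doubling resources does not halve the runtime is Amdahl's law, which the slides do not name; it is brought in here as an assumption to illustrate the point. The sketch below assumes that only a fraction of the job benefits from the faster resource. The function names and the numbers (a 10 hour job, 80% of which speeds up) are hypothetical.

```python
def amdahl_speedup(scalable_fraction, factor):
    """Overall speedup when only `scalable_fraction` of the work
    becomes `factor` times faster (Amdahl's law)."""
    if not 0.0 <= scalable_fraction <= 1.0:
        raise ValueError("scalable_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - scalable_fraction) + scalable_fraction / factor)


def runtime_after_scaling(base_hours, scalable_fraction, factor):
    """Estimated runtime after scaling the resource by `factor`."""
    return base_hours / amdahl_speedup(scalable_fraction, factor)


# Hypothetical job: 10 hours, 80% of which benefits from doubled resources.
new_runtime = runtime_after_scaling(10.0, 0.8, 2.0)  # about 6 hours, not 5
```

Under these assumptions the bill for the resource doubles while the runtime only drops from 10 to about 6 hours, which is the kind of trade-off a cloud strategy has to weigh.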

Data Storage: The HDFS, Goals
Often grouped with the so-called NoSQL technologies (strictly speaking, HDFS is a distributed file system, not a database)
Goals of the HDFS:
  o Fault detection & fast automatic recovery
  o Streaming data access
  o Handling large data sets
  o Simple coherency model
  o Moving computation is cheaper than moving data
  o Portability

Data Storage: The HDFS, Architecture

Data Diversity: Filtering Information (I)
Data mining describes:
  o The application of methods and algorithms
  o that support or enable the extraction of empirical relationships between data objects in data sets
Goals of data mining:
  o Find new correlations, patterns and trends inside large amounts of data

Data Diversity: Filtering Information (II)
Most of the arriving data is unlabeled => classification is not possible
A cluster is:
  o A group of the same or similar elements gathered or occurring closely together
Task:
  o Organize a collection of n objects into a partitioning or a hierarchy of partitions
  o Label the data

Data Diversity: Filtering Information (III)
Problems:
  o Measuring similarity
  o The unknown number of clusters needed
  o Cluster validity
  o Outliers
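The clustering task and two of the problems above (choosing a similarity measure and fixing the number of clusters in advance) can be seen in a minimal k-means sketch. k-means is only one possible clustering method and is not named in the slides; the Euclidean distance, the naive initialisation with the first k points, and the sample points are all assumptions made for this illustration.

```python
import math


def euclidean(a, b):
    """Similarity measure used here: plain Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, iterations=20):
    """Partition `points` into k clusters; k must be chosen by the caller."""
    # Naive, deterministic initialisation: use the first k points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centroids


# Hypothetical unlabeled data: two visibly separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.8, 8.2), (0.9, 1.1), (8.1, 7.9)]
labels, centroids = k_means(points, k=2)
```

The returned labels are the "label the data" step from the task description. Because centroids are means, a single far-away outlier would pull a centroid towards it, which is one reason outliers are listed as a problem.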

Data Diversity: Real-Time Data (I)
CEP: Complex Event Processing
Events are complex in the sense of the relations between arriving pieces of data
CEP systems do not only consider arriving events in isolation from each other
  o Timestamp + content + optional constraints
The goal is to identify interesting situations by processing event notifications (not generic data)

Data Diversity: Real-Time Data (II)
CEP is an extension of the traditional publish-subscribe interaction concept:
  o Observer: RSS feed (example)
  o Consumer: other systems
Examples of CEP engines:
  o Next CEP (rule-based pattern detection)
  o PB-CEP (plan-based pattern detection)
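As a rough illustration of rule-based pattern detection over related events, the sketch below looks for one event kind followed by another from the same source within a time window. It does not reproduce any of the engines named above; the `Event` type, the `detect_sequences` function, the event kinds and the sample stream are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since the start of the stream
    kind: str         # content: what happened
    source: str       # content: who or what produced the event


def detect_sequences(events, first_kind, second_kind, window):
    """Return (first, second) pairs where a `second_kind` event follows a
    `first_kind` event from the same source within `window` seconds."""
    pending = {}  # source -> most recent unmatched first_kind event
    matches = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.kind == first_kind:
            pending[event.source] = event
        elif event.kind == second_kind:
            first = pending.get(event.source)
            # Constraint: the two events must be close enough in time.
            if first is not None and event.timestamp - first.timestamp <= window:
                matches.append((first, event))
                del pending[event.source]
    return matches


# Hypothetical event stream for illustration.
stream = [
    Event(0.0, "login_failed", "alice"),
    Event(2.0, "login_failed", "bob"),
    Event(3.0, "password_reset", "alice"),
    Event(60.0, "password_reset", "bob"),
]
matches = detect_sequences(stream, "login_failed", "password_reset", window=10.0)
```

Only the pair from "alice" is reported, because the second event from "bob" arrives outside the window. The interesting situation is defined by the relation between events, not by any single event.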

Data Diversity: Analysis Techniques (I)
Analytical computations are moved into the database system (in-database analytics):
  o Model scoring
  o Predictive analytics
  o And others
Calculations are executed in a single, centralized location
  o Data is accessed right where it is stored
  o No data extraction
  o Memory capabilities
  o Load balancing
  o Parallel processing
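The idea of computing where the data is stored can be sketched on a very small scale with Python's built-in `sqlite3` module: the aggregation runs inside the database engine and only the result rows leave it. SQLite stands in for an analytic database here purely for illustration; the `sales` table, its columns and its rows are hypothetical.

```python
import sqlite3


def regional_summary(rows):
    """Load rows into an in-memory database and aggregate them there."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # The grouping and averaging happen inside the database engine;
        # only one small result row per region is returned to Python.
        query = (
            "SELECT region, COUNT(*), AVG(amount) "
            "FROM sales GROUP BY region ORDER BY region"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Hypothetical sample data for illustration.
summary = regional_summary([
    ("north", 120.0), ("north", 80.0),
    ("south", 50.0), ("south", 70.0), ("south", 60.0),
])
```

The alternative would be to select every row, transfer it to the client and average it there, which is the data extraction step that in-database analytics avoids.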

Data Diversity: Analysis Techniques (II)
Using historical data to predict the future (long or short term)
  o Data mining techniques (clustering, regression, classification)
  o Statistical analysis techniques
Build a predictive model
  o Exploit patterns in historical data to identify risks and opportunities
Combination with CEP makes sense:
  o CEP can ensure the calculation of the predictors (main problem!)
  o Short-term realization of complex events
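A minimal sketch of building a predictive model from historical data, assuming the simplest possible case: a straight line fitted by ordinary least squares and extended one step into the future. The slides list regression only as one of several techniques; the `fit_line` function and the monthly revenue figures are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: revenue per month.
months = [1, 2, 3, 4, 5]
revenue = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, revenue)
forecast_month_6 = slope * 6 + intercept  # short-term prediction
```

Real predictive models use many more predictors and noisier data; the point of the sketch is only the workflow of fitting on the past and evaluating the model on a future input.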

PART III: Conclusion

Summary: What we've seen!
Big Data is not all about size
Big Data Analytics is important due to its positive influence on many enterprise departments. But it is expensive!
One needs the right computation platform, storage system and analysis techniques, depending on the data one is working with
  o Cloud Computing
  o HDFS
  o CEP / In-database Analytics

FINAL WORDS
All presented techniques are just examples
  o Numerous other systems and software products are available in this field
People from many different fields have to work together to enable the analysis of big data.
  o Business analysts
  o Database specialists
  o System engineers
  o ...

Thank You for Your Attention!
ANY QUESTIONS?