Intro to Big Data and Business Intelligence




Anjana Susarla
Eli Broad College of Business

What is Business Intelligence
A Simple Definition: The applications and technologies transforming Business Data into Action
- Business intelligence (BI) is a business management term that refers to the applications and technologies used to gather, provide access to, and analyze data and information about a company's operations.
- Business intelligence systems can help companies gain more comprehensive knowledge of the factors affecting their business and make better business decisions.
YouTube: What is BI?

Guest lecture AESC310

Data, information, and knowledge
- Data: a collection of raw value elements or facts used for calculating, reasoning, or measuring.
- Information: the result of collecting and organizing data in a way that establishes relationships between data items, thereby providing context and meaning.
- Knowledge: an understanding of information based on recognized patterns, in a way that provides insight into the information.

Driving force - Big Data
- A collection of data sets so large and complex that it becomes awkward to work with using on-hand database management tools.
- Difficulties include capture, storage, search, sharing, analysis, and visualization.
- The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data.

ISQS7339, Fall 01

Zettabyte (ZB)
- A quantity of information or information storage capacity equal to 10^21 bytes or 1,000 exabytes.
- As of April 2012, no storage system has achieved one zettabyte of information.
- The combined space of all computer hard drives in the world was estimated at approximately 160 exabytes in 2006.
- Seagate reported selling 330 exabytes worth of hard drives during the 2011 fiscal year.
- As of 2009, the entire World Wide Web was estimated to contain close to 500 exabytes. This is half a zettabyte.
- 1,000,000,000,000,000,000,000 bytes = 1000^7 bytes = 10^21 bytes
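The unit arithmetic on the zettabyte slide can be checked directly. The following is a minimal Python sketch, assuming decimal (SI) prefixes; the constant names and the helper `exabytes_to_zettabytes` are illustrative and not part of the lecture, and the sizes are the estimates quoted on the slide.

```python
# Decimal (SI) storage units, expressed in bytes.
EXABYTE = 10**18    # 1 EB = 1000^6 bytes
ZETTABYTE = 10**21  # 1 ZB = 1000^7 bytes


def exabytes_to_zettabytes(exabytes):
    """Convert a size in exabytes to zettabytes (illustrative helper)."""
    return exabytes * EXABYTE / ZETTABYTE


# Identities for the zettabyte.
assert ZETTABYTE == 1000**7
assert ZETTABYTE == 1000 * EXABYTE

# Estimates quoted on the slide, in exabytes.
for label, size_eb in [
    ("All hard drives, estimated", 160),
    ("Seagate drive sales, one fiscal year", 330),
    ("World Wide Web, estimated", 500),
]:
    print(f"{label}: {size_eb} EB = {exabytes_to_zettabytes(size_eb)} ZB")
```

Writing the units as integer powers of ten keeps the identities exact; only the final conversion produces a floating-point value.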

Data Scale

Market
- "Big data" has increased the demand for information management specialists: major companies have spent more than $15 billion on this.
- This industry is worth more than $100 billion and is growing at almost 10% a year.
- There are 4.6 billion mobile-phone subscriptions worldwide, and between 1 billion and 2 billion people access the internet.
- The world's effective capacity to exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes in 2000, and 65 exabytes in 2007.
- It is predicted that the amount of traffic flowing over the internet will reach 667 exabytes annually by 2013.
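To get a feel for what "growing at almost 10% a year" implies, the compound-growth arithmetic can be sketched as below. This is only an illustration: it assumes the growth rate stays constant, which the slide does not claim, and the function name `project_market_size` is hypothetical.

```python
def project_market_size(current_billion, annual_growth, years):
    """Compound growth: size * (1 + rate) ** years."""
    return current_billion * (1 + annual_growth) ** years


# Starting point and rate taken from the slide ($100 billion, about 10% a year).
# The projection horizon is an assumption made for illustration only.
for years in (1, 5, 10):
    size = project_market_size(100.0, 0.10, years)
    print(f"After {years:2d} years: ${size:,.1f} billion")
```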

Approach - Cloud Computing
- Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet).
- The name comes from the use of a cloud-shaped symbol as an abstraction for the complex infrastructure it contains in system diagrams.
- Cloud computing entrusts remote services with a user's data, software, and computation.
- Buzzwords: SaaS / IaaS / PaaS

Distributed business intelligence
Deal with big data: the open & distributed approach
- LAMP
- Hadoop
- MapReduce
- HDFS
- NoSQL
- ZooKeeper
- Storm

Apache Hadoop
An open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. The Apache Hadoop framework is composed of the following modules:
- Hadoop Common: contains libraries and utilities needed by other Hadoop modules.
- Hadoop Distributed File System (HDFS).
- Hadoop YARN: a resource-management platform responsible for managing compute resources in clusters and using them to schedule users' applications.
- Hadoop MapReduce: a programming model for large-scale data processing.
Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively.

Figure: A Multi-node Hadoop Cluster
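The MapReduce programming model named above can be illustrated with the classic word-count example. The sketch below runs in a single Python process and does not use Hadoop's API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and only mirror the three conceptual phases.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a cluster, each document (or file split) would be mapped on a
    # different node; here everything runs sequentially in one process.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


docs = ["big data needs big clusters", "data drives decisions"]
print(word_count(docs))
```

Because the map step looks at one record at a time and the reduce step looks at one key at a time, both can in principle be spread across many machines, which is the property that makes the model suit commodity clusters.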


ISQS 6339, Data Mgmt & BI

Hadoop 2: Big data's big leap forward
- The new Hadoop is the Apache Foundation's attempt to create a whole new general framework for the way big data can be stored, mined, and processed.
- The biggest constraint on scale has been Hadoop's job handling. In earlier versions, all jobs are run as batch processes through a single daemon called JobTracker, which creates a scalability and processing-speed bottleneck.
- Hadoop 2 uses an entirely new job-processing framework built on two daemons: ResourceManager, which governs all jobs in the system, and NodeManager, which runs on each Hadoop node and keeps the ResourceManager informed about what is happening on that node.
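The division of labor between the two daemons can be pictured with a toy simulation. The sketch below is not YARN's API: the class names only borrow the daemon names, the methods (`heartbeat`, `launch`, `schedule`) and the "most free slots" placement rule are assumptions made for illustration, and the manager polls the nodes here for simplicity.

```python
class NodeManager:
    """Toy stand-in for the per-node daemon: tracks local capacity and reports it."""

    def __init__(self, node_id, total_slots):
        self.node_id = node_id
        self.total_slots = total_slots
        self.running = []

    def heartbeat(self):
        # Report what is happening on this node.
        free = self.total_slots - len(self.running)
        return {"node_id": self.node_id, "free_slots": free}

    def launch(self, task):
        self.running.append(task)


class ResourceManager:
    """Toy stand-in for the cluster-wide daemon: places tasks using node reports."""

    def __init__(self, node_managers):
        self.node_managers = {nm.node_id: nm for nm in node_managers}

    def schedule(self, tasks):
        placements = {}
        for task in tasks:
            # Collect fresh reports, then pick the node with the most free slots.
            reports = [nm.heartbeat() for nm in self.node_managers.values()]
            best = max(reports, key=lambda report: report["free_slots"])
            if best["free_slots"] == 0:
                placements[task] = None  # cluster is full; the task stays pending
                continue
            self.node_managers[best["node_id"]].launch(task)
            placements[task] = best["node_id"]
        return placements


cluster = ResourceManager([NodeManager("node-1", 2), NodeManager("node-2", 1)])
placements = cluster.schedule(["job-a", "job-b", "job-c", "job-d"])
print(placements)
```

The point of the sketch is the separation of concerns: per-node state lives with each node, while the cluster-wide component only makes placement decisions from the reports it receives.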

MapReduce 2.0: YARN (Yet Another Resource Negotiator)

The process of BI
Data -> information -> knowledge -> actionable plans
- Data -> information: the process of determining what data is to be collected and managed, and in what context.
- Information -> knowledge: the process involving the analytical components, such as data warehousing, online analytical processing, data quality, data profiling, business rule analysis, and data mining.
- Knowledge -> actionable plans: the most important aspect of a BI process.
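The chain from data to actionable plans can be made concrete with a very small example. The sketch below is illustrative only: the transaction records are made up, the 25% drop threshold is an arbitrary assumption, and the function names are hypothetical. Each function corresponds to one arrow in the process.

```python
from collections import defaultdict

# Data: raw facts. These records are invented for illustration.
transactions = [
    {"customer": "A", "month": 1, "amount": 120.0},
    {"customer": "A", "month": 2, "amount": 60.0},
    {"customer": "B", "month": 1, "amount": 80.0},
    {"customer": "B", "month": 2, "amount": 95.0},
]


def to_information(records):
    """Data -> information: organize raw records into spend per customer per month."""
    spend = defaultdict(dict)
    for record in records:
        by_month = spend[record["customer"]]
        by_month[record["month"]] = by_month.get(record["month"], 0.0) + record["amount"]
    return dict(spend)


def to_knowledge(spend, drop_threshold=0.25):
    """Information -> knowledge: flag customers whose spend fell by more than the threshold."""
    declining = []
    for customer, by_month in spend.items():
        months = sorted(by_month)
        first, last = by_month[months[0]], by_month[months[-1]]
        if first > 0 and (first - last) / first > drop_threshold:
            declining.append(customer)
    return declining


def to_action_plan(declining_customers):
    """Knowledge -> actionable plans: turn the finding into something a person can act on."""
    return [f"Contact customer {c} with a retention offer" for c in declining_customers]


information = to_information(transactions)
knowledge = to_knowledge(information)
plan = to_action_plan(knowledge)
print(plan)
```

In a real BI setting the middle step would draw on the analytical components listed above (data warehousing, OLAP, data mining); a simple threshold stands in for them here.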

Actionable Knowledge
- An information asset retains its value only if the converted knowledge is actionable.
- Methods are needed for extracting value from knowledge.
- This is not a technical issue but an organizational one: empowered individuals in the organization are needed to take the action.
- There is an issue of Return on Investment (ROI).

BI Problems
Structured
- Detecting credit card fraud
- Setting loan parameters
- Market segmentation / mass customization
- Deciding marketing mix
- Customer churn
- Reducing employee turnover
- Improving quality/efficiency
Unstructured
- Data exploration
- Utilization of resources (stored knowledge) to maximum effectiveness

BI Applications
Customer Analytics
- Customer profiling
- Targeted marketing
- Personalization
- Collaborative filtering
- Customer satisfaction
- Customer lifetime value
- Customer loyalty
Sales Channel Analytics
- Marketing
- Sales performance and pipeline

BI Applications (2)
Supply Chain Analytics
- Supplier and vendor management
- Shipping
- Inventory control
- Distribution analysis
Behavior Analysis
- Purchasing trends
- Web activity
- Fraud and abuse detection
- Customer attrition
- Social network analysis

The Evolution of Business Intelligence
- 1st Generation: Traditional analytics (query and reporting)
- 2nd Generation: Traditional generation (OLAP, data warehousing)
- 2.5 Generation: New traditional generation
- 3rd Generation: Advanced analytics
  - Rules, predictive analytics, and real-time data mining
  - Stream analytics

Business Intelligence Classifications
- Stream Analytics* (3rd-Generation BI): real-time, continuous, sequential analysis (ranging from basic to advanced analytics)
- Advanced Analytics/Optimization: rules, predictive analytics, real-time and traditional data mining
- New Traditional Analytics: 2.5-Gen Analytics (in-memory OLAP, search-based)
- Traditional Analytics (Legacy BI): 1st Generation Analytics (query and reporting), 2nd Generation Analytics (OLAP, data warehousing)
* In lieu of stream analytics, embedded analytics, although architecturally different, could potentially play the same role.
Source: Bill O'Connell, IBM, Aug 2007

"Big data is at the foundation of all the megatrends that are happening today, from social to mobile to cloud to gaming."
Chris Lynch, Vertica Systems

"Big data is not about the data."
Gary King, Harvard University, making the point that while data is plentiful and easy to collect, the real value is in the analytics

"There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days."
Eric Schmidt, of Google, said in 2010

"Information is the oil of the 21st century, and analytics is the combustion engine."
Peter Sondergaard, Gartner Research

"I keep saying that the sexy job in the next 10 years will be statisticians, and I'm not kidding."
Hal Varian, Google

"You can have data without information, but you cannot have information without data."
Daniel Keys Moran, computer programmer and science fiction author

"Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world."
Atul Butte, Stanford School of Medicine

"Errors using inadequate data are much less than those using no data at all."
Charles Babbage, inventor and mathematician

"To call in the statistician after the experiment is done may be no more than asking him to perform a post mortem: he may be able to say what the experiment died of."
Ronald Fisher, biologist, geneticist and statistician

"Without big data, you are blind and deaf in the middle of a freeway."
Geoffrey Moore, management consultant and theorist