Data Modeling s Role in the Evolving World of Data Management. Donna Burbank CA Technologies

Size: px
Start display at page:

Download "Data Modeling s Role in the Evolving World of Data Management. Donna Burbank CA Technologies"

Transcription

1 Data Modeling s Role in the Evolving World of Data Management Donna Burbank CA Technologies

2 Who am I? More than more than 15 years of experience in the areas of data management, metadata management, and enterprise architecture. Currently is the Senior Director of Product Marketing for CA s data modeling solutions. Brand Strategy and Product Management roles at Computer Associates and Embarcadero Technologies Senior consultant for PLATINUM technology s information management consulting division in both the U.S. and Europe. Worked with dozens of Fortune 500 companies worldwide in the U.S., Europe, Asia, and Africa and speaks regularly at industry conferences. Co-author of several books including: Data Modeling for the Business Data Modeling Made Simple with CA ERwin Data Modeler r8 PAGE 2

3 Agenda Culture shift New Technologies and Methodologies Big Data Cloud Computing Data Vaults Tagging Agile How Do Traditional Data Management Practices Fit? PAGE 3

4 Technological and Culture Shift in Society PAGE 4

5 Technological and Culture Shift in Data Management Mainframes Flat Files In-House Development Waterfall Methodology Client/Server Computing Relational Databases Data Warehousing Packaged Applications RAD Development Cloud Computing Big Data, NoSQL Data Vaults Agile Development Individual PC s Relational Databases Democratization of Computing Dot COM Bubble Bursts Data Integration Do More with Less PAGE 5 Dot COM Revolution Changing Business Models

6 Paradigm Shift Traditional Top-Down, Hierarchical Design, then Implement Passive, Push technology Manageable volumes of information Stable rate of change Modern Distributed, Democratic Real-time Analysis Collaborative, Interactive Massive volumes of information Rapid and Exponential rate of change PAGE 6

7 Emergence In philosophy, systems theory, science, and art, emergence is the way complex systems and patterns arise out of a multiplicity of relatively simple interactions. - Wikipedia I love my new Levis jeans. Is Levi coming to the party? LOL. TTYL. Leving soon. Sale #LEVIS 20% at Macys. PAGE 7

8 Relational Databases vs. Big Data and NoSQL

9 Big Data Survey How familiar are you with "Big Data" technologies? A. Very familiar. We are using Big Data technologies now. B. Somewhat. I know what it is, but am not using it. C. Not at all. This is new to me. PAGE 9

10 Relational Databases vs. Big Data and NoSQL With the rise of massive volume, real-time data streams, traditional relational database technologies cannot scale. Google, Yahoo, Facebook, Twitter, MySpace Corporate Data: logs, click stream data, medical diagnostics research, etc. Huge volumes: With Google, for example, you re basically ingesting the entire internet. Issues that arise with Big Data Can t store in a single, relational database Too Big Can t easily index Too Big Can t write traditional SQL Unstructured Can t design data structures top-down Data is constantly in flux and in creation. PAGE 10

11 Relational Databases vs. Big Data and NoSQL Big Data Solutions address these issues by: Distributed data storage across servers Stores data in files Post-creation analysis, real-time Data is structured as it is accessed Natural-language, application-level processing Distributed hash tables pull record by name, similar to application programming PAGE 11

12 Do Relational Databases Go Away? What Relational Databases are good for: Structured data, that can be designed up front with predictable queries The same reasons you always have, actually: Ease of query and retrieval Ease of update - Reducing redundancy Increase data quality What Big Data solutions are good for: Analyzing raw, real-time data BEFORE it goes into a Data Warehouse or Transactional Database Analyzing large volumes of unstructured data that is updated in real-time I love my new Levis jeans. Is Levi coming to the party? Customer Sentiment Database PAGE 12LOL. TTYL. Leving soon. Sale #LEVIS 20% at Macys.

13 Some Technologies + Terms for Big Data Big Data Software Frameworks: Hadoop: An open source Apache project MapReduce: Google s solution Memcache Memory caching system for large-scale data-driven websites Big data s equivalent of MySQL behind a web site With large data sets, can t run a query each time: PAGE 13 Memory layer between database and web server Fetches data on-demand MapReduce MapReduce is the key algorithm that Hadoop uses to analyze data across servers Programmatic, not SQL-based A map transform is provided to transform an input data row of key and value to an output key/value: map(key1,value) -> list<key2,value2> Key aspect is that every transform is independent and the operation can be run in parallel on lists of data distributed across servers

14 Some Technologies + Terms for Big Data Hive Data warehouse infrastructure which allows SQL-like ad-hoc querying of data (in any format) stored in Hadoop Not designed for online transaction processing and does not offer real-time queries and row level updates. It is best used for batch jobs over large sets of immutable data (like web logs). Hbase (Apache), BigTable (Google) Database for Big Data Not relational, distributed multi-dimensional sorted map PAGE 14

15 On Premise vs. Cloud

16 Cloud Survey How familiar are you with Cloud Database technologies? A. Very familiar. We are storing data in the cloud now. B. Somewhat. I know what they are, but am not using them. C. Not at all. This is new to me. PAGE 16

17 On-Premise vs. Cloud Historically, corporate data has been stored on-premise Originally centralized Then distributed But still managed and owned by the company directly purchasing hardware and software With the emergence of Cloud Computing, organizations can lease database stored as they need it Fewer in-house resources needed Ability to expand and contract usage as needed Driving Innovation With Cloud services, small, start-up companies have lower cost of entry PAGE 17

18 In the Cloud Realm IT is rented on a subscription basis SaaS: Software-as-a-Service On-Demand Software Delivery Examples: Web mail such as Yahoo or Google, ERP and CRP systems such as Salesforce.com PaaS: Platform-as-a-Service Development platform and solution stack as a service. Rent hardware, operating systems, storage and network capacity over the Internet. Examples: Force.com, Google IaaS: PAGE 18 Infrastructure-as-a-Service Delivery of the computing infrastructure (hardware) as a fully outsourced service. Examples: Google, IBM, Amazon.com etc. DaaS: Database-as-a-Service Provides traditional database features, typically data definition, storage and retrieval, on a subscription basis over the web. Examples: Microsoft SQL Azure, database.com, Amazon SimpleDB

19 The Precedent Exists How many households or companies make their own electricity today? Instead, purchase Electricity As-a-Service (EaaS) PAGE 19

20 How do Traditional Data Models Fit? A Data Model is your roadmap for: What data to move to the cloud, and what to keep on-premise Defining data structures (physical model) and business requirements (logical model) for cloud databases Off-Premise doesn t need to mean Out of your Control DB2 SQL Server Oracle Sybase MS SQL Azure MySQL Teradata PAGE 20

21 Data Warehouses vs. Data Vaults

22 Data Vault Survey How familiar are you with Data Vault technology? A. Very familiar. We are using Data Vaults now. B. Somewhat. I know what it is, but am not using it. C. Not at all. This is new to me. PAGE 22

23 Data Warehouses vs. Data Vaults The Traditional Data Warehouse Based on up-front analysis & design Creation of central data model and database Data is cleansed and transformed through ETL to central store Data Vault's philosophy is that all data is relevant data, even if it is "wrong Based on raw data sets Use of Links: Everything is a many-to-many relationship (no RI) Complete Auditability and Traceability (as a result of Sarbanes-Oxley, Basel II, etc.) Enable re-creation of a source system based on a specific point in time Claim is that traditional modeling is inflexible: When business rules change, the cardinality & model need to change > also cascades to child entities PAGE 23

24 Data Vaults Based on Parallel Processing Split the work so that database engines can optimize Index coverage Data redundancy Parallel query Resource utilization Each table has its own I/O channel associated with it Hypercubes Links and Hubs: Hubs provide the keys. Links describe the keys Using a weighted graph helps you determine which nodes are most important Problem with existing ETL is that everything needs to be done at load-time not always realistic Divide and Conquer do transformations, etc. when load individual marts PAGE 24

25 Is Data Modeling Still Relevant? Yes Source: PAGE 25

26 Is Data Modeling Still Relevant? Yes PAGE 26 Source:

27 Is the Traditional Modeling Process Still Relevant? - No No up-front design Instead, analyzing large volumes of data in real time No up-front design, analyzing raw data as is to discover patterns Distributed processing Sound similar to Big Data? It is PAGE 27

28 Hierarchies vs. Hash Tags

29 Traditional Naming Standards In traditional data modeling and management, naming standards are prescribed, enforced, and structured PAGE 29

30 A New Paradigm - Tagging PAGE 30

31 Error-Free? PAGE 31

32 What works? Traditional Naming Standards, Data Modeling Best Practices are good for upfront design and data quality Clear naming rules, e.g. Womans Hiking Boot (modifier, prime word, class word) Clear definitions, e.g. A boot is an item of footwear with close toes, which covers the calf, often with a raised heel. Supertypes and subtypes Data cleansing (boat or boot?) But Remember Emergence With the volumes of real-time data, it s impossible to model it up-front first Overtime, data quality improves through self-correction and emergence After all, generally, Google searches are pretty good. As with the other examples, the organizational reality will be a mix of technologies. PAGE 32

33 Waterfall vs. Agile

34 Agile Survey How familiar are you with Agile methodologies? A. Very familiar. We are using Agile now. B. Somewhat. I know what it is, but am not using it. C. Not at all. This is new to me. PAGE 34

35 What is Agile? Agile Software Development is a group of software development methodologies based on iterative and incremental development, where requirements and solutions evolve through collaboration between selforganizing, cross-functional teams. - Wikipedia PAGE 35

36 Waterfall vs. Agile Waterfall Up-Front Design Often Long Development Cycles Rigorous analysis of business requirements Documentation Top Down management, e.g. data standards team Agile Little up-front design Short Development Cycles (sprints) Learn as you go requirements Little documentation The documentation is the code Team-focused, all contribute Traditional applicationdevelopment focused (i.e. not data) PAGE 36

37 Recommendation: Be more Agile, but not necessarily using 100% of the Agile Methodology Don t Boil the Ocean Focus on small, manageable project that will show incremental value The days of the one year development cycle to build an enterprise model are gone Involve all Stakeholders All contribute: Engage in frequent conversations with business users, developers, DBAs, etc. The days of the data standards team will tell you the rules are gone (if they ever existed!) Focus on the Business Requirements key to any methodology The goal is to support the organization s needs, not implement a particular technology or methodology Do document requirements. A conceptual data model, for example, can apply to any technology PAGE 37

38 A Data Model is a Communication Tool: Data Model as Mediator A data model is a valuable tool to communicate core requirements across technologies to a wide range of stakeholders. Data Model DB2 SQL Server Oracle ETC! IDMS Teradata DB2 Developers ETC! Business Sponsors MS SQL Azure SAP Data Architects DBAs PAGE 38

39 How has the Social and IT Culture Evolved?

40 Paradigm Shift Traditional Top-Down, Hierarchical Design, then Implement Passive, Push technology Manageable volumes of information Stable rate of change Modern Distributed, Democratic Real-time Analysis Collaborative, Interactive Massive volumes of information Rapid and Exponential rate of change PAGE 40

41 Summary We are in a period of disruptive technology Rapid rate of change Massive volumes of data Social changes: more participatory, engaged Distributed, web-based implementations Real-time analysis required But, as with any Age of Change, the basics still apply Relational databases are still relevant Data Models are valuable to document business requirements and technical implementation The hard stuff still needs to be done: analysis, data quality, etc. Need to Adapt and Integrate Utilize new technologies as appropriate Engage extended teams in a participatory manner Be more agile and adaptive in your projects and methodologies Have fun! This is an exciting time to be in Data Management PAGE 41

42 Questions?

Data Modeling s Role in the Evolving World of Data Management. Donna Burbank CA Technologies

Data Modeling s Role in the Evolving World of Data Management. Donna Burbank CA Technologies Data Modeling s Role in the Evolving World of Data Management Donna Burbank CA Technologies Who am I? More than more than 15 years of experience in the areas of data management, metadata management, and

More information

With the Emergence of Big Data, Where do Relational Technologies Fit?

With the Emergence of Big Data, Where do Relational Technologies Fit? With the Emergence of Big Data, Where do Relational Technologies Fit? Donna Burbank VP Product Marketing CA Technologies DAMA Chicago June 2013 Who am I? More than more than 15 years of experience in the

More information

With the Emergence of Big Data, Where do Relational Technologies Fit? Donna Burbank President, DAMA Rocky Mountain Chapter

With the Emergence of Big Data, Where do Relational Technologies Fit? Donna Burbank President, DAMA Rocky Mountain Chapter With the Emergence of Big Data, Where do Relational Technologies Fit? Donna Burbank President, DAMA Rocky Mountain Chapter Agenda Big Data A Technical & Cultural Paradigm Shift (aka Donna s Rants/Musings)

More information

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to

More information

Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem:

Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem: Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Software October 2010 TABLE OF CONTENTS INTRODUCTION... 3 BUSINESS AND IT DRIVERS... 4 NOSQL DATA STORES LANDSCAPE...

More information

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth MAKING BIG DATA COME ALIVE Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth Steve Gonzales, Principal Manager steve.gonzales@thinkbiganalytics.com

More information

The Next Wave of Data Management. Is Big Data The New Normal?

The Next Wave of Data Management. Is Big Data The New Normal? The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management

More information

Harnessing the Power of the Microsoft Cloud for Deep Data Analytics

Harnessing the Power of the Microsoft Cloud for Deep Data Analytics 1 Harnessing the Power of the Microsoft Cloud for Deep Data Analytics Today's Focus How you can operate your business more efficiently and effectively by tapping into Cloud based data analytics solutions

More information

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,

More information

Big data for the Masses The Unique Challenge of Big Data Integration

Big data for the Masses The Unique Challenge of Big Data Integration Big data for the Masses The Unique Challenge of Big Data Integration White Paper Table of contents Executive Summary... 4 1. Big Data: a Big Term... 4 1.1. The Big Data... 4 1.2. The Big Technology...

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

The 3 questions to ask yourself about BIG DATA

The 3 questions to ask yourself about BIG DATA The 3 questions to ask yourself about BIG DATA Do you have a big data problem? Companies looking to tackle big data problems are embarking on a journey that is full of hype, buzz, confusion, and misinformation.

More information

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

TE's Analytics on Hadoop and SAP HANA Using SAP Vora TE's Analytics on Hadoop and SAP HANA Using SAP Vora Naveen Narra Senior Manager TE Connectivity Santha Kumar Rajendran Enterprise Data Architect TE Balaji Krishna - Director, SAP HANA Product Mgmt. -

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Foundations of Business Intelligence: Databases and Information Management Wienand Omta Fabiano Dalpiaz 1 drs. ing. Wienand Omta Learning Objectives Describe how the problems of managing data resources

More information

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are

More information

Bringing Big Data into the Enterprise

Bringing Big Data into the Enterprise Bringing Big Data into the Enterprise Overview When evaluating Big Data applications in enterprise computing, one often-asked question is how does Big Data compare to the Enterprise Data Warehouse (EDW)?

More information

WHITE PAPER. Four Key Pillars To A Big Data Management Solution

WHITE PAPER. Four Key Pillars To A Big Data Management Solution WHITE PAPER Four Key Pillars To A Big Data Management Solution EXECUTIVE SUMMARY... 4 1. Big Data: a Big Term... 4 EVOLVING BIG DATA USE CASES... 7 Recommendation Engines... 7 Marketing Campaign Analysis...

More information

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data

More information

Apache HBase. Crazy dances on the elephant back

Apache HBase. Crazy dances on the elephant back Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

The Internet of Things and Big Data: Intro

The Internet of Things and Big Data: Intro The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific

More information

Chapter 6. Foundations of Business Intelligence: Databases and Information Management

Chapter 6. Foundations of Business Intelligence: Databases and Information Management Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

Big Data Technologies Compared June 2014

Big Data Technologies Compared June 2014 Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development

More information

Data Modeling for Big Data

Data Modeling for Big Data Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes

More information

A TECHNICAL WHITE PAPER ATTUNITY VISIBILITY

A TECHNICAL WHITE PAPER ATTUNITY VISIBILITY A TECHNICAL WHITE PAPER ATTUNITY VISIBILITY Analytics for Enterprise Data Warehouse Management and Optimization Executive Summary Successful enterprise data management is an important initiative for growing

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

When SaaS and On-Premise BI Collide:

When SaaS and On-Premise BI Collide: When SaaS and On-Premise BI Collide: The Pros and Cons of Moving BI to the Cloud Wayne Eckerson Director, TDWI Research November 19, 2009 Sponsor Agenda Cloud-based computing Terminology Types Architectures

More information

<Insert Picture Here> Oracle and/or Hadoop And what you need to know

<Insert Picture Here> Oracle and/or Hadoop And what you need to know Oracle and/or Hadoop And what you need to know Jean-Pierre Dijcks Data Warehouse Product Management Agenda Business Context An overview of Hadoop and/or MapReduce Choices, choices,

More information

SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES

SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES AWS GLOBAL INFRASTRUCTURE 10 Regions 25 Availability Zones 51 Edge locations WHAT

More information

Hexaware E-book on Q & A for Cloud BI Hexaware Business Intelligence & Analytics Actionable Intelligence Enabled

Hexaware E-book on Q & A for Cloud BI Hexaware Business Intelligence & Analytics Actionable Intelligence Enabled Hexaware E-book on Q & A for Cloud BI Hexaware Business Intelligence & Analytics Actionable Intelligence Enabled HEXAWARE Q & A E-BOOK ON CLOUD BI Layers Applications Databases Security IaaS Self-managed

More information

How To Handle Big Data With A Data Scientist

How To Handle Big Data With A Data Scientist III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Datenverwaltung im Wandel - Building an Enterprise Data Hub with Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

Luncheon Webinar Series May 13, 2013

Luncheon Webinar Series May 13, 2013 Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

Using Big Data for Smarter Decision Making. Colin White, BI Research July 2011 Sponsored by IBM

Using Big Data for Smarter Decision Making. Colin White, BI Research July 2011 Sponsored by IBM Using Big Data for Smarter Decision Making Colin White, BI Research July 2011 Sponsored by IBM USING BIG DATA FOR SMARTER DECISION MAKING To increase competitiveness, 83% of CIOs have visionary plans that

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

Deploy. Friction-free self-service BI solutions for everyone Scalable analytics on a modern architecture

Deploy. Friction-free self-service BI solutions for everyone Scalable analytics on a modern architecture Friction-free self-service BI solutions for everyone Scalable analytics on a modern architecture Apps and data source extensions with APIs Future white label, embed or integrate Power BI Deploy Intelligent

More information

Data Doesn t Communicate Itself Using Visualization to Tell Better Stories

Data Doesn t Communicate Itself Using Visualization to Tell Better Stories SAP Brief Analytics SAP Lumira Objectives Data Doesn t Communicate Itself Using Visualization to Tell Better Stories Tap into your data big and small Tap into your data big and small In today s fast-paced

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Big Data Defined Introducing DataStack 3.0

Big Data Defined Introducing DataStack 3.0 Big Data Big Data Defined Introducing DataStack 3.0 Inside: Executive Summary... 1 Introduction... 2 Emergence of DataStack 3.0... 3 DataStack 1.0 to 2.0... 4 DataStack 2.0 Refined for Large Data & Analytics...

More information

The Cloud Opportunity: Italian Market 01/10/2010

The Cloud Opportunity: Italian Market 01/10/2010 The Cloud Opportunity: Italian Market 01/10/2010 Alessandro Greco @Easycloud.it In collaboration with easycloud.it Who is easycloud.it? Easycloud.it is a Consultant Company based in Europe with HQ in Italy.

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released General announcements In-Memory is available next month http://www.oracle.com/us/corporate/events/dbim/index.html X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

Cloud Computing and Big Data What Technical Writers Need to Know

Cloud Computing and Big Data What Technical Writers Need to Know Cloud Computing and Big Data What Technical Writers Need to Know Greg Olson, Senior Director Black Duck Software For the Society of Technical Writers Berkeley Chapter Black Duck 2014 Agenda Introduction

More information

Google Bing Daytona Microsoft Research

Google Bing Daytona Microsoft Research Google Bing Daytona Microsoft Research Raise your hand Great, you can help answer questions ;-) Sit with these people during lunch... An increased number and variety of data sources that generate large

More information

Data Virtualization for Agile Business Intelligence Systems and Virtual MDM. To View This Presentation as a Video Click Here

Data Virtualization for Agile Business Intelligence Systems and Virtual MDM. To View This Presentation as a Video Click Here Data Virtualization for Agile Business Intelligence Systems and Virtual MDM To View This Presentation as a Video Click Here Agenda Data Virtualization New Capabilities New Challenges in Data Integration

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

GigaSpaces Real-Time Analytics for Big Data

GigaSpaces Real-Time Analytics for Big Data GigaSpaces Real-Time Analytics for Big Data GigaSpaces makes it easy to build and deploy large-scale real-time analytics systems Rapidly increasing use of large-scale and location-aware social media and

More information

Introducing the Reimagined Power BI Platform. Jen Underwood, Microsoft

Introducing the Reimagined Power BI Platform. Jen Underwood, Microsoft Introducing the Reimagined Power BI Platform Jen Underwood, Microsoft Thank You Sponsors Empower users with new insights through familiar tools while balancing the need for IT to monitor and manage user

More information

Bringing Big Data to People

Bringing Big Data to People Bringing Big Data to People Microsoft s modern data platform SQL Server 2014 Analytics Platform System Microsoft Azure HDInsight Data Platform Everyone should have access to the data they need. Process

More information

Running Oracle Applications on AWS

Running Oracle Applications on AWS Running Oracle Applications on AWS Bharath Terala Sr. Principal Consultant Apps Associates LLC June 09, 2014 Copyright 2014. Apps Associates LLC. 1 Agenda About the Presenter About Apps Associates LLC

More information

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms

More information

Big Data at Cloud Scale

Big Data at Cloud Scale Big Data at Cloud Scale Pushing the limits of flexible & powerful analytics Copyright 2015 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For

More information

Agile Business Intelligence Data Lake Architecture

Agile Business Intelligence Data Lake Architecture Agile Business Intelligence Data Lake Architecture TABLE OF CONTENTS Introduction... 2 Data Lake Architecture... 2 Step 1 Extract From Source Data... 5 Step 2 Register And Catalogue Data Sets... 5 Step

More information

Data Refinery with Big Data Aspects

Data Refinery with Big Data Aspects International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data

More information

Cloud Computing Now and the Future Development of the IaaS

Cloud Computing Now and the Future Development of the IaaS 2010 Cloud Computing Now and the Future Development of the IaaS Quanta Computer Division: CCASD Title: Project Manager Name: Chad Lin Agenda: What is Cloud Computing? Public, Private and Hybrid Cloud.

More information

Challenges for Data Driven Systems

Challenges for Data Driven Systems Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2

More information

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com Hadoop, Why? Need to process huge datasets on large clusters of computers

More information

An Oracle White Paper September 2012. Oracle Database and the Oracle Database Cloud

An Oracle White Paper September 2012. Oracle Database and the Oracle Database Cloud An Oracle White Paper September 2012 Oracle Database and the Oracle Database Cloud 1 Table of Contents Overview... 3 Cloud taxonomy... 4 The Cloud stack... 4 Differences between Cloud computing categories...

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

CLOUD STORAGE USING HADOOP AND PLAY

CLOUD STORAGE USING HADOOP AND PLAY 27 CLOUD STORAGE USING HADOOP AND PLAY Devateja G 1, Kashyap P V B 2, Suraj C 3, Harshavardhan C 4, Impana Appaji 5 1234 Computer Science & Engineering, Academy for Technical and Management Excellence

More information

Delivering Real-World Total Cost of Ownership and Operational Benefits

Delivering Real-World Total Cost of Ownership and Operational Benefits Delivering Real-World Total Cost of Ownership and Operational Benefits Treasure Data - Delivering Real-World Total Cost of Ownership and Operational Benefits 1 Background Big Data is traditionally thought

More information

Optimizing Service Levels in Public Cloud Deployments

Optimizing Service Levels in Public Cloud Deployments WHITE PAPER OCTOBER 2014 Optimizing Service Levels in Public Cloud Deployments Keys to Effective Service Management 2 WHITE PAPER: OPTIMIZING SERVICE LEVELS IN PUBLIC CLOUD DEPLOYMENTS ca.com Table of

More information

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics

More information

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344 Where We Are Introduction to Data Management CSE 344 Lecture 25: DBMS-as-a-service and NoSQL We learned quite a bit about data management see course calendar Three topics left: DBMS-as-a-service and NoSQL

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Open source large scale distributed data management with Google s MapReduce and Bigtable

Open source large scale distributed data management with Google s MapReduce and Bigtable Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory

More information

Data Mining in the Swamp

Data Mining in the Swamp WHITE PAPER Page 1 of 8 Data Mining in the Swamp Taming Unruly Data with Cloud Computing By John Brothers Business Intelligence is all about making better decisions from the data you have. However, all

More information

Big Data Zurich, November 23. September 2011

Big Data Zurich, November 23. September 2011 Institute of Technology Management Big Data Projektskizze «Competence Center Automotive Intelligence» Zurich, November 11th 23. September 2011 Felix Wortmann Assistant Professor Technology Management,

More information

What s Happening to the Mainframe? Mobile? Social? Cloud? Big Data?

What s Happening to the Mainframe? Mobile? Social? Cloud? Big Data? December, 2014 What s Happening to the Mainframe? Mobile? Social? Cloud? Big Data? Glenn Anderson IBM Lab Services and Training Today s mainframe is a hybrid system z/os Linux on Sys z DB2 Analytics Accelerator

More information

Large-Scale Data Processing

Large-Scale Data Processing Large-Scale Data Processing Eiko Yoneki eiko.yoneki@cl.cam.ac.uk http://www.cl.cam.ac.uk/~ey204 Systems Research Group University of Cambridge Computer Laboratory 2010s: Big Data Why Big Data now? Increase

More information

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time? Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time? Kai Wähner kwaehner@tibco.com @KaiWaehner www.kai-waehner.de Disclaimer! These opinions are my own and do not necessarily

More information

Data Virtualization and ETL. Denodo Technologies Architecture Brief

Data Virtualization and ETL. Denodo Technologies Architecture Brief Data Virtualization and ETL Denodo Technologies Architecture Brief Contents Data Virtualization and ETL... 3 Summary... 3 Data Virtualization... 7 What is Data Virtualization good for?... 8 Applications

More information

P4.1 Reference Architectures for Enterprise Big Data Use Cases Romeo Kienzler, Data Scientist, Advisory Architect, IBM Germany, Austria, Switzerland

P4.1 Reference Architectures for Enterprise Big Data Use Cases Romeo Kienzler, Data Scientist, Advisory Architect, IBM Germany, Austria, Switzerland P4.1 Reference Architectures for Enterprise Big Data Use Cases Romeo Kienzler, Data Scientist, Advisory Architect, IBM Germany, Austria, Switzerland IBM Center of Excellence for Data Science, Cognitive

More information

The big data revolution

The big data revolution The big data revolution Friso van Vollenhoven (Xebia) Enterprise NoSQL Recently, there has been a lot of buzz about the NoSQL movement, a collection of related technologies mostly concerned with storing

More information

Cloud Computing Training

Cloud Computing Training Cloud Computing Training TechAge Labs Pvt. Ltd. Address : C-46, GF, Sector 2, Noida Phone 1 : 0120-4540894 Phone 2 : 0120-6495333 TechAge Labs 2014 version 1.0 Cloud Computing Training Cloud Computing

More information

Big Data and Market Surveillance. April 28, 2014

Big Data and Market Surveillance. April 28, 2014 Big Data and Market Surveillance April 28, 2014 Copyright 2014 Scila AB. All rights reserved. Scila AB reserves the right to make changes to the information contained herein without prior notice. No part

More information

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3

More information

You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.

You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required. What is this course about? This course is an overview of Big Data tools and technologies. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data. Attendees

More information

Cloud Computing for Architects

Cloud Computing for Architects Cloud Computing for Architects This four day, hands-on boot camp begins with an examination of the Cloud Computing concept, the structure and key characteristics of Clouds, and takes a look under the hood

More information

Cloud Computing. Chapter 1 Introducing Cloud Computing

Cloud Computing. Chapter 1 Introducing Cloud Computing Cloud Computing Chapter 1 Introducing Cloud Computing Learning Objectives Understand the abstract nature of cloud computing. Describe evolutionary factors of computing that led to the cloud. Describe virtualization

More information

Ø Teaching Evaluations. q Open March 3 through 16. Ø Final Exam. q Thursday, March 19, 4-7PM. Ø 2 flavors: q Public Cloud, available to public

Ø Teaching Evaluations. q Open March 3 through 16. Ø Final Exam. q Thursday, March 19, 4-7PM. Ø 2 flavors: q Public Cloud, available to public Announcements TIM 50 Teaching Evaluations Open March 3 through 16 Final Exam Thursday, March 19, 4-7PM Lecture 19 20 March 12, 2015 Cloud Computing Cloud Computing: refers to both applications delivered

More information

Cloud Computing: Making the right choices

Cloud Computing: Making the right choices Cloud Computing: Making the right choices Kalpak Shah Clogeny Technologies Pvt Ltd 1 About Me Kalpak Shah Founder & CEO, Clogeny Technologies Passionate about economics and technology evolving through

More information

Using Cloud Services for Test Environments A case study of the use of Amazon EC2

Using Cloud Services for Test Environments A case study of the use of Amazon EC2 Using Cloud Services for Test Environments A case study of the use of Amazon EC2 Lee Hawkins (Quality Architect) Quest Software, Melbourne Copyright 2010 Quest Software We are gathered here today to talk

More information

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of

More information

A very short Intro to Hadoop

A very short Intro to Hadoop 4 Overview A very short Intro to Hadoop photo by: exfordy, flickr 5 How to Crunch a Petabyte? Lots of disks, spinning all the time Redundancy, since disks die Lots of CPU cores, working all the time Retry,

More information

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe

More information

Introduction to NoSQL Databases. Tore Risch Information Technology Uppsala University 2013-03-05

Introduction to NoSQL Databases. Tore Risch Information Technology Uppsala University 2013-03-05 Introduction to NoSQL Databases Tore Risch Information Technology Uppsala University 2013-03-05 UDBL Tore Risch Uppsala University, Sweden Evolution of DBMS technology Distributed databases SQL 1960 1970

More information

An Introduction to Data Virtualization as a Tool for Application Development

An Introduction to Data Virtualization as a Tool for Application Development An Introduction to Data Virtualization as a Tool for Application Development Accur8 Software 73 Main St. Suite 7, Brattleboro, VT 05301 accur8software.com INTRODUCTION When a developer or team of developers

More information

Big Data Primer. 1 Why Big Data? Alex Sverdlov alex@theparticle.com

Big Data Primer. 1 Why Big Data? Alex Sverdlov alex@theparticle.com Big Data Primer Alex Sverdlov alex@theparticle.com 1 Why Big Data? Data has value. This immediately leads to: more data has more value, naturally causing datasets to grow rather large, even at small companies.

More information

Instant Data Warehousing with SAP data

Instant Data Warehousing with SAP data Instant Data Warehousing with SAP data» Extracting your SAP data to any destination environment» Fast, simple, user-friendly» 8 different SAP interface technologies» Graphical user interface no previous

More information

White Paper: Datameer s User-Focused Big Data Solutions

White Paper: Datameer s User-Focused Big Data Solutions CTOlabs.com White Paper: Datameer s User-Focused Big Data Solutions May 2012 A White Paper providing context and guidance you can use Inside: Overview of the Big Data Framework Datameer s Approach Consideration

More information