/ Cloud Computing. Recitation 11 November 10 th and November 12 th, 2015
|
|
- Shanon Washington
- 8 years ago
- Views:
Transcription
1 / Cloud Computing Recitation 11 November 10 th and November 12 th, 2015
2 Overview Administrative issues Tagging, 15619Project, project code Last week s reflection Project 3.4 Quiz 9 This week s schedule Project 3.5 Unit 5 - Module Project Phase 2 Quiz 10 Demo Twitter Analytics: The 15619Project
3 Reminders Monitor AWS expenses regularly and tag all resources Check your bill (Cost Explorer > filter by tags). Piazza Guidelines Please tag your questions appropriately Search for an existing answer first Provide clean, modular and well documented code Large penalties for not doing so. Double check that your code is submitted!! (verify by downloading it from TPZ from the submissions page) Utilize Office Hours We are here to help (but not to give solutions) Use the team AWS account and tag the 15619Project resources carefully
4 Project 3.4 : FAQs Problem 1: DynamoDB import failure Make sure the data is processed correctly to meet DynamoDB s requirement. Be aware of escaping problems. Problem 2: Maven dependency error Be aware of what you include. Some dependencies might have conflicts.
5 This week: Project 3.5 P3.1: Files, SQL and NoSQL P3.2: Sharding and Replication P3.3: Consistency P3.4: Social network with heterogeneous backends P3.5: Data warehousing and OLAP
6 P3.5: Background Carnegie Eagle(CE), a supermarket chain wishes to expand their business further into other markets and wants a decision support system The CTO of CE wants you to analyze various data warehouses and pick the best one for the job The CTO subscribes to the Hottest Data Warehouses weekly and decides she wants you to analyze the following: Hive Impala Redshift
7 OLTP vs OLAP
8 Data warehousing and OLAP OLAP (Online Analytical Processing) queries deals with historical/archived data OLAP warehouses optimized for reads and aggregations Rarely perform updates Tables in OLAP are denormalized as compared to normalized tables in OLTP (Online Transaction Processing) Data warehouses tuned for high throughput since they process large amounts of data OLTP databases are more tuned for smaller updates and lower latency
9 Hive Impala Redshift OLAP data warehouses Built on Hadoop and provides a SQL-like interface to query the data Translates user SQL query to MapReduce jobs Based on Hadoop as well and provides a SQL-like interface similar to Hive Uses its own engine to translate user queries and directly access data on the cluster (hence lower latency) Datawarehouse-as-a-Service provided by Amazon Web Services Used for real time analytics
10 Module to Read UNIT 5: Distributed Programming and Analytics Engines for the Cloud Module 18: Intro to distributed programming for the Cloud Module 19: Distributed analytics engines: MapReduce Module 20: Distributed analytics engines: Spark Module 21: Distributed analytics engines: GraphLab
11 Distributed Programming Taxonomy of Programs: Sequential Concurrent Parallel Challenges in programming the cloud: Scalability Communication overhead Heterogeneity Synchronization Fault Tolerance Scheduling
12 Upcoming Deadlines Quiz 9 : Unit 5 - Module 18 Open: 11/13/ :01 AM Pittsburgh Due: 11/13/ :59 PM Pittsburgh Project 3.5 : Data warehousing and OLAP Due: 11/15/ :59 PM Pittsburgh 15619Project : Phase 2 Live-test due: 11/11/2015 4:59 PM Pittsburgh Code and report due: 11/12/ :59 PM Pittsburgh
13 Busy Weeks Coming Up! Wednesday Thursday Friday Sunday Wednesday 11/11/ :00:01 EST Phase 2 Live Test Thursday 11/12/ :59:59 EST Phase 2 Code & Report Due Friday 11/13/ :59:59 EST Quiz 10 Sunday 11/15/ :59:59 EST P3.5 Due Wednesday 12/2/ :00:01 EST Phase 3 Live Test Thursday 12/3/ :59:59 EST Phase 3 Code & Report Due Friday 12/4/ :59:59 EST Quiz 12 Sunday 12/6/ :59:59 EST P4.2 Due
14 Project 3.5 OLAP with Cloud Data Warehousing
15 Star Schema
16 Data Warehousing Benchmark
17 Data Warehousing Benchmark Hive No optimization required Follow the instructions and you are done Impala and Redshift Load data, execute unoptimized queries Optimize table schemas and/or queries The evaluation will be on both correctness and response time
18 Notes Hive May take a few hours, be patient. Impala Some unoptimized queries may throw exceptions. Redshift Be aware of the high expenditure! Think before you start.
19 Project 3.5 Demo
20 twitter DATA ANALYTICS: PROJECT
21 15619 Project Phase 2 Deadlines Project Phase Project Phase 2 Live Test Thursday 10/15/ :00:01 ET Thursday 10/29/ :59:59 ET Wednesday 11/11/ :00:01 ET Wednesday 11/11/ :59:59 ET Thursday 11/12/ :59:59 ET Project Phase 2 Q3 & Q4 Development Project Phase 2 Code & Report Due
22 Phase 2 Two more queries (Q3 and Q4) More ETL Multiple tables and queries Live Test!!! Warmup, Q1, Q2, Q3, Q4, MixedQ1-Q4 Each for 30 min Both HBase and MySQL Two DNS Must get over 50% score on both! No more pre-caching of known requests
23 Query 3: Spreading Happiness & Sadness Q. How to compute impact score for each tweet? score = sentiment_score * ( 1 + followers_count ) Q. What s followers_count and how do I find it? Read
24 Query 3: Spreading Happiness & Sadness Sample Query: GET /q3? start_date= &end_date= &userid= &n=10 n will be less or equal to 10
25 Query 3: Spreading Happiness & Sadness Sample Response: Positive Tweets Austin please follow me i love you much! x ,512, ,@ArianaGrande Ariana please follow me i love you much! x ,512, ,@ArianaGrande Ariana please follow me i love you much! x ,512, ,@ArianaGrande Ariana please follow me i love you much! x ,512, ,@ArianaGrande Ariana please follow me i love you much! x ,508, ,@ArianaGrande Ariana please follow me i love you much! x ,270, ,@AustinMahone please follow me. I know that it is "impossible" that you see this tweet but if you see it, please!! x ,270, ,@ArianaGrande please follow me. I know that it is "impossible" that you see this tweet but if you see it, please!! x7 Negative Tweets
26 Query 4: Hashtag Analysis Use the hashtag entity Sample Query: GET /q4?hashtag=01net&n=4 Sample Response: :2: , :#01net: Philips attaque Nintendo et veut faire bannir la Wii U des ventes aux Etats-Unis :1: :#01net: Apple pourrait lancer un MacBook Air Retina et un ipad 12'' au second semestre :1: :#01net: TV 4K : faut-il craquer tout de suite? Nos tests, les prix et ce qui arrive :1: :#01net: Le rapprochement entre Bouygues Telecom et Free se prépare en douceur
27 Phase 2 Clarifications Censorship? q3:, q4: X No required time filtering Tie breaker? q3: q4: Primary key - absolute value of score (descending) Secondary key - tweet id resolve ties by tweet i (ascending) Primary key - count, date with higher count com first Secondary key - date, earliest date comes first
28 Tips for Phase 2 Carefully design and check ETL process Watch your budget. $60 = phase + Live Test Preparing for the live test You are required to submit two URLs, one for the MySQL live test and one for the HBase for live test Budget limited to $1.25/hr for MySQL and HBase web service separately. Tag correctly. Do NOT run anything else. Caching known requests will not work Need to have all Q1-Q4 running at the same time Don t expect testing in sequence.
29 MySQL Understand the difference between different storage engines Understand the difference between different types of indices Remember, replication may not be the best strategy if the tables are big Avoid costly operations in front-end Make use of connection pools, caching configuration in MySQL Consider using a MySQL Cluster
30 HBase Make use of WebUI: Check which tables are getting requests the mosts Use split, move, merge functions to evenly distribute loads Make sure your front end is not the bottleneck Carefully design schema Consider building your own HBase- EMR is expensive and abstracts away details
31 Phase 2 Deadline What s due soon? Submission by 16:59 ET (Pittsburgh) Wed 11/11 HBase Live from 6 PM to 9 PM ET MySQL Live from 9 PM to midnight ET Fix Q1 and Q2 if your Phase 1 did not go well New queries Q3 and Q4. Targets 5000 and 8000 rps Get score higher than 50% on both MySQL and HBase Heads up: Phase 2 counts for 30% of 15619Project grade Report at the end of Phase 2 Make sure you highlight failures and learning If you didn t do well, explain why If you did, explain how Cannot begin to stress how critical this is!!!!
15-319 / 15-619 Cloud Computing. Recitation 9 October 27th and 29th, 2015
15-319 / 15-619 Cloud Computing Recitation 9 October 27th and 29th, 2015 Overview Administrative issues Office Hours, Piazza guidelines Last week s reflection Project 3.2, OLI Unit 4, Module 14, Quiz 7
More information15-319 / 15-619 Cloud Computing. Recitation 5 September 29 th & October 1 st 2015
15-319 / 15-619 Cloud Computing Recitation 5 September 29 th & October 1 st 2015 1 Overview Administrative Stuff Last Week s Reflection Project 2.1, OLI Modules 5 & 6, Quiz 3 New concepts This week s schedule
More informationReal Time Big Data Processing
Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure
More informationCan the Elephants Handle the NoSQL Onslaught?
Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented
More informationHadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
More informationPutting Apache Kafka to Use!
Putting Apache Kafka to Use! Building a Real-time Data Platform for Event Streams! JAY KREPS, CONFLUENT! A Couple of Themes! Theme 1: Rise of Events! Theme 2: Immutability Everywhere! Level! Example! Immutable
More informationReplicating to everything
Replicating to everything Featuring Tungsten Replicator A Giuseppe Maxia, QA Architect Vmware About me Giuseppe Maxia, a.k.a. "The Data Charmer" QA Architect at VMware Previously at AB / Sun / 3 times
More informationCS15-319 / 15-619 Cloud Computing. Recitation 2 September 2 & 4, 2014
CS15-319 / 15-619 Cloud Computing Recitation 2 September 2 & 4, 2014 Administrative Issues Overview --TA hours, Piazza posts guidelines Last Week s Reflection --Project Primer, Module 1 This Week s Schedule
More informationHadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN
Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current
More informationAdministrative Issues
Administrative Issues Make use of office hours We will have to make sure that you have tried yourself before you ask Monitor AWS expenses regularly Always do the cost calculation before launching services
More informationUsing distributed technologies to analyze Big Data
Using distributed technologies to analyze Big Data Abhijit Sharma Innovation Lab BMC Software 1 Data Explosion in Data Center Performance / Time Series Data Incoming data rates ~Millions of data points/
More informationBIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data
More informationPlay with Big Data on the Shoulders of Open Source
OW2 Open Source Corporate Network Meeting Play with Big Data on the Shoulders of Open Source Liu Jie Technology Center of Software Engineering Institute of Software, Chinese Academy of Sciences 2012-10-19
More informationNext-Generation Cloud Analytics with Amazon Redshift
Next-Generation Cloud Analytics with Amazon Redshift What s inside Introduction Why Amazon Redshift is Great for Analytics Cloud Data Warehousing Strategies for Relational Databases Analyzing Fast, Transactional
More informationBENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB
BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next
More informationTesting 3Vs (Volume, Variety and Velocity) of Big Data
Testing 3Vs (Volume, Variety and Velocity) of Big Data 1 A lot happens in the Digital World in 60 seconds 2 What is Big Data Big Data refers to data sets whose size is beyond the ability of commonly used
More informationBig Data Analytics - Accelerated. stream-horizon.com
Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based
More informationSAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES
SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES AWS GLOBAL INFRASTRUCTURE 10 Regions 25 Availability Zones 51 Edge locations WHAT
More informationLambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
More informationCS15-319 / 15-619 Cloud Computing
CS15-319 / 15-619 Cloud Computing Recitation 1 Course Overview and Introduction August 26 & 28 2014 http://www.cs.cmu.edu/~msakr/15619-f14/ Outline What is the course about? What is an online course? Administrivia
More informationPetabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013
Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP) and Analytics
More informationTungsten Replicator, more open than ever!
Tungsten Replicator, more open than ever! MC Brown, Senior Product Line Manager September, 2015 2014 VMware Inc. All rights reserved. We Face An Age Old Problem BRS/Search 2 It s Gotten Worse 3 Much Worse
More informationextensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010
System/ Scale to Primary Secondary Joins/ Integrity Language/ Data Year Paper 1000s Index Indexes Transactions Analytics Constraints Views Algebra model my label 1971 RDBMS O tables sql-like 2003 memcached
More informationHow To Use Big Data For Telco (For A Telco)
ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call
More informationBuilding Your Big Data Team
Building Your Big Data Team With all the buzz around Big Data, many companies have decided they need some sort of Big Data initiative in place to stay current with modern data management requirements.
More informationStructured Data Storage
Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct
More informationSOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce
More informationUsing Kafka to Optimize Data Movement and System Integration. Alex Holmes @
Using Kafka to Optimize Data Movement and System Integration Alex Holmes @ https://www.flickr.com/photos/tom_bennett/7095600611 THIS SUCKS E T (circa 2560 B.C.E.) L a few years later... 2,014 C.E. i need
More informationIBM WebSphere DataStage Online training from Yes-M Systems
Yes-M Systems offers the unique opportunity to aspiring fresher s and experienced professionals to get real time experience in ETL Data warehouse tool IBM DataStage. Course Description With this training
More informationNative Connectivity to Big Data Sources in MSTR 10
Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single
More informationBig Data and Market Surveillance. April 28, 2014
Big Data and Market Surveillance April 28, 2014 Copyright 2014 Scila AB. All rights reserved. Scila AB reserves the right to make changes to the information contained herein without prior notice. No part
More informationBig Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect
on AWS Services Overview Bernie Nallamotu Principle Solutions Architect \ So what is it? When your data sets become so large that you have to start innovating around how to collect, store, organize, analyze
More informationHow to Enhance Traditional BI Architecture to Leverage Big Data
B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...
More informationLambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
More informationHadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics
In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning
More informationF1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013
F1: A Distributed SQL Database That Scales Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013 What is F1? Distributed relational database Built to replace sharded MySQL back-end of AdWords
More informationAnalytics in the Cloud. Peter Sirota, GM Elastic MapReduce
Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of
More informationFederated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA. by Christian Tzolov @christzolov
Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA by Christian Tzolov @christzolov Whoami Christian Tzolov Technical Architect at Pivotal, BigData, Hadoop, SpringXD,
More informationUnified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia
Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing
More informationHow To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI
More informationSQL Server Administrator Introduction - 3 Days Objectives
SQL Server Administrator Introduction - 3 Days INTRODUCTION TO MICROSOFT SQL SERVER Exploring the components of SQL Server Identifying SQL Server administration tasks INSTALLING SQL SERVER Identifying
More informationIntroduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
More informationBuilding Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.
Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new
More informationOptimization of Analytic Data Flows for Next Generation Business Intelligence Applications
Optimization of Analytic Data Flows for Next Generation Business Intelligence Applications Umeshwar Dayal, Kevin Wilkinson, Alkis Simitsis, Malu Castellanos, Lupita Paz HP Labs Palo Alto, CA, USA umeshwar.dayal@hp.com
More informationREAL-TIME BIG DATA ANALYTICS
www.leanxcale.com info@leanxcale.com REAL-TIME BIG DATA ANALYTICS Blending Transactional and Analytical Processing Delivers Real-Time Big Data Analytics 2 ULTRA-SCALABLE FULL ACID FULL SQL DATABASE LeanXcale
More informationReal Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA
Real Time Fraud Detection With Sequence Mining on Big Data Platform Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Open Source Big Data Eco System Query (NOSQL) : Cassandra,
More informationApache Flink Next-gen data analysis. Kostas Tzoumas ktzoumas@apache.org @kostas_tzoumas
Apache Flink Next-gen data analysis Kostas Tzoumas ktzoumas@apache.org @kostas_tzoumas What is Flink Project undergoing incubation in the Apache Software Foundation Originating from the Stratosphere research
More informationBig Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth
MAKING BIG DATA COME ALIVE Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth Steve Gonzales, Principal Manager steve.gonzales@thinkbiganalytics.com
More informationBig data blue print for cloud architecture
Big data blue print for cloud architecture -COGNIZANT Image Area Prabhu Inbarajan Srinivasan Thiruvengadathan Muralicharan Gurumoorthy Praveen Codur 2012, Cognizant Next 30 minutes Big Data / Cloud challenges
More informationAssociate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2
Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue
More information6 Steps to Faster Data Blending Using Your Data Warehouse
6 Steps to Faster Data Blending Using Your Data Warehouse Self-Service Data Blending and Analytics Dynamic market conditions require companies to be agile and decision making to be quick meaning the days
More informationSyllabus for Course 1-02-326: Database Systems Engineering at Kinneret College
Syllabus for Course 1-02-326: Database Systems Engineering at Kinneret College Instructor: Michael J. May Semester 2 of 5769 1 Course Details The course meets 9:00am 11:00am on Wednesdays. The Targil for
More informationThe Inside Scoop on Hadoop
The Inside Scoop on Hadoop Orion Gebremedhin National Solutions Director BI & Big Data, Neudesic LLC. VTSP Microsoft Corp. Orion.Gebremedhin@Neudesic.COM B-orgebr@Microsoft.com @OrionGM The Inside Scoop
More informationThe Sierra Clustered Database Engine, the technology at the heart of
A New Approach: Clustrix Sierra Database Engine The Sierra Clustered Database Engine, the technology at the heart of the Clustrix solution, is a shared-nothing environment that includes the Sierra Parallel
More informationHadoop IST 734 SS CHUNG
Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to
More informationFrom Dolphins to Elephants: Real-Time MySQL to Hadoop Replication with Tungsten
From Dolphins to Elephants: Real-Time MySQL to Hadoop Replication with Tungsten MC Brown, Director of Documentation Linas Virbalas, Senior Software Engineer. About Tungsten Replicator Open source drop-in
More informationDaniel J. Adabi. Workshop presentation by Lukas Probst
Daniel J. Adabi Workshop presentation by Lukas Probst 3 characteristics of a cloud computing environment: 1. Compute power is elastic, but only if workload is parallelizable 2. Data is stored at an untrusted
More informationData Analytics Infrastructure
Data Analytics Infrastructure Data Science SG Nov 2015 Meetup Le Nguyen The Dat @lenguyenthedat Backgrounds ZALORA Group (2013 2014) o Biggest online fashion retails in South East Asia o Data Infrastructure
More informationBenchmarking and Analysis of NoSQL Technologies
Benchmarking and Analysis of NoSQL Technologies Suman Kashyap 1, Shruti Zamwar 2, Tanvi Bhavsar 3, Snigdha Singh 4 1,2,3,4 Cummins College of Engineering for Women, Karvenagar, Pune 411052 Abstract The
More informationReal Time Analytics for Big Data. NtiSh Nati Shalom @natishalom
Real Time Analytics for Big Data A Twitter Inspired Case Study NtiSh Nati Shalom @natishalom Big Data Predictions Overthe next few years we'll see the adoption of scalable frameworks and platforms for
More informationTHE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS
THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS WHITE PAPER Successfully writing Fast Data applications to manage data generated from mobile, smart devices and social interactions, and the
More informationAmazon Redshift & Amazon DynamoDB Michael Hanisch, Amazon Web Services Erez Hadas-Sonnenschein, clipkit GmbH Witali Stohler, clipkit GmbH 2014-05-15
Amazon Redshift & Amazon DynamoDB Michael Hanisch, Amazon Web Services Erez Hadas-Sonnenschein, clipkit GmbH Witali Stohler, clipkit GmbH 2014-05-15 2014 Amazon.com, Inc. and its affiliates. All rights
More informationCAPTURING & PROCESSING REAL-TIME DATA ON AWS
CAPTURING & PROCESSING REAL-TIME DATA ON AWS @ 2015 Amazon.com, Inc. and Its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent
More informationINTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe
More informationWorkshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
More informationBig Data Technology CS 236620, Technion, Spring 2013
Big Data Technology CS 236620, Technion, Spring 2013 Structured Databases atop Map-Reduce Edward Bortnikov & Ronny Lempel Yahoo! Labs, Haifa Roadmap Previous class MR Implementation This class Query Languages
More informationAccelerate the Performance of Virtualized Databases Using PernixData FVP Software
WHITE PAPER Accelerate the Performance of Virtualized Databases Using PernixData FVP Software Increase SQL Transactions and Minimize Latency with a Flash Hypervisor 1 Virtualization saves substantial time
More informationBig Systems, Big Data
Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,
More informationSpeak<geek> Tech Brief. RichRelevance Distributed Computing: creating a scalable, reliable infrastructure
3 Speak Tech Brief RichRelevance Distributed Computing: creating a scalable, reliable infrastructure Overview Scaling a large database is not an overnight process, so it s difficult to plan and implement
More informationCluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.
Architectures Cluster Computing Job Parallelism Request Parallelism 2 2010 VMware Inc. All rights reserved Replication Stateless vs. Stateful! Fault tolerance High availability despite failures If one
More informationRyan Horn, Lead Software Engineer at Twilio. November 12, 2014 Las Vegas. BDT312 Using the Cloud to Scale from a Database to a Data Platform
BDT312 Using the Cloud to Scale from a Database to a Data Platform Ryan Horn, Lead Software Engineer at Twilio November 12, 2014 Las Vegas 2014 Amazon.com, Inc. and its affiliates. All rights reserved.
More informationbrief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS...253 PART 4 BEYOND MAPREDUCE...385
brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 1 Hadoop in a heartbeat 3 2 Introduction to YARN 22 PART 2 DATA LOGISTICS...59 3 Data serialization working with text and beyond 61 4 Organizing and
More information#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld
Tapping into Hadoop and NoSQL Data Sources in MicroStrategy Presented by: Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop? Customer Case
More informationEuropean Archival Records and Knowledge Preservation Database Archiving in the E-ARK Project
European Archival Records and Knowledge Preservation Database Archiving in the E-ARK Project Janet Delve, University of Portsmouth Kuldar Aas, National Archives of Estonia Rainer Schmidt, Austrian Institute
More informationAlexander Rubin Principle Architect, Percona April 18, 2015. Using Hadoop Together with MySQL for Data Analysis
Alexander Rubin Principle Architect, Percona April 18, 2015 Using Hadoop Together with MySQL for Data Analysis About Me Alexander Rubin, Principal Consultant, Percona Working with MySQL for over 10 years
More informationANALYTICS CENTER LEARNING PROGRAM
Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals
More informationIntegrating Big Data into the Computing Curricula
Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big
More informationWhere We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344
Where We Are Introduction to Data Management CSE 344 Lecture 25: DBMS-as-a-service and NoSQL We learned quite a bit about data management see course calendar Three topics left: DBMS-as-a-service and NoSQL
More informationHow to Choose Between Hadoop, NoSQL and RDBMS
How to Choose Between Hadoop, NoSQL and RDBMS Keywords: Jean-Pierre Dijcks Oracle Redwood City, CA, USA Big Data, Hadoop, NoSQL Database, Relational Database, SQL, Security, Performance Introduction A
More informationDesign for Failure High Availability Architectures using AWS
Design for Failure High Availability Architectures using AWS Harish Ganesan Co founder & CTO 8KMiles www.twitter.com/harish11g http://www.linkedin.com/in/harishganesan Sample Use Case Multi tiered LAMP/LAMJ
More informationBig Data Technologies Compared June 2014
Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development
More informationHadoop and MySQL for Big Data
Hadoop and MySQL for Big Data Alexander Rubin October 9, 2013 About Me Alexander Rubin, Principal Consultant, Percona Working with MySQL for over 10 years Started at MySQL AB, Sun Microsystems, Oracle
More informationInternational Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763
International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 A Discussion on Testing Hadoop Applications Sevuga Perumal Chidambaram ABSTRACT The purpose of analysing
More informationNon-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ. Cloudera World Japan November 2014
Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ Cloudera World Japan November 2014 WANdisco Background WANdisco: Wide Area Network Distributed Computing Enterprise ready, high availability
More informationGigaSpaces Real-Time Analytics for Big Data
GigaSpaces Real-Time Analytics for Big Data GigaSpaces makes it easy to build and deploy large-scale real-time analytics systems Rapidly increasing use of large-scale and location-aware social media and
More informationAli Ghodsi Head of PM and Engineering Databricks
Making Big Data Simple Ali Ghodsi Head of PM and Engineering Databricks Big Data is Hard: A Big Data Project Tasks Tasks Build a Hadoop cluster Challenges Clusters hard to setup and manage Build a data
More informationPetabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, XLDB Conference at Stanford University, Sept 2012
Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, XLDB Conference at Stanford University, Sept 2012 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP)
More informationproject collects data from national events, both natural and manmade, to be stored and evaluated by
Joseph Sebastian CS 2994 Spring 2014 Undergraduate Research Final Paper GOALS The goal of my research was to assist the Integrated Digital Event Archive (IDEAL) team in transferring their Twitter data
More informationArchitectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
More informationBIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014
BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014 Ralph Kimball Associates 2014 The Data Warehouse Mission Identify all possible enterprise data assets Select those assets
More informationData Warehousing Concepts
Data Warehousing Concepts JB Software and Consulting Inc 1333 McDermott Drive, Suite 200 Allen, TX 75013. [[[[[ DATA WAREHOUSING What is a Data Warehouse? Decision Support Systems (DSS), provides an analysis
More informationBASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS
WHITEPAPER BASHO DATA PLATFORM BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS INTRODUCTION Big Data applications and the Internet of Things (IoT) are changing and often improving our
More informationBIG DATA TECHNOLOGY. Hadoop Ecosystem
BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big
More informationInfrastructures for big data
Infrastructures for big data Rasmus Pagh 1 Today s lecture Three technologies for handling big data: MapReduce (Hadoop) BigTable (and descendants) Data stream algorithms Alternatives to (some uses of)
More informationInfomatics. Big-Data and Hadoop Developer Training with Oracle WDP
Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools
More informationThe Internet of Things and Big Data: Intro
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific
More informationLarge scale processing using Hadoop. Ján Vaňo
Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine
More informationESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
More informationIntroduction to Hadoop
Introduction to Hadoop Miles Osborne School of Informatics University of Edinburgh miles@inf.ed.ac.uk October 28, 2010 Miles Osborne Introduction to Hadoop 1 Background Hadoop Programming Model Examples
More informationWeek 3 lecture slides
Week 3 lecture slides Topics Data Warehouses Online Analytical Processing Introduction to Data Cubes Textbook reference: Chapter 3 Data Warehouses A data warehouse is a collection of data specifically
More information