Google Cloud Data Platform & Services. Gregor Hohpe
|
|
|
- Steven Jefferson
- 9 years ago
- Views:
Transcription
1 Google Cloud Data Platform & Services Gregor Hohpe
2 All About Data
3 We Have More of It Internet data more easily available Logs user & system behavior Cheap Storage keep more of it 3
4 Beyond just Relational Structure Relational (Hosted SQL) Record-oriented (Bigtable) Nested (Protocol Buffers) Graphs (Pregel) 4
5 Beyond just Relational Analysis needs Number crunching (MapReduce / FlumeJava) Ad hoc (Dremel) Precise vs. Estimate (Sawzall) Model Generation & Prediction (Prediction API) 5
6 Google Storage for Developers Store your data in Google's cloud
7 What Is Google Storage? Store your data in Google's cloud o any format, any amount, any time You control access to your data o private, shared, or public Access via Google APIs or 3rd party tools/libraries
8 Sample Use Cases Static content hosting e.g. static html, images, music, video Backup and recovery e.g. personal data, business records Sharing e.g. share data with your customers Data storage for applications e.g. used as storage backend for Android, App Engine, Cloud based apps Storage for Computation e.g. BigQuery, Prediction API
9 Core Features RESTful API o GET, PUT, POST, HEAD, DELETE o Resources identified by URI o Compatible with S3 Organized in Buckets o Flat containers, i.e. no bucket hierarchy Buckets Contain Objects o Any type o Size: <100 GB / object Access Control for Google Accounts o For individuals and groups Authenticate via o Signing request using access keys o Web browser login
10 Performance and Scalability Objects of any type and 100 GB / Object Unlimited numbers of objects, 1000s of buckets All data replicated to multiple US data centers Leveraging Google's worldwide network for data delivery Only you can use bucket names with your domain names Read-your-writes data consistency Range Get
11
12 gsutil
13 Pricing & Availability Storage - $0.17/GB/month Network o Upload data to Google $0.10/GB o Download data from Google $0.15/GB for Americas and EMEA $0.30/GB for APAC Requests o PUT, POST, LIST - $0.01 per 1,000 Requests o GET, HEAD - $0.01 per 10,000 Requests Free storage (up to 100GB) during limited preview period o No SLA
14 GOOGLE PREDICTION API GOOGLE'S PREDICTION ENGINE IN THE CLOUD
15 Introducing the Google Prediction API Google's machine learning technology Now available externally RESTful Web service
16 How does it work? 1. TRAIN The Prediction API finds relevant features in the sample data during training. "english" "english" "spanish" "spanish" The quick brown fox jumped over the lazy dog. To err is human, but to really foul things up you need a computer. No hay mal que por bien no venga. La tercera es la vencida. 2. PREDICT The Prediction API later searches for those features during prediction.? To be or not to be, that is the question.? La fe mueve montañas.
17 Countless applications... Customer Sentiment Transaction Risk Species Identification Message Routing Diagnostics Churn Prediction Legal Docket Classification Suspicious Activity Work Roster Assignment Inappropriate Content Recommend Products Political Bias Uplift Marketing Filtering Career Counseling... and many more...
18 Using Your Data with Prediction API & BigQuery 1. Upload Upload your data to Google Storage Your Data 2. Process Import to tables Train a model BigQuery Prediction API 3. Act Run queries Make predictions Your Apps
19 Step 1: Upload Training data: outputs and input features Data format: comma separated value format (CSV), result in first column "english","to err is human, but to really..." "spanish","no hay mal que por bien no venga."... Upload to Google Storage gsutil cp ${data} gs://yourbucket/${data}
20 Step 2: Train To train a model: POST prediction/v1.1/training?data=mybucket%2fmydata Training runs asynchronously. To see if it has finished: GET prediction/v1.1/training/mybucket%2fmydata {"data":{ "data":"mybucket/mydata", "modelinfo":"estimated accuracy: 0.xx"}}}
21 Step 3: Predict POST prediction/v1.1/query/mybucket%2fmydata/predict { "data":{ "input": { "text" : [ "J'aime X! C'est le meilleur" ]}}}
22 Step 3: Predict POST prediction/v1.1/query/mybucket%2fmydata/predict { "data":{ "input": { "text" : [ "J'aime X! C'est le meilleur" ]}}} { data : { "kind" : "prediction#output", "outputlabel":"french", "outputmulti" :[ {"label":"french", "score": x.xx} {"label":"english", "score": x.xx} {"label":"spanish", "score": x.xx}]}}
23 Step 3: Predict Apply the trained model to make predictions on new data import httplib # put new data in JSON format params = {... } header = {"Content-Type" : "application/json"} conn = httplib.httpconnection(" request("post", "/prediction/v1.1/query/mybucket%2fmydata/predict", params, header) print conn.getresponse()
24 Prediction API Capabilities Data Input Features: numeric or unstructured text Output: up to hundreds of discrete categories Training Many machine learning techniques Automatically selected Performed asynchronously Access from many platforms: Web app from Google App Engine Apps Script (e.g. from Google Spreadsheet) Desktop app
25 GOOGLE BIGQUERY INTERACTIVE ANALYSIS OF LARGE DATASETS
26 Key Capabilities Scalable: Billions of rows Fast: Response in seconds Simple: Queries in SQL-like language Accessible via: o REST o JSON-RPC o Google App Scripts
27 Writing Queries Compact subset of SQL o SELECT... FROM... WHERE... GROUP BY... ORDER BY... LIMIT...; Common functions o Math, String, Time,... Additional statistical approximations o TOP o COUNT DISTINCT
28 BigQuery via REST GET /bigquery/v1/tables/{table name} GET /bigquery/v1/query?q={query} Sample JSON Reply: { "results": { "fields": { [ {"id":"count(*)","type":"uint64"},... ] }, "rows": [ {"f":[{"v":"2949"},...]}, {"f":[{"v":"5387"},...]},... ] } } Also supports JSON-RPC
29 Example Using BigQuery Shell Python DB API B. Clapper's sqlcmd
30 WORTH READING SOME OF WHAT IS UNDER THE COVERS
31 FlumeJava Processing Pipeline A Java library that makes it easy to develop, test, and run efficient data-parallel pipelines. At the core of the FlumeJava library are classes that represent immutable parallel collections, each supporting a modest number of operations for processing them in parallel. Parallel collections and their operations present a simple, high-level, uniform abstraction over different data representations and execution strategies. To enable parallel operations to run efficiently, FlumeJava defers their evaluation, instead internally constructing an execution plan dataflow graph. 31
32 Sawzall Parallel Data Analysis We present a system for automating analyses. A filtering phase, in which a query is expressed using a new programming language, emits data to an aggregation phase. Both phases are distributed over hundreds or even thousands of computers. The results are then collated and saved to a file. The design -- including the separation into two phases, the form of the programming language, and the properties of the aggregators -- exploits the parallelism inherent in having data and computation distributed across many machines. 32
33 Dremel Ad Hoc Query A scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google. 33
34 Pregel Comp Model for Graphs Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. In this paper we present a computational model suitable for this task. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. This vertex-centric approach is flexible enough to express a broad set of algorithms. 34
35 More information Google Storage for Developers Prediction API BigQuery Dremel
A Quick Introduction to Google's Cloud Technologies
A Quick Introduction to Google's Cloud Technologies Chris Schalk Developer Advocate @cschalk Anatoli Babenia Agenda Introduction Introduction to Google's Cloud Technologies App Engine Review Google's new
The Cloud to the rescue!
The Cloud to the rescue! What the Google Cloud Platform can make for you Aja Hammerly, Developer Advocate twitter.com/thagomizer_rb So what is the cloud? The Google Cloud Platform The Google Cloud Platform
Big Data and Apache Hadoop s MapReduce
Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23
Sisense. Product Highlights. www.sisense.com
Sisense Product Highlights Introduction Sisense is a business intelligence solution that simplifies analytics for complex data by offering an end-to-end platform that lets users easily prepare and analyze
Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
How to Ingest Data into Google BigQuery using Talend for Big Data. A Technical Solution Paper from Saama Technologies, Inc.
How to Ingest Data into Google BigQuery using Talend for Big Data A Technical Solution Paper from Saama Technologies, Inc. July 30, 2013 Table of Contents Intended Audience What you will Learn Background
Google Cloud Platform The basics
Google Cloud Platform The basics Who I am Alfredo Morresi ROLE Developer Relations Program Manager COUNTRY Italy PASSIONS Community, Development, Snowboarding, Tiramisu' Reach me [email protected]
A programming model in Cloud: MapReduce
A programming model in Cloud: MapReduce Programming model and implementation developed by Google for processing large data sets Users specify a map function to generate a set of intermediate key/value
Querying Massive Data Sets in the Cloud with Google BigQuery and Java. Kon Soulianidis JavaOne 2014
Querying Massive Data Sets in the Cloud with Google BigQuery and Java Kon Soulianidis JavaOne 2014 Agenda Massive Data Qualification A Big Data Problem What is BigQuery? BigQuery Java API Getting data
Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia
Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing
Big Data on Google Cloud
Big Data on Google Cloud Using Cloud Dataflow, BigQuery, and friends to process data the Cloud way William Vambenepe, Lead Product Manager for Big Data, Google Cloud Platform @vambenepe / [email protected]
TDAQ Analytics Dashboard
14 October 2010 ATL-DAQ-SLIDE-2010-397 TDAQ Analytics Dashboard A real time analytics web application Outline Messages in the ATLAS TDAQ infrastructure Importance of analysis A dashboard approach Architecture
Scaling Out With Apache Spark. DTL Meeting 17-04-2015 Slides based on https://www.sics.se/~amir/files/download/dic/spark.pdf
Scaling Out With Apache Spark DTL Meeting 17-04-2015 Slides based on https://www.sics.se/~amir/files/download/dic/spark.pdf Your hosts Mathijs Kattenberg Technical consultant Jeroen Schot Technical consultant
Big Data and Market Surveillance. April 28, 2014
Big Data and Market Surveillance April 28, 2014 Copyright 2014 Scila AB. All rights reserved. Scila AB reserves the right to make changes to the information contained herein without prior notice. No part
Assignment # 1 (Cloud Computing Security)
Assignment # 1 (Cloud Computing Security) Group Members: Abdullah Abid Zeeshan Qaiser M. Umar Hayat Table of Contents Windows Azure Introduction... 4 Windows Azure Services... 4 1. Compute... 4 a) Virtual
Big Data: Using ArcGIS with Apache Hadoop. Erik Hoel and Mike Park
Big Data: Using ArcGIS with Apache Hadoop Erik Hoel and Mike Park Outline Overview of Hadoop Adding GIS capabilities to Hadoop Integrating Hadoop with ArcGIS Apache Hadoop What is Hadoop? Hadoop is a scalable
The Internet of Things and Big Data: Intro
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific
<Insert Picture Here> Oracle and/or Hadoop And what you need to know
Oracle and/or Hadoop And what you need to know Jean-Pierre Dijcks Data Warehouse Product Management Agenda Business Context An overview of Hadoop and/or MapReduce Choices, choices,
Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu
Lecture 4 Introduction to Hadoop & GAE Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu Outline Introduction to Hadoop The Hadoop ecosystem Related projects
Vendor: Crystal Decisions Product: Crystal Reports and Crystal Enterprise
1 Ability to access the database platforms desired (text, spreadsheet, Oracle, Sybase and other databases, OLAP engines.) Y Y 2 Ability to access relational data base Y Y 3 Ability to access dimensional
The full setup includes the server itself, the server control panel, Firebird Database Server, and three sample applications with source code.
Content Introduction... 2 Data Access Server Control Panel... 2 Running the Sample Client Applications... 4 Sample Applications Code... 7 Server Side Objects... 8 Sample Usage of Server Side Objects...
Data Lab System Architecture
Data Lab System Architecture Data Lab Context Data Lab Architecture Astronomer s Desktop Web Page Cmdline Tools Legacy Apps User Code User Mgmt Data Lab Ops Monitoring Presentation Layer Authentication
Prepared By : Manoj Kumar Joshi & Vikas Sawhney
Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks
Open source large scale distributed data management with Google s MapReduce and Bigtable
Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: [email protected] Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory
Open Source Technologies on Microsoft Azure
Open Source Technologies on Microsoft Azure A Survey @DChappellAssoc Copyright 2014 Chappell & Associates The Main Idea i Open source technologies are a fundamental part of Microsoft Azure The Big Questions
Background on Elastic Compute Cloud (EC2) AMI s to choose from including servers hosted on different Linux distros
David Moses January 2014 Paper on Cloud Computing I Background on Tools and Technologies in Amazon Web Services (AWS) In this paper I will highlight the technologies from the AWS cloud which enable you
Big Data with the Google Cloud Platform
Big Data with the Google Cloud Platform Nacho Coloma CTO & Founder at Extrema Sistemas Google Developer Expert for the Google Cloud Platform @nachocoloma http://gplus.to/icoloma For the past 15 years,
Big Data and Scripting Systems build on top of Hadoop
Big Data and Scripting Systems build on top of Hadoop 1, 2, Pig/Latin high-level map reduce programming platform Pig is the name of the system Pig Latin is the provided programming language Pig Latin is
Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
Hadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
HYBRID CLOUD SUPPORT FOR LARGE SCALE ANALYTICS AND WEB PROCESSING. Navraj Chohan, Anand Gupta, Chris Bunch, Kowshik Prakasam, and Chandra Krintz
HYBRID CLOUD SUPPORT FOR LARGE SCALE ANALYTICS AND WEB PROCESSING Navraj Chohan, Anand Gupta, Chris Bunch, Kowshik Prakasam, and Chandra Krintz Overview Google App Engine (GAE) GAE Analytics Libraries
Comparing SQL and NOSQL databases
COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2015 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations
Analysis of Web Archives. Vinay Goel Senior Data Engineer
Analysis of Web Archives Vinay Goel Senior Data Engineer Internet Archive Established in 1996 501(c)(3) non profit organization 20+ PB (compressed) of publicly accessible archival material Technology partner
Introduction to Hadoop. New York Oracle User Group Vikas Sawhney
Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop
Visualisation in the Google Cloud
Visualisation in the Google Cloud by Kieran Barker, 1 School of Computing, Faculty of Engineering ABSTRACT Providing software as a service is an emerging trend in the computing world. This paper explores
An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics
An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe
Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB
Overview of Databases On MacOS Karl Kuehn Automation Engineer RethinkDB Session Goals Introduce Database concepts Show example players Not Goals: Cover non-macos systems (Oracle) Teach you SQL Answer what
Data Management in the Cloud
Data Management in the Cloud Ryan Stern [email protected] : Advanced Topics in Distributed Systems Department of Computer Science Colorado State University Outline Today Microsoft Cloud SQL Server
Big Data With Hadoop
With Saurabh Singh [email protected] The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
Big Data for Conventional Programmers Big Data - Not a Big Deal
Big Data for Conventional Programmers Big Data - Not a Big Deal by Efraim Moscovich, Senior Principal Software Architect, CA Technologies Processing large volumes of data has been around for decades (such
Cloud computing - Architecting in the cloud
Cloud computing - Architecting in the cloud [email protected] 1 Outline Cloud computing What is? Levels of cloud computing: IaaS, PaaS, SaaS Moving to the cloud? Architecting in the cloud Best practices
Fast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
SQL + NOSQL + NEWSQL + REALTIME FOR INVESTMENT BANKS
Enterprise Data Problems in Investment Banks BigData History and Trend Driven by Google CAP Theorem for Distributed Computer System Open Source Building Blocks: Hadoop, Solr, Storm.. 3548 Hypothetical
References. Introduction to Database Systems CSE 444. Motivation. Basic Features. Outline: Database in the Cloud. Outline
References Introduction to Database Systems CSE 444 Lecture 24: Databases as a Service YongChul Kwon Amazon SimpleDB Website Part of the Amazon Web services Google App Engine Datastore Website Part of
Introduction to Database Systems CSE 444
Introduction to Database Systems CSE 444 Lecture 24: Databases as a Service YongChul Kwon References Amazon SimpleDB Website Part of the Amazon Web services Google App Engine Datastore Website Part of
NoSQL for SQL Professionals William McKnight
NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to
Apache Spark 11/10/15. Context. Reminder. Context. What is Spark? A GrowingStack
Apache Spark Document Analysis Course (Fall 2015 - Scott Sanner) Zahra Iman Some slides from (Matei Zaharia, UC Berkeley / MIT& Harold Liu) Reminder SparkConf JavaSpark RDD: Resilient Distributed Datasets
QualysGuard WAS. Getting Started Guide Version 4.1. April 24, 2015
QualysGuard WAS Getting Started Guide Version 4.1 April 24, 2015 Copyright 2011-2015 by Qualys, Inc. All Rights Reserved. Qualys, the Qualys logo and QualysGuard are registered trademarks of Qualys, Inc.
Data Modeling for Big Data
Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes
Getting Started Guide for Developing tibbr Apps
Getting Started Guide for Developing tibbr Apps TABLE OF CONTENTS Understanding the tibbr Marketplace... 2 Integrating Apps With tibbr... 2 Developing Apps for tibbr... 2 First Steps... 3 Tutorial 1: Registering
Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
QualysGuard WAS. Getting Started Guide Version 3.3. March 21, 2014
QualysGuard WAS Getting Started Guide Version 3.3 March 21, 2014 Copyright 2011-2014 by Qualys, Inc. All Rights Reserved. Qualys, the Qualys logo and QualysGuard are registered trademarks of Qualys, Inc.
Mining Large Datasets: Case of Mining Graph Data in the Cloud
Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large
From the Monolith to Microservices: Evolving Your Architecture to Scale. Randy Shoup @randyshoup linkedin.com/in/randyshoup
From the Monolith to Microservices: Evolving Your Architecture to Scale Randy Shoup @randyshoup linkedin.com/in/randyshoup Background Consulting CTO at Randy Shoup Consulting o o Helping companies from
Large-Scale Data Processing
Large-Scale Data Processing Eiko Yoneki [email protected] http://www.cl.cam.ac.uk/~ey204 Systems Research Group University of Cambridge Computer Laboratory 2010s: Big Data Why Big Data now? Increase
Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: [email protected] Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
Amazon Redshift & Amazon DynamoDB Michael Hanisch, Amazon Web Services Erez Hadas-Sonnenschein, clipkit GmbH Witali Stohler, clipkit GmbH 2014-05-15
Amazon Redshift & Amazon DynamoDB Michael Hanisch, Amazon Web Services Erez Hadas-Sonnenschein, clipkit GmbH Witali Stohler, clipkit GmbH 2014-05-15 2014 Amazon.com, Inc. and its affiliates. All rights
Performance Testing Process A Whitepaper
Process A Whitepaper Copyright 2006. Technologies Pvt. Ltd. All Rights Reserved. is a registered trademark of, Inc. All other trademarks are owned by the respective owners. Proprietary Table of Contents
Reporting Services. White Paper. Published: August 2007 Updated: July 2008
Reporting Services White Paper Published: August 2007 Updated: July 2008 Summary: Microsoft SQL Server 2008 Reporting Services provides a complete server-based platform that is designed to support a wide
How To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI
Deploy. Friction-free self-service BI solutions for everyone Scalable analytics on a modern architecture
Friction-free self-service BI solutions for everyone Scalable analytics on a modern architecture Apps and data source extensions with APIs Future white label, embed or integrate Power BI Deploy Intelligent
Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software?
Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software? 可 以 跟 資 料 庫 結 合 嘛? Can Hadoop work with Databases? 開 發 者 們 有 聽 到
1. The orange button 2. Audio Type 3. Close apps 4. Enlarge my screen 5. Headphones 6. Questions Pane. SparkR 2
SparkR 1. The orange button 2. Audio Type 3. Close apps 4. Enlarge my screen 5. Headphones 6. Questions Pane SparkR 2 Lecture slides and/or video will be made available within one week Live Demonstration
Lecture Data Warehouse Systems
Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores
Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing
Data-Intensive Programming Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Lecturer: Timo Aaltonen University Lecturer [email protected] Assistants: Henri Terho and Antti
An Approach to Implement Map Reduce with NoSQL Databases
www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh
Parquet. Columnar storage for the people
Parquet Columnar storage for the people Julien Le Dem @J_ Processing tools lead, analytics infrastructure at Twitter Nong Li [email protected] Software engineer, Cloudera Impala Outline Context from various
Apache Flink Next-gen data analysis. Kostas Tzoumas [email protected] @kostas_tzoumas
Apache Flink Next-gen data analysis Kostas Tzoumas [email protected] @kostas_tzoumas What is Flink Project undergoing incubation in the Apache Software Foundation Originating from the Stratosphere research
Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.
Preview of Oracle Database 12c In-Memory Option 1 The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any
Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344
Where We Are Introduction to Data Management CSE 344 Lecture 25: DBMS-as-a-service and NoSQL We learned quite a bit about data management see course calendar Three topics left: DBMS-as-a-service and NoSQL
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
Big Data and Scripting Systems build on top of Hadoop
Big Data and Scripting Systems build on top of Hadoop 1, 2, Pig/Latin high-level map reduce programming platform interactive execution of map reduce jobs Pig is the name of the system Pig Latin is the
Visualizing a Neo4j Graph Database with KeyLines
Visualizing a Neo4j Graph Database with KeyLines Introduction 2! What is a graph database? 2! What is Neo4j? 2! Why visualize Neo4j? 3! Visualization Architecture 4! Benefits of the KeyLines/Neo4j architecture
Big Data on Microsoft Platform
Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4
Storing and Processing Sensor Networks Data in Public Clouds
UWB CSS 600 Storing and Processing Sensor Networks Data in Public Clouds Aysun Simitci Table of Contents Introduction... 2 Cloud Databases... 2 Advantages and Disadvantages of Cloud Databases... 3 Amazon
Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
FREE computing using Amazon EC2
FREE computing using Amazon EC2 Seong-Hwan Jun 1 1 Department of Statistics Univ of British Columbia Nov 1st, 2012 / Student seminar Outline Basics of servers Amazon EC2 Setup R on an EC2 instance Stat
W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract
W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
Microsoft SQL Server Features that can be used with the IBM i
that can be used with the IBM i Gateway/400 User Group February 9, 2012 Craig Pelkie [email protected] Copyright 2012, Craig Pelkie ALL RIGHTS RESERVED What is Microsoft SQL Server? Windows database management
What s next for the Berkeley Data Analytics Stack?
What s next for the Berkeley Data Analytics Stack? Michael Franklin June 30th 2014 Spark Summit San Francisco UC BERKELEY AMPLab: Collaborative Big Data Research 60+ Students, Postdocs, Faculty and Staff
MicroStrategy Desktop
MicroStrategy Desktop Quick Start Guide MicroStrategy Desktop is designed to enable business professionals like you to explore data, simply and without needing direct support from IT. 1 Import data from
Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015
Pulsar Realtime Analytics At Scale Tony Ng April 14, 2015 Big Data Trends Bigger data volumes More data sources DBs, logs, behavioral & business event streams, sensors Faster analysis Next day to hours
How To Use Hadoop For Gis
2013 Esri International User Conference July 8 12, 2013 San Diego, California Technical Workshop Big Data: Using ArcGIS with Apache Hadoop David Kaiser Erik Hoel Offering 1330 Esri UC2013. Technical Workshop.
Infrastructures for big data
Infrastructures for big data Rasmus Pagh 1 Today s lecture Three technologies for handling big data: MapReduce (Hadoop) BigTable (and descendants) Data stream algorithms Alternatives to (some uses of)
Spark. Fast, Interactive, Language- Integrated Cluster Computing
Spark Fast, Interactive, Language- Integrated Cluster Computing Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker, Ion Stoica UC
Advanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
MapReduce and Hadoop. Aaron Birkland Cornell Center for Advanced Computing. January 2012
MapReduce and Hadoop Aaron Birkland Cornell Center for Advanced Computing January 2012 Motivation Simple programming model for Big Data Distributed, parallel but hides this Established success at petabyte
Monitoring System Status
CHAPTER 14 This chapter describes how to monitor the health and activities of the system. It covers these topics: About Logged Information, page 14-121 Event Logging, page 14-122 Monitoring Performance,
Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,[email protected]
Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,[email protected] Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,
Open source framework for interactive data exploration in server based architecture
Open source framework for interactive data exploration in server based architecture D5.5 v1.0 WP5 Visual Analytics: D5.5 Open source framework for interactive data exploration in server based architecture
SQL Server 2005 Features Comparison
Page 1 of 10 Quick Links Home Worldwide Search Microsoft.com for: Go : Home Product Information How to Buy Editions Learning Downloads Support Partners Technologies Solutions Community Previous Versions
Big Data Analytics Using SAP HANA Dynamic Tiering Balaji Krishna SAP Labs SESSION CODE: BI474
Big Data Analytics Using SAP HANA Dynamic Tiering Balaji Krishna SAP Labs SESSION CODE: BI474 LEARNING POINTS How Dynamic Tiering reduces the TCO of HANA solution Data aging concepts using in-memory and
RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems CLOUD COMPUTING GROUP - LITAO DENG
1 RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems CLOUD COMPUTING GROUP - LITAO DENG Background 2 Hive is a data warehouse system for Hadoop that facilitates
Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications
Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications White Paper Table of Contents Overview...3 Replication Types Supported...3 Set-up &
Migration Scenario: Migrating Backend Processing Pipeline to the AWS Cloud
Migration Scenario: Migrating Backend Processing Pipeline to the AWS Cloud Use case Figure 1: Company C Architecture (Before Migration) Company C is an automobile insurance claim processing company with
The Flink Big Data Analytics Platform. Marton Balassi, Gyula Fora" {mbalassi, gyfora}@apache.org
The Flink Big Data Analytics Platform Marton Balassi, Gyula Fora" {mbalassi, gyfora}@apache.org What is Apache Flink? Open Source Started in 2009 by the Berlin-based database research groups In the Apache
Bringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks [email protected] 2015 The MathWorks, Inc. 1 Data is the sword of the
