Elastic Enterprise Data Warehouse Query Log Analysis on a Secure Private Cloud
- Leona Fisher
Transcription
1 Elastic Enterprise Data Warehouse Query Log Analysis on a Secure Private Cloud Data Warehouse and Business Intelligence Architect Credit Suisse, Zurich Joint research between Credit Suisse and ETH Zurich: Willy Lai, Maria Grineva, Maxim Grinev, Donald Kossmann, Georg Polzer, Kurt Stockinger Date: Nov. 28, 2011, Slide 1
2 Agenda: Auditing in Enterprises; Traditional Data Warehouse Approach vs. Cloud Approach; Query Log Analysis with Xadoop; Results of Security Analysis on Real Data.
3 The Challenge that Many Companies Face. Requirement: every operation against the core database (data warehouse) must be traceable and explainable. Audits are performed at random points in time: Which user accessed attributes A1, A3 and A6 of tables T1 and T2? Which user deleted attribute A4 of view V2? Which user updated the value of attribute A5 of table T3? Capacity and performance management: Which table partitions were never accessed over the last year? These are candidates for archiving. What are the top 10 longest-running queries? These are candidates for query optimization.
4 Data Warehouse Application Platform
Shared platform for integrating data from multiple internal and external sources, and for developing, deploying and operating applications that implement reporting, analysis and data mining functions.
Scope:
- Reporting and analysis: standard and ad-hoc reporting, Online Analytical Processing (OLAP) and data mining (in special areas, e.g. CRM)
- Data from last end-of-day processing (4500 jobs per day)
- No operational/transactional reporting or direct initiation of business transactions
Key figures:
- Servers: 2 M9000, 25 M5000, several V490 / T2000, > 1000 CPUs, 4 TB main memory
- ~700 TB storage with growth rate of TB/month (overall ET, IT, UAT and P)
- Throughput between DWH production server and HDS: TB/day (>1 GB/s)
Users and applications:
- ~100 applications on the platform with some users
- Accounting (Management Accounting, Financial Accounting)
- Legal & Compliance (Anti Money Laundering, Basel II, Swiss National Reporting)
- Customer Relationship Management (Front Support, Marketing)
- Operations (Credit-MIS, Credit Risk) etc.
- Systems Management (IT DWH)
5 Full Logging of Database Queries in Production. All queries against the databases of the data warehouse are logged via the Oracle Audit option: one XML file per user session, compressed at regular intervals. This produces many terabytes of uncompressed XML files per month. Data is kept for n days, then archived.
6 Possible Solutions: Data Warehouse vs. Cloud Approach. Build a data warehouse: not cost-effective for a few queries per year. Use a private cloud-computing approach based on Hadoop/Pig: analysis is ad hoc and happens at unknown times, so compute resources are only needed for short periods. Hadoop is open-source software with a proven track record: a parallel file system where data is split into several chunks, which allows parallel analysis/querying of data. Pig is a query language that runs on top of Hadoop; with the Xadoop XML extensions it can query XML directly, and the query logs are already stored in XML files. The Hadoop/Pig approach can be leveraged to analyze the queries in parallel. The framework can easily be deployed.
7 Data Warehouse and Private Cloud Approach. Audit required: shipping of XML-based query log files. Data warehouse processing (typical business analytics). Hadoop/Pig-based analysis of XML query logs (large XML file distributed over all cloud nodes).
8 Data: Audit Logs and Database Schema Snapshots. Audit logs contain all database accesses (queries) of the data warehouse. The information logged for a single access contains: audit type; session, statement and entry ID; extended timestamp; DB user, OS user, user host; OS process; SQL text. Database schema snapshots are generated once a day and capture the entire schema of the data warehouse: table owners, names and configuration; view owners, names, SQL statement and configuration; view dependencies; synonyms.
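The audit record fields listed above can be read with a few lines of Python. This is a minimal sketch, not the Xadoop loader: the element names `AuditRecord`, `Extended_Timestamp`, `DB_User` and `Sql_Text` are taken from the sample record on slide 9, while the `<Audit>` root element, the sample timestamp and the `AuditEntry` class are assumptions made for illustration. Real Oracle audit files may also carry XML namespaces, which this sketch ignores.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class AuditEntry:
    """Hypothetical container for the fields used in later analysis."""
    timestamp: str
    db_user: str
    sql_text: str


def parse_audit_log(xml_text):
    """Return one AuditEntry per <AuditRecord> element in the document."""
    root = ET.fromstring(xml_text)
    entries = []
    for rec in root.iter("AuditRecord"):
        entries.append(AuditEntry(
            timestamp=(rec.findtext("Extended_Timestamp") or "").strip(),
            db_user=(rec.findtext("DB_User") or "").strip(),
            # Collapse line breaks and repeated spaces inside the SQL text.
            sql_text=" ".join((rec.findtext("Sql_Text") or "").split()),
        ))
    return entries


# Assumed sample input: root element and timestamp value are made up.
SAMPLE = """<Audit>
  <AuditRecord>
    <Extended_Timestamp>2011-05-17T12:30:00</Extended_Timestamp>
    <DB_User>DEMOUSER</DB_User>
    <Sql_Text>select e.emp_no
      from employees e</Sql_Text>
  </AuditRecord>
</Audit>"""

if __name__ == "__main__":
    for entry in parse_audit_log(SAMPLE):
        print(entry)
```

Normalizing whitespace in the SQL text up front keeps later substring and pattern matching independent of how the statement was wrapped in the log.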
9 Illustration: Query Processing Steps (Resolving, Matching)
Audit log entry:
  <AuditRecord>
    <Extended_Timestamp> </Extended_Timestamp>
    <DB_User>DEMOUSER</DB_User>
    <Sql_Text> select e.*, het.salary, het.title from employees e, highsal_engineer_titles het where e.emp_no = het.emp_no </Sql_Text>
  </AuditRecord>
Schema snapshot of views:
  <VIEW_NAME>HIGHSAL_ENGINEER_TITLES</VIEW_NAME>
  <TEXT_LENGTH>100</TEXT_LENGTH>
  <TEXT> select e.emp_no, t.title, s.salary from employees e, salaries s, title t where e.emp_no = s.emp_no and e.emp_no = t.emp_no and s.salary >= and UPPER(t.title) LIKE '%ENGINEER%' </TEXT>
Grouped & aggregated results:
tables:
  employees {(DEMOUSER, )}
  salaries {(DEMOUSER, )}
  title {(DEMOUSER, )}
  highsal_engineer_titles {(DEMOUSER, )}
attributes:
  employees.* {(DEMOUSER, )}
  employees.emp_no {(DEMOUSER, )}
  title.emp_no {(DEMOUSER, )}
  title.title {(DEMOUSER, )}
  salaries.salary {(DEMOUSER, )}
  salaries.emp_no {(DEMOUSER, )}
  highsal_engineer_titles.salary {(DEMOUSER, )}
  highsal_engineer_titles.title {(DEMOUSER, )}
  {(DEMOUSER, )}
user:
  DEMOUSER {( , {(employees),(salaries),(title)},{(employees.*),(employees.emp_no),(title.emp_no),(title.title),(salaries.salary),(salaries.emp_no),(highsal_engineer_titles.salary),(highsal_engineer_titles.title),(highsal_engineer_titles.emp_no)})}
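The resolving step in this illustration, where a view named in a query is expanded to the base tables it reads, can be sketched in Python as follows. This is not the SQL parser used in the project: `referenced_objects` is a deliberately naive, hypothetical helper that only understands comma-separated FROM clauses like the ones on the slide, and the view text below omits the salary threshold because its value is not given in the source.

```python
import re


def referenced_objects(sql):
    """Naively extract object names from a comma-separated FROM clause."""
    match = re.search(r"\bfrom\b(.*?)(?:\bwhere\b|$)", sql,
                      re.IGNORECASE | re.DOTALL)
    if not match:
        return set()
    names = set()
    for item in match.group(1).split(","):
        parts = item.split()
        if parts:
            # First token is the object name, an optional second is the alias.
            names.add(parts[0].lower())
    return names


def resolve_objects(sql, views):
    """Return every table or view touched by sql, expanding views recursively.

    views maps a lower-case view name to its defining SQL text, as it
    would be read from the daily schema snapshot.
    """
    seen = set()
    pending = list(referenced_objects(sql))
    while pending:
        name = pending.pop()
        if name in seen:
            continue
        seen.add(name)
        if name in views:
            pending.extend(referenced_objects(views[name]))
    return seen


VIEWS = {
    "highsal_engineer_titles": (
        "select e.emp_no, t.title, s.salary "
        "from employees e, salaries s, title t "
        "where e.emp_no = s.emp_no and e.emp_no = t.emp_no "
        "and UPPER(t.title) LIKE '%ENGINEER%'"
    ),
}
QUERY = ("select e.*, het.salary, het.title "
         "from employees e, highsal_engineer_titles het "
         "where e.emp_no = het.emp_no")

if __name__ == "__main__":
    print(sorted(resolve_objects(QUERY, VIEWS)))
```

The result contains the view itself as well as `employees`, `salaries` and `title`, which matches the "tables" group shown on the slide. A production version would need a real SQL parser to handle joins, subqueries and schema-qualified names.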
10 Architecture of Xadoop-Based Query Processing
11 Typical Audit Query: "Was TableX Accessed During Lunch Time?"
  register ./pigxml.jar;
  define DATECOMP ch.ethz.xadoop.udf.datecomp();
  A = load '/user/drwho/querylogs1.xml' using ch.ethz.xadoop.loader.xmlloader() as (..);
12 Typical Audit Query: "Was TableX Accessed During Lunch Time?"
  register ./pigxml.jar;
  define DATECOMP ch.ethz.xadoop.udf.datecomp();
  A = load '/user/drwho/querylogs1.xml' using ch.ethz.xadoop.loader.xmlloader() as (..);
  B = filter A by sql_text matches '.*TableX.*'
      and DATECOMP((chararray)extended_timestamp, ' T12:00: ') > 0
      and DATECOMP((chararray)extended_timestamp, ' T14:00: ') < 0;
13 Typical Audit Query: "Was TableX Accessed During Lunch Time?"
  register ./pigxml.jar;
  define DATECOMP ch.ethz.xadoop.udf.datecomp();
  A = load '/user/drwho/querylogs1.xml' using ch.ethz.xadoop.loader.xmlloader() as (..);
  B = filter A by sql_text matches '.*TableX.*'
      and DATECOMP((chararray)extended_timestamp, ' T12:00: ') > 0
      and DATECOMP((chararray)extended_timestamp, ' T14:00: ') < 0;
  C = foreach B generate db_user, sql_text, extended_timestamp;
  dump C;
  store C into '/user/drwho/analysis_querylogs1_2011_05_17.res';
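For readers unfamiliar with Pig, the same filter-and-project logic can be sketched on a single machine in Python. This is only an illustration of what the script computes, not a replacement for the parallel Hadoop job. The record layout (dictionaries keyed by `db_user`, `sql_text`, `extended_timestamp`), the ISO timestamp format and the date 2011-05-17 are assumptions; the function `accessed_between` is hypothetical. Unlike the Pig `matches` test, the table name comparison here is case-insensitive.

```python
from datetime import datetime


def accessed_between(records, table, start, end):
    """Return (db_user, sql_text, extended_timestamp) for each record whose
    SQL text mentions `table` strictly between `start` and `end`."""
    lower = datetime.fromisoformat(start)
    upper = datetime.fromisoformat(end)
    hits = []
    for rec in records:
        ts = datetime.fromisoformat(rec["extended_timestamp"])
        # Strict comparisons mirror DATECOMP(...) > 0 and DATECOMP(...) < 0.
        if table.lower() in rec["sql_text"].lower() and lower < ts < upper:
            hits.append((rec["db_user"], rec["sql_text"],
                         rec["extended_timestamp"]))
    return hits


# Assumed sample records; users, statements and times are made up.
RECORDS = [
    {"db_user": "DEMOUSER", "sql_text": "select * from TableX",
     "extended_timestamp": "2011-05-17T12:30:00"},
    {"db_user": "DEMOUSER", "sql_text": "select * from TableX",
     "extended_timestamp": "2011-05-17T15:00:00"},
    {"db_user": "OTHERUSER", "sql_text": "select * from TableY",
     "extended_timestamp": "2011-05-17T13:00:00"},
]

if __name__ == "__main__":
    for hit in accessed_between(RECORDS, "TableX",
                                "2011-05-17T12:00:00",
                                "2011-05-17T14:00:00"):
        print(hit)
```

A plain substring test will also match longer names such as `TableX_archive`; matching on resolved object names, as in the processing steps shown earlier, avoids that.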
14 Synthetic Data and Queries at ETH Zurich - Measurement Results on 1 Node. Test set input Input data size Output data size Execution time MB Tables: Attributes: 423 MB MB 1.4 h Users: 637 MB MB Tables: Attributes: 1057 MB MB 3.3 h Users: MB MB Tables: Attributes: MB MB 6.7 h Users: MB MB Tables: Attributes: MB MB 13.6 h Users: MB
15 Execution Times Measured at ETH Zurich (chart; labeled values: 13.6 h, 6.7 h, 3.4 h)
16 Experiments inside Credit Suisse on Real Data #1. We ran our SQL parser on real audit log and schema view snapshot files and measured the number of audit records and view SQL statements that could be parsed. 8.63% of view SQL statements could not be parsed due to: syntactically incorrect SQL statements; issues in the generation of the schema snapshots (Oracle bug).
17 Experiments inside Credit Suisse on Real Data #2. We extracted object names (tables and views) from successfully parsed SQL statements and measured how many objects could be matched between audit log entries and schema snapshots: % and 96.5% success rates, respectively. Limitations: newly created objects not present in the schema snapshot cannot be matched; neither can table functions or DB links.
18 Dealing with Intraday Schema Modification: The Problem. Assume a malicious person does not want anybody to know that he/she has accessed some critical data. Timeline between the daily schema snapshots: (1) create a view on critical data: CREATE VIEW my_secret as SELECT * FROM my_secret; (2) access the view on critical data; (3) delete the view on critical data: DROP VIEW my_secret. Table accessed: :00 my_secret?
19 Dealing with Intraday Schema Modification: The Counter Measure. Log all create and drop statements in a separate file. Output all tables and views that could not be matched with the schema. For all these objects, check whether they can be matched to some create or drop statements and resolve accordingly. Timeline between the daily schema snapshots: (1) create a view on critical data: CREATE VIEW my_secret as SELECT * FROM my_secret; (2) access the view on critical data; (3) delete the view on critical data: DROP VIEW my_secret. CREATE and DROP log: :30 CREATE VIEW my_secret as; :45 DROP VIEW my_secret. Unmatched objects log: :00 my_secret. Tables accessed: :00 my_secret,
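The counter measure amounts to a join between the unmatched-objects log and the CREATE/DROP log. The sketch below shows one way to do that in Python; it is an illustration under assumptions, not the project's implementation. Log entries are assumed to be `(time, text)` tuples with zero-padded `HH:MM` times so that string comparison orders them correctly, the hours (10:30, 11:00, 11:45) and the table name `critical_table` are made up, and the hypothetical `DDL` pattern recognizes only plain `CREATE`/`DROP` of views and tables (not `CREATE OR REPLACE`).

```python
import re

# Matches e.g. "CREATE VIEW my_secret ..." or "DROP TABLE t1".
DDL = re.compile(r"^\s*(create|drop)\s+(?:view|table)\s+(\w+)", re.IGNORECASE)


def resolve_unmatched(unmatched, ddl_log):
    """Map each unmatched (time, name) access to the (created, dropped)
    lifetime of a same-named object found in the CREATE/DROP log."""
    lifetimes = {}  # object name -> list of [created, dropped-or-None]
    for ts, sql in sorted(ddl_log):
        match = DDL.match(sql)
        if not match:
            continue
        action, name = match.group(1).lower(), match.group(2).lower()
        if action == "create":
            lifetimes.setdefault(name, []).append([ts, None])
        elif name in lifetimes and lifetimes[name][-1][1] is None:
            # Close the most recent open lifetime of this object.
            lifetimes[name][-1][1] = ts

    resolved = {}
    for ts, name in unmatched:
        for created, dropped in lifetimes.get(name.lower(), []):
            if created <= ts and (dropped is None or ts <= dropped):
                resolved[(ts, name)] = (created, dropped)
    return resolved


# Assumed sample logs; times and the underlying table name are made up.
DDL_LOG = [
    ("10:30", "CREATE VIEW my_secret AS SELECT * FROM critical_table"),
    ("11:45", "DROP VIEW my_secret"),
]
UNMATCHED = [("11:00", "my_secret"), ("11:00", "unknown_object")]

if __name__ == "__main__":
    print(resolve_unmatched(UNMATCHED, DDL_LOG))
```

Accesses that remain unresolved after this step (here `unknown_object`) are the ones that still need manual review, for example table functions or DB links.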
20 Conclusions. A prototype implementation was run on parts of the production query logs of the Credit Suisse data warehouse. Xadoop-based analysis achieved above 95% accuracy and linear scalability. Several different groups within Credit Suisse in Zurich and New York provided excellent feedback and are ready for further collaboration. Next steps: perform large-scale query log analysis against the warehouses, covering data volumes of several months; implement more advanced use cases, based on machine learning techniques, with several new stakeholders in Credit Suisse.
SAP HANA In-Memory Database Sizing Guideline
SAP HANA In-Memory Database Sizing Guideline Version 1.4 August 2013 2 DISCLAIMER Sizing recommendations apply for certified hardware only. Please contact hardware vendor for suitable hardware configuration.
Technical Challenges for Big Health Care Data. Donald Kossmann Systems Group Department of Computer Science ETH Zurich
Technical Challenges for Big Health Care Data Donald Kossmann Systems Group Department of Computer Science ETH Zurich What is Big Data? technologies to automate experience Purpose answer difficult questions
Inge Os Sales Consulting Manager Oracle Norway
Inge Os Sales Consulting Manager Oracle Norway Agenda Oracle Fusion Middelware Oracle Database 11GR2 Oracle Database Machine Oracle & Sun Agenda Oracle Fusion Middelware Oracle Database 11GR2 Oracle Database
Driving Peak Performance. 2013 IBM Corporation
Driving Peak Performance 1 Session 2: Driving Peak Performance Abstract We know you want the fastest performance possible for your deployments, and yet that relies on many choices across data storage,
Testing 3Vs (Volume, Variety and Velocity) of Big Data
Testing 3Vs (Volume, Variety and Velocity) of Big Data 1 A lot happens in the Digital World in 60 seconds 2 What is Big Data Big Data refers to data sets whose size is beyond the ability of commonly used
QAD Business Intelligence Release Notes
QAD Business Intelligence Release Notes September 2008 These release notes include information about the latest QAD Business Intelligence (QAD BI) fixes and changes. These changes may affect the way you
Parallel Data Warehouse
MICROSOFT S ANALYTICS SOLUTIONS WITH PARALLEL DATA WAREHOUSE Parallel Data Warehouse Stefan Cronjaeger Microsoft May 2013 AGENDA PDW overview Columnstore and Big Data Business Intellignece Project Ability
Big Data Use Case. How Rackspace is using Private Cloud for Big Data. Bryan Thompson. May 8th, 2013
Big Data Use Case How Rackspace is using Private Cloud for Big Data Bryan Thompson May 8th, 2013 Our Big Data Problem Consolidate all monitoring data for reporting and analytical purposes. Every device
Data Management in the Cloud
Data Management in the Cloud Ryan Stern [email protected] : Advanced Topics in Distributed Systems Department of Computer Science Colorado State University Outline Today Microsoft Cloud SQL Server
Open source Google-style large scale data analysis with Hadoop
Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: [email protected] Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical
SQL Server Administrator Introduction - 3 Days Objectives
SQL Server Administrator Introduction - 3 Days INTRODUCTION TO MICROSOFT SQL SERVER Exploring the components of SQL Server Identifying SQL Server administration tasks INSTALLING SQL SERVER Identifying
Creating Connection with Hive
Creating Connection with Hive Intellicus Enterprise Reporting and BI Platform Intellicus Technologies [email protected] www.intellicus.com Creating Connection with Hive Copyright 2010 Intellicus Technologies
News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren
News and trends in Data Warehouse Automation, Big Data and BI Johan Hendrickx & Dirk Vermeiren Extreme Agility from Source to Analysis DWH Appliances & DWH Automation Typical Architecture 3 What Business
W I S E. SQL Server 2012 Database Engine Technical Update WISE LTD.
Technical Update COURSE CODE: COURSE TITLE: LEVEL: AUDIENCE: SQSDBE SQL Server 2012 Database Engine Technical Update Beginner-to-intermediate SQL Server DBAs and/or system administrators PREREQUISITES:
Enterprise and Standard Feature Compare
www.blytheco.com Enterprise and Standard Feature Compare SQL Server 2008 Enterprise SQL Server 2008 Enterprise is a comprehensive data platform for running mission critical online transaction processing
POLAR IT SERVICES. Business Intelligence Project Methodology
POLAR IT SERVICES Business Intelligence Project Methodology Table of Contents 1. Overview... 2 2. Visualize... 3 3. Planning and Architecture... 4 3.1 Define Requirements... 4 3.1.1 Define Attributes...
CitusDB Architecture for Real-Time Big Data
CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing
Optimize Oracle Business Intelligence Analytics with Oracle 12c In-Memory Database Option
Optimize Oracle Business Intelligence Analytics with Oracle 12c In-Memory Database Option Kai Yu, Senior Principal Architect Dell Oracle Solutions Engineering Dell, Inc. Abstract: By adding the In-Memory
Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>
s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline
Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
IBM Software Information Management Creating an Integrated, Optimized, and Secure Enterprise Data Platform:
Creating an Integrated, Optimized, and Secure Enterprise Data Platform: IBM PureData System for Transactions with SafeNet s ProtectDB and DataSecure Table of contents 1. Data, Data, Everywhere... 3 2.
Overview: X5 Generation Database Machines
Overview: X5 Generation Database Machines Spend Less by Doing More Spend Less by Paying Less Rob Kolb Exadata X5-2 Exadata X4-8 SuperCluster T5-8 SuperCluster M6-32 Big Memory Machine Oracle Exadata Database
How to Enhance Traditional BI Architecture to Leverage Big Data
B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...
Oracle Database 11g Comparison Chart
Key Feature Summary Express 10g Standard One Standard Enterprise Maximum 1 CPU 2 Sockets 4 Sockets No Limit RAM 1GB OS Max OS Max OS Max Database Size 4GB No Limit No Limit No Limit Windows Linux Unix
The basic data mining algorithms introduced may be enhanced in a number of ways.
DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,
Main Memory Data Warehouses
Main Memory Data Warehouses Robert Wrembel Poznan University of Technology Institute of Computing Science [email protected] www.cs.put.poznan.pl/rwrembel Lecture outline Teradata Data Warehouse
Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle
Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Agenda Introduction Database Architecture Direct NFS Client NFS Server
Real Time Big Data Processing
Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure
2009 Oracle Corporation 1
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material,
Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010
Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010 Better Together Writer: Bill Baer, Technical Product Manager, SharePoint Product Group Technical Reviewers: Steve Peschka,
Cloudera Certified Developer for Apache Hadoop
Cloudera CCD-333 Cloudera Certified Developer for Apache Hadoop Version: 5.6 QUESTION NO: 1 Cloudera CCD-333 Exam What is a SequenceFile? A. A SequenceFile contains a binary encoding of an arbitrary number
Constructing a Data Lake: Hadoop and Oracle Database United!
Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.
Performance and Scalability Overview
Performance and Scalability Overview This guide provides an overview of some of the performance and scalability capabilities of the Pentaho Business Analytics Platform. Contents Pentaho Scalability and
Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya
Oracle Database - Engineered for Innovation Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya Oracle Database 11g Release 2 Shipping since September 2009 11.2.0.3 Patch Set now
Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence
Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence Appliances and DW Architectures John O Brien President and Executive Architect Zukeran Technologies 1 TDWI 1 Agenda What
ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION
ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION EXECUTIVE SUMMARY Oracle business intelligence solutions are complete, open, and integrated. Key components of Oracle business intelligence
LEARNING SOLUTIONS website milner.com/learning email [email protected] phone 800 875 5042
Course 20467A: Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Length: 5 Days Published: December 21, 2012 Language(s): English Audience(s): IT Professionals Overview Level: 300
Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
Business Intelligence and Healthcare
Business Intelligence and Healthcare SUTHAN SIVAPATHAM SENIOR SHAREPOINT ARCHITECT Agenda Who we are What is BI? Microsoft s BI Stack Case Study (Healthcare) Who we are Point Alliance is an award-winning
Cloud Service Model. Selecting a cloud service model. Different cloud service models within the enterprise
Cloud Service Model Selecting a cloud service model Different cloud service models within the enterprise Single cloud provider AWS for IaaS Azure for PaaS Force fit all solutions into the cloud service
Jeffrey D. Ullman slides. MapReduce for data intensive computing
Jeffrey D. Ullman slides MapReduce for data intensive computing Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very
An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov
An Industrial Perspective on the Hadoop Ecosystem Eldar Khalilov Pavel Valov agenda 03.12.2015 2 agenda Introduction 03.12.2015 2 agenda Introduction Research goals 03.12.2015 2 agenda Introduction Research
Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks
Hadoop Introduction Olivier Renault Solution Engineer - Hortonworks Hortonworks A Brief History of Apache Hadoop Apache Project Established Yahoo! begins to Operate at scale Hortonworks Data Platform 2013
EMC/Greenplum Driving the Future of Data Warehousing and Analytics
EMC/Greenplum Driving the Future of Data Warehousing and Analytics EMC 2010 Forum Series 1 Greenplum Becomes the Foundation of EMC s Data Computing Division E M C A CQ U I R E S G R E E N P L U M Greenplum,
AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW
AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this
How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time
SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first
SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011
SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications Jürgen Primsch, SAP AG July 2011 Why In-Memory? Information at the Speed of Thought Imagine access to business data,
Bringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks [email protected] 2015 The MathWorks, Inc. 1 Data is the sword of the
Innovative technology for big data analytics
Technical white paper Innovative technology for big data analytics The HP Vertica Analytics Platform database provides price/performance, scalability, availability, and ease of administration Table of
Performance and Scalability Overview
Performance and Scalability Overview This guide provides an overview of some of the performance and scalability capabilities of the Pentaho Business Analytics platform. PENTAHO PERFORMANCE ENGINEERING
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data
ANALYTICS IN BIG DATA ERA
ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut
Cloud Computing at Google. Architecture
Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale
Big Data Analytics Platform @ Nokia
Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform
Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce
Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of
SQL Server 2012 Performance White Paper
Published: April 2012 Applies to: SQL Server 2012 Copyright The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication.
Oracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
Zynga Analytics Leveraging Big Data to Make Games More Fun and Social
Connecting the World Through Games Zynga Analytics Leveraging Big Data to Make Games More Fun and Social Daniel McCaffrey General Manager, Platform and Analytics Engineering World s leading social game
Moving Large Data at a Blinding Speed for Critical Business Intelligence. A competitive advantage
Moving Large Data at a Blinding Speed for Critical Business Intelligence A competitive advantage Intelligent Data In Real Time How do you detect and stop a Money Laundering transaction just about to take
Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344
Where We Are Introduction to Data Management CSE 344 Lecture 25: DBMS-as-a-service and NoSQL We learned quite a bit about data management see course calendar Three topics left: DBMS-as-a-service and NoSQL
SharePlex for SQL Server
SharePlex for SQL Server Improving analytics and reporting with near real-time data replication Written by Susan Wong, principal solutions architect, Dell Software Abstract Many organizations today rely
Would-be system and database administrators. PREREQUISITES: At least 6 months experience with a Windows operating system.
DBA Fundamentals COURSE CODE: COURSE TITLE: AUDIENCE: SQSDBA SQL Server 2008/2008 R2 DBA Fundamentals Would-be system and database administrators. PREREQUISITES: At least 6 months experience with a Windows
Hadoop and its Usage at Facebook. Dhruba Borthakur [email protected], June 22 rd, 2009
Hadoop and its Usage at Facebook Dhruba Borthakur [email protected], June 22 rd, 2009 Who Am I? Hadoop Developer Core contributor since Hadoop s infancy Focussed on Hadoop Distributed File System Facebook
SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse
SQL Server 2012 PDW Ryan Simpson Technical Solution Professional PDW Microsoft Microsoft SQL Server 2012 Parallel Data Warehouse Massively Parallel Processing Platform Delivers Big Data HDFS Delivers Scale
Big Data Explained. An introduction to Big Data Science.
Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of
<Insert Picture Here> Best Practices for Extreme Performance with Data Warehousing on Oracle Database
1 Best Practices for Extreme Performance with Data Warehousing on Oracle Database Rekha Balwada Principal Product Manager Agenda Parallel Execution Workload Management on Data Warehouse
Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper
Offload Enterprise Data Warehouse (EDW) to Big Data Lake Oracle Exadata, Teradata, Netezza and SQL Server Ample White Paper EDW (Enterprise Data Warehouse) Offloads The EDW (Enterprise Data Warehouse)
Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.
Oracle BI EE Implementation on Netezza Prepared by SureShot Strategies, Inc. The goal of this paper is to give an insight to Netezza architecture and implementation experience to strategize Oracle BI EE
Server Consolidation with SQL Server 2008
Server Consolidation with SQL Server 2008 White Paper Published: August 2007 Updated: July 2008 Summary: Microsoft SQL Server 2008 supports multiple options for server consolidation, providing organizations
Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect
on AWS Services Overview Bernie Nallamotu Principle Solutions Architect \ So what is it? When your data sets become so large that you have to start innovating around how to collect, store, organize, analyze
QlikView Business Discovery Platform. Algol Consulting Srl
QlikView Business Discovery Platform Algol Consulting Srl Business Discovery Applications Application vs. Platform Application Designed to help people perform an activity Platform Provides infrastructure
Big Data Analytics - Accelerated. stream-horizon.com
Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based
Testing Big data is one of the biggest
Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing
MOC 20467B: Designing Business Intelligence Solutions with Microsoft SQL Server 2012
MOC 20467B: Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Course Overview This course provides students with the knowledge and skills to design business intelligence solutions
Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum
Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All
THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS
THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS WHITE PAPER Successfully writing Fast Data applications to manage data generated from mobile, smart devices and social interactions, and the
Oracle Architecture, Concepts & Facilities
COURSE CODE: COURSE TITLE: CURRENCY: AUDIENCE: ORAACF Oracle Architecture, Concepts & Facilities 10g & 11g Database administrators, system administrators and developers PREREQUISITES: At least 1 year of
So What s the Big Deal?
So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data
Cost-Effective Business Intelligence with Red Hat and Open Source
Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,
2015 Analyst and Advisor Summit. Advanced Data Analytics Dr. Rod Fontecilla Vice President, Application Services, Chief Data Scientist
2015 Analyst and Advisor Summit Advanced Data Analytics Dr. Rod Fontecilla Vice President, Application Services, Chief Data Scientist Agenda Key Facts Offerings and Capabilities Case Studies When to Engage
Big + Fast + Safe + Simple = Lowest Technical Risk
Big + Fast + Safe + Simple = Lowest Technical Risk The Synergy of Greenplum and Isilon Architecture in HP Environments Steffen Thuemmel (Isilon) Andreas Scherbaum (Greenplum) 1 Our problem 2 What is Big
Open source large scale distributed data management with Google s MapReduce and Bigtable
Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: [email protected] Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory
Solution for Staging Area in Near Real-Time DWH Efficient in Refresh and Easy to Operate. Technical White Paper
Solution for Staging Area in Near Real-Time DWH Efficient in Refresh and Easy to Operate Technical White Paper Mathias Zarick, Karol Hajdu Senior Consultants March-2011 While looking for a solution for
SSIS Training: Introduction to SQL Server Integration Services Duration: 3 days
SSIS Training: Introduction to SQL Server Integration Services Duration: 3 days SSIS Training Prerequisites All SSIS training attendees should have prior experience working with SQL Server. Hands-on/Lecture
Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO. Big Data Everywhere Conference, NYC November 2015
Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO Big Data Everywhere Conference, NYC November 2015 Agenda 1. Challenges with Risk Data Aggregation and Risk Reporting (RDARR) 2. How a
NoSQL for SQL Professionals William McKnight
NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to
Hadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
IBM Netezza High Capacity Appliance
IBM Netezza High Capacity Appliance Petascale Data Archival, Analysis and Disaster Recovery Solutions IBM Netezza High Capacity Appliance Highlights: Allows querying and analysis of deep archival data
Oracle Essentials, Fifth Edition. Rick Greenwald, Robert Stackowiak, and Jonathan Stern. O'Reilly.
FIFTH EDITION Oracle Essentials Rick Greenwald, Robert Stackowiak, and Jonathan Stern O'REILLY Beijing Cambridge Farnham Koln Sebastopol Tokyo Table of Contents Preface xiii 1. Introducing Oracle 1
SQL Server 2014. What's New? Christopher Speer. Technology Solution Specialist (SQL Server, BizTalk Server, Power BI, Azure) v-cspeer@microsoft.
SQL Server 2014 What's New? Christopher Speer Technology Solution Specialist (SQL Server, BizTalk Server, Power BI, Azure) [email protected] The evolution of the Microsoft data platform What's New
Introducing Oracle Exalytics In-Memory Machine
Introducing Oracle Exalytics In-Memory Machine Jon Ainsworth Director of Business Development Oracle EMEA Business Analytics 1 Copyright 2011, Oracle and/or its affiliates. All rights Agenda Topics Oracle
Application of Predictive Analytics for Better Alignment of Business and IT
Application of Predictive Analytics for Better Alignment of Business and IT Boris Zibitsker, PhD [email protected] July 25, 2014 Big Data Summit - Riga, Latvia About the Presenter Boris Zibitsker
Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee [email protected] [email protected]
Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee [email protected] [email protected] Hadoop, Why? Need to process huge datasets on large clusters of computers
Building Scalable Big Data Pipelines
Building Scalable Big Data Pipelines NOSQL SEARCH ROADSHOW ZURICH Christian Gügi, Solution Architect 19.09.2013 AGENDA Opportunities & Challenges Integrating Hadoop Lambda Architecture Lambda in Practice
Security Information/Event Management Security Development Life Cycle Version 5
Security Information/Event Management Security Development Life Cycle Version 5 If your enterprise is like most, you are collecting logs from most every device with security relevance. The flood of events
LearnFromGuru Polish your knowledge
SQL SERVER 2008 R2 /2012 (TSQL/SSIS/ SSRS/ SSAS BI Developer TRAINING) Module: I T-SQL Programming and Database Design An Overview of SQL Server 2008 R2 / 2012 Available Features and Tools New Capabilities
