Big Data Analytics in LinkedIn. Danielle Aring & William Merritt
|
|
- Dylan Melvin Banks
- 8 years ago
- Views:
Transcription
1 Big Data Analytics in LinkedIn by Danielle Aring & William Merritt
2 2
3 Brief History of LinkedIn - Launched in 2003 by Reid Hoffman ( : Introduced first business lines : Jobs and Subscriptions : Launched public profiles (achieved portability/new features) : LinkedIn goes GLOBAL! ( : Site transformation/rapid growth : ~225 million members (27 % of LinkedIn subscribers are recruiters) : Next decade focused on map of digital economy 3
4 4
5 5
6 Three Major Data 6
7 LinkedIn Challenges for Web-scale OLAP Horizontally scalable currently over 200+ million users adding 2 new members per second Quick response time to user s queries High availability High read & write throughput (billions of monthly page views) Heavy dependency on slowest node s response as data is spread across various nodes 7
8 Current OLAP Solutions - not suited for high-traffic website What is OLAP - Online Analytical Processing Long transactions Complex queries Mining and analyzing large amounts of data Infrequent updates of data Traditional for Business Intelligence (i.e. SAP, Oracle and etc) retrieve & consolidate partial results across nodes (causing slow responses) Distributed (problems: w/latency, availability and cost) Materialized Cubes (loading billions of page views - load too high) 8
9 Avatara: solution for Web-scale Analytics Products Provides fast scalable OLAP system handles small cubes scenarios simple grammar for cube construction and query at scale sharding of cube dimension into key-value model leverage distributed key-value store for low-latency high availability access to cubes leverages hadoop for joins Two examples of analytics features: WVMP - cube sharded by member ID Who s viewed my profile? (WVMP) WVTJ - cube sharded across jobs Who s viewed this job? (WVTJ) 9
10 Avatara: solution con t Sharding (i.e horizontal scaling) divides the data set and distributes the data over multiple servers. Each shard is an independent database and together the shards make up a single logical database sharding on a primary key (turning a big cube into smaller ones) Store cube data s in one location requires a single disk fetch Offline Batch Engine High throughput Batch processing (Hadoop Jobs) Online Query Engine low latency, high availability key-value paradigm for storing data (Voldemort) 10
11 Avatara: Architecture -- 11
12 Avatara: Offline Batch Engine - Three Phases driven by a simple configuration file Preprocessing preparing the data using built-in functions to roll up data customized scripting for further processing Projections and Joins builds the dimension & fact tables a join key ties dimension & fact tables Cubification partitions the data by cube shard key & produces small cubes data can be retrieved in a single disk fetch for faster responses cubes are bulk loaded into a distributed key-value store (i.e. Voldemort) 12
13 Avatara: Online Query Engine Serves queries in real time Retrieves & processes data from key-value store (i.e. Voldemort) Fast retrieval because of compact cubes per sharded key (i.e. member_id) SQL-like syntax for clients Supports select, where, group-by, having, order and etc. operations Simplifies development for developers 13
14 Cube Thinning Avatara s mechanism for thinning cubes too large to process on page load (such as: President Obama or Lebron James) Allows developers to do the following: set priorities and constraints on dimensions aggregated to a specific value (such as other category) drop data across pre-defined dimensions ex: WVMP can opt to drop data across time dimension resulting in a shorter history! 14
15 15
16 16
17 In Summary Avatara has been working several years at LinkedIn (i.e. in-house OLAP system) Allows developers to build OLAP cubes with a single configuration file Hybrid offline/online strategy combined with sharding into key-value store Powers large web-scale applications such as: WVMP, WVTJ and Jobs You May Be Interested In Avatara uses Hadoop for batch computing infrastructure SQL-like query interaction Hadoop batch engine can handle TBs of data & process in less than hrs of time Voldemort can respond to online queries in milliseconds Future Work: Near real-time cubing Streaming joins Dimension and schema changes 17
18 BIG DATA PROJECT 18
19 Data Mining with LinkedIn using AJAX call to REST API Overview: Extract a large quantity of data from LinkedIn using AJAX call to REST API Transform data into structured csv file format via scripts Create tables in nosql database Hive installed on top of HDFS Query database to make Analytic insights on people, jobs, and companies Visualize Hive queries via Tableau 19
20 Issue: Extracting Data From LinkedIn API Followed instructions on LinkedIn Developer: Authenticating with Oauth 2.0 Success: configuring app, requesting authorization code Fail: Exchanging authorization code for request token (INVALID) 20
21 How to Extract Data From LinkedIn When Refused by LinkedIn API? Unable to download streamed data using CONVENTIONAL tools Solution: Data Extraction via AJAX call to REST API Jase Clamp tutorial on YouTube How to Extract Data from LinkedIn 21
22 Tools For Data Extraction/Transformation FireFox Web Console Firebug JavaScript AJAX JQuery Scripts company, person and jobs Since we can not stream data into HDFS data transformation to structured file format done externally! 22
23 REST API, AJAX, JQuery REST API: REpresentational State Transfer stateless separates server from client leverages layered system JQuery: JavaScript and DOM manipulation library makes client side scripting easier simplifies syntax for finding, selecting and manipulating DOM elements AJAX request: (asynchronous JavaScript and XML) ~ XMLHttpRequest uses client side scripting to exchange data with web server types: GET, POST, PUT, DELETE loads all data once 23
24 theurl=' orig=trnv&rsid= &trk=vsrp_jobs_sel&trkinfo=vsrpsearchid%3a , VSRPcmpt%3Atrans_nav&locationType=I&countryCode=us&openFacets=L, C&page_num='+i+'&pt=jobs&rnd= '; 24
25 Retrieving Data Structure 25
26 Structure of Companies Data 26
27 Structure of Jobs Data 27
28 Structure of Person Data 28
29 Data Extraction/Transformation Steps Login Linkedin account Create search (on Jobs, Companies, People etc) Right click select inspect element using firebug (console should display) Code/Paste script into right side of console window move to All tab Navigate to page 2 of search results Locate GET REST api call (in console window) right click copy location Paste call into theurl variable inside script Change pagenum=2 in URL to pagenum= +i+ Run script (parse into comma separated JSON) navigate to info in console copy and paste data into text editor remove empty lines and pasted in csv 29
30 30
31 Hadoop and Hive ~versions Hive chosen because data transformed into structured csv 31
32 Hive ~Table Creation and Data Upload created 3 tables: companies, jobs, person ~Hive HQL: similar to SQL DDL statements for table creation and DML for insert 32
33 HQL (Hive Query Language) Used HQL queries to derive insights from our data Includes: Top companies with highest # followers Top locations with highest job count Job title and count per location Top job titles recently listed Location of jobs listed 1 day ago Comparison of # of connections of people with and without profile image Comparison Profile Headlines with Highest Connection Count vs those with lower connection count Query visualization done in Tableau 33
34 insert overwrite local directory '/usr/local/hql' row format delimited fields terminated by ','Select followercount, name, rank() over (ORDER BY followercount DESC) as rank from companies ranked_followers WHERE ranked_followers.rank < 10 ORDER BY followercount DESC; Top companies with highest number of followers F1~ # of followers 34
35 insert overwrite local directory '/usr/local/hql' row format delimited fields terminated by ',' Select location, jobcount FROM (select location, rank() over (ORDER BY jobcount DESC) as rank, jobcount from companies) ranked_jobs WHERE ranked_jobs.rank < 51 ORDER BY location, jobcount DESC; Top locations that have the highest number of jobs F2~ # of jobs 35
36 insert overwrite local directory '/usr/local/hql' row format delimited fields terminated by ','SELECT c.location, j.jobtitle FROM companies c left outer join jobs j on (c.location = j.location); Join on companies and jobs table selecting location and jobtitle (looking at number of jobs listed in each area) 36
37 insert overwrite local directory '/usr/local/hql' row format delimited fields terminated by ',' SELECT companyname, jobtitle, jobrecency FROM (select companyname, jobtitle, rank() over (ORDER BY jobrecency DESC) as rank, jobrecency from jobs) ranked_jobtitles WHERE ranked_jobtitles.rank < 11 ORDER BY jobtitle, jobrecency DESC; Top Job titles recently listed 37
38 insert overwrite local directory '/usr/local/hql' row format delimited fields terminated by ',' select location, companyname, jobtitle from jobs where jobrecency="1 day ago"; locations of jobs listed 1 day ago 38
39 insert overwrite local directory '/usr/local/hql' select count(*), sum(connectioncount) from person where imageurl!="undefined"; insert overwrite local directory '/usr/local/hql' select count(*), sum(connectioncount) from person where imageurl ="undefined"; Comparison: # connections of people with and without profile photo on webpage. ratio 5 : 454 on Average those w/out profile pic: ~470 connections with profile pic: ~ person connection difference! 39
40 insert overwrite local directory '/usr/local/hql' select connectioncount, firstname, headline from person where connectioncount > 500; Profile Headlines with Highest Connections 40
41 insert overwrite local directory '/usr/local/hql' select connectioncount, firstname, headline from person where connectioncount < 200; Profile Headlines with lowest Connections 41
42 Interested in trying on your own? Links: FireBug add-on to FireFox: Jase Clamp tutorial Extracting Data From LinkedIn : Data Extraction Script on Github: Tableau Download: 42
43 Sources
44 THANK YOU 44
CitusDB Architecture for Real-Time Big Data
CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing
More informationCHAPTER 5: BUSINESS ANALYTICS
Chapter 5: Business Analytics CHAPTER 5: BUSINESS ANALYTICS Objectives The objectives are: Describe Business Analytics. Explain the terminology associated with Business Analytics. Describe the data warehouse
More informationHow To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
More informationitunes Store Publisher User Guide Version 1.1
itunes Store Publisher User Guide Version 1.1 Version Date Author 1.1 10/09/13 William Goff Table of Contents Table of Contents... 2 Introduction... 3 itunes Console Advantages... 3 Getting Started...
More informationBuilding Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.
Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new
More informationUsing MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com
Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com Agenda The rise of Big Data & Hadoop MySQL in the Big Data Lifecycle MySQL Solutions for Big Data Q&A
More informationSentimental Analysis using Hadoop Phase 2: Week 2
Sentimental Analysis using Hadoop Phase 2: Week 2 MARKET / INDUSTRY, FUTURE SCOPE BY ANKUR UPRIT The key value type basically, uses a hash table in which there exists a unique key and a pointer to a particular
More informationThe Big Data Ecosystem at LinkedIn. Presented by Zhongfang Zhuang
The Big Data Ecosystem at LinkedIn Presented by Zhongfang Zhuang Based on the paper The Big Data Ecosystem at LinkedIn, written by Roshan Sumbaly, Jay Kreps, and Sam Shah. The Ecosystems Hadoop Ecosystem
More informationIntroduction to NoSQL Databases and MapReduce. Tore Risch Information Technology Uppsala University 2014-05-12
Introduction to NoSQL Databases and MapReduce Tore Risch Information Technology Uppsala University 2014-05-12 What is a NoSQL Database? 1. A key/value store Basic index manager, no complete query language
More informationHow to Choose Between Hadoop, NoSQL and RDBMS
How to Choose Between Hadoop, NoSQL and RDBMS Keywords: Jean-Pierre Dijcks Oracle Redwood City, CA, USA Big Data, Hadoop, NoSQL Database, Relational Database, SQL, Security, Performance Introduction A
More informationCHAPTER 4: BUSINESS ANALYTICS
Chapter 4: Business Analytics CHAPTER 4: BUSINESS ANALYTICS Objectives Introduction The objectives are: Describe Business Analytics Explain the terminology associated with Business Analytics Describe the
More informationBest Practices for Hadoop Data Analysis with Tableau
Best Practices for Hadoop Data Analysis with Tableau September 2013 2013 Hortonworks Inc. http:// Tableau 6.1.4 introduced the ability to visualize large, complex data stored in Apache Hadoop with Hortonworks
More informationDeploy. Friction-free self-service BI solutions for everyone Scalable analytics on a modern architecture
Friction-free self-service BI solutions for everyone Scalable analytics on a modern architecture Apps and data source extensions with APIs Future white label, embed or integrate Power BI Deploy Intelligent
More informationLofan Abrams Data Services for Big Data Session # 2987
Lofan Abrams Data Services for Big Data Session # 2987 Big Data Are you ready for blast-off? Big Data, for better or worse: 90% of world s data generated over last two years. ScienceDaily, ScienceDaily
More informationOracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
More informationIntegrating Big Data into the Computing Curricula
Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big
More informationBIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig
BIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig Contents Acknowledgements... 1 Introduction to Hive and Pig... 2 Setup... 2 Exercise 1 Load Avro data into HDFS... 2 Exercise 2 Define an
More informationExploring the Synergistic Relationships Between BPC, BW and HANA
September 9 11, 2013 Anaheim, California Exploring the Synergistic Relationships Between, BW and HANA Sheldon Edelstein SAP Database and Solution Management Learning Points SAP Business Planning and Consolidation
More informationData processing goes big
Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,
More informationUsing distributed technologies to analyze Big Data
Using distributed technologies to analyze Big Data Abhijit Sharma Innovation Lab BMC Software 1 Data Explosion in Data Center Performance / Time Series Data Incoming data rates ~Millions of data points/
More informationextensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010
System/ Scale to Primary Secondary Joins/ Integrity Language/ Data Year Paper 1000s Index Indexes Transactions Analytics Constraints Views Algebra model my label 1971 RDBMS O tables sql-like 2003 memcached
More informationImplement Hadoop jobs to extract business value from large and varied data sets
Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to
More informationVisualizing a Neo4j Graph Database with KeyLines
Visualizing a Neo4j Graph Database with KeyLines Introduction 2! What is a graph database? 2! What is Neo4j? 2! Why visualize Neo4j? 3! Visualization Architecture 4! Benefits of the KeyLines/Neo4j architecture
More informationAn Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics
An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,
More informationOracle Siebel Marketing and Oracle B2B Cross- Channel Marketing Integration Guide ORACLE WHITE PAPER AUGUST 2014
Oracle Siebel Marketing and Oracle B2B Cross- Channel Marketing Integration Guide ORACLE WHITE PAPER AUGUST 2014 Disclaimer The following is intended to outline our general product direction. It is intended
More informationSisense. Product Highlights. www.sisense.com
Sisense Product Highlights Introduction Sisense is a business intelligence solution that simplifies analytics for complex data by offering an end-to-end platform that lets users easily prepare and analyze
More informationArchitectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
More informationTIBCO Live Datamart: Push-Based Real-Time Analytics
TIBCO Live Datamart: Push-Based Real-Time Analytics ABSTRACT TIBCO Live Datamart is a new approach to real-time analytics and data warehousing for environments where large volumes of data require a management
More informationQlik REST Connector Installation and User Guide
Qlik REST Connector Installation and User Guide Qlik REST Connector Version 1.0 Newton, Massachusetts, November 2015 Authored by QlikTech International AB Copyright QlikTech International AB 2015, All
More informationQUICK START GUIDE. Cloud based Web Load, Stress and Functional Testing
QUICK START GUIDE Cloud based Web Load, Stress and Functional Testing Performance testing for the Web is vital for ensuring commercial success. JAR:Load is a Web Load Testing Solution delivered from the
More informationHow, What, and Where of Data Warehouses for MySQL
How, What, and Where of Data Warehouses for MySQL Robert Hodges CEO, Continuent. Introducing Continuent The leading provider of clustering and replication for open source DBMS Our Product: Continuent Tungsten
More informationAn Approach to Implement Map Reduce with NoSQL Databases
www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh
More informationPlay with Big Data on the Shoulders of Open Source
OW2 Open Source Corporate Network Meeting Play with Big Data on the Shoulders of Open Source Liu Jie Technology Center of Software Engineering Institute of Software, Chinese Academy of Sciences 2012-10-19
More informationGetting Started Guide for Developing tibbr Apps
Getting Started Guide for Developing tibbr Apps TABLE OF CONTENTS Understanding the tibbr Marketplace... 2 Integrating Apps With tibbr... 2 Developing Apps for tibbr... 2 First Steps... 3 Tutorial 1: Registering
More informationIntegrating VoltDB with Hadoop
The NewSQL database you ll never outgrow Integrating with Hadoop Hadoop is an open source framework for managing and manipulating massive volumes of data. is an database for handling high velocity data.
More informationHP Vertica and MicroStrategy 10: a functional overview including recommendations for performance optimization. Presented by: Ritika Rahate
HP Vertica and MicroStrategy 10: a functional overview including recommendations for performance optimization Presented by: Ritika Rahate MicroStrategy Data Access Workflows There are numerous ways for
More informationDecoding the Big Data Deluge a Virtual Approach. Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco
Decoding the Big Data Deluge a Virtual Approach Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco High-volume, velocity and variety information assets that demand
More informationLambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
More informationBig Data. Facebook Wall Data using Graph API. Presented by: Prashant Patel-2556219 Jaykrushna Patel-2619715
Big Data Facebook Wall Data using Graph API Presented by: Prashant Patel-2556219 Jaykrushna Patel-2619715 Outline Data Source Processing tools for processing our data Big Data Processing System: Mongodb
More informationBig Data Analytics Platform @ Nokia
Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform
More informationCreating a universe on Hive with Hortonworks HDP 2.0
Creating a universe on Hive with Hortonworks HDP 2.0 Learn how to create an SAP BusinessObjects Universe on top of Apache Hive 2 using the Hortonworks HDP 2.0 distribution Author(s): Company: Ajay Singh
More informationSAP BO Course Details
SAP BO Course Details By Besant Technologies Course Name Category Venue SAP BO SAP Besant Technologies No.24, Nagendra Nagar, Velachery Main Road, Address Velachery, Chennai 600 042 Landmark Opposite to
More informationHow to Enhance Traditional BI Architecture to Leverage Big Data
B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...
More informationAn introduction to creating Web 2.0 applications in Rational Application Developer Version 8.0
An introduction to creating Web 2.0 applications in Rational Application Developer Version 8.0 September 2010 Copyright IBM Corporation 2010. 1 Overview Rational Application Developer, Version 8.0, contains
More informationA very short talk about Apache Kylin Business Intelligence meets Big Data. Fabian Wilckens EMEA Solutions Architect
A very short talk about Apache Kylin Business Intelligence meets Big Data Fabian Wilckens EMEA Solutions Architect 1 The challenge today 2 Very quickly: OLAP Online Analytical Processing How many beers
More informationClient Overview. Engagement Situation. Key Requirements
Client Overview Our client is one of the leading providers of business intelligence systems for customers especially in BFSI space that needs intensive data analysis of huge amounts of data for their decision
More informationPulsar Realtime Analytics At Scale. Tony Ng April 14, 2015
Pulsar Realtime Analytics At Scale Tony Ng April 14, 2015 Big Data Trends Bigger data volumes More data sources DBs, logs, behavioral & business event streams, sensors Faster analysis Next day to hours
More informationPetabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013
Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP) and Analytics
More informationtibbr Now, the Information Finds You.
tibbr Now, the Information Finds You. - tibbr Integration 1 tibbr Integration: Get More from Your Existing Enterprise Systems and Improve Business Process tibbr empowers IT to integrate the enterprise
More informationORACLE DATA INTEGRATOR ENTERPRISE EDITION
ORACLE DATA INTEGRATOR ENTERPRISE EDITION Oracle Data Integrator Enterprise Edition 12c delivers high-performance data movement and transformation among enterprise platforms with its open and integrated
More informationThe Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn
The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn Presented by :- Ishank Kumar Aakash Patel Vishnu Dev Yadav CONTENT Abstract Introduction Related work The Ecosystem Ingress
More informationCan the Elephants Handle the NoSQL Onslaught?
Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented
More informationSimplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!
Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!! Streaming Analy.cs S S S Scale- up Database Data And Compute Grid
More informationToad for Oracle 8.6 SQL Tuning
Quick User Guide for Toad for Oracle 8.6 SQL Tuning SQL Tuning Version 6.1.1 SQL Tuning definitively solves SQL bottlenecks through a unique methodology that scans code, without executing programs, to
More informationMitra Innovation Leverages WSO2's Open Source Middleware to Build BIM Exchange Platform
Mitra Innovation Leverages WSO2's Open Source Middleware to Build BIM Exchange Platform May 2015 Contents 1. Introduction... 3 2. What is BIM... 3 2.1. History of BIM... 3 2.2. Why Implement BIM... 4 2.3.
More informationInternals of Hadoop Application Framework and Distributed File System
International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop
More informationXML Processing and Web Services. Chapter 17
XML Processing and Web Services Chapter 17 Textbook to be published by Pearson Ed 2015 in early Pearson 2014 Fundamentals of http://www.funwebdev.com Web Development Objectives 1 XML Overview 2 XML Processing
More informationREST web services. Representational State Transfer Author: Nemanja Kojic
REST web services Representational State Transfer Author: Nemanja Kojic What is REST? Representational State Transfer (ReST) Relies on stateless, client-server, cacheable communication protocol It is NOT
More informationLost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole
Paper BB-01 Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole ABSTRACT Stephen Overton, Overton Technologies, LLC, Raleigh, NC Business information can be consumed many
More informationMySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!)
MySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!) Erdélyi Ernő, Component Soft Kft. erno@component.hu www.component.hu 2013 (c) Component Soft Ltd Leading Hadoop Vendor Copyright 2013,
More informationOptimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC 10.1.3.4.1
Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC 10.1.3.4.1 Mark Rittman, Director, Rittman Mead Consulting for Collaborate 09, Florida, USA,
More informationCreating Hybrid Relational-Multidimensional Data Models using OBIEE and Essbase by Mark Rittman and Venkatakrishnan J
Creating Hybrid Relational-Multidimensional Data Models using OBIEE and Essbase by Mark Rittman and Venkatakrishnan J ODTUG Kaleidoscope Conference June 2009, Monterey, USA Oracle Business Intelligence
More informationOracle Data Integrator 11g New Features & OBIEE Integration. Presented by: Arun K. Chaturvedi Business Intelligence Consultant/Architect
Oracle Data Integrator 11g New Features & OBIEE Integration Presented by: Arun K. Chaturvedi Business Intelligence Consultant/Architect Agenda 01. Overview & The Architecture 02. New Features Productivity,
More informationVisualizing an OrientDB Graph Database with KeyLines
Visualizing an OrientDB Graph Database with KeyLines Visualizing an OrientDB Graph Database with KeyLines 1! Introduction 2! What is a graph database? 2! What is OrientDB? 2! Why visualize OrientDB? 3!
More informationBig Data Analytics - Accelerated. stream-horizon.com
Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based
More informationCapitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes
Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate
More informationX4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released
General announcements In-Memory is available next month http://www.oracle.com/us/corporate/events/dbim/index.html X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released
More informationGeneral principles and architecture of Adlib and Adlib API. Petra Otten Manager Customer Support
General principles and architecture of Adlib and Adlib API Petra Otten Manager Customer Support Adlib Database management program, mainly for libraries, museums and archives 1600 customers in app. 30 countries
More informationOLH: Oracle Loader for Hadoop OSCH: Oracle SQL Connector for Hadoop Distributed File System (HDFS)
Use Data from a Hadoop Cluster with Oracle Database Hands-On Lab Lab Structure Acronyms: OLH: Oracle Loader for Hadoop OSCH: Oracle SQL Connector for Hadoop Distributed File System (HDFS) All files are
More informationSAP BusinessObjects Business Intelligence (BOBI) 4.1
SAP BusinessObjects Business Intelligence (BOBI) 4.1 SAP BusinessObjects BI (also known as BO or BOBJ) is a suite of front-end applications that allow business users to view, sort and analyze business
More informationWINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS
WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS Managing and analyzing data in the cloud is just as important as it is anywhere else. To let you do this, Windows Azure provides a range of technologies
More informationHow To Use A Data Center With A Data Farm On A Microsoft Server On A Linux Server On An Ipad Or Ipad (Ortero) On A Cheap Computer (Orropera) On An Uniden (Orran)
Day with Development Master Class Big Data Management System DW & Big Data Global Leaders Program Jean-Pierre Dijcks Big Data Product Management Server Technologies Part 1 Part 2 Foundation and Architecture
More informationSAP Data Services 4.X. An Enterprise Information management Solution
SAP Data Services 4.X An Enterprise Information management Solution Table of Contents I. SAP Data Services 4.X... 3 Highlights Training Objectives Audience Pre Requisites Keys to Success Certification
More informationPutting Apache Kafka to Use!
Putting Apache Kafka to Use! Building a Real-time Data Platform for Event Streams! JAY KREPS, CONFLUENT! A Couple of Themes! Theme 1: Rise of Events! Theme 2: Immutability Everywhere! Level! Example! Immutable
More informationSearch and Real-Time Analytics on Big Data
Search and Real-Time Analytics on Big Data Sewook Wee, Ryan Tabora, Jason Rutherglen Accenture & Think Big Analytics Strata New York October, 2012 Big Data: data becomes your core asset. It realizes its
More informationDynamic Decision-Making Web Services Using SAS Stored Processes and SAS Business Rules Manager
Paper SAS1787-2015 Dynamic Decision-Making Web Services Using SAS Stored Processes and SAS Business Rules Manager Chris Upton and Lori Small, SAS Institute Inc. ABSTRACT With the latest release of SAS
More informationThe evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect
The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect IT Insight podcast This podcast belongs to the IT Insight series You can subscribe to the podcast through
More informationOracle Big Data Building A Big Data Management System
Oracle Big Building A Big Management System Copyright 2015, Oracle and/or its affiliates. All rights reserved. Effi Psychogiou ECEMEA Big Product Director May, 2015 Safe Harbor Statement The following
More informationTable of Contents. Table of Contents 3
User Guide EPiServer 7 Mail Revision A, 2012 Table of Contents 3 Table of Contents Table of Contents 3 Introduction 5 About This Documentation 5 Accessing EPiServer Help System 5 Online Community on EPiServer
More informationCategory: Business Process and Integration Solution for Small Business and the Enterprise
Home About us Contact us Careers Online Resources Site Map Products Demo Center Support Customers Resources News Download Article in PDF Version Download Diagrams in PDF Version Microsoft Partner Conference
More informationUsing IBM dashdb With IBM Embeddable Reporting Service
What this tutorial is about In today's mobile age, companies have access to a wealth of data, stored in JSON format. Leading edge companies are making key decision based on that data but the challenge
More informationApache Kylin Introduction Dec 8, 2014 @ApacheKylin
Apache Kylin Introduction Dec 8, 2014 @ApacheKylin Luke Han Sr. Product Manager lukhan@ebay.com @lukehq Yang Li Architect & Tech Leader yangli9@ebay.com Agenda What s Apache Kylin? Tech Highlights Performance
More informationLuncheon Webinar Series May 13, 2013
Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration
More informationTiber Solutions. Understanding the Current & Future Landscape of BI and Data Storage. Jim Hadley
Tiber Solutions Understanding the Current & Future Landscape of BI and Data Storage Jim Hadley Tiber Solutions Founded in 2005 to provide Business Intelligence / Data Warehousing / Big Data thought leadership
More informationSpring,2015. Apache Hive BY NATIA MAMAIASHVILI, LASHA AMASHUKELI & ALEKO CHAKHVASHVILI SUPERVAIZOR: PROF. NODAR MOMTSELIDZE
Spring,2015 Apache Hive BY NATIA MAMAIASHVILI, LASHA AMASHUKELI & ALEKO CHAKHVASHVILI SUPERVAIZOR: PROF. NODAR MOMTSELIDZE Contents: Briefly About Big Data Management What is hive? Hive Architecture Working
More informationGetting Real Real Time Data Integration Patterns and Architectures
Getting Real Real Time Data Integration Patterns and Architectures Nelson Petracek Senior Director, Enterprise Technology Architecture Informatica Digital Government Institute s Enterprise Architecture
More informationHow In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time
SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first
More informationUsing RDBMS, NoSQL or Hadoop?
Using RDBMS, NoSQL or Hadoop? DOAG Conference 2015 Jean- Pierre Dijcks Big Data Product Management Server Technologies Copyright 2014 Oracle and/or its affiliates. All rights reserved. Data Ingest 2 Ingest
More informationWA2192 Introduction to Big Data and NoSQL EVALUATION ONLY
WA2192 Introduction to Big Data and NoSQL Web Age Solutions Inc. USA: 1-877-517-6540 Canada: 1-866-206-4644 Web: http://www.webagesolutions.com The following terms are trademarks of other companies: Java
More informationNext-Generation Cloud Analytics with Amazon Redshift
Next-Generation Cloud Analytics with Amazon Redshift What s inside Introduction Why Amazon Redshift is Great for Analytics Cloud Data Warehousing Strategies for Relational Databases Analyzing Fast, Transactional
More informationOracle Business Intelligence Foundation Suite 11g Essentials Exam Study Guide
Oracle Business Intelligence Foundation Suite 11g Essentials Exam Study Guide Joshua Jeyasingh Senior Technical Account Manager WW A&C Partner Enablement Objective & Audience Objective Help you prepare
More informationSAP Business Objects BO BI 4.1
SAP Business Objects BO BI 4.1 SAP Business Objects (a.k.a. BO, BOBJ) is an enterprise software company, specializing in business intelligence (BI). Business Objects was acquired in 2007 by German company
More informationTrafodion Operational SQL-on-Hadoop
Trafodion Operational SQL-on-Hadoop SophiaConf 2015 Pierre Baudelle, HP EMEA TSC July 6 th, 2015 Hadoop workload profiles Operational Interactive Non-interactive Batch Real-time analytics Operational SQL
More informationStudy concluded that success rate for penetration from outside threats higher in corporate data centers
Auditing in the cloud Ownership of data Historically, with the company Company responsible to secure data Firewall, infrastructure hardening, database security Auditing Performed on site by inspecting
More informationProgramming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview
Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce
More informationIBM Campaign Version-independent Integration with IBM Engage Version 1 Release 3 April 8, 2016. Integration Guide IBM
IBM Campaign Version-independent Integration with IBM Engage Version 1 Release 3 April 8, 2016 Integration Guide IBM Note Before using this information and the product it supports, read the information
More informationQlik Sense Enabling the New Enterprise
Technical Brief Qlik Sense Enabling the New Enterprise Generations of Business Intelligence The evolution of the BI market can be described as a series of disruptions. Each change occurred when a technology
More informationSTREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE SAMZA. Processing billions of events every day
STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE SAMZA Processing billions of events every day Neha Narkhede Co-founder and Head of Engineering @ Stealth Startup Prior to this Lead, Streams Infrastructure
More informationChapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing
More informationAzure Scalability Prescriptive Architecture using the Enzo Multitenant Framework
Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework Many corporations and Independent Software Vendors considering cloud computing adoption face a similar challenge: how should
More information