Liktap 2012 Search Engine
|
|
- Oscar Stephens
- 3 years ago
- Views:
Transcription
1 Related Searches at LinkedIn Mitul Tiwari Joint work with Azarias Reda, Yubin Park, Christian Posse, and Sam Shah LinkedIn 1
2 Who am I 2
3 Outline About LinkedIn Related Searches Design Implementation Evaluation 3
4 LinkedIn by the numbers 175M+ members 2+ new user registrations per second 4.2 Billion people searches in Billion page views in Q million monthly active users in Q
5 Broad Range of Products Profile Search People You May Know News Skills Hiring Solutions
6 Related Searches at LinkedIn Millions of searches everyday Goal: Build related searches system at LinkedIn To help users to explore and refine their queries 6
7 Related Searches at LinkedIn
8 Related Searches Design Implementation Evaluation 8
9 Related Searches Design Implementation Evaluation 9
10 Design 10
11 Design Signals Collaborative Filtering Query-Result Click graph Overlapping terms 10
12 Design Signals Collaborative Filtering Query-Result Click graph Overlapping terms Length-bias 10
13 Design Signals Collaborative Filtering Query-Result Click graph Overlapping terms Length-bias Ensemble approach for unified recommendation 10
14 Design Signals Collaborative Filtering Query-Result Click graph Overlapping terms Length-bias Ensemble approach for unified recommendation Practical considerations 10
15 Design: Collaborative Filtering Q1 Q2 Q3 Q4 Time Searches correlated by time Searches done in the same session by the same user Collaborative filtering: implicit feedback TFIDF scoring to take care of popular queries (e.g. `Obama ) 11
16 Design: Query-Result Clicks Q1 R1 Qn Rm Searches correlated by result clicks 12
17 Design: Overlapping Terms Q1 Software Developer Q2 Software Engineer Searches with overlapping terms TFIDF scoring to give importance to terms 13
18 Design: Length Bias Insight: clicks on suggestions one term longer 14
19 Design: Length Bias Insight: clicks on suggestions one term longer Corresponds to refining the initial query Statistical biasing model to score a longer query higher
20 Design: Length Bias Insight: clicks on suggestions one term longer Corresponds to refining the initial query Statistical biasing model to score a longer query higher
21 Design: Ensemble Approach Need to generate unified recommendation dataset Analysis to figure out engagement of each signal Attempted ML approach Minimal overlap across different signals 16
22 Design: Ensemble Approach Step-wise unionization Importance based on individual signal performance First, collaborative filter Second, queries correlated by query-result clicks Third, queries overlapping terms 17
23 Design: Practical Considerations System designed for public consumption Strong profanity filters Need to deal with misspellings Languages Remove spammy search queries 18
24 Related Searches Design Implementation Evaluation 19
25 Implementation Challenge Scale 175M+ members Billions of searches Terabytes of data to process 20
26 Implementation Search Backend HDFS Hadoop Front End Kafka Related Searches Backend Metaphor Voldemort Kafka: publish-subscribe messaging system Hadoop: MapReduce data processing system Azkaban: Hadoop workflow management tool Voldemort: Key-value store
27 Implementation: Workflow Query Preprocessing Pair Formation Graph Formation Token Identification CF Scoring QRQ Scoring Partial Scoring Step-wise Union Length Bias Filter Step-wise Union
28 Related Searches Design Implementation Evaluation 23
29 Evaluation Performance of each signal and combination How does the system scale? 24
30 Evaluation Cont d Offline evaluation Precision-Recall Online evaluation A/B testing to measure engagement Performance evaluation 25
31 Offline Evaluation Correct set: set of searches performed by a user in the following K minutes, here K=10 26
32 Online Evaluation Used A/B testing Metrics Coverage: queries with recommendations Impressions: # of recommendations shown Clicks: Clicks on recommendations Click-through rate (CTR): Clicks per impression 27
33 Online Evaluation
34 Evaluation: System Runtime 29
35 Details Metaphor: a System for Related Search Recommendations, Azarias Reda, Yubin Park, Mitul Tiwari, Christian Posse, and Sam Shah. In Proceedings of the CIKM,
Building Data Products using Hadoop at Linkedin. Mitul Tiwari Search, Network, and Analytics (SNA) LinkedIn
Building Data Products using Hadoop at Linkedin Mitul Tiwari Search, Network, and Analytics (SNA) LinkedIn 1 1 Who am I? 2 2 What do I mean by Data Products? 3 3 People You May Know 4 4 Profile Stats:
More informationRecommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1
Recommender Systems Seminar Topic : Application Tung Do 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Agenda Google news personalization : Scalable Online Collaborative Filtering Algorithm, System Components
More informationThe Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn
The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn Presented by :- Ishank Kumar Aakash Patel Vishnu Dev Yadav CONTENT Abstract Introduction Related work The Ecosystem Ingress
More informationThe Big Data Ecosystem at LinkedIn. Presented by Zhongfang Zhuang
The Big Data Ecosystem at LinkedIn Presented by Zhongfang Zhuang Based on the paper The Big Data Ecosystem at LinkedIn, written by Roshan Sumbaly, Jay Kreps, and Sam Shah. The Ecosystems Hadoop Ecosystem
More informationBuilding a real-time, self-service data analytics ecosystem Greg Arnold, Sr. Director Engineering
Building a real-time, self-service data analytics ecosystem Greg Arnold, Sr. Director Engineering Self Service at scale 6 5 4 3 2 1 ? Relational? MPP? Hadoop? Linkedin data 350M Members 25B 3.5M 4.8B 2M
More informationAn Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov
An Industrial Perspective on the Hadoop Ecosystem Eldar Khalilov Pavel Valov agenda 03.12.2015 2 agenda Introduction 03.12.2015 2 agenda Introduction Research goals 03.12.2015 2 agenda Introduction Research
More informationHadoop Usage At Yahoo! Milind Bhandarkar (milindb@yahoo-inc.com)
Hadoop Usage At Yahoo! Milind Bhandarkar (milindb@yahoo-inc.com) About Me Parallel Programming since 1989 High-Performance Scientific Computing 1989-2005, Data-Intensive Computing 2005 -... Hadoop Solutions
More informationThe Need for Training in Big Data: Experiences and Case Studies
The Need for Training in Big Data: Experiences and Case Studies Guy Lebanon Amazon Background and Disclaimer All opinions are mine; other perspectives are legitimate. Based on my experience as a professor
More informationTHEMIS: Fairness in Data Stream Processing under Overload
THEMIS: Fairness in Data Stream Processing under Overload Evangelia Kalyvianaki City University London, UK Marco Fiscato Imperial College London, UK Theodoros Salonidis IBM Research, USA Peter R. Pietzuch
More informationCRITEO INTERNSHIP PROGRAM 2015/2016
CRITEO INTERNSHIP PROGRAM 2015/2016 A. List of topics PLATFORM Topic 1: Build an API and a web interface on top of it to manage the back-end of our third party demand component. Challenge(s): Working with
More informationPerformance and Energy Efficiency of. Hadoop deployment models
Performance and Energy Efficiency of Hadoop deployment models Contents Review: What is MapReduce Review: What is Hadoop Hadoop Deployment Models Metrics Experiment Results Summary MapReduce Introduced
More informationMoving Faster: Why Intent Media Chose Cascalog for Data Processing and Machine Learning. Kurt Schrader May 20, 2014
Moving Faster: Why Intent Media Chose Cascalog for Data Processing and Machine Learning Kurt Schrader May 20, 2014 Overview History of data processing at Intent Media Hello World in various data processing
More informationBuilding Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.
Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new
More informationAdvanced Analytics. Select any section on the left to jump right in or click Start to begin.
Advanced Analytics Overview Maximize CisionPoint s analytics capability to understand the true impact of media coverage, gain actionable insights, and show ROI. Advanced Metrics Determine the impact and
More informationComparison of Different Implementation of Inverted Indexes in Hadoop
Comparison of Different Implementation of Inverted Indexes in Hadoop Hediyeh Baban, S. Kami Makki, and Stefan Andrei Department of Computer Science Lamar University Beaumont, Texas (hbaban, kami.makki,
More informationApache Flink Next-gen data analysis. Kostas Tzoumas ktzoumas@apache.org @kostas_tzoumas
Apache Flink Next-gen data analysis Kostas Tzoumas ktzoumas@apache.org @kostas_tzoumas What is Flink Project undergoing incubation in the Apache Software Foundation Originating from the Stratosphere research
More informationThe Impact of Big Data on Classic Machine Learning Algorithms. Thomas Jensen, Senior Business Analyst @ Expedia
The Impact of Big Data on Classic Machine Learning Algorithms Thomas Jensen, Senior Business Analyst @ Expedia Who am I? Senior Business Analyst @ Expedia Working within the competitive intelligence unit
More informationI Logs. Apache Kafka, Stream Processing, and Real-time Data Jay Kreps
I Logs Apache Kafka, Stream Processing, and Real-time Data Jay Kreps The Plan 1. What is Data Integration? 2. What is Apache Kafka? 3. Logs and Distributed Systems 4. Logs and Data Integration 5. Logs
More informationEnsighten Activate USE CASES. Ensighten Pulse. Ensighten One
USE CASES Ensighten Activate Ensighten One Ensighten Pulse Use Case: On-Site Targeting based on Off-Site Display Ad Deliver relevant content to customers after they viewed or clicked through an Off-Site
More informationIntroduction to Pay Per Click
Introduction to Pay Per Click What is Pay-Per-Click Advertising? The advertising model which charges by the click as opposed to previous models which charged by CPM Most commonly known for being text ads
More informationSTREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE SAMZA. Processing billions of events every day
STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE SAMZA Processing billions of events every day Neha Narkhede Co-founder and Head of Engineering @ Stealth Startup Prior to this Lead, Streams Infrastructure
More informationThe Flink Big Data Analytics Platform. Marton Balassi, Gyula Fora" {mbalassi, gyfora}@apache.org
The Flink Big Data Analytics Platform Marton Balassi, Gyula Fora" {mbalassi, gyfora}@apache.org What is Apache Flink? Open Source Started in 2009 by the Berlin-based database research groups In the Apache
More informationpomsets: Workflow management for your cloud
: Workflow management for your cloud Michael J Pan Nephosity 20 April, 2010 : Workflow management for your cloud Definition Motivation Issues with workflow management + grid computing Workflow management
More informationPutting Apache Kafka to Use!
Putting Apache Kafka to Use! Building a Real-time Data Platform for Event Streams! JAY KREPS, CONFLUENT! A Couple of Themes! Theme 1: Rise of Events! Theme 2: Immutability Everywhere! Level! Example! Immutable
More informationBig Data Analytics in LinkedIn. Danielle Aring & William Merritt
Big Data Analytics in LinkedIn by Danielle Aring & William Merritt 2 Brief History of LinkedIn - Launched in 2003 by Reid Hoffman (https://ourstory.linkedin.com/) - 2005: Introduced first business lines
More informationHadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
More informationWorkshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
More informationNext-Generation Cloud Analytics with Amazon Redshift
Next-Generation Cloud Analytics with Amazon Redshift What s inside Introduction Why Amazon Redshift is Great for Analytics Cloud Data Warehousing Strategies for Relational Databases Analyzing Fast, Transactional
More informationLarge-Scale Data Sets Clustering Based on MapReduce and Hadoop
Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE
More informationKafka & Redis for Big Data Solutions
Kafka & Redis for Big Data Solutions Christopher Curtin Head of Technical Research @ChrisCurtin About Me 25+ years in technology Head of Technical Research at Silverpop, an IBM Company (14 + years at Silverpop)
More informationEnergy Efficient MapReduce
Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing
More informationMachine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
More informationAlexander Nikov. 8. ecommerce Marketing Communications. Marketing Communications. Outline
INFO 3435 ecommerce Teaching Objectives 8. ecommerce Marketing Communications Identify the major forms of online marketing communications. Explain the costs and benefits of online marketing communications.
More informationHow To Use Big Data For Telco (For A Telco)
ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call
More informationHealthcare data analytics. Da-Wei Wang Institute of Information Science wdw@iis.sinica.edu.tw
Healthcare data analytics Da-Wei Wang Institute of Information Science wdw@iis.sinica.edu.tw Outline Data Science Enabling technologies Grand goals Issues Google flu trend Privacy Conclusion Analytics
More informationBig Data for Investment Research Management
IDT Partners www.idtpartners.com Big Data for Investment Research Management Discover how IDT Partners helps Financial Services, Market Research, and Investment Management firms turn big data into actionable
More informationThe Big Data Ecosystem at LinkedIn
The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn ABSTRACT The use of large-scale data mining and machine learning has proliferated through the adoption of technologies
More informationLegal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION
Brian Lao - bjlao Karthik Jagadeesh - kjag Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND There is a large need for improved access to legal help. For example,
More informationSentimental Analysis using Hadoop Phase 2: Week 2
Sentimental Analysis using Hadoop Phase 2: Week 2 MARKET / INDUSTRY, FUTURE SCOPE BY ANKUR UPRIT The key value type basically, uses a hash table in which there exists a unique key and a pointer to a particular
More informationSentiment Analysis on Big Data
SPAN White Paper!? Sentiment Analysis on Big Data Machine Learning Approach Several sources on the web provide deep insight about people s opinions on the products and services of various companies. Social
More informationHow In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time
SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first
More informationBeyond "Big data": Introducing the EOI framework for analytics teams to drive business impact
Beyond "Big data": Introducing the EOI framework for analytics teams to drive business impact Michael Li Business Analytics, LinkedIn Nov 20, 2014 J onathan Wu raveen Neppalli Naga P Chi-Yi Kuan Business
More informationConjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect
Matteo Migliavacca (mm53@kent) School of Computing Conjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect Simple past - Traditional
More informationFast Data in the Era of Big Data: Twitter s Real-
Fast Data in the Era of Big Data: Twitter s Real- Time Related Query Suggestion Architecture Gilad Mishne, Jeff Dalton, Zhenghua Li, Aneesh Sharma, Jimmy Lin Presented by: Rania Ibrahim 1 AGENDA Motivation
More informationMap Reduce & Hadoop Recommended Text:
Big Data Map Reduce & Hadoop Recommended Text:! Large datasets are becoming more common The New York Stock Exchange generates about one terabyte of new trade data per day. Facebook hosts approximately
More informationYahoo! Grid Services Where Grid Computing at Yahoo! is Today
Yahoo! Grid Services Where Grid Computing at Yahoo! is Today Marco Nicosia Grid Services Operations marco@yahoo-inc.com What is Apache Hadoop? Distributed File System and Map-Reduce programming platform
More informationBuild Your Knowledge!
About this Course This 3-day Instructor led course Explore several advanced topics of working with SharePoint 2013 sites. Topics include SharePoint Server site definitions (Business Intelligence, Document
More informationSearch Engine Marketing Analytics with Hadoop. Ecosystem: Case Study for Red Engine Digital Agency
Search Engine Marketing Analytics with Hadoop Ecosystem: Case Study for Red Engine Digital Agency Mia Alowaish Northwestern University MaiAlowaish2012@u.northwestern.edu Sunil Kakade Northwestern University
More informationBIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata
BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING
More informationCISC 432/CMPE 432/CISC 832 Advanced Database Systems
CISC 432/CMPE 432/CISC 832 Advanced Database Systems Course Info Instructor: Patrick Martin Goodwin Hall 630 613 533 6063 martin@cs.queensu.ca Office Hours: Wednesday 11:00 1:00 or by appointment Schedule:
More informationQLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM
QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QlikView Technical Case Study Series Big Data June 2012 qlikview.com Introduction This QlikView technical case study focuses on the QlikView deployment
More informationCopyright 2005-2010 Soleran, Inc. esalestrack On-Demand CRM. Trademarks and all rights reserved. esalestrack is a Soleran product Privacy Statement
More information
BIG DATA ANALYTICS For REAL TIME SYSTEM
BIG DATA ANALYTICS For REAL TIME SYSTEM Where does big data come from? Big Data is often boiled down to three main varieties: Transactional data these include data from invoices, payment orders, storage
More informationHow To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
More informationWhy is Internal Audit so Hard?
Why is Internal Audit so Hard? 2 2014 Why is Internal Audit so Hard? 3 2014 Why is Internal Audit so Hard? Waste Abuse Fraud 4 2014 Waves of Change 1 st Wave Personal Computers Electronic Spreadsheets
More informationHadoop2, Spark Big Data, real time, machine learning & use cases. Cédric Carbone Twitter : @carbone
Hadoop2, Spark Big Data, real time, machine learning & use cases Cédric Carbone Twitter : @carbone Agenda Map Reduce Hadoop v1 limits Hadoop v2 and YARN Apache Spark Streaming : Spark vs Storm Machine
More informationBig Data Analytics - Accelerated. stream-horizon.com
Big Data Analytics - Accelerated stream-horizon.com StreamHorizon & Big Data Integrates into your Data Processing Pipeline Seamlessly integrates at any point of your your data processing pipeline Implements
More informationHadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN
Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current
More informationtargetdisplaytm PLAYBOOK
targetdisplaytm PLAYBOOK Score a Touchdown with Targeted Display Draft Your All-STAr Team Metaphorically, your team of players represents your arsenal of online display marketing tactics. Digital marketers
More informationHigh-Performance, Massively Scalable Distributed Systems using the MapReduce Software Framework: The SHARD Triple-Store
High-Performance, Massively Scalable Distributed Systems using the MapReduce Software Framework: The SHARD Triple-Store Kurt Rohloff BBN Technologies Cambridge, MA, USA krohloff@bbn.com Richard E. Schantz
More informationCharacterizing Task Usage Shapes in Google s Compute Clusters
Characterizing Task Usage Shapes in Google s Compute Clusters Qi Zhang 1, Joseph L. Hellerstein 2, Raouf Boutaba 1 1 University of Waterloo, 2 Google Inc. Introduction Cloud computing is becoming a key
More informationSearch Engines. Stephen Shaw <stesh@netsoc.tcd.ie> 18th of February, 2014. Netsoc
Search Engines Stephen Shaw Netsoc 18th of February, 2014 Me M.Sc. Artificial Intelligence, University of Edinburgh Would recommend B.A. (Mod.) Computer Science, Linguistics, French,
More informationHadoopRDF : A Scalable RDF Data Analysis System
HadoopRDF : A Scalable RDF Data Analysis System Yuan Tian 1, Jinhang DU 1, Haofen Wang 1, Yuan Ni 2, and Yong Yu 1 1 Shanghai Jiao Tong University, Shanghai, China {tian,dujh,whfcarter}@apex.sjtu.edu.cn
More informationHadoop Parallel Data Processing
MapReduce and Implementation Hadoop Parallel Data Processing Kai Shen A programming interface (two stage Map and Reduce) and system support such that: the interface is easy to program, and suitable for
More informationBig Data and Analytics: Challenges and Opportunities
Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif
More informationSearch and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
More informationOnline Content Optimization Using Hadoop. Jyoti Ahuja Dec 20 2011
Online Content Optimization Using Hadoop Jyoti Ahuja Dec 20 2011 What do we do? Deliver right CONTENT to the right USER at the right TIME o Effectively and pro-actively learn from user interactions with
More informationManaging Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
More informationOpen source software framework designed for storage and processing of large scale data on clusters of commodity hardware
Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after
More informationSponsored Search Ad Selection by Keyword Structure Analysis
Sponsored Search Ad Selection by Keyword Structure Analysis Kai Hui 1, Bin Gao 2,BenHe 1,andTie-jianLuo 1 1 University of Chinese Academy of Sciences, Beijing, P.R. China huikai10@mails.ucas.ac.cn, {benhe,tjluo}@ucas.ac.cn
More informationUnified Batch & Stream Processing Platform
Unified Batch & Stream Processing Platform Himanshu Bari Director Product Management Most Big Data Use Cases Are About Improving/Re-write EXISTING solutions To KNOWN problems Current Solutions Were Built
More informationSpeak<geek> Tech Brief. RichRelevance Distributed Computing: creating a scalable, reliable infrastructure
3 Speak Tech Brief RichRelevance Distributed Computing: creating a scalable, reliable infrastructure Overview Scaling a large database is not an overnight process, so it s difficult to plan and implement
More informationHow To Get A Better Car From A Car Dealer To A Car Buyer
Making Measurement Meaningful Why do we measure media and advertising? Track progress = Is everything going to plan? Analyse = Why isn t it going to plan? Fix the problem. Gain insight = Give me a better
More informationTECH NOTE. Hadoop Alone Is Not Big Data
TECH NOTE Hadoop Alone Is Not Big Data Twenty-one years ago, a year before the first web browser appeared, Walmart s Teradata data warehouse exceeded a terabyte of data and kicked off a revolution in supply-chain
More informationMS-55052: SharePoint 2013 End User Level II
MS-55052: SharePoint 2013 End User Level II Description This 3-day Instructor led course Explore several advanced topics of working with SharePoint 2013 sites. Topics include SharePoint Server site definitions
More informationAn Approach to Implement Map Reduce with NoSQL Databases
www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh
More informationA B S T R A C T. Index Terms: Hadoop, Clickstream, I. INTRODUCTION
Big Data Analytics with Hadoop on Cloud for Masses Rupali Sathe,Srijita Bhattacharjee Department of Computer Engineering Pillai HOC College of Engineering and Technology, Rasayani A B S T R A C T Businesses
More informationTAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP
Pythian White Paper TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP ABSTRACT As companies increasingly rely on big data to steer decisions, they also find themselves looking for ways to simplify
More informationThe Definitive Guide to Google AdWords
The Definitive Guide to Google AdWords Create Versatile and Powerful Marketing and Advertising Campaigns a ii a Bart Weller Lori Calcott Apress* Contents y About the Author About the Technical Reviewer
More informationHow Companies are! Using Spark
How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made
More informationMarko Grobelnik marko.grobelnik@ijs.si Jozef Stefan Institute
Marko Grobelnik marko.grobelnik@ijs.si Jozef Stefan Institute Kalamaki, May 25 th 2012 Introduction What is Big data? Why Big-Data? When Big-Data is really a problem? Techniques Tools Applications Literature
More informationThe ABCs of AdWords. The 49 PPC Terms You Need to Know to Be Successful. A publication of WordStream & Hanapin Marketing
The ABCs of AdWords The 49 PPC Terms You Need to Know to Be Successful A publication of WordStream & Hanapin Marketing The ABCs of AdWords The 49 PPC Terms You Need to Know to Be Successful Many individuals
More informationTencentRec: Real-time Stream Recommendation in Practice
TencentRec: Real-time Stream Recommendation in Practice Yanxiang Huang Bin Cui Wenyu Zhang Jie Jiang Ying Xu Key Lab of High Confidence Software Technologies (MOE), School of EECS, Peking University Tencent
More informationHadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
More informationMedical Big Data Workshop 12:30-5pm Star Conference Room. #MedBigData15
Medical Big Data Workshop 12:30-5pm Star Conference Room #MedBigData15 Welcome! Today s Goals: Introduce you to the Big Data @ CSAIL Introduce you to the popular MIMIC II Dataset Overview of Database Technologies
More informationOperational Intelligence: Real-Time Business Analytics for Big Data Philip Russom
Operational Intelligence: Real-Time Business Analytics for Big Data Philip Russom TDWI Research Director for Data Management August 14, 2012 Sponsor Speakers Philip Russom Research Director, Data Management,
More informationSocial Recruiting How to Effectively Use Social Networks to Recruit Talent
Social Recruiting How to Effectively Use Social Networks to Recruit Talent Introduction As a recruiter, you want to find the most qualified, talented, and largest pool of applicants. LinkedIn, Facebook,
More informationDIGITAL MARKETING BASICS: PPC
DIGITAL MARKETING BASICS: PPC Search Engine Marketing (SEM) is an umbrella term referring to all activities that generate visibility in search engine result pages (SERPs) through the use of paid placement,
More informationUsing In-Memory Computing to Simplify Big Data Analytics
SCALEOUT SOFTWARE Using In-Memory Computing to Simplify Big Data Analytics by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T he big data revolution is upon us, fed
More informationHow To Create A Web Alert For A User To Find Interesting Results From A Past Search History On A Web Page
Retroactive Answering of Search Queries Beverly Yang Google, Inc. byang@google.com Glen Jeh Google, Inc. glenj@google.com ABSTRACT Major search engines currently use the history of a user s actions (e.g.,
More informationFor Your Eyes Only: Consuming vs. Sharing Content
For Your Eyes Only: Consuming vs. Sharing Content Ram Meshulam Outbrain Inc. 39 West 13th Street New York NY 10011, United States rmeshulam@outbrain.com Roy Sasson Outbrain Inc. 39 West 13th Street New
More informationBayesian networks - Time-series models - Apache Spark & Scala
Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly
More informationHadoop Development & BI- 0 to 100
Development Master the Data Analysis tools like Pig and hive Data Science Hadoop Development & BI- 0 to 100 Build a recommendation engine Hadoop Development - 0 to 100 HADOOP SCHOOL OF TRAINING Basics
More informationREAL-TIME BIG DATA ANALYTICS
www.leanxcale.com info@leanxcale.com REAL-TIME BIG DATA ANALYTICS Blending Transactional and Analytical Processing Delivers Real-Time Big Data Analytics 2 ULTRA-SCALABLE FULL ACID FULL SQL DATABASE LeanXcale
More informationRealtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens
Realtime Apache Hadoop at Facebook Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens Agenda 1 Why Apache Hadoop and HBase? 2 Quick Introduction to Apache HBase 3 Applications of HBase at
More informationMarko Grobelnik marko.grobelnik@ijs.si Jozef Stefan Institute Ljubljana, Slovenia
Marko Grobelnik marko.grobelnik@ijs.si Jozef Stefan Institute Ljubljana, Slovenia Stavanger, May 8 th 2012 Introduction What is Big data? Why Big-Data? When Big-Data is really a problem? Techniques Tools
More informationComposite Data Virtualization Composite Data Virtualization And NOSQL Data Stores
Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Software October 2010 TABLE OF CONTENTS INTRODUCTION... 3 BUSINESS AND IT DRIVERS... 4 NOSQL DATA STORES LANDSCAPE...
More informationCOMP9321 Web Application Engineering
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411
More informationIntroduction To Hive
Introduction To Hive How to use Hive in Amazon EC2 CS 341: Project in Mining Massive Data Sets Hyung Jin(Evion) Kim Stanford University References: Cloudera Tutorials, CS345a session slides, Hadoop - The
More informationDistributed Computing and Big Data: Hadoop and MapReduce
Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:
More information