Student Project 2 - Apps Frequently Installed Together
42matters is a rapidly growing start-up leading the development of next-generation mobile user modeling technology. Our solutions are used by big-brand companies in the mobile advertising market to serve mobile users intelligently targeted content. We are an international team with an innovative and fast-paced company culture.

Project Overview

The collected anonymized data about mobile devices needs to be used for different data analytics tasks. The data is stored in an online transaction processing system (referred to below as the online system), which is not suitable for this type of task. The goal of the project is to set up a system that allows offline data analytics based on Hadoop/Spark. The whole system will be implemented on Amazon AWS.

The main activities of the project are:

- Load data from the online system into Hadoop/Spark.
- Structure and prepare the data so that it is suitable for the required data analytics tasks.
- Implement and run data analytics tasks.

The system will be built on a stack of MongoDB, Couchbase, and a Hadoop cluster running on the Amazon cloud. More details about the three parts are described in the following sections.

The figure below gives an overview of the systems involved in the project. On the left side is the online system, which stores all production data. Data is stored in two different database systems, Couchbase and MongoDB. This part of the system will be provided. On the right side is the offline system, which needs to be implemented. Data from the online system has to be loaded into the offline system and analyzed there. The whole system will be created in Amazon AWS (user credentials for Amazon AWS will be provided by 42matters).

[Figure: overview of the online system (Couchbase, MongoDB) and the offline system (Hadoop/Spark)]
Data Sources Structure

The source data used in the project is data about mobile devices and about apps. Devices are stored in Couchbase, whereas apps are stored in MongoDB. Both devices and apps are stored in JSON format.

Apps

Apps are stored in a MongoDB collection (a collection in MongoDB corresponds to a table in a relational database). Each app is represented by a JSON document (which corresponds to a row in a table of a relational database) containing, among others, the following fields:

- package_name: The unique identifier of an app
- title: The title of an app
- description: The description of an app
- category: The Google Play category the app belongs to
- rating: The rating of the app on Google Play

The following example represents the app document for the Facebook app:
    {
        "package_name" : "com.facebook.katana",
        "title" : "Facebook",
        "description" : "Keeping up with friends is faster than ever..",
        "category" : "Social",
        "rating" : 4.0
    }

This collection contains about 1 million apps.

Devices

Devices are stored in Couchbase buckets (a bucket in Couchbase corresponds to a table in a relational database). Each device is represented by a JSON document (which corresponds to a row in a table of a relational database) containing, among others, the following fields:

- udid: The unique identifier of a device
- country: The country of the device
- timestamp: Timestamp of the last update of the document
- apps: List of apps installed on the device (apps are identified by their package name)

The following example represents a device which, among others, has the Facebook app installed:

    {
        "udid" : "++/OarsCrkiQx5EyY/XTVxOwc4m1H2Re3m+CdiW+YeU=",
        "country" : "CH",
        "timestamp" : ISODate(" T10:18:56.531Z"),
        "apps" : [
            {
                "fit" : ISODate(" T03:47:39Z"),
                "lut" : ISODate(" T12:15:19Z"),
                "pn" : "playboard.android"
            },
            {
                "fit" : ISODate(" T08:43:32Z"),
                "lut" : ISODate(" T10:11:46Z"),
                "pn" : "com.facebook.katana"
            }
        ]
    }

This bucket contains millions of devices.
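As a small illustration of the device structure described above, the following Python sketch extracts the installed package names from one device document. The helper names `installed_packages` and `to_record` are hypothetical and not part of the project brief, and the sample record leaves out the `fit` and `lut` timestamps because only the `pn` field is needed here.

```python
# Hypothetical device record mirroring the example above; the "fit" and "lut"
# timestamps are omitted because only the package names are needed here.
device = {
    "udid": "++/OarsCrkiQx5EyY/XTVxOwc4m1H2Re3m+CdiW+YeU=",
    "country": "CH",
    "apps": [
        {"pn": "playboard.android"},
        {"pn": "com.facebook.katana"},
    ],
}


def installed_packages(device):
    """Return the sorted, de-duplicated package names installed on a device."""
    return sorted({app["pn"] for app in device.get("apps", []) if "pn" in app})


def to_record(device):
    """Flatten a device document to a (udid, country, packages) tuple."""
    return device["udid"], device.get("country"), installed_packages(device)
```

De-duplicating per device is a design choice of this sketch: it ensures that a device contributes at most once to any per-app or per-pair count computed later.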
Offline Computations Requirements

The system to be built will enable offline computations on large amounts of data. Offline computations can range from simple aggregations over devices and apps to more complex algorithms. Python and Spark have to be used to implement the offline computations (Spark provides a Python API). The output of each computation has to be stored back in Hadoop (or in the online system, MongoDB and Couchbase).

Some examples of offline computations to be supported are:

- Count the number of devices that have a specific app installed (device app frequency, DAF). This has to be computed for all apps.

    Package name           Count (DAF)
    com.facebook.katana    1,850,300

- Count the number of devices that have a given pair of apps installed together (device apps pair frequency, DAPF). The DAPF has to be computed for all pairs of apps appearing on any of the devices.

    Package name #1        Package name #2      Count (DAPF)
    com.facebook.katana    playboard.android    126,000

- Compute a score for every pair of apps based on the DAPF of the pair and the DAF of each app in the pair. Note that for any two apps, app1 and app2, two pairs have to be considered, i.e. pair (app1, app2) and pair (app2, app1). The specific formula for computing the pair score will be provided by 42matters.

    Package name #1        Package name #2      Score
    com.facebook.katana    playboard.android    0.68

- Compute the score defined in the previous point over all devices (global) and for devices in specific countries, i.e. CH, DE, US, IT.
    Country    Package name #1        Package name #2      Score
    GLOBAL     com.facebook.katana    playboard.android    0.68
    CH         com.facebook.katana    playboard.android    0.56

Project Tasks

The project requires several tasks to be accomplished:

- Loading data from Couchbase and MongoDB into Hadoop. There are Hadoop connectors that can connect to Couchbase and MongoDB. These connectors can be used to extract the data about devices and apps from the source systems in order to load it into Hadoop.

- Computation implementation. After the data about devices and apps has been loaded into Hadoop/Spark, the different offline computation algorithms described in the previous section need to be implemented. A challenging part of this task might be handling the large amount of intermediate data generated by the computations. Every possible pair of apps found on a device needs to be considered, which in the worst case is the number of apps squared (in practice it is less). Note that an advantage of using Spark in this project (compared to MapReduce) is that intermediate data does not need to be stored to disk.

Challenges

- Understanding and using the technology stack
- Mastering the distributed model of Hadoop/Spark
- Implementing the offline algorithms in a performant way
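The DAF, DAPF, and per-country pair scores described under Offline Computations Requirements can be sketched in plain Python as follows. This is a minimal single-machine sketch, not the project's Spark implementation: the sample `DEVICES` list and all function names are hypothetical, each device's `apps` is assumed to be already flattened to a list of package names, and the score formula DAPF(a, b) / DAF(a) is only a placeholder assumption, since the actual formula is to be provided by 42matters. On Spark, the same counting would typically be expressed with the RDD operations `flatMap` and `reduceByKey`.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data; "apps" is assumed to be a flat list of package names.
DEVICES = [
    {"country": "CH", "apps": ["com.facebook.katana", "playboard.android"]},
    {"country": "CH", "apps": ["com.facebook.katana"]},
    {"country": "DE", "apps": ["com.facebook.katana", "playboard.android",
                               "com.example.notes"]},
]


def device_app_frequency(devices):
    """DAF: number of devices on which each app is installed."""
    daf = Counter()
    for device in devices:
        daf.update(set(device["apps"]))  # set() counts an app once per device
    return daf


def device_apps_pair_frequency(devices):
    """DAPF: number of devices on which both apps of a pair are installed."""
    dapf = Counter()
    for device in devices:
        # Sorting gives every unordered pair a single canonical key.
        for pair in combinations(sorted(set(device["apps"])), 2):
            dapf[pair] += 1
    return dapf


def pair_scores(devices):
    """Score both ordered pairs (a, b) and (b, a) of every co-installed pair.

    ASSUMPTION: score(a, b) = DAPF(a, b) / DAF(a). This is a placeholder only;
    the actual formula is provided by 42matters.
    """
    daf = device_app_frequency(devices)
    scores = {}
    for (a, b), count in device_apps_pair_frequency(devices).items():
        scores[(a, b)] = count / daf[a]
        scores[(b, a)] = count / daf[b]
    return scores


def scores_by_country(devices, countries=("CH", "DE", "US", "IT")):
    """Scores over all devices (GLOBAL) and for each listed country."""
    result = {"GLOBAL": pair_scores(devices)}
    for country in countries:
        subset = [d for d in devices if d["country"] == country]
        result[country] = pair_scores(subset)
    return result
```

Storing each unordered pair once and expanding it to both ordered pairs only when scoring halves the intermediate data, which is relevant to the worst-case quadratic number of pairs per device mentioned above.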