Final Report. FASTMOD: A Framework for ReAltime Spatio-Temporal Monitoring of MObility Data. Gautam Thakur, Yibin Wang, Kun Li


Abstract:
In this study, we apply large-scale data science techniques to the monitoring, analysis, and modeling of vehicular traffic and user mobility in real time. Today, vehicular congestion is a major challenge to the efficient handling of traffic and transit systems. According to the latest (2010) urban mobility report, vehicular congestion caused 4.8 billion hours of extra travel (~30% extra), and drivers purchased an extra 3.9 billion gallons of fuel, for a cost of $115 billion. User mobility, on the other hand, impacts network performance. Because user mobility is sometimes driven by behavioral patterns, it creates a disconnect between wireless network activity and resource utilization. To provide real-time traffic analysis as well as behavior-driven mobility protocols and models, we use a trace-driven approach. However, the enormity of the traces requires specialized tools and techniques. To solve this problem, we propose the use of the MapReduce framework for efficient processing and analysis. We provide a framework for real-time analysis using Hadoop and give some first-order statistical results and correlations. We believe that our work and the dataset provide a much-needed contribution to the research community for realistic and data-driven design and evaluation of networks.

Introduction:
Real-time monitoring is an essential component of today's dynamic and time-critical systems. It provides instant updates on what is happening in the system in response to performance issues or reported problems. Real-time monitoring includes the analysis of raw data and the generation of statistics that describe the overall state of the system. However, the enormous size and widespread deployment of systems such as campus wireless LANs and vehicular traffic monitoring pose serious challenges to the efficient processing of large data streams. Any lag in addressing a problem not only affects current performance but also incrementally complicates the resolution process. For example, if a traffic signal malfunctions during rush hour, congestion builds up on the roads, with long queues of cars and other vehicles. Other challenges include real-time data sanitization and interpolation. To our surprise, current systems seriously overload in times of need and are inadequate in terms of scaling and architectural robustness. Existing tools are becoming inadequate for processing such large data sets and for processing data in real time.

We try to propose solutions to the following questions:
1) How to design a system that takes incoming data streams from sensors deployed at large scale? How to organize the data so that it supports both real-time data processing and archive data processing?

2) How to design a system that can run sophisticated analysis algorithms on large streaming data in real time? (Potential applications include traffic prediction, traffic causality mining, and route suggestion.)

In this project, we provide a MapReduce framework for data-intensive real-time monitoring applications based on Hadoop and the Hadoop Online Prototype (HOP). It includes:
- Real-time monitoring, data acquisition, and processing from the sensors of wireless networks, planet-scale online traffic web cameras, and potentially many other types of sensor
- Outlier detection and removal, and integration
- Data analysis, knowledge discovery, and modeling
- Graphical visualization

The responsibilities of the group members are split as follows:
- Gautam: vehicular data analysis, Hive setup
- Yibin: mobility data analysis, Hadoop setup
- Kun: Hadoop setup, Hive setup, HSQL

Background:
MapReduce [1] is a parallel processing framework proposed by Google. Since then, MapReduce has been heavily used in IT companies such as Google, Yahoo, and Facebook. According to a presentation by Dean, Google was running about 3,000 computing jobs per day through MapReduce, representing thousands of machine-days. Among other things, these batch routines analyze the latest Web pages and update Google's indexes. Among the open source implementations of the MapReduce framework, Hadoop is the most popular. We use Hadoop as our underlying parallel computing framework to run our data cleaning and data analysis jobs. T. Condie et al. proposed a pipelined version of MapReduce [8], implemented as the Hadoop Online Prototype (HOP), which supports real-time applications such as event monitoring and stream processing.

Two prototype traffic estimation and prediction systems, developed by MIT and UTX/UMD, are named DynaMIT-R and DYNASMART-X, respectively. However, both are simulation-based systems. Lin [7] proposed DynaCHINA, a specially built real-time traffic prediction system for China. Singapore's Land Transport Authority, together with IBM, developed a traffic estimation and prediction tool that uses historical traffic data and real-time feeds with flow conditions from several sources to predict levels of congestion up to an hour in advance. The pilot results show overall prediction accuracy above 85 percent. The Berkeley Millennium project [9] also targets building a real-time transportation monitoring system.

In a general sense, our approach to tracking real-time traffic and mobility data can be applied to ease traffic congestion, city planning, and resource allocation problems. Specifically, the ever-increasing problem of vehicular traffic congestion on the roads has become severe around the world. In the latest (2010) urban mobility report [1], congestion caused urban Americans to travel 4.8 billion hours more and to purchase an extra 3.9 billion gallons of fuel, for a cost of $115 billion. On average, the yearly peak-period delay caused by traffic congestion for the average commuter was 34 hours, and the cost to the average commuter has increased by 230% in two decades [1]. Congestion affects people not only during the peak period but also at other hours: approximately half of the total delay occurs in the midday and overnight.

System Description:

Framework:
In the following figure, we outline the proposed framework. It consists of three components: (i) Real-Time Monitoring and Processing, (ii) Knowledge Discovery, and (iii) Modeling. The Hadoop/HOP implementation involves a multi-stage distributed architecture for each of these components that includes several mappers and reducers.

Components:
- Monitoring sensors: Monitoring sensors are deployed over a certain area (city, campus). They continuously upload sensed data.
- Crawler (on mappers): The crawlers are Hadoop mappers that crawl the readings from all monitoring sensors. They fetch the readings in a parallel download fashion.
- Processing engine: After the crawlers have downloaded the readings into HDFS, the processing engine starts to preprocess the raw readings. This includes outlier detection, density estimation, and mobility processing. We use the MapReduce framework for all processing tasks.
- Hive: A SQL-like, Hadoop-based query system. We run Hive queries over the output of the processing engine. This helps us obtain first-order statistics from the preprocessed data.
- HSQL: After gathering the statistics from Hive queries, we propose a hybrid approach for deeper analysis of the data, such as correlation and prediction. These tasks are done using external scripting languages such as R, Matlab, and Python (under the Hadoop framework), since they already have well-established libraries for statistical analysis. We call this Hyper-SQL.

Dynamic query:

Image algorithm:

We aim to estimate the traffic density (d) on roads, considering the number of vehicles or pedestrians crossing the road. We have a sequence of images captured by webcams. We have to separate the information we need, e.g. the number of vehicles and pedestrians, from the background image, which is normally the road and the surrounding buildings. The main factor that distinguishes vehicles from the background (road, buildings) is that vehicles do not stay stationary for a long period of time, whereas the background does.

The solution is therefore to apply a form of high-pass filtering over the sequence of images captured by a webcam over time. The high-pass filter removes the stationary part of the images (road, buildings, etc.) and keeps the moving components (mainly vehicles). To implement such a high-pass filter, we subtract the result of a low-pass filter over a sequence of images from each still image. This is practically equivalent to implementing a high-pass filter over the sequence of images.

To obtain the low-pass filtering effect, we run a moving average filter over a time sequence of images obtained from one webcam. The duration of the moving average filter can be adjusted in an ad hoc way. The moving average filter is implemented by averaging the intensity maps of several images within a certain duration: at its output, the intensity of each pixel is the average intensity of the corresponding pixels in the interval. The output of the moving average filter (low-pass filter) is normally the required background image, i.e. the still part of the image. Therefore, subtracting the output of the low-pass filter from each image gives us the moving components (e.g. vehicles). In the high-pass component of the image, the vehicles are highlighted against the background.

One could then use regular object detection techniques to identify and count the vehicles in the high-pass filtered image. However, this is computationally expensive and unnecessary. As an alternative, we simply count the number of active pixels (pixels with a value higher than a certain threshold). This is much faster than detecting and counting objects in an image. It is also more effective, for two reasons. First, we are looking at traffic density (d), i.e. the percentage of the street (road) that is covered by vehicles, as an indicator of how crowded the street is, rather than the number of vehicles. The number of vehicles is not a good indicator of crowdedness, since a long vehicle may introduce more traffic than a small one. Second, our method overcomes the issues that object detection faces in cases of severe congestion. Counting the number of active pixels indicates what percentage of the road is covered, no matter how many vehicles are on the road.

In many instances, images are duplicated, zero-sized, or corrupted with extraneous bytes (noise). We use semi-supervised learning and hierarchical clustering to overcome the challenges of outlier detection and removal. The adjoining figure shows the algorithm output.
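The image algorithm described above can be sketched in a few lines. This is a minimal illustration, not the project's code: the names `moving_average_background` and `traffic_density` are hypothetical, frames are plain nested lists of grayscale intensities to keep the sketch dependency-free, the density is taken over the whole frame because no road mask is described in the report, and the absolute difference is used (an assumption) so that vehicles darker than the road also register as active pixels.

```python
from typing import List

Frame = List[List[float]]  # grayscale intensities, indexed [row][column]


def moving_average_background(frames: List[Frame]) -> Frame:
    """Low-pass filter: per-pixel mean over all frames in the window."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [sum(frame[r][c] for frame in frames) / n for c in range(cols)]
        for r in range(rows)
    ]


def traffic_density(frame: Frame, background: Frame, threshold: float) -> float:
    """Fraction of active pixels in the high-pass (frame minus background) image.

    Assumption: a pixel is "active" when its absolute difference from the
    background exceeds the threshold; the whole frame is treated as road.
    """
    active = 0
    total = 0
    for row, bg_row in zip(frame, background):
        for pixel, bg in zip(row, bg_row):
            total += 1
            if abs(pixel - bg) > threshold:
                active += 1
    return active / total


if __name__ == "__main__":
    # Four tiny 2x2 frames; a bright "vehicle" appears in the last one.
    empty = [[10.0, 10.0], [10.0, 10.0]]
    with_vehicle = [[200.0, 10.0], [10.0, 10.0]]
    window = [empty, empty, empty, with_vehicle]
    background = moving_average_background(window)
    print(traffic_density(with_vehicle, background, threshold=50.0))
```

The window length and the threshold are the two tuning knobs; the report notes that the filter duration is adjusted in an ad hoc way, so both values here are placeholders.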

The data product:
1. A model of a distributed system that is capable of receiving and processing raw data for real-time analysis, and also capable of collaboratively organizing received data into an archive for historical analysis, using Hadoop/HOP.
2. A multi-user query engine that serves the purpose of sending vehicular traffic updates.
   a. This can be done by creating a Hadoop-like architecture for distributing queries based on user requests. The queries can cover prediction, current traffic updates, historical data, and inference of future status.
3. A dynamic model for finding optimized routes based on the start and end locations (or any other prediction/optimization algorithm or analysis method that utilizes the incoming data stream).
4. Outlier detection and removal techniques.
   a. A set of inter-connected and sequenced processes for data processing.
   b. A caching model to reduce the number of MapReduce jobs.
5. A visualization system to track vehicular and mobility data in real time.

Method:
In the adjoining figure, we illustrate a step-by-step process to achieve the goal of near-real-time traffic monitoring and modeling.
1. Start scripts.
2. Crawlers capture pictures or user mobility instances every few minutes. These crawlers internally store images to external data storage.
3. The external data storage is the unit that maintains the image archive.
4. To process the images and mobility records in real time, we copy the downloaded raw data into HDFS. Since this copying happens record by record and each record is small, we circumvent the issue of copy times. It takes a few seconds to copy individual records after the crawler downloads them.
5. HDFS is a single repository, distributed over many disks, that stores the processed data.
6. Next, we run outlier detection and estimation algorithms to extract traffic and mobility information.
7. This information is then stored in Hive, which later provides an interface for SQL-type queries.
8. Along with other scripts, we use HSQL queries to extract information from the Hive database. These queries are similar to SQL and provide the added benefit of interacting directly with the database.
9. Since HSQL queries alone are not sufficient, we add certain Java-based procedures to obtain information related to hotspots, current traffic updates, and predictions.

Framework:
In this section, we describe our proposed framework, shown in the adjoining figure, which comprises three parts: (i) measurements and pre-processing, (ii) knowledge discovery, and (iii) modeling and analysis. The measurements and pre-processing component is responsible for capturing imagery snapshots, sanitizing data, and generating a quantifiable value of vehicular traffic, hereafter known as traffic density (d). We store the processed data in Hive for further querying. The knowledge discovery component focuses on applying data mining tools to extract traffic patterns and spatio-temporal information. This activity can help to develop rich mobility scenarios. Next, the modeling and analysis component focuses on characterizing the vehicular traffic densities. It can aid in designing and developing new data-driven vehicular mobility models and simulators. Finally, applications such as visualization can be developed from the analysis of the previous components.

Real Time Monitoring and Processing:
We view the connected global network of webcams as a highly versatile platform, enabling an untapped potential to monitor global trends or changes in the flow of a city, and providing large-scale data to realistically model vehicular, or even human, mobility. We also download and process mobility records from access point controllers that are deployed on campus. On average, we download 15 gigabytes of imagery data per day from over 2,700 traffic web cameras, with an overall dataset of 7.5 terabytes containing around 125 million images. To speed up image processing, we use background subtraction, a technique with low turnaround time. The mobility records from campus are text based and do not require any special processing. The processed information is then saved in Hive, and the processed images are removed from HDFS.

Knowledge Discovery:
We did some initial traffic correlation analysis to measure the degree to which the traffic at a camera is linearly associated with itself over 42 days. Traffic congestion shows high correlation (80%) for a lag of 1-2 hours. The correlation decreases significantly, to ~25-30%, for a lag of 4 hours.

Modeling and Analysis:
Here, we focus on modeling empirical traffic densities against known theoretical distributions. The objective of this study is to help understand the underlying statistical patterns. We find that traffic at individual cameras can vary a lot, but in general the log-logistic, gamma, and Weibull distributions can capture some of the key features. In the case of mobility data, we find a normal distribution of user traffic on a campus-wide scale, with peaks occurring during noon hours.

Applications:
The experience gained from the analysis and modeling of traffic densities potentially aids in the future design and evaluation of vehicular networks. To aid visualization, we are developing applications to demonstrate traffic conditions on desktop and handheld devices. In the adjoining figure, we show scenarios for vehicular traffic visualization.
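The lag correlation reported under Knowledge Discovery can be illustrated with a short sketch. It assumes that the measure is the Pearson correlation between a camera's density series and a copy of itself shifted by a fixed lag; the report does not state the exact estimator, and the function names `pearson` and `lag_correlation` are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lag_correlation(series: Sequence[float], lag: int) -> float:
    """Correlate a series with itself shifted by `lag` samples."""
    if lag <= 0 or lag >= len(series) - 1:
        raise ValueError("lag must be between 1 and len(series) - 2")
    return pearson(series[:-lag], series[lag:])


if __name__ == "__main__":
    # Synthetic series that repeats every 4 samples (not project data).
    densities = [0.0, 1.0, 0.0, -1.0] * 6
    print(lag_correlation(densities, 4))  # one full period apart
    print(lag_correlation(densities, 2))  # half a period apart
```

With hourly density samples, a lag of 1 or 2 corresponds to the 1-2 hour lag discussed above, and a lag of 4 to the 4 hour case.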

Experiments:

Dataset:
We use two spatio-temporal data sets. The first data set contains wireless LAN traces of mobile users, and the second contains vehicular images captured from online traffic web cameras. The collective size of the data is well over seven TB.

User Mobility Data:
We collect different types of traces via network switches, including netflows, DHCP logs, and wireless access point (AP) session logs (MAC traps). The wireless session log is collected by each wireless AP or switch port (i.e., an aggregate of APs in a building). The trace includes the start and end events for device associations (when they visited or left that specific AP), the device's MAC address, the date and time of those events, and the AP (or switch) IP and port numbers. From the above, we can derive the association history (i.e., the location and time of user association) for all MAC addresses. The DHCP log contains the dynamic IP assignments to MAC addresses: the listed IP is given to the MAC address at the indicated date and time. User mobility is then extracted from the AP association log, provided that every AP location is predetermined.

Vehicular Data:
We utilize online traffic web cameras as the primary source of data collection. These web cameras are installed on highways and at critical traffic signals of the cities under study. At regular time intervals, they capture still pictures of ongoing road traffic and send them in the form of visual feeds to Department of Transportation (DoT) media servers. For this work, we collaborate with 10 cities (DoTs) across the globe. The details of the cities and the data set are given in Table-1. We view the connected global network of these webcams as a highly versatile platform, enabling us to visualize the traffic flow of a city and realistically model vehicular, or even human, mobility. We download these images and store them in our media image storage server.

Experiment Results:
This project targets a prototype of a real-time streaming data tracking system using Hadoop, specifically for vehicular and mobility data. We therefore present the results of the analysis and the system built for it.

Mobility tracking results:
We ran two experiments using mobility data to test the tracking capability of the proposed system. For each experiment, we use animation to showcase the result, as shown in our presentation. We show snapshots of these results here and briefly describe the experiments.

(1) Aggregate user movement tracking
We track the aggregate user movement among all buildings on campus. Recall that, using WiFi log data collected on campus, we can determine user location in terms of buildings. This kind of tracking can reveal interesting correlations between buildings and help administrators easily pinpoint events in terms of user dynamics over time. In the following figures, the matrix shows the aggregated user movement transitions among buildings. Each row represents one building index, and entry (i, j) is the number of users who transit from building i to building j in the captured time window. We use a heat map to show the density of user movement and how it changes over time.
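The building-to-building transition matrix described above can be sketched as follows. This is an illustration under assumptions, not the project's implementation: the record layout `(mac, timestamp, building_index)` and the name `transition_matrix` are hypothetical, AP associations are assumed to be already mapped to building indices, and consecutive associations within the same building are not counted as transitions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (device MAC address, association time in seconds, building index)
Record = Tuple[str, int, int]


def transition_matrix(
    records: Iterable[Record], n_buildings: int, start: int, end: int
) -> List[List[int]]:
    """Count moves from building i to building j within [start, end)."""
    visits_by_user: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for mac, timestamp, building in records:
        if start <= timestamp < end:
            visits_by_user[mac].append((timestamp, building))

    matrix = [[0] * n_buildings for _ in range(n_buildings)]
    for visits in visits_by_user.values():
        visits.sort()  # chronological order per device
        for (_, src), (_, dst) in zip(visits, visits[1:]):
            if src != dst:  # assumption: staying in a building is not a move
                matrix[src][dst] += 1
    return matrix


if __name__ == "__main__":
    log = [
        ("aa", 0, 0), ("aa", 10, 1), ("aa", 20, 1), ("aa", 30, 2),
        ("bb", 5, 0), ("bb", 15, 1),
        ("cc", 100, 2), ("cc", 500, 0),  # outside the window below
    ]
    for row in transition_matrix(log, n_buildings=3, start=0, end=60):
        print(row)
```

Each row of the printed matrix can be fed directly to a heat map; sliding the `[start, end)` window forward produces the frames of an animation.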

(2) Google Earth user density tracking
As a complement to the previous user movement tracking, we visualize the number of users in each building at each point in time using Google Earth animation. We pre-process the mobility trace data together with the coordinates of each building in the Hive database, and then use KML generation code as the reducer function to generate a KML file, which is the input file type for Google Earth. Finally, we can visualize the data in Google Earth in real time. The system also supports query-based filtering, which allows the user to track a specific area, time window, or particular user by specifying a Hive query statement. In our presentation, we showed an animation with a time window from 10 am to 11 am.

Vehicular Data Result:
For vehicular image data, we show two types of results using our tracking system: (1) analysis data that show the traffic density of a specific intersection or road over time, and (2) the animation of the real-time tracking system, with raw image data along with processed data at each step and a density plot in real time.

(1) Traffic density
The following four figures show examples of different traffic densities captured and analyzed by our tracking system. Note that the vehicular data, being derived from image data, requires more processing power for pre-processing. It can therefore better show the real-time tracking feature of our system. From top to bottom, these figures show high traffic, low traffic, random traffic, and rush hour traffic. The x axis shows the day index and the y axis is the hour of day. This analysis can be used for more complex real-time analysis, such as the correlation of traffic density over time. The system supports Hive queries to change both the spatial and the temporal dimension of the analysis.

(2) Real time traffic tracking animation
As we showed in the presentation, our real-time traffic tracking system is capable of tracking the system at different stages. First, it shows the raw image data captured by the camera; then it shows the images processed by the image analysis algorithm. Finally, it shows the real-time plot of traffic density for the area of interest. The system supports Hive queries to change the area of interest.

Performance:
We test the system performance for (1) copy time, (2) running load, and (3) system load, to compare with the single-machine case. We use two machines running Hadoop and Hive for all our analysis and performance benchmarks. We expect to see more of a performance advantage if our system is deployed on more machines. The first figure shows the relationship between data size and the amount of time needed to copy the data from the local file system to HDFS for MapReduce processing. It shows the performance of the batch job that copies crawled data from the local file system to HDFS.

Concluding Remarks:
In this project, we applied the MapReduce technique using Hadoop to process and analyze large data. We specifically took two different cases of large data processing, from user mobility and vehicular networks. We showed that, using the MapReduce framework, we can achieve near-real-time processing and visualization. In the future, we will look into interactive query processing. In this work, we also introduced a novel framework for large-scale monitoring, analysis, and modeling of vehicular traffic and user mobility. We showed how we can overcome the challenge of data overload by achieving near-real-time performance. However, we agree that this is a case-specific activity that makes more sense for us, as the sensor data arrives and is processed in real time. Our performance analysis results show that distributed Hadoop not only accelerates the process ... Finally, we believe that our work will help the community to use the MapReduce framework for large data analysis in near real time.

Reference:

1. J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1).
"IBM and Singapore's Land Transport Authority Pilot Innovative Traffic Prediction Tool". IBM press release.
7. Y. Lin and H. Song (2007). DynaCHINA: Specially built real-time traffic prediction system for China. Presented at the 86th Annual Meeting of the Transportation Research Board, Washington, DC.
8. T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein. MapReduce Online.

Massive Cloud Auditing using Data Mining on Hadoop

Massive Cloud Auditing using Data Mining on Hadoop Massive Cloud Auditing using Data Mining on Hadoop Prof. Sachin Shetty CyberBAT Team, AFRL/RIGD AFRL VFRP Tennessee State University Outline Massive Cloud Auditing Traffic Characterization Distributed

More information

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after

More information

Advanced Big Data Analytics with R and Hadoop

Advanced Big Data Analytics with R and Hadoop REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QlikView Technical Case Study Series Big Data June 2012 qlikview.com Introduction This QlikView technical case study focuses on the QlikView deployment

More information

Integrating a Big Data Platform into Government:

Integrating a Big Data Platform into Government: Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government

More information

Big Data and Market Surveillance. April 28, 2014

Big Data and Market Surveillance. April 28, 2014 Big Data and Market Surveillance April 28, 2014 Copyright 2014 Scila AB. All rights reserved. Scila AB reserves the right to make changes to the information contained herein without prior notice. No part

More information

Why Big Data in the Cloud?

Why Big Data in the Cloud? Have 40 Why Big Data in the Cloud? Colin White, BI Research January 2014 Sponsored by Treasure Data TABLE OF CONTENTS Introduction The Importance of Big Data The Role of Cloud Computing Using Big Data

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

Large-Scale TCP Packet Flow Analysis for Common Protocols Using Apache Hadoop

Large-Scale TCP Packet Flow Analysis for Common Protocols Using Apache Hadoop Large-Scale TCP Packet Flow Analysis for Common Protocols Using Apache Hadoop R. David Idol Department of Computer Science University of North Carolina at Chapel Hill david.idol@unc.edu http://www.cs.unc.edu/~mxrider

More information

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe

More information

Log Mining Based on Hadoop s Map and Reduce Technique

Log Mining Based on Hadoop s Map and Reduce Technique Log Mining Based on Hadoop s Map and Reduce Technique ABSTRACT: Anuja Pandit Department of Computer Science, anujapandit25@gmail.com Amruta Deshpande Department of Computer Science, amrutadeshpande1991@gmail.com

More information

Best Practices for Hadoop Data Analysis with Tableau

Best Practices for Hadoop Data Analysis with Tableau Best Practices for Hadoop Data Analysis with Tableau September 2013 2013 Hortonworks Inc. http:// Tableau 6.1.4 introduced the ability to visualize large, complex data stored in Apache Hadoop with Hortonworks

More information

Understanding traffic flow

Understanding traffic flow White Paper A Real-time Data Hub For Smarter City Applications Intelligent Transportation Innovation for Real-time Traffic Flow Analytics with Dynamic Congestion Management 2 Understanding traffic flow

More information

Bringing Big Data Modelling into the Hands of Domain Experts

Bringing Big Data Modelling into the Hands of Domain Experts Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks david.willingham@mathworks.com.au 2015 The MathWorks, Inc. 1 Data is the sword of the

More information
