In-memory Distributed Processing Method for Traffic Big Data to Analyze and Share Traffic Events in Real Time among Social Groups


Dojin Choi 1, Bosung Kim 2, Insu Bae 2 and Seokil Song 1*

1 Department of Computer Engineering, Korea National University of Transportation, Chungju, Chungbuk, Republic of Korea
2 Department of Information Technology Convergence, Korea National University of Transportation, Chungju, Chungbuk, Republic of Korea
mycdj91@ut.ac.kr, jikol2000@ut.ac.kr, gkdrmf23@ut.ac.kr, sisong@ut.ac.kr

Abstract

In this paper, we propose an in-memory distributed processing method that rapidly processes vehicle location and traffic event data using Spark Streaming. The proposed system shares information about surrounding vehicles, pedestrians, and traffic events in real time with drivers who use the WEVING service. In the proposed method, vehicle location and traffic event streams are indexed with a grid indexing technique according to time, and continuous range queries are processed using the index. In addition, traffic events transferred in real time are grouped by occurrence time, location, content, and road segment in order to avoid duplicated traffic events. Through experiments, we show that the proposed method is able to deduplicate similar traffic events efficiently.

Keywords: traffic data, big data, social group, Spark Streaming

1. Introduction

Road congestion has a strongly negative impact on the community and the environment. Congestion can be reduced in two ways. The first is to increase road capacity, which is very difficult, especially in urban environments. The second is to reduce demand in congested areas by providing information about road status so that drivers change their mode of transport, route, or travel time. Collecting traffic information with infrastructure sensors such as CCTV cameras and in-road loop sensors has limits.
Such sensors cannot cover the entire road network unless they are deployed along all of it, at enormous cost. Recently, with the rapid increase in vehicles equipped with communication-capable smart devices such as black boxes and smartphones, it has become possible to collect and share traffic information with those devices instead of infrastructure sensors. Various applications and services for traffic safety and convenience have been developed on this basis.

WEVING (WE are driving together) is a social driver assistance system [1] with a smartphone application that automatically detects traffic events (e.g., delays, congestion, accidents, and road conditions) using the cameras, GPS (Global Positioning System) receivers, gyroscope sensors, and acceleration sensors mounted in smartphones. The detected traffic events, event images, and current locations are shared with users in social groups via WEVING. If many vehicles use the WEVING service and their current locations are sampled frequently, sharing vehicle locations, traffic events, and images between social groups creates a large workload for the WEVING server.

ISSN: IJSEIA Copyright © 2016 SERSC

In particular, sharing the locations of surrounding vehicles and traffic events in real time is essential, because processing delays caused by server overload can make the shared locations and traffic events meaningless. In addition, traffic events detected in areas of heavy road traffic are likely to be duplicated. Therefore, a single server that processes data by storing them on hard disks cannot meet the requirements of applications such as WEVING. Meeting these requirements calls for in-memory processing, together with managing and analyzing data in parallel across multiple servers, so that data can be stored and searched quickly.

In this study, a method for storing and managing traffic event and vehicle location data is proposed using Spark [2] and Spark Streaming [3], two distributed in-memory processing platforms. Spark Streaming is an in-memory stream processing system that receives messages, logs, or text files in real time and processes them in batches collected over a predetermined interval (less than 1 sec). Spark Streaming is based on the RDD (Resilient Distributed Dataset) [2] model of Spark. RDDs provide recovery for data held in memory, so data can be processed without being stored on hard disks. Spark Streaming processes the data received in each interval as RDDs, thereby providing fault-tolerant stream processing for applications built on it.

In this paper, we propose a method that rapidly processes vehicle location and traffic event data as stream data using Spark Streaming. The proposed method indexes the vehicle location and traffic event streams transferred in real time according to location and time by means of a grid indexing technique. The grid indexing technique used in this paper is a version of an existing grid indexing technique [4], modified to suit Spark Streaming.
Vehicles that use the WEVING service set a region of interest based on the current vehicle location and send it to the server in order to request information about other vehicles, pedestrians, and traffic events in that region. The WEVING server treats the region of interest sent by a moving vehicle as a CRQ (Continuous Range Query). To process CRQs effectively, this paper proposes a grid indexing-based CRQ technique. In addition, this paper proposes a method that avoids unnecessary traffic event sharing by grouping identical traffic events using the occurrence time, location, and content of the traffic events sent in real time. This traffic event classification method is also based on Spark Streaming.

3. Related Work

The proposed method uses the distributed grid index of [5], which is based on Spark Streaming. To index and query vehicles, [5] adds transformation and output operators such as bulkload, bulkinsert, splitindex, and search to Spark Streaming. The input stream consists of the positions of moving objects, which are transmitted periodically. Spark Streaming transforms the input stream into D-Streams. As shown in Figure 1, the input stream is continuously transformed into DS_t, DS_{t+1}, DS_{t+2}, and DS_{t+3}, and the bulkload and bulkinsert operators are performed on each D-Stream. bulkload builds a grid index GI_t from the position data in DS_t. The indexing method uses grid techniques to reduce the time needed to build and update an index. It does not use a lock-based concurrency control method: the D-Stream model of Spark Streaming is immutable, so update operations and search operations are not performed concurrently on the same index.
As shown in Figure 1, an index is updated by the bulkload and bulkinsert operators, and multiple versions of the index, for times t, t+1, t+2, and t+3, are kept in main memory where users can access them. Therefore, users can access GI_{t+2} while GI_{t+3} is being built.
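The versioning idea described above can be illustrated with a minimal plain-Python sketch. It is not the Spark Streaming operator implementation of [5]: the names `bulkload` and `bulkinsert` are borrowed from the text, but their signatures, the `cell_of` helper, the `CELL_SIZE` value, and the dictionary layout are assumptions made for illustration. Each call returns a new index and leaves the previous version untouched, so a reader can keep using one version while the next is being built.

```python
import math
from collections import defaultdict

CELL_SIZE = 1.0  # assumed cell width, in the same unit as the coordinates


def cell_of(x, y, cell_size=CELL_SIZE):
    """Return the (column, row) of the grid cell containing point (x, y)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def bulkload(batch, cell_size=CELL_SIZE):
    """Build a new grid index from a batch of (object_id, x, y) positions.

    The index maps each non-empty cell to {object_id: (x, y)}.
    """
    cells = defaultdict(dict)
    for object_id, x, y in batch:
        cells[cell_of(x, y, cell_size)][object_id] = (x, y)
    return dict(cells)


def bulkinsert(previous, batch, cell_size=CELL_SIZE):
    """Return the next index version; `previous` is never modified."""
    # Start from the latest known position of every object ...
    positions = {
        object_id: pos
        for objects in previous.values()
        for object_id, pos in objects.items()
    }
    # ... overwrite the objects that reported a new position in this batch ...
    for object_id, x, y in batch:
        positions[object_id] = (x, y)
    # ... and rebuild, so that moved objects leave their old cells.
    return bulkload(
        [(oid, x, y) for oid, (x, y) in positions.items()], cell_size
    )


# Versions are appended, never overwritten: readers can still use
# versions[-2] while the next version is being computed.
versions = [bulkload([("car-1", 0.5, 0.5), ("car-2", 1.5, 0.5)])]
versions.append(bulkinsert(versions[-1], [("car-1", 2.5, 2.5)]))
```

Rebuilding the whole index for every batch keeps the sketch short; an implementation aiming at low update cost would more likely touch only the cells affected by the batch.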

Figure 1. Build Grid Index with Input Stream

2. Proposed In-Memory Traffic Big Data Processing Technique

The server processes received data differently depending on whether the data are a client's current location or a traffic event. Through the CRQ, the server provides search results on the surrounding vehicles and traffic events to be displayed on the client screen. The CRQ sent by the client is revised as the vehicle travels, and the server computes the result for the revised CRQ region in real time and sends it to the client.

The server obtains the road segment for the received vehicle location from an external map server and stores the vehicle location together with the road segment identifier. At the same time, the vehicle's current location is indexed by the grid index manager, and the continuous query result list is updated by the continuous query processing manager. Similarly, the traffic event manager in the server indexes each received traffic event through the grid index manager and searches the indexed traffic events in order to group similar events. For each group, a representative traffic event is selected and reflected in the CRQ result of each client by the continuous query processing manager. The push manager sends the updated continuous query result to each client for display on the screen.

This procedure is designed and implemented on Spark Streaming. The grid index and the continuous query list are kept in memory as Spark RDDs. Because RDDs cannot be modified, this study also proposes a management method that keeps multiple versions, which is designed to provide concurrency control between read and write operations on the grid index and the continuous query result list.
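As a rough illustration of how a continuous query result list could be refreshed for each batch of locations, the following plain-Python sketch registers every client's region of interest in the grid cells it overlaps and then tests each incoming position only against the queries registered in its own cell. Rectangular regions, the `CELL` width, and all function names are assumptions made for illustration; the paper's implementation runs on Spark Streaming in Scala.

```python
import math
from collections import defaultdict

CELL = 1.0  # assumed grid cell width


def cell_of(x, y, cell=CELL):
    """Return the grid cell that contains point (x, y)."""
    return (math.floor(x / cell), math.floor(y / cell))


def cells_overlapping(region, cell=CELL):
    """Yield every cell touched by region = (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = region
    first_col, first_row = cell_of(min_x, min_y, cell)
    last_col, last_row = cell_of(max_x, max_y, cell)
    for col in range(first_col, last_col + 1):
        for row in range(first_row, last_row + 1):
            yield (col, row)


def register_queries(queries, cell=CELL):
    """Map each cell to the client ids whose region overlaps that cell."""
    by_cell = defaultdict(list)
    for client_id, region in queries.items():
        for c in cells_overlapping(region, cell):
            by_cell[c].append(client_id)
    return by_cell


def contains(region, x, y):
    """Exact containment test, applied after the coarse cell filter."""
    min_x, min_y, max_x, max_y = region
    return min_x <= x <= max_x and min_y <= y <= max_y


def evaluate_batch(queries, batch, cell=CELL):
    """Return {client_id: set of object ids inside that client's region}."""
    by_cell = register_queries(queries, cell)
    results = {client_id: set() for client_id in queries}
    for object_id, x, y in batch:
        # Only the queries registered in the object's cell are candidates.
        for client_id in by_cell.get(cell_of(x, y, cell), ()):
            if contains(queries[client_id], x, y):
                results[client_id].add(object_id)
    return results


queries = {"client-A": (0.5, 0.5, 1.5, 1.5)}
batch = [("car-1", 1.0, 1.0), ("car-2", 1.8, 1.8)]
print(evaluate_batch(queries, batch))  # {'client-A': {'car-1'}}
```

In the example, `car-2` falls in a cell that the query overlaps but lies outside the region itself, which is why the exact containment check is still needed after the cell lookup.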

Figure 2. Architecture of the Proposed In-memory Big Traffic Data Processing System

Figure 3 shows the overall structure of the CRQ processing technique, designed on Spark Streaming, that searches for surrounding vehicles according to client location. The messages transferred from clients are collected for a certain time (from less than 1 sec to several seconds) to create an RDD. The created RDD is then reduced to a location-only RDD through preprocessing, and the index generator transforms it into version n of the grid index RDD.

Figure 3. Architecture of the Continuous Query Processing Method

Figure 4 shows the CRQ processing method based on the grid index. The continuous query processor in Figure 2 registers each CRQ with the cells of the created grid index that it covers. A query can span multiple cells, so a single query can be registered in the query lists of multiple cells. The algorithm to process the CRQ is described in Figure 5.

Figure 4. Continuous Query Processing Method using Grid Index

    Create_CQ(object) {
      old_CQ = object.CQ
      old_CQ.Reference -= 1
      if (old_CQ.Reference == 0) {
        old_CQ.MBR = object.MBR
      } else {
        new_CQ = new CQ(object.MBR)
        new_CQ.Reference_Objs += object
      }
    }

    Update_CQ(CQ, objects) {
      for (i <- 0 until objects.length) {
        for (j <- 0 until CQ.length) {
          CQ[j].Contain_Objs = Contain_Check(CQ[j].MBR, objects[i].location)
        }
      }
      refreshMessage(CQ.Reference_Objs)
    }

Figure 5. Algorithm for CRQ Creation and Update

The CRQ processing technique above is explained in terms of searching for vehicles' current locations. The CRQ of each client should retrieve not only moving objects, such as surrounding vehicles and pedestrians, but also traffic events that occur nearby. CRQ processing for traffic events works in the same way, so a detailed explanation is omitted here.

The traffic events received at the server can be duplicated, because the same traffic event can be detected by several vehicles moving through the same location. The problem is worse in heavy traffic, where the number of duplicated traffic events is large. In this paper, when the same events are detected, they are grouped, and a representative event is selected from the group and shared with users within the social group. The event grouping procedure is shown in Figure 6.
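The grouping rule can be sketched as follows in plain Python. Two events are treated as the same when they have the same type, lie on the same road segment, and are close in time and space. The thresholds `MAX_DISTANCE` and `MAX_TIME_GAP`, the `Event` fields, and the choice of the earliest event as a group's representative are assumptions, since the paper does not specify them. For brevity the sketch scans all existing groups linearly instead of looking up candidates through the grid index.

```python
import math
from dataclasses import dataclass

MAX_DISTANCE = 100.0  # assumed spatial threshold, in coordinate units
MAX_TIME_GAP = 60.0   # assumed temporal threshold, in seconds


@dataclass(frozen=True)
class Event:
    event_id: str
    kind: str         # e.g. "rain", "snow", "accident", "delay"
    x: float
    y: float
    timestamp: float  # seconds
    segment_id: str   # road segment identifier


def same_event(a, b, max_distance=MAX_DISTANCE, max_time_gap=MAX_TIME_GAP):
    """Decide whether two reports describe the same traffic event."""
    return (
        a.kind == b.kind
        and a.segment_id == b.segment_id
        and abs(a.timestamp - b.timestamp) <= max_time_gap
        and math.hypot(a.x - b.x, a.y - b.y) <= max_distance
    )


def group_events(events):
    """Group reports; the first element of each group is its representative."""
    groups = []
    for event in sorted(events, key=lambda e: e.timestamp):
        for group in groups:
            # Compare against the representative (earliest report) only.
            if same_event(group[0], event):
                group.append(event)
                break
        else:
            # No matching group was found, so this report starts a new one.
            groups.append([event])
    return groups


reports = [
    Event("e1", "accident", 0.0, 0.0, 0.0, "s1"),
    Event("e2", "accident", 10.0, 0.0, 5.0, "s1"),   # duplicate of e1
    Event("e3", "rain", 10.0, 0.0, 5.0, "s1"),       # different type
    Event("e4", "accident", 10.0, 0.0, 5.0, "s2"),   # different road segment
    Event("e5", "accident", 0.0, 0.0, 500.0, "s1"),  # too late to be e1
]
representatives = [group[0].event_id for group in group_events(reports)]
```

Only the representatives would then be pushed to clients, so `e2` is never shared separately from `e1`.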

Figure 6. Process of the Traffic Event Grouping

The server verifies whether a received message is of the event type and, if so, adds it to the traffic event index. Indexing of the traffic event is performed through the previously described grid index manager. Next, the traffic event index is searched by location, time, and type to find candidate groups for the same event, and finally it is determined whether the retrieved events are located on the same road segment. If they are determined to be the same event, the new event is added to the existing event group; otherwise, a new event group is created.

3. Experiments

The event data set used in the experiment was created by modifying the traffic data set provided by [8]. This data set contains 5,000 moving objects that traveled for 20 secs. The moving objects are created randomly within a fixed region. Each record contains the moving object ID, type (newly created moving object or active moving object), latitude, longitude, and timestamp, as shown in Figure 7.

Figure 7. Sample of Traffic and Event Data: (a) Sample of Traffic Data Set, (b) Sample of Event Data Set

Our in-memory distributed traffic big data system is implemented on Spark Streaming. The cluster used in our experiment consists of 8 nodes, each with 2 Intel CPUs, 8 GB of RAM, and a 500 GB HDD. The system is written in Scala.

The event data set contains 100,000 virtual event records (5,000 moving objects × 20 timestamps), created by adding an event column to the data shown in Figure 7(a). The resulting event data set is shown in Figure 7(b); it contains four event types (rain, snow, accident, and delay). Figure 8 shows the event grouping results. The 100,000 event records are grouped into 735 events. The left side of the figure shows the events prior to grouping, and the right side shows the grouped events on the map.

Figure 8. Result of the Event Grouping

4. Conclusion

This paper proposed a method that rapidly processes vehicle location and traffic event data using Spark Streaming in order to share information about surrounding vehicles, pedestrians, and traffic events in real time with drivers who use the WEVING service. In the proposed method, vehicle location and traffic event streams transferred in real time are indexed with a grid indexing technique according to time, and CRQs are processed using the index. Furthermore, traffic events transferred in real time are grouped by occurrence time, location, content, and road segment so that only a representative traffic event is shared. In the event grouping experiment, 100,000 events were grouped into 735 representative events.

Acknowledgments

This research was supported by a grant (14tlrp-c ) from the Transportation & Logistics Research Program (TLRP) funded by the Ministry of Land, Infrastructure and Transport of the Korean government, and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF- 2014R1A1A ).

References

[1] H. Li, Y. Lee, B. Kim, D. Choi, I. Bae, S. Song, M. Yeo and R. Oh, WEAVING: social driving assistant system, In Proceedings of ISITC, (2014).
[2] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker and I. Stoica, Spark: cluster computing with working sets, Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, (2010).
[3] M. Zaharia, T. Das, H. Li, S. Shenker and I. Stoica, Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters, Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Computing, (2012).
[4] X. Xiong, M. F. Mokbel and W. G. Aref, LUGrid: update-tolerant grid-based indexing for moving objects, In Proceedings of MDM, (2006).
[5] H. Li, Y. Lee and S. Song, Grid based Distributed In-memory Indexing for Moving Objects, In Proceedings of ISITC, (2014).
[6]
[7] J. Dean and S. Ghemawat, MapReduce: simplified data processing on large clusters, Communications of the ACM, (2008).
[8]

Authors

Dojin Choi received the BS degree from the Department of Computer Engineering, Korea National University of Transportation, South Korea. He is a master's student in the Department of Computer Engineering, Korea National University of Transportation, Republic of Korea. His research interests are database systems, concurrency control, snapshot isolation, and distributed processing systems.

Bosung Kim received the BS degree from the Department of Computer Engineering, Korea National University of Transportation, South Korea. He is a master's student in the Department of Information Convergence, Korea National University of Transportation, Republic of Korea. His research interests are big data, DNA sequence analysis, and local alignment algorithms.

Insoo Bae received the BS degree from the Department of Computer Engineering, Korea National University of Transportation, South Korea. He is a master's student in the Department of Information Convergence, Korea National University of Transportation, Republic of Korea. His research interests are file systems, data replication, and operating systems.

Seokil Song received the BS, MS, and PhD degrees from the Department of Computer and Communication, Chungbuk National University, South Korea, in 1998, 2000, and 2003, respectively. He is an Associate Professor in the Department of Computer Engineering, Korea National University of Transportation, Republic of Korea. His research interests are database systems, index structures, concurrency control, storage systems, and distributed stream data processing.


More information

Scalable Multiple NameNodes Hadoop Cloud Storage System

Scalable Multiple NameNodes Hadoop Cloud Storage System Vol.8, No.1 (2015), pp.105-110 http://dx.doi.org/10.14257/ijdta.2015.8.1.12 Scalable Multiple NameNodes Hadoop Cloud Storage System Kun Bi 1 and Dezhi Han 1,2 1 College of Information Engineering, Shanghai

More information

Fujitsu Big Data Software Use Cases

Fujitsu Big Data Software Use Cases Fujitsu Big Data Software Use s Using Big Data Opens the Door to New Business Areas The use of Big Data is needed in order to discover trends and predictions, hidden in data generated over the course of

More information

The Design and Implementation of the Integrated Model of the Advertisement and Remote Control System for an Elevator

The Design and Implementation of the Integrated Model of the Advertisement and Remote Control System for an Elevator Vol.8, No.3 (2014), pp.107-118 http://dx.doi.org/10.14257/ijsh.2014.8.3.10 The Design and Implementation of the Integrated Model of the Advertisement and Remote Control System for an Elevator Woon-Yong

More information

Home Appliance Control and Monitoring System Model Based on Cloud Computing Technology

Home Appliance Control and Monitoring System Model Based on Cloud Computing Technology Home Appliance Control and Monitoring System Model Based on Cloud Computing Technology Yun Cui 1, Myoungjin Kim 1, Seung-woo Kum 3, Jong-jin Jung 3, Tae-Beom Lim 3, Hanku Lee 2, *, and Okkyung Choi 2 1

More information

Design of Simulator for Cloud Computing Infrastructure and Service

Design of Simulator for Cloud Computing Infrastructure and Service , pp. 27-36 http://dx.doi.org/10.14257/ijsh.2014.8.6.03 Design of Simulator for Cloud Computing Infrastructure and Service Changhyeon Kim, Junsang Kim and Won Joo Lee * Dept. of Computer Science and Engineering,

More information

A Load Balanced PC-Cluster for Video-On-Demand Server Systems

A Load Balanced PC-Cluster for Video-On-Demand Server Systems International Journal of Grid and Distributed Computing 63 A Load Balanced PC-Cluster for Video-On-Demand Server Systems Liang-Teh Lee 1, Hung-Yuan Chang 1,2, Der-Fu Tao 2, and Siang-Lin Yang 1 1 Dept.

More information
