In-memory Distributed Processing Method for Traffic Big Data to Analyze and Share Traffic Events in Real Time among Social Groups
Dojin Choi 1, Bosung Kim 2, Insu Bae 2 and Seokil Song 1*
1 Department of Computer Engineering, Korea National University of Transportation, Chungju, Chungbuk, Republic of Korea
2 Department of Information Technology Convergence, Korea National University of Transportation, Chungju, Chungbuk, Republic of Korea
mycdj91@ut.ac.kr, jikol2000@ut.ac.kr, gkdrmf23@ut.ac.kr, sisong@ut.ac.kr

Abstract

In this paper, we propose an in-memory distributed processing method that can rapidly process vehicle location and traffic event data using Spark Streaming. The proposed system enables information about surrounding vehicles, pedestrians, and traffic events to be shared in real time with drivers who use the WEVING service. In the proposed method, vehicle location and traffic event streams are indexed by time using a grid indexing technique, and continuous range queries are processed using this index. In addition, traffic events transferred in real time are grouped by occurrence time, location, content, and road segment in order to avoid duplicated traffic events. Through experiments, we show that the proposed method is able to deduplicate similar traffic events efficiently.

Keywords: traffic data, big data, social group, spark streaming

1. Introduction

Road congestion has a substantial negative impact on the community and the environment. It can be reduced in two ways. The first is to increase road capacity, which is very difficult, especially in urban environments. The second is to reduce demand in congested areas by providing information about road status so that drivers change their mode of transport, their route, or their travel time. Collecting traffic information with infrastructure sensors such as CCTV cameras and in-road loop sensors has limits.
Such sensors cannot cover the whole road network unless they are deployed along every road, at enormous cost. Recently, with the rapid increase in vehicles equipped with communication-capable smart devices such as black boxes or smartphones, it has become possible to collect and share traffic information with those devices instead of infrastructure sensors. Various applications and services for traffic safety and convenience have been developed on this basis.

WEVING (WE are driving together) is a social driver assistance system [1] with a smartphone application that automatically detects traffic events (i.e., delays, congestion, accidents, road conditions, etc.) using the camera, GPS (Global Positioning System), gyroscope, and acceleration sensors of the smartphone. The detected traffic events, event images, and current locations are shared with users in social groups via WEVING.

If many vehicles use the WEVING service and their current locations are detected frequently, sharing vehicle locations, traffic events, and images between social groups creates a large workload for the WEVING server. Sharing the locations of surrounding vehicles and traffic events in real time is particularly important, because processing delays caused by server overload can make the shared locations and events meaningless. In addition, traffic events detected in areas of heavy road traffic are likely to be duplicated. A single server that processes data by storing them on hard disks therefore cannot meet the requirements of applications such as WEVING. Meeting these requirements calls for in-memory processing, together with managing and analyzing data in parallel across multiple servers to make data storage and search fast.

ISSN: IJSEIA Copyright © 2016 SERSC

In this study, a method for storing and managing traffic event and vehicle location data is proposed using Spark [2] and Spark Streaming [3], two distributed in-memory processing platforms. Spark Streaming is an in-memory stream processing system that receives messages, logs, or text files in real time and processes them in batches, each covering a predetermined interval (less than 1 sec). Spark Streaming is based on the RDD (Resilient Distributed Dataset) [2] model of Spark. RDDs provide recovery for data held in memory, so data can be processed without being stored on hard disks. Spark Streaming processes the data received in each interval as an RDD, thereby providing fault-tolerant stream processing for applications built on it.

In this paper, we propose a method that uses Spark Streaming to rapidly process vehicle location and traffic event data as stream data. The proposed method indexes the vehicle location and traffic event streams transferred in real time by location and time, using a grid indexing technique. The grid indexing technique used in this paper is a modified version of an existing grid indexing technique [4], adapted to suit Spark Streaming.
Vehicles that use the WEVING service set a region of interest based on the current vehicle location and send it to the server to request information about other vehicles, pedestrians, and traffic events in that region. The WEVING server treats the region of interest sent by a moving vehicle as a CRQ (Continuous Range Query). To process CRQs effectively, this paper proposes a grid-index-based CRQ technique. In addition, this paper proposes a method that avoids unnecessary traffic event sharing by grouping identical traffic events according to the occurrence time, location, and content of the events sent in real time. This traffic event classification method is also based on Spark Streaming.

2. Related Work

The proposed method uses the distributed grid index of [5], which is based on Spark Streaming. To index and query vehicles, [5] adds transformation and output operators such as bulkload, bulkinsert, splitindex, and search to Spark Streaming. The input stream consists of the positions of moving objects, which are transmitted periodically. Spark Streaming transforms the input stream into D-Streams. As shown in Figure 1, the input stream is continuously transformed into DSt, DSt+1, DSt+2, and DSt+3, and the bulkload and bulkinsert operators are performed on each D-Stream. bulkload builds a grid index GIt from the position data in DSt. The indexing method uses grid techniques to reduce the time needed to build and update an index, and it does not use lock-based concurrency control. Because the D-Stream model of Spark Streaming is immutable, update operations and search operations are not performed concurrently on the same index.
As shown in Figure 1, the index is updated by the bulkload and bulkinsert operators, and multiple versions of the index, at times t, t+1, t+2, and t+3, are kept in main memory where users can access them. Therefore, users can access GIt+2 while GIt+3 is being built.
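The multi-version idea above can be illustrated with a small plain-Python sketch. It is not the operator implementation of [5] and does not use Spark: the cell size, the `(object_id, lat, lon)` record layout, the `VersionedGridIndex` class, and the exact behaviour given to `bulkload` and `bulkinsert` here are all assumptions made for illustration. The point it shows is that each batch yields a new index version built from scratch or derived from the previous one, so readers of an older version are never disturbed.

```python
import math

CELL_SIZE = 0.01  # assumed cell width in degrees; the paper gives no value


def cell_of(lat, lon, cell_size=CELL_SIZE):
    """Return the (row, col) key of the grid cell containing a coordinate."""
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


def bulkload(batch, cell_size=CELL_SIZE):
    """Build a new grid index from one batch of (object_id, lat, lon) tuples."""
    index = {}
    for object_id, lat, lon in batch:
        index.setdefault(cell_of(lat, lon, cell_size), {})[object_id] = (lat, lon)
    return index


def bulkinsert(prev_index, batch, cell_size=CELL_SIZE):
    """Derive the next index version without modifying the previous one."""
    updated = {object_id for object_id, _, _ in batch}
    new_index = {}
    # Copy every cell, dropping stale positions of objects that reported again.
    for cell, objects in prev_index.items():
        kept = {oid: pos for oid, pos in objects.items() if oid not in updated}
        if kept:
            new_index[cell] = kept
    for object_id, lat, lon in batch:
        new_index.setdefault(cell_of(lat, lon, cell_size), {})[object_id] = (lat, lon)
    return new_index


class VersionedGridIndex:
    """Hypothetical holder of one grid index per batch time."""

    def __init__(self):
        self.versions = {}  # batch time -> grid index
        self.latest = None  # newest fully built version

    def publish(self, t, batch):
        if self.latest is None:
            index = bulkload(batch)
        else:
            index = bulkinsert(self.versions[self.latest], batch)
        self.versions[t] = index  # becomes visible only once fully built
        self.latest = t

    def search(self, cell, t=None):
        version = self.latest if t is None else t
        return self.versions[version].get(cell, {})


store = VersionedGridIndex()
store.publish(0, [("car1", 37.005, 127.005), ("car2", 37.005, 127.005)])
store.publish(1, [("car1", 37.015, 127.005)])    # car1 moves one cell north
print(sorted(store.search((3700, 12700), t=0)))  # version 0 still lists both cars
print(sorted(store.search((3700, 12700))))       # latest version lists only car2
```

Copying the surviving entries for every version keeps the sketch short; a real system would more likely share unchanged cells between versions to save memory.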
Figure 1. Build Grid Index with Input Stream

3. Proposed In-Memory Traffic Big Data Processing Technique

The server processes received data differently depending on whether the data are a client's current location or a traffic event. Through the CRQ, the server provides search results for the surrounding vehicles and traffic events displayed on the client screen. The CRQ sent by the client is revised as the vehicle travels, and the server recalculates the result for the revised CRQ region in real time and sends it to the client.

For each received vehicle location, the server obtains the corresponding road segment from the external map server and stores the vehicle location and road segment identifier together. At the same time, the vehicle's current location is indexed by the grid index manager, and the continuous query result list is updated by the continuous query processing manager. Similarly, the traffic event manager in the server indexes each received traffic event through the grid index manager and searches the indexed traffic events in order to group similar events. For each group, a representative traffic event is selected and reflected in the CRQ result of each client by the continuous query processing manager. The push manager sends the updated continuous query result to each client so that the result can be displayed on the screen.

This procedure is designed and implemented on Spark Streaming. The grid index and the continuous query list are maintained in memory as RDDs in Spark. Because RDDs cannot be modified, this study also proposes a management method that keeps multiple versions of them. This provides concurrency control between read and write operations on the grid index and the continuous query result list.
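As a rough illustration of how a server might keep continuous range query results up to date over a grid, the following plain-Python sketch registers each client's rectangular region of interest in the grid cells it overlaps and, for every reported position, checks only the queries registered in that position's cell. It is a sketch under assumptions, not the paper's Spark implementation: the `ContinuousQueryList` class, the `(min_lat, min_lon, max_lat, max_lon)` rectangle layout, and the cell size are hypothetical.

```python
import math
from collections import defaultdict

CELL_SIZE = 0.01  # assumed cell width in degrees


def cell_of(lat, lon):
    """Return the (row, col) key of the grid cell containing a coordinate."""
    return (math.floor(lat / CELL_SIZE), math.floor(lon / CELL_SIZE))


def cells_covering(mbr):
    """List every grid cell overlapped by a (min_lat, min_lon, max_lat, max_lon) rectangle."""
    min_lat, min_lon, max_lat, max_lon = mbr
    r0, c0 = cell_of(min_lat, min_lon)
    r1, c1 = cell_of(max_lat, max_lon)
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


def contains(mbr, lat, lon):
    """Exact point-in-rectangle check, applied after the coarse cell filter."""
    min_lat, min_lon, max_lat, max_lon = mbr
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


class ContinuousQueryList:
    """Hypothetical registry of one range query per client."""

    def __init__(self):
        self.mbr = {}                         # client_id -> current rectangle
        self.cell_queries = defaultdict(set)  # cell -> ids of clients registered there

    def register(self, client_id, mbr):
        """Register a query, replacing the client's previous region if any."""
        old = self.mbr.pop(client_id, None)
        if old is not None:
            for cell in cells_covering(old):
                self.cell_queries[cell].discard(client_id)
        self.mbr[client_id] = mbr
        for cell in cells_covering(mbr):
            self.cell_queries[cell].add(client_id)

    def evaluate(self, positions):
        """Return client_id -> set of object ids inside that client's region."""
        results = {client_id: set() for client_id in self.mbr}
        for object_id, lat, lon in positions:
            for client_id in self.cell_queries.get(cell_of(lat, lon), ()):
                if contains(self.mbr[client_id], lat, lon):
                    results[client_id].add(object_id)
        return results


queries = ContinuousQueryList()
queries.register("clientA", (37.001, 127.001, 37.019, 127.019))
batch = [("car1", 37.005, 127.005), ("car2", 37.0195, 127.005), ("car3", 37.105, 127.005)]
print(queries.evaluate(batch))  # only car1 lies inside clientA's region
```

Re-registering a client with a new rectangle models the region of interest moving with the vehicle; the exact containment check is still needed because a cell can be only partly covered by a query.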
Figure 2. Architecture of the Proposed In-memory Big Traffic Data Processing System

Figure 3 shows the overall structure of the CRQ processing technique, designed on Spark Streaming, that searches for surrounding vehicles according to client location. The messages transferred from clients are collected for a set interval (from less than 1 sec to several seconds) to create an RDD. Preprocessing transforms this RDD into an RDD that contains only location data, which the index generator then transforms into the version n grid index RDD.

Figure 3. Architecture of the Continuous Query Processing Method

Figure 4 shows the CRQ processing method based on the grid index. The continuous query processor in Figure 2 registers each CRQ in every cell of the created grid index that the query covers. A query can span multiple cells, so a single query can be registered in the query lists of multiple cells. The algorithm to process the CRQ is described in Figure 5.

Figure 4. Continuous Query Processing Method using Grid Index

    Create_CQ(object) {
        // Release the object's reference to its previous query
        old_CQ = object.CQ
        old_CQ.Reference -= 1
        if (old_CQ.Reference == 0) {
            // No other object uses the old query, so reuse it with the new region
            old_CQ.MBR = object.MBR
        } else {
            // The old query is still in use, so create a new one for this object
            new_CQ = new CQ(object.MBR)
            new_CQ.Reference_Objs += object
        }
    }

    Update_CQ(CQ, objects) {
        // Check every object location against every query region
        for (i <- 0 until objects.length) {
            for (j <- 0 until CQ.length) {
                CQ[j].Contain_Objs = Contain_Check(CQ[j].MBR, objects[i].location)
            }
        }
        refreshMessage(CQ.Reference_Objs)
    }

Figure 5. Algorithm for CRQ Creation and Update

The CRQ processing technique above is explained in terms of searching for vehicles' current locations. The CRQ of each client should search not only for moving objects, such as surrounding vehicles and pedestrians, but also for traffic events that occur nearby. CRQ processing for traffic events works in a similar way, so a detailed explanation is omitted here.

The traffic events received at the server can be duplicated, because the same traffic event can be detected by several vehicles moving through the same location. The problem is worse in heavy traffic, where the number of duplicated traffic events is large. In this paper, when the same event is detected more than once, the detections are grouped, and a representative event is selected from the group and shared with users within the social group. The event grouping procedure is shown in Figure 6.
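The grouping of duplicate detections can be sketched in plain Python as follows. This is only an illustration under assumptions: the time window, the distance threshold, the cell size, the `Event` fields, and the choice of the first-seen event as the group's representative are hypothetical, since the paper does not state them. Candidates are looked up through grid cells and then compared by type, time, distance, and road segment.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

CELL_SIZE = 0.01  # assumed grid cell width in degrees
MAX_GAP_S = 60    # assumed maximum time difference in seconds
MAX_DIST = 0.005  # assumed maximum distance in degrees (smaller than CELL_SIZE)


@dataclass
class Event:
    event_id: str
    kind: str      # e.g. "rain", "snow", "accident", "delay"
    lat: float
    lon: float
    ts: float      # occurrence time in seconds
    segment: str   # road segment identifier


def cell_of(lat, lon):
    """Return the (row, col) key of the grid cell containing a coordinate."""
    return (math.floor(lat / CELL_SIZE), math.floor(lon / CELL_SIZE))


def same_event(a, b):
    """Treat two detections as one event if type, segment, time, and place agree."""
    return (a.kind == b.kind
            and a.segment == b.segment
            and abs(a.ts - b.ts) <= MAX_GAP_S
            and math.hypot(a.lat - b.lat, a.lon - b.lon) <= MAX_DIST)


def group_events(events):
    """Return a list of groups; the first event of each group is its representative."""
    groups = []
    cell_groups = defaultdict(list)  # cell -> indices of groups whose representative is there
    for ev in events:
        row, col = cell_of(ev.lat, ev.lon)
        # MAX_DIST < CELL_SIZE, so any match lies in this cell or one of its 8 neighbours.
        candidates = (gi
                      for dr in (-1, 0, 1)
                      for dc in (-1, 0, 1)
                      for gi in cell_groups.get((row + dr, col + dc), ()))
        match = next((gi for gi in candidates if same_event(groups[gi][0], ev)), None)
        if match is None:
            cell_groups[(row, col)].append(len(groups))
            groups.append([ev])
        else:
            groups[match].append(ev)
    return groups


events = [
    Event("e1", "accident", 37.0050, 127.0050, 0, "s1"),
    Event("e2", "accident", 37.0052, 127.0051, 10, "s1"),  # duplicate of e1
    Event("e3", "accident", 37.0050, 127.0050, 500, "s1"),  # too late to be e1
    Event("e4", "rain", 37.0050, 127.0050, 5, "s1"),        # different type
    Event("e5", "accident", 37.0051, 127.0050, 20, "s2"),   # different road segment
]
for group in group_events(events):
    print(group[0].event_id, len(group))
```

Comparing each new detection only with the group representative keeps the check cheap, at the cost of making the result depend on arrival order; other representative choices are equally plausible.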
Figure 6. Process of the Traffic Event Grouping

The server verifies whether a received message is an event and, if so, adds it to the traffic event index. Traffic events are indexed by the grid index manager described previously. Next, the traffic event index is searched by location, time, and type to find candidate groups for the same event, and finally the server determines whether the candidate events are located on the same road segment. If they are determined to be the same event, the new event is added to the existing event group; otherwise, a new event group is created.

4. Experiments

The event data set used in the experiment was created by modifying the traffic data set provided by [8]. This data set contains 5,000 moving objects that traveled for 20 secs. The moving objects are created randomly within a fixed region. Each record contains a moving object ID, a type (newly created moving object or active moving object), latitude, longitude, and timestamp, as shown in Figure 7.

(a) Sample of Traffic Data Set (b) Sample of Event Data Set
Figure 7. Sample of Traffic and Event Data

Our in-memory distributed traffic big data system is implemented on Spark Streaming. The cluster used in our experiment consists of 8 nodes, each with 2 Intel CPUs, 8 GB of RAM, and a 500 GB HDD. We use the Scala programming language to develop our system. The event data set contains 100,000 virtual event records (5,000 moving objects × 20 timestamps), created by adding an event column to the data shown in Figure 7(a). The resulting event data set is shown in Figure 7(b); it contains four event types (rain, snow, accident, and delay).

Figure 8 shows the event grouping results. The 100,000 event records are grouped into 735 events. The left side of the figure shows the events prior to grouping, and the right side shows the grouped events on the map.

Figure 8. Result of the Event Grouping

5. Conclusion

This paper proposed a method that can rapidly process vehicle location and traffic event data using Spark Streaming in order to share information about surrounding vehicles, pedestrians, and traffic events in real time with drivers who use the WEVING service. In the proposed method, vehicle location and traffic event streams transferred in real time are indexed by time using the grid indexing technique, and CRQs are processed using this index. Furthermore, traffic events transferred in real time are grouped by occurrence time, location, content, and road segment so that only a representative traffic event is shared. In the event grouping experiment, 100,000 events were grouped into 735 representative events.

Acknowledgments

This research was supported by a grant(14tlrp-c ) from the Transportation & Logistics Research Program (TLRP) funded by the Ministry of Land, Infrastructure and Transport of the Korean government, and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF- 2014R1A1A ).
References

[1] H. Li, Y. Lee, B. Kim, D. Choi, I. Bae, S. Song, M. Yeo and R. Oh, WEAVING : social driving assistant system, In Proceedings of ISITC, (2014).
[2] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker and I. Stoica, Spark: cluster computing with working sets, Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, (2010), pp
[3] M. Zaharia, T. Das, H. Li, S. Shenker and I. Stoica, Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters, Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Computing, (2012), pp
[4] X. Xiong, M. F. Mokbel and W. G. Aref, LUGrid: update-tolerant grid-based indexing for moving objects, In Proceedings of MDM, (2006), pp
[5] H. Li, Y. Lee and S. Song, Grid based Distributed In-memory Indexing for Moving Objects, In Proceedings of ISITC, (2014).
[6]
[7] J. Dean and S. Ghemawat, MapReduce: simplified data processing on large clusters, Communications of the ACM, (2008), pp
[8]

Authors

Dojin Choi received the BS degree in Computer Engineering from Korea National University of Transportation, South Korea. He is a master's course student in the Computer Engineering Department, Korea National University of Transportation, Republic of Korea. His research interests are database systems, concurrency control, snapshot isolation, and distributed processing systems.

Bosung Kim received the BS degree in Computer Engineering from Korea National University of Transportation, South Korea. He is a master's course student in the Information Convergence Department, Korea National University of Transportation, Republic of Korea. His research interests are big data, DNA sequence analysis, and local alignment algorithms.

Insoo Bae received the BS degree in Computer Engineering from Korea National University of Transportation, South Korea. He is a master's course student in the Information Convergence Department, Korea National University of Transportation, Republic of Korea. His research interests are file systems, data replication, and operating systems.

Seokil Song received the BS, MS, and PhD degrees in Computer and Communication from Chungbuk National University, South Korea, in 1998, 2000, and 2003, respectively. He is an Associate Professor in the Computer Engineering Department, Korea National University of Transportation, Republic of Korea. His research interests are database systems, index structures, concurrency control, storage systems, and distributed stream data processing.
More informationDistributed Framework for Data Mining As a Service on Private Cloud
RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &
More informationA Virtual Machine Searching Method in Networks using a Vector Space Model and Routing Table Tree Architecture
A Virtual Machine Searching Method in Networks using a Vector Space Model and Routing Table Tree Architecture Hyeon seok O, Namgi Kim1, Byoung-Dai Lee dept. of Computer Science. Kyonggi University, Suwon,
More informationMobile Storage and Search Engine of Information Oriented to Food Cloud
Advance Journal of Food Science and Technology 5(10): 1331-1336, 2013 ISSN: 2042-4868; e-issn: 2042-4876 Maxwell Scientific Organization, 2013 Submitted: May 29, 2013 Accepted: July 04, 2013 Published:
More informationUPS battery remote monitoring system in cloud computing
, pp.11-15 http://dx.doi.org/10.14257/astl.2014.53.03 UPS battery remote monitoring system in cloud computing Shiwei Li, Haiying Wang, Qi Fan School of Automation, Harbin University of Science and Technology
More informationHeterogeneity-Aware Resource Allocation and Scheduling in the Cloud
Heterogeneity-Aware Resource Allocation and Scheduling in the Cloud Gunho Lee, Byung-Gon Chun, Randy H. Katz University of California, Berkeley, Yahoo! Research Abstract Data analytics are key applications
More informationOptimization and analysis of large scale data sorting algorithm based on Hadoop
Optimization and analysis of large scale sorting algorithm based on Hadoop Zhuo Wang, Longlong Tian, Dianjie Guo, Xiaoming Jiang Institute of Information Engineering, Chinese Academy of Sciences {wangzhuo,
More informationDevelopment of Integrated Management System based on Mobile and Cloud service for preventing various dangerous situations
Development of Integrated Management System based on Mobile and Cloud service for preventing various dangerous situations Ryu HyunKi, Moon ChangSoo, Yeo ChangSub, and Lee HaengSuk Abstract In this paper,
More informationIntroduction to Hadoop
Introduction to Hadoop 1 What is Hadoop? the big data revolution extracting value from data cloud computing 2 Understanding MapReduce the word count problem more examples MCS 572 Lecture 24 Introduction
More informationHybrid System for Driver Assistance
International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 15 (2014), pp. 1583-1587 International Research Publications House http://www. irphouse.com Hybrid System
More informationContent-Aware Load Balancing using Direct Routing for VOD Streaming Service
Content-Aware Load Balancing using Direct Routing for VOD Streaming Service Young-Hwan Woo, Jin-Wook Chung, Seok-soo Kim Dept. of Computer & Information System, Geo-chang Provincial College, Korea School
More informationCloud Storage Solution for WSN Based on Internet Innovation Union
Cloud Storage Solution for WSN Based on Internet Innovation Union Tongrang Fan 1, Xuan Zhang 1, Feng Gao 1 1 School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang,
More informationA Novel Cloud Based Elastic Framework for Big Data Preprocessing
School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview
More informationMassive Cloud Auditing using Data Mining on Hadoop
Massive Cloud Auditing using Data Mining on Hadoop Prof. Sachin Shetty CyberBAT Team, AFRL/RIGD AFRL VFRP Tennessee State University Outline Massive Cloud Auditing Traffic Characterization Distributed
More informationScaling Out With Apache Spark. DTL Meeting 17-04-2015 Slides based on https://www.sics.se/~amir/files/download/dic/spark.pdf
Scaling Out With Apache Spark DTL Meeting 17-04-2015 Slides based on https://www.sics.se/~amir/files/download/dic/spark.pdf Your hosts Mathijs Kattenberg Technical consultant Jeroen Schot Technical consultant
More informationSpark ΕΡΓΑΣΤΗΡΙΟ 10. Prepared by George Nikolaides 4/19/2015 1
Spark ΕΡΓΑΣΤΗΡΙΟ 10 Prepared by George Nikolaides 4/19/2015 1 Introduction to Apache Spark Another cluster computing framework Developed in the AMPLab at UC Berkeley Started in 2009 Open-sourced in 2010
More informationResearch and Performance Analysis of HTML5 WebSocket for a Real-time Multimedia Data Communication Environment
Vol.46 (Multimedia 2014), pp.307-312 http://dx.doi.org/10.14257/astl.2014.46.64 Research and Performance Analysis of HTML5 WebSocket for a Real-time Multimedia Data Communication Environment Jin-tae Park
More informationNetFlow Analysis with MapReduce
NetFlow Analysis with MapReduce Wonchul Kang, Yeonhee Lee, Youngseok Lee Chungnam National University {teshi85, yhlee06, lee}@cnu.ac.kr 2010.04.24(Sat) based on "An Internet Traffic Analysis Method with
More informationRUBA: Real-time Unstructured Big Data Analysis Framework
RUBA: Real-time Unstructured Big Data Analysis Framework Jaein Kim, Nacwoo Kim, Byungtak Lee IT Management Device Research Section Honam Research Center, ETRI Gwangju, Republic of Korea jaein, nwkim, bytelee@etri.re.kr
More informationSPARK USE CASE IN TELCO. Apache Spark Night 9-2-2014! Chance Coble!
SPARK USE CASE IN TELCO Apache Spark Night 9-2-2014! Chance Coble! Use Case Profile Telecommunications company Shared business problems/pain Scalable analytics infrastructure is a problem Pushing infrastructure
More informationA Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster
, pp.11-20 http://dx.doi.org/10.14257/ ijgdc.2014.7.2.02 A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster Kehe Wu 1, Long Chen 2, Shichao Ye 2 and Yi Li 2 1 Beijing
More informationSpark and Shark. High- Speed In- Memory Analytics over Hadoop and Hive Data
Spark and Shark High- Speed In- Memory Analytics over Hadoop and Hive Data Matei Zaharia, in collaboration with Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Cliff Engle, Michael Franklin, Haoyuan Li,
More informationThe Sensitive Information Management System for Merger and Acquisition (M&A) Transactions
, pp.203-212 http://dx.doi.org/10.14257/ijmue.2014.9.3.19 The Sensitive Information Management System for Merger and Acquisition (M&A) Transactions Kyong-jin Kim * and Seng-phil Hong ** Sungshin Women
More informationReal Time Data Processing using Spark Streaming
Real Time Data Processing using Spark Streaming Hari Shreedharan, Software Engineer @ Cloudera Committer/PMC Member, Apache Flume Committer, Apache Sqoop Contributor, Apache Spark Author, Using Flume (O
More informationA Study on Information Technology Plan and Status of University 2013
, pp. 47-54 http://dx.doi.org/10.14257/ijseia.2014.8.10.05 A Study on Information Technology Plan and Status of University 2013 Tae-Yong Shim 1, Il-Jun Choi 2, Jin Kim 3 and Young-Hun Lee 4 1 Department
More informationApache Spark : Fast and Easy Data Processing Sujee Maniyam Elephant Scale LLC sujee@elephantscale.com http://elephantscale.com
Apache Spark : Fast and Easy Data Processing Sujee Maniyam Elephant Scale LLC sujee@elephantscale.com http://elephantscale.com Spark Fast & Expressive Cluster computing engine Compatible with Hadoop Came
More informationDevelopment of Real-time Big Data Analysis System and a Case Study on the Application of Information in a Medical Institution
, pp. 93-102 http://dx.doi.org/10.14257/ijseia.2015.9.7.10 Development of Real-time Big Data Analysis System and a Case Study on the Application of Information in a Medical Institution Mi-Jin Kim and Yun-Sik
More informationDevelopment of a Service Robot System for a Remote Child Monitoring Platform
, pp.153-162 http://dx.doi.org/10.14257/ijsh.2014.8.5.14 Development of a Service Robot System for a Remote Child Monitoring Platform Taewoo Han 1 and Yong-Ho Seo 2, * 1 Department of Game and Multimedia,
More informationSawmill Log Analyzer Best Practices!! Page 1 of 6. Sawmill Log Analyzer Best Practices
Sawmill Log Analyzer Best Practices!! Page 1 of 6 Sawmill Log Analyzer Best Practices! Sawmill Log Analyzer Best Practices!! Page 2 of 6 This document describes best practices for the Sawmill universal
More informationAdaptive Load Balancing Method Enabling Auto-Specifying Threshold of Node Load Status for Apache Flume
, pp. 201-210 http://dx.doi.org/10.14257/ijseia.2015.9.2.17 Adaptive Load Balancing Method Enabling Auto-Specifying Threshold of Node Load Status for Apache Flume UnGyu Han and Jinho Ahn Dept. of Comp.
More informationLambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
More informationDesign of a NAND Flash Memory File System to Improve System Boot Time
International Journal of Information Processing Systems, Vol.2, No.3, December 2006 147 Design of a NAND Flash Memory File System to Improve System Boot Time Song-Hwa Park*, Tae-Hoon Lee*, and Ki-Dong
More informationNear Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya
Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming by Dibyendu Bhattacharya Pearson : What We Do? We are building a scalable, reliable cloud-based learning platform providing services
More informationSecurity Measures of Personal Information of Smart Home PC
, pp.227-236 http://dx.doi.org/10.14257/ijsh.2013.7.6.22 Security Measures of Personal Information of Smart Home PC Mi-Sook Seo 1 and Dea-Woo Park 2 1, 2 Department of Integrative Engineering, Hoseo Graduate
More information86 Int. J. Engineering Systems Modelling and Simulation, Vol. 6, Nos. 1/2, 2014
86 Int. J. Engineering Systems Modelling and Simulation, Vol. 6, Nos. 1/2, 2014 Dual server-based secure data-storage system for cloud storage Woong Go ISAA Lab, Department of Information Security Engineering,
More informationApache Spark 11/10/15. Context. Reminder. Context. What is Spark? A GrowingStack
Apache Spark Document Analysis Course (Fall 2015 - Scott Sanner) Zahra Iman Some slides from (Matei Zaharia, UC Berkeley / MIT& Harold Liu) Reminder SparkConf JavaSpark RDD: Resilient Distributed Datasets
More informationA Study on Data Analysis Process Management System in MapReduce using BPM
A Study on Data Analysis Process Management System in MapReduce using BPM Yoon-Sik Yoo 1, Jaehak Yu 1, Hyo-Chan Bang 1, Cheong Hee Park 1 Electronics and Telecommunications Research Institute, 138 Gajeongno,
More informationThe Internet of Things and Big Data: Intro
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific
More informationA study on Standardization of Integrated database for Intelligent water information management
, pp.132-136 http://dx.doi.org/10.14257/astl.2015.99.33 A study on Standardization of Integrated database for Intelligent water information management Ji Won Jung *, Seung Kwon Jung **, Jin Tak Choi ***,
More informationData Mining for Data Cloud and Compute Cloud
Data Mining for Data Cloud and Compute Cloud Prof. Uzma Ali 1, Prof. Punam Khandar 2 Assistant Professor, Dept. Of Computer Application, SRCOEM, Nagpur, India 1 Assistant Professor, Dept. Of Computer Application,
More informationApache Spark and Distributed Programming
Apache Spark and Distributed Programming Concurrent Programming Keijo Heljanko Department of Computer Science University School of Science November 25th, 2015 Slides by Keijo Heljanko Apache Spark Apache
More informationStudy on the Vulnerability Level of Physical Security And Application of the IP-Based Devices
, pp. 63-68 http://dx.doi.org/10.14257/ijsh.2015.9.10.07 Study on the Vulnerability Level of Physical Security And Application of the IP-Based Devices Kwang-Hyuk Park 1, Il-Kyeun Ra 2 and Chang-Soo Kim
More informationA RFID Data-Cleaning Algorithm Based on Communication Information among RFID Readers
, pp.155-164 http://dx.doi.org/10.14257/ijunesst.2015.8.1.14 A RFID Data-Cleaning Algorithm Based on Communication Information among RFID Readers Yunhua Gu, Bao Gao, Jin Wang, Mingshu Yin and Junyong Zhang
More informationPerformance Comparison Analysis of Linux Container and Virtual Machine for Building Cloud
, pp.105-111 http://dx.doi.org/10.14257/astl.2014.66.25 Performance Comparison Analysis of Linux Container and Virtual Machine for Building Cloud Kyoung-Taek Seo 1, Hyun-Seo Hwang 1, Il-Young Moon 1, Oh-Young
More informationInternational Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 349 ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 349 Load Balancing Heterogeneous Request in DHT-based P2P Systems Mrs. Yogita A. Dalvi Dr. R. Shankar Mr. Atesh
More information