Cyber Forensic for Hadoop based Cloud System

ChaeHo Cho 1, SungHo Chin 2 and Kwang Sik Chung 3*
1 Korea National Open University, Graduate School, Dept. of Computer Science
2 LG Electronics, CTO Division, Software Platform Lab
3 Korea National Open University, Dept. of Computer Science
[email protected], [email protected], [email protected]

Abstract

Cloud services are efficient and flexible to expand and manage, so various cloud services are commercially implemented and provided by KT ucloud, Amazon EC2, and other companies. As cloud services are rapidly deployed, more security problems occur, and forensic procedures for cloud systems are needed. However, in a multi-user cloud system a service suspension causes serious problems for users, so evidence collection and analysis have to be performed in the field; live analysis is therefore important in cloud systems. A cloud system based on the Hadoop distributed file system is characterized by a massive volume of data, multiple users, physically distributed data, and multi-layered data structures. Previous forensic procedures and methodologies are not appropriate for such a system. In order to deal with these characteristics, we propose a Hadoop based cloud forensic procedure that supports live collection and live analysis without system suspension followed by static analysis, together with Hadoop based cloud forensic guidelines. With the proposed procedure, we can decrease both the evidence collection time and the evidence volume.

Keywords: cloud distributed file system, Hadoop distributed file system, cyber forensic

1. Introduction

Recently, cloud services for companies and individuals, such as Amazon EC2 and KT ucloud, are provided by many cloud service companies [1, 2]. Those cloud systems are implemented on distributed file systems [2] and physically distributed servers so that they can guarantee the reliability and scalability of the service. There are many kinds of distributed file systems, such as Sun's Network File System (NFS), the Google File System (GFS), the Hadoop Distributed File System (HDFS) of Apache, and GLORY-FS of ETRI. The Hadoop distributed file system is open source, and many commercial cloud service systems adopt it as their file system; thus, in the near future, many security problems will arise on Hadoop based cloud service systems [4, 5, 6, 8].

However, previous cyber forensic methods cannot be applied to a cloud system built on the Hadoop distributed file system [9, 10, 11]. First of all, the Hadoop distributed file system is a virtual file system: file access is virtualized, so forensic evidence can easily be erased or left unaudited. Second, a cloud system serves multiple users with various services; a service halt or shut-down is therefore fatal to the reliability of the service, yet previous cyber forensic methods require a system halt or shut-down before they can be applied. Last, a cloud system manages a huge file system whose data can be physically distributed over several sites and whose total storage volume is very large; the time and manpower needed for evidence collection can therefore be so great that the previous cyber forensic methods cannot cope.
We first compare cloud systems with previous distributed systems and analyze the Hadoop distributed file system. After that, we propose a cyber forensic procedure for Hadoop based cloud systems and cyber forensic guidelines that can be used in the field. The rest of the paper is organized as follows. In the related works section, we analyze previous cyber forensics and show their disadvantages on cloud systems. The Hadoop based cyber forensic method section presents the new cyber forensic method and the new cyber forensic guidelines for cloud systems. Lastly, we conclude the paper and present future works.

2. Related Works

In [12], rather than seizing the whole system, a forensic method of transferring disk images to the investigation center is proposed. The method is based on snapshots of the virtual machines in the cloud system, and an image of volatile memory is created to raise time efficiency. However, transferring a large replica image over the network incurs overhead: the larger the volume, the more time and network traffic is consumed. The method also has the serious problem of re-transferring the image when the network channel fails or when the hash differs from the original value.

In [20], the new concept of a Forensic Cloud is proposed. The study defines a forensic cloud framework whose objectives are to satisfy forensic requirements and to provide convenience to the user. However, the automation in all of the proposed layers can include meaningless data, and this tendency is clearer in a large-volume file system. To find meaningful data within the analyzed data, the information analysis and judgment of the investigators still have to be involved.

In [21], the limitations of prior works on procedures and methodologies are identified. To solve these problems, simplifying the analysis methodology, introducing a database for analysis, and applying a cloud system are proposed. That is, the analysis is composed of scan, classification, searching and indexing, analysis, and documentation. To save processing time, similar steps are grouped and processed in parallel by utilizing multiple systems or a cloud. Although the proposed parallel processing minimizes processing time, important evidence may not be found in the distribution step when related evidence items are processed in parallel, and in the grouping phase a large gap in the analysis can occur because important evidence can be split when the actual storage is distributed into several file groups.

When the original forensic methodology and processes, which assume a single information processing medium, are applied to the Hadoop distributed file system (HDFS), several problems can occur. Firstly, the original methodology raises the problems of replica image generation time and of securing storage, since it is impossible to replicate each node's disk separately. Secondly, because HDFS stores the same replica on different nodes, the block areas of the storing data nodes, or the whole nodes, must be replicated even when evidence is collected only from a specific block area. Thirdly, a replica is, in general, transported in a container that blocks external influences because the replica is an image file, but this is not possible for very large images.
Several problems, such as missing data, falsification of data, and extra network traffic, can also occur when the replica is transferred over the same network. Fourthly, service interruption or work stoppage affects other users, since an HDFS based cloud service is used by many users at the same time.
Especially in an enterprise environment, seizing the system terminates the service. Finally, a different analysis methodology is needed, since HDFS physically separates the master node from the data nodes; it is also hard to develop and apply such a methodology because the secondary name node has a different structure from the master node.

3. HADOOP Based Cyber-Forensic Method

Our proposed Hadoop forensic procedure is based on the generally and widely used forensic procedure. We redefine each step as a common procedure that can be used by law enforcement or a private organization.

Table 1. Comparison between the original forensic procedure and the Hadoop file system based forensic procedure

Forensic procedure of the Hadoop file system: Preparation; Identification; 1st step Collection and Analysis (Live Collection, Live Analysis); 2nd step Collection and Analysis (Static Collection); Transport; Static Analysis; Reporting.

To address the problems caused by applying the original forensic procedure to Hadoop and to obtain a Hadoop based forensic procedure, we redefine the procedure as follows. From the first step of preparation to the last step of reporting, the overall structure is similar to the forensic procedures proposed by the National Police Agency or NIST; however, there are some differences that reflect the characteristics of the cloud service and the Hadoop file system.

1 Preparation
A team composed of experts in research, technical analysis, law, and so on should be organized for the forensic investigation. For efficient collection, the equipment and preparation for evidence collection must be ready. Therefore, in this step it is important to explain the necessity and appropriateness of the investigation to the Hadoop file system administrator, network administrator, service administrator, planner, and legal counsel so that they cooperate.

2 Identification
Through this step, the load and time for collecting evidence can be minimized. Since it is impossible to generate a replica image of the whole large file system, a pre-analysis is performed to minimize the scope of data collection. Identification is the most important step in Hadoop forensics: it is the basis for shortening the overall analysis time, minimizing the scope of analysis, and reducing the damage to other users or organizations.

3 Live Collection
Since the name node holds the information about where files and blocks are located, evidence collection and analysis for the name node are performed in the field. Real-time analysis in the field is necessary to find the specific block areas for the actual evidence collection; it minimizes the time for replica generation and evidence collection.

4 Live Analysis
This step analyzes the structure of the EditLog and FsImage files on the name node. Real-time analysis of these two files identifies the locations of the blocks and the composition of the cluster, which are needed for collecting the actual evidence; this analysis prepares the following static analysis. Sometimes the secondary name node is also included in the analysis. A sketch of how this metadata can be examined is given below.
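The paper does not tie the live collection and live analysis steps to specific tools. As one possible illustration, not part of the original procedure, the following Python sketch dumps NameNode metadata to a reviewable form with the standard Hadoop offline viewers (hdfs oiv for FsImage, hdfs oev for EditLog) and lists the block locations of a suspect path with hdfs fsck. The metadata directory, case directory, and HDFS path are hypothetical examples, and the sketch assumes read access to the NameNode metadata directory on the examined node.

```python
# Illustrative sketch only: dump NameNode metadata and locate blocks of a
# suspect HDFS path. All paths below are hypothetical examples.
import subprocess
from pathlib import Path

NAME_DIR = Path("/data/dfs/name/current")      # hypothetical dfs.namenode.name.dir
OUT_DIR = Path("/forensics/case-001/namenode")  # hypothetical evidence directory
OUT_DIR.mkdir(parents=True, exist_ok=True)

def dump_metadata() -> None:
    """Convert the newest FsImage and all EditLog segments to XML for review."""
    fsimages = sorted(p for p in NAME_DIR.glob("fsimage_*") if p.suffix != ".md5")
    edit_segments = sorted(NAME_DIR.glob("edits_*"))
    if fsimages:
        subprocess.run(["hdfs", "oiv", "-p", "XML",
                        "-i", str(fsimages[-1]),
                        "-o", str(OUT_DIR / "fsimage.xml")], check=True)
    for seg in edit_segments:
        subprocess.run(["hdfs", "oev", "-i", str(seg),
                        "-o", str(OUT_DIR / (seg.name + ".xml"))], check=True)

def locate_blocks(hdfs_path: str = "/user/suspect") -> None:
    """Ask the live NameNode which DataNodes hold the blocks of a path."""
    report = subprocess.run(["hdfs", "fsck", hdfs_path,
                             "-files", "-blocks", "-locations"],
                            capture_output=True, text=True, check=True)
    (OUT_DIR / "block_locations.txt").write_text(report.stdout)

if __name__ == "__main__":
    dump_metadata()
    locate_blocks()
```

Examining the XML dumps and the fsck report in the field narrows the subsequent static collection to the specific data nodes and block areas that actually hold the evidence, which is the purpose of the live collection and live analysis steps above.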
5 Static Collection
This step collects evidence by file or block, based on the results of the live collection and live analysis steps. Within the scope determined in the live collection step, this step generates the replica and guarantees its integrity against the original; hash values computed at collection time support the later integrity check, as illustrated in the sketch below. Sometimes the whole node is replicated or seized and the damaged system is photographed. For the proof of integrity, all equipment, the office, and the output of the monitor screen must be photographed or video-recorded.

6 Transport
After the collection, as in the original forensic procedure, the transport step either delivers the replicated image stored in a container or transfers the image directly to the investigator's computer. The information from the live analysis, not only that from the collection and transport steps, is also included in this step. The proof of integrity is the most important issue in the transport step, and lawful methods and procedures must be followed to ensure it. If physical equipment is seized, every process must be photographed or video-recorded and monitored by an observer, and the observer's statement must be submitted together with the evidence.

7 Static Analysis
Files or images collected in the field, or seized equipment, are analyzed with various forensic tools and methods. Damaged files are recovered in this step. Sometimes additional forensic techniques are adopted, such as decryption of encoded files, extraction of hidden information from slack space or files, and finding usage traces of malicious equipment or programs.
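The static collection and transport steps above both depend on hash values to prove that a replica matches the original. The following minimal Python sketch, which is not prescribed by the paper, shows one way to record hashes at collection time and re-verify them after transport; SHA-256 is used as an example algorithm and the case paths are hypothetical.

```python
# Minimal sketch: build and verify a hash manifest for collected evidence
# files (replica images, FsImage/EditLog dumps). Paths are hypothetical.
import hashlib
import json
from pathlib import Path

EVIDENCE_DIR = Path("/forensics/case-001/collected")
MANIFEST = Path("/forensics/case-001/manifest.json")

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so arbitrarily large replica images can be hashed."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest() -> None:
    """Record a hash for every collected file at collection time."""
    manifest = {str(p.relative_to(EVIDENCE_DIR)): sha256_of(p)
                for p in sorted(EVIDENCE_DIR.rglob("*")) if p.is_file()}
    MANIFEST.write_text(json.dumps(manifest, indent=2))

def verify_manifest() -> bool:
    """Re-hash after transport and compare against the recorded values."""
    manifest = json.loads(MANIFEST.read_text())
    return all(sha256_of(EVIDENCE_DIR / name) == expected
               for name, expected in manifest.items())

if __name__ == "__main__":
    build_manifest()
    print("integrity preserved:", verify_manifest())
```

A manifest of this kind can be carried with the transport container and checked again on the investigator's computer, so a failed network transfer or a tampered image is detected before the static analysis begins.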
8 Reporting
After all collection and analysis steps are completed, a final report is organized, based on the results of the above steps, for submitting the evidence to the court. This step is similar to the reporting step in other domains, and it must prevent the analysis and reporting time from becoming excessive.

4. Hadoop Based Cyber-Forensic Guidelines

For the Hadoop based cyber forensic guidelines, we consider the characteristics of the Hadoop distributed file system and of cloud services, and we try to help detectives and managers in the field. The Hadoop based cyber forensic guidelines could serve as a field manual. Since the guidelines are meant to be referenced in the field, they focus on the forensic action plan and on live collection and analysis of volatile memory and evidence. The collected and analyzed evidence would be used as proof of cyber crimes in court. In each step of the guidelines, legal process should be followed, and the originality and integrity of the evidence should be guaranteed during evidence collection and analysis [16].

1 Preparation
At this stage, in order to form a live forensic team and obtain cooperation, the necessity of the cyber forensic investigation should be explained to the Hadoop distributed file system administrator, network administrator, service administrator, planner, and legal counsel.

2 Identification
At this stage, in order to complete rapid and accurate evidence analysis, evidence collection and analysis should be narrowed and simplified. Because data is distributed over the whole cloud system, the scope of the Hadoop distributed file system to be verified, collected, and analyzed should be limited as much as possible. The location and structure of the name node, secondary name node, and backup name node should be confirmed, and the network structure of the machines should be analyzed.

3 Live Collection and Live Analysis
At this stage, live collection and analysis should be applied to the name node, secondary name node, and backup node in the field. The system clock should be recorded and used as the reference time of the live collection and analysis process. The file system MAC times can be used as file history tags, recording evidence access and modification times. After that, network connection information, open port and protocol information, user audit information, the network routing table, and process and file information should be collected and analyzed; a sketch of such a collection script is given after guideline 4 below.

4 Static Collection
The disk replication of the damaged system should be minimized. The Hadoop distributed file system basically creates and manages three replicas of each block. All digital information should be collected by the unit of file or replica image, and every collected item is given a hash value to establish integrity between the original and the replica.
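As a field-manual style illustration of guideline 3 above, the following Python sketch collects the volatile items listed there (system clock, network connections and open ports, routing table, processes, login records) together with the MAC times of the NameNode metadata files. It assumes a Linux name node with common utilities (date, uname, ss, ip, ps, last); the commands, metadata directory, and output paths are examples rather than requirements of the guidelines.

```python
# Minimal sketch of live collection on a Linux name node: record the system
# clock, volatile network and process state, and MAC times of the NameNode
# metadata files. Commands and paths are illustrative examples.
import datetime
import os
import subprocess
from pathlib import Path

OUT = Path("/forensics/case-001/live")   # hypothetical evidence directory
OUT.mkdir(parents=True, exist_ok=True)

VOLATILE_COMMANDS = {
    "system_clock.txt": ["date", "-u", "+%Y-%m-%dT%H:%M:%SZ"],
    "uname.txt":        ["uname", "-a"],
    "connections.txt":  ["ss", "-tunap"],       # open ports, protocols, peers
    "routing.txt":      ["ip", "route", "show"],
    "processes.txt":    ["ps", "auxww"],
    "logins.txt":       ["last", "-F"],         # user audit information
}

def collect_volatile() -> None:
    """Run each command and save its output; failures are recorded, not fatal."""
    for name, cmd in VOLATILE_COMMANDS.items():
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, check=True)
            (OUT / name).write_text(result.stdout)
        except (OSError, subprocess.CalledProcessError) as exc:
            (OUT / name).write_text(f"collection failed: {exc}\n")

def record_mac_times(target_dir: str = "/data/dfs/name/current") -> None:
    """Record access/modification/change times of the NameNode metadata files."""
    lines = []
    for path in sorted(Path(target_dir).glob("*")):
        st = os.stat(path)
        lines.append(f"{path}"
                     f"\tatime={datetime.datetime.fromtimestamp(st.st_atime).isoformat()}"
                     f"\tmtime={datetime.datetime.fromtimestamp(st.st_mtime).isoformat()}"
                     f"\tctime={datetime.datetime.fromtimestamp(st.st_ctime).isoformat()}")
    (OUT / "mac_times.txt").write_text("\n".join(lines) + "\n")

if __name__ == "__main__":
    collect_volatile()
    record_mac_times()
```

Recording the clock first lets every later observation be related back to the reference time of the live collection, and keeping the MAC times of the metadata files preserves the file history tags before static collection disturbs them.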
5 Transport
Either the whole disk of the name node or only the hard disk replica should be transported. When name node servers are transported, fenders and safety boxes should be used for protection against external shock and electromagnetic waves. Replicated image files should be transferred to image record disks and put into a safe transport box.

6 Static Analysis
Access attempts and modification evidence on servers and clients should be carefully detected. The original evidence should be carefully preserved, so that the replicas are the ones analyzed and used to track the cyber crime evidence. If the original evidence must be analyzed, it should first be write-locked [24].

7 Reporting
Reports should be clear and detailed. A report would be treated as evidence in court and must be managed carefully.

5. Conclusion

There are many difficulties in cyber forensics in the rapidly changing IT environment, and previous cyber forensic methods and tools are still insufficient to satisfy the new requirements of cloud services. The cyber forensics for a mass distributed file system based on the Hadoop distributed file system suggested in this paper still has many parts to be complemented and improved. In this paper, we presented the issues and the reasons why previous cyber forensic technologies and procedures are not appropriate for the Hadoop distributed file system. There are two problems in the evidence gathering stage: one is confirming the file blocks that are replicated on different nodes, and the other is the excessive increase in the time needed to copy the original data. When the entire file system is replicated in order to transfer the evidence, there are also difficulties in acquiring enough storage for moving it. In the multi-user environment of the Hadoop distributed file system, the entire service could be stopped when the system is halted or evidence is seized. Rising demands for and interest in mass storage call for new forensic technologies and methodologies. In order to solve these problems, Hadoop based cyber forensics has to combine live analysis of the master node at the crime scene with static analysis after acquisition of the evidence, and ways to acquire and analyze evidence without stopping the cloud service are required because of the characteristics of the multi-user environment.

References

[1] H. Kim and Y. Lee, "Cloud computing service present condition and prospects", Korea Information and Communications Society Magazine, vol. 27, no. 12.
[2] J. H. Yun, Y. H. Park, S. J. Lee, S.-M. Jang, J. S. Yoo, H. Y. Kim and Y.-K. Kim, "A Non-Shared Metadata Management Scheme of Large Distributed File Systems", The Korea Institute of Information Scientists and Engineers Journal, vol. 36, no. 4.
[3] S. Ghemawat, H. Gobioff and S.-T. Leung, "The Google File System", Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (SOSP '03), (2003).
[4] Y. S. Min, K. S. Jin, H. Y. Kim and Y. K. Kim, "A Trend to Distributed File Systems for Cloud Computing", ETRI Electronics and Telecommunications Trends, vol. 24, no. 12, (2009).
[5] Hadoop: OSS Based Massive Data Management Technology.
[6] HDFS Architecture Guide, (2012).
[7] Hadoop: The Definitive Guide, 2nd Edition, O'Reilly Media.
[8] Apache HBase Reference Guide, Apache Software Foundation, (2012).
[9] "Analyzing Big Data with Hadoop, decorated a new history", Bloter.net, archives/
[10] Y. H. Yoo, B. G. Song and S. J. Park, "The necessity and training plans for Digital Forensics expertise", Korean Association of Police Science Review, vol. 11, no. 4, (2009).
[11] G. Y. Na, "A Study on the Gathering of Forensics in Cloud Computing Environment", (2011).
[12] C.-S. Park, "Study on Security Considerations in the Cloud Computing", The Korea Academia-Industrial Cooperation Society, vol. 12, no. 3, (2011) March.

Authors

ChaeHo Cho
Mr. Cho is a graduate student at Korea National Open University, Dept. of Computer Science. He is interested in cloud computing and the Hadoop file system.

SungHo Chin
Dr. Chin received the B.S., M.E., and Ph.D. degrees in Computer Science Education from Korea University in 2002, 2004, and 2010, respectively. He is currently a Senior Research Engineer at LG Electronics. His research interests are in cloud computing, grid computing, distributed computing, and service delivery platforms.

Kwang Sik Chung
Dr. Chung received the B.S., M.E., and Ph.D. degrees from Korea University in 1993, 1995, and 2000, respectively. His major was distributed mobile computing. He is currently interested in m-learning and cloud computing for smart learning. He is an assistant professor at Korea National Open University.