Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis
October 2014, San Francisco, USA

Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis

Teng Zhao, Kai Qian, Dan Lo, Minzhe Guo, Prabir Bhattacharya, Wei Chen, and Ying Qian

Abstract: The rapid growth of big data analysis has created a shortage of professionals in this area. Despite the great need, big data analytics and data science are not well represented in current academic programs. Although a few schools have begun offering courses in this area, they all face challenges. Most existing instruction models are impractical: they consist only of lectures, seminars, and discussions, without hands-on, problem-solving computing labs, because such labs require significant investment in resources and place high demands on instructors (e.g., faculty whose expertise is in this area). To overcome these difficulties, this paper presents a new labware with real hands-on labs that support the course objectives. The designed labware is inexpensive, portable, and easy to adopt.

Index Terms: Big data, analysis, security, lab experimentation

I. INTRODUCTION

The recent explosion of cyber activity in social networks, healthcare, research, and many other businesses has produced huge volumes of data (big data), much of it unstructured and in various forms. These data can provide very valuable information, such as for bioinformatics research and for security and business analysis, and can be mined for new knowledge, patterns, and relevant relationships. The rapid growth of big data analysis has resulted in a shortage of professionals in this area. Despite the great need, big data analytics and data science are not well represented in current academic computing programs. Although a few schools have started offering courses in this area, they face challenges. It is crucial for students in data science and computing to know how to manipulate, store, and analyze big data with computing technology.
Most existing instruction models focus on lectures, seminars, and discussions without any hands-on computing labs, because such labs rely on two prerequisites: significant investment in resources (e.g., cloud services) and high requirements for instructors (e.g., faculty whose expertise is in this area).

T. Zhao, K. Qian, and D. Lo are with the Dept. of Computer Science and Software Engineering, Southern Polytechnic State University, Marietta, GA, USA ([email protected], [email protected], and [email protected]). M. Guo and P. Bhattacharya are with the Dept. of Computer Science, University of Cincinnati, Cincinnati, OH, USA ([email protected] and [email protected]). W. Chen is with the Dept. of Computer Science, Tennessee State University, Nashville, TN, USA ([email protected]). Y. Qian is with the Dept. of Computer Science & Technology, East China Normal University, Shanghai, China ([email protected]).

There is a need for an adoptable, open-source laboratory model for teaching and learning big data analysis. To overcome the difficulties above and broaden big data analysis education, this project develops a new hands-on labware for big data processing and analysis that is portable, modular, and easy to adopt. Our labware adopts the open-source Hadoop framework, with a NoSQL store for large volumes of unstructured data and MapReduce for parallel processing.

II. BIG DATA ANALYSIS FOR INFORMATION SECURITY COURSE

A. Course Objectives and Learning Outcomes

OBJECTIVES:
To explore the fundamental concepts of big data analytics
To learn big data analysis with intelligent reasoning and data mining
To understand applications using MapReduce concepts
To apply big data analysis to cybersecurity

LEARNING OUTCOMES (LOs): Students will be able to:
LO1: Work with a big data platform
LO2: Work with a NoSQL database to store unstructured big data
LO3: Use Hadoop and the associated MapReduce technologies for big data analytics
LO4: Design big data mining algorithms for business applications, especially cybersecurity threat analysis

B. Course Description

A project-oriented, hands-on course focusing on big data application development for information security over distributed environments. Students explore key data analysis and management techniques applied to massive network traffic datasets to support real-time decision making about security threats in distributed environments. The course is designed as a hands-on learning experience because students learn better by doing. With concrete, practical experience, students are better prepared to apply their new knowledge in real-life, data-intensive research situations. Students are expected to make use of the MapReduce parallel computing paradigm and associated technologies, such as distributed file systems, Mongo databases, and stream computing engines, to design scalable big data analysis systems. Students should be able to apply machine
learning (supervised or unsupervised), information extraction, feature selection, and stream mining in information security domains.

III. HANDS-ON LABORATORY

To achieve the course objectives and student learning outcomes, we designed and implemented hands-on labware for distributed and parallel big data analysis.

A. Lab-1: Apache Hadoop Configuration

Lab-1 introduces a big data platform, Apache Hadoop, and lets students see how Hadoop and a MapReduce task work. The lab provides step-by-step instructions that enable students to configure and work with the platform, which satisfies learning outcome LO1. Apache Hadoop is an open-source framework for reliable, scalable distributed computing over huge data sets on a cluster of computers. The framework is written in Java and released under the Apache License. A Hadoop system is generally composed of one computer acting as the master node and multiple computers acting as slave nodes, as shown in Fig 1. Since version 0.23, Hadoop has adopted a new MapReduce framework, called YARN, consisting of a Resource Manager and Node Managers. The Resource Manager manages and dispatches resources in the Hadoop cluster and receives data from the Node Manager on each slave node; each Node Manager reports resource usage and task progress to the Resource Manager. As shown in Fig 2, a MapReduce task is executed in a distributed manner. Each DataNode passes its portion of the input data to a map function. The map function turns the input into key-value pairs, which are sent on to a reduce function. After the reduce step completes, the data is saved as a result or used for further processing. In this lab, the Hadoop cluster consists of three laptops, which helps students get a sense of distributed and parallel computing.
One laptop acts as the master node, and the other two laptops are slave nodes. After following the tutorial to configure the Hadoop cluster, students start it with the command shown in Fig 3.
Fig 3. Command for Starting the Hadoop Cluster
Students can then check whether the Hadoop cluster is running successfully with the command shown in Fig 4.
Fig 4. Command for Checking the Hadoop Cluster
Fig 1. Architecture of the Hadoop Cluster
Hadoop has two main modules: HDFS (Hadoop Distributed File System) and the MapReduce framework. HDFS usually has a single NameNode, which manages the directory tree and the metadata of HDFS files. It may also have a Secondary NameNode, which backs up the image files, periodically merges the edit logs with the image files, and sends the result back to the NameNode. In general, the NameNode and Secondary NameNode are deployed on the master node. The HDFS DataNode is responsible for storing data and sending processed data back to the NameNode; it is usually deployed on a slave node. Fig 5, Fig 6, and Fig 7 display the running state of each laptop. The master node laptop should list "Jps," "NameNode," "SecondaryNameNode," and "ResourceManager"; the slave node laptops should list "Jps," "DataNode," and "NodeManager."
Fig 5. Master Node Running Condition
Fig 6. Slave-1 Node Running Condition
Fig 7. Slave-2 Node Running Condition
Fig 2. MapReduce Process in Hadoop

B. Lab-2: NoSQL Database - Apache HBase

Lab-2 introduces a NoSQL database, Apache HBase. The lab offers a detailed configuration tutorial that helps students set up HBase on their laptops, along with simple instructions for using the HBase Shell to create tables, insert rows, display contents, and so on. This lab meets the requirements of learning outcome LO2.
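The map-to-reduce dataflow described in Lab-1 (input split across nodes, map emitting key-value pairs, reduce combining them) can be modeled with a short, stand-alone Python sketch. The labware itself runs Java MapReduce on a Hadoop cluster; this sketch only imitates the logical flow of Fig 2 in a single process, and all names and sample data here are illustrative:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map step: emit a (key, value) pair for every word, as in Fig 2."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce step: group the pairs by key and combine their values."""
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (key, sum(value for _, value in group))

# Hadoop would split this input across DataNodes and shuffle the
# key-value pairs to reducers; here one process plays every role.
data = ["big data analysis", "big data security"]
result = dict(reduce_phase(map_phase(data)))
print(result)  # {'analysis': 1, 'big': 2, 'data': 2, 'security': 1}
```

The sort-then-group step stands in for Hadoop's shuffle phase, which guarantees that all values for one key reach the same reducer.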
Apache HBase is an open-source NoSQL database system that is highly reliable, efficient, column-oriented, and scalable. It is modeled after Google's BigTable. Just as BigTable uses GFS as its file storage system, HBase uses Hadoop HDFS as its storage layer; where Google runs MapReduce to process huge data sets in BigTable, HBase runs Hadoop MapReduce; and where BigTable relies on Chubby as its coordination service, HBase uses ZooKeeper. HBase was chosen for this lab because, as a NoSQL database, it can easily store huge amounts of unstructured data, and because HDFS provides reliable low-level storage for HBase, it works well for processing large data sets with MapReduce in the later labs.

After following the tutorial to configure HBase, students start the HBase service with the commands shown in Fig 8.
Fig 8. Command for Starting HBase
Students then start the HBase Shell with the commands shown in Fig 9.
Fig 9. Command for Starting the HBase Shell
After starting the HBase Shell, students can perform database manipulation. As shown in Fig 10, students use 'create' to create a table and 'list' to confirm that the table was created.
Fig 10. Command for Creating a Table in the HBase Shell
As shown in Fig 11, students use 'put' to insert one row with the value 'TEST_VALUE' in column 'Col_1'.
Fig 11. Command for Inserting Rows in the HBase Shell
As shown in Fig 12, students use 'get' to display the row they just inserted into the table.
Fig 12. Command for Displaying Contents in the HBase Shell

C. Lab-3: Word Count Application

Lab-3 lets students test the cloud cluster they have just configured and get a sense of how a MapReduce task is executed. The Word Count application was selected because it is the traditional MapReduce example and is easy for students to absorb. This lab helps students attain learning outcome LO3. The lab is developed in the Eclipse IDE and contains a single source file, WordCountLab.java, composed of three parts: the Mapper, the Reducer, and the main function. The MapReduce algorithm of this lab is displayed in Fig 13.
Fig 13. MapReduce Algorithm for WordCountLab
After implementing this source file in Eclipse, students follow the earlier instructions to first start the Hadoop cluster and configure the Hadoop plugin in Eclipse so that the MapReduce task can be executed directly from the IDE. The result of the Word Count application is displayed in Fig 14(a), (b), and (c).
Fig 14(a). Commands for Checking the Word Count Result
Fig 14(b). Word Count Application Result in HBase - Part 1
Fig 14(c). Word Count Application Result in HBase - Part 2

D. Lab-4: Big Data Analysis for Security

Lab-4 uses the configured cloud cluster to process a large set of network logs and analyze whether the host machine is under a DDoS attack. The lab gives students a deeper understanding of how to implement a MapReduce task that performs big data security analysis. DDoS attacks are detected with a count-based method [7]. The program is first run over a data set of network logs containing no DDoS attacks, and a threshold is recorded as the baseline for detection. When the program is then run over a data set of network logs that does contain DDoS attacks, it counts the TCP requests per host: if the count exceeds the threshold, the host is marked "In Danger"; if the count is below the threshold, the host is marked "Normal." This lab helps students accomplish learning outcome LO4.
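The count-based detection logic just described can be sketched in plain Python. The lab implements it as a Java MapReduce job over HBase; this stand-alone sketch only shows the per-destination counting and threshold test, and the log format, threshold value, and sample data are illustrative:

```python
from collections import Counter

THRESHOLD = 3  # illustrative; the lab derives this value from attack-free logs

def count_tcp_requests(log_lines):
    """Aggregate, MapReduce-style: count TCP requests per destination IP."""
    counts = Counter()
    for line in log_lines:
        timestamp, destination, request_type = line.split()
        if request_type == "TCP":
            counts[destination] += 1
    return counts

def evaluate(counts, threshold=THRESHOLD):
    """Mark each destination 'In Danger' or 'Normal' against the threshold."""
    return {dest: ("In Danger" if n > threshold else "Normal")
            for dest, n in counts.items()}

# Five toy log lines: timestamp, destination, request type.
logs = [
    "1 10.0.0.5 TCP", "2 10.0.0.5 TCP", "3 10.0.0.5 TCP",
    "4 10.0.0.5 TCP", "5 10.0.0.9 TCP",
]
print(evaluate(count_tcp_requests(logs)))
# {'10.0.0.5': 'In Danger', '10.0.0.9': 'Normal'}
```

In the real lab the counting runs as map and reduce functions across the slave nodes, and the evaluation result is written back to an HBase result table.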
Fig 15 shows the overall architecture of this lab. First, students download the 2000 DARPA Intrusion Detection Scenario-Specific Data Sets from MIT Lincoln Laboratory. Second, students use the Wireshark software to convert the dump log files to text log files, which makes it easy to extract the log data.
Fig 15. Network Log Security Analysis Workflow
The processing steps are:
1. Parse the text network log files and save the data into HBase (selecting the timestamp, destination, and request type from each line of the log files).
2. Read the data from HBase into Hadoop HDFS.
3. The Hadoop master node reads the data from HDFS.
4. Split the data into chunks and send them to the slave nodes.
5. Execute the map function on each slave node.
6. Send the processed results from the slave nodes back to the master node.
7. Execute the reduce function.
8. Analyze the reduce function's result and save it into a result table in HBase.
The Lab-4 project includes five files:
Constants.java defines constants and the threshold.
HBaseUtil.java defines and implements basic HBase operations, such as creating and deleting tables and getting rows from or putting rows into the HBase database.
StaticSecurityLogMapReduceAnalysis.java implements the lab's MapReduce algorithm and acts as the entry point of the lab, as shown in Fig 16.
StaticLogParser.java reads the data from the text network log files and saves it into the HBase NoSQL database.
MapReduceResultAnalyzer.java analyzes the MapReduce task result and evaluates each host to decide whether it is under DDoS attack.
Fig 16. MapReduce & Analysis Algorithm for BigDataSecurityAnalysisLab
After implementing these five source files in Eclipse, students follow the previous tutorials to first start the Hadoop cluster and the HBase NoSQL database; they should then be able to execute the application on the Hadoop/HBase cloud cluster. Fig 17 shows the parsed network log file data in HBase. Fig 17 (a).
Commands for Checking Parsed Network Log File Data in HBase
Fig 17 (b). Parsed Network Log File Data in HBase
Fig 18 shows the result of the big data security analysis application in HBase.
Fig 18. Result of the Big Data Security Analysis Application
The row name is the destination IP of the request, and each row has two columns: the number of requests and the evaluation status. When the number of requests exceeds the threshold, the status becomes 'In Danger'. A network administrator could use this application to analyze DDoS security threats by running it against network log files to see whether any host IP was attacked.

IV. CONCLUSION

The key contribution of our work is a hands-on lab environment model with four lab assignments for learning big data analysis, progressing from easily configuring the Hadoop and HBase environment, to executing the example Word Count application, to finally implementing a big data security analysis MapReduce application. After taking this course,
students will gain the ability to design highly scalable systems that can accept, store, and analyze large volumes of unstructured data.

REFERENCES
[1] Y. N. Silva, S. W. Dietrich, and J. M. Reed, "Integrating Big Data into the Computing Curricula," ACM SIGCSE.
[2] "An Undergraduate Degree in Data Science: Curriculum and a Decade of Implementation Experience," ACM SIGCSE 2014.
[3] P. S. Buffum, "CS Principles Goes to Middle School: Learning How to Teach 'Big Data'," ACM SIGCSE 2014.
[4] N. Hamid and S. Benzel, "Towards Engaging Big Data for CS1/2."
[5] ACM CSTA, "Getting Ready for Big Data."
[6] J. Cheon and T.-Y. Choe, "Distributed Processing of Snort Alert Log using Hadoop," International Journal of Engineering and Technology.
[7] Y. Lee and Y. Lee, "Detecting DDoS Attacks with Hadoop," in Proc. ACM CoNEXT Student Workshop (CoNEXT '11).
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
Mining of Web Server Logs in a Distributed Cluster Using Big Data Technologies
Mining of Web Server Logs in a Distributed Cluster Using Big Data Technologies Savitha K Dept. of Computer Science, Research Scholar PSGR Krishnammal College for Women Coimbatore, India. Vijaya MS Dept.
International Journal of Emerging Technology & Research
International Journal of Emerging Technology & Research High Performance Clustering on Large Scale Dataset in a Multi Node Environment Based on Map-Reduce and Hadoop Anusha Vasudevan 1, Swetha.M 2 1, 2
Distributed Framework for Data Mining As a Service on Private Cloud
RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &
Oracle Big Data Essentials
Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 40291196 Oracle Big Data Essentials Duration: 3 Days What you will learn This Oracle Big Data Essentials training deep dives into using the
NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE
NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE Anjali P P 1 and Binu A 2 1 Department of Information Technology, Rajagiri School of Engineering and Technology, Kochi. M G University, Kerala
Hadoop Job Oriented Training Agenda
1 Hadoop Job Oriented Training Agenda Kapil CK [email protected] Module 1 M o d u l e 1 Understanding Hadoop This module covers an overview of big data, Hadoop, and the Hortonworks Data Platform. 1.1 Module
Big Data and Hadoop with components like Flume, Pig, Hive and Jaql
Abstract- Today data is increasing in volume, variety and velocity. To manage this data, we have to use databases with massively parallel software running on tens, hundreds, or more than thousands of servers.
R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5
Distributed data processing in heterogeneous cloud environments R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5 1 [email protected], 2 [email protected],
On a Hadoop-based Analytics Service System
Int. J. Advance Soft Compu. Appl, Vol. 7, No. 1, March 2015 ISSN 2074-8523 On a Hadoop-based Analytics Service System Mikyoung Lee, Hanmin Jung, and Minhee Cho Korea Institute of Science and Technology
Log Mining Based on Hadoop s Map and Reduce Technique
Log Mining Based on Hadoop s Map and Reduce Technique ABSTRACT: Anuja Pandit Department of Computer Science, [email protected] Amruta Deshpande Department of Computer Science, [email protected]
Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components
Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components of Hadoop. We will see what types of nodes can exist in a Hadoop
HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM
HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM 1. Introduction 1.1 Big Data Introduction What is Big Data Data Analytics Bigdata Challenges Technologies supported by big data 1.2 Hadoop Introduction
Big Data Storage Architecture Design in Cloud Computing
Big Data Storage Architecture Design in Cloud Computing Xuebin Chen 1, Shi Wang 1( ), Yanyan Dong 1, and Xu Wang 2 1 College of Science, North China University of Science and Technology, Tangshan, Hebei,
Cloud Computing at Google. Architecture
Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale
Distributed Processing of Snort Alert Log using Hadoop
Distributed Processing of Snort Alert Log using Hadoop JeongJin Cheon #1, Tae-Young Choe 2 # Department of Computer Engineering, Kumoh National Institute of Technology Gumi, Gyeongbuk, Korea 1 [email protected]
Big Data Explained. An introduction to Big Data Science.
Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, [email protected] Assistant Professor, Information
Addressing Open Source Big Data, Hadoop, and MapReduce limitations
Addressing Open Source Big Data, Hadoop, and MapReduce limitations 1 Agenda What is Big Data / Hadoop? Limitations of the existing hadoop distributions Going enterprise with Hadoop 2 How Big are Data?
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing
Pro Apache Hadoop. Second Edition. Sameer Wadkar. Madhu Siddalingaiah
Pro Apache Hadoop Second Edition Sameer Wadkar Madhu Siddalingaiah Contents J About the Authors About the Technical Reviewer Acknowledgments Introduction xix xxi xxiii xxv Chapter 1: Motivation for Big
International Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 8, August 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
How To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI
Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN
Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current
Apache Hadoop new way for the company to store and analyze big data
Apache Hadoop new way for the company to store and analyze big data Reyna Ulaque Software Engineer Agenda What is Big Data? What is Hadoop? Who uses Hadoop? Hadoop Architecture Hadoop Distributed File
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON BIG DATA MANAGEMENT AND ITS SECURITY PRUTHVIKA S. KADU 1, DR. H. R.
Query and Analysis of Data on Electric Consumption Based on Hadoop
, pp.153-160 http://dx.doi.org/10.14257/ijdta.2016.9.2.17 Query and Analysis of Data on Electric Consumption Based on Hadoop Jianjun 1 Zhou and Yi Wu 2 1 Information Science and Technology in Heilongjiang
Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 15
Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 15 Big Data Management V (Big-data Analytics / Map-Reduce) Chapter 16 and 19: Abideboul et. Al. Demetris
Hadoop Technology for Flow Analysis of the Internet Traffic
Hadoop Technology for Flow Analysis of the Internet Traffic Rakshitha Kiran P PG Scholar, Dept. of C.S, Shree Devi Institute of Technology, Mangalore, Karnataka, India ABSTRACT: Flow analysis of the internet
Peers Techno log ies Pv t. L td. HADOOP
Page 1 Peers Techno log ies Pv t. L td. Course Brochure Overview Hadoop is a Open Source from Apache, which provides reliable storage and faster process by using the Hadoop distibution file system and
Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies
Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Image
HadoopRDF : A Scalable RDF Data Analysis System
HadoopRDF : A Scalable RDF Data Analysis System Yuan Tian 1, Jinhang DU 1, Haofen Wang 1, Yuan Ni 2, and Yong Yu 1 1 Shanghai Jiao Tong University, Shanghai, China {tian,dujh,whfcarter}@apex.sjtu.edu.cn
