The Quest for Conformance Testing in the Cloud
|
|
|
- Deirdre Bruce
- 10 years ago
- Views:
Transcription
1 The Quest for Conformance Testing in the Cloud Dylan Yaga Computer Security Division Information Technology Laboratory National Institute of Standards and Technology
2 NIST/ITL Computer Security Division Efforts NIST/ITL Computer Security Division supports the development of biometric conformance testing methodology standards and other conformity assessment efforts through active technical participation in the development of these standards and the development of associated conformance test architectures and test suites. These test tools are developed to promote adoption of these standards and to support users that require conformance to selected biometric standards, product developers and testing labs. These test tools are collectively titled BioCTS
3 BioCTS Background Traditional Desktop-Based Application Developed in Microsoft C# either with the Installer Graphical User Interface, or Command Line, used to test conformance to Biometric Data Interchange Records Two Conformance Test Architectures (CTAs) Tests Implementations of ANSI/NIST-ITL and ANSI/NIST-ITL Update: 2013 Tests Implementations of select ISO/IEC X Generation 1 & 2 Data Formats, ANSI/INCITS, and several PIV Profiles of Standards Tests 1000s+ of files in a single Batch Test Allows Editing of Implementations Under Test Provides Charts, and Detailed Test Results in Text and XML Format
4 BioCTS Screen Shots
5 The Quest BioCTS, like most Desktop-Based applications, is only as powerful as the computer it is being run on. While pursuing methods of increasing performance, several questions were asked: What if BioCTS could run on multiple computers at once? What if BioCTS could run on multiple computers at once, and that fact was transparent to the end-user? What if BioCTS could run on multiple computers at once, and that fact was transparent to the end-user, and the number of computers would scale up or down depending on the work load? Finally, the question was asked: What if BioCTS ran in the cloud?
6 Benefits of Cloud Software Scalable, Powerful Computing Many computers working together The amount of computing power can increase, or decrease depending on the workload Redundancy and Work Load Balancing Often Cloud Frameworks automate redundant distribution of data, in case one of the many computers fails Often Cloud Frameworks ensure there are as little idle computers as possible On Demand Computing Use it when needed, stop when work is done
7 Interests in Testing in the Cloud Several organizations have shown interest in Testing in the Cloud BCC 2012 Bojan Cukic, CITeR: Services and Computational Cloud: The Inevitable Future of Biometrics BCC Ravi Panchumarthy, Ravi Subramanian, and Sudeep Sarkar, University of South Florida: Biometric Evaluation on the Cloud: A Case Study with HumanID Gait Challenge BCC 2013 Prof. Young-Bin Kwon and Dr. Jason Kim - Chung-Ang Univ. and Korea Internet & Security Agency (KISA) - K-NBTC BIOMETRIC TEST SERVICES AND CERTIFICATION
8 Apache Hadoop Background Open Source Framework for creating scalable distributed computing system (also known as a cluster) Runs on commodity computers (not specialized hardware) known as Nodes Linux & Java Based Processes Data using the MapReduce Programming Model MapReduce originally referred to the proprietary Google technology, now is a general term for the technology Maps portions of data out to Nodes, and processes it Reduces the portions of processed data results into an aggregation
9 Setting the Stage for the Quest To perform this research with as little ramp up time as possible, a preconfigured Virtual Machine was used This Virtual Machine was pre-setup with Apache Hadoop, Java and several other helpful utilities A Virtual Machine, available for download from Cloudera, Inc., was used The Cloudera, Inc. QuickStart Virtual Machine image used was version 4.X (4.4 was initially used, but 4.7 has also been tested) The Virtual Machine QuickStart images can be found:
10 Benefits of using a Virtual Machine (VM) VMs are portable The same VM can be used on several different computers. VMs can have their state saved and restored. A VM can have the entire state saved to be restored if things become unstable later on; less time repeating work VMs are how a lot of cloud services work (e.g., Amazon Elastic Compute Cloud (E2C) works off VMs)
11 Problems Encountered on the Quest Platform Incompatibilities Linux vs. Microsoft C# Microsoft within a Java MapReduce Job How Apache Hadoop Processes Data vs. How BioCTS Processes Data Large Files, many splits vs. Small Discrete Files, no splitting Linefeed Characters within Files
12 Problem 1: Platform Incompatibilities How can the Linux and Java-based Apache Hadoop Framework work with the Microsoft C# BioCTS Conformance Test Suites? Possible Solutions: Rewrite the Software in Java No: Long process, could introduce bugs, may not provide 100% exact compatibility with existing Conformance Test Suite Change Platforms Not yet: Initial implementation targets existing systems, which were already running Linux & Apache Hadoop Possible Future Work: Expanding to other Platforms Investigate a method for running Microsoft C# Under Linux Quickest method, if possible
13 Problem 1: Platform Incompatibilities - Solved Step 1: Find a way for Microsoft C# to Run under Linux The open source implementation of C#, known as Mono, is cross-platform, and runs well under Linux The BioCTS software compiles perfectly with Mono and can then run in Linux Step 2: Find a way for the Java-Based Apache Hadoop MapReduce to run C# A method called Hadoop Streaming enables Apache Hadoop to use any program/programming language that Linux supports to be embedded within a MapReduce Job This is where the information was sparse, the API is very informative, but literature and publications simply mention the capability in passing During research, was unable to locate a real world example of doing this exact procedure
14 Hadoop Streaming Hadoop Streaming is a JAR (Java Archive) file that is distributed with Apache Hadoop This JAR file allows a developer to create MapReduce jobs with any script or executable file, allowing the specification of Program for the Map function Program for the Reduce Function Input File Type Additional Files needed for distribution Allows Apache Hadoop to use any program that can run on Linux
15 Next Step Now that BioCTS could run on Linux within a Java-based MapReduce Job, the focus shifted on processing data This is where the second problem occurred Apache Hadoop and BioCTS process data in two fundamentally different ways At least, initially
16 Problem 2: Small File Problem Apache Hadoop processes large amounts of data by splitting it up, and often times from a single file BioCTS processes data by testing a large number of discrete files, as a whole with many inter-relationships needed to be tested within a file therefore it CANNOT be split A simple solution would be to concatenate the files but Apache Hadoop could still split them arbitrarily and not on file boundaries So, how to process many files at once, but have Apache Hadoop think it is one giant file?
17 Problem 2: Small File Problem Solved Sequence Files a file that is comprised of many key-value pairs The Key is the file path The value is the Data for that file Apache Hadoop accepts Sequence Files as input, and knows to split the file based on key-values Each Key, Value pair is recorded on its own line So the data does not get arbitrarily split!
18 Sub-Problem 2: Converting Files with Internal New Line Characters When converting the individual files into a Sequence File a problem was found: When the files-to-convert contained internal new line characters, the process would not work The Line Feed characters were placed in the Sequence File The solution? Eliminate the new line characters, but preserve the data because the complete file is needed for testing The Implementation: A quick conversion of the files-to-convert to a base-64 encoded file, would preserve the data, but not contain new line characters The base-64 encoded data would then have to be decoded by the BioCTS software before processing
19 base64 benefit: Single Line of Data Single ANSI/NIST-ITL Transaction, which is multiple lines (due to internal linefeed characters) Same ANSI/NIST-ITL Transaction, encoded as base64 a single line
20 BioCTS on Apache Hadoop - Overview Six Phase Process Some processing performed on the Client Computer before uploading files to Apache Hadoop Two MapReduce Jobs Additional processing performed on the Client Computer after processing has been performed within Apache Hadoop All of this has been automated with a script
21 BioCTS on Apache Hadoop Phase 1
22 BioCTS on Apache Hadoop Phase 2
23 BioCTS on Apache Hadoop Phase 3
24 BioCTS on Apache Hadoop Phase 4
25 BioCTS on Apache Hadoop Phase 5
26 BioCTS on Apache Hadoop Phase 6
27 BioCTS on Apache Hadoop All Together
28 BioCTS on Apache Hadoop Screen Shot
29 Success BioCTS on Apache Hadoop has successfully performed conformance testing on sample test data (e.g., ANSI/NIST-ITL )! Utilizing cloud computing technologies allows testing many files in parallel Have been able to utilize the existing Conformance Test Suites, leveraging the years of development, within an Apache Hadoop environment Successful mixture of technologies to create a solution The process could be used for other types of testing Performance Quality Image analysis Anything that runs on discrete files that can be done in parallel What was originally a research project resulted in a working implementation
30 Planned Next Steps Expanding the work to a multi-node cluster Ensuring that this process works on the latest versions of the required software Develop a user interface, so users are not using command lines Expand to all available Conformance Test Suites
31 Questions? Contact BioCTS Team All Available BioCTS Software Downloads are available from: BioCTS Team Dylan Yaga
NIST/ITL CSD Biometric Conformance Test Software on Apache Hadoop. September 2014. National Institute of Standards and Technology (NIST)
NIST/ITL CSD Biometric Conformance Test Software on Apache Hadoop September 2014 Dylan Yaga NIST/ITL CSD Lead Software Designer Fernando Podio NIST/ITL CSD Project Manager National Institute of Standards
Biometric Evaluation on the Cloud: A Case Study with HumanID Gait Challenge
2013 Biometric Consortium Conference Sep17-19, 2013, Tampa, Florida, USA Biometric Evaluation on the Cloud: A Case Study with HumanID Gait Challenge Ravi Panchumarthy, Ravi Subramanian, and Sudeep Sarkar
A Service for Data-Intensive Computations on Virtual Clusters
A Service for Data-Intensive Computations on Virtual Clusters Executing Preservation Strategies at Scale Rainer Schmidt, Christian Sadilek, and Ross King [email protected] Planets Project Permanent
Assignment # 1 (Cloud Computing Security)
Assignment # 1 (Cloud Computing Security) Group Members: Abdullah Abid Zeeshan Qaiser M. Umar Hayat Table of Contents Windows Azure Introduction... 4 Windows Azure Services... 4 1. Compute... 4 a) Virtual
Chapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
Open source Google-style large scale data analysis with Hadoop
Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: [email protected] Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical
Hadoop IST 734 SS CHUNG
Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to
Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000
Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000 Alexandra Carpen-Amarie Diana Moise Bogdan Nicolae KerData Team, INRIA Outline
Parallel Data Mining and Assurance Service Model Using Hadoop in Cloud
Parallel Data Mining and Assurance Service Model Using Hadoop in Cloud Aditya Jadhav, Mahesh Kukreja E-mail: [email protected] & [email protected] Abstract : In the information industry,
Final Project Proposal. CSCI.6500 Distributed Computing over the Internet
Final Project Proposal CSCI.6500 Distributed Computing over the Internet Qingling Wang 660795696 1. Purpose Implement an application layer on Hybrid Grid Cloud Infrastructure to automatically or at least
What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea
What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea Overview Riding Google App Engine Taming Hadoop Summary Riding
Cloudera Certified Developer for Apache Hadoop
Cloudera CCD-333 Cloudera Certified Developer for Apache Hadoop Version: 5.6 QUESTION NO: 1 Cloudera CCD-333 Exam What is a SequenceFile? A. A SequenceFile contains a binary encoding of an arbitrary number
Hadoop in the Hybrid Cloud
Presented by Hortonworks and Microsoft Introduction An increasing number of enterprises are either currently using or are planning to use cloud deployment models to expand their IT infrastructure. Big
Hadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia
Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop
Ubuntu and Hadoop: the perfect match
WHITE PAPER Ubuntu and Hadoop: the perfect match February 2012 Copyright Canonical 2012 www.canonical.com Executive introduction In many fields of IT, there are always stand-out technologies. This is definitely
Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop
Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social
Click Stream Data Analysis Using Hadoop
Governors State University OPUS Open Portal to University Scholarship Capstone Projects Spring 2015 Click Stream Data Analysis Using Hadoop Krishna Chand Reddy Gaddam Governors State University Sivakrishna
Introduction to Cloud Computing
Introduction to Cloud Computing Cloud Computing I (intro) 15 319, spring 2010 2 nd Lecture, Jan 14 th Majd F. Sakr Lecture Motivation General overview on cloud computing What is cloud computing Services
Internals of Hadoop Application Framework and Distributed File System
International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop
Sriram Krishnan, Ph.D. [email protected]
Sriram Krishnan, Ph.D. [email protected] (Re-)Introduction to cloud computing Introduction to the MapReduce and Hadoop Distributed File System Programming model Examples of MapReduce Where/how to run MapReduce
Introduction to Hadoop
Introduction to Hadoop 1 What is Hadoop? the big data revolution extracting value from data cloud computing 2 Understanding MapReduce the word count problem more examples MCS 572 Lecture 24 Introduction
An Introduction to Private Cloud
An Introduction to Private Cloud As the word cloud computing becomes more ubiquitous these days, several questions can be raised ranging from basic question like the definitions of a cloud and cloud computing
Red Hat Storage Server
Red Hat Storage Server Marcel Hergaarden Solution Architect, Red Hat [email protected] May 23, 2013 Unstoppable, OpenSource Software-based Storage Solution The Foundation for the Modern Hybrid
Cloud Platforms, Challenges & Hadoop. Aditee Rele Karpagam Venkataraman Janani Ravi
Cloud Platforms, Challenges & Hadoop Aditee Rele Karpagam Venkataraman Janani Ravi Cloud Platform Models Aditee Rele Microsoft Corporation Dec 8, 2010 IT CAPACITY Provisioning IT Capacity Under-supply
Cloud computing - Architecting in the cloud
Cloud computing - Architecting in the cloud [email protected] 1 Outline Cloud computing What is? Levels of cloud computing: IaaS, PaaS, SaaS Moving to the cloud? Architecting in the cloud Best practices
Data processing goes big
Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
Big Data Analytics OverOnline Transactional Data Set
Big Data Analytics OverOnline Transactional Data Set Rohit Vaswani 1, Rahul Vaswani 2, Manish Shahani 3, Lifna Jos(Mentor) 4 1 B.E. Computer Engg. VES Institute of Technology, Mumbai -400074, Maharashtra,
Manjrasoft Market Oriented Cloud Computing Platform
Manjrasoft Market Oriented Cloud Computing Platform Aneka Aneka is a market oriented Cloud development and management platform with rapid application development and workload distribution capabilities.
BIG DATA SOLUTION DATA SHEET
BIG DATA SOLUTION DATA SHEET Highlight. DATA SHEET HGrid247 BIG DATA SOLUTION Exploring your BIG DATA, get some deeper insight. It is possible! Another approach to access your BIG DATA with the latest
Jeffrey D. Ullman slides. MapReduce for data intensive computing
Jeffrey D. Ullman slides MapReduce for data intensive computing Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very
Integrating Big Data into the Computing Curricula
Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big
HDFS Cluster Installation Automation for TupleWare
HDFS Cluster Installation Automation for TupleWare Xinyi Lu Department of Computer Science Brown University Providence, RI 02912 [email protected] March 26, 2014 Abstract TupleWare[1] is a C++ Framework
Open source large scale distributed data management with Google s MapReduce and Bigtable
Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: [email protected] Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory
Hadoop Parallel Data Processing
MapReduce and Implementation Hadoop Parallel Data Processing Kai Shen A programming interface (two stage Map and Reduce) and system support such that: the interface is easy to program, and suitable for
Neptune. A Domain Specific Language for Deploying HPC Software on Cloud Platforms. Chris Bunch Navraj Chohan Chandra Krintz Khawaja Shams
Neptune A Domain Specific Language for Deploying HPC Software on Cloud Platforms Chris Bunch Navraj Chohan Chandra Krintz Khawaja Shams ScienceCloud 2011 @ San Jose, CA June 8, 2011 Cloud Computing Three
Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
The Inside Scoop on Hadoop
The Inside Scoop on Hadoop Orion Gebremedhin National Solutions Director BI & Big Data, Neudesic LLC. VTSP Microsoft Corp. [email protected] [email protected] @OrionGM The Inside Scoop
Analysis and Research of Cloud Computing System to Comparison of Several Cloud Computing Platforms
Volume 1, Issue 1 ISSN: 2320-5288 International Journal of Engineering Technology & Management Research Journal homepage: www.ijetmr.org Analysis and Research of Cloud Computing System to Comparison of
NoSQL and Hadoop Technologies On Oracle Cloud
NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON BIG DATA MANAGEMENT AND ITS SECURITY PRUTHVIKA S. KADU 1, DR. H. R.
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,
Big Data and Apache Hadoop s MapReduce
Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23
A. Aiken & K. Olukotun PA3
Programming Assignment #3 Hadoop N-Gram Due Tue, Feb 18, 11:59PM In this programming assignment you will use Hadoop s implementation of MapReduce to search Wikipedia. This is not a course in search, so
Manifest for Big Data Pig, Hive & Jaql
Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,
Qsoft Inc www.qsoft-inc.com
Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:
Virtualizing Apache Hadoop. June, 2012
June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING
Cloudera Distributed Hadoop (CDH) Installation and Configuration on Virtual Box
Cloudera Distributed Hadoop (CDH) Installation and Configuration on Virtual Box By Kavya Mugadur W1014808 1 Table of contents 1.What is CDH? 2. Hadoop Basics 3. Ways to install CDH 4. Installation and
A Brief Outline on Bigdata Hadoop
A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is
Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
Prepared By : Manoj Kumar Joshi & Vikas Sawhney
Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks
Weekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay
Weekly Report Hadoop Introduction submitted By Anurag Sharma Department of Computer Science and Engineering Indian Institute of Technology Bombay Chapter 1 What is Hadoop? Apache Hadoop (High-availability
Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview
Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce
Introduction to Hadoop
1 What is Hadoop? Introduction to Hadoop We are living in an era where large volumes of data are available and the problem is to extract meaning from the data avalanche. The goal of the software tools
MapReduce, Hadoop and Amazon AWS
MapReduce, Hadoop and Amazon AWS Yasser Ganjisaffar http://www.ics.uci.edu/~yganjisa February 2011 What is Hadoop? A software framework that supports data-intensive distributed applications. It enables
Workshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
CDH installation & Application Test Report
CDH installation & Application Test Report He Shouchun (SCUID: 00001008350, Email: [email protected]) Chapter 1. Prepare the virtual machine... 2 1.1 Download virtual machine software... 2 1.2 Plan the guest
Map Reduce & Hadoop Recommended Text:
Big Data Map Reduce & Hadoop Recommended Text:! Large datasets are becoming more common The New York Stock Exchange generates about one terabyte of new trade data per day. Facebook hosts approximately
Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.
Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in
How To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI
Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN
Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current
Deploying Hadoop with Manager
Deploying Hadoop with Manager SUSE Big Data Made Easier Peter Linnell / Sales Engineer [email protected] Alejandro Bonilla / Sales Engineer [email protected] 2 Hadoop Core Components 3 Typical Hadoop Distribution
A Cost-Evaluation of MapReduce Applications in the Cloud
1/23 A Cost-Evaluation of MapReduce Applications in the Cloud Diana Moise, Alexandra Carpen-Amarie Gabriel Antoniu, Luc Bougé KerData team 2/23 1 MapReduce applications - case study 2 3 4 5 3/23 MapReduce
Scalable Architecture on Amazon AWS Cloud
Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies [email protected] 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect
Ø Teaching Evaluations. q Open March 3 through 16. Ø Final Exam. q Thursday, March 19, 4-7PM. Ø 2 flavors: q Public Cloud, available to public
Announcements TIM 50 Teaching Evaluations Open March 3 through 16 Final Exam Thursday, March 19, 4-7PM Lecture 19 20 March 12, 2015 Cloud Computing Cloud Computing: refers to both applications delivered
Big Data and Market Surveillance. April 28, 2014
Big Data and Market Surveillance April 28, 2014 Copyright 2014 Scila AB. All rights reserved. Scila AB reserves the right to make changes to the information contained herein without prior notice. No part
Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community
Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community Ntinos Krampis Asst. Professor J. Craig Venter Institute [email protected] http://www.jcvi.org/cms/about/bios/kkrampis/
Solution for private cloud computing
The CC1 system Solution for private cloud computing 1 Outline What is CC1? Features Technical details Use cases By scientist By HEP experiment System requirements and installation How to get it? 2 What
File S1: Supplementary Information of CloudDOE
File S1: Supplementary Information of CloudDOE Table of Contents 1. Prerequisites of CloudDOE... 2 2. An In-depth Discussion of Deploying a Hadoop Cloud... 2 Prerequisites of deployment... 2 Table S1.
Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community
Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community Ntinos Krampis Asst. Professor J. Craig Venter Institute [email protected] http://www.jcvi.org/cms/about/bios/kkrampis/
A Performance Analysis of Distributed Indexing using Terrier
A Performance Analysis of Distributed Indexing using Terrier Amaury Couste Jakub Kozłowski William Martin Indexing Indexing Used by search
Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies
Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Image
Introduction to Cloud Computing
Discovery 2015: Cloud Computing Workshop June 20-24, 2011 Berkeley, CA Introduction to Cloud Computing Keith R. Jackson Lawrence Berkeley National Lab What is it? NIST Definition Cloud computing is a model
Document Similarity Measurement Using Ferret Algorithm and Map Reduce Programming Model
Document Similarity Measurement Using Ferret Algorithm and Map Reduce Programming Model Condro Wibawa, Irwan Bastian, Metty Mustikasari Department of Information Systems, Faculty of Computer Science and
What Is It? Business Architecture Research Challenges Bibliography. Cloud Computing. Research Challenges Overview. Carlos Eduardo Moreira dos Santos
Research Challenges Overview May 3, 2010 Table of Contents I 1 What Is It? Related Technologies Grid Computing Virtualization Utility Computing Autonomic Computing Is It New? Definition 2 Business Business
Pro Apache Hadoop. Second Edition. Sameer Wadkar. Madhu Siddalingaiah
Pro Apache Hadoop Second Edition Sameer Wadkar Madhu Siddalingaiah Contents J About the Authors About the Technical Reviewer Acknowledgments Introduction xix xxi xxiii xxv Chapter 1: Motivation for Big
Microsoft Research Windows Azure for Research Training
Copyright 2013 Microsoft Corporation. All rights reserved. Except where otherwise noted, these materials are licensed under the terms of the Apache License, Version 2.0. You may use it according to the
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce
Integrating NLTK with the Hadoop Map Reduce Framework 433-460 Human Language Technology Project
Integrating NLTK with the Hadoop Map Reduce Framework 433-460 Human Language Technology Project Paul Bone [email protected] June 2008 Contents 1 Introduction 1 2 Method 2 2.1 Hadoop and Python.........................
Amazon EC2 Product Details Page 1 of 5
Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Functionality Amazon EC2 presents a true virtual computing environment, allowing you to use web service interfaces to launch instances with a variety of
How To Understand Cloud Computing
Overview of Cloud Computing (ENCS 691K Chapter 1) Roch Glitho, PhD Associate Professor and Canada Research Chair My URL - http://users.encs.concordia.ca/~glitho/ Overview of Cloud Computing Towards a definition
BIG DATA IN BUSINESS ENVIRONMENT
Scientific Bulletin Economic Sciences, Volume 14/ Issue 1 BIG DATA IN BUSINESS ENVIRONMENT Logica BANICA 1, Alina HAGIU 2 1 Faculty of Economics, University of Pitesti, Romania [email protected] 2 Faculty
Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela
Hadoop Distributed File System T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Agenda Introduction Flesh and bones of HDFS Architecture Accessing data Data replication strategy Fault tolerance
An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics
An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,
A programming model in Cloud: MapReduce
A programming model in Cloud: MapReduce Programming model and implementation developed by Google for processing large data sets Users specify a map function to generate a set of intermediate key/value
HDFS. Hadoop Distributed File System
HDFS Kevin Swingler Hadoop Distributed File System File system designed to store VERY large files Streaming data access Running across clusters of commodity hardware Resilient to node failure 1 Large files
Using Cloud Services for Test Environments A case study of the use of Amazon EC2
Using Cloud Services for Test Environments A case study of the use of Amazon EC2 Lee Hawkins (Quality Architect) Quest Software, Melbourne Copyright 2010 Quest Software We are gathered here today to talk
Microsoft Research Microsoft Azure for Research Training
Copyright 2014 Microsoft Corporation. All rights reserved. Except where otherwise noted, these materials are licensed under the terms of the Apache License, Version 2.0. You may use it according to the
White Paper. Big Data and Hadoop. Abhishek S, Java COE. Cloud Computing Mobile DW-BI-Analytics Microsoft Oracle ERP Java SAP ERP
White Paper Big Data and Hadoop Abhishek S, Java COE www.marlabs.com Cloud Computing Mobile DW-BI-Analytics Microsoft Oracle ERP Java SAP ERP Table of contents Abstract.. 1 Introduction. 2 What is Big
MapReduce. Tushar B. Kute, http://tusharkute.com
MapReduce Tushar B. Kute, http://tusharkute.com What is MapReduce? MapReduce is a framework using which we can write applications to process huge amounts of data, in parallel, on large clusters of commodity
Hadoop Architecture. Part 1
Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,
Market Basket Analysis Algorithm on Map/Reduce in AWS EC2
Market Basket Analysis Algorithm on Map/Reduce in AWS EC2 Jongwook Woo Computer Information Systems Department California State University Los Angeles [email protected] Abstract As the web, social networking,
