A Framework for Leveraging Cloud Computing to Facilitate Biometrics at Large-Scale
John K. Mitchell III and Syed S. Rizvi
Department of Information Sciences and Technology, Pennsylvania State University, Altoona, PA
{jkm5270,

Submitted to the 2014 International Conference on Security and Management (SAM'14), Track: Biometrics & Forensics

Abstract
The cloud computing paradigm provides an efficient environment for the large-scale use of biometric identification systems. However, there is no established standard for the configuration of the cloud or the components needed to achieve optimal biometric performance. In this paper, we present a conceptual framework that addresses the challenges and requirements of biometrics at large scale while dramatically increasing performance. Our contributions are three-fold. First, we describe the related work on integrating biometrics with cloud computing. Second, we discuss the issues and challenges preventing biometrics from being adopted at large scale, which influenced the design choices of our proposed framework. Finally, we present the proposed framework and highlight the cloud services that should be utilized, with the proper configurations, to maximize biometric efficiency. We also discuss a realization of the proposed framework and the components selected. To show the practicality of the framework, we chose Amazon Web Services, although the framework is applicable to any cloud platform that supports the design criteria.

Keywords: biometrics; cloud computing; large-scale implementation; authentication; MapReduce; Hadoop

I. INTRODUCTION
Biometric technologies are replacing traditional password-based methods as a more effective authentication process in information security. In addition, biometrics are used for visas and passports, voter registration, border control, and credit card transactions [1].
Biometric identification is especially common in the forensic domain. For instance, the Integrated Automated Fingerprint Identification System (IAFIS) [2] is used by the Federal Bureau of Investigation (FBI) for crime scene investigations, criminal background checks, and bank employee checks. Biometrics are also becoming increasingly popular in other parts of the world. For example, India has compiled the largest biometric database in the world [3], with the goal of eventually enrolling 1.2 billion citizens. Biometric use is therefore expected to grow rapidly in the next few years. As the acceptance of biometric identification increases, there will be higher demand from the public and private sectors, which creates the need for large-scale biometric systems. However, many challenges and performance issues prevent large-scale biometric applications from being fully realized. As a result, cloud computing has been proposed as a viable solution to the inhibitors associated with biometrics. The cloud has many attractive features that could benefit biometric systems and enable large-scale use, including parallel processing, large storage space, elasticity, real-time support, and automated failover. Moreover, the highly scalable and flexible nature of the cloud reinforces the case for integration with biometrics to increase performance. When implementing biometrics with cloud computing, there is a steep learning curve to understand the possible cloud services, configurations, and additional components. Our background and experience with the cloud, specifically Amazon Web Services (AWS) [4], allow us to formulate the services and configurations needed to abstract away the challenges and meet the performance requirements. Several assumptions about biometric systems were taken into consideration when designing our framework.
We assume that a multimodal system will be utilized, since it provides greater reliability than unimodal systems. One example of a multimodal database is the Next Generation Identification (NGI) program [2], which is being developed by the FBI and is scheduled to replace IAFIS. A multimodal database significantly increases the quantity of data being stored and the computational power required for processing. Additionally, support for a fused algorithm that computes a decision from multiple modalities may be required. In this research, we design our framework to be user-friendly and simple for non-experts, to enable the most efficient integration possible, to be applicable to most cloud platforms, and to facilitate the large-scale use of biometrics. The rest of the paper is organized as follows: We begin in Section II by describing the related work. In Section III, we identify the challenges and requirements to be addressed by our framework. A comprehensive discussion of our cloud framework is presented in Section IV, with emphasis on the configurations of the cloud services. In Section V, we provide components that meet the specifications presented in our framework and examine each component individually.
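The fused decision from multiple modalities mentioned above can be illustrated with a short score-level fusion sketch. The score ranges, weights, and threshold below are illustrative assumptions, not values from any deployed matcher:

```python
# Minimal sketch of score-level fusion for a multimodal system.
# Ranges, weights, and the decision threshold are illustrative
# assumptions, not values from any deployed system.

def normalize(score, lo, hi):
    """Min-max normalize a matcher score into [0, 1]."""
    return (score - lo) / (hi - lo)

def fuse(scores, weights):
    """Weighted-sum fusion of normalized per-modality scores."""
    return sum(w * s for s, w in zip(scores, weights))

def decide(fingerprint_score, iris_score, threshold=0.6):
    # Each matcher reports on its own scale, so normalize before fusing.
    fp = normalize(fingerprint_score, lo=0, hi=100)
    ir = normalize(iris_score, lo=0, hi=1)
    fused = fuse([fp, ir], weights=[0.5, 0.5])
    return "match" if fused >= threshold else "no match"

print(decide(fingerprint_score=90, iris_score=0.8))  # prints: match
print(decide(fingerprint_score=30, iris_score=0.4))  # prints: no match
```

Normalizing before fusing matters because the modalities report scores on incompatible scales; without it, the modality with the larger numeric range would dominate the fused decision.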
TABLE 1. SIGNIFICANCE OF EACH FRAMEWORK COMPONENT
[Table 1 marks which framework components (IaaS, distributed file system, distributed data processing, distributed database, and distributed database management) address each critical factor: scalability, flexibility, interoperability, data archiving, data processing, near real-time speed, and fault-tolerance.]

Finally, we conclude the paper and present future directions in Section VI.

II. RELATED WORK
Existing work on integrating biometrics with cloud computing is scarce, and what exists does not provide a comprehensive solution to the challenges and requirements of biometrics in the form of a generic framework. However, a few relevant publications are similar in some respects to our paper. For instance, the authors of [5] primarily discuss the MapReduce function and present an implementation involving components similar to those described in our paper. However, they focus only on the advantages of parallelism over sequential processing of data; they do not discuss other benefits of the cloud or offer a comprehensive framework. Peer et al. [6] present a case study involving fingerprint recognition. However, the authors are more concerned with how to move a biometric platform into the cloud, and the challenges that could arise, than with the actual implementation. Kohlwey et al. [7] present a prototype system based on requirements identified in their research. However, their system does not incorporate any cloud services or component configurations; they use only the Hadoop software frameworks as the basis for their prototype. Our main focus, by contrast, is the framework rather than the particular system used in our implementation. Another work of similar scope [8] utilizes AWS for its biometric and cloud implementation.
They do not, however, utilize some of the important services found in our paper, such as Amazon EMR and the Hadoop software. In their paper, the authors advocate using the MapReduce framework to promote more efficient parallelism. We agree with this argument, since Amazon EMR is essential to the data processing capabilities that our framework offers. There are several other works on similar topics that proceed in different directions. One example is Gonzales et al. [9], where the authors focus on securing biometric data in open environments using encryption and access control. Similarly, Banirostam et al. [10] propose a method for increasing the security of behavioral biometrics in cloud computing. The main difference between the two works is that Banirostam et al. use a method that gathers behavioral data over time, while Gonzales et al. incorporate advanced cryptography to minimize exposure of the private key. Another work involving encryption is presented in [11], where the entire biometric database is encrypted and outsourced to the cloud. After reviewing the related literature, we believe the cloud is a promising solution to the large-scale issues of biometrics. All of the related works agree that the cloud presents an efficient environment for the growth of biometric applications. However, none of them presents a detailed framework for optimizing biometric performance and enabling large-scale adoption.

III. CHALLENGES AND REQUIREMENTS OF BIOMETRICS
In the current computing landscape, biometric identification systems are limited by scalability and flexibility constraints, as well as several other challenges and performance requirements that cannot be met without access to nearly unlimited storage and processing power. To understand the importance of our cloud framework, the biometric issues that motivated our design must be understood as well.
Thus, we identify both the known and expected inhibitors and requirements of a large-scale biometric system. The factors that prevent large-scale biometric applications are addressed individually by the components of our framework, as shown in Table 1.

A. Scalability
Conceptually, biometric systems should be able to adapt to and accommodate their own growth, providing higher or lower levels of productivity depending on the scale. In practice, increasing population and demand cause scalability issues that must be resolved to promote large-scale biometric applications. For instance, in previous implementations, biometric databases had to be entirely redesigned whenever demand increased [12]. As a result, the
modifications are not cost efficient, which outweighs the benefits of scaling the application. Furthermore, the procedures and technology for matching and retrieving templates will change over time. An efficient and effective biometric system must be able to incorporate these design modifications without imposing a significant cost on the consumer.

B. Flexibility
Flexibility is another important factor contributing to the scarcity of highly efficient biometric systems. The ability of a system to change quickly with growing requirements is directly related to its scalability. As mentioned in the previous section, a biometric system must be able to adopt new methods and technology (e.g., next-generation scanners) in a cost-efficient manner. Additionally, the system should support multiple modalities and multimodal databases. Multimodal databases may require fused algorithms to compile the most accurate results, which should be accounted for as well.

C. Interoperability
Existing biometric systems are developed by a variety of vendors adhering to different national standards. As a result, there is an overabundance of data in different formats and structures, which creates an interoperability challenge. Specifically, the proprietary algorithms and variety of hardware present challenges when attempting to share information between vendors [13].

D. Data Archiving
The large volume of data generated by template enrollment presents a storage challenge. The quantity of data depends on the modalities used and whether they are used in tandem in a multimodal system. We anticipate that multimodal databases will be commonly employed by the public and private sectors, resulting in petabytes of data. This requires a database capable of supporting a multitude of records, ranging from several million to hundreds of millions [1].

E. Data Processing
As previously discussed, the large amount of data associated with biometrics requires vast storage capacity. In addition, an extensive volume of computation is required to process identification requests at the speed needed for optimal performance. Processing power and memory constraints prevent individual servers or computers from processing the data while satisfying the speed and accuracy demands.

F. Near Real-time Speed
To meet the performance requirements, biometric identifications need to be performed in near real-time. One performance requirement is that data streams should be continuous and near real-time for improved data sharing. Furthermore, to increase data consistency, there should be support for near real-time updates among concurrent users [14]. The receiver operating characteristic (ROC) curve can be calibrated to accommodate situations that dictate different identification requirements. This can only be achieved by adjusting the operating threshold in near real-time to effectively balance the security and convenience of the system.

G. Fault-tolerance
The possibility of hardware failure in a large-scale application is high and may result in system-wide failure. A biometric system must therefore be fault-tolerant, preventing single points of failure and ensuring availability for the user. If a failure occurs, the system should use redundant resources to minimize interruption of operations and maintain reliability [14].

IV. PROPOSED FRAMEWORK
There are multiple cloud services and deployment possibilities to consider when implementing biometrics in the cloud. These variables affect the configuration options available to the user and ultimately the overall performance of the biometric system.
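The threshold calibration described in Section III-F can be illustrated with a small sketch. The score samples and thresholds below are synthetic, purely for illustration:

```python
# Sketch of calibrating a decision threshold to trade false accepts
# against false rejects. Score samples are synthetic and illustrative.

def far(impostor_scores, threshold):
    """False accept rate: fraction of impostor scores at/above the threshold."""
    return sum(s >= threshold for s in impostor_scores) / len(impostor_scores)

def frr(genuine_scores, threshold):
    """False reject rate: fraction of genuine scores below the threshold."""
    return sum(s < threshold for s in genuine_scores) / len(genuine_scores)

genuine = [0.81, 0.62, 0.92, 0.68, 0.88, 0.55]
impostor = [0.32, 0.45, 0.51, 0.70, 0.60, 0.39]

# Raising the threshold tightens security (lower FAR) at the cost of
# convenience (higher FRR); a near real-time system would re-evaluate
# this trade-off as identification requirements change.
for t in (0.5, 0.65, 0.8):
    print(f"threshold={t}: FAR={far(impostor, t):.2f}, FRR={frr(genuine, t):.2f}")
```

Sweeping the threshold over many values and plotting FAR against the true accept rate (1 - FRR) yields the ROC curve itself; the calibration discussed above amounts to choosing an operating point on that curve.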
In our proposed framework, we describe the cloud services and additional components that should be utilized, along with the configurations necessary to achieve the highest biometric performance. Table 1 shows the significance of each component in meeting the biometric performance requirements. Fig. 1 shows the relationship among the different components of our framework. The Infrastructure as a Service (IaaS) layer serves as the hosting environment for the other components, as indicated by the arrows. However, the distributed file system extends beyond the IaaS level, since it is also used externally for data archiving and backup.

Figure 1: Relationship of framework components

A. IaaS
IaaS was chosen as the underlying cloud service since the designer receives more administrative control than with Platform as a Service (PaaS) or Software as a Service (SaaS). This allows a higher degree of customization, which is important when leveraging the cloud to augment existing biometric systems. In addition, the entire biometrics infrastructure can be migrated into the cloud, so a database modification can be performed easily in minutes. This demonstrates the elasticity of the cloud, since a biometric application can be scaled quickly based on demand. When accounting for the fixed cost of the cloud service, it is highly cost efficient compared to traditional deployment models.

Figure 2: Example of virtualization configurations and framework design

When examining the virtualization aspect of the cloud, the infrastructure determines the type of hypervisor utilized, which affects the performance and capabilities of the system. There are two types of hypervisors, Type 1 and Type 2, representing bare-metal and hosted architectures, respectively. Fig. 2 shows the configuration options available to the developer and the design differences between Type 1 and Type 2 hypervisors. In our proposed framework, we chose the bare-metal architecture for the virtualization layer, since it offers several advantages over the hosted architecture. Bare-metal architecture usually provides greater performance because the virtual machines access I/O devices directly. Hosted architecture yields lower performance, since identification requests must pass through an additional layer, the host operating system. Furthermore, bare-metal architecture supports real-time operating systems running inside the virtual machines, unlike hosted architecture [15]. The only complication that would make hosted architecture more desirable is the overhead involved in installing a Type 1 hypervisor. After the type of hypervisor is chosen, the underlying virtualization technique must be considered. Fig. 2 illustrates the three main virtualization techniques and highlights the design choices of our framework. We selected paravirtualization, since it is the most advantageous of the virtualization techniques for optimal biometric performance. The other possibilities are binary translation and hardware-assisted virtualization.
The performance gain of each technique varies according to how tightly the hypervisor is coupled to the guest. Paravirtualization enables the greatest performance due to its tightly coupled hypervisor, while the loosely coupled hypervisor of binary translation yields significantly lower performance. In addition, binary translation and hardware-assisted virtualization can actually degrade performance, since the hypervisor must routinely interrupt the execution of a virtual machine [15]. Another advantage of paravirtualization is its low overhead compared to the other two techniques.

B. Distributed File System
A distributed file system for archiving data is essential to meet the capacity demands of a large-scale biometric database. In our framework, the file system is hosted on the IaaS, which offers nearly unlimited storage capacity and scalability. In addition, the file system must remain functional when used externally for immediate retrieval and backup of data. Furthermore, the system must be fault-tolerant and robust, which requires exceptional recovery and failover capabilities. Ideally, the system should incorporate automatic failover: the failure of a node would no longer hinder operations; instead, the task of the failed node would be sent to a redundant node with no interruption in the workflow. To the best of our knowledge, very few cloud services currently include automatic failover in their recovery plans.

C. Distributed Data Processing
Distributed processing uses multiple computers or processors to execute applications. Parallelism is the solution to the data processing challenges of biometrics. In parallel computing, data is partitioned across multiple nodes and computations are performed concurrently.
When applied to biometrics, the calculations performed by the template matching algorithm would be broken down and distributed across several nodes. There are many big data processing tools and frameworks that can increase the data processing capabilities of a biometric system, including MapReduce, Microsoft Dryad, the Message Passing Interface (MPI), and Swift [15]. To choose the most effective framework or tool for biometrics, the size of the data set and the number of tasks must be taken into consideration. Therefore, we have chosen MapReduce since
biometric systems require large input data sets and a small number of tasks. MapReduce [15], [16], [17] is a data processing paradigm that enables parallel computation for analyzing and generating large data sets. A user's input data is sent into the cloud, partitioned, processed in parallel, compiled, and returned to the user in the form of a decision. This increases identification speed while promoting the most accurate results. MapReduce also organizes the database by moving computations to the location of the data, in keeping with the data locality principle. As a result, processing and query latency decrease.

D. Distributed Database
In a distributed database [18], portions of the database are stored in multiple physical locations. These locations are loosely coupled and do not share physical components. Moreover, a distributed database depends on a distributed database management system, so we included database management as a component of our framework. A distributed database is preferable to a centralized relational database for many reasons. The reliability and availability of the biometric system increase dramatically. Expansion and modification of the biometric system are easier and do not affect related systems. Distribution speeds up query processing, which improves biometric performance. In addition, the system becomes more fault-tolerant, since offline nodes do not interrupt operations. The distributed database should provide near real-time access to the data in the biometric database, and record lookups should be near real-time as well.

E. Distributed Database Management System
A distributed database management system controls the distributed database and routinely integrates data to ensure that any change made by a user is propagated.
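A minimal in-process sketch ties the MapReduce and distributed database components together: the gallery is partitioned as it would be across database nodes, each partition is scored independently (the map step), and the per-partition candidates are compiled into a single decision (the reduce step). The toy similarity function and the tiny gallery are illustrative assumptions; in the framework, the map tasks would run in parallel on separate cluster nodes:

```python
# In-process sketch of MapReduce-style identification over a
# partitioned gallery. The similarity function and gallery are
# illustrative stand-ins for a real matcher and template store.
from functools import reduce

def score(probe, template):
    """Toy similarity: fraction of matching positions."""
    return sum(a == b for a, b in zip(probe, template)) / len(probe)

def map_partition(probe, partition):
    """Map step: best (subject_id, score) within one gallery partition."""
    return max(((sid, score(probe, tpl)) for sid, tpl in partition),
               key=lambda pair: pair[1])

def reduce_results(results):
    """Reduce step: compile per-partition candidates into one decision."""
    return reduce(lambda a, b: a if a[1] >= b[1] else b, results)

gallery = [
    [("alice", "10110100"), ("bob", "00011011")],   # partition on node 1
    [("carol", "11110000"), ("dave", "10111111")],  # partition on node 2
]

probe = "10111100"
candidates = [map_partition(probe, part) for part in gallery]
subject, best = reduce_results(candidates)
print(subject, best)  # prints: alice 0.875
```

Because each map call touches only its own partition, the work scales out with the number of nodes, and the reduce step sees only one candidate per partition rather than the full gallery.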
Moreover, the distributed database management system supports a distributed metadata repository for near real-time availability. We believe metadata should be exploited to ease interoperability among existing systems: metadata can mitigate the problems caused by different template formats by describing the records and enabling interchange.

V. REALIZATION OF THE FRAMEWORK
To demonstrate the effectiveness of our framework, we designed a system based on the previously discussed specifications and selected several components for biometric integration. AWS provides the primary cloud components, along with several applications supported by Amazon. The additional applications originate from Apache Hadoop, an open source software framework synonymous with big data. Fig. 3 illustrates the general workflow of the realized components listed in this section. Input and output data is sent from Amazon Simple Storage Service (S3) to Amazon Elastic MapReduce (EMR), where parallel computation is performed. Amazon EMR will have several clusters, based on the Hadoop architecture, depending on the modality of the biometric system. The two clusters shown in Fig. 3 indicate that Apache Hive and HBase should run separately to improve performance. The HBase cluster is connected to the Hive cluster through the master nodes. Each cluster consists of master and slave nodes; in particular, the slave nodes comprise task and core nodes. The Hadoop Distributed File System (HDFS) stores and processes data located on the core nodes but does not interact with the task nodes. Finally, the nodes are provisioned by Amazon Elastic Compute Cloud (EC2).

A. Amazon EC2
Amazon EC2 [19] is an IaaS that presents a virtual computing environment to the user. This service grants the ability to create virtual machines, called instances, on Amazon EC2 and exploit them for improved data processing.
The number of virtual machines can be increased or decreased as the system scales. Moreover, Amazon EC2 uses Xen virtualization in its underlying architecture, which allows either hardware-assisted virtualization or paravirtualization. As stated in the previous section, we believe paravirtualization is the most effective technique for realizing the full potential of biometrics. Note that Amazon EC2 costs are included when purchasing the Amazon EMR service; we list Amazon EC2 among the components since it generates the instances used by Amazon EMR.

B. Amazon S3
Amazon S3 [20] is a file storage web service that can be used for archiving and analyzing data when implemented with Amazon EC2. We chose Amazon S3 since it is reliable, durable, and supports the other components in our system. Amazon S3 stores and organizes data into buckets linked to each user account. In addition, Amazon S3 allows users to back up data from the HDFS, Amazon EMR, and HBase, which enables effective data recovery. The scalability and performance bottlenecks associated with traditional storage databases can be resolved by adding more nodes to increase the system's speed, capacity, and throughput.

C. Apache HDFS
Apache HDFS [21] is an open source distributed file system chosen to work alongside Amazon S3. HDFS is highly portable and capable of storing large files while replicating data across several hosts for reliability. One notable feature is data awareness, demonstrated by the interaction of the job tracker and task trackers: the job tracker schedules map or reduce jobs to task trackers according to data location, which reduces network traffic. HDFS was primarily selected because it is more fault-tolerant than Amazon S3. In addition, the two file systems overcome many of their own design limitations when implemented together.
Thus, it is beneficial, though not strictly necessary, to include both file systems in the system design to achieve maximum biometric efficiency. There are several advantages to implementing Amazon S3 and HDFS together. Dedicated HDFS nodes for storing the data are no longer
required, since Amazon S3 provides a massive amount of storage capacity. Additionally, the HDFS data in an Amazon EMR cluster is lost when the cluster terminates, whereas Amazon S3 stores the data durably so that it is not lost. Another benefit is that HDFS nodes will not become overloaded when the same data set is used for multiple jobs [22].

Figure 3: Typical workflow of realized components

D. Amazon EMR
Amazon EMR [22] brings the MapReduce function to the virtual machines created with Amazon EC2 and is fully integrated with Amazon S3 and HDFS. This component is vital to the performance of biometric systems, since parallel processing speed depends on the MapReduce functions. An Amazon EMR cluster consists of one master node and multiple slave nodes; the master node coordinates the slave nodes, which run the computations and store the data.

E. Apache HBase
Apache HBase [23] is an open source distributed database that uses column compression to store large quantities of data. HBase needs a file system to store its data, represented in our prototype system by HDFS. The purpose of HBase is to augment HDFS by adding functionality and addressing several issues. One such issue is the speed of individual record lookups, which is significantly slow in HDFS alone; HBase allows fast record lookups and access to small amounts of data without additional latency. Another issue is batch inserts and deletions: HBase allows users to efficiently perform batch inserts, deletes, and updates. There are several more reasons why we chose HBase as the distributed database for our system. HBase uses Bloom filters to improve queries over extensive volumes of data.
HBase also integrates tightly with HDFS, which eases implementation. Furthermore, the regions that contain HBase tables are split and distributed automatically as the quantity of data grows. Lastly, HBase is highly fault-tolerant and supports automatic failover for the RegionServers, which are responsible for managing the regions in a distributed cluster.
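The Bloom filters mentioned above can be sketched in a few lines. The bit-array size and hashing scheme below are illustrative assumptions; HBase's own filters are configured per column family rather than implemented by hand:

```python
# Minimal Bloom filter sketch illustrating how a store like HBase can
# skip disk reads for record IDs that are definitely absent.
# Size and hash count are illustrative, not tuned values.
import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive several bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means the key is definitely not stored; True means
        # "possibly stored" (false positives allowed, misses are not).
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
for record_id in ("subj-001", "subj-002", "subj-003"):
    bf.add(record_id)

print(bf.might_contain("subj-002"))  # True: worth going to disk
print(bf.might_contain("subj-999"))  # very likely False: skip the read
```

The one-sided error is what makes the structure useful at scale: a negative answer is certain, so the store only pays the disk-read cost for keys that might actually exist.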
F. Apache Hive
Apache Hive [24] is an open source data warehouse for managing a distributed database while supporting data query and analysis. Hive operates using a modified version of SQL called HiveQL, which adds support for map and reduce functions and additional data types. Hive also supports unstructured data sources and can process them efficiently. Hive maintains a metastore, located in a MySQL database on the master node of the cluster, which contains the partition names and data types.

VI. CONCLUSION AND FUTURE WORK
In this paper, we presented a conceptual framework for facilitating biometric applications at large scale. We identified the challenges and requirements of current biometric systems to provide the basis for our framework, then designed the framework to address these issues while increasing overall biometric performance. Finally, we demonstrated the effectiveness of the framework by selecting components for a system that adheres to the criteria we specified. Abstracting away the learning curve will increase the productivity of developers implementing biometrics with cloud computing. We anticipate that researchers will be highly motivated to integrate biometric systems with the cloud, especially with a framework available for the enterprise environment. Since the framework and its realization are conceptual, there are considerations and issues that we will address in future work. The fused result of a multimodal biometric system could present a challenge for the MapReduce component, and if an iterative procedure proves more effective for biometric identification, the framework may need to be altered slightly to maintain maximum efficiency. To address these issues, we will conduct an experiment using our framework for biometric and cloud integration and analyze the results.
REFERENCES
[1] "Requirements for large-scale biometric systems," NeuroTechnology, 2014. [Online]. Available:
[2] "Next Generation Identification," Federal Bureau of Investigation, 2014. [Online]. Available:
[3] Awareness and Communication Strategy Advisory Council, "Aadhar - Communicating to a billion - An Awareness and Communication Report," New Delhi, India, May 2010. [Online]. Available:
[4] J. Varia and S. Mathew, "Overview of Amazon Web Services," Amazon Web Services, 2014. [Online]. Available:
[5] Shelly and N. S. Raghava, "Iris recognition on Hadoop: A biometrics system implementation on cloud computing," in Proc. 2011 IEEE Int. Conf. on Cloud Computing and Intelligence Systems (CCIS), pp , Sept.
[6] P. Peer, J. Bule, J. Z. Gros, and V. Struc, "Building Cloud-based Biometric Services," Informatica, vol. 37, no. 2, pp , June.
[7] E. Kohlwey, A. Sussman, J. Trost, and A. Maurer, "Leveraging the Cloud for Big Data Biometrics: Meeting the Performance Requirements of the Next Generation Biometric Systems," in Proc. IEEE World Congress on Services (SERVICES), Washington, D.C., pp , 4-9 July.
[8] R. Panchumarthy, R. Subramanian, and S. Sarkar, "Biometric Evaluation on the Cloud: A Case Study with HumanID Gait Challenge," in Proc. 2012 IEEE Fifth Int. Conf. on Utility and Cloud Computing (UCC), pp , 5-8 Nov.
[9] D. G. Martinez, F. J. Castano, E. Argones Rua, J. L. Castro, and D. A. Silva, "Secure crypto-biometric system for cloud computing," in Proc. st Int. Workshop on Securing Services on the Cloud (IWSSC), pp , 6-8 Sept.
[10] H. Banirostam, E. Shamsinezhad, and T. Banirostam, "Functional Control of Users by Biometric Behavior Features in Cloud Computing," in Proc. th Int. Conf. on Intelligent Systems Modelling & Simulation (ISMS), pp , Jan.
[11] J. Yuan and S. Yu, "Efficient privacy-preserving biometric identification in cloud computing," in Proc. 2013 IEEE INFOCOM, pp , April.
[12] R. Das, "Biometrics in the cloud," Keesing Journal of Documents and Identity, no. 42, pp , Feb.
[13] "The National Biometric Interoperability Framework and Capability Requirements," Biometrics Institute, Feb. 2014. [Online]. Available:
[14] A. Sussman, "Biometrics and Cloud Computing," in Proc. Biometrics Consortium Conference, pp. 1-8, 19 Sept.
[15] M. Hamdaqa and L. Tahvildari, "Cloud Computing Uncovered: A Research Landscape," Advances in Computers, vol. 86, pp .
[16] D. Gannon and D. Reed, "Parallelism and the Cloud," Microsoft Corp., Redmond, WA, Oct.
[17] F. Li, B. C. Ooi, M. T. Özsu, and S. Wu, "Distributed Data Management Using MapReduce," in press.
[18] Oracle Corp., "Distributed Database Systems," Dec. 1999. [Online]. Available:
[19] Amazon Web Services, "Amazon Elastic Compute Cloud User Guide," Oct. 2013. [Online]. Available:
[20] Amazon Web Services, "Amazon Simple Storage Service Developer Guide," Mar. 2006. [Online]. Available:
[21] Apache Software Foundation, "HDFS Users Guide," Feb. 2014. [Online]. Available:
[22] Amazon Web Services, "Amazon Elastic MapReduce Developer Guide," Mar. 2009. [Online]. Available: mr-what-is-emr.html
[23] Apache Software Foundation, "Apache HBase Reference Guide," Feb. 2014. [Online]. Available:
[24] Apache Software Foundation, "Apache Hive GettingStarted," Dec. 2013. [Online]. Available: