Impact of Big Data: Networking Considerations and Case Study


Yong-Hee Jeon
Catholic University of Daegu, Gyeongsan, Rep. of Korea

Summary

Due to the explosive growth of data volume from mobile devices and SNS (Social Networking Service), Big Data has recently become one of the important issues in the networking world. Big traffic is generated when Big Data processing steps and multiple regionally distributed data centers are involved, and/or when data are delivered among clusters for the purpose of storage hierarchy management. Therefore, Hadoop clusters in such a Big Data environment require a high-speed networking fabric with multi-gigabit speed. In this paper, networking infrastructure considerations to support Big Data are studied, and a Big Data networking architecture is presented through the case study of Cisco.

Key words: Big Data, Big Traffic, Networking Considerations, Case Study.

1. Introduction

Big Data is also called Very Large Data, Extreme Data, or Total Data, and the first criterion for it was the volume of data. Although there is no exact definition of Big Data, the term sometimes refers to data volumes in the ZB (zettabyte) range and beyond, and also to data whose analysis requires distributed parallel processing technology such as Hadoop. 1 ZB is a huge amount of data, corresponding to one trillion gigabytes [1].

As data volume grows explosively through mobile devices and SNS (Social Networking Service), Big Data has recently become an important issue in the IT (Information Technology) field. In particular, the scale of data generated by the widespread use of mobile devices is becoming huge. We have already entered the zettabyte age, as the amount of digital information generated worldwide reached 1.8 ZB in 2011 [1]. According to Cisco, mobile data was forecast to grow at an average annual rate of 78% from 2011 onward, and mobile traffic alone was forecast to reach 10.8 exabytes in 2016 [2]. One exabyte equals one quintillion bytes (1 ZB = 1,024 EB).

Based on the definitions above, Big Data may be further defined as a huge structured or unstructured data set that is difficult to collect, store, analyze, and manage with existing methods because of its volume. McKinsey defines Big Data as data whose size exceeds what general database software can store, manage, and analyze. The Korean President's Council on National ICT Strategies, on the other hand, defines it as an information technology that extracts valuable information by utilizing and analyzing a large volume of data, and that actively responds to and predicts change based on the knowledge generated [2-3]. The meaning of Big Data is therefore expanding into a relative one, centered on obtaining value beyond some criterion rather than on data volume and technology alone.

Such a huge volume of data was estimated to be generated by the following main elements [4-5]:
- Mobility trend: mobile devices, mobile events and sharing, and sensory integration,
- Data access and consumption: Internet, interconnected systems, social networking, and convergent interfaces and access models (Internet, search and social networking, and messaging),
- Ecosystem capabilities: major changes in the information processing model and the availability of an open framework; general-purpose computing and unified network integration.

Big Data has the following characteristics in terms of scale, velocity, and pattern [1]:
- A large volume of data in a conceptual sense as well as in stored physical size: The volume of data had already exceeded 100 EB by the end of the 1990s and reached 1.8 ZB in 2011, so we have already entered the zettabyte age. By 2020, the volume of data was forecast to be 50 times larger than in 2011, marking the zettabyte age in earnest.
- Data that are produced in real time and disseminated very rapidly: In the 1980s and 1990s, structured data was the mainstream. Data then became more diverse, complex, and socialized in the 2000s and 2010s. In the 2020s and 2030s, the realism and real-time nature of data will become important characteristics.
- Integrated processing of existing structured, formal data together with unstructured data uploaded to Internet boards, Facebook, SNS, etc.: In the 1980s and 1990s, the mainstream was structured data such as databases and office information. In the 2000s and 2010s, we entered the age of unstructured data such as e-mail, multimedia, and SNS. In the 2020s and 2030s, it was forecast that we will enter the age of machine information and cognitive information data such as RFID, sensors, and machine-to-machine (M2M) communication.

Manuscript received December 5, 2012. Manuscript revised December 20, 2012.

Three main elements are required for the utilization of Big Data [2]:
- Cloud computing: Because it is difficult to process Big Data with existing analysis tools, cloud computing technology such as MapReduce, Hadoop, and HBase is required to analyze and process the data.
- Networking environment: A network infrastructure must be built in order to act on analysis results produced in real time with cloud computing technology.
- Real-time usability: Regardless of where data is generated, it should be usable in real time.

As discussed above, Big Data has become an important issue because of the exponential growth of digital information volume. The government of the United States of America accordingly established a strategy for the active utilization of Big Data in March 2012 through the Big Data R&D Initiative [6]. In Korea, the communication industry and Big Data are very closely related, so it is necessary to study, domestically, efficient network infrastructure for Big Data [2]. Accordingly, this paper presents networking considerations and a Cisco case study for Big Data.

2. Networking Considerations

2.1 Big Data and Big Traffic

Big Data from multi-site corporations produces big traffic. Big Data applications also induce massive amounts of traffic, with significantly more real-time and workload-intensive transactions. Large data sets must be moved over the WAN (Wide Area Network) to support Hadoop applications before, during, and after their execution. IRG (Internet Research Group) recommends examining big traffic as early as possible when a Hadoop cluster installation is being considered or planned [7].
The reason is that the scalability and usability of a Hadoop cluster may be impaired if the role of the WAN in enterprise Hadoop applications is not understood. The big traffic problem arises when the processing stages of Big Data involve multiple geographically distributed data centers. It also arises from the propagation of data among clusters for the purpose of storage hierarchy management. In these environments, the Hadoop cluster requires a high-speed networking fabric with multi-gigabit speed. Enterprise networks should also be optimized to provide a strong infrastructure for the volume, velocity, and accessibility of data, supporting both traditional transaction-oriented RDBMS and various applications such as Big Data [8].

According to the IDC white paper [8], traffic patterns tend to be bursty and variable, partly because of the uncertainty about how much data is moving over the network at any given time. Delays in data transfer were noted to be significant unless the requisite network resources are provided. To achieve appropriate network efficiency, proper line-rate performance and right-sized switch capacity are stated to be necessary.

2.2 Network Characteristics

According to [5], one or more of the following phases of MapReduce jobs typically transfer data over the network:
1) Writing data: The initial data is written to HDFS (Hadoop Distributed File System) either by streaming or by bulk delivery. Data blocks of the loaded files are then replicated, which transfers additional data over the network.
2) Workload execution: The MapReduce algorithm runs in the following four phases:
- Map phase: If a data block is not locally available and has to be requested from another data node (i.e., an HDFS locality miss occurs), the network is used at the beginning of the map phase.
- Shuffle phase: In this phase, the intermediate data is transferred between the servers.
Data is transferred over the network when the output of the mappers is shuffled to the reducers.
- Reduce phase: In this phase, the data is aggregated locally on the servers. Almost no traffic is sent over the network in this phase because the reducers already have all the data they need from the shuffle phase.
- Output replication: MapReduce output is stored as a file in HDFS. The network is used when the blocks of the result file have to be replicated by HDFS for redundancy.
3) Reading data: This phase occurs when the final data is read from HDFS for consumption by the end application, such as a website, indexing, or an SQL database.

In addition, it was noted that the network is crucial for the Hadoop control plane: the signaling and operations of HDFS and the MapReduce infrastructure.

Based on test results, [5] presents the relative importance of parameters to job completion in the following order:
- Availability and resiliency: To provide a network that is available and resilient, the deployed network architecture should provide the required redundancy and should also scale as the cluster grows. Switches and routers should likewise provide availability and resiliency.
- Burst handling and queuing: Because several HDFS operations and phases of MapReduce jobs are bursty, the network must handle bursts effectively. Switches and routers whose architectures employ buffering and queuing strategies that can absorb bursts should be chosen.
- Oversubscription ratio: Because overprovisioning the network can be costly, it was noted that generally accepted oversubscription ratios are around 4:1 at the server access layer and 2:1 between the access layer and the aggregation layer or core. It was concluded that network architectures that deliver a linear increase in oversubscription with each device failure are better than architectures that degrade dramatically during failures.
- Data node network speed: It was recommended that data nodes be provisioned with enough bandwidth for efficient job completion, considering the trade-off between price and performance.
- Network latency: Variations in switch and router latency are described as having a minimal impact on cluster performance. A network-wide analysis is considered more important than a device-level one. Moreover, the latency contribution at the application level, from application logic such as the JVM (Java Virtual Machine) software stack and socket buffers, is much higher than network latency. In any case, slightly more or less network latency was found not to noticeably affect job completion times.

Therefore, more aggressive and proactive approaches are needed when planning a Hadoop cluster to support Big Data analysis, or when planning a network architecture to support other configurations.
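The oversubscription figures quoted from [5] can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the Cisco paper: the function names, the server counts, and the link speeds are all hypothetical values chosen only so that the healthy access-layer ratio comes out at 4:1. It compares how the ratio changes when uplinks fail in a design with many small uplinks versus a design with a few large ones.

```python
def oversubscription(downlink_gbps, uplink_gbps):
    """Return the oversubscription ratio as a float (4.0 means 4:1)."""
    if uplink_gbps <= 0:
        raise ValueError("no healthy uplink capacity left")
    return downlink_gbps / uplink_gbps


def access_ratio_after_failures(servers, server_gbps, uplinks, uplink_gbps, failed):
    """Access-layer ratio when `failed` of the `uplinks` are out of service."""
    healthy = uplinks - failed
    return oversubscription(servers * server_gbps, healthy * uplink_gbps)


if __name__ == "__main__":
    # Hypothetical access switch: 40 servers attached at 10 Gbps each.
    # Design A spreads 100 Gbps of uplink capacity over ten 10 Gbps links.
    # Design B provides the same 100 Gbps over two 50 Gbps links.
    for name, uplinks, uplink_gbps in (("A", 10, 10), ("B", 2, 50)):
        for failed in range(2):
            ratio = access_ratio_after_failures(40, 10, uplinks, uplink_gbps, failed)
            print(f"design {name}, {failed} failed uplink(s): {ratio:.2f}:1")
```

With these assumed numbers, design A moves from 4:1 to roughly 4.4:1 after one uplink failure, while design B jumps from 4:1 to 8:1. This is one way to read the preference in [5] for architectures whose oversubscription grows gradually with each failure rather than degrading sharply.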
2.3 Networking Requirements

Because Big Data changes traffic sources and patterns, the network must deal with a shift from the server-to-client pattern (characteristic of a traditional enterprise or web server farm) to server-to-server traffic flows within the data center network fabric. This type of horizontal flow includes links between servers and calls for increased use of intelligent storage systems. Big Data imposes its own computing infrastructure requirements and should incorporate essential functions such as the creation, collection, storage, and analysis of data. These processing requirements are met by distributed server clusters made up of hundreds or thousands of nodes. In [8], the modular installation of servers at hyperscale is presented as the preferred method to meet those requirements. Hyperscale server architectures may consist of thousands of nodes, each with many processors and disks. The networking infrastructure that connects these nodes must therefore be scalable and resilient for optimal performance, especially when data are shuffled among them during certain application phases.

The IDC white paper [8] notes that the network is an essential foundation for transactions between massively parallel servers within Hadoop or other architectures, and between the server cluster and existing enterprise storage systems. In [8], the hyperscale network architecture mentioned above is called a holistic network. The advantages of the holistic network approach were described as follows:
- Ability to minimize duplicative costs, whereby one network can support all workloads,
- Multitenancy to consolidate and centralize Big Data projects,
- Ease of network provisioning, where sophisticated intelligence is used to manage workloads based on business priorities,
- Ability to leverage network staffing expertise across the data center.

Governance or regulatory requirements were noted as other factors affecting the design and implementation of networks [8]. For example, in health care applications, separation of the data plane may be necessary to meet the privacy requirements of sensitive data.

3. Cisco Case Study

3.1 Unified Network Fabric

The networking considerations for Big Data presented by Cisco are based on the real network traffic patterns of the Hadoop framework. Understanding the traffic pattern of an application makes it possible to coordinate the application and the network design. To accommodate Big Data, [5,8] propose adding two main building blocks to the enterprise stack, as shown in Fig. 1:
- Hadoop: provides storage capability through a distributed, shared-nothing file system, and analysis capability through MapReduce.
- NoSQL: provides the capability to capture, read, and update, in real time, the large influx of unstructured data without schemas (e.g., click streams, social media, log files, event data, mobility trends, and sensor and machine data).

Once the basic enterprise requirements are given, these two Big Data components are integrated with the existing enterprise business model. Hadoop provides the framework to handle massive amounts of data, either to transform it into a more usable structure and format or to analyze it and extract valuable analytics from it. To process massive amounts of data efficiently, it was noted that it is important to move computing to where the data is, using a distributed file system rather than a central system for data. As described by Cisco, a single large file is therefore split into blocks, and the blocks are distributed among the nodes of the Hadoop cluster. In [8], it was noted that this localized data/compute model introduces two distinct variables: complex data life-cycle management, and the matching of nodal capacity, in terms of compute and I/O needs, to a variety of workloads.

[Fig. 1. Big Data Building Blocks and Cisco Unified Fabric [5]. Figure labels: Application (Virtualized, Bare-Metal, Cloud); Logs, Click Streams, Event Data, Social Media, Sensor Data, Mobility Trends; Cisco Unified Fabric; Big Data NoSQL (Real-Time Capture, Read & Update); Traditional Database (RDBMS); Storage (SAN/NAS); Big Data Hadoop (Store and Analyze).]

In the IDC white paper [8], this unified Ethernet fabric is described as a flatter and converged network. Converged networks are stated to reduce the complexity and cost caused by multiple fabrics and separate adapters and cabling. The flatter network architecture is also described as maximizing network efficiency, reducing congestion, and addressing the limitations of spanning tree by creating active Layer 2 network paths for load balancing and redundancy. Compared with a traditional Ethernet fabric, the unified Ethernet fabric is said to maximize application performance and availability while reducing cost and complexity. It may achieve full link utilization by using multiple paths through the network and by consistently selecting the most efficient path. This architecture also scales well. In [8], the unified fabric is said to bring the following benefits to Big Data:
- Scalability: The fabric can scale incrementally with the growth of Big Data applications.
- Multitenant architecture: The fabric is able to provide a multitenant architecture across multiple use cases.
- Machine-to-machine traffic: With resource buffering that is integral to the Big Data infrastructure architecture, the fabric is described as designed for machine-to-machine traffic flows.

3.2 Test Results

Because many types of workloads can run on the distributed computing facilities of Hadoop, many factors affect workload completion times. To demonstrate the behavior of workloads in the network, two types were used in the test [5]:
- Business Intelligence (BI) workload: This is a reduced-function workload in which the amount of output data is much smaller than the input data. For example, such a workload takes 1 TB of data as input and outputs 1 MB of data.

- Extract, Transform, and Load (ETL) workload: In ETL workloads, a large amount of data needs to be converted to another format suited to various applications. These workloads are the most common type in enterprises.

When multiple data nodes running mappers finish their map tasks and the reducers pull data from each of those nodes, multiple bursts of received data are observed. This traffic was also found to be minimal while the data node is performing a compute-intensive map task. In the ETL workload benchmark, a data node is shown receiving a large amount of data from all the senders throughout the whole event. This is because the output of the ETL workload is the same size as the input.

In the test of non-local data impact, an initial spike occurs in received traffic before the reducers start. This spike represents data that each map task needs but that is not local. In the test of the Hadoop reduce-shuffle phase, a significant amount of traffic is observed because the entire data set needs to be shuffled across the network. The spikes are made up of many short-lived flows from all the nodes in the job and can potentially create temporary bursts that trigger short-lived buffer and I/O congestion. The figure of aggregate traffic during replication also shows a spike caused by multiple nodes sending data at the same time.

4. Conclusions

Just as steel and coal, and then the Internet, were the main elements supporting world economic change during the Industrial Revolution and the IT revolution respectively, Big Data is expected to play the main role in economic change during the upcoming mobile smart revolution [3]. To use Big Data efficiently, a network infrastructure must be built that can deliver real-time analysis results produced with cloud computing technology.
Big Data produces big traffic and thus places a significant burden on the network infrastructure. Therefore, the enterprise network should be optimized to provide a strong foundation in terms of the volume, speed, and accessibility of data, for both traditional transaction-oriented RDBMS and diverse applications such as Big Data. In this paper, networking infrastructure considerations to support Big Data were surveyed, and a Big Data networking architecture was presented through the case study of Cisco.

References
[1] Korean President's Council on National ICT Strategies, The national development strategy in Big Data age, pp. 49-58, August.
[2] Sung-Choon Lee, The viewpoint on the Big Data utilization and communication industry, pp. 6-11, Vol. 60, Spring.
[3] Sung-Choon Lee, Yang-Soo Lim, Min-Jee Ahn, Big Data: The key to open the future, KT Economics and Management Research Center Report, July.
[4] Cisco White Paper, Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, Feb.
[5] Cisco White Paper, Big Data in the Enterprise: Network Design Considerations.
[6] Eung-Yong Lee, "The USA government Big Data R&D strategy", KISA, Internet and Security Issue, pp. 3-26, August.
[7] Internet Research Group, Big Data, Big Traffic and the WAN, Jan.
[8] Lucinda Borovick and Richard L. Villars, The Critical Role of the Network in Big Data Applications, IDC White Paper, April.

Yong-Hee Jeon received the B.S. degree in Electrical Engineering from Korea University in 1978 and the M.S. and Ph.D. degrees in Computer Engineering from North Carolina State University at Raleigh, NC, USA, in 1989 and 1992, respectively. From 1978 to 1985, he worked at Samsung and KOPEC (Korea Power Engineering Co.). Before joining the faculty at CUD (Catholic University of Daegu) in 1994, he worked at ETRI (Electronics and Telecommunications Research Institute) from 1992. Currently, he is a Professor at the School of Information Technology Engineering at CUD, Gyeongsan, Korea. Since January 2008, he has been a Vice-President of KIISC (Korea Institute of Information Security and Cryptology).

WHITE PAPER The Critical Role of the Network in Big Data Applications

WHITE PAPER The Critical Role of the Network in Big Data Applications WHITE PAPER The Critical Role of the Network in Big Data Applications Sponsored by: Cisco Systems Lucinda Borovick February 2012 Richard L. Villars EXECUTIVE SUMMARY Global Headquarters: 5 Speen Street

More information

Big Data in the Enterprise: Network Design Considerations

Big Data in the Enterprise: Network Design Considerations White Paper Big Data in the Enterprise: Network Design Considerations What You Will Learn This document examines the role of big data in the enterprise as it relates to network design considerations. It

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Testing Big data is one of the biggest

Testing Big data is one of the biggest Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing

More information

Hadoop Cluster Applications

Hadoop Cluster Applications Hadoop Overview Data analytics has become a key element of the business decision process over the last decade. Classic reporting on a dataset stored in a database was sufficient until recently, but yesterday

More information

I/O Considerations in Big Data Analytics

I/O Considerations in Big Data Analytics Library of Congress I/O Considerations in Big Data Analytics 26 September 2011 Marshall Presser Federal Field CTO EMC, Data Computing Division 1 Paradigms in Big Data Structured (relational) data Very

More information

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database Built up on Cisco s big data common platform architecture (CPA), a

More information

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

OnX Big Data Reference Architecture

OnX Big Data Reference Architecture OnX Big Data Reference Architecture Knowledge is Power when it comes to Business Strategy The business landscape of decision-making is converging during a period in which: > Data is considered by most

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Cisco Unified Data Center

Cisco Unified Data Center Solution Overview Cisco Unified Data Center Simplified, Efficient, and Agile Infrastructure for the Data Center What You Will Learn The data center is critical to the way that IT generates and delivers

More information

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Datenverwaltung im Wandel - Building an Enterprise Data Hub with Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees

More information

Addressing Open Source Big Data, Hadoop, and MapReduce limitations

Addressing Open Source Big Data, Hadoop, and MapReduce limitations Addressing Open Source Big Data, Hadoop, and MapReduce limitations 1 Agenda What is Big Data / Hadoop? Limitations of the existing hadoop distributions Going enterprise with Hadoop 2 How Big are Data?

More information

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created

More information

Powerful Duo: MapR Big Data Analytics with Cisco ACI Network Switches

Powerful Duo: MapR Big Data Analytics with Cisco ACI Network Switches Powerful Duo: MapR Big Data Analytics with Cisco ACI Network Switches Introduction For companies that want to quickly gain insights into or opportunities from big data - the dramatic volume growth in corporate

More information

The Software Defined Hybrid Packet Optical Datacenter Network SDN AT LIGHT SPEED TM. 2012-13 CALIENT Technologies www.calient.

The Software Defined Hybrid Packet Optical Datacenter Network SDN AT LIGHT SPEED TM. 2012-13 CALIENT Technologies www.calient. The Software Defined Hybrid Packet Optical Datacenter Network SDN AT LIGHT SPEED TM 2012-13 CALIENT Technologies www.calient.net 1 INTRODUCTION In datacenter networks, video, mobile data, and big data

More information

Reference Architecture and Best Practices for Virtualizing Hadoop Workloads Justin Murray VMware

Reference Architecture and Best Practices for Virtualizing Hadoop Workloads Justin Murray VMware Reference Architecture and Best Practices for Virtualizing Hadoop Workloads Justin Murray ware 2 Agenda The Hadoop Journey Why Virtualize Hadoop? Elasticity and Scalability Performance Tests Storage Reference

More information

BIG DATA CHALLENGES AND PERSPECTIVES

BIG DATA CHALLENGES AND PERSPECTIVES BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,

More information

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment

More information

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System By Jake Cornelius Senior Vice President of Products Pentaho June 1, 2012 Pentaho Delivers High-Performance

More information

Scala Storage Scale-Out Clustered Storage White Paper

Scala Storage Scale-Out Clustered Storage White Paper White Paper Scala Storage Scale-Out Clustered Storage White Paper Chapter 1 Introduction... 3 Capacity - Explosive Growth of Unstructured Data... 3 Performance - Cluster Computing... 3 Chapter 2 Current

More information

INFORMATION EVERYWHERE, BUT WHERE' S THE KNOWLEDGE?

INFORMATION EVERYWHERE, BUT WHERE' S THE KNOWLEDGE? WHITE PAPER Big Data and the Network Sponsored by: Brocade/EMC Greenplum Richard L. Villars November 2011 Lucinda Borovick IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200

More information

Development of Real-time Big Data Analysis System and a Case Study on the Application of Information in a Medical Institution

Development of Real-time Big Data Analysis System and a Case Study on the Application of Information in a Medical Institution , pp. 93-102 http://dx.doi.org/10.14257/ijseia.2015.9.7.10 Development of Real-time Big Data Analysis System and a Case Study on the Application of Information in a Medical Institution Mi-Jin Kim and Yun-Sik

More information

Survival Tips for Big Data Impact on Performance Share Pittsburgh Session 15404

Survival Tips for Big Data Impact on Performance Share Pittsburgh Session 15404 Survival Tips for Big Data Impact on Performance Share Pittsburgh Session 15404 Laura Knapp WW Business Consultant Laurak@aesclever.com ipv6hawaii@outlook.com 08/04/2014 Applied Expert Systems, Inc. 2014

More information

Juniper Networks QFabric: Scaling for the Modern Data Center

Juniper Networks QFabric: Scaling for the Modern Data Center Juniper Networks QFabric: Scaling for the Modern Data Center Executive Summary The modern data center has undergone a series of changes that have significantly impacted business operations. Applications

Data Refinery with Big Data Aspects

Data Refinery with Big Data Aspects International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data

III Big Data Technologies

III Big Data Technologies III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

Networking in the Hadoop Cluster

Networking in the Hadoop Cluster Hadoop and other distributed systems are increasingly the solution of choice for next generation data volumes. A high capacity, any to any, easily manageable networking layer is critical for peak Hadoop

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Transitioning

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect on AWS Services Overview Bernie Nallamotu Principle Solutions Architect \ So what is it? When your data sets become so large that you have to start innovating around how to collect, store, organize, analyze

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

Big Data: What You Should Know. Mark Child Research Manager - Software IDC CEMA

Big Data: What You Should Know. Mark Child Research Manager - Software IDC CEMA Big Data: What You Should Know Mark Child Research Manager - Software IDC CEMA Agenda Market Dynamics Defining Big Data Technology Trends Information and Intelligence Market Realities Future Applications

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University

Testing 3Vs (Volume, Variety and Velocity) of Big Data

Testing 3Vs (Volume, Variety and Velocity) of Big Data Testing 3Vs (Volume, Variety and Velocity) of Big Data 1 A lot happens in the Digital World in 60 seconds 2 What is Big Data Big Data refers to data sets whose size is beyond the ability of commonly used

BIG DATA-AS-A-SERVICE

BIG DATA-AS-A-SERVICE White Paper BIG DATA-AS-A-SERVICE What Big Data is about What service providers can do with Big Data What EMC can do to help EMC Solutions Group Abstract This white paper looks at what service providers

Data Centric Computing Revisited

Data Centric Computing Revisited Piyush Chaudhary Technical Computing Solutions Data Centric Computing Revisited SPXXL/SCICOMP Summer 2013 Bottom line: It is a time of Powerful Information Data volume is on the rise Dimensions of data

Big Data Analytics Platform @ Nokia

Big Data Analytics Platform @ Nokia Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform

IDC Analyst Connection: The Critical Role of I/O in Public Cloud Service Provider Environments

IDC Analyst Connection. Rick Villars, Vice President, Information and Cloud. The Critical Role of I/O in Public Cloud Service Provider Environments. August

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social

Scalable Approaches for Multitenant Cloud Data Centers

Scalable Approaches for Multitenant Cloud Data Centers WHITE PAPER www.brocade.com DATA CENTER Scalable Approaches for Multitenant Cloud Data Centers Brocade VCS Fabric technology is the ideal Ethernet infrastructure for cloud computing. It is manageable,

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Image

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics February 11, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Cloud Computing I (intro) 15 319, spring 2010 2 nd Lecture, Jan 14 th Majd F. Sakr Lecture Motivation General overview on cloud computing What is cloud computing Services

Cisco Data Preparation

Cisco Data Preparation Data Sheet Cisco Data Preparation Unleash your business analysts to develop the insights that drive better business outcomes, sooner, from all your data. As self-service business intelligence (BI) and

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities Technology Insight Paper Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities By John Webster February 2015 Enabling you to make the best technology decisions Enabling

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

Big Data Driven Knowledge Discovery for Autonomic Future Internet

Big Data Driven Knowledge Discovery for Autonomic Future Internet Big Data Driven Knowledge Discovery for Autonomic Future Internet Professor Geyong Min Chair in High Performance Computing and Networking Department of Mathematics and Computer Science College of Engineering,

Real Time Big Data Processing

Real Time Big Data Processing Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure

HadoopTM Analytics DDN

HadoopTM Analytics DDN DDN Solution Brief Accelerate> HadoopTM Analytics with the SFA Big Data Platform Organizations that need to extract value from all data can leverage the award winning SFA platform to really accelerate

BIG DATA IN BUSINESS ENVIRONMENT

BIG DATA IN BUSINESS ENVIRONMENT Scientific Bulletin Economic Sciences, Volume 14/ Issue 1 BIG DATA IN BUSINESS ENVIRONMENT Logica BANICA 1, Alina HAGIU 2 1 Faculty of Economics, University of Pitesti, Romania olga.banica@upit.ro 2 Faculty

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

Big Data, Big Traffic. And the WAN

Big Data, Big Traffic. And the WAN Big Data, Big Traffic And the WAN Internet Research Group January, 2012 About The Internet Research Group www.irg-intl.com The Internet Research Group (IRG) provides market research and market strategy

Object Storage: A Growing Opportunity for Service Providers. White Paper. Prepared for: 2012 Neovise, LLC. All Rights Reserved.

Object Storage: A Growing Opportunity for Service Providers. White Paper. Prepared for: 2012 Neovise, LLC. All Rights Reserved. Object Storage: A Growing Opportunity for Service Providers Prepared for: White Paper 2012 Neovise, LLC. All Rights Reserved. Introduction For service providers, the rise of cloud computing is both a threat

Get More Scalability and Flexibility for Big Data

Get More Scalability and Flexibility for Big Data Solution Overview LexisNexis High-Performance Computing Cluster Systems Platform Get More Scalability and Flexibility for What You Will Learn Modern enterprises are challenged with the need to store and

NoSQL for SQL Professionals William McKnight

NoSQL for SQL Professionals William McKnight NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to

How Cisco IT Built Big Data Platform to Transform Data Management

How Cisco IT Built Big Data Platform to Transform Data Management Cisco IT Case Study August 2013 Big Data Analytics How Cisco IT Built Big Data Platform to Transform Data Management EXECUTIVE SUMMARY CHALLENGE Unlock the business value of large data sets, including

Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp

Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp Agenda Hadoop and storage Alternative storage architecture for Hadoop Use cases and customer examples

Cloud-Based Services: Assure Performance, Availability, and Security

Cloud-Based Services: Assure Performance, Availability, and Security Cloud-Based Services: Assure Performance, Availability, and Security What You Will Learn Services available from the cloud offer cost and efficiency benefits to businesses, but until now many customers

REAL-TIME OPERATIONAL INTELLIGENCE. Competitive advantage from unstructured, high-velocity log and machine Big Data

REAL-TIME OPERATIONAL INTELLIGENCE. Competitive advantage from unstructured, high-velocity log and machine Big Data REAL-TIME OPERATIONAL INTELLIGENCE Competitive advantage from unstructured, high-velocity log and machine Big Data 2 SQLstream: Our s-streaming products unlock the value of high-velocity unstructured log

Software-Defined Networks Powered by VellOS

Software-Defined Networks Powered by VellOS WHITE PAPER Software-Defined Networks Powered by VellOS Agile, Flexible Networking for Distributed Applications Vello s SDN enables a low-latency, programmable solution resulting in a faster and more flexible

Data processing goes big

Data processing goes big Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,

The 4 Pillars of Technosoft s Big Data Practice

The 4 Pillars of Technosoft s Big Data Practice beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed

Data Center Fabrics and Their Role in Managing the Big Data Trend

Data Center Fabrics and Their Role in Managing the Big Data Trend Data Center Fabrics and Their Role in Managing the Big Data Trend The emergence of Big Data as a critical technology initiative is one of the driving factors forcing IT decision-makers to explore new alternatives

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Forecast of Big Data Trends Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Big Data transforms Business 2 Data created every minute Source http://mashable.com/2012/06/22/data-created-every-minute/

Comprehensive Analytics on the Hortonworks Data Platform

Comprehensive Analytics on the Hortonworks Data Platform Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page

White Paper. How Streaming Data Analytics Enables Real-Time Decisions

White Paper. How Streaming Data Analytics Enables Real-Time Decisions White Paper How Streaming Data Analytics Enables Real-Time Decisions Contents Introduction... 1 What Is Streaming Analytics?... 1 How Does SAS Event Stream Processing Work?... 2 Overview...2 Event Stream

GigaSpaces Real-Time Analytics for Big Data

GigaSpaces Real-Time Analytics for Big Data GigaSpaces Real-Time Analytics for Big Data GigaSpaces makes it easy to build and deploy large-scale real-time analytics systems Rapidly increasing use of large-scale and location-aware social media and

Hadoop Big Data for Processing Data and Performing Workload

Hadoop Big Data for Processing Data and Performing Workload Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale WHITE PAPER Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale Sponsored by: IBM Carl W. Olofson December 2014 IN THIS WHITE PAPER This white paper discusses the concept

Networking Issues For Big Data

Networking Issues For Big Data Networking Issues For Big Data. Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides and audio/video recordings of this class lecture are at: http://www.cse.wustl.edu/~jain/cse570-13/

Requirements of Voice in an IP Internetwork

Requirements of Voice in an IP Internetwork Requirements of Voice in an IP Internetwork Real-Time Voice in a Best-Effort IP Internetwork This topic lists problems associated with implementation of real-time voice traffic in a best-effort IP internetwork.

Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012

Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012 Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012 1 Market Trends Big Data Growing technology deployments are creating an exponential increase in the volume

Packet Flow Analysis and Congestion Control of Big Data by Hadoop

Packet Flow Analysis and Congestion Control of Big Data by Hadoop Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.456

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

Cisco Unified Data Center Solutions for MapR: Deliver Automated, High-Performance Hadoop Workloads

Cisco Unified Data Center Solutions for MapR: Deliver Automated, High-Performance Hadoop Workloads Solution Overview Cisco Unified Data Center Solutions for MapR: Deliver Automated, High-Performance Hadoop Workloads What You Will Learn MapR Hadoop clusters on Cisco Unified Computing System (Cisco UCS

TRILL Large Layer 2 Network Solution

TRILL Large Layer 2 Network Solution TRILL Large Layer 2 Network Solution Contents 1 Network Architecture Requirements of Data Centers in the Cloud Computing Era... 3 2 TRILL Characteristics... 5 3 Huawei TRILL-based Large Layer 2 Network

Apache Hadoop. Alexandru Costan

Apache Hadoop. Alexandru Costan 1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open

BUILDING A NEXT-GENERATION DATA CENTER

BUILDING A NEXT-GENERATION DATA CENTER BUILDING A NEXT-GENERATION DATA CENTER Data center networking has changed significantly during the last few years with the introduction of 10 Gigabit Ethernet (10GE), unified fabrics, highspeed non-blocking

Virtualizing Apache Hadoop. June, 2012

Virtualizing Apache Hadoop. June, 2012 June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING

StreamStorage: High-throughput and Scalable Storage Technology for Streaming Data

StreamStorage: High-throughput and Scalable Storage Technology for Streaming Data : High-throughput and Scalable Storage Technology for Streaming Data Munenori Maeda Toshihiro Ozawa Real-time analytical processing (RTAP) of vast amounts of time-series data from sensors, server logs,

VMDC 3.0 Design Overview

VMDC 3.0 Design Overview CHAPTER 2 The Virtual Multiservice Data Center architecture is based on foundation principles of design in modularity, high availability, differentiated service support, secure multi-tenancy, and automated

Data Mining in the Swamp

Data Mining in the Swamp WHITE PAPER Page 1 of 8 Data Mining in the Swamp Taming Unruly Data with Cloud Computing By John Brothers Business Intelligence is all about making better decisions from the data you have. However, all

CDH AND BUSINESS CONTINUITY:

CDH AND BUSINESS CONTINUITY: WHITE PAPER CDH AND BUSINESS CONTINUITY: An overview of the availability, data protection and disaster recovery features in Hadoop Abstract Using the sophisticated built-in capabilities of CDH for tunable

Building a Scalable Big Data Infrastructure for Dynamic Workflows

Building a Scalable Big Data Infrastructure for Dynamic Workflows Building a Scalable Big Data Infrastructure for Dynamic Workflows INTRODUCTION Organizations of all types and sizes are looking to big data to help them make faster, more intelligent decisions. Many efforts

Deploying Flash- Accelerated Hadoop with InfiniFlash from SanDisk

Deploying Flash- Accelerated Hadoop with InfiniFlash from SanDisk WHITE PAPER Deploying Flash- Accelerated Hadoop with InfiniFlash from SanDisk 951 SanDisk Drive, Milpitas, CA 95035 2015 SanDisk Corporation. All rights reserved. www.sandisk.com Table of Contents Introduction

Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200 F.508.935.4015 www.idc.com

Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200 F.508.935.4015 www.idc.com WHITE PAPER Oracle Virtual Networking Delivering Fabric

Platfora Big Data Analytics

Platfora Big Data Analytics Platfora Big Data Analytics ISV Partner Solution Case Study and Cisco Unified Computing System Platfora, the leading enterprise big data analytics platform built natively on Hadoop and Spark, delivers
