G-Cloud Service Definition. Atos Big Data Strategy & Roadmap SCS




Atos Big Data Strategy & Roadmap SCS

Atos Big Data Strategy & Roadmap Services combine knowledge and expertise in the area of Big Data to support customers in defining their Big Data Strategy and Roadmap. Building a robust Big Data Strategy and Roadmap enables customers to drive better business performance by deriving useful business insights from the vast amount of captured data, both structured and unstructured, which previously may not have been exploited for technological reasons. A successful Big Data Strategy and Roadmap aligns with the goals of the business, enabling strategic, tactical, and operational decision-making.

Atos believes that a Big Data Strategy and Roadmap should include consideration for a wide set of processes, technologies, and stakeholders in order to access, collect, and analyse information. The final version of the Big Data Strategy should be constantly tuned and adjusted to reflect the needs of your business, aligning with your overall IT strategy and business goals.

The amount of data in the present-day world has exploded. Today we get information not only from traditional (structured) data but also from a variety of semi-structured or unstructured sources, both internal and external to your business. Analysing these large data sets, identifying problems and recognising patterns is essential in order to achieve an advantage over your competitors. It also helps to identify new sources of productivity and to stimulate growth and innovation.

The deployment of a Big Data Solution will help you to achieve the following business benefits:
- Collect more accurate and detailed performance information, expose variability and boost performance
- Unlock significant value by making information transparent and usable at much higher frequency
- Use the collection of vast amounts of data, and sophisticated predictive analysis techniques on that data, to significantly improve management decisions
- Use collected data for anything from basic low-frequency forecasting to high-frequency now-casting, to adjust business levers just in time
- Create ever-narrower segmentation of customers, allowing much more precisely tailored products or services to be designed
- Improve the development of the next generation of products and services
- A framework, governance, methodology, systems, and technology to deliver improvements and value aligned with the business goals
- Improved focus for the business on long-term planning
- Improved knowledge management, allowing the best use of information
- Identification of the right combination of people, processes, and technology

What is it?

The service will help to define the Big Data IT strategy, and we will provide a Big Data solution architecture and a high-level implementation roadmap to support that strategy. The service will assess how closely aligned your current IT strategy is with the requirements of Big Data, and identify concrete steps for incorporating Big Data into your strategy.

The focus of this service is to ensure that the Big Data architecture fits within your wider organisation landscape, and that it is aligned with the existing Business Strategy, IT strategy, Information Management strategy and/or Data Warehouse strategy. Atos provides its architecture consulting and its Big Data market and product expertise to deliver a Big Data architecture definition and an implementation roadmap showing how to get to it, ensuring that the suggested technical solution delivers a specific business use case or specific technical use case.

This service is ideal for organisations keen to adopt Big Data, but not yet ready from a business or technical viewpoint. It will provide the organisation with a roadmap towards the target architecture and transformation stages.

Features of the service include:
- Analysis of business requirements and definition of a Big Data solution to address them
- Assessment of business process impact
- Understanding what changes to the current IT landscape will be required
- Recommendations on skills and governance changes required for a Big Data solution
- Definition of the Big Data solution components
- Recommendations on technical product selection
- Delivery of a technical roadmap to move from the current IT architecture to the proposed Big Data architecture

How this service can be used

This service can be used in many scenarios, including:
- Undertaking a complete assessment of the current IT landscape: processes, technology, and people, i.e. current users and user profiles. This will help identify any pain points which can be addressed in future strategy documents.
- Determining the best approach for users to access and consume information in the future (target operating model). This service will define how users will share information and knowledge in the future and the kind of collaborative environment that could be made available to them in order to make best use of the Big Data components available to them. The use of multiple technologies and techniques would be recommended if necessary.
- Defining the migration or transformation plan to bridge the gap between the current landscape and what the future strategy dictates. At a practical level, this service will outline high-level process issues, refresh rates for data, capacity planning, backup, recovery, archive, workflow, and security. At a strategic level, the service will outline how a Big Data initiative will help the business achieve strategic, tactical, and operational goals.
- Augmenting the team with trained and experienced architects and consultants with deep knowledge and experience of Big Data strategy.

What makes us unique?

The journey into the world of big data is driven by the promise of new insight, of new and actionable intelligence revealed. For Atos, Big Data is the same but different. In many ways, it is a straight continuation of what Atos has been doing for years: helping our clients maximise the operational and business advantage gained from their digital assets. But it is different too. The scale of the data explosion, the growing emphasis on deriving value from unstructured data, and the massive increases in connectivity in the internet of things all help to form a new data landscape. And in this new landscape, those who are best able to generate differentiating intelligence, often in real time, will be the ones who achieve and sustain success.

Atos offers the relevant world-class competencies and customised solutions necessary to handle Big Data and get (more) value out of it. The Atos Big Data team will work with the customer to understand the current situation and business requirements, and then agree the strategy, principles and solution in order to achieve the desired results.

The Atos Big Data Strategy & Roadmap service will focus on the following activities:

Analysis of Business Requirements
- Validation and quantification of business goals, for example: reviewing current data collection mechanisms and marketing opt-ins to ensure maximum customer contactability; using advanced analytics to enhance personalisation and content strategies; ensuring customer touch points are aligned to deliver consistent messaging; improving the ability to react to behaviours in real time as the consumer is engaging with the business, etc.
- Confirmation of the success criteria metrics
- Translation of business goals into specific Big Data requirements

Assess Business Process Impact
- Analysis of business processes required for Big Data
- Assessment of changes needed in existing business processes

Handling Changes to IT Landscape
- Impact assessment of the existing IT landscape
- Analysis of data sources that will enable the Big Data strategy, to establish whether data is already available or needs to be acquired internally or externally
- Assessment of the existing data quality, data linkage, data capture, data redundancy
- Assessment of Big Data hosting and deployment options

Recommend People and Data Governance Requirements
- Evaluating Data Management policies and enhancements needed
- Recommendations on skills needed for Big Data solution implementation

Define the Big Data Solution Architecture and Implementation Roadmap
- Provide recommendations on product selection
- Deliver a technical roadmap to move from the current IT architecture to the proposed Big Data architecture

The deliverable is a Big Data Strategy & Roadmap document with the following contents:
- Big Data requirements
- Big Data metrics
- Big Data business process design
- IT landscape gap analysis
- Skills matrix for handling Big Data implementation
- Big Data governance structure
- Architectural components and recommendations
- Product recommendations
- Big Data solution components & required infrastructure
- A high-level roadmap for implementation of the Big Data solution

1. Contents

2. Introduction
3. Service overview
3.1 Services Activities
3.2 Services Deliverables
3.3 Inputs & Outputs
3.4 Roles & Responsibilities
3.5 Client Efforts
4. Information Assurance
5. Timelines
6. Pricing
6.1 Other Professional services
6.2 Termination terms
7. Service management
8. Service constraints
9. Service levels
10. Financial recompense
11. Ordering and invoicing process
12. Termination terms
12.1 By consumers (i.e. consumption)
12.2 By the Supplier (removal of the G-Cloud Service)
13. Technical requirements
14. Trial service
15. Abbreviations & Definitions

2. Introduction

The Atos service will help to define the Big Data Strategy for your organisation, the Big Data Solution Architecture to support that strategy, and a high-level Implementation Roadmap for the Big Data Solution. This service will assess how closely aligned your current IT strategy is with the requirements of Big Data, and identify concrete steps for incorporating Big Data into your strategy.

The focus of this Atos service is to ensure that the Big Data architecture fits within your wider organisation landscape, and that it is aligned with the existing Business Strategy, IT strategy, Information Management strategy and/or Data Warehouse strategy. Atos provides its architecture consulting and its Big Data market and product expertise to deliver a Big Data architecture definition and an implementation roadmap showing how to get to it, ensuring that the suggested technical solution delivers a specific business use case or specific technical use case.

This service is ideal for organisations keen to adopt Big Data, but not yet ready from a business or technical viewpoint. It will provide the organisation with a roadmap towards the target architecture and transformation stages.

Features of the service include:
- Analysis of business requirements and definition of a Big Data solution to address them
- Assessment of business process impact
- Understanding what changes to the current IT landscape will be required
- Recommendations on skills and governance changes required for a Big Data solution
- Definition of the Big Data solution components
- Recommendations on technical product selection
- Delivery of a technical roadmap to move from the current IT architecture to the proposed Big Data architecture

3. Service overview

3.1 Services Activities

The Atos Big Data team will work with the customer to understand the current situation and business requirements, and then agree the strategy, principles and solution in order to achieve the desired results. The Atos Big Data Strategy & Roadmap service will focus on the following activities:

Analysis of Business Requirements
- Validation and quantification of business goals, for example: reviewing current data collection mechanisms and marketing opt-ins to ensure maximum customer contactability; using advanced analytics to enhance personalisation and content strategies; ensuring customer touch points are aligned to deliver consistent messaging; improving the ability to react to behaviours in real time as the consumer is engaging with the business, etc.
- Confirmation of the success criteria metrics
- Translation of business goals into specific Big Data requirements

Assess Business Process Impact
- Analysis of business processes required for Big Data
- Assessment of changes needed in existing business processes

Handling Changes to IT Landscape
- Impact assessment of the existing IT landscape
- Analysis of data sources that will enable the Big Data strategy, to establish whether data is already available or needs to be acquired internally or externally
- Assessment of the existing data quality, data linkage, data capture, data redundancy
- Assessment of Big Data hosting and deployment options

Recommend People and Data Governance Requirements
- Evaluating Data Management policies and enhancements needed
- Recommendations on skills needed for Big Data solution implementation

Define the Big Data Solution Architecture and Implementation Roadmap
- Definition of a high-level Big Data solution architecture
- Deliver a technical roadmap to move from the current IT architecture to the proposed Big Data architecture
- Provide recommendations on product selection

3.2 Services Deliverables

The Big Data Strategy & Roadmap documentation will include the following contents:
- Big Data requirements
- Big Data metrics
- Big Data business process design
- IT landscape gap analysis
- Skills matrix for handling Big Data implementation
- Big Data governance structure
- Architectural components and recommendations
- Product recommendations
- Big Data solution components & required infrastructure
- A high-level roadmap for implementation of the Big Data solution

3.3 Inputs & Outputs

Inputs (provided by the Client):
1. Existing IT Strategy
2. Existing Architecture documentation
3. Existing Infrastructure documentation
4. As-Is application architecture documents
5. As-Is infrastructure architecture documents
6. Relevant data source documentation

Outputs (provided by Atos):
1. Big Data Strategy & Roadmap Document
2. Big Data Solution Architecture document (Big Data Template)
3. Big Data Roadmap (Big Data Template)

3.4 Roles & Responsibilities

Atos roles:
- Big Data Consultant: Lead the Big Data strategy activity and produce the Big Data strategy document.
- Big Data Architect: Provide support to the Big Data Consultant on technical areas and documentation of the strategy document. Lead the Big Data solution architecture and roadmap activity and produce the Big Data architecture and roadmap deliverable.
- Big Data Technical Specialist: Provide support to the Big Data Architect to produce the Big Data architecture and roadmap deliverable.

Client roles:
- Head of business departments, Head of IT/BI function: Identify individuals to be available during the strategy and architecture activity to answer questions and provide information.
- Business Implementers: Provide answers and information related to the Big Data strategy.
- IT/BI Implementers: Provide answers, information and access to systems (if required) relevant for the Big Data strategy, architecture and roadmap activity.

3.5 Client Efforts

The client will be responsible for the following:
- Ensuring that key individuals (client SMEs) are available for the period of the development of the Big Data strategy, solution architecture and implementation roadmap
- Providing Atos with all relevant existing As-Is strategy papers and technical documentation on data sources, architecture and infrastructure details

4. Information Assurance

This product is currently available at Impact Level 0 (IL0). The service can be run at higher Impact Levels including IL2 and IL3. Atos has considerable experience of providing services at different levels of assurance. Atos currently has a number of products on G-Cloud that have received Pan Government Accreditation (PGA). Details can be found on the Cabinet Office website at: http://gcloud.civilservice.gov.uk/customer-zone/accreditation-status

5. Timelines

The service is delivered according to the specific needs of each engagement. The timelines involved in the definition of the Big Data Strategy & Roadmap depend on the functional and organisational scope and the selected Big Data use case(s). The timelines below are indicative and depend on the client's scope.

Activity: Timeline
- Business Requirements Assessment: 1 to 2 weeks (5 to 10 days)
- Business Process Impact Assessment: 1 to 2 weeks (5 to 10 days)
- Handling Changes to IT Landscape: 3 to 5 weeks (15 to 25 days)
- Data Management Policies Recommendations: 3 to 5 weeks (15 to 25 days)
- Big Data Architecture and Roadmap: 3 to 5 weeks (15 to 25 days)
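As a rough illustration of how the indicative figures above combine, the following Python sketch adds up the per-activity ranges. It assumes the five activities run strictly one after another with no overlap, which the service definition does not state; the function name `total_range` is hypothetical.

```python
# Indicative durations in working days (min, max), copied from the timeline list above.
ACTIVITY_DAYS = {
    "Business Requirements Assessment": (5, 10),
    "Business Process Impact Assessment": (5, 10),
    "Handling Changes to IT Landscape": (15, 25),
    "Data Management Policies Recommendations": (15, 25),
    "Big Data Architecture and Roadmap": (15, 25),
}


def total_range(activities):
    """Return (min_days, max_days), ASSUMING the activities run sequentially."""
    low = sum(minimum for minimum, _ in activities.values())
    high = sum(maximum for _, maximum in activities.values())
    return low, high


low, high = total_range(ACTIVITY_DAYS)
print(f"Indicative elapsed time: {low} to {high} working days "
      f"({low // 5} to {high // 5} weeks)")
```

Under that sequential assumption the engagement spans 55 to 95 working days (11 to 19 weeks). Any overlap between activities would shorten the elapsed time.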

6. Pricing

The pricing shown below identifies the primary roles which are engaged in delivering this service. All prices exclude VAT.

Resource Type: Charge Rate (per day)
- Big Data Consultant (NB: BC Grade 6): 1,135
- Big Data Architect (NB: SI Practice Grade 6): 850
- Big Data Technical Specialist (NB: SI Practice Grade 4): 590
- Travel and Accommodation Expenses: See below

Standards for Consultancy Day Rate:
- Consultant's Working Day: 8 hours exclusive of travel and lunch
- Working Week: Monday to Friday excluding national holidays
- Office Hours: 09:00 to 17:00 Monday to Friday
- Travel and Subsistence: Included in day rate within the M25. Payable at the department's standard T&S rate outside the M25.
- Mileage: As above
- Professional Indemnity Insurance: Included in day rate

All pricing excludes VAT.

6.1 Other Professional services

Available as per the Atos SFIA rate card.

6.2 Termination terms

Please refer directly to Atos standard Terms and Conditions.
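The day rates above can be turned into a simple cost estimate. The sketch below is illustrative only: the day counts per role in `example_days` are invented for the example and are not part of the service definition, and the result excludes VAT and any travel expenses outside the M25.

```python
# Day rates per role, copied from the pricing list above (VAT excluded).
DAY_RATES = {
    "Big Data Consultant": 1135,
    "Big Data Architect": 850,
    "Big Data Technical Specialist": 590,
}


def engagement_cost(days_by_role, rates=DAY_RATES):
    """Sum rate * days over all roles; reject roles that have no listed rate."""
    unknown = set(days_by_role) - set(rates)
    if unknown:
        raise KeyError(f"No day rate listed for: {sorted(unknown)}")
    return sum(rates[role] * days for role, days in days_by_role.items())


# Hypothetical effort split, for illustration only.
example_days = {
    "Big Data Consultant": 10,
    "Big Data Architect": 15,
    "Big Data Technical Specialist": 5,
}
print(engagement_cost(example_days))  # 27050
```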

7. Service management

Where this service is purchased free-standing, a project/service management wrapper will be built in to work plans and effort estimates. This will normally include status/progress reporting.

8. Service constraints

Please see the sections on Technical Requirements & Consumer Responsibilities, below.

9. Service levels

Atos provides suitably trained individuals to complete the tasks necessary for this service. They can work at the client site or remotely, depending on the need for access to:
- Client staff
- Client applications & network

Our standard working hours / days are 09:00 to 17:00 Monday to Friday, excluding public & regional holidays.

10. Financial recompense

To minimise the cost to users, Atos does not provide service credits for use of the service. All Atos services are provided on a reasonable endeavours basis. Please refer to the G-Cloud terms and conditions.

In accordance with the guidance within the GPS G-Cloud Framework Terms and Conditions, the Customer may terminate the contract at any time, without cause, by giving at least thirty (30) Working Days' prior notice in writing. The Call Off Contract terms and conditions and the Atos terms will define the circumstances where a refund of any pre-paid service charges may be available.

11. Ordering and invoicing process

Ordering this product is a straightforward process. Please forward your requirements to the email address GCloud@atos.net

Atos will prepare a quotation and agree that quotation with you, including any volume discounts that may be applicable. Once the quotation is agreed, Atos will issue the customer with the necessary documentation (as required by the G-Cloud Framework) and ask the customer to provide Atos with a purchase order. Once received, the customer services will be configured to the requirements as per the original quotation. For new customers, additional new supplier forms may need to be completed.

Invoices will be issued to the customer and Shared Services (quoting the purchase order number) for the services procured. On a monthly basis, Atos will also complete the mandated management information reports to Government Procurement Services detailing the spend that the customer has placed with us. Cabinet Office publish a summary of this monthly management information at: http://gcloud.civilservice.gov.uk/about/sales-information/

12. Termination terms

12.1 By consumers (i.e. consumption)

Termination shall be in accordance with:
- The G-Cloud Framework terms and conditions
- Any terms agreed within the Call Off Contract under section 10.2 of the Order Form (termination without cause), where the Government Procurement Service (GPS) guidance states "At least thirty (30) Working Days in accordance with Clause CO-9.2 of the Call-Off Contract"
- Atos Supplier Terms for this Service as listed on the G-Cloud CloudStore

For this specific service, by default Atos asks for at least thirty (30) Working Days' prior written notice of termination, as per the guidance within the GPS G-Cloud Framework Terms and Conditions.

12.2 By the Supplier (removal of the G-Cloud Service)

Atos commits to continue to provide the service for the duration of the Call Off Contract, subject to the terms and conditions of the G-Cloud Framework and Atos Supplier Terms.
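To make the thirty Working Days notice period concrete, the Python sketch below counts forward from a notice date. It assumes a Working Day is Monday to Friday excluding any holidays passed in, and that counting starts on the day after notice is given. Both are assumptions for illustration; the contractual definition in the G-Cloud Framework and Call-Off Contract takes precedence. The function name `add_working_days` is hypothetical.

```python
from datetime import date, timedelta


def add_working_days(start, working_days, holidays=frozenset()):
    """Return the date reached after counting `working_days` working days past `start`.

    ASSUMPTION: working days are Monday to Friday, minus the given holidays.
    """
    current = start
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


# Notice given on Monday 5 October 2015, with no holidays in the period.
print(add_working_days(date(2015, 10, 5), 30))  # 2015-11-16
```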

13. Technical requirements

Efficient project delivery during the requirements management & specification stages usually requires consideration of access to:
- The Atos network (usually via the Internet)
- The Internet/WWW
- Any shared project areas (e.g. on client or Atos network or cloud-based)
- Development and test environments
- Client legacy systems

14. Trial service

This service is not available on a trial basis.

15. Abbreviations & Definitions

Big Data

Big Data is a general term used to describe the voluminous amount of unstructured and semi-structured data a company creates: data that would take too much time and cost too much money to load into a relational database for analysis. Although Big Data doesn't refer to any specific quantity, the term is often used when speaking about petabytes and exabytes of data.

A primary goal for looking at big data is to discover repeatable business patterns. It's generally accepted that unstructured data, most of it located in text files, accounts for at least 80% of an organization's data. If left unmanaged, the sheer volume of unstructured data that's generated each year within an enterprise can be costly in terms of storage. Unmanaged data can also pose a liability if information cannot be located in the event of a compliance audit or lawsuit.

Big data analytics is often associated with cloud computing because the analysis of large data sets in real time requires a framework like MapReduce to distribute the work among tens, hundreds or even thousands of computers.

Exabyte

An exabyte (EB) is a large unit of computer data storage, two to the sixtieth power bytes. The prefix exa means one billion billion, or one quintillion, which is a decimal term. Two to the sixtieth power is actually 1,152,921,504,606,846,976 bytes in decimal, or somewhat over a quintillion (or ten to the eighteenth power) bytes. It is common to say that an exabyte is approximately one quintillion bytes. In decimal terms, an exabyte is a billion gigabytes. An exabyte of storage could contain 50,000 years' worth of DVD-quality video.

Hadoop

Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.

Hadoop makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes. Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating uninterrupted in case of a node failure. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.

Hadoop was inspired by Google's MapReduce, a software framework in which an application is broken down into numerous small parts. Any of these parts (also called fragments or blocks) can be run on any node in the cluster. Doug Cutting, Hadoop's creator, named the framework after his child's stuffed toy elephant. The current Apache Hadoop ecosystem consists of the Hadoop kernel, MapReduce, the Hadoop distributed file system (HDFS) and a number of related projects such as Apache Hive, HBase and Zookeeper.

The Hadoop framework is used by major players including Google, Yahoo and IBM, largely for applications involving search engines and advertising. The preferred operating systems are Windows and Linux, but Hadoop can also work with BSD and OS X.

MapReduce

MapReduce is a software framework that allows developers to write programs that process massive amounts of unstructured data in parallel across a distributed cluster of processors or stand-alone computers. It was developed at Google for indexing Web pages and replaced their original indexing algorithms and heuristics in 2004.

The framework is divided into two parts: Map, a function that parcels out work to different nodes in the distributed cluster, and Reduce, another function that collates the work and resolves the results into a single value.

The MapReduce framework is fault-tolerant because each node in the cluster is expected to report back periodically with completed work and status updates. If a node remains silent for longer than the expected interval, a master node makes note and re-assigns the work to other nodes.

MapReduce is important because it allows ordinary developers to use MapReduce library routines to create parallel programs without having to worry about programming for intracluster communication, task monitoring or failure handling. It is useful for tasks such as data mining, log file analysis, financial analysis and scientific simulations. Several implementations of MapReduce are available in a variety of programming languages, including Java, C++, Python, Perl, Ruby, and C.

Petabyte

A petabyte is a measure of memory or storage capacity and is 2 to the 50th power bytes or, in decimal, approximately a thousand terabytes. In recently announcing how many Fibre Channel storage arrays they had sold, Sun Microsystems stated that it had shipped an aggregate of two petabytes of storage, or the equivalent of 40 million four-drawer filing cabinets full of text. IBM says that it has shipped four petabytes of SSA Storage.

Unstructured Data

Unstructured data is a generic label for describing any corporate information that is not in a database. Unstructured data can be textual or non-textual. Textual unstructured data is generated in media like email messages, PowerPoint presentations, Word documents, collaboration software and instant messages. Non-textual unstructured data is generated in media like JPEG images, MP3 audio files and Flash video files.

If left unmanaged, the sheer volume of unstructured data that's generated each year within an enterprise can be costly in terms of storage. Unmanaged data can also pose a liability if information cannot be located in the event of a compliance audit or lawsuit.

The information contained in unstructured data is not always easy to locate. It requires that data in both electronic and hard copy documents and other media be scanned so a search application can parse out concepts based on words used in specific contexts. This is called semantic search. It is also referred to as enterprise search.

In customer-facing businesses, the information contained in unstructured data can be analysed to improve customer relationship management and relationship marketing. As social media applications like Twitter and Facebook go mainstream, the growth of unstructured data is expected to far outpace the growth of structured data. According to the "IDC Enterprise Disk Storage Consumption Model" report released in Fall 2009, while transactional data is projected to grow at a compound annual growth rate (CAGR) of 21.8%, it's far outpaced by a 61.7% CAGR prediction for unstructured data.
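The Map and Reduce roles described in the MapReduce definition above can be illustrated with a single-process word count in Python. This is a teaching sketch only: it does not use Hadoop or any MapReduce library, it runs on one machine, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. A real framework would distribute the map and reduce calls across nodes and handle failures.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data", "Big insight"]))
# {'big': 2, 'data': 1, 'insight': 1}
```

Because each `map_phase` call depends only on its own document, and each `reduce_phase` call only on its own key, the calls could in principle run on different nodes, which is the property the definition describes.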

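The storage-unit figures quoted in the Exabyte and Petabyte definitions can be checked with a few lines of Python arithmetic. The constant names are chosen for this sketch; the values follow directly from the powers of two and ten stated in the definitions.

```python
# Binary definitions, as given in the glossary.
EXABYTE_BINARY = 2 ** 60
PETABYTE_BINARY = 2 ** 50
TERABYTE_BINARY = 2 ** 40

# Decimal reference points.
QUINTILLION = 10 ** 18
GIGABYTE_DECIMAL = 10 ** 9

# 2^60 written out in full, with thousands separators.
print(f"{EXABYTE_BINARY:,}")  # 1,152,921,504,606,846,976

# How far the binary exabyte exceeds one quintillion bytes.
print(round(EXABYTE_BINARY / QUINTILLION, 3))  # 1.153

# A decimal exabyte expressed in decimal gigabytes: one billion.
print(QUINTILLION // GIGABYTE_DECIMAL)  # 1000000000

# A binary petabyte expressed in binary terabytes.
print(PETABYTE_BINARY // TERABYTE_BINARY)  # 1024
```

The binary exabyte is about 15% larger than one quintillion bytes, which is why the definition describes it as "somewhat over a quintillion".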