Cisco Data Preparation



Data Sheet

Cisco Data Preparation

Unleash your business analysts to develop the insights that drive better business outcomes, sooner, from all your data.

As self-service business intelligence (BI) and analytics become the model, getting the data you need ready is the biggest challenge in any analytical exercise. Gathering the data, identifying duplicate data or blank fields, fixing misspellings, splitting columns, and adding data are difficult and time-consuming tasks. Time wasted on data preparation is time that could be spent on analysis.

Cisco Data Preparation is a self-service application that makes it easy for nontechnical business analysts to gather, explore, cleanse, combine, and enrich the data that fuels analytics tools like Excel, Tableau, Qlik, SAS, and more.

Cisco Data Preparation:
• Is a comprehensive data preparation solution that provides all essential data preparation functions from any data source, to any analytic or BI tool, with built-in governance.
• Works the way business analysts work, allowing data exploration with immediate feedback in an experience similar to Excel, without coding or scripting.
• Automates the difficult, time-consuming work required by proactively guiding actions via intelligence that improves based on use.

Legacy extract, load, and transform (ELT) processes are slow and put additional burden on an already-backlogged IT department. Complex tools require expertise, and basic tools like Excel lack features and don't scale. Cisco Data Preparation lets business analysts get answers faster, provides more comprehensive insights, and delivers better business outcomes across hundreds of projects and thousands of users.

Features

Cisco Data Preparation is a self-service application that makes it easy for nontechnical business analysts to gather, explore, cleanse, combine, and enrich the raw data that fuels analytics. Table 1 lists the features of Cisco Data Preparation.

© 2015 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public. Page 1 of 5

Table 1. Data Preparation Features

Feature: Description

Data: Gather data regardless of where it comes from (Hadoop Distributed File System (HDFS), relational databases, Excel, flat files).

Explore: Find quality issues by engaging in ad hoc interactive exploration with full text search, interactive text and numeric filters and histograms, and visual data quality heat maps that highlight patterns, errors, duplicates, and sparse or missing data. One of the fastest ways to do this is with the aggregation feature.

Clean: Runs a set of sophisticated algorithms across specific sections of the data or across entire data sets. Without any coding or scripting, Data Preparation then highlights inconsistencies, gaps, and duplicate data so that analysts can fill in blanks, remove or rename duplicates, fix inconsistent capitalization, and perform other tasks needed to improve the data.

Shape: Pivot or de-pivot data in a single click; quickly split columns and create aggregations to make the data sets more suitable for the required analytic exercise.

Enrich: Provides the context needed for an analytic exercise. For example, industry data, 5-digit zip codes appended with +4, or additional information from third-party data providers can be included.

Combine: Combines data using machine learning. Automatically detects common attributes across multiple data sets, and then provides best-match options to the analyst, who chooses which combination to use for their analytic needs. With one click, analysts can assemble multiple data sets into a single answer set, and then merge multiple overlapping entity references into de-duplicated, trusted entities without any scripting, SQL, or complex Excel functionality like VLOOKUPs, pivot tables, and macros.

Publish: Makes answer sets available directly through ODBC LiveQuery to Qlik, Tableau, Excel, and any other ODBC-compliant analytics tools or applications.
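For readers who want a concrete picture of the Clean and Shape tasks listed in Table 1, the following is a minimal Python sketch of the same kinds of operations: fixing inconsistent capitalization, filling blanks, removing duplicates, splitting a column, and de-pivoting. It is an illustration only and not how Cisco Data Preparation is implemented; the product performs these tasks without coding. The sample rows, column names, and function names are all hypothetical.

```python
# Illustrative only: hypothetical sample rows and column names.
rows = [
    {"name": "acme corp", "city_state": "Austin, TX", "q1": "10", "q2": "12"},
    {"name": "ACME CORP", "city_state": "Austin, TX", "q1": "10", "q2": "12"},
    {"name": "Globex", "city_state": "Boston, MA", "q1": "", "q2": "7"},
]


def clean(rows, fill="0"):
    """Normalize capitalization of 'name', fill blank cells, drop duplicate rows."""
    seen = set()
    out = []
    for row in rows:
        # Replace empty strings with the fill value.
        fixed = {key: (value if value != "" else fill) for key, value in row.items()}
        # Fix inconsistent capitalization (simple title-casing).
        fixed["name"] = fixed["name"].strip().title()
        # Rows that are identical after normalization count as duplicates.
        fingerprint = tuple(sorted(fixed.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            out.append(fixed)
    return out


def split_column(rows, column, into, sep=","):
    """Split one column into several new columns and drop the original."""
    out = []
    for row in rows:
        parts = [part.strip() for part in row[column].split(sep)]
        new_row = {key: value for key, value in row.items() if key != column}
        new_row.update(zip(into, parts))
        out.append(new_row)
    return out


def depivot(rows, id_columns, value_columns):
    """Turn wide value columns into one (variable, value) row per column."""
    out = []
    for row in rows:
        for column in value_columns:
            record = {key: row[key] for key in id_columns}
            record["variable"] = column
            record["value"] = row[column]
            out.append(record)
    return out


cleaned = clean(rows)
shaped = split_column(cleaned, "city_state", ["city", "state"])
long_rows = depivot(shaped, ["name", "city", "state"], ["q1", "q2"])
```

In this sketch the two "Acme Corp" rows collapse into one after capitalization is normalized, the blank q1 value for Globex is filled, and each remaining row yields one output row per quarter after de-pivoting. Title-casing is a deliberately simple stand-in for capitalization repair and would mishandle names with apostrophes or acronyms.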
Data Preparation with Data Virtualization

Combining Cisco Data Preparation and Cisco Data Virtualization accelerates time-to-analytic solutions. Self-service data preparation tools, combined with IT-curated data access using Data Virtualization, provide your business with the data and agility you need. This closed-loop data management process aligns business and IT; business gets the data and agility they need, and IT delivers on the governance, scalability, and control they require. Table 2 lists the Data Virtualization integration areas.

Table 2. Data Virtualization Integration Areas

Integration areas: Data Virtualization data sources; Data Virtualization deployments

Find data in Business Directory: Connect to Business Directory, which contains curated data from one or more instances of Cisco Information Server. The curated data has been vetted by IT, annotated by end users, and gains value from repeated use.

Find data in Cisco Information Server (CIS): Connect to CIS and gain access to a broader range of virtualized data. The virtualized data is integrated from numerous sources, ranging from databases to packaged apps to cloud sources.

Ingest data via Cisco Information Server: Use CIS to load Data Preparation. Integrate any combination of virtual and physical data quickly and load it into Data Preparation to refine data for analytics.

Promote answer sets into CIS: Data sets prepared by business users in Data Preparation can be further operationalized to CIS and Business Directory. This allows wider adoption and consumption of Data Preparation output.

Technology

Cisco Data Preparation runs on an enterprise-scale platform built on Hadoop and powered by Spark. It is built on a four-layer architecture designed for interactive, self-service data preparation at scale. (See Figure 1.)

1. User interface layer: Analysts quickly learn and enjoy using Data Preparation's visually dynamic, multiuser interface, which is built with HTML5 and WebSocket technology to make the application interactive and intuitive.
2. Web services: A lightweight Java layer translates and mediates actions from the user interface into commands to the underlying platform layer. This layer handles the rules for tenants, users, projects, and cell-level modifications, creating a comprehensive governance foundation.
3. Engines: Enabled by proprietary machine learning, latent semantic indexing, statistical pattern recognition, and text analytics techniques. The first engine has parallel, in-memory, pipelined capabilities that vastly accelerate many of the mundane data preparation functions. The second engine leverages Spark and operates over a large variety and volume of structured and unstructured data in real time, enabling Cisco Data Preparation to scale to billions of rows.
4. File management and storage: Provides a cost-effective data management environment. Data sets are stored and accessed through the library, which resides on top of HDFS.

Figure 1. Cisco Data Preparation Architecture

Data Preparation on Cisco UCS Big Data Infrastructure

Data Preparation installed on Cisco UCS scales without limits by taking advantage of Cisco's high-performance and easy-to-manage big data infrastructure. Cisco UCS provides a radically simplified architecture with embedded management that makes it easy to scale as your requirements evolve to solve larger problems and explore more complex scenarios. It also reduces your total cost of ownership (TCO) by requiring fewer infrastructure components and reducing operating expenses associated with staff resources. Together, Data Preparation and Cisco UCS let you solve complex analytical problems, improve business performance, and mitigate risk rapidly and confidently.

The recommended configuration for the Cisco Data Preparation platform deployment is based on Cisco UCS C220 M4/C240 M4, with:
• Two Intel Xeon E5-2680 v3 processors
• 256GB RAM
• 10K RPM SAS HDD or SSD drives, which work with an external Hadoop cluster for data storage

Table 3 lists the benefits of using Cisco UCS.

Table 3. Cisco UCS Highlights

Highlights: Benefits

Reliable scalability: Cisco Unified Computing System delivers reliable scalability of hardware and management to increase business agility and operational efficiency and to help you respond rapidly to changing business requirements.

Reduced TCO and improved staff efficiency: This simplified, intelligent infrastructure reduces your TCO with fewer management points, switches, adapters, cables, and power and cooling components.

Data preparation on Cisco UCS: Cisco Data Preparation on Cisco UCS streamlines customers' ability to prepare their data for analytics at scale, and can be seamlessly integrated into existing enterprise application environments.

Service and Support

Cisco Services help you gain better visibility, better information, and better understanding to fuel performance, efficiency, and innovation from your software purchases. Cisco Services span three phases of lifecycle management: plan, build, and manage.

In the plan phase, Cisco helps you develop your Cisco Data Preparation architecture strategy and transformational roadmap in alignment with your business requirements. In the build phase, Cisco works with you to validate that the Data Preparation solution you designed is ready for production, and then implements, integrates, or migrates new solutions and applications. In the manage phase, Cisco helps you optimize your infrastructure, applications, and service management approach, and monitors and manages your Data Preparation deployment.

Technical support is part of the services provided during the manage phase, which delivers around-the-clock Data Preparation product support from Cisco's Technical Assistance Center (TAC). It also provides timely, uninterrupted access to Cisco's latest software application updates, including major upgrade releases that might include new features and functionality.

System Requirements

Cisco Data Preparation has the following system requirements:

Operating System
• 64-bit (x64) operating system
• CentOS Linux, v6.4 and 6.5 for development and testing

Software
• JDK 7 version 1.7 update 67
• Apache Spark 1.3 (prebuilt for CDH 5)
• Cloudera Distribution of Hadoop (CDH) 4.7 and 5.4

Others
• Cisco Information Server 7.0.2 or later

Ordering Information

Cisco Data Preparation is available for ordering. Table 4 lists the product identifiers required for ordering. To place an order, contact your Cisco account representative.

Table 4. Ordering Information

PID: Product Description
CDP-P-T: Data Prep per core term
CDP-P-1Y: Data Prep per core term 1 yr
CDP-P-2Y: Data Prep per core term 2 yr
CDP-P-3Y: Data Prep per core term 2 yr

For More Information

For more information about Cisco Data Preparation, contact your Cisco account representative.

Printed in USA 09/15