Actian Vortex Express 3.0




Actian Vortex Express 3.0 Quick Start Guide AH-3-QS-09

This Documentation is for the end user's informational purposes only and may be subject to change or withdrawal by Actian Corporation ("Actian") at any time. This Documentation is the proprietary information of Actian and is protected by the copyright laws of the United States and international treaties. It is not distributed under a GPL license. You may make printed or electronic copies of this Documentation provided that such copies are for your own internal use and all Actian copyright notices and legends are affixed to each reproduced copy. You may publish or distribute this document, in whole or in part, so long as the document remains unchanged and is disseminated with the applicable Actian software. Any such publication or distribution must be in the same manner and medium as that used by Actian, e.g., electronic download via website with the software or on a CD-ROM. Any other use, such as any dissemination of printed copies or use of this documentation, in whole or in part, in another publication, requires the prior written consent from an authorized representative of Actian.

To the extent permitted by applicable law, ACTIAN PROVIDES THIS DOCUMENTATION "AS IS" WITHOUT WARRANTY OF ANY KIND, INCLUDING WITHOUT LIMITATION, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NONINFRINGEMENT. IN NO EVENT WILL ACTIAN BE LIABLE TO THE END USER OR ANY THIRD PARTY FOR ANY LOSS OR DAMAGE, DIRECT OR INDIRECT, FROM THE USE OF THIS DOCUMENTATION, INCLUDING WITHOUT LIMITATION, LOST PROFITS, BUSINESS INTERRUPTION, GOODWILL, OR LOST DATA, EVEN IF ACTIAN IS EXPRESSLY ADVISED OF SUCH LOSS OR DAMAGE.

The manufacturer of this Documentation is Actian Corporation. For government users, the Documentation is delivered with "Restricted Rights" as set forth in 48 C.F.R. Section 12.212, 48 C.F.R. Sections 52.227-19(c)(1) and (2) or DFARS Section 252.227-7013 or applicable successor provisions.

Copyright 2015 Actian Corporation. All Rights Reserved.
Actian, Actian DataFlow, Actian Director, Actian Vector, Actian Vector Express, Actian Vector ExpressPlus, Actian Vector in Hadoop, Actian Vortex Express, Actian Vortex ExpressPlus, Action Server, Cloud Action Platform, Cloud Action Server, EDBC, Enterprise Access, Ingres, OpenROAD, and Vectorwise are trademarks or registered trademarks of Actian Corporation. All other trademarks, trade names, service marks, and logos referenced herein belong to their respective companies.

Contents

Quick Start Demo
    Introduction
    Use Case
    Prerequisites
    Remove KNIME Public Server (MapR Installations Only)
    Run the Demo
    Run the Workflow on a Cluster as a YARN Job
    Summary
    Next Steps

Quick Start Demo

Introduction

This Quick Start Guide shows you how to run a simple, pre-built, end-to-end workflow that demonstrates the basic principles, components, and power of the Actian Analytics Platform. The demonstration workflow has been pre-configured to run immediately after installation with minimal additional configuration.

Use Case

This simple demonstration features the ability of the Actian Analytics Platform to connect to more than one data source, join the sources together, create a new field using the expression builder, persist the results into Actian Vector, and interact with those results through Actian Director.

In this case, we are exploring customer churn across two telecommunications data sets: one shows customer demographics and call log history, and the other contains geospatial (area code) information. The data allows the telecommunications provider to identify which customers have churned (changed carriers) and explore the characteristics and geographic regions where churn was the highest.

Prerequisites

Actian Vortex Express 3.0 must be installed to run the demo.

Remove KNIME Public Server (MapR Installations Only)

To use DataFlow extensions to KNIME with MapR, you must remove the KNIME Public Server Access software. For instructions, see Integrating DataFlow with MapR (http://help.pervasive.com/display/df651/integrating+dataflow+with+mapr).

Run the Demo

To run the demo, you must connect as user actian to the Linux master node on which you installed Vortex Express.

Start DataFlow (KNIME)

1. As user "actian", start Actian DataFlow (KNIME) on the Linux master node on which you installed Vortex Express. The password for user actian is the one you chose during installation. Start DataFlow by running the following command:

       knime

2. If prompted for a workspace location, accept the default option. The Welcome to KNIME dialog is displayed.

3. Select Open KNIME workbench (if this is the first time you have started KNIME). The workbench is loaded.

Run the workflow

1. Open the pre-built Churn_Quick_Start workflow: Expand the LOCAL workspace located in the KNIME Explorer and double-click the Churn_Quick_Start workflow. KNIME loads the workflow.

2. Click the Execute All button on the toolbar. The Actian DataFlow workflow executes the following steps:

   a. In parallel, reads 10,000 rows of customer relationship management (CRM) data from a CSV file and combines them with 1,000,000 rows of Customer Geospatial data contained in a separate CSV file.
   b. Joins the two sources of information based on the common customer ID field.
   c. Derives a new field called Total Call Minutes based on the existing source fields, "calls" and "mins".
   d. Stores the resulting information in an Actian Vector in Hadoop table called tbl_churn_quick_start.

   Note: To rerun the workflow, you need to reset the nodes in the workflow.
   To reset a node: Right-click the node and select Reset.
   To reset all the nodes: Right-click the last node in the workflow (Load Actian Vector On Hadoop) and select Reset. The node status indicator turns from green to amber to indicate that it has been reset.
   To run the workflow again: Click Execute All.

3. Use Actian Director to view the resulting data stored in the Actian Vector table tbl_churn_quick_start. Actian Director lets you do a variety of tasks, such as creating and administering databases, running queries, and configuring security.

   a. Start Actian Director. You should be connected as user actian. On the command line, enter director. Director is started. The Actian Vector in Hadoop AH instance is displayed in the Instance Explorer.
   b. Connect to the Actian Vector in Hadoop AH instance by right-clicking the instance and selecting Connect. The Connect to Instance dialog is displayed.
   c. Enter the following credentials, and then click Connect:
      Authentication: Authenticated User
      Login: demo
      Password: hsedemo
   d. In the Instance Explorer, expand the Actian Vector in Hadoop AH instance and select Databases, sample, Tables, demo.tbl_churn_quick_start.

   e. Right-click the demo.tbl_churn_quick_start table and select Select First 1000 Rows. The query and its results are displayed. Note the last three columns:
      churn: Identifies whether this customer has churned (changed carriers).
      areacode: This field originated from the geospatial data we connected to at the beginning of the workflow.
      total_mins: This is the new field derived automatically in the workflow.

Run the Workflow on a Cluster as a YARN Job

Adapting your workflow to your business needs can make the workflow more complicated and use larger amounts of data. DataFlow lets you execute your workflow as a distributed YARN job over your HDFS cluster so that it can scale out to use resources across the cluster. This is achieved through the DataFlow Cluster Manager, which is installed and set up as part of the Vortex Express installation.

The process for enabling the workflow to execute over a cluster is as follows:

1. Create an execution profile.
2. Configure the workflow to use the profile.
3. Run the workflow.

To create an execution profile

1. In KNIME, click File, Preferences, Actian. Existing profiles are shown in the Profiles section. You will create a cluster profile.

2. Click Add and enter "cluster" as the name of the new profile. A new profile called "cluster" is created and is automatically selected.

3. Click the value for Execute in cluster and change it from false to true.

4. Click the Cluster URL value and change it to yarn://<host-name-of-the-clustermanager>:47000. The Cluster Manager runs on the same node as the HDFS NameNode. For MapR, the Cluster Manager runs on the Vector in Hadoop master node. To get the name of the NameNode for non-MapR distributions, run the command hdfs getconf -confKey fs.defaultFS, which returns a string in the format hdfs://<hdfs-namenode>:8020.

5. Click OK to accept the changes. The profile is configured to point to the DataFlow Cluster Manager.

To configure the workflow to use the cluster execution profile

1. Right-click the Churn_Quick_Start_Vortex workflow in KNIME Explorer and select Configure. The Configure dialog is displayed.

2. Select the Job Manager Selection tab. In the Profile drop-down, choose the cluster profile that you have just created. The workflow is now configured to execute in a cluster.

To run the workflow

1. Click Execute All on the toolbar (or select Node, Execute All). The workflow is executed across the cluster.

   Note: Executing the workflow over YARN requires YARN containers to be created. Creating these containers takes time, so the workflow may appear to run more slowly than it did earlier. This startup time is usually a fixed duration and is not tied to the size of the data processed in the workflow.

The DataFlow Cluster Manager also offers a Web UI console where you can log in and view the YARN job execution details.

To look at YARN job execution details

1. Point your browser to the URL http://<hdfs-namenode>:47100 and log in as user "root" with password "changeit".

   Note: On MapR, the URL is http://<vector-in-hadoop-master-node>:47100. The Vector in Hadoop master node is the node where you ran the Vortex Express installer.

2. Click Recent Jobs under the cluster monitoring section. The Churn_Quick_Start_Vortex application that you just executed is listed.

3. Click Churn_Quick_Start_Vortex. Details are displayed, such as how many YARN containers were used to run the job and which hosts the containers ran on.
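The Cluster URL and the Web UI URL used in this section both come from the same host name, so on non-MapR distributions they can be derived from the string that hdfs getconf returns. The following is a minimal sketch in Python, not part of the product: the function name cluster_manager_urls and the example host namenode.example.com are hypothetical, and only the ports 47000 and 47100 and the hdfs://<hdfs-namenode>:8020 format come from this guide. It assumes the returned value contains a real host name. If HDFS is configured with a logical nameservice, the value is not a host name and this approach does not apply.

```python
from urllib.parse import urlparse

# Ports as stated in this guide: 47000 for the DataFlow Cluster Manager,
# 47100 for its Web UI console.
CLUSTER_MANAGER_PORT = 47000
WEB_UI_PORT = 47100


def cluster_manager_urls(default_fs):
    """Return (cluster_url, web_ui_url) for a value such as hdfs://<host>:8020.

    Hypothetical helper for non-MapR distributions, where the Cluster Manager
    runs on the same node as the HDFS NameNode.
    """
    host = urlparse(default_fs.strip()).hostname
    if not host:
        raise ValueError(f"no host name found in {default_fs!r}")
    cluster_url = f"yarn://{host}:{CLUSTER_MANAGER_PORT}"
    web_ui_url = f"http://{host}:{WEB_UI_PORT}"
    return cluster_url, web_ui_url


if __name__ == "__main__":
    # Example input only; paste the real output of the hdfs getconf command here.
    cluster_url, web_ui_url = cluster_manager_urls("hdfs://namenode.example.com:8020")
    print("Cluster URL for the execution profile:", cluster_url)
    print("Web UI console:", web_ui_url)
```

On MapR, skip the derivation and use the host name of the Vector in Hadoop master node directly.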

Summary

In this Quick Start Guide, you were introduced to Actian's KNIME integration, which featured a pre-built workflow that let you explore a simple telecommunications churn dataset. The executed workflow ran using Actian's embedded DataFlow engine, which transparently distributed the data and processing across multiple cores on the nodes of your Hadoop cluster and loaded the transformed data in parallel into the Vector in Hadoop database.

Actian Vector in Hadoop is a distributed columnar database that leverages the Hadoop Distributed File System (HDFS) and MapR file system (MapR-FS) for storage. It uses a proven, ANSI-standard-compliant SQL engine that performs native SQL processing of data in the distributed file system and can be used for efficient large-scale data warehousing, data mining, and reporting. It has rich SQL language support, an advanced query optimizer, and support for trickle updates, and it has been certified for use with the most popular BI tools. Vector guides can be accessed at esd.actian.com (http://esd.actian.com) or docs.actian.com (http://docs.actian.com/).

Using KNIME is just one of the ways in which you can leverage DataFlow technology. DataFlow comes with rich Java and RushScript (based on JavaScript) APIs that offer you more programmatic control over your workflows. If you are interested in the DataFlow API, see DataFlow API Usage (http://help.pervasive.com/display/df651/dataflow+api+usage) for Java and Using RushScript (http://help.pervasive.com/display/df651/using+rushscript) for RushScript. General help and an overview of DataFlow can be found at Actian DataFlow 6.5.1 Help (http://help.pervasive.com/display/df651/actian+dataflow+6.5.1+help).
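For readers who want to see the logic of the demo workflow in plain code, the following is a minimal sketch in standard-library Python. It is not the DataFlow Java or RushScript API, and it does not run in parallel or load into Vector. The sample rows, the column name customer_id, and the function names are hypothetical. The formula for total_mins (calls multiplied by mins) is also an assumption, since this guide says only that the field is derived from "calls" and "mins". Dropping rows without a match (an inner join) is likewise assumed.

```python
import csv
import io

# Tiny stand-ins for the two CSV sources. All column names and values here
# are invented for illustration; the real demo files are much larger.
CRM_CSV = """customer_id,calls,mins,churn
1,10,3.5,0
2,4,2.0,1
3,7,1.5,0
"""

GEO_CSV = """customer_id,areacode
1,512
2,415
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_and_derive(crm_rows, geo_rows):
    """Join the two sources on customer_id and add a derived total_mins field."""
    geo_by_id = {row["customer_id"]: row for row in geo_rows}
    result = []
    for crm in crm_rows:
        geo = geo_by_id.get(crm["customer_id"])
        if geo is None:
            continue  # ASSUMPTION: inner join, unmatched customers are dropped
        merged = dict(crm, areacode=geo["areacode"])
        # ASSUMPTION: total minutes = number of calls * minutes per call.
        merged["total_mins"] = int(crm["calls"]) * float(crm["mins"])
        result.append(merged)
    return result


if __name__ == "__main__":
    for row in join_and_derive(read_rows(CRM_CSV), read_rows(GEO_CSV)):
        print(row["customer_id"], row["churn"], row["areacode"], row["total_mins"])
```

The sketch mirrors the shape of the workflow (two reads, one join, one derived field). The scaling and the parallel load into Vector in Hadoop are what DataFlow adds on top.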
Using DataFlow (through KNIME, Java, or RushScript), you can pull in data from a variety of sources (flat files, various database systems, S3, HDFS, and more), process it according to your requirements, and optionally store the results in Vector in Hadoop for fast analytical reporting. DataFlow will transparently scale up and scale out (when using a cluster), which lets you stay focused on tweaking your analytical workflows and algorithms.

Although this is a simple example with limited data and a simplified workflow for demonstration purposes, it represents a model for how customers are using the Actian Analytics Platform to solve real-world big data business problems. For more comprehensive workflow examples and real-world solution blueprints, visit the Actian Clear Path Program (http://www.actian.com/solutions/customer-analytics/).

Next Steps

As a next step, we recommend reading the Tutorial guide, which provides lessons on how to create a workflow from scratch using the drag-and-drop interface and deploy it. It also includes lessons on how to connect to Actian Vector using JDBC. To proceed, access the Tutorial at http://esd.actian.com/express/vortex/3.0/tutorial_vortex.pdf.