Lesson 5 Build Transformations

Similar documents
Lesson 7 Pentaho MapReduce

Getting Started with Pentaho Data Integration

owncloud Configuration and Usage Guide

Working with SQL Server Integration Services

How to Use JCWHosting Reseller Cloud Storage Solution

Configuration Guide. Remote Backups How-To Guide. Overview

Adobe Summit 2015 Lab 718: Managing Mobile Apps: A PhoneGap Enterprise Introduction for Marketers

Results CRM 2012 User Manual

Learn About Analysis, Interactive Reports, and Dashboards

IIS, FTP Server and Windows

Affiliation Security

Oracle Data Integrator for Big Data. Alex Kotopoulis Senior Principal Product Manager

Installation Guidelines (MySQL database & Archivists Toolkit client)

E-Map Application CHAPTER. The E-Map Editor

Configuring the SST DeviceNet OPC Server

Visual Studio.NET Database Projects

Instructions for Configuring a SAS Metadata Server for Use with JMP Clinical

Crystal Reports Installation Guide

Install FileZilla Client. Connecting to an FTP server

CloudBerry Dedup Server

Outlook Profile Setup Guide Exchange 2010 Quick Start and Detailed Instructions

eadvantage Certificate Enrollment Procedures

USING STUFFIT DELUXE THE STUFFIT START PAGE CREATING ARCHIVES (COMPRESSED FILES)

DevInfo 7 Web Installer Guide

Video Administration Backup and Restore Procedures

BID2WIN Workshop. Advanced Report Writing

The basic steps involved in installing FLEETMATE Enterprise Edition and preparing it for initial use are as follows:

UF Health SharePoint 2010 Document Libraries

Importing Contacts to Outlook

Utilities ComCash

MicroStrategy Desktop

Print Audit 6 - SQL Server 2005 Express Edition

User Guide. Version 3.2. Copyright Snow Software AB. All rights reserved.

USING MS OUTLOOK. Microsoft Outlook

Crystal Reports Payroll Exercise

Install and Configure Oracle Outlook Connector

USING MS OUTLOOK WITH FUS

Troubleshooting Guide. 2.2 Click the Tools menu on Windows Explorer 2.3 Click Folder Options. This will open a dialog box:

Tool Tip. SyAM Management Utilities and Non-Admin Domain Users

Decision Support AITS University Administration. Web Intelligence Rich Client 4.1 User Guide

How to Mail Merge PDF Documents

Creating a New Digital ID or Signature for Adobe Acrobat

Oracle Business Intelligence Publisher: Create Reports and Data Models. Part 1 - Layout Editor

UF Health SharePoint 2010 Introduction to Content Administration

NDA ISSUE 1 STOCK # CallCenterWorX-Enterprise IMX MAT Quick Reference Guide MAY, NEC America, Inc.

Mitigation Planning Portal MPP Reporting System

Bare Bones Guide to Using Outlook 2010 for

WebSphere Business Monitor V6.2 Business space dashboards

To configure Outlook Express for your InfoMetrics address:

HRS 750: UDW+ Ad Hoc Reports Training 2015 Version 1.1

Migrating MSDE to Microsoft SQL 2008 R2 Express

Interact for Microsoft Office

Adobe Dreamweaver CC 14 Tutorial

How to Configure Windows 8.1 to run ereports on IE11

Tech Tips Helpful Tips for Pelco Products

Identity Finder Setup

Creating IBM Cognos Controller Databases using Microsoft SQL Server

Create!form Folder Monitor. Technical Note April 1, 2008

TAMUS Terminal Server Setup BPP SQL/Alva

How to connect to VUWiFi

Voyager Reporting System (VRS) Installation Guide. Revised 5/09/06

edgebooks Quick Start Guide 4

Legal Information Trademarks Licensing Disclaimer

Web Intelligence User Guide

TM Online Storage: StorageSync

Deploying Physical Solutions to InfoSphere Master Data Management Server Advanced Edition v11

Getting started with 2c8 plugin for Microsoft Sharepoint Server 2010

1. CONFIGURING REMOTE ACCESS TO SQL SERVER EXPRESS

AppLoader 7.7. Load Testing On Windows Azure

Appendix A How to create a data-sharing lab

WebSphere Business Monitor V7.0 Business space dashboards

USING OUTLOOK WITH ENTERGROUP. Microsoft Outlook

Sage Intelligence Financial Reporting for Sage ERP X3 Version 6.5 Installation Guide

2. Unzip the file using a program that supports long filenames, such as WinZip. Do not use DOS.

MICROSTRATEGY 9.3 Supplement Files Setup Transaction Services for Dashboard and App Developers

Create an Excel BI report and share on SharePoint 2013

Informatica PowerCenter Express (Version 9.5.1) Getting Started Guide

University of Rochester

Baylor Secure Messaging. For Non-Baylor Users

Vizit 4.1 Installation Guide

Microsoft FrontPage 2003

Neoteris IVE Integration Guide

How to use FTP Commander

Inventory Manager. Getting started Usage and general How-To

Office of History. Using Code ZH Document Management System

Dashboard Designer. Introduction Guide. Basic step by step guide to creating a Dashboard. June 2012 V1.2

Installing Microsoft Outlook on a Macintosh. This document explains how to download, install and configure Microsoft Outlook on a Macintosh.

Virtual Office Remote Installation Guide

Remote Storage Area (RSA) Basics

IceWarp Notifier User Guide

E-Notebook SQL 12.0 Desktop Database Migration and Upgrade Guide. E-Notebook SQL 12.0 Desktop Database Migration and Upgrade Guide

etoken Enterprise For: SSL SSL with etoken

How to test and debug an ASP.NET application

Using the New InfoAssist Tool for Ad Hoc Query and Reporting. John Osborn Information Builders

TSM for Windows Installation Instructions: Download the latest TSM Client Using the following link:

Constant Contact User Manual

How To Sync Between Quickbooks And Act

CONTRACT MANAGEMENT SYSTEM USER S GUIDE VERSION 2.7 (REVISED JULY 2012)

Using FileMaker Pro with Microsoft Office

Transcription:

Lesson 5 Build Transformations Pentaho Data Integration, or PDI, is a comprehensive ETL platform allowing you to access, prepare, analyze and immediately derive value from both traditional and big data sources. During this lesson, you will be introduced to PDI s graphical design environment, Spoon. You will learn how to create your first transformation to load sales transaction data from a CSV file into an H2 database. Upon launching you are prompted to login to the Data Integration repository. The repository allows you to store and share ETL jobs and transformations and provides access to Enterprise Edition features that will not be covered in this video such as document revision history and scheduling. You can click Cancel to continue on to the design environment. 1. On the Repository Login dialog, click Cancel. Begin creating your first ETL transformation by selecting File New Transformation from the menubar. A new tab titled Transformation 1 appears for you to begin designing your data flow using the design palate on the left side of the screen.

Pentaho Data Integration provides over 200 pre-built steps including input and output steps for reading and writing data, transformation steps for manipulating data, lookup steps for enriching data, and a number of purpose built steps for working with Big Data platforms such as Hadoop and NoSQL databases. Step by step with Pentaho Corporation: 1. Click to expand the Input folder and then click again to collapse. 2. Click to expand the Transform folder and then click again to collapse. 3. Click to expand the Big Data folderand then click again to collapse.

4. Click on the Input folder to open it and leave open for the next section of the demonstration We ll add our first step in the ETL flow by dragging the CSV file input step from the Design Palate onto the canvas to the right. Double-click to on the step to begin editing the configuration. We ll provide a friendly name, and then click browse to select the CSV file containing the sales transactions we want to read. 1. Drag the CSV file input step onto the canvas 2. Double-click on the step to open the edit dialog 3. In the Step name field, enter Read Sales Data 4. Click on Browse, select the sales_data.csv file (C:\Program Files\pentaho\design-tools\data-integration\samples\transformations\files) and click Open 5. Uncheck the Lazy conversion option Next we ll use the Get Fields button to bring back a list of fields to be read from the CSV. PDI will analyze a sample of the data to suggest metadata about the fields including field names from the header row if present and the data type and when finished, you are presented with a summary of the results of the scan. Upon closing the scan results, you can see that each of the fields to be read is listed in the Field list. You can now preview the data to ensure our step is properly configured. Everything looks correct, so we ll click OK to continue building our transformation.

1. Click Get Fields 2. In the Sample Size dialog, enter 0 to sample all fields, then click OK 3. In the Scan results dialog, scroll slowly to display some of the results (maybe down to Field nr 5., and then click Close. 4. Click the Preview button, then click OK on the Preview size dialog. Slowly scroll through some of the preview data before clicking the Close button. 5. Click OK to exit the step configuration. Now we have our CSV input step for reading the sales data. Let s add a Table output step to write that data into a relational database. You describe the flow of data in your transformations by adding hops between steps on the canvas. Using the hover menu, we ll draw a hop from our Read Sales Data step to the newly added Table output step. 1. Close the Input folder in the Design palate, and expand the Output folder. 2. Drag a Table output step onto the canvas, allow the hover tip on How to create a new hop remain on screen for several seconds. 3. Click to select the Read Sales Data step, and then use the hover menu to draw a hop from Read Sales Data step to the Table output step.

4. When prompted, select this as the Main output of the step. We ll configure the Table output step.. After providing a connection name and entering the connection details, we can use the Test button to ensure our connection is properly configured. We want to write our data to a table named Sales, so we ll enter Sales in the Target table field. By clicking on the SQL button, PDI will suggest any SQL necessary to ensure the step works correctly. Since the target table does not exist, you will see a CREATE TABLE DDL statement to execute to prepare for executing our transformation. After executing the DDL, we re now ready to save and run the transformation. 1. Double-click on the Table output step to open the edit dialog. 2. Enter a Step name of Write to Database 3. Click the New button to begin creating a connection 4. Enter Sales H2 as the connection name 5. Select H2 under Connection Type 6. Enter localhost as the Host Name 7. Enter Sales as the Database Name 8. Enter 9092 as the Port Number 9. Enter sa as the User Name (leave the Password blank) 10. Click Test

11. Pause 2 seconds on the Connection Test dialog, then click OK to close, then OK again to close to exit the Database Connection dialog and complete the connection creation. 12. Enter Sales in the Target table field 13. Check the Truncate table option (so that multiple runs of the transformation won t keep adding duplicate rows) 14. Click the SQL button, pause/scroll slowly to show the generated DDL 15. Click Execute, pause for 2 seconds to show the results dialog, then click OK to close it, then click close on the Simple SQL editor dialog. 16. Click OK on the Table output edit dialog (returning to the canvas) We ll save the transformation as Load Sales Data. We are ready to run the transformation. As you can see, Pentaho Data Integration provides options for running transformations locally, remotely on a dedicated PDI server, or against a cluster of PDI servers for processing large volumes of data or reducing execution times. For this example, we ll simply run the transformation locally. 1. Click the Save button on the toolbar 2. Enter the name Load Sales Data and click the Save button 3. Click the Play button on the sub-toolbar to run the transformation 4. As you describe the run options, hover the mouse over the three options 5. Click Launch to run the transformation

Congratulations on building and running your first PDI transformation! You can see in the Step Metrics tab that we have successfully loaded 2823 records into our target database. To find out more information about the powerful Pentaho platform try another lesson or contact Pentaho and start your free proof of concept with the expertise of a Pentaho sales engineer. Contact Pentaho at http://www.pentaho.com/contact/