SnapLogic Sidekick Guide Document Release: October 2013 SnapLogic, Inc. 2 West 5th Avenue, Fourth Floor San Mateo, California 94402 U.S.A. www.snaplogic.com
Copyright Information 2011-2013 SnapLogic, Inc. All Rights Reserved. Terms and conditions, pricing, and other information subject to change without notice. SnapLogic and SnapStore are among the trademarks of SnapLogic, Inc. All other product and company names and marks mentioned are the property of their respective owners and are mentioned for identification purposes only. SnapLogic is registered in the U.S. Patent and Trademark Office.
Table of Contents

Sidekick Overview
    Supported Platforms for Sidekick
    Performance
Sidekick Registration and Communication Architecture
    Data Communication
    Configuration
    Configuring the Server
Configuring Sidekick
    Linux
    Windows
    Explicit Proxy Setting
    Non-persistent Proxy Connections
    Default Ports
Automatically Starting and Stopping Sidekick and SnapLogic Server on Linux
    Auto Startup on RedHat-like Distribution (Red Hat, Fedora Core, CentOS, Suse)
    Auto Startup on Debian-like Distribution (Debian, Ubuntu)
Sidekick Use Case
Appendix: Python Resources
    Python Components
    Python Snaps
1 Sidekick Overview

The SnapLogic Sidekick is a locally installed service that lets you access data both on site behind a firewall and in the cloud. Sidekick can be installed anywhere on your network that has access to the data that you want to use. The advantage of using Sidekick instead of an on-premises installation of the SnapLogic Server is a lighter install and footprint on the ground: all Pipelines and associated metadata are stored in the cloud instance of the SnapLogic Server.

Sidekick communicates securely with the SnapLogic Server in the cloud through the Secure Bridge, a bi-directional (two-way) secure connection. The SnapLogic Server is used to create the certificate pairs needed for Sidekick communication.

The Sidekick Proxy's key responsibilities are:

- Establishing two communication links with the SnapLogic Server in the cloud: one for operational communication, and the other for data transfers
- Self-monitoring to ensure that the Sidekick Proxy is up and running
- Rollbacks and recovery to the last stable version
- Enabling an on-premises Component Container

Sidekick is essentially a SnapLogic Java Component Container (Java CC). A SnapLogic Server running in the cloud stores the metadata and controls the Sidekick service. When a Pipeline needs to run, Sidekick executes it on command from the SnapLogic Server; the actual Java code runs within Sidekick.
Note: When Sidekick is enabled, only the Java CC is running (as Sidekick). The Python CC is inaccessible.

The Java CC behaves the same whether it runs on the ground (as Sidekick) or in the cloud. Snap logs are stored with Sidekick, but they can be shown in the Designer.

If communication between the SnapLogic Server and Sidekick is interrupted, the server can no longer talk to Sidekick, so Pipelines will not be executed. Scheduled Pipelines have an option to be executed if they miss their window, so those Pipelines can still be run later.

Once Sidekick is enabled, it becomes the Component Container used by the SnapLogic Server for Pipeline execution. Any data movement is initiated inside Sidekick. The only real-time data that flows in or out of Sidekick belongs to Pipelines that use an input/output feed.

Supported Platforms for Sidekick

Sidekick is supported on the following platforms:

- RedHat Enterprise Linux 5/CentOS 5
- Ubuntu 12.04, 64-bit
- Windows Server 2008

Minimum Requirements

- The Sidekick host should have a minimum of 1.5 GB RAM and 15 GB HDD.
  Note: How much RAM and hard drive space you need will vary depending on your Pipelines and the amount of data you are working with.
- For installation, root access is required. After that, the user snapuser can run Sidekick without needing root.
- Network connectivity is required. Make sure that the hostname configured on the Sidekick machine, as displayed by the command hostname, resolves to an IP address and that this address matches one of the network interfaces of the host. If this is not properly configured, Sidekick will not start up. If that happens, edit /etc/hosts and create an appropriate entry.
- Sidekick must have direct access to any databases or SaaS applications in your Pipelines. For anything outside the network, direct access through a corporate firewall is required.
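The hostname requirement above can be checked before installation. The following is a minimal sketch, not part of the Sidekick distribution: the function name check_hostname is hypothetical, and the "matches a network interface" test is approximated by trying to bind a socket to the resolved address, which is a heuristic (loopback addresses such as 127.0.1.1 also pass, and some kernel settings allow non-local binds).

```python
import socket


def check_hostname(hostname=None):
    """Return (ip, is_local) for a hostname (hypothetical helper).

    ip is None if the name does not resolve. is_local is True when the
    resolved address can be bound on this machine, which is used here as
    an approximation of "matches one of the network interfaces".
    """
    name = hostname or socket.gethostname()
    try:
        ip = socket.gethostbyname(name)  # IPv4 lookup; honors /etc/hosts
    except socket.gaierror:
        return None, False
    # Binding to port 0 normally succeeds only for addresses owned by this host.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((ip, 0))
        except OSError:
            return ip, False
    return ip, True
```

Calling check_hostname() with no argument uses the name reported by the hostname command. A result of (None, False) suggests that an entry in /etc/hosts is missing.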
Performance

The performance of your SnapLogic configuration can vary depending on:

- the number of instances of SnapLogic Server
- the system specifications of your host machine
- the performance of and connection to databases or other applications
- the number and type of Components in your Pipelines (the longer and more complex the Pipeline, the longer it will take to complete)

The following use cases provide some examples of performance on Pipelines with large amounts of data.

Use Case: MySQL/Oracle to Salesforce

    Inputs:         25 million rows, each row around 300 bytes
    Outputs:        Create/Update 25 million contacts in Salesforce
    Configuration:  SnapLogic Server on a 4 core, 6 GB virtual system

- SnapLogic and MySQL database in the cloud within the same virtual server averages 25,347,945 rows in 2 hours.
- SnapLogic in the cloud and Oracle database in a local data center averages 25,053,691 rows in 2 hours.

Use Case: Hadoop to MySQL

    Inputs:         Hadoop: 48 million rows of data, each row around 300 bytes; total size 13.4 GB
    Outputs:        8 million rows in MySQL
    Configuration:  SnapLogic Server on a 4 core, 6 GB virtual system

- A single instance of SnapLogic processes 48 million rows in 11.43 minutes (or 756 million rows in 3 hours).
- Two instances process 96 million rows in 14.35 minutes (or 1.2 billion rows in 3 hours).
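The 3-hour figures in the Hadoop use case are linear projections of the measured runs. The short sketch below reproduces that arithmetic. The function name projected_rows is illustrative only, and the projection assumes throughput stays constant over the longer window.

```python
def projected_rows(rows, minutes, target_minutes):
    """Scale an observed row count linearly to a longer time window."""
    return rows / minutes * target_minutes


# Measured runs from the Hadoop to MySQL use case, projected to 3 hours.
single = projected_rows(48_000_000, 11.43, 180)
double = projected_rows(96_000_000, 14.35, 180)

print(f"one instance:  {single / 1e6:.0f} million rows in 3 hours")
print(f"two instances: {double / 1e6:.0f} million rows in 3 hours")
```

The same function can be applied to your own test runs to estimate how long a larger load might take on comparable hardware.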
2 Sidekick Registration and Communication Architecture

The following diagram highlights the architecture of Sidekick communication.

Sidekick initiates all communications with the SnapLogic Server. Because Sidekick is assumed to be inaccessible from outside the firewall it sits behind, the Server never initiates a connection to Sidekick. Sidekick uses an Apache Tomcat web server for its requests and responses.

Data Communication

Step  Description

1     Session Start
      - Sidekick initiates a TCP connection to the SnapLogic Server (HTTPS: default 443*)
      * Firewall must allow outbound TCP transmission to this port

2     Sidekick Registration
      - Sidekick authenticates with the SnapLogic Server using Sidekick user credentials
      - Once authenticated, Sidekick is allowed to begin an SSL connection to the secure Cloud Proxy (SSL: default 8443*), which is termed the Secure Bridge
      * Firewall must allow outbound TCP transmission to this port

3     Sidekick Secure Bridge
      - Sidekick makes a secure SSL connection to the Cloud Proxy (SSL: default 8443*) to create a Secure Bridge, using self-signed certificates that were downloaded from the server after installation
      - All communications between Server and Sidekick use this Secure Bridge SSL connection
      * Firewall must allow outbound TCP transmission to this port

4     Data Transmission over Secure Bridge
      - Heartbeats and Pipeline control commands are sent from the SnapLogic Server to Sidekick
      - Pipeline execution status information is sent from Sidekick to Server
      - Pipeline execution and data movement is isolated to Sidekick
      - These are web service requests/responses using the REST/JSON protocol

5     Session End
      - Heartbeats and Pipeline control commands continue to be transmitted from the SnapLogic Server to Sidekick via the secure Cloud Proxy until Sidekick is unregistered or disabled

Configuration

Configuration of the SnapLogic Server is handled in the file /opt/snaplogic/config/snapserver.conf, while Sidekick configuration is handled in the file /opt/snaplogic/config/snaplogic.properties. The latter is configured automatically when the script snaplogic_sidekick_download_config.sh is run to download certificates and configuration. This call uses basic authentication, so it is recommended that you use the https URL of the SnapLogic Server. The configuration file can be edited to change the defaults or to add properties.

Registration

The property cloud_server is set in snaplogic.properties for use in registration of Sidekick with a SnapLogic Server. For example:

    cloud_server=https://server.company.com:443

Heartbeat and Data Transfer

The property cloud_proxy is set in snaplogic.properties on Sidekick for use in heartbeat and Component Container data transfers with the SnapLogic Server via the Ground and Cloud Proxy. For example:

    cloud_proxy=https://server.company.com:8443

This communication must be SSL (not http) and it must be terminated at the SnapLogic Server. Therefore, proper certificates must be installed as discussed in the next section.

The SnapLogic Server can be configured to request the heartbeat of Sidekick by updating the following section of the snapserver.conf file:

    [[sidekick]]
    # Server pings the CC every heartbeat_seconds (set to 0 to disable).
    heartbeat_seconds = 60

Certificates

SnapLogic generates self-signed certificates.

Certificates on SnapLogic Server

The certificates for SnapLogic Server are created using the scripts make_certs.sh and make_agent_certs.sh, either during installation or when they are regenerated using the script snaplogic_sidekick_generate_config.sh.
The first script generates the following files:

- host.cert - X.509 SHA1 Certificate for SnapLogic Server using Private key (RSA512)
- host.pem - SnapLogic Server Certificate and Private key
- host.jks - SnapLogic Server Certificate and Private key stored in Java keystore (jks) format
- textpasswords - Contains various passwords in cleartext, including the Sidekick password

Another script, make_agent_certs.sh, is run to generate certificates used for Sidekick SSL communication for heartbeat and data transfer (default port 8443). This is run either during installation or when the certificates are regenerated using the script snaplogic_sidekick_generate_config.sh. It generates the following files:

- sidekick.jks - host.cert certificate imported into Java keystore format
- cloud-keystore.jks - X.509 SHA1 Certificate/Private Key for Cloud Proxy
- cloud-truststore.jks - Truststore that contains Ground Proxy Certificate
- ground-keystore.jks - X.509 SHA1 Certificate/Private Key for Ground Proxy
- ground-truststore.jks - Truststore that contains Cloud Proxy Certificate

Certificates on Sidekick

Once Sidekick is installed, the script snaplogic_sidekick_download_config.sh is used to download the certificates and configuration used for this communication. The following files are downloaded from the server:

- sidekick.jks
- ground-keystore.jks
- ground-truststore.jks
- sidekick_creds - Sidekick credentials used for registration (also available on Server)

Cloud Proxy Validation of Client Certificate

By default, the Cloud Proxy does not validate the Client host certificate in sidekick.jks. To make it validate the client certificate, edit /opt/snaplogic/bin/snaplogic_proxy.sh and add the property -Dsnap.need_client_cert=1.
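The cloud_server and cloud_proxy properties described under Configuration can be sanity-checked after the download script has written snaplogic.properties. The sketch below is an illustration only: the helper names parse_properties and check_sidekick_urls are hypothetical, and the parser assumes simple key=value lines with # comments rather than the full Java properties syntax.

```python
from urllib.parse import urlparse


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # values may themselves contain '='
        props[key.strip()] = value.strip()
    return props


def check_sidekick_urls(props):
    """Return a list of problems found in cloud_server and cloud_proxy."""
    problems = []
    for key in ("cloud_server", "cloud_proxy"):
        value = props.get(key)
        if value is None:
            problems.append(f"{key} is not set")
            continue
        url = urlparse(value)
        if url.scheme != "https":
            problems.append(f"{key} should use https, found {url.scheme or 'no scheme'}")
        if not url.hostname:
            problems.append(f"{key} has no host name")
    return problems


sample = """\
cloud_server=https://server.company.com:443
cloud_proxy=https://server.company.com:8443
"""
print(check_sidekick_urls(parse_properties(sample)))
```

An empty list means both properties are present and use https. To check a real installation, read the text from /opt/snaplogic/config/snaplogic.properties instead of the sample string.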
Configuring the Server

By default, the SnapLogic Server is installed with Sidekick disabled and the data server running on port 443. To enable Sidekick, run the following command on the SnapLogic Server machine:

    /opt/snaplogic/bin/snapadmin.sh -e "sidekick enable"

This regenerates the certificates, enables Sidekick, and prompts you to restart the server. The default ports are also changed: the Data Server now runs on port 8443 and the Cloud Proxy runs on port 443. Any incoming request to the Cloud Proxy that does not come through the Secure Tunnel is automatically forwarded to the Data Server.
3 Configuring Sidekick

Linux

1. Install the appropriate Linux build of Sidekick onto the Sidekick host using dpkg -i <debfile> or rpm -ivh <rpmfile>. Once the package installs successfully, it prints a message telling you to run a download script to download certificates, as described below.

2. Navigate to the bin directory:

    cd /opt/snaplogic/bin

3. Use the su command to log in as snapuser, then run the download script:

    su snapuser
    ./snaplogic_sidekick_download_config.sh -s https://snaplogicserver.yourcompany.com:443/

4. Type the admin password for the server. The certificates are downloaded and untarred. Sidekick is then started and connects to the cloud instance given above.

Windows

From the Start menu, select SnapLogic Sidekick > Download Configuration from Server. A DOS window opens and prompts you for the server URI and the admin username and password. Once the information is supplied, the configuration is downloaded and Sidekick is restarted.

Explicit Proxy Setting

If Sidekick is running on a network with an explicit HTTP/HTTPS proxy, add the following line to the snaplogic.properties file in the /opt/snaplogic/config directory on the machine hosting Sidekick:

    java_debug=-Dhttp.proxyHost=10.200.10.100 -Dhttp.proxyPort=3128 -Dhttps.proxyHost=10.200.10.100 -Dhttps.proxyPort=3128

where:

- 10.200.10.100 is the IP address of the proxy
- 3128 is the proxy port

Substitute these values with your actual proxy IP and port.

Note: Many Snaps are not designed to connect through a proxy. Keep this in mind if Sidekick is used to connect to cloud applications through an explicit proxy.
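The java_debug line for an explicit proxy is easy to mistype, because the same host and port appear twice and the standard Java system property names (http.proxyHost, http.proxyPort, https.proxyHost, https.proxyPort) are case-sensitive. As a small sketch, the hypothetical helper below builds the line from a host and port. It is not a SnapLogic tool, and it assumes the same proxy serves both HTTP and HTTPS, as in the example above.

```python
def java_debug_proxy_line(proxy_host, proxy_port):
    """Build the java_debug property line for an explicit HTTP/HTTPS proxy."""
    port = int(proxy_port)
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid proxy port: {proxy_port}")
    flags = [
        f"-Dhttp.proxyHost={proxy_host}",
        f"-Dhttp.proxyPort={port}",
        f"-Dhttps.proxyHost={proxy_host}",
        f"-Dhttps.proxyPort={port}",
    ]
    return "java_debug=" + " ".join(flags)


print(java_debug_proxy_line("10.200.10.100", 3128))
```

The printed line can be pasted into snaplogic.properties. Restarting Sidekick afterwards is likely needed for the setting to take effect, although this guide does not say so explicitly.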
Non-persistent Proxy Connections

As of 3.6.1, the Sidekick cloud proxy connection can be made non-persistent. Two properties in /opt/snaplogic/config/snaplogic.properties control this behavior. Change them and restart Sidekick to enable non-persistent connections.

- connection_timeout_interval: Timeout, in minutes, after which a connection from Sidekick to the cloud is assumed to be inactive. Set to zero to disable entering the inactive state.
- connection_retry_interval: The interval, in seconds, at which Sidekick polls for incoming requests after a connection becomes inactive. This must be a value between 1 and 60 seconds.

Note: A default installation should show no change in behavior from prior releases, because connection_timeout_interval is set to 0 in snaplogic.properties, which means the connection does not time out.

If connection_timeout_interval is set to a positive value, the connection from Sidekick to the server becomes inactive when there is no activity for that number of minutes. Sidekick then closes its outbound connection to the cloud proxy and polls every connection_retry_interval seconds to check whether any incoming requests have to be processed.

The first request sent while Sidekick is inactive can be delayed by up to 60 seconds before it is processed. After it is processed, subsequent requests should not be delayed.

If the server sends a meta information request, it is processed but Sidekick remains in the inactive state. When a non-meta information request is seen, Sidekick returns to the active state and again maintains a persistent connection.
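The active and inactive states described above can be pictured as a small state machine. The following is a toy model for illustration only, not Sidekick's implementation. The class and method names are hypothetical, the default of 60 for the retry interval is an assumption, and treating any request received in the active state as activity that resets the idle timer is also an assumption.

```python
class SidekickConnection:
    """Toy model of the active/inactive connection states (hypothetical)."""

    def __init__(self, timeout_minutes=0, retry_seconds=60):
        if timeout_minutes < 0:
            raise ValueError("connection_timeout_interval must be >= 0")
        if not 1 <= retry_seconds <= 60:
            raise ValueError("connection_retry_interval must be between 1 and 60")
        self.timeout_minutes = timeout_minutes
        self.retry_seconds = retry_seconds
        self.active = True
        self.idle_minutes = 0.0

    def tick(self, minutes):
        """Advance idle time; go inactive once the timeout is reached."""
        self.idle_minutes += minutes
        # A timeout of 0 disables the inactive state entirely.
        if self.timeout_minutes > 0 and self.idle_minutes >= self.timeout_minutes:
            self.active = False

    def handle_request(self, is_meta):
        """Process a request; return the worst-case pickup delay in seconds."""
        delay = 0 if self.active else self.retry_seconds
        if self.active or not is_meta:
            # Non-meta requests restore the persistent connection;
            # meta requests leave an inactive connection inactive.
            self.active = True
            self.idle_minutes = 0.0
        return delay
```

For example, with timeout_minutes=5 and retry_seconds=30, a connection that has been idle for five minutes picks up the next request after at most 30 seconds, and only a non-meta request brings it back to the active state.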
Default Ports

The following ports are the defaults defined for use with the SnapLogic Server and Sidekick.

SnapLogic Server

- 80 - server http (disabled by default)
- 8443 - server https
- 8089 - cc1 http
- 8092 - cc1 https

When Sidekick is disabled:

- 8090 - cc2 http
- 8093 - cc2 https

When Sidekick is enabled:

- 8094 - Internal port used by cloud_proxy for server to cloud-proxy communications, for forwarding to Sidekick if it is connected
- 443 - External port used by cloud_proxy to listen for connections from Sidekick. If the incoming request is destined for the data server, the cloud proxy automatically forwards the request to the data server.

Sidekick

- 8095 - cc2 on Sidekick binds to this port

Note: Sidekick needs to be able to connect to server_port (usually https port = 443) and the cloud_proxy SSL port (8443).
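Because Sidekick only makes outbound connections, a quick way to confirm that the firewall allows them is to attempt a TCP connection from the Sidekick host to each required port. The sketch below uses only the Python standard library. The function names are hypothetical, and a successful TCP connection shows only that the port is reachable, not that registration or the SSL handshake will succeed.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False


def check_sidekick_ports(host, ports=(443, 8443)):
    """Map each port to whether it is reachable from this machine."""
    return {port: can_connect(host, port) for port in ports}
```

For example, check_sidekick_ports("server.company.com") returns a dictionary keyed by port. If your installation uses ports other than the defaults, pass them in the ports argument.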
4 Automatically Starting and Stopping Sidekick and SnapLogic Server on Linux

This chapter explains how to set up the SnapLogic Sidekick and SnapLogic Server applications to run automatically on machine startup or reboot, as long as the Linux machine is in a multiuser runlevel (2, 3, 4 or 5).

Auto Startup on RedHat-like Distribution (Red Hat, Fedora Core, CentOS, Suse)

Automatic Sidekick Startup

To add SnapLogic Sidekick as a Service:

1. Log in to the SnapLogic Sidekick Linux machine as root.
2. Change directories as follows:
    cd /etc/init.d/
3. Create a softlink to the snaplogic.rc file as follows:
    ln -s /opt/snaplogic/bin/snaplogic.rc snapsidekick
4. Add the softlink to chkconfig management as follows:
    chkconfig --add snapsidekick

To delete SnapLogic Sidekick as a Service:

1. Remove the service from chkconfig management as follows:
    chkconfig --del snapsidekick

Automatic Server Startup

To add SnapLogic Server as a Service:

1. Log in to the SnapLogic Server Linux machine as root.
2. Change directories as follows:
    cd /etc/init.d/
3. Create a softlink to the snapctl.sh file as follows:
    ln -s /opt/snaplogic/bin/snapctl.sh snaplogic
4. Add the softlink to chkconfig management as follows:
    chkconfig --add snaplogic

To delete SnapLogic Server as a Service:

1. Remove the service from chkconfig management as follows:
    chkconfig --del snaplogic
Auto Startup on Debian-like Distribution (Debian, Ubuntu)

Automatic Sidekick Startup

To add SnapLogic Sidekick as a Service:

1. Log in to the SnapLogic Sidekick Linux machine as root.
2. Change directories as follows:
    cd /etc/init.d/
3. Create a softlink to the snaplogic.rc file as follows:
    ln -s /opt/snaplogic/bin/snaplogic.rc snapsidekick
4. Add the softlink to update-rc.d with priority 60 on runlevels 2, 3, 4 and 5 for Start and priority 99 on the same runlevels for Kill as follows:
    update-rc.d -f snapsidekick start 60 2 3 4 5 . stop 99 2 3 4 5 .

To delete SnapLogic Sidekick as a Service:

1. Remove the service from update-rc.d management as follows:
    update-rc.d -f snapsidekick remove

Automatic Server Startup

To add SnapLogic Server as a Service:

1. Log in to the SnapLogic Server Linux machine as root.
2. Change directories as follows:
    cd /etc/init.d/
3. Create a softlink to the snapctl.sh file as follows:
    ln -s /opt/snaplogic/bin/snapctl.sh snaplogic
4. Add the softlink to update-rc.d with priority 60 on runlevels 2, 3, 4 and 5 for Start and priority 99 on the same runlevels for Kill as follows:
    update-rc.d -f snaplogic start 60 2 3 4 5 . stop 99 2 3 4 5 .

To delete SnapLogic Server as a Service:

1. Remove the service from update-rc.d management as follows:
    update-rc.d -f snaplogic remove
5 Sidekick Use Case

This chapter describes a complex use case that uses Sidekick across disconnected networks.

Scenario: A company has two separate divisions, each on its own network. The company wants to use SnapLogic to gather data from databases on the local networks.

The following diagram maps out what this scenario might look like.

The architecture would consist of the following elements:

- Each division has its own instance of SnapLogic with a Sidekick installed locally to get and update the data on the local server.
- A third SnapLogic instance is used as a primary server to communicate with the division instances.
- An FTP server is added to the same server as the central SnapLogic instance, and the FTP root directory for the SnapLogic user is the same as the [SNAP_HOME]/data directory of the third SnapLogic Server.
- The same credentials must be used across all SnapLogic Servers so that Pipelines can access data on other servers.

If you want to copy the data from the database in Division 1 to the database in Division 2:

- Division 1: A read-transfer Pipeline is defined on the Division 1 instance to collect the data from the division's database, convert it into a flat file (CSV), and transfer it to the FTP server.
- Division 2: A transfer-write Pipeline is defined to fetch the data from the FTP server and write it to the database.
- Primary SnapLogic Server: Pipelines are defined to run the Pipelines on Division 1, then run the Pipelines on Division 2 once the first ones are complete.

If you want data from both databases to be merged together and copied back to both databases, define the read-transfer and transfer-write Pipelines on both Division servers and add merge and filter functionality to the primary server's Pipelines.
A Appendix: Python Resources

When Sidekick is in use, Python-based Components and Snaps will not work. The following is a list of Python-based resources that you will not be able to use with Sidekick.

Note: These lists are based solely on whether the resource was developed in Python or Java. They do not take into account any other compatibility concerns or Snap availability.

Python Components

Compute Date Dimension HTMLFormatter RSS Reader RSS Writer

Note: Date Operations has both a Java and a Python version of the Component.

Python Snaps

Apache Stats Box.net Google Analytics Magento NetSuite OpenAir SugarCRM YouTube