Utilizing SFTP within SSIS

By Chris Ware, Principal Consultant, iolap, Inc.

The Problem

SSIS includes an FTP task that lets you access an FTP server. However, it does not support Secure FTP (SFTP). There are many Secure FTP tools to choose from, and each requires its own set of commands and its own design to be used within SSIS. For this article, I am using PuTTY's file transfer tool, commonly known as PSFTP (a free, open source tool from www.putty.org). This tool performed as needed, and we were able to build additional processes around it to enhance its use and satisfy our customers' requirements.

The Task

We needed to create a process in which the SSIS package looks at the XML files already recorded in the data warehouse and determines which XML files need to be pulled from the external FTP location. After these files are imported, a series of checks and balances takes place, ending with loading the file(s) into a SQL Server database.

Please note:
1. The FTP location does not allow the removal of the XML files from the directory.
2. The FTP location contains up to 6 months of XML files.
3. The FTP location only supports Secure FTP.
4. A single file or multiple files may need to be pulled at any given time.
The Process

Create Variables

MaxXMLDateString (create as an "Object" data type)
SQL_Query (String data type, created to store the SQL statement)
XML_Datekey (String data type, created to store the date pulled from the FTP location)

Create SQL Table

If you have an existing table for your process that stores the file names, then this table can be used to look for the appropriate file. If you do not have an existing SQL table for this purpose, you will need to create a simple table that holds the Filename and any other pertinent fields related to this execution process.

Step 1

Build a list of the files that you need to pull. This is accomplished by creating a "Script Task." The purpose of this task is to query the database in which the processed XML file names are stored and extract the latest date part from the filename, so that it can be used to pull the needed file(s) from the FTP location.

DB File format: IO701_20121020_59249.xml
Task Output: 20121020

    Public Sub Main()
        ' Build the query that lists every date key after the last processed file.
        Dts.Variables("SQL_Query").Value = _
            "select convert(varchar,datekey,112) datekey " & _
            "from db.dbo.dim_date " & _
            "where datekey between " & _
            "(Select substring(max(substring(filename, CHARINDEX('IO701',filename),24)),7,8)+1 " & _
            "From db.dbo.importaudit) " & _
            "and CONVERT(varchar,getdate(),112)"
        '
        Dts.TaskResult = ScriptResults.Success
    End Sub
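To make the logic of the Step 1 query easier to follow, here is a minimal Python sketch of the same idea: take the date part of the latest processed file name and list every date key after it, up to today. It is an illustration only, not part of the SSIS package. The function names are hypothetical, and it assumes the date dimension holds one row per calendar day, so it uses real date arithmetic where the query simply adds 1 to the yyyymmdd value.

```python
import re
from datetime import date, datetime, timedelta

# Pattern for names like IO701_20121020_59249.xml (format taken from the article).
FILENAME_PATTERN = re.compile(r"IO701_(\d{8})_\d+\.xml")


def extract_datekey(filename):
    """Return the yyyymmdd part of an IO701 file name, or None if absent."""
    match = FILENAME_PATTERN.search(filename)
    return match.group(1) if match else None


def datekeys_to_pull(processed_filenames, today):
    """List every yyyymmdd key after the latest processed file, up to today."""
    keys = [k for k in map(extract_datekey, processed_filenames) if k]
    if not keys:
        return []
    latest = datetime.strptime(max(keys), "%Y%m%d").date()
    result = []
    day = latest + timedelta(days=1)
    while day <= today:
        result.append(day.strftime("%Y%m%d"))
        day += timedelta(days=1)
    return result


if __name__ == "__main__":
    processed = ["IO701_20121020_59249.xml"]
    print(extract_datekey(processed[0]))
    print(datekeys_to_pull(processed, date(2012, 10, 23)))
```

In the package itself this work is done by the SQL statement stored in SQL_Query; the sketch only mirrors what that statement is expected to return.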
Step 2

Run the query created in Step 1 using an "Execute SQL Task." Set the Result Set to "Full Result Set" to capture every row the query returns. On the "Result Set" page (see below), enter the variable name (User::MaxXMLDateString) and set the Result Name to 0.
Step 3

Create a "Foreach Loop" that uses the values queried from the database (held in "User::MaxXMLDateString") to select the appropriate XML files from the FTP site at run time. The loop must use this variable as its ADO object source variable, as shown below.
In the "Variable Mappings" section, shown below, the variable "User::XML_Datekey" is mapped so that it holds the value for the current iteration of the Foreach Loop at run time.
Step 4

Embedded within the "Foreach Loop" is an "Execute Process Task" that calls the .bat script, which in turn makes the call to the external FTP site. Use expressions here to pass all of the arguments required by the FTP site. The path to the executable and the working directory are also shown. These expressions are shown in the illustration below.
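As a rough illustration of what the Execute Process Task does on each iteration, the Python sketch below assembles the executable, arguments, and working directory for one date key. The batch file name (sftp_parent.bat) and the host and password values are placeholders, since the article does not name them. The argument order is an assumption based on how the batch file reads %1 (date key), %2 (host), and %3 (password).

```python
import subprocess

# Hypothetical values: the article does not name the parent batch file.
BATCH_FILE = r"D:\Work\Projects\sftp_parent.bat"
WORKING_DIR = r"D:\Work\Projects"


def build_arguments(datekey, host, password):
    """Arguments in the order the batch file reads them: %1, %2, %3."""
    return [datekey, host, password]


def run_batch(datekey, host, password):
    """Launch the batch file once, as one loop iteration would."""
    command = [BATCH_FILE] + build_arguments(datekey, host, password)
    return subprocess.run(command, cwd=WORKING_DIR, check=False).returncode


if __name__ == "__main__":
    # Placeholder host and password, not real credentials.
    print(build_arguments("20130215", "user@ftp.myftpsite.com", "secret"))
```

In SSIS the same three pieces are supplied through the task's Executable, Arguments, and WorkingDirectory properties, with the date key taken from User::XML_Datekey.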
Step 5

Create your .bat file for PSFTP to handle the credentials and daily maintenance (i.e. file existence checks). Within the parent batch file is the code used to create iolap_get.bat (see "REM [DYNAMICALLY BUILDS THE GET BATCH FILE]"). The last line contains the code that makes the SFTP call. Notice that the user and password information is parameterized. We also ask PSFTP to write an output file and an error file. Additionally, we have turned on the verbose setting "-v" to get a more detailed result in each file.
    REM [BUILDS THE XML FILENAME]
    set scriptsdir=d:\work\projects\
    set Filename=IO701_%1

    REM [BUILDS THE FTP OUTPUT LOG FILES]
    set OUTPUTFILE=%scriptsdir%OUTPUTFILE.log
    set ERRORFILE=%scriptsdir%ERRORFILE.log
    ECHO %Filename%

    REM [CHECKS FOR THE EXISTENCE OF THE GET BATCH FILE]
    IF EXIST %scriptsdir%iolap_get.bat del %scriptsdir%iolap_get.bat

    REM [DYNAMICALLY BUILDS THE GET BATCH FILE]
    ECHO lcd D:\Work\Projects>>%scriptsdir%iolap_get.bat
    ECHO. >>%scriptsdir%iolap_get.bat
    ECHO mget %Filename%*.xml>>%scriptsdir%iolap_get.bat
    ECHO. >>%scriptsdir%iolap_get.bat
    ECHO quit>>%scriptsdir%iolap_get.bat
    ECHO. >>%scriptsdir%iolap_get.bat

    REM [MAKES THE SFTP CALL; KEY IS NOT SET IN THIS SCRIPT AND MUST BE DEFINED BEFORE IT RUNS]
    echo y | psftp %2 -pw %3 -i "%KEY%" >"%OUTPUTFILE%" 2>"%ERRORFILE%" -v -b "D:\Work\Projects\iolap_get.bat"

The Result

On execution, the SSIS package runs the query and retrieves the max date value from the SQL table. The Foreach Loop then searches the FTP site and returns every XML file dated after that max value, based on the SQL query. (We set the SQL query to return dates greater than the max and no later than the current date.) One by one, each date value is matched against the date part of the file names in the FTP location. If a match is found, that file is pulled down; the loop then continues with the next value until it has exhausted all of the values between the max and current dates.
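The matching described above can be sketched in a few lines of Python. This is only an illustration of the selection logic, assuming the remote file listing is already available as a list of names; in the actual process PSFTP does the matching on the server through the mget wildcard. The function name is hypothetical.

```python
from fnmatch import fnmatch


def files_to_transfer(remote_files, datekeys):
    """Return remote files matching IO701_<datekey>*.xml for any date key."""
    selected = []
    for datekey in datekeys:
        # Same wildcard the batch file passes to mget.
        pattern = "IO701_" + datekey + "*.xml"
        selected.extend(name for name in remote_files if fnmatch(name, pattern))
    return selected


if __name__ == "__main__":
    remote = [
        "IO701_20130208_68534.xml",
        "IO701_20130215_69107.xml",
        "IO701_20130222_69738.xml",
    ]
    # Date keys without a file on the remote side simply match nothing.
    print(files_to_transfer(remote, ["20130214", "20130215", "20130216"]))
```

Most date keys in the range match nothing, which is expected: the query returns every calendar date, while files only exist for some of them.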
At this point you can continue your normal processing as needed. Upon successful completion of the load process you can update your SQL table with the file names that were processed, as seen below in the "Local Directory After SFTP" column. Subsequent executions will begin from this point.

Local Directory Before SFTP   External FTP Site           Local Directory After SFTP
IO701_20130101_64962.xml      IO701_20130101_64962.xml    IO701_20130101_64962.xml
IO701_20130105_65340.xml      IO701_20130105_65340.xml    IO701_20130105_65340.xml
IO701_20130111_66066.xml      IO701_20130111_66066.xml    IO701_20130111_66066.xml
IO701_20130118_66616.xml      IO701_20130118_66616.xml    IO701_20130118_66616.xml
IO701_20130126_67174.xml      IO701_20130126_67174.xml    IO701_20130126_67174.xml
IO701_20130201_67690.xml      IO701_20130201_67690.xml    IO701_20130201_67690.xml
IO701_20130208_68534.xml      IO701_20130208_68534.xml    IO701_20130208_68534.xml
                              IO701_20130215_69107.xml    IO701_20130215_69107.xml
                              IO701_20130222_69738.xml    IO701_20130222_69738.xml
(Max Value)                   IO701_20130301_70317.xml    IO701_20130301_70317.xml
20130208                      IO701_20130308_71207.xml    IO701_20130308_71207.xml
                              IO701_20130315_71905.xml    IO701_20130315_71905.xml
(Current Value)               IO701_20130322_72569.xml    IO701_20130322_72569.xml
20130330                      IO701_20130329_73169.xml    IO701_20130329_73169.xml

SSIS pulls all values from 20130208 forward where available

OUTPUTFILE.log Example (No Match):

    Connected to ftp.myftpsite.com
    Remote working directory is /
    New local directory is D:\Work\Projects\
    IO701_20130618*.xml: nothing matched

OUTPUTFILE.log Example (Match Found):

    Connected to ftp.myftpsite.com
    Remote working directory is /
    New local directory is D:\Work\Projects\
    IO701_20130618*.xml
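If you want to act on the output log automatically, one possible approach is to scan it for the "nothing matched" message shown above. The Python sketch below does this against the two sample logs; the function name is hypothetical, and it assumes the log keeps the exact wording shown in these examples.

```python
NOTHING_MATCHED = ": nothing matched"

# Sample logs copied from the OUTPUTFILE.log examples in the article.
NO_MATCH_LOG = "\n".join([
    "Connected to ftp.myftpsite.com",
    "Remote working directory is /",
    "New local directory is D:\\Work\\Projects\\",
    "IO701_20130618*.xml: nothing matched",
])
MATCH_LOG = "\n".join([
    "Connected to ftp.myftpsite.com",
    "Remote working directory is /",
    "New local directory is D:\\Work\\Projects\\",
    "IO701_20130618*.xml",
])


def unmatched_patterns(log_text):
    """Return the file patterns the log reports as 'nothing matched'."""
    patterns = []
    for line in log_text.splitlines():
        line = line.strip()
        if line.endswith(NOTHING_MATCHED):
            patterns.append(line[: -len(NOTHING_MATCHED)])
    return patterns


if __name__ == "__main__":
    print(unmatched_patterns(NO_MATCH_LOG))
    print(unmatched_patterns(MATCH_LOG))
```

An empty list means no pattern in that log was reported as unmatched.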
ERRORFILE.log Example (On Success):

    Looking up host "ftp.myftpsite.com"
    Connecting to 192.168.92.99 port 22
    Server version: SSH-2.0-WS_FTP-SSH_7.1
    Using SSH protocol version 2
    We claim version: SSH-2.0-PuTTY_Release_0.62
    Using Diffie-Hellman with standard group "group14"
    Doing Diffie-Hellman key exchange with hash SHA-1
    Host key fingerprint is:
    ssh-rsa 1024 d5:d4:3a:b8:db:07:d9:bf:da:90:3a:c9:76:22:0a:99
    Initialised AES-256 CBC client->server encryption
    Initialised HMAC-SHA1 client->server MAC algorithm
    Initialised AES-256 CBC server->client encryption
    Initialised HMAC-SHA1 server->client MAC algorithm
    Using username "00123456789@ssh".
    Sent password
    Access granted
    Opened channel for session
    Started a shell/command
    Sent EOF message
    Disconnected: All channels closed

ERRORFILE.log Example (On Failure):

    Fatal: Server unexpectedly closed network connection

Below is a screenshot of the completed SSIS section of the job and the tasks described above.
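Returning to the ERRORFILE.log examples above, a simple check can tell a successful session from a failed one. The Python sketch below is one possible way to do that, assuming the log uses the "Fatal:" and "Access granted" wording shown in those examples; the function name and the status labels are hypothetical.

```python
def session_status(error_log_text):
    """Classify a verbose log as 'failed', 'connected', or 'unknown'."""
    lines = [line.strip() for line in error_log_text.splitlines()]
    # A fatal message takes priority over anything else in the log.
    if any(line.startswith("Fatal:") for line in lines):
        return "failed"
    if "Access granted" in lines:
        return "connected"
    return "unknown"


if __name__ == "__main__":
    success_log = "Sent password\nAccess granted\nDisconnected: All channels closed"
    failure_log = "Fatal: Server unexpectedly closed network connection"
    print(session_status(success_log))
    print(session_status(failure_log))
```

A check like this could run after each iteration so that a failed connection is not mistaken for a date with no files.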
About the Author

Chris Ware is a Principal Consultant at iolap, Inc., with a focus on ETL. He began his career providing analysis support and reporting in marketing and then insurance for a major retailer. Chris has had the opportunity to work within many different verticals throughout his data warehousing career, using a variety of integration tools. He feels that because data integration is ever changing, the opportunity to employ new processes is endless.