Import and Output XML Files with SAS Yi Zhao Merck Sharp & Dohme Corp, Upper Gwynedd, Pennsylvania



Similar documents
A Recursive SAS Macro to Automate Importing Multiple Excel Worksheets into SAS Data Sets

Effective Use of SQL in SAS Programming

How to Create an XML Map with the XML Mapper

Importing from Tab-Delimited Files

How To Write A File System On A Microsoft Office (Windows) (Windows 2.3) (For Windows 2) (Minorode) (Orchestra) (Powerpoint) (Xls) (

Technical Paper. Defining an ODBC Library in SAS 9.2 Management Console Using Microsoft Windows NT Authentication

Enhancing the SAS Enhanced Editor with Toolbar Customizations Lynn Mullins, PPD, Cincinnati, Ohio

What's New in ADP Reporting?

Reading Delimited Text Files into SAS 9 TS-673

XML for SAS Programmers

5. Crea+ng SAS Datasets from external files. GIORGIO RUSSOLILLO - Cours de prépara+on à la cer+fica+on SAS «Base Programming»

Defining an OLEDB Library in SAS Management Console Using Windows Authentication

Technical Paper. Reading Delimited Text Files into SAS 9

Paper FF-014. Tips for Moving to SAS Enterprise Guide on Unix Patricia Hettinger, Consultant, Oak Brook, IL

SPSS for Windows importing and exporting data

Define an Oracle Library in SAS Management Console

Producing Listings and Reports Using SAS and Crystal Reports Krishna (Balakrishna) Dandamudi, PharmaNet - SPS, Kennett Square, PA

Qlik REST Connector Installation and User Guide

Importing Data from a Dat or Text File into SPSS

Learn how to create web enabled (browser) forms in InfoPath 2013 and publish them in SharePoint InfoPath 2013 Web Enabled (Browser) forms

Chapter 2 The Data Table. Chapter Table of Contents

Microsoft Access Rollup Procedure for Microsoft Office Click on Blank Database and name it something appropriate.

10 Database Utilities

Downloading Your Financial Statements to Excel

Creating a Distribution List from an Excel Spreadsheet

ABSTRACT INTRODUCTION SAS AND EXCEL CAPABILITIES SAS AND EXCEL STRUCTURES

Getting started with the Asset Import Converter. Overview. Resources to help you

Setting Up ALERE with Client/Server Data

Microsoft Office. Mail Merge in Microsoft Word

SAS Enterprise Guide A Quick Overview of Developing, Creating, and Successfully Delivering a Simple Project

SAS Macro Autocall and %Include

Preserving Line Breaks When Exporting to Excel Nelson Lee, Genentech, South San Francisco, CA

Data Tool Platform SQL Development Tools

SDTM, ADaM and define.xml with OpenCDISC Matt Becker, PharmaNet/i3, Cary, NC

Help File. Version February, MetaDigger for PC

Essential Project Management Reports in Clinical Development Nalin Tikoo, BioMarin Pharmaceutical Inc., Novato, CA

DocumentsCorePack for MS CRM 2011 Implementation Guide

READING AND WRITING XML FILES FROM SAS Miriam Cisternas, MGC Data Services, Carlsbad, CA Ricardo Cisternas, MGC Data Services, Carlsbad, CA

PROC SQL for SQL Die-hards Jessica Bennett, Advance America, Spartanburg, SC Barbara Ross, Flexshopper LLC, Boca Raton, FL

SAS 9.3 Drivers for ODBC

Importing and Exporting With SPSS for Windows 17 TUT 117

SAS Visual Analytics 7.2 for SAS Cloud: Quick-Start Guide

ibolt V3.2 Release Notes

Importing Data into SAS

Methodologies for Converting Microsoft Excel Spreadsheets to SAS datasets

2.1 Data Collection Techniques

PharmaSUG Paper QT26

Software Application Tutorial

XLATE Evolution User Guide

Managing Qualtrics Survey Distributions and Response Data with SAS

Switching from PC SAS to SAS Enterprise Guide Zhengxin (Cindy) Yang, inventiv Health Clinical, Princeton, NJ

Inmagic DB/Text for SQL. Importer User s Guide

Importing Data into R

It s not the Yellow Brick Road but the SAS PC FILES SERVER will take you Down the LIBNAME PATH= to Using the 64-Bit Excel Workbooks.

Paper Creating SAS Datasets from Varied Sources Mansi Singh and Sofia Shamas, MaxisIT Inc, NJ

Secure Website and Reader Application User Guide

How to Import Data into Microsoft Access

Generating a Custom Bill of Materials

Using the Advanced Tier Data Collection Tool. A Troubleshooting Guide

Release 2.1 of SAS Add-In for Microsoft Office Bringing Microsoft PowerPoint into the Mix ABSTRACT INTRODUCTION Data Access

SAS to Excel with ExcelXP Tagset Mahipal Vanam, Kiran Karidi and Sridhar Dodlapati

IBM Unica emessage Version 8 Release 6 February 13, User's Guide

HPCC Data Handling. Boca Raton Documentation Team

How To Write A Clinical Trial In Sas

Downloading RIT Account Analysis Reports into Excel

Suite. How to Use GrandMaster Suite. Exporting with ODBC

Before You Begin... 2 Running SAS in Batch Mode... 2 Printing the Output of Your Program... 3 SAS Statements and Syntax... 3

Creating Raw Data Files Using SAS. Transcript

How To Use Optimum Control EDI Import. EDI Invoice Import. EDI Supplier Setup General Set up

Lab 9 Access PreLab Copy the prelab folder, Lab09 PreLab9_Access_intro

How to use MS Excel to regenerate a report from the Report Editor

Leveraging the SAS Open Metadata Architecture Ray Helm & Yolanda Howard, University of Kansas, Lawrence, KS

Exploiting Key Answers from Your Data Warehouse Using SAS Enterprise Reporter Software

Managing Custom Data Standards in SAS Clinical Data Integration

9.1 SAS/ACCESS. Interface to SAP BW. User s Guide

Jet Data Manager 2012 User Guide

Basic SQL Server operations

Create a Pipe Network with Didger & Voxler

WHAT S NEW IN MS EXCEL 2013

Improving Your Relationship with SAS Enterprise Guide

PharmaSUG 2014 Paper CC23. Need to Review or Deliver Outputs on a Rolling Basis? Just Apply the Filter! Tom Santopoli, Accenture, Berwyn, PA

DiskPulse DISK CHANGE MONITOR

SAS/ACCESS 9.3 Interface to PC Files

Exchanger XML Editor - Data Import

isupplier PORTAL ACCESS SYSTEM REQUIREMENTS

Table Of Contents. iii

Adobe Acrobat 9 Pro Accessibility Guide: Creating Accessible PDF from Microsoft Word

Importing and Exporting Databases in Oasis montaj

Encoding the Password

Hydrologic Modeling System HEC-HMS

William E Benjamin Jr, Owl Computer Consultancy, LLC

Importing Excel Files Into SAS Using DDE Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA

Flat Pack Data: Converting and ZIPping SAS Data for Delivery

Use Mail Merge to create a form letter

Transcription:

Paper TS06-2011 Import and Output XML Files with SAS Yi Zhao Merck Sharp & Dohme Corp, Upper Gwynedd, Pennsylvania Abstract XML files are widely used in transporting data from different operating systems or data storages. Processing XML files efficiently becomes one of the necessary technical skills for SAS programmers. This paper presents four aspects in importing and exporting XML files with SAS. They include: (1). Using XML engine LIBNAME and data step to import or output data between XML and SAS datasets. (2). Using PROC COPY. (3). Developing XMLMap to facilitate the conversion; and (4). Importing XML file to Excel and converting it to SAS dataset. Sample program codes and tips are provided and discussed. Introduction SAS programmers are used to imputing or outputting SAS datasets from or to other formats such as Excel spreadsheet, tab or space delimited ASCII file, XPT transport file. In recent years, another file format has become more and more popular that is XML file. In pharmaceutical industry, the Food and Drug Administration (FDA) mandates pharmaceutical companies to submit certain data and documentation in XML format. Investigators are providing patient data with XML format. Consequently, it is imperative for SAS programmers to have good knowledge working with XML file. What Advantages Does XML Provide? XML is loaded with a mission of "carrying" data in a way that multiple applications on different platforms can recognize. Transporting a SAS data set in XML is the process of putting the file in a format in order to move it across hosts. Using the XML engine with the DATA step or with PROC COPY provides the following advantages: XML data is stored as text. Unlike SAS files, XML documents can be read and updated by using a text editor. XML documents can be imported into applications other than SAS applications. For example, an XML document can be input to an Oracle application. It can be delivered to the Web. It can also be restored as a SAS data set for continued processing. If compatibility with other programs is important for your data, the XML engine is recommended. The XML engine supports SAS 8 and later features. Unlike the XPORT engine, the XML engine supports SAS 8 features such as long names. How Does XML Engine Work? The XML engine works much like other SAS engines, such as XPT. That is, you execute a LIBNAME statement in order to assign a libref and specify an engine. You then use that libref in the SAS session where a libref is valid. However, instead of the libref being associated with the physical location of a SAS library, the libref for the XML engine is associated with a physical location of an XML document. When you use the libref that is associated with an XML document, SAS either translates the data in a SAS data set into XML markup or translates the XML markup into SAS format. There are multiple approaches to inputting or outputting XML files using XML engine depending on the source data or personal preference. Using XML Engine LIBNAME and Data Step The syntax of the XML libname is similar to that of a standard SAS libname.

Inputting XML to SAS Dataset: The following XML document, subjects.xml, will be converted to a SAS dataset in the example below: <?xml version="1.0" encoding="windows-1252"?> <TABLE> <FirstName> James </FirstName> <Age> 10 </Age> <FirstName> Tom </FirstName> <Age> 12 </Age> <FirstName> Joe </FirstName> <Age> 15 </Age> </TABLE> Here is the SAS code to create a permanent SAS dataset from the xml file: libname myxml xml 'U:\XML\subjects.xml'; libname dat 'U:\data\'; data dat.subjects; set myxml.subjects; proc print data = dat.subjects noobs; PROC PRINT of data set dat.subjects: The SAS System FirstName Age James 10 Tom 12 Joe 15 Outputting SAS Dataset to XML: libname myxml xml 'U:\XML\new_subjects.xml'; libname dat 'U:\data\'; data myxml.new_subjects; set dat.subjects; Please note that 'U:\XML\subjects.xml' is the physical location of the XML document for export or import. It is necessary to include the complete pathname, the filename, and the file extension. Enclose the physical name in single or double quotation marks. The.xml file specification must be specified.

In the LIBNAME statement above, an option xmltype= could be specified to generate brand-specific XML. For example, to export an XML document for use by Oracle, we could use the following LIBNAME: libname myxml xml 'U:\XML\new_subjects.xml' xmltype = oracle; By specifying the Oracle format, the XML engine generates XML that is specific to Oracle standards. The default xmltype is GENERIC, which is optional. Using XML Engine LIBNAME and Proc COPY Alternatively, we could use the COPY procedure to convert data set to or from an XML document. libname myxml xml 'U:\XML\subjects.xml'; libname dat 'U:\data\'; proc copy in=myxml out=dat; In the code above, the libref MYXML points to the location of an XML document. The XML engine specifies that the XML document is to be read in XML format. The libref DAT points to the location where the contents of the XML document will be copied to. The PROC COPY statement copies the contents of the library that is specified in the IN= option to the library that is specified in the OUT= option. Using XMLMap to Input XML Document To successfully input an XML document using XML engine, SAS requires the inputted XML document to be well-formed with a rectangular structure. More specifically, the XML document must meet the following requirements: 1. The root enclosing element (top-level node) of an XML document is the document container. For SAS, it translates to a library. 2. The nested elements occurring within the container (repeating element instances) begin with the second-level instance tag. 3. The repeating element instances must represent a rectangular organization. For a SAS data set, they become a collection of rows with a constant set of columns. Here's the XML document used in the previous session that illustrates the physical structure that SAS requires: <?xml version="1.0" encoding="windows-1252"?> <TABLE> -- root enclosing element -- repeating element instance...first row <FirstName> James </FirstName> <Age> 10 </Age> -- repeating element instance...second row <FirstName> Tom </FirstName> <Age> 12 </Age> -- repeating element instance...third row

<FirstName> Joe </FirstName> <Age> 15 </Age> </TABLE> However, due to different sources and approaches an XML file is generated, many XML documents don't have the above physical structure required by SAS. Please see the sample XML document below. <?xml version="1.0"?> <VEHICLES> <FORD> <Model>Mustang</Model> <Year>1965</Year> <Model>Explorer</Model> <Year>1982</Year> <Model>Taurus</Model> <Year>1998</Year> <Model>F150</Model> <Year>2000</Year> </FORD> </VEHICLES> Looking at the above XML document, there are three sequences of element start tags and end tags: VEHICLES, FORD, and ROW. As a result, SAS will not input it successfully. SAS would fail to identify columns of data and generate concatenated data. Here is what would happen when SAS reads this XML document: 1. The XML engine reads the XML markup until it encounters the <FORD> start tag, because FORD is the second-level instance tag. 2. The XML engine scans subsequent elements for variables. As a value for each variable is encountered, it is read into the input buffer. For example, after reading the first ROW element, the input buffer contains the values Mustang and 1965. 3. The XML engine continues reading values into the input buffer until it encounters the </FORD> end tag, at which time it writes the completed input buffer to the SAS data set as an observation. 4. The end result is one observation, which is not what we want. Here is PROC PRINT output showing the concatenated observation. The SAS System Model Year Mustang Explorer Tau 1965 To get separate observations, one solution is to use XMLMap. We change the table location path so that the XML engine writes separate observations to the SAS data set. Here are the syntax of XMLMap file and the process that the engine would follow: <?xml version="1.0"?> <SXLEMAP version="1.2"> <TABLE name="ford"> <TABLE-PATH syntax="xpath"> /VEHICLES/FORD/ROW </TABLE-PATH>

<COLUMN name="model"> <DATATYPE> string </DATATYPE> <LENGTH> 20 </LENGTH> <TYPE> character </TYPE> <PATH syntax="xpath"> /VEHICLES/FORD/ROW/Model </PATH> </COLUMN> <COLUMN name="year"> <DATATYPE> string </DATATYPE> <LENGTH> 4 </LENGTH> <TYPE> character </TYPE> <PATH syntax="xpath"> /VEHICLES/FORD/ROW/Year </PATH> </COLUMN> </TABLE> </SXLEMAP> 1. The XML engine reads the XML markup until it encounters the start tag, because ROW is the last element specified in the table location path. 2. The XML engine scans subsequent elements for variables based on the column location paths. As a value for each variable is encountered, it is read into the input buffer. 3. The XML engine continues reading values into the input buffer until it encounters the end tag, at which time it writes the completed input buffer to the SAS data set as an observation. That is, one observation is written to the SAS data set that contains the values Mustang and 1965. 4. The process is repeated for each start-tag and end-tag sequence. 5. The result is four observations. The following SAS statements input the XML document and specify the XMLMap. The PRINT procedure verifies the results. filename path 'U:\XML\path.xml'; filename map 'U:\XML\path.map'; libname path xml xmlmap=map; proc print data=path.ford noobs; PROC PRINT output showing desired FORD data set The SAS System Model Year Mustang 1965 Explorer 1982 Taurus 1998 F150 2000 In SAS version 9.1.3 a product called XML Mapper is available which allows users to generate an XML Map by drag-and-drop interface. A separate license is required. Please refer to Reference for more details. Importing XML File to Excel and Converting it to SAS Dataset If you have trouble inputting an XML file to SAS dataset by using LIBNAME or XMLMap, there is another shortcut to achieve this task using Excel to open the XML and then using PROC IMPORT to convert the Excel spreadsheet to SAS dataset.

XML features are available only in Microsoft Office Professional Edition 2003 and Microsoft Office Excel 2003. Here are the steps to input an XML file to Excel: 1. Select Open from the File menu and choose the xml file to be open. 2. On the pop up window 'Open XML', select 'As an XML list' and click OK. 3. Click OK on the next pop up window to let Excel create a schema based on the XML source data. 4. Click OK. The spreadsheet is displayed. Verify the spreadsheet against the XML file to ensure the conversion is correct. 5. Use PROC IMPORT to import the Excel spreadsheet to SAS dataset. PROC IMPORT OUT=subjects DATAFILE="U:\subjects.xls" DBMS=xls; GETNAMES=yes; RUN; Conclusion XML is a simple and flexible text format markup language playing an increasing role in the exchange of a wide variety of data. It has become a data interchange format standard. SAS software provides multiple ways to input and output XML files. Using XML engine and LIBNAME, combined with data step or COPY procedure, or an intermediate Excel conversion followed by IMPORT procedure, we could achieve our goal of data conversion successfully for most well-structured, generic XML documents. For non-generic XML documents, an XMLMap could be created to facilitate the conversion. References Shi-Tao Yeh, Using XML Mapper to Create An XMLMap, Proceedings of the Northeast SAS Users Group Conference 2005. www.nesug.org/proceedings/nesug05/cc/cc8.pdf SAS Help and Documentation, SAS Institute Inc. Contact Information Your comments and questions are valued and encouraged. Contact the author at: Yi Zhao Merck Sharp & Dohme Corp 301 N. Summertown Pike North Wales, PA 19454 Phone: 267-305-7672 E-mail: yi_zhao@merck.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies.