# The RevoScaleR Data Step White Paper


REVOLUTION ANALYTICS WHITE PAPER

By Joseph B. Rickert

## INTRODUCTION

This paper provides an introduction to working with large data sets using Revolution Analytics' proprietary R package, RevoScaleR. Although the main focus is on the use and capabilities of the rxDataStep function, we take a broad view and describe the capabilities of the functions in the RevoScaleR package that may be useful for reading and manipulating large data sets, cleaning them, and preparing them for statistical analysis with R.

## BIG DATA AND ITS CHALLENGES

What is a large data set? These days, one company's impossibly huge data set may be approximately the same size as another company's ordinary, typical-size morning log file. For the purpose of this paper, a large data set satisfies two criteria:

1. It is a file that contains data structured as rows of observations and columns of variables.
2. It is too large for R to get all of the data into memory and analyze it on the computer that is available.

The intention of the first criterion is to distinguish between all the data you may have, in a database or data warehouse, and the data that need to be analyzed as a coherent whole for a particular project. No matter what its origin, we assume that the data to be analyzed can be shaped into a flat file with perhaps billions of rows and thousands of columns. The second criterion acknowledges that the size of a data file must be measured relative to the amount of computing resources at hand. The R language is memory constrained: with RevoScaleR and a few other big-data packages being exceptions, all of the data must fit into the available RAM, along with any copies or intermediate data created during analysis. So, even if your data file is not objectively that big, it doesn't much matter if your PC can't handle it.
The first challenge in analyzing a large data set is just to get hold of it: to see what variables it contains, to figure out their data types, to compute basic summary statistics, and to get an idea of missing values and possible errors. The next task is to prepare the data for analysis, which, in addition to cleaning the data, may involve supplementing the data set with additional information, removing unnecessary variables and, perhaps, transforming some variables in a way that makes sense for the contemplated analysis.

Moreover, it is likely that the data set will eventually be reduced in size as the analysis proceeds. Perhaps only a subset of all the available variables is useful for the final model; perhaps some natural stratification exists in the data set that would naturally lead to the data being distributed among many smaller files; or perhaps some sort of aggregation step, such as summing counts of time-stamped data to form daily or monthly time series, will boil down the data. It is for these purposes, cleaning and preparing large data sets for analysis, that the data step functions of RevoScaleR were designed.

## THE REVOSCALER APPROACH TO BIG DATA

Revolution Analytics' RevoScaleR package overcomes the memory limitation of R by providing a binary file format (extension .xdf) that is optimized for processing blocks of data at a time. The .xdf file format stores data in a way that facilitates the operation of external memory algorithms. RevoScaleR provides functions for importing data from different kinds of sources into .xdf files, manipulating data in .xdf files, and performing statistical analyses directly on data in these files. In this paper we showcase the data handling capabilities of RevoScaleR for reading, writing, sorting, merging, and transforming data stored in .xdf files. Data handling in RevoScaleR is built around the function rxDataStep. We introduce this function in the next section, and then consider an example illustrating how the family of RevoScaleR data manipulation functions play together.

### The rxDataStep Function

The RevoScaleR function rxDataStep is designed to be the workhorse for manipulating .xdf files and for selecting and transforming the data in these files. It can also be used with data frames in memory. The function can perform all of its work on a single .xdf file, or it can be used to create a new .xdf file, transforming, selecting, and filtering data in the process.
The key arguments for rxDataStep are [1]:

- inData: a data frame, an .xdf file name, or a data source representation of an .xdf file
- outFile: an optional name of an .xdf file to store the output data. If omitted, a data frame is returned
- varsToKeep or varsToDrop: a vector of variable names to either keep in the data set or drop from the data set
- rowSelection: a logical expression indicating which rows, or observations, to keep in the data set
- transforms: a list of expressions for creating or transforming variables

We will exercise these and some of the additional available options while working through an extended example. Additionally, we will gain some experience with rxImport, which is used to import data files; rxSort, used to sort data sets; rxMerge, used to merge two data sets; and rxCube, a cross-tabulation function that is useful for reducing data stored in .xdf files.

[1] Please see Appendix 1 for a description of all of rxDataStep's parameters, and the RevoScaleR User's Guide for a complete description of all RevoScaleR commands.
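A minimal sketch of how these arguments fit together. The file and variable names here (people.xdf, age, income) are hypothetical, and RevoScaleR is the proprietary package shipped with Revolution R Enterprise, so this is an illustration rather than a runnable script:

```r
library(RevoScaleR)  # proprietary package included with Revolution R Enterprise

# Hypothetical example: keep adult records and add a log-transformed income
rxDataStep(inData = "people.xdf",                 # input .xdf file
           outFile = "adults.xdf",                # output .xdf file
           varsToKeep = c("age", "income"),       # drop all other variables
           rowSelection = age >= 18,              # evaluated chunk by chunk
           transforms = list(logIncome = log(income)),
           overwrite = TRUE)
```

Because the expressions in rowSelection and transforms are evaluated one chunk of rows at a time, the same call works unchanged whether the file holds thousands of rows or billions.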

### Importing Large Data Files

To illustrate some of the capabilities of rxImport, we will work with a .csv file containing edge data for US domestic flights from 1990 to 2009 that is available for free from infochimps.com [2]. At a little over 77 MB and with 3.6 million records, the file is of modest size, but too big for Excel to open and big enough to give a feel for working with large files. Table 1 shows the first 10 records of the file, which is named flights_with_colnames.csv.

Table 1: First 10 records of airlines edge data file

origin_airport destin_airport passengers flights month
MHK AMW
EUG RDM
EUG RDM
EUG RDM
MFR RDM
MFR RDM
MFR RDM
MFR RDM
MFR RDM
SEA RDM

The first two columns of the airlines edge data file contain the three-letter codes for the origin and destination airports. The last column identifies the month for which the data were collected. The third and fourth columns contain the total number of passengers and total number of flights between the origin and destination airports for the month indicated.

RevoScaleR's main function for importing data files is rxImport. It imports delimited and fixed-format text, SAS and SPSS files, as well as data stored in a relational database via ODBC. Smaller data sets can be imported into a data frame in memory; larger data sets are best stored in an .xdf file. The code in BLOCK 1 of the R script in Appendix 2 illustrates using rxImport to read the data in the .csv file into an .xdf file called Flights.xdf. The colClasses option is used to make sure origin_airport and destin_airport are read in as factors and month is read in as a character string, in preparation for breaking apart the month and year later on. The output from the rxGetInfo function given in Table 2 shows that we have imported 3,606,803 rows of data.

Table 2.
First 10 rows of imported airline edge data

File name: C:\Users\Joseph\Documents\Revolution\DATA STEP WHITE PAPER\Flights.xdf
Number of observations: 3606803
Number of variables: 5
Number of blocks: 8
Data (10 rows starting with row 1):
   origin_airport destin_airport passengers flights month
1  MHK AMW
2  EUG RDM
3  EUG RDM
4  EUG RDM
5  MFR RDM
6  MFR RDM

7  MFR RDM
8  MFR RDM
9  MFR RDM
10 SEA RDM

A variation of the rxGetInfo call (Table 3) interrogates the metadata stored in the .xdf file and shows some information about the data types of the stored variables.

Table 3. Metadata stored in the .xdf file for the airlines edge data

> rxGetInfo("Flights", getVarInfo=TRUE)
File name: C:\Users\Joseph\Documents\Revolution\DATA STEP WHITE PAPER\Flights.xdf
Number of observations: 3606803
Number of variables: 5
Number of blocks: 8
Variable information:
Var 1: origin_airport, 683 factor levels: MHK EUG MFR SEA PDX ... CRE BOK BIH MQJ LCI
Var 2: destin_airport, 708 factor levels: AMW RDM EKO WDG END ... COS HII PHD TBN OH1
Var 3: passengers, Type: integer, Low/High: (0, 89597)
Var 4: flights, Type: integer, Low/High: (0, 1128)
Var 5: month, Type: character

Notice that there are 683 factor levels for the variable origin_airport and 708 for destin_airport. The maximum number of passengers carried between any origin/destination pair in one month was 89,597, and the maximum number of flights between any origin/destination pair was 1,128.

### Merging Xdf Files

Before exploring the data, it would be nice to be able to interpret the airport codes. Infochimps.com has another file which gives the airport name associated with each code as well as additional information on airport locations [3]. Table 4 shows the first ten rows of this airport locations file. It is clear that the variables Code and Airport_Name contain exactly the information required. Also note that the locations file is rather small (600 KB): even on a modest PC this entire file could be read into a data frame, and concise R code could be written to merge it into an .xdf file. However, the approach we take here is to treat the flights and locations files as if both were too big to be read into memory, and to perform the merge entirely with .xdf files as the only data structure.

Table 4.
Airport location data

Code Latitude Longitude Airport_Name City Country Country_Code GMT_Offset Runway_Length Runway_Elevation
MSW   Massawa International   Massawa         Eritrea         ER
TES   Tessenei                Tessenei        Eritrea         ER
ASM   Yohannes IV             Asmara          Eritrea         ER
ASA   Assab International     Assab           Eritrea         ER
NLK   Norfolk Island          Norfolk Island  Norfolk Island  NF
URT   Surat Thani             Surat Thani     Thailand        TH

PHZ   Phi Phi Island          Phi Phi Island  Thailand        TH
PHS   Phitsanulok             Phitsanulok     Thailand        TH
UTP   Utapao                  Utapao          Thailand        TH
UTH   Udon Thani              Udon Thani      Thailand        TH

### Dealing with Factor Data

Conceptually, what we would like to do is read in this new file containing the airport names and merge it with our original data set, using the airport codes, which appear in both data sets, as the key. Simple? Yes, conceptually simple, but we will be dealing with factor data, and we must take care to ensure that the levels of the Code variable in the new data set match the levels of the variables origin_airport and destin_airport in the original data set. It is also the case that merging involves sorting, and sorting factor variables can be a cause for surprise to the uninitiated. In RevoScaleR, unless otherwise specified, factor levels are ordered by the order in which the values of the variable are read into the .xdf file; this is because the data are read in chunks. So if the first row with code TES happens to appear before the first row with code ASA in a file, then, when sorted, the TES rows will appear before the rows beginning with ASA. The following block of code uses a new function in RevoScaleR, rxFactors, to reset the levels of both the origin_airport and destin_airport factor variables to the union of the original levels for these variables. Note that a new flights file, FlightsRf.xdf, is created in the process. The code to accomplish this is given in BLOCK 2 of the R script. Table 5 presents the metadata for this new file and shows that origin_airport and destin_airport now have the same levels, which have been sorted into alphabetical order.

Table 5.
Meta-information for the new flights file FlightsRf.xdf showing the reset factor levels

> rxGetInfo("FlightsRf", getVarInfo=TRUE)
File name: C:\Users\Joseph\Documents\Revolution\DATA STEP WHITE PAPER\FlightsRf.xdf
Number of observations: 3606803
Number of variables: 5
Number of blocks: 8
Variable information:
Var 1: origin_airport, 727 factor levels: 1B1 ABE ABI ABQ ABR ... YKN YNG YUM ZXX ZZV
Var 2: destin_airport, 727 factor levels: 1B1 ABE ABI ABQ ABR ... YKN YNG YUM ZXX ZZV
Var 3: passengers, Type: integer, Low/High: (0, 89597)
Var 4: flights, Type: integer, Low/High: (0, 1128)
Var 5: month, Type: character

Next (BLOCK 3), we read in the airport location data file, using the colInfo parameter of rxImport to set the levels of the Code variable to match those in the flights data file. Explicitly setting the levels of a factor variable with colInfo is the preferred method of importing factor levels. We also restrict the data import to the two variables of interest: Code and Airport_Name.

### Merging Two Xdf Files

Before merging the two files, we first rename the variables in the AirportNames data file (BLOCK 4 of the R script) using the rxSetVarInfo function. Note that, when this process is completed, the variable origin_airport will exist in both the flights data file FlightsRf.xdf and in AirportNames, so that it can be used as the key in the merge step. We then use the rxMerge function to perform a left

outer join, with FlightsRf.xdf as the left file (the file for which all records will be retained) and AirportNames as the right file (the file whose records will be merged in). As mentioned above, the variable origin_airport, which exists in both files, is the matching variable. Table 6 shows five rows from the merged file beginning with row 300,000.

Table 6. Five rows from the merged FlightsRf and AirportNames files

> rxGetInfo(data="MergedFlights", numRows=5, startRow=300000)
File name: C:\Users\Joseph\Documents\Revolution\DATA STEP WHITE PAPER\MergedFlights.xdf
Number of observations:
Number of variables: 6
Number of blocks: 16
Data (5 rows starting with row 300000):
  origin_airport destin_airport passengers flights month origin_name
1 BGM CVG  Greater Binghamton
2 BGM CVG  Greater Binghamton
3 BGM CVG  Greater Binghamton
4 BGM CVG  Greater Binghamton
5 BGM CVG  Greater Binghamton

To get the full names of the destination airports into our newly created airlines data file, we repeat the entire process, again first renaming the variables in AirportNames (BLOCK 5). Table 7 shows five lines from the final airlines flights file, FinalFlights.xdf, starting at line 300,000. Notice that the data displayed differ from those displayed in Table 6. The reason is that during the second merge, the merge of the destination airport names, rxMerge internally sorted the resulting file by the destin_airport variable.

Table 7.
Five rows from the final airlines flights file, FinalFlights.xdf

> rxGetInfo(data="FinalFlights", numRows=5, startRow=300000)
File name: C:\Users\Joseph\Documents\Revolution\DATA STEP WHITE PAPER\FinalFlights.xdf
Number of observations:
Number of variables: 7
Number of blocks: 32
Data (5 rows starting with row 300000):
  origin_airport destin_airport passengers flights month origin_name
1 OXR BFL  Ventura
2 PHX BFL  Sky Harbor Intl
3 PHX BFL  Sky Harbor Intl
4 PHX BFL  Sky Harbor Intl
5 PHX BFL  Sky Harbor Intl
  destin_name
1 Meadows Field
2 Meadows Field
3 Meadows Field
4 Meadows Field
5 Meadows Field

### Exploring the Data

Let's continue to exercise the data manipulation features of RevoScaleR by exploring the airlines flights data file (BLOCK 6). Just to pick a place to start, consider the question: which origin airports had the most flights in any given month? A natural way to proceed is to sort the data and have a look. First, we use rxSort to sort the data by the flights variable in decreasing order, producing a list showing the airport origin/destination pairs with the most flights in any month.
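The sort step itself is a single call; this sketch follows BLOCK 6 of the appendix script (RevoScaleR assumed installed, file names as used there):

```r
library(RevoScaleR)  # proprietary Revolution R Enterprise package

# Sort the final flights file by the monthly flight count, largest first
rxSort(inData = "FinalFlights", outFile = "sortFlights.xdf",
       sortByVars = "flights", decreasing = TRUE, overwrite = TRUE)

# Peek at the top five origin/destination pairs
rxGetInfo(data = "sortFlights", numRows = 5, startRow = 1)
```

Because rxSort works on .xdf files with external-memory algorithms, the same call applies no matter how large the file is.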

The rxGetInfo function, which we have seen before, extracts the first five lines of this file and writes them into a list (Table 8).

Table 8. List showing airport pairs with the most flights in a month

> mostFlights5
File name: C:\Users\Joseph\Documents\Revolution\DATA STEP WHITE PAPER\sortFlights.xdf
Number of observations:
Number of variables: 7
Number of blocks: 32
Data (5 rows starting with row 1):
  origin_airport destin_airport passengers flights month
1 SFO LAX
2 LAX SFO
3 HNL OGG
4 OGG HNL
5 OGG HNL
  origin_name                  destin_name
1 San Francisco International  Los Angeles International
2 Los Angeles International    San Francisco International
3 Honolulu International       Kahului
4 Kahului                      Honolulu International
5 Kahului                      Honolulu International

SFO and LAX are right up there at the top, and it appears that quite a bit of flying went on during the Christmas season. Also note that five airports each make the top ten list twice. It might be nice to see what activity was like for these airports over the time period of the whole data set. In preparation for generating a histogram showing the distribution of flights for these airports, we will employ rxDataStep to create a file containing only the flights having SFO and LAX as the origin airports. The code to do this, along with some important housekeeping regarding the factor levels, is in BLOCK 6. Notice that the variable top2 contains the character vector c("SFO", "LAX"). The mechanism for making this global variable from the R environment available to the rxDataStep call, which selects the SFO and LAX flights into a new .xdf file, mostFlights, is the argument transformObjects. This argument must be a named list, so we have transformObjects = list(top2to = top2), denoting a different name for the variable used in the transform environment. The housekeeping is handled through the transforms argument of rxDataStep.
This transform creates a new variable, origin, having the two factor levels in top2, from the variable origin_airport, which has 727 factor levels. To summarize: the input file to rxDataStep is the flights data file, FinalFlights.xdf; the output file is the smaller, new file, mostFlights.xdf. The rowSelection parameter uses the transform object vector top2to to pick out only the two origin airports in which we are interested. The transforms argument is set to the expression we looked at above. The transformObjects argument is the bridge between the R environment and the transform environment: it makes the vector top2to available to the data step. Finally, overwrite = TRUE merely lets one overwrite a file that already exists. Table 9 displays the variables in mostFlights.xdf. Notice that the new variable origin is a factor with only 2 levels, not the 727 levels of origin_airport. The reduced number of levels makes it possible to use origin as the conditioning variable in the histogram function. With the new file in place, a call to rxHistogram plots the monthly flight data for the top 2 origin/destination airport pairs by origin airport, using the new variable created for this purpose. Figure 1 shows the results.
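Putting those pieces together, the selection step looks roughly like this (a sketch based on BLOCK 6 of the appendix script; RevoScaleR assumed installed):

```r
library(RevoScaleR)  # proprietary Revolution R Enterprise package

top2 <- c("SFO", "LAX")

# Keep only SFO- and LAX-originating flights, adding a 2-level factor.
# top2to is the name the vector takes inside the transform environment.
rxDataStep(inData = "FinalFlights", outFile = "mostFlights",
           rowSelection = origin_airport %in% top2to,
           transforms = list(
             origin = factor(origin_airport, levels = top2to)),
           transformObjects = list(top2to = top2),  # bridge from the R environment
           overwrite = TRUE)

# The 2-level origin factor makes a conditioned histogram practical
rxHistogram(~flights | origin, data = "mostFlights")
```

Without the releveling, origin_airport's 727 levels would produce one histogram panel per airport; the new 2-level origin variable restricts the plot to the two panels of interest.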

Table 9: Contents of file mostFlights.xdf

> rxGetInfo("mostFlights", getVarInfo=TRUE)
File name: C:\Users\Joseph\Documents\Revolution\DATA STEP WHITE PAPER\mostFlights.xdf
Number of observations:
Number of variables: 8
Number of blocks: 32
Variable information:
Var 1: origin_airport, 727 factor levels: 1B1 ABE ABI ABQ ABR ... YKN YNG YUM ZXX ZZV
Var 2: destin_airport, 727 factor levels: 1B1 ABE ABI ABQ ABR ... YKN YNG YUM ZXX ZZV
Var 3: passengers, Type: integer, Low/High: (0, 83153)
Var 4: flights, Type: integer, Low/High: (0, 1128)
Var 5: month, Type: character
Var 6: origin_name, Type: character
Var 7: destin_name, Type: character
Var 8: origin, 2 factor levels: SFO LAX

Figure 1: Monthly flight distributions of the 2 busiest airports

### Aggregating Data

The airlines edge file that we have been working with contains time-stamped data from which several time series may be easily extracted. Each record contains the number of flights and the number of passengers that flew from one airport to another during a given month. So, for each origin airport, by aggregating the counts over all of the destination airports, we can produce a monthly time series of outgoing flights. This is an example of a kind of data aggregation task at which R excels, and for which the RevoScaleR package has a function, rxCube, that is particularly useful for boiling down large data sets. It fits nicely with our notion of an extended data step and provides another opportunity to work with rxDataStep. Our first task will be to split apart the month field in Table 1 into separate month and year variables that we will eventually turn into a Date data type. We begin by writing new transforms to parse the character strings that comprise the month variable and use rxDataStep to create the

new variables, Month and Year, in the file (BLOCK 7). We also drop the variable origin_airport from the data set, and specify a new block size of 50,000 rows for the .xdf file. The next step is to use the function rxCube to aggregate the data into a time series. rxCube is a cross-tabulation function that returns its output in long form (one row for each combination) rather than as a table. When the formula that describes the tabulation to be done contains a dependent numeric variable, rxCube computes the average of the instances that were tabulated, along with a new variable, Counts, that contains the number of observations that went into computing the average. In our case, Table 10 indicates that for the first month of 1990 the flights from SFO to 284 destination airports were summed and used to compute an average number of flights per destination. Hence, we can multiply Counts by flights to get the total number of flights out of SFO for each time period (11,088 flights for January 1990).

Table 10. Result of rxCube

> head(t1)
  Year Month origin flights Counts
1            SFO
2            SFO
3            SFO
4            SFO
5            SFO
6            SFO

Next, we rename the variables and combine Month and Year into a variable of type Date. (Note that we arbitrarily selected the 28th of each month to fill out the date format.) Table 11 displays the first six lines of the new data frame.

Table 11. Data frame with date

> head(t1)
  Year Month origin avg.flights.per.destin total.destin flights.out Date
1            SFO
2            SFO
3            SFO
4            SFO
5            SFO
6            SFO

Finally, we select SFO, sort by date to form a time series of total monthly flights originating from SFO, and plot. Figure 2 shows the monthly flights out of SFO.
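The month-and-year bookkeeping itself is plain base R. A small self-contained sketch of the parsing and Date construction, using a made-up month value in the same YYYYMM format as the flights file:

```r
# A month value as stored in the flights file: "199001" means January 1990
month_field <- "199001"

# Split into integer Month and Year, as the rxDataStep transforms do
Month <- as.integer(substring(month_field, 5, 6))
Year  <- as.integer(substring(month_field, 1, 4))

# Combine into a Date, arbitrarily using the 28th of the month
d <- as.Date(paste(Year, Month, 28, sep = "-"))
# d is "1990-01-28"
```

Any fixed day of the month works here, since the series is monthly; the Date type is needed only so that sorting and plotting treat the axis as time.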

Figure 2. Time series of flights out of SFO

Next, although it is really not in the scope of this paper to do any time series analysis, just for fun we perform a seasonal decomposition of the SFO time series and plot it in Figure 3. The first panel of Figure 3 reproduces the time series; the second panel shows the periodic, seasonal component; the third panel displays the trend; and the fourth panel displays the residuals. Figures 4 and 5 are the corresponding plots for the LAX time series.

Figure 3. Seasonal decomposition of the SFO series
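The decomposition uses base R's stl function. A self-contained sketch on synthetic monthly data; the sine-plus-trend series here is made up purely for illustration:

```r
# Synthetic monthly series: trend plus annual seasonality plus noise
set.seed(42)
n <- 240  # 20 years of monthly data, like the 1990-2009 flights series
y <- 100 + 0.5 * seq_len(n) + 20 * sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 5)

# Build a ts object starting January 1990 with monthly frequency
y.ts <- ts(y, start = c(1990, 1), frequency = 12)

# Seasonal decomposition by loess: seasonal, trend, and remainder panels
sd.y <- stl(y.ts, s.window = "periodic")
plot(sd.y)
```

s.window = "periodic" constrains the seasonal component to repeat identically each year, which is what the panel plots in Figures 3 and 5 display.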

Figure 4. Time series of LAX flights

Figure 5. Seasonal decomposition of LAX flights

## SUMMARY

In this paper we have explored some of the data handling capabilities of Revolution Analytics' RevoScaleR package by working through a practical example that included reading, writing, sorting, and merging .xdf files; selecting and transforming variables; and aggregating time-stamped data to produce multiple time series. Although the data sets employed in the example were not very large, the section on merging files was done entirely with external memory algorithms and .xdf files to show how a similar process could be accomplished with arbitrarily large files.

The extended example also illustrated the natural way in which the data structures and functions of RevoScaleR integrate into an interactive R workflow. RevoScaleR provides all of the data handling and data preparation capabilities required to get R ready for big data.

### About Revolution Analytics

Revolution Analytics is the leading commercial provider of software and services based on the open source R project for statistical computing. Led by predictive analytics pioneer Norman Nie, the company brings high performance, productivity and enterprise readiness to R, the most powerful statistics language in the world. The company's flagship Revolution R Enterprise product is designed to meet the production needs of large organizations in industries such as finance, life sciences, retail, manufacturing and media. Please visit us at

## APPENDIX 1: rxDataStep {RevoScaleR}

Data Step for .xdf Files and Data Frames

Description

Transform data from an input .xdf file or data frame to an output .xdf file or data frame.

Usage

rxDataStep(inData = NULL, outFile = NULL, varsToKeep = NULL,
    varsToDrop = NULL, rowSelection = NULL, transforms = NULL,
    transformObjects = NULL, transformFunc = NULL, transformVars = NULL,
    transformPackages = NULL, transformEnvir = NULL, append = "none",
    overwrite = FALSE, removeMissings = FALSE, computeLowHigh = TRUE,
    maxRowsByCols = , rowsPerRead = -1, startRow = 1, numRows = -1,
    returnTransformObjects = FALSE,
    blocksPerRead = rxGetOption("blocksPerRead"),
    reportProgress = rxGetOption("reportProgress"), ...)

Arguments

inData: a data frame, a character string specifying the input .xdf file, or an RxXdfData object. If NULL, a data set will be created automatically with a single variable, .rxRowNums, containing row numbers. It will have a total of numRows rows with rowsPerRead rows in each block.

inFile: either an RxXdfData object or a character string specifying the input .xdf file.

outFile: a character string specifying the output .xdf file or an RxXdfData object. If NULL, a data frame will be returned from rxDataStep unless returnTransformObjects is set to TRUE. Setting outFile to NULL and returnTransformObjects=TRUE allows chunkwise computations on the data without modifying the existing data or creating a new data set.

varsToKeep: character vector of variable names to include when reading from the input data file. If NULL, the argument is ignored. Cannot be used with varsToDrop.

varsToDrop: character vector of variable names to exclude when reading from the input data file. If NULL, the argument is ignored. Cannot be used with varsToKeep.

rowSelection: name of a logical variable or a logical expression (using variables in the data set) for row selection.
For example, rowSelection = (age >= 20) & (age <= 65) & (value > 20) will select the rows corresponding to ages on the interval [20, 65] and with values greater than 20. As with all expressions, rowSelection (or transforms) can be defined outside of the function call using the expression function.

transforms: an expression of the form list(name = expression, ...) representing the first round of variable transformations. As with all expressions, transforms (or rowSelection) can be defined outside of the function call using the expression function.

transformObjects: a named list containing objects that can be referenced by transforms, transformFunc, and rowSelection.

transformFunc: variable transformation function. See rxTransform for details.
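The note about defining expressions outside the call can be sketched as follows. The file and variable names (mydata.xdf, age, value) are hypothetical, and the call assumes the proprietary RevoScaleR package:

```r
library(RevoScaleR)  # proprietary Revolution R Enterprise package

# Define the row selection and the transforms outside the call with expression()
keepRows <- expression((age >= 20) & (age <= 65) & (value > 20))
newVars  <- expression(list(logValue = log(value)))

rxDataStep(inData = "mydata.xdf", outFile = "subset.xdf",
           rowSelection = keepRows, transforms = newVars,
           overwrite = TRUE)
```

Predefining the expressions this way lets the same selection and transformation logic be reused across several data step calls.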

## APPENDIX 2: R SCRIPT

# DATA STEP WHITE PAPER CODE
# Joseph B. Rickert
# November 2011

# READ MAIN DATA FILE
# BLOCK 1: READ IN PRIMARY DATA FILE
FlightsText <- "C:/Users/Joseph/Documents/DATA/US Flights/flights_with_colnames.csv"
# Note: we make month a character field so we can parse it into month and year later
system.time(
  rxImport(inData = FlightsText, outFile = "Flights", overwrite = TRUE,
           colClasses = c(origin_airport = "factor",
                          destin_airport = "factor",
                          month = "character"))
)
# Look at the meta-data stored with the file
rxGetInfo("Flights", numRows = 10)
rxGetInfo("Flights", getVarInfo = TRUE)

# MERGE FILES
# BLOCK 2: Rationalize the factor levels for the origin and
# destination airports in the flights file
flightVarInfo <- rxGetVarInfo("Flights")
originLevels <- flightVarInfo$origin_airport$levels
destinLevels <- flightVarInfo$destin_airport$levels
# Create a combined list of sorted, unique factor levels
airportLevels <- sort(unique(c(originLevels, destinLevels)))
# Use rxFactors to re-order the factor levels and make them the same.
# FlightsRf contains the redone factor levels
rxFactors(inData = "Flights", outFile = "FlightsRf", overwrite = TRUE,
          factorInfo = list(
            destin_airport = list(newLevels = airportLevels),
            origin_airport = list(newLevels = airportLevels))
)
# Now origin_airport and destin_airport both have the same number of levels
rxGetInfo("FlightsRf", getVarInfo = TRUE)

# BLOCK 3: Read in the Airport Locations file, set the Code factor levels
# to the same as those used in the flight data, and drop rows with missing codes.
# Only read in the variables of interest: Code and Airport_Name

LocationsText <- "C:/Users/Joseph/Documents/DATA/Airport Locations/Airport_locations2.csv"
rxImport(inData = LocationsText, outFile = "AirportNames",
         varsToKeep = c("Code", "Airport_Name"),
         colInfo = list(
           Code = list(type = "factor", levels = airportLevels)),
         rowSelection = !is.na(Code),
         overwrite = TRUE)
# Check to see that the levels are the same
rxGetInfo("AirportNames", getVarInfo = TRUE)

# BLOCK 4: Rename the variables in AirportNames to match
# the desired names for the merged data set
nameVarInfo <- rxGetVarInfo("AirportNames")
nameVarInfo$Code$newName <- "origin_airport"
nameVarInfo$Airport_Name$newName <- "origin_name"
rxSetVarInfo(nameVarInfo, data = "AirportNames")
# Perform a left merge, matching on the origin airport.
# FlightsRf is the "left" file; AirportNames is the "right" file.
rxMerge(inData1 = "FlightsRf", inData2 = "AirportNames",
        outFile = "MergedFlights.xdf", type = "left",
        matchVars = c("origin_airport"), overwrite = TRUE)
rxGetInfo(data = "MergedFlights", numRows = 5, startRow = 300000)

# BLOCK 5: Rename the variables again to prepare for a merge
# on the destination airport
nameVarInfo <- rxGetVarInfo("AirportNames")
nameVarInfo$origin_airport$newName <- "destin_airport"
nameVarInfo$origin_name$newName <- "destin_name"
rxSetVarInfo(nameVarInfo, data = "AirportNames")
# Perform a second left merge, matching on the destination airport.
# MergedFlights is the "left" file; AirportNames is the "right" file.
rxMerge(inData1 = "MergedFlights", inData2 = "AirportNames",
        outFile = "FinalFlights.xdf", type = "left",
        matchVars = c("destin_airport"), overwrite = TRUE)
rxGetInfo(data = "FinalFlights", numRows = 5, startRow = 300000)

# DATA EXPLORATION

# BLOCK 6: Find the airport connections with the most flights in a given month
rxSort(inData = "FinalFlights", outFile = "sortFlights.xdf",
       sortByVars = "flights", decreasing = TRUE, overwrite = TRUE)
rxGetInfo(data = "sortFlights")
mostFlights5 <- rxGetInfo(data = "sortFlights", numRows = 5, startRow = 1)
mostFlights5
top5F <- as.data.frame(mostFlights5[[5]])
topOA <- unique(as.vector(top5F$origin_airport))
topOA
# Select the top 2
top2 <- topOA[1:2]
top2
# Build a file with only flights originating at SFO and LAX.
# Use a transform object to pass in the top2 levels
rxDataStep(inData = "FinalFlights", outFile = "mostFlights",
           rowSelection = origin_airport %in% top2to,
           transforms = list(
             origin = factor(origin_airport, levels = top2to, labels = top2to)),
           transformObjects = list(top2to = top2),
           overwrite = TRUE)
rxGetInfo("mostFlights", numRows = 10, startRow = 1)
rxGetInfo("mostFlights", getVarInfo = TRUE)
rxHistogram(~flights | origin, data = "mostFlights")

# AGGREGATING TIME SERIES DATA
# BLOCK 7: CREATE A TIME SERIES FILE
# Write transforms to separate out the month and year data
rxDataStep(inData = "mostFlights", outFile = "SFO.LAX",
           transforms = list(
             Month = as.integer(substring(month, 5, 6)),
             Year = as.integer(substring(month, 1, 4))),
           varsToDrop = c("origin_airport"),
           rowsPerRead = 50000,
           overwrite = TRUE)
rxGetInfo(data = "SFO.LAX", numRows = 10)
# Aggregate the data into monthly flights time series
t1 <- rxCube(flights ~ F(Year):F(Month):origin, removeZeroCounts = TRUE,
             data = "SFO.LAX")
t1 <- rxResultsDF(t1)
head(t1)
# Compute total flights out and combine month and year into a date
t1$flights_out <- t1$flights * t1$Counts
names(t1) <- c("Year", "Month", "origin", "avg.flights.per.destin",
               "total.destin", "flights.out")
t1$Date <- as.Date(paste(t1$Month, 28, t1$Year, sep = "-"), "%m-%d-%Y")
head(t1)

# Select SFO entries and plot

SFO.t1 <- t1[t1$origin == "SFO", ]
head(SFO.t1)
SFO.t1 <- SFO.t1[order(SFO.t1$Date), ]
# Make the x axis a date and plot
x <- SFO.t1$Date
y <- SFO.t1$flights.out
library(ggplot2)  # make sure the ggplot2 package is loaded
qplot(x, y, geom = "line", xlab = "", ylab = "Number of Flights\n",
      main = "Monthly Flights Out of SFO")
summary(SFO.t1$flights.out)
# Try a seasonal decomposition
SFO.ts <- ts(y, start = c(1990, 1), freq = 12)
sd.SFO <- stl(SFO.ts, s.window = "periodic")
plot(sd.SFO)

# Repeat for LAX
LAX.t1 <- t1[t1$origin == "LAX", ]
LAX.t1 <- LAX.t1[order(LAX.t1$Date), ]
# Make the x axis a date and plot
x <- LAX.t1$Date
y <- LAX.t1$flights.out
qplot(x, y, geom = "line", xlab = "", ylab = "Number of Flights\n",
      main = "Monthly Flights Out of LAX")
summary(LAX.t1$flights.out)
# Perform a seasonal decomposition
LAX.ts <- ts(y, start = c(1990, 1), freq = 12)
sd.LAX <- stl(LAX.ts, s.window = "periodic")
plot(sd.LAX)
