# The RevoScaleR Data Step White Paper


*A Revolution Analytics white paper by Joseph B. Rickert*

## Introduction

This paper provides an introduction to working with large data sets using Revolution Analytics' proprietary R package, RevoScaleR. Although the main focus is on the use and capabilities of the rxDataStep function, we take a broad view and describe the functions in the RevoScaleR package that are useful for reading and manipulating large data sets, cleaning them, and preparing them for statistical analysis with R.

## Big Data and Its Challenges

What is a large data set? These days, one company's impossibly huge data set may be about the same size as another company's ordinary morning log file. For the purposes of this paper, a large data set satisfies two criteria:

1. It is a file that contains data structured as rows of observations and columns of variables.
2. It is too large for R to get all of the data into memory and analyze it on the computer that is available.

The first criterion distinguishes between all the data you may have, in a database or data warehouse, and the data that need to be analyzed as a coherent whole for a particular project. Whatever its origin, we assume that the data to be analyzed can be shaped into a flat file with perhaps billions of rows and thousands of columns. The second criterion acknowledges that the size of a data file must be measured relative to the amount of computing resources at hand. The R language is memory constrained: with RevoScaleR and a few other big-data packages being the exceptions, all of the data must fit into the available RAM, along with any copies or data created during the analysis. So even if your data file is not objectively that big, it doesn't much matter if your PC can't handle it.
The first challenge in analyzing a large data set is just to get hold of it: to see what variables it contains, to figure out their data types, to compute basic summary statistics, and to get an idea of missing values and possible errors. The next task is to prepare the data for analysis, which, in addition to cleaning the data, may involve supplementing the data set with additional information, removing unnecessary variables, and, perhaps, transforming some variables in a way that makes sense for the contemplated analysis.

Moreover, it is likely that the data set will be reduced in size as the analysis proceeds. Perhaps only a subset of all the available variables is useful for the final model; or some natural stratification in the data would naturally lead to the data being distributed among many smaller files; or some sort of aggregation step, such as summing counts of time-stamped data to form daily or monthly time series, will boil down the data. It is for these purposes, cleaning and preparing large data sets for analysis, that the data step functions of RevoScaleR were designed.

## The RevoScaleR Approach to Big Data

Revolution Analytics' RevoScaleR package overcomes the memory limitation of R by providing a binary file format (extension .xdf) that is optimized for processing blocks of data at a time. The xdf file format stores data in a way that facilitates the operation of external-memory algorithms. RevoScaleR provides functions for importing data from different kinds of sources into xdf files, manipulating data in xdf files, and performing statistical analyses directly on the data in these files. In this paper we showcase the data handling capabilities of RevoScaleR for reading, writing, sorting, merging and transforming data stored in xdf files. Data handling in RevoScaleR is built around the function rxDataStep. We introduce this function in the next section, and then consider an example illustrating how the family of RevoScaleR data manipulation functions play together.

### The rxDataStep Function

The RevoScaleR function rxDataStep is designed to be the workhorse for manipulating xdf files and for selecting and transforming the data in these files. It can also be used with data frames in memory. The function can perform all of its work on a single xdf file, or it can be used to create a new xdf file, transforming, selecting and filtering data in the process.
The key arguments for rxDataStep are:¹

- inData: a data frame, an .xdf file name, or a data source representation of an .xdf file
- outFile: an optional name of an .xdf file in which to store the output data; if omitted, a data frame is returned
- varsToKeep or varsToDrop: a vector of variable names to either keep in, or drop from, the data set
- rowSelection: a logical expression indicating which rows (observations) to keep in the data set
- transforms: a list of expressions for creating or transforming variables

We will exercise these and some of the additional available options while working through an extended example. Along the way we will also gain some experience with rxImport, which is used to import data files; rxSort, used to sort data sets; rxMerge, used to merge two data sets; and rxCube, a cross-tabulation function that is useful for reducing data stored in xdf files.

¹ Please see Appendix 1 for a description of all of rxDataStep's parameters, and the RevoScaleR User's Guide for a complete description of all RevoScaleR commands.
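Conceptually, rowSelection and transforms behave like row subsetting and column creation on an ordinary data frame. The in-memory sketch below uses a small invented data frame (not the airline data analyzed later) to show the correspondence; it uses only base R, not RevoScaleR itself.

```r
# Hypothetical data frame standing in for one chunk of an xdf file
df <- data.frame(age = c(15, 30, 70, 45), value = c(5, 25, 30, 40))

# rowSelection = (age >= 20) & (value > 20) corresponds to logical subsetting
kept <- df[df$age >= 20 & df$value > 20, ]

# transforms = list(value2 = value * 2) corresponds to adding a derived column
kept$value2 <- kept$value * 2

nrow(kept)  # 3 of the 4 rows satisfy the row selection
```

rxDataStep applies the same logic chunk by chunk, so the whole file never has to fit in memory.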

### Importing Large Data Files

To illustrate some of the capabilities of rxImport, we will work with a .csv file containing edge data for US domestic flights from 1990 to 2009 that is available for free from infochimps.com.² At a little over 77MB and with 3.6 million records, the file is of modest size, but it is too big for Excel to open and big enough to give a feel for working with large files. Table 1 shows the first 10 records of the file, which is named flights_with_colnames.csv.

Table 1: First 10 records of the airlines edge data file (numeric values not shown)

```
origin_airport  destin_airport  passengers  flights  month
MHK             AMW             ...         ...      ...
EUG             RDM             ...         ...      ...
EUG             RDM             ...         ...      ...
EUG             RDM             ...         ...      ...
MFR             RDM             ...         ...      ...
MFR             RDM             ...         ...      ...
MFR             RDM             ...         ...      ...
MFR             RDM             ...         ...      ...
MFR             RDM             ...         ...      ...
SEA             RDM             ...         ...      ...
```

The first two columns of the airlines edge data file contain the three-letter codes for the origin and destination airports. The last column identifies the month for which the data were collected. The third and fourth columns contain the total number of passengers and the total number of flights between the origin and destination airports for the month indicated.

RevoScaleR's main function for importing data files is rxImport. It imports delimited and fixed-format text, SAS and SPSS files, as well as data stored in a relational database via ODBC. Smaller data sets can be imported into a data frame in memory; larger data sets are best stored in an xdf file. The code in BLOCK 1 of the R script in Appendix 2 illustrates using rxImport to read the data in the .csv file into an .xdf file called Flights.xdf. The colClasses option is used to make sure origin_airport and destin_airport are read in as factors and month is read in as a character string, in preparation for breaking apart the month and year later on. The output from the rxGetInfo function given in Table 2 shows that we have imported 3,606,803 rows of data. Table 2.
First 10 rows of imported airline edge data (numeric values not shown)

```
File name: C:\Users\Joseph\Documents\Revolution\DATA STEP WHITE PAPER\Flights.xdf
Number of observations: 3606803
Number of variables: 5
Number of blocks: 8
Data (10 rows starting with row 1):
   origin_airport destin_airport passengers flights month
1  MHK            AMW            ...        ...     ...
2  EUG            RDM            ...        ...     ...
3  EUG            RDM            ...        ...     ...
4  EUG            RDM            ...        ...     ...
5  MFR            RDM            ...        ...     ...
6  MFR            RDM            ...        ...     ...
7  MFR            RDM            ...        ...     ...
8  MFR            RDM            ...        ...     ...
9  MFR            RDM            ...        ...     ...
10 SEA            RDM            ...        ...     ...
```

A variation of the rxGetInfo call (Table 3) interrogates the metadata stored in the xdf file and shows some information about the data types of the stored variables.

Table 3. Metadata stored in the xdf file for the airlines edge data

```
> rxGetInfo("Flights", getVarInfo = TRUE)
File name: C:\Users\Joseph\Documents\Revolution\DATA STEP WHITE PAPER\Flights.xdf
Number of observations: 3606803
Number of variables: 5
Number of blocks: 8
Variable information:
Var 1: origin_airport, 683 factor levels: MHK EUG MFR SEA PDX ... CRE BOK BIH MQJ LCI
Var 2: destin_airport, 708 factor levels: AMW RDM EKO WDG END ... COS HII PHD TBN OH1
Var 3: passengers, Type: integer, Low/High: (0, 89597)
Var 4: flights, Type: integer, Low/High: (0, 1128)
Var 5: month, Type: character
```

Notice that there are 683 factor levels for the variable origin_airport and 708 for destin_airport. The maximum number of passengers carried between any origin/destination pair in one month was 89,597, and the maximum number of flights between any origin/destination pair was 1,128.

### Merging Xdf Files

Before exploring the data, it would be nice to be able to interpret the airport codes. Infochimps.com has another file which gives the airport name associated with each code, as well as additional information on airport locations.³ Table 4 shows the first ten rows of this airport locations file. It is clear that the variables Code and Airport_Name contain exactly the information required. Also note that the locations file is rather small (600KB). Even on a modest PC, this entire file could be read into a data frame and concise R code written to merge it into an xdf file. However, the approach we will take is to treat the flights and locations files as if both were too big to be read into memory, and to perform the merge entirely with xdf files as the only data structure. Table 4.
Airport location data (latitude, longitude, GMT offset, and runway values not shown)

```
Code  Airport_Name           City            Country         Country Code
MSW   Massawa International  Massawa         Eritrea         ER
TES   Tessenei               Tessenei        Eritrea         ER
ASM   Yohannes IV            Asmara          Eritrea         ER
ASA   Assab International    Assab           Eritrea         ER
NLK   Norfolk Island         Norfolk Island  Norfolk Island  NF
URT   Surat Thani            Surat Thani     Thailand        TH
PHZ   Phi Phi Island         Phi Phi Island  Thailand        TH
PHS   Phitsanulok            Phitsanulok     Thailand        TH
UTP   Utapao                 Utapao          Thailand        TH
UTH   Udon Thani             Udon Thani      Thailand        TH
```

### Dealing with Factor Data

Conceptually, what we would like to do is read in this new file containing the airport names and merge it with our original data set, using the airport codes, which appear in both data sets, as the key. Simple? Yes, conceptually simple, but we will be dealing with factor data, and we must take care to ensure that the levels of the Code variable in the new data set match the levels of the variables origin_airport and destin_airport in the original data set. It is also the case that merging involves sorting, and sorting factor variables can be a cause for surprise to the uninitiated. In RevoScaleR, unless otherwise specified, factor levels are ordered by the order in which the values of the variable are read into the xdf file, because the data are read in chunks. So if the first row with code TES happens to appear before the first row with code ASA in a file, then, when sorted, the TES rows will appear before the ASA rows. The following block of code uses a newer RevoScaleR function, rxFactors, to reset the levels of both the origin_airport and destin_airport factor variables to the union of the original levels for these variables. Note that a new flights file, FlightsRf.xdf, is created in the process. The code to accomplish this is given in BLOCK 2 of the R script. Table 5 presents the metadata for this new file and shows that origin_airport and destin_airport have the same levels, which have been sorted into alphabetical order. Table 5.
Meta-information for the new flights file FlightsRf.xdf showing the reset factor levels

```
> rxGetInfo("FlightsRf", getVarInfo = TRUE)
File name: C:\Users\Joseph\Documents\Revolution\DATA STEP WHITE PAPER\FlightsRf.xdf
Number of observations: 3606803
Number of variables: 5
Number of blocks: 8
Variable information:
Var 1: origin_airport, 727 factor levels: 1B1 ABE ABI ABQ ABR ... YKN YNG YUM ZXX ZZV
Var 2: destin_airport, 727 factor levels: 1B1 ABE ABI ABQ ABR ... YKN YNG YUM ZXX ZZV
Var 3: passengers, Type: integer, Low/High: (0, 89597)
Var 4: flights, Type: integer, Low/High: (0, 1128)
Var 5: month, Type: character
```

Next (BLOCK 3), we read in the airport location data file, using the colInfo parameter of rxImport to set the levels of the Code variable to match those in the flights data file. Explicitly setting the levels of a factor variable with the colInfo argument is the preferred method of importing factor levels. We also restrict the data import to the two variables of interest: Code and Airport_Name.

### Merging Two Xdf Files

Before merging the two files, we first rename the variables in the AirportNames data file (BLOCK 4 of the R script) using the rxSetVarInfo function. Note that, when this process is completed, the variable origin_airport will exist in both the flights data file FlightsRf.xdf and AirportNames, so that it can be used as the key in the merge step. We then use the rxMerge function to perform a left

outer join, with FlightsRf.xdf as the left file (the file for which all records will be retained) and AirportNames as the right file (the file whose records will be merged in). As mentioned above, the variable origin_airport, which exists in both files, is the matching variable. Table 6 shows five rows from the merged file beginning with row 300,000.

Table 6. Five rows from the merged FlightsRf and AirportNames file (numeric values not shown)

```
> rxGetInfo(data = "MergedFlights", numRows = 5, startRow = 300000)
File name: C:\Users\Joseph\Documents\Revolution\DATA STEP WHITE PAPER\MergedFlights.xdf
Number of variables: 6
Number of blocks: 16
Data (5 rows starting with row 3e+05):
  origin_airport destin_airport passengers flights month origin_name
1 BGM            CVG            ...        ...     ...   Greater Binghamton
2 BGM            CVG            ...        ...     ...   Greater Binghamton
3 BGM            CVG            ...        ...     ...   Greater Binghamton
4 BGM            CVG            ...        ...     ...   Greater Binghamton
5 BGM            CVG            ...        ...     ...   Greater Binghamton
```

To get the full names of the destination airports into our newly created airlines data file, we repeat the entire process, again first renaming the variables in AirportNames (BLOCK 5). Table 7 shows five lines from the final airlines flights file, FinalFlights.xdf, starting at line 300,000. Notice that the data displayed are different from those displayed in Table 6. The reason for this is that during the second merge, the merge of the destination airport names, rxMerge internally sorted the resulting file by the destin_airport variable. Table 7.
Five rows from the final airlines flights file, FinalFlights.xdf (numeric values not shown)

```
> rxGetInfo(data = "FinalFlights", numRows = 5, startRow = 300000)
File name: C:\Users\Joseph\Documents\Revolution\DATA STEP WHITE PAPER\FinalFlights.xdf
Number of variables: 7
Number of blocks: 32
Data (5 rows starting with row 3e+05):
  origin_airport destin_airport passengers flights month origin_name     destin_name
1 OXR            BFL            ...        ...     ...   Ventura         Meadows Field
2 PHX            BFL            ...        ...     ...   Sky Harbor Intl Meadows Field
3 PHX            BFL            ...        ...     ...   Sky Harbor Intl Meadows Field
4 PHX            BFL            ...        ...     ...   Sky Harbor Intl Meadows Field
5 PHX            BFL            ...        ...     ...   Sky Harbor Intl Meadows Field
```

### Exploring the Data

Let's continue to exercise the data manipulation features of RevoScaleR by exploring the airlines flights data file (BLOCK 6). Just to pick a place to start, consider the question: which origin airports had the most flights in any given month? A natural way to proceed is to sort the data and have a look. First, we use rxSort to sort the data by the flights variable in decreasing order to produce a list showing the airport origin/destination pairs with the most flights in any month.
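The decreasing sort that rxSort performs out of memory corresponds to ordering a data frame by a column in base R. A tiny in-memory sketch with invented numbers (the real values live in the xdf file):

```r
flights_df <- data.frame(pair = c("EUG-RDM", "SFO-LAX", "HNL-OGG"),
                         flights = c(31, 1128, 878),
                         stringsAsFactors = FALSE)

# order(..., decreasing = TRUE) plays the role of rxSort's decreasing = TRUE
sorted <- flights_df[order(flights_df$flights, decreasing = TRUE), ]
sorted$pair[1]  # the pair with the most flights comes first
```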

The rxGetInfo function, which we have seen before, extracts the first five rows of this file into a list (Table 8).

Table 8. List showing the airport pairs with the most flights in a month (numeric values not shown)

```
> mostflights5
File name: C:\Users\Joseph\Documents\Revolution\DATA STEP WHITE PAPER\sortFlights.xdf
Number of variables: 7
Number of blocks: 32
Data (5 rows starting with row 1):
  origin_airport destin_airport passengers flights month
1 SFO            LAX            ...        ...     ...
2 LAX            SFO            ...        ...     ...
3 HNL            OGG            ...        ...     ...
4 OGG            HNL            ...        ...     ...
5 OGG            HNL            ...        ...     ...
  origin_name                 destin_name
1 San Francisco International Los Angeles International
2 Los Angeles International   San Francisco International
3 Honolulu International      Kahului
4 Kahului                     Honolulu International
5 Kahului                     Honolulu International
```

SFO and LAX are right up there at the top, and it appears that quite a bit of flying went on during the Christmas season. Also note that five airports each make the top ten list twice. It might be nice to see what activity was like for these airports over the time period of the whole data set. In preparation for generating a histogram showing the distribution of flights for these airports, we will employ rxDataStep to create a file containing only the flights having SFO and LAX as the origin airports. The code to do this, along with some important housekeeping regarding the factor levels, is in BLOCK 6. Notice that the variable top2 contains the character vector c("SFO", "LAX"). The mechanism for making this global variable from the R environment available to the rxDataStep function, which selects the SFO and LAX flights into a new xdf file, mostflights, is the argument transformObjects. This argument must be a named list, so we have transformObjects = list(top2to = top2), denoting a different name for the variable used in the transform environment. The housekeeping is handled through the transforms argument of rxDataStep.
This transform creates a new variable, origin, having the two factor levels in top2, from the variable origin_airport, which has 727 factor levels. To summarize: the input file to rxDataStep is the flights data file, FinalFlights.xdf; the output file is the smaller, new file, mostFlights.xdf. The rowSelection parameter uses the transform object vector top2to to pick out only the two origin airports in which we are interested. The argument transforms is set to the expression we looked at above. The argument transformObjects is the bridge between the R environment and the transform environment: it makes the vector top2to available to the data step. Setting overwrite = TRUE merely lets one overwrite a file that already exists. Table 9 displays the variables in mostFlights.xdf. Notice that the new variable origin is a factor with only 2 levels, not the 727 levels of origin_airport. The reduced number of levels makes it possible to use origin as the conditioning variable in the histogram formula. With the new file in place, a call to rxHistogram plots the monthly flight data for the top 2 origin/destination airport pairs by origin airport, using the new variable created for this purpose. Figure 1 shows the results.
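The factor housekeeping in the transforms argument is ordinary base-R factor arithmetic. The sketch below, with hypothetical values, shows how rebuilding a factor against a two-element level set discards the unused levels:

```r
top2 <- c("SFO", "LAX")

# Stand-in for origin_airport, which carries the full airport level set
origin_airport <- factor(c("SFO", "LAX", "SFO", "LAX"),
                         levels = c("ABQ", "BFL", "LAX", "OXR", "SFO"))

# The transform: keep only the two levels of interest
origin <- factor(origin_airport, levels = top2, labels = top2)

nlevels(origin_airport)  # 5 in this sketch (727 in the real file)
nlevels(origin)          # 2
```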

Table 9: Contents of the file mostFlights.xdf

```
> rxGetInfo("mostflights", getVarInfo = TRUE)
File name: C:\Users\Joseph\Documents\Revolution\DATA STEP WHITE PAPER\mostFlights.xdf
Number of variables: 8
Number of blocks: 32
Variable information:
Var 1: origin_airport, 727 factor levels: 1B1 ABE ABI ABQ ABR ... YKN YNG YUM ZXX ZZV
Var 2: destin_airport, 727 factor levels: 1B1 ABE ABI ABQ ABR ... YKN YNG YUM ZXX ZZV
Var 3: passengers, Type: integer, Low/High: (0, 83153)
Var 4: flights, Type: integer, Low/High: (0, 1128)
Var 5: month, Type: character
Var 6: origin_name, Type: character
Var 7: destin_name, Type: character
Var 8: origin, 2 factor levels: SFO LAX
```

Figure 1: Monthly flight distributions of the 2 busiest origin airports

### Aggregating Data

The edge airlines file that we have been working with contains time-stamped data from which several time series may easily be extracted. Each record contains the number of flights and the number of passengers that flew from one airport to another during a given month. So, for each origin airport, by aggregating the counts over all of the destination airports, we can produce a monthly time series of outgoing flights. This is an example of the kind of data aggregation task at which R excels, and for which the RevoScaleR package has a function, rxCube, that is particularly useful for boiling down large data sets. It fits nicely with our notion of an extended data step and provides another opportunity to work with rxDataStep. Our first task will be to split apart the month field shown in Table 1 into separate month and year variables that we will eventually turn into a Date data type. We begin by writing new transforms to parse the character strings that make up the month variable and use rxDataStep to create the

new variables, Month and Year, in the file (BLOCK 7). We also drop the variable origin_airport from the data set and specify a new block size of 50,000 rows for the .xdf file. The next step is to use the function rxCube to aggregate the data into a time series. rxCube is a cross-tabulation function that returns its output in long form (one row for each combination) rather than in a table. When the formula that describes the tabulation contains a dependent numeric variable, rxCube computes the average of the instances that were tabulated, along with a new variable, Counts, that contains the number of observations that went into computing the average. In our case, Table 10 indicates that for the first month of 1990 the flights from SFO to its 284 destination airports were summed and used to compute an average number of flights per destination. Hence, we can multiply Counts by flights to get the total number of flights out of SFO for each time period (11,088 flights for January 1990).

Table 10. Result of rxCube (numeric values not shown)

```
> head(t1)
  Year Month origin flights Counts
1 ...  ...   SFO    ...     ...
2 ...  ...   SFO    ...     ...
3 ...  ...   SFO    ...     ...
4 ...  ...   SFO    ...     ...
5 ...  ...   SFO    ...     ...
6 ...  ...   SFO    ...     ...
```

Next, we rename the variables and combine Month and Year into a variable of type Date. (Note that we arbitrarily selected the 28th of each month to fill out the date format.) Table 11 displays the first six lines of the new data frame.

Table 11. Data frame with the Date variable (numeric values not shown)

```
> head(t1)
  Year Month origin avg.flights.per.destin total.destin flights.out Date
1 ...  ...   SFO    ...                    ...          ...         ...
2 ...  ...   SFO    ...                    ...          ...         ...
3 ...  ...   SFO    ...                    ...          ...         ...
4 ...  ...   SFO    ...                    ...          ...         ...
5 ...  ...   SFO    ...                    ...          ...         ...
6 ...  ...   SFO    ...                    ...          ...         ...
```

Finally, we select SFO, sort by date to form a time series of total monthly flights originating from SFO, and plot. Figure 2 shows the monthly flights out of SFO.
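The arithmetic behind the Counts column, and the Date construction, can be checked in a few lines of base R. The numbers below are invented for illustration; only the relationships mirror the text (the cell mean times Counts recovers the total, and year/month plus an arbitrary day 28 yields a Date):

```r
# One rxCube "cell": flights from SFO to each destination in one month
flights <- c(100, 50, 60)

cell_mean  <- mean(flights)           # what rxCube reports in the flights column
cell_count <- length(flights)         # the Counts column
total_out  <- cell_mean * cell_count  # recovers sum(flights)

total_out == sum(flights)  # TRUE

# Combining Year and Month into a Date, arbitrarily using day 28
d <- as.Date(sprintf("%d-%02d-28", 1990, 1))
format(d)  # "1990-01-28"
```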

Figure 2. Time series of flights out of SFO

Next, although it is really not in the scope of this paper to do any time series analysis, just for fun we perform a seasonal decomposition of the SFO time series and plot it in Figure 3. The first panel of Figure 3 reproduces the time series. The second panel shows the periodic, seasonal component. The third panel displays the trend, and the fourth panel displays the residuals. Figure 4 and Figure 5 are the corresponding plots for the LAX time series.

Figure 3. Seasonal decomposition of the SFO series
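The decomposition itself is a one-liner with base R's stl function. The sketch below builds a synthetic monthly series with the same general shape as the SFO data (a trend plus a yearly cycle plus noise; the actual series comes from the aggregated xdf data):

```r
set.seed(42)
n <- 240                                         # 20 years of monthly observations
trend    <- 9000 + 5 * seq_len(n)                # slow upward drift
seasonal <- 500 * sin(2 * pi * seq_len(n) / 12)  # yearly cycle
y <- trend + seasonal + rnorm(n, sd = 100)

y_ts <- ts(y, start = c(1990, 1), frequency = 12)
fit  <- stl(y_ts, s.window = "periodic")

# fit$time.series holds the components plotted in Figures 3 and 5;
# plot(fit) draws the four-panel display shown in the figures
colnames(fit$time.series)  # "seasonal" "trend" "remainder"
```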

Figure 4. Time series of LAX flights

Figure 5. Seasonal decomposition of LAX flights

## Summary

In this paper we have explored some of the data handling capabilities of Revolution Analytics' RevoScaleR package by working through a practical example that included reading, writing, sorting and merging xdf files; selecting and transforming variables; and aggregating time-stamped data to produce multiple time series. Although the data sets employed in the example were not very large, the section on merging files was done entirely with external-memory algorithms and xdf files to show how a similar process could be accomplished with arbitrarily large files.

The extended example also illustrated the natural way in which the data structures and functions of RevoScaleR integrate into an interactive R workflow. RevoScaleR provides all of the data handling and data preparation capabilities required to make R ready for big data.

### About Revolution Analytics

Revolution Analytics is the leading commercial provider of software and services based on the open source R project for statistical computing. Led by predictive analytics pioneer Norman Nie, the company brings high performance, productivity and enterprise readiness to R, the most powerful statistics language in the world. The company's flagship Revolution R Enterprise product is designed to meet the production needs of large organizations in industries such as finance, life sciences, retail, manufacturing and media. Please visit us at

## Appendix 1: rxDataStep {RevoScaleR}

Data Step for .xdf Files and Data Frames

### Description

Transform data from an input .xdf file or data frame to an output .xdf file or data frame.

### Usage

```r
rxDataStep(inData = NULL, outFile = NULL, varsToKeep = NULL,
           varsToDrop = NULL, rowSelection = NULL, transforms = NULL,
           transformObjects = NULL, transformFunc = NULL,
           transformVars = NULL, transformPackages = NULL,
           transformEnvir = NULL, append = "none", overwrite = FALSE,
           removeMissings = FALSE, computeLowHigh = TRUE,
           maxRowsByCols = ,  # default value missing in the source
           rowsPerRead = -1, startRow = 1, numRows = -1,
           returnTransformObjects = FALSE,
           blocksPerRead = rxGetOption("blocksPerRead"),
           reportProgress = rxGetOption("reportProgress"), ...)
```

### Arguments

- inData: a data frame, a character string specifying the input .xdf file, or an RxXdfData object. If NULL, a data set will be created automatically with a single variable, .rxRowNums, containing row numbers. It will have a total of numRows rows, with rowsPerRead rows in each block.
- inFile: either an RxXdfData object or a character string specifying the input .xdf file.
- outFile: a character string specifying the output .xdf file or an RxXdfData object. If NULL, a data frame will be returned from rxDataStep unless returnTransformObjects is set to TRUE. Setting outFile to NULL and returnTransformObjects = TRUE allows chunkwise computations on the data without modifying the existing data or creating a new data set.
- varsToKeep: character vector of variable names to include when reading from the input data file. If NULL, the argument is ignored. Cannot be used with varsToDrop.
- varsToDrop: character vector of variable names to exclude when reading from the input data file. If NULL, the argument is ignored. Cannot be used with varsToKeep.
- rowSelection: name of a logical variable or a logical expression (using variables in the data set) for row selection. For example, rowSelection = (age >= 20) & (age <= 65) & (value > 20) will select the rows corresponding to ages on the interval [20, 65] and with values greater than 20. As with all expressions, rowSelection (or transforms) can be defined outside of the function call using the expression function.
- transforms: an expression of the form list(name = expression, ...) representing the first round of variable transformations. As with all expressions, transforms (or rowSelection) can be defined outside of the function call using the expression function.
- transformObjects: a named list containing objects that can be referenced by transforms, transformFunc, and rowSelection.
- transformFunc: variable transformation function. See rxTransform for details.

## Appendix 2: R Script

Data Step White Paper code. Joseph B. Rickert, November 2011.

```r
#### READ MAIN DATA FILE ############################

## BLOCK 1: Read in the primary data file
FlightsText <- "C:/Users/Joseph/Documents/DATA/US Flights/flights_with_colnames.csv"
# Note: make month a character field so we can parse it into month and year later
system.time(
  rxImport(inData = FlightsText, outFile = "Flights", overwrite = TRUE,
           colClasses = c(origin_airport = "factor",
                          destin_airport = "factor",
                          month = "character"))
)
# Look at the data and the meta-data stored with the file
rxGetInfo("Flights", numRows = 10)
rxGetInfo(data = "Flights", getVarInfo = TRUE)

#### MERGE FILES ####################################

## BLOCK 2: Rationalize the factor levels for the origin
## and destination airports in the flights file
flightVarInfo <- rxGetVarInfo("Flights")
originLevels <- flightVarInfo$origin_airport$levels
destinLevels <- flightVarInfo$destin_airport$levels
# Create a combined vector of sorted, unique factor levels
airportLevels <- sort(unique(c(originLevels, destinLevels)))
# Use rxFactors to re-order the factor levels and make them the same;
# FlightsRf contains the redone factor levels
rxFactors(inData = "Flights", outFile = "FlightsRf", overwrite = TRUE,
          factorInfo = list(
            destin_airport = list(newLevels = airportLevels),
            origin_airport = list(newLevels = airportLevels)))
# Now origin_airport and destin_airport both have the same levels
rxGetInfo("FlightsRf", getVarInfo = TRUE)

## BLOCK 3: Read in the airport locations file.
## Set the Code factor levels to those used in the flights data,
## drop any codes that aren't used, and read in only the
## variables of interest: Code and Airport_Name
```

```r
LocationsText <- "C:/Users/Joseph/Documents/DATA/Airport Locations/Airport_locations2.csv"
rxImport(inData = LocationsText, outFile = "AirportNames",
         varsToKeep = c("Code", "Airport_Name"),
         colInfo = list(Code = list(type = "factor", levels = airportLevels)),
         rowSelection = !is.na(Code),
         overwrite = TRUE)
# Check that the levels are the same
rxGetInfo("AirportNames", getVarInfo = TRUE)

## BLOCK 4: Rename the variables in AirportNames to match
## the desired names for the merged data set
nameVarInfo <- rxGetVarInfo("AirportNames")
nameVarInfo$Code$newName <- "origin_airport"
nameVarInfo$Airport_Name$newName <- "origin_name"
rxSetVarInfo(nameVarInfo, data = "AirportNames")
# Perform a left merge, matching on the origin airport:
# FlightsRf is the "left" file; AirportNames is the "right" file
rxMerge(inData1 = "FlightsRf", inData2 = "AirportNames",
        outFile = "MergedFlights.xdf", type = "left",
        matchVars = c("origin_airport"), overwrite = TRUE)
rxGetInfo(data = "MergedFlights", numRows = 5, startRow = 300000)

## BLOCK 5: Rename the variables again to prepare for a merge
## on the destination airport
nameVarInfo <- rxGetVarInfo("AirportNames")
nameVarInfo$origin_airport$newName <- "destin_airport"
nameVarInfo$origin_name$newName <- "destin_name"
rxSetVarInfo(nameVarInfo, data = "AirportNames")
# Perform a second left merge, matching on the destination airport:
# MergedFlights is the "left" file; AirportNames is the "right" file
rxMerge(inData1 = "MergedFlights", inData2 = "AirportNames",
        outFile = "FinalFlights.xdf", type = "left",
        matchVars = c("destin_airport"), overwrite = TRUE)
rxGetInfo(data = "FinalFlights", numRows = 5, startRow = 300000)
```

```r
#### DATA EXPLORATION ###############################

## BLOCK 6: Find the airport connections with the most
## flights in a given month
rxSort(inData = "FinalFlights", outFile = "sortFlights.xdf",
       sortByVars = "flights", decreasing = TRUE, overwrite = TRUE)
rxGetInfo(data = "sortFlights")
mostflights5 <- rxGetInfo(data = "sortFlights", numRows = 5, startRow = 1)
mostflights5
top5f <- as.data.frame(mostflights5[[5]])
topOA <- unique(as.vector(top5f$origin_airport))
topOA
# Select the top 2
top2 <- topOA[1:2]
top2
# Build a file with only the flights originating at SFO and LAX;
# use a transform object to pass in the top2 levels
rxDataStep(inData = "FinalFlights", outFile = "mostflights",
           rowSelection = origin_airport %in% top2to,
           transforms = list(
             origin = factor(origin_airport, levels = top2to, labels = top2to)),
           transformObjects = list(top2to = top2),
           overwrite = TRUE)
rxGetInfo("mostflights", numRows = 10, startRow = 1)
rxGetInfo("mostflights", getVarInfo = TRUE)
rxHistogram(~flights | origin, data = "mostflights")

#### AGGREGATING TIME SERIES DATA ###################

## BLOCK 7: Create a time series file.
## Write transforms to separate out the month and year data
rxDataStep(inData = "mostflights", outFile = "SFO.LAX",
           transforms = list(
             Month = as.integer(substring(month, 5, 6)),
             Year  = as.integer(substring(month, 1, 4))),
           varsToDrop = c("origin_airport"),
           rowsPerRead = 50000,
           overwrite = TRUE)
rxGetInfo(data = "SFO.LAX", numRows = 10)
# Aggregate the data into monthly flights time series
t1 <- rxCube(flights ~ F(Year):F(Month):origin, removeZeroCounts = TRUE,
             data = "SFO.LAX")
t1 <- rxResultsDF(t1)
head(t1)
# Compute total flights out and combine month and year into a date
t1$flights_out <- t1$flights * t1$Counts
names(t1) <- c("Year", "Month", "origin", "avg.flights.per.destin",
               "total.destin", "flights.out")
t1$Date <- as.Date(paste(t1$Month, "- 28 -", t1$Year), "%m - %d - %Y")
head(t1)

## BLOCK 8: Select the SFO entries and plot
```

```r
SFO.t1 <- t1[t1$origin == "SFO", ]
head(SFO.t1)
SFO.t1 <- SFO.t1[order(SFO.t1$Date), ]
# Make the x axis a date and plot
x <- SFO.t1$Date
y <- SFO.t1$flights.out
library(ggplot2)  # make sure the ggplot2 package is loaded
qplot(x, y, geom = "line", xlab = "",
      ylab = "Number of Flights\n", main = "Monthly Flights Out of SFO")
summary(SFO.t1$flights.out)
# Try a seasonal decomposition
SFO.ts <- ts(y, start = c(1990, 1), freq = 12)
sd.SFO <- stl(SFO.ts, s.window = "periodic")
plot(sd.SFO)

# Repeat for LAX
LAX.t1 <- t1[t1$origin == "LAX", ]
LAX.t1 <- LAX.t1[order(LAX.t1$Date), ]
# Make the x axis a date and plot
x <- LAX.t1$Date
y <- LAX.t1$flights.out
qplot(x, y, geom = "line", xlab = "",
      ylab = "Number of Flights\n", main = "Monthly Flights Out of LAX")
summary(LAX.t1$flights.out)
# Perform a seasonal decomposition
LAX.ts <- ts(y, start = c(1990, 1), freq = 12)
sd.LAX <- stl(LAX.ts, s.window = "periodic")
plot(sd.LAX)
```
