Hands on S4 Classes Yohan Chalabi ITP ETH, Zurich Rmetrics Association, Zurich Finance Online, Zurich R/Rmetrics Workshop Meielisalp June 2009
Outline 1 Introduction 2 S3 Classes/Methods 3 S4 Classes/Methods
Outline 1 Introduction 2 S3 Classes/Methods 3 S4 Classes/Methods
S3/S4 History The appendix in Software for Data Analysis by Chambers [1] is of great interest to learn more about the history of the S language. First discussion at Bell labs in May 1976 for a new system to interface a large Fortran library. By the end of 1976 Rick Becker and John Chambers with the help of co-workers have a first implementation of S running locally on Honeywell OS. This new language is later ported to UNIX systems and becomes the S version 2. About ten years after the first meeting, a new version with concepts inspired from UNIX system is developed with focus on functional programming and with object self-description. This is the S version 3. Around 1992 is introduced the concept of classes and methods as known today by S4 classes.
Goal of the Tutorial The goal of this tutorial is to introduce concepts and methods of S4 classes in R. We will start with a brief overview of S3 classes and introduce S4 classes in comparison with their S3 counterparts. As an example we will implement a class which could represent a time series. This object will hold a data part (matrix), timestamps (timedate) and additional information in the form of data.frame s that we will call flag s. > library(timedate) > time <- as.character(timesequence(length.out=4)) > data <- matrix(round(rnorm(8), 3), ncol = 2) > colnames(data) <- c("col1", "col2") > flag <- data.frame(flag = sample(c("m", "F"), 4, replace = TRUE))
Outline 1 Introduction 2 S3 Classes/Methods 3 S4 Classes/Methods
S3 classes An S3 class is defined by the special attribute class, a character string vector. In our case the, Bull class. 1 > bull <- data > attr(bull, "class") <- "Bull" > bull col1 col2 [1,] -2.123 1.126 [2,] 2.833-0.721 [3,] 0.921-1.976 [4,] -0.917-1.425 attr(,"class") [1] "Bull" > bull <- data > class(bull) <- "Bull" > bull col1 col2 [1,] -2.123 1.126 [2,] 2.833-0.721 [3,] 0.921-1.976 [4,] -0.917-1.425 attr(,"class") [1] "Bull" Note the class() function to define a class. 1 imagine bulls and cows mooing in the field next to the conference room
S3 classes Now we add new attributes for the timestamps and the additional information flag. > attr(bull, "time") <- time > attr(bull, "flag") <- flag > bull col1 col2 [1,] -2.123 1.126 [2,] 2.833-0.721 [3,] 0.921-1.976 [4,] -0.917-1.425 attr(,"class") [1] "Bull" attr(,"time") [1] "2009-06-01 06:04:46" "2009-06-02 06:04:46" [3] "2009-06-03 06:04:46" "2009-06-04 06:04:46" attr(,"flag") flag 1 F 2 F 3 M 4 F
S3 methods In the world of S3 classes, methods of generic functions can be defined with a new functions named according to the scheme <generic name>.<class>. Here a generic function is a function which dispatches the S3 method with UseMethod(). There are some functions which are S3 generics. For example print(), plot(),... Note S3 methods only dispatch on the type of the first argument. If no method is found, the default methods is used (<generic name>.default). > print function (x,...) UseMethod("print") <environment: namespace:base> > head(methods(print)) #-> too many methods [1] "print.acf" "print.anova" "print.aov" [4] "print.aovlist" "print.ar" "print.arima"
S3 methods Let s define a print() method for our class "Bull" > print.bull <- function(x,...) { y <- matrix(c(x), ncol = ncol(x)) dimnames(y) <- list(as.character(attr(x, "time")), colnames(x)) cat("meielisalp\n") print(y) invisible(x) } > bull Meielisalp col1 col2 2009-06-01 06:04:46-2.123 1.126 2009-06-02 06:04:46 2.833-0.721 2009-06-03 06:04:46 0.921-1.976 2009-06-04 06:04:46-0.917-1.425
S3 generic Let s define a new generic function with its default method > dinner <- function(x,...) UseMethod("dinner") > dinner.default <- function(x,...) cat("a Swiss Fondue\n") This will give with our class > dinner(bull) A Swiss Fondue and with a defined method for the class Bull. > dinner.bull <- function(x,...) cat("hay!!\n") > dinner(bull) Hay!!
S3 group generic There are group generic methods for a specified group of functions Math, Ops, Summary and Complex. > methods("math") [1] Math.data.frame Math.Date Math.difftime [4] Math.factor Math.POSIXt > getgroupmembers("math") [1] "abs" "sign" "sqrt" "ceiling" "floor" [6] "trunc" "cummax" "cummin" "cumprod" "cumsum" [11] "exp" "expm1" "log" "log10" "log2" [16] "log1p" "cos" "cosh" "sin" "sinh" [21] "tan" "tanh" "acos" "acosh" "asin" [26] "asinh" "atan" "atanh" "gamma" "lgamma" [31] "digamma" "trigamma"
S3 inheritance S3 classes indirectly inherits the methods of its data part because the S3 objects is just an R object with attributes. More than one string can be added in the class attributes if one wants to share common properties between different classes. A good example are the classes : POSIXct, POSIXlt and POSIXt. > class(sys.time()) [1] "POSIXt" "POSIXct" > class(as.posixlt(sys.time())) [1] "POSIXt" "POSIXlt"
S3 Classes - Key Functions class() methods() UseMethod() NextMethod() Defines the class attribute Lists S3 methods for a class Generic function mechanism Invokes the next method
Drawbacks of S3 Classes It does not check the consistency of the class. It has no control on inheritance. S3 methods can only dispatch on the first argument. By the time S4 classes were introduced there were too many software implemented in S3 style. We have to live with both worlds.
Outline 1 Introduction 2 S3 Classes/Methods 3 S4 Classes/Methods
S4 Classes A new class can be created with the function setclass(). It defines metadata with information about the new classes. setclass() requires the type of all components of the class. It ensure the consistency of the class. > setclass("cow", representation(data = "matrix", time = "character", flag = "data.frame")) [1] "Cow" > # class metadata >. C Cow Class "Cow" [in ".GlobalEnv"] Slots: Name: data time flag Class: matrix character data.frame
S4 Classes New instance of classes can be created with the function new(). > cow <- new("cow", data = data, time = time, flag = flag) > cow An object of class "Cow" Slot "data": col1 col2 [1,] -2.123 1.126 [2,] 2.833-0.721 [3,] 0.921-1.976 [4,] -0.917-1.425 Slot "time": [1] "2009-06-01 06:04:46" "2009-06-02 06:04:46" [3] "2009-06-03 06:04:46" "2009-06-04 06:04:46" Slot "flag": flag 1 F 2 F 3 M 4 F
S4 Classes The structure of the class can be inspected with the str() function. > str(cow) Formal class 'Cow' [package ".GlobalEnv"] with 3 slots..@ data: num [1:4, 1:2] -2.123 2.833 0.921-0.917 1.126.......- attr(*, "dimnames")=list of 2......$ : NULL......$ : chr [1:2] "col1" "col2"..@ time: chr [1:4] "2009-06-01 06:04:46" "2009-06-02 06:04:46" "2009-06-03 06..@ flag:'data.frame': 4 obs. of 1 variable:....$ flag: Factor w/ 2 levels "F","M": 1 1 2 1
S4 slots A class representation is organized in slots which can be accessed by the operator @ : > cow@data col1 col2 [1,] -2.123 1.126 [2,] 2.833-0.721 [3,] 0.921-1.976 [4,] -0.917-1.425 > cow@data <- data
restriction on the type of object in slots When a slot is assigned, the object is automatically checked for a valid slot type. For instance, if we try to assign a character vector to our @flag slot which is of type data.frame, we get an error. > cow@flag <- "bad" Error in checkslotassignment(object, name, value) : assignment of an object of class "character" is not valid for slot "flag" in an object of class "Cow"; is(value, "data.frame") is not TRUE
Inheritance In our definition of the Cow class, there is no inheritance method. Trying to use a generic function like + will throw an error. > cow + 1 Error in cow + 1 : non-numeric argument to binary operator
Inheritance But we could have defined the class with the contains argument in setclass(). Let s redefine our class such that it inherits from the class matrix. > setclass("cow", representation(time = "character", flag = "data.frame"), contains = "matrix") [1] "Cow" > cow <- new("cow", data, time = time, flag = flag) > cow + 1 An object of class "Cow" col1 col2 [1,] -1.123 2.126 [2,] 3.833 0.279 [3,] 1.921-0.976 [4,] 0.083-0.425 Slot "time": [1] "2009-06-01 06:04:46" "2009-06-02 06:04:46" [3] "2009-06-03 06:04:46" "2009-06-04 06:04:46" Slot "flag": flag
Inheritance Note a class inheriting from another class must have all slots from its superclass, and may define additional slots. S4 classes cannot inherits from S3 classes unless they have been redefined with the setoldclass() function. > getclass("cow") Class "Cow" [in ".GlobalEnv"] Slots: Name:.Data time flag Class: matrix character data.frame Extends: Class "matrix", from data part Class "array", by class "matrix", distance 2 Class "structure", by class "matrix", distance 3 Class "vector", by class "matrix", distance 4, with explicit coerce
S4 Validity Checks We can also define validity checks with setvalidity(). > validitycow <- function(object) { if (nrow(object@flag)!= nrow(object)) return("length of '@flag' not equal to '@.Data' extent") TRUE } > setvalidity("cow", validitycow) Class "Cow" [in ".GlobalEnv"] Slots: Name:.Data time flag Class: matrix character data.frame Extends: Class "matrix", from data part Class "array", by class "matrix", distance 2 Class "structure", by class "matrix", distance 3 Class "vector", by class "matrix", distance 4, with explicit coerce
S4 Validity Checks Now we define our own initialize() method to ensure that objects created with new() are valid. > setmethod("initialize", "Cow", function(.object,...) { value <- callnextmethod() validobject(value) value }) [1] "initialize" > new("cow", data, flag = data.frame(flag[1:3,])) Error in validobject(value) : invalid class "Cow" object: length of '@flag' not equal to '@.Data' extent
S4 Methods As you have just seen in the previous chunk, S4 methods are defined with setmethod(). Let s write a show() method for our class. > setmethod("show", "Cow", function(object) { value <- getdatapart(object) rownames(value) <- as.character(slot(object, "time")) flag <- as.matrix(slot(object, "flag")) colnames(flag) <- paste(colnames(flag), "*", sep ="") cat("meielisalp\n") print(cbind(value, flag), right = TRUE, quote = FALSE) }) [1] "show" > cow Meielisalp col1 col2 flag* 2009-06-01 06:04:46-2.123 1.126 F 2009-06-02 06:04:46 2.833-0.721 F 2009-06-03 06:04:46 0.921-1.976 M 2009-06-04 06:04:46-0.917-1.425 F
S4 Generic S4 generics are defined with setgeneric() and standardgeneric(). > setgeneric("cowseries", function(x, time, flag,...) standardgeneric("cowseries")) [1] "cowseries" Unlike S3 methods, the S4 setmethod() can turn any existing function to a generic, except primitive functions. Dispatch on primitive functions is implemented in C level and most of the primitive functions in R have it. One can also define group generics with setgroupgeneric() or use the predefined groups : Arith, Compare, Ops, Logic, Math, Math2, Summary, Complex.
Multiple dispatch S3 methods are only dispatch on the first argument. You often need many if... else... in your code when you are dealing with different argument types. > graphics:::plot.factor function (x, y, legend.text = NULL,...) { if (missing(y) is.factor(y)) { dargs <- list(...) axisnames <- if (!is.null(dargs$axes)) dargs$axes else if (!is.null(dargs$xaxt)) dargs$xaxt!= "n" else TRUE } if (missing(y)) { barplot(table(x), axisnames = axisnames,...) } else if (is.factor(y)) { if (is.null(legend.text)) spineplot(x, y,...) else { args <- c(list(x = x, y = y), list(...)) args$yaxlabels <- legend.text
Multiple dispatch With S4 methods you can define the type of all argument and also the special types ANY and missing. > setmethod("cowseries", signature("matrix", "character", "data.frame"), function(x, time, flag,...) new("cow", x, time = time, flag = flag)) [1] "cowseries" > cowseries(data, time, flag) Meielisalp col1 col2 flag* 2009-06-01 06:04:46-2.123 1.126 F 2009-06-02 06:04:46 2.833-0.721 F 2009-06-03 06:04:46 0.921-1.976 M 2009-06-04 06:04:46-0.917-1.425 F
Multiple dispatch > setmethod("cowseries", signature("matrix", "POSIXct", "character"), function(x, time, flag,...) { time <- as(time, "character") flag <- as.data.frame(flag) callgeneric(x, time, flag,...) }) [1] "cowseries" > timect <- seq(from = Sys.time(), to = (Sys.time() + 4*3600), length.out = 4) > flagstr <- as.character(flag[[1]]) > cowseries(data, timect, flagstr) Meielisalp col1 col2 flag* 2009-06-30 08:04:47-2.123 1.126 F 2009-06-30 09:24:47 2.833-0.721 F 2009-06-30 10:44:47 0.921-1.976 M 2009-06-30 12:04:47-0.917-1.425 F
Object Conversion as() can be used to convert an object to another class > as(cow, "matrix") col1 col2 [1,] -2.123 1.126 [2,] 2.833-0.721 [3,] 0.921-1.976 [4,] -0.917-1.425 and one can defined conversion methods with setas(). Let s define a more appropriate as() method for our class : > setas("cow", "matrix", function(from) { value <- getdatapart(from) rownames(value) <- as.character(slot(from, "time")) value }) [1] "coerce<-" > as(cow, "matrix") col1 col2 2009-06-01 06:04:46-2.123 1.126 2009-06-02 06:04:46 2.833-0.721 2009-06-03 06:04:46 0.921-1.976 2009-06-04 06:04:46-0.917-1.425
What is an S4 class in R? S4 slots are actually attributes and, in low level, S4 objects has a special S4 bit > attrscow <- attributes(cow) > madcow <- data > attributes(madcow) <- attrscow > ass4(madcow) Meielisalp col1 col2 flag* 2009-06-01 06:04:46-2.123 1.126 F 2009-06-02 06:04:46 2.833-0.721 F 2009-06-03 06:04:46 0.921-1.976 M 2009-06-04 06:04:46-0.917-1.425 F BUT! You have to promise that you will never use such a trick!
What is an S4 class in R? S4 slots are actually attributes and, in low level, S4 objects has a special S4 bit > attrscow <- attributes(cow) > madcow <- data > attributes(madcow) <- attrscow > ass4(madcow) Meielisalp col1 col2 flag* 2009-06-01 06:04:46-2.123 1.126 F 2009-06-02 06:04:46 2.833-0.721 F 2009-06-03 06:04:46 0.921-1.976 M 2009-06-04 06:04:46-0.917-1.425 F BUT! You have to promise that you will never use such a trick!
S4 Classes - Key functions setclass() new() setgeneric() setmethods() as() / setas() @ / slot() setvalidity() / validobject() getclass() / showmethods() / getmethod() define classes create objects define generics define methods convert objects access slots check object validity access registry
References I J.M. Chambers Software for data analysis: programming with R Springer, 2008. R Development Core Team?Clasees and?methods manual pages 2009.
> tolatex(sessioninfo()) R version 2.10.0 Under development (unstable) (2009-06-23 r48824), i686-pc-linux-gnu Locale: LC_CTYPE=en_US.UTF-8, LC_NUMERI... Base packages: base, datasets, graphics, grdevices, methods, stats, utils Other packages: timedate 2100.86 Loaded via a namespace (and not attached): tools 2.10.0
Hands on S4 Classes Yohan Chalabi ITP ETH, Zurich Rmetrics Association, Zurich Finance Online, Zurich R/Rmetrics Workshop Meielisalp June 2009