A Federated Model for Secure Web-Based Videoconferencing

Douglas C. Sicker, Ameet Kulkarni, Anand Chavali, and Mudassir Fajandar
Interdisciplinary Telecommunications Dept. and Dept. of Computer Science
University of Colorado at Boulder
Emails: douglas.sicker@colorado.edu; ameet.kulkarni@colorado.edu; anand.chavali@colorado.edu; and mudassir.fajandar@colorado.edu

Abstract

This paper describes efforts underway within Internet2 to create a secure, federated, IP-based videoconferencing model. The objective is to create an environment that is user-friendly, ensures user privacy, and simplifies user management. This model makes use of the Session Initiation Protocol (SIP) as the underlying session establishment protocol. Since the session can (and most often will) be between domains, securing the process will involve inter-realm authentication and authorization, which gives rise to a host of issues such as user privacy and authorization granularity. To address these issues, we make use of a federated trust model for sharing resources based on Shibboleth and the Security Assertion Markup Language (SAML), an XML-based security standard that describes the format and exchange of authentication and authorization information, such as identity, attributes, and artifacts.

1. Introduction

Videoconferencing has failed to become as ubiquitous as many had hoped and predicted. Recently, the development of low-rate video codecs, the proliferation of the Internet, the web, and personal computers, and the advent of high-rate access technologies have reduced some of the obstacles contributing to this failure. However, there are still a number of fundamental problems with the use and operation of videoconferencing. For videoconferencing to become more ubiquitous, it needs to become easier to deploy, manage, and use. It should also be secure, particularly with respect to the requirements of inter-realm communications.
The model should focus on security from its inception, rather than apply such functions as afterthoughts. Such cross-domain authentication and authorization processes should satisfy certain requirements and not burden users or network administrators. To support a federated model, control is delegated: each network domain remains in control of the information about the users in its domain. This matches the general practice of network administrators of keeping local information within the domain, and it reduces administrative effort, maintenance, and investment. Further, it minimizes the concerns of exposing or releasing information that might be viewed as private. The last requirement is that these processes be transparent to the user, needing as little action from the user as possible. The environment should be as easy to use and as familiar as browsing the web.

This paper is organized as follows. First, we provide an overview of some relevant background material. Next, we describe our approach to solving this problem, with a focus on the security required for such a model. This entails describing the necessary protocol changes, including extending SIP response messaging, revising client behavior, creating a new role for MIME, and specifying a new binding for SAML. Finally, we present our conclusions and future work.

2. Background

In this section, we briefly describe some background material relevant to this research.

1 Sponsoring Agency: UCAID/Internet2, Project Title: Supporting Research and Collaboration through Integrated Middleware, Proposal No.: 0302.12.0476B.

2.1. SIP

SIP is a protocol used for locating endpoints and subsequently establishing, maintaining, and terminating sessions between these endpoints. It operates by exchanging request messages, called methods, and responses to these methods. A SIP network essentially consists of SIP user agents that initiate requests and servers that reply. While this is an oversimplification of SIP, a detailed explanation can be found in RFC 3261. [1]

2.2. Federation

Network resources exist as islands, controlled and maintained by a network authority, typically a network administrator. This control of resources includes access control mechanisms in the form of authentication and authorization. A problem arises when someone from outside a particular realm wishes to access a resource for which he or she has no authorization. Resources may be perceived as ranging from public to highly restricted, which suggests the need for granularity of access control. One means of providing this authorization is through an agreement between the user and the realm in which the resource exists. The problem with this approach is that the network authority controlling the resource must now maintain information, such as a username and password, for each foreign user. This can quickly become a burden for the network authority as the number of foreign users increases.

An alternative is to create a mutual agreement between realms, explicitly for the sharing of resources between them. This is the federation, where access is controlled jointly by adopting certain trust agreements between realms. The user must trust the sharing of identifiable user information in order to access the remote resource. This raises several opportunities to exploit that user's privacy. An alternative would be to assert an attribute (e.g., an authority level such as professor, researcher, or student) and have this attribute examined by the authority of the remote resource.
The remote authority may examine the authenticity of this assertion and make a decision regarding access. The authority need not maintain a separate access control list for each remote user, and the remote user exposes less personal information across the network. A federated model brings together parties with common interests while offering them protection at different levels, both between themselves and from others.

2.3. SAML

SAML is an XML-based framework for exchanging security information. This security information is expressed in the form of assertions about subjects, where a subject is an entity (either human or computer) that has an identity in some security domain. Assertions can convey information about authentication acts performed by subjects, attributes of subjects, and authorization decisions (already made) about whether subjects are allowed to access certain resources. The protocol, consisting of XML-based request and response message formats, can be bound to many different underlying communication and transport protocols. SAML currently defines one binding, SOAP over HTTP. [2] We are presently working on developing a SIP binding and profile for SAML.

2.4. Shibboleth

Shibboleth is an Internet2/MACE project that is developing architectures, frameworks, and practical technologies to support inter-institutional sharing of resources that are subject to access controls; it is based on SAML. One difference between Shibboleth and other efforts in the access control arena is Shibboleth's emphasis on user privacy and control over information release. Shibboleth is a system for securely transferring attributes about a user from the user's origin site to a resource provider site. It assumes that users employ browsers and that the resources are accessible via standard browser technologies. Shibboleth is also a system for allowing user choice in what information gets released about the user and to which site.
Thus, the job of balancing access and privacy lies ultimately with the user, where it belongs.

An important element of the Shibboleth architecture is the component that releases information about users, the Attribute Authority (AA). Each origin site (i.e., a site with administrative authority over users who access resources at remote providers) has its own AA. The AA's job is to provide attributes about a user to a resource provider. The AA is also responsible for providing a means for users to specify exactly which of their allowable attributes are sent to each site they visit. The Handle Service (HS) is another Shibboleth component that resides at the origin site. It is a web-based service that creates "handles" for attribute queries about a user without revealing the user's identity, thus guarding the user's privacy. This handle is then used to obtain the attributes of the user requesting access.
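
As an illustration of how the Handle Service and the Attribute Authority divide the work, the following is a minimal in-memory sketch in Python. It is not the Shibboleth implementation: the class name OriginSite, its methods, and the attribute names are hypothetical, and a real deployment would exchange signed SAML messages rather than call methods directly.

```python
import secrets


class OriginSite:
    """Toy origin site combining a Handle Service and an Attribute Authority.

    All names here are illustrative assumptions, not Shibboleth APIs.
    """

    def __init__(self):
        self._attributes = {}      # user -> {attribute name: value}
        self._release_policy = {}  # (user, provider) -> set of releasable names
        self._handles = {}         # opaque handle -> user

    def add_user(self, user, attributes):
        self._attributes[user] = dict(attributes)

    def set_release_policy(self, user, provider, attribute_names):
        # The user chooses which attributes a given provider may see.
        self._release_policy[(user, provider)] = set(attribute_names)

    def issue_handle(self, user):
        # Handle Service role: hand out an opaque, random reference so the
        # provider can query attributes without learning the user's identity.
        if user not in self._attributes:
            raise KeyError(f"unknown user: {user}")
        handle = secrets.token_urlsafe(16)
        self._handles[handle] = user
        return handle

    def query_attributes(self, handle, provider):
        # Attribute Authority role: release only what the policy allows
        # for this particular provider; unknown handles release nothing.
        user = self._handles.get(handle)
        if user is None:
            return {}
        allowed = self._release_policy.get((user, provider), set())
        return {k: v for k, v in self._attributes[user].items() if k in allowed}
```

With a policy that releases only an affiliation attribute to one provider, a query from that provider returns just that attribute, while a query from any other provider returns nothing.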

3. Our Solution

The architecture of our proposed solution is based on three modular functions: resource registration, resource discovery, and call initiation. Resource registration allows a user to register within the local domain. Resource discovery allows a user to locate other users, both within the same domain and in other domains. Call initiation allows a user to set up a session with another user. It is desirable for a solution to be modular, which requires that these three processes be independent of each other; for instance, call initiation can take place without resource discovery. To preserve complete modularity across all three processes, it is necessary to protect each of them separately.

As with any diverse network, securing this service is difficult. It involves many trust boundaries (and relationships), many modes of operation, a reliance on intermediaries, and numerous points of failure. We try to create a model that weighs risk against ease of operation, management, and deployment. To address common security concerns, we make use of the tools that SIP and HTTP employ. [1] [3] These might include digest authentication, user-to-user and proxy-to-user challenges, S/MIME, TLS, IPsec, and SIPS. However, we would like to take this process one step further by applying an inter-realm attribute transfer service based on Shibboleth and making use of SAML as a means of providing secure inter-realm authentication. The goal is to make use of practical security functions while providing a robust level of privacy to the end user. We describe the details of this model in the following sections.

3.1. Resource Registration

A SIP User Agent (UA) registers itself with a Registrar, likely in its local domain. It is this process that creates the mapping between the SIP URI and the IP address of the host on which the SIP UA is running. This allows the network to route calls to the proper destination.
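
The mapping kept on behalf of a Registrar can be pictured with a small in-memory sketch in Python. This is an illustration under stated assumptions, not a SIP stack: the class LocationService and its method names are hypothetical, and message parsing and authentication are omitted. The treatment of an expiry of zero as a removal follows the REGISTER semantics of RFC 3261.

```python
import time


class LocationService:
    """Toy location service: maps a SIP URI to its current contact addresses."""

    def __init__(self):
        self._bindings = {}  # SIP URI -> {contact address: expiry timestamp}

    def register(self, uri, contact, expires=3600, now=None):
        # 'now' can be supplied explicitly to make the sketch easy to test.
        now = time.time() if now is None else now
        contacts = self._bindings.setdefault(uri, {})
        if expires <= 0:
            # An expiry of zero removes the binding.
            contacts.pop(contact, None)
        else:
            contacts[contact] = now + expires

    def lookup(self, uri, now=None):
        # Return only the contacts whose registration has not yet expired.
        now = time.time() if now is None else now
        contacts = self._bindings.get(uri, {})
        return sorted(c for c, expiry in contacts.items() if expiry > now)
```

A proxy routing a call would consult lookup() for the target URI and forward the request to one of the returned contacts.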
Registration creates a binding in the location service for a particular domain that associates a URI with one or more contact addresses. Registration requires sending a REGISTER request to a Registrar, which acts as a front end to the location service for a domain, reading and writing mappings based on the contents of the REGISTER request. SIP provides some basic security mechanisms for call signaling, which are described in RFC 3261. [1] We propose to use the same mechanisms for the registration process.

In our model, once the user has registered, the contact information of that user is pushed to a presence server. The presence server displays the contact information of only those users who are online and available for call setup. Thus, the registration process triggers the population of that user's information on the presence server. Such a presence server could either be centrally managed or distributed. In the centrally managed case, all users' contact information would be stored and managed by a central body. In the distributed case, a central server could exist that would hold information about the different federations and links to the local presence servers. Network administrators may be unwilling to allow information about their users to be displayed outside their domain. Hence, a distributed model may make better sense. The final model may resemble the Instant Messaging and Presence work under way within the IETF. [4]

3.2. Resource Discovery

Resource discovery is the process by which one user determines the location information of another user. In our model, the user browses a web page that displays the presence information of all users. On locating the person or resource with whom the user wants to establish a video session, the user clicks on the hyperlink to that person. This causes the SIP UA to be invoked on the initiating user's machine.
Note that modularity is not disrupted here, because the coupling is optional: only by clicking on the link does the user launch call initiation during resource discovery.

The information on the presence server should be accessible only to those who are authorized to access it. To implement authorization, we propose to use Shibboleth. When an initiator requests a resource from the destination, the destination resource authority seeks attributes of the initiator and, on receiving these attributes, checks them to validate the initiator and accordingly allows or disallows the request. In effect, the initiator does not have to log on multiple times at different destinations. Further, the initiator can set different release policies for different destinations. Hence, this model minimizes multiple sign-on and enforces selective release of information according to the destination endpoint and origin endpoint. [5]

Since Shibboleth was designed for HTTP requests, it fits this part of the model perfectly. The presence server is designated as a protected resource and sits behind the Shibboleth process. Whenever an HTTP request is directed to the presence server, the Shibboleth process intercepts it and requests authorization information from the user. Once the Shibboleth process receives the authorization information and decides that the user is authorized to access the presence server, the HTTP request is forwarded to the server and the user can access the information on it. One of the reasons for protecting the resources on the presence server is to prevent spamming (via resource harvesting) and to protect the privacy of the users whose contact information is displayed.

In our model, when the user clicks on the hyperlink of the target, a metafile is sent to the browser as an HTTP response. The browser, on receiving the metafile, invokes the associated plug-in and passes the metafile to it. The plug-in parses the metafile, invokes the user's SIP UA, and places a call to the target using the SIP URI extracted from the metafile.

3.3. Call Initiation

Call initiation is the process in which the session is set up. It is in this process that we incorporate various SAML mechanisms to secure the call signaling process. The security requirements we are focusing on include authentication and authorization. In this section, we assume that local authentication has already taken place (either during web authentication or during the REGISTER process). The authentication requirements during call initiation consist of conveying this authentication information to the remote domain for authorization purposes. However, we would like to provide more information about the user to the remote domain to allow greater granularity in the authorization process; for example, to allow a remote campus to authorize INVITEs only from faculty members, or only from students of a certain course at a certain time of day. Providing more information about the user allows the remote domain to have more granular authorization policies. For the purpose of dividing authorization functions, Policy Decision Points (PDPs) are defined. We would ideally like to have two institutional PDPs, one at the origin domain and one at the target domain.
In addition, we would also like to define the target user as an individual PDP. Of course, in most real-time communication sessions, the target user is an individual PDP by default, as the ultimate decision to accept or decline a call lies with the user. These decisions are generally made on the basis of some form of caller identity (e.g., a telephone number). When the target user is defined as an individual PDP here, attributes other than the caller ID may be used to make the decision.

There are a couple of other requirements that need to be satisfied for this protected call initiation process. The process should not require any special action on the part of the user, such as password entry. Also, for the security reasons discussed earlier, information about a user in one domain should not be stored in another domain. This rules out mechanisms like directory replication across domains, so the required information should be transferred across domains on a per-need basis. The amount of information transferred across domains about the user should also be in accordance with the privacy policies of the local domain and the user. Thus, the information transferred to the remote domain about the user should be just the minimum required for it to make authorization decisions. The lifetime of that information should also be minimal; however, to avoid reauthorization for multiple sessions from the user to the same remote domain, the lifetime of the authorization decision can be adjusted.

We now make use of an authentication service that verifies the authentication of the user and conveys information about it, along with additional user information in the form of attributes, to the remote domain. This service can be provided by the proxy server. This server needs to communicate with a SAML entity (likely a directory server), which would contain the attributes and release policies associated with that user.
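
The division of authorization between institutional PDPs and the individual PDP, together with an adjustable decision lifetime, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names InstitutionalPDP, faculty_or_course_hours, and authorize_invite, the attribute names, and the course identifier "TLEN5000" are all hypothetical, and the policy shown is only the faculty/course example mentioned earlier in this section.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Attributes = Dict[str, str]


@dataclass
class InstitutionalPDP:
    """Toy institutional PDP that caches each decision for a limited lifetime."""

    rule: Callable[[Attributes, int], bool]
    decision_lifetime: int = 300  # seconds; adjustable per the text above
    _cache: Dict[str, Tuple[bool, int]] = field(default_factory=dict)

    def decide(self, handle: str, attributes: Attributes, hour: int, now: int) -> bool:
        cached = self._cache.get(handle)
        if cached is not None and cached[1] > now:
            # Reuse the earlier decision instead of reauthorizing the caller.
            return cached[0]
        decision = self.rule(attributes, hour)
        self._cache[handle] = (decision, now + self.decision_lifetime)
        return decision


def faculty_or_course_hours(attributes: Attributes, hour: int) -> bool:
    # Hypothetical policy: faculty at any time, or students of one course
    # between 09:00 and 17:00.
    if attributes.get("affiliation") == "faculty":
        return True
    return attributes.get("course") == "TLEN5000" and 9 <= hour < 17


def authorize_invite(pdps, handle, attributes, hour, now, callee_accepts) -> bool:
    # Every institutional PDP must allow the INVITE; the target user, acting
    # as the individual PDP, then makes the final decision from the attributes.
    if not all(p.decide(handle, attributes, hour, now) for p in pdps):
        return False
    return callee_accepts(attributes)
```

Note that only the released attributes and an opaque handle reach the PDPs in this sketch, which is consistent with the requirement that no information about a user be stored permanently in the remote domain.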
The specific manner in which the proxy server and the SAML entities will interface, as well as the directory database structure, is presently being investigated. At the end of the resource registration process, this database is populated with the authenticated user's attributes, to which are added the details of the local authentication for as long as that authentication is valid. These details have to be conveyed to the remote domain by the authentication service (the local proxy). This transfer is done in the form of a MIME body. The contents of this MIME body are discussed in the next section. For now, it suffices to say that these contents are sufficient for proper authorization at the target domain.

There is an important decision to be made here: where this MIME body is attached, at the SIP user agent or at the SIP proxy. From the SIP standpoint, it is attractive to push as much control as possible out to the endpoint. However, given the nature of a federated administration, we require some participation by a local authentication entity. The solution we adopt is for a local authentication entity to pass the body back to the UA, where a new INVITE (including the additional MIME body) is created. The overall requirement for this process is the definition of a new MIME type for conveying authentication information and attributes. New server and user agent behavior also needs to be defined and implemented to attach and process this new MIME type appropriately. This approach is a variation of the method described in [6].
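
To make the shape of such a request concrete, the following Python sketch assembles an INVITE whose body is a multipart/mixed entity carrying the session description alongside the authentication information. It is a sketch only: the new MIME type is not defined in this paper, so "application/x-saml-assertion" is a hypothetical placeholder, as are the function names, host names, tags, and branch values.

```python
# Hypothetical placeholder; the actual MIME type is yet to be defined.
SAML_MIME_TYPE = "application/x-saml-assertion"


def build_multipart_body(parts, boundary):
    """Join (content_type, payload) pairs into a multipart body."""
    lines = []
    for content_type, payload in parts:
        lines.append(f"--{boundary}")
        lines.append(f"Content-Type: {content_type}")
        lines.append("")  # blank line separates part headers from payload
        lines.append(payload)
    lines.append(f"--{boundary}--")  # closing delimiter
    return "\r\n".join(lines) + "\r\n"


def build_invite(target_uri, from_uri, call_id, sdp, assertion_xml,
                 boundary="boundary42"):
    """Build a new INVITE at the UA that includes the additional MIME body."""
    body = build_multipart_body(
        [("application/sdp", sdp), (SAML_MIME_TYPE, assertion_xml)],
        boundary,
    )
    headers = [
        f"INVITE {target_uri} SIP/2.0",
        "Via: SIP/2.0/TLS ua.example.edu;branch=z9hG4bK776asdhds",
        "Max-Forwards: 70",
        f"From: <{from_uri}>;tag=1928301774",
        f"To: <{target_uri}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <{from_uri}>",
        f'Content-Type: multipart/mixed;boundary="{boundary}"',
        # Content-Length counts the bytes of the body, not characters.
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + body
```

In this arrangement the UA never needs to interpret the authentication body; it only attaches what the local authentication entity returned, which keeps control at the endpoint while still involving the local domain.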

4. SIP Bindings for SAML

In the call initiation section, we discussed exchanging SAML information across domains within a MIME body. This MIME body would provide the information needed at the PDPs to make authorization decisions. There are a few challenges to sending SAML assertions as MIME attachments to SIP messages. SAML assertions are presently defined around web profiles, so we need to define a way for them to be carried over to the SIP world. Thus, enhancements are needed that will allow SIP entities to create SAML assertions, interface with SAML entities, package assertions into MIME attachments, unpack and interpret SAML assertions (either directly or indirectly), and make authorization decisions based on them.

We are currently working on defining the SIP bindings and profiles for SAML. We define two profiles, reflecting a push and a pull architecture, that describe the manner in which assertions are exchanged. The difference between the two lies essentially in what is transmitted initially as the MIME attachment in the SIP method: the assertions themselves (push) or a reference to them, called an artifact (pull).

5. Conclusions and Future Work

In this paper, we have described a videoconferencing model that is user-friendly, ensures user privacy through a federated model, and supports network administration with flexible policy decision and enforcement points. The model allows user choice in what information gets released about the user and to which site. Thus, the job of balancing access and privacy lies ultimately with the user, where it belongs.

This paper describes a very high-level architecture. Many of the specifics of this architecture are areas of present and future research. These include the SIP/SAML bindings and profiles, the details of the directory/database design, the implementation, and the interoperability testing.

6. References

[1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, E. Schooler, SIP: Session Initiation Protocol, Network Working Group, RFC 3261, June 2002.
[2] Security Assertion Markup Language (SAML), OASIS, http://xml.coverpages.org/saml.html.
[3] J. Franks, P. Hallam-Baker, J. Hostetler, S. Lawrence, P. Leach, A. Luotonen, L. Stewart, HTTP Authentication: Basic and Digest Access Authentication, Network Working Group, RFC 2617, June 1999.
[4] M. Day, J. Rosenberg, H. Sugano, A Model for Presence and Instant Messaging, Network Working Group, RFC 2778, February 2000.
[5] Shibboleth Project, http://shibboleth.internet2.edu.
[6] J. Peterson, Enhancements for Authenticated Identity Management in the Session Initiation Protocol (SIP), Internet-Draft, SIP WG, October 28, 2002.