Session Initiation Protocol and Services

Harish Gokul Govindaraju
School of Electrical Engineering, KTH Royal Institute of Technology, Haninge, Stockholm, Sweden

Abstract

This paper discusses the Session Initiation Protocol and call setup between user agents in scenarios with a single proxy server and with multiple proxy servers of different domains. The messages and parameters of those calls are analyzed in detail. Call flows with and without record routing are discussed, and the system entities of a SIP network are explained. The Session Description Protocol and its relation to SIP are described. The concept of Instant Messaging and Presence is studied, and the pros and cons of the SIMPLE and Jabber protocols are indicated. The global unique numbering concept, ENUM, is explained with a real-world example.

Keywords: SIP, SDP, Proxy Servers, IMP, SIMPLE, XMPP

I. Introduction

Session Initiation Protocol (SIP) [1] is a general-purpose application-layer control protocol for setting up, modifying, and tearing down sessions. A session is the means by which a source and a destination communicate, as in Internet multimedia conferences, Internet telephone calls, and multimedia distribution. SIP uses proxy servers to forward (route) requests to the current location of a user. These servers let users announce their current location by registering, and they are also responsible for call routing. SIP runs on top of several transport-layer protocols.

SIP works with other protocols to carry the session descriptions and the media streams. Session descriptions allow the endpoints to agree on a set of media types that both support; they are governed by the Session Description Protocol (SDP). The Real-time Transport Protocol (RTP) carries the actual media, such as voice, video, and text messages.

II. Call Flow between Two User Agents and a Proxy Server

In this section we look in detail at call establishment between two user agents, g.harish (sip:g.harish@iptel.org) and g.gokul (sip:g.gokul@iptel.org), connected to the same proxy server, iptel.org. Both use the SIP application SJphone to communicate. Figure 1 shows the successful call flow, covering the basic SIP functions: initial signaling, negotiation of session parameters to establish the session, exchange of media information in the form of SDP payloads, establishment of the media session, and termination of the session once established, as in [1] and [2].

Figure 1: SIP call flow between two user agents via proxy server

A. Dissection of an INVITE Message

The Wireshark packet analyzer is used to capture and study the packets in detail. A SIP packet consists of the Request-Line, the Message Header, and the Message Body. The type of request is given in the Request-Line.

    INVITE sip:g.gokul@iptel.org SIP/2.0
    Via: SIP/2.0/UDP 193.10.39.148;branch=z9hG4bKc10a2794000000764cd125fd000024d300000036;rport
    From: "Harish" <sip:g.harish@iptel.org>;tag=2cb2479100c
    To: <sip:g.gokul@iptel.org>
    Contact: <sip:g.harish@193.10.39.148>
    Call-ID: 5CFE60C3F1AF4F939DAAF44A09BBD9100xc10a2794
    CSeq: 1 INVITE
    Max-Forwards: 70
    User-Agent: SJphone/1.65.377a (SJ Labs)
    Content-Length: 368
    Content-Type: application/sdp
    Supported: replaces,norefersub,timer

    Session Description Protocol Version (v): 0
    Owner/Creator, Session Id (o): - 3497763965 3497763965 IN IP4 193.10.39.148
    Session Name (s): SJphone
    Connection Information (c): IN IP4 193.10.39.148
    Time Description, active time (t): 0 0
    Media Description, name and address (m): audio 49198 RTP/AVP 3 97 98 8 0 101
    Media Attribute (a): rtpmap:3 GSM/8000

The transaction starts with the user g.harish sending an INVITE request addressed to g.gokul's URI (sip:g.gokul@iptel.org).

B. Message Header

The header fields are described below:

Via: holds the address (193.10.39.148) at which g.harish waits to receive responses to the request. The branch parameter in this field identifies the transaction.

From: consists of a display name ("Harish") and a SIP or SIPS URI (sip:g.harish@iptel.org), which indicates the sender of the request. Display names are described in [3]. This header field also contains a tag parameter, a random string (2cb2479100c) added to the URI by the SIP phone for identification purposes.

To: consists of the SIP or SIPS URI (sip:g.gokul@iptel.org) to which the request is addressed.

Contact: specifies a SIP or SIPS URI at which g.harish can be reached directly, usually a username at a fully qualified domain name (FQDN). As per RFC 3261 [1], an FQDN is preferred, but most systems use IP addresses since they don't have registered domain names.

Call-ID: holds a globally unique identifier for this call, formed by combining a random string with the SIP application's host name or IP address.

Dialog: The combination of the To tag, the From tag, and the Call-ID defines a peer-to-peer SIP relationship between g.harish and g.gokul.

Command Sequence: referred to as CSeq, it contains an integer and a method name. The number is incremented for every new request inside a dialog.

Max-Forwards: limits the number of hops a request can make on its way to the destination. It is an integer that is decremented by one at every hop.

User-Agent: carries information about the user agent client where the request originates.

Content-Length: is the size of the message body in bytes.

Content-Type: describes the type of the message body.

The whole set of SIP header fields is explained in [1].

C. Message Body

The message body consists of a Session Description Protocol (SDP) description. It describes the audio/video channel that needs to be established. The SDP fields are generally categorized as mandatory and optional fields.

Mandatory Fields:

v: the protocol (SDP) version number, which is 0 in our case.

o: the owner/creator and an identifier of the session. In our example, 3497763965 is the session ID, while IN and IP4 refer to the Internet network type and IP version 4 addresses.

s: the session name (SJphone).

t: the active session time.

m: the media type, format, and transport address. In our example, audio 49198 RTP/AVP 3 97 98 8 0 101, the media type is audio. 49198 is the RTP port, which should be an even number; the next higher odd port is used by RTCP (Real-time Transport Control Protocol). RTP/AVP denotes the RTP protocol with the profile for "Audio and Video Conferences with Minimal Control" (refer to [4]). The numbers that follow are payload types identifying the codecs, in order of preference. For example, 3 denotes the GSM codec.

Optional Fields:

c: the connection information.

a: the session attributes. For example, a=sendrecv means that media is both sent and received, which is mostly the case with SIP communication.

D. Call Trace Analysis

Referring to Figure 1, we take g.harish as UA1, g.gokul as UA2, and the iptel.org proxy server as PS.
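Before following the trace, the media line from the message body in Section II.C can be unpacked with a few lines of Python. This is only a sketch under stated assumptions: the names MediaLine, parse_media_line, and STATIC_PAYLOAD_NAMES are illustrative and not part of any SIP or SDP library, the parser ignores the optional port-count form of the port field, and the name table covers only the three static payload types of [4] that appear in our offer.

```python
from dataclasses import dataclass
from typing import List

# Static payload types from the RTP/AVP profile that occur in the example offer.
# Dynamic types (96-127) are not listed; they need an a=rtpmap attribute.
STATIC_PAYLOAD_NAMES = {0: "PCMU", 3: "GSM", 8: "PCMA"}


@dataclass
class MediaLine:
    media: str                # e.g. "audio"
    port: int                 # RTP port
    profile: str              # e.g. "RTP/AVP"
    payload_types: List[int]  # in order of preference

    @property
    def rtcp_port(self) -> int:
        # By convention RTCP uses the next higher port after the RTP port.
        return self.port + 1


def parse_media_line(value: str) -> MediaLine:
    """Parse the value of an SDP m= line (the text after 'm=')."""
    media, port, profile, *formats = value.split()
    return MediaLine(media, int(port), profile, [int(f) for f in formats])


m = parse_media_line("audio 49198 RTP/AVP 3 97 98 8 0 101")
for pt in m.payload_types:
    # Unknown numbers are reported as dynamic rather than guessed.
    print(pt, STATIC_PAYLOAD_NAMES.get(pt, "dynamic (see a=rtpmap)"))
```

For the example INVITE this gives port 49198 for RTP and 49199 for RTCP. Payload types 97, 98, and 101 are dynamic and can only be resolved through their a=rtpmap attributes.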
UA1 sends an INVITE request with the SIP URI of UA2 to the proxy server PS. This is a SIP/SDP request: it is a signaling message that also carries the description of the session that UA1 wants to establish with UA2.

PS sends a 407 Proxy Authentication Required response back to UA1, asking it to provide credentials so that the server can authenticate it.

UA1 sends an ACK to the server to acknowledge that it has received the request for login credentials.

UA1 sends the INVITE again, this time with the authentication credentials, to PS.

On successful authentication of UA1, PS sends a 100 Trying response, indicating that it will work on behalf of UA1 to route the INVITE to the destination.

UA2 receives an INVITE from PS, which is actually the request from UA1.

UA2 responds with 180 Ringing to indicate that the phone is ringing while the user decides whether to accept or reject the call. PS routes this 180 Ringing to UA1.

After deciding to accept the call, the user at UA2 picks up the phone, and UA2 signals success with a 200 OK. Had the call been rejected, an error response would have been sent instead. The 200 OK is routed from UA2 to UA1 via PS.

UA1 sends an ACK in response, which UA2 receives via PS.

At this point the media session has begun between UA1 and UA2, and they start transferring media packets in the format they agreed upon by exchanging SDP.

In our example, UA2 wants to end the call and sends a BYE directly to UA1; it no longer needs PS for routing because it has learnt the location of UA1. The BYE can also be sent via PS. This concept is known as Record-Routing, which we discuss later.

UA1 responds with a 200 OK, agreeing to UA2's request to terminate the connection. Once UA2 receives this message, the call between the endpoints is successfully terminated.

III. Signaling in the Two User Agents and Two Proxy Servers of Different Domains Model

Figure 2.1: SIP session establishment through two different proxy servers - SIP Trapezoid

SIP response messages are classified as:

    Informational   1xx
    Success         2xx
    Redirection     3xx
    Client error    4xx
    Server failure  5xx
    Global failure  6xx

For the complete list of response codes and information about them, refer to Section 21 of [1].

A. Call Trace Analysis

The call transaction begins with UA1 making an INVITE request for UA2. UA1 is not aware of the location of UA2 in the IP network, so it passes the request to Proxy Server 1 (PS1), which forwards it to Proxy Server 2 (PS2) for UA2. PS1 also sends a 100 Trying response to UA1 to indicate that it is trying to reach UA2. PS1 knows that it has to forward the request to PS2 through the registration process of SIP.

On receiving the INVITE, PS2 behaves like PS1. It forwards the INVITE request to UA2 (assuming that PS2 knows the location of UA2; if not, the INVITE request would be forwarded to another proxy server) and then issues a 100 Trying response to PS1.
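As a brief aside, the response classes listed at the start of this section, which include the 100 Trying responses just described, depend only on the first digit of the status code. A minimal Python sketch follows; the names RESPONSE_CLASSES and classify_response are our own illustrative choices and do not come from any SIP library.

```python
# Response classes as listed in this section, keyed by the first digit.
RESPONSE_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server failure",
    6: "Global failure",
}


def classify_response(status_code: int) -> str:
    """Return the response class of a SIP status code such as 180 or 407."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a SIP status code: {status_code}")
    return RESPONSE_CLASSES[status_code // 100]


# Status codes that appear in the call flows of this paper.
for code in (100, 180, 200, 407):
    print(code, classify_response(code))
```

Applied to the codes seen so far, 100 and 180 fall in the informational class, 200 in the success class, and 407 in the client error class.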
On receiving the INVITE request, the SIP phone at UA2 starts ringing to announce the call request. It issues a 180 Ringing response back to PS2, which reaches UA1 through PS1.

UA2 can now choose to either accept or decline the call. In our example the call is accepted, as seen in Figure 2.1. A 200 OK response is sent to PS2 when the call is accepted. It reaches UA1 along the same route as the INVITE, in reverse.

UA1 sends an ACK message to confirm the call setup. This three-way handshake (INVITE + OK + ACK) is used for reliable call setup. It is important to note that the ACK message doesn't use the proxy servers to reach UA2, as by now UA1 is aware of the exact location of UA2.

At this point the connection is set up, and media flows between the two user agents in the format that was agreed upon by exchanging session descriptions using SDP.

When UA2 wants to terminate the call, it sends a BYE message to UA1, which responds with a 200 OK message to confirm the teardown of the session. Refer to [2] for a detailed description.

B. SIP Trapezoid with Record Routing

According to [1], record routing is the mechanism by which a proxy, or a list of proxies, stays in the path of the subsequent requests of a dialog. This route set can be learned through the Record-Route message header or it can be configured. Comparing with Figure 2.1, we can see that in Figure 2.2 all the messages go through the proxy servers. This method of passing through the proxy servers is called Record Routing. It is achieved by informing the endpoints about record routing with a Record-Route header field.

Figure 2.2: With Record Routing - SIP Trapezoid

A sample packet with a Record-Route header field is as follows:

    Via: SIP/2.0/UDP 193.10.39.148;branch=z9hG4bKc10a27940000007a4cd12603000045330000003e;rport=5060
    From: "Harish" <sip:g.harish@iptel.org>;tag=2cb2479100c
    To: "Gokul" <sip:g.gokul@iptel.org>;tag=61f3478fbd8
    Contact: <sip:g.gokul@193.10.39.147>
    Call-ID: 5CFE60C3F1AF4F939DAAF44A09BBD9100xc10a2794
    CSeq: 3 BYE
    Content-Length: 0
    Record-Route: <sip:213.192.59.75;ftag=2cb2479100c;avp=orudbwbhy2nvdw50awb5zxmdcqbkawfsb2dfawqwadviotqtngnjODc5ZjYtMWM3YWE2NDMDBgBzdGltZXIEADE4MDA;lr=on>
    Server: SJphone/1.65.377a (SJ Labs)

IV. System Entities

The system entities comprise two important kinds of components, namely clients and servers.

Clients: As per [1], a client is any network element that sends SIP requests and receives SIP responses. User agent clients and proxies are in general termed clients. The SIP application running on a phone or a computer is an example of a client.

Servers: Servers are the network elements that receive requests, process them, and send back responses. Examples of servers are proxies, user agent servers, redirect servers, and registrars.

A User Agent is a logical entity that can act as both a client and a server, depending on the situation.

A. User Agent Client: It is a logical entity that generates a request. Typically the UAC role lasts only for the duration of a transaction, i.e., on initiating a request, the user agent acts as a UAC for the duration of that transaction.

B. User Agent Server: On the other hand, on receiving a request, the same user agent becomes a user agent server for processing that transaction. A UAS is a logical entity that receives and processes a request.

C. Proxy Server: The most important role of a proxy is routing in the network. Generally, the exact address of the destination is not known in advance by the client that generates the request. A proxy server forwards the request on behalf of the client to the destination or to the nearest proxy server.

D. Registrar: The location of the users is essential for SIP communication to happen. Hence users have to register their locations with a registration server by sending REGISTER requests.

E. Redirect Server: It is a UAS that responds with 3xx messages to indicate that the client has to contact an alternate set of URIs.

F. Location Server: It deals with the binding between the logical and physical addresses of the users registered with a Registrar.

V. Purpose of SDP in SIP

SIP works at layer 7 to create, modify, and terminate media sessions such as voice calls, multimedia exchanges, and IP conferences. The voice/video streams in SIP communication are carried by another layer 7 protocol, the Real-time Transport Protocol (RTP). SIP messages carry session descriptions, which have to be exchanged between the end devices before the actual communication starts. This allows the participants to agree on compatible media parameters such as the audio/video codec, encoding information, and connection metadata. This information is transported inside the SIP message body. For an example of how SDP is encapsulated inside a SIP packet, see Section II. Also refer to [6] for more information about SDP.

VI. Presence and Instant Messaging

Presence is a mechanism by which one can sense the ability of another user to communicate. Status messages like Online, Offline, and Busy are examples of Presence. It is generally used to know whether the other party is ready to start a conversation via an instant message. Instant messages are sent when the user hits the send button. They are generally shown in sequential order, grouped together in a window. AOL was one of the earliest IM services in use among web surfers.

A. SIP for Instant Messaging and Presence: Several million users use instant messaging and presence (IMP) programs. Different IMP clients are needed to access the various IMP servers (like AOL and Skype) because they are proprietary. This led the IETF to form the IMP Working Group (WG). SIP for Instant Messaging and Presence Leveraging Extensions (SIMPLE) extends the basic SIP protocol with instant messaging and presence capabilities. Though SIP was originally developed for voice over IP (VoIP), it has matured to support conferencing and other media streams. As stated in [7], SIMPLE [5] is designed so that it can be applied to the Session Initiation Protocol in order to register presence information, receive notifications when events occur, send short messages (SMS), and handle sessions of real-time messages, like streaming, between the involved entities.

B. SIMPLE and JABBER (XMPP), a Comparison: SIMPLE is designed to be more general-purpose than XMPP. It is widely used in voice, video, push-to-talk, and other communication options. It can be used in more applications than just IM and presence, though it was developed as a protocol for signaling purposes. This makes it well suited for single-session traffic such as SMS or IM, but it does not cater well for large data volumes such as video signals. It also lacks popular IM features such as contact lists and group chats.

In contrast, the XML-based Extensible Messaging and Presence Protocol (XMPP) for real-time communication powers many popular applications, including instant messaging, presence, video/voice calls, and multi-party chat.
Jabber, the well-known instant messaging and presence technology, is built on this protocol. Jabber is known to be used by millions of people over the Internet. It doesn't have many of the pitfalls that SIMPLE has. It is said that Jabber is the Linux of IM, with many developers contributing to its betterment. Meanwhile, the popularity of XMPP continues to grow.

VII. ENUM

ENUM (Telephone NUmber Mapping) is a method of representing the various resources of the IP and telephony world under a single unique identifier, a phone number. It is based on the Domain Name System (DNS) design, with telephone numbers mapped to domain names. Before we move on to see how ENUM works, it is necessary to understand what E.164 is.

A. E.164

E.164 is an existing global numbering system administered by the International Telecommunication Union (ITU) and is therefore suitable for use by ENUM. An E.164 number consists of the country code, the area code, and the phone number. For example, the telephone number 43595670 in Chennai, India can be written as +91 44 43595670. Refer to [7] for more information.

B. How ENUM Works

On entering a telephone number, it is converted into E.164 format. Let's take the same example as for E.164. The entered number 43595670 is converted into +91 44 43595670 (Chennai, India).

The leading symbol and the separators are removed, leaving just the numeric digits, i.e., 914443595670.

These digits are then reversed: 076595344419.

The digits are separated by dots: 0.7.6.5.9.5.3.4.4.4.1.9

e164.arpa is the proposed domain for E.164. Therefore the domain e164.arpa is added to the end of the digits: 0.7.6.5.9.5.3.4.4.4.1.9.e164.arpa.

A DNS query is made on this domain and the authoritative name server is found.

Subsequently, NAPTR records are retrieved by ENUM and an action is performed according to the services registered for that number.

VIII. Summary

We have shown how SIP communication happens between user agents connected to proxy servers and analyzed the call trace for the INVITE message in detail, explaining all the fields and parameters. We studied how the session description parameters are carried as SDP inside SIP messages. The differences between the Instant Messaging and Presence (IMP) protocols SIMPLE and XMPP were discussed. We have also seen how the global numbering system, ENUM, works.

IX. References

[1] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., Schooler, E.: SIP: Session Initiation Protocol, RFC 3261, June 2002.
[2] Johnston, A., Donovan, S., Sparks, R., Cunningham, C., Summers, K.: Session Initiation Protocol (SIP) Basic Call Flow Examples, RFC 3665, December 2003.
[3] Resnick, P.: Internet Message Format, RFC 2822, April 2001.
[4] Schulzrinne, H., Casner, S.: RTP Profile for Audio and Video Conferences with Minimal Control, RFC 3551, July 2003.
[5] Campbell, B. (Ed.), Rosenberg, J., Schulzrinne, H., Huitema, C., Gurle, D.: Session Initiation Protocol (SIP) Extension for Instant Messaging, RFC 3428, December 2002.
[6] Handley, M., Jacobson, V., Perkins, C.: SDP: Session Description Protocol, RFC 4566, July 2006.
[7] Faltstrom, P.: E.164 number and DNS, RFC 2916, September 2000.
[8] ENUM, http://www.enum.com
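
As a supplementary sketch of the conversion steps in Section VII.B, the mapping from a telephone number in E.164 format to its ENUM domain name can be written in a few lines of Python. The function name e164_to_enum_domain is our own illustrative choice, and the sketch stops at building the domain name: the DNS query and the NAPTR lookup are not performed. It assumes the input is already a full E.164 number including the country code.

```python
import string


def e164_to_enum_domain(number: str, suffix: str = "e164.arpa") -> str:
    """Build the ENUM domain name for a number given in E.164 format."""
    # Keep only the ASCII digits, dropping '+', spaces and other separators.
    digits = [ch for ch in number if ch in string.digits]
    if not digits:
        raise ValueError("no digits found in number")
    # Reverse the digits, separate them with dots, and append the ENUM domain.
    return ".".join(reversed(digits)) + "." + suffix


print(e164_to_enum_domain("+91 44 43595670"))
# prints 0.7.6.5.9.5.3.4.4.4.1.9.e164.arpa
```

The resulting name is what a resolver would query for NAPTR records in the last two steps of Section VII.B.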