SPAM over Internet Telephony and how to deal with it
Diploma thesis - Rachid El Khayari
Supervisor: Prof. Dr. Claudia Eckert, Dr. Andreas U. Schmidt, Nicolai Kuntze
Fraunhofer Institute for Secure Information Technology
O misery, misery, mumble and moan!
Someone invented the telephone,
And interrupted a nation's slumbers,
Ringing wrong but similar numbers.
Ogden Nash (USA)
Acknowledgements
I want to thank Prof. Dr. Claudia Eckert for giving me the opportunity to work on this thesis, Dipl.-Inform. Nicolai Kuntze and Dr. Andreas U. Schmidt for their great support and their trust in my work, and my whole family, including my parents Mohamed and Yamina, my brother Soufian, my brother Samir and his wife Nadya, my little niece Sara, and last but not least my best friend Inesaf and all others who supported me on my way.
Affidavit
I hereby declare that the following diploma thesis "SPAM over Internet Telephony and how to deal with it" has been written only by the undersigned and without any assistance from third parties. Furthermore, I confirm that no sources have been used in the preparation of this thesis other than those indicated in the thesis itself.
Place, Date    Signature
Introduction
In our modern society, telephony has developed into an omnipresent service: people can be reached at any time and anywhere. At the same time, the Internet has emerged as an important communication medium. These facts and the growing availability of broadband Internet access have led to the fusion of the two services. Voice over IP, or VoIP for short, is the term that describes this combination. The advantages of VoIP over classic telephony are location independence, simpler transport networks, the ability to establish multimedia communications, and low costs. Nevertheless, combining two technologies always brings up new challenges and problems that have to be solved.
It is undeniable that one of the most annoying facets of today's Internet is spam. According to different sources, spam accounts for 80 to 90 percent of all e-mail traffic. Security experts expect this problem to spread to VoIP as well. The threat of so-called voice spam, or Spam over Internet Telephony (SPIT), is even more serious than the threat posed by e-mail spam, because the annoyance and disturbance factor is much higher. For instance, a spam e-mail that arrives in the inbox at 4 p.m. is useless but will hardly disturb the user. In contrast, a phone ringing at 4 p.m. causes a much greater disturbance. From the provider's point of view, both spam and voice spam produce unwanted traffic and cause customers to lose trust in the service.
In order to mitigate this threat, different parties have developed different approaches. This thesis focuses on state-of-the-art anti-voice-spam solutions, analyzes them in depth and reveals their weak points. Finally, a SPIT-generating benchmark tool is implemented that attacks the presented anti-voice-spam solutions. With this tool, the administrator of a VoIP network can test how vulnerable his system is.
Contents
Acknowledgements
Affidavit
Introduction
1 Basics
The history of telecommunication
Voice over IP
User Datagram Protocol
Real-time Transport Protocol
RTP Control Protocol
Session Initiation Protocol
SIP Transport
SIP Messages
Client/Server
SIP URIs
SIP Requests
SIP Responses
SIP session establishment
SIP transactions/dialogs
SIP Message layout
Session Description Protocol
User Agent
Registrar
Proxy Server
SIP security mechanisms
SIP Digest Authentication
SIPS (SIP Security)
S/MIME
IPSec
SPAM over Internet Telephony
SPIT versus SPAM
Intuitive SPIT definition
SPIT analysis
Information gathering
SPIT session establishment
SPIT media sending
SPIT summary
SPIT countermeasures and their weaknesses
Device Fingerprinting
Passive Fingerprinting
Active Fingerprinting
Weakness of Device Fingerprinting
White Lists, Black Lists, Grey Lists
Weaknesses of White Lists, Black Lists, Grey Lists
Reputation Systems
Weakness of Reputation Systems
Turing tests, Computational Puzzles
Weakness of Turing tests and Computational Puzzles
Payments at risk
Weakness of Payment at risk
Intrusion Detection Mechanisms, Honey phones
Weakness of Intrusion Detection Mechanisms, Honey phones
Summary
SIP XML Scenario Maker
Technical Basis
Message Editor
SIPp message format
Scenario Editor
Shoot Mode
Using SXSM as attack tool
Device Spoofing
SIP Identity Spoofing
SIP Header Spoofing
Call Rate Adaption
Account Switching
Reputation Pushing or Pulling
SIP Identity Hijacking
CAPTCHA Relay Attack
Conclusions and Outlook
Glossary
List of figures
List of tables
References
1 Basics of presented technology
1.1 The history of telecommunication
People have always searched for ways to communicate over long distances. Optical telegraphs are regarded as the first practical applications of communication over distance, and their principle can be traced back to prehistoric times. To send messages, optical signals such as light or smoke were emitted according to an agreed code, so that the recipient could see them from afar. The electric telegraph was based on the same principle and transmitted messages over electric wires. In the mid-1800s, Samuel Morse and Alfred Vail developed a telegraph system together with an easy-to-use code (Morse code). This led to the success of telegraphy in America, and long-distance lines were built across the country.
Only a few decades after telegraphy had revolutionized telecommunications, the history of telephony began in the 1870s with the invention of the telephone. The forefathers of the telephone, among them Antonio Meucci, Johann Philipp Reis, Alexander Graham Bell and Elisha Gray, shared a clear vision of people being able to talk to each other over distance. Philipp Reis's first prototype of a telephone was built as an attachment to the existing telegraph network. The telegraph network was the common data communication network, and Reis's invention made it possible to transport voice over the same electrical wires as an alternative.
Analog telephony is as old as the telephone itself. The first devices were physically connected through a wire, and the voice was transported by modulating electric signals on this wire. The first telephone exchange started operating in 1878 in New Haven. Its central office had a very simple switchboard, and connections had to be set up manually by an operator, who asked the caller for the destination of the call and connected the lines of caller and callee.
Manual switching soon reached its limits as the number of participants grew. This led to the development of automated switching systems at the turn of the century, which replaced the operators and had to fulfil the same tasks. The caller signalled call initiation by picking up the phone and dialling the number of the destination. Based on the pulses generated by the dialled digits, the electromechanical switches selected which lines had to be connected to establish the call. This type of negotiation is referred to as in-band signalling, because the signalling for call establishment and the voice are sent over the same wire.
In parallel to the analog telephone network, telex (teleprinter exchange) systems were developed, with which written messages could be transported over wire lines. The telephone network and the telex network coexisted; in Germany, for example, end users needed two connections, one for the telephone and one for telex. The further evolution of the telephone network proceeded from electromechanical switching systems to digital electronic switching
systems in the late 1970s. The transition from analog to digital techniques in telephony led to the development of ISDN (Integrated Services Digital Network), a telephone network system that upgraded the existing analog system. End-to-end digital transmission could be realized, and voice and data services could be transmitted over the same network. Nevertheless, the Public Switched Telephone Network (PSTN) remained a circuit-switched network as far as the communication channels are concerned: a channel with fixed bandwidth was reserved between the communication partners, as if they were physically connected through a wire. With the rise of Internet technology, telephony moved from the circuit-switched to the packet-switched communication paradigm, which led to the development of Voice over IP.
1.2 Voice over IP
Voice over IP is a generic term for multimedia services that perform signalling and media transport over the Internet Protocol. Multimedia sessions are communication sessions such as Internet telephony, conferences and similar applications, in which different media like audio, video, text messages or data are transmitted. A multimedia session over the Internet or another IP-based network (an IP-based communication) can only be achieved by transmitting IP packets via the Internet Protocol. The main challenge in this scenario is that the Internet Protocol is connectionless, whereas telephony is connection-oriented by definition: to enable two or more participants to communicate, a session has to be established, media has to be exchanged, and finally the session has to be terminated. This can only be achieved with the aid of additional protocols for media transport and session handling. A complete (vertical) communication stack covers all layers of the Open Systems Interconnection Basic Reference Model (OSI Reference Model).
Typically, these architectures include protocols such as the Real-time Transport Protocol (RTP) (RFC 1889, 3550), the User Datagram Protocol (UDP) (RFC 768), the Internet Protocol (IP) (RFC 791) and at least one layer 2 and one layer 1 protocol. For call signalling and bearer control, additional protocols are needed. In our scenario, the Session Initiation Protocol (SIP) (RFC 3261) and the Session Description Protocol (SDP) (RFC 2327, 4566), which describes multimedia sessions, are included in the communication stack. The combination of all the protocols above (which will be discussed in detail later) is called the SIP-Protocol-Stack, as displayed in figure 1.1 on page 10. For comparison, the figure also shows applications based on the Hypertext Transfer Protocol (HTTP) (RFC 2616).
1.3 User Datagram Protocol
The User Datagram Protocol (RFC 768) is a simple connectionless transport protocol on top of the Internet Protocol. As a transport protocol, it belongs to the Transport Layer of the OSI Reference Model. UDP datagrams are transported as fast as possible, without any guarantee of delivery or of delivery in the correct order.
Figure 1.1: SIP-Protocol-Stack
This makes UDP especially useful for real-time communication: in telephony, for example, dropped packets are preferable to delayed packets. Looking back at figure 1.1 on page 10, we can see that RTP sits on top of UDP, which means that media transport is performed by RTP over UDP. We can also see that SIP can be used with UDP or alternatively with TCP. UDP is in fact often the better choice, because SIP already provides its own mechanisms for retransmission and sequence control; therefore even call signalling and bearer control messages are typically sent with SIP over UDP.
The main tasks of UDP are the partitioning of data into datagrams, checksumming of header and payload, and session multiplexing. Session multiplexing is achieved with port numbers. We distinguish three types of ports: Well Known Ports (ports fixed to higher-layer protocols, e.g. port 53 for the Domain Name System (DNS)), Registered Ports (ports that can be registered by companies) and Dynamic Ports (ports that are not bound to a specific protocol and can be used dynamically). Well Known Ports are only relevant on the server side. For example, a DNS server listens on the Well Known Port 53 (UDP), so a client that wants to query a DNS server sends its request to UDP port 53 of the server. To receive the response, the client uses a dynamically assigned port as the source port of its request, and the server sends its response back to that port. This makes it possible for a client to handle several parallel connections to the same server. To let the server distinguish between different clients, the client's IP address is used as an additional differentiating factor. Figure 1.2 shows how a UDP datagram is built. Its header contains four fields:
Figure 1.2: UDP Datagram
Source Port: The first and second octets contain the source port of the sending process. Replies are sent to this port in the absence of any other information.
Destination Port: Octets 3 and 4 contain the destination port on the targeted machine.
Length: Octets 5 and 6 contain the length of the whole UDP datagram including the header, measured in octets.
Checksum: Octets 7 and 8 contain a checksum. It is computed over a pseudo header made up of parts of the IP header (source and destination address, protocol number and UDP length), the UDP header and the payload.
1.4 Real-time Transport Protocol
The Real-time Transport Protocol (RFC 3550) is a connectionless transport protocol. Although it can be regarded as part of the Transport Layer of the OSI Reference Model, it typically runs on top of UDP and is tightly linked to the application, so it is often assigned to the Application Layer instead. RTP provides end-to-end delivery services for data with real-time characteristics, such as interactive audio and video, and is therefore well suited for media transport in VoIP scenarios. These services include payload type identification, sequence numbering, timestamping and delivery monitoring. However, RTP does not provide any mechanism that guarantees timely or in-order delivery or any other quality aspect. RTP only helps the receiver detect the order in which the datagrams were originally sent, so that the receiving application can put them back in the correct order. With RTP, data can be transferred between one sender and one receiver (unicast) as well as between one sender and several receivers (multicast), which makes it simple to establish (audio/video) conferences. For every direction of transfer, a so-called RTP session is established, which is characterized by an identifier called the Synchronization Source (SSRC) and a UDP port.
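The sequence numbering mentioned above is what allows a receiver to restore the sending order. The following Python sketch is illustrative only and is not part of the thesis: it decodes the fixed 12-byte RTP header of RFC 3550 (the fields shown in Figure 1.3) and sorts a batch of received packets by sequence number, treating the 16-bit counter as wrapping around. The names `parse_rtp_header`, `seq_distance` and `reorder` are hypothetical, and the sketch assumes that all packets in a batch lie within 32768 sequence numbers of each other; CSRC lists, header extensions and padding are ignored.

```python
import struct
from dataclasses import dataclass
from typing import List

RTP_HEADER = struct.Struct("!BBHII")  # fixed 12-byte header, network byte order


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed part of an RTP header (assumption: no validation beyond length)."""
    if len(packet) < RTP_HEADER.size:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = RTP_HEADER.unpack_from(packet)
    return RtpHeader(
        version=b0 >> 6,            # V: 2 bits
        padding=bool(b0 & 0x20),    # P: 1 bit
        extension=bool(b0 & 0x10),  # X: 1 bit
        csrc_count=b0 & 0x0F,       # CC: 4 bits
        marker=bool(b1 & 0x80),     # M: 1 bit
        payload_type=b1 & 0x7F,     # PT: 7 bits
        sequence=seq,               # 16 bits
        timestamp=ts,               # 32 bits
        ssrc=ssrc,                  # 32 bits
    )


def seq_distance(a: int, b: int) -> int:
    """Signed distance from sequence number a to b on the 16-bit circle."""
    return ((b - a + 0x8000) % 0x10000) - 0x8000


def reorder(packets: List[bytes]) -> List[bytes]:
    """Sort packets into sending order, tolerating sequence number wrap-around."""
    if not packets:
        return []
    reference = parse_rtp_header(packets[0]).sequence
    return sorted(
        packets,
        key=lambda p: seq_distance(reference, parse_rtp_header(p).sequence),
    )
```

For example, packets arriving with sequence numbers 65535, 1, 0 and 65534 are returned in the order 65534, 65535, 0, 1. Real receivers usually do this incrementally in a jitter buffer rather than by sorting a complete batch; the batch version simply keeps the wrap-around arithmetic visible.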
RTP does not use a Well Known Port; it uses a dynamic port with an even number. Figure 1.3 shows the header fields of an RTP datagram:
Figure 1.3: RTP Datagram
Version (V): The first 2 bits contain the RTP version. The value for RTP as defined in RFC 3550 is 2 (decimal).
Padding (P): This one-bit flag indicates whether the payload is followed by padding bytes.
Extension (X): The extension bit indicates whether the fixed RTP header is followed by an optional extension header.
CSRC Count (CC): This 4-bit value contains the number of CSRC identifiers that follow the fixed header (0 to 15).
Marker (M): The interpretation of the marker is defined by a profile. It is intended to allow significant events such as frame boundaries to be marked in the packet stream.
Payload Type (PT): This 7-bit field indicates the type of the transported payload. The receiver needs this information to decode the payload correctly. Some formats are predefined in RFC 3551, e.g. payload type 8 corresponds to PCMA: A-law coded voice at 64 kbit/s.
Sequence Number: This 16-bit field is initialized with a random value at the beginning of an RTP session and incremented by 1 with every packet sent. It is used to detect packet loss and out-of-order delivery.
Timestamp: This field reflects the sampling instant of the first octet in the RTP data packet.
Synchronization Source (SSRC) Identifier: This field contains an identifier that is randomly generated at the beginning of an RTP session.
Contributing Source (CSRC) Identifiers: This list is optional and usually empty (e.g. in a simple unicast call). When an RTP mixer combines several streams, for example in a conference, the CSRC list identifies the participating sources, while the SSRC field identifies only the mixer.
RTP Control Protocol
The RTP Control Protocol (RTCP, RFC 3550) complements RTP with Quality of Service information. As QoS aspects are not relevant in our scenario, RTCP will not be discussed further.
1.5 Session Initiation Protocol
SIP is a standardized signalling protocol defined in Request for Comments (RFC) 3261 by the Internet Engineering Task Force (IETF); it replaces its predecessor RFC 2543. SIP is an application layer protocol used for the establishment, termination, management and coordination of multimedia sessions over the Internet or other IP-based networks. It establishes a connection between two or more participating User Agents (UA). To establish connections, text-based messages are exchanged between clients and servers.
SIP Transport
SIP messages can be transported via UDP or TCP. Most implementations prefer UDP, because SIP itself provides handshake, retransmission and timeout functions to keep the communication going. Using the stateless UDP instead of TCP therefore reduces time and overhead.
SIP Messages
As SIP is a text-based protocol, sessions are established and session parameters are negotiated by sending so-called SIP messages. The signalling information is exchanged according to the client-server principle, and two types of SIP messages are distinguished: SIP requests and SIP responses. Both types consist of a start line, one or more header fields, an empty line indicating the end of the header fields, and an optional message body. The difference is that a request starts with a request line, while a response starts with a status line.
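To illustrate the message structure just described (start line, header fields, empty line, optional body), here is a deliberately simplified Python parser. It is a sketch under stated assumptions, not a complete SIP parser: it assumes CRLF line endings and SIP version 2.0, and it ignores header folding, compact header names and comma-separated multiple values. The class `SipMessage` and the function `parse_sip` are hypothetical names introduced for illustration. Requests and responses are told apart by their start line: a status line begins with the SIP version, a request line with the method.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SipMessage:
    """Simplified view of a SIP request or response (hypothetical helper)."""
    is_request: bool
    method: Optional[str] = None        # requests only, e.g. "INVITE"
    request_uri: Optional[str] = None   # requests only
    status_code: Optional[int] = None   # responses only, e.g. 180
    reason: Optional[str] = None        # responses only, e.g. "Ringing"
    headers: List[Tuple[str, str]] = field(default_factory=list)
    body: str = ""

    def header(self, name: str) -> Optional[str]:
        """Return the first header with the given name (case-insensitive)."""
        for key, value in self.headers:
            if key.lower() == name.lower():
                return value
        return None


def parse_sip(raw: str) -> SipMessage:
    # The first empty line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = []
    for line in header_lines:
        name, _, value = line.partition(":")  # split at the first colon only
        headers.append((name.strip(), value.strip()))

    if start_line.startswith("SIP/2.0 "):
        # Status line: SIP-Version SP Status-Code SP Reason-Phrase
        _, code, reason = start_line.split(" ", 2)
        return SipMessage(False, status_code=int(code), reason=reason,
                          headers=headers, body=body)

    # Request line: Method SP Request-URI SP SIP-Version
    method, uri, version = start_line.split(" ")
    if version != "SIP/2.0":
        raise ValueError(f"unsupported SIP version: {version}")
    return SipMessage(True, method=method, request_uri=uri,
                      headers=headers, body=body)
```

Headers are kept as an ordered list of pairs rather than a dictionary because SIP allows the same header name (e.g. Via) to appear several times in one message.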
Client/Server
Requests are sent from a client to a server; responses, in contrast, are generated by a server and sent to a client. A communication endpoint can act as a User Agent Client (UAC) or as a User Agent Server (UAS); in other words, every UA must be able to generate both requests and responses. The terms User Agent Client and User Agent Server therefore do not refer to network elements, but define the role in which an endpoint acts in a communication.
SIP URIs
A SIP URI (Uniform Resource Identifier) describes the contact address of a SIP endpoint. The syntax of a SIP URI corresponds to the scheme sip:user@host. The user part of the SIP URI is an individual user name, and the host part is an IP address or a domain name. We can distinguish two types of SIP URIs: temporary SIP URIs and permanent SIP URIs. The temporary SIP URI corresponds to the address where the SIP endpoint can be reached directly. Its host part therefore depends on the network in which the endpoint currently resides, e.g. it contains the endpoint's current IP address. A permanent URI, in contrast, is independent of the network where the endpoint resides and is usually assigned by a SIP provider: when a user registers with a SIP Registrar, a permanent URI within the provider's domain is generated. The mapping between permanent and temporary SIP URIs is usually stored in a Location Server, so if, for example, a SIP Proxy needs to know the address where an endpoint can be reached directly, it obtains this information from the Location Server and can then send SIP messages directly to the endpoint.
SIP Requests
SIP requests are SIP messages that initiate the transactions necessary for a communication; each request is characterized by its method. The following list gives an overview of the main methods defined in RFC 3261:
INVITE: The INVITE method initiates the establishment of a SIP session between two communication endpoints.
An INVITE request (together with its SDP body) carries information about session parameters, e.g. the preferred codec. Sending an INVITE request starts the process that leads to session establishment through the exchange of further SIP messages. Sending an INVITE request within an already established session (a re-INVITE) is a common technique for changing session parameters during a communication.
BYE: Sending a BYE request terminates an existing session.
OPTIONS: With an OPTIONS request, an endpoint's capabilities can be queried without establishing a session.
CANCEL: The CANCEL method can be used to cancel a pending request, i.e. a request that has not yet received a final response.
ACK: The ACK (Acknowledgement) method is not really a request in the usual sense, because it confirms the receipt of a final response to an INVITE. It is the only request that is never answered.
REGISTER: The REGISTER method is used by a SIP UA to register itself with a SIP Registrar.
To complete the overview of the methods supported by SIP, the following list shows the extension methods that are not part of RFC 3261:
SUBSCRIBE: The SUBSCRIBE method is described in RFC 3265 and is used to request current state and state updates from a remote node. A subscription can be used, e.g., for presence functions (determining the online status of users).
NOTIFY: The NOTIFY method is also described in RFC 3265. It is the logical counterpart to a SUBSCRIBE or REFER request and conveys the current state of the requested remote node.
REFER: The REFER method is described in RFC 3515 and indicates that the recipient (identified by the Request-URI) should contact a third party using the contact information provided in the request (Third Party Call Control, 3PCC).
MESSAGE: The MESSAGE method is described in RFC 3428 and can be used to send a short text message to the communication partner. Its main purpose is Instant Messaging (IM).
PRACK: The PRACK method is described in RFC 3262; the name is short for Provisional Response Acknowledgement. It is used to acknowledge provisional responses.
UPDATE: The UPDATE method is described in RFC 3311 and is used to change session parameters while the session initiation has not yet been completed.
INFO: The INFO method is described in RFC 2976 and is used to communicate mid-session signalling information along the signalling path of the call. The INFO request is not used to change the state of SIP calls, nor does it change the state of sessions initiated by SIP; rather, it provides additional optional information that can further enhance the application using SIP. One potential use of the INFO request is carrying mid-call PSTN signalling messages between PSTN gateways.
PUBLISH: The PUBLISH request, described in RFC 3903, can be used to publish event state, such as presence information, without a prior subscription.
SIP Responses
SIP responses answer SIP requests: a response acknowledges the receipt of a request and contains the requested information. In contrast to SIP requests, SIP responses are not characterized by a method but by a three-digit status code. In addition to the status code, a SIP response contains a reason phrase that describes the status in words. SIP responses are divided into six classes, which are distinguished by the first digit of the status code. The following lists give an overview of the status codes that can be used in SIP responses.
1xx status codes (provisional responses): These responses are sent for requests that are being processed but not yet completed.
Status code   Reason phrase
100           Trying
180           Ringing
181           Call is being forwarded
182           Queued
183           Session progress
Table 1.1: 1xx status codes
2xx status codes (successful): These responses are sent for requests that have been received and handled successfully.
Status code   Reason phrase
200           OK
202           Accepted
Table 1.2: 2xx status codes
3xx status codes (redirection): These responses are sent for requests that could not be fulfilled completely at the contacted address. The response may contain additional information about the user's location.
Status code   Reason phrase
300           Multiple choices
301           Moved permanently
302           Moved temporarily
305           Use proxy
380           Alternative service
Table 1.3: 3xx status codes
4xx status codes (request failure): If a request cannot be fulfilled by a UAS because of the content of the request, a 4xx response is sent.
Status code   Reason phrase
400           Bad Request
401           Unauthorized
402           Payment required
403           Forbidden
404           Not found
405           Method not allowed
406           Not acceptable
407           Proxy Authentication required
408           Request timeout
410           Gone
413           Request Entity too large
414           Request URI too long
415           Unsupported Media Type
416           Unsupported URI Scheme
420           Bad Extension
421           Extension required
423           Interval too brief
480           Temporarily unavailable
481           Call/Transaction does not exist
482           Loop detected
483           Too many Hops
484           Address incomplete
485           Ambiguous
486           Busy here
487           Request terminated
488           Not acceptable here
489           Bad Event
491           Request pending
493           Undecipherable
Table 1.4: 4xx status codes
5xx status codes (server failure): These responses are sent for requests that could not be fulfilled because of an internal server failure.
Status code   Reason phrase
500           Server internal error
501           Not implemented
502           Bad Gateway
503           Service unavailable
504           Server Timeout
505           Version not supported
513           Message too large
Table 1.5: 5xx status codes
6xx status codes (global failure): If the contacted UAS knows that the request cannot be fulfilled at any server, it generates a 6xx response.
Status code   Reason phrase
600           Busy everywhere
603           Decline
604           Does not exist anywhere
606           Not acceptable
Table 1.6: 6xx status codes
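Because the response class is determined solely by the first digit of the status code, classifying a response amounts to an integer division. A minimal Python sketch follows; the names `RESPONSE_CLASSES`, `response_class` and `is_final` are hypothetical, and the class labels are taken from the headings of Tables 1.1 to 1.6. Following RFC 3261, 1xx responses are treated as provisional and 2xx to 6xx responses as final.

```python
# Response classes keyed by the first digit of the status code
# (labels follow the table headings above).
RESPONSE_CLASSES = {
    1: "provisional",
    2: "successful",
    3: "redirection",
    4: "request failure",
    5: "server failure",
    6: "global failure",
}


def response_class(status_code: int) -> str:
    """Map a three-digit SIP status code to its response class."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a valid SIP status code: {status_code}")
    return RESPONSE_CLASSES[status_code // 100]


def is_final(status_code: int) -> bool:
    """1xx responses are provisional; 2xx to 6xx responses are final."""
    return response_class(status_code) != "provisional"
```

Since only the first digit matters, such a function also handles status codes that are missing from the tables, for example extension codes defined in later RFCs.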
SIP session establishment
A SIP session is typically established with a three-way handshake.
Figure 1.4: SIP Three Way Handshake
As figure 1.4 shows, User Agent A initiates the session establishment by sending an INVITE request to User Agent B. The INVITE request is the first element of the three-way handshake. User Agent B reacts by sending the provisional response 100 Trying back to User Agent A, followed by the provisional response 180 Ringing, which indicates that user B's phone is ringing. As 100 Trying and 180 Ringing are both provisional (optional) responses, they are not considered part of the three-way handshake. As soon as user B picks up the phone, User Agent B generates a 200 OK response and sends it to User Agent A. User Agent A answers with an ACK, which confirms that it has received the 200 OK and is still willing to communicate. Since 200 OK and ACK are the second and third elements of the three-way handshake and all session parameters have been exchanged, the session is established. In our example, User Agent B terminates the session with a BYE request, which User Agent A answers with a 200 OK response.
SIP transactions/dialogs
We can distinguish two main types of communication relations between SIP entities: transactions and dialogs. A SIP transaction is a sequence of SIP messages exchanged between SIP entities that comprises one SIP request and all responses to that request. The initiator of a SIP