What is SIP?


Session Initiation Protocol (SIP) is a signaling protocol used to create, modify, and terminate sessions with one or more participants. The Internet Engineering Task Force (IETF) published the current core specification, RFC 3261, in 2002; it replaced the original SIP specification, RFC 2543, published in 1999. SIP sessions can include voice communications, instant messaging, and multimedia applications.

SIP is most widely used to set up and tear down Voice over Internet Protocol (VoIP) calls. SIP was originally designed to mimic the call setup and signaling characteristics of the traditional telephone network over an IP infrastructure. A typical SIP session begins with a client sending a request to a SIP server. After receiving the request, the server returns a response indicating whether the session can be established. SIP is text-based and shares several characteristics with HTTP, including its request and response format and its numeric status codes. Users are identified by a SIP address (a SIP URI) that resembles an email address, such as sip:alice@example.com.
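The request and response exchange described above can be sketched in a few lines of Python. This is a minimal illustration, not a working SIP stack: the function names, the addresses, and the branch, tag, and Call-ID values are hypothetical, and a real client would generate unique values and send the message over UDP, TCP, or TLS.

```python
def parse_sip_uri(uri: str) -> tuple[str, str]:
    """Split a simple 'sip:user@host' address into (user, host)."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() not in ("sip", "sips") or "@" not in rest:
        raise ValueError(f"not a simple SIP URI: {uri!r}")
    user, _, host = rest.partition("@")
    return user, host


def build_invite(caller: str, callee: str, call_id: str) -> str:
    """Assemble a minimal INVITE request as CRLF-delimited text.

    The branch and tag values are fixed placeholders for illustration only.
    """
    _, caller_host = parse_sip_uri(caller)
    lines = [
        f"INVITE {callee} SIP/2.0",  # request line: method, target, version
        f"Via: SIP/2.0/UDP {caller_host};branch=z9hG4bK-example",
        "Max-Forwards: 70",
        f"From: <{caller}>;tag=1234",
        f"To: <{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <{caller}>",
        "Content-Length: 0",  # no session description body in this sketch
    ]
    # Headers end with an empty line, as in HTTP.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_status_line(line: str) -> tuple[int, str]:
    """Parse a response status line such as 'SIP/2.0 180 Ringing'."""
    version, code, reason = line.strip().split(" ", 2)
    if version != "SIP/2.0":
        raise ValueError(f"unexpected SIP version: {version!r}")
    return int(code), reason


if __name__ == "__main__":
    request = build_invite("sip:alice@example.com", "sip:bob@example.org", "abc123@example.com")
    print(request)
    print(parse_status_line("SIP/2.0 200 OK"))
```

As with HTTP, the numeric code in the status line tells the client how the request fared: codes in the 1xx range report progress (for example, 180 Ringing) and 200 OK indicates that the session was accepted.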

SIP relies on a peer-to-peer architecture in which intelligent endpoints handle advanced call processing and call management functions. In any given transaction, the endpoint that sends a request acts as the user agent client and the endpoint that answers it acts as the user agent server. A proxy server can act as an intermediary that forwards the request from the client toward the server. SIP proxy servers can provide additional functions including security, authentication, and call routing. SIP itself carries only signaling; the Real-time Transport Protocol (RTP) carries the voice or video content between SIP endpoints.
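To make the split between signaling and media concrete, the sketch below packs and unpacks the 12-byte fixed RTP header defined in RFC 3550, which precedes every block of voice or video data. The function names and sample values are hypothetical, and the sketch ignores padding, header extensions, and contributing-source lists.

```python
import struct

# Fixed RTP header layout (RFC 3550), in network byte order:
# 1 byte  version/padding/extension/CSRC count
# 1 byte  marker bit + 7-bit payload type
# 2 bytes sequence number, 4 bytes timestamp, 4 bytes SSRC
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp_header(payload_type: int, sequence: int, timestamp: int,
                    ssrc: int, marker: bool = False) -> bytes:
    """Build a fixed RTP header with version 2 and no optional fields."""
    byte0 = 2 << 6  # version 2; padding, extension, and CSRC count all zero
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_HEADER.pack(byte0, byte1, sequence & 0xFFFF,
                           timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(packet: bytes) -> dict:
    """Decode the fixed header from the first 12 bytes of an RTP packet."""
    byte0, byte1, sequence, timestamp, ssrc = RTP_HEADER.unpack(packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


if __name__ == "__main__":
    # Payload type 0 is G.711 mu-law audio; the other values are arbitrary.
    header = pack_rtp_header(payload_type=0, sequence=1, timestamp=160, ssrc=0x1234ABCD)
    print(unpack_rtp_header(header + b"audio-bytes-would-follow"))
```

The sequence number and timestamp let the receiving endpoint reorder packets and play media back at the right pace, which is work that SIP leaves entirely to RTP.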

Intelligence at the network edge is a key advantage that SIP has over the traditional public switched telephone network (PSTN). Unlike traditional telephony, SIP distributes call processing functionality to the endpoints, also known as the "network edge". This distributed architecture is very different from the centralized design of the PSTN, which concentrates call processing in sophisticated and highly complex core network elements. An ordinary telephone set is a dumb endpoint that does not handle call processing. In contrast, a SIP handset used to make VoIP calls has the intelligence to perform call processing functions itself. SIP networks scale well because the core network architecture they require is relatively simple.

Through the widespread use of instant messaging, SIP endpoints have become commonplace. Instant messaging applications including Microsoft MSN Messenger and Apple iChat are SIP clients that can be used to transport voice and video free of charge. The popularity of instant messaging has led to the creation of protocols dedicated to instant messaging and presence. Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions (SIMPLE) is a set of extensions built on the SIP standard, while Extensible Messaging and Presence Protocol (XMPP) is a separate XML-based protocol that is not derived from SIP. Both allow presence information to be exchanged between endpoints. Presence information is an indicator of a participant's willingness to communicate with another participant. For example, in an instant messaging client, the icon that tells others that you are online is a presence indicator.

In telephony applications, SIP is not without limitations. The main criticisms of SIP concern emergency calling and lawful interception. The peer-to-peer architecture of SIP uses IP endpoints that, by their nature, are unrestricted in terms of mobility. Unlike a regular PSTN 911 call, a SIP call has no network-layer mechanism for pinpointing the location of the caller. For the same reason, it is also very difficult for law enforcement to monitor and intercept SIP-based phone calls. Despite these limitations, SIP is growing in popularity due to its ease of deployment and management, low cost, and scalability.