Source: http://www.voipmechanic.com/sip-basics.htm

SIP (Session Initiation Protocol) is a signaling protocol widely used for setting up, connecting and disconnecting communication sessions, typically voice or video calls over the Internet. SIP is a standardized protocol that came out of the IP community, and in most cases it runs over UDP or TCP. The protocol can be used for setting up, modifying and terminating two-party (unicast) or multiparty (multicast) sessions consisting of one or more media streams. Modifications can include changing IP addresses and/or ports, inviting more participants, and adding or deleting media streams.

SIP is an application layer control protocol that supports five parts of making and stopping communications. It does not provide services itself; it works with other protocols to provide them, one of which is typically RTP, which carries the voice for a call. The five parts of setting up and terminating calls that SIP handles are:

User Location: Determination of the end system that will be used for a call.
User Availability: Determination of the willingness (availability) of the called party to engage in a call.
User Capabilities: Determination of the media and parameters that will be used for the call.
Session Setup: Establishment of the session parameters by both parties (ringing).
Session Management: Invoking services, including transferring and terminating sessions and modifying session parameters.




[Figure: Diagram of a request, acceptance, setup and termination of a call.]

SIP typically sends these messages in UDP (User Datagram Protocol) on port 5060, with 5061 commonly used for the second line on a two-line ATA* (see below). Included in the invitation, when setting up a call, are parameters describing exactly what form the audio or video will take. These parameters are carried in the SDP (Session Description Protocol) body. When both endpoints agree and are ready to start exchanging media or data, RTP (Real-time Transport Protocol) is used to actually exchange the data or voice packets. RTP is typically carried on a port taken from a range of ports, often between 10,000 and 20,000; a particular port is assigned to each endpoint once both sides have negotiated and accepted it.
SIP is also used to keep the ATA device registered with the provider by communicating with the provider's server. This occurs when the ATA device or IP phone is first plugged in, and afterwards regularly at a preset interval. The information passed includes the IP address where the ATA can be reached, along with anything else that may have changed since the last registration, so that the server stays up to date.
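
As a rough illustration of the registration described above, the following Python sketch assembles the text of a minimal REGISTER request. The function name build_register, its parameters and the default values (user, domain, branch, tag, Call-ID) are hypothetical and chosen only for this example; a real provider will normally also require authentication, which is not shown here.

```python
def build_register(user, domain, contact_ip, contact_port=5060,
                   expires=3600, call_id="reg1@10.10.10.10", cseq=1,
                   branch="z9hG4bK-reg1", tag="r0"):
    """Return a minimal SIP REGISTER request as text (no authentication)."""
    lines = [
        # Request line: the registrar's domain, not an individual user.
        f"REGISTER sip:{domain} SIP/2.0",
        f"Via: SIP/2.0/UDP {contact_ip}:{contact_port};branch={branch}",
        "Max-Forwards: 70",
        # From and To both name the account being registered.
        f"From: <sip:{user}@{domain}>;tag={tag}",
        f"To: <sip:{user}@{domain}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} REGISTER",
        # Contact tells the server where the device can currently be reached.
        f"Contact: <sip:{user}@{contact_ip}:{contact_port}>",
        # Expires is the requested registration lifetime in seconds.
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    # SIP lines end in CRLF, and a blank line ends the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


message = build_register("me", "sipserver.org", "10.10.10.10")
print(message)
```

A device would re-send a request like this before the Expires interval runs out, incrementing the CSeq number each time.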

A request might look something like this:

INVITE sip:user@sipserver.org SIP/2.0
(Message Headers)
Via: SIP/2.0/UDP 10.10.10.10:5060
From: "Me" <sip:me@sipserver.org>;tag=a0
To: "User" <sip:user@sipserver.org>
Call-ID: d@10.10.10.10
CSeq: 1 INVITE
Contact: <sip:10.10.10.10:5060>
User-Agent: SIPTelephone
Content-Type: application/sdp
Content-Length: 251
(Message Body)
v=0
o=audio1 0 0 IN IP4 10.10.10.10
s=session
c=IN IP4 10.10.10.10
m=audio 54742 RTP/AVP 18 3
a=rtpmap:18 G729/8000
a=rtpmap:3 GSM/8000
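
To make the structure of such a request concrete, here is a small Python sketch that splits a shortened message, modeled on the one above, into its request line, headers and SDP body, and then pulls the offered audio port and codecs out of the SDP. The "(Message Headers)" and "(Message Body)" labels above are annotations only; on the wire, a blank line separates headers from body. The names SAMPLE, parse_sip_request and parse_sdp_audio are hypothetical, and the parsing is deliberately simplified (for example, a repeated header such as Via would overwrite the earlier one).

```python
SAMPLE = (
    "INVITE sip:user@sipserver.org SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 10.10.10.10:5060\r\n"
    'From: "Me" <sip:me@sipserver.org>;tag=a0\r\n'
    'To: "User" <sip:user@sipserver.org>\r\n'
    "Call-ID: d@10.10.10.10\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "o=audio1 0 0 IN IP4 10.10.10.10\r\n"
    "s=session\r\n"
    "c=IN IP4 10.10.10.10\r\n"
    # 18 is the static RTP payload type for G.729; 3 is GSM.
    "m=audio 54742 RTP/AVP 18 3\r\n"
    "a=rtpmap:18 G729/8000\r\n"
    "a=rtpmap:3 GSM/8000\r\n"
)


def parse_sip_request(text):
    """Split a SIP request into method, URI, version, headers and body."""
    head, _, body = text.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, uri, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers, body


def parse_sdp_audio(body):
    """Return the offered audio port and a {payload_type: encoding} map."""
    port, codecs = None, {}
    for line in body.splitlines():
        kind, _, value = line.partition("=")
        if kind == "m" and value.startswith("audio "):
            port = int(value.split()[1])
        elif kind == "a" and value.startswith("rtpmap:"):
            payload, encoding = value[len("rtpmap:"):].split(" ", 1)
            codecs[int(payload)] = encoding
    return port, codecs


method, uri, version, headers, body = parse_sip_request(SAMPLE)
port, codecs = parse_sdp_audio(body)
print(method, uri, headers["cseq"], port, codecs)
```

The port and codec list recovered here are what the called party would compare against its own capabilities before answering.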


As shown above, certain information is sent along with an INVITE, which starts the process of establishing a call session. That call session is typically voice sent via RTP (Real-time Transport Protocol), an Internet protocol standard that specifies a way for programs to manage the real-time transmission of multimedia data, which with VoIP is usually voice but could be video as well.
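
For a sense of what an RTP packet carries once the call is up, the sketch below decodes the 12-byte fixed RTP header using Python's struct module. The function name parse_rtp_header and the example packet are made up for illustration; the example payload is placeholder bytes, not real audio, and optional CSRC entries and header extensions are not handled.

```python
import struct


def parse_rtp_header(packet):
    """Decode the 12-byte fixed RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source (SSRC) identifier.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hypothetical packet: version 2, payload type 3 (GSM), placeholder payload.
example = struct.pack("!BBHII", 0x80, 3, 1, 160, 0x1234ABCD) + b"\x00" * 33
header = parse_rtp_header(example)
print(header)
```

The payload_type field is what ties each packet back to one of the codecs agreed in the SDP exchange.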

*Many VoIP providers can change the ports of a registered ATA so that it uses 5063, 5064 or 5068, etc., instead of the traditional 5060 and 5061. They do this to keep multiple ATA devices behind NAT on the same LAN from all signaling on the same ports. Even though a router is supposed to deliver the correct data to the correct device and keep things straight, it will often end up sending information (such as registrations) to the wrong device, and the end result is an ATA that loses its registration. Sometimes a complete power cycle of all devices will correct the issue, but it will likely occur again eventually if the devices are all signaling on the same ports. As routers improve and handle address and port translation correctly with better firmware, these issues will become less of a problem. (But a $50.00 home router is not going to handle what a $1500.00 business router can, although some $50.00 routers can become $600.00 routers by loading DD-WRT or Tomato, open source firmware that is free.)
