]> Untangling Identity in SIP Systems RTFM, Inc.
2064 Edgewood Drive Palo Alto CA 94303 USA ekr@rtfm.com
There has been a lot of recent discussion about how to securely identify principals in SIP systems. This document attempts to clarify the inherent technical limits of this effort and the capabilities offered by certain schemes.
From the perspective of the communicating parties, the "identity" service that they want is simple: to know that the person on the other end of the line is who they think it is. Because of the asymmetric nature of telephony, this actually represents two different services, one from the perspective of the caller and one from the perspective of the callee: The caller knows who he is trying to call--indeed he entered their number into his phone. The service the caller wants is ROUTING INTEGRITY: namely that the person he is talking to actually is who he intended to call. This service is complicated to some extent by retargeting services such as call forwarding, but in those cases the caller wants to know that the retargeting was authorized by the intended recipient. Unlike the caller, the callee generally does not have a preconceived idea of who he is trying to call. Rather, his phone rings and he wants to know who is calling. The service he wants is CALLER IDENTIFICATION. It is important to note that regardless of the technological means used to provide these services, they are extremely different conceptually. ROUTING INTEGRITY is a verification problem: the system needs to verify that you reached the person you were trying to reach. By contrast, CALLER IDENTIFICATION is an information provision problem. A variety of technical mechanisms have been proposed (and are in use) to provide these services for SIP-based telephony . Recently, questions have been raised about the suitability of one such mechanism: SIP Identity . This draft is intended to provide a broader perspective on the identity problem, the inherent technical limitations on potential solutions, and the capabilities of the existing approaches.
Before addressing identity in SIP systems, it is useful to start with identity in the PSTN. The two most important facts about identity in the PSTN are these: The only valid identifiers were phone numbers. The PSTN is a fundamentally closed system with no security real security mechanisms to prevent insider attack. As we shall see, these two properties allow for only a fairly limited set of identity services.
The PSTN is a hub-and-spoke topology, as shown below:
Sub. 1 ----+ +---- Sub. 4 \ Trunk / Sub. 2 ------ CO-1 --------------- CO-2------ Sub. 5 / \ Sub. 3 ----+ +---- Sub. 6
Each subscriber is connected by a physical circuit to a switch in their local CENTRAL OFFICE (CO). Although there may be multiple phones in a single house, they are all electrically connected on the same local loop and are indistinguishable from the perspective of the CO Therefore, all phones on the same local loop have the same phone number. The CO is connected to other COs via trunk lines, which are used to carry calls in between COs. Similarly, providers can connect to each other at dedicated switches which only serve to switch traffic and don't have any subscribers directly attached to them. All of these connections are digital and speak SS7. The CO switch is the analog/digital boundary.
From the very beginning, of telephony systems, it has been assumed that they provide routing integrity. This was historically viewed more as a correctness guarantee as a security guarantee--if your phone calls don't go the right place, there's not much point in having a phone--but it is of course also a security guarantee. When a subscriber wants to call another subscriber, they enter the phone number on the phone and it gets transmitted over the analog voice channel in DTMF (again, we're just talking about the basic analog case). The DTMF decoder at the CO switch decodes the DTMF and determines which number the caller intends to call. If the number is assigned to that CO (and if the CO has an authoritative copy of which numbers have been ported), it can handle it locally. If the number is not assigned to that switch, then the call needs to be routed to another switch. This may be the final switch, or the call may be routed to another carrier and so on until it reaches the final switch. This routing information isn't kept locally, at least in the US. Instead, the switch needs to consult a numbering plan database [TODO: need the right name here] in order to determine which provider/switch controls that number. Once that is done, the switch routes the call down the appropriate trunk towards the appropriate destination. This system provides a relatively high degree of routing integrity provided that two invariants are preserved: The inter-switch interconnects are kept secure. The connection to the numbering database (and the database itself) is secure. Both of these invariants are enforced through having a closed system. The connections between the switches are not virtual but rather are manually configured physical circuits, so impersonating one switch to another is a matter of actually interfering with that cable. The connection to the database is similarly manually configured [Note: I don't know if this is physical or over the Internet, but this obviously isn't an architectural issue.] Both of these invariants, however, are very brittle against insider attack. An attacker who controls a switch can reroute all calls that would pass through it . An attacker who controls the numbering database can inject arbitrary routes. The basic assumption is that insiders will not attempt to attack the system.
Unlike routing integrity, caller identification is a relatively recent feature of common phone systems, The typical residential presentation is a one or two line display that shows the phone number of the calling party as well as their name (call center presentation is of course different.) In the PSTN CID is provided by having the originating switch indicate (in the SS7 signaling) the calling party's phone number. This is then modulated over the analog carrier for display on the callee's phone. If the caller's name is provided, it is determined by the callee's local switch, which needs to do a database lookup against a third party database. Somewhat surprisingly, there is no enforcement whatsoever of the correctness of the provided number. The originator simply asserts it and it is trusted by the receiver. Moreover, if a call goes through intermediate carriers, then the trust here is transitive; each carrier trusts the previous carrier to provide correct information. In principle, it might be possible to do some sort of inbound route filtering using the numbering database (although that would only provide hop-by-hop assurance similar to that provided by uPRF) but in practice this is not done. The ability to forge the calling number is not limited to switches. Any subscriber with digital access (e.g., ISDN) to the system can generate any calling number it chooses. Because the local switch knows which numbers are assigned to a given subscriber, it can in principle filter these assertions but even this is often not done. In other cases, subscribers can opt out of this kind of filtering in order to be allowed to assert numbers of their choice. The phone number to name mapping provided with caller id is even less trustworthy, since the databases are simply populated via the phone company's records, and they make no attempt to verify your real-world identity provided that you pay your bill. In summary, the caller identification service provided by the PSTN is completely untrustworthy, except as an assertion of the caller's intent.
We now turn to the question of identity in SIP systems. As before, we start with topology.
The figure below shows the classic "SIP trapezoid", the paradigmatic configuration for SIP-based telephony. To some extent, this is a hub-and-spoke system: each user agent is connected to a SIP proxy and the signalling (mostly) goes between proxies. However, for performance reasons the media is routed directly between the UAs. This is different from the PSTN, where the signalling and the media go through the same switches.
SIP PROXY -------------- PROXY (atlanta.com) (biloxi.com) / \ SIP / \ SIP / \ UA UA (alice@atlanta.com) <-------- RTP --------> (bob@biloxi.com)
The most important difference between the PSTN and VoIP is that the traffic between the various entities is not carried over dedicated physical circuits but over the Internet. As a consequence, an element cannot trust that data it receives claiming to be from another element in fact came from that element. Cryptographic mechanisms are required to restore this security.
The simplest and most widely used SIP security mechanisms are intended to provide same kind of link security that the PSTN provides using dedicated circuits over an Internet transport. In the SIP signalling system, there are two kinds of connections (UA-proxy and proxy-proxy) and this implies three kinds of security properties that must be enforced: A SIP UA knows that it is communicating with its proxy. A proxy knows which UA(s) it is communicating with. A proxy knows which proxy (proxies) it is communicating with. In SIP, these security properties are provided with two mechanisms: TLS and Digest Authentication. Proxies authenticate to UAs and to each other via TLS and UAs authenticate to proxies via digest authentication. when Digest and TLS are used correctly, they are as secure, if not more secure, than the physical connections they replace. For instance, it is possible to steal someone's phone number by connecting to the box on the outside of their house. However, if your SIP phone is authenticated via digest authentication, then even an attacker connected to the Internet cable outside your house cannot impersonate you to the proxy. Digest and TLS alone are sufficient to allow the construction of a SIP system with equivalent (arguably superior) signalling security (ignore the media for the moment) as that of the PSTN. The basic principle is the same: have a closed network where every proxy trusts every other proxy.
In the PSTN model, routing is done courtesy of a giant database. This is necessitated by the existence of a largely unstructured identifier space. Because the SIP identifier space is structured along "user@domain" lines, less centralization is required. Instead, messages directed to "alice@example.com" need to be routed to the proxy "example.com". DNS is used to identify the IP address of the relevant proxy and TLS certificates are used to authenticate it. The routing integrity properties of this system are similar, and perhaps somewhat superior to those of the PSTN. As with the PSTN, if a proxy is compromised, then all traffic that passes through that proxy can be hijacked. Similarly, while the DNS database need not be trusted, if the certificate authority which is trusted to issue proxy certificates is compromised, then an attacker can probably impersonate other sites, but CAs can be operated more securely than the number database server so this attack is probably more difficult. Note that the above only applies to SIP URIs that are in the "user@domain" form. SIP UAs can also accept URIs that represent numbers in the PSTN (either as sip: and tel: URIs). These calls must be routed through a PSTN gateway, at which point the security properties are those described in .
If only channel security is in use, then caller identification in SIP relies on transitive trust. In the basic RFC3261 system, the caller simply asserts their identity in the From header and the caller either trusts it or does not. This is comparable to the case in the PSTN where an ISDN user or a PBX asserts caller identity and the switch does not check it. In addition, the originating proxy (or perhaps some proxy later in the chain) verify the user's identity and inserts it into the PAI header . This header is explicitly intended to be trusted and so proxies are intended to do more than simply trust the user's From field. However, PAI is not cryptographically protected in any way (other than being carried over the TLS connection, of course), and so any proxy in the network can generate any PAI value of its choice. Thus, a single compromised SIP proxy can impersonate any user. If a call enters a SIP network via a PSTN gateway, the caller identification information (which, recall, is basically untrustworthy), can be inserted into the From or PAI headers, at which point it is as vulnerable as if it were a "user@domain" identity.
As described in , The security properties provided by channel security mechanisms are less than optimal, and RFC 3261 and followon RFCs describe a set of mechanisms intended to provide superior security guarantees. The basic principle is to secure the SIP messages themselves, rather than the channels that they pass over. This allows the creation of security properties that are end-to-end rather than hop-by-hop.
RFC 3261 included support for encrypting and signing SIP message bodies with S/MIME. The idea was that each end-user would have an S/MIME certificate containing their domain name. Unfortunately, the burden of obtaining end-user certificates has proven to be so high that S/MIME has seen almost no deployment in SIP. However, it is instructive to examine the security properties that it would provide.
S/MIME allows the partial provision of routing integrity, even in the face of untrusted proxies. In particular: S/MIME encryption can be used to prevent anyone but the intended recipient from reading the message body. If the callee signs his response, then the callee can know the identity of the person he reached. Unfortunately, neither feature interacts well with proxy-based retargeting. If Alice calls Bob and Bob's proxy server retargets to Charlie (Bob's secretary), then the intended recipient will not be the same as the actual recipient. Alice has no way of distinguishing this event from an attacker redirecting the traffic to Charlie. Thus, although S/MIME provides cryptographic mechanisms to stop retargeting, those mechanisms also stop legitimate retargeting. Thus, S/MIME can provide some additional routing integrity by giving the caller clear information about who they reached. However, it does not provide any mechanical way of determining whether that was the right person or not. It leaves that decision to humans. The S/MIME strategy does not work particularly well on calls to and from the PSTN. Clearly, POTS phones do not have S/MIME certificates and will not be signing SIP messages. The PSTN gateway can of course obtain certificates of the form "+16505551212@example.com" for each possible phone number (this only works for signatures, not for encryption) but it's not clear why a callee should trust such a certificate. In order for this to work, there must be some way to determine which proxies are authorized to assert which phone numbers. This is NOT a problem with S/MIME per se. It's a basic problem in going from a system where every node is fully trusted to one in which finer grained trust is desired.
S/MIME does a better job of providing caller identification. The caller can simply S/MIME sign her SIP messages and the recipient can verify them and display the identity in the caller's certificate as the caller's identity. This is a relatively well understood problem in the S/MIME arena, though user interface issues persist. One caveat must be noted here. Because S/MIME only the SIP body and not the headers, there are potential replay and cut-and-paste attacks on the protocol. These can be greatly mitigated by using secure media which is cryptographically tied to the signalling. More on this in Section XXX. All the same problems with S/MIME and PSTN gateway discussed in the previous section apply here as well.
The non-acceptance of S/MIME was widely perceived as a result of the difficulty in getting users to adopt UA-level certificates. In response, new system was created: SIP Identity . SIP Identity is a compromise system was created which used certificates associated not with end-entities but with an "authentication service" associated with each SIP domain. The authentication service verifies the users identity and then signs the message using a single certificate associated with the domain (this can be thought of as a cryptographically secure version of RFC 3325). One important difference between SIP Identity and S/MIME is that SIP Identity can only be applied in requests. The reasoning for why this makes sense is as follows: Namespaces in SIP are hierarchical. Thus, "alice@example.org", means "user Alice in the domain example.org". Thus, the operator of "example.org" is in the best position to say who Alice is and if they wanted to give someone else that address, they could do so. Thus, it makes sense for the domain operator to be vouching for Alice's identity, which they do via the authentication service. Thus, end-entity certificates as used in S/MIME are not required. One advantage of this mechanism over S/MIME is that the signature covers the headers as well as the body and therefore it provides slightly better anti-cut-and-paste properties than S/MIME. Like S/MIME, SIP Identity can only provide a minimal level of security for identities from the PSTN.
SIP Identity provides the same routing integrity guarantees as the signing version of S/MIME: the caller can verify the identity of the person they called but has no mechanical way of determining whether it was the right person or not. However, because RFC 4474 is only usable with requests, this information is not provided in the INVITE. Instead, the callee needs to do a mid-dialog update as described in .
SIP Identity provides the same caller identification guarantees as the S/MIME signatures over the invite: the callee can securely identify the person calling them even in the face of malicious proxies operated by third parties.
The above discussion covers only signalling-level security. However, a VoIP system doesn't provide useful security if the signalling is secure but the media is not. There is common agreement on SRTP as the protocol for encrypting RTP, but a variety of key management approaches have been designed. Because the signalling is used to set up the call, having a secure combined system requires cryptographically tying the signalling to the media. For our purposes we can divide mechanisms for doing this into two categories: Those which require confidentiality in the signalling (e.g., SDES ). Those which require integrity in the signalling (e.g., DTLS-SRTP ).
Note: ZRTP has a mode in which it operates independently of the signalling, but this has two undesirable consequences. First, there are known security problems with the voice-based short authentication string mechanism it uses. Second, it means that all the information in the signalling must be disregarded as untrusted, which is undesirable.
FOOBAR
One way of binding the media to the signalling is to carry a secret in the signalling which is then used to establish the keys for SRTP. SDES is one such mechanism. If such a mechanism is used with S/MIME encryption, it can provide end-to-end security for the media because only the recipient sees the keys. However, if S/MIME encryption is not used, any intermediate proxy can see the keys. In addition, unless TLS encryption is used on every link, a passive attacker may be able to recover the traffic keys. Either attack would allow the attacker to impersonate either side of the call to the other (whether on or off-path) and if he is on-path for the media, to decrypt all the media as well.
Mechanisms also exist which depend only on the integrity of the signalling. For instance, DTLS-SRTP carries a fingerprint of the UA's certificate in the SIP signalling. This cryptographically binds the signalling to the media plane key agreement protocol. In this system, the media is secure provided that the integrity of the signalling messages is preserved, which can be done either with S/MIME signatures or (more likely) with RFC 4474 Identity. If either mechanism is used, the media is secure from attack by any proxy in the middle of the signalling, even if TLS is not used for some proxy-proxy links. It's important to understand that this sort of cryptographic binding simple extends the security boundary of the signalling to cover the media: once you have verified the caller/callee's identity using whatever security mechanism you are using, the binding then allows you to be sure that the media came from the same entity. However, if the signalling authenticates someone you have never heard of, or--as in the PSTN gateway case--provides minimal information about the real caller, then media encryption does not help to resolve that uncertainty.
Prior to IETF 71, a number of internet-drafts were published documenting concerns the authors had about RFC 4474. This section attempts to provide a summary of the relevant issues.
&RFC2119; &RFC3711; &RFC4346; &RFC4347; &RFC4474; &RFC3261; &RFC3263; &RFC3325; &RFC4916; &RFC4568; &I-D.ietf-avt-dtls-srtp; &I-D.ietf-sip-dtls-srtp-framework; Commission Hearing Probes Vegas Vice Hacks