WebRTC General Basics
This guide gives a deeper understanding of the WebRTC protocol itself. You do not need to read it to use the Nabto Edge WebRTC feature; it simply provides background information about the WebRTC protocol for the curious reader.
WebRTC is a protocol for real-time communication in web browsers. Its main goal is to establish a DTLS connection directly between two peers with a set of negotiated media tracks. Here we give a basic overview of the WebRTC protocol in general, without any Nabto Edge specific details.
A WebRTC connection starts with one of the peers offering a set of tracks. These can be video, audio, and data channel tracks. Once a peer (Alice) has added its set of tracks, WebRTC generates an Offer, which must be transferred to the other peer (Bob). At this point, WebRTC does not know who the other peer is or how it will receive the Offer. This is up to the implementation and is part of what the Nabto Edge WebRTC solution includes.
When Bob receives the Offer, he looks through the offered tracks and updates each track with acceptable parameters. Bob then generates an Answer, which must be returned to Alice; again, how it is transferred is not part of WebRTC. At this point, WebRTC will attempt to establish the DTLS connection. In this process, WebRTC may generate ICE candidates, which must be transferred to the other peer in the same manner as Offers and Answers. The connection establishment process is detailed in a later section.
The process of transferring Offers, Answers, and ICE candidates is referred to as Signaling.
WebRTC Signaling
WebRTC Signaling messages use the Session Description Protocol (SDP, RFC8866) to negotiate the WebRTC connection. SDP messages are text strings containing a set of fields. Each field follows the format:
<field character>=<value><CR><LF>
The field character identifies the type of field. The value is a text string whose format depends on the field type. SDP defines a few field types; most notably, it defines the attribute field a=, which is a generic, general-purpose attribute. In this section, a named attribute such as the fingerprint attribute refers to an attribute of the form a=fingerprint: followed by its value.
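The field format above can be sketched in a few lines of Python. This is only an illustration, not part of any WebRTC library: the names `SdpField`, `parse_sdp_fields`, and `attribute` are hypothetical, and the fingerprint value in the example is a shortened placeholder.

```python
from typing import List, NamedTuple, Optional


class SdpField(NamedTuple):
    char: str   # single field character, e.g. "v", "m", "a"
    value: str  # everything after the first "="


def parse_sdp_fields(sdp: str) -> List[SdpField]:
    """Split an SDP document into (field character, value) pairs."""
    fields = []
    for line in sdp.splitlines():  # splitlines() accepts CRLF as well as bare LF
        if not line:
            continue
        char, sep, value = line.partition("=")
        if sep != "=" or len(char) != 1:
            raise ValueError(f"not an SDP field: {line!r}")
        fields.append(SdpField(char, value))
    return fields


def attribute(fields: List[SdpField], name: str) -> Optional[str]:
    """Return the value of the first a=<name>:<value> attribute, if any."""
    prefix = name + ":"
    for field in fields:
        if field.char == "a" and field.value.startswith(prefix):
            return field.value[len(prefix):]
    return None


# Placeholder SDP fragment; a real fingerprint is a much longer hash.
fields = parse_sdp_fields("v=0\r\ns=-\r\na=fingerprint:sha-256 AB:CD:EF\r\n")
print(attribute(fields, "fingerprint"))  # sha-256 AB:CD:EF
```

Note that `attribute` only handles attributes with a value; flag-style attributes without a colon would need a separate check.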
In WebRTC signaling, Offers and Answers contain full SDP documents, whereas an ICE candidate message only contains the value of a candidate attribute.
For the purpose of understanding WebRTC, we split the SDP into multiple sections. The first section is the Session description, starting with the field v= and ending at the first Media description, marked by the Media field m=. The Session description contains options general to the session. Notable fields are the ice-options attribute and the fingerprint attribute. If the peer supports Trickle ICE, the trickle option is added to the list of ICE options. Trickle ICE is described in the Connection Establishment section. The fingerprint attribute contains a fingerprint of the certificate, and thereby of the public key, the peer intends to use for the resulting DTLS connection. Assuming the signaling can be trusted, this lets the peers verify that the resulting connection is not subject to a man-in-the-middle attack.
After the Session description comes a Media description for each track in the WebRTC connection. A Media description starts with the field type m= with a value string starting with one of application, video, or audio. Media descriptions of type application are data channels used for general streaming of data, whereas video and audio Media descriptions are used to stream media tracks. Data channels and media tracks are handled differently by WebRTC, as explained in the sections below.
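The split into a Session description and a list of Media descriptions can be sketched as follows. The helper names (`split_sdp`, `media_type`, `supports_trickle`) and the trimmed example SDP are assumptions made for illustration; the fingerprint is a placeholder and a real SDP document carries many more fields.

```python
from typing import List, Tuple


def split_sdp(sdp: str) -> Tuple[List[str], List[List[str]]]:
    """Return (session lines, list of media sections), each as SDP lines."""
    session: List[str] = []
    media: List[List[str]] = []
    for line in sdp.splitlines():
        if not line:
            continue
        if line.startswith("m="):
            media.append([line])      # an m= field opens a new Media description
        elif media:
            media[-1].append(line)    # belongs to the most recent Media description
        else:
            session.append(line)      # everything before the first m= is session level
    return session, media


def media_type(section: List[str]) -> str:
    """First token of the m= value: application, video, or audio."""
    return section[0][2:].split()[0]


def supports_trickle(session: List[str]) -> bool:
    """True if the session-level ice-options attribute lists trickle."""
    for line in session:
        if line.startswith("a=ice-options:"):
            return "trickle" in line[len("a=ice-options:"):].split()
    return False


# Hypothetical, heavily trimmed SDP document.
EXAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=- 0 0 IN IP4 127.0.0.1",
    "s=-",
    "t=0 0",
    "a=ice-options:trickle",
    "a=fingerprint:sha-256 AB:CD:EF",
    "m=video 9 UDP/TLS/RTP/SAVPF 104",
    "a=mid:0",
    "m=application 9 UDP/DTLS/SCTP webrtc-datachannel",
    "a=mid:1",
]) + "\r\n"

session, media = split_sdp(EXAMPLE_SDP)
print([media_type(m) for m in media])  # ['video', 'application']
print(supports_trickle(session))       # True
```

This sketch only inspects the session level; an implementation that places these attributes inside the Media descriptions would need the same check applied per section.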
WebRTC Bundle Transport
In classical SDP negotiations, each media track is offered on a specific UDP port with an assumed RTCP channel on the following port. For 2-way media and possibly with multiple media tracks being offered, this can add up to a significant number of ports required by a single SDP negotiation.
Trying to establish a direct connection between the two peers (using ICE) is one of the main hurdles in a WebRTC connection. Each port the two peers must connect through will be its own connection. It is, therefore, beneficial to keep the number of ports used for a WebRTC connection as low as possible. To achieve this, WebRTC utilizes some extensions to SDP.
Firstly, WebRTC uses RTCP multiplexing (RFC5761, clarified for SDP in RFC8035), where RTP and RTCP share the same port. Secondly, it uses BUNDLE transport, where all tracks are bundled onto the same port (RFC9143). This means all RTP sessions are given the same port number. In WebRTC, an RTP stream can therefore be listed with one of three kinds of port number, depending on the connection state. In the first Offer, bundled RTP sessions will have port number 9. This is the Discard port, which in WebRTC signals that the connection has not yet been established, so no meaningful port number can be assigned. In subsequent offers/answers, one RTP track should have the actual port used by the bundle, while the remaining tracks in the bundle should have port 0. However, some implementations keep port 9 for the remaining tracks, while others may give them the actual transport port. In the end, WebRTC will establish a connection and bundle all tracks onto it, regardless of the RTP ports stated in the SDP document.
In SDP, an offered track is rejected by setting its port to 0. Since 0 has a special meaning when bundling tracks, the track must also be removed from the bundle. However, most actual applications will require better error messaging than simply knowing a track was rejected. Such error handling is not provided by WebRTC and so should be implemented in whatever signaling protocol is designed for the particular application.
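To make the port rules concrete, the following sketch classifies each Media description by its port and its membership in the BUNDLE group. The function name `classify_media_ports`, the status labels, and the example SDP fragment are all hypothetical. As noted above, some implementations deviate from these rules, so this should be read as an illustration rather than a validator.

```python
from typing import Dict, List, Optional


def classify_media_ports(sdp: str) -> Dict[str, str]:
    """Map each mid to 'placeholder', 'transport', 'bundled', or 'rejected'."""
    bundle_mids: List[str] = []
    ports: Dict[str, int] = {}
    current_port: Optional[int] = None
    for line in sdp.splitlines():
        if line.startswith("a=group:BUNDLE"):
            bundle_mids = line.split()[1:]        # mids listed after the group tag
        elif line.startswith("m="):
            current_port = int(line.split()[1])   # m=<media> <port> <proto> <fmt> ...
        elif line.startswith("a=mid:") and current_port is not None:
            ports[line[len("a=mid:"):]] = current_port

    status: Dict[str, str] = {}
    for mid, port in ports.items():
        if port == 9:
            status[mid] = "placeholder"   # discard port: no transport assigned yet
        elif port == 0:
            # Port 0 inside the bundle means "uses the bundle's transport";
            # port 0 outside the bundle means the track was rejected.
            status[mid] = "bundled" if mid in bundle_mids else "rejected"
        else:
            status[mid] = "transport"     # carries the actual port of the bundle
    return status


# Hypothetical fragment of an answer: mid 2 was rejected and removed from the bundle.
EXAMPLE = "\r\n".join([
    "a=group:BUNDLE 0 1",
    "m=video 37519 UDP/TLS/RTP/SAVPF 104",
    "a=mid:0",
    "m=application 0 UDP/DTLS/SCTP webrtc-datachannel",
    "a=mid:1",
    "m=audio 0 UDP/TLS/RTP/SAVPF 111",
    "a=mid:2",
])

print(classify_media_ports(EXAMPLE))
# {'0': 'transport', '1': 'bundled', '2': 'rejected'}
```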
WebRTC Connection Establishment
Once the two WebRTC peers have exchanged Offers and Answers, WebRTC will start to establish the connection. This is done using the ICE framework (RFC8445). In this framework, both peers collect ICE candidates, which are network addresses (IP and port) the peer thinks the other peer may be able to use for the connection.
Examples of candidates are the local IP and port of the peer, its public IP and port, and IP and port of a TURN server. In these examples, if both peers are on the same local network, the other peer can connect to the local endpoint directly. If they are remote, the peers can attempt NAT traversal with the public endpoint, and if all else fails, a TURN server can be used to relay data between the two peers.
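As an illustration of these candidate types, the sketch below parses candidate attribute values and ranks them with the priority formula and recommended type preferences from RFC8445. The function names are hypothetical, and the addresses and ports are made-up examples (the public and relay addresses are taken from documentation address ranges).

```python
from typing import Dict, Union

# Type preferences recommended by RFC8445: direct candidates rank highest.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_preference: int = 65535,
                       component: int = 1) -> int:
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return ((TYPE_PREFERENCE[cand_type] << 24)
            + (local_preference << 8)
            + (256 - component))


def parse_candidate(value: str) -> Dict[str, Union[str, int]]:
    """Parse 'candidate:<foundation> <component> <transport> <priority>
    <address> <port> typ <type> ...' into a dict."""
    parts = value.split()
    if len(parts) < 8 or not parts[0].startswith("candidate:") or parts[6] != "typ":
        raise ValueError(f"not a candidate attribute value: {value!r}")
    return {
        "foundation": parts[0][len("candidate:"):],
        "component": int(parts[1]),
        "transport": parts[2].lower(),
        "priority": int(parts[3]),
        "address": parts[4],
        "port": int(parts[5]),
        "type": parts[7],
    }


# Hypothetical candidates: TURN relay, public (server reflexive), and local (host).
raw = [
    "candidate:3 1 udp 16777215 198.51.100.2 61000 typ relay raddr 203.0.113.7 rport 50000",
    "candidate:2 1 udp 1694498815 203.0.113.7 50000 typ srflx raddr 192.168.1.195 rport 37519",
    "candidate:1 1 udp 2130706431 192.168.1.195 37519 typ host",
]
ranked = sorted((parse_candidate(c) for c in raw),
                key=lambda c: c["priority"], reverse=True)
print([c["type"] for c in ranked])  # ['host', 'srflx', 'relay']
```

The ordering mirrors the fallback described above: the local endpoint is tried with the highest priority and the TURN relay with the lowest.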
The ICE candidates are included in the Offer/Answer messages. However, if both peers support the Trickle ICE option (RFC8838), candidates can be sent as they are gathered by a peer. Using the previous examples again, the local IP address may be found immediately, whereas the TURN server candidate may require the peer to obtain credentials and connect to the TURN server before it can be sent to the other peer. With Trickle ICE, the peer can include the local candidate in the offer/answer and send it immediately, as opposed to waiting for the TURN server candidate to be resolved before sending its offer/answer. This means Trickle ICE can be used to speed up the connection process when early candidates are sufficient. On the other hand, by not supporting Trickle ICE, the signaling can be implemented with a simpler request/response mechanism where a peer sends an offer in a request, and the other peer returns an answer in the response.
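The difference between the two modes can be illustrated with a small, purely hypothetical model of the signaling timeline. The function `plan_signaling` and the gathering times are invented for this sketch; real gathering times depend on the network and the TURN server.

```python
from typing import List, Tuple

Candidate = Tuple[float, str]            # (seconds until gathered, label)
Message = Tuple[float, str, List[str]]   # (send time, message kind, candidates)


def plan_signaling(candidates: List[Candidate], trickle: bool) -> List[Message]:
    """Return the signaling messages one peer would send, with their send times."""
    ordered = sorted(candidates)
    if not trickle:
        # Without Trickle ICE the offer waits for the slowest candidate.
        ready_at = max((t for t, _ in ordered), default=0.0)
        return [(ready_at, "offer", [name for _, name in ordered])]
    # With Trickle ICE the offer leaves immediately with whatever is ready,
    # and later candidates follow as separate signaling messages.
    messages: List[Message] = [
        (0.0, "offer", [name for t, name in ordered if t == 0.0])
    ]
    messages += [(t, "candidate", [name]) for t, name in ordered if t > 0.0]
    return messages


# Assumed gathering times: host is instant, the TURN relay is slowest.
gathered = [(0.0, "host"), (0.05, "srflx"), (0.4, "relay")]

print(plan_signaling(gathered, trickle=False))
# [(0.4, 'offer', ['host', 'srflx', 'relay'])]
print(plan_signaling(gathered, trickle=True))
# [(0.0, 'offer', ['host']), (0.05, 'candidate', ['srflx']), (0.4, 'candidate', ['relay'])]
```

The non-trickle plan is a single message, which is what allows the simple request/response signaling; the trickle plan starts earlier but needs a channel that can carry additional messages in both directions.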
WebRTC Media tracks
When a WebRTC peer adds a media track to the WebRTC connection, it must be negotiated with the other peer. This means an Offer is created with an added Media description section. This carries information about what the peer can offer with this track. This information includes the direction of the stream (send, receive, or both), an identifier (the mid attribute), and which Media Codecs the peer supports for this particular track. When the Offer is ready, it will be sent to the other peer as a signaling message.
In WebRTC, the mid attribute can be any string chosen by the peer adding the track. In practice, it is chosen by the underlying library and so cannot be used to identify the track at the application level (i.e., one peer cannot use it to request a specific feed, e.g. cam2, from the other peer).
When a peer receives an offer containing a Media description, it must look through it and update any unsupported options. Mainly, this requires sorting through the Media Codecs, selecting which one to use, and removing all other codecs from the track. The returned Answer will then complete the negotiation of the track.
We will use an example to better understand the Media description SDP section and how tracks are negotiated.
Media track negotiation example
Alice adds a track offering to receive a video feed. To keep it simple, we say she only supports three codec configurations. The media description for the track will look similar to this:
m=video 9 UDP/TLS/RTP/SAVPF 100 102 104
c=IN IP4 0.0.0.0
a=mid:0
a=recvonly
a=rtcp-mux
a=rtpmap:100 VP9/90000
a=rtcp-fb:100 nack pli
a=fmtp:100 profile-id=2
a=rtpmap:102 H264/90000
a=rtcp-fb:102 nack pli
a=fmtp:102 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42001f
a=rtpmap:104 H264/90000
a=rtcp-fb:104 nack pli
a=fmtp:104 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42e01f
The m= line states that Alice added a video feed on the discard port (9). It then lists the supported protocol stack and, finally, that she supports RTP payload types 100, 102, and 104. These three payload types are described in the following attributes. From the attributes it is seen that payload type 100 uses the VP9 codec, supports the RTCP feedback extension nack pli, and has the format specific parameter profile-id=2. Similarly, payload types 102 and 104 both use the H264 codec but differ in their parameters.
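Reading the media description this way can also be done programmatically. The sketch below collects the rtpmap, rtcp-fb, and fmtp attributes per payload type for Alice's offer from the example above; the function name `parse_payload_types` and the dictionary layout are assumptions chosen for illustration.

```python
from typing import Any, Dict

OFFER_MEDIA = """\
m=video 9 UDP/TLS/RTP/SAVPF 100 102 104
c=IN IP4 0.0.0.0
a=mid:0
a=recvonly
a=rtcp-mux
a=rtpmap:100 VP9/90000
a=rtcp-fb:100 nack pli
a=fmtp:100 profile-id=2
a=rtpmap:102 H264/90000
a=rtcp-fb:102 nack pli
a=fmtp:102 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42001f
a=rtpmap:104 H264/90000
a=rtcp-fb:104 nack pli
a=fmtp:104 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42e01f
"""


def parse_payload_types(media: str) -> Dict[int, Dict[str, Any]]:
    """Collect codec, feedback, and format parameters per RTP payload type."""
    payloads: Dict[int, Dict[str, Any]] = {}
    for line in media.splitlines():
        if line.startswith("m="):
            # m=<media> <port> <proto> <payload type> ...
            for pt in line.split()[3:]:
                payloads[int(pt)] = {"codec": None, "rtcp_fb": [], "fmtp": {}}
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            payloads[int(pt)]["codec"] = encoding
        elif line.startswith("a=rtcp-fb:"):
            pt, feedback = line[len("a=rtcp-fb:"):].split(" ", 1)
            payloads[int(pt)]["rtcp_fb"].append(feedback)
        elif line.startswith("a=fmtp:"):
            pt, params = line[len("a=fmtp:"):].split(" ", 1)
            payloads[int(pt)]["fmtp"] = dict(p.split("=", 1) for p in params.split(";"))
    return payloads


payloads = parse_payload_types(OFFER_MEDIA)
print(payloads[100]["codec"])                       # VP9/90000
print(payloads[104]["fmtp"]["packetization-mode"])  # 0
```

The sketch assumes every payload type in the attributes is also listed on the m= line, as in the example; it does not handle wildcard feedback attributes.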
When Bob receives this media description (inside the larger SDP document of an Offer), he searches the payload types and determines he can satisfy the track using payload type 104. He removes the two other payload types and updates the media description. Bob also adds his video source to the description:
m=video 37519 UDP/TLS/RTP/SAVPF 104
c=IN IP4 192.168.1.195
a=mid:0
a=sendonly
a=rtcp:9 IN IP4 0.0.0.0
a=ssrc:42 cname:frontdoor-video
a=rtcp-mux
a=rtpmap:104 H264/90000
a=rtcp-fb:104 nack pli
a=fmtp:104 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42e01f
In Bob's answer, he can now assign the session a port number and an address, though these may be updated later if the final connection uses another candidate. Bob also flips the direction of the track to sendonly, since he will be sending video to Alice. The ssrc Bob intends to use in the RTP packets is also added, while all references to payload types 100 and 102 are removed.
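Bob's transformation of the offer can be sketched as a simple line filter. This is a simplified illustration under stated assumptions: the function `answer_media` is hypothetical, the offer is trimmed to a few lines, and the sketch omits the rtcp and ssrc attributes Bob adds in the example above.

```python
FLIPPED = {
    "a=recvonly": "a=sendonly",
    "a=sendonly": "a=recvonly",
    "a=sendrecv": "a=sendrecv",
    "a=inactive": "a=inactive",
}
PER_PAYLOAD = ("a=rtpmap:", "a=rtcp-fb:", "a=fmtp:")

# Trimmed version of Alice's media description.
OFFER_MEDIA = "\r\n".join([
    "m=video 9 UDP/TLS/RTP/SAVPF 100 102 104",
    "c=IN IP4 0.0.0.0",
    "a=mid:0",
    "a=recvonly",
    "a=rtcp-mux",
    "a=rtpmap:100 VP9/90000",
    "a=fmtp:100 profile-id=2",
    "a=rtpmap:104 H264/90000",
    "a=fmtp:104 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42e01f",
]) + "\r\n"


def answer_media(offer_media: str, keep_pt: int, port: int, address: str) -> str:
    """Build an answer media description that keeps a single payload type."""
    out = []
    for line in offer_media.splitlines():
        if line.startswith("m="):
            media, _offered_port, proto = line[2:].split()[:3]
            out.append(f"m={media} {port} {proto} {keep_pt}")
        elif line.startswith("c="):
            out.append(f"c=IN IP4 {address}")
        elif line in FLIPPED:
            out.append(FLIPPED[line])      # direction is seen from the answerer's side
        elif line.startswith(PER_PAYLOAD):
            pt = line.split(":", 1)[1].split(" ", 1)[0]
            if pt == str(keep_pt):         # drop attributes of rejected payload types
                out.append(line)
        else:
            out.append(line)               # mid, rtcp-mux, etc. are kept unchanged
    return "\r\n".join(out) + "\r\n"


answer = answer_media(OFFER_MEDIA, keep_pt=104, port=37519, address="192.168.1.195")
print(answer.splitlines()[0])  # m=video 37519 UDP/TLS/RTP/SAVPF 104
```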
In this example, Bob must carefully sort through all codecs and their parameters supported by Alice to find one he supports. Many IoT cameras only support a single codec, so when possible, it is recommended that Alice instead uses some other means to ask Bob to offer her the desired track (e.g. using the signaling channel or using a data channel).
WebRTC Data channels
WebRTC data channels use SCTP (RFC9260) over DTLS. When a peer adds a data channel to the WebRTC connection, a media description of type application is added to the SDP and offered to the other peer. This media negotiation is only used to exchange SCTP port numbers between the peers, enabling them to open an SCTP association on their DTLS connection. Only after the SCTP association is opened is the data channel created. This has the benefits that the application can assign a label to a data channel and that subsequent data channels can be created without the need for WebRTC to renegotiate the connection.
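For illustration, a data channel media description typically looks like the fragment in the sketch below, where the sctp-port attribute carries the SCTP port number. The fragment is a typical example rather than output from a specific implementation, and the helper `sctp_port` is a hypothetical name.

```python
from typing import Optional

# Typical shape of a data channel media description (illustrative values).
DATA_CHANNEL_MEDIA = "\r\n".join([
    "m=application 9 UDP/DTLS/SCTP webrtc-datachannel",
    "c=IN IP4 0.0.0.0",
    "a=mid:1",
    "a=sctp-port:5000",
    "a=max-message-size:262144",
]) + "\r\n"


def sctp_port(media_section: str) -> Optional[int]:
    """Return the SCTP port of an application media description, if present."""
    lines = media_section.splitlines()
    if not lines or not lines[0].startswith("m=application"):
        return None  # not a data channel media description
    for line in lines[1:]:
        if line.startswith("a=sctp-port:"):
            return int(line[len("a=sctp-port:"):])
    return None


print(sctp_port(DATA_CHANNEL_MEDIA))  # 5000
```

Note that the data channel label does not appear anywhere in the SDP, which is consistent with labels being assigned only once the channel is opened over SCTP.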