Update: Philipp continues to reverse engineer Hangouts using chrome://webrtc-internals. Please see the bottom section for new analysis he just put together in the past couple of days based on Chrome 38.
As the initiator and a major driver of WebRTC, Google was often given a hard time for not supporting WebRTC in its core collaboration product. This recently changed when WebRTC support for Hangouts was added with Chrome 36.
So obviously we wanted to check out how this worked. We were also curious to see how a non-Googler could make some practical use of chrome://webrtc-internals. Soon thereafter I came across a message from Philipp Hancke (aka Hornsby Cornflower) saying he had already started looking at the new WebRTC Hangouts with webrtc-internals. Fortunately I was able to convince him to share his findings and thorough analysis.
Philipp has been a real-time communications master for more than 10 years. He currently works for simpleWebRTC and talky.io creator &yet. In addition, he is a major contributor to the Jitsi Meet project. As a long-time member of the XMPP Standards Foundation (XSF), Philipp has contributed to a number of key XEPs (XMPP speak for an RFC). Incidentally, he is currently serving on the XSF council (which is XMPP speak for the circle of the initiated XMPP gurus that control the world).
You’ll see how the combo of his vast WebRTC knowledge and XMPP/Jingle background comes together in the analysis below.
Google announced in late June that their Hangouts now supports WebRTC. At a technical level, this is a very good chance to analyze how Hangouts works using Chrome’s built-in WebRTC diagnostics tool – chrome://webrtc-internals.
The webrtc-internals page is an extremely useful tool for debugging WebRTC issues in Chrome. It shows all API calls of all PeerConnection objects along with additional statistics like bandwidth consumption in a very nice way. This allows us to observe what PeerConnection API calls are used by WebRTC without digging into the source code at all.
As my webrtc-internals analysis below reveals, some of the non-standard features that Chrome has carried since it first added WebRTC support are used to facilitate plug-in-free Hangouts there. These features are:
None of these are part of the official WebRTC spec. This also explains why Firefox does not work with the WebRTC version of Hangouts.
Another reason for the lack of Firefox support is that Hangouts sends multiple audio and video streams over a single PeerConnection. While Firefox does not support this at all, Chrome has for a while used a variant called Plan B (which was not adopted by the IETF's RTCWEB working group). Basically, Plan B calls setRemoteDescription/setLocalDescription repeatedly over the lifetime of the connection, adding source-specific media attribute lines (as described in RFC 5576) to the SDP. The setRemoteDescription calls trigger an onaddstream or onremovestream callback respectively.
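To make the Plan B idea a bit more concrete, here is a minimal sketch of how one could pull the RFC 5576 `a=ssrc` lines out of two successive descriptions and see which sources were added or removed between them. The sample SDP, the SSRC values and the function names are made up for illustration; this is not how Chrome implements it internally.

```python
import re
from collections import defaultdict

# Hypothetical Plan B style video section: one m-line carrying two sources.
SAMPLE_SDP = "\r\n".join([
    "m=video 1 RTP/SAVPF 100",
    "a=ssrc:1111 cname:alice",
    "a=ssrc:1111 msid:streamA trackA",
    "a=ssrc:2222 cname:bob",
    "a=ssrc:2222 msid:streamB trackB",
])

# RFC 5576 syntax: a=ssrc:<ssrc-id> <attribute>[:<value>]
SSRC_LINE = re.compile(r"^a=ssrc:(\d+) ([^:\s]+)(?::(.*))?$")


def ssrc_attributes(sdp):
    """Map each SSRC to a dict of its source-level attributes."""
    sources = defaultdict(dict)
    for line in sdp.splitlines():
        match = SSRC_LINE.match(line.strip())
        if match:
            ssrc, name, value = match.groups()
            sources[int(ssrc)][name] = value or ""
    return dict(sources)


def added_and_removed(old_sdp, new_sdp):
    """Return (added, removed) SSRC lists between two descriptions."""
    old = set(ssrc_attributes(old_sdp))
    new = set(ssrc_attributes(new_sdp))
    return sorted(new - old), sorted(old - new)
```

Running `added_and_removed` on consecutive setRemoteDescription values from a dump should roughly line up with the onaddstream/onremovestream events, assuming each new source belongs to a new stream.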
This is not the first step Google has taken to upgrade Hangouts to WebRTC. In August 2013, the video codec used changed from H.264 (presumably H.264 SVC, the scalable variant) to VP8.
As we will see later, the new WebRTC Hangouts make use of a technique called simulcast, which is similar to SVC. Possibly Google had already introduced this technique back then, making small, well-defined steps in the upgrade process.
In simulcast, several video streams are sent, each with different resolutions and framerates. Conversely, SVC does this using a single video stream. When simulcast is used to send multiple streams over the same connection there is not much difference, even though SVC is slightly more efficient. One of the most important features of both SVC and simulcast is that a Selective Forwarding Unit (such as the Jitsi Videobridge or the VidyoRouter that is presumably still used by Hangouts) can forward low-bandwidth versions of the stream to certain clients without having to decode and re-encode the video in the process.
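The forwarding decision can be sketched in a few lines. This is only a toy model of what a Selective Forwarding Unit might do with simulcast: the layer list, the bitrates and the `pick_layer` helper are all hypothetical, and a real SFU would also have to deal with keyframes, SSRC rewriting and bandwidth estimation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    ssrc: int    # each simulcast stream has its own SSRC
    width: int
    height: int
    kbps: int    # rough bitrate of this stream (made-up numbers)


def pick_layer(layers, receiver_kbps):
    """Pick the highest-bitrate layer that fits the receiver's budget.

    Falls back to the smallest layer if nothing fits. No decoding or
    re-encoding happens here: the SFU just chooses which stream to relay.
    """
    fitting = [layer for layer in layers if layer.kbps <= receiver_kbps]
    if fitting:
        return max(fitting, key=lambda layer: layer.kbps)
    return min(layers, key=lambda layer: layer.kbps)


# Three hypothetical simulcast streams sent by one participant.
LAYERS = [
    Layer(ssrc=1, width=320, height=180, kbps=150),
    Layer(ssrc=2, width=640, height=360, kbps=500),
    Layer(ssrc=3, width=1280, height=720, kbps=1500),
]
```

For example, `pick_layer(LAYERS, 600)` selects the 640x360 stream, while a receiver with only 100 kbps available still gets the smallest one.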
Another issue that was cited back in August 2013 as a reason that Hangouts had not yet switched to WebRTC was the hats feature. Now all the people in the screenshots are wearing hats, and a Chrome Native Client (NaCl) extension is required. This might mean that the hats feature (which requires facial recognition) needs more performance than is currently available at the JavaScript layer to work well.
The goal of this step seems to have been rolling out compatibility with Chrome without making additional changes on the server-side infrastructure. Removing the non-standard elements like SDES and RTP-based data channels might happen in the future, but will not impact users in any way. Firefox compatibility will then mostly depend on how fast Mozilla is able to add multiple streams per PeerConnection.
Even though the use of Chrome Extensions for screensharing has been advocated by the WebRTC team, Hangouts does not require an extension to use the chrome.desktopCapture.chooseDesktopMedia API.
As a summary, here is where what Chrome uses in this case differs from the WebRTC standard (either the W3C or the various IETF drafts):
| Feature | WebRTC/RTCWeb Specifications | Chrome |
| --- | --- | --- |
| SDES | MUST NOT offer SDES | uses SDES |
| Data channels | DTLS/SCTP-based | unreliable RTP-based |
| ICE | RFC 5245 | google-ice |
| Audio codec | Opus or G.711 | ISAC |
| Multiple streams | undecided yet | Plan B |
| Simulcast | undecided yet | proprietary SDP extension |
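If you want to check a session description for these differences yourself, a rough sketch like the following can help. The line prefixes it looks for are my assumptions about how these features typically show up in Chrome's SDP (`a=crypto:` for SDES, `a=fingerprint:` for DTLS, `google-ice` in `a=ice-options`, a `google-data` rtpmap for RTP data channels, and an `a=ssrc-group:SIM` line for simulcast); the sample SDP and the function name are made up.

```python
def nonstandard_features(sdp):
    """Report which of the features discussed above appear in an SDP blob."""
    lines = [line.strip() for line in sdp.splitlines()]

    def has(prefix, needle=""):
        # True if any line starts with `prefix` and contains `needle`.
        return any(l.startswith(prefix) and needle in l for l in lines)

    return {
        "sdes": has("a=crypto:"),
        "dtls": has("a=fingerprint:"),
        "google_ice": has("a=ice-options:", "google-ice"),
        "rtp_data_channel": has("a=rtpmap:", "google-data/"),
        "simulcast_group": has("a=ssrc-group:SIM"),
    }


# Hypothetical, heavily shortened SDP; the crypto key is a placeholder.
SAMPLE = "\r\n".join([
    "v=0",
    "m=audio 1 RTP/SAVPF 111",
    "a=ice-options:google-ice",
    "a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:PLACEHOLDERKEY",
    "m=application 1 RTP/SAVPF 101",
    "a=rtpmap:101 google-data/90000",
])
```

Feeding it the SDP from a setLocalDescription or setRemoteDescription entry gives a quick yes/no overview before reading the SDP line by line.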
If you don’t believe me, or want to learn how to use webrtc-internals to extract this kind of information and more from your WebRTC session, then please review the detailed analysis section below.
We are now going to take a quick look at a session we dumped. The full dump is available here, saved directly from webrtc-internals.
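For those who prefer to poke at such a dump with a script rather than by eye, here is a minimal sketch that counts the logged API calls per PeerConnection. It assumes the dump is a JSON file with a top-level `PeerConnections` object whose entries carry an `updateLog` list of `{time, type, value}` records; that layout is an assumption on my part and may differ between Chrome versions. The example data below is hand-made, not taken from the real dump.

```python
import json
from collections import Counter


def load_dump(path):
    """Read a saved webrtc-internals dump (assumed to be plain JSON)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def api_call_counts(dump):
    """Count logged event types for every PeerConnection in the dump."""
    counts = {}
    for pc_id, pc in dump.get("PeerConnections", {}).items():
        log = pc.get("updateLog", [])
        counts[pc_id] = Counter(entry.get("type") for entry in log)
    return counts


# Tiny hand-made stand-in for a real dump (all values hypothetical).
EXAMPLE_DUMP = {
    "PeerConnections": {
        "1234-1": {
            "updateLog": [
                {"time": "t0", "type": "createOffer", "value": ""},
                {"time": "t1", "type": "setLocalDescription", "value": "..."},
                {"time": "t2", "type": "setRemoteDescription", "value": "..."},
                {"time": "t3", "type": "setRemoteDescription", "value": "..."},
            ]
        }
    }
}
```

With a real file you would call `api_call_counts(load_dump("dump.json"))` (the file name is a placeholder); repeated setRemoteDescription entries are a hint that streams are being added or removed over time.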
Beware, the following sections use quite a lot of SDP terminology, so reading the SDP anatomy blogpost is highly recommended.
The constraints used to create the RTCPeerConnection object: