WebRTC over WebSocket in Node.js

WebSocket servers based on Node.js can provide full-duplex, real-time signaling for WebRTC implementations.

Introduction

WebSocket is a protocol that enables real-time communication between client applications (such as browsers and native platforms) and a WebSocket server. For full-duplex, real-time communication, WebSocket is generally preferred over plain HTTP request/response because, once the connection is established, messages travel with lower latency and less per-message overhead.

Built on top of TCP, WebSocket supports a two-way communication model in which the client-server connection stays open after the initial handshake, so either side can send text or binary messages at any time. The connection can be either encrypted or unencrypted. The specification defines two URI schemes: ws (WebSocket) for unencrypted connections and wss (WebSocket Secure) for encrypted connections.
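As a small illustration of the two URI schemes, the sketch below classifies a WebSocket URL with Node's built-in URL parser. The function name describeWebSocketUrl and the example URLs are assumptions made for this example; the default ports (80 for ws, 443 for wss) come from the WebSocket specification.

```javascript
// Classify a WebSocket URL by its scheme using the global URL class.
// `describeWebSocketUrl` is a hypothetical helper, not part of any library.
function describeWebSocketUrl(input) {
  const url = new URL(input);
  if (url.protocol === 'ws:') {
    return { secure: false, defaultPort: 80 };
  }
  if (url.protocol === 'wss:') {
    return { secure: true, defaultPort: 443 };
  }
  throw new Error(`Not a WebSocket URL: ${input}`);
}

const localSignaling = describeWebSocketUrl('ws://localhost:8080/signal');
const publicSignaling = describeWebSocketUrl('wss://example.org/signal');
```

A check like this could be used to refuse unencrypted signaling URLs outside of local development.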

As we proceed, we will explore the WebSocket protocol and how to set up a basic WebSocket server with the ws WebSocket library for Node.js. For now, let’s quickly explore WebRTC, a technology available in all modern browsers and on native Android and iOS platforms through simple APIs.

Introducing WebRTC

WebRTC, which stands for Web Real-Time Communication, is an open standard that defines a set of rules for bidirectional, secure, real-time, peer-to-peer communication on the web. With WebRTC, web applications or other WebRTC agents can send video, audio, and other kinds of data between peers.

WebRTC relies on several other protocols to create a connection or communication channel and then transfer data and media over it. To coordinate communication, WebRTC clients need some sort of “signaling server” in between, through which they exchange the metadata required to set up the connection.

Socket.IO- and ws-based Node.js servers are one option for signaling: they keep a persistent, real-time connection open over which WebRTC clients can share session descriptions and media information before they exchange data directly. WebSocket and WebRTC can therefore be used as complementary technologies.

Note: In order to maximize compatibility with established technologies, signaling is not implemented by the WebRTC open standard. Different applications may prefer different signaling protocols or services, which is why signaling was left out of the core.

How the WebRTC protocol works

A WebRTC agent knows how to create a connection with peers. Signaling triggers this initial attempt, which eventually makes the call between both agents possible. Agents use an offer/answer model: one agent makes an offer to start the call, and the other replies with an answer after checking that it is compatible with the media description offered.
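The offer/answer exchange can be sketched as a small in-memory relay that forwards messages between registered peers. This is only an illustration under stated assumptions: the SignalingRelay class, the message shape ({ type, to, payload }), and the peer names are all hypothetical and are not defined by WebRTC or by the ws library.

```javascript
// A minimal in-memory signaling relay (hypothetical helper).
// Each peer registers a `send` callback; the relay forwards offer, answer,
// and candidate messages to the addressed peer without inspecting the payload.
class SignalingRelay {
  constructor() {
    this.peers = new Map(); // peerId -> send(messageString)
  }

  join(peerId, send) {
    this.peers.set(peerId, send);
  }

  leave(peerId) {
    this.peers.delete(peerId);
  }

  // `raw` is a JSON string such as {"type":"offer","to":"bob","payload":{}}.
  // Returns true if the message was forwarded, false otherwise.
  handle(fromId, raw) {
    let message;
    try {
      message = JSON.parse(raw);
    } catch {
      return false; // ignore malformed JSON
    }
    const allowed = ['offer', 'answer', 'candidate'];
    if (message === null || typeof message !== 'object' || !allowed.includes(message.type)) {
      return false;
    }
    const send = this.peers.get(message.to);
    if (!send) return false; // unknown recipient
    send(JSON.stringify({ type: message.type, from: fromId, payload: message.payload }));
    return true;
  }
}

// Usage: alice sends an offer to bob, and bob replies with an answer.
const relay = new SignalingRelay();
const inboxAlice = [];
const inboxBob = [];
relay.join('alice', (m) => inboxAlice.push(JSON.parse(m)));
relay.join('bob', (m) => inboxBob.push(JSON.parse(m)));
relay.handle('alice', JSON.stringify({ type: 'offer', to: 'bob', payload: { sdp: '<offer SDP>' } }));
relay.handle('bob', JSON.stringify({ type: 'answer', to: 'alice', payload: { sdp: '<answer SDP>' } }));
```

In a real server, each send callback would typically wrap the send method of that peer's WebSocket connection, so the relay itself stays independent of the transport.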

At a high level, the WebRTC protocol works in four stages. They happen in a dependent order: each stage must be complete before the next one can begin. The four stages are:

1.) Signaling

Signaling begins the process of identifying two WebRTC agents that intend to communicate and exchange data. So that the peers can eventually connect and communicate, signaling makes use of another protocol under the hood: SDP.

The Session Description Protocol (SDP) is a plaintext protocol that describes a session as lines of key-value pairs, grouped into media sections. With it, we can share state between two or more peers that intend to connect.
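To make the key-value structure concrete, here is a minimal sketch that splits an SDP description into its session-level lines and one section per "m=" (media) line. The parseSdp function and the sample description are assumptions made for illustration; a real offer produced by a browser contains many more lines.

```javascript
// Split an SDP description into { key, value } lines and group them into
// a session part plus one section per "m=" (media) line.
// `parseSdp` is a hypothetical helper written for this example.
function parseSdp(sdp) {
  const session = [];
  const media = [];
  let current = session;
  for (const line of sdp.split(/\r?\n/)) {
    if (line[1] !== '=') continue; // skip blank or malformed lines
    const entry = { key: line[0], value: line.slice(2) };
    if (entry.key === 'm') {
      current = [entry]; // every "m=" line starts a new media section
      media.push(current);
    } else {
      current.push(entry);
    }
  }
  return { session, media };
}

// A shortened, hand-written sample description.
const sampleSdp = [
  'v=0',
  'o=- 4611731400430051336 2 IN IP4 127.0.0.1',
  's=-',
  't=0 0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111',
  'a=rtpmap:111 opus/48000/2',
  'm=video 9 UDP/TLS/RTP/SAVPF 96',
  'a=rtpmap:96 VP8/90000',
].join('\r\n');

const parsedSdp = parseSdp(sampleSdp);
```

Here parsedSdp.session holds the four session-level lines, and parsedSdp.media holds two sections, one for audio and one for video.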

Note: This shared state provides all the parameters needed to establish a connection between peers.

2.) Connecting

After signaling, WebRTC agents need to achieve bidirectional, peer-to-peer communication. Although establishing a connection can be difficult for a number of reasons, such as differing IP versions, network locations, or protocols, a peer-to-peer WebRTC connection offers advantages over traditional client/server communication. These include reduced bandwidth costs, lower latency, and better security.

Note: WebRTC also makes use of ICE (Interactive Connectivity Establishment) to connect two agents. ICE is a protocol that tries to find the best way for two ICE agents to communicate.
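During ICE, each agent gathers candidate addresses and shares them with the other side through signaling. As a rough sketch, the function below parses the textual form of a candidate into its fields. The name parseIceCandidate and the sample line (which uses a documentation-only IP address) are assumptions made for this example; the field order follows the candidate attribute defined for ICE in SDP.

```javascript
// Parse an ICE candidate line into its main fields.
// `parseIceCandidate` is a hypothetical helper written for this example.
function parseIceCandidate(line) {
  const parts = line.replace(/^a=/, '').trim().split(/\s+/);
  if (parts.length < 8 || !parts[0].startsWith('candidate:') || parts[6] !== 'typ') {
    throw new Error(`Not an ICE candidate: ${line}`);
  }
  return {
    foundation: parts[0].slice('candidate:'.length),
    component: Number(parts[1]),
    transport: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7], // host, srflx, prflx, or relay
  };
}

const sampleCandidate = parseIceCandidate(
  'candidate:842163049 1 udp 1677729535 203.0.113.5 46154 typ srflx raddr 10.0.0.2 rport 46154'
);
```

The type field shows how the address was obtained: host is a local address, srflx is an address learned through a STUN server, and relay is an address allocated on a TURN server.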

3.) Securing

Every WebRTC connection is encrypted and authenticated. WebRTC uses the DTLS and SRTP protocols to secure communication between peers. DTLS, which is similar to TLS but runs over datagram transports such as UDP, allows us to negotiate a session and then exchange data securely between two peers. SRTP, on the other hand, is designed for exchanging media securely.

4.) Communication

WebRTC allows us to send and receive an unlimited number of audio and video streams. The protocol is not tied to a particular codec; several options are available.

It relies on two pre-existing protocols: RTP and RTCP. RTP is the protocol that carries the media. It was designed to allow real-time delivery of audio and video. RTCP is the protocol that communicates metadata about the call.
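To give a feel for what RTP carries alongside the media itself, the sketch below decodes the 12-byte fixed header that starts every RTP packet. The function name parseRtpHeader and the sample bytes are assumptions made for this example; in a WebRTC application, the browser or native stack does this work for us.

```javascript
// Decode the 12-byte fixed RTP header from a Node.js Buffer.
// `parseRtpHeader` is a hypothetical helper written for this example.
function parseRtpHeader(packet) {
  if (packet.length < 12) {
    throw new RangeError('An RTP header needs at least 12 bytes');
  }
  const first = packet[0];
  const second = packet[1];
  return {
    version: first >> 6,                    // top 2 bits
    padding: (first & 0x20) !== 0,
    extension: (first & 0x10) !== 0,
    csrcCount: first & 0x0f,
    marker: (second & 0x80) !== 0,
    payloadType: second & 0x7f,             // identifies the negotiated codec
    sequenceNumber: packet.readUInt16BE(2), // detects loss and reordering
    timestamp: packet.readUInt32BE(4),      // sampling instant of the media
    ssrc: packet.readUInt32BE(8),           // identifies the stream source
  };
}

// Hand-written sample header (no payload).
const rtpPacket = Buffer.from([
  0x80, 0x6f,             // version 2, no padding/extension/CSRC; marker 0, payload type 111
  0x00, 0x2a,             // sequence number 42
  0x00, 0x00, 0x03, 0xc0, // timestamp 960
  0x12, 0x34, 0x56, 0x78, // SSRC 0x12345678
]);
const rtpHeader = parseRtpHeader(rtpPacket);
```

The sequence number and timestamp are what let a receiver reorder packets and play media back at the right pace, which is the real-time delivery role described above.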

Note: In most WebRTC applications, there is no direct socket connection between the clients unless they reside on the same local network. A common way to work around this is to use a TURN server.

TURN stands for Traversal Using Relays around NAT, and it is a protocol for relaying network traffic. NAT mapping, with the help of the STUN (Session Traversal Utilities for NAT) and TURN protocols, allows two peers in completely different subnets to communicate.
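On the client side, STUN and TURN servers are usually supplied as configuration when the peer connection is created. The sketch below builds such a configuration object. The buildIceConfig helper, the host names, and the credentials are placeholders assumed for this example; the iceServers shape (urls, username, credential) is the one accepted by the browser's RTCPeerConnection.

```javascript
// Build the configuration object that a browser-side RTCPeerConnection accepts.
// `buildIceConfig` is a hypothetical helper; the servers below are placeholders.
function buildIceConfig({ stunUrl, turnUrl, username, credential }) {
  const iceServers = [{ urls: stunUrl }];
  if (turnUrl) {
    // TURN relays traffic, so servers normally require credentials.
    iceServers.push({ urls: turnUrl, username, credential });
  }
  return { iceServers };
}

const iceConfig = buildIceConfig({
  stunUrl: 'stun:stun.example.org:3478',
  turnUrl: 'turn:turn.example.org:3478',
  username: 'demo-user',
  credential: 'demo-secret',
});
// In a browser: const peerConnection = new RTCPeerConnection(iceConfig);
```

Listing both kinds of server lets ICE try a direct path discovered through STUN first and fall back to relaying through TURN when no direct path works.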

Use cases for WebRTC

Because WebRTC is a general-purpose technology for real-time applications on the web and on mobile platforms, it has many use cases. Some of the most common include:

Video and text chatting
Analytics
Social networking
Screen-sharing technologies
Conferencing (audio/video)
Live broadcasting
File transfer
E-learning
Multiplayer online games
And so on…
