WebRTC for a plugin-free IP Network Camera

25 Aug

A brief history

IP Network Cameras (henceforth referred to as IPNC) have always had an affinity for the web browser. The focus has always been on ease of installation: consumers typically want to install their cameras and get going without having to install any software.

The web browser has been the software of choice. It typically does not need any installation, as Internet Explorer has always been bundled with the most dominant desktop operating system. More recently, Chrome and Firefox have become popular enough to need no introduction.

Initial IPNC Web GUI applications were developed entirely as ActiveX plugins for IE. There was hardly any ‘Web’ component in them. At the time, with IE holding a dominant market position and supporting easy installation through CAB files, this offered value to the customer.

However, there was a problem. Customization required working with a different set of tools and source code. With browser technology improving at a fast pace, new solutions emerged. These involved developing the UI (user interface) elements using web technology, while audio and video were still handled by plugins.

These solutions still had an issue, namely the portability of audio and video. Over time, Chrome and Firefox have taken a dominant position in the market. Linux and Mac OS have also emerged as platforms demanding attention from camera manufacturers. With browsers supporting different plugin APIs (NPAPI and ActiveX) across different operating systems, supporting all of them was a challenge.

We solved this issue with the help of libVLC. More details can be found here.

Problem brewing

Plugins, with their security and stability issues, have never been a natural choice for web applications. With the advent of HTML5, major browser vendors have decided to phase out support for plugins, with Chrome and Edge leading the way. This creates a problem for all legacy systems.

Choosing the technology

– HTML5 Video ?

This is the obvious choice. However, it does not support RTP/RTSP and hence is not an ideal fit where latency is concerned.

– Media Source Extensions ?

This is a possible choice. It allows us to develop our own protocols. However, MSE does not support RTP parsers natively. While we could convert the container format from RTP to MP4 using JavaScript, it is not advisable. This API is currently better suited for streaming media than for live media.

– WebRTC ?

This is the ideal candidate. It has been built for real-time communication over the web, which has been a blessing in disguise: surveillance video requires low latency, and the latency requirement is similar to that of a video conferencing system. Additionally, WebRTC, with the help of STUN and TURN, provides firewall traversal, which is an added advantage for the IPNC use case.
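As a minimal sketch of how STUN and TURN servers are handed to the browser, the helper below builds the configuration object accepted by the RTCPeerConnection constructor. The helper name buildIceConfig, the server URLs and the credentials are placeholders invented for illustration; they are not part of our setup.

```javascript
// Build an RTCConfiguration-shaped object for `new RTCPeerConnection(config)`.
// buildIceConfig is a hypothetical helper; URLs and credentials are placeholders.
function buildIceConfig(stunUrls, turn) {
  const iceServers = [];
  if (stunUrls.length > 0) {
    // STUN lets the browser discover its public address.
    iceServers.push({ urls: stunUrls });
  }
  if (turn) {
    // TURN relays media when no direct path can be found.
    iceServers.push({
      urls: turn.urls,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

const config = buildIceConfig(
  ['stun:stun.example.org:3478'],
  { urls: ['turn:turn.example.org:3478'], username: 'camera-user', credential: 'secret' }
);
// In a browser: const pc = new RTCPeerConnection(config);
```

On a local network where browser and camera can reach each other directly, the list of servers may well be empty.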


Solution overview

– Tech overview

From an application perspective WebRTC can be divided into two parts.

– – Browser API

This is also known as the WebRTC API. This API primarily provides the capabilities of the system in terms of media supported and network endpoints (through ICE). A/V streaming is also handled by the browser.

getUserMedia, while technically not a part of WebRTC, is a prerequisite. It is used for capturing audio and video from the system.

– – Signalling

Signalling loosely refers to the discovery and setup process. In a typical conferencing scenario it would involve finding the friends you want to communicate with, and sharing the media capabilities and network endpoint details of your system.


– A typical WebRTC app

Let's take an example of how a typical WebRTC-based web conferencing app would work. Let Carl be the person initiating the call and Roy be the person receiving the call.

Initially both of them would log on to the Web App and register their presence in the system. Typically this would involve storing their presence in a database on a server.

Subsequently, Carl would initiate the call. This would involve the JS (JavaScript) code in Carl’s browser asking for permission to capture audio and video (via the getUserMedia call). On success, Carl’s side of the app would get its network endpoints (ICE candidates) and media capabilities (encoded as SDP). These would then be communicated to the server along with an intention to communicate with Roy. In WebRTC parlance this is called the offer.

The server would then communicate the offer to Roy. Roy would then send his capabilities and network endpoint details to the server, which would relay them back to Carl. In WebRTC parlance this is called the answer.

Now the WebRTC code in the respective browsers would set up the two-way A/V stream. Once the setup is complete, both ends would be notified. The notification would include a MediaStreamEvent, which would then be used to set up the URL in the <video> tag.
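The caller's side of this exchange can be sketched roughly as follows. This is an illustration, not the code of any particular app: it uses the promise-based form of the browser API and the track event, which is the current counterpart of the stream notification described above. The signalling object (with send and onmessage), the message shapes and the function name startCall are assumptions made for the example.

```javascript
// Caller-side sketch of the offer/answer exchange (hypothetical helper).
// pc:         an RTCPeerConnection
// signalling: assumed object with send(msg) and an assignable onmessage(msg)
// media:      object exposing getUserMedia (navigator.mediaDevices in a browser)
async function startCall(pc, signalling, media, onRemoteStream) {
  // Ask for permission and capture local audio and video.
  const local = await media.getUserMedia({ audio: true, video: true });
  local.getTracks().forEach((track) => pc.addTrack(track, local));

  // Forward each discovered network endpoint (ICE candidate) to the peer.
  pc.onicecandidate = (event) => {
    if (event.candidate) {
      signalling.send({ type: 'candidate', candidate: event.candidate });
    }
  };

  // Hand the remote stream to the page once media starts flowing.
  pc.ontrack = (event) => onRemoteStream(event.streams[0]);

  // React to the answer and to remote candidates relayed by the server.
  signalling.onmessage = async (msg) => {
    if (msg.type === 'answer') {
      await pc.setRemoteDescription({ type: 'answer', sdp: msg.sdp });
    } else if (msg.type === 'candidate') {
      await pc.addIceCandidate(msg.candidate);
    }
  };

  // Create the offer (media capabilities as SDP) and send it.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signalling.send({ type: 'offer', sdp: offer.sdp });
}
```

In a browser, onRemoteStream would typically assign the stream to the <video> element.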



The IPNC app differs from a typical app in the way the different components are located in the overall system. This is because we want to give users the same experience they are used to, i.e. enter the URL of the camera into the browser and things start working.

The details are as follows.

– – Server for discovery

Typically the IPNC camera details are known a priori. Camera discovery is done via a CMS or DNS, or the camera is advertised manually, i.e. we know the hostname or IP address of the camera.

– – Signalling Channel

The technology used for signalling must be peer to peer. Deploying an intermediate server for communication would complicate installations.
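Since the camera's address is already known from the page URL, the browser can open its signalling connection straight back to the camera. A minimal sketch, assuming a WebSocket endpoint on the camera; the function name signallingUrl, the port and the path are placeholders chosen for illustration:

```javascript
// Derive the signalling WebSocket URL from the location of the camera's web page.
// The port and path are hypothetical; the camera decides where it listens.
function signallingUrl(pageLocation, port, path) {
  // Use secure WebSockets when the page itself was served over HTTPS.
  const scheme = pageLocation.protocol === 'https:' ? 'wss:' : 'ws:';
  return `${scheme}//${pageLocation.hostname}:${port}${path}`;
}

// In a browser:
//   const socket = new WebSocket(signallingUrl(window.location, 8080, '/signalling'));
```

With this approach no intermediate server is needed: the same host that serves the web page also terminates the signalling channel.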

– – Media

Typically the IPNC application needs to handle media as follows.

  • The browser app would always be the caller and the IPNC would be a receiver.
  • Neither the caller nor the receiver would have “self views” (previews).
  • The caller would only send audio.
  • The receiver would be sending both audio and video.
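On the browser side, these media requirements could be expressed roughly as below. This is a sketch using the transceiver API of current browsers, not necessarily what our web app does; captureConstraints and configureMedia are names invented for the example.

```javascript
// getUserMedia constraints for the browser (the caller): microphone only, no camera.
const captureConstraints = { audio: true, video: false };

// Hypothetical helper: send and receive audio, but only receive video.
function configureMedia(pc, localStream) {
  const [audioTrack] = localStream.getAudioTracks();
  // Audio flows in both directions between browser and camera.
  pc.addTransceiver(audioTrack, { direction: 'sendrecv', streams: [localStream] });
  // Video is only received from the camera, never sent.
  pc.addTransceiver('video', { direction: 'recvonly' });
}

// In a browser:
//   const stream = await navigator.mediaDevices.getUserMedia(captureConstraints);
//   configureMedia(pc, stream);
```

Because the local stream is never attached to a <video> element, no self view is shown.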

– – Portability

The software stack on the receiver would have to be portable to an ARM Linux embedded system, as the majority of IP Network Camera designs are based on ARM Linux.


The web application design is pretty straightforward, as there are plenty of good resources. The challenges are on the IPNC side. The following sections discuss some of the implementation details and the problems that we faced.


– Architecture

WebRTC Architecture

– Implementation details

The solution has two parts.

– – IPNC Linux App (running on the IP Camera)

  • Based on OpenWebRTC.
  • Use websocket as the signalling channel.
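Seen from the browser, a websocket signalling channel can be as thin as a JSON wrapper around the socket. The sketch below assumes JSON-encoded messages and uses an invented helper name (wrapSocket); the actual wire format depends on what the application on the camera expects.

```javascript
// Wrap a WebSocket-like object so that callers exchange plain objects.
// JSON as the wire format is an assumption made for this sketch.
function wrapSocket(socket) {
  const channel = {
    onmessage: null,
    // Serialize outgoing signalling messages (offer, answer, candidates).
    send(msg) {
      socket.send(JSON.stringify(msg));
    },
  };
  // Parse incoming frames and pass them on to whoever is listening.
  socket.onmessage = (event) => {
    if (channel.onmessage) {
      channel.onmessage(JSON.parse(event.data));
    }
  };
  return channel;
}

// In a browser: const channel = wrapSocket(new WebSocket(url));
```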

– – IPNC Web App

The WebApp is basic but functional across Chrome and Firefox (on Windows 8.1 and on Linux, Ubuntu 14.04, x86_64). The changes follow the requirements listed above. Details below.

  • Use websocket for signalling.
  • Capture only audio.
  • Send only audio.
  • Disable self views.
  • Fullscreen view.
  • Support video overlays.

Many IPNC Web GUIs support drawing overlays on the video (to provide inputs to the image processing algorithms running on the camera). We have done a simple overlay using canvas for fun :-).
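A canvas overlay of this kind can be sketched as follows. It assumes a <canvas> element stacked on top of the <video> element and a region expressed in normalized coordinates (0 to 1), so that it is independent of the displayed size. The function names and the region format are illustrative, not taken from our app.

```javascript
// Convert a region given in normalized [0, 1] coordinates to canvas pixels.
function toPixels(region, width, height) {
  return {
    x: Math.round(region.x * width),
    y: Math.round(region.y * height),
    w: Math.round(region.w * width),
    h: Math.round(region.h * height),
  };
}

// Draw the region as a rectangle outline on a 2D canvas context.
function drawOverlay(ctx, region, width, height) {
  const r = toPixels(region, width, height);
  ctx.clearRect(0, 0, width, height); // wipe the previous overlay
  ctx.strokeStyle = 'red';
  ctx.lineWidth = 2;
  ctx.strokeRect(r.x, r.y, r.w, r.h);
  return r;
}

// In a browser:
//   const canvas = document.querySelector('canvas');
//   drawOverlay(canvas.getContext('2d'), { x: 0.25, y: 0.5, w: 0.5, h: 0.25 },
//               canvas.width, canvas.height);
```

The same normalized region could then be sent to the camera as input for its image processing algorithms.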


We have a system running with a Logitech webcam on a Linux PC. A recording of this demonstration is available here on YouTube.

Way forward

We are working on getting the system to work on an embedded system. We have built a recipe for cross-compiling OpenWebRTC; some work still remains to get the system working on an embedded platform with support for hardware-accelerated encoding.


