Browser Fundamentals | Part-3
Networking Engine
Written by Lakshit Nagar, Principal Software Developer @Oracle India (Ex-ThoughtWorker, IIT Kanpur Alumni)

Intended Audience

This article is written by a software developer for anyone who is interested in the technical aspects of modern web browsers. Readers do not need any prior knowledge to understand the content; all you need is an overview of the browser (Browser Fundamentals | Part-2).

This is a part of a series on browser fundamentals. In this series, we will peel away the unknowns of a browser layer by layer and eventually deep dive into its technical aspects, from an HTTP request to content rendering.
  1. Browser Fundamentals | Part-1 (Why and what do we need to know about browsers?)
  2. Browser Fundamentals | Part-2 (Technical overview of any browser)
  3. Browser Fundamentals | Part-3 (Networking Engine)
  4. Browser Fundamentals | Part-4 (Rendering Engine)
  5. Browser Fundamentals | Part-5 (Javascript Engine)
  6. Browser Fundamentals | Part-6 (Browser Engine)

Content

  1. Introduction
  2. Layer I : Protocols(TCP, UDP, TLS, DNS)
  3. Layer II : Socket Management and Optimization
  4. Layer III : Network Security and Sandboxing
  5. Layer IV : Storage and Session Management
  6. Layer V : Application Layer Protocol - HTTP 1.x/2.0
  7. Layer VI : Browser APIs - XmlHttpRequest, RTCPeerConnection, etc


Introduction

The networking engine of a web browser handles all the communication needed to fetch content for a given input. For example, if the input URL uses the HTTP protocol, the engine makes an HTTP request. From application code, such a request is made through a browser API such as XMLHttpRequest. (Figure: Browser Network Engine) This engine is like the tip of an iceberg: only a few APIs are exposed for use, and the implementation of each is hidden beneath the surface. The surface is platform-independent, while the implementation can differ from one browser vendor to another.

Browser APIs

Here is the full length and breadth of the networking stack. (Figure: Browser Network Stack) The surface, shown as the blue part, is what we use in daily development: XMLHttpRequest, Event Source, Web Sockets, RTCPeerConnection and Data Channel. Everything below it is the browser's implementation of the various technologies responsible for networking and communication. Implementations differ between browsers (for example, Google Chrome might implement cookie management differently from Firefox), but the stack remains almost the same for all. Further, the network stack takes care of:

  • The right connection limits.
  • Formatting our requests.
  • Sandboxing individual applications from one another.
  • Dealing with proxies, caching, and much more.

In this part of the series, we will discuss each layer of the networking stack, from bottom to top.

  • Only a few networking APIs are exposed for use; the implementation of each is hidden beneath them.
  • Networking Stack - a stack of technologies and their implementations for various use cases.

Layer I : Protocols(TCP, UDP, TLS, DNS)

In this layer, the browser reaches down to the transport layer of the OSI model, where networking protocols such as TCP and UDP operate. The implementations in Chrome and Firefox might differ, but they share a common basis, for which the W3C has published guidelines.
  • Standard Guidelines : https://www.w3.org/TR/tcp-udp-sockets
  • Chrome Implementation : https://developer.chrome.com/docs/extensions/reference/sockets_tcp/
  • Firefox Implementation : https://searchfox.org/mozilla-central/source/dom/webidl/TCPSocket.webidl
For example, Chrome creates a TCP socket with its API:

chrome.sockets.tcp.create(properties?: SocketProperties, callback: function)
When designing a web application, we don’t have to worry about individual TCP or UDP sockets; the browser manages them for us. Browsers hide these complexities so that our applications can focus on application logic. If you want to read more about TCP/UDP, tune in to my blog regularly. I will write about it soon.

We don’t have to worry about individual TCP or UDP sockets; the browser manages them for us.

Layer II : Socket Management and Optimization

If you think you can manage the opening and closing of network sockets from within a browser, then my friend, you need to think twice. A web application running in the browser never manages sockets itself. It may seem that it does, but in reality it only handles the tip of the iceberg, that is, HTTP requests. A web application can issue or cancel an HTTP request, but it cannot create or destroy the underlying sockets. Browsers separate network requests from socket management on purpose, and this separation is crucial, as we will see shortly.

A socket is a connection point for establishing a two-way communication channel between two programs running on the network. Just as an IP address uniquely identifies a network endpoint, an origin uniquely identifies a group of sockets. There can be multiple sockets under the same origin.

An origin is a unique combination of three things -- protocol, domain name and port. For example: (https, google.com, 443)
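
To make the three-part definition concrete, here is a minimal sketch that derives an origin from a URL string using the standard URL class. The helper names originOf and sameOrigin and the DEFAULT_PORTS table are illustrative assumptions, not part of any browser API.

```javascript
// Default ports for the schemes covered by this sketch (an assumption for illustration).
const DEFAULT_PORTS = { "http:": "80", "https:": "443", "ws:": "80", "wss:": "443" };

// Returns the origin as "protocol://domain:port" for a URL string.
function originOf(urlString) {
  const url = new URL(urlString);
  const scheme = url.protocol.replace(":", "");
  // URL.port is an empty string when the URL uses the default port of its scheme.
  const port = url.port || DEFAULT_PORTS[url.protocol] || "";
  return `${scheme}://${url.hostname}:${port}`;
}

// Two URLs share an origin only if protocol, domain name and port all match.
function sameOrigin(a, b) {
  return originOf(a) === originOf(b);
}

console.log(originOf("https://google.com/search?q=browser")); // https://google.com:443
console.log(sameOrigin("https://google.com/a", "http://google.com/a")); // false
```

Note that a different subdomain, such as www.google.com versus google.com, also counts as a different origin in this comparison.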

Socket Pool is a group of sockets belonging to the same origin.

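Technically, there is no limit on the number of sockets in a socket pool (under one origin). In practice, browsers put a strict limit on it. This restriction traces back to the HTTP/1.1 specification, RFC 2616, which recommended that a client keep only a small number of simultaneous connections (it suggested two) to any one server; browsers have since settled on their own per-origin limits.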

Here is the Chromium implementation of this limit: https://source.chromium.org/chromium/chromium/src/+/master:net/socket/client_socket_pool_manager.cc;l=51

Browser    Max connections per origin
Chrome     6
Firefox    6
Safari     6
IE 11      13 (surprisingly, more than double the connections per site of the other browsers)
iOS        6
Android    6
Table Source: Push Technology

Most browsers limit the socket pool size to six sockets per origin. IE11 is the exception: it allows 13 sockets per origin.

Let’s understand how network calls are made and how sockets are created within the browser-imposed limit. In the figure below, N tabs are open in a Chrome browser. Tab 1 opens a website that makes 4 requests to origin A and 2 requests to origin B. The socket pool for origin A still has capacity for 2 more sockets. Tab 2 and Tab N then make one request each to origin A and fill up its socket pool. Further requests to origin A are queued in the respective tab's HTTP request queue until at least one socket becomes free.

Browser Network Socket Pool

Similarly, the socket pool for origin B currently holds only 4 sockets, and 2 of them (shown in a faded color) are reused by Tab 1. This pool has capacity for two more sockets, but they have not been created yet because no request needs them.
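
The pooling behaviour described above can be sketched as a small simulation. This is a toy model under stated assumptions, not how Chromium implements it: the class names SocketPool and SocketPoolManager are hypothetical, the limit of 6 is taken from the table above, and the queue is kept per pool rather than per tab to keep the sketch short.

```javascript
const MAX_SOCKETS_PER_ORIGIN = 6; // assumed limit, taken from the table above

class SocketPool {
  constructor(limit = MAX_SOCKETS_PER_ORIGIN) {
    this.limit = limit;
    this.busy = 0;   // sockets currently carrying a request
    this.idle = 0;   // open sockets waiting to be reused
    this.queue = []; // request ids waiting for a free socket
  }

  // Returns "reused", "opened" or "queued" for the given request id.
  request(id) {
    if (this.idle > 0) { this.idle--; this.busy++; return "reused"; }
    if (this.busy < this.limit) { this.busy++; return "opened"; }
    this.queue.push(id);
    return "queued";
  }

  // Called when a request finishes. Returns the id of the queued request
  // that takes over the socket, or null if the socket becomes idle.
  release() {
    if (this.busy === 0) return null;
    if (this.queue.length > 0) return this.queue.shift(); // socket stays busy
    this.busy--;
    this.idle++;
    return null;
  }
}

// One pool per origin, created lazily on the first request to that origin.
class SocketPoolManager {
  constructor() { this.pools = new Map(); }
  poolFor(origin) {
    if (!this.pools.has(origin)) this.pools.set(origin, new SocketPool());
    return this.pools.get(origin);
  }
}

const manager = new SocketPoolManager();
const poolA = manager.poolFor("https://origin-a.example:443");
// Tab 1 makes 4 requests to origin A, then Tab 2 and Tab N make one each.
for (let i = 1; i <= 6; i++) poolA.request(`request-${i}`);
console.log(poolA.request("request-7")); // "queued": the pool for origin A is full
```

When any of the six requests finishes, release() hands the freed socket to request-7 without opening a new connection.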

Being in charge of socket management gives browsers a lot of advantages. With this, browsers can:

  • prioritise the HTTP requests they receive from all the tabs.
  • reuse open sockets to minimize network delay caused by handshakes.
  • be proactive in opening sockets in anticipation of requests.
  • decide when to close idle sockets.
  • decide how much bandwidth to give to which socket.

To conclude this section, browser socket management does almost everything on our behalf to deliver high-performance web applications. To benefit from this, we must understand how browsers implement these mechanisms. We can also shape our design decisions so that the browser can better anticipate our application's network communication patterns, which improves the end performance of every application.
TRICK - Unlock more than six sockets per origin
  1. Customisation: change the Chromium source. https://source.chromium.org/chromium/chromium/src/+/master:net/socket/client_socket_pool_manager.cc;l=51
  2. Domain Sharding: just like database sharding, distribute the different types of assets across different subdomains. For example, if you have a website, www.mywebsite.com, then keep all your images at www.shard1.mywebsite.com. The browser treats it as a different origin and creates a separate socket pool for it.
But domain sharding has its own drawbacks:
  • Every sharded hostname requires an additional DNS lookup.
  • Each new socket consumes extra resources.
  • The developer has to manage the assets manually.
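
One way to take the manual work out of sharding is to pick the shard hostname from a hash of the asset path, so the same asset always maps to the same hostname and stays cacheable. The sketch below is only an illustration: www.shard2.mywebsite.com, hashPath and shardUrlFor are hypothetical names introduced here, and the hash is a simple non-cryptographic one.

```javascript
// Hypothetical shard hostnames, following the www.shard1.mywebsite.com example above.
const SHARD_HOSTS = ["www.shard1.mywebsite.com", "www.shard2.mywebsite.com"];

// Simple deterministic string hash (not cryptographic).
function hashPath(path) {
  let hash = 0;
  for (const ch of path) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep the value within unsigned 32 bits
  }
  return hash;
}

// Maps an asset path to a full URL on one of the shard hostnames.
function shardUrlFor(assetPath, hosts = SHARD_HOSTS) {
  const host = hosts[hashPath(assetPath) % hosts.length];
  return `https://${host}${assetPath}`;
}

console.log(shardUrlFor("/images/logo.png"));
console.log(shardUrlFor("/images/banner.png"));
```

Because the mapping depends only on the path, every page that references /images/logo.png requests it from the same origin, which keeps the browser cache and the already-open sockets useful.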

Layer III : Network Security and Sandboxing

In the previous section, we learned that the browser takes charge of all network-related optimizations. Beyond optimization, this delegation also gives the browser a security capability: it can enforce security and policy constraints on untrusted applications.

Request Formatting & Response Processing

Again, it is the browser's job to format all outgoing requests according to the applicable policies and constraints and with well-formed protocol semantics, which protects the server. Similarly, the browser ensures that responses are properly decoded, which protects the client from malicious servers.

HTTP structure Image Source: Mozilla

TLS Negotiation

Transport Layer Security: this mechanism provides the S in HTTPS (HTTP Secure). It includes the TLS handshake, which performs the necessary validation and verification checks of the host. To open a secure connection, the host must present a certificate issued by a certificate authority (CA); this certificate is exchanged and verified during the handshake. The user is warned if the verification fails, for example when the server uses a self-signed certificate. Certificates are kept in a trust store:
  • OS X stores certificates in the Keychain.
  • Windows stores certificates in the Certificate Store.
  • Chrome has traditionally used the operating-system-provided store.
  • Safari uses the certificates in the Keychain.
  • IE uses the certificates in the Certificate Store.
Learn more on: Mozilla

Same-Origin Policy

Same-Origin Policy: this browser policy provides a critical security boundary. It restricts how an HTML document or script loaded from one origin can interact with resources from another origin. For example, by default the browser does not let a script on www.abc.com read the response of a network call to www.xyz.com. Browsers provide HTTP-header-based mechanisms to control (allow or block) such cross-origin access.

  1. CORS - Cross Origin Resource Sharing
    This is a browser standard that allows a server, through a response header (named Access-Control-Allow-Origin), to control cross-origin requests coming from outside its own origin. In other words, the server can control which other origins are allowed to read its responses from a web browser.
    For example, if a server at www.abc.com sends the response header Access-Control-Allow-Origin: https://www.xyz.com
    then only pages from https://www.xyz.com (besides www.abc.com itself) can read responses from www.abc.com in a web browser.

  2. CSP - Content Security Policy
    This is a browser standard that allows a server, through a response header (named Content-Security-Policy), to control the requests made from inside its own pages. In other words, the server can control which origins its web page is allowed to load resources from.
    For example, if a server at www.abc.com sends the response header Content-Security-Policy: default-src 'self' https://www.xyz.com
    then pages of www.abc.com can only load resources from their own origin or from https://www.xyz.com in a web browser.
  3. X-Frame-Options
    This is a browser standard that allows a server, through a response header (named X-Frame-Options), to control whether its documents can be embedded via <frame>, <iframe>, <embed>, or <object> by pages of other origins. In other words, the server can control which origins are allowed to load it as an embedded document.
    For example, if a server at www.abc.com sends the response header X-Frame-Options: SAMEORIGIN
    then pages of www.abc.com can only be embedded by www.abc.com itself.
Learn more on: Mozilla
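
The header checks above can be sketched as two small decision functions. This is a simplified model, assuming requests without credentials or preflight and origins written with their scheme (for example https://www.xyz.com); the function names corsAllows and frameAllowed are hypothetical and are not browser APIs.

```javascript
// Decides whether a script from pageOrigin may read a cross-origin response,
// based on the Access-Control-Allow-Origin response header (simplified).
function corsAllows(pageOrigin, allowOriginHeader) {
  if (allowOriginHeader === undefined || allowOriginHeader === null) return false;
  const value = allowOriginHeader.trim();
  if (value === "*") return true; // wildcard: any origin may read the response
  return value === pageOrigin;    // otherwise the origin must match exactly
}

// Decides whether a document may be embedded by a page of embedderOrigin,
// based on the X-Frame-Options response header (simplified to the direct embedder).
function frameAllowed(embedderOrigin, documentOrigin, xFrameOptions) {
  if (!xFrameOptions) return true; // no header: this mechanism does not restrict embedding
  const value = xFrameOptions.trim().toUpperCase();
  if (value === "DENY") return false;
  if (value === "SAMEORIGIN") return embedderOrigin === documentOrigin;
  return true; // unknown values are ignored in this sketch
}

console.log(corsAllows("https://www.xyz.com", "https://www.xyz.com"));   // true
console.log(corsAllows("https://www.other.com", "https://www.xyz.com")); // false
console.log(frameAllowed("https://www.xyz.com", "https://www.abc.com", "SAMEORIGIN")); // false
```

In both cases the server only declares the policy through a header; it is the browser that enforces it on behalf of the user.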

Browser's security measures :
Request & response formatting, TLS and Same-origin policy (CORS, CSP, X-Frame-Options)

Layer IV : Storage and Session Management

Storage

The fastest request is the one that is never made. It may sound a bit funny, but browsers take it seriously. Before making a fresh HTTP request, the browser checks its asset cache and serves the local copy if a valid one is present; otherwise, it makes the request. This cache handling includes:

  • Evaluating caching directives for each resource
  • Validating/Re-validating expired resources if any
  • Managing size of cache and its eviction
We can control the browser's caching behavior through some response headers from a web server. Cache-Control, ETag, and Last-Modified response headers are used for controlling caching of each resource.
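
As a rough illustration of how these headers drive the decision, the sketch below decides between using the cached copy, revalidating it, or fetching afresh. It is a simplified model, not a browser's actual algorithm: the entry shape ({ storedAtMs, headers }) and the function names parseCacheControl and cacheDecision are assumptions, and only max-age, no-cache, no-store, ETag and Last-Modified are considered.

```javascript
// Parses a Cache-Control value into a map, e.g. "max-age=60, no-cache"
// becomes { "max-age": "60", "no-cache": true }.
function parseCacheControl(header = "") {
  const directives = {};
  for (const part of header.split(",")) {
    const [name, value] = part.trim().split("=");
    if (name) {
      directives[name.toLowerCase()] = value === undefined ? true : value.replace(/"/g, "");
    }
  }
  return directives;
}

// Returns { action: "use-cache" | "revalidate" | "fetch", headers? } for a cached entry.
function cacheDecision(entry, nowMs) {
  if (!entry) return { action: "fetch" }; // nothing cached yet
  const cc = parseCacheControl(entry.headers["cache-control"]);
  if (cc["no-store"]) return { action: "fetch" };

  const ageSeconds = (nowMs - entry.storedAtMs) / 1000;
  const maxAge = typeof cc["max-age"] === "string" ? Number(cc["max-age"]) : NaN;
  const fresh = !cc["no-cache"] && Number.isFinite(maxAge) && ageSeconds < maxAge;
  if (fresh) return { action: "use-cache" };

  // Stale entry: ask the server whether the copy is still valid, if we have a validator.
  const conditional = {};
  if (entry.headers["etag"]) conditional["If-None-Match"] = entry.headers["etag"];
  if (entry.headers["last-modified"]) conditional["If-Modified-Since"] = entry.headers["last-modified"];
  return Object.keys(conditional).length > 0
    ? { action: "revalidate", headers: conditional }
    : { action: "fetch" };
}

const cached = { storedAtMs: 0, headers: { "cache-control": "max-age=60", etag: '"v1"' } };
console.log(cacheDecision(cached, 30000).action); // use-cache
console.log(cacheDecision(cached, 90000).action); // revalidate
```

A revalidation lets the server answer with a short "not modified" response instead of sending the whole resource again.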

Session

To support authentication and authorization, the browser provides a session management mechanism through cookies. The browser keeps a separate set of cookies for each site (strictly speaking, cookies are scoped by domain and path rather than by the full origin). The cookies for a particular site are automatically attached to HTTP requests made to that server. In this way, the server can verify the authenticity of the user. An authenticated session can be shared across multiple tabs or browser windows. Conversely, a sign-out action in a single tab invalidates the session in all other open tabs and windows.
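
The shared-session behaviour can be pictured with a toy cookie jar. This sketch keys cookies by hostname only and ignores attributes such as Path, Expires, Secure and SameSite; the CookieJar class and its method names are hypothetical and only illustrate the idea.

```javascript
class CookieJar {
  constructor() {
    this.cookiesByHost = new Map(); // host -> Map(name -> value)
  }

  // Stores a cookie from a Set-Cookie header value; attributes after ";" are ignored here.
  setCookie(host, setCookieHeader) {
    const pair = setCookieHeader.split(";")[0];
    const idx = pair.indexOf("=");
    if (idx <= 0) return; // not a name=value pair
    const name = pair.slice(0, idx).trim();
    const value = pair.slice(idx + 1).trim();
    if (!this.cookiesByHost.has(host)) this.cookiesByHost.set(host, new Map());
    this.cookiesByHost.get(host).set(name, value);
  }

  // Removes a cookie, for example on sign-out.
  deleteCookie(host, name) {
    this.cookiesByHost.get(host)?.delete(name);
  }

  // Builds the Cookie request header attached to requests for this host, or null.
  cookieHeaderFor(host) {
    const cookies = this.cookiesByHost.get(host);
    if (!cookies || cookies.size === 0) return null;
    return [...cookies].map(([name, value]) => `${name}=${value}`).join("; ");
  }
}

// All tabs share one jar, so a sign-in or sign-out in one tab is visible to the others.
const jar = new CookieJar();
jar.setCookie("www.abc.com", "session=abc123; HttpOnly");
console.log(jar.cookieHeaderFor("www.abc.com")); // session=abc123
jar.deleteCookie("www.abc.com", "session");
console.log(jar.cookieHeaderFor("www.abc.com")); // null
```

Since the jar lives in the browser and not in a tab, every tab that talks to www.abc.com sends the same session cookie until it is removed.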


Layer V : Application Layer Protocol - HTTP 1.x/2.0

Just as the browser provides its own TCP/UDP handling for the OSI transport layer, it also implements HTTP 1.x/2.0 for the OSI application layer. The major feature of HTTP 2.0 over HTTP 1.x is multiplexing: HTTP 2.0 lets the browser load multiple assets concurrently over a single TCP connection. Here is the visualization: (Figure: HTTP2 Multiplexing. Image Source: CloudFlare)
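
The idea of multiplexing can be shown with a toy model: each response is cut into frames tagged with a stream id, frames from all streams are interleaved on one connection, and the receiver reassembles them by id. This is not the HTTP/2 wire format (which uses numeric stream ids and binary frames); the function names and the string stream ids are assumptions made for illustration.

```javascript
// Splits one stream's payload into frames tagged with its stream id.
function toFrames(streamId, payload, frameSize) {
  const frames = [];
  for (let i = 0; i < payload.length; i += frameSize) {
    frames.push({ streamId, data: payload.slice(i, i + frameSize) });
  }
  return frames;
}

// Interleaves the frames of all streams round-robin onto a single "connection".
function multiplex(streams, frameSize = 4) {
  const queues = Object.entries(streams).map(([id, payload]) => toFrames(id, payload, frameSize));
  const wire = [];
  while (queues.some((q) => q.length > 0)) {
    for (const q of queues) {
      if (q.length > 0) wire.push(q.shift());
    }
  }
  return wire;
}

// Reassembles each stream from the interleaved frames using the stream id.
function demultiplex(wire) {
  const streams = {};
  for (const { streamId, data } of wire) {
    streams[streamId] = (streams[streamId] || "") + data;
  }
  return streams;
}

const wire = multiplex({ "style.css": "body{margin:0}", "app.js": "console.log(1)" });
console.log(wire.slice(0, 4).map((f) => f.streamId)); // frames of both assets alternate
console.log(demultiplex(wire)["app.js"]); // console.log(1)
```

With HTTP 1.x, the two assets would need two connections or would have to wait for each other; here both make progress over the same connection.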


Layer VI : Browser APIs - XmlHttpRequest, RTCPeerConnection, etc

Finally, we reach the top layer of the browser's networking stack: the APIs exposed to developers. The notable ones are:

  1. XMLHttpRequest
  2. Event Source
  3. Web Sockets
  4. RTCPeerConnection
Whenever we make a request via any of the above APIs, we go through some or all of the layers from Layer V down to Layer I. There is no one best protocol or API. Every non-trivial application will require a mix of different transports based on a variety of requirements: interaction with the browser cache, protocol overhead, message latency, reliability, type of data transfer, and more. Some protocols may offer low-latency delivery (e.g., Server-Sent Events, WebSocket), but may not meet other critical criteria, such as the ability to leverage the browser cache or support efficient binary transfers in all cases.


BONUS
If you want to create your own Chrome App for a networking application, here is the API that Chrome exposes for it. Using it, you can do socket-level programming.
https://developer.chrome.com/docs/apps/app_network/

Links & References

  1. https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
  2. https://developers.google.com/web/updates/2018/09/inside-browser-part1
  3. https://developers.google.com/web/updates/2018/09/inside-browser-part3
  4. https://hpbn.co/primer-on-browser-networking/
  5. https://docs.oracle.com/javase/tutorial/networking/sockets/definition.html
  6. https://web.dev/same-origin-policy/
  7. https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS

About Author

Lakshit Nagar (A full stack enthusiast)
Principal Software Developer
@Oracle India (Ex-ThoughtWorker, IIT Kanpur Alumni)

I love to shape my ideas into a reality. You can find me either working on a project or having a beer with my close friend. :-)
