Virtual DOM Explained

By Daniel Pedroso

Mar 22, 2019 • 15 Minute Read

Introduction

The Virtual DOM was one of React’s main differentiators when it first appeared. It gave React a big advantage over earlier frameworks, and newer libraries (e.g. Vue.js) started to follow the same approach.

Even with all the attention the concept has received over the past few years, there are still several questions surrounding the topic. How does it work behind the scenes? Why is it considered faster than direct DOM manipulation? How does it relate to dirty model checking?

What Is It Trying to Solve?

When you're dealing with a client-side application, you quickly face one issue: DOM manipulation is expensive. If your application is big or very dynamic, the time spent manipulating DOM elements rapidly adds up and you hit a performance bottleneck.

The obvious answer to the problem is to avoid manipulating elements unless strictly necessary. The approach used by Angular, which is arguably the framework that popularized the concept of SPAs (Single Page Applications), is called Dirty Model Checking.

Example model:

{
  subject: 'World'
}
    

Example template:

<div>
  <div id="header">
    <h1>Hello, {{model.subject}}!</h1>
    <p>How are you today?</p>
  </div>
</div>
    

With this approach, the framework keeps tabs on all models. If the model changes, it interpolates/executes the corresponding templates, manipulating the DOM directly. If the model doesn't change, it won't touch the DOM.
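
To make the idea concrete, here is a minimal sketch of a dirty-checking loop under stated assumptions: createScope, watch, and digest are hypothetical names made up for this example (loosely inspired by AngularJS's $watch and $digest), and this is not Angular's actual implementation. Each watcher remembers the last value it saw and only calls its listener when a fresh read of the model differs.

```javascript
// A minimal dirty-checking sketch (hypothetical API, not AngularJS's actual code).
// Each watcher stores the last value it observed and re-runs its listener only
// when a fresh read of the model produces a different value.
const createScope = model => {
  const watchers = [];

  // Register a getter (what to observe) and a listener (what to do on change)
  const watch = (getter, listener) => {
    watchers.push({ getter, listener, last: undefined, initialized: false });
  };

  // Check every watcher, repeating until a full pass finds nothing dirty
  // (a listener may itself modify the model). Returns the number of listener calls.
  const digest = (maxPasses = 10) => {
    let calls = 0;
    for (let pass = 0; pass < maxPasses; pass++) {
      let dirty = false;
      for (const w of watchers) {
        const value = w.getter(model);
        if (!w.initialized || value !== w.last) {
          w.last = value;
          w.initialized = true;
          w.listener(value);
          calls++;
          dirty = true;
        }
      }
      if (!dirty) return calls;
    }
    throw new Error('digest did not stabilize');
  };

  return { model, watch, digest };
};

// Usage: "render" the greeting only when model.subject changes
const scope = createScope({ subject: 'World' });
const rendered = [];
scope.watch(m => m.subject, subject => rendered.push(`Hello, ${subject}!`));

scope.digest();               // first digest: the listener runs once
scope.digest();               // nothing changed: the listener does not run
scope.model.subject = 'Mom';
scope.digest();               // subject changed: the listener runs again
console.log(rendered);        // [ 'Hello, World!', 'Hello, Mom!' ]
```

Note that the watcher only knows that a value changed; what the listener does with it (here, producing the whole greeting again) is up to the template layer.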

Now, this is a smart solution. There are still problems with it, though. One of the main issues becomes very obvious when changes to your model don't necessarily translate into a change in the template - or, even worse, when your model and template are very complex. In the example shown above, that p tag will never change, yet it will still be updated every single time your model is considered dirty - there is nothing between your template and the actual DOM, so the whole thing is modified every time.

A simple solution to this problem: add a layer between your template and your DOM!

What's a Virtual DOM?

Basically, it's an in-memory representation, made of plain JavaScript objects, of the actual elements that are being created for your page.

Let's go back to that previous HTML:

<div>
  <div id="header">
    <h1>Hello, {{state.subject}}!</h1>
    <p>How are you today?</p>
  </div>
</div>
    

After rendering, your virtual DOM could be represented as something like this:

{
  tag: 'div',
  children: [
    {
      tag: 'div',
      attributes: {
        id: 'header'
      },
      children: [
        {
          tag: 'h1',
          children: 'Hello, World!'
        },
        {
          tag: 'p',
          children: 'How are you today?'
        }
      ]
    }
  ]
}
    

Now, let's say our state changed - state.subject is now Mom. The new representation will be:

{
  tag: 'div',
  children: [
    {
      tag: 'div',
      attributes: {
        id: 'header'
      },
      children: [
        {
          tag: 'h1',
          children: 'Hello, Mom!'
        },
        {
          tag: 'p',
          children: 'How are you today?'
        }
      ]
    }
  ]
}
    

We can now diff the two trees and identify that only that h1 changed. We then surgically update that single element - no need to manipulate the whole thing.
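
The diffing step itself needs no browser APIs. As a sketch, assuming a hypothetical diffTrees helper that works on trees shaped like the two above (a tag plus either a text or an array children property), we can match children by position and collect the paths of nodes whose tag or text changed:

```javascript
// Hypothetical helper: compare two virtual trees of the shape
// { tag, attributes?, children: string | node[] } and list what changed.
// Children are matched by position; attributes are ignored to keep it short.
const diffTrees = (prev, curr, path = 'root') => {
  // A node appeared, disappeared, or changed tag: replace the whole subtree
  if (!prev || !curr || prev.tag !== curr.tag) {
    return [{ path, type: 'replace' }];
  }

  const prevIsText = typeof prev.children === 'string';
  const currIsText = typeof curr.children === 'string';

  // Text on both sides: only report a change when the text differs
  if (prevIsText && currIsText) {
    return prev.children === curr.children ? [] : [{ path, type: 'text' }];
  }

  // Text on one side, an array on the other: replace the node
  if (prevIsText !== currIsText) {
    return [{ path, type: 'replace' }];
  }

  // Both are arrays: recurse into every position present on either side
  const length = Math.max(prev.children.length, curr.children.length);
  const changes = [];
  for (let i = 0; i < length; i++) {
    changes.push(...diffTrees(prev.children[i], curr.children[i], `${path}/${i}`));
  }
  return changes;
};

// The two trees from above, generated from the subject
const tree = subject => ({
  tag: 'div',
  children: [{
    tag: 'div',
    attributes: { id: 'header' },
    children: [
      { tag: 'h1', children: `Hello, ${subject}!` },
      { tag: 'p', children: 'How are you today?' },
    ],
  }],
});

console.log(diffTrees(tree('World'), tree('Mom')));
// [ { path: 'root/0/0', type: 'text' } ]
```

Only the h1 (the first child of the header div) is reported; the p tag is left alone.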

Let's make things a bit more interesting - we'll write our own naive implementation of a Virtual DOM library!

The Code

Since we want to keep things as simple as possible, let's not worry about edge cases at all - we'll provide just enough functionality to abstract our previous Hello World example.

Base Components

We'll write a few base components: div, p, and h1. In order to keep things as simple as we possibly can, we'll force each node to contain an id, so that we can easily and quickly find the actual DOM element later.

/*
 * Helper to create DOM abstraction
 */
const makeComponent = tag => (attributes, children) => {
  if (!attributes || !attributes.id) {
    throw new Error('Component needs an id');
  }

  return {
    tag,
    attributes,
    children,
  };
};

const div = makeComponent('div');
const p = makeComponent('p');
const h1 = makeComponent('h1');
    

We now have the functions div, p, and h1 in scope. If you're into Functional Programming, you'll recognize this as partial application: makeComponent takes the tag first and returns a function waiting for the remaining arguments. If you're not, you can see these functions as just a bit of syntactic sugar - you won't have to provide the tag argument every single time you need a component.
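
A quick sketch of what that means in practice: h1 is just makeComponent('h1') with the tag already filled in, and the id check only fires when a node is actually built. The snippet repeats makeComponent so it runs on its own.

```javascript
// Same helper as above, repeated so this snippet is self-contained
const makeComponent = tag => (attributes, children) => {
  if (!attributes || !attributes.id) {
    throw new Error('Component needs an id');
  }
  return { tag, attributes, children };
};

// Supplying only the tag returns a new function waiting for attributes and children
const h1 = makeComponent('h1');

// These two calls build equivalent nodes
const viaPartial = h1({ id: 'title' }, 'Hello!');
const viaFullCall = makeComponent('h1')({ id: 'title' }, 'Hello!');
console.log(JSON.stringify(viaPartial) === JSON.stringify(viaFullCall)); // true

// Omitting the id is rejected when the node is built, not when h1 is created
try {
  h1({}, 'No id here');
} catch (err) {
  console.log(err.message); // Component needs an id
}
```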

More Complex Components

Now that we have a few basic elements, we can start composing more complex components. Let's introduce the concept of state here.

Again, because we want to keep this simple, we won't go into state management. Let's just assume the state is being tracked/managed somewhere else.

/*
 * app component - creates a slightly more complex component out of our base elements
 */
const app = state => div({ id: 'main' }, [
  div({ id: 'header' }, [
    h1({ id: 'title' }, `Hello, ${state.subject}!`)
  ]),
  div({ id: 'content' }, [
    p({ id: 'static1' }, 'This is a static component'),
    p({ id: 'static2' }, 'It should never have to be re-created'),
  ]),
]);
    

As you can see, we've just represented something similar to the previous HTML template - but this time in JavaScript. This is the basic essence behind JSX: beneath the HTML-like syntax, JSX is ultimately compiled into JavaScript function calls (React.createElement calls, with React's classic JSX transform) - something not so fundamentally different from our naive implementation here.
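
With React's classic JSX transform, <h1 id="title">Hello</h1> compiles to React.createElement('h1', { id: 'title' }, 'Hello'). The sketch below assumes a hypothetical createElement with that same (tag, attributes, ...children) call shape, but producing our own node format instead of React elements, to show how close the two ideas are:

```javascript
// Hypothetical createElement with the same call shape as React's classic
// JSX output, but producing this guide's { tag, attributes, children } nodes.
const createElement = (tag, attributes, ...children) => {
  if (!attributes || !attributes.id) {
    throw new Error('Component needs an id');
  }
  return {
    tag,
    attributes,
    // A single text child stays a string; anything else becomes an array
    children: children.length === 1 && typeof children[0] === 'string'
      ? children[0]
      : children,
  };
};

// Roughly what a compiler would emit for:
//   <div id="header"><h1 id="title">Hello, World!</h1></div>
const header = createElement('div', { id: 'header' },
  createElement('h1', { id: 'title' }, 'Hello, World!'));

console.log(JSON.stringify(header));
```

The result has the same shape as the header node that app builds with div and h1.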

In a nutshell, that "component" is a simple function that takes a state (analogous to our previously-mentioned model) and returns a Virtual DOM tree. Assuming our state looks like this:

{
  subject: 'World'
}
    

Then our virtual DOM tree should look like this:

{
  "tag": "div",
  "attributes": {
    "id": "main"
  },
  "children": [
    {
      "tag": "div",
      "attributes": {
        "id": "header"
      },
      "children": [
        {
          "tag": "h1",
          "attributes": {
            "id": "title"
          },
          "children": "Hello, World!"
        }
      ]
    },
    {
      "tag": "div",
      "attributes": {
        "id": "content"
      },
      "children": [
        {
          "tag": "p",
          "attributes": {
            "id": "static1"
          },
          "children": "This is a static component"
        },
        {
          "tag": "p",
          "attributes": {
            "id": "static2"
          },
          "children": "It should never have to be re-created"
        }
      ]
    }
  ]
}
    

Rendering Our Virtual DOM

You didn't think we'd stop there, did you?

Again, in keeping with the theme of this guide, let's not build anything too complicated. We'll write just enough code to cover our simple app.

Here's the code:

/*
 * Sets element attributes
 * element: a DOM element
 * attributes: object in the format { attributeName: attributeValue }
 */
const setAttributes = (element, attributes) => {
  Object
    .keys(attributes)
    .forEach(a => element.setAttribute(a, attributes[a]));
};

/*
 * Renders a virtual DOM node (and its children)
 */
const renderNode = ({ tag, children = '', attributes = {} }) => {
  // Let's start by creating the actual DOM element and setting attributes
  const el = document.createElement(tag);
  setAttributes(el, attributes);

  if (typeof children === 'string') {
    // If our "children" property is a string, set it as the element's text.
    // textContent (unlike innerHTML) does not parse the string as HTML.
    el.textContent = children;
  } else {
    // Otherwise we're dealing with an array: render each child and append it to this element
    children.map(renderNode).forEach(child => el.appendChild(child));
  }

  // We finally have the node and its children - return it
  return el;
};

As you can see, this is not super sophisticated and doesn't cover a whole lot of edge cases - but it's just enough for us.
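
Because the virtual tree is just data, the DOM is not the only possible render target. As a hedged illustration (similar in spirit to server-side rendering, but not any library's actual implementation), here is a hypothetical renderToString that turns the same node format into an HTML string, escaping text and attribute values:

```javascript
// Hypothetical string renderer for the same { tag, attributes, children } nodes.
// Escapes the characters that matter inside text and double-quoted attributes.
const escapeHtml = value => String(value)
  .replace(/&/g, '&amp;')
  .replace(/</g, '&lt;')
  .replace(/>/g, '&gt;')
  .replace(/"/g, '&quot;');

const renderToString = ({ tag, children = '', attributes = {} }) => {
  const attrs = Object.keys(attributes)
    .map(name => ` ${name}="${escapeHtml(attributes[name])}"`)
    .join('');

  // Text children are escaped; array children are rendered recursively
  const inner = typeof children === 'string'
    ? escapeHtml(children)
    : children.map(renderToString).join('');

  return `<${tag}${attrs}>${inner}</${tag}>`;
};

const node = {
  tag: 'div',
  attributes: { id: 'header' },
  children: [{ tag: 'h1', attributes: { id: 'title' }, children: 'Hello, <World>!' }],
};

console.log(renderToString(node));
// <div id="header"><h1 id="title">Hello, &lt;World&gt;!</h1></div>
```

The recursion mirrors renderNode exactly; only the output changes from DOM elements to strings.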

We can now see it in action by running the following script (assuming our HTML contains an element with the id root):

const virtualDOMTree = app({ subject: 'World' });
const rootEl = document.querySelector('#root');
rootEl.appendChild(renderNode(virtualDOMTree));
    

Handling Changes

So far, we've created a DOM abstraction layer - let's now work on our diff.

The first step is to take two nodes and check whether they're different. Let's use the following code:

/*
 * Runs a shallow comparison between 2 objects
 */
const areObjectsDifferent = (a, b) => {
  // Set of all unique keys (quick and dirty way of doing it)
  const allKeys = Array.from(new Set([...Object.keys(a), ...Object.keys(b)]));

  // Return true if one or more elements are different
  return allKeys.some(k => a[k] !== b[k]);
};

/*
 * Diff 2 nodes
 * Returns true if different, false if equal
 */
const areNodesDifferent = (a, b) => {
  // If at least one of the nodes doesn't exist, we'll consider them different.
  // Also, if the actual `tag` changed, we don't need to check anything else.
  if (!a || !b || (a.tag !== b.tag)) return true;

  const typeA = typeof a.children;
  const typeB = typeof b.children;

  return typeA !== typeB // Cover the case where we went from children being a string to an array
    || areObjectsDifferent(a.attributes, b.attributes) // changes in attributes
    || (typeA === 'string' && a.children !== b.children); // if it's a string, did the text change?
};
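
Since these two helpers don't touch the DOM, they can be exercised directly in Node. The sketch below repeats them unchanged and compares a few sample nodes; title and staticP are hypothetical test fixtures modeled on the app component.

```javascript
// The two helpers from above, repeated so this snippet runs on its own
const areObjectsDifferent = (a, b) => {
  const allKeys = Array.from(new Set([...Object.keys(a), ...Object.keys(b)]));
  return allKeys.some(k => a[k] !== b[k]);
};

const areNodesDifferent = (a, b) => {
  if (!a || !b || (a.tag !== b.tag)) return true;
  const typeA = typeof a.children;
  const typeB = typeof b.children;
  return typeA !== typeB
    || areObjectsDifferent(a.attributes, b.attributes)
    || (typeA === 'string' && a.children !== b.children);
};

// Sample nodes shaped like the ones app produces
const title = subject => ({ tag: 'h1', attributes: { id: 'title' }, children: `Hello, ${subject}!` });
const staticP = { tag: 'p', attributes: { id: 'static1' }, children: 'This is a static component' };

console.log(areNodesDifferent(title('World'), title('Mom')));  // true: text changed
console.log(areNodesDifferent(staticP, { ...staticP }));        // false: same tag, attributes, and text
console.log(areNodesDifferent(staticP, { ...staticP, attributes: { id: 'static1', class: 'x' } })); // true: new attribute
console.log(areNodesDifferent(staticP, undefined));             // true: node missing
```

Note that for two nodes with array children, areNodesDifferent only compares the nodes themselves (tag and attributes); the children are handled by the recursive walk in the next function.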
    

Finally, let's write a function that navigates our virtual DOM tree and re-renders elements if necessary:

/*
 * Gets the previous and current node representations
 * and replaces the real DOM node if the representation changed
 */
const diffAndReRender = (previousNode, currentNode) => {
  if (areNodesDifferent(currentNode, previousNode)) {
    // Is the current node different? If so, replace it.
    // Look the element up by its previous id: if the id itself changed,
    // the DOM still carries the old one. (This naive version assumes
    // previousNode exists; added or removed children aren't handled.)
    const nodeId = previousNode.attributes.id;
    console.log('Replacing DOM node:', nodeId);

    return document
      .querySelector(`#${nodeId}`)
      .replaceWith(renderNode(currentNode));
  } else if (Array.isArray(currentNode.children)) {
    // If not, and the children prop is an array, recursively call this function for each child
    currentNode.children.forEach((currChildNode, index) => {
      diffAndReRender(previousNode.children[index], currChildNode);
    });
  }
};

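Note that we're matching children based on their index here. This kind of matching is not good enough for a real-world scenario (inserting, removing, or reordering a child shifts the index of every sibling after it, so unchanged nodes get compared against the wrong counterparts), but it works for our example app.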
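
To see the difference, here is a hedged sketch that pairs children by attributes.id instead of by position. It is in the spirit of React's key prop, but matchChildrenById is a hypothetical helper, not React's reconciliation algorithm.

```javascript
// Hypothetical helper: pair previous and current children by attributes.id
// instead of by array index.
const matchChildrenById = (previousChildren, currentChildren) => {
  const previousById = new Map(previousChildren.map(child => [child.attributes.id, child]));
  const currentIds = new Set(currentChildren.map(child => child.attributes.id));

  const matched = []; // [previousNode, currentNode] pairs to diff recursively
  const added = [];   // ids with no previous counterpart: must be created
  currentChildren.forEach(child => {
    const previous = previousById.get(child.attributes.id);
    if (previous) {
      matched.push([previous, child]);
    } else {
      added.push(child.attributes.id);
    }
  });

  // ids that only exist in the previous render: must be removed
  const removed = previousChildren
    .map(child => child.attributes.id)
    .filter(id => !currentIds.has(id));

  return { matched, added, removed };
};

const p = id => ({ tag: 'p', attributes: { id }, children: id });

// Inserting a child at the front shifts every index by one
const before = [p('static1'), p('static2')];
const after = [p('intro'), p('static1'), p('static2')];

// Index matching pairs intro with static1, static1 with static2,
// and static2 with nothing: every position looks changed
const changedByIndex = after
  .filter((child, i) => !before[i] || before[i].attributes.id !== child.attributes.id)
  .length;
console.log(changedByIndex); // 3

// Matching by id recognizes both existing paragraphs
const result = matchChildrenById(before, after);
console.log(result.added);                                          // [ 'intro' ]
console.log(result.removed);                                        // []
console.log(result.matched.map(([prev]) => prev.attributes.id));    // [ 'static1', 'static2' ]
```

With index matching, our diffAndReRender would replace all three paragraphs (and fail on the last one, which has no previous node); matching by id only requires creating the new one.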

Now that we have a way to run a diff and surgically replace specific elements that actually change, let's run our code again - this time simulating a state update:

// Render the initial application
const virtualDOMTree = app({ subject: 'World' });
const root = document.querySelector('#root');
root.appendChild(renderNode(virtualDOMTree));

// Generate a new virtual DOM tree based on a change in state:
const newVirtualDOMTree = app({ subject: 'Mom' });

diffAndReRender(virtualDOMTree, newVirtualDOMTree);
    

After running our diffAndReRender function, we'll see a message in the console saying Replacing DOM node: title. That's it: no other element is replaced. And indeed, our #title element now says "Hello, Mom!".

Now, this gives us a nice segue into the next section.

Common Pitfalls

If you've read the previous section, you'll have noticed that we re-ran the whole app after changing our state. I did that for a reason - it's essentially what React does: when a component's state changes, React re-runs that component and, by default, every component below it, then diffs the result. In most situations, this behavior is completely fine. You're avoiding hammering the actual DOM, and most of your components won't leave a large footprint anyway.

That said, there are always scenarios where your component is far more complex or runs an expensive algorithm. In these situations, you'll need to optimize your component to prevent wasted update cycles. In other words, you need to make sure your component only runs again when doing so would actually change its output. Luckily, React provides a few ways to optimize for that scenario - namely the shouldComponentUpdate lifecycle method, the React.memo HOC, and the React.PureComponent class. I wrote a blog post focusing on performance tuning for React components some time ago (How To Tune React For Performance, listed at the end of this guide), if you're interested.
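
The same idea can be sketched in our naive library. The hypothetical memoComponent below is not React.memo, though it works in a similar spirit: it remembers the last state it saw and, if the new state is shallowly equal, returns the previous tree object without calling the component again.

```javascript
// Shallow comparison: same set of keys, and each value is strictly equal
const shallowEqual = (a, b) => {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return keysA.length === keysB.length
    && keysA.every(k => Object.prototype.hasOwnProperty.call(b, k) && a[k] === b[k]);
};

// Hypothetical memoization wrapper (not React.memo): re-run the component only
// when the incoming state differs from the previous one.
const memoComponent = component => {
  let lastState;
  let lastTree;
  return state => {
    if (lastState !== undefined && shallowEqual(lastState, state)) {
      return lastTree; // the very same object as last time
    }
    lastState = state;
    lastTree = component(state);
    return lastTree;
  };
};

// An "expensive" component, instrumented to count how often it actually runs
let runs = 0;
const header = memoComponent(state => {
  runs++;
  return { tag: 'h1', attributes: { id: 'title' }, children: `Hello, ${state.subject}!` };
});

const first = header({ subject: 'World' });
const second = header({ subject: 'World' }); // equal state: component not re-run
const third = header({ subject: 'Mom' });    // changed state: component re-runs

console.log(runs);             // 2
console.log(first === second); // true
console.log(first === third);  // false
```

Because an unchanged subtree comes back as the same object, a check such as previousNode === currentNode at the top of diffAndReRender would let the walk skip it entirely.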

Another common issue is when one of the elements near the top of the tree changes so dramatically that it ends up completely replaced - for instance, when you switch from a <MyComponentForLargeScreens> to a <MyComponentForSmallScreens>. Because you completely replaced this node, every single element branching off of it will be re-created as well. I've seen this first-hand in applications that change their root element based on window width (hence the component names!). Running such an app on a smartphone and changing the device orientation (i.e. rotating between horizontal and vertical) causes the whole application to be unmounted and re-created from scratch. This sounds OK until you realize that you also lose all state kept within components - half-filled forms can suddenly go blank - and that's on top of the performance penalty. This is something that requires attention.
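
Our naive diff shows the same effect. In this hedged sketch, two trees have identical content and differ only in the root tag (a hypothetical section versus div, standing in for the two screen-size components). countRecreated is a hypothetical helper that mirrors diffAndReRender's walk but counts the nodes it would re-create instead of touching the DOM.

```javascript
// Condensed versions of the helpers from above
const areObjectsDifferent = (a, b) =>
  [...new Set([...Object.keys(a), ...Object.keys(b)])].some(k => a[k] !== b[k]);

const areNodesDifferent = (a, b) => {
  if (!a || !b || a.tag !== b.tag) return true;
  const typeA = typeof a.children;
  return typeA !== typeof b.children
    || areObjectsDifferent(a.attributes, b.attributes)
    || (typeA === 'string' && a.children !== b.children);
};

// Count a node plus all of its descendants
const countNodes = node => 1 + (Array.isArray(node.children)
  ? node.children.reduce((sum, child) => sum + countNodes(child), 0)
  : 0);

// Same walk as diffAndReRender, but returns how many nodes would be re-created
const countRecreated = (previousNode, currentNode) => {
  if (areNodesDifferent(currentNode, previousNode)) return countNodes(currentNode);
  if (!Array.isArray(currentNode.children)) return 0;
  return currentNode.children.reduce(
    (sum, child, i) => sum + countRecreated(previousNode.children[i], child), 0);
};

const p = id => ({ tag: 'p', attributes: { id }, children: 'unchanged text' });
const layout = tag => ({ tag, attributes: { id: 'main' }, children: [p('a'), p('b'), p('c')] });

console.log(countRecreated(layout('div'), layout('div')));     // 0: nothing changed
console.log(countRecreated(layout('div'), layout('section'))); // 4: root tag changed, whole tree rebuilt
```

The tag mismatch makes areNodesDifferent return true at the root, so the walk never descends into the unchanged paragraphs. In a real DOM, re-created nodes also lose state such as typed-in input values, which is the form-clearing symptom described above.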

Conclusion

The Virtual DOM is definitely going to be around for a while. It provides a really nice way of decoupling your application's logic from its DOM elements and, therefore, reduces the likelihood of creating unintentional bottlenecks when it comes to DOM manipulation. Other libraries are moving forward with the same approach, further solidifying the concept as one of the preferred strategies for web applications.

It's worth mentioning that dirty model checking and the virtual DOM are not mutually exclusive. Both emerged as solutions to the same problem, tackling it in different ways, and an MVC framework could very well implement both techniques. In React's case, it just didn't make much sense - React is mostly a View library, after all.

So, in summary, the Virtual DOM implements:

A tree structure representing the DOM elements your application creates.
A diff algorithm designed to identify changes between DOM representations.
A way to replicate said changes in the actual DOM - but only if necessary.

I consider the virtual DOM one of the cornerstones of mastering React - it certainly gave me more context on some of the choices that went into designing the library, and even helped me improve my own components and optimization techniques. Hopefully, it'll be as useful to you as it was to me!

If you liked this guide, you could check out some of my other content here:

Pluralsight Guide - Optimizing Redux Store

Blog Post - How To Tune React For Performance

Blog Post - Functional Programming And React