Ryan Chandler

Published in Programming Languages

Baby's First Virtual Machine

In this post, I'll be showing you how to build your first stack-based virtual machine with JavaScript.

At the end of the post, we'll be able to provide our virtual machine with a few mathematical instructions and it will be able to run through those instructions and produce an answer.

We'll be using JavaScript to build our virtual machine so that we don't get lost in type-system details. Most people can read and understand JavaScript too, so it's great for teaching new concepts.

What is a virtual machine?

Before we look at how to build one, let's briefly talk about what a virtual machine is.

At the highest level, a virtual machine is just a software-implementation of a computer. It loosely simulates the behaviour of a CPU by reading instructions and performing actions based on those instructions. Virtual machines are commonly used by programming languages to implement their interpretation layer. This is where the programming language actually does most of the logical work.

What is a stack machine?

There are many types of virtual-machine, so to keep things simple we'll be building one of the simplest.

A stack-based virtual machine uses the stack to, more often than not, store short-lived temporary values. You can think of the stack as an array that is constantly being pushed to, as well as popped from.

When you encounter some form of value, you will generally push that value to the stack. When you encounter an instruction, it will likely need some sort of argument to operate or (commonly known as an operand). This argument is retrieved from the stack by popping it from the end, removing it from the stack completely.

Instructions and Operands

Our virtual machine is going to act a simple calculator. We'll be able to ADD, SUBTRACT, DIVIDE, MULITPLY and PRINT values using a combination of instructions and values.

Let's take a look at a simple ADD instruction. We'll start off by creating a global constant in our program called ADD and use it's string equivalent as the value.

const ADD = 'ADD';

Let's also create an array to store our instructions in.

const ADD = 'ADD';

const instructions = [
    ADD,
];

If you think of the + operation in a mathematical equation, you have 2 operands. The equation 1 + 2 is adding the numbers 1 and 2 together, making them the operands in the equation.

Our ADD instruction will behave in the exact same way. We will need to provide 2 numbers for it to function correctly. The simplest way to do this is by adding them to the instructions array before our ADD instruction.

const instructions = [
    1, 2, ADD,
];

We place them before the instruction so that the ADD instruction can pop them off of the stack and then add them together. To show this in action, let's start creating our main evaluation loop as well as our stack.

We'll use a for..of loop so that we can have direct access to the instruction, as opposed to the instruction's index in the array.

const stack = [];

for (const instruction of instructions) {
    if (! Number.isNaN(instruction)) {
        stack.push(instruction);
        continue;
    }
}

We start by handling our numeric values. To check if something is a number, we can use the isNaN() function. If it returns false, it must be a number. In that case, we can push the value onto the stack and move on to the next iteration.

To keep things super simple, let's use a switch statement to handle all of our instructions:

for (const instruction of instructions) {
    if (! isNaN(instruction)) {
        stack.push(instruction);
        continue;
    }

    switch (instruction) {
        case ADD:
            let b = stack.pop()
            let a = stack.pop()
            stack.push(a + b)
            break;
        default:
            throw new Error(`Unhandled instruction: ${instruction}`)
    }
}

When we encounter an ADD instruction, we can pop the 2 operands from the stack. The first value that is popped will be the second (right-hand) operand to the equation.

This can be quite confusing at first, but it's all down to the order that they were pushed in. The first value we pushed was 1, the left-hand side of the equation. The second value was 2, the right-hand side of the equation. When the first value is popped off of the stack, it'll be the number 2.

Once we have both a and b, we can add the 2 numbers together and push the value back on to the stack.

To help with debugging, let's create a new PRINT instruction that will print pop the last value from the stack and run it through console.log().

const ADD = 'ADD';
const PRINT = 'PRINT';

const instructions = [
    1, 2, ADD,
    PRINT,
];

for (const instruction of instructions) {
    if (! isNaN(instruction)) {
        stack.push(instruction);
        continue;
    }

    switch (instruction) {
        case ADD:
            let b = stack.pop()
            let a = stack.pop()
            stack.push(a + b)
            break;
        case PRINT:
            console.log(stack.pop())
            break;
        default:
            throw new Error(`Unhandled instruction: ${instruction}`)
    }
}

If we run this program now, we can expect the number 3 to printed to the console.

Subtraction, Multiplication and Division

With the ADD and PRINT instructions in place, let's add the logic for SUBTRACT, MULTIPLY and DIVIDE.

const ADD = 'ADD';
const SUBTRACT = 'SUBTRACT';
const MULTIPLY = 'MULTIPLY';
const DIVIDE = 'DIVIDE';
const PRINT = 'PRINT';

// ...

for (const instruction of instructions) {
    if (! isNaN(instruction)) {
        stack.push(instruction);
        continue;
    }

    switch (instruction) {
        case ADD:
            let b = stack.pop()
            let a = stack.pop()
            stack.push(a + b)
            break;
        case SUBTRACT:
            let b = stack.pop()
            let a = stack.pop()
            stack.push(a - b)
            break;
        case MULTIPLY:
            let b = stack.pop()
            let a = stack.pop()
            stack.push(a * b)
            break;
        case DIVIDE:
            let b = stack.pop()
            let a = stack.pop()
            stack.push(a / b)
            break;
        case PRINT:
            console.log(stack.pop())
            break;
        default:
            throw new Error(`Unhandled instruction: ${instruction}`)
    }
}

We now have a nice set of instructions to calculate some simple mathematical equations. Let's update our instructions array and calculate the value of 1 + 2 * 3.

const instructions = [
    1,
    2, 3, MULTIPLY,
    ADD,
    PRINT,
]

The order of these instructions might seem a little strange at first, but it's all to do with the precedence of each sub-calculation. If we were to insert some parentheses into the equation, it would really look like 1 + (2 * 3) because the multiplication needs to be calculated separately.

Putting that into words we're really calculating the result of 1 added to the result of 2 * 3. Our instruction set is doing the same. We first push the value of 1 onto the stack. The value of 2 * 3 is then calculated and pushed back on to the stack.

When the ADD instruction is reached we pop the result of 2 * 3 from the stack, as well as the value of 1 and add those together, pushing the result back on to the stack.

With any luck, running this program will print 7 in the console.

A small problem

At the moment we can only print a value at the very end of the program. This is because the PRINT instruction will pop the last value off of the stack permanently, meaning any further instructions might not have a value to pop off.

To fix this, we can update our PRINT case to instead retrieve the last value from the stack and print it.

case PRINT:
    console.log(stack[stack.length - 1])
    break;

Refactoring

There's lot of duplicated code in our instruction handling at the moment. All of our mathematical operations are popping 2 values from the stack and pushing back the result of the operation.

To tidy this up, we could create some sort of instruction handler function for each of our instructions. Let's start by creating a handler object that maps the instruction to a callback function.

const handler = {
    ADD: (a, b) => a + b,
    SUBTRACT: (a, b) => a - b,
    MULTIPLY: (a, b) => a - b,
    DIVIDE: (a, b) => a / b,
}

When we encounter one of our mathematical instructions, we can create a common case block that pops the values and invokes one of these functions.

for (const instruction of instructions) {
    if (! isNaN(instruction)) {
        stack.push(instruction);
        continue;
    }

    switch (instruction) {
        case ADD:
        case SUBTRACT:
        case MULTIPLY:
        case DIVIDE:
            let b = stack.pop()
            let a = stack.pop()
            let result = handler[instruction](a, b)

            stack.push(result)
            break;
        case PRINT:
            console.log(stack[stack.length - 1])
            break;
        default:
            throw new Error(`Unhandled instruction: ${instruction}`)
    }
}

This removes a lot of code duplication from our switch statement and gives us a single place to add new instruction handlers in the future.

Next steps

Now that you've created your own stack-based virtual machine, why not try to write a parser that converts a mathematical equation such as 1 + 2 / 3 * 4 into a set of instructions and runs them through your machine. Doing this will essentially give you a tiny calculator program that can grow further into a domain-specific language or general-purpose programming language.