Build a fantasy CPU emulator in JavaScript
Introduction
Emulators are cool pieces of technology that allow users to run a totally different system on top of another one.
Emulators have wide range of applications such as running x86 programs on ARM devices or running ARM Android applications on x86 Windows Desktops and even running your favorite retro game on an emulated computer / console on a Raspberry Pi.
When a full system is emulated, the emulator software needs to handle all the hardware devices of that systems. These may include not only the CPU but also Video System, Input Devices, etc. However, the core concept of an emulator remains emulation of the CPU.
In this article we will explore how a CPU works and how can be emulated in software by implementing a simple machine that will run programs written for a fantasy CPU.
Programs are just a series of bytes
It is well known that CPUs are the core of a machine and their main job is to execute programs. Programs are nothing else but a series of instructions in the computer memory.
At this point you may be tempted to believe that your CPU knows JavaScript. Although a very tempting concept, this is not exactly the case. Imaging changing the CPU if a new version of JavaScript appeared!
The reality is that a CPU understands only a finite set of instructions. These are the instructions that the CPU engineering team designed the CPU to understand.
As a user you can create programs by using these instructions directly or by coding in a higher-level language and then using a compiler / system to translate your program into the set of instructions understood by the CPU.
Regardless how the user created the program, the instructions end up in memory as a series of bytes such as these ones:
11,0,10,42,6,255,30,0,11,0,0,11,1,1,11,3,1,60,1,10,2,0,20,
2,1,60,2,10,0,1,10,1,2,11,2,1,20,3,2,31,2,30,2,41,3,2,19,31,0,50
The CPU role is to fetch these instructions from memory and to execute them.
At this point please look again at the series of number above. That is a real program written for our fantasy CPU. To see what it does we need to implement a virtual CPU in software and then we will use it execute the program.
Fantasy CPU high-level specs
The CPU that we’re about to emulate in software is a fantasy one. It does not exist in the real world, but it mimics very closely how a real CPU works. The machine is partially inspired by this article.
The CPU has 4 general purpose number registers: R0, R1, R2, R3 (think of these registers as variables that can store a number). Besides these registers, the CPU can also manipulate a stack with ability to push or pop values on and out of the stack.
The CPU operates by instructions. Some of the instructions don't have operands, while other instructions have several operands (think of operands as arguments).
A series of instructions makes a program. Instructions are encoded in the program as this:
INSTRUCTION1 [OPERAND1] [OPERAND2] [OPERAND3] INSTRUCTION2 [OPERAND1] ...
Each instruction has a unique number associated with it. For simplicity, instruction codes, operands and even addresses are regular numbers. Therefore, no bytes, or any other data types are needed. Everything is a number!
Our program is therefore a series of numbers. Each number occupies a single cell of memory. For example, an instruction with 3 operands will take 4 cells of program memory (1 for the instruction code and 3 for operands).
Instructions
And now let's see the set of the instructions that our CPU accepts. Although all CPUs including our fantasy one executes instructions in binary form, the CPU engineering team usually associates names / mnemonics with the instructions recognized by the CPU.
Using mnemonics makes the task of writing programs much easier for humans. If one writes programs using the instructions mnemonics it is said that he codes in assembly language. It only takes a simple step to transform these programs written in assembly language into binary instructions (e.g. machine language).
Instructions and mnemonics for our fantasy CPU are quite simple and intuitive. Let’s take the time to describe all of the below. Afterall we need to know all the CPU instructions if we want to emulate that CPU into software.
Note: Under each instruction is specified the number used to encode that particular instruction. Also registers R0, R1, R2, R3 are encoded as numbers 0, 1, 2, 3):
Loads the value from regsrc into regdst. E.g. regdst = regsrc
MOVR reg_dst, reg_src
MOVR = 10
Loads the numeric value into register regdst. E.g. regdst = value
MOVV reg_dst, value
MOVV = 11
Adds the value from regsrc to the value of regdst and store the result in reg_dst
ADD reg_dst, reg_src
ADD = 20
Subtracts the value of regsrc from the value of regdst and store the result in reg_dst
SUB reg_dst, reg_src
SUB = 21
Pushes the value of reg_src on the stack
PUSH reg_src
PUSH = 30
Pops the last value from stack and loads it into register reg_dst
POP reg_dst
POP = 31
Jumps the execution to address addr. Similar to a GOTO!
JP addr
JP = 40
Jump to the address addr only if the value from reg1 < reg2 (IF reg1 < reg2 THEN JP addr)
JL reg_1, reg_2, addr
JL = 41
Pushes onto the stack the address of instruction that follows CALL and then jumps to address addr
CALL addr
CALL = 42
Pops from the stack the last number, assumes is an address and jump to that address
RET
RET = 50
Print on the screen the value contained in the register reg
PRINT reg
PRINT = 60
Stops our VM. The virtual CPU doesn't execute instructions once HALT is encountered.
HALT
HALT = 255
Emulating the CPU
Armed with the fantasy CPU specs, we can now move ahead to emulate the CPU in software – in JavaScript.
As presented above, on real machines programs are stored in memory. In our emulator will emulate the memory using a simple array structure. As a matter of fact, we’ll only place a single program in the memory.
let program = [11,0,10,42,6,255,30,0,11,0,0,11,1,1,11,3,1,60,1,10,2,0,20,
2,1,60,2,10,0,1,10,1,2,11,2,1,20,3,2,31,2,30,2,41,3,2,19,31,0,50];
Our virtual CPU will need to fetch instructions one by one from this array and execute them. The CPU will keep track of the instruction that needs to fetch by using a special-purpose register named “PC” (Program Counter).
Fact: PC register is a real register on many physical CPU architectures.
The core of the CPU emulator is just a big “switch” statement that will process each instruction according to the specs.
let pc = 0;
let halted = false;
function run()
{
while(!halted)
{
runone();
}
}
function runone()
{
if (halted)
return;
let instr = program[pc];
switch(instr)
{
// handle each instruction according to specs
// also advance pc to prepare for the next fetch
// ...
}
}
That’s all! This is the structure of our glorious CPU emulator. Processing instructions is also a very simple task. Require just careful reading and implementation of the instruction’s specs.
switch(instr)
{
// movr rdst, rsrc
case 10:
pc++;
var rdst = program[pc++];
var rsrc = program[pc++];
regs[rdst] = regs[rsrc];
break;
// movv rdst, val
case 11:
pc++;
var rdst = program[pc++];
var val = program[pc++];
regs[rdst] = val;
break;
...
}
Take your time, read the specifications, and try to see if you can complete this virtual CPU implementation. You’ll know that you did a great job when you’ll see the results of the execution program.
If you coded carefully, the program should display the first 10 Fibonacci numbers.
You can also consult our version from this playground:
Loading the program
As you saw in the above playground our emulator implementation consists in a simple virtual machine that loads initially the program from an array of bytes and then asks the fantasy CPU to execute it.
vm.load([11,0,10,42,6,255,30,0,11,0,0,11,1,1,11,3,1,60,1,10,2,0,20,
2,1,60,2,10,0,1,10,1,2,11,2,1,20,3,2,31,2,30,2,41,3,2,19,31,0,50]);
Loading like this is fine if your only intent is to load the program we’ve provided. But if you want to design your own programs in machine language and execute them, this may not be a very intuitive or efficient way.
Let me present you a small technique that you can use to load the programs using a human friendly CPU instruction mnemonics. It only takes defining a few constants:
const MOVR = 10;
const MOVV = 11;
const ADD = 20;
const SUB = 21;
const PUSH = 30;
const POP = 31;
const JP = 40;
const JL = 41;
const CALL = 42;
const RET = 50;
const PRINT= 60;
const HALT = 255;
const R0 = 0;
const R1 = 1;
const R2 = 2;
const R3 = 3;
vm.load([
MOVV, R0, 10,
CALL, 6,
HALT,
// PrintFibo: (addr = 6)
PUSH, R0,
MOVV, R0, 0,
MOVV, R1, 1,
MOVV, R3, 1,
PRINT, R1,
// continue: (addr = 19)
MOVR, R2, R0,
ADD, R2, R1,
PRINT, R2,
MOVR, R0, R1,
MOVR, R1, R2,
MOVV, R2, 1,
ADD, R3, R2,
POP, R2,
PUSH, R2,
JL, R3, R2, 19,
POP, R0,
RET
]);
With this trick is like you’re programming in assembly language inside JavaScript.
You can find the new program with mnemonics in this playground:
At this point we can end our article. We will take however the opportunity to quickly implement a few small tools that are commonly used when working with machine language code.
Assembler
Perhaps the most important tool when working with machine language programs is an assembler program.
This program lets users write programs as text files, in easier to use assembly language and then the assembler will do the heavy lifting by converting the assembly source code into binary data understood by the CPU.
We will attempt to build a simplified assembler program that does the basic job. It will work like this:
let code = `
// Loads value 10 in R0
// and calls Fibonacci routine
MOVV R0, 10
CALL 6
HALT
...
`;
let bytes = asm.assemble(code);
If you want to see the code, please check this playground:
Disassembler
Another useful tool that you can build when working with machine language is a disassembler.
A disassembler takes as input a program in binary format and outputs its listing in a human readable way – in assembly language.
When run on the following program:
let src = asm.disassemble(
[11,0,10,42,6,255,30,0,11,0,0,11,1,1,11,3,1,60,1,10,2,0,20,
2,1,60,2,10,0,1,10,1,2,11,2,1,20,3,2,31,2,30,2,41,3,2,19,31,0,50]
);
... our disassembler should output this:
0 11 0 10 MOVV R0, 10
3 42 6 CALL 6
5 255 HALT
6 30 0 PUSH R0
8 11 0 0 MOVV R0, 0
11 11 1 1 MOVV R1, 1
14 11 3 1 MOVV R3, 1
17 60 6 PRINT R1
19 10 2 0 MOVR R2, R0
22 20 2 1 ADD R2, R1
25 60 6 PRINT R2
27 10 0 1 MOVR R0, R1
30 10 1 2 MOVR R1, R2
33 11 2 1 MOVV R2, 1
36 20 3 2 ADD R3, R2
39 31 2 2 POP R2
41 30 2 PUSH R2
43 41 3 2 19 JL R3, R2, 19
47 31 0 2 POP R0
49 50 RET
The listing contains the memory addresses and binary instructions and associated mnemonic for each instruction in our program.
To see a very simplistic implementation of a disassembler, please check this playground:
About Fibonacci program
You're probably wondering how we came up with the assembly program to print the Fibonacci numbers? Well, the answer is simple. We wrote first the algorithm in JavaScript and then convert it step by step to assembler:
Of course, once you get more experience with writing in assembly, you can do this task directly in one step! It just requires practice!
Conclusion
I hope you liked this article! Even if there is still a lot of work until you can emulate a full system, I hope that this article provided you with a basic overview of what it take to emulate the core of a system – the CPU.
For a quick reference I put together all programs in this article and a few explanations in the following playground:
As a next step I invite you to create additional programs for this fantasy CPU and if needed extend the CPU instruction set with new instructions to support your programs.
Read more blog articles Browse JavaScript projectsAbout codeguppy
CodeGuppy is a FREE coding platform for schools and independent learners. If you don't have yet an account with codeguppy.com, you can start by visiting the registration page and sign-up for a free account. Registered users can access tons of fun projects!
Follow @codeguppy on Twitter for coding tips and news about codeguppy platform. For more information, please feel free to contact us.