LT4335 CPU introduction.
Version of 8 July 2011.
Dave Barber's other pages.
Introduction. The LT4335 is a design of central processing unit (CPU) intended for use in a general-purpose computer.
In line with most modern computer hardware, the LT4335 uses a binary scheme to store and transmit what can be called information (in the more abstract sense) or data (in the more concrete sense). As elsewhere in computer science, the fundamental unit of information is the bit, which can at any time contain either of two values often written as 0 and 1. Although these symbols happen to be numerals, they will frequently be used with non-mathematical meaning.
The CPU can read a bit and make a decision according to its value. The CPU can also change the value of a bit. Even in a calculation involving millions of steps, a computer does little more than this, over and over.
A collection of bits can, through suitable encoding, represent extremely complicated information. For any large body of data, there are many plausible ways that it can be encoded, each with pros and cons. However, there do exist some standards to guide the programmer, for instance two's complement for numbers and Unicode for letters.
Explaining CPUs requires using plenty of numbers; integers will suffice for the time being. Fractions need not be introduced until floating-point numbers, which are optional, are discussed.
Addresses. The CPU references bits by number, whether they be within the CPU or without; the integer identifying a bit is the bit's address. A major advantage to using numbers, rather than something else, as addresses is this: the locations of data can be calculated (frequently with no more than elementary arithmetic) according to whatever formulas the programmer finds effective.
An important contrast must always be borne in mind: the address of a bit is an integer, positive or negative, that might be of large magnitude; but the value of that bit will always be 0 or 1. The address of a bit is fixed, but its value is subject to change.
When a collection of bits is used to represent a piece of data, the programmer will usually for simplicity choose bits whose addresses are consecutive numbers. The smallest address among the bits will then be deemed the address of the collection itself. Bits are consecutive, contiguous or adjacent when the integers forming their addresses are respectively consecutive, contiguous or adjacent. The first or leftmost bit in a collection is the bit of lowest address; the last or rightmost has the highest address. Of two bits, the one with a lower address is said to come before the other; the bit with higher address comes after the other. For all these terms, the values of the bits are immaterial.
The LT4335 differs from the majority of CPUs in allowing addresses to be negative. Another distinctive feature is that each bit has its own address, while in most computers an address refers to a collection of eight bits (a byte). Such aggregation makes manipulation of individual bits less convenient, and creates complication when the natural size of a data item is not a multiple of eight bits.
Numerical notation. In this report, numbers will sometimes be rendered in decimal, and at other times in binary, whichever is clearer. Note that when binary is used, the number's least significant digit will be written first, followed by the other digits in increasing significance. Although this is unconventional, it is helpful because the LT4335 is a little-endian machine. This term means that, when a number is stored in a collection of consecutive bits, the least significant digit is stored first, and the most significant digit is stored last. (Still, decimal will be notated with the most significant digit first.)
Usually, it will be obvious from context whether a number is written in decimal or binary, and the presence of a digit other than 0 or 1 surely indicates decimality. When ambiguity is a risk, however, the decimal number will have a trailing subscript D, and the binary a leading subscript B. For instance, 1000D = B0001011111.
Instead of a superscript for exponentiation, a double asterisk is used, thus 3**5 = 243. For ease of reading, an underscore can be written between a number's digits, and a repeated digit can be indicated by a trailing superscript giving the repeat count, as with 010⁴110 = 010000110 = 01_0000_110. The ellipsis is used to indicate a range of consecutive integers: 5…9 means 5, 6, 7, 8 and 9.
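The least-significant-first notation can be sketched in Python. The helper names `to_lsb_first` and `from_lsb_first` are hypothetical, introduced only for illustration; each character of the string corresponds to the bit that would sit at the next higher address.

```python
def to_lsb_first(value, width=None):
    """Write a non-negative integer in binary, least significant digit first.

    If width is omitted, the shortest representation is used (at least one
    digit). Padding zeros are the most significant, so they go on the right.
    """
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    if width is None:
        width = max(value.bit_length(), 1)
    if value.bit_length() > width:
        raise ValueError("value does not fit in the requested width")
    return "".join(str((value >> i) & 1) for i in range(width))


def from_lsb_first(text):
    """Read a least-significant-first binary string; underscores are ignored."""
    digits = text.replace("_", "")
    if set(digits) - {"0", "1"}:
        raise ValueError("binary digits only")
    return sum(int(d) << i for i, d in enumerate(digits))


print(to_lsb_first(1000))        # prints 0001011111
print(to_lsb_first(1000)[::-1])  # prints 1111101000, the conventional order
```

Reversing the string is all it takes to move between this report's order and the conventional most-significant-first order.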
Data sizes. The CPU can handle at most 255 bits in one operation. The reason for this number (and not the more obvious 256 = 2**8) is that data is variable in width, and the CPU keeps track of its size with an eight-bit unsigned integer, which holds values in 0…255. The CPU will sensibly handle data items of zero width, that is, items containing no bits at all.
Some data sizes:
A physical data path 255 bits wide is quite large by the standards of year 2011. Because 255 = 85 × 3 = 51 × 5, the cost of producing the CPU might be reduced by having physical data paths that are only 85 or 51 bits wide. Two possibilities are:
The 85-bit configuration is particularly attractive because for many real-world computations in many environments, no data item requires more than 85 bits.
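The arithmetic behind the narrower configurations can be sketched as follows, under an assumption this report does not state: that a narrow implementation moves a wide item through its data path in successive slices. Because 85 and 51 both divide 255 exactly, a maximal item fills its slices with nothing left over. The function name is hypothetical.

```python
MAX_WIDTH = 255  # widest item, limited by the 8-bit size field


def slices_needed(item_width, path_width):
    """Transfers needed to move an item through a narrower path (assumed slicing)."""
    if not 0 <= item_width <= MAX_WIDTH:
        raise ValueError("item width must be in 0..255")
    return -(-item_width // path_width)  # ceiling division; 0 bits need 0 slices


for path in (255, 85, 51):
    assert MAX_WIDTH % path == 0  # each candidate path width divides 255 evenly
```

Under that assumption, a 255-bit item needs 3 slices on the 85-bit configuration and 5 on the 51-bit one, while any nonempty item of 85 bits or fewer needs only one slice on the 85-bit configuration.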
Data types. Recognized by this CPU are these data types: strings, integers, floats, and extends; only the first two are mandatory. This table has links to discussions of each:
|LT4335 data types|
|string|description and instructions|
|integer|description and instructions|
|float|description and instructions|
|extend|description and instructions|
From these primitive types, programmers can build aggregates to form arbitrarily complicated data types of their own. In particular, pointers are of the integer type.
Locations of data. Although the implementation of a computer can involve an extraordinary level of detail, the LT4335 architecture regards itself as having three places to store and retrieve data:
Most practical computers place a cache between the CPU and genspace; this can certainly be done here, as long as the cache is transparent to the CPU.
Registers. Like most CPUs, the LT4335 contains general-purpose registers to hold data of immediate interest. The exact quantity of these is not critical to the architecture of the machine, but in the implementation described in this report there happen to be 64. All registers hold the same kinds of contents. Each register is accompanied by a read-only meta-register, an 8-bit unsigned integer, that tells how many bits of the register contain genuine data.
Each bit within the register has an address, which is an integer in 0…254. Data less than 255 bits wide is placed in contiguous bits starting at location zero. An attempt to read or write a bit not in use throws an exception.
A simple way to denote the contents of a register is by listing its bits. For instance, 1001_0011 means this:
The empty-set symbol ∅ is written when zero bits are in use.
Even when there are many unused register bits, the machine does not attempt to place two data items into one register.
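A register and its meta-register can be modelled in a few lines of Python. This is a toy sketch, not the hardware: the class name is hypothetical, and Python's `IndexError` stands in for whatever exception the CPU actually throws. Bits are listed lowest address first, matching the notation of this report.

```python
class Register:
    """Toy model of one register plus its read-only width meta-register."""

    CAPACITY = 255  # at most 255 bits of data

    def __init__(self, bits=""):
        self.load(bits)

    def load(self, bits):
        """Replace the whole contents; bits are listed lowest address first."""
        if len(bits) > self.CAPACITY or set(bits) - {"0", "1"}:
            raise ValueError("contents must be at most 255 binary digits")
        self._bits = [int(b) for b in bits]

    @property
    def width(self):
        """The meta-register: how many bits hold genuine data (no setter)."""
        return len(self._bits)

    def _check(self, address):
        # Bits at or beyond the current width are not in use.
        if not 0 <= address < self.width:
            raise IndexError(f"bit {address} is not in use")

    def read_bit(self, address):
        self._check(address)
        return self._bits[address]

    def write_bit(self, address, value):
        self._check(address)
        self._bits[address] = value & 1


r = Register("10010011")
print(r.width, r.read_bit(0), r.read_bit(7))  # prints 8 1 1
```

Writing to a bit changes its value but never the width; only loading new contents does, which is why the meta-register can be read-only.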
The registers are organized into a stack. On other pages reside a description of it and a list of pertinent instructions.
Genspace. Within this area, each bit has a distinct address, an integer in −(2**253 − 1) … +(2**253 − 1). These limits were selected so that if one address is subtracted from another, the answer will be in −(2**254 − 1) … +(2**254 − 1), which is the range of integers. Because genspace is very large, not all addresses will have hardware (ram, rom, mmio, et cetera) attached to them. The addresses chosen for use need not be contiguous.
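The bound on address differences can be checked directly with Python's arbitrary-precision integers. The worst case subtracts the lowest address from the highest: (2**253 − 1) − (−(2**253 − 1)) = 2**254 − 2, which is one less than the largest integer. The constant and function names are hypothetical.

```python
GEN_LIMIT = 2**253 - 1  # genspace addresses lie in -GEN_LIMIT .. +GEN_LIMIT
INT_LIMIT = 2**254 - 1  # integers lie in -INT_LIMIT .. +INT_LIMIT


def is_genspace_address(a):
    return -GEN_LIMIT <= a <= GEN_LIMIT


def address_difference(a, b):
    """Subtract two genspace addresses; the result always fits an integer."""
    if not (is_genspace_address(a) and is_genspace_address(b)):
        raise ValueError("not a genspace address")
    d = a - b
    assert -INT_LIMIT <= d <= INT_LIMIT  # sanity check of the design argument
    return d


print(address_difference(GEN_LIMIT, -GEN_LIMIT) == INT_LIMIT - 1)  # prints True
```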
A data item in genspace "crosses an n-bit boundary" if the item contains a bit whose address is a multiple of n, and that bit is not the item's first bit. For example, a twelve-bit item at address 182 (hence occupying 182…193) crosses a 17-bit boundary at bit 187. Items that cross a 255-bit boundary are likely to be read and written more slowly than those which do not, and similar considerations apply to the 85-bit and 51-bit boundaries of the narrower machines mentioned above.
A data item in genspace "is n-bit aligned" if its address (that is, the smallest address of any of its bits) is a multiple of n. Alignment at 17 bits is also called word-alignment. Note that crossing a boundary and being aligned are not mutually exclusive; an example is a 122-bit item stored at address 850. It is 85-bit aligned, but it also crosses an 85-bit boundary at address 935.
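The two definitions above can be checked with a short Python sketch (function names hypothetical). Both worked examples from the text are reproduced at the end; negative addresses are handled because Python's floor division and modulo round toward negative infinity.

```python
def boundaries_crossed(address, width, n):
    """Addresses of the n-bit boundaries crossed by an item.

    The item occupies address ... address + width - 1. It crosses a boundary
    at every multiple of n in that range except its own first bit.
    """
    first = -(-(address + 1) // n) * n  # smallest multiple of n above address
    last = address + width - 1
    return list(range(first, last + 1, n))


def is_aligned(address, n):
    """True if the item's (lowest) address is a multiple of n."""
    return address % n == 0  # Python's % is non-negative for n > 0


def is_word_aligned(address):
    return is_aligned(address, 17)


print(boundaries_crossed(182, 12, 17))                        # prints [187]
print(is_aligned(850, 85), boundaries_crossed(850, 122, 85))  # prints True [935]
```

Note that a word-aligned 17-bit item, such as an instruction, crosses no 17-bit boundary: `boundaries_crossed(0, 17, 17)` is empty.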
The CPU does not maintain any information about data types and sizes to describe what is stored in genspace. Hence a program can write some sequence of bits to genspace, and read them later with a completely different interpretation. Although this is sometimes useful, it is often erroneous and may deliver puzzling results.
To get is to read an item from genspace and push it into the register stack, and to put is to pull an item from the stack and write it to genspace. The typical sequence of computation is:
Because the LT4335 is a load-store architecture, there is a great deal of getting and putting.
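Getting and putting can be illustrated with a toy Python model of a bit-addressed genspace and a register stack. Everything here is hypothetical: the class and method names are invented, `add_unsigned` is a stand-in for a real arithmetic instruction, and unwritten genspace bits simply read as 0. The point is only that arithmetic touches the stack, while genspace is reached solely through get and put.

```python
MAX_WIDTH = 255


def bits_to_int(bits):
    """Unsigned value of a bit list stored lowest address first."""
    return sum(b << i for i, b in enumerate(bits))


class ToyMachine:
    """Hypothetical model: a bit-addressed genspace and a register stack."""

    def __init__(self):
        self.genspace = {}  # bit address -> 0 or 1; unwritten bits read as 0 here
        self.stack = []     # each entry is a list of bits, lowest address first

    def get(self, address, width):
        """Read `width` consecutive bits starting at `address` and push them."""
        if not 0 <= width <= MAX_WIDTH:
            raise ValueError("width must be in 0..255")
        self.stack.append([self.genspace.get(address + i, 0) for i in range(width)])

    def put(self, address):
        """Pop the top item and write its bits starting at `address`."""
        for i, bit in enumerate(self.stack.pop()):
            self.genspace[address + i] = bit

    def add_unsigned(self):
        """Pop two items, add them as unsigned integers, push the sum."""
        b, a = self.stack.pop(), self.stack.pop()
        width = max(len(a), len(b)) + 1  # one extra bit keeps the carry
        if width > MAX_WIDTH:
            raise OverflowError("sum would exceed 255 bits")
        total = bits_to_int(a) + bits_to_int(b)
        self.stack.append([(total >> i) & 1 for i in range(width)])


m = ToyMachine()
m.stack.append([1, 0, 1])  # the value 5
m.put(-10)                 # store it at a negative address
m.stack.append([1, 1])     # the value 3
m.put(40)
m.get(-10, 3)              # get both operands
m.get(40, 2)
m.add_unsigned()           # compute on the stack
m.put(100)                 # put the 4-bit result
```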
Many input-output instructions (namely get, put and get-and-put) have a synchronized option. When several synchronized instructions appear in a program, these are guaranteed:
Although the synchronized instructions will remain strictly sequenced among themselves, the scheduling of non-synchronized instructions (even input-output) relative to them is unpredictable. Synchrony helps keep memory-mapped input-output organized, and is beneficial when several CPUs are accessing the same genspace. The volatile declaration in high-level programming languages often signals a need for synchrony. On the other hand, synchrony reduces the optimization choices available to a compiler or the CPU itself, and may result in slower (but still correct) programs when used without the need.
Unless it is very sophisticated, a cache between the CPU and genspace needs to implement a strict read-through-and-write-through policy for instructions that specify synchrony.
While synchrony does affect how the CPU deals with genspace, it does not affect storage and retrieval involving the stack overflow area, which is completely independent, and generally invisible to the programmer.
A customary distinction made in computer design is between the von Neumann and Harvard architectures. In the former, there is one genspace for all purposes; in the latter, there are separate genspaces for instructions and data. The LT4335 architecture does not mandate either approach, although operations for moving information between the instruction and data genspaces of a Harvard design have not yet been specified. Multiple genspaces may complicate the machine, but they allow greater parallelism of operation, hence greater speed.
Some multi-processor installations might attach a private genspace to each CPU, and then offer an additional genspace for all the CPUs to share. In a multi-processor system, however, each CPU must have its own private stack overflow area.
Execution. Like most CPUs, the LT4335 executes a sequence of instructions residing in genspace. The representation of each instruction is 17 bits long and must be word-aligned. After executing the instruction at address n, the machine proceeds to the instruction at address n + 17, unless directed otherwise. The exact sequence of bits for each instruction is called its opcode (table). Many opcodes have not yet been assigned, remaining available for expansion of the instruction set.
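Sequential fetching can be sketched in Python, with genspace modelled as a mapping from bit address to bit value. The function name is hypothetical, and `ValueError` stands in for whatever the CPU does on a misaligned instruction address. Because every instruction is 17 bits, the next sequential address is known before the opcode is examined.

```python
INSTRUCTION_WIDTH = 17  # bits per instruction; also the word-alignment unit


def fetch(genspace, pc):
    """Return (opcode, next_pc) for the instruction at address pc.

    genspace maps bit addresses to 0 or 1. The opcode is returned as a
    17-character string, lowest address first.
    """
    if pc % INSTRUCTION_WIDTH != 0:
        raise ValueError(f"instruction address {pc} is not word-aligned")
    opcode = "".join(str(genspace[pc + i]) for i in range(INSTRUCTION_WIDTH))
    return opcode, pc + INSTRUCTION_WIDTH  # next_pc needs no decoding
```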
Strictly speaking, an opcode is a 17-bit string specifying an instruction, execution of which is an operation. However, because there is a one-to-one-to-one correspondence among them, the three terms are used somewhat interchangeably.
Fixed-length instructions help the CPU pipeline execution more efficiently. This is because the CPU knows how many bits are in the instruction before the instruction is fully decoded, and thus can calculate the address of the probable next instruction forthwith. The word "probable" is necessary because the current instruction may incur branching or throw an exception.
Because of the stack design, CPU instructions require only the simplest of addressing modes. Almost all instructions involve inputs or outputs on the stack, and many instructions contain 8 or 9 bits of embedded data in the opcode. Program-counter-relative addressing is extensively supported, easing relocation of programs.
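Why program-counter-relative addressing eases relocation can be shown with a sketch that rests on an explicit assumption not found in this report: that a branch carries a 9-bit two's-complement offset, counted in 17-bit words and added to the branch's own address. The real LT4335 encoding may well differ; the names and field layout below are hypothetical.

```python
OFFSET_BITS = 9  # hypothetical width of the embedded offset field
WORD = 17        # instruction size in bits


def decode_signed(field):
    """Interpret a least-significant-first bit string as two's complement."""
    value = sum(int(c) << i for i, c in enumerate(field))
    if field and field[-1] == "1":  # the last bit is the most significant (sign)
        value -= 1 << len(field)
    return value


def pc_relative_target(pc, field):
    """Hypothetical rule: target = pc + offset, the offset counted in words."""
    if len(field) != OFFSET_BITS:
        raise ValueError("offset field must be 9 bits")
    return pc + decode_signed(field) * WORD
```

Since the target depends only on the distance from the branch, moving a whole program by any multiple of 17 bits moves every target by the same amount, and the opcodes need no patching.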
In the table above, four categories of instructions particular to data types were introduced. Two other categories, less specific to any data type, are branching and miscellaneous.
Exceptions. If the machine is in a state where attempting to execute the next instruction would give a nonsense result, an exception is thrown, and a special routine (a handler) provided by the programmer will be invoked. On detecting that execution of an instruction would require an exception to be thrown, the CPU:
This is intended to meet the commit-or-rollback standard. In the rare case where the instructions are obtained not from ram or rom, but rather from some device that changes its value each time that it is read, the standard might not be attainable.
One way to explicitly throw an exception is to invoke an undefined instruction, and for that purpose all opcodes ending in 000 are reserved for the programmer, who can write handler routines to perform any activities desired. To serve a CPU lacking float hardware, an exception handler can be written to simulate each float instruction. That way, if a program expecting float hardware is run, an undefined-instruction exception will be thrown for each float instruction, its handler will be invoked, and correct (but probably slow) results will be obtained from software.
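The fallback scheme just described can be sketched in Python. This is a model under stated assumptions, not the CPU's mechanism: `UndefinedInstruction` stands in for the undefined-instruction exception, the `hardware` and `handlers` dictionaries are hypothetical, and `FLOAT_ADD` is an invented opcode, not a real LT4335 encoding.

```python
OPCODE_WIDTH = 17


class UndefinedInstruction(Exception):
    """Stands in for the CPU's undefined-instruction exception."""


def is_programmer_reserved(opcode):
    """True for 17-bit opcodes whose last three bits (highest addresses) are 000."""
    return len(opcode) == OPCODE_WIDTH and opcode.endswith("000")


def execute(opcode, hardware, state):
    """Run an opcode the hardware implements, else throw UndefinedInstruction."""
    impl = hardware.get(opcode)
    if impl is None:
        raise UndefinedInstruction(opcode)
    impl(state)


def run(opcode, hardware, handlers, state):
    """Execute one instruction, falling back to a software handler if needed."""
    try:
        execute(opcode, hardware, state)
    except UndefinedInstruction:
        handler = handlers.get(opcode)
        if handler is None:
            raise  # no handler installed: let the exception propagate
        handler(state)


FLOAT_ADD = "0110" * 4 + "1"  # hypothetical opcode, for illustration only


def soft_float_add(stack):
    """Software emulation of a float addition on a value stack."""
    stack.append(stack.pop() + stack.pop())


stack = [1.5, 2.25]
run(FLOAT_ADD, {}, {FLOAT_ADD: soft_float_add}, stack)  # no float hardware
print(stack)  # prints [3.75]
```

The same program runs unchanged on a CPU that has float hardware: passing `{FLOAT_ADD: soft_float_add}` as `hardware` instead simply bypasses the exception path.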
In normal processing, the CPU quietly reserves enough stack space so that a stack overflow exception can still be handled.