TD4 4-bit DIY CPU – Part 3
Having now spent some time thinking about and building my TD4 4-bit CPU this has a look at some code that I can run on it.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
- Part 6 – Replacing the ROM with a microcontroller
- Part 7 – Creating an Arduino “assembler” for the TD4
Programming
As described in Part 1 each command has a command part (bits 4-7) and an immediate data part (bits 0-3).
Unfortunately, due to how the PCB is built, when programming the DIP switches, the bits are ordered 01234567.
Now it would be possible to turn the board upside down of course, but then the program would flow from bottom right to top left instead of the more natural top left to bottom right.
This is also slightly confusing as the IO (LED OUTPUT and DIP switch INPUT) are in the more expected 3210 order.
Consequently, in the following programs, I list the “assembly” but then show the bit patterns in DIP switch order for easy programming!
Aside, after a bit more messing around online, I stumbled across a web-based TD4 emulator here: umtkm.github.io/td4/
And even an implementation in Minecraft…
Clock Options
There are several options for running the board. It has a built-in clock with two speeds: 1Hz or 10Hz. But there is also a “single stepping” option using the button.
Simple LED Output
Sending a value via the OUT instruction will push that value into the OUT register and consequently onto the LEDs. The simplest LED program is therefore the simple “OUT value”, but to make it at least look like something is happening, we can send two alternating values as follows:
0000 OUT 0001 # 1000 11010001 OUT 1000 # 0001 11010010 JMP 0000 # 0000 1111
This flashes bit 1, then bit 4 and then jumps back to the start.
Larson Scanner
Swapping values is no fun, but it is easy to create a 4-bit “Larson Scanner” with a sequence of OUT instructions as shown below.
0000 OUT 0001 # 1000 11010001 ADD A,0000 # 0000 00000010 OUT 0010 # 0100 11010011 ADD A,0000 # 0000 00000100 OUT 0100 # 0010 11010101 ADD A,0000 # 0000 00000110 OUT 1000 # 0001 11010111 ADD A,0000 # 0000 00001000 OUT 0100 # 0010 11011001 ADD A,0000 # 0000 00001010 OUT 0010 # 0100 11011011 ADD A,0000 # 0000 00001100 JMP 0000 # 0000 1111
The ADD instructions are effectively NOP instructions. They are just there to make the changes all two clock cycles. Without these, the LEDs change quite smoothly but then there is a short pause whilst the JMP takes place.
Counting Up and Down
We only have an OUT instruction that takes an immediate value or will use the B register, so if we want to do some counting and output the result, that has to happen using B.
0000 ADD B,0001 # 1000 10100001 OUT B # 0000 10010010 JMP 0000 # 0000 1111
Counting down is a bit more tricky as there is no subtract. But like the vast majority of other CPU architectures, if we use two complement for negative numbers, then the magic of wrapping around binary numbers will do the trick.
Twos complement means taking the bitwise opposite and adding 1, so the twos complement value for n is NOT(n) + 1. For example:
-1 => ~(0001) + 0001 = 1110 + 0001 = 1111 = 15
So working through 5 – 1:
0101 -> 5+ 1111 -> -1 ----- 0100 -> 4 1111
When the binary digit wraps around, and we ignore the “carry”, that gives the correct answer: 4.
0000 ADD B,1111 # 1111 10100001 OUT B # 0000 10010010 JMP 0 # 0000 1111
Taking INPUT
This will simply echo what is on the INPUT switches to the OUTPUT LEDs.
0000 IN B # 0000 01100001 OUT B # 0000 10010010 JMP 00000 # 0000 1111
This will add 1 to the INPUT
0000 IN B # 0000 01100001 ADD B,0001 # 1000 10100010 OUT B # 0000 10010011 JMP 0000 # 0000 1111
Although there is actually a bit of a cheat here. OUT B assumes the immediate data is 0 otherwise that gets added to B as part of the OUT, so really the above could be shorted to the following:
0000 IN B # 0000 01100001 OUT B, ADD 0001 # 1000 10010010 JMP 0000 # 0000 1111
Pretty much any instruction that assumes an immediate value of 0 can have the side effect of addition if an actual value is used instead as a side effect of using the accumulator to move values around…
Programmable Counter
This will read the input and count up to that value then stop. It makes use of the JNC – jump if not carry and twos-complement subtraction to count down from the INPUT value.
0000 IN A # 0000 0100 A = INPUT0001 MOV B,1111 # 1111 1110 B = -10010 ADD B,0001 # 1000 1010 B = B + 10011 OUT B # 0000 1001 OUTPUT = B0100 ADD A,1111 # 1111 0000 A = A + (-1)0101 JNC 0111 # 1110 0111 JUMP IF NO CARRY TO 10000110 JMP 0010 # 0100 1111 JUMP to 00110111 MOV B,0000 # 0000 1110 B = 0
The JUMP IF NOT CARRY works as when 1111 is added to A, the only value of A that won’t generate a CARRY is 0, so this is really executing the logic:
DO (other stuff) A--WHILE (A != 0)
Bit Shift/Doubling
It isn’t possible to add two registers together directly, but it is possible to keep adding 1 in a loop until a register counts down to zero. This can kind of add the two registers together.
The following reads a number from the INPUT and puts it in both A and B and then proceeds to add them together, the result of course being to double the INPUT value. As this is binary, that also acts as a bit-shift-left.
0000 IN A # 0000 0100 A = B = INPUT0001 MOV B,A # 0000 00100010 ADD B,1111 # 1111 1010 B = B + (-1)0011 ADD B,0001 # 1000 1010 B = B + 10100 OUT B # 0000 1001 OUT B0101 ADD A,1111 # 1111 0000 A = A + (-1)0110 JNC 1000 # 0001 0111 JUMP IF NO CARRY to 10000111 JMP 0011 # 1100 1111 JUMP to 00111000 OUT B # 0000 1001 OUT B1001 JMP 1001 # 1001 1111 LOOP HERE FOREVER
Once again, I’m using a counting down from a value as the counter, so the logic going on here is something like the following:
B = A = INPUTDO ADD 1 to B A--WHILE (A != 0)
As I show the working by sending B to the OUTPUT on each pass through the loop, the last instruction is a “JUMP forever” instruction to pause the CPU to allow me to see the OUTPUT.
This can be slightly optimised a bit more if we swap the test (in A) to before the addition (in B).
0000 IN B # 0000 0110 A = B = INPUT0001 MOV A,B # 0000 10000010 ADD A,1111 # 1111 0000 A = A - 10011 JNC 0111 # 1110 0111 JUMP IF NO CARRY to 01110100 ADD B,0001 # 1000 1010 B++0101 OUT B # 0000 1001 OUTPUT = B (temporary display)0110 JMP 0010 # 0100 1111 JUMP to 00100111 OUT B # 0000 1001 OUTPUT = B (final result)
In theory, being able to add two registers like this ought to be pretty useful. But as these are the only two registers, and this operation is destructive – i.e. A counts down to zero – and there is no stack or other temporary storage, in reality the options are pretty limited!
Multiplication
Of course what I’d really like to do is multiply two numbers. Or at least, with a single INPUT, be able to multiply it by itself, so square it.
In principle that could be done via repeated addition in a loop, but I can’t think of a way to use two registers (with no storage) to allow me to run two loops. What I think I need to do is something like the following:
A = B = INPUTTOTAL = 0DO B = INPUT DO TOTAL++ B-- WHILE (B != 0) A--WHILE (A != 0)OUT TOTAL
But that requires keeping track of at least three different things, and I only have two areas of workable storage. Of course, if one of those things is fixed, e.g. multiplication by a fixed amount, then it is possible. But that fixed amount has to be hardcoded in a ROM instruction.
I have wondered if I could somehow hook up the OUTPUT to the INPUT to create a sort of third register.
The best I can do, is use half the ROM as a look-up table for the answers to the squares of 0, 1, 2, 3. Then I’m using the sort of “undocumented” instruction JMP B which adds immediate data to whatever is in the B register and jumps to that location.
0000 IN A # 0000 0100 A = INPUT0001 MOV B,A # 0000 0010 B = A0010 ADD B,1111 # 1111 1010 B = B + (-1)0011 ADD B,0001 # 1000 1010 B = B + 10100 ADD A,1111 # 1111 0000 A = A + (-1)0101 JNC 0111 # 1110 0111 JUMP IF NO CARRY to 01110110 JMP 0011 # 1100 1111 JUMP TO 00110111 JMP B,1000 # 0001 1011 "Undocumented" JUMP to B+10001000 OUT 0000 # 0000 1101 0*0 = 01001 JMP 1111 # 1111 11111010 OUT 0001 # 1000 1101 1*1 = 11011 JMP 1111 # 1111 11111100 OUT 0100 # 0010 1101 2*2 = 41101 JMP 1111 # 1111 11111110 OUT 1001 # 1001 1101 3*3 = 91111 JMP 1111 # 1111 1111
I have to allow for the lookup table to execute both an OUT and a JMP so I have to use the previous logic to multiply my index by two and then add it to 1000 and JMP to that value.
The pseudo code is as follows:
A = B = INPUTB = B * 2JMP 1000 + B1000 OUT 0; JMP END1010 OUT 1; JMP END1100 OUT 4; JMP END1110 OUT 9; JMP END1111 END
So, this works, and as 3*3 is the maximum we can calculate on a 4-bit OUTPUT register without overflowing anyway, arguably this is doing the job.
But it can hardly be described as multiplication!
If you know of a way to do it computationally, with only two registers, then answers on a postcard to… well, me.
Additional Notes on Multiplication
After posting this blog I ended up have two interesting conversations about the multiplication question.
Using logarithms
The first with Segebodo@chaos.social was a discussion about possibly using logarithms. You may recall that multiplication can be turned into addition by using logarithms.
They presented the following code to add together an arbitrary number of numbers by adding the log2(n) values:
0000 OUT B # 0000 1001 Assumes B=0 at start, then culmulates0001 IN A,1110 # 0111 0100 A = INPUT - 20010 JNC 0000 # 0000 0111 IF NC then A<2 STOP with B=00011 ADD B,0001 # 1000 1010 B++0100 IN A,1100 # 0011 0100 A = INPUT - 40101 JNC 0000 # 0000 0111 IF NC then A<4 STOP with B=10110 ADD B,0001 # 1000 1010 B++0111 IN A,1001 # 1001 0100 A = INPUT - 81000 JNC 0000 # 0000 0111 IF NC then A<8 STOP with B=21001 ADD B,0001 # 1000 1010 B++1010 IN A,0011 # 1100 0100 A = INPUT + 31011 JNC 0000 # 0000 0111 IF NC then A<12 STOP with B=31100 ADD B,0001 # 1000 1010 B++1101 JMP 0000 # 0000 1111 ELSE STOP with B=4
This is using the “undocumented” feature that IN A is actually IN A,data with values that test the range of A and will cumulatively add to B until the limit of the size of A has been found.
On entry, B will be 0 and the first number will be read from A until the JUMP happens. Then the second number can be set on the INPUT register and the calculation will continue with the last value of B, still adding values until the new A has been processed.
This is a great solution and very elegant, but the answer is given in terms of log2 values, so it isn’t a direct output. But apart from that is does work pretty well and is particularly neat for allowing the addition of an arbitrary number of different numbers. Well at least until the 4-bits are exhausted – it will carry on and keep overflowing of course, but the numbers will cease making sense.
Noting a curious mathematical property
I have another idea sent to me in a private conversation that pointed out that if I’m only doing squares, i.e. assuming a single INPUT value, then we can take advantage of the fact that the sum of the first n odd integers is equal to n^2:
0^2 = 0 = 01^2 = 0 + 1 = 12^2 = 0 + 1 + 3 = 43^2 = 0 + 1 + 3 + 5 = 94^2 = 0 + 1 + 3 + 5 + 7 = 165^2 = 0 + 1 + 3 + 5 + 7 + 9 = 256^2 = 0 + 1 + 3 + 5 + 7 + 9 + 11 = 36....
There is an explanation here: mathsisfun.com/numbers/odd-squ… and of course the diagram makes it obvious! I don’t know that it has a name though – let me know if you know otherwise!
So we can get away without needed nested loops as we’re just adding numbers now that are 2 integers apart.
B = INPUTA = 2B - 1 # This is the largest odd number we will be addingB = 0DO B = B + A A = A - 2WHILE A > 0OUTPUT = B
So I have to use the above routines I already have for adding registers and multiplying by 2 in the above, and if it will fit in 16 instructions, that should work!
0000 IN B # 0000 0110 A = B = INPUT0001 MOV A,B # 0000 1000# A = 2B - 1 - Shift left then subtract 10010 ADD B,1111 # 1111 1010 B = B - 10011 JNC 0111 # 1110 0111 JUMP IF NO CARRY to 01100100 ADD A,0001 # 1000 0000 A = A + 10101 JMP 0010 # 0100 1111 JUMP to 00100110 ADD A,1111 # 1111 0000 A = A - 1# Now add consecutive odd numbers...
Ok, here is where I hit a snag. The next part of the algorithm requires me to do B = B + A, but I can’t add two registers together directly – I have to count down one of the registers whilst adding 1 to the second.
That works fine for one pass through the loop, but at that point I’ve destroyed the value of either A or B and won’t be able to go back around for a 2nd pass…
I started to chew over if there was a way to work incrementally add the values, but every way I considered I ended up with the act that I am having to add two calculated values together at some point and I just can’t do that.
One interesting train of thought initially looked promising: noticing that each odd value to be added is a number of 2s as follows:
n=4: 4^2 = 1 + 3 + 5 + 7 = 2 - 1 + 2 + 2 - 1 + 2 + 2 + 2 - 1 + 2 + 2 + 2 + 2 - 1
That is the triangle number for n times two, less n times 1. The formular for a triangle number is as follows:
Tri(n) = n (n + 1) / 2
So using that many 2s and taking off the appropriate values too, gives me
Sq (n) = (2n(n + 1)) / 2 - n = (2n(n + 1) - 2n) / 2 = 2n^2 + 2n - 2n / 2 = 2n^2 / 2 = n^2
Ah, so basically the square (n) = square (n). That isn’t particularly insightful…
Ok, so this looked really promising, but fundamentally, if I can’t add the two registers together without trashing one of the values, I think I’m stuck again.
Conclusion
I must admit to wondering if the addressing could be extended to permit longer programs. I don’t see why not, if the PC register could be extended. One limitation might be that jumps remain localised to a 4-bit value somehow.
It is proving a real limitation not being able to add the two registers together – that is really limiting what I can work out how to do with it.
It seems that at least one other person has had similar thoughts and had started to work on an extended version – the TD4E and even the TD4E8: github.com/Nekhaevalex/TD4. It looks like a lot of the work was done in a simulation, but they did get to produce an assembler: hackaday.io/project/161708-4-b….
Other related things I’ve found include:
- Crazy Small CPU (4-bit data, 8-bit address): minnie.tuhs.org/Programs/Crazy…
- CHUMP ICS4U 4-bit TTL project: darcy.rsgc.on.ca/ACES/TEI4M/4B…
- TPS/MyCo (16 pages of 4-bit addressable memory): github.com/GClown25/BIT4
- MiniMax Enhanced TD4: github.com/denjhang/MiniMax-4-…
- TD4 CPU (adds paging): hackaday.io/project/26215-td4-…
- 4-bit TTL Scratch Built CPU (4-bit data, 8-bit address): ttlcpu.com/articles/4-bit-ttl-…
- “How to build a CPU TD4” includes a discussion of expanding the memory: xyama.sakura.ne.jp/hp/4bitCPU_…
Finally, there is this “meta list” of DIY CPU projects: ttlcpu.com/content/links.
And this seems to be a massive list of TD4 related links: cable-net.ne.jp/user/takahisa/… although, unfortunately many of the linked pages no longer exist. But there continues to be interest out there in the TD4!
I have a few ideas of my own, but if they amount to anything, well that still remains to be seen. But getting this far has been really interesting and I already feel like I’ve learned quite a lot about the lowest levels of our technology.
I have to say, “full stack developer” certainly takes on a new meaning when you get to these levels…
Kevin
GitHub - Nekhaevalex/TD4: Schematics and PCBs for extended version of TD4 CPU.
Schematics and PCBs for extended version of TD4 CPU. - Nekhaevalex/TD4GitHub
TD4 4-bit DIY CPU – Part 4
Having had quite a good play with the TD4 so far, I’m now starting to think how I might be able to enhance it slightly. First, I’m thinking about fairly simple things I can do with the kit as is. Then I might consider what I could do if I was to redesign it myself.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
- Part 6 – Replacing the ROM with a microcontroller
- Part 7 – Creating an Arduino “assembler” for the TD4
Debug Register Output
The biggest limitation I’ve found with using the kit as is, compared to say using a simulator online, is that it isn’t possible to see what is going on with the registers.But there is a pretty simple hack that will add LED outputs to each register.
The pinout for the 74HC161 is as follows:
And looking at the schematic, we can see that TC (15) is unconnected, but CET (10) links to GND. This means that if we want to catch the outputs O0-O3 (14-11) we just need an LED and resistor between the On pin and pin 10.By far the easiest way to do that is with one of these LED “bars”:
These are available very cheaply online and tend to be blocks of four, six or eight surface mount LEDs and resistors connected to a common GND or VCC point. These are common-GND, six-way, blue modules which are perfect.When the GND is aligned with CET on pin 10, the LED connections cover pins 11-16, which provides a handy VCC indicator for the register chip too, which makes it easy to know everything is connected and working.
The best way of installing these I decided was to use a spare 16-way DIP socket and insert the LED bar in the correct place as shown above. Then this can be simply slotted over the top of the chip itself on the PCB. This isn’t, obviously, a robust connection, but it is sufficient for debugging.
I’ve used two of these as shown below, for each register. I don’t need one for OUT as that is shown on the board anyway, and whilst I could add one for the PC, for now, this is fine.
Debug PC Output
As mentioned above, adding an indicator for the Program Counter should be relatively straight forward.But I’d forgotten something. The PC has an automatic increment enabled which means that CET isn’t tied to GND, but to VCC. this means that the above won’t work directly as there is no convenient GND point next to the OUTPUTs. Still, once rewired to an actual GND it would still work in principle, but it will be a binary number representing the instruction.
What I’d really like is something next to the ROM bank showing which instruction is being read. One option is to hang LEDs off the ROM address decoder, shown in the schematic below.
But when we look at the pinout, it isn’t quite as simple as the registers. Also, as these are active LOW then I’d need one of the common VCC LED modules I think so that the LED will only turn on when the signal goes LOW, or just accept that all LEDs will be on apart from the active instruction.One option might be a small, custom PCB to provide the required readout that can go between the 154 and the PCB. But ultimately all the signals from the 154 end up connected to each back of 8 diodes feeding the DIP switches, so that is another possibility.
Each DIP switch circuit is as follows:
The “scan” works as follows:
- All data lines are pulled HIGH by the resistors shown.
- By default all feed lines (the single connection point for all the diodes) will also be HIGH.
- Except when the 154 selects that output and drives it LOW.
- When the block is selected, then any switches that are set will drive the associated data lines LOW too, but just for this selected switch block.
- Later in the circuit all data lines go via the 74HC540 octal inverting buffer/line driver so they become active HIGH again.
So attaching an LED between the common connection point and VCC could show which block is active (LOW).
But now I’m wondering if the diodes could simply be replaced with LEDs anyway. In principle they would serve the same purpose – prevent “ghosting” between the blocks of switches, and the current is already limited by the 10K pull-up resistors.
This would mean that the active program counter would have its data word illuminated when active for all situations apart from the data word being all zero. This would correspond to the instruction ADD A, 0000 which is kind of a “NOP” anyway, so I don’t think that would be a significant limitation.
I am seriously considering buying a second kit now to see if I can use LEDs instead of diodes…
Returning to some kind of PC indicator for a moment. It would be possible to attach an LED to the underside of the board across the diode and switch such that it could indicate when that last block in the row is active. However this would only work if the switch that the LED is soldered to is open.
The LED has to be soldered so that the cathode (short leg) is soldered to the top of the existing diode and the anode (long leg) is bent around to connect to the bottom of the switch block as shown below.
So when switch 8 is ON the LED will not work. That means if there is an OUT or a JMP then the LED won’t light, but that is better than nothing perhaps and is a pretty simple addition to the board.And if there are spare instruction slots remaining then the last instruction can just be a NOP (ADD A, 0).
Simulator Enhancement
Whilst testing out the log2 idea from Part 3, I started with the simulator provided by umtkm on github here: umtkm.github.io/td4/But that has a minor issue in that it implements the TD4 as described, but not as fully implemented in logic. In particular, the undocumented feature that all commands will always add an immediate value if provided, even if the officially described assembler instruction doesn’t include it.
The example in question, used for the log2 code, was IN A. The logic actually implements IN A + im even though the description doesn’t mention it. The instruction table lists this as:
IN A 0010 0000
But there is no reason why that 0000 couldn’t be any value. This is a consequence of using the adder to load a register – it will always add in the immediate value too.By downloading the javascript from the original repository github.com/umtkm/td4 it is possible to tweak it to implement the undocumented feature.
The function “purse_order” handles the implementation of each instruction, so the immediate value needs adding to the register for the IN instruction, and it needs the CARRY building in too. I’ve also added it to the two MOV register functions too.
Here is my new implementation:
function purse_order(bin) { var op = bin.slice(0,4); var im = parseInt(bin.slice(-4),2); registor_a &= 15; registor_b &= 15; switch (op) { case '0000': registor_a += im; c_flag = registor_a&16 ? 1 : 0; return; case '0101': registor_b += im; c_flag = registor_b&16 ? 1 : 0; return; case '0011': registor_a = im; break; case '0111': registor_b = im; break; case '0001': registor_a = registor_b + im; c_flag = registor_a&16 ? 1 : 0; return; case '0100': registor_b = registor_a + im; c_flag = registor_b&16 ? 1 : 0; return; case '1111': program_counter = im; break; case '1110': if (c_flag === 0) program_counter = im; break; case '0010': registor_a = input_port + im; c_flag = registor_a&16 ? 1 : 0; return; case '0110': registor_b = input_port + im; c_flag = registor_b&16 ? 1 : 0; return; case '1001': output_port = registor_b; break; case '1011': output_port = im; break; } c_flag = 0;}
There are four instructions not implemented in the emulator at present – the two duplicate OUT instructions, JMP B and JNC B. I might see if I can add the two jumps as that would be really useful too.But this allows me to test the log2 routine in simulation now too.
Conclusion
Both of the hardware updates described so far are relatively simple, but quite useful additions to the kit.I’ll come back to this post as I think of other things to do.
Kevin
#td4
GitHub - umtkm/td4: TD4 Emulator Web
TD4 Emulator Web. Contribute to umtkm/td4 development by creating an account on GitHub.GitHub