Kevin's Blog

Kevin's Blog

2 months ago (Received 1 month ago) • •

Kevin's Blog
2 months ago (Received 1 month ago) • •

TD4 4-bit DIY CPU – Part 2

Having spent some time analysing how the TD4 4-bit DIY CPU works, I now have my kit, so this post has some notes about the build.

Part 1 – Introduction, Discussion and Analysis
Part 2 – Building and Hardware
Part 3 – Programming and Simple Programs
Part 4 – Some hardware enhancements
Part 5 – My own PCB version
Part 6 – Replacing the ROM with a microcontroller
Part 7 – Creating an Arduino “assembler” for the TD4

makertube.net/w/hhtpEp7XxpJkgx…

The kit I have is labelled “MUSE LAB TD4 Ver 1.3”. Muse labs have a store on Tindie here: tindie.com/stores/johnnywu/ and this kit is available all over Aliexpress. For me it cost around £20-£25.

The PCB

There are a few immediate things to note:

I have v1.3 of the PCB according to the printed label.
There are no component values, so the BOM and schematic will be pretty critical at working out what goes where.
The micro USB connector and four OUTPUT LEDs are all surface mount.
There is an additional 5V/GND pin header connector.
That is a lot of diodes… 128 to be precise.

The GitHub link has a HTML BOM and schematic in the hardware/v1.3 directory. Note these are very slightly different to some of the information in the docs area (apparently there is a mistake in the docs schematic, but the one in v1.3 is correct).

The key parts of BOM, which lists every single component, including each diode, individually, are:

Resistors
R1, R4, R9, R12-19	1K
R2, R7, R8	100K
R3, R20-27	10K
R5	3K3
R6	33K
R10, R11	100R

Capacitors
C1, C2, C3	10uF

Diodes
All	1N4148

The ICs are as follows:

IC
U1	74HC74	Dual D-Type Flip-flop
U2	74HC14	Hex Schmitt Trigger Inverter
U3, U4, U5, U6	74HC161	4-bit Synchronous Binary Counter
U7, U8	74HC153	Dual 4-Line to 1-Line Data Selector/Multiplexer
U9	74HC283	4-Bit Binary Full Adder with Fast Carry
U10	74HC32	Quad 2-Input Positive OR Gate
U11	74HC10	Three 3-Input Positive NAND Gate
U12	74HC540	Octal Buffer and Line Driver with Tri-State Outputs
U13	74HC154	4-Line to 16-Line Decoder/Demultiplexer

In order to make building slightly easier, I printed out the board diagram from the GitHub repository and added the resistor values and IC labels in the correct places.

Before doing anything else, I tackled the two surface mount elements first: the micro USB connector and then the OUTPUT LEDs. I figured if I messed it up then at least I won’t have wasted time building the rest of it first.

As it happens, although it was pretty tricky, as the USB connector only requires the outer two pins connected – for power and ground – it was just about manageable. I pasted the area with loads of flux and popped a bit of solder on each of the two pads. Then I pressed the connector in place and reheated the area until it seemed like it had taken.

One annoyance – the connector’s stress relief connections don’t go through the PCB very far, so I’m really not confident it will last, but I’ve done my best.

The LEDs weren’t too bad. Again after pasting the area with flux, I put some solder on the top pads and then used tweezers to position and solder on one side of an LED. Once that seemed ok I could do the other side and return to the first pad if necessary.

It is important (naturally with LEDs) to ensure the direction is correct. There is an arrow on the underside of each LED and the negative terminal is highlighted in green.

I was able to test the USB socket by plugging in a USB cable and checking for continuity or shorts between the outer USB connections and the GND and 5V header pin socket on the PCB. That seemed ok.

I could similarly test the LEDs by using a multimeter LED tester between the hole/pad just above the LED and the GND header pin socket. That all worked too.

Then it was a case of just getting on with it.

That was a lot of diodes! Referring to my resistor map from above, I did those next, followed by all the IC sockets.

Then I added the two tactile buttons, the 10uF capacitors, the 2-pin power header, all the DIP switches, and finally the two slider switches.

Here is the fully populated board. Note all the vertical ICs face the same direction – pin 1 down – apart from the largest one which is pin 1 up.

Conclusion

This was a really fun build and once the two surface mount elements were done, relatively straight forward.

And best of all – it worked first time! The video shows a simple program to load four different values directly into the OUTPUT register and then loop back to the start.

The main issue I have is that each set of 8 bits is backwards to how I’d expect to be able to read them!

Anyway, time to think about some actual programs to run.

Kevin

#cpu #pcb #td4

MuseLab on Tindie

Dedicated to make interesting hardware

^Tindie

Kevin's Blog

2025-10-01 21:28:17

TD4 4-bit DIY CPU – Part 3

Having now spent some time thinking about and building my TD4 4-bit CPU this has a look at some code that I can run on it.
Part 1 – Introduction, Discussion and Analysis
Part 2 – Building and Hardware
Part 3 – Programming and Simple Programs
Part 4 – Some hardware enhancements
Part 5 – My own PCB version
Part 6 – Replacing the ROM with a microcontroller
Part 7 – Creating an Arduino “assembler” for the TD4

Programming

As described in Part 1 each command has a command part (bits 4-7) and an immediate data part (bits 0-3).
Unfortunately, due to how the PCB is built, when programming the DIP switches, the bits are ordered 01234567.
Now it would be possible to turn the board upside down of course, but then the program would flow from bottom right to top left instead of the more natural top left to bottom right.
This is also slightly confusing as the IO (LED OUTPUT and DIP switch INPUT) are in the more expected 3210 order.
Consequently, in the following programs, I list the “assembly” but then show the bit patterns in DIP switch order for easy programming!
Aside, after a bit more messing around online, I stumbled across a web-based TD4 emulator here: umtkm.github.io/td4/
And even an implementation in Minecraft…
Clock Options

There are several options for running the board. It has a built-in clock with two speeds: 1Hz or 10Hz. But there is also a “single stepping” option using the button.
Simple LED Output

Sending a value via the OUT instruction will push that value into the OUT register and consequently onto the LEDs. The simplest LED program is therefore the simple “OUT value”, but to make it at least look like something is happening, we can send two alternating values as follows:
0000 OUT 0001 # 1000 11010001 OUT 1000 # 0001 11010010 JMP 0000 # 0000 1111
This flashes bit 1, then bit 4 and then jumps back to the start.
Larson Scanner

Swapping values is no fun, but it is easy to create a 4-bit “Larson Scanner” with a sequence of OUT instructions as shown below.
0000 OUT 0001 # 1000 11010001 ADD A,0000 # 0000 00000010 OUT 0010 # 0100 11010011 ADD A,0000 # 0000 00000100 OUT 0100 # 0010 11010101 ADD A,0000 # 0000 00000110 OUT 1000 # 0001 11010111 ADD A,0000 # 0000 00001000 OUT 0100 # 0010 11011001 ADD A,0000 # 0000 00001010 OUT 0010 # 0100 11011011 ADD A,0000 # 0000 00001100 JMP 0000 # 0000 1111
The ADD instructions are effectively NOP instructions. They are just there to make the changes all two clock cycles. Without these, the LEDs change quite smoothly but then there is a short pause whilst the JMP takes place.
Counting Up and Down

We only have an OUT instruction that takes an immediate value or will use the B register, so if we want to do some counting and output the result, that has to happen using B.
0000 ADD B,0001 # 1000 10100001 OUT B # 0000 10010010 JMP 0000 # 0000 1111
Counting down is a bit more tricky as there is no subtract. But like the vast majority of other CPU architectures, if we use two complement for negative numbers, then the magic of wrapping around binary numbers will do the trick.
Twos complement means taking the bitwise opposite and adding 1, so the twos complement value for n is NOT(n) + 1. For example:
-1 => ~(0001) + 0001 = 1110 + 0001 = 1111 = 15
So working through 5 – 1:
0101 -> 5+ 1111 -> -1 ----- 0100 -> 4 1111
When the binary digit wraps around, and we ignore the “carry”, that gives the correct answer: 4.
0000 ADD B,1111 # 1111 10100001 OUT B # 0000 10010010 JMP 0 # 0000 1111
Taking INPUT

This will simply echo what is on the INPUT switches to the OUTPUT LEDs.
0000 IN B # 0000 01100001 OUT B # 0000 10010010 JMP 00000 # 0000 1111
This will add 1 to the INPUT
0000 IN B # 0000 01100001 ADD B,0001 # 1000 10100010 OUT B # 0000 10010011 JMP 0000 # 0000 1111
Although there is actually a bit of a cheat here. OUT B assumes the immediate data is 0 otherwise that gets added to B as part of the OUT, so really the above could be shorted to the following:
0000 IN B # 0000 01100001 OUT B, ADD 0001 # 1000 10010010 JMP 0000 # 0000 1111
Pretty much any instruction that assumes an immediate value of 0 can have the side effect of addition if an actual value is used instead as a side effect of using the accumulator to move values around…
Programmable Counter

This will read the input and count up to that value then stop. It makes use of the JNC – jump if not carry and twos-complement subtraction to count down from the INPUT value.
0000 IN A # 0000 0100 A = INPUT0001 MOV B,1111 # 1111 1110 B = -10010 ADD B,0001 # 1000 1010 B = B + 10011 OUT B # 0000 1001 OUTPUT = B0100 ADD A,1111 # 1111 0000 A = A + (-1)0101 JNC 0111 # 1110 0111 JUMP IF NO CARRY TO 10000110 JMP 0010 # 0100 1111 JUMP to 00110111 MOV B,0000 # 0000 1110 B = 0
The JUMP IF NOT CARRY works as when 1111 is added to A, the only value of A that won’t generate a CARRY is 0, so this is really executing the logic:
DO (other stuff) A--WHILE (A ![url=https://mast.lat/users/gusty]=[/url] 0)
Bit Shift/Doubling

It isn’t possible to add two registers together directly, but it is possible to keep adding 1 in a loop until a register counts down to zero. This can kind of add the two registers together.
The following reads a number from the INPUT and puts it in both A and B and then proceeds to add them together, the result of course being to double the INPUT value. As this is binary, that also acts as a bit-shift-left.
0000 IN A # 0000 0100 A = B = INPUT0001 MOV B,A # 0000 00100010 ADD B,1111 # 1111 1010 B = B + (-1)0011 ADD B,0001 # 1000 1010 B = B + 10100 OUT B # 0000 1001 OUT B0101 ADD A,1111 # 1111 0000 A = A + (-1)0110 JNC 1000 # 0001 0111 JUMP IF NO CARRY to 10000111 JMP 0011 # 1100 1111 JUMP to 00111000 OUT B # 0000 1001 OUT B1001 JMP 1001 # 1001 1111 LOOP HERE FOREVER
Once again, I’m using a counting down from a value as the counter, so the logic going on here is something like the following:
B = A = INPUTDO ADD 1 to B A--WHILE (A ![url=https://mast.lat/users/gusty]=[/url] 0)
As I show the working by sending B to the OUTPUT on each pass through the loop, the last instruction is a “JUMP forever” instruction to pause the CPU to allow me to see the OUTPUT.
This can be slightly optimised a bit more if we swap the test (in A) to before the addition (in B).
0000 IN B # 0000 0110 A = B = INPUT0001 MOV A,B # 0000 10000010 ADD A,1111 # 1111 0000 A = A - 10011 JNC 0111 # 1110 0111 JUMP IF NO CARRY to 01110100 ADD B,0001 # 1000 1010 B++0101 OUT B # 0000 1001 OUTPUT = B (temporary display)0110 JMP 0010 # 0100 1111 JUMP to 00100111 OUT B # 0000 1001 OUTPUT = B (final result)
In theory, being able to add two registers like this ought to be pretty useful. But as these are the only two registers, and this operation is destructive – i.e. A counts down to zero – and there is no stack or other temporary storage, in reality the options are pretty limited!
Multiplication

Of course what I’d really like to do is multiply two numbers. Or at least, with a single INPUT, be able to multiply it by itself, so square it.
In principle that could be done via repeated addition in a loop, but I can’t think of a way to use two registers (with no storage) to allow me to run two loops. What I think I need to do is something like the following:
A = B = INPUTTOTAL = 0DO B = INPUT DO TOTAL++ B-- WHILE (B ![url=https://mast.lat/users/gusty]=[/url] 0) A--WHILE (A ![url=https://mast.lat/users/gusty]=[/url] 0)OUT TOTAL
But that requires keeping track of at least three different things, and I only have two areas of workable storage. Of course, if one of those things is fixed, e.g. multiplication by a fixed amount, then it is possible. But that fixed amount has to be hardcoded in a ROM instruction.
I have wondered if I could somehow hook up the OUTPUT to the INPUT to create a sort of third register.
The best I can do, is use half the ROM as a look-up table for the answers to the squares of 0, 1, 2, 3. Then I’m using the sort of “undocumented” instruction JMP B which adds immediate data to whatever is in the B register and jumps to that location.
0000 IN A # 0000 0100 A = INPUT0001 MOV B,A # 0000 0010 B = A0010 ADD B,1111 # 1111 1010 B = B + (-1)0011 ADD B,0001 # 1000 1010 B = B + 10100 ADD A,1111 # 1111 0000 A = A + (-1)0101 JNC 0111 # 1110 0111 JUMP IF NO CARRY to 01110110 JMP 0011 # 1100 1111 JUMP TO 00110111 JMP B,1000 # 0001 1011 "Undocumented" JUMP to B+10001000 OUT 0000 # 0000 1101 0*0 = 01001 JMP 1111 # 1111 11111010 OUT 0001 # 1000 1101 1*1 = 11011 JMP 1111 # 1111 11111100 OUT 0100 # 0010 1101 2*2 = 41101 JMP 1111 # 1111 11111110 OUT 1001 # 1001 1101 3*3 = 91111 JMP 1111 # 1111 1111
I have to allow for the lookup table to execute both an OUT and a JMP so I have to use the previous logic to multiply my index by two and then add it to 1000 and JMP to that value.
The pseudo code is as follows:
A = B = INPUTB = B * 2JMP 1000 + B1000 OUT 0; JMP END1010 OUT 1; JMP END1100 OUT 4; JMP END1110 OUT 9; JMP END1111 END
So, this works, and as 3*3 is the maximum we can calculate on a 4-bit OUTPUT register without overflowing anyway, arguably this is doing the job.
But it can hardly be described as multiplication!
If you know of a way to do it computationally, with only two registers, then answers on a postcard to… well, me.
Additional Notes on Multiplication

After posting this blog I ended up have two interesting conversations about the multiplication question.
Using logarithms

The first with Segebodo@[url=https://chaos.social/actor]chaos.social[/url] was a discussion about possibly using logarithms. You may recall that multiplication can be turned into addition by using logarithms.
They presented the following code to add together an arbitrary number of numbers by adding the log2(n) values:
0000 OUT B # 0000 1001 Assumes B=0 at start, then culmulates0001 IN A,1110 # 0111 0100 A = INPUT - 20010 JNC 0000 # 0000 0111 IF NC then A<2 STOP with B=00011 ADD B,0001 # 1000 1010 B++0100 IN A,1100 # 0011 0100 A = INPUT - 40101 JNC 0000 # 0000 0111 IF NC then A<4 STOP with B=10110 ADD B,0001 # 1000 1010 B++0111 IN A,1001 # 1001 0100 A = INPUT - 81000 JNC 0000 # 0000 0111 IF NC then A<8 STOP with B=21001 ADD B,0001 # 1000 1010 B++1010 IN A,0011 # 1100 0100 A = INPUT + 31011 JNC 0000 # 0000 0111 IF NC then A<12 STOP with B=31100 ADD B,0001 # 1000 1010 B++1101 JMP 0000 # 0000 1111 ELSE STOP with B=4
This is using the “undocumented” feature that IN A is actually IN A,data with values that test the range of A and will cumulatively add to B until the limit of the size of A has been found.
On entry, B will be 0 and the first number will be read from A until the JUMP happens. Then the second number can be set on the INPUT register and the calculation will continue with the last value of B, still adding values until the new A has been processed.
This is a great solution and very elegant, but the answer is given in terms of log2 values, so it isn’t a direct output. But apart from that is does work pretty well and is particularly neat for allowing the addition of an arbitrary number of different numbers. Well at least until the 4-bits are exhausted – it will carry on and keep overflowing of course, but the numbers will cease making sense.
Noting a curious mathematical property

I have another idea sent to me in a private conversation that pointed out that if I’m only doing squares, i.e. assuming a single INPUT value, then we can take advantage of the fact that the sum of the first n odd integers is equal to n^2:
0^2 = 0 = 01^2 = 0 + 1 = 12^2 = 0 + 1 + 3 = 43^2 = 0 + 1 + 3 + 5 = 94^2 = 0 + 1 + 3 + 5 + 7 = 165^2 = 0 + 1 + 3 + 5 + 7 + 9 = 256^2 = 0 + 1 + 3 + 5 + 7 + 9 + 11 = 36....
There is an explanation here: mathsisfun.com/numbers/odd-squ… and of course the diagram makes it obvious! I don’t know that it has a name though – let me know if you know otherwise!
So we can get away without needed nested loops as we’re just adding numbers now that are 2 integers apart.
B = INPUTA = 2B - 1 # This is the largest odd number we will be addingB = 0DO B = B + A A = A - 2WHILE A > 0OUTPUT = B
So I have to use the above routines I already have for adding registers and multiplying by 2 in the above, and if it will fit in 16 instructions, that should work!
0000 IN B # 0000 0110 A = B = INPUT0001 MOV A,B # 0000 1000# A = 2B - 1 - Shift left then subtract 10010 ADD B,1111 # 1111 1010 B = B - 10011 JNC 0111 # 1110 0111 JUMP IF NO CARRY to 01100100 ADD A,0001 # 1000 0000 A = A + 10101 JMP 0010 # 0100 1111 JUMP to 00100110 ADD A,1111 # 1111 0000 A = A - 1# Now add consecutive odd numbers...
Ok, here is where I hit a snag. The next part of the algorithm requires me to do B = B + A, but I can’t add two registers together directly – I have to count down one of the registers whilst adding 1 to the second.
That works fine for one pass through the loop, but at that point I’ve destroyed the value of either A or B and won’t be able to go back around for a 2nd pass…
I started to chew over if there was a way to work incrementally add the values, but every way I considered I ended up with the act that I am having to add two calculated values together at some point and I just can’t do that.
One interesting train of thought initially looked promising: noticing that each odd value to be added is a number of 2s as follows:
n=4: 4^2 = 1 + 3 + 5 + 7 = 2 - 1 + 2 + 2 - 1 + 2 + 2 + 2 - 1 + 2 + 2 + 2 + 2 - 1
That is the triangle number for n times two, less n times 1. The formular for a triangle number is as follows:
Tri(n) = n (n + 1) / 2
So using that many 2s and taking off the appropriate values too, gives me
Sq (n) = (2n(n + 1)) / 2 - n = (2n(n + 1) - 2n) / 2 = 2n^2 + 2n - 2n / 2 = 2n^2 / 2 = n^2
Ah, so basically the square (n) = square (n). That isn’t particularly insightful…
Ok, so this looked really promising, but fundamentally, if I can’t add the two registers together without trashing one of the values, I think I’m stuck again.
Conclusion

I must admit to wondering if the addressing could be extended to permit longer programs. I don’t see why not, if the PC register could be extended. One limitation might be that jumps remain localised to a 4-bit value somehow.
It is proving a real limitation not being able to add the two registers together – that is really limiting what I can work out how to do with it.
It seems that at least one other person has had similar thoughts and had started to work on an extended version – the TD4E and even the TD4E8: github.com/Nekhaevalex/TD4. It looks like a lot of the work was done in a simulation, but they did get to produce an assembler: hackaday.io/project/161708-4-b….
Other related things I’ve found include:
Crazy Small CPU (4-bit data, 8-bit address): minnie.tuhs.org/Programs/Crazy…
CHUMP ICS4U 4-bit TTL project: darcy.rsgc.on.ca/ACES/TEI4M/4B…
TPS/MyCo (16 pages of 4-bit addressable memory): github.com/GClown25/BIT4
MiniMax Enhanced TD4: github.com/denjhang/MiniMax-4-…
TD4 CPU (adds paging): hackaday.io/project/26215-td4-…
4-bit TTL Scratch Built CPU (4-bit data, 8-bit address): ttlcpu.com/articles/4-bit-ttl-…
“How to build a CPU TD4” includes a discussion of expanding the memory: xyama.sakura.ne.jp/hp/4bitCPU_…
Finally, there is this “meta list” of DIY CPU projects: ttlcpu.com/content/links.
And this seems to be a massive list of TD4 related links: cable-net.ne.jp/user/takahisa/… although, unfortunately many of the linked pages no longer exist. But there continues to be interest out there in the TD4!
I have a few ideas of my own, but if they amount to anything, well that still remains to be seen. But getting this far has been really interesting and I already feel like I’ve learned quite a lot about the lowest levels of our technology.
I have to say, “full stack developer” certainly takes on a new meaning when you get to these levels…
Kevin
#4bit #cpu #td4

GitHub - Nekhaevalex/TD4: Schematics and PCBs for extended version of TD4 CPU.
Schematics and PCBs for extended version of TD4 CPU. - Nekhaevalex/TD4
^GitHub

This entry was edited (1 month ago)

⇧