The Worst Ideas in Programming

CS 301: Assembly Language Programming Lecture, Dr. Lawlor

Everything has a Name

There are lots of times we accidentally make lists of things using names instead of numbers:

std::string faculty0="Chris";
std::string faculty1="Glenn";
std::string faculty2="Jon";

// todo: print all faculty
// todo: write the list of faculty to the disk

(Try this in NetRun now!)

We've accidentally structured the program to make it impossible to loop over the faculty, so we don't. We thus write three copies of the print code, three copies of the read code, three copies of the write code, three...wait, we just hired a new guy. We now have to change all the code that exists.

Sometimes you should have a unique special name. But often it's much easier to just be a number.

The solution is to make an array of faculty, so it's faculty[0] instead of faculty0. This means we can iterate over the list of faculty. We can add faculty and all the old code just works with the new guy.

The same situation shows up over and over again, but sometimes it's hard to solve.

Several variables: just put them in an array, or vector, or list, and you can loop over them.
Several functions: register them in a table of function pointers, and you can loop over them.
Several fields in a class: if they're all the same type, just typecast them to an array. If they're different sizes or types, you need structural recursion; iterating over the fields in a class is really easy in most languages, but needs manual support in C++ (I have done it with pup, but it needs support from every class using it).

Segmented Memory

It's 1978. Your new microprocessor has 16-bit registers, but that only lets you address 64KB of RAM, which is just barely not enough.

Instead of making the leap to 32-bit registers, you add a second 16-bit "segment register" used to determine which 64KB of RAM you want to access. You make the very odd choice to combine segment register and offset into a single pointer address like this:
address = (segment << 4) + offset

This means hardware address 0xA1234 (in the middle of the VGA screen) might get split up into segment 0xA000 and offset 0x1234, sometimes written 0xA000:1234.

Unfortunately, your microprocessor, the Intel 8086, is a huge success, and is the chip used by IBM for the IBM PC. Segment registers go on to make programmers' lives miserable all through the MS-DOS and Windows 3.1 era.

What's so bad about segmented memory? In segmented mode:

The compiler needs to support at least two kinds of pointers: "near" pointers with just an offset, and "far" (or "long") pointers that carry a segment plus an offset. (This is why you sometimes see old code with "int far *ptr=...".)
If you add 64K to the offset, it overflows and wraps around. To increment through a "far" array, you need to check for this overflow and advance the segment register by 64K/16 = 0x1000 (each step of the segment register is worth 16 bytes). In practice, this made operating on more than 64KB of data much slower than below 64KB, leading to lots and lots of implicit assumptions that "64K should be enough".
Because the segment and offset overlap, the same address has lots of different representations (0xA000:1234 and 0xA100:0234 and 0xA020:1034 are all the same address). This makes comparing pointers basically impossible, unless you first convert them to a sane "flat" format.
It only bought you 4 more bits of address space, so instead of being limited to 64KB, all this pain only increased the limit to 1MB. MS-DOS only supported 640KB of RAM, and used the rest of the address space for hardware devices.
Different kinds of memory access used different implicit segment registers--e.g., a stack access uses the stack segment register ss, while data access uses ds by default. So if you want to pass a stack value as a data pointer, you need to either use a far pointer, or override the default segment.

Segment registers still exist in the hardware, even on a modern 64-bit Skylake box, although operating systems use them for their own purposes, and won't let you use them like you could in the old days.

; Load up the GS segment register with a new value
;  (via eax, because mov gs,constant is not an instruction!)
mov eax,1
mov gs,eax

; Load a byte using the segment register
;  (on modern OS, segment is ignored)
mov al,[gs:where_am_i]

ret

section .data

where_am_i:
	db "abcdefghijklmnopqrstuvwxyz",0

(Try this in NetRun now!)

The full set of x86 segment registers:

cs: code segment. Execution happens here. Disabled in 64-bit mode (the segment base is treated as zero).
ss: stack segment. push and pop happen here. Disabled in 64-bit mode (base treated as zero).
ds: data segment. Normal loads and stores happen here (unless you override the segment). Disabled in 64-bit mode (base treated as zero).
es: extra segment, for your own uses. Disabled in 64-bit mode (base treated as zero).
fs and gs: yet more extra segments. Their base addresses still work in 64-bit mode, and the OS points one of them at thread-local memory (64-bit Linux uses fs, 64-bit Windows uses gs). On Linux, gcc's "stack protector" reads its canary value through that same register.

Generally, segmented memory is reviled as one of the worst ideas from the x86.

Writing Everything in Assembly

I have a love/hate relationship with assembly language. No other language gets you so close to the bare metal of the CPU, giving you so much power and control over everything the CPU does. But no other language gives you so many opportunities to silently get the wrong answer or crash: by miscounting bytes, using the wrong register, or operating on the wrong types.

Compared to a program written in a high-level language, a program written in assembly language is:

Much larger and more complex.
Much harder to understand and debug.
Much harder to move between machines and operating systems.

Don't do it! Like nuclear weapons, assembly is a powerful tool to have in your toolbox, but not the right solution to every problem. I personally get a lot more useful work done by reading assembly than writing it.

Other bad ideas:

Joel on Software, on rewriting everything from scratch.
Quora has a good list of bad ideas.