The Worst Ideas in Programming
CS 301: Assembly Language Programming Lecture, Dr. Lawlor
Everything has a Name
There are lots of times we accidentally make lists of things using
names instead of numbers:
std::string faculty0="Chris";
std::string faculty1="Glenn";
std::string faculty2="Jon";
// todo: print all faculty
// todo: write the list of faculty to the disk
(Try this in NetRun now!)
We've accidentally structured the program to make it impossible to
loop over the faculty, so we don't. We thus write three copies
of the print code, three copies of the read code, three copies of
the write code, three...wait, we just hired a new guy. We now
have to change all the code that exists.
Sometimes a thing really does deserve a unique special name. But often
it's much easier if it's just a number.
The solution is to make an array of faculty, so it's faculty[0]
instead of faculty0. This means we can iterate over the list
of faculty. We can add faculty and all the old code just works
with the new guy.
The same situation shows up over and over again, but sometimes it's
hard to solve.
- Several variables: just put them in an array, or vector, or
list, and you can loop over them.
- Several functions: register them in a table of function
pointers, and you can loop over them.
- Several fields in a class: if they're all the same type, just
typecast them to an array. If they're different sizes or
types, you need structural recursion; iterating over the fields
in a class is really easy in most languages, but needs manual
support in C++ (I have done it with pup, but it needs support
from every class using it).
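The "several functions" case can be sketched like this. The three functions and the named_function struct are hypothetical examples; any set of functions sharing one signature would work the same way:

```cpp
#include <cstddef>
#include <string>

// Hypothetical example functions; any set with the same signature would do.
int add_one(int x) { return x + 1; }
int twice(int x) { return 2 * x; }
int square(int x) { return x * x; }

// One table entry per function: a printable name plus a function pointer.
struct named_function {
    const char *name;
    int (*run)(int);
};

const named_function function_table[] = {
    {"add_one", add_one},
    {"twice", twice},
    {"square", square},
};
const std::size_t function_count =
    sizeof(function_table) / sizeof(function_table[0]);

// Loop over the table: call every registered function on the same input.
std::string run_all(int input) {
    std::string report;
    for (std::size_t i = 0; i < function_count; i++) {
        report += function_table[i].name;
        report += "=" + std::to_string(function_table[i].run(input)) + "\n";
    }
    return report;
}
```

Adding a fourth function means adding one line to function_table; run_all and anything else that loops over the table stays unchanged.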
Segmented Memory
It's 1978. Your new microprocessor has 16-bit registers, but that
only lets you address 64KB of RAM, which is just barely not
enough.
Instead of making the leap to 32-bit registers, you add a second
16-bit "segment register" used to determine which 64KB of RAM you
want to access. You make the very odd choice to combine segment
register and offset into a single pointer address like this:
address = (segment << 4) + offset
This means hardware address 0xA1234 (in the middle of the VGA screen)
might get split up into segment 0xA000 and offset 0x1234, sometimes
written 0xA000:1234.
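The address formula can be checked in a few lines of C++. This is only a sketch: the name linear_address is made up here, and it does not model the wraparound a real 8086 does when the sum goes past 1MB.

```cpp
#include <cstdint>

// Real-mode address calculation: the 16-bit segment is shifted left
// 4 bits and added to the 16-bit offset, giving a 20-bit address.
// (The function name is an assumption; the formula is the one in the text.)
constexpr std::uint32_t linear_address(std::uint16_t segment,
                                       std::uint16_t offset) {
    return (static_cast<std::uint32_t>(segment) << 4) + offset;
}

// The example from the text: 0xA000:1234 is hardware address 0xA1234.
static_assert(linear_address(0xA000, 0x1234) == 0xA1234,
              "segment 0xA000, offset 0x1234");
```

Because the segment is only shifted by 4 bits, the largest segment start is 0xFFFF0, which is where the 1MB limit comes from.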
Unfortunately, your microprocessor, the Intel 8086, is a huge
success, and is the chip used by IBM for the IBM PC. Segment
registers go on to make programmers' lives miserable all through
the MS-DOS and Windows 3.1 era.
What's so bad about segmented memory? In segmented mode:
- The compiler needs to support at least two kinds of pointers:
"near" pointers with just an offset, and "far" (or "long")
pointers that have a segment plus an offset. (This is why you
sometimes see old code with "int far *ptr=...".)
- If you add 64K to the offset, it overflows and wraps
around. To increment through a "far" array, you need to
check for this overflow and move the segment register (by 64K/16
= 0x1000, since each segment step is 16 bytes). In practice, this
made operating on more than 64KB of data much slower than below
64KB, leading to lots and lots of implicit assumptions that "64K
should be enough".
- Because the segment and offset overlap, the same address has
lots of different representations (0xA000:1234 and 0xA100:0234
and 0xA020:1034 are all the same address). This makes
comparing pointers basically impossible, unless you first
convert them to a sane "flat" format.
- It only bought you 4 more bits of address space, so instead of
being limited to 64KB, all this pain only increased the limit to
1MB. MS-DOS only supported 640KB of RAM, and used the rest
of the address space for hardware devices.
- Different kinds of memory access used different implicit
segment registers--e.g., a stack access uses the stack segment
register ss, while data access uses ds by default. So if
you want to pass a stack value as a data pointer, you need to
either use a far pointer, or override the default segment.
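Two of the problems above, aliasing and offset wraparound, can be sketched in portable C++. The far_ptr struct and the helpers flat, same_address, and advance are hypothetical stand-ins, not what any real DOS compiler generated:

```cpp
#include <cstdint>

// Hypothetical stand-in for a real-mode "far" pointer: segment plus offset.
struct far_ptr {
    std::uint16_t segment;
    std::uint16_t offset;
};

// Flat 20-bit address: (segment << 4) + offset.
constexpr std::uint32_t flat(far_ptr p) {
    return (static_cast<std::uint32_t>(p.segment) << 4) + p.offset;
}

// Two far pointers are the same address only if their flat forms match;
// comparing segment and offset field by field gives the wrong answer.
constexpr bool same_address(far_ptr a, far_ptr b) {
    return flat(a) == flat(b);
}

// Advance by `bytes`, carrying offset overflow into the segment.
// Each 64KB wrap of the offset moves the segment by 64K/16 = 0x1000.
constexpr far_ptr advance(far_ptr p, std::uint32_t bytes) {
    return far_ptr{
        static_cast<std::uint16_t>(p.segment +
                                   ((p.offset + bytes) >> 16) * 0x1000),
        static_cast<std::uint16_t>((p.offset + bytes) & 0xFFFF)};
}

// The three spellings from the list above all name one address.
static_assert(same_address(far_ptr{0xA000, 0x1234}, far_ptr{0xA100, 0x0234}),
              "0xA000:1234 == 0xA100:0234");
static_assert(same_address(far_ptr{0xA000, 0x1234}, far_ptr{0xA020, 0x1034}),
              "0xA000:1234 == 0xA020:1034");
```

Every pointer comparison and every step past a 64KB boundary pays for this extra arithmetic, which is the cost described above.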
Segment registers still exist in the hardware, even on a modern
64-bit Skylake box, although operating systems use them for their
own purposes, and won't let you use them like you could in the old
days.
; Load up the GS segment register with an offset
; (via eax, because mov gs,constant is not an instruction!)
mov eax,1
mov gs,eax
; Load a byte using the segment register
; (on modern OS, segment is ignored)
mov al,[gs:where_am_i]
ret
section .data
where_am_i:
db "abcdefghijklmnopqrstuvwxyz",0
(Try this in NetRun now!)
The full set of x86 segment registers:
- cs: code segment. Execution happens here. Disabled
in 64-bit mode.
- ss: stack segment. push and pop happen here.
Disabled in 64-bit mode.
- ds: data segment. Normal loads and stores happen
here (unless you override the segment). Disabled in 64-bit
mode.
- es: extra segment, for your own uses. Disabled in
64-bit mode.
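- fs and gs: yet more extra segments. They still
work in 64-bit mode, where the OS uses them to find per-thread
data: on 64-bit Linux, fs points at thread-local memory, which
is also where gcc keeps its "stack protector" value, and 64-bit
Windows uses gs for the same kind of per-thread data.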
Generally, segmented memory is reviled as one of the worst ideas
from the x86.
Writing Everything in Assembly
I have a love/hate relationship with assembly language. No
other language gets you so close to the bare metal of the CPU,
giving you so much power and control over everything the CPU
does. But no other language gives you so many
opportunities to silently get the wrong answer or crash: by
miscounting bytes, using the wrong register, or operating on the
wrong types.
Compared to a program written in a high-level language, a program
written in assembly language is:
- Much larger and more complex.
- Much harder to understand and debug.
- Much harder to move between machines and operating systems.
Don't do it! Like nuclear weapons, assembly is a powerful tool
to have in your toolbox, but not the right solution to every
problem. I personally get a lot more useful work done by
reading assembly than writing it.
Other bad ideas: