# CS 601 Homework 0

I'd like you to:

1. Design a new low-level machine architecture "you86" that can express loops, do arithmetic, read and write integers outside the machine, and read and write values from an array of up to 64 values.
• For example architectures, see UEMU (RISC) or FUNK Emu (x86-type CISC), or any real architecture like PIC, MIPS, or ARM.  Feel free to copy or steal as much as you like, as long as you cite your sources.
• Hint: a simpler more regular architecture will be easier for you to design, validate, and test.
• For example, something complicated like x86 makes it really tough to write machine code by hand.
• Relevant prereqs: CS 441/EE 443, and their prereq CS 301.
2. Write a program for your new architecture that computes prime numbers up to an input value using the Sieve of Eratosthenes.
• You should use the machine instructions for the machine you just designed above.
• Pseudocode:
```int N=read_integer();
int factors[N]={0};
for (int P=2;P<N;P++) {
if (factors[P]==0) { /* yes, it is prime */
write_integer(P);

/* Mark all its multiples as not prime */
for (int M=P+P;M<N;M+=P) factors[M]=1;
}
}
```
(Try this in NetRun now!)
• It's probably easiest if you just make the array start at address zero in memory, so array index i is at pointer value i, but you can make the memory allocation as complex as you like.
3. Write a you86 simulator in C++ that simulates your prime-finding program on your new architecture.
• You should validate that it finds primes correctly.  (This is where you pay for all your sins in the previous steps!)
• The compiled program input to the simulator should consist only of an array of machine code instructions, with no symbols or magic references (although comments are highly recommended).  It is OK to hardcode the program into the simulator as a big static array.
4. Benchmark your simulator to get the actual real CPU times in nanoseconds to simulate your program as it finds primes up to N = 4, 8, 16, 32, and 64.
• If the simulator reads or writes to the console or disk as it runs the program, expect that I/O to dominate the simulator performance.  Ideally your simulator should be able to benchmark without any I/O (e.g., read program, start timer, copy simulator, run program on copy, repeat, end timer).
• On the NetRun Skylake machine, native performance (without a simulator) is:
```N=4: 5.32 ns/call
N=8: 12.2 ns/call
N=16: 25.6 ns/call
N=32: 51 ns/call
N=64: 103 ns/call```
(Try this in NetRun now!)
5. Compute the expected big-O time performance of your simulator running your prime finder.  Does your benchmark data match your big-O time?  Why or why not?
• Relevant prereqs: CS 411