Very Long (not in C++!) |
||||||
long long |
2 |
|||||
int |
2 |
4 |
||||
short |
2 |
4 |
8 |
|||
Bytes -> (char) |
2 |
4 |
8 |
16 |
||
Hex Digits -> |
2 |
4 |
8 |
16 |
32 |
|
Bits -> |
4 |
8 |
16 |
32 |
64 |
128 |
2 values |
16 values |
256 |
64KB (segments) |
4GB |
16 EB |
Big: 2.5 x 1038 |
<< and >> |
Hex constants (0xf00d) |
"sizeof", pointers, RAM & disk |
MS-DOS "word" |
Normal code; IPv4 |
New 64-bit code |
IPv6 |
enum {n=1000*1000};This takes about 2.3 million nanoseconds to loop over one million integers, and so takes 2.3 ns/operation.
int arr[n];
int foo(void) {
for (int i=0;i<n;i++) {
arr[i]++;
}
return 0;
}
enum {n=1000*1000};This is now 1.3 million nanoseconds for one million bytes, so 1.3 ns/operation. Of course, a char can't hold numbers as big as an int, but not every problem needs huge numbers, and the performance increase is substantial.
char arr[n];
int foo(void) {
for (int i=0;i<n;i++) {
arr[i]++;
}
return 0;
}
enum {n=1000*1000};Because there are fewer compare and jump instructions, we're now down to 1.0ns/operation. You can unroll more times, for more speed:
char arr[n];
int foo(void) {
for (int i=0;i<n;i+=2) {
arr[i]++;
arr[i+1]++;
}
return 0;
}
enum {n=1000*1000};Unrolling four times gets us to 0.9ns/operation. You can unroll more, but it's tough to get much past 0.9ns/operation with pure unrolling--you quickly hit the point of "diminishing returns". Most of the code is now pure memory accesses:
char arr[n];
int foo(void) {
for (int i=0;i<n;i+=4) {
arr[i]++;
arr[i+1]++;
arr[i+2]++;
arr[i+3]++;
}
return 0;
}
c9: 80 80 00 00 00 00 01 add BYTE PTR [eax],0x1We can improve this by noticing that we're just adding one to each of four bytes. You can do this efficiently by "packing" all four bytes into one integer, and then adding 0x01010101 (which consists of four bytes, each with value one) to the integer version. This is faster because a 4-byte integer addition takes about as long as one 1-byte char addition. As usual, overflow would be a problem.
d0: 80 80 01 00 00 00 01 add BYTE PTR [eax+0x1],0x1
d7: 80 80 02 00 00 00 01 add BYTE PTR [eax+0x2],0x1
de: 80 80 03 00 00 00 01 add BYTE PTR [eax+0x3],0x1
e5: 83 c0 04 add eax,0x4
e8: 3d 3f 42 0f 00 cmp eax,0xf423f
ed: 7e da jle c9 <foo+0x5>
enum {n=1000*1000};This adds one to one million integers using only 1/4 million integer operations, and it only takes half a million nanoseconds, for a total time of 0.5ns/operation!
char arr[n];
int foo(void) {
for (int i=0;i<n;i+=4) {
*(int *)&arr[i] += 0x01010101;
}
return 0;
}
enum {n=1000*1000}; /* number of items (chars) */
enum {ni=n/sizeof(int) }; /* number of integers */
int arr[ni];
int foo(void) {
for (int i=0;i<ni;i++) {
arr[i] += 0x01010101;
}
return 0;
}
enum {n=1000*1000}; /* number of items (hex digits) */This loops over 1 million hex digits in just 1/4 million nanoseconds, or just 0.25 ns/operation!
enum {ni=n/(2*sizeof(int)) }; /* number of integers */
int arr[ni];
int foo(void) {
for (int i=0;i<ni;i++) {
arr[i] += 0x11111111;
}
return 0;
}