| Very Long (not in C++!) |
||||||
| long long |
2 |
|||||
| int |
2 |
4 |
||||
| short |
2 |
4 |
8 |
|||
| Bytes -> (char) |
2 |
4 |
8 |
16 |
||
| Hex Digits -> |
2 |
4 |
8 |
16 |
32 |
|
| Bits -> |
4 |
8 |
16 |
32 |
64 |
128 |
| 2 values |
16 values |
256 |
64KB (segments) |
4GB |
16 EB |
Big: 2.5 x 1038 |
| << and >> |
Hex constants (0xf00d) |
"sizeof", pointers, RAM & disk |
MS-DOS "word" |
Normal code; IPv4 |
New 64-bit code |
IPv6 |
enum {n=1000*1000};
int arr[n];
int foo(void) {
for (int i=0;i<n;i++) {
arr[i]++;
}
return 0;
}
This takes about 2.3 million nanoseconds to loop over one million integers, and so takes 2.3 ns/operation. enum {n=1000*1000};
char arr[n];
int foo(void) {
for (int i=0;i<n;i++) {
arr[i]++;
}
return 0;
}
This is now 1.3 million nanoseconds for one million bytes, so 1.3
ns/operation. Of course, a char can't hold numbers as big as an
int, but not every problem needs huge numbers, and the performance
increase is substantial.enum {n=1000*1000};
char arr[n];
int foo(void) {
for (int i=0;i<n;i+=2) {
arr[i]++;
arr[i+1]++;
}
return 0;
}
Because there are fewer compare and jump instructions, we're now down
to 1.0ns/operation. You can unroll more times, for more speed:enum {n=1000*1000};
char arr[n];
int foo(void) {
for (int i=0;i<n;i+=4) {
arr[i]++;
arr[i+1]++;
arr[i+2]++;
arr[i+3]++;
}
return 0;
}
Unrolling four times gets us to 0.9ns/operation. You can unroll
more, but it's tough to get much past 0.9ns/operation with pure
unrolling--you quickly hit the point of "diminishing returns".
Most of the code is now pure memory accesses:c9: 80 80 00 00 00 00 01 add BYTE PTR [eax],0x1We can improve this by noticing that we're just adding one to each of four bytes. You can do this efficiently by "packing" all four bytes into one integer, and then adding 0x01010101 (which consists of four bytes, each with value one) to the integer version. This is faster because a 4-byte integer addition takes about as long as one 1-byte char addition. As usual, overflow would be a problem.
d0: 80 80 01 00 00 00 01 add BYTE PTR [eax+0x1],0x1
d7: 80 80 02 00 00 00 01 add BYTE PTR [eax+0x2],0x1
de: 80 80 03 00 00 00 01 add BYTE PTR [eax+0x3],0x1
e5: 83 c0 04 add eax,0x4
e8: 3d 3f 42 0f 00 cmp eax,0xf423f
ed: 7e da jle c9 <foo+0x5>
enum {n=1000*1000};
char arr[n];
int foo(void) {
for (int i=0;i<n;i+=4) {
*(int *)&arr[i] += 0x01010101;
}
return 0;
}
This adds one to one million integers using only 1/4 million integer
operations, and it only takes half a million nanoseconds, for a total
time of 0.5ns/operation!enum {n=1000*1000}; /* number of items (chars) */
enum {ni=n/sizeof(int) }; /* number of integers */
int arr[ni];
int foo(void) {
for (int i=0;i<ni;i++) {
arr[i] += 0x01010101;
}
return 0;
}
enum {n=1000*1000}; /* number of items (hex digits) */
enum {ni=n/(2*sizeof(int)) }; /* number of integers */
int arr[ni];
int foo(void) {
for (int i=0;i<ni;i++) {
arr[i] += 0x11111111;
}
return 0;
}
This loops over 1 million hex digits in just 1/4 million nanoseconds, or just 0.25 ns/operation!