Dr. Lawlor's Quick and Dirty C++11 Benchmark Results

See my blog for a detailed benchmark motivation and text summary.

All times are in nanoseconds per operation, including function call overhead.  They were all measured on the exact same hardware, my i7-3632QM.  See the benchmark source code for details on each test.  This data is also available as a spreadsheet.
Benchmark g++-4.7 clang-3.3 Visual 2012 Comments
empty function 1 2.2 2.0 3.7 The first call is a little slower, before SpeedStep kicks in.
empty function 2 2.1 2.1 1.9
empty function 3 2.1 2.1 1.9
empty function 4 2.0 2.1 1.9
clock overhead 5670.0 5300.0 22.4 Syscall overhead is higher in Linux due to running in a Virtual Machine.
nullptr C++11 2.1 2.5 1.9
NULL C++03 2.0 2.3 1.9
read traits C++11 2.1 2.1 1.9
no traits C++03 2.0 2.1 1.9
with static_assert 2.1 2.1 1.9
no static_assert 2.1 2.0 1.9
bare pointer 32.1 2.1 59.7 Clang figured out the new/delete cycle, and elided it.
unique_ptr 32.5 2.0 59.7
shared_ptr 62.6 62.5 179.0
make_shared 41.0 36.6 119.0 make_shared merges the refcount allocation, so it's faster.
for int to size 8.4 9.0 7.5
for range (i v) 8.9 8.8 7.5
for iterator ++ 7.4 5.4 11.2
for_each lambda 7.7 4.8 7.5
bare call 2.1 2.1 1.9
lambda call 2.1 2.1 1.9
auto=lambda call 2.0 2.0 1.9
nested lambda 2.1 2.0 1.9
capture[=] 2.1 2.1 1.9
capture[&] 2.0 2.0 1.9
bare call 2.1 2.0 1.9
function<> call 3.9 3.8 3.7
function<> build 8.5 11.8 29.8
function<member> 5.1 4.3 3.7
srand 1370.0 1600.0 14.9 glibc's srand runs rand many times to avoid less-random startup values.
rand 9.9 9.5 14.9
create engine 2.2 2.0 1.9
access engine 7.9 7.6 7.5
create distribution<int> 2.2 2.2 1.9
access distribution<int> 23.0 33.4 29.8
create distribution<float> 2.1 2.2 1.9
access distribution <float> 10.4 93.6 11.2 Not sure what happened to clang here. SSE?
access distribution <double> 16.6 116.0 14.9
access distribution<long double> 18.3 125.0 14.9
normal distribution <float> 31.4 84.0 29.8
normal distribution <double> 51.6 108.0 29.8
auto bind normal <double> 55.5 98.9 29.8
delegated ctor C++11 89.4 77.1 n/a Visual doesn't do delegated constructors yet.
call from ctor C++03 80.3 77.9 n/a
int[3] assignment 2.0 2.0 n/a
int[3] {initial} 2.1 2.1 n/a
vector assignment 31.2 2.1 n/a
vector push_back 126.0 109.0 n/a
vector {initial} 32.5 2.1 n/a
no tuples 2.1 2.3 1.9
get<0> 2.0 2.1 1.9
tie(i,ignore) 2.1 2.1 1.9
tie(i,f) 2.1 2.1 1.9
in-class C++11 (call string) 54.7 57.2 n/a Visual doesn't do in-class member initializers yet.
old ctor C++03 (call string) 57.1 54.3 n/a
in-class C++11 (declare string) 57.6 47.1 n/a
old ctor C++03 (declare string) 58.8 52.9 n/a
in-class C++11 (call char *) 2.1 2.5 n/a
old ctor C++03 (call char *) 2.1 2.5 n/a
raw string C++11 12.2 2.1 n/a
old string C++03 11.2 2.0 n/a
atoi 1-digit 14.8 14.9 29.8
stoi 1-digit 17.7 16.8 59.7
atoi 9-digit 36.9 37.6 44.7
stoi 9-digit 37.2 37.2 59.7
stol 9-digit 39.9 54.2 59.7
array create 4.5 4.1 3.7
array one value 4.3 6.7 3.7
array lil write 0.5 0.3 1.2
array lil w/r 0.6 0.4 1.8
array lil read 0.2 0.1 0.8
array med write 0.5 0.6 1.5
array med w/r 0.6 0.5 3.1
array med read 0.1 0.2 0.7
array big write n/a n/a n/a Runs out of stack space and crashes in Visual, so I skipped these.
array big w/r n/a n/a n/a
array big read n/a n/a n/a
pointer w/new create 12.1 6.5 22.4
pointer w/new one value 39.5 39.3 119.0
pointer w/new lil write 1.5 1.4 2.4
pointer w/new lil w/r 1.7 2.0 3.6
pointer w/new lil read 0.2 0.3 0.8
pointer w/new med write 0.8 0.6 2.3
pointer w/new med w/r 0.9 1.2 3.1
pointer w/new med read 0.2 0.2 0.7
pointer w/new big write 0.7 0.7 2.4
pointer w/new big w/r 0.9 1.3 2.4
pointer w/new big read 0.2 0.2 0.5
vector.reserve create 11.5 5.7 14.9
vector.reserve one value 45.7 39.6 119.0
vector.reserve lil write 2.7 2.4 3.6
vector.reserve lil w/r 2.8 2.5 4.8
vector.reserve lil read 0.4 0.2 0.8
vector.reserve med write 2.1 1.7 3.1
vector.reserve med w/r 2.3 1.9 3.1
vector.reserve med read 0.2 0.1 0.5
vector.reserve big write 2.0 1.8 2.4
vector.reserve big w/r 2.4 2.0 2.4
vector.reserve big read 0.2 0.2 0.6
vector create 7.8 6.1 14.9
vector one value 42.7 41.8 119.0
vector lil write 7.3 7.0 19.1 Reserve makes a big difference in write performance.
vector lil w/r 7.9 7.4 19.1
vector lil read 0.4 0.3 0.8
vector med write 2.4 2.2 9.2
vector med w/r 2.7 2.3 9.1
vector med read 0.2 0.1 1.0
vector big write 4.4 3.8 9.8
vector big w/r 4.5 4.2 9.8
vector big read 0.2 0.2 0.6
deque create 166.0 155.0 119.0
deque one value 177.0 147.0 358.0
deque lil write 4.0 3.8 38.2
deque lil w/r 4.2 4.0 38.2
deque lil read 0.9 0.8 6.1
deque med write 2.9 2.3 24.4
deque med w/r 3.5 3.0 48.9
deque med read 0.8 0.8 4.9
deque big write 2.8 2.7 29.3
deque big w/r 3.3 2.9 29.3
deque big read 0.8 0.8 5.3
forward_list create 6.0 4.3 11.2
forward_list one value 35.2 34.7 89.5
forward_list lil write 31.9 31.2 76.4
forward_list lil w/r 33.2 37.6 76.4
forward_list lil read 1.8 1.8 3.1
forward_list med write 29.7 31.3 73.3
forward_list med w/r 33.7 31.5 73.3
forward_list med read 2.5 2.3 2.9
forward_list big write 29.3 28.9 68.4
forward_list big w/r 34.0 34.2 78.2
forward_list big read 2.3 2.4 3.3
list create 6.6 4.6 119.0
list one value 37.9 40.4 119.0
list lil write 42.5 26.8 76.4
list lil w/r 31.9 33.5 76.4
list lil read 1.8 1.8 3.1
list med write 29.8 32.8 73.3
list med w/r 34.1 34.6 97.8
list med read 1.9 2.4 3.9
list big write 33.0 32.1 78.2
list big w/r 33.9 41.3 78.2
list big read 2.5 2.5 3.9
map create 10.4 7.0 89.5
map one value 45.5 49.0 179.0
map lil write 58.6 77.2 153.0
map lil w/r 62.5 85.6 153.0
map lil read 5.0 5.5 6.1
map med write 106.0 162.0 195.0
map med w/r 105.0 168.0 196.0
map med read 6.9 7.6 7.8
map big write 166.0 257.0 196.0
map big w/r 174.0 267.0 196.0
map big read 10.5 10.1 8.5
unordered_map create 104.0 122.0 239.0
unordered_map one value 219.0 156.0 239.0
unordered_map lil write 84.0 82.1 153.0
unordered_map lil w/r 80.5 78.8 153.0
unordered_map lil read 2.1 2.3 3.1
unordered_map med write 101.0 106.0 195.0
unordered_map med w/r 94.5 95.2 196.0
unordered_map med read 2.9 2.9 6.8
unordered_map big write 108.0 102.0 176.0
unordered_map big w/r 111.0 110.0 186.0
unordered_map big read 3.3 3.1 11.7
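
The gap between the "vector.reserve" and plain "vector" write rows comes down to how often the vector has to grow. Here is a minimal sketch of that effect; it is not taken from the benchmark source, and the helper name count_reallocations is made up for illustration. It counts how many times capacity() changes while elements are pushed, with and without an up-front reserve().

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the benchmark source): push n ints into a
// vector and count how many times its capacity changes along the way.
// Each capacity change means a fresh allocation plus a copy of the old data.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) v.reserve(n);  // one allocation up front

    std::size_t reallocations = 0;
    std::size_t capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != capacity) {  // the vector had to grow
            ++reallocations;
            capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With reserve_first set, the count stays at zero because the storage is already large enough. Without it, the count depends on the library's growth policy, which differs between standard library implementations.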


Here's an easier-to-read summary of the container performance, using the g++-4.7 numbers from the table above.  Writes are actually bad for *any* data structure, but especially bad for list and map, which I blame on the memory allocator.  Here "lil" means 100, "med" means 10000, and "big" means 100000 elements.  All of them fit in cache.

Benchmark array pointer vector.reserve vector deque forward_list list map unordered_map
create 4.5 12.1 11.5 7.8 166.0 6.0 6.6 10.4 104.0
one value 4.3 39.5 45.7 42.7 177.0 35.2 37.9 45.5 219.0
lil write 0.5 1.5 2.7 7.3 4.0 31.9 42.5 58.6 84.0
med write 0.5 0.8 2.1 2.4 2.9 29.7 29.8 106.0 101.0
big write n/a 0.7 2.0 4.4 2.8 29.3 33.0 166.0 108.0
lil read 0.2 0.2 0.4 0.4 0.9 1.8 1.8 5.0 2.1
med read 0.1 0.2 0.2 0.2 0.8 2.5 1.9 6.9 2.9
big read n/a 0.2 0.2 0.2 0.8 2.3 2.5 10.5 3.3
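
One way to check the memory-allocator explanation for the slow list writes is to count allocator calls directly. The sketch below is an illustration under assumptions, not part of the benchmark: CountingAllocator, g_allocations, list_allocations, and reserved_vector_allocations are all made-up names. It plugs a counting allocator into std::list and into a std::vector that reserves its space first.

```cpp
#include <cstddef>
#include <list>
#include <new>
#include <vector>

// Hypothetical counter, bumped on every allocate() call below.
static std::size_t g_allocations = 0;

// Minimal C++11 allocator that forwards to operator new/delete and counts calls.
template <class T>
struct CountingAllocator {
    typedef T value_type;
    CountingAllocator() {}
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Node-based container: every push_back needs its own node.
std::size_t list_allocations(std::size_t n) {
    g_allocations = 0;
    std::list<int, CountingAllocator<int> > l;
    for (std::size_t i = 0; i < n; ++i) l.push_back(static_cast<int>(i));
    return g_allocations;
}

// Contiguous container with reserve(): the storage is requested once up front.
std::size_t reserved_vector_allocations(std::size_t n) {
    g_allocations = 0;
    std::vector<int, CountingAllocator<int> > v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) v.push_back(static_cast<int>(i));
    return g_allocations;
}
```

The list should report at least one allocation per element, while the reserved vector needs only a handful at most. Exact counts can vary by standard library, since some implementations allocate extra bookkeeping nodes.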


OLD 2012-12-30 version performance.  That version used the _ftime64_s/gettimeofday timers instead of the new C++11 chrono::high_resolution_clock.  Times are still in nanoseconds per operation.

Benchmark g++-4.7 clang-3.3 Visual 2012 Comments
nullptr C++11 2.2 2.5 2.1
NULL C++03 2.2 2.4 2.1
read traits C++11 2.1 2.0 2.1
no traits C++03 2.1 2.1 2.1
with static_assert 2.0 2.1 2.0
no static_assert 2.0 2.1 2.1
bare pointer 30.2 2.1 55.3 Clang figured out the new/delete cycle here, and elided it.
unique_ptr 32.4 2.1 61.0
shared_ptr 63.6 82.6 146.9
for int to size 0.9 0.9 0.6
for range (i v) 0.8 1.0 0.6
for iterator ++ 0.8 0.5 0.8
for_each lambda 0.7 0.5 0.8
bare call 2.1 2.0 2.0
lambda call 2.1 2.2 2.0
auto=lambda call 2.1 2.1 2.1
nested lambda 2.1 2.1 2.0
capture[=] 2.2 2.1 2.1
capture[&] 2.1 2.1 2.0
bare call 2.1 2.1 2.1
function<> call 3.5 3.8 4.1
function<> build 8.0 12.2 22.2
function<member> 4.2 4.4 4.1
srand 1434.9 1416.0 15.5 glibc's srand prespins rand, which takes a while.
rand 9.9 9.5 14.5
create engine 2.2 2.2 2.0
access engine 7.5 7.3 6.3
create distribution<int> 2.1 2.0 2.1
access distribution<int> 23.7 35.0 20.5
create distribution<float> 2.2 2.0 2.0
access distribution <float> 9.0 109.1 11.2 Not sure what happened to clang here.  Might need SSE flags.
access distribution <double> 17.4 110.8 11.9
access distribution<long double> 18.1 119.4 10.6
normal distribution <float> 30.2 78.1 32.9
normal distribution <double> 48.6 104.2 32.4
auto bind normal <double> 49.3 106.9 31.5
delegated ctor C++11 74.2 80.7 n/a Visual doesn't do delegated constructors yet.
call from ctor C++03 77.6 77.2 n/a
int[3] assignment 2.1 2.1 n/a
int[3] {initial} 2.1 2.0 n/a
vector assignment 32.5 2.0 n/a
vector push_back 116.9 105.8 n/a
vector {initial} 30.4 2.1 n/a
no tuples 2.0 2.1 2.0
get<0> 2.1 2.1 2.1
tie(i,ignore) 2.1 2.1 2.0
tie(i,f) 2.1 2.0 2.0
in-class C++11 (call string) 57.8 53.9 n/a Visual doesn't do in-class member initializers yet.
old ctor C++03 (call string) 56.2 54.0 n/a
in-class C++11 (declare string) 57.3 54.6 n/a
old ctor C++03 (declare string) 56.7 53.2 n/a
in-class C++11 (call char *) 2.1 2.0 n/a
old ctor C++03 (call char *) 2.1 2.1 n/a
raw string C++11 13.2 2.0 n/a
old string C++03 13.0 2.1 n/a
atoi 1-digit 14.5 15.2 25.8
stoi 1-digit 17.7 16.8 53.9
atoi 9-digit 39.0 35.9 41.0
stoi 9-digit 41.8 38.7 67.7
stol 9-digit 40.9 38.4 67.7
int[100] container
empty 2.1 2.1 4.0
1 to i 2.1 2.1 2.0
10 to i 0.2 0.2 0.2
100 to i 0.3 1.3 0.3
array container
empty 2.4 2.1 4.1
1 to i 2.0 2.0 2.0
10 to i 0.2 0.2 0.2
100 to i 0.4 1.3 0.6
vector container
empty 2.1 2.1 4.5
1 to back 36.3 30.9 94.4
10 to back 24.8 23.7 59.5
100 to back 7.2 6.6 13.0
1 to pre 33.7 31.1 92.5
10 to pre 4.1 4.9 11.2
100 to pre 2.7 1.7 2.7
deque container
empty 153.4 141.3 75.3
1 to back 157.8 150.5 259.4
10 to back 15.4 16.1 44.3
100 to back 3.7 2.7 32.3
1 to front 183.0 184.4 263.2
10 to front 22.8 20.8 44.3
100 to front 4.2 2.9 26.2
forward_list container
empty 2.2 2.1 4.5
1 to front 32.1 29.0 74.4 The per-element allocations are really a killer here.
10 to front 37.2 37.3 64.1
100 to front 31.5 29.9 74.5
list container
empty 2.1 2.1 66.8
1 to back 35.1 32.3 135.4
10 to back 40.0 41.8 72.5
100 to back 34.2 31.3 89.1
1 to front 33.8 31.7 141.1
10 to front 38.4 43.8 76.3
100 to front 31.9 34.2 90.3
map container
empty 5.0 4.9 70.6
write 1 42.1 45.5 152.6
write 10 46.2 49.6 96.1
write 100 61.1 78.8 122.1
write 1000 69.3 105.8 138.7
write 10000 89.8 142.8 160.9
write 10000,r1 139.0 188.9 228.1 This does one read per write.
write 10000,r10 57.5 65.6 76.3 This does ten reads per write.
unordered_map container
empty 71.5 83.0 162.1
write 1 118.2 133.7 255.6
write 10 63.1 72.2 141.9
write 100 69.4 70.1 134.3
write 1000 96.3 96.7 125.0
write 10000 102.2 103.0 156.3
write 10000,r1 112.1 114.8 259.4
write 10000,r10 21.9 22.1 126.3 On Linux, unordered_map is much faster.  On Windows, plain map is much faster (?!).
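
The "r1" and "r10" rows mix reads in with the writes. A minimal sketch of that kind of workload is below. The exact access pattern of the real benchmark isn't shown on this page, so reading back the key that was just written is an assumption, and write_then_read, TreeMap, and HashMap are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <unordered_map>

// Hypothetical aliases so the same loop can run against either container.
typedef std::map<int, int> TreeMap;
typedef std::unordered_map<int, int> HashMap;

// Hypothetical workload (not from the benchmark source): insert n keys, and
// after each insert look up the just-written key reads_per_write times.
// The checksum keeps the reads from being optimized away.
template <class Map>
long long write_then_read(std::size_t n, std::size_t reads_per_write) {
    Map m;
    long long checksum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const int key = static_cast<int>(i);
        m[key] = key;  // one write
        for (std::size_t r = 0; r < reads_per_write; ++r) {
            checksum += m.find(key)->second;  // one read
        }
    }
    return checksum;
}
```

Because the per-operation time averages over writes and reads together, adding cheap reads pulls the average down, which would explain why the r10 rows look so much faster than the r1 rows.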

See Dr. Lawlor's homepage or email me.