Dr. Lawlor's Quick and Dirty C++11 Benchmark Results

See my blog for the detailed motivation behind these benchmarks and a text summary.

All times are in nanoseconds per operation, including function call overhead. All runs were made on the same hardware, my i7-3632QM. See the benchmark source code for details on each test. This data is also available as a spreadsheet.
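
To make "nanoseconds per operation, including function call overhead" concrete, here is a minimal sketch of how such a figure can be taken with chrono::high_resolution_clock. It is not the actual benchmark source; the names ns_per_op, sink, and bump_sink are made up for illustration.

```cpp
#include <chrono>

// Hypothetical helper (not from the benchmark source): run `op` `iterations`
// times and return the average cost in nanoseconds per call. Loop and call
// overhead are deliberately included in the result.
template <typename Op>
double ns_per_op(Op op, long iterations) {
    typedef std::chrono::high_resolution_clock clock_type;
    clock_type::time_point start = clock_type::now();
    for (long i = 0; i < iterations; ++i) op();
    clock_type::time_point stop = clock_type::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}

// The sample operation writes to a volatile so the compiler cannot drop it.
volatile int sink = 0;
void bump_sink() { sink = sink + 1; }
```

Calling ns_per_op(bump_sink, 1000000) gives a number in the same spirit as the "empty function" rows. Averaging over many iterations matters because a single clock read can cost far more than the operation being timed, as the "clock overhead" row shows.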

Benchmark	g++-4.7	clang-3.3	Visual 2012	Comments
empty function 1	2.2	2.0	3.7	The first call is a little slower, before SpeedStep kicks in.
empty function 2	2.1	2.1	1.9
empty function 3	2.1	2.1	1.9
empty function 4	2.0	2.1	1.9
clock overhead	5670.0	5300.0	22.4	Syscall overhead is higher on Linux because it runs in a virtual machine.
nullptr C++11	2.1	2.5	1.9
NULL C++03	2.0	2.3	1.9
read traits C++11	2.1	2.1	1.9
no traits C++03	2.0	2.1	1.9
with static_assert	2.1	2.1	1.9
no static_assert	2.1	2.0	1.9
bare pointer	32.1	2.1	59.7	Clang figured out the new/delete cycle, and elided it.
unique_ptr	32.5	2.0	59.7
shared_ptr	62.6	62.5	179.0
make_shared	41.0	36.6	119.0	make_shared merges the refcount allocation, so it's faster.
for int to size	8.4	9.0	7.5
for range (i v)	8.9	8.8	7.5
for iterator ++	7.4	5.4	11.2
for_each lambda	7.7	4.8	7.5
bare call	2.1	2.1	1.9
lambda call	2.1	2.1	1.9
auto=lambda call	2.0	2.0	1.9
nested lambda	2.1	2.0	1.9
capture[=]	2.1	2.1	1.9
capture[&]	2.0	2.0	1.9
bare call	2.1	2.0	1.9
function<> call	3.9	3.8	3.7
function<> build	8.5	11.8	29.8
function<member>	5.1	4.3	3.7
srand	1370.0	1600.0	14.9	glibc's srand runs rand many times to avoid less-random startup values.
rand	9.9	9.5	14.9
create engine	2.2	2.0	1.9
access engine	7.9	7.6	7.5
create distribution<int>	2.2	2.2	1.9
access distribution<int>	23.0	33.4	29.8
create distribution<float>	2.1	2.2	1.9
access distribution <float>	10.4	93.6	11.2	Not sure what happened to clang here. SSE?
access distribution <double>	16.6	116.0	14.9
access distribution<long double>	18.3	125.0	14.9
normal distribution <float>	31.4	84.0	29.8
normal distribution <double>	51.6	108.0	29.8
auto bind normal <double>	55.5	98.9	29.8
delegated ctor C++11	89.4	77.1		Visual doesn't do delegated constructors yet.
call from ctor C++03	80.3	77.9
int[3] assignment	2.0	2.0
int[3] {initial}	2.1	2.1
vector assignment	31.2	2.1
vector push_back	126.0	109.0
vector {initial}	32.5	2.1
no tuples	2.1	2.3	1.9
get<0>	2.0	2.1	1.9
tie(i,ignore)	2.1	2.1	1.9
tie(i,f)	2.1	2.1	1.9
in-class C++11 (call string)	54.7	57.2		Visual doesn't do in-class member initializers yet.
old ctor C++03 (call string)	57.1	54.3
in-class C++11 (declare string)	57.6	47.1
old ctor C++03 (declare string)	58.8	52.9
in-class C++11 (call char *)	2.1	2.5
old ctor C++03 (call char *)	2.1	2.5
raw string C++11	12.2	2.1
old string C++03	11.2	2.0
atoi 1-digit	14.8	14.9	29.8
stoi 1-digit	17.7	16.8	59.7
atoi 9-digit	36.9	37.6	44.7
stoi 9-digit	37.2	37.2	59.7
stol 9-digit	39.9	54.2	59.7
array create	4.5	4.1	3.7
array one value	4.3	6.7	3.7
array lil write	0.5	0.3	1.2
array lil w/r	0.6	0.4	1.8
array lil read	0.2	0.1	0.8
array med write	0.5	0.6	1.5
array med w/r	0.6	0.5	3.1
array med read	0.1	0.2	0.7
array big write				Runs out of stack space and crashes in Visual, so I skipped these.
array big w/r
array big read
pointer w/new create	12.1	6.5	22.4
pointer w/new one value	39.5	39.3	119.0
pointer w/new lil write	1.5	1.4	2.4
pointer w/new lil w/r	1.7	2.0	3.6
pointer w/new lil read	0.2	0.3	0.8
pointer w/new med write	0.8	0.6	2.3
pointer w/new med w/r	0.9	1.2	3.1
pointer w/new med read	0.2	0.2	0.7
pointer w/new big write	0.7	0.7	2.4
pointer w/new big w/r	0.9	1.3	2.4
pointer w/new big read	0.2	0.2	0.5
vector.reserve create	11.5	5.7	14.9
vector.reserve one value	45.7	39.6	119.0
vector.reserve lil write	2.7	2.4	3.6
vector.reserve lil w/r	2.8	2.5	4.8
vector.reserve lil read	0.4	0.2	0.8
vector.reserve med write	2.1	1.7	3.1
vector.reserve med w/r	2.3	1.9	3.1
vector.reserve med read	0.2	0.1	0.5
vector.reserve big write	2.0	1.8	2.4
vector.reserve big w/r	2.4	2.0	2.4
vector.reserve big read	0.2	0.2	0.6
vector create	7.8	6.1	14.9
vector one value	42.7	41.8	119.0
vector lil write	7.3	7.0	19.1	Reserve makes a big difference in write performance.
vector lil w/r	7.9	7.4	19.1
vector lil read	0.4	0.3	0.8
vector med write	2.4	2.2	9.2
vector med w/r	2.7	2.3	9.1
vector med read	0.2	0.1	1.0
vector big write	4.4	3.8	9.8
vector big w/r	4.5	4.2	9.8
vector big read	0.2	0.2	0.6
deque create	166.0	155.0	119.0
deque one value	177.0	147.0	358.0
deque lil write	4.0	3.8	38.2
deque lil w/r	4.2	4.0	38.2
deque lil read	0.9	0.8	6.1
deque med write	2.9	2.3	24.4
deque med w/r	3.5	3.0	48.9
deque med read	0.8	0.8	4.9
deque big write	2.8	2.7	29.3
deque big w/r	3.3	2.9	29.3
deque big read	0.8	0.8	5.3
forward_list create	6.0	4.3	11.2
forward_list one value	35.2	34.7	89.5
forward_list lil write	31.9	31.2	76.4
forward_list lil w/r	33.2	37.6	76.4
forward_list lil read	1.8	1.8	3.1
forward_list med write	29.7	31.3	73.3
forward_list med w/r	33.7	31.5	73.3
forward_list med read	2.5	2.3	2.9
forward_list big write	29.3	28.9	68.4
forward_list big w/r	34.0	34.2	78.2
forward_list big read	2.3	2.4	3.3
list create	6.6	4.6	119.0
list one value	37.9	40.4	119.0
list lil write	42.5	26.8	76.4
list lil w/r	31.9	33.5	76.4
list lil read	1.8	1.8	3.1
list med write	29.8	32.8	73.3
list med w/r	34.1	34.6	97.8
list med read	1.9	2.4	3.9
list big write	33.0	32.1	78.2
list big w/r	33.9	41.3	78.2
list big read	2.5	2.5	3.9
map create	10.4	7.0	89.5
map one value	45.5	49.0	179.0
map lil write	58.6	77.2	153.0
map lil w/r	62.5	85.6	153.0
map lil read	5.0	5.5	6.1
map med write	106.0	162.0	195.0
map med w/r	105.0	168.0	196.0
map med read	6.9	7.6	7.8
map big write	166.0	257.0	196.0
map big w/r	174.0	267.0	196.0
map big read	10.5	10.1	8.5
unordered_map create	104.0	122.0	239.0
unordered_map one value	219.0	156.0	239.0
unordered_map lil write	84.0	82.1	153.0
unordered_map lil w/r	80.5	78.8	153.0
unordered_map lil read	2.1	2.3	3.1
unordered_map med write	101.0	106.0	195.0
unordered_map med w/r	94.5	95.2	196.0
unordered_map med read	2.9	2.9	6.8
unordered_map big write	108.0	102.0	176.0
unordered_map big w/r	111.0	110.0	186.0
unordered_map big read	3.3	3.1	11.7
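
The make_shared comment in the table above can be checked by counting allocations. The sketch below is only an illustration, not part of the benchmark: it uses std::allocate_shared, the allocator-aware sibling of make_shared, together with a made-up CountingAlloc so that every allocation is visible. All names here (CountingAlloc, CountingDeleter, g_allocations, allocations_separate, allocations_merged) are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Counts every allocation that goes through CountingAlloc (illustrative only).
static int g_allocations = 0;

template <typename T>
struct CountingAlloc {
    typedef T value_type;
    CountingAlloc() {}
    template <typename U>
    CountingAlloc(const CountingAlloc<U>&) {}
    T* allocate(std::size_t n) {
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) { return false; }

// Releases an int that was obtained from CountingAlloc<int>.
struct CountingDeleter {
    void operator()(int* p) const { CountingAlloc<int>().deallocate(p, 1); }
};

// Object first, then a shared_ptr wrapped around it: the reference-count
// block needs its own, second allocation.
int allocations_separate() {
    g_allocations = 0;
    CountingAlloc<int> alloc;
    int* raw = alloc.allocate(1);            // stands in for `new int(5)`
    ::new (static_cast<void*>(raw)) int(5);
    std::shared_ptr<int> p(raw, CountingDeleter(), alloc);
    return g_allocations;
}

// allocate_shared, like make_shared, places object and refcount in one block.
int allocations_merged() {
    g_allocations = 0;
    std::shared_ptr<int> p = std::allocate_shared<int>(CountingAlloc<int>(), 5);
    return g_allocations;
}
```

On the common standard library implementations, allocations_separate() reports two allocations and allocations_merged() reports one, which fits the gap between the shared_ptr and make_shared rows.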

Here's an easier-to-read summary of the container performance, using the g++-4.7 numbers from the table above. Writes are slower than reads for *any* data structure, but especially bad for list and map, which I blame on the memory allocator. Here "lil" means 100, "med" means 10000, and "big" means 100000 elements. All of them fit in cache.

	array	pointer	vector w/reserve	vector	deque	forward_list	list	map	unordered_map
create	4.5	12.1	11.5	7.8	166.0	6.0	6.6	10.4	104.0
one value	4.3	39.5	45.7	42.7	177.0	35.2	37.9	45.5	219.0

lil write	0.5	1.5	2.7	7.3	4.0	31.9	42.5	58.6	84.0
med write	0.5	0.8	2.1	2.4	2.9	29.7	29.8	106.0	101.0
big write	n/a	0.7	2.0	4.4	2.8	29.3	33.0	166.0	108.0

lil read	0.2	0.2	0.4	0.4	0.9	1.8	1.8	5.0	2.1
med read	0.1	0.2	0.2	0.2	0.8	2.5	1.9	6.9	2.9
big read	n/a	0.2	0.2	0.2	0.8	2.3	2.5	10.5	3.3
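
The gap between the "vector w/reserve" and "vector" write columns comes from the vector regrowing its storage as elements are appended. A small sketch, assuming a plain push_back loop (the helper name count_regrowths is made up and this is not the benchmark code), counts how often the capacity changes:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: append n ints and return how many times the vector's
// capacity changed along the way. Each change means a reallocation and copy.
std::size_t count_regrowths(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) v.reserve(n);   // one up-front allocation
    std::size_t regrowths = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++regrowths;
            last_capacity = v.capacity();
        }
    }
    return regrowths;
}
```

With reserve_first set, the count is zero for any n. Without it, the count depends on the library's growth factor, so the exact number differs between compilers.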

Performance of the OLD 2012-12-30 version. That version used the _ftime64_s/gettimeofday timers instead of the new C++11 chrono::high_resolution_clock. Times are still in nanoseconds per operation.

Benchmark	g++-4.7	clang-3.3	Visual 2012	Comments
nullptr C++11	2.2	2.5	2.1
NULL C++03	2.2	2.4	2.1
read traits C++11	2.1	2.0	2.1
no traits C++03	2.1	2.1	2.1
with static_assert	2.0	2.1	2.0
no static_assert	2.0	2.1	2.1
bare pointer	30.2	2.1	55.3	Clang figured out the new/delete cycle here, and elided it.
unique_ptr	32.4	2.1	61.0
shared_ptr	63.6	82.6	146.9
for int to size	0.9	0.9	0.6
for range (i v)	0.8	1.0	0.6
for iterator ++	0.8	0.5	0.8
for_each lambda	0.7	0.5	0.8
bare call	2.1	2.0	2.0
lambda call	2.1	2.2	2.0
auto=lambda call	2.1	2.1	2.1
nested lambda	2.1	2.1	2.0
capture[=]	2.2	2.1	2.1
capture[&]	2.1	2.1	2.0
bare call	2.1	2.1	2.1
function<> call	3.5	3.8	4.1
function<> build	8.0	12.2	22.2
function<member>	4.2	4.4	4.1
srand	1434.9	1416.0	15.5	glibc's srand prespins rand, which takes a while.
rand	9.9	9.5	14.5
create engine	2.2	2.2	2.0
access engine	7.5	7.3	6.3
create distribution<int>	2.1	2.0	2.1
access distribution<int>	23.7	35.0	20.5
create distribution<float>	2.2	2.0	2.0
access distribution <float>	9.0	109.1	11.2	Not sure what happened to clang here. Might need SSE flags.
access distribution <double>	17.4	110.8	11.9
access distribution<long double>	18.1	119.4	10.6
normal distribution <float>	30.2	78.1	32.9
normal distribution <double>	48.6	104.2	32.4
auto bind normal <double>	49.3	106.9	31.5
delegated ctor C++11	74.2	80.7		Visual doesn't do delegated constructors yet.
call from ctor C++03	77.6	77.2
int[3] assignment	2.1	2.1
int[3] {initial}	2.1	2.0
vector assignment	32.5	2.0
vector push_back	116.9	105.8
vector {initial}	30.4	2.1
no tuples	2.0	2.1	2.0
get<0>	2.1	2.1	2.1
tie(i,ignore)	2.1	2.1	2.0
tie(i,f)	2.1	2.0	2.0
in-class C++11 (call string)	57.8	53.9		Visual doesn't do in-class member initializers yet.
old ctor C++03 (call string)	56.2	54.0
in-class C++11 (declare string)	57.3	54.6
old ctor C++03 (declare string)	56.7	53.2
in-class C++11 (call char *)	2.1	2.0
old ctor C++03 (call char *)	2.1	2.1
raw string C++11	13.2	2.0
old string C++03	13.0	2.1
atoi 1-digit	14.5	15.2	25.8
stoi 1-digit	17.7	16.8	53.9
atoi 9-digit	39.0	35.9	41.0
stoi 9-digit	41.8	38.7	67.7
stol 9-digit	40.9	38.4	67.7
int[100] container
empty	2.1	2.1	4.0
1 to i	2.1	2.1	2.0
10 to i	0.2	0.2	0.2
100 to i	0.3	1.3	0.3
array container
empty	2.4	2.1	4.1
1 to i	2.0	2.0	2.0
10 to i	0.2	0.2	0.2
100 to i	0.4	1.3	0.6
vector container
empty	2.1	2.1	4.5
1 to back	36.3	30.9	94.4
10 to back	24.8	23.7	59.5
100 to back	7.2	6.6	13.0
1 to pre	33.7	31.1	92.5
10 to pre	4.1	4.9	11.2
100 to pre	2.7	1.7	2.7
deque container
empty	153.4	141.3	75.3
1 to back	157.8	150.5	259.4
10 to back	15.4	16.1	44.3
100 to back	3.7	2.7	32.3
1 to front	183.0	184.4	263.2
10 to front	22.8	20.8	44.3
100 to front	4.2	2.9	26.2
forward_list container
empty	2.2	2.1	4.5
1 to front	32.1	29.0	74.4	The per-element allocations are really a killer here.
10 to front	37.2	37.3	64.1
100 to front	31.5	29.9	74.5
list container
empty	2.1	2.1	66.8
1 to back	35.1	32.3	135.4
10 to back	40.0	41.8	72.5
100 to back	34.2	31.3	89.1
1 to front	33.8	31.7	141.1
10 to front	38.4	43.8	76.3
100 to front	31.9	34.2	90.3
map container
empty	5.0	4.9	70.6
write 1	42.1	45.5	152.6
write 10	46.2	49.6	96.1
write 100	61.1	78.8	122.1
write 1000	69.3	105.8	138.7
write 10000	89.8	142.8	160.9
write 10000,r1	139.0	188.9	228.1	This does one read per write.
write 10000,r10	57.5	65.6	76.3	This does ten reads per write.
unordered_map container
empty	71.5	83.0	162.1
write 1	118.2	133.7	255.6
write 10	63.1	72.2	141.9
write 100	69.4	70.1	134.3
write 1000	96.3	96.7	125.0
write 10000	102.2	103.0	156.3
write 10000,r1	112.1	114.8	259.4
write 10000,r10	21.9	22.1	126.3	On Linux, unordered_map is much faster. On Windows, plain map is much faster (?!).
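
For readers who want to try the "write 10000,r1" and "write 10000,r10" rows themselves, here is one guess at what such a workload could look like. The actual key pattern used by the benchmark is not given here, so the loop below is an assumption, and the names write_with_reads, TreeMap, and HashMap are made up.

```cpp
#include <map>
#include <unordered_map>

typedef std::map<int, int> TreeMap;            // "map" rows
typedef std::unordered_map<int, int> HashMap;  // "unordered_map" rows

// Hypothetical workload: insert keys 0..n-1 and do `reads_per_write` lookups
// after each insert. Returns the sum of the values read, so the work has an
// observable result.
template <typename Map>
long write_with_reads(int n, int reads_per_write) {
    Map m;
    long sum = 0;
    for (int i = 0; i < n; ++i) {
        m[i] = i;
        for (int r = 0; r < reads_per_write; ++r) {
            // (i + r) % (i + 1) always lands on a key already inserted (0..i).
            typename Map::const_iterator it = m.find((i + r) % (i + 1));
            if (it != m.end()) sum += it->second;
        }
    }
    return sum;
}
```

Both container types run the same code, so timing write_with_reads<TreeMap>(10000, 10) against write_with_reads<HashMap>(10000, 10) with chrono::high_resolution_clock gives a like-for-like comparison. Matching return values confirm that both performed the same lookups.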

See Dr. Lawlor's homepage or email me.