The future will be like the past. Hardware designers use locality to predict the future, which lets the hardware start work early so that it finishes sooner.
    mov rax,0
    mov rcx,0 ; loop counter
start:
    cmp rcx,1500
    jle skip
    add rax,rcx
skip:
    add rcx,1
    cmp rcx,1000 ; loop runs 1000 times
    jle start
    ret
This loop is a little slower, 0.63 ns/iteration, because the branch predictor hits a hiccup when the branch switches from always taken to never taken partway through the loop (once rcx passes 500):
    mov rax,0
    mov rcx,0 ; loop counter
start:
    cmp rcx,500
    jle skip
    add rax,rcx
skip:
    add rcx,1
    cmp rcx,1000 ; loop runs 1000 times
    jle start
    ret
This loop runs at *half* the speed of the other two, 1.2 ns/iteration, because
its branch switches back and forth between taken and not taken in a way the
predictor cannot anticipate, so the branch predictor guesses wrong repeatedly:
    mov rax,0
    mov rcx,0 ; loop counter
start:
    mov rdx,rcx
    and rdx,0x35
    cmp rdx,0x13
    jle skip
    add rax,rcx
skip:
    add rcx,1
    cmp rcx,1000 ; loop runs 1000 times
    jle start
    ret
(Try switching
the compare value around; the worst case is about half taken, half
not taken.)
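
To get a feel for why the three loops behave so differently, here is a rough Python sketch that replays the taken/not-taken pattern of the `jle skip` branch in each loop through a toy predictor. The predictor is an assumption, not a description of real hardware: a single 2-bit saturating counter, the classic textbook model. Real CPUs use much more elaborate history-based predictors, so the counts are only illustrative. The names `branch_outcomes`, `mispredictions`, and `LOOPS` are made up for this example.

```python
# Toy model (an assumption, not real hardware): one 2-bit saturating counter
# predicts the `jle skip` branch. Counter values 0-1 predict "not taken",
# values 2-3 predict "taken".

def branch_outcomes(taken, iterations=1001):
    """Outcome of `jle skip` for each rcx value the loop body sees (0..1000)."""
    return [taken(rcx) for rcx in range(iterations)]

def mispredictions(outcomes, counter=2):
    """Count wrong guesses made by the 2-bit saturating counter."""
    wrong = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            wrong += 1
        # Nudge the counter toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return wrong

# The branch condition of each loop above, written as a function of rcx.
LOOPS = {
    "cmp rcx,1500": lambda rcx: rcx <= 1500,
    "cmp rcx,500": lambda rcx: rcx <= 500,
    "and rdx,0x35 / cmp rdx,0x13": lambda rcx: (rcx & 0x35) <= 0x13,
}

if __name__ == "__main__":
    for name, taken in LOOPS.items():
        outcomes = branch_outcomes(taken)
        print(f"{name}: taken {sum(outcomes)} of {len(outcomes)}, "
              f"mispredicted {mispredictions(outcomes)}")
```

Under this toy model the first loop never mispredicts, the second mispredicts only twice (right at the switch), and the third mispredicts 126 times, with the branch taken on 384 of its 1001 passes. To try other compare values, change `0x13` in the last lambda and rerun.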