REP STOSB      Write AL to [RDI] a total of RCX times
REPE SCASB     Find non-AL byte starting at [RDI] (keep repeating while [rdi]==al)
REPNE SCASB    Find AL, starting at [RDI] (keep repeating while [rdi]!=al)

These string instructions also read from [rsi], which gets incremented like [rdi]:

REP MOVSB      Move RCX bytes from [RSI] to [RDI]
REPE CMPSB     Find nonmatching bytes in [RDI] and [RSI] (keep repeating while [rdi]==[rsi])
REPNE CMPSB    Find matching bytes in [RDI] and [RSI] (keep repeating while [rdi]!=[rsi])
    mov rcx,2 ; rep repeats this many times
    mov rax,'X' ; stosb stores al
    mov rdi,str ; stosb stores data to [rdi]
    rep stosb

    mov rdi,str
    extern puts
    call puts
    ret

    section .data
    str: db 'lawlor',0,0,0

Prints "XXwlor", because stosb has overwritten rcx=2 chars with al='X'.
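To make the register updates explicit, here is a minimal C sketch of what "rep stosb" does. The function name rep_stosb and its parameters are made up for illustration (they just stand in for the registers), and the sketch assumes the direction flag is clear, so rdi counts upward.

```c
#include <stddef.h>

/* Hypothetical model of "rep stosb": rdi, rcx, and al stand in for the
   registers. Assumes the direction flag is clear (rdi counts upward). */
unsigned char *rep_stosb(unsigned char *rdi, size_t rcx, unsigned char al)
{
    while (rcx != 0) {  /* rep: stop once rcx reaches zero */
        *rdi = al;      /* stosb: store al at [rdi] */
        rdi++;          /* stosb: advance rdi by one byte */
        rcx--;          /* rep: count down */
    }
    return rdi;         /* rdi ends one past the last byte written */
}
```

Calling it on a buffer holding "lawlor" with rcx=2 and al='X' leaves "XXwlor" in the buffer, matching the assembly above.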
    mov rcx,10 ; rep repeats up to this many times
    mov rax,'l' ; scasb compares memory to al
    mov rdi,str ; scasb reads memory at [rdi] (and increments rdi)
    repe scasb

    extern puts
    call puts
    ret

    section .data
    str: db 'lllllawlor',0,0,0

Prints "wlor", because each iteration increments rdi, and the iterations stopped when they hit 'a' ([rdi]!=al). That last compare still increments rdi, so puts starts at the 'w', one byte past the 'a'.
    mov rcx,10 ; rep repeats up to this many times
    mov rax,'w' ; scasb compares memory to al
    mov rdi,str ; scasb reads memory at [rdi] (and increments rdi)
    repne scasb

    extern puts
    call puts
    ret

    section .data
    str: db 'lllllawlor',0,0,0
Prints "lor" because the iterations stopped when they hit 'w' ([rdi]==al), and by then rdi had already moved one byte past it.
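Both scasb examples print the text just after the byte that stopped the scan. That off-by-one is easier to see with the loop written out, so here is a rough C sketch of the two prefixes. The name rep_scasb and the want_equal flag are invented for this illustration, and the direction flag is assumed clear.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical model of "repe scasb" (want_equal = true) and
   "repne scasb" (want_equal = false). rdi, rcx, and al stand in for
   the registers. Assumes the direction flag is clear. */
const unsigned char *rep_scasb(const unsigned char *rdi, size_t rcx,
                               unsigned char al, bool want_equal)
{
    while (rcx != 0) {
        bool equal = (*rdi == al);  /* scasb: compare al with [rdi] */
        rdi++;                      /* rdi advances even on the final compare */
        rcx--;                      /* rep: count down */
        if (equal != want_equal)    /* repe stops on mismatch, repne on match */
            break;
    }
    return rdi;                     /* one past the byte that stopped the scan */
}
```

With the string 'lllllawlor', scanning for 'l' with want_equal=true returns a pointer to "wlor", and scanning for 'w' with want_equal=false returns a pointer to "lor", the same strings the two assembly examples print.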
    mov rcx,3 ; rep repeats this many times
    mov rsi,src ; movsb reads memory here (and increments)
    mov rdi,str ; movsb writes memory here (and increments)
    rep movsb

    mov rdi,str
    extern puts
    call puts
    ret

    section .data
    src: db 'NOPE',0
    str: db 'lawlor',0,0,0

Prints "NOPlor" because rcx==3, so "rep movsb" copied 3 chars from [rsi] to [rdi].
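In C terms, "rep movsb" is a plain front-to-back byte copy. A minimal sketch, with the made-up name rep_movsb standing in for the instruction and the direction flag again assumed clear:

```c
#include <stddef.h>

/* Hypothetical model of "rep movsb": copies rcx bytes from [rsi] to
   [rdi], front to back. Assumes the direction flag is clear. */
void rep_movsb(unsigned char *rdi, const unsigned char *rsi, size_t rcx)
{
    while (rcx != 0) {
        *rdi = *rsi;  /* movsb: copy one byte from [rsi] to [rdi] */
        rsi++;        /* both pointers advance together */
        rdi++;
        rcx--;        /* rep: count down */
    }
}
```

Copying 3 bytes from "NOPE" over a buffer holding "lawlor" leaves "NOPlor". The rest of the destination, including its terminating zero byte, is untouched.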
    mov rcx,10 ; rep repeats up to this many times
    mov rsi,B ; cmpsb reads memory here (and increments)
    mov rdi,A ; cmpsb reads memory here (and increments)
    repe cmpsb

    extern puts
    call puts
    ret

    section .data
    A: db 'lolnope',0
    B: db 'lolor',0,0,0
Prints "ope" because the repe cmpsb stopped when it hit the 'n' (the first place where [rdi]!=[rsi]).
    mov rcx,10 ; rep repeats up to this many times
    mov rsi,B ; cmpsb reads memory here (and increments)
    mov rdi,A ; cmpsb reads memory here (and increments)
    repne cmpsb

    extern puts
    call puts
    ret

    section .data
    A: db 'lawlor was here',0
    B: db 'yolobrozzz',0,0,0
Prints " was here" because the repne cmpsb stopped when it hit the 'r' (the first place where [rdi]==[rsi]).
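The cmpsb variants work like scasb, except the comparison is between two memory bytes and both pointers advance. Here is a hedged C sketch (rep_cmpsb and want_equal are names invented for this example; the direction flag is assumed clear). It returns the final rdi, since that is what the examples above hand to puts.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical model of "repe cmpsb" (want_equal = true) and
   "repne cmpsb" (want_equal = false). Assumes the direction flag is clear. */
const unsigned char *rep_cmpsb(const unsigned char *rdi,
                               const unsigned char *rsi,
                               size_t rcx, bool want_equal)
{
    while (rcx != 0) {
        bool equal = (*rsi == *rdi);  /* cmpsb: compare [rsi] with [rdi] */
        rsi++;                        /* both pointers advance, */
        rdi++;                        /* even on the final compare */
        rcx--;                        /* rep: count down */
        if (equal != want_equal)      /* repe stops on mismatch, repne on match */
            break;
    }
    return rdi;                       /* one past the byte that stopped the loop */
}
```

With A='lolnope' and B='lolor', the want_equal=true version returns a pointer to "ope". With A='lawlor was here' and B='yolobrozzz', the want_equal=false version returns a pointer to " was here".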
To time the two versions below, use NetRun: Options -> Actions -> Time.

Using repne cmpsb (16 ns/call):

    mov rcx,10 ; rep repeats up to this many times
    mov rsi,B ; cmpsb reads memory here (and increments)
    mov rdi,A ; cmpsb reads memory here (and increments)
    repne cmpsb
    ret

    section .data
    A: db 'lawlor was here',0
    B: db 'yolobrozzz',0,0,0

Using simple instructions (5 ns/call):

    mov rcx,10 ; rep repeats up to this many times
    mov rsi,B ; cmpsb reads memory here (and increments)
    mov rdi,A ; cmpsb reads memory here (and increments)
    ;repne cmpsb
    jmp check_first
    start:
        mov al,[rsi] ; load byte from rsi
        add rsi,1
        mov dl,[rdi] ; load byte from rdi (dl, not cl: cl is the low byte of the rcx counter)
        add rdi,1
        cmp al,dl
        je done ; repne == break if equal
        sub rcx,1 ; "rep": decrement rcx
    check_first:
        cmp rcx,0
        jne start
    done:
    ret

    section .data
    A: db 'lawlor was here',0
    B: db 'yolobrozzz',0,0,0
The big lesson is: assume nothing. Here, a single instruction "repne cmpsb" is about three times slower (16 ns versus 5 ns per call) than a big block of simple mov, add, and cmp instructions, probably because the CPU internally has to translate that weird single "repne cmpsb" into those simpler instructions. Increasingly, CPUs are optimized for the common stuff, not the weird stuff.
(There are exceptions: "rep movsb" and "rep stosb" have good multi-core cache behavior, and are quite fast on some chips.)
See Dr. Agner Fog's optimization resources for all the gory details.