# Static Data: Global Variables and Constants

CS 301 Lecture, Dr. Lawlor

You can tell the assembler to keep some constants right next to your machine code, known as "static" data after the C++ keyword.  The handy way to get the address of your new constant is with a label, and the way to specify the constant's value is with one of the new instructions "dq" (data quadword), "dd" (data dword), "dw" (data word) or "db" (data byte).  These instructions reserve the corresponding amount of space, and initialize that space to the value you give.

The syntax for accessing these constants from normal assembly looks like:
```
mov ... DWORD[somePtr] ...

section .data
somePtr:
    dd constant
```
Here are all the constant-generating "instructions":
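| Instruction | C++ | Access | Bits | Bytes |
|---|---|---|---|---|
| `dq 0x3` | `long x=3;` | `QWORD[somePtr]` | 64 | 8 |
| `dd 0x3` | `int x=3;` | `DWORD[somePtr]` | 32 | 4 |
| `dw 0x3` | `short x=3;` | `WORD[somePtr]` | 16 | 2 |
| `db 0x3` | `char x=3;` | `BYTE[somePtr]` | 8 | 1 |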
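To compare against the C++ column, here is a small C++ sketch that checks each directive's size at compile time. It uses the fixed-width `<cstdint>` types in place of `long`/`int`/`short`/`char`, because `long` is 8 bytes on 64-bit Linux but only 4 bytes on 64-bit Windows. The `dq_type`-style aliases are names invented for this example.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Fixed-width C++ types with the same size as each NASM data directive.
using dq_type = std::int64_t; // dq: data quadword
using dd_type = std::int32_t; // dd: data dword
using dw_type = std::int16_t; // dw: data word
using db_type = std::int8_t;  // db: data byte

// Number of bits in an object of type T (CHAR_BIT is 8 on x86).
template <typename T>
constexpr std::size_t bitsOf() { return sizeof(T) * CHAR_BIT; }

// Compile-time checks mirroring the Bits and Bytes columns of the table.
static_assert(sizeof(dq_type) == 8 && bitsOf<dq_type>() == 64, "dq: 8 bytes");
static_assert(sizeof(dd_type) == 4 && bitsOf<dd_type>() == 32, "dd: 4 bytes");
static_assert(sizeof(dw_type) == 2 && bitsOf<dw_type>() == 16, "dw: 2 bytes");
static_assert(sizeof(db_type) == 1 && bitsOf<db_type>() == 8, "db: 1 byte");
```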

## Static Integers

Here's an example where we load a statically allocated integer from memory.
```
mov eax,DWORD[myInt] ; copy this int into eax
ret

section .data
myInt:
    dd 0xa3a2a1a0 ; "data DWORD" containing this value
```


You can copy a pointer value into a register, too.  Here we're dereferencing a pointer stored in a register:
```
mov rdx, someIntPtr ; copy the address someIntPtr into rdx (like C++: p=someIntPtr;)
mov eax, DWORD [rdx] ; read memory rdx points to (like C++: return *p;)
ret

section .data
someIntPtr: ; A place in memory, where we're storing an integer.
    dd 123 ; "data DWORD", our integer
```


A pointer to an array initially looks just like a pointer to anything else:
```
mov rcx, myArray ; rcx points to myArray (like C++: p=arr;)
mov eax, DWORD [rcx] ; read memory pointed to by rcx (like C++: return *p;)
ret

section .data
myArray: ; A place in memory, where we're storing some integers.
    dd 100 ; "data DWORD", here our array element
    dd 101
    dd 102
    dd 103
```


Here's an example where we index into our little 4-integer array:
```
mov eax, DWORD [myArray+4*2] ; read myArray[2]: 4 bytes per int, so byte offset 8
ret

section .data
myArray: ; A place in memory, where we're storing some integers.
    dd 100 ; "data DWORD", here our array element
    dd 101
    dd 102
    dd 103
```

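Here is a hedged C++ sketch of the same pointer and indexing ideas. `readFirst` mirrors `mov eax, DWORD [rcx]`. `readByByteOffset` is a hypothetical helper that does the `myArray+4*2` address arithmetic by hand, in bytes, and assumes 4-byte `int32_t` elements like `dd`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// C++ version of the static 4-integer array ("dd 100", "dd 101", ...).
static std::int32_t myArray[4] = {100, 101, 102, 103};

// Like "mov eax, DWORD [rcx]" with rcx = myArray: read the first element.
std::int32_t readFirst() {
    const std::int32_t* p = myArray; // p = arr;
    return *p;                       // return *p;
}

// Like "mov eax, DWORD [myArray+4*index]": compute the byte address by hand,
// then copy 4 bytes from it (memcpy sidesteps pointer-aliasing problems).
std::int32_t readByByteOffset(int index) {
    assert(index >= 0 && index < 4);
    const unsigned char* base = reinterpret_cast<const unsigned char*>(myArray);
    std::int32_t value;
    std::memcpy(&value, base + 4 * index, sizeof value);
    return value;
}
```

`readByByteOffset(2)` reads the same 102 that `myArray[2]` does, because C++ indexing scales by the element size just as the assembly's `4*2` does.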

## Static Bytes & Strings

You can access an individual byte from memory with the syntax BYTE[address]. To load that byte into a full 32-bit register, you need a BYTE-friendly instruction like "movzx" (move with zero-extend), which fills the register's upper bits with zeros:
`	movzx reg,BYTE[address]`
Accessing data as bytes is useful for string processing, or to understand what really shows up in memory.
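As a minimal C++ sketch of what zero-extension means (the function names are hypothetical): widening an unsigned byte fills the upper bits with zeros, which is what movzx does. For contrast, its sibling movsx copies the byte's sign bit into the upper bits instead.

```cpp
#include <cassert>
#include <cstdint>

// Like movzx: widen 8 bits to 32 bits, upper 24 bits become zero.
std::uint32_t zeroExtendByte(std::uint8_t b) {
    return static_cast<std::uint32_t>(b);
}

// Like movsx: widen 8 bits to 32 bits, copying the sign bit upward.
std::int32_t signExtendByte(std::int8_t b) {
    return static_cast<std::int32_t>(b);
}
```

For the byte 0xff, `zeroExtendByte` gives 0x000000ff (255), while `signExtendByte(-1)` gives 0xffffffff (-1).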

For example, here I'm defining a short 3-byte string, and reading one byte out:
```
movzx eax,BYTE[myString + 2] ; read this byte into eax
ret

section .data
myString:
    db 'w','o','a'
```

These are all equivalent ways to get the same 3-byte string:

• `db 0x77`, `db 0x6f`, `db 0x61` (one hex byte per line)
• `db 'w'`, `db 'o'`, `db 'a'` (one character per line)
• `db 'w','o','a'`
• `db 'woa'`
• `db "woa"`
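Here is a quick C++ check that these spellings really produce the same bytes. It assumes ASCII characters. One difference is worth noticing: a C++ string literal like `"woa"` silently adds a zero byte at the end, while NASM's `db "woa"` does not.

```cpp
#include <cassert>
#include <cstring>

// The same 3 bytes spelled in hex and as characters, like the db variants.
static const unsigned char fromHex[3]   = {0x77, 0x6f, 0x61};
static const unsigned char fromChars[3] = {'w', 'o', 'a'};
// A C++ string literal appends a zero byte, so this array holds 4 bytes.
static const char fromLiteral[] = "woa";

// True when the first 3 bytes of all three spellings match.
bool sameThreeBytes() {
    return std::memcmp(fromHex, fromChars, 3) == 0 &&
           std::memcmp(fromHex, fromLiteral, 3) == 0;
}
```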

There are several standard functions that take a "C string": a pointer to a bunch of ASCII bytes, followed by a zero byte.  "puts" is one such function, and it prints the string you pass it plus a newline. We can call puts to print out our string like this:

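```
sub rsp,8 ; keep the stack 16-byte aligned for the call, as the x86-64 ABI expects
mov rdi,myString  ; points to string constant below
extern puts
call puts
add rsp,8 ; undo the alignment adjustment
ret

section .data
myString:
    db 'woa',0 ; need the trailing zero to mark the end of the string...
```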

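Nothing in the string itself records its length, so any function that takes a C string has to scan for the zero byte. Here is a minimal C++ sketch of that scan, using a hypothetical `cStringLength` helper that computes essentially what `strlen` does.

```cpp
#include <cassert>
#include <cstddef>

// A C string is just bytes followed by a zero byte, like "db 'woa',0".
static const char myString[] = {'w', 'o', 'a', 0};

// Count the bytes before the terminating zero byte.
std::size_t cStringLength(const char* s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n; // without the trailing zero, this would run off the end
    return n;
}
```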

Here's an example where we load a byte from the middle of an integer. Note that this returns 0xa2: on our little-endian x86 machines, byte 0 is 0xa0 (the little, low-order end of the number), so byte 2 is 0xa2.

```
movzx eax,BYTE[myInt + 2] ; read this byte into eax
ret

section .data
myInt:
    dd 0xa3a2a1a0 ; "data DWORD" containing this value
```
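The same experiment in C++, as a sketch that assumes a little-endian host such as x86. `byteInMemory` copies the integer's raw bytes and picks one out, like `BYTE[myInt + 2]`. `byteBySignificance` uses shifts instead, which count from the low byte on any machine. On little-endian machines the two agree. Both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Byte at offset `index` (0..3) of the integer as it actually sits in memory.
std::uint8_t byteInMemory(std::uint32_t value, int index) {
    assert(index >= 0 && index < 4);
    unsigned char bytes[4];
    std::memcpy(bytes, &value, sizeof bytes); // raw memory image of the int
    return bytes[index];
}

// Byte `index` counted from the least significant end, independent of byte order.
std::uint8_t byteBySignificance(std::uint32_t value, int index) {
    return static_cast<std::uint8_t>(value >> (8 * index));
}
```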

## Modifiable Static Data

By default, stuff in "section .data" is readable and writeable.  So this works fine:
```
mov DWORD[myInt],7 ; overwrite our int
mov eax,DWORD[myInt] ; copy the modified int into eax
ret

section .data
myInt:
    dd 2 ; "data DWORD" containing this value
```


But if you leave off the "section .data", the constant is stored next to the program's machine code in "section .text" (a weird ancient name; machine code is not human-readable text!).  This code section is readable but not writeable, so this segfaults:
```
mov DWORD[myInt],7 ; overwrite our int
mov eax,DWORD[myInt] ; copy the modified int into eax
ret

myInt:
    dd 2 ; "data DWORD" containing this value
```


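You can even store *code* in the modifiable "section .data". This worked on NetRun; on systems that mark data pages as non-executable, running code from "section .data" will segfault.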
```
call myFunction
ret

section .data
myFunction:
    mov eax,2
    ret
```


The "mov" and "ret" instructions just emit bytes of machine code, identical to:
```
call myFunction
ret

section .data
myFunction:
    db 0xb8,0x02,0x00,0x00,0x00,0xc3 ; code for my function: b8 + 4-byte constant is "mov eax,2", c3 is "ret"
```


However, when code is in modifiable memory, you can modify the machine code! For example, if I know what bytes the assembler will output for "myFunction", I can figure out where to go in and modify the "myFunction" machine code to change what the function returns. In this case, I just want to skip past the 0xb8 byte (the mov opcode) and overwrite the 4-byte constant being loaded:
```
mov DWORD[myFunction+1],7 ; overwrite the constant loaded by the first (0xb8) instruction
call myFunction
ret

section .data
myFunction:
    mov eax,2 ; <- modified at runtime!
    ret
```


This returns 7, because the bytes of "myFunction" are modified before execution.
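Running modified code from C++ needs OS-specific calls, so this sketch only models the patch. It keeps the 6 bytes of `mov eax,2` / `ret` in an array, overwrites the immediate the way `mov DWORD[myFunction+1],7` does, and then decodes it again. Nothing here is executed. The helper names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// The 6 bytes the assembler emits for "mov eax,2" then "ret":
// opcode 0xb8, a 4-byte little-endian immediate, then 0xc3.
using Code = std::array<std::uint8_t, 6>;

Code makeFunctionBytes() {
    return {0xb8, 0x02, 0x00, 0x00, 0x00, 0xc3};
}

// Like "mov DWORD[myFunction+1],7": overwrite the 4 immediate bytes after
// the 0xb8 opcode, least significant byte first (x86 byte order).
void patchImmediate(Code& code, std::uint32_t newValue) {
    for (int i = 0; i < 4; ++i)
        code[1 + i] = static_cast<std::uint8_t>(newValue >> (8 * i));
}

// Decode the constant that "mov eax, imm32" would load from these bytes.
std::uint32_t loadedConstant(const Code& code) {
    std::uint32_t value = 0;
    for (int i = 3; i >= 0; --i)
        value = (value << 8) | code[1 + i];
    return value;
}
```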

There's also a "section .bss" that contains zero-initialized storage.  In summary:
• section .text: machine code and non-writeable constants.
• section .data: read-write space, initialized to a specified value.
• section .bss: read-write space, initialized to zeros.
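C++ globals land in the same kinds of sections. This sketch shows where a typical compiler such as GCC or Clang on Linux places each kind of global. The placement is conventional rather than guaranteed. Compilers usually put read-only constants in a separate `.rodata` section rather than `.text`, or drop them entirely if nothing needs their address. The language itself does guarantee that `zeroedCounter` starts at zero.

```cpp
#include <cassert>

int initializedCounter = 3; // typically section .data: read-write, starts at 3
int zeroedCounter;          // typically section .bss: read-write, starts at 0
const int answer = 42;      // typically read-only data (.rodata), if stored at all

// Both writable globals can be changed at runtime.
int bumpBoth() {
    initializedCounter += 1;
    zeroedCounter += 1;
    return initializedCounter + zeroedCounter;
}
```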

## Static Pointers

One common trick is to use pointers in the static data.  For example, I can build a static linked list like this:
```
mov rcx,myFirstData ; cur=head
keep_printing:
    mov edi,DWORD[rcx+8] ; print_int(cur->value)
    extern print_int
    push rcx ; save cur (calls may overwrite rcx); this also keeps the stack 16-byte aligned
    call print_int
    pop rcx
    mov rcx,QWORD[rcx] ; cur=cur->next
    cmp rcx,0
    jne keep_printing
ret

section .data
myFirstData:
    dq mySecondData ; next pointer
    dd 3            ; value
mySecondData:
    dq myThirdData
    dd 7
myThirdData:
    dq 0 ; END of list
    dd 0
```

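Here is the same static linked list written as C++ initializers, as a sketch. The `Node` struct mirrors the `dq`/`dd` layout, so on x86-64 the value sits at offset 8, just like `[rcx+8]`. Unlike the assembly, C++ also pads the struct to 16 bytes on x86-64. `collectValues` is a hypothetical stand-in for the printing loop that returns the values instead of printing them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One node: next pointer (like dq) at offset 0, then the value (like dd).
struct Node {
    const Node* next;
    int value;
};
static_assert(sizeof(void*) != 8 || offsetof(Node, value) == 8,
              "on 64-bit targets the value sits at offset 8, like [rcx+8]");

// Static nodes, defined last-to-first so each 'next' target already exists.
static const Node myThirdData  = {nullptr, 0}; // END of list
static const Node mySecondData = {&myThirdData, 7};
static const Node myFirstData  = {&mySecondData, 3};

// Walk the list like keep_printing, collecting each value.
std::vector<int> collectValues(const Node* cur) {
    std::vector<int> values;
    while (cur != nullptr) {
        values.push_back(cur->value); // print_int(cur->value)
        cur = cur->next;              // cur = cur->next
    }
    return values;
}
```

Starting from `&myFirstData`, this visits 3, 7, and then the final 0, the same three values the assembly loop prints.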

## Interacting with C++ Globals

A C++ global variable is implemented as a labeled location in static memory (section .data when it has an initial value), with an externally visible linker name. So C++ global variables are accessible from assembly, and vice versa:
| | Access from C++ | Access from Assembly |
|---|---|---|
| Defined in C++ | `int totalCounter=3;`<br>`...`<br>`totalCounter++;`<br>`...` | `extern totalCounter ; bring in from outside`<br>`add DWORD[totalCounter],1` |
| Defined in Assembly | `extern "C" int totalCounter;`<br>`...`<br>`totalCounter++;`<br>`...` | `global totalCounter ; make visible from outside`<br>`add DWORD[totalCounter],1`<br>`section .data`<br>`totalCounter: dd 3` |

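Sadly, in assembly there's no convenient way to get to global variables stored inside a C++ class or namespace. The "::" part of the name doesn't translate, because C++ "mangles" such names into compiler-specific linker names (g++ on Linux turns `stuff::counter` into `_ZN5stuff7counterE`), and you would have to spell out that mangled name by hand.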
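To see what mangling does to a namespaced name, here is a GCC/Clang-specific C++ sketch. `abi::__cxa_demangle` comes from `<cxxabi.h>` and is not standard C++. Here it turns the linker name used for a hypothetical `stuff::counter` back into readable C++.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

namespace stuff {
int counter = 3; // under the Itanium C++ ABI, its linker name is _ZN5stuff7counterE
}

// Turn a mangled linker name back into C++ syntax (GCC/Clang extension).
std::string demangle(const char* mangled) {
    int status = 0;
    char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && readable) ? readable : mangled;
    std::free(readable); // __cxa_demangle returns malloc'd memory (or null)
    return result;
}
```

On Linux, `nm` on the compiled object should list the variable under its mangled spelling rather than as `stuff::counter`.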