CS 381 Fall 2012 > Lecture Notes for Thursday, September 20, 2012

CS 381 Fall 2012
Lecture Notes for Thursday, September 20, 2012

The Details of Transformations

Overview

OpenGL stores a transformation as a \(4\times 4\) matrix. In the Vertex Processing section of the pipeline, a vertex is stored as a 4-D vector.

Note: OpenGL does not specify implementation details, only an interface. Thus, when we say that OpenGL “stores” data in a certain way, what we really mean is that it acts like the data are stored in that way. Internal data formats might be different, in order to increase efficiency.

A transformation is applied to a vector by multiplying the matrix by the vector (plus a division operation, which we will discuss later). To make a single matrix representing one transformation performed after another (e.g., a rotation followed by a translation), multiply the two matrices representing the original transformations.

For each of OpenGL’s internal transformations (model/view, projection, etc.), there is a stack of matrices. The top of the stack represents the current transformation. We have already seen this in operation: recall the glPushMatrix and glPopMatrix commands.

Scaling & Rotation

It is probably not clear yet why we would want to represent a 3-D point by a 4-D vector. So, for the moment, we will pretend that OpenGL represents a vertex using a 3-D vector, and transformations are stored as \(3\times 3\) matrices.

Here is the matrix that would be produced by glScaled(a, b, c) in our pretend world:

\[ \begin{pmatrix} a&0&0\\ 0&b&0\\ 0&0&c \end{pmatrix}. \]

We can check that this does the right thing:

\[ \begin{pmatrix} a&0&0\\ 0&b&0\\ 0&0&c \end{pmatrix} \, \begin{pmatrix} x\\ y\\ z \end{pmatrix} = \begin{pmatrix} ax\\ by\\ cz \end{pmatrix}. \]

Similarly, we can do rotation with a \(3\times 3\) matrix. We will not cover the formulas in full generality, but here is how to do a rotation about the \(z\)-axis. The matrix that would be produced by glRotated(t, 0.,0.,1.), in our pretend world, is the following (with \(t\) being in degrees, of course):

\[ \begin{pmatrix} \cos\,t&-\sin\,t&0\\ \sin\,t&\cos\,t&0\\ 0&0&1 \end{pmatrix}. \]

Again, we can check that this does the right thing.

Translations & 4-D

If we try to do a translation with a \(3\times 3\) matrix, we find that it cannot be done. In particular, glTranslated(a, b, c) should send the point \((0,0,0)\) to \((a,b,c)\). However, multiplying a \(3\times 3\) matrix by the zero vector, always results in the zero vector, not the vector we want.

We solve this problem by moving to 4-D. We represent the point \((x,y,z)\) by the vector \(\langle x,y,z,1 \rangle\). And we represent a transformation by a \(4\times 4\) matrix.

Here is the (actual!) matrix produced by glTranslated(a, b, c):

\[ \begin{pmatrix} 1&0&0&a\\ 0&1&0&b\\ 0&0&1&c\\ 0&0&0&1 \end{pmatrix}. \]

It works, too:

\[ \begin{pmatrix} 1&0&0&a\\ 0&1&0&b\\ 0&0&1&c\\ 0&0&0&1 \end{pmatrix} \, \begin{pmatrix} x\\ y\\ z\\ 1 \end{pmatrix} = \begin{pmatrix} x+a\\ y+b\\ z+c\\ 1 \end{pmatrix}. \]

Any transformation previously represented by a \(3\times 3\) matrix can now be represented by a \(4\times 4\): start with the old matrix, add a fourth column of three zeroes, then add a fourth row containing \(0\), \(0\), \(0\), \(1\). For example, here is the actual matrix generated when we do glScaled(a, b, c):

\[ \begin{pmatrix} a&0&0&0\\ 0&b&0&0\\ 0&0&c&0\\ 0&0&0&1 \end{pmatrix}. \]

See showmat.cpp for code that displays the current model/view matrix.

Perspective & Homogeneous Form

We do a perspective projection based on the synthetic-camera model. Place the camera into the world at the origin, facing in the \(-z\) direction. Put the viewport in front of the camera at distance \(c\). Then the viewport lies at \(z=-c\).

Consider a point \((x,y,z)\) in the scene. We want to find its projection: a point \((a,b,-c)\) that lies on the line from \((x,y,z)\) to the origin. (To simplify our computations, suppose that \(x, y \gt 0\) and \(z \lt 0\).)

Because the two right triangles are similar, we know that \(b/c = y/(-z)\), and so \(b = -cy/z\). Computations for \(a\) are along the same lines; we obtain \(a = -cx/z\).

We cannot divide using matrix multiplication; the only operations involved are addition and multiplication. Thus, we cannot use matrix multiplication alone to find \(-cy/z\) and \(-cx/z\). We need to include a division operation in the computations performed in the pipeline.

Here is how it is done. When we do glVertex3d(x, y, z), this point actually goes through the pipeline as a 4-D vector: \(\langle kx, ky, kz, k \rangle\), for some nonzero number \(k\). This is called the homogeneous form of the 3-D vector \(\langle x, y, z \rangle\). When the vertex enters the pipeline, we (that is, the people who implement the pipeline) get to pick any nonzero value for \(k\) that we want. If we have any sense, then we let \(k=1\). This means that \(\langle x, y, z \rangle\) turns into \(\langle x, y, z, 1 \rangle\), just as we said before. (However, if we pick a different value for \(k\) then all of the transformations we discussed, still work.)

When the vertex leaves the Vertex Processing section of the pipeline, it needs to be convered back into a 3-D vector. But now we do not get to choose the 4th coordinate. So we must divide by whatever the 4th coordinate is. The 4-D vector \(\langle x,y,z,w \rangle\) becomes \(\langle x/w,y/w,z/w \rangle\). This is the 4th-coordinate division. It allows us to do the division necessary for a perspective projection, while not messing up our earlier transformations (rotation, scaling, translation).

Now we consider how to do a perspective projection using matrix-vector multiplication, followed by a 4th-coordinate division. Consider the following matrix:

\[ \begin{pmatrix} -c&0&0&0\\ 0&-c&0&0\\ 0&0&1&-1\\ 0&0&1&0 \end{pmatrix}. \]

If we multiply the above matrix by the vector \(\langle x,y,z,1 \rangle\), then the result is \(\langle -cx,-cy,z-1,z \rangle\). Performing the 4th-coordinate division, we obtain \(\langle -cx/z,-cy/z,(z-1)/z \rangle\).

We see that the \(x\) and \(y\) values in our result are exactly what we want them to be. But the \(z\)-coordinate is a little strange. However, we typically only use the computed \(z\)-coordinate for hidden-surface removal. So all that matters is that our final \(z\)-coordinate gets larger when our original \(z\) gets larger, and smaller when it gets smaller. And, in fact, if \(z<0\), then \((z-1)/z\) does get larger and smaller as \(z\) does.

Note: When the actual matrix for a perspective projection is generated, a few other things need to be taken into account, such as the field-of-view angle, the aspect ratio of the viewport, and the maximum and minimum \(z\) values of the objects that we want to appear in our scene. So, while the above matrix will do a perspective projection, it might not be the actual matrix generated by any particular OpenGL call. See the documentation for glFrustum and gluPerspective for the actual matrices.

The Whole Story

Now we can put together a complete description of how OpenGL handles transformations.

A transformation is stored as a \(4\times 4\) matrix.
Each such matrix is stored as an entry in a stack of matrices, with the top of the stack representing the current transformation.
When a vertex \(\langle x,y,z \rangle\) enters the pipeline, it is turned into \(\langle kx,ky,kz,k \rangle\), for some nonzero number \(k\) (if the implementors have any sense, then they choose \(k\) to be 1). That is how it passes through the Vertex Processing section of the pipeline.
A transformation is applied to a vertex by computing the matrix-vector product.
When a vertex exits the pipeline, it is turned back into a 3-D vector, with each coordinate divided by the 4th coordinate. That is, a vertex \(\langle x,y,z,w \rangle\) becomes \(\langle x/w,y/w,z/w \rangle\).
Doing glLoadIdentity sets the matrix for the current transformation to be the \(4\times 4\) identity matrix.
Doing most other transformation commands (glTranslate*, glRotate*, glScale*, glOrtho, etc.) generates a matrix, and then multiplies this by the current transformation matrix, on the right. That is, if \(A\) is the current transformation matrix, and \(N\) is the matrix generated by the command, then the new transformation matrix is \(A\,N\).

Hidden-Surface Removal

Introduction

Hidden Surface Removal (HSR) means ensuring that objects hidden behind other objects are not drawn. An object that is fully or partially hidden behind another, is said to be occluded.

A generalization involves translucent objects, in which one can see one object through another.

We categorize HSR methods based on where they fit into the rendering pipeline—or whether they fit into the pipeline at all. Thus, we have three kinds of HSR methods.

Geometry-based methods.
Pixel-based methods.
Non-pipeline methods.

We will look briefly at the first two categories. Then we will talk about OpenGL’s built-in HSR. For non-pipeline methods, see CS 481.

Choice of method is affected by the following.

Convenience.
Compatibility with other CG techniques.
- In particular, we are interested in whether a method generalizes conveniently to handle translucency.
- Of course we also care whether a method fits into our pipeline ata ll.
Performance. Not just in the better/worse sense. See below.

Performance Issues

The rendering pipeline, being a pipeline, can have multiple pieces of data in it at once. For example, when a primitive moves from Vertex Processing to Rasterization, then another primitive can be moved into Vertex Processing. Since the sections of the pipeline are operating in parallel, the slowest section is what determines how fast our rendering goes. We say the rendering speed is bound by the performance of the slow section. Thus, a running rendering program can generally be placed in one of two categories.

Geometry-bound.: A geometry-bound program is one in which operations on vertices and geometric primitives (including those done before data enter the pipeline) are what limits the rendering speed.
Fill-bound.: A fill-bound program is one in which operations on fragments and pixels are what limits the rendering speed.

So, for example, if a program is fill-bound, then speeding up the Vertex Processing will not increase our frame rate at all. Rather, the Vertex Processing section of the pipeline will just spend more time waiting for Fragment Processing. On the other hand, moving some work from Fragment Processing to Vertex Processing could be a good idea; one way to do this might be to switch from a pixel-based HSR method to a geometry-based method.

Geometry-Based Methods

Geometry-based HSR methods do HSR at the drawing-primitive level. Thus, generally, they handle polygons.

Depth Sort

If we have a list of the polygons in a scene, sorted by depth, then we can render the scene in two different ways.

Painter’s Algorithm: Draw polygons back-to-front, with new colors overwriting existing ones. Advantages: Generalizes nicely to handling translucency.
Reverse Painter’s Algorithm: Draw polygons front-to-back, with colored pixels never being changed. Advantages: Never writes a pixel more than once. May result in a speed-up in some architectures.

Now, how do we sort polygons? The most straightforward method is to find the barycenter of each polygon, and sort by the depth of the barycenter. The barycenter of a triangle is computed by finding the vector sum of the vertices and dividing by 3.

This sorting method works well much of the time. It generally gives good results on surfaces composed of fine polygonal meshes. However, a simple Depth Sort does not work all the time (to emphasize: a simple Depth Sort can give incorrect results).

If fact, doing HSR correctly simply by sorting polygons, is impossible for some scenes. For example, we can have three polygons \(P\), \(Q\), \(R\), with \(P\) occluding \(Q\), \(Q\) occluding \(R\), and \(R\) occluding \(P\). There is no way to sort such a list properly.

Solutions to this involve splitting polygons. Here is a method that does always result in correctly rendered scenes.

Newell’s Algorithm

Newell’s Algorithm is based on a simple Depth Sort; however, it correct results for all scenes composed of planar polygons. Since a polygon is planar, we can talk about the “plane of the polygon”, meaning the plane that the polygon lies in.

Note that the \(x\)-coordinates used by points in a polygon, form an interval. We call this the polygon’s \(x\)-interval, and similarly \(y\)-interval and \(z\)-interval.

Newell’s Algorithm adds extra tests to a Depth Sort. As part of a sort, we compare two polygons \(P\), \(Q\). We ask a series of questions about them, in order of computational difficulty. if the answer to any question is “yes”, then we know what to do; otherwise, we proceed to the next question.

Are the \(z\)-intervals of \(P\), \(Q\) non-overlapping? If yes, then order the two polygons by depth in the obvious way.
Are the \(x\)-intervals of \(P\), \(Q\) non-overlapping? If yes, then the order of the two polygons does not matter.
Are the \(y\)-intervals of \(P\), \(Q\) non-overlapping? If yes, then the order of the two polygons does not matter.
Are all the vertices of \(P\) behind the plane of polygon \(Q\)? If yes, then order so that \(P\) is behind \(Q\).
Are all the vertices of \(Q\) in front of the plane of polygon \(P\)? If yes, then order so that \(P\) is behind \(Q\).
Are the extents of \(P\), \(Q\) non-overlapping? If yes, then order the two polygons as necessary.

If we make it through all of the above questions with “no” answers, then we split one polygon along the plane of the other. We remove the split polygon from the list of primitives, and add the two pieces to the list.

Once we have a depth-sorted list of polygons, we render using the Painter’s Algorithm or the Reverse Painter’s Algorithm, as before.

Advantages of Newell’s Algorithm:

Works well with translucency.

Disadvantages:

It is intrusive. It requires that a description of the entire scene be available, and it must be allowed to modify that description.
It must have knowledge of what kinds of primitives are used (polygons?) and how to construct new ones.
The entire sort/split procedure must be redone any time the scene or the camera position changes.

The last disadvantage above can be dealt with by using special data structures to hold depth relationships in a scene (look up “BSP Tree”).

Pixel-Based Methods

Pixel-based HSR methods do HSR at the pixel level. If such a method is used with the rendering pipeline, then it will be in the Fragment Processing section.

\(z\)-Buffering

A \(z\)-buffer is a 2-D array that holds \(z\) values, one for each pixel in the framebuffer.

To use a \(z\)-buffer for HSR, we begin by set all \(z\) values to the maximum possible distance. As each fragment exits the pipeline, compare its \(z\) value with the one in the buffer, and discard the fragment if it is not closer. If a fragment is not discarded, then write its color to the framebuffer and its \(z\) value to the \(z\)-buffer.

Recall that the \(z\)-coordinate of vertices exiting the Vertex Processing section of the pipeline is generally not an actual \(z\)-coordinate in our world. Rather, it is a value that gets larger as the actual \(z\)-coordinate of the object gets larger, and smaller as it gets smaller. This is all we need, for \(z\)-buffering to work. However, it is true that, for very far away \(z\) values, the \(z\)-coordinate of vertices exiting the pipeline, changes very little as the actual \(z\)-coordinate of the object changes. Thus, \(z\)-buffering can give incorrect results for very distant objects that occlude other very distant objects.

Advantages of \(z\)-buffering:

Very easy and convenient. Just set it up, and forget about it.
Versatile. Once implemented, works for any kind of primitive, not just polygons.

Disadvantages:

Lots of operations to do, and high memory usage. Note: This disadvantage does not matter much these days, since we have specialized hardware to do \(z\)-buffering at high speed.
Does not work well with translucency.

Built-In HSR in OpenGL

OpenGL includes easy-to-use facilities for \(z\)-buffering. In GL-speak, the \(z\)-buffer is called a depth buffer, since it holds depth values. The comparison of depth values is called the depth test; it occurs in the Fragment Processing section of the pipeline. When the depth test is enabled, each fragment’s depth is compared with the depth stored at the corresponding pixel location. The depth test fails if the fragment depth is greater than or equal to the stored depth; in that case, the fragment is discarded. The depth test passes if the fragment depth is less than the stored depth; in that case, the fragment’s color is stored in the framebuffer (color buffer in GL-speak) and the fragment’s depth is stored in the depth buffer.

Here is how to use OpenGL’s depth buffer in a GLUT program.

Allocate a depth buffer by putting GLUT_DEPTH into your glutInitDisplayMode call.
```
glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);
```
Enable the depth test before you draw (in your initialization or in the display function) using glEnable.
```
glEnable(GL_DEPTH_TEST);
```
You can also disable the depth test with glDisable(GL_DEPTH_TEST). For example, you might want to disable the test before rendering documentation text in the graphics window. You can reenable the test as above.
Clear the depth buffer by adding GL_DEPTH_BUFFER_BIT to your glClear call. The various constants in your glClear call should be separated by bitwise-or operators (“|”), just as with glutInitDisplayMode.
```
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
```

CS 381 Fall 2012: Lecture Notes for Thursday, September 20, 2012 / Updated: 20 Sep 2012 / Glenn G. Chappell / ggchappell@alaska.edu

CS 381 Fall 2012 Lecture Notes for Thursday, September 20, 2012

The Details of Transformations

Overview

Scaling & Rotation

Translations & 4-D

Perspective & Homogeneous Form

The Whole Story

Hidden-Surface Removal

Introduction

Performance Issues

Geometry-Based Methods

Depth Sort

Newell’s Algorithm

Pixel-Based Methods

\(z\)-Buffering

Built-In HSR in OpenGL

CS 381 Fall 2012
Lecture Notes for Thursday, September 20, 2012