Becoming a 3D expert in 20 minutes

1. 3D-Vectors
3D vectors are triples of numbers that describe distances in terms of axis segments. In other words: the distances to travel along each axis.

Positions are described as a distance from the origin.

1.1. Addition and subtraction

...is performed in a component-wise fashion. Say we have A + B = C: we first travel along all axes by the components of A and then do the same for B. C then holds the distances along each axis from where we started.

Similarly, the distance from point F to point T is T - F.
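As a minimal sketch in Python (tuple-based helpers, purely illustrative), addition and the from-to difference look like this:

```python
# Component-wise vector math on plain tuples (illustrative helper names).
def add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

A = (1.0, 2.0, 3.0)
B = (4.0, 0.0, -1.0)
C = add(A, B)        # travel along A, then along B

F = (0.0, 0.0, 0.0)
T = (2.0, 2.0, 2.0)
d = sub(T, F)        # the distance from point F to point T is T - F
```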

1.2. Length

The Pythagorean theorem generalizes trivially to three dimensions, so the sum of the squared components gives the squared length of the vector. Taking the square root removes the square and tells you how far you have to travel when going straight (not necessarily along axes).
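A quick sketch of the length computation (Python, nothing beyond the standard library):

```python
import math

def length(v):
    # Pythagoras in 3D: square root of the sum of squared components.
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)

# 2^2 + 3^2 + 6^2 = 49, so this vector is exactly 7 units long.
l = length((2.0, 3.0, 6.0))
```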

1.3. Scale

Multiplying each component with a scalar multiplies the length (see 1.2) of a vector by that scalar and does not change its direction, unless the sign of the scalar is negative, in which case we get a vector pointing in the exact opposite direction.

1.4. Rotation

Scaling by -1 (that is, changing the sign of all components) is actually a rotation by 180 degrees around an arbitrary axis perpendicular to the vector's direction (we've had that case in 1.3).

1.4.1. Rotation by 90 degrees around a single axis of the coordinate space

Rotation around one of the coordinate axes in 3D is actually a "2D thing" (or, put slightly more formally, a planar operation): the component along the axis rotated around remains unchanged.

A 90 degree rotation around one axis means swapping two components and changing the sign of one of them. E.g. turn (x,y,z) into (-y,x,z) for rotating around z by 90 degrees.
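A sketch of that swap-and-negate rule (Python, illustrative):

```python
def rotate90_z(v):
    # (x, y, z) -> (-y, x, z): swap x and y, negate the new x component.
    return (-v[1], v[0], v[2])

r = rotate90_z((1.0, 0.0, 0.0))   # the x axis ends up on the y axis
```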

1.4.2. Rotation by arbitrary angles

Cosine (say, for x) and sine (for y) give us the coordinates of a point on a circle with radius 1, at some angle enclosed with the x-axis, rotated around the z axis. Let's say a point like this becomes our new, rotated x-axis vector. Swapping and negating it gives us a new, rotated y-axis (see 1.4.1). The z-axis remains unchanged - we're rotating around it (every point on it is called a fixed point of the rotation, as it does not move).

Scaling these new axes with the x/y components of a vector gives us a rotated vector.

vec3 new_x_axis = vec3( cos(a), sin(a), 0.0f); // unit circle coordinate
vec3 new_y_axis = vec3(-sin(a), cos(a), 0.0f); // 90 degrees rotation of it
vec3 new_z_axis = vec3(   0.0f,   0.0f, 1.0f); // new z axis is the old z axis

vec3 rotated = new_x_axis * v.x + new_y_axis * v.y + new_z_axis * v.z;

1.4.3. Handedness

Have you wondered why I swapped the sign of y and not of x in 1.4.1? Have you ever wondered why math teachers go counter-clockwise on circles?

Look at your right hand - you can easily build a coordinate space with thumb, index finger, and middle finger for the x, y, and z axes respectively by putting them in a roughly perpendicular arrangement (or let's say "there is an easy way, go figure it out" - don't break your fingers, or at least don't blame me if you do). Now, when making x (the thumb) point right and y (the index finger) point up, the middle finger (the z axis) points towards you. That's it! I've actually been going clockwise. And so do math teachers (although they rarely tell) - it's just that we're looking at it from the front.

Using a right-handed scheme is just a convention for arranging a coordinate system; however, it's not too bad an idea to stick to some defined order so as not to get confused too easily.

Negating an axis vector changes the handedness of the coordinate system. So does applying an anticyclic permutation to the axis vectors - in other words, swapping two of them: (X,Y,Z) -> (X,Z,Y), (Y,X,Z), or (Z,Y,X). Cyclic permutations, (X,Y,Z) -> (Z,X,Y) or (Y,Z,X), leave the handedness of the coordinate system unchanged.

1.5. Dot Product (a.k.a. Scalar Product)

It's just the sum of the components of a component-wise product of two vectors and answers the question

        "How much do two vectors point into the same direction?"

It becomes equal to zero when two vectors point into perpendicular directions and negative when vectors point (to some amount) into opposite directions. Put more precisely, its value is the product of both vectors' lengths times the cosine of their enclosed angle.

With a normalized direction as one factor, the dot product yields the distance to travel from (0,0,0) along that direction to reach the point closest to the position described by the other factor.
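Both properties can be sketched in a few lines (Python, illustrative helpers):

```python
def dot(a, b):
    # Sum of the component-wise product.
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

perp = dot((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))   # perpendicular -> 0

# With a normalized direction, the dot product is a projected length:
direction = (1.0, 0.0, 0.0)        # unit length
position = (3.0, 4.0, 0.0)
t = dot(direction, position)       # closest point lies 3 units along x
```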

1.6. Cross Product

The cross product is obtained as a component-wise difference of two vectors, each of which has previously been multiplied (again, per component) with the other in a permuted space (see 1.4.3).

This sounds horribly complicated but boils down to six multiplications and three subtractions that answer the questions:

        "Where is the perpendicular direction of the two vectors?"

        "How perpendicular are these two vectors?"

The phrasing of the first question above is somewhat ambiguous, so we have to add "in right-handed arrangement" to be clear about the signs. This is actually the direction found by the cross product.

The length of the resulting vector answers the second question. It becomes zero for vectors pointing in the same or exactly opposite directions (that is, enclosing an angle of 0 or 180 degrees). Put more precisely, the length is the product of both vectors' lengths times the sine of their enclosed angle.
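The six multiplications and three subtractions, sketched in Python (illustrative):

```python
def cross(a, b):
    # Differences of products taken from cyclically permuted components.
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

x = (1.0, 0.0, 0.0)
y = (0.0, 1.0, 0.0)
z = cross(x, y)          # right-handed arrangement: x cross y = z
parallel = cross(x, x)   # parallel vectors -> zero vector
```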

1.7. Box product (a.k.a. Scalar Triple Product)

The box product is simply the dot and cross product chained together for three vectors: dot(A, cross(B, C)).

Its absolute value describes the volume of a skewed box (or "Parallelepiped") with the vectors as its edges. More interestingly, its sign is positive when the three vectors form a right-handed space and we get zero when all vectors lie within a plane.
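A sketch of both properties (Python, illustrative):

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def box(a, b, c):
    # Scalar triple product: dot(A, cross(B, C)).
    d = cross(b, c)
    return a[0] * d[0] + a[1] * d[1] + a[2] * d[2]

# The unit cube spanned by the right-handed axes has volume 1 (positive sign).
vol = box((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

# Three coplanar vectors span no volume at all.
flat = box((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0))
```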

2. Matrices
...can be used to describe coordinate systems and map vectors described in one space to another.

2.1. Chaining Transforms

Rotation, scale, and shear can be expressed as 3x3 matrices, and these can be chained together. E.g.:

V1 = M0 V0                | transform V0 with M0 to get V1
V2 = M1 V1                | transform V1 with M1 to get V2
V3 = M2 V2                | transform V2 with M2 to get V3

V3 = (M2 M1 M0) V0        | same thing, transform V0 with the matrix product

When multiplying one matrix with another from the left, M_new = M_add M_old, its transformation applies to the result of the previous transforms stored in M_old. In other words, it applies in the already-transformed space.

When multiplying one matrix with another from the right, M_new = M_old M_add, its transformation applies to the input of the previous transforms. In other words, it applies in the local space described by the matrix. This is the case in OpenGL when calling glMultMatrix, glRotate, etc.
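The chaining rule can be verified with a tiny 3x3 sketch (Python, illustrative; matrices stored as tuples of rows, vectors multiplied from the right):

```python
def mat_mul(a, b):
    # 3x3 product (a b), matrices stored as tuples of rows.
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))

def mat_vec(m, v):
    # Column vector multiplied from the right: t = M s.
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))

SCALE2 = ((2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0))
ROT90Z = ((0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))

V0 = (1.0, 0.0, 0.0)
step_by_step = mat_vec(ROT90Z, mat_vec(SCALE2, V0))  # scale, then rotate
combined = mat_vec(mat_mul(ROT90Z, SCALE2), V0)      # one matrix, same result
```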

2.2. How vector transformation actually works

When a vector is multiplied with a matrix from the right (as is the typical convention), the first element of the target vector is the sum of the pairwise products of the source vector's components with the elements in the first matrix row. This sounds confusing but is actually quite simple:

/ t0 \   / m00 m10 m20 \ / s0 \
| t1 | = | m01 m11 m21 | | s1 |
\ t2 /   \ m02 m12 m22 / \ s2 /

t0 = m00 s0 + m10 s1 + m20 s2

Similarly, the other components of the target vector are derived as

t1 = m01 s0 + m11 s1 + m21 s2, and t2 = m02 s0 + m12 s1 + m22 s2.

What does it mean? There are basically two ways of looking at it. I'll start with the easier and less misleading one:

        "It's just a weighted sum of column vectors!"

Right. The column vectors are the axes of the source coordinate system given in target space. The weights are the source vector's components - the amount to travel along each (local) axis (see 1.1 and 1.3).

Hey, that's simple. Multiplying matrices becomes just as simple: We transform every axis.
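The weighted-sum view, sketched in Python (illustrative; the matrix is stored as rows, so the columns have to be picked out):

```python
def transform(m, v):
    # Columns of m are the source axes expressed in target space.
    cx = (m[0][0], m[1][0], m[2][0])
    cy = (m[0][1], m[1][1], m[2][1])
    cz = (m[0][2], m[1][2], m[2][2])
    # Weighted sum: travel v.x along cx, v.y along cy, v.z along cz.
    return tuple(cx[i] * v[0] + cy[i] * v[1] + cz[i] * v[2] for i in range(3))

M = ((0.0, -1.0, 0.0),    # a 90 degree rotation around z, stored as rows
     (1.0,  0.0, 0.0),
     (0.0,  0.0, 1.0))
t = transform(M, (1.0, 2.0, 3.0))
```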

2.3. The second interpretation of transformation?

Three dot products (see 1.5) - one for each target component, each of the corresponding matrix row with the input vector. In other words, we calculate the segment lengths on the target axes.

        "Then the rows are the axes of the target coordinate system given in source space?"

No! Not in general. Only if all axis vectors have a length of one and are perpendicular with respect to each other. That is, only for rotations - no scale, no shear, no projection (see below). In this case, however, inverting a 3x3 matrix is as simple as making its rows become columns (in other words, transposing it).

Note: What's called an "orthogonal matrix" has to have unit-length axes - having them be orthogonal is not enough. Don't let it confuse you when reading more mathematical sources.
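The transpose-as-inverse property is easy to check numerically (Python, illustrative):

```python
import math

def transpose(m):
    # Rows become columns.
    return tuple(tuple(m[j][i] for j in range(3)) for i in range(3))

def mat_mul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))

a = math.radians(30.0)
R = ((math.cos(a), -math.sin(a), 0.0),   # a pure rotation around z
     (math.sin(a),  math.cos(a), 0.0),
     (0.0,          0.0,         1.0))

# For a rotation, the transpose undoes the transform: R^T R = identity.
I = mat_mul(transpose(R), R)
```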

2.4. Inverse?

Well, yeah. Rotating, growing, and shrinking through the universe is essentially the same as having the universe rotate, grow, and shrink by opposite amounts in reverse order - the only difference is where we're looking at it from.

Source and target spaces are swapped for an inverse transform, and you can get there by rotating with negated angles, scaling by reciprocal amounts, etc., doing so in the opposite order.

Also, you can trust your math library of choice to do the right thing (efficiently solving a system of equations for the destination axes) if you don't remember how you built the matrix.

If you are lucky and there are only rotations, you can simply turn your head by 90 degrees while reading the matrix (see 2.3). In other cases this is a bad idea, as the components are at arbitrary scales and you'll get nothing from it except a hurting neck.

2.5. Homogeneous Coordinates and Translation

This is the trick that allows translation to happen as well. We now use a 4x4 matrix instead of a 3x3 one. We also add another vector component, called w, that we set to 1 for positions and 0 for directions.

The fourth column of the matrix now describes where the source space is rooted in target space, using the coordinate system of the target space. When w is 0 the extra column does nothing; when w is 1 it adds the fourth matrix column to the result.

When chaining a translation matrix in source space (that is, multiplying it from the right as glTranslate does, see 2.1), the translation is transformed to target space and added to the fourth column. The rest is an identity transform. Multiplying it from the left is even simpler: in this case the translation is added as-is to the fourth column.

The cool thing about it is: We can now also chain translations in the same way as other transforms.
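A 4x4 sketch showing how w switches translation on and off (Python, illustrative):

```python
def mat_vec4(m, v):
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(4))

# Identity rotation, translation (10, 20, 30) in the fourth column.
T = ((1.0, 0.0, 0.0, 10.0),
     (0.0, 1.0, 0.0, 20.0),
     (0.0, 0.0, 1.0, 30.0),
     (0.0, 0.0, 0.0,  1.0))

position = mat_vec4(T, (1.0, 2.0, 3.0, 1.0))    # w = 1: gets translated
direction = mat_vec4(T, (1.0, 2.0, 3.0, 0.0))   # w = 0: stays a direction
```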

2.6. Perspective Transforms

Perspective projection is not linear, as it involves an inverse scale that depends on the z coordinate. However, we have added an unused matrix row in 2.5. Wouldn't it be cool to have it all in one matrix as far as possible?

Dividing the x, y, and z components of the target vector by its w component does the trick. Looking at the corresponding matrices, we can see that w becomes the negated z coordinate of the input vector. Also, 'r - l', 't - b' and 'f - n' can easily be identified as the frustum width, height and depth.

So the first two diagonal elements just scale the frustum bounds to -1..1 at the near plane, which ends up at z = -1 (we're in a right-handed system, so we're looking down the negative z axis, but the depth buffer, and hence screen space, is organized in the opposite direction).

Column three is zero in the transforms of the x and y components, unless we're off center and have a skewed frustum. This is typically not the case (you have to use glFrustum in an unconventional way to get there), so it's insignificant for understanding what's going on. With zeroes there, the frustum extends around (0, 0) with increasing depth. The source x/y coordinates shrink around (0, 0) - a wider range of x/y coordinates becomes visible than at the near plane.

OK, we've got the screen coordinates now - the only thing missing is the depth buffer input. The third matrix row takes care of that part by mapping the z coordinate from the 'near'..'far' range onto -1..1.
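To make the divide concrete, here's a sketch using the matrix a symmetric glFrustum(-1, 1, -1, 1, 1, 3) call would produce (Python, illustrative; the constants follow the standard glFrustum layout for these parameters):

```python
# glFrustum(l=-1, r=1, b=-1, t=1, near=1, far=3), stored as rows.
P = ((1.0, 0.0,  0.0,  0.0),
     (0.0, 1.0,  0.0,  0.0),
     (0.0, 0.0, -2.0, -3.0),   # third row maps near..far onto -1..1
     (0.0, 0.0, -1.0,  0.0))   # w becomes the negated input z

def project(m, v):
    clip = tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(4))
    w = clip[3]
    return (clip[0] / w, clip[1] / w, clip[2] / w)   # perspective divide

on_near = project(P, (0.5, 0.5, -1.0, 1.0))   # ends at depth -1
on_far = project(P, (1.5, 1.5, -3.0, 1.0))    # ends at depth +1, shrunk in x/y
```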

2.6.1. Another way to look at perspective projection

There's a point in the world where my eye is. Then there's a place in space where the screen is (the near plane). A ray that starts at the eye and orthogonally pierces the near plane touches the far plane after a segment length of 'far'. Appropriately scaled line segments through the corner points of the near plane give the corners of the far plane - even if the focal point (read: "eye position") is off-center.

2.6.2. Deriving the frustum boundaries

In screen space, 'left'..'right' and 'bottom'..'top' are -1..1, and 'near'..'far' is -1..1 as well.

Given the full transform (that is, P M, where P is the projection matrix and M the modelview matrix when using legacy OpenGL), transforming appropriate vectors with its inverse yields the frustum boundaries.

As these are positional vectors, the input w coordinate should be 1, and the x, y, and z coordinates of the transformed vector should be divided by its w coordinate (see 2.5 and 2.6).

3. Unit Quaternions
...can be used as an alternative to 3x3 matrices representing rotations (they form a division algebra like complex numbers, using three orthogonal imaginary planes instead of just one imaginary direction).

Just as complex numbers are used in electrical engineering to keep track of the phase in AC circuits (where they are also called "phasors"), quaternions allow us to represent rotations around specific axes by specific angles in space.

3.1. Different ways to represent rotations

Any number of rotations can be combined into one (the resulting axis is the eigenvector of the corresponding 3x3 matrix), so an axis and an angle provide a similarly compact representation of a rotation as a quaternion, which consists of one real component and a three-dimensional vector of imaginary components. Unlike an axis/angle representation, however, a quaternion provides a way to combine rotations through multiplication, just as with 3x3 matrices - and it's even slightly cheaper to compute and more stable. Floating-point operations induce rounding errors that can cause a 3x3 matrix to lose its orthogonality; quaternions do not have this problem, and restoring the orthogonality of a 3x3 matrix is much more expensive than normalizing a quaternion. Transforming vectors with a quaternion, on the other hand, is more expensive than with a 3x3 matrix.
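A sketch of combining rotations by quaternion multiplication (Python, illustrative; quaternions stored as (w, x, y, z)):

```python
import math

def qmul(q1, q2):
    # Hamilton product of two quaternions stored as (w, x, y, z).
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return (w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
            w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2)

def from_axis_angle(axis, angle):
    # Unit quaternion: cosine of the half angle, axis scaled by its sine.
    s = math.sin(angle / 2.0)
    return (math.cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)

q45 = from_axis_angle((0.0, 0.0, 1.0), math.radians(45.0))
q90 = qmul(q45, q45)   # two 45 degree turns combine into one 90 degree turn
```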

3.2. Interpolation

Normalized linear interpolation of a quaternion's components already works. It's equivalent to moving its orientation along lines cut through a sphere.

Some special care allows moving more uniformly, that is, on the surface of the sphere. This method is commonly referred to as "slerp".

When there are more than just two rotations to interpolate, there's still a discontinuity when moving on to the next line (picture it as drawing a line strip on a sphere's surface). Splines solve this issue. And as they can be built upon Bézier curves based on "lerp" (linear interpolation), a similar scheme can be applied, elevating the curves to the sphere's surface using "slerp".
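Both interpolation schemes in sketch form (Python, illustrative; quaternions as (w, x, y, z) tuples):

```python
import math

def nlerp(q0, q1, t):
    # Lerp the components, then renormalize back onto the unit sphere.
    q = tuple((1.0 - t) * a + t * b for a, b in zip(q0, q1))
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def slerp(q0, q1, t):
    # Constant angular speed along the great arc between q0 and q1.
    d = max(-1.0, min(1.0, sum(a * b for a, b in zip(q0, q1))))
    theta = math.acos(d)
    if theta < 1e-6:
        return nlerp(q0, q1, t)   # nearly identical quaternions: lerp is fine
    s = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / s
    w1 = math.sin(t * theta) / s
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))

identity = (1.0, 0.0, 0.0, 0.0)
quarter = (math.cos(math.pi / 4.0), 0.0, 0.0, math.sin(math.pi / 4.0))  # 90 deg about z
halfway = slerp(identity, quarter, 0.5)   # a 45 degree rotation around z
```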

3.3. The right tool for the job

So, when to use quaternions?

- Having to interpolate,

- having to combine many rotations, and

- having to deal with axes and angles while staying flexible (axis/angle representations are more easily accessible from quaternions than from rotation matrices).

When not to use quaternions?

- Having to transform many vectors,

- lacking understanding of matrix transforms and hoping quaternions will save your day (they most probably won't and will just complicate debugging).