Blog

Notes on the Cross Product

2023-03-19T00:00:00+00:00

The cross product is often introduced in one of two ways: either geometrically, or as a formula in coordinates.

The geometric description

The cross product of two vectors in 3D Euclidean space is a new vector that is orthogonal to both input vectors, points in the direction given by the right-hand rule (by convention), and has a length equal to the area of the parallelogram spanned by the two vectors.

The coordinate formula description

The cross product $c = a \times b$ of $a = (a_x, a_y, a_z)$ and $b = (b_x, b_y, b_z)$ is given by the formula $c = ( a_y b_z - a_z b_y, -a_x b_z + a_z b_x, a_x b_y - a_y b_x )$
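The coordinate formula translates directly into code. A minimal Python sketch, with vectors as plain `(x, y, z)` tuples (the function name `cross` is just chosen for illustration):

```python
def cross(a, b):
    """Cross product of two 3-vectors given as (x, y, z) tuples."""
    ax, ay, az = a
    bx, by, bz = b
    return (
        ay * bz - az * by,
        -ax * bz + az * bx,
        ax * by - ay * bx,
    )

# In a right-handed basis, e1 x e2 = e3.
print(cross((1, 0, 0), (0, 1, 0)))  # (0, 0, 1)
```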

An algebraic description

The cross product is less often introduced through an algebraic description. The algebraic properties are usually presented after the geometric motivation, but the cross product is in fact fully determined by its algebraic properties together with its values on a basis.

The cross product of two vectors $a$ and $b$ is bilinear in its arguments and anti-commutative, that is, $a \times b = - b \times a$. Bilinearity means that it is linear separately in each of its arguments:

Additivity: $(a + b) \times c = a \times c + b \times c$ , $a \times (b + c) = a \times b + a \times c$

and homogeneity: $(xa) \times b = x(a \times b) = a \times (xb)$.

Consider again the cross product $c = a \times b$. Let’s expand $a$ and $b$ out in terms of their coordinates in an orthonormal basis $e_1, e_2, e_3$.

$a = \sum_{i=1}^3 a_i e_i$

and

$b = \sum_{j=1}^3 b_j e_j$

Then by using the linearity of the cross product we get: $a \times b = \sum_{i=1}^3 \sum_{j=1}^3 (a_i e_i) \times (b_j e_j) = \sum_{i=1}^3 \sum_{j=1}^3 (a_i b_j) (e_i \times e_j)$

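Then we use $e_1 \times e_2 = e_3$, $e_2 \times e_3 = e_1$, $e_3 \times e_1 = e_2$ and $e_i \times e_j = - e_j \times e_i$. Anti-commutativity also gives $e_i \times e_i = - e_i \times e_i$, so $e_i \times e_i = 0$ and the diagonal terms of the double sum vanish.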

and, writing $a_x, a_y, a_z$ for $a_1, a_2, a_3$ (and likewise for $b$), we get

$a \times b = (a_y b_z - a_z b_y) (e_2 \times e_3) + (a_x b_z - a_z b_x)(e_1 \times e_3) + (a_x b_y - a_y b_x)(e_1 \times e_2)$

$a \times b = (a_y b_z - a_z b_y) e_1 - (a_x b_z - a_z b_x) e_2 + (a_x b_y - a_y b_x) e_3$

This matches the coordinate definition of the cross product. Notice that since the middle term contains $e_1 \times e_3$, we had to use anti-commutativity to flip the order and obtain $e_2$, which introduces a negative sign.
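The derivation can be checked mechanically. The sketch below (names `CYCLIC_PRODUCTS`, `basis_cross` and `cross_by_bilinearity` are hypothetical) knows only the three cyclic basis products and anti-commutativity, and expands $a \times b$ purely by bilinearity:

```python
# e1 x e2 = e3, e2 x e3 = e1, e3 x e1 = e2, stored as (i, j) -> k,
# with indices 0, 1, 2 standing for e1, e2, e3.
CYCLIC_PRODUCTS = {(0, 1): 2, (1, 2): 0, (2, 0): 1}

def basis_cross(i, j):
    """e_i x e_j as a coordinate tuple, derived only from the rules above."""
    result = [0, 0, 0]
    if (i, j) in CYCLIC_PRODUCTS:
        result[CYCLIC_PRODUCTS[(i, j)]] = 1
    elif (j, i) in CYCLIC_PRODUCTS:
        # Anti-commutativity: e_i x e_j = -(e_j x e_i).
        result[CYCLIC_PRODUCTS[(j, i)]] = -1
    # i == j falls through: e_i x e_i = 0.
    return tuple(result)

def cross_by_bilinearity(a, b):
    """a x b = sum over i, j of a_i * b_j * (e_i x e_j)."""
    total = [0, 0, 0]
    for i in range(3):
        for j in range(3):
            e = basis_cross(i, j)
            for k in range(3):
                total[k] += a[i] * b[j] * e[k]
    return tuple(total)

print(cross_by_bilinearity((1, 2, 3), (4, 5, 6)))  # (-3, 6, -3)
```

The result agrees with the coordinate formula for any integer inputs, since both compute the same polynomials in the coordinates.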

Interpretation of the cross product

Consider the case where the vectors $a$ and $b$ both lie in the plane spanned by $e_1$ and $e_2$, that is, their $e_3$ components are zero. Then the signed area of the parallelogram spanned by $a$ and $b$ is $a_x b_y - a_y b_x$, which is exactly the $e_3$ component of $a \times b$, and its magnitude $| a_x b_y - a_y b_x |$ equals the length of $a \times b$. This means that for general $a$ and $b$, the $e_3$ component of $a \times b$ is the signed area of the parallelogram spanned by the projections of $a$ and $b$ onto the $e_1 e_2$ plane. The corresponding statement holds for the other components.
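A small Python sketch of this interpretation: it computes the signed parallelogram areas of the projections onto the three coordinate planes, taking the planes in cyclic order ($e_2 e_3$, $e_3 e_1$, $e_1 e_2$) so that the signs line up with the cross product components. The helper names are hypothetical.

```python
def signed_area_2d(u, v):
    """Signed area of the parallelogram spanned by the 2D vectors u and v."""
    return u[0] * v[1] - u[1] * v[0]

def projected_areas(a, b):
    """Signed areas of the parallelograms spanned by the projections of a and b
    onto the e2e3, e3e1 and e1e2 planes, in that (cyclic) order."""
    ax, ay, az = a
    bx, by, bz = b
    return (
        signed_area_2d((ay, az), (by, bz)),  # e1 component of a x b
        signed_area_2d((az, ax), (bz, bx)),  # e2 component of a x b
        signed_area_2d((ax, ay), (bx, by)),  # e3 component of a x b
    )

print(projected_areas((1, 2, 3), (4, 5, 6)))  # (-3, 6, -3)
```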

Length Scales In Rendering

2023-02-25T00:00:00+00:00

Usually in rendering we consider spatial scale to be somewhat arbitrary, i.e. we only deal with coordinates, and whether we use meters or centimeters as a base unit for our measurement is mostly irrelevant as long as we are consistent and use coordinates that are a reasonable fit for the content we wish to render.

But it is still interesting to consider the appropriate scale to use for rendering. Here are some potentially relevant sizes for reference:

Human hair diameter: 0.06 mm
Grain of sand: 0.06 to 2.0 mm
Pixel distance on a 4K PC monitor: 0.18 mm
Size of a human pupil: 2-9 mm
Diameter of a human eye: About 2.5 cm
Interpupillary distance: About 6 cm
Viewing distance from PC monitor: about 60 cm
Size of a large PC monitor: 70 cm wide, 40 cm high
Average human height: About 1.7 m
Side length of typical countries if they were flat and square shaped: 100 to 3000 km
Radius of the earth: 6371 km

Here are the smallest and largest sizes listed above, in the same units (meters), for comparison:

0.00006
6000000.0

The scale difference is about $10^{11}$, or roughly $2^{37}$. The geometric mean is about 19 meters.

For an arena shooter style game you may only care about objects within at most a kilometer radius, in which case we may choose 0.001 to 1000 meters as our range, for which the geometric mean is exactly 1 meter.
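The ratio and geometric mean of a length range are quick to recompute from the listed values. A minimal Python sketch (`scale_stats` is a hypothetical helper; all lengths in meters):

```python
import math

def scale_stats(smallest, largest):
    """Ratio, base-2 logarithm of the ratio, and geometric mean of a length range (meters)."""
    ratio = largest / smallest
    return ratio, math.log2(ratio), math.sqrt(smallest * largest)

# Human hair diameter to (roughly) the radius of the Earth, as listed above.
print(scale_stats(0.00006, 6000000.0))
# A 1 mm to 1 km range for an arena shooter style game.
print(scale_stats(0.001, 1000.0))
```

The base-2 logarithm is a rough indication of how many bits of dynamic range the scene spans, which is relevant when choosing between fixed-point and floating-point coordinates.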

According to the internet:

Humans can at best resolve two lines about 0.01 degrees apart: a 0.026 mm gap at 15 cm from the eyes. Typically, objects 0.04 mm wide, the width of a fine human hair, are just distinguishable by good eyes, while objects 0.02 mm wide are not.

From performing a quick and simple test of displaying a single white pixel on a black background on a 4k resolution 32 inch monitor (0.18 mm pixel distance), the pixel can clearly be seen at a distance of several meters.
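The figures above can be converted between sizes and visual angles with a little trigonometry. A minimal Python sketch (helper names hypothetical), using the 0.18 mm pixel pitch from the list above:

```python
import math

def visual_angle_deg(size_mm, distance_mm):
    """Angle in degrees subtended by an object of a given size at a given distance."""
    return math.degrees(2 * math.atan(size_mm / (2 * distance_mm)))

def size_at_angle_mm(angle_deg, distance_mm):
    """Size in mm that subtends a given angle at a given distance."""
    return 2 * distance_mm * math.tan(math.radians(angle_deg) / 2)

print(size_at_angle_mm(0.01, 150))   # gap between lines resolvable at 15 cm
print(visual_angle_deg(0.18, 600))   # one 4K pixel at a 60 cm viewing distance
print(visual_angle_deg(0.18, 3000))  # the same pixel at 3 m
```

At 3 m a single pixel subtends well under 0.01 degrees, which suggests that seeing an isolated bright pixel on a black background is a detection task rather than a resolution task.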

Rasterization: Software Rasterization Pipeline

2023-02-10T00:00:00+00:00

A painting of a rasterization pipeline by DALL-E 2

This post will simply be an outline of my plans for my CPU software rasterization project.

Scope

This post will primarily concern just the rasterizer stage of a triangle-rasterizer software rendering pipeline. That is to say, it will mostly not discuss vertex shading or pixel shading, nor any other stage before or after rasterization. However, in the rasterization stage I also plan on including several sub-stages such as frustum culling, binning, occlusion culling, hierarchical Z-buffering, and so on. Some closely related systems will also be described, such as the instancing and batched drawing approach.

Features

For performance reasons I will attempt to make as good use of SIMD, multithreading and cache efficiency as possible. This will impact several aspects of the design of the rasterizer. As an experiment I plan on writing a small library to use 16-element-wide SIMD operations for 32-bit integer and single-precision floating-point data, similar to AVX-512 but implemented using AVX and AVX2, as well as some FMA instructions. The expected advantage of this is that each operand will correspond to an entire 64-byte cache line, making full use of the cache capacity, simplifying the task of keeping data contiguous and not straddling cache line boundaries, and preventing false sharing between threads. It should also make it relatively straightforward to estimate the number of cache lines used in various parts of the code. The cache line count could be used as a heuristic, or parameter, for tuning data batch and block sizes during development.
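The cache line arithmetic behind this choice can be sketched in a few lines of Python. This only models the sizes involved, not actual SIMD code; the 64-byte line size is an assumption (typical for current x86 CPUs), and `cache_lines_for` and `lanes_as_avx_halves` are hypothetical helpers.

```python
import struct

CACHE_LINE_BYTES = 64  # assumed cache line size (typical on x86)
LANES = 16             # hypothetical 16-wide vector of 32-bit values

# 16 single-precision floats, or 16 32-bit ints, fill exactly one cache line.
assert struct.calcsize("<16f") == CACHE_LINE_BYTES
assert struct.calcsize("<16i") == CACHE_LINE_BYTES

def cache_lines_for(element_count, element_bytes=4):
    """Cache lines touched by a tightly packed array that starts on a line boundary."""
    total_bytes = element_count * element_bytes
    return -(-total_bytes // CACHE_LINE_BYTES)  # ceiling division

def lanes_as_avx_halves(values):
    """Split a 16-lane vector into two 8-lane halves, mirroring how a 16-wide
    type could be emulated with two 256-bit AVX registers."""
    assert len(values) == LANES
    return values[:8], values[8:]

print(cache_lines_for(1000))  # 4000 bytes of float data
```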

Sub-stages

Frustum culling
Backface culling
Triangle setup ( some data may be re-used from the culling steps )
Triangle binning (low-res conservative rasterization)
Occlusion culling
Rasterization and depth testing

Input and output

The input to the procedure will be a pointer to a buffer of triangle vertex positions, a pointer to a buffer of triangle vertex indices, a triangle count, and a triangle index offset. The triangle index offset will be used to handle instancing and batched rendering. The output will be a Z-buffer that can be used by the next steps (shading, material calculation and so on), and a visibility buffer, which is simply a buffer, or image, of triangle indices. The indices will also have the instance index baked into them (using the triangle index offset), so an index will have to be decoded to find the actual triangle index within the mesh.
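One possible encoding, sketched in Python under the assumption that each draw or instance is given a contiguous block of IDs and that a per-frame table of offsets (a prefix sum of triangle counts) is kept for decoding. All names are hypothetical.

```python
import bisect

def build_offsets(triangle_counts):
    """offsets[k] is the triangle index offset of draw k (prefix sum of counts)."""
    offsets, running = [], 0
    for count in triangle_counts:
        offsets.append(running)
        running += count
    assert running < 2**32, "IDs must fit in an unsigned 32-bit integer"
    return offsets

def encode(draw, local_triangle, offsets):
    """Visibility buffer ID: triangle index within the mesh plus the draw's offset."""
    return offsets[draw] + local_triangle

def decode(triangle_id, offsets):
    """Recover (draw index, triangle index within the mesh) by binary search."""
    draw = bisect.bisect_right(offsets, triangle_id) - 1
    return draw, triangle_id - offsets[draw]

offsets = build_offsets([100, 150, 30])
print(decode(encode(1, 20, offsets), offsets))  # (1, 20)
```

Using `bisect_right` means draws with zero triangles (which repeat an offset) are skipped correctly.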

Frustum culling details

There are some notes on frustum culling on the ryg blog here

Backface culling details

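When we have access to the normal, or equivalent information, we can backface cull a triangle by computing a single dot product between the view direction and the triangle normal. Under perspective projection the view direction should be taken as the vector from the camera to a point on the triangle, such as one of its vertices, rather than the camera's forward direction.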
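A minimal camera-space sketch in Python, with the camera at the origin. The winding convention (a triangle is front-facing when the normal of $v_0 \to v_1 \to v_2$ points back toward the camera) is an assumption; the opposite convention just flips the comparison. Edge-on triangles are culled here.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def is_backfacing(v0, v1, v2):
    """True if the triangle faces away from a camera at the origin.

    Uses the vector from the camera to v0 (not a fixed forward direction),
    which keeps the test correct under perspective projection.
    """
    normal = cross(sub(v1, v0), sub(v2, v0))
    return dot(v0, normal) >= 0

print(is_backfacing((0, 0, 5), (0, 1, 5), (1, 0, 5)))  # False
```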

Triangle setup details

To test which pixels a triangle covers, one needs to evaluate the triangle edge functions. Some work can be saved by computing the parts that are invariant over the triangle only once.
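A minimal 2D sketch of this split between per-triangle setup and per-pixel work, using the common form $E(x, y) = Ax + By + C$. It assumes a counterclockwise triangle in a y-up coordinate system and samples at pixel centers; tie-breaking (fill) rules are not handled, and all names are hypothetical.

```python
def edge_setup(x0, y0, x1, y1):
    """Coefficients of E(x, y) = A*x + B*y + C for the edge (x0, y0) -> (x1, y1).

    E is the 2D cross product of the edge direction with (x - x0, y - y0),
    positive to the left of the edge. Computed once per edge.
    """
    a = -(y1 - y0)
    b = x1 - x0
    c = -(a * x0 + b * y0)
    return a, b, c

def covered_pixels(p0, p1, p2, width, height):
    """Yield (x, y) for pixels whose centers lie inside a counterclockwise triangle.

    Per pixel only additions are needed: one step in x adds A to each edge value.
    """
    edges = [edge_setup(*p0, *p1), edge_setup(*p1, *p2), edge_setup(*p2, *p0)]
    for y in range(height):
        # Edge values at the center of the first pixel in this row.
        values = [a * 0.5 + b * (y + 0.5) + c for a, b, c in edges]
        for x in range(width):
            if all(e >= 0 for e in values):
                yield (x, y)
            values = [e + a for e, (a, _, _) in zip(values, edges)]

print(sorted(covered_pixels((0, 0), (4, 0), (0, 4), 4, 4)))
```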

Triangle binning details

Triangle binning basically amounts to performing overestimated conservative rasterization, rasterizing triangles over tiles instead of pixels. A pixel is usually rasterized by sampling a single point, but a tile is a screen-space, axis-aligned rectangular region, so the basic sampling step becomes a rectangle-triangle overlap test. In the 2D case, the tile and the triangle are disjoint exactly when the tile is fully outside one of the triangle edges or the tile is fully outside the triangle bounding box. In the 3D case we do not always have appropriate bounding boxes for the triangles. Instead, it is possible to test whether the tile is fully outside one of the triangle edges or the triangle is fully outside one of the tile edges. This can be made efficient for a grid of tiles (where many edges are shared), and generalized to 3D and clipless rasterization, where the edges become planes for both triangles and tiles.
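The 2D version of this test can be sketched as follows (a minimal Python sketch, assuming a counterclockwise triangle in a y-up coordinate system and tiles given as `(xmin, ymin, xmax, ymax)`; the function name is hypothetical). For each triangle edge only one tile corner needs to be evaluated: the one with the largest edge function value, picked from the signs of the edge normal.

```python
def triangle_overlaps_tile(tri, tile):
    """Exact 2D overlap test between a counterclockwise triangle and a tile.

    Returns False if a separating axis exists: the triangle is outside the
    tile's extent (bounding box test) or the tile is outside a triangle edge.
    """
    xmin, ymin, xmax, ymax = tile
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    # Tile edges: compare the triangle's bounding box against the tile.
    if max(xs) < xmin or min(xs) > xmax or max(ys) < ymin or min(ys) > ymax:
        return False
    # Triangle edges: evaluate only the corner furthest along the inward normal.
    for i in range(3):
        x0, y0 = tri[i]
        x1, y1 = tri[(i + 1) % 3]
        nx, ny = -(y1 - y0), x1 - x0  # inward normal for counterclockwise winding
        cx = xmax if nx > 0 else xmin
        cy = ymax if ny > 0 else ymin
        if nx * (cx - x0) + ny * (cy - y0) < 0:
            return False  # even the most-inside corner is outside this edge
    return True

print(triangle_overlaps_tile([(0, 0), (4, 0), (0, 4)], (3, 3, 4, 4)))  # False
```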

The test is a series of dot products. The tile edges are axis aligned, so one coordinate can be omitted: for 2D this becomes a one-coordinate test, and for 3D a two-coordinate test. The vertical tile edges have normals with zero $y$ (vertical) coordinate. Since we do not care about the length of the normal, only the sign of the dot products, we can simply rotate the vector from the camera to where the edge crosses the screen coordinate axis by a quarter turn (in the $xz$ plane for a vertical edge). Conceptually, this is just an optimization of taking the cross product of the position vectors of two adjacent corners of the tile.
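A small Python sketch for a vertical tile edge, assuming the camera is at the origin and the image plane is $z = 1$ (helper names hypothetical). It checks that the quarter-turn shortcut gives a normal parallel to the cross product of two corners, and shows the resulting two-coordinate side test.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def vertical_edge_normal(x_edge):
    """Normal of the plane through the camera and the tile edge x = x_edge on z = 1.

    The vector from the camera to where the edge crosses the screen x axis is
    (x_edge, 0, 1); a quarter turn of its (x, z) part gives (-1, x_edge).
    """
    return (-1.0, 0.0, x_edge)

def vertical_edge_side(x_edge, v):
    """Two-coordinate test, positive when a point in front of the camera
    (z > 0) projects to the left of the edge, i.e. x / z < x_edge."""
    nx, _, nz = vertical_edge_normal(x_edge)
    return nx * v[0] + nz * v[2]  # the y coordinate is never needed

# Cross product of two corners on the edge x = 0.5 (from y = -1 up to y = 2).
print(cross((0.5, -1, 1), (0.5, 2, 1)), vertical_edge_normal(0.5))
```

The two printed vectors differ only by the positive factor 3 (the edge height), so they give the same signs.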

The overlap test between the tile and the triangle in 3D is closely related to frustum culling with a skew frustum. However, because there are many tiles and they all share their edges, it is better to reuse as much computation and testing as possible instead of doing many full skew frustum-tetrahedron overlap tests.

It is also worth remembering that the edges of the tiles can be defined to lie exactly on the sample point location, reducing the area of the tiles slightly compared to the case where the edges were taken to line up with the pixel borders. However, this requires one to be extra careful with numerical precision and rasterization rules.

Acceptance and rejection tests

There are three basic outcomes of the overlap test. Full acceptance, full rejection, and partial overlap. If at any point there is full acceptance or rejection, then that branch of the computation can complete, and some savings can be had. The cases where there is full overlap must be split into two cases based on whether it is the triangle that fully covers the tile or vice versa. The outcome can be broken down into these cases:

The triangle fully covers the tile ( all triangle vertices are “outside” but around the tile )
The tile fully covers the triangle ( All triangle vertices are inside the tile )
The triangle is fully outside the tile
The triangle and tile partially overlap

For the last case it is sufficient, but not necessary, that some of the triangle vertices are inside the tile. It is, however, necessary that some edge crossings occur.

When evaluating the edge function of a triangle on the tile corners it is sufficient to only test the nearest corner, which can be selected using the direction of the edge normal.
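A minimal Python sketch of the four outcomes for the 2D case, assuming a counterclockwise triangle (y up) and tiles given as `(xmin, ymin, xmax, ymax)`; the function names and result strings are hypothetical. For each edge, the corner with the largest edge value decides rejection, and the opposite corner, with the smallest value, decides whether the whole tile is on the inner side.

```python
def _edge_value(x0, y0, nx, ny, cx, cy):
    """Edge function of the edge starting at (x0, y0) with inward normal (nx, ny)."""
    return nx * (cx - x0) + ny * (cy - y0)

def classify(tri, tile):
    """Returns "outside", "tile_covers_triangle", "triangle_covers_tile" or "partial"."""
    xmin, ymin, xmax, ymax = tile
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    if max(xs) < xmin or min(xs) > xmax or max(ys) < ymin or min(ys) > ymax:
        return "outside"
    if min(xs) >= xmin and max(xs) <= xmax and min(ys) >= ymin and max(ys) <= ymax:
        return "tile_covers_triangle"  # all triangle vertices inside the tile
    covers_tile = True
    for i in range(3):
        x0, y0 = tri[i]
        x1, y1 = tri[(i + 1) % 3]
        nx, ny = -(y1 - y0), x1 - x0  # inward normal for counterclockwise winding
        # Corner with the largest edge value: if even it is outside, reject.
        best = _edge_value(x0, y0, nx, ny,
                           xmax if nx > 0 else xmin, ymax if ny > 0 else ymin)
        # Corner with the smallest edge value: if it is inside, so is the whole tile.
        worst = _edge_value(x0, y0, nx, ny,
                            xmin if nx > 0 else xmax, ymin if ny > 0 else ymax)
        if best < 0:
            return "outside"
        if worst < 0:
            covers_tile = False
    return "triangle_covers_tile" if covers_tile else "partial"

print(classify([(0, 0), (8, 0), (0, 8)], (3, 3, 6, 6)))  # partial
```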

Occlusion culling details

For occlusion culling it makes sense to use a hierarchical depth buffer with conservative depth for all but the highest-resolution level. If, at any level, we can find a set of tiles that completely cover the object to be drawn, e.g. a triangle or bounding box, and the stored (farthest) depth of every one of those tiles is nearer than the nearest point of that object, then we can safely cull it. To find the nearest depth of a triangle one can use rasterization, or one can compute the nearest depth of its camera-space bounding box without needing to compute the other values.
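A minimal Python sketch of such a hierarchy, assuming a square, power-of-two depth buffer stored as a list of rows where larger values are farther away; the function names are hypothetical and finding the covering tiles is left to the caller.

```python
def build_max_pyramid(depth):
    """Conservative max-depth pyramid: each coarser texel stores the farthest
    depth of the 2x2 texels below it."""
    levels = [depth]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        n = len(prev) // 2
        levels.append([[max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                            prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
                        for x in range(n)] for y in range(n)])
    return levels

def is_occluded(levels, level, x0, y0, x1, y1, nearest_depth):
    """True if every texel in [x0, x1] x [y0, y1] at this level (tiles assumed to
    fully cover the object) is nearer than the object's nearest depth."""
    grid = levels[level]
    return all(grid[y][x] < nearest_depth
               for y in range(y0, y1 + 1) for x in range(x0, x1 + 1))

depth = [[0.5] * 4 for _ in range(4)]
depth[3][3] = 1.0  # one far texel in the bottom-right corner
levels = build_max_pyramid(depth)
print(is_occluded(levels, 1, 0, 0, 0, 0, 0.7))  # True
```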

Rasterization and depth testing details

Visibility buffer details

Every triangle index is an unsigned 32-bit integer value that represents the triangle index within the currently rendered mesh plus a triangle index offset given as a parameter to the rasterization function.

Implementation details

Test cases

Triangle rendering test cases

Crossing frustum clipping planes
Trapped in a box ( all pixels covered )
Ground planes (large triangles with perfectly vertical normals)
Depth order ( overlapping triangles behind each other rendered in various orders )
Behind camera
Behind “near plane”
Backfacing triangles
Intersecting triangles
Crossing camera view direction plane
Two triangles ( fullscreen )
One triangle ( fullscreen )
Triangle orientations: Side/edge on triangle

Footnotes and references:

Visibility Buffer Rendering with Material Graphs, by John Hable

Rasterization on Larrabee, by Michael Abrash

Conservative rasterization, by Tomas Akenine-Möller and Timo Aila

Rasterization: Triangle Rasterization Without Clipping

2023-02-09T00:00:00+00:00

The usual way that triangle rasterization works is that there is a clipping step and a perspective divide step. The primary reason for the clipping step is to ensure that the perspective divide gives the correct projection on the screen even when one or two of the original triangle vertices are behind the camera. In that case, a simple perspective divide by the depth coordinate (and projection onto the image plane) will yield the wrong result with regard to what the next steps in the rasterizer expect. In the projective setting, i.e. using homogeneous coordinates, a zero depth coordinate corresponds to points at infinity, while points behind the camera can be interpreted as points that have gone past the point at infinity, wrapped around, and appeared again from the other side. Therefore there is a disconnect between considering the vertices of the triangle in isolation, as points, and viewing them as endpoints of line segments or as corners of triangles.

If triangles are also clipped against the top, bottom and sides of the view frustum, then all new triangle vertices (of visible triangles) will correctly land on the screen within the screen bounds.

The perspective divide and projection onto the image plane is done so that we can find the pixel coordinates of the vertices on the screen and rasterize the triangle in 2D instead of 3D.

The downside of performing explicit polygon clipping of triangles is that it can be a computationally expensive operation and be somewhat tricky to implement. In some cases a triangle will be clipped to produce two visible triangles on screen, which complicates the rasterization pipeline.

It is possible to avoid explicit polygon clipping altogether by using the fact that a triangle covers a pixel sample point if, and only if, the view direction corresponding to that pixel sample point is a member of the convex cone spanned by the vectors pointing from the camera position (the origin in camera space) to each of the vertices of the triangle. The convex cone of a set of vectors contains all vectors that can be expressed as a linear combination of the spanning vectors with non-negative coefficients.

Membership of the view direction in the convex cone can be tested by taking its dot product with the normals of the faces of the tetrahedron formed by the camera position and the triangle vertices that pass through the origin, that is, all faces except the triangle itself. If all the dot products are positive, then the view direction is a member of the convex cone. For vertices $V_0, V_1, V_2$, the normals of these planes can be found by taking the cross product of each consecutive pair of spanning vectors, $V_0 \times V_1$, $V_1 \times V_2$ and $V_2 \times V_0$, which gives consistently oriented normals for a consistent winding. The cross and dot products together amount to the scalar triple product, which can equivalently be computed as a determinant.
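A minimal camera-space sketch in Python of the clipless coverage test, with the camera at the origin. The winding is assumed to be the one for which all three triple products come out positive (the opposite winding makes all three negative); names are hypothetical.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def covers(v0, v1, v2, d):
    """True if the view direction d lies strictly inside the convex cone
    spanned by the camera-space vertices v0, v1, v2 (no clipping, no divide)."""
    return (dot(d, cross(v0, v1)) > 0 and
            dot(d, cross(v1, v2)) > 0 and
            dot(d, cross(v2, v0)) > 0)

# The third vertex is behind the camera (negative z).
print(covers((-1, -1, 1), (1, -1, 1), (0, 1, -0.5), (0, 0, 1)))  # True
```

In this example a naive per-vertex perspective divide would place the third vertex at $(0, -2)$, and the projected 2D triangle with corners $(-1, -1)$, $(1, -1)$, $(0, -2)$ would not contain the screen center, even though the triangle actually covers it.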

Computing these cross and dot products, or triple products, is essentially the same as multiplying the view direction by the inverse of the 3x3 matrix whose columns are the spanning vectors, up to a scaling factor. Note the connection with the explicit form of the 3x3 matrix inverse in terms of cross products and the determinant, and with Cramer's rule. The scaling factor is the determinant of the matrix of spanning vectors.
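The connection can be made concrete with a small Python sketch (hypothetical names): it solves $d = c_0 V_0 + c_1 V_1 + c_2 V_2$ using the rows of the explicit inverse, with `Fraction` keeping the arithmetic exact for integer inputs.

```python
from fractions import Fraction

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cone_coefficients(v0, v1, v2, d):
    """Solve d = c0*v0 + c1*v1 + c2*v2 via the explicit 3x3 inverse.

    With columns v0, v1, v2 and det = v0 . (v1 x v2), the rows of the inverse
    are (v1 x v2)/det, (v2 x v0)/det and (v0 x v1)/det (Cramer's rule).
    Expects integer inputs so that Fraction keeps the result exact.
    """
    det = dot(v0, cross(v1, v2))
    return (Fraction(dot(d, cross(v1, v2)), det),
            Fraction(dot(d, cross(v2, v0)), det),
            Fraction(dot(d, cross(v0, v1)), det))

v0, v1, v2 = (1, 0, 1), (0, 1, 1), (-1, -1, 1)
d = (-3, -2, 10)  # = 2*v0 + 3*v1 + 5*v2
print(cone_coefficients(v0, v1, v2, d))  # coefficients 2, 3, 5
```

The three triple products in the coverage test are these coefficients multiplied by the determinant, so they are all positive exactly when the determinant is positive and the view direction is a positive combination of the vertices.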

Rasterization: Deriving Triangle Edge functions

2023-01-21T00:00:00+00:00

Edge functions in the context of triangle rasterization are functions that are used to determine which side of a triangle edge a pixel lies on. Conceptually, this can be calculated easily by using the dot product of the inwards facing normal of the edge and an offset vector from a point on the edge to the pixel position. The challenge lies in efficiently determining the projected edge on the screen from the 3D triangle edges and obtaining the inwards facing normal from it.

The projected edge on the screen is the intersection of the near plane with the plane through the camera position and the two endpoints of the original triangle edge. With the camera at the origin, the normal of this plane is the cross product of the endpoint position vectors, $V_0 \times V_1$. The screen-space edge normal can then be found by projecting this normal onto the image plane. An alternative approach is to apply the perspective projection to the vertices by dividing them by their Z-component, so that all the vertices lie in the image plane. The projected triangle edge normals can then be found with a simple counterclockwise 2D rotation of the projected edge, given by $x' = -y$ and $y' = x$.

For a pixel with position vector $P$ on the image plane, a triangle with vertices $V_i = (x_i, y_i, z_i)$ and the 2D counterclockwise rotation matrix $\textbf{R}$, the edge function from the vertex $V_0$ to $V_1$ can be computed by:

\[e(P) = \left( P - \frac{V_0}{z_0} \right)_{xy} \cdot \textbf{R} \left( \frac{V_1}{z_1} - \frac{V_0}{z_0} \right)_{xy}\]

Notice that the reciprocal Z-values appear again, like in the previous post, but this time only for the vertices themselves, not for the values that go into the Z-buffer after interpolation. Since we only care about the sign of the edge function, we could also scale it by $z_0 z_1$ (which is positive when both vertices are in front of the camera) to get rid of the divisions. For $P = (p_x, p_y, 1)$ this gives $z_0 z_1 \, e(P) = P \cdot (V_0 \times V_1)$, which uses exactly the plane normal described above.
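A numerical sketch in Python comparing the projected 2D edge function with the division-free form $P \cdot (V_0 \times V_1)$, assuming the image plane is at $z = 1$ and both vertices are in front of the camera; the function names and sample values are hypothetical.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def edge_function_projected(p, v0, v1):
    """Edge function from V0 to V1 for a pixel p = (px, py) on the plane z = 1,
    computed from the perspective-divided endpoints."""
    ax, ay = v0[0] / v0[2], v0[1] / v0[2]
    bx, by = v1[0] / v1[2], v1[1] / v1[2]
    ex, ey = bx - ax, by - ay
    rx, ry = -ey, ex  # counterclockwise quarter turn: (x, y) -> (-y, x)
    return (p[0] - ax) * rx + (p[1] - ay) * ry

def edge_function_3d(p, v0, v1):
    """Division-free variant: (px, py, 1) . (V0 x V1), equal to z0 * z1 * e(P)."""
    n = cross(v0, v1)
    return p[0] * n[0] + p[1] * n[1] + n[2]

p, v0, v1 = (0.3, -0.2), (1.0, 2.0, 4.0), (-3.0, 1.0, 2.0)
print(edge_function_projected(p, v0, v1) * v0[2] * v1[2], edge_function_3d(p, v0, v1))
```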

The edge functions are related to barycentric coordinates, but since the barycentric coordinates are slightly more expensive to compute, we will prefer the edge functions where the barycentric coordinates are not needed.

Rasterization: Deriving Z values

2023-01-20T00:00:00+00:00

For determining the values in the Z buffer one needs an algorithm to compute the Z value for each pixel in a triangle. In the following we assume that we operate in camera space. That is, the camera is placed at the origin and looks along the Z-axis in this coordinate system. Further, we are given the coordinates of the pixels we are rendering in this camera space. Each pixel corresponds to a ray from the camera and so has a range of coordinate values associated with it instead of just a fixed value. We also assume that the near plane, or image plane, of the camera is located at a unit length offset along the Z-axis from the camera position. We also know the normal of the triangle, and at least one point on the plane the triangle resides in. This point is arbitrary but could for example be a vertex or the centroid. We’ll just assume it’s the first vertex.

To find the Z-coordinate of the point where the pixel ray intersects the triangle plane we first define a useful function. The function will take a position or a vector in camera space and return a scalar. It is uniquely defined as the function that is zero at the camera position, increases linearly along the normal direction, and is constant in all directions orthogonal to the normal direction. That function takes the same value at every point on the triangle plane. Since each intersection point is by definition on the triangle plane, and we know at least one point on the plane, we can figure out by how much we need to scale the position vector of the pixel on the near plane in order to intersect the triangle plane. The function in question is given by the dot product with the triangle normal vector.

We have:

The position on the near plane corresponding to a pixel: $D$
The triangle normal: $N$
A point on the triangle plane: $P$
An unknown scaling value: $t$

We want to find the $z$ coordinate of $tD$. We know that $P \cdot N = tD \cdot N$.

Solving for the scaling $t$ gives:

\[t = \frac{P \cdot N}{D \cdot N}\]

The case where the denominator $D \cdot N$ is equal to zero corresponds to the ray being parallel to the triangle plane, and can therefore be excluded at an earlier stage of the system. The Z-value can then be found by computing $D_z t$, but since we assumed that $D_z$ is plus or minus one unit, we just get either $t$ or $-t$ depending on our choice.

Notice that we do not require $D$ to be normalized, as we do not care about the length of $tD$, only the Z-value. Also notice that the expression for $t$ is not linear in $D$, and therefore does not vary linearly from pixel to pixel, since $D$ is in the denominator of the expression. But if we only intend to use the Z-value to compare to other Z-values, then we can instead compute $\frac{1}{Z}$, which puts $D$ in the numerator and makes the expression linear in $D$. Another advantage of using this reciprocal Z-value is that the division by $P \cdot N$ can be performed just once per triangle instead of once per pixel. There may also be precision advantages to using reciprocal Z. See e.g. this article by Nathan Reed.
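A minimal Python sketch of both variants, assuming $D$ lies on the image plane $z = 1$ (so $D_z = 1$) and $D \cdot N \neq 0$; the function names and sample values are hypothetical.

```python
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def depth_direct(d, n, p):
    """Z of the intersection of the ray t*d with the plane through p with normal n."""
    t = dot(p, n) / dot(d, n)
    return d[2] * t

def reciprocal_depth_setup(n, p):
    """Per-triangle setup with a single division: k such that 1/Z = k . d."""
    s = 1.0 / dot(p, n)
    return (n[0] * s, n[1] * s, n[2] * s)

def reciprocal_depth(k, d):
    """Per-pixel 1/Z: a dot product, linear in d, so it can be stepped incrementally."""
    return dot(k, d)

n, p = (0.2, -0.1, 1.0), (1.0, 2.0, 5.0)
k = reciprocal_depth_setup(n, p)
d = (0.3, 0.1, 1.0)
print(depth_direct(d, n, p), 1.0 / reciprocal_depth(k, d))
```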

Computing Analytic Disk-Rectangle Intersection Area

2022-10-10T00:00:00+00:00

The problem considered here is that of finding the area of a two dimensional region that is the intersection of a disk, i.e. a filled circle, and a filled rectangle. The motivation for this problem was the use case of computing the exact (up to numerical or implementation errors) coverage of a disk in a pixel, in order to find the correct color value for the pixel when rendering the disk. This could also be used for circle rendering by computing the coverage of an annulus, simply by computing the area of two concentric disks and subtracting the smaller one from the larger one. Of course, in order to render more than one of these shapes correctly one would have to compute the mutual occlusion of these shapes as well, which complicates the problem further, so I’ll only consider rendering a single shape here, and focus on the problem of computing the analytical area of the rectangle-disk intersection. The rectangle is assumed to be axis aligned, for simplicity, but it is possible to add a step between step 1 and 2 to rotate the coordinate system such that the rectangle becomes axis aligned without affecting the result.

The algorithm is as follows. Input: an axis aligned rectangle (box) of arbitrary size and position, and a disk of arbitrary radius and position. Output: the area of the intersection.

Translate and scale the coordinates such that the disk becomes a unit disk centered at the origin
Clip the rectangle against the X and Y axes to obtain up to four smaller rectangles, one for each XY quadrant
For each of the four quadrant rectangles:
1. Flip the rectangle such that it lands in the top right quadrant, i.e the positive X and positive Y quadrant.
2. Determine whether the rectangle intersects the unit circle
  - If the rectangle does not intersect the unit circle, then it is either fully inside or fully outside the unit disk
    - If the rectangle is fully outside, then we can skip to the next rectangle
    - Otherwise we compute the area of the rectangle and add it to the total area
  - Otherwise the rectangle intersects and we proceed with the next steps
3. Find the intersection points of the rectangle and the unit circle.
  - The rectangle and circle must intersect in exactly two points
4. Find the rectangle that tightly bounds the two intersection points
5. Compute the area of the circular segment given by the two points
6. Find the area of the triangle whose vertices are the bottom left corner of the rectangle bounding the intersection points and the two intersection points themselves
7. Compute the intersection area of the quadrant rectangle as the area of the full quadrant rectangle, minus the area of the rectangle bounding the two intersection points, plus the areas of the cap and the triangle, minus the area of the rest of the quadrant rectangle that lies outside the circle.
8. Add the intersection area to the total area and continue with the next quadrant rectangle
Return the total area
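The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: names are hypothetical, and step 7 is evaluated in the equivalent form "strips of the quadrant rectangle outside the bounding rectangle that are fully inside the disk, plus triangle, plus cap".

```python
import math

def _quadrant_area(x0, y0, x1, y1):
    """Area of [x0, x1] x [y0, y1] inside the unit disk, for 0 <= x0 <= x1, 0 <= y0 <= y1."""
    if x0 * x0 + y0 * y0 >= 1.0:      # nearest corner outside: fully outside
        return 0.0
    if x1 * x1 + y1 * y1 <= 1.0:      # farthest corner inside: fully inside
        return (x1 - x0) * (y1 - y0)
    # The circle crosses the boundary at p (upper left) and q (lower right).
    if x0 * x0 + y1 * y1 >= 1.0:
        px, py = x0, math.sqrt(1.0 - x0 * x0)   # p on the left edge
    else:
        px, py = math.sqrt(1.0 - y1 * y1), y1   # p on the top edge
    if x1 * x1 + y0 * y0 >= 1.0:
        qx, qy = math.sqrt(1.0 - y0 * y0), y0   # q on the bottom edge
    else:
        qx, qy = x1, math.sqrt(1.0 - x1 * x1)   # q on the right edge
    # Inside the rectangle [px, qx] x [qy, py] bounding p and q, the disk covers
    # the triangle (px, qy), (px, py), (qx, qy) plus the cap between chord and arc.
    triangle = 0.5 * (qx - px) * (py - qy)
    theta = math.atan2(py, px) - math.atan2(qy, qx)
    cap = 0.5 * (theta - math.sin(theta))
    # Outside the bounding rectangle, the left and bottom strips are fully inside
    # the disk and the rest is fully outside.
    left_strip = (px - x0) * (y1 - y0)
    bottom_strip = (qx - px) * (qy - y0)
    return left_strip + bottom_strip + triangle + cap

def disk_rect_area(cx, cy, r, xmin, ymin, xmax, ymax):
    """Area of the intersection of a disk and an axis-aligned rectangle."""
    # Translate and scale so that the disk becomes the unit disk.
    x0, x1 = (xmin - cx) / r, (xmax - cx) / r
    y0, y1 = (ymin - cy) / r, (ymax - cy) / r
    total = 0.0
    # Clip against the axes and flip each piece into the top right quadrant.
    for sx in (1.0, -1.0):
        lo_x, hi_x = sorted((sx * x0, sx * x1))
        lo_x, hi_x = max(lo_x, 0.0), max(hi_x, 0.0)
        for sy in (1.0, -1.0):
            lo_y, hi_y = sorted((sy * y0, sy * y1))
            lo_y, hi_y = max(lo_y, 0.0), max(hi_y, 0.0)
            if lo_x < hi_x and lo_y < hi_y:
                total += _quadrant_area(lo_x, lo_y, hi_x, hi_y)
    return total * r * r  # undo the scaling: areas scale with r squared

print(disk_rect_area(0.0, 0.0, 1.0, -2.0, -2.0, 2.0, 2.0))  # the whole disk, about pi
```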

Subproblem 1: Translate and scale to unit circle

Subproblem 2: Clip rectangle against X and Y coordinate axes

Subproblem 3: Flip all rectangles to top right quadrant

Subproblem 4: Detect rectangles intersecting unit circle

Note that the implemented version only works if the rectangle is fully in the top right quadrant! The rectangle must be clipped against the axes.

Subproblem 5: Find intersection points of circle and rectangle

Subproblem 6: Clip rectangle to tightly bound the intersection points

Subproblem 7: Compute area of cap and triangle

Subproblem 8: Compute remaining area inside triangle

Subproblem 9: Combine the area of the remaining rectangle part with the triangle and cap

Subproblem 10: Combine the area of all four quadrant pieces of the rectangle

Subproblem 10b: Another way to visualize the result in 10

The final result

Alternative approaches

Integrals

The height of the upper half of the unit circle over $x \in [-1, 1]$ is given by $y = \sqrt{1-x^2}$.

Consider also taking a slice through the unit disk and considering the indicator function of the disk, i.e. a function that is equal to one inside the disk and zero elsewhere. The result is a box function, which is discontinuous, but which can still be integrated. Its integral will be a piecewise linear function.

Using this along with appropriate limits of integration and the symmetry of the disk, it should be possible to derive an expression for the area of the intersection of the disk and the rectangle.
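One way this integral approach could be carried through for a rectangle in the top right quadrant is sketched below in Python (hypothetical names). It integrates the height of each vertical slice that lies inside both the rectangle and the disk, using the antiderivative $G(x) = \frac{1}{2}\left(x\sqrt{1-x^2} + \arcsin x\right)$ of $\sqrt{1-x^2}$.

```python
import math

def _antiderivative(x):
    """G(x) with G'(x) = sqrt(1 - x^2), valid for -1 <= x <= 1."""
    return 0.5 * (x * math.sqrt(1.0 - x * x) + math.asin(x))

def quadrant_area_by_integral(x0, y0, x1, y1):
    """Area of [x0, x1] x [y0, y1] inside the unit disk, for 0 <= x0 <= x1, 0 <= y0 <= y1."""
    if y0 >= 1.0:
        return 0.0
    # Slice height inside both shapes: clamp(sqrt(1 - x^2), y0, y1) - y0.
    # For x <= a the slice is full height, for a <= x <= b it follows the arc,
    # and for x >= b it is empty.
    a = math.sqrt(1.0 - y1 * y1) if y1 < 1.0 else 0.0
    b = math.sqrt(1.0 - y0 * y0)
    full = max(0.0, min(x1, a) - x0) * (y1 - y0)
    lo, hi = max(x0, a), min(x1, b)
    arc = 0.0
    if lo < hi:
        arc = _antiderivative(hi) - _antiderivative(lo) - y0 * (hi - lo)
    return full + arc

print(quadrant_area_by_integral(0.0, 0.0, 1.0, 1.0))  # a quarter disk, about pi / 4
```

This can serve as an independent cross-check of the geometric per-quadrant computation.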

Vectors and Tensors

2022-09-17T00:00:00+00:00

A painting of an abstract vector space by DALL-E

Motivation

Vectors and especially tensors can be a little bit mysterious the first time you encounter them, and it doesn’t help that they are often introduced by giving some intuitive idea or analogy that isn’t entirely precise, or even really correct. So if you are trying to get a firm conceptual understanding of what they are, then it is better to ground all the intuition and examples in a very formal and precise definition and build everything else on top of that.

In this post the goal is to give the true mathematical definitions of these concepts, show how to work with them, give some examples and then give some references that I found useful when I was exploring these topics.

Vectors and tensors are the central objects in vector and tensor algebra, or linear and multilinear algebra. Since linear algebra is one of the most well studied fields within all of mathematics, a large body of useful theorems immediately becomes available once something is identified as having a vector-type structure, especially if there are also some natural ways to define inner products and such. By using abstract definitions we broaden the applicability of those results as much as possible.

Since tensors build on top of the concept of vectors, we will start with vectors.

Vectors

Some common ways of explaining what vectors are often include “A vector is like a little arrow (directed line segment)”, “A vector is the difference of two points” or “A vector is just a list (of numbers)”.

These ways of thinking about vectors can be helpful when either reasoning about vectors in geometry or calculating with vectors in a computer program, but they are not good at capturing the essence of what makes a vector a vector, or at serving as a formal definition. We would like to have a formal, abstract and universal definition that captures the core properties of vectors and underlies all possible concrete examples of vectors.

Before we proceed it is worth pointing out that the question of what a vector is, is already a bit misleading, since considering a single vector in isolation does not make sense. Vectors are always part of some structure or algebraic system, called a vector space.

So to give the “one true” definition of a vector, it is:

A vector is an element of a vector space

Ok, so while correct, this definition is not particularly enlightening without first knowing what a vector space is. Once we have defined a vector space, we can start to understand vectors as the objects that together make up a vector space.

So without further ado, here is the definition of a vector space:

A vector space over a field $F$ is a set $V$, whose elements are called vectors, along with two operations: vector addition, a function from $V \times V$ to $V$, and scalar multiplication, a function from $F \times V$ to $V$. These operations must satisfy the following eight axioms:

Associativity of vector addition

Commutativity of vector addition

Existence of an identity element of vector addition

Existence of inverse elements of vector addition

Compatibility of scalar multiplication with field multiplication

Existence of an identity element of scalar multiplication

Distributivity of scalar multiplication with respect to vector addition

Distributivity of scalar multiplication with respect to field addition

I’ll give a brief informal explanation of the definition and axioms, and then some formal statements.

I won’t give a detailed definition of a field here, but you can either refer to the definition on Wikipedia, or just think of it as any number system that satisfies the algebraic rules of elementary algebra with the operations of addition, subtraction, multiplication and division. Examples include the rational, real and complex numbers.

Associativity is the property that the result does not depend on how the expression is grouped into binary operations.
Commutativity is the property that the operation is symmetric in its arguments.
Distributivity of one operation over another is the property that applying the first operation to the result of the second gives the same result as applying the first operation to each argument of the second and then combining the results with the second operation.
An identity element for an operation is an element that leaves the other operand unchanged.
An inverse of an element, for an operation, is an element that gives the identity element when combined with it.

It is useful to make some observations about what is not included in the definition of a vector space. Namely, it does not contain any references to lists, dimension or indices, or to basis vectors or coordinates. It also does not mention scalar (dot), inner or cross products, norms, nor any vector product or division. Neither geometry nor physics is mentioned or referred to, and vectors are not described as arrows or directed line segments. There is also no mention of linear dependence, linear combinations or spans; those are rather constructions that are built on top of the definition of a vector space. There is no mention of row or column vectors, and no transpose, just vectors.

Formal expressions

For the following formal expressions, we will use the symbol $+$ for vector addition, $*$ for scalar multiplication, $u, v, w$ for vectors from $V$ and $a, b, c$ for scalars from $F$.

The axioms can then be given as

$(u + v) + w = u + (v + w)$

$u + v = v + u$

There is a vector $0 \in V$ such that $v + 0 = v$

There is a vector $-v \in V$ for every $v \in V$ such that $v + (-v) = 0$

$(ab) * v = a * (b * v)$

$I * v = v$, where $I$ is the multiplicative identity of $F$ (usually denoted by just $1$)

$a * (u + v) = a * u + a * v$

$(a + b) * v = a * v + b * v$

An aside: Closure and linear subspaces

It is important to keep in mind that the two operations in the definition of a vector space produce vectors in the same space. A subset of a vector space is itself a vector space, with the same operations, if applying those operations to vectors in the subset always gives vectors in the subset again. We then say that the subset is closed under vector addition and scalar multiplication, and call it a linear subspace. Linear subspaces are vector spaces in their own right, even though they need not contain all the vectors of the original space. Every vector space is a linear subspace of itself, since it is a subset of itself and closed under the operations.

Examples of vector spaces

“Ordinary” numbers
Directed line segments as vectors
Polynomials
Functions (infinite dimensional)
Arrays of numbers with vector operations
Matrices
Quaternions

Linear maps and Linearity

The operations used in the definition of vector spaces are exactly the ones used in the definition of linearity. A function $f$ is linear if it satisfies both

Additivity: $f(x + y) = f(x) + f(y)$
Homogeneity of degree 1: $f(\alpha x) = \alpha f(x)$ for all scalars $\alpha$
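The two conditions can be tested at sample points. In this Python sketch, which is illustrative and assumes maps on $\mathbb{Q}^2$ given by a matrix $M$, the map $f(x) = Mx$ passes both tests while the affine map $f(x) = Mx + b$ with $b \neq 0$ fails additivity. As with any finite test, passing is evidence rather than proof.

```python
from fractions import Fraction

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(alpha, v):
    return tuple(alpha * vi for vi in v)

def matvec(M, x):
    return tuple(sum(mij * xj for mij, xj in zip(row, x)) for row in M)

M = ((1, 2), (3, 4))
b = (1, 0)

def linear_map(x):
    return matvec(M, x)  # f(x) = Mx

def affine_map(x):
    return add(matvec(M, x), b)  # f(x) = Mx + b, with b != 0

def looks_linear(f, x, y, alpha):
    """Test additivity and homogeneity of degree 1 at the given samples."""
    additive = f(add(x, y)) == add(f(x), f(y))
    homogeneous = f(scale(alpha, x)) == scale(alpha, f(x))
    return additive and homogeneous

x, y, alpha = (1, -2), (3, 5), Fraction(2, 3)
print(looks_linear(linear_map, x, y, alpha))  # True
print(looks_linear(affine_map, x, y, alpha))  # False
```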

Dimensionality, bases, linear dependence and spans

Dual vector spaces

The dual space $V^*$ of a vector space $V$ over $F$ is the vector space consisting of all linear functions from $V$ to $F$. A dual basis is a basis for the dual space with the property that evaluating the dual basis vectors on the primal basis vectors gives the Kronecker delta. That is, for basis vectors $e_i$ and dual basis vectors $e^*_j$ we have

\[e_{j}^{*} ( e_i ) = \delta_{ij}\]
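In coordinates, a dual basis can be computed by matrix inversion: if the basis vectors $e_i$ are the columns of an invertible matrix $B$, then the rows of $B^{-1}$ satisfy $e_j^*(e_i) = \delta_{ij}$, because $B^{-1}B = I$. Here a row acts on a vector by the usual sum of componentwise products. A small Python sketch in $\mathbb{Q}^2$, with an arbitrarily chosen, non-orthonormal basis and illustrative helper names:

```python
from fractions import Fraction

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("basis vectors are linearly dependent")
    return ((d / det, -b / det), (-c / det, a / det))

def evaluate(covector, vector):
    # A covector (row) acts on a vector by the sum of componentwise products.
    return sum(c * v for c, v in zip(covector, vector))

# A (non-orthonormal) basis of Q^2, chosen for illustration.
e = [(2, 1), (1, 1)]

# Put the basis vectors in the columns of B; the rows of B^{-1} form the dual basis.
B = ((e[0][0], e[1][0]), (e[0][1], e[1][1]))
dual = inverse_2x2(B)

table = [[evaluate(dual[j], e[i]) for i in range(2)] for j in range(2)]
print(table == [[1, 0], [0, 1]])  # True: e*_j(e_i) is the Kronecker delta
```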

Tensors

Bilinearity and multilinearity

Bilinearity is the property that a function is linear in each of its two arguments separately.

That is, a function $f$ is bilinear if

\[f(a * u + v, w) = a * f(u, w) + f(v, w)\]

and

\[f(u, a * v + w) = a * f(u, v) + f(u, w)\]

Some examples of bilinear operators include regular multiplication of numbers, the dot product and the cross product.
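Bilinearity can be spot-checked at sample arguments. This Python sketch, with illustrative helper names, tests the two defining identities for ordinary multiplication, the dot product and the cross product (in its usual coordinate formula), and shows that adding a constant, $f(u, w) = uw + 1$, breaks bilinearity. It also checks anti-commutativity of the cross product at the sample points.

```python
from fractions import Fraction
from operator import mul

def cross(a, b):
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def axpy(a, x, y):
    """Return a * x + y for numbers, or componentwise for tuples."""
    if isinstance(x, tuple):
        return tuple(a * xi + yi for xi, yi in zip(x, y))
    return a * x + y

def is_bilinear_at(f, u, v, w, a):
    """Check f(a*u + v, w) = a*f(u, w) + f(v, w) and f(u, a*v + w) = a*f(u, v) + f(u, w)."""
    first = f(axpy(a, u, v), w) == axpy(a, f(u, w), f(v, w))
    second = f(u, axpy(a, v, w)) == axpy(a, f(u, v), f(u, w))
    return first and second

def shifted_product(x, y):
    return x * y + 1  # not bilinear because of the constant term

a = Fraction(3, 2)
u, v, w = (1, 2, 3), (0, 1, -1), (4, 0, 2)
print(is_bilinear_at(mul, 2, 5, -3, a))              # True
print(is_bilinear_at(dot, u, v, w, a))               # True
print(is_bilinear_at(cross, u, v, w, a))             # True
print(is_bilinear_at(shifted_product, 2, 5, -3, a))  # False
print(cross(u, v) == tuple(-c for c in cross(v, u)))  # True: anti-commutativity
```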

The tensor product of vector spaces

As with vectors, a good way to define tensors is as elements of tensor products of vector spaces. But again, that raises the question of what the tensor product of vector spaces is. Some care must be taken, however: while tensor products of vectors are fairly straightforward to define, not all tensors are (just) tensor products of vectors. It is the case, though, that all tensors are elements of tensor products of vector spaces. With that in mind, let’s look at the properties that the tensor product of vectors must satisfy. In essence, they are simply the properties of bilinearity. The tensor product of $u \in U$ and $v \in V$ is denoted $u \otimes v$, and for vectors $u, v, w$ from the appropriate spaces and scalars $a$ it satisfies

\[(a * u + v) \otimes w = a * u \otimes w + v \otimes w\]

and

\[u \otimes (a * v + w) = a * u \otimes v + u \otimes w\]

So it may be tempting to say that the tensor product space $U \otimes V$ is just the set of all tensor products of vectors from $U$ with vectors from $V$. However, this would leave out tensors of the form $u \otimes v + r \otimes s$. For example, if $u$, $v$, $r$ and $s$ are linearly independent vectors from a four-dimensional space $V$ and we consider the tensor product space $V \otimes V$, then $u \otimes v + r \otimes s$ cannot be written as a single tensor product of two vectors.
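One way to see this concretely is to identify $V \otimes V$, for $V = \mathbb{Q}^4$, with $4 \times 4$ matrices by taking coordinates in the basis $e_i \otimes e_j$. Then $u \otimes v$ corresponds to the outer product matrix with entries $u_i v_j$, which has rank at most one, and the rank does not depend on the choice of bases. The Python sketch below (illustrative helper names and example vectors) shows that $u \otimes v + r \otimes s$ has rank two, while a sum such as $u \otimes v + u \otimes s = u \otimes (v + s)$ stays at rank one.

```python
from fractions import Fraction

def outer(u, v):
    """Coordinates of u ⊗ v in the basis e_i ⊗ e_j: the matrix with entries u_i v_j."""
    return [[Fraction(ui) * vj for vj in v] for ui in u]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def rank(A):
    """Rank via Gauss-Jordan elimination in exact arithmetic."""
    M = [row[:] for row in A]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                factor = M[i][c] / M[r][c]
                M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

# Four linearly independent vectors in Q^4 (the standard basis, for simplicity).
u, v, r, s = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
print(rank(outer(u, v)))                    # 1: a simple tensor
print(rank(add(outer(u, v), outer(r, s))))  # 2: not a tensor product of two vectors
print(rank(add(outer(u, v), outer(u, s))))  # 1: equals u ⊗ (v + s)
```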

What we can do to remedy this is to include every linear combination of tensor products of vectors from $U$ with vectors from $V$. Since $U \otimes V$ is a vector space, we can choose a basis for it, and the span of the basis vectors yields all the tensors in that space. The span does not depend on which basis was chosen, so the result is basis-independent. This matters, because the tensor product of $U$ and $V$ should not depend on a choice of basis in either space.

Some more abstract but equivalent definitions of the tensor product of vector spaces, which do not refer to a basis at all, use quotient spaces or the universal property.

The Evaluation Map

Tensor Contraction

Covariance and Contravariance

Dual vectors are also frequently called covectors or linear forms. Components in a basis are called covariant or contravariant depending on whether they scale proportionally or inversely proportionally to a change of the basis vectors: the components of a vector are contravariant, while the components of a covector are covariant. Covariant components are usually written with lower indices, while contravariant components are usually written with upper indices.

It is important to note that vectors themselves do not change under a change of basis; they are invariant. Only the components of a vector in a basis can be covariant or contravariant.
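A small two-dimensional example makes the two transformation rules concrete. In this Python sketch, the columns of a hypothetical matrix $P$ are the new basis vectors written in the old basis, here $e'_1 = 2e_1$ and $e'_2 = e_1 + e_2$. The components of a vector then transform with $P^{-1}$, the components of a covector transform with $P$ (computed as $P^T$ applied to the component list, so doubling $e_1$ doubles the first covector component), and the number obtained by letting the covector act on the vector is the same in both bases.

```python
from fractions import Fraction

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("the new basis vectors are linearly dependent")
    return ((d / det, -b / det), (-c / det, a / det))

def matvec(M, x):
    return tuple(sum(mij * xj for mij, xj in zip(row, x)) for row in M)

def transpose(M):
    return tuple(zip(*M))

# Columns of P are the new basis vectors e'_1 = 2 e_1 and e'_2 = e_1 + e_2 (illustrative).
P = ((2, 1), (0, 1))

v = (3, 4)       # contravariant components of a vector in the old basis
alpha = (5, -1)  # covariant components of a covector: alpha(e_1), alpha(e_2)

v_new = matvec(inverse_2x2(P), v)        # vector components transform with P^{-1}
alpha_new = matvec(transpose(P), alpha)  # covector components alpha(e'_j) transform with P

# The covector acting on the vector gives the same number in both bases.
pairing_old = sum(a * x for a, x in zip(alpha, v))
pairing_new = sum(a * x for a, x in zip(alpha_new, v_new))
print(v_new, alpha_new)            # (Fraction(-1, 2), Fraction(4, 1)) (10, 4)
print(pairing_old == pairing_new)  # True
```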

The Metric Tensor

Resources and references

A small set of lectures introducing tensors, by Daniel Chan.

“Tensors for Beginners” by eigenchris. This series uses index notation for tensors, but gives some good intuition and a feel for how tensors work.

“A Concrete Introduction to Tensor Products” by Mu Prime Math.

“Linear Algebra via Exterior Products”, a book by Sergei Winitzki. This book is more focused on linear algebra and the exterior product, but also covers the tensor product in a good way.

“What is a tensor anyway?” by Michael Penn. Although perhaps not suitable as a first introduction to tensors, it’s a good ‘take on tensor products from a slightly different point of view’, as he puts it.

“Riemannian Geometry” by Math Curator Zanachan. A playlist on Riemannian geometry, but the first few videos are on tensors and exterior products.

Polynomial interpolation

2022-09-10T00:00:00+00:00

A painting of polynomial functions by DALL-E

Connect the dots

How do you get a straight line through two points? You could just represent the line, or line segment, between points $A$ and $B$ by the pair $(A, B)$, which can be ordered or unordered based on whether you care about the direction or not. This abstracts away the geometry and only retains the connectedness. So this is the way it would be represented in topology or in graphs, as it is agnostic to the path taken between the points. It is a non-parametric approach which can be thought of as having omitted information about the travel time.

But what if we want to find the points on the path between the two points? We could design a function $g$ that can answer whether a given point is on the path or not: $g(X) = 0$ if $X$ is on the path, and $g(X) \neq 0$ otherwise. For example, let’s say we represent the points by vectors in a normed vector space. Then we can use the function

\[g(X) = ||X - A|| + ||X - B|| - ||B - A||\]

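This function makes use of the triangle inequality, which guarantees that $g(X) \ge 0$. With the Euclidean norm, equality holds, so that $g(X) = 0$, exactly when $X$ lies on the line segment between $A$ and $B$. A point on the straight line through $A$ and $B$ but outside the segment gives a positive value, since then one of $||X - A||$ and $||X - B||$ equals the sum of the other distance and $||B - A||$. (To indicate the whole line using distances alone, we would instead test whether any one of the three distances equals the sum of the other two.) Consequently, any point with $g(X) = 0$ also satisfies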

\[||X - A|| \le ||B - A||\]

and

\[||X - B|| \le ||B - A||\]
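A minimal Python sketch of this test, assuming the Euclidean norm via `math.dist` (Python 3.8+) and an illustrative tolerance, since floating point rounding means $g(X) = 0$ can only be checked approximately. The second example is collinear with $A$ and $B$ but lies beyond $B$, and is correctly rejected.

```python
import math

def g(X, A, B):
    """||X - A|| + ||X - B|| - ||B - A|| with the Euclidean norm (>= 0 up to rounding)."""
    return math.dist(X, A) + math.dist(X, B) - math.dist(B, A)

def on_segment(X, A, B, tol=1e-9):
    # Rounding means we can only test g(X) == 0 up to a tolerance.
    return abs(g(X, A, B)) <= tol

A, B = (0.0, 0.0), (4.0, 2.0)
print(on_segment((2.0, 1.0), A, B))  # True: the midpoint
print(on_segment((6.0, 3.0), A, B))  # False: on the line, but beyond B
print(on_segment((2.0, 2.0), A, B))  # False: off the line
```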

If we have a dot product, we can also use it to find the projection $p(X)$ of the point $X$ onto the line and its rejection $r(X)$ from the line, where

\[p(X) = \frac{(X - A) \cdot (B - A)}{(B - A) \cdot (B - A)} (B - A) + A\]

and

\[r(X) = X - p(X)\]

These are vector-valued functions, and a function $g$ that satisfies our requirement for the whole line through $A$ and $B$ can then be defined as

\[g(X) = ||r(X)||\]
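The projection and rejection translate directly into code. In this Python sketch (illustrative helper names, points as tuples, $A \neq B$ assumed), $g$ is zero at a point on the line outside the segment, unlike the distance-based function, and equals the distance from $X$ to the line otherwise.

```python
import math

def sub(P, Q):
    return tuple(p - q for p, q in zip(P, Q))

def dot(P, Q):
    return sum(p * q for p, q in zip(P, Q))

def projection(X, A, B):
    """p(X): the point on the line through A and B closest to X (assumes A != B)."""
    d = sub(B, A)
    t = dot(sub(X, A), d) / dot(d, d)
    return tuple(a + t * di for a, di in zip(A, d))

def rejection(X, A, B):
    """r(X) = X - p(X), the part of X - A orthogonal to the line."""
    return sub(X, projection(X, A, B))

def g(X, A, B):
    """||r(X)||: zero exactly on the line through A and B."""
    r = rejection(X, A, B)
    return math.sqrt(dot(r, r))

A, B = (0, 0), (4, 0)
print(projection((1, 3), A, B))  # (1.0, 0.0)
print(g((1, 3), A, B))           # 3.0
print(g((6, 0), A, B))           # 0.0: on the line, outside the segment
```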

Parametric lines and curves

The previous examples were abstract topological, and implicit non-parametric, respectively. We can also represent the line using an explicit parametric approach. In this case, we can define a function that takes a parameter $t$, which is often conceptualized as the travel time, and gives an explicit point on the line. We can identify this point $X$ as a function of $t$ with

\[X(t) = (B - A) t + A\]

This function returns $A$ when $t = 0$, $B$ when $t = 1$ and some point in the line segment when $t$ is in the interval $[0, 1]$. With some reordering, using the distributive property, we can alternatively write it as

\[X(t) = (1 - t)A + tB\]

Which can be interpreted as a fade out of $A$ and a fade in of $B$ as $t$ moves from $0$ to $1$. This interpolation between $A$ and $B$ could also be represented by a different parameterization, e.g.

\[X(s) = (1 - s)B + sA\]

where the order of $A$ and $B$ is reversed. Here, the parameter value associated with each point changes, but the image of the function is unchanged. The two parametrizations are related by $s = 1 - t$. Other parametrizations can be obtained by scaling and translating $t$.
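A minimal Python sketch of the parametric form, using exact `Fraction`s so the comparison of the two parametrizations is exact; the function name `lerp` follows the common graphics abbreviation and is otherwise illustrative.

```python
from fractions import Fraction

def lerp(A, B, t):
    """X(t) = (1 - t) A + t B, componentwise; t = 0 gives A and t = 1 gives B."""
    return tuple((1 - t) * a + t * b for a, b in zip(A, B))

A, B = (0, 2), (4, 6)
t = Fraction(1, 4)
print(lerp(A, B, t))      # (Fraction(1, 1), Fraction(3, 1))
print(lerp(B, A, 1 - t))  # the reversed parametrization with s = 1 - t: same point
```

One design note: in floating point, the $(1 - t)A + tB$ form returns exactly $B$ at $t = 1$, whereas the form $A + t(B - A)$ can be off by rounding when $A$ and $B$ differ greatly in magnitude.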

Linear and affine

The function $X(t)$ traces a line through $A$ and $B$, and is often referred to as linear interpolation, or lerp for short, especially in computer graphics and animation. The function itself is technically not linear but affine, because of the constant term in the expression. For real-valued functions this is the same as the difference between $f(x) = ax$ and $f(x) = ax + b$, where the former function is linear and the latter is affine. Linear functions are special cases of affine functions with the constant term set to zero, while affine functions can be represented as linear functions in one extra variable, say $y$, with the additional constraint that the variables sum to one, e.g.

\[f(x) = ax + by\]

and

\[x + y = 1\]

so that we can solve for $y$ to find that $y = 1 - x$ and retrieve the familiar form of the affine function

\[f(x) = ax + b(1-x)\]

The same is also true for vector valued linear and affine functions, and analogously for multilinear and multiaffine functions.

More points

If we want to parametrically interpolate more than two points, we can no longer use just one line, unless the points happen to be collinear. We can either break the interpolating function into linear pieces, giving a piecewise defined function consisting of line segments between each pair of neighbouring points in the sequence, or we can interpolate the points by a curve. The curve can be made out of various classes of functions, for example polynomials, rational functions or trigonometric functions. Each approach has advantages and disadvantages. Combining the piecewise approach with smooth functions gives rise to various spline techniques, most commonly polynomial splines.
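The piecewise linear approach is the simplest to sketch. In the Python below, which is illustrative, the parameter $t$ runs from $0$ to $N - 1$ for $N \ge 2$ points, and the segment between points $k$ and $k + 1$ is traced as $t$ goes from $k$ to $k + 1$, using the same $(1 - t)A + tB$ form on each piece.

```python
import math

def lerp(A, B, t):
    return tuple((1 - t) * a + t * b for a, b in zip(A, B))

def polyline(points, t):
    """Piecewise linear curve through points; segment k is traced for t in [k, k + 1]."""
    n = len(points) - 1  # number of segments; needs at least two points
    if not 0 <= t <= n:
        raise ValueError("t must lie in [0, len(points) - 1]")
    k = min(math.floor(t), n - 1)  # the last segment also covers t = n
    return lerp(points[k], points[k + 1], t - k)

points = [(0, 0), (2, 0), (2, 2)]
print(polyline(points, 0.5))  # (1.0, 0.0): halfway along the first segment
print(polyline(points, 1.5))  # (2.0, 1.0): halfway along the second segment
```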

Polynomial interpolation

In order for a polynomial function to interpolate a set of points, the polynomial must in general be of degree one less than the number of points we wish to interpolate, e.g. linear (degree one) to interpolate two points, and cubic (degree three) to interpolate four points. Keep in mind that in degenerate cases, the interpolating polynomial has a lower degree than that. For example, if all the points have the same value, then the interpolating polynomial is just a constant. If they are collinear, then the polynomial has degree at most one, and if they all lie on a parabola, then it is at most quadratic. But we can always interpolate $N$ points (with distinct $x$-values) with a polynomial of degree $d = N-1$ if we include the lower degree polynomials as special cases of the degree $d$ polynomials.

Polynomials can be represented in various different bases, as linear combinations of basis polynomials, but the interpolating polynomial of degree at most $N-1$ through a given set of $N$ points is unique. The various interpolation methods, e.g. Lagrange interpolation and Newton interpolation, just express this same polynomial in different bases.
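One direct way to find the interpolating polynomial is to write it in the monomial basis $1, x, \ldots, x^{N-1}$ and solve the resulting linear system (a Vandermonde system), which has exactly one solution when the $x_i$ are distinct. The Python sketch below uses exact `Fraction` arithmetic and plain Gauss-Jordan elimination; the function name is illustrative. The second example shows a degenerate case: three collinear points give a zero coefficient for $x^2$.

```python
from fractions import Fraction

def monomial_coefficients(xs, ys):
    """Coefficients [c_0, ..., c_{N-1}] with sum_k c_k * x_i**k = y_i for all i.

    Solves the Vandermonde system by Gauss-Jordan elimination in exact
    arithmetic; assumes the xs are distinct, so the system is invertible.
    """
    n = len(xs)
    # Augmented matrix [V | y] with V[i][k] = x_i**k.
    M = [[Fraction(x) ** k for k in range(n)] + [Fraction(y)] for x, y in zip(xs, ys)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        for i in range(n):
            if i != c:
                factor = M[i][c] / M[c][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

print(monomial_coefficients([0, 1, 2], [1, 3, 7]))  # [Fraction(1, 1), Fraction(1, 1), Fraction(1, 1)], i.e. 1 + x + x^2
print(monomial_coefficients([0, 1, 2], [1, 3, 5]))  # [Fraction(1, 1), Fraction(2, 1), Fraction(0, 1)], i.e. 1 + 2x
```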

So how can we construct these polynomials? Let’s try to find a set of suitable basis polynomials such that finding the coefficients of the polynomial interpolating the points becomes easy.

A polynomial $p(x)$ is said to interpolate the values $y_i$ at the nodes $x_i$ if $p(x_i) = y_i$ for all $i$. Suppose we have a set of basis polynomials $\ell_i(x)$ such that $\ell_i(x_i) = 1$ and $\ell_i(x_j) = 0$ for all $j \neq i$. Then finding the coefficients is easy: we simply set the coefficient of $\ell_i$ to $y_i$. At $x_i$, every other basis polynomial is zero by definition, so all the other terms vanish and do not change the value of $p(x_i)$, and we get $p(x_i) = y_i$, as required.

Now, how do we come up with such basis polynomials? Let’s start by considering polynomials of the form

\[\ell(x) = (x - x_0)(x - x_1) \ldots (x - x_d)\]

A plot of the function $\left(x\ -\frac{1}{2}\right)\left(x-1\right)\left(x-2\right)\left(x-\frac{5}{2}\right)$

It is clear that $\ell(x)$ is zero at $x = x_i$ for all $i = 0, \ldots, d$, since one of the factors in the product is then zero, which makes the entire product zero. Another property of this polynomial is that it is nonzero everywhere else: it has degree $d + 1$, so when the nodes are distinct, the $d + 1$ nodes are its only roots.

So we have found a polynomial that is zero at all the given nodes; now we need a way to make it nonzero at one of them. Again, this is easy, since we can simply omit that factor from the product. If we want the polynomial to be nonzero at $x_i$, we omit the $(x - x_i)$ factor to get

\[\ell(x) = (x - x_0) \ldots (x - x_{i-1}) (x - x_{i+1}) \ldots (x - x_d)\]

So far so good, but we have not yet found the desired basis polynomials. We did not just require that the basis polynomials $\ell_i$ be nonzero at $x_i$, but that they be equal to one there. The fix for this is also straightforward: we compute the value of the previous polynomial at $x_i$ and divide by it. Evaluating $\ell$ at $x_i$ gives

\[\ell(x_i) = (x_i - x_0) \ldots (x_i - x_{i-1}) (x_i - x_{i+1}) \ldots (x_i - x_d)\]

Since this is just a nonzero scaling factor (the nodes are distinct), the polynomial still has the prescribed roots at $x_j$ for $j \neq i$, and we get the final form

\[\ell_i(x) = \frac{(x - x_0) \ldots (x - x_{i-1}) (x - x_{i+1}) \ldots (x - x_d)}{(x_i - x_0) \ldots (x_i - x_{i-1}) (x_i - x_{i+1}) \ldots (x_i - x_d)}\]

or in product notation

\[\ell_i(x) = \prod_{\substack{j=0\\ j \neq i}}^{d} \frac{(x - x_j)}{(x_i - x_j)}\]

These basis polynomials are called the Lagrange basis polynomials.

A plot of the function $\ell_2(x) = \frac{\left(x\ -\frac{1}{2}\right)\left(x-1\right)\left(x-2\right)\left(x-\frac{5}{2}\right)}{\left(\frac{3}{2}\ -\frac{1}{2}\right)\left(\frac{3}{2}-1\right)\left(\frac{3}{2}-2\right)\left(\frac{3}{2}-\frac{5}{2}\right)}$

for the node sequence $x_0 = \frac{1}{2}$, $x_1 = 1$, $x_2 = \frac{3}{2}$, $x_3 = 2$ and $x_4 = \frac{5}{2}$

The final interpolating polynomial can then be expressed as simply

\[p(x) = \sum_{i = 0}^d y_i \ell_i(x)\]
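The product formula and the final sum translate almost literally into code. This Python sketch, with illustrative function names and exact `Fraction` arithmetic, uses the node sequence from the plot above. Since the interpolant through five points is unique among polynomials of degree at most four, it must reproduce any cubic sampled at those nodes exactly.

```python
from fractions import Fraction

def lagrange_basis(xs, i, x):
    """ell_i(x): product over j != i of (x - x_j) / (x_i - x_j); assumes distinct nodes."""
    result = Fraction(1)
    for j, xj in enumerate(xs):
        if j != i:
            result *= Fraction(x - xj) / (xs[i] - xj)
    return result

def lagrange_interpolate(xs, ys, x):
    """p(x) = sum_i y_i ell_i(x)."""
    return sum(y * lagrange_basis(xs, i, x) for i, y in enumerate(ys))

# The node sequence x_0, ..., x_4 used in the plot above.
xs = [Fraction(1, 2), 1, Fraction(3, 2), 2, Fraction(5, 2)]
print(lagrange_basis(xs, 2, Fraction(3, 2)))  # 1

def f(x):
    return x ** 3 - 2 * x  # a cubic, sampled at the nodes

ys = [f(x) for x in xs]
print(lagrange_interpolate(xs, ys, Fraction(7, 4)) == f(Fraction(7, 4)))  # True
```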

On the binomial theorem

2022-09-08T00:00:00+00:00

The binomial theorem is a useful tool when working with polynomials. It can be written as follows:

\[(x + y)^d = \sum_{i=0}^d \binom{d}{i} x^{d-i}y^i\]

The name refers to the sum of two terms, a binomial (compare monomial, binomial, …, polynomial). The factor $\binom{d}{i}$ is called the binomial coefficient, and can be read as “$d$ choose $i$”, alluding to the fact that the binomial coefficients count the number of different ways of choosing $i$ items from a set of $d$ distinct items. To put it another way, it is the number of distinct subsets of size $i$ of a set of size $d$, or the number of ways of drawing $i$ items out of $d$, disregarding the order of the draws and without returning drawn items.

The binomial coefficients can be defined in a number of ways. They can be defined to be the coefficients where the binomial theorem holds, but they can also be defined in terms of a ratio of factorials

\[\binom{d}{i} = \frac{d!}{(d-i)!\,i!}\]

Or via the recurrence relation

\[\binom{d}{i} = \binom{d-1}{i} + \binom{d-1}{i-1}\]

With the initial values (or boundary values)

\[\binom{d}{0} = \binom{d}{d} = 1\]

Since the factorial function counts the number of permutations of a sequence, the definition based on factorials can be interpreted as starting out with the number of permutations of $d$ elements and then accounting for the fact that we do not care about the order of the chosen elements, $i!$, or the order of the rest of the items, $(d-i)!$, by dividing them out.

The recurrence relation definition of the binomial coefficients provides an easy way to work out Pascal’s triangle by hand.
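The factorial formula and the recurrence can be checked against each other in a few lines of Python. This sketch (illustrative function names) builds Pascal’s triangle row by row from the recurrence and the boundary values, and can be compared with the factorial definition and with `math.comb` from the standard library (Python 3.8+).

```python
from math import comb, factorial

def binom_factorial(d, i):
    """d! / ((d - i)! i!), using exact integer division."""
    return factorial(d) // (factorial(d - i) * factorial(i))

def pascal_rows(n):
    """Rows 0..n of Pascal's triangle from the recurrence C(d, i) = C(d-1, i) + C(d-1, i-1)."""
    rows = [[1]]
    for d in range(1, n + 1):
        prev = rows[-1]
        # Boundary values C(d, 0) = C(d, d) = 1; interior entries from the recurrence.
        rows.append([1] + [prev[i] + prev[i - 1] for i in range(1, d)] + [1])
    return rows

for row in pascal_rows(4):
    print(row)
# [1]
# [1, 1]
# [1, 2, 1]
# [1, 3, 3, 1]
# [1, 4, 6, 4, 1]
```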

Another interesting fact about the binomial coefficients for a given degree $d$ is that they sum to $2^d$, that is

\[\sum_{i=0}^{d} \binom{d}{i} = 2^d\]

This can be seen by expanding $(x + y)^d$ with the distributive property and counting the terms (before like terms are collected). Each term in the expansion corresponds to a sequence of $d$ choices, one for each factor $(x + y)$, of either $x$ or $y$, and $\binom{d}{i}$ counts the terms in which $y$ was chosen exactly $i$ times. Such a sequence can be encoded as a string of $d$ bits, each bit encoding the choice for a single pair of parentheses. Since $d$ bits can encode $2^d$ different values, this must also be the number of terms in the expansion, and in turn the sum of the binomial coefficients.
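The counting argument can be carried out literally for a small $d$. This Python sketch enumerates all $2^d$ sequences of choices with `itertools.product` and groups them by how many times $y$ was chosen; the group sizes come out as the binomial coefficients.

```python
from collections import Counter
from itertools import product
from math import comb

def expansion_terms(d):
    """All 2**d terms of (x + y)^d before collecting like terms.

    Each term is a tuple of d choices, one per factor (x + y): 'x' or 'y'.
    """
    return list(product("xy", repeat=d))

d = 4
terms = expansion_terms(d)
# A term with i factors of y is a copy of x^(d - i) y^i.
counts = Counter(term.count("y") for term in terms)
print(len(terms))                         # 16
print([counts[i] for i in range(d + 1)])  # [1, 4, 6, 4, 1]
```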

A core aspect of the theorem is that it relates what is essentially a product of sums to a sum of products.

\[(x + y)^d = \prod_{i=1}^d (x + y) = \sum_{i=0}^d \binom{d}{i} x^{d-i}y^i\]

Properties

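Symmetry in $x$ and $y$: swapping $x$ and $y$ leaves $(x + y)^d$ unchanged, which corresponds to $\binom{d}{i} = \binom{d}{d-i}$.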

Proof of the theorem

The proof of the theorem is a fairly straightforward argument by induction.

The base case $d = 0$ holds, since both sides equal $1$. Assume that the theorem holds for degree $d$; then we need to show that it also holds for $d+1$.

Starting from the left-hand side, we have

\[(x + y)^{d+1} = (x + y)^{d} (x + y)\]

Since it is assumed to hold for degree $d$ we can substitute that in:

\[\left(\sum_{i=0}^d \binom{d}{i} x^{d-i}y^i\right) (x + y)\]

distribute the $(x+y)$ factor

\[x \sum_{i=0}^d \binom{d}{i} x^{d-i}y^i + y \sum_{i=0}^d \binom{d}{i} x^{d-i}y^i\]

and distribute $x$ and $y$ again into their respective sums and update the exponents accordingly

\[\sum_{i=0}^d \binom{d}{i} x^{d+1-i}y^i + \sum_{i=0}^d \binom{d}{i} x^{d-i}y^{i+1}\]

change the summation index of the second sum to range from $1$ to ${d+1}$ and update the terms accordingly

\[\sum_{i=0}^d \binom{d}{i} x^{d+1-i}y^i + \sum_{i=1}^{d+1} \binom{d}{i-1} x^{d+1-i}y^{i}\]

temporarily extract the $i=0$ term from the first sum and the $i=d+1$ term from the second sum in order to match their ranges again

\[\binom{d}{0} x^{d+1}y^0 + \binom{d}{d} x^{0}y^{d+1} + \sum_{i=1}^d \binom{d}{i} x^{d+1-i}y^i + \sum_{i=1}^{d} \binom{d}{i-1} x^{d+1-i}y^{i}\]

Then we combine the sums

\[\binom{d}{0} x^{d+1}y^0 + \binom{d}{d} x^{0}y^{d+1} + \sum_{i=1}^d \left(\binom{d}{i} + \binom{d}{i-1}\right) x^{d+1-i}y^i\]

use the recurrence property of the binomial coefficients to combine them

\[\binom{d}{0} x^{d+1}y^0 + \binom{d}{d} x^{0}y^{d+1} + \sum_{i=1}^d \binom{d+1}{i} x^{d+1-i}y^i\]

use the fact that $\binom{d}{0} = \binom{d+1}{0} = \binom{d}{d} = \binom{d+1}{d+1} = 1$ to rewrite the coefficients of the two boundary terms

\[\binom{d+1}{0} x^{d+1}y^0 + \binom{d+1}{d+1} x^{0}y^{d+1} + \sum_{i=1}^d \binom{d+1}{i} x^{d+1-i}y^i\]

insert the terms back into the summation as the terms for $i=0$ and $i=d+1$

\[\sum_{i=0}^{d+1} \binom{d+1}{i} x^{d+1-i}y^i\]

which is the desired result.
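The inductive step has a direct computational counterpart: multiplying by $(x + y)$ adds each coefficient to its own position (the first sum) and to the next position (the shifted second sum). The Python sketch below, with an illustrative helper name, represents a homogeneous polynomial of degree $d$ by its list of coefficients and checks the theorem for the first few degrees against `math.comb`. This is a check, not a proof.

```python
from math import comb

def multiply_by_x_plus_y(coeffs):
    """Multiply a homogeneous polynomial of degree d by (x + y).

    coeffs[i] is the coefficient of x^(d - i) y^i; returns the degree d + 1 list.
    """
    d = len(coeffs) - 1
    out = [0] * (d + 2)
    for i, c in enumerate(coeffs):
        out[i] += c      # times x: x^(d + 1 - i) y^i keeps index i
        out[i + 1] += c  # times y: x^(d - i) y^(i + 1) moves to index i + 1
    return out

coeffs = [1]  # (x + y)^0 = 1
for d in range(1, 11):
    coeffs = multiply_by_x_plus_y(coeffs)
    assert coeffs == [comb(d, i) for i in range(d + 1)]
print(coeffs)  # [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
```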

Applications and connections to other polynomial relations

The binomial theorem is related to the Bernstein polynomials and Bézier curves, and can be used to prove the power rule in calculus (differentiation of monomials). It is also formally related to the general Leibniz rule, which takes the same form, in umbral calculus.