The next kernel we’re going to look at does a convolution, or weighted two-dimensional blur. The blur weights are taken from a second input image, filter, so for every output pixel, the kernel needs to access all the pixels from the filter image. We do this using random access.
Here’s the whole kernel:
//Warning: connecting a large image to the filter input will cause the kernel to run very slowly!
//If running on a GPU connected to a display, this will cause problems if the time taken to
//execute the kernel is longer than your operating system allows. Use with caution!
kernel ConvolutionKernel : public ImageComputationKernel&lt;ePixelWise&gt;
{
  Image<eRead, eAccessRanged2D, eEdgeClamped> src;
  Image<eRead, eAccessRandom> filter;
  Image<eWrite> dst;

  local:
    int2 _filterOffset;

  void init()
  {
    //Get the size of the filter input and store the radius.
    int2 filterRadius(filter.bounds.width()/2, filter.bounds.height()/2);

    //Store the offset of the bottom-left corner of the filter image
    //from the current pixel:
    _filterOffset.x = filter.bounds.x1 - filterRadius.x;
    _filterOffset.y = filter.bounds.y1 - filterRadius.y;

    //Set up the access for the src image
    src.setRange(-filterRadius.x, -filterRadius.y, filterRadius.x, filterRadius.y);
  }

  void process() {
    SampleType(src) valueSum(0);
    ValueType(filter) filterSum(0);

    //Iterate over the filter image
    for(int j = filter.bounds.y1; j < filter.bounds.y2; j++) {
      for(int i = filter.bounds.x1; i < filter.bounds.x2; i++) {
        //Get the filter value
        ValueType(filter) filterVal = filter(i, j, 0);

        //Multiply the src value by the corresponding filter weight and accumulate
        valueSum += filterVal * src(i + _filterOffset.x, j + _filterOffset.y);

        //Update the filter sum with the current filter value
        filterSum += filterVal;
      }
    }

    //Normalise the value sum, avoiding division by zero
    if (filterSum != 0)
      valueSum /= filterSum;

    dst() = valueSum;
  }
};
The first difference between this kernel and the ones we’ve seen previously is that this one iterates over its output in a pixelwise fashion:
kernel ConvolutionKernel : public ImageComputationKernel<ePixelWise>
For this convolution, we use only the first channel of the filter image, so it makes sense to use pixelwise iteration - it means we only have to access each pixel in the filter image once per output pixel, rather than once for each component of each output pixel.
The ConvolutionKernel needs to access the whole of the filter input at every output pixel. To do this, we use the eAccessRandom access method, which allows access to any position in the image from all positions in the output.
Image<eRead, eAccessRandom> filter;
With eAccessRandom input access, you can also specify an edge method for the input, in exactly the same way as you would with eAccessRanged1D or eAccessRanged2D access. However, in this case we don’t intend to access any pixels outside the filter input, so no edge method is specified for this input.
Since random access lets you access any input pixel at any output location, there’s no need to specify anything further about the access requirements for the filter input in the init() function. We do, however, store some information about the filter input here that will be used to help us with our access to src.
It’s often useful to know the bounds of a random access input; in this case, for example, we need to know the size of the filter input in order to decide how big our convolution needs to be. A random access input has a bounds member which you can access via inputName.bounds.
bounds is actually a structure called a recti, which is a rectangle with integer co-ordinates. You can get the width of the rectangle using bounds.width(), or the height using bounds.height(). Here, the width() and height() functions are used to obtain a radius for our filter input.
//Get the size of the filter input and store the radius.
int2 filterRadius(filter.bounds.width()/2, filter.bounds.height()/2);
You can also access the lower and upper bounds of the rectangle by using bounds.x1, bounds.y1, bounds.x2 and bounds.y2 respectively. Here, bounds.x1 and bounds.y1 are used to set up and store an offset that we will use later for accessing the src input.
//Store the offset of the bottom-left corner of the filter image
//from the current pixel:
_filterOffset.x = filter.bounds.x1 - filterRadius.x;
_filterOffset.y = filter.bounds.y1 - filterRadius.y;
Finally, the radius we calculated earlier is used to initialise the 2D ranged access to the src input.
//Set up the access for the src image
src.setRange(-filterRadius.x, -filterRadius.y, filterRadius.x, filterRadius.y);
(This code will actually overestimate the access needed for an even-sized input by one row and one column, but that's OK - the important thing is not to underestimate the range we need, which could give wrong results or even lead to a crash.)
Inside the process() function, we iterate over the filter image to do the convolution. The filter is centred over the current output position and each input pixel covered by the filter is multiplied by the corresponding filter weight. These weighted input values are accumulated, as are the filter weights; if the filter weights do not sum to one, this is compensated for at the end in order to preserve the brightness of the input.
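The whole algorithm can be sketched in ordinary C++ for a single-channel, row-major image. This is a stand-in for illustration, not Blink: the `convolve` function and its flat-vector image layout are invented here, with `std::clamp` playing the role of eEdgeClamped.

```cpp
#include <algorithm>
#include <vector>

// Single-channel convolution mirroring the kernel's logic. Images are
// row-major std::vectors; src is w x h, filter is fw x fh.
std::vector<float> convolve(const std::vector<float>& src, int w, int h,
                            const std::vector<float>& filter, int fw, int fh) {
  const int offX = -(fw / 2), offY = -(fh / 2);  // bottom-left filter offset
  std::vector<float> dst(src.size(), 0.0f);
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      float valueSum = 0.0f, filterSum = 0.0f;
      for (int j = 0; j < fh; ++j) {
        for (int i = 0; i < fw; ++i) {
          // Clamp the sample position to the image, like eEdgeClamped.
          const int sx = std::clamp(x + i + offX, 0, w - 1);
          const int sy = std::clamp(y + j + offY, 0, h - 1);
          const float weight = filter[j * fw + i];
          valueSum += weight * src[sy * w + sx];
          filterSum += weight;
        }
      }
      // Normalise to preserve brightness, avoiding division by zero.
      dst[y * w + x] = (filterSum != 0.0f) ? valueSum / filterSum : valueSum;
    }
  }
  return dst;
}
```

Two sanity checks follow directly from the normalisation: a filter with a single centre weight of 1 returns the input unchanged, and any uniform non-zero filter leaves a constant image untouched.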
The weighted input values are accumulated into valueSum, declared here:
SampleType(src) valueSum(0);
The ConvolutionKernel has pixelwise access to its src input so we can access a whole pixel at a time. A pixel has the SampleType of the input, accessed here using SampleType(src). All images in Nuke are floating point, so SampleType here will be a vector of floating point values. The number of components in the vector will be the same as the number of components in src; for example, this will be 4 if src is an RGBA image. The declaration above initialises all of the components to zero.
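If it helps to picture what `valueSum += filterVal * src(...)` does componentwise, here is a minimal C++ stand-in for a 4-component sample type (the name `Sample4` is invented; the real SampleType is supplied by the Blink framework):

```cpp
// Illustrative 4-component sample: scalar-fill construction, componentwise
// accumulation, and scaling by a filter weight - the operations the
// convolution loop relies on.
struct Sample4 {
  float v[4];
  explicit Sample4(float s = 0.0f) { for (float& c : v) c = s; }
  Sample4& operator+=(const Sample4& o) {
    for (int i = 0; i < 4; ++i) v[i] += o.v[i];
    return *this;
  }
};

// Multiplying by a weight scales every component at once.
Sample4 operator*(float s, const Sample4& p) {
  Sample4 r;
  for (int i = 0; i < 4; ++i) r.v[i] = s * p.v[i];
  return r;
}
```

So `SampleType(src) valueSum(0)` behaves like `Sample4 valueSum(0)` here: all four components start at zero, and each `+=` in the loop accumulates a weighted pixel into all of them together.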
With pixelwise access it’s also possible to access a single component from an input pixel. We do this with the filter input, as only the first component is used for the filter weights. A single component’s type will be the ValueType of its input, which in this case, since we are in Nuke, is always float. Here filterSum, into which the filter weights will be accumulated, is declared as type ValueType(filter) and its single value is initialised to zero:
ValueType(filter) filterSum(0);
We iterate over the bounds of the filter image in order to accumulate the weighted src values:
//Iterate over the filter image
for(int j = filter.bounds.y1; j < filter.bounds.y2; j++) {
  for(int i = filter.bounds.x1; i < filter.bounds.x2; i++) {
    //Get the filter value
    ValueType(filter) filterVal = filter(i, j, 0);
    //Multiply the src value by the corresponding filter weight and accumulate
    valueSum += filterVal * src(i + _filterOffset.x, j + _filterOffset.y);
    //Update the filter sum with the current filter value
    filterSum += filterVal;
  }
}
Random access is not relative to the current position, so the filter input is accessed using the position (i, j) inside the filter image. filter(i, j, 0) returns the zeroth component at this position as a ValueType(filter).
Warning: Because this iteration is done at every pixel, it’s a good idea to keep the filter input fairly small, otherwise this kernel will take a very long time to run! This can also cause problems when the kernel is running on a GPU which is connected to your display, if the time taken exceeds the time-out limit for your operating system.
The src input is accessed with a 2D range, as we did in the BoxBlur2DKernel, except that this time we access an entire pixel at a time rather than a single component. The access looks just the same, src(xOffset, yOffset), but this time it returns a value of type SampleType(src) instead of a single float. This value - which is actually a vector of values, as described above - is multiplied by the filter weight and accumulated into valueSum.
valueSum += filterVal * src(i + _filterOffset.x, j + _filterOffset.y);
At the end of the process() function, the valueSum is normalised by the accumulated filter weights to preserve the brightness of the input. The ConvolutionKernel doesn’t have any control over the values in its filter input, which might contain all zero values, so we need to be careful to avoid a division by zero here.
//Normalise the value sum, avoiding division by zero
if (filterSum != 0)
  valueSum /= filterSum;
Now that you’ve seen the main access patterns and kernel types, you should be ready to start writing your own kernels. If the examples we’ve looked at so far don’t cover everything you need, you can look at the list in Example Kernels for more clues, or at the Reference Guide.