Iop: Spatial Operators

The Iop class, as previously discussed, provides a parent class at some level in the inheritance tree for all image operators. It encapsulates the fundamentals of the 2D image architecture in NUKE, from caching through to channels. Most of the specifics of the Iop class itself have already been touched upon, so prior to looking through this section, please read 2D Architecture.

Compared to more specialized image operators, such as PixelIop or DrawIop, the Iop class provides less locked-down image access functions at the expense of requiring more supporting boilerplate code.

If your image processing algorithm cannot be expressed so that each output row needs only a single row of input data, then Iop is the way to go. An Iop is able to access image data at any point on the input for a given output pixel, and is able to lock areas of the input image into memory (known as ‘tiles’ and ‘interests’ in NUKE terminology). Of course, the larger the area you need to access, the greater the memory overhead of your Op. An algorithm should be reduced to its lowest possible requirement to ensure low memory overhead and smooth operation on all hardware.

If you are just getting started with the NDK, you may want to first experiment with PixelIops before moving on to the more complex Iop class.

Iop is:

  • intended for use when implementing image processing algorithms that cannot be factored to solely rely on the corresponding input pixel or row (PixelIop) or that draw into a single channel (DrawIop).

  • able to access any input pixel.

  • able to lock areas of input images into memory and optionally enforce calculation of their contents at a particular time.

The Iop Class Specifics & Required Virtual Calls

For a detailed overview of the calls used by Iop, see the Iop Call Order section.

The three core calls required are:

void Iop::_validate(bool for_real)

Define what the output of your Op will be - for example, the size of the output image and channels produced.

You should in turn call validate on all your inputs.

void Iop::_request(int x, int y, int r, int t, ChannelMask, int count)

Your Op is called with a request area, which is the area of image that a downstream Op asks for during engine.

For the requested area, you need to in turn call request on all your inputs for the area required to produce the requested output.

For instance, if your Op does a blur with a kernel size of 20, you may need to request your input with an x,y,r,t that is 20 pixels bigger than the area requested.
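As a rough illustration of the arithmetic only, the padding can be modelled outside the NDK. This is a standalone sketch: the `Box` struct and the `padRequest` helper are hypothetical names, not NDK types, and it assumes `r` and `t` are treated as exclusive upper edges. In a real Op the padded values would be passed to request() on the input.

```cpp
#include <cassert>

// Hypothetical stand-in for a request area: x,y is the lower-left corner,
// r,t the upper-right edge (assumed exclusive), as in the _request() arguments.
struct Box {
  int x, y, r, t;
};

// Grow a requested box by `pad` pixels on every side. For a blur that reads
// up to 20 pixels away from each output pixel, pad would be 20.
Box padRequest(const Box& requested, int pad)
{
  assert(pad >= 0);  // a negative pad would under-fetch the input
  return Box{ requested.x - pad, requested.y - pad,
              requested.r + pad, requested.t + pad };
}
```

For a requested area of 0,0,100,50 and a pad of 20, the sketch yields -20,-20,120,70.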

Iop does provide a basic implementation of _request() which assumes that all inputs are of type Iop. To avoid undefined behaviour you must reimplement _request() if any of your inputs do not derive from Iop (a GeoOp-derived op for example).

void Iop::engine(int y, int l, int r, ChannelMask channels, Row &row)

Do the actual work. Here you should get pixels from your inputs, process them, and return them in ‘row’ for the line given by ‘y’. In ‘row’, you should fill the pixels starting at ‘l’ and finishing just before ‘r’ for all ‘channels’.

A Simple Iop Example - AddInputs

In this example, we show a simple Iop that adds two inputs together.

Let’s just run through the main concepts. Open up the file ‘AddInputs.cpp’ in the example directory to see the full example.

int maximum_inputs() const { return 2; }
int minimum_inputs() const { return 2; }

Here we are overriding some optional virtuals to define how many input pipes to show in the Node Graph (DAG).

Max inputs says we never get more than two inputs. Min inputs says we never have fewer than two inputs.

Min inputs also means that there are always two inputs connected to this Op even if the node in the Node Graph is not connected to anything.

The inputs, if disconnected, are in fact instances of the Op ‘Black’, which produces black pixels. This has a nice effect on our code as we don’t need to check for disconnected inputs in any part of our processing.

void AddInputs::_validate(bool for_real)
{
  copy_info(); // copy bbox channels etc from input0
  merge_info(1); // merge info from input 1
}

In the implementation of the validate function, we first copy the info (that is, the channels, format, and image size from input0 or the first input) by calling copy_info().

Then we merge the info from the second input. It is important that we perform the merge - otherwise, our output image size would only ever be the size of the Op connected to the first input.

Note that copy_info and merge_info implicitly call validate on each one of the inputs.

void AddInputs::_request(int x, int y, int r, int t, ChannelMask channels, int count)
{
  // request from input 0 and input 1
  input(0)->request( x, y, r, t, channels, count );
  input(1)->request( x, y, r, t, channels, count );
}

Our request is quite simple. We are not accessing pixels spatially, so we just request the same box and channels from both our inputs.

Note how we access the inputs by calling input(inputNo), which returns a pointer to each input Op.

void AddInputs::engine ( int y, int x, int r,
                              ChannelMask channels, Row& row )
{
  // input 0 row
  row.get(input0(), y, x, r, channels);

  // input 1 row
  Row input1Row(x, r);
  input1Row.get(input1(), y, x, r, channels);

  foreach ( z, channels ) {
    const float* input1 = input1Row[z] + x;
    const float* input0  = row[z] + x;
    float* outptr = row.writable(z) + x;
    const float* end = outptr + (r - x);

    while (outptr < end) {
      *outptr++ = *input0++ + *input1++;
    }
  }
}

Now, this is the part that does the real work. We first fetch out of input 0 and input 1 the line required from x through to r for the given channels. We reuse the output row to fetch input0 and create a new Row to hold input1’s Row. Note that we use the helper functions input0() and input1() to fetch the input Op references. These functions are the equivalent of doing *input(0) and *input(1).

Next, we loop through all the channels and simply add all the pixels from input0 and input1 together and output them into ‘row’.
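The pointer arithmetic in that loop can be tried outside NUKE. The following is a minimal standalone sketch, not NDK code: `addRange` is a hypothetical helper, and each `std::vector<float>` stands in for one channel of a Row indexed by absolute x position.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds in0 and in1 over the half-open range [x, r) and writes the sums
// into out. Pixels outside [x, r) are left untouched. out may be the same
// vector as in0, mirroring how the example reuses 'row' for input 0.
void addRange(const std::vector<float>& in0, const std::vector<float>& in1,
              std::vector<float>& out, int x, int r)
{
  assert(0 <= x && x <= r);
  const std::size_t last = static_cast<std::size_t>(r);
  assert(last <= in0.size() && last <= in1.size() && last <= out.size());

  const float* a = in0.data() + x;
  const float* b = in1.data() + x;
  float* outptr = out.data() + x;
  const float* end = outptr + (r - x);

  while (outptr < end) {
    *outptr++ = *a++ + *b++;
  }
}
```

Note that `end` is computed from `r - x`, so the pixel at `r` itself is never written.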

Working With Tiles: SimpleBlur

Often when performing image calculations, your engine call needs to access more than just one Row from its input to produce its output Row. In order to do that, NUKE has a concept of a Tile.

A Tile has accessor functions on it that allow you to access pixels as a 2-dimensional array of the given tile size.

It is important to note, as described in the 2D Architecture section, that the fundamental unit of image processing in NUKE is still a Row. When a Tile is created, NUKE is creating a cache on the input that the Tile was created for (if one didn’t exist already) and then NUKE fills the Rows required on that input to fill the Tile. Those Rows are then locked in the NUKE cache for the existence of the Tile object. You can then use accessor functions on the Tile into the internal Rows in the cache for that input.

This also means that when you have multiple threads all creating Tiles that overlap, quite often when the Tile is created, many of the Rows are already in the cache for that Tile and only minimal extra processing occurs.

A final word of warning about Tiles and the request call: it is very important that your Tile bounds never exceed the bounds requested from the input in _request. Creating a Tile that exceeds the requested area can have unexpected effects, including reading garbage pixels or even crashes.
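One way to reason about this rule is to compare the two boxes directly. The sketch below is standalone and illustrative only: `Box`, `contains`, `requestedFromInput`, and `tileForLine` are hypothetical helpers, not NDK calls, and they assume an Op that pads both its request and its per-line Tile by the same `size`.

```cpp
#include <cassert>

// Hypothetical box type: x,y lower-left, r,t upper-right (assumed exclusive).
struct Box {
  int x, y, r, t;
};

// True when `inner` lies entirely inside `outer`.
bool contains(const Box& outer, const Box& inner)
{
  return inner.x >= outer.x && inner.y >= outer.y &&
         inner.r <= outer.r && inner.t <= outer.t;
}

// Area asked of the input for a given output area, padded by `size`.
Box requestedFromInput(const Box& out, int size)
{
  assert(size >= 0);
  return Box{ out.x - size, out.y - size, out.r + size, out.t + size };
}

// Tile built while producing output line y over [x, r), padded by `size`.
Box tileForLine(int y, int x, int r, int size)
{
  assert(size >= 0);
  return Box{ x - size, y - size, r + size, y + size };
}
```

If the Tile were padded by more than the request (for example 21 against a request padded by 20), `contains` would return false, which is the situation the warning describes.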

Now, let’s look into an example: SimpleBlur.cpp. This example does a simple box blur and covers most of the concepts of using Tile.

int _size;

SimpleBlur (Node* node) : Iop (node)
{
  _size = 20;
}

Firstly, we have a hard-coded kernel size of 20 pixels for our blur, set in the constructor.

void SimpleBlur::_validate(bool for_real)
{
  copy_info(); // copy bbox, channels, and so on from input0, which will validate it.
  info_.pad( _size ); // grow the bounding box by the blur size
}

In validate, we copy the input size but then we actually grow our bounding box by the blur size.

Without this extra step, our blur would be cropped at the edges to the input size.

void SimpleBlur::_request(int x, int y, int r, int t, ChannelMask channels, int count)
{
  // request extra pixels around the input
  input(0)->request( x - _size , y - _size , r + _size, t + _size, channels, count );
}

Next, we request our input. We also grow the request area by the blur size, as we need this many extra pixels around the requested box from the input to do the blur.

void SimpleBlur::engine ( int y, int x, int r,
                              ChannelMask channels, Row& row )
{

  // make a tile for current line with padding around for the blur
  Tile tile( input0(), x - _size , y - _size , r + _size, y + _size , channels);
  if ( aborted() ) {
    std::cerr << "Aborted!";
    return;
  }

  foreach ( z, channels ) {
    float* outptr = row.writable(z) + x;

    for( int cur = x ; cur < r; cur++ ) {
      float value = 0;
      float div = 0;

      if ( intersect( tile.channels(), z ) ) {
        // a simple box blur
        for ( int px = -_size; px < _size; px++ ) {
          for ( int py = -_size; py < _size; py++ ) {
            value += tile[z][ tile.clampy(y + py) ][ tile.clampx(cur + px) ];
            div++;
          }
        }
        if ( div )
          value /= div;
      }
      *outptr++ = value;
    }
  }
}

The engine call first fetches a Tile of input0 around the current output line, growing it by the blur size.

After creating a Tile, you must call aborted() to check whether filling the Tile failed. If it did fail, you should return immediately, as the Tile contents cannot be relied upon.

Then, we loop through all pixels and channels, averaging the pixels in the Tile around each output pixel to produce our box blur. Note that the pixels in the Tile can be accessed via a multi-dimensional array per channel. Note also that the Tile is always indexed by absolute pixel location, not by offset into the Tile array. If the Tile is 9x9 pixels big starting at pixel x=101 y=101, then the first pixel is accessed like tile[z][101][101], not tile[z][0][0].
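To make the absolute-indexing convention concrete, here is a small standalone model. It is only a sketch under stated assumptions: `MiniTile` is a hypothetical single-channel class, not the NDK Tile, and its `clampx`/`clampy` are assumed to clamp a coordinate to the nearest pixel inside the tile.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel stand-in for a Tile covering [x, r) by [y, t).
class MiniTile {
public:
  MiniTile(int x, int y, int r, int t)
    : x_(x), y_(y), r_(r), t_(t),
      width_(checkedExtent(x, r)),
      data_(width_ * checkedExtent(y, t), 0.0f) {}

  // Clamp an absolute coordinate to the nearest pixel inside the tile.
  int clampx(int px) const { return std::min(std::max(px, x_), r_ - 1); }
  int clampy(int py) const { return std::min(std::max(py, y_), t_ - 1); }

  // Absolute pixel coordinates in; the offset into storage is computed here.
  float& at(int px, int py)
  {
    assert(px >= x_ && px < r_ && py >= y_ && py < t_);
    return data_[static_cast<std::size_t>(py - y_) * width_ +
                 static_cast<std::size_t>(px - x_)];
  }

private:
  static std::size_t checkedExtent(int lo, int hi)
  {
    assert(hi > lo);  // an empty or inverted tile is not modelled
    return static_cast<std::size_t>(hi - lo);
  }

  int x_, y_, r_, t_;
  std::size_t width_;
  std::vector<float> data_;
};
```

With a 9x9 `MiniTile` constructed as `MiniTile(101, 101, 110, 110)`, the first pixel is `at(101, 101)`, and an out-of-range lookup such as `at(clampx(500), clampy(500))` resolves to the last pixel at 109,109.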

Note that in this example we could have created a new Tile for every pixel rather than one big Tile up front for the whole Row.

Note

Because Tile objects lock Rows into NUKE’s internal cache and cannot be freed under low memory conditions, it is best to keep their lifetime as short as possible and their size as minimal as possible. If you require access to the entire image but can access it one Row at a time, it is better to loop through the image, filling a Row object for every Row, rather than creating a Tile object for the whole area.

Full-Frame Processing and Interests

So far in this section, we’ve discussed methods for accessing small sub-sections of the image to be able to generate the current output pixel. However, what about the circumstance where you need to access the entire image for a single output pixel - optical flow based motion estimation, for example?

To do this, we need to implement another type of Iop, the PlanarIop.

The example we’ll give is a ‘Normalise’ operator, which analyzes the entire input image to find the highest value and then normalizes this value to 1.0.

void Normalise::renderStripe ( ImagePlane& imagePlane )
{
  Format format = input0().format();

  // these useful format variables are used later
  const int fx = format.x();
  const int fy = format.y();
  const int fr = format.r();
  const int ft = format.t();

  const int height = ft - fy ;
  const int width = fr - fx ;

  ChannelSet readChannels = input0().info().channels();

  Interest interest( input0(), fx, fy, fr, ft, readChannels, true );
  interest.unlock();

  // fetch each row and find the highest number pixel
  for ( int ry = fy; ry < ft; ry++) {
    progressFraction( ry - fy, height );
    Row row( fx, fr );
    row.get( input0(), ry, fx, fr, readChannels );
    if ( aborted() )
      return;

    foreach( z, readChannels ) {
      const float *CUR = row[z] + fx;
      const float *END = row[z] + fr;
      while ( CUR < END ) {
        // _maxValue is assumed to be a float member of Normalise
        _maxValue = std::max( *CUR, _maxValue );
        CUR++;
      }
    }
  }

  input0().fetchPlane(imagePlane);
  imagePlane.makeUnique();

  foreach( z, imagePlane.channels() ) {

    // looked up per channel, so this must not be static
    const int chanNo = imagePlane.chanNo(z);

    for (int y = imagePlane.bounds().y();
         y != imagePlane.bounds().t();
         y++) {
      float *CUR = imagePlane.writableAt(imagePlane.bounds().x(), y, chanNo);
      const float *END = imagePlane.writableAt(imagePlane.bounds().r(), y, chanNo);
      while ( CUR < END ) {
        *CUR++ *= ( 1.0f / _maxValue );
      }
    }
  }
}

Note

In this example the processing is not multi-threaded optimally, as the rescaling of the pixel values is done serially rather than in parallel. This could be avoided with a more complex structure, or by inserting your own multi-threading calls.

Note

Accessing the entire image involves precalculating the entire source image before work can begin on calculating the target image. This means that the Op can both have a large memory footprint and break the scanline processing architecture, so in many circumstances it appears ‘slow’ to users in the interface. If your algorithm can be refactored to not rely on the entire source image, then it should be.
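The two-pass shape of the Normalise example (scan everything for the maximum, then rescale) can be modelled without the NDK. The following is a standalone sketch: `Image`, `findMax`, and `normalise` are hypothetical names, a single channel is stored as a vector of rows, and the guard against a zero maximum is an addition that the example above does not include.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Image = std::vector<std::vector<float>>;  // rows of pixels, one channel

// Pass 1: visit every row and track the largest value seen.
float findMax(const Image& image)
{
  assert(!image.empty());  // an empty image has no maximum
  float maxValue = 0.0f;
  for (const std::vector<float>& row : image) {
    for (float v : row) {
      maxValue = std::max(v, maxValue);
    }
  }
  return maxValue;
}

// Pass 2: rescale in place so the largest value becomes 1.0.
void normalise(Image& image)
{
  const float maxValue = findMax(image);
  if (maxValue <= 0.0f) {
    return;  // nothing to scale, and avoids dividing by zero
  }
  const float scale = 1.0f / maxValue;
  for (std::vector<float>& row : image) {
    for (float& v : row) {
      v *= scale;
    }
  }
}
```

The second pass cannot start until the first has seen every row, which is why this kind of operator needs the whole source image before it can produce any output.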

Exercise: Build a Median Node

Now, let’s take what we’ve covered so far in this section and apply it. For this exercise, take the SimpleBlur.cpp source code in the NDK, get it building, and then:

  • Change its name to SimpleMedian and ensure it can be created in the Node Graph (DAG).

  • Add a knob to the node to control the Median size.

  • Amend the engine to find the median value of the Tile and set the output Pixel as required.

  • Amend the help and tooltip texts to reflect this.

  • Relax with a satisfying pint of beer.
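As a starting point for the engine step of the exercise, the median of a window of samples can be found with the standard library. This is a standalone sketch, not NDK code: `medianOf` is a hypothetical helper, and gathering the samples from the Tile is left to you.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a window of samples. The vector is taken by value because
// std::nth_element reorders its input.
float medianOf(std::vector<float> samples)
{
  assert(!samples.empty());
  const auto mid =
      samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
  std::nth_element(samples.begin(), mid, samples.end());
  return *mid;  // for an even count this is the upper of the two middle values
}
```

std::nth_element only partially sorts the window, which is usually cheaper than a full sort when just the middle value is needed.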