The Iop class, as previously discussed, provides a parent class at some level in the inheritance tree for all image operators. It encapsulates the fundamentals of the 2D image architecture in NUKE, from caching through to channels. Most of the specifics of the Iop class itself have already been touched upon, so before working through this section please read 2d-architecture.
Compared to more specialised image operators such as PixelIop or DrawIop, the Iop class provides less restricted image access functions, at the expense of requiring more supporting boilerplate code.
If your image processing algorithm cannot be expressed so that a single row of input data is enough to calculate the corresponding output row, then Iop is the way to go. An Iop can access image data at any point on its input for a given output pixel, and can lock areas of the input image into memory (known as ‘tiles’ and ‘interests’ in NUKE terminology). Of course, the larger the area you need to access, the greater the memory overhead of your Op. Reduce the algorithm’s access area to its lowest possible requirement to ensure low memory overhead and smooth operation on all hardware.
If you are just getting started with the NDK you may want to experiment first with PixelIops before moving on to the more complex Iop class.
See the Iop Call Order section for a detailed overview of the calls used by Iop.
The three core calls required are _validate, _request, and engine.

_validate: Define what the output of your Op will be, e.g. the size of the output image and the channels produced. You should in turn call validate on all your inputs.

_request: Your Op will be called with a request area, which is the area of the image that a downstream Op will ask for during engine. For the requested area you need in turn to call request on all your inputs, covering the area required to produce the requested output. For instance, if your Op does a blur with a kernel size of 20 you may need to request your input with an x, y, r, t that is 20 pixels bigger than the area requested.

engine: Do the actual work. Here you should get pixels from your inputs, process them, and return them in ‘row’ for the line given in ‘y’. In ‘row’ you should fill pixels starting at ‘x’ and finishing at ‘r’ for all ‘channels’.
In this example we show a simple Iop that adds two inputs together.
Let’s run through the main concepts; open the file ‘AddInputs.cpp’ in the example directory to see the full example.
int max_inputs() const { return 2; }
int min_inputs() const { return 2; }
Here we are overriding some optional virtuals to define how many input pipes to show in the DAG.
Max inputs says we will never get more than two inputs; min inputs says we will never have fewer than two.
Min inputs also means that there will always be two inputs connected to this Op, even if the node in the DAG is not connected to anything.
If disconnected, the inputs will in fact be instances of the Op ‘Black’, which produces black pixels. This has a nice effect on our code, as we don’t need to check for disconnected inputs in any part of our processing.
void AddInputs::_validate(bool for_real)
{
copy_info(); // copy bbox channels etc from input0
merge_info(1); // merge info from input 1
}
In the implementation of the validate function we first copy the info (that is, the channels, format, and image size) from input0, the first input, by calling copy_info().
Then we merge the info from the other input. It is important that we perform the merge; otherwise our output image size would only ever be the size of the Op connected to the first input.
Note that copy_info and merge_info implicitly call validate on each of the inputs.
void AddInputs::_request(int x, int y, int r, int t, ChannelMask channels, int count)
{
// request from input 0 and input 1
input(0)->request( x, y, r, t, channels, count );
input(1)->request( x, y, r, t, channels, count );
}
Our request is quite simple. We are not accessing pixels spatially, so we just request the same box and channels from both our inputs.
Note how we access the inputs by calling input(inputNo) which returns a pointer to each input Op.
void AddInputs::engine ( int y, int x, int r,
ChannelMask channels, Row& row )
{
// input 0 row
row.get(input0(), y, x, r, channels);
// input 1 row
Row input1Row(x, r);
input1Row.get(input1(), y, x, r, channels);
foreach ( z, channels ) {
const float* input1 = input1Row[z] + x;
const float* input0 = row[z] + x;
float* outptr = row.writable(z) + x;
const float* end = outptr + (r - x);
while (outptr < end) {
*outptr++ = *input0++ + *input1++;
}
}
}
Now this is the part that does the real work. We first fetch from input 0 and input 1 the line required, from x through to r, for the given channels. We reuse the output row to fetch input0’s data and create a new Row to hold input1’s. Note we use the helper functions input0() and input1() to fetch the input Op references; these are equivalent to *input(0) and *input(1).
Next we loop through all the channels, add the pixels from input0 and input1 together, and write the result into ‘row’.
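The inner pointer-walk loop is the same pattern whatever the operator does per pixel. As a standalone sketch (plain C++ on ordinary float buffers, not DDImage Rows; the function name is illustrative), adding two scanlines over the interval [x, r) looks like this:

```cpp
#include <cassert>
#include <vector>

// Add two input scanlines over the pixel interval [x, r) into 'out'.
// This mirrors the pointer-walk in AddInputs::engine, but on plain
// float buffers rather than DDImage Rows.
void addScanlines(const float* in0, const float* in1, float* out, int x, int r)
{
    const float* a = in0 + x;
    const float* b = in1 + x;
    float* outptr = out + x;
    const float* end = outptr + (r - x);
    while (outptr < end)
        *outptr++ = *a++ + *b++;
}
```

As in the engine call, pixels outside [x, r) are left untouched; only the requested span of the output is filled.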
Often when performing image calculations, your engine call will need to access more than just one Row from its input to produce its output Row. To support this, NUKE has the concept of a Tile.
A Tile has accessor functions that allow you to access pixels as a two-dimensional array of the given tile size.
It is important to note, as described in the 2d-architecture section, that the fundamental unit of image processing in NUKE is still a Row. When a Tile is created, NUKE creates a cache on the input the Tile was created for (if one didn’t already exist) and then fills the Rows on that input required to cover the Tile. Those Rows are locked in the NUKE cache for the lifetime of the Tile object. You can then use the Tile’s accessor functions to index into the internal Rows in the cache for that input.
This also means that when multiple threads create overlapping Tiles, many of the Rows are often already in the cache when a Tile is created, and only minimal extra processing occurs.
A final word of warning about Tiles and the request call: it is very important that your Tile bounds never exceed the bounds requested from the input in _request. Creating a Tile that exceeds the requested area can have unexpected effects, including reading garbage pixels or even crashes.
Now let’s look at an example, SimpleBlur.cpp. This example does a simple box blur and covers most of the concepts of using a Tile.
int _size;
SimpleBlur (Node* node) : Iop (node)
{
_size = 20;
}
Firstly, we have a hardcoded kernel size of 20 pixels for our blur, set in the constructor.
void SimpleBlur::_validate(bool for_real)
{
copy_info(); // copy bbox channels etc from input0, which will validate it.
info_.pad( _size );
}
In validate we copy the input’s info, but then we grow our bounding box by the blur size.
Without this extra step our blur would be cropped at the edges to the input size.
void SimpleBlur::_request(int x, int y, int r, int t, ChannelMask channels, int count)
{
// request extra pixels around the input
input(0)->request( x - _size , y - _size , r + _size, t + _size, channels, count );
}
Next we request from our input. We also grow the request area by the blur size as we need this many extra pixels around the requested box from the input to do the blur.
void SimpleBlur::engine ( int y, int x, int r,
ChannelMask channels, Row& row )
{
// make a tile for the current line, with padding around it for the blur
Tile tile( input0(), x - _size , y - _size , r + _size, y + _size , channels);
if ( aborted() ) {
std::cerr << "Aborted!";
return;
}
foreach ( z, channels ) {
float* outptr = row.writable(z) + x;
for( int cur = x ; cur < r; cur++ ) {
float value = 0;
float div = 0;
if ( intersect( tile.channels(), z ) ) {
// a simple box blur
for ( int px = -_size; px < _size; px++ ) {
for ( int py = -_size; py < _size; py++ ) {
value += tile[z][ tile.clampy(y + py) ][ tile.clampx(cur + px) ];
div++;
}
}
if ( div )
value /= div;
}
*outptr++ = value;
}
}
}
The engine call first fetches a Tile of input0 around the current output line growing it by the blur size.
After creating a Tile you must check aborted() to see whether filling the Tile failed. If it did fail you should return immediately.
Then we loop through all pixels and channels, averaging the pixels in the Tile around each output pixel to do our box blur. Note the pixels in the Tile can be accessed via a multi-dimensional array per channel. Note also that offsets into the Tile are always indexed using absolute pixel locations, not offsets into the Tile array. For instance, if the Tile is 9x9 pixels starting at pixel x=101, y=101, then the first pixel is accessed as tile[z][101][101], not tile[z][0][0].
Note that in this example we could have created a new Tile for every pixel, rather than one big Tile up front for the whole Row.
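To make the clamped averaging concrete, here is a standalone sketch of the same box average on a plain row-major float buffer (illustrative plain C++ only; the function and parameter names are assumptions, not DDImage API). The edge clamping plays the role of tile.clampx/tile.clampy:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Box-average the neighbourhood of pixel (px, py) in a width x height
// image stored row-major in 'img', clamping out-of-range taps to the
// image edge. This mirrors the clamped tile access in SimpleBlur::engine.
float boxBlurPixel(const std::vector<float>& img,
                   int width, int height, int px, int py, int size)
{
    float value = 0.0f;
    float div = 0.0f;
    for (int dx = -size; dx < size; dx++) {
        for (int dy = -size; dy < size; dy++) {
            const int cx = std::min(std::max(px + dx, 0), width - 1);
            const int cy = std::min(std::max(py + dy, 0), height - 1);
            value += img[cy * width + cx];
            div++;
        }
    }
    return div ? value / div : 0.0f;
}
```

Because the taps are clamped rather than skipped, a pixel at the image edge still averages a full kernel’s worth of samples, just as the Tile version does.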
Note
Because Tile objects lock Rows into NUKE’s internal cache, where they cannot be freed under low memory conditions, it is best to keep their lifetime as short as possible and their size as small as possible. If you require access to the entire image but can access it one Row at a time, it is better to loop through the image filling a Row object for every row rather than creating a Tile object for the whole area.
So far in this section we’ve discussed methods for accessing small subsections of the image to generate the current output pixel. But what about the circumstance where you need access to the entire image for a single output pixel, such as optical flow based motion estimation?
To do this we’ll lock the first engine() thread and, in that thread, force calculation of the entire incoming image. The same thread can then do whatever global calculation is required based on this image data, before unlocking and allowing the remaining engine threads to process based on those results.
The example we’ll give is a ‘Normalise’ operator, which analyses the entire input image to find the highest value and then scales the image so that this value becomes 1.0.
void Normalise::engine ( int y, int x, int r,
ChannelMask channels, Row& row )
{
{
Guard guard(_lock);
if ( _firstTime ) {
// do analysis.
Format format = input0().format();
// these useful format variables are used later
const int fx = format.x();
const int fy = format.y();
const int fr = format.r();
const int ft = format.t();
const int height = ft - fy ;
const int width = fr - fx ;
ChannelSet readChannels = input0().info().channels();
Interest interest( input0(), fx, fy, fr, ft, readChannels, true );
interest.unlock();
// fetch each row and find the highest number pixel
_maxValue = 0;
for ( int ry = fy; ry < ft; ry++) {
progressFraction( ry, ft - fy );
Row row( fx, fr );
row.get( input0(), ry, fx, fr, readChannels );
if ( aborted() )
return;
foreach( z, readChannels ) {
const float *CUR = row[z] + fx;
const float *END = row[z] + fr;
while ( CUR < END ) {
_maxValue = std::max( (float)*CUR, _maxValue );
CUR++;
}
}
}
_firstTime = false;
}
} // end lock
Row in( x,r);
in.get( input0(), y, x, r, channels );
if ( aborted() )
return;
foreach( z, channels ) {
float *CUR = row.writable(z) + x;
const float* inptr = in[z] + x;
const float *END = row[z] + r;
while ( CUR < END ) {
*CUR++ = *inptr++ * ( 1. / _maxValue );
}
}
}
First up, the two member variables _firstTime and _lock allow the engine threads to figure out whether theirs is the first instance called for this image, or whether another thread is already doing the necessary work. Since engine is multithreaded, we can’t ascertain from variables such as the current row index whether this is the first worker thread called (the thread handling row 2 could be called before the thread handling row 1). Instead, we set _firstTime to true on initialisation and in _validate, then check for it at the beginning of the engine function. Alternatively, the image access work could be done in open. Note that using _validate for such image access is not advisable, as it will lock up the NUKE UI while the input image is calculated.
The Guard is a DDImage utility class defined in the DDImage Thread.h header. Constructing it acquires the _lock mutex, preventing all other threads from entering the guarded section until the Guard releases the lock when it goes out of scope.
In this circumstance we use the first engine thread to pull the whole input image and find the highest value for our normalise calculation. Instead of using a Tile, we pull the Rows out one at a time, looping through the entire input image.
Before we fetch each Row, however, we first create an Interest object.
Interest interest( input0(), fx, fy, fr, ft, readChannels, true );
interest.unlock();
An Interest object is very similar to a Tile in that it locks an area of pixels into the NUKE Row cache. The main difference is that it does not fill the cache straight away in its constructor, as a Tile does.
The final argument to Interest is interesting too. Here we are setting the multi-thread flag to true. This starts up threads that fill the cache for the interest area in the background. The reason we do this is that we have effectively made NUKE single-threaded by locking out all the render threads in our engine call. Without this flag we would have to fetch the input Rows one at a time from our single thread; with it, the input cache is filled by multiple threads.
Note
Both Interest and Tile have a multi-thread flag on their constructors to fill the cache from multiple threads. Never use this option unless you have locked all the render threads in engine, as in this example. Usually, when the engine call is not locked, Interests or Tiles overlap and other render threads are filling the cache, effectively making the calls multi-threaded already. Adding the multi-thread option in that case will severely degrade performance, as multiple threads fight for access to the cache.
Finally, after the Interest is created we call ‘unlock’ on it. This tells NUKE that if memory runs low it can still free lines from the cache if required. In this case the Interest is more of a hint that we want NUKE to keep those rows in cache, because we will access them all, but it is not critical that the whole interest area be held in memory at once.
With these optimisations, the Normalise example can still effectively normalise very large images, even though it requires the whole input image in the first engine call.
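The locking pattern itself is plain C++ and can be sketched independently of DDImage (all names here are illustrative; std::mutex and std::lock_guard stand in for DDImage’s Lock and Guard): a shared flag lets the first worker to arrive compute the global maximum while the others wait, after which every worker normalises its own scanline.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// The first worker to take the lock computes the global maximum; the
// rest wait, see firstTime is false, and go straight to scaling their
// own row. This mirrors the _firstTime / Guard pattern in Normalise.
struct Normaliser {
    std::mutex lock;
    bool firstTime = true;
    float maxValue = 0.0f;

    void engineRow(std::vector<float>& image, int width, int y)
    {
        {
            std::lock_guard<std::mutex> guard(lock);
            if (firstTime) {
                for (float v : image)
                    maxValue = std::max(maxValue, v);
                firstTime = false;
            }
        } // lock released here
        if (maxValue <= 0.0f)
            return; // nothing to scale
        float* cur = image.data() + y * width;
        for (int i = 0; i < width; i++)
            cur[i] *= 1.0f / maxValue;
    }
};
```

Because every worker must pass through the locked section before touching its row, the maximum is always computed over the unscaled image, and the mutex establishes the ordering that makes maxValue safe to read afterwards.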
Note
Accessing the entire image involves precalculating the entire source image before work can begin on the target image. This means the Op can have a large memory footprint, and it forms a breakage in the scanline processing architecture, thus in many circumstances appearing ‘slow’ to users in the interface. If your algorithm can be refactored so that it does not rely on the entire source image, it should be.
Now let’s take what we’ve covered so far in this section and apply it. For this exercise, take the SimpleBlur.cpp source code in the NDK, get it building, then: