Improving OpScript Performance
Understand the Lua Garbage Collector
In Lua, memory can be allocated on either the stack or the heap. Objects allocated on the heap are reclaimed by Lua’s garbage collector. While the garbage collector is running, your Lua code cannot make forward progress. Reducing how often the garbage collector must run, or how much garbage it must collect, therefore helps to improve the performance of your OpScripts.
The following Lua constructs, while not bad in themselves, each create a new object that must later be cleaned up during garbage collection. Pay particular attention if these constructs are used within loops.
- string_one..string_two - String concatenation, or any function that creates a string, can result in the creation of a new object. Because Lua strings are unique, Lua first checks whether an identical string has already been created elsewhere.
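To get a feel for how much garbage a piece of code produces, the heap size can be sampled before and after it runs. The following is a minimal sketch intended for a standalone Lua interpreter; whether collectgarbage is exposed in a given OpScript environment is an assumption to verify, and the loop body is purely illustrative.

```lua
-- Sketch: observe heap growth caused by short-lived tables.
collectgarbage("collect")                 -- start from a freshly collected heap
collectgarbage("stop")                    -- pause the collector so growth stays visible
local before = collectgarbage("count")    -- heap size in KiB

for i = 1, 10000 do
    local tmp = { i, i + 1 }              -- one new table per iteration
end

local after = collectgarbage("count")
collectgarbage("restart")                 -- resume normal collection

print(string.format("heap grew by about %.1f KiB", after - before))
```

The exact figure depends on the Lua implementation and version, so treat it as a relative measure for comparing two variants of the same code.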
- { … } - Each time a table constructor is executed, a new table is created.
- function() ... end - Executing a function statement creates a closure. If this is executed within a loop of n iterations, it results in the creation of n closures.
Note: For more information see lua-users.org.
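The effect of the table constructor and closure items above can be sketched as follows. The function names (sumScaledSlow, sumScaledFast) are hypothetical and exist only for illustration; both return the same result, but the second creates its closure once and avoids the per-iteration table altogether.

```lua
-- Allocates a closure and a table on every iteration.
local function sumScaledSlow(values, factor)
    local total = 0
    for i = 1, #values do
        local scale = function(x) return x * factor end  -- new closure each time
        local pair = { values[i], factor }                -- new table each time
        total = total + scale(pair[1])
    end
    return total
end

-- Hoists the closure out of the loop and drops the temporary table.
local function sumScaledFast(values, factor)
    local scale = function(x) return x * factor end      -- created once
    local total = 0
    for i = 1, #values do
        total = total + scale(values[i])
    end
    return total
end

print(sumScaledSlow({ 1, 2, 3 }, 2), sumScaledFast({ 1, 2, 3 }, 2))
```

Hoisting is only valid when the closure or table does not depend on values that change per iteration; otherwise the allocation is doing real work and should stay where it is.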
Avoid “..” in loops
As a concrete example of the advice in Understand the Lua Garbage Collector, this recommendation explores the string concatenation operator “..”, which is commonly used in OpScripts because of its convenient shorthand notation.
As an example, consider the following code used to build an L-System description:
    local result = ""
    for i = 1, #inputStr do
        local char = inputStr:sub(i,i)
        local ruleStr = rulesTable[char]
        if ruleStr then
            result = result..ruleStr
        else
            result = result..char
        end
    end
    return result
Here, result is appended to in a tight loop. While convenient, this results in a large number of string allocations and poor performance. The preceding code can be rewritten to perform all concatenation in a single step once the loop has completed, as shown below:
    local buf = {}
    for i = 1, #inputStr do
        local char = inputStr:sub(i,i)
        local ruleStr = rulesTable[char]
        if ruleStr then
            buf[#buf+1] = ruleStr
        else
            buf[#buf+1] = char
        end
    end
    return table.concat(buf)
Running the above example as part of a large Katana scene file reduced the scene processing time of this function by a factor of approximately 2.6, as shown in the following table:
String Method | Time
“..” Operator | 7.011s
table.concat() | 2.681s
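The comparison can be tried outside Katana with a small harness such as the one below. This is a sketch under stated assumptions: the contents of rulesTable, the input string, and the helper names (expandWithConcat, expandWithBuffer, timeIt) are all invented for illustration, and the timings it prints will not match the figures in the table above.

```lua
-- Assumed example rules; the real rulesTable is not shown in this guide.
local rulesTable = { F = "FF", X = "F[+X]F[-X]+X" }

-- Builds the result with ".." inside the loop.
local function expandWithConcat(inputStr)
    local result = ""
    for i = 1, #inputStr do
        local char = inputStr:sub(i, i)
        result = result .. (rulesTable[char] or char)
    end
    return result
end

-- Collects the pieces in a table and joins them once at the end.
local function expandWithBuffer(inputStr)
    local buf = {}
    for i = 1, #inputStr do
        local char = inputStr:sub(i, i)
        buf[#buf + 1] = rulesTable[char] or char
    end
    return table.concat(buf)
end

-- Returns the elapsed CPU time and the function's result.
local function timeIt(fn, arg)
    local start = os.clock()
    local out = fn(arg)
    return os.clock() - start, out
end

local input = string.rep("FX", 2000)
local tConcat, a = timeIt(expandWithConcat, input)
local tBuffer, b = timeIt(expandWithBuffer, input)
assert(a == b)  -- both approaches must produce the same string
print(string.format("..: %.4fs   table.concat: %.4fs", tConcat, tBuffer))
```

The gap between the two approaches tends to widen as the result string grows, because each ".." in the loop copies everything accumulated so far.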
Consider creating custom C++ Ops to replace costly OpScripts
While Lua and the OpScript API are very convenient and versatile, OpScripts that account for a large portion of scene traversal time may be better suited as custom C++ Ops.
As a concrete example, consider the following OpScript, which places the target location at a random position, orientation and scale within a uniform box.
    -- Seed the generator from the location name so results are repeatable.
    local hash = ExpressionMath.stablehash(Interface.GetOutputName())
    math.randomseed(hash)

    local dim = 10
    local s = 0.15

    -- Random scale
    local sx = s*(1 + 0.5*math.random())
    local sy = s*(1 + 0.5*math.random())
    local sz = s*(1 + 0.5*math.random())

    -- Random position within the box
    local tx = 2*dim*(math.random() - 0.5)
    local ty = dim*math.random() + sy
    local tz = 2*dim*(math.random() - 0.5)

    -- Random rotation axis and angle
    local ax = math.random()
    local ay = math.random()
    local az = math.random()
    local axis = Imath.V3d(ax, ay, az):normalize()
    local angle = 360 * math.random()

    local translate = Imath.V3d(tx, ty, tz)
    local rotate = Imath.V4d(angle, axis.x, axis.y, axis.z)
    local scale = Imath.V3d(sx, sy, sz)

    Interface.SetAttr("xform.group0.translate", DoubleAttribute(translate:toTable(), 3))
    Interface.SetAttr("xform.group0.rotate", DoubleAttribute(rotate:toTable(), 4))
    Interface.SetAttr("xform.group0.scale", DoubleAttribute(scale:toTable(), 3))
(Such an OpScript is of course contrived, but one could imagine replacing randomness with a well-defined distribution to place geometry.)
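The OpScript above seeds the generator from a hash of the location name, which is what makes each location's placement repeatable. The following standalone sketch illustrates that idea without any Katana APIs. Here toyHash is a hypothetical stand-in for ExpressionMath.stablehash (it is not the same hash), placementFor is an invented helper, and only the translation is computed, without the scale offset applied to ty in the original.

```lua
-- Hypothetical stand-in for ExpressionMath.stablehash; NOT the same algorithm.
local function toyHash(name)
    local h = 0
    for i = 1, #name do
        h = (h * 31 + name:byte(i)) % 2147483647
    end
    return h
end

-- Returns a translation that depends only on the location path.
local function placementFor(locationPath)
    math.randomseed(toyHash(locationPath))
    local dim = 10
    local tx = 2 * dim * (math.random() - 0.5)
    local ty = dim * math.random()
    local tz = 2 * dim * (math.random() - 0.5)
    return tx, ty, tz
end

-- Asking twice for the same location yields the same translation.
local x1, y1, z1 = placementFor("/root/world/geo/prim1")
local x2, y2, z2 = placementFor("/root/world/geo/prim1")
assert(x1 == x2 and y1 == y2 and z1 == z2)
print(x1, y1, z1)
```

Because the seed is derived from the path alone, the result does not depend on the order in which locations are processed.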
This OpScript could be rewritten as the following C++ Op:
    #include <FnAttribute/FnAttribute.h>
    #include <FnGeolib/op/FnGeolibOp.h>
    #include <FnGeolibServices/FnExpressionMath.h>
    #include <FnGeolibServices/FnGeolibCookInterfaceUtilsService.h>
    #include <FnPluginSystem/FnPlugin.h>

    #include <cmath>
    #include <cstdlib>

    // Returns a pseudo-random value in the range [0, 1].
    inline double getRandom()
    {
        return (double)rand() / (double)RAND_MAX;
    }

    struct DistributeGeometryOp : public Foundry::Katana::GeolibOp
    {
        static void setup(Foundry::Katana::GeolibSetupInterface &interface)
        {
            interface.setThreading(
                Foundry::Katana::GeolibSetupInterface::ThreadModeConcurrent);
        }

        static void cook(Foundry::Katana::GeolibCookInterface &interface)
        {
            // Seed the generator from the location name, as in the OpScript.
            srand(FnGeolibServices::FnExpressionMath::stablehash(
                interface.getOutputName()));

            double dim = 10.0;
            double s = 0.15;

            // Random scale
            double sx = s*(1.0 + 0.5*getRandom());
            double sy = s*(1.0 + 0.5*getRandom());
            double sz = s*(1.0 + 0.5*getRandom());

            // Random position within the box
            double tx = 2.0*dim*(getRandom() - 0.5);
            double ty = dim*getRandom() + sy;
            double tz = 2.0*dim*(getRandom() - 0.5);

            // Random rotation axis (normalized) and angle
            double ax = getRandom();
            double ay = getRandom();
            double az = getRandom();

            double axisLen = std::sqrt(ax*ax + ay*ay + az*az);
            ax /= axisLen;
            ay /= axisLen;
            az /= axisLen;

            double angle = 360.0 * getRandom();

            double translate[] = { tx, ty, tz };
            double rotate[] = { angle, ax, ay, az };
            double scale[] = { sx, sy, sz };

            interface.setAttr("xform.group0.translate",
                              FnAttribute::DoubleAttribute(translate, 3, 3));
            interface.setAttr("xform.group0.rotate",
                              FnAttribute::DoubleAttribute(rotate, 4, 4));
            interface.setAttr("xform.group0.scale",
                              FnAttribute::DoubleAttribute(scale, 3, 3));
        }
    };

    DEFINE_GEOLIBOP_PLUGIN(DistributeGeometryOp);

    void registerPlugins()
    {
        REGISTER_PLUGIN(DistributeGeometryOp, "DistributeGeometryOp", 0, 1);
    }
Apart from the difference in the random values generated, these Ops produce the same scene. However, the C++ version executes around 3.2 times as fast. When used to position 10,000 geometric primitives, the Ops use the following resources:
Op | CPU time | Memory used
OpScript | 0.989s | 54.24 MiB
C++ Op | 0.310s | 53.67 MiB
In this instance, memory usage is similar. However, if the OpScript is allocating and freeing a lot of small objects, converting to C++ also saves time spent in Lua garbage collection.