Share the Load with Distributed CopyCat Training

CopyCat can share the training load across multiple machines using a familiar interface, so distributing training feels just like distributing rendering with a standard render farm application such as OpenCue.

Offloading training onto other machines also means you can continue working while CopyCat trains models in the background.

Note:  Distributed CopyCat training is qualified on Nuke 14.1 and 15.0. Distributed training via OpenCue has been validated internally on 14.1; it is expected to work on 15.0, but this has not been tested.

As an artist, you work as normal on your workstation while the render manager handles the distribution of training tasks to the available workers. The image shows a typical distributed training setup using a render manager to distribute the training load.

Note:  You must also install Nuke on the worker machines for distributed training to operate as expected.

Note:  CopyCat can train using Nuke Render Licences on the distributed worker machines.

Distributed Training in CopyCat is controlled through the following environment variables:

Note:  If you omit one of the four required environment variables, CopyCat runs in non-distributed mode and Nuke is locked until training completes or is stopped.
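
A minimal check like the following, run in Nuke's Script Editor or a terminal-mode session on a worker, can confirm that all four variables are visible to the Nuke process. The variable names shown here are placeholders, not the real names; substitute the variable names documented for your Nuke version.

```
# Run on a worker to confirm the distributed training variables are visible
# to the Nuke process. The names below are placeholders -- substitute the
# four variable names documented for your Nuke version.
import os

REQUIRED_VARS = [
    "COPYCAT_PLACEHOLDER_MAIN_ADDRESS",   # placeholder: address of the main machine
    "COPYCAT_PLACEHOLDER_MAIN_PORT",      # placeholder: communication port
    "COPYCAT_PLACEHOLDER_NUM_MACHINES",   # placeholder: total number of machines
    "COPYCAT_PLACEHOLDER_MACHINE_INDEX",  # placeholder: this machine's index
]

missing = [name for name in REQUIRED_VARS if name not in os.environ]
if missing:
    # If any of the four variables is unset, CopyCat runs in non-distributed mode.
    print("Missing variables, CopyCat will train locally: " + ", ".join(missing))
else:
    print("All distributed training variables are set.")
```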

Manual Distributed Training Example

Set up distributed training from the command line.
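
As a rough illustration, a worker launch wrapped in a small Python script might look like the sketch below. The environment variable names, the Nuke install path, and the worker script are placeholders; the Manual Distributed Training Example page documents the exact names and launch command.

```
# Illustrative launcher for one worker machine. The environment variable names,
# the Nuke install path, and the worker script are placeholders -- see the
# Manual Distributed Training Example page for the exact names and command.
import os
import subprocess

env = os.environ.copy()
env.update({
    "COPYCAT_PLACEHOLDER_MAIN_ADDRESS": "192.168.0.10",  # main workstation's address
    "COPYCAT_PLACEHOLDER_MAIN_PORT": "29500",             # communication port
    "COPYCAT_PLACEHOLDER_NUM_MACHINES": "4",              # total machines in the run
    "COPYCAT_PLACEHOLDER_MACHINE_INDEX": "1",             # this worker's index
})

# Launch Nuke in terminal mode (-t) on this worker with the variables set.
# Both paths are hypothetical examples.
subprocess.Popen(
    ["/usr/local/Nuke15.0v1/Nuke15.0", "-t", "/shared/scripts/copycat_worker.py"],
    env=env,
)
```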

Automated Distributed Training Example

Set up distributed training using a render manager.
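
For context, submitting the worker tasks as an OpenCue job through the pyoutline API could look roughly like the following sketch. The show, shot, user, layer name, frame range, and launch_copycat_worker.sh script are placeholder assumptions, not values from this guide.

```
# Rough sketch of submitting CopyCat worker tasks as an OpenCue job using the
# pyoutline API. The show/shot/user values, layer name, frame range, and the
# launch_copycat_worker.sh script are placeholders.
import outline
import outline.cuerun
from outline.modules.shell import Shell

job = outline.Outline(
    "copycat_distributed_training",
    show="myshow",
    shot="shot01",
    user="artist",
)

# One frame per worker: OpenCue's RQD exposes the frame number to the process
# (e.g. via the CUE_IFRAME environment variable), which the hypothetical launch
# script can map to a worker index before starting Nuke.
workers = Shell(
    "copycat_worker",
    command=["/shared/scripts/launch_copycat_worker.sh"],
    range="1-4",
)
job.add_layer(workers)

# Submit the job to the farm.
outline.cuerun.launch(job, use_pycuerun=False)
```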

Manage and Monitor Training in CopyCat

When distributed training is set up and working correctly, you can start training from Nuke's menu bar rather than from CopyCat's Properties panel. Custom menu items can be added using a Python script; an example is included on the Automated Distributed Training Example page. In this example, we use OpenCue's CueGUI application to manage training runs from within Nuke and add a Train CopyCat on Render Farm item to Nuke's Render menu.
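
A minimal version of such a menu addition, assuming a hypothetical submit_copycat_training() function that hands the job to your render manager, might look like this in a menu.py:

```
# Example menu.py snippet adding a Train CopyCat on Render Farm item to Nuke's
# Render menu. submit_copycat_training() is a hypothetical stand-in for the
# code that actually submits the training job to your render manager.
import nuke

def submit_copycat_training():
    # Placeholder: gather the selected CopyCat node and hand the job off to
    # the render manager (for example via OpenCue's pyoutline API).
    nuke.message("Submitting CopyCat training to the render farm...")

nuke.menu("Nuke").addCommand(
    "Render/Train CopyCat on Render Farm", submit_copycat_training
)
```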

Tip:  If you train CopyCat models without distributed training, Nuke is locked until training completes or is stopped.

Optimizing Distributed Training: How Many Machines Should You Use?

Determining the optimal number of machines for distributed training can be complex, but the following guide can help you make an informed decision.

Tip:  Distributed training scales effectively when each GPU is heavily utilized. In other words, the more work each GPU has per step, the better training scales across multiple GPUs.

Factors Influencing Scalability

Batch Size

In distributed training, the batch is divided evenly among the GPUs. For instance, with a batch size of 8 on 4 GPUs, each GPU processes 2 crops per step.

A larger batch size means more work for each GPU, leading to better scalability.
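
As a quick illustration of that split (not CopyCat code, just the arithmetic):

```
# The batch is divided evenly across GPUs: each GPU processes
# batch_size / num_gpus crops per training step.
batch_size = 8
num_gpus = 4

crops_per_gpu = batch_size // num_gpus
print(crops_per_gpu)  # 2 crops per GPU per step
```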

Crop Size & Model Size

Larger crop or model sizes mean more computation per GPU per step, which improves scalability across multiple GPUs.

GPU Specifications

High-end GPUs process the same batch faster and at lower utilization than lower-end GPUs. As a result, with top-tier GPUs you might not see significant benefits from scaling unless you also increase the batch or crop size.

Examples:

Scenario A

Batch Size: 12

Crop Size: 256

Model Size: Medium

GPU Spec: Low

Recommendation: Using four GPUs might double your training speed.

Scenario B

Batch Size: 8

Crop Size: 128

Model Size: Small

GPU Spec: High

Recommendation: There might be minimal benefit in using more than one GPU.

Scenario C

Batch Size: 128

Crop Size: 256

Model Size: Large

GPU Spec: High

Recommendation: Deploying up to eight GPUs could deliver up to an 8x increase in training speed.

In conclusion, while there's no one-size-fits-all answer, considering the above factors can guide you in optimizing your distributed training setup.
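
If you want a starting point, the rough rule-of-thumb sketch below mirrors the scenarios above. It is not part of CopyCat, and its size factors and "heavily utilized" threshold are illustrative assumptions rather than Foundry guidance; treat it as a way to structure the trade-off, not as a definitive answer.

```
# Rough rule-of-thumb helper that mirrors the scenarios above. It is not part
# of CopyCat; the size factors and the "heavily utilized" threshold are
# illustrative assumptions chosen so the three scenarios come out as described.
def suggested_gpu_count(batch_size, crop_size, model_size, high_end_gpu):
    """Return a rough upper bound on the number of GPUs worth deploying."""
    model_factor = {"Small": 1, "Medium": 2, "Large": 4}[model_size]
    # High-end GPUs finish the same work at lower utilization, so they need
    # more work per step before extra GPUs pay off.
    gpu_factor = 0.5 if high_end_gpu else 1.0

    best = 1
    for gpus in (2, 4, 8):
        crops_per_gpu = batch_size / gpus
        work_per_gpu = crops_per_gpu * (crop_size / 128.0) * model_factor * gpu_factor
        if work_per_gpu >= 8:  # illustrative threshold for "heavily utilized"
            best = gpus
    return best

print(suggested_gpu_count(12, 256, "Medium", high_end_gpu=False))  # Scenario A: 4
print(suggested_gpu_count(8, 128, "Small", high_end_gpu=True))     # Scenario B: 1
print(suggested_gpu_count(128, 256, "Large", high_end_gpu=True))   # Scenario C: 8
```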