
Distributed Training

This project contains scripts/modules for distributed training.
Given the size of current deep learning models, datasets, and training methodologies, waiting for a model to train on a single GPU can feel like waiting for an infant to take its first steps.

Let’s cut to the chase.
In this repository I try to simplify the concepts and (a few) implementations of distributed training.

Introduction

There are generally two ways to distribute computation across multiple devices:

- Data parallelism: replicate the model on every device and feed each replica a different slice of the data, synchronizing gradients between replicas.
- Model parallelism: split a single model across devices, so each device holds and computes only part of it.

Before going ahead, there are two cases we need to think about first:

Certain concepts and implementations have been adapted directly from other sources; you will find them reproduced here.

PyTorch Distributed Training

PyTorch has two ways to split models and data across multiple GPUs: nn.DataParallel and nn.DistributedDataParallel.

nn.DataParallel: single-process and multi-threaded, restricted to the GPUs of one machine. It is a one-line change, but it is usually the slower option because of Python's GIL and the per-iteration scatter/gather of inputs and outputs.

nn.DistributedDataParallel: one process per GPU, working across one or many machines. Each process holds its own model replica and optimizer, and gradients are averaged with an all-reduce, which sidesteps the GIL and scales better.
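
For contrast, here is a minimal sketch of the nn.DataParallel route (the model is a placeholder, not this repository's code); the multi-process nn.DistributedDataParallel route is what the Implementation section below walks through:

    import torch
    import torch.nn as nn

    # Placeholder model standing in for whatever this repository trains.
    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

    if torch.cuda.device_count() > 1:
        # Single process: each batch is scattered across the visible GPUs
        # and the outputs are gathered back on the default device.
        model = nn.DataParallel(model)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)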

Implementation

Driver script: on execution of main.py in either module, the script launches a process for every GPU.
Each process needs to know which GPU to use and where it ranks amongst all the processes that are running (a minimal driver sketch appears after the Calculated line below).

Parameters: --nodes (number of machines), --gpus (GPUs per machine) and --epochs (number of training epochs), matching the Usage section below.

Calculated: the world size (nodes × gpus, i.e. the total number of processes) and each process's global rank.
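
A minimal sketch of such a driver, assuming the --nodes/--gpus/--epochs flags from the Usage section below plus a hypothetical --nr flag for this machine's index; the repository's actual argument handling may differ:

    import argparse
    import os

    import torch.multiprocessing as mp

    def train(gpu, args):
        # Global rank of this process: machine index * GPUs per machine + local GPU index.
        rank = args.nr * args.gpus + gpu
        ...  # per-process training, sketched under "Training" below

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("--nodes", type=int, default=1)   # number of machines
        parser.add_argument("--gpus", type=int, default=1)    # GPUs per machine
        parser.add_argument("--nr", type=int, default=0)      # index of this machine (hypothetical flag)
        parser.add_argument("--epochs", type=int, default=5)
        args = parser.parse_args()

        # Calculated: total number of processes across all machines.
        args.world_size = args.nodes * args.gpus

        # Rendezvous address of the rank-0 process (use the rank-0 machine's
        # address instead of localhost when training on multiple nodes).
        os.environ["MASTER_ADDR"] = "localhost"
        os.environ["MASTER_PORT"] = "12355"

        # Launch one process per GPU on this machine.
        mp.spawn(train, nprocs=args.gpus, args=(args,))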

Training: each process joins the process group, pins its GPU, wraps the model in nn.DistributedDataParallel and reads its own shard of the data through a DistributedSampler.
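
A sketch of what each spawned worker typically does with nn.DistributedDataParallel; the model and dataset are placeholders, not this repository's code, and MASTER_ADDR/MASTER_PORT are assumed to have been set by the driver above:

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader, TensorDataset
    from torch.utils.data.distributed import DistributedSampler

    def train(gpu, args):
        rank = args.nr * args.gpus + gpu
        dist.init_process_group(backend="nccl", world_size=args.world_size, rank=rank)
        torch.cuda.set_device(gpu)

        # Placeholder model and data; substitute the repository's own.
        model = DDP(nn.Linear(128, 10).cuda(gpu), device_ids=[gpu])
        dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 10, (1024,)))

        # Each process reads a different, non-overlapping shard of the dataset.
        sampler = DistributedSampler(dataset, num_replicas=args.world_size, rank=rank)
        loader = DataLoader(dataset, batch_size=32, sampler=sampler)

        criterion = nn.CrossEntropyLoss().cuda(gpu)
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

        for epoch in range(args.epochs):
            sampler.set_epoch(epoch)  # reshuffle the shards every epoch
            for x, y in loader:
                x, y = x.cuda(gpu, non_blocking=True), y.cuda(gpu, non_blocking=True)
                optimizer.zero_grad()
                loss = criterion(model(x), y)
                loss.backward()  # gradients are all-reduced across processes here
                optimizer.step()

        dist.destroy_process_group()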

Drawbacks:
Here we try to train one model on multiple GPUs instead of one.
As elegant and simple as the approach seems, there are a few pitfalls.

Problem:

Solution:

An alternative to nn.DistributedDataParallel is Nvidia’s Apex, which adds mixed precision.

Mixed Precision: the use of lower-precision operations (float16 and bfloat16) in a model during training to make it run faster and use less memory. Using mixed precision can improve performance by more than 3 times on modern GPUs and 60% on TPUs [1].
The weights remain at 32-bit, whereas other quantities such as the loss and gradients are computed at 16-bit. More on that here
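
A minimal sketch of Apex's amp API under that scheme; the model and data are placeholders, and opt_level "O1" keeps FP32 master weights while running most ops in FP16:

    import torch
    import torch.nn as nn
    from apex import amp

    model = nn.Linear(128, 10).cuda()          # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

    # Patch the model and optimizer for mixed precision: weights stay in FP32,
    # most forward/backward ops run in FP16, and dynamic loss scaling guards
    # against gradient underflow.
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

    x = torch.randn(32, 128).cuda()
    y = torch.randint(0, 10, (32,)).cuda()
    loss = nn.functional.cross_entropy(model(x), y)

    # Backward through the scaled loss so small FP16 gradients are not flushed to zero.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()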

Usage

    $ python main.py --nodes 1 --gpus 2 --epochs 5
    $ python main.py --nodes 2 --gpus 2 --epochs 5

Distributed Training with Nvidia’s Apex

Concept

Nvidia’s Apex is the best alternative to traditional distributed training for the following reasons:

Changes as compared to torch
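
A hedged sketch of what those changes usually look like relative to the plain-torch worker above, based on Apex's documented amp and apex.parallel.DistributedDataParallel API (not necessarily this repository's exact diff):

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from apex import amp
    from apex.parallel import DistributedDataParallel as ApexDDP

    def train(gpu, args):
        rank = args.nr * args.gpus + gpu
        dist.init_process_group(backend="nccl", world_size=args.world_size, rank=rank)
        torch.cuda.set_device(gpu)

        model = nn.Linear(128, 10).cuda(gpu)   # placeholder model
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

        # Change 1: initialize amp before wrapping the model for distribution.
        model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

        # Change 2: use Apex's DistributedDataParallel instead of torch's
        # (it infers the device from the model, so no device_ids argument).
        model = ApexDDP(model)

        x = torch.randn(32, 128).cuda(gpu)
        y = torch.randint(0, 10, (32,)).cuda(gpu)
        loss = nn.functional.cross_entropy(model(x), y)

        # Change 3: backward through the scaled loss instead of loss.backward().
        with amp.scale_loss(loss, optimizer) as scaled_loss:
            scaled_loss.backward()
        optimizer.step()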

Usage

    $ python main.py --nodes 1 --gpus 2 --epochs 5
    $ python main.py --nodes 2 --gpus 2 --epochs 5