This is a multi-class classification pipeline demo.
- It uses Hydra to manage the model, data-loading, and training configuration.
- It uses ClearML to track experiments, models, datasets, and pipelines.

The included configuration trains a ResNet (pretrained on ImageNet) on the Microsoft Pets dataset.
- Python 3.7+
- virtualenv / conda (optional)
- `pip install -r requirements.txt`
- Create a free account on the ClearML website
- Follow the instructions here:
This repo contains a sample of the Microsoft Pets dataset, split into train and test sets. If you would like to download the full dataset, you can find it here: dataset
```
.
├── data
│   ├── test
│   │   ├── cat
│   │   └── dog
│   └── train
│       ├── cat
│       └── dog
└── src
    ├── conf
    │   ├── data_pipeline
    │   └── trainer
    │       ├── dataloader
    │       │   └── dataset
    │       ├── model
    │       │   └── optimizer
    │       └── tracker
    ├── data_pipeline
    ├── evaluator
    └── trainer
        ├── dataloader
        │   └── dataset
        └── models
```
Each package/module has an equivalent config "package" with potentially multiple settings.
For example, the model config folder has an optimizer sub-folder. This sub-folder contains configurations that specify how to instantiate and configure different optimizers. By default, the model uses the `adam.yaml` configuration.
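As a rough illustration of how such a config drives instantiation: Hydra configs typically name a class via a `_target_` key, which `hydra.utils.instantiate` resolves and calls with the remaining keys. The sketch below mimics that mechanism with the standard library only; the config values are stand-ins, not the repo's actual `adam.yaml`:

```python
import importlib

def instantiate(cfg: dict):
    """Minimal stand-in for hydra.utils.instantiate:
    import the class named by `_target_`, call it with the remaining keys."""
    module_path, cls_name = cfg["_target_"].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_path), cls_name)
    return cls(**{k: v for k, v in cfg.items() if k != "_target_"})

# An optimizer config would look like {"_target_": "torch.optim.Adam", "lr": 1e-3};
# here we use a stdlib class so the sketch runs anywhere:
frac = instantiate({"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4})
print(frac)  # 3/4
```

Swapping `adam.yaml` for another optimizer config just changes which `_target_` (and which keyword arguments) get composed into the final configuration.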
Hydra-driven ETL code that extracts, transforms, and loads data. Configured to work with ClearML datasets.
In charge of the training loop, and of publishing checkpoints and metrics to TensorBoard.
`trainer.py` contains the main training entrypoint.
In charge of loading the data, from augmentation to batching. Points to a `dataset` object that contains details about the actual images.
Contains the configuration for actually loading images from the file system, including metadata on the image format.
In charge of the model architecture, optimizers, learning rates, etc. A ResNet model implementation is provided.
Using Hydra, you can override any configuration value purely from the command line. Each run writes:
- an output folder with the final configuration used, post-overrides
- log files to the file system
For example, to override an object, such as the dataset used for training, trigger the trainer as follows:

```
python src/trainer/trainer.py dataloader/dataset=pets
```
To override a primitive value, use dot notation:

```
python src/trainer/trainer.py dataloader.augmentations.horizontal_flip=False
```
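The dot path in such an override mirrors the nesting of the underlying YAML. A hypothetical fragment of the dataloader config (a sketch; the repo's actual file may differ) would look like:

```yaml
# hypothetical sketch of a dataloader config, not the repo's actual file
augmentations:
  horizontal_flip: true
```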
Using a simple syntax, Hydra allows you to launch multiple instances of your program with different configuration variants.
Using a launcher, you can even specify that these different experiments run in parallel.
Example:

```
python train.py dataloader.augmentations.horizontal_flip=False,True hydra/launcher=joblib -m
```
This command:
- uses the `joblib` job launcher
- fires 2 experiments concurrently: one with `horizontal_flip=False` and one with `horizontal_flip=True`
Each experiment writes its logs to a folder such as this:

```
└── 2021-09-05
    └── 18-36-15
        ├── 0
        │   └── classifier.log
        ├── 1
        │   └── classifier.log
        └── multirun.yaml
```
- The `trainer` and `dataset` modules have a flag called `track`. By default it is set to `false`.
- If the trainer's flag is set to `true`, a new experiment is created each time you run the `trainer.py` script, logging the entire environment, including the Hydra configuration.
- If the dataset's flag is set to `true`, the data is expected to be loaded from ClearML's data versioning API and not from your local file system.
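Under the hood, enabling experiment tracking typically amounts to a `clearml.Task.init` call. A minimal sketch of what the trainer's `track` flag might toggle (the project and task names here are made up, not the repo's actual values):

```python
def init_tracking(track: bool):
    """If tracking is enabled, register this run as a ClearML experiment."""
    if not track:
        return None
    # Deferred import: requires `pip install clearml` and configured credentials.
    from clearml import Task
    return Task.init(project_name="pets-demo", task_name="trainer")

run = init_tracking(track=False)  # with track=True a ClearML experiment is created
```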
- The `data_pipeline.py` module uploads the (processed) dataset to ClearML's data versioning API.
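For reference, uploading a folder with ClearML's data versioning API looks roughly like the sketch below; the dataset and project names are placeholders, and this is not the repo's actual `data_pipeline.py`:

```python
def upload_dataset(local_dir: str) -> str:
    """Version a local folder as a ClearML dataset and return its id."""
    # Deferred import: requires `pip install clearml` and configured credentials.
    from clearml import Dataset
    ds = Dataset.create(dataset_name="pets-sample", dataset_project="pets-demo")
    ds.add_files(path=local_dir)  # register the files in this version
    ds.upload()                   # push them to the configured storage
    ds.finalize()                 # lock this version
    return ds.id
```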
- The `masks_pipeline.py` module, which runs both `data_pipeline.py` and `trainer.py`, is configured to clone tasks that already exist within the ClearML system. In order to run it, you need to first run `data_pipeline.py` and `trainer.py` with `track=true`.
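Cloning previously tracked tasks into a pipeline is what ClearML's `PipelineController` does. A rough sketch (the step, project, and task names are placeholders, and the real `masks_pipeline.py` may be structured differently):

```python
def build_pipeline():
    """Chain the ETL task and the training task by cloning their tracked runs."""
    # Deferred import: requires `pip install clearml` and configured credentials.
    from clearml.automation import PipelineController

    pipe = PipelineController(name="pets-pipeline", project="pets-demo", version="1.0")
    # Each step clones a task that was previously run with track=true.
    pipe.add_step(name="etl", base_task_project="pets-demo",
                  base_task_name="data_pipeline")
    pipe.add_step(name="train", parents=["etl"],
                  base_task_project="pets-demo", base_task_name="trainer")
    return pipe  # call pipe.start() to enqueue and execute it
```

This is why both scripts must be run with tracking enabled first: the pipeline only clones tasks that ClearML already knows about.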