Last week, PyTorch Lightning 0.9.0 and Hydra's fourth release candidate for 1.0.0 were released, chock-full of new features and mostly final APIs. I thought it'd be a good time to revisit my side project, Leela Zero PyTorch, and see how these new versions could be integrated into it. In this post, I'll talk about some of the new features of the two libraries and how they helped Leela Zero PyTorch. I won't go into the details of Leela Zero PyTorch itself all that much here, so if you want more context on my side project, you can read my previous blog post about it here.
This release is a major milestone for the PyTorch Lightning team as they diligently work toward 1.0.0. It introduces a number of new features and an API that is ever closer to the final one. Before we jump in, if you want to read more about this release, check out the official blog post. If you want to learn more about PyTorch Lightning in general, check out the GitHub page as well as the official documentation.
Have you found yourself repetitively implementing *_epoch_end methods just so you can aggregate results from your *_step methods? Have you found yourself getting tripped up on how to properly log the metrics calculated in your *_step and *_epoch_end methods? You're not alone, and PyTorch Lightning 0.9.0 introduces a new abstraction called Result to solve these very problems.
There are two types of Result: TrainResult and EvalResult. As the names suggest, TrainResult is used for training, and EvalResult is used for validation and testing. Their interfaces are simple: you specify the main metrics to act on during instantiation (for TrainResult, the metric to minimize; for EvalResult, the metrics to checkpoint or early stop on), then you specify any additional metrics to log. Let's take a look at how they're used in my project: