Statistical Programming in Machine Learning: Contrast Between Pyro and TFP

In machine learning, statistical or probabilistic programming is commonly done with one of the two probabilistic programming languages covered below. In simple words, probabilistic programming is a tool for statistical modeling: it means solving problems with a language in which we can build and design statistical models as the solution.

It’s about applying the concepts of statistics using computer programming languages. Using probabilistic models, one can reason about how our beliefs about a model’s parameters and hyperparameters affect its output.
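As a toy illustration of beliefs being updated by data, here is a minimal sketch in plain Python that uses neither Pyro nor TFP. It assumes a Beta prior on a coin's heads probability and a made-up list of flips; the function names and the data are hypothetical and chosen only for this example.

```python
from fractions import Fraction


def update_beta(alpha, beta, observations):
    """Conjugate update of a Beta(alpha, beta) prior given 0/1 observations."""
    heads = sum(observations)
    tails = len(observations) - heads
    return alpha + heads, beta + tails


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, kept exact with Fraction."""
    return Fraction(alpha, alpha + beta)


prior = (1, 1)                       # uniform prior belief about P(heads)
flips = [1, 1, 0, 1, 1, 0, 1, 1]     # hypothetical data: 6 heads, 2 tails
posterior = update_beta(*prior, flips)

prior_mean = beta_mean(*prior)
posterior_mean = beta_mean(*posterior)
```

The prior mean is 1/2, and after seeing the flips the posterior mean moves toward the observed frequency of heads. A PPL automates this kind of update for models where no closed-form answer exists.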

Famous Probabilistic Programming Languages

1. Pyro

Pyro is a probabilistic programming language (PPL) written in Python and backed by PyTorch. With Pyro, we get deep probabilistic modeling and Bayesian modeling combined with the best of modern deep learning. It can be installed as follows:

pip3 install pyro-ppl

Or, to install it from source, use the following commands:

git clone https://github.com/pyro-ppl/pyro.git

cd pyro

pip install .[extras]

Import Pyro using a simple line of code:

import pyro

2. TensorFlow Probability (TFP)

TFP is a Python library built on TensorFlow that makes it possible to combine probabilistic models and deep learning models on GPU and TPU. It can be used by anyone who wishes to incorporate domain knowledge to understand data and make relevant predictions. To install TFP, type the following command in your command prompt or Anaconda prompt:

pip install --upgrade tensorflow-probability

TFP can be imported with the following line of code:

import tensorflow_probability as tfp

The Contrast Between Pyro and TFP

1. Documentation

Documentation for both Pyro and TFP is excellent and plentiful, although TFP's offers less explanation from the perspective of neural networks. In Pyro, the module pyro.nn provides implementations of neural network modules that are useful in the context of deep probabilistic programming. In TFP, tfp.layers provides neural network layers with uncertainty over the functions they represent, extending TensorFlow layers.

2. Language

Users of both TFP and Pyro write Python. However, the TFP API is considerably more verbose. By that, I mean we sometimes have to write more lines of code to reach a solution. That can be good, because it gives us more control over the entire program, and bad when the same thing is available in a shorter form in Pyro.

3. Ramp-up Time

With Pyro, code runs eagerly like ordinary PyTorch, so there are few new concepts to learn before you become productive. TFP, on the other hand, historically required TensorFlow 1.x concepts like placeholders, variable scoping, and sessions, so it took more time to get started. TensorFlow 2.x runs eagerly by default, which narrows this gap.

4. Deployment

Both TFP and Pyro can be easily deployed on a small-scale server. For mobile, microcomputer, or embedded deployments, TensorFlow has generally been the more efficient choice compared with PyTorch. Less effort is required to deploy TensorFlow on Android and iOS than PyTorch.

5. Graphs

TensorFlow has better computational graph visualizations, which are native to it, than other libraries like Torch and Theano. Edward, another PPL, is built on TensorFlow and enables features such as computational graphs, distributed training, CPU/GPU integration, automatic differentiation, and visualization with TensorBoard. Pyro, however, offers little comparable built-in visualization functionality.

6. Markov Chain Monte Carlo

TFP implements a number of Markov chain Monte Carlo (MCMC) algorithms (such as random-walk Metropolis and Hamiltonian Monte Carlo), whose purpose is to sample from a probability distribution, along with a few variational inference algorithms. Until 2018, Pyro did not support Markov chain Monte Carlo. It has since been updated and now supports MCMC with HMC and NUTS.
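To make the idea of "sampling a probability distribution" concrete, here is a minimal random-walk Metropolis sketch in plain Python. It is not how either library implements its samplers; the target (a standard normal), the step size, and the function names are assumptions made for this illustration.

```python
import math
import random


def log_target(x):
    """Unnormalized log density of a standard normal (assumed target)."""
    return -0.5 * x * x


def metropolis(log_prob, n_samples, step=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis with a Gaussian proposal of width `step`."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_accept = log_prob(proposal) - log_prob(x)
        # Accept with probability min(1, exp(log_accept)); otherwise stay put.
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            x = proposal
        samples.append(x)
    return samples


samples = metropolis(log_target, n_samples=20000)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

The sample mean and variance should land close to 0 and 1. HMC and NUTS follow the same accept/reject idea but use gradients of the log density to propose better moves.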

7. Optimizers

Just as TFP implements several optimizers on top of TensorFlow, including Nelder-Mead, BFGS, and L-BFGS (for solving unconstrained nonlinear optimization problems), Pyro wraps the optimizers that are present in PyTorch. The module pyro.optim provides support for optimization in Pyro. It can be said that the two PPLs depend on their underlying frameworks (TensorFlow and PyTorch).

8. Bijectors

In TFP, bijectors implement the change of variables for a probability density. When we map from one space to another, we also induce a map from probability densities on the initial space to densities on the target space.

But because we are mapping to a different space, we need to track this mapping and account for it when computing the probability density in the target space. Bijectors handle this bookkeeping for smooth, invertible mappings. Pyro does not use the term bijector, but it offers the same functionality as transforms in pyro.distributions.transforms, which build on torch.distributions.transforms.
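The change-of-variables bookkeeping can be sketched in a few lines of plain Python. The ExpBijector class below is hypothetical; its method names only loosely echo the naming used by bijector-style APIs. It pushes a standard normal through exp, which should reproduce the log-normal density.

```python
import math


def normal_log_prob(x, loc=0.0, scale=1.0):
    """Log density of a Normal(loc, scale) at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


class ExpBijector:
    """Hypothetical bijector for y = exp(x), mapping the reals to (0, inf)."""

    def forward(self, x):
        return math.exp(x)

    def inverse(self, y):
        return math.log(y)

    def inverse_log_det_jacobian(self, y):
        # d/dy log(y) = 1/y, so log|dx/dy| = -log(y).
        return -math.log(y)


def transformed_log_prob(y, base_log_prob, bijector):
    """log p_Y(y) = log p_X(inverse(y)) + log|d inverse / dy|."""
    x = bijector.inverse(y)
    return base_log_prob(x) + bijector.inverse_log_det_jacobian(y)


def lognormal_log_prob(y):
    """Closed-form log density of a standard log-normal, for comparison."""
    return -math.log(y) - 0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(y) ** 2


y = 2.5
via_bijector = transformed_log_prob(y, normal_log_prob, ExpBijector())
closed_form = lognormal_log_prob(y)
```

The two values agree, which is exactly the accounting described above: the Jacobian term corrects the base density for the stretching of space.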

9. Time Series

The pyro.contrib.timeseries module provides a collection of Bayesian time series models useful for forecasting applications, and the pyro.contrib.forecast module provides a Forecaster object for fitting a model and producing forecasts. After we give input data to the model, we just tell the model how to make an informed prediction.

It’s that easy: just data and a probabilistic framework. TFP, however, can make use of TensorFlow’s time series models like CNNs and RNNs along with its framework for Bayesian structural time series models (tfp.sts), a high-level interface for fitting time-series models.

10. Distributions

A distribution is a base class for constructing and organizing the properties (e.g., mean, variance) of random variables (e.g., Bernoulli, Gaussian). One example is the normal distribution. Most distributions in Pyro are thin wrappers around PyTorch distributions. For details on the PyTorch distribution interface, you can check out torch.distributions.distribution.Distribution. TFP, however, has its own module, tfp.distributions.
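The "base class that organizes properties" idea can be sketched without either library. The classes below are hypothetical and far smaller than the real torch.distributions or tfp.distributions interfaces; they only show the shape of the abstraction.

```python
import math


class Distribution:
    """Hypothetical minimal base class: subclasses fill in the properties."""

    def mean(self):
        raise NotImplementedError

    def variance(self):
        raise NotImplementedError

    def log_prob(self, value):
        raise NotImplementedError


class Bernoulli(Distribution):
    def __init__(self, p):
        self.p = p

    def mean(self):
        return self.p

    def variance(self):
        return self.p * (1.0 - self.p)

    def log_prob(self, value):
        return math.log(self.p if value == 1 else 1.0 - self.p)


class Normal(Distribution):
    def __init__(self, loc, scale):
        self.loc = loc
        self.scale = scale

    def mean(self):
        return self.loc

    def variance(self):
        return self.scale ** 2

    def log_prob(self, value):
        z = (value - self.loc) / self.scale
        return -0.5 * z * z - math.log(self.scale) - 0.5 * math.log(2.0 * math.pi)


coin = Bernoulli(0.25)
gauss = Normal(loc=1.0, scale=2.0)
```

Code that only needs a mean, a variance, or a log probability can then work with any distribution through the shared interface.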

11. Generalized Linear Models (GLM)

In statistics, the generalized linear model is a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution. In TFP, the tfp.glm module provides a high-level interface for fitting generalized linear models. Pyro, however, does not have an equivalent dedicated GLM module, so a GLM there is written as an ordinary Pyro model.
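As a small worked example of a GLM with a non-normal response, here is a sketch of logistic regression (Bernoulli response, logit link) fitted by Newton's method in plain Python. It does not use tfp.glm; the data and function names are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_glm(xs, ys, n_iter=25):
    """Fit y ~ Bernoulli(sigmoid(b0 + b1 * x)) by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0              # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0      # entries of X^T W X (negative Hessian)
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system (X^T W X) * delta = gradient in closed form.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: the outcome becomes more likely as x grows.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 1, 0, 1]
b0, b1 = fit_logistic_glm(xs, ys)
```

At the fitted coefficients the gradient of the log-likelihood is essentially zero, which is the condition a GLM fitting routine solves for.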

Conclusion

Using these factors, it is safe to conclude that Pyro does not differ that much from TFP. Both are Python libraries, and both Python APIs are well documented. Pyro, however, has a shorter ramp-up time, so it is quicker to get started with than TFP. Deciding between these two frameworks will depend on how accessible you find the learning curve for each of them. Your selection will also depend on your organization’s requirements.

#artificial intelligence #machine learning #statistical programming