Keras is an open-source neural-network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, R, Theano, or PlaidML. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible.
Jane  Reid

Jane Reid


How to Convert PyTorch Model To Keras


PyTorch to Keras model converter.


pip install pytorch2keras 

Important notice

To use the converter properly, please, make changes in your ~/.keras/keras.json:

..."backend": "tensorflow","image_data_format": "channels_first",...


For the proper conversion to a tensorflow.js format, please use the new flag names='short'.

Here is a short instruction how to get a tensorflow.js model:

  1. First of all, you have to convert your model to Keras with this converter:
k_model = pytorch_to_keras(model, input_var, [(10, 32, 32,)], verbose=True, names='short')  
  1. Now you have Keras model. You can save it as h5 file and then convert it with tensorflowjs_converter but it doesn't work sometimes. As alternative, you may get Tensorflow Graph and save it as a frozen model:
# Function below copied from here:# def freeze_session(session, keep_var_names=None, output_names=None, clear_devices=True):    """    Freezes the state of a session into a pruned computation graph.    Creates a new computation graph where variable nodes are replaced by    constants taking their current value in the session. The new graph will be    pruned so subgraphs that are not necessary to compute the requested    outputs are removed.    @param session The TensorFlow session to be frozen.    @param keep_var_names A list of variable names that should not be frozen,                          or None to freeze all the variables in the graph.    @param output_names Names of the relevant graph outputs.    @param clear_devices Remove the device directives from the graph for better portability.    @return The frozen graph definition.    """    from tensorflow.python.framework.graph_util import convert_variables_to_constants    graph = session.graph    with graph.as_default():        freeze_var_names = \            list(set( for v in tf.global_variables()).difference(keep_var_names or []))        output_names = output_names or []        output_names += [ for v in tf.global_variables()]        input_graph_def = graph.as_graph_def()        if clear_devices:            for node in input_graph_def.node:                node.device = ""        frozen_graph = convert_variables_to_constants(session, input_graph_def,                                                      output_names, freeze_var_names)        return frozen_graphfrom keras import backend as Kimport tensorflow as tffrozen_graph = freeze_session(K.get_session(),                              output_names=[ for out in k_model.outputs])tf.train.write_graph(frozen_graph, ".", "my_model.pb", as_text=False)print([i for i in k_model.outputs])
  1. You will see the output layer name, so, now it's time to convert my_model.pb to tfjs model:
tensorflowjs_converter  \    --input_format=tf_frozen_model \    --output_node_names='TANHTObs/Tanh' \    my_model.pb \    model_tfjs
  1. Thats all!
const MODEL_URL = `model_tfjs/tensorflowjs_model.pb`;
const WEIGHTS_URL = `model_tfjs/weights_manifest.json`;
const model = await tf.loadFrozenModel(MODEL_URL, WEIGHTS_URL);

How to use

It's the converter of PyTorch graph to a Keras (Tensorflow backend) model.

Firstly, we need to load (or create) a valid PyTorch model:

class TestConv2d(nn.Module):    """    Module for Conv2d testing    """    def __init__(self, inp=10, out=16, kernel_size=3):        super(TestConv2d, self).__init__()        self.conv2d = nn.Conv2d(inp, out, stride=1, kernel_size=kernel_size, bias=True)    def forward(self, x):        x = self.conv2d(x)        return xmodel = TestConv2d()# load weights here# model.load_state_dict(torch.load(path_to_weights.pth))

The next step - create a dummy variable with correct shape:

input_np = np.random.uniform(0, 1, (1, 10, 32, 32))input_var = Variable(torch.FloatTensor(input_np))

We use the dummy-variable to trace the model (with jit.trace):

from pytorch2keras import pytorch_to_keras# we should specify shape of the input tensork_model = pytorch_to_keras(model, input_var, [(10, 32, 32,)], verbose=True)  

You can also set H and W dimensions to None to make your model shape-agnostic (e.g. fully convolutional netowrk):

from pytorch2keras.converter import pytorch_to_keras# we should specify shape of the input tensork_model = pytorch_to_keras(model, input_var, [(10, None, None,)], verbose=True)  

That's all! If all the modules have converted properly, the Keras model will be stored in the k_model variable.


Here is the only method pytorch_to_keras from pytorch2keras module.

def pytorch_to_keras(    model, args, input_shapes=None,    change_ordering=False, verbose=False, name_policy=None,):


  • model - a PyTorch model (nn.Module) to convert;
  • args - a list of dummy variables with proper shapes;
  • input_shapes - (experimental) list with overrided shapes for inputs;
  • change_ordering - (experimental) boolean, if enabled, the converter will try to change BCHW to BHWC
  • verbose - boolean, detailed log of conversion
  • name_policy - (experimental) choice from [keep, short, random]. The selector set the target layer naming policy.

Supported layers


  • ReLU
  • LeakyReLU
  • SELU
  • Sigmoid
  • Softmax
  • Tanh



  • Conv2d
  • ConvTrsnpose2d


  • Add
  • Mul
  • Sub
  • Div



  • BatchNorm2d
  • InstanceNorm2d


  • MaxPool2d
  • AvgPool2d
  • Global MaxPool2d (adaptive pooling to shape [1, 1])

Models converted with pytorch2keras

  • ResNet*
  • VGG*
  • PreResNet*
  • DenseNet*
  • AlexNet
  • Mobilenet v2


Look at the tests directory.


This software is covered by MIT License.

Author: gmalivenko
Source Code:
License: MIT License

#python #pytorch #keras 

How to Convert PyTorch Model To Keras
Dorcas  Ferry

Dorcas Ferry


Magnitude: A Fast, Simple Vector Embedding Utility Library

Magnitude: a fast, simple vector embedding utility library

A feature-packed Python package and vector storage file format for utilizing vector embeddings in machine learning models in a fast, efficient, and simple manner developed by Plasticity. It is primarily intended to be a simpler / faster alternative to Gensim, but can be used as a generic key-vector store for domains outside NLP. It offers unique features like out-of-vocabulary lookups and streaming of large models over HTTP. Published in our paper at EMNLP 2018 and available on arXiv.

Table of Contents


You can install this package with pip:

pip install pymagnitude # Python 2.7
pip3 install pymagnitude # Python 3

Google Colaboratory has some dependency issues with installing Magnitude due to conflicting dependencies. You can use the following snippet to install Magnitude on Google Colaboratory:

# Install Magnitude on Google Colab
! echo "Installing Magnitude.... (please wait, can take a while)"
! (curl | /bin/bash 1>/dev/null 2>/dev/null)
! echo "Done installing Magnitude."


Vector space embedding models have become increasingly common in machine learning and traditionally have been popular for natural language processing applications. A fast, lightweight tool to consume these large vector space embedding models efficiently is lacking.

The Magnitude file format (.magnitude) for vector embeddings is intended to be a more efficient universal vector embedding format that allows for lazy-loading for faster cold starts in development, LRU memory caching for performance in production, multiple key queries, direct featurization to the inputs for a neural network, performant similiarity calculations, and other nice to have features for edge cases like handling out-of-vocabulary keys or misspelled keys and concatenating multiple vector models together. It also is intended to work with large vector models that may not fit in memory.

It uses SQLite, a fast, popular embedded database, as its underlying data store. It uses indexes for fast key lookups as well as uses memory mapping, SIMD instructions, and spatial indexing for fast similarity search in the vector space off-disk with good memory performance even between multiple processes. Moreover, memory maps are cached between runs so even after closing a process, speed improvements are reaped.

Benchmarks and Features

MetricMagnitude LightMagnitude MediumMagnitude HeavyMagnitude Stream
Initial load time0.7210s━ 1━ 17.7550s
Cold single key query0.0001s━ 1━ 11.6437s
Warm single key query 
(same key as cold query)
0.00004s━ 1━ 10.0004s
Cold multiple key query 
0.0442s━ 1━ 11.7753s
Warm multiple key query 
(n=25) (same keys as cold query)
0.00004s━ 1━ 10.0001s
First most_similar search query 
(n=10) (worst case)
247.05s━ 1━ 1-
First most_similar search query 
(n=10) (average case) (w/ disk persistent cache)
1.8217s━ 1━ 1-
Subsequent most_similar search 
(n=10) (different key than first query)
0.2434s━ 1━ 1-
Warm subsequent most_similar search 
(n=10) (same key as first query)
First most_similar_approx search query 
(n=10, effort=1.0) (worst case)
First most_similar_approx search query 
(n=10, effort=1.0) (average case) (w/ disk persistent cache)
Subsequent most_similar_approx search 
(n=10, effort=1.0) (different key than first query)
Subsequent most_similar_approx search 
(n=10, effort=0.1) (different key than first query)
Warm subsequent most_similar_approx search 
(n=10, effort=1.0) (same key as first query)
File size4.21GB5.29GB10.74GB0.00GB
Process memory (RAM) utilization18KB━ 1━ 11.71MB
Process memory (RAM) utilization after 100 key queries168KB━ 1━ 11.91MB
Process memory (RAM) utilization after 100 key queries + similarity search342KB2━ 1━ 1 
Integrity checks and tests
Universal format between word2vec (.txt, .bin), GloVe (.txt), fastText (.vec), and ELMo (.hdf5) with converter utility
Simple, Pythonic interface
Few dependencies
Support for larger than memory models
Lazy loading whenever possible for speed and performance
Optimized for threading and multiprocessing
Bulk and multiple key lookup with padding, truncation, placeholder, and featurization support
Concatenting multiple vector models together
Basic out-of-vocabulary key lookup 
(character n-gram feature hashing)
Advanced out-of-vocabulary key lookup with support for misspellings 
(character n-gram feature hashing to similar in-vocabulary keys)
Approximate most similar search with an annoy index
Built-in training for new models

1: same value as previous column
2: uses mmap to read from disk, so the OS will still allocate pages of memory when memory is available, but it can be shared between processes and isn't managed within each process for extremely large files which is a performance win
*: All benchmarks were performed on the Google News pre-trained word vectors (GoogleNews-vectors-negative300.bin) with a MacBook Pro (Retina, 15-inch, Mid 2014) 2.2GHz quad-core Intel Core i7 @ 16GB RAM on SSD over an average of trials where feasible.

Pre-converted Magnitude Formats of Popular Embeddings Models

Popular embedding models have been pre-converted to the .magnitude format for immmediate download and usage:


(basic support for out-of-vocabulary keys)

(advanced support for out-of-vocabulary keys)

(advanced support for out-of-vocabulary keys and faster most_similar_approx)
Google - word2vecGoogle News 100B300D300D300D
Stanford - GloVeWikipedia 2014 + Gigaword 5 6B50D100D200D300D50D100D200D300D50D100D200D300D
Stanford - GloVeWikipedia 2014 + Gigaword 5 6B 
(lemmatized by Plasticity)
Stanford - GloVeCommon Crawl 840B300D300D300D
Stanford - GloVeTwitter 27B25D50D100D200D25D50D100D200D25D50D100D200D
Facebook - fastTextEnglish Wikipedia 2017 16B300D300D300D
Facebook - fastTextEnglish Wikipedia 2017 + subword 16B300D300D300D
Facebook - fastTextCommon Crawl 600B300D300D300D
AI2 - AllenNLP ELMoELMo ModelsELMo ModelsELMo ModelsELMo Models
Google - BERTComing Soon...Coming Soon...Coming Soon...Coming Soon...

There are instructions below for converting any .bin, .txt, .vec, .hdf5 file to a .magnitude file.

Using the Library

Constructing a Magnitude Object

You can create a Magnitude object like so:

from pymagnitude import *
vectors = Magnitude("/path/to/vectors.magnitude")

If needed, and included for convenience, you can also open a .bin, .txt, .vec, .hdf5 file directly with Magnitude. This is, however, less efficient and very slow for large models as it will convert the file to a .magnitude file on the first run into a temporary directory. The temporary directory is not guaranteed to persist and does not persist when your computer reboots. You should pre-convert .bin, .txt, .vec, .hdf5 files with python -m pymagnitude.converter typically for faster speeds, but this feature is useful for one-off use-cases. A warning will be generated when instantiating a Magnitude object directly with a .bin, .txt, .vec, .hdf5. You can supress warnings by setting the supress_warnings argument in the constructor to True.

  • By default, lazy loading is enabled. You can pass in an optional lazy_loading argument to the constructor with the value -1 to disable lazy-loading and pre-load all vectors into memory (a la Gensim), 0 (default) to enable lazy-loading with an unbounded in-memory LRU cache, or an integer greater than zero X to enable lazy-loading with an LRU cache that holds the X most recently used vectors in memory.
  • If you want the data for the most_similar functions to be pre-loaded eagerly on initialization, set eager to True.
  • Note, even when lazy_loading is set to -1 or eager is set to True data will be pre-loaded into memory in a background thread to prevent the constructor from blocking for a few minutes for large models. If you really want blocking behavior, you can pass True to the blocking argument.
  • By default, unit-length normalized vectors are returned unless you are loading an ELMo model. Set the optional argument normalized to False if you wish to recieve the raw non-normalized vectors instead.
  • By default, NumPy arrays are returned for queries. Set the optional argument use_numpy to False if you wish to recieve Python lists instead.
  • By default, querying for keys is case-sensitive. Set the optional argument case_insensitive to True if you wish to perform case-insensitive searches.
  • Optionally, you can include the pad_to_length argument which will specify the length all examples should be padded to if passing in multple examples. Any examples that are longer than the pad length will be truncated.
  • Optionally, you can set the truncate_left argument to True if you want the beginning of the the list of keys in each example to be truncated instead of the end in case it is longer than pad_to_length when specified.
  • Optionally, you can set the pad_left argument to True if you want the padding to appear at the beginning versus the end (which is the default).
  • Optionally, you can pass in the placeholders argument, which will increase the dimensions of each vector by a placeholders amount, zero-padding those extra dimensions. This is useful, if you plan to add other values and information to the vectors and want the space for that pre-allocated in the vectors for efficiency.
  • Optionally, you can pass in the language argument with an ISO 639-1 Language Code, which, if you are using Magnitude for word vectors, will ensure the library respects stemming and other language-specific features for that language. The default is en for English. You can also pass in None if you are not using Magnitude for word vectors.
  • Optionally, you can pass in the dtype argument which will let you control the data type of the NumPy arrays returned by Magnitude.
  • Optionally, you can pass in the devices argument which will let you control the usage of GPUs when the underlying models supports GPU usage. This argument should be a list of integers, where each integer represents the GPU device number (0, 1, etc.).
  • Optionally, you can pass in the temp_dir argument which will let you control the location of the temporary directory Magnitude will use.
  • Optionally, you can pass in the log argument which will have Magnitude log progress to standard error when slow operations are taking place.


You can query the total number of vectors in the file like so:


You can query the dimensions of the vectors like so:


You can check if a key is in the vocabulary like so:

"cat" in vectors

You can iterate through all keys and vectors like so:

for key, vector in vectors:

You can query for the vector of a key like so:


You can index for the n-th key and vector like so:


You can query for the vector of multiple keys like so:

vectors.query(["I", "read", "a", "book"])

A 2D array (keys by vectors) will be returned.

You can query for the vector of multiple examples like so:

vectors.query([["I", "read", "a", "book"], ["I", "read", "a", "magazine"]])

A 3D array (examples by keys by vectors) will be returned. If pad_to_length is not specified, and the size of each example is uneven, they will be padded to the length of the longest example.

You can index for the keys and vectors of multiple indices like so:

vectors[:42] # slice notation
vectors[42, 1337, 2001] # tuple notation

You can query the distance of two or multiple keys like so:

vectors.distance("cat", "dog")
vectors.distance("cat", ["dog", "tiger"])

You can query the similarity of two or multiple keys like so:

vectors.similarity("cat", "dog")
vectors.similarity("cat", ["dog", "tiger"])

You can query for the most similar key out of a list of keys to a given key like so:

vectors.most_similar_to_given("cat", ["dog", "television", "laptop"]) # dog

You can query for which key doesn't match a list of keys to a given key like so:

vectors.doesnt_match(["breakfast", "cereal", "dinner", "lunch"]) # cereal

You can query for the most similar (nearest neighbors) keys like so:

vectors.most_similar("cat", topn = 100) # Most similar by key
vectors.most_similar(vectors.query("cat"), topn = 100) # Most similar by vector

Optionally, you can pass a min_similarity argument to most_similar. Values from [-1.0-1.0] are valid.

You can also query for the most similar keys giving positive and negative examples (which, incidentally, solves analogies) like so:

vectors.most_similar(positive = ["woman", "king"], negative = ["man"]) # queen

Similar to vectors.most_similar, a vectors.most_similar_cosmul function exists that uses the 3CosMul function from Levy and Goldberg:

vectors.most_similar_cosmul(positive = ["woman", "king"], negative = ["man"]) # queen

You can also query for the most similar keys using an approximate nearest neighbors index which is much faster, but doesn't guarantee the exact answer:

vectors.most_similar_approx(positive = ["woman", "king"], negative = ["man"])

Optionally, you can pass an effort argument with values between [0.0-1.0] to the most_similar_approx function which will give you runtime trade-off. The default value for effort is 1.0 which will take the longest, but will give the most accurate result.

You can query for all keys closer to a key than another key is like so:

vectors.closer_than("cat", "rabbit") # ["dog", ...]

You can access all of the underlying vectors in the model in a large numpy.memmap array of size (len(vectors) x vectors.emb_dim) like so:


You can clean up all associated resources, open files, and database connections like so:


Basic Out-of-Vocabulary Keys

For word vector representations, handling out-of-vocabulary keys is important to handling new words not in the trained model, handling mispellings and typos, and making models trained on the word vector representations more robust in general.

Out-of-vocabulary keys are handled by assigning them a random vector value. However, the randomness is deterministic. So if the same out-of-vocabulary key is encountered twice, it will be assigned the same random vector value for the sake of being able to train on those out-of-vocabulary keys. Moreover, if two out-of-vocabulary keys share similar character n-grams ("uberx", "uberxl") they will placed close to each other even if they are both not in the vocabulary:

vectors = Magnitude("/path/to/GoogleNews-vectors-negative300.magnitude")
"uberx" in vectors # False
"uberxl" in vectors # False
vectors.query("uberx") # array([ 5.07109939e-02, -7.08248823e-02, -2.74812328e-02, ... ])
vectors.query("uberxl") # array([ 0.04734962, -0.08237578, -0.0333479, -0.00229564, ... ])
vectors.similarity("uberx", "uberxl") # 0.955000000200815

Advanced Out-of-Vocabulary Keys

If using a Magnitude file with advanced out-of-vocabulary support (Medium or Heavy), out-of-vocabulary keys will also be embedded close to similar keys (determined by string similarity) that are in the vocabulary:

vectors = Magnitude("/path/to/GoogleNews-vectors-negative300.magnitude")
"uberx" in vectors # False
"uberification" in vectors # False
"uber" in vectors # True
vectors.similarity("uberx", "uber") # 0.7383483267618451
vectors.similarity("uberification", "uber") # 0.745452837882727

Handling Misspellings and Typos

This also makes Magnitude robust to a lot of spelling errors:

vectors = Magnitude("/path/to/GoogleNews-vectors-negative300.magnitude")
"missispi" in vectors # False
vectors.similarity("missispi", "mississippi") # 0.35961736624824003
"discrimnatory" in vectors # False
vectors.similarity("discrimnatory", "discriminatory") # 0.8309152561753461
"hiiiiiiiiii" in vectors # False
vectors.similarity("hiiiiiiiiii", "hi") # 0.7069775034853861

Character n-grams are used to create this effect for out-of-vocabulary keys. The inspiration for this feature was taken from Facebook AI Research's Enriching Word Vectors with Subword Information, but instead of utilizing character n-grams at train time, character n-grams are used at inference so the effect can be somewhat replicated (but not perfectly replicated) in older models that were not trained with character n-grams like word2vec and GloVe.

Concatenation of Multiple Models

Optionally, you can combine vectors from multiple models to feed stronger information into a machine learning model like so:

from pymagnitude import *
word2vec = Magnitude("/path/to/GoogleNews-vectors-negative300.magnitude")
glove = Magnitude("/path/to/glove.6B.50d.magnitude")
vectors = Magnitude(word2vec, glove) # concatenate word2vec with glove
vectors.query("cat") # returns 350-dimensional NumPy array ('cat' from word2vec concatenated with 'cat' from glove)
vectors.query(("cat", "cats")) # returns 350-dimensional NumPy array ('cat' from word2vec concatenated with 'cats' from glove)

You can concatenate more than two vector models, simply by passing more arguments to constructor.

Additional Featurization (Parts of Speech, etc.)

You can automatically create vectors from additional features you may have such as parts of speech, syntax dependency information, or any other information using the FeaturizerMagnitude class:

from pymagnitude import *
pos_vectors = FeaturizerMagnitude(100, namespace = "PartsOfSpeech")
pos_vectors.dim # 4 - number of dims automatically determined by Magnitude from 100
pos_vectors.query("NN") # - array([ 0.08040417, -0.71705252,  0.61228951,  0.32322192]) 
pos_vectors.query("JJ") # - array([-0.11681135,  0.10259253,  0.8841201 , -0.44063763])
pos_vectors.query("NN") # - array([ 0.08040417, -0.71705252,  0.61228951,  0.32322192]) (deterministic hashing so the same value is returned every time for the same key)
dependency_vectors = FeaturizerMagnitude(100, namespace = "SyntaxDependencies")
dependency_vectors.dim # 4 - number of dims automatically determined by Magnitude from 100
dependency_vectors.query("nsubj") # - array([-0.81043793,  0.55401352, -0.10838071,  0.15656626])
dependency_vectors.query("prep") # - array([-0.30862918, -0.44487267, -0.0054573 , -0.84071788])

Magnitude will use the feature hashing trick internally to directly use the hash of the feature value to create a unique vector for that feature value.

The first argument to FeaturizerMagnitude should be an approximate upper-bound on the number of values for the feature. Since there are < 100 parts of speech tags and < 100 syntax dependencies, we choose 100 for both in the example above. The value chosen will determine how many dimensions Magnitude will automatically assign to the particular the FeaturizerMagnitude object to reduce the chance of a hash collision. The namespace argument can be any string that describes your additional feature. It is optional, but highly recommended.

You can then concatenate these features for use with a standard Magnitude object:

from pymagnitude import *
word2vec = Magnitude("/path/to/GoogleNews-vectors-negative300.magnitude")
pos_vectors = FeaturizerMagnitude(100, namespace = "PartsOfSpeech")
dependency_vectors = FeaturizerMagnitude(100, namespace = "SyntaxDependencies")
vectors = Magnitude(word2vec, pos_vectors, dependency_vectors) # concatenate word2vec with pos and dependencies
    ("I", "PRP", "nsubj"), 
    ("saw", "VBD", "ROOT"), 
    ("a", "DT", "det"), 
    ("cat", "NN", "dobj"), 
    (".",  ".", "punct")
  ]) # array of size 5 x (300 + 4 + 4) or 5 x 308

# Or get a unique vector for every 'buffalo' in:
# "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo"
# (
    ("Buffalo", "JJ", "amod"), 
    ("buffalo", "NNS", "nsubj"), 
    ("Buffalo", "JJ", "amod"), 
    ("buffalo", "NNS", "nsubj"), 
    ("buffalo",  "VBP", "rcmod"),
    ("buffalo",  "VB", "ROOT"),
    ("Buffalo",  "JJ", "amod"),
    ("buffalo",  "NNS", "dobj")
  ]) # array of size 8 x (300 + 4 + 4) or 8 x 308

A machine learning model, given this output, now has access to parts of speech information and syntax dependency information instead of just word vector information. In this case, this additional information can give neural networks stronger signal for semantic information and reduce the need for training data.

Using Magnitude with a ML library

Magnitude makes it very easy to quickly build and iterate on models that need to use vector representations by taking care of a lot of pre-processing code to convert a dataset of text (or keys) into vectors. Moreover, it can make these models more robust to out-of-vocabulary words and misspellings.

There is example code available using Magnitude to build an intent classification model for the ATIS (Airline Travel Information Systems) dataset (Train/Test), used for chatbots or conversational interfaces, in a few popular machine learning libraries below.


You can access a guide for using Magnitude with Keras (which supports TensorFlow, Theano, CNTK) at this Google Colaboratory Python notebook.


The PyTorch guide is coming soon.


The TFLearn guide is coming soon.


You can use the MagnitudeUtils class for convenient access to functions that may be useful when creating machine learning models.

You can import MagnitudeUtils like so:

  from pymagnitude import MagnitudeUtils

You can download a Magnitude model from a remote source like so:

  vecs = Magnitude(MagnitudeUtils.download_model('word2vec/heavy/GoogleNews-vectors-negative300'))

By default, download_model will download files from to a ~/.magnitude folder created automatically. If the file has already been downloaded, it will not be downloaded again. You can change the directory of the local download folder using the optional download_dir argument. You can change the domain from which models will be downloaded with the optional remote_path argument.

You can create a batch generator for X and y data with batchify, like so:

  X = [.3, .2, .7, .8, .1]
  y = [0, 0, 1, 1, 0]
  batch_gen = MagnitudeUtils.batchify(X, y, 2)
  for X_batch, y_batch in batch_gen:
    print(X_batch, y_batch)
  # Returns:
  # 1st loop: X_batch = [.3, .2], y_batch = [0, 0]
  # 2nd loop: X_batch = [.7, .8], y_batch = [1, 1]
  # 3rd loop: X_batch = [.1], y_batch = [0]
  # next loop: repeats infinitely...

You can encode class labels to integers and back with class_encoding, like so:

  add_class, class_to_int, int_to_class = MagnitudeUtils.class_encoding()
  add_class("cat") # Returns: 0
  add_class("dog") # Returns: 1
  add_class("cat") # Returns: 0
  class_to_int("dog") # Returns: 1
  class_to_int("cat") # Returns: 0
  int_to_class(1) # Returns: "dog"
  int_to_class(0) # Returns: "cat"

You can convert categorical data with class integers to one-hot NumPy arrays with to_categorical, like so:

  y = [1, 5, 2]
  MagnitudeUtils.to_categorical(y, num_classes = 6) # num_classes is optional
  # Returns: 
  # array([[0., 1., 0., 0., 0., 0.] 
  #       [0., 0., 0., 0., 0., 1.] 
  #       [0., 0., 1., 0., 0., 0.]])

You can convert from one-hot NumPy arrays back to a 1D NumPy array of class integers with from_categorical, like so:

  y_c = [[0., 1., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 1.]]
  # Returns: 
  # array([1., 5.])

Concurrency and Parallelism

The library is thread safe (it uses a different connection to the underlying store per thread), is read-only, and it never writes to the file. Because of the light-memory usage, you can also run it in multiple processes (or use multiprocessing) with different address spaces without having to duplicate the data in-memory like with other libraries and without having to create a multi-process shared variable since data is read off-disk and each process keeps its own LRU memory cache. For heavier functions, like most_similar a shared memory mapped file is created to share memory between processes.

File Format and Converter

The Magnitude package uses the .magnitude file format instead of .bin, .txt, .vec, or .hdf5 as with other vector models like word2vec, GloVe, fastText, and ELMo. There is an included command-line utility for converting word2vec, GloVe, fastText, and ELMo files to Magnitude files.

You can convert them like so:

python -m pymagnitude.converter -i <PATH TO FILE TO BE CONVERTED> -o <OUTPUT PATH FOR MAGNITUDE FILE>

The input format will automatically be determined by the extension / the contents of the input file. You should only need to perform this conversion once for a model. After converting, the Magnitude file format is static and it will not be modified or written to make concurrent read access safe.

The flags for pymagnitude.converter are specified below:

  • You can pass in the -h flag for help and to list all flags.
  • You can use the -p <PRECISION> flag to specify the decimal precision to retain (selecting a lower number will create smaller files). The actual underlying values are stored as integers instead of floats so this is essentially quantization for smaller model footprints.
  • You can add an approximate nearest neighbors index to the file (increases size) with the -a flag which will enable the use of the most_similar_approx function. The -t <TREES> flag controls the number of trees in the approximate neigherest neighbors index (higher is more accurate) when used in conjunction with the -a flag (if not supplied, the number of trees is automatically determined).
  • You can pass the -s flag to disable adding subword information to the file (which will make the file smaller), but disable advanced out-of-vocabulary key support.
  • If converting a model that has no vocabulary like ELMo, you can pass the -v flag along with the path to another Magnitude file you would like to take the vocabulary from.

Optionally, you can bulk convert many files by passing an input folder and output folder instead of an input file and output file. All .txt, .bin, .vec, .hdf5 files in the input folder will be converted to .magnitude files in the the output folder. The output folder must exist before a bulk conversion operation.

Remote Loading

You can instruct Magnitude download and open a model from Magnitude's remote repository instead of a local file path. The file will automatically be downloaded locally on the first run to ~/.magnitude/ and subsequently skip the download if the file already exists locally.

  vecs = Magnitude('') # full url
  vecs = Magnitude('word2vec/heavy/GoogleNews-vectors-negative300') # or, use the shorthand for the url

For more control over the remote download domain and local download directory, see how to use MagnitudeUtils.download_model.

Remote Streaming over HTTP

Magnitude models are generally large files (multiple GB) that take up a lot of disk space, even though the .magnitude format makes it fast to utilize the vectors. Magnitude has an option to stream these large files over HTTP. This is explicitly different from the remote loading feature, in that the model doesn't even need to be downloaded at all. You can begin querying models immediately with no disk space used at all.

  vecs = Magnitude('', stream=True) # full url
  vecs = Magnitude('word2vec/heavy/GoogleNews-vectors-negative300', stream=True) # or, use the shorthand for the url

  vecs.query("king") # Returns: the vector for "king" quickly, even with no local model file downloaded

You can play around with a demo of this in a Google Colaboratory Python Notebook.

This feature is extremely useful if your computing environment is resource constrainted (low RAM and low disk space), you want to experiment quickly with vectors without downloading and setting up large model files, or you are training a small model. While there is some added network latency since the data is being streamed, Magnitude will still use an in-memory cache as specified by the lazy_loading constructor parameter. Since languages generally have a Zipf-ian distribution, the network latency should largely not be an issue after the cache is warmed after being queried a small number of times.

They will be queried directly off a static HTTP web server using HTTP Range Request headers. All Magnitude methods support streaming, however, most_similar and most_similar_approx may be slow as they are not optimized for streaming yet. You can see how this streaming mode performs currently in the benchmarks, however, it will get faster as we optimize it in the future!

Other Documentation

Other documentation is not available at this time. See the source file directly (it is well commented) if you need more information about a method's arguments or want to see all supported features.

Other Languages

Currently, we only provide English word vector models on this page pre-converted to the .magnitude format. You can, however, still use Magnitude with word vectors of other languages. Facebook has trained their fastText vectors for many different languages. You can down the .vec file for any language you want and then convert it to .magnitude with the converter.

Other Programming Languages

Currently, reading Magnitude files is only supported in Python, since it has become the de-facto language for machine learning. This is sufficient for most use cases. Extending the file format to other languages shouldn't be difficult as SQLite has a native C implementation and has bindings in most languages. The file format itself and the protocol for reading and searching is also fairly straightforward upon reading the source code of this repository.

Other Domains

Currently, natural language processing is the most popular domain that uses pre-trained vector embedding models for word vector representations. There are, however, other domains like computer vision that have started using pre-trained vector embedding models like Deep1B for image representation. This library intends to stay agnostic to various domains and instead provides a generic key-vector store and interface that is useful for all domains.


The main repository for this project can be found on GitLab. The GitHub repository is only a mirror. Pull requests for more tests, better error-checking, bug fixes, performance improvements, or documentation or adding additional utilties / functionalities are welcome on GitLab.

You can contact us at


  • Speed optimizations on remote streaming and exposing stream cache configuration options
  • Make most_similar_approx optimized for streaming
  • In addition to the "Light", "Medium", and "Heavy" flavors, add a "Ludicrous" flavor that will be of an even larger file size but removes the constraint of the initially slow most_similar lookups.
  • Add Google BERT support
  • Support fastText .bin format

Other Notable Projects

  • spotify/annoy - Powers the approximate nearest neighbors algorithm behind most_similar_approx in Magnitude using random-projection trees and hierarchical 2-means. Thanks to author Erik Bernhardsson for helping out with some of the integration details between Magnitude and Annoy.

Citing this Repository

If you'd like to cite our paper at EMNLP 2018, you can use the following BibTeX citation:

  title={Magnitude: A Fast, Efficient Universal Vector Embedding Utility Package},
  author={Patel, Ajay and Sands, Alexander and Callison-Burch, Chris and Apidianaki, Marianna},
  booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},

or follow the Google Scholar link for other ways to cite the paper.

If you'd like to cite this repository you can use the following DOI badge:  DOI

Clicking on the badge will lead to a page that will help you generate proper BibTeX citations, JSON-LD citations, and other citations.

LICENSE and Attribution

This repository is licensed under the license found here.

Seismic” icon by JohnnyZi from the Noun Project.

Author: plasticityai
Source Code:
License: MIT License

#python #keras 

Magnitude: A Fast, Simple Vector Embedding Utility Library
Jane  Reid

Jane Reid


BentoML: Model Serving Made Easy

Model Serving Made Easy 

BentoML is a flexible, high-performance framework for serving, managing, and deploying machine learning models.

  • Supports multiple ML frameworks, including Tensorflow, PyTorch, Keras, XGBoost and more
  • Cloud native deployment with Docker, Kubernetes, AWS, Azure and many more
  • High-Performance online API serving and offline batch serving
  • Web dashboards and APIs for model registry and deployment management

BentoML bridges the gap between Data Science and DevOps. By providing a standard interface for describing a prediction service, BentoML abstracts away how to run model inference efficiently and how model serving workloads can integrate with cloud infrastructures. See how it works!

Join our community on Slack 👈

pypi status Downloads Actions Status Documentation Status join BentoML Slack


BentoML documentation:

Key Features

Production-ready online serving:

  • Support multiple ML frameworks including PyTorch, TensorFlow, Scikit-Learn, XGBoost, and many more
  • Containerized model server for production deployment with Docker, Kubernetes, OpenShift, AWS ECS, Azure, GCP GKE, etc
  • Adaptive micro-batching for optimal online serving performance
  • Discover and package all dependencies automatically, including PyPI, conda packages and local python modules
  • Serve compositions of multiple models
  • Serve multiple endpoints in one model server
  • Serve any Python code along with trained models
  • Automatically generate REST API spec in Swagger/OpenAPI format
  • Prediction logging and feedback logging endpoint
  • Health check endpoint and Prometheus /metrics endpoint for monitoring

Standardize model serving and deployment workflow for teams:

  • Central repository for managing all your team's prediction services via Web UI and API
  • Launch offline batch inference job from CLI or Python
  • One-click deployment to cloud platforms including AWS EC2, AWS Lambda, AWS SageMaker, and Azure Functions
  • Distributed batch or streaming serving with Apache Spark
  • Utilities that simplify CI/CD pipelines for ML
  • Automated offline batch inference job with Dask (roadmap)
  • Advanced model deployment for Kubernetes ecosystem (roadmap)
  • Integration with training and experimentation management products including MLFlow, Kubeflow (roadmap)

ML Frameworks

Deployment Options

Be sure to check out deployment overview doc to understand which deployment option is best suited for your use case.

One-click deployment with BentoML:

Deploy with open-source platforms:

Manual cloud deployment guides:


BentoML provides APIs for defining a prediction service, a servable model so to speak, which includes the trained ML model itself, plus its pre-processing, post-processing code, input/output specifications and dependencies. Here's what a simple prediction service look like in BentoML:

import pandas as pd

from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import DataframeInput, JsonOutput
from bentoml.frameworks.sklearn import SklearnModelArtifact

# BentoML packages local python modules automatically for deployment
from my_ml_utils import my_encoder

class MyPredictionService(BentoService):
    A simple prediction service exposing a Scikit-learn model

    @api(input=DataframeInput(), output=JsonOutput(), batch=True)
    def predict(self, df: pd.DataFrame):
        An inference API named `predict` that takes tabular data in pandas.DataFrame 
        format as input, and returns Json Serializable value as output.

        A batch API is expect to receive a list of inference input and should returns
        a list of prediction results.
        model_input_df = my_encoder.fit_transform(df)
        predictions = self.artifacts.my_model.predict(model_input_df)

        return list(predictions)

This can be easily plugged into your model training process: import your bentoml prediction service class, pack it with your trained model, and call save to persist the entire prediction service at the end, which creates a BentoML bundle:

from my_prediction_service import MyPredictionService
svc = MyPredictionService()
svc.pack('my_model', my_sklearn_model)  # saves to $HOME/bentoml/repository/MyPredictionService/{version}/

The generated BentoML bundle is a file directory that contains all the code files, serialized models, and configs required for reproducing this prediction service for inference. BentoML automatically captures all the python dependencies information and have everything versioned and managed together in one place.

BentoML automatically generates a version ID for this bundle, and keeps track of all bundles created under the $HOME/bentoml directory. With a BentoML bundle, user can start a local API server hosting it, either by its file path or its name and version:

bentoml serve MyPredictionService:latest

# alternatively
bentoml serve $HOME/bentoml/repository/MyPredictionService/{version}/

A docker container image that's ready for production deployment can be created now with just one command:

bentoml containerize MyPredictionService:latest -t my_prediction_service:v3

docker run -p 5000:5000 my_prediction_service:v3 --workers 2

The container image produced will have all the required dependencies installed. Besides the model inference API, the containerized BentoML model server also comes with Prometheus metrics, health check endpoint, prediction logging, and tracing support out-of-the-box. This makes it super easy for your DevOps team to incorporate your models into production systems.

BentoML's model management component is called Yatai, it means food cart in Japanese, and you can think of it as where you'd store your bentos 🍱. Yatai provides CLI, Web UI, and Python API for accessing BentoML bundles you have created, and you can start a Yatai server for your team to manage all models on cloud storage(S3, GCS, MinIO etc) and build CI/CD workflow around it. Learn more about it here.

Yatai UI

Read the Quickstart Guide to learn more about the basic functionalities of BentoML. You can also try it out here on Google Colab.

Why BentoML

Moving trained Machine Learning models to serving applications in production is hard. It is a sequential process across data science, engineering and DevOps teams: after a model is trained by the data science team, they hand it over to the engineering team to refine and optimize code and creates an API, before DevOps can deploy.

And most importantly, Data Science teams want to continuously repeat this process, monitor the models deployed in production and ship new models quickly. It often takes months for an engineering team to build a model serving & deployment solution that allow data science teams to ship new models in a repeatable and reliable way.

BentoML is a framework designed to solve this problem. It provides high-level APIs for Data Science team to create prediction services, abstract away DevOps' infrastructure needs and performance optimizations in the process. This allows DevOps team to seamlessly work with data science side-by-side, deploy and operate their models packaged in BentoML format in production.

Check out Frequently Asked Questions page on how does BentoML compares to Tensorflow-serving, Clipper, AWS SageMaker, MLFlow, etc.


Have questions or feedback? Post a new github issue or discuss in our Slack channel: join BentoML Slack

Want to help build BentoML? Check out our contributing guide and the development guide.


BentoML is under active development and is evolving rapidly. It is currently a Beta release, we may change APIs in future releases and there are still major features being worked on.

Read more about the latest updates from the releases page.

Usage Tracking

BentoML by default collects anonymous usage data using Amplitude. It only collects BentoML library's own actions and parameters, no user or model data will be collected. Here is the code that does it.

This helps BentoML team to understand how the community is using this tool and what to build next. You can easily opt-out of usage tracking by running the BentoML commands with the --do-not-track option.

% bentoml [command] --do-not-track

or by setting the BENTOML_DO_NOT_TRACK environment variable to True.



Apache License 2.0

FOSSA Status

Author: bentoml
Source Code:
License: Apache-2.0 License

#python #kubernetes #tensorflow #azure #keras #pytorch 

BentoML: Model Serving Made Easy

DeepPavlov: Open-source Conversational AI Library for Chatbot

DeepPavlov is an open-source conversational AI library built on TensorFlow, Keras and PyTorch.

DeepPavlov is designed for

  • development of production ready chat-bots and complex conversational systems,
  • research in the area of NLP and, particularly, of dialog systems.


0.   We support Linux and Windows platforms, Python 3.6 and Python 3.7

  • Python 3.5 is not supported!
  • installation for Windows requires Git(for example, git) and Visual Studio 2015/2017 with C++ build tools installed!
  1. Create and activate a virtual environment:
  • Linux
python -m venv env
source ./env/bin/activate
  • Windows
python -m venv env

2.   Install the package inside the environment:

pip install deeppavlov


There is a bunch of great pre-trained NLP models in DeepPavlov. Each model is determined by its config file.

List of models is available on the doc page in the deeppavlov.configs (Python):

from deeppavlov import configs

When you're decided on the model (+ config file), there are two ways to train, evaluate and infer it:

GPU requirements

To run supported DeepPavlov models on GPU you should have CUDA 10.0 installed on your host machine and TensorFlow with GPU support (tensorflow-gpu) installed in your python environment. Current supported TensorFlow version is 1.15.2. Run

pip install tensorflow-gpu==1.15.2

before installing model's package requirements to install supported tensorflow-gpu version.

Before making choice of an interface, install model's package requirements (CLI):

python -m deeppavlov install <config_path>
  • where <config_path> is path to the chosen model's config file (e.g. deeppavlov/configs/ner/slotfill_dstc2.json) or just name without .json extension (e.g. slotfill_dstc2)

Command line interface (CLI)

To get predictions from a model interactively through CLI, run

python -m deeppavlov interact <config_path> [-d]
  • -d downloads required data -- pretrained model files and embeddings (optional).

You can train it in the same simple way:

python -m deeppavlov train <config_path> [-d]

Dataset will be downloaded regardless of whether there was -d flag or not.

To train on your own data you need to modify dataset reader path in the train config doc. The data format is specified in the corresponding model doc page.

There are even more actions you can perform with configs:

python -m deeppavlov <action> <config_path> [-d]
  • <action> can be
    • download to download model's data (same as -d),
    • train to train the model on the data specified in the config file,
    • evaluate to calculate metrics on the same dataset,
    • interact to interact via CLI,
    • riseapi to run a REST API server (see doc),
    • telegram to run as a Telegram bot (see doc),
    • msbot to run a Miscrosoft Bot Framework server (see doc),
    • predict to get prediction for samples from stdin or from if -f <file_path> is specified.
  • <config_path> specifies path (or name) of model's config file
  • -d downloads required data


To get predictions from a model interactively through Python, run

from deeppavlov import build_model

model = build_model(<config_path>, download=True)

# get predictions for 'input_text1', 'input_text2'
model(['input_text1', 'input_text2'])
  • where download=True downloads required data from web -- pretrained model files and embeddings (optional),
  • <config_path> is path to the chosen model's config file (e.g. "deeppavlov/configs/ner/ner_ontonotes_bert_mult.json") or deeppavlov.configs attribute (e.g. deeppavlov.configs.ner.ner_ontonotes_bert_mult without quotation marks).

You can train it in the same simple way:

from deeppavlov import train_model 

model = train_model(<config_path>, download=True)
  • download=True downloads pretrained model, therefore the pretrained model will be, first, loaded and then train (optional).

Dataset will be downloaded regardless of whether there was -d flag or not.

To train on your own data you need to modify dataset reader path in the train config doc. The data format is specified in the corresponding model doc page.

You can also calculate metrics on the dataset specified in your config file:

from deeppavlov import evaluate_model 

model = evaluate_model(<config_path>, download=True)

There are also available integrations with various messengers, see Telegram Bot doc page and others in the Integrations section for more info.

Breaking Changes

Breaking changes in version 0.15.0

Breaking changes in version 0.7.0

Breaking changes in version 0.6.0

    • all models default endpoints were renamed to /model
    • by default model arguments names are taken from configuration parameter instead of pre-set names from a settings file
    • swagger api endpoint moved from /apidocs to /docs
  • when using "max_proba": true in a proba2labels component for classification, it will return single label for every batch element instead of a list. One can set "top_n": 1 to get batches of single item lists as before

Breaking changes in version 0.5.0

  • dependencies have to be reinstalled for most pipeline configurations
  • models depending on tensorflow require CUDA 10.0 to run on GPU instead of CUDA 9.0
  • scikit-learn models have to be redownloaded or retrained

Breaking changes in version 0.4.0!

  • default target variable name for neural evolution was changed from MODELS_PATH to MODEL_PATH.

Breaking changes in version 0.3.0!

  • component option fit_on_batch in configuration files was removed and replaced with adaptive usage of the fit_on parameter.

Breaking changes in version 0.2.0!

  • utils module was moved from repository root in to deeppavlov module
  • ms_bot_framework_utils,server_utils, telegram utils modules was renamed to ms_bot_framework, server and telegram correspondingly
  • rename metric functions exact_match to squad_v2_em and squad_f1 to squad_v2_f1
  • replace dashes in configs name with underscores

Breaking changes in version 0.1.0!

As of version 0.1.0 all models, embeddings and other downloaded data for provided configurations are by default downloaded to the .deeppavlov directory in current user's home directory. This can be changed on per-model basis by modifying a ROOT_PATH variable or related fields one by one in model's configuration file.

In configuration files, for all features/models, dataset readers and iterators "name" and "class" fields are combined into the "class_name" field.

deeppavlov.core.commands.infer.build_model_from_config() was renamed to build_model and can be imported from the deeppavlov module directly.

The way arguments are passed to metrics functions during training and evaluation was changed and documented.

Quick Links

Please leave us your feedback on how we can improve the DeepPavlov framework.


Named Entity Recognition | Slot filling

Intent/Sentence Classification | Question Answering over Text (SQuAD)

Knowledge Base Question Answering

Sentence Similarity/Ranking | TF-IDF Ranking

Morphological tagging | Syntactic parsing

Automatic Spelling Correction | ELMo training and fine-tuning

Speech recognition and synthesis (ASR and TTS) based on NVIDIA NeMo

Entity Linking | Multitask BERT


Goal(Task)-oriented Bot | Open Domain Questions Answering

Frequently Asked Questions Answering


BERT embeddings for the Russian, Polish, Bulgarian, Czech, and informal English

ELMo embeddings for the Russian language

FastText embeddings for the Russian language

Auto ML

Tuning Models


REST API | Socket API | Yandex Alice

Telegram | Microsoft Bot Framework

Amazon Alexa | Amazon AWS

Download Details:
Author: deepmipt
Source Code:
License: Apache-2.0 License

#chatbot #machine-learning #python #deep-learning #tensorflow #nlp #keras #pytorch 

DeepPavlov: Open-source Conversational AI Library for Chatbot
Jane  Reid

Jane Reid


Open Standard for Machine Learning interoperability

Open Neural Network Exchange (ONNX) is an open ecosystem that empowers AI developers to choose the right tools as their project evolves. ONNX provides an open source format for AI models, both deep learning and traditional ML. It defines an extensible computation graph model, as well as definitions of built-in operators and standard data types. Currently we focus on the capabilities needed for inferencing (scoring).

ONNX is widely supported and can be found in many frameworks, tools, and hardware. Enabling interoperability between different frameworks and streamlining the path from research to production helps increase the speed of innovation in the AI community. We invite the community to join us and further evolve ONNX.


Learn about the ONNX spec

Programming utilities for working with ONNX Graphs

NOTICE: ONNX now uses main branch as default branch

Here are the steps from ONNX wiki for migrating to main branch in local repo.


ONNX is a community project. We encourage you to join the effort and contribute feedback, ideas, and code. You can participate in the SIGs and Working Groups to shape the future of ONNX.

Check out our contribution guide to get started.

If you think some operator should be added to ONNX specification, please read this document.


We encourage you to open Issues, or use Slack for more real-time discussion

Follow Us

Stay up to date with the latest ONNX news. [Facebook] [Twitter]



numpy >= 1.16.6
protobuf >= 3.12.2
typing-extensions >=

Official Python packages

ONNX released packages are published in PyPi.

pip install numpy protobuf==3.16.0
pip install onnx

Weekly packages are published in test pypi to enable experimentation and early testing.

Conda packages

A binary build of ONNX is available from Conda, in conda-forge:

conda install -c conda-forge numpy protobuf==3.16.0 libprotobuf=3.16.0
conda install -c conda-forge onnx

You can also use the onnx-dev docker image for a Linux-based installation without having to worry about dependency versioning.

Build ONNX from Source

Before building from source uninstall any existing versions of onnx pip uninstall onnx.

Generally spreaking, you need to install protobuf C/C++ libraires and tools before proceeding forward. Then depending on how you installed protobuf, you need to set environment variable CMAKE_ARGS to "-DONNX_USE_PROTOBUF_SHARED_LIBS=ON" or "-DONNX_USE_PROTOBUF_SHARED_LIBS=OFF". For example, you may need to run the following command:





The ON/OFF depends on what kind of protobuf library you have. Shared libraries are files ending with *.dll/*.so/*.dylib. Static libraries are files ending with *.a/*.lib. This option depends on how you get your protobuf library and how it was built. And it is default OFF. You don't need to run the commands above if you'd prefer to use a static protobuf library.


If you are building ONNX from source, it is recommended that you also build Protobuf locally as a static library. The version distributed with conda-forge is a DLL, but ONNX expects it to be a static library. Building protobuf locally also lets you control the verison of protobuf. The tested and recommended version is 3.16.0.

The instructions in this README assume you are using Visual Studio. It is recommended that you run all the commands from a shell started from "x64 Native Tools Command Prompt for VS 2019" and keep the build system generator for cmake (e.g., cmake -G "Visual Studio 16 2019") consistent while building protobuf as well as ONNX.

You can get protobuf by running the following commands:

git clone
cd protobuf
git checkout v3.16.0
cd cmake
cmake -G "Visual Studio 16 2019" -A x64 -DCMAKE_INSTALL_PREFIX=<protobug_install_dir> -Dprotobuf_MSVC_STATIC_RUNTIME=OFF -Dprotobuf_BUILD_SHARED_LIBS=OFF -Dprotobuf_BUILD_TESTS=OFF -Dprotobuf_BUILD_EXAMPLES=OFF .
msbuild protobuf.sln /m /p:Configuration=Release
msbuild INSTALL.vcxproj /p:Configuration=Release

Then it will be built as a static library and installed to . Please add the bin directory(which contains protoc.exe) to your PATH.

set PATH=<protobug_install_dir>/bin;%PATH%

Please note: if your protobug_install_dir contains spaces, do not add quotation marks around it.

Alternative: if you don't want to change your PATH, you can set ONNX_PROTOC_EXECUTABLE instead.

set CMAKE_ARGS=-DONNX_PROTOC_EXECUTABLE=<full_path_to_protoc.exe>

Then you can build ONNX as:

git clone
cd onnx
git submodule update --init --recursive
# prefer lite proto
pip install -e .


First, you need to install protobuf.

Ubuntu users: the quickest way to install protobuf is to run

apt-get install python3-pip python3-dev libprotobuf-dev protobuf-compiler

Then you can build ONNX as:

git clone --recursive
cd onnx
# prefer lite proto
pip install -e .

Otherwise, you may need to install it from source. You can use the following commands to do it:


git clone
cd protobuf
git checkout v3.16.0
git submodule update --init --recursive
mkdir build_source && cd build_source
make -j$(nproc)
make install


git clone
cd protobuf
git checkout v3.16.0
git submodule update --init --recursive
mkdir build_source && cd build_source
make -j$(nproc)
make install

Here "-DCMAKE_POSITION_INDEPENDENT_CODE=ON" is crucial. By default static libraries are built without "-fPIC" flag, they are not position independent code. But shared libraries must be position independent code. Python C/C++ extensions(like ONNX) are shared libraries. So if a static library was not built with "-fPIC", it can't be linked to such a shared library.

Once build is successful, update PATH to include protobuf paths.

Then you can build ONNX as:

git clone
cd onnx
git submodule update --init --recursive
# prefer lite proto
pip install -e .
  • Mac

Once build is successful, update PATH to include protobuf paths.

Then you can build ONNX as:

git clone --recursive
cd onnx
# prefer lite proto
pip install -e .

Verify Installation

After installation, run

python -c "import onnx"

to verify it works.

Common Build Options

For full list refer to CMakeLists.txt
Environment variables

  • USE_MSVC_STATIC_RUNTIME should be 1 or 0, not ON or OFF. When set to 1 onnx links statically to runtime library.


  • DEBUG should be 0 or 1. When set to 1 onnx is built in debug mode. or debug versions of the dependencies, you need to open the CMakeLists file and append a letter d at the end of the package name lines. For example, NAMES protobuf-lite would become NAMES protobuf-lited.

Default: Debug=0

CMake variables


- When set to ON - onnx will dynamically link to protobuf shared libs, PROTOBUF_USE_DLLS will be defined as described here, Protobuf_USE_STATIC_LIBS will be set to OFF and USE_MSVC_STATIC_RUNTIME must be 0.
- When set to OFF - onnx will link statically to protobuf, and Protobuf_USE_STATIC_LIBS will be set to ON (to force the use of the static libraries) and USE_MSVC_STATIC_RUNTIME can be 0 or 1.

  • ONNX_USE_LITE_PROTO should be ON or OFF. When set to ON onnx uses lite protobuf instead of full protobuf.


  • ONNX_WERROR should be ON or OFF. When set to ON warnings are treated as errors.

Default: ONNX_WERROR=OFF in local builds, ON in CI and release pipelines.

Common Errors

Note: the import onnx command does not work from the source checkout directory; in this case you'll see ModuleNotFoundError: No module named 'onnx.onnx_cpp2py_export'. Change into another directory to fix this error.

Building ONNX on Ubuntu works well, but on CentOS/RHEL and other ManyLinux systems, you might need to open the CMakeLists file and replace all instances of /lib with /lib64.


ONNX uses pytest as test driver. In order to run tests, you will first need to install pytest:

pip install pytest nbval

After installing pytest, use the following command to run tests.



Check out the contributor guide for instructions.


Apache License v2.0

Code of Conduct

ONNX Open Source Code of Conduct

Author: onnx
Source Code:
License: Apache-2.0 License

#python #cpluplus #machine-learning #deep-learning #tensorflow #scikit #keras #pytorch 

Open Standard for Machine Learning interoperability
Jane  Reid

Jane Reid


Training Metrics for Pytorch, Tensorflow and Keras


A lightweight library for neural network graphs and training metrics for PyTorch, Tensorflow, and Keras.

HiddenLayer is simple, easy to extend, and works great with Jupyter Notebook. It's not intended to replace advanced tools, such as TensorBoard, but rather for cases where advanced tools are too big for the task. HiddenLayer was written by Waleed Abdulla and Phil Ferriere, and is licensed under the MIT License.

1. Readable Graphs

Use HiddenLayer to render a graph of your neural network in Jupyter Notebook, or to a pdf or png file. See Jupyter notebook examples for TensorFlow, PyTorch, and Keras.

The graphs are designed to communicate the high-level architecture. Therefore, low-level details are hidden by default (e.g. weight initialization ops, gradients, internal ops of common layer types, ...etc.). HiddenLayer also folds commonly used sequences of layers together. For example, the Convolution -> RELU -> MaxPool sequence is very common, so they get merged into one box for simplicity.

Customizing Graphs

The rules for hiding and folding nodes are fully customizable. You can use graph expressions and transforms to add your own rules. For example, this rule folds all the nodes of a bottleneck block of a ResNet101 into one node.

    # Fold bottleneck blocks
    ht.Fold("((ConvBnRelu > ConvBnRelu > ConvBn) | ConvBn) > Add > Relu", 
            "BottleneckBlock", "Bottleneck Block"),

2. Training Metrics in Jupyter Notebook

If you run training experiments in Jupyter Notebook then you might find this useful. You can use it to plot loss and accuracy, histograms of weights, or visualize activations of a few layers.

Outside Jupyter Notebook:

You can use HiddenLayer outside Jupyter Notebook as well. In a Python script running from command line, it'll open a separate window for the metrics. And if you're on a server without a GUI, you can save snapshots of the graphs to png files for later inspection. See for an example of this use case.

3. Hackable

HiddenLayer is a small library. It covers the basics, but you'll likely need to extend it for your own use case. For example, say you want to represent the model accuracy as a pie chart rather than a plot. This can be done by extending the Canvas class and adding a new method as such:

class MyCanvas(hl.Canvas):
    """Extending Canvas to add a pie chart method."""
    def draw_pie(self, metric):
        # set square aspect ratio'equal')
        # Get latest value of the metric
        value = np.clip([-1], 0, 1)
        # Draw pie chart[value, 1-value], labels=["Accuracy", ""])

See the pytorch_train.ipynb or tf_train.ipynb for an example.

The keras_train.ipynb notebook contains an actual training example that illustrates how to create a custom Canvas to plot a confusion matrix alongside validation metrics:




  • tf_graph.ipynb: This notebook illustrates how to generate graphs for various TF SLIM models.
  • tf_train.ipynb: Demonstrates tracking and visualizing training metrics with TensorFlow.
  • An example of using HiddenLayer without a GUI.


  • keras_graph.ipynb: This notebook illustrates how to generate graphs for various Keras models.
  • keras_train.ipynb: Demonstrates model graphing, visualization of training metrics, and how to create a custom Keras callback that uses a subclassed Canvas in order to plot a confusion matrix at the end of each training epoch.


HiddenLayer is released under the MIT license. Feel free to extend it or customize it for your needs. If you discover bugs, which is likely since this is an early release, please do report them or submit a pull request.

If you like to contribute new features, here are a few things we wanted to add but never got around to it:

  • Support for older versions of Python. Currently, it's only tested on Python 3.6.
  • Optimization to support logging big experiments.


1. Prerequisites

a. Python3, Numpy, Matplotlib, and Jupyter Notebook.

b. Either TensorFlow or PyTorch

c. GraphViz and its Python wrapper to generate network graphs. The easiest way to install it is

If you use Conda:

conda install graphviz python-graphviz


pip3 install graphviz

2. Install HiddenLayer

a. Clone From GitHub (Developer Mode)

Use this if you want to edit or customize the library locally.

# Clone the repository
git clone
cd hiddenlayer

# Install in dev mode
pip install -e .

b. Using PIP ("stable" release)

pip install hiddenlayer

c. Install to your site-packages directly from GitHub

Use the following if you just want to install the latest version of the library:

pip install git+

Author: waleedka
Source Code:
License: MIT License

#python #pytorch #tensorflow #keras 

Training Metrics for Pytorch, Tensorflow and Keras
Dorcas  Ferry

Dorcas Ferry


Neural Network Visualization toolkit for Keras

Keras Visualization Toolkit

keras-vis is a high-level toolkit for visualizing and debugging your trained keras neural net models. Currently supported visualizations include:

  • Activation maximization
  • Saliency maps
  • Class activation maps

All visualizations by default support N-dimensional image inputs. i.e., it generalizes to N-dim image inputs to your model.

The toolkit generalizes all of the above as energy minimization problems with a clean, easy to use, and extendable interface. Compatible with both theano and tensorflow backends with 'channels_first', 'channels_last' data format.

Quick links

Getting Started

In image backprop problems, the goal is to generate an input image that minimizes some loss function. Setting up an image backprop problem is easy.

Define weighted loss function

Various useful loss functions are defined in losses. A custom loss function can be defined by implementing Loss.build_loss.

from vis.losses import ActivationMaximization
from vis.regularizers import TotalVariation, LPNorm

filter_indices = [1, 2, 3]

# Tuple consists of (loss_function, weight)
# Add regularizers as needed.
losses = [
    (ActivationMaximization(keras_layer, filter_indices), 1),
    (LPNorm(model.input), 10),
    (TotalVariation(model.input), 10)

Configure optimizer to minimize weighted loss

In order to generate natural looking images, image search space is constrained using regularization penalties. Some common regularizers are defined in regularizers. Like loss functions, custom regularizer can be defined by implementing Loss.build_loss.

from vis.optimizer import Optimizer

optimizer = Optimizer(model.input, losses)
opt_img, grads, _ = optimizer.minimize()

Concrete examples of various supported visualizations can be found in examples folder.


Install keras with theano or tensorflow backend. Note that this library requires Keras > 2.0

Install keras-vis

From sources

sudo python install

PyPI package

sudo pip install keras-vis


NOTE: The links are currently broken and the entire documentation is being reworked. Please see examples/ for samples.

Neural nets are black boxes. In the recent years, several approaches for understanding and visualizing Convolutional Networks have been developed in the literature. They give us a way to peer into the black boxes, diagnose mis-classifications, and assess whether the network is over/under fitting.

Guided backprop can also be used to create trippy art, neural/texture style transfer among the list of other growing applications.

Various visualizations, documented in their own pages, are summarized here.

Conv filter visualization

Convolutional filters learn 'template matching' filters that maximize the output when a similar template pattern is found in the input image. Visualize those templates via Activation Maximization.

Dense layer visualization

How can we assess whether a network is over/under fitting or generalizing well?

Attention Maps

How can we assess whether a network is attending to correct parts of the image in order to generate a decision?

Generating animated gif of optimization progress

It is possible to generate an animated gif of optimization progress by leveraging callbacks. Following example shows how to visualize the activation maximization for 'ouzel' class (output_index: 20).

from keras.applications import VGG16

from vis.losses import ActivationMaximization
from vis.regularizers import TotalVariation, LPNorm
from vis.input_modifiers import Jitter
from vis.optimizer import Optimizer
from vis.callbacks import GifGenerator

# Build the VGG16 network with ImageNet weights
model = VGG16(weights='imagenet', include_top=True)
print('Model loaded.')

# The name of the layer we want to visualize
# (see model definition in
layer_name = 'predictions'
layer_dict = dict([(, layer) for layer in model.layers[1:]])
output_class = [20]

losses = [
    (ActivationMaximization(layer_dict[layer_name], output_class), 2),
    (LPNorm(model.input), 10),
    (TotalVariation(model.input), 10)
opt = Optimizer(model.input, losses)
opt.minimize(max_iter=500, verbose=True, input_modifiers=[Jitter()], callbacks=[GifGenerator('opt_progress')])

Notice how the output jitters around? This is because we used Jitter, a kind of ImageModifier that is known to produce crisper activation maximization images. As an exercise, try:

  • Without Jitter
  • Varying various loss weights



Please cite keras-vis in your publications if it helped your research. Here is an example BibTeX entry:

  author={Kotikalapudi, Raghavendra and contributors},

Author: raghakot
Source Code:
License: MIT License

#keras #machine-learning #deep-learning #tensorflow #python 

Neural Network Visualization toolkit for Keras
Meggie  Flatley

Meggie Flatley


DeepLIFT: Deep Learning Important FeaTures

DeepLIFT: Deep Learning Important FeaTures

This version of DeepLIFT has been tested with Keras 2.2.4 & tensorflow 1.14.0. See this FAQ question for information on other implementations of DeepLIFT that may work with different versions of tensorflow/pytorch, as well as a wider range of architectures. See the tags for older versions.

This repository implements the methods in "Learning Important Features Through Propagating Activation Differences" by Shrikumar, Greenside & Kundaje, as well as other commonly-used methods such as gradients, gradient-times-input (equivalent to a version of Layerwise Relevance Propagation for ReLU networks), guided backprop and integrated gradients.

Here is a link to the slides and the video of the 15-minute talk given at ICML. Here is a link to a longer series of video tutorials. Please see the FAQ and file a github issue if you have questions.

Note: when running DeepLIFT for certain computer vision tasks you may get better results if you compute contribution scores of some higher convolutional layer rather than the input pixels. Use the argument find_scores_layer_idx to specify which layer to compute the scores for.

Please be aware that figuring out optimal references is still an open problem. Suggestions on good heuristics for different applications are welcome. In the meantime, feel free to look at this github issue for general ideas:

Please feel free to follow this repository to stay abreast of updates.

Table of contents


DeepLIFT is on pypi, so it can be installed using pip:

pip install deeplift

If you want to be able to make edits to the code, it is recommended that you clone the repository and install using the --editable flag.

git clone #will clone the deeplift repository
pip install --editable deeplift/ #install deeplift from the cloned repository. The "editable" flag means changes to the code will be picked up automatically.

While DeepLIFT does not require your models to be trained with any particular library, we have provided autoconversion functions to convert models trained using Keras into the DeepLIFT format. If you used a different library to train your models, you can still use DeepLIFT if you recreate the model using DeepLIFT layers.

This implementation of DeepLIFT was tested with tensorflow 1.7, and autoconversion was tested using keras 2.0.


These examples show how to autoconvert a keras model and obtain importance scores. Non-keras models can be converted to DeepLIFT if they are saved in the keras 2.0 format

#Convert a keras sequential model
import deeplift
from deeplift.conversion import kerasapi_conversion as kc
#NonlinearMxtsMode defines the method for computing importance scores.
#NonlinearMxtsMode.DeepLIFT_GenomicsDefault uses the RevealCancel rule on Dense layers
#and the Rescale rule on conv layers (see paper for rationale)
#Other supported values are:
#NonlinearMxtsMode.RevealCancel - DeepLIFT-RevealCancel at all layers (used for the MNIST example)
#NonlinearMxtsMode.Rescale - DeepLIFT-rescale at all layers
#NonlinearMxtsMode.Gradient - the 'multipliers' will be the same as the gradients
#NonlinearMxtsMode.GuidedBackprop - the 'multipliers' will be what you get from guided backprop
#Use deeplift.util.get_integrated_gradients_function to compute integrated gradients
#Feel free to email avanti [dot] if anything is unclear
deeplift_model =\

#Specify the index of the layer to compute the importance scores of.
#In the example below, we find scores for the input layer, which is idx 0 in deeplift_model.get_layers()
find_scores_layer_idx = 0

#Compile the function that computes the contribution scores
#For sigmoid or softmax outputs, target_layer_idx should be -2 (the default). This computes explanations
# w.r.t. the logits. (See "3.6 Choice of target layer" in for justification)
#For regression tasks with a linear output, target_layer_idx should be -1 (which simply refers to the last layer)
#Note that in the case of softmax outputs, it may be a good idea to normalize the softmax logits so
# that they sum to zero across all tasks. This ensures that if a feature is contributing equally to
# to all the softmax logits, it will effectly be seen as contributing to none of the tasks (adding
# a constant to all logits of a softmax does not change the output). This is discussed in
# One way to efficiently acheive this
# normalization is to mean-normalize the weights going into the Softmax layer as
# discussed in eqn. 21 in Section 2.5 of ("A note on Softmax Activation")
#If you want the DeepLIFT multipliers instead of the contribution scores, you can use get_target_multipliers_func
deeplift_contribs_func = deeplift_model.get_target_contribs_func(
#You can also provide an array of indices to find_scores_layer_idx to get scores for multiple layers at once

#compute scores on inputs
#input_data_list is a list containing the data for different input layers
#eg: for MNIST, there is one input layer with with dimensions 1 x 28 x 28
#In the example below, let X be an array with dimension n x 1 x 28 x 28 where n is the number of examples
#task_idx represents the index of the node in the output layer that we wish to compute scores.
#Eg: if the output is a 10-way softmax, and task_idx is 0, we will compute scores for the first softmax class
scores = np.array(deeplift_contribs_func(task_idx=0,

This will work for sequential models involving dense and/or conv1d/conv2d layers and linear/relu/sigmoid/softmax or prelu activations. Please create a github issue or email avanti [dot] readme if you are interested in support for other layer types.

The syntax for using functional models is similar; you can use deeplift_model.get_name_to_layer().keys() to get a list of layer names when figuring out how to specify find_scores_layer_name and pre_activation_target_layer_name:

deeplift_model =\
#The syntax below for obtaining scores is similar to that of a converted graph model
#See deeplift_model.get_name_to_layer().keys() to see all the layer names
#As before, you can provide an array of names to find_scores_layer_name
#to get the scores for multiple layers at once
deeplift_contribs_func = deeplift_model.get_target_contribs_func(


A notebook replicating the results in the paper on MNIST is at examples/mnist/MNIST_replicate_figures.ipynb, and a notebook demonstrating use on a genomics model with 1d convolutions is at examples/genomics/genomics_simulation.ipynb.


Can you provide a brief intuition for how DeepLIFT works?

The 15-minute talk from ICML gives an intuition for the method. Here are links to the slides and the video (the video truncates the slides, which is why the slides are linked separately). Please file a github issue if you have questions.

My model architecture is not supported by this DeepLIFT implementation. What should I do?

My first suggestion would be to look at DeepSHAP/DeepExplainer (Lundberg & Lee), DeepExplain (Ancona et al.) or Captum (if you are using pytorch) to see if any of them satisfy your needs. They are implemented by overriding gradient operators and thus support a wider variety of architectures. However, none of these implementations support the RevealCancel rule (which deals with failure modes such as the min function). The pros and cons of DeepSHAP vs DeepExplain are discussed in more detail below. If you would really like to have the RevealCancel rule, go ahead and post a github issue, although my energies are currently focused on other projects and I may not be able to get to it for some time.

Note for people in genomics planning to use TF-MoDISco: for DeepSHAP, I have a custom branch of the DeepSHAP repository that has functionality for computing hypothetical importance scores. A colab notebook demonstrating the use of that repository is here, and a tutorial I made on DeepSHAP for genomics is here.

What are the similarities and differences between the DeepLIFT-like implementations in DeepExplain from Ancona et al. (ICLR 2018) and DeepSHAP/DeepExplainer from the SHAP repository?

Both DeepExplain (Ancona et al.) and DeepSHAP/DeepExplainer work by overriding gradient operators, and can thus support a wider variety of architectures than those that are covered in the DeepLIFT repo (in fact, the DeepSHAP/DeepExplainer implementation was inspired by Ancona et al.'s work and builds on a connection between DeepLIFT and SHAP, described in the SHAP paper). For the set of architectures described in the DeepLIFT paper, i.e. linear matrix multiplications, convolutions, and single-input nonlinearities (like ReLUs), both these implementations are identical to DeepLIFT with the Rescale rule. However, neither implementation supports DeepLIFT with the RevealCancel rule (a rule that was developed to deal with failure cases such as the min function, and which unfortunately is not easily implemented by overriding gradient operators). The key differences are as follows:

(1) DeepExplain uses standard gradient backpropagation for elementwise operations (such as those present in LSTMs/GRUs/Attention). This will likely violate the summation-to-delta property (i.e. the property that the sum of the attributions over the input is equal to the difference-from-reference of the output). If you have elementwise operations, I recommend you use DeepSHAP/DeepExplainer, which employs a summation-to-delta-preserving backprop rule. The same is technically true for Maxpooling operations when a non-uniform reference is used (though this has not been a salient problem for us in practice); the DeepSHAP/DeepExplainer implementation guarantees summation-to-delta is satisfied for Maxpooling by assigning credit/blame to either the neuron that is the max in the actual input or the neuron that was the max in the reference (this is different from the 'Max' attribution rule proposed in the SHAP paper; that attribution rule does not scale well).

(2) DeepExplain (by Ancona et al.) does not support the dynamic reference that is demonstrated in the DeepLIFT repo (i.e. the case where a different reference is generated according to the properties of the input example, such as the 'dinucleotide shuffled' references used in genomics). I've implemented the dynamic reference feature for DeepSHAP/DeepExplainer (click for a link to the PR). Also, if you are planning to use DeepSHAP for genomics with TF-MoDISco, please see the note above on my custom implementation of DeepSHAP for computing hypothetical importance scores + a link to the slides for a tutorial.

(3) DeepSHAP/DeepExplainer is implemented such that multiple references can be used for a single example, and the final attributions are averaged over each reference. However, the way this is implemented, each GPU batch calculates attributions for a single example, for all references. This means that the DeepSHAP/DeepExplainer implementation might be slow in cases where you have a large number of samples and only one reference. By contrast, DeepExplain (Ancona et al.) is structured such that the user provides a single reference, and this reference is used for all the examples. Thus, DeepExplain (Ancona et al.) allows GPU batching across examples, but does not allow for GPU batching across different references.

In summary, my recommendations are: use DeepSHAP if you have elementwise operations (e.g. GRUs/LSTMs/Attention), a need for dynamic references, or a large number of references compared to samples. Use DeepExplain when you have a large number of samples compared to references.

How does the implementation in this repository compare with the DeepLIFT implementation in Poerner et al. (ACL 2018)?

Poerner et al. conducted a series of benchmarks comparing DeepLIFT to other explanation methods on NLP tasks. Their implementation differs from the canonical DeepLIFT implementation in two main ways. First, they considered only the Rescale rule of DeepLIFT (according to the implementation here). Second, to handle operations that involve multiplications with gating units (which DeepLIFT was not designed for), they treat the gating neuron as a weight (similar to the approach in Arras et al.) and assign all importance to the non-gating neuron. Note that this differs from the implementation in DeepSHAP/DeepExplainer, which handles elementwise multiplications using a backprop rule base on SHAP and would assign importance to the gating neuron. We have not studied the appropriateness of Arras et al.'s approach, but the authors did find that "LIMSSE, LRP (Bach et al., 2015) and DeepLIFT (Shrikumar et al., 2017) are the most effective explanation methods (§4): LRP and DeepLIFT are the most consistent methods, while LIMSSE wins the hybrid document experiment." (They did not compare with the DeepSHAP/DeepExplainer implementation)

How does DeepLIFT compare to integrated gradients?

As illustrated in the DeepLIFT paper, the RevealCancel rule of DeepLIFT can allow DeepLIFT to properly handle cases where integrated gradients may give misleading results. Independent researchers have found that DeepLIFT with just the Rescale rule performs comparably to Integrated Gradients (they write: “Integrated Gradients and DeepLIFT have very high correlation, suggesting that the latter is a good (and faster) approximation of the former in practice”). Their finding was consistent with our own personal experience. The speed improvement of DeepLIFT relative to Integrated Gradients becomes particularly useful when using a collection of references (since having a collection of references per example increases runtime).

Do you have support for non-keras models?

At the moment, we do not. However, if you are able to convert your model into the saved file format used by the Keras 2 API, then you can use this branch to load it into the DeepLIFT format. For inspiration on how to achieve this, you can look at examples/convert_models/keras1.2to2 for a notebook demonstrating how to convert models saved in the keras1.2 format to keras 2. DeepLIFT conversion works directly from keras saved files without ever actually loading the models into keras. If you have a pytorch model, you may also be interested in the Captum implementation.

What do negative scores mean?

A negative contribution score on an input means that the input contributed to moving the output below its reference value, where the reference value of the output is the value that it has when provided the reference input. A negative contribution does not mean that the input is "unimportant". If you want to find inputs that DeepLIFT considers "unimportant" (i.e. DeepLIFT thinks they don't influence the output of the model much), these would be the inputs that have contribution scores near 0.

How do I provide a reference argument?

Just as you supply input_data_list as an argument to the scoring function, you can also supply input_references_list. It would have the same dimensions as input_data_list, but would contain reference images for each input.

What should I use as my reference?

The choice of reference depends on the question you wish to ask of the data. Generally speaking, the reference should retain the properties you don't care about and scramble the properties you do care about. In the supplement of the DeepLIFT paper, Appendix L looks at the results on a CIFAR10 model with two different choices of the reference. You'll notice that when a blurred version of the input is used as a reference, the outlines of objects stand out. When a black reference is used, the results are more confusing, possibly because the net is also highlighting color. If you have a particular reference in mind, it is a good idea to check that the output of the model on that reference is consistent with what you expect. Another idea to consider is using multiple different references to interpret a single image and averaging the results over all the different references. We use this approach in genomics; we generate a collection of references per input sequence by shuffling the sequence (this is demonstrated in the genomics example notebook).

How can I get a sense of how much an input contributes across all examples?

It is fine to average the DeepLIFT contribution scores across examples. Be aware that there might be considerable heterogeneity in your data (i.e. some inputs may be very important for some subset of examples but not others, some inputs may contribute positively on some examples and negatively on others) so clustering may prove more insightful than averaging. For the purpose of feature selection, a reasonable heuristic would be to rank inputs in descending order of the average magnitude of the DeepLIFT contribution scores.

Can I have multiple input modes?

Yes. Rather than providing a single numpy array to input_data_list, provide a list of numpy arrays containing the input to each mode. You can also provide a dictionary to input_data_list where the key is the mode name and the value is the numpy array. Each numpy array should have the first axis be the sample axis.

Can I get the contribution scores on multiple input layers at once?

Also yes. Just provide a list to find_scores_layer_name rather than a single argument.

What's the license?

MIT License. While we had originally filed a patent on some of our interpretability work, we have since disbanded the patent as it appears this project has enough interest from the community to be best distributed in open-source format.

I have heard DeepLIFT can do pattern discovery - is that right?

You are likely thinking of TF-MoDISco. Here is a link to that code.


Please email avanti [dot] shrikumar [at] with questions, ideas, feature requests, etc. If I don't respond, keep emailing me until I feel guilty and respond. Also feel free to email my adviser (anshul [at] kundaje [dot] net), who can further guilt me into responding. I promise I do actually want to respond; I'm just busy with other things because the incentive structure of academia doesn't reward maintenance of projects.

Under the hood

This section explains finer aspects of the deeplift implementation


The layer (deeplift.layers.core.Layer) is the basic unit. deeplift.layers.core.Dense and deeplift.layers.convolution.Conv2D are both examples of layers.

Layers implement the following key methods:


Returns symbolic variables representing the activations of the layer. For an understanding of symbolic variables, refer to the documentation of symbolic computation packages like theano or tensorflow.

get_pos_mxts() and get_neg_mxts()

Returns symbolic variables representing the positive/negative multipliers on this layer (for the selected output). See paper for details.


Returns symbolic variables representing the importance scores. This is a convenience function that returns self.get_pos_mxts()*self._pos_contribs() + self.get_neg_mxts()*self._neg_contribs(). See paper for details.

The Forward Pass

Here are the steps necessary to implement a forward pass. If executed correctly, the results should be identical (within numerical precision) to a forward pass of your original model, so this is definitely worth doing as a sanity check. Note that if autoconversion (as described in the quickstart) is an option, you can skip steps (1) and (2).

  1. Create a layer object for every layer in the network
  2. Tell each layer what its inputs are via the set_inputs function. The argument to set_inputs depends on what the layer expects
  • If the layer has a single layer as its input (eg: Dense layers), then the argument is simply the layer that is the input
  • If the layer takes multiple layers as its input, the argument depends on the specific implementation - for example, in the case of a Concat layer, the argument is a list of layers
  1. Once every layer is linked to its inputs, you may compile the forward propagation function with deeplift.backend.function([input_layer.get_activation_vars()...], output_layer.get_activation_vars())
  • If you are working with a model produced by autoconversion, you can access individual layers via model.get_layers() for sequential models (where this function would return a list of layers) or model.get_name_to_layer() for Graph models (where this function would return a dictionary mapping layer names to layers)
  • The first argument is a list of symbolic tensors representing the inputs to the net. If the net has only one input layer, then this will be a list containing only one tensor
  • The second argument is the output of the function. In the example above, it is a single tensor, but it can also be a list of tensors if you want the outputs of more than one layer
  1. Once the function is compiled, you can use deeplift.util.run_function_in_batches(func, input_data_list) to run the function in batches (which would be advisable if you want to call the function on a large number of inputs that wont fit in memory)
  • func is simply the compiled function returned by deeplift.backend.function
  • input_data_list is a list of numpy arrays containing data for the different input layers of the network. In the case of a network with one input, this will be a list containing one numpy array
  • Optional arguments to run_function_in_batches are batch_size and progress_update

The Backward Pass

Here are the steps necessary to implement the backward pass, which is where the importance scores are calculated. Ideally, you should create a model through autoconversion (described in the quickstart) and then use model.get_target_contribs_func or model.get_target_multipliers_func. Howver, if that is not an option, read on (please also consider sending us a message to let us know, as if there is enough demand for a feature we will consider adding it). Note the instructions below assume you have done steps (1) and (2) under the forward pass section.

For the layer(s) that you wish to compute the importance scores for, call reset_mxts_updated(). This resets the symbolic variables for computing the multipliers. If this is the first time you are compiling the backward pass, this step is not strictly necessary.

For the output layer(s) containing the neuron(s) that the importance scores will be calculated with respect to, call set_scoring_mode(deeplift.layers.ScoringMode.OneAndZeros).

  • Briefly, this is the scoring mode that is used when we want to find scores with respect to a single target neuron. Other kinds of scoring modes may be added later (eg: differences between neurons).
  • A point of clarification: when we eventually compile the function, it will be a function which computes scores for only a single output neuron in a single layer every time it is called. The specific neuron and layer can be toggled later, at runtime. Right now, at this step, you should call set_scoring_mode on all the target layers that you might conceivably want to find the scores with respect to. This will save you from having to recompile the function to allow a different target layer later.
  • For Sigmoid/Softmax output layers, the output layer that you use should be the linear layer (usually a Dense layer) that comes before the final nonlinear activation. See "3.6 Choice of target layer" in the paper for justification. If there is no final nonlinearity (eg: in the case of many regression tasks), then the output layer should just be the last linear layer.
  • For Softmax outputs, you should may want to subtract the average contribution to all softmax classes as described in "Adjustments for softmax layers" in the paper (section 3.6). If your number of softmax classes is very large and you don't want to calculate contributions to each class separately for each example, contact me (avanti [dot] and I can implement a more efficient way to do the calculation (there is a way but I haven't coded it up yet).

For the layer(s) that you wish to compute the importance scores for, call update_mxts(). This will create the symbolic variables that compute the multipliers with respect to the layer specified in step 2.

Compile the importance score computation function with

  • The first argument represents the inputs to the function and should be a list of one symbolic tensor for the activations of each input layer (as for the forward pass), followed by a list of one symbolic tensor for the references of each input layer
  • The second argument represents the output of the function. In the example above, it is a single tensor containing the importance scores of a single layer, but it can also be a list of tensors if you wish to compute the scores for multiple layers at once.
  • Instead of get_target_contrib_vars() which returns the importance scores (in the case of NonlinearMxtsMode.DeepLIFT, these are called "contribution scores"), you can use get_pos_mxts() or get_neg_mxts() to get the multipliers.

Now you are ready to call the function to find the importance scores.

  • Select a specific output layer to compute importance scores with respect to by calling set_active() on the layer.
  • Select a specific target neuron within the layer by calling update_task_index(task_idx) on the layer. Here task_idx is the index of a neuron within the layer.
  • Call the function compiled in step 4 to find the importance scores for the target neuron. Refer to step 4 in the forward pass section for tips on using deeplift.util.run_function_in_batches to do this.
  • Deselect the output layer by calling set_inactive() on the layer. Don't forget this!
  • (Yes, I will bundle all of these into a single function at some point)

Author: kundajelab
Source Code:
License: MIT License

#deep-learning #keras #tensorflow #python 

DeepLIFT: Deep Learning Important FeaTures

Kapre: Keras Audio Preprocessors


Keras Audio Preprocessors - compute STFT, ISTFT, Melspectrogram, and others on GPU real-time.

Tested on Python 3.6 and 3.7

Why Kapre?

vs. Pre-computation

  • You can optimize DSP parameters
  • Your model deployment becomes much simpler and consistent.
  • Your code and model has less dependencies

vs. Your own implementation

  • Quick and easy!
  • Consistent with 1D/2D tensorflow batch shapes
  • Data format agnostic (channels_first and channels_last)
  • Less error prone - Kapre layers are tested against Librosa (stft, decibel, etc) - which is (trust me) trickier than you think.
  • Kapre layers have some extended APIs from the default tf.signals implementation such as..
    • A perfectly invertible STFT and InverseSTFT pair
    • Mel-spectrogram with more options
  • Reproducibility - Kapre is available on pip with versioning

Workflow with Kapre

  1. Preprocess your audio dataset. Resample the audio to the right sampling rate and store the audio signals (waveforms).
  2. In your ML model, add Kapre layer e.g. kapre.time_frequency.STFT() as the first layer of the model.
  3. The data loader simply loads audio signals and feed them into the model
  4. In your hyperparameter search, include DSP parameters like n_fft to boost the performance.
  5. When deploying the final model, all you need to remember is the sampling rate of the signal. No dependency or preprocessing!


pip install kapre

API Documentation

Please refer to Kapre API Documentation at

One-shot example

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, GlobalAveragePooling2D, Dense, Softmax
from kapre import STFT, Magnitude, MagnitudeToDecibel
from kapre.composed import get_melspectrogram_layer, get_log_frequency_spectrogram_layer

# 6 channels (!), maybe 1-sec audio signal, for an example.
input_shape = (44100, 6)
sr = 44100
model = Sequential()
# A STFT layer
model.add(STFT(n_fft=2048, win_length=2018, hop_length=1024,
               window_name=None, pad_end=False,
               input_data_format='channels_last', output_data_format='channels_last',
model.add(MagnitudeToDecibel())  # these three layers can be replaced with get_stft_magnitude_layer()
# Alternatively, you may want to use a melspectrogram layer
# melgram_layer = get_melspectrogram_layer()
# or log-frequency layer
# log_stft_layer = get_log_frequency_spectrogram_layer() 

# add more layers as you want
model.add(Conv2D(32, (3, 3), strides=(2, 2)))

# Compile the model
model.compile('adam', 'categorical_crossentropy') # if single-label classification

# train it with raw audio sample inputs
# for example, you may have functions that load your data as below.
x = load_x() # e.g., x.shape = (10000, 6, 44100)
y = load_y() # e.g., y.shape = (10000, 10) if it's 10-class classification
# then.., y)
# Done!

Tflite compatbility

The STFT layer is not tflite compatible (due to tf.signal.stft). To create a tflite compatible model, first train using the normal kapre layers then create a new model replacing STFT and Magnitude with STFTTflite, MagnitudeTflite. Tflite compatible layers are restricted to a batch size of 1 which prevents use of them during training.

# assumes you have run the one-shot example above.
from kapre import STFTTflite, MagnitudeTflite
model_tflite = Sequential()

model_tflite.add(STFTTflite(n_fft=2048, win_length=2018, hop_length=1024,
               window_name=None, pad_end=False,
               input_data_format='channels_last', output_data_format='channels_last',
model_tflite.add(Conv2D(32, (3, 3), strides=(2, 2)))

# load the trained weights into the tflite compatible model.


Please cite this paper if you use Kapre for your work.

  title={Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras},
  author={Choi, Keunwoo and Joo, Deokjin and Kim, Juho},
  booktitle={Machine Learning for Music Discovery Workshop at 34th International Conference on Machine Learning},

Author: Keunwoochoi
Source Code:
License: MIT License

#python #audio #tensorflow #keras 

Kapre: Keras Audio Preprocessors
Meggie  Flatley

Meggie Flatley


How to Do Easily Layer and Point Result in Keras

Keract: Keras Activations + Gradients

Tested with Tensorflow 2.3, 2.4, 2.5 and 2.6.

pip install keract

You have just found a way to get the activations (outputs) and gradients for each layer of your Tensorflow/Keras model (LSTM, conv nets...).


Get activations (nodes/layers outputs as Numpy arrays)

keract.get_activations(model, x, layer_names=None, nodes_to_evaluate=None, output_format='simple', nested=False, auto_compile=True)

Fetch activations (nodes/layers outputs as Numpy arrays) for a Keras model and an input X. By default, all the activations for all the layers are returned.

  • model: Keras compiled model or one of ['vgg16', 'vgg19', 'inception_v3', 'inception_resnet_v2', 'mobilenet_v2', 'mobilenetv2', ...].
  • x: Numpy array to feed the model as input. In the case of multi-inputs, x should be of type List.
  • layer_names: (optional) Single name of a layer or list of layer names for which activations should be returned. It is useful in very big networks when it is computationally expensive to evaluate all the layers/nodes.
  • nodes_to_evaluate: (optional) List of Keras nodes to be evaluated.
  • output_format: Change the output dictionary key of the function.
    • simple: output key will match the names of the Keras layers. For example Dense(1, name='d1') will return {'d1': ...}.
    • full: output key will match the full name of the output layer name. In the example above, it will return {'d1/BiasAdd:0': ...}.
    • numbered: output key will be an index range, based on the order of definition of each layer within the model.
  • nested: If specified, will move recursively through the model definition to retrieve nested layers. Recursion ends at leaf layers of the model tree or at layers with their name specified in layer_names. For example a Sequential model in another Sequential model is considered nested.
  • auto_compile: If set to True, will auto-compile the model if needed.

Returns: Dict {layer_name (specified by output_format) -> activation of the layer output/node (Numpy array)}.


import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense, concatenate
from keract import get_activations

# model definition
i1 = Input(shape=(10,), name='i1')
i2 = Input(shape=(10,), name='i2')

a = Dense(1, name='fc1')(i1)
b = Dense(1, name='fc2')(i2)

c = concatenate([a, b], name='concat')
d = Dense(1, name='out')(c)
model = Model(inputs=[i1, i2], outputs=[d])

# inputs to the model
x = [np.random.uniform(size=(32, 10)), np.random.uniform(size=(32, 10))]

# call to fetch the activations of the model.
activations = get_activations(model, x, auto_compile=True)

# print the activations shapes.
[print(k, '->', v.shape, '- Numpy array') for (k, v) in activations.items()]

# Print output:
# i1 -> (32, 10) - Numpy array
# i2 -> (32, 10) - Numpy array
# fc1 -> (32, 1) - Numpy array
# fc2 -> (32, 1) - Numpy array
# concat -> (32, 2) - Numpy array
# out -> (32, 1) - Numpy array

Display the activations you've obtained

keract.display_activations(activations, cmap=None, save=False, directory='.', data_format='channels_last', fig_size=(24, 24), reshape_1d_layers=False)

Plot the activations for each layer using matplotlib

Inputs are:

  • activations dict - a dictionary mapping layers to their activations (the output of get_activations)
  • cmap (optional) string - a valid matplotlib colormap to be used
  • save(optional) a bool, if True the images of the activations are saved rather than being shown
  • directory: (optional) string - where to store the activations (if save is True)
  • data_format: (optional) tring - one of "channels_last" (default) or "channels_first".
  • reshape_1d_layers: (optional) bool - tries to reshape large 1d layers to a square/rectangle.
  • fig_size: (optional) (float, float) - width, height in inches.

The ordering of the dimensions in the inputs. "channels_last" corresponds to inputs with shape (batch, steps, channels) (default format for temporal data in Keras) while "channels_first" corresponds to inputs with shape (batch, channels, steps).

Display the activations as a heatmap overlaid on an image

keract.display_heatmaps(activations, input_image, save=False)

Plot heatmaps of activations for all filters overlayed on the input image for each layer

Inputs are:

  • activations: a dictionary mapping layers to their activations (the output of get_activations).
  • input_image: numpy array of the image you inputed to the get_activations.
  • save(optional) bool - if True the images of the activations are saved rather than being shown.
  • fix: (optional) bool - if automated checks and fixes for incorrect images should be run.
  • directory: string - where to store the activations (if save is True).

Get gradients of weights

keract.get_gradients_of_trainable_weights(model, x, y)
  • model is a keras.models.Model object.
  • x: Numpy array to feed the model as input. In the case of multi-inputs, x should be of type List.
  • y: Labels (numpy array). Keras convention.

The output is a dictionary mapping each trainable weight to the values of its gradients (regarding x and y).

Get gradients of activations

keract.get_gradients_of_activations(model, x, y, layer_name=None, output_format='simple')
  • model is a keras.models.Model object.
  • x: Numpy array to feed the model as input. In the case of multi-inputs, x should be of type List.
  • y: Labels (numpy array). Keras convention.
  • layer_name: (optional) Name of a layer for which activations should be returned.
  • output_format: Change the output dictionary key of the function.
    • simple: output key will match the names of the Keras layers. For example Dense(1, name='d1') will return {'d1': ...}.
    • full: output key will match the full name of the output layer name. In the example above, it will return {'d1/BiasAdd:0': ...}.
    • numbered: output key will be an index range, based on the order of definition of each layer within the model.

Returns: Dict {layer_name (specified by output_format) -> grad activation of the layer output/node (Numpy array)}.

The output is a dictionary mapping each layer to the values of its gradients (regarding x and y).

Persist activations to JSON

keract.persist_to_json_file(activations, filename)
  • activations: activations (dict mapping layers)
  • filename: output filename (JSON format)

Load activations from JSON

  • filename: filename to read the activations from (JSON format)

It returns the activations.


Examples are provided for:

  • keras.models.Sequential -
  • keras.models.Model -
  • Recurrent networks -

In the case of MNIST with LeNet, we are able to fetch the activations for a batch of size 128:

(128, 26, 26, 32)

(128, 24, 24, 64)

(128, 12, 12, 64)

(128, 12, 12, 64)

(128, 9216)

(128, 128)

(128, 128)

(128, 10)

We can visualise the activations. Here's another example using VGG16:

cd examples
pip install -r examples-requirements.txt

A cat.

Outputs of the first convolutional layer of VGG16.

Also, we can visualise the heatmaps of the activations:

cd examples
pip install -r examples-requirements.txt


  author = {Philippe Remy},
  title = {Keract: A library for visualizing activations and gradients},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{}},

Author: philipperemy
Source Code:
License: MIT License

#keras #machine-learning  #deep-learning 

How to Do Easily Layer and Point Result in Keras
Meggie  Flatley

Meggie Flatley


Interpretability Methods For Tf.keras Models with Tensorflow 2.x


tf-explain implements interpretability methods as Tensorflow 2.x callbacks to ease neural network's understanding. See Introducing tf-explain, Interpretability for Tensorflow 2.0



tf-explain is available on PyPi. To install it:

virtualenv venv -p python3.8
pip install tf-explain

tf-explain is compatible with Tensorflow 2.x. It is not declared as a dependency to let you choose between full and standalone-CPU versions. Additionally to the previous install, run:

# For CPU or GPU
pip install tensorflow==2.6.0

Opencv is also a dependency. To install it, run:

# For CPU or GPU
pip install opencv-python


tf-explain offers 2 ways to apply interpretability methods. The full list of methods is the Available Methods section.

On trained model

The best option is probably to load a trained model and apply the methods on it.

# Load pretrained model or your own
model = tf.keras.applications.vgg16.VGG16(weights="imagenet", include_top=True)

# Load a sample image (or multiple ones)
img = tf.keras.preprocessing.image.load_img(IMAGE_PATH, target_size=(224, 224))
img = tf.keras.preprocessing.image.img_to_array(img)
data = ([img], None)

# Start explainer
explainer = GradCAM()
grid = explainer.explain(data, model, class_index=281)  # 281 is the tabby cat index in ImageNet, ".", "grad_cam.png")

During training

If you want to follow your model during the training, you can also use it as a Keras Callback, and see the results directly in TensorBoard.

from tf_explain.callbacks.grad_cam import GradCAMCallback

model = [...]

callbacks = [
        validation_data=(x_val, y_val),
], y_train, batch_size=2, epochs=2, callbacks=callbacks)

Available Methods

  1. Activations Visualization
  2. Vanilla Gradients
  3. Gradients*Inputs
  4. Occlusion Sensitivity
  5. Grad CAM (Class Activation Maps)
  6. SmoothGrad
  7. Integrated Gradients

Activations Visualization

Visualize how a given input comes out of a specific activation layer

from tf_explain.callbacks.activations_visualization import ActivationsVisualizationCallback

model = [...]

callbacks = [
        validation_data=(x_val, y_val),
], y_train, batch_size=2, epochs=2, callbacks=callbacks)

Vanilla Gradients

Visualize gradients importance on input image

from tf_explain.callbacks.vanilla_gradients import VanillaGradientsCallback

model = [...]

callbacks = [
        validation_data=(x_val, y_val),
], y_train, batch_size=2, epochs=2, callbacks=callbacks)


Variant of Vanilla Gradients ponderating gradients with input values

from tf_explain.callbacks.gradients_inputs import GradientsInputsCallback

model = [...]

callbacks = [
        validation_data=(x_val, y_val),
], y_train, batch_size=2, epochs=2, callbacks=callbacks)

Occlusion Sensitivity

Visualize how parts of the image affects neural network's confidence by occluding parts iteratively

from tf_explain.callbacks.occlusion_sensitivity import OcclusionSensitivityCallback

model = [...]

callbacks = [
        validation_data=(x_val, y_val),
], y_train, batch_size=2, epochs=2, callbacks=callbacks)

Occlusion Sensitivity for Tabby class (stripes differentiate tabby cat from other ImageNet cat classes)

Grad CAM

Visualize how parts of the image affects neural network's output by looking into the activation maps

From Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

from tf_explain.callbacks.grad_cam import GradCAMCallback

model = [...]

callbacks = [
        validation_data=(x_val, y_val),
], y_train, batch_size=2, epochs=2, callbacks=callbacks)


Visualize stabilized gradients on the inputs towards the decision

From SmoothGrad: removing noise by adding noise

from tf_explain.callbacks.smoothgrad import SmoothGradCallback

model = [...]

callbacks = [
        validation_data=(x_val, y_val),
], y_train, batch_size=2, epochs=2, callbacks=callbacks)

Integrated Gradients

Visualize an average of the gradients along the construction of the input towards the decision

From Axiomatic Attribution for Deep Networks

from tf_explain.callbacks.integrated_gradients import IntegratedGradientsCallback

model = [...]

callbacks = [
        validation_data=(x_val, y_val),
], y_train, batch_size=2, epochs=2, callbacks=callbacks)



To contribute to the project, please read the dedicated section.


A citation file is available for citing this work. Click the "Cite this repository" button on the right-side panel of Github to get a BibTeX-ready citation.

Author: sicara
Source Code:
License: MIT License

#tensorflow #keras 

Interpretability Methods For Tf.keras Models with Tensorflow 2.x
Daron  Moore

Daron Moore


Game Theory Approach to Explain The Outputs Of any Machine Learning

SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions (see papers for details and citations).


SHAP can be installed from either PyPI or conda-forge:

pip install shap
conda install -c conda-forge shap

Tree ensemble example (XGBoost/LightGBM/CatBoost/scikit-learn/pyspark models)

While SHAP can explain the output of any machine learning model, we have developed a high-speed exact algorithm for tree ensemble methods (see our Nature MI paper). Fast C++ implementations are supported for XGBoost, LightGBM, CatBoost, scikit-learn and pyspark tree models:

import xgboost
import shap

# train an XGBoost model
X, y =
model = xgboost.XGBRegressor().fit(X, y)

# explain the model's predictions using SHAP
# (same syntax works for LightGBM, CatBoost, scikit-learn, transformers, Spark, etc.)
explainer = shap.Explainer(model)
shap_values = explainer(X)

# visualize the first prediction's explanation

The above explanation shows features each contributing to push the model output from the base value (the average model output over the training dataset we passed) to the model output. Features pushing the prediction higher are shown in red, those pushing the prediction lower are in blue. Another way to visualize the same explanation is to use a force plot (these are introduced in our Nature BME paper):

# visualize the first prediction's explanation with a force plot

If we take many force plot explanations such as the one shown above, rotate them 90 degrees, and then stack them horizontally, we can see explanations for an entire dataset (in the notebook this plot is interactive):

# visualize all the training set predictions

To understand how a single feature effects the output of the model we can plot the SHAP value of that feature vs. the value of the feature for all the examples in a dataset. Since SHAP values represent a feature's responsibility for a change in the model output, the plot below represents the change in predicted house price as RM (the average number of rooms per house in an area) changes. Vertical dispersion at a single value of RM represents interaction effects with other features. To help reveal these interactions we can color by another feature. If we pass the whole explanation tensor to the color argument the scatter plot will pick the best feature to color by. In this case it picks RAD (index of accessibility to radial highways) since that highlights that the average number of rooms per house has less impact on home price for areas with a high RAD value.

# create a dependence scatter plot to show the effect of a single feature across the whole dataset
shap.plots.scatter(shap_values[:,"RM"], color=shap_values)

To get an overview of which features are most important for a model we can plot the SHAP values of every feature for every sample. The plot below sorts features by the sum of SHAP value magnitudes over all samples, and uses SHAP values to show the distribution of the impacts each feature has on the model output. The color represents the feature value (red high, blue low). This reveals for example that a high LSTAT (% lower status of the population) lowers the predicted home price.

# summarize the effects of all the features

We can also just take the mean absolute value of the SHAP values for each feature to get a standard bar plot (produces stacked bars for multi-class outputs):

Natural language example (transformers)

SHAP has specific support for natural language models like those in the Hugging Face transformers library. By adding coalitional rules to traditional Shapley values we can form games that explain large modern NLP model using very few function evaluations. Using this functionality is as simple as passing a supported transformers pipeline to SHAP:

import transformers
import shap

# load a transformers pipeline model
model = transformers.pipeline('sentiment-analysis', return_all_scores=True)

# explain the model on two sample inputs
explainer = shap.Explainer(model) 
shap_values = explainer(["What a great movie! ...if you have no taste."])

# visualize the first prediction's explanation for the POSITIVE output class
shap.plots.text(shap_values[0, :, "POSITIVE"])

Deep learning example with DeepExplainer (TensorFlow/Keras models)

Deep SHAP is a high-speed approximation algorithm for SHAP values in deep learning models that builds on a connection with DeepLIFT described in the SHAP NIPS paper. The implementation here differs from the original DeepLIFT by using a distribution of background samples instead of a single reference value, and using Shapley equations to linearize components such as max, softmax, products, divisions, etc. Note that some of these enhancements have also been since integrated into DeepLIFT. TensorFlow models and Keras models using the TensorFlow backend are supported (there is also preliminary support for PyTorch):

# ...include code from

import shap
import numpy as np

# select a set of background examples to take an expectation over
background = x_train[np.random.choice(x_train.shape[0], 100, replace=False)]

# explain predictions of the model on four images
e = shap.DeepExplainer(model, background)
# ...or pass tensors directly
# e = shap.DeepExplainer((model.layers[0].input, model.layers[-1].output), background)
shap_values = e.shap_values(x_test[1:5])

# plot the feature attributions
shap.image_plot(shap_values, -x_test[1:5])

The plot above explains ten outputs (digits 0-9) for four different images. Red pixels increase the model's output while blue pixels decrease the output. The input images are shown on the left, and as nearly transparent grayscale backings behind each of the explanations. The sum of the SHAP values equals the difference between the expected model output (averaged over the background dataset) and the current model output. Note that for the 'zero' image the blank middle is important, while for the 'four' image the lack of a connection on top makes it a four instead of a nine.

Deep learning example with GradientExplainer (TensorFlow/Keras/PyTorch models)

Expected gradients combines ideas from Integrated Gradients, SHAP, and SmoothGrad into a single expected value equation. This allows an entire dataset to be used as the background distribution (as opposed to a single reference value) and allows local smoothing. If we approximate the model with a linear function between each background data sample and the current input to be explained, and we assume the input features are independent then expected gradients will compute approximate SHAP values. In the example below we have explained how the 7th intermediate layer of the VGG16 ImageNet model impacts the output probabilities.

from keras.applications.vgg16 import VGG16
from keras.applications.vgg16 import preprocess_input
import keras.backend as K
import numpy as np
import json
import shap

# load pre-trained model and choose two images to explain
model = VGG16(weights='imagenet', include_top=True)
X,y = shap.datasets.imagenet50()
to_explain = X[[39,41]]

# load the ImageNet class names
url = ""
fname = shap.datasets.cache(url)
with open(fname) as f:
    class_names = json.load(f)

# explain how the input to the 7th layer of the model explains the top two classes
def map2layer(x, layer):
    feed_dict = dict(zip([model.layers[0].input], [preprocess_input(x.copy())]))
    return K.get_session().run(model.layers[layer].input, feed_dict)
e = shap.GradientExplainer(
    (model.layers[7].input, model.layers[-1].output),
    map2layer(X, 7),
    local_smoothing=0 # std dev of smoothing noise
shap_values,indexes = e.shap_values(map2layer(to_explain, 7), ranked_outputs=2)

# get the names for the classes
index_names = np.vectorize(lambda x: class_names[str(x)][1])(indexes)

# plot the explanations
shap.image_plot(shap_values, to_explain, index_names)

Predictions for two input images are explained in the plot above. Red pixels represent positive SHAP values that increase the probability of the class, while blue pixels represent negative SHAP values the reduce the probability of the class. By using ranked_outputs=2 we explain only the two most likely classes for each input (this spares us from explaining all 1,000 classes).

Model agnostic example with KernelExplainer (explains any function)

Kernel SHAP uses a specially-weighted local linear regression to estimate SHAP values for any model. Below is a simple example for explaining a multi-class SVM on the classic iris dataset.

import sklearn
import shap
from sklearn.model_selection import train_test_split

# print the JS visualization code to the notebook

# train a SVM classifier
X_train,X_test,Y_train,Y_test = train_test_split(*shap.datasets.iris(), test_size=0.2, random_state=0)
svm = sklearn.svm.SVC(kernel='rbf', probability=True), Y_train)

# use Kernel SHAP to explain test set predictions
explainer = shap.KernelExplainer(svm.predict_proba, X_train, link="logit")
shap_values = explainer.shap_values(X_test, nsamples=100)

# plot the SHAP values for the Setosa output of the first instance
shap.force_plot(explainer.expected_value[0], shap_values[0][0,:], X_test.iloc[0,:], link="logit")

The above explanation shows four features each contributing to push the model output from the base value (the average model output over the training dataset we passed) towards zero. If there were any features pushing the class label higher they would be shown in red.

If we take many explanations such as the one shown above, rotate them 90 degrees, and then stack them horizontally, we can see explanations for an entire dataset. This is exactly what we do below for all the examples in the iris test set:

# plot the SHAP values for the Setosa output of all instances
shap.force_plot(explainer.expected_value[0], shap_values[0], X_test, link="logit")

SHAP Interaction Values

SHAP interaction values are a generalization of SHAP values to higher order interactions. Fast exact computation of pairwise interactions are implemented for tree models with shap.TreeExplainer(model).shap_interaction_values(X). This returns a matrix for every prediction, where the main effects are on the diagonal and the interaction effects are off-diagonal. These values often reveal interesting hidden relationships, such as how the increased risk of death peaks for men at age 60 (see the NHANES notebook for details):

Sample notebooks

The notebooks below demonstrate different use cases for SHAP. Look inside the notebooks directory of the repository if you want to try playing with the original notebooks yourself.


An implementation of Tree SHAP, a fast and exact algorithm to compute SHAP values for trees and ensembles of trees.

NHANES survival model with XGBoost and SHAP interaction values - Using mortality data from 20 years of followup this notebook demonstrates how to use XGBoost and shap to uncover complex risk factor relationships.

Census income classification with LightGBM - Using the standard adult census income dataset, this notebook trains a gradient boosting tree model with LightGBM and then explains predictions using shap.

League of Legends Win Prediction with XGBoost - Using a Kaggle dataset of 180,000 ranked matches from League of Legends we train and explain a gradient boosting tree model with XGBoost to predict if a player will win their match.


An implementation of Deep SHAP, a faster (but only approximate) algorithm to compute SHAP values for deep learning models that is based on connections between SHAP and the DeepLIFT algorithm.

MNIST Digit classification with Keras - Using the MNIST handwriting recognition dataset, this notebook trains a neural network with Keras and then explains predictions using shap.

Keras LSTM for IMDB Sentiment Classification - This notebook trains an LSTM with Keras on the IMDB text sentiment analysis dataset and then explains predictions using shap.


An implementation of expected gradients to approximate SHAP values for deep learning models. It is based on connections between SHAP and the Integrated Gradients algorithm. GradientExplainer is slower than DeepExplainer and makes different approximation assumptions.


For a linear model with independent features we can analytically compute the exact SHAP values. We can also account for feature correlation if we are willing to estimate the feature covaraince matrix. LinearExplainer supports both of these options.


An implementation of Kernel SHAP, a model agnostic method to estimate SHAP values for any model. Because it makes not assumptions about the model type, KernelExplainer is slower than the other model type specific algorithms.

Census income classification with scikit-learn - Using the standard adult census income dataset, this notebook trains a k-nearest neighbors classifier using scikit-learn and then explains predictions using shap.

ImageNet VGG16 Model with Keras - Explain the classic VGG16 convolutional nerual network's predictions for an image. This works by applying the model agnostic Kernel SHAP method to a super-pixel segmented image.

Iris classification - A basic demonstration using the popular iris species dataset. It explains predictions from six different models in scikit-learn using shap.

Documentation notebooks

These notebooks comprehensively demonstrate how to use specific functions and objects.

shap.decision_plot and shap.multioutput_decision_plot


Methods Unified by SHAP

LIME: Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "Why should i trust you?: Explaining the predictions of any classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.

Shapley sampling values: Strumbelj, Erik, and Igor Kononenko. "Explaining prediction models and individual predictions with feature contributions." Knowledge and information systems 41.3 (2014): 647-665.

DeepLIFT: Shrikumar, Avanti, Peyton Greenside, and Anshul Kundaje. "Learning important features through propagating activation differences." arXiv preprint arXiv:1704.02685 (2017).

QII: Datta, Anupam, Shayak Sen, and Yair Zick. "Algorithmic transparency via quantitative input influence: Theory and experiments with learning systems." Security and Privacy (SP), 2016 IEEE Symposium on. IEEE, 2016.

Layer-wise relevance propagation: Bach, Sebastian, et al. "On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation." PloS one 10.7 (2015): e0130140.

Shapley regression values: Lipovetsky, Stan, and Michael Conklin. "Analysis of regression in game theory approach." Applied Stochastic Models in Business and Industry 17.4 (2001): 319-330.

Tree interpreter: Saabas, Ando. Interpreting random forests.


The algorithms and visualizations used in this package came primarily out of research in Su-In Lee's lab at the University of Washington, and Microsoft Research. If you use SHAP in your research we would appreciate a citation to the appropriate paper(s):

Author: slundberg
Source Code:
License: MIT License

#deep-learning #keras #tensorflow 

Game Theory Approach to Explain The Outputs Of any Machine Learning
Daron  Moore

Daron Moore


Highly Optimized inference Engine for Binarized Neural Networks

Larq Compute Engine

Larq Compute Engine (LCE) is a highly optimized inference engine for deploying extremely quantized neural networks, such as Binarized Neural Networks (BNNs). It currently supports various mobile platforms and has been benchmarked on a Pixel 1 phone and a Raspberry Pi. LCE provides a collection of hand-optimized TensorFlow Lite custom operators for supported instruction sets, developed in inline assembly or in C++ using compiler intrinsics. LCE leverages optimization techniques such as tiling to maximize the number of cache hits, vectorization to maximize the computational throughput, and multi-threading parallelization to take advantage of multi-core modern desktop and mobile CPUs.

Larq Compute Engine is part of a family of libraries for BNN development; you can also check out Larq for building and training BNNs and Larq Zoo for pre-trained models.

Key Features

  • Effortless end-to-end integration from training to deployment:
    • Tight integration of LCE with Larq and TensorFlow provides a smooth end-to-end training and deployment experience.
    • A collection of Larq pre-trained BNN models for common machine learning tasks is available in Larq Zoo and can be used out-of-the-box with LCE.
    • LCE provides a custom MLIR-based model converter which is fully compatible with TensorFlow Lite and performs additional network level optimizations for Larq models.
  • Lightning fast deployment on a variety of mobile platforms:
    • LCE enables high performance, on-device machine learning inference by providing hand-optimized kernels and network level optimizations for BNN models.
    • LCE currently supports 64-bit ARM-based mobile platforms such as Android phones and Raspberry Pi boards.
    • Thread parallelism support in LCE is essential for modern mobile devices with multi-core CPUs.


The table below presents single-threaded performance of Larq Compute Engine on different versions of a novel BNN model called QuickNet (trained on ImageNet dataset, released on Larq Zoo) on a Pixel 1 phone (2016) and a Raspberry Pi 4 Model B (BCM2711) board:

ModelTop-1 AccuracyRPi 4 B, ms (1 thread)Pixel 1, ms (1 thread)
QuickNet (.h5)58.6 %31.416.8
QuickNet-Large (.h5)62.7 %48.725.5
QuickNet-XL (.h5)67.0 %82.944.2

For reference, dabnn (the other main BNN library) reports an inference time of 61.3 ms for Bi-RealNet (56.4% accuracy) on the Pixel 1 phone, while LCE achieves an inference time of 41.6 ms for Bi-RealNet on the same device. They furthermore present a modified version, BiRealNet-Stem, which achieves the same accuracy of 56.4% in 43.2 ms.

The following table presents multi-threaded performance of Larq Compute Engine on a Pixel 1 phone and a Raspberry Pi 4 Model B (BCM2711) board:

ModelTop-1 AccuracyRPi 4 B, ms (4 threads)Pixel 1, ms (4 threads)
QuickNet (.h5)58.6 %16.18.9
QuickNet-Large (.h5)62.7 %24.712.6
QuickNet-XL (.h5)67.0 %37.922.8

Benchmarked on August 21st, 2020 with LCE custom TFLite Model Benchmark Tool (see here) and BNN models with randomized inputs.

Getting started

Follow these steps to deploy a BNN with LCE:

Pick a Larq model

You can use Larq to build and train your own model or pick a pre-trained model from Larq Zoo.

Convert the Larq model

LCE is built on top of TensorFlow Lite and uses the TensorFlow Lite FlatBuffer format to convert and serialize Larq models for inference. We provide an LCE Converter with additional optimization passes to increase the speed of execution of Larq models on supported target platforms.

Build LCE

The LCE documentation provides the build instructions for Android and 64-bit ARM-based boards such as Raspberry Pi. Please follow the provided instructions to create a native LCE build or cross-compile for one of the supported targets.

Run inference

LCE uses the TensorFlow Lite Interpreter to perform an inference. In addition to the already available built-in TensorFlow Lite operators, optimized LCE operators are registered to the interpreter to execute the Larq specific subgraphs of the model. An example to create and build an LCE compatible TensorFlow Lite interpreter for your own applications is provided here.

Next steps


Larq Compute Engine is being developed by a team of deep learning researchers and engineers at Plumerai to help accelerate both our own research and the general adoption of Binarized Neural Networks.

Author: larq
Source Code:
License: Apache-2.0 License

#cpluplus #keras #tensorflow 

Highly Optimized inference Engine for Binarized Neural Networks
Daron  Moore

Daron Moore


Tools To Help Users Interoperate Between Deep Learning Frameworks


MMdnn is a comprehensive and cross-framework tool to convert, visualize and diagnose deep learning (DL) models. The "MM" stands for model management, and "dnn" is the acronym of deep neural network.

Major features include:

Model Conversion

  • We implement a universal converter to convert DL models between frameworks, which means you can train a model with one framework and deploy it with another.

Model Retraining

  • During the model conversion, we generate some code snippets to simplify later retraining or inference.

Model Search & Visualization

Model Deployment

We provide some guidelines to help you deploy DL models to another hardware platform.

We provide a guide to help you accelerate inference with TensorRT.

Related Projects

Targeting at openness and advancing state-of-art technology, Microsoft Research (MSR) and Microsoft Software Technology Center (STC) had also released few other open source projects:

  • OpenPAI : an open source platform that provides complete AI model training and resource management capabilities, it is easy to extend and supports on-premise, cloud and hybrid environments in various scale.
  • FrameworkController : an open source general-purpose Kubernetes Pod Controller that orchestrate all kinds of applications on Kubernetes by a single controller.
  • NNI : a lightweight but powerful toolkit to help users automate Feature Engineering, Neural Architecture Search, Hyperparameter Tuning and Model Compression.
  • NeuronBlocks : an NLP deep learning modeling toolkit that helps engineers to build DNN models like playing Lego. The main goal of this toolkit is to minimize developing cost for NLP deep neural network model building, including both training and inference stages.
  • SPTAG : Space Partition Tree And Graph (SPTAG) is an open source library for large scale vector approximate nearest neighbor search scenario.

We encourage researchers, developers and students to leverage these projects to boost their AI / Deep Learning productivity.


Install manually

You can get a stable version of MMdnn by

pip install mmdnn

And make sure to have Python installed or you can try the newest version by

pip install -U git+

Install with docker image

MMdnn provides a docker image, which packages MMdnn and Deep Learning frameworks that we support as well as other dependencies. You can easily try the image with the following steps:

Install Docker Community Edition(CE)

Learn more about how to install docker

Pull MMdnn docker image

docker pull mmdnn/mmdnn:cpu.small

Run image in an interactive mode

docker run -it mmdnn/mmdnn:cpu.small


Model Conversion

Across the industry and academia, there are a number of existing frameworks available for developers and researchers to design a model, where each framework has its own network structure definition and saving model format. The gaps between frameworks impede the inter-operation of the models.

We provide a model converter to help developers convert models between frameworks through an intermediate representation format.

Support frameworks

[Note] You can click the links to get detailed README of each framework.

Tested models

The model conversion between currently supported frameworks is tested on some ImageNet models.

VGG 19
Inception V1
Inception V3
Inception V4o
ResNet V1×o
ResNet V2
MobileNet V1×o
MobileNet V2×o
voc FCN      


One command to achieve the conversion. Using TensorFlow ResNet V2 152 to PyTorch as our example.

$ mmdownload -f tensorflow -n resnet_v2_152 -o ./
$ mmconvert -sf tensorflow -in imagenet_resnet_v2_152.ckpt.meta -iw imagenet_resnet_v2_152.ckpt --dstNodeName MMdnn_Output -df pytorch -om tf_resnet_to_pth.pth


On-going frameworks

  • Torch7 (help wanted)
  • Chainer (help wanted)

On-going Models

  • Face Detection
  • Semantic Segmentation
  • Image Style Transfer
  • Object Detection
  • RNN

Model Visualization

We provide a local visualizer to display the network architecture of a deep learning model. Please refer to the instruction.


Official Tutorial

Keras "inception V3" to CNTK and related issue

TensorFlow slim model "ResNet V2 152" to PyTorch

Mxnet model "LResNet50E-IR" to TensorFlow and related issue

Users' Examples

MXNet "ResNet-152-11k" to PyTorch

Another Example of MXNet "ResNet-152-11k" to PyTorch

MXNet "ResNeXt" to Keras

TensorFlow "ResNet-101" to PyTorch

TensorFlow "mnist mlp model" to CNTK

TensorFlow "Inception_v3" to MXNet

Caffe "voc-fcn" to TensorFlow

Caffe "AlexNet" to TensorFlow

Caffe "inception_v4" to TensorFlow

Caffe "VGG16_SOD" to TensorFlow

Caffe "SqueezeNet v1.1" to CNTK


Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to and actually do, grant us the rights to use your contribution. For details, visit

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact with any additional questions or comments.

Intermediate Representation

The intermediate representation stores the network architecture in protobuf binary and pre-trained weights in NumPy native format.

[Note!] Currently the IR weights data is in NHWC (channel last) format.

Details are in ops.txt and graph.proto. New operators and any comments are welcome.


We are working on other frameworks conversion and visualization, such as PyTorch, CoreML and so on. We're investigating more RNN related operators. Any contributions and suggestions are welcome! Details in Contribution Guideline.


Yu Liu (Peking University): Project Developer & Maintainer

Cheng CHEN (Microsoft Research Asia): Caffe, CNTK, CoreML Emitter, Keras, MXNet, TensorFlow

Jiahao YAO (Peking University): CoreML, MXNet Emitter, PyTorch Parser; HomePage

Ru ZHANG (Chinese Academy of Sciences): CoreML Emitter, DarkNet Parser, Keras, TensorFlow frozen graph Parser; Yolo and SSD models; Tests

Yuhao ZHOU (Shanghai Jiao Tong University): MXNet

Tingting QIN (Microsoft Research Asia): Caffe Emitter

Tong ZHAN (Microsoft): ONNX Emitter

Qianwen WANG (Hong Kong University of Science and Technology): Visualization


Thanks to Saumitro Dasgupta, the initial code of caffe -> IR converting is references to his project caffe-tensorflow.


Licensed under the MIT license.

Author: microsoft
Source Code:
License: MIT License

#tensorflow #python #keras #pytorch 

Tools To Help Users Interoperate Between Deep Learning Frameworks
Dominic  Feeney

Dominic Feeney


TensorFlow Cloud: Provides APIs for Debugging, Training, Tuning Keras

TensorFlow Cloud

The TensorFlow Cloud repository provides APIs that will allow to easily go from debugging, training, tuning your Keras and TensorFlow code in a local environment to distributed training/tuning on Cloud.


TensorFlow Cloud run API for GCP training/tuning



For detailed end to end setup instructions, please see Setup instructions.

Install latest release

pip install -U tensorflow-cloud

Install from source

git clone
cd cloud
pip install src/python/.

High level overview

TensorFlow Cloud package provides the run API for training your models on GCP. To start, let's walk through a simple workflow using this API.

  1. Let's begin with a Keras model training code such as the following, saved as
import tensorflow as tf

(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()

x_train = x_train.reshape((60000, 28 * 28))
x_train = x_train.astype('float32') / 255

model = tf.keras.Sequential([
  tf.keras.layers.Dense(512, activation='relu', input_shape=(28 * 28,)),
  tf.keras.layers.Dense(10, activation='softmax')

              metrics=['accuracy']), y_train, epochs=10, batch_size=128)

2.   After you have tested this model on your local environment for a few epochs, probably with a small dataset, you can train the model on Google Cloud by writing the following simple script

import tensorflow_cloud as tfc'')

Running will automatically apply TensorFlow one device strategy and train your model at scale on Google Cloud Platform. Please see the usage guide section for detailed instructions and additional API parameters.

3.   You will see an output similar to the following on your console. This information can be used to track the training job status.

user@desktop$ python
Job submitted successfully.
Your job ID is:  tf_cloud_train_519ec89c_a876_49a9_b578_4fe300f8865e
Please access your job logs at the following URL:

Setup instructions

End to end instructions to help set up your environment for Tensorflow Cloud. You use one of the following notebooks to setup your project or follow the instructions below.

Colab logoRun in Colab GitHub logoView on GitHub Kaggle logoRun in Kaggle 
  1. Create a new local directory
mkdir tensorflow_cloud
cd tensorflow_cloud

2.   Make sure you have python >= 3.6

python -V

3.   Set up virtual environment

virtualenv tfcloud --python=python3
source tfcloud/bin/activate

4.   Set up your Google Cloud project

Verify that gcloud sdk is installed.

which gcloud

Set default gcloud project

export PROJECT_ID=<your-project-id>
gcloud config set project $PROJECT_ID

5.   Authenticate your GCP account

Create a service account.

export SA_NAME=<your-sa-name>
gcloud iam service-accounts create $SA_NAME
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member serviceAccount:$SA_NAME@$ \
    --role 'roles/editor'

Create a key for your service account.

gcloud iam service-accounts keys create ~/key.json --iam-account $SA_NAME@$

Create the GOOGLE_APPLICATION_CREDENTIALS environment variable.


6.   Create a Cloud Storage bucket. Using Google Cloud build is the recommended method for building and publishing docker images, although we optionally allow for local docker daemon process depending on your specific needs.

gcloud auth login
gsutil mb -l $REGION gs://$BUCKET_NAME

(optional for local docker setup) shell sudo dockerd

7.   Authenticate access to Google Cloud registry.

gcloud auth configure-docker

8.   Install nbconvert if you plan to use a notebook file entry_point as shown in usage guide #4.

pip install nbconvert

9.   Install latest release of tensorflow-cloud

pip install tensorflow-cloud

Usage guide

As described in the high level overview, the run API allows you to train your models at scale on GCP. The run API can be used in four different ways. This is defined by where you are running the API (Terminal vs IPython notebook), and your entry_point parameter. entry_point is an optional Python script or notebook file path to the file that contains your TensorFlow Keras training code. This is the most important parameter in the API.

  1. Using a python file as entry_point.

If you have your tf.keras model in a python file (, then you can write the following simple script ( to scale your model on GCP.

import tensorflow_cloud as tfc'')

Please note that all the files in the same directory tree as entry_point will be packaged in the docker image created, along with the entry_point file. It's recommended to create a new directory to house each cloud project which includes necessary files and nothing else, to optimize image build times.

2.   Using a notebook file as entry_point.

If you have your tf.keras model in a notebook file (mnist_example.ipynb), then you can write the following simple script ( to scale your model on GCP.

import tensorflow_cloud as tfc'mnist_example.ipynb')

Please note that all the files in the same directory tree as entry_point will be packaged in the docker image created, along with the entry_point file. Like the python script entry_point above, we recommended creating a new directory to house each cloud project which includes necessary files and nothing else, to optimize image build times.

3.   Using run within a python script that contains the tf.keras model.

You can use the run API from within your python file that contains the tf.keras model ( In this use case, entry_point should be None. The run API can be called anywhere and the entire file will be executed remotely. The API can be called at the end to run the script locally for debugging purposes (possibly with fewer epochs and other flags).

import tensorflow_datasets as tfds
import tensorflow as tf
import tensorflow_cloud as tfc

datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True)
mnist_train, mnist_test = datasets['train'], datasets['test']

num_train_examples = info.splits['train'].num_examples
num_test_examples = info.splits['test'].num_examples


def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255
    return image, label

train_dataset =
train_dataset = train_dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(
        28, 28, 1)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')

              metrics=['accuracy']), epochs=12)

Please note that all the files in the same directory tree as the python script will be packaged in the docker image created, along with the python file. It's recommended to create a new directory to house each cloud project which includes necessary files and nothing else, to optimize image build times.

4.   Using run within a notebook script that contains the tf.keras model.

Image of colab

In this use case, entry_point should be None and docker_config.image_build_bucket must be specified, to ensure the build can be stored and published.

Cluster and distribution strategy configuration

By default, run API takes care of wrapping your model code in a TensorFlow distribution strategy based on the cluster configuration you have provided.

No distribution

CPU chief config and no additional workers'',


1 GPU on chief (defaults to AcceleratorType.NVIDIA_TESLA_T4) and no additional workers.'')


Chief config with multiple GPUS (AcceleratorType.NVIDIA_TESLA_V100).


Chief config with 1 GPU and 2 workers each with 8 GPUs (AcceleratorType.NVIDIA_TESLA_V100).'',


Chief config with 1 CPU and 1 worker with TPU."",

Please note that TPUStrategy with TensorFlow Cloud works only with TF version 2.1 as this is the latest version supported by AI Platform cloud TPU

Custom distribution strategy

If you would like to take care of specifying distribution strategy in your model code and do not want run API to create a strategy, then set distribution_stategy as None. This will be required for example when you are using strategy.experimental_distribute_dataset.

What happens when you call run?

The API call will encompass the following:

  1. Making code entities such as a Keras script/notebook, cloud and distribution ready.
  2. Converting this distribution entity into a docker container with the required dependencies.
  3. Deploy this container at scale and train using TensorFlow distribution strategies.
  4. Stream logs and monitor them on hosted TensorBoard, manage checkpoint storage.

By default, we will use local docker daemon for building and publishing docker images to Google container registry. Images are published to If you specify docker_config.image_build_bucket, then we will use Google Cloud build to build and publish docker images.

We use Google AI platform for deploying docker images on GCP.

Please note that, when entry_point argument is specified, all the files in the same directory tree as entry_point will be packaged in the docker image created, along with the entry_point file.

Please see run API documentation for detailed information on the parameters and how you can modify the above processes to suit your needs.

End to end examples

cd src/python/tensorflow_cloud/core
python tests/examples/

Running unit tests

pytest src/python/tensorflow_cloud/core/tests/unit/

Local vs remote training

Things to keep in mind when running your jobs remotely:

[Coming soon]

Debugging workflow

Here are some tips for fixing unexpected issues.

Operation disallowed within distribution strategy scope

Error like: Creating a generator within a strategy scope is disallowed, because there is ambiguity on how to replicate a generator (e.g. should it be copied so that each replica gets the same random numbers, or 'split' so that each replica gets different random numbers).

Solution: Passing distribution_strategy='auto' to run API wraps all of your script in a TF distribution strategy based on the cluster configuration provided. You will see the above error or something similar to it, if for some reason an operation is not allowed inside distribution strategy scope. To fix the error, please pass None to the distribution_strategy param and create a strategy instance as part of your training code as shown in this example.

Docker image build timeout

Error like: requests.exceptions.ConnectionError: ('Connection aborted.', timeout('The write operation timed out'))

Solution: The directory being used as an entry point likely has too much data for the image to successfully build, and there may be extraneous data included in the build. Reformat your directory structure such that the folder which contains the entry point only includes files necessary for the current project.

Version not supported for TPU training

Error like: There was an error submitting the job.Field: tpu_tf_version Error: The specified runtime version '2.3' is not supported for TPU training. Please specify a different runtime version.

Solution: Please use TF version 2.1. See TPU Strategy in Cluster and distribution strategy configuration section.

TF nightly build.

Warning like: Docker parent image '2.4.0.dev20200720' does not exist. Using the latest TF nightly build.

Solution: If you do not provide docker_config.parent_image param, then by default we use pre-built TF docker images as parent image. If you do not have TF installed on the environment where run is called, then TF docker image for the latest stable release will be used. Otherwise, the version of the docker image will match the locally installed TF version. However, pre-built TF docker images aren't available for TF nightlies except for the latest. So, if your local TF is an older nightly version, we upgrade to the latest nightly automatically and raise this warning.

Mixing distribution strategy objects.

Error like: RuntimeError: Mixing different tf.distribute.Strategy objects.

Solution: Please provide distribution_strategy=None when you already have a distribution strategy defined in your model code. Specifying distribution_strategy'='auto', will wrap your code in a TensorFlow distribution strategy. This will cause the above error, if there is a strategy object already used in your code.

Coming up

  • Distributed Keras tuner support.


We welcome community contributions, see and, for style help, Writing TensorFlow documentation guide.

Privacy Notice

This application reports technical and operational details of your usage of Cloud Services in accordance with Google privacy policy, for more information please refer to If you wish to opt-out, you may do so by running tensorflow_cloud.utils.google_api_client.optout_metrics_reporting().

Download Details:
Author: tensorflow
Source Code:
License: Apache-2.0 License

#tensorflow #keras #python #cloud #api 

TensorFlow Cloud: Provides APIs for Debugging, Training, Tuning Keras