Gordon  Matlala

Gordon Matlala


LangFlow: A User Interface For LangChain

⛓️ LangFlow

~ A User Interface For LangChain ~

LangFlow is a GUI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows with drag-and-drop components and a chat box.

📦 Installation

You can install LangFlow from pip:

pip install langflow

Next, run:


🎨 Creating Flows

Creating flows with LangFlow is easy. Simply drag sidebar components onto the canvas and connect them together to create your pipeline. LangFlow provides a range of LangChain components to choose from, including LLMs, prompt serializers, agents, and chains.

Explore by editing prompt parameters, link chains and agents, track an agent's thought process, and export your flow.

👋 Contributing

We welcome contributions from developers of all levels to our open-source project on GitHub. If you'd like to contribute, please check our contributing guidelines and help make LangFlow more accessible.

Star History Chart

Download Details:

Author: logspace-ai
Source Code: https://github.com/logspace-ai/langflow 
License: MIT license

#react #flow #language #models #chatgpt 

LangFlow: A User Interface For LangChain

Evals: A Framework for Evaluating OpenAI Models


Evals is a framework for evaluating OpenAI models and an open-source registry of benchmarks.

You can use Evals to create and run evaluations that:

  • use datasets to generate prompts,
  • measure the quality of completions provided by an OpenAI model, and
  • compare performance across different datasets and models.

With Evals, we aim to make it as simple as possible to build an eval while writing as little code as possible. To get started, we recommend that you follow these steps in order:

  1. Read through this doc and follow the setup instructions below.
  2. Learn how to run existing evals: run-evals.md.
  3. Familiarize yourself with the existing eval templates: eval-templates.md.
  4. Walk through the process for building an eval: build-eval.md
  5. See an example of implementing custom eval logic: custom-eval.md.

If you think you have an interesting eval, please open a PR with your contribution. OpenAI staff actively review these evals when considering improvements to upcoming models.

🚨 For a limited time, we will be granting GPT-4 access to those who contribute high quality evals. Please follow the instructions mentioned above and note that spam or low quality submissions will be ignored❗️

Access will be granted to the email address associated with an accepted Eval. Due to high volume, we are unable to grant access to any email other than the one used for the pull request.


To run evals, you will need to set up and specify your OpenAI API key. You can generate one at https://platform.openai.com/account/api-keys. After you obtain an API key, specify it using the OPENAI_API_KEY environment variable. Please be aware of the costs associated with using the API when running evals.

Downloading evals

Our Evals registry is stored using Git-LFS. Once you have downloaded and installed LFS, you can fetch the evals with:

git lfs fetch --all
git lfs pull

You may just want to fetch data for a select eval. You can achieve this via:

git lfs fetch --include=evals/registry/data/${your eval}
git lfs pull

Making evals

If you are going to be creating evals, we suggest cloning this repo directly from GitHub and installing the requirements using the following command:

pip install -e .

Using -e, changes you make to your eval will be reflected immediately without having to reinstall.

Running evals

If you don't want to contribute new evals, but simply want to run them locally, you can install the evals package via pip:

pip install evals

We provide the option for you to log your eval results to a Snowflake database, if you have one or wish to set one up. For this option, you will further have to specify the SNOWFLAKE_ACCOUNT, SNOWFLAKE_DATABASE, SNOWFLAKE_USERNAME, and SNOWFLAKE_PASSWORD environment variables.


Do you have any examples of how to build an eval from start to finish?

  • Yes! These are in the examples folder. We recommend that you also read through build-eval.md in order to gain a deeper understanding of what is happening in these examples.

Do you have any examples of evals implemented in multiple different ways?

  • Yes! In particular, see evals/registry/evals/coqa.yaml. We have implemented small subsets of the CoQA dataset for various eval templates to help illustrate the differences.

I changed my data but this isn't reflected when running my eval, what's going on?

  • Your data may have been cached to /tmp/filecache. Try removing this cache and rerunning your eval.

There's a lot of code, and I just want to spin up a quick eval. Help? OR,

I am a world-class prompt engineer. I choose not to code. How can I contribute my wisdom?

  • If you follow an existing eval template to build a basic or model-graded eval, you don't need to write any evaluation code at all! Just provide your data in JSON format and specify your eval parameters in YAML. build-eval.md walks you through these steps, and you can supplement these instructions with the Jupyter notebooks in the examples folder to help you get started quickly. Keep in mind, though, that a good eval will inevitably require careful thought and rigorous experimentation!


By contributing to Evals, you are agreeing to make your evaluation logic and data under the same MIT license as this repository. You must have adequate rights to upload any data used in an Eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI Evals will be subject to our usual Usage Policies: https://platform.openai.com/docs/usage-policies.

Download Details:

Author: openai
Source Code: https://github.com/openai/evals 
License: MIT license

#python #openai #models #opensource 

Evals: A Framework for Evaluating OpenAI Models

DirichletProcessMixtures.jl: Dirichlet Process Mixture Models in Julia


This package implements Dirichlet Process Mixture Models in Julia using variational inference for truncated stick-breaking representation of Dirichlet Process.

(almost) infinite mixture of Gaussians

Most likely you need this package especially for this purpose, this is how to do Gaussian clustering. You may check demo code which contains almost all functionality you may need.

First off, you define your prior over parameters of mixture component (i.e. mean and precision matrix) using NormalWishart distribution:

using DirichletProcessMixtures
using Distributions

prior = NormalWishart(zeros(2), 1e-7, eye(2) / 4, 4.0001)

Then you generate your mixture

x = ... # your data, x[:, i] - is i-th data point
T = 20 # truncation level
alpha = 0.1 # Dirichlet process parameter, controls how many clusters you need a priori
gm, theta, predictive_likelihood = gaussian_mixture(prior, T, alpha, x)

gm is an internal representation of mixture model. theta is array of size T whose elements refer to parameters of posterior NormalWishart's. Finally, predictive_likelihood is a function which takes a matrix containing test data and returns per-point test loglikelihood. Now we can perform inference in our model

function iter_callback(mix::TSBPMM, iter::Int64, lower_bound::Float64)
    pl = sum(predictive_likelihood(xtest)) / M
    println("iteration $iter test likelihood=$pl, lower_bound=$lower_bound")

maxiter = 200
ltol = 1e-5
niter = infer(gm, maxiter, ltol; iter_callback=iter_callback)

You may see that infer method performs not more than maxiter iterations until lower bound tolerance reaches ltol value, calling iter_callback at each iteration if provided.

Another useful quantities you may need from mixture model:

  • gm.z - TxN array with expected mixture component assignments
  • gm.qv - posterior Beta distributions for stick-breaking proportions

General interface

It is also possible to implement custom mixture models with conjugate priors for mixture components, but this remains to be documented yet. For a reference implementation of custom mixture model use mixture of Gaussians.

Download Details:

Author: sbos
Source Code: https://github.com/sbos/DirichletProcessMixtures.jl 
License: View license

#julia #models 

DirichletProcessMixtures.jl: Dirichlet Process Mixture Models in Julia

A Julia Package for Modeling & Fitting Generalized Low Rank Models


LowRankModels.jl is a Julia package for modeling and fitting generalized low rank models (GLRMs). GLRMs model a data array by a low rank matrix, and include many well known models in data analysis, such as principal components analysis (PCA), matrix completion, robust PCA, nonnegative matrix factorization, k-means, and many more.

For more information on GLRMs, see our paper. There is a python interface to this package, and a GLRM implementation in the H2O machine learning platform with interfaces in a variety of languages.

LowRankModels.jl makes it easy to mix and match loss functions and regularizers to construct a model suitable for a particular data set. In particular, it supports

  • using different loss functions for different columns of the data array, which is useful when data types are heterogeneous (e.g., real, boolean, and ordinal columns);
  • fitting the model to only some of the entries in the table, which is useful for data tables with many missing (unobserved) entries; and
  • adding offsets and scalings to the model without destroying sparsity, which is useful when the data is poorly scaled.


To install, just call


at the Julia prompt.

Generalized Low Rank Models

GLRMs form a low rank model for tabular data A with m rows and n columns, which can be input as an array or any array-like object (for example, a data frame). It is fine if only some of the entries have been observed (i.e., the others are missing); the GLRM will only be fit on the !ismissing entries. The desired model is specified by choosing a rank k for the model, an array of loss functions losses, and two regularizers, rx and ry. The data is modeled as X'*Y, where X is a kxm matrix and Y is a kxn matrix. X and Y are found by solving the optimization problem

minimize sum_{(i,j) in obs} losses[j]((X'*Y)[i,j], A[i,j]) + sum_i rx(X[:,i]) + sum_j ry(Y[:,j])

The basic type used by LowRankModels.jl is the GLRM. To form a GLRM, the user specifies

  • the data A (any AbstractArray, such as an array, a sparse matrix, or a data frame)
  • the array of loss functions losses
  • the regularizers rx and ry
  • the rank k

The user may also specify

  • the observed entries obs
  • starting matrices X₀ and Y₀

obs is a list of tuples of the indices of the observed entries in the matrix, and may be omitted if all the entries in the matrix have been observed. If A is a sparse matrix, implicit zeros are interpreted as missing entries by default; see the discussion of sparse matrices below for more details. X₀ and Y₀ are initialization matrices that represent a starting guess for the optimization.

Losses and regularizers must be of type Loss and Regularizer, respectively, and may be chosen from a list of supported losses and regularizers, which include


  • quadratic loss QuadLoss
  • hinge loss HingeLoss
  • logistic loss LogisticLoss
  • Poisson loss PoissonLoss
  • weighted hinge loss WeightedHingeLoss
  • l1 loss L1Loss
  • ordinal hinge loss OrdinalHingeLoss
  • periodic loss PeriodicLoss
  • multinomial categorical loss MultinomialLoss
  • multinomial ordinal (aka ordered logit) loss OrderedMultinomialLoss
  • bigger-vs-smaller loss BvSLoss (for ordinal data)
  • one-vs all-loss OvALoss (for categorical data)

The constructors for all the ordinal and categorical losses take as an argument the maximum (or both minimum and maximum) value the variable may take. Using the one-vs-all loss is equivalent to transforming a categorical value to a one-hot vector and using a binary loss on each entry in that vector. Using the bigger-vs-smaller loss is equivalent to transforming the ordinal value to a Boolean vector and using a binary loss on each entry in that vector. By default, the binary loss used is the logistic loss.


  • quadratic regularization QuadReg
  • constrained squared euclidean norm QuadConstraint
  • l1 regularization OneReg
  • no regularization ZeroReg
  • nonnegative constraint NonNegConstraint (e.g., for nonnegative matrix factorization)
  • 1-sparse constraint OneSparseConstraint (e.g., for orthogonal NNMF)
  • unit 1-sparse constraint UnitOneSparseConstraint (e.g., for k-means)
  • simplex constraint SimplexConstraint
  • l1 regularization, combined with nonnegative constraint NonNegOneReg
  • fix features at values y0 FixedLatentFeaturesConstraint(y0)

Each of these losses and regularizers can be scaled (for example, to increase the importance of the loss relative to the regularizer) by calling mul!(loss, newscale). Users may also implement their own losses and regularizers, or adjust internal parameters of the losses and regularizers; see losses.jl and regularizers.jl for more details.


For example, the following code forms a k-means model with k=5 on the 100x100 matrix A:

using LowRankModels
m, n, k = 100, 100, 5
losses = QuadLoss() # minimize squared distance to cluster centroids
rx = UnitOneSparseConstraint() # each row is assigned to exactly one cluster
ry = ZeroReg() # no regularization on the cluster centroids
glrm = GLRM(A, losses, rx, ry, k)

To fit the model, call

X, Y, ch = fit!(glrm)

which runs an alternating directions proximal gradient method on glrm to find the X and Y minimizing the objective function. (ch gives the convergence history; see Technical details below for more information.)

The losses argument can also be an array of loss functions, with one for each column (in order). For example, for a data set with 3 columns, you could use

losses = Loss[QuadLoss(), LogisticLoss(), HingeLoss()]

Similiarly, the ry argument can be an array of regularizers, with one for each column (in order). For example, for a data set with 3 columns, you could use

ry = Regularizer[QuadReg(1), QuadReg(10), FixedLatentFeaturesConstraint([1.,2.,3.])]

This regularizes the first to columns of Y with ||Y[:,1]||^2 + 10||Y[:,2]||^2 and constrains the third (and last) column of Y to be equal to [1,2,3].

More examples here.

Missing data

If not all entries are present in your data table, just tell the GLRM which observations to fit the model to by listing tuples of their indices in obs, e.g., if obs=[(1,2), (5,3)], exactly two entries have been observed. Then initialize the model using

GLRM(A, losses, rx, ry, k, obs=obs)

If A is a DataFrame and you just want the model to ignore any entry that is missing, you can use

obs = observations(A)

Standard low rank models

Low rank models can easily be used to fit standard models such as PCA, k-means, and nonnegative matrix factorization. The following functions are available:

  • pca: principal components analysis
  • qpca: quadratically regularized principal components analysis
  • rpca: robust principal components analysis
  • nnmf: nonnegative matrix factorization
  • k-means: k-means

See the code for usage. Any keyword argument valid for a GLRM object, such as an initial value for X or Y or a list of observations, can also be used with these standard low rank models.

Scaling and offsets

If you choose, LowRankModels.jl can add an offset to your model and scale the loss functions and regularizers so all columns have the same pull in the model. Simply call

glrm = GLRM(A, losses, rx, ry, k, offset=true, scale=true)

This transformation generalizes standardization, a common preprocessing technique applied before PCA. (For more about offsets and scaling, see the code or the paper.)

You can also add offsets and scalings to previously unscaled models:

  • Add an offset to the model (by applying no regularization to the last row of the matrix Y, and enforcing that the last column of X be all 1s) using
  • Scale the loss functions and regularizers by calling
  • Scale only the columns associated to QuadLoss or HuberLoss loss functions.

Fitting DataFrames

Perhaps all this sounds like too much work. Perhaps you happen to have a DataFrame df lying around that you'd like a low rank (e.g., k=2) model for. For example,

import RDatasets
df = RDatasets.dataset("psych", "msq")

Never fear! Just call

glrm, labels = GLRM(df, k)
X, Y, ch = fit!(glrm)

This will fit a GLRM with rank k to your data, using a QuadLoss loss for real valued columns, HingeLoss loss for boolean columns, and ordinal HingeLoss loss for integer columns, a small amount of QuadLoss regularization, and scaling and adding an offset to the model as described here. It returns the column labels for the columns it fit, along with the model. Right now, all other data types are ignored. NaN values are treated as missing values (missings) and ignored in the fit.

The full call signature is

function GLRM(df::DataFrame, k::Int;
              losses = Loss[], rx = QuadReg(.01), ry = QuadReg(.01),
              offset = true, scale = false,
              prob_scale = true, NaNs_to_NAs = true)

You can modify the losses or regularizers, or turn off offsets or scaling, using these keyword arguments.

Or to specify a map from data types to losses, define a new loss_map from datatypes to losses (like probabilistic_losses, below):

probabilistic_losses = Dict{Symbol, Any}(
    :real        => QuadLoss,
    :bool        => LogisticLoss,
    :ord         => MultinomialOrdinalLoss,
    :cat         => MultinomialLoss

and input an array of datatypes (one for each column of your data frame: GLRM(A, k, datatypes; loss_map = loss_map). The full call signature is

function GLRM(df::DataFrame, k::Int, datatypes::Array{Symbol,1};
              loss_map = probabilistic_losses,
              rx = QuadReg(.01), ry = QuadReg(.01),
              offset = true, scale = false, prob_scale = true,
              transform_data_to_numbers = true, NaNs_to_NAs = true)

You can modify the losses or regularizers, or turn off offsets or scaling, using these keyword arguments.

To fit a data frame with categorical values, you can use the function expand_categoricals! to turn categorical columns into a Boolean column for each level of the categorical variable. For example, expand_categoricals!(df, [:gender]) will replace the gender column with a column corresponding to gender=male, a column corresponding to gender=female, and other columns corresponding to labels outside the gender binary, if they appear in the data set.

You can use the model to get some intuition for the data set. For example, try plotting the columns of Y with the labels; you might see that similar features are close to each other!

Fitting Sparse Matrices

If you have a very large, sparsely observed dataset, then you may want to encode your data as a sparse matrix. By default, LowRankModels interprets the sparse entries of a sparse matrix as missing entries (i.e. NA values). There is no need to pass the indices of observed entries (obs) -- this is done automatically when GLRM(A::SparseMatrixCSC,...) is called. In addition, calling fit!(glrm) when glrm.A is a sparse matrix will use the sparse variant of the proximal gradient descent algorithm, fit!(glrm, SparseProxGradParams(); kwargs...).

If, instead, you'd like to interpret the sparse entries as zeros, rather than missing or NA entries, use:

glrm = GLRM(...; sparse_na=false)

In this case, the dataset is dense in terms of observations, but sparse in terms of nonzero values. Thus, it may make more sense to fit the model with the vanilla proximal gradient descent algorithm, fit!(glrm, ProxGradParams(); kwargs...).

Parallel fitting (experimental)

LowRankModels makes use of Julia v0.5's new multithreading functionality to fit models in parallel. To fit a LowRankModel in parallel using multithreading, simply set the number of threads from the command line before starting Julia: e.g.,


Technical details


The function fit! uses an alternating directions proximal gradient method to minimize the objective. This method is not guaranteed to converge to the optimum, or even to a local minimum. If your code is not converging or is converging to a model you dislike, there are a number of parameters you can tweak.

Warm start

The algorithm starts with glrm.X and glrm.Y as the initial estimates for X and Y. If these are not given explicitly, they will be initialized randomly. If you have a good guess for a model, try setting them explicitly. If you think that you're getting stuck in a local minimum, try reinitializing your GLRM (so as to construct a new initial random point) and see if the model you obtain improves.

The function fit! sets the fields glrm.X and glrm.Y after fitting the model. This is particularly useful if you want to use the model you generate as a warm start for further iterations. If you prefer to preserve the original glrm.X and glrm.Y (e.g., for cross validation), you should call the function fit, which does not mutate its arguments.

You can even start with an easy-to-optimize loss function, run fit!, change the loss function (glrm.losses = newlosses), and keep going from your warm start by calling fit! again to fit the new loss functions.


If you don't have a good guess at a warm start for your model, you might try one of the initializations provided in LowRankModels.

  • init_svd! initializes the model as the truncated SVD of the matrix of observed entries, with unobserved entries filled in with zeros. This initialization is known to result in provably good solutions for a number of "PCA-like" problems. See our paper for details.
  • init_kmeanspp! initializes the model using a modification of the kmeans++ algorithm for data sets with missing entries; see our paper for details. This works well for fitting clustering models, and may help in achieving better fits for nonnegative matrix factorization problems as well.
  • init_nndsvd! initializes the model using a modification of the NNDSVD algorithm as implemented by the NMF package. This modification handles data sets with missing entries by replacing missing entries with zeros. Optionally, by setting the argument max_iters=n with n>0, it will iteratively replace missing entries by their values as imputed by the NNDSVD, and call NNDSVD again on the new matrix. (This procedure is similar to the soft impute method of Mazumder, Hastie and Tibshirani for matrix completion.)


As mentioned earlier, LowRankModels uses alternating proximal gradient descent to derive estimates of X and Y. This can be done by two slightly different procedures: (A) compute the full reconstruction, X' * Y, to compute the gradient and objective function; (B) only compute the model estimate for entries of A that are observed. The first method is likely preferred when there are few missing entries for A because of hardware level optimizations (e.g. chunking the operations so they just fit in various caches). The second method is likely preferred when there are many missing entries of A.

To fit with the first (dense) method:

fit!(glrm, ProxGradParams(); kwargs...)

To fit with the second (sparse) method:

fit!(glrm, SparseProxGradParams(); kwargs...)

The first method is used by default if glrm.A is a standard matrix/array. The second method is used by default if glrm.A is a SparseMatrixCSC.

ProxGradParams() and SparseProxGradParams() run these respective methods with the default parameters:

  • stepsize: The step size controls the speed of convergence. Small step sizes will slow convergence, while large ones will cause divergence. stepsize should be of order 1.
  • abs_tol: The algorithm stops when the decrease in the objective per iteration is less than abs_tol*length(obs).
  • rel_tol: The algorithm stops when the decrease in the objective per iteration is less than rel_tol.
  • max_iter: The algorithm also stops if maximum number of rounds max_iter has been reached.
  • min_stepsize: The algorithm also stops if stepsize decreases below this limit.
  • inner_iter: specifies how many proximal gradient steps to take on X before moving on to Y (and vice versa).

The default parameters are: ProxGradParams(stepsize=1.0;max_iter=100,inner_iter=1,abs_tol=0.00001,rel_tol=0.0001,min_stepsize=0.01*stepsize)


ch gives the convergence history so that the success of the optimization can be monitored; ch.objective stores the objective values, and ch.times captures the times these objective values were achieved. Try plotting this to see if you just need to increase max_iter to converge to a better model.


After fitting a GLRM, you can use it to impute values of A in four different ways:

  • impute(glrm) gives the maximum likelihood estimates for each entry
  • impute_missing(glrm) imputes missing entries and leaves observed entries unchanged
  • sample(glrm) gives a draw from the posterior distribution, conditioned on the fit values of X and Y, for each entry
  • sample_missing(glrm) samples missing entries and leaves observed entries unchanged

Cross validation

A number of useful functions are available to help you check whether a given low rank model overfits to the test data set. These functions should help you choose adequate regularization for your model.

Cross validation

cross_validate(glrm::GLRM, nfolds=5, params=Params(); verbose=false, use_folds=None, error_fn=objective, init=None): performs n-fold cross validation and returns average loss among all folds. More specifically, splits observations in glrm into nfolds groups, and builds new GLRMs, each with one group of observations left out. Fits each GLRM to the training set (the observations revealed to each GLRM) and returns the average loss on the test sets (the observations left out of each GLRM).

Optional arguments:

  • use_folds: build use_folds new GLRMs instead of n_folds new GLRMs, each with 1/nfolds of the entries left out. (use_folds defaults to nfolds.)
  • error_fn: use a custom error function to evaluate the fit, rather than the objective. For example, one might use the imputation error by setting error_fn = error_metric.
  • init: initialize the fit using a particular procedure. For example, consider init=init_svd!. See Initialization for more options.

cv_by_iter(glrm::GLRM, holdout_proportion=.1, params=Params(1,1,.01,.01), niters=30; verbose=true): computes the test error and train error of the GLRM as it is trained. Splits the observations into a training set (1-holdout_proportion of the original observations) and a test set (holdout_proportion of the original observations). Performs params.maxiter iterations of the fitting algorithm on the training set niters times, and returns the test and train error as a function of iteration.

Regularization paths

  • regularization_path(glrm::GLRM; params=Params(), reg_params=exp10.(range(2,stop=-2,length=5)), holdout_proportion=.1, verbose=true, ch::ConvergenceHistory=ConvergenceHistory("reg_path")): computes the train and test error for GLRMs varying the scaling of the regularization through any scaling factor in the array reg_params.


  • get_train_and_test(obs, m, n, holdout_proportion=.1): splits observations obs into a train and test set. m and n must be at least as large as the maximal value of the first or second elements of the tuples in observations, respectively. Returns observed_features and observed_examples for both train and test sets.


This library implements the ScikitLearn.jl interface. These models are available: SkGLRM, PCA, QPCA, NNMF, KMeans, RPCA. See their docstrings for more information (e.g. ?QPCA). All models support the ScikitLearnBase.fit! and ScikitLearnBase.transform interface. Examples:

## Apply PCA to the iris dataset
using LowRankModels
import ScikitLearnBase
using RDatasets    # may require Pkg.add("RDatasets")

A = convert(Matrix, dataset("datasets", "iris")[[:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]])
ScikitLearnBase.fit_transform!(PCA(k=3, max_iter=500), A)
## Fit K-Means to a fake dataset of two Gaussians
using LowRankModels
import ScikitLearnBase

# Generate two disjoint Gaussians with 100 and 50 points
gaussian1 = randn(100, 2) + 5
gaussian2 = randn(50, 2) - 10
# Merge them into a single dataset
A = vcat(gaussian1, gaussian2)

model = ScikitLearnBase.fit!(LowRankModels.KMeans(), A)
# Count how many points are assigned to each Gaussians (should be 100 and 50)
Set(sum(ScikitLearnBase.transform(model, A), 1))

See also this notebook demonstrating K-Means.

These models can be used inside a ScikitLearn pipeline, and every hyperparameter can be tuned with GridSearchCV.

Citing this package

If you use LowRankModels for published work, we encourage you to cite the software.

Use the following BibTeX citation:

    title = {Generalized Low Rank Models},
    author ={Madeleine Udell and Horn, Corinne and Zadeh, Reza and Boyd, Stephen},
    doi = {10.1561/2200000055},
    year = {2016},
    archivePrefix = "arXiv",
    eprint = {1410.0342},
    primaryClass = "stat-ml",
    journal = {Foundations and Trends in Machine Learning},
    number = {1},
    volume = {9},
    issn = {1935-8237},
    url = {http://dx.doi.org/10.1561/2200000055},

Download Details:

Author: Madeleineudell
Source Code: https://github.com/madeleineudell/LowRankModels.jl 
License: View license

#julia #models 

A Julia Package for Modeling & Fitting Generalized Low Rank Models
Monty  Boehm

Monty Boehm


LLaMA: Inference Code for LLaMA Models


This repository is intended as a minimal, hackable and readable example to load LLaMA models and run inference. In order to download the checkpoints and tokenizer, fill this google form


In a conda env with pytorch / cuda available, run

pip install -r requirements.txt

Then in this repository

pip install -e .


Once your request is approved, you will receive links to download the tokenizer and model files. Edit the download.sh script with the signed url provided in the email to download the model weights and tokenizer.


The provided example.py can be run on a single or multi-gpu node with torchrun and will output completions for two pre-defined prompts. Using TARGET_FOLDER as defined in download.sh:

torchrun --nproc_per_node MP example.py --ckpt_dir $TARGET_FOLDER/model_size --tokenizer_path $TARGET_FOLDER/tokenizer.model

Different models require different MP values:


Model Card


Download Details:

Author: Facebookresearch
Source Code: https://github.com/facebookresearch/llama 
License: GPL-3.0 license

#python #code #llama #models 

LLaMA: Inference Code for LLaMA Models
Royce  Reinger

Royce Reinger


Text: Models, Data Loaders and Abstractions for Language Processing


Models, data loaders and abstractions for language processing, powered by PyTorch


We recommend Anaconda as a Python package management system. Please refer to pytorch.org for the details of PyTorch installation. The following are the corresponding torchtext versions and supported Python versions.

Version Compatibility

PyTorch versiontorchtext versionSupported Python version
nightly buildmain>=3.8, <=3.11>=3.7, <=3.10>=3.7, <=3.10>=3.6, <=3.9>=3.6, <=3.9>=3.6, <=3.9
1.90.10>=3.6, <=3.9>=3.6, <=3.9
1.80.9>=3.6, <=3.9>=3.6, <=3.9
1.70.8>=3.6, <=3.8
1.60.7>=3.6, <=3.8
1.50.6>=3.5, <=3.8, >=3.5, <=3.8
0.4 and below0.2.32.7, >=3.5, <=3.8

Using conda:

conda install -c pytorch torchtext

Using pip:

pip install torchtext

Optional requirements

If you want to use English tokenizer from SpaCy, you need to install SpaCy and download its English model:

pip install spacy
python -m spacy download en_core_web_sm

Alternatively, you might want to use the Moses tokenizer port in SacreMoses (split from NLTK). You have to install SacreMoses:

pip install sacremoses

For torchtext 0.5 and below, sentencepiece:

conda install -c powerai sentencepiece

Building from source

To build torchtext from source, you need git, CMake and C++11 compiler such as g++.:

git clone https://github.com/pytorch/text torchtext
cd torchtext
git submodule update --init --recursive

# Linux
python setup.py clean install

CC=clang CXX=clang++ python setup.py clean install

# or ``python setup.py develop`` if you are making modifications.


When building from source, make sure that you have the same C++ compiler as the one used to build PyTorch. A simple way is to build PyTorch from source and use the same environment to build torchtext. If you are using the nightly build of PyTorch, checkout the environment it was built with conda (here) and pip (here).

Additionally, datasets in torchtext are implemented using the torchdata library. Please take a look at the installation instructions to download the latest nightlies or install from source.


Find the documentation here.


The datasets module currently contains:

  • Language modeling: WikiText2, WikiText103, PennTreebank, EnWik9
  • Machine translation: IWSLT2016, IWSLT2017, Multi30k
  • Sequence tagging (e.g. POS/NER): UDPOS, CoNLL2000Chunking
  • Question answering: SQuAD1, SQuAD2
  • Text classification: SST2, AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB
  • Model pre-training: CC-100


The library currently consist of following pre-trained models:


The transforms module currently support following scriptable tokenizers:


To get started with torchtext, users may refer to the following tutorial available on PyTorch website.

Disclaimer on Datasets

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license.

If you're a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!

This repository consists of:

Download Details:

Author: Pytorch
Source Code: https://github.com/pytorch/text 
License: BSD-3-Clause license

#machinelearning #python #nlp #deeplearning #models #pytorch 

Text: Models, Data Loaders and Abstractions for Language Processing
Oral  Brekke

Oral Brekke


MM-cot: Multimodal Chain-of-Thought Reasoning in Language Models

Multimodal Chain-of-Thought Reasoning in Language Models

"Imagine learning a textbook without figures or tables."

Multimodal-CoT incorporates vision features in a decoupled training framework. The framework consists of two training stages: (i) rationale generation and (ii) answer inference. Both stages share the same model architecture but differ in the input and output.


Install all required python dependencies:

pip install -r requirements.txt


Download the dataset from the following repository:


Download the extracted vision features from vision_features and unzip the files under vision_features



# rationale generation
CUDA_VISIBLE_DEVICES=0,1 python main.py \
    --model allenai/unifiedqa-t5-base \
    --user_msg rationale --img_type detr \
    --bs 8 --eval_bs 4 --eval_acc 10 --output_len 512 \
    --final_eval --prompt_format QCM-LE

# answer inference
CUDA_VISIBLE_DEVICES=0,1 python main.py \
    --model allenai/unifiedqa-t5-base \
    --user_msg answer --img_type detr \
    --bs 8 --eval_bs 4 --eval_acc 10 --output_len 64 \
    --final_eval --prompt_format QCMG-A \
    --eval_le experiments/rationale_allenai-unifiedqa-t5-base_detr_QCM-LE_lr5e-05_bs16_op512_ep20/predictions_ans_eval.json \
    --test_le experiments/rationale_allenai-unifiedqa-t5-base_detr_QCM-LE_lr5e-05_bs16_op512_ep20/predictions_ans_test.json


Our trained models are available at models. To use our trained models, please put the them under the models folder.

# rationale generation
CUDA_VISIBLE_DEVICES=0,1 python main.py \
    --model allenai/unifiedqa-t5-base \
    --user_msg rationale --img_type detr \
    --bs 8 --eval_bs 4 --eval_acc 10 --output_len 512 \
    --final_eval --prompt_format QCM-LE \
    --evaluate_dir models/MM-CoT-UnifiedQA-base-Rationale

# answer inference
CUDA_VISIBLE_DEVICES=0,1 python main.py \
    --model allenai/unifiedqa-t5-base \
    --user_msg answer --img_type detr \
    --bs 8 --eval_bs 4 --eval_acc 10 --output_len 64 \
    --final_eval --prompt_format QCMG-A \
    --eval_le models/rationale/predictions_ans_eval.json \
    --test_le models/rationale/predictions_ans_test.json \
    --evaluate_dir models/MM-CoT-UnifiedQA-base-Answer

Citing MM-CoT

  title={Multimodal Chain-of-Thought Reasoning in Language Models},
  author={Zhang, Zhuosheng and Zhang, Aston and Li, Mu and Zhao, Hai and Karypis, George and Smola, Alex},
  journal={arXiv preprint arXiv:2302.00923},


Part of our codes are adapted from ScienceQA and Transformers.

We thank Pan Lu for providing parameter size for ScienceQA baselines.

Download Details:

Author: Amazon-science
Source Code: https://github.com/amazon-science/mm-cot 
License: Apache-2.0 license

#python #language #models 

MM-cot: Multimodal Chain-of-Thought Reasoning in Language Models
Royce  Reinger

Royce Reinger


HMMLearn: Hidden Markov Models in Python, with Scikit-learn Like API


hmmlearn is a set of algorithms for unsupervised learning and inference of Hidden Markov Models. For supervised learning learning of HMMs and similar models see seqlearn.

Note: This package is under limited-maintenance mode.


The required dependencies to use hmmlearn are

  • Python >= 3.6
  • NumPy >= 1.10
  • scikit-learn >= 0.16

You also need Matplotlib >= 1.1.1 to run the examples and pytest >= 2.6.0 to run the tests.


Requires a C compiler and Python headers.

To install from PyPI:

pip install --upgrade --user hmmlearn

To install from the repo:

pip install --user git+https://github.com/hmmlearn/hmmlearn

Important links

Download Details:

Author: hmmlearn
Source Code: https://github.com/hmmlearn/hmmlearn 
License: BSD-3-Clause license

#machinelearning #python #markov #models 

HMMLearn: Hidden Markov Models in Python, with Scikit-learn Like API
Sheldon  Grant

Sheldon Grant


ControlNet: Let Us Control Diffusion Models


Official implementation of Adding Conditional Control to Text-to-Image Diffusion Models.

ControlNet is a neural network structure to control diffusion models by adding extra conditions.


It copys the weights of neural network blocks into a "locked" copy and a "trainable" copy.

The "trainable" one learns your condition. The "locked" one preserves your model.

Thanks to this, training with small dataset of image pairs will not destroy the production-ready diffusion models.

The "zero convolution" is 1×1 convolution with both weight and bias initialized as zeros.

Before training, all zero convolutions output zeros, and ControlNet will not cause any distortion.

No layer is trained from scratch. You are still fine-tuning. Your original model is safe.

This allows training on small-scale or even personal devices.

This is also friendly to merge/replacement/offsetting of models/weights/blocks/layers.


Q: But wait, if the weight of a conv layer is zero, the gradient will also be zero, and the network will not learn anything. Why "zero convolution" works?

A: This is not true. See an explanation here.

Stable Diffusion + ControlNet

By repeating the above simple structure 14 times, we can control stable diffusion in this way:


Note that the way we connect layers is computational efficient. The original SD encoder does not need to store gradients (the locked original SD Encoder Block 1234 and Middle). The required GPU memory is not much larger than original SD, although many layers are added. Great!

Production-Ready Pretrained Models

First create a new conda environment

conda env create -f environment.yaml
conda activate control

All models and detectors can be downloaded from our Hugging Face page. Make sure that SD models are put in "ControlNet/models" and detectors are put in "ControlNet/annotator/ckpts". Make sure that you download all necessary pretrained weights and detector models from that Hugging Face page, including HED edge detection model, Midas depth estimation model, Openpose, and so on.

We provide 9 Gradio apps with these models.

All test images can be found at the folder "test_imgs".


2023/02/12 - Now you can play with any community model by Transferring the ControlNet.

2023/02/11 - Low VRAM mode is added. Please use this mode if you are using 8GB GPU(s) or if you want larger batch size.

ControlNet with Canny Edge

Stable Diffusion 1.5 + ControlNet (using simple Canny edge detection)

python gradio_canny2image.py

The Gradio app also allows you to change the Canny edge thresholds. Just try it for more details.

Prompt: "bird" p

Prompt: "cute dog" p

ControlNet with M-LSD Lines

Stable Diffusion 1.5 + ControlNet (using simple M-LSD straight line detection)

python gradio_hough2image.py

The Gradio app also allows you to change the M-LSD thresholds. Just try it for more details.

Prompt: "room" p

Prompt: "building" p

ControlNet with HED Boundary

Stable Diffusion 1.5 + ControlNet (using soft HED Boundary)

python gradio_hed2image.py

The soft HED Boundary will preserve many details in input images, making this app suitable for recoloring and stylizing. Just try it for more details.

Prompt: "oil painting of handsome old man, masterpiece" p

Prompt: "Cyberpunk robot" p

ControlNet with User Scribbles

Stable Diffusion 1.5 + ControlNet (using Scribbles)

python gradio_scribble2image.py

Note that the UI is based on Gradio, and Gradio is somewhat difficult to customize. Right now you need to draw scribbles outside the UI (using your favorite drawing software, for example, MS Paint) and then import the scribble image to Gradio.

Prompt: "turtle" p

Prompt: "hot air balloon" p

Interactive Interface

We actually provide an interactive interface

python gradio_scribble2image_interactive.py

However, because gradio is very buggy and difficult to customize, right now, user need to first set canvas width and heights and then click "Open drawing canvas" to get a drawing area. Please do not upload image to that drawing canvas. Also, the drawing area is very small; it should be bigger. But I failed to find out how to make it larger. Again, gradio is really buggy.

The below dog sketch is drawn by me. Perhaps we should draw a better dog for showcase.

Prompt: "dog in a room" p

ControlNet with Fake Scribbles

Stable Diffusion 1.5 + ControlNet (using fake scribbles)

python gradio_fake_scribble2image.py

Sometimes we are lazy, and we do not want to draw scribbles. This script use the exactly same scribble-based model but use a simple algorithm to synthesize scribbles from input images.

Prompt: "bag" p

Prompt: "shose" (Note that "shose" is a typo; it should be "shoes". But it still seems to work.) p

ControlNet with Human Pose

Stable Diffusion 1.5 + ControlNet (using human pose)

python gradio_pose2image.py

Apparently, this model deserves a better UI to directly manipulate pose skeleton. However, again, Gradio is somewhat difficult to customize. Right now you need to input an image and then the Openpose will detect the pose for you.

Prompt: "Chief in the kitchen" p

Prompt: "An astronaut on the moon" p

ControlNet with Semantic Segmentation

Stable Diffusion 1.5 + ControlNet (using semantic segmentation)

python gradio_seg2image.py

This model use ADE20K's segmentation protocol. Again, this model deserves a better UI to directly draw the segmentations. However, again, Gradio is somewhat difficult to customize. Right now you need to input an image and then a model called Uniformer will detect the segmentations for you. Just try it for more details.

Prompt: "House" p

Prompt: "River" p

ControlNet with Depth

Stable Diffusion 1.5 + ControlNet (using depth map)

python gradio_depth2image.py

Great! Now SD 1.5 also have a depth control. FINALLY. So many possibilities (considering SD1.5 has much more community models than SD2).

Note that different from Stability's model, the ControlNet receive the full 512×512 depth map, rather than 64×64 depth. Note that Stability's SD2 depth model use 64*64 depth maps. This means that the ControlNet will preserve more details in the depth map.

This is always a strength because if users do not want to preserve more details, they can simply use another SD to post-process an i2i. But if they want to preserve more details, ControlNet becomes their only choice. Again, SD2 uses 64×64 depth, we use 512×512.

Prompt: "Stormtrooper's lecture" p

ControlNet with Normal Map

Stable Diffusion 1.5 + ControlNet (using normal map)

python gradio_normal2image.py

This model use normal map. Rightnow in the APP, the normal is computed from the midas depth map and a user threshold (to determine how many area is background with identity normal face to viewer, tune the "Normal background threshold" in the gradio app to get a feeling).

Prompt: "Cute toy" p

Prompt: "Plaster statue of Abraham Lincoln" p

Compared to depth model, this model seems to be a bit better at preserving the geometry. This is intuitive: minor details are not salient in depth maps, but are salient in normal maps. Below is the depth result with same inputs. You can see that the hairstyle of the man in the input image is modified by depth model, but preserved by the normal model.

Prompt: "Plaster statue of Abraham Lincoln" p

ControlNet with Anime Line Drawing

We also trained a relatively simple ControlNet for anime line drawings. This tool may be useful for artistic creations. (Although the image details in the results is a bit modified, since it still diffuse latent images.)

This model is not available right now. We need to evaluate the potential risks before releasing this model. Nevertheless, you may be interested in transferring the ControlNet to any community model.


Annotate Your Own Data

We provide simple python scripts to process images.

See a gradio example here.

Train with Your Own Data

Training a ControlNet is as easy as (or even easier than) training a simple pix2pix.

See the steps here.


author = "Lvmin Zhang and Maneesh Agrawala",
title = "Adding Conditional Control to Text-to-Image Diffusion Models",
month = "Feb",
year = "2022"

Download the paper here.

Download Details:

Author: lllyasviel
Source Code: https://github.com/lllyasviel/ControlNet 
License: Apache-2.0 license

#python #models #text #image 

ControlNet: Let Us Control Diffusion Models

Hard Prompts Made Easy: Discrete Prompt Tuning for Language Models

Hard Prompts Made Easy: Discrete Prompt Tuning for Language Models

This code is the official implementation of Hard Prompts Made Easy.

If you have any questions, feel free to email Yuxin (ywen@umd.edu).


From a given image, we first optimize a hard prompt using the PEZ algorithm and CLIP encoders. Then, we take the optimized prompts and feed them into Stable Diffusion to generate new images. The name PEZ (hard Prompts made EaZy) was inspired from the PEZ candy dispenser.

Try out

You can try out our demos on Colab Open In Colab or Hugging Face Space Generic badge.

More Jupyter notebook examples can be found in the examples/ folder.

We recommand to run more shots to obtain more desirable prompts.


  • PyTorch => 1.13.0
  • transformers >= 4.23.1
  • diffusers >= 0.11.1
  • sentence-transformers >= 2.2.2
  • ftfy >= 6.1.1
  • mediapy >= 1.1.2


Ensure you have python 3 installed.

Create a virtual environment, activate it, and install dependencies:

$ python -m venv .venv
$ source .venv/bin/activate
$ pip install -r requirements.txt


A script is provided to perform prompt inversion (finding a prompt from an image or set of images). For examples of other usages, see the examples folder.

python run.py image.png

You can pass multiple images to optimize a prompt across all images.


Config can be loaded from a JSON file. A sample config is provided at ./sample-config.json.

Config has the following parameters:

  • prompt_len: the number of tokens in the optimized prompt. 16 empirically results in the most generalizable performance. more is not necessarily better.
  • iter: the total number of iterations to run for.
  • lr: the learning weight for the optimizer.
  • weight_decay: the weight decay for the optimizer.
  • prompt_bs: number of initializations.
  • batch_size: number of target images/prompts used for each iteration.
  • clip_model: the name of the CLiP model for use with . "ViT-H-14" is the model used in SD 2.0 and Midjourney. "ViT-L-14" is the model used in SD 1.5. This should ideally match your target generator.
  • clip_pretrain: the name of the pretrained model for open_clip. For "ViT-H-14" use "laion2b_s32b_b79k". For "ViT-L-14" use "openai".
  • print_step: if not null, how often (in steps) to print a line giving current status.
  • print_new_best: whether to print out new best prompts whenver found. will be quite noisy initially.

Language Model Prompt Experiments

You may check the code in prompt_lm/ folder.

Download Details:

Author: YuxinWenRick
Source Code: https://github.com/YuxinWenRick/hard-prompts-made-easy 
License: MIT license

#python #language #models 

Hard Prompts Made Easy: Discrete Prompt Tuning for Language Models
Royce  Reinger

Royce Reinger


Uplift Modeling & Causal inference with Machine Learning Algorithms

Causal ML: A Python Package for Uplift Modeling and Causal Inference with ML

Causal ML is a Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent research [1]. It provides a standard interface that allows user to estimate the Conditional Average Treatment Effect (CATE) or Individual Treatment Effect (ITE) from experimental or observational data. Essentially, it estimates the causal impact of intervention T on outcome Y for users with observed features X, without strong assumptions on the model form. Typical use cases include

Campaign targeting optimization: An important lever to increase ROI in an advertising campaign is to target the ad to the set of customers who will have a favorable response in a given KPI such as engagement or sales. CATE identifies these customers by estimating the effect of the KPI from ad exposure at the individual level from A/B experiment or historical observational data.

Personalized engagement: A company has multiple options to interact with its customers such as different product choices in up-sell or messaging channels for communications. One can use CATE to estimate the heterogeneous treatment effect for each customer and treatment option combination for an optimal personalized recommendation system.

The package currently supports the following methods

  • Tree-based algorithms
    • Uplift tree/random forests on KL divergence, Euclidean Distance, and Chi-Square [2]
    • Uplift tree/random forests on Contextual Treatment Selection [3]
    • Causal Tree [4] - Work-in-progress
  • Meta-learner algorithms
    • S-learner [5]
    • T-learner [5]
    • X-learner [5]
    • R-learner [6]
    • Doubly Robust (DR) learner [7]
    • TMLE learner [8]
  • Instrumental variables algorithms
    • 2-Stage Least Squares (2SLS)
    • Doubly Robust (DR) IV [9]
  • Neural-network-based algorithms


Installation with conda is recommended. conda environment files for Python 3.6, 3.7, 3.8 and 3.9 are available in the repository. To use models under the inference.tf module (e.g. DragonNet), additional dependency of tensorflow is required. For detailed instructions, see below.

Install using conda:

Install from conda-forge

Directly install from the conda-forge channel using conda.

$ conda install -c conda-forge causalml

Install with the conda virtual environment

This will create a new conda virtual environment named causalml-[tf-]py3x, where x is in [6, 7, 8, 9]. e.g. causalml-py37 or causalml-tf-py38. If you want to change the name of the environment, update the relevant YAML file in envs/

$ git clone https://github.com/uber/causalml.git
$ cd causalml/envs/
$ conda env create -f environment-py38.yml    # for the virtual environment with Python 3.8 and CausalML
$ conda activate causalml-py38

Install causalml with tensorflow

$ git clone https://github.com/uber/causalml.git
$ cd causalml/envs/
$ conda env create -f environment-tf-py38.yml    # for the virtual environment with Python 3.8 and CausalML
$ conda activate causalml-tf-py38
(causalml-tf-py38) pip install -U numpy            # this step is necessary to fix [#338](https://github.com/uber/causalml/issues/338)

Install using pip:

$ git clone https://github.com/uber/causalml.git
$ cd causalml
$ pip install -r requirements.txt
$ pip install causalml

Install causalml with tensorflow

$ git clone https://github.com/uber/causalml.git
$ cd causalml
$ pip install -r requirements-tf.txt
$ pip install causalml[tf]
$ pip install -U numpy                            # this step is necessary to fix [#338](https://github.com/uber/causalml/issues/338)

Install from source:

$ git clone https://github.com/uber/causalml.git
$ cd causalml
$ pip install -r requirements.txt
$ python setup.py build_ext --inplace
$ python setup.py install

Quick Start

Average Treatment Effect Estimation with S, T, X, and R Learners

from causalml.inference.meta import LRSRegressor
from causalml.inference.meta import XGBTRegressor, MLPTRegressor
from causalml.inference.meta import BaseXRegressor
from causalml.inference.meta import BaseRRegressor
from xgboost import XGBRegressor
from causalml.dataset import synthetic_data

y, X, treatment, _, _, e = synthetic_data(mode=1, n=1000, p=5, sigma=1.0)

lr = LRSRegressor()
te, lb, ub = lr.estimate_ate(X, treatment, y)
print('Average Treatment Effect (Linear Regression): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))

xg = XGBTRegressor(random_state=42)
te, lb, ub = xg.estimate_ate(X, treatment, y)
print('Average Treatment Effect (XGBoost): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))

nn = MLPTRegressor(hidden_layer_sizes=(10, 10),
te, lb, ub = nn.estimate_ate(X, treatment, y)
print('Average Treatment Effect (Neural Network (MLP)): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))

xl = BaseXRegressor(learner=XGBRegressor(random_state=42))
te, lb, ub = xl.estimate_ate(X, treatment, y, e)
print('Average Treatment Effect (BaseXRegressor using XGBoost): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))

rl = BaseRRegressor(learner=XGBRegressor(random_state=42))
te, lb, ub =  rl.estimate_ate(X=X, p=e, treatment=treatment, y=y)
print('Average Treatment Effect (BaseRRegressor using XGBoost): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))

See the Meta-learner example notebook for details.

Interpretable Causal ML

Causal ML provides methods to interpret the treatment effect models trained as follows:

Meta Learner Feature Importances

from causalml.inference.meta import BaseSRegressor, BaseTRegressor, BaseXRegressor, BaseRRegressor
from causalml.dataset.regression import synthetic_data

# Load synthetic data
y, X, treatment, tau, b, e = synthetic_data(mode=1, n=10000, p=25, sigma=0.5)
w_multi = np.array(['treatment_A' if x==1 else 'control' for x in treatment]) # customize treatment/control names

slearner = BaseSRegressor(LGBMRegressor(), control_name='control')
slearner.estimate_ate(X, w_multi, y)
slearner_tau = slearner.fit_predict(X, w_multi, y)

model_tau_feature = RandomForestRegressor()  # specify model for model_tau_feature

slearner.get_importance(X=X, tau=slearner_tau, model_tau_feature=model_tau_feature,
                        normalize=True, method='auto', features=feature_names)

# Using the feature_importances_ method in the base learner (LGBMRegressor() in this example)
slearner.plot_importance(X=X, tau=slearner_tau, normalize=True, method='auto')

# Using eli5's PermutationImportance
slearner.plot_importance(X=X, tau=slearner_tau, normalize=True, method='permutation')

# Using SHAP
shap_slearner = slearner.get_shap_values(X=X, tau=slearner_tau)

# Plot shap values without specifying shap_dict
slearner.plot_shap_values(X=X, tau=slearner_tau)

# Plot shap values WITH specifying shap_dict
slearner.plot_shap_values(X=X, shap_dict=shap_slearner)

# interaction_idx set to 'auto' (searches for feature with greatest approximate interaction)

See the feature interpretations example notebook for details.

Uplift Tree Visualization

from IPython.display import Image
from causalml.inference.tree import UpliftTreeClassifier, UpliftRandomForestClassifier
from causalml.inference.tree import uplift_tree_string, uplift_tree_plot

uplift_model = UpliftTreeClassifier(max_depth=5, min_samples_leaf=200, min_samples_treatment=50,
                                    n_reg=100, evaluationFunction='KL', control_name='control')


graph = uplift_tree_plot(uplift_model.fitted_uplift_tree, features)

See the Uplift Tree visualization example notebook for details.


We welcome community contributors to the project. Before you start, please read our code of conduct and check out contributing guidelines first.


We document versions and changes in our changelog.


This project is licensed under the Apache 2.0 License - see the LICENSE file for details.



Conference Talks and Publications by CausalML Team


To cite CausalML in publications, you can refer to the following sources:

Whitepaper: CausalML: Python Package for Causal Machine Learning


@misc{chen2020causalml, title={CausalML: Python Package for Causal Machine Learning}, author={Huigang Chen and Totte Harinen and Jeong-Yoon Lee and Mike Yung and Zhenyu Zhao}, year={2020}, eprint={2002.11631}, archivePrefix={arXiv}, primaryClass={cs.CY} }


  1. Chen, Huigang, Totte Harinen, Jeong-Yoon Lee, Mike Yung, and Zhenyu Zhao. "Causalml: Python package for causal machine learning." arXiv preprint arXiv:2002.11631 (2020).
  2. Radcliffe, Nicholas J., and Patrick D. Surry. "Real-world uplift modelling with significance-based uplift trees." White Paper TR-2011-1, Stochastic Solutions (2011): 1-33.
  3. Zhao, Yan, Xiao Fang, and David Simchi-Levi. "Uplift modeling with multiple treatments and general response types." Proceedings of the 2017 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2017.
  4. Athey, Susan, and Guido Imbens. "Recursive partitioning for heterogeneous causal effects." Proceedings of the National Academy of Sciences 113.27 (2016): 7353-7360.
  5. Künzel, Sören R., et al. "Metalearners for estimating heterogeneous treatment effects using machine learning." Proceedings of the national academy of sciences 116.10 (2019): 4156-4165.
  6. Nie, Xinkun, and Stefan Wager. "Quasi-oracle estimation of heterogeneous treatment effects." arXiv preprint arXiv:1712.04912 (2017).
  7. Bang, Heejung, and James M. Robins. "Doubly robust estimation in missing data and causal inference models." Biometrics 61.4 (2005): 962-973.
  8. Van Der Laan, Mark J., and Daniel Rubin. "Targeted maximum likelihood learning." The international journal of biostatistics 2.1 (2006).
  9. Kennedy, Edward H. "Optimal doubly robust estimation of heterogeneous causal effects." arXiv preprint arXiv:2004.14497 (2020).
  10. Louizos, Christos, et al. "Causal effect inference with deep latent-variable models." arXiv preprint arXiv:1705.08821 (2017).
  11. Shi, Claudia, David M. Blei, and Victor Veitch. "Adapting neural networks for the estimation of treatment effects." 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), 2019.
  12. Zhao, Zhenyu, Yumin Zhang, Totte Harinen, and Mike Yung. "Feature Selection Methods for Uplift Modeling." arXiv preprint arXiv:2005.03447 (2020).
  13. Zhao, Zhenyu, and Totte Harinen. "Uplift modeling for multiple treatments with cost optimization." In 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 422-431. IEEE, 2019.

Related projects

  • uplift: uplift models in R
  • grf: generalized random forests that include heterogeneous treatment effect estimation in R
  • rlearner: A R package that implements R-Learner
  • DoWhy: Causal inference in Python based on Judea Pearl's do-calculus
  • EconML: A Python package that implements heterogeneous treatment effect estimators from econometrics and machine learning methods


This project is stable and being incubated for long-term support. It may contain new experimental code, for which APIs are subject to change.

Download Details:

Author: uber
Source Code: https://github.com/uber/causalml 
License: View license

#machinelearning #models 

Uplift Modeling & Causal inference with Machine Learning Algorithms

Best 10 Mental Models Developers Can Use to Get Unstuck

Start experimenting with ten mental models you can use to get unstuck, look at difficult problems from new angles, verify your assumptions, and understand systems more deeply.

How do you quickly recover when you’re stuck in a rut? 

Naturally, you could sit down and brainstorm solutions. Unfortunately, it may take time for inspiration to come when solving a complicated challenge with your code.

What can we do to think better and solve problems faster?

Whether you want to identify the root cause of a problem or understand the ideal way to prioritize, mental models could offer valuable insights. 

What Is a Mental Model?

A mental model is a way for us to understand the world. Mental models are frameworks that help us understand how our minds work and why we think the way that we do. We can also use mental models to rationalize concepts.

Mental models are not always right. They are a simplified way of thinking that can help us understand things better. We can use these insights to take action.

Mental models are powerful because they’re flexible. Like metaphors, mental models let us understand things that we don’t know by comparing them to what we already know.

For example, game theory is a branch of mathematics focused on analyzing the actions and counteractions of individuals or groups. It’s a rigorous form of mental modeling that allows us to explore concepts such as decision-making, strategy, and even reciprocal relationships with others.

As human beings, it’s easy for us to underestimate the power of these tools. We often forget how much thinking goes into our daily routines. In fact, mental models can help us examine how we work and why we think the way we do.

How Do Mental Models Help Developers Think Better?

Our brains’ mental models determine the quality of our thoughts. Understanding which mental model best fits a situation can help you work and think smarter. 

For developers, mental models can benefit your productivity and efficiency. It could enable you to understand the problem, correct high-level issues in the code, and avoid potential bugs.

Consider this scenario.

You’re in the zone and writing code at a fast pace when something goes wrong. You check the source code, iterate potential solutions, invoke a debugger, or analyze stack traces.

When done right, you can find the root cause of the issue. But this can take a lot of time and effort.

Now, consider an alternative scenario.

Let’s say you encountered a problem with a code. 

Instead of using a variety of random strategies, you can analyze the mental model of a system. Think about the conditions that led to the bug and find areas where the code isn’t aligned with the mental model. 

A developer could identify the solution even without a Google search with this approach. 

Now, what are the mental models that can help you get unstuck? Here are some notable mental models for developers that can help you get the job done. 

Mental Model 1: Rubber Ducking

Rubber ducking is a shorter term for “rubber duck debugging.” 

The concept originated from a tale wherein a programmer described their code line-by-line to a rubber duck. 

While its original inspiration seems odd, the rationale is simple.

Explaining your code to another individual or an inanimate object lets you break down the problem and determine where you got stuck. You’re compelled to think outside the box.

Eventually, you’ll arrive at the point where you went wrong with your code.

Just to clarify, you don’t need to talk to an actual rubber duck or a toy plushie to get this done. You can also gain valuable insights by rubber ducking with a colleague or a friend. As you attempt to explain your code in-depth, they might brainstorm potential solutions.

Model 2: Circle of Competence

The Circle of Competence is about differentiating “what you know” from “what you don’t know.

To put it simply, this mental model helps you remain aware of your areas of expertise. At the same time, you can accept your weaknesses or sectors where you are at a disadvantage. 

No matter how long you’ve worked as a developer, you won’t be able to know everything. 

An example would be a gaming developer moving on as a developer in the finance industry. 

You need to be proficient in C# and C++, user interface design, and program terrains or AI for non-playable characters as a game developer. Some of these skills may be useful in your current role, but you later discover you need to understand bank laws or manage security services too. 

With the Circle of Competence, developers can predict the challenges they may encounter when starting a new project or moving on to a new job. Once you know what’s outside the circle, you can seek help or contact experts that could help you conquer the areas where you’re not confident. 

Model 3: Feedback Loops

A feedback loop happens when an output of a system re-enters the system as inputs. 

It usually occurs in the “plan-do-check-act (PDCA) cycle,” an iterative process for improving products and services. 

This process involves four steps:

  • Plan: Determining what needs to be done
  • Do: Following the initial plan
  • Check: Assessing your plan’s execution and evaluating its effectiveness 
  • Act: Putting the plan into action

In software development, feedback loops can occur during the development phase. 

This process may involve aggregating feedback from a sample group of customers to determine whether the output solves what it’s intended to. Otherwise, we may waste time and money in the development phase without satisfying customer expectations. 

Developers may apply feedback loops during pair programming or code reviews.

Imagine a junior developer writing the code while a senior developer reviews it. The process improves the skills of junior developers, helps identify bugs, and improves subsequent outputs of the team. 

Model 4: Mindmaps

A mindmap is a diagram that offers a visual representation of concepts or ideas. 

Try kicking off a project by making a mindmap. Begin with a central idea or concept. It might be the main problem or the project’s title. 

Next, you can add branches or subtopics related to the central concept. These could be the main tasks that need to be done by each team. 

You can then add more subtopics or branches. These could encompass tasks assigned to each member, contributing to the overarching goal. 

A mind map is also helpful in the testing process in software development. Testers could use it to explore an application and list passed or failed tests. 

Along the way, you could even include questions in the sub-branches. This way, the feedback and issues are organized in an easy-to-understand format. 

Model 5: Hill Charts

Hill charts are a mental model that can help you identify what’s in motion and what’s stuck. 

Like the shape of a hill, the chart is composed of two phases – an uphill slope and a downward slope.

The first phase is “Figuring Things Out,” situated on the uphill slope. At this stage, you have a basic understanding of the project, but you still need to settle some unknowns or finalize your overall strategy. 

As time goes by, you’ll eventually reach a point where you’re ready to put your strategy into action. Then, the downhill phase is about “Making it Happen” or implementation. 

Developers can utilize Hill charts by coming up with to-do lists for their projects. As you fulfill or add more items on the list, identify where they should be situated on the Hill chart. 

Senior developers working on multiple projects or managing several teams can use this to gauge where a team is focusing its efforts. It could also help identify stuck groups and what they need to move forward. 

Model 6: Parkinson’s Law

Parkinson’s law is a mental model which states that work expands to fill the time allotted.

Take, for instance, a developer team that’s given three weeks to add or tweak a specific feature in the product. The team is delighted to find that they have more than enough time to finish the project. They start slow and require three weeks to complete the task, but they discover more issues to finish after receiving feedback. 

Parkinson’s law states that teams should set deadlines for maximum efficiency, even if they’re imperfect. 

In the first example, the team seems too relaxed because of the illusion of time. Questions and minor tweaks could slow them down, but the output may still be imperfect.

However, if they were allotted a realistic two-week deadline, the same team could get more done in less time. They’ll even have sufficient time to work on feedback from testing, if necessary. 

Model 7: 5 Whys

The 5 Whys is a mental model which requires asking “Why” five times. 

The rationale is when you identify a problem, the most obvious solution may not address the root cause of the issue. 

Identifying the leading cause will enable developers to save time and effort. Otherwise, they would merely apply band-aid solutions while the real problem is left unaddressed. 

An example that seems relatable to developers could be the following:

Why couldn’t the user access the calendar feature in the app? There was a bug in the recent update.

What led to the bug in the recent update? The team was unable to test all the features. Why was the team unable to test all the features? New testers on the team were unable to test all the features properly.

Why did new testers fail to perform well? They were also not provided with resources and adequate training. Why were they not provided with proper training and resources? Most new testers worked remotely.

The team in charge of training them is having a hard time because there’s no tried and tested onboarding process yet for fully-remote workers.

Model 8: Inversion

During the problem-solving process, we often think forward. 

This may be effective when solving simple issues. However, it may be challenging to tackle a complicated issue that needs to be broken down. 

Inversion helps us break down problems and brainstorm solutions by thinking backward.

Let’s say your software product has launched a free trial to boost your customer base. Yet, the free trial conversion rate is only a dismal 2%. 

The standard thought process for brainstorming solutions would involve asking, “What can I do to get more people to use my product even after the free trial ends?”

Instead of thinking forward, invert the problem and ask, “Which features did users try the most during the free trial? How can we improve the user experience in our free plan?”

The solution to the first problem may solely involve improving your onboarding experience and creating tutorials. Yet, you may uncover underlying issues that significantly contribute to the low conversion rate by inverting the problem. 

Model 9: Occam’s Razor

Occam’s Razor, also known as the law of parsimony, is a mental model for problem-solving. To put it simply, the model states that when there are several ways to solve a problem, the simplest solution is likely more correct and ideal. 

Consider a developer that can write both simple and complex code to accomplish the same outcome. Even if two options exist, the most ideal would be the simpler code because it is faster to review and easier to update.

While the result is the same, the more straightforward solution is easier to execute and more advantageous in the long run.

Model 10: Lean Startup

Lean Startup involves the build-measure-learn feedback loop.

Most startups start with a great idea, but it can take weeks or months to realize this product. 

Lean Startup processes solve this problem by encouraging the development of a minimum viable product (MVP) that potential customers can test.

Once selected target customers try it, the startup will measure results and ask for feedback. The cycle continues until the startup has a high-quality product that they can confidently release en masse to target consumers.

The team can build the ideal product with continuous feedback from target consumers. Otherwise, it could take weeks or months for startups to get a product beta tested.

Worse, they may discover significant issues during the testing process. However, they’ve already invested thousands of dollars into building a product and can’t afford to stay in this stage for a more extended period. 

Pick the Right Mental Model

Understanding the right mental model for each situation helps us work smarter, not harder. 

Dealing with a complicated issue can cost us a lot of time and effort. Mental models help us break down the big problem into much smaller ones. This way, we can get to the heart of the matter and develop the most practical solutions. 

I know it may take time to ingrain these mental models in your daily life. But once you learn the process and actualize it, you can instantly get unstuck and steered in the right direction.

Original article source at: https://www.sitepoint.com/

#models #strategy #developers 

Best 10 Mental Models Developers Can Use to Get Unstuck

How to Sound Logic and Monotonic AI Models

For those working with AI, the future is certainly exciting. At the same time, there is a general sense that AI suffers from one pesky flaw: AI in its current state can be unpredictably unreliable.

AI is fast becoming an amazing asset, having achieved superhuman levels of performance in domains such as image recognition, Go, and even poker. Many are excited about the future of AI and humanity. At the same time, there is a general sense that AI does suffer from one pesky flaw: AI in its current state can be unpredictably unreliable.

The classical example is the Jeopardy! IBM Challenge, during which Watson, the IBM AI, cleaned the board with ease, only to miss the “Final Jeopardy!” question, which was under the category of US Cities: “Its largest airport is named for a World War II hero; its second largest for a World War II battle.” Watson answered, “What is Toronto?????”—the extra question marks (and low wager) indicating its doubt.

So even though AI has the capacity for fairy-tale-like performance for large periods of time—months, years, decades even—there is always this nagging possibility that all of a sudden, it will mysteriously blunder.

Most concerning to us humans is not that the AI will make a mistake, but how “illogical” the mistake will be. In Watson’s case, someone who doesn’t know the answer to the question would “logically” try to at least guess a major US city. I believe that this is one of the main reasons we don’t yet have public adoption of self-driving cars: Even if self-driving cars may be statistically safer, we fear that their underlying AI might unexpectedly blunder in a similar sense to Watson, but with much more serious repercussions.

This got me wondering, could the right AI model fix this issue? Could the right AI have the capability to make sound decisions in critical moments, even when it doesn’t have all the answers? Such AI would be able to change the course of technology and enable us the fairy-tale-like benefits of AI…

I believe the answer to these questions is yes. I believe that mistakes like Watson’s may be avoidable with the use of improved, more logically constrained models, the early prototype of which are called monotonic machine learning models. Without going into details just yet, with the proper monotonic AI model:

  • A self-driving car would be safer, as the detection of even the smallest amount of human signal would always suffice to activate a safety protocol even in the presence of a large amount of other signal.
  • Machine learning (ML) systems would be more robust to adversarial attacks and unexpected situations.
  • ML performance would be more logical and humanly understandable.

I believe we are moving from an era of great growth in the computational and algorithmic power of AI to an era of finesse, effectiveness, and understanding in AI, and monotonic machine learning models are the first step in this exciting journey. Monotonic models make AI more “logical.”

Editor’s note: Readers looking to take their own first step in understanding ML basics are encouraged to read our introductory article on ML.

The Theory of Monotonic AI Models

So what is a monotonic model? Loosely speaking, a monotonic model is an ML model that has some set of features (monotonic features) whose increase always leads the model to increase its output.


...there are two places where the above definition is imprecise.

First, the features here are monotonic increasing. We can also have monotonically decreasing features, whose increase always leads to a decrease in the model. The two can be converted into one another simply by negation (multiplying by -1).

Second, when we say the output increases, we do not mean it's strictly increasing—we mean that it does not decrease, because the output can remain the same.

In real life, many pairs of variables exhibit monotonic relationships. For example:

  • The gas price for a trip is monotonically increasing in the distance driven.
  • The likelihood of receiving a loan is greater with better credit.
  • The expected driving time increases with the amount of traffic.
  • Revenue increases with the click rate on an ad.

Though these logical relationships are clear enough, for an ML model that is interpolating using limited data and no domain knowledge, they might not be. In fact, the model might interpolate them incorrectly, resulting in ridiculous and wacky predictions. Machine learning models that do capture such knowledge perform better in practice (by avoiding overfitting), are easier to debug, and are more interpretable. In most use cases, the monotonic model should be used in conjunction with an ordinary model, as part of an ensemble of learners.

One place where monotonic AI models really shine is in adversarial robustness. Monotonic models are “hardened” machine learning models, meaning they are resistant to adversarial attacks. Attackers who are able to manipulate only non-monotonic features are unable to evade the monotonic AI model because they are unable to alter the label of the example with respect to the monotonic AI model.

Use Cases for Monotonic AI Models

So far, this discussion has been entirely theoretical. Let’s discuss some real-life use cases.

Use Case #1: Malware Detection

One of the coolest use cases for monotonic AI models has to be their use in malware detection. Implemented as part of Windows Defender, a monotonic model is present in every up-to-date Windows device, quietly protecting users from malware.

In one scenario, malware authors impersonated legitimate, registered businesses to defraud certificate authorities, successfully digitally code-signing their malware with trusted certificates. A naive malware classifier is likely to use code-signing as a feature and would indicate such samples to be benign.

But not so in the case of Windows Defender’s monotonic AI model, whose monotonic features are only the features that indicate malware. No matter how much “benign” content malware authors inject into their malware, Windows Defender’s monotonic AI model would continue to catch the sample and defend users from damage.

In my course, Machine Learning for Red Team Hackers, I teach several techniques for evading ML-based malware classifiers. One of the techniques consists of stuffing a malicious sample with “benign” content/features to evade naive ML models. Monotonic models are resistant to this attack and force malicious actors to work much harder if they are to have any hope of evading the classifier.

Use Case #2: Content Filtering

Suppose a team is constructing a web-surfing content filter for school libraries. A monotonic AI model is a great candidate to use here because a forum that contains inappropriate content might also contain plenty of acceptable content.

A naive classifier might weigh the presence of “appropriate” features against the presence of “inappropriate” features. But that won’t do since we don’t want our children accessing inappropriate content, even if it makes up only a small fraction of the content.

Use Case #3: Self-driving Car AI

Imagine constructing a self-driving car algorithm. It looks at an image and sees a green light. It also sees a pedestrian. Should it weigh the signal of each against one another? Absolutely not. The presence of a pedestrian is sufficient to make a decision to stop the car. The presence of pedestrians should be viewed as a monotonic feature, and a monotonic AI model should be used in this scenario.

Use Case #4: Recommendation Engines

Recommendation engines are a great use case for monotonic AI models. In general, they might have many inputs about each product: star rating, price, number of reviews, etc. With all other inputs being equal, such as star ratings and price, we would prefer the product that has a greater number of reviews. We can enforce such logic using a monotonic AI model.

Use Case #5: Spam and Phishing Filtering

This use case is similar to the malware detection use case. Malicious users may inject their spam or phishing emails with benign-seeming terms to fool spam filters. A monotonic AI model will be immune to that.

Implementation and Demonstration

When it comes to freely available implementations of monotonic AI models, three stand out as best-supported: XGBoost, LightGBM, and TensorFlow Lattice.

Monotonic ML XGBoost Tutorial

XGBoost is considered one of the best-performing algorithms on structured data, based on years of empirical research and competition. In addition, monotonicity has been implemented in XGBoost.

The following demonstration XGBoost tutorial on how to use monotonic ML models has an accompanying Python repo.

Start by importing a few libraries:

import random
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.metrics import confusion_matrix
import seaborn as sns


The scenario we are going to model is a content filtering or malware database. We will have some benign_features, which model, e.g., the amount of content related to “science,” “history,” and “sports,” or in the malware case, “code-signing” and “recognized authors.”

Also, we will have malicious_features, which model, e.g., the amount of content related to “violence” and “drugs,” or in the malware case, “the number of times calls to crypto libraries are made” and “a numerical measure of similarity to a known malware family.”

We will model the situation via a generative model. We generate a large number of data points at random, about half benign and half malicious, using the function:

def flip():
    """Simulates a coin flip."""
    return 1 if random.random() < 0.5 else 0

Each data point will randomly generate its features. A “benign” data point will have a higher bias for the benign features, while a “malicious” data point will have a higher bias for the malicious features.

We will use a triangular distribution, like so:

bins = [0.1 * i for i in range(12)]

plt.hist([random.triangular(0, 1, 1) for i in range(50000)], bins)


A data point distribution graph resembling a staircase. Most buckets, from 0.1 to 0.2, 0.2 to 0.3, and so on, have about 1,000 more data points in them than the ones to their left. The first, from 0 to 0.1, appears to have about 500.


We’ll use this function to capture the above logic:

def generate():
    """Samples from the triangular distribution."""
    return random.triangular(0, 1, 1)

Then, we’ll proceed to create our dataset:

m = 100000
benign_features = 5
malicious_features = 5
n = benign_features + malicious_features
benign = 0
malicious = 1
X = np.zeros((m, n))
y = np.zeros((m))
for i in range(m):
    vec = np.zeros((n))
y[i] = flip()
if y[i] == benign:
    for j in range(benign_features):
        vec[j] = generate()
    for j in range(malicious_features):
        vec[j + benign_features] = 1 - generate()
    for j in range(benign_features):
        vec[j] = 1 - generate()
    for j in range(malicious_features):
        vec[j + benign_features] = generate()
X[i, :] = vec

X contains the vectors of randomly generated features, while y contains the labels. This classification problem is not trivial.


Typical samples: benign vs. malicious. Each graph shows 10 features (0 through 9) with values on a scale from 0 to 1. In the benign graph, most features are below 0.5; features 6 and 7 are above 0.6; feature 2 is nearly 0.8; and feature 3 is nearly 1.0. In the malicious graph, 7 out of 10 features are above 0.5, including features 5, 6, 7, and 8.


You can see that benign samples generally have greater weight in the first few features, whereas malicious samples generally have greater weight in the last few features.

With the data ready, let’s perform a simple training-testing split:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

We’ll use a function to prep the data for use with our XGBoost tutorial:

import xgboost as xgb

def prepare_for_XGBoost(X, y):
    """Converts a numpy X and y dataset into a DMatrix for XGBoost."""
    return xgb.DMatrix(X, label=y)

dtrain = prepare_for_XGBoost(X_train, y_train)
dtest = prepare_for_XGBoost(X_test, y_test)
dall = prepare_for_XGBoost(X, y)

Now, let’s train and test a simple (non-monotonic) XGBoost model on the data. We will then print out the confusion matrix to see a numerical breakdown of the correctly labeled positive examples, correctly labeled negative examples, incorrectly labeled positive examples, and incorrectly labeled negative examples.

params = {"n_jobs": -1, "tree_method": "hist"}
model_no_constraints = xgb.train(params=params, dtrain=dtrain)
CM = predict_with_XGBoost_and_return_confusion_matrix(
    model_no_constraints, dtrain, y_train
plt.figure(figsize=(12, 10))
sns.heatmap(CM / np.sum(CM), annot=True, fmt=".2%", cmap="Blues")
plt.ylabel("True Label")
plt.xlabel("Predicted Label")
plt.title("Unconstrained model's training confusion matrix")
CM = predict_with_XGBoost_and_return_confusion_matrix(
    model_no_constraints, dtest, y_test
plt.figure(figsize=(12, 10))
sns.heatmap(CM / np.sum(CM), annot=True, fmt=".2%", cmap="Blues")
plt.ylabel("True Label")
plt.xlabel("Predicted Label")
plt.title("Unconstrained model's testing confusion matrix")
model_no_constraints = xgb.train(params=params, dtrain=dall)


Unconstrained model's training confusion matrix, a two-by-two checkerboard. The Y axis is called "True Label," with zero at the top and one at the bottom. The X axis is called "Predicted Label," with zero on the left and one on the right. The color scale goes from white at zero to dark blue at 0.5. The upper-left and lower-right squares are dark blue, at 49.29% and 48.89% respectively. The other two squares are close to white, both at 0.91%. To the right is a very similar chart but for testing rather than training, with, in reading order, 49.33%, 1.25%, 1.20%, and 48.23%.


Looking at the results, we can see that there is no significant overfitting. We will compare these results to those of the monotonic models.

To that end, let’s train and test a monotonic XGBoost model. The syntax in which we pass in the monotone constraints is a sequence (f0, f1, …, fN), where each fi is one of -1, 0 or 1, depending on whether we want feature i to be monotonically decreasing, unconstrained, or monotonically increasing, respectively. In the case at hand, we specify the malicious features to be monotonic increasing.

params_constrained = params.copy()
monotone_constraints = (
    + ",".join([str(0) for m in range(benign_features)])
    + ","
    + ",".join([str(1) for m in range(malicious_features)])
    + ")"
print("Monotone constraints enforced are:")
params_constrained["monotone_constraints"] = monotone_constraints
model_monotonic = xgb.train(params=params_constrained, dtrain=dtrain)
CM = predict_with_XGBoost_and_return_confusion_matrix(model_monotonic, dtrain, y_train)
plt.figure(figsize=(12, 10))
sns.heatmap(CM / np.sum(CM), annot=True, fmt=".2%", cmap="Blues")
plt.ylabel("True Label")
plt.xlabel("Predicted Label")
plt.title("Monotonic model's training confusion matrix")
CM = predict_with_XGBoost_and_return_confusion_matrix(model_monotonic, dtest, y_test)
plt.figure(figsize=(12, 10))
sns.heatmap(CM / np.sum(CM), annot=True, fmt=".2%", cmap="Blues")
plt.ylabel("True Label")
plt.xlabel("Predicted Label")
plt.title("Monotonic model's testing confusion matrix")
model_monotonic = xgb.train(params=params_constrained, dtrain=dall)


Monotonic AI model's training confusion matrix, a two-by-two checkerboard. The Y axis is called "True Label," with zero at the top and one at the bottom. The X axis is called "Predicted Label," with zero on the left and one on the right. The color scale goes from white at zero to dark blue at 0.5. The upper-left and lower-right squares are dark blue, at 49.20% and 48.82% respectively. The upper-right and lower-left squares are close to white, at 0.99% and 0.98% respectively. To the right is a very similar chart for testing rather than training, with, in reading order, 49.32%, 1.26%, 1.22%, and 48.20%.


It’s clear that the performance of the monotonic model is the same as that of the unconstrained model.

Now, we are going to create an adversarial dataset. We are going to take all of the malicious samples and “stuff” their benign features by setting them all to 1. We will then see how the two models perform side by side.

X_adversarial = X[y == malicious]
y_adversarial = len(X_adversarial) * [malicious]
for i in range(len(X_adversarial)):
    vec = X_adversarial[i, :]
    for j in range(benign_features):
        vec[j] = 1
    X_adversarial[i, :] = vec

Let’s convert these to a form to be ingested by XGBoost:

dadv = prepare_for_XGBoost(X_adversarial, y_adversarial)

For the final step of our XGBoost tutorial, we’ll test the two machine learning model types:

CM = predict_with_XGBoost_and_return_confusion_matrix(
    model_no_constraints, dadv, y_adversarial
plt.figure(figsize=(12, 10))
sns.heatmap(CM / np.sum(CM), annot=True, fmt=".2%", cmap="Blues")
plt.ylabel("True Label")
plt.xlabel("Predicted Label")
plt.title("Unconstrained model's confusion matrix on adversarial dataset")
CM = predict_with_XGBoost_and_return_confusion_matrix(
    model_monotonic, dadv, y_adversarial
plt.figure(figsize=(12, 10))
sns.heatmap(CM / np.sum(CM), annot=True, fmt=".2%", cmap="Blues")
plt.ylabel("True Label")
plt.xlabel("Predicted Label")
plt.title("Monotonic model's confusion matrix on adversarial dataset")


Unconstrained vs monotonic AI models' training confusion matrices on the same adversarial dataset. Each is a two-by-two checkerboard. The Y axis is called "True Label," with zero at the top and one at the bottom. The X axis is called "Predicted Label," with zero on the left and one on the right. The color scale goes from white at zero to dark blue at 1.0. Both matrices' top rows contain only 0.00%. The left-hand (unconstrained) matrix's bottom row reads 99.99% and 0.01%, whereas the right-hand (monotonic) matrix's bottom row reads 75.81% and 24.19%.


As you can see, the monotonic AI model was about 2,500 times more robust to adversarial attacks.


The syntax for using monotonic features in LightGBM is similar.

TensorFlow Lattice

TensorFlow Lattice is another framework for tackling monotonicity constraints and is a set of prebuilt TensorFlow Estimators as well as TensorFlow operators to build your own lattice models. Lattices are multi-dimensional interpolated look-up tables, meaning they are points evenly distributed in space (like a grid), along with function values at these points. According to the Google AI Blog:

“…the look-up table values are trained to minimize the loss on the training examples, but in addition, adjacent values in the look-up table are constrained to increase along given directions of the input space, which makes the model outputs increase in those directions. Importantly, because they interpolate between the look-up table values, the lattice models are smooth and the predictions are bounded, which helps to avoid spurious large or small predictions in the testing time.”

Tutorials for how to use TensorFlow Lattice can be found here.

Monotonic AI Models and the Future

From defending devices from malicious attacks to offering logical and helpful restaurant recommendations, monotonic AI models have proven to be a great boon to society and a wonderful tool to master. Monotonic models are here to usher us into a new era of safety, finesse, and understanding in AI. And so I say, here’s to monotonic AI models, here’s to progress.

Original article source at: https://www.toptal.com/

#ai #models #logic #machinelearning 

How to Sound Logic and Monotonic AI Models

Learn Getting The Most Out Of Pre-trained Models

Pre-trained models are making waves in the deep learning world. Using massive pre-training datasets, these NLP models bring previously unheard-of feats of AI within the reach of app developers.

Most of the new deep learning models being released, especially in NLP, are very, very large: They have parameters ranging from hundreds of millions to tens of billions.

Given good enough architecture, the larger the model, the more learning capacity it has. Thus, these new models have huge learning capacity and are trained on very, very large datasets.

Because of that, they learn the entire distribution of the datasets they are trained on. One can say that they encode compressed knowledge of these datasets. This allows these models to be used for very interesting applications—the most common one being transfer learning. Transfer learning is fine-tuning pre-trained models on custom datasets/tasks, which requires far less data, and models converge very quickly compared to training from scratch.

How Pre-trained Models Are the Algorithms of the Future

Although pre-trained models are also used in computer vision, this article will focus on their cutting-edge use in the natural language processing (NLP) domain. Transformer architecture is the most common and most powerful architecture that is being used in these models.


The Transformer architecture as presented in Google's 2017 paper, "Attention Is All You Need."


Although BERT started the NLP transfer learning revolution, we will explore GPT-2 and T5 models. These models are pre-trained—fine-tuning them on specific applications will result in much better evaluation metrics, but we will be using them out of the box, i.e., with no fine-tuning.

Pre-trained NLP Models: OpenAI’s GPT-2

GPT-2 created quite a controversy when it was released back in 2019. Since it was very good at generating text, it attracted quite the media attention and raised a lot of questions regarding the future of AI.

Trained on 40 GB of textual data, GPT-2 is a very large model containing a massive amount of compressed knowledge from a cross-section of the internet.

GPT-2 has a lot of potential use cases. It can be used to predict the probability of a sentence. This, in turn, can be used for text autocorrection. Next, word prediction can be directly used to build an autocomplete component for an IDE (like Visual Studio Code or PyCharm) for writing code as well as general text writing. We will use it for automatic text generation, and a large corpus of text can be used for natural language analysis.


GPT-2 producing suggestions based on textual input. The input on the left is "AI took most of the jobs." The three outputs suggested on the right are "In fact, the company's fist hiring in May," "The only job that was not taken was that of," and "'When I was young, it was like being" without a closing quote mark.


Text Generation

The ability of a pre-trained model like GPT-2 to generate coherent text is very impressive. We can give it a prefix text and ask it to generate the next word, phrase, or sentence.

An example use case is generating a product reviews dataset to see which type of words are generally used in positive reviews versus negative reviews.

Let’s look at some examples, starting with what we get if we start with the positive prefix, “Really liked this movie!”

Sample 1Loved the character’s emotions at being in constant danger, and how his inner fears were slowly overcome by these events. Also loved that he is so focused on surviving; even for a while it felt like something out of Batman v Superman was showing up every now again because you always knew your enemy would show up eventually anyways :) The ending theme really stuck with me too… I mean yeah they did have to…
Sample 2It did exactly what I wanted it to do, just a lot of fun. The cast was awesome and the story makes sense so far after seeing everything else already on IMDB. There are hints about being in an underground city from time immemorial due their “power” (or lack thereof) but they don’t really tell that much…I think its more like one big mystery then some other aspects coming together with me for now though….It’s supposed…

As you can see, the word review was not anywhere in the prefix, but as most reviews are titles followed by the body of the review, this forced the model to adapt to that distribution. Also notice the reference to Batman v Superman.

Let’s see another example. Instead of a movie review, we’ll try to generate a product review using the negative prefix, “A trash product! Do not buy.”

Sample 1And please, do NOT try to use this as a fertilizer for your gardens or lawns – it will cause some serious water damage if ingested by fish and birds. Use with caution because of the possibility that pesticides may be present in pet foods available online at organic grocery stores such too “good” names like Petsmart, PetSmart & Safeway.How toxic is food waste? Waste products can contribute significant amounts - enough so that even very…
Sample 2No one will give you a piece of that garbage ever again. Just do yourself and your business a favor and clean up before buying it for nothing more than to save some money or convenience in the future.1/2 oz Bottle Plastic Food Storage Packaging + 1 pack #1636370105702615 (P, 3ml) $7 Buy Now You can purchase these bottles at any retail stores but here is how to get…

Again, the prefix can be inferred as the title of a product review, so the model starts generating text following that pattern. GPT-2 can generate any type of text like this.

A Google Colab notebook is ready to be used for experiments, as is the “Write With Transformer” live demo.

Question Answering

Yes, since GPT-2 is trained on the web, it “knows” a lot of human knowledge that has been published online up till 2019. It can work for contextual questions as well, but we will have to follow the explicit format of “Question: X, Answer:” before letting it attempt to autocomplete. But if we force the model to answer our question, it may output a pretty vague answer. Here’s what happens trying to force it to answer open-ended questions to test its knowledge:

Sample 1Question: Who invented the theory of evolution?
Answer: The theory of evolution was first proposed by Charles Darwin in 1859.
Sample 2Question: How many teeth do humans have?
Answer: Humans have 21 teeth.

As we can see, the pre-trained model gave a pretty detailed answer to the first question. For the second, it tried its best, but it does not compare with Google Search.

It’s clear that GPT-2 has huge potential. Fine-tuning it, it can be used for the above-mentioned examples with much higher accuracy. But even the pre-trained GPT-2 we are evaluating is still not that bad.

Pre-trained NLP Models: Google’s T5

Google’s T5 is one of the most advanced natural language models to date. It builds on top of previous work on Transformer models in general. Unlike BERT, which had only encoder blocks, and GPT-2, which had only decoder blocks, T5 uses both.

T5 inputs and outputs. 1) "translate English to German: That is good," becomes "Das ist gut." 2) "cola sentence: The course is jumping well," becomes "not acceptable." 3) "stsb sentence1: The rhino grazed on the grass. sentence2: A rhino is grazing in a field," becomes "3.8." 4) "summarize: state authorities dispatched emergency crews tuesday to survey the damage after an onslaught of severe weather in mississippi…" becomes "six people hospitalized after a storm in attala county."

Examples of inputs and corresponding outputs from the T5 model, from Google's 2019 paper, "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer."

GPT-2 being trained on 40 GB of text data was already impressive, but T5 was trained on a 7 TB dataset. Even though it was trained for a very, very large number of iterations, it could not go through all the text. Although T5 can do text generation like GPT-2, we will use it for more interesting business use cases.


Let’s start with a simple task: text summarization. For those AI development companies wanting to build an app that summarizes a news article, T5 is perfectly suited for the task. For example, giving this article to T5, here are three different summaries it produced:

V1destiny 2’s next season, starting march 10, will rework swords . they’ll have recharging energy used to power both heavy attacks and guarding . the valentine’s day event, crimson days, is also happening this month .
V2bungie has revealed that the next season of destiny 2 will dramatically rework swords . the studio has mostly been coy about what the season will entail . the rethink will let swords partly bypass ai enemies’ shields .
V3destiny 2’s next season will rework swords and let them bypass ai enemies’ shields . the season starts march 10th . you can play destiny 2 during crimson days, a valentine’s day event .

As we can see, it has done a pretty nifty job of summarizing the article. Also, each summary is different from the others.

Summarizing using pre-trained models has huge potential applications. One interesting use case could be to generate a summary of every article automatically and put that at the start for readers who just want a synopsis. It could be taken further by personalizing the summary for each user. For example, if some users have smaller vocabularies, they could be served a summary with less complicated word choices. This is a very simple example, yet it demonstrates the power of this model.

Another interesting use case could be to use such summaries in the SEO of a website. Although T5 can be trained to generate very high-quality SEO automatically, using a summary might help out of the box, without retraining the model.

Reading Comprehension

T5 can also be used for reading comprehension, e.g., answering questions from a given context. This application has very interesting use cases we will see later. But let’s start with a few examples:

QuestionWho invented the theory of evolution?
(Encyclopædia Britannica)
The discovery of fossil bones from large extinct mammals in Argentina and the observation of numerous species of finches in the Galapagos Islands were among the events credited with stimulating Darwin’s interest in how species originate. In 1859 he published On the Origin of Species by Means of Natural Selection, a treatise establishing the theory of evolution and, most important, the role of natural selection in determining its course.

There is no explicit mention that Darwin invented the theory, but the model used its existing knowledge along with some context to reach the right conclusion.

How about a very small context?

QuestionWhere did we go?
ContextOn my birthday, we decided to visit the northern areas of Pakistan. It was really fun.
Answernorthern areas of pakistan

Okay, that was pretty easy. How about a philosophical question?

QuestionWhat is the meaning of life?
The meaning of life as we perceive it is derived from philosophical and religious contemplation of, and scientific inquiries about existence, social ties, consciousness, and happiness. Many other issues are also involved, such as symbolic meaning, ontology, value, purpose, ethics, good and evil, free will, the existence of one or multiple gods, conceptions of God, the soul, and the afterlife. Scientific contributions focus primarily on describing related empirical facts about the universe, exploring the context and parameters concerning the “how” of life.
Answerphilosophical and religious contemplation of, and scientific inquiries about existence, social ties, consciousness, and happiness

Although we know the answer to this question is very complicated, T5 tried to come up with a very close, yet sensible answer. Kudos!

Let us take it further. Let’s ask a few questions using the previously mentioned Engadget article as the context.

QuestionWhat is this about?
Answerdestiny 2 will dramatically rework
QuestionWhen can we expect this update?
Answermarch 10th

As you can see, the contextual question answering of T5 is very good. One business use case could be to build a contextual chatbot for websites that answers queries relevant to the current page.

Another use case could be to search for some information from documents, e.g., ask questions like, “Is it a breach of contract to use a company laptop for a personal project?” using a legal document as context. Although T5 has its limits, it is pretty well-suited for this type of task.

Readers may wonder, Why not use specialized models for each task? It’s a good point: The accuracy would be much higher and the deployment cost of specialized models would be much lower than T5’s pre-trained NLP model. But the beauty of T5 is precisely that it is “one model to rule them all,” i.e., you can use one pre-trained model for almost any NLP task. Plus, we want to use these models out of the box, without retraining or fine-tuning. So for developers creating an app that summarizes different articles, as well as an app that does contextual question answering, the same T5 model can do both of them.

Pre-trained Models: The Deep Learning Models That Will Soon Be Ubiquitous

In this article, we explored pre-trained models and how to use them out of the box for different business use cases. Just like a classical sorting algorithm is used almost everywhere for sorting problems, these pre-trained models will be used as standard algorithms. It’s pretty clear that what we explored was just scratching the surface of NLP applications, and there is a lot more that can be done by these models.

Pre-trained deep learning models like StyleGAN-2 and DeepLabv3 can power, in a similar fashion, applications of computer vision. I hope you enjoyed this article and look forward to hearing your comments below.

Original article source at: https://www.toptal.com/

#models #nlp #deeplearning 

Learn Getting The Most Out Of Pre-trained Models

How Documents and Models Are Organized

Make data modeling even easier by understanding Vertabelo’s document structure and how the program organizes documents and models.

To feel comfortable when modeling data, you must be familiar with your data modeling tool. In this case, it’s VERTABELO.

For starters, it’s important that you know how the main structure of documents works and what it looks like in Vertabelo. This includes knowing what types of documents are available, how they can be organized, and the purpose of every folder.

This article will guide you through all of this simply and easily.

Folders in Vertabelo

In Vertabelo, you can find the document structure in the left panel. It consists of four main folders:

  • My Vertabelo
  • Shared
  • Recent
  • Trash

Vertabelo’s Document Structure: How Documents and Models Are Organized

All these folders have their own purpose, and I’ll walk you through each one.

My Vertabelo Folder

This folder stores all documents belonging to a particular user account.

It allows you to create three different document types:

  • Logical data model
  • Physical data model
  • SQL script

To find out the purpose of each document, refer to this ARTICLE ON DOCUMENT TYPES IN VERTABELO.

Vertabelo’s Document Structure: How Documents and Models Are Organized

When creating a new document, you can choose one of these three document types. Notice that every document type has a distinct icon. When you’ve created various document types, you’ll be able to distinguish them by that icon on the right panel:

Vertabelo’s Document Structure: How Documents and Models Are Organized

All three documents in the above image have the same name. However, the icons tell you the first document is a physical data model for a bike shop database. The second is a logical data model, while the third one is an SQL script that creates a database.

The right panel also allows you to sort your documents by name or by the time documents were modified.

As you probably noticed, I have four sub-folders in the My Vertabelo folder. Yes, that means you can create your own subfolders too. The three steps highlighted orange show you how:

Vertabelo’s Document Structure: How Documents and Models Are Organized

You can create endless levels of subfolders according to your needs. If your folder tree grows a bit, it could look like this:

Vertabelo’s Document Structure: How Documents and Models Are Organized

To navigate between the folders’ levels, use the left panel (shown above). The right panel will then show all the folders and models associated with the folder you’re in. For example, these are all the folders and documents in the My Vertabelo folder:

Vertabelo’s Document Structure: How Documents and Models Are Organized

If you move to another folder, you’ll see its subfolders and models only:

Vertabelo’s Document Structure: How Documents and Models Are Organized

You don’t have to keep the data models to yourself; you can also share them with other users. If you do that, the additional icon will help differentiate such models from the non-shared ones:

Vertabelo’s Document Structure: How Documents and Models Are Organized

This sharing feature implies (correctly) that Vertabelo is perfectly suited for collaboration. You can easily collaborate with others by SHARING DATA MODELS OR PUBLISHING MODELS ON THE WEB. There’s also this article detailing how and what you can SHARE IN VERTABELO.

Shared Folder

Speaking of sharing, the second main folder stores the documents that other users have shared with you. It also contains information on who shared the documents and when.

Recent Folder

This folder contains recently-opened documents. They are grouped by the last opened time into several categories:

  • Earlier this week
  • Earlier this month
  • Earlier this year
  • Older

Here’s how it looks:

Vertabelo’s Document Structure: How Documents and Models Are Organized

Trash Folder

The name of this folder is probably self-explanatory:  you can find here all the documents deleted from the My Vertabelo folder.

I have several such documents:

Vertabelo’s Document Structure: How Documents and Models Are Organized

Keep it Beautiful by Keeping it Simple

The Vertabelo documents structure is rather simple. It feels natural because it follows the folder structure every computer user is familiar with.

Vertabelo’s main purpose is to make your work efficient. To do that, this modeling tool has to make sure it doesn’t get in your way. It lets you concentrate on what you’re here for: creating data models.

Now that you know the document types, your next step should be to learn how to create LOGICAL or PHYSICAL DIAGRAMS in Vertabelo, or how you can WORK WITH AN SQL SCRIPT. Give it a go!

Original article source at: https://www.vertabelo.com/

#documents #models 

How Documents and Models Are Organized