Rocio O'Keefe

DeepNeuro: A Deep Learning Python Package for Neuroimaging Data

DeepNeuro

A deep learning Python package for neuroimaging data, focused on validated command-line tools you can use today. Created by the Quantitative Tumor Imaging Lab at the Martinos Center (Harvard-MIT Program in Health Sciences and Technology / Massachusetts General Hospital).

About

DeepNeuro is an open-source toolset of deep learning applications for neuroimaging. We have several goals for this package:

  • Provide easy-to-use command line tools for neuroimaging using deep learning.
  • Create Docker containers for each tool and all out-of-package pre-processing steps, so each can be run without having to install prerequisite libraries.
  • Provide freely available deep learning models trained on a wealth of neuroimaging data.
  • Provide training scripts and links to publicly available data to replicate the results of DeepNeuro's models.
  • Provide implementations of popular models for medical imaging data, and pre-processed datasets for educational purposes.

This package is under active development, but we encourage users both to try the modules with the pre-trained models highlighted below and to try their hand at making their own DeepNeuro modules using the tutorials below.

Installation

Install Docker from Docker's website here: https://www.docker.com/get-started. Follow instructions on that link to get Docker set up properly on your workstation.

Install the Docker Engine Utility for NVIDIA GPUs, AKA nvidia-docker. You can find installation instructions at their Github page, here: https://github.com/NVIDIA/nvidia-docker

Pull the DeepNeuro Docker container from https://hub.docker.com/r/qtimlab/deepneuro_segment_gbm/. Use the command:

docker pull qtimlab/deepneuro

If you want to run DeepNeuro outside of a Docker container, you can install the DeepNeuro Python package locally using the pip package manager. On the command line, run:

pip install deepneuro

Citation

If you use DeepNeuro in your published work, please cite:

Beers, A., Brown, J., Chang, K., Hoebel, K., Patel, J., Ly, K. Ina, Tolaney, S.M., Brastianos, P., Rosen, B., Gerstner, E., and Kalpathy-Cramer, J. (2020). DeepNeuro: an open-source deep learning toolbox for neuroimaging. Neuroinformatics. DOI: 10.1007/s12021-020-09477-5. PMID: 32578020

If you use the MRI skull-stripping or glioblastoma segmentation modules, please cite:

Chang, K., Beers, A.L., Bai, H.X., Brown, J.M., Ly, K.I., Li, X., Senders, J.T., Kavouridis, V.K., Boaro, A., Su, C., Bi, W.L., Rapalino, O., Liao, W., Shen, Q., Zhou, H., Xiao, B., Wang, Y., Zhang, P.J., Pinho, M.C., Wen, P.Y., Batchelor, T.T., Boxerman, J.L., Arnaout, O., Rosen, B.R., Gerstner, E.R., Yang, L., Huang, R.Y., and Kalpathy-Cramer, J., 2019. Automatic assessment of glioma burden: A deep learning algorithm for fully automated volumetric and bi-dimensional measurement. Neuro-Oncology. DOI: 10.1093/neuonc/noz106. PMID: 31190077

Contact

DeepNeuro is under active development, and you may run into errors or want additional features. Send any questions or requests for methods to qtimlab@gmail.com. You can also submit a Github issue if you run into a bug.

Acknowledgements

The Center for Clinical Data Science at Massachusetts General Hospital and the Brigham and Women's Hospital provided technical and hardware support for the development of DeepNeuro, including access to graphics processing units. The DeepNeuro project is also indebted to the 3D U-Net Github repository by user ellisdg, which formed the original kernel for much of its code in early stages. Long live open source deep learning :)

Disclaimer

This software package and the deep learning models within are intended for research purposes only and have not yet been validated for clinical use.

Author: QTIM-Lab
Source Code: https://github.com/QTIM-Lab/DeepNeuro 
License: MIT License

#python #deeplearning #jupyternotebook 

Rocio O'Keefe

Medpy: Medical Image Processing in Python

medpy - Medical Image Processing in Python

MedPy is an image processing library and collection of scripts targeted towards medical (i.e. high dimensional) image processing.
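For instance, a minimal I/O sketch with MedPy's medpy.io module (the file paths here are placeholders):

from medpy.io import load, save

# Load an image as a numpy array plus a header carrying meta-data such as voxel spacing.
image_data, image_header = load('/path/to/image.nii.gz')

# ... process image_data with any numpy-compatible tooling ...

# Save the result, reusing the original header.
save(image_data, '/path/to/output.nii.gz', image_header)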

Development version

  • Download (development version): https://github.com/loli/medpy
  • HTML documentation and installation instructions (development version): build these from the doc/ folder, following the instructions in the README file contained there

Python 2 version

Python 2 is no longer supported, but you can still use the older releases (<= 0.3.0).

Author: Loli
Source Code: https://github.com/loli/medpy 
License: GPL-3.0 License

#python #machine-learning #jupyternotebook 

Nat Grady

Portfolio and Risk analytics in Python

pyfolio

pyfolio is a Python library for performance and risk analysis of financial portfolios developed by Quantopian Inc. It works well with the Zipline open source backtesting library. Quantopian also offers a fully managed service for professionals that includes Zipline, Alphalens, Pyfolio, FactSet data, and more.

At the core of pyfolio is a so-called tear sheet that consists of various individual plots that provide a comprehensive image of the performance of a trading algorithm. Here's an example of a simple tear sheet analyzing a strategy:

[Images: example tear sheet plots]

Also see slides of a talk about pyfolio.

Installation

To install pyfolio, run:

pip install pyfolio

Development

For development, you may want to use a virtual environment to avoid dependency conflicts between pyfolio and other Python projects you have. To get set up with a virtual env, run:

mkvirtualenv pyfolio

Next, clone this git repository, run python setup.py develop, and edit the library files directly.

Matplotlib on OSX

If you are on OSX and using a non-framework build of Python, you may need to set your backend:

echo "backend: TkAgg" > ~/.matplotlib/matplotlibrc

Usage

A good way to get started is to run the pyfolio examples in a Jupyter notebook. To do this, you first want to start a Jupyter notebook server:

jupyter notebook

From the notebook list page, navigate to the pyfolio examples directory and open a notebook. Execute the code in a notebook cell by clicking on it and hitting Shift+Enter.
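Outside of the example notebooks, producing a tear sheet takes a single call. Below is a minimal sketch on synthetic data; the randomly generated returns are purely illustrative:

import numpy as np
import pandas as pd
import pyfolio as pf

# Build an illustrative series of daily strategy returns indexed by business day.
index = pd.date_range('2018-01-02', periods=500, freq='B')
returns = pd.Series(np.random.normal(0.0005, 0.01, len(index)), index=index)

# Render the standard set of performance and risk plots.
pf.create_returns_tear_sheet(returns)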

Questions?

If you find a bug, feel free to open an issue in this repository.

You can also join our mailing list or our Gitter channel.

Support

Please open an issue for support.

Contributing

If you'd like to contribute, a great place to look is the issues marked with help-wanted.

For a list of core developers and outside collaborators, see the GitHub contributors list.

Author: Quantopian
Source Code: https://github.com/quantopian/pyfolio 
License: Apache-2.0 License

#python #jupyternotebook 

DDSP: Differentiable Digital Signal Processing

DDSP is a library of differentiable versions of common DSP functions (such as synthesizers, waveshapers, and filters). This allows these interpretable elements to be used as part of a deep learning model, especially as the output layers for audio generation.

Getting Started

First, follow the steps in the Installation section to install the DDSP package and its dependencies. DDSP modules can be used to generate and manipulate audio from neural network outputs as in this simple example:

import ddsp

# Get synthesizer parameters from a neural network.
outputs = network(inputs)

# Initialize signal processors.
harmonic = ddsp.synths.Harmonic()

# Generate audio from the harmonic synthesizer.
audio = harmonic(outputs['amplitudes'],
                 outputs['harmonic_distribution'],
                 outputs['f0_hz'])

Demos

Colab notebooks demonstrating some of the neat things you can do with DDSP are available in ddsp/colab/demos:

Timbre Transfer: Convert audio between sound sources with pretrained models. Try turning your voice into a violin, or scratching your laptop and seeing how it sounds as a flute :). Pick from a selection of pretrained models or upload your own that you can train with the train_autoencoder demo.

Train Autoencoder: Takes you through all the steps to convert audio files into a dataset and train your own DDSP autoencoder model. You can transfer data and models to/from google drive, and download a .zip file of your trained model to be used with the timbre_transfer demo.

Pitch Detection: Demonstration of self-supervised pitch detection models from the 2020 ICML Workshop paper.

 

Tutorials

To introduce the main concepts of the library, we have step-by-step colab tutorials for all the major library components in ddsp/colab/tutorials.

Modules

The DDSP library consists of a core library (ddsp/) and a self-contained training library (ddsp/training/). The core library is split up into several modules:

  • Core: All the differentiable DSP functions.
  • Processors: Base classes for Processor and ProcessorGroup.
  • Synths: Processors that generate audio from network outputs.
  • Effects: Processors that transform audio according to network outputs.
  • Losses: Loss functions relevant to DDSP applications.
  • Spectral Ops: Helper library of Fourier and related transforms.

Besides the tutorials, each module has its own test file that can be helpful for examples of usage.

 

Installation

Requires tensorflow version >= 2.1.0, but the core library runs in either eager or graph mode.

sudo apt-get install libsndfile-dev
pip install --upgrade pip
pip install --upgrade ddsp

 

Overview

Processor

The Processor is the main object type and preferred API of the DDSP library. It inherits from tfkl.Layer and can be used like any other differentiable module.

Unlike other layers, Processors (such as Synthesizers and Effects) specifically format their inputs into controls that are physically meaningful. For instance, a synthesizer might need to remove frequencies above the Nyquist frequency to avoid aliasing or ensure that its amplitudes are strictly positive. To this end, they have the methods:

  • get_controls(): inputs -> controls.
  • get_signal(): controls -> signal.
  • __call__(): inputs -> signal. (i.e. get_signal(**get_controls()))

Where:

  • inputs is a variable number of tensor arguments (depending on processor). Often the outputs of a neural network.
  • controls is a dictionary of tensors scaled and constrained specifically for the processor.
  • signal is an output tensor (usually audio or control signal for another processor).

For example, here are some inputs to a Harmonic() synthesizer:

[Figure: example inputs to a Harmonic() synthesizer]

And here are the resulting controls after logarithmically scaling amplitudes, removing harmonics above the Nyquist frequency, and normalizing the remaining harmonic distribution:

[Figure: the resulting controls for the Harmonic() synthesizer]

Notice that only 18 harmonics are nonzero (sample rate 16kHz, Nyquist 8kHz, 18*440=7920Hz) and they sum to 1.0 at all times.
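As a concrete sketch of these three methods (the batch size, frame count, and number of harmonics below are illustrative assumptions, not requirements of the API):

import tensorflow as tf
import ddsp

# Illustrative stand-ins for network outputs: batch of 1, 250 frames.
amps = tf.random.normal([1, 250, 1])
harmonic_distribution = tf.random.normal([1, 250, 30])
f0_hz = 440.0 * tf.ones([1, 250, 1])

harmonic = ddsp.synths.Harmonic(n_samples=16000)

# inputs -> controls: scale amplitudes, zero harmonics above Nyquist, normalize.
controls = harmonic.get_controls(amps, harmonic_distribution, f0_hz)

# controls -> signal.
audio = harmonic.get_signal(**controls)

# inputs -> signal, equivalent to the two steps above.
audio = harmonic(amps, harmonic_distribution, f0_hz)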

ProcessorGroup

Consider the situation where you want to string together a group of Processors. Since Processors are just instances of tfkl.Layer, you could use Python control flow, as you would with any other differentiable module.

In the example below, we have an audio autoencoder that uses a differentiable harmonic+noise synthesizer with reverb to generate audio for a multi-scale spectrogram reconstruction loss.

import ddsp

# Get synthesizer parameters from the input audio.
outputs = network(audio_input)

# Initialize signal processors.
harmonic = ddsp.synths.Harmonic()
filtered_noise = ddsp.synths.FilteredNoise()
reverb = ddsp.effects.TrainableReverb()
spectral_loss = ddsp.losses.SpectralLoss()

# Generate audio.
audio_harmonic = harmonic(outputs['amplitudes'],
                          outputs['harmonic_distribution'],
                          outputs['f0_hz'])
audio_noise = filtered_noise(outputs['magnitudes'])
audio = audio_harmonic + audio_noise
audio = reverb(audio)

# Multi-scale spectrogram reconstruction loss.
loss = spectral_loss(audio, audio_input)

ProcessorGroup (with a list)

A ProcessorGroup specifies a signal processing chain as a Directed Acyclic Graph (DAG) of processors. The main advantage of using a ProcessorGroup is that the entire signal processing chain can be specified in a .gin file, removing the need to write Python code for every different configuration of processors.

You can specify the DAG as a list of tuples dag = [(processor, ['input1', 'input2', ...]), ...] where processor is a Processor instance and ['input1', 'input2', ...] is a list of strings specifying its input arguments. The output signal of each processor can be referenced as an input by the string 'processor_name/signal', where processor_name is the name of the processor at construction. The ProcessorGroup takes a dictionary of inputs, whose keys can be referenced in the DAG.

import ddsp
import gin

# Get synthesizer parameters from the input audio.
outputs = network(audio_input)

# Initialize signal processors.
harmonic = ddsp.synths.Harmonic()
filtered_noise = ddsp.synths.FilteredNoise()
add = ddsp.processors.Add()
reverb = ddsp.effects.TrainableReverb()
spectral_loss = ddsp.losses.SpectralLoss()

# Processor group DAG
dag = [
  (harmonic,
   ['amps', 'harmonic_distribution', 'f0_hz']),
  (filtered_noise,
   ['magnitudes']),
  (add,
   ['harmonic/signal', 'filtered_noise/signal']),
  (reverb,
   ['add/signal'])
]
processor_group = ddsp.processors.ProcessorGroup(dag=dag)

# Generate audio.
audio = processor_group(outputs)

# Multi-scale spectrogram reconstruction loss.
loss = spectral_loss(audio, audio_input)

ProcessorGroup (with gin)

The main advantage of a ProcessorGroup is that it can be defined with a .gin file, allowing flexible configurations without having to write new python code for every new DAG.

In the example below we pretend we have written an external gin file, which we treat here as a string. After parsing the gin config, the ProcessorGroup will have its arguments configured on construction.

import ddsp
import gin

gin_config = """
import ddsp

processors.ProcessorGroup.dag = [
  (@ddsp.synths.Harmonic(),
   ['amplitudes', 'harmonic_distribution', 'f0_hz']),
  (@ddsp.synths.FilteredNoise(),
   ['magnitudes']),
  (@ddsp.processors.Add(),
   ['filtered_noise/signal', 'harmonic/signal']),
  (@ddsp.effects.TrainableReverb(),
   ['add/signal'])
]
"""

with gin.unlock_config():
  gin.parse_config(gin_config)

# Get synthesizer parameters from the input audio.
outputs = network(audio_input)

# Initialize signal processors, arguments are configured by gin.
processor_group = ddsp.processors.ProcessorGroup()

# Generate audio.
audio = processor_group(outputs)

# Multi-scale spectrogram reconstruction loss.
loss = spectral_loss(audio, audio_input)

A word about gin...

The gin library is a "super power" of dependency injection, and we find it very helpful for our experiments, but with great power comes great responsibility. There are two methods for injecting dependencies with gin.

@gin.configurable makes a function globally configurable, such that anywhere the function or object is called, gin sets its default arguments/constructor values. This can lead to a lot of unintended side-effects.

@gin.register registers a function or object with gin, and only sets the default argument values when the function or object itself is used as an argument to another function.

To "use gin responsibly", by wrapping most functions with @gin.register so that they can be specified as arguments of more "global" @gin.configurable functions/objects such as ProcessorGroup in the main library and Model, train(), evaluate(), and sample() in ddsp/training.

As you can see in the code, this allows us to flexibly define hyperparameters of most functions without worrying about side-effects. One exception is ddsp.core.oscillator_bank.use_angular_cumsum where we can enable a slower but more accurate algorithm globally.
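To illustrate the difference between the two decorators, here is a small self-contained sketch; the functions are hypothetical and not part of DDSP:

import gin

# Registered: gin binds these defaults only where the function is explicitly
# referenced in a config (e.g. passed as @exp_decay to another binding).
@gin.register
def exp_decay(step, rate=0.96):
    return rate ** step

# Configurable: gin rebinds the defaults everywhere this function is called,
# which is powerful but can cause unintended side-effects.
@gin.configurable
def train(steps=1000, decay_fn=None):
    for step in range(steps):
        lr_scale = decay_fn(step) if decay_fn else 1.0

gin.parse_config("""
train.steps = 500
train.decay_fn = @exp_decay
exp_decay.rate = 0.9
""")

train()  # runs with steps=500 and the configured exp_decay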

Backwards compatibility

For backwards compatibility, we keep track of changes in function signatures in update_gin_config.py, which can be used to update old operative configs to work with the current library.

 

Contributing

We're eager to collaborate with you! See CONTRIBUTING.md for a guide on how to contribute.

 

Citation

If you use this code, please cite it as:

@inproceedings{
  engel2020ddsp,
  title={DDSP: Differentiable Digital Signal Processing},
  author={Jesse Engel and Lamtharn (Hanoi) Hantrakul and Chenjie Gu and Adam Roberts},
  booktitle={International Conference on Learning Representations},
  year={2020},
  url={https://openreview.net/forum?id=B1x1ma4tDr}
}

 

Disclaimer

Functions and classes marked EXPERIMENTAL in their doc string are under active development and very likely to change. They should not be expected to be maintained in their current state.

This is not an official Google product.

Author: Magenta
Source Code: https://github.com/magenta/ddsp 
License: Apache-2.0 License

#python #jupyternotebook 

Elian Harber

Google Maps for Jupyter Notebooks

gmaps

gmaps is a plugin for including interactive Google maps in the IPython Notebook.

Let's plot a heatmap of taxi pickups in San Francisco:

import gmaps
import gmaps.datasets
gmaps.configure(api_key="AI...") # Your Google API key

# load a Numpy array of (latitude, longitude) pairs
locations = gmaps.datasets.load_dataset("taxi_rides")

fig = gmaps.figure()
fig.add_layer(gmaps.heatmap_layer(locations))
fig

docs/source/_images/taxi_example.png

We can also plot choropleth maps using GeoJSON:

from matplotlib.cm import viridis
from matplotlib.colors import to_hex

import gmaps
import gmaps.datasets
import gmaps.geojson_geometries

gmaps.configure(api_key="AI...") # Your Google API key

countries_geojson = gmaps.geojson_geometries.load_geometry('countries') # Load GeoJSON of countries

rows = gmaps.datasets.load_dataset('gini') # 'rows' is a list of tuples
country2gini = dict(rows) # dictionary mapping 'country' -> gini coefficient
min_gini = min(country2gini.values())
max_gini = max(country2gini.values())
gini_range = max_gini - min_gini

def calculate_color(gini):
    """
    Convert the GINI coefficient to a color
    """
    # make gini a number between 0 and 1
    normalized_gini = (gini - min_gini) / gini_range

    # invert gini so that high inequality gives dark color
    inverse_gini = 1.0 - normalized_gini

    # transform the gini coefficient to a matplotlib color
    mpl_color = viridis(inverse_gini)

    # transform from a matplotlib color to a valid CSS color
    gmaps_color = to_hex(mpl_color, keep_alpha=False)

    return gmaps_color

# Calculate a color for each GeoJSON feature
colors = []
for feature in countries_geojson['features']:
    country_name = feature['properties']['name']
    try:
        gini = country2gini[country_name]
        color = calculate_color(gini)
    except KeyError:
        # no GINI for that country: return default color
        color = (0, 0, 0, 0.3)
    colors.append(color)

fig = gmaps.figure()
gini_layer = gmaps.geojson_layer(
    countries_geojson,
    fill_color=colors,
    stroke_color=colors,
    fill_opacity=0.8)
fig.add_layer(gini_layer)
fig

docs/source/_images/geojson-2.png

Or, for coffee fans, a map of all Starbucks in the UK:

import gmaps
import gmaps.datasets
gmaps.configure(api_key="AI...") # Your Google API key

df = gmaps.datasets.load_dataset_as_df('starbucks_kfc_uk')

starbucks_df = df[df['chain_name'] == 'starbucks']
starbucks_df = starbucks_df[['latitude', 'longitude']]

starbucks_layer = gmaps.symbol_layer(
    starbucks_df, fill_color="green", stroke_color="green", scale=2
)
fig = gmaps.figure()
fig.add_layer(starbucks_layer)
fig

docs/source/_images/starbucks-symbols.png

Installation

Installing jupyter-gmaps with conda

The easiest way to install gmaps is with conda:

$ conda install -c conda-forge gmaps

Installing jupyter-gmaps with pip

Make sure that you have enabled ipywidgets widgets extensions:

$ jupyter nbextension enable --py --sys-prefix widgetsnbextension

You can then install gmaps with:

$ pip install gmaps

Then tell Jupyter to load the extension with:

$ jupyter nbextension enable --py --sys-prefix gmaps

Installing jupyter-gmaps for JupyterLab

To use jupyter-gmaps with JupyterLab, you will need to install the jupyter widgets extension for JupyterLab:

$ jupyter labextension install @jupyter-widgets/jupyterlab-manager

You can then install jupyter-gmaps via pip (or conda):

$ pip install gmaps

Next time you open JupyterLab, you will be prompted to rebuild JupyterLab: this is necessary to include the jupyter-gmaps frontend code into your JupyterLab installation. You can also trigger this directly on the command line with:

$ jupyter lab build

Support for JupyterLab pre 1.0

To install jupyter-gmaps with versions of JupyterLab pre 1.0, you will need to pin the version of jupyterlab-manager and of jupyter-gmaps. Find the version of the jupyterlab-manager that you need from this compatibility table. For instance, for JupyterLab 0.35.x:

$ jupyter labextension install @jupyter-widgets/jupyterlab-manager@0.38

Then, install a pinned version of jupyter-gmaps:

$ pip install gmaps==0.8.4

You will then need to rebuild JupyterLab with:

$ jupyter lab build

Google API keys

To access Google maps, gmaps needs a Google API key. This key tells Google who you are, presumably so it can keep track of rate limits and such things. To create an API key, follow the instructions in the documentation. Once you have an API key, pass it to gmaps before creating widgets:

gmaps.configure(api_key="AI...")

Documentation

Documentation for gmaps is available here.

Similar libraries

The current version of this library is inspired by the ipyleaflet notebook widget extension. This extension aims to provide much of the same functionality as gmaps, but for leaflet maps, not Google maps.

Vision and roadmap

Jupyter-gmaps is built for data scientists. Data scientists should be able to visualize geographical data on a map with minimal friction. Beyond just visualization, they should be able to integrate gmaps into their widgets so they can build interactive applications.

We see the priorities of gmaps as:

  • responding to events, like user clicks, so that maps can be used interactively.
  • adding greater flexibility and customisability (e.g. choosing map styles)

Issue reporting and contributing

Report issues using the github issue tracker.

Contributions are welcome. Read the CONTRIBUTING guide to learn how to contribute.

Author: Pbugnion
Source Code: https://github.com/pbugnion/gmaps 
License: View license

#python #jupyternotebook #google-maps #maps 

Royce Reinger

Streaming Over Lightweight Data Transformations

Description

Data augmentation library for Deep Learning, which supports images, segmentation masks, labels and keypoints. Furthermore, SOLT is fast and has OpenCV in its backend. Full auto-generated docs and examples are available here: https://mipt-oulu.github.io/solt/.

Features

  • Support of images, masks and keypoints for all the transforms (including multiple items at the same time)
  • Fast and PyTorch-integrated
  • Convenient and flexible serialization API
  • Excellent documentation
  • Easy to extend
  • 100% Code coverage

Examples

  • Images: Cats
  • Images + Keypoints: Cats
  • Medical Images + Binary Masks: Brain MRI
  • Medical Images + Multiclass Masks: Knee MRI

E.g., the last row is generated using the following stream of transforms.

import solt
import solt.transforms as slt

stream = solt.Stream([
    slt.Rotate(angle_range=(-20, 20), p=1, padding='r'),
    slt.Crop((256, 256)),
    solt.SelectiveStream([
        slt.GammaCorrection(gamma_range=0.5, p=1),
        slt.Noise(gain_range=0.1, p=1),
        slt.Blur()    
    ], n=3)
])

img_aug, mask_aug = stream({'image': img, 'mask': mask})

If you want to visualize the results, you need to modify the execution of the transforms:

img_aug, mask_aug = stream({'image': img, 'mask': mask}, return_torch=False).data

Installation

The most recent version is available via pip:

pip install solt

You can fetch the freshest changes from this repository:

pip install git+https://github.com/MIPT-Oulu/solt

Benchmark

We propose a fair benchmark based on a refactored version of the one proposed by the albumentations team, but here, we also convert the results into a PyTorch tensor and do the ImageNet normalization. The following numbers support a realistic and honest comparison between the libraries (number of images per second; higher is better):

                    albumentations   torchvision            augmentor   solt
                    0.4.3            (Pillow-SIMD) 0.5.0    0.2.8       0.1.9
HorizontalFlip      2253             2549                   2561        3530
VerticalFlip        2380             2557                   2572        3740
RotateAny           1479             1389                   670         2070
Crop224             2566             1966                   1981        4281
Crop128             5467             5738                   5720        7186
Crop64              9285             9112                   9049        10345
Crop32              11979            10550                  10607       12348
Pad300              1642             109                    -           2631
VHFlipRotateCrop    1574             1334                   616         1889
HFlipCrop           2391             1943                   1917        3572

Python and library versions: Python 3.7.0 (default, Oct 9 2018, 10:31:47) [GCC 7.3.0], numpy 1.18.1, pillow-simd 7.0.0.post3, opencv-python 4.2.0.32, scikit-image 0.16.2, scipy 1.4.1.

The code was run on an AMD Threadripper 1900. Please find the details about the benchmark here.

How to contribute

Follow the guidelines described here.

Author

Aleksei Tiulpin, Research Unit of Medical Imaging, Physics and Technology, University of Oulu, Finland.

How to cite

If you use SOLT in your research, please cite it, and don't hesitate to send an email to Aleksei Tiulpin. All the papers that use SOLT are listed here.

@misc{solt2019,
  author       = {Aleksei Tiulpin},
  title        = {SOLT: Streaming over Lightweight Transformations},
  month        = jul,
  year         = 2019,
  version      = {v0.1.9},
  doi          = {10.5281/zenodo.3702819},
  url          = {https://doi.org/10.5281/zenodo.3702819}
}

Author: MIPT-Oulu
Source Code: https://github.com/MIPT-Oulu/solt 
License: MIT License

#python #jupyternotebook 

Semantic Similarity Framework for Knowledge Graph

Introduction

Sematch is an integrated framework for the development, evaluation, and application of semantic similarity for Knowledge Graphs (KGs). It is easy to use Sematch to compute semantic similarity scores of concepts, words and entities. Sematch focuses on specific knowledge-based semantic similarity metrics that rely on structural knowledge in a taxonomy (e.g. depth, path length, least common subsumer) and statistical information contents (corpus-IC and graph-IC). Knowledge-based approaches differ from their counterpart corpus-based approaches, which rely on co-occurrence (e.g. Pointwise Mutual Information) or distributional similarity (e.g. Latent Semantic Analysis, Word2Vec, GloVe). Knowledge-based approaches are usually used for structural KGs, while corpus-based approaches are normally applied to textual corpora.

In text analysis applications, a common pipeline is adopted in using semantic similarity from concept level, to word and sentence level. For example, word similarity is first computed based on similarity scores of WordNet concepts, and sentence similarity is computed by composing word similarity scores. Finally, document similarity could be computed by identifying important sentences, e.g. TextRank.

[Figure: text analysis pipeline from concept similarity to word and sentence similarity]

KG-based applications follow a similar pipeline in using semantic similarity, from concept similarity (e.g. http://dbpedia.org/class/yago/Actor109765278) to entity similarity (e.g. http://dbpedia.org/resource/Madrid). Furthermore, in computing document similarity, entities are extracted and document similarity is computed by composing entity similarity scores.

[Figure: knowledge graph pipeline from concept similarity to entity similarity]

In KGs, concepts usually denote ontology classes while entities refer to ontology instances. Moreover, those concepts are usually constructed into hierarchical taxonomies, such as the DBpedia ontology classes, so quantifying concept similarity in a KG relies on similar semantic information (e.g. path length, depth, least common subsumer, information content) and semantic similarity metrics (e.g. Path, Wu & Palmer, Li, Resnik, Lin, Jiang & Conrath and WPath). In consequence, Sematch provides an integrated framework to develop and evaluate semantic similarity metrics for concepts, words, entities and their applications.


Getting started: 20 minutes to Sematch

Install Sematch

You need to install scientific computing libraries numpy and scipy first. An example of installing them with pip is shown below.

pip install numpy scipy

Depending on your OS, you can use different ways to install them. After successful installation of numpy and scipy, you can install sematch with the following commands.

pip install sematch
python -m sematch.download

Alternatively, you can clone and install the development version of Sematch with setuptools. We recommend updating pip and setuptools first.

git clone https://github.com/gsi-upm/sematch.git
cd sematch
python setup.py install

We also provide a Sematch-Demo Server. You can use it for experimenting with the main functionalities, or take it as an example of using Sematch to develop applications. Please check our Documentation for more details.

Computing Word Similarity

The core module of Sematch measures semantic similarity between concepts that are represented in concept taxonomies. Word similarity is computed based on the maximum semantic similarity of WordNet concepts. You can use Sematch to compute multi-lingual word similarity based on WordNet with various semantic similarity metrics.

from sematch.semantic.similarity import WordNetSimilarity
wns = WordNetSimilarity()

# Computing English word similarity using Li method
wns.word_similarity('dog', 'cat', 'li') # 0.449327301063
# Computing Spanish word similarity using Lin method
wns.monol_word_similarity('perro', 'gato', 'spa', 'lin') #0.876800984373
# Computing Chinese word similarity using the Wu & Palmer method
wns.monol_word_similarity('็‹—', '็Œซ', 'cmn', 'wup') # 0.857142857143
# Computing Spanish and English word similarity using Resnik method
wns.crossl_word_similarity('perro', 'cat', 'spa', 'eng', 'res') #7.91166650904
# Computing Spanish and Chinese word similarity using the Jiang & Conrath method
wns.crossl_word_similarity('perro', '็Œซ', 'spa', 'cmn', 'jcn') #0.31023804699
# Computing Chinese and English word similarity using WPath method
wns.crossl_word_similarity('็‹—', 'cat', 'cmn', 'eng', 'wpath')#0.593666388463

Computing semantic similarity of YAGO concepts.

from sematch.semantic.similarity import YagoTypeSimilarity
sim = YagoTypeSimilarity()

#Measuring YAGO concept similarity through WordNet taxonomy and corpus based information content
sim.yago_similarity('http://dbpedia.org/class/yago/Dancer109989502','http://dbpedia.org/class/yago/Actor109765278', 'wpath') #0.642
sim.yago_similarity('http://dbpedia.org/class/yago/Dancer109989502','http://dbpedia.org/class/yago/Singer110599806', 'wpath') #0.544
#Measuring YAGO concept similarity based on graph-based IC
sim.yago_similarity('http://dbpedia.org/class/yago/Dancer109989502','http://dbpedia.org/class/yago/Actor109765278', 'wpath_graph') #0.423
sim.yago_similarity('http://dbpedia.org/class/yago/Dancer109989502','http://dbpedia.org/class/yago/Singer110599806', 'wpath_graph') #0.328

Computing semantic similarity of DBpedia concepts.

from sematch.semantic.graph import DBpediaDataTransform, Taxonomy
from sematch.semantic.similarity import ConceptSimilarity
concept = ConceptSimilarity(Taxonomy(DBpediaDataTransform()),'models/dbpedia_type_ic.txt')
concept.name2concept('actor')
concept.similarity('http://dbpedia.org/ontology/Actor','http://dbpedia.org/ontology/Film', 'path')
concept.similarity('http://dbpedia.org/ontology/Actor','http://dbpedia.org/ontology/Film', 'wup')
concept.similarity('http://dbpedia.org/ontology/Actor','http://dbpedia.org/ontology/Film', 'li')
concept.similarity('http://dbpedia.org/ontology/Actor','http://dbpedia.org/ontology/Film', 'res')
concept.similarity('http://dbpedia.org/ontology/Actor','http://dbpedia.org/ontology/Film', 'lin')
concept.similarity('http://dbpedia.org/ontology/Actor','http://dbpedia.org/ontology/Film', 'jcn')
concept.similarity('http://dbpedia.org/ontology/Actor','http://dbpedia.org/ontology/Film', 'wpath')

Computing semantic similarity of DBpedia entities.

from sematch.semantic.similarity import EntitySimilarity
sim = EntitySimilarity()
sim.similarity('http://dbpedia.org/resource/Madrid','http://dbpedia.org/resource/Barcelona') #0.409923677282
sim.similarity('http://dbpedia.org/resource/Apple_Inc.','http://dbpedia.org/resource/Steve_Jobs')#0.0904545454545
sim.relatedness('http://dbpedia.org/resource/Madrid','http://dbpedia.org/resource/Barcelona')#0.457984139871
sim.relatedness('http://dbpedia.org/resource/Apple_Inc.','http://dbpedia.org/resource/Steve_Jobs')#0.465991132787

Evaluate semantic similarity metrics with word similarity datasets

from sematch.evaluation import WordSimEvaluation
from sematch.semantic.similarity import WordNetSimilarity
evaluation = WordSimEvaluation()
evaluation.dataset_names()
wns = WordNetSimilarity()
# define similarity metrics
wpath = lambda x, y: wns.word_similarity_wpath(x, y, 0.8)
# evaluate similarity metrics with SimLex dataset
evaluation.evaluate_metric('wpath', wpath, 'noun_simlex')
# perform Steiger's Z significance test
evaluation.statistical_test('wpath', 'path', 'noun_simlex')
# define similarity metrics for Spanish words
wpath_es = lambda x, y: wns.monol_word_similarity(x, y, 'spa', 'path')
# define cross-lingual similarity metrics for English-Spanish
wpath_en_es = lambda x, y: wns.crossl_word_similarity(x, y, 'eng', 'spa', 'wpath')
# evaluate metrics in multilingual word similarity datasets
evaluation.evaluate_metric('wpath_es', wpath_es, 'rg65_spanish')
evaluation.evaluate_metric('wpath_en_es', wpath_en_es, 'rg65_EN-ES')

Evaluate semantic similarity metrics with category classification

Although word similarity correlation is the standard way to evaluate semantic similarity metrics, it relies on human judgements over word pairs, which may not reflect performance in real applications. Therefore, apart from word similarity evaluation, the Sematch evaluation framework also includes a simple aspect category classification task. The task classifies noun concepts such as pasta, noodle, steak and tea into their ontological parent concepts FOOD and DRINKS.

from sematch.evaluation import AspectEvaluation
from sematch.application import SimClassifier, SimSVMClassifier
from sematch.semantic.similarity import WordNetSimilarity

# create aspect classification evaluation
evaluation = AspectEvaluation()
# load the dataset
X, y = evaluation.load_dataset()
# define word similarity function
wns = WordNetSimilarity()
word_sim = lambda x, y: wns.word_similarity(x, y)
# Train and evaluate metrics with unsupervised classification model
simclassifier = SimClassifier.train(zip(X,y), word_sim)
evaluation.evaluate(X,y, simclassifier)

macro average:  (0.65319812882333839, 0.7101245049198579, 0.66317566364913016, None)
micro average:  (0.79210167952791644, 0.79210167952791644, 0.79210167952791644, None)
weighted average:  (0.80842645056024054, 0.79210167952791644, 0.79639496616636352, None)
accuracy:  0.792101679528
             precision    recall  f1-score   support

    SERVICE       0.50      0.43      0.46       519
 RESTAURANT       0.81      0.66      0.73       228
       FOOD       0.95      0.87      0.91      2256
   LOCATION       0.26      0.67      0.37        54
   AMBIENCE       0.60      0.70      0.65       597
     DRINKS       0.81      0.93      0.87       752

avg / total       0.81      0.79      0.80      4406

Matching Entities with type using SPARQL queries

You can use Sematch to download a list of entities having a specific type, using different languages. Sematch will generate SPARQL queries and execute them against the DBpedia SPARQL endpoint.

from sematch.application import Matcher
matcher = Matcher()
# matching scientist entities from DBpedia
matcher.match_type('scientist')
matcher.match_type('cientรญfico', 'spa')
matcher.match_type('็ง‘ๅญฆๅฎถ', 'cmn')
matcher.match_entity_type('movies with Tom Cruise')

Example of automatically generated SPARQL query.

SELECT DISTINCT ?s, ?label, ?abstract WHERE {
    {  
    ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/NuclearPhysicist110364643> . }
 UNION {  
    ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/Econometrician110043491> . }
 UNION {  
    ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/Sociologist110620758> . }
 UNION {  
    ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/Archeologist109804806> . }
 UNION {  
    ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/Neurolinguist110354053> . } 
    ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> . 
    ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label . 
    FILTER( lang(?label) = "en") . 
    ?s <http://dbpedia.org/ontology/abstract> ?abstract . 
    FILTER( lang(?abstract) = "en") .
} LIMIT 5000

Entity feature extraction with Similarity Graph

Apart from semantic matching of entities from DBpedia, you can also use Sematch to extract features of entities and apply semantic similarity analysis using graph-based ranking algorithms. Given a list of objects (concepts, words, entities), Sematch computes their pairwise semantic similarity and generates a similarity graph where nodes denote objects and edges denote similarity scores. Below is an example of using the similarity graph to extract important words from an entity description.

from sematch.semantic.graph import SimGraph
from sematch.semantic.similarity import WordNetSimilarity
from sematch.nlp import Extraction, word_process
from sematch.semantic.sparql import EntityFeatures
from collections import Counter
tom = EntityFeatures().features('http://dbpedia.org/resource/Tom_Cruise')
words = Extraction().extract_nouns(tom['abstract'])
words = word_process(words)
wns = WordNetSimilarity()
word_graph = SimGraph(words, wns.word_similarity)
word_scores = word_graph.page_rank()
words, scores = zip(*Counter(word_scores).most_common(10))
print(words)
(u'picture', u'action', u'number', u'film', u'post', u'sport', 
u'program', u'men', u'performance', u'motion')

Publications

Ganggao Zhu, and Carlos A. Iglesias. "Computing Semantic Similarity of Concepts in Knowledge Graphs." IEEE Transactions on Knowledge and Data Engineering 29.1 (2017): 72-85.

Oscar Araque, Ganggao Zhu, Manuel Garcia-Amado and Carlos A. Iglesias Mining the Opinionated Web: Classification and Detection of Aspect Contexts for Aspect Based Sentiment Analysis, ICDM sentire, 2016.

Ganggao Zhu, and Carlos Angel Iglesias. "Sematch: Semantic Entity Search from Knowledge Graph." SumPre-HSWI@ ESWC. 2015.


Support

You can post bug reports and feature requests in Github issues. Make sure to read our guidelines first. This project is still under active development and approaching its goals. The project is mainly maintained by Ganggao Zhu. You can contact him via gzhu [at] dit.upm.es


Why this name, Sematch and Logo?

The name Sematch is composed of the Spanish "se" and the English "match". It is also an abbreviation of semantic matching, because semantic similarity metrics help to determine the semantic distance between concepts, words and entities, instead of exact matching.

The logo of Sematch is based on the Chinese Yin and Yang symbol, which appears in the I Ching. It loosely corresponds to 0 and 1 in computer science.

Author: Gsi-upm
Source Code: https://github.com/gsi-upm/sematch 
License: View license

#python #jupyternotebook #graph 

Dexter Goodwin

Automatic Extraction Of Relevant Features From Time Series

tsfresh

This repository contains the TSFRESH python package. The abbreviation stands for

"Time Series Feature extraction based on scalable hypothesis tests".

The package provides systematic time-series feature extraction by combining established algorithms from statistics, time-series analysis, signal processing, and nonlinear dynamics with a robust feature selection algorithm. In this context, the term time-series is interpreted in the broadest possible sense, such that any types of sampled data or even event sequences can be characterised.

Spend less time on feature engineering

Data Scientists often spend most of their time either cleaning data or building features. While we cannot change the first thing, the second can be automated. TSFRESH frees your time spent on building features by extracting them automatically. Hence, you have more time to study the newest deep learning paper, read hacker news or build better models.

Automatic extraction of 100s of features

TSFRESH automatically extracts 100s of features from time series. Those features describe basic characteristics of the time series such as the number of peaks, the average or maximal value or more complex features such as the time reversal symmetry statistic.

[Figure: the features extracted from an exemplary time series]

The set of features can then be used to construct statistical or machine learning models on the time series to be used for example in regression or classification tasks.
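As a minimal sketch of the extraction step, using the robot execution failures example data set that ships with the package:

from tsfresh import extract_features
from tsfresh.examples.robot_execution_failures import (
    download_robot_execution_failures,
    load_robot_execution_failures,
)

# Fetch and load the example data: a long-format DataFrame of time series
# plus a boolean target per time series id.
download_robot_execution_failures()
timeseries, y = load_robot_execution_failures()

# Extract hundreds of features per time series id.
features = extract_features(timeseries, column_id="id", column_sort="time")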

Forget irrelevant features

Time series often contain noise, redundancies or irrelevant information. As a result most of the extracted features will not be useful for the machine learning task at hand.

To avoid extracting irrelevant features, the TSFRESH package has a built-in filtering procedure. This filtering procedure evaluates the explaining power and importance of each characteristic for the regression or classification tasks at hand.

It is based on the well developed theory of hypothesis testing and uses a multiple test procedure. As a result the filtering process mathematically controls the percentage of irrelevant extracted features.
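Continuing the sketch above, the filtering step is exposed as select_features; impute is used first to clean NaN and inf values from features that could not be calculated:

from tsfresh import select_features
from tsfresh.utilities.dataframe_functions import impute

# Replace NaN/inf values in place so the hypothesis tests can run.
impute(features)

# Keep only features that are statistically relevant for predicting y.
features_filtered = select_features(features, y)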

The TSFRESH package is described in the following open access paper:

  • Christ, M., Braun, N., Neuffer, J., and Kempa-Liehr A.W. (2018). Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh -- A Python package). Neurocomputing 307 (2018) 72-77, doi: 10.1016/j.neucom.2018.03.067.

The FRESH algorithm is described in the following whitepaper:

  • Christ, M., Kempa-Liehr, A.W., and Feindt, M. (2017). Distributed and parallel time series feature extraction for industrial big data applications. ArXiv e-print 1610.07717, https://arxiv.org/abs/1610.07717.

Due to the fact that tsfresh basically provides time-series feature extraction for free, you can now concentrate on engineering new time-series, like e.g. differences of signals from synchronous measurements, which provide even better time-series features:

  • Kempa-Liehr, A.W., Oram, J., Wong, A., Finch, M., Besier, T. (2020). Feature engineering workflow for activity recognition from synchronized inertial measurement units. In: Pattern Recognition. ACPR 2019. Ed. by M. Cree et al. Vol. 1180. Communications in Computer and Information Science (CCIS). Singapore: Springer 2020, 223โ€“231. doi: 10.1007/978-981-15-3651-9_20.

Systematic time-series feature engineering allows you to work with time-series samples of different lengths, because every time series is projected into a well-defined feature space. This approach allows the design of robust machine learning algorithms in applications with missing data.

  • Kennedy, A., Gemma, N., Rattenbury, N., Kempa-Liehr, A.W. (2021). Modelling the projected separation of microlensing events using systematic time-series feature engineering. Astronomy and Computing 35.100460 (2021), 1โ€“14, doi: 10.1016/j.ascom.2021.100460

Natural language processing of written texts is an example of applying systematic time-series feature engineering to event sequences, which is described in the following open access paper:

  • Tang, Y., Blincoe, K., Kempa-Liehr, A.W. (2020). Enriching Feature Engineering for Short Text Samples by Language Time Series Analysis. EPJ Data Science 9.26 (2020), 1โ€“59. doi: 10.1140/epjds/s13688-020-00244-9

Advantages of tsfresh

TSFRESH has several selling points, for example:

  1. it is field tested
  2. it is unit tested
  3. the filtering process is statistically/mathematically correct
  4. it has a comprehensive documentation
  5. it is compatible with sklearn, pandas and numpy
  6. it allows anyone to easily add their favorite features
  7. it runs both on your local machine and on a cluster

Next steps

If you are interested in the technical workings, see our comprehensive Read the Docs documentation at http://tsfresh.readthedocs.io.

The algorithm, especially the filtering part, is also described in the paper mentioned above.

If you have some questions or feedback you can find the developers in the gitter chatroom.

We appreciate any contributions; if you are interested in helping us to make TSFRESH the biggest archive of feature extraction methods in Python, just head over to our How-To-Contribute instructions.

If you want to try out tsfresh quickly or if you want to integrate it into your workflow, we also have a docker image available:

docker pull nbraun/tsfresh

Acknowledgements

The research and development of TSFRESH was funded in part by the German Federal Ministry of Education and Research under grant number 01IS14004 (project iPRODICT).

Author: Blue-yonder
Source Code: https://github.com/blue-yonder/tsfresh 
License: MIT License

#jupyternotebook #python #data-science #time 

Elian Harber

How to Use Mapbox GL JS To Visualize Data in A Python Jupyter Notebook

Location Data Visualization library for Jupyter Notebooks

Create Mapbox GL JS data visualizations natively in Jupyter Notebooks with Python and Pandas. mapboxgl is a high-performance, interactive, WebGL-based data visualization tool that drops directly into Jupyter. mapboxgl is similar to Folium, which is built on top of the raster Leaflet map library, but offers much higher performance for large data sets thanks to WebGL and Mapbox Vector Tiles.

https://cl.ly/3a0K2m1o2j1A/download/Image%202018-02-22%20at%207.16.58%20PM.png

Try out the interactive map example notebooks from the /examples directory in this repository:

  1. Categorical points
  2. All visualization types
  3. Choropleth Visualization types
  4. Image Visualization types
  5. Raster Tile Visualization types

Installation

$ pip install mapboxgl

Documentation

Documentation is on Read The Docs at https://mapbox-mapboxgl-jupyter.readthedocs-hosted.com/en/latest/.

Usage

The examples directory contains sample Jupyter notebooks demonstrating usage.

import os

import pandas as pd

from mapboxgl.utils import create_color_stops, df_to_geojson
from mapboxgl.viz import CircleViz


# Load data from sample csv
data_url = 'https://raw.githubusercontent.com/mapbox/mapboxgl-jupyter/master/examples/data/points.csv'
df = pd.read_csv(data_url)

# Must be a public token, starting with `pk`
token = os.getenv('MAPBOX_ACCESS_TOKEN')

# Create a geojson file export from a Pandas dataframe
df_to_geojson(df, filename='points.geojson',
              properties=['Avg Medicare Payments', 'Avg Covered Charges', 'date'],
              lat='lat', lon='lon', precision=3)

# Generate data breaks and color stops from colorBrewer
color_breaks = [0,10,100,1000,10000]
color_stops = create_color_stops(color_breaks, colors='YlGnBu')

# Create the viz from the dataframe
viz = CircleViz('points.geojson',
                access_token=token,
                height='400px',
                color_property = "Avg Medicare Payments",
                color_stops = color_stops,
                center = (-95, 40),
                zoom = 3,
                below_layer = 'waterway-label'
              )
viz.show()

Development

Install the python library locally with pip:

$ pip install -e .

To run tests use pytest:

$ pip install mock pytest
$ python -m pytest

To run the Jupyter examples,

$ cd examples
$ pip install jupyter
$ jupyter notebook

We follow the PEP8 style guide for Python for all Python code.

Release process

After merging all relevant PRs for the upcoming release, pull the master branch

  • git checkout master
  • git pull

Update the version number

  • Update the version number in mapboxgl/__init__.py and push directly to master.

Tag the release

  • git tag <version>
  • git push --tags

Setup for pypi (one time only)

  • You'll need to pip install twine and set up your credentials in a ~/.pypirc file.

Create the release files

  • rm dist/* # clean out old releases if they exist
  • python setup.py sdist bdist_wheel

Upload the release files

  • twine upload dist/mapboxgl-*

Author: Mapbox
Source Code: https://github.com/mapbox/mapboxgl-jupyter 
License: MIT License
#python #jupyternotebook #html #datavisualizations 

How to Build Graph Nets in TensorFlow

Graph Nets library

Graph Nets is DeepMind's library for building graph networks in Tensorflow and Sonnet.

Contact graph-nets@google.com for comments and questions.

What are graph networks?

A graph network takes a graph as input and returns a graph as output. The input graph has edge- (E), node- (V), and global-level (u) attributes. The output graph has the same structure, but updated attributes. Graph networks are part of the broader family of "graph neural networks" (Scarselli et al., 2009).

To learn more about graph networks, see our arXiv paper: Relational inductive biases, deep learning, and graph networks.

[Figure: graph network schematic]

Installation

The Graph Nets library can be installed from pip.

This installation is compatible with Linux/Mac OS X, and Python 2.7 and 3.4+.

The library will work with both the CPU and GPU version of TensorFlow, but to allow for that it does not list Tensorflow as a requirement, so you need to install Tensorflow separately if you haven't already done so.

To install the Graph Nets library and use it with TensorFlow 1 and Sonnet 1, run:

(CPU)

$ pip install graph_nets "tensorflow>=1.15,<2" "dm-sonnet<2" "tensorflow_probability<0.9"

(GPU)

$ pip install graph_nets "tensorflow_gpu>=1.15,<2" "dm-sonnet<2" "tensorflow_probability<0.9"

To install the Graph Nets library and use it with TensorFlow 2 and Sonnet 2, run:

(CPU)

$ pip install graph_nets "tensorflow>=2.1.0-rc1" "dm-sonnet>=2.0.0b0" tensorflow_probability

(GPU)

$ pip install graph_nets "tensorflow_gpu>=2.1.0-rc1" "dm-sonnet>=2.0.0b0" tensorflow_probability

The latest version of the library requires TensorFlow >=1.15. For compatibility with earlier versions of TensorFlow, please install v1.0.4 of the Graph Nets library.

Usage example

The following code constructs a simple graph net module and connects it to data.

import graph_nets as gn
import sonnet as snt

# Provide your own functions to generate graph-structured data.
input_graphs = get_graphs()

# Create the graph network.
graph_net_module = gn.modules.GraphNetwork(
    edge_model_fn=lambda: snt.nets.MLP([32, 32]),
    node_model_fn=lambda: snt.nets.MLP([32, 32]),
    global_model_fn=lambda: snt.nets.MLP([32, 32]))

# Pass the input graphs to the graph network, and return the output graphs.
output_graphs = graph_net_module(input_graphs)
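The get_graphs() call above is a placeholder for your own data pipeline. One way to build the input is from data dicts via the library's utils_tf helpers; the field sizes below are illustrative:

import numpy as np
from graph_nets import utils_tf

# One directed graph: 3 nodes with 5 features, 2 edges with 6 features,
# a 4-dimensional global attribute, and edges 0->1 and 1->2.
data_dict = {
    "globals": np.zeros([4], dtype=np.float32),
    "nodes": np.random.randn(3, 5).astype(np.float32),
    "edges": np.random.randn(2, 6).astype(np.float32),
    "senders": np.array([0, 1], dtype=np.int32),
    "receivers": np.array([1, 2], dtype=np.int32),
}

# Pack one or more data dicts into the GraphsTuple the modules consume.
input_graphs = utils_tf.data_dicts_to_graphs_tuple([data_dict])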

Demo Jupyter notebooks

The library includes demos which show how to create, manipulate, and train graph networks to reason about graph-structured data, on a shortest path-finding task, a sorting task, and a physical prediction task. Each demo uses the same graph network architecture, which highlights the flexibility of the approach.

Try the demos in your browser in Colaboratory

To try out the demos without installing anything locally, you can run the demos in your browser (even on your phone) via a cloud Colaboratory backend. Click a demo link below, and follow the instructions in the notebook.


Run "shortest path demo" in browser

The "shortest path demo" creates random graphs, and trains a graph network to label the nodes and edges on the shortest path between any two nodes. Over a sequence of message-passing steps (as depicted by each step's plot), the model refines its prediction of the shortest path.

[Figure: shortest path demo]


Run "sort demo" in browser (Run TF2 version)

The "sort demo" creates lists of random numbers, and trains a graph network to sort the list. After a sequence of message-passing steps, the model makes an accurate prediction of which elements (columns in the figure) come next after each other (rows).

[Figure: sort demo]


Run "physics demo" in browser

The "physics demo" creates random mass-spring physical systems, and trains a graph network to predict the state of the system on the next timestep. The model's next-step predictions can be fed back in as input to create a rollout of a future trajectory. Each subplot below shows the true and predicted mass-spring system states over 50 steps. This is similar to the model and experiments in Battaglia et al. (2016)'s "interaction networks".

[Figure: physics demo]


Run "graph nets basics demo" in browser (Run TF2 version)

The "graph nets basics demo" is a tutorial containing step by step examples about how to create and manipulate graphs, how to feed them into graph networks and how to build custom graph network modules.


Run the demos on your local machine

To install the necessary dependencies, run:

$ pip install jupyter matplotlib scipy

To try the demos, run:

$ cd <path-to-graph-nets-library>/demos
$ jupyter notebook

then open a demo through the Jupyter notebook interface.

Other graph neural network libraries

Check out these high-quality open-source libraries for graph neural networks:

jraph: DeepMind's GNNs/GraphNets library for JAX.

pytorch_geometric: See MetaLayer for an analog of our Graph Nets interface.

Deep Graph Library (DGL).

Author: Deepmind
Source Code: https://github.com/deepmind/graph_nets 
License: Apache-2.0 License

#graph #python #jupyternotebook #deep-learning #tensorflow 

Elian Harber

PySAL: Python Spatial Analysis Library Meta-Package

Python Spatial Analysis Library

PySAL, the Python spatial analysis library, is an open source cross-platform library for geospatial data science, written in Python, with an emphasis on geospatial vector data. It supports the development of high level applications for spatial analysis, such as

  • detection of spatial clusters, hot-spots, and outliers
  • construction of graphs from spatial data
  • spatial regression and statistical modeling on geographically embedded networks
  • spatial econometrics
  • exploratory spatio-temporal data analysis

PySAL Components

PySAL is a family of packages for spatial data science and is divided into four major components:

Lib

solves a wide variety of computational geometry problems, including graph construction from polygonal lattices, lines, and points; construction and interactive editing of spatial weights matrices & graphs; computation of alpha shapes, spatial indices, and spatial-topological relationships; and reading and writing of sparse graph data, as well as pure Python readers of spatial vector data. Unlike other PySAL modules, these functions are exposed together as a single package.

  • libpysal : libpysal provides foundational algorithms and data structures that support the rest of the library. This currently includes the following modules: input/output (io), which provides readers and writers for common geospatial file formats; weights (weights), which provides the main class to store spatial weights matrices, as well as several utilities to manipulate and operate on them; computational geometry (cg), with several algorithms, such as Voronoi tessellations or alpha shapes that efficiently process geometric shapes; and an additional module with example data sets (examples).
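As an illustrative sketch of the weights API, using the classic columbus sample data bundled with libpysal (and assuming geopandas is available):

import geopandas as gpd
import libpysal

# Load the bundled columbus example polygons.
path = libpysal.examples.get_path("columbus.shp")
gdf = gpd.read_file(path)

# Build queen-contiguity spatial weights and row-standardize them.
w = libpysal.weights.Queen.from_dataframe(gdf)
w.transform = "r"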

Explore

The explore layer includes modules to conduct exploratory analysis of spatial and spatio-temporal data. At a high level, packages in explore are focused on enabling the user to better understand patterns in the data and suggest new interesting questions rather than answer existing ones. They include methods to characterize the structure of spatial distributions (either on networks, in continuous space, or on polygonal lattices). In addition, this domain offers methods to examine the dynamics of these distributions, such as how their composition or spatial extent changes over time.

esda : esda implements methods for the analysis of both global (map-wide) and local (focal) spatial autocorrelation, for both continuous and binary data. In addition, the package increasingly offers cutting-edge statistics about boundary strength and measures of aggregation error in statistical analyses.
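Continuing the sketch above, a global Moran's I test on the columbus CRIME variable might look like:

from esda.moran import Moran

# Moran's I for residential crime, using the queen weights built above.
mi = Moran(gdf["CRIME"], w)
print(mi.I, mi.p_sim)  # statistic and pseudo p-value from permutations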

giddy : giddy is an extension of esda to spatio-temporal data. The package hosts state-of-the-art methods that explicitly consider the role of space in the dynamics of distributions over time.

inequality : inequality provides indices for measuring inequality over space and time. These comprise classic measures such as the Theil T information index and the Gini index in mean deviation form, as well as spatially-explicit measures that incorporate the location and spatial configuration of observations in the calculation of inequality measures.

momepy : momepy is a library for the quantitative analysis of urban form - urban morphometrics. It aims to provide a wide range of tools for a systematic and exhaustive analysis of urban form, and can work with a wide range of elements while focusing on building footprints and street networks. momepy stands for Morphological Measuring in Python.

pointpats : pointpats supports the statistical analysis of point data, including methods to characterize the spatial structure of an observed point pattern: a collection of locations where some phenomena of interest have been recorded. This includes measures of centrography which provide overall geometric summaries of the point pattern, including central tendency, dispersion, intensity, and extent.

segregation : the segregation package calculates over 40 different segregation indices and provides a suite of additional features for measurement, visualization, and hypothesis testing that together represent the state of the art in quantitative segregation analysis.

spaghetti : spaghetti supports the spatial analysis of graphs, networks, topology, and inference. It includes functionality for the statistical testing of clusters on networks; a robust all-to-all Dijkstra shortest-path algorithm with multiprocessing support; high-performance geometric and spatial computations using geopandas that are necessary for high-resolution interpolation along networks; and the ability to connect near-network observations onto the network.

Model

In contrast to explore, the model layer focuses on confirmatory analysis. In particular, its packages focus on the estimation of spatial relationships in data with a variety of linear, generalized-linear, generalized-additive, nonlinear, multi-level, and local regression models.

mgwr : mgwr provides scalable algorithms for estimation, inference, and prediction using single- and multi-scale geographically-weighted regression models in a variety of generalized linear model frameworks, as well as model diagnostic tools.

spglm : spglm implements a set of generalized linear regression techniques, including Gaussian, Poisson, and logistic regression, that allow for sparse matrix operations in their computation and estimation to lower memory overhead and decrease computation time.

spint : spint provides a collection of tools to study spatial interaction processes and analyze spatial interaction data. It includes functionality to facilitate the calibration and interpretation of a family of gravity-type spatial interaction models, including those with production constraints, attraction constraints, or a combination of the two.

spreg : spreg supports the estimation of classic and spatial econometric models. Currently it contains methods for estimating standard Ordinary Least Squares (OLS), Two Stage Least Squares (2SLS), and Seemingly Unrelated Regressions (SUR), in addition to various tests of homoskedasticity, normality, spatial randomness, and different types of spatial autocorrelation. It also includes a suite of tests for spatial dependence in models with binary dependent variables.
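
To make this concrete, a classic OLS regression with spatial diagnostics on the bundled Columbus dataset might look like this (a sketch; assumes spreg and libpysal are installed):

import numpy as np
import libpysal
from spreg import OLS

db = libpysal.io.open(libpysal.examples.get_path("columbus.dbf"))
y = np.array(db.by_col("HOVAL")).reshape(-1, 1)          # home values
X = np.array([db.by_col("INC"), db.by_col("CRIME")]).T   # income, crime
w = libpysal.weights.Queen.from_shapefile(
    libpysal.examples.get_path("columbus.shp"))

# OLS with Lagrange multiplier diagnostics for spatial dependence
ols = OLS(y, X, w=w, spat_diag=True,
          name_y="home value", name_x=["income", "crime"])
print(ols.summary)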

spvcm : spvcm provides a general framework for estimating spatially-correlated variance components models. This class of models allows for spatial dependence in the variance components, so that nearby groups may affect one another. It also provides a general-purpose framework for estimating models using Gibbs sampling in Python, accelerated by the numba package.

tobler : tobler provides functionality for areal interpolation and dasymetric mapping. Its name is an homage to the legendary geographer Waldo Tobler, a pioneer of dozens of spatial analytical methods. tobler includes functionality for interpolating data using area-weighted approaches, regression model-based approaches that leverage remotely-sensed raster data as auxiliary information, and hybrid approaches.

access : access aims to make it easy for analysts to calculate measures of spatial accessibility. This work has traditionally had two challenges: [1] calculating accurate travel time matrices at scale and [2] deriving measures of access using the travel times and supply and demand locations. access implements classic spatial access models, allowing easy comparison of methodologies and assumptions.

spopt : spopt is an open-source Python library for solving optimization problems with spatial data. Originating from PySAL's original region module, it is under active development for the inclusion of newly proposed models and methods for regionalization, facility location, and transportation-oriented solutions.

Viz

The viz layer provides functionality to support the creation of geovisualisations and visual representations of outputs from a variety of spatial analyses. Visualization plays a central role in modern spatial/geographic data science. Current packages provide classification methods for choropleth mapping and a common API for linking PySAL outputs to visualization tool-kits in the Python ecosystem.

legendgram : legendgram is a small package that provides "legendgrams": legends that visualize the distribution of observations by color in a given map. These distributional visualizations for map classification schemes assist in analytical cartography and spatial data visualization.

mapclassify : mapclassify provides functionality for choropleth map classification. Currently, fifteen different classification schemes are available, including a highly-optimized implementation of Fisher-Jenks optimal classification. Each scheme inherits a common structure that ensures computations are scalable and supports applications in streaming contexts.
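
For example, classifying an attribute into five classes with the Fisher-Jenks scheme might look like this (a sketch using random stand-in data):

import numpy as np
import mapclassify

y = np.random.lognormal(size=100)         # stand-in for a map attribute
scheme = mapclassify.FisherJenks(y, k=5)  # five optimal classes
print(scheme.bins)     # upper bound of each class
print(scheme.yb[:10])  # class labels of the first ten observations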

splot : splot provides statistical visualizations for spatial analysis. It includes methods for visualizing global and local spatial autocorrelation (through Moran scatterplots and cluster maps), temporal analysis of cluster dynamics (through heatmaps and rose diagrams), and multivariate choropleth mapping (through value-by-alpha maps). A high-level API supports the creation of publication-ready visualizations.

Installation

PySAL is available through Anaconda (in the defaults or conda-forge channel). We recommend installing PySAL from conda-forge:

conda config --add channels conda-forge
conda install pysal

PySAL can also be installed using pip:

pip install pysal

As of version 2.0.0, PySAL supports Python 3 only.

Users who need an older stable version of PySAL that is Python 2 compatible can install version 1.14.3 through conda:

conda install pysal==1.14.3

or through pip:

pip install pysal==1.14.3

Documentation

For help on using PySAL, check out the following resources:

Development

As of version 2.0.0, PySAL is now a collection of affiliated geographic data science packages. Changes to the code for any of the subpackages should be directed at the respective upstream repositories, and not made here. Infrastructural changes for the meta-package, like those for tooling, building the package, and code standards, will be considered.

Development is hosted on github.

Discussions of development, as well as help for users, occur on the developer list as well as on Gitter.

Getting Involved

If you are interested in contributing to PySAL please see our development guidelines.

Bug reports

To search for or report bugs, please see PySAL's issues.

Build Instructions

To build the meta-package pysal, see tools/README.md.

Author: Pysal
Source Code: https://github.com/pysal/pysal 
License: BSD-3-Clause License

#python #jupyternotebook 

PySAL: Python Spatial Analysis Library Meta-Package
Elian  Harber

Elian Harber

1641930420

Ipyleaflet: A Jupyter - Leaflet.js bridge

ipyleaflet

A Jupyter / Leaflet bridge enabling interactive maps in the Jupyter notebook.

Usage

Selecting a basemap for a leaflet map:

Basemap Screencast
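
In code, the pattern looks roughly like this (a minimal sketch; the coordinates and basemap choice are arbitrary):

from ipyleaflet import Map, Marker, basemaps

center = (52.204793, 0.121558)  # arbitrary example coordinates

m = Map(basemap=basemaps.OpenStreetMap.Mapnik, center=center, zoom=9)
m.add_layer(Marker(location=center))
m  # rendering the Map object displays the interactive map in the notebook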

Loading a geojson map:

GeoJSON Screencast

Making use of leafletjs primitives:

Primitives Screencast

Using the splitmap control:

Splitmap Screencast

Displaying velocity data on the top of a map:

Velocity Screencast

Choropleth layer:

Choropleth Screencast

Widget control

Widget Control

Installation

Using conda:

conda install -c conda-forge ipyleaflet

Using pip:

pip install ipyleaflet

If you are using the classic Jupyter Notebook (version < 5.3), you need to run this extra command:

jupyter nbextension enable --py --sys-prefix ipyleaflet

If you are using JupyterLab <=2, you will need to install the JupyterLab extension:

jupyter labextension install @jupyter-widgets/jupyterlab-manager jupyter-leaflet

Installation from sources

For a development installation (requires yarn; you can install it with conda install -c conda-forge yarn):

git clone https://github.com/jupyter-widgets/ipyleaflet.git
cd ipyleaflet
pip install -e .

If you are using the classic Jupyter Notebook you need to install the nbextension:

jupyter nbextension install --py --symlink --sys-prefix ipyleaflet
jupyter nbextension enable --py --sys-prefix ipyleaflet

Note for developers:

  • the -e pip option allows one to modify the Python code in-place. Restart the kernel to see the changes.
  • the --symlink argument on Linux or OS X allows one to modify the JavaScript code in-place. This feature is not available on Windows.

For developing with JupyterLab:

jupyter labextension develop --overwrite ipyleaflet

Documentation

To get started with using ipyleaflet, check out the full documentation:

https://ipyleaflet.readthedocs.io/

Related projects

The ipyleaflet repository includes the jupyter-leaflet npm package, which is a front-end component, and the ipyleaflet Python package, which is the back end for the Python Jupyter kernel.

Similarly, the xleaflet project provides a backend to jupyter-leaflet for the "xeus-cling" C++ Jupyter kernel.

Xleaflet Screencast

Author: Jupyter-widgets
Source Code: https://github.com/jupyter-widgets/ipyleaflet 
License: MIT License

#python #jupyternotebook #datavisualizations 

Ipyleaflet: A Jupyter - Leaflet.js bridge
Royce  Reinger

Royce Reinger

1641884100

VISSL Is FAIR's Library Of Extensible, Modular and Scalable Components

What's New

Below we share, in reverse chronological order, the updates and new releases in VISSL. All VISSL releases are available here.

Introduction

VISSL is a computer VIsion library for state-of-the-art Self-Supervised Learning research with PyTorch. VISSL aims to accelerate the research cycle in self-supervised learning: from designing a new self-supervised task to evaluating the learned representations. Key features include:

Reproducible implementation of SOTA in Self-Supervision: All existing SOTA methods in self-supervision are implemented - SwAV, SimCLR, MoCo(v2), PIRL, NPID, NPID++, DeepClusterV2, ClusterFit, RotNet, Jigsaw. Supervised training is also supported.

Benchmark suite: A variety of benchmark tasks, including linear image classification (places205, imagenet1k, voc07, food, CLEVR, dsprites, UCF101, Stanford Cars, and many more), full finetuning, semi-supervised benchmarks, nearest-neighbor benchmarks, and object detection (Pascal VOC and COCO).

Ease of use: easy to configure through a YAML configuration system based on Hydra.

Modular: Easy to design new tasks and reuse the existing components from other tasks (objective functions, model trunk and heads, data transforms, etc.). The modular components are simple drop-in replacements in yaml config files.

Scalability: Easy to train models on 1 GPU, multiple GPUs, or multiple nodes. Several components for large-scale training are provided as simple config-file plugins: activation checkpointing, ZeRO, FP16, LARC, a stateful data sampler, a data class to handle invalid images, large model backbones like RegNets, etc.

Model Zoo: Over 60 pre-trained self-supervised model weights.
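
To give a flavor of the config-driven workflow described above, launching a pretraining run typically amounts to selecting a config and overriding fields on the command line (a sketch; the config path and overrides below follow the pattern in the VISSL docs, but treat the exact names as illustrative):

python tools/run_distributed_engines.py \
    config=pretrain/simclr/simclr_8node_resnet \
    config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
    config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder]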

Installation

See INSTALL.md.

Getting Started

Install VISSL by following the installation instructions. After installation, please see Getting Started with VISSL and the Colab Notebook to learn about basic usage.

Documentation

Learn more about VISSL in our documentation. Also see projects/ for some projects built on top of VISSL.

Tutorials

Get started with VISSL by trying one of the Colab tutorial notebooks.

Model Zoo and Baselines

We provide a large set of baseline results and trained models available for download in the VISSL Model Zoo.

Contributors

VISSL is written and maintained by Facebook AI Research.

Development

We welcome new contributions to VISSL and we will be actively maintaining this library! Please refer to CONTRIBUTING.md for full instructions on how to run the code, tests and linter, and submit your pull requests.

Citing VISSL

If you find VISSL useful in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@misc{goyal2021vissl,
  author =       {Priya Goyal and Quentin Duval and Jeremy Reizenstein and Matthew Leavitt and Min Xu and
                  Benjamin Lefaudeux and Mannat Singh and Vinicius Reis and Mathilde Caron and Piotr Bojanowski and
                  Armand Joulin and Ishan Misra},
  title =        {VISSL},
  howpublished = {\url{https://github.com/facebookresearch/vissl}},
  year =         {2021}
}

Author: Facebookresearch
Source Code: https://github.com/facebookresearch/vissl 
License: MIT License

#python #jupyternotebook 

VISSL Is FAIR's Library Of Extensible, Modular and Scalable Components
Oral  Brekke

Oral Brekke

1641863280

Scikit-learn Wrappers for Python FastText

skift

scikit-learn wrappers for Python fastText.

>>> import pandas
>>> from skift import FirstColFtClassifier
>>> df = pandas.DataFrame([['woof', 0], ['meow', 1]], columns=['txt', 'lbl'])
>>> sk_clf = FirstColFtClassifier(lr=0.3, epoch=10)
>>> sk_clf.fit(df[['txt']], df['lbl'])
>>> sk_clf.predict([['woof']])
[0]

Contents

1   Installation

Dependencies:

  • numpy
  • scipy
  • scikit-learn
  • The fasttext Python package

pip install skift

2   Configuration

Because fasttext reads input data from files, skift has to dump the input data into temporary files for fasttext to use. A dedicated folder is created for those files on the filesystem. By default, this storage is allocated in the system temporary storage location (i.e. /tmp on *nix systems). To override this default location, use the SKIFT_TEMP_DIR environment variable:

export SKIFT_TEMP_DIR=/path/to/desired/temp/folder

NOTE: The directory will be created if it does not already exist.

3   Features

4   Wrappers

fastText works only on text data, which means that it will only use a single column from a dataset which might contain many feature columns of different types. As such, a common use case is to have the fastText classifier use a single column as input, ignoring other columns. This is especially true when fastText is to be used as one of several classifiers in a stacking classifier, with other classifiers using non-textual features.

skift includes several scikit-learn-compatible wrappers (for the official fastText Python package) which cater to these use cases.

NOTICE: Any additional keyword arguments provided to the classifier constructor, besides those required, will be forwarded to the fastText.train_supervised method on every call to fit.

4.1   Standard wrappers

These wrappers make no additional assumptions on input beyond those commonly made by scikit-learn classifiers; i.e., that the input is a 2D ndarray object or similar.

  • FirstColFtClassifier - An sklearn classifier adapter for fasttext that takes the first column of input ndarray objects as input.
>>> import pandas
>>> from skift import FirstColFtClassifier
>>> df = pandas.DataFrame([['woof', 0], ['meow', 1]], columns=['txt', 'lbl'])
>>> sk_clf = FirstColFtClassifier(lr=0.3, epoch=10)
>>> sk_clf.fit(df[['txt']], df['lbl'])
>>> sk_clf.predict([['woof']])
[0]
  • IdxBasedFtClassifier - An sklearn classifier adapter for fasttext that takes input by column index. This is set on object construction by providing the input_ix parameter to the constructor.
>>> import pandas
>>> from skift import IdxBasedFtClassifier
>>> df = pandas.DataFrame([[5, 'woof', 0], [83, 'meow', 1]], columns=['count', 'txt', 'lbl'])
>>> sk_clf = IdxBasedFtClassifier(input_ix=1, lr=0.4, epoch=6)
>>> sk_clf.fit(df[['count', 'txt']], df['lbl'])
>>> sk_clf.predict([['woof']])
[0]

4.2   pandas-dependent wrappers

These wrappers assume the X parameter given to fit, predict, and predict_proba methods is a pandas.DataFrame object:

  • FirstObjFtClassifier - An sklearn adapter for fasttext using the first column of dtype == object as input.
>>> import pandas
>>> from skift import FirstObjFtClassifier
>>> df = pandas.DataFrame([['woof', 0], ['meow', 1]], columns=['txt', 'lbl'])
>>> sk_clf = FirstObjFtClassifier(lr=0.2)
>>> sk_clf.fit(df[['txt']], df['lbl'])
>>> sk_clf.predict([['woof']])
[0]
  • ColLblBasedFtClassifier - An sklearn adapter for fasttext taking input by column label. This is set on object construction by providing the input_col_lbl parameter to the constructor.
>>> import pandas
>>> from skift import ColLblBasedFtClassifier
>>> df = pandas.DataFrame([['woof', 0], ['meow', 1]], columns=['txt', 'lbl'])
>>> sk_clf = ColLblBasedFtClassifier(input_col_lbl='txt', epoch=8)
>>> sk_clf.fit(df[['txt']], df['lbl'])
>>> sk_clf.predict([['woof']])
[0]
  • SeriesFtClassifier - An sklearn adapter for fasttext taking a Pandas Series as input.
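
By analogy with the wrappers above, usage might look like this (a hedged sketch; consult the skift documentation for the exact call signature of this wrapper):

>>> import pandas
>>> from skift import SeriesFtClassifier
>>> df = pandas.DataFrame([['woof', 0], ['meow', 1]], columns=['txt', 'lbl'])
>>> sk_clf = SeriesFtClassifier(lr=0.3, epoch=10)
>>> sk_clf.fit(df['txt'], df['lbl'])
>>> sk_clf.predict(['woof'])
[0]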

5   Contributing

Package author and current maintainer is Shay Palachy (shay.palachy@gmail.com); you are more than welcome to approach him for help. Contributions are very welcome.

5.1   Installing for development

Clone:

git clone git@github.com:shaypal5/skift.git

Install in development mode, including test dependencies:

cd skift
pip install -e '.[test]'

To also install fasttext, see instructions in the Installation section.

5.2   Running the tests

To run the tests use:

cd skift
pytest

5.3   Adding documentation

The project is documented using the numpy docstring conventions, which were chosen as they are perhaps the most widely-spread conventions that are both supported by common tools such as Sphinx and result in human-readable docstrings. When documenting code you add to this project, follow these conventions.

Additionally, if you update this README.rst file, use python setup.py checkdocs to validate it compiles.

6   Credits

Created by Shay Palachy (shay.palachy@gmail.com).

Fixes: uniaz, crouffer, amirzamli and sgt.

Author: Shaypal5
Source Code: https://github.com/shaypal5/skift 
License: View license

#python #jupyternotebook #text 

Scikit-learn Wrappers for Python FastText

A Python Library for Learning & Evaluating Knowledge Graph Embeddings

PyKEEN (Python KnowlEdge EmbeddiNgs) is a Python package designed to train and evaluate knowledge graph embedding models (incorporating multi-modal information).

Installation

The latest stable version of PyKEEN can be downloaded and installed from PyPI with:

$ pip install pykeen

The latest version of PyKEEN can be installed directly from the source on GitHub with:

$ pip install git+https://github.com/pykeen/pykeen.git

More information about installation (e.g., development mode, Windows installation, Colab, Kaggle, extras) can be found in the installation documentation.

Quickstart

This example shows how to train a model on a dataset and evaluate it.

The fastest way to get up and running is to use the pipeline function. It provides a high-level entry into the extensible functionality of this package. The following example shows how to train and evaluate the TransE model on the Nations dataset. By default, the training loop uses the stochastic local closed world assumption (sLCWA) training approach and evaluates with rank-based evaluation.

from pykeen.pipeline import pipeline

result = pipeline(
    model='TransE',
    dataset='nations',
)

The results are returned in an instance of the PipelineResult dataclass that has attributes for the trained model, the training loop, the evaluation, and more. See the tutorials on using your own dataset, understanding the evaluation, and making novel link predictions.
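
For example, the trained artifacts can be persisted for later reuse (a short sketch; save_to_directory is part of the documented PipelineResult API, and the directory name is arbitrary):

result.save_to_directory('nations_transe')  # writes the model, metrics, and configuration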

PyKEEN is extensible such that:

  • Each model has the same API, so anything from pykeen.models can be dropped in
  • Each training loop has the same API, so pykeen.training.LCWATrainingLoop can be dropped in
  • Triples factories can be generated by the user with pykeen.triples.TriplesFactory, as sketched below
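
As a sketch of the last point, a custom dataset can be wrapped in a triples factory and fed straight into the pipeline (the file path here is hypothetical; TriplesFactory.from_path expects a TSV of head/relation/tail labels):

from pykeen.pipeline import pipeline
from pykeen.triples import TriplesFactory

tf = TriplesFactory.from_path('my_triples.tsv')  # hypothetical path
training, testing = tf.split([0.8, 0.2])         # random train/test split

result = pipeline(
    training=training,
    testing=testing,
    model='TransE',
)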

The full documentation can be found at https://pykeen.readthedocs.io.

Implementation

Below are the models, datasets, training modes, evaluators, and metrics implemented in pykeen.

Datasets (29)

The following datasets are built in to PyKEEN. The citation for each dataset corresponds to either the paper describing the dataset, the first paper published using the dataset with knowledge graph embedding models, or the URL for the dataset if neither of the first two are available. If you want to use a custom dataset, see the Bring Your Own Dataset tutorial. If you have a suggestion for another dataset to include in PyKEEN, please let us know here.

Name | Documentation | Citation | Entities | Relations | Triples
BioKG | pykeen.datasets.BioKG | Walsh et al., 2019 | 105524 | 17 | 2067997
Clinical Knowledge Graph | pykeen.datasets.CKG | Santos et al., 2020 | 7617419 | 11 | 26691525
CN3l Family | pykeen.datasets.CN3l | Chen et al., 2017 | 3206 | 42 | 21777
CoDEx (large) | pykeen.datasets.CoDExLarge | Safavi et al., 2020 | 77951 | 69 | 612437
CoDEx (medium) | pykeen.datasets.CoDExMedium | Safavi et al., 2020 | 17050 | 51 | 206205
CoDEx (small) | pykeen.datasets.CoDExSmall | Safavi et al., 2020 | 2034 | 42 | 36543
ConceptNet | pykeen.datasets.ConceptNet | Speer et al., 2017 | 28370083 | 50 | 34074917
Countries | pykeen.datasets.Countries | Bouchard et al., 2015 | 271 | 2 | 1158
Commonsense Knowledge Graph | pykeen.datasets.CSKG | Ilievski et al., 2020 | 2087833 | 58 | 4598728
DB100K | pykeen.datasets.DB100K | Ding et al., 2018 | 99604 | 470 | 697479
DBpedia50 | pykeen.datasets.DBpedia50 | Shi et al., 2017 | 24624 | 351 | 34421
Drug Repositioning Knowledge Graph | pykeen.datasets.DRKG | gnn4dr/DRKG | 97238 | 107 | 5874257
FB15k | pykeen.datasets.FB15k | Bordes et al., 2013 | 14951 | 1345 | 592213
FB15k-237 | pykeen.datasets.FB15k237 | Toutanova et al., 2015 | 14505 | 237 | 310079
Hetionet | pykeen.datasets.Hetionet | Himmelstein et al., 2017 | 45158 | 24 | 2250197
Kinships | pykeen.datasets.Kinships | Kemp et al., 2006 | 104 | 25 | 10686
Nations | pykeen.datasets.Nations | ZhenfengLei/KGDatasets | 14 | 55 | 1992
OGB BioKG | pykeen.datasets.OGBBioKG | Hu et al., 2020 | 45085 | 51 | 5088433
OGB WikiKG | pykeen.datasets.OGBWikiKG | Hu et al., 2020 | 2500604 | 535 | 17137181
OpenBioLink | pykeen.datasets.OpenBioLink | Breit et al., 2020 | 180992 | 28 | 4563407
OpenBioLink | pykeen.datasets.OpenBioLinkLQ | Breit et al., 2020 | 480876 | 32 | 27320889
Unified Medical Language System | pykeen.datasets.UMLS | ZhenfengLei/KGDatasets | 135 | 46 | 6529
WD50K (triples) | pykeen.datasets.WD50KT | Galkin et al., 2020 | 40107 | 473 | 232344
Wikidata5M | pykeen.datasets.Wikidata5M | Wang et al., 2019 | 4594149 | 822 | 20624239
WK3l-120k Family | pykeen.datasets.WK3l120k | Chen et al., 2017 | 119748 | 3109 | 1375406
WK3l-15k Family | pykeen.datasets.WK3l15k | Chen et al., 2017 | 15126 | 1841 | 209041
WordNet-18 | pykeen.datasets.WN18 | Bordes et al., 2014 | 40943 | 18 | 151442
WordNet-18 (RR) | pykeen.datasets.WN18RR | Toutanova et al., 2015 | 40559 | 11 | 92583
YAGO3-10 | pykeen.datasets.YAGO310 | Mahdisoltani et al., 2015 | 123143 | 37 | 1089000

Models (37)

Name | Reference | Citation
AutoSF | pykeen.models.AutoSF | Zhang et al., 2020
BoxE | pykeen.models.BoxE | Abboud et al., 2020
CompGCN | pykeen.models.CompGCN | Vashishth et al., 2020
ComplEx | pykeen.models.ComplEx | Trouillon et al., 2016
ComplEx Literal | pykeen.models.ComplExLiteral | Kristiadi et al., 2018
ConvE | pykeen.models.ConvE | Dettmers et al., 2018
ConvKB | pykeen.models.ConvKB | Nguyen et al., 2018
Canonical Tensor Decomposition | pykeen.models.CP | Lacroix et al., 2018
CrossE | pykeen.models.CrossE | Zhang et al., 2019
DistMA | pykeen.models.DistMA | Shi et al., 2019
DistMult | pykeen.models.DistMult | Yang et al., 2014
DistMult Literal | pykeen.models.DistMultLiteral | Kristiadi et al., 2018
DistMult Literal (Gated) | pykeen.models.DistMultLiteralGated | Kristiadi et al., 2018
ER-MLP | pykeen.models.ERMLP | Dong et al., 2014
ER-MLP (E) | pykeen.models.ERMLPE | Sharifzadeh et al., 2019
Fixed Model | pykeen.models.FixedModel | Berrendorf et al., 2021
HolE | pykeen.models.HolE | Nickel et al., 2016
KG2E | pykeen.models.KG2E | He et al., 2015
MuRE | pykeen.models.MuRE | Balažević et al., 2019
NodePiece | pykeen.models.NodePiece | Galkin et al., 2021
NTN | pykeen.models.NTN | Socher et al., 2013
PairRE | pykeen.models.PairRE | Chao et al., 2020
ProjE | pykeen.models.ProjE | Shi et al., 2017
QuatE | pykeen.models.QuatE | Zhang et al., 2019
RESCAL | pykeen.models.RESCAL | Nickel et al., 2011
R-GCN | pykeen.models.RGCN | Schlichtkrull et al., 2018
RotatE | pykeen.models.RotatE | Sun et al., 2019
SimplE | pykeen.models.SimplE | Kazemi et al., 2018
Structured Embedding | pykeen.models.StructuredEmbedding | Bordes et al., 2011
TorusE | pykeen.models.TorusE | Ebisu et al., 2018
TransD | pykeen.models.TransD | Ji et al., 2015
TransE | pykeen.models.TransE | Bordes et al., 2013
TransF | pykeen.models.TransF | Feng et al., 2016
TransH | pykeen.models.TransH | Wang et al., 2014
TransR | pykeen.models.TransR | Lin et al., 2015
TuckER | pykeen.models.TuckER | Balažević et al., 2019
Unstructured Model | pykeen.models.UnstructuredModel | Bordes et al., 2014

Losses (13)

Name | Reference | Description
Binary cross entropy (after sigmoid) | pykeen.losses.BCEAfterSigmoidLoss | A module for the numerically unstable version of explicit Sigmoid + BCE loss.
Binary cross entropy (with logits) | pykeen.losses.BCEWithLogitsLoss | A module for the binary cross entropy loss.
Cross entropy | pykeen.losses.CrossEntropyLoss | A module for the cross entropy loss that evaluates the cross entropy after softmax output.
Double Margin | pykeen.losses.DoubleMarginLoss | A limit-based scoring loss, with separate margins for positive and negative elements, from [sun2018].
Focal | pykeen.losses.FocalLoss | A module for the focal loss proposed by [lin2018].
Margin ranking | pykeen.losses.MarginRankingLoss | A module for the pairwise hinge loss (i.e., margin ranking loss).
Mean square error | pykeen.losses.MSELoss | A module for the mean square error loss.
Self-adversarial negative sampling | pykeen.losses.NSSALoss | An implementation of the self-adversarial negative sampling loss function proposed by [sun2019].
Pairwise logistic | pykeen.losses.PairwiseLogisticLoss | The pairwise logistic loss.
Pointwise Hinge | pykeen.losses.PointwiseHingeLoss | A module for the pointwise hinge loss.
Soft margin ranking | pykeen.losses.SoftMarginRankingLoss | A module for the soft pairwise hinge loss (i.e., soft margin ranking loss).
Softplus | pykeen.losses.SoftplusLoss | A module for the pointwise logistic loss (i.e., softplus loss).
Soft Pointwise Hinge | pykeen.losses.SoftPointwiseHingeLoss | A module for the soft pointwise hinge loss.

Regularizers (5)

Name | Reference | Description
combined | pykeen.regularizers.CombinedRegularizer | A convex combination of regularizers.
lp | pykeen.regularizers.LpRegularizer | A simple L_p norm based regularizer.
no | pykeen.regularizers.NoRegularizer | A regularizer which does not perform any regularization.
powersum | pykeen.regularizers.PowerSumRegularizer | A simple x^p based regularizer.
transh | pykeen.regularizers.TransHRegularizer | A regularizer for the soft constraints in TransH.

Training Loops (2)

Name | Reference | Description
lcwa | pykeen.training.LCWATrainingLoop | A training loop that is based upon the local closed world assumption (LCWA).
slcwa | pykeen.training.SLCWATrainingLoop | A training loop that uses the stochastic local closed world assumption training approach.

Negative Samplers (3)

Name | Reference | Description
basic | pykeen.sampling.BasicNegativeSampler | A basic negative sampler.
bernoulli | pykeen.sampling.BernoulliNegativeSampler | An implementation of the Bernoulli negative sampling approach proposed by [wang2014].
pseudotyped | pykeen.sampling.PseudoTypedNegativeSampler | A sampler that accounts for which entities co-occur with a relation.

Stoppers (2)

Name | Reference | Description
early | pykeen.stoppers.EarlyStopper | A harness for early stopping.
nop | pykeen.stoppers.NopStopper | A stopper that does nothing.

Evaluators (2)

Name | Reference | Description
classification | pykeen.evaluation.ClassificationEvaluator | An evaluator that uses classification metrics.
rankbased | pykeen.evaluation.RankBasedEvaluator | A rank-based evaluator for KGE models.

Metrics (37)

Name | Interval | Direction | Description | Type
AUC-ROC | [0, 1] | 📈 | Area Under the ROC Curve | Classification
Accuracy | [0, 1] | 📈 | (TP + TN) / (TP + TN + FP + FN) | Classification
Average Precision | [0, 1] | 📈 | A summary statistic over the precision-recall curve | Classification
Balanced Accuracy | [0, 1] | 📈 | An adjusted version of the accuracy for imbalanced datasets | Classification
Diagnostic Odds Ratio | [0, inf) | 📈 | LR+/LR- | Classification
F1 Score | [0, 1] | 📈 | 2TP / (2TP + FP + FN) | Classification
False Discovery Rate | [0, 1] | 📉 | FP / (FP + TP) | Classification
False Negative Rate | [0, 1] | 📉 | FN / (FN + TP) | Classification
False Omission Rate | [0, 1] | 📉 | FN / (FN + TN) | Classification
False Positive Rate | [0, 1] | 📉 | FP / (FP + TN) | Classification
Fowlkes Mallows Index | [0, 1] | 📈 | √PPV x √TPR | Classification
Informedness | [0, 1] | 📈 | TPR + TNR - 1 | Classification
Markedness | [0, 1] | 📈 | PPV + NPV - 1 | Classification
Matthews Correlation Coefficient | [-1, 1] | 📈 | A balanced measure applicable even with class imbalance | Classification
Negative Likelihood Ratio | [0, inf) | 📉 | FNR / TNR | Classification
Negative Predictive Value | [0, 1] | 📈 | TN / (TN + FN) | Classification
Positive Likelihood Ratio | [0, inf) | 📈 | TPR / FPR | Classification
Positive Predictive Value | [0, 1] | 📈 | TP / (TP + FP) | Classification
Prevalence Threshold | [0, 1] | 📉 | √FPR / (√TPR + √FPR) | Classification
Threat Score | [0, 1] | 📈 | TP / (TP + FN + FP) | Classification
True Negative Rate | [0, 1] | 📈 | TN / (TN + FP) | Classification
True Positive Rate | [0, 1] | 📈 | TP / (TP + FN) | Classification
Adjusted Arithmetic Mean Rank (AAMR) | (0, 2) | 📉 | The mean over all chance-adjusted ranks. | Ranking
Adjusted Arithmetic Mean Rank Index (AAMRI) | [-1, 1] | 📈 | The re-indexed adjusted mean rank (AAMR) | Ranking
Geometric Mean Rank (GMR) | [1, inf) | 📉 | The geometric mean over all ranks. | Ranking
Harmonic Mean Rank (HMR) | [1, inf) | 📉 | The harmonic mean over all ranks. | Ranking
Hits @ K | [0, 1] | 📈 | The relative frequency of ranks not larger than a given k. | Ranking
Inverse Arithmetic Mean Rank (IAMR) | (0, 1] | 📈 | The inverse of the arithmetic mean over all ranks. | Ranking
Inverse Geometric Mean Rank (IGMR) | (0, 1] | 📈 | The inverse of the geometric mean over all ranks. | Ranking
Inverse Median Rank | (0, 1] | 📈 | The inverse of the median over all ranks. | Ranking
Mean Rank (MR) | [1, inf) | 📉 | The arithmetic mean over all ranks. | Ranking
Mean Reciprocal Rank (MRR) | (0, 1] | 📈 | The inverse of the harmonic mean over all ranks. | Ranking
Median Rank | [1, inf) | 📉 | The median over all ranks. | Ranking

Trackers (8)

Name | Reference | Description
console | pykeen.trackers.ConsoleResultTracker | A class that directly prints to console.
csv | pykeen.trackers.CSVResultTracker | Tracking results to a CSV file.
json | pykeen.trackers.JSONResultTracker | Tracking results to a JSON lines file.
mlflow | pykeen.trackers.MLFlowResultTracker | A tracker for MLflow.
neptune | pykeen.trackers.NeptuneResultTracker | A tracker for Neptune.ai.
python | pykeen.trackers.PythonResultTracker | A tracker which stores everything in Python dictionaries.
tensorboard | pykeen.trackers.TensorBoardResultTracker | A tracker for TensorBoard.
wandb | pykeen.trackers.WANDBResultTracker | A tracker for Weights and Biases.

Experimentation

Reproduction

PyKEEN includes a set of curated experimental settings for reproducing past landmark experiments. They can be accessed and run like:

$ pykeen experiments reproduce tucker balazevic2019 fb15k

The three arguments are the model name, the reference, and the dataset. The output directory can optionally be set with -d.

Ablation

PyKEEN includes the ability to specify ablation studies using the hyper-parameter optimization module. They can be run like:

$ pykeen experiments ablation ~/path/to/config.json
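
A minimal ablation config might look something like the following (a hedged sketch; the key names here are illustrative, so consult the PyKEEN ablation documentation for the exact schema):

{
  "metadata": {"title": "Hypothetical TransE vs. DistMult ablation"},
  "ablation": {
    "datasets": ["nations"],
    "models": ["TransE", "DistMult"],
    "losses": ["MarginRankingLoss"],
    "training_loops": ["sLCWA"],
    "optimizers": ["Adam"]
  }
}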

Large-scale Reproducibility and Benchmarking Study

We used PyKEEN to perform a large-scale reproducibility and benchmarking study, which is described in our article:

@article{ali2020benchmarking,
  title={Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework},
  author={Ali, Mehdi and Berrendorf, Max and Hoyt, Charles Tapley and Vermue, Laurent and Galkin, Mikhail and Sharifzadeh, Sahand and Fischer, Asja and Tresp, Volker and Lehmann, Jens},
  journal={arXiv preprint arXiv:2006.13365},
  year={2020}
}

We have made all code, experimental configurations, results, and analyses that led to our interpretations available at https://github.com/pykeen/benchmarking.

Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.

Acknowledgements

Supporters

This project has been supported by several organizations (in alphabetical order):

Funding

The development of PyKEEN has been funded by the following grants:

Funding BodyProgramGrant
DARPAAutomating Scientific Knowledge Extraction (ASKE)HR00111990009
German Federal Ministry of Education and Research (BMBF)Maschinelles Lernen mit Wissensgraphen (MLWin)01IS18050D
German Federal Ministry of Education and Research (BMBF)Munich Center for Machine Learning (MCML)01IS18036A
Innovation Fund Denmark (Innovationsfonden)Danish Center for Big Data Analytics driven Innovation (DABAI)Grand Solutions

Logo

The PyKEEN logo was designed by Carina Steinborn.

Citation

If you have found PyKEEN useful in your work, please consider citing our article:

@article{ali2021pykeen,
    author = {Ali, Mehdi and Berrendorf, Max and Hoyt, Charles Tapley and Vermue, Laurent and Sharifzadeh, Sahand and Tresp, Volker and Lehmann, Jens},
    journal = {Journal of Machine Learning Research},
    number = {82},
    pages = {1--6},
    title = {{PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings}},
    url = {http://jmlr.org/papers/v22/20-825.html},
    volume = {22},
    year = {2021}
}

Author: Pykeen
Source Code: https://github.com/pykeen/pykeen 
License: MIT License

#python #machine-learning #jupyternotebook 

A Python Library for Learning & Evaluating Knowledge Graph Embeddings