1642159380
DeepNeuro
A deep learning python package for neuroimaging data. Focused on validated command-line tools you can use today. Created by the Quantitative Tumor Imaging Lab at the Martinos Center (Harvard-MIT Program in Health Sciences and Technology / Massachusetts General Hospital).
DeepNeuro is an open-source toolset of deep learning applications for neuroimaging. We have several goals for this package:
This package is under active development, but we encourage users both to try the modules with the pre-trained models highlighted below and to try their hand at making their own DeepNeuro modules using the tutorials below.
Install Docker from Docker's website here: https://www.docker.com/get-started. Follow instructions on that link to get Docker set up properly on your workstation.
Install the Docker Engine Utility for NVIDIA GPUs, AKA nvidia-docker. You can find installation instructions at their Github page, here: https://github.com/NVIDIA/nvidia-docker
Pull the DeepNeuro Docker container from https://hub.docker.com/r/qtimlab/deepneuro_segment_gbm/. Use the command "docker pull qtimlab/deepneuro"
If you want to run DeepNeuro outside of a Docker container, you can install the DeepNeuro Python package locally using the pip package manager. On the command line, run pip install deepneuro
If you use DeepNeuro in your published work, please cite:
Beers, A., Brown, J., Chang, K., Hoebel, K., Patel, J., Ly, K. Ina, Tolaney, S.M., Brastianos, P., Rosen, B., Gerstner, E., and Kalpathy-Cramer, J. (2020). DeepNeuro: an open-source deep learning toolbox for neuroimaging. Neuroinformatics. DOI: 10.1007/s12021-020-09477-5. PMID: 32578020
If you use the MRI skull-stripping or glioblastoma segmentation modules, please cite:
Chang, K., Beers, A.L., Bai, H.X., Brown, J.M., Ly, K.I., Li, X., Senders, J.T., Kavouridis, V.K., Boaro, A., Su, C., Bi, W.L., Rapalino, O., Liao, W., Shen, Q., Zhou, H., Xiao, B., Wang, Y., Zhang, P.J., Pinho, M.C., Wen, P.Y., Batchelor, T.T., Boxerman, J.L., Arnaout, O., Rosen, B.R., Gerstner, E.R., Yang, L., Huang, R.Y., and Kalpathy-Cramer, J., 2019. Automatic assessment of glioma burden: A deep learning algorithm for fully automated volumetric and bi-dimensional measurement. Neuro-Oncology. DOI: 10.1093/neuonc/noz106. PMID: 31190077
DeepNeuro is under active development, and you may run into errors or want additional features. Send any questions or requests for methods to qtimlab@gmail.com. You can also submit a Github issue if you run into a bug.
The Center for Clinical Data Science at Massachusetts General Hospital and the Brigham and Women's Hospital provided technical and hardware support for the development of DeepNeuro, including access to graphics processing units. The DeepNeuro project is also indebted to the 3D UNet Github repository by user ellisdg, which formed the original kernel for much of its code in early stages. Long live open source deep learning :)
This software package and the deep learning models within are intended for research purposes only and have not yet been validated for clinical use.
Author: QTIM-Lab
Source Code: https://github.com/QTIM-Lab/DeepNeuro
License: MIT License
1642136760
medpy - Medical Image Processing in Python
MedPy is an image processing library and collection of scripts targeted towards medical (i.e. high dimensional) image processing.
Python 2 is no longer supported, but you can still use the older releases (<=0.3.0).
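To give a flavor of the API, here is a minimal sketch using MedPy's load/save helpers and its anisotropic diffusion filter; the file paths are hypothetical placeholders for your own data:
from medpy.io import load, save
from medpy.filter.smoothing import anisotropic_diffusion

# Load a medical image (hypothetical path); the header keeps spacing/offset metadata.
image_data, image_header = load('image.nii.gz')

# Apply edge-preserving anisotropic diffusion smoothing and save the result
# alongside the original header.
smoothed = anisotropic_diffusion(image_data)
save(smoothed, 'image_smoothed.nii.gz', image_header)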
Author: Loli
Source Code: https://github.com/loli/medpy
License: GPL-3.0 License
1642043640
pyfolio is a Python library for performance and risk analysis of financial portfolios developed by Quantopian Inc. It works well with the Zipline open source backtesting library. Quantopian also offers a fully managed service for professionals that includes Zipline, Alphalens, Pyfolio, FactSet data, and more.
At the core of pyfolio is a so-called tear sheet that consists of various individual plots that provide a comprehensive image of the performance of a trading algorithm. Here's an example of a simple tear sheet analyzing a strategy:
Also see slides of a talk about pyfolio.
To install pyfolio, run:
pip install pyfolio
For development, you may want to use a virtual environment to avoid dependency conflicts between pyfolio and other Python projects you have. To get set up with a virtual env, run:
mkvirtualenv pyfolio
Next, clone this git repository, run python setup.py develop, and edit the library files directly.
If you are on OSX and using a non-framework build of Python, you may need to set your backend:
echo "backend: TkAgg" > ~/.matplotlib/matplotlibrc
A good way to get started is to run the pyfolio examples in a Jupyter notebook. To do this, you first want to start a Jupyter notebook server:
jupyter notebook
From the notebook list page, navigate to the pyfolio examples directory and open a notebook. Execute the code in a notebook cell by clicking on it and hitting Shift+Enter.
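For orientation, the snippet below is a minimal sketch of a typical pyfolio call; the CSV path and column layout are hypothetical stand-ins for your own backtest output, while create_returns_tear_sheet is part of the pyfolio API:
import pandas as pd
import pyfolio as pf

# Daily strategy returns as a pandas Series indexed by date
# (loaded from a hypothetical CSV with a date index and one returns column).
returns = pd.read_csv('my_returns.csv', index_col=0, parse_dates=True).iloc[:, 0]

# Build a tear sheet from the returns series alone.
pf.create_returns_tear_sheet(returns)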
If you find a bug, feel free to open an issue in this repository.
You can also join our mailing list or our Gitter channel.
Please open an issue for support.
If you'd like to contribute, a great place to look is the issues marked with help-wanted.
For a list of core developers and outside collaborators, see the GitHub contributors list.
Author: Quantopian
Source Code: https://github.com/quantopian/pyfolio
License: Apache-2.0 License
1642012200
DDSP is a library of differentiable versions of common DSP functions (such as synthesizers, waveshapers, and filters). This allows these interpretable elements to be used as part of a deep learning model, especially as the output layers for audio generation.
First, follow the steps in the Installation section to install the DDSP package and its dependencies. DDSP modules can be used to generate and manipulate audio from neural network outputs as in this simple example:
import ddsp
# Get synthesizer parameters from a neural network.
outputs = network(inputs)
# Initialize signal processors.
harmonic = ddsp.synths.Harmonic()
# Generates audio from harmonic synthesizer.
audio = harmonic(outputs['amplitudes'],
                 outputs['harmonic_distribution'],
                 outputs['f0_hz'])
Colab notebooks demonstrating some of the neat things you can do with DDSP are in ddsp/colab/demos:
Timbre Transfer: Convert audio between sound sources with pretrained models. Try turning your voice into a violin, or scratching your laptop and seeing how it sounds as a flute :). Pick from a selection of pretrained models or upload your own that you can train with the train_autoencoder demo.
Train Autoencoder: Takes you through all the steps to convert audio files into a dataset and train your own DDSP autoencoder model. You can transfer data and models to/from google drive, and download a .zip file of your trained model to be used with the timbre_transfer demo.
Pitch Detection: Demonstration of self-supervised pitch detection models from the 2020 ICML Workshop paper.
To introduce the main concepts of the library, we have step-by-step colab tutorials for all the major library components in ddsp/colab/tutorials.
The DDSP library consists of a core library (ddsp/) and a self-contained training library (ddsp/training/). The core library is split up into several modules:
Besides the tutorials, each module has its own test file that can be helpful for examples of usage.
Installation
Requires tensorflow version >= 2.1.0, but the core library runs in either eager or graph mode.
sudo apt-get install libsndfile-dev
pip install --upgrade pip
pip install --upgrade ddsp
Overview
The Processor is the main object type and preferred API of the DDSP library. It inherits from tfkl.Layer and can be used like any other differentiable module.
Unlike other layers, Processors (such as Synthesizers and Effects) specifically format their inputs into controls that are physically meaningful. For instance, a synthesizer might need to remove frequencies above the Nyquist frequency to avoid aliasing or ensure that its amplitudes are strictly positive. To this end, they have the methods:
get_controls(): inputs -> controls.
get_signal(): controls -> signal.
__call__(): inputs -> signal (i.e. get_signal(**get_controls())).
Where:
inputs is a variable number of tensor arguments (depending on the processor), often the outputs of a neural network.
controls is a dictionary of tensors scaled and constrained specifically for the processor.
signal is an output tensor (usually audio or a control signal for another processor).
For example, here are some inputs to a Harmonic() synthesizer:
And here are the resulting controls after logarithmically scaling amplitudes, removing harmonics above the Nyquist frequency, and normalizing the remaining harmonic distribution:
Notice that only 18 harmonics are nonzero (sample rate 16 kHz, Nyquist 8 kHz, 18 * 440 = 7920 Hz) and that they sum to 1.0 at all times.
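As a concrete illustration of the inputs -> controls -> signal flow described above, here is a small sketch using a Harmonic() synthesizer; the tensor shapes and random values are illustrative assumptions, not taken from the DDSP docs:
import numpy as np
import ddsp

# Illustrative shapes: batch of 1, 1000 control frames, 20 harmonics.
n_batch, n_frames, n_harmonics = 1, 1000, 20
amps = np.random.randn(n_batch, n_frames, 1).astype(np.float32)
harmonic_distribution = np.random.randn(n_batch, n_frames, n_harmonics).astype(np.float32)
f0_hz = 440.0 * np.ones([n_batch, n_frames, 1], dtype=np.float32)

harmonic = ddsp.synths.Harmonic()
controls = harmonic.get_controls(amps, harmonic_distribution, f0_hz)  # scaled & constrained
audio = harmonic.get_signal(**controls)                               # rendered audio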
Consider the situation where you want to string together a group of Processors. Since Processors are just instances of tfkl.Layer, you could use python control flow, as you would with any other differentiable modules.
In the example below, we have an audio autoencoder that uses a differentiable harmonic+noise synthesizer with reverb to generate audio for a multi-scale spectrogram reconstruction loss.
import ddsp
# Get synthesizer parameters from the input audio.
outputs = network(audio_input)
# Initialize signal processors.
harmonic = ddsp.synths.Harmonic()
filtered_noise = ddsp.synths.FilteredNoise()
reverb = ddsp.effects.TrainableReverb()
spectral_loss = ddsp.losses.SpectralLoss()
# Generate audio.
audio_harmonic = harmonic(outputs['amplitudes'],
                          outputs['harmonic_distribution'],
                          outputs['f0_hz'])
audio_noise = filtered_noise(outputs['magnitudes'])
audio = audio_harmonic + audio_noise
audio = reverb(audio)
# Multi-scale spectrogram reconstruction loss.
loss = spectral_loss(audio, audio_input)
A ProcessorGroup specifies a Directed Acyclic Graph (DAG) of processors. The main advantage of using a ProcessorGroup is that the entire signal processing chain can be specified in a .gin file, removing the need to write code in python for every different configuration of processors.
You can specify the DAG as a list of tuples dag = [(processor, ['input1', 'input2', ...]), ...], where processor is a Processor instance and ['input1', 'input2', ...] is a list of strings specifying input arguments. The output signal of each processor can be referenced as an input by the string 'processor_name/signal', where processor_name is the name of the processor at construction. The ProcessorGroup takes a dictionary of inputs, whose keys can be referenced in the DAG.
import ddsp
import gin
# Get synthesizer parameters from the input audio.
outputs = network(audio_input)
# Initialize signal processors.
harmonic = ddsp.synths.Harmonic()
filtered_noise = ddsp.synths.FilteredNoise()
add = ddsp.processors.Add()
reverb = ddsp.effects.TrainableReverb()
spectral_loss = ddsp.losses.SpectralLoss()
# Processor group DAG
dag = [
    (harmonic, ['amps', 'harmonic_distribution', 'f0_hz']),
    (filtered_noise, ['magnitudes']),
    (add, ['harmonic/signal', 'filtered_noise/signal']),
    (reverb, ['add/signal'])
]
processor_group = ddsp.processors.ProcessorGroup(dag=dag)
# Generate audio.
audio = processor_group(outputs)
# Multi-scale spectrogram reconstruction loss.
loss = spectral_loss(audio, audio_input)
ProcessorGroup (with gin)
The main advantage of a ProcessorGroup is that it can be defined with a .gin file, allowing flexible configurations without having to write new python code for every new DAG.
In the example below, we pretend that an external gin file has been written, which we treat here as a string. After parsing the gin file, the ProcessorGroup will have its arguments configured on construction.
import ddsp
import gin
gin_config = """
import ddsp
processors.ProcessorGroup.dag = [
    (@ddsp.synths.Harmonic(),
     ['amplitudes', 'harmonic_distribution', 'f0_hz']),
    (@ddsp.synths.FilteredNoise(),
     ['magnitudes']),
    (@ddsp.processors.Add(),
     ['filtered_noise/signal', 'harmonic/signal']),
    (@ddsp.effects.TrainableReverb(),
     ['add/signal'])
]
"""
with gin.unlock_config():
    gin.parse_config(gin_config)
# Get synthesizer parameters from the input audio.
outputs = network(audio_input)
# Initialize signal processors, arguments are configured by gin.
processor_group = ddsp.processors.ProcessorGroup()
# Generate audio.
audio = processor_group(outputs)
# Multi-scale spectrogram reconstruction loss.
loss = spectral_loss(audio, audio_input)
A word of caution about gin...
The gin library is a "super power" of dependency injection, and we find it very helpful for our experiments, but with great power comes great responsibility. There are two methods for injecting dependencies with gin.
@gin.configurable makes a function globally configurable, such that anywhere the function or object is called, gin sets its default arguments/constructor values. This can lead to a lot of unintended side-effects.
@gin.register registers a function or object with gin, and only sets the default argument values when the function or object itself is used as an argument to another function.
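To make the distinction concrete, here is a small sketch with two hypothetical functions (not from the DDSP codebase), one wrapped with each decorator:
import gin

@gin.register
def my_loss(weight=1.0):
    # Only touched by gin when explicitly referenced in a config,
    # e.g. as an argument value: train.loss_fn = @my_loss
    return weight

@gin.configurable
def train(loss_fn=None, learning_rate=1e-3):
    # Default argument values can be overridden globally from a .gin file,
    # e.g. train.learning_rate = 1e-4
    return loss_fn, learning_rate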
To "use gin responsibly", by wrapping most functions with @gin.register
so that they can be specified as arguments of more "global" @gin.configurable
functions/objects such as ProcessorGroup
in the main library and Model
, train()
, evaluate()
, and sample()
in ddsp/training
.
As you can see in the code, this allows us to flexibly define hyperparameters of most functions without worrying about side-effects. One exception is ddsp.core.oscillator_bank.use_angular_cumsum, where we can enable a slower but more accurate algorithm globally.
For backwards compatibility, we keep track of changes in function signatures in update_gin_config.py, which can be used to update old operative configs to work with the current library.
Contributing
We're eager to collaborate with you! See CONTRIBUTING.md for a guide on how to contribute.
Citation
If you use this code please cite it as:
@inproceedings{engel2020ddsp,
  title={DDSP: Differentiable Digital Signal Processing},
  author={Jesse Engel and Lamtharn (Hanoi) Hantrakul and Chenjie Gu and Adam Roberts},
  booktitle={International Conference on Learning Representations},
  year={2020},
  url={https://openreview.net/forum?id=B1x1ma4tDr}
}
Disclaimer
Functions and classes marked EXPERIMENTAL in their doc string are under active development and very likely to change. They should not be expected to be maintained in their current state.
This is not an official Google product.
Author: Magenta
Source Code: https://github.com/magenta/ddsp
License: Apache-2.0 License
1642010220
gmaps is a plugin for including interactive Google maps in the IPython Notebook.
Let's plot a heatmap of taxi pickups in San Francisco:
import gmaps
import gmaps.datasets
gmaps.configure(api_key="AI...") # Your Google API key
# load a Numpy array of (latitude, longitude) pairs
locations = gmaps.datasets.load_dataset("taxi_rides")
fig = gmaps.figure()
fig.add_layer(gmaps.heatmap_layer(locations))
fig
We can also plot choropleth maps using GeoJSON:
from matplotlib.cm import viridis
from matplotlib.colors import to_hex
import gmaps
import gmaps.datasets
import gmaps.geojson_geometries
gmaps.configure(api_key="AI...") # Your Google API key
countries_geojson = gmaps.geojson_geometries.load_geometry('countries') # Load GeoJSON of countries
rows = gmaps.datasets.load_dataset('gini') # 'rows' is a list of tuples
country2gini = dict(rows) # dictionary mapping 'country' -> gini coefficient
min_gini = min(country2gini.values())
max_gini = max(country2gini.values())
gini_range = max_gini - min_gini
def calculate_color(gini):
    """
    Convert the GINI coefficient to a color
    """
    # make gini a number between 0 and 1
    normalized_gini = (gini - min_gini) / gini_range
    # invert gini so that high inequality gives dark color
    inverse_gini = 1.0 - normalized_gini
    # transform the gini coefficient to a matplotlib color
    mpl_color = viridis(inverse_gini)
    # transform from a matplotlib color to a valid CSS color
    gmaps_color = to_hex(mpl_color, keep_alpha=False)
    return gmaps_color

# Calculate a color for each GeoJSON feature
colors = []
for feature in countries_geojson['features']:
    country_name = feature['properties']['name']
    try:
        gini = country2gini[country_name]
        color = calculate_color(gini)
    except KeyError:
        # no GINI for that country: return default color
        color = (0, 0, 0, 0.3)
    colors.append(color)

fig = gmaps.figure()
gini_layer = gmaps.geojson_layer(
    countries_geojson,
    fill_color=colors,
    stroke_color=colors,
    fill_opacity=0.8)
fig.add_layer(gini_layer)
fig
Or, for coffee fans, a map of all Starbucks in the UK:
import gmaps
import gmaps.datasets
gmaps.configure(api_key="AI...") # Your Google API key
df = gmaps.datasets.load_dataset_as_df('starbucks_kfc_uk')
starbucks_df = df[df['chain_name'] == 'starbucks']
starbucks_df = starbucks_df[['latitude', 'longitude']]
starbucks_layer = gmaps.symbol_layer(
    starbucks_df, fill_color="green", stroke_color="green", scale=2
)
fig = gmaps.figure()
fig.add_layer(starbucks_layer)
fig
The easiest way to install gmaps is with conda:
$ conda install -c conda-forge gmaps
Make sure that you have enabled the ipywidgets widget extension:
$ jupyter nbextension enable --py --sys-prefix widgetsnbextension
You can then install gmaps with:
$ pip install gmaps
Then tell Jupyter to load the extension with:
$ jupyter nbextension enable --py --sys-prefix gmaps
To use jupyter-gmaps with JupyterLab, you will need to install the jupyter widgets extension for JupyterLab:
$ jupyter labextension install @jupyter-widgets/jupyterlab-manager
You can then install jupyter-gmaps via pip (or conda):
$ pip install gmaps
Next time you open JupyterLab, you will be prompted to rebuild JupyterLab: this is necessary to include the jupyter-gmaps frontend code into your JupyterLab installation. You can also trigger this directly on the command line with:
$ jupyter lab build
To install jupyter-gmaps with versions of JupyterLab pre 1.0, you will need to pin the version of jupyterlab-manager and of jupyter-gmaps. Find the version of the jupyterlab-manager that you need from this compatibility table. For instance, for JupyterLab 0.35.x:
$ jupyter labextension install @jupyter-widgets/jupyterlab-manager@0.38
Then, install a pinned version of jupyter-gmaps:
$ pip install gmaps==0.8.4
You will then need to rebuild JupyterLab with:
$ jupyter lab build
To access Google maps, gmaps needs a Google API key. This key tells Google who you are, presumably so it can keep track of rate limits and such things. To create an API key, follow the instructions in the documentation. Once you have an API key, pass it to gmaps before creating widgets:
gmaps.configure(api_key="AI...")
Documentation for gmaps is available here.
The current version of this library is inspired by the ipyleaflet notebook widget extension. This extension aims to provide much of the same functionality as gmaps, but for leaflet maps, not Google maps.
Jupyter-gmaps is built for data scientists. Data scientists should be able to visualize geographical data on a map with minimal friction. Beyond just visualization, they should be able to integrate gmaps into their widgets so they can build interactive applications.
We see the priorities of gmaps as:
Report issues using the github issue tracker.
Contributions are welcome. Read the CONTRIBUTING guide to learn how to contribute.
Author: Pbugnion
Source Code: https://github.com/pbugnion/gmaps
License: View license
1642004640
Data augmentation library for Deep Learning, which supports images, segmentation masks, labels and keypoints. Furthermore, SOLT is fast and has OpenCV in its backend. Full auto-generated docs and examples are available here: https://mipt-oulu.github.io/solt/.
Example augmentations: images; images + keypoints; medical images + binary masks; medical images + multiclass masks.
For example, the last case (medical images + multiclass masks) is generated using the following transforms stream.
import solt
import solt.transforms as slt

stream = solt.Stream([
    slt.Rotate(angle_range=(-20, 20), p=1, padding='r'),
    slt.Crop((256, 256)),
    solt.SelectiveStream([
        slt.GammaCorrection(gamma_range=0.5, p=1),
        slt.Noise(gain_range=0.1, p=1),
        slt.Blur()
    ], n=3)
])

img_aug, mask_aug = stream({'image': img, 'mask': mask})
If you want to visualize the results, you need to modify the execution of the transforms:
img_aug, mask_aug = stream({'image': img, 'mask': mask}, return_torch=False).data
The most recent version is available in pip:
pip install solt
You can fetch the most fresh changes from this repository:
pip install git+https://github.com/MIPT-Oulu/solt
We propose a fair benchmark based on the refactored version of the one proposed by albumentations team, but here, we also convert the results into a PyTorch tensor and do the ImageNet normalization. The following numbers support a realistic and honest comparison between the libraries (number of images per second, the higher - the better):
| | albumentations 0.4.3 | torchvision (Pillow-SIMD backend) 0.5.0 | augmentor 0.2.8 | solt 0.1.9 |
|---|---|---|---|---|
| HorizontalFlip | 2253 | 2549 | 2561 | 3530 |
| VerticalFlip | 2380 | 2557 | 2572 | 3740 |
| RotateAny | 1479 | 1389 | 670 | 2070 |
| Crop224 | 2566 | 1966 | 1981 | 4281 |
| Crop128 | 5467 | 5738 | 5720 | 7186 |
| Crop64 | 9285 | 9112 | 9049 | 10345 |
| Crop32 | 11979 | 10550 | 10607 | 12348 |
| Pad300 | 1642 | 109 | - | 2631 |
| VHFlipRotateCrop | 1574 | 1334 | 616 | 1889 |
| HFlipCrop | 2391 | 1943 | 1917 | 3572 |
Python and library versions: Python 3.7.0 (default, Oct 9 2018, 10:31:47) [GCC 7.3.0], numpy 1.18.1, pillow-simd 7.0.0.post3, opencv-python 4.2.0.32, scikit-image 0.16.2, scipy 1.4.1.
The code was run on AMD Threadripper 1900. Please find the details about the benchmark here.
Follow the guidelines described here.
Aleksei Tiulpin, Research Unit of Medical Imaging, Physics and Technology, University of Oulu, Finland.
If you use SOLT and cite it in your research, please don't hesitate to send an email to Aleksei Tiulpin. All the papers that use SOLT are listed here.
@misc{solt2019,
author = {Aleksei Tiulpin},
title = {SOLT: Streaming over Lightweight Transformations},
month = jul,
year = 2019,
version = {v0.1.9},
doi = {10.5281/zenodo.3702819},
url = {https://doi.org/10.5281/zenodo.3702819}
}
Author: MIPT-Oulu
Source Code: https://github.com/MIPT-Oulu/solt
License: MIT License
1641989100
Introduction
Sematch is an integrated framework for the development, evaluation, and application of semantic similarity for Knowledge Graphs (KGs). It is easy to use Sematch to compute semantic similarity scores of concepts, words and entities. Sematch focuses on specific knowledge-based semantic similarity metrics that rely on structural knowledge in a taxonomy (e.g. depth, path length, least common subsumer) and statistical information contents (corpus-IC and graph-IC). Knowledge-based approaches differ from their corpus-based counterparts, which rely on co-occurrence (e.g. Pointwise Mutual Information) or distributional similarity (Latent Semantic Analysis, Word2Vec, GloVe, etc.). Knowledge-based approaches are usually used for structural KGs, while corpus-based approaches are normally applied to textual corpora.
In text analysis applications, a common pipeline is adopted in using semantic similarity from concept level, to word and sentence level. For example, word similarity is first computed based on similarity scores of WordNet concepts, and sentence similarity is computed by composing word similarity scores. Finally, document similarity could be computed by identifying important sentences, e.g. TextRank.
KG-based applications follow a similar pipeline in using semantic similarity, from concept similarity (e.g. http://dbpedia.org/class/yago/Actor109765278) to entity similarity (e.g. http://dbpedia.org/resource/Madrid). Furthermore, in computing document similarity, entities are extracted and document similarity is computed by composing entity similarity scores.
In KGs, concepts usually denote ontology classes while entities refer to ontology instances. Moreover, those concepts are usually organized into hierarchical taxonomies, such as the DBpedia ontology classes, so quantifying concept similarity in a KG relies on similar semantic information (e.g. path length, depth, least common subsumer, information content) and semantic similarity metrics (e.g. Path, Wu & Palmer, Li, Resnik, Lin, Jiang & Conrad and WPath). In consequence, Sematch provides an integrated framework to develop and evaluate semantic similarity metrics for concepts, words, entities and their applications.
You need to install scientific computing libraries numpy and scipy first. An example of installing them with pip is shown below.
pip install numpy scipy
Depending on your OS, you can use different ways to install them. After successful installation of numpy and scipy, you can install sematch with the following commands.
pip install sematch
python -m sematch.download
Alternatively, you can clone this repository and install the development version of Sematch with setuptools. We recommend you to update your pip and setuptools first.
git clone https://github.com/gsi-upm/sematch.git
cd sematch
python setup.py install
We also provide a Sematch-Demo Server. You can use it for experimenting with main functionalities or take it as an example for using Sematch to develop applications. Please check our Documentation for more details.
The core module of Sematch measures semantic similarity between concepts that are represented in concept taxonomies. Word similarity is computed based on the maximum semantic similarity of WordNet concepts. You can use Sematch to compute multi-lingual word similarity based on WordNet with various semantic similarity metrics.
from sematch.semantic.similarity import WordNetSimilarity
wns = WordNetSimilarity()
# Computing English word similarity using Li method
wns.word_similarity('dog', 'cat', 'li') # 0.449327301063
# Computing Spanish word similarity using Lin method
wns.monol_word_similarity('perro', 'gato', 'spa', 'lin') #0.876800984373
# Computing Chinese word similarity using Wu & Palmer method
wns.monol_word_similarity('狗', '猫', 'cmn', 'wup') # 0.857142857143
# Computing Spanish and English word similarity using Resnik method
wns.crossl_word_similarity('perro', 'cat', 'spa', 'eng', 'res') #7.91166650904
# Computing Spanish and Chinese word similarity using Jiang & Conrad method
wns.crossl_word_similarity('perro', '猫', 'spa', 'cmn', 'jcn') #0.31023804699
# Computing Chinese and English word similarity using WPath method
wns.crossl_word_similarity('狗', 'cat', 'cmn', 'eng', 'wpath')#0.593666388463
from sematch.semantic.similarity import YagoTypeSimilarity
sim = YagoTypeSimilarity()
#Measuring YAGO concept similarity through WordNet taxonomy and corpus based information content
sim.yago_similarity('http://dbpedia.org/class/yago/Dancer109989502','http://dbpedia.org/class/yago/Actor109765278', 'wpath') #0.642
sim.yago_similarity('http://dbpedia.org/class/yago/Dancer109989502','http://dbpedia.org/class/yago/Singer110599806', 'wpath') #0.544
#Measuring YAGO concept similarity based on graph-based IC
sim.yago_similarity('http://dbpedia.org/class/yago/Dancer109989502','http://dbpedia.org/class/yago/Actor109765278', 'wpath_graph') #0.423
sim.yago_similarity('http://dbpedia.org/class/yago/Dancer109989502','http://dbpedia.org/class/yago/Singer110599806', 'wpath_graph') #0.328
from sematch.semantic.graph import DBpediaDataTransform, Taxonomy
from sematch.semantic.similarity import ConceptSimilarity
concept = ConceptSimilarity(Taxonomy(DBpediaDataTransform()),'models/dbpedia_type_ic.txt')
concept.name2concept('actor')
concept.similarity('http://dbpedia.org/ontology/Actor','http://dbpedia.org/ontology/Film', 'path')
concept.similarity('http://dbpedia.org/ontology/Actor','http://dbpedia.org/ontology/Film', 'wup')
concept.similarity('http://dbpedia.org/ontology/Actor','http://dbpedia.org/ontology/Film', 'li')
concept.similarity('http://dbpedia.org/ontology/Actor','http://dbpedia.org/ontology/Film', 'res')
concept.similarity('http://dbpedia.org/ontology/Actor','http://dbpedia.org/ontology/Film', 'lin')
concept.similarity('http://dbpedia.org/ontology/Actor','http://dbpedia.org/ontology/Film', 'jcn')
concept.similarity('http://dbpedia.org/ontology/Actor','http://dbpedia.org/ontology/Film', 'wpath')
from sematch.semantic.similarity import EntitySimilarity
sim = EntitySimilarity()
sim.similarity('http://dbpedia.org/resource/Madrid','http://dbpedia.org/resource/Barcelona') #0.409923677282
sim.similarity('http://dbpedia.org/resource/Apple_Inc.','http://dbpedia.org/resource/Steve_Jobs')#0.0904545454545
sim.relatedness('http://dbpedia.org/resource/Madrid','http://dbpedia.org/resource/Barcelona')#0.457984139871
sim.relatedness('http://dbpedia.org/resource/Apple_Inc.','http://dbpedia.org/resource/Steve_Jobs')#0.465991132787
from sematch.evaluation import WordSimEvaluation
from sematch.semantic.similarity import WordNetSimilarity
evaluation = WordSimEvaluation()
evaluation.dataset_names()
wns = WordNetSimilarity()
# define similarity metrics
wpath = lambda x, y: wns.word_similarity_wpath(x, y, 0.8)
# evaluate similarity metrics with SimLex dataset
evaluation.evaluate_metric('wpath', wpath, 'noun_simlex')
# perform Steiger's Z significance test
evaluation.statistical_test('wpath', 'path', 'noun_simlex')
# define similarity metrics for Spanish words
wpath_es = lambda x, y: wns.monol_word_similarity(x, y, 'spa', 'path')
# define cross-lingual similarity metrics for English-Spanish
wpath_en_es = lambda x, y: wns.crossl_word_similarity(x, y, 'eng', 'spa', 'wpath')
# evaluate metrics in multilingual word similarity datasets
evaluation.evaluate_metric('wpath_es', wpath_es, 'rg65_spanish')
evaluation.evaluate_metric('wpath_en_es', wpath_en_es, 'rg65_EN-ES')
Although the word similarity correlation measure is the standard way to evaluate semantic similarity metrics, it relies on human judgements over word pairs, which may not have the same performance in real applications. Therefore, apart from word similarity evaluation, the Sematch evaluation framework also includes a simple aspect category classification. The task classifies noun concepts such as pasta, noodle, steak, and tea into their ontological parent concepts FOOD and DRINKS.
from sematch.evaluation import AspectEvaluation
from sematch.application import SimClassifier, SimSVMClassifier
from sematch.semantic.similarity import WordNetSimilarity
# create aspect classification evaluation
evaluation = AspectEvaluation()
# load the dataset
X, y = evaluation.load_dataset()
# define word similarity function
wns = WordNetSimilarity()
word_sim = lambda x, y: wns.word_similarity(x, y)
# Train and evaluate metrics with unsupervised classification model
simclassifier = SimClassifier.train(zip(X,y), word_sim)
evaluation.evaluate(X,y, simclassifier)
macro average: (0.65319812882333839, 0.7101245049198579, 0.66317566364913016, None)
micro average: (0.79210167952791644, 0.79210167952791644, 0.79210167952791644, None)
weighted average: (0.80842645056024054, 0.79210167952791644, 0.79639496616636352, None)
accuracy: 0.792101679528
precision recall f1-score support
SERVICE 0.50 0.43 0.46 519
RESTAURANT 0.81 0.66 0.73 228
FOOD 0.95 0.87 0.91 2256
LOCATION 0.26 0.67 0.37 54
AMBIENCE 0.60 0.70 0.65 597
DRINKS 0.81 0.93 0.87 752
avg / total 0.81 0.79 0.80 4406
You can use Sematch to download a list of entities having a specific type using different languages. Sematch will generate SPARQL queries and execute them against the DBpedia SPARQL endpoint.
from sematch.application import Matcher
matcher = Matcher()
# matching scientist entities from DBpedia
matcher.match_type('scientist')
matcher.match_type('científico', 'spa')
matcher.match_type('科学家', 'cmn')
matcher.match_entity_type('movies with Tom Cruise')
An example of an automatically generated SPARQL query:
SELECT DISTINCT ?s, ?label, ?abstract WHERE {
{
?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/NuclearPhysicist110364643> . }
UNION {
?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/Econometrician110043491> . }
UNION {
?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/Sociologist110620758> . }
UNION {
?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/Archeologist109804806> . }
UNION {
?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/Neurolinguist110354053> . }
?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
FILTER( lang(?label) = "en") .
?s <http://dbpedia.org/ontology/abstract> ?abstract .
FILTER( lang(?abstract) = "en") .
} LIMIT 5000
Apart from semantic matching of entities from DBpedia, you can also use Sematch to extract features of entities and apply semantic similarity analysis using graph-based ranking algorithms. Given a list of objects (concepts, words, entities), Sematch computes their pairwise semantic similarity and generates a similarity graph where nodes denote objects and edges denote similarity scores. Here is an example of using a similarity graph to extract important words from an entity description.
from sematch.semantic.graph import SimGraph
from sematch.semantic.similarity import WordNetSimilarity
from sematch.nlp import Extraction, word_process
from sematch.semantic.sparql import EntityFeatures
from collections import Counter
tom = EntityFeatures().features('http://dbpedia.org/resource/Tom_Cruise')
words = Extraction().extract_nouns(tom['abstract'])
words = word_process(words)
wns = WordNetSimilarity()
word_graph = SimGraph(words, wns.word_similarity)
word_scores = word_graph.page_rank()
words, scores =zip(*Counter(word_scores).most_common(10))
print words
(u'picture', u'action', u'number', u'film', u'post', u'sport',
u'program', u'men', u'performance', u'motion')
Ganggao Zhu, and Carlos A. Iglesias. "Computing Semantic Similarity of Concepts in Knowledge Graphs." IEEE Transactions on Knowledge and Data Engineering 29.1 (2017): 72-85.
Oscar Araque, Ganggao Zhu, Manuel Garcia-Amado and Carlos A. Iglesias Mining the Opinionated Web: Classification and Detection of Aspect Contexts for Aspect Based Sentiment Analysis, ICDM sentire, 2016.
Ganggao Zhu, and Carlos Angel Iglesias. "Sematch: Semantic Entity Search from Knowledge Graph." SumPre-HSWI@ ESWC. 2015.
You can post bug reports and feature requests in Github issues. Make sure to read our guidelines first. This project is still under active development, approaching its goals. The project is mainly maintained by Ganggao Zhu. You can contact him via gzhu [at] dit.upm.es
The name of Sematch is composed of the Spanish "se" and the English "match". It is also an abbreviation of semantic matching, because semantic similarity metrics help to determine the semantic distance of concepts, words and entities, instead of exact matching.
The logo of Sematch is based on the Chinese Yin and Yang, which is described in the I Ching. Somehow, it correlates to 0 and 1 in computer science.
Author: Gsi-upm
Source Code: https://github.com/gsi-upm/sematch
License: View license
1641985680
This repository contains the TSFRESH python package. The abbreviation stands for "Time Series Feature extraction based on scalable hypothesis tests".
The package provides systematic time-series feature extraction by combining established algorithms from statistics, time-series analysis, signal processing, and nonlinear dynamics with a robust feature selection algorithm. In this context, the term time-series is interpreted in the broadest possible sense, such that any types of sampled data or even event sequences can be characterised.
Data Scientists often spend most of their time either cleaning data or building features. While we cannot change the first thing, the second can be automated. TSFRESH frees your time spent on building features by extracting them automatically. Hence, you have more time to study the newest deep learning paper, read hacker news or build better models.
TSFRESH automatically extracts 100s of features from time series. Those features describe basic characteristics of the time series such as the number of peaks, the average or maximal value or more complex features such as the time reversal symmetry statistic.
The set of features can then be used to construct statistical or machine learning models on the time series to be used for example in regression or classification tasks.
Time series often contain noise, redundancies or irrelevant information. As a result most of the extracted features will not be useful for the machine learning task at hand.
To avoid extracting irrelevant features, the TSFRESH package has a built-in filtering procedure. This filtering procedure evaluates the explaining power and importance of each characteristic for the regression or classification tasks at hand.
It is based on the well developed theory of hypothesis testing and uses a multiple test procedure. As a result the filtering process mathematically controls the percentage of irrelevant extracted features.
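As a rough sketch of how extraction and filtering fit together, the following follows the tsfresh quickstart and its bundled robot execution failures example data (the exact feature set and runtime will vary):
from tsfresh import extract_features, select_features
from tsfresh.utilities.dataframe_functions import impute
from tsfresh.examples.robot_execution_failures import (
    download_robot_execution_failures, load_robot_execution_failures)

# Fetch and load the bundled example data: a long-format DataFrame plus a target vector.
download_robot_execution_failures()
timeseries, y = load_robot_execution_failures()

# Extract hundreds of features per time-series id ...
X = extract_features(timeseries, column_id='id', column_sort='time')
impute(X)  # replace NaN/inf values produced by some feature calculators

# ... then keep only the features that are relevant for predicting y.
X_filtered = select_features(X, y)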
The TSFRESH package is described in the following open access paper:
The FRESH algorithm is described in the following whitepaper:
Because tsfresh basically provides time-series feature extraction for free, you can now concentrate on engineering new time-series, e.g. differences of signals from synchronous measurements, which provide even better time-series features:
Systematic time-series feature engineering allows you to work with time-series samples of different lengths, because every time-series is projected into a well-defined feature space. This approach allows the design of robust machine learning algorithms in applications with missing data.
Natural language processing of written texts is an example of applying systematic time-series feature engineering to event sequences, which is described in the following open access paper:
TSFRESH has several selling points, for example
If you are interested in the technical workings, go to see our comprehensive Read-The-Docs documentation at http://tsfresh.readthedocs.io.
The algorithm, especially the filtering part, is also described in the paper mentioned above.
If you have some questions or feedback you can find the developers in the gitter chatroom.
We appreciate any contributions, if you are interested in helping us to make TSFRESH the biggest archive of feature extraction methods in python, just head over to our How-To-Contribute instructions.
If you want to try out tsfresh quickly or if you want to integrate it into your workflow, we also have a docker image available:
docker pull nbraun/tsfresh
The research and development of TSFRESH was funded in part by the German Federal Ministry of Education and Research under grant number 01IS14004 (project iPRODICT).
Author: Blue-yonder
Source Code: https://github.com/blue-yonder/tsfresh
License: MIT License
1641979320
Location Data Visualization library for Jupyter Notebooks
Create Mapbox GL JS data visualizations natively in Jupyter Notebooks with Python and Pandas. mapboxgl is a high-performance, interactive, WebGL-based data visualization tool that drops directly into Jupyter. mapboxgl is similar to Folium, which is built on top of the raster Leaflet map library, but offers much higher performance for large data sets by using WebGL and Mapbox Vector Tiles.
Try out the interactive map example notebooks from the /examples directory in this repository
$ pip install mapboxgl
Documentation is on Read The Docs at https://mapbox-mapboxgl-jupyter.readthedocs-hosted.com/en/latest/.
The examples directory contains sample Jupyter notebooks demonstrating usage.
import os
import pandas as pd
from mapboxgl.utils import create_color_stops, df_to_geojson
from mapboxgl.viz import CircleViz
# Load data from sample csv
data_url = 'https://raw.githubusercontent.com/mapbox/mapboxgl-jupyter/master/examples/data/points.csv'
df = pd.read_csv(data_url)
# Must be a public token, starting with `pk`
token = os.getenv('MAPBOX_ACCESS_TOKEN')
# Create a geojson file export from a Pandas dataframe
df_to_geojson(df, filename='points.geojson',
              properties=['Avg Medicare Payments', 'Avg Covered Charges', 'date'],
              lat='lat', lon='lon', precision=3)
# Generate data breaks and color stops from colorBrewer
color_breaks = [0,10,100,1000,10000]
color_stops = create_color_stops(color_breaks, colors='YlGnBu')
# Create the viz from the dataframe
viz = CircleViz('points.geojson',
                access_token=token,
                height='400px',
                color_property="Avg Medicare Payments",
                color_stops=color_stops,
                center=(-95, 40),
                zoom=3,
                below_layer='waterway-label')
viz.show()
Install the python library locally with pip:
$ pip install -e .
To run tests use pytest:
$ pip install mock pytest
$ python -m pytest
To run the Jupyter examples,
$ cd examples
$ pip install jupyter
$ jupyter notebook
We follow the PEP8 style guide for Python for all Python code.
Release process:
1. After merging all relevant PRs for the upcoming release, pull the master branch:
git checkout master
git pull
2. Update the version number in mapboxgl/__init__.py and push directly to master.
3. Tag the release:
git tag <version>
git push --tags
4. Setup for pypi (one time only).
5. Create the release files:
rm dist/*  # clean out old releases if they exist
python setup.py sdist bdist_wheel
6. Upload the release files:
twine upload dist/mapboxgl-*
Author: Mapbox
Source Code: https://github.com/mapbox/mapboxgl-jupyter
License: MIT License
1641977280
Graph Nets library
Graph Nets is DeepMind's library for building graph networks in Tensorflow and Sonnet.
Contact graph-nets@google.com for comments and questions.
A graph network takes a graph as input and returns a graph as output. The input graph has edge- (E), node- (V), and global-level (u) attributes. The output graph has the same structure, but updated attributes. Graph networks are part of the broader family of "graph neural networks" (Scarselli et al., 2009).
To learn more about graph networks, see our arXiv paper: Relational inductive biases, deep learning, and graph networks.
The Graph Nets library can be installed from pip.
This installation is compatible with Linux/Mac OS X, and Python 2.7 and 3.4+.
The library will work with both the CPU and GPU version of TensorFlow, but to allow for that it does not list Tensorflow as a requirement, so you need to install Tensorflow separately if you haven't already done so.
To install the Graph Nets library and use it with TensorFlow 1 and Sonnet 1, run:
(CPU)
$ pip install graph_nets "tensorflow>=1.15,<2" "dm-sonnet<2" "tensorflow_probability<0.9"
(GPU)
$ pip install graph_nets "tensorflow_gpu>=1.15,<2" "dm-sonnet<2" "tensorflow_probability<0.9"
To install the Graph Nets library and use it with TensorFlow 2 and Sonnet 2, run:
(CPU)
$ pip install graph_nets "tensorflow>=2.1.0-rc1" "dm-sonnet>=2.0.0b0" tensorflow_probability
(GPU)
$ pip install graph_nets "tensorflow_gpu>=2.1.0-rc1" "dm-sonnet>=2.0.0b0" tensorflow_probability
The latest version of the library requires TensorFlow >=1.15. For compatibility with earlier versions of TensorFlow, please install v1.0.4 of the Graph Nets library.
The following code constructs a simple graph net module and connects it to data.
import graph_nets as gn
import sonnet as snt
# Provide your own functions to generate graph-structured data.
input_graphs = get_graphs()
# Create the graph network.
graph_net_module = gn.modules.GraphNetwork(
    edge_model_fn=lambda: snt.nets.MLP([32, 32]),
    node_model_fn=lambda: snt.nets.MLP([32, 32]),
    global_model_fn=lambda: snt.nets.MLP([32, 32]))
# Pass the input graphs to the graph network, and return the output graphs.
output_graphs = graph_net_module(input_graphs)
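For context, get_graphs() above is a placeholder. One way to build graph-structured input, sketched here with made-up shapes and feature sizes, is to describe each graph as a dictionary and convert it with the library's utilities (graph_nets.utils_np.data_dicts_to_graphs_tuple):
import numpy as np
from graph_nets import utils_np

# One graph with 3 nodes, 2 directed edges, and a single global feature.
data_dict = {
    "globals": np.array([0.0], dtype=np.float32),
    "nodes": np.random.randn(3, 5).astype(np.float32),   # 3 nodes, 5 features each
    "edges": np.random.randn(2, 4).astype(np.float32),   # 2 edges, 4 features each
    "senders": np.array([0, 1]),                          # edge source node indices
    "receivers": np.array([1, 2]),                        # edge target node indices
}
input_graphs = utils_np.data_dicts_to_graphs_tuple([data_dict])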
The library includes demos which show how to create, manipulate, and train graph networks to reason about graph-structured data, on a shortest path-finding task, a sorting task, and a physical prediction task. Each demo uses the same graph network architecture, which highlights the flexibility of the approach.
To try out the demos without installing anything locally, you can run the demos in your browser (even on your phone) via a cloud Colaboratory backend. Click a demo link below, and follow the instructions in the notebook.
The "shortest path demo" creates random graphs, and trains a graph network to label the nodes and edges on the shortest path between any two nodes. Over a sequence of message-passing steps (as depicted by each step's plot), the model refines its prediction of the shortest path.
The "sort demo" creates lists of random numbers, and trains a graph network to sort the list. After a sequence of message-passing steps, the model makes an accurate prediction of which elements (columns in the figure) come next after each other (rows).
The "physics demo" creates random mass-spring physical systems, and trains a graph network to predict the state of the system on the next timestep. The model's next-step predictions can be fed back in as input to create a rollout of a future trajectory. Each subplot below shows the true and predicted mass-spring system states over 50 steps. This is similar to the model and experiments in Battaglia et al. (2016)'s "interaction networks".
The "graph nets basics demo" is a tutorial containing step by step examples about how to create and manipulate graphs, how to feed them into graph networks and how to build custom graph network modules.
To install the necessary dependencies, run:
$ pip install jupyter matplotlib scipy
To try the demos, run:
$ cd <path-to-graph-nets-library>/demos
$ jupyter notebook
then open a demo through the Jupyter notebook interface.
Check out these high-quality open-source libraries for graph neural networks:
jraph: DeepMind's GNNs/GraphNets library for JAX.
pytorch_geometric: See MetaLayer for an analog of our Graph Nets interface.
Author: Deepmind
Source Code: https://github.com/deepmind/graph_nets
License: Apache-2.0 License
1641971727
PySAL, the Python spatial analysis library, is an open source cross-platform library for geospatial data science with an emphasis on geospatial vector data written in Python. It supports the development of high level applications for spatial analysis, such as
PySAL is a family of packages for spatial data science and is divided into four major components:
The lib layer solves a wide variety of computational geometry problems, including graph construction from polygonal lattices, lines, and points; construction and interactive editing of spatial weights matrices & graphs; computation of alpha shapes, spatial indices, and spatial-topological relationships; and reading and writing of sparse graph data, as well as pure python readers of spatial vector data. Unlike other PySAL modules, these functions are exposed together as a single package.
libpysal provides foundational algorithms and data structures that support the rest of the library. This currently includes the following modules: input/output (io), which provides readers and writers for common geospatial file formats; weights (weights), which provides the main class to store spatial weights matrices, as well as several utilities to manipulate and operate on them; computational geometry (cg), with several algorithms, such as Voronoi tessellations or alpha shapes, that efficiently process geometric shapes; and an additional module with example data sets (examples).
The explore layer includes modules to conduct exploratory analysis of spatial and spatio-temporal data. At a high level, packages in explore are focused on enabling the user to better understand patterns in the data and suggest new interesting questions rather than answer existing ones. They include methods to characterize the structure of spatial distributions (either on networks, in continuous space, or on polygonal lattices). In addition, this domain offers methods to examine the dynamics of these distributions, such as how their composition or spatial extent changes over time.
esda: esda implements methods for the analysis of both global (map-wide) and local (focal) spatial autocorrelation, for both continuous and binary data. In addition, the package increasingly offers cutting-edge statistics about boundary strength and measures of aggregation error in statistical analyses (a short esda sketch follows this list).
giddy: giddy is an extension of esda to spatio-temporal data. The package hosts state-of-the-art methods that explicitly consider the role of space in the dynamics of distributions over time.
inequality: inequality provides indices for measuring inequality over space and time. These comprise classic measures such as the Theil T information index and the Gini index in mean deviation form, but also spatially-explicit measures that incorporate the location and spatial configuration of observations in the calculation of inequality measures.
momepy: momepy is a library for quantitative analysis of urban form - urban morphometrics. It aims to provide a wide range of tools for a systematic and exhaustive analysis of urban form. It can work with a wide range of elements, while focused on building footprints and street networks. momepy stands for Morphological Measuring in Python.
pointpats: pointpats supports the statistical analysis of point data, including methods to characterize the spatial structure of an observed point pattern: a collection of locations where some phenomena of interest have been recorded. This includes measures of centrography which provide overall geometric summaries of the point pattern, including central tendency, dispersion, intensity, and extent.
segregation: the segregation package calculates over 40 different segregation indices and provides a suite of additional features for measurement, visualization, and hypothesis testing that together represent the state-of-the-art in quantitative segregation analysis.
spaghetti: spaghetti supports the spatial analysis of graphs, networks, topology, and inference. It includes functionality for the statistical testing of clusters on networks, a robust all-to-all Dijkstra shortest path algorithm with multiprocessing functionality, high-performance geometric and spatial computations using geopandas that are necessary for high-resolution interpolation along networks, and the ability to connect near-network observations onto the network.
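As promised above, here is a minimal, self-contained sketch of global spatial autocorrelation with esda; the regular 10x10 lattice and random values are illustrative assumptions, so Moran's I should come out near zero:
import numpy as np
from libpysal.weights import lat2W
from esda.moran import Moran

# Rook-contiguity spatial weights for a 10x10 regular lattice.
w = lat2W(10, 10)

# A random variable on the lattice; no spatial structure is expected.
y = np.random.random(100)

mi = Moran(y, w)
print(mi.I, mi.p_sim)  # Moran's I statistic and its permutation-based pseudo p-value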
In contrast to explore, the model layer focuses on confirmatory analysis. In particular, its packages focus on the estimation of spatial relationships in data with a variety of linear, generalized-linear, generalized-additive, nonlinear, multi-level, and local regression models.
mgwr: mgwr provides scalable algorithms for estimation, inference, and prediction using single- and multi-scale geographically-weighted regression models in a variety of generalized linear model frameworks, as well as model diagnostics tools.
spglm: spglm implements a set of generalized linear regression techniques, including Gaussian, Poisson, and Logistic regression, that allow for sparse matrix operations in their computation and estimation to lower memory overhead and decrease computation time.
spint: spint provides a collection of tools to study spatial interaction processes and analyze spatial interaction data. It includes functionality to facilitate the calibration and interpretation of a family of gravity-type spatial interaction models, including those with production constraints, attraction constraints, or a combination of the two.
spreg: spreg supports the estimation of classic and spatial econometric models. Currently it contains methods for estimating standard Ordinary Least Squares (OLS), Two Stage Least Squares (2SLS) and Seemingly Unrelated Regressions (SUR), in addition to various tests of homoskedasticity, normality, spatial randomness, and different types of spatial autocorrelation. It also includes a suite of tests for spatial dependence in models with binary dependent variables (see the short sketch after this list).
spvcm: spvcm provides a general framework for estimating spatially-correlated variance components models. This class of models allows for spatial dependence in the variance components, so that nearby groups may affect one another. It also provides a general-purpose framework for estimating models using Gibbs sampling in Python, accelerated by the numba package.
tobler: tobler provides functionality for areal interpolation and dasymetric mapping. Its name is an homage to the legendary geographer Waldo Tobler, a pioneer of dozens of spatial analytical methods. tobler includes functionality for interpolating data using area-weighted approaches, regression model-based approaches that leverage remotely-sensed raster data as auxiliary information, and hybrid approaches.
access: access aims to make it easy for analysts to calculate measures of spatial accessibility. This work has traditionally had two challenges: [1] to calculate accurate travel time matrices at scale and [2] to derive measures of access using the travel times and supply and demand locations. access implements classic spatial access models, allowing easy comparison of methodologies and assumptions.
spopt: spopt is an open-source Python library for solving optimization problems with spatial data. Originating from the original region module in PySAL, it is under active development for the inclusion of newly proposed models and methods for regionalization, facility location, and transportation-oriented solutions.
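As referenced in the spreg entry above, here is a minimal sketch of a classic (non-spatial) OLS fit with spreg; the synthetic data and coefficient values are purely illustrative:
import numpy as np
from spreg import OLS

# Synthetic data: 100 observations, two explanatory variables.
n = 100
x = np.random.random((n, 2))
y = 1.0 + x @ np.array([[0.5], [2.0]]) + np.random.normal(scale=0.1, size=(n, 1))

model = OLS(y, x, name_y='y', name_x=['x1', 'x2'])
print(model.summary)  # regression diagnostics and coefficient estimates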
The viz layer provides functionality to support the creation of geovisualisations and visual representations of outputs from a variety of spatial analyses. Visualization plays a central role in modern spatial/geographic data science. Current packages provide classification methods for choropleth mapping and a common API for linking PySAL outputs to visualization tool-kits in the Python ecosystem.
legendgram: legendgram is a small package that provides "legendgrams", legends that visualize the distribution of observations by color in a given map. These distributional visualizations for map classification schemes assist in analytical cartography and spatial data visualization.
mapclassify: mapclassify provides functionality for choropleth map classification. Currently, fifteen different classification schemes are available, including a highly-optimized implementation of Fisher-Jenks optimal classification. Each scheme inherits a common structure that ensures computations are scalable and supports applications in streaming contexts (see the short sketch after this list).
splot: splot provides statistical visualizations for spatial analysis. It offers methods for visualizing global and local spatial autocorrelation (through Moran scatterplots and cluster maps), temporal analysis of cluster dynamics (through heatmaps and rose diagrams), and multivariate choropleth mapping (through value-by-alpha maps). A high level API supports the creation of publication-ready visualizations.
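As referenced in the mapclassify entry above, a tiny sketch of map classification (the random values stand in for an attribute you would map):
import numpy as np
import mapclassify

# Classify 100 random attribute values into 5 quantile bins for a choropleth map.
y = np.random.random(100) * 1000
classifier = mapclassify.Quantiles(y, k=5)
print(classifier)  # shows bin edges and counts per class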
Installation
PySAL is available through Anaconda (in the defaults or conda-forge channel). We recommend installing PySAL from conda-forge:
conda config --add channels conda-forge
conda install pysal
PySAL can also be installed using pip:
pip install pysal
As of version 2.0.0 PySAL has shifted to Python 3 only.
Users who need an older stable version of PySAL that is Python 2 compatible can install version 1.14.3 through pip or conda:
conda install pysal==1.14.3
Documentation
For help on using PySAL, check out the following resources:
Development
As of version 2.0.0, PySAL is now a collection of affiliated geographic data science packages. Changes to the code for any of the subpackages should be directed at the respective upstream repositories, and not made here. Infrastructural changes for the meta-package, like those for tooling, building the package, and code standards, will be considered.
Development is hosted on github.
Discussions of development as well as help for users occurs on the developer list as well as gitter.
Getting Involved
If you are interested in contributing to PySAL please see our development guidelines.
Bug reports
To search for or report bugs, please see PySAL's issues.
Build Instructions
To build the meta-package pysal see tools/README.md.
Author: Pysal
Source Code: https://github.com/pysal/pysal
License: BSD-3-Clause License
1641930420
A Jupyter / Leaflet bridge enabling interactive maps in the Jupyter notebook.
Using conda:
conda install -c conda-forge ipyleaflet
Using pip:
pip install ipyleaflet
If you are using the classic Jupyter Notebook < 5.3 you need to run this extra command:
jupyter nbextension enable --py --sys-prefix ipyleaflet
If you are using JupyterLab <=2, you will need to install the JupyterLab extension:
jupyter labextension install @jupyter-widgets/jupyterlab-manager jupyter-leaflet
For a development installation (requires yarn, you can install it with conda install -c conda-forge yarn):
git clone https://github.com/jupyter-widgets/ipyleaflet.git
cd ipyleaflet
pip install -e .
If you are using the classic Jupyter Notebook you need to install the nbextension:
jupyter nbextension install --py --symlink --sys-prefix ipyleaflet
jupyter nbextension enable --py --sys-prefix ipyleaflet
Notes for developers:
The -e pip option allows one to modify the Python code in-place. Restart the kernel in order to see the changes.
The --symlink argument on Linux or OS X allows one to modify the JavaScript code in-place. This feature is not available on Windows.
For developing with JupyterLab:
jupyter labextension develop --overwrite ipyleaflet
To get started with ipyleaflet, check out the full documentation: https://ipyleaflet.readthedocs.io/
The ipyleaflet repository includes the jupyter-leaflet npm package, which is a front-end component, and the ipyleaflet Python package, which is the backend for the Python Jupyter kernel.
Similarly, the xleaflet project provides a backend to jupyter-leaflet for the "xeus-cling" C++ Jupyter kernel.
Author: Jupyter-widgets
Source Code: https://github.com/jupyter-widgets/ipyleaflet
License: MIT License
1641884100
Below we share, in reverse chronological order, the updates and new releases in VISSL. All VISSL releases are available here.
VISSL is a computer VIsion library for state-of-the-art Self-Supervised Learning research with PyTorch. VISSL aims to accelerate the research cycle in self-supervised learning: from designing a new self-supervised task to evaluating the learned representations. Key features include:
Reproducible implementation of SOTA in Self-Supervision: All existing SOTA in Self-Supervision are implemented - SwAV, SimCLR, MoCo(v2), PIRL, NPID, NPID++, DeepClusterV2, ClusterFit, RotNet, Jigsaw. It also supports supervised training.
Benchmark suite: A variety of benchmark tasks including linear image classification (places205, imagenet1k, voc07, food, CLEVR, dsprites, UCF101, stanford cars and many more), full finetuning, semi-supervised benchmarks, nearest-neighbor benchmarks, and object detection (Pascal VOC and COCO).
Ease of Usability: easy to use via a YAML configuration system based on Hydra.
Modular: Easy to design new tasks and reuse the existing components from other tasks (objective functions, model trunk and heads, data transforms, etc.). The modular components are simple drop-in replacements in yaml config files.
Scalability: Easy to train models on 1 GPU, multi-GPU, and multi-node setups. Several components for large-scale training are provided as simple config-file plugins: activation checkpointing, ZeRO, FP16, LARC, a stateful data sampler, a data class to handle invalid images, large model backbones like RegNets, etc.
Model Zoo: Over 60 pre-trained self-supervised model weights.
See INSTALL.md.
Install VISSL by following the installation instructions. After installation, please see Getting Started with VISSL and the Colab Notebook to learn about basic usage.
Learn more about VISSL at our documentation, and see the projects/ directory for some projects built on top of VISSL.
Get started with VISSL by trying one of the Colab tutorial notebooks.
We provide a large set of baseline results and trained models available for download in the VISSL Model Zoo.
VISSL is written and maintained by Facebook AI Research.
We welcome new contributions to VISSL and we will be actively maintaining this library! Please refer to CONTRIBUTING.md for full instructions on how to run the code, tests and linter, and submit your pull requests.
If you find VISSL useful in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.
@misc{goyal2021vissl,
author = {Priya Goyal and Quentin Duval and Jeremy Reizenstein and Matthew Leavitt and Min Xu and
Benjamin Lefaudeux and Mannat Singh and Vinicius Reis and Mathilde Caron and Piotr Bojanowski and
Armand Joulin and Ishan Misra},
title = {VISSL},
howpublished = {\url{https://github.com/facebookresearch/vissl}},
year = {2021}
}
Author: Facebookresearch
Source Code: https://github.com/facebookresearch/vissl
License: MIT License
1641863280
skift: scikit-learn wrappers for Python fastText.
>>> import pandas
>>> from skift import FirstColFtClassifier
>>> df = pandas.DataFrame([['woof', 0], ['meow', 1]], columns=['txt', 'lbl'])
>>> sk_clf = FirstColFtClassifier(lr=0.3, epoch=10)
>>> sk_clf.fit(df[['txt']], df['lbl'])
>>> sk_clf.predict([['woof']])
[0]
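Since the wrappers adhere to the scikit-learn classifier API (see the feature list below), class probabilities are available in the usual way. A small sketch reusing the classifier fitted above:
>>> sk_clf.predict_proba([['woof']])  # per-class probabilities for the single input row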
Dependencies:
numpy
scipy
scikit-learn
the fasttext Python package
Install with pip:
pip install skift
Because fasttext reads input data from files, skift has to dump the input data into temporary files for fasttext to use. A dedicated folder is created for those files on the filesystem. By default, this storage is allocated in the system temporary storage location (i.e. /tmp on *nix systems). To override this default location, use the SKIFT_TEMP_DIR environment variable:
export SKIFT_TEMP_DIR=/path/to/desired/temp/folder
NOTE: The directory will be created if it does not already exist.
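Equivalently, and assuming it is set before skift needs to create its temporary files, the variable can be set from within Python; the path below is a placeholder:
>>> import os
>>> os.environ['SKIFT_TEMP_DIR'] = '/path/to/desired/temp/folder'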
Features:
Adheres to the scikit-learn classifier API, including predict_proba.
Caters to the common use case of pandas.DataFrame inputs.
Enables easy stacking of fastText with other types of scikit-learn-compliant classifiers.
fastText works only on text data, which means that it will only use a single column from a dataset which might contain many feature columns of different types. As such, a common use case is to have the fastText classifier use a single column as input, ignoring other columns. This is especially true when fastText is to be used as one of several classifiers in a stacking classifier, with other classifiers using non-textual features.
skift includes several scikit-learn-compatible wrappers (for the official fastText Python package) which cater to these use cases.
NOTICE: Any additional keyword arguments provided to the classifier constructor, besides those required, will be forwarded to the fastText.train_supervised method on every call to fit.
These wrappers do not make additional assumptions on input besides those commonly made by scikit-learn classifiers; i.e. that input is a 2d ndarray object and such.
FirstColFtClassifier - An sklearn classifier adapter for fasttext that takes the first column of input ndarray objects as input.
>>> from skift import FirstColFtClassifier
>>> df = pandas.DataFrame([['woof', 0], ['meow', 1]], columns=['txt', 'lbl'])
>>> sk_clf = FirstColFtClassifier(lr=0.3, epoch=10)
>>> sk_clf.fit(df[['txt']], df['lbl'])
>>> sk_clf.predict([['woof']])
[0]
IdxBasedFtClassifier - An sklearn classifier adapter for fasttext that takes input by column index. This is set on object construction by providing the input_ix parameter to the constructor.
>>> from skift import IdxBasedFtClassifier
>>> df = pandas.DataFrame([[5, 'woof', 0], [83, 'meow', 1]], columns=['count', 'txt', 'lbl'])
>>> sk_clf = IdxBasedFtClassifier(input_ix=1, lr=0.4, epoch=6)
>>> sk_clf.fit(df[['count', 'txt']], df['lbl'])
>>> sk_clf.predict([[5, 'woof']])
[0]
These wrappers assume the X parameter given to the fit, predict, and predict_proba methods is a pandas.DataFrame object:
FirstObjFtClassifier - An sklearn adapter for fasttext using the first column of dtype == object as input.
>>> from skift import FirstObjFtClassifier
>>> df = pandas.DataFrame([['woof', 0], ['meow', 1]], columns=['txt', 'lbl'])
>>> sk_clf = FirstObjFtClassifier(lr=0.2)
>>> sk_clf.fit(df[['txt']], df['lbl'])
>>> sk_clf.predict([['woof']])
[0]
ColLblBasedFtClassifier - An sklearn adapter for fasttext taking input by column label. This is set on object construction by providing the input_col_lbl parameter to the constructor.
>>> from skift import ColLblBasedFtClassifier
>>> df = pandas.DataFrame([['woof', 0], ['meow', 1]], columns=['txt', 'lbl'])
>>> sk_clf = ColLblBasedFtClassifier(input_col_lbl='txt', epoch=8)
>>> sk_clf.fit(df[['txt']], df['lbl'])
>>> sk_clf.predict([['woof']])
[0]
SeriesFtClassifier - An sklearn adapter for fasttext taking a Pandas Series as input.
Package author and current maintainer is Shay Palachy (shay.palachy@gmail.com); you are more than welcome to approach him for help. Contributions are very welcome.
Clone:
git clone git@github.com:shaypal5/skift.git
Install in development mode, including test dependencies:
cd skift
pip install -e '.[test]'
To also install fasttext, see instructions in the Installation section.
To run the tests use:
cd skift
pytest
The project is documented using the numpy docstring conventions, which were chosen as they are perhaps the most widely-spread conventions that are both supported by common tools such as Sphinx and result in human-readable docstrings. When documenting code you add to this project, follow these conventions.
Additionally, if you update this README.rst file, use python setup.py checkdocs to validate that it compiles.
Created by Shay Palachy (shay.palachy@gmail.com).
Fixes: uniaz, crouffer, amirzamli and sgt.
Author: Shaypal5
Source Code: https://github.com/shaypal5/skift
License: View license
1641862500
PyKEEN (Python KnowlEdge EmbeddiNgs) is a Python package designed to train and evaluate knowledge graph embedding models (incorporating multi-modal information).
The latest stable version of PyKEEN can be downloaded and installed from PyPI with:
$ pip install pykeen
The latest version of PyKEEN can be installed directly from the source on GitHub with:
$ pip install git+https://github.com/pykeen/pykeen.git
More information about installation (e.g., development mode, Windows installation, Colab, Kaggle, extras) can be found in the installation documentation.
This example shows how to train a model on a dataset and evaluate it.
The fastest way to get up and running is to use the pipeline function. It provides a high-level entry into the extensible functionality of this package. The following example shows how to train and evaluate the TransE model on the Nations dataset. By default, the training loop uses the stochastic local closed world assumption (sLCWA) training approach and evaluates with rank-based evaluation.
from pykeen.pipeline import pipeline
result = pipeline(
model='TransE',
dataset='nations',
)
The results are returned in an instance of the PipelineResult dataclass that has attributes for the trained model, the training loop, the evaluation, and more. See the tutorials on using your own dataset, understanding the evaluation, and making novel link predictions.
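For example, the trained model can be pulled out of the result and everything persisted to disk in one call (the directory name is arbitrary):
model = result.model                         # the trained pykeen.models instance
result.save_to_directory('nations_transe')   # writes the model, metrics, and configuration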
PyKEEN is extensible such that:
anything from pykeen.models can be dropped in,
training loops such as pykeen.training.LCWATrainingLoop can be dropped in, and
triples factories can be instantiated from pykeen.triples.TriplesFactory.
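Swapping such components is usually just a keyword change in the pipeline call. A hedged sketch, assuming the string name 'lcwa' resolves to the LCWA training loop listed in the training-approaches table below:
from pykeen.pipeline import pipeline

result = pipeline(
    model='TransE',
    dataset='nations',
    training_loop='lcwa',   # use the LCWA training loop instead of the default sLCWA
)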
The full documentation can be found at https://pykeen.readthedocs.io.
Below are the losses, regularizers, training approaches, negative samplers, stoppers, evaluators, metrics, and result trackers implemented in pykeen.
The following datasets are built in to PyKEEN. The citation for each dataset corresponds to either the paper describing the dataset, the first paper published using the dataset with knowledge graph embedding models, or the URL for the dataset if neither of the first two are available. If you want to use a custom dataset, see the Bring Your Own Dataset tutorial. If you have a suggestion for another dataset to include in PyKEEN, please let us know here.
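In outline, bringing your own dataset means building triples factories from files of tab-separated (head, relation, tail) triples and handing them to the pipeline. A minimal sketch, where the file paths are placeholders:
from pykeen.triples import TriplesFactory
from pykeen.pipeline import pipeline

training = TriplesFactory.from_path('train.tsv')    # placeholder path to TSV triples
testing = TriplesFactory.from_path(
    'test.tsv',                                     # placeholder path
    entity_to_id=training.entity_to_id,             # reuse the training vocabularies
    relation_to_id=training.relation_to_id,
)
result = pipeline(training=training, testing=testing, model='TransE')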
Losses
Name | Reference | Description |
---|---|---|
Binary cross entropy (after sigmoid) | pykeen.losses.BCEAfterSigmoidLoss | A module for the numerically unstable version of explicit Sigmoid + BCE loss. |
Binary cross entropy (with logits) | pykeen.losses.BCEWithLogitsLoss | A module for the binary cross entropy loss. |
Cross entropy | pykeen.losses.CrossEntropyLoss | A module for the cross entropy loss that evaluates the cross entropy after softmax output. |
Double Margin | pykeen.losses.DoubleMarginLoss | A limit-based scoring loss, with separate margins for positive and negative elements from [sun2018]_. |
Focal | pykeen.losses.FocalLoss | A module for the focal loss proposed by [lin2018]_. |
Margin ranking | pykeen.losses.MarginRankingLoss | A module for the pairwise hinge loss (i.e., margin ranking loss). |
Mean square error | pykeen.losses.MSELoss | A module for the mean square error loss. |
Self-adversarial negative sampling | pykeen.losses.NSSALoss | An implementation of the self-adversarial negative sampling loss function proposed by [sun2019]_. |
Pairwise logistic | pykeen.losses.PairwiseLogisticLoss | The pairwise logistic loss. |
Pointwise Hinge | pykeen.losses.PointwiseHingeLoss | A module for the pointwise hinge loss. |
Soft margin ranking | pykeen.losses.SoftMarginRankingLoss | A module for the soft pairwise hinge loss (i.e., soft margin ranking loss). |
Softplus | pykeen.losses.SoftplusLoss | A module for the pointwise logistic loss (i.e., softplus loss). |
Soft Pointwise Hinge | pykeen.losses.SoftPointwiseHingeLoss | A module for the soft pointwise hinge loss. |
Regularizers
Name | Reference | Description |
---|---|---|
combined | pykeen.regularizers.CombinedRegularizer | A convex combination of regularizers. |
lp | pykeen.regularizers.LpRegularizer | A simple L_p norm based regularizer. |
no | pykeen.regularizers.NoRegularizer | A regularizer which does not perform any regularization. |
powersum | pykeen.regularizers.PowerSumRegularizer | A simple x^p based regularizer. |
transh | pykeen.regularizers.TransHRegularizer | A regularizer for the soft constraints in TransH. |
Training Approaches
Name | Reference | Description |
---|---|---|
lcwa | pykeen.training.LCWATrainingLoop | A training loop that is based upon the local closed world assumption (LCWA). |
slcwa | pykeen.training.SLCWATrainingLoop | A training loop that uses the stochastic local closed world assumption training approach. |
Negative Samplers
Name | Reference | Description |
---|---|---|
basic | pykeen.sampling.BasicNegativeSampler | A basic negative sampler. |
bernoulli | pykeen.sampling.BernoulliNegativeSampler | An implementation of the Bernoulli negative sampling approach proposed by [wang2014]_. |
pseudotyped | pykeen.sampling.PseudoTypedNegativeSampler | A sampler that accounts for which entities co-occur with a relation. |
Stoppers
Name | Reference | Description |
---|---|---|
early | pykeen.stoppers.EarlyStopper | A harness for early stopping. |
nop | pykeen.stoppers.NopStopper | A stopper that does nothing. |
Evaluators
Name | Reference | Description |
---|---|---|
classification | pykeen.evaluation.ClassificationEvaluator | An evaluator that uses classification metrics. |
rankbased | pykeen.evaluation.RankBasedEvaluator | A rank-based evaluator for KGE models. |
Metrics
Name | Interval | Direction | Description | Type |
---|---|---|---|---|
AUC-ROC | [0, 1] | 📈 | Area Under the ROC Curve | Classification |
Accuracy | [0, 1] | 📈 | (TP + TN) / (TP + TN + FP + FN) | Classification |
Average Precision | [0, 1] | 📈 | A summary statistic over the precision-recall curve | Classification |
Balanced Accuracy | [0, 1] | 📈 | An adjusted version of the accuracy for imbalanced datasets | Classification |
Diagnostic Odds Ratio | [0, inf) | 📈 | LR+/LR- | Classification |
F1 Score | [0, 1] | 📈 | 2TP / (2TP + FP + FN) | Classification |
False Discovery Rate | [0, 1] | 📉 | FP / (FP + TP) | Classification |
False Negative Rate | [0, 1] | 📉 | FN / (FN + TP) | Classification |
False Omission Rate | [0, 1] | 📉 | FN / (FN + TN) | Classification |
False Positive Rate | [0, 1] | 📉 | FP / (FP + TN) | Classification |
Fowlkes Mallows Index | [0, 1] | 📈 | √PPV x √TPR | Classification |
Informedness | [0, 1] | 📈 | TPR + TNR - 1 | Classification |
Markedness | [0, 1] | 📈 | PPV + NPV - 1 | Classification |
Matthews Correlation Coefficient | [-1, 1] | 📈 | A balanced measure applicable even with class imbalance | Classification |
Negative Likelihood Ratio | [0, inf) | 📉 | FNR / TNR | Classification |
Negative Predictive Value | [0, 1] | 📈 | TN / (TN + FN) | Classification |
Positive Likelihood Ratio | [0, inf) | 📈 | TPR / FPR | Classification |
Positive Predictive Value | [0, 1] | 📈 | TP / (TP + FP) | Classification |
Prevalence Threshold | [0, 1] | 📉 | √FPR / (√TPR + √FPR) | Classification |
Threat Score | [0, 1] | 📈 | TP / (TP + FN + FP) | Classification |
True Negative Rate | [0, 1] | 📈 | TN / (TN + FP) | Classification |
True Positive Rate | [0, 1] | 📈 | TP / (TP + FN) | Classification |
Adjusted Arithmetic Mean Rank (AAMR) | (0, 2) | 📉 | The mean over all chance-adjusted ranks. | Ranking |
Adjusted Arithmetic Mean Rank Index (AAMRI) | [-1, 1] | 📈 | The re-indexed adjusted mean rank (AAMR) | Ranking |
Geometric Mean Rank (GMR) | [1, inf) | 📉 | The geometric mean over all ranks. | Ranking |
Harmonic Mean Rank (HMR) | [1, inf) | 📉 | The harmonic mean over all ranks. | Ranking |
Hits @ K | [0, 1] | 📈 | The relative frequency of ranks not larger than a given k. | Ranking |
Inverse Arithmetic Mean Rank (IAMR) | (0, 1] | 📈 | The inverse of the arithmetic mean over all ranks. | Ranking |
Inverse Geometric Mean Rank (IGMR) | (0, 1] | 📈 | The inverse of the geometric mean over all ranks. | Ranking |
Inverse Median Rank | (0, 1] | 📈 | The inverse of the median over all ranks. | Ranking |
Mean Rank (MR) | [1, inf) | 📉 | The arithmetic mean over all ranks. | Ranking |
Mean Reciprocal Rank (MRR) | (0, 1] | 📈 | The inverse of the harmonic mean over all ranks. | Ranking |
Median Rank | [1, inf) | 📉 | The median over all ranks. | Ranking |
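As a concrete reading of the rank-based rows, with r_i the rank of the i-th of n evaluation triples, the mean rank, mean reciprocal rank, and Hits@k follow the standard definitions:
\mathrm{MR} = \frac{1}{n}\sum_{i=1}^{n} r_i, \qquad \mathrm{MRR} = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{r_i}, \qquad \mathrm{Hits@}k = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\left[r_i \le k\right]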
Result Trackers
Name | Reference | Description |
---|---|---|
console | pykeen.trackers.ConsoleResultTracker | A class that directly prints to console. |
csv | pykeen.trackers.CSVResultTracker | Tracking results to a CSV file. |
json | pykeen.trackers.JSONResultTracker | Tracking results to a JSON lines file. |
mlflow | pykeen.trackers.MLFlowResultTracker | A tracker for MLflow. |
neptune | pykeen.trackers.NeptuneResultTracker | A tracker for Neptune.ai. |
python | pykeen.trackers.PythonResultTracker | A tracker which stores everything in Python dictionaries. |
tensorboard | pykeen.trackers.TensorBoardResultTracker | A tracker for TensorBoard. |
wandb | pykeen.trackers.WANDBResultTracker | A tracker for Weights and Biases. |
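Trackers are selected through the pipeline as well; for instance, logging to the console. A sketch, assuming the result_tracker keyword accepts the names in the table above:
from pykeen.pipeline import pipeline

result = pipeline(
    model='TransE',
    dataset='nations',
    result_tracker='console',   # print metrics and configuration to the console during training
)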
PyKEEN includes a set of curated experimental settings for reproducing past landmark experiments. They can be accessed and run like:
$ pykeen experiments reproduce tucker balazevic2019 fb15k
Here the three arguments are the model name, the reference, and the dataset. The output directory can optionally be set with -d.
PyKEEN includes the ability to specify ablation studies using the hyper-parameter optimization module. They can be run like:
$ pykeen experiments ablation ~/path/to/config.json
We used PyKEEN to perform a large-scale reproducibility and benchmarking study, which is described in our article:
@article{ali2020benchmarking,
title={Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework},
author={Ali, Mehdi and Berrendorf, Max and Hoyt, Charles Tapley and Vermue, Laurent and Galkin, Mikhail and Sharifzadeh, Sahand and Fischer, Asja and Tresp, Volker and Lehmann, Jens},
journal={arXiv preprint arXiv:2006.13365},
year={2020}
}
We have made all code, experimental configurations, results, and analyses that led to our interpretations available at https://github.com/pykeen/benchmarking.
Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.
This project has been supported by several organizations (in alphabetical order):
The development of PyKEEN has been funded by the following grants:
Funding Body | Program | Grant |
---|---|---|
DARPA | Automating Scientific Knowledge Extraction (ASKE) | HR00111990009 |
German Federal Ministry of Education and Research (BMBF) | Maschinelles Lernen mit Wissensgraphen (MLWin) | 01IS18050D |
German Federal Ministry of Education and Research (BMBF) | Munich Center for Machine Learning (MCML) | 01IS18036A |
Innovation Fund Denmark (Innovationsfonden) | Danish Center for Big Data Analytics driven Innovation (DABAI) | Grand Solutions |
The PyKEEN logo was designed by Carina Steinborn.
If you have found PyKEEN useful in your work, please consider citing our article:
@article{ali2021pykeen,
author = {Ali, Mehdi and Berrendorf, Max and Hoyt, Charles Tapley and Vermue, Laurent and Sharifzadeh, Sahand and Tresp, Volker and Lehmann, Jens},
journal = {Journal of Machine Learning Research},
number = {82},
pages = {1--6},
title = {{PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings}},
url = {http://jmlr.org/papers/v22/20-825.html},
volume = {22},
year = {2021}
}
Author: Pykeen
Source Code: https://github.com/pykeen/pykeen#installation--
License: MIT License