This repository fine-tunes the GPT-J 6B model on the Alpaca dataset using a Databricks notebook. Please note that while GPT-J 6B is Apache 2.0 licensed, the Alpaca dataset is licensed under Creative Commons NonCommercial (CC BY-NC 4.0).
Get started with training:

Add the dolly repo to Databricks (under Repos click Add Repo, enter https://github.com/databrickslabs/dolly.git, then click Create Repo).

Start a 12.2 LTS ML (includes Apache Spark 3.3.2, GPU, Scala 2.12) single-node cluster with a node type that has 8 A100 GPUs (e.g. Standard_ND96asr_v4 or p4d.24xlarge).

Open the train_dolly notebook in the dolly repo, attach it to your GPU cluster, and run all cells. When training finishes, the notebook will save the model under /dbfs/dolly_training.
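Once training completes, the saved model can be loaded back with Hugging Face Transformers for a quick smoke test. A minimal sketch, assuming the notebook wrote a Transformers-format checkpoint under /dbfs/dolly_training (the exact run subdirectory name will vary):

# Hypothetical: load the fine-tuned model from DBFS and generate a completion.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/dbfs/dolly_training/<your-run-directory>"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

inputs = tokenizer("Explain the difference between a list and a tuple.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))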
To set up a local development environment, run:

pyenv local 3.8.13
python -m venv .venv
. .venv/bin/activate
pip install -r requirements_dev.txt
./run_pytest.sh
Author: Databrickslabs
Source Code: https://github.com/databrickslabs/dolly
License: Apache-2.0 license
Writing a notebook is not just about writing the final document — Pluto empowers the experiments and discoveries that are essential to getting there.
Explore models and share results in a reactive notebook:
🎈 Pluto demo inside your browser 🎈
A Pluto notebook is made up of small blocks of Julia code (cells) and together they form a reactive notebook. When you change a variable, Pluto automatically re-runs the cells that refer to it. Cells can even be placed in arbitrary order - intelligent syntax analysis figures out the dependencies between them and takes care of execution.
Cells can contain arbitrary Julia code, and you can use external libraries. There are no code rewrites or wrappers, Pluto just looks at your code once before evaluation.
Your notebooks are saved as pure Julia files (sample), which you can then import as if you had been programming in a regular editor all along. You can also export your notebook with cell outputs as attractive HTML and PDF documents. By reordering cells and hiding code, you have full control over how you tell your story.
Pluto offers an environment where changed code takes effect instantly and where deleted code leaves no trace. Unlike Jupyter or Matlab, there is no mutable workspace, but rather, an important guarantee:
At any instant, the program state is completely described by the code you see.
No hidden state, no hidden bugs.
Your programming environment becomes interactive by splitting your code into multiple cells! Changing one cell instantly shows effects on all other cells, giving you a fast and fun way to experiment with your model.
In the example below, changing the parameter A and running the first cell will directly re-evaluate the second cell and display the new plot.
Pluto uses syntax analysis to understand which packages are being used in a notebook, and it automatically manages a package environment for your notebook. You no longer need to install packages, you can directly import any registered package like Plots or DataFrames and use it.
To ensure reproducibility, the information to exactly reproduce the package environment is stored in your notebook file. When someone else opens your notebook with Pluto, the exact same package environment will be used, and packages will work on their computer, automatically! more info
Lastly, here's _one more feature_: Pluto notebooks have a @bind macro to create a live bond between an HTML object and a Julia variable. Combined with reactivity, this is a very powerful tool!
You don't need to know HTML to use it! The PlutoUI package contains basic inputs like sliders and buttons. Pluto's interactivity is very easy to use; you will learn more from the featured notebooks inside Pluto!
But for those who want to dive deeper - you can use HTML, JavaScript and CSS to write your own widgets! Custom update events can be fired by dispatching a new CustomEvent("input"), making it compatible with the viewof operator of observablehq. Have a look at the JavaScript featured notebook inside Pluto!
Pluto was developed alongside the free online course Introduction to Computational Thinking at MIT, with the goal of creating a programming environment that is powerful, helpful and interactive, without being too intimidating for students and teachers.
Are you interested in using Pluto for your class? Here are some presentations by people who are using it already: the MIT team, Gerhard Dorn, Daniel Molina, Henki W. Ashadi and Max Köhler.
https://user-images.githubusercontent.com/6933510/134824521-7cefa38a-7102-4767-bee4-777caf30ba47.mp4
(video) Grant Sanderson (3Blue1Brown) using Pluto's interactivity to teach Computational Thinking at MIT!
Let's do it!
For one tasty notebook 🥞 you will need:
🎈 How to install Julia & Pluto (6 min) 🎈
Run Julia, enter ] to bring up Julia's package manager, and add the Pluto package:

julia> ]
(v1.7) pkg> add Pluto

Press Ctrl+C to return to the julia> prompt.
To run Pluto, run the following commands in your Julia REPL:
julia> import Pluto
julia> Pluto.run()
Pluto will open in your browser, and you can get started!
Questions? Have a look at the FAQ.
Interested in learning Julia, Pluto and applied mathematics? Join the open MIT course taught by Alan Edelman, David P. Sanders & Grant Sanderson (3blue1brown) (and a bit of me): Introduction to Computational Thinking, Spring 2021.
Follow these instructions to start working on the package.
Unless otherwise specified, the included featured notebooks have a more permissive license: the Unlicense. This means that you can use them however you like - you do not need to credit us!
Your notebook files are yours, you also do not need to credit us. Have fun!
The Pluto project is an ambition to rethink what a programming environment should be. We believe that scientific computing can be a lot simpler and more accessible. If you feel the same, give Pluto a try! We would love to hear what you think. 😊
🎈 Pluto – introduction (20 min) at Juliacon 2020 🎈
🌐 Pluto – one year later (25 min) at Juliacon 2021 🌐
Author: fonsp
Source Code: https://github.com/fonsp/Pluto.jl
License: MIT license
Packages such as plotly, tfjs-vis & danfo.js support rich visualization only in the browser; however, this extension leverages the power of notebooks to provide the same rich visualizations when targeting node.js.
Use the command Open a sample node.js notebook to open a sample notebook and get started with plotly.js, danfo.js, tensorflow.js, etc.

Getting started:

Use the command Open Node.js REPL to open a notebook-backed REPL.
Create a file with the extension *.nnb, e.g. sample.nnb, or use the command New File... to create a Node.js notebook.
Use the command Open a sample node.js notebook to open a sample notebook.
Use the command Welcome: Open Walkthrough... to check out the samples.

Thanks to the various packages we provide integrations with, which help make this extension useful:
Author: DonJayamanne
Source Code: https://github.com/DonJayamanne/typescript-notebook
License: MIT license
kb is a text-oriented minimalist command line knowledge base manager. kb can be considered a quick note collection and access tool oriented toward software developers, penetration testers, hackers, students or whoever has to collect and organize notes in a clean way. Although kb is mainly targeted on text-based note collection, it supports non-text files as well (e.g., images, pdf, videos and others).
The project was born from the frustration of trying to find a good way to quickly access my notes, procedures, cheatsheets and lists (e.g., payloads) but at the same time, keeping them organized. This is particularly useful for any kind of student. I use it in the context of penetration testing to organize pentesting procedures, cheatsheets, payloads, guides and notes.
I found myself too frequently spending time trying to search for that particular payload list quickly, or spending too much time trying to find a specific guide/cheatsheet for a needed tool. kb tries to solve this problem by providing you a quick and intuitive way to access knowledge.
In a few words, kb allows a user to quickly and efficiently collect, organize, and access notes. Basically, kb provides a clean text-based way to organize your knowledge.
You should have Python 3.6 or above installed.
To install the most recent stable version of kb just type:
pip install -U kb-manager
If you want to install the bleeding-edge version of kb (that may have some bugs) you should do:
git clone https://github.com/gnebbia/kb
cd kb
pip install -r requirements.txt
python setup.py install
# or with pip
pip install -U git+https://github.com/gnebbia/kb
Tip for GNU/Linux and macOS users: for a better user experience, also set the following kb bash aliases:
cat <<EOF > ~/.kb_alias
alias kbl="kb list"
alias kbe="kb edit"
alias kba="kb add"
alias kbv="kb view"
alias kbd="kb delete --id"
alias kbg="kb grep"
alias kbt="kb list --tags"
EOF
echo "source ~/.kb_alias" >> ~/.bashrc
source ~/.kb_alias
Please remember to upgrade kb frequently by doing:
pip install -U kb-manager
Arch Linux users can install kb or kb-git with their favorite AUR Helper.
Stable:
yay -S kb
Dev:
yay -S kb-git
Of course it runs on NetBSD (and on pkgsrc). We can install it from pkgsrc source tree (databases/py-kb) or as a binary package using pkgin:
pkgin in py38-kb
Note that at the moment the package is only available from -current repositories.
To install using homebrew, use:
brew tap gnebbia/kb https://github.com/gnebbia/kb.git
brew install gnebbia/kb/kb
To upgrade with homebrew:
brew update
brew upgrade gnebbia/kb/kb
Windows users should keep in mind that the EDITOR environment variable must be quoted if the editor path contains spaces:

EDITOR=C:\Program Files\Editor\my cool editor.exe -> WRONG!
EDITOR="C:\Program Files\Editor\my cool editor.exe" -> OK!
To set the "EDITOR" Environment variable by using cmd.exe, just issue the following commands, after having inserted the path to your desired text editor:
set EDITOR="C:\path\to\editor\here.exe"
setx EDITOR "\"C:\path\to\editor\here.exe\""
To set the "EDITOR" Environment variable by using Powershell, just issue the following commands, after having inserted the path to your desired text editor:
$env:EDITOR='"C:\path\to\editor\here.exe"'
[System.Environment]::SetEnvironmentVariable('EDITOR','"C:\path\to\editor\here.exe"', [System.EnvironmentVariableTarget]::User)
Open a cmd.exe terminal with administrative rights and paste the following commands:
reg add "HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor" /v "AutoRun" /t REG_EXPAND_SZ /d "%USERPROFILE%\autorun.cmd"
(
echo @echo off
echo doskey kbl=kb list $*
echo doskey kbe=kb edit $*
echo doskey kba=kb add $*
echo doskey kbv=kb view $*
echo doskey kbd=kb delete --id $*
echo doskey kbg=kb grep $*
echo doskey kbt=kb list --tags $*
)> %USERPROFILE%\autorun.cmd
Open a Powershell terminal and paste the following commands:
@'
function kbl { kb list $args }
function kbe { kb edit $args }
function kba { kb add $args }
function kbv { kb view $args }
function kbd { kb delete --id $args }
function kbg { kb grep $args }
function kbt { kb list --tags $args }
'@ > $env:USERPROFILE\Documents\WindowsPowerShell\profile.ps1
A docker setup has been included to help with development.
To install and start the project with docker:
docker-compose up -d
docker-compose exec kb bash
The container has the aliases included in its .bashrc, so you can use kb in the running container as you would if you had installed it on the host directly. The ./docker/data directory on the host is bound to /data in the container, which is also the image's working directory. To interact with the container, place (or symlink) files on your host into the ./docker/data directory; they can then be seen and used in the /data directory in the container.
A quick demo of a typical scenario using kb:
A quick demo with kb aliases enabled:
A quick demo for non-text documents:
List all artifacts:

kb list
# or if aliases are used:
kbl

List artifacts whose title contains the string "zip":

kb list zip
# or if aliases are used:
kbl zip

List artifacts belonging to the category "cheatsheet":

kb list --category cheatsheet
# or
kb list -c cheatsheet
# or if aliases are used:
kbl -c cheatsheet

List artifacts tagged "web" or "pentest":

kb list --tags "web;pentest"
# or if aliases are used:
kbl --tags "web;pentest"

List artifacts in verbose mode:

kb list -v
# or if aliases are used:
kbl -v
Add a file to the knowledge base:

kb add ~/Notes/cheatsheets/pytest
# or if aliases are used:
kba ~/Notes/cheatsheets/pytest

Add a file with metadata:

kb add ~/ssh_tunnels --title pentest_ssh --category "procedure" \
    --tags "pentest;network" --author "gnc" --status "draft"

Add multiple files to a category:

kb add ~/Notes/cheatsheets/general/* --category "cheatsheet"

Create a new artifact from scratch:

kb add --title "ftp" --category "notes" --tags "protocol;network"
# a text editor ($EDITOR) will be launched for editing

Create a new artifact from the output of another program:

kb add --title "my_network_scan" --category "scans" --body "$(nmap -T5 -p80 192.168.1.0/24)"
Delete an artifact by ID:

kb delete --id 2
# or if aliases are used:
kbd 2

Delete multiple artifacts by ID:

kb delete --id 2 3 4
# or if aliases are used:
kbd 2 3 4

Delete an artifact by title and category:

kb delete --title zap --category cheatsheet
View an artifact by ID:

kb view --id 3
# or
kb view -i 3
# or
kb view 3
# or if aliases are used:
kbv 3

View an artifact by title:

kb view --title "gobuster"
# or
kb view -t "gobuster"
# or
kb view gobuster

View an artifact without colors:

kb view -t dirb -n

Open an artifact in a text editor (read-only inspection):

kb view -i 2 -e
# or if aliases are used:
kbv 2 -e
Editing artifacts involves opening a text editor. Hence, binary files cannot be edited by kb.
The editor can be set by the "EDITOR" environment variable.
Edit an artifact by ID:

kb edit --id 13
# or if aliases are used:
kbe 13

Edit an artifact by title and category:

kb edit --title "git" --category "cheatsheet"
# or
kb edit -t "git" -c "cheatsheet"
# or, if "git" is unique as an artifact title
kb edit git
Search for a regex pattern across artifacts:

kb grep "[bg]zip"
# or if aliases are used:
kbg "[bg]zip"

Case-insensitive search:

kb grep -i "[BG]ZIP"

Search with verbose output:

kb grep -v "[bg]zip"

Show the lines matching the pattern:

kb grep -m "[bg]zip"
To export the entire knowledge base, do:
kb export
This will generate a .kb.tar.gz archive that can later be imported by kb.
If you want to export only data (so that it can be used in other software):
kb export --only-data
This will export a directory containing a subdirectory for each category; within these subdirectories are all the artifacts belonging to that specific category.
To import a knowledge base:

kb import archive.kb.tar.gz

NOTE: importing a knowledge base erases all previous data. Basically, it erases everything and imports the new knowledge base.

To erase the entire knowledge base, do:

kb erase
kb supports custom templates for the artifacts. A template is basically a file using the "toml" format, structured in this way:
TITLES = [ "^#.*", "blue", ]
WARNINGS = [ "!.*" , "yellow",]
COMMENTS = [ ";;.*", "green", ]
Where the first element of each list is a regex and the second element is a color.
Note that by default an artifact is assigned the 'default' template, and this template can be changed too (see the "Edit a template" subsection).
To list all available templates:
kb template list
To list all the templates containing the string "theory":
kb template list "theory"
Create a new template called "lisp-cheatsheets", note that an example template will be put as example in the editor.
kb template new lisp-cheatsheets
To delete the template called "lisp-cheatsheets" just do:
kb template delete lisp-cheatsheets
To edit the template called "lisp-cheatsheets" just do:
kb template edit lisp-cheatsheets
We can also add a template from an already existing toml configuration file by just doing:
kb template add ~/path/to/myconfig.toml --title myconfig
We can change the template for an existing artifact by ID by using the update command:
kb update --id 2 --template "lisp-cheatsheets"
We can apply the template "lisp-cheatsheets" to all artifacts belonging to the category "lispcode" by doing:
kb template apply "lisp-cheatsheets" --category "lispcode"
We can apply the template "dark" to all artifacts having in their title the string "zip" (e.g., bzip, 7zip, zipper) by doing:
kb template apply "dark" --title "zip" --extended-match
# or
kb template apply "dark" --title "zip" -m
We can always have our queries "contain" the string by using the --extended-match option when using kb template apply.
We can apply the template "light" to all artifacts of the category "cheatsheet" who have as author "gnc" and as status "OK" by doing:
kb template apply "light" --category "cheatsheet" --author "gnc" --status "OK"
kb can be integrated with other tools.
We can integrate kb with rofi: a custom mode has been developed, available in the "misc" directory within this repository.
We can launch rofi with this mode by doing:
rofi -show kb -modi kb:/path/to/rofi-kb-mode.sh
Synchronization with a remote git repository is experimental at the moment. Still, we can initialize our knowledge base against an empty github/gitlab (or other git service) repository by doing:
kb sync init
We can then push our knowledge base to the remote git repository with:
kb sync push
We can pull (e.g., from another machine) our knowledge base from the remote git repository with:
kb sync pull
We can at any time check which remote endpoint our knowledge base synchronizes with:
kb sync info
If you want to upgrade kb to the most recent stable release do:
pip install -U kb-manager
If instead you want to update kb to the most recent development release (which may be buggy), do:
git clone https://github.com/gnebbia/kb
cd kb
pip install --upgrade .
Q) How do I solve the AttributeError: module 'attr' has no attribute 's' error?
A) Uninstall attr and use attrs:
pip uninstall attr
pip uninstall attrs
pip install attrs
pip install -U kb-manager
Date: 2022-09-21
Version: 0.1.7
Author: Gnebbia
Source Code: https://github.com/gnebbia/kb
License: GPL-3.0 license
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.
A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
We used Python 3.9.9 and PyTorch 1.10.1 to train and test our models, but the codebase is expected to be compatible with Python 3.8-3.10 and recent PyTorch versions. The codebase also depends on a few Python packages, most notably HuggingFace Transformers for their fast tokenizer implementation and ffmpeg-python for reading audio files. You can download and install (or update to) the latest release of Whisper with the following command:
pip install -U openai-whisper
Alternatively, the following command will pull and install the latest commit from this repository, along with its Python dependencies:
pip install git+https://github.com/openai/whisper.git
To update the package to the latest version of this repository, please run:
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
It also requires the command-line tool ffmpeg to be installed on your system, which is available from most package managers:
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# on Arch Linux
sudo pacman -S ffmpeg
# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg
# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
You may need rust installed as well, in case tokenizers does not provide a pre-built wheel for your platform. If you see installation errors during the pip install command above, please follow the Getting started page to install the Rust development environment. Additionally, you may need to configure the PATH environment variable, e.g. export PATH="$HOME/.cargo/bin:$PATH". If the installation fails with No module named 'setuptools_rust', you need to install setuptools_rust, e.g. by running:

pip install setuptools-rust
There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.
Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
---|---|---|---|---|---|
tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x |
base | 74 M | base.en | base | ~1 GB | ~16x |
small | 244 M | small.en | small | ~2 GB | ~6x |
medium | 769 M | medium.en | medium | ~5 GB | ~2x |
large | 1550 M | N/A | large | ~10 GB | 1x |
The .en models for English-only applications tend to perform better, especially for the tiny.en and base.en models. We observed that the difference becomes less significant for the small.en and medium.en models.

Whisper's performance varies widely depending on the language. The figure below shows a WER (Word Error Rate) breakdown by language on the Fleurs dataset using the large-v2 model (lower is better). More WER and BLEU scores corresponding to the other models and datasets can be found in Appendix D of the paper.
The following command will transcribe speech in audio files, using the medium model:

whisper audio.flac audio.mp3 audio.wav --model medium

The default setting (which selects the small model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:
whisper japanese.wav --language Japanese
Adding --task translate
will translate the speech into English:
whisper japanese.wav --language Japanese --task translate
Run the following to view all available options:
whisper --help
See tokenizer.py for the list of all available languages.
Transcription can also be performed within Python:
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
Internally, the transcribe() method reads the entire file and processes the audio with a sliding 30-second window, performing autoregressive sequence-to-sequence predictions on each window.
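transcribe() also accepts decoding options as keyword arguments, so the CLI flags above have Python equivalents. A minimal sketch, assuming the language and task decoding options in your installed version:

import whisper

model = whisper.load_model("medium")
# Mirrors: whisper japanese.wav --language Japanese --task translate
result = model.transcribe("japanese.wav", language="ja", task="translate")
print(result["text"])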
Below is an example usage of whisper.detect_language() and whisper.decode(), which provide lower-level access to the model.
import whisper
model = whisper.load_model("base")
# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)
# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)
# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")
# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
# print the recognized text
print(result.text)
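DecodingOptions accepts fields such as language, task, and fp16. A hedged sketch for CPU-only decoding of the mel spectrogram computed above (verify the field names against your installed version):

# Decode on CPU without half precision, forcing English
options = whisper.DecodingOptions(language="en", fp16=False)
result = whisper.decode(model, mel, options)
print(result.text)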
Please use the 🙌 Show and tell category in Discussions for sharing more example usages of Whisper and third-party extensions such as web demos, integrations with other tools, ports for different platforms, etc.
Author: Openai
Source Code: https://github.com/openai/whisper
License: MIT license
Querybook is a Big Data IDE that allows you to discover, create, and share data analyses, queries, and tables.
Features
Getting started
Please install Docker before trying out Querybook.
Pull this repo and run make. Visit https://localhost:10001 when the build completes.

For more details on installation, click here.

For infrastructure configuration, click here. For general configuration, click here.
Metadata: can be used to fetch schema and table information for metadata enrichment.
Result stores: use one of the following to store query results.
Exports: upload query results from Querybook to other tools for further analyses.
Notifications: get notified upon completion of queries and DataDoc invitations via IM or email.
User Interface
Query Editor
Charting
Lineage & Analytics
Contributing Back
See CONTRIBUTING.
Check out the full documentation & feature highlights here.
Author: Pinterest
Source Code: https://github.com/pinterest/querybook
License: Apache-2.0 license
Community repository of bamboolib
This is the community repository of bamboolib. You can use bamboolib for free if you use bamboolib on your local computer or on Open Data via Binder.
bamboolib is a GUI for pandas DataFrames that enables anyone to work with Python in Jupyter Notebook or JupyterLab.
Install bamboolib for Jupyter Notebook or Jupyter Lab by running the code below in your terminal (or Anaconda Prompt for Windows):
pip install bamboolib
# Jupyter Notebook extensions
python -m bamboolib install_nbextensions
# JupyterLab extensions
python -m bamboolib install_labextensions
After you have installed bamboolib, you can go here to test bamboolib.
You can find out how to get started, along with tutorials and an API reference, in our docs.
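A minimal sketch of typical usage inside Jupyter (the CSV path is a placeholder; any pandas DataFrame works):

import bamboolib as bam  # importing bamboolib activates the GUI integration
import pandas as pd

df = pd.read_csv("titanic.csv")  # hypothetical file path
df  # displaying the DataFrame now renders the interactive bamboolib UI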
bamboolib is joining forces with Databricks. For more information, please read our announcement.
Please note that this repository does not contain the source code of bamboolib. The repo contains e.g. explanations and code samples for plugins and it serves as a place to answer public questions via issues.
Author: tkrabel
Source Code: https://github.com/tkrabel/bamboolib
If you installed a previous version, uninstall it first:

pip uninstall mopp
For Mac and other Ubuntu installations that do not have an NVIDIA GPU, we need to explicitly set an environment variable at install time.
export JUPYTER_TEXT2CODE_MODE="cpu"
sudo apt-get install libopenblas-dev libomp-dev
git clone https://github.com/deepklarity/jupyter-text2code.git
cd jupyter-text2code
pip install .
jupyter nbextension enable jupyter-text2code/main
To uninstall the extension:

pip uninstall jupyter-text2code

To use it, launch Jupyter:

jupyter notebook

If you don't see a Nbextensions tab in Jupyter notebook, run the following command:

jupyter contrib nbextension install --user

Use the notebooks/ctds.ipynb notebook for testing (models are fetched via tensorflow_hub). Click on the Terminal icon which appears on the menu (to activate the extension).

We have published CPU and GPU images to Docker Hub with all dependencies pre-installed.
Visit https://hub.docker.com/r/deepklarity/jupyter-text2code/ to download the images and usage instructions.
CPU image size: 1.51 GB
GPU image size: 2.56 GB
The plugin now supports pandas commands plus quick insertion of available snippets from awesome-notebooks. With this change, we can now get snippets for the most popular integrations from within the Jupyter tab, e.g.:
Intent matching is based on the paraphrase-MiniLM-L6-v2 model.

To add support for a new intent:

Create a new intent in ner_templates with a new intent_id.
Update generate_training_data.py if different generation techniques are needed or if introducing a new entity.
Update jupyter_text2code/jupyter_text2code_serverextension/__init__.py with the new intent's condition and add the actual code for the intent.
Reinstall the extension with pip install .
Author: Deepklarity
Source Code: https://github.com/deepklarity/jupyter-text2code
License: MIT license
An open-source framework to evaluate, test and monitor ML models in production.
📊 What is Evidently?
Evidently is an open-source Python library for data scientists and ML engineers. It helps evaluate, test, and monitor the performance of ML models from validation to production.
Evidently has a modular approach with 3 interfaces on top of the shared metrics functionality.
Tests perform structured data and ML model quality checks. They verify a condition and return an explicit pass or fail result.
You can create a custom Test Suite from 50+ individual tests or run a preset (for example, Data Drift or Regression Performance). You can get results as an interactive visual dashboard inside Jupyter notebook or Colab, or export as JSON or Python dictionary.
Tests are best for automated batch model checks. You can integrate them as a pipeline step using tools like Airflow.
Note We added a new Report object starting from v0.1.57.dev0. Reports unite the functionality of Dashboards and JSON profiles with a new, cleaner API. You can still use the old Dashboards API, but it will soon be deprecated.
Reports calculate various data and ML metrics and render rich visualizations. You can create a custom Report or run a preset to evaluate a specific aspect of the model or data performance. For example, a Data Quality or Classification Performance report.
You can get an HTML report (best for exploratory analysis and debugging) or export results as JSON or a Python dictionary (best for logging, documentation, or integration with BI tools).
Note This functionality is in development and subject to API change.
Evidently has monitors that collect data and model metrics from a deployed ML service. You can use it to build live monitoring dashboards. Evidently configures the monitoring on top of streaming data and emits the metrics in Prometheus format. There are pre-built Grafana dashboards to visualize them.
👩💻 Installing from PyPI
Evidently is available as a PyPI package. To install it using pip package manager, run:
$ pip install evidently
If you only want to get results as HTML or JSON files, the installation is now complete. To display the dashboards inside a Jupyter notebook, you need jupyter nbextension. After installing evidently, run the two following commands in the terminal from the evidently directory.
To install jupyter nbextension, run:
$ jupyter nbextension install --sys-prefix --symlink --overwrite --py evidently
To enable it, run:
$ jupyter nbextension enable evidently --py --sys-prefix
That's it! A single run after the installation is enough.
Note: if you use Jupyter Lab, the reports might not display in the notebook. However, you can still save them as HTML files.
Evidently is available as a PyPI package. To install it using pip package manager, run:
$ pip install evidently
Unfortunately, building reports inside a Jupyter notebook is not yet possible on Windows. The reason is that Windows requires administrator privileges to create symlinks. We will address this issue in later versions. You can still generate the HTML to view externally.
▶️ Getting started
Note This is a simple Hello World example. You can find a complete Getting Started Tutorial in the docs.
To start, prepare your data as two pandas DataFrames. The first should include your reference data, the second your current production data. The structure of both datasets should be identical. To run some of the evaluations (e.g. Data Drift), you need input features only. In other cases (e.g. Target Drift, Classification Performance), you need Target and/or Prediction columns.
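When an evaluation needs target or prediction columns, you can tell Evidently where to find them with a ColumnMapping object instead of passing column_mapping=None. A minimal sketch; the column names here are hypothetical:

from evidently import ColumnMapping

column_mapping = ColumnMapping(
    target="target",          # ground-truth column in your DataFrames
    prediction="prediction",  # model output column
)
# Pass it to run(), e.g. run(current_data=..., reference_data=..., column_mapping=column_mapping)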
After installing the tool, import Evidently test suite and required presets. We'll use a simple toy dataset:
import pandas as pd
from sklearn import datasets
from evidently.test_suite import TestSuite
from evidently.test_preset import DataStabilityTestPreset
from evidently.test_preset import DataQualityTestPreset
iris_data = datasets.load_iris(as_frame='auto')
iris_frame = iris_data.frame
To run the Data Stability test suite and display the reports in the notebook:
data_stability = TestSuite(tests=[
DataStabilityTestPreset(),
])
data_stability.run(current_data=iris_frame.iloc[:90], reference_data=iris_frame.iloc[90:], column_mapping=None)
data_stability
To save the results as an HTML file:
data_stability.save_html("file.html")
You'll need to open it from the destination folder.
To get the output as JSON:
data_stability.json()
After installing the tool, import Evidently report and required presets:
import pandas as pd
from sklearn import datasets
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
iris_data = datasets.load_iris(as_frame='auto')
iris_frame = iris_data.frame
To generate the Data Drift report, run:
data_drift_report = Report(metrics=[
DataDriftPreset(),
])
data_drift_report.run(current_data=iris_frame.iloc[:90], reference_data=iris_frame.iloc[90:], column_mapping=None)
data_drift_report
To save the report as HTML:
data_drift_report.save_html("file.html")
You'll need to open it from the destination folder.
To get the output as JSON:
data_drift_report.json()
💻 Contributions
We welcome contributions! Read the Guide to learn more.
📚 Documentation
For more information, refer to a complete Documentation. You can start with this Tutorial for a quick introduction.
🗂️ Examples
Here you can find simple examples on toy datasets to quickly explore what Evidently can do right out of the box.
Report | Jupyter notebook | Colab notebook | Contents |
---|---|---|---|
Getting Started Tutorial | link | link | Data Stability and custom test suites, Data Drift and Target Drift reports |
Evidently Metric Presets | link | link | Data Drift, Target Drift, Data Quality, Regression, Classification reports |
Evidently Metrics | link | link | All individual metrics |
Evidently Test Presets | link | link | NoTargetPerformance, Data Stability, Data Quality, Data Drift, Regression, Multi-class Classification, Binary Classification, Binary Classification top-K test suites |
Evidently Tests | link | link | All individual tests |
See how to integrate Evidently in your prediction pipelines and use it with other tools.
Title | link to tutorial |
---|---|
Real-time ML monitoring with Grafana | Evidently + Grafana |
Batch ML monitoring with Airflow | Evidently + Airflow |
Log Evidently metrics in MLflow UI | Evidently + MLflow |
☎️ User Newsletter
To get updates on new features, integrations and code tutorials, sign up for the Evidently User Newsletter.
✅ Discord Community
If you want to chat and connect, join our Discord community!
Docs | Discord Community | User Newsletter | Blog | Twitter
Author: Evidentlyai
Source Code: https://github.com/evidentlyai/evidently
License: Apache-2.0 license
Jupyter Docker Stacks are a set of ready-to-run Docker images containing Jupyter applications and interactive computing tools. You can use a stack image to do any of the following (and more):
You can try a relatively recent build of the jupyter/base-notebook image on mybinder.org by simply clicking the preceding link. Otherwise, the examples below may help you get started if you have Docker installed, know which Docker image you want to use and want to launch a single Jupyter Server in a container.
The User Guide on ReadTheDocs describes additional uses and features in detail.
Example 1:
This command pulls the jupyter/scipy-notebook image tagged 85f615d5cafa from Docker Hub if it is not already present on the local host. It then starts a container running a Jupyter Server and exposes the container's internal port 8888 to port 10000 of the host machine:

docker run -p 10000:8888 jupyter/scipy-notebook:85f615d5cafa
You can modify the port on which the container's port is exposed by changing the value of the -p option to -p 8888:8888.

Visiting http://<hostname>:10000/?token=<token> in a browser loads JupyterLab, where hostname is the name of the computer running Docker and token is the secret token printed in the console.

The container remains intact for restart after the Jupyter Server exits.
Example 2:
This command pulls the jupyter/datascience-notebook image tagged 85f615d5cafa from Docker Hub if it is not already present on the local host. It then starts an ephemeral container running a Jupyter Server and exposes the server on host port 10000.
docker run -it --rm -p 10000:8888 -v "${PWD}":/home/jovyan/work jupyter/datascience-notebook:85f615d5cafa
The use of the -v flag in the command mounts the current working directory on the host (${PWD} in the example command) as /home/jovyan/work in the container. The server logs appear in the terminal.

Visiting http://<hostname>:10000/?token=<token> in a browser loads JupyterLab.

Due to the usage of the flag --rm, Docker automatically cleans up the container and removes the file system when the container exits, but any changes made to the ~/work directory and its files in the container will remain intact on the host. The -it flag allocates a pseudo-TTY.
Please see the Contributor Guide on ReadTheDocs for information about how to contribute package updates, recipes, features, tests, and community maintained stacks.
We value all positive contributions to the Docker stacks project, from bug reports to pull requests to help with answering questions. We'd also like to invite members of the community to help with two maintainer activities:
Anyone in the community can jump in and help with these activities at any time. We will happily grant additional permissions (e.g., ability to merge PRs) to anyone who shows an ongoing interest in working on the project.
Following the Jupyter Notebook notice, JupyterLab is now the default for all the Jupyter Docker stack images. It is still possible to switch back to Jupyter Notebook (or to launch a different startup command). You can achieve this by passing the environment variable DOCKER_STACKS_JUPYTER_CMD=notebook (or any other valid jupyter subcommand) at container startup; more information is available in the documentation.
According to the Jupyter Notebook project status and its compatibility with JupyterLab, these Docker images may remove the classic Jupyter Notebook interface altogether in favor of another classic-like UI built atop JupyterLab.
This change is tracked in the issue #1217; please check its content for more information.
We provide images for both x86_64 and aarch64 platforms, except for tensorflow-notebook, which only supports x86_64 for now. Single-platform images have either aarch64- or x86_64- tag prefixes, for example jupyter/base-notebook:aarch64-python-3.10.5. Starting from 2022-09-21, we create multi-platform images.

This project only builds one set of images at a time. On 2022-10-09, we rebuilt images with old Ubuntu and Python versions for users who still need them:
Ubuntu | Python | Tag |
---|---|---|
20.04 | 3.7 | 1aac87eb7fa5 |
20.04 | 3.8 | a374cab4fcb6 |
20.04 | 3.9 | 5ae537728c69 |
20.04 | 3.10 | f3079808ca8c |
22.04 | 3.7 | b86753318aa1 |
22.04 | 3.8 | 7285848c0a11 |
22.04 | 3.9 | ed2908bbb62e |
22.04 | 3.10 | latest (this image is rebuilt weekly) |
Author: jupyter
Source Code: https://github.com/jupyter/docker-stacks
License: View license
The Jupyter notebook is a web-based notebook environment for interactive computing.
We maintain the two most recently released major versions of Jupyter Notebook, Notebook v5 and Classic Notebook v6. After Notebook v7.0 is released, we will no longer maintain Notebook v5. All Notebook v5 users are strongly advised to upgrade to Classic Notebook v6 as soon as possible.
The Jupyter Notebook project is currently undertaking a transition to a more modern code base built from the ground-up using JupyterLab components and extensions.
There is new stream of work which was submitted and then accepted as a Jupyter Enhancement Proposal (JEP) as part of the next version (v7): https://jupyter.org/enhancement-proposals/79-notebook-v7/notebook-v7.html
There is also a plan to continue maintaining Notebook v6 with bug and security fixes only, to ease the transition to Notebook v7: https://github.com/jupyter/notebook-team-compass/issues/5#issuecomment-1085254000
The next major version of Notebook will be based on JupyterLab components for the frontend and Jupyter Server for the Python server. This represents a significant change to the jupyter/notebook code base.
To learn more about Notebook v7: https://jupyter.org/enhancement-proposals/79-notebook-v7/notebook-v7.html
Maintenance and security-related issues are now being addressed in the 6.4.x branch.

A 6.5.x branch will soon be created and will depend on nbclassic for the HTML/JavaScript/CSS assets.

New features and continuous improvement are now focused on Notebook v7 (see the section above).
If you have an open pull request with a new feature or if you were planning to open one, we encourage switching over to the Jupyter Server and JupyterLab architecture, and distribute it as a server extension and / or JupyterLab prebuilt extension. That way your new feature will also be compatible with the new Notebook v7.
Jupyter notebook is a language-agnostic HTML notebook application for Project Jupyter. In 2015, Jupyter notebook was released as a part of The Big Split™ of the IPython codebase. IPython 3 was the last major monolithic release containing both language-agnostic code, such as the IPython notebook, and language specific code, such as the IPython kernel for Python. As computing spans across many languages, Project Jupyter will continue to develop the language-agnostic Jupyter notebook in this repo and with the help of the community develop language specific kernels which are found in their own discrete repos.
You can find the installation documentation for the Jupyter platform, on ReadTheDocs. The documentation for advanced usage of Jupyter notebook can be found here.
For a local installation, make sure you have pip installed and run:
pip install notebook
Launch with:
jupyter notebook
You need some configuration before starting Jupyter notebook remotely. See Running a notebook server.
See CONTRIBUTING.md for how to set up a local development installation.

If you are interested in contributing to the project, see CONTRIBUTING.md.
This repository is a Jupyter project and follows the Jupyter Community Guides and Code of Conduct.
Author: Jupyter
Source Code: https://github.com/jupyter/notebook
License: View license
In this article, we will learn about Quarto and Jupyter notebooks. As a Python user, I'm not that familiar with .Rmd/.qmd files; I use .ipynb notebooks most often. And in this post, I'll show why you might consider using Jupyter Notebooks and how to convert them into beautiful reports with minimal effort, using Quarto.
If you’re an R purist, you may not be familiar with Jupyter Notebooks. I’ll briefly introduce it and then we can jump into Quarto.
Jupyter Notebook is a web application that provides a streamlined, interactive way to work with code mixed with plots and markdown text. And although it’s popular for Python users, it also supports other languages like R. There are some limitations in creating and sharing computational documentation, but that’s where Quarto comes into play.
Contrary to .qmd/.Rmd files, all outputs, like plots and tables, are saved inside the report file in the .ipynb format.
This has its pros and cons. On one hand, it's convenient to be able to embed images in the same file as the executable code. On the other, embedding images into code files makes it hard to version control notebooks. Fortunately, in recent years this has changed; VS Code now supports diffing notebooks!
Additionally, Jupyter Notebooks are rendered on GitHub so you can easily share your report and make them readable for everyone.
While the Jupyter Notebook format is very convenient to experiment in, there’s no easy way to convert a notebook into a beautiful report. That is until Quarto entered the picture.
Image 1 – Getting started with Quarto
With Quarto, you can easily export your .ipynb file into an interactive HTML document with plotly plots, interactive cross-references, and a table of contents!
Is R your preferred language? Get started with Quarto in R with our Quarto tutorial for interactive Markdown documents.
If you work closely with R developers who are used to .Rmd files, you'll find an additional benefit. You can create a single custom theme in a .css file for all reports, from .qmd files and from .ipynb files! This way you'll have consistency across reports created using different technologies, giving you a uniform, professional look in front of your clients!
It’s always good to remember that Quarto, by leveraging on top of Pandoc, allows exporting to over 50 different formats! This is important as HTML
reports are not accepted everywhere. But worry not, you can just as easily export as pdf or another format as needed.
I believe that the best way to learn is through examples. So let's start by looking at a simple notebook, full of Quarto features compatible with any notebook editor. Actually, this notebook contains the same code as the .qmd file from the previous post.

And here is the generated report, produced by running quarto render report.ipynb.

The only important thing is the first cell, the one with the yaml configuration, which has to be of type raw. As we can see, all the features that we've used earlier work here as well!
Once you’re done with report creation, you might want to check out the self-contained: true
Quarto option. It bundles all required css
, js
files into the HTML
file, thus making the report easy to distribute, working without the Internet.
You could ask: OK, so the results are exactly the same as with the qmd file, what's the deal? With quarto preview, every time you change the notebook and save, the preview gets updated. But what's important is that the cells' outputs are taken directly from the notebook, with no need for re-running all cells!
RStudio (Posit) Connect and Workbench give you the power to create and publish data products at a push of a button! See how Appsilon can help as an RStudio Certified Partner.
This can save you a lot of time. And makes working in Jupyter my favorite way of creating reports. That being said, remember that apart from Python, you can just as well use R or Julia as kernels in Jupyter!
Jupyter Notebooks provide a fantastic way for iterative experimenting. What they were lacking was the possibility to export the created report to a visually appealing, business-friendly format. And that’s exactly what Quarto does!
Original article sourced at: https://appsilon.com
Please refer to the official docs at kubeflow.org.
The Kubeflow community is organized into working groups (WGs) with associated repositories, that focus on specific pieces of the ML platform.
Please refer to the Community page.
Author: Kubeflow
Source Code: https://github.com/kubeflow/kubeflow
License: Apache-2.0 license
Examples show Python code, but most features also work in R, bash, typescript, and many other languages.
Hover over any piece of code; if an underline appears, you can press Ctrl to get a tooltip with function/class signature, module documentation or any other piece of information that the language server provides
Critical errors have red underline, warnings are orange, etc. Hover over the underlined code to see a more detailed message
Use the context menu entry, or Alt + mouse click, to jump to definitions/references (you can change it to Ctrl/⌘ in settings); use Alt + o to jump back.
Place your cursor on a variable, function, etc and all the usages will be highlighted
Completions can be invoked automatically as you type by enabling the continuousHinting setting.

Function signatures will automatically be displayed
Advanced static-analysis autocompletion without a running kernel
When a kernel is available the suggestions from the kernel (such as keys of a dict and columns of a DataFrame) are merged with the suggestions from the Language Server (in notebook).
If the kernel is too slow to respond promptly only the Language Server suggestions will be shown (default threshold: 0.6s). You can configure the completer to not attempt to fetch the kernel completions if the kernel is busy (skipping the 0.6s timeout).
You can deactivate the kernel suggestions by adding "Kernel" to the disableCompletionsFrom list in the completion section of Advanced Settings. Alternatively, if you only want kernel completions, you can add "LSP" to the same setting; or add both if you like to code in hardcore mode and get no completions, or if another provider has been added.
Rename variables, functions and more, in both: notebooks and the file editor. Use the context menu option or the F2 shortcut to invoke.
Sort and jump between the diagnostics using the diagnostics panel. Open it searching for "Show diagnostics panel" in JupyterLab commands palette or from the context menu. Use context menu on rows in the panel to filter out diagnostics or copy their message.
You will need to have both JupyterLab (>=3.0.0,<4.0.0a0, as in the install command below) and a supported Python version installed.
In addition, if you wish to use javascript, html, markdown or any other NodeJS-based language server you will need to have appropriate NodeJS version installed.
Note: Installation for JupyterLab 2.x requires a different procedure, please consult the documentation for the extension version 2.x.
For more extensive installation instructions, see the documentation.
For the current stable version, the following steps are recommended. Use of a python virtualenv or a conda env is also recommended.
install python 3
conda install -c conda-forge python=3
install JupyterLab and the extensions
conda install -c conda-forge 'jupyterlab>=3.0.0,<4.0.0a0' jupyterlab-lsp
# or
pip install 'jupyterlab>=3.0.0,<4.0.0a0' jupyterlab-lsp
Note: jupyterlab-lsp provides both the server extension and the lab extension.

Note: with conda, you could take advantage of the bundles jupyter-lsp-python or jupyter-lsp-r to install both the server extension and the language server.
install LSP servers for languages of your choice; for example, for Python (pylsp) and R (languageserver) servers:
pip install 'python-lsp-server[all]'
R -e 'install.packages("languageserver")'
or from conda-forge
conda install -c conda-forge python-lsp-server r-languageserver
Please see our full list of supported language servers which includes installation hints for the common package managers (npm/pip/conda). In general, any LSP server from the Microsoft list should work after some additional configuration.
Note: it is worth visiting the repository of each server you install as many provide additional configuration options.
Restart JupyterLab
If JupyterLab is running when you installed the extension, a restart is required for the server extension and any language servers to be recognized by JupyterLab.
(Optional, IPython users only) To improve the performance of autocompletion, disable Jedi in IPython (the LSP servers for Python use Jedi too). You can do that temporarily with:

%config Completer.use_jedi = False

or permanently by setting c.Completer.use_jedi = False in your ipython_config.py file.
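A minimal sketch of that ipython_config.py:

# ipython_config.py
c = get_config()  # get_config() is injected into IPython configuration files
# Disable Jedi so completions come from the LSP server instead
c.Completer.use_jedi = False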
(Optional, Linux/OSX-only) As a security measure, by default the Jupyter server only allows access to files under the Jupyter root directory (the place where you launch the Jupyter server). Thus, in order to allow jupyterlab-lsp to navigate to external files such as packages installed system-wide or libraries inside a virtual environment (conda, pip, ...), this access control mechanism needs to be circumvented: inside your Jupyter root directory, create a symlink named .lsp_symlink pointing to your system root /.

ln -s / .lsp_symlink
As this symlink is a hidden file the Jupyter server must be instructed to serve hidden files. Either use the appropriate command line flag:
jupyter lab --ContentsManager.allow_hidden=True
or, alternatively, set the corresponding setting inside your jupyter_server_config.py.

Help in implementing a custom ContentsManager, which would enable navigating to external files without the symlink, is welcome.
Server configurations can be edited using the Advanced Settings editor in JupyterLab (Settings > Advanced Settings Editor). For settings specific to each server, please see the table of language servers. Example settings might include:
Note: for the new (currently recommended) python-lsp-server, replace pyls occurrences with pylsp.
{
"language_servers": {
"pyls": {
"serverSettings": {
"pyls.plugins.pydocstyle.enabled": true,
"pyls.plugins.pyflakes.enabled": false,
"pyls.plugins.flake8.enabled": true
}
},
"r-languageserver": {
"serverSettings": {
"r.lsp.debug": false,
"r.lsp.diagnostics": false
}
}
}
}
The serverSettings key specifies the configurations sent to the language servers. These can be written using stringified dot accessors like above (in the VSCode style), or as nested JSON objects, e.g.:
{
"language_servers": {
"pyls": {
"serverSettings": {
"pyls": {
"plugins": {
"pydocstyle": {
"enabled": true
},
"pyflakes": {
"enabled": false
},
"flake8": {
"enabled": true
}
}
}
}
}
}
}
Some language servers, such as pyls, provide other configuration methods in addition to language-server configuration messages (accessed using the Advanced Settings Editor). For example, pyls allows users to configure the server using a local configuration file. You can change the inspection/diagnostics settings for server plugins like pycodestyle there.

The exact configuration details will vary between operating systems (please see the configuration section of the pycodestyle documentation), but as an example, on Linux you would simply need to create a file called ~/.config/pycodestyle, which may look like this:
[pycodestyle]
ignore = E402, E703
max-line-length = 120
In the example above, the E402 (module level import not at top of file) and E703 (statement ends with a semicolon) errors are ignored, and the maximum line length is set to 120 characters.
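For instance, under the configuration above a file like this hypothetical demo produces no pycodestyle diagnostics:

# demo.py
print("setup happens before imports")
import os  # would normally be flagged E402 (module level import not at top of file)
cwd = os.getcwd();  # would normally be flagged E703 (statement ends with a semicolon)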
After changing the configuration you may need to restart the JupyterLab, and please be advised that the errors in configuration may prevent the servers from functioning properly.
Again, please do check the pycodestyle documentation for specific error codes, and check the configuration of other feature providers and language servers as needed.
This would not be possible without the fantastic initial work at wylieconlon/lsp-editor-adapter.
Author: jupyter-lsp
Source Code: https://github.com/jupyter-lsp/jupyterlab-lsp
License: BSD-3-Clause license
FSNotes is a modern notes manager for macOS and iOS. Among other features, it supports wiki-style linking with [[double brackets]].
Author: Glushchenko
Source Code: https://github.com/glushchenko/fsnotes
License: MIT license