
Dolly: Fine-tuning the GPT-J 6B Model on the Alpaca Dataset

Dolly

This fine-tunes the GPT-J 6B model on the Alpaca dataset using a Databricks notebook. Please note that while GPT-J 6B is Apache 2.0 licensed, the Alpaca dataset is licensed under Creative Commons NonCommercial (CC BY-NC 4.0).

Get Started Training

  • Add the dolly repo to Databricks (under Repos click Add Repo, enter https://github.com/databrickslabs/dolly.git, then click Create Repo).
  • Start a 12.2 LTS ML (includes Apache Spark 3.3.2, GPU, Scala 2.12) single-node cluster with a node type that has 8 A100 GPUs (e.g. Standard_ND96asr_v4 or p4d.24xlarge).
  • Open the train_dolly notebook in the dolly repo, attach to your GPU cluster, and run all cells. When training finishes, the notebook will save the model under /dbfs/dolly_training (a sketch of loading the result follows below).
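
Once training finishes, the saved checkpoint can be loaded like any Hugging Face model. A minimal sketch, assuming the standard transformers API (the checkpoint subdirectory name and the prompt are hypothetical):

from transformers import AutoModelForCausalLM, AutoTokenizer

# hypothetical path to the checkpoint written by train_dolly
model_path = "/dbfs/dolly_training/dolly__latest"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# generate a response to an instruction-style prompt
inputs = tokenizer("Explain what a Databricks notebook is.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))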

Running Unit Tests Locally

# pin the Python version used by the project
pyenv local 3.8.13
# create and activate a virtual environment
python -m venv .venv
. .venv/bin/activate
# install dev dependencies and run the test suite
pip install -r requirements_dev.txt
./run_pytest.sh

Download Details:

Author: Databrickslabs
Source Code: https://github.com/databrickslabs/dolly 
License: Apache-2.0 license

#python #dataset #databricks #notebook 


Pluto.jl: Simple reactive notebooks for Julia

Pluto.jl

Writing a notebook is not just about writing the final document — Pluto empowers the experiments and discoveries that are essential to getting there.

Explore models and share results in a notebook that is

  • reactive - when changing a function or variable, Pluto automatically updates all affected cells.
  • lightweight - Pluto is written in pure Julia and is easy to install.
  • simple - no hidden workspace state; friendly UI.

reactivity screencap

🎈 Pluto demo inside your browser 🎈

Input

A Pluto notebook is made up of small blocks of Julia code (cells) and together they form a reactive notebook. When you change a variable, Pluto automatically re-runs the cells that refer to it. Cells can even be placed in arbitrary order - intelligent syntax analysis figures out the dependencies between them and takes care of execution.
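
As a minimal sketch of this reactivity (the variable names are illustrative), two cells might look like:

# cell 1
x = 2

# cell 2: re-runs automatically whenever x changes
y = x^2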

Cells can contain arbitrary Julia code, and you can use external libraries. There are no code rewrites or wrappers, Pluto just looks at your code once before evaluation.

Output

Your notebooks are saved as pure Julia files (sample), which you can then import as if you had been programming in a regular editor all along. You can also export your notebook with cell outputs as attractive HTML and PDF documents. By reordering cells and hiding code, you have full control over how you tell your story.

Dynamic environment

Pluto offers an environment where changed code takes effect instantly and where deleted code leaves no trace. Unlike Jupyter or MATLAB, there is no mutable workspace; instead, Pluto gives you an important guarantee:

At any instant, the program state is completely described by the code you see.

No hidden state, no hidden bugs.

Interactivity

Your programming environment becomes interactive by splitting your code into multiple cells! Changing one cell instantly shows effects on all other cells, giving you a fast and fun way to experiment with your model.

In the example below, changing the parameter A and running the first cell will directly re-evaluate the second cell and display the new plot.
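
A sketch of those cells, assuming the Plots package (the plotted function is illustrative):

# cell 1: Pluto's built-in package manager installs Plots on first use
using Plots

# cell 2: change A and the plot in the next cell updates
A = 4

# cell 3
plot(x -> A * sin(x), 0, 2π)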

plotting screencap

Built-in package manager

Pluto uses syntax analysis to understand which packages are being used in a notebook, and it automatically manages a package environment for your notebook. You no longer need to install packages; you can directly import any registered package like Plots or DataFrames and use it.

To ensure reproducibility, the information to exactly reproduce the package environment is stored in your notebook file. When someone else opens your notebook with Pluto, the exact same package environment will be used, and packages will work on their computer, automatically! more info

package manager screencap

HTML interaction

Lastly, here's one more feature: Pluto notebooks have a @bind macro to create a live bond between an HTML object and a Julia variable. Combined with reactivity, this is a very powerful tool!

@bind macro screencap

You don't need to know HTML to use it! The PlutoUI package contains basic inputs like sliders and buttons. Pluto's interactivity is very easy to use; you will learn more from the featured notebooks inside Pluto!
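
A minimal sketch using PlutoUI (the slider range and the dependent cell are illustrative):

# cell 1
using PlutoUI

# cell 2: renders a slider and binds its current value to x
@bind x Slider(1:100)

# cell 3: re-runs whenever the slider moves
x^2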

But for those who want to dive deeper - you can use HTML, JavaScript and CSS to write your own widgets! Custom update events can be fired by dispatching a new CustomEvent("input"), making it compatible with the viewof operator of observablehq. Have a look at the JavaScript featured notebook inside Pluto!


 

Pluto for teaching

Pluto was developed alongside the free online course Introduction to Computational Thinking at MIT, with the goal of creating a programming environment that is powerful, helpful and interactive, without being too intimidating for students and teachers.

Are you interested in using Pluto for your class? Here are some presentations by people who are using it already: the MIT team, Gerhard Dorn, Daniel Molina, Henki W. Ashadi and Max Köhler.

https://user-images.githubusercontent.com/6933510/134824521-7cefa38a-7102-4767-bee4-777caf30ba47.mp4

(video) Grant Sanderson (3Blue1Brown) using Pluto's interactivity to teach Computational Thinking at MIT!

Let's do it!

Ingredients

For one tasty notebook 🥞 you will need:

  • Julia v1.6 or above
  • Linux, macOS or Windows (Linux and macOS will work best)
  • Mozilla Firefox or Google Chrome

Installation

🎈 How to install Julia & Pluto (6 min) 🎈

Run Julia, enter ] to bring up Julia's package manager, and add the Pluto package:

julia> ]
(v1.7) pkg> add Pluto

Press Ctrl+C to return to the julia> prompt.

Usage

To run Pluto, run the following commands in your Julia REPL:

julia> import Pluto
julia> Pluto.run()

Pluto will open in your browser, and you can get started!

Questions and Help

Questions? Have a look at the FAQ

Interested in learning Julia, Pluto and applied mathematics? Join the open MIT course taught by Alan Edelman, David P. Sanders & Grant Sanderson (3blue1brown) (and a bit of me): Introduction to Computational Thinking, Spring 2021.

Contribute to Pluto

Follow these instructions to start working on the package.

Featured notebooks

Unless otherwise specified, the included featured notebooks have a more permissive license: the Unlicense. This means that you can use them however you like - you do not need to credit us!

Your notebook files are yours, you also do not need to credit us. Have fun!

From the authors

The Pluto project is an ambition to rethink what a programming environment should be. We believe that scientific computing can be a lot simpler and more accessible. If you feel the same, give Pluto a try! We would love to hear what you think. 😊

You can chat with us

feedback screencap

Questions? Have a look at the FAQ.


🎈 Pluto – introduction (20 min) at Juliacon 2020 🎈

🌐 Pluto – one year later (25 min) at Juliacon 2021 🌐

Download Details:

Author: fonsp
Source Code: https://github.com/fonsp/Pluto.jl 
License: MIT license

#julia #visualization #education #reactive #notebook 


Run JavaScript & TypeScript in Node.js within VS Code Notebooks

Node.js Notebooks

Features

  • Enhanced REPL experience for Node.js in Notebooks (with top level awaits)
  • Run & debug JavaScript, TypeScript code in node.js
  • Built-in support for TypeScript (ships with TypeScript & ts-node).
  • Built-in support for plotly (plotly.js is shipped with the extension)
  • Rich (inline visualizations) using @tensorflow/tfjs-vis & Tensorboards
  • Excellent support for danfo.js (rich HTML output and plots)
  • Excellent support for arquero (rich HTML output)
  • Run shell scripts within the notebook cell.
  • Quickly prototype and view HTML/JavaScript/CSS output
  • Support for user input using readline

Packages such as plotly, tfjs-vis & danfo.js support rich visualization only in the browser; however, this extension leverages the power of Notebooks to provide the same rich visualizations when targeting node.js.

Use the command Open a sample node.js notebook to open a sample notebook to get started with plotly.js, danfo.js, tensorflow.js, etc.

Getting started

  • For a REPL experience use the command Open Node.js REPL
    • Consider installing the Jupyter extension for an enhanced user interface (toolbars).
  • For a notebook experience, create a file with the extension *.nnb, e.g. sample.nnb
    • Or use the menu item New File... to create a Node.js notebook

Repl Demo

Examples

  • Use the command Open a sample node.js notebook to open a sample notebook.
  • Use the command Welcome: Open Walkthrough... to checkout the samples.

Requirements

  • node.js >= 12
  • node.js needs to be on the current PATH

Roadmap

  • Open a plain js/ts file as a notebook & vice versa.
  • Better renderers for tabular data (arquero, danfo.js, etc)
  • Vega plots without having to install vega
  • Custom node arguments

Known issues, workarounds and technical details

  • See here for more details

Thanks

Thanks to the various packages we provide integrations with, which help make this extension useful.


Download Details:

Author: DonJayamanne
Source Code: https://github.com/DonJayamanne/typescript-notebook 
License: MIT license

#typescript #jupyter #notebook #plotly 


KB: A Minimalist Command Line Knowledge Base Manager

kb. A minimalist knowledge base manager


Purpose

kb is a text-oriented minimalist command line knowledge base manager. kb can be considered a quick note collection and access tool oriented toward software developers, penetration testers, hackers, students or whoever has to collect and organize notes in a clean way. Although kb is mainly targeted at text-based note collection, it supports non-text files as well (e.g., images, pdf, videos and others).

The project was born from the frustration of trying to find a good way to quickly access my notes, procedures, cheatsheets and lists (e.g., payloads) but at the same time, keeping them organized. This is particularly useful for any kind of student. I use it in the context of penetration testing to organize pentesting procedures, cheatsheets, payloads, guides and notes.

I found myself too frequently spending time trying to search for that particular payload list quickly, or spending too much time trying to find a specific guide/cheatsheet for a needed tool. kb tries to solve this problem by providing you a quick and intuitive way to access knowledge.

In a few words, kb allows a user to quickly and efficiently:

  • collect items containing notes, guides, procedures, cheatsheets into an organized knowledge base;
  • filter the knowledge base on different metadata: title, category, tags and others;
  • visualize items within the knowledge base with (or without) syntax highlighting;
  • grep through the knowledge base using regexes;
  • import/export an entire knowledge base;

Basically, kb provides a clean text-based way to organize your knowledge.

Installation

You should have Python 3.6 or above installed.

To install the most recent stable version of kb just type:

pip install -U kb-manager

If you want to install the bleeding-edge version of kb (that may have some bugs) you should do:

git clone https://github.com/gnebbia/kb
cd kb
pip install -r requirements.txt
python setup.py install

# or with pip
pip install -U git+https://github.com/gnebbia/kb

Tip for GNU/Linux and macOS users: For a better user experience, also set the following kb bash aliases:

cat <<EOF > ~/.kb_alias
alias kbl="kb list"
alias kbe="kb edit"
alias kba="kb add"
alias kbv="kb view"
alias kbd="kb delete --id"
alias kbg="kb grep"
alias kbt="kb list --tags"
EOF
echo "source ~/.kb_alias" >> ~/.bashrc
source ~/.kb_alias

Please remember to upgrade kb frequently by doing:

pip install -U kb-manager

Installation from AUR

Arch Linux users can install kb or kb-git with their favorite AUR Helper.

Stable:

yay -S kb

Dev:

yay -S kb-git

Installation from pkgsrc

Of course it runs on NetBSD (and on pkgsrc). We can install it from the pkgsrc source tree (databases/py-kb) or as a binary package using pkgin:

pkgin in py38-kb

Note that at the moment the package is only available from -current repositories.

Installation with homebrew

To install using homebrew, use:

brew tap gnebbia/kb https://github.com/gnebbia/kb.git
brew install gnebbia/kb/kb

To upgrade with homebrew:

brew update
brew upgrade gnebbia/kb/kb

Notes for Windows users

Windows users should keep in mind these things:

  • DO NOT USE Notepad as %EDITOR%; kb is not compatible with Notepad. A reasonable alternative is Notepad++;
  • the %EDITOR% variable should ALWAYS be enclosed within double quotes;
EDITOR=C:\Program Files\Editor\my cool editor.exe      -> WRONG!
EDITOR="C:\Program Files\Editor\my cool editor.exe"    -> OK!

To set the "EDITOR" Environment variable by using cmd.exe, just issue the following commands, after having inserted the path to your desired text editor:

set EDITOR="C:\path\to\editor\here.exe"
setx EDITOR "\"C:\path\to\editor\here.exe\""

To set the "EDITOR" Environment variable by using Powershell, just issue the following commands, after having inserted the path to your desired text editor:

$env:EDITOR='"C:\path\to\editor\here.exe"'
[System.Environment]::SetEnvironmentVariable('EDITOR','"C:\path\to\editor\here.exe"', [System.EnvironmentVariableTarget]::User)

Setting Aliases for cmd

Open a cmd.exe terminal with administrative rights and paste the following commands:

reg add "HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor" /v "AutoRun" /t REG_EXPAND_SZ /d "%USERPROFILE%\autorun.cmd"
(
echo @echo off
echo doskey kbl=kb list $*
echo doskey kbe=kb edit $*
echo doskey kba=kb add $*
echo doskey kbv=kb view $*
echo doskey kbd=kb delete --id $*
echo doskey kbg=kb grep $*
echo doskey kbt=kb list --tags $*
)> %USERPROFILE%\autorun.cmd

Setting Aliases for Powershell

Open a Powershell terminal and paste the following commands:

@'
function kbl { kb list $args }
function kbe { kb edit $args }
function kba { kb add  $args }
function kbv { kb view $args }
function kbd { kb delete --id $args }
function kbg { kb grep $args }
function kbt { kb list --tags $args }
'@ >  $env:USERPROFILE\Documents\WindowsPowerShell\profile.ps1

Docker

A docker setup has been included to help with development.

To install and start the project with docker:

docker-compose up -d
docker-compose exec kb bash

The container has the aliases included in its .bashrc, so you can use kb in the running container as you would if you had installed it on the host directly. The ./docker/data directory on the host is bound to /data in the container, which is also the image's working directory. To interact with the container, place (or symlink) the files on your host into the ./docker/data directory; they can then be seen and used in the /data directory in the container.

Usage

A quick demo of a typical scenario using kb:

kb_general_demo.gif

A quick demo with kb aliases enabled:

kb_general_demo_alias.gif

A quick demo for non-text documents:

kb_non_text_demo.gif

List artifacts

List all artifacts contained in the kb knowledge base

kb list

# or if aliases are used:
kbl

List all artifacts containing the string "zip"

kb list zip

# or if aliases are used:
kbl zip

kb_list_title_zip.gif

List all artifacts belonging to the category "cheatsheet"

kb list --category cheatsheet
# or
kb list -c cheatsheet

# or if aliases are used:
kbl -c cheatsheet

kb_list_category.gif

List all the artifacts having the tags "web" or "pentest"

kb list --tags "web;pentest"

# or if aliases are used:
kbl --tags "web;pentest"

kb_list_tags.gif

List using "verbose mode"

kb list -v

# or if aliases are used:
kbl -v

kb_list_verbose.gif

Add artifacts

Add a file to the collection of artifacts

kb add ~/Notes/cheatsheets/pytest

# or if aliases are used:
kba ~/Notes/cheatsheets/pytest

kb_add.gif

Add a file to the artifacts

kb add ~/ssh_tunnels --title pentest_ssh --category "procedure" \
    --tags "pentest;network" --author "gnc" --status "draft"

kb_add_title.gif

Add all files contained in a directory to kb

kb add ~/Notes/cheatsheets/general/* --category "cheatsheet"

kb_add_directory.gif

Create a new artifact from scratch

kb add --title "ftp" --category "notes" --tags "protocol;network"
# a text editor ($EDITOR) will be launched for editing

kb_add_from_scratch.gif

Create a new artifact from the output of another program

kb add --title "my_network_scan" --category "scans" --body "$(nmap -T5 -p80 192.168.1.0/24)"

Delete artifacts

Delete an artifact by ID

kb delete --id 2

# or if aliases are used:
kbd 2

Delete multiple artifacts by ID

kb delete --id 2 3 4

# or if aliases are used:
kbd 2 3 4

kb_delete_multiple.gif

Delete an artifact by name

kb delete --title zap --category cheatsheet

kb_delete_name.gif

View artifacts

View an artifact by id

kb view --id 3
# or
kb view -i 3
# or 
kb view 3

# or if aliases are used:
kbv 3

kb_view.gif

View an artifact by name

kb view --title "gobuster"
# or
kb view -t "gobuster"
# or
kb view gobuster

kb_view_title.gif

View an artifact without colors

kb view -t dirb -n

kb_view_title_nocolor.gif

View an artifact within a text-editor

kb view -i 2 -e

# or if aliases are used:
kbv 2 -e

kb_view_in_editor.gif

Edit artifacts

Editing artifacts involves opening a text editor. Hence, binary files cannot be edited by kb.

The editor can be set by the "EDITOR" environment variable.

Edit an artifact by id

kb edit --id 13

# or if aliases are used:
kbe 13

kb_edit.gif

Edit an artifact by name

kb edit --title "git" --category "cheatsheet"
# or
kb edit -t "git" -c "cheatsheet"
# or if git is unique as artifact
kb edit git

Grep through artifacts

Grep through the knowledge base

kb grep "[bg]zip"

# or if aliases are used:
kbg "[bg]zip"

kb_grep.gif

Grep (case-insensitive) through the knowledge base

kb grep -i "[BG]ZIP"

kb_grep_case_insensitive.gif

Grep in "verbose mode" through the knowledge base

kb grep -v "[bg]zip"

Grep through the knowledge base and show matching lines

kb grep -m "[bg]zip"

Import/Export/Erase a knowledge base

Export the current knowledge base

To export the entire knowledge base, do:

kb export

This will generate a .kb.tar.gz archive that can later be imported by kb.

kb_export.gif

If you want to export only data (so that it can be used in other software):

 kb export --only-data

This will export a directory containing a subdirectory for each category and within these subdirectories we will have all the artifacts belonging to that specific category.

Import a knowledge base

kb import archive.kb.tar.gz

NOTE: Importing a knowledge base erases all the previous data: everything is removed and the new knowledge base is imported in its place.

kb_import.gif

Erase the entire knowledge base

kb erase

kb_erase.gif

Manage Templates

kb supports custom templates for the artifacts. A template is basically a file using the "toml" format, structured in this way:

TITLES   = [ "^#.*", "blue",  ]
WARNINGS = [ "!.*" , "yellow",]
COMMENTS = [ ";;.*", "green", ]

Where the first element of each list is a regex and the second element is a color.

Note that by default an artifact is assigned the 'default' template, and this template can be changed too (look at the "Edit a template" subsection).

List available templates

To list all available templates:

kb template list

To list all the templates containing the string "theory":

kb template list "theory"

Create a new template

Create a new template called "lisp-cheatsheets"; note that an example template will be opened in the editor as a starting point.

kb template new lisp-cheatsheets

Delete a template

To delete the template called "lisp-cheatsheets" just do:

kb template delete lisp-cheatsheets

Edit a template

To edit the template called "lisp-cheatsheets" just do:

kb template edit lisp-cheatsheets

Add a template

We can also add a template from an already existing toml configuration file by just doing:

kb template add ~/path/to/myconfig.toml --title myconfig

Change template for an artifact

We can change the template for an existing artifact by ID by using the update command:

kb update --id 2 --template "lisp-cheatsheets"

Apply a template to all artifacts of a category

We can apply the template "lisp-cheatsheets" to all artifacts belonging to the category "lispcode" by doing:

kb template apply "lisp-cheatsheets" --category "lispcode"

Apply a template to all artifacts having zip in their title

We can apply the template "dark" to all artifacts having in their title the string "zip" (e.g., bzip, 7zip, zipper) by doing:

kb template apply "dark" --title "zip" --extended-match
# or 
kb template apply "dark" --title "zip" -m

We can always make our queries match artifacts that merely "contain" the string by using the --extended-match option with kb template apply.

Apply a template to all artifacts having specific properties

We can apply the template "light" to all artifacts of the category "cheatsheet" that have author "gnc" and status "OK" by doing:

kb template apply "light" --category "cheatsheet" --author "gnc" --status "OK"

Integrating kb with other tools

kb can be integrated with other tools.

kb and rofi

We can integrate kb with rofi: a custom mode has been developed, available in the "misc" directory within this repository.

We can launch rofi with this mode by doing:

rofi -show kb -modi kb:/path/to/rofi-kb-mode.sh

Experimental

Synchronize kb with a remote git repository

Synchronization with a remote git repository is experimental at the moment. Still, we can initialize our knowledge base against an empty GitHub/GitLab (or other git service) repository by doing:

kb sync init

We can then push our knowledge base to the remote git repository with:

kb sync push

We can pull (e.g., from another machine) our knowledge base from the remote git repository with:

kb sync pull

We can at any time view which remote endpoint our knowledge base synchronizes to with:

kb sync info

UPGRADE

If you want to upgrade kb to the most recent stable release do:

pip install -U kb-manager

If instead you want to update kb to the most recent release (which may be buggy), do:

git clone https://github.com/gnebbia/kb 
cd kb
pip install --upgrade .

FAQ

Q) How do I solve the AttributeError: module 'attr' has no attribute 's' error?

A) Uninstall attr and use attrs:

pip uninstall attr
pip uninstall attrs
pip install attrs
pip install -U kb-manager

Date: 2022-09-21

Version: 0.1.7


Download Details:

Author: Gnebbia
Source Code: https://github.com/gnebbia/kb 
License: GPL-3.0 license

#python #cli #knowledge #notebook #notes 


Whisper: Robust Speech Recognition via Large-Scale Weak Supervision

Whisper

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

Approach

approach diagram

A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
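
As an illustrative sketch of that format (the token names below follow the released model's vocabulary, but treat the exact sequence as an assumption), an English transcription task is specified roughly as:

<|startoftranscript|> <|en|> <|transcribe|> <|notimestamps|> ...predicted text tokens... <|endoftext|>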

Setup

We used Python 3.9.9 and PyTorch 1.10.1 to train and test our models, but the codebase is expected to be compatible with Python 3.8-3.10 and recent PyTorch versions. The codebase also depends on a few Python packages, most notably HuggingFace Transformers for their fast tokenizer implementation and ffmpeg-python for reading audio files. You can download and install (or update to) the latest release of Whisper with the following command:

pip install -U openai-whisper

Alternatively, the following command will pull and install the latest commit from this repository, along with its Python dependencies:

pip install git+https://github.com/openai/whisper.git 

To update the package to the latest version of this repository, please run:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

It also requires the command-line tool ffmpeg to be installed on your system, which is available from most package managers:

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg

You may need rust installed as well, in case tokenizers does not provide a pre-built wheel for your platform. If you see installation errors during the pip install command above, please follow the Getting started page to install the Rust development environment. Additionally, you may need to configure the PATH environment variable, e.g. export PATH="$HOME/.cargo/bin:$PATH". If the installation fails with No module named 'setuptools_rust', you need to install setuptools_rust, e.g. by running:

pip install setuptools-rust

Available models and languages

There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.

Size    Parameters  English-only model  Multilingual model  Required VRAM  Relative speed
tiny    39 M        tiny.en             tiny                ~1 GB          ~32x
base    74 M        base.en             base                ~1 GB          ~16x
small   244 M       small.en            small               ~2 GB          ~6x
medium  769 M       medium.en           medium              ~5 GB          ~2x
large   1550 M      N/A                 large               ~10 GB         1x

The .en models for English-only applications tend to perform better, especially for the tiny.en and base.en models. We observed that the difference becomes less significant for the small.en and medium.en models.

Whisper's performance varies widely depending on the language. The figure below shows a WER (Word Error Rate) breakdown by languages of the Fleurs dataset using the large-v2 model (smaller WER is better). More WER and BLEU scores corresponding to the other models and datasets can be found in Appendix D of the paper.

WER breakdown by language

Command-line usage

The following command will transcribe speech in audio files, using the medium model:

whisper audio.flac audio.mp3 audio.wav --model medium

The default setting (which selects the small model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:

whisper japanese.wav --language Japanese

Adding --task translate will translate the speech into English:

whisper japanese.wav --language Japanese --task translate

Run the following to view all available options:

whisper --help

See tokenizer.py for the list of all available languages.

Python usage

Transcription can also be performed within Python:

import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])

Internally, the transcribe() method reads the entire file and processes the audio with a sliding 30-second window, performing autoregressive sequence-to-sequence predictions on each window.

Below is an example usage of whisper.detect_language() and whisper.decode() which provide lower-level access to the model.

import whisper

model = whisper.load_model("base")

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)

# print the recognized text
print(result.text)

More examples

Please use the 🙌 Show and tell category in Discussions for sharing more example usages of Whisper and third-party extensions such as web demos, integrations with other tools, ports for different platforms, etc.


Download Details:

Author: Openai
Source Code: https://github.com/openai/whisper 
License: MIT license

#jupyter #notebook #speech #recognition #via 


Querybook: A Big Data Querying UI, Combining Collocated Table Metadata

Querybook

Querybook is a Big Data IDE that allows you to discover, create, and share data analyses, queries, and tables. 

Features

  • 📚 Organize analyses with rich text, queries, and charts
  • ✏️ Compose queries with autocompletion and hovering tooltip
  • 📈 Use scheduling + charting in DataDocs to build dashboards
  • 🙌 Live query collaborations with others
  • 📝 Add additional documentation to your tables
  • 🧮 Get lineage, sample queries, frequent users, and search ranking based on past query runs

Getting started

Prerequisite

Please install Docker before trying out Querybook.

Quick setup

Pull this repo and run make. Visit https://localhost:10001 when the build completes.
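
A sketch of those steps as shell commands (assuming git and make are available):

git clone https://github.com/pinterest/querybook.git
cd querybook
make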

For more details on installation, click here

Configuration

For infrastructure configuration, click here. For general configuration, click here.

Supported Integrations

Query Engines

Authentication

  • User/Password
  • OAuth
    • Google Cloud OAuth
    • Okta OAuth
    • GitHub OAuth
  • LDAP

Metastore

Can be used to fetch schema and table information for metadata enrichment.

  • Hive Metastore
  • Sqlalchemy Inspect
  • AWS Glue Data Catalog

Result Storage

Use one of the following to store query results.

  • Database (MySQL, Postgres, etc)
  • S3
  • Google Cloud Storage
  • Local file

Result Export

Upload query results from Querybook to other tools for further analyses.

  • Google Sheets Export
  • Python export

Notification

Get notified upon completion of queries and DataDoc invitations via IM or email.

  • Email
  • Slack

User Interface

Query Editor editor.gif

Charting visualization.gif

Scheduling

Lineage & Analytics analytics.gif

Contributing Back

See CONTRIBUTING.

Check out the full documentation & feature highlights here.

Download Details:

Author: Pinterest
Source Code: https://github.com/pinterest/querybook 
License: Apache-2.0 license

#typescript #flask #presto #hive #notebook 


Bamboolib: A GUI for Pandas DataFrames

Bamboolib

Community repository of bamboolib

Data Analysis in Python 🐍 - without becoming a programmer or googling syntax

This is the community repository of bamboolib. You can use bamboolib for free if you use bamboolib on your local computer or on Open Data via Binder.

  • If you have any issues or feature requests, please open an issue.

bamboolib is a GUI for pandas DataFrames that enables anyone to work with Python in Jupyter Notebook or JupyterLab.

Features

  • Intuitive GUI that exports Python code
  • Supports all common transformations and visualizations
  • Provides best-practice analyses for data exploration
  • Can be arbitrarily customized via simple Python plugins
  • Integrate any internal or external Python library

Main benefits of bamboolib

  • Enables anyone to analyse data in Python without having to write code
  • Even people who can code use bamboolib because it is faster and easier than writing the code themselves
  • Reduces employee on-boarding time and training costs
  • Enables team members of all skill levels to collaborate within Jupyter and to share the working results as reproducible code
  • No lock-in. You own the code you created with bamboolib
  • All your data remains private and secure

🔍Try bamboolib live on Binder

Installation

Install bamboolib for Jupyter Notebook or Jupyter Lab by running the code below in your terminal (or Anaconda Prompt for Windows):

pip install bamboolib

# Jupyter Notebook extensions
python -m bamboolib install_nbextensions

# JupyterLab extensions
python -m bamboolib install_labextensions

After you have installed bamboolib, you can go here to test bamboolib.
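
As a minimal usage sketch (the CSV path is hypothetical; importing bamboolib and then displaying a DataFrame is the documented way to bring up the GUI):

import bamboolib as bam  # activates the GUI for pandas DataFrames
import pandas as pd

df = pd.read_csv("data.csv")  # hypothetical file path
df  # displaying the DataFrame in a Jupyter cell now shows the bamboolib GUI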

Documentation

You can find out how to get started, along with tutorials and an API reference, in our docs.

Further links


bamboolib is joining forces with Databricks. For more information, please read our announcement.

Please note that this repository does not contain the source code of bamboolib. The repo contains, e.g., explanations and code samples for plugins, and it serves as a place to answer public questions via issues.


Download Details:

Author: tkrabel
Source Code: https://github.com/tkrabel/bamboolib 

#machinelearning #python #jupyter #notebook 


Jupyter-text2code: Text2Code for Jupyter Notebook

Text2Code for Jupyter notebook

A proof-of-concept Jupyter extension which converts English queries into relevant Python code.

jupyter-text2code-demo.gif

Supported Operating Systems:

  • Ubuntu
  • macOS

Installation

NOTE: We have renamed the plugin from mopp to jupyter-text2code. Uninstall mopp before installing new jupyter-text2code version.

pip uninstall mopp

CPU-only install:

For Mac and Ubuntu installations without an NVIDIA GPU, we need to explicitly set an environment variable at install time.

export JUPYTER_TEXT2CODE_MODE="cpu"

GPU install dependencies:

sudo apt-get install libopenblas-dev libomp-dev

Installation commands:

git clone https://github.com/deepklarity/jupyter-text2code.git
cd jupyter-text2code
pip install .
jupyter nbextension enable jupyter-text2code/main

Uninstallation:

pip uninstall jupyter-text2code

Usage Instructions:

  • Start Jupyter notebook server by running the following command: jupyter notebook
  • If you don't see the Nbextensions tab in Jupyter notebook, run the following command: jupyter contrib nbextension install --user
  • You can open the sample notebooks/ctds.ipynb notebook for testing
  • If the installation succeeded, the Universal Sentence Encoder model will be downloaded from tensorflow_hub the first time the extension runs
  • Click on the Terminal Icon which appears on the menu (to activate the extension)
  • Type "help" to see a list of currently supported commands in the repo
  • Watch Demo video for some examples

Docker containers for jupyter-text2code (old version)

We have published CPU and GPU images to docker hub with all dependencies pre-installed.

Visit https://hub.docker.com/r/deepklarity/jupyter-text2code/ to download the images and usage instructions.

CPU image size: 1.51 GB

GPU image size: 2.56 GB

Model training:

The plugin now supports pandas commands plus quick insertion of available snippets from awesome-notebooks. With this change, we can now get snippets for the most popular integrations from within the Jupyter tab, e.g.:

  • Get followers count from twitter
  • Get stats about a story from instagram

The detailed training steps are available in the scripts README, where we also evaluated the performance of different models and ended up selecting SentenceTransformers paraphrase-MiniLM-L6-v2.

Steps to add more intents:

  • Add more templates in ner_templates with a new intent_id
  • Generate training data. Modify generate_training_data.py if different generation techniques are needed or if introducing a new entity.
  • Train intent index
  • Train NER model
  • Modify jupyter_text2code/jupyter_text2code_serverextension/__init__.py with the new intent's condition and add the actual code for the intent
  • Reinstall plugin by running: pip install .

TODO:

  •  Publish Docker image
  •  Refactor code and make it more modular, remove duplicate code, etc
  •  Add support for more commands
  •  Improve intent detection and NER
  •  Add support for Windows
  •  Explore sentence Paraphrasing to generate higher-quality training data
  •  Gather real-world variable names, library names as opposed to randomly generating them
  •  Try NER with a transformer-based model
  •  With enough data, train a language model to directly do English->code like GPT-3 does, instead of having separate stages in the pipeline
  •  Create a survey to collect linguistic data
  •  Add Speech2Code support

Blog post with more details:

Data analysis made easy: Text2Code for Jupyter notebook

Demo Video:

Text2Code for Jupyter notebook


Download Details:

Author: Deepklarity
Source Code: https://github.com/deepklarity/jupyter-text2code 
License: MIT license

#machinelearning #python #jupyter #notebook 


Evaluate and Monitor ML Models From Validation to Production

Evidently

An open-source framework to evaluate, test and monitor ML models in production.

📊 What is Evidently?

Evidently is an open-source Python library for data scientists and ML engineers. It helps evaluate, test, and monitor the performance of ML models from validation to production.

Evidently has a modular approach with 3 interfaces on top of the shared metrics functionality.

1. Tests: batch model checks

Tests example

Tests perform structured data and ML model quality checks. They verify a condition and return an explicit pass or fail result.

You can create a custom Test Suite from 50+ individual tests or run a preset (for example, Data Drift or Regression Performance). You can get results as an interactive visual dashboard inside Jupyter notebook or Colab, or export as JSON or Python dictionary.

Tests are best for automated batch model checks. You can integrate them as a pipeline step using tools like Airflow.

2. Reports: interactive dashboards

Note We added a new Report object starting from v0.1.57.dev0. Reports unite the functionality of Dashboards and JSON profiles with a new, cleaner API. You can still use the old Dashboards API, but it will soon be deprecated.

Report example

Reports calculate various data and ML metrics and render rich visualizations. You can create a custom Report or run a preset to evaluate a specific aspect of the model or data performance. For example, a Data Quality or Classification Performance report.

You can get an HTML report (best for exploratory analysis and debugging) or export results as JSON or a Python dictionary (best for logging, documentation, or integration with BI tools).

3. Real-time ML monitoring

Note This functionality is in development and subject to API change.

Dashboard example

Evidently has monitors that collect data and model metrics from a deployed ML service. You can use it to build live monitoring dashboards. Evidently configures the monitoring on top of streaming data and emits the metrics in Prometheus format. There are pre-built Grafana dashboards to visualize them.

👩‍💻 Installing from PyPI

MAC OS and Linux

Evidently is available as a PyPI package. To install it using pip package manager, run:

$ pip install evidently

If you only want to get results as HTML or JSON files, the installation is now complete. To display the dashboards inside a Jupyter notebook, you need jupyter nbextension. After installing evidently, run the two following commands in the terminal from the evidently directory.

To install jupyter nbextension, run:

$ jupyter nbextension install --sys-prefix --symlink --overwrite --py evidently

To enable it, run:

$ jupyter nbextension enable evidently --py --sys-prefix

That's it! A single run after the installation is enough.

Note: if you use Jupyter Lab, the reports might not display in the notebook. However, you can still save them as HTML files.

Windows

Evidently is available as a PyPI package. To install it using pip package manager, run:

$ pip install evidently

Unfortunately, building reports inside a Jupyter notebook is not yet possible for Windows. The reason is that Windows requires administrator privileges to create symlinks. We will address this issue in later versions. You can still generate the HTML to view externally.

▶️ Getting started

Note This is a simple Hello World example. You can find a complete Getting Started Tutorial in the docs.

Jupyter Notebook

To start, prepare your data as two pandas DataFrames. The first should include your reference data, the second your current production data. The structure of both datasets should be identical. To run some of the evaluations (e.g. Data Drift), you need input features only. In other cases (e.g. Target Drift, Classification Performance), you need Target and/or Prediction.

Option 1: Test Suites

After installing the tool, import Evidently test suite and required presets. We'll use a simple toy dataset:

import pandas as pd

from sklearn import datasets

from evidently.test_suite import TestSuite
from evidently.test_preset import DataStabilityTestPreset
from evidently.test_preset import DataQualityTestPreset

iris_data = datasets.load_iris(as_frame='auto')
iris_frame = iris_data.frame

To run the Data Stability test suite and display the reports in the notebook:

data_stability = TestSuite(tests=[
    DataStabilityTestPreset(),
])
data_stability.run(current_data=iris_frame.iloc[:90], reference_data=iris_frame.iloc[90:], column_mapping=None)
data_stability 

To save the results as an HTML file:

data_stability.save_html("file.html")

You'll need to open it from the destination folder.

To get the output as JSON:

data_stability.json()

Option 2: Reports

After installing the tool, import Evidently report and required presets:

import pandas as pd

from sklearn import datasets

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

iris_data = datasets.load_iris(as_frame='auto')
iris_frame = iris_data.frame

To generate the Data Drift report, run:

data_drift_report = Report(metrics=[
    DataDriftPreset(),
])

data_drift_report.run(current_data=iris_frame.iloc[:90], reference_data=iris_frame.iloc[90:], column_mapping=None)
data_drift_report

To save the report as HTML:

data_drift_report.save_html("file.html")

You'll need to open it from the destination folder.

To get the output as JSON:

data_drift_report.json()

💻 Contributions

We welcome contributions! Read the Guide to learn more.

📚 Documentation

For more information, refer to a complete Documentation. You can start with this Tutorial for a quick introduction.

🗂️ Examples

Here you can find simple examples on toy datasets to quickly explore what Evidently can do right out of the box.

Report | Jupyter notebook | Colab notebook | Contents
Getting Started Tutorial | link | link | Data Stability and custom test suites, Data Drift and Target Drift reports
Evidently Metric Presets | link | link | Data Drift, Target Drift, Data Quality, Regression, Classification reports
Evidently Metrics | link | link | All individual metrics
Evidently Test Presets | link | link | NoTargetPerformance, Data Stability, Data Quality, Data Drift, Regression, Multi-class Classification, Binary Classification, Binary Classification top-K test suites
Evidently Tests | link | link | All individual tests

Integrations

See how to integrate Evidently in your prediction pipelines and use it with other tools.

Title | Link to tutorial
Real-time ML monitoring with Grafana | Evidently + Grafana
Batch ML monitoring with Airflow | Evidently + Airflow
Log Evidently metrics in MLflow UI | Evidently + MLflow

☎️ User Newsletter

To get updates on new features, integrations and code tutorials, sign up for the Evidently User Newsletter.

✅ Discord Community

If you want to chat and connect, join our Discord community!

Docs | Discord Community | User Newsletter | Blog | Twitter

Download Details:

Author: Evidentlyai
Source Code: https://github.com/evidentlyai/evidently 
License: Apache-2.0 license

#machinelearning #datascience #pandas #dataframes #jupyter #notebook 


Docker-stacks: Ready-to-run Docker images containing Jupyter apps

Jupyter Docker Stacks

Jupyter Docker Stacks are a set of ready-to-run Docker images containing Jupyter applications and interactive computing tools. You can use a stack image to do any of the following (and more):

  • Start a personal Jupyter Server with the JupyterLab frontend (default)
  • Run JupyterLab for a team using JupyterHub
  • Start a personal Jupyter Notebook server in a local Docker container
  • Write your own project Dockerfile

Quick Start

You can try a relatively recent build of the jupyter/base-notebook image on mybinder.org by simply clicking the preceding link. Otherwise, the examples below may help you get started if you have Docker installed, know which Docker image you want to use and want to launch a single Jupyter Server in a container.

The User Guide on ReadTheDocs describes additional uses and features in detail.

Example 1:

This command pulls the jupyter/scipy-notebook image tagged 85f615d5cafa from Docker Hub if it is not already present on the local host. It then starts a container running a Jupyter Server and exposes the container's internal port 8888 to port 10000 of the host machine:

docker run -p 10000:8888 jupyter/scipy-notebook:85f615d5cafa

You can change the host port on which the container is exposed by changing the value of the -p option, e.g. to -p 8888:8888.

Visiting http://<hostname>:10000/?token=<token> in a browser loads JupyterLab, where:

  • hostname is the name of the computer running Docker
  • token is the secret token printed in the console.

The container remains intact for restart after the Jupyter Server exits.

Example 2:

This command pulls the jupyter/datascience-notebook image tagged 85f615d5cafa from Docker Hub if it is not already present on the local host. It then starts an ephemeral container running a Jupyter Server and exposes the server on host port 10000.

docker run -it --rm -p 10000:8888 -v "${PWD}":/home/jovyan/work jupyter/datascience-notebook:85f615d5cafa

The use of the -v flag in the command mounts the current working directory on the host (${PWD} in the example command) as /home/jovyan/work in the container. The server logs appear in the terminal.

Visiting http://<hostname>:10000/?token=<token> in a browser loads JupyterLab.

Due to the usage of the --rm flag, Docker automatically cleans up the container and removes the file system when the container exits, but any changes made to the ~/work directory and its files in the container will remain intact on the host. The -it flag allocates a pseudo-TTY.

Contributing

Please see the Contributor Guide on ReadTheDocs for information about how to contribute package updates, recipes, features, tests, and community maintained stacks.

Maintainer Help Wanted

We value all positive contributions to the Docker stacks project, from bug reports to pull requests to help with answering questions. We'd also like to invite members of the community to help with two maintainer activities:

  • Issue triaging: Reading and providing a first response to issues, labeling issues appropriately, redirecting cross-project questions to Jupyter Discourse
  • Pull request reviews: Reading proposed documentation and code changes, working with the submitter to improve the contribution, deciding if the contribution should take another form (e.g., a recipe instead of a permanent change to the images)

Anyone in the community can jump in and help with these activities at any time. We will happily grant additional permissions (e.g., ability to merge PRs) to anyone who shows an ongoing interest in working on the project.

Jupyter Notebook Deprecation Notice

Following Jupyter Notebook notice, JupyterLab is now the default for all the Jupyter Docker stack images. It is still possible to switch back to Jupyter Notebook (or to launch a different startup command). You can achieve this by passing the environment variable DOCKER_STACKS_JUPYTER_CMD=notebook (or any other valid jupyter subcommand) at container startup, more information is available in the documentation.
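
For example, a sketch of passing that variable at container startup (mirroring the docker run examples above; pick the image and tag you need):

docker run -it --rm -p 10000:8888 -e DOCKER_STACKS_JUPYTER_CMD=notebook jupyter/base-notebook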

According to the Jupyter Notebook project status and its compatibility with JupyterLab, these Docker images may remove the classic Jupyter Notebook interface altogether in favor of another classic-like UI built atop JupyterLab.

This change is tracked in the issue #1217; please check its content for more information.

Alternatives

Resources

CPU Architectures

  • We publish containers for both x86_64 and aarch64 platforms, except for tensorflow-notebook, which only supports x86_64 for now
  • Single-platform images have either aarch64 or x86_64 tag prefixes, for example jupyter/base-notebook:aarch64-python-3.10.5
  • Starting from 2022-09-21, we create multi-platform images

Using old images

This project only builds one set of images at a time. On 2022-10-09, we rebuilt images with old Ubuntu and Python versions for users who still need them:

Ubuntu  Python  Tag
20.04   3.7     1aac87eb7fa5
20.04   3.8     a374cab4fcb6
20.04   3.9     5ae537728c69
20.04   3.10    f3079808ca8c
22.04   3.7     b86753318aa1
22.04   3.8     7285848c0a11
22.04   3.9     ed2908bbb62e
22.04   3.10    latest (this image is rebuilt weekly)

Download Details:

Author: jupyter
Source Code: https://github.com/jupyter/docker-stacks 
License: View license

#jupyter #python #docker #notebook 


Notebook: Jupyter interactive Notebook

Jupyter Notebook

The Jupyter notebook is a web-based notebook environment for interactive computing.

Jupyter notebook example

Maintained versions

We maintain the two most recently released major versions of Jupyter Notebook, Notebook v5 and Classic Notebook v6. After Notebook v7.0 is released, we will no longer maintain Notebook v5. All Notebook v5 users are strongly advised to upgrade to Classic Notebook v6 as soon as possible.

The Jupyter Notebook project is currently undertaking a transition to a more modern code base built from the ground-up using JupyterLab components and extensions.

There is new stream of work which was submitted and then accepted as a Jupyter Enhancement Proposal (JEP) as part of the next version (v7): https://jupyter.org/enhancement-proposals/79-notebook-v7/notebook-v7.html

There is also a plan to continue maintaining Notebook v6 with bug and security fixes only, to ease the transition to Notebook v7: https://github.com/jupyter/notebook-team-compass/issues/5#issuecomment-1085254000

Notebook v7

The next major version of Notebook will be based on:

  • JupyterLab components for the frontend
  • Jupyter Server for the Python server

This represents a significant change to the jupyter/notebook code base.

To learn more about Notebook v7: https://jupyter.org/enhancement-proposals/79-notebook-v7/notebook-v7.html

Classic Notebook v6

Maintenance and security-related issues are now being addressed in the 6.4.x branch.

A 6.5.x branch will soon be created and will depend on nbclassic for the HTML/JavaScript/CSS assets.

New features and continuous improvement are now focused on Notebook v7 (see the section above).

If you have an open pull request with a new feature, or if you were planning to open one, we encourage switching over to the Jupyter Server and JupyterLab architecture, and distributing it as a server extension and/or a JupyterLab prebuilt extension. That way your new feature will also be compatible with the new Notebook v7.

Jupyter notebook, the language-agnostic evolution of IPython notebook

Jupyter notebook is a language-agnostic HTML notebook application for Project Jupyter. In 2015, Jupyter notebook was released as a part of The Big Split™ of the IPython codebase. IPython 3 was the last major monolithic release containing both language-agnostic code, such as the IPython notebook, and language specific code, such as the IPython kernel for Python. As computing spans across many languages, Project Jupyter will continue to develop the language-agnostic Jupyter notebook in this repo and with the help of the community develop language specific kernels which are found in their own discrete repos.

Installation

You can find the installation documentation for the Jupyter platform on ReadTheDocs. The documentation for advanced usage of Jupyter notebook can be found here.

For a local installation, make sure you have pip installed and run:

pip install notebook

Usage - Running Jupyter notebook

Running in a local installation

Launch with:

jupyter notebook

Running in a remote installation

You need some configuration before starting Jupyter notebook remotely. See Running a notebook server.

Development Installation

See CONTRIBUTING.md for how to set up a local development installation.

Contributing

If you are interested in contributing to the project, see CONTRIBUTING.md.

Community Guidelines and Code of Conduct

This repository is a Jupyter project and follows the Jupyter Community Guides and Code of Conduct.

Resources

Download Details:

Author: Jupyter
Source Code: https://github.com/jupyter/notebook 
License: View license

#jupyter #notebook 


Learn About Quarto and Jupyter Notebooks

In this article, we will learn about Quarto and Jupyter Notebooks. As a Python user, I'm not that familiar with .Rmd/.qmd files; I use .ipynb notebooks most often. In this post, I'll show why you might consider using Jupyter Notebooks and how to convert them into beautiful reports with minimal effort, using Quarto.

What is Jupyter Notebook

If you’re an R purist, you may not be familiar with Jupyter Notebooks. I’ll briefly introduce it and then we can jump into Quarto. 

Jupyter Notebook is a web application that provides a streamlined, interactive way to work with code mixed with plots and markdown text. And although it’s popular for Python users, it also supports other languages like R. There are some limitations in creating and sharing computational documentation, but that’s where Quarto comes into play.

Contrary to .qmd/.Rmd files, in the .ipynb format all outputs, like plots and tables, are saved inside the report file.

This has its pros and cons. On one hand, it's convenient to be able to embed images into the same file where the executable code is. On the other, embedding images into code files makes it hard to version control notebooks. Fortunately, in recent years this has changed; VS Code supports notebook diffs!

Additionally, Jupyter Notebooks are rendered on GitHub, so you can easily share your reports and make them readable for everyone.

If Jupyter Notebook is so great, why should I consider Quarto?

While the Jupyter Notebook format is very convenient to experiment in, there’s no easy way to convert a notebook into a beautiful report. That is until Quarto entered the picture.

Image 1 – Getting started with Quarto

With Quarto, you can easily export your .ipynb file into an interactive HTML with plotly plots, interactive cross-references, and a table of contents!

Is R your preferred language? Get started with Quarto in R with our Quarto tutorial for interactive Markdown documents.

If you work closely with R developers that are used to .Rmd files, you'll find an additional benefit. You can create a single custom theme in a .css file for all reports, from .qmd files and from .ipynb! This way you'll have consistency across reports created using different technologies, creating a uniform, professional look in front of your clients!

It’s always good to remember that Quarto, by leveraging on top of Pandoc, allows exporting to over 50 different formats! This is important as HTML reports are not accepted everywhere. But worry not, you can just as easily export as pdf or another format as needed.

Example of Jupyter Notebook and Quarto

I believe that the best way to learn is through examples. So let’s start by looking at a simple notebook, full of Quarto features compatible with any notebook editor. Actually, this notebook contains the same code as the .qmd file from the previous post.

 

Quarto and Jupyter Notebook code example

And here is the generated report by running quarto render report.ipynb.

Quarto report from Jupyter Notebook

The only important thing is the first cell, the one with the yaml configuration, which has to be of type raw. As we can see, all the features that we've used earlier work here as well!

Once you’re done with report creation, you might want to check out the self-contained: true Quarto option. It bundles all the required CSS and JS files into the HTML file, making the report easy to distribute and usable without an internet connection.
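
For instance, extending the illustrative front matter above:

format:
  html:
    self-contained: true  # embed CSS/JS into the single HTML file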

The benefits of using Quarto with Jupyter Notebook

You could ask: OK, so the results are exactly the same as with the .qmd file – what’s the deal? With quarto preview, every time you change the notebook and save, the preview gets updated. But what’s important is that the cells’ outputs are taken directly from the notebook, with no need to re-run all the cells!
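
To try it, point quarto preview at the notebook and keep editing:

quarto preview report.ipynb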

This can save you a lot of time, and it makes working in Jupyter my favorite way of creating reports. That being said, remember that apart from Python, you can just as well use R or Julia kernels in Jupyter!

Summing up Quarto and Jupyter Notebook

Jupyter Notebooks provide a fantastic way of experimenting iteratively. What they lacked was a way to export the resulting report to a visually appealing, business-friendly format. And that’s exactly what Quarto provides!


Original article sourced at: https://appsilon.com

#jupyter #notebook 

Learn About Quarto and Jupyter Notebooks

Royce Reinger

1667923380

Kubeflow: Machine Learning toolkit for Kubernetes

Kubeflow is the cloud-native platform for machine learning operations – pipelines, training and deployment.


Documentation

Please refer to the official docs at kubeflow.org.

Working Groups

The Kubeflow community is organized into working groups (WGs), each with associated repositories, that focus on specific pieces of the ML platform.

Get Involved

Please refer to the Community page.

Download Details:

Author: Kubeflow
Source Code: https://github.com/kubeflow/kubeflow 
License: Apache-2.0 license

#machinelearning #kubernetes #jupyter #notebook #tensorflow 

Kubeflow: Machine Learning toolkit for Kubernetes

Nat Grady

1666874880

Jupyterlab-lsp: Language Server Protocol integration for Jupyter(Lab)

Language Server Protocol integration for Jupyter(Lab)

Features

Examples show Python code, but most features also work in R, bash, TypeScript, and many other languages.

Hover

Hover over any piece of code; if an underline appears, you can press Ctrl to get a tooltip with the function/class signature, module documentation, or any other piece of information that the language server provides.

hover

Diagnostics

Critical errors have a red underline, warnings are orange, etc. Hover over the underlined code to see a more detailed message.

inspections

Jump to Definition and References

Use the context menu entry, or Alt + click, to jump to definitions/references (you can change the modifier to Ctrl/⌘ in settings); use Alt + o to jump back.

jump

Highlight References

Place your cursor on a variable, function, etc., and all of its usages will be highlighted.

Automatic Completion and Continuous Hinting

  • Certain characters, for example '.' (dot) in Python, will automatically trigger completion.
  • You can choose to receive the completion suggestions as you type by enabling continuousHinting setting.
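
A minimal sketch of the relevant fragment of the completion settings in the Advanced Settings editor (only the key described above is shown):

{
  "continuousHinting": true
}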

invoke

Automatic Signature Suggestions

Function signatures will automatically be displayed

signature

Kernel-less Autocompletion

Advanced static-analysis autocompletion without a running kernel

autocompletion

The runtime kernel suggestions are still there

When a kernel is available, the suggestions from the kernel (such as keys of a dict and columns of a DataFrame) are merged with the suggestions from the Language Server (in notebooks).

If the kernel is too slow to respond promptly, only the Language Server suggestions will be shown (default threshold: 0.6s). You can configure the completer not to attempt fetching the kernel completions if the kernel is busy (skipping the 0.6s timeout).

You can deactivate the kernel suggestions by adding "Kernel" to disableCompletionsFrom in the completion section of the Advanced Settings. Alternatively, if you only want kernel completions, you can add "LSP" to the same setting; or add both if you like to code in hardcore mode and get no completions, or if another provider has been added.
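
For example, a completion settings fragment disabling the kernel suggestions (again, only the relevant key shown):

{
  "disableCompletionsFrom": ["Kernel"]
}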

Rename

Rename variables, functions and more, in both notebooks and the file editor. Use the context menu option or the F2 shortcut to invoke.

rename

Diagnostics panel

Sort and jump between diagnostics using the diagnostics panel. Open it by searching for "Show diagnostics panel" in the JupyterLab commands palette, or from the context menu. Use the context menu on rows in the panel to filter out diagnostics or copy their message.

panel

Prerequisites

You will need to have both of the following installed:

  • JupyterLab >=3.3.0,<4.0.0a0
  • Python 3.7+

In addition, if you wish to use a JavaScript, HTML, Markdown or any other NodeJS-based language server, you will need an appropriate NodeJS version installed.

Note: Installation for JupyterLab 2.x requires a different procedure, please consult the documentation for the extension version 2.x.

Installation

For more extensive installation instructions, see the documentation.

For the current stable version, the following steps are recommended. Using a Python virtualenv or a conda env is also a good idea.

install python 3

conda install -c conda-forge python=3

install JupyterLab and the extensions

conda install -c conda-forge 'jupyterlab>=3.0.0,<4.0.0a0' jupyterlab-lsp
# or
pip install 'jupyterlab>=3.0.0,<4.0.0a0' jupyterlab-lsp

Note: jupyterlab-lsp provides both the server extension and the lab extension.

Note: With conda, you could take advantage of the bundles: jupyter-lsp-python or jupyter-lsp-r to install both the server extension and the language server.

install LSP servers for languages of your choice; for example, for Python (pylsp) and R (languageserver) servers:

pip install 'python-lsp-server[all]'
R -e 'install.packages("languageserver")'

or from conda-forge

conda install -c conda-forge python-lsp-server r-languageserver

Please see our full list of supported language servers which includes installation hints for the common package managers (npm/pip/conda). In general, any LSP server from the Microsoft list should work after some additional configuration.

Note: it is worth visiting the repository of each server you install as many provide additional configuration options.

Restart JupyterLab

If JupyterLab was running when you installed the extension, a restart is required for the server extension and any language servers to be recognized by JupyterLab.

(Optional, IPython users only) To improve the performance of autocompletion, disable Jedi in IPython (the LSP servers for Python use Jedi too). You can do that temporarily with:

%config Completer.use_jedi = False

or permanently by setting c.Completer.use_jedi = False in your ipython_config.py file.

(Optional, Linux/macOS only) As a security measure, by default the Jupyter server only allows access to files under the Jupyter root directory (the place where you launch the Jupyter server). Thus, in order to allow jupyterlab-lsp to navigate to external files such as packages installed system-wide or libraries inside a virtual environment (conda, pip, ...), this access control mechanism needs to be circumvented: inside your Jupyter root directory, create a symlink named .lsp_symlink pointing to your system root /.

ln -s / .lsp_symlink

As this symlink is a hidden file the Jupyter server must be instructed to serve hidden files. Either use the appropriate command line flag:

jupyter lab --ContentsManager.allow_hidden=True

or, alternatively, set the corresponding setting inside your jupyter_server_config.py.
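
Given the command line flag above, the equivalent line in jupyter_server_config.py would be:

c.ContentsManager.allow_hidden = True  # serve hidden files such as .lsp_symlink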

Help with implementing a custom ContentsManager that would enable navigating to external files without the symlink is welcome.

Configuring the servers

Server configurations can be edited using the Advanced Settings editor in JupyterLab (Settings > Advanced Settings Editor). For settings specific to each server, please see the table of language servers. Example settings might include:

Note: for the new (currently recommended) python-lsp-server, replace pyls occurrences with pylsp

{
  "language_servers": {
    "pyls": {
      "serverSettings": {
        "pyls.plugins.pydocstyle.enabled": true,
        "pyls.plugins.pyflakes.enabled": false,
        "pyls.plugins.flake8.enabled": true
      }
    },
    "r-languageserver": {
      "serverSettings": {
        "r.lsp.debug": false,
        "r.lsp.diagnostics": false
      }
    }
  }
}

The serverSettings key specifies the configurations sent to the language servers. These can be written using stringified dot accessors like above (in the VSCode style), or as nested JSON objects, e.g.:

{
  "language_servers": {
    "pyls": {
      "serverSettings": {
        "pyls": {
          "plugins": {
            "pydocstyle": {
              "enabled": true
            },
            "pyflakes": {
              "enabled": false
            },
            "flake8": {
              "enabled": true
            }
          }
        }
      }
    }
  }
}

Other configuration methods

Some language servers, such as pyls, provide other configuration methods in addition to language-server configuration messages (accessed using the Advanced Settings Editor). For example, pyls allows users to configure the server using a local configuration file. You can change the inspection/diagnostics for server plugins like pycodestyle there.

The exact configuration details will vary between operating systems (please see the configuration section of the pycodestyle documentation), but as an example, on Linux you would simply need to create a file called ~/.config/pycodestyle, which may look like this:

[pycodestyle]
ignore = E402, E703
max-line-length = 120

In the example above:

  • ignoring E402 allows imports which are not at the very top of the file,
  • ignoring E703 allows a terminating semicolon (useful for matplotlib plots),
  • the maximum allowed line length is increased to 120.

After changing the configuration you may need to restart JupyterLab, and please be advised that errors in the configuration may prevent the servers from functioning properly.

Again, please do check the pycodestyle documentation for specific error codes, and check the configuration of other feature providers and language servers as needed.

Acknowledgements

This would not be possible without the fantastic initial work at wylieconlon/lsp-editor-adapter.

Download Details:

Author: jupyter-lsp
Source Code: https://github.com/jupyter-lsp/jupyterlab-lsp 
License: BSD-3-Clause license

#r #jupyter #notebook 

Jupyterlab-lsp: Language Server Protocol integration for Jupyter(Lab)

Rupert Beatty

1666108883

FSnotes: Notes manager for macOS/iOS

FSNotes

FSNotes is a modern notes manager for macOS and iOS.

macOS app

Key features

  • Markdown-first. Also supports any plaintext and RTF files.
  • Fast and lightweight. Works smoothly with 10k+ files.
  • Access anywhere. Sync with iCloud Drive or Dropbox.
  • Multi-folder storage.
  • Keyboard-centric. nvalt-inspired controls and shortcuts.
  • Syntax highlighting within code blocks. Supports over 170 programming languages.
  • In-line image support.
  • Organize with tags.
  • Cross-note links using [[double brackets]].
  • Elastic two-pane view. Choose a vertical or horizontal layout.
  • External editor support (changes sync seamlessly with the UI).
  • Pin important notes.
  • Quickly copy notes to the clipboard.
  • Dark mode.
  • AES-256 encryption.
  • Mermaid and MathJax support.
  • Optional Git versioning and backups.

iOS app 

FSNotes for iOS

Key features

  • Sync via iCloud Drive.
  • 3D Touch and configurable keyboard.
  • TextBundle and EncryptedTextBundle containers.
  • Pinned notes kept in sync with the desktop app.
  • Dynamic fonts (iOS 11+).
  • Night mode by location or screen brightness.
  • Sharing extension.
  • Encrypted note support.

Download Details:

Author: Glushchenko
Source Code: https://github.com/glushchenko/fsnotes 
License: MIT license

#swift #ios #mac #notebook 

FSnotes: Notes manager for macOS/iOS