1608978638
Are tapes coming back?
IBM , from its labs in Zurich, Switzerland, announced that its scientists, in collaboration with Japan-based Fujifilms have been able to demonstrate a 29 times increase in the storage capacity of its current data tape cartridge.
#ibm #data #big-data #datacenter #storage #technology
1608978638
Are tapes coming back?
IBM , from its labs in Zurich, Switzerland, announced that its scientists, in collaboration with Japan-based Fujifilms have been able to demonstrate a 29 times increase in the storage capacity of its current data tape cartridge.
#ibm #data #big-data #datacenter #storage #technology
1653075360
HAML-Lint
haml-lint
is a tool to help keep your HAML files clean and readable. In addition to HAML-specific style and lint checks, it integrates with RuboCop to bring its powerful static analysis tools to your HAML documents.
You can run haml-lint
manually from the command line, or integrate it into your SCM hooks.
gem install haml_lint
If you'd rather install haml-lint
using bundler
, don't require
it in your Gemfile
:
gem 'haml_lint', require: false
Then you can still use haml-lint
from the command line, but its source code won't be auto-loaded inside your application.
Run haml-lint
from the command line by passing in a directory (or multiple directories) to recursively scan:
haml-lint app/views/
You can also specify a list of files explicitly:
haml-lint app/**/*.html.haml
haml-lint
will output any problems with your HAML, including the offending filename and line number.
haml-lint
assumes all files are encoded in UTF-8.
Command Line Flag | Description |
---|---|
--auto-gen-config | Generate a configuration file acting as a TODO list |
--auto-gen-exclude-limit | Number of failures to allow in the TODO list before the entire rule is excluded |
-c /--config | Specify which configuration file to use |
-e /--exclude | Exclude one or more files from being linted |
-i /--include-linter | Specify which linters you specifically want to run |
-x /--exclude-linter | Specify which linters you don't want to run |
-r /--reporter | Specify which reporter you want to use to generate the output |
-p /--parallel | Run linters in parallel using available CPUs |
--fail-fast | Specify whether to fail after the first file with lint |
--fail-level | Specify the minimum severity (warning or error) for which the lint should fail |
--[no-]color | Whether to output in color |
--[no-]summary | Whether to output a summary in the default reporter |
--show-linters | Show all registered linters |
--show-reporters | Display available reporters |
-h /--help | Show command line flag documentation |
-v /--version | Show haml-lint version |
-V /--verbose-version | Show haml-lint , haml , and ruby version information |
haml-lint
will automatically recognize and load any file with the name .haml-lint.yml
as a configuration file. It loads the configuration based on the directory haml-lint
is being run from, ascending until a configuration file is found. Any configuration loaded is automatically merged with the default configuration (see config/default.yml
).
Here's an example configuration file:
linters:
ImplicitDiv:
enabled: false
severity: error
LineLength:
max: 100
All linters have an enabled
option which can be true
or false
, which controls whether the linter is run, along with linter-specific options. The defaults are defined in config/default.yml
.
Option | Description |
---|---|
enabled | If false , this linter will never be run. This takes precedence over any other option. |
include | List of files or glob patterns to scope this linter to. This narrows down any files specified via the command line. |
exclude | List of files or glob patterns to exclude from this linter. This excludes any files specified via the command line or already filtered via the include option. |
severity | The severity of the linter. External tools consuming haml-lint output can use this to determine whether to warn or error based on the lints reported. |
The exclude
global configuration option allows you to specify a list of files or glob patterns to exclude from all linters. This is useful for ignoring third-party code that you don't maintain or care to lint. You can specify a single string or a list of strings for this option.
Some static blog generators such as Jekyll include leading frontmatter to the template for their own tracking purposes. haml-lint
allows you to ignore these headers by specifying the skip_frontmatter
option in your .haml-lint.yml
configuration:
skip_frontmatter: true
The inherits_from
global configuration option allows you to specify an inheritance chain for a configuration file. It accepts either a scalar value of a single file name or a vector of multiple files to inherit from. The inherited files are resolved in a first in, first out order and with "last one wins" precedence. For example:
inherits_from:
- .shared_haml-lint.yml
- .personal_haml-lint.yml
First, the default configuration is loaded. Then the .shared_haml-lint.yml
configuration is loaded, followed by .personal_haml-lint.yml
. Each of these overwrite each other in the event of a collision in configuration value. Once the inheritance chain is resolved, the base configuration is loaded and applies its rules to overwrite any in the intermediate configuration.
Lastly, in order to match your RuboCop configuration style, you can also use the inherit_from
directive, which is an alias for inherits_from
.
haml-lint
is an opinionated tool that helps you enforce a consistent style in your HAML files. As an opinionated tool, we've had to make calls about what we think are the "best" style conventions, even when there are often reasonable arguments for more than one possible style. While all of our choices have a rational basis, we think that the opinions themselves are less important than the fact that haml-lint
provides us with an automated and low-cost means of enforcing consistency.
Add the following to your configuration file:
require:
- './relative/path/to/my_first_linter.rb'
- 'absolute/path/to/my_second_linter.rb'
The files that are referenced by this config should have the following structure:
module HamlLint
# MyFirstLinter is the name of the linter in this example, but it can be anything
class Linter::MyFirstLinter < Linter
include LinterRegistry
def visit_tag
return unless node.tag_name == 'div'
record_lint(node, "You're not allowed divs!")
end
end
end
For more information on the different types on HAML node, please look through the HAML parser code: https://github.com/haml/haml/blob/master/lib/haml/parser.rb
Keep in mind that by default your linter will be disabled by default. So you will need to enable it in your configuration file to have it run.
One or more individual linters can be disabled locally in a file by adding a directive comment. These comments look like the following:
-# haml-lint:disable AltText, LineLength
[...]
-# haml-lint:enable AltText, LineLength
You can disable all linters for a section with the following:
-# haml-lint:disable all
A directive will disable the given linters for the scope of the block. This scope is inherited by child elements and sibling elements that come after the comment. For example:
-# haml-lint:disable AltText
#content
%img#will-not-show-lint-1{ src: "will-not-show-lint-1.png" }
-# haml-lint:enable AltText
%img#will-show-lint-1{ src: "will-show-lint-1.png" }
.sidebar
%img#will-show-lint-2{ src: "will-show-lint-2.png" }
%img#will-not-show-lint-2{ src: "will-not-show-lint-2.png" }
The #will-not-show-lint-1
image on line 2 will not raise an AltText
lint because of the directive on line 1. Since that directive is at the top level of the tree, it applies everywhere.
However, on line 4, the directive enables the AltText
linter for the remainder of the #content
element's content. This means that the #will-show-lint-1
image on line 5 will raise an AltText
lint because it is a sibling of the enabling directive that appears later in the #content
element. Likewise, the #will-show-lint-2
image on line 7 will raise an AltText
lint because it is a child of a sibling of the enabling directive.
Lastly, the #will-not-show-lint-2
image on line 8 will not raise an AltText
lint because the enabling directive on line 4 exists in a separate element and is not a sibling of the it.
If there are multiple directives for the same linter in an element, the last directive wins. For example:
-# haml-lint:enable AltText
%p Hello, world!
-# haml-lint:disable AltText
%img#will-not-show-lint{ src: "will-not-show-lint.png" }
There are two conflicting directives for the AltText
linter. The first one enables it, but the second one disables it. Since the disable directive came later, the #will-not-show-lint
element will not raise an AltText
lint.
You can use this functionality to selectively enable directives within a file by first using the haml-lint:disable all
directive to disable all linters in the file, then selectively using haml-lint:enable
to enable linters one at a time.
Adding a new linter into a project that wasn't previously using one can be a daunting task. To help ease the pain of starting to use Haml-Lint, you can generate a configuration file that will exclude all linters from reporting lint in files that currently have lint. This gives you something similar to a to-do list where the violations that you had when you started using Haml-Lint are listed for you to whittle away, but ensuring that any views you create going forward are properly linted.
To use this functionality, call Haml-Lint like:
haml-lint --auto-gen-config
This will generate a .haml-lint_todo.yml
file that contains all existing lint as exclusions. You can then add inherits_from: .haml-lint_todo.yml
to your .haml-lint.yml
configuration file to ensure these exclusions are used whenever you call haml-lint
.
By default, any rules with more than 15 violations will be disabled in the todo-file. You can increase this limit with the auto-gen-exclude-limit
option:
haml-lint --auto-gen-config --auto-gen-exclude-limit 100
If you use vim
, you can have haml-lint
automatically run against your HAML files after saving by using the Syntastic plugin. If you already have the plugin, just add let g:syntastic_haml_checkers = ['haml_lint']
to your .vimrc
.
If you use vim
8+ or Neovim
, you can have haml-lint
automatically run against your HAML files as you type by using the Asynchronous Lint Engine (ALE) plugin. ALE will automatically lint your HAML files if it detects haml-lint
in your PATH
.
If you use SublimeLinter 3
with Sublime Text 3
you can install the SublimeLinter-haml-lint plugin using Package Control.
If you use atom
, you can install the linter-haml plugin.
If you use TextMate 2
, you can install the Haml-Lint.tmbundle bundle.
If you use Visual Studio Code
, you can install the Haml Lint extension
If you'd like to integrate haml-lint
into your Git workflow, check out our Git hook manager, overcommit.
To execute haml-lint
via a Rake task, make sure you have rake
included in your gem path (e.g. via Gemfile
) add the following to your Rakefile
:
require 'haml_lint/rake_task'
HamlLint::RakeTask.new
By default, when you execute rake haml_lint
, the above configuration is equivalent to running haml-lint .
, which will lint all .haml
files in the current directory and its descendants.
You can customize your task by writing:
require 'haml_lint/rake_task'
HamlLint::RakeTask.new do |t|
t.config = 'custom/config.yml'
t.files = ['app/views', 'custom/*.haml']
t.quiet = true # Don't display output from haml-lint to STDOUT
end
You can also use this custom configuration with a set of files specified via the command line:
# Single quotes prevent shell glob expansion
rake 'haml_lint[app/views, custom/*.haml]'
Files specified in this manner take precedence over the task's files
attribute.
Code documentation is generated with YARD and hosted by RubyDoc.info.
We love getting feedback with or without pull requests. If you do add a new feature, please add tests so that we can avoid breaking it in the future.
Speaking of tests, we use Appraisal to test against both HAML 4 and 5. We use rspec
to write our tests. To run the test suite, execute the following from the root directory of the repository:
appraisal bundle install
appraisal bundle exec rspec
All major discussion surrounding HAML-Lint happens on the GitHub issues page.
If you're interested in seeing the changes and bug fixes between each version of haml-lint
, read the HAML-Lint Changelog.
Author: sds
Source Code: https://github.com/sds/haml-lint
License: MIT license
1642995900
Pandas-Bokeh provides a Bokeh plotting backend for Pandas, GeoPandas and Pyspark DataFrames, similar to the already existing Visualization feature of Pandas. Importing the library adds a complementary plotting method plot_bokeh() on DataFrames and Series.
With Pandas-Bokeh, creating stunning, interactive, HTML-based visualization is as easy as calling:
df.plot_bokeh()
Pandas-Bokeh also provides native support as a Pandas Plotting backend for Pandas >= 0.25. When Pandas-Bokeh is installed, switchting the default Pandas plotting backend to Bokeh can be done via:
pd.set_option('plotting.backend', 'pandas_bokeh')
More details about the new Pandas backend can be found below.
Please visit:
https://patrikhlobil.github.io/Pandas-Bokeh/
for an interactive version of the documentation below, where you can play with the dynamic Bokeh plots.
For more information have a look at the Examples below or at notebooks on the Github Repository of this project.
You can install Pandas-Bokeh from PyPI via pip
pip install pandas-bokeh
or conda:
conda install -c patrikhlobil pandas-bokeh
With the current release 0.5.5, Pandas-Bokeh officially supports Python 3.6 and newer. For more details, see Release Notes.
The Pandas-Bokeh library should be imported after Pandas, GeoPandas and/or Pyspark. After the import, one should define the plotting output, which can be:
For more details about the plotting outputs, see the reference here or the Bokeh documentation.
import pandas as pd
import pandas_bokeh
pandas_bokeh.output_notebook()
import pandas as pd
import pandas_bokeh
pandas_bokeh.output_file("Interactive Plot.html")
For pandas >= 0.25, a plotting backend switch is natively supported. It can be achievied by calling:
import pandas as pd
pd.set_option('plotting.backend', 'pandas_bokeh')
Now, the plotting API is accessible for a Pandas DataFrame via:
df.plot(...)
All additional functionalities of Pandas-Bokeh are then accessible at pd.plotting. So, setting the output to notebook is:
pd.plotting.output_notebook()
or calling the grid layout functionality:
pd.plotting.plot_grid(...)
Note: Backwards compatibility is kept since there will still be the df.plot_bokeh(...) methods for a DataFrame.
Supported plottypes are at the moment:
Also, check out the complementary chapter Outputs, Formatting & Layouts about:
This simple lineplot in Pandas-Bokeh already contains various interactive elements:
Consider the following simple example:
import numpy as np
np.random.seed(42)
df = pd.DataFrame({"Google": np.random.randn(1000)+0.2,
"Apple": np.random.randn(1000)+0.17},
index=pd.date_range('1/1/2000', periods=1000))
df = df.cumsum()
df = df + 50
df.plot_bokeh(kind="line") #equivalent to df.plot_bokeh.line()
Note, that similar to the regular pandas.DataFrame.plot method, there are also additional accessors to directly access the different plotting types like:
df.plot_bokeh(kind="line", ...)
→ df.plot_bokeh.line(...)
df.plot_bokeh(kind="bar", ...)
→ df.plot_bokeh.bar(...)
df.plot_bokeh(kind="hist", ...)
→ df.plot_bokeh.hist(...)
There are various optional parameters to tune the plots, for example:
Try them out to get a feeling for the effects. Let us consider now:
df.plot_bokeh.line(
figsize=(800, 450),
y="Apple",
title="Apple vs Google",
xlabel="Date",
ylabel="Stock price [$]",
yticks=[0, 100, 200, 300, 400],
ylim=(0, 400),
toolbar_location=None,
colormap=["red", "blue"],
hovertool_string=r"""<img
src='https://upload.wikimedia.org/wikipedia/commons/thumb/f/fa/Apple_logo_black.svg/170px-Apple_logo_black.svg.png'
height="42" alt="@imgs" width="42"
style="float: left; margin: 0px 15px 15px 0px;"
border="2"></img> Apple
<h4> Stock Price: </h4> @{Apple}""",
panning=False,
zooming=False)
For lineplots, as for many other plot-kinds, there are some special keyword arguments that only work for this plotting type. For lineplots, these are:
Let us use this information to have another version of the same plot:
df.plot_bokeh.line(
figsize=(800, 450),
title="Apple vs Google",
xlabel="Date",
ylabel="Stock price [$]",
yticks=[0, 100, 200, 300, 400],
ylim=(100, 200),
xlim=("2001-01-01", "2001-02-01"),
colormap=["red", "blue"],
plot_data_points=True,
plot_data_points_size=10,
marker="asterisk")
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, columns=list('ABCD'))
df = df.cumsum()
df.plot_bokeh(rangetool=True)
If you just wish to draw the date points for curves, the pointplot option is the right choice. It also accepts the kwargs of bokeh.plotting.figure.scatter like marker or size:
import numpy as np
x = np.arange(-3, 3, 0.1)
y2 = x**2
y3 = x**3
df = pd.DataFrame({"x": x, "Parabula": y2, "Cube": y3})
df.plot_bokeh.point(
x="x",
xticks=range(-3, 4),
size=5,
colormap=["#009933", "#ff3399"],
title="Pointplot (Parabula vs. Cube)",
marker="x")
With a similar API as the line- & pointplots, one can generate a stepplot. Additional keyword arguments for this plot type are passes to bokeh.plotting.figure.step, e.g. mode (before, after, center), see the following example
import numpy as np
x = np.arange(-3, 3, 1)
y2 = x**2
y3 = x**3
df = pd.DataFrame({"x": x, "Parabula": y2, "Cube": y3})
df.plot_bokeh.step(
x="x",
xticks=range(-1, 1),
colormap=["#009933", "#ff3399"],
title="Pointplot (Parabula vs. Cube)",
figsize=(800,300),
fontsize_title=30,
fontsize_label=25,
fontsize_ticks=15,
fontsize_legend=5,
)
df.plot_bokeh.step(
x="x",
xticks=range(-1, 1),
colormap=["#009933", "#ff3399"],
title="Pointplot (Parabula vs. Cube)",
mode="after",
figsize=(800,300)
)
Note that the step-plot API of Bokeh does so far not support a hovertool functionality.
A basic scatterplot can be created using the kind="scatter" option. For scatterplots, the x and y parameters have to be specified and the following optional keyword argument is allowed:
category: Determines the category column to use for coloring the scatter points
kwargs**: Optional keyword arguments of bokeh.plotting.figure.scatter
Note, that the pandas.DataFrame.plot_bokeh() method return per default a Bokeh figure, which can be embedded in Dashboard layouts with other figures and Bokeh objects (for more details about (sub)plot layouts and embedding the resulting Bokeh plots as HTML click here).
In the example below, we use the building grid layout support of Pandas-Bokeh to display both the DataFrame (using a Bokeh DataTable) and the resulting scatterplot:
# Load Iris Dataset:
df = pd.read_csv(
r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/iris/iris.csv"
)
df = df.sample(frac=1)
# Create Bokeh-Table with DataFrame:
from bokeh.models.widgets import DataTable, TableColumn
from bokeh.models import ColumnDataSource
data_table = DataTable(
columns=[TableColumn(field=Ci, title=Ci) for Ci in df.columns],
source=ColumnDataSource(df),
height=300,
)
# Create Scatterplot:
p_scatter = df.plot_bokeh.scatter(
x="petal length (cm)",
y="sepal width (cm)",
category="species",
title="Iris DataSet Visualization",
show_figure=False,
)
# Combine Table and Scatterplot via grid layout:
pandas_bokeh.plot_grid([[data_table, p_scatter]], plot_width=400, plot_height=350)
A possible optional keyword parameters that can be passed to bokeh.plotting.figure.scatter is size. Below, we use the sepal length of the Iris data as reference for the size:
#Change one value to clearly see the effect of the size keyword
df.loc[13, "sepal length (cm)"] = 15
#Make scatterplot:
p_scatter = df.plot_bokeh.scatter(
x="petal length (cm)",
y="sepal width (cm)",
category="species",
title="Iris DataSet Visualization with Size Keyword",
size="sepal length (cm)")
In this example you can see, that the additional dimension sepal length cannot be used to clearly differentiate between the virginica and versicolor species.
The barplot API has no special keyword arguments, but accepts optional kwargs of bokeh.plotting.figure.vbar like alpha. It uses per default the index for the bar categories (however, also columns can be used as x-axis category using the x argument).
data = {
'fruits':
['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries'],
'2015': [2, 1, 4, 3, 2, 4],
'2016': [5, 3, 3, 2, 4, 6],
'2017': [3, 2, 4, 4, 5, 3]
}
df = pd.DataFrame(data).set_index("fruits")
p_bar = df.plot_bokeh.bar(
ylabel="Price per Unit [€]",
title="Fruit prices per Year",
alpha=0.6)
Using the stacked keyword argument you also maked stacked barplots:
p_stacked_bar = df.plot_bokeh.bar(
ylabel="Price per Unit [€]",
title="Fruit prices per Year",
stacked=True,
alpha=0.6)
Also horizontal versions of the above barplot are supported with the keyword kind="barh" or the accessor plot_bokeh.barh. You can still specify a column of the DataFrame as the bar category via the x argument if you do not wish to use the index.
#Reset index, such that "fruits" is now a column of the DataFrame:
df.reset_index(inplace=True)
#Create horizontal bar (via kind keyword):
p_hbar = df.plot_bokeh(
kind="barh",
x="fruits",
xlabel="Price per Unit [€]",
title="Fruit prices per Year",
alpha=0.6,
legend = "bottom_right",
show_figure=False)
#Create stacked horizontal bar (via barh accessor):
p_stacked_hbar = df.plot_bokeh.barh(
x="fruits",
stacked=True,
xlabel="Price per Unit [€]",
title="Fruit prices per Year",
alpha=0.6,
legend = "bottom_right",
show_figure=False)
#Plot all barplot examples in a grid:
pandas_bokeh.plot_grid([[p_bar, p_stacked_bar],
[p_hbar, p_stacked_hbar]],
plot_width=450)
For drawing histograms (kind="hist"), Pandas-Bokeh has a lot of customization features. Optional keyword arguments for histogram plots are:
Below examples of the different histogram types:
import numpy as np
df_hist = pd.DataFrame({
'a': np.random.randn(1000) + 1,
'b': np.random.randn(1000),
'c': np.random.randn(1000) - 1
},
columns=['a', 'b', 'c'])
#Top-on-Top Histogram (Default):
df_hist.plot_bokeh.hist(
bins=np.linspace(-5, 5, 41),
vertical_xlabel=True,
hovertool=False,
title="Normal distributions (Top-on-Top)",
line_color="black")
#Side-by-Side Histogram (multiple bars share bin side-by-side) also accessible via
#kind="hist":
df_hist.plot_bokeh(
kind="hist",
bins=np.linspace(-5, 5, 41),
histogram_type="sidebyside",
vertical_xlabel=True,
hovertool=False,
title="Normal distributions (Side-by-Side)",
line_color="black")
#Stacked histogram:
df_hist.plot_bokeh.hist(
bins=np.linspace(-5, 5, 41),
histogram_type="stacked",
vertical_xlabel=True,
hovertool=False,
title="Normal distributions (Stacked)",
line_color="black")
Further, advanced keyword arguments for histograms are:
Their usage is shown in these examples:
p_hist = df_hist.plot_bokeh.hist(
y=["a", "b"],
bins=np.arange(-4, 6.5, 0.5),
normed=100,
vertical_xlabel=True,
ylabel="Share[%]",
title="Normal distributions (normed)",
show_average=True,
xlim=(-4, 6),
ylim=(0, 30),
show_figure=False)
p_hist_cum = df_hist.plot_bokeh.hist(
y=["a", "b"],
bins=np.arange(-4, 6.5, 0.5),
normed=100,
cumulative=True,
vertical_xlabel=True,
ylabel="Share[%]",
title="Normal distributions (normed & cumulative)",
show_figure=False)
pandas_bokeh.plot_grid([[p_hist, p_hist_cum]], plot_width=450, plot_height=300)
Areaplot (kind="area") can be either drawn on top of each other or stacked. The important parameters are:
stacked: If True, the areaplots are stacked. If False, plots are drawn on top of each other. Default: False
kwargs**: Optional keyword arguments of bokeh.plotting.figure.patch
Let us consider the energy consumption split by source that can be downloaded as DataFrame via:
df_energy = pd.read_csv(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/energy/energy.csv",
parse_dates=["Year"])
df_energy.head()
Year | Oil | Gas | Coal | Nuclear Energy | Hydroelectricity | Other Renewable |
---|---|---|---|---|---|---|
1970-01-01 | 2291.5 | 826.7 | 1467.3 | 17.7 | 265.8 | 5.8 |
1971-01-01 | 2427.7 | 884.8 | 1459.2 | 24.9 | 276.4 | 6.3 |
1972-01-01 | 2613.9 | 933.7 | 1475.7 | 34.1 | 288.9 | 6.8 |
1973-01-01 | 2818.1 | 978.0 | 1519.6 | 45.9 | 292.5 | 7.3 |
1974-01-01 | 2777.3 | 1001.9 | 1520.9 | 59.6 | 321.1 | 7.7 |
Creating the Areaplot can be achieved via:
df_energy.plot_bokeh.area(
x="Year",
stacked=True,
legend="top_left",
colormap=["brown", "orange", "black", "grey", "blue", "green"],
title="Worldwide energy consumption split by energy source",
ylabel="Million tonnes oil equivalent",
ylim=(0, 16000))
Note that the energy consumption of fossile energy is still increasing and renewable energy sources are still small in comparison 😢!!! However, when we norm the plot using the normed keyword, there is a clear trend towards renewable energies in the last decade:
df_energy.plot_bokeh.area(
x="Year",
stacked=True,
normed=100,
legend="bottom_left",
colormap=["brown", "orange", "black", "grey", "blue", "green"],
title="Worldwide energy consumption split by energy source",
ylabel="Million tonnes oil equivalent")
For Pieplots, let us consider a dataset showing the results of all Bundestags elections in Germany since 2002:
df_pie = pd.read_csv(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/Bundestagswahl/Bundestagswahl.csv")
df_pie
Partei | 2002 | 2005 | 2009 | 2013 | 2017 |
---|---|---|---|---|---|
CDU/CSU | 38.5 | 35.2 | 33.8 | 41.5 | 32.9 |
SPD | 38.5 | 34.2 | 23.0 | 25.7 | 20.5 |
FDP | 7.4 | 9.8 | 14.6 | 4.8 | 10.7 |
Grünen | 8.6 | 8.1 | 10.7 | 8.4 | 8.9 |
Linke/PDS | 4.0 | 8.7 | 11.9 | 8.6 | 9.2 |
AfD | 0.0 | 0.0 | 0.0 | 0.0 | 12.6 |
Sonstige | 3.0 | 4.0 | 6.0 | 11.0 | 5.0 |
We can create a Pieplot of the last election in 2017 by specifying the "Partei" (german for party) column as the x column and the "2017" column as the y column for values:
df_pie.plot_bokeh.pie(
x="Partei",
y="2017",
colormap=["blue", "red", "yellow", "green", "purple", "orange", "grey"],
title="Results of German Bundestag Election 2017",
)
When you pass several columns to the y parameter (not providing the y-parameter assumes you plot all columns), multiple nested pieplots will be shown in one plot:
df_pie.plot_bokeh.pie(
x="Partei",
colormap=["blue", "red", "yellow", "green", "purple", "orange", "grey"],
title="Results of German Bundestag Elections [2002-2017]",
line_color="grey")
The mapplot method of Pandas-Bokeh allows for plotting geographic points stored in a Pandas DataFrame on an interactive map. For more advanced Geoplots for line and polygon shapes have a look at the Geoplots examples for the GeoPandas API of Pandas-Bokeh.
For mapplots, only (latitude, longitude) pairs in geographic projection (WGS84) can be plotted on a map. The basic API has the following 2 base parameters:
The other optional keyword arguments are discussed in the section about the GeoPandas API, e.g. category for coloring the points.
Below an example of plotting all cities for more than 1 million inhabitants:
df_mapplot = pd.read_csv(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/populated%20places/populated_places.csv")
df_mapplot.head()
name | pop_max | latitude | longitude | size |
---|---|---|---|---|
Mesa | 1085394 | 33.423915 | -111.736084 | 1.085394 |
Sharjah | 1103027 | 25.371383 | 55.406478 | 1.103027 |
Changwon | 1081499 | 35.219102 | 128.583562 | 1.081499 |
Sheffield | 1292900 | 53.366677 | -1.499997 | 1.292900 |
Abbottabad | 1183647 | 34.149503 | 73.199501 | 1.183647 |
df_mapplot["size"] = df_mapplot["pop_max"] / 1000000
df_mapplot.plot_bokeh.map(
x="longitude",
y="latitude",
hovertool_string="""<h2> @{name} </h2>
<h3> Population: @{pop_max} </h3>""",
tile_provider="STAMEN_TERRAIN_RETINA",
size="size",
figsize=(900, 600),
title="World cities with more than 1.000.000 inhabitants")
Pandas-Bokeh also allows for interactive plotting of Maps using GeoPandas by providing a geopandas.GeoDataFrame.plot_bokeh() method. It allows to plot the following geodata on a map :
Note: t is not possible to mix up the objects types, i.e. a GeoDataFrame with Points and Lines is for example not allowed.
Les us start with a simple example using the "World Borders Dataset" . Let us first import all neccessary libraries and read the shapefile:
import geopandas as gpd
import pandas as pd
import pandas_bokeh
pandas_bokeh.output_notebook()
#Read in GeoJSON from URL:
df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")
df_states.head()
STATE_NAME | REGION | POPESTIMATE2010 | POPESTIMATE2011 | POPESTIMATE2012 | POPESTIMATE2013 | POPESTIMATE2014 | POPESTIMATE2015 | POPESTIMATE2016 | POPESTIMATE2017 | geometry |
---|---|---|---|---|---|---|---|---|---|---|
Hawaii | 4 | 1363817 | 1378323 | 1392772 | 1408038 | 1417710 | 1426320 | 1428683 | 1427538 | (POLYGON ((-160.0738033454681 22.0041773479577... |
Washington | 4 | 6741386 | 6819155 | 6890899 | 6963410 | 7046931 | 7152818 | 7280934 | 7405743 | (POLYGON ((-122.4020153103835 48.2252163723779... |
Montana | 4 | 990507 | 996866 | 1003522 | 1011921 | 1019931 | 1028317 | 1038656 | 1050493 | POLYGON ((-111.4754253002074 44.70216236909688... |
Maine | 1 | 1327568 | 1327968 | 1328101 | 1327975 | 1328903 | 1327787 | 1330232 | 1335907 | (POLYGON ((-69.77727626137293 44.0741483685119... |
North Dakota | 2 | 674518 | 684830 | 701380 | 722908 | 738658 | 754859 | 755548 | 755393 | POLYGON ((-98.73043728833767 45.93827137024809... |
Plotting the data on a map is as simple as calling:
df_states.plot_bokeh(simplify_shapes=10000)
We also passed the optional parameter simplify_shapes (~meter) to improve plotting performance (for a reference see shapely.object.simplify). The above geolayer thus has an accuracy of about 10km.
Many keyword arguments like xlabel, ylabel, xlim, ylim, title, colormap, hovertool, zooming, panning, ... for costumizing the plot are also available for the geoplotting API and can be uses as in the examples shown above. There are however also many other options especially for plotting geodata:
One of the most common usage of map plots are choropleth maps, where the color of a the objects is determined by the property of the object itself. There are 3 ways of drawing choropleth maps using Pandas-Bokeh, which are described below.
This is the simplest way. Just provide the category keyword for the selection of the property column:
Let us now draw the regions as a choropleth plot using the category keyword (at the moment, only numerical columns are supported for choropleth plots):
df_states.plot_bokeh(
figsize=(900, 600),
simplify_shapes=5000,
category="REGION",
show_colorbar=False,
colormap=["blue", "yellow", "green", "red"],
hovertool_columns=["STATE_NAME", "REGION"],
tile_provider="STAMEN_TERRAIN_RETINA")
When hovering over the states, the state-name and the region are shown as specified in the hovertool_columns argument.
By passing a list of column names of the GeoDataFrame as the dropdown keyword argument, a dropdown menu is shown above the map. This dropdown menu can be used to select the choropleth layer by the user. :
df_states["STATE_NAME_SMALL"] = df_states["STATE_NAME"].str.lower()
df_states.plot_bokeh(
figsize=(900, 600),
simplify_shapes=5000,
dropdown=["POPESTIMATE2010", "POPESTIMATE2017"],
colormap="Viridis",
hovertool_string="""
<img
src="https://www.states101.com/img/flags/gif/small/@STATE_NAME_SMALL.gif"
height="42" alt="@imgs" width="42"
style="float: left; margin: 0px 15px 15px 0px;"
border="2"></img>
<h2> @STATE_NAME </h2>
<h3> 2010: @POPESTIMATE2010 </h3>
<h3> 2017: @POPESTIMATE2017 </h3>""",
tile_provider_url=r"http://c.tile.stamen.com/watercolor/{Z}/{X}/{Y}.jpg",
tile_attribution='Map tiles by <a href="http://stamen.com">Stamen Design</a>, under <a href="http://creativecommons.org/licenses/by/3.0">CC BY 3.0</a>. Data by <a href="http://openstreetmap.org">OpenStreetMap</a>, under <a href="http://www.openstreetmap.org/copyright">ODbL</a>.'
)
Using hovertool_string, one can pass a string that can contain arbitrary HTML elements (including divs, images, ...) that is shown when hovering over the geographies (@{column} will be replaced by the value of the column for the element the mouse hovers over, see also Bokeh documentation).
Here, we also used an OSM tile server with watercolor style via tile_provider_url and added the attribution via tile_attribution.
Another option for interactive choropleth maps is the slider implementation of Pandas-Bokeh. The possible keyword arguments are here:
This can be used to display the change in population relative to the year 2010:
#Calculate change of population relative to 2010:
for i in range(8):
df_states["Delta_Population_201%d"%i] = ((df_states["POPESTIMATE201%d"%i] / df_states["POPESTIMATE2010"]) -1 ) * 100
#Specify slider columns:
slider_columns = ["Delta_Population_201%d"%i for i in range(8)]
#Specify slider-range (Maps "Delta_Population_2010" -> 2010,
# "Delta_Population_2011" -> 2011, ...):
slider_range = range(2010, 2018)
#Make slider plot:
df_states.plot_bokeh(
figsize=(900, 600),
simplify_shapes=5000,
slider=slider_columns,
slider_range=slider_range,
slider_name="Year",
colormap="Inferno",
hovertool_columns=["STATE_NAME"] + slider_columns,
title="Change of Population [%]")
If you wish to display multiple geolayers, you can pass the Bokeh figure of a Pandas-Bokeh plot via the figure keyword to the next plot_bokeh() call:
import geopandas as gpd
import pandas_bokeh
pandas_bokeh.output_notebook()
# Read in GeoJSONs from URL:
df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")
df_cities = gpd.read_file(
r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/populated%20places/ne_10m_populated_places_simple_bigcities.geojson"
)
df_cities["size"] = df_cities.pop_max / 400000
#Plot shapes of US states (pass figure options to this initial plot):
figure = df_states.plot_bokeh(
figsize=(800, 450),
simplify_shapes=10000,
show_figure=False,
xlim=[-170, -80],
ylim=[10, 70],
category="REGION",
colormap="Dark2",
legend="States",
show_colorbar=False,
)
#Plot cities as points on top of the US states layer by passing the figure:
df_cities.plot_bokeh(
figure=figure, # <== pass figure here!
category="pop_max",
colormap="Viridis",
colormap_uselog=True,
size="size",
hovertool_string="""<h1>@name</h1>
<h3>Population: @pop_max </h3>""",
marker="inverted_triangle",
legend="Cities",
)
Below, you can see an example that use Pandas-Bokeh to plot point data on a map. The plot shows all cities with a population larger than 1.000.000. For point plots, you can select the marker as keyword argument (since it is passed to bokeh.plotting.figure.scatter). Here an overview of all available marker types:
gdf = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/populated%20places/ne_10m_populated_places_simple_bigcities.geojson")
gdf["size"] = gdf.pop_max / 400000
gdf.plot_bokeh(
category="pop_max",
colormap="Viridis",
colormap_uselog=True,
size="size",
hovertool_string="""<h1>@name</h1>
<h3>Population: @pop_max </h3>""",
xlim=[-15, 35],
ylim=[30,60],
marker="inverted_triangle");
In a similar way, also GeoDataFrames with (multi)line shapes can be drawn using Pandas-Bokeh.
If you want to display the numerical labels on your colorbar with an alternative to the scientific format, you can pass in a one of the bokeh number string formats or an instance of one of the bokeh.models.formatters to the colorbar_tick_format
argument in the geoplot
An example of using the string format argument:
df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")
df_states["STATE_NAME_SMALL"] = df_states["STATE_NAME"].str.lower()
# pass in a string format to colorbar_tick_format to display the ticks as 10m rather than 1e7
df_states.plot_bokeh(
figsize=(900, 600),
category="POPESTIMATE2017",
simplify_shapes=5000,
colormap="Inferno",
colormap_uselog=True,
colorbar_tick_format="0.0a")
An example of using the bokeh PrintfTickFormatter
:
df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")
df_states["STATE_NAME_SMALL"] = df_states["STATE_NAME"].str.lower()
for i in range(8):
df_states["Delta_Population_201%d"%i] = ((df_states["POPESTIMATE201%d"%i] / df_states["POPESTIMATE2010"]) -1 ) * 100
# pass in a PrintfTickFormatter instance colorbar_tick_format to display the ticks with 2 decimal places
df_states.plot_bokeh(
figsize=(900, 600),
category="Delta_Population_2017",
simplify_shapes=5000,
colormap="Inferno",
colorbar_tick_format=PrintfTickFormatter(format="%4.2f"))
The pandas.DataFrame.plot_bokeh API has the following additional keyword arguments:
If you have a Bokeh figure or layout, you can also use the pandas_bokeh.embedded_html function to generate an embeddable HTML representation of the plot. This can be included into any valid HTML (note that this is not possible directly with the HTML generated by the pandas_bokeh.output_file output option, because it includes an HTML header). Let us consider the following simple example:
#Import Pandas and Pandas-Bokeh (if you do not specify an output option, the standard is
#output_file):
import pandas as pd
import pandas_bokeh
#Create DataFrame to Plot:
import numpy as np
x = np.arange(-10, 10, 0.1)
sin = np.sin(x)
cos = np.cos(x)
tan = np.tan(x)
df = pd.DataFrame({"x": x, "sin(x)": sin, "cos(x)": cos, "tan(x)": tan})
#Make Bokeh plot from DataFrame using Pandas-Bokeh. Do not show the plot, but export
#it to an embeddable HTML string:
html_plot = df.plot_bokeh(
kind="line",
x="x",
y=["sin(x)", "cos(x)", "tan(x)"],
xticks=range(-20, 20),
title="Trigonometric functions",
show_figure=False,
return_html=True,
ylim=(-1.5, 1.5))
#Write some HTML and embed the HTML plot below it. For production use, please use
#Templates and the awesome Jinja library.
html = r"""
<script type="text/x-mathjax-config">
MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<h1> Trigonometric functions </h1>
<p> The basic trigonometric functions are:</p>
<p>$ sin(x) $</p>
<p>$ cos(x) $</p>
<p>$ tan(x) = \frac{sin(x)}{cos(x)}$</p>
<p>Below is a plot that shows them</p>
""" + html_plot
#Export the HTML string to an external HTML file and show it:
with open("test.html" , "w") as f:
f.write(html)
import webbrowser
webbrowser.open("test.html")
This code will open up a webbrowser and show the following page. As you can see, the interactive Bokeh plot is embedded nicely into the HTML layout. The return_html option is ideal for the use in a templating engine like Jinja.
For single plots that have a number of x axis values or for larger monitors, you can auto scale the figure to the width of the entire jupyter cell by setting the sizing_mode
parameter.
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df.plot_bokeh(kind="bar", figsize=(500, 200), sizing_mode="scale_width")
The figsize
parameter can be used to change the height and width as well as act as a scaling multiplier against the axis that is not being scaled.
To change the formats of numbers in the hovertool, use the number_format keyword argument. For a documentation about the format to pass, have a look at the Bokeh documentation.Let us consider some examples for the number 3.141592653589793:
Format | Output |
---|---|
0 | 3 |
0.000 | 3.141 |
0.00 $ | 3.14 $ |
This number format will be applied to all numeric columns of the hovertool. If you want to make a very custom or complicated hovertool, you should probably use the hovertool_string keyword argument, see e.g. this example. Below, we use the number_format parameter to specify the "Stock Price" format to 2 decimal digits and an additional $ sign.
import numpy as np
#Lineplot:
np.random.seed(42)
df = pd.DataFrame({
"Google": np.random.randn(1000) + 0.2,
"Apple": np.random.randn(1000) + 0.17
},
index=pd.date_range('1/1/2000', periods=1000))
df = df.cumsum()
df = df + 50
df.plot_bokeh(
kind="line",
title="Apple vs Google",
xlabel="Date",
ylabel="Stock price [$]",
yticks=[0, 100, 200, 300, 400],
ylim=(0, 400),
colormap=["red", "blue"],
number_format="1.00 $")
If you want to suppress the scientific notation for axes, you can use the disable_scientific_axes parameter, which accepts one of "x", "y", "xy":
df = pd.DataFrame({"Animal": ["Mouse", "Rabbit", "Dog", "Tiger", "Elefant", "Wale"],
"Weight [g]": [19, 3000, 40000, 200000, 6000000, 50000000]})
p_scientific = df.plot_bokeh(x="Animal", y="Weight [g]", show_figure=False)
p_non_scientific = df.plot_bokeh(x="Animal", y="Weight [g]", disable_scientific_axes="y", show_figure=False,)
pandas_bokeh.plot_grid([[p_scientific, p_non_scientific]], plot_width = 450)
As shown in the Scatterplot Example, combining plots with plots or other HTML elements is straighforward in Pandas-Bokeh due to the layout capabilities of Bokeh. The easiest way to generate a dashboard layout is using the pandas_bokeh.plot_grid method (which is an extension of bokeh.layouts.gridplot):
import pandas as pd
import numpy as np
import pandas_bokeh
pandas_bokeh.output_notebook()
#Barplot:
data = {
'fruits':
['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries'],
'2015': [2, 1, 4, 3, 2, 4],
'2016': [5, 3, 3, 2, 4, 6],
'2017': [3, 2, 4, 4, 5, 3]
}
df = pd.DataFrame(data).set_index("fruits")
p_bar = df.plot_bokeh(
kind="bar",
ylabel="Price per Unit [€]",
title="Fruit prices per Year",
show_figure=False)
#Lineplot:
np.random.seed(42)
df = pd.DataFrame({
"Google": np.random.randn(1000) + 0.2,
"Apple": np.random.randn(1000) + 0.17
},
index=pd.date_range('1/1/2000', periods=1000))
df = df.cumsum()
df = df + 50
p_line = df.plot_bokeh(
kind="line",
title="Apple vs Google",
xlabel="Date",
ylabel="Stock price [$]",
yticks=[0, 100, 200, 300, 400],
ylim=(0, 400),
colormap=["red", "blue"],
show_figure=False)
#Scatterplot:
from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(iris["data"])
df.columns = iris["feature_names"]
df["species"] = iris["target"]
df["species"] = df["species"].map(dict(zip(range(3), iris["target_names"])))
p_scatter = df.plot_bokeh(
kind="scatter",
x="petal length (cm)",
y="sepal width (cm)",
category="species",
title="Iris DataSet Visualization",
show_figure=False)
#Histogram:
df_hist = pd.DataFrame({
'a': np.random.randn(1000) + 1,
'b': np.random.randn(1000),
'c': np.random.randn(1000) - 1
},
columns=['a', 'b', 'c'])
p_hist = df_hist.plot_bokeh(
kind="hist",
bins=np.arange(-6, 6.5, 0.5),
vertical_xlabel=True,
normed=100,
hovertool=False,
title="Normal distributions",
show_figure=False)
#Make Dashboard with Grid Layout:
pandas_bokeh.plot_grid([[p_line, p_bar],
[p_scatter, p_hist]], plot_width=450)
Using a combination of row and column elements (see also Bokeh Layouts) allow for a very easy general arrangement of elements. An alternative layout to the one above is:
p_line.plot_width = 900
p_hist.plot_width = 900
layout = pandas_bokeh.column(p_line,
pandas_bokeh.row(p_scatter, p_bar),
p_hist)
pandas_bokeh.show(layout)
Release Notes can be found here.
Contributing to Pandas-Bokeh
If you wish to contribute to the development of Pandas-Bokeh
you can follow the instructions on the CONTRIBUTING.md.
Download Details:
Author: PatrikHlobil
Source Code: https://github.com/PatrikHlobil/Pandas-Bokeh
License: MIT License
1641430440
Pandas-Bokeh provides a Bokeh plotting backend for Pandas, GeoPandas and Pyspark DataFrames, similar to the already existing Visualization feature of Pandas. Importing the library adds a complementary plotting method plot_bokeh() on DataFrames and Series.
With Pandas-Bokeh, creating stunning, interactive, HTML-based visualization is as easy as calling:
df.plot_bokeh()
Pandas-Bokeh also provides native support as a Pandas Plotting backend for Pandas >= 0.25. When Pandas-Bokeh is installed, switchting the default Pandas plotting backend to Bokeh can be done via:
pd.set_option('plotting.backend', 'pandas_bokeh')
More details about the new Pandas backend can be found below.
Please visit:
https://patrikhlobil.github.io/Pandas-Bokeh/
for an interactive version of the documentation below, where you can play with the dynamic Bokeh plots.
For more information have a look at the Examples below or at notebooks on the Github Repository of this project.
You can install Pandas-Bokeh from PyPI via pip
pip install pandas-bokeh
or conda:
conda install -c patrikhlobil pandas-bokeh
With the current release 0.5.5, Pandas-Bokeh officially supports Python 3.6 and newer. For more details, see Release Notes.
The Pandas-Bokeh library should be imported after Pandas, GeoPandas and/or Pyspark. After the import, one should define the plotting output, which can be:
pandas_bokeh.output_notebook(): Embeds the Plots in the cell outputs of the notebook. Ideal when working in Jupyter Notebooks.
pandas_bokeh.output_file(filename): Exports the plot to the provided filename as an HTML.
For more details about the plotting outputs, see the reference here or the Bokeh documentation.
import pandas as pd import pandas_bokeh pandas_bokeh.output_notebook()
import pandas as pd import pandas_bokeh pandas_bokeh.output_file("Interactive Plot.html")
For pandas >= 0.25, a plotting backend switch is natively supported. It can be achievied by calling:
import pandas as pd
pd.set_option('plotting.backend', 'pandas_bokeh')
Now, the plotting API is accessible for a Pandas DataFrame via:
df.plot(...)
All additional functionalities of Pandas-Bokeh are then accessible at pd.plotting. So, setting the output to notebook is:
pd.plotting.output_notebook()
or calling the grid layout functionality:
pd.plotting.plot_grid(...)
Note: Backwards compatibility is kept since there will still be the df.plot_bokeh(...) methods for a DataFrame.
Supported plottypes are at the moment:
Also, check out the complementary chapter Outputs, Formatting & Layouts about:
This simple lineplot in Pandas-Bokeh already contains various interactive elements:
Consider the following simple example:
import numpy as np
np.random.seed(42)
df = pd.DataFrame({"Google": np.random.randn(1000)+0.2,
"Apple": np.random.randn(1000)+0.17},
index=pd.date_range('1/1/2000', periods=1000))
df = df.cumsum()
df = df + 50
df.plot_bokeh(kind="line") #equivalent to df.plot_bokeh.line()
Note, that similar to the regular pandas.DataFrame.plot method, there are also additional accessors to directly access the different plotting types like:
df.plot_bokeh(kind="line", ...)
→ df.plot_bokeh.line(...)
df.plot_bokeh(kind="bar", ...)
→ df.plot_bokeh.bar(...)
df.plot_bokeh(kind="hist", ...)
→ df.plot_bokeh.hist(...)
There are various optional parameters to tune the plots, for example:
kind: Which kind of plot should be produced. Currently supported are: "line", "point", "scatter", "bar" and "histogram". In the near future many more will be implemented as horizontal barplot, boxplots, pie-charts, etc.
x: Name of the column to use for the horizontal x-axis. If the x parameter is not specified, the index is used for the x-values of the plot. Alternative, also an array of values can be passed that has the same number of elements as the DataFrame.
y: Name of column or list of names of columns to use for the vertical y-axis.
figsize: Choose width & height of the plot
title: Sets title of the plot
xlim/ylim: Set visibler range of plot for x- and y-axis (also works for datetime x-axis)
xlabel/ylabel: Set x- and y-labels
logx/logy: Set log-scale on x-/y-axis
xticks/yticks: Explicitly set the ticks on the axes
color: Defines a single color for a plot.
colormap: Can be used to specify multiple colors to plot. Can be either a list of colors or the name of a Bokeh color palette
hovertool: If True a Hovertool is active, else if False no Hovertool is drawn.
hovertool_string: If specified, this string will be used for the hovertool (@{column} will be replaced by the value of the column for the element the mouse hovers over, see also Bokeh documentation and here)
toolbar_location: Specify the position of the toolbar location (None, "above", "below", "left" or "right"). Default: "right"
zooming: Enables/Disables zooming. Default: True
panning: Enables/Disables panning. Default: True
fontsize_label/fontsize_ticks/fontsize_title/fontsize_legend: Set fontsize of labels, ticks, title or legend (int or string of form "15pt")
rangetool Enables a range tool scroller. Default False
kwargs**: Optional keyword arguments of bokeh.plotting.figure.line
Try them out to get a feeling for the effects. Let us consider now:
df.plot_bokeh.line(
figsize=(800, 450),
y="Apple",
title="Apple vs Google",
xlabel="Date",
ylabel="Stock price [$]",
yticks=[0, 100, 200, 300, 400],
ylim=(0, 400),
toolbar_location=None,
colormap=["red", "blue"],
hovertool_string=r"""<img
src='https://upload.wikimedia.org/wikipedia/commons/thumb/f/fa/Apple_logo_black.svg/170px-Apple_logo_black.svg.png'
height="42" alt="@imgs" width="42"
style="float: left; margin: 0px 15px 15px 0px;"
border="2"></img> Apple
<h4> Stock Price: </h4> @{Apple}""",
panning=False,
zooming=False)
For lineplots, as for many other plot-kinds, there are some special keyword arguments that only work for this plotting type. For lineplots, these are:
plot_data_points: Plot also the data points on the lines
plot_data_points_size: Determines the size of the data points
marker: Defines the point type (Default: "circle"). Possible values are: 'circle', 'square', 'triangle', 'asterisk', 'circle_x', 'square_x', 'inverted_triangle', 'x', 'circle_cross', 'square_cross', 'diamond', 'cross'
kwargs**: Optional keyword arguments of bokeh.plotting.figure.line```
Let us use this information to have another version of the same plot:
df.plot_bokeh.line(
figsize=(800, 450),
title="Apple vs Google",
xlabel="Date",
ylabel="Stock price [$]",
yticks=[0, 100, 200, 300, 400],
ylim=(100, 200),
xlim=("2001-01-01", "2001-02-01"),
colormap=["red", "blue"],
plot_data_points=True,
plot_data_points_size=10,
marker="asterisk")
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, columns=list('ABCD'))
df = df.cumsum()
df.plot_bokeh(rangetool=True)
Pointplot
If you just wish to draw the date points for curves, the pointplot option is the right choice. It also accepts the kwargs of bokeh.plotting.figure.scatter like marker or size:
import numpy as np
x = np.arange(-3, 3, 0.1)
y2 = x**2
y3 = x**3
df = pd.DataFrame({"x": x, "Parabula": y2, "Cube": y3})
df.plot_bokeh.point(
x="x",
xticks=range(-3, 4),
size=5,
colormap=["#009933", "#ff3399"],
title="Pointplot (Parabula vs. Cube)",
marker="x")
With a similar API as the line- & pointplots, one can generate a stepplot. Additional keyword arguments for this plot type are passes to bokeh.plotting.figure.step, e.g. mode (before, after, center), see the following example
import numpy as np
x = np.arange(-3, 3, 1)
y2 = x**2
y3 = x**3
df = pd.DataFrame({"x": x, "Parabula": y2, "Cube": y3})
df.plot_bokeh.step(
x="x",
xticks=range(-1, 1),
colormap=["#009933", "#ff3399"],
title="Pointplot (Parabula vs. Cube)",
figsize=(800,300),
fontsize_title=30,
fontsize_label=25,
fontsize_ticks=15,
fontsize_legend=5,
)
df.plot_bokeh.step(
x="x",
xticks=range(-1, 1),
colormap=["#009933", "#ff3399"],
title="Pointplot (Parabula vs. Cube)",
mode="after",
figsize=(800,300)
)
Note that the step-plot API of Bokeh does so far not support a hovertool functionality.
A basic scatterplot can be created using the kind="scatter" option. For scatterplots, the x and y parameters have to be specified and the following optional keyword argument is allowed:
category: Determines the category column to use for coloring the scatter points
kwargs**: Optional keyword arguments of bokeh.plotting.figure.scatter
Note, that the pandas.DataFrame.plot_bokeh() method return per default a Bokeh figure, which can be embedded in Dashboard layouts with other figures and Bokeh objects (for more details about (sub)plot layouts and embedding the resulting Bokeh plots as HTML click here).
In the example below, we use the building grid layout support of Pandas-Bokeh to display both the DataFrame (using a Bokeh DataTable) and the resulting scatterplot:
# Load Iris Dataset:
df = pd.read_csv(
r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/iris/iris.csv"
)
df = df.sample(frac=1)
# Create Bokeh-Table with DataFrame:
from bokeh.models.widgets import DataTable, TableColumn
from bokeh.models import ColumnDataSource
data_table = DataTable(
columns=[TableColumn(field=Ci, title=Ci) for Ci in df.columns],
source=ColumnDataSource(df),
height=300,
)
# Create Scatterplot:
p_scatter = df.plot_bokeh.scatter(
x="petal length (cm)",
y="sepal width (cm)",
category="species",
title="Iris DataSet Visualization",
show_figure=False,
)
# Combine Table and Scatterplot via grid layout:
pandas_bokeh.plot_grid([[data_table, p_scatter]], plot_width=400, plot_height=350)
A possible optional keyword parameters that can be passed to bokeh.plotting.figure.scatter is size. Below, we use the sepal length of the Iris data as reference for the size:
#Change one value to clearly see the effect of the size keyword
df.loc[13, "sepal length (cm)"] = 15
#Make scatterplot:
p_scatter = df.plot_bokeh.scatter(
x="petal length (cm)",
y="sepal width (cm)",
category="species",
title="Iris DataSet Visualization with Size Keyword",
size="sepal length (cm)")
In this example you can see, that the additional dimension sepal length cannot be used to clearly differentiate between the virginica and versicolor species.
The barplot API has no special keyword arguments, but accepts optional kwargs of bokeh.plotting.figure.vbar like alpha. It uses per default the index for the bar categories (however, also columns can be used as x-axis category using the x argument).
data = {
'fruits':
['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries'],
'2015': [2, 1, 4, 3, 2, 4],
'2016': [5, 3, 3, 2, 4, 6],
'2017': [3, 2, 4, 4, 5, 3]
}
df = pd.DataFrame(data).set_index("fruits")
p_bar = df.plot_bokeh.bar(
ylabel="Price per Unit [€]",
title="Fruit prices per Year",
alpha=0.6)
Using the stacked keyword argument you also maked stacked barplots:
p_stacked_bar = df.plot_bokeh.bar(
ylabel="Price per Unit [€]",
title="Fruit prices per Year",
stacked=True,
alpha=0.6)
Also horizontal versions of the above barplot are supported with the keyword kind="barh" or the accessor plot_bokeh.barh. You can still specify a column of the DataFrame as the bar category via the x argument if you do not wish to use the index.
#Reset index, such that "fruits" is now a column of the DataFrame:
df.reset_index(inplace=True)
#Create horizontal bar (via kind keyword):
p_hbar = df.plot_bokeh(
kind="barh",
x="fruits",
xlabel="Price per Unit [€]",
title="Fruit prices per Year",
alpha=0.6,
legend = "bottom_right",
show_figure=False)
#Create stacked horizontal bar (via barh accessor):
p_stacked_hbar = df.plot_bokeh.barh(
x="fruits",
stacked=True,
xlabel="Price per Unit [€]",
title="Fruit prices per Year",
alpha=0.6,
legend = "bottom_right",
show_figure=False)
#Plot all barplot examples in a grid:
pandas_bokeh.plot_grid([[p_bar, p_stacked_bar],
[p_hbar, p_stacked_hbar]],
plot_width=450)
For drawing histograms (kind="hist"), Pandas-Bokeh has a lot of customization features. Optional keyword arguments for histogram plots are:
bins: Determines bins to use for the histogram. If bins is an int, it defines the number of equal-width bins in the given range (10, by default). If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths. If bins is a string, it defines the method used to calculate the optimal bin width, as defined by histogram_bin_edges.
histogram_type: Either "sidebyside", "topontop" or "stacked". Default: "topontop"
stacked: Boolean that overrides the histogram_type as "stacked" if given. Default: False
kwargs**: Optional keyword arguments of bokeh.plotting.figure.quad
Below examples of the different histogram types:
import numpy as np
df_hist = pd.DataFrame({
'a': np.random.randn(1000) + 1,
'b': np.random.randn(1000),
'c': np.random.randn(1000) - 1
},
columns=['a', 'b', 'c'])
#Top-on-Top Histogram (Default):
df_hist.plot_bokeh.hist(
bins=np.linspace(-5, 5, 41),
vertical_xlabel=True,
hovertool=False,
title="Normal distributions (Top-on-Top)",
line_color="black")
#Side-by-Side Histogram (multiple bars share bin side-by-side) also accessible via
#kind="hist":
df_hist.plot_bokeh(
kind="hist",
bins=np.linspace(-5, 5, 41),
histogram_type="sidebyside",
vertical_xlabel=True,
hovertool=False,
title="Normal distributions (Side-by-Side)",
line_color="black")
#Stacked histogram:
df_hist.plot_bokeh.hist(
bins=np.linspace(-5, 5, 41),
histogram_type="stacked",
vertical_xlabel=True,
hovertool=False,
title="Normal distributions (Stacked)",
line_color="black")
Further, advanced keyword arguments for histograms are:
Their usage is shown in these examples:
p_hist = df_hist.plot_bokeh.hist(
y=["a", "b"],
bins=np.arange(-4, 6.5, 0.5),
normed=100,
vertical_xlabel=True,
ylabel="Share[%]",
title="Normal distributions (normed)",
show_average=True,
xlim=(-4, 6),
ylim=(0, 30),
show_figure=False)
p_hist_cum = df_hist.plot_bokeh.hist(
y=["a", "b"],
bins=np.arange(-4, 6.5, 0.5),
normed=100,
cumulative=True,
vertical_xlabel=True,
ylabel="Share[%]",
title="Normal distributions (normed & cumulative)",
show_figure=False)
pandas_bokeh.plot_grid([[p_hist, p_hist_cum]], plot_width=450, plot_height=300)
Areaplot (kind="area") can be either drawn on top of each other or stacked. The important parameters are:
stacked: If True, the areaplots are stacked. If False, plots are drawn on top of each other. Default: False
kwargs**: Optional keyword arguments of bokeh.plotting.figure.patch
Let us consider the energy consumption split by source that can be downloaded as DataFrame via:
df_energy = pd.read_csv(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/energy/energy.csv",
parse_dates=["Year"])
df_energy.head()
Year | Oil | Gas | Coal | Nuclear Energy | Hydroelectricity | Other Renewable |
---|---|---|---|---|---|---|
1970-01-01 | 2291.5 | 826.7 | 1467.3 | 17.7 | 265.8 | 5.8 |
1971-01-01 | 2427.7 | 884.8 | 1459.2 | 24.9 | 276.4 | 6.3 |
1972-01-01 | 2613.9 | 933.7 | 1475.7 | 34.1 | 288.9 | 6.8 |
1973-01-01 | 2818.1 | 978.0 | 1519.6 | 45.9 | 292.5 | 7.3 |
1974-01-01 | 2777.3 | 1001.9 | 1520.9 | 59.6 | 321.1 | 7.7 |
Creating the Areaplot can be achieved via:
df_energy.plot_bokeh.area(
x="Year",
stacked=True,
legend="top_left",
colormap=["brown", "orange", "black", "grey", "blue", "green"],
title="Worldwide energy consumption split by energy source",
ylabel="Million tonnes oil equivalent",
ylim=(0, 16000))
Note that the energy consumption of fossile energy is still increasing and renewable energy sources are still small in comparison 😢!!! However, when we norm the plot using the normed keyword, there is a clear trend towards renewable energies in the last decade:
df_energy.plot_bokeh.area(
x="Year",
stacked=True,
normed=100,
legend="bottom_left",
colormap=["brown", "orange", "black", "grey", "blue", "green"],
title="Worldwide energy consumption split by energy source",
ylabel="Million tonnes oil equivalent")
Pieplot
For Pieplots, let us consider a dataset showing the results of all Bundestags elections in Germany since 2002:
df_pie = pd.read_csv(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/Bundestagswahl/Bundestagswahl.csv")
df_pie
Partei | 2002 | 2005 | 2009 | 2013 | 2017 |
---|---|---|---|---|---|
CDU/CSU | 38.5 | 35.2 | 33.8 | 41.5 | 32.9 |
SPD | 38.5 | 34.2 | 23.0 | 25.7 | 20.5 |
FDP | 7.4 | 9.8 | 14.6 | 4.8 | 10.7 |
Grünen | 8.6 | 8.1 | 10.7 | 8.4 | 8.9 |
Linke/PDS | 4.0 | 8.7 | 11.9 | 8.6 | 9.2 |
AfD | 0.0 | 0.0 | 0.0 | 0.0 | 12.6 |
Sonstige | 3.0 | 4.0 | 6.0 | 11.0 | 5.0 |
We can create a Pieplot of the last election in 2017 by specifying the "Partei" (german for party) column as the x column and the "2017" column as the y column for values:
df_pie.plot_bokeh.pie(
x="Partei",
y="2017",
colormap=["blue", "red", "yellow", "green", "purple", "orange", "grey"],
title="Results of German Bundestag Election 2017",
)
When you pass several columns to the y parameter (not providing the y-parameter assumes you plot all columns), multiple nested pieplots will be shown in one plot:
df_pie.plot_bokeh.pie(
x="Partei",
colormap=["blue", "red", "yellow", "green", "purple", "orange", "grey"],
title="Results of German Bundestag Elections [2002-2017]",
line_color="grey")
Mapplot
The mapplot method of Pandas-Bokeh allows for plotting geographic points stored in a Pandas DataFrame on an interactive map. For more advanced Geoplots for line and polygon shapes have a look at the Geoplots examples for the GeoPandas API of Pandas-Bokeh.
For mapplots, only (latitude, longitude) pairs in geographic projection (WGS84) can be plotted on a map. The basic API has the following 2 base parameters:
The other optional keyword arguments are discussed in the section about the GeoPandas API, e.g. category for coloring the points.
Below an example of plotting all cities for more than 1 million inhabitants:
df_mapplot = pd.read_csv(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/populated%20places/populated_places.csv")
df_mapplot.head()
name | pop_max | latitude | longitude | size |
---|---|---|---|---|
Mesa | 1085394 | 33.423915 | -111.736084 | 1.085394 |
Sharjah | 1103027 | 25.371383 | 55.406478 | 1.103027 |
Changwon | 1081499 | 35.219102 | 128.583562 | 1.081499 |
Sheffield | 1292900 | 53.366677 | -1.499997 | 1.292900 |
Abbottabad | 1183647 | 34.149503 | 73.199501 | 1.183647 |
df_mapplot["size"] = df_mapplot["pop_max"] / 1000000
df_mapplot.plot_bokeh.map(
x="longitude",
y="latitude",
hovertool_string="""<h2> @{name} </h2>
<h3> Population: @{pop_max} </h3>""",
tile_provider="STAMEN_TERRAIN_RETINA",
size="size",
figsize=(900, 600),
title="World cities with more than 1.000.000 inhabitants")
Pandas-Bokeh also allows for interactive plotting of Maps using GeoPandas by providing a geopandas.GeoDataFrame.plot_bokeh() method. It allows to plot the following geodata on a map :
Note: t is not possible to mix up the objects types, i.e. a GeoDataFrame with Points and Lines is for example not allowed.
Les us start with a simple example using the "World Borders Dataset" . Let us first import all neccessary libraries and read the shapefile:
import geopandas as gpd
import pandas as pd
import pandas_bokeh
pandas_bokeh.output_notebook()
#Read in GeoJSON from URL:
df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")
df_states.head()
STATE_NAME | REGION | POPESTIMATE2010 | POPESTIMATE2011 | POPESTIMATE2012 | POPESTIMATE2013 | POPESTIMATE2014 | POPESTIMATE2015 | POPESTIMATE2016 | POPESTIMATE2017 | geometry |
---|---|---|---|---|---|---|---|---|---|---|
Hawaii | 4 | 1363817 | 1378323 | 1392772 | 1408038 | 1417710 | 1426320 | 1428683 | 1427538 | (POLYGON ((-160.0738033454681 22.0041773479577... |
Washington | 4 | 6741386 | 6819155 | 6890899 | 6963410 | 7046931 | 7152818 | 7280934 | 7405743 | (POLYGON ((-122.4020153103835 48.2252163723779... |
Montana | 4 | 990507 | 996866 | 1003522 | 1011921 | 1019931 | 1028317 | 1038656 | 1050493 | POLYGON ((-111.4754253002074 44.70216236909688... |
Maine | 1 | 1327568 | 1327968 | 1328101 | 1327975 | 1328903 | 1327787 | 1330232 | 1335907 | (POLYGON ((-69.77727626137293 44.0741483685119... |
North Dakota | 2 | 674518 | 684830 | 701380 | 722908 | 738658 | 754859 | 755548 | 755393 | POLYGON ((-98.73043728833767 45.93827137024809... |
Plotting the data on a map is as simple as calling:
df_states.plot_bokeh(simplify_shapes=10000)
We also passed the optional parameter simplify_shapes (~meter) to improve plotting performance (for a reference see shapely.object.simplify). The above geolayer thus has an accuracy of about 10km.
Many keyword arguments like xlabel, ylabel, xlim, ylim, title, colormap, hovertool, zooming, panning, ... for costumizing the plot are also available for the geoplotting API and can be uses as in the examples shown above. There are however also many other options especially for plotting geodata:
One of the most common usage of map plots are choropleth maps, where the color of a the objects is determined by the property of the object itself. There are 3 ways of drawing choropleth maps using Pandas-Bokeh, which are described below.
This is the simplest way. Just provide the category keyword for the selection of the property column:
Let us now draw the regions as a choropleth plot using the category keyword (at the moment, only numerical columns are supported for choropleth plots):
df_states.plot_bokeh(
figsize=(900, 600),
simplify_shapes=5000,
category="REGION",
show_colorbar=False,
colormap=["blue", "yellow", "green", "red"],
hovertool_columns=["STATE_NAME", "REGION"],
tile_provider="STAMEN_TERRAIN_RETINA")
When hovering over the states, the state-name and the region are shown as specified in the hovertool_columns argument.
By passing a list of column names of the GeoDataFrame as the dropdown keyword argument, a dropdown menu is shown above the map. This dropdown menu can be used to select the choropleth layer by the user. :
df_states["STATE_NAME_SMALL"] = df_states["STATE_NAME"].str.lower()
df_states.plot_bokeh(
figsize=(900, 600),
simplify_shapes=5000,
dropdown=["POPESTIMATE2010", "POPESTIMATE2017"],
colormap="Viridis",
hovertool_string="""
<img
src="https://www.states101.com/img/flags/gif/small/@STATE_NAME_SMALL.gif"
height="42" alt="@imgs" width="42"
style="float: left; margin: 0px 15px 15px 0px;"
border="2"></img>
<h2> @STATE_NAME </h2>
<h3> 2010: @POPESTIMATE2010 </h3>
<h3> 2017: @POPESTIMATE2017 </h3>""",
tile_provider_url=r"http://c.tile.stamen.com/watercolor/{Z}/{X}/{Y}.jpg",
tile_attribution='Map tiles by <a href="http://stamen.com">Stamen Design</a>, under <a href="http://creativecommons.org/licenses/by/3.0">CC BY 3.0</a>. Data by <a href="http://openstreetmap.org">OpenStreetMap</a>, under <a href="http://www.openstreetmap.org/copyright">ODbL</a>.'
)
Using hovertool_string, one can pass a string that can contain arbitrary HTML elements (including divs, images, ...) that is shown when hovering over the geographies (@{column} will be replaced by the value of the column for the element the mouse hovers over, see also Bokeh documentation).
Here, we also used an OSM tile server with watercolor style via tile_provider_url and added the attribution via tile_attribution.
Another option for interactive choropleth maps is the slider implementation of Pandas-Bokeh. The possible keyword arguments are here:
This can be used to display the change in population relative to the year 2010:
#Calculate change of population relative to 2010:
for i in range(8):
df_states["Delta_Population_201%d"%i] = ((df_states["POPESTIMATE201%d"%i] / df_states["POPESTIMATE2010"]) -1 ) * 100
#Specify slider columns:
slider_columns = ["Delta_Population_201%d"%i for i in range(8)]
#Specify slider-range (Maps "Delta_Population_2010" -> 2010,
# "Delta_Population_2011" -> 2011, ...):
slider_range = range(2010, 2018)
#Make slider plot:
df_states.plot_bokeh(
figsize=(900, 600),
simplify_shapes=5000,
slider=slider_columns,
slider_range=slider_range,
slider_name="Year",
colormap="Inferno",
hovertool_columns=["STATE_NAME"] + slider_columns,
title="Change of Population [%]")
If you wish to display multiple geolayers, you can pass the Bokeh figure of a Pandas-Bokeh plot via the figure keyword to the next plot_bokeh() call:
import geopandas as gpd
import pandas_bokeh
pandas_bokeh.output_notebook()
# Read in GeoJSONs from URL:
df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")
df_cities = gpd.read_file(
r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/populated%20places/ne_10m_populated_places_simple_bigcities.geojson"
)
df_cities["size"] = df_cities.pop_max / 400000
#Plot shapes of US states (pass figure options to this initial plot):
figure = df_states.plot_bokeh(
figsize=(800, 450),
simplify_shapes=10000,
show_figure=False,
xlim=[-170, -80],
ylim=[10, 70],
category="REGION",
colormap="Dark2",
legend="States",
show_colorbar=False,
)
#Plot cities as points on top of the US states layer by passing the figure:
df_cities.plot_bokeh(
figure=figure, # <== pass figure here!
category="pop_max",
colormap="Viridis",
colormap_uselog=True,
size="size",
hovertool_string="""<h1>@name</h1>
<h3>Population: @pop_max </h3>""",
marker="inverted_triangle",
legend="Cities",
)
Below, you can see an example that use Pandas-Bokeh to plot point data on a map. The plot shows all cities with a population larger than 1.000.000. For point plots, you can select the marker as keyword argument (since it is passed to bokeh.plotting.figure.scatter). Here an overview of all available marker types:
gdf = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/populated%20places/ne_10m_populated_places_simple_bigcities.geojson")
gdf["size"] = gdf.pop_max / 400000
gdf.plot_bokeh(
category="pop_max",
colormap="Viridis",
colormap_uselog=True,
size="size",
hovertool_string="""<h1>@name</h1>
<h3>Population: @pop_max </h3>""",
xlim=[-15, 35],
ylim=[30,60],
marker="inverted_triangle");
In a similar way, also GeoDataFrames with (multi)line shapes can be drawn using Pandas-Bokeh.
If you want to display the numerical labels on your colorbar with an alternative to the scientific format, you can pass in a one of the bokeh number string formats or an instance of one of the bokeh.models.formatters to the colorbar_tick_format
argument in the geoplot
An example of using the string format argument:
df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")
df_states["STATE_NAME_SMALL"] = df_states["STATE_NAME"].str.lower()
# pass in a string format to colorbar_tick_format to display the ticks as 10m rather than 1e7
df_states.plot_bokeh(
figsize=(900, 600),
category="POPESTIMATE2017",
simplify_shapes=5000,
colormap="Inferno",
colormap_uselog=True,
colorbar_tick_format="0.0a")
An example of using the bokeh PrintfTickFormatter
:
df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")
df_states["STATE_NAME_SMALL"] = df_states["STATE_NAME"].str.lower()
for i in range(8):
df_states["Delta_Population_201%d"%i] = ((df_states["POPESTIMATE201%d"%i] / df_states["POPESTIMATE2010"]) -1 ) * 100
# pass in a PrintfTickFormatter instance colorbar_tick_format to display the ticks with 2 decimal places
df_states.plot_bokeh(
figsize=(900, 600),
category="Delta_Population_2017",
simplify_shapes=5000,
colormap="Inferno",
colorbar_tick_format=PrintfTickFormatter(format="%4.2f"))
The pandas.DataFrame.plot_bokeh API has the following additional keyword arguments:
If you have a Bokeh figure or layout, you can also use the pandas_bokeh.embedded_html function to generate an embeddable HTML representation of the plot. This can be included into any valid HTML (note that this is not possible directly with the HTML generated by the pandas_bokeh.output_file output option, because it includes an HTML header). Let us consider the following simple example:
#Import Pandas and Pandas-Bokeh (if you do not specify an output option, the standard is
#output_file):
import pandas as pd
import pandas_bokeh
#Create DataFrame to Plot:
import numpy as np
x = np.arange(-10, 10, 0.1)
sin = np.sin(x)
cos = np.cos(x)
tan = np.tan(x)
df = pd.DataFrame({"x": x, "sin(x)": sin, "cos(x)": cos, "tan(x)": tan})
#Make Bokeh plot from DataFrame using Pandas-Bokeh. Do not show the plot, but export
#it to an embeddable HTML string:
html_plot = df.plot_bokeh(
kind="line",
x="x",
y=["sin(x)", "cos(x)", "tan(x)"],
xticks=range(-20, 20),
title="Trigonometric functions",
show_figure=False,
return_html=True,
ylim=(-1.5, 1.5))
#Write some HTML and embed the HTML plot below it. For production use, please use
#Templates and the awesome Jinja library.
html = r"""
<script type="text/x-mathjax-config">
MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<h1> Trigonometric functions </h1>
<p> The basic trigonometric functions are:</p>
<p>$ sin(x) $</p>
<p>$ cos(x) $</p>
<p>$ tan(x) = \frac{sin(x)}{cos(x)}$</p>
<p>Below is a plot that shows them</p>
""" + html_plot
#Export the HTML string to an external HTML file and show it:
with open("test.html" , "w") as f:
f.write(html)
import webbrowser
webbrowser.open("test.html")
This code will open up a webbrowser and show the following page. As you can see, the interactive Bokeh plot is embedded nicely into the HTML layout. The return_html option is ideal for the use in a templating engine like Jinja.
For single plots that have a number of x axis values or for larger monitors, you can auto scale the figure to the width of the entire jupyter cell by setting the sizing_mode
parameter.
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd']) df.plot_bokeh(kind="bar", figsize=(500, 200), sizing_mode="scale_width")
The figsize
parameter can be used to change the height and width as well as act as a scaling multiplier against the axis that is not being scaled.
To change the formats of numbers in the hovertool, use the number_format keyword argument. For a documentation about the format to pass, have a look at the Bokeh documentation.Let us consider some examples for the number 3.141592653589793:
Format | Output |
---|---|
0 | 3 |
0.000 | 3.141 |
0.00 $ | 3.14 $ |
This number format will be applied to all numeric columns of the hovertool. If you want to make a very custom or complicated hovertool, you should probably use the hovertool_string keyword argument, see e.g. this example. Below, we use the number_format parameter to specify the "Stock Price" format to 2 decimal digits and an additional $ sign.
import numpy as np
#Lineplot:
np.random.seed(42)
df = pd.DataFrame({
"Google": np.random.randn(1000) + 0.2,
"Apple": np.random.randn(1000) + 0.17
},
index=pd.date_range('1/1/2000', periods=1000))
df = df.cumsum()
df = df + 50
df.plot_bokeh(
kind="line",
title="Apple vs Google",
xlabel="Date",
ylabel="Stock price [$]",
yticks=[0, 100, 200, 300, 400],
ylim=(0, 400),
colormap=["red", "blue"],
number_format="1.00 $")
If you want to suppress the scientific notation for axes, you can use the disable_scientific_axes parameter, which accepts one of "x", "y", "xy":
df = pd.DataFrame({"Animal": ["Mouse", "Rabbit", "Dog", "Tiger", "Elefant", "Wale"],
"Weight [g]": [19, 3000, 40000, 200000, 6000000, 50000000]})
p_scientific = df.plot_bokeh(x="Animal", y="Weight [g]", show_figure=False)
p_non_scientific = df.plot_bokeh(x="Animal", y="Weight [g]", disable_scientific_axes="y", show_figure=False,)
pandas_bokeh.plot_grid([[p_scientific, p_non_scientific]], plot_width = 450)
As shown in the Scatterplot Example, combining plots with plots or other HTML elements is straighforward in Pandas-Bokeh due to the layout capabilities of Bokeh. The easiest way to generate a dashboard layout is using the pandas_bokeh.plot_grid method (which is an extension of bokeh.layouts.gridplot):
import pandas as pd
import numpy as np
import pandas_bokeh
pandas_bokeh.output_notebook()
#Barplot:
data = {
'fruits':
['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries'],
'2015': [2, 1, 4, 3, 2, 4],
'2016': [5, 3, 3, 2, 4, 6],
'2017': [3, 2, 4, 4, 5, 3]
}
df = pd.DataFrame(data).set_index("fruits")
p_bar = df.plot_bokeh(
kind="bar",
ylabel="Price per Unit [€]",
title="Fruit prices per Year",
show_figure=False)
#Lineplot:
np.random.seed(42)
df = pd.DataFrame({
"Google": np.random.randn(1000) + 0.2,
"Apple": np.random.randn(1000) + 0.17
},
index=pd.date_range('1/1/2000', periods=1000))
df = df.cumsum()
df = df + 50
p_line = df.plot_bokeh(
kind="line",
title="Apple vs Google",
xlabel="Date",
ylabel="Stock price [$]",
yticks=[0, 100, 200, 300, 400],
ylim=(0, 400),
colormap=["red", "blue"],
show_figure=False)
#Scatterplot:
from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(iris["data"])
df.columns = iris["feature_names"]
df["species"] = iris["target"]
df["species"] = df["species"].map(dict(zip(range(3), iris["target_names"])))
p_scatter = df.plot_bokeh(
kind="scatter",
x="petal length (cm)",
y="sepal width (cm)",
category="species",
title="Iris DataSet Visualization",
show_figure=False)
#Histogram:
df_hist = pd.DataFrame({
'a': np.random.randn(1000) + 1,
'b': np.random.randn(1000),
'c': np.random.randn(1000) - 1
},
columns=['a', 'b', 'c'])
p_hist = df_hist.plot_bokeh(
kind="hist",
bins=np.arange(-6, 6.5, 0.5),
vertical_xlabel=True,
normed=100,
hovertool=False,
title="Normal distributions",
show_figure=False)
#Make Dashboard with Grid Layout:
pandas_bokeh.plot_grid([[p_line, p_bar],
[p_scatter, p_hist]], plot_width=450)
Using a combination of row and column elements (see also Bokeh Layouts) allow for a very easy general arrangement of elements. An alternative layout to the one above is:
p_line.plot_width = 900
p_hist.plot_width = 900
layout = pandas_bokeh.column(p_line,
pandas_bokeh.row(p_scatter, p_bar),
p_hist)
pandas_bokeh.show(layout)
Release Notes
Release Notes can be found here.
Contributing to Pandas-Bokeh
If you wish to contribute to the development of Pandas-Bokeh
you can follow the instructions on the CONTRIBUTING.md.
Author: PatrikHlobil
Source Code: https://github.com/PatrikHlobil/Pandas-Bokeh
License: MIT License
1561523460
This Matplotlib cheat sheet introduces you to the basics that you need to plot your data with Python and includes code samples.
Data visualization and storytelling with your data are essential skills that every data scientist needs to communicate insights gained from analyses effectively to any audience out there.
For most beginners, the first package that they use to get in touch with data visualization and storytelling is, naturally, Matplotlib: it is a Python 2D plotting library that enables users to make publication-quality figures. But, what might be even more convincing is the fact that other packages, such as Pandas, intend to build more plotting integration with Matplotlib as time goes on.
However, what might slow down beginners is the fact that this package is pretty extensive. There is so much that you can do with it and it might be hard to still keep a structure when you're learning how to work with Matplotlib.
DataCamp has created a Matplotlib cheat sheet for those who might already know how to use the package to their advantage to make beautiful plots in Python, but that still want to keep a one-page reference handy. Of course, for those who don't know how to work with Matplotlib, this might be the extra push be convinced and to finally get started with data visualization in Python.
You'll see that this cheat sheet presents you with the six basic steps that you can go through to make beautiful plots.
Check out the infographic by clicking on the button below:
With this handy reference, you'll familiarize yourself in no time with the basics of Matplotlib: you'll learn how you can prepare your data, create a new plot, use some basic plotting routines to your advantage, add customizations to your plots, and save, show and close the plots that you make.
What might have looked difficult before will definitely be more clear once you start using this cheat sheet! Use it in combination with the Matplotlib Gallery, the documentation.
Matplotlib
Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.
>>> import numpy as np
>>> x = np.linspace(0, 10, 100)
>>> y = np.cos(x)
>>> z = np.sin(x)
>>> data = 2 * np.random.random((10, 10))
>>> data2 = 3 * np.random.random((10, 10))
>>> Y, X = np.mgrid[-3:3:100j, -3:3:100j]
>>> U = 1 X** 2 + Y
>>> V = 1 + X Y**2
>>> from matplotlib.cbook import get_sample_data
>>> img = np.load(get_sample_data('axes_grid/bivariate_normal.npy'))
>>> import matplotlib.pyplot as plt
>>> fig = plt.figure()
>>> fig2 = plt.figure(figsize=plt.figaspect(2.0))
>>> fig.add_axes()
>>> ax1 = fig.add_subplot(221) #row-col-num
>>> ax3 = fig.add_subplot(212)
>>> fig3, axes = plt.subplots(nrows=2,ncols=2)
>>> fig4, axes2 = plt.subplots(ncols=3)
>>> plt.savefig('foo.png') #Save figures
>>> plt.savefig('foo.png', transparent=True) #Save transparent figures
>>> plt.show()
>>> fig, ax = plt.subplots()
>>> lines = ax.plot(x,y) #Draw points with lines or markers connecting them
>>> ax.scatter(x,y) #Draw unconnected points, scaled or colored
>>> axes[0,0].bar([1,2,3],[3,4,5]) #Plot vertical rectangles (constant width)
>>> axes[1,0].barh([0.5,1,2.5],[0,1,2]) #Plot horiontal rectangles (constant height)
>>> axes[1,1].axhline(0.45) #Draw a horizontal line across axes
>>> axes[0,1].axvline(0.65) #Draw a vertical line across axes
>>> ax.fill(x,y,color='blue') #Draw filled polygons
>>> ax.fill_between(x,y,color='yellow') #Fill between y values and 0
>>> fig, ax = plt.subplots()
>>> im = ax.imshow(img, #Colormapped or RGB arrays
cmap= 'gist_earth',
interpolation= 'nearest',
vmin=-2,
vmax=2)
>>> axes2[0].pcolor(data2) #Pseudocolor plot of 2D array
>>> axes2[0].pcolormesh(data) #Pseudocolor plot of 2D array
>>> CS = plt.contour(Y,X,U) #Plot contours
>>> axes2[2].contourf(data1) #Plot filled contours
>>> axes2[2]= ax.clabel(CS) #Label a contour plot
>>> axes[0,1].arrow(0,0,0.5,0.5) #Add an arrow to the axes
>>> axes[1,1].quiver(y,z) #Plot a 2D field of arrows
>>> axes[0,1].streamplot(X,Y,U,V) #Plot a 2D field of arrows
>>> ax1.hist(y) #Plot a histogram
>>> ax3.boxplot(y) #Make a box and whisker plot
>>> ax3.violinplot(z) #Make a violin plot
y-axis
x-axis
The basic steps to creating plots with matplotlib are:
1 Prepare Data
2 Create Plot
3 Plot
4 Customized Plot
5 Save Plot
6 Show Plot
>>> import matplotlib.pyplot as plt
>>> x = [1,2,3,4] #Step 1
>>> y = [10,20,25,30]
>>> fig = plt.figure() #Step 2
>>> ax = fig.add_subplot(111) #Step 3
>>> ax.plot(x, y, color= 'lightblue', linewidth=3) #Step 3, 4
>>> ax.scatter([2,4,6],
[5,15,25],
color= 'darkgreen',
marker= '^' )
>>> ax.set_xlim(1, 6.5)
>>> plt.savefig('foo.png' ) #Step 5
>>> plt.show() #Step 6
>>> plt.cla() #Clear an axis
>>> plt.clf(). #Clear the entire figure
>>> plt.close(). #Close a window
>>> plt.plot(x, x, x, x**2, x, x** 3)
>>> ax.plot(x, y, alpha = 0.4)
>>> ax.plot(x, y, c= 'k')
>>> fig.colorbar(im, orientation= 'horizontal')
>>> im = ax.imshow(img,
cmap= 'seismic' )
>>> fig, ax = plt.subplots()
>>> ax.scatter(x,y,marker= ".")
>>> ax.plot(x,y,marker= "o")
>>> plt.plot(x,y,linewidth=4.0)
>>> plt.plot(x,y,ls= 'solid')
>>> plt.plot(x,y,ls= '--')
>>> plt.plot(x,y,'--' ,x**2,y**2,'-.' )
>>> plt.setp(lines,color= 'r',linewidth=4.0)
>>> ax.text(1,
-2.1,
'Example Graph',
style= 'italic' )
>>> ax.annotate("Sine",
xy=(8, 0),
xycoords= 'data',
xytext=(10.5, 0),
textcoords= 'data',
arrowprops=dict(arrowstyle= "->",
connectionstyle="arc3"),)
>>> plt.title(r '$sigma_i=15$', fontsize=20)
Limits & Autoscaling
>>> ax.margins(x=0.0,y=0.1) #Add padding to a plot
>>> ax.axis('equal') #Set the aspect ratio of the plot to 1
>>> ax.set(xlim=[0,10.5],ylim=[-1.5,1.5]) #Set limits for x-and y-axis
>>> ax.set_xlim(0,10.5) #Set limits for x-axis
Legends
>>> ax.set(title= 'An Example Axes', #Set a title and x-and y-axis labels
ylabel= 'Y-Axis',
xlabel= 'X-Axis')
>>> ax.legend(loc= 'best') #No overlapping plot elements
Ticks
>>> ax.xaxis.set(ticks=range(1,5), #Manually set x-ticks
ticklabels=[3,100, 12,"foo" ])
>>> ax.tick_params(axis= 'y', #Make y-ticks longer and go in and out
direction= 'inout',
length=10)
Subplot Spacing
>>> fig3.subplots_adjust(wspace=0.5, #Adjust the spacing between subplots
hspace=0.3,
left=0.125,
right=0.9,
top=0.9,
bottom=0.1)
>>> fig.tight_layout() #Fit subplot(s) in to the figure area
Axis Spines
>>> ax1.spines[ 'top'].set_visible(False) #Make the top axis line for a plot invisible
>>> ax1.spines['bottom' ].set_position(( 'outward',10)) #Move the bottom axis line outward
Have this Cheat Sheet at your fingertips
Original article source at https://www.datacamp.com
#matplotlib #cheatsheet #python