John Deo

1578456538

6 tips to make your Dashboards more Performant in Tableau

We here at Tableau are very proud of how easy it is to see and understand data with Tableau. Once you get started, it’s intuitive to dive deeper by adding more and more fields, formulae, and calculations to a simple visualization—until it becomes slower and slower to render. In a world where two-second response times can lose an audience, performance is crucial.

So where do I start?
So how can you make your dashboards run faster? Your first step is to identify the problem spots by running and interpreting your performance recording. The performance recorder is every Tableau speed demon’s ticket to the fast lane. The performance recorder can pinpoint slow worksheets, slow queries, and long render-times on a dashboard. It even shows the query text, allowing you to work with your database team on optimizing at the database level.
This is image title

Now that you know which views or data connections are slowing you down, below are six tips to make those dashboards more performant. For each tip, we’ve listed the most common causes of performance degradation as well as some quick solutions.

To get in-Depth knowledge on Tableau Dashboard you can enroll for live Tableau Online Training

1. Your data strategy drives performance
Extracts are typically much faster to work with than a live data source, and are especially great for prototyping. The key is to use domain-specific cuts of your data. The Data Engine is not intended to be a replacement for a data warehouse. Rather, it’s meant to be a supplement for fast prototyping and data discovery.

This is image title

  • Minimize the number of fields based on the analysis being performed. Use the hide all unused fields option to remove unused columns from a data source.
  • Minimize the number of records. Use extract filters to keep only the data you need.
  • Optimize extracts to speed up future queries by materializing calculations, removing columns and the use of accelerated views.

Keep in mind: Extracts are not always the long-term solution. The typical extent of an extract is between 500 million to one billion rows; mileage will vary. When querying against constantly-refreshing data, a live connection often makes more sense when operationalizing the view.

2. Reduce the marks (data points) in your view
When data is highly granular, Tableau must render and precisely place each element. Each mark represents a batch that Tableau must parse. More marks create more batches; drawing 1,000 points on a graph is more difficult than drawing three bars in a chart.
This is image title

Large crosstabs with a bevy of quick filters can cause increased load times when you try to view all the rows and dimensions on a Tableau view.

Excessive marks (think: data points) on a view also reduce the visual analytics value. Large, slow, manual table scans can cause information overload and make it harder to see and understand your data.

Here’s how you can avoid this trap:

  • Practice guided analytics. There’s no need to fit everything you plan to show in a single view. Compile related views and connect them with action filters to travel from overview to highly-granular views at the speed of thought.
  • Remove unneeded dimensions from the detail shelf.
  • Explore. Try displaying your data in different types of views.

3. Limit your filters by number and type
Filtering in Tableau is extremely powerful and expressive. However, inefficient and excessive filters are one of the most common causes of poorly performing workbooks and dashboards. Note: Showing the filter dialog requires Tableau to load its members and may create extra queries, especially if the filtered dimension is not in the view.
This is image title

  • Reduce the number of filters in use. Excessive filters on a view will create a more complex query, which takes longer to return results. Double-check your filters and remove any that aren’t necessary.
  • Use an include filter. Exclude filters load the entire domain of a dimension, while include filters do not. An include filter runs much faster than an exclude filter, especially for dimensions with many members.
  • Use a continuous date filter. Continuous date filters (relative and range-of-date filters) can take advantage of the indexing properties in your database and are faster than discrete date filters.
  • Use Boolean or numeric filters. Computers process integers and Booleans (t/f) much faster than strings.
  • Use parameters and action filters. These reduce the query load (and work across data sources).

4. Optimize and materialize your calculations

  • Perform calculations in the database. Wherever possible, especially on production views, perform calculations in the database to reduce overhead in Tableau. Aggregate calculations are great for calculated fields in Tableau. Perform row-level calculations in the database when you can.

  • Reduce the number of nested calculations. Just like Russian nesting dolls, unpacking a calculation and then building it takes longer for each extra layer.
    This is image title

  • Reduce the granularity of LOD or table calculations in the view. The more granular the calculation, the longer it takes.
    LODs - Look at the number of unique dimension members in the calculation.
    Table Calculations - the more marks in the view, the longer it will take to calculate.

  • Where possible, use MIN or MAX instead of AVG. AVG requires more processing than MIN or MAX. Often rows will be duplicated and display the same result with MIN, MAX, or AVG.

  • Make groups with calculations. Like include filters, calculated groups load only named members of the domain, whereas Tableau’s group function loads the entire domain.

  • Use Booleans or numeric calculations instead of string calculations. Computers can process integers and Booleans (t/f) much faster than strings. Boolean>Int>Float>Date>DateTime>String

5. Take advantage of Tableau’s query optimization

  • Blend on low-granularity dimensions. The more members in a blend, the longer it takes. Blending aggregates the data to the level of the relationship. Blends are not meant to replace row-level joins.
  • Minimize joined tables. Lots of joins take lots of time. If you find yourself creating a data connection with many joined tables, it may be faster to materialize the view in the database.
  • Assume referential integrity if your database is configured with this option.
  • Remove custom SQL. Tableau can take advantage of database optimizations when the data connection does not use custom SQL.

6. Clean up your workbooks!

  • Reduce dashboard scope. Excess worksheets on a dashboard can impact performance.
  • Delete or consolidate unused worksheets and data sources. A clean workbook is a happy workbook.

Take your career to new heights of success with a tableau training

#tableau #tableu #dashboard #technology #businessintelligence

What is GEEK

Buddha Community

6 tips to make your Dashboards more Performant in Tableau
Ray  Patel

Ray Patel

1619518440

top 30 Python Tips and Tricks for Beginners

Welcome to my Blog , In this article, you are going to learn the top 10 python tips and tricks.

1) swap two numbers.

2) Reversing a string in Python.

3) Create a single string from all the elements in list.

4) Chaining Of Comparison Operators.

5) Print The File Path Of Imported Modules.

6) Return Multiple Values From Functions.

7) Find The Most Frequent Value In A List.

8) Check The Memory Usage Of An Object.

#python #python hacks tricks #python learning tips #python programming tricks #python tips #python tips and tricks #python tips and tricks advanced #python tips and tricks for beginners #python tips tricks and techniques #python tutorial #tips and tricks in python #tips to learn python #top 30 python tips and tricks for beginners

John Deo

1578456538

6 tips to make your Dashboards more Performant in Tableau

We here at Tableau are very proud of how easy it is to see and understand data with Tableau. Once you get started, it’s intuitive to dive deeper by adding more and more fields, formulae, and calculations to a simple visualization—until it becomes slower and slower to render. In a world where two-second response times can lose an audience, performance is crucial.

So where do I start?
So how can you make your dashboards run faster? Your first step is to identify the problem spots by running and interpreting your performance recording. The performance recorder is every Tableau speed demon’s ticket to the fast lane. The performance recorder can pinpoint slow worksheets, slow queries, and long render-times on a dashboard. It even shows the query text, allowing you to work with your database team on optimizing at the database level.
This is image title

Now that you know which views or data connections are slowing you down, below are six tips to make those dashboards more performant. For each tip, we’ve listed the most common causes of performance degradation as well as some quick solutions.

To get in-Depth knowledge on Tableau Dashboard you can enroll for live Tableau Online Training

1. Your data strategy drives performance
Extracts are typically much faster to work with than a live data source, and are especially great for prototyping. The key is to use domain-specific cuts of your data. The Data Engine is not intended to be a replacement for a data warehouse. Rather, it’s meant to be a supplement for fast prototyping and data discovery.

This is image title

  • Minimize the number of fields based on the analysis being performed. Use the hide all unused fields option to remove unused columns from a data source.
  • Minimize the number of records. Use extract filters to keep only the data you need.
  • Optimize extracts to speed up future queries by materializing calculations, removing columns and the use of accelerated views.

Keep in mind: Extracts are not always the long-term solution. The typical extent of an extract is between 500 million to one billion rows; mileage will vary. When querying against constantly-refreshing data, a live connection often makes more sense when operationalizing the view.

2. Reduce the marks (data points) in your view
When data is highly granular, Tableau must render and precisely place each element. Each mark represents a batch that Tableau must parse. More marks create more batches; drawing 1,000 points on a graph is more difficult than drawing three bars in a chart.
This is image title

Large crosstabs with a bevy of quick filters can cause increased load times when you try to view all the rows and dimensions on a Tableau view.

Excessive marks (think: data points) on a view also reduce the visual analytics value. Large, slow, manual table scans can cause information overload and make it harder to see and understand your data.

Here’s how you can avoid this trap:

  • Practice guided analytics. There’s no need to fit everything you plan to show in a single view. Compile related views and connect them with action filters to travel from overview to highly-granular views at the speed of thought.
  • Remove unneeded dimensions from the detail shelf.
  • Explore. Try displaying your data in different types of views.

3. Limit your filters by number and type
Filtering in Tableau is extremely powerful and expressive. However, inefficient and excessive filters are one of the most common causes of poorly performing workbooks and dashboards. Note: Showing the filter dialog requires Tableau to load its members and may create extra queries, especially if the filtered dimension is not in the view.
This is image title

  • Reduce the number of filters in use. Excessive filters on a view will create a more complex query, which takes longer to return results. Double-check your filters and remove any that aren’t necessary.
  • Use an include filter. Exclude filters load the entire domain of a dimension, while include filters do not. An include filter runs much faster than an exclude filter, especially for dimensions with many members.
  • Use a continuous date filter. Continuous date filters (relative and range-of-date filters) can take advantage of the indexing properties in your database and are faster than discrete date filters.
  • Use Boolean or numeric filters. Computers process integers and Booleans (t/f) much faster than strings.
  • Use parameters and action filters. These reduce the query load (and work across data sources).

4. Optimize and materialize your calculations

  • Perform calculations in the database. Wherever possible, especially on production views, perform calculations in the database to reduce overhead in Tableau. Aggregate calculations are great for calculated fields in Tableau. Perform row-level calculations in the database when you can.

  • Reduce the number of nested calculations. Just like Russian nesting dolls, unpacking a calculation and then building it takes longer for each extra layer.
    This is image title

  • Reduce the granularity of LOD or table calculations in the view. The more granular the calculation, the longer it takes.
    LODs - Look at the number of unique dimension members in the calculation.
    Table Calculations - the more marks in the view, the longer it will take to calculate.

  • Where possible, use MIN or MAX instead of AVG. AVG requires more processing than MIN or MAX. Often rows will be duplicated and display the same result with MIN, MAX, or AVG.

  • Make groups with calculations. Like include filters, calculated groups load only named members of the domain, whereas Tableau’s group function loads the entire domain.

  • Use Booleans or numeric calculations instead of string calculations. Computers can process integers and Booleans (t/f) much faster than strings. Boolean>Int>Float>Date>DateTime>String

5. Take advantage of Tableau’s query optimization

  • Blend on low-granularity dimensions. The more members in a blend, the longer it takes. Blending aggregates the data to the level of the relationship. Blends are not meant to replace row-level joins.
  • Minimize joined tables. Lots of joins take lots of time. If you find yourself creating a data connection with many joined tables, it may be faster to materialize the view in the database.
  • Assume referential integrity if your database is configured with this option.
  • Remove custom SQL. Tableau can take advantage of database optimizations when the data connection does not use custom SQL.

6. Clean up your workbooks!

  • Reduce dashboard scope. Excess worksheets on a dashboard can impact performance.
  • Delete or consolidate unused worksheets and data sources. A clean workbook is a happy workbook.

Take your career to new heights of success with a tableau training

#tableau #tableu #dashboard #technology #businessintelligence

渚  直樹

渚 直樹

1635917640

ループを使用して、Rustのデータを反復処理します

このモジュールでは、Rustでハッシュマップ複合データ型を操作する方法について説明します。ハッシュマップのようなコレクション内のデータを反復処理するループ式を実装する方法を学びます。演習として、要求された注文をループし、条件をテストし、さまざまなタイプのデータを処理することによって車を作成するRustプログラムを作成します。

さび遊び場

錆遊び場は錆コンパイラにブラウザインタフェースです。言語をローカルにインストールする前、またはコンパイラが利用できない場合は、Playgroundを使用してRustコードの記述を試すことができます。このコース全体を通して、サンプルコードと演習へのPlaygroundリンクを提供します。現時点でRustツールチェーンを使用できない場合でも、コードを操作できます。

Rust Playgroundで実行されるすべてのコードは、ローカルの開発環境でコンパイルして実行することもできます。コンピューターからRustコンパイラーと対話することを躊躇しないでください。Rust Playgroundの詳細については、What isRust?をご覧ください。モジュール。

学習目標

このモジュールでは、次のことを行います。

  • Rustのハッシュマップデータ型、およびキーと値にアクセスする方法を確認してください
  • ループ式を使用してRustプログラムのデータを反復処理する方法を探る
  • Rustプログラムを作成、コンパイル、実行して、ループを使用してハッシュマップデータを反復処理します

Rustのもう1つの一般的なコレクションの種類は、ハッシュマップです。このHashMap<K, V>型は、各キーKをその値にマッピングすることによってデータを格納しますV。ベクトル内のデータは整数インデックスを使用してアクセスされますが、ハッシュマップ内のデータはキーを使用してアクセスされます。

ハッシュマップタイプは、オブジェクト、ハッシュテーブル、辞書などのデータ項目の多くのプログラミング言語で使用されます。

ベクトルのように、ハッシュマップは拡張可能です。データはヒープに格納され、ハッシュマップアイテムへのアクセスは実行時にチェックされます。

ハッシュマップを定義する

次の例では、書評を追跡するためのハッシュマップを定義しています。ハッシュマップキーは本の名前であり、値は読者のレビューです。

use std::collections::HashMap;
let mut reviews: HashMap<String, String> = HashMap::new();

reviews.insert(String::from("Ancient Roman History"), String::from("Very accurate."));
reviews.insert(String::from("Cooking with Rhubarb"), String::from("Sweet recipes."));
reviews.insert(String::from("Programming in Rust"), String::from("Great examples."));

このコードをさらに詳しく調べてみましょう。最初の行に、新しいタイプの構文が表示されます。

use std::collections::HashMap;

このuseコマンドは、Rust標準ライブラリの一部HashMapからの定義をcollectionsプログラムのスコープに取り込みます。この構文は、他のプログラミング言語がインポートと呼ぶものと似ています。

HashMap::newメソッドを使用して空のハッシュマップを作成します。reviews必要に応じてキーと値を追加または削除できるように、変数を可変として宣言します。この例では、ハッシュマップのキーと値の両方がStringタイプを使用しています。

let mut reviews: HashMap<String, String> = HashMap::new();

キーと値のペアを追加します

このinsert(<key>, <value>)メソッドを使用して、ハッシュマップに要素を追加します。コードでは、構文は<hash_map_name>.insert()次のとおりです。

reviews.insert(String::from("Ancient Roman History"), String::from("Very accurate."));

キー値を取得する

ハッシュマップにデータを追加した後、get(<key>)メソッドを使用してキーの特定の値を取得できます。

// Look for a specific review
let book: &str = "Programming in Rust";
println!("\nReview for \'{}\': {:?}", book, reviews.get(book));

出力は次のとおりです。

Review for 'Programming in Rust': Some("Great examples.")

ノート

出力には、書評が単なる「すばらしい例」ではなく「Some( "すばらしい例。")」として表示されていることに注意してください。getメソッドはOption<&Value>型を返すため、Rustはメソッド呼び出しの結果を「Some()」表記でラップします。

キーと値のペアを削除します

この.remove()メソッドを使用して、ハッシュマップからエントリを削除できます。get無効なハッシュマップキーに対してメソッドを使用すると、getメソッドは「なし」を返します。

// Remove book review
let obsolete: &str = "Ancient Roman History";
println!("\n'{}\' removed.", obsolete);
reviews.remove(obsolete);

// Confirm book review removed
println!("\nReview for \'{}\': {:?}", obsolete, reviews.get(obsolete));

出力は次のとおりです。

'Ancient Roman History' removed.
Review for 'Ancient Roman History': None

このコードを試して、このRustPlaygroundでハッシュマップを操作できます。

演習:ハッシュマップを使用して注文を追跡する
この演習では、ハッシュマップを使用するように自動車工場のプログラムを変更します。

ハッシュマップキーと値のペアを使用して、車の注文に関する詳細を追跡し、出力を表示します。繰り返しになりますが、あなたの課題は、サンプルコードを完成させてコンパイルして実行することです。

この演習のサンプルコードで作業するには、次の2つのオプションがあります。

  • コードをコピーして、ローカル開発環境で編集します。
  • 準備されたRustPlaygroundでコードを開きます。

ノート

サンプルコードで、todo!マクロを探します。このマクロは、完了するか更新する必要があるコードを示します。

現在のプログラムをロードする

最初のステップは、既存のプログラムコードを取得することです。

  1. 編集のために既存のプログラムコードを開きます。コードは、データ型宣言、および定義のため含みcar_qualitycar_factoryおよびmain機能を。

次のコードをコピーしてローカル開発環境で編集する
か、この準備されたRustPlaygroundでコードを開きます。

#[derive(PartialEq, Debug)]
struct Car { color: String, motor: Transmission, roof: bool, age: (Age, u32) }

#[derive(PartialEq, Debug)]
enum Transmission { Manual, SemiAuto, Automatic }

#[derive(PartialEq, Debug)]
enum Age { New, Used }

// Get the car quality by testing the value of the input argument
// - miles (u32)
// Return tuple with car age ("New" or "Used") and mileage
fn car_quality (miles: u32) -> (Age, u32) {

    // Check if car has accumulated miles
    // Return tuple early for Used car
    if miles > 0 {
        return (Age::Used, miles);
    }

    // Return tuple for New car, no need for "return" keyword or semicolon
    (Age::New, miles)
}

// Build "Car" using input arguments
fn car_factory(order: i32, miles: u32) -> Car {
    let colors = ["Blue", "Green", "Red", "Silver"];

    // Prevent panic: Check color index for colors array, reset as needed
    // Valid color = 1, 2, 3, or 4
    // If color > 4, reduce color to valid index
    let mut color = order as usize;
    if color > 4 {        
        // color = 5 --> index 1, 6 --> 2, 7 --> 3, 8 --> 4
        color = color - 4;
    }

    // Add variety to orders for motor type and roof type
    let mut motor = Transmission::Manual;
    let mut roof = true;
    if order % 3 == 0 {          // 3, 6, 9
        motor = Transmission::Automatic;
    } else if order % 2 == 0 {   // 2, 4, 8, 10
        motor = Transmission::SemiAuto;
        roof = false;
    }                            // 1, 5, 7, 11

    // Return requested "Car"
    Car {
        color: String::from(colors[(color-1) as usize]),
        motor: motor,
        roof: roof,
        age: car_quality(miles)
    }
}

fn main() {
    // Initialize counter variable
    let mut order = 1;
    // Declare a car as mutable "Car" struct
    let mut car: Car;

    // Order 6 cars, increment "order" for each request
    // Car order #1: Used, Hard top
    car = car_factory(order, 1000);
    println!("{}: {:?}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);

    // Car order #2: Used, Convertible
    order = order + 1;
    car = car_factory(order, 2000);
    println!("{}: {:?}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);    

    // Car order #3: New, Hard top
    order = order + 1;
    car = car_factory(order, 0);
    println!("{}: {:?}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);

    // Car order #4: New, Convertible
    order = order + 1;
    car = car_factory(order, 0);
    println!("{}: {:?}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);

    // Car order #5: Used, Hard top
    order = order + 1;
    car = car_factory(order, 3000);
    println!("{}: {:?}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);

    // Car order #6: Used, Hard top
    order = order + 1;
    car = car_factory(order, 4000);
    println!("{}: {:?}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
}

2. プログラムをビルドします。次のセクションに進む前に、コードがコンパイルされて実行されることを確認してください。

次の出力が表示されます。

1: Used, Hard top = true, Manual, Blue, 1000 miles
2: Used, Hard top = false, SemiAuto, Green, 2000 miles
3: New, Hard top = true, Automatic, Red, 0 miles
4: New, Hard top = false, SemiAuto, Silver, 0 miles
5: Used, Hard top = true, Manual, Blue, 3000 miles
6: Used, Hard top = true, Automatic, Green, 4000 miles

注文の詳細を追跡するためのハッシュマップを追加する

現在のプログラムは、各車の注文を処理し、各注文が完了した後に要約を印刷します。car_factory関数を呼び出すたびにCar、注文の詳細を含む構造体が返され、注文が実行されます。結果はcar変数に格納されます。

お気づきかもしれませんが、このプログラムにはいくつかの重要な機能がありません。すべての注文を追跡しているわけではありません。car変数は、現在の注文の詳細のみを保持しています。関数carの結果で変数が更新されるたびcar_factoryに、前の順序の詳細が上書きされます。

ファイリングシステムのようにすべての注文を追跡するために、プログラムを更新する必要があります。この目的のために、<K、V>ペアでハッシュマップを定義します。ハッシュマップキーは、車の注文番号に対応します。ハッシュマップ値は、Car構造体で定義されているそれぞれの注文の詳細になります。

  1. ハッシュマップを定義するには、main関数の先頭、最初の中括弧の直後に次のコードを追加します{
// Initialize a hash map for the car orders
    // - Key: Car order number, i32
    // - Value: Car order details, Car struct
    use std::collections::HashMap;
    let mut orders: HashMap<i32, Car> = HashMap;

2. ordersハッシュマップを作成するステートメントの構文の問題を修正します。

ヒント

ハッシュマップを最初から作成しているので、おそらくこのnew()メソッドを使用することをお勧めします。

3. プログラムをビルドします。次のセクションに進む前に、コードがコンパイルされていることを確認してください。コンパイラからの警告メッセージは無視してかまいません。

ハッシュマップに値を追加する

次のステップは、履行された各自動車注文をハッシュマップに追加することです。

このmain関数では、car_factory車の注文ごとに関数を呼び出します。注文が履行された後、println!マクロを呼び出して、car変数に格納されている注文の詳細を表示します。

// Car order #1: Used, Hard top
    car = car_factory(order, 1000);
    println!("{}: {}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);

    ...

    // Car order #6: Used, Hard top
    order = order + 1;
    car = car_factory(order, 4000);
    println!("{}: {}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);

新しいハッシュマップで機能するように、これらのコードステートメントを修正します。

  • car_factory関数の呼び出しは保持します。返された各Car構造体は、ハッシュマップの<K、V>ペアの一部として格納されます。
  • println!マクロの呼び出しを更新して、ハッシュマップに保存されている注文の詳細を表示します。
  1. main関数で、関数の呼び出しcar_factoryとそれに伴うprintln!マクロの呼び出しを見つけます。
// Car order #1: Used, Hard top
    car = car_factory(order, 1000);
    println!("{}: {}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);

    ...

    // Car order #6: Used, Hard top
    order = order + 1;
    car = car_factory(order, 4000);
    println!("{}: {}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);

2. すべての自動車注文のステートメントの完全なセットを次の改訂されたコードに置き換えます。

// Car order #1: Used, Hard top
    car = car_factory(order, 1000);
    orders(order, car);
    println!("Car order {}: {:?}", order, orders.get(&order));

    // Car order #2: Used, Convertible
    order = order + 1;
    car = car_factory(order, 2000);
    orders(order, car);
    println!("Car order {}: {:?}", order, orders.get(&order));

    // Car order #3: New, Hard top
    order = order + 1;
    car = car_factory(order, 0);
    orders(order, car);
    println!("Car order {}: {:?}", order, orders.get(&order));

    // Car order #4: New, Convertible
    order = order + 1;
    car = car_factory(order, 0);
    orders(order, car);
    println!("Car order {}: {:?}", order, orders.get(&order));

    // Car order #5: Used, Hard top
    order = order + 1;
    car = car_factory(order, 3000);
    orders(order, car);
    println!("Car order {}: {:?}", order, orders.get(&order));

    // Car order #6: Used, Hard top
    order = order + 1;
    car = car_factory(order, 4000);
    orders(order, car);
    println!("Car order {}: {:?}", order, orders.get(&order));

3. 今すぐプログラムをビルドしようとすると、コンパイルエラーが表示されます。<K、V>ペアをordersハッシュマップに追加するステートメントに構文上の問題があります。問題がありますか?先に進んで、ハッシュマップに順序を追加する各ステートメントの問題を修正してください。

ヒント

ordersハッシュマップに直接値を割り当てることはできません。挿入を行うにはメソッドを使用する必要があります。

プログラムを実行する

プログラムが正常にビルドされると、次の出力が表示されます。

Car order 1: Some(Car { color: "Blue", motor: Manual, roof: true, age: ("Used", 1000) })
Car order 2: Some(Car { color: "Green", motor: SemiAuto, roof: false, age: ("Used", 2000) })
Car order 3: Some(Car { color: "Red", motor: Automatic, roof: true, age: ("New", 0) })
Car order 4: Some(Car { color: "Silver", motor: SemiAuto, roof: false, age: ("New", 0) })
Car order 5: Some(Car { color: "Blue", motor: Manual, roof: true, age: ("Used", 3000) })
Car order 6: Some(Car { color: "Green", motor: Automatic, roof: true, age: ("Used", 4000) })

改訂されたコードの出力が異なることに注意してください。println!マクロディスプレイの内容Car各値を示すことによって、構造体と対応するフィールド名。

次の演習では、ループ式を使用してコードの冗長性を減らします。

for、while、およびloop式を使用します


多くの場合、プログラムには、その場で繰り返す必要のあるコードのブロックがあります。ループ式を使用して、繰り返しの実行方法をプログラムに指示できます。電話帳のすべてのエントリを印刷するには、ループ式を使用して、最初のエントリから最後のエントリまで印刷する方法をプログラムに指示できます。

Rustは、プログラムにコードのブロックを繰り返させるための3つのループ式を提供します。

  • loop:手動停止が発生しない限り、繰り返します。
  • while:条件が真のままで繰り返します。
  • for:コレクション内のすべての値に対して繰り返します。

この単元では、これらの各ループ式を見ていきます。

ループし続けるだけ

loop式は、無限ループを作成します。このキーワードを使用すると、式の本文でアクションを継続的に繰り返すことができます。ループを停止させるための直接アクションを実行するまで、アクションが繰り返されます。

次の例では、「We loopforever!」というテキストを出力します。そしてそれはそれ自体で止まりません。println!アクションは繰り返し続けます。

loop {
    println!("We loop forever!");
}

loop式を使用する場合、ループを停止する唯一の方法は、プログラマーとして直接介入する場合です。特定のコードを追加してループを停止したり、Ctrl + Cなどのキーボード命令を入力してプログラムの実行を停止したりできます。

loop式を停止する最も一般的な方法は、breakキーワードを使用してブレークポイントを設定することです。

loop {
    // Keep printing, printing, printing...
    println!("We loop forever!");
    // On the other hand, maybe we should stop!
    break;                            
}

プログラムがbreakキーワードを検出すると、loop式の本体でアクションの実行を停止し、次のコードステートメントに進みます。

breakキーワードは、特別な機能を明らかにするloop表現を。breakキーワードを使用すると、式本体でのアクションの繰り返しを停止することも、ブレークポイントで値を返すこともできます。

次の例はbreakloop式でキーワードを使用して値も返す方法を示しています。

let mut counter = 1;
// stop_loop is set when loop stops
let stop_loop = loop {
    counter *= 2;
    if counter > 100 {
        // Stop loop, return counter value
        break counter;
    }
};
// Loop should break when counter = 128
println!("Break the loop at counter = {}.", stop_loop);

出力は次のとおりです。

Break the loop at counter = 128.

私たちのloop表現の本体は、これらの連続したアクションを実行します。

  1. stop_loop変数を宣言します。
  2. 変数値をloop式の結果にバインドするようにプログラムに指示します。
  3. ループを開始します。loop式の本体でアクションを実行します:
    ループ本体
    1. counter値を現在の値の2倍にインクリメントします。
    2. counter値を確認してください。
    3. もしcounter値が100以上です。

ループから抜け出し、counter値を返します。

4. もしcounter値が100以上ではありません。

ループ本体でアクションを繰り返します。

5. stop_loop値を式のcounter結果である値に設定しますloop

loop式本体は、複数のブレークポイントを持つことができます。式に複数のブレークポイントがある場合、すべてのブレークポイントは同じタイプの値を返す必要があります。すべての値は、整数型、文字列型、ブール型などである必要があります。ブレークポイントが明示的に値を返さない場合、プログラムは式の結果を空のタプルとして解釈します()

しばらくループする

whileループは、条件式を使用しています。条件式が真である限り、ループが繰り返されます。このキーワードを使用すると、条件式がfalseになるまで、式本体のアクションを実行できます。

whileループは、ブール条件式を評価することから始まります。条件式がと評価されるtrueと、本体のアクションが実行されます。アクションが完了すると、制御は条件式に戻ります。条件式がと評価されるfalseと、while式は停止します。

次の例では、「しばらくループします...」というテキストを出力します。ループを繰り返すたびに、「カウントが5未満である」という条件がテストされます。条件が真のままである間、式本体のアクションが実行されます。条件が真でなくなった後、whileループは停止し、プログラムは次のコードステートメントに進みます。

while counter < 5 {
    println!("We loop a while...");
    counter = counter + 1;
}

これらの値のループ

forループは、項目のコレクションを処理するためにイテレータを使用しています。ループは、コレクション内の各アイテムの式本体のアクションを繰り返します。このタイプのループの繰り返しは、反復と呼ばれます。すべての反復が完了すると、ループは停止します。

Rustでは、配列、ベクトル、ハッシュマップなど、任意のコレクションタイプを反復処理できます。Rustはイテレータを使用して、コレクション内の各アイテムを最初から最後まで移動します

forループはイテレータとして一時変数を使用しています。変数はループ式の開始時に暗黙的に宣言され、現在の値は反復ごとに設定されます。

次のコードでは、コレクションはbig_birds配列であり、イテレーターの名前はbirdです。

let big_birds = ["ostrich", "peacock", "stork"];
for bird in big_birds

iter()メソッドを使用して、コレクション内のアイテムにアクセスします。for式は結果にイテレータの現在の値をバインドするiter()方法。式本体では、イテレータ値を操作できます。

let big_birds = ["ostrich", "peacock", "stork"];
for bird in big_birds.iter() {
    println!("The {} is a big bird.", bird);
}

出力は次のとおりです。

The ostrich is a big bird.
The peacock is a big bird.
The stork is a big bird.

イテレータを作成するもう1つの簡単な方法は、範囲表記を使用することですa..b。イテレータはa値から始まりb、1ステップずつ続きますが、値を使用しませんb

for number in 0..5 {
    println!("{}", number * 2);
}

このコードは、0、1、2、3、および4の数値をnumber繰り返し処理します。ループの繰り返しごとに、値を変数にバインドします。

出力は次のとおりです。

0
2
4
6
8

このコードを実行して、このRustPlaygroundでループを探索できます。

演習:ループを使用してデータを反復処理する


この演習では、自動車工場のプログラムを変更して、ループを使用して自動車の注文を反復処理します。

main関数を更新して、注文の完全なセットを処理するためのループ式を追加します。ループ構造は、コードの冗長性を減らすのに役立ちます。コードを簡素化することで、注文量を簡単に増やすことができます。

このcar_factory関数では、範囲外の値での実行時のパニックを回避するために、別のループを追加します。

課題は、サンプルコードを完成させて、コンパイルして実行することです。

この演習のサンプルコードで作業するには、次の2つのオプションがあります。

  • コードをコピーして、ローカル開発環境で編集します。
  • 準備されたRustPlaygroundでコードを開きます。

ノート

サンプルコードで、todo!マクロを探します。このマクロは、完了するか更新する必要があるコードを示します。

プログラムをロードする

前回の演習でプログラムコードを閉じた場合は、この準備されたRustPlaygroundでコードを再度開くことができます。

必ずプログラムを再構築し、コンパイラエラーなしで実行されることを確認してください。

ループ式でアクションを繰り返す

より多くの注文をサポートするには、プログラムを更新する必要があります。現在のコード構造では、冗長ステートメントを使用して6つの注文をサポートしています。冗長性は扱いにくく、維持するのが困難です。

ループ式を使用してアクションを繰り返し、各注文を作成することで、構造を単純化できます。簡略化されたコードを使用すると、多数の注文をすばやく作成できます。

  1. ではmain機能、削除次の文を。このコードブロックは、order変数を定義および設定し、自動車の注文のcar_factory関数とprintln!マクロを呼び出し、各注文をordersハッシュマップに挿入します。
// Order 6 cars
    // - Increment "order" after each request
    // - Add each order <K, V> pair to "orders" hash map
    // - Call println! to show order details from the hash map

    // Initialize order variable
    let mut order = 1;

    // Car order #1: Used, Hard top
    car = car_factory(order, 1000);
    orders.insert(order, car);
    println!("Car order {}: {:?}", order, orders.get(&order));

    ...

    // Car order #6: Used, Hard top
    order = order + 1;
    car = car_factory(order, 4000);
    orders.insert(order, car);
    println!("Car order {}: {:?}", order, orders.get(&order));

2. 削除されたステートメントを次のコードブロックに置き換えます。

// Start with zero miles
    let mut miles = 0;

    todo!("Add a loop expression to fulfill orders for 6 cars, initialize `order` variable to 1") {

        // Call car_factory to fulfill order
        // Add order <K, V> pair to "orders" hash map
        // Call println! to show order details from the hash map        
        car = car_factory(order, miles);
        orders.insert(order, car);
        println!("Car order {}: {:?}", order, orders.get(&order));

        // Reset miles for order variety
        if miles == 2100 {
            miles = 0;
        } else {
            miles = miles + 700;
        }
    }

3. アクションを繰り返すループ式を追加して、6台の車の注文を作成します。order1に初期化された変数が必要です。

4. プログラムをビルドします。コードがエラーなしでコンパイルされることを確認してください。

次の例のような出力が表示されます。

Car order 1: Some(Car { color: "Blue", motor: Manual, roof: true, age: ("New", 0) })
Car order 2: Some(Car { color: "Green", motor: SemiAuto, roof: false, age: ("Used", 700) })
Car order 3: Some(Car { color: "Red", motor: Automatic, roof: true, age: ("Used", 1400) })
Car order 4: Some(Car { color: "Silver", motor: SemiAuto, roof: false, age: ("Used", 2100) })
Car order 5: Some(Car { color: "Blue", motor: Manual, roof: true, age: ("New", 0) })
Car order 6: Some(Car { color: "Green", motor: Automatic, roof: true, age: ("Used", 700) })

車の注文を11に増やす

 プログラムは現在、ループを使用して6台の車の注文を処理しています。6台以上注文するとどうなりますか?

  1. main関数のループ式を更新して、11台の車を注文します。
    todo!("Update the loop expression to create 11 cars");

2. プログラムを再構築します。実行時に、プログラムはパニックになります!

Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 1.26s
    Running `target/debug/playground`
thread 'main' panicked at 'index out of bounds: the len is 4 but the index is 4', src/main.rs:34:29

この問題を解決する方法を見てみましょう。

ループ式で実行時のパニックを防ぐ

このcar_factory関数では、if / else式を使用colorして、colors配列のインデックスの値を確認します。

// Prevent panic: Check color index for colors array, reset as needed
    // Valid color = 1, 2, 3, or 4
    // If color > 4, reduce color to valid index
    let mut color = order as usize;
    if color > 4 {        
        // color = 5 --> index 1, 6 --> 2, 7 --> 3, 8 --> 4
        color = color - 4;
    }

colors配列には4つの要素を持ち、かつ有効なcolor場合は、インデックスの範囲は0〜3の条件式をチェックしているcolor私たちはをチェックしません(インデックスが4よりも大きい場合color、その後の関数で4に等しいインデックスへのときに我々のインデックスを車の色を割り当てる配列では、インデックス値から1を減算しますcolor - 1color値4はcolors[3]、配列と同様に処理されます。)

現在のif / else式は、8台以下の車を注文するときの実行時のパニックを防ぐためにうまく機能します。しかし、11台の車を注文すると、プログラムは9番目の注文でパニックになります。より堅牢になるように式を調整する必要があります。この改善を行うために、別のループ式を使用します。

  1. ではcar_factory機能、ループ式であれば/他の条件文を交換してください。colorインデックス値が4より大きい場合に実行時のパニックを防ぐために、次の擬似コードステートメントを修正してください。
// Prevent panic: Check color index, reset as needed
    // If color = 1, 2, 3, or 4 - no change needed
    // If color > 4, reduce to color to a valid index
    let mut color = order as usize;
    todo!("Replace `if/else` condition with a loop to prevent run-time panic for color > 4");

ヒント

この場合、if / else条件からループ式への変更は実際には非常に簡単です。

2. プログラムをビルドします。コードがエラーなしでコンパイルされることを確認してください。

次の出力が表示されます。

Car order 1: Some(Car { color: "Blue", motor: Manual, roof: true, age: ("New", 0) })
Car order 2: Some(Car { color: "Green", motor: SemiAuto, roof: false, age: ("Used", 700) })
Car order 3: Some(Car { color: "Red", motor: Automatic, roof: true, age: ("Used", 1400) })
Car order 4: Some(Car { color: "Silver", motor: SemiAuto, roof: false, age: ("Used", 2100) })
Car order 5: Some(Car { color: "Blue", motor: Manual, roof: true, age: ("New", 0) })
Car order 6: Some(Car { color: "Green", motor: Automatic, roof: true, age: ("Used", 700) })
Car order 7: Some(Car { color: "Red", motor: Manual, roof: true, age: ("Used", 1400) })
Car order 8: Some(Car { color: "Silver", motor: SemiAuto, roof: false, age: ("Used", 2100) })
Car order 9: Some(Car { color: "Blue", motor: Automatic, roof: true, age: ("New", 0) })
Car order 10: Some(Car { color: "Green", motor: SemiAuto, roof: false, age: ("Used", 700) })
Car order 11: Some(Car { color: "Red", motor: Manual, roof: true, age: ("Used", 1400) })

概要

このモジュールでは、Rustで使用できるさまざまなループ式を調べ、ハッシュマップの操作方法を発見しました。データは、キーと値のペアとしてハッシュマップに保存されます。ハッシュマップは拡張可能です。

loop手動でプロセスを停止するまでの式は、アクションを繰り返します。while式をループして、条件が真である限りアクションを繰り返すことができます。このfor式は、データ収集を反復処理するために使用されます。

この演習では、自動車プログラムを拡張して、繰り返されるアクションをループし、すべての注文を処理しました。注文を追跡するためにハッシュマップを実装しました。

このラーニングパスの次のモジュールでは、Rustコードでエラーと障害がどのように処理されるかについて詳しく説明します。

 リンク: https://docs.microsoft.com/en-us/learn/modules/rust-loop-expressions/

#rust #Beginners 

Elian  Harber

Elian Harber

1641430440

Bokeh Plotting Backend for Pandas and GeoPandas

Pandas-Bokeh provides a Bokeh plotting backend for Pandas, GeoPandas and Pyspark DataFrames, similar to the already existing Visualization feature of Pandas. Importing the library adds a complementary plotting method plot_bokeh() on DataFrames and Series.

With Pandas-Bokeh, creating stunning, interactive, HTML-based visualization is as easy as calling:

df.plot_bokeh()

Pandas-Bokeh also provides native support as a Pandas Plotting backend for Pandas >= 0.25. When Pandas-Bokeh is installed, switchting the default Pandas plotting backend to Bokeh can be done via:

pd.set_option('plotting.backend', 'pandas_bokeh')

More details about the new Pandas backend can be found below.


Interactive Documentation

Please visit:

https://patrikhlobil.github.io/Pandas-Bokeh/

for an interactive version of the documentation below, where you can play with the dynamic Bokeh plots.


For more information have a look at the Examples below or at notebooks on the Github Repository of this project.

Startimage


 

Installation

You can install Pandas-Bokeh from PyPI via pip

pip install pandas-bokeh

or conda:

conda install -c patrikhlobil pandas-bokeh

With the current release 0.5.5, Pandas-Bokeh officially supports Python 3.6 and newer. For more details, see Release Notes.

How To Use

Classical Use

The Pandas-Bokeh library should be imported after Pandas, GeoPandas and/or Pyspark. After the import, one should define the plotting output, which can be:

pandas_bokeh.output_notebook(): Embeds the Plots in the cell outputs of the notebook. Ideal when working in Jupyter Notebooks.

pandas_bokeh.output_file(filename): Exports the plot to the provided filename as an HTML.

For more details about the plotting outputs, see the reference here or the Bokeh documentation.

Notebook output (see also bokeh.io.output_notebook)

import pandas as pd import pandas_bokeh pandas_bokeh.output_notebook()

File output to "Interactive Plot.html" (see also bokeh.io.output_file)

import pandas as pd import pandas_bokeh pandas_bokeh.output_file("Interactive Plot.html")

Pandas-Bokeh as native Pandas plotting backend

For pandas >= 0.25, a plotting backend switch is natively supported. It can be achievied by calling:

import pandas as pd
pd.set_option('plotting.backend', 'pandas_bokeh')

Now, the plotting API is accessible for a Pandas DataFrame via:

df.plot(...)

All additional functionalities of Pandas-Bokeh are then accessible at pd.plotting. So, setting the output to notebook is:

pd.plotting.output_notebook()

or calling the grid layout functionality:

pd.plotting.plot_grid(...)

Note: Backwards compatibility is kept since there will still be the df.plot_bokeh(...) methods for a DataFrame.


Plot types

Supported plottypes are at the moment:

Also, check out the complementary chapter Outputs, Formatting & Layouts about:


Lineplot

Basic Lineplot

This simple lineplot in Pandas-Bokeh already contains various interactive elements:

  • a pannable and zoomable (zoom in plotarea and zoom on axis) plot
  • by clicking on the legend elements, one can hide and show the individual lines
  • a Hovertool for the plotted lines

Consider the following simple example:

import numpy as np

np.random.seed(42)
df = pd.DataFrame({"Google": np.random.randn(1000)+0.2, 
                   "Apple": np.random.randn(1000)+0.17}, 
                   index=pd.date_range('1/1/2000', periods=1000))
df = df.cumsum()
df = df + 50
df.plot_bokeh(kind="line")       #equivalent to df.plot_bokeh.line()

ApplevsGoogle_1

Note, that similar to the regular pandas.DataFrame.plot method, there are also additional accessors to directly access the different plotting types like:

  • df.plot_bokeh(kind="line", ...)df.plot_bokeh.line(...)
  • df.plot_bokeh(kind="bar", ...)df.plot_bokeh.bar(...)
  • df.plot_bokeh(kind="hist", ...)df.plot_bokeh.hist(...)
  • ...

Advanced Lineplot

There are various optional parameters to tune the plots, for example:

kind: Which kind of plot should be produced. Currently supported are: "line", "point", "scatter", "bar" and "histogram". In the near future many more will be implemented as horizontal barplot, boxplots, pie-charts, etc.

x: Name of the column to use for the horizontal x-axis. If the x parameter is not specified, the index is used for the x-values of the plot. Alternative, also an array of values can be passed that has the same number of elements as the DataFrame.

y: Name of column or list of names of columns to use for the vertical y-axis.

figsize: Choose width & height of the plot

title: Sets title of the plot

xlim/ylim: Set visibler range of plot for x- and y-axis (also works for datetime x-axis)

xlabel/ylabel: Set x- and y-labels

logx/logy: Set log-scale on x-/y-axis

xticks/yticks: Explicitly set the ticks on the axes

color: Defines a single color for a plot.

colormap: Can be used to specify multiple colors to plot. Can be either a list of colors or the name of a Bokeh color palette

hovertool: If True a Hovertool is active, else if False no Hovertool is drawn.

hovertool_string: If specified, this string will be used for the hovertool (@{column} will be replaced by the value of the column for the element the mouse hovers over, see also Bokeh documentation and here)

toolbar_location: Specify the position of the toolbar location (None, "above", "below", "left" or "right"). Default: "right"

zooming: Enables/Disables zooming. Default: True

panning: Enables/Disables panning. Default: True

fontsize_label/fontsize_ticks/fontsize_title/fontsize_legend: Set fontsize of labels, ticks, title or legend (int or string of form "15pt")

rangetool Enables a range tool scroller. Default False

kwargs**: Optional keyword arguments of bokeh.plotting.figure.line

Try them out to get a feeling for the effects. Let us consider now:

df.plot_bokeh.line(
    figsize=(800, 450),
    y="Apple",
    title="Apple vs Google",
    xlabel="Date",
    ylabel="Stock price [$]",
    yticks=[0, 100, 200, 300, 400],
    ylim=(0, 400),
    toolbar_location=None,
    colormap=["red", "blue"],
    hovertool_string=r"""<img
                        src='https://upload.wikimedia.org/wikipedia/commons/thumb/f/fa/Apple_logo_black.svg/170px-Apple_logo_black.svg.png' 
                        height="42" alt="@imgs" width="42"
                        style="float: left; margin: 0px 15px 15px 0px;"
                        border="2"></img> Apple 
                        
                        <h4> Stock Price: </h4> @{Apple}""",
    panning=False,
    zooming=False)

ApplevsGoogle_2

Lineplot with data points

For lineplots, as for many other plot-kinds, there are some special keyword arguments that only work for this plotting type. For lineplots, these are:

plot_data_points: Plot also the data points on the lines

plot_data_points_size: Determines the size of the data points

marker: Defines the point type (Default: "circle"). Possible values are: 'circle', 'square', 'triangle', 'asterisk', 'circle_x', 'square_x', 'inverted_triangle', 'x', 'circle_cross', 'square_cross', 'diamond', 'cross'

kwargs**: Optional keyword arguments of bokeh.plotting.figure.line```

Let us use this information to have another version of the same plot:

df.plot_bokeh.line(
    figsize=(800, 450),
    title="Apple vs Google",
    xlabel="Date",
    ylabel="Stock price [$]",
    yticks=[0, 100, 200, 300, 400],
    ylim=(100, 200),
    xlim=("2001-01-01", "2001-02-01"),
    colormap=["red", "blue"],
    plot_data_points=True,
    plot_data_points_size=10,
    marker="asterisk")

ApplevsGoogle_3

Lineplot with rangetool

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, columns=list('ABCD'))
df = df.cumsum()

df.plot_bokeh(rangetool=True)

rangetool

Pointplot

If you just wish to draw the date points for curves, the pointplot option is the right choice. It also accepts the kwargs of bokeh.plotting.figure.scatter like marker or size:

import numpy as np

x = np.arange(-3, 3, 0.1)
y2 = x**2
y3 = x**3
df = pd.DataFrame({"x": x, "Parabula": y2, "Cube": y3})
df.plot_bokeh.point(
    x="x",
    xticks=range(-3, 4),
    size=5,
    colormap=["#009933", "#ff3399"],
    title="Pointplot (Parabula vs. Cube)",
    marker="x")

Pointplot

Stepplot

With a similar API as the line- & pointplots, one can generate a stepplot. Additional keyword arguments for this plot type are passes to bokeh.plotting.figure.step, e.g. mode (before, after, center), see the following example

import numpy as np

x = np.arange(-3, 3, 1)
y2 = x**2
y3 = x**3
df = pd.DataFrame({"x": x, "Parabula": y2, "Cube": y3})
df.plot_bokeh.step(
    x="x",
    xticks=range(-1, 1),
    colormap=["#009933", "#ff3399"],
    title="Pointplot (Parabula vs. Cube)",
    figsize=(800,300),
    fontsize_title=30,
    fontsize_label=25,
    fontsize_ticks=15,
    fontsize_legend=5,
    )

df.plot_bokeh.step(
    x="x",
    xticks=range(-1, 1),
    colormap=["#009933", "#ff3399"],
    title="Pointplot (Parabula vs. Cube)",
    mode="after",
    figsize=(800,300)
    )

Stepplot

Note that the step-plot API of Bokeh does so far not support a hovertool functionality.

Scatterplot

A basic scatterplot can be created using the kind="scatter" option. For scatterplots, the x and y parameters have to be specified and the following optional keyword argument is allowed:

category: Determines the category column to use for coloring the scatter points

kwargs**: Optional keyword arguments of bokeh.plotting.figure.scatter

Note, that the pandas.DataFrame.plot_bokeh() method return per default a Bokeh figure, which can be embedded in Dashboard layouts with other figures and Bokeh objects (for more details about (sub)plot layouts and embedding the resulting Bokeh plots as HTML click here).

In the example below, we use the building grid layout support of Pandas-Bokeh to display both the DataFrame (using a Bokeh DataTable) and the resulting scatterplot:

# Load Iris Dataset:
df = pd.read_csv(
    r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/iris/iris.csv"
)
df = df.sample(frac=1)

# Create Bokeh-Table with DataFrame:
from bokeh.models.widgets import DataTable, TableColumn
from bokeh.models import ColumnDataSource

data_table = DataTable(
    columns=[TableColumn(field=Ci, title=Ci) for Ci in df.columns],
    source=ColumnDataSource(df),
    height=300,
)

# Create Scatterplot:
p_scatter = df.plot_bokeh.scatter(
    x="petal length (cm)",
    y="sepal width (cm)",
    category="species",
    title="Iris DataSet Visualization",
    show_figure=False,
)

# Combine Table and Scatterplot via grid layout:
pandas_bokeh.plot_grid([[data_table, p_scatter]], plot_width=400, plot_height=350)

 

Scatterplot

A possible optional keyword parameters that can be passed to bokeh.plotting.figure.scatter is size. Below, we use the sepal length of the Iris data as reference for the size:

#Change one value to clearly see the effect of the size keyword
df.loc[13, "sepal length (cm)"] = 15

#Make scatterplot:
p_scatter = df.plot_bokeh.scatter(
    x="petal length (cm)",
    y="sepal width (cm)",
    category="species",
    title="Iris DataSet Visualization with Size Keyword",
    size="sepal length (cm)")

Scatterplot2

In this example you can see, that the additional dimension sepal length cannot be used to clearly differentiate between the virginica and versicolor species.

Barplot

The barplot API has no special keyword arguments, but accepts optional kwargs of bokeh.plotting.figure.vbar like alpha. It uses per default the index for the bar categories (however, also columns can be used as x-axis category using the x argument).

data = {
    'fruits':
    ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries'],
    '2015': [2, 1, 4, 3, 2, 4],
    '2016': [5, 3, 3, 2, 4, 6],
    '2017': [3, 2, 4, 4, 5, 3]
}
df = pd.DataFrame(data).set_index("fruits")

p_bar = df.plot_bokeh.bar(
    ylabel="Price per Unit [€]", 
    title="Fruit prices per Year", 
    alpha=0.6)

Barplot

Using the stacked keyword argument you also maked stacked barplots:

p_stacked_bar = df.plot_bokeh.bar(
    ylabel="Price per Unit [€]",
    title="Fruit prices per Year",
    stacked=True,
    alpha=0.6)

Barplot2

Also horizontal versions of the above barplot are supported with the keyword kind="barh" or the accessor plot_bokeh.barh. You can still specify a column of the DataFrame as the bar category via the x argument if you do not wish to use the index.

#Reset index, such that "fruits" is now a column of the DataFrame:
df.reset_index(inplace=True)

#Create horizontal bar (via kind keyword):
p_hbar = df.plot_bokeh(
    kind="barh",
    x="fruits",
    xlabel="Price per Unit [€]",
    title="Fruit prices per Year",
    alpha=0.6,
    legend = "bottom_right",
    show_figure=False)

#Create stacked horizontal bar (via barh accessor):
p_stacked_hbar = df.plot_bokeh.barh(
    x="fruits",
    stacked=True,
    xlabel="Price per Unit [€]",
    title="Fruit prices per Year",
    alpha=0.6,
    legend = "bottom_right",
    show_figure=False)

#Plot all barplot examples in a grid:
pandas_bokeh.plot_grid([[p_bar, p_stacked_bar],
                        [p_hbar, p_stacked_hbar]], 
                       plot_width=450)

Barplot3

Histogram

For drawing histograms (kind="hist"), Pandas-Bokeh has a lot of customization features. Optional keyword arguments for histogram plots are:

bins: Determines bins to use for the histogram. If bins is an int, it defines the number of equal-width bins in the given range (10, by default). If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths. If bins is a string, it defines the method used to calculate the optimal bin width, as defined by histogram_bin_edges.

histogram_type: Either "sidebyside", "topontop" or "stacked". Default: "topontop"

stacked: Boolean that overrides the histogram_type as "stacked" if given. Default: False

kwargs**: Optional keyword arguments of bokeh.plotting.figure.quad

Below examples of the different histogram types:

import numpy as np

df_hist = pd.DataFrame({
    'a': np.random.randn(1000) + 1,
    'b': np.random.randn(1000),
    'c': np.random.randn(1000) - 1
    },
    columns=['a', 'b', 'c'])

#Top-on-Top Histogram (Default):
df_hist.plot_bokeh.hist(
    bins=np.linspace(-5, 5, 41),
    vertical_xlabel=True,
    hovertool=False,
    title="Normal distributions (Top-on-Top)",
    line_color="black")

#Side-by-Side Histogram (multiple bars share bin side-by-side) also accessible via
#kind="hist":
df_hist.plot_bokeh(
    kind="hist",
    bins=np.linspace(-5, 5, 41),
    histogram_type="sidebyside",
    vertical_xlabel=True,
    hovertool=False,
    title="Normal distributions (Side-by-Side)",
    line_color="black")

#Stacked histogram:
df_hist.plot_bokeh.hist(
    bins=np.linspace(-5, 5, 41),
    histogram_type="stacked",
    vertical_xlabel=True,
    hovertool=False,
    title="Normal distributions (Stacked)",
    line_color="black")

Histogram

Further, advanced keyword arguments for histograms are:

  • weights: A column of the DataFrame that is used as weight for the histogramm aggregation (see also numpy.histogram)
  • normed: If True, histogram values are normed to 1 (sum of histogram values=1). It is also possible to pass an integer, e.g. normed=100 would result in a histogram with percentage y-axis (sum of histogram values=100). Default: False
  • cumulative: If True, a cumulative histogram is shown. Default: False
  • show_average: If True, the average of the histogram is also shown. Default: False

Their usage is shown in these examples:

p_hist = df_hist.plot_bokeh.hist(
    y=["a", "b"],
    bins=np.arange(-4, 6.5, 0.5),
    normed=100,
    vertical_xlabel=True,
    ylabel="Share[%]",
    title="Normal distributions (normed)",
    show_average=True,
    xlim=(-4, 6),
    ylim=(0, 30),
    show_figure=False)

p_hist_cum = df_hist.plot_bokeh.hist(
    y=["a", "b"],
    bins=np.arange(-4, 6.5, 0.5),
    normed=100,
    cumulative=True,
    vertical_xlabel=True,
    ylabel="Share[%]",
    title="Normal distributions (normed & cumulative)",
    show_figure=False)

pandas_bokeh.plot_grid([[p_hist, p_hist_cum]], plot_width=450, plot_height=300)

Histogram2


 

Areaplot

Areaplot (kind="area") can be either drawn on top of each other or stacked. The important parameters are:

stacked: If True, the areaplots are stacked. If False, plots are drawn on top of each other. Default: False

kwargs**: Optional keyword arguments of bokeh.plotting.figure.patch


Let us consider the energy consumption split by source that can be downloaded as DataFrame via:

df_energy = pd.read_csv(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/energy/energy.csv", 
parse_dates=["Year"])
df_energy.head()
YearOilGasCoalNuclear EnergyHydroelectricityOther Renewable
1970-01-012291.5826.71467.317.7265.85.8
1971-01-012427.7884.81459.224.9276.46.3
1972-01-012613.9933.71475.734.1288.96.8
1973-01-012818.1978.01519.645.9292.57.3
1974-01-012777.31001.91520.959.6321.17.7


Creating the Areaplot can be achieved via:

df_energy.plot_bokeh.area(
    x="Year",
    stacked=True,
    legend="top_left",
    colormap=["brown", "orange", "black", "grey", "blue", "green"],
    title="Worldwide energy consumption split by energy source",
    ylabel="Million tonnes oil equivalent",
    ylim=(0, 16000))

areaplot

Note that the energy consumption of fossile energy is still increasing and renewable energy sources are still small in comparison 😢!!! However, when we norm the plot using the normed keyword, there is a clear trend towards renewable energies in the last decade:

df_energy.plot_bokeh.area(
    x="Year",
    stacked=True,
    normed=100,
    legend="bottom_left",
    colormap=["brown", "orange", "black", "grey", "blue", "green"],
    title="Worldwide energy consumption split by energy source",
    ylabel="Million tonnes oil equivalent")

areaplot2

Pieplot

For Pieplots, let us consider a dataset showing the results of all Bundestags elections in Germany since 2002:

df_pie = pd.read_csv(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/Bundestagswahl/Bundestagswahl.csv")
df_pie
Partei20022005200920132017
CDU/CSU38.535.233.841.532.9
SPD38.534.223.025.720.5
FDP7.49.814.64.810.7
Grünen8.68.110.78.48.9
Linke/PDS4.08.711.98.69.2
AfD0.00.00.00.012.6
Sonstige3.04.06.011.05.0

We can create a Pieplot of the last election in 2017 by specifying the "Partei" (german for party) column as the x column and the "2017" column as the y column for values:

df_pie.plot_bokeh.pie(
    x="Partei",
    y="2017",
    colormap=["blue", "red", "yellow", "green", "purple", "orange", "grey"],
    title="Results of German Bundestag Election 2017",
    )

pieplot

When you pass several columns to the y parameter (not providing the y-parameter assumes you plot all columns), multiple nested pieplots will be shown in one plot:

df_pie.plot_bokeh.pie(
    x="Partei",
    colormap=["blue", "red", "yellow", "green", "purple", "orange", "grey"],
    title="Results of German Bundestag Elections [2002-2017]",
    line_color="grey")

pieplot2

Mapplot

The mapplot method of Pandas-Bokeh allows for plotting geographic points stored in a Pandas DataFrame on an interactive map. For more advanced Geoplots for line and polygon shapes have a look at the Geoplots examples for the GeoPandas API of Pandas-Bokeh.

For mapplots, only (latitude, longitude) pairs in geographic projection (WGS84) can be plotted on a map. The basic API has the following 2 base parameters:

  • x: name of the longitude column of the DataFrame
  • y: name of the latitude column of the DataFrame

The other optional keyword arguments are discussed in the section about the GeoPandas API, e.g. category for coloring the points.

Below an example of plotting all cities for more than 1 million inhabitants:

df_mapplot = pd.read_csv(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/populated%20places/populated_places.csv")
df_mapplot.head()
namepop_maxlatitudelongitudesize
Mesa108539433.423915-111.7360841.085394
Sharjah110302725.37138355.4064781.103027
Changwon108149935.219102128.5835621.081499
Sheffield129290053.366677-1.4999971.292900
Abbottabad118364734.14950373.1995011.183647
df_mapplot["size"] = df_mapplot["pop_max"] / 1000000
df_mapplot.plot_bokeh.map(
    x="longitude",
    y="latitude",
    hovertool_string="""<h2> @{name} </h2> 
    
                        <h3> Population: @{pop_max} </h3>""",
    tile_provider="STAMEN_TERRAIN_RETINA",
    size="size", 
    figsize=(900, 600),
    title="World cities with more than 1.000.000 inhabitants")

 

Mapplot

Geoplots

Pandas-Bokeh also allows for interactive plotting of Maps using GeoPandas by providing a geopandas.GeoDataFrame.plot_bokeh() method. It allows to plot the following geodata on a map :

  • Points/MultiPoints
  • Lines/MultiLines
  • Polygons/MultiPolygons

Note: t is not possible to mix up the objects types, i.e. a GeoDataFrame with Points and Lines is for example not allowed.

Les us start with a simple example using the "World Borders Dataset" . Let us first import all neccessary libraries and read the shapefile:

import geopandas as gpd
import pandas as pd
import pandas_bokeh
pandas_bokeh.output_notebook()

#Read in GeoJSON from URL:
df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")
df_states.head()
STATE_NAMEREGIONPOPESTIMATE2010POPESTIMATE2011POPESTIMATE2012POPESTIMATE2013POPESTIMATE2014POPESTIMATE2015POPESTIMATE2016POPESTIMATE2017geometry
Hawaii413638171378323139277214080381417710142632014286831427538(POLYGON ((-160.0738033454681 22.0041773479577...
Washington467413866819155689089969634107046931715281872809347405743(POLYGON ((-122.4020153103835 48.2252163723779...
Montana4990507996866100352210119211019931102831710386561050493POLYGON ((-111.4754253002074 44.70216236909688...
Maine113275681327968132810113279751328903132778713302321335907(POLYGON ((-69.77727626137293 44.0741483685119...
North Dakota2674518684830701380722908738658754859755548755393POLYGON ((-98.73043728833767 45.93827137024809...

Plotting the data on a map is as simple as calling:

df_states.plot_bokeh(simplify_shapes=10000)

US_States_1

We also passed the optional parameter simplify_shapes (~meter) to improve plotting performance (for a reference see shapely.object.simplify). The above geolayer thus has an accuracy of about 10km.

Many keyword arguments like xlabel, ylabel, xlim, ylim, title, colormap, hovertool, zooming, panning, ... for costumizing the plot are also available for the geoplotting API and can be uses as in the examples shown above. There are however also many other options especially for plotting geodata:

  • geometry_column: Specify the column that stores the geometry-information (default: "geometry")
  • hovertool_columns: Specify column names, for which values should be shown in hovertool
  • hovertool_string: If specified, this string will be used for the hovertool (@{column} will be replaced by the value of the column for the element the mouse hovers over, see also Bokeh documentation)
  • colormap_uselog: If set True, the colormapper is using a logscale. Default: False
  • colormap_range: Specify the value range of the colormapper via (min, max) tuple
  • tile_provider: Define build-in tile provider for background maps. Possible values: None, 'CARTODBPOSITRON', 'CARTODBPOSITRON_RETINA', 'STAMEN_TERRAIN', 'STAMEN_TERRAIN_RETINA', 'STAMEN_TONER', 'STAMEN_TONER_BACKGROUND', 'STAMEN_TONER_LABELS'. Default: CARTODBPOSITRON_RETINA
  • tile_provider_url: An arbitraty tile_provider_url of the form '/{Z}/{X}/{Y}*.png' can be passed to be used as background map.
  • tile_attribution: String (also HTML accepted) for showing attribution for tile source in the lower right corner
  • tile_alpha: Sets the alpha value of the background tile between [0, 1]. Default: 1

One of the most common usage of map plots are choropleth maps, where the color of a the objects is determined by the property of the object itself. There are 3 ways of drawing choropleth maps using Pandas-Bokeh, which are described below.

Categories

This is the simplest way. Just provide the category keyword for the selection of the property column:

  • category: Specifies the column of the GeoDataFrame that should be used to draw a choropleth map
  • show_colorbar: Whether or not to show a colorbar for categorical plots. Default: True

Let us now draw the regions as a choropleth plot using the category keyword (at the moment, only numerical columns are supported for choropleth plots):

df_states.plot_bokeh(
    figsize=(900, 600),
    simplify_shapes=5000,
    category="REGION",
    show_colorbar=False,
    colormap=["blue", "yellow", "green", "red"],
    hovertool_columns=["STATE_NAME", "REGION"],
    tile_provider="STAMEN_TERRAIN_RETINA")

When hovering over the states, the state-name and the region are shown as specified in the hovertool_columns argument.

US_States_2

 

Dropdown

By passing a list of column names of the GeoDataFrame as the dropdown keyword argument, a dropdown menu is shown above the map. This dropdown menu can be used to select the choropleth layer by the user. :

df_states["STATE_NAME_SMALL"] = df_states["STATE_NAME"].str.lower()

df_states.plot_bokeh(
    figsize=(900, 600),
    simplify_shapes=5000,
    dropdown=["POPESTIMATE2010", "POPESTIMATE2017"],
    colormap="Viridis",
    hovertool_string="""
                        <img
                        src="https://www.states101.com/img/flags/gif/small/@STATE_NAME_SMALL.gif" 
                        height="42" alt="@imgs" width="42"
                        style="float: left; margin: 0px 15px 15px 0px;"
                        border="2"></img>
                
                        <h2>  @STATE_NAME </h2>
                        <h3> 2010: @POPESTIMATE2010 </h3>
                        <h3> 2017: @POPESTIMATE2017 </h3>""",
    tile_provider_url=r"http://c.tile.stamen.com/watercolor/{Z}/{X}/{Y}.jpg",
    tile_attribution='Map tiles by <a href="http://stamen.com">Stamen Design</a>, under <a href="http://creativecommons.org/licenses/by/3.0">CC BY 3.0</a>. Data by <a href="http://openstreetmap.org">OpenStreetMap</a>, under <a href="http://www.openstreetmap.org/copyright">ODbL</a>.'
    )

US_States_3

Using hovertool_string, one can pass a string that can contain arbitrary HTML elements (including divs, images, ...) that is shown when hovering over the geographies (@{column} will be replaced by the value of the column for the element the mouse hovers over, see also Bokeh documentation).

Here, we also used an OSM tile server with watercolor style via tile_provider_url and added the attribution via tile_attribution.

Sliders

Another option for interactive choropleth maps is the slider implementation of Pandas-Bokeh. The possible keyword arguments are here:

  • slider: By passing a list of column names of the GeoDataFrame, a slider can be used to . This dropdown menu can be used to select the choropleth layer by the user.
  • slider_range: Pass a range (or numpy.arange) of numbers object to relate the sliders values with the slider columns. By passing range(0,10), the slider will have values [0, 1, 2, ..., 9], when passing numpy.arange(3,5,0.5), the slider will have values [3, 3.5, 4, 4.5]. Default: range(0, len(slider))
  • slider_name: Specifies the title of the slider. Default is an empty string.

This can be used to display the change in population relative to the year 2010:


#Calculate change of population relative to 2010:
for i in range(8):
    df_states["Delta_Population_201%d"%i] = ((df_states["POPESTIMATE201%d"%i] / df_states["POPESTIMATE2010"]) -1 ) * 100

#Specify slider columns:
slider_columns = ["Delta_Population_201%d"%i for i in range(8)]

#Specify slider-range (Maps "Delta_Population_2010" -> 2010, 
#                           "Delta_Population_2011" -> 2011, ...):
slider_range = range(2010, 2018)

#Make slider plot:
df_states.plot_bokeh(
    figsize=(900, 600),
    simplify_shapes=5000,
    slider=slider_columns,
    slider_range=slider_range,
    slider_name="Year", 
    colormap="Inferno",
    hovertool_columns=["STATE_NAME"] + slider_columns,
    title="Change of Population [%]")

US_States_4



 

Plot multiple geolayers

If you wish to display multiple geolayers, you can pass the Bokeh figure of a Pandas-Bokeh plot via the figure keyword to the next plot_bokeh() call:

import geopandas as gpd
import pandas_bokeh
pandas_bokeh.output_notebook()

# Read in GeoJSONs from URL:
df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")
df_cities = gpd.read_file(
    r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/populated%20places/ne_10m_populated_places_simple_bigcities.geojson"
)
df_cities["size"] = df_cities.pop_max / 400000

#Plot shapes of US states (pass figure options to this initial plot):
figure = df_states.plot_bokeh(
    figsize=(800, 450),
    simplify_shapes=10000,
    show_figure=False,
    xlim=[-170, -80],
    ylim=[10, 70],
    category="REGION",
    colormap="Dark2",
    legend="States",
    show_colorbar=False,
)

#Plot cities as points on top of the US states layer by passing the figure:
df_cities.plot_bokeh(
    figure=figure,         # <== pass figure here!
    category="pop_max",
    colormap="Viridis",
    colormap_uselog=True,
    size="size",
    hovertool_string="""<h1>@name</h1>
                        <h3>Population: @pop_max </h3>""",
    marker="inverted_triangle",
    legend="Cities",
)

Multiple Geolayers


Point & Line plots:

Below, you can see an example that use Pandas-Bokeh to plot point data on a map. The plot shows all cities with a population larger than 1.000.000. For point plots, you can select the marker as keyword argument (since it is passed to bokeh.plotting.figure.scatter). Here an overview of all available marker types:

gdf = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/populated%20places/ne_10m_populated_places_simple_bigcities.geojson")
gdf["size"] = gdf.pop_max / 400000

gdf.plot_bokeh(
    category="pop_max",
    colormap="Viridis",
    colormap_uselog=True,
    size="size",
    hovertool_string="""<h1>@name</h1>
                        <h3>Population: @pop_max </h3>""",
    xlim=[-15, 35],
    ylim=[30,60],
    marker="inverted_triangle");

Pointmap

In a similar way, also GeoDataFrames with (multi)line shapes can be drawn using Pandas-Bokeh.


 


Colorbar formatting:

If you want to display the numerical labels on your colorbar with an alternative to the scientific format, you can pass in a one of the bokeh number string formats or an instance of one of the bokeh.models.formatters to the colorbar_tick_format argument in the geoplot

An example of using the string format argument:

df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")

df_states["STATE_NAME_SMALL"] = df_states["STATE_NAME"].str.lower()

# pass in a string format to colorbar_tick_format to display the ticks as 10m rather than 1e7
df_states.plot_bokeh(
    figsize=(900, 600),
    category="POPESTIMATE2017",
    simplify_shapes=5000,    
    colormap="Inferno",
    colormap_uselog=True,
    colorbar_tick_format="0.0a")

colorbar_tick_format with string argument

An example of using the bokeh PrintfTickFormatter:

df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")

df_states["STATE_NAME_SMALL"] = df_states["STATE_NAME"].str.lower()

for i in range(8):
    df_states["Delta_Population_201%d"%i] = ((df_states["POPESTIMATE201%d"%i] / df_states["POPESTIMATE2010"]) -1 ) * 100

# pass in a PrintfTickFormatter instance colorbar_tick_format to display the ticks with 2 decimal places  
df_states.plot_bokeh(
    figsize=(900, 600),
    category="Delta_Population_2017",
    simplify_shapes=5000,    
    colormap="Inferno",
    colorbar_tick_format=PrintfTickFormatter(format="%4.2f"))

colorbar_tick_format with bokeh.models.formatter_instance


Outputs, Formatting & Layouts

Output options

The pandas.DataFrame.plot_bokeh API has the following additional keyword arguments:

  • show_figure: If True, the resulting figure is shown (either in the notebook or exported and shown as HTML file, see Basics. If False, None is returned. Default: True
  • return_html: If True, the method call returns an HTML string that contains all Bokeh CSS&JS resources and the figure embedded in a div. This HTML representation of the plot can be used for embedding the plot in an HTML document. Default: False

If you have a Bokeh figure or layout, you can also use the pandas_bokeh.embedded_html function to generate an embeddable HTML representation of the plot. This can be included into any valid HTML (note that this is not possible directly with the HTML generated by the pandas_bokeh.output_file output option, because it includes an HTML header). Let us consider the following simple example:

#Import Pandas and Pandas-Bokeh (if you do not specify an output option, the standard is
#output_file):
import pandas as pd
import pandas_bokeh

#Create DataFrame to Plot:
import numpy as np
x = np.arange(-10, 10, 0.1)
sin = np.sin(x)
cos = np.cos(x)
tan = np.tan(x)
df = pd.DataFrame({"x": x, "sin(x)": sin, "cos(x)": cos, "tan(x)": tan})

#Make Bokeh plot from DataFrame using Pandas-Bokeh. Do not show the plot, but export
#it to an embeddable HTML string:
html_plot = df.plot_bokeh(
    kind="line",
    x="x",
    y=["sin(x)", "cos(x)", "tan(x)"],
    xticks=range(-20, 20),
    title="Trigonometric functions",
    show_figure=False,
    return_html=True,
    ylim=(-1.5, 1.5))

#Write some HTML and embed the HTML plot below it. For production use, please use
#Templates and the awesome Jinja library.
html = r"""
<script type="text/x-mathjax-config">
  MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}});
</script>
<script type="text/javascript"
  src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>

<h1> Trigonometric functions </h1>

<p> The basic trigonometric functions are:</p>

<p>$ sin(x) $</p>
<p>$ cos(x) $</p>
<p>$ tan(x) = \frac{sin(x)}{cos(x)}$</p>

<p>Below is a plot that shows them</p>

""" + html_plot

#Export the HTML string to an external HTML file and show it:
with open("test.html" , "w") as f:
    f.write(html)
    
import webbrowser
webbrowser.open("test.html")

This code will open up a webbrowser and show the following page. As you can see, the interactive Bokeh plot is embedded nicely into the HTML layout. The return_html option is ideal for the use in a templating engine like Jinja.

Embedded HTML

Auto Scaling Plots

For single plots that have a number of x axis values or for larger monitors, you can auto scale the figure to the width of the entire jupyter cell by setting the sizing_mode parameter.

df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd']) df.plot_bokeh(kind="bar", figsize=(500, 200), sizing_mode="scale_width")

Scaled Plot

The figsize parameter can be used to change the height and width as well as act as a scaling multiplier against the axis that is not being scaled.

 

Number formats

To change the formats of numbers in the hovertool, use the number_format keyword argument. For a documentation about the format to pass, have a look at the Bokeh documentation.Let us consider some examples for the number 3.141592653589793:

FormatOutput
03
0.0003.141
0.00 $3.14 $

This number format will be applied to all numeric columns of the hovertool. If you want to make a very custom or complicated hovertool, you should probably use the hovertool_string keyword argument, see e.g. this example. Below, we use the number_format parameter to specify the "Stock Price" format to 2 decimal digits and an additional $ sign.

import numpy as np

#Lineplot:
np.random.seed(42)
df = pd.DataFrame({
    "Google": np.random.randn(1000) + 0.2,
    "Apple": np.random.randn(1000) + 0.17
},
                  index=pd.date_range('1/1/2000', periods=1000))
df = df.cumsum()
df = df + 50
df.plot_bokeh(
    kind="line",
    title="Apple vs Google",
    xlabel="Date",
    ylabel="Stock price [$]",
    yticks=[0, 100, 200, 300, 400],
    ylim=(0, 400),
    colormap=["red", "blue"],
    number_format="1.00 $")

Number format

Suppress scientific notation for axes

If you want to suppress the scientific notation for axes, you can use the disable_scientific_axes parameter, which accepts one of "x", "y", "xy":

df = pd.DataFrame({"Animal": ["Mouse", "Rabbit", "Dog", "Tiger", "Elefant", "Wale"],
                   "Weight [g]": [19, 3000, 40000, 200000, 6000000, 50000000]})
p_scientific = df.plot_bokeh(x="Animal", y="Weight [g]", show_figure=False)
p_non_scientific = df.plot_bokeh(x="Animal", y="Weight [g]", disable_scientific_axes="y", show_figure=False,)
pandas_bokeh.plot_grid([[p_scientific, p_non_scientific]], plot_width = 450)

Number format

 

Dashboard Layouts

As shown in the Scatterplot Example, combining plots with plots or other HTML elements is straighforward in Pandas-Bokeh due to the layout capabilities of Bokeh. The easiest way to generate a dashboard layout is using the pandas_bokeh.plot_grid method (which is an extension of bokeh.layouts.gridplot):

import pandas as pd
import numpy as np
import pandas_bokeh
pandas_bokeh.output_notebook()

#Barplot:
data = {
    'fruits':
    ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries'],
    '2015': [2, 1, 4, 3, 2, 4],
    '2016': [5, 3, 3, 2, 4, 6],
    '2017': [3, 2, 4, 4, 5, 3]
}
df = pd.DataFrame(data).set_index("fruits")
p_bar = df.plot_bokeh(
    kind="bar",
    ylabel="Price per Unit [€]",
    title="Fruit prices per Year",
    show_figure=False)

#Lineplot:
np.random.seed(42)
df = pd.DataFrame({
    "Google": np.random.randn(1000) + 0.2,
    "Apple": np.random.randn(1000) + 0.17
},
                  index=pd.date_range('1/1/2000', periods=1000))
df = df.cumsum()
df = df + 50
p_line = df.plot_bokeh(
    kind="line",
    title="Apple vs Google",
    xlabel="Date",
    ylabel="Stock price [$]",
    yticks=[0, 100, 200, 300, 400],
    ylim=(0, 400),
    colormap=["red", "blue"],
    show_figure=False)

#Scatterplot:
from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(iris["data"])
df.columns = iris["feature_names"]
df["species"] = iris["target"]
df["species"] = df["species"].map(dict(zip(range(3), iris["target_names"])))
p_scatter = df.plot_bokeh(
    kind="scatter",
    x="petal length (cm)",
    y="sepal width (cm)",
    category="species",
    title="Iris DataSet Visualization",
    show_figure=False)

#Histogram:
df_hist = pd.DataFrame({
    'a': np.random.randn(1000) + 1,
    'b': np.random.randn(1000),
    'c': np.random.randn(1000) - 1
},
                       columns=['a', 'b', 'c'])

p_hist = df_hist.plot_bokeh(
    kind="hist",
    bins=np.arange(-6, 6.5, 0.5),
    vertical_xlabel=True,
    normed=100,
    hovertool=False,
    title="Normal distributions",
    show_figure=False)

#Make Dashboard with Grid Layout:
pandas_bokeh.plot_grid([[p_line, p_bar], 
                        [p_scatter, p_hist]], plot_width=450)

Dashboard Layout

Using a combination of row and column elements (see also Bokeh Layouts) allow for a very easy general arrangement of elements. An alternative layout to the one above is:

p_line.plot_width = 900
p_hist.plot_width = 900

layout = pandas_bokeh.column(p_line,
                pandas_bokeh.row(p_scatter, p_bar),
                p_hist)

pandas_bokeh.show(layout)

Alternative Dashboard Layout


 



 

 

Release Notes

Release Notes can be found here.

Contributing to Pandas-Bokeh

If you wish to contribute to the development of Pandas-Bokeh you can follow the instructions on the CONTRIBUTING.md.

 

Author: PatrikHlobil
Source Code: https://github.com/PatrikHlobil/Pandas-Bokeh 
License: MIT License

#machine-learning  #datavisualizations #python 

Jamison  Fisher

Jamison Fisher

1642995900

Pandas Bokeh: Bokeh Plotting Backend for Pandas and GeoPandas

Pandas-Bokeh provides a Bokeh plotting backend for Pandas, GeoPandas and Pyspark DataFrames, similar to the already existing Visualization feature of Pandas. Importing the library adds a complementary plotting method plot_bokeh() on DataFrames and Series.

With Pandas-Bokeh, creating stunning, interactive, HTML-based visualization is as easy as calling:

df.plot_bokeh()

Pandas-Bokeh also provides native support as a Pandas Plotting backend for Pandas >= 0.25. When Pandas-Bokeh is installed, switchting the default Pandas plotting backend to Bokeh can be done via:

pd.set_option('plotting.backend', 'pandas_bokeh')

More details about the new Pandas backend can be found below.

Interactive Documentation

Please visit:

https://patrikhlobil.github.io/Pandas-Bokeh/

for an interactive version of the documentation below, where you can play with the dynamic Bokeh plots.

For more information have a look at the Examples below or at notebooks on the Github Repository of this project.

Startimage

Installation

You can install Pandas-Bokeh from PyPI via pip

pip install pandas-bokeh

or conda:

conda install -c patrikhlobil pandas-bokeh

With the current release 0.5.5, Pandas-Bokeh officially supports Python 3.6 and newer. For more details, see Release Notes.

How To Use

Classical Use

 

The Pandas-Bokeh library should be imported after Pandas, GeoPandas and/or Pyspark. After the import, one should define the plotting output, which can be:

  • pandas_bokeh.output_notebook(): Embeds the Plots in the cell outputs of the notebook. Ideal when working in Jupyter Notebooks.
  • pandas_bokeh.output_file(filename): Exports the plot to the provided filename as an HTML.

For more details about the plotting outputs, see the reference here or the Bokeh documentation.

Notebook output (see also bokeh.io.output_notebook)

import pandas as pd
import pandas_bokeh
pandas_bokeh.output_notebook()

File output to "Interactive Plot.html" (see also bokeh.io.output_file)

import pandas as pd
import pandas_bokeh
pandas_bokeh.output_file("Interactive Plot.html")

Pandas-Bokeh as native Pandas plotting backend

For pandas >= 0.25, a plotting backend switch is natively supported. It can be achievied by calling:

import pandas as pd
pd.set_option('plotting.backend', 'pandas_bokeh')

Now, the plotting API is accessible for a Pandas DataFrame via:

df.plot(...)

All additional functionalities of Pandas-Bokeh are then accessible at pd.plotting. So, setting the output to notebook is:

pd.plotting.output_notebook()

or calling the grid layout functionality:

pd.plotting.plot_grid(...)

Note: Backwards compatibility is kept since there will still be the df.plot_bokeh(...) methods for a DataFrame.

Plot types

Supported plottypes are at the moment:

Also, check out the complementary chapter Outputs, Formatting & Layouts about:

Lineplot

Basic Lineplot

This simple lineplot in Pandas-Bokeh already contains various interactive elements:

  • a pannable and zoomable (zoom in plotarea and zoom on axis) plot
  • by clicking on the legend elements, one can hide and show the individual lines
  • a Hovertool for the plotted lines

Consider the following simple example:

import numpy as np

np.random.seed(42)
df = pd.DataFrame({"Google": np.random.randn(1000)+0.2, 
                   "Apple": np.random.randn(1000)+0.17}, 
                   index=pd.date_range('1/1/2000', periods=1000))
df = df.cumsum()
df = df + 50
df.plot_bokeh(kind="line")       #equivalent to df.plot_bokeh.line()

ApplevsGoogle_1

Note, that similar to the regular pandas.DataFrame.plot method, there are also additional accessors to directly access the different plotting types like:

  • df.plot_bokeh(kind="line", ...)df.plot_bokeh.line(...)
  • df.plot_bokeh(kind="bar", ...)df.plot_bokeh.bar(...)
  • df.plot_bokeh(kind="hist", ...)df.plot_bokeh.hist(...)
  • ...

Advanced Lineplot

There are various optional parameters to tune the plots, for example:

  • kind: Which kind of plot should be produced. Currently supported are: "line", "point", "scatter", "bar" and "histogram". In the near future many more will be implemented as horizontal barplot, boxplots, pie-charts, etc.
  • x: Name of the column to use for the horizontal x-axis. If the x parameter is not specified, the index is used for the x-values of the plot. Alternative, also an array of values can be passed that has the same number of elements as the DataFrame.
  • y: Name of column or list of names of columns to use for the vertical y-axis.
  • figsize: Choose width & height of the plot
  • title: Sets title of the plot
  • xlim/ylim: Set visibler range of plot for x- and y-axis (also works for datetime x-axis)
  • xlabel/ylabel: Set x- and y-labels
  • logx/logy: Set log-scale on x-/y-axis
  • xticks/yticks: Explicitly set the ticks on the axes
  • color: Defines a single color for a plot.
  • colormap: Can be used to specify multiple colors to plot. Can be either a list of colors or the name of a Bokeh color palette
  • hovertool: If True a Hovertool is active, else if False no Hovertool is drawn.
  • hovertool_string: If specified, this string will be used for the hovertool (@{column} will be replaced by the value of the column for the element the mouse hovers over, see also Bokeh documentation and here)
  • toolbar_location: Specify the position of the toolbar location (None, "above", "below", "left" or "right"). Default: "right"
  • zooming: Enables/Disables zooming. Default: True
  • panning: Enables/Disables panning. Default: True
  • fontsize_label/fontsize_ticks/fontsize_title/fontsize_legend: Set fontsize of labels, ticks, title or legend (int or string of form "15pt")
  • rangetool Enables a range tool scroller. Default False
  • kwargs**: Optional keyword arguments of bokeh.plotting.figure.line

Try them out to get a feeling for the effects. Let us consider now:

df.plot_bokeh.line(
    figsize=(800, 450),
    y="Apple",
    title="Apple vs Google",
    xlabel="Date",
    ylabel="Stock price [$]",
    yticks=[0, 100, 200, 300, 400],
    ylim=(0, 400),
    toolbar_location=None,
    colormap=["red", "blue"],
    hovertool_string=r"""<img
                        src='https://upload.wikimedia.org/wikipedia/commons/thumb/f/fa/Apple_logo_black.svg/170px-Apple_logo_black.svg.png' 
                        height="42" alt="@imgs" width="42"
                        style="float: left; margin: 0px 15px 15px 0px;"
                        border="2"></img> Apple 
                        
                        <h4> Stock Price: </h4> @{Apple}""",
    panning=False,
    zooming=False)

ApplevsGoogle_2

Lineplot with data points

For lineplots, as for many other plot-kinds, there are some special keyword arguments that only work for this plotting type. For lineplots, these are:

  • plot_data_points: Plot also the data points on the lines
  • plot_data_points_size: Determines the size of the data points
  • marker: Defines the point type (Default: "circle"). Possible values are: 'circle', 'square', 'triangle', 'asterisk', 'circle_x', 'square_x', 'inverted_triangle', 'x', 'circle_cross', 'square_cross', 'diamond', 'cross'
  • kwargs**: Optional keyword arguments of bokeh.plotting.figure.line

Let us use this information to have another version of the same plot:

df.plot_bokeh.line(
    figsize=(800, 450),
    title="Apple vs Google",
    xlabel="Date",
    ylabel="Stock price [$]",
    yticks=[0, 100, 200, 300, 400],
    ylim=(100, 200),
    xlim=("2001-01-01", "2001-02-01"),
    colormap=["red", "blue"],
    plot_data_points=True,
    plot_data_points_size=10,
    marker="asterisk")

ApplevsGoogle_3

Lineplot with rangetool

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, columns=list('ABCD'))
df = df.cumsum()

df.plot_bokeh(rangetool=True)

rangetool

Pointplot

If you just wish to draw the date points for curves, the pointplot option is the right choice. It also accepts the kwargs of bokeh.plotting.figure.scatter like marker or size:

import numpy as np

x = np.arange(-3, 3, 0.1)
y2 = x**2
y3 = x**3
df = pd.DataFrame({"x": x, "Parabula": y2, "Cube": y3})
df.plot_bokeh.point(
    x="x",
    xticks=range(-3, 4),
    size=5,
    colormap=["#009933", "#ff3399"],
    title="Pointplot (Parabula vs. Cube)",
    marker="x")

Pointplot

Stepplot

With a similar API as the line- & pointplots, one can generate a stepplot. Additional keyword arguments for this plot type are passes to bokeh.plotting.figure.step, e.g. mode (before, after, center), see the following example

import numpy as np

x = np.arange(-3, 3, 1)
y2 = x**2
y3 = x**3
df = pd.DataFrame({"x": x, "Parabula": y2, "Cube": y3})
df.plot_bokeh.step(
    x="x",
    xticks=range(-1, 1),
    colormap=["#009933", "#ff3399"],
    title="Pointplot (Parabula vs. Cube)",
    figsize=(800,300),
    fontsize_title=30,
    fontsize_label=25,
    fontsize_ticks=15,
    fontsize_legend=5,
    )

df.plot_bokeh.step(
    x="x",
    xticks=range(-1, 1),
    colormap=["#009933", "#ff3399"],
    title="Pointplot (Parabula vs. Cube)",
    mode="after",
    figsize=(800,300)
    )

Stepplot

Note that the step-plot API of Bokeh does so far not support a hovertool functionality.

Scatterplot

A basic scatterplot can be created using the kind="scatter" option. For scatterplots, the x and y parameters have to be specified and the following optional keyword argument is allowed:

category: Determines the category column to use for coloring the scatter points

kwargs**: Optional keyword arguments of bokeh.plotting.figure.scatter

Note, that the pandas.DataFrame.plot_bokeh() method return per default a Bokeh figure, which can be embedded in Dashboard layouts with other figures and Bokeh objects (for more details about (sub)plot layouts and embedding the resulting Bokeh plots as HTML click here).

In the example below, we use the building grid layout support of Pandas-Bokeh to display both the DataFrame (using a Bokeh DataTable) and the resulting scatterplot:

# Load Iris Dataset:
df = pd.read_csv(
    r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/iris/iris.csv"
)
df = df.sample(frac=1)

# Create Bokeh-Table with DataFrame:
from bokeh.models.widgets import DataTable, TableColumn
from bokeh.models import ColumnDataSource

data_table = DataTable(
    columns=[TableColumn(field=Ci, title=Ci) for Ci in df.columns],
    source=ColumnDataSource(df),
    height=300,
)

# Create Scatterplot:
p_scatter = df.plot_bokeh.scatter(
    x="petal length (cm)",
    y="sepal width (cm)",
    category="species",
    title="Iris DataSet Visualization",
    show_figure=False,
)

# Combine Table and Scatterplot via grid layout:
pandas_bokeh.plot_grid([[data_table, p_scatter]], plot_width=400, plot_height=350)

 

Scatterplot

A possible optional keyword parameters that can be passed to bokeh.plotting.figure.scatter is size. Below, we use the sepal length of the Iris data as reference for the size:

#Change one value to clearly see the effect of the size keyword
df.loc[13, "sepal length (cm)"] = 15

#Make scatterplot:
p_scatter = df.plot_bokeh.scatter(
    x="petal length (cm)",
    y="sepal width (cm)",
    category="species",
    title="Iris DataSet Visualization with Size Keyword",
    size="sepal length (cm)")

Scatterplot2

In this example you can see, that the additional dimension sepal length cannot be used to clearly differentiate between the virginica and versicolor species.

Barplot

The barplot API has no special keyword arguments, but accepts optional kwargs of bokeh.plotting.figure.vbar like alpha. It uses per default the index for the bar categories (however, also columns can be used as x-axis category using the x argument).

data = {
    'fruits':
    ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries'],
    '2015': [2, 1, 4, 3, 2, 4],
    '2016': [5, 3, 3, 2, 4, 6],
    '2017': [3, 2, 4, 4, 5, 3]
}
df = pd.DataFrame(data).set_index("fruits")

p_bar = df.plot_bokeh.bar(
    ylabel="Price per Unit [€]", 
    title="Fruit prices per Year", 
    alpha=0.6)

Barplot

Using the stacked keyword argument you also maked stacked barplots:

p_stacked_bar = df.plot_bokeh.bar(
    ylabel="Price per Unit [€]",
    title="Fruit prices per Year",
    stacked=True,
    alpha=0.6)

Barplot2

Also horizontal versions of the above barplot are supported with the keyword kind="barh" or the accessor plot_bokeh.barh. You can still specify a column of the DataFrame as the bar category via the x argument if you do not wish to use the index.

#Reset index, such that "fruits" is now a column of the DataFrame:
df.reset_index(inplace=True)

#Create horizontal bar (via kind keyword):
p_hbar = df.plot_bokeh(
    kind="barh",
    x="fruits",
    xlabel="Price per Unit [€]",
    title="Fruit prices per Year",
    alpha=0.6,
    legend = "bottom_right",
    show_figure=False)

#Create stacked horizontal bar (via barh accessor):
p_stacked_hbar = df.plot_bokeh.barh(
    x="fruits",
    stacked=True,
    xlabel="Price per Unit [€]",
    title="Fruit prices per Year",
    alpha=0.6,
    legend = "bottom_right",
    show_figure=False)

#Plot all barplot examples in a grid:
pandas_bokeh.plot_grid([[p_bar, p_stacked_bar],
                        [p_hbar, p_stacked_hbar]], 
                       plot_width=450)

Barplot3

Histogram

For drawing histograms (kind="hist"), Pandas-Bokeh has a lot of customization features. Optional keyword arguments for histogram plots are:

  • bins: Determines bins to use for the histogram. If bins is an int, it defines the number of equal-width bins in the given range (10, by default). If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths. If bins is a string, it defines the method used to calculate the optimal bin width, as defined by histogram_bin_edges.
  • histogram_type: Either "sidebyside", "topontop" or "stacked". Default: "topontop"
  • stacked: Boolean that overrides the histogram_type as "stacked" if given. Default: False
  • kwargs**: Optional keyword arguments of bokeh.plotting.figure.quad

Below examples of the different histogram types:

import numpy as np

df_hist = pd.DataFrame({
    'a': np.random.randn(1000) + 1,
    'b': np.random.randn(1000),
    'c': np.random.randn(1000) - 1
    },
    columns=['a', 'b', 'c'])

#Top-on-Top Histogram (Default):
df_hist.plot_bokeh.hist(
    bins=np.linspace(-5, 5, 41),
    vertical_xlabel=True,
    hovertool=False,
    title="Normal distributions (Top-on-Top)",
    line_color="black")

#Side-by-Side Histogram (multiple bars share bin side-by-side) also accessible via
#kind="hist":
df_hist.plot_bokeh(
    kind="hist",
    bins=np.linspace(-5, 5, 41),
    histogram_type="sidebyside",
    vertical_xlabel=True,
    hovertool=False,
    title="Normal distributions (Side-by-Side)",
    line_color="black")

#Stacked histogram:
df_hist.plot_bokeh.hist(
    bins=np.linspace(-5, 5, 41),
    histogram_type="stacked",
    vertical_xlabel=True,
    hovertool=False,
    title="Normal distributions (Stacked)",
    line_color="black")

Histogram

Further, advanced keyword arguments for histograms are:

  • weights: A column of the DataFrame that is used as weight for the histogramm aggregation (see also numpy.histogram)
  • normed: If True, histogram values are normed to 1 (sum of histogram values=1). It is also possible to pass an integer, e.g. normed=100 would result in a histogram with percentage y-axis (sum of histogram values=100). Default: False
  • cumulative: If True, a cumulative histogram is shown. Default: False
  • show_average: If True, the average of the histogram is also shown. Default: False

Their usage is shown in these examples:

p_hist = df_hist.plot_bokeh.hist(
    y=["a", "b"],
    bins=np.arange(-4, 6.5, 0.5),
    normed=100,
    vertical_xlabel=True,
    ylabel="Share[%]",
    title="Normal distributions (normed)",
    show_average=True,
    xlim=(-4, 6),
    ylim=(0, 30),
    show_figure=False)

p_hist_cum = df_hist.plot_bokeh.hist(
    y=["a", "b"],
    bins=np.arange(-4, 6.5, 0.5),
    normed=100,
    cumulative=True,
    vertical_xlabel=True,
    ylabel="Share[%]",
    title="Normal distributions (normed & cumulative)",
    show_figure=False)

pandas_bokeh.plot_grid([[p_hist, p_hist_cum]], plot_width=450, plot_height=300)

Histogram2

Areaplot

Areaplot (kind="area") can be either drawn on top of each other or stacked. The important parameters are:

stacked: If True, the areaplots are stacked. If False, plots are drawn on top of each other. Default: False

kwargs**: Optional keyword arguments of bokeh.plotting.figure.patch

Let us consider the energy consumption split by source that can be downloaded as DataFrame via:

df_energy = pd.read_csv(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/energy/energy.csv", 
parse_dates=["Year"])
df_energy.head()
YearOilGasCoalNuclear EnergyHydroelectricityOther Renewable
1970-01-012291.5826.71467.317.7265.85.8
1971-01-012427.7884.81459.224.9276.46.3
1972-01-012613.9933.71475.734.1288.96.8
1973-01-012818.1978.01519.645.9292.57.3
1974-01-012777.31001.91520.959.6321.17.7

Creating the Areaplot can be achieved via:

df_energy.plot_bokeh.area(
    x="Year",
    stacked=True,
    legend="top_left",
    colormap=["brown", "orange", "black", "grey", "blue", "green"],
    title="Worldwide energy consumption split by energy source",
    ylabel="Million tonnes oil equivalent",
    ylim=(0, 16000))

areaplot

Note that the energy consumption of fossile energy is still increasing and renewable energy sources are still small in comparison 😢!!! However, when we norm the plot using the normed keyword, there is a clear trend towards renewable energies in the last decade:

df_energy.plot_bokeh.area(
    x="Year",
    stacked=True,
    normed=100,
    legend="bottom_left",
    colormap=["brown", "orange", "black", "grey", "blue", "green"],
    title="Worldwide energy consumption split by energy source",
    ylabel="Million tonnes oil equivalent")

areaplot2

Pieplot

For Pieplots, let us consider a dataset showing the results of all Bundestags elections in Germany since 2002:

df_pie = pd.read_csv(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/Bundestagswahl/Bundestagswahl.csv")
df_pie
Partei20022005200920132017
CDU/CSU38.535.233.841.532.9
SPD38.534.223.025.720.5
FDP7.49.814.64.810.7
Grünen8.68.110.78.48.9
Linke/PDS4.08.711.98.69.2
AfD0.00.00.00.012.6
Sonstige3.04.06.011.05.0

We can create a Pieplot of the last election in 2017 by specifying the "Partei" (german for party) column as the x column and the "2017" column as the y column for values:

df_pie.plot_bokeh.pie(
    x="Partei",
    y="2017",
    colormap=["blue", "red", "yellow", "green", "purple", "orange", "grey"],
    title="Results of German Bundestag Election 2017",
    )

pieplot

When you pass several columns to the y parameter (not providing the y-parameter assumes you plot all columns), multiple nested pieplots will be shown in one plot:

df_pie.plot_bokeh.pie(
    x="Partei",
    colormap=["blue", "red", "yellow", "green", "purple", "orange", "grey"],
    title="Results of German Bundestag Elections [2002-2017]",
    line_color="grey")

pieplot2

Mapplot

The mapplot method of Pandas-Bokeh allows for plotting geographic points stored in a Pandas DataFrame on an interactive map. For more advanced Geoplots for line and polygon shapes have a look at the Geoplots examples for the GeoPandas API of Pandas-Bokeh.

For mapplots, only (latitude, longitude) pairs in geographic projection (WGS84) can be plotted on a map. The basic API has the following 2 base parameters:

  • x: name of the longitude column of the DataFrame
  • y: name of the latitude column of the DataFrame

The other optional keyword arguments are discussed in the section about the GeoPandas API, e.g. category for coloring the points.

Below an example of plotting all cities for more than 1 million inhabitants:

df_mapplot = pd.read_csv(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/populated%20places/populated_places.csv")
df_mapplot.head()
namepop_maxlatitudelongitudesize
Mesa108539433.423915-111.7360841.085394
Sharjah110302725.37138355.4064781.103027
Changwon108149935.219102128.5835621.081499
Sheffield129290053.366677-1.4999971.292900
Abbottabad118364734.14950373.1995011.183647
df_mapplot["size"] = df_mapplot["pop_max"] / 1000000
df_mapplot.plot_bokeh.map(
    x="longitude",
    y="latitude",
    hovertool_string="""<h2> @{name} </h2> 
    
                        <h3> Population: @{pop_max} </h3>""",
    tile_provider="STAMEN_TERRAIN_RETINA",
    size="size", 
    figsize=(900, 600),
    title="World cities with more than 1.000.000 inhabitants")

 

Mapplot

Geoplots

Pandas-Bokeh also allows for interactive plotting of Maps using GeoPandas by providing a geopandas.GeoDataFrame.plot_bokeh() method. It allows to plot the following geodata on a map :

  • Points/MultiPoints
  • Lines/MultiLines
  • Polygons/MultiPolygons

Note: t is not possible to mix up the objects types, i.e. a GeoDataFrame with Points and Lines is for example not allowed.

Les us start with a simple example using the "World Borders Dataset" . Let us first import all neccessary libraries and read the shapefile:

import geopandas as gpd
import pandas as pd
import pandas_bokeh
pandas_bokeh.output_notebook()

#Read in GeoJSON from URL:
df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")
df_states.head()
STATE_NAMEREGIONPOPESTIMATE2010POPESTIMATE2011POPESTIMATE2012POPESTIMATE2013POPESTIMATE2014POPESTIMATE2015POPESTIMATE2016POPESTIMATE2017geometry
Hawaii413638171378323139277214080381417710142632014286831427538(POLYGON ((-160.0738033454681 22.0041773479577...
Washington467413866819155689089969634107046931715281872809347405743(POLYGON ((-122.4020153103835 48.2252163723779...
Montana4990507996866100352210119211019931102831710386561050493POLYGON ((-111.4754253002074 44.70216236909688...
Maine113275681327968132810113279751328903132778713302321335907(POLYGON ((-69.77727626137293 44.0741483685119...
North Dakota2674518684830701380722908738658754859755548755393POLYGON ((-98.73043728833767 45.93827137024809...

Plotting the data on a map is as simple as calling:

df_states.plot_bokeh(simplify_shapes=10000)

US_States_1

We also passed the optional parameter simplify_shapes (~meter) to improve plotting performance (for a reference see shapely.object.simplify). The above geolayer thus has an accuracy of about 10km.

Many keyword arguments like xlabel, ylabel, xlim, ylim, title, colormap, hovertool, zooming, panning, ... for costumizing the plot are also available for the geoplotting API and can be uses as in the examples shown above. There are however also many other options especially for plotting geodata:

  • geometry_column: Specify the column that stores the geometry-information (default: "geometry")
  • hovertool_columns: Specify column names, for which values should be shown in hovertool
  • hovertool_string: If specified, this string will be used for the hovertool (@{column} will be replaced by the value of the column for the element the mouse hovers over, see also Bokeh documentation)
  • colormap_uselog: If set True, the colormapper is using a logscale. Default: False
  • colormap_range: Specify the value range of the colormapper via (min, max) tuple
  • tile_provider: Define build-in tile provider for background maps. Possible values: None, 'CARTODBPOSITRON', 'CARTODBPOSITRON_RETINA', 'STAMEN_TERRAIN', 'STAMEN_TERRAIN_RETINA', 'STAMEN_TONER', 'STAMEN_TONER_BACKGROUND', 'STAMEN_TONER_LABELS'. Default: CARTODBPOSITRON_RETINA
  • tile_provider_url: An arbitraty tile_provider_url of the form '/{Z}/{X}/{Y}*.png' can be passed to be used as background map.
  • tile_attribution: String (also HTML accepted) for showing attribution for tile source in the lower right corner
  • tile_alpha: Sets the alpha value of the background tile between [0, 1]. Default: 1

One of the most common usage of map plots are choropleth maps, where the color of a the objects is determined by the property of the object itself. There are 3 ways of drawing choropleth maps using Pandas-Bokeh, which are described below.

Categories

This is the simplest way. Just provide the category keyword for the selection of the property column:

  • category: Specifies the column of the GeoDataFrame that should be used to draw a choropleth map
  • show_colorbar: Whether or not to show a colorbar for categorical plots. Default: True

Let us now draw the regions as a choropleth plot using the category keyword (at the moment, only numerical columns are supported for choropleth plots):

df_states.plot_bokeh(
    figsize=(900, 600),
    simplify_shapes=5000,
    category="REGION",
    show_colorbar=False,
    colormap=["blue", "yellow", "green", "red"],
    hovertool_columns=["STATE_NAME", "REGION"],
    tile_provider="STAMEN_TERRAIN_RETINA")

When hovering over the states, the state-name and the region are shown as specified in the hovertool_columns argument.

US_States_2

Dropdown

By passing a list of column names of the GeoDataFrame as the dropdown keyword argument, a dropdown menu is shown above the map. This dropdown menu can be used to select the choropleth layer by the user. :

df_states["STATE_NAME_SMALL"] = df_states["STATE_NAME"].str.lower()

df_states.plot_bokeh(
    figsize=(900, 600),
    simplify_shapes=5000,
    dropdown=["POPESTIMATE2010", "POPESTIMATE2017"],
    colormap="Viridis",
    hovertool_string="""
                        <img
                        src="https://www.states101.com/img/flags/gif/small/@STATE_NAME_SMALL.gif" 
                        height="42" alt="@imgs" width="42"
                        style="float: left; margin: 0px 15px 15px 0px;"
                        border="2"></img>
                
                        <h2>  @STATE_NAME </h2>
                        <h3> 2010: @POPESTIMATE2010 </h3>
                        <h3> 2017: @POPESTIMATE2017 </h3>""",
    tile_provider_url=r"http://c.tile.stamen.com/watercolor/{Z}/{X}/{Y}.jpg",
    tile_attribution='Map tiles by <a href="http://stamen.com">Stamen Design</a>, under <a href="http://creativecommons.org/licenses/by/3.0">CC BY 3.0</a>. Data by <a href="http://openstreetmap.org">OpenStreetMap</a>, under <a href="http://www.openstreetmap.org/copyright">ODbL</a>.'
    )

US_States_3

Using hovertool_string, one can pass a string that can contain arbitrary HTML elements (including divs, images, ...) that is shown when hovering over the geographies (@{column} will be replaced by the value of the column for the element the mouse hovers over, see also Bokeh documentation).

Here, we also used an OSM tile server with watercolor style via tile_provider_url and added the attribution via tile_attribution.

Sliders

Another option for interactive choropleth maps is the slider implementation of Pandas-Bokeh. The possible keyword arguments are here:

  • slider: By passing a list of column names of the GeoDataFrame, a slider can be used to . This dropdown menu can be used to select the choropleth layer by the user.
  • slider_range: Pass a range (or numpy.arange) of numbers object to relate the sliders values with the slider columns. By passing range(0,10), the slider will have values [0, 1, 2, ..., 9], when passing numpy.arange(3,5,0.5), the slider will have values [3, 3.5, 4, 4.5]. Default: range(0, len(slider))
  • slider_name: Specifies the title of the slider. Default is an empty string.

This can be used to display the change in population relative to the year 2010:

#Calculate change of population relative to 2010:
for i in range(8):
    df_states["Delta_Population_201%d"%i] = ((df_states["POPESTIMATE201%d"%i] / df_states["POPESTIMATE2010"]) -1 ) * 100

#Specify slider columns:
slider_columns = ["Delta_Population_201%d"%i for i in range(8)]

#Specify slider-range (Maps "Delta_Population_2010" -> 2010, 
#                           "Delta_Population_2011" -> 2011, ...):
slider_range = range(2010, 2018)

#Make slider plot:
df_states.plot_bokeh(
    figsize=(900, 600),
    simplify_shapes=5000,
    slider=slider_columns,
    slider_range=slider_range,
    slider_name="Year", 
    colormap="Inferno",
    hovertool_columns=["STATE_NAME"] + slider_columns,
    title="Change of Population [%]")

US_States_4

Plot multiple geolayers

If you wish to display multiple geolayers, you can pass the Bokeh figure of a Pandas-Bokeh plot via the figure keyword to the next plot_bokeh() call:

import geopandas as gpd
import pandas_bokeh
pandas_bokeh.output_notebook()

# Read in GeoJSONs from URL:
df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")
df_cities = gpd.read_file(
    r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/populated%20places/ne_10m_populated_places_simple_bigcities.geojson"
)
df_cities["size"] = df_cities.pop_max / 400000

#Plot shapes of US states (pass figure options to this initial plot):
figure = df_states.plot_bokeh(
    figsize=(800, 450),
    simplify_shapes=10000,
    show_figure=False,
    xlim=[-170, -80],
    ylim=[10, 70],
    category="REGION",
    colormap="Dark2",
    legend="States",
    show_colorbar=False,
)

#Plot cities as points on top of the US states layer by passing the figure:
df_cities.plot_bokeh(
    figure=figure,         # <== pass figure here!
    category="pop_max",
    colormap="Viridis",
    colormap_uselog=True,
    size="size",
    hovertool_string="""<h1>@name</h1>
                        <h3>Population: @pop_max </h3>""",
    marker="inverted_triangle",
    legend="Cities",
)

Multiple Geolayers

Point & Line plots:

Below, you can see an example that use Pandas-Bokeh to plot point data on a map. The plot shows all cities with a population larger than 1.000.000. For point plots, you can select the marker as keyword argument (since it is passed to bokeh.plotting.figure.scatter). Here an overview of all available marker types:

gdf = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/populated%20places/ne_10m_populated_places_simple_bigcities.geojson")
gdf["size"] = gdf.pop_max / 400000

gdf.plot_bokeh(
    category="pop_max",
    colormap="Viridis",
    colormap_uselog=True,
    size="size",
    hovertool_string="""<h1>@name</h1>
                        <h3>Population: @pop_max </h3>""",
    xlim=[-15, 35],
    ylim=[30,60],
    marker="inverted_triangle");

Pointmap

In a similar way, also GeoDataFrames with (multi)line shapes can be drawn using Pandas-Bokeh.

Colorbar formatting:

If you want to display the numerical labels on your colorbar with an alternative to the scientific format, you can pass in a one of the bokeh number string formats or an instance of one of the bokeh.models.formatters to the colorbar_tick_format argument in the geoplot

An example of using the string format argument:

df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")

df_states["STATE_NAME_SMALL"] = df_states["STATE_NAME"].str.lower()

# pass in a string format to colorbar_tick_format to display the ticks as 10m rather than 1e7
df_states.plot_bokeh(
    figsize=(900, 600),
    category="POPESTIMATE2017",
    simplify_shapes=5000,    
    colormap="Inferno",
    colormap_uselog=True,
    colorbar_tick_format="0.0a")

colorbar_tick_format with string argument

An example of using the bokeh PrintfTickFormatter:

df_states = gpd.read_file(r"https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")

df_states["STATE_NAME_SMALL"] = df_states["STATE_NAME"].str.lower()

for i in range(8):
    df_states["Delta_Population_201%d"%i] = ((df_states["POPESTIMATE201%d"%i] / df_states["POPESTIMATE2010"]) -1 ) * 100

# pass in a PrintfTickFormatter instance colorbar_tick_format to display the ticks with 2 decimal places  
df_states.plot_bokeh(
    figsize=(900, 600),
    category="Delta_Population_2017",
    simplify_shapes=5000,    
    colormap="Inferno",
    colorbar_tick_format=PrintfTickFormatter(format="%4.2f"))

colorbar_tick_format with bokeh.models.formatter_instance

Outputs, Formatting & Layouts

 

Output options

The pandas.DataFrame.plot_bokeh API has the following additional keyword arguments:

  • show_figure: If True, the resulting figure is shown (either in the notebook or exported and shown as HTML file, see Basics. If False, None is returned. Default: True
  • return_html: If True, the method call returns an HTML string that contains all Bokeh CSS&JS resources and the figure embedded in a div. This HTML representation of the plot can be used for embedding the plot in an HTML document. Default: False

If you have a Bokeh figure or layout, you can also use the pandas_bokeh.embedded_html function to generate an embeddable HTML representation of the plot. This can be included into any valid HTML (note that this is not possible directly with the HTML generated by the pandas_bokeh.output_file output option, because it includes an HTML header). Let us consider the following simple example:

#Import Pandas and Pandas-Bokeh (if you do not specify an output option, the standard is
#output_file):
import pandas as pd
import pandas_bokeh

#Create DataFrame to Plot:
import numpy as np
x = np.arange(-10, 10, 0.1)
sin = np.sin(x)
cos = np.cos(x)
tan = np.tan(x)
df = pd.DataFrame({"x": x, "sin(x)": sin, "cos(x)": cos, "tan(x)": tan})

#Make Bokeh plot from DataFrame using Pandas-Bokeh. Do not show the plot, but export
#it to an embeddable HTML string:
html_plot = df.plot_bokeh(
    kind="line",
    x="x",
    y=["sin(x)", "cos(x)", "tan(x)"],
    xticks=range(-20, 20),
    title="Trigonometric functions",
    show_figure=False,
    return_html=True,
    ylim=(-1.5, 1.5))

#Write some HTML and embed the HTML plot below it. For production use, please use
#Templates and the awesome Jinja library.
html = r"""
<script type="text/x-mathjax-config">
  MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}});
</script>
<script type="text/javascript"
  src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>

<h1> Trigonometric functions </h1>

<p> The basic trigonometric functions are:</p>

<p>$ sin(x) $</p>
<p>$ cos(x) $</p>
<p>$ tan(x) = \frac{sin(x)}{cos(x)}$</p>

<p>Below is a plot that shows them</p>

""" + html_plot

#Export the HTML string to an external HTML file and show it:
with open("test.html" , "w") as f:
    f.write(html)
    
import webbrowser
webbrowser.open("test.html")

This code will open up a webbrowser and show the following page. As you can see, the interactive Bokeh plot is embedded nicely into the HTML layout. The return_html option is ideal for the use in a templating engine like Jinja.

Embedded HTML

Auto Scaling Plots

For single plots that have a number of x axis values or for larger monitors, you can auto scale the figure to the width of the entire jupyter cell by setting the sizing_mode parameter.

df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])

df.plot_bokeh(kind="bar", figsize=(500, 200), sizing_mode="scale_width")

Scaled Plot

The figsize parameter can be used to change the height and width as well as act as a scaling multiplier against the axis that is not being scaled.

 

Number formats

To change the formats of numbers in the hovertool, use the number_format keyword argument. For a documentation about the format to pass, have a look at the Bokeh documentation.Let us consider some examples for the number 3.141592653589793:

FormatOutput
03
0.0003.141
0.00 $3.14 $

This number format will be applied to all numeric columns of the hovertool. If you want to make a very custom or complicated hovertool, you should probably use the hovertool_string keyword argument, see e.g. this example. Below, we use the number_format parameter to specify the "Stock Price" format to 2 decimal digits and an additional $ sign.

import numpy as np

#Lineplot:
np.random.seed(42)
df = pd.DataFrame({
    "Google": np.random.randn(1000) + 0.2,
    "Apple": np.random.randn(1000) + 0.17
},
                  index=pd.date_range('1/1/2000', periods=1000))
df = df.cumsum()
df = df + 50
df.plot_bokeh(
    kind="line",
    title="Apple vs Google",
    xlabel="Date",
    ylabel="Stock price [$]",
    yticks=[0, 100, 200, 300, 400],
    ylim=(0, 400),
    colormap=["red", "blue"],
    number_format="1.00 $")

Number format

Suppress scientific notation for axes

If you want to suppress the scientific notation for axes, you can use the disable_scientific_axes parameter, which accepts one of "x", "y", "xy":

df = pd.DataFrame({"Animal": ["Mouse", "Rabbit", "Dog", "Tiger", "Elefant", "Wale"],
                   "Weight [g]": [19, 3000, 40000, 200000, 6000000, 50000000]})
p_scientific = df.plot_bokeh(x="Animal", y="Weight [g]", show_figure=False)
p_non_scientific = df.plot_bokeh(x="Animal", y="Weight [g]", disable_scientific_axes="y", show_figure=False,)
pandas_bokeh.plot_grid([[p_scientific, p_non_scientific]], plot_width = 450)

Number format

 

Dashboard Layouts

As shown in the Scatterplot Example, combining plots with plots or other HTML elements is straighforward in Pandas-Bokeh due to the layout capabilities of Bokeh. The easiest way to generate a dashboard layout is using the pandas_bokeh.plot_grid method (which is an extension of bokeh.layouts.gridplot):

import pandas as pd
import numpy as np
import pandas_bokeh
pandas_bokeh.output_notebook()

#Barplot:
data = {
    'fruits':
    ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries'],
    '2015': [2, 1, 4, 3, 2, 4],
    '2016': [5, 3, 3, 2, 4, 6],
    '2017': [3, 2, 4, 4, 5, 3]
}
df = pd.DataFrame(data).set_index("fruits")
p_bar = df.plot_bokeh(
    kind="bar",
    ylabel="Price per Unit [€]",
    title="Fruit prices per Year",
    show_figure=False)

#Lineplot:
np.random.seed(42)
df = pd.DataFrame({
    "Google": np.random.randn(1000) + 0.2,
    "Apple": np.random.randn(1000) + 0.17
},
                  index=pd.date_range('1/1/2000', periods=1000))
df = df.cumsum()
df = df + 50
p_line = df.plot_bokeh(
    kind="line",
    title="Apple vs Google",
    xlabel="Date",
    ylabel="Stock price [$]",
    yticks=[0, 100, 200, 300, 400],
    ylim=(0, 400),
    colormap=["red", "blue"],
    show_figure=False)

#Scatterplot:
from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(iris["data"])
df.columns = iris["feature_names"]
df["species"] = iris["target"]
df["species"] = df["species"].map(dict(zip(range(3), iris["target_names"])))
p_scatter = df.plot_bokeh(
    kind="scatter",
    x="petal length (cm)",
    y="sepal width (cm)",
    category="species",
    title="Iris DataSet Visualization",
    show_figure=False)

#Histogram:
df_hist = pd.DataFrame({
    'a': np.random.randn(1000) + 1,
    'b': np.random.randn(1000),
    'c': np.random.randn(1000) - 1
},
                       columns=['a', 'b', 'c'])

p_hist = df_hist.plot_bokeh(
    kind="hist",
    bins=np.arange(-6, 6.5, 0.5),
    vertical_xlabel=True,
    normed=100,
    hovertool=False,
    title="Normal distributions",
    show_figure=False)

#Make Dashboard with Grid Layout:
pandas_bokeh.plot_grid([[p_line, p_bar], 
                        [p_scatter, p_hist]], plot_width=450)

Dashboard Layout

Using a combination of row and column elements (see also Bokeh Layouts) allow for a very easy general arrangement of elements. An alternative layout to the one above is:

p_line.plot_width = 900
p_hist.plot_width = 900

layout = pandas_bokeh.column(p_line,
                pandas_bokeh.row(p_scatter, p_bar),
                p_hist)

pandas_bokeh.show(layout)

Alternative Dashboard Layout

Release Notes

Release Notes can be found here.

Contributing to Pandas-Bokeh

If you wish to contribute to the development of Pandas-Bokeh you can follow the instructions on the CONTRIBUTING.md.

Download Details:
Author: PatrikHlobil
Source Code: https://github.com/PatrikHlobil/Pandas-Bokeh
License: MIT License

#pandas  #python #bokeh #Ploty