1620104580
It’s that time of the year again: time to get excited about all things cloud-native, as we gear up to connect, share and learn from fellow developers and technologists around the world at KubeCon EU 2021 next week.
Cloud-native technologies are mainstream, and our creation, Kubernetes, is core to building and operating modern software. We're working hard to create industry standards and services that make it easy for everyone to use it. Let's take a look at what's new in the world of Kubernetes at Google Cloud since the last KubeCon, and how we're making it easier for everyone to use and benefit from this foundational technology.
Google Kubernetes Engine (GKE), our managed Kubernetes service, has always been about making it easy for you to run your containerized applications, while still giving you the control you need. With GKE Autopilot, a new mode of operation for GKE, you have an automated Kubernetes experience that optimizes your clusters for production, reduces the operational cost of managing clusters, and delivers higher workload availability.
“Reducing the complexity while getting the most out of Kubernetes is key for us and GKE Autopilot does exactly that!” - STRABAG BRVZ
Customers who want advanced configuration flexibility continue to use GKE in the standard mode of operation. As customers scale up their production environments, application requirements for availability, reducing blast-radius, or distributing different types of services have grown to necessitate deployment across multiple clusters. With the recently introduced GKE multi-cluster services, the Kubernetes Services object can now span multiple clusters in a zone, across multiple zones, or across multiple regions, with minimal configuration or overhead for managing the interconnection between clusters. GKE multi-cluster services enable you to focus on the needs of your application while GKE manages your multi-cluster topology.
#google cloud platform #containers & kubernetes
1635917640
In this module, you'll learn how to work with the hash map compound data type in Rust. You'll learn how to implement loop expressions that iterate over the data in a collection such as a hash map. As an exercise, you'll write a Rust program that builds cars by looping over requested orders, testing conditions, and processing different types of data.
The Rust Playground is a browser interface to the Rust compiler. You can use the Playground to experiment with writing Rust code before you install the language locally, or when the compiler isn't available. Throughout this course, we provide Playground links to the sample code and exercises. You can work with the code even if you don't have the Rust toolchain available right now.
All code that runs in the Rust Playground can also be compiled and run in your local development environment. Don't hesitate to interact with the Rust compiler from your own computer. For more information about the Rust Playground, see the What is Rust? module.
In this module, you'll:
Explore the hash map data type and how to add and retrieve key-value pairs.
Use loop, while, and for expressions to iterate over the data in a collection.
Update the car factory program to track orders with a hash map and loop expressions.
Another common collection type in Rust is the hash map. The HashMap<K, V> type stores data by mapping each key K to its value V. While data in a vector is accessed by integer index, data in a hash map is accessed by key.
Many programming languages support a hash map type for data items, under names such as object, hash table, or dictionary.
Like vectors, hash maps are growable. The data is stored in the heap, and access to the hash map items is checked at run time.
The following example defines a hash map to track book reviews. The hash map keys are the book names, and the values are the reader reviews.
use std::collections::HashMap;
let mut reviews: HashMap<String, String> = HashMap::new();
reviews.insert(String::from("Ancient Roman History"), String::from("Very accurate."));
reviews.insert(String::from("Cooking with Rhubarb"), String::from("Sweet recipes."));
reviews.insert(String::from("Programming in Rust"), String::from("Great examples."));
Let's take a closer look at this code. On the first line, you see a new type of syntax:
use std::collections::HashMap;
The use command brings the HashMap definition from the collections portion of the Rust standard library into scope for our program. This syntax is similar to what other programming languages call an import.
We create an empty hash map with the HashMap::new method. We declare the reviews variable as mutable so we can add or remove keys and values as needed. In this example, both the hash map keys and the values use the String type.
let mut reviews: HashMap<String, String> = HashMap::new();
We add elements to the hash map by using the insert(<key>, <value>) method. In the code, the syntax is <hash_map_name>.insert():
reviews.insert(String::from("Ancient Roman History"), String::from("Very accurate."));
After we add data to the hash map, we can get a specific value for a key with the get(<key>) method:
// Look for a specific review
let book: &str = "Programming in Rust";
println!("\nReview for \'{}\': {:?}", book, reviews.get(book));
Here's the output:
Review for 'Programming in Rust': Some("Great examples.")
Note
Notice that the output displays the book review as "Some("Great examples.")" rather than just "Great examples." Because the get method returns an Option<&Value> type, Rust wraps the result of the method call in the "Some()" notation.
We can remove an entry from the hash map with the .remove() method. If we use the get method for an invalid hash map key, the get method returns "None".
// Remove book review
let obsolete: &str = "Ancient Roman History";
println!("\n'{}\' removed.", obsolete);
reviews.remove(obsolete);
// Confirm book review removed
println!("\nReview for \'{}\': {:?}", obsolete, reviews.get(obsolete));
Here's the output:
'Ancient Roman History' removed.
Review for 'Ancient Roman History': None
You can try this code and work with hash maps in this Rust Playground.
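Because get returns an Option type, you can also handle the found and not-found cases explicitly. Here's a minimal sketch (not part of the original example) that reuses the reviews hash map from above:
match reviews.get("Programming in Rust") {
    // The key exists: print the review text
    Some(review) => println!("Review found: {}", review),
    // The key doesn't exist: get returned None
    None => println!("No review found."),
}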
Exercise: Use a hash map to track car orders
In this exercise, you'll modify the car factory program to use a hash map.
You'll use hash map key-value pairs to track details about the car orders and display the output. Once again, your challenge is to complete the sample code so it compiles and runs.
You have two options to work with the sample code in this exercise: copy the code and edit it in your local development environment, or open the code in a prepared Rust Playground.
Note
In the sample code, look for the todo! macro. This macro indicates code that needs to be completed or updated.
1. Get the existing program code for the car_quality, car_factory, and main functions. Copy the following code into your editing environment, or open it in this prepared Rust Playground.
#[derive(PartialEq, Debug)]
struct Car { color: String, motor: Transmission, roof: bool, age: (Age, u32) }
#[derive(PartialEq, Debug)]
enum Transmission { Manual, SemiAuto, Automatic }
#[derive(PartialEq, Debug)]
enum Age { New, Used }
// Get the car quality by testing the value of the input argument
// - miles (u32)
// Return tuple with car age ("New" or "Used") and mileage
fn car_quality (miles: u32) -> (Age, u32) {
// Check if car has accumulated miles
// Return tuple early for Used car
if miles > 0 {
return (Age::Used, miles);
}
// Return tuple for New car, no need for "return" keyword or semicolon
(Age::New, miles)
}
// Build "Car" using input arguments
fn car_factory(order: i32, miles: u32) -> Car {
let colors = ["Blue", "Green", "Red", "Silver"];
// Prevent panic: Check color index for colors array, reset as needed
// Valid color = 1, 2, 3, or 4
// If color > 4, reduce color to valid index
let mut color = order as usize;
if color > 4 {
// color = 5 --> index 1, 6 --> 2, 7 --> 3, 8 --> 4
color = color - 4;
}
// Add variety to orders for motor type and roof type
let mut motor = Transmission::Manual;
let mut roof = true;
if order % 3 == 0 { // 3, 6, 9
motor = Transmission::Automatic;
} else if order % 2 == 0 { // 2, 4, 8, 10
motor = Transmission::SemiAuto;
roof = false;
} // 1, 5, 7, 11
// Return requested "Car"
Car {
color: String::from(colors[(color-1) as usize]),
motor: motor,
roof: roof,
age: car_quality(miles)
}
}
fn main() {
// Initialize counter variable
let mut order = 1;
// Declare a car as mutable "Car" struct
let mut car: Car;
// Order 6 cars, increment "order" for each request
// Car order #1: Used, Hard top
car = car_factory(order, 1000);
println!("{}: {:?}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
// Car order #2: Used, Convertible
order = order + 1;
car = car_factory(order, 2000);
println!("{}: {:?}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
// Car order #3: New, Hard top
order = order + 1;
car = car_factory(order, 0);
println!("{}: {:?}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
// Car order #4: New, Convertible
order = order + 1;
car = car_factory(order, 0);
println!("{}: {:?}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
// Car order #5: Used, Hard top
order = order + 1;
car = car_factory(order, 3000);
println!("{}: {:?}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
// Car order #6: Used, Hard top
order = order + 1;
car = car_factory(order, 4000);
println!("{}: {:?}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
}
2. Build the program. Make sure the code compiles and runs before you continue to the next section.
You should see the following output:
1: Used, Hard top = true, Manual, Blue, 1000 miles
2: Used, Hard top = false, SemiAuto, Green, 2000 miles
3: New, Hard top = true, Automatic, Red, 0 miles
4: New, Hard top = false, SemiAuto, Silver, 0 miles
5: Used, Hard top = true, Manual, Blue, 3000 miles
6: Used, Hard top = true, Automatic, Green, 4000 miles
The current program processes each car order and prints a summary after each order is fulfilled. Each call to the car_factory function fulfills an order and returns a Car struct with the order details. The result is stored in the car variable.
As you might have noticed, the program is missing some important functionality. It doesn't keep track of all the orders. The car variable holds only the details of the current order. Each time the car variable is updated with the result of a car_factory call, the details of the previous order are overwritten.
We need to update the program to keep track of all the orders, like a filing system. For this purpose, we'll define a hash map with <K, V> pairs. The hash map keys will correspond to the car order numbers. The hash map values will be the details of each order as defined in a Car struct.
1. Add the following code at the beginning of the main function, right after the first curly bracket {.
// Initialize a hash map for the car orders
// - Key: Car order number, i32
// - Value: Car order details, Car struct
use std::collections::HashMap;
let mut orders: HashMap<i32, Car> = HashMap;
2. Fix the syntax problem in the statement that creates the orders hash map.
Hint
Because you're creating the hash map from scratch, you'll probably want to use the new() method.
3. Build the program. Make sure the code compiles before you continue to the next section. You can ignore warning messages from the compiler.
The next step is to add each fulfilled car order to the hash map.
In the main function, we call the car_factory function for each car order. After the order is fulfilled, we call the println! macro to display the order details stored in the car variable:
// Car order #1: Used, Hard top
car = car_factory(order, 1000);
println!("{}: {}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
...
// Car order #6: Used, Hard top
order = order + 1;
car = car_factory(order, 4000);
println!("{}: {}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
We'll revise these code statements to work with the new hash map. We'll keep the calls to the car_factory function, but each returned Car struct will now be stored as part of a <K, V> pair in the hash map, and we'll update the calls to the println! macro to display the order details that are stored in the hash map.
1. In the main function, find the calls to the car_factory function and the accompanying calls to the println! macro:
// Car order #1: Used, Hard top
car = car_factory(order, 1000);
println!("{}: {}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
...
// Car order #6: Used, Hard top
order = order + 1;
car = car_factory(order, 4000);
println!("{}: {}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
2. Replace the full set of statements for all the car orders with the following revised code:
// Car order #1: Used, Hard top
car = car_factory(order, 1000);
orders(order, car);
println!("Car order {}: {:?}", order, orders.get(&order));
// Car order #2: Used, Convertible
order = order + 1;
car = car_factory(order, 2000);
orders(order, car);
println!("Car order {}: {:?}", order, orders.get(&order));
// Car order #3: New, Hard top
order = order + 1;
car = car_factory(order, 0);
orders(order, car);
println!("Car order {}: {:?}", order, orders.get(&order));
// Car order #4: New, Convertible
order = order + 1;
car = car_factory(order, 0);
orders(order, car);
println!("Car order {}: {:?}", order, orders.get(&order));
// Car order #5: Used, Hard top
order = order + 1;
car = car_factory(order, 3000);
orders(order, car);
println!("Car order {}: {:?}", order, orders.get(&order));
// Car order #6: Used, Hard top
order = order + 1;
car = car_factory(order, 4000);
orders(order, car);
println!("Car order {}: {:?}", order, orders.get(&order));
3. If you try to build the program now, you'll see compilation errors. There's a syntax problem in the statements that add the <K, V> pairs to the orders hash map. Can you spot the problem? Go ahead and fix the issue in each statement that adds an order to the hash map.
Hint
You can't assign values directly to the orders hash map. You need to use a method to do the insert.
When the program builds successfully, you'll see the following output:
Car order 1: Some(Car { color: "Blue", motor: Manual, roof: true, age: ("Used", 1000) })
Car order 2: Some(Car { color: "Green", motor: SemiAuto, roof: false, age: ("Used", 2000) })
Car order 3: Some(Car { color: "Red", motor: Automatic, roof: true, age: ("New", 0) })
Car order 4: Some(Car { color: "Silver", motor: SemiAuto, roof: false, age: ("New", 0) })
Car order 5: Some(Car { color: "Blue", motor: Manual, roof: true, age: ("Used", 3000) })
Car order 6: Some(Car { color: "Green", motor: Automatic, roof: true, age: ("Used", 4000) })
Notice that the output of the revised code is different. The println! macro displays the contents of the Car struct by showing each field name and its corresponding value.
In the next exercise, we'll use loop expressions to reduce redundancy in the code.
Use for, while, and loop expressions
Often, a program has a block of code that needs to be repeated. You can use loop expressions to tell the program how to do the repetition. To print all the entries in a phone book, for example, you could use a loop expression to tell the program to print from the first entry to the last one.
Rust offers three loop expressions to make a program repeat a block of code:
loop: Repeat, unless a manual stop occurs.
while: Repeat while a condition remains true.
for: Repeat for all values in a collection.
In this unit, we'll look at each of these loop expressions.
The loop expression creates an infinite loop. This keyword lets us repeat the actions in the expression body continuously. The actions repeat until we take some direct action to stop the loop.
The following example prints the text "We loop forever!", and it doesn't stop on its own. The println! action keeps repeating.
loop {
println!("We loop forever!");
}
When we use a loop expression, the only way to stop the loop is to intervene directly as the programmer. We can add specific code to stop the loop, or enter a keyboard instruction like Ctrl+C to stop the program execution.
The most common way to stop a loop expression is to use the break keyword to set a break point:
loop {
// Keep printing, printing, printing...
println!("We loop forever!");
// On the other hand, maybe we should stop!
break;
}
When the program encounters the break keyword, it stops executing the actions in the body of the loop expression and continues with the next code statement.
The break keyword reveals a special feature of the loop expression: by using the break keyword, you can both stop repeating the actions in the expression body and return a value at the break point.
The following example shows how we use the break keyword in a loop expression to also return a value:
let mut counter = 1;
// stop_loop is set when loop stops
let stop_loop = loop {
counter *= 2;
if counter > 100 {
// Stop loop, return counter value
break counter;
}
};
// Loop should break when counter = 128
println!("Break the loop at counter = {}.", stop_loop);
Here's the output:
Break the loop at counter = 128.
The body of our loop expression performs the following successive actions:
1. Declare the stop_loop variable and tell the program to bind its value to the result of the loop expression.
2. Do the actions in the loop expression body: increment the counter value to twice its current value.
3. Check the counter value. If the counter value is more than 100: break out of the loop and return the counter value.
4. If the counter value isn't more than 100: repeat the actions in the loop body.
5. Set the stop_loop value to the counter value, which is the result of the loop expression.
The body of a loop expression can have more than one break point. When the expression has multiple break points, every break point must return a value of the same type. All values must be of type integer, or String, or bool, for example. When a break point doesn't explicitly return a value, the program interprets the expression result as an empty tuple, ().
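To illustrate that rule, here's a small sketch (not from the original module) of a loop with two break points that both return an i32 value:
let mut counter = 1;
let result = loop {
    counter *= 2;
    if counter == 64 {
        // First break point: returns an i32
        break 0;
    }
    if counter > 100 {
        // Second break point: must also return an i32
        break counter;
    }
};
println!("Loop result = {}", result);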
The while loop uses a conditional expression. The loop repeats as long as the conditional expression remains true. This keyword lets us execute the actions in the expression body until the conditional expression is false.
A while loop begins by evaluating a boolean conditional expression. When the conditional expression evaluates to true, the actions in the body are executed. After the actions are complete, control returns to the conditional expression. When the conditional expression evaluates to false, the while expression stops.
The following example prints the text "We loop a while...". Each repetition of the loop tests the condition "counter is less than 5." While the condition remains true, the actions in the expression body are executed. After the condition is no longer true, the while loop stops and the program continues with the next code statement.
let mut counter = 0;
while counter < 5 {
println!("We loop a while...");
counter = counter + 1;
}
The for loop uses an iterator to process a collection of items. The loop repeats the actions in the expression body for each item in the collection. This type of loop repetition is called iteration. When all the iterations are complete, the loop stops.
In Rust, we can iterate over any collection type, such as an array, vector, or hash map. Rust uses an iterator to move through each item in the collection from first to last.
The for loop uses a temporary variable as the iterator. The variable is implicitly declared at the start of the loop expression, and the current value is set with each iteration.
In the following code, the collection is the big_birds array and the iterator is named bird.
let big_birds = ["ostrich", "peacock", "stork"];
for bird in big_birds
We access the items in the collection by using the iter() method. The for expression binds the current value of the iterator to the result of the iter() method. In the expression body, we can work with the iterator value.
let big_birds = ["ostrich", "peacock", "stork"];
for bird in big_birds.iter() {
println!("The {} is a big bird.", bird);
}
Here's the output:
The ostrich is a big bird.
The peacock is a big bird.
The stork is a big bird.
Another easy way to create an iterator is to use the range notation a..b. The iterator starts at the a value and continues in increments of one up to, but not including, the value b.
for number in 0..5 {
println!("{}", number * 2);
}
This code iterates over the numbers 0, 1, 2, 3, and 4, binding the value to the number variable on each repetition of the loop.
Here's the output:
0
2
4
6
8
You can run this code and explore loops in this Rust Playground.
Exercise: Use loops to iterate through data
In this exercise, we'll modify the car factory program to use loops to iterate through the car orders.
We'll update the main function to add a loop expression that processes the complete set of orders. The loop structure helps reduce redundancy in the code, and by simplifying the code we can easily increase the order volume.
In the car_factory function, we'll add another loop to avoid a run-time panic caused by an out-of-bounds value.
Again, the challenge is to complete the sample code so it compiles and runs.
You have two options to work with the sample code in this exercise: copy the code and edit it in your local development environment, or open the code in a prepared Rust Playground.
Note
In the sample code, look for the todo! macro. This macro indicates code that needs to be completed or updated.
If you closed the program code from the previous exercise, you can reopen the code in this prepared Rust Playground.
Make sure to rebuild the program and confirm that it runs without compiler errors.
We need to update the program to support more orders. The current code structure uses redundant statements to support six orders. The redundancy is unwieldy and hard to maintain.
We can simplify the structure by using a loop expression to repeat the actions and create each order. With the simplified code, we can quickly create a large number of orders.
1. In the main function, remove the following statements. This code block defines and sets the order variable, calls the car_factory function and the println! macro for each car order, and inserts each order into the orders hash map.
// Order 6 cars
// - Increment "order" after each request
// - Add each order <K, V> pair to "orders" hash map
// - Call println! to show order details from the hash map
// Initialize order variable
let mut order = 1;
// Car order #1: Used, Hard top
car = car_factory(order, 1000);
orders.insert(order, car);
println!("Car order {}: {:?}", order, orders.get(&order));
...
// Car order #6: Used, Hard top
order = order + 1;
car = car_factory(order, 4000);
orders.insert(order, car);
println!("Car order {}: {:?}", order, orders.get(&order));
2. Replace the removed statements with the following code block:
// Start with zero miles
let mut miles = 0;
todo!("Add a loop expression to fulfill orders for 6 cars, initialize `order` variable to 1") {
// Call car_factory to fulfill order
// Add order <K, V> pair to "orders" hash map
// Call println! to show order details from the hash map
car = car_factory(order, miles);
orders.insert(order, car);
println!("Car order {}: {:?}", order, orders.get(&order));
// Reset miles for order variety
if miles == 2100 {
miles = 0;
} else {
miles = miles + 700;
}
}
3. Add a loop expression that repeats the actions to create the six car orders. You'll need an order variable that's initialized to 1.
4. Build the program. Confirm that the code compiles without errors.
You should see output like the following example:
Car order 1: Some(Car { color: "Blue", motor: Manual, roof: true, age: ("New", 0) })
Car order 2: Some(Car { color: "Green", motor: SemiAuto, roof: false, age: ("Used", 700) })
Car order 3: Some(Car { color: "Red", motor: Automatic, roof: true, age: ("Used", 1400) })
Car order 4: Some(Car { color: "Silver", motor: SemiAuto, roof: false, age: ("Used", 2100) })
Car order 5: Some(Car { color: "Blue", motor: Manual, roof: true, age: ("New", 0) })
Car order 6: Some(Car { color: "Green", motor: Automatic, roof: true, age: ("Used", 700) })
The program now uses a loop to process the six car orders. What happens if we order more than six cars?
1. Update the loop expression in the main function to order 11 cars.
todo!("Update the loop expression to create 11 cars");
2. Rebuild the program. At run time, the program panics!
Compiling playground v0.0.1 (/playground)
Finished dev [unoptimized + debuginfo] target(s) in 1.26s
Running `target/debug/playground`
thread 'main' panicked at 'index out of bounds: the len is 4 but the index is 4', src/main.rs:34:29
Let's look at how to fix this problem.
In the car_factory function, we use an if/else expression to check the value of the color index for the colors array:
// Prevent panic: Check color index for colors array, reset as needed
// Valid color = 1, 2, 3, or 4
// If color > 4, reduce color to valid index
let mut color = order as usize;
if color > 4 {
// color = 5 --> index 1, 6 --> 2, 7 --> 3, 8 --> 4
color = color - 4;
}
The colors array has four elements, and the valid range for the color index is 0 - 3. The conditional expression checks whether the color index is greater than 4. We don't check for an index equal to 4 because, later in the function, we subtract 1 from the index value when we assign the car color from the array: color - 1. A color value of 4 is handled as colors[3].
The current if/else expression works well to prevent a run-time panic when we order eight cars or fewer. But when we order 11 cars, the program panics at the ninth order. We need to adjust the expression so it's more robust. To make this improvement, we'll use another loop expression.
1. In the car_factory function, replace the if/else conditional statement with a loop expression. Revise the following pseudocode statements to prevent a run-time panic when the color index value is greater than 4.
// Prevent panic: Check color index, reset as needed
// If color = 1, 2, 3, or 4 - no change needed
// If color > 4, reduce to color to a valid index
let mut color = order as usize;
todo!("Replace `if/else` condition with a loop to prevent run-time panic for color > 4");
Hint
In this case, the change from an if/else condition to a loop expression is actually quite simple.
2. Build the program. Confirm that the code compiles without errors.
You should see the following output:
Car order 1: Some(Car { color: "Blue", motor: Manual, roof: true, age: ("New", 0) })
Car order 2: Some(Car { color: "Green", motor: SemiAuto, roof: false, age: ("Used", 700) })
Car order 3: Some(Car { color: "Red", motor: Automatic, roof: true, age: ("Used", 1400) })
Car order 4: Some(Car { color: "Silver", motor: SemiAuto, roof: false, age: ("Used", 2100) })
Car order 5: Some(Car { color: "Blue", motor: Manual, roof: true, age: ("New", 0) })
Car order 6: Some(Car { color: "Green", motor: Automatic, roof: true, age: ("Used", 700) })
Car order 7: Some(Car { color: "Red", motor: Manual, roof: true, age: ("Used", 1400) })
Car order 8: Some(Car { color: "Silver", motor: SemiAuto, roof: false, age: ("Used", 2100) })
Car order 9: Some(Car { color: "Blue", motor: Automatic, roof: true, age: ("New", 0) })
Car order 10: Some(Car { color: "Green", motor: SemiAuto, roof: false, age: ("Used", 700) })
Car order 11: Some(Car { color: "Red", motor: Manual, roof: true, age: ("Used", 1400) })
In this module, you explored the different loop expressions available in Rust and discovered how to work with hash maps. Data is stored in a hash map as key-value pairs, and a hash map is growable.
The loop expression repeats the actions until we manually stop the process. A while expression repeats the actions as long as a condition remains true. The for expression is used to iterate over the items in a data collection.
In the exercises, you expanded the car program to loop over repeated actions and process all of the orders. You implemented a hash map to track the orders.
The next module in this learning path digs into how errors and faults are handled in Rust code.
Link: https://docs.microsoft.com/en-us/learn/modules/rust-loop-expressions/
1641276000
Tabular augmentation is a new experimental space that makes use of novel and traditional data generation and synthesisation techniques to improve model prediction success. It is in essence a process of modular feature engineering and observation engineering while emphasising the order of augmentation to achieve the best predicted outcome from a given information set. DeltaPy was created with finance applications in mind, but it can be broadly applied to any data-rich environment.
To take full advantage of tabular augmentation for time-series you would perform the techniques in the following order: (1) transforming, (2) interacting, (3) mapping, (4) extracting, and (5) synthesising. What follows is a practical example of how the above methodology can be used. The purpose here is to establish a framework for table augmentation and to point and guide the user to existing packages.
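As a rough sketch of what that ordering looks like in code (using the deltapy calls and the Tesla sample data that appear later in this README; the exact columns each step returns may vary by version, and the synthesising stage is not shown here):
import pandas as pd
from deltapy import transform, interact, mapper, extract

# Same sample data that the notebook loads further below
df = pd.read_csv("https://github.com/firmai/random-assets-two/raw/master/numpy/tsla.csv")
df["Close_1"] = df["Close"].shift(-1)
df = df.dropna()
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")

df_out = transform.robust_scaler(df.copy(), drop=["Close_1"])                            # (1) transforming
df_out = interact.muldiv(df_out, ["Close", "Open"])                                      # (2) interacting
df_out = mapper.pca_feature(df_out, variance_or_components=0.80, drop_cols=["Close_1"])  # (3) mapping
energy = extract.abs_energy(df["Close"])                                                 # (4) extracting
# (5) synthesising (row-wise generation) would follow as a final step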
For most, the Colab Notebook format might be preferred. I have enabled comments if you want to ask questions or address any issues you uncover. For anything pressing, use the issues tab. Also have a look at the SSRN report for more succinct insights.
Data augmentation can be defined as any method that could increase the size or improve the quality of a dataset by generating new features or instances without the collection of additional data-points. Data augmentation is of particular importance in image classification tasks where additional data can be created by cropping, padding, or flipping existing images.
Tabular cross-sectional and time-series prediction tasks can also benefit from augmentation. Here we divide tabular augmentation into columnar and row-wise methods. Row-wise methods are further divided into extraction and data synthesisation techniques, whereas columnar methods are divided into transformation, interaction, and mapping methods.
See the Skeleton Example for a combination of multiple methods that leads to a halving of the mean squared error.
pip install deltapy
@software{deltapy,
title = {{DeltaPy}: Tabular Data Augmentation},
author = {Snow, Derek},
url = {https://github.com/firmai/deltapy/},
version = {0.1.0},
date = {2020-04-11},
}
Snow, Derek, DeltaPy: A Framework for Tabular Data Augmentation in Python (April 22, 2020). Available at SSRN: https://ssrn.com/abstract=3582219
Transformation
df_out = transform.robust_scaler(df.copy(), drop=["Close_1"]); df_out.head()
df_out = transform.standard_scaler(df.copy(), drop=["Close"]); df_out.head()
df_out = transform.fast_fracdiff(df.copy(), ["Close","Open"],0.5); df_out.head()
df_out = transform.windsorization(df.copy(),"Close",para,strategy='both'); df_out.head()
df_out = transform.operations(df.copy(),["Close"]); df_out.head()
df_out = transform.triple_exponential_smoothing(df.copy(),["Close"], 12, .2,.2,.2,0);
df_out = transform.naive_dec(df.copy(), ["Close","Open"]); df_out.head()
df_out = transform.bkb(df.copy(), ["Close"]); df_out.head()
df_out = transform.butter_lowpass_filter(df.copy(),["Close"],4); df_out.head()
df_out = transform.instantaneous_phases(df.copy(), ["Close"]); df_out.head()
df_out = transform.kalman_feat(df.copy(), ["Close"]); df_out.head()
df_out = transform.perd_feat(df.copy(),["Close"]); df_out.head()
df_out = transform.fft_feat(df.copy(), ["Close"]); df_out.head()
df_out = transform.harmonicradar_cw(df.copy(), ["Close"],0.3,0.2); df_out.head()
df_out = transform.saw(df.copy(),["Close","Open"]); df_out.head()
df_out = transform.modify(df.copy(),["Close"]); df_out.head()
df_out = transform.multiple_rolling(df, columns=["Close"]); df_out.head()
df_out = transform.multiple_lags(df, start=1, end=3, columns=["Close"]); df_out.head()
df_out = transform.prophet_feat(df.copy().reset_index(),["Close","Open"],"Date", "D"); df_out.head()
Interaction
df_out = interact.lowess(df.copy(), ["Open","Volume"], df["Close"], f=0.25, iter=3); df_out.head()
df_out = interact.autoregression(df.copy()); df_out.head()
df_out = interact.muldiv(df.copy(), ["Close","Open"]); df_out.head()
df_out = interact.decision_tree_disc(df.copy(), ["Close"]); df_out.head()
df_out = interact.quantile_normalize(df.copy(), drop=["Close"]); df_out.head()
df_out = interact.tech(df.copy()); df_out.head()
df_out = interact.genetic_feat(df.copy()); df_out.head()
Mapping
df_out = mapper.pca_feature(df.copy(),variance_or_components=0.80,drop_cols=["Close_1"]); df_out.head()
df_out = mapper.cross_lag(df.copy()); df_out.head()
df_out = mapper.a_chi(df.copy()); df_out.head()
df_out = mapper.encoder_dataset(df.copy(), ["Close_1"], 15); df_out.head()
df_out = mapper.lle_feat(df.copy(),["Close_1"],4); df_out.head()
df_out = mapper.feature_agg(df.copy(),["Close_1"],4 ); df_out.head()
df_out = mapper.neigh_feat(df.copy(),["Close_1"],4 ); df_out.head()
Extraction
extract.abs_energy(df["Close"])
extract.cid_ce(df["Close"], True)
extract.mean_abs_change(df["Close"])
extract.mean_second_derivative_central(df["Close"])
extract.variance_larger_than_standard_deviation(df["Close"])
extract.var_index(df["Close"].values,var_index_param)
extract.symmetry_looking(df["Close"])
extract.has_duplicate_max(df["Close"])
extract.partial_autocorrelation(df["Close"])
extract.augmented_dickey_fuller(df["Close"])
extract.gskew(df["Close"])
extract.stetson_mean(df["Close"])
extract.length(df["Close"])
extract.count_above_mean(df["Close"])
extract.longest_strike_below_mean(df["Close"])
extract.wozniak(df["Close"])
extract.last_location_of_maximum(df["Close"])
extract.fft_coefficient(df["Close"])
extract.ar_coefficient(df["Close"])
extract.index_mass_quantile(df["Close"])
extract.number_cwt_peaks(df["Close"])
extract.spkt_welch_density(df["Close"])
extract.linear_trend_timewise(df["Close"])
extract.c3(df["Close"])
extract.binned_entropy(df["Close"])
extract.svd_entropy(df["Close"].values)
extract.hjorth_complexity(df["Close"])
extract.max_langevin_fixed_point(df["Close"])
extract.percent_amplitude(df["Close"])
extract.cad_prob(df["Close"])
extract.zero_crossing_derivative(df["Close"])
extract.detrended_fluctuation_analysis(df["Close"])
extract.fisher_information(df["Close"])
extract.higuchi_fractal_dimension(df["Close"])
extract.petrosian_fractal_dimension(df["Close"])
extract.hurst_exponent(df["Close"])
extract.largest_lyauponov_exponent(df["Close"])
extract.whelch_method(df["Close"])
extract.find_freq(df["Close"])
extract.flux_perc(df["Close"])
extract.range_cum_s(df["Close"])
extract.structure_func(df["Close"])
extract.kurtosis(df["Close"])
extract.stetson_k(df["Close"])
Test sets should ideally not be preprocessed together with the training data, as that would let information leak ahead into the model. The preprocessing parameters should be identified on the training set and then applied to the test set, i.e., the test set should not have an impact on the transformation applied. As an example, you would learn the parameters of a PCA decomposition on the training set and then apply those parameters to both the train and the test set.
The benefit of pipelines becomes clear when one wants to apply multiple augmentation methods: they make it easy to learn the parameters and then apply them widely. For the most part, this notebook does not concern itself with 'peeking ahead' or pipelines; for some functions, one might have to restructure the code and make use of open source packages to create your preferred solution.
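A minimal sketch of that train-only fitting pattern, using scikit-learn (not one of the listed notebook dependencies) and a hypothetical 150-row training split of the df loaded below:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical split: first 150 rows for training, the rest for testing
train, test = df.iloc[:150], df.iloc[150:]

scaler = StandardScaler().fit(train)                     # parameters learned on the training set only
pca = PCA(n_components=3).fit(scaler.transform(train))   # PCA also fitted on the training set

train_feats = pca.transform(scaler.transform(train))     # the same fitted parameters are applied
test_feats = pca.transform(scaler.transform(test))       # to both sets, so the test set never leaks in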
Notebook Dependencies
pip install deltapy
pip install pykalman
pip install tsaug
pip install ta
pip install pandasvault
pip install gplearn
pip install seasonal
import pandas as pd
import numpy as np
from deltapy import transform, interact, mapper, extract
import warnings
warnings.filterwarnings('ignore')
def data_copy():
df = pd.read_csv("https://github.com/firmai/random-assets-two/raw/master/numpy/tsla.csv")
df["Close_1"] = df["Close"].shift(-1)
df = df.dropna()
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")
return df
df = data_copy(); df.head()
Some of these categories are fluid, and some techniques could fit into multiple buckets. This is an attempt to find an exhaustive number of techniques, but not an exhaustive list of implementations of the techniques. For example, there are thousands of ways to smooth a time-series, but we have only included 1-2 techniques of interest under each category.
Here a transformation is any method that takes only one feature as input to produce one or more new features. Transformations can be applied to cross-section and time-series data. Some transformations are exclusive to time-series data (smoothing, filtering), but a handful of functions apply to both.
Where a time-series method has a centred mean, or is forward-looking, the output series has to be recalculated on a running basis to ensure that information from the future does not leak into the model. The last value of this recalculated series, or a feature extracted from it, can then be used as a running value that is only backward-looking, satisfying the no-'peeking'-ahead rule.
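A minimal sketch of that running recalculation idea (my own illustration, not a deltapy function), where a forward-looking transform is recomputed on an expanding history and only its last value is kept:
def running_recalc(series, transform_fn, min_obs=30):
    # Recompute the transform on history up to time t and keep only its last value,
    # so the feature at time t never uses information after t.
    values = []
    for t in range(len(series)):
        if t + 1 < min_obs:
            values.append(float("nan"))
        else:
            values.append(transform_fn(series.iloc[: t + 1]).iloc[-1])
    return pd.Series(values, index=series.index)

# Example: a centred rolling mean, made backward-looking by running recalculation
# df["Close_RUN"] = running_recalc(df["Close"], lambda s: s.rolling(5, center=True, min_periods=1).mean())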
There are some packages in Python that dynamically create time series and extract their features, but none that incorporates the dynamic creation of a time series in combination with a wide application of a prespecified list of extractions. Because this technique is expensive, we have a preference for models that only take historical data into account.
In this section we include a list of all types of transformations: those that only use present information (operations), those that incorporate all values (interpolation methods), those that only include past values (smoothing functions), and those that incorporate a subset window of lagging and leading values (select filters). Only those that use historical values, or that are turned into prediction methods, can be used out of the box. The entire time series can be used in the model development process for historical-value methods, and only the forecasted values can be used for prediction models.
Curve fitting can involve either interpolation, where an exact fit to the data is required, or smoothing, in which a "smooth" function is constructed that approximately fits the data. When using an interpolation method, you are taking future information into account, e.g., a cubic spline. You can use interpolation methods to forecast into the future (extrapolation) and then use those forecasts in a training set. Or you could recalculate the interpolation for each time step and then extract features out of that series (extraction method). Interpolation and other forward-looking methods can be used if they are turned into prediction problems; the forecasted values can then be trained and tested on, and the fitted data can be disregarded. In the list presented below, the first five methods can be used for cross-section and time-series data; after that, the time-series-only methods follow.
There are a multitude of scaling methods available. Scaling generally gets applied to the entire dataset and is especially necessary for certain algorithms. K-means makes use of Euclidean distance, hence the need for scaling. For PCA, because we are trying to identify the feature with maximum variance, we also need scaling. Similarly, we need scaled features for gradient descent. Any algorithm that is not based on a distance measure is not affected by feature scaling. Some of the methods include range scalers like the minimum-maximum scaler or the maximum absolute scaler; standardisation methods like the standard scaler can also be used. The example used here is the robust scaler. Normalisation is a good technique when you don't know the distribution of the data. Scaling looks into the future, so parameters have to be trained on a training set and applied to a test set.
(i) Robust Scaler
Scaling according to the interquartile range, making it robust to outliers.
def robust_scaler(df, drop=None,quantile_range=(25, 75) ):
if drop:
keep = df[drop]
df = df.drop(drop, axis=1)
center = np.median(df, axis=0)
quantiles = np.percentile(df, quantile_range, axis=0)
scale = quantiles[1] - quantiles[0]
df = (df - center) / scale
if drop:
df = pd.concat((keep,df),axis=1)
return df
df_out = transform.robust_scaler(df.copy(), drop=["Close_1"]); df_out.head()
When using a standardisation method, it is often more effective when the attribute itself is Gaussian. It is also useful to apply the technique when the model you want to use makes assumptions of Gaussian distributions, like linear regression, logistic regression, and linear discriminant analysis. For most applications, standardisation is recommended.
(i) Standard Scaler
Standardize features by removing the mean and scaling to unit variance
def standard_scaler(df,drop ):
if drop:
keep = df[drop]
df = df.drop(drop, axis=1)
mean = np.mean(df, axis=0)
scale = np.std(df, axis=0)
df = (df - mean) / scale
if drop:
df = pd.concat((keep,df),axis=1)
return df
df_out = transform.standard_scaler(df.copy(), drop=["Close"]); df_out.head()
Computing the differences between consecutive observations, normally used to obtain a stationary time series.
(i) Fractional Differencing
Fractional differencing allows us to achieve stationarity while maintaining the maximum amount of memory, compared to integer differencing.
import pylab as pl
def fast_fracdiff(x, cols, d):
for col in cols:
T = len(x[col])
np2 = int(2 ** np.ceil(np.log2(2 * T - 1)))
k = np.arange(1, T)
b = (1,) + tuple(np.cumprod((k - d - 1) / k))
z = (0,) * (np2 - T)
z1 = b + z
z2 = tuple(x[col]) + z
dx = pl.ifft(pl.fft(z1) * pl.fft(z2))
x[col+"_frac"] = np.real(dx[0:T])
return x
df_out = transform.fast_fracdiff(df.copy(), ["Close","Open"],0.5); df_out.head()
Any method that sets a floor and a cap on a feature's value. Capping can affect the distribution of data, so it should not be exaggerated. One can cap values by using the average, by using the max and min values, or by an arbitrary extreme value.
(i) Winsorisation
The transformation of features by limiting extreme values in the statistical data to reduce the effect of possibly spurious outliers, replacing them with a certain percentile value.
def outlier_detect(data,col,threshold=1,method="IQR"):
    if method == "IQR":
        IQR = data[col].quantile(0.75) - data[col].quantile(0.25)
        Lower_fence = data[col].quantile(0.25) - (IQR * threshold)
        Upper_fence = data[col].quantile(0.75) + (IQR * threshold)
    if method == "STD":
        Upper_fence = data[col].mean() + threshold * data[col].std()
        Lower_fence = data[col].mean() - threshold * data[col].std()
    if method == "OWN":
        Upper_fence = data[col].mean() + threshold * data[col].std()
        Lower_fence = data[col].mean() - threshold * data[col].std()
    if method =="MAD":
        median = data[col].median()
        median_absolute_deviation = np.median([np.abs(y - median) for y in data[col]])
        modified_z_scores = pd.Series([0.6745 * (y - median) / median_absolute_deviation for y in data[col]])
        outlier_index = np.abs(modified_z_scores) > threshold
        print('Num of outlier detected:',outlier_index.value_counts()[1])
        print('Proportion of outlier detected',outlier_index.value_counts()[1]/len(outlier_index))
        return outlier_index, (median_absolute_deviation, median_absolute_deviation)
    para = (Upper_fence, Lower_fence)
    tmp = pd.concat([data[col]>Upper_fence,data[col]<Lower_fence],axis=1)
    outlier_index = tmp.any(axis=1)
    print('Num of outlier detected:',outlier_index.value_counts()[1])
    print('Proportion of outlier detected',outlier_index.value_counts()[1]/len(outlier_index))
    return outlier_index, para
def windsorization(data,col,para,strategy='both'):
    """
    top-coding & bottom coding (capping the maximum of a distribution at an arbitrarily set value, vice versa)
    """
    data_copy = data.copy(deep=True)
    if strategy == 'both':
        data_copy.loc[data_copy[col]>para[0],col] = para[0]
        data_copy.loc[data_copy[col]<para[1],col] = para[1]
    elif strategy == 'top':
        data_copy.loc[data_copy[col]>para[0],col] = para[0]
    elif strategy == 'bottom':
        data_copy.loc[data_copy[col]<para[1],col] = para[1]
    return data_copy
_, para = transform.outlier_detect(df, "Close")
df_out = transform.windsorization(df.copy(),"Close",para,strategy='both'); df_out.head()
Operations here are treated like traditional transformations. It is the replacement of a variable by a function of that variable. In a stronger sense, a transformation is a replacement that changes the shape of a distribution or relationship.
(i) Power, Log, Reciprocal, Square Root
def operations(df,features):
df_new = df[features]
df_new = df_new - df_new.min()
sqr_name = [str(fa)+"_POWER_2" for fa in df_new.columns]
log_p_name = [str(fa)+"_LOG_p_one_abs" for fa in df_new.columns]
rec_p_name = [str(fa)+"_RECIP_p_one" for fa in df_new.columns]
sqrt_name = [str(fa)+"_SQRT_p_one" for fa in df_new.columns]
df_sqr = pd.DataFrame(np.power(df_new.values, 2),columns=sqr_name, index=df.index)
df_log = pd.DataFrame(np.log(df_new.add(1).abs().values),columns=log_p_name, index=df.index)
df_rec = pd.DataFrame(np.reciprocal(df_new.add(1).values),columns=rec_p_name, index=df.index)
df_sqrt = pd.DataFrame(np.sqrt(df_new.abs().add(1).values),columns=sqrt_name, index=df.index)
dfs = [df, df_sqr, df_log, df_rec, df_sqrt]
df= pd.concat(dfs, axis=1)
return df
df_out = transform.operations(df.copy(),["Close"]); df_out.head()
Here we maintain that any method with a component of historical averaging is a smoothing method, such as a simple moving average and single, double, and triple exponential smoothing. These forms of filters are also popular in signal processing, where exponential smoothing is called an IIR filter and a moving average an FIR filter with equal weighting factors.
(i) Triple Exponential Smoothing (Holt-Winters Exponential Smoothing)
The Holt-Winters seasonal method comprises the forecast equation and three smoothing equations: one for the level $\ell_t$, one for the trend $b_t$, and one for the seasonal component $s_t$. This particular version is performed by looking at the last 12 periods. For that reason, the first 12 records should be disregarded because they can't make use of the required window size for a fair calculation. The calculation is such that values are still provided for those periods based on whatever data might be available.
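For reference (not stated explicitly in the original text), the standard additive Holt-Winters updates and one-step forecast that the code below implements are, with season length $m$ (here 12):
$\ell_t = \alpha (y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})$
$b_t = \beta (\ell_t - \ell_{t-1}) + (1 - \beta) b_{t-1}$
$s_t = \gamma (y_t - \ell_t) + (1 - \gamma) s_{t-m}$
$\hat{y}_{t+1} = \ell_t + b_t + s_{t+1-m}$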
def initial_trend(series, slen):
    sum = 0.0
    for i in range(slen):
        sum += float(series[i+slen] - series[i]) / slen
    return sum / slen
def initial_seasonal_components(series, slen):
    seasonals = {}
    season_averages = []
    n_seasons = int(len(series)/slen)
    # compute season averages
    for j in range(n_seasons):
        season_averages.append(sum(series[slen*j:slen*j+slen])/float(slen))
    # compute initial values
    for i in range(slen):
        sum_of_vals_over_avg = 0.0
        for j in range(n_seasons):
            sum_of_vals_over_avg += series[slen*j+i]-season_averages[j]
        seasonals[i] = sum_of_vals_over_avg/n_seasons
    return seasonals
def triple_exponential_smoothing(df,cols, slen, alpha, beta, gamma, n_preds):
    for col in cols:
        result = []
        seasonals = initial_seasonal_components(df[col], slen)
        for i in range(len(df[col])+n_preds):
            if i == 0: # initial values
                smooth = df[col][0]
                trend = initial_trend(df[col], slen)
                result.append(df[col][0])
                continue
            if i >= len(df[col]): # we are forecasting
                m = i - len(df[col]) + 1
                result.append((smooth + m*trend) + seasonals[i%slen])
            else:
                val = df[col][i]
                last_smooth, smooth = smooth, alpha*(val-seasonals[i%slen]) + (1-alpha)*(smooth+trend)
                trend = beta * (smooth-last_smooth) + (1-beta)*trend
                seasonals[i%slen] = gamma*(val-smooth) + (1-gamma)*seasonals[i%slen]
                result.append(smooth+trend+seasonals[i%slen])
        df[col+"_TES"] = result
        #print(seasonals)
    return df
df_out= transform.triple_exponential_smoothing(df.copy(),["Close"], 12, .2,.2,.2,0); df_out.head()
Decomposition procedures are used in time series to describe the trend and seasonal factors in a time series. More extensive decompositions might also include long-run cycles, holiday effects, day of week effects and so on. Here, we’ll only consider trend and seasonal decompositions. A naive decomposition makes use of moving averages, other decomposition methods are available that make use of LOESS.
(i) Naive Decomposition
The base trend takes historical information into account and establishes moving averages; it does not have to be linear. To estimate the seasonal component for each season, simply average the detrended values for that season. If the seasonal variation looks constant, we should use the additive model; if the magnitude is increasing as a function of time, we will use the multiplicative model. Here, because it is predictive in nature, we are using a one-sided moving average, as opposed to a two-sided centred average.
import statsmodels.api as sm
def naive_dec(df, columns, freq=2):
    for col in columns:
        decomposition = sm.tsa.seasonal_decompose(df[col], model='additive', freq = freq, two_sided=False)
        # Distinct suffixes so the seasonal and residual components don't overwrite the trend
        df[col+"_NDDT"] = decomposition.trend
        df[col+"_NDDS"] = decomposition.seasonal
        df[col+"_NDDR"] = decomposition.resid
    return df
df_out = transform.naive_dec(df.copy(), ["Close","Open"]); df_out.head()
It is often useful to either low-pass filter (smooth) a time series in order to reveal low-frequency features and trends, or to high-pass filter (detrend) a time series in order to isolate high-frequency transients (e.g. storms). Low-pass filters use historical values; high-pass filters detrend with low-pass filters, so they also indirectly use historical values.
There are a few filters available, closely associated with decompositions and smoothing functions. The Hodrick-Prescott filter separates a time series $y_t$ into a trend $\tau_t$ and a cyclical component $\zeta_t$. The Christiano-Fitzgerald filter is a generalisation of the Baxter-King filter and can be seen as a weighted moving average.
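As a hedged aside, the Hodrick-Prescott filter mentioned above is available in statsmodels; like the other two-sided methods here, its output would need to be recalculated on a running basis (or lagged) before being used as a predictive feature:
import statsmodels.api as sm

# hpfilter returns the cyclical and trend components of the series
cycle, trend = sm.tsa.filters.hpfilter(df["Close"], lamb=1600)
df["Close_HPCYCLE"], df["Close_HPTREND"] = cycle, trend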
(i) Baxter-King Bandpass
The Baxter-King filter is intended to explicitly deal with the periodicity of the business cycle. By applying their band-pass filter to a series, they produce a new series that does not contain fluctuations at frequencies higher or lower than those of the business cycle. The parameters are arbitrarily chosen. This method uses a centred moving average that has to be changed to a lagged moving average before it can be used as an input feature. The maximum period of oscillation should be used as the point to truncate the dataset, as that part of the time series does not incorporate all the required data points.
import statsmodels.api as sm
def bkb(df, cols):
for col in cols:
df[col+"_BPF"] = sm.tsa.filters.bkfilter(df[[col]].values, 2, 10, len(df)-1)
return df
df_out = transform.bkb(df.copy(), ["Close"]); df_out.head()
(ii) Butter Lowpass (IIR Filter Design)
The Butterworth filter is a type of signal processing filter designed to have a frequency response as flat as possible in the passband. Like other filters, the first few values have to be disregarded for accurate downstream prediction. Instead of disregarding these values on a per-case basis, they can be disregarded in one chunk once the database of transformed features has been developed.
from scipy import signal, integrate
def butter_lowpass(cutoff, fs=20, order=5):
nyq = 0.5 * fs
normal_cutoff = cutoff / nyq
b, a = signal.butter(order, normal_cutoff, btype='low', analog=False)
return b, a
def butter_lowpass_filter(df,cols, cutoff, fs=20, order=5):
b, a = butter_lowpass(cutoff, fs, order=order)
for col in cols:
df[col+"_BUTTER"] = signal.lfilter(b, a, df[col])
return df
df_out = transform.butter_lowpass_filter(df.copy(),["Close"],4); df_out.head()
(iii) Hilbert Transform Angle
The Hilbert transform is a time-domain to time-domain transformation which shifts the phase of a signal by 90 degrees. It is also a centred measure and would be difficult to use in a time series prediction setting, unless it is recalculated on a per step basis or transformed to be based on historical values only.
from scipy import signal
import numpy as np
def instantaneous_phases(df,cols):
for col in cols:
df[col+"_HILLB"] = np.unwrap(np.angle(signal.hilbert(df[col], axis=0)), axis=0)
return df
df_out = transform.instantaneous_phases(df.copy(), ["Close"]); df_out.head()
(iv) Unscented Kalman Filter
The Kalman filter is better suited for estimating things that change over time. The most tangible example is tracking moving objects. A Kalman filter will be very close to the actual trajectory because it says the most recent measurement is more important than the older ones. The Unscented Kalman Filter (UKF) is a model-based technique that recursively estimates the states (and with some modifications also the parameters) of a nonlinear, dynamic, discrete-time system. The UKF is based on the typical prediction-correction style methods. The Kalman Smoother incorporates future values; the Filter doesn't, and can be used for online prediction. The normal Kalman filter is a forward filter in the sense that it makes a forecast of the current state using only current and past observations, whereas the smoother is based on computing a suitable linear combination of two filters, which are run in forward and backward directions.
from pykalman import UnscentedKalmanFilter
def kalman_feat(df, cols):
for col in cols:
ukf = UnscentedKalmanFilter(lambda x, w: x + np.sin(w), lambda x, v: x + v, observation_covariance=0.1)
(filtered_state_means, filtered_state_covariances) = ukf.filter(df[col])
(smoothed_state_means, smoothed_state_covariances) = ukf.smooth(df[col])
df[col+"_UKFSMOOTH"] = smoothed_state_means.flatten()
df[col+"_UKFFILTER"] = filtered_state_means.flatten()
return df
df_out = transform.kalman_feat(df.copy(), ["Close"]); df_out.head()
There are a range of functions for spectral analysis. You can use periodograms and the Welch method to estimate the power spectral density. You can also use the Welch method to estimate the cross power spectral density. Other techniques include spectrograms, Lomb-Scargle periodograms, and the short-time Fourier transform.
(i) Periodogram
This returns an array of sample frequencies and the power spectrum of x, or the power spectral density of x.
from scipy import signal
def perd_feat(df, cols):
for col in cols:
sig = signal.periodogram(df[col],fs=1, return_onesided=False)
df[col+"_FREQ"] = sig[0]
df[col+"_POWER"] = sig[1]
return df
df_out = transform.perd_feat(df.copy(),["Close"]); df_out.head()
(ii) Fast Fourier Transform
The FFT, or fast Fourier transform, is an algorithm that essentially uses convolution techniques to efficiently find the magnitude and location of the tones that make up the signal of interest. We can often play with the FFT spectrum by adding and removing successive tones (which is akin to selectively filtering particular tones that make up the signal) in order to obtain a smoothed version of the underlying signal. This takes the entire signal into account, and as a result has to be recalculated on a running basis to avoid peeking into the future.
def fft_feat(df, cols):
for col in cols:
fft_df = np.fft.fft(np.asarray(df[col].tolist()))
fft_df = pd.DataFrame({'fft':fft_df})
df[col+'_FFTABS'] = fft_df['fft'].apply(lambda x: np.abs(x)).values
df[col+'_FFTANGLE'] = fft_df['fft'].apply(lambda x: np.angle(x)).values
return df
df_out = transform.fft_feat(df.copy(), ["Close"]); df_out.head()
The waveform of a signal is the shape of its graph as a function of time.
(i) Continuous Wave Radar
from scipy import signal
def harmonicradar_cw(df, cols, fs,fc):
for col in cols:
ttxt = f'CW: {fc} Hz'
#%% input
t = df[col]
tx = np.sin(2*np.pi*fc*t)
_,Pxx = signal.welch(tx,fs)
#%% diode
d = (signal.square(2*np.pi*fc*t))
d[d<0] = 0.
#%% output of diode
rx = tx * d
df[col+"_HARRAD"] = rx.values
return df
df_out = transform.harmonicradar_cw(df.copy(), ["Close"],0.3,0.2); df_out.head()
(ii) Saw Tooth
Return a periodic sawtooth or triangle waveform.
def saw(df, cols):
for col in cols:
df[col+" SAW"] = signal.sawtooth(df[col])
return df
df_out = transform.saw(df.copy(),["Close","Open"]); df_out.head()
(9) Modifications
A range of modifications usually applied to images; these values would have to be recalculated for each time series.
(i) Various Techniques
from tsaug import *
def modify(df, cols):
for col in cols:
series = df[col].values
df[col+"_magnify"], _ = magnify(series, series)
df[col+"_affine"], _ = affine(series, series)
df[col+"_crop"], _ = crop(series, series)
df[col+"_cross_sum"], _ = cross_sum(series, series)
df[col+"_resample"], _ = resample(series, series)
df[col+"_trend"], _ = trend(series, series)
df[col+"_random_affine"], _ = random_time_warp(series, series)
df[col+"_random_crop"], _ = random_crop(series, series)
df[col+"_random_cross_sum"], _ = random_cross_sum(series, series)
df[col+"_random_sidetrack"], _ = random_sidetrack(series, series)
df[col+"_random_time_warp"], _ = random_time_warp(series, series)
df[col+"_random_magnify"], _ = random_magnify(series, series)
df[col+"_random_jitter"], _ = random_jitter(series, series)
df[col+"_random_trend"], _ = random_trend(series, series)
return df
df_out = transform.modify(df.copy(),["Close"]); df_out.head()
Features that are calculated on a rolling basis over a fixed window size.
(i) Mean, Standard Deviation
def multiple_rolling(df, windows = [1,2], functions=["mean","std"], columns=None):
    windows = [1+a for a in windows]
    if not columns:
        columns = df.columns.to_list()
    rolling_dfs = (df[columns].rolling(i)                      # 1. Create window
                   .agg(functions)                             # 1. Aggregate
                   .rename({col: '{0}_{1:d}'.format(col, i)
                            for col in columns}, axis=1)       # 2. Rename columns
                   for i in windows)                           # For each window
    df_out = pd.concat((df, *rolling_dfs), axis=1)             # 3. Concatenate dataframes
    da = df_out.iloc[:,len(df.columns):]
    da = [col[0] + "_" + col[1] for col in da.columns.to_list()]
    df_out.columns = df.columns.to_list() + da
    return df_out
df_out = transform.multiple_rolling(df, columns=["Close"]); df_out.head()
Lagged values from existing features.
(i) Single Steps
def multiple_lags(df, start=1, end=3,columns=None):
if not columns:
columns = df.columns.to_list()
lags = range(start, end+1) # Just two lags for demonstration.
df = df.assign(**{
'{}_t_{}'.format(col, t): df[col].shift(t)
for t in lags
for col in columns
})
return df
df_out = transform.multiple_lags(df, start=1, end=3, columns=["Close"]); df_out.head()
There are a range of time series models that can be implemented, like AR, MA, ARMA, ARIMA, SARIMA, SARIMAX, VAR, VARMA, VARMAX, SES, and HWES. The models can be divided into autoregressive models and smoothing models. In an autoregression model, we forecast the variable of interest using a linear combination of past values of the variable. Each method might require specific tuning and parameters to suit your prediction task. You need to drop a certain amount of historical data that you use during the fitting stage. Models that take seasonality into account need more training data.
(i) Prophet
Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality. You can apply additive models to your training data, but also interactive models like deep learning models. The problem is that because these models have learned from future observations, there is a need to recalculate the time series on a running basis, or to only include the predicted values, as opposed to the fitted values, in future training and test sets. In this example, I train on 150 data points to illustrate how the remaining 100 or so data points can be used in a new prediction problem. You can plot with df["Close_PROPHET"].plot() to see the effect.
from fbprophet import Prophet
def prophet_feat(df, cols, date, freq, train_size=150):
    def prophet_dataframe(df):
        df.columns = ['ds','y']
        return df
    def original_dataframe(df, freq, name):
        prophet_pred = pd.DataFrame({"Date" : df['ds'], name : df["yhat"]})
        prophet_pred = prophet_pred.set_index("Date")
        #prophet_pred.index.freq = pd.tseries.frequencies.to_offset(freq)
        return prophet_pred[name].values
    for col in cols:
        model = Prophet(daily_seasonality=True)
        fb = model.fit(prophet_dataframe(df[[date, col]].head(train_size)))
        forecast_len = len(df) - train_size
        future = model.make_future_dataframe(periods=forecast_len,freq=freq)
        future_pred = model.predict(future)
        df[col+"_PROPHET"] = list(original_dataframe(future_pred,freq,col))
    return df
df_out = transform.prophet_feat(df.copy().reset_index(),["Close","Open"],"Date", "D"); df_out.head()
Interactions are defined as methods that require more than one feature to create an additional feature. Here we include normalising and discretising techniques that are non-feature-specific. Almost all of these methods can be applied to cross-sectional data. The only methods that are time-specific are the technical features in the speciality section and the autoregression model.
Regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome variable') and one or more independent variables.
(i) Lowess Smoother
The lowess smoother is a robust locally weighted regression. The function fits a nonparametric regression curve to a scatterplot.
from math import ceil
import numpy as np
from scipy import linalg
import math
def lowess(df, cols, y, f=2. / 3., iter=3):
    for col in cols:
        n = len(df[col])
        r = int(ceil(f * n))
        h = [np.sort(np.abs(df[col] - df[col][i]))[r] for i in range(n)]
        w = np.clip(np.abs((df[col][:, None] - df[col][None, :]) / h), 0.0, 1.0)
        w = (1 - w ** 3) ** 3
        yest = np.zeros(n)
        delta = np.ones(n)
        for iteration in range(iter):
            for i in range(n):
                weights = delta * w[:, i]
                b = np.array([np.sum(weights * y), np.sum(weights * y * df[col])])
                A = np.array([[np.sum(weights), np.sum(weights * df[col])],
                              [np.sum(weights * df[col]), np.sum(weights * df[col] * df[col])]])
                beta = linalg.solve(A, b)
                yest[i] = beta[0] + beta[1] * df[col][i]
            residuals = y - yest
            s = np.median(np.abs(residuals))
            delta = np.clip(residuals / (6.0 * s), -1, 1)
            delta = (1 - delta ** 2) ** 2
        df[col+"_LOWESS"] = yest
    return df
df_out = interact.lowess(df.copy(), ["Open","Volume"], df["Close"], f=0.25, iter=3); df_out.head()
Autoregression
Autoregression is a time series model that uses observations from previous time steps as input to a regression equation to predict the value at the next time step.
from statsmodels.tsa.ar_model import AR
from timeit import default_timer as timer
def autoregression(df, drop=None, settings={"autoreg_lag":4}):
    autoreg_lag = settings["autoreg_lag"]
    if drop:
        keep = df[drop]
        df = df.drop([drop],axis=1).values
    n_channels = df.shape[0]
    t = timer()
    channels_regg = np.zeros((n_channels, autoreg_lag + 1))
    for i in range(0, n_channels):
        fitted_model = AR(df.values[i, :]).fit(autoreg_lag)
        # TODO: This is not the same as Matlab's for some reasons!
        # kk = ARMAResults(fitted_model)
        # autore_vals, dummy1, dummy2 = arburg(x[i, :], autoreg_lag) # This looks like Matlab's but slow
        channels_regg[i, 0: len(fitted_model.params)] = np.real(fitted_model.params)
    for i in range(channels_regg.shape[1]):
        df["LAG_"+str(i+1)] = channels_regg[:,i]
    if drop:
        df = pd.concat((keep,df),axis=1)
    t = timer() - t
    return df
df_out = interact.autoregression(df.copy()); df_out.head()
Looking at interactions between different features. Here the methods employed are multiplication and division.
(i) Multiplication and Division
def muldiv(df, feature_list):
    for feat in feature_list:
        for feat_two in feature_list:
            if feat==feat_two:
                continue
            else:
                df[feat+"/"+feat_two] = df[feat]/(df[feat_two]-df[feat_two].min()) #zero division guard
                df[feat+"_X_"+feat_two] = df[feat]*(df[feat_two])
    return df
df_out = interact.muldiv(df.copy(), ["Close","Open"]); df_out.head()
In statistics and machine learning, discretization refers to the process of converting or partitioning continuous attributes, features, or variables into discretized or nominal attributes.
(i) Decision Tree Discretiser
The first method that will be applied here is a supervised discretiser. Discretisation with decision trees consists of using a decision tree to identify the optimal splitting points that would determine the bins or contiguous intervals.
from sklearn.tree import DecisionTreeRegressor
def decision_tree_disc(df, cols, depth=4 ):
for col in cols:
df[col +"_m1"] = df[col].shift(1)
df = df.iloc[1:,:]
tree_model = DecisionTreeRegressor(max_depth=depth,random_state=0)
tree_model.fit(df[col +"_m1"].to_frame(), df[col])
df[col+"_Disc"] = tree_model.predict(df[col +"_m1"].to_frame())
return df
df_out = interact.decision_tree_disc(df.copy(), ["Close"]); df_out.head()
Normalising normally pertains to the scaling of data. There are many methods available; interacting normalising methods make use of all of the features' attributes to do the scaling.
(i) Quantile Normalisation
In statistics, quantile normalization is a technique for making two distributions identical in statistical properties.
import numpy as np
import pandas as pd
def quantile_normalize(df, drop):
if drop:
keep = df[drop]
df = df.drop(drop,axis=1)
#compute rank
dic = {}
for col in df:
dic.update({col : sorted(df[col])})
sorted_df = pd.DataFrame(dic)
rank = sorted_df.mean(axis = 1).tolist()
#sort
for col in df:
t = np.searchsorted(np.sort(df[col]), df[col])
df[col] = [rank[i] for i in t]
if drop:
df = pd.concat((keep,df),axis=1)
return df
df_out = interact.quantile_normalize(df.copy(), drop=["Close"]); df_out.head()
There are multiple types of distance functions like Euclidean, Mahalanobis, and Minkowski distance. Here we are using a contrived example in a location based haversine distance.
(i) Haversine Distance
The Haversine (or great circle) distance is the angular distance between two points on the surface of a sphere.
from math import sin, cos, sqrt, atan2, radians
def haversine_distance(row, lon="Open", lat="Close"):
c_lat,c_long = radians(52.5200), radians(13.4050)
R = 6373.0
long = radians(row[lon])
lat = radians(row[lat])
dlon = long - c_long
dlat = lat - c_lat
a = sin(dlat / 2)**2 + cos(lat) * cos(c_lat) * sin(dlon / 2)**2
c = 2 * atan2(sqrt(a), sqrt(1 - a))
return R * c
df_out['distance_central'] = df.apply(interact.haversine_distance,axis=1); df_out.head()
(i) Technical Features
Technical indicators are heuristic or mathematical calculations based on the price, volume, or open interest of a security or contract used by traders who follow technical analysis. By analyzing historical data, technical analysts use indicators to predict future price movements.
import ta
def tech(df):
return ta.add_all_ta_features(df, open="Open", high="High", low="Low", close="Close", volume="Volume")
df_out = interact.tech(df.copy()); df_out.head()
Genetic programming has shown promise in constructing features by using original features to form high-level ones that can help algorithms achieve better performance.
(i) Symbolic Transformer
A symbolic transformer is a supervised transformer that begins by building a population of naive random formulas to represent a relationship.
df.head()
from gplearn.genetic import SymbolicTransformer
def genetic_feat(df, num_gen=20, num_comp=10):
function_set = ['add', 'sub', 'mul', 'div',
'sqrt', 'log', 'abs', 'neg', 'inv','tan']
gp = SymbolicTransformer(generations=num_gen, population_size=200,
hall_of_fame=100, n_components=num_comp,
function_set=function_set,
parsimony_coefficient=0.0005,
max_samples=0.9, verbose=1,
random_state=0, n_jobs=6)
gen_feats = gp.fit_transform(df.drop("Close_1", axis=1), df["Close_1"])
gen_feats = pd.DataFrame(gen_feats, columns=["gen_"+str(a) for a in range(gen_feats.shape[1])])
gen_feats.index = df.index
return pd.concat((df,gen_feats),axis=1)
df_out = interact.genetic_feat(df.copy()); df_out.head()
Methods that help with the summarisation of features by remapping them to achieve some aim like the maximisation of variability or class separability. These methods tend to be unsupervised, but can also take a supervised form.
Eigendecomposition or sometimes spectral decomposition is the factorization of a matrix into a canonical form, whereby the matrix is represented in terms of its eigenvalues and eigenvectors. Some examples are LDA and PCA.
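As a rough illustration of the decomposition itself (separate from the PCA wrapper that follows), the covariance matrix of the numeric columns can be factorised directly with NumPy; this sketch assumes `df` holds only numeric data, as in the running example.
import numpy as np

X = df.select_dtypes("number").dropna().values
X = X - X.mean(axis=0)                  # centre each column
cov = np.cov(X, rowvar=False)           # covariance matrix of the features
eigvals, eigvecs = np.linalg.eigh(cov)  # eigendecomposition of a symmetric matrix
order = np.argsort(eigvals)[::-1]       # sort by explained variance, largest first
components = eigvecs[:, order]          # principal directions, as used by PCA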
(i) Principal Component Analysis
Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
from sklearn.decomposition import PCA, KernelPCA, IncrementalPCA
def pca_feature(df, memory_issues=False, mem_iss_component=False, variance_or_components=0.80, n_components=5, drop_cols=None, non_linear=True):
if non_linear:
pca = KernelPCA(n_components = n_components, kernel='rbf', fit_inverse_transform=True, random_state = 33, remove_zero_eig= True)
else:
if memory_issues:
if not mem_iss_component:
raise ValueError("If you have memory issues, you have to preselect mem_iss_component")
pca = IncrementalPCA(mem_iss_component)
else:
if variance_or_components>1:
pca = PCA(n_components=variance_or_components)
else: # automated selection based on variance
pca = PCA(n_components=variance_or_components,svd_solver="full")
if drop_cols:
X_pca = pca.fit_transform(df.drop(drop_cols,axis=1))
return pd.concat((df[drop_cols],pd.DataFrame(X_pca, columns=["PCA_"+str(i+1) for i in range(X_pca.shape[1])],index=df.index)),axis=1)
else:
X_pca = pca.fit_transform(df)
return pd.DataFrame(X_pca, columns=["PCA_"+str(i+1) for i in range(X_pca.shape[1])],index=df.index)
df_out = mapper.pca_feature(df.copy(), variance_or_components=0.9, n_components=8,non_linear=False)
These families of algorithms are useful to find linear relations between two multivariate datasets.
(i) Canonical Correlation Analysis
Canonical-correlation analysis (CCA) is a way of inferring information from cross-covariance matrices.
from sklearn.cross_decomposition import CCA
def cross_lag(df, drop=None, lags=1, components=4 ):
if drop:
keep = df[drop]
df = df.drop([drop],axis=1)
df_2 = df.shift(lags)
df = df.iloc[lags:,:]
df_2 = df_2.dropna().reset_index(drop=True)
cca = CCA(n_components=components)
cca.fit(df_2, df)
X_c, df_2 = cca.transform(df_2, df)
df_2 = pd.DataFrame(df_2, index=df.index)
df_2 = df_2.add_prefix('crd_')
if drop:
df = pd.concat([keep,df,df_2],axis=1)
else:
df = pd.concat([df,df_2],axis=1)
return df
df_out = mapper.cross_lag(df.copy()); df_out.head()
Functions that approximate the feature mappings that correspond to certain kernels, as they are used for example in support vector machines.
(i) Additive Chi2 Kernel
Computes the additive chi-squared kernel between observations in X and Y. The chi-squared kernel is computed between each pair of rows in X and Y. X and Y have to be non-negative.
from sklearn.kernel_approximation import AdditiveChi2Sampler
def a_chi(df, drop=None, lags=1, sample_steps=2 ):
if drop:
keep = df[drop]
df = df.drop([drop],axis=1)
df_2 = df.shift(lags)
df = df.iloc[lags:,:]
df_2 = df_2.dropna().reset_index(drop=True)
chi2sampler = AdditiveChi2Sampler(sample_steps=sample_steps)
df_2 = chi2sampler.fit_transform(df_2, df["Close"])
df_2 = pd.DataFrame(df_2, index=df.index)
df_2 = df_2.add_prefix('achi_')
if drop:
df = pd.concat([keep,df,df_2],axis=1)
else:
df = pd.concat([df,df_2],axis=1)
return df
df_out = mapper.a_chi(df.copy()); df_out.head()
An autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore noise.
(i) Feed Forward
The simplest form of an autoencoder is a feedforward, non-recurrent neural network, similar to the single-layer perceptrons that participate in multilayer perceptrons.
from sklearn.preprocessing import minmax_scale
import tensorflow as tf
import numpy as np
def encoder_dataset(df, drop=None, dimensions=20):
if drop:
train_scaled = minmax_scale(df.drop(drop,axis=1).values, axis = 0)
else:
train_scaled = minmax_scale(df.values, axis = 0)
# define the number of encoding dimensions
encoding_dim = dimensions
# define the number of features
ncol = train_scaled.shape[1]
input_dim = tf.keras.Input(shape = (ncol, ))
# Encoder Layers
encoded1 = tf.keras.layers.Dense(3000, activation = 'relu')(input_dim)
encoded2 = tf.keras.layers.Dense(2750, activation = 'relu')(encoded1)
encoded3 = tf.keras.layers.Dense(2500, activation = 'relu')(encoded2)
encoded4 = tf.keras.layers.Dense(750, activation = 'relu')(encoded3)
encoded5 = tf.keras.layers.Dense(500, activation = 'relu')(encoded4)
encoded6 = tf.keras.layers.Dense(250, activation = 'relu')(encoded5)
encoded7 = tf.keras.layers.Dense(encoding_dim, activation = 'relu')(encoded6)
encoder = tf.keras.Model(inputs = input_dim, outputs = encoded7)
encoded_input = tf.keras.Input(shape = (encoding_dim, ))
encoded_train = pd.DataFrame(encoder.predict(train_scaled),index=df.index)
encoded_train = encoded_train.add_prefix('encoded_')
if drop:
encoded_train = pd.concat((df[drop],encoded_train),axis=1)
return encoded_train
df_out = mapper.encoder_dataset(df.copy(), ["Close_1"], 15); df_out.head()
Manifold Learning can be thought of as an attempt to generalize linear frameworks like PCA to be sensitive to non-linear structure in data.
(i) Local Linear Embedding
Locally Linear Embedding is a method of non-linear dimensionality reduction. It tries to reduce the number of dimensions while preserving the geometric features of the original non-linear feature structure.
from sklearn.manifold import LocallyLinearEmbedding
def lle_feat(df, drop=None, components=4):
if drop:
keep = df[drop]
df = df.drop(drop, axis=1)
embedding = LocallyLinearEmbedding(n_components=components)
em = embedding.fit_transform(df)
df = pd.DataFrame(em,index=df.index)
df = df.add_prefix('lle_')
if drop:
df = pd.concat((keep,df),axis=1)
return df
df_out = mapper.lle_feat(df.copy(),["Close_1"],4); df_out.head()
Most clustering techniques start with a bottom-up approach: each observation starts in its own cluster, and clusters are successively merged together with some measure. Although these clustering techniques are typically used for observations, they can also be used for feature dimensionality reduction, especially hierarchical clustering techniques.
(i) Feature Agglomeration
Feature agglomeration uses clustering to group together features that look very similar, thus decreasing the number of features.
import numpy as np
from sklearn import datasets, cluster
def feature_agg(df, drop=None, components=4):
if drop:
keep = df[drop]
df = df.drop(drop, axis=1)
components = min(df.shape[1]-1,components)
agglo = cluster.FeatureAgglomeration(n_clusters=components)
agglo.fit(df)
df = pd.DataFrame(agglo.transform(df),index=df.index)
df = df.add_prefix('feagg_')
if drop:
return pd.concat((keep,df),axis=1)
else:
return df
df_out = mapper.feature_agg(df.copy(),["Close_1"],4 ); df_out.head()
Neighbouring points can be calculated using distance metrics like Hamming, Manhattan, Minkowski distance. The principle behind nearest neighbor methods is to find a predefined number of training samples closest in distance to the new point, and predict the label from these.
(i) Nearest Neighbours
Unsupervised learner for implementing neighbor searches.
from sklearn.neighbors import NearestNeighbors
def neigh_feat(df, drop, neighbors=6):
if drop:
keep = df[drop]
df = df.drop(drop, axis=1)
neighbors = min(df.shape[0]-1, neighbors)
neigh = NearestNeighbors(n_neighbors=neighbors)
neigh.fit(df)
neigh = neigh.kneighbors()[0]
df = pd.DataFrame(neigh, index=df.index)
df = df.add_prefix('neigh_')
if drop:
return pd.concat((keep,df),axis=1)
else:
return df
df_out = mapper.neigh_feat(df.copy(),["Close_1"],4 ); df_out.head()
When working with extraction, you have to decide the size of the time series history to take into account when calculating a collection of walk-forward feature values. To facilitate our extraction, we use an excellent package called TSfresh, along with some of their default features. For completeness, we also include 12 or so custom features to be added to the extraction pipeline.
The time series methods in the transformation section and the interaction section are similar to the methods we will uncover in the extraction section, however, for transformation and interaction methods the output is an entire new time series, whereas extraction methods takes as input multiple constructed time series and extracts a singular value from each time series to reconstruct an entirely new time series.
Some methods naturally fit better in one format than another, e.g., lags are too expensive for extraction; time series decomposition only has to be performed once, because it has a low level of 'leakage' and so is better suited to transformation; and forecast methods attempt to predict multiple future training samples, so they won't work with extraction, which only delivers one value per time series. Furthermore, cross-sectional (non-time-series) techniques cannot make use of extraction, as it is solely a time-series method.
Lastly, when we want to double apply specific functions, we can apply them as a transformation/interaction and then apply all the extraction methods to that feature as well. For example, if we calculate a smoothing function (transformation), then all other extraction functions (median, entropy, linearity etc.) can be applied to that smoothed series, including the smoothing function itself, e.g., a double smooth, double lag, double filter etc. Separating these methods out therefore gives us great flexibility.
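As a rough sketch of the extraction idea, here is what pulling one value per walk-forward window looks like with a plain pandas rolling window; the window length and the use of the mean are arbitrary illustrative choices rather than the package's implementation.
import numpy as np

def rolling_extract(series, func=np.mean, window=30):
    # Summarise each walk-forward window with a single extracted value,
    # which reassembles into an entirely new time series.
    return series.rolling(window).apply(func, raw=True)

# df["Close_roll_mean"] = rolling_extract(df["Close"], np.mean, 30)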
Decorator
def set_property(key, value):
"""
This method returns a decorator that sets the property key of the function to value
"""
def decorate_func(func):
setattr(func, key, value)
if func.__doc__ and key == "fctype":
func.__doc__ = func.__doc__ + "\n\n *This function is of type: " + value + "*\n"
return func
return decorate_func
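Usage is then just a matter of tagging a custom feature calculator; the toy function below is purely hypothetical and only demonstrates how the decorator attaches the attributes.
import numpy as np

@set_property("fctype", "simple")
@set_property("custom", True)
def mean_of_squares(x):
    """Toy custom feature: the mean of the squared values."""
    return np.mean(np.asarray(x) ** 2)

# mean_of_squares.fctype == "simple" and mean_of_squares.custom is True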
You can calculate the linear, non-linear and absolute energy of a time series. In signal processing, the energy $E_S$ of a continuous-time signal $x(t)$ is defined as the area under the squared magnitude of the considered signal. Mathematically, $E_{s}=\langle x(t), x(t)\rangle=\int_{-\infty}^{\infty}|x(t)|^{2} d t$
(i) Absolute Energy
Returns the absolute energy of the time series which is the sum over the squared values
#-> In Package
def abs_energy(x):
if not isinstance(x, (np.ndarray, pd.Series)):
x = np.asarray(x)
return np.dot(x, x)
extract.abs_energy(df["Close"])
Here we widely define distance measures as those that take a difference between attributes or series of datapoints.
(i) Complexity-Invariant Distance
This feature calculator provides an estimate of the complexity of a time series.
#-> In Package
def cid_ce(x, normalize):
if not isinstance(x, (np.ndarray, pd.Series)):
x = np.asarray(x)
if normalize:
s = np.std(x)
if s!=0:
x = (x - np.mean(x))/s
else:
return 0.0
x = np.diff(x)
return np.sqrt(np.dot(x, x))
extract.cid_ce(df["Close"], True)
Many alternatives to differencing exist; one can, for example, take the difference of every other value, take the squared difference, take the fractional difference, or, like our example, take the mean absolute difference.
(i) Mean Absolute Change
Returns the mean over the absolute differences between subsequent time series values.
#-> In Package
def mean_abs_change(x):
return np.mean(np.abs(np.diff(x)))
extract.mean_abs_change(df["Close"])
Features where the emphasis is on the rate of change.
(i) Mean Central Second Derivative
Returns the mean value of a central approximation of the second derivative
#-> In Package
def _roll(a, shift):
if not isinstance(a, np.ndarray):
a = np.asarray(a)
idx = shift % len(a)
return np.concatenate([a[-idx:], a[:-idx]])
def mean_second_derivative_central(x):
diff = (_roll(x, 1) - 2 * np.array(x) + _roll(x, -1)) / 2.0
return np.mean(diff[1:-1])
extract.mean_second_derivative_central(df["Close"])
Volatility is a statistical measure of the dispersion of a time-series.
(i) Variance Larger than Standard Deviation
#-> In Package
def variance_larger_than_standard_deviation(x):
y = np.var(x)
return y > np.sqrt(y)
extract.variance_larger_than_standard_deviation(df["Close"])
(ii) Variability Index
Variability Index is a way to measure how smooth or 'variable' a time series is.
var_index_param = {"Volume":df["Volume"].values, "Open": df["Open"].values}
@set_property("fctype", "combiner")
@set_property("custom", True)
def var_index(time,param=var_index_param):
final = []
keys = []
for key, magnitude in param.items():
w = 1.0 / np.power(np.subtract(time[1:], time[:-1]), 2)
w_mean = np.mean(w)
N = len(time)
sigma2 = np.var(magnitude)
S1 = sum(w * (magnitude[1:] - magnitude[:-1]) ** 2)
S2 = sum(w)
eta_e = (w_mean * np.power(time[N - 1] -
time[0], 2) * S1 / (sigma2 * S2 * N ** 2))
final.append(eta_e)
keys.append(key)
return {"Interact__{}".format(k): eta_e for eta_e, k in zip(final,keys) }
extract.var_index(df["Close"].values,var_index_param)
Features that emphasise a particular shape not ordinarily considered as a distribution statistic. This extends to derivations of the original time series too, for example a feature looking at the sinusoidal shape of an autocorrelation plot.
(i) Symmetrical
Boolean variable denoting if the distribution of x looks symmetric.
#-> In Package
def symmetry_looking(x, param=[{"r": 0.2}]):
if not isinstance(x, (np.ndarray, pd.Series)):
x = np.asarray(x)
mean_median_difference = np.abs(np.mean(x) - np.median(x))
max_min_difference = np.max(x) - np.min(x)
return [("r_{}".format(r["r"]), mean_median_difference < (r["r"] * max_min_difference))
for r in param]
extract.symmetry_looking(df["Close"])
Looking at the occurrence and recurrence of defined values.
(i) Has Duplicate Max
#-> In Package
def has_duplicate_max(x):
"""
Checks if the maximum value of x is observed more than once
:param x: the time series to calculate the feature of
:type x: numpy.ndarray
:return: the value of this feature
:return type: bool
"""
if not isinstance(x, (np.ndarray, pd.Series)):
x = np.asarray(x)
return np.sum(x == np.max(x)) >= 2
extract.has_duplicate_max(df["Close"])
Autocorrelation, also known as serial correlation, is the correlation of a signal with a delayed copy of itself as a function of delay.
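A plain (non-partial) autocorrelation at a single lag can be sketched directly with NumPy before moving to the partial variant below; the lag of 1 is an arbitrary illustrative choice.
import numpy as np

def autocorrelation(x, lag=1):
    # Pearson correlation between the series and itself shifted by `lag`.
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]

# autocorrelation(df["Close"].values, lag=1)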
(i) Partial Autocorrelation
Partial autocorrelation is a summary of the relationship between an observation in a time series with observations at prior time steps with the relationships of intervening observations removed.
#-> In Package
from statsmodels.tsa.stattools import acf, adfuller, pacf
def partial_autocorrelation(x, param=[{"lag": 1}]):
# Check the difference between demanded lags by param and possible lags to calculate (depends on len(x))
max_demanded_lag = max([lag["lag"] for lag in param])
n = len(x)
# Check if list is too short to make calculations
if n <= 1:
pacf_coeffs = [np.nan] * (max_demanded_lag + 1)
else:
if (n <= max_demanded_lag):
max_lag = n - 1
else:
max_lag = max_demanded_lag
pacf_coeffs = list(pacf(x, method="ld", nlags=max_lag))
pacf_coeffs = pacf_coeffs + [np.nan] * max(0, (max_demanded_lag - max_lag))
return [("lag_{}".format(lag["lag"]), pacf_coeffs[lag["lag"]]) for lag in param]
extract.partial_autocorrelation(df["Close"])
Stochastic refers to a randomly determined process. Any features trying to capture stochasticity by degree or type are included under this branch.
(i) Augmented Dickey Fuller
The Augmented Dickey-Fuller test is a hypothesis test which checks whether a unit root is present in a time series sample.
#-> In Package
def augmented_dickey_fuller(x, param=[{"attr": "teststat"}]):
res = None
try:
res = adfuller(x)
except LinAlgError:
res = np.NaN, np.NaN, np.NaN
except ValueError: # occurs if sample size is too small
res = np.NaN, np.NaN, np.NaN
except MissingDataError: # is thrown for e.g. inf or nan in the data
res = np.NaN, np.NaN, np.NaN
return [('attr_"{}"'.format(config["attr"]),
res[0] if config["attr"] == "teststat"
else res[1] if config["attr"] == "pvalue"
else res[2] if config["attr"] == "usedlag" else np.NaN)
for config in param]
extract.augmented_dickey_fuller(df["Close"])
(i) Median of Magnitudes Skew
@set_property("fctype", "simple")
@set_property("custom", True)
def gskew(x):
interpolation="nearest"
median_mag = np.median(x)
F_3_value = np.percentile(x, 3, interpolation=interpolation)
F_97_value = np.percentile(x, 97, interpolation=interpolation)
skew = (np.median(x[x <= F_3_value]) +
np.median(x[x >= F_97_value]) - 2 * median_mag)
return skew
extract.gskew(df["Close"])
(ii) Stetson Mean
An iteratively weighted mean used in the Stetson variability index
stestson_param = {"weight":100., "alpha":2., "beta":2., "tol":1.e-6, "nmax":20}
@set_property("fctype", "combiner")
@set_property("custom", True)
def stetson_mean(x, param=stestson_param):
weight= stestson_param["weight"]
alpha= stestson_param["alpha"]
beta = stestson_param["beta"]
tol= stestson_param["tol"]
nmax= stestson_param["nmax"]
mu = np.median(x)
for i in range(nmax):
resid = x - mu
resid_err = np.abs(resid) * np.sqrt(weight)
weight1 = weight / (1. + (resid_err / alpha)**beta)
weight1 /= weight1.mean()
diff = np.mean(x * weight1) - mu
mu += diff
if (np.abs(diff) < tol*np.abs(mu) or np.abs(diff) < tol):
break
return mu
extract.stetson_mean(df["Close"])
(i) Length
#-> In Package
def length(x):
return len(x)
extract.length(df["Close"])
(i) Count Above Mean
Returns the number of values in x that are higher than the mean of x
#-> In Package
def count_above_mean(x):
m = np.mean(x)
return np.where(x > m)[0].size
extract.count_above_mean(df["Close"])
(i) Longest Strike Below Mean
Returns the length of the longest consecutive subsequence in x that is smaller than the mean of x
#-> In Package
import itertools
def get_length_sequences_where(x):
if len(x) == 0:
return [0]
else:
res = [len(list(group)) for value, group in itertools.groupby(x) if value == 1]
return res if len(res) > 0 else [0]
def longest_strike_below_mean(x):
if not isinstance(x, (np.ndarray, pd.Series)):
x = np.asarray(x)
return np.max(get_length_sequences_where(x <= np.mean(x))) if x.size > 0 else 0
extract.longest_strike_below_mean(df["Close"])
(ii) Wozniak
This is an astronomical feature; we count the number of three consecutive data points that are brighter or fainter than $2\sigma$ and normalize the number by $N-2$.
woz_param = [{"consecutiveStar": n} for n in [2, 4]]
@set_property("fctype", "combiner")
@set_property("custom", True)
def wozniak(magnitude, param=woz_param):
iters = []
for consecutiveStar in [stars["consecutiveStar"] for stars in param]:
N = len(magnitude)
if N < consecutiveStar:
return 0
sigma = np.std(magnitude)
m = np.mean(magnitude)
count = 0
for i in range(N - consecutiveStar + 1):
flag = 0
for j in range(consecutiveStar):
if(magnitude[i + j] > m + 2 * sigma or
magnitude[i + j] < m - 2 * sigma):
flag = 1
else:
flag = 0
break
if flag:
count = count + 1
iters.append(count * 1.0 / (N - consecutiveStar + 1))
return [("consecutiveStar_{}".format(config["consecutiveStar"]), iters[en] ) for en, config in enumerate(param)]
extract.wozniak(df["Close"])
(i) Last location of Maximum
Returns the relative last location of the maximum value of x.
#-> In Package
def last_location_of_maximum(x):
x = np.asarray(x)
return 1.0 - np.argmax(x[::-1]) / len(x) if len(x) > 0 else np.NaN
extract.last_location_of_maximum(df["Close"])
Any coefficients that are obtained from a model and that might help in the prediction problem. For example, here we might include the coefficients of a polynomial $h(x)$ which has been fitted to the deterministic dynamics of a Langevin model.
(i) FFT Coefficient
Calculates the Fourier coefficients of the one-dimensional discrete Fourier transform for real input.
#-> In Package
def fft_coefficient(x, param = [{"coeff": 10, "attr": "real"}]):
assert min([config["coeff"] for config in param]) >= 0, "Coefficients must be positive or zero."
assert set([config["attr"] for config in param]) <= set(["imag", "real", "abs", "angle"]), \
'Attribute must be "real", "imag", "angle" or "abs"'
fft = np.fft.rfft(x)
def complex_agg(x, agg):
if agg == "real":
return x.real
elif agg == "imag":
return x.imag
elif agg == "abs":
return np.abs(x)
elif agg == "angle":
return np.angle(x, deg=True)
res = [complex_agg(fft[config["coeff"]], config["attr"]) if config["coeff"] < len(fft)
else np.NaN for config in param]
index = [('coeff_{}__attr_"{}"'.format(config["coeff"], config["attr"]), r) for config, r in zip(param, res)]
return index
extract.fft_coefficient(df["Close"])
(ii) AR Coefficient
This feature calculator fits the unconditional maximum likelihood of an autoregressive AR(k) process.
#-> In Package
from statsmodels.tsa.ar_model import AR
def ar_coefficient(x, param=[{"coeff": 5, "k": 5}]):
calculated_ar_params = {}
x_as_list = list(x)
calculated_AR = AR(x_as_list)
res = {}
for parameter_combination in param:
k = parameter_combination["k"]
p = parameter_combination["coeff"]
column_name = "k_{}__coeff_{}".format(k, p)
if k not in calculated_ar_params:
try:
calculated_ar_params[k] = calculated_AR.fit(maxlag=k, solver="mle").params
except (LinAlgError, ValueError):
calculated_ar_params[k] = [np.NaN]*k
mod = calculated_ar_params[k]
if p <= k:
try:
res[column_name] = mod[p]
except IndexError:
res[column_name] = 0
else:
res[column_name] = np.NaN
return [(key, value) for key, value in res.items()]
extract.ar_coefficient(df["Close"])
This includes finding normal quantile values in the series, but also quantile derived measures like change quantiles and index max quantiles.
(i) Index Mass Quantile
The relative index $i$ where $q\%$ of the mass of the time series $x$ lies to the left of $i$.
#-> In Package
def index_mass_quantile(x, param=[{"q": 0.3}]):
x = np.asarray(x)
abs_x = np.abs(x)
s = sum(abs_x)
if s == 0:
# all values in x are zero or it has length 0
return [("q_{}".format(config["q"]), np.NaN) for config in param]
else:
# at least one value is not zero
mass_centralized = np.cumsum(abs_x) / s
return [("q_{}".format(config["q"]), (np.argmax(mass_centralized >= config["q"])+1)/len(x)) for config in param]
extract.index_mass_quantile(df["Close"])
(i) Number of CWT Peaks
This feature calculator searches for different peaks in x.
from scipy.signal import cwt, find_peaks_cwt, ricker, welch
cwt_param = [ka for ka in [2,6,9]]
@set_property("fctype", "combiner")
@set_property("custom", True)
def number_cwt_peaks(x, param=cwt_param):
return [("CWTPeak_{}".format(n), len(find_peaks_cwt(vector=x, widths=np.array(list(range(1, n + 1))), wavelet=ricker))) for n in param]
extract.number_cwt_peaks(df["Close"])
The density, and more specifically the power spectral density of the signal describes the power present in the signal as a function of frequency, per unit frequency.
(i) Cross Power Spectral Density
This feature calculator estimates the cross power spectral density of the time series $x$ at different frequencies.
#-> In Package
def spkt_welch_density(x, param=[{"coeff": 5}]):
freq, pxx = welch(x, nperseg=min(len(x), 256))
coeff = [config["coeff"] for config in param]
indices = ["coeff_{}".format(i) for i in coeff]
if len(pxx) <= np.max(coeff): # There are fewer data points in the time series than requested coefficients
# filter coefficients that are not contained in pxx
reduced_coeff = [coefficient for coefficient in coeff if len(pxx) > coefficient]
not_calculated_coefficients = [coefficient for coefficient in coeff
if coefficient not in reduced_coeff]
# Fill up the rest of the requested coefficients with np.NaNs
return zip(indices, list(pxx[reduced_coeff]) + [np.NaN] * len(not_calculated_coefficients))
else:
return pxx[coeff].ravel()[0]
extract.spkt_welch_density(df["Close"])
Any measure of linearity that might make use of something like a linear least-squares regression for the values of the time series. This can be against the sequence from 0 to the length of the time series minus one, among many other alternatives.
(i) Linear Trend Time Wise
Calculate a linear least-squares regression for the values of the time series versus the sequence from 0 to length of the time series minus one.
from scipy.stats import linregress
#-> In Package
def linear_trend_timewise(x, param= [{"attr": "pvalue"}]):
ix = x.index
# Get differences between each timestamp and the first timestamp in seconds.
# Then convert to hours and reshape for linear regression
times_seconds = (ix - ix[0]).total_seconds()
times_hours = np.asarray(times_seconds / float(3600))
linReg = linregress(times_hours, x.values)
return [("attr_\"{}\"".format(config["attr"]), getattr(linReg, config["attr"]))
for config in param]
extract.linear_trend_timewise(df["Close"])
(i) Schreiber Non-Linearity
#-> In Package
def c3(x, lag=3):
if not isinstance(x, (np.ndarray, pd.Series)):
x = np.asarray(x)
n = x.size
if 2 * lag >= n:
return 0
else:
return np.mean((_roll(x, 2 * -lag) * _roll(x, -lag) * x)[0:(n - 2 * lag)])
extract.c3(df["Close"])
Any feature looking at the complexity of a time series. This is typically used in medical signal disciplines (EEG, EMG). There are multiple types of measures like spectral entropy, permutation entropy, sample entropy, approximate entropy, Lempel-Ziv complexity and others. This includes entropy measures and their derivations.
(i) Binned Entropy
Bins the values of x into max_bins equidistant bins.
#-> In Package
def binned_entropy(x, max_bins=10):
if not isinstance(x, (np.ndarray, pd.Series)):
x = np.asarray(x)
hist, bin_edges = np.histogram(x, bins=max_bins)
probs = hist / x.size
return -np.sum([p * np.log(p) for p in probs if p != 0])
extract.binned_entropy(df["Close"])
(ii) SVD Entropy
SVD entropy is an indicator of the number of eigenvectors that are needed for an adequate explanation of the data set.
svd_param = [{"Tau": ta, "DE": de}
for ta in [4]
for de in [3,6]]
def _embed_seq(X,Tau,D):
N =len(X)
if D * Tau > N:
print("Cannot build such a matrix, because D * Tau > N")
exit()
if Tau<1:
print("Tau has to be at least 1")
exit()
Y= np.zeros((N - (D - 1) * Tau, D))
for i in range(0, N - (D - 1) * Tau):
for j in range(0, D):
Y[i][j] = X[i + j * Tau]
return Y
@set_property("fctype", "combiner")
@set_property("custom", True)
def svd_entropy(epochs, param=svd_param):
axis=0
final = []
for par in param:
def svd_entropy_1d(X, Tau, DE):
Y = _embed_seq(X, Tau, DE)
W = np.linalg.svd(Y, compute_uv=0)
W /= sum(W) # normalize singular values
return -1 * np.sum(W * np.log(W))
Tau = par["Tau"]
DE = par["DE"]
final.append(np.apply_along_axis(svd_entropy_1d, axis, epochs, Tau, DE).ravel()[0])
return [("Tau_\"{}\"__De_{}\"".format(par["Tau"], par["DE"]), final[en]) for en, par in enumerate(param)]
extract.svd_entropy(df["Close"].values)
(iii) Hjorth
The Complexity parameter represents the change in frequency. The parameter compares the signal's similarity to a pure sine wave, where the value converges to 1 if the signal is more similar.
def _hjorth_mobility(epochs):
diff = np.diff(epochs, axis=0)
sigma0 = np.std(epochs, axis=0)
sigma1 = np.std(diff, axis=0)
return np.divide(sigma1, sigma0)
@set_property("fctype", "simple")
@set_property("custom", True)
def hjorth_complexity(epochs):
diff1 = np.diff(epochs, axis=0)
diff2 = np.diff(diff1, axis=0)
sigma1 = np.std(diff1, axis=0)
sigma2 = np.std(diff2, axis=0)
return np.divide(np.divide(sigma2, sigma1), _hjorth_mobility(epochs))
extract.hjorth_complexity(df["Close"])
Fixed points and equilibria as identified from fitted models.
(i) Langevin Fixed Points
Largest fixed point of dynamics $max\ {h(x)=0}$ estimated from polynomial $h(x)$ which has been fitted to the deterministic dynamics of Langevin model
#-> In Package
def _estimate_friedrich_coefficients(x, m, r):
assert m > 0, "Order of polynomial need to be positive integer, found {}".format(m)
df = pd.DataFrame({'signal': x[:-1], 'delta': np.diff(x)})
try:
df['quantiles'] = pd.qcut(df.signal, r)
except ValueError:
return [np.NaN] * (m + 1)
quantiles = df.groupby('quantiles')
result = pd.DataFrame({'x_mean': quantiles.signal.mean(), 'y_mean': quantiles.delta.mean()})
result.dropna(inplace=True)
try:
return np.polyfit(result.x_mean, result.y_mean, deg=m)
except (np.linalg.LinAlgError, ValueError):
return [np.NaN] * (m + 1)
def max_langevin_fixed_point(x, r=3, m=30):
coeff = _estimate_friedrich_coefficients(x, m, r)
try:
max_fixed_point = np.max(np.real(np.roots(coeff)))
except (np.linalg.LinAlgError, ValueError):
return np.nan
return max_fixed_point
extract.max_langevin_fixed_point(df["Close"])
Features derived from peaked values in either the positive or negative direction.
(i) Willison Amplitude
This feature is defined as the amount of times that the change in the signal amplitude exceeds a threshold.
will_param = [ka for ka in [0.2,3]]
@set_property("fctype", "combiner")
@set_property("custom", True)
def willison_amplitude(X, param=will_param):
return [("Thresh_{}".format(n),np.sum(np.abs(np.diff(X)) >= n)) for n in param]
extract.willison_amplitude(df["Close"])
(ii) Percent Amplitude
Returns the largest distance from the median value, measured as a percentage of the median
perc_param = [{"base":ba, "exponent":exp} for ba in [3,5] for exp in [-0.1,-0.2]]
@set_property("fctype", "combiner")
@set_property("custom", True)
def percent_amplitude(x, param =perc_param):
final = []
for par in param:
linear_scale_data = par["base"] ** (par["exponent"] * x)
y_max = np.max(linear_scale_data)
y_min = np.min(linear_scale_data)
y_med = np.median(linear_scale_data)
final.append(max(abs((y_max - y_med) / y_med), abs((y_med - y_min) / y_med)))
return [("Base_{}__Exp{}".format(pa["base"],pa["exponent"]),fin) for fin, pa in zip(final,param)]
extract.percent_amplitude(df["Close"])
(i) Cadence Probability
Given the observed distribution of time lags cads, compute the probability that the next observation occurs within time minutes of an arbitrary epoch.
#-> fixes required
import scipy.stats as stats
cad_param = [0.1,1000, -234]
@set_property("fctype", "combiner")
@set_property("custom", True)
def cad_prob(cads, param=cad_param):
return [("time_{}".format(time), stats.percentileofscore(cads, float(time) / (24.0 * 60.0)) / 100.0) for time in param]
extract.cad_prob(df["Close"])
Calculates the crossing of the series with other defined values or series.
(i) Zero Crossing Derivative
The positioning of the edge point is located at the zero crossing of the first derivative of the filter.
zero_param = [0.01, 8]
@set_property("fctype", "combiner")
@set_property("custom", True)
def zero_crossing_derivative(epochs, param=zero_param):
diff = np.diff(epochs)
norm = diff-diff.mean()
return [("e_{}".format(e), np.apply_along_axis(lambda epoch: np.sum(((epoch[:-5] <= e) & (epoch[5:] > e))), 0, norm).ravel()[0]) for e in param]
extract.zero_crossing_derivative(df["Close"])
These features are again from medical signal sciences, but under this category we would include values such as fluctuation based entropy measures, fluctuation of correlation dynamics, and co-fluctuations.
(i) Detrended Fluctuation Analysis (DFA)
DFA calculates the Hurst exponent using detrended fluctuation analysis.
from scipy.stats import kurtosis as _kurt
from scipy.stats import skew as _skew
import numpy as np
@set_property("fctype", "simple")
@set_property("custom", True)
def detrended_fluctuation_analysis(epochs):
def dfa_1d(X, Ave=None, L=None):
X = np.array(X)
if Ave is None:
Ave = np.mean(X)
Y = np.cumsum(X)
Y -= Ave
if L is None:
L = np.floor(len(X) * 1 / (
2 ** np.array(list(range(1, int(np.log2(len(X))) - 4))))
)
F = np.zeros(len(L)) # F(n) of different given box length n
for i in range(0, len(L)):
n = int(L[i]) # for each box length L[i]
if n == 0:
print("time series is too short while the box length is too big")
print("abort")
exit()
for j in range(0, len(X), n): # for each box
if j + n < len(X):
c = list(range(j, j + n))
# coordinates of time in the box
c = np.vstack([c, np.ones(n)]).T
# the value of data in the box
y = Y[j:j + n]
# add residue in this box
F[i] += np.linalg.lstsq(c, y, rcond=None)[1]
F[i] /= ((len(X) / n) * n)
F = np.sqrt(F)
stacked = np.vstack([np.log(L), np.ones(len(L))])
stacked_t = stacked.T
Alpha = np.linalg.lstsq(stacked_t, np.log(F), rcond=None)
return Alpha[0][0]
return np.apply_along_axis(dfa_1d, 0, epochs).ravel()[0]
extract.detrended_fluctuation_analysis(df["Close"])
Closely related to entropy and complexity measures. Any measure that attempts to measure the amount of information from an observable variable is included here.
(i) Fisher Information
Fisher information is a statistical information concept distinct from, and earlier than, Shannon information in communication theory.
def _embed_seq(X, Tau, D):
shape = (X.size - Tau * (D - 1), D)
strides = (X.itemsize, Tau * X.itemsize)
return np.lib.stride_tricks.as_strided(X, shape=shape, strides=strides)
fisher_param = [{"Tau":ta, "DE":de} for ta in [3,15] for de in [10,5]]
@set_property("fctype", "combiner")
@set_property("custom", True)
def fisher_information(epochs, param=fisher_param):
def fisher_info_1d(a, tau, de):
# taken from pyeeg improvements
mat = _embed_seq(a, tau, de)
W = np.linalg.svd(mat, compute_uv=False)
W /= sum(W) # normalize singular values
FI_v = (W[1:] - W[:-1]) ** 2 / W[:-1]
return np.sum(FI_v)
return [("Tau_{}__DE_{}".format(par["Tau"], par["DE"]),np.apply_along_axis(fisher_info_1d, 0, epochs, par["Tau"], par["DE"]).ravel()[0]) for par in param]
extract.fisher_information(df["Close"])
In mathematics, more specifically in fractal geometry, a fractal dimension is a ratio providing a statistical index of complexity comparing how detail in a pattern (strictly speaking, a fractal pattern) changes with the scale at which it is measured.
(i) Higuchi Fractal
Compute a Higuchi Fractal Dimension of a time series
hig_para = [{"Kmax": 3},{"Kmax": 5}]
@set_property("fctype", "combiner")
@set_property("custom", True)
def higuchi_fractal_dimension(epochs, param=hig_para):
def hfd_1d(X, Kmax):
L = []
x = []
N = len(X)
for k in range(1, Kmax):
Lk = []
for m in range(0, k):
Lmk = 0
for i in range(1, int(np.floor((N - m) / k))):
Lmk += abs(X[m + i * k] - X[m + i * k - k])
Lmk = Lmk * (N - 1) / np.floor((N - m) / float(k)) / k
Lk.append(Lmk)
L.append(np.log(np.mean(Lk)))
x.append([np.log(float(1) / k), 1])
(p, r1, r2, s) = np.linalg.lstsq(x, L, rcond=None)
return p[0]
return [("Kmax_{}".format(config["Kmax"]), np.apply_along_axis(hfd_1d, 0, epochs, config["Kmax"]).ravel()[0] ) for config in param]
extract.higuchi_fractal_dimension(df["Close"])
(ii) Petrosian Fractal
Compute a Petrosian Fractal Dimension of a time series.
@set_property("fctype", "simple")
@set_property("custom", True)
def petrosian_fractal_dimension(epochs):
def pfd_1d(X, D=None):
# taken from pyeeg
"""Compute Petrosian Fractal Dimension of a time series from either two
cases below:
1. X, the time series of type list (default)
2. D, the first order differential sequence of X (if D is provided,
recommended to speed up)
In case 1, D is computed using Numpy's difference function.
To speed up, it is recommended to compute D before calling this function
because D may also be used by other functions whereas computing it here
again will slow down.
"""
if D is None:
D = np.diff(X)
D = D.tolist()
N_delta = 0 # number of sign changes in derivative of the signal
for i in range(1, len(D)):
if D[i] * D[i - 1] < 0:
N_delta += 1
n = len(X)
return np.log10(n) / (np.log10(n) + np.log10(n / (n + 0.4 * N_delta)))
return np.apply_along_axis(pfd_1d, 0, epochs).ravel()[0]
extract.petrosian_fractal_dimension(df["Close"])
(i) Hurst Exponent
The Hurst exponent is used as a measure of long-term memory of time series. It relates to the autocorrelations of the time series, and the rate at which these decrease as the lag between pairs of values increases.
@set_property("fctype", "simple")
@set_property("custom", True)
def hurst_exponent(epochs):
def hurst_1d(X):
X = np.array(X)
N = X.size
T = np.arange(1, N + 1)
Y = np.cumsum(X)
Ave_T = Y / T
S_T = np.zeros(N)
R_T = np.zeros(N)
for i in range(N):
S_T[i] = np.std(X[:i + 1])
X_T = Y - T * Ave_T[i]
R_T[i] = np.ptp(X_T[:i + 1])
for i in range(1, len(S_T)):
if np.diff(S_T)[i - 1] != 0:
break
for j in range(1, len(R_T)):
if np.diff(R_T)[j - 1] != 0:
break
k = max(i, j)
assert k < 10, "rethink it!"
R_S = R_T[k:] / S_T[k:]
R_S = np.log(R_S)
n = np.log(T)[k:]
A = np.column_stack((n, np.ones(n.size)))
[m, c] = np.linalg.lstsq(A, R_S, rcond=None)[0]
H = m
return H
return np.apply_along_axis(hurst_1d, 0, epochs).ravel()[0]
extract.hurst_exponent(df["Close"])
(ii) Largest Lyauponov Exponent
In mathematics the Lyapunov exponent or Lyapunov characteristic exponent of a dynamical system is a quantity that characterizes the rate of separation of infinitesimally close trajectories.
def _embed_seq(X, Tau, D):
shape = (X.size - Tau * (D - 1), D)
strides = (X.itemsize, Tau * X.itemsize)
return np.lib.stride_tricks.as_strided(X, shape=shape, strides=strides)
lyaup_param = [{"Tau":4, "n":3, "T":10, "fs":9},{"Tau":8, "n":7, "T":15, "fs":6}]
@set_property("fctype", "combiner")
@set_property("custom", True)
def largest_lyauponov_exponent(epochs, param=lyaup_param):
def LLE_1d(x, tau, n, T, fs):
Em = _embed_seq(x, tau, n)
M = len(Em)
A = np.tile(Em, (len(Em), 1, 1))
B = np.transpose(A, [1, 0, 2])
square_dists = (A - B) ** 2 # square_dists[i,j,k] = (Em[i][k]-Em[j][k])^2
D = np.sqrt(square_dists[:, :, :].sum(axis=2)) # D[i,j] = ||Em[i]-Em[j]||_2
# Exclude elements within T of the diagonal
band = np.tri(D.shape[0], k=T) - np.tri(D.shape[0], k=-T - 1)
band[band == 1] = np.inf
neighbors = (D + band).argmin(axis=0) # nearest neighbors more than T steps away
# in_bounds[i,j] = (i+j <= M-1 and i+neighbors[j] <= M-1)
inc = np.tile(np.arange(M), (M, 1))
row_inds = (np.tile(np.arange(M), (M, 1)).T + inc)
col_inds = (np.tile(neighbors, (M, 1)) + inc.T)
in_bounds = np.logical_and(row_inds <= M - 1, col_inds <= M - 1)
# Uncomment for old (miscounted) version
# in_bounds = numpy.logical_and(row_inds < M - 1, col_inds < M - 1)
row_inds[~in_bounds] = 0
col_inds[~in_bounds] = 0
# neighbor_dists[i,j] = ||Em[i+j]-Em[i+neighbors[j]]||_2
neighbor_dists = np.ma.MaskedArray(D[row_inds, col_inds], ~in_bounds)
J = (~neighbor_dists.mask).sum(axis=1) # number of in-bounds indices by row
# Set invalid (zero) values to 1; log(1) = 0 so sum is unchanged
neighbor_dists[neighbor_dists == 0] = 1
# !!! this fixes the divide by zero in log error !!!
neighbor_dists.data[neighbor_dists.data == 0] = 1
d_ij = np.sum(np.log(neighbor_dists.data), axis=1)
mean_d = d_ij[J > 0] / J[J > 0]
x = np.arange(len(mean_d))
X = np.vstack((x, np.ones(len(mean_d)))).T
[m, c] = np.linalg.lstsq(X, mean_d, rcond=None)[0]
Lexp = fs * m
return Lexp
return [("Tau_{}__n_{}__T_{}__fs_{}".format(par["Tau"], par["n"], par["T"], par["fs"]), np.apply_along_axis(LLE_1d, 0, epochs, par["Tau"], par["n"], par["T"], par["fs"]).ravel()[0]) for par in param]
extract.largest_lyauponov_exponent(df["Close"])
Spectral analysis is analysis in terms of a spectrum of frequencies or related quantities such as energies, eigenvalues, etc.
(i) Welch Method
The Welch method is an approach for spectral density estimation. It is used in physics, engineering, and applied mathematics for estimating the power of a signal at different frequencies.
from scipy import signal, integrate
whelch_param = [100,200]
@set_property("fctype", "combiner")
@set_property("custom", True)
def whelch_method(data, param=whelch_param):
final = []
for Fs in param:
f, pxx = signal.welch(data, fs=Fs, nperseg=1024)
d = {'psd': pxx, 'freqs': f}
df = pd.DataFrame(data=d)
dfs = df.sort_values(['psd'], ascending=False)
rows = dfs.iloc[:10]
final.append(rows['freqs'].mean())
return [("Fs_{}".format(pa),fin) for pa, fin in zip(param,final)]
extract.whelch_method(df["Close"])
#-> Basically same as above
freq_param = [{"fs":50, "sel":15},{"fs":200, "sel":20}]
@set_property("fctype", "combiner")
@set_property("custom", True)
def find_freq(serie, param=freq_param):
final = []
for par in param:
fft0 = np.fft.rfft(serie*np.hanning(len(serie)))
freqs = np.fft.rfftfreq(len(serie), d=1.0/par["fs"])
fftmod = np.array([np.sqrt(fft0[i].real**2 + fft0[i].imag**2) for i in range(0, len(fft0))])
d = {'fft': fftmod, 'freq': freqs}
df = pd.DataFrame(d)
hop = df.sort_values(['fft'], ascending=False)
rows = hop.iloc[:par["sel"]]
final.append(rows['freq'].mean())
return [("Fs_{}__sel{}".format(pa["fs"],pa["sel"]),fin) for pa, fin in zip(param,final)]
extract.find_freq(df["Close"])
(i) Flux Percentile
Flux (or radiant flux) is the total amount of energy that crosses a unit area per unit time. Flux is an astronomical value, measured in joules per square metre per second (joules/m2/s), or watts per square metre. Here we provide the ratio of flux percentiles.
#-> In Package
import math
def flux_perc(magnitude):
sorted_data = np.sort(magnitude)
lc_length = len(sorted_data)
F_60_index = int(math.ceil(0.60 * lc_length))
F_40_index = int(math.ceil(0.40 * lc_length))
F_5_index = int(math.ceil(0.05 * lc_length))
F_95_index = int(math.ceil(0.95 * lc_length))
F_40_60 = sorted_data[F_60_index] - sorted_data[F_40_index]
F_5_95 = sorted_data[F_95_index] - sorted_data[F_5_index]
F_mid20 = F_40_60 / F_5_95
return {"FluxPercentileRatioMid20": F_mid20}
extract.flux_perc(df["Close"])
(i) Range of Cumulative Sum
@set_property("fctype", "simple")
@set_property("custom", True)
def range_cum_s(magnitude):
sigma = np.std(magnitude)
N = len(magnitude)
m = np.mean(magnitude)
s = np.cumsum(magnitude - m) * 1.0 / (N * sigma)
R = np.max(s) - np.min(s)
return {"Rcs": R}
extract.range_cum_s(df["Close"])
Structural features, potential placeholders for future research.
(i) Structure Function
The structure function of rotation measures (RMs) contains information on electron density and magnetic field fluctuations when used in astronomy. It becomes a custom feature when used with your own unique time series data.
from scipy.interpolate import interp1d
struct_param = {"Volume":df["Volume"].values, "Open": df["Open"].values}
@set_property("fctype", "combiner")
@set_property("custom", True)
def structure_func(time, param=struct_param):
dict_final = {}
for key, magnitude in param.items():
dict_final[key] = []
Nsf, Np = 100, 100
sf1, sf2, sf3 = np.zeros(Nsf), np.zeros(Nsf), np.zeros(Nsf)
f = interp1d(time, magnitude)
time_int = np.linspace(np.min(time), np.max(time), Np)
mag_int = f(time_int)
for tau in np.arange(1, Nsf):
sf1[tau - 1] = np.mean(
np.power(np.abs(mag_int[0:Np - tau] - mag_int[tau:Np]), 1.0))
sf2[tau - 1] = np.mean(
np.abs(np.power(
np.abs(mag_int[0:Np - tau] - mag_int[tau:Np]), 2.0)))
sf3[tau - 1] = np.mean(
np.abs(np.power(
np.abs(mag_int[0:Np - tau] - mag_int[tau:Np]), 3.0)))
sf1_log = np.log10(np.trim_zeros(sf1))
sf2_log = np.log10(np.trim_zeros(sf2))
sf3_log = np.log10(np.trim_zeros(sf3))
if len(sf1_log) and len(sf2_log):
m_21, b_21 = np.polyfit(sf1_log, sf2_log, 1)
else:
m_21 = np.nan
if len(sf1_log) and len(sf3_log):
m_31, b_31 = np.polyfit(sf1_log, sf3_log, 1)
else:
m_31 = np.nan
if len(sf2_log) and len(sf3_log):
m_32, b_32 = np.polyfit(sf2_log, sf3_log, 1)
else:
m_32 = np.nan
dict_final[key].append(m_21)
dict_final[key].append(m_31)
dict_final[key].append(m_32)
return [("StructureFunction_{}__m_{}".format(key, name), li) for key, lis in dict_final.items() for name, li in zip([21,31,32], lis)]
struct_param = {"Volume":df["Volume"].values, "Open": df["Open"].values}
extract.structure_func(df["Close"],struct_param)
(i) Kurtosis
#-> In Package
def kurtosis(x):
if not isinstance(x, pd.Series):
x = pd.Series(x)
return pd.Series.kurtosis(x)
extract.kurtosis(df["Close"])
(ii) Stetson Kurtosis
@set_property("fctype", "simple")
@set_property("custom", True)
def stetson_k(x):
"""A robust kurtosis statistic."""
n = len(x)
x0 = stetson_mean(x, 1./20**2)
delta_x = np.sqrt(n / (n - 1.)) * (x - x0) / 20
ta = 1. / 0.798 * np.mean(np.abs(delta_x)) / np.sqrt(np.mean(delta_x**2))
return ta
extract.stetson_k(df["Close"])
Time-Series Synthesisation (TSS) happens before the feature extraction step and Cross-Sectional Synthesisation (CSS) happens after the feature extraction step. Currently I will only include a CSS package; in the future, I will work further on developing out this section. This area still has a lot of performance and stability issues. In the future it might be a more viable candidate to improve prediction.
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error
def model(df_final):
model = LGBMRegressor()
test = df_final.head(int(len(df_final)*0.4))
train = df_final[~df_final.isin(test)].dropna()
model = model.fit(train.drop(["Close_1"],axis=1),train["Close_1"])
preds = model.predict(test.drop(["Close_1"],axis=1))
val = mean_squared_error(test["Close_1"],preds);
return val
pip install ctgan
from ctgan import CTGANSynthesizer
#discrete_columns = [""]
ctgan = CTGANSynthesizer()
ctgan.fit(df,epochs=10) #15
Random Benchmark
np.random.seed(1)
df_in = df.copy()
df_in["Close_1"] = np.random.permutation(df_in["Close_1"].values)
model(df_in)
Generated Performance
df_gen = ctgan.sample(len(df_in)*100)
model(df_gen)
As expected, a cross-sectional technique does not work well on time-series data; in the future, other methods will be investigated.
Here I will perform tabular augmentation methods on a small dataset with single-digit features and around 250 instances. This is not necessarily the best-sized dataset to highlight the performance of tabular augmentation, as some methods like extraction would be overkill and would lead to dimensionality problems. It is also good to know that there is a close to infinite number of ways to perform these augmentation methods. In the future, automated augmentation methods can guide the experiment process.
The approach taken in this skeleton is to develop running models that are tested after each augmentation to highlight what methods might work well on this particular dataset. The metric we will use is mean squared error. In this implementation we do not have special hold-out sets.
The above framework of implementation will be consulted, but one still has to be strategic as to when to apply which function, and make sure that the data is processed with appropriate techniques (dropping null values, filling null values) at the appropriate time.
Develop Model and Define Metric
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error
def model(df_final):
model = LGBMRegressor()
test = df_final.head(int(len(df_final)*0.4))
train = df_final[~df_final.isin(test)].dropna()
model = model.fit(train.drop(["Close_1"],axis=1),train["Close_1"])
preds = model.predict(test.drop(["Close_1"],axis=1))
val = mean_squared_error(test["Close_1"],preds);
return val
Reload Data
df = data_copy()
model(df)
302.61676570345287
(1) (7) (i) Transformation - Decomposition - Naive
## If Inferred Seasonality is Too Large Default to Five
seasons = transform.infer_seasonality(df["Close"],index=0)
df_out = transform.naive_dec(df.copy(), ["Close","Open"], freq=5)
model(df_out) #improvement
274.34477082783525
(1) (8) (i) Transformation - Filter - Baxter-King-Bandpass
df_out = transform.bkb(df_out, ["Close","Low"])
df_best = df_out.copy()
model(df_out) #improvement
267.1826850968307
(1) (3) (i) Transformation - Differentiation - Fractional
df_out = transform.fast_fracdiff(df_out, ["Close_BPF"],0.5)
model(df_out) #null
267.7083192402742
(1) (1) (i) Transformation - Scaling - Robust Scaler
df_out = df_out.dropna()
df_out = transform.robust_scaler(df_out, drop=["Close_1"])
model(df_out) #noisy
270.96980399571214
(2) (2) (i) Interactions - Operator - Multiplication/Division
df_out.head()
Date | Close_1 | High | Low | Open | Close | Volume | Adj Close | Close_NDDT | Close_NDDS | Close_NDDR | Open_NDDT | Open_NDDS | Open_NDDR | Close_BPF | Low_BPF | Close_BPF_frac
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
2019-01-08 | 338.529999 | 1.018413 | 0.964048 | 1.096600 | 1.001175 | -0.162616 | 1.001175 | 0.832297 | 0.834964 | 1.335433 | 0.758743 | 0.691596 | 2.259884 | -2.534142 | -2.249135 | -3.593612 |
2019-01-09 | 344.970001 | 1.012068 | 1.023302 | 1.011466 | 1.042689 | -0.501798 | 1.042689 | 0.908963 | -0.165036 | 1.111346 | 0.835786 | 0.333361 | 1.129783 | -3.081959 | -2.776302 | -2.523465 |
2019-01-10 | 347.260010 | 1.035581 | 1.027563 | 0.996969 | 1.126762 | -0.367576 | 1.126762 | 1.029347 | 2.120026 | 0.853697 | 0.907588 | 0.000000 | 0.533777 | -2.052768 | -2.543449 | -0.747382 |
2019-01-11 | 334.399994 | 1.073153 | 1.120506 | 1.098313 | 1.156658 | -0.586571 | 1.156658 | 1.109144 | -5.156051 | 0.591990 | 1.002162 | -0.666639 | 0.608516 | -0.694642 | -0.831670 | 0.414063 |
2019-01-14 | 344.429993 | 0.999627 | 1.056991 | 1.102135 | 0.988773 | -0.541752 | 0.988773 | 1.107633 | 0.000000 | -0.660350 | 1.056302 | -0.915491 | 0.263025 | -0.645590 | -0.116166 | -0.118012 |
df_out = interact.muldiv(df_out, ["Close","Open_NDDS","Low_BPF"])
model(df_out) #noisy
285.6420643864313
df_r = df_out.copy()
(2) (6) (i) Interactions - Speciality - Technical
import ta
df = interact.tech(df)
df_out = pd.merge(df_out, df.iloc[:,7:], left_index=True, right_index=True, how="left")
Clean Dataframe and Metric
"""Droping column where missing values are above a threshold"""
df_out = df_out.dropna(thresh = len(df_out)*0.95, axis = "columns")
df_out = df_out.dropna()
df_out = df_out.replace([np.inf, -np.inf], np.nan).ffill().fillna(0)
close = df_out["Close"].copy()
df_d = df_out.copy()
model(df_out) #improve
592.52971755184
(3) (1) (i) Mapping - Eigen Decomposition - PCA
from sklearn.decomposition import PCA, IncrementalPCA, KernelPCA
df_out = transform.robust_scaler(df_out, drop=["Close_1"])
df_out = df_out.replace([np.inf, -np.inf], np.nan).ffill().fillna(0)
df_out = mapper.pca_feature(df_out, drop_cols=["Close_1"], variance_or_components=0.9, n_components=8,non_linear=False)
model(df_out) #noisy but not too bad given the 10 fold dimensionality reduction
687.158330455884
(4) Extracting
Here at first, I show the functions that have been added to the DeltaPy fork of tsfresh. You have to add your own personal adjustments based on the features you would like to construct. I am using self-developed features, but you can also use TSFresh's community functions.
The following files have been appropriately amended (Get in contact for advice).
(4) (10) (i) Extracting - Averages - GSkew
extract.gskew(df_out["PCA_1"])
-0.7903067336449059
(4) (21) (ii) Extracting - Entropy - SVD Entropy
svd_param = [{"Tau": ta, "DE": de}
for ta in [4]
for de in [3,6]]
extract.svd_entropy(df_out["PCA_1"],svd_param)
[('Tau_"4"__De_3"', 0.7234823323374294),
('Tau_"4"__De_6"', 1.3014347840145244)]
(4) (13) (ii) Extracting - Streaks - Wozniak
woz_param = [{"consecutiveStar": n} for n in [2, 4]]
extract.wozniak(df_out["PCA_1"],woz_param)
[('consecutiveStar_2', 0.012658227848101266), ('consecutiveStar_4', 0.0)]
(4) (28) (i) Extracting - Fractal - Higuchi
hig_param = [{"Kmax": 3},{"Kmax": 5}]
extract.higuchi_fractal_dimension(df_out["PCA_1"],hig_param)
[('Kmax_3', 0.577913816027104), ('Kmax_5', 0.8176960510304725)]
(4) (5) (ii) Extracting - Volatility - Variability Index
var_index_param = {"Volume":df["Volume"].values, "Open": df["Open"].values}
extract.var_index(df["Close"].values,var_index_param)
{'Interact__Open': 0.00396022538846289,
'Interact__Volume': 0.20550155114176533}
Time Series Extraction
pip install git+git://github.com/firmai/tsfresh.git
#Construct the preferred input dataframe.
from tsfresh.utilities.dataframe_functions import roll_time_series
df_out["ID"] = 0
periods = 30
df_out = df_out.reset_index()
df_ts = roll_time_series(df_out,"ID","Date",None,1,periods)
counts = df_ts['ID'].value_counts()
df_ts = df_ts[df_ts['ID'].isin(counts[counts > periods].index)]
#Perform extraction
from tsfresh.feature_extraction import extract_features, CustomFCParameters
settings_dict = CustomFCParameters()
settings_dict["var_index"] = {"PCA_1":None, "PCA_2": None}
df_feat = extract_features(df_ts.drop(["Close_1"],axis=1),default_fc_parameters=settings_dict,column_id="ID",column_sort="Date")
Feature Extraction: 100%|██████████| 5/5 [00:10<00:00, 2.14s/it]
# Cleaning operations
import pandasvault as pv
df_feat2 = df_feat.copy()
df_feat = df_feat.dropna(thresh = len(df_feat)*0.50, axis = "columns")
df_feat_cons = pv.constant_feature_detect(data=df_feat,threshold=0.9)
df_feat = df_feat.drop(df_feat_cons, axis=1)
df_feat = df_feat.ffill()
df_feat = pd.merge(df_feat,df[["Close_1"]],left_index=True,right_index=True,how="left")
print(df_feat.shape)
model(df_feat) #noisy
7 variables are found to be almost constant
(208, 48)
2064.7813982935995
from tsfresh import select_features
from tsfresh.utilities.dataframe_functions import impute
impute(df_feat)
df_feat_2 = select_features(df_feat.drop(["Close_1"],axis=1),df_feat["Close_1"],fdr_level=0.05)
df_feat_2["Close_1"] = df_feat["Close_1"]
model(df_feat_2) #improvement (b/ not an augmentation method)
1577.5273071299482
(3) (6) (i) Feature Agglomeration; (1) (2) (i) Standard Scaler.
As in this step, after (1), (2), (3), (4) and (5) you can often circle back to the initial steps to normalise and dimensionally reduce the data for the final model.
import numpy as np
import pandas as pd
from sklearn import cluster
def feature_agg(df, drop, components):
components = min(df.shape[1]-1,components)
agglo = cluster.FeatureAgglomeration(n_clusters=components,)
df = df.drop(drop,axis=1)
agglo.fit(df)
df = pd.DataFrame(agglo.transform(df))
df = df.add_prefix('fe_agg_')
return df
df_final = transform.standard_scaler(df_feat_2, drop=["Close_1"])
df_final = mapper.feature_agg(df_final,["Close_1"],4)
df_final.index = df_feat.index
df_final["Close_1"] = df_feat["Close_1"]
model(df_final) #noisy
1949.89085894338
Final Model After Applying 13 Arbitrary Augmentation Techniques
model(df_final) #improvement
1949.89085894338
Original Model Before Augmentation
df_org = df.iloc[:,:7][df.index.isin(df_final.index)]
model(df_org)
389.783990984133
Best Model After Developing 8 Augmenting Features
df_best = df_best.replace([np.inf, -np.inf], np.nan).ffill().fillna(0)
model(df_best)
267.1826850968307
Commentary
There are countless ways in which the current model can be improved. This can take on an automated form where all techniques are tested against a hold-out set; for example, we can perform an operation like the sketch after this commentary, and even though it improves the score here, more robust tests are needed. The skeleton example above is not meant to highlight the performance of the package; it simply serves as an example of how one can go about applying augmentation methods.
Quite naturally, this example suffers from dimensionality issues, with array shapes reaching (208, 48); furthermore, you would need a sample that is at least 50-100 times larger before machine learning methods start to make sense.
Nonetheless, in this example, Transformation, Interactions and Mappings (applied to extraction output) performed fairly well. Extraction augmentation was overkill, but created a reasonable model when dimensionally reduced. A better selection of one of the 50+ augmentation methods and the order of augmentation could further help improve the outcome if robustly tested against development sets.
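A minimal sketch of such an automated check, assuming model() returns the error score printed throughout this notebook and reusing two of the techniques shown above as candidates (the candidates dictionary and its labels are placeholders, not part of DeltaPy):
# Score each candidate augmentation against the untouched baseline and keep improvements.
candidates = {
    "robust_scaler": lambda d: transform.robust_scaler(d, drop=["Close_1"]),
    "pca_8": lambda d: mapper.pca_feature(d, drop_cols=["Close_1"], variance_or_components=0.9, n_components=8, non_linear=False),
}
baseline = model(df_out)
scores = {}
for name, fn in candidates.items():
    try:
        scores[name] = model(fn(df_out.copy()))
    except Exception:
        scores[name] = None  # skip techniques that fail on this dataset
kept = {name: score for name, score in scores.items() if score is not None and score < baseline}
print(kept)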
[1] DeltaPy Development
Author: firmai
Source Code: https://github.com/firmai/deltapy
#engineering
1647540000
The Substrate Knowledge Map provides information that you—as a Substrate hackathon participant—need to know to develop a non-trivial application for your hackathon submission.
The map covers 6 main sections:
Each section contains basic information on each topic, with links to additional documentation for you to dig deeper. Within each section, you'll find a mix of quizzes and labs to test your knowledge as you progress through the map. The goal of the labs and quizzes is to help you consolidate what you've learned and put it into practice with some hands-on activities.
One question we often get is why learn the Substrate framework when we can write smart contracts to build decentralized applications?
The short answer is that using the Substrate framework and writing smart contracts are two different approaches.
Traditional smart contract platforms allow users to publish additional logic on top of some core blockchain logic. Since smart contract logic can be published by anyone, including malicious actors and inexperienced developers, there are a number of intentional safeguards and restrictions built around these public smart contract platforms. For example:
Fees: Smart contract developers must ensure that contract users are charged for the computation and storage they impose on the computers running their contract. With fees, block creators are protected from abuse of the network.
Sandboxed: A contract is not able to modify core blockchain storage or storage items of other contracts directly. Its power is limited to only modifying its own state, and the ability to make outside calls to other contracts or runtime functions.
Reversion: Contracts can be prone to undesirable situations that lead to logical errors when wanting to revert or upgrade them. Developers need to learn additional patterns such as splitting their contract's logic and data to ensure seamless upgrades.
These safeguards and restrictions make running smart contracts slower and more costly. However, it's important to consider the different developer audiences for contract development versus Substrate runtime development.
Building decentralized applications with smart contracts allows your community to extend and develop on top of your runtime logic without worrying about proposals, runtime upgrades, and so on. You can also use smart contracts as a testing ground for future runtime changes, but done in an isolated way that protects your network from any errors the changes might introduce.
In summary, smart contract development:
Unlike traditional smart contract development, Substrate runtime development offers none of the network protections or safeguards. Instead, as a runtime developer, you have total control over how the blockchain behaves. However, this level of control also means that there is a higher barrier to entry.
Substrate is a framework for building blockchains, which almost makes comparing it to smart contract development like comparing apples and oranges. With the Substrate framework, developers can build smart contracts but that is only a fraction of using Substrate to its full potential.
With Substrate, you have full control over the underlying logic that your network's nodes will run. You also have full access for modifying and controlling each and every storage item across your runtime modules. As you progress through this map, you'll discover concepts and techniques that will help you to unlock the potential of the Substrate framework, giving you the freedom to build the blockchain that best suits the needs of your application.
You'll also discover how you can upgrade the Substrate runtime with a single transaction instead of having to organize a community hard-fork. Upgradeability is one of the primary design features of the Substrate framework.
In summary, runtime development:
To learn more about using smart contracts within Substrate, refer to the Smart Contract - Overview page as well as the Polkadot Builders Guide.
If you need any community support, please join the following channels based on the area where you need help:
Alternatively, also look for support on Stackoverflow where questions are tagged with "substrate" or on the Parity Subport repo.
Use the following links to explore the sites and resources available on each:
Substrate Developer Hub has the most comprehensive all-round coverage about Substrate, from a "big picture" explanation of architecture to specific technical concepts. The site also provides tutorials to guide you as you learn the Substrate framework and the API reference documentation. You should check this site first if you want to look up information about Substrate runtime development. The site consists of:
Knowledge Base: Explaining the foundational concepts of building blockchain runtimes using Substrate.
Tutorials: Hands-on tutorials for developers to follow. The first six tutorials cover the fundamentals of Substrate and are recommended for every Substrate learner to go through.
How-to Guides: These resources are like the O'Reilly cookbook series, written in a task-oriented way so readers can get the job done. Some examples of the topics covered include:
API docs: Substrate API reference documentation.
Substrate Node Template provides a lightweight, minimal Substrate blockchain node that you can set up as a local development environment.
Substrate Front-end template provides a front-end interface built with React using Polkadot-JS API to connect to any Substrate node. Developers are encouraged to start new Substrate projects based on these templates.
If you face any technical difficulties and need support, feel free to join the Substrate Technical matrix channel and ask your questions there.
Polkadot Wiki documents the specific behavior and mechanisms of the Polkadot network. The Polkadot network allows multiple blockchains to connect and pass messages to each other. On the wiki, you can learn about how Polkadot—built using Substrate—is customized to support inter-blockchain message passing.
Polkadot JS API doc: documents how to use the Polkadot-JS API. This JavaScript-based API allows developers to build custom front-ends for their blockchains and applications. Polkadot JS API provides a way to connect to Substrate-based blockchains to query runtime metadata and send transactions.
👉 Submit your answers to Quiz #1
Here you will set up your local machine to install the Rust compiler—ensuring that you have both stable and nightly versions installed. Both stable and nightly versions are required because currently a Substrate runtime is compiled to a native binary using the stable Rust compiler, then compiled to a WebAssembly (WASM) binary, which only the nightly Rust compiler can do.
Also refer to:
👉 Complete Lab #1: Run a Substrate node
Polkadot JS Apps is the canonical front-end to interact with any Substrate-based chain.
You can configure whichever endpoint you want it to connect to, even your locally running node. Refer to the following two diagrams.
👉 Complete Quiz #2
👉 Complete Lab #2: Using Polkadot-JS Apps
Notes: If you are connecting Apps to a custom chain (or your locally-running node), you may need to specify your chain's custom data types in JSON under Settings > Developer.
Polkadot-JS Apps only receives a series of bytes from the blockchain. It is up to the developer to tell it how to decode and interpret these custom data types. To learn more about this, refer to:
You will also need to create an account. To do so, follow these steps on account generation. You'll learn that you can also use the Polkadot-JS Browser Plugin (a Metamask-like browser extension to manage your Substrate accounts) and it will automatically be imported into Polkadot-JS Apps.
Notes: When you run a Substrate chain in development mode (with the --dev flag), well-known accounts (Alice, Bob, Charlie, etc.) are always created for you.
👉 Complete Lab #3: Create an Account
You need to know some Rust programming concepts and have a good understanding of how blockchain technology works in order to make the most of developing with Substrate. The following resources will help you brush up on these areas.
You will need to familiarize yourself with Rust to understand how Substrate is built and how to make the most of its capabilities.
If you are new to Rust, or need a brush up on your Rust knowledge, please refer to The Rust Book. You could still continue learning about Substrate without knowing Rust, but we recommend you come back to this section whenever in doubt about what any of the Rust syntax you're looking at means. Here are the parts of the Rust book we recommend you familiarize yourself with:
Given that you'll be writing a blockchain runtime, you need to know what a blockchain is, and how it works. The Web3 Blockchain Fundamental MOOC YouTube video series provides a good basis for understanding key blockchain concepts and how blockchains work.
The lectures we recommend you watch are: lectures 1 - 7 and lecture 10. That's 8 lectures, or about 4 hours of video.
👉 Complete Quiz #3
To know more about the high level architecture of Substrate, please go through the Knowledge Base articles on Getting Started: Overview and Getting Started: Architecture.
In this document, we assume you will develop a Substrate runtime with FRAME (v2). This is what a Substrate node consists of.
Each node has many components that manage things like the transaction queue, communicating over a P2P network, reaching consensus on the state of the blockchain, and the chain's actual runtime logic (aka the blockchain runtime). Each aspect of the node is interesting in its own right, and the runtime is particularly interesting because it contains the business logic (aka "state transition function") that codifies the chain's functionality. The runtime contains a collection of pallets that are configured to work together.
On the node level, Substrate leverages libp2p for the p2p networking layer and puts the transaction pool, consensus mechanism, and underlying data storage (a key-value database) on the node level. These components all work "under the hood", and in this knowledge map we won't cover them in detail except for mentioning their existence.
👉 Complete Quiz #4
In our Developer Hub, we have thorough coverage of various subjects you need to know to develop with Substrate. So here we just list out the key topics and reference back to the Developer Hub. Please go through the following key concepts and the directed resources to learn the fundamentals of runtime development.
Key Concept: Runtime, this is where the blockchain state transition function (the blockchain application-specific logic) is defined. It is about composing multiple pallets (which can be understood as Rust modules) together in the runtime and hooking them up.
Runtime Development: Execution, this article describes how a block is produced, and how transactions are selected and executed to reach the next "stage" in the blockchain.
Runtime Development: Pallets, this article describes what the basic structure of a Substrate pallet consists of.
Runtime Development: FRAME, this article gives a high level overview of the system pallets Substrate already implements to help you quickly develop as a runtime engineer. Have a quick skim so you have a basic idea of the different pallets Substrate is made of.
👉 Complete Lab #4: Adding a Pallet into a Runtime
Runtime Development: Storage, this article describes how data is stored on-chain and how you can access it.
Runtime Development: Events & Errors, this page describes how external parties know what has happened in the blockchain, via the events and errors emitted when executing transactions.
Notes: All of the above concepts are defined in code using the #[pallet::*] macros. If you are interested in learning more about what other types of pallet macros exist, go to the FRAME macro API documentation and this doc on some frequently used Substrate macros.
👉 Complete Lab #5: Building a Proof-of-Existence dApp
👉 Complete Lab #6: Building a Substrate Kitties dApp
👉 Complete Quiz #5
Polkadot JS API is the JavaScript API for Substrate. By using it, you can build a JavaScript front-end or utility and interact with any Substrate-based blockchain.
The Substrate Front-end Template is an example of using Polkadot JS API in a React front-end.
👉 Complete Lab #7: Using Polkadot-JS API
👉 Complete Quiz #6: Using Polkadot-JS API
Learn about the difference between smart contract development vs Substrate runtime development, and when to use each here.
In Substrate, you can program smart contracts using ink!.
👉 Complete Quiz #7: Using ink!
A lot 😄
On-chain runtime upgrades. We have a tutorial on On-chain (forkless) Runtime Upgrade. This tutorial introduces how to perform and schedule a runtime upgrade as an on-chain transaction.
About transaction weight and fee, and benchmarking your runtime to determine the proper transaction cost.
There are certain limits to on-chain logic. For instance, computation cannot be so intensive that it affects the block production time, and computation must be deterministic. This means that computation that relies on external data fetching cannot be done on-chain. In Substrate, developers can run these types of computation off-chain and have the result sent back on-chain via extrinsics.
Tightly- and Loosely-coupled pallets, calling one pallet's functions from another pallet via trait specification.
Blockchain Consensus Mechanism, and a guide on customizing it to proof-of-work here.
Parachains: one key feature of Substrate is the capability of becoming a parachain for relay chains like Polkadot. You can develop your own application-specific logic in your chain and rely on the validator community of the relay chain to secure your network, instead of building another validator community yourself. Learn more with the following resources:
Author: substrate-developer-hub
Source Code: https://github.com/substrate-developer-hub/hackathon-knowledge-map
License:
1663559281
Learn how to create a to-do list app with local storage using HTML, CSS and JavaScript. Along the way you'll pick up the basics of JavaScript together with some more advanced features such as localStorage for saving data to the browser.
<!DOCTYPE html>
<html lang="en">
<head>
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>To Do List With Local Storage</title>
<!-- Font Awesome Icons -->
<link
rel="stylesheet"
href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.2.0/css/all.min.css"
/>
<!-- Google Fonts -->
<link
href="https://fonts.googleapis.com/css2?family=Poppins:wght@400;500&display=swap"
rel="stylesheet"
/>
<!-- Stylesheet -->
<link rel="stylesheet" href="style.css" />
</head>
<body>
<div class="container">
<div id="new-task">
<input type="text" placeholder="Enter The Task Here..." />
<button id="push">Add</button>
</div>
<div id="tasks"></div>
</div>
<!-- Script -->
<script src="script.js"></script>
</body>
</html>
* {
padding: 0;
margin: 0;
box-sizing: border-box;
}
body {
background-color: #0b87ff;
}
.container {
width: 90%;
max-width: 34em;
position: absolute;
transform: translate(-50%, -50%);
top: 50%;
left: 50%;
}
#new-task {
position: relative;
background-color: #ffffff;
padding: 1.8em 1.25em;
border-radius: 0.3em;
box-shadow: 0 1.25em 1.8em rgba(1, 24, 48, 0.15);
display: grid;
grid-template-columns: 9fr 3fr;
gap: 1em;
}
#new-task input {
font-family: "Poppins", sans-serif;
font-size: 1em;
border: none;
border-bottom: 2px solid #d1d3d4;
padding: 0.8em 0.5em;
color: #111111;
font-weight: 500;
}
#new-task input:focus {
outline: none;
border-color: #0b87ff;
}
#new-task button {
font-family: "Poppins", sans-serif;
font-weight: 500;
font-size: 1em;
background-color: #0b87ff;
color: #ffffff;
outline: none;
border: none;
border-radius: 0.3em;
cursor: pointer;
}
#tasks {
background-color: #ffffff;
position: relative;
padding: 1.8em 1.25em;
margin-top: 3.8em;
width: 100%;
box-shadow: 0 1.25em 1.8em rgba(1, 24, 48, 0.15);
border-radius: 0.6em;
}
.task {
background-color: #ffffff;
padding: 0.3em 0.6em;
margin-top: 0.6em;
display: flex;
align-items: center;
border-bottom: 2px solid #d1d3d4;
cursor: pointer;
}
.task span {
font-family: "Poppins", sans-serif;
font-size: 0.9em;
font-weight: 400;
}
.task button {
color: #ffffff;
padding: 0.8em 0;
width: 2.8em;
border-radius: 0.3em;
border: none;
outline: none;
cursor: pointer;
}
.delete {
background-color: #fb3b3b;
}
.edit {
background-color: #0b87ff;
margin-left: auto;
margin-right: 3em;
}
.completed {
text-decoration: line-through;
}
//Initial References
const newTaskInput = document.querySelector("#new-task input");
const tasksDiv = document.querySelector("#tasks");
let deleteTasks, editTasks, tasks;
let updateNote = "";
let count;
//Function on window load
window.onload = () => {
updateNote = "";
count = Object.keys(localStorage).length;
displayTasks();
};
//Function to Display The Tasks
const displayTasks = () => {
if (Object.keys(localStorage).length > 0) {
tasksDiv.style.display = "inline-block";
} else {
tasksDiv.style.display = "none";
}
//Clear the tasks
tasksDiv.innerHTML = "";
//Fetch All The Keys in local storage
let tasks = Object.keys(localStorage);
tasks = tasks.sort();
for (let key of tasks) {
let classValue = "";
//Get all values
let value = localStorage.getItem(key);
let taskInnerDiv = document.createElement("div");
taskInnerDiv.classList.add("task");
taskInnerDiv.setAttribute("id", key);
taskInnerDiv.innerHTML = `<span id="taskname">${key.split("_")[1]}</span>`;
//localStorage stores booleans as strings, so we parse the value back to a boolean
let editButton = document.createElement("button");
editButton.classList.add("edit");
editButton.innerHTML = `<i class="fa-solid fa-pen-to-square"></i>`;
if (!JSON.parse(value)) {
editButton.style.visibility = "visible";
} else {
editButton.style.visibility = "hidden";
taskInnerDiv.classList.add("completed");
}
taskInnerDiv.appendChild(editButton);
taskInnerDiv.innerHTML += `<button class="delete"><i class="fa-solid fa-trash"></i></button>`;
tasksDiv.appendChild(taskInnerDiv);
}
//tasks completed
tasks = document.querySelectorAll(".task");
tasks.forEach((element, index) => {
element.onclick = () => {
//local storage update
if (element.classList.contains("completed")) {
updateStorage(element.id.split("_")[0], element.innerText, false);
} else {
updateStorage(element.id.split("_")[0], element.innerText, true);
}
};
});
//Edit Tasks
editTasks = document.getElementsByClassName("edit");
Array.from(editTasks).forEach((element, index) => {
element.addEventListener("click", (e) => {
//Stop propagation to outer elements (otherwise the click would bubble up to the parent task element)
e.stopPropagation();
//disable other edit buttons when one task is being edited
disableButtons(true);
//update input value and remove div
let parent = element.parentElement;
newTaskInput.value = parent.querySelector("#taskname").innerText;
//set updateNote to the task that is being edited
updateNote = parent.id;
//remove task
parent.remove();
});
});
//Delete Tasks
deleteTasks = document.getElementsByClassName("delete");
Array.from(deleteTasks).forEach((element, index) => {
element.addEventListener("click", (e) => {
e.stopPropagation();
//Delete from local storage and remove div
let parent = element.parentElement;
removeTask(parent.id);
parent.remove();
count -= 1;
});
});
};
//Disable Edit Button
const disableButtons = (bool) => {
let editButtons = document.getElementsByClassName("edit");
Array.from(editButtons).forEach((element) => {
element.disabled = bool;
});
};
//Remove Task from local storage
const removeTask = (taskValue) => {
localStorage.removeItem(taskValue);
displayTasks();
};
//Add tasks to local storage
const updateStorage = (index, taskValue, completed) => {
localStorage.setItem(`${index}_${taskValue}`, completed);
displayTasks();
};
//Function To Add New Task
document.querySelector("#push").addEventListener("click", () => {
//Enable the edit button
disableButtons(false);
if (newTaskInput.value.length == 0) {
alert("Please Enter A Task");
} else {
//Store locally and display from local storage
if (updateNote == "") {
//new task
updateStorage(count, newTaskInput.value, false);
} else {
//update task
let existingCount = updateNote.split("_")[0];
removeTask(updateNote);
updateStorage(existingCount, newTaskInput.value, false);
updateNote = "";
}
count += 1;
newTaskInput.value = "";
}
});
#html #css #javascript
1667904060
A Django plugin for creating AJAX driven forms in Bootstrap modal.
This repository includes Dockerfile and docker-compose.yml files so you can easily set up and start to experiment with django-bootstrap-modal-forms running inside of a container on your local machine. Any changes you make in the bootstrap_modal_forms, examples and test folders are reflected in the container (see docker-compose.yml), and the data stored in the sqlite3 database is persistent even if you remove the stopped container. Follow the steps below to run the app:
$ git clone https://github.com/trco/django-bootstrap-modal-forms.git
$ cd django-bootstrap-modal-forms
$ docker compose up (use -d flag to run app in detached mode in the background)
$ visit 0.0.0.0:8000
Install django-bootstrap-modal-forms
:
$ pip install django-bootstrap-modal-forms
Add bootstrap_modal_forms
to your INSTALLED_APPS in settings.py:
INSTALLED_APPS = [
...
'bootstrap_modal_forms',
...
]
Include Bootstrap, jQuery and jquery.bootstrap.modal.forms.js on every page where you would like to set up the AJAX-driven Django forms in a Bootstrap modal.
IMPORTANT: Adjust the Bootstrap and jQuery file paths to match yours, but include jquery.bootstrap.modal.forms.js exactly as in the code below.
<head>
<link rel="stylesheet" href="{% static 'assets/css/bootstrap.css' %}">
</head>
<body>
<script src="{% static 'assets/js/bootstrap.js' %}"></script>
<script src="{% static 'assets/js/jquery.js' %}"></script>
<script src="{% static 'js/jquery.bootstrap.modal.forms.js' %}"></script>
<!-- You can alternatively load the minified version -->
<script src="{% static 'js/jquery.bootstrap.modal.forms.min.js' %}"></script>
</body>
index.html
<script type="text/javascript">
$(document).ready(function() {
$("#create-book").modalForm({
formURL: "{% url 'create_book' %}"
});
});
</script>
Clicking the trigger element bound with modalForm opens the modal, the form served at formURL is appended to the modal, and submitting it to formURL redirects to success_url and shows success_message, which are both defined in the related Django view.
Define BookModelForm and inherit the built-in form BSModalModelForm.
forms.py
from .models import Book
from bootstrap_modal_forms.forms import BSModalModelForm
class BookModelForm(BSModalModelForm):
class Meta:
model = Book
fields = ['title', 'author', 'price']
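For reference, the examples in this README assume a simple Book model along these lines. This is only an inferred sketch (field names are taken from the forms used in this document and the BOOK_TYPES choices are hypothetical); the actual model in the bundled examples app may differ.
# models.py (illustrative sketch, not part of the package)
from django.db import models
class Book(models.Model):
    # Hypothetical choices; only the existence of BOOK_TYPES is implied by the filter example further below.
    BOOK_TYPES = (
        (1, 'Hardcover'),
        (2, 'Paperback'),
    )
    title = models.CharField(max_length=100)
    author = models.CharField(max_length=100)
    price = models.DecimalField(max_digits=8, decimal_places=2)
    type = models.IntegerField(choices=BOOK_TYPES, default=1)
    timestamp = models.DateTimeField(auto_now_add=True)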
Define the form's html and save it as a Django template.
The form will POST to the formURL defined in #6. Add class="invalid" or a custom errorClass (see paragraph Options) to the elements that wrap the fields; class="invalid" acts as a flag for the fields having errors after the form has been POSTed.
book/create_book.html
<form method="post" action="">
{% csrf_token %}
<div class="modal-header">
<h5 class="modal-title">Create new Book</h5>
<button type="button" class="close" data-dismiss="modal" aria-label="Close">
<span aria-hidden="true">×</span>
</button>
</div>
<div class="modal-body">
{% for field in form %}
<div class="form-group{% if field.errors %} invalid{% endif %}">
<label for="{{ field.id_for_label }}">{{ field.label }}</label>
{{ field }}
{% for error in field.errors %}
<p class="help-block">{{ error }}</p>
{% endfor %}
</div>
{% endfor %}
</div>
<div class="modal-footer">
<button type="button" class="btn btn-default" data-dismiss="modal">Close</button>
<button type="submit" class="btn btn-primary">Create</button>
</div>
</form>
Define a class-based view BookCreateView and inherit from built-in generic view BSModalCreateView. BookCreateView processes the form defined in #1, uses the template defined in #2 and redirects to success_url showing success_message.
views.py
from django.urls import reverse_lazy
from .forms import BookModelForm
from .models import Book
from bootstrap_modal_forms.generic import BSModalCreateView
class BookCreateView(BSModalCreateView):
template_name = 'examples/create_book.html'
form_class = BookModelForm
success_message = 'Success: Book was created.'
success_url = reverse_lazy('index')
Define URL for the view in #3.
from django.urls import path
from books import views
urlpatterns = [
path('', views.Index.as_view(), name='index'),
path('create/', views.BookCreateView.as_view(), name='create_book'),
]
Define the Bootstrap modal window and the html element triggering modal opening.
A single modal can be reused for multiple modalForms in a single template (see #6). Each modal should have a unique id, and the same value should also be set as the modalID option when instantiating modalForm on the trigger element. The trigger element (here the button with id="create-book") is used for instantiation of modalForm in #6; any element can serve as the trigger as long as modalForm is bound to it. Opening the modal loads the form's html within <div class="modal-content"></div> and sets the action attribute of the form to the formURL set in #6.
index.html
<div class="modal fade" tabindex="-1" role="dialog" id="modal">
<div class="modal-dialog" role="document">
<div class="modal-content"></div>
</div>
</div>
<!-- Create book button -->
<button id="create-book" class="btn btn-primary" type="button" name="button">Create book</button>
Add the script to the template from #5 and bind the modalForm to the trigger element. Set the BookCreateView URL defined in #4 as the formURL property of modalForm.
If you create additional modalForms in the template, bind each new modalForm with a unique URL to it. The default values for modalID, modalContent, modalForm and errorClass are used in this example, while formURL is customized. If you customize any other option, adjust the code of the above examples accordingly.
index.html
<script type="text/javascript">
$(document).ready(function() {
$("#create-book").modalForm({
formURL: "{% url 'create_book' %}"
});
});
</script>
Set the asyncUpdate and asyncSettings settings to create or update objects without page redirection to successUrl, and to define whether a modal should close or stay open after form submission. See the comments in the example below and the paragraph modalForm options for an explanation of asyncSettings. See the examples for how to properly reinstantiate modal forms for all CRUD buttons when using the async options.
index.html
<!-- asyncSettings.dataElementId -->
<table id="books-table" class="table">
<thead>
...
</thead>
<tbody>
{% for book in books %}
<tr>
...
<!-- Update book buttons -->
<button type="button" class="update-book btn btn-sm btn-primary" data-form-url="{% url 'update_book' book.pk %}">
<span class="fa fa-pencil"></span>
</button>
...
</td>
</tr>
{% endfor %}
</tbody>
</table>
<script type="text/javascript">
$(function () {
...
// asyncSettings.successMessage
var asyncSuccessMessage = [
"<div ",
"style='position:fixed;top:0;z-index:10000;width:100%;border-radius:0;' ",
"class='alert alert-icon alert-success alert-dismissible fade show mb-0' role='alert'>",
"Success: Book was updated.",
"<button type='button' class='close' data-dismiss='alert' aria-label='Close'>",
"<span aria-hidden='true'>×</span>",
"</button>",
"</div>",
"<script>",
"$('.alert').fadeTo(2000, 500).slideUp(500, function () {$('.alert').slideUp(500).remove();});",
"<\/script>"
].join("");
// asyncSettings.addModalFormFunction
function updateBookModalForm() {
$(".update-book").each(function () {
$(this).modalForm({
formURL: $(this).data("form-url"),
asyncUpdate: true,
asyncSettings: {
closeOnSubmit: false,
successMessage: asyncSuccessMessage,
dataUrl: "books/",
dataElementId: "#books-table",
dataKey: "table",
addModalFormFunction: updateBookModalForm
}
});
});
}
updateBookModalForm();
...
});
</script>
urls.py
from django.urls import path
from . import views
urlpatterns = [
...
# asyncSettings.dataUrl
path('books/', views.books, name='books'),
...
]
views.py
from django.http import JsonResponse
from django.template.loader import render_to_string
from .models import Book
def books(request):
data = dict()
if request.method == 'GET':
books = Book.objects.all()
# asyncSettings.dataKey = 'table'
data['table'] = render_to_string(
'_books_table.html',
{'books': books},
request=request
)
return JsonResponse(data)
modalID
Sets the custom id of the modal. Default: "#modal"
modalContent
Sets the custom class of the element to which the form's html is appended. If you change modalContent
to the custom class, you should also change modalForm
accordingly. To keep Bootstrap's modal style you should then copy Bootstrap's style for modal-content
and set it to your new modalContent class. Default: ".modal-content"
modalForm
Sets the custom form selector. Default: ".modal-content form"
formURL
Sets the url of the form's view and html. Default: null
isDeleteForm
Defines if form is used for deletion. Should be set to true
for deletion forms. Default: false
errorClass
Sets the custom class for the form fields having errors. Default: ".invalid"
asyncUpdate
Sets asynchronous content update after form submission. Default: false
asyncSettings.closeOnSubmit
Sets whether modal closes or not after form submission. Default: false
asyncSettings.successMessage
Sets the successMessage shown after successful form submission. Should be set to a string defining the message element. See asyncSuccessMessage
example above. Default: null
asyncSettings.dataUrl
Sets the url of the view returning the new queryset (all of the objects plus the newly created or updated one) after an asynchronous update. Default: null
asyncSettings.dataElementId
Sets the id
of the element which rerenders asynchronously updated queryset. Default: null
asyncSettings.dataKey
Sets the key containing asynchronously updated queryset in the data dictionary returned from the view providing updated queryset. Default: null
asyncSettings.addModalFormFunction
Sets the method needed for reinstantiation of event listeners on buttons (single or all CRUD buttons) after asynchronous update. Default: null
triggerElement.modalForm({
modalID: "#modal",
modalContent: ".modal-content",
modalForm: ".modal-content form",
formURL: null,
isDeleteForm: false,
errorClass: ".invalid",
asyncUpdate: false,
asyncSettings: {
closeOnSubmit: false,
successMessage: null,
dataUrl: null,
dataElementId: null,
dataKey: null,
addModalFormFunction: null
}
});
Import forms with from bootstrap_modal_forms.forms import BSModalForm
.
BSModalForm
Inherits PopRequestMixin and Django's forms.Form.
BSModalModelForm
Inherits PopRequestMixin, CreateUpdateAjaxMixin and Django's forms.ModelForm.
Import mixins with from bootstrap_modal_forms.mixins import PassRequestMixin
.
PassRequestMixin
Puts the request into the form's kwargs.
PopRequestMixin
Pops request out of the kwargs and attaches it to the form's instance.
CreateUpdateAjaxMixin
Saves or doesn't save the object based on the request type.
DeleteMessageMixin
Deletes object if request is not ajax request.
LoginAjaxMixin
Authenticates user if request is not ajax request.
Import generic views with from bootstrap_modal_forms.generic import BSModalFormView
.
BSModalFormView
Inherits PassRequestMixin and Django's generic.FormView.
BSModalCreateView
Inherits PassRequestMixin and Django's SuccessMessageMixin and generic.CreateView.
BSModalUpdateView
Inherits PassRequestMixin and Django's SuccessMessageMixin and generic.UpdateView.
BSModalReadView
Inherits Django's generic.DetailView.
BSModalDeleteView
Inherits DeleteMessageMixin and Django's generic.DeleteView.
To see django-bootstrap-modal-forms in action, clone the repository and run the examples locally:
$ git clone https://github.com/trco/django-bootstrap-modal-forms.git
$ cd django-bootstrap-modal-forms
$ pip install -r requirements.txt
$ python manage.py migrate
$ python manage.py runserver
Run unit and functional tests inside of project folder:
$ python manage.py test
For an explanation of how all the parts of the code work together, see the paragraph Usage. To test the working solution presented here, clone and run the Examples.
forms.py
from django.contrib.auth.forms import UserCreationForm
from django.contrib.auth.models import User
from bootstrap_modal_forms.mixins import PopRequestMixin, CreateUpdateAjaxMixin
class CustomUserCreationForm(PopRequestMixin, CreateUpdateAjaxMixin,
UserCreationForm):
class Meta:
model = User
fields = ['username', 'password1', 'password2']
signup.html
{% load widget_tweaks %}
<form method="post" action="">
{% csrf_token %}
<div class="modal-header">
<h3 class="modal-title">Sign up</h3>
<button type="button" class="close" data-dismiss="modal" aria-label="Close">
<span aria-hidden="true">×</span>
</button>
</div>
<div class="modal-body">
<div class="{% if form.non_field_errors %}invalid{% endif %} mb-2">
{% for error in form.non_field_errors %}
{{ error }}
{% endfor %}
</div>
{% for field in form %}
<div class="form-group">
<label for="{{ field.id_for_label }}">{{ field.label }}</label>
{% render_field field class="form-control" placeholder=field.label %}
<div class="{% if field.errors %} invalid{% endif %}">
{% for error in field.errors %}
<p class="help-block">{{ error }}</p>
{% endfor %}
</div>
</div>
{% endfor %}
</div>
<div class="modal-footer">
<button type="submit" class="btn btn-primary">Sign up</button>
</div>
</form>
views.py
from django.urls import reverse_lazy
from bootstrap_modal_forms.generic import BSModalCreateView
from .forms import CustomUserCreationForm
class SignUpView(BSModalCreateView):
form_class = CustomUserCreationForm
template_name = 'examples/signup.html'
success_message = 'Success: Sign up succeeded. You can now Log in.'
success_url = reverse_lazy('index')
urls.py
from django.urls import path
from . import views
app_name = 'accounts'
urlpatterns = [
path('signup/', views.SignUpView.as_view(), name='signup')
]
.html file containing modal, trigger element and script instantiating modalForm
<div class="modal fade" tabindex="-1" role="dialog" id="modal">
<div class="modal-dialog" role="document">
<div class="modal-content"></div>
</div>
</div>
<button id="signup-btn" class="btn btn-primary" type="button" name="button">Sign up</button>
<script type="text/javascript">
$(function () {
// Sign up button
$("#signup-btn").modalForm({
formURL: "{% url 'signup' %}"
});
});
</script>
For an explanation of how all the parts of the code work together, see the paragraph Usage. To test the working solution presented here, clone and run the Examples.
You can set the login redirection by setting the LOGIN_REDIRECT_URL in settings.py.
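For example, in settings.py (the 'index' URL name here simply matches the one used by the examples in this README):
LOGIN_REDIRECT_URL = 'index'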
You can also set a custom login redirection by adding success_url to the extra_context of CustomLoginView and setting the success_url variable as the value of the hidden input field with name="next" within the Login form html.
forms.py
from django.contrib.auth.forms import AuthenticationForm
from django.contrib.auth.models import User
class CustomAuthenticationForm(AuthenticationForm):
class Meta:
model = User
fields = ['username', 'password']
login.html
{% load widget_tweaks %}
<form method="post" action="">
{% csrf_token %}
<div class="modal-header">
<h3 class="modal-title">Log in</h3>
<button type="button" class="close" data-dismiss="modal" aria-label="Close">
<span aria-hidden="true">×</span>
</button>
</div>
<div class="modal-body">
<div class="{% if form.non_field_errors %}invalid{% endif %} mb-2">
{% for error in form.non_field_errors %}
{{ error }}
{% endfor %}
</div>
{% for field in form %}
<div class="form-group">
<label for="{{ field.id_for_label }}">{{ field.label }}</label>
{% render_field field class="form-control" placeholder=field.label %}
<div class="{% if field.errors %} invalid{% endif %}">
{% for error in field.errors %}
<p class="help-block">{{ error }}</p>
{% endfor %}
</div>
</div>
{% endfor %}
<!-- Hidden input field for custom redirection after successful login -->
<input type="hidden" name="next" value="{{ success_url }}">
</div>
<div class="modal-footer">
<button type="submit" class="btn btn-primary">Log in</button>
</div>
</form>
views.py
from django.urls import reverse_lazy
from bootstrap_modal_forms.generic import BSModalLoginView
from .forms import CustomAuthenticationForm
class CustomLoginView(BSModalLoginView):
authentication_form = CustomAuthenticationForm
template_name = 'examples/login.html'
success_message = 'Success: You were successfully logged in.'
extra_context = dict(success_url=reverse_lazy('index'))
urls.py
from django.urls import path
from . import views
app_name = 'accounts'
urlpatterns = [
path('login/', views.CustomLoginView.as_view(), name='login')
]
.html file containing modal, trigger element and script instantiating modalForm
<div class="modal fade" tabindex="-1" role="dialog" id="modal">
<div class="modal-dialog" role="document">
<div class="modal-content"></div>
</div>
</div>
<button id="login-btn" class="btn btn-primary" type="button" name="button">Sign up</button>
<script type="text/javascript">
$(function () {
// Log in button
$("#login-btn").modalForm({
formURL: "{% url 'login' %}"
});
});
</script>
For an explanation of how all the parts of the code work together, see the paragraph Usage. To test the working solution presented here, clone and run the Examples.
forms.py
from .models import Book
from bootstrap_modal_forms.forms import BSModalModelForm
class BookModelForm(BSModalModelForm):
class Meta:
model = Book
exclude = ['timestamp']
create_book.html
{% load widget_tweaks %}
<form method="post" action="">
{% csrf_token %}
<div class="modal-header">
<h3 class="modal-title">Create Book</h3>
<button type="button" class="close" data-dismiss="modal" aria-label="Close">
<span aria-hidden="true">×</span>
</button>
</div>
<div class="modal-body">
<div class="{% if form.non_field_errors %}invalid{% endif %} mb-2">
{% for error in form.non_field_errors %}
{{ error }}
{% endfor %}
</div>
{% for field in form %}
<div class="form-group">
<label for="{{ field.id_for_label }}">{{ field.label }}</label>
{% render_field field class="form-control" placeholder=field.label %}
<div class="{% if field.errors %} invalid{% endif %}">
{% for error in field.errors %}
<p class="help-block">{{ error }}</p>
{% endfor %}
</div>
</div>
{% endfor %}
</div>
<div class="modal-footer">
<button type="submit" class="btn btn-primary">Create</button>
</div>
</form>
update_book.html
{% load widget_tweaks %}
<form method="post" action="">
{% csrf_token %}
<div class="modal-header">
<h3 class="modal-title">Update Book</h3>
<button type="button" class="close" data-dismiss="modal" aria-label="Close">
<span aria-hidden="true">×</span>
</button>
</div>
<div class="modal-body">
<div class="{% if form.non_field_errors %}invalid{% endif %} mb-2">
{% for error in form.non_field_errors %}
{{ error }}
{% endfor %}
</div>
{% for field in form %}
<div class="form-group">
<label for="{{ field.id_for_label }}">{{ field.label }}</label>
{% render_field field class="form-control" placeholder=field.label %}
<div class="{% if field.errors %} invalid{% endif %}">
{% for error in field.errors %}
<p class="help-block">{{ error }}</p>
{% endfor %}
</div>
</div>
{% endfor %}
</div>
<div class="modal-footer">
<button type="submit" class="btn btn-primary">Update</button>
</div>
</form>
read_book.html
{% load widget_tweaks %}
<div class="modal-header">
<h3 class="modal-title">Book details</h3>
<button type="button" class="close" data-dismiss="modal" aria-label="Close">
<span aria-hidden="true">×</span>
</button>
</div>
<div class="modal-body">
<div class="">
Title: {{ book.title }}
</div>
<div class="">
Author: {{ book.author }}
</div>
<div class="">
Price: {{ book.price }} €
</div>
</div>
<div class="modal-footer">
<button type="button" class="btn btn-default" data-dismiss="modal">Close</button>
</div>
{% load widget_tweaks %}
<form method="post" action="">
{% csrf_token %}
<div class="modal-header">
<h3 class="modal-title">Delete Book</h3>
<button type="button" class="close" data-dismiss="modal" aria-label="Close">
<span aria-hidden="true">×</span>
</button>
</div>
<div class="modal-body">
<p>Are you sure you want to delete book with title
<strong>{{ book.title }}</strong>?</p>
</div>
<div class="modal-footer">
<button type="submit" class="btn btn-danger">Delete</button>
</div>
</form>
views.py
from django.urls import reverse_lazy
from django.views import generic
from .forms import BookModelForm
from .models import Book
from bootstrap_modal_forms.generic import (
BSModalCreateView,
BSModalUpdateView,
BSModalReadView,
BSModalDeleteView
)
class Index(generic.ListView):
model = Book
context_object_name = 'books'
template_name = 'index.html'
# Create
class BookCreateView(BSModalCreateView):
template_name = 'examples/create_book.html'
form_class = BookModelForm
success_message = 'Success: Book was created.'
success_url = reverse_lazy('index')
# Update
class BookUpdateView(BSModalUpdateView):
model = Book
template_name = 'examples/update_book.html'
form_class = BookModelForm
success_message = 'Success: Book was updated.'
success_url = reverse_lazy('index')
# Read
class BookReadView(BSModalReadView):
model = Book
template_name = 'examples/read_book.html'
# Delete
class BookDeleteView(BSModalDeleteView):
model = Book
template_name = 'examples/delete_book.html'
success_message = 'Success: Book was deleted.'
success_url = reverse_lazy('index')
urls.py
from django.urls import path
from books import views
urlpatterns = [
path('', views.Index.as_view(), name='index'),
path('create/', views.BookCreateView.as_view(), name='create_book'),
path('update/<int:pk>', views.BookUpdateView.as_view(), name='update_book'),
path('read/<int:pk>', views.BookReadView.as_view(), name='read_book'),
path('delete/<int:pk>', views.BookDeleteView.as_view(), name='delete_book')
]
.html file containing modal, trigger elements and script instantiating modalForms
<!-- Modal 1 with id="create-book"-->
<div class="modal fade" id="create-modal" tabindex="-1" role="dialog" aria-hidden="true">
<div class="modal-dialog">
<div class="modal-content">
</div>
</div>
</div>
<!-- Modal 2 with id="modal" -->
<div class="modal fade" tabindex="-1" role="dialog" id="modal">
<div class="modal-dialog" role="document">
<div class="modal-content"></div>
</div>
</div>
<!-- Create book button -->
<button id="create-book" class="btn btn-primary" type="button" name="button">Create book</button>
{% for book in books %}
<div class="text-center">
<!-- Read book buttons -->
<button type="button" class="read-book bs-modal btn btn-sm btn-primary" data-form-url="{% url 'read_book' book.pk %}">
<span class="fa fa-eye"></span>
</button>
<!-- Update book buttons -->
<button type="button" class="update-book bs-modal btn btn-sm btn-primary" data-form-url="{% url 'update_book' book.pk %}">
<span class="fa fa-pencil"></span>
</button>
<!-- Delete book buttons -->
<button type="button" class="delete-book bs-modal btn btn-sm btn-danger" data-form-url="{% url 'delete_book' book.pk %}">
<span class="fa fa-trash"></span>
</button>
</div>
{% endfor %}
<script type="text/javascript">
$(function () {
// Read book buttons
$(".read-book").each(function () {
$(this).modalForm({formURL: $(this).data("form-url")});
});
// Delete book buttons - formURL is retrieved from the data of the element
$(".delete-book").each(function () {
$(this).modalForm({formURL: $(this).data("form-url"), isDeleteForm: true});
});
// Create book button opens form in modal with id="create-modal"
$("#create-book").modalForm({
formURL: "{% url 'create_book' %}",
modalID: "#create-modal"
});
});
</script>
The data-form-url attribute of each Update, Read and Delete button should be set to the relevant URL with the pk argument of the object to be updated, read or deleted. These data-form-url URLs should then be set as formURLs for the modalForms bound to the buttons.
For an explanation of how all the parts of the code work together, see the paragraph Usage. To test the working solution presented here, clone and run the Examples.
forms.py
from django import forms
from .models import Book
from bootstrap_modal_forms.forms import BSModalForm
class BookFilterForm(BSModalForm):
type = forms.ChoiceField(choices=Book.BOOK_TYPES)
class Meta:
fields = ['type']
filter_book.html
{% load widget_tweaks %}
<form method="post" action="">
{% csrf_token %}
<div class="modal-header">
<h3 class="modal-title">Filter Books</h3>
<button type="button" class="close" data-dismiss="modal" aria-label="Close">
<span aria-hidden="true">×</span>
</button>
</div>
<div class="modal-body">
<div class="{% if form.non_field_errors %}invalid{% endif %} mb-2">
{% for error in form.non_field_errors %}
{{ error }}
{% endfor %}
</div>
{% for field in form %}
<div class="form-group">
<label for="{{ field.id_for_label }}">{{ field.label }}</label>
{% render_field field class="form-control" placeholder=field.label %}
<div class="{% if field.errors %} invalid{% endif %}">
{% for error in field.errors %}
<p class="help-block">{{ error }}</p>
{% endfor %}
</div>
</div>
{% endfor %}
</div>
<div class="modal-footer">
<button type="submit" class="btn btn-primary">Filter</button>
</div>
</form>
views.py
from django.urls import reverse_lazy
from bootstrap_modal_forms.generic import BSModalFormView
from .forms import BookFilterForm
class BookFilterView(BSModalFormView):
template_name = 'examples/filter_book.html'
form_class = BookFilterForm
def form_valid(self, form):
self.filter = '?type=' + form.cleaned_data['type']
response = super().form_valid(form)
return response
def get_success_url(self):
return reverse_lazy('index') + self.filter
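For the filter to actually narrow the list, the Index view from the CRUD example would also need to read that query parameter. A possible sketch, not part of the package and assuming the Book model has the type field used by BookFilterForm:
from django.views import generic
from .models import Book
class Index(generic.ListView):
    model = Book
    context_object_name = 'books'
    template_name = 'index.html'
    def get_queryset(self):
        # Filter the queryset by the ?type= parameter appended in get_success_url above.
        queryset = super().get_queryset()
        book_type = self.request.GET.get('type')
        if book_type:
            queryset = queryset.filter(type=book_type)
        return queryset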
urls.py
from django.urls import path
from . import views
app_name = 'accounts'
urlpatterns = [
path('filter/', views.BookFilterView.as_view(), name='filter_book'),
]
index.html
...
<button id="filter-book" class="filter-book btn btn-primary" type="button" name="button" data-form-url="{% url 'filter_book' %}">
<span class="fa fa-filter mr-2"></span>Filter books
</button>
...
<script type="text/javascript">
$(function () {
...
$("#filter-book").each(function () {
$(this).modalForm({formURL: $(this).data('form-url')});
});
...
});
</script>
This is an Open Source project and any contribution is appreciated.
Author: trco
Source Code: https://github.com/trco/django-bootstrap-modal-forms
License: MIT license