Myriam Rogahn

1599517440

Z-score for anomaly detection

Most of the time I write longer articles on data science topics but recently I’ve been thinking about writing small, bite-sized pieces around specific concepts, algorithms and applications. This is my first attempt in that direction, hoping people will like these pieces.

In today’s “small-bite” I’m writing about Z-score in the context of anomaly detection.

Anomaly detection is the process of identifying unexpected data, events or behavior that require some examination. It is a well-established field within data science, and there are many algorithms for detecting anomalies in a dataset, depending on data type and business context. Z-score is probably the simplest of them: it can rapidly screen candidates for further examination to determine whether they are suspicious or not.

What is Z-score

Simply speaking, Z-score is a statistical measure that tells you how far a data point is from the rest of the dataset. In more technical terms, Z-score tells you how many standard deviations a given observation is away from the mean.

For example, a Z-score of 2.5 means that the data point is 2.5 standard deviations away from the mean. If that distance exceeds a chosen threshold, the point is far enough from the center to be flagged as an outlier/anomaly.
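To get a feel for how unusual a given Z-score is, the sketch below computes the share of values expected to lie at least that many standard deviations from the mean. It assumes the data follow a normal distribution, which real datasets may not, and the helper name `two_sided_tail` is made up for illustration.

```python
import math

def two_sided_tail(z):
    """Probability that a normally distributed value lies at least |z|
    standard deviations from the mean. Normality is an assumption here,
    not a property of every dataset."""
    return math.erfc(abs(z) / math.sqrt(2))

# Compare a few Z-scores
for z in (2, 2.5, 3, 3.5):
    print(f"|z| >= {z}: {two_sided_tail(z):.4%} of values")
```

Under that assumption, only about 1.2% of values sit 2.5 or more standard deviations from the mean, which is why such points are worth a second look.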

How it works

Z-score is a parametric measure that takes two parameters: the mean and the standard deviation.

Once you calculate these two parameters, finding the Z-score of a data point x is easy: z = (x - mean) / standard deviation.

Note that the mean and standard deviation are calculated for the whole dataset, whereas x represents a single data point. That means every data point has its own Z-score, whereas the mean and standard deviation remain the same everywhere.
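The point above (one mean and one standard deviation for the whole dataset, but a separate Z-score for each data point) can be sketched with NumPy as follows. The sample values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
data = np.array([10.0, 12.0, 11.0, 13.0, 30.0])

mean = data.mean()  # one mean for the whole dataset
sd = data.std()     # one standard deviation for the whole dataset

# Broadcasting applies z = (x - mean) / sd to every x at once
z_scores = (data - mean) / sd

for x, z in zip(data, z_scores):
    print(f"x = {x:5.1f}  z = {z:+.2f}")
```

Note that `np.std` computes the population standard deviation by default (`ddof=0`). Passing `ddof=1` would give the sample standard deviation and slightly smaller Z-scores.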

Example

Below is a Python implementation of Z-score with a few sample data points. I’m adding notes on each line of code to explain what’s going on.

## import numpy
import numpy as np

## sample data points; -99 and 88 are the planted outliers
data = [5, 5, 5, -99, 5, 5, 5, 5, 5, 5, 88, 5, 5, 5]
## calculate the mean of the whole dataset
mean = np.mean(data)
## calculate the (population) standard deviation of the whole dataset
sd = np.std(data)
## choose a threshold
threshold = 2
## create an empty list to store outliers
outliers = []
## detect outliers
for x in data:
    z = (x - mean) / sd  ## calculate the z-score of this point
    if abs(z) > threshold:  ## flag points beyond the threshold
        outliers.append(x)  ## add the point to the list
## print outliers
print("The detected outliers are:", outliers)

Caution and conclusion

If you play with these data you will notice a few things:

  • There are 14 data points and Z-score correctly detected 2 outliers [-99 and 88]. However, if you remove five of the 5s from the list, it detects only 1 outlier [-99]. That means you need a sufficiently large dataset for Z-score to work.
  • In large production datasets, Z-score works best if the data are normally distributed (i.e. follow a Gaussian distribution).
  • I used an arbitrary threshold of 2, beyond which all data points are flagged as outliers. Common rule-of-thumb thresholds are 2, 2.5, 3 or 3.5.
  • Finally, Z-score is sensitive to extreme values, because the mean and the standard deviation are themselves sensitive to extreme values.
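The first and third points can be checked with a small sketch. The helper `zscore_outliers` is a hypothetical wrapper around the same calculation as the example, and `reduced` is the example list with five of the 5s removed.

```python
import numpy as np

def zscore_outliers(values, threshold=2.0):
    """Return the values whose absolute Z-score exceeds the threshold."""
    arr = np.asarray(values, dtype=float)
    z = (arr - arr.mean()) / arr.std()  # population standard deviation
    return arr[np.abs(z) > threshold].tolist()

full = [5, 5, 5, -99, 5, 5, 5, 5, 5, 5, 88, 5, 5, 5]
reduced = [5, 5, -99, 5, 5, 88, 5, 5, 5]  # five of the 5s removed

print(zscore_outliers(full, threshold=2))     # [-99.0, 88.0]
print(zscore_outliers(reduced, threshold=2))  # [-99.0]
print(zscore_outliers(full, threshold=3))     # []
```

With fewer points, the two extreme values inflate the standard deviation enough that 88 drops just below the threshold of 2. With a threshold of 3, neither value is flagged even in the full list, so the threshold should be chosen with the dataset size in mind.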

Hope this was useful, feel free to get in touch via Twitter.

#machine-learning #anomaly-detection #outlier-detection #statistics #data-science

Dylan Iqbal

1630996646

A Look at an ES2022 Feature: Class Static Initialization Blocks

ECMAScript class static initialization blocks

Class static blocks provide a mechanism to perform additional static initialization during class definition evaluation.

This is not intended as a replacement for public fields, as they provide useful information for static analysis tools and are a valid target for decorators. Rather, this is intended to augment existing use cases and enable new use cases not currently handled by that proposal.

Status

Stage: 4
Champion: Ron Buckton (@rbuckton)

For detailed status of this proposal see TODO, below.

Authors

  • Ron Buckton (@rbuckton)

Motivations

The current proposals for static fields and static private fields provide a mechanism to perform per-field initialization of the static side of a class during ClassDefinitionEvaluation; however, there are some cases that cannot be covered easily. For example, if you need to evaluate statements during initialization (such as try..catch), or set two fields from a single value, you have to perform that logic outside of the class definition.

// without static blocks:
class C {
  static x = ...;
  static y;
  static z;
}

try {
  const obj = doSomethingWith(C.x);
  C.y = obj.y;
  C.z = obj.z;
}
catch {
  C.y = ...;
  C.z = ...;
}

// with static blocks:
class C {
  static x = ...;
  static y;
  static z;
  static {
    try {
      const obj = doSomethingWith(this.x);
      this.y = obj.y;
      this.z = obj.z;
    }
    catch {
      this.y = ...;
      this.z = ...;
    }
  }
}

In addition, there are cases where information sharing needs to occur between a class with an instance private field and another class or function declared in the same scope.

Static blocks provide an opportunity to evaluate statements in the context of the current class declaration, with privileged access to private state (be they instance-private or static-private):

let getX;

export class C {
  #x;
  constructor(x) {
    this.#x = { data: x };
  }

  static {
    // getX has privileged access to #x
    getX = (obj) => obj.#x;
  }
}

export function readXData(obj) {
  return getX(obj).data;
}

Relation to "Private Declarations"

The Private Declarations proposal also intends to address the issue of privileged access between two classes, by lifting the private name out of the class declaration and into the enclosing scope. While there is some overlap in that respect, private declarations do not solve the issue of multi-step static initialization without potentially exposing a private name to the outer scope purely for initialization purposes:

// with private declarations
private #z; // exposed purely for post-declaration initialization
class C {
  static y;
  static outer #z;
}
const obj = ...;
C.y = obj.y;
C.#z = obj.z;

// with static block
class C {
  static y;
  static #z; // not exposed outside of class
  static {
    const obj = ...;
    this.y = obj.y;
    this.#z = obj.z;
  }
}

In addition, Private Declarations expose a private name that potentially allows both read and write access to shared private state when read-only access might be desirable. Working around this with private declarations requires additional complexity (though there is a similar cost for static {} as well):

// with private declarations
private #zRead;
class C {
  #z = ...; // only writable inside of the class
  get #zRead() { return this.#z; } // wrapper needed to ensure read-only access
}

// with static
let zRead;
class C {
  #z = ...; // only writable inside of the class
  static { zRead = obj => obj.#z; } // callback needed to ensure read-only access
}

In the long run, however, there is nothing that prevents these two proposals from working side-by-side:

private #shared;
class C {
  static outer #shared;
  static #local;
  static {
    const obj = ...;
    this.#shared = obj.shared;
    this.#local = obj.local;
  }
}
class D {
  method() {
    C.#shared; // ok
    C.#local; // no access
  }
}

Prior Art

Syntax

class C {
  static {
    // statements
  }
}

Semantics

  • A static {} initialization block creates a new lexical scope (e.g. var, function, and block-scoped declarations are local to the static {} initialization block). This lexical scope is nested within the lexical scope of the class body (granting privileged access to instance private state for the class).
  • A class may have any number of static {} initialization blocks in its class body.
  • static {} initialization blocks are evaluated in document order interleaved with static field initializers.
  • A static {} initialization block may not have decorators (instead you would decorate the class itself).
  • When evaluated, a static {} initialization block's this receiver is the constructor object of the class (as with static field initializers).
  • It is a Syntax Error to reference arguments from within a static {} initialization block.
  • It is a Syntax Error to include a SuperCall (i.e., super()) from within a static {} initialization block.
  • A static {} initialization block may contain SuperProperty references as a means to access or invoke static members on a base class that may have been overridden by the derived class containing the static {} initialization block.
  • A static {} initialization block should be represented as an independent stack frame in debuggers and exception traces.

Examples

// "friend" access (same module)
let A, B;
{
  let friendA;

  A = class A {
    #x;

    static {
        friendA = {
          getX(obj) { return obj.#x },
          setX(obj, value) { obj.#x = value }
        };
    }
  };

  B = class B {
    constructor(a) {
      const x = friendA.getX(a); // ok
      friendA.setX(a, x); // ok
    }
  };
}

References

TODO

The following is a high-level list of tasks to progress through each stage of the TC39 proposal process:

Stage 1 Entrance Criteria

  • Identified a "champion" who will advance the addition.
  • Prose outlining the problem or need and the general shape of a solution.
  • Illustrative examples of usage.
  • High-level API.

Stage 2 Entrance Criteria

Stage 3 Entrance Criteria

Stage 4 Entrance Criteria

For up-to-date information on Stage 4 criteria, check: #48

  • Test262 acceptance tests have been written for mainline usage scenarios and merged.
  • Two compatible implementations which pass the acceptance tests:
  • A pull request has been sent to tc39/ecma262 with the integrated spec text.
  • The ECMAScript editor has signed off on the pull request.

Download Details:
Author: tc39
Official Website: https://github.com/tc39/proposal-class-static-block 
License: BSD-3
#javascript #es2022 #ecmascript 

Michael Hamill

1618310820

These Tips Will Help You Step Up Anomaly Detection Using ML

In this article, you will learn a couple of machine learning-based approaches for anomaly detection. Part two then shows how to apply one of these approaches to a specific anomaly detection use case (credit fraud detection).

A common need when you analyze real-world datasets is determining which data points stand out as being different from all the others. Such data points are known as anomalies, and the goal of anomaly detection (also known as outlier detection) is to determine all such data points in a data-driven fashion. Anomalies can be caused by errors in the data, but sometimes they indicate a new, previously unknown, underlying process.

#machine-learning #machine-learning-algorithms #anomaly-detection #detecting-data-anomalies #data-anomalies #machine-learning-use-cases #artificial-intelligence #fraud-detection

Ismael Stark

1618128600

Credit Card Fraud Detection via Machine Learning: A Case Study

This is the second and last part of my series on anomaly detection using machine learning. If you haven’t already, I recommend you read my first article, which introduces you to anomaly detection and its applications in the business world.

In this article, I will take you through a case study focused on credit card fraud detection. It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase. So the main task is to identify fraudulent credit card transactions by using machine learning. We are going to use a Python library called PyOD, which is developed specifically for anomaly detection.

#machine-learning #anomaly-detection #data-anomalies #detecting-data-anomalies #fraud-detection #fraud-detector #data-science #machine-learning-tutorials

Dejah Reinger

1604230740

Introduction to Anomaly Detection Using PyCaret

What is an Anomaly?

An anomaly by definition is something that deviates from what is standard, normal, or expected.

When dealing with datasets on a binary classification problem, we usually deal with a balanced dataset. This ensures that the model picks up the right features to learn. Now, what happens if you have very little data belonging to one class, and almost all data points belong to another class?

In such a case, we consider one classification to be the ‘normal’, and the sparse data points as a deviation from the ‘normal’ classification points.

For example, you lock your house twice every day, at 11 AM before going to the office and at 10 PM before sleeping. If the lock is opened at 2 AM, this would be considered abnormal behavior. Anomaly detection means predicting these instances and is used for intrusion detection, fraud detection, health monitoring, etc.

In this article, I show you how to use PyCaret on a dataset for anomaly detection.

What is PyCaret?

PyCaret is an open-source, low-code machine learning library in Python that aims to reduce the cycle time from hypothesis to insights. It is well suited for seasoned data scientists who want to increase the productivity of their ML experiments by using PyCaret in their workflows, or for citizen data scientists and those new to data science with little or no background in coding. PyCaret allows you to go from preparing your data to deploying your model within seconds using your choice of notebook environment.

So, simply put, PyCaret makes it super easy for you to visualize and train a model on your datasets within 3 lines of code!

So let’s dive in!

#anomaly-detection #machine-learning #anomaly #fraud-detection #pycaret