1593526440

# Measuring Agreement with Cohen’s Kappa Statistic

A lot of the most intriguing — to me — use cases for classifications are to identify outliers. The outlier may be a spam message in your inbox, a diagnosis of an extremely rare disease, or an equity portfolio with extraordinary returns. Due to these instances being outliers, it is hard to gather enough data to train a model on how to spot them. Some people dedicate their entire careers to creating strategies to combat imbalanced data. I’ll table those strategies for another blog another day.

Everyone has a strong intuition of what accuracy and error are. This is the same scoring system we used throughout our academic careers. Accuracy is the ratio of correct answers to total questions. Error is the ratio of wrong answers to total questions. These metrics work well for an ideal data set — which doesn’t exist in the real world.

An example of the accuracy paradox is if the there is a 99% chance that a person does not have a particular disease, then a model predicting no one ever has the disease will be 99% accurate.

## Calculating Cohen’s kappa statistic by hand

Cohen’s kappa statistic is easy to understand and doesn’t fall victim to the accuracy paradox. Stay with me as I get through the technical jargon. Pₒ is the observed agreement among graders. **Pₑ is the hypothetical chance of the graders arriving at the same answer. **Grader 1 will be your model’s predictions, and grader 2 may be your y_test data (the answer key).

Cohen’s kappa Statistic Formula

Let’s calculate Cohen’s kappa statistic by hand using a simple example. You scored a 7 out of 10 on an exam with only true and false answers. The teacher has the answer key (the y_test data). To calculate the agreement between your predictions and the answer key we must find Pₒ and Pₑ. First, to find Pₒ we add in the agreement of correct answers.

We have now calculated a kappa statistic of 0.4 and Sci-kit learn confirmed the value. The kappa statistic ranges from -1 to 1. The max value of 1 means both graders are in perfect agreement. A value of 0 is equal to the random chance of an agreement. So our value of 0.4 means we are in slight agreement. Depending on the situation this may or may not be acceptable. For example, when determining if a patient has cancer you would want a much higher kappa statistic for the model to be acceptable.

#data-science #statistics #machine-learning #python #supervised-learning

1593526440

## Measuring Agreement with Cohen’s Kappa Statistic

A lot of the most intriguing — to me — use cases for classifications are to identify outliers. The outlier may be a spam message in your inbox, a diagnosis of an extremely rare disease, or an equity portfolio with extraordinary returns. Due to these instances being outliers, it is hard to gather enough data to train a model on how to spot them. Some people dedicate their entire careers to creating strategies to combat imbalanced data. I’ll table those strategies for another blog another day.

Everyone has a strong intuition of what accuracy and error are. This is the same scoring system we used throughout our academic careers. Accuracy is the ratio of correct answers to total questions. Error is the ratio of wrong answers to total questions. These metrics work well for an ideal data set — which doesn’t exist in the real world.

An example of the accuracy paradox is if the there is a 99% chance that a person does not have a particular disease, then a model predicting no one ever has the disease will be 99% accurate.

## Calculating Cohen’s kappa statistic by hand

Cohen’s kappa statistic is easy to understand and doesn’t fall victim to the accuracy paradox. Stay with me as I get through the technical jargon. Pₒ is the observed agreement among graders. **Pₑ is the hypothetical chance of the graders arriving at the same answer. **Grader 1 will be your model’s predictions, and grader 2 may be your y_test data (the answer key).

Cohen’s kappa Statistic Formula

Let’s calculate Cohen’s kappa statistic by hand using a simple example. You scored a 7 out of 10 on an exam with only true and false answers. The teacher has the answer key (the y_test data). To calculate the agreement between your predictions and the answer key we must find Pₒ and Pₑ. First, to find Pₒ we add in the agreement of correct answers.

We have now calculated a kappa statistic of 0.4 and Sci-kit learn confirmed the value. The kappa statistic ranges from -1 to 1. The max value of 1 means both graders are in perfect agreement. A value of 0 is equal to the random chance of an agreement. So our value of 0.4 means we are in slight agreement. Depending on the situation this may or may not be acceptable. For example, when determining if a patient has cancer you would want a much higher kappa statistic for the model to be acceptable.

#data-science #statistics #machine-learning #python #supervised-learning

1593564878

## Levels of Measurements

Photo by William Warby on Unsplash

Measurement is the process of assigning numbers to quantities (variables). The process is so familiar that perhaps we often overlook its fundamental characteristics. A single measure of some attribute (for example, weight) of sample is called statistic. These attributes have inherent properties too that are similar to numbers that we assign to them during measurement. When we assign numbers to attributes (i.e., during measurement), we can do so poorly, in which case the properties of the numbers to not correspond to the properties of the attributes. In such a case, we achieve only a “low level of measurement” (in other words, low accuracy). Remember that in the earlier module we have seen that the term accuracy refers to the absolute difference between measurement and real value. On the other hand, if the properties of our assigned numbers correspond properly to those of the assigned attributes, we achieve a high level of measurement (that is, high accuracy).

American statistician Stanley Smith Stevens is credited with introducing various levels of measurements. Stevens (1946) said: “All measurements in science are conducted using four different types of scales nominal, ordinal, interval and ratio”. These levels are arranged in ascending order of increasing accuracy. That is, nominal level is lowest in accuracy, while ratio level is highest in accuracy. For the ensuing discussion, the following example is used. Six athletes try out for a sprinter’s position in CUPB Biologists’ Race. They all run a 100-meter dash, and are timed by several coaches each using a different stopwatch (U through Z). Only the stopwatch U captures the true time, stopwatches V through Z are erroneous, but at different levels of measurement. Readings obtained after the sprint is given in Table.

# Nominal level of measurement

Nominal scale captures only equivalence (same or different) and set membership. These sets are commonly called categories, or labels. Consider the results of sprint competition, Table 1. Watch V is virtually useless, but it has captured a basic property of the running times. Namely, two values given by the watch are the same if and only if two actual times are the same. For example, participants Shatakshi and Tejaswini took same time in the race (13s), and as per the readings of stopwatch V, this basic property remains same (20s each). By looking at the results from stopwatch V, it is cogent to conclude that ‘Shatakshi and Tejaswini took same time in the race’. This attribute is called equivalency. We can conclude that watch V has achieved only a nominal level of measurement. Variables assessed on a nominal scale are called categorical variables. Examples include first names, gender, race, religion, nationality, taxonomic ranks, parts of speech, expired vs non expired goods, patient vs. healthy, rock types etc. Correlating two nominal categories is very difficult, because any relationships that occur are usually deemed to be spurious, and thus unimportant. For example, trying to figure out how many people from Assam have first names starting with the letter ‘A’ would be a fairly arbitrary, random exercise.

# Ordinal level of measurement

Ordinal scale captures rank-ordering attribute, in addition to all attributes captured by nominal level. Consider the results of sprint competition, Table 1. Ascending order of time taken by the participants as revealed by the true time are (respective ranks in parentheses): Navjot (1), Surbhi (2), Sayyed (3), Shatakshi and Tejaswini (4 each), and Shweta (5). Besides capturing the same-difference property of nominal level, stopwatches W and X have captured the correct ordering of race outcome. We say that the stopwatches W and X have achieved an ordinal level of measurement. Rank-ordering data simply puts the data on an ordinal scale. Examples at this level of measurement include IQ Scores, Academic Scores (marks), Percentiles and so on. Rank ordering (ordinal measurement) is possible with a number of subjective measurement surveys. For example, a questionnaire survey for the public perception of evolution in India included the participants to choose an appropriate response ‘completely agree’, ‘mostly agree’, ‘mostly disagree’, ‘completely disagree’ when measuring their agreement to the statement “men evolved from earlier animals”.

#measurement #data-analysis #data #statistical-analysis #statistics #data analysis

1636360749

## Std Library Types - Rust By Example

The `std` library provides many custom types which expands drastically on the `primitives`. Some of these include:

• growable `String`s like: `"hello world"`
• growable vectors: `[1, 2, 3]`
• optional types: `Option<i32>`
• error handling types: `Result<i32, i32>`
• heap allocated pointers: `Box<i32>`

## Box, stack and heap

All values in Rust are stack allocated by default. Values can be boxed (allocated on the heap) by creating a `Box<T>`. A box is a smart pointer to a heap allocated value of type `T`. When a box goes out of scope, its destructor is called, the inner object is destroyed, and the memory on the heap is freed.

Boxed values can be dereferenced using the `*` operator; this removes one layer of indirection.

``````use std::mem;

#[derive(Debug, Clone, Copy)]
struct Point {
x: f64,
y: f64,
}

// A Rectangle can be specified by where its top left and bottom right
// corners are in space
struct Rectangle {
top_left: Point,
bottom_right: Point,
}

fn origin() -> Point {
Point { x: 0.0, y: 0.0 }
}

fn boxed_origin() -> Box<Point> {
// Allocate this point on the heap, and return a pointer to it
Box::new(Point { x: 0.0, y: 0.0 })
}

fn main() {
// (all the type annotations are superfluous)
// Stack allocated variables
let point: Point = origin();
let rectangle: Rectangle = Rectangle {
top_left: origin(),
bottom_right: Point { x: 3.0, y: -4.0 }
};

// Heap allocated rectangle
let boxed_rectangle: Box<Rectangle> = Box::new(Rectangle {
top_left: origin(),
bottom_right: Point { x: 3.0, y: -4.0 },
});

// The output of functions can be boxed
let boxed_point: Box<Point> = Box::new(origin());

// Double indirection
let box_in_a_box: Box<Box<Point>> = Box::new(boxed_origin());

println!("Point occupies {} bytes on the stack",
mem::size_of_val(&point));
println!("Rectangle occupies {} bytes on the stack",
mem::size_of_val(&rectangle));

// box size == pointer size
println!("Boxed point occupies {} bytes on the stack",
mem::size_of_val(&boxed_point));
println!("Boxed rectangle occupies {} bytes on the stack",
mem::size_of_val(&boxed_rectangle));
println!("Boxed box occupies {} bytes on the stack",
mem::size_of_val(&box_in_a_box));

// Copy the data contained in `boxed_point` into `unboxed_point`
let unboxed_point: Point = *boxed_point;
println!("Unboxed point occupies {} bytes on the stack",
mem::size_of_val(&unboxed_point));
}
``````

## Vectors

Vectors are re-sizable arrays. Like slices, their size is not known at compile time, but they can grow or shrink at any time. A vector is represented using 3 parameters:

• pointer to the data
• length
• capacity

The capacity indicates how much memory is reserved for the vector. The vector can grow as long as the length is smaller than the capacity. When this threshold needs to be surpassed, the vector is reallocated with a larger capacity.

``````fn main() {
// Iterators can be collected into vectors
let collected_iterator: Vec<i32> = (0..10).collect();
println!("Collected (0..10) into: {:?}", collected_iterator);

// The `vec!` macro can be used to initialize a vector
let mut xs = vec![1i32, 2, 3];
println!("Initial vector: {:?}", xs);

// Insert new element at the end of the vector
println!("Push 4 into the vector");
xs.push(4);
println!("Vector: {:?}", xs);

// Error! Immutable vectors can't grow
collected_iterator.push(0);
// FIXME ^ Comment out this line

// The `len` method yields the number of elements currently stored in a vector
println!("Vector length: {}", xs.len());

// Indexing is done using the square brackets (indexing starts at 0)
println!("Second element: {}", xs[1]);

// `pop` removes the last element from the vector and returns it
println!("Pop last element: {:?}", xs.pop());

// Out of bounds indexing yields a panic
println!("Fourth element: {}", xs[3]);
// FIXME ^ Comment out this line

// `Vector`s can be easily iterated over
println!("Contents of xs:");
for x in xs.iter() {
println!("> {}", x);
}

// A `Vector` can also be iterated over while the iteration
// count is enumerated in a separate variable (`i`)
for (i, x) in xs.iter().enumerate() {
println!("In position {} we have value {}", i, x);
}

// Thanks to `iter_mut`, mutable `Vector`s can also be iterated
// over in a way that allows modifying each value
for x in xs.iter_mut() {
*x *= 3;
}
println!("Updated vector: {:?}", xs);
}
``````

More `Vec` methods can be found under the std::vec module

## Strings

There are two types of strings in Rust: `String` and `&str`.

A `String` is stored as a vector of bytes (`Vec<u8>`), but guaranteed to always be a valid UTF-8 sequence. `String` is heap allocated, growable and not null terminated.

`&str` is a slice (`&[u8]`) that always points to a valid UTF-8 sequence, and can be used to view into a `String`, just like `&[T]` is a view into `Vec<T>`.

``````fn main() {
// (all the type annotations are superfluous)
// A reference to a string allocated in read only memory
let pangram: &'static str = "the quick brown fox jumps over the lazy dog";
println!("Pangram: {}", pangram);

// Iterate over words in reverse, no new string is allocated
println!("Words in reverse");
for word in pangram.split_whitespace().rev() {
println!("> {}", word);
}

// Copy chars into a vector, sort and remove duplicates
let mut chars: Vec<char> = pangram.chars().collect();
chars.sort();
chars.dedup();

// Create an empty and growable `String`
let mut string = String::new();
for c in chars {
// Insert a char at the end of string
string.push(c);
// Insert a string at the end of string
string.push_str(", ");
}

// The trimmed string is a slice to the original string, hence no new
// allocation is performed
let chars_to_trim: &[char] = &[' ', ','];
let trimmed_str: &str = string.trim_matches(chars_to_trim);
println!("Used characters: {}", trimmed_str);

// Heap allocate a string
let alice = String::from("I like dogs");
// Allocate new memory and store the modified string there
let bob: String = alice.replace("dog", "cat");

println!("Alice says: {}", alice);
println!("Bob says: {}", bob);
}
``````

More `str`/`String` methods can be found under the std::str and std::string modules

### Literals and escapes

There are multiple ways to write string literals with special characters in them. All result in a similar `&str` so it's best to use the form that is the most convenient to write. Similarly there are multiple ways to write byte string literals, which all result in `&[u8; N]`.

Generally special characters are escaped with a backslash character: `\`. This way you can add any character to your string, even unprintable ones and ones that you don't know how to type. If you want a literal backslash, escape it with another one: `\\`

String or character literal delimiters occuring within a literal must be escaped: `"\""`, `'\''`.

``````fn main() {
// You can use escapes to write bytes by their hexadecimal values...
let byte_escape = "I'm writing \x52\x75\x73\x74!";
println!("What are you doing\x3F (\\x3F means ?) {}", byte_escape);

// ...or Unicode code points.
let unicode_codepoint = "\u{211D}";
let character_name = "\"DOUBLE-STRUCK CAPITAL R\"";

println!("Unicode character {} (U+211D) is called {}",
unicode_codepoint, character_name );

let long_string = "String literals
can span multiple lines.
The linebreak and indentation here ->\
<- can be escaped too!";
println!("{}", long_string);
}
``````

Sometimes there are just too many characters that need to be escaped or it's just much more convenient to write a string out as-is. This is where raw string literals come into play.

``````fn main() {
let raw_str = r"Escapes don't work here: \x3F \u{211D}";
println!("{}", raw_str);

// If you need quotes in a raw string, add a pair of #s
let quotes = r#"And then I said: "There is no escape!""#;
println!("{}", quotes);

// If you need "# in your string, just use more #s in the delimiter.
// There is no limit for the number of #s you can use.
let longer_delimiter = r###"A string with "# in it. And even "##!"###;
println!("{}", longer_delimiter);
}
``````

Want a string that's not UTF-8? (Remember, `str` and `String` must be valid UTF-8). Or maybe you want an array of bytes that's mostly text? Byte strings to the rescue!

``````use std::str;

fn main() {
// Note that this is not actually a `&str`
let bytestring: &[u8; 21] = b"this is a byte string";

// Byte arrays don't have the `Display` trait, so printing them is a bit limited
println!("A byte string: {:?}", bytestring);

// Byte strings can have byte escapes...
let escaped = b"\x52\x75\x73\x74 as bytes";
// ...but no unicode escapes
// let escaped = b"\u{211D} is not allowed";
println!("Some escaped bytes: {:?}", escaped);

// Raw byte strings work just like raw strings
let raw_bytestring = br"\u{211D} is not escaped here";
println!("{:?}", raw_bytestring);

// Converting a byte array to `str` can fail
if let Ok(my_str) = str::from_utf8(raw_bytestring) {
println!("And the same as text: '{}'", my_str);
}

let _quotes = br#"You can also use "fancier" formatting, \
like with normal raw strings"#;

// Byte strings don't have to be UTF-8
let shift_jis = b"\x82\xe6\x82\xa8\x82\xb1\x82\xbb"; // "ようこそ" in SHIFT-JIS

// But then they can't always be converted to `str`
match str::from_utf8(shift_jis) {
Ok(my_str) => println!("Conversion successful: '{}'", my_str),
Err(e) => println!("Conversion failed: {:?}", e),
};
}
``````

For conversions between character encodings check out the encoding crate.

A more detailed listing of the ways to write string literals and escape characters is given in the 'Tokens' chapter of the Rust Reference.

## `Option`

Sometimes it's desirable to catch the failure of some parts of a program instead of calling `panic!`; this can be accomplished using the `Option` enum.

The `Option<T>` enum has two variants:

• `None`, to indicate failure or lack of value, and
• `Some(value)`, a tuple struct that wraps a `value` with type `T`.
``````// An integer division that doesn't `panic!`
fn checked_division(dividend: i32, divisor: i32) -> Option<i32> {
if divisor == 0 {
// Failure is represented as the `None` variant
None
} else {
// Result is wrapped in a `Some` variant
Some(dividend / divisor)
}
}

// This function handles a division that may not succeed
fn try_division(dividend: i32, divisor: i32) {
// `Option` values can be pattern matched, just like other enums
match checked_division(dividend, divisor) {
None => println!("{} / {} failed!", dividend, divisor),
Some(quotient) => {
println!("{} / {} = {}", dividend, divisor, quotient)
},
}
}

fn main() {
try_division(4, 2);
try_division(1, 0);

// Binding `None` to a variable needs to be type annotated
let none: Option<i32> = None;
let _equivalent_none = None::<i32>;

let optional_float = Some(0f32);

// Unwrapping a `Some` variant will extract the value wrapped.
println!("{:?} unwraps to {:?}", optional_float, optional_float.unwrap());

// Unwrapping a `None` variant will `panic!`
println!("{:?} unwraps to {:?}", none, none.unwrap());
}
``````

## `Result`

We've seen that the `Option` enum can be used as a return value from functions that may fail, where `None` can be returned to indicate failure. However, sometimes it is important to express why an operation failed. To do this we have the `Result` enum.

The `Result<T, E>` enum has two variants:

• `Ok(value)` which indicates that the operation succeeded, and wraps the `value` returned by the operation. (`value` has type `T`)
• `Err(why)`, which indicates that the operation failed, and wraps `why`, which (hopefully) explains the cause of the failure. (`why` has type `E`)
``````mod checked {
// Mathematical "errors" we want to catch
#[derive(Debug)]
pub enum MathError {
DivisionByZero,
NonPositiveLogarithm,
NegativeSquareRoot,
}

pub type MathResult = Result<f64, MathError>;

pub fn div(x: f64, y: f64) -> MathResult {
if y == 0.0 {
// This operation would `fail`, instead let's return the reason of
// the failure wrapped in `Err`
Err(MathError::DivisionByZero)
} else {
// This operation is valid, return the result wrapped in `Ok`
Ok(x / y)
}
}

pub fn sqrt(x: f64) -> MathResult {
if x < 0.0 {
Err(MathError::NegativeSquareRoot)
} else {
Ok(x.sqrt())
}
}

pub fn ln(x: f64) -> MathResult {
if x <= 0.0 {
Err(MathError::NonPositiveLogarithm)
} else {
Ok(x.ln())
}
}
}

// `op(x, y)` === `sqrt(ln(x / y))`
fn op(x: f64, y: f64) -> f64 {
// This is a three level match pyramid!
match checked::div(x, y) {
Err(why) => panic!("{:?}", why),
Ok(ratio) => match checked::ln(ratio) {
Err(why) => panic!("{:?}", why),
Ok(ln) => match checked::sqrt(ln) {
Err(why) => panic!("{:?}", why),
Ok(sqrt) => sqrt,
},
},
}
}

fn main() {
// Will this fail?
println!("{}", op(1.0, 10.0));
}
``````

### `?`

Chaining results using match can get pretty untidy; luckily, the `?` operator can be used to make things pretty again. `?` is used at the end of an expression returning a `Result`, and is equivalent to a match expression, where the `Err(err)` branch expands to an early `Err(From::from(err))`, and the `Ok(ok)` branch expands to an `ok` expression.

``````mod checked {
#[derive(Debug)]
enum MathError {
DivisionByZero,
NonPositiveLogarithm,
NegativeSquareRoot,
}

type MathResult = Result<f64, MathError>;

fn div(x: f64, y: f64) -> MathResult {
if y == 0.0 {
Err(MathError::DivisionByZero)
} else {
Ok(x / y)
}
}

fn sqrt(x: f64) -> MathResult {
if x < 0.0 {
Err(MathError::NegativeSquareRoot)
} else {
Ok(x.sqrt())
}
}

fn ln(x: f64) -> MathResult {
if x <= 0.0 {
Err(MathError::NonPositiveLogarithm)
} else {
Ok(x.ln())
}
}

// Intermediate function
fn op_(x: f64, y: f64) -> MathResult {
// if `div` "fails", then `DivisionByZero` will be `return`ed
let ratio = div(x, y)?;

// if `ln` "fails", then `NonPositiveLogarithm` will be `return`ed
let ln = ln(ratio)?;

sqrt(ln)
}

pub fn op(x: f64, y: f64) {
match op_(x, y) {
Err(why) => panic!("{}", match why {
MathError::NonPositiveLogarithm
=> "logarithm of non-positive number",
MathError::DivisionByZero
=> "division by zero",
MathError::NegativeSquareRoot
=> "square root of negative number",
}),
Ok(value) => println!("{}", value),
}
}
}

fn main() {
checked::op(1.0, 10.0);
}
``````

Be sure to check the documentation, as there are many methods to map/compose `Result`.

## `panic!`

The `panic!` macro can be used to generate a panic and start unwinding its stack. While unwinding, the runtime will take care of freeing all the resources owned by the thread by calling the destructor of all its objects.

Since we are dealing with programs with only one thread, `panic!` will cause the program to report the panic message and exit.

``````// Re-implementation of integer division (/)
fn division(dividend: i32, divisor: i32) -> i32 {
if divisor == 0 {
// Division by zero triggers a panic
panic!("division by zero");
} else {
dividend / divisor
}
}

// The `main` task
fn main() {
// Heap allocated integer
let _x = Box::new(0i32);

// This operation will trigger a task failure
division(3, 0);

println!("This point won't be reached!");

// `_x` should get destroyed at this point
}
``````

Let's check that `panic!` doesn't leak memory.

``````\$ rustc panic.rs && valgrind ./panic
==4401== Memcheck, a memory error detector
==4401== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==4401== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==4401== Command: ./panic
==4401==
thread '<main>' panicked at 'division by zero', panic.rs:5
==4401==
==4401== HEAP SUMMARY:
==4401==     in use at exit: 0 bytes in 0 blocks
==4401==   total heap usage: 18 allocs, 18 frees, 1,648 bytes allocated
==4401==
==4401== All heap blocks were freed -- no leaks are possible
==4401==
==4401== For counts of detected and suppressed errors, rerun with: -v
==4401== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
``````

## HashMap

Where vectors store values by an integer index, `HashMap`s store values by key. `HashMap` keys can be booleans, integers, strings, or any other type that implements the `Eq` and `Hash` traits. More on this in the next section.

Like vectors, `HashMap`s are growable, but HashMaps can also shrink themselves when they have excess space. You can create a HashMap with a certain starting capacity using `HashMap::with_capacity(uint)`, or use `HashMap::new()` to get a HashMap with a default initial capacity (recommended).

``````use std::collections::HashMap;

fn call(number: &str) -> &str {
match number {
"798-1364" => "We're sorry, the call cannot be completed as dialed.
Please hang up and try again.",
"645-7689" => "Hello, this is Mr. Awesome's Pizza. My name is Fred.
What can I get for you today?",
_ => "Hi! Who is this again?"
}
}

fn main() {
let mut contacts = HashMap::new();

contacts.insert("Daniel", "798-1364");
contacts.insert("Ashley", "645-7689");
contacts.insert("Katie", "435-8291");
contacts.insert("Robert", "956-1745");

// Takes a reference and returns Option<&V>
match contacts.get(&"Daniel") {
Some(&number) => println!("Calling Daniel: {}", call(number)),
_ => println!("Don't have Daniel's number."),
}

// `HashMap::insert()` returns `None`
// if the inserted value is new, `Some(value)` otherwise
contacts.insert("Daniel", "164-6743");

match contacts.get(&"Ashley") {
Some(&number) => println!("Calling Ashley: {}", call(number)),
_ => println!("Don't have Ashley's number."),
}

contacts.remove(&"Ashley");

// `HashMap::iter()` returns an iterator that yields
// (&'a key, &'a value) pairs in arbitrary order.
for (contact, &number) in contacts.iter() {
println!("Calling {}: {}", contact, call(number));
}
}
``````

For more information on how hashing and hash maps (sometimes called hash tables) work, have a look at Hash Table Wikipedia

### Alternate/custom key types

Any type that implements the `Eq` and `Hash` traits can be a key in `HashMap`. This includes:

• `bool` (though not very useful since there is only two possible keys)
• `int`, `uint`, and all variations thereof
• `String` and `&str` (protip: you can have a `HashMap` keyed by `String` and call `.get()` with an `&str`)

Note that `f32` and `f64` do not implement `Hash`, likely because floating-point precision errors would make using them as hashmap keys horribly error-prone.

All collection classes implement `Eq` and `Hash` if their contained type also respectively implements `Eq` and `Hash`. For example, `Vec<T>` will implement `Hash` if `T` implements `Hash`.

You can easily implement `Eq` and `Hash` for a custom type with just one line: `#[derive(PartialEq, Eq, Hash)]`

The compiler will do the rest. If you want more control over the details, you can implement `Eq` and/or `Hash` yourself. This guide will not cover the specifics of implementing `Hash`.

To play around with using a `struct` in `HashMap`, let's try making a very simple user logon system:

``````use std::collections::HashMap;

// Eq requires that you derive PartialEq on the type.
#[derive(PartialEq, Eq, Hash)]
struct Account<'a>{
}

struct AccountInfo<'a>{
name: &'a str,
email: &'a str,
}

type Accounts<'a> = HashMap<Account<'a>, AccountInfo<'a>>;

fn try_logon<'a>(accounts: &Accounts<'a>,
println!("Attempting logon...");

let logon = Account {
};

match accounts.get(&logon) {
Some(account_info) => {
println!("Successful logon!");
println!("Name: {}", account_info.name);
println!("Email: {}", account_info.email);
},
_ => println!("Login failed!"),
}
}

fn main(){
let mut accounts: Accounts = HashMap::new();

let account = Account {
};

let account_info = AccountInfo {
name: "John Everyman",
email: "j.everyman@email.com",
};

accounts.insert(account, account_info);

try_logon(&accounts, "j.everyman", "psasword123");

}
``````

### HashSet

Consider a `HashSet` as a `HashMap` where we just care about the keys ( `HashSet<T>` is, in actuality, just a wrapper around `HashMap<T, ()>`).

"What's the point of that?" you ask. "I could just store the keys in a `Vec`."

A `HashSet`'s unique feature is that it is guaranteed to not have duplicate elements. That's the contract that any set collection fulfills. `HashSet` is just one implementation. (see also: `BTreeSet`)

If you insert a value that is already present in the `HashSet`, (i.e. the new value is equal to the existing and they both have the same hash), then the new value will replace the old.

This is great for when you never want more than one of something, or when you want to know if you've already got something.

But sets can do more than that.

Sets have 4 primary operations (all of the following calls return an iterator):

`union`: get all the unique elements in both sets.

`difference`: get all the elements that are in the first set but not the second.

`intersection`: get all the elements that are only in both sets.

`symmetric_difference`: get all the elements that are in one set or the other, but not both.

Try all of these in the following example:

``````use std::collections::HashSet;

fn main() {
let mut a: HashSet<i32> = vec![1i32, 2, 3].into_iter().collect();
let mut b: HashSet<i32> = vec![2i32, 3, 4].into_iter().collect();

assert!(a.insert(4));
assert!(a.contains(&4));

// `HashSet::insert()` returns false if
// there was a value already present.
assert!(b.insert(4), "Value 4 is already in set B!");
// FIXME ^ Comment out this line

b.insert(5);

// If a collection's element type implements `Debug`,
// then the collection implements `Debug`.
// It usually prints its elements in the format `[elem1, elem2, ...]`
println!("A: {:?}", a);
println!("B: {:?}", b);

// Print [1, 2, 3, 4, 5] in arbitrary order
println!("Union: {:?}", a.union(&b).collect::<Vec<&i32>>());

// This should print [1]
println!("Difference: {:?}", a.difference(&b).collect::<Vec<&i32>>());

// Print [2, 3, 4] in arbitrary order.
println!("Intersection: {:?}", a.intersection(&b).collect::<Vec<&i32>>());

// Print [1, 5]
println!("Symmetric Difference: {:?}",
a.symmetric_difference(&b).collect::<Vec<&i32>>());
}
``````

(Examples are adapted from the documentation.)

## `Rc`

When multiple ownership is needed, `Rc`(Reference Counting) can be used. `Rc` keeps track of the number of the references which means the number of owners of the value wrapped inside an `Rc`.

Reference count of an `Rc` increases by 1 whenever an `Rc` is cloned, and decreases by 1 whenever one cloned `Rc` is dropped out of the scope. When an `Rc`'s reference count becomes zero, which means there are no owners remained, both the `Rc` and the value are all dropped.

Cloning an `Rc` never performs a deep copy. Cloning creates just another pointer to the wrapped value, and increments the count.

``````use std::rc::Rc;

fn main() {
let rc_examples = "Rc examples".to_string();
{
println!("--- rc_a is created ---");

let rc_a: Rc<String> = Rc::new(rc_examples);
println!("Reference Count of rc_a: {}", Rc::strong_count(&rc_a));

{
println!("--- rc_a is cloned to rc_b ---");

let rc_b: Rc<String> = Rc::clone(&rc_a);
println!("Reference Count of rc_b: {}", Rc::strong_count(&rc_b));
println!("Reference Count of rc_a: {}", Rc::strong_count(&rc_a));

// Two `Rc`s are equal if their inner values are equal
println!("rc_a and rc_b are equal: {}", rc_a.eq(&rc_b));

// We can use methods of a value directly
println!("Length of the value inside rc_a: {}", rc_a.len());
println!("Value of rc_b: {}", rc_b);

println!("--- rc_b is dropped out of scope ---");
}

println!("Reference Count of rc_a: {}", Rc::strong_count(&rc_a));

println!("--- rc_a is dropped out of scope ---");
}

// Error! `rc_examples` already moved into `rc_a`
// And when `rc_a` is dropped, `rc_examples` is dropped together
// println!("rc_examples: {}", rc_examples);
// TODO ^ Try uncommenting this line
}
``````

## Arc

When shared ownership between threads is needed, `Arc`(Atomic Reference Counted) can be used. This struct, via the `Clone` implementation can create a reference pointer for the location of a value in the memory heap while increasing the reference counter. As it shares ownership between threads, when the last reference pointer to a value is out of scope, the variable is dropped.

``````
fn main() {
use std::sync::Arc;

// This variable declaration is where its value is specified.
let apple = Arc::new("the same apple");

for _ in 0..10 {
// Here there is no value specification as it is a pointer to a reference
// in the memory heap.
let apple = Arc::clone(&apple);

// As Arc was used, threads can be spawned using the value allocated
// in the Arc variable pointer's location.
println!("{:?}", apple);
});
}
}

``````

Original article source at https://doc.rust-lang.org

#rust #programming #developer

1595096220

## Factors That Can Contribute to the Faulty Statistical Inference

Hypothesis testing is a procedure where researchers make a precise statement based on their findings or data. Then, they collect evidence to falsify that precise statement or claim. This precise statement or claim is called the null hypothesis. If the evidence is strong to falsify the null hypothesis, we can reject the null hypothesis and adapt the alternative hypothesis. This is the basic idea of hypothesis testing.

## Error Types in Statistical Testing

There are two distinct types of errors that can occur in formal hypothesis testing. They are:

Type I: Type I error occurs when the null hypothesis is true but the hypothesis testing results show the evidence to reject it. This is called a false positive.

Type II: Type II error occurs when the null hypothesis is not true but it is not rejected in hypothesis testing.

Most hypothesis testing procedure performs well controlling type I error (at 5%) in ideal conditions. That may give a false idea that there is only a 5% probability that the reported findings are wrong. But it’s not that simple. The probability can be much higher than 5%.

## Normality of the Data

The normality of the data is an issue that can break down a statistical test. If the dataset is small, the normality of the data is very important for some statistical processes such as confidence interval or p-test. But if the data is large enough, normality does not have a significant impact.

## Correlation

If the variables in the dataset are correlated with each other, that may result in poor statistical inference. Look at this picture below:

In this graph, two variables seem to have a strong correlation. Or, if a series of data is observed as a sequence, that means values are correlated with its neighbors, and there may have some clustering or autocorrelation in the data. This kind of behavior in the dataset can adversely impact the statistical tests.

## Correlation and Causation

This is especially important when interpreting the result of a statistical test. “Correlation does not mean causation”. Here is an example. Suppose, you have study data that shows, more people who do not have college education believe that women should get paid less than men in the workplace. You may have conducted a good hypothesis testing and prove that. But care must be taken on what conclusion is drawn from this. Probably, there is a correlation between college education and the belief that ‘women should get paid less’. But it is not fair to say that not having a college degree is the cause of such belief. This is a correlation but not a direct cause ad effect relationship.

A more clear example can be provided from medical data. Studies showed that people with fewer cavities are less likely to get heart disease. You may have enough data to statistically prove that but you actually cannot say that the dental cavity causes heart disease. There is no medical theory like that.

#statistical-analysis #statistics #statistical-inference #math #data analysis

1616127982

## Lufkin Diameter Tape Measure

Lufkin Metric Tape Measure for construction and engineering, with Your Corporate LOGO! We have a variety of pipe diameter tapes, construction tapes, and architect tapes. Shop Now.

#diameter tape measure #pipe diameter tape measure #diameter tape #lufkin metric tape measure