Wilford Pagac

1597164060

Vectorize everything with Julia

Do you ever feel like for loops are taking over your life and there’s no escape from them? Do you feel trapped by all those loops? Well, fear not! There’s a way out! I’ll show you how to do the FizzBuzz challenge without any for loops at all.


The task of FizzBuzz is to print every number up to 100, but replace numbers divisible by 3 with “Fizz”, numbers divisible by 5 with “Buzz”, and numbers divisible by both 3 and 5 with “FizzBuzz”.

Solving FizzBuzz with for loops is easy; you can even do it in BigQuery. Here, I’ll show you an alternative way of doing this, without any for loops whatsoever. The solution is vectorised functions.

If you already have some experience with R or Python, you’ve probably come across vectorised functions in standard R or via Python’s numpy library. Let’s see how we can use them in Julia.

Vectorised functions are great as they reduce the clutter often associated with for loops.

First steps with vectorized functions

Before we dive into solving FizzBuzz let’s see how you can replace a very simple for loop with a vectorized alternative in Julia.

Let’s start with a trivial task: given a vector a, add 1 to each of its elements.

For loop version:

	a = [1, 2, 3];

	for i in 1:length(a)
	    a[i] += 1
	end

julia> print(a)

[2, 3, 4]

The above gets the job done, but it takes up three lines and far more characters than needed. If a were a numpy array in Python 🐍, you could just write a + 1 and be done. But first, you would have to convert your plain old list to a numpy array.

Julia has a clever solution. You can use the broadcast operator . to apply an operation (in this case, addition) to all elements of an object. Here it is in action:

Vectorized version:

	a = [1, 2, 3];

	a .+ 1

This returns the same values as the for loop above produced. And there’s no need to convert your array.
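One detail worth spelling out: a .+ 1 builds a new array and leaves a untouched, whereas the for loop modified a in place. The short sketch below uses only base Julia to show both forms; the variable names b and a_before are just for illustration.

```julia
a = [1, 2, 3]

# Broadcasting addition allocates a new array; `a` itself is unchanged.
b = a .+ 1
a_before = copy(a)

# The dotted update operator writes the result back into `a`,
# which is what the for loop did.
a .+= 1

println(a_before)
println(b)
println(a)
```

If you want the closest equivalent of the loop, a .+= 1 is the form to reach for.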

Even better than that, you can broadcast any function of your liking, even your own ones. Here we calculate the area of a circle and then we broadcast it across our array:

	function area_of_circle(r)
	    return π * r^2
	end

	a = [1, 2, 3];
	area_of_circle.(a)

Yes, π (also available as pi) is a built-in constant in Julia!

julia> area_of_circle.(a)

3-element Array{Float64,1}:
  3.141592653589793
 12.566370614359172
 28.274333882308138

Bye-bye for loops


Now that we know the basics, let’s do FizzBuzz! But remember, no for loops allowed.

We will rephrase our problem a little. Instead of printing the numbers, Fizzes and Buzzes, we’ll return all of them together as a vector. I’ll break down the solution the same way as in the for loop article [LINK], so if you haven’t seen the previous post, now would be a good time to check it out!

First, let’s return the numbers up to n as a vector:

	function fizzbuzz(n)
	    return collect(1:n)
	end

Here, collect just takes our range and materializes it as an array.

julia> fizzbuzz(5)

5-element Array{Int64,1}:
 1
 2
 3
 4
 5

Adding Fizzes

This works. Let’s see if we can print Fizz for each number that’s divisible by 3. We can do this by replacing all numbers that are divisible by 3 with a Fizz string.

julia> fizzbuzz(7)

7-element Array{String,1}:
 "1"
 "2"
 "Fizz"
 "4"
 "5"
 "Fizz"
 "7"

Let’s break this down step by step:

  • Why did we convert everything to strings? Well, an array of numbers is just that, an array of numbers. We don’t want numbers and strings mingled in a single object.
  • We broadcast rem.(numbers, 3) to find the remainder of each number after division by 3.
  • Then we compared this array of remainders elementwise to 0 ( .== 0 ).
  • Finally, we indexed our string array with the boolean mask and assigned “Fizz” to every element where our mask says true.
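The function that produced the output above is not shown in this copy of the article. The following is a minimal sketch that is consistent with the steps just listed; the variable names numbers and str_numbers are assumptions, not necessarily the author’s.

```julia
function fizzbuzz(n)
    numbers = collect(1:n)

    # Convert every number to a string so the result holds a single type.
    str_numbers = string.(numbers)

    # Boolean mask: true wherever the remainder after division by 3 is zero.
    divisible_by_3 = rem.(numbers, 3) .== 0

    # Broadcast-assign the scalar "Fizz" to every masked position.
    str_numbers[divisible_by_3] .= "Fizz"

    return str_numbers
end

println(fizzbuzz(7))
```

Each intermediate value (the string array, the remainders, the mask) can be inspected on its own in the REPL.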

Feel free to break these steps down and try them in your own Julia REPL!

I know that the use of .= to assign a single element to many can be a bit controversial, but I actually quite like it. By explicitly specifying the broadcast of assignment you force yourself to think about the differences of these objects and everyone who reads your code afterwards will see that one is a vector and the other one is a scalar.

Adding Buzzes

Adding the Buzzes is done exactly the same way:
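The article breaks off here, so what follows is only a sketch of how the same masking approach could be extended, not the author’s own code. The name fizzbuzz_full is hypothetical. Assigning “FizzBuzz” last lets it overwrite the multiples of 15 that were already marked as “Fizz” or “Buzz”.

```julia
function fizzbuzz_full(n)
    numbers = collect(1:n)
    result = string.(numbers)

    # Same pattern as for Fizz: build a mask, then broadcast-assign.
    result[rem.(numbers, 3) .== 0] .= "Fizz"
    result[rem.(numbers, 5) .== 0] .= "Buzz"

    # Multiples of both 3 and 5 are multiples of 15; this goes last
    # so it overwrites the earlier "Fizz"/"Buzz" entries.
    result[rem.(numbers, 15) .== 0] .= "FizzBuzz"

    return result
end

println(fizzbuzz_full(15))
```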

#programming #julia #optimization #coding #vectorization

Samanta Moore

1622466360

Vector in Java | Java Vector Class with Examples

Vector is one of the most commonly used data structures in Java. Arrays are static data structures that store data linearly. A Vector also stores data linearly, but it is not restricted to a fixed size; its size can grow or shrink as required. Vector extends the AbstractList class and implements the List interface.

Before you start using vectors, import the class from the java.util package as follows:

import java.util.Vector;

Declaration and Accessing Elements of a Vector

Here is how the Vector class is declared:

public class Vector<E> extends AbstractList<E>

implements List<E>, RandomAccess, Cloneable, Serializable

Here, E is the type of element, which must be a reference type such as Integer, String, or Character.

As with arrays, elements of a vector are accessed by index, but through the get method rather than square brackets. Indices start at 0, so the second element of a Vector v is accessed as v.get(1).

Some points to keep in mind when creating a vector in Java:

  • An IllegalArgumentException is thrown if the initial capacity of the vector is negative
  • A NullPointerException is thrown if the specified collection is null
  • The size of the vector is always less than or equal to its capacity
  • If no capacity increment is specified, the capacity doubles each time the vector needs to grow

#full stack development #java #vector #vector in java

Macey Legros

1600270260

Java Vector Class Example | Vector Class In Java

The Java Vector class is part of the Java Collection framework. It is used to create arrays whose size can increase or decrease during execution (that is, dynamic arrays). The Vector class implements the List interface of the Collection framework. The main difference between the Vector class and the other Collection classes is that Vector is synchronized, unlike the others.

Java Vector Class

Unlike other Collection classes, the Vector class contains legacy methods. Vector was originally a legacy class and was added to the Collection framework later. The Collection framework was not included in early versions of Java, so legacy classes were used to store objects.

The hierarchy of the Vector class is shown in the diagram below.

#java #vector #java vector

A Julia Package for Evaluating Distances (Metrics) Between Vectors

Distances.jl

A Julia package for evaluating distances (metrics) between vectors.

This package also provides optimized functions to compute column-wise and pairwise distances, which are often substantially faster than a straightforward loop implementation. (See the benchmark section below for details).

Supported distances

  • Euclidean distance
  • Squared Euclidean distance
  • Periodic Euclidean distance
  • Cityblock distance
  • Total variation distance
  • Jaccard distance
  • Rogers-Tanimoto distance
  • Chebyshev distance
  • Minkowski distance
  • Hamming distance
  • Cosine distance
  • Correlation distance
  • Chi-square distance
  • Kullback-Leibler divergence
  • Generalized Kullback-Leibler divergence
  • Rényi divergence
  • Jensen-Shannon divergence
  • Mahalanobis distance
  • Squared Mahalanobis distance
  • Bhattacharyya distance
  • Hellinger distance
  • Haversine distance
  • Spherical angle distance
  • Mean absolute deviation
  • Mean squared deviation
  • Root mean squared deviation
  • Normalized root mean squared deviation
  • Bray-Curtis dissimilarity
  • Bregman divergence

For Euclidean distance, Squared Euclidean distance, Cityblock distance, Minkowski distance, and Hamming distance, a weighted version is also provided.

Basic use

The library supports three ways of computation: computing the distance between two iterators/vectors, "zip"-wise computation, and pairwise computation. Each of these computation modes works with arbitrary iterable objects of known size.

Computing the distance between two iterators or vectors

Each distance corresponds to a distance type. You can always compute a certain distance between two iterators or vectors of equal length using the following syntax:

r = evaluate(dist, x, y)
r = dist(x, y)

Here, dist is an instance of a distance type: for example, the type for Euclidean distance is Euclidean (more distance types will be introduced in the next section). You can compute the Euclidean distance between x and y as

r = evaluate(Euclidean(), x, y)
r = Euclidean()(x, y)

Common distances also come with convenient functions for distance evaluation. For example, you may also compute Euclidean distance between two vectors as below

r = euclidean(x, y)
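As a quick check that the three spellings above agree, here is a small sketch. It assumes the Distances package is installed; the input vectors are made up for illustration.

```julia
using Distances

x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]

r1 = evaluate(Euclidean(), x, y)  # generic entry point
r2 = Euclidean()(x, y)            # distance objects are callable
r3 = euclidean(x, y)              # convenience function

# Reference value computed directly from the definition.
reference = sqrt(sum((x .- y) .^ 2))

println((r1, r2, r3, reference))
```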

Computing distances between corresponding objects ("column-wise")

Suppose you have two m-by-n matrices X and Y. You can then compute all distances between corresponding columns of X and Y in one batch, using the colwise function:

r = colwise(dist, X, Y)

The output r is a vector of length n. In particular, r[i] is the distance between X[:,i] and Y[:,i]. The batch computation typically runs considerably faster than calling evaluate column-by-column.

Note that either of X and Y can be just a single vector -- then the colwise function computes the distance between this vector and each column of the other argument.
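A small sketch of both cases described above, again assuming the Distances package is available. The matrices are invented so that the distances come out as round numbers, and the comprehension is only there as a reference to compare against.

```julia
using Distances

X = [1.0 0.0 2.0;
     0.0 1.0 2.0]
Y = [0.0 0.0 2.0;
     0.0 3.0 5.0]

# One distance per pair of corresponding columns.
r = colwise(Euclidean(), X, Y)

# Reference: evaluate the distance column by column.
expected = [euclidean(X[:, i], Y[:, i]) for i in 1:size(X, 2)]

# A single vector is compared against every column of the matrix.
v = [0.0, 0.0]
rv = colwise(Euclidean(), v, X)

println(r)
println(rv)
```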

Computing pairwise distances

Let X and Y have m and n columns, respectively, and the same number of rows. Then the pairwise function with the dims=2 argument computes distances between each pair of columns in X and Y:

R = pairwise(dist, X, Y, dims=2)

In the output, R is a matrix of size (m, n), such that R[i,j] is the distance between X[:,i] and Y[:,j]. Computing distances for all pairs using the pairwise function is often remarkably faster than evaluating each pair individually.

If you just want to compute distances between all columns of a matrix X, you can write

R = pairwise(dist, X, dims=2)

This statement will result in an m-by-m matrix, where R[i,j] is the distance between X[:,i] and X[:,j]. pairwise(dist, X) is typically more efficient than pairwise(dist, X, X), as the former will take advantage of the symmetry when dist is a semi-metric (including metric).

To compute pairwise distances for matrices with observations stored in rows use the argument dims=1.
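The sketch below illustrates the shapes involved, assuming the Distances package is installed. The data are invented, and comparisons use a tolerance because the optimized Euclidean path is subject to roundoff.

```julia
using Distances

X = [0.0 3.0;
     0.0 4.0]          # 2 columns
Y = [0.0 0.0 6.0;
     0.0 1.0 8.0]      # 3 columns

# All column pairs between X and Y: a 2x3 matrix.
R = pairwise(Euclidean(), X, Y, dims=2)

# All column pairs within X: a symmetric 2x2 matrix.
S = pairwise(Euclidean(), X, dims=2)

# Same observations stored as rows instead of columns.
R_rows = pairwise(Euclidean(), permutedims(X), permutedims(Y), dims=1)

println(size(R))
println(size(S))
```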

Computing column-wise and pairwise distances inplace

If the vector or matrix that will store the results is pre-allocated, you can write into that storage (without creating a new array) using the following syntax (i being either 1 or 2):

colwise!(r, dist, X, Y)
pairwise!(R, dist, X, Y, dims=i)
pairwise!(R, dist, X, dims=i)

Please pay attention to the difference: the functions for inplace computation are colwise! and pairwise! (instead of colwise and pairwise).
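A minimal sketch of the in-place variants, assuming the Distances package is installed and using the argument order shown above. Other releases of the package may prefer a different argument order, so check the documentation of the installed version if this form emits a warning.

```julia
using Distances

X = [1.0 0.0 2.0;
     0.0 1.0 2.0]
Y = [0.0 0.0 2.0;
     0.0 3.0 5.0]

# Pre-allocate the outputs once; the calls below fill them in place.
r = zeros(size(X, 2))
R = zeros(size(X, 2), size(Y, 2))

colwise!(r, Euclidean(), X, Y)
pairwise!(R, Euclidean(), X, Y, dims=2)

println(r)
println(size(R))
```

Reusing r and R across repeated calls avoids allocating a fresh result array each time.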

Distance type hierarchy

The distances are organized into a type hierarchy.

At the top of this hierarchy is an abstract type PreMetric, which is defined to be a function d that satisfies

d(x, x) == 0  for all x
d(x, y) >= 0  for all x, y

SemiMetric is an abstract type that refines PreMetric. Formally, a semi-metric is a pre-metric that is also symmetric, as

d(x, y) == d(y, x)  for all x, y

Metric is an abstract type that further refines SemiMetric. Formally, a metric is a semi-metric that also satisfies the triangle inequality, as

d(x, z) <= d(x, y) + d(y, z)  for all x, y, z

This type system has practical significance. For example, when computing pairwise distances between a set of vectors, you need to compute only half of the pairs and can derive the values for the remaining half immediately from the symmetry of semi-metrics. Note that the SemiMetric and Metric types do not completely follow the mathematical definitions: they do not require the "distance" to distinguish between points, so for these types x != y does not imply d(x, y) != 0 in general. This property does not change computations in practice.
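The hierarchy can be inspected directly from the REPL. The sketch below assumes the Distances package is installed and uses three of its distance types; the squared Euclidean distance is a handy example of a semi-metric that is not a metric, since it violates the triangle inequality.

```julia
using Distances

# The abstract types refine one another.
println(Metric <: SemiMetric)
println(SemiMetric <: PreMetric)

# Concrete distances sit at different levels of the hierarchy.
println(Euclidean() isa Metric)
println(SqEuclidean() isa SemiMetric)
println(SqEuclidean() isa Metric)
println(KLDivergence() isa PreMetric)

# Triangle inequality check for the squared Euclidean distance
# on the points 0, 1 and 2 of the real line.
x, y, z = [0.0], [1.0], [2.0]
lhs = sqeuclidean(x, z)
rhs = sqeuclidean(x, y) + sqeuclidean(y, z)
println((lhs, rhs))
```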

Each distance corresponds to a distance type. The type name and the corresponding mathematical definitions of the distances are listed in the following table.

| type name | convenient syntax | math definition |
|---|---|---|
| Euclidean | euclidean(x, y) | sqrt(sum((x - y) .^ 2)) |
| SqEuclidean | sqeuclidean(x, y) | sum((x - y).^2) |
| PeriodicEuclidean | peuclidean(x, y, w) | sqrt(sum(min(mod(abs(x - y), w), w - mod(abs(x - y), w)).^2)) |
| Cityblock | cityblock(x, y) | sum(abs(x - y)) |
| TotalVariation | totalvariation(x, y) | sum(abs(x - y)) / 2 |
| Chebyshev | chebyshev(x, y) | max(abs(x - y)) |
| Minkowski | minkowski(x, y, p) | sum(abs(x - y).^p) ^ (1/p) |
| Hamming | hamming(k, l) | sum(k .!= l) |
| RogersTanimoto | rogerstanimoto(a, b) | 2(sum(a&!b) + sum(!a&b)) / (2(sum(a&!b) + sum(!a&b)) + sum(a&b) + sum(!a&!b)) |
| Jaccard | jaccard(x, y) | 1 - sum(min(x, y)) / sum(max(x, y)) |
| BrayCurtis | braycurtis(x, y) | sum(abs(x - y)) / sum(abs(x + y)) |
| CosineDist | cosine_dist(x, y) | 1 - dot(x, y) / (norm(x) * norm(y)) |
| CorrDist | corr_dist(x, y) | cosine_dist(x - mean(x), y - mean(y)) |
| ChiSqDist | chisq_dist(x, y) | sum((x - y).^2 / (x + y)) |
| KLDivergence | kl_divergence(p, q) | sum(p .* log(p ./ q)) |
| GenKLDivergence | gkl_divergence(x, y) | sum(p .* log(p ./ q) - p + q) |
| RenyiDivergence | renyi_divergence(p, q, k) | log(sum( p .* (p ./ q) .^ (k - 1))) / (k - 1) |
| JSDivergence | js_divergence(p, q) | KL(p, m) / 2 + KL(q, m) / 2 with m = (p + q) / 2 |
| SpanNormDist | spannorm_dist(x, y) | max(x - y) - min(x - y) |
| BhattacharyyaDist | bhattacharyya(x, y) | -log(sum(sqrt(x .* y) / sqrt(sum(x) * sum(y)))) |
| HellingerDist | hellinger(x, y) | sqrt(1 - sum(sqrt(x .* y) / sqrt(sum(x) * sum(y)))) |
| Haversine | haversine(x, y, r = 6_371_000) | Haversine formula |
| SphericalAngle | spherical_angle(x, y) | Haversine formula |
| Mahalanobis | mahalanobis(x, y, Q) | sqrt((x - y)' * Q * (x - y)) |
| SqMahalanobis | sqmahalanobis(x, y, Q) | (x - y)' * Q * (x - y) |
| MeanAbsDeviation | meanad(x, y) | mean(abs.(x - y)) |
| MeanSqDeviation | msd(x, y) | mean(abs2.(x - y)) |
| RMSDeviation | rmsd(x, y) | sqrt(msd(x, y)) |
| NormRMSDeviation | nrmsd(x, y) | rmsd(x, y) / (maximum(x) - minimum(x)) |
| WeightedEuclidean | weuclidean(x, y, w) | sqrt(sum((x - y).^2 .* w)) |
| WeightedSqEuclidean | wsqeuclidean(x, y, w) | sum((x - y).^2 .* w) |
| WeightedCityblock | wcityblock(x, y, w) | sum(abs(x - y) .* w) |
| WeightedMinkowski | wminkowski(x, y, w, p) | sum(abs(x - y).^p .* w) ^ (1/p) |
| WeightedHamming | whamming(x, y, w) | sum((x .!= y) .* w) |
| Bregman | bregman(F, ∇, x, y; inner=dot) | F(x) - F(y) - inner(∇(y), x - y) |

Note: The formulas above are using Julia's functions. These formulas are mainly for conveying the math concepts in a concise way. The actual implementation may use a faster way. The arguments x and y are iterable objects, typically arrays of real numbers; w is an iterator/array of parameters (like weights or periods); k and l are iterators/arrays of distinct elements of any kind; a and b are iterators/arrays of Bools; and finally, p and q are iterators/arrays forming a discrete probability distribution and are therefore both expected to sum to one.
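To connect a few of the convenience functions to their definitions, the sketch below compares them against hand-written formulas. It assumes the Distances package is installed. The formulas are written with explicit broadcasting (abs.(...), maximum) so that they run in current Julia, which is slightly different from the shorthand used in the table.

```julia
using Distances
using LinearAlgebra  # dot, norm

x = [1.0, 2.0, 3.0]
y = [2.0, 0.0, 3.0]

# Each pair holds the package result and the hand-written formula.
city  = (cityblock(x, y),   sum(abs.(x .- y)))
cheb  = (chebyshev(x, y),   maximum(abs.(x .- y)))
sqeuc = (sqeuclidean(x, y), sum((x .- y) .^ 2))
ham   = (hamming(x, y),     sum(x .!= y))
cosd  = (cosine_dist(x, y), 1 - dot(x, y) / (norm(x) * norm(y)))

for pair in (city, cheb, sqeuc, ham, cosd)
    println(pair)
end
```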

Precision for Euclidean and SqEuclidean

For efficiency (see the benchmarks below), Euclidean and SqEuclidean make use of BLAS3 matrix-matrix multiplication to calculate distances. This corresponds to the following expansion:

(x-y)^2 == x^2 - 2xy + y^2

However, equality is not precise in the presence of roundoff error, and particularly when x and y are nearby points this may not be accurate. Consequently, Euclidean and SqEuclidean allow you to supply a relative tolerance to force recalculation:

julia> x = reshape([0.1, 0.3, -0.1], 3, 1);

julia> pairwise(Euclidean(), x, x)
1×1 Array{Float64,2}:
 7.45058e-9

julia> pairwise(Euclidean(1e-12), x, x)
1×1 Array{Float64,2}:
 0.0
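The roundoff problem can be reproduced with plain scalars in base Julia, without the package. This is only an illustration of why the expansion is fragile for nearby points, not the package's actual BLAS code path; the values of x and y are chosen so that their squares are far larger than their difference.

```julia
x = 1.0e8 + 1.0
y = 1.0e8

# Subtract first, then square: the small difference survives.
direct = (x - y)^2

# Expanded form: every term is around 1e16, where Float64 can no
# longer resolve a difference of 1, so the result is unreliable.
expanded = x^2 - 2x * y + y^2

println("direct   = ", direct)
println("expanded = ", expanded)
```

Recomputing with the direct formula when the expanded result falls below a tolerance is what the relative tolerance argument above triggers.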

Benchmarks

The implementation has been carefully optimized based on benchmarks. The script in benchmark/benchmarks.jl defines a benchmark suite for a variety of distances, under column-wise and pairwise settings.

Here are benchmarks obtained running Julia 1.5 on a computer with a quad-core Intel Core i5-2300K processor @ 3.2 GHz. Extended versions of the tables below can be replicated using the script in benchmark/print_table.jl.

Column-wise benchmark

Generically, column-wise distances are computed using a straightforward loop implementation. For [Sq]Mahalanobis, however, specialized methods are provided in Distances.jl, and the table below compares the performance (measured in terms of average elapsed time of each iteration) of the generic to the specialized implementation. The task in each iteration is to compute a specific distance between corresponding columns in two 200-by-10000 matrices.

| distance | loop | colwise | gain |
|---|---|---|---|
| SqMahalanobis | 0.089470s | 0.014424s | 6.2027 |
| Mahalanobis | 0.090882s | 0.014096s | 6.4475 |

Pairwise benchmark

Generically, pairwise distances are computed using a straightforward loop implementation. For distances in which a major part of the computation is a quadratic form, however, the performance can be drastically improved by restructuring the computation and delegating the core part to GEMM in BLAS. The table below compares the performance (measured in terms of average elapsed time of each iteration) of the generic to the specialized implementations provided in Distances.jl. The task in each iteration is to compute a specific distance in a pairwise manner between the columns of a 100-by-200 matrix and a 100-by-250 matrix, which results in a 200-by-250 distance matrix.

 

| distance | loop | pairwise | gain |
|---|---|---|---|
| SqEuclidean | 0.001273s | 0.000124s | 10.2290 |
| Euclidean | 0.001445s | 0.000194s | 7.4529 |
| CosineDist | 0.001928s | 0.000149s | 12.9543 |
| CorrDist | 0.016837s | 0.000187s | 90.1854 |
| WeightedSqEuclidean | 0.001603s | 0.000143s | 11.2119 |
| WeightedEuclidean | 0.001811s | 0.000238s | 7.6032 |
| SqMahalanobis | 0.308990s | 0.000248s | 1248.1892 |
| Mahalanobis | 0.313415s | 0.000346s | 906.1836 |

Download Details:
Author: JuliaStats
Official Website: https://github.com/JuliaStats/Distances.jl 
License: MIT

#julia #programming #developer 

Wiley Mayer

1620719072

Everything a Data Scientist Needs to Know About Julia in 2021

An overview of everything a data scientist needs to know about Julia in 2021.

In recent years, the Data Science and machine-learning industry has exploded in popularity. As a result, programming languages new and old have both risen and fallen in popularity. SAS, for example, has seen much less adoption for Data Science, taking a backseat to the excitement around Python and its machine-learning capabilities. Python has soared in popularity over the past few years because of its use in Data Science, which owes much to its ecosystem, high-level syntax, and general-purpose nature.

However, in the world of Data Science, there is a new kid on the block. If you are working in the Data Science discipline, it is likely you have heard of the new open-source language coming out of MIT, Julia. Because of Julia’s rather recent rise in popularity, many scientists have been questioning whether or not they should learn Julia, and where the industry is going to be in regards to the most popular programming languages used.

Today, I would like to modernize that concept for general Data Science practices in the year 2021. A lot has changed since that article was written, mainly Julia soaring in popularity over the course of the past two years. There are definitely some new things to learn about the language and know before getting into it this year. Today I seek to answer modern questions about Julia by providing everything that you need to know about Julia in its current state, and what I subjectively expect out of the language in the future.

  • Python is not going anywhere
  • The multiple dispatch paradigm
  • Ecosystem

#machine-learning #julia #programming #data-science #python