Wilford Pagac

1597164060

Vectorize everything with Julia

Do you ever feel like for loops are taking over your life and there’s no escape from them? Do you feel trapped by all those loops? Well, fear not! There’s a way out! I’ll show you how to do the FizzBuzz challenge without any for loops at all.

Vectorize all the things! — SOURCE

The task of FizzBuzz is to print every number up to 100, but replace numbers divisible by 3 with “Fizz”, numbers divisible by 5 with “Buzz”, and numbers divisible by both 3 and 5 with “FizzBuzz”.

Solving FizzBuzz with for loops is easy, you can even do this in BigQuery. Here, I’ll show you an alternative way of doing this — without any for loops whatsoever. The solution is Vectorised Functions.

If you already have some experience with R or Python, you’ve probably come across vectorised functions in base R or via Python’s numpy library. Let’s see how we can use them similarly in Julia.

Vectorised functions are great as they reduce the clutter often associated with for loops.

First steps with vectorized functions

Before we dive into solving FizzBuzz let’s see how you can replace a very simple for loop with a vectorized alternative in Julia.

Let’s start with a trivial task: given a vector a, add 1 to each of its elements.

For loop version:

a = [1,2,3];

for i in 1:length(a)
    a[i] += 1
end

julia> print(a)

[2, 3, 4]

Vectorized version:

The above gets the job done, but it takes up 3 lines and a lot more characters than needed. If a was a numpy array in Python 🐍, you could just write a + 1 and be done. But first, you would have to convert your plain old array to a numpy array.

Julia has a clever solution. You can use the broadcast operator . to apply an operation — in this case, addition — to all elements of an object. Here it is in action:

	a = [1,2,3];

	a .+ 1

This gives the same answer as the for loop above. And there’s no need to convert your array.


Even better than that, you can broadcast any function of your liking, even your own ones. Here we calculate the area of a circle and then we broadcast it across our array:

function area_of_circle(r)
    return π * r^2
end

a = [1,2,3];
area_of_circle.(a)

Yes, pi (π) is a built-in constant in Julia!

julia> area_of_circle.(a)

3-element Array{Float64,1}:
  3.141592653589793
 12.566370614359172
 28.274333882308138

Bye-bye for loops


Bye-bye for loops! — SOURCE

Now that we know the basics, let’s do FizzBuzz! But remember, no for loops allowed.

We will rephrase our problem a little bit. Instead of printing the numbers, Fizzes and Buzzes, we’ll return all of them together as a vector. I’ll break down the solution the same way as in the for loop article [LINK], so if you haven’t seen the previous post, now would be a good time to check it out!

First, let’s return the numbers up until n as a vector:

	function fizzbuzz(n)
	    return collect(1:n)
	end

Here, collect just takes our range and evaluates it into an array.

julia> fizzbuzz(5)

5-element Array{Int64,1}:
 1
 2
 3
 4
 5

Adding Fizzes

This works. Let’s see if we can print Fizz for each number that’s divisible by 3. We can do this by replacing all numbers that are divisible by 3 with a Fizz string.
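The code for this step is missing from this copy of the article. Based on the step-by-step breakdown that follows, it was presumably along these lines (a reconstruction, not necessarily the author’s original code):

```julia
function fizzbuzz(n)
    numbers = collect(1:n)
    result = string.(numbers)                  # convert every number to a string
    result[rem.(numbers, 3) .== 0] .= "Fizz"   # mask-assign "Fizz" where divisible by 3
    return result
end
```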

julia> fizzbuzz(7)

7-element Array{String,1}:
 "1"
 "2"
 "Fizz"
 "4"
 "5"
 "Fizz"
 "7"

Let’s break this down step by step:

  • Why did we convert everything to strings? Well, the array of numbers is just that, an array of numbers. We don’t want numbers and strings mingled in a single object, so we convert the whole array first.
  • We broadcast rem.(numbers, 3) to find the remainder of every number.
  • Then we compared this array of remainders elementwise to 0 ( .== 0 ), giving a boolean mask.
  • Finally, we indexed our string array with the boolean mask and assigned “Fizz” to every element where the mask says true.

Feel free to break these steps down and try them in your own Julia REPL!
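If you do, the intermediate values look roughly like this (shown for n = 7):

```julia
numbers = collect(1:7)
remainders = rem.(numbers, 3)   # [1, 2, 0, 1, 2, 0, 1]
mask = remainders .== 0         # true at positions 3 and 6
result = string.(numbers)       # ["1", "2", "3", "4", "5", "6", "7"]
result[mask] .= "Fizz"          # overwrite the masked positions
```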

I know that the use of .= to assign a single value to many elements can be a bit controversial, but I actually quite like it. By explicitly broadcasting the assignment you force yourself to think about the difference between these objects, and everyone who reads your code afterwards will see that one is a vector and the other a scalar.

Adding Buzzes

Adding the Buzzes is done exactly the same way:
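The article breaks off here, but following the same pattern, the Buzz (and FizzBuzz) steps would presumably look something like this (a sketch in the spirit of the Fizz step, not the author’s original code):

```julia
function fizzbuzz(n)
    numbers = collect(1:n)
    result = string.(numbers)
    result[rem.(numbers, 3) .== 0] .= "Fizz"
    result[rem.(numbers, 5) .== 0] .= "Buzz"       # same trick, divisor 5
    result[rem.(numbers, 15) .== 0] .= "FizzBuzz"  # divisible by both 3 and 5
    return result
end
```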

#programming #julia #optimization #coding #vectorization


ChainedVectors.jl: Few Utility Types Over Julia Vector Type

ChainedVectors consist of a bunch of types that:

  • chain multiple Vectors and make it appear like a single Vector
  • give a window into a portion of the chained vector that appears like a single Vector. The window may straddle across boundaries of multiple elements in the chain.

ChainedVector

Chains multiple vectors. Only index translation is done and the constituent Vectors are not copied. This can be efficient in situations where avoiding allocation and copying of data is important. For example, during sequential file reading, ChainedVectors can be used to store file blocks progressively as the file is read. As the chain grows beyond a certain size, buffers from the head of the chain can be removed and reused to read further data at the tail.
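The index-translation idea can be sketched in a few lines (a simplified, modern-Julia illustration, not the package’s actual implementation; the MiniChain name is made up):

```julia
# Map a global index to the right constituent vector without copying any data.
struct MiniChain{T}
    parts::Vector{Vector{T}}
end

function Base.getindex(c::MiniChain, i::Integer)
    for p in c.parts
        i <= length(p) && return p[i]   # index falls inside this part
        i -= length(p)                  # otherwise skip past this part
    end
    throw(BoundsError(c, i))
end

Base.length(c::MiniChain) = sum(length, c.parts)
```

With cv = MiniChain([[1, 2, 3], [4, 5, 6]]), cv[5] translates to the second constituent vector’s second element.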

julia> v1 = [1, 2, 3]
3-element Int64 Array:
 1
 2
 3

julia> v2 = [4, 5, 6]
3-element Int64 Array:
 4
 5
 6

julia> cv = ChainedVector{Int}(v1, v2)
6-element Int64 ChainedVector:
[1, 2, 3, 4, 5, ...]

julia> cv[1]
1

julia> cv[5]
5

ChainedVector{Uint8} has specialized methods for search, beginswith, and beginswithat that help in working with textual data.

julia> cv = ChainedVector{Uint8}(b"Hello World ", b"Goodbye World ")
26-element Uint8 ChainedVector:
[0x48, 0x65, 0x6c, 0x6c, 0x6f, ...]

julia> search(cv, 'W')
7

julia> search(cv, 'W', 8)
21

julia> search(cv, 'W', 22)
0

julia> beginswith(cv, b"Hello")
true

julia> beginswith(cv, b"ello")
false

julia> beginswithat(cv, 2, b"ello")
true

julia> beginswithat(cv, 7, b"World Goodbye")
true

Window view of a ChainedVector

Using the sub method, a portion of the data in the ChainedVector can be accessed as a view:

sub(cv::ChainedVector, r::Range1{Int})

Example:

julia> v1 = [1, 2, 3, 4, 5, 6];

julia> v2 = [7, 8, 9, 10, 11, 12];

julia> cv = ChainedVector{Int}(v1, v2);

julia> sv = sub(cv, 3:10)
8-element Int64 SubVector:
[3, 4, 5, 6, 7, ...]

julia> sv[1]
3

julia> # sv[7] is the same as cv[9] and v2[3]

julia> println("sv[7]=$(sv[7]), v2[3]=$(v2[3]), cv[9]=$(cv[9])")
sv[7]=9, v2[3]=9, cv[9]=9

julia> # changing values through sv will be visible at cv and v2

julia> sv[7] = 71
71

julia> println("sv[7]=$(sv[7]), v2[3]=$(v2[3]), cv[9]=$(cv[9])")
sv[7]=71, v2[3]=71, cv[9]=71

The sub method returns a Vector that indexes into the chained vector at the given range. The returned Vector is not a copy, and any modifications affect the ChainedVector and consequently its constituent vectors as well. The returned vector can be an instance of either a SubVector or a Vector obtained through the method fast_sub_vec.

SubVector

Provides index translations for abstract vectors. Example:

julia> v1 = [1, 2, 3, 4, 5, 6];

julia> sv = SubVector(v1, 2:5)
4-element Int64 SubVector:
[2, 3, 4, 5]

julia> sv[1]
2

julia> sv[1] = 20
20

julia> v1[2]
20

fast_sub_vec

Provides an optimized way of creating a Vector that points within another Vector and uses the same underlying data. Since it reuses the same memory locations, it works only on concrete Vectors that give contiguous memory locations. Internally the instance of the view vector is maintained in a WeakKeyDict along with a reference to the larger vector to prevent gc from releasing the parent vector till the view is in use. Example:

julia> v1 = [1, 2, 3, 4, 5, 6];

julia> sv = fast_sub_vec(v1, 2:5)
4-element Int64 Array:
 2
 3
 4
 5

julia> println("sv[1]=$(sv[1]), v1[2]=$(v1[2])")
sv[1]=2, v1[2]=2

julia> sv[1] = 20
20

julia> println("sv[1]=$(sv[1]), v1[2]=$(v1[2])")
sv[1]=20, v1[2]=20

Tests and Benchmarks

Below is the output of some benchmarks done using time_tests.jl located in the test folder.

Times for getindex across all elements of vectors of 33554432 integers.
Split into two 16777216 buffers for ChainedVectors.

Vector: 0.041909848
ChainedVector: 0.261795721
SubVector: 0.172702399
FastSubVector: 0.041579312
SubArray: 3.848813439
SubVector of ChainedVector: 0.418898455

Download Details:

Author: Tanmaykm
Source Code: https://github.com/tanmaykm/ChainedVectors.jl 
License: MIT license

#julia #vector 

Vec.jl: 2D and 3D Vectors and Their Operations for Julia

Vec

Provides 2D and 3D vector types for vector operations in Julia.

Installation

Run one of those commands in the Julia REPL:

Through the SISL registry:

] registry add https://github.com/sisl/Registry
add Vec

Through Pkg

import Pkg
Pkg.add(PackageSpec(url="https://github.com/sisl/Vec.jl.git"))

Usage

Vec.jl provides several vector types, named after their groups. All types are immutable and are subtypes of StaticArrays' FieldVector, so they can be indexed and used as vectors in many contexts.

  • VecE2 provides an (x,y) type of the Euclidean-2 group.
  • VecE3 provides an (x,y,z) type of the Euclidean-3 group.
  • VecSE2 provides an (x,y,theta) type of the special-Euclidean 2 group.

v = VecE2(0, 1)
v = VecSE2(0, 1, 0.5)
v = VecE3(0, 1, 2)

Additional geometry types include Quat for quaternions, Line, LineSegment, and Projectile.

The switch to StaticArrays brings several breaking changes. If you need a backwards-compatible version, please check out the v0.1.0 tag with cd(Pkg.dir("Vec")); run(`git checkout v0.1.0`).

Download Details:

Author: sisl
Source Code: https://github.com/sisl/Vec.jl 
License: View license

#julia #vector #3d 

Monty Boehm

1658520480

KSVM.jl: Support Vector Machines (SVM) in Pure Julia

KSVM

DEVELOPMENT CURRENTLY ON HOLD

A Julia package for research and application of linear- and non-linear Support Vector Machines (SVM).

This package aims to make use of Julia's strength in order to provide an easily extensible, highly modular, and at the same time computationally efficient framework for SVMs.

Why Julia?

Julia's high-level language design (which should feel familiar to the scientific community), combined with its high performance, may have the potential to bridge the two-language problem that is arguably associated with research and application of machine learning algorithms.

It is our hope that frameworks such as this enable a broader range of scientists to contribute effective algorithms with fair comparisons, without having to be fluent in low-level languages such as C.

  • TODO some objective benchmarks

Just for research?

While research is an important goal of this framework, it is of equal importance that all the included algorithms are 1) well documented and 2) efficient, thanks to Julia. Additionally, because of the framework's modular and transparent interface (e.g. callback functions), it should be simple to use this library as a basis for course exercises.

Bottom Line: The use of this framework for educational and/or commercial reasons is encouraged under the MIT "Expat" license.

Installation

This is still a work in progress. Installation not yet advised.

Usage

Here is a quick "Hello World" example of using a linear SVM on the Iris dataset.

# Adapted example from SVM.jl by John Myles White
using KSVM
using RDatasets
iris = dataset("datasets", "iris")
X = convert(Array, iris[1:100, 1:2])'  # The observations have to be in the columns
y = iris[1:100, :Species]
train = bitrand(size(X,2))             # Split training and testset
model = svm(X[:, train], y[train])     # Fit the linear SVM
acc = accuracy(model, X[:, ~train], y[~train])

Iris Data SVM

Note: The preview plot on the bottom is only provided if the dataset is small and lies in two dimensions

Linear Support Vector Machines

svm(X, y⃗; nargs...) → SVM

The function svm fits a Support Vector Machine (SVM) to the given training data X and y⃗. Depending on the parameters specified in nargs, the SVM can be trained to perform various forms of regression or classification tasks. There are a number of parameters that can be configured to individual needs, such as the type of loss function, the type of regularization, or the algorithm used for training the SVM in the primal/dual.

Parameters

X : The input matrix of the training set, with the observations as columns. It can be either a dense or sparse matrix of floating-point numbers. Note that it is generally advised to center and scale your data before training an SVM.

y⃗ : The target vector of the training set. It has to be the same length as X has columns. In the case of classification, it can be either a vector with elements yᵢ ∈ {-1.0, 1.0} for two class prediction, or a vector with elements yᵢ ∈ N for multi-class prediction.

solver : The concrete algorithm to train the SVM. Depending on the algorithm this will either solve the linear SVM in the primal or dual formulation. In contrast to other frameworks, the algorithm does not determine if the solution of the primal or dual problem is returned (see parameter dual).

kernel : The kernel that should be used. The ScalarProductKernel denotes the special case of a linear SVM and allows for additional functionality and solvers that are specialized for the linear case.

loss : The utilized loss function. The typical loss functions for SVMs are HingeLoss() (L1-SVM), L2HingeLoss() (L2-SVM), SmoothedL1HingeLoss(h), or ModifiedHuberLoss() for classification, and EpsilonInsLoss(e) or L2EpsilonInsLoss(e) for support vector regression. Note that in general each solver is only able to deal with a subset of those loss functions and some combinations might not be supported. For example, the implementation of DualCD only supports HingeLoss and L2HingeLoss.

regtype : The type of regularization that should be used. In general this can be either L1Penalty or L2Penalty. Note that not all solvers support L1Penalty.

bias : The scaling factor of the bias. If set to 0, no intercept will be fitted.

C : The penalty parameter. It plays a role similar to λ⁻¹ in Empirical Risk Minimization. Conceptually, it denotes the trade off between model complexity and the training error. Larger C will increase the penalization of training errors and thus lead to behavior similar to a hard margin classifier.

dual : Boolean that specifies if the solution to the dual problem should be returned instead of the solution to the primal (default = false).

In contrast to other frameworks, the dual parameter does not influence whether the dual or the primal problem is solved (this is instead specified implicitly by the solver parameter), but the result the user is interested in. It is generally recommended to leave dual = false, unless the user is either explicitly interested in the support vectors, or the dataset is sparse and high-dimensional.

Note: Not all solvers are able to natively provide both solutions. If the algorithm is unable to provide the desired solution type, the solution will be converted accordingly, if feasible.

iterations : Specifies the maximum number of iterations before the solver exits with the current (and probably suboptimal) solution.

ftol : Specifies the tolerance for the change of the objective function value.

xtol : Specifies the tolerance for the change of the solution.

gtol : Specifies the tolerance for the change of the gradient norm.

callback : The optional callback function with signature f(i, w, v, G). If a callback is specified, then it will be called every iteration with the following four parameters in order:

  • i : the current iteration number.
  • w : the current coefficients (i.e. α if dual=true, or θ if dual=false).
  • v : the current objective value.
  • G : the current gradient.

Note: The callback function can be used for early stopping by returning the symbol :Exit.
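Given the documented f(i, w, v, G) signature, an early-stopping callback could be sketched like this (make_early_stop is a hypothetical helper, not part of KSVM):

```julia
# Hypothetical early-stopping helper for the documented f(i, w, v, G) callback.
function make_early_stop(; ftol = 1e-6, min_iters = 5)
    last_v = Inf
    function callback(i, w, v, G)
        if i > min_iters && last_v - v < ftol
            return :Exit            # objective stalled: signal early stop
        end
        last_v = v
        return nothing
    end
    return callback
end

# Usage sketch: model = svm(X, y, callback = make_early_stop())
```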

show_trace : Instead of (or in addition to) the callback function, the user can enable output of the training information to STDOUT.

nargs... : Additional named arguments that are passed unchecked to the specified solver. This functionality can be used to pass around special arguments that the library does not natively implement.

Non-linear Support Vector Machines

coming soon

License

This code is free to use under the terms of the MIT license.

Acknowledgement

This package makes heavy use of the following packages in order to provide its main functionality. To see a full list of utilized packages, please take a look at the REQUIRE file.

Early inspiration by Regression.jl

References

Chapelle, Olivier. "Training a support vector machine in the primal." Neural Computation 19.5 (2007): 1155-1178. link

Hsieh, Cho-Jui, et al. "A dual coordinate descent method for large-scale linear SVM." Proceedings of the 25th International Conference on Machine learning (ICML-08). ACM, 2008. link

Platt, John. "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods." Advances in large margin classifiers 10.3 (1999): 61-74. link

Platt, John. "Fast training of support vector machines using sequential minimal optimization." Advances in kernel methods—support vector learning 3 (1999). link

Shalev-shwartz, Shai, et al. "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM." Proceedings of the 24th International Conference on Machine Learning (ICML-07). 2007. link

Author: Evizero
Source Code: https://github.com/Evizero/KSVM.jl 
License: View license

#julia #vector

VSL.jl: Julia Bindings to The intel Vector Statistics Library

VSL.jl

This package provides bindings to the Intel Vector Statistics Library.

Using VSL.jl

You must have the Intel® Math Kernel Library installed to use VSL.jl, and the shared library must be in a directory known to the linker.

VSL.jl provides several basic random number generators (BRNGs) and distributions, and each distribution has at least one method to generate random numbers. Once VSL.jl is loaded, you can use the distributions like the following:

julia> using VSL

julia> brng = BasicRandomNumberGenerator(VSL_BRNG_MT19937, 12345);
# A BRNG created, in which 12345 is the random seed.

julia> u = Uniform(brng, 0.0, 1.0); # Create a uniform distribution between 0.0 and 1.0.

julia> rand(u) # Generate one random number.
0.41661986871622503

julia> rand(u, 2, 3) # Generate a random 2×3 array.
2×3 Array{Float64,2}:
 0.732685   0.820175  0.802848
 0.0101692  0.825207  0.29864 

julia> A = Array{Float64}(3, 4);

julia> rand!(u, A) # Fill an array with random numbers.
3×4 Array{Float64,2}:
 0.855138  0.193661  0.436228  0.124267
 0.368412  0.270245  0.161688  0.874174
 0.931785  0.566008  0.373064  0.432936

Basic random number generators

Use the Enum BRNGType to set the type of BRNG.

BRNGType Enum
VSL_BRNG_MCG31
VSL_BRNG_R250
VSL_BRNG_MRG32K3A
VSL_BRNG_MCG59
VSL_BRNG_WH
VSL_BRNG_SOBOL
VSL_BRNG_NIEDERR
VSL_BRNG_MT19937
VSL_BRNG_MT2203
VSL_BRNG_SFMT19937
VSL_BRNG_NONDETERM
VSL_BRNG_ARS5
VSL_BRNG_PHILOX4X32X10

Supported distributions

Continuous: Uniform, Gaussian, GaussianMV, Exponential, Laplace, Weibull, Cauchy, Rayleigh, Lognormal, Gumbel, Gamma, Beta

Discrete: UniformDiscrete, UniformBits, UniformBits32, UniformBits64, Bernoulli, Geometric, Binomial, Hypergeometric, Poisson, PoissonV, NegBinomial

Notes

Most of the discrete distributions return 32-bit integer values. Please be careful when using those distributions.

For more information, please refer to the Intel® Math Kernel Library Developer Reference

Download Details:

Author: Sunoru
Source Code: https://github.com/sunoru/VSL.jl 
License: MIT license

#julia #cryptography #vector