Monty Boehm


PlmDCA: Pseudo Likelihood Maximization for Protein in Julia


Pseudo-likelihood maximization in Julia. A complete description of the algorithm can be found in the papers below. If you use this algorithm, please cite:

M. Ekeberg, C. Lovkvist, Y. Lan, M. Weigt, E. Aurell, Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models, Phys. Rev. E 87, 012707 (2013)

M. Ekeberg, T. Hartonen, E. Aurell, Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences, arXiv:1401.4832 (supplementary material)

The present software is a Julia implementation of the above-mentioned papers, with no reference to the original MATLAB implementation.

The code now requires Julia version 1.5 or later.


To install, just use the package manager and do a

(v1.?) pkg> add PlmDCA


The code internally uses NLopt, which provides a Julia interface to the free/open-source NLopt library.


To load the code just type

julia> using PlmDCA

The functions in this package are written to maximize performance. Most computationally heavy functions can use multiple threads (start julia with the -t option or set the JULIA_NUM_THREADS environment variable). For more information on how to set the number of threads correctly, please refer to the online Julia documentation on multi-threading.

The program (only in its symmetric version plmdca_sym) can be run on multiple cores, provided addprocs(nprocs) has been called first, where nprocs should be an integer np less than or equal to your number of (physical) cores.

The software provides two main functions, plmdca(filename::String, ...) and plmdca_sym(filename::String, ...) (resp. the asymmetric and symmetric coupling versions of the algorithm). Empirically, the asymmetric version turns out to be faster and more accurate. These functions take as input the name of a (possibly zipped) multiple sequence alignment.

We also provide another function mutualinfo(filename::String,...) to compute the mutual information score.

There are a number of possible algorithmic strategies for the optimization problem. As far as local gradient-based optimization is concerned, this is a list of :symbols (associated with the different methods):


After some experiments we found that the best compromise between accuracy and speed is achieved by the Low Storage BFGS method :LD_LBFGS, which is the default method in the code. The other methods can be selected by changing the default optional argument (e.g. method=:LD_SLSQP).

There are more optional arguments that can be set (to be documented...).


The functions output a type PlmOut (say X) with 4 fields:

  • X.Jtensor: the coupling matrix J[ri,rj,i,j] a symmetrized q x q x N x N array, where N is the number of residues in the multiple sequence alignment, and q is the alphabet "size" (typically 21 for proteins).
  • X.htensor: the external field h[ri,i], a q x N array.
  • X.pslike: the pseudolikelihood
  • X.score: a vector of Tuple{Int,Int,Float64} containing the candidate contacts in descending score order (residue1, residue2 , score12).


The minimal julia version for using this code is 1.3 (package version <= v0.2.0)

From package versions 0.3.0 on the minimal julia requirement is 1.5


A lot!

Download Details:

Author: Pagnani
Source Code: 
License: View license

#julia #like #algorithm 

Elian Harber


A Feature Complete and High Performance Multi-group Raft Library in Go

Dragonboat - A Multi-Group Raft library in Go / 中文版   


Dragonboat is a high performance multi-group Raft consensus library in pure Go.

Consensus algorithms such as Raft provide fault tolerance by allowing a system to continue to operate as long as a majority of member servers are available. For example, a Raft shard of 5 servers can make progress even if 2 servers fail. The system also appears to clients as a single entity that always provides strong data consistency. All Raft replicas can be used to handle read requests for aggregated read throughput.

Dragonboat handles all technical difficulties associated with Raft to allow users to just focus on their application domains. It is also very easy to use, our step-by-step examples can help new users to master it in half an hour.


  • Easy to use pure-Go APIs for building Raft based applications
  • Feature complete and scalable multi-group Raft implementation
  • Disk based and memory based state machine support
  • Fully pipelined and TLS mutual authentication support, ready for high latency open environment
  • Custom Raft log storage and transport support, easy to integrate with latest I/O techs
  • Prometheus based health metrics support
  • Built-in tool to repair Raft shards that permanently lost the quorum
  • Extensively tested including using Jepsen's Knossos linearizability checker, some results are here

All major features covered in Diego Ongaro's Raft thesis are supported -

  • leader election, log replication, snapshotting and log compaction
  • membership change
  • pre-vote
  • ReadIndex protocol for read-only queries
  • leadership transfer
  • non-voting member
  • witness member
  • idempotent update transparent to applications
  • batching and pipelining
  • disk based state machine


Dragonboat is the fastest open source multi-group Raft implementation on GitHub.

For a 3-node system using mid-range hardware (details here), an in-memory state machine, and RocksDB as the storage engine, Dragonboat can sustain 9 million writes per second with 16-byte payloads, or 11 million mixed I/O operations per second at a 9:1 read:write ratio. High throughput is maintained in geographically distributed environments: when the RTT between nodes is 30ms, 2 million I/O operations per second can still be achieved using a much larger number of clients.

The number of concurrently active Raft groups affects the overall throughput, as requests become harder to batch. On the other hand, having thousands of idle Raft groups has a much smaller impact on throughput.

The table below shows write latencies in milliseconds. Dragonboat has <5ms P99 write latency when handling 8 million writes per second at 16 bytes each. Read latency is lower than write latency, as the ReadIndex protocol employed for linearizable reads doesn't require fsync-ed disk I/O.

Ops | Payload Size | 99.9% percentile | 99% percentile | AVG

When tested on a single Raft group, Dragonboat can sustain writes at 1.25 million per second when payload is 16 bytes each, average latency is 1.3ms and the P99 latency is 2.6ms. This is achieved when using an average of 3 cores (2.8GHz) on each server.

As visualized below, Stop-the-World pauses caused by Go 1.11's GC are sub-millisecond on highly loaded systems. Such very short Stop-the-World pause times are further significantly reduced in Go 1.12. Golang's runtime.ReadMemStats reports that less than 1% of the available CPU time is used by GC on highly loaded systems.


  • x86_64/Linux, x86_64/MacOS or ARM64/Linux, Go 1.15 or 1.14

Getting Started

Master is our unstable branch for development; it is currently working towards the v4.0 release. Please use the latest released versions for any production purposes. For Dragonboat v3.3.x, please follow the instructions in v3.3.x's

Go 1.17 or above with Go module support is required.

Use the following command to add Dragonboat v3 into your project.

go get

Or you can use the following command to start using the development version of Dragonboat, whose APIs are currently at v4.

go get

By default, Pebble is used for storing Raft Logs in Dragonboat. RocksDB and other storage engines are also supported, more info here.

You can also follow our examples on how to use Dragonboat.


FAQ, docs, step-by-step examples, DevOps doc, CHANGELOG and online chat are available.


Dragonboat examples are here.


Dragonboat is production ready.


For reporting bugs, please open an issue. For contributing improvements or new features, please send in the pull request.


  • 2022-06-03 We are working towards a v4.0 release which will come with API changes. See CHANGELOG for details.
  • 2021-01-20 Dragonboat v3.3 has been released, please check CHANGELOG for all changes.

Author: lni
Source Code: 
License: Apache-2.0 license

#go #golang #protocol #algorithm 

Elian Harber


Dragonboat-example: Examples for Dragonboat

About / 中文版

This repo contains examples for dragonboat.

The master branch and the release-3.3 branch of this repo target Dragonboat's master and v3.3.x releases.

Go 1.17 or a later release with Go module support is required.


The programs provided in this repo are examples - they are intentionally written in a straightforward way to help users understand the basics of the dragonboat library. They are not benchmark programs.


To download the example code to, say, $HOME/src/dragonboat-example:

$ cd $HOME/src
$ git clone

Build all examples:

$ cd $HOME/src/dragonboat-example
$ make


Click links below for more details.

Next Step

Author: lni
Source Code: 
License: Apache-2.0 license

#go #golang #protocol #algorithm 


Jaro Winkler: Ruby/C Implementation Of Jaro-Winkler Distance Algorithm

jaro_winkler is an implementation of the Jaro-Winkler distance algorithm, written as a C extension with a fallback to a pure Ruby version on platforms other than MRI/KRI, such as JRuby or Rubinius. Both the C and Ruby implementations support any kind of string encoding, such as UTF-8, EUC-JP, Big5, etc.


gem install jaro_winkler


require 'jaro_winkler'

# Jaro Winkler Distance

JaroWinkler.distance "MARTHA", "MARHTA"
# => 0.9611
JaroWinkler.distance "MARTHA", "marhta", ignore_case: true
# => 0.9611
JaroWinkler.distance "MARTHA", "MARHTA", weight: 0.2
# => 0.9778

# Jaro Distance

JaroWinkler.jaro_distance "MARTHA", "MARHTA"
# => 0.9444444444444445

There is no JaroWinkler.jaro_winkler_distance; the name would be tediously long.


  • ignore_case (boolean, default false): all lower case characters are converted to upper case prior to the comparison.
  • weight (number, default 0.1): a constant scaling factor for how much the score is adjusted upwards for having common prefixes.
  • threshold (number, default 0.7): the prefix bonus is only added when the compared strings have a Jaro distance above this threshold.
  • adj_table (boolean, default false): used to give partial credit for characters that may be errors due to known phonetic or character recognition errors. A typical example is matching the letter "O" with the number "0".

Adjusting Table

Default Table

['A', 'E'], ['A', 'I'], ['A', 'O'], ['A', 'U'], ['B', 'V'], ['E', 'I'], ['E', 'O'], ['E', 'U'], ['I', 'O'], ['I', 'U'],
['O', 'U'], ['I', 'Y'], ['E', 'Y'], ['C', 'G'], ['E', 'F'], ['W', 'U'], ['W', 'V'], ['X', 'K'], ['S', 'Z'], ['X', 'S'],
['Q', 'C'], ['U', 'V'], ['M', 'N'], ['L', 'I'], ['Q', 'O'], ['P', 'R'], ['I', 'J'], ['2', 'Z'], ['5', 'S'], ['8', 'B'],
['1', 'I'], ['1', 'L'], ['0', 'O'], ['0', 'Q'], ['C', 'K'], ['G', 'J'], ['E', ' '], ['Y', ' '], ['S', ' ']

How it works?

Original Formula:



  • m is the number of matching characters.
  • t is half the number of transpositions.

With Adjusting Table:



  • s is the number of nonmatching but similar characters.

Why This?

There is also another similar gem named fuzzy-string-match, which provides both C and Ruby versions as well.

I reinvented this wheel because the naming in fuzzy-string-match, such as getDistance, breaks Ruby naming conventions, and it contains some weird code like a1 = s1.split( // ) (s1.chars would be better); furthermore, it's bugged (see the tables below).

Compare with other gems

                    jaro_winkler   fuzzy-string-match   hotwater   amatch
Encoding Support    Yes            Pure Ruby only       No         No
Windows Support     Yes            ?                    No         Yes
Adjusting Table     Yes            No                   No         No
Pure Ruby           Yes            Yes                  No         No

I made a table below to compare accuracy between each gem:



$ bundle exec rake benchmark
ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-darwin16]

# C Extension
Rehearsal --------------------------------------------------------------
jaro_winkler (8c16e09)       0.240000   0.000000   0.240000 (  0.241347)
fuzzy-string-match (1.0.1)   0.400000   0.010000   0.410000 (  0.403673)
hotwater (0.1.2)             0.250000   0.000000   0.250000 (  0.254503)
amatch (0.4.0)               0.870000   0.000000   0.870000 (  0.875930)
----------------------------------------------------- total: 1.770000sec

                                 user     system      total        real
jaro_winkler (8c16e09)       0.230000   0.000000   0.230000 (  0.236921)
fuzzy-string-match (1.0.1)   0.380000   0.000000   0.380000 (  0.381942)
hotwater (0.1.2)             0.250000   0.000000   0.250000 (  0.254977)
amatch (0.4.0)               0.860000   0.000000   0.860000 (  0.861207)

# Pure Ruby
Rehearsal --------------------------------------------------------------
jaro_winkler (8c16e09)       0.440000   0.000000   0.440000 (  0.438470)
fuzzy-string-match (1.0.1)   0.860000   0.000000   0.860000 (  0.862850)
----------------------------------------------------- total: 1.300000sec

                                 user     system      total        real
jaro_winkler (8c16e09)       0.440000   0.000000   0.440000 (  0.439237)
fuzzy-string-match (1.0.1)   0.910000   0.010000   0.920000 (  0.920259)


  • Custom adjusting word table.

Author: tonytonyjan
Source code:
License: MIT license

#ruby #algorithm 


Algorithms: Ruby Algorithms and Data Structures. C Extensions


API Documentation


Started as a Google Summer of Code 2008 project

Written by Kanwei Li, mentored by Austin Ziegler

Original Proposal:

Using the right data structure or algorithm for the situation is an important aspect of programming. In computer science literature, many data structures and algorithms have been researched and extensively documented. However, there is still no standard library in Ruby implementing useful structures and algorithms like Red/Black Trees, tries, different sorting algorithms, etc. This project will create such a library with documentation on when to use a particular structure/algorithm. It will also come with a benchmark suite to compare performance in different situations.


* Heaps              Containers::Heap, Containers::MaxHeap, Containers::MinHeap
* Priority Queue     Containers::PriorityQueue
* Deque              Containers::Deque, Containers::CDeque (C ext)
* Stack              Containers::Stack
* Queue              Containers::Queue
* Red-Black Trees    Containers::RBTreeMap, Containers::CRBTreeMap (C ext)
* Splay Trees        Containers::SplayTreeMap, Containers::CSplayTreeMap (C ext)
* Tries              Containers::Trie
* Suffix Array       Containers::SuffixArray

* Search algorithms
  - Binary Search            Algorithms::Search.binary_search
  - Knuth-Morris-Pratt       Algorithms::Search.kmp_search
* Sorting algorithms           
  - Bubble sort              Algorithms::Sort.bubble_sort
  - Comb sort                Algorithms::Sort.comb_sort
  - Selection sort           Algorithms::Sort.selection_sort
  - Heapsort                 Algorithms::Sort.heapsort
  - Insertion sort           Algorithms::Sort.insertion_sort
  - Shell sort               Algorithms::Sort.shell_sort
  - Quicksort                Algorithms::Sort.quicksort
  - Mergesort                Algorithms::Sort.mergesort
  - Dual-Pivot Quicksort     Algorithms::Sort.dualpivotquicksort
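As a taste of the search routines listed above, here is a plain-Ruby sketch of what a binary search like Algorithms::Search.binary_search does. This version returns the index of the target (or nil) and is not the gem's implementation:

```ruby
# Iterative binary search over a sorted array; O(log n) comparisons.
def binary_search(sorted, target)
  lo, hi = 0, sorted.length - 1
  while lo <= hi
    mid = (lo + hi) / 2
    case sorted[mid] <=> target
    when 0  then return mid   # found: return the index
    when -1 then lo = mid + 1 # target is in the upper half
    else         hi = mid - 1 # target is in the lower half
    end
  end
  nil # not present
end

binary_search([1, 3, 5, 7, 9], 7) # => 3
```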


require 'rubygems'
require 'algorithms'

max_heap =

# To not have to type "Containers::" before each class, use:
include Containers
max_heap =


  • Ruby 1.8, Ruby 1.9, JRuby
  • C extensions (optional, but very much recommended for vast performance benefits)

Author: kanwei
Source code:
License: MIT license

#ruby #datastructures #algorithm 


ID3-based Implementation Of The ML Decision Tree Algorithm for Ruby

Decision Tree

A Ruby library which implements ID3 (information gain) algorithm for decision tree learning. Currently, continuous and discrete datasets can be learned.

  • Discrete model assumes unique labels & can be graphed and converted into a png for visual analysis
  • Continuous looks at all possible values for a variable and iteratively chooses the best threshold between all possible assignments. This results in a binary tree which is partitioned by the threshold at every step. (e.g. temperature > 20C)


  • ID3 algorithms for continuous and discrete cases, with support for inconsistent datasets.
  • Graphviz component to visualize the learned tree
  • Support for multiple, and symbolic outputs and graphing of continuous trees.
  • Returns default value when no branches are suitable for input


  • Ruleset is a class that trains an ID3Tree with 2/3 of the training data, converts it into set of rules and prunes the rules with the remaining 1/3 of the training data (in a C4.5 way).
  • Bagging is a bagging-based trainer (quite obvious), which trains 10 Ruleset trainers and when predicting chooses the best output based on voting.

Blog post with explanation & examples


require 'decisiontree'

attributes = ['Temperature']
training = [
  [36.6, 'healthy'],
  [37, 'sick'],
  [38, 'sick'],
  [36.7, 'healthy'],
  [40, 'sick'],
  [50, 'really sick'],
]

# Instantiate the tree, and train it based on the data (set default to '1')
dec_tree =, training, 'sick', :continuous)

test = [37, 'sick']
decision = dec_tree.predict(test)
puts "Predicted: #{decision} ... True decision: #{test.last}"

# => Predicted: sick ... True decision: sick

# Specify type ("discrete" or "continuous") in the training data
labels = ["hunger", "color"]
training = [
        [8, "red", "angry"],
        [6, "red", "angry"],
        [7, "red", "angry"],
        [7, "blue", "not angry"],
        [2, "red", "not angry"],
        [3, "blue", "not angry"],
        [2, "blue", "not angry"],
        [1, "red", "not angry"]
]

dec_tree =, training, "not angry", color: :discrete, hunger: :continuous)

test = [7, "red", "angry"]
decision = dec_tree.predict(test)
puts "Predicted: #{decision} ... True decision: #{test.last}"

# => Predicted: angry ... True decision: angry

Author:  igrigorik
Source code: 

#ruby #algorithm 


Rgl: A Framework for Graph Data Structures and Algorithms in Ruby

Ruby Graph Library (RGL)  

RGL is a framework for graph data structures and algorithms.

The design of the library is much influenced by the Boost Graph Library (BGL), which is written in C++. Refer to the BGL documentation for further links and documentation on graph data structures and algorithms and the design rationales of BGL.

A comprehensive summary of graph terminology can be found in the graph section of the Dictionary of Algorithms and Data Structures or on Wikipedia.

Design principles

This document concentrates on the special issues of the implementation in Ruby. The main design goals directly taken from the BGL design are:

An interface for how the structure of a graph can be accessed using a generic interface that hides the details of the graph data structure implementation. This interface is defined by the module {RGL::Graph}, which should be included in concrete classes.

A standardized generic interface for traversing graphs {RGL::GraphIterator}

RGL provides some general purpose graph classes that conform to this interface, but they are not meant to be the only graph classes. As in BGL, I believe that the main contribution of RGL is the formulation of this interface.

The BGL graph interface and graph components are generic in the sense of the C++ Standard Template Library (STL). In Ruby other techniques are available to express the generic character of the algorithms and data structures mainly using mixins and iterators. The BGL documentation mentions three means to achieve genericity:

  • Algorithm/Data-Structure Interoperability
  • Extension through Function Objects and Visitors
  • Element Type Parameterization
  • Vertex and Edge Property Multi-Parameterization

The first is easily achieved in RGL using mixins, which of course is not as efficient as C++ templates (but much more readable :-). The second one is even more easily implemented using standard iterators with blocks or using the stream module. The third one is no issue since Ruby is dynamically typed: each object can be a graph vertex. There is no need for a vertex (or even edge) type. In the current version of RGL, properties of vertices are simply attached using hashes. So far there seems to be little need for the graph property machinery.


RGL currently contains a core set of algorithm patterns:

  • Breadth First Search {RGL::BFSIterator}
  • Depth First Search {RGL::DFSIterator}

The algorithm patterns by themselves do not compute any meaningful quantities over graphs; they are merely building blocks for constructing graph algorithms. The graph algorithms in RGL currently include:

  • Topological Sort {RGL::TopsortIterator}
  • Connected Components {RGL::Graph#each_connected_component}
  • Strongly Connected Components {RGL::Graph#strongly_connected_components}
  • Transitive Closure {RGL::Graph#transitive_closure}
  • Dijkstra's Shortest Path Algorithm {RGL::DijkstraAlgorithm}
  • Bellman-Ford Algorithm {RGL::BellmanFordAlgorithm}

Data Structures

RGL currently provides two graph classes that implement a generalized adjacency list and an edge list adaptor.

  • {RGL::AdjacencyGraph}
  • {RGL::ImplicitGraph}

The AdjacencyGraph class is the general-purpose Swiss Army knife of graph classes. It is highly parameterized so that it can be optimized for different situations: directed or undirected graphs, allowing or disallowing parallel edges, efficient access to just the out-edges, fast vertex insertion and removal at the cost of extra space overhead, etc.

Differences to BGL

The concepts of IncidenceGraph, AdjacencyGraph and VertexListGraph (see the BGL concepts documentation) are here bundled in the base graph module. Most methods of IncidenceGraph should be standard in the base module Graph. The complexity guarantees cannot necessarily be provided.


% gem install rgl

or download the latest sources from the git repository

If you are going to use the drawing functionalities install Graphviz.

Running tests

Checkout RGL git repository and go to the project directory. First, install RGL dependencies with bundler:

% bundle install

After that you can run the tests:

% rake test

Example irb session with RGL

% irb -Ilib

irb> require 'rgl/adjacency'
irb> dg = RGL::DirectedAdjacencyGraph[1,2, 2,3, 2,4, 4,5, 6,4, 1,6]
# Use DOT to visualize this graph:
irb> require 'rgl/dot'
irb> dg.write_to_graphic_file('jpg')

The result:


You can control the graph layout by passing layout parameters to write_to_graphic_file. See TestDot::test_to_dot_digraph_with_options for an example using a feature implemented by Lia Skalkos (see PR #41).

irb> dg.directed?
irb> dg.vertices
[5, 6, 1, 2, 3, 4]
irb> dg.has_vertex? 4

Every object could be a vertex (there is no class Vertex), even the class object Object:

irb> dg.has_vertex? Object
irb> dg.edges.sort.to_s
irb> dg.to_undirected.edges.sort.to_s

Add inverse edge (4-2) to directed graph:

irb> dg.add_edge 4,2

(4-2) == (2-4) in the undirected graph:

irb> dg.to_undirected.edges.sort.to_s

(4-2) != (2-4) in directed graphs:

irb> dg.edges.sort.to_s
irb> dg.remove_edge 4,2

Check whether a path exists between vertices 1 and 5

irb> require 'rgl/path'
irb> dg.path?(1, 5)

Topological sort is implemented as an iterator:

require 'rgl/topsort'
irb> dg.topsort_iterator.to_a
[1, 2, 3, 6, 4, 5]

A more elaborate example showing implicit graphs:

require 'rgl/implicit'
def module_graph { |g|
    g.vertex_iterator { |b| ObjectSpace.each_object(Module, &b) }
    g.adjacent_iterator { |x, b|
      x.ancestors.each { |y| unless x == y || y == Kernel || y == Object }
    }
    g.directed = true
  }
end

This function creates a directed graph, with vertices being all loaded modules:

g = module_graph

We only want to see the ancestors of {RGL::AdjacencyGraph}:

require 'rgl/traversal'
tree = g.bfs_search_tree_from(RGL::AdjacencyGraph)

Now we want to visualize this component of g with DOT. We therefore create a subgraph of the original graph, using a filtered graph:

g = g.vertices_filtered_by {|v| tree.has_vertex? v}

creates the following graph image with DOT:

Module graph

This graph shows all loaded RGL modules:

RGL Modules

Look for more in examples directory.

I collect some links to stuff around RGL at


Many thanks to Robert Feldt, who also worked on a graph library and who pointed me to BGL and many other graph resources.

Robert kindly allowed me to integrate his work on graphr, which I have not yet succeeded in doing. In particular, his work to output graphs for GraphViz is much more elaborate than the minimal support in dot.rb.

Jeremy Siek, one of the authors of the nice book The Boost Graph Library, kindly allowed us to use the BGL documentation as a cheap reference for RGL. He and Robert also gave feedback and many ideas for RGL.

Thanks to Dave Thomas for RDoc, which generated what you read, and to matz for Ruby. Dave included in the latest version of RDoc (alpha9) the module dot/dot.rb, which I use instead of Robert's module to visualize graphs (see rgl/dot.rb).

Jeremy Bopp, John Carter, Sascha Doerdelmann, Shawn Garbett, Andreas Schörk and Kirill Lashuk for contributing additions, test cases and bugfixes.

Kirill Lashuk who started to take over further development in November 2012.

See also


RGL is Copyright (c) 2002,2004,2005,2008,2013,2015,2019,2020 by Horst Duchene. It is free software, and may be redistributed under the Ruby license and terms specified in the LICENSE file.


Author:   monora
Source code:
License: View license

#ruby #datastructures  #algorithm 

Royce Reinger


Calculates Edit Distance using Damerau-Levenshtein Algorithm


The damerau-levenshtein gem allows you to find the edit distance between two UTF-8 or ASCII encoded strings with O(N*M) efficiency.

This gem implements the pure Levenshtein algorithm and the Damerau modification of it (where a transposition of 2 adjacent characters counts as 1 edit distance). It also includes the Boehmer & Rees 2008 modification of the Damerau algorithm, where transposition of blocks larger than 1 character is taken into account as well (Rees 2014).

require "damerau-levenshtein"
DamerauLevenshtein.distance("Something", "Smoething") #returns 1

It also returns a diff between two strings according to the Levenshtein algorithm. The diff is expressed by the tags <ins>, <del>, and <subst>. Such tags make it possible to highlight the difference between strings in a flexible way.

require "damerau-levenshtein"
differ =
differ.run("corn", "cron")
# output: ["c<subst>or</subst>n", "c<subst>ro</subst>n"]


sudo apt-get install build-essential libgmp3-dev


gem install damerau-levenshtein


require "damerau-levenshtein"
dl = DamerauLevenshtein

# compare using the Damerau-Levenshtein algorithm
dl.distance("Something", "Smoething") # returns 1

# compare using the Levenshtein algorithm
dl.distance("Something", "Smoething", 0) # returns 2

# compare using the Boehmer & Rees modification
dl.distance("Something", "meSothing", 2) # returns 2 instead of 4

# comparison of words with UTF-8 characters should work fine:
dl.distance("Sjöstedt", "Sjostedt") # returns 1

# compare two arrays
dl.array_distance([1,2,3,5], [1,2,3,4]) # returns 1

# return diff between two strings
differ =
differ.run("Something", "smthg")

# return diff between two strings in raw format
differ =
differ.format = :raw
differ.run("Something", "smthg")

API Description



DamerauLevenshtein.version
# returns version number of the gem


DamerauLevenshtein.distance(string1, string2, block_size, max_distance)
#returns edit distance between 2 strings

DamerauLevenshtein.string_distance(string1, string2, block_size, max_distance)
# an alias for .distance

DamerauLevenshtein.array_distance(array1, array2, block_size, max_distance)
# returns edit distance between 2 arrays of integers

DamerauLevenshtein.distance and .array_distance take 4 arguments:

  • string1 (array1 for .array_distance)
  • string2 (array2 for .array_distance)
  • block_size (default is 1)
  • max_distance (default is 10)

block_size determines maximum number of characters in a transposition block:

block_size = 0
(transposition does not count -- it is a pure Levenshtein algorithm)

block_size = 1
(transposition between 2 adjacent characters --
it is the pure Damerau-Levenshtein algorithm)

block_size = 2
(transposition between blocks as big as 2 characters -- so abcd and cdab
counts as edit distance 2, not 4)

block_size = 3
(transposition between blocks as big as 3 characters --
so abcdef and defabc counts as edit distance 3, not 6)


max_distance is a threshold after which the algorithm gives up and returns max_distance instead of the real edit distance.

The Levenshtein algorithm is expensive, so it makes sense to give up when the edit distance becomes too big. The argument max_distance does just that.

DamerauLevenshtein.distance("abcdefg", "1234567", 0, 3)
# output: 4 -- it gave up when edit distance exceeded 3


differ = creates an instance of a new Differ class to return the difference between two strings.

differ.format shows the current format for diffs. The default is the :tag format.

differ.format = :raw changes the current format for diffs. Possible values are :tag and :raw.

differ.run("String1", "String2") returns the difference between two strings.

For example:

differ =
differ.run("Something", "smthng")
# output: ["<ins>S</ins><subst>o</subst>m<ins>e</ins>th<ins>i</ins>ng",
#          "<del>S</del><subst>s</subst>m<del>e</del>th<del>i</del>ng"]

Or with parsing:

require "damerau-levenshtein"
require "nokogiri"

differ =
res = differ.run("Something", "Smothing!")
nodes = Nokogiri::XML("<root>#{res.first}</root>")

markup = do |n|
  case
  when "del" then "[-#{n.text}]"
  when "ins" then "[+#{n.text}]"
  when "subst" then "[#{n.text}]"
  else n.text
  end
end

puts markup



Contributing to damerau-levenshtein

  • Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet
  • Check out the issue tracker to make sure someone hasn't already requested it and/or contributed it
  • Fork the project
  • Start a feature/bugfix branch
  • Commit and push until you are happy with your contribution
  • Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
  • Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or it is otherwise necessary, that is fine, but please isolate it to its own commit so I can cherry-pick around it.


This gem follows the practices of Semantic Versioning.

Download Details: 

Author: GlobalNamesArchitecture
Source Code: 
License: MIT license

#ruby #algorithm 

Tamale Moses


Quad Mesh Generation on Given Triangular Mesh input with Algorithm


Overview

Quad mesh generation on a given triangular mesh input with the QuadCover algorithm.

Three.js makes it possible to modify the frame field topology on the input triangular mesh by hand, and to see in real time the impact on the output quadrilateral mesh.


To run the project:

Compile cpp (needs a wasm installation):

cd cpp
make compile

Run the node js server :

cd Back
npm install # if running for the first time
node server.js

Compile and run the page

cd Front
npm install # if running for the first time
npm run build

then go to localhost:8080; the page should be up



Author: david540
Source code:

#cplusplus #algorithm 

Royce Reinger


Naturally: Natural Sort Algorithm


Natural ("version number") sorting with support for legal document numbering, college course codes, and Unicode. See Jeff Atwood's Sorting for Humans: Natural Sort Order and the Public.Law post Counting to 10 in Californian.


$ gem install naturally


require 'naturally'

# Sort a simple array of strings with legal numbering
Naturally.sort(["336", "335a", "335", "335.1"])  # => ["335", "335.1", "335a", "336"]

# Sort version numbers
Naturally.sort(["13.10", "13.04", "10.10", "10.04.4"])  # => ["10.04.4", "10.10", "13.04", "13.10"]

Usually the library is used to sort an array of objects:

# Define a new simple object for storing Ubuntu versions
UbuntuVersion = Struct.new(:name, :version)

# Create an array
releases = [
  UbuntuVersion.new('Saucy Salamander', '13.10'),
  UbuntuVersion.new('Raring Ringtail',  '13.04'),
  UbuntuVersion.new('Precise Pangolin', '12.04.4'),
  UbuntuVersion.new('Maverick Meerkat', '10.10'),
  UbuntuVersion.new('Quantal Quetzal',  '12.10'),
  UbuntuVersion.new('Lucid Lynx',       '10.04.4')
]

# Sort by version number
sorted = Naturally.sort(releases, by: :version)

# Check what we have
expect(sorted.map(&:name)).to eq [
  'Lucid Lynx',
  'Maverick Meerkat',
  'Precise Pangolin',
  'Quantal Quetzal',
  'Raring Ringtail',
  'Saucy Salamander'
]

More examples are in the specs.

Implementation Notes

The algorithm capitalizes on Ruby's array comparison behavior: since each dotted number actually represents a hierarchical identifier, array comparison is a natural fit:

Arrays are compared in an “element-wise” manner; the first element of ary is compared with the first one of other_ary using the <=> operator, then each of the second elements, etc… As soon as the result of any such comparison is non zero (i.e. the two corresponding elements are not equal), that result is returned for the whole array comparison.

And so, when given input such as,

['1.9', '1.9a', '1.10']

...this module sorts the segmented numbers by comparing them in their array forms:

[['1', '9'], ['1', '9a'], ['1', '10']]

Finally, upon the actual sort comparison, each of these strings is converted to an array of typed objects. This determines the sort order between heterogeneous (yet ordered) segments such as '9a' and '9'.

The final nested comparison structure looks like this:

     [[1], [9]]
     [[1], [9, 'a']]
     [[1], [10]]
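The same comparison scheme can be sketched outside Ruby. This illustrative Python key function (hypothetical names, not the gem's API) splits on dots and tags each run as numeric or alphabetic, so that plain tuple comparison reproduces the ordering described above:

```python
import re

def natural_key(s):
    # Split on dots, then break each segment into digit runs and letter runs.
    # Numbers are tagged to compare numerically, letters lexically, and a
    # bare number sorts before the same number with a suffix ('9' < '9a').
    key = []
    for segment in s.split('.'):
        parts = re.findall(r'\d+|\D+', segment)
        key.append(tuple((0, int(p)) if p.isdigit() else (1, p) for p in parts))
    return key

sorted(['1.9', '1.9a', '1.10'], key=natural_key)  # ['1.9', '1.9a', '1.10']
```

A naive string sort would place '1.10' before '1.9'; the typed-tuple key compares 9 and 10 as numbers.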

Related Work


  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

Author: Dogweather
Source Code: 
License: MIT license

#ruby #sort #algorithm 

Naturally: Natural Sort Algorithm
Monty  Boehm

Monty Boehm


SoftConfidenceWeighted.jl: Exact Soft Confidence-Weighted Learning


This is an online supervised learning algorithm with four salient properties:

  • Large margin training
  • Confidence weighting
  • Capability to handle non-separable data
  • Adaptive margin

The paper is here.


SCW has two formulations of its algorithm, SCW-I and SCW-II.
You can choose which to use via the type_ parameter of init.


  1. This package performs only binary classification, not multiclass classification.
  2. Training labels must be 1 or -1. No other labels allowed.

Training from matrix

Feature vectors are given as the columns of the matrix X.

using SoftConfidenceWeighted

# C and ETA are hyperparameters.
# X is a data matrix in which each column represents a data vector.
# y is the vector of corresponding labels.

model = init(C = 1, ETA = 1, type_ = SCW1)
model = fit!(model, X_train, y_train)
y_pred = predict(model, X_test)

Author: IshitaTakeshi
Source Code: 
License: MIT license

#julia #algorithm #learning 

SoftConfidenceWeighted.jl: Exact Soft Confidence-Weighted Learning
Vaughn  Sauer

Vaughn Sauer


MatLab/Octave Examples Of Popular Machine Learning Algorithms

Machine Learning in MatLab/Octave

For the Python/Jupyter version of this repository, please check the homemade-machine-learning project.

This repository contains MatLab/Octave examples of popular machine learning algorithms, with explanations of the mathematics behind each of them.

The purpose of this repository was not to implement machine learning algorithms using 3rd party libraries or Octave/MatLab "one-liners" but rather to practice and to better understand the mathematics behind each algorithm. In most cases the explanations are based on this great machine learning course.

Supervised Learning

In supervised learning we have a set of training data as an input and a set of labels or "correct answers" for each training example as an output. We then train our model (the machine learning algorithm's parameters) to map the input to the output correctly (to make correct predictions). The ultimate goal is to find model parameters that keep producing the correct input→output mapping (predictions) even for new input examples.


In regression problems we make real-valued predictions. Basically, we try to draw a line/plane/hyperplane through the training examples.

Usage examples: stock price forecast, sales analysis, dependency of any number, etc.

🤖 Linear Regression - example: house prices prediction.
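As a language-agnostic sketch of the idea (illustrative Python, not code from this repository), one-variable linear regression can be fit by batch gradient descent:

```python
def linear_regression(xs, ys, lr=0.05, epochs=2000):
    # Fit y = w*x + b by batch gradient descent on mean squared error.
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in zip(xs, ys)) * 2 / n
        grad_b = sum(w * x + b - y for x, y in zip(xs, ys)) * 2 / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Noise-free points on y = 2x + 1; gradient descent recovers the line.
w, b = linear_regression([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])  # w ≈ 2.0, b ≈ 1.0
```

The repository's Octave version works the same way, just vectorized over a design matrix instead of looping.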


In classification problems we split input examples by a certain characteristic.

Usage examples: spam-filters, language detection, finding similar documents, handwritten letters recognition, etc.

🤖 Logistic Regression - examples: microchip fitness detection, handwritten digit recognition using the one-vs-all approach.

Unsupervised Learning

Unsupervised learning is a branch of machine learning that learns from test data that has not been labeled, classified or categorized. Instead of responding to feedback, unsupervised learning identifies commonalities in the data and reacts based on the presence or absence of such commonalities in each new piece of data.


In clustering problems we split the training examples by unknown characteristics. The algorithm itself decides which characteristics to use for splitting.

Usage examples: market segmentation, social networks analysis, organize computing clusters, astronomical data analysis, image compression, etc.

🤖 K-means algorithm - example: split data into three clusters.
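For illustration, Lloyd's k-means algorithm (assign each point to the nearest centroid, then move each centroid to the mean of its points) can be sketched in plain Python; this is our sketch, not the repository's Octave code:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    # Lloyd's algorithm: repeatedly assign each point to its nearest
    # centroid, then move each centroid to the mean of its cluster.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        centroids = [tuple(sum(xs) / len(cluster) for xs in zip(*cluster))
                     if cluster else centroids[i]
                     for i, cluster in enumerate(clusters)]
    return centroids, clusters

# Two well-separated blobs in the plane should yield two 3-point clusters.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, 2)
```

In practice k-means is sensitive to initialization, so implementations typically restart several times and keep the lowest-cost result.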

Anomaly Detection

Anomaly detection (also outlier detection) is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data.

Usage examples: intrusion detection, fraud detection, system health monitoring, removing anomalous data from the dataset etc.

🤖 Anomaly Detection using Gaussian distribution - example: detect overloaded server.
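The Gaussian approach boils down to fitting a normal distribution to measurements taken under normal operation and flagging points whose density falls below a threshold epsilon. A minimal Python sketch, with made-up data and threshold for illustration only:

```python
import math

def fit_gaussian(xs):
    # Estimate mean and variance from measurements taken under normal operation.
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical server response times (ms) under normal load.
normal_times = [98, 102, 101, 99, 100, 103, 97, 100]
mu, var = fit_gaussian(normal_times)
epsilon = 1e-3  # threshold chosen for illustration

def is_anomaly(x):
    return gaussian_pdf(x, mu, var) < epsilon
```

A response time near the mean passes; one far in the tail (say 250 ms here) is flagged. With multiple features, the density is the product of per-feature Gaussians.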

Neural Network (NN)

The neural network itself isn't an algorithm, but rather a framework for many different machine learning algorithms to work together and process complex data inputs.

Usage examples: as a substitute of all other algorithms in general, image recognition, voice recognition, image processing (applying specific style), language translation, etc.

🤖 Neural Network: Multilayer Perceptron (MLP) - example: handwritten digits recognition.

Machine Learning Map

Machine Learning Map

The source of the following machine learning topics map is this wonderful blog post

How to Use This Repository

Install Octave or MatLab

This repository contains *.m scripts that are intended to be run in Octave or MatLab. Thus, in order to launch the demos you need either Octave or MatLab installed on your local machine. In the case of MatLab you may also use its web version.

Run Demos

In order to run the demo of your choice you should move to the chosen folder (e.g. neural-network):

cd neural-network

Launch the Octave console:

octave

Launch demo script from Octave console:


To see all demo variables you may launch:

whos

To exit the demo you may launch:

exit


Also be aware that the demo scripts open an additional window with charts and other graphical information related to the running algorithm. You may find screenshots of the windows that each demo renders in the dedicated README file for each machine learning algorithm.


Author: trekhleb
Source code:
License: MIT license

#machine-learning #matlab #algorithm 

MatLab/Octave Examples Of Popular Machine Learning Algorithms
Sofia  Maggio

Sofia Maggio


MLBase.jl: Functions to Support The Development Of ML Algorithms


A Swiss Army knife for machine learning.  

This package does not implement specific machine learning algorithms. Instead, it provides a collection of useful tools to support machine learning programs, including:

  • Data manipulation & preprocessing
  • Score-based classification
  • Performance evaluation (e.g. evaluating ROC)
  • Cross validation
  • Model tuning (i.e. searching for the best parameter settings)

Note: this package depends on StatsBase and re-exports all names from it.
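As an illustration of what such cross-validation tooling provides (a hypothetical Python analogue, not MLBase.jl's actual API), a k-fold splitter yields train/test index pairs:

```python
def kfold_indices(n, k):
    # Partition indices 0..n-1 into k folds of near-equal size and yield
    # (train_indices, test_indices) pairs, one per fold.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

folds = list(kfold_indices(10, 3))  # fold sizes 4, 3, 3
```

Each sample appears in exactly one test fold, so averaging a score over the folds gives an unbiased-by-construction performance estimate.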


Author: JuliaStats
Source code:
License: MIT license

#machine-learning #algorithm #julia 

MLBase.jl: Functions to Support The Development Of ML Algorithms
Monty  Boehm

Monty Boehm


Ladder.jl: A Reliable Leaderboard Algorithm for Machine Learning


A reliable leaderboard for machine learning competitions


Open a Julia prompt and call: Pkg.clone("")


See this blog post for a discussion on the problem of overfitting to the public leaderboard in a data science competition.

This is the code repository for this paper. Here's a bibtex reference:

@article{DBLP:journals/corr/BlumH15,
  author    = {Avrim Blum and Moritz Hardt},
  title     = {The Ladder: {A} Reliable Leaderboard for Machine Learning Competitions},
  journal   = {CoRR},
  volume    = {abs/1502.04585},
  year      = {2015},
  url       = {},
  timestamp = {Mon, 02 Mar 2015 14:17:34 +0100},
  biburl    = {},
  bibsource = {dblp computer science bibliography,}
}

If you use the code, we encourage you to cite our paper.


The basic usage is as follows:

using Ladder
# these are the labels corresponding to your holdout data set
holdoutlabels = [1.0, 0.0, 1.0, 1.0, 1.0, 0.0]
# create ladder instance around holdout labels
l = ladder(holdoutlabels)
# create submission
submission1 = Submission("sub1","teamA",[0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
score!(l,submission1,Ladder.loss01) # returns: 0.6666666666666666
# create another submission
submission2 = Submission("sub2","teamA",[1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
score!(l,submission2,Ladder.loss01) # returns: 0.6666666666666666
# Ladder judged that there was no significant improvement
# create another submission
submission3 = Submission("sub3","teamA",[1.0, 0.0, 1.0, 0.0, 0.0, 0.0])
score!(l,submission3,Ladder.loss01) # 0.3333333333333333

See examples/photo.jl for a comprehensive example on Kaggle's Photo Quality Prediction challenge. The data set is not yet available, but will most likely be released by Kaggle in the near future.

Other usage

You can also use the Ladder mechanism to keep track of your own progress in a data science project and avoid overfitting to your holdout set. This can be useful in situations where you repeatedly evaluate candidate models against a holdout set.
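The mechanism is simple to emulate for such personal use. Below is a simplified, illustrative Python sketch of the Ladder idea (release a new score only when it beats the best-so-far by at least a step size, rounded to that step); this paraphrases the paper rather than Ladder.jl's API:

```python
def make_ladder(step=0.01):
    # Ladder mechanism (simplified): only release a new score when it beats
    # the best-so-far by at least `step`; otherwise repeat the previous best.
    # Rounding to the step size further limits information leakage.
    best = float('inf')
    def score(loss):
        nonlocal best
        if loss < best - step:
            best = round(loss / step) * step
        return best
    return score

score = make_ladder()
score(0.6667)  # first submission establishes the best score
score(0.6660)  # no significant improvement: previous best repeated
score(0.3333)  # significant improvement: new (rounded) score released
```

Because tiny fluctuations never change the reported score, repeatedly probing the holdout set leaks far less information than a raw leaderboard.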

Author: mrtzh
Source Code: 
License: View license

#julia #algorithm #machinelearning 

Ladder.jl: A Reliable Leaderboard Algorithm for Machine Learning