Thomas Granger

JavaScript Data Structures | Implement a Linked List in JavaScript

Understanding linked lists can be a difficult task when you are a beginner JavaScript developer, since JavaScript does not provide built-in linked list support. We need to implement this data structure from scratch and, if you are unfamiliar with how it works, the implementation part becomes that much more difficult.

In this article, we will discuss how a linked list is stored in memory, and we will implement a linked list from scratch with operations like the addition and deletion of an element, lookups, and reversing a linked list. Before moving on to the implementation, one needs to understand the advantages of using a linked list when we already have data structures like arrays and objects.

We know that elements inside an array are stored in memory with numbered indexes, in sequential order:

Arrays in memory

While using arrays, operations like adding/deleting elements at the start or at a specific index can be slow, since we have to shift the indexes of all the other elements. This slowness is caused by the numbered-indexes feature of arrays.

The above problem can be solved with the use of objects. In objects, the elements are stored at random positions, so there is no need to shift the indexes of elements while performing operations like adding/deleting elements at the start or at a specific index:

Objects in memory

Although operations like addition and deletion are fast in objects, we observe from the above image that when it comes to iterating, objects are not the best choice, since the elements of an object are stored at random positions. Therefore, iterating operations can take a long time. This is where linked lists come in.

So what is a linked list?

From the name itself, we can figure out that it's a list that is linked in some way. So how is it linked and what does the list contain? A linked list consists of nodes that have two properties: the data and a pointer. The pointer inside a node points to the next node in the list. The first node of a linked list is called the head. To understand better, let's take a look at the following image that describes a linked list:

Linked List Illustration

We observe from the above image that each node has two properties, data and a pointer. The pointer points to the next node in the list, and the pointer of the last node points to null. The above image represents a singly linked list.

We can see there is a big difference when comparing linked lists with objects. In linked lists, each node is connected to the next node via a pointer, so we have a connection between each node of the linked list, whereas in objects, the key-value pairs are stored randomly and have no connection with each other.

Let's implement a linked list that stores integers as data. Since JavaScript does not provide built-in linked list support, we will be using objects and classes to implement one. Let's get started:

class Node{
  constructor(value){
    this.value = value;
    this.next = null;
  }
}

class LinkedList{
  constructor(){
    this.head = null;
    this.tail = this.head;
    this.length = 0;
  }

  append(value){

  }

  prepend(value){

  }

  insert(value,index){

  }

  lookup(index){
  
  }

  remove(index){

  }

  reverse(){
    
  }
}

In the above code, we have created two classes, one for the linked list itself and the other for creating nodes. As we discussed, each node has two properties, a value and a pointer ("next" in this case). The LinkedList class contains three properties: the head (which is null initially), the tail (which also starts as null) that is used to store the last node of the linked list, and the length property that holds the length of the linked list. It also contains functions that are empty for now. We will fill in these functions one by one.

append (Adding values sequentially)

This function adds a node to the end of the linked list. For implementing this function, we need to understand the operation that it’s going to perform:

Illustration of append function

From the above image, we can implement the append function in the following way:

  append(value){
    const newNode = new Node(value);
    if(!this.head){
      this.head = newNode;
      this.tail = newNode;
    }
    else{
      this.tail.next = newNode;
      this.tail = newNode;
    }
    this.length++;
  }

Let's decode the function.

If you are new to JavaScript, understanding the above function can be daunting, so let's break down what happens when we call append:

const linkedList1 = new LinkedList();
linkedList1.append(2);

We check whether the head points to null. It does, so we create a new node and assign it to both the head and the tail:

const newNode = new Node(2);
this.head = newNode;
this.tail = newNode;

Now, both head and tail point to the same object, and this is a very important point to remember.
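To see why this matters, here is a small standalone snippet (separate from our class) showing that two variables holding the same object observe each other's changes:

const a = { value: 2, next: null };
const b = a; // b references the same object as a, not a copy

b.next = { value: 3, next: null };
console.log(a.next.value); // 3, the change made through b is visible through a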

Next, let’s append two more values to the linked list:

linkedList1.append(3);
linkedList1.append(4);

Now the head does not point to null, so we go into the else branch of the append function:

this.tail.next = newNode;

Since both head and tail point to the same object, any change to tail results in a change to the head object as well. That is how objects work in JavaScript: objects are passed by reference, so both head and tail point to the same address space where the object is stored. The above line of code is therefore equivalent to:

this.head.next = newNode;

Next,

this.tail = newNode;

Now, after executing the above line of code, this.head.next and this.tail point to the same object, so whenever we append new nodes, the head object automatically gets updated.

After performing three appends, try console.logging the linkedList1 object; this is how it should look:

head: {value: 2, next: {value: 3, next: {value: 4, next: null}}}
tail: {value: 4, next: null}
length: 3

We observe from all the above code that the append function of a linked list has a complexity of O(1), since we neither have to shift indexes nor iterate through the linked list.

Let's move on to the next function.

prepend (Adding values to the start of the linked list)

For implementing this function, we create a new node using the Node class and point the next property of this new node to the head of the linked list. Then, we assign the new node to the head of the linked list:

prepend(value){
  const node = new Node(value);

  node.next = this.head;
  this.head = node;
  this.length++;
}

Just like the append function, this function has a complexity of O(1) as well.
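As a quick check, continuing the linkedList1 example from the append section, prepending a value makes it the new head:

linkedList1.prepend(1);
// head: {value: 1, next: {value: 2, next: {value: 3, next: {value: 4, next: null}}}}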

insert (Adding values at a specific index)

Before implementing this function in code, it's important to visualise what it does. For understanding purposes, let's create a linked list with a few values and then visualise the insert function. The insert function takes in two parameters, value and index.

let linkedList2 = new LinkedList();
linkedList2.append(23);
linkedList2.append(89);
linkedList2.append(12);
linkedList2.append(3);
linkedList2.insert(45,2);

Step 1:

Iterate through the linked list till we reach the index-1 position (1st index in this case):

Step 1 illustration of insert operation

Step 2:

Assign the next pointer of the node at the index-1 position (89 in this case) to the new node (45 in this case):

Step 2 illustration of insert operation

Step 3: 

Assign the next pointer of the new node (45) to the next node (12):

Step 3 illustration of insert operation

This is how the insert operation is performed. Using the above visualisation, we observe that we need to find the nodes at the index-1 and index positions so that we can insert the new node between them. Let's implement this in code:

 insert(value,index){
    if(index >= this.length){
      this.append(value);
      return; // without this return, the node would also be inserted below
    }
    if(index === 0){
      this.prepend(value); // there is no node before index 0 for getPrevNextNodes to find
      return;
    }

    const node = new Node(value);

    const {prevNode,nextNode} = this.getPrevNextNodes(index);
    prevNode.next = node;
    node.next = nextNode;

    this.length++;
  }

Let's decode the function. If the value of index is greater than or equal to the length property, we hand the operation over to the append function and return (inserting at index 0 is likewise delegated to prepend). Otherwise, we create a new node using the Node class. Next, we observe a new function, getPrevNextNodes(), through which we receive the values of prevNode and nextNode. The getPrevNextNodes function is implemented like this:

 getPrevNextNodes(index){
    let count = 0;
    let prevNode = this.head;
    let nextNode = prevNode.next;

    while(count < index - 1){
      prevNode = prevNode.next;
      nextNode = prevNode.next;
      count++;
    }

    return {
      prevNode,
      nextNode
    }
  }

The above function returns the nodes at the index-1 and index positions by iterating through the linked list. After receiving these nodes, we point the next property of prevNode to the new node, and the new node's next property to nextNode.

The insert operation for a linked list is of complexity O(n), since we have to iterate through the linked list and search for the nodes at the index-1 and index positions. Although the complexity is O(n), this insert operation is much faster than the insert operation on arrays: in arrays we would have to shift the indexes of all the elements after a particular index, but in the case of a linked list, we only manipulate the next properties of the nodes at the index-1 and index positions.
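Continuing the linkedList2 example from the start of this section, logging the list after linkedList2.insert(45,2) should show the new node sitting between 89 and 12:

console.log(linkedList2.head);
// {value: 23, next: {value: 89, next: {value: 45, next: {value: 12, next: {value: 3, next: null}}}}}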

remove (Removing the element at a specific index)

Now that we have covered the insertion operation, the remove operation might feel easier, since it's almost the same as the insertion operation with one small difference: after we get the prevNode and nextNode values from the getPrevNextNodes function, we perform the following operation in the remove function:

prevNode.next = nextNode.next;

By executing the above line of code, the next property of the node at the index-1 position now points to the node at the index+1 position. This way, the node at the index position is removed.

The complete function:

remove(index){
    if(index === 0){
      this.head = this.head.next; // removing the head needs no prevNode
      this.length--;
      return;
    }
    const {prevNode, nextNode} = this.getPrevNextNodes(index);
    prevNode.next = nextNode.next;
    this.length--;
}

The remove operation is also of complexity O(n) but, again, like the insertion operation, the remove operation in linked lists is faster than the remove operation in arrays.
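Continuing with linkedList2 (23 -> 89 -> 45 -> 12 -> 3 after the insert above), removing the node at index 2 takes 45 back out:

linkedList2.remove(2);
console.log(linkedList2.head);
// {value: 23, next: {value: 89, next: {value: 12, next: {value: 3, next: null}}}}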

reverse (Reversing the linked list)

Although it might seem simple, reversing a linked list can often be the most confusing operation to implement and hence, this operation gets asked a lot in coding interviews. Before implementing the function, let’s visualise the strategy that we are going to use to reverse a linked list.

For reversing a linked list, we need to keep track of three nodes, previousNode, currentNode and nextNode.

Consider the following linked list:

let linkedList3 = new LinkedList();
linkedList3.append(67);
linkedList3.append(32);
linkedList3.append(44);

Step 1:

Initially, previousNode has the value null and currentNode has the value of head:

Step 1 illustration of reverse operation

Step 2:

Next, we assign currentNode.next to the nextNode variable:

Step 2 illustration of reverse operation

Step 3:

Next, we point the currentNode.next property to the previousNode:

Step 3 illustration of reverse operation

Step 4:

Now, we shift the previousNode to currentNode and currentNode to nextNode:

Step 4 illustration of reverse operation

This process restarts from step 2 and continues till currentNode equals null.

To implement this in code:

reverse(){
    let previousNode = null;
    let currentNode = this.head;

    this.tail = this.head; // the old head becomes the tail of the reversed list

    while(currentNode !== null){
      let nextNode = currentNode.next;
      currentNode.next = previousNode;
      previousNode = currentNode;
      currentNode = nextNode;
    }

    this.head = previousNode;
}

As we visualised, we keep iterating and shifting the pointers until currentNode equals null. At the end, we assign the previousNode value to the head.

The reverse operation has a complexity of O(n).
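Running reverse on the linkedList3 example from this section (67 -> 32 -> 44) should give:

linkedList3.reverse();
console.log(linkedList3.head);
// {value: 44, next: {value: 32, next: {value: 67, next: null}}}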

lookup (Looking up a value at a specific index)

This operation is simple: we just iterate through the linked list and return the node at the given index. This operation has a complexity of O(n) as well.

lookup(index){
    if(index >= this.length){
      return null; // guard against walking past the end of the list
    }
    let counter = 0;
    let currentNode = this.head;
    while(counter < index){
      currentNode = currentNode.next;
      counter++;
    }
    return currentNode;
  }
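For example, on the reversed linkedList3 from the previous section (44 -> 32 -> 67):

console.log(linkedList3.lookup(1).value); // 32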

There you go, we have finished implementing the basic operations of a singly linked list in JavaScript. The difference between a singly and a doubly linked list is that the nodes of a doubly linked list have pointers to both the previous node and the next node.

From the above operations, let's draw some conclusions about linked lists.

Linked lists provide us with fast append (adding an element at the end) and prepend (adding an element at the start) operations. Although the insertion operation in linked lists is of complexity O(n), it is much faster than the insertion operation of arrays. The other problem that we face while using arrays is resizing: when a dynamic array runs out of capacity, adding an element forces the complete array to be copied to a different address space before the element can be added, whereas in linked lists we don't face such problems.

The problem we face while using objects is the random placement of elements in memory, whereas in linked lists the nodes are connected to each other with pointers, which gives us some order.

So finally, we have finished understanding and evaluating a commonly used data structure called a linked list.

#datastructure #javascript

Dixie Wolff

How to Implement a Queue in JavaScript

Queue is a linear collection of items where items are inserted and removed in a particular order. The queue is also called a FIFO data structure because it follows the "First In First Out" principle, i.e., the item that is inserted first is the one that is taken out first. In this video, we look at what the queue is, how it is implemented, what different operations you can perform on a queue, and the implementation of a queue in JavaScript. After watching this video, you will be able to answer the following questions:

- What is the Queue Data Structure?
- What is the FIFO principle?
- What are the different operations you can perform on a Queue?
- How do you implement a Queue in JavaScript? (see the sketch after this list)

#queue #datastructure #javascript 

Dixie Wolff

Learn What the Stack Data Structure Is and How It Is Implemented

Stack is a linear collection of items where items are inserted and removed in a particular order. The stack is also called a LIFO data structure because it follows the "Last In First Out" principle, i.e., the item that is inserted last is the one that is taken out first. In this video, we look at what the stack is, how it is implemented, what different operations you can perform on a stack, and some of the real-world usages of stacks. After watching this video, you will be able to answer the following questions:

- What is the Stack Data Structure?
- What is the LIFO principle?
- What are the different operations you can perform on a Stack?
- What are some usage examples of Stack?
- How do you implement a stack in JavaScript? (see the sketch after this list)
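The video's own code isn't reproduced here, but as a rough companion, here is a minimal array-backed stack sketch in JavaScript (the class and method names are just for illustration):

class Stack {
  constructor() {
    this.items = [];
  }
  push(item) {
    // add to the top of the stack
    this.items.push(item);
  }
  pop() {
    // remove from the top of the stack (Last In First Out)
    return this.items.pop();
  }
  peek() {
    return this.items[this.items.length - 1];
  }
  isEmpty() {
    return this.items.length === 0;
  }
}

const stack = new Stack();
stack.push(1);
stack.push(2);
console.log(stack.pop()); // 2, the last item inserted is the first one out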

#stack #datastructure 


SBE: High Performance Message Codec Written in Java

Simple Binary Encoding (SBE) 

SBE is an OSI layer 6 presentation for encoding and decoding binary application messages for low-latency financial applications. This repository contains the reference implementations in Java, C++, Golang, C#, and Rust.

More details on the design and usage of SBE can be found on the Wiki.

An XSD for SBE specs can be found here. Please address questions about the specification to the SBE FIX community.

For the latest version information and changes see the Change Log with downloads at Maven Central.

The Java and C++ SBE implementations work very efficiently with the Aeron messaging system for low-latency and high-throughput communications. The Java SBE implementation has a dependency on Agrona for its buffer implementations. Commercial support is available from sales@real-logic.co.uk.

Binaries

Binaries and dependency information for Maven, Ivy, Gradle, and others can be found at http://search.maven.org.

Example for Maven:

<dependency>
    <groupId>uk.co.real-logic</groupId>
    <artifactId>sbe-all</artifactId>
    <version>${sbe.tool.version}</version>
</dependency>

Build

Build the project with Gradle using this build.gradle file.

Full clean build:

$ ./gradlew

Run the Java examples

$ ./gradlew runJavaExamples

Distribution

Jars for the executable, source, and javadoc for the various modules can be found in the following directories:

sbe-benchmarks/build/libs
sbe-samples/build/libs
sbe-tool/build/libs
sbe-all/build/libs

An example to execute a Jar from command line using the 'all' jar which includes the Agrona dependency:

java -Dsbe.generate.ir=true -Dsbe.target.language=Cpp -Dsbe.target.namespace=sbe -Dsbe.output.dir=include/gen -Dsbe.errorLog=yes -jar sbe-all/build/libs/sbe-all-${SBE_TOOL_VERSION}.jar my-sbe-messages.xml

C++ Build using CMake

NOTE: Linux, Mac OS, and Windows only for the moment. See FAQ. Windows builds have been tested with Visual Studio Express 12.

For convenience, the cppbuild script does a full clean, build, and test of all targets as a Release build.

$ ./cppbuild/cppbuild

If you are comfortable using CMake, then a full clean, build, and test looks like:

$ mkdir -p cppbuild/Debug
$ cd cppbuild/Debug
$ cmake ../..
$ cmake --build . --clean-first
$ ctest

Note: The C++ build includes the C generator. Currently, the C generator is a work in progress.

Golang Build

First build using Gradle to generate the SBE jar and then use it to generate the golang code for testing.

$ ./gradlew
$ ./gradlew generateGolangCodecs

For convenience on Linux, a GNU Makefile is provided that runs some tests and contains some examples.

$ cd gocode
$ make   # test, examples, bench

Users of golang generated code should see the user documentation.

Developers wishing to enhance the golang generator should see the developer documentation.

C# Build

Users of CSharp generated code should see the user documentation.

Developers wishing to enhance the CSharp generator should see the developer documentation.

Rust Build

The SBE Rust generator will produce 100% safe Rust crates (no unsafe code will be generated). Generated crates do not have any dependencies on any libraries (including no SBE libraries). If you don't yet have Rust installed, see Rust: Getting Started.

Generate the Rust codecs

$ ./gradlew generateRustCodecs

Run the Rust test from Gradle

$ ./gradlew runRustTests

Or run test directly with Cargo

$ cd rust
$ cargo test

Download Details:
Author: real-logic
Source Code: https://github.com/real-logic/simple-binary-encoding
License: Apache-2.0 license

#datastructure  #java 


Roaring Bitmap: A Better Compressed Bitset in Java

RoaringBitmap

Bitsets, also called bitmaps, are commonly used as fast data structures. Unfortunately, they can use too much memory. To compensate, we often use compressed bitmaps.

Roaring bitmaps are compressed bitmaps which tend to outperform conventional compressed bitmaps such as WAH, EWAH or Concise. In some instances, roaring bitmaps can be hundreds of times faster and they often offer significantly better compression. They can even be faster than uncompressed bitmaps.

Roaring bitmaps are found to work well in many important applications:

Use Roaring for bitmap compression whenever possible. Do not use other bitmap compression methods (Wang et al., SIGMOD 2017)

kudos for making something that makes my software run 5x faster (Charles Parker from BigML)

The YouTube SQL Engine, Google Procella, uses Roaring bitmaps for indexing. Apache Lucene uses Roaring bitmaps, though they have their own independent implementation. Derivatives of Lucene such as Solr and Elastic also use Roaring bitmaps. Other platforms such as Whoosh, Microsoft Visual Studio Team Services (VSTS) and Pilosa also use Roaring bitmaps with their own implementations. You find Roaring bitmaps in InfluxDB, Bleve, Cloud Torrent, and so forth.

There is a serialized format specification for interoperability between implementations. We have interoperable C/C++, Java and Go implementations.

(c) 2013-... the RoaringBitmap authors

This code is licensed under Apache License, Version 2.0 (AL2.0).

When should you use a bitmap?

Sets are a fundamental abstraction in software. They can be implemented in various ways, as hash sets, as trees, and so forth. In databases and search engines, sets are often an integral part of indexes. For example, we may need to maintain a set of all documents or rows (represented by a numerical identifier) that satisfy some property. Besides adding or removing elements from the set, we need fast functions to compute the intersection, the union, the difference between sets, and so on.

To implement a set of integers, a particularly appealing strategy is the bitmap (also called bitset or bit vector). Using n bits, we can represent any set made of the integers from the range [0,n): the ith bit is set to one if integer i is present in the set. Commodity processors use words of W=32 or W=64 bits. By combining many such words, we can support large values of n. Intersections, unions and differences can then be implemented as bitwise AND, OR and ANDNOT operations. More complicated set functions can also be implemented as bitwise operations.
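The idea is language-agnostic; as a rough illustration (a JavaScript sketch over a single 32-bit word, not RoaringBitmap code), the set operations become single bitwise instructions:

// One 32-bit word can represent any set of integers from [0, 32):
// bit i is set to one if integer i is present in the set.
const a = (1 << 1) | (1 << 3) | (1 << 5); // the set {1, 3, 5}
const b = (1 << 3) | (1 << 4);            // the set {3, 4}

const intersection = a & b;  // {3}
const union        = a | b;  // {1, 3, 4, 5}
const difference   = a & ~b; // {1, 5}, i.e., ANDNOT

console.log((intersection & (1 << 3)) !== 0); // true: 3 is in the intersection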

When the bitset approach is applicable, it can be orders of magnitude faster than other possible implementations of a set (e.g., as a hash set) while using several times less memory.

However, a bitset, even a compressed one, is not always applicable. For example, if you have 1000 random-looking integers, then a simple array might be the best representation. We refer to this case as the "sparse" scenario.

When should you use compressed bitmaps?

An uncompressed BitSet can use a lot of memory. For example, if you take a BitSet and set the bit at position 1,000,000 to true, you need just over 100kB. That is over 100kB to store the position of one bit. This is wasteful even if you do not care about memory: suppose that you need to compute the intersection between this BitSet and another one that has a bit at position 1,000,001 set to true; then you need to go through all these zeroes, whether you like it or not. That can become very wasteful.

This being said, there are definitely cases where attempting to use compressed bitmaps is wasteful. For example, if you have a small universe size, e.g., your bitmaps represent sets of integers from [0,n) where n is small (e.g., n=64 or n=128). If you are able to use an uncompressed BitSet and it does not blow up your memory usage, then compressed bitmaps are probably not useful to you. In fact, if you do not need compression, then a BitSet offers remarkable speed.

The sparse scenario is another use case where compressed bitmaps should not be used. Keep in mind that random-looking data is usually not compressible. E.g., if you have a small set of 32-bit random integers, it is not mathematically possible to use far less than 32 bits per integer, and attempts at compression can be counterproductive.

How does Roaring compare with the alternatives?

Most alternatives to Roaring are part of a larger family of compressed bitmaps that are run-length-encoded. They identify long runs of 1s or 0s and represent them with a marker word. If you have a local mix of 1s and 0s, you use an uncompressed word.

There are many formats in this family:

  • Oracle's BBC (Byte-aligned Bitmap Code) is an obsolete format at this point: though it may provide good compression, it is likely much slower than more recent alternatives due to excessive branching.
  • WAH (Word Aligned Hybrid) is a patented variation on BBC that provides better performance.
  • Concise is a variation on the patented WAH. In some specific instances, it can compress much better than WAH (up to 2x better), but it is generally slower.
  • EWAH (Enhanced Word Aligned Hybrid) is both free of patent, and it is faster than all the above. On the downside, it does not compress quite as well. It is faster because it allows some form of "skipping" over uncompressed words. So though none of these formats are great at random access, EWAH is better than the alternatives.

There is a big problem with these formats, however, that can hurt you badly in some cases: there is no random access. If you want to check whether a given value is present in the set, you have to start from the beginning and "uncompress" the whole thing. This means that if you want to intersect a big set with a large set, you still have to uncompress the whole big set in the worst case...

Roaring solves this problem. It works in the following manner. It divides the data into chunks of 2^16 integers (e.g., [0, 2^16), [2^16, 2 x 2^16), ...). Within a chunk, it can use an uncompressed bitmap, a simple list of integers, or a list of runs. Whatever format it uses, they all allow you to check for the presence of any one value quickly (e.g., with a binary search). The net result is that Roaring can compute many operations much faster than run-length-encoded formats like WAH, EWAH, Concise... Maybe surprisingly, Roaring also generally offers better compression ratios.

API docs

http://www.javadoc.io/doc/org.roaringbitmap/RoaringBitmap/

Scientific Documentation

  • Daniel Lemire, Owen Kaser, Nathan Kurz, Luca Deri, Chris O'Hara, François Saint-Jacques, Gregory Ssi-Yan-Kai, Roaring Bitmaps: Implementation of an Optimized Software Library, Software: Practice and Experience 48 (4), 2018 arXiv:1709.07821
  • Samy Chambi, Daniel Lemire, Owen Kaser, Robert Godin, Better bitmap performance with Roaring bitmaps, Software: Practice and Experience Volume 46, Issue 5, pages 709–719, May 2016 http://arxiv.org/abs/1402.6407 This paper used data from http://lemire.me/data/realroaring2014.html
  • Daniel Lemire, Gregory Ssi-Yan-Kai, Owen Kaser, Consistently faster and smaller compressed bitmaps with Roaring, Software: Practice and Experience 46 (11), 2016. http://arxiv.org/abs/1603.06549
  • Samy Chambi, Daniel Lemire, Robert Godin, Kamel Boukhalfa, Charles Allen, Fangjin Yang, Optimizing Druid with Roaring bitmaps, IDEAS 2016, 2016. http://r-libre.teluq.ca/950/

Code sample

import org.roaringbitmap.RoaringBitmap;

public class Basic {

  public static void main(String[] args) {
        RoaringBitmap rr = RoaringBitmap.bitmapOf(1,2,3,1000);
        RoaringBitmap rr2 = new RoaringBitmap();
        rr2.add(4000L,4255L);
        rr.select(3); // would return the third value or 1000
        rr.rank(2); // would return the rank of 2, which is index 1
        rr.contains(1000); // will return true
        rr.contains(7); // will return false

        RoaringBitmap rror = RoaringBitmap.or(rr, rr2);// new bitmap
        rr.or(rr2); //in-place computation
        boolean equals = rror.equals(rr);// true
        if(!equals) throw new RuntimeException("bug");
        // number of values stored?
        long cardinality = rr.getLongCardinality();
        System.out.println(cardinality);
        // a "forEach" is faster than this loop, but a loop is possible:
        for(int i : rr) {
          System.out.println(i);
        }
  }
}

Please see the examples folder for more examples, which you can run with ./gradlew :examples:runAll, or run a specific one with ./gradlew :examples:runExampleBitmap64, etc.

Unsigned integers

Java lacks native unsigned integers, but integers are still considered to be unsigned within Roaring and ordered according to Integer.compareUnsigned. This means that Java will order the numbers like so: 0, 1, ..., 2147483647, -2147483648, -2147483647, ..., -1. To interpret them correctly, you can use Integer.toUnsignedLong and Integer.toUnsignedString.
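For example, Integer.toUnsignedLong(-1) returns 4294967295, which is the value Roaring treats -1 as.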

Working with memory-mapped bitmaps

If you want to have your bitmaps lie in memory-mapped files, you can use the org.roaringbitmap.buffer package instead. It contains two important classes, ImmutableRoaringBitmap and MutableRoaringBitmap. MutableRoaringBitmaps are derived from ImmutableRoaringBitmap, so that you can convert (cast) a MutableRoaringBitmap to an ImmutableRoaringBitmap in constant time.

An ImmutableRoaringBitmap that is not an instance of a MutableRoaringBitmap is backed by a ByteBuffer which comes with some performance overhead, but with the added flexibility that the data can reside anywhere (including outside of the Java heap).

At times you may need to work with bitmaps that reside on disk (instances of ImmutableRoaringBitmap) and bitmaps that reside in Java memory. If you know that the bitmaps will reside in Java memory, it is best to use MutableRoaringBitmap instances, not only can they be modified, but they will also be faster. Moreover, because MutableRoaringBitmap instances are also ImmutableRoaringBitmap instances, you can write much of your code expecting ImmutableRoaringBitmap.

If you write your code expecting ImmutableRoaringBitmap instances, without attempting to cast the instances, then your objects will be truly immutable. The MutableRoaringBitmap has a convenience method (toImmutableRoaringBitmap) which is a simple cast back to an ImmutableRoaringBitmap instance. From a language design point of view, instances of the ImmutableRoaringBitmap class are immutable only when used as per the interface of the ImmutableRoaringBitmap class. Given that the class is not final, it is possible to modify instances, through other interfaces. Thus we do not take the term "immutable" in a purist manner, but rather in a practical one.

One of our motivations for this design where MutableRoaringBitmap instances can be casted down to ImmutableRoaringBitmap instances is that bitmaps are often large, or used in a context where memory allocations are to be avoided, so we avoid forcing copies. Copies could be expected if one needs to mix and match ImmutableRoaringBitmap and MutableRoaringBitmap instances.

The following code sample illustrates how to create an ImmutableRoaringBitmap from a ByteBuffer. In such instances, the constructor only loads the meta-data in RAM while the actual data is accessed from the ByteBuffer on demand.

        import org.roaringbitmap.buffer.*;

        //...

        MutableRoaringBitmap rr1 = MutableRoaringBitmap.bitmapOf(1, 2, 3, 1000);
        MutableRoaringBitmap rr2 = MutableRoaringBitmap.bitmapOf( 2, 3, 1010);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        // If there were runs of consecutive values, you could
        // call rr1.runOptimize(); or rr2.runOptimize(); to improve compression
        rr1.serialize(dos);
        rr2.serialize(dos);
        dos.close();
        ByteBuffer bb = ByteBuffer.wrap(bos.toByteArray());
        ImmutableRoaringBitmap rrback1 = new ImmutableRoaringBitmap(bb);
        bb.position(bb.position() + rrback1.serializedSizeInBytes());
        ImmutableRoaringBitmap rrback2 = new ImmutableRoaringBitmap(bb);

Alternatively, we can serialize directly to a ByteBuffer with the serialize(ByteBuffer) method.

Operations on an ImmutableRoaringBitmap such as and, or, xor, flip, will generate a RoaringBitmap which lies in RAM. As the name suggests, the ImmutableRoaringBitmap itself cannot be modified.

This design was inspired by Druid.

One can find a complete working example in the test file TestMemoryMapping.java.

Note that you should not mix the classes from the org.roaringbitmap package with the classes from the org.roaringbitmap.buffer package. They are incompatible. They serialize to the same output however. The performance of the code in org.roaringbitmap package is generally superior because there is no overhead due to the use of ByteBuffer instances.

Kryo

Many applications use Kryo for serialization/deserialization. One can use Roaring bitmaps with Kryo efficiently thanks to a custom serializer (Kryo 5):

public class RoaringSerializer extends Serializer<RoaringBitmap> {
    @Override
    public void write(Kryo kryo, Output output, RoaringBitmap bitmap) {
        try {
            bitmap.serialize(new KryoDataOutput(output));
        } catch (IOException e) {
            e.printStackTrace();
            throw new RuntimeException();
        }
    }
    @Override
    public RoaringBitmap read(Kryo kryo, Input input, Class<? extends RoaringBitmap> type) {
        RoaringBitmap bitmap = new RoaringBitmap();
        try {
            bitmap.deserialize(new KryoDataInput(input));
        } catch (IOException e) {
            e.printStackTrace();
            throw new RuntimeException();
        }
        return bitmap;
    }

}

64-bit integers (long)

Though Roaring Bitmaps were designed with the 32-bit case in mind, we have extensions to 64-bit integers. We offer two classes for this purpose: Roaring64NavigableMap and Roaring64Bitmap.

The Roaring64NavigableMap relies on a conventional red-black tree. The keys are 32-bit integers representing the most significant 32 bits of elements, whereas the values of the tree are 32-bit Roaring bitmaps. The 32-bit Roaring bitmaps represent the least significant bits of a set of elements.

The newer Roaring64Bitmap approach relies on the ART data structure to hold the key/value pairs. The key is made of the most significant 48 bits of elements, whereas the values are 16-bit Roaring containers. It is inspired by The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases (https://db.in.tum.de/leis/papers/ART.pdf) by Leis et al. (ICDE '13).

    import org.roaringbitmap.longlong.*;

    
    // first Roaring64NavigableMap
    LongBitmapDataProvider r = Roaring64NavigableMap.bitmapOf(1,2,100,1000);
    r.addLong(1234);
    System.out.println(r.contains(1)); // true
    System.out.println(r.contains(3)); // false
    LongIterator i = r.getLongIterator();
    while(i.hasNext()) System.out.println(i.next());


    // second Roaring64Bitmap
    Roaring64Bitmap bitmap1 = new Roaring64Bitmap();
    Roaring64Bitmap bitmap2 = new Roaring64Bitmap();
    int k = 1 << 16;
    long i = Long.MAX_VALUE / 2;
    long base = i;
    for (; i < base + 10000; ++i) {
       bitmap1.add(i * k);
       bitmap2.add(i * k);
    }
    bitmap1.and(bitmap2);

Range Bitmaps

RangeBitmap is a succinct data structure supporting range queries. Each value added to the bitmap is associated with an incremental identifier, and queries produce a RoaringBitmap of the identifiers associated with values that satisfy the query. Every value added to the bitmap is stored separately, so that if a value is added twice, it will be stored twice, and if that value is less than some threshold, there will be at least two integers in the resultant RoaringBitmap.

It is more efficient, in terms of both time and space, to provide a maximum value. If you don't know the maximum value, provide Long.MAX_VALUE. Unsigned order is used as elsewhere in the library.

var appender = RangeBitmap.appender(1_000_000);
appender.add(1L);
appender.add(1L);
appender.add(100_000L);
RangeBitmap bitmap = appender.build();
RoaringBitmap lessThan5 = bitmap.lt(5); // {0,1}
RoaringBitmap greaterThanOrEqualTo1 = bitmap.gte(1); // {0, 1, 2}
RoaringBitmap greaterThan1 = bitmap.gt(1); // {2}

RangeBitmap can be written to disk and memory mapped:

var appender = RangeBitmap.appender(1_000_000);
appender.add(1L);
appender.add(1L);
appender.add(100_000L);
ByteBuffer buffer = mapBuffer(appender.serializedSizeInBytes());
appender.serialize(buffer);
RangeBitmap bitmap = RangeBitmap.map(buffer);

The serialization format uses little endian byte order.

Prerequisites

  • Version 0.7.x requires JDK 8 or better
  • Version 0.6.x requires JDK 7 or better
  • Version 0.5.x requires JDK 6 or better

To build the project you need maven (version 3).

Download

You can download releases from github: https://github.com/RoaringBitmap/RoaringBitmap/releases

Maven repository

If your project depends on roaring, you can specify the dependency in the Maven "pom.xml" file:

        <dependencies>
          <dependency>
            <groupId>org.roaringbitmap</groupId>
            <artifactId>RoaringBitmap</artifactId>
            <version>0.9.9</version>
          </dependency>
        </dependencies>

where you should replace the version number by the version you require.

For up-to-date releases, we recommend configuring maven and gradle to depend on the Jitpack repository.

Usage

Get java

./gradlew assemble will compile

./gradlew build will compile and run the unit tests

./gradlew test will run the tests

./gradlew :roaringbitmap:test --tests TestIterators.testIndexIterator4 run just the test TestIterators.testIndexIterator4

./gradlew bsi:test --tests BufferBSITest.testEQ run just the test BufferBSITest.testEQ in the bsi submodule

./gradlew checkstyleMain will check that you abide by the code style and that the code compiles. We enforce a strict style so that there is no debate as to the proper way to format the code.

IntelliJ and Eclipse

If you plan to contribute to RoaringBitmap, you can load it up in your favorite IDE.

  • For IntelliJ, in the IDE create a new project, possibly from existing sources, choose import, gradle.
  • For Eclipse: File, Import, Existing Gradle Projects, Select RoaringBitmap on my disk

Contributing

Contributions are invited. We enforce the Google Java style. Please run ./gradlew checkstyleMain on your code before submitting a patch.

FAQ

  • I am getting an error about a bad cookie. What is this about?

In the serialized files, part of the first 4 bytes are dedicated to a "cookie" which serves to indicate the file format.

If you try to deserialize or map a bitmap from data that has an unrecognized "cookie", the code will abort the process and report an error.

This problem will occur to all users who serialized Roaring bitmaps using versions prior to 0.4.x as they upgrade to version 0.4.x or better. These users need to refresh their serialized bitmaps.

  • How big can a Roaring bitmap get?

Given N integers in [0,x), then the serialized size in bytes of a Roaring bitmap should never exceed this bound:

8 + 9 * ((long)x+65535)/65536 + 2 * N

That is, given a fixed overhead for the universe size (x), Roaring bitmaps never use more than 2 bytes per integer. You can call RoaringBitmap.maximumSerializedSize for a more precise estimate.
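For example, with x = 65536 (a single chunk) and N = 1000, the bound works out to 8 + 9 * 1 + 2 * 1000 = 2017 bytes, i.e., just over 2 bytes per integer.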

  • What is the worst case scenario for Roaring bitmaps?

There is no such thing as a data structure that is always ideal. You should make sure that Roaring bitmaps fit your application profile. There are at least two cases where Roaring bitmaps can be easily replaced by superior alternatives compression-wise:

  1. You have few random values spanning a large interval (i.e., you have a very sparse set). For example, take the set 0, 65536, 131072, 196608, 262144 ... If this is typical of your application, you might consider using a HashSet or a simple sorted array.
  2. You have a dense set of random values that never form runs of consecutive values. For example, consider the set 0,2,4,...,10000. If this is typical of your application, you might be better served with a conventional bitset (e.g., Java's BitSet class).
  • How do I select an element at random?

Random random = new Random();
bitmap.select(random.nextInt(bitmap.getCardinality()));

Benchmark

To run JMH benchmarks, use the following command:

     $ ./gradlew jmhJar

You can also run specific benchmarks...

     $ ./jmh/run.sh 'org.roaringbitmap.aggregation.and.identical.*'

Download Details:
Author: RoaringBitmap
Source Code: https://github.com/RoaringBitmap/RoaringBitmap
License: Apache-2.0 license

#datastructure  #java 


PCollections: A Persistent Java Collections Library

PCollections

A Persistent Java Collections Library

Overview

PCollections serves as a persistent and immutable analogue of the Java Collections Framework. This includes efficient, thread-safe, generic, immutable, and persistent stacks, maps, vectors, sets, and bags, compatible with their Java Collections counterparts.

Persistent and immutable datatypes are increasingly appreciated as a simple, design-friendly, concurrency-friendly, and sometimes more time- and space-efficient alternative to mutable datatypes.

Persistent versus Unmodifiable

Note that these immutable collections are very different from the immutable collections returned by Java's Collections.unmodifiableCollection() and similar methods. The difference is that Java's unmodifiable collections have no producers, whereas PCollections have very efficient producers.

Thus if you have an unmodifiable Collection x and you want a new Collection x2 consisting of the elements of x in addition to some element e, you would have to do something like:

Collection x2 = new HashSet(x);
x2.add(e);

which involves copying all of x, using linear time and space.

If, on the other hand, you have a PCollection y you can simply say:

PCollection y2 = y.plus(e);

which still leaves y untouched but generally requires little or no copying, using time and space much more efficiently.

Usage

PCollections are created using producers and static factory methods. Some example static factory methods are HashTreePSet.empty() which returns an empty PSet, while HashTreePSet.singleton(e) returns a PSet containing just the element e, and HashTreePSet.from(collection) returns a PSet containing the same elements as collection. See Example Code below for an example of using producers.

The same empty(), singleton(), and from() factory methods are found in each of the PCollections implementations, which currently include one concrete implementation for each abstract type.

PCollections are highly interoperable with Java Collections: every PCollection is a java.util.Collection, every PMap is a java.util.Map, every PSequence — including every PStack and PVector — is a java.util.List, and every PSet is a java.util.Set.

PCollections uses Semantic Versioning, which establishes a strong correspondence between API changes and version numbering.

PCollections is in the Maven Central repository, under org.pcollections. Thus the Maven coordinates for PCollections are:

<dependency>
    <groupId>org.pcollections</groupId>
    <artifactId>pcollections</artifactId>
    <version>3.1.4</version>
</dependency>

or Gradle:

compile 'org.pcollections:pcollections:3.1.4'

Example Code

The following gives a very simple example of using PCollections, including the static factory method HashTreePSet.empty() and the producer plus(e):

import org.pcollections.*;

public class Example {
  public static void main(String... args) {
    PSet<String> set = HashTreePSet.empty();
    set = set.plus("something");

    System.out.println(set);
    System.out.println(set.plus("something else"));
    System.out.println(set);
  }
}

Running this program gives the following output:

[something]
[something else, something]
[something]

Building from source

To build the project from source, clone the repository and then run ./gradlew.

Related Work

Clojure and Scala also provide persistent collections on the JVM, but they are less interoperable with Java. Both Guava and java.util.Collections provide immutable collections but they are not persistent—that is, they do not provide efficient producers—so they are not nearly as useful. See Persistent versus Unmodifiable above.

Download Details:
Author: hrldcpr
Source Code: https://github.com/hrldcpr/pcollections
License: MIT license

#datastructure  #java 

Awesome Rust

Roaring Rs: Roaring Bitmaps in Rust

RoaringBitmap

This is a Rust port of the Roaring bitmap data structure, initially defined as a Java library and described in Better bitmap performance with Roaring bitmaps.

Rust version policy

This crate only supports the current stable version of Rust; patch releases may use new features at any time.

Developing

This project uses Clippy and rustfmt, and denies warnings in CI builds. Both tools are available via rustup component add clippy rustfmt.

To ensure your changes will be accepted please check them with:

cargo fmt -- --check
cargo fmt --manifest-path benchmarks/Cargo.toml -- --check
cargo clippy --all-targets -- -D warnings

In addition, ensure all tests are passing with cargo test.

Benchmarking

It is recommended to run the cargo bench command inside the benchmarks directory. This directory contains a library dedicated to benchmarking the Roaring library using a set of real-world datasets. It is also advised to run the benchmarks on a bare-metal machine, running them on the base branch and then on the contribution PR branch, to better see the changes.

These benchmarks are designed on top of the Criterion library; you can read more about it in the user guide.

Experimental features

The simd feature is in active development. It has not been tested. If you would like to build with simd, note that std::simd is only available in Rust nightly.

Download Details:
Author: RoaringBitmap
Source Code: https://github.com/RoaringBitmap/roaring-rs
License: Apache-2.0, MIT licenses found

#rust  #rustlang  #database  #datastructure 

Awesome Rust

Enum Map: Providing Type Safe Enum Array Written in Rust

enum-map

A library providing a type-safe enum map: an array indexed by the variants of an enum. It is implemented using regular Rust arrays, so using it is as fast as using regular Rust arrays.

This library doesn't have a Minimum Supported Rust Version (MSRV) policy. If you find having an MSRV valuable, please use enum-map 0.6 instead.

Examples

#[macro_use]
extern crate enum_map;

use enum_map::EnumMap;

#[derive(Debug, Enum)]
enum Example {
    A,
    B,
    C,
}

fn main() {
    let mut map = enum_map! {
        Example::A => 1,
        Example::B => 2,
        Example::C => 3,
    };
    map[Example::C] = 4;

    assert_eq!(map[Example::A], 1);

    for (key, &value) in &map {
        println!("{:?} has {} as value.", key, value);
    }
}

Download Details:
Author: xfix
Source Code: https://github.com/xfix/enum-map
License: Apache-2.0, MIT licenses found

#rust  #rustlang  #database  #datastructure 

Awesome Rust

Persistent Data Structures in Rust

Rust Persistent Data Structures provides fully persistent data structures with structural sharing.

Setup

To use rpds add the following to your Cargo.toml:

[dependencies]
rpds = "<version>"

Data structures

This crate offers the following data structures:

  1. List
  2. Vector
  3. Stack
  4. Queue
  5. HashTrieMap
  6. HashTrieSet
  7. RedBlackTreeMap
  8. RedBlackTreeSet

List

List documentation

Your classic functional list.

Example

use rpds::List;

let list = List::new().push_front("list");

assert_eq!(list.first(), Some(&"list"));

let a_list = list.push_front("a");

assert_eq!(a_list.first(), Some(&"a"));

let list_dropped = a_list.drop_first().unwrap();

assert_eq!(list_dropped, list);

Vector

A sequence that can be indexed. The implementation is described in Understanding Persistent Vector Part 1 and Understanding Persistent Vector Part 2.

Example

use rpds::Vector;

let vector = Vector::new()
    .push_back("I’m")
    .push_back("a")
    .push_back("vector");

assert_eq!(vector[1], "a");

let screaming_vector = vector
    .drop_last().unwrap()
    .push_back("VECTOR!!!");

assert_eq!(screaming_vector[2], "VECTOR!!!");

Stack

A LIFO (last in, first out) data structure. This is just a List in disguise.

Example

use rpds::Stack;

let stack = Stack::new().push("stack");

assert_eq!(stack.peek(), Some(&"stack"));

let a_stack = stack.push("a");

assert_eq!(a_stack.peek(), Some(&"a"));

let stack_popped = a_stack.pop().unwrap();

assert_eq!(stack_popped, stack);

Queue

A FIFO (first in, first out) data structure.

Example

use rpds::Queue;

let queue = Queue::new()
    .enqueue("um")
    .enqueue("dois")
    .enqueue("tres");

assert_eq!(queue.peek(), Some(&"um"));

let queue_dequeued = queue.dequeue().unwrap();

assert_eq!(queue_dequeued.peek(), Some(&"dois"));

HashTrieMap

A map implemented with a hash array mapped trie. See Ideal Hash Trees for details.

Example

use rpds::HashTrieMap;

let map_en = HashTrieMap::new()
    .insert(0, "zero")
    .insert(1, "one");

assert_eq!(map_en.get(&1), Some(&"one"));

let map_pt = map_en
    .insert(1, "um")
    .insert(2, "dois");

assert_eq!(map_pt.get(&2), Some(&"dois"));

let map_pt_binary = map_pt.remove(&2);

assert_eq!(map_pt_binary.get(&2), None);

HashTrieSet

A set implemented with a HashTrieMap.

Example

use rpds::HashTrieSet;

let set = HashTrieSet::new()
    .insert("zero")
    .insert("one");

assert!(set.contains(&"one"));

let set_extended = set.insert("two");

assert!(set_extended.contains(&"two"));

let set_positive = set_extended.remove(&"zero");

assert!(!set_positive.contains(&"zero"));

RedBlackTreeMap

A map implemented with a red-black tree.

Example

use rpds::RedBlackTreeMap;

let map_en = RedBlackTreeMap::new()
    .insert(0, "zero")
    .insert(1, "one");

assert_eq!(map_en.get(&1), Some(&"one"));

let map_pt = map_en
    .insert(1, "um")
    .insert(2, "dois");

assert_eq!(map_pt.get(&2), Some(&"dois"));

let map_pt_binary = map_pt.remove(&2);

assert_eq!(map_pt_binary.get(&2), None);

assert_eq!(map_pt_binary.first(), Some((&0, &"zero")));

RedBlackTreeSet

A set implemented with a RedBlackTreeMap.

Example

use rpds::RedBlackTreeSet;

let set = RedBlackTreeSet::new()
    .insert("zero")
    .insert("one");

assert!(set.contains(&"one"));

let set_extended = set.insert("two");

assert!(set_extended.contains(&"two"));

let set_positive = set_extended.remove(&"zero");

assert!(!set_positive.contains(&"zero"));

assert_eq!(set_positive.first(), Some(&"one"));

Other features

Mutable methods

When you change a data structure you often do not need its previous versions. For those cases rpds offers you mutable methods which are generally faster:

use rpds::HashTrieSet;

let mut set = HashTrieSet::new();

set.insert_mut("zero");
set.insert_mut("one");

let set_0_1 = set.clone();
let set_0_1_2 = set.insert("two");

Initialization macros

There are convenient initialization macros for all data structures:

use rpds::*;

let vector = vector![3, 1, 4, 1, 5];
let map = ht_map!["orange" => "orange", "banana" => "yellow"];

Check the documentation for initialization macros of other data structures.

Thread safety

All data structures in this crate can be shared between threads, but that is an opt-in ability. This is because there is a performance cost to make data structures thread safe. That cost is worth avoiding when you are not actually sharing them between threads.

Of course, if you try to share an rpds data structure across different threads, you can count on the Rust compiler to ensure that it is safe to do so. If you are using the version of the data structure that is not thread safe, you will get a compile-time error.

To create a thread-safe version of any data structure use new_sync():

let vec = Vector::new_sync()
    .push_back(42);

Or use the _sync variant of the initialization macro:

let vec = vector_sync!(42);

no_std support

This crate supports no_std. To enable that you need to disable the default feature std:

[dependencies]
rpds = { version = "<version>", default-features = false }

Further details

Internally the data structures in this crate maintain a lot of reference-counting pointers. These pointers are used both for links between the internal nodes of the data structure as well as for the values it stores.

There are two implementations of reference-counting pointers in the standard library: Rc and Arc. They behave the same way, but Arc allows you to share the data it points to across multiple threads. The downside is that it is significantly slower to clone and drop than Rc, and persistent data structures do a lot of those operations. In some microbenchmarks with rpds data structures we can see that using Rc instead of Arc can make some operations twice as fast! You can see this for yourself by running cargo bench.

To implement this we parameterize the type of reference-counting pointer (Rc or Arc) as a type argument of the data structure. We use the archery crate to do this in a convenient way.

The pointer type can be parameterized like this:

let vec: Vector<u32, archery::ArcK> = Vector::new_with_ptr_kind();
//                              ↖
//                                This will use `Arc` pointers.
//                                Change it to `archery::RcK` to use a `Rc` pointer.

Serialization

We support serialization through serde. To use it enable the serde feature. To do so change the rpds dependency in your Cargo.toml to

[dependencies]
rpds = { version = "<version>", features = ["serde"] }

Download Details:
Author: orium
Source Code: https://github.com/orium/rpds
License: MPL-2.0 license

#rust  #rustlang  #database  #datastructure 

Awesome Rust

Kdtree Rs: Fast Geospatial indexing & Nearest Neighbors Lookup

kdtree

K-dimensional tree in Rust for fast geospatial indexing and nearest neighbors lookup

Usage

Add kdtree to Cargo.toml

[dependencies]
kdtree = "0.5.1"

Add points to kdtree and query nearest n points with distance function

use kdtree::KdTree;
use kdtree::ErrorKind;
use kdtree::distance::squared_euclidean;

let a: ([f64; 2], usize) = ([0f64, 0f64], 0);
let b: ([f64; 2], usize) = ([1f64, 1f64], 1);
let c: ([f64; 2], usize) = ([2f64, 2f64], 2);
let d: ([f64; 2], usize) = ([3f64, 3f64], 3);

let dimensions = 2;
let mut kdtree = KdTree::new(dimensions);

kdtree.add(&a.0, a.1).unwrap();
kdtree.add(&b.0, b.1).unwrap();
kdtree.add(&c.0, c.1).unwrap();
kdtree.add(&d.0, d.1).unwrap();

assert_eq!(kdtree.size(), 4);
assert_eq!(
    kdtree.nearest(&a.0, 0, &squared_euclidean).unwrap(),
    vec![]
);
assert_eq!(
    kdtree.nearest(&a.0, 1, &squared_euclidean).unwrap(),
    vec![(0f64, &0)]
);
assert_eq!(
    kdtree.nearest(&a.0, 2, &squared_euclidean).unwrap(),
    vec![(0f64, &0), (2f64, &1)]
);
assert_eq!(
    kdtree.nearest(&a.0, 3, &squared_euclidean).unwrap(),
    vec![(0f64, &0), (2f64, &1), (8f64, &2)]
);
assert_eq!(
    kdtree.nearest(&a.0, 4, &squared_euclidean).unwrap(),
    vec![(0f64, &0), (2f64, &1), (8f64, &2), (18f64, &3)]
);
assert_eq!(
    kdtree.nearest(&a.0, 5, &squared_euclidean).unwrap(),
    vec![(0f64, &0), (2f64, &1), (8f64, &2), (18f64, &3)]
);
assert_eq!(
    kdtree.nearest(&b.0, 4, &squared_euclidean).unwrap(),
    vec![(0f64, &1), (2f64, &0), (2f64, &2), (8f64, &3)]
);

Benchmark

cargo bench with 2.3 GHz Intel i5-7360U:

cargo bench
     Running target/release/deps/bench-9e622e6a4ed9b92a

running 2 tests
test bench_add_to_kdtree_with_1k_3d_points       ... bench:         106 ns/iter (+/- 25)
test bench_nearest_from_kdtree_with_1k_3d_points ... bench:       1,237 ns/iter (+/- 266)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out

Thanks Eh2406 for various fixes and perf improvements.

Learn More

Download Details:
Author: mrhooray
Source Code: https://github.com/mrhooray/kdtree-rs
License: Apache-2.0, MIT licenses found

#rust  #rustlang  #database  #datastructure 

Awesome Rust

Generic Array: Generic Array Types in Rust

generic-array

This crate implements generic array types for Rust.

Requires a minimum Rust version of 1.36.0, or 1.41.0 for From<[T; N]> implementations.

Documentation

Usage

The Rust arrays [T; N] are problematic in that they can't be used generically with respect to N, so for example this won't work:

struct Foo<N> {
    data: [i32; N]
}

generic-array defines a new trait ArrayLength<T> and a struct GenericArray<T, N: ArrayLength<T>>, which let the above be implemented as:

struct Foo<N: ArrayLength<i32>> {
    data: GenericArray<i32, N>
}

The ArrayLength<T> trait is implemented by default for the unsigned integer types from the typenum crate:

use generic_array::typenum::U5;

struct Foo<N: ArrayLength<i32>> {
    data: GenericArray<i32, N>
}

fn main() {
    let foo = Foo::<U5>{data: GenericArray::default()};
}

For example, GenericArray<T, U5> would work almost like [T; 5]:

use generic_array::typenum::U5;

struct Foo<T, N: ArrayLength<T>> {
    data: GenericArray<T, N>
}

fn main() {
    let foo = Foo::<i32, U5>{data: GenericArray::default()};
}

In version 0.1.1 an arr! macro was introduced, which allows arrays to be created as shown below:

let array = arr![u32; 1, 2, 3];
assert_eq!(array[2], 3);
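
Because the length is an ordinary type parameter, you can also write functions that are generic over it. A minimal sketch, assuming a generic-array version that provides the GenericSequence trait; squares is a hypothetical helper, not part of the crate:

use generic_array::{ArrayLength, GenericArray};
use generic_array::sequence::GenericSequence;
use generic_array::typenum::U4;

// Hypothetical helper: build an array of squares for any length N.
fn squares<N: ArrayLength<u32>>() -> GenericArray<u32, N> {
    GenericArray::generate(|i| (i as u32) * (i as u32))
}

fn main() {
    let a = squares::<U4>();
    assert_eq!(a.as_slice(), &[0, 1, 4, 9]);
}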

Download Details:
Author: fizyk20
Source Code: https://github.com/fizyk20/generic-array
License: MIT license

#rust  #rustlang  #database  #datastructure 

Awesome Rust

1654387800

Array Tool: Array Helpers for Rust's Vector and String Types

array_tool

Array helpers for Rust. Many of the most common methods you would use on arrays in other languages are made available on vectors, with polymorphic implementations to handle most use cases.

Installation

Add the following to your Cargo.toml file:

[dependencies]
array_tool = "~1.0.3"

Then, in the Rust files where you plan to use it, put this at the top:

extern crate array_tool;

If you plan to use all of the available vector helper methods, you may do:

use array_tool::vec::*;

This crate has helpful methods for strings as well.

Iterator Usage

use array_tool::iter::ZipOpt;
fn zip_option<U: Iterator>(self, other: U) -> ZipOption<Self, U>
  where Self: Sized, U: IntoIterator;
  //  let a = vec![1];
  //  let b = vec![];
  //  a.zip_option(b).next()      // input
  //  Some((Some(1), None))       // return value

Vector Usage

pub fn uniques<T: PartialEq + Clone>(a: Vec<T>, b: Vec<T>) -> Vec<Vec<T>>
  //  array_tool::uniques(vec![1,2,3,4,5], vec![2,5,6,7,8]) // input
  //  vec![vec![1,3,4], vec![6,7,8]]                        // return value

use array_tool::vec::Uniq;
fn uniq(&self, other: Vec<T>) -> Vec<T>;
  //  vec![1,2,3,4,5,6].uniq( vec![1,2,5,7,9] ) // input
  //  vec![3,4,6]                               // return value
fn uniq_via<F: Fn(&T, &T) -> bool>(&self, other: Self, f: F) -> Self;
  //  vec![1,2,3,4,5,6].uniq_via( vec![1,2,5,7,9], |&l, r| l == r + 2 ) // input 
  //  vec![1,2,4,6]                                                     // return value
fn unique(&self) -> Vec<T>;
  //  vec![1,2,1,3,2,3,4,5,6].unique()          // input
  //  vec![1,2,3,4,5,6]                         // return value
fn unique_via<F: Fn(&T, &T) -> bool>(&self, f: F) -> Self;
  //  vec![1.0,2.0,1.4,3.3,2.1,3.5,4.6,5.2,6.2].
  //  unique_via( |l: &f64, r: &f64| l.floor() == r.floor() ) // input
  //  vec![1.0,2.0,3.3,4.6,5.2,6.2]                           // return value
fn is_unique(&self) -> bool;
  //  vec![1,2,1,3,4,3,4,5,6].is_unique()       // input
  //  false                                     // return value
  //  vec![1,2,3,4,5,6].is_unique()             // input
  //  true                                      // return value

use array_tool::vec::Shift;
fn unshift(&mut self, other: T);    // no return value, modifies &mut self directly
  //  let mut x = vec![1,2,3];
  //  x.unshift(0);
  //  assert_eq!(x, vec![0,1,2,3]);
fn shift(&mut self) -> Option<T>;
  //  let mut x = vec![0,1,2,3];
  //  assert_eq!(x.shift(), Some(0));
  //  assert_eq!(x, vec![1,2,3]);

use array_tool::vec::Intersect;
fn intersect(&self, other: Vec<T>) -> Vec<T>;
  //  vec![1,1,3,5].intersect(vec![1,2,3]) // input
  //  vec![1,3]                            // return value
fn intersect_if<F: Fn(&T, &T) -> bool>(&self, other: Vec<T>, validator: F) -> Vec<T>;
  //  vec!['a','a','c','e'].intersect_if(vec!['A','B','C'], |l, r| l.eq_ignore_ascii_case(r)) // input
  //  vec!['a','c']                                                                           // return value

use array_tool::vec::Join;
fn join(&self, joiner: &'static str) -> String;
  //  vec![1,2,3].join(",")                // input
  //  "1,2,3"                              // return value

use array_tool::vec::Times;
fn times(&self, qty: i32) -> Vec<T>;
  //  vec![1,2,3].times(3)                 // input
  //  vec![1,2,3,1,2,3,1,2,3]              // return value

use array_tool::vec::Union;
fn union(&self, other: Vec<T>) -> Vec<T>;
  //  vec!["a","b","c"].union(vec!["c","d","a"])   // input
  //  vec![ "a", "b", "c", "d" ]                   // return value

String Usage

use array_tool::string::ToGraphemeBytesIter;
fn grapheme_bytes_iter(&'a self) -> GraphemeBytesIter<'a>;
  //  let string = "a s—d féZ";
  //  let mut graphemes = string.grapheme_bytes_iter()
  //  graphemes.skip(3).next();            // input
  //  [226, 128, 148]                      // return value for emdash `—`

use array_tool::string::Squeeze;
fn squeeze(&self, targets: &'static str) -> String;
  //  "yellow moon".squeeze("")            // input
  //  "yelow mon"                          // return value
  //  "  now   is  the".squeeze(" ")       // input
  //  " now is the"                        // return value

use array_tool::string::Justify;
fn justify_line(&self, width: usize) -> String;
  //  "asd as df asd".justify_line(16)     // input
  //  "asd  as  df  asd"                   // return value
  //  "asd as df asd".justify_line(18)     // input
  //  "asd   as   df  asd"                 // return value

use array_tool::string::SubstMarks;
fn subst_marks(&self, marks: Vec<usize>, chr: &'static str) -> String;
  //  "asdf asdf asdf".subst_marks(vec![0,5,8], "Z") // input
  //  "Zsdf ZsdZ asdf"                               // return value

use array_tool::string::WordWrap;
fn word_wrap(&self, width: usize) -> String;
  //  "01234 67 9 BC EFG IJ".word_wrap(6)  // input
  //  "01234\n67 9\nBC\nEFG IJ"            // return value

use array_tool::string::AfterWhitespace;
fn seek_end_of_whitespace(&self, offset: usize) -> Option<usize>;
  //  "asdf           asdf asdf".seek_end_of_whitespace(6) // input
  //  Some(9)                                              // return value
  //  "asdf".seek_end_of_whitespace(3)                     // input
  //  Some(0)                                              // return value
  //  "asdf           ".seek_end_of_whitespace(6)          // input
  //  None                                                 // return value
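
The string helpers work the same way. A minimal runnable sketch that mirrors the documented inputs and outputs above:

use array_tool::string::{Squeeze, WordWrap};

fn main() {
    // An empty target squeezes runs of every repeated character.
    assert_eq!("yellow moon".squeeze(""), "yelow mon");

    // Wrap on whitespace at a maximum line width of 6.
    assert_eq!("01234 67 9 BC EFG IJ".word_wrap(6), "01234\n67 9\nBC\nEFG IJ");
}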

Future plans

Expect methods to become more polymorphic over time (the same method implemented for similar and compatible types). I plan to implement many of the methods available for arrays in higher-level languages such as Ruby. Expect regular updates.

Download Details:
Author: danielpclark
Source Code: https://github.com/danielpclark/array_tool
License: MIT license

#rust  #rustlang  #database  #datastructure 

Awesome Rust

1654380540

Hashbrown: Rust Port Of Google's SwissTable Hash Map

hashbrown

This crate is a Rust port of Google's high-performance SwissTable hash map, adapted to make it a drop-in replacement for Rust's standard HashMap and HashSet types.

The original C++ version of SwissTable can be found here, and this CppCon talk gives an overview of how the algorithm works.

Since Rust 1.36, this is the HashMap implementation used by the Rust standard library. However, you may still want to use this crate directly, since it also works in environments without std, such as embedded systems and kernels.

Features

  • Drop-in replacement for the standard library HashMap and HashSet types.
  • Uses AHash as the default hasher, which is much faster than SipHash.
  • Around 2x faster than the previous standard library HashMap.
  • Lower memory usage: only 1 byte of overhead per entry instead of 8.
  • Compatible with #[no_std] (but requires a global allocator with the alloc crate).
  • Empty hash maps do not allocate any memory.
  • SIMD lookups to scan multiple hash entries in parallel.

Performance

Compared to the previous implementation of std::collections::HashMap (Rust 1.35).

With the hashbrown default AHash hasher (not HashDoS-resistant):

 name                       oldstdhash ns/iter  hashbrown ns/iter  diff ns/iter   diff %  speedup 
 insert_ahash_highbits        20,846              7,397                   -13,449  -64.52%   x 2.82 
 insert_ahash_random          20,515              7,796                   -12,719  -62.00%   x 2.63 
 insert_ahash_serial          21,668              7,264                   -14,404  -66.48%   x 2.98 
 insert_erase_ahash_highbits  29,570              17,498                  -12,072  -40.83%   x 1.69 
 insert_erase_ahash_random    39,569              17,474                  -22,095  -55.84%   x 2.26 
 insert_erase_ahash_serial    32,073              17,332                  -14,741  -45.96%   x 1.85 
 iter_ahash_highbits          1,572               2,087                       515   32.76%   x 0.75 
 iter_ahash_random            1,609               2,074                       465   28.90%   x 0.78 
 iter_ahash_serial            2,293               2,120                      -173   -7.54%   x 1.08 
 lookup_ahash_highbits        3,460               4,403                       943   27.25%   x 0.79 
 lookup_ahash_random          6,377               3,911                    -2,466  -38.67%   x 1.63 
 lookup_ahash_serial          3,629               3,586                       -43   -1.18%   x 1.01 
 lookup_fail_ahash_highbits   5,286               3,411                    -1,875  -35.47%   x 1.55 
 lookup_fail_ahash_random     12,365              4,171                    -8,194  -66.27%   x 2.96 
 lookup_fail_ahash_serial     4,902               3,240                    -1,662  -33.90%   x 1.51 

With the libstd default SipHash hasher (HashDoS-resistant):

 name                       oldstdhash ns/iter  hashbrown ns/iter  diff ns/iter   diff %  speedup 
 insert_std_highbits        32,598              20,199                  -12,399  -38.04%   x 1.61 
 insert_std_random          29,824              20,760                   -9,064  -30.39%   x 1.44 
 insert_std_serial          33,151              17,256                  -15,895  -47.95%   x 1.92 
 insert_erase_std_highbits  74,731              48,735                  -25,996  -34.79%   x 1.53 
 insert_erase_std_random    73,828              47,649                  -26,179  -35.46%   x 1.55 
 insert_erase_std_serial    73,864              40,147                  -33,717  -45.65%   x 1.84 
 iter_std_highbits          1,518               2,264                       746   49.14%   x 0.67 
 iter_std_random            1,502               2,414                       912   60.72%   x 0.62 
 iter_std_serial            6,361               2,118                    -4,243  -66.70%   x 3.00 
 lookup_std_highbits        21,705              16,962                   -4,743  -21.85%   x 1.28 
 lookup_std_random          21,654              17,158                   -4,496  -20.76%   x 1.26 
 lookup_std_serial          18,726              14,509                   -4,217  -22.52%   x 1.29 
 lookup_fail_std_highbits   25,852              17,323                   -8,529  -32.99%   x 1.49 
 lookup_fail_std_random     25,913              17,760                   -8,153  -31.46%   x 1.46 
 lookup_fail_std_serial     22,648              14,839                   -7,809  -34.48%   x 1.53 

Usage

Add this to your Cargo.toml:

[dependencies]
hashbrown = "0.9"

Then:

use hashbrown::HashMap;

let mut map = HashMap::new();
map.insert(1, "one");
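
Because it is a drop-in replacement, everything from the std API works unchanged. A minimal sketch using the entry API and the crate's HashSet:

use hashbrown::{HashMap, HashSet};

fn main() {
    // Count word occurrences with the entry API, exactly as with std.
    let mut counts: HashMap<&str, u32> = HashMap::new();
    for word in ["a", "b", "a"].iter().copied() {
        *counts.entry(word).or_insert(0) += 1;
    }
    assert_eq!(counts["a"], 2);

    // hashbrown also ships a HashSet with the familiar interface.
    let set: HashSet<u32> = (0..3).collect();
    assert!(set.contains(&2));
}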

This crate has the following Cargo features:

  • nightly: Enables nightly-only features: #[may_dangle].
  • serde: Enables serde serialization support.
  • rayon: Enables rayon parallel iterator support.
  • raw: Enables access to the experimental and unsafe RawTable API.
  • inline-more: Adds inline hints to most functions, improving run-time performance at the cost of compilation time. (enabled by default)
  • ahash: Compiles with ahash as default hasher. (enabled by default)
  • ahash-compile-time-rng: Activates the compile-time-rng feature of ahash to increase DoS resistance, but this can cause issues for no_std builds. More details in issue#124. (enabled by default)

Download Details:
Author: contain-rs
Source Code: https://github.com/contain-rs/hashbrown2
License: View license

#rust  #rustlang  #database  #datastructure 

Awesome Rust

1654373160

Ternary Search Tree Collection in Rust

tst

Ternary search tree collection in Rust, with an API as similar to std::collections as possible.

A ternary search tree is a type of trie (sometimes called a prefix tree) whose nodes are arranged like those of a binary search tree, but with up to three children rather than the binary tree's limit of two. Like other prefix trees, a ternary search tree can be used as an associative map with support for incremental string search. However, ternary search trees are more space-efficient than standard prefix trees, at the cost of speed. Common applications include spell-checking and auto-completion. The crate provides TSTMap and TSTSet structures for map-like and set-like usage.

Documentation is available at http://billyevans.github.io/tst/tst

It has special methods:

  • wildcard_iter/wildcard_iter_mut - get an iterator over entries matching a wildcard
  • prefix_iter/prefix_iter_mut - get an iterator over entries sharing a common prefix
  • longest_prefix - get the longest key that is a prefix of the given query

Usage

Add this to your Cargo.toml:

[dependencies]
tst = "0.10.*"

Quick Start

#[macro_use]
extern crate tst;
use tst::TSTMap;

let m = tstmap! {
    "first" =>  1,
    "second" => 2,
    "firstthird" => 3,
    "firstsecond" => 12,
    "xirst" => -13,
};

// iterate
for (key, value) in m.iter() {
    println!("{}: {}", key, value);
}
assert_eq!(Some(&1), m.get("first"));
assert_eq!(5, m.len());

// calculating longest prefix
assert_eq!("firstsecond", m.longest_prefix("firstsecondthird"));

// get values with common prefix
for (key, value) in m.prefix_iter("first") {
    println!("{}: {}", key, value);
}

// get sum by wildcard iterator
assert_eq!(-12, m.wildcard_iter(".irst").fold(0, |sum, (_, val)| sum + val));

Iterating over keys with a wildcard

#[macro_use]
extern crate tst;
use tst::TSTMap;

let m = tstmap! {
    "ac" => 1,
    "bd" => 2,
    "cc" => 3,
};

for (k, v) in m.wildcard_iter(".c") {
    println!("{} -> {}", k, v);
}

Iterating over keys with a common prefix

#[macro_use]
extern crate tst;
use tst::TSTMap;

let m = tstmap! {
    "abc" => 1,
    "abcd" => 1,
    "abce" => 1,
    "abca" => 1,
    "zxd" => 1,
    "add" => 1,
    "abcdef" => 1,
};

for (key, value) in m.prefix_iter("abc") {
    println!("{}: {}", key, value);
}

Search for longest prefix in the tree

#[macro_use]
extern crate tst;
use tst::TSTMap;

let m = tstmap! {
    "abc" => 1,
    "abcd" => 1,
    "abce" => 1,
    "abca" => 1,
    "zxd" => 1,
    "add" => 1,
    "abcdef" => 1,
};

assert_eq!("abcd", m.longest_prefix("abcde"));

Implementation details

https://en.wikipedia.org/wiki/Ternary_search_tree

Download Details:
Author: billyevans
Source Code: https://github.com/billyevans/tst
License: MIT license

#rust  #rustlang  #database  #datastructure 

Sasha Hall

1654312200

How GitHub Copilot ANSWERS Leetcode Interview Questions

In this video, we will test out GitHub Copilot by using it to solve Leetcode problems.

GitHub Copilot is powered by Codex, the new AI system created by OpenAI. GitHub Copilot understands significantly more context than most code assistants. So, whether it’s in a docstring, comment, function name, or the code itself, GitHub Copilot uses the context you’ve provided and synthesizes code to match. Together with OpenAI, we’re designing GitHub Copilot to get smarter at producing safe and effective code as developers use it.

#copilot  #datastructure #github 
