Viral Genome Coverage Evaluation for Metagenomic Diagnostics in Rust

vircov

Minimal virus genome coverage assessment for metagenomic diagnostics

Purpose

Viral metagenomic diagnostics from low-abundance clinical samples can be challenging in the absence of sufficient genome coverage. Vircov extracts distinct non-overlapping regions from a reference alignment and generates helpful statistics. It can be used in automated pipelines and reports to flag potential hits without manual inspection of coverage plots.

Implementation

Vircov is written in Rust and works with alignments in the standard PAF or SAM/BAM/CRAM (next release) formats. It is extremely fast and can process alignments against thousands of viral reference genomes in seconds. Basic input filters can be selected to remove spurious alignments, and text-style coverage plots can be printed to the terminal for visual confirmation.
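
For orientation, below is a minimal sketch of what reading the twelve mandatory PAF columns might look like in Rust; the struct and the selection of fields are illustrative assumptions and not vircov's actual parser or filter logic.

#[derive(Debug)]
struct PafRecord {
    query_name: String,
    query_len: u64,
    target_name: String,
    target_start: u64,
    target_end: u64,
    mapq: u8,
}

// Parse one tab-separated PAF line into the fields relevant for coverage assessment.
fn parse_paf_line(line: &str) -> Option<PafRecord> {
    let fields: Vec<&str> = line.split('\t').collect();
    if fields.len() < 12 {
        return None; // malformed or truncated record
    }
    Some(PafRecord {
        query_name: fields[0].to_string(),
        query_len: fields[1].parse().ok()?,
        target_name: fields[5].to_string(),
        target_start: fields[7].parse().ok()?,
        target_end: fields[8].parse().ok()?,
        mapq: fields[11].parse().ok()?,
    })
}

fn main() {
    // A made-up single-read alignment against a dengue virus reference.
    let line = "read1\t150\t0\t150\t+\tNC_001477.1\t10735\t2000\t2150\t145\t150\t60";
    if let Some(rec) = parse_paf_line(line) {
        println!("{} -> {}:{}-{}", rec.query_name, rec.target_name, rec.target_start, rec.target_end);
    }
}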

Vircov is intended for implementation in (accredited) metagenomics pipelines for human patients enrolled in the META-GP network (Australia). As such, it aims to be production-grade code with high test coverage, continuous integration, and versioned releases with precompiled binaries for Linux and macOS.

Version 0.6.0 will be distributed on BioConda and Cargo and will have precompiled binaries available.

Installation

git clone https://github.com/esteinig/vircov 
cd vircov && cargo build --release

Usage

vircov tests/cases/test_full_ok.paf --fasta tests/cases/test_ok.fasta --table

Tests

git clone https://github.com/esteinig/vircov 
cd vircov && cargo test && cargo tarpaulin 

Concept

Definitive viral diagnosis from metagenomic clinical samples can be extremely challenging due to low sequencing depth, large amounts of host reads and low infectious titres, especially in blood or CSF. One way to support a positive viral diagnosis is to look at alignment coverage against one or more reference sequences. When only a few reads map to the reference and genome coverage is low, positive infections often display multiple distinct alignment regions, as opposed to reads mapping to a single region or a few regions of the reference.

De Vries et al. (2021) summarize this concept succinctly in this figure (adapted):

[Figure adapted from De Vries et al. (2021)]

Positive calls in these cases can be made from coverage plots showing the distinct alignment regions, and the authors choose a threshold on the number of regions (> 3). However, coverage plots require visual assessment and may not be suitable for flagging potential hits in automated pipelines or summary reports.

Vircov attempts to make visual inspection and automated flagging easier by counting the distinct (non-overlapping) coverage regions in an alignment and reporting helpful statistics to support an educated call.
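
To make the idea concrete, here is a minimal sketch of counting distinct coverage regions by merging overlapping alignment intervals on a reference; it illustrates the concept only and is not vircov's internal implementation.

// Count distinct (non-overlapping) coverage regions by sorting alignment
// intervals on the reference and merging any that overlap or touch.
fn count_coverage_regions(mut intervals: Vec<(u64, u64)>) -> usize {
    if intervals.is_empty() {
        return 0;
    }
    intervals.sort_unstable();
    let mut regions = 1;
    let mut current_end = intervals[0].1;
    for &(start, end) in &intervals[1..] {
        if start > current_end {
            regions += 1; // gap on the reference: a new distinct region begins
            current_end = end;
        } else {
            current_end = current_end.max(end);
        }
    }
    regions
}

fn main() {
    // Reads clustering into three separate regions of a reference genome.
    let alignments = vec![(100, 250), (200, 300), (5_000, 5_150), (9_800, 9_950)];
    println!("distinct coverage regions: {}", count_coverage_regions(alignments));
}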

Clinical examples


Performance

Alignments were conducted with minimap2 -c -x sr (PAF) and minimap2 --sam-hits-only -ax sr (SAM/BAM, only aligned sequences). Peak memory is mainly determined by the aligned interval records that are stored for the overlap computations. It may vary depending on how many alignments remain after filtering, the number of aligned reads, the output format, and the size of the reference database.

  • Sample 1: ~ 70 million Illumina PE reads against ~ 70k reference genomes, ~ 1.7 million alignments
  • Sample 2: ~ 80 million Illumina PE reads against ~ 70k reference genomes, ~ 63 million alignments

.paf

  • Sample 1 (212 MB): 0.56 seconds, peak memory: 12 MB
  • Sample 2 (8.9 GB): 55.4 seconds, peak memory: 2.9 GB

Etymology

Not a very creative abbreviation of "virus coverage", but the little spectacles in the logo are a reference to Rudolf Virchow, who described such trivial concepts as cells, cancer and pathology. His surname is pronounced like vircov if you mumble the terminal v.

Contributors

  • Prof. Deborah Williamson and Prof. Lachlan Coin (principal investigators for META-GP)
  • Dr. Leon Caly (samples and sequencing for testing on clinical data)

Download Details:

Author: esteinig
Source Code: https://github.com/esteinig/vircov

License: Apache-2.0 license

#rust 


Serde Rust: Serialization Framework for Rust

Serde

*Serde is a framework for serializing and deserializing Rust data structures efficiently and generically.*


Serde in action

Cargo.toml:

[dependencies]

# The core APIs, including the Serialize and Deserialize traits. Always
# required when using Serde. The "derive" feature is only required when
# using #[derive(Serialize, Deserialize)] to make Serde work with structs
# and enums defined in your crate.
serde = { version = "1.0", features = ["derive"] }

# Each data format lives in its own crate; the sample code below uses JSON
# but you may be using a different one.
serde_json = "1.0"

 

use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize, Debug)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let point = Point { x: 1, y: 2 };

    // Convert the Point to a JSON string.
    let serialized = serde_json::to_string(&point).unwrap();

    // Prints serialized = {"x":1,"y":2}
    println!("serialized = {}", serialized);

    // Convert the JSON string back to a Point.
    let deserialized: Point = serde_json::from_str(&serialized).unwrap();

    // Prints deserialized = Point { x: 1, y: 2 }
    println!("deserialized = {:?}", deserialized);
}

Getting help

Serde is one of the most widely used Rust libraries, so any place that Rustaceans congregate will be able to help you out. For chat, consider trying the #rust-questions or #rust-beginners channels of the unofficial community Discord (invite: https://discord.gg/rust-lang-community), the #rust-usage or #beginners channels of the official Rust Project Discord (invite: https://discord.gg/rust-lang), or the #general stream in Zulip. For asynchronous, consider the [rust] tag on StackOverflow, the /r/rust subreddit which has a pinned weekly easy questions post, or the Rust Discourse forum. It's acceptable to file a support issue in this repo, but they tend not to get as many eyes as any of the above and may get closed without a response after some time.

Download Details:
Author: serde-rs
Source Code: https://github.com/serde-rs/serde
License: View license

#rust  #rustlang 


Serde JSON: JSON Support for Serde Framework

Serde JSON

Serde is a framework for serializing and deserializing Rust data structures efficiently and generically.

[dependencies]
serde_json = "1.0"


JSON is a ubiquitous open-standard format that uses human-readable text to transmit data objects consisting of key-value pairs.

{
    "name": "John Doe",
    "age": 43,
    "address": {
        "street": "10 Downing Street",
        "city": "London"
    },
    "phones": [
        "+44 1234567",
        "+44 2345678"
    ]
}

There are three common ways that you might find yourself needing to work with JSON data in Rust.

  • As text data. An unprocessed string of JSON data that you receive on an HTTP endpoint, read from a file, or prepare to send to a remote server.
  • As an untyped or loosely typed representation. Maybe you want to check that some JSON data is valid before passing it on, but without knowing the structure of what it contains. Or you want to do very basic manipulations like insert a key in a particular spot.
  • As a strongly typed Rust data structure. When you expect all or most of your data to conform to a particular structure and want to get real work done without JSON's loosey-goosey nature tripping you up.

Serde JSON provides efficient, flexible, safe ways of converting data between each of these representations.

Operating on untyped JSON values

Any valid JSON data can be manipulated in the following recursive enum representation. This data structure is serde_json::Value.

enum Value {
    Null,
    Bool(bool),
    Number(Number),
    String(String),
    Array(Vec<Value>),
    Object(Map<String, Value>),
}

A string of JSON data can be parsed into a serde_json::Value by the serde_json::from_str function. There is also from_slice for parsing from a byte slice &[u8] and from_reader for parsing from any io::Read like a File or a TCP stream.

use serde_json::{Result, Value};

fn untyped_example() -> Result<()> {
    // Some JSON input data as a &str. Maybe this comes from the user.
    let data = r#"
        {
            "name": "John Doe",
            "age": 43,
            "phones": [
                "+44 1234567",
                "+44 2345678"
            ]
        }"#;

    // Parse the string of data into serde_json::Value.
    let v: Value = serde_json::from_str(data)?;

    // Access parts of the data by indexing with square brackets.
    println!("Please call {} at the number {}", v["name"], v["phones"][0]);

    Ok(())
}

The result of square bracket indexing like v["name"] is a borrow of the data at that index, so the type is &Value. A JSON map can be indexed with string keys, while a JSON array can be indexed with integer keys. If the type of the data is not right for the type with which it is being indexed, or if a map does not contain the key being indexed, or if the index into a vector is out of bounds, the returned element is Value::Null.

When a Value is printed, it is printed as a JSON string. So in the code above, the output looks like Please call "John Doe" at the number "+44 1234567". The quotation marks appear because v["name"] is a &Value containing a JSON string and its JSON representation is "John Doe". Printing as a plain string without quotation marks involves converting from a JSON string to a Rust string with as_str() or avoiding the use of Value as described in the following section.

The Value representation is sufficient for very basic tasks but can be tedious to work with for anything more significant. Error handling is verbose to implement correctly, for example imagine trying to detect the presence of unrecognized fields in the input data. The compiler is powerless to help you when you make a mistake, for example imagine typoing v["name"] as v["nmae"] in one of the dozens of places it is used in your code.
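
As a small illustration of the behaviour described above (indexing a missing or mistyped key silently yields Value::Null, and as_str() strips the JSON quotation marks), consider the following sketch; the data is made up:

use serde_json::{json, Value};

fn main() {
    let v: Value = json!({ "name": "John Doe", "age": 43 });

    // Displaying a Value prints its JSON representation, quotes included.
    println!("{}", v["name"]);                   // "John Doe"

    // as_str() returns the underlying Rust &str without quotes.
    println!("{}", v["name"].as_str().unwrap()); // John Doe

    // A typo in the key produces Value::Null rather than an error.
    assert!(v["nmae"].is_null());
}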

Parsing JSON as strongly typed data structures

Serde provides a powerful way of mapping JSON data into Rust data structures largely automatically.

use serde::{Deserialize, Serialize};
use serde_json::Result;

#[derive(Serialize, Deserialize)]
struct Person {
    name: String,
    age: u8,
    phones: Vec<String>,
}

fn typed_example() -> Result<()> {
    // Some JSON input data as a &str. Maybe this comes from the user.
    let data = r#"
        {
            "name": "John Doe",
            "age": 43,
            "phones": [
                "+44 1234567",
                "+44 2345678"
            ]
        }"#;

    // Parse the string of data into a Person object. This is exactly the
    // same function as the one that produced serde_json::Value above, but
    // now we are asking it for a Person as output.
    let p: Person = serde_json::from_str(data)?;

    // Do things just like with any other Rust data structure.
    println!("Please call {} at the number {}", p.name, p.phones[0]);

    Ok(())
}

This is the same serde_json::from_str function as before, but this time we assign the return value to a variable of type Person so Serde will automatically interpret the input data as a Person and produce informative error messages if the layout does not conform to what a Person is expected to look like.

Any type that implements Serde's Deserialize trait can be deserialized this way. This includes built-in Rust standard library types like Vec<T> and HashMap<K, V>, as well as any structs or enums annotated with #[derive(Deserialize)].

Once we have p of type Person, our IDE and the Rust compiler can help us use it correctly like they do for any other Rust code. The IDE can autocomplete field names to prevent typos, which was impossible in the serde_json::Value representation. And the Rust compiler can check that when we write p.phones[0], then p.phones is guaranteed to be a Vec<String> so indexing into it makes sense and produces a String.

The necessary setup for using Serde's derive macros is explained on the Using derive page of the Serde site.
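
Since standard library collections implement Deserialize (as noted above), a JSON object with uniform value types can also be parsed without defining a struct at all; a minimal sketch with made-up data:

use std::collections::HashMap;

fn main() -> serde_json::Result<()> {
    let data = r#"{ "apples": 3, "oranges": 7 }"#;

    // HashMap<String, u32> implements Deserialize, so no #[derive] is needed.
    let counts: HashMap<String, u32> = serde_json::from_str(data)?;
    println!("{:?}", counts);
    Ok(())
}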

Constructing JSON values

Serde JSON provides a json! macro to build serde_json::Value objects with very natural JSON syntax.

use serde_json::json;

fn main() {
    // The type of `john` is `serde_json::Value`
    let john = json!({
        "name": "John Doe",
        "age": 43,
        "phones": [
            "+44 1234567",
            "+44 2345678"
        ]
    });

    println!("first phone number: {}", john["phones"][0]);

    // Convert to a string of JSON and print it out
    println!("{}", john.to_string());
}

The Value::to_string() function converts a serde_json::Value into a String of JSON text.

One neat thing about the json! macro is that variables and expressions can be interpolated directly into the JSON value as you are building it. Serde will check at compile time that the value you are interpolating is able to be represented as JSON.

let full_name = "John Doe";
let age_last_year = 42;

// The type of `john` is `serde_json::Value`
let john = json!({
    "name": full_name,
    "age": age_last_year + 1,
    "phones": [
        format!("+44 {}", random_phone())
    ]
});

This is amazingly convenient, but we have the problem we had before with Value: the IDE and Rust compiler cannot help us if we get it wrong. Serde JSON provides a better way of serializing strongly-typed data structures into JSON text.

Creating JSON by serializing data structures

A data structure can be converted to a JSON string by serde_json::to_string. There is also serde_json::to_vec which serializes to a Vec<u8> and serde_json::to_writer which serializes to any io::Write such as a File or a TCP stream.

use serde::{Deserialize, Serialize};
use serde_json::Result;

#[derive(Serialize, Deserialize)]
struct Address {
    street: String,
    city: String,
}

fn print_an_address() -> Result<()> {
    // Some data structure.
    let address = Address {
        street: "10 Downing Street".to_owned(),
        city: "London".to_owned(),
    };

    // Serialize it to a JSON string.
    let j = serde_json::to_string(&address)?;

    // Print, write to a file, or send to an HTTP server.
    println!("{}", j);

    Ok(())
}

Any type that implements Serde's Serialize trait can be serialized this way. This includes built-in Rust standard library types like Vec<T> and HashMap<K, V>, as well as any structs or enums annotated with #[derive(Serialize)].
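
Likewise, a standard collection can be serialized without any derive at all; the sketch below (with made-up data) streams a HashMap to stdout via serde_json::to_writer:

use std::collections::HashMap;
use std::io;

fn main() -> serde_json::Result<()> {
    let mut scores: HashMap<String, u32> = HashMap::new();
    scores.insert("alice".to_owned(), 10);
    scores.insert("bob".to_owned(), 7);

    // to_writer accepts any io::Write, such as a File, a TCP stream, or stdout.
    serde_json::to_writer(io::stdout(), &scores)?;
    println!();
    Ok(())
}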

Performance

It is fast. You should expect in the ballpark of 500 to 1000 megabytes per second deserialization and 600 to 900 megabytes per second serialization, depending on the characteristics of your data. This is competitive with the fastest C and C++ JSON libraries or even 30% faster for many use cases. Benchmarks live in the serde-rs/json-benchmark repo.

Getting help

Serde is one of the most widely used Rust libraries, so any place that Rustaceans congregate will be able to help you out. For chat, consider trying the #rust-questions or #rust-beginners channels of the unofficial community Discord (invite: https://discord.gg/rust-lang-community), the #rust-usage or #beginners channels of the official Rust Project Discord (invite: https://discord.gg/rust-lang), or the #general stream in Zulip. For asynchronous, consider the [rust] tag on StackOverflow, the /r/rust subreddit which has a pinned weekly easy questions post, or the Rust Discourse forum. It's acceptable to file a support issue in this repo, but they tend not to get as many eyes as any of the above and may get closed without a response after some time.

No-std support

As long as there is a memory allocator, it is possible to use serde_json without the rest of the Rust standard library. This is supported on Rust 1.36+. Disable the default "std" feature and enable the "alloc" feature:

[dependencies]
serde_json = { version = "1.0", default-features = false, features = ["alloc"] }

For JSON support in Serde without a memory allocator, please see the serde-json-core crate.

Link: https://crates.io/crates/serde_json

#rust  #rustlang  #encode   #json 

Oyente | An Analysis Tool for Smart Contracts

Oyente

An Analysis Tool for Smart Contracts

This repository is currently maintained by Xiao Liang Yu (@yxliang01). If you encounter any bugs or usage issues, please feel free to create an issue on our issue tracker.

Quick Start

A container with the required dependencies configured can be found here. The image is, however, outdated. We are working on pushing the latest image to Docker Hub for your convenience. If you experience any issue with this image, please try to build a new Docker image by pulling this codebase before opening an issue.

To open the container, install docker and run:

docker pull luongnguyen/oyente && docker run -i -t luongnguyen/oyente

To evaluate the greeter contract inside the container, run:

cd /oyente/oyente && python oyente.py -s greeter.sol

and you are done!

Note - If you need the version of Oyente referred to in the paper, run the container from here

To run the web interface, execute docker run -w /oyente/web -p 3000:3000 oyente:latest ./bin/rails server

Custom Docker image build

docker build -t oyente .
docker run -it -p 3000:3000 -e "OYENTE=/oyente/oyente" oyente:latest

Open a web browser to http://localhost:3000 for the graphical interface.

Installation

Create and activate a Python virtualenv:

python -m virtualenv env
source env/bin/activate

Install Oyente via pip:

$ pip2 install oyente

Dependencies:

The following require a Linux system to fulfill. macOS instructions forthcoming.

  • solc
  • evm

Full installation

Install the following dependencies

solc

$ sudo add-apt-repository ppa:ethereum/ethereum
$ sudo apt-get update
$ sudo apt-get install solc

evm from go-ethereum

  1. Download a binary release from https://geth.ethereum.org/downloads/, or
  2. Install from the Ethereum PPA if you are using Ubuntu:

$ sudo apt-get install software-properties-common
$ sudo add-apt-repository -y ppa:ethereum/ethereum
$ sudo apt-get update
$ sudo apt-get install ethereum

z3 Theorem Prover version 4.5.0.

Download the source code of version z3-4.5.0

Install z3 using Python bindings

$ python scripts/mk_make.py --python
$ cd build
$ make
$ sudo make install

Requests library

pip install requests

web3 library

pip install web3

Evaluating Ethereum Contracts

#evaluate a local solidity contract
python oyente.py -s <contract filename>

#evaluate a local solidity with option -a to verify assertions in the contract
python oyente.py -a -s <contract filename>

#evaluate a local evm contract
python oyente.py -s <contract filename> -b

#evaluate a remote contract
python oyente.py -ru https://gist.githubusercontent.com/loiluu/d0eb34d473e421df12b38c12a7423a61/raw/2415b3fb782f5d286777e0bcebc57812ce3786da/puzzle.sol

And that's it! Run python oyente.py --help for a list of options.

Paper

The accompanying paper explaining the bugs detected by the tool can be found here.

Miscellaneous Utilities

A collection of the utilities that were developed for the paper are in misc_utils. Use them at your own risk - they have mostly been disposable.

  1. generate-graphs.py - Contains a number of functions to get statistics from contracts.
  2. get_source.py - The get_contract_code function can be used to retrieve contract source from EtherScan
  3. transaction_scrape.py - Contains functions to retrieve up-to-date transaction information for a particular contract.

Benchmarks

Note: This is an improved version of the tool used for the paper. Benchmarks are not for direct comparison.

To run the benchmarks, it is best to use the docker container as it includes the blockchain snapshot necessary. In the container, run batch_run.py after activating the virtualenv. Results are in results.json once the benchmark completes.

The benchmarks take a long time and a lot of RAM in any but the largest of clusters, beware.

Some analytics regarding the number of contracts tested, the number of contracts analysed, etc. are collected when running this benchmark.

Contributing

Check out our contribution guide and the code structure here.


Download Details:
Author: enzymefinance
Source Code: https://github.com/enzymefinance/oyente
License: GPL-3.0 license

#blockchain #smartcontract #ethereum

Rust Lang Course For Beginner In 2021: Guessing Game

 What we learn in this chapter:
- Rust number types and their default
- First exposure to #Rust modules and the std::io module to read input from the terminal
- Rust Variable Shadowing
- Rust Loop keyword
- Rust if/else
- First exposure to #Rust match keyword

=== Content:
00:00 - Intro & Setup
02:11 - The Plan
03:04 - Variable Secret
04:03 - Number Types
05:45 - Mutability recap
06:22 - Ask the user
07:45 - First intro to module std::io
08:29 - Rust naming conventions
09:22 - Read user input io::stdin().read_line(&mut guess)
12:46 - Break & Understand
14:20 - Parse string to number
17:10 - Variable Shadowing
18:46 - If / Else - You Win, You Lose
19:28 - Loop
20:38 - Match
23:19 - Random with rand
26:35 - Run it all
27:09 - Conclusion and next episode
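
For reference, here is a compact sketch of the patterns the episode covers (reading input with std::io, shadowing the guess while parsing it, loop, match); it uses a fixed secret instead of the rand crate so it compiles without dependencies, and it is not the video's exact code:

use std::cmp::Ordering;
use std::io;

fn main() {
    // Fixed secret so this sketch has no external dependency; the episode
    // generates it with the rand crate instead.
    let secret: u32 = 42;

    loop {
        println!("Guess the number:");

        let mut guess = String::new();
        io::stdin()
            .read_line(&mut guess)
            .expect("failed to read line");

        // Variable shadowing: the String `guess` becomes a u32 `guess`.
        let guess: u32 = match guess.trim().parse() {
            Ok(num) => num,
            Err(_) => continue,
        };

        match guess.cmp(&secret) {
            Ordering::Less => println!("Too small!"),
            Ordering::Greater => println!("Too big!"),
            Ordering::Equal => {
                println!("You win!");
                break;
            }
        }
    }
}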

#rust