Royce Reinger

Encode & Decode Emoji Unicode Characters into Emoji-cheat-sheet form

Rumoji

This is a tool to convert Emoji Unicode codepoints into the human-friendly codes used by http://www.emoji-cheat-sheet.com/ and back again.

tl;dr

By doing this, you can ensure that users across devices can see the author's intention. You can always show users an image, but you can't show them a range of characters their system does not support.

This gem is primarily for handling emoji characters in user-generated content. Depending on your technical stack, these characters could end up being lost.

Usage

Rumoji.encode(str)
# Takes a String, transforms Emoji into cheat-sheet codes

Rumoji.encode(str) { |emoji| #your code here }
# Takes a String, transforms Emoji into whatever you want

Rumoji.decode(str)
# Does the reverse of encode

Rumoji.encode_io(read, write)
# For an IO pipe (a read stream, and a write stream), transform Emoji from the
# read end, and write the cheat-sheet codes on the write end.

Rumoji.decode_io(read, write)
# Same thing but in reverse!

Installation

gem install rumoji

Note that rumoji has only been tested in Rubies >= 1.9.

Some examples:

puts Rumoji.encode("Lack of cross-device emoji support makes me 😭")
#=> Lack of cross-device emoji support makes me :sob:

Rumoji.encode_io(StringIO.new("💩")).string
#=> ":poop:"

Here's a fun file, funfile.rb:

Rumoji.decode_io($stdin, $stdout)

On the command line

echo "But Rumoji makes encoding issues a :joy:" | ruby ./funfile.rb
#=> But Rumoji makes encoding issues a 😂

Emoji methods

.code

The symbol of the emoji surrounded with colons

Rumoji.encode("😭") {|emoji| emoji.code}
#=> ":sob:"

.symbol

The symbol of the emoji

Rumoji.encode("😭") {|emoji| emoji.symbol}
#=> "sob"

.multiple?

Returns true if the emoji is made up of multiple code points. E.g. 🇺🇸

Rumoji.encode("🇺🇸") {|emoji| emoji.multiple?}
#=> true

.string

The raw emoji

Rumoji.encode("😭") {|emoji| emoji.string}
#=> "😭"

Implement the emoji codes from emoji-cheat-sheet.com using a tool like gemoji along with Rumoji, and you'll easily be able to transform user input with raw emoji unicode into images you can show to all users.

Having trouble discerning what's happening in this README? You might be on a device with NO emoji support! All the more reason to use Rumoji. Transcode the raw unicode into something users can understand across devices!

Thanks!

Why would you want to do this? Read this blog post: http://mwunsch.tumblr.com/post/34721548842/we-need-to-talk-about-emoji

Author: Mwunsch
Source Code: https://github.com/mwunsch/rumoji 
License: MIT license

#ruby #encode #emoji 

Gordon Taylor

Lexicographically Ordered integers for Level(up)

lexicographic-integer-encoding

Lexicographically ordered integers for level(up). Wraps lexicographic-integer.

usage with level

const level = require('level')
const lexint = require('lexicographic-integer-encoding')('hex')

const db = level('./db', { keyEncoding: lexint })

db.put(2, 'example', (err) => {
  db.put(10, 'example', (err) => {
    // Without our encoding, the keys would sort as 10, 2.
    db.createKeyStream().on('data', console.log) // 2, 10
  })
})

usage with levelup

const levelup = require('levelup')
const encode = require('encoding-down')
const leveldown = require('leveldown')
const lexint = require('lexicographic-integer-encoding')('hex')

const db = levelup(encode(leveldown('./db'), { keyEncoding: lexint }))

api

lexint = require('lexicographic-integer-encoding')(encoding, [options])

  • encoding (string, required): 'hex' or 'buffer'
  • options.strict (boolean): opt-in to type-checking input. If true, encode will throw:
    • A TypeError if input is not a number or is NaN
    • A RangeError if input is < 0 or > Number.MAX_SAFE_INTEGER

Returns a level-codec compliant encoding object.

see also

install

With npm do:

npm install lexicographic-integer-encoding

Author: vweevers
Source Code: https://github.com/vweevers/lexicographic-integer-encoding 
License: MIT license

#javascript #encode 

Awesome Rust

STFU-8: Sorta Text format in UTF-8 Written In Rust

STFU-8 is a hacky text encoding/decoding protocol for data that might be not quite UTF-8 but is still mostly UTF-8. It is based on the syntax of the repr created when you write (or print) binary text in rust, python, C or other common programming languages.

Its primary purpose is to allow a human to visualize and edit "data" that is mostly (or fully) visible UTF-8 text. It encodes all non-visible or non-UTF-8-compliant bytes as long-form text (i.e. ESC becomes the full string r"\x1B"). It can also encode/decode ill-formed UTF-16.

Comparison to other formats:

  • UTF-8 (i.e. std::str): UTF-8 is a standardized format for encoding human understandable text in any language on the planet. It is the reason the internet can be understood by almost anyone and should be the primary way that text is encoded. However, not everything that is "UTF-8 like" follows the standard exactly. For instance:
    • The linux command line defines ANSI escape codes to provide styles like color, bold, italic, etc. Even though almost everything printed to a terminal is UTF-8 text these "escape codes" might not be, and even if they are UTF-8, they are not visible characters.
    • Windows paths are not necessarily UTF-8 compliant as they can have ill-formed text (see the "UTF-16 Ill Formed Text" section below).
    • There might be other cases you can think of or want to create. In general, try not to create more use cases if you don't have to.
  • rust's OsStr: OsStr is the "cross platform" type for handling system-specific strings, mainly in file paths. Unlike STFU-8, it is not (always) coercible into UTF-8 and therefore cannot be serialized into JSON or other formats.
  • WTF-8 (rust-wtf8): is great for interoperating with different UTF standards but cannot be used to transmit data over the internet. The spec states: "WTF-8 must not be used to represent text in a file format or for transmission over the Internet."
  • base64 (base64): also encodes binary data as UTF-8. If your data is actually binary (i.e. not text) then use base64. However, if your data was formerly text (or mostly text) then encoding to base64 will make it completely un(human)readable.
  • Array[u8]: obviously great if your data is actually binary (i.e. NOT TEXT) and you don't need to put it into a UTF-8 encoding. However, an array of bytes (i.e. [0x72, 0x65, 0x61, 0x64, 0x20, 0x69, 0x74]) is not human readable. Even if it were in pure ASCII the only ones who can read it efficiently are low-level programming Gods who have never figured out how to debug-print their ASCII.
  • STFU-8 (this crate): is "good" when you want to have only printable/hand-editable text (and your data is mostly UTF-8) but the data might have a couple of binary/non-printable/ill-formed pieces. It is very poor if your data is actually binary, requiring (on average) a mapping of 4/1 for binary data.

Specification

In simple terms, encoded STFU-8 is itself always valid unicode which decodes to binary (the binary is not necessarily UTF-8). It differs from unicode in that single \ items are illegal. The following patterns are legal:

  • \\: decodes to the backward-slash (\) byte (\x5c)
  • \t: decodes to the tab byte (\x09)
  • \n: decodes to the newline byte (\x0A)
  • \r: decodes to the carriage-return byte (\x0D)
  • \xXX where XX are exactly two case-insensitive hexadecimal digits: decodes to the \xXX byte, where XX is a hexadecimal number (example: \x9F, \xaB or \x05). This never gets resolved into a code point; the value is pushed directly into the decoder stream.
  • \uXXXXXX where XXXXXX are exactly six case-insensitive hexadecimal digits: decodes to a 24-bit number that typically represents a unicode code point. If the value is a unicode code point it will always be decoded as such. Otherwise stfu8 will attempt to store the value into the decoder (if the value is too large for the decoding type it will be an error).

stfu8 provides 2 different categories of functions for encoding/decoding data that are not necessarily interoperable (don't decode output created from encode_u8 with decode_u16).

  • encode_u8(&[u8]) -> String and decode_u8(&str) -> Vec<u8>: encodes or decodes an array of u8 values to/from STFU-8, primarily used for interfacing with binary/nonvisible data that is almost UTF-8.
  • encode_u16(&[u16]) -> String and decode_u16(&str) -> Vec<u16>: encodes or decodes an array of u16 values to/from STFU-8, primarily used for interfacing with legacy UTF-16 formats that may contain ill-formed text (see below) but also converts unprintable characters.

There are some general rules for encoding and decoding:

  • If \u... cannot be resolved into a valid UTF code point it must fit into the decoder. For instance, trying to decode "\u00DEED" (which is a UTF-16 trail surrogate) using decode_u8 will fail, but will succeed with decode_u16.
  • No escaped values are ever chained. For example, "\x01\x02" will be [0x01, 0x02] not [0x0102] -- even if you use decode_u16.
  • Values escaped with \x... are always copied verbatim into the decoder. I.e. \xFE is a valid UTF-32 code point, but if decoded with decode_u8 it will be 0xFE in the buffer, not two bytes of data as the UTF-8 character 'þ'. Note that with decode_u16 0xFE is a valid UTF-16 code point, so when re-encoded it would be the 'þ' character. Moral of the story: don't mix inputs/outputs of the u8 and u16 functions.

Tab, newline, and carriage-return characters are "visible", so encoding them in "pretty form" is optional.
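
To make the round trip concrete, here is a minimal sketch using the u8 functions above (the stfu8 crate is assumed as a dependency, and the escaped output shown in the comment is illustrative):

fn main() {
    // Mostly-UTF-8 bytes with one non-printable byte (ESC, 0x1B).
    let raw: &[u8] = b"ls --color\x1b[0m";

    // Encoding always produces valid UTF-8 text; the ESC byte becomes an escape.
    let encoded = stfu8::encode_u8(raw);
    println!("{}", encoded); // something like: ls --color\x1B[0m

    // Decoding restores the original bytes exactly.
    let decoded = stfu8::decode_u8(&encoded).unwrap();
    assert_eq!(decoded.as_slice(), raw);
}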

UTF-16 Ill Formed Text

The problem is succinctly stated here:

http://unicode.org/faq/utf_bom.html

Q: How do I convert an unpaired UTF-16 surrogate to UTF-8?

A different issue arises if an unpaired surrogate is encountered when converting ill-formed UTF-16 data. By representing such an unpaired surrogate on its own as a 3-byte sequence, the resulting UTF-8 data stream would become ill-formed. While it faithfully reflects the nature of the input, Unicode conformance requires that encoding form conversion always results in a valid data stream. Therefore a converter must treat this as an error. [AF]

Also, from the WTF-8 spec

As a result, [unpaired] surrogates do occur in practice and need to be preserved. For example:

In ECMAScript (a.k.a. JavaScript), a String value is defined as a sequence of 16-bit integers that usually represents UTF-16 text but may or may not be well-formed. Windows applications normally use UTF-16, but the file system treats path and file names as an opaque sequence of WCHARs (16-bit code units).

We say that strings in these systems are encoded in potentially ill-formed UTF-16 or WTF-16.

Basically: you can't (always) convert from UTF-16 to UTF-8 and it's a real bummer. WTF-8, while kind of an answer to this problem, doesn't allow me to serialize UTF-16 into a UTF-8 format, send it to my webapp, edit it (as a human), and send it back. That is what STFU-8 is for.

Download Details:
Author: vitiral
Source Code: https://github.com/vitiral/stfu8
License: Unknown, MIT licenses found

#rust  #rustlang  #encode  #yaml 

Awesome Rust

YAML Rust: A Pure Rust YAML Implementation

yaml-rust

The missing YAML 1.2 implementation for Rust.

yaml-rust is a pure Rust YAML 1.2 implementation, which enjoys the memory safety property and other benefits from the Rust language. The parser is heavily influenced by libyaml and yaml-cpp.

Quick Start

Add the following to the Cargo.toml of your project:

[dependencies]
yaml-rust = "0.4"

and import:

extern crate yaml_rust;

Use yaml::YamlLoader to load the YAML documents and access them as Vec/HashMap:

extern crate yaml_rust;
use yaml_rust::{YamlLoader, YamlEmitter};

fn main() {
    let s =
"
foo:
    - list1
    - list2
bar:
    - 1
    - 2.0
";
    let docs = YamlLoader::load_from_str(s).unwrap();

    // Multi document support, doc is a yaml::Yaml
    let doc = &docs[0];

    // Debug support
    println!("{:?}", doc);

    // Index access for map & array
    assert_eq!(doc["foo"][0].as_str().unwrap(), "list1");
    assert_eq!(doc["bar"][1].as_f64().unwrap(), 2.0);

    // Chained key/array access is checked and won't panic;
    // it returns BadValue if the key or index doesn't exist.
    assert!(doc["INVALID_KEY"][100].is_badvalue());

    // Dump the YAML object
    let mut out_str = String::new();
    {
        let mut emitter = YamlEmitter::new(&mut out_str);
        emitter.dump(doc).unwrap(); // dump the YAML object to a String
    }
    println!("{}", out_str);
}

Note that yaml_rust::Yaml implements Index<&'a str> & Index<usize>:

  • Index<usize> assumes the container is an Array
  • Index<&'a str> assumes the container is a string to value Map
  • otherwise, Yaml::BadValue is returned

If your document does not conform to this convention (e.g. map with complex type key), you can use the Yaml::as_XXX family API to access your documents.
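
For example, a minimal sketch (not from the original README) that walks a mapping with as_hash and the other as_* accessors:

use yaml_rust::YamlLoader;

fn main() {
    let docs = YamlLoader::load_from_str("answer: 42").unwrap();
    let doc = &docs[0];

    // Fall back to the as_* accessors when index access is not enough.
    let map = doc.as_hash().unwrap();
    for (key, value) in map {
        println!("{:?} => {:?}", key.as_str(), value.as_i64());
    }
}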

Features

  • Pure Rust
  • Ruby-like Array/Hash access API
  • Low-level YAML events emission

Specification Compliance

This implementation aims to provide a YAML parser fully compatible with the YAML 1.2 specification. The parser can correctly parse almost all examples in the specification, except for the following known bugs:

  • Empty plain scalar in certain contexts

However, the widely used library libyaml also fails to parse these examples, so it may not be a huge problem for most users.

Goals

  • Encoder
  • Tag directive
  • Alias while deserialization

Minimum Rust version policy

This crate's minimum supported rustc version is 1.31 (released with Rust 2018, after v0.4.3), as this is the currently known minimum version for regex as well.

Contribution

Fork & PR on Github.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Download Details:
Author: chyh1990
Source Code: https://github.com/chyh1990/yaml-rust
License: Apache-2.0, MIT licenses found

#rust #yaml  #rustlang  #encode 

Awesome Rust

Quick XML: Rust High Performance Xml Reader and Writer

quick-xml

High performance xml pull reader/writer.

The reader:

  • is almost zero-copy (use of Cow whenever possible)
  • is easy on memory allocation (the API provides a way to reuse buffers)
  • supports various encodings (with the encoding feature), namespace resolution, and special characters.

Example

Reader

use quick_xml::Reader;
use quick_xml::events::Event;

let xml = r#"<tag1 att1 = "test">
                <tag2><!--Test comment-->Test</tag2>
                <tag2>
                    Test 2
                </tag2>
            </tag1>"#;

let mut reader = Reader::from_str(xml);
reader.trim_text(true);

let mut count = 0;
let mut txt = Vec::new();
let mut buf = Vec::new();

// The `Reader` does not implement `Iterator` because it outputs borrowed data (`Cow`s)
loop {
    // NOTE: this is the generic case when we don't know about the input BufRead.
    // when the input is a &str or a &[u8], we don't actually need to use another
    // buffer, we could directly call `reader.read_event_unbuffered()`
    match reader.read_event(&mut buf) {
        Ok(Event::Start(ref e)) => {
            match e.name() {
                b"tag1" => println!("attributes values: {:?}",
                                    e.attributes().map(|a| a.unwrap().value).collect::<Vec<_>>()),
                b"tag2" => count += 1,
                _ => (),
            }
        },
        Ok(Event::Text(e)) => txt.push(e.unescape_and_decode(&reader).unwrap()),
        Ok(Event::Eof) => break, // exits the loop when reaching end of file
        Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
        _ => (), // There are several other `Event`s we do not consider here
    }

    // if we don't keep a borrow elsewhere, we can clear the buffer to keep memory usage low
    buf.clear();
}

Writer

use quick_xml::Writer;
use quick_xml::Reader;
use quick_xml::events::{Event, BytesEnd, BytesStart};
use std::io::Cursor;
use std::iter;

let xml = r#"<this_tag k1="v1" k2="v2"><child>text</child></this_tag>"#;
let mut reader = Reader::from_str(xml);
reader.trim_text(true);
let mut writer = Writer::new(Cursor::new(Vec::new()));
let mut buf = Vec::new();
loop {
    match reader.read_event(&mut buf) {
        Ok(Event::Start(ref e)) if e.name() == b"this_tag" => {

            // creates a new element ... alternatively we could reuse `e` by calling
            // `e.into_owned()`
            let mut elem = BytesStart::owned(b"my_elem".to_vec(), "my_elem".len());

            // collect existing attributes
            elem.extend_attributes(e.attributes().map(|attr| attr.unwrap()));

            // add a new my-key="some value" attribute
            elem.push_attribute(("my-key", "some value"));

            // writes the event to the writer
            assert!(writer.write_event(Event::Start(elem)).is_ok());
        },
        Ok(Event::End(ref e)) if e.name() == b"this_tag" => {
            assert!(writer.write_event(Event::End(BytesEnd::borrowed(b"my_elem"))).is_ok());
        },
        Ok(Event::Eof) => break,
    // you can use either `e` or `&e` if you don't want to move the event
        Ok(e) => assert!(writer.write_event(&e).is_ok()),
        Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
    }
    buf.clear();
}

let result = writer.into_inner().into_inner();
let expected = r#"<my_elem k1="v1" k2="v2" my-key="some value"><child>text</child></my_elem>"#;
assert_eq!(result, expected.as_bytes());

Serde

When using the serialize feature, quick-xml can be used with serde's Serialize/Deserialize traits.

Here is an example deserializing crates.io source:

// Cargo.toml
// [dependencies]
// serde = { version = "1.0", features = [ "derive" ] }
// quick-xml = { version = "0.22", features = [ "serialize" ] }
use serde::Deserialize;
use quick_xml::de::{from_str, DeError};

#[derive(Debug, Deserialize, PartialEq)]
struct Link {
    rel: String,
    href: String,
    sizes: Option<String>,
}

#[derive(Debug, Deserialize, PartialEq)]
#[serde(rename_all = "lowercase")]
enum Lang {
    En,
    Fr,
    De,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Head {
    title: String,
    #[serde(rename = "link", default)]
    links: Vec<Link>,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Script {
    src: String,
    integrity: String,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Body {
    #[serde(rename = "script", default)]
    scripts: Vec<Script>,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Html {
    lang: Option<String>,
    head: Head,
    body: Body,
}

fn crates_io() -> Result<Html, DeError> {
    let xml = "<!DOCTYPE html>
        <html lang=\"en\">
          <head>
            <meta charset=\"utf-8\">
            <meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">
            <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">

            <title>crates.io: Rust Package Registry</title>


        <!-- EMBER_CLI_FASTBOOT_TITLE --><!-- EMBER_CLI_FASTBOOT_HEAD -->
        <link rel=\"manifest\" href=\"/manifest.webmanifest\">
        <link rel=\"apple-touch-icon\" href=\"/cargo-835dd6a18132048a52ac569f2615b59d.png\" sizes=\"227x227\">

            <link rel=\"stylesheet\" href=\"/assets/vendor-8d023d47762d5431764f589a6012123e.css\" integrity=\"sha256-EoB7fsYkdS7BZba47+C/9D7yxwPZojsE4pO7RIuUXdE= sha512-/SzGQGR0yj5AG6YPehZB3b6MjpnuNCTOGREQTStETobVRrpYPZKneJwcL/14B8ufcvobJGFDvnTKdcDDxbh6/A==\" >
            <link rel=\"stylesheet\" href=\"/assets/cargo-cedb8082b232ce89dd449d869fb54b98.css\" integrity=\"sha256-S9K9jZr6nSyYicYad3JdiTKrvsstXZrvYqmLUX9i3tc= sha512-CDGjy3xeyiqBgUMa+GelihW394pqAARXwsU+HIiOotlnp1sLBVgO6v2ZszL0arwKU8CpvL9wHyLYBIdfX92YbQ==\" >


            <link rel=\"shortcut icon\" href=\"/favicon.ico\" type=\"image/x-icon\">
            <link rel=\"icon\" href=\"/cargo-835dd6a18132048a52ac569f2615b59d.png\" type=\"image/png\">
            <link rel=\"search\" href=\"/opensearch.xml\" type=\"application/opensearchdescription+xml\" title=\"Cargo\">
          </head>
          <body>
            <!-- EMBER_CLI_FASTBOOT_BODY -->
            <noscript>
                <div id=\"main\">
                    <div class='noscript'>
                        This site requires JavaScript to be enabled.
                    </div>
                </div>
            </noscript>

            <script src=\"/assets/vendor-bfe89101b20262535de5a5ccdc276965.js\" integrity=\"sha256-U12Xuwhz1bhJXWyFW/hRr+Wa8B6FFDheTowik5VLkbw= sha512-J/cUUuUN55TrdG8P6Zk3/slI0nTgzYb8pOQlrXfaLgzr9aEumr9D1EzmFyLy1nrhaDGpRN1T8EQrU21Jl81pJQ==\" ></script>
            <script src=\"/assets/cargo-4023b68501b7b3e17b2bb31f50f5eeea.js\" integrity=\"sha256-9atimKc1KC6HMJF/B07lP3Cjtgr2tmET8Vau0Re5mVI= sha512-XJyBDQU4wtA1aPyPXaFzTE5Wh/mYJwkKHqZ/Fn4p/ezgdKzSCFu6FYn81raBCnCBNsihfhrkb88uF6H5VraHMA==\" ></script>

          </body>
        </html>
}";
    let html: Html = from_str(xml)?;
    assert_eq!(&html.head.title, "crates.io: Rust Package Registry");
    Ok(html)
}

Credits

This has largely been inspired by serde-xml-rs. quick-xml follows its convention for deserialization, including the $value special name.

The original quick-xml was developed by @tafia and abandoned around the end of 2021.

Parsing the "value" of a tag

If you have an input of the form <foo abc="xyz">bar</foo>, and you want to get at the bar, you can use the special name $value:

struct Foo {
    pub abc: String,
    #[serde(rename = "$value")]
    pub body: String,
}
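
A hedged usage sketch of the mapping above (it assumes the serialize feature and the field-name conventions of the 0.22-era deserializer; the struct is the one from the snippet, with a derive added):

use quick_xml::de::from_str;
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Foo {
    pub abc: String,
    #[serde(rename = "$value")]
    pub body: String,
}

fn main() {
    // `abc` is filled from the attribute, `body` from the element's text.
    let foo: Foo = from_str(r#"<foo abc="xyz">bar</foo>"#).unwrap();
    assert_eq!(foo.abc, "xyz");
    assert_eq!(foo.body, "bar");
}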

Unflattening structs into verbose XML

If your XML files look like <root><first>value</first><second>value</second></root>, you can (de)serialize them with the special name prefix $unflatten=:

struct Root {
    #[serde(rename = "$unflatten=first")]
    first: String,
    #[serde(rename = "$unflatten=second")]
    other_field: String,
}

Serializing unit variants as primitives

The $primitive prefix lets you serialize enum variants without associated values (internally referred to as unit variants) as primitive strings rather than self-closing tags. Consider the following definitions:

enum Foo {
    #[serde(rename = "$primitive=Bar")]
    Bar
}

struct Root {
    foo: Foo
}

Serializing Root { foo: Foo::Bar } will then yield <Root foo="Bar"/> instead of <Root><Bar/></Root>.

Performance

Note that despite not focusing on performance (there are several unnecessary copies), it remains about 10x faster than serde-xml-rs.

Features

  • encoding: support non utf8 xmls
  • serialize: support serde Serialize/Deserialize

Performance

Benchmarking is hard and the results depend on your input file and your machine.

Here on my particular file, quick-xml is around 50 times faster than the xml-rs crate. (Measurements were done while this crate was named quick-xml.)

// quick-xml benches
test bench_quick_xml            ... bench:     198,866 ns/iter (+/- 9,663)
test bench_quick_xml_escaped    ... bench:     282,740 ns/iter (+/- 61,625)
test bench_quick_xml_namespaced ... bench:     389,977 ns/iter (+/- 32,045)

// same bench with xml-rs
test bench_xml_rs               ... bench:  14,468,930 ns/iter (+/- 321,171)

// serde-xml-rs vs serialize feature
test bench_serde_quick_xml      ... bench:   1,181,198 ns/iter (+/- 138,290)
test bench_serde_xml_rs         ... bench:  15,039,564 ns/iter (+/- 783,485)

For a feature and performance comparison, you can also have a look at RazrFalcon's parser comparison table.

Contribute

Any PR is welcomed!

docs.rs

Syntax is inspired by xml-rs.

Download Details:
Author: tafia
Source Code: https://github.com/tafia/quick-xml
License: MIT license

#rust  #rustlang  #encode 

Awesome Rust

SXD XPath: An XML XPath Library in Rust

SXD-XPath

An XML XPath library in Rust

Overview

The project is broken into two crates:

  1. document - Basic DOM manipulation and reading/writing XML from strings.
  2. xpath - Implementation of XPath 1.0 expressions.

There are also scattered utilities for playing around at the command line.

In the future, I hope to add support for XSLT 1.0.
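
As a rough sketch of how the two crates fit together (based on the documented parser::parse and evaluate_xpath helpers; treat the details as an illustration rather than an official example):

use sxd_document::parser;
use sxd_xpath::evaluate_xpath;

fn main() {
    // Parse a document with sxd-document...
    let package = parser::parse("<root><name>Ferris</name></root>").expect("well-formed XML");
    let document = package.as_document();

    // ...then evaluate an XPath 1.0 expression against it with sxd-xpath.
    let value = evaluate_xpath(&document, "/root/name").expect("valid XPath");
    assert_eq!(value.string(), "Ferris");
}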

Goals

This project has a lofty goal: replace libxml and libxslt.

Contributing

  1. Fork it ( https://github.com/shepmaster/sxd-xpath/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Add a failing test.
  4. Add code to pass the test.
  5. Commit your changes (git commit -am 'Add some feature')
  6. Ensure tests pass.
  7. Push to the branch (git push origin my-new-feature)
  8. Create a new Pull Request

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you shall be dual licensed as above, without any additional terms or conditions.

Download Details:
Author: shepmaster
Source Code: https://github.com/shepmaster/sxd-xpath
License: Apache-2.0, MIT licenses found

#rust  #rustlang  #encode 

Awesome Rust

SXD Document: An XML library in Rust

SXD-Document

An XML library in Rust.

Overview

The project is currently broken into two crates:

  1. document - Basic DOM manipulation and reading/writing XML from strings.
  2. xpath - Implementation of XPath 1.0 expressions.

There are also scattered utilities for playing around at the command line.

In the future, I hope to add support for XSLT 1.0.
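
A minimal DOM-building sketch, assuming the crate's Package/Element API and writer::format_document (the element and attribute names are illustrative):

use sxd_document::writer::format_document;
use sxd_document::Package;

fn main() {
    let package = Package::new();
    let doc = package.as_document();

    // Build a tiny document by hand.
    let hello = doc.create_element("hello");
    hello.set_attribute_value("planet", "Earth");
    doc.root().append_child(hello);

    // Serialize it back to XML text.
    let mut output = Vec::new();
    format_document(&doc, &mut output).expect("unable to write XML");
    println!("{}", String::from_utf8(output).unwrap());
}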

Goals

This project has two goals, one more achievable than the other:

  1. Help me learn Rust.
  2. Replace libxml and libxslt.

Contributing

  1. Fork it ( https://github.com/shepmaster/sxd-document/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Add a failing test.
  4. Add code to pass the test.
  5. Commit your changes (git commit -am 'Add some feature')
  6. Ensure tests pass.
  7. Push to the branch (git push origin my-new-feature)
  8. Create a new Pull Request

Download Details:
Author: shepmaster
Source Code: https://github.com/shepmaster/sxd-document
License: MIT license

#rust  #rustlang  #encode 

Awesome Rust

XML RS: An XML Library in Rust

xml-rs is an XML library for the Rust programming language. It is heavily inspired by the Java Streaming API for XML (StAX).

This library currently contains a pull parser much like the StAX event reader. It provides an iterator API, so you can leverage Rust's existing iterator library features.

It also provides a streaming document writer much like StAX event writer. This writer consumes its own set of events, but reader events can be converted to writer events easily, and so it is possible to write XML transformation chains in a pretty clean manner.

This parser is mostly full-featured, however, there are limitations:

  • no encodings other than UTF-8 are supported yet, because no stream-based encoding library is available now; when (or if) one becomes available, I'll try to make use of it;
  • DTD validation is not supported, and <!DOCTYPE> declarations are completely ignored, so custom entities are not supported either; internal DTD declarations are likely to cause parsing errors;
  • attribute value normalization is not performed, and end-of-line characters are not normalized either.

Other than that the parser tries to be mostly XML-1.0-compliant.

Writer is also mostly full-featured with the following limitations:

  • no support for encodings other than UTF-8, for the same reason as above;
  • no support for emitting <!DOCTYPE> declarations;
  • more validation of input is needed, for example, checking that namespace prefixes are bound or that comments are well-formed.

What is planned (highest priority first, approximately):

  1. missing features required by XML standard (e.g. aforementioned normalization and proper DTD parsing);
  2. miscellaneous features of the writer;
  3. parsing into a DOM tree and its serialization back to XML text;
  4. SAX-like callback-based parser (fairly easy to implement over pull parser);
  5. DTD validation;
  6. (let's dream a bit) XML Schema validation.

Building and using

xml-rs uses Cargo, so just add a dependency section in your project's manifest:

[dependencies]
xml-rs = "0.8"

The package exposes a single crate called xml:

extern crate xml;

Reading XML documents

xml::reader::EventReader requires a Read instance to read from. When a proper stream-based encoding library is available, it is likely that xml-rs will be switched to use whatever character stream structure this library would provide, but currently it is a Read.

Using EventReader is very straightforward. Just provide a Read instance to obtain an iterator over events:

extern crate xml;

use std::fs::File;
use std::io::BufReader;

use xml::reader::{EventReader, XmlEvent};

fn indent(size: usize) -> String {
    const INDENT: &'static str = "    ";
    (0..size).map(|_| INDENT)
             .fold(String::with_capacity(size*INDENT.len()), |r, s| r + s)
}

fn main() {
    let file = File::open("file.xml").unwrap();
    let file = BufReader::new(file);

    let parser = EventReader::new(file);
    let mut depth = 0;
    for e in parser {
        match e {
            Ok(XmlEvent::StartElement { name, .. }) => {
                println!("{}+{}", indent(depth), name);
                depth += 1;
            }
            Ok(XmlEvent::EndElement { name }) => {
                depth -= 1;
                println!("{}-{}", indent(depth), name);
            }
            Err(e) => {
                println!("Error: {}", e);
                break;
            }
            _ => {}
        }
    }
}

EventReader implements the IntoIterator trait, so you can just use it in a for loop directly. Document parsing can end normally or with an error. Regardless of the exact cause, the parsing process will be stopped, and the iterator will terminate normally.

You can also have finer control over when to pull the next event from the parser using its own next() method:

match parser.next() {
    ...
}

Upon the end of the document or an error, the parser will remember the last event and will always return it in the result of subsequent next() calls. If the iterator is used, then it will yield the error or end-of-document event once and will produce None afterwards.

It is also possible to tweak the parsing process a little using the xml::reader::ParserConfig structure. See its documentation for more information and examples.
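
For instance, a hedged sketch of configuring the reader (the option names below come from ParserConfig's builder-style setters; treat them as assumptions and check the docs for the full list):

use xml::reader::{EventReader, ParserConfig};

fn main() {
    let xml = "<doc>  <item>  hi  </item>  <!-- noise -->  </doc>";

    // Trim surrounding whitespace in text events and drop comments entirely.
    let config = ParserConfig::new()
        .trim_whitespace(true)
        .ignore_comments(true);

    let reader = EventReader::new_with_config(xml.as_bytes(), config);
    for event in reader {
        println!("{:?}", event.unwrap());
    }
}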

You can find a more extensive example of using EventReader in src/analyze.rs, which is a small program (BTW, it is built with cargo build and can be run after that) which shows various statistics about specified XML document. It can also be used to check for well-formedness of XML documents - if a document is not well-formed, this program will exit with an error.

Writing XML documents

xml-rs also provides a streaming writer much like StAX event writer. With it you can write an XML document to any Write implementor.

extern crate xml;

use std::fs::File;
use std::io::{self, Write};

use xml::writer::{EventWriter, EmitterConfig, XmlEvent, Result};

fn handle_event<W: Write>(w: &mut EventWriter<W>, line: String) -> Result<()> {
    let line = line.trim();
    let event: XmlEvent = if line.starts_with("+") && line.len() > 1 {
        XmlEvent::start_element(&line[1..]).into()
    } else if line.starts_with("-") {
        XmlEvent::end_element().into()
    } else {
        XmlEvent::characters(&line).into()
    };
    w.write(event)
}

fn main() {
    let mut file = File::create("output.xml").unwrap();

    let mut input = io::stdin();
    let mut output = io::stdout();
    let mut writer = EmitterConfig::new().perform_indent(true).create_writer(&mut file);
    loop {
        print!("> "); output.flush().unwrap();
        let mut line = String::new();
        match input.read_line(&mut line) {
            Ok(0) => break,
            Ok(_) => match handle_event(&mut writer, line) {
                Ok(_) => {}
                Err(e) => panic!("Write error: {}", e)
            },
            Err(e) => panic!("Input error: {}", e)
        }
    }
}

The code example above also demonstrates how to create a writer out of its configuration. Similar thing also works with EventReader.

The library provides an XML event building DSL which helps to construct complex events, e.g. ones having namespace definitions. Some examples:

// <a:hello a:param="value" xmlns:a="urn:some:document">
XmlEvent::start_element("a:hello").attr("a:param", "value").ns("a", "urn:some:document")

// <hello b:config="value" xmlns="urn:default:uri">
XmlEvent::start_element("hello").attr("b:config", "value").default_ns("urn:default:uri")

// <![CDATA[some unescaped text]]>
XmlEvent::cdata("some unescaped text")

Of course, one can create XmlEvent enum variants directly instead of using the builder DSL. There are more examples in xml::writer::XmlEvent documentation.

The writer has multiple configuration options; see EmitterConfig documentation for more information.

Other things

No performance tests or measurements are done. The implementation is rather naive, and no specific optimizations are made. Hopefully the library is sufficiently fast to process documents of common size. I intend to add benchmarks in future, but not until more important features are added.

Known issues

All known issues are present on GitHub issue tracker: http://github.com/netvl/xml-rs/issues. Feel free to post any found problems there.

Download Details:
Author: netvl
Source Code: https://github.com/netvl/xml-rs
License: MIT license

#rust  #rustlang  #encode 

Awesome Rust

Yaserde: Yet Another Serializer / Deserializer for XML

yaserde

Yet Another Serializer/Deserializer specialized for XML

Goal

This library aims to support XML de/serializing with all XML-specific features.

Supported types

  •  Struct
  •  Vec
  •  Enum
  •  Enum with complex types
  •  Option
  •  String
  •  bool
  •  number (u8, i8, u32, i32, f32, f64)

Attributes

  •  attribute: this field is defined as an attribute
  •  default: defines the default function to init the field
  •  flatten: Flatten the contents of the field
  •  namespace: defines the namespace of the field
  •  rename: be able to rename a field
  •  root: rename the base element. Used only at the XML root.
  •  skip_serializing: Exclude this field from the serialized output. More details...
  •  skip_serializing_if: Skip the serialisation for this field if the condition is true. More details...
  •  text: this field maps to the text content
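
A hedged sketch of how a couple of these attributes (attribute and text) are typically used with the derive macros; it assumes the yaserde and yaserde_derive crates as dependencies, and the struct is purely illustrative:

use yaserde_derive::YaSerialize;

#[derive(Debug, YaSerialize)]
struct Book {
    #[yaserde(attribute)]
    id: u32,
    #[yaserde(text)]
    title: String,
}

fn main() {
    let book = Book { id: 42, title: "Rust in Action".into() };
    // Serializes to something like: <Book id="42">Rust in Action</Book>
    // (preceded by an XML declaration).
    println!("{}", yaserde::ser::to_string(&book).unwrap());
}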

Custom De/Serializer

Any type can define a custom deserializer and/or serializer. To implement one, provide an implementation of YaDeserialize/YaSerialize:

impl YaDeserialize for MyType {
  fn deserialize<R: Read>(reader: &mut yaserde::de::Deserializer<R>) -> Result<Self, String> {
    // deserializer code
  }
}

impl YaSerialize for MyType {
  fn serialize<W: Write>(&self, writer: &mut yaserde::ser::Serializer<W>) -> Result<(), String> {
    // serializer code
  }
}

Download Details:
Author: media-io
Source Code: https://github.com/media-io/yaserde
License: MIT license

#rust  #rustlang  #encode 

Awesome Rust

RustyXML: An XML Parser Written in Rust

RustyXML

RustyXML is a namespace-aware XML parser written in Rust. Right now it provides a basic SAX-like API, and an ElementBuilder based on that.

The parser itself is derived from OFXMLParser as found in ObjFW https://webkeks.org/objfw/.

The current limitations are:

  • Incomplete error checking
  • Unstable API

The Minimal Supported Rust Version for this crate is Rust 1.40.0.

Examples

Parse a string into an Element struct:

use xml::Element;

let elem: Option<Element> = "<a href='//example.com'/>".parse().ok();

Get events from parsing string data:

use xml::{Event, Parser};

// Create a new Parser
let mut p = Parser::new();

// Feed data to be parsed
p.feed_str("<a href");
p.feed_str("='//example.com'/>");

// Get events for the fed data
for event in p {
    match event.unwrap() {
        Event::ElementStart(tag) => println!("<{}>", tag.name),
        Event::ElementEnd(tag) => println!("</{}>", tag.name),
        _ => ()
    }
}

This should print:

<a>
</a>

Build Elements from Parser Events:

use xml::{Parser, ElementBuilder};

let mut p = xml::Parser::new();
let mut e = xml::ElementBuilder::new();

p.feed_str("<a href='//example.com'/>");
for elem in p.filter_map(|x| e.handle_event(x)) {
    match elem {
        Ok(e) => println!("{}", e),
        Err(e) => println!("{}", e),
    }
}

Build Elements by hand:

let mut reply = xml::Element::new("iq".into(), Some("jabber:client".into()),
                                  vec![("type".into(), None, "error".into()),
                                       ("id".into(), None, "42".into())]);
reply.tag(xml::Element::new("error".into(), Some("jabber:client".into()),
                            vec![("type".into(), None, "cancel".into())]))
     .tag_stay(xml::Element::new("forbidden".into(),
                                 Some("urn:ietf:params:xml:ns:xmpp-stanzas".into()),
                                 vec![]))
     .tag(xml::Element::new("text".into(),
                            Some("urn:ietf:params:xml:ns:xmpp-stanzas".into()),
                            vec![]))
     .text("Permission denied".into());

Result (some whitespace added for readability):

<iq xmlns='jabber:client' id='42' type='error'>
  <error type='cancel'>
    <forbidden xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/>
    <text xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'>Permission denied</text>
  </error>
</iq>

Attribute Order

By default the order of attributes is not tracked. Therefore, during serialization and iteration their order will be random. This can be changed by enabling the ordered_attrs feature. With this feature enabled, the order in which attributes were encountered while parsing, or added to an Element, will be preserved.

Documentation

Download Details:
Author: Florob
Source Code: https://github.com/Florob/RustyXML
License: Apache-2.0, MIT licenses found

#rust  #rustlang  #encode 

Awesome Rust

Rusty Object Notation for Data Serialization Format

Rusty Object Notation

RON is a simple readable data serialization format that looks similar to Rust syntax. It's designed to support all of Serde's data model, so structs, enums, tuples, arrays, generic maps, and primitive values.

Example

GameConfig( // optional struct name
    window_size: (800, 600),
    window_title: "PAC-MAN",
    fullscreen: false,
    
    mouse_sensitivity: 1.4,
    key_bindings: {
        "up": Up,
        "down": Down,
        "left": Left,
        "right": Right,
        
        // Uncomment to enable WASD controls
        /*
        "W": Up,
        "A": Down,
        "S": Left,
        "D": Right,
        */
    },
    
    difficulty_options: (
        start_difficulty: Easy,
        adaptive: false,
    ),
)

Why RON?

Example in JSON

{
   "materials": {
        "metal": {
            "reflectivity": 1.0
        },
        "plastic": {
            "reflectivity": 0.5
        }
   },
   "entities": [
        {
            "name": "hero",
            "material": "metal"
        },
        {
            "name": "monster",
            "material": "plastic"
        }
   ]
}

Same example in RON

Scene( // class name is optional
    materials: { // this is a map
        "metal": (
            reflectivity: 1.0,
        ),
        "plastic": (
            reflectivity: 0.5,
        ),
    },
    entities: [ // this is an array
        (
            name: "hero",
            material: "metal",
        ),
        (
            name: "monster",
            material: "plastic",
        ),
    ],
)

Note the following advantages of RON over JSON:

  • trailing commas allowed
  • single- and multi-line comments
  • field names aren't quoted, so it's less verbose
  • optional struct names improve readability
  • enums are supported (and less verbose than their JSON representation)

RON syntax overview

  • Numbers: 42, 3.14, 0xFF, 0b0110
  • Strings: "Hello", "with\\escapes\n", r#"raw string, great for regex\."#
  • Booleans: true, false
  • Chars: 'e', '\n'
  • Optionals: Some("string"), Some(Some(1.34)), None
  • Tuples: ("abc", 1.23, true), ()
  • Lists: ["abc", "def"]
  • Structs: ( foo: 1.0, bar: ( baz: "I'm nested" ) )
  • Maps: { "arbitrary": "keys", "are": "allowed" }

Note: Serde's data model represents fixed-size Rust arrays as tuples (instead of as lists)

Quickstart

Cargo.toml

[dependencies]
ron = "0.7"
serde = { version = "1", features = ["derive"] }

main.rs

use serde::{Deserialize, Serialize};

#[derive(Debug, Deserialize, Serialize)]
struct MyStruct {
    boolean: bool,
    float: f32,
}

fn main() {
    let x: MyStruct = ron::from_str("(boolean: true, float: 1.23)").unwrap();
    
    println!("RON: {}", ron::to_string(&x).unwrap());
}
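
For nicer output, ron also exposes a pretty serializer; a minimal sketch against the same ron = "0.7" dependency:

use ron::ser::{to_string_pretty, PrettyConfig};
use serde::Serialize;

#[derive(Serialize)]
struct MyStruct {
    boolean: bool,
    float: f32,
}

fn main() {
    let x = MyStruct { boolean: true, float: 1.23 };
    // Pretty-print with the default configuration (indentation, newlines).
    let pretty = to_string_pretty(&x, PrettyConfig::new()).unwrap();
    println!("{}", pretty);
}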

Tooling

Editor plugins:

  • IntelliJ: intellij-ron
  • VS Code: a5huynh/vscode-ron
  • Sublime Text: RON
  • Atom: language-ron
  • Vim: ron-rs/ron.vim
  • Emacs: emacs-ron

Specification

There is a very basic, work in progress specification available on the wiki page. A more formal and complete grammar is available here.

Download Details:
Author: ron-rs
Source Code: https://github.com/ron-rs/ron
License: Apache-2.0, MIT licenses found

#rust  #rustlang  #encode 

Awesome Rust

Prost: A Protocol Buffers Implementation for Rust

prost is a Protocol Buffers implementation for the Rust Language. prost generates simple, idiomatic Rust code from proto2 and proto3 files.

Compared to other Protocol Buffers implementations, prost

  • Generates simple, idiomatic, and readable Rust types by taking advantage of Rust derive attributes.
  • Retains comments from .proto files in generated Rust code.
  • Allows existing Rust types (not generated from a .proto) to be serialized and deserialized by adding attributes.
  • Uses the bytes::{Buf, BufMut} abstractions for serialization instead of std::io::{Read, Write}.
  • Respects the Protobuf package specifier when organizing generated code into Rust modules.
  • Preserves unknown enum values during deserialization.
  • Does not include support for runtime reflection or message descriptors.

Using prost in a Cargo Project

First, add prost and its public dependencies to your Cargo.toml:

[dependencies]
prost = "0.10"
# Only necessary if using Protobuf well-known types:
prost-types = "0.10"

The recommended way to add .proto compilation to a Cargo project is to use the prost-build library. See the prost-build documentation for more details and examples.
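
A hedged sketch of what that usually looks like (the .proto path below is hypothetical, and prost-build must be listed under [build-dependencies]):

// build.rs
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Compile src/items.proto (hypothetical file) into $OUT_DIR.
    prost_build::compile_protos(&["src/items.proto"], &["src/"])?;
    Ok(())
}

The generated module is then pulled into the crate with something like include!(concat!(env!("OUT_DIR"), "/items.rs")), where the generated file name follows the Protobuf package name.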

See the snazzy repository for a simple start-to-finish example.

Generated Code

prost generates Rust code from source .proto files using the proto2 or proto3 syntax. prost's goal is to make the generated code as simple as possible.

protoc

It's recommended to install protoc locally in your path to improve build times. Prost uses protoc to parse protobuf files and will otherwise attempt to compile protobuf from source, which requires a C++ toolchain. For more info, check out the prost-build docs.

Packages

Prost can now generate code for .proto files that don't have a package spec. prost will translate the Protobuf package into a Rust module. For example, given the package specifier:

package foo.bar;

All Rust types generated from the file will be in the foo::bar module.

Messages

Given a simple message declaration:

// Sample message.
message Foo {
}

prost will generate the following Rust struct:

/// Sample message.
#[derive(Clone, Debug, PartialEq, Message)]
pub struct Foo {
}
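
Encoding and decoding then go through the prost::Message trait; a minimal round-trip sketch with the generated type above:

use prost::Message;

fn roundtrip(msg: &Foo) -> Foo {
    // Encode into a plain Vec<u8> (Vec implements bytes::BufMut)...
    let mut buf = Vec::new();
    msg.encode(&mut buf).expect("a Vec<u8> never runs out of capacity");
    // ...and decode it back from a byte slice (which implements bytes::Buf).
    Foo::decode(buf.as_slice()).expect("valid wire format")
}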

Fields

Fields in Protobuf messages are translated into Rust as public struct fields of the corresponding type.

Scalar Values

Scalar value types are converted as follows:

Protobuf Type    Rust Type
double           f64
float            f32
int32            i32
int64            i64
uint32           u32
uint64           u64
sint32           i32
sint64           i64
fixed32          u32
fixed64          u64
sfixed32         i32
sfixed64         i64
bool             bool
string           String
bytes            Vec<u8>

Enumerations

All .proto enumeration types convert to the Rust i32 type. Additionally, each enumeration type gets a corresponding Rust enum type. For example, this proto enum:

enum PhoneType {
  MOBILE = 0;
  HOME = 1;
  WORK = 2;
}

gets this corresponding Rust enum [1]:

pub enum PhoneType {
    Mobile = 0,
    Home = 1,
    Work = 2,
}

You can convert a PhoneType value to an i32 by doing:

PhoneType::Mobile as i32

The #[derive(::prost::Enumeration)] annotation added to the generated PhoneType adds these associated functions to the type:

impl PhoneType {
    pub fn is_valid(value: i32) -> bool { ... }
    pub fn from_i32(value: i32) -> Option<PhoneType> { ... }
}

so you can convert an i32 to its corresponding PhoneType value by doing, for example:

let phone_type = 2i32;

match PhoneType::from_i32(phone_type) {
    Some(PhoneType::Mobile) => ...,
    Some(PhoneType::Home) => ...,
    Some(PhoneType::Work) => ...,
    None => ...,
}

Additionally, wherever a proto enum is used as a field in a Message, the message will have 'accessor' methods to get/set the value of the field as the Rust enum type. For instance, this proto PhoneNumber message that has a field named type of type PhoneType:

message PhoneNumber {
  string number = 1;
  PhoneType type = 2;
}

will become the following Rust type [1] with methods type and set_type:

pub struct PhoneNumber {
    pub number: String,
    pub r#type: i32, // the `r#` is needed because `type` is a Rust keyword
}

impl PhoneNumber {
    pub fn r#type(&self) -> PhoneType { ... }
    pub fn set_type(&mut self, value: PhoneType) { ... }
}

Note that the getter methods will return the Rust enum's default value if the field has an invalid i32 value.

The enum type isn't used directly as a field, because the Protobuf spec mandates that enumeration values are 'open', and decoding unrecognized enumeration values must be possible.

[1] Annotations have been elided for clarity. See below for a full example.

Field Modifiers

Protobuf scalar value and enumeration message fields can have a modifier depending on the Protobuf version. Modifiers change the corresponding type of the Rust field:

.proto Version   Modifier    Rust Type
proto2           optional    Option<T>
proto2           required    T
proto3           default     T for scalar types, Option<T> otherwise
proto3           optional    Option<T>
proto2/proto3    repeated    Vec<T>

Note that in proto3 the default representation for all user-defined message types is Option<T>, and for scalar types just T (during decoding, a missing value is populated by T::default()). If you need a witness of the presence of a scalar type T, use the optional modifier to enforce an Option<T> representation in the generated Rust struct.

Map Fields

Map fields are converted to a Rust HashMap with key and value type converted from the Protobuf key and value types.

Message Fields

Message fields are converted to the corresponding struct type. The table of field modifiers above applies to message fields, except that proto3 message fields without a modifier (the default) will be wrapped in an Option. Typically message fields are unboxed. prost will automatically box a message field if the field type and the parent type are recursively nested in order to avoid an infinite sized struct.

Oneof Fields

Oneof fields convert to a Rust enum. Protobuf oneofs types are not named, so prost uses the name of the oneof field for the resulting Rust enum, and defines the enum in a module under the struct. For example, a proto3 message such as:

message Foo {
  oneof widget {
    int32 quux = 1;
    string bar = 2;
  }
}

generates the following Rust[1]:

pub struct Foo {
    pub widget: Option<foo::Widget>,
}
pub mod foo {
    pub enum Widget {
        Quux(i32),
        Bar(String),
    }
}

oneof fields are always wrapped in an Option.

[1] Annotations have been elided for clarity. See below for a full example.

Services

prost-build allows a custom code-generator to be used for processing service definitions. This can be used to output Rust traits according to an application's specific needs.

Generated Code Example

Example .proto file:

syntax = "proto3";
package tutorial;

message Person {
  string name = 1;
  int32 id = 2;  // Unique ID number for this person.
  string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    string number = 1;
    PhoneType type = 2;
  }

  repeated PhoneNumber phones = 4;
}

// Our address book file is just one of these.
message AddressBook {
  repeated Person people = 1;
}

and the generated Rust code (tutorial.rs):

#[derive(Clone, PartialEq, ::prost::Message)]
pub struct Person {
    #[prost(string, tag="1")]
    pub name: ::prost::alloc::string::String,
    /// Unique ID number for this person.
    #[prost(int32, tag="2")]
    pub id: i32,
    #[prost(string, tag="3")]
    pub email: ::prost::alloc::string::String,
    #[prost(message, repeated, tag="4")]
    pub phones: ::prost::alloc::vec::Vec<person::PhoneNumber>,
}
/// Nested message and enum types in `Person`.
pub mod person {
    #[derive(Clone, PartialEq, ::prost::Message)]
    pub struct PhoneNumber {
        #[prost(string, tag="1")]
        pub number: ::prost::alloc::string::String,
        #[prost(enumeration="PhoneType", tag="2")]
        pub r#type: i32,
    }
    #[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, ::prost::Enumeration)]
    #[repr(i32)]
    pub enum PhoneType {
        Mobile = 0,
        Home = 1,
        Work = 2,
    }
}
/// Our address book file is just one of these.
#[derive(Clone, PartialEq, ::prost::Message)]
pub struct AddressBook {
    #[prost(message, repeated, tag="1")]
    pub people: ::prost::alloc::vec::Vec<Person>,
}

Accessing the protoc FileDescriptorSet

The prost_build::Config::file_descriptor_set_path option can be used to emit a file descriptor set during the build & code generation step. When used in conjunction with the std::include_bytes macro and the prost_types::FileDescriptorSet type, applications and libraries using Prost can implement introspection capabilities requiring details from the original .proto files.

Using prost in a no_std Crate

prost is compatible with no_std crates. To enable no_std support, disable the std features in prost and prost-types:

[dependencies]
prost = { version = "0.6", default-features = false, features = ["prost-derive"] }
# Only necessary if using Protobuf well-known types:
prost-types = { version = "0.6", default-features = false }

Additionally, configure prost-build to output BTreeMaps instead of HashMaps for all Protobuf map fields in your build.rs:

let mut config = prost_build::Config::new();
config.btree_map(&["."]);

When using edition 2015, it may be necessary to add an extern crate core; directive to the crate which includes prost-generated code.

Serializing Existing Types

prost uses a custom derive macro to handle encoding and decoding types, which means that if your existing Rust type is compatible with Protobuf types, you can serialize and deserialize it by adding the appropriate derive and field annotations.

Currently the best documentation on adding annotations is to look at the generated code examples above.

Tag Inference for Existing Types

Prost automatically infers tags for the struct.

Fields are tagged sequentially in the order they are specified, starting with 1.

You may skip tags which have been reserved, or where there are gaps between sequentially occurring tag values, by specifying the tag number to skip to with the tag attribute on the first field after the gap. The following fields will then be tagged sequentially starting from the next number.

use prost;
use prost::{Enumeration, Message};

#[derive(Clone, PartialEq, Message)]
struct Person {
    #[prost(string, tag = "1")]
    pub id: String, // tag=1
    // NOTE: Old "name" field has been removed
    // pub name: String, // tag=2 (Removed)
    #[prost(string, tag = "6")]
    pub given_name: String, // tag=6
    #[prost(string)]
    pub family_name: String, // tag=7
    #[prost(string)]
    pub formatted_name: String, // tag=8
    #[prost(uint32, tag = "3")]
    pub age: u32, // tag=3
    #[prost(uint32)]
    pub height: u32, // tag=4
    #[prost(enumeration = "Gender")]
    pub gender: i32, // tag=5
    // NOTE: Skip to less commonly occurring fields
    #[prost(string, tag = "16")]
    pub name_prefix: String, // tag=16  (eg. mr/mrs/ms)
    #[prost(string)]
    pub name_suffix: String, // tag=17  (eg. jr/esq)
    #[prost(string)]
    pub maiden_name: String, // tag=18
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Enumeration)]
pub enum Gender {
    Unknown = 0,
    Female = 1,
    Male = 2,
}

FAQ

  1. Could prost be implemented as a serializer for Serde?

Probably not, however I would like to hear from a Serde expert on the matter. There are two complications with trying to serialize Protobuf messages with Serde:

  • Protobuf fields require a numbered tag, and currently there appears to be no mechanism suitable for this in serde.
  • The mapping of Protobuf type to Rust type is not 1-to-1. As a result, trait-based approaches to dispatching don't work very well. Example: six different Protobuf field types correspond to a Rust Vec<i32>: repeated int32, repeated sint32, repeated sfixed32, and their packed counterparts.

But it is possible to place serde derive tags onto the generated types, so the same structure can support both prost and Serde.

  2. I get errors when trying to run cargo test on MacOS

If the errors are about missing autoreconf or similar, you can probably fix them by running

brew install automake
brew install libtool

Download Details:
Author: tokio-rs
Source Code: https://github.com/tokio-rs/prost
License: Apache-2.0 license

#rust  #rustlang  #encode 

Awesome Rust

Rust Implementation Of Google Protocol Buffers

rust-protobuf

Protobuf implementation in Rust.

  • Written in pure rust
  • Generates rust code
  • Has runtime library support for generated code (Coded{Input|Output}Stream impl)
  • Supports both Protobuf versions 2 and 3
  • and more

Where is documentation

Documentation is hosted on docs.rs.

Versions and branches

Version 3

Version 3 is the current stable version. Compared to version 2 it implements:

  • runtime reflection
  • JSON and text format parsing and printing
  • dynamic messages (messages which can be created from .proto file on the fly without code generation)

Version 2

Version 2 is the previous stable version. Only the most critical bugfixes will be applied to the 2.x versions; otherwise it is no longer maintained.

Help

The crate needs help:

  • testing
  • documentation
  • examples to be used as documentation
  • feedback on API design
  • feedback on implementation
  • pull requests
  • a new maintainer

Changelog

See CHANGELOG.md for a list of changes and compatibility issues between versions.

Related projects

  • prost - another protobuf implementation in Rust, also has a gRPC implementation
  • quick-protobuf - alternative protobuf implementation in Rust
  • grpc-rs - another gRPC implementation for Rust
  • grpc-rust - incomplete implementation of gRPC based on this library

Download Details:
Author: stepancheg
Source Code: https://github.com/stepancheg/rust-protobuf
License: MIT license

#rust  #rustlang  #encode 

Awesome Rust

A Rust Library for Parsing and Encoding PEM-encoded Data

pem

A Rust library for parsing and encoding PEM-encoded data.

Usage

Add this to your Cargo.toml:

[dependencies]
pem = "1.0"

and this to your crate root:

extern crate pem;

Here is a simple example that parses PEM-encoded data and prints the tag:

extern crate pem;

use pem::parse;

const SAMPLE: &'static str = "-----BEGIN RSA PRIVATE KEY-----
MIIBPQIBAAJBAOsfi5AGYhdRs/x6q5H7kScxA0Kzzqe6WI6gf6+tc6IvKQJo5rQc
dWWSQ0nRGt2hOPDO+35NKhQEjBQxPh/v7n0CAwEAAQJBAOGaBAyuw0ICyENy5NsO
2gkT00AWTSzM9Zns0HedY31yEabkuFvrMCHjscEF7u3Y6PB7An3IzooBHchsFDei
AAECIQD/JahddzR5K3A6rzTidmAf1PBtqi7296EnWv8WvpfAAQIhAOvowIXZI4Un
DXjgZ9ekuUjZN+GUQRAVlkEEohGLVy59AiEA90VtqDdQuWWpvJX0cM08V10tLXrT
TTGsEtITid1ogAECIQDAaFl90ZgS5cMrL3wCeatVKzVUmuJmB/VAmlLFFGzK0QIh
ANJGc7AFk4fyFD/OezhwGHbWmo/S+bfeAiIh2Ss2FxKJ
-----END RSA PRIVATE KEY-----";

let pem = parse(SAMPLE).unwrap();
println!("PEM tag: {}", pem.tag);

Documentation

Module documentation with examples

Download Details:
Author: jcreekmore
Source Code: https://github.com/jcreekmore/pem-rs
License: MIT license

#rust  #rustlang  #encode 

Awesome Rust

MessagePack Implementation for Rust

RMP - Rust MessagePack

RMP is a pure Rust MessagePack implementation.

This repository consists of three separate crates: the RMP core and two implementations to ease serializing and deserializing Rust structs.

Features

Convenient API

RMP is designed to be lightweight and straightforward. There is a low-level API, which gives you full control over the data encoding/decoding process and makes no heap allocations. On the other hand, there is a high-level API, which provides a convenient interface using the Rust standard library and compiler reflection, allowing you to encode/decode structs using a derive attribute.
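
A minimal sketch of that high-level path, assuming the rmp-serde crate and serde's derive feature:

use serde::{Deserialize, Serialize};

#[derive(Debug, PartialEq, Serialize, Deserialize)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let point = Point { x: 1, y: 2 };
    // Serialize to a compact MessagePack byte buffer...
    let buf = rmp_serde::to_vec(&point).unwrap();
    // ...and deserialize it back.
    let back: Point = rmp_serde::from_slice(&buf).unwrap();
    assert_eq!(point, back);
}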

Zero-copy value decoding

RMP allows you to decode bytes from a buffer in a zero-copy manner, easily and blazingly fast, while Rust's static checks guarantee that the data will be valid as long as the buffer lives.

Clear error handling

RMP's error system guarantees that you never receive an error enum with an unreachable variant.

Robust and tested

This project is developed using TDD and CI, so any bugs found will be fixed without breaking existing functionality.

Requirements

  • Rust 1.53.0 or later

Learn More

Crates on crates.rs and their API documentation:

  • rmp - RMP (core)
  • rmps - RMP Serde
  • rmpv - RMP Value

Download Details:
Author: 3Hren
Source Code: https://github.com/3Hren/msgpack-rust
License: MIT license

#rust  #rustlang  #encode 
