1658820180
This is a tool to convert Emoji Unicode codepoints into the human-friendly codes used by http://www.emoji-cheat-sheet.com/ and back again.
By doing this, you can ensure that users across devices can see the authorโs intention. You can always show users an image, but you canโt show them a range of characters their system does not support.
This gem is primarily for handling emoji characters in user-generated content. Depending on your technical stack, these characters could end up being lost.
Rumoji.encode(str)
# Takes a String, transforms Emoji into cheat-sheet codes
Rumoji.encode(str) { |emoji| #your code here }
# Takes a String, transforms Emoji into whatever you want
Rumoji.decode(str)
# Does the reverse of encode
Rumoji.encode_io(read, write)
# For an IO pipe (a read stream, and a write stream), transform Emoji from the
# read end, and write the cheat-sheet codes on the write end.
Rumoji.decode_io(read, write)
# Same thing but in reverse!
gem install rumoji
Note that rumoji has only been tested in Rubies >= 1.9!!!
puts Rumoji.encode("Lack of cross-device emoji support makes me ๐ญ")
#=> Lack of cross-device emoji support makes me :sob:
Rumoji.encode_io(StringIO.new("๐ฉ")).string
#=> ":poop:"
Here's a fun file:
Rumoji.decode_io($stdin, $stdout)
On the command line
echo "But Rumoji makes encoding issues a :joy:" | ruby ./funfile.rb
#=> But Rumoji makes encoding issues a ๐
The symbol of the emoji surrounded with colons
Rumoji.encode("๐ญ") {|emoji| emoji.code}
#=> ":sob:"
The symbol of the emoji
Rumoji.encode("๐ญ") {|emoji| emoji.code}
#=> "sob"
Returns true if the emoji is made up of multiple code points. E.g. ๐บ๐ธ
Rumoji.encode("๐บ๐ธ") {|emoji| emoji.multiple?}
#=> true
The raw emoji
Rumoji.encode("๐ญ") {|emoji| emoji.string}
#=> "๐ญ"
Implement the emoji codes from emoji-cheat-sheet.com using a tool like gemoji along with Rumoji, and you'll easily be able to transform user input with raw emoji unicode into images you can show to all users.
Having trouble discerning what's happening in this README? You might be on a device with NO emoji support! All the more reason to use Rumoji. Transcode the raw unicode into something users can understand across devices!
Thanks!
Why would you want to do this? Read this blog post: http://mwunsch.tumblr.com/post/34721548842/we-need-to-talk-about-emoji
Author: Mwunsch
Source Code: https://github.com/mwunsch/rumoji
License: MIT license
1657199580
lexicographic-integer-encoding
Lexicographically ordered integers for level(up)
. Wraps lexicographic-integer
.
level
const level = require('level')
const lexint = require('lexicographic-integer-encoding')('hex')
const db = level('./db', { keyEncoding: lexint })
db.put(2, 'example', (err) => {
db.put(10, 'example', (err) => {
// Without our encoding, the keys would sort as 10, 2.
db.createKeyStream().on('data', console.log) // 2, 10
})
})
levelup
const levelup = require('levelup')
const encode = require('encoding-down')
const leveldown = require('leveldown')
const lexint = require('lexicographic-integer-encoding')('hex')
const db = levelup(encode(leveldown('./db'), { keyEncoding: lexint }))
lexint = require('lexicographic-integer-encoding')(encoding, [options])
encoding
(string, required): 'hex'
or 'buffer'
options.strict
(boolean): opt-in to type-checking input. If true, encode will throw:TypeError
if input is not a number or if NaN
RangeError
if input is < 0 or > Number.MAX_SAFE_INTEGER
Returns a level-codec
compliant encoding object.
lexicographic-integer
: main encoding logicunique-lexicographic-integer
: lexicographic-integer
plus a suffix if input is the same as the last call;monotonic-lexicographic-timestamp
: unique-lexicographic-integer
with Date.now()
to get a monotonically increasing timestamp with lexicographic order.With npm do:
npm install lexicographic-integer-encoding
Author: vweevers
Source Code: https://github.com/vweevers/lexicographic-integer-encoding
License: MIT license
1654979700
STFU-8 is a hacky text encoding/decoding protocol for data that might be not quite UTF-8 but is still mostly UTF-8. It is based on the syntax of the repr
created when you write (or print) binary text in rust, python, C or other common programming languages.
Its primary purpose is to be able to allow a human to visualize and edit "data" that is mostly (or fully) visible UTF-8 text. It encodes all non visible or non UTF-8 compliant bytes as longform text (i.e. ESC becomes the full string r"\x1B"
). It can also encode/decode ill-formed UTF-16.
Comparision to other formats:
std::str
): UTF-8 is a standardized format for encoding human understandable text in any language on the planet. It is the reason the internet can be understood by almost anyone and should be the primary way that text is encoded. However, not everything that is "UTF-8 like" follows the standard exactly. For instance:base64
): also encodes binary data as UTF-8. If your data is actually binary (i.e. not text) then use base64. However, if your data was formerly text (or mostly text) then encoding to base64 will make it completely un(human)readable.[0x72, 0x65, 0x61, 0x64, 0x20, 0x69, 0x74]
is not human readable. Even if it were in pure ASCII the only ones who can read it efficiently are low-level programming Gods who have never figured out how to debug-print their ASCII.Specification
In simple terms, encoded STFU-8 is itself always valid unicode which decodes to binary (the binary is not necessarily UTF-8). It differs from unicode in that single \
items are illegal. The following patterns are legal:
\\
: decodes to the backward-slash (\
) byte (\x5c
)\t
: decodes to the tab byte (\x09
)\n
: decodes to the newline byte (\x0A
)\r
: decodes to the linefeed byte (\x0D
)\xXX
where XX are exactly two case-insensitive hexidecimal digits: decodes to the \xXX
byte, where XX
is a hexidecimal number (example: \x9F
, \xaB
or \x05
). This never gets resolved into a code point, the value is pushed directly into the decoder stream.\uXXXXXX
where XXXXXX
are exacty six case-insensitive hexidecimal digits, decodes to a 24bit number that typically represenents a unicode code point. If the value is a unicode code point it will always be decoded as such. Otherwise stfu8
will attempt to store the value into the decoder (if the value is too large for the decoding type it will be an error).stfu8
provides 2 different categories of functions for encoding/decoding data that are not necessarily interoperable (don't decode output created from encode_u8
with decode_u16
).
encode_u8(&[u8]) -> String
and decode_u8(&str) -> Vec<u8>
: encodes or decodes an array of u8
values to/from STFU-8, primarily used for interfacing with binary/nonvisible data that is almost UTF-8.encode_u16(&[u16]) -> String
and decode_u16(&str) -> Vec<u16>
: encodes or decodes an array of u16
values to/from STFU-8, primarily used for interfacing with legacy UTF-16 formats that may contain [ill formed text][utf-16-ill-formed-text] but also converts unprintable characters.There are some general rules for encoding and decoding:
\u...
cannot be resolved into a valid UTF code point it must fit into the decoder. For instance, trying to decode "\u00DEED"
(which is an UTF-16 Trail surrogage) using decode_u8
will fail, but will succeed with decode_u16
."\x01\x02"
will be [0x01, 0x02]
not [0x0102]
-- even if you use decode_u16
.\x...
are always copied verbatum into the decoder. I.e. \xFF
is a valid UTF-32 code point, but if decoded with decode_u8
it will be 0xFE
in the buffer, not two bytes of data as the UTF-8 character 'รพ'
. Note that with decode_u16
0xFE
is a valid UTF-16 code point, so when re-encoded would be the 'รพ'
character. Moral of the story: don't mix inputs/outputs of the the u8
and u16
functions.tab, newline, and line-feed characters are "visible", so encoding with them in "pretty form" is optional.
The problem is succinctly stated here:
http://unicode.org/faq/utf_bom.html
Q: How do I convert an unpaired UTF-16 surrogate to UTF-8?
A different issue arises if an unpairedsurrogate is encountered when converting ill-formed UTF-16 data. By represented such an unpaired surrogate on its own as a 3-byte sequence, the resulting UTF-8 data stream would become ill-formed. While it faithfully reflects the nature of the input, Unicode conformance requires that encoding form conversion always results in valid data stream. Therefore a convertermust treat this as an error. [AF]
Also, from the WTF-8 spec
As a result, [unpaired] surrogates do occur in practice and need to be preserved. For example:
In ECMAScript (a.k.a. JavaScript), a String value is defined as a sequence of 16-bit integers that usually represents UTF-16 text but may or may not be well-formed. Windows applications normally use UTF-16, but the file system treats path and file names as an opaque sequence of WCHARs (16-bit code units).
We say that strings in these systems are encoded in potentially ill-formed UTF-16 or WTF-16.
Basically: you can't (always) convert from UTF-16 to UTF-8 and it's a real bummer. WTF-8, while kindof an answer to this problem, doesn't allow me to serialize UTF-16 into a UTF-8 format, send it to my webapp, edit it (as a human), and send it back. That is what STFU-8 is for.
Download Details:
Author: vitiral
Source Code: https://github.com/vitiral/stfu8
License: Unknown, MIT licenses found
1654972380
yaml-rust
The missing YAML 1.2 implementation for Rust.
yaml-rust
is a pure Rust YAML 1.2 implementation, which enjoys the memory safety property and other benefits from the Rust language. The parser is heavily influenced by libyaml
and yaml-cpp
.
Add the following to the Cargo.toml of your project:
[dependencies]
yaml-rust = "0.4"
and import:
extern crate yaml_rust;
Use yaml::YamlLoader
to load the YAML documents and access it as Vec/HashMap:
extern crate yaml_rust;
use yaml_rust::{YamlLoader, YamlEmitter};
fn main() {
let s =
"
foo:
- list1
- list2
bar:
- 1
- 2.0
";
let docs = YamlLoader::load_from_str(s).unwrap();
// Multi document support, doc is a yaml::Yaml
let doc = &docs[0];
// Debug support
println!("{:?}", doc);
// Index access for map & array
assert_eq!(doc["foo"][0].as_str().unwrap(), "list1");
assert_eq!(doc["bar"][1].as_f64().unwrap(), 2.0);
// Chained key/array access is checked and won't panic,
// return BadValue if they are not exist.
assert!(doc["INVALID_KEY"][100].is_badvalue());
// Dump the YAML object
let mut out_str = String::new();
{
let mut emitter = YamlEmitter::new(&mut out_str);
emitter.dump(doc).unwrap(); // dump the YAML object to a String
}
println!("{}", out_str);
}
Note that yaml_rust::Yaml
implements Index<&'a str>
& Index<usize>
:
Index<usize>
assumes the container is an ArrayIndex<&'a str>
assumes the container is a string to value MapYaml::BadValue
is returnedIf your document does not conform to this convention (e.g. map with complex type key), you can use the Yaml::as_XXX
family API to access your documents.
This implementation aims to provide YAML parser fully compatible with the YAML 1.2 specification. The parser can correctly parse almost all examples in the specification, except for the following known bugs:
However, the widely used library libyaml
also fails to parse these examples, so it may not be a huge problem for most users.
This crate's minimum supported rustc
version is 1.31 (released with Rust 2018, after v0.4.3), as this is the currently known minimum version for regex
as well.
Fork & PR on Github.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
Download Details:
Author: chyh1990
Source Code: https://github.com/chyh1990/yaml-rust
License: Apache-2.0, MIT licenses found
1654965000
quick-xml
High performance xml pull reader/writer.
The reader:
Cow
whenever possible)encoding
feature), namespaces resolution, special characters.use quick_xml::Reader;
use quick_xml::events::Event;
let xml = r#"<tag1 att1 = "test">
<tag2><!--Test comment-->Test</tag2>
<tag2>
Test 2
</tag2>
</tag1>"#;
let mut reader = Reader::from_str(xml);
reader.trim_text(true);
let mut count = 0;
let mut txt = Vec::new();
let mut buf = Vec::new();
// The `Reader` does not implement `Iterator` because it outputs borrowed data (`Cow`s)
loop {
// NOTE: this is the generic case when we don't know about the input BufRead.
// when the input is a &str or a &[u8], we don't actually need to use another
// buffer, we could directly call `reader.read_event_unbuffered()`
match reader.read_event(&mut buf) {
Ok(Event::Start(ref e)) => {
match e.name() {
b"tag1" => println!("attributes values: {:?}",
e.attributes().map(|a| a.unwrap().value).collect::<Vec<_>>()),
b"tag2" => count += 1,
_ => (),
}
},
Ok(Event::Text(e)) => txt.push(e.unescape_and_decode(&reader).unwrap()),
Ok(Event::Eof) => break, // exits the loop when reaching end of file
Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
_ => (), // There are several other `Event`s we do not consider here
}
// if we don't keep a borrow elsewhere, we can clear the buffer to keep memory usage low
buf.clear();
}
use quick_xml::Writer;
use quick_xml::Reader;
use quick_xml::events::{Event, BytesEnd, BytesStart};
use std::io::Cursor;
use std::iter;
let xml = r#"<this_tag k1="v1" k2="v2"><child>text</child></this_tag>"#;
let mut reader = Reader::from_str(xml);
reader.trim_text(true);
let mut writer = Writer::new(Cursor::new(Vec::new()));
let mut buf = Vec::new();
loop {
match reader.read_event(&mut buf) {
Ok(Event::Start(ref e)) if e.name() == b"this_tag" => {
// crates a new element ... alternatively we could reuse `e` by calling
// `e.into_owned()`
let mut elem = BytesStart::owned(b"my_elem".to_vec(), "my_elem".len());
// collect existing attributes
elem.extend_attributes(e.attributes().map(|attr| attr.unwrap()));
// copy existing attributes, adds a new my-key="some value" attribute
elem.push_attribute(("my-key", "some value"));
// writes the event to the writer
assert!(writer.write_event(Event::Start(elem)).is_ok());
},
Ok(Event::End(ref e)) if e.name() == b"this_tag" => {
assert!(writer.write_event(Event::End(BytesEnd::borrowed(b"my_elem"))).is_ok());
},
Ok(Event::Eof) => break,
// you can use either `e` or `&e` if you don't want to move the event
Ok(e) => assert!(writer.write_event(&e).is_ok()),
Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
}
buf.clear();
}
let result = writer.into_inner().into_inner();
let expected = r#"<my_elem k1="v1" k2="v2" my-key="some value"><child>text</child></my_elem>"#;
assert_eq!(result, expected.as_bytes());
When using the serialize
feature, quick-xml can be used with serde's Serialize
/Deserialize
traits.
Here is an example deserializing crates.io source:
// Cargo.toml
// [dependencies]
// serde = { version = "1.0", features = [ "derive" ] }
// quick-xml = { version = "0.22", features = [ "serialize" ] }
use serde::Deserialize;
use quick_xml::de::{from_str, DeError};
#[derive(Debug, Deserialize, PartialEq)]
struct Link {
rel: String,
href: String,
sizes: Option<String>,
}
#[derive(Debug, Deserialize, PartialEq)]
#[serde(rename_all = "lowercase")]
enum Lang {
En,
Fr,
De,
}
#[derive(Debug, Deserialize, PartialEq)]
struct Head {
title: String,
#[serde(rename = "link", default)]
links: Vec<Link>,
}
#[derive(Debug, Deserialize, PartialEq)]
struct Script {
src: String,
integrity: String,
}
#[derive(Debug, Deserialize, PartialEq)]
struct Body {
#[serde(rename = "script", default)]
scripts: Vec<Script>,
}
#[derive(Debug, Deserialize, PartialEq)]
struct Html {
lang: Option<String>,
head: Head,
body: Body,
}
fn crates_io() -> Result<Html, DeError> {
let xml = "<!DOCTYPE html>
<html lang=\"en\">
<head>
<meta charset=\"utf-8\">
<meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">
<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">
<title>crates.io: Rust Package Registry</title>
<!-- EMBER_CLI_FASTBOOT_TITLE --><!-- EMBER_CLI_FASTBOOT_HEAD -->
<link rel=\"manifest\" href=\"/manifest.webmanifest\">
<link rel=\"apple-touch-icon\" href=\"/cargo-835dd6a18132048a52ac569f2615b59d.png\" sizes=\"227x227\">
<link rel=\"stylesheet\" href=\"/assets/vendor-8d023d47762d5431764f589a6012123e.css\" integrity=\"sha256-EoB7fsYkdS7BZba47+C/9D7yxwPZojsE4pO7RIuUXdE= sha512-/SzGQGR0yj5AG6YPehZB3b6MjpnuNCTOGREQTStETobVRrpYPZKneJwcL/14B8ufcvobJGFDvnTKdcDDxbh6/A==\" >
<link rel=\"stylesheet\" href=\"/assets/cargo-cedb8082b232ce89dd449d869fb54b98.css\" integrity=\"sha256-S9K9jZr6nSyYicYad3JdiTKrvsstXZrvYqmLUX9i3tc= sha512-CDGjy3xeyiqBgUMa+GelihW394pqAARXwsU+HIiOotlnp1sLBVgO6v2ZszL0arwKU8CpvL9wHyLYBIdfX92YbQ==\" >
<link rel=\"shortcut icon\" href=\"/favicon.ico\" type=\"image/x-icon\">
<link rel=\"icon\" href=\"/cargo-835dd6a18132048a52ac569f2615b59d.png\" type=\"image/png\">
<link rel=\"search\" href=\"/opensearch.xml\" type=\"application/opensearchdescription+xml\" title=\"Cargo\">
</head>
<body>
<!-- EMBER_CLI_FASTBOOT_BODY -->
<noscript>
<div id=\"main\">
<div class='noscript'>
This site requires JavaScript to be enabled.
</div>
</div>
</noscript>
<script src=\"/assets/vendor-bfe89101b20262535de5a5ccdc276965.js\" integrity=\"sha256-U12Xuwhz1bhJXWyFW/hRr+Wa8B6FFDheTowik5VLkbw= sha512-J/cUUuUN55TrdG8P6Zk3/slI0nTgzYb8pOQlrXfaLgzr9aEumr9D1EzmFyLy1nrhaDGpRN1T8EQrU21Jl81pJQ==\" ></script>
<script src=\"/assets/cargo-4023b68501b7b3e17b2bb31f50f5eeea.js\" integrity=\"sha256-9atimKc1KC6HMJF/B07lP3Cjtgr2tmET8Vau0Re5mVI= sha512-XJyBDQU4wtA1aPyPXaFzTE5Wh/mYJwkKHqZ/Fn4p/ezgdKzSCFu6FYn81raBCnCBNsihfhrkb88uF6H5VraHMA==\" ></script>
</body>
</html>
}";
let html: Html = from_str(xml)?;
assert_eq!(&html.head.title, "crates.io: Rust Package Registry");
Ok(html)
}
This has largely been inspired by serde-xml-rs. quick-xml follows its convention for deserialization, including the $value
special name.
Original quick-xml was developed by @tafia and abandoned around end of 2021.
If you have an input of the form <foo abc="xyz">bar</foo>
, and you want to get at the bar
, you can use the special name $value
:
struct Foo {
pub abc: String,
#[serde(rename = "$value")]
pub body: String,
}
If your XML files look like <root><first>value</first><second>value</second></root>
, you can (de)serialize them with the special name prefix $unflatten=
:
struct Root {
#[serde(rename = "$unflatten=first")]
first: String,
#[serde(rename = "$unflatten=second")]
other_field: String,
}
The $primitive
prefix lets you serialize enum variants without associated values (internally referred to as unit variants) as primitive strings rather than self-closing tags. Consider the following definitions:
enum Foo {
#[serde(rename = "$primitive=Bar")]
Bar
}
struct Root {
foo: Foo
}
Serializing Root { foo: Foo::Bar }
will then yield <Root foo="Bar"/>
instead of <Root><Bar/></Root>
.
Note that despite not focusing on performance (there are several unnecessary copies), it remains about 10x faster than serde-xml-rs.
Features
encoding
: support non utf8 xmlsserialize
: support serde Serialize
/Deserialize
Benchmarking is hard and the results depend on your input file and your machine.
Here on my particular file, quick-xml is around 50 times faster than xml-rs crate. (measurements was done while this crate named quick-xml)
// quick-xml benches
test bench_quick_xml ... bench: 198,866 ns/iter (+/- 9,663)
test bench_quick_xml_escaped ... bench: 282,740 ns/iter (+/- 61,625)
test bench_quick_xml_namespaced ... bench: 389,977 ns/iter (+/- 32,045)
// same bench with xml-rs
test bench_xml_rs ... bench: 14,468,930 ns/iter (+/- 321,171)
// serde-xml-rs vs serialize feature
test bench_serde_quick_xml ... bench: 1,181,198 ns/iter (+/- 138,290)
test bench_serde_xml_rs ... bench: 15,039,564 ns/iter (+/- 783,485)
For a feature and performance comparison, you can also have a look at RazrFalcon's parser comparison table.
Any PR is welcomed!
Syntax is inspired by xml-rs.
Download Details:
Author: tafia
Source Code: https://github.com/tafia/quick-xml
License: MIT license
1654957680
SXD-XPath
An XML XPath library in Rust
The project is broken into two crates:
document
- Basic DOM manipulation and reading/writing XML from strings.xpath
- Implementation of XPath 1.0 expressions.There are also scattered utilities for playing around at the command line.
In the future, I hope to add support for XSLT 1.0.
This project has a lofty goal: replace libxml and libxslt.
git checkout -b my-new-feature
)git commit -am 'Add some feature'
)git push origin my-new-feature
)Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you shall be dual licensed as above, without any additional terms or conditions.
Download Details:
Author: shepmaster
Source Code: https://github.com/shepmaster/sxd-xpath
License: Apache-2.0, MIT licenses found
1654950120
An XML library in Rust.
The project is currently broken into two crates:
document
- Basic DOM manipulation and reading/writing XML from strings.xpath
- Implementation of XPath 1.0 expressions.There are also scattered utilities for playing around at the command line.
In the future, I hope to add support for XSLT 1.0.
This project has two goals, one more achievable than the other:
git checkout -b my-new-feature
)git commit -am 'Add some feature'
)git push origin my-new-feature
)Download Details:
Author: shepmaster
Source Code: https://github.com/shepmaster/sxd-document
License: MIT license
1654942800
xml-rs is an XML library for Rust programming language. It is heavily inspired by Java Streaming API for XML (StAX).
This library currently contains pull parser much like StAX event reader. It provides iterator API, so you can leverage Rust's existing iterators library features.
It also provides a streaming document writer much like StAX event writer. This writer consumes its own set of events, but reader events can be converted to writer events easily, and so it is possible to write XML transformation chains in a pretty clean manner.
This parser is mostly full-featured, however, there are limitations:
<!DOCTYPE>
declarations are completely ignored; thus no support for custom entities too; internal DTD declarations are likely to cause parsing errors;Other than that the parser tries to be mostly XML-1.0-compliant.
Writer is also mostly full-featured with the following limitations:
<!DOCTYPE>
declarations;What is planned (highest priority first, approximately):
xml-rs uses Cargo, so just add a dependency section in your project's manifest:
[dependencies]
xml-rs = "0.8"
The package exposes a single crate called xml
:
extern crate xml;
xml::reader::EventReader
requires a Read
instance to read from. When a proper stream-based encoding library is available, it is likely that xml-rs will be switched to use whatever character stream structure this library would provide, but currently it is a Read
.
Using EventReader
is very straightforward. Just provide a Read
instance to obtain an iterator over events:
extern crate xml;
use std::fs::File;
use std::io::BufReader;
use xml::reader::{EventReader, XmlEvent};
fn indent(size: usize) -> String {
const INDENT: &'static str = " ";
(0..size).map(|_| INDENT)
.fold(String::with_capacity(size*INDENT.len()), |r, s| r + s)
}
fn main() {
let file = File::open("file.xml").unwrap();
let file = BufReader::new(file);
let parser = EventReader::new(file);
let mut depth = 0;
for e in parser {
match e {
Ok(XmlEvent::StartElement { name, .. }) => {
println!("{}+{}", indent(depth), name);
depth += 1;
}
Ok(XmlEvent::EndElement { name }) => {
depth -= 1;
println!("{}-{}", indent(depth), name);
}
Err(e) => {
println!("Error: {}", e);
break;
}
_ => {}
}
}
}
EventReader
implements IntoIterator
trait, so you can just use it in a for
loop directly. Document parsing can end normally or with an error. Regardless of exact cause, the parsing process will be stopped, and iterator will terminate normally.
You can also have finer control over when to pull the next event from the parser using its own next()
method:
match parser.next() {
...
}
Upon the end of the document or an error the parser will remember that last event and will always return it in the result of next()
call afterwards. If iterator is used, then it will yield error or end-of-document event once and will produce None
afterwards.
It is also possible to tweak parsing process a little using xml::reader::ParserConfig
structure. See its documentation for more information and examples.
You can find a more extensive example of using EventReader
in src/analyze.rs
, which is a small program (BTW, it is built with cargo build
and can be run after that) which shows various statistics about specified XML document. It can also be used to check for well-formedness of XML documents - if a document is not well-formed, this program will exit with an error.
xml-rs also provides a streaming writer much like StAX event writer. With it you can write an XML document to any Write
implementor.
extern crate xml;
use std::fs::File;
use std::io::{self, Write};
use xml::writer::{EventWriter, EmitterConfig, XmlEvent, Result};
fn handle_event<W: Write>(w: &mut EventWriter<W>, line: String) -> Result<()> {
let line = line.trim();
let event: XmlEvent = if line.starts_with("+") && line.len() > 1 {
XmlEvent::start_element(&line[1..]).into()
} else if line.starts_with("-") {
XmlEvent::end_element().into()
} else {
XmlEvent::characters(&line).into()
};
w.write(event)
}
fn main() {
let mut file = File::create("output.xml").unwrap();
let mut input = io::stdin();
let mut output = io::stdout();
let mut writer = EmitterConfig::new().perform_indent(true).create_writer(&mut file);
loop {
print!("> "); output.flush().unwrap();
let mut line = String::new();
match input.read_line(&mut line) {
Ok(0) => break,
Ok(_) => match handle_event(&mut writer, line) {
Ok(_) => {}
Err(e) => panic!("Write error: {}", e)
},
Err(e) => panic!("Input error: {}", e)
}
}
}
The code example above also demonstrates how to create a writer out of its configuration. Similar thing also works with EventReader
.
The library provides an XML event building DSL which helps to construct complex events, e.g. ones having namespace definitions. Some examples:
// <a:hello a:param="value" xmlns:a="urn:some:document">
XmlEvent::start_element("a:hello").attr("a:param", "value").ns("a", "urn:some:document")
// <hello b:config="name" xmlns="urn:default:uri">
XmlEvent::start_element("hello").attr("b:config", "value").default_ns("urn:defaul:uri")
// <![CDATA[some unescaped text]]>
XmlEvent::cdata("some unescaped text")
Of course, one can create XmlEvent
enum variants directly instead of using the builder DSL. There are more examples in xml::writer::XmlEvent
documentation.
The writer has multiple configuration options; see EmitterConfig
documentation for more information.
No performance tests or measurements are done. The implementation is rather naive, and no specific optimizations are made. Hopefully the library is sufficiently fast to process documents of common size. I intend to add benchmarks in future, but not until more important features are added.
All known issues are present on GitHub issue tracker: http://github.com/netvl/xml-rs/issues. Feel free to post any found problems there.
Download Details:
Author: netvl
Source Code: https://github.com/netvl/xml-rs
License: MIT license
1654935480
Yet Another Serializer/Deserializer specialized for XML
This library will support XML de/ser-ializing with all specific features.
Any type can define a custom deserializer and/or serializer. To implement it, define the implementation of YaDeserialize/YaSerialize
impl YaDeserialize for MyType {
fn deserialize<R: Read>(reader: &mut yaserde::de::Deserializer<R>) -> Result<Self, String> {
// deserializer code
}
}
impl YaSerialize for MyType {
fn serialize<W: Write>(&self, writer: &mut yaserde::ser::Serializer<W>) -> Result<(), String> {
// serializer code
}
}
Download Details:
Author: media-io
Source Code: https://github.com/media-io/yaserde
License: MIT license
1654928100
RustyXML is a namespace aware XML parser written in Rust. Right now it provides a basic SAX-like API, and an ElementBuilder based on that.
The parser itself is derived from OFXMLParser as found in ObjFW https://webkeks.org/objfw/.
The current limitations are:
The Minimal Supported Rust Version for this crate is Rust 1.40.0.
Parse a string into an Element
struct:
use xml::Element;
let elem: Option<Element> = "<a href='//example.com'/>".parse();
Get events from parsing string data:
use xml::{Event, Parser};
// Create a new Parser
let mut p = Parser::new();
// Feed data to be parsed
p.feed_str("<a href");
p.feed_str("='//example.com'/>");
// Get events for the fed data
for event in p {
match event.unwrap() {
Event::ElementStart(tag) => println!("<{}>", tag.name),
Event::ElementEnd(tag) => println!("</{}>", tag.name),
_ => ()
}
}
This should print:
<a>
</a>
Build Element
s from Parser
Event
s:
use xml::{Parser, ElementBuilder};
let mut p = xml::Parser::new();
let mut e = xml::ElementBuilder::new();
p.feed_str("<a href='//example.com'/>");
for elem in p.filter_map(|x| e.handle_event(x)) {
match elem {
Ok(e) => println!("{}", e),
Err(e) => println!("{}", e),
}
}
Build Element
s by hand:
let mut reply = xml::Element::new("iq".into(), Some("jabber:client".into()),
vec![("type".into(), None, "error".into()),
("id".into(), None, "42".into())]);
reply.tag(xml::Element::new("error".into(), Some("jabber:client".into()),
vec![("type".into(), None, "cancel".into())]))
.tag_stay(xml::Element::new("forbidden".into(),
Some("urn:ietf:params:xml:ns:xmpp-stanzas".into()),
vec![]))
.tag(xml::Element::new("text".into(),
Some("urn:ietf:params:xml:ns:xmpp-stanzas".into()),
vec![]))
.text("Permission denied".into());
Result (some whitespace added for readability):
<iq xmlns='jabber:client' id='42' type='error'>
<error type='cancel'>
<forbidden xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/>
<text xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'>Permission denied</text>
</error>
</iq>
By default the order of attributes is not tracked. Therefore during serialization and iteration their order will be random. This can be changed by enabling the ordered_attrs
feature. With this feature enabled the order attributes were encountered while parsing, or added to an Element
will be preserved.
Download Details:
Author: Florob
Source Code: https://github.com/Florob/RustyXML
License: Apache-2.0, MIT licenses found
1654920780
Rusty Object Notation
RON is a simple readable data serialization format that looks similar to Rust syntax. It's designed to support all of Serde's data model, so structs, enums, tuples, arrays, generic maps, and primitive values.
GameConfig( // optional struct name
window_size: (800, 600),
window_title: "PAC-MAN",
fullscreen: false,
mouse_sensitivity: 1.4,
key_bindings: {
"up": Up,
"down": Down,
"left": Left,
"right": Right,
// Uncomment to enable WASD controls
/*
"W": Up,
"A": Down,
"S": Left,
"D": Right,
*/
},
difficulty_options: (
start_difficulty: Easy,
adaptive: false,
),
)
{
"materials": {
"metal": {
"reflectivity": 1.0
},
"plastic": {
"reflectivity": 0.5
}
},
"entities": [
{
"name": "hero",
"material": "metal"
},
{
"name": "monster",
"material": "plastic"
}
]
}
Scene( // class name is optional
materials: { // this is a map
"metal": (
reflectivity: 1.0,
),
"plastic": (
reflectivity: 0.5,
),
},
entities: [ // this is an array
(
name: "hero",
material: "metal",
),
(
name: "monster",
material: "plastic",
),
],
)
Note the following advantages of RON over JSON:
42
, 3.14
, 0xFF
, 0b0110
"Hello"
, "with\\escapes\n"
, r#"raw string, great for regex\."#
true
, false
'e'
, '\n'
Some("string")
, Some(Some(1.34))
, None
("abc", 1.23, true)
, ()
["abc", "def"]
( foo: 1.0, bar: ( baz: "I'm nested" ) )
{ "arbitrary": "keys", "are": "allowed" }
Note: Serde's data model represents fixed-size Rust arrays as tuple (instead of as list)
Cargo.toml
[dependencies]
ron = "0.7"
serde = { version = "1", features = ["derive"] }
main.rs
use serde::{Deserialize, Serialize};
#[derive(Debug, Deserialize, Serialize)]
struct MyStruct {
boolean: bool,
float: f32,
}
fn main() {
let x: MyStruct = ron::from_str("(boolean: true, float: 1.23)").unwrap();
println!("RON: {}", ron::to_string(&x).unwrap());
}
Editor | Plugin |
---|---|
IntelliJ | intellij-ron |
VS Code | a5huynh/vscode-ron |
Sublime Text | RON |
Atom | language-ron |
Vim | ron-rs/ron.vim |
EMACS | [emacs-ron] |
There is a very basic, work in progress specification available on the wiki page. A more formal and complete grammar is available here.
Download Details:
Author: ron-rs
Source Code: https://github.com/ron-rs/ron
License: Apache-2.0, MIT licenses found
1654913400
prost
is a Protocol Buffers implementation for the Rust Language. prost
generates simple, idiomatic Rust code from proto2
and proto3
files.
Compared to other Protocol Buffers implementations, prost
derive
attributes..proto
files in generated Rust code..proto
) to be serialized and deserialized by adding attributes.bytes::{Buf, BufMut}
abstractions for serialization instead of std::io::{Read, Write}
.package
specifier when organizing generated code into Rust modules.prost
in a Cargo ProjectFirst, add prost
and its public dependencies to your Cargo.toml
:
[dependencies]
prost = "0.10"
# Only necessary if using Protobuf well-known types:
prost-types = "0.10"
The recommended way to add .proto
compilation to a Cargo project is to use the prost-build
library. See the prost-build
documentation for more details and examples.
See the snazzy repository for a simple start-to-finish example.
prost
generates Rust code from source .proto
files using the proto2
or proto3
syntax. prost
's goal is to make the generated code as simple as possible.
protoc
It's recommended to install protoc
locally in your path to improve build times. Prost uses protoc
to parse protobuf files and will attempt to compile protobuf from source requiring a C++ toolchain. For more info checkout the prost-build
docs.
Prost can now generate code for .proto
files that don't have a package spec. prost
will translate the Protobuf package into a Rust module. For example, given the package
specifier:
package foo.bar;
All Rust types generated from the file will be in the foo::bar
module.
Given a simple message declaration:
// Sample message.
message Foo {
}
prost
will generate the following Rust struct:
/// Sample message.
#[derive(Clone, Debug, PartialEq, Message)]
pub struct Foo {
}
Fields in Protobuf messages are translated into Rust as public struct fields of the corresponding type.
Scalar value types are converted as follows:
Protobuf Type | Rust Type |
---|---|
double | f64 |
float | f32 |
int32 | i32 |
int64 | i64 |
uint32 | u32 |
uint64 | u64 |
sint32 | i32 |
sint64 | i64 |
fixed32 | u32 |
fixed64 | u64 |
sfixed32 | i32 |
sfixed64 | i64 |
bool | bool |
string | String |
bytes | Vec<u8> |
All .proto
enumeration types convert to the Rust i32
type. Additionally, each enumeration type gets a corresponding Rust enum
type. For example, this proto
enum:
enum PhoneType {
MOBILE = 0;
HOME = 1;
WORK = 2;
}
gets this corresponding Rust enum [1]:
pub enum PhoneType {
Mobile = 0,
Home = 1,
Work = 2,
}
You can convert a PhoneType
value to an i32
by doing:
PhoneType::Mobile as i32
The #[derive(::prost::Enumeration)]
annotation added to the generated PhoneType
adds these associated functions to the type:
impl PhoneType {
pub fn is_valid(value: i32) -> bool { ... }
pub fn from_i32(value: i32) -> Option<PhoneType> { ... }
}
so you can convert an i32
to its corresponding PhoneType
value by doing, for example:
let phone_type = 2i32;
match PhoneType::from_i32(phone_type) {
Some(PhoneType::Mobile) => ...,
Some(PhoneType::Home) => ...,
Some(PhoneType::Work) => ...,
None => ...,
}
Additionally, wherever a proto
enum is used as a field in a Message
, the message will have 'accessor' methods to get/set the value of the field as the Rust enum type. For instance, this proto PhoneNumber
message that has a field named type
of type PhoneType
:
message PhoneNumber {
string number = 1;
PhoneType type = 2;
}
will become the following Rust type [1] with methods type
and set_type
:
pub struct PhoneNumber {
pub number: String,
pub r#type: i32, // the `r#` is needed because `type` is a Rust keyword
}
impl PhoneNumber {
pub fn r#type(&self) -> PhoneType { ... }
pub fn set_type(&mut self, value: PhoneType) { ... }
}
Note that the getter methods will return the Rust enum's default value if the field has an invalid i32
value.
The enum
type isn't used directly as a field, because the Protobuf spec mandates that enumerations values are 'open', and decoding unrecognized enumeration values must be possible.
[1] Annotations have been elided for clarity. See below for a full example.
Protobuf scalar value and enumeration message fields can have a modifier depending on the Protobuf version. Modifiers change the corresponding type of the Rust field:
.proto Version | Modifier | Rust Type |
---|---|---|
proto2 | optional | Option<T> |
proto2 | required | T |
proto3 | default | T for scalar types, Option<T> otherwise |
proto3 | optional | Option<T> |
proto2 /proto3 | repeated | Vec<T> |
Note that in proto3
the default representation for all user-defined message types is Option<T>
, and for scalar types just T
(during decoding, a missing value is populated by T::default()
). If you need a witness of the presence of a scalar type T
, use the optional
modifier to enforce an Option<T>
representation in the generated Rust struct.
Map fields are converted to a Rust HashMap
with key and value type converted from the Protobuf key and value types.
Message fields are converted to the corresponding struct type. The table of field modifiers above applies to message fields, except that proto3
message fields without a modifier (the default) will be wrapped in an Option
. Typically message fields are unboxed. prost
will automatically box a message field if the field type and the parent type are recursively nested in order to avoid an infinite sized struct.
Oneof fields convert to a Rust enum. Protobuf oneof
s types are not named, so prost
uses the name of the oneof
field for the resulting Rust enum, and defines the enum in a module under the struct. For example, a proto3
message such as:
message Foo {
oneof widget {
int32 quux = 1;
string bar = 2;
}
}
generates the following Rust[1]:
pub struct Foo {
pub widget: Option<foo::Widget>,
}
pub mod foo {
pub enum Widget {
Quux(i32),
Bar(String),
}
}
oneof
fields are always wrapped in an Option
.
[1] Annotations have been elided for clarity. See below for a full example.
prost-build
allows a custom code-generator to be used for processing service
definitions. This can be used to output Rust traits according to an application's specific needs.
Example .proto
file:
syntax = "proto3";
package tutorial;
message Person {
string name = 1;
int32 id = 2; // Unique ID number for this person.
string email = 3;
enum PhoneType {
MOBILE = 0;
HOME = 1;
WORK = 2;
}
message PhoneNumber {
string number = 1;
PhoneType type = 2;
}
repeated PhoneNumber phones = 4;
}
// Our address book file is just one of these.
message AddressBook {
repeated Person people = 1;
}
and the generated Rust code (tutorial.rs
):
#[derive(Clone, PartialEq, ::prost::Message)]
pub struct Person {
#[prost(string, tag="1")]
pub name: ::prost::alloc::string::String,
/// Unique ID number for this person.
#[prost(int32, tag="2")]
pub id: i32,
#[prost(string, tag="3")]
pub email: ::prost::alloc::string::String,
#[prost(message, repeated, tag="4")]
pub phones: ::prost::alloc::vec::Vec<person::PhoneNumber>,
}
/// Nested message and enum types in `Person`.
pub mod person {
#[derive(Clone, PartialEq, ::prost::Message)]
pub struct PhoneNumber {
#[prost(string, tag="1")]
pub number: ::prost::alloc::string::String,
#[prost(enumeration="PhoneType", tag="2")]
pub r#type: i32,
}
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, ::prost::Enumeration)]
#[repr(i32)]
pub enum PhoneType {
Mobile = 0,
Home = 1,
Work = 2,
}
}
/// Our address book file is just one of these.
#[derive(Clone, PartialEq, ::prost::Message)]
pub struct AddressBook {
#[prost(message, repeated, tag="1")]
pub people: ::prost::alloc::vec::Vec<Person>,
}
protoc
FileDescriptorSet
The prost_build::Config::file_descriptor_set_path
option can be used to emit a file descriptor set during the build & code generation step. When used in conjunction with the std::include_bytes
macro and the prost_types::FileDescriptorSet
type, applications and libraries using Prost can implement introspection capabilities requiring details from the original .proto
files.
prost
in a no_std
Crateprost
is compatible with no_std
crates. To enable no_std
support, disable the std
features in prost
and prost-types
:
[dependencies]
prost = { version = "0.6", default-features = false, features = ["prost-derive"] }
# Only necessary if using Protobuf well-known types:
prost-types = { version = "0.6", default-features = false }
Additionally, configure prost-build
to output BTreeMap
s instead of HashMap
s for all Protobuf map
fields in your build.rs
:
let mut config = prost_build::Config::new();
config.btree_map(&["."]);
When using edition 2015, it may be necessary to add an extern crate core;
directive to the crate which includes prost
-generated code.
prost
uses a custom derive macro to handle encoding and decoding types, which means that if your existing Rust type is compatible with Protobuf types, you can serialize and deserialize it by adding the appropriate derive and field annotations.
Currently the best documentation on adding annotations is to look at the generated code examples above.
Prost automatically infers tags for the struct.
Fields are tagged sequentially in the order they are specified, starting with 1
.
You may skip tags which have been reserved, or where there are gaps between sequentially occurring tag values by specifying the tag number to skip to with the tag
attribute on the first field after the gap. The following fields will be tagged sequentially starting from the next number.
use prost;
use prost::{Enumeration, Message};
#[derive(Clone, PartialEq, Message)]
struct Person {
#[prost(string, tag = "1")]
pub id: String, // tag=1
// NOTE: Old "name" field has been removed
// pub name: String, // tag=2 (Removed)
#[prost(string, tag = "6")]
pub given_name: String, // tag=6
#[prost(string)]
pub family_name: String, // tag=7
#[prost(string)]
pub formatted_name: String, // tag=8
#[prost(uint32, tag = "3")]
pub age: u32, // tag=3
#[prost(uint32)]
pub height: u32, // tag=4
#[prost(enumeration = "Gender")]
pub gender: i32, // tag=5
// NOTE: Skip to less commonly occurring fields
#[prost(string, tag = "16")]
pub name_prefix: String, // tag=16 (eg. mr/mrs/ms)
#[prost(string)]
pub name_suffix: String, // tag=17 (eg. jr/esq)
#[prost(string)]
pub maiden_name: String, // tag=18
}
#[derive(Clone, Copy, Debug, PartialEq, Eq, Enumeration)]
pub enum Gender {
Unknown = 0,
Female = 1,
Male = 2,
}
prost
be implemented as a serializer for Serde?Probably not, however I would like to hear from a Serde expert on the matter. There are two complications with trying to serialize Protobuf messages with Serde:
serde
.Vec<i32>
: repeated int32
, repeated sint32
, repeated sfixed32
, and their packed counterparts.But it is possible to place serde
derive tags onto the generated types, so the same structure can support both prost
and Serde
.
2. I get errors when trying to run cargo test
on MacOS
If the errors are about missing autoreconf
or similar, you can probably fix them by running
brew install automake
brew install libtool
Download Details:
Author: tokio-rs
Source Code: https://github.com/tokio-rs/prost
License: Apache-2.0 license
1654913280
Protobuf implementation in Rust.
Documentation is hosted on docs.rs.
Version 3 is current stable version. Compared to version 2 it implements:
.proto
file on the fly without code generation)Version is previous stable version. Only most critical bugfixes will be applied to 2.x version, otherwise it won't be maintained.
The crate needs help:
See CHANGELOG.md for a list of changes and compatility issues between versions.
Download Details:
Author: stepancheg
Source Code: https://github.com/stepancheg/rust-protobuf
License: MIT license
1654906020
A Rust library for parsing and encoding PEM-encoded data.
Add this to your Cargo.toml
:
[dependencies]
pem = "1.0"
and this to your crate root:
extern crate pem;
Here is a simple example that parse PEM-encoded data and prints the tag:
extern crate pem;
use pem::parse;
const SAMPLE: &'static str = "-----BEGIN RSA PRIVATE KEY-----
MIIBPQIBAAJBAOsfi5AGYhdRs/x6q5H7kScxA0Kzzqe6WI6gf6+tc6IvKQJo5rQc
dWWSQ0nRGt2hOPDO+35NKhQEjBQxPh/v7n0CAwEAAQJBAOGaBAyuw0ICyENy5NsO
2gkT00AWTSzM9Zns0HedY31yEabkuFvrMCHjscEF7u3Y6PB7An3IzooBHchsFDei
AAECIQD/JahddzR5K3A6rzTidmAf1PBtqi7296EnWv8WvpfAAQIhAOvowIXZI4Un
DXjgZ9ekuUjZN+GUQRAVlkEEohGLVy59AiEA90VtqDdQuWWpvJX0cM08V10tLXrT
TTGsEtITid1ogAECIQDAaFl90ZgS5cMrL3wCeatVKzVUmuJmB/VAmlLFFGzK0QIh
ANJGc7AFk4fyFD/OezhwGHbWmo/S+bfeAiIh2Ss2FxKJ
-----END RSA PRIVATE KEY-----
let pem = parse(SAMPLE)?;
println!("PEM tag: {}", pem.tag);
Module documentation with examples
Download Details:
Author: jcreekmore
Source Code: https://github.com/jcreekmore/pem-rs
License: MIT license
1654898700
RMP - Rust MessagePack
RMP is a pure Rust MessagePack implementation.
This repository consists of three separate crates: the RMP core and two implementations to ease serializing and deserializing Rust structs.
Convenient API
RMP is designed to be lightweight and straightforward. There are low-level API, which gives you full control on data encoding/decoding process and makes no heap allocations. On the other hand there are high-level API, which provides you convenient interface using Rust standard library and compiler reflection, allowing to encode/decode structures using derive
attribute.
Zero-copy value decoding
RMP allows to decode bytes from a buffer in a zero-copy manner easily and blazingly fast, while Rust static checks guarantees that the data will be valid as long as the buffer lives.
Clear error handling
RMP's error system guarantees that you never receive an error enum with unreachable variant.
Robust and tested
This project is developed using TDD and CI, so any found bugs will be fixed without breaking existing functionality.
crates.rs | API Documentation |
---|---|
[![rmp][crates-rmp-img]][crates-rmp-url] | [RMP][rmp-docs-url] |
[![rmps][crates-rmps-img]][crates-rmps-url] | [RMP Serde][rmps-docs-url] |
[![rmpv][crates-rmpv-img]][crates-rmpv-url] | [RMP Value][rmpv-docs-url] |
Download Details:
Author: 3Hren
Source Code: https://github.com/3Hren/msgpack-rust
License: MIT license