Treebender: A Symbolic Natural Language Parsing Library for Rust

Treebender

A symbolic natural language parsing library for Rust, inspired by HDPSG.

What is this?

This is a library for parsing natural or constructed languages into syntax trees and feature structures. There's no machine learning or probabilistic models, everything is hand-crafted and deterministic.

You can find out more about the motivations of this project in this blog post.

But what are you using it for?

I'm using this to parse a constructed language for my upcoming xenolinguistics game, Themengi.

Motivation

Using a simple 80-line grammar, introduced in the tutorial below, we can parse a simple subset of English, checking reflexive pronoun binding, case, and number agreement.

$ cargo run --bin cli examples/reflexives.fgr
> she likes himself
Parsed 0 trees

> her likes herself
Parsed 0 trees

> she like herself
Parsed 0 trees

> she likes herself
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: she))
  (1..2: TV (1..2: likes))
  (2..3: N (2..3: herself)))
[
  child-2: [
    case: acc
    pron: ref
    needs_pron: #0 she
    num: sg
    child-0: [ word: herself ]
  ]
  child-1: [
    tense: nonpast
    child-0: [ word: likes ]
    num: #1 sg
  ]
  child-0: [
    child-0: [ word: she ]
    case: nom
    pron: #0
    num: #1
  ]
]

Low resource language? Low problem! No need to train on gigabytes of text, just write a grammar using your brain. Let's hypothesize that in American Sign Language, topicalized nouns (expressed with raised eyebrows) must appear first in the sentence. We can write a small grammar (18 lines), and plug in some sentences:

$ cargo run --bin cli examples/asl-wordorder.fgr -n
> boy sit
Parsed 1 tree
(0..2: S
  (0..1: NP ((0..1: N (0..1: boy))))
  (1..2: IV (1..2: sit)))

> boy throw ball
Parsed 1 tree
(0..3: S
  (0..1: NP ((0..1: N (0..1: boy))))
  (1..2: TV (1..2: throw))
  (2..3: NP ((2..3: N (2..3: ball)))))

> ball nm-raised-eyebrows boy throw
Parsed 1 tree
(0..4: S
  (0..2: NP
    (0..1: N (0..1: ball))
    (1..2: Topic (1..2: nm-raised-eyebrows)))
  (2..3: NP ((2..3: N (2..3: boy))))
  (3..4: TV (3..4: throw)))

> boy throw ball nm-raised-eyebrows
Parsed 0 trees

Tutorial

As an example, let's say we want to build a parser for English reflexive pronouns (himself, herself, themselves, themself, itself). We'll also support number ("He likes X" v.s. "They like X") and simple embedded clauses ("He said that they like X").

Grammar files are written in a custom language, similar to BNF, called Feature GRammar (.fgr). There's a VSCode syntax highlighting extension for these files available as fgr-syntax.

We'll start by defining our lexicon. The lexicon is the set of terminal symbols (symbols in the actual input) that the grammar will match. Terminal symbols must start with a lowercase letter, and non-terminal symbols must start with an uppercase letter.

// pronouns
N -> he
N -> him
N -> himself
N -> she
N -> her
N -> herself
N -> they
N -> them
N -> themselves
N -> themself

// names, lowercase as they are terminals
N -> mary
N -> sue
N -> takeshi
N -> robert

// complementizer
Comp -> that

// verbs -- intransitive, transitive, and clausal
IV -> falls
IV -> fall
IV -> fell

TV -> likes
TV -> like
TV -> liked

CV -> says
CV -> say
CV -> said

Next, we can add our sentence rules (they must be added at the top, as the first rule in the file is assumed to be the top-level rule):

// sentence rules
S -> N IV
S -> N TV N
S -> N CV Comp S

// ... previous lexicon ...

Assuming this file is saved as examples/no-features.fgr (which it is :wink:), we can test this file with the built-in CLI:

$ cargo run --bin cli examples/no-features.fgr
> he falls
Parsed 1 tree
(0..2: S
  (0..1: N (0..1: he))
  (1..2: IV (1..2: falls)))
[
  child-1: [ child-0: [ word: falls ] ]
  child-0: [ child-0: [ word: he ] ]
]

> he falls her
Parsed 0 trees

> he likes her
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: he))
  (1..2: TV (1..2: likes))
  (2..3: N (2..3: her)))
[
  child-2: [ child-0: [ word: her ] ]
  child-1: [ child-0: [ word: likes ] ]
  child-0: [ child-0: [ word: he ] ]
]

> he likes
Parsed 0 trees

> he said that he likes her
Parsed 1 tree
(0..6: S
  (0..1: N (0..1: he))
  (1..2: CV (1..2: said))
  (2..3: Comp (2..3: that))
  (3..6: S
    (3..4: N (3..4: he))
    (4..5: TV (4..5: likes))
    (5..6: N (5..6: her))))
[
  child-0: [ child-0: [ word: he ] ]
  child-2: [ child-0: [ word: that ] ]
  child-1: [ child-0: [ word: said ] ]
  child-3: [
    child-2: [ child-0: [ word: her ] ]
    child-1: [ child-0: [ word: likes ] ]
    child-0: [ child-0: [ word: he ] ]
  ]
]

> he said that he
Parsed 0 trees

This grammar already parses some correct sentences, and blocks some trivially incorrect ones. However, it doesn't care about number, case, or reflexives right now:

> she likes himself  // unbound reflexive pronoun
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: she))
  (1..2: TV (1..2: likes))
  (2..3: N (2..3: himself)))
[
  child-0: [ child-0: [ word: she ] ]
  child-2: [ child-0: [ word: himself ] ]
  child-1: [ child-0: [ word: likes ] ]
]

> him like her  // incorrect case on the subject pronoun, should be nominative
                // (he) instead of accusative (him)
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: him))
  (1..2: TV (1..2: like))
  (2..3: N (2..3: her)))
[
  child-0: [ child-0: [ word: him ] ]
  child-1: [ child-0: [ word: like ] ]
  child-2: [ child-0: [ word: her ] ]
]

> he like her  // incorrect verb number agreement
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: he))
  (1..2: TV (1..2: like))
  (2..3: N (2..3: her)))
[
  child-2: [ child-0: [ word: her ] ]
  child-1: [ child-0: [ word: like ] ]
  child-0: [ child-0: [ word: he ] ]
]

To fix this, we need to add features to our lexicon, and restrict the sentence rules based on features.

Features are added with square brackets, and are key: value pairs separated by commas. **top** is a special feature value, which basically means "unspecified" -- we'll come back to it later. Features that are unspecified are also assumed to have a **top** value, but sometimes explicitly stating top is more clear.

/// Pronouns
// The added features are:
// * num: sg or pl, whether this noun wants a singular verb (likes) or
//   a plural verb (like). note this is grammatical number, so for example
//   singular they takes plural agreement ("they like X", not *"they likes X")
// * case: nom or acc, whether this noun is nominative or accusative case.
//   nominative case goes in the subject, and accusative in the object.
//   e.g., "he fell" and "she likes him", not *"him fell" and *"her likes he"
// * pron: he, she, they, or ref -- what type of pronoun this is
// * needs_pron: whether this is a reflexive that needs to bind to another
//   pronoun.
N[ num: sg, case: nom, pron: he ]                    -> he
N[ num: sg, case: acc, pron: he ]                    -> him
N[ num: sg, case: acc, pron: ref, needs_pron: he ]   -> himself
N[ num: sg, case: nom, pron: she ]                   -> she
N[ num: sg, case: acc, pron: she ]                   -> her
N[ num: sg, case: acc, pron: ref, needs_pron: she]   -> herself
N[ num: pl, case: nom, pron: they ]                  -> they
N[ num: pl, case: acc, pron: they ]                  -> them
N[ num: pl, case: acc, pron: ref, needs_pron: they ] -> themselves
N[ num: sg, case: acc, pron: ref, needs_pron: they ] -> themself

// Names
// The added features are:
// * num: sg, as people are singular ("mary likes her" / *"mary like her")
// * case: **top**, as names can be both subjects and objects
//   ("mary likes her" / "she likes mary")
// * pron: whichever pronoun the person uses for reflexive agreement
//   mary    pron: she  => mary likes herself
//   sue     pron: they => sue likes themself
//   takeshi pron: he   => takeshi likes himself
N[ num: sg, case: **top**, pron: she ]  -> mary
N[ num: sg, case: **top**, pron: they ] -> sue
N[ num: sg, case: **top**, pron: he ]   -> takeshi
N[ num: sg, case: **top**, pron: he ]   -> robert

// Complementizer doesn't need features
Comp -> that

// Verbs -- intransitive, transitive, and clausal
// The added features are:
// * num: sg, pl, or **top** -- to match the noun numbers.
//   **top** will match either sg or pl, as past-tense verbs in English
//   don't agree in number: "he fell" and "they fell" are both fine
// * tense: past or nonpast -- this won't be used for agreement, but will be
//   copied into the final feature structure, and the client code could do
//   something with it
IV[ num:      sg, tense: nonpast ] -> falls
IV[ num:      pl, tense: nonpast ] -> fall
IV[ num: **top**, tense: past ]    -> fell

TV[ num:      sg, tense: nonpast ] -> likes
TV[ num:      pl, tense: nonpast ] -> like
TV[ num: **top**, tense: past ]    -> liked

CV[ num:      sg, tense: nonpast ] -> says
CV[ num:      pl, tense: nonpast ] -> say
CV[ num: **top**, tense: past ]    -> said

Now that our lexicon is updated with features, we can update our sentence rules to constrain parsing based on those features. This uses two new features, tags and unification. Tags allow features to be associated between nodes in a rule, and unification controls how those features are compatible. The rules for unification are:

  1. A string feature can unify with a string feature with the same value
  2. A top feature can unify with anything, and the nodes are merged
  3. A complex feature ([ ... ] structure) is recursively unified with another complex feature.

If unification fails anywhere, the parse is aborted and the tree is discarded. This allows the programmer to discard trees if features don't match.

// Sentence rules
// Intransitive verb:
// * Subject must be nominative case
// * Subject and verb must agree in number (copied through #1)
S -> N[ case: nom, num: #1 ] IV[ num: #1 ]
// Transitive verb:
// * Subject must be nominative case
// * Subject and verb must agree in number (copied through #2)
// * If there's a reflexive in the object position, make sure its `needs_pron`
//   feature matches the subject's `pron` feature. If the object isn't a
//   reflexive, then its `needs_pron` feature will implicitly be `**top**`, so
//   will unify with anything.
S -> N[ case: nom, pron: #1, num: #2 ] TV[ num: #2 ] N[ case: acc, needs_pron: #1 ]
// Clausal verb:
// * Subject must be nominative case
// * Subject and verb must agree in number (copied through #1)
// * Reflexives can't cross clause boundaries (*"He said that she likes himself"),
//   so we can ignore reflexives and delegate to inner clause rule
S -> N[ case: nom, num: #1 ] CV[ num: #1 ] Comp S

Now that we have this augmented grammar (available as examples/reflexives.fgr), we can try it out and see that it rejects illicit sentences that were previously accepted, while still accepting valid ones:

> he fell
Parsed 1 tree
(0..2: S
  (0..1: N (0..1: he))
  (1..2: IV (1..2: fell)))
[
  child-1: [
    child-0: [ word: fell ]
    num: #0 sg
    tense: past
  ]
  child-0: [
    pron: he
    case: nom
    num: #0
    child-0: [ word: he ]
  ]
]

> he like him
Parsed 0 trees

> he likes himself
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: he))
  (1..2: TV (1..2: likes))
  (2..3: N (2..3: himself)))
[
  child-1: [
    num: #0 sg
    child-0: [ word: likes ]
    tense: nonpast
  ]
  child-2: [
    needs_pron: #1 he
    num: sg
    child-0: [ word: himself ]
    pron: ref
    case: acc
  ]
  child-0: [
    child-0: [ word: he ]
    pron: #1
    num: #0
    case: nom
  ]
]

> he likes herself
Parsed 0 trees

> mary likes herself
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: mary))
  (1..2: TV (1..2: likes))
  (2..3: N (2..3: herself)))
[
  child-0: [
    pron: #0 she
    num: #1 sg
    case: nom
    child-0: [ word: mary ]
  ]
  child-1: [
    tense: nonpast
    child-0: [ word: likes ]
    num: #1
  ]
  child-2: [
    child-0: [ word: herself ]
    num: sg
    pron: ref
    case: acc
    needs_pron: #0
  ]
]

> mary likes themself
Parsed 0 trees

> sue likes themself
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: sue))
  (1..2: TV (1..2: likes))
  (2..3: N (2..3: themself)))
[
  child-0: [
    pron: #0 they
    child-0: [ word: sue ]
    case: nom
    num: #1 sg
  ]
  child-1: [
    tense: nonpast
    num: #1
    child-0: [ word: likes ]
  ]
  child-2: [
    needs_pron: #0
    case: acc
    pron: ref
    child-0: [ word: themself ]
    num: sg
  ]
]

> sue likes himself
Parsed 0 trees

If this is interesting to you and you want to learn more, you can check out my blog series, the excellent textbook Syntactic Theory: A Formal Introduction (2nd ed.), and the DELPH-IN project, whose work on the LKB inspired this simplified version.

Using from code

I need to write this section in more detail, but if you're comfortable with Rust, I suggest looking through the codebase. It's not perfect, it started as one of my first Rust projects (after migrating through F# -> TypeScript -> C in search of the right performance/ergonomics tradeoff), and it could use more tests, but overall it's not too bad.

Basically, the processing pipeline is:

  1. Make a Grammar struct
  • Grammar is defined in rules.rs.
  • The easiest way to make a Grammar is Grammar::parse_from_file, which is mostly a hand-written recusive descent parser in parse_grammar.rs. Yes, I recognize the irony here.
  1. It takes input (in Grammar::parse, which does everything for you, or Grammar::parse_chart, which just does the chart)
  2. The input is first chart-parsed in earley.rs
  3. Then, a forest is built from the chart, in forest.rs, using an algorithm I found in a very useful blog series I forget the URL for, because the algorithms in the academic literature for this are... weird.
  4. Finally, the feature unification is used to prune the forest down to only valid trees. It would be more efficient to do this during parsing, but meh.

The most interesting thing you can do via code and not via the CLI is probably getting at the raw feature DAG, as that would let you do things like pronoun coreference. The DAG code is in featurestructure.rs, and should be fairly approachable -- there's a lot of Rust ceremony around Rc<RefCell<...>> because using an arena allocation crate seemed too harlike overkill, but that is somewhat mitigated by the NodeRef type alias. Hit me up at https://vgel.me/contact if you need help with anything here!

Download Details:
Author: vgel
Source Code: https://github.com/vgel/treebender
License: MIT License

#rust  #machinelearning 

What is GEEK

Buddha Community

Treebender: A Symbolic Natural Language Parsing Library for Rust

Serde Rust: Serialization Framework for Rust

Serde

*Serde is a framework for serializing and deserializing Rust data structures efficiently and generically.*

You may be looking for:

Serde in action

Click to show Cargo.toml. Run this code in the playground.

[dependencies]

# The core APIs, including the Serialize and Deserialize traits. Always
# required when using Serde. The "derive" feature is only required when
# using #[derive(Serialize, Deserialize)] to make Serde work with structs
# and enums defined in your crate.
serde = { version = "1.0", features = ["derive"] }

# Each data format lives in its own crate; the sample code below uses JSON
# but you may be using a different one.
serde_json = "1.0"

 

use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize, Debug)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let point = Point { x: 1, y: 2 };

    // Convert the Point to a JSON string.
    let serialized = serde_json::to_string(&point).unwrap();

    // Prints serialized = {"x":1,"y":2}
    println!("serialized = {}", serialized);

    // Convert the JSON string back to a Point.
    let deserialized: Point = serde_json::from_str(&serialized).unwrap();

    // Prints deserialized = Point { x: 1, y: 2 }
    println!("deserialized = {:?}", deserialized);
}

Getting help

Serde is one of the most widely used Rust libraries so any place that Rustaceans congregate will be able to help you out. For chat, consider trying the #rust-questions or #rust-beginners channels of the unofficial community Discord (invite: https://discord.gg/rust-lang-community), the #rust-usage or #beginners channels of the official Rust Project Discord (invite: https://discord.gg/rust-lang), or the #general stream in Zulip. For asynchronous, consider the [rust] tag on StackOverflow, the /r/rust subreddit which has a pinned weekly easy questions post, or the Rust Discourse forum. It's acceptable to file a support issue in this repo but they tend not to get as many eyes as any of the above and may get closed without a response after some time.

Download Details:
Author: serde-rs
Source Code: https://github.com/serde-rs/serde
License: View license

#rust  #rustlang 

Treebender: A Symbolic Natural Language Parsing Library for Rust

Treebender

A symbolic natural language parsing library for Rust, inspired by HDPSG.

What is this?

This is a library for parsing natural or constructed languages into syntax trees and feature structures. There's no machine learning or probabilistic models, everything is hand-crafted and deterministic.

You can find out more about the motivations of this project in this blog post.

But what are you using it for?

I'm using this to parse a constructed language for my upcoming xenolinguistics game, Themengi.

Motivation

Using a simple 80-line grammar, introduced in the tutorial below, we can parse a simple subset of English, checking reflexive pronoun binding, case, and number agreement.

$ cargo run --bin cli examples/reflexives.fgr
> she likes himself
Parsed 0 trees

> her likes herself
Parsed 0 trees

> she like herself
Parsed 0 trees

> she likes herself
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: she))
  (1..2: TV (1..2: likes))
  (2..3: N (2..3: herself)))
[
  child-2: [
    case: acc
    pron: ref
    needs_pron: #0 she
    num: sg
    child-0: [ word: herself ]
  ]
  child-1: [
    tense: nonpast
    child-0: [ word: likes ]
    num: #1 sg
  ]
  child-0: [
    child-0: [ word: she ]
    case: nom
    pron: #0
    num: #1
  ]
]

Low resource language? Low problem! No need to train on gigabytes of text, just write a grammar using your brain. Let's hypothesize that in American Sign Language, topicalized nouns (expressed with raised eyebrows) must appear first in the sentence. We can write a small grammar (18 lines), and plug in some sentences:

$ cargo run --bin cli examples/asl-wordorder.fgr -n
> boy sit
Parsed 1 tree
(0..2: S
  (0..1: NP ((0..1: N (0..1: boy))))
  (1..2: IV (1..2: sit)))

> boy throw ball
Parsed 1 tree
(0..3: S
  (0..1: NP ((0..1: N (0..1: boy))))
  (1..2: TV (1..2: throw))
  (2..3: NP ((2..3: N (2..3: ball)))))

> ball nm-raised-eyebrows boy throw
Parsed 1 tree
(0..4: S
  (0..2: NP
    (0..1: N (0..1: ball))
    (1..2: Topic (1..2: nm-raised-eyebrows)))
  (2..3: NP ((2..3: N (2..3: boy))))
  (3..4: TV (3..4: throw)))

> boy throw ball nm-raised-eyebrows
Parsed 0 trees

Tutorial

As an example, let's say we want to build a parser for English reflexive pronouns (himself, herself, themselves, themself, itself). We'll also support number ("He likes X" v.s. "They like X") and simple embedded clauses ("He said that they like X").

Grammar files are written in a custom language, similar to BNF, called Feature GRammar (.fgr). There's a VSCode syntax highlighting extension for these files available as fgr-syntax.

We'll start by defining our lexicon. The lexicon is the set of terminal symbols (symbols in the actual input) that the grammar will match. Terminal symbols must start with a lowercase letter, and non-terminal symbols must start with an uppercase letter.

// pronouns
N -> he
N -> him
N -> himself
N -> she
N -> her
N -> herself
N -> they
N -> them
N -> themselves
N -> themself

// names, lowercase as they are terminals
N -> mary
N -> sue
N -> takeshi
N -> robert

// complementizer
Comp -> that

// verbs -- intransitive, transitive, and clausal
IV -> falls
IV -> fall
IV -> fell

TV -> likes
TV -> like
TV -> liked

CV -> says
CV -> say
CV -> said

Next, we can add our sentence rules (they must be added at the top, as the first rule in the file is assumed to be the top-level rule):

// sentence rules
S -> N IV
S -> N TV N
S -> N CV Comp S

// ... previous lexicon ...

Assuming this file is saved as examples/no-features.fgr (which it is :wink:), we can test this file with the built-in CLI:

$ cargo run --bin cli examples/no-features.fgr
> he falls
Parsed 1 tree
(0..2: S
  (0..1: N (0..1: he))
  (1..2: IV (1..2: falls)))
[
  child-1: [ child-0: [ word: falls ] ]
  child-0: [ child-0: [ word: he ] ]
]

> he falls her
Parsed 0 trees

> he likes her
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: he))
  (1..2: TV (1..2: likes))
  (2..3: N (2..3: her)))
[
  child-2: [ child-0: [ word: her ] ]
  child-1: [ child-0: [ word: likes ] ]
  child-0: [ child-0: [ word: he ] ]
]

> he likes
Parsed 0 trees

> he said that he likes her
Parsed 1 tree
(0..6: S
  (0..1: N (0..1: he))
  (1..2: CV (1..2: said))
  (2..3: Comp (2..3: that))
  (3..6: S
    (3..4: N (3..4: he))
    (4..5: TV (4..5: likes))
    (5..6: N (5..6: her))))
[
  child-0: [ child-0: [ word: he ] ]
  child-2: [ child-0: [ word: that ] ]
  child-1: [ child-0: [ word: said ] ]
  child-3: [
    child-2: [ child-0: [ word: her ] ]
    child-1: [ child-0: [ word: likes ] ]
    child-0: [ child-0: [ word: he ] ]
  ]
]

> he said that he
Parsed 0 trees

This grammar already parses some correct sentences, and blocks some trivially incorrect ones. However, it doesn't care about number, case, or reflexives right now:

> she likes himself  // unbound reflexive pronoun
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: she))
  (1..2: TV (1..2: likes))
  (2..3: N (2..3: himself)))
[
  child-0: [ child-0: [ word: she ] ]
  child-2: [ child-0: [ word: himself ] ]
  child-1: [ child-0: [ word: likes ] ]
]

> him like her  // incorrect case on the subject pronoun, should be nominative
                // (he) instead of accusative (him)
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: him))
  (1..2: TV (1..2: like))
  (2..3: N (2..3: her)))
[
  child-0: [ child-0: [ word: him ] ]
  child-1: [ child-0: [ word: like ] ]
  child-2: [ child-0: [ word: her ] ]
]

> he like her  // incorrect verb number agreement
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: he))
  (1..2: TV (1..2: like))
  (2..3: N (2..3: her)))
[
  child-2: [ child-0: [ word: her ] ]
  child-1: [ child-0: [ word: like ] ]
  child-0: [ child-0: [ word: he ] ]
]

To fix this, we need to add features to our lexicon, and restrict the sentence rules based on features.

Features are added with square brackets, and are key: value pairs separated by commas. **top** is a special feature value, which basically means "unspecified" -- we'll come back to it later. Features that are unspecified are also assumed to have a **top** value, but sometimes explicitly stating top is more clear.

/// Pronouns
// The added features are:
// * num: sg or pl, whether this noun wants a singular verb (likes) or
//   a plural verb (like). note this is grammatical number, so for example
//   singular they takes plural agreement ("they like X", not *"they likes X")
// * case: nom or acc, whether this noun is nominative or accusative case.
//   nominative case goes in the subject, and accusative in the object.
//   e.g., "he fell" and "she likes him", not *"him fell" and *"her likes he"
// * pron: he, she, they, or ref -- what type of pronoun this is
// * needs_pron: whether this is a reflexive that needs to bind to another
//   pronoun.
N[ num: sg, case: nom, pron: he ]                    -> he
N[ num: sg, case: acc, pron: he ]                    -> him
N[ num: sg, case: acc, pron: ref, needs_pron: he ]   -> himself
N[ num: sg, case: nom, pron: she ]                   -> she
N[ num: sg, case: acc, pron: she ]                   -> her
N[ num: sg, case: acc, pron: ref, needs_pron: she]   -> herself
N[ num: pl, case: nom, pron: they ]                  -> they
N[ num: pl, case: acc, pron: they ]                  -> them
N[ num: pl, case: acc, pron: ref, needs_pron: they ] -> themselves
N[ num: sg, case: acc, pron: ref, needs_pron: they ] -> themself

// Names
// The added features are:
// * num: sg, as people are singular ("mary likes her" / *"mary like her")
// * case: **top**, as names can be both subjects and objects
//   ("mary likes her" / "she likes mary")
// * pron: whichever pronoun the person uses for reflexive agreement
//   mary    pron: she  => mary likes herself
//   sue     pron: they => sue likes themself
//   takeshi pron: he   => takeshi likes himself
N[ num: sg, case: **top**, pron: she ]  -> mary
N[ num: sg, case: **top**, pron: they ] -> sue
N[ num: sg, case: **top**, pron: he ]   -> takeshi
N[ num: sg, case: **top**, pron: he ]   -> robert

// Complementizer doesn't need features
Comp -> that

// Verbs -- intransitive, transitive, and clausal
// The added features are:
// * num: sg, pl, or **top** -- to match the noun numbers.
//   **top** will match either sg or pl, as past-tense verbs in English
//   don't agree in number: "he fell" and "they fell" are both fine
// * tense: past or nonpast -- this won't be used for agreement, but will be
//   copied into the final feature structure, and the client code could do
//   something with it
IV[ num:      sg, tense: nonpast ] -> falls
IV[ num:      pl, tense: nonpast ] -> fall
IV[ num: **top**, tense: past ]    -> fell

TV[ num:      sg, tense: nonpast ] -> likes
TV[ num:      pl, tense: nonpast ] -> like
TV[ num: **top**, tense: past ]    -> liked

CV[ num:      sg, tense: nonpast ] -> says
CV[ num:      pl, tense: nonpast ] -> say
CV[ num: **top**, tense: past ]    -> said

Now that our lexicon is updated with features, we can update our sentence rules to constrain parsing based on those features. This uses two new features, tags and unification. Tags allow features to be associated between nodes in a rule, and unification controls how those features are compatible. The rules for unification are:

  1. A string feature can unify with a string feature with the same value
  2. A top feature can unify with anything, and the nodes are merged
  3. A complex feature ([ ... ] structure) is recursively unified with another complex feature.

If unification fails anywhere, the parse is aborted and the tree is discarded. This allows the programmer to discard trees if features don't match.

// Sentence rules
// Intransitive verb:
// * Subject must be nominative case
// * Subject and verb must agree in number (copied through #1)
S -> N[ case: nom, num: #1 ] IV[ num: #1 ]
// Transitive verb:
// * Subject must be nominative case
// * Subject and verb must agree in number (copied through #2)
// * If there's a reflexive in the object position, make sure its `needs_pron`
//   feature matches the subject's `pron` feature. If the object isn't a
//   reflexive, then its `needs_pron` feature will implicitly be `**top**`, so
//   will unify with anything.
S -> N[ case: nom, pron: #1, num: #2 ] TV[ num: #2 ] N[ case: acc, needs_pron: #1 ]
// Clausal verb:
// * Subject must be nominative case
// * Subject and verb must agree in number (copied through #1)
// * Reflexives can't cross clause boundaries (*"He said that she likes himself"),
//   so we can ignore reflexives and delegate to inner clause rule
S -> N[ case: nom, num: #1 ] CV[ num: #1 ] Comp S

Now that we have this augmented grammar (available as examples/reflexives.fgr), we can try it out and see that it rejects illicit sentences that were previously accepted, while still accepting valid ones:

> he fell
Parsed 1 tree
(0..2: S
  (0..1: N (0..1: he))
  (1..2: IV (1..2: fell)))
[
  child-1: [
    child-0: [ word: fell ]
    num: #0 sg
    tense: past
  ]
  child-0: [
    pron: he
    case: nom
    num: #0
    child-0: [ word: he ]
  ]
]

> he like him
Parsed 0 trees

> he likes himself
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: he))
  (1..2: TV (1..2: likes))
  (2..3: N (2..3: himself)))
[
  child-1: [
    num: #0 sg
    child-0: [ word: likes ]
    tense: nonpast
  ]
  child-2: [
    needs_pron: #1 he
    num: sg
    child-0: [ word: himself ]
    pron: ref
    case: acc
  ]
  child-0: [
    child-0: [ word: he ]
    pron: #1
    num: #0
    case: nom
  ]
]

> he likes herself
Parsed 0 trees

> mary likes herself
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: mary))
  (1..2: TV (1..2: likes))
  (2..3: N (2..3: herself)))
[
  child-0: [
    pron: #0 she
    num: #1 sg
    case: nom
    child-0: [ word: mary ]
  ]
  child-1: [
    tense: nonpast
    child-0: [ word: likes ]
    num: #1
  ]
  child-2: [
    child-0: [ word: herself ]
    num: sg
    pron: ref
    case: acc
    needs_pron: #0
  ]
]

> mary likes themself
Parsed 0 trees

> sue likes themself
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: sue))
  (1..2: TV (1..2: likes))
  (2..3: N (2..3: themself)))
[
  child-0: [
    pron: #0 they
    child-0: [ word: sue ]
    case: nom
    num: #1 sg
  ]
  child-1: [
    tense: nonpast
    num: #1
    child-0: [ word: likes ]
  ]
  child-2: [
    needs_pron: #0
    case: acc
    pron: ref
    child-0: [ word: themself ]
    num: sg
  ]
]

> sue likes himself
Parsed 0 trees

If this is interesting to you and you want to learn more, you can check out my blog series, the excellent textbook Syntactic Theory: A Formal Introduction (2nd ed.), and the DELPH-IN project, whose work on the LKB inspired this simplified version.

Using from code

I need to write this section in more detail, but if you're comfortable with Rust, I suggest looking through the codebase. It's not perfect, it started as one of my first Rust projects (after migrating through F# -> TypeScript -> C in search of the right performance/ergonomics tradeoff), and it could use more tests, but overall it's not too bad.

Basically, the processing pipeline is:

  1. Make a Grammar struct
  • Grammar is defined in rules.rs.
  • The easiest way to make a Grammar is Grammar::parse_from_file, which is mostly a hand-written recusive descent parser in parse_grammar.rs. Yes, I recognize the irony here.
  1. It takes input (in Grammar::parse, which does everything for you, or Grammar::parse_chart, which just does the chart)
  2. The input is first chart-parsed in earley.rs
  3. Then, a forest is built from the chart, in forest.rs, using an algorithm I found in a very useful blog series I forget the URL for, because the algorithms in the academic literature for this are... weird.
  4. Finally, the feature unification is used to prune the forest down to only valid trees. It would be more efficient to do this during parsing, but meh.

The most interesting thing you can do via code and not via the CLI is probably getting at the raw feature DAG, as that would let you do things like pronoun coreference. The DAG code is in featurestructure.rs, and should be fairly approachable -- there's a lot of Rust ceremony around Rc<RefCell<...>> because using an arena allocation crate seemed too harlike overkill, but that is somewhat mitigated by the NodeRef type alias. Hit me up at https://vgel.me/contact if you need help with anything here!

Download Details:
Author: vgel
Source Code: https://github.com/vgel/treebender
License: MIT License

#rust  #machinelearning 

Awesome  Rust

Awesome Rust

1654894080

Serde JSON: JSON Support for Serde Framework

Serde JSON

Serde is a framework for serializing and deserializing Rust data structures efficiently and generically.

[dependencies]
serde_json = "1.0"

You may be looking for:

JSON is a ubiquitous open-standard format that uses human-readable text to transmit data objects consisting of key-value pairs.

{
    "name": "John Doe",
    "age": 43,
    "address": {
        "street": "10 Downing Street",
        "city": "London"
    },
    "phones": [
        "+44 1234567",
        "+44 2345678"
    ]
}

There are three common ways that you might find yourself needing to work with JSON data in Rust.

  • As text data. An unprocessed string of JSON data that you receive on an HTTP endpoint, read from a file, or prepare to send to a remote server.
  • As an untyped or loosely typed representation. Maybe you want to check that some JSON data is valid before passing it on, but without knowing the structure of what it contains. Or you want to do very basic manipulations like insert a key in a particular spot.
  • As a strongly typed Rust data structure. When you expect all or most of your data to conform to a particular structure and want to get real work done without JSON's loosey-goosey nature tripping you up.

Serde JSON provides efficient, flexible, safe ways of converting data between each of these representations.

Operating on untyped JSON values

Any valid JSON data can be manipulated in the following recursive enum representation. This data structure is serde_json::Value.

enum Value {
    Null,
    Bool(bool),
    Number(Number),
    String(String),
    Array(Vec<Value>),
    Object(Map<String, Value>),
}

A string of JSON data can be parsed into a serde_json::Value by the serde_json::from_str function. There is also from_slice for parsing from a byte slice &[u8] and from_reader for parsing from any io::Read like a File or a TCP stream.

use serde_json::{Result, Value};

fn untyped_example() -> Result<()> {
    // Some JSON input data as a &str. Maybe this comes from the user.
    let data = r#"
        {
            "name": "John Doe",
            "age": 43,
            "phones": [
                "+44 1234567",
                "+44 2345678"
            ]
        }"#;

    // Parse the string of data into serde_json::Value.
    let v: Value = serde_json::from_str(data)?;

    // Access parts of the data by indexing with square brackets.
    println!("Please call {} at the number {}", v["name"], v["phones"][0]);

    Ok(())
}

The result of square bracket indexing like v["name"] is a borrow of the data at that index, so the type is &Value. A JSON map can be indexed with string keys, while a JSON array can be indexed with integer keys. If the type of the data is not right for the type with which it is being indexed, or if a map does not contain the key being indexed, or if the index into a vector is out of bounds, the returned element is Value::Null.

When a Value is printed, it is printed as a JSON string. So in the code above, the output looks like Please call "John Doe" at the number "+44 1234567". The quotation marks appear because v["name"] is a &Value containing a JSON string and its JSON representation is "John Doe". Printing as a plain string without quotation marks involves converting from a JSON string to a Rust string with as_str() or avoiding the use of Value as described in the following section.

The Value representation is sufficient for very basic tasks but can be tedious to work with for anything more significant. Error handling is verbose to implement correctly, for example imagine trying to detect the presence of unrecognized fields in the input data. The compiler is powerless to help you when you make a mistake, for example imagine typoing v["name"] as v["nmae"] in one of the dozens of places it is used in your code.

Parsing JSON as strongly typed data structures

Serde provides a powerful way of mapping JSON data into Rust data structures largely automatically.

use serde::{Deserialize, Serialize};
use serde_json::Result;

#[derive(Serialize, Deserialize)]
struct Person {
    name: String,
    age: u8,
    phones: Vec<String>,
}

fn typed_example() -> Result<()> {
    // Some JSON input data as a &str. Maybe this comes from the user.
    let data = r#"
        {
            "name": "John Doe",
            "age": 43,
            "phones": [
                "+44 1234567",
                "+44 2345678"
            ]
        }"#;

    // Parse the string of data into a Person object. This is exactly the
    // same function as the one that produced serde_json::Value above, but
    // now we are asking it for a Person as output.
    let p: Person = serde_json::from_str(data)?;

    // Do things just like with any other Rust data structure.
    println!("Please call {} at the number {}", p.name, p.phones[0]);

    Ok(())
}

This is the same serde_json::from_str function as before, but this time we assign the return value to a variable of type Person so Serde will automatically interpret the input data as a Person and produce informative error messages if the layout does not conform to what a Person is expected to look like.

Any type that implements Serde's Deserialize trait can be deserialized this way. This includes built-in Rust standard library types like Vec<T> and HashMap<K, V>, as well as any structs or enums annotated with #[derive(Deserialize)].

Once we have p of type Person, our IDE and the Rust compiler can help us use it correctly like they do for any other Rust code. The IDE can autocomplete field names to prevent typos, which was impossible in the serde_json::Value representation. And the Rust compiler can check that when we write p.phones[0], then p.phones is guaranteed to be a Vec<String> so indexing into it makes sense and produces a String.

The necessary setup for using Serde's derive macros is explained on the Using derive page of the Serde site.

Constructing JSON values

Serde JSON provides a json! macro to build serde_json::Value objects with very natural JSON syntax.

use serde_json::json;

fn main() {
    // The type of `john` is `serde_json::Value`
    let john = json!({
        "name": "John Doe",
        "age": 43,
        "phones": [
            "+44 1234567",
            "+44 2345678"
        ]
    });

    println!("first phone number: {}", john["phones"][0]);

    // Convert to a string of JSON and print it out
    println!("{}", john.to_string());
}

The Value::to_string() function converts a serde_json::Value into a String of JSON text.

One neat thing about the json! macro is that variables and expressions can be interpolated directly into the JSON value as you are building it. Serde will check at compile time that the value you are interpolating is able to be represented as JSON.

let full_name = "John Doe";
let age_last_year = 42;

// The type of `john` is `serde_json::Value`
let john = json!({
    "name": full_name,
    "age": age_last_year + 1,
    "phones": [
        format!("+44 {}", random_phone())
    ]
});

This is amazingly convenient, but we have the problem we had before with Value: the IDE and Rust compiler cannot help us if we get it wrong. Serde JSON provides a better way of serializing strongly-typed data structures into JSON text.

Creating JSON by serializing data structures

A data structure can be converted to a JSON string by serde_json::to_string. There is also serde_json::to_vec which serializes to a Vec<u8> and serde_json::to_writer which serializes to any io::Write such as a File or a TCP stream.

use serde::{Deserialize, Serialize};
use serde_json::Result;

#[derive(Serialize, Deserialize)]
struct Address {
    street: String,
    city: String,
}

fn print_an_address() -> Result<()> {
    // Some data structure.
    let address = Address {
        street: "10 Downing Street".to_owned(),
        city: "London".to_owned(),
    };

    // Serialize it to a JSON string.
    let j = serde_json::to_string(&address)?;

    // Print, write to a file, or send to an HTTP server.
    println!("{}", j);

    Ok(())
}

Any type that implements Serde's Serialize trait can be serialized this way. This includes built-in Rust standard library types like Vec<T> and HashMap<K, V>, as well as any structs or enums annotated with #[derive(Serialize)].

Performance

It is fast. You should expect in the ballpark of 500 to 1000 megabytes per second deserialization and 600 to 900 megabytes per second serialization, depending on the characteristics of your data. This is competitive with the fastest C and C++ JSON libraries or even 30% faster for many use cases. Benchmarks live in the serde-rs/json-benchmark repo.

Getting help

Serde is one of the most widely used Rust libraries, so any place that Rustaceans congregate will be able to help you out. For chat, consider trying the #rust-questions or #rust-beginners channels of the unofficial community Discord (invite: https://discord.gg/rust-lang-community), the #rust-usage or #beginners channels of the official Rust Project Discord (invite: https://discord.gg/rust-lang), or the #general stream in Zulip. For asynchronous, consider the [rust] tag on StackOverflow, the /r/rust subreddit which has a pinned weekly easy questions post, or the Rust Discourse forum. It's acceptable to file a support issue in this repo, but they tend not to get as many eyes as any of the above and may get closed without a response after some time.

No-std support

As long as there is a memory allocator, it is possible to use serde_json without the rest of the Rust standard library. This is supported on Rust 1.36+. Disable the default "std" feature and enable the "alloc" feature:

[dependencies]
serde_json = { version = "1.0", default-features = false, features = ["alloc"] }

For JSON support in Serde without a memory allocator, please see the serde-json-core crate.

Link: https://crates.io/crates/serde_json

#rust  #rustlang  #encode   #json 

Ananya Gupta

Ananya Gupta

1594464365

Advantage of C Language Certification Online Training in 2020

C language is a procedural programming language. C language is the general purpose and object oriented programming language. C language is mainly used for developing different types of operating systems and other programming languages. C language is basically run in hardware and operating systems. C language is used many software applications such as internet browser, MYSQL and Microsoft Office.
**
Advantage of doing C Language Training in 2020 are:**

  1. Popular Programming language: The main Advantage of doing C language training in 2020 is popular programming language. C programming language is used and applied worldwide. C language is adaptable and flexible in nature. C language is important for different programmers. The basic languages that are used in C language is Java, C++, PHP, Python, Perl, JavaScript, Rust and C- shell.

  2. Basic language of all advanced languages: The another main Advantage of doing C language training in 2020 is basic language of all advanced languages. C language is an object oriented language. For learning, other languages, you have to master in C language.

  3. Understand the computer theories: The another main Advantage of doing C language training in 2020 is understand the computer theories. The theories such as Computer Networks, Computer Architecture and Operating Systems are based on C programming language.

  4. Fast in execution time: The another main Advantage of doing C language training in 2020 is fast in execution time. C language is to requires small run time and fast in execution time. The programs are written in C language are faster than the other programming language.

  5. Used by long term: The another main Advantage of doing C language training in 2020 is used by long term. The C language is not learning in the short span of time. It takes time and energy for becoming career in C language. C language is the only language that used by decades of time. C language is that exists for the longest period of time in computer programming history.

  6. Rich Function Library: The another main Advantage of doing C language training in 2020 is rich function library. C language has rich function of libraries as compared to other programming languages. The libraries help to build the analytical skills.

  7. Great degree of portability: The another main Advantage of doing C language training in 2020 is great degree of portability. C is a portable assemble language. It has a great degree of portability as compilers and interpreters of other programming languages are implemented in C language.
    The demand of C language is high in IT sector and increasing rapidly.

C Language Online Training is for individuals and professionals.
C Language Online Training helps to develop an application, build operating systems, games and applications, work on the accessibility of files and memory and many more.

C Language Online Course is providing the depth knowledge of functional and logical part, develop an application, work on memory management, understanding of line arguments, compiling, running and debugging of C programs.

Is C Language Training Worth Learning for You! and is providing the basic understanding of create C applications, apply the real time programming, write high quality code, computer programming, C functions, variables, datatypes, operators, loops, statements, groups, arrays, strings, etc.

The companies which are using C language are Amazon, Martin, Apple, Samsung, Google, Oracle, Nokia, IBM, Intel, Novell, Microsoft, Facebook, Bloomberg, VM Ware, etc.
C language is used in different domains like banking, IT, Insurance, Education, Gaming, Networking, Firmware, Telecommunication, Graphics, Management, Embedded, Application Development, Driver level Development, Banking, etc.

The job opportunities after completing the C Language Online certificationAre Data Scientists, Back End Developer, Embedded Developer, C Analyst, Software Developer, Junior Programmer, Database Developer, Embedded Engineer, Programming Architect, Game Programmer, Quality Analyst, Senior Programmer, Full Stack Developer, DevOps Specialist, Front End Web Developer, App Developer, Java Software Engineer, Software Developer and many more.

#c language online training #c language online course #c language certification online #c language certification #c language certification course #c language certification training

Ananya Gupta

Ananya Gupta

1599550659

Benefits Of C Language Over Other Programming Languages

C may be a middle-level programing language developed by Dennis Ritchie during the first 1970s while performing at AT&T Bell Labs within the USA. the target of its development was within the context of the re-design of the UNIX OS to enable it to be used on multiple computers.

Earlier the language B was now used for improving the UNIX. Being an application-oriented language, B allowed a much faster production of code than in programming language. Still, B suffered from drawbacks because it didn’t understand data-types and didn’t provide the utilization of “structures”.

These drawbacks became the drive for Ritchie for the development of a replacement programing language called C. He kept most of the language B’s syntax and added data-types and lots of other required changes. Eventually, C was developed during 1971-73, containing both high-level functionality and therefore the detailed features required to program an OS. Hence, many of the UNIX components including the UNIX kernel itself were eventually rewritten in C.

Benefits of C language

As a middle-level language, C combines the features of both high-level and low-level languages. It is often used for low-level programmings, like scripting for it also supports functions of high-level C programming languages, like scripting for software applications, etc.
C may be a structured programing language that allows a posh program to be broken into simpler programs called functions. It also allows free movement of knowledge across these functions.

Various features of C including direct access to machine level hardware APIs, the presence of C compilers, deterministic resource use, and dynamic memory allocation make C language an optimum choice for scripting applications and drivers of embedded systems.

C language is case-sensitive which suggests lowercase and uppercase letters are treated differently.
C is very portable and is employed for scripting system applications which form a serious a part of Windows, UNIX, and Linux OS.

C may be a general-purpose programing language and may efficiently work on enterprise applications, games, graphics, and applications requiring calculations, etc.
C language features a rich library that provides a variety of built-in functions. It also offers dynamic memory allocation.

C implements algorithms and data structures swiftly, facilitating faster computations in programs. This has enabled the utilization of C in applications requiring higher degrees of calculations like MATLAB and Mathematica.

Riding on these advantages, C became dominant and spread quickly beyond Bell Labs replacing many well-known languages of that point, like ALGOL, B, PL/I, FORTRAN, etc. C language has become available on a really wide selection of platforms, from embedded microcontrollers to supercomputers.

#c language online training #c language training #c language course #c language online course #c language certification course