CommonMark.jl: A CommonMark-compliant Parser for Julia

CommonMark

A CommonMark-compliant parser for Julia. 

Interface

using CommonMark

Create a markdown parser with the default CommonMark settings and then add footnote syntax to our parser.

parser = Parser()
enable!(parser, FootnoteRule())

Parse some text to an abstract syntax tree from a String:

ast = parser("Hello *world*")

Parse the contents of a source file:

ast = open(parser, "document.md")

Write ast to a string.

body = html(ast)
content = "<head></head><body>$body</body>"

Write to a file.

open("file.tex", "w") do file
    latex(file, ast)
    println(file, "rest of document...")
end

Or write to a buffer, such as stdout.

term(stdout, ast)

Output Formats

Supported output formats are currently:

  • html
  • latex
  • term: colourised and Unicode-formatted for display in a terminal.
  • markdown
  • notebook: Jupyter notebooks.

Extensions

Extensions can be enabled using the enable! function and disabled using disable!.

Typography

Convert ASCII dashes, ellipses, and quotes to their Unicode equivalents.

enable!(parser, TypographyRule())

Keyword arguments available for TypographyRule are

  • double_quotes
  • single_quotes
  • ellipses
  • dashes

which all default to true.
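The sketch below shows the dash and ellipsis conversions on a plain string. The function naive_typography and its keyword names are hypothetical. It is not how TypographyRule works internally: the real rule operates on parsed text nodes, so code spans are left alone, and it also converts quotes.

```julia
# Hypothetical sketch, not CommonMark.jl's implementation: a plain string-level
# approximation of the `dashes` and `ellipses` conversions. The real rule works
# on parsed text nodes (leaving code spans alone) and also handles quotes.
function naive_typography(s::AbstractString; dashes::Bool=true, ellipses::Bool=true)
    s = String(s)
    if dashes
        s = replace(s, "---" => "\u2014")  # em dash; must run before the "--" rule
        s = replace(s, "--" => "\u2013")   # en dash
    end
    if ellipses
        s = replace(s, "..." => "\u2026")  # horizontal ellipsis
    end
    return s
end

println(naive_typography("Wait... pages 10--20 --- really?"))
```

The "---" replacement runs before "--". Otherwise an em dash would be turned into an en dash followed by a stray hyphen.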

Admonitions

enable!(parser, AdmonitionRule())

Front matter

Fenced blocks at the start of a file containing structured data.

+++
[heading]
content = "..."
+++

The rest of the file...

The block must start on the first line of the file. Supported blocks are:

  • ;;; for JSON
  • +++ for TOML
  • --- for YAML
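The following sketch shows how the three fences could be told apart and the block separated from the rest of the file. It assumes the opening fence sits alone on line 1 and is closed by the same fence. split_frontmatter and FRONTMATTER_FENCES are hypothetical names. FrontMatterRule does this work internally and then hands the block to the parser you register.

```julia
# Hypothetical sketch, not FrontMatterRule's code: detect which of the three
# fences opens a document and split the front matter from the body.
const FRONTMATTER_FENCES = Dict(";;;" => :json, "+++" => :toml, "---" => :yaml)

function split_frontmatter(text::AbstractString)
    lines = split(text, '\n')
    fence = strip(first(lines))               # the block must start on line 1
    haskey(FRONTMATTER_FENCES, fence) || return (nothing, "", String(text))
    for i in 2:length(lines)
        if strip(lines[i]) == fence           # matching closing fence
            meta = join(lines[2:i-1], '\n')   # raw text for the format's parser
            body = join(lines[i+1:end], '\n') # everything after the block
            return (FRONTMATTER_FENCES[fence], meta, body)
        end
    end
    return (nothing, "", String(text))        # unterminated: no front matter
end

fmt, meta, body = split_frontmatter("+++\n[heading]\ncontent = \"...\"\n+++\n\nThe rest of the file...")
```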

To enable it, provide FrontMatterRule with your choice of parser for each format:

using JSON
enable!(parser, FrontMatterRule(json=JSON.Parser.parse))

You can access the front matter of a parsed file using frontmatter, as follows:

ast = open(parser, "document.md")
meta = frontmatter(ast)

Footnotes

enable!(parser, FootnoteRule())

Math

Julia-style inline and display maths:

Some ``\LaTeX`` math:

```math
f(a) = \frac{1}{2\pi}\int_{0}^{2\pi} (\alpha+R\cos(\theta))d\theta
```

Enabled with:

enable!(parser, MathRule())

Dollar-style inline and display math is also available using

enable!(parser, DollarMathRule())

Supported syntax:

  • single dollar signs surrounding inline math,
  • double dollars surrounding a single line paragraph for display math.

For more complex math, such as multiline display math, use the literal block syntax available with MathRule().

Tables

Pipe-style tables, similar to GitHub's tables. Literal | characters that are not wrapped in other syntax such as * must be escaped with a backslash. The number of columns in the table is specified by the second line.

| Column One | Column Two | Column Three |
|:---------- | ---------- |:------------:|
| Row `1`    | Column `2` |              |
| *Row* 2    | **Row** 2  | Column ``|`` |

Rows with more cells than specified have the trailing cells discarded, and rows with fewer cells are topped up with empty cells.

Enabled with:

enable!(parser, TableRule())
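The rule for rows with too many or too few cells amounts to truncating or padding each row to the column count taken from the delimiter row. The following is a minimal sketch of that rule. The helper normalise_row is hypothetical and not part of CommonMark.jl.

```julia
# Hypothetical sketch of the row-normalisation rule for pipe tables;
# `normalise_row` is an illustrative helper, not a CommonMark.jl function.
function normalise_row(cells::Vector{String}, ncols::Int)
    if length(cells) >= ncols
        return cells[1:ncols]                                # drop trailing extra cells
    else
        return vcat(cells, fill("", ncols - length(cells)))  # top up with empty cells
    end
end

ncols = 3  # set by the delimiter row, e.g. |:--- | --- |:---:|
println(normalise_row(["a", "b", "c", "d"], ncols))  # the extra "d" is discarded
println(normalise_row(["a"], ncols))                 # padded with two empty cells
```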

Raw Content

Overload literal syntax to support passing through any type of raw content.

enable!(parser, RawContentRule())

By default RawContentRule will handle inline and block content in HTML and LaTeX formats.

This is raw HTML: `<img src="myimage.jpg">`{=html}.

And here's an HTML block:

```{=html}
<div id="main">
 <div class="article">
```
```{=latex}
\begin{tikzpicture}
\draw[gray, thick] (-1,2) -- (2,-4);
\draw[gray, thick] (-1,-1) -- (2,2);
\filldraw[black] (0,0) circle (2pt) node[anchor=west] {Intersection point};
\end{tikzpicture}
```

This can be used to pass through complex content that CommonMark cannot easily handle natively, without any loss of expressiveness.

Custom raw content handlers can also be passed through when enabling the rule. The naming scheme is <format>_inline or <format>_block.

enable!(parser, RawContentRule(rst_inline=RstInline))

The last example would require the definition of a custom RstInline struct and associated display methods for all supported output types, namely: html, latex, and term. When passing your own keywords to RawContentRule the defaults are not included and must be enabled individually.

Attributes

Block and inline nodes can be tagged with arbitrary metadata in the form of key/value pairs using the AttributeRule extension.

enable!(parser, AttributeRule())

Block attributes appear directly above the node that they target:

{#my_id color="red"}
# Heading

This will attach the metadata id="my_id" and color="red" to # Heading.

Inline attributes appear directly after the node that they target:

*Some styled text*{background="green"}.

This will attach the metadata background="green" to the emphasised text, Some styled text.

CSS-style shorthand syntax #<name> and .<name> can be used in place of id="<name>" and class="<name>". Multiple classes may be specified sequentially.
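The sketch below shows how an attribute block such as {#my_id .note color="red"} could expand into key/value pairs, including the #/. shorthands. parse_attributes is a hypothetical helper, not AttributeRule's parser. Its quoting rules are simplified: values are either double-quoted or contain no spaces.

```julia
# Hypothetical sketch, not AttributeRule's parser: expand {#id .class key="value"}
# into a Dict. Quoting is simplified (double quotes, or values without spaces).
const ATTR_TOKEN = r"#([^\s}]+)|\.([^\s}]+)|([\w-]+)=\"([^\"]*)\"|([\w-]+)=([^\s}]+)"

function parse_attributes(src::AbstractString)
    inner = strip(src)
    startswith(inner, "{") && endswith(inner, "}") || error("expected {...}")
    attrs = Dict{String,Any}()
    classes = String[]
    for m in eachmatch(ATTR_TOKEN, inner)
        if m[1] !== nothing
            attrs["id"] = String(m[1])          # #name   -> id="name"
        elseif m[2] !== nothing
            push!(classes, String(m[2]))        # .name   -> class="name"
        elseif m[3] !== nothing
            attrs[String(m[3])] = String(m[4])  # key="quoted value"
        else
            attrs[String(m[5])] = String(m[6])  # key=bare_value
        end
    end
    isempty(classes) || (attrs["class"] = classes)  # classes kept in order
    return attrs
end

println(parse_attributes("{#my_id color=\"red\"}"))
```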

AttributeRule does not handle writing metadata to particular formats such as HTML or LaTeX. It is up to the implementation of a particular writer format to make use of available metadata itself. The built-in html and latex outputs make use of included attributes. html will include all provided attributes in the output, while latex makes use of only the #<id> attribute.

Citations

Use the following to enable in-text citations and reference list generation:

enable!(parser, CitationRule())

Syntax for citations is similar to what is offered by Pandoc. Citations start with @.

Citations can either appear in square brackets [@id], or they can be written as
part of the text like @id. Bracketed citations can contain more than one
citation, separated by semicolons: [@one; @two; and @three].
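The following sketch shows how ids can be pulled out of a bracketed, semicolon-separated citation group. The function bracketed_citation_ids is hypothetical, and its id pattern is a simplification of Pandoc's rules rather than CitationRule's actual grammar. In-text citations such as @id are not handled here.

```julia
# Hypothetical sketch, not CitationRule's parser: collect citation ids from
# bracketed citations such as [@one; @two; and @three].
function bracketed_citation_ids(text::AbstractString)
    ids = String[]
    for group in eachmatch(r"\[([^\]]*@[^\]]*)\]", text)  # a [...] group containing @
        for part in split(group[1], ';')                  # one citation per part
            # an id starts and ends with a word character; : . - allowed inside
            m = match(r"@(\w(?:[\w:.-]*\w)?)", part)
            m === nothing || push!(ids, String(m[1]))
        end
    end
    return ids
end

println(bracketed_citation_ids("Text [@one; @two; and @three]."))
```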

{#refs}
# References

A reference section that will be populated with a list of all references can be marked using a {#refs} attribute from AttributeRule at the top level of the document. The list will be inserted after the node, in this case # References.

Citations and reference lists are formatted following the Chicago Manual of Style. Styling will, in future versions, be customisable using Citation Style Language styles.

The reference data used for citations must be provided in a format matching CSL JSON. Pass this data to CommonMark.jl when writing an AST to an output format:

html(ast, Dict{String,Any}("references" => JSON.parsefile("references.json")))

CSL JSON can be exported easily from reference management software such as Zotero or generated via pandoc-citeproc --bib2json or similar. The references data can be provided by the front matter section of a document so long as the FrontMatterRule has been enabled, though this does require writing your CSL data manually.

Note that the text format of the reference list is not important, and does not have to be JSON data. So long as the shape of the data matches CSL JSON it is valid. Below we use YAML references embedded in the document's front matter:

---
references:
- id: abelson1996
  author:
    - family: Abelson
      given: Harold
    - family: Sussman
      given: Gerald Jay
  edition: 2nd Edition
  event-place: Cambridge
  ISBN: 0-262-01153-0
  issued:
    date-parts:
      - - 1996
  publisher: MIT Press/McGraw-Hill
  publisher-place: Cambridge
  title: Structure and interpretation of computer programs
  type: book
---

Here's a citation [@abelson1996].

{#refs}
# References

Auto Identifiers

Headings within a document can be assigned ids automatically using

enable!(parser, AutoIdentifierRule())

Identifiers are determined with CommonMark.slugify, which is based on the algorithm used by Pandoc. Non-unique identifiers are suffixed with a numeric counter and so cannot be considered stable. If you need stable identifiers then you should use AttributeRule to assign stable ids manually.
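The sketch below shows why counter-suffixed ids are unstable: the suffix depends on how many identical headings came earlier. naive_slug and assign_ids are hypothetical. They only approximate the Pandoc rules and are not CommonMark.slugify. Inserting a new "Intro" heading near the top would renumber every later duplicate.

```julia
# Hypothetical sketch of a Pandoc-style slug plus a de-duplicating counter.
# This is NOT CommonMark.slugify; it only approximates Pandoc's rules.
function naive_slug(heading::AbstractString)
    s = lowercase(heading)
    s = replace(s, r"[^\w\s.-]" => "")     # drop punctuation except _ . -
    s = replace(strip(s), r"\s+" => "-")   # whitespace runs become hyphens
    s = replace(s, r"^[^\p{L}]+" => "")    # strip everything before the first letter
    return isempty(s) ? "section" : s
end

function assign_ids(headings)
    seen = Dict{String,Int}()
    ids = String[]
    for h in headings
        base = naive_slug(h)
        n = get(seen, base, 0)
        push!(ids, n == 0 ? base : "$(base)-$(n)")  # repeats get -1, -2, ...
        seen[base] = n + 1
    end
    return ids
end

println(assign_ids(["Intro", "Intro", "Hello, World!"]))
```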

CommonMark Defaults

Block rules enabled by default in Parser objects:

  • AtxHeadingRule()
  • BlockQuoteRule()
  • FencedCodeBlockRule()
  • HtmlBlockRule()
  • IndentedCodeBlockRule()
  • ListItemRule()
  • SetextHeadingRule()
  • ThematicBreakRule()

Inline rules enabled by default in Parser objects:

  • AsteriskEmphasisRule()
  • AutolinkRule()
  • HtmlEntityRule()
  • HtmlInlineRule()
  • ImageRule()
  • InlineCodeRule()
  • LinkRule()
  • UnderscoreEmphasisRule()

These can all be disabled using disable!. Note that disabling some parser rules may produce unexpected output. It is recommended to be conservative in what is disabled.

Note

Until version 1.0.0 the rules listed above are subject to change and should be considered unstable regardless of whether they are exported or not.

Writer Configuration

When writing to an output format, configuration data can be provided by:

  • passing a Dict{String,Any} to the writer method,
  • front matter in the source document using the FrontMatterRule extension.

Front matter takes precedence over the passed Dict.
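The precedence rule can be modelled with Base's merge, which keeps the value from the right-most dictionary when a key appears in both. This is only a model of the behaviour. It is a shallow merge, and CommonMark.jl may combine nested values (such as an "html" sub-dictionary) differently.

```julia
# Hypothetical model of the precedence rule: front matter overrides the Dict
# passed to the writer. `merge` keeps the right-most value for duplicate keys.
passed = Dict{String,Any}("title" => "From the Dict", "lang" => "en")
front  = Dict{String,Any}("title" => "From front matter")

config = merge(passed, front)
println(config["title"])  # From front matter
println(config["lang"])   # en (only supplied by the passed Dict)
```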

Notable Variables

Values used to determine template behaviour:

template-engine::Function Used to render standalone document templates.

No default is provided by this package. The template-engine function should follow the interface provided by Mustache.render. It is recommended to use Mustache.jl to provide this functionality.

Syntax for opening and closing tags used by CommonMark.jl is ${...}. See the templates in src/writers/templates for usage examples.
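To show what a ${...} tag means, here is a hypothetical toy renderer that only substitutes simple variables using flat keys. toy_render is not Mustache.jl and is not part of CommonMark.jl. Real templates also need sections and loops, which is why a full engine such as Mustache.jl is recommended above.

```julia
# Hypothetical toy renderer: substitutes ${name} placeholders only.
# No sections, loops or nested lookups, unlike a real Mustache engine.
const TEMPLATE_TAG = r"\$\{\s*([\w.-]+)\s*\}"

function toy_render(template::AbstractString, data::AbstractDict)
    # look up the captured name; missing keys render as an empty string
    substitute(tag) = string(get(data, String(match(TEMPLATE_TAG, tag)[1]), ""))
    return replace(template, TEMPLATE_TAG => substitute)
end

println(toy_render("<title>${title}</title>", Dict("title" => "My Document")))
```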

<format>.template.file::String Custom template file to use for standalone <format>.

<format>.template.string::String Custom template string to use for standalone <format>.

Generic variables that can be included in templates to customise documents:

abstract::String Summary of the document.

authors::Vector{String} Vector of author names.

date::String Date of file generation.

keywords::Vector{String} Vector of keywords to be included in the document metadata.

lang::String Language of the document.

title::String Title of the document.

subtitle::String Subtitle of the document.

Format-specific variables that should be used only in a particular format's template. They are namespaced to avoid collision with other variables.

html

html.css::Vector{String} Vector of CSS files to include in document.

html.js::Vector{String} Vector of JavaScript files to include in document.

html.header::String String content to add at end of <head>.

html.footer::String String content to add at end of <body>.

latex

latex.documentclass::String Class file to use for document. Default is article.

latex.preamble::String String content to add directly before \begin{document}.

The following are automatically available in document templates.

body::String Main content of the page.

curdir::String Current directory.

outputfile::String Name of file that is being written to. When writing to an in-memory buffer this variable is not defined.

Download Details:

Author: MichaelHatherly
Source Code: https://github.com/MichaelHatherly/CommonMark.jl 
License: View license

#julia #markdown #language 

Monty Boehm

1658900520

JuliaParser.jl: A Rewrite Of Julia's Parser in Julia

JuliaParser

Note: This package is unmaintained and heavily bitrotted. It will not parse up to date Julia code correctly!

A pure Julia port of Julia's parser. It strives to be fully compatible with Julia's built-in parser.

Differences with Julia's built-in parser

  • BigInt and Int128 numbers are treated as literal values instead of expressions.
  • Literal negation is done as negated literals rather than using Expr(:-)
  • QuoteNodes are replaced with Expr(:quote).
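The following Base-only sketch shows how the built-in parser, Meta.parse, represents the three listed cases on whatever Julia version you run it on. The built-in parser's behaviour may have changed since JuliaParser was written, so treat this as a way to inspect the current state rather than a confirmation of the list above.

```julia
# Base-only sketch: inspect how the built-in parser represents the listed cases.
bigl   = Meta.parse("123456789012345678901234567890123456789012345")  # exceeds Int128
neg    = Meta.parse("-1")
quoted = Meta.parse(":a")

println(typeof(bigl))    # Expr (a string-macro call), not a BigInt value
println(typeof(neg))     # Int64 on 64-bit systems: a negated literal
println(typeof(quoted))  # QuoteNode
```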

Using JuliaParser as your primary parser

JuliaParser provides a script that will replace the built-in parser by itself. You may load it as follows:

julia -L ~/.julia/v0.5/JuliaParser/bin/repl.jl

TODO items

  • performance improvements
  • refactor number tokenization
  • refactor to make it more useful as a library (right now it is pretty monolithic)

Trying it out

julia> Pkg.clone("JuliaParser")

julia> import JuliaParser.Parser
julia> import JuliaParser.Lexer

julia> src = """
              function test(x::Int)
                  return x ^ 2
              end
              """
julia> ts = Lexer.TokenStream(src);

julia> Lexer.next_token(ts)
:function

julia> Lexer.next_token(ts)
:test

julia> Lexer.next_token(ts)
'('

julia> Lexer.next_token(ts)
:x

julia> Lexer.next_token(ts)
:(::)

julia> Lexer.next_token(ts)
:Int

julia> ast = Parser.parse(src);

julia> Meta.show_sexpr(ast)
(:function, (:call, :test, (:(::), :x, :Int)), (:block,
    (:line, 2, :none),
    (:return, (:call, :^, :x, 2))
  ))

julia> dump(ast)
Expr 
  head: Symbol function
  args: Array(Any,(2,))
    1: Expr 
      head: Symbol call
      args: Array(Any,(2,))
        1: Symbol test
        2: Expr 
          head: Symbol ::
          args: Array(Any,(2,))
          typ: Any
      typ: Any
    2: Expr 
      head: Symbol block
      args: Array(Any,(2,))
        1: Expr 
          head: Symbol line
          args: Array(Any,(2,))
          typ: Any
        2: Expr 
          head: Symbol return
          args: Array(Any,(1,))
          typ: Any
      typ: Any
  typ: Any

Author: JuliaLang
Source Code: https://github.com/JuliaLang/JuliaParser.jl 
License: View license

#julia #nlp 

OpenSMILES.jl: OpenSMILES parser in Julia

OpenSMILES.jl

This is a SMILES parser in Julia that aims to follow the OpenSMILES format (to the best of my ability). There are probably bugs; this isn't inventive, it's just a parser that turns SMILES into a weighted Graphs.jl graph. Contributions welcome!

Notice

Although this package does mostly what it intends to (it is still missing chiral support, etc.), an excellent package, MolecularGraph, has been released. It is highly recommended that users contribute to that package instead of this one, unless they specifically want LightGraphs.jl (or Graphs.jl) integration.

Examples

Tryptophan

using OpenSMILES, GraphPlot

# Tryptophan
Graph, Data = OpenSMILES.ParseSMILES("C1=CC=C2C(=C1)C(=CN2)CC(C(=O)O)N")
GraphPlot.gplot( Graph, nodelabel = OpenSMILES.abbreviation.( Data ) )
OpenSMILES.EmpiricalFormula( Data ) # C11H12N2O2

[Figure: Tryptophan]

# Bowtie ( not a real molecule :P )
Graph, Data = OpenSMILES.ParseSMILES("C1CC12CC2")
GraphPlot.gplot( Graph, nodelabel = OpenSMILES.abbreviation.( Data ) )
OpenSMILES.EmpiricalFormula( Data ) #C5H8

[Figure: Bowtie]
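The formula strings in the comments above (C11H12N2O2, C5H8) follow the Hill convention. Carbon comes first, then hydrogen, then the remaining elements alphabetically, and a count of 1 is omitted. The sketch below applies that convention to a table of element counts. hill_formula is a hypothetical helper, not part of OpenSMILES.jl, and it does not infer implicit hydrogens from a SMILES string.

```julia
# Hypothetical helper: format element counts as a Hill-order formula string.
function hill_formula(counts::AbstractDict)
    order = String[]
    if haskey(counts, "C")
        push!(order, "C")                        # carbon first...
        haskey(counts, "H") && push!(order, "H") # ...then hydrogen
    end
    rest = sort([el for el in keys(counts) if !(el in order)])  # others A-Z
    parts = [el * (counts[el] == 1 ? "" : string(counts[el])) for el in vcat(order, rest)]
    return join(parts)
end

println(hill_formula(Dict("C" => 11, "H" => 12, "N" => 2, "O" => 2)))
```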

Cool? Enjoy!

Download Details:

Author: Caseykneale
Source Code: https://github.com/caseykneale/OpenSMILES.jl 
License: MIT license

#julia #open 

Monty Boehm

1658950560

PEGParser.jl: PEG Parser for Julia

PEGParser

PEGParser is a PEG Parser for Julia with Packrat capabilities. PEGParser was inspired by pyparsing, parsimonious, boost::spirit, as well as several others.

Defining a grammar

To define a grammar you can write:

@grammar <name> begin
  rule1 = ...
  rule2 = ...
  ...
end

Allowed rules

The following rules can be used:

  • Terminals: Strings and characters
  • Or: a | b | c
  • And: a + b + c
  • Grouping: (a + b) | (c + d)
  • Optional: ?(a + b)
  • One or more: +((a + b) | (c + d))
  • Zero or more: *((a + b) | (c + d))
  • Look ahead: a > (b + c)
  • Regular expressions: r"[a-zA-Z]+"
  • Lists: list(rule, delim) or list(rule, delim, min=1)
  • Suppression: -rule
  • Semantic action: rule { expr }

For semantic actions, the expr may use the variables: node, value, first, last, and children. The value variable has a corresponding alias _0 and each element of children _i, where i is the index into children. See below for examples using this.

TODO

Multiple: (a+b)^(3, 5)

Example 1

Let's start by creating a simple calculator that can take two numbers and an operator to give a result.

We first define the grammar:

@grammar calc1 begin
  start = number + op + number
  op = plus | minus
  number = -space + r"[0-9]+"
  plus = -space + "+"
  minus = -space + "-"
  space = r"[ \t\n\r]*"
end

All grammars by default use start as the starting rule. You can specify a different starting rule in the parse function if you desire.

The starting rule is composed of two other rules: number and op. For this calculator, we only allow + and -. Note that this could in fact be written more concisely as:

op = -space + r"[+-]"

The number rule matches one or more digits (0 to 9). You'll note that spaces appear in front of all terminals. This is because PEGs don't handle spaces automatically.

Now we can run this grammar with some input:

(ast, pos, error) = parse(calc1, "4+5")
println(ast)

will result in the following output:

node(start) {AndRule}
1: node(number) {AndRule}
  1: node(number.2) {'4',RegexRule}
2: node(plus) {AndRule}
  1: node(plus.2) {'+',Terminal}
3: node(number) {AndRule}
  1: node(number.2) {'5',RegexRule}

Our input is correctly parsed by our grammar, but we either have to traverse the tree to get the result out, or change the output of the parse.

We can change the output of the parse with semantic actions. Every rule already has a semantic action attached to it. Normally it is set to either return a node in the tree or (for the or-rule) give the first child node.

For example, we can change the number rule to emit an actual number:

number = (-space + r"[0-9]+") { parseint(_1.value) }

The curly braces after a rule allow either an expression or a function to be used as the new action. In this case, the first child (the number, as the space is suppressed), as specified by _1, is parsed as an integer.

If we rewrite the grammar fully with actions defined for the rules, we end up with:

@grammar calc1 begin
  start = (number + op + number) {
    apply(eval(_2), _1, _3)
  }

  op = plus | minus
  number = (-space + r"[0-9]+") {parseint(_1.value)}
  plus = (-space + "+") {symbol(_1.value)}
  minus = (-space + "-") {symbol(_1.value)}
  space = r"[ \t\n\r]*"
end

data = "4+5"
(ast, pos, error) = parse(calc1, data)
println(ast)

We now get 9 as an answer. Thus, the parse is also doing the calculation. The code for this can be found in calc1.jl, with calc2.jl providing a more realistic (and useful) calculator.
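To make the PEG mechanics explicit, here is a hypothetical hand-written equivalent of the calc1 grammar in plain Julia, with no PEGParser involved. Each function mirrors one rule and returns either (value, next_position) or nothing on failure. The final step in calc_start plays the role of the semantic action. Unlike PEGParser's parse, this sketch ignores any trailing input.

```julia
# Hypothetical recursive-descent version of calc1 (not PEGParser's API).
function skip_space(s, i)                       # space = r"[ \t\n\r]*"
    while i <= lastindex(s) && s[i] in (' ', '\t', '\n', '\r')
        i = nextind(s, i)
    end
    return i
end

function calc_number(s, i)                      # number = -space + r"[0-9]+"
    i = skip_space(s, i)
    j = i
    while j <= lastindex(s) && isdigit(s[j])
        j = nextind(s, j)
    end
    j == i && return nothing                    # no digits: rule fails
    return (parse(Int, s[i:prevind(s, j)]), j)
end

function calc_op(s, i)                          # op = plus | minus (ordered choice)
    i = skip_space(s, i)
    i <= lastindex(s) || return nothing
    s[i] == '+' && return (+, nextind(s, i))
    s[i] == '-' && return (-, nextind(s, i))
    return nothing
end

function calc_start(s)                          # start = number + op + number
    a = calc_number(s, firstindex(s)); a === nothing && return nothing
    o = calc_op(s, a[2]);              o === nothing && return nothing
    b = calc_number(s, o[2]);          b === nothing && return nothing
    return o[1](a[1], b[1])                     # semantic action: apply the operator
end

println(calc_start("4+5"))
```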

Example 2

In calc3.jl, you can find a different approach to this problem. Instead of trying to calculate the answer immediately, the full syntax tree is created. This allows it to be transformed into different forms. In this example, we transform the tree into Julia code:

@grammar calc3 begin
  start = expr

  expr_op = term + op1 + expr
  expr = expr_op | term
  term_op = factor + op2 + term

  term = term_op | factor
  factor = number | pfactor
  pfactor = (lparen + expr + rparen) { _2 }
  op1 = add | sub
  op2 = mult | div

  number = (-space + float) { parsefloat(_1.value) } | (-space + integer) { parseint(_1.value) }
  add = (-space + "+") { symbol(_1.value) }
  sub = (-space + "-") { symbol(_1.value) }
  mult = (-space + "*") { symbol(_1.value) }
  div = (-space + "/") { symbol(_1.value) }

  lparen = (-space + "(") { _1 }
  rparen = (-space + ")") { _1 }
  space = r"[ \n\r\t]*"
end

You will also notice that instead of trying to define integer and float manually, we are now using pre-defined parsers. Custom parsers can be defined to both make defining new grammars easier as well as add new types of functionality (e.g. maintaining symbol tables).

The grammar is now ready to be used to parse strings:

(ast, pos, error) = parse(calc3, "3.145+5*(6-4.0)")

which results in the following AST:

node(start) {ReferencedRule}
  node(expr_op) {AndRule}
  1: 3.145 (Float64)
  2: + (Symbol)
  3: node(term_op) {AndRule}
    1: 5 (Int64)
    2: * (Symbol)
    3: node(expr_op) {AndRule}
      1: 6 (Int64)
      2: - (Symbol)
      3: 4.0 (Float64)

Now that we have an AST, we can create transforms to convert the AST into Julia code:

toexpr(node, cnodes, ::MatchRule{:default}) = cnodes
toexpr(node, cnodes, ::MatchRule{:term_op}) = Expr(:call, cnodes[2], cnodes[1], cnodes[3])
toexpr(node, cnodes, ::MatchRule{:expr_op}) = Expr(:call, cnodes[2], cnodes[1], cnodes[3])

and to use the transforms:

code = transform(toexpr, ast)

to generate the Expr:

Expr
  head: Symbol call
  args: Array(Any,(3,))
    1: Symbol +
    2: Float64 3.145
    3: Expr
      head: Symbol call
      args: Array(Any,(3,))
        1: Symbol *
        2: Int64 5
        3: Expr
          head: Symbol call
          args: Array(Any,(3,))
          typ: Any
      typ: Any
  typ: Any
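The same bottom-up transform can be sketched without PEGParser. In this sketch the AST is modelled as nested (lhs, op, rhs) tuples, which is an assumption made purely for illustration. Each tuple becomes a call Expr, just as the :term_op and :expr_op methods above do, and eval then computes the result.

```julia
# Hypothetical sketch of the transform step in plain Julia: nested
# (lhs, op, rhs) tuples stand in for PEGParser's nodes.
to_expr(x::Number) = x                                  # leaves stay as-is
to_expr(t::Tuple) = Expr(:call, t[2], to_expr(t[1]), to_expr(t[3]))

ast = (3.145, :+, (5, :*, (6, :-, 4.0)))  # mirrors the tree for "3.145+5*(6-4.0)"
code = to_expr(ast)
println(code)        # 3.145 + 5 * (6 - 4.0)
println(eval(code))  # approximately 13.145 (floating-point result)
```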

Caveats

This is still very much a work in progress and doesn't yet have as much test coverage as I would like.

The error handling still needs a lot of work. Currently only a single error will be emitted, but the hope is to allow multiple errors to be returned.

Author: Abeschneider
Source Code: https://github.com/abeschneider/PEGParser.jl 
License: View license

#julia #peg 

OpenStreetMapParser.jl: Julia OpenStreetMap Parser

OpenStreetMapParser [Not recommended for use]

See OpenStreetMap.jl for now. But if you must:

This package provides basic functionality for parsing OpenStreetMap data in Julia, in the following file formats:

For a complete introduction into the OSM project, the OSM API, and the OSM XML file format we refer to the project’s wiki available at http://wiki.openstreetmap.org/.

Installation

OSM Elements

The OpenStreetMap project provides data in the OSM XML format, which consists of three basic elements:

  • Node: The basic element. (defining points in space)
  • Way: An ordered interconnection of nodes (defining linear features and area boundaries)
  • Relation: A grouping of elements (nodes, ways, and relations), which are sometimes used to explain how other elements work together

The following functions are supported:

parseNodes() # document/examples
parseWays() # document/examples
parseRelations() # document/examples
parseMap() # document/examples

Each element has further attributes like the element ID (unique within the corresponding element group) and timestamp. Furthermore, each element may have an arbitrary number of tags (key-value pairs) which describe the element. Ways and relations, in addition, have references to their members’ IDs.
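The following sketch models the three element types described above as Julia structs: ids, tags as key/value pairs, and member references for ways and relations. All type and field names are hypothetical and are not the types OpenStreetMapParser.jl returns. The is_closed helper relies on the OSM convention that a closed way repeats its first node id at the end.

```julia
# Hypothetical data model for OSM elements (not OpenStreetMapParser.jl's types).
struct OSMNode
    id::Int
    lat::Float64                  # unprojected WGS84 degrees
    lon::Float64
    tags::Dict{String,String}
end

struct OSMWay
    id::Int
    nodes::Vector{Int}            # ordered node ids (member references)
    tags::Dict{String,String}
end

struct OSMMember
    kind::Symbol                  # :node, :way or :relation
    ref::Int                      # id of the referenced element
    role::String
end

struct OSMRelation
    id::Int
    members::Vector{OSMMember}
    tags::Dict{String,String}
end

# A closed way (e.g. a building outline) starts and ends on the same node.
is_closed(w::OSMWay) = length(w.nodes) > 1 && first(w.nodes) == last(w.nodes)

w = OSMWay(1, [10, 11, 12, 10], Dict("building" => "yes"))
r = OSMRelation(5, [OSMMember(:way, 1, "outer")], Dict("type" => "multipolygon"))
println(is_closed(w))
```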

Remark: A distinction should be made between data elements (a data primitive used to represent semantic objects), and semantic elements (which represent the geometry of physical world objects). Data elements are an implementation detail, while semantic elements carry the desired meaning. This will be made clearer in the next section on OSM Features.

OSM Features

OpenStreetMap represents physical features on the ground (e.g., roads or buildings) using tags attached to its basic data structures (its nodes, ways, and relations). Each tag describes a geographic attribute of the feature being shown by that specific node, way or relation. The community agrees on certain key and value combinations for the most commonly used tags, which act as informal standards. For a comprehensive list of OSM features, we suggest visiting their wiki page here http://wiki.openstreetmap.org/wiki/Map_Features.

Scope of this Package

This package is meant for parsing of small/medium-sized (typically city-sized, <500MB) OSM files. If you're dealing with bigger files, you might want to scope it down into something smaller, or handle it through a database instead.

Selection/filtering of the OSM data within the parser itself would be possible with LibExpat, but not particularly profitable for us. Given the size of the files we expect (cf. #1), you can either filter/select the data after parsing, or roll out your own parser to perform the selection/filtering.

All coordinates are unprojected WGS84 (EPSG:4326). You can perform the necessary transformations through Geodesy.jl or LibOGR.jl.

The availability of high-resolution aerial imagery has led to many features being recorded as areas (building or site outlines), not points, in OpenStreetMap. You will, for example, often find a restaurant or hotel drawn as an area. This might make processing difficult because you have to cater for both types of features even if you are not interested in areas. As the conversion from areas to points is not well-defined, we do not perform it automatically.

We will not be providing the following conveniences, but suggest packages that might help (in parentheses):

  • plotting/viewing of the map elements (Compose/Winston) # OpenStreetMapPlotter.jl
  • routing on the road network (LightGraphs/Graphs) # OpenStreetMapRouter.jl
  • map projections/transformations between different coordinate systems (Geodesy/OGR)
  • filtering/selection of data (DataFrames)
  • geometric operations (JuliaGeometry/LibGEOS)

We will, on the other hand, support pull requests that update the package to be in line with official/well-supported frameworks of OSM data.

References

Download Details:

Author: Yeesian
Source Code: https://github.com/yeesian/OpenStreetMapParser.jl 
License: View license

#julia #map