An auto formatter for Julia.
using Pkg
Pkg.add("DocumentFormat")
using DocumentFormat
Documentation:
The main function to format code is `format`. When called with a string argument, that string is assumed to be code, and a new string in which the code is formatted is returned. When called with an `AbstractPath` that points to a file, that file is formatted. If called with an `AbstractPath` that points to a folder, all `*.jl` files in that folder are formatted.
The function `isformatted` checks whether a piece of code is formatted. It can be called with a string (assumed to hold code) or an `AbstractPath`.
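A minimal sketch of the string-based API described above (the example source string is made up):

```julia
using DocumentFormat

src = "function f(x)\nx+1\nend"

formatted = format(src)   # returns a new string with the code formatted
isformatted(src)          # check whether a string is already formatted
isformatted(formatted)    # true once formatted
```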
Author: Julia-vscode
Source Code: https://github.com/julia-vscode/DocumentFormat.jl
License: View license
Width-sensitive formatter for Julia code. Inspired by gofmt, refmt, and black.
]add JuliaFormatter
julia> using JuliaFormatter
# Recursively formats all Julia files in the current directory
julia> format(".")
# Formats an individual file
julia> format_file("foo.jl")
# Formats a string (contents of a Julia file)
julia> format_text(str)
Check out the docs for further description of the formatter and its options.
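For instance, `format` accepts keyword options such as `indent` and `margin` (a sketch; see the docs for the full list of options):

```julia
using JuliaFormatter

# Format the current directory with 4-space indent and a 92-character margin
format("."; indent = 4, margin = 92)
```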
For integration with other editors, see the documentation.
Author: domluna
Source Code: https://github.com/domluna/JuliaFormatter.jl
License: MIT
Aqua.jl provides functions to run a few automatable checks for Julia packages:
- There are no undefined `export`s.
- There are no stale dependencies listed in `Project.toml`.
- The root project `Project.toml` and the test project (`test/Project.toml`) are consistent.
- All `deps` have a corresponding `compat` entry.
- `Project.toml` formatting is compatible with Pkg.jl output.

See more in the documentation.
Call `Aqua.test_all(YourPackage)` from `test/runtests.jl`, e.g.,
using YourPackage
using Aqua
Aqua.test_all(YourPackage)
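Individual checks can also be skipped or configured through keyword arguments of `test_all` (a sketch; `ambiguities` is one such option per the Aqua documentation):

```julia
using YourPackage
using Aqua

# Run all checks except the method-ambiguity check
Aqua.test_all(YourPackage; ambiguities = false)
```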
To avoid breaking tests when a new Aqua.jl version is released, it is recommended to add a version bound for Aqua.jl in `test/Project.toml`:
[deps]
Aqua = "4c88cf16-eb10-579e-8560-4a9242c79595"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
[compat]
Aqua = "0.5"
You can add the following line in README.md to include the Aqua.jl badge:
[![Aqua QA](https://raw.githubusercontent.com/JuliaTesting/Aqua.jl/master/badge.svg)](https://github.com/JuliaTesting/Aqua.jl)
Author: JuliaTesting
Source Code: https://github.com/JuliaTesting/Aqua.jl
License: MIT license
A Julia frontend, written in Julia.
JuliaSyntax.jl is a new Julia language frontend designed for precise error reporting, speed and flexibility.
Core components:
- A parser built around the `ParseStream` interface.
- Tree data structures: `SyntaxNode` (an AST) layered on top of `GreenNode` (a lossless parse tree). We might need other tree types later.

The library is in pre-0.1 stage, but parses all of Base correctly with only a handful of failures remaining in the Base tests and standard library. The tree data structures should be somewhat usable but will evolve as we try out various use cases.
Examples
Here's what parsing of a small piece of code currently looks like in various forms. We'll use the `parseall` convenience function to demonstrate, but there's also a more flexible parsing interface with `JuliaSyntax.parse()`.
First, a source-ordered AST with `SyntaxNode` (`call-i` in the dump here means the `call` has the infix `-i` flag):
julia> parseall(SyntaxNode, "(x + y)*z", filename="foo.jl")
line:col│ byte_range │ tree │ file_name
1:1 │ 1:9 │[toplevel] │foo.jl
1:1 │ 1:9 │ [call-i]
1:2 │ 2:6 │ [call-i]
1:2 │ 2:2 │ x
1:4 │ 4:4 │ +
1:6 │ 6:6 │ y
1:8 │ 8:8 │ *
1:9 │ 9:9 │ z
Internally this has a full representation of all syntax trivia (whitespace and comments), as can be seen with the more raw "green tree" representation with `GreenNode`. Here ranges on the left are byte ranges, and ✔ flags nontrivia tokens. Note that the parentheses are trivia in the tree representation, despite being important for parsing.
julia> text = "(x + y)*z"
greentree = parseall(GreenNode, text)
1:9 │[toplevel]
1:9 │ [call]
1:1 │ (
2:6 │ [call]
2:2 │ Identifier ✔
3:3 │ Whitespace
4:4 │ + ✔
5:5 │ Whitespace
6:6 │ Identifier ✔
7:7 │ )
8:8 │ * ✔
9:9 │ Identifier ✔
`GreenNode` stores only byte ranges, but the token strings can be shown by supplying the source text string:
julia> show(stdout, MIME"text/plain"(), greentree, text)
1:9 │[toplevel]
1:9 │ [call]
1:1 │ ( "("
2:6 │ [call]
2:2 │ Identifier ✔ "x"
3:3 │ Whitespace " "
4:4 │ + ✔ "+"
5:5 │ Whitespace " "
6:6 │ Identifier ✔ "y"
7:7 │ ) ")"
8:8 │ * ✔ "*"
9:9 │ Identifier ✔ "z"
Julia `Expr` can also be produced:
julia> parseall(Expr, "(x + y)*z")
:($(Expr(:toplevel, :((x + y) * z))))
Using JuliaSyntax as the default parser
To use JuliaSyntax as the default Julia parser to `include()` files, parse code with `Meta.parse()`, etc., call
julia> JuliaSyntax.enable_in_core!()
This causes some startup latency, so to reduce that you can create a custom system image by running the code in `./sysimage/compile.jl` as a Julia script (or directly using the shell, on unix). Then use `julia -J $resulting_sysimage`.
Using a custom sysimage has the advantage that package precompilation will also go through the JuliaSyntax parser.
Parser implementation
Our goal is to losslessly represent the source text with a tree; this may be called a "lossless syntax tree". (This is sometimes called a "concrete syntax tree", but that term has also been used for the parse tree of the full formal grammar for a language including any grammar hacks required to solve ambiguities, etc. So we avoid this term.)
`JuliaSyntax` uses a mostly recursive descent parser which closely follows the high level structure of the flisp reference parser. This makes the code familiar and reduces porting bugs. It also gives a lot of flexibility for designing the diagnostics, tree data structures, compatibility with different Julia versions, etc. I didn't choose a parser generator as they still seem marginal for production compilers: for the parsing itself they don't seem greatly more expressive, and they can be less flexible for the important "auxiliary" code which needs to be written in either case.
We use a version of Tokenize.jl which has been modified to better match the needs of parsing. Among other changes, a `String` kind has been added, and contextual keywords (`as`, `var`, `doc`) have been added and moved to a subcategory of keywords. This copy of Tokenize lives in the `JuliaSyntax` source tree due to the volume of changes required, but once the churn settles down it would be good to figure out how to un-fork the lexer in some way or other.
The main parser innovation is the `ParseStream` interface, which provides a stream-like I/O interface for writing the parser. The parser does not depend on or produce any concrete tree data structure as part of the parsing phase, but the output spans can be post-processed into various tree data structures as required. This is like the design of rust-analyzer, though with a simpler implementation.
Parsing proceeds by recursive descent:
- The parser uses `peek()` to examine tokens and `bump()` to consume them.
- Parser functions use `bump()` to transfer tokens to the output, and `position()`/`emit()` for nonterminal ranges.
- Trivia tokens are automatically `bump()`ed and don't need to be handled explicitly. The exception is syntactically relevant newlines in space sensitive mode.
- Parser state is carried in `ParseState`.

The output spans track the byte range, a syntax "kind" stored as an integer tag, and some flags. The kind tag makes the spans a sum type, but where the type is tracked explicitly outside of Julia's type system.
For lossless parsing, the output spans must cover the entire input text. Using `bump()`, `position()` and `emit()` in a natural way also ensures that spans nest properly and appear in source order. These properties make the output spans naturally isomorphic to a "green tree" in the terminology of C#'s Roslyn compiler.
The `build_tree` function performs a depth-first traversal of the `ParseStream` output spans, allowing them to be assembled into a concrete tree data structure, for example using the `GreenNode` data type. We further build on top of this to define `build_tree` for the AST type `SyntaxNode` and for normal Julia `Expr`.
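As a rough sketch of driving this pipeline by hand (assuming the `ParseStream`, `parse!`, and `build_tree` names described above):

```julia
using JuliaSyntax

# Wrap the source text in a ParseStream and run the parser over it.
stream = JuliaSyntax.ParseStream("(x + y)*z")
JuliaSyntax.parse!(stream)

# Post-process the output spans into a concrete green tree.
green = JuliaSyntax.build_tree(GreenNode, stream)
```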
The goal of the parser is to produce well-formed hierarchical structure from the source text. For interactive tools we need this to work even when the source text contains errors; it's the job of the parser to include the recovery heuristics to make this work.
Concretely, the parser in `JuliaSyntax` should always produce a green tree which is well formed, in the sense that `GreenNode`s of a given `Kind` have a well-defined layout of children. This means the `GreenNode` to `SyntaxNode` transformation is deterministic and tools can assume they're working with a "mostly valid" AST.
What does "mostly valid" mean? We allow the tree to contain the following types of error nodes:
- Missing syntax may be added as placeholders where needed to complete a piece of syntax. For example, we could parse `a + (b *` as `(call-i a + (call-i * b XXX))`, where `XXX` is a placeholder error node.
- A sequence of unexpected tokens may be collected under an error node and treated as trivia. For example, `a + b end * c` could be parsed as the green tree `(call-i a + b (error-t end * c))`, and turned into the AST `(call + a b)`.

We want to encode both these cases in a way which is simplest for downstream tools to use. This is an open question, but for now we use `K"error"` as the kind, with the `TRIVIA_FLAG` set for unexpected syntax.
Syntax trees
Julia's `Expr` abstract syntax tree can't store precise source locations or deal with syntax trivia like whitespace or comments. So we need some new tree types in `JuliaSyntax`.
JuliaSyntax currently deals in three types of trees:
- `GreenNode` is a minimal lossless syntax tree.
- `SyntaxNode` is an abstract syntax tree which points back to `GreenNode` nodes.
- `Expr` is used as a conversion target for compatibility.

Wherever possible, the tree structure of `GreenNode`/`SyntaxNode` is 1:1 with `Expr`. There are, however, some exceptions.
First, `GreenNode` inherently stores source position, so there's no need for the `LineNumberNode`s used by `Expr`. There's also a small number of other differences.
Flattened generators are uniquely problematic because the Julia AST doesn't respect a key rule we normally expect: that the children of an AST node are a contiguous range in the source text. This is because the `for`s in `[xy for x in xs for y in ys]` are parsed in the normal order of a for loop, to mean

for x in xs
    for y in ys
        push!(collection, xy)
    end
end

so the `xy` prefix is in the body of the innermost for loop. Following this, the standard Julia AST is like so:
(flatten
(generator
(generator
xy
(= y ys))
(= x xs)))
however, note that if this tree were flattened, the order would be `(xy) (y in ys) (x in xs)`, and the `x` and `y` iterations are opposite of the source order.
However, our green tree is strictly source-ordered, so we must deviate from the Julia AST. The natural representation seems to be to remove the generators and use a flattened structure:
(flatten
xy
(= x xs)
(= y ys))
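The source-ordered semantics are easy to check in ordinary Julia; the outer `for` iterates first:

```julia
julia> [(x, y) for x in 1:2 for y in 1:3]
6-element Vector{Tuple{Int64, Int64}}:
 (1, 1)
 (1, 2)
 (1, 3)
 (2, 1)
 (2, 2)
 (2, 3)
```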
For triple quoted strings, the indentation isn't part of the string data so should also be excluded from the string content within the green tree. That is, it should be treated as separate whitespace trivia tokens. With this separation things like formatting should be much easier. The same reasoning goes for escaping newlines and following whitespace with backslashes in normal strings.
Detecting string trivia during parsing means that string content is split over several tokens. Here we wrap these in the K"string" kind (as is already used for interpolations). The individual chunks can then be reassembled during Expr construction. (A possible alternative might be to reuse the K"String" and K"CmdString" kinds for groups of string chunks (without interpolation).)
Take as an example the following Julia fragment.
x = """
$a
b"""
Here this is parsed as `(= x (string-s a "\n" "b"))` (the `-s` flag in `string-s` means "triple quoted string").
Looking at the green tree, we see the indentation before the `$a` and `b` is marked as trivia:
julia> text = "x = \"\"\"\n \$a\n b\"\"\""
show(stdout, MIME"text/plain"(), parseall(GreenNode, text, rule=:statement), text)
1:23 │[=]
1:1 │ Identifier ✔ "x"
2:2 │ Whitespace " "
3:3 │ = "="
4:4 │ Whitespace " "
5:23 │ [string]
5:7 │ """ "\"\"\""
8:8 │ String "\n"
9:12 │ Whitespace " "
13:13 │ $ "\$"
14:14 │ Identifier ✔ "a"
15:15 │ String ✔ "\n"
16:19 │ Whitespace " "
20:20 │ String ✔ "b"
21:23 │ """ "\"\"\""
We generally track the type of syntax nodes with a syntax "kind", stored explicitly in each node as an integer tag. This effectively makes the node type a sum type in the type system sense, but with the type tracked explicitly outside of Julia's type system.
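As a rough sketch of what working with kinds looks like (using the `parseall`, `kind`, and `K"..."` names shown elsewhere in this document; `JuliaSyntax.children` is assumed here for child access):

```julia
using JuliaSyntax

node = parseall(SyntaxNode, "x + y")
kind(node) == K"toplevel"              # true; kinds compare as integer tags

call = JuliaSyntax.children(node)[1]   # the inner (call-i x + y) node
kind(call) == K"call"                  # true
kind(call) == K"error"                 # false
```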
Managing the type explicitly brings a few benefits: for example, predicates like `is_operator` can be extremely efficient, given that we know the meaning of the kind's bits. There's arguably a few downsides: the data layout must stay generic rather than being specialized per kind (compare `QuoteNode`, with a single field, vs `Expr`, with generic `head` and `args` fields). This could be a disadvantage for code which processes one specific kind, but for generic code processing many kinds, having a generic but concrete data layout should be faster.

Differences from the flisp parser
Practically the flisp parser is not quite a classic recursive descent parser, because it often looks back and modifies the output tree it has already produced. We've tried to eliminate this pattern in favor of lookahead where possible. However, on occasion looking back seems to solve genuine ambiguities where Julia code can't be parsed top-down with finite lookahead, e.g. the `kw` vs `=` ambiguity within parentheses. In these cases we put up with using the functions `look_behind` and `reset_node!()`.
Large structural changes were generally avoided while porting. In particular, nearly all function names for parsing productions are the same, with `-` replaced by `_` and predicates prefixed by `is_`.
Some notable differences:
parse-arglist
and a parts of parse-paren-
have been combined into a general function parse_brackets
. This function deals with all the odd corner cases of how the AST is emitted when mixing ,
and ;
within parentheses. In particular regard to:;
are block syntax separators or keyword parametersparameter
sections based on contextkw
or =
depending on contextparse-resword
is entered has been rearranged to avoid parsing reserved words with parse-atom
inside parse-unary-prefix
. Instead, we detect reserved words and enter parse_resword
earlier.Here's some behaviors which seem to be bugs. (Some of these we replicate in the name of compatibility, perhaps with a warning.)
- Macro module paths allow calls, as in `b() = rand() > 0.5 ? Base : Core` followed by `b().@info "hi"`.
- Misplaced `@` in macro module paths like `A.@B.x` is parsed as odd broken-looking AST like `(macrocall (. A (quote (. B @x))))`. It should probably be rejected.
- `+(a;b,c)`, where keyword parameters are separated by commas, produces a tuple instead.
- `const` and `global` allow chained assignment, but the right hand side is not constant: in `const a = b = 1`, `a` is const here but not `b`.
- The `ncat` array concatenation syntax within braces gives strange AST: `{a ;; b}` parses to `(bracescat 2 a b)`, which is the same as `{2 ; a ; b}`, but should probably be `(bracescat (nrow 2 a b))`, in analogy to how `{a b}` produces `(bracescat (row a b))`.
- `export a, \n $b` is rejected, but `export a, \n b` parses fine.
- The `finally` clause is allowed before the `catch`, but always executes afterward. (Presumably this was a mistake? It seems pretty awful!)
- For `"[x \n\n ]"` the flisp parser gets confused, but `"[x \n ]"` is correctly parsed as `Expr(:vect)` (maybe fixed in 1.7?).
- `f(x for x in in xs)` is accepted, and parsed very strangely.
- The string escape `"\777"` results in `"\xff"`. This is inconsistent with `Base.parse(::Type{Int}, ...)`.
- `import .⋆` parses to `(import (. .⋆))`, whereas it should be `(import (. . ⋆))` for consistency with the parsing of `import .A`.
- `f(((((x=1)))))` parses as a keyword call to function `f` with the keyword `x=1`, but arguably it should be an assignment.
- Hex float literals can have a trailing `f`, for example `0x1p1f`, but this doesn't do anything. In the flisp C code such cases are treated as Float32 literals, and this was intentional (https://github.com/JuliaLang/julia/pull/2925), but it has never been officially supported in Julia. It seems this bug arises from `(set! pred char-hex?)` in `parse-number` accepting hex exponent digits, all of which are detected as invalid except for a trailing `f` when processed by `isnumtok_base`.

There's various allowed syntaxes which are fairly easily detected in the parser, but which will be rejected later during lowering. To allow building DSLs this is fine and good, but some such allowed syntaxes don't seem very useful, even for DSLs:
- `macro (x) end` is allowed, but there are no anonymous macros.
- `abstract type A < B end` and other subtype comparisons are allowed, but only `A <: B` makes sense.
- `x where {S T}` produces `(where x (bracescat (row S T)))`. This seems pretty weird!
- `[x for outer x in xs]` parses, but `outer` makes no real sense in this context (and using this form is a lowering error).

`kw` and `=` inconsistencies
There are many apparent inconsistencies between how `kw` and `=` are used when parsing `key=val` pairs inside parentheses:
(a=1,) # (tuple (= a 1))
f.(a=1) # (tuple (kw a 1))
Multiple `,` and `;` in calls give nested `parameters` AST which parses strangely, and is kind-of-horrible to use:

(a,b; c,d; e,f)  # (tuple (parameters (parameters e f) c d) a b)

In `function (a;b) end` the `(a;b)` is parsed as a block! This leads to more inconsistency in the use of `kw` for keywords.

Operators with suffixes don't seem to always be parsed consistently as the same operator without a suffix. Unclear whether this is by design or mistake. For example, `[x +y] ==> (hcat x (+ y))`, but `[x +₁y] ==> (hcat (call +₁ x y))`.
`global const x=1` is normalized by the parser into `(const (global (= x 1)))`. I suppose this is somewhat useful for AST consumers, but reversing the source order is pretty weird and inconvenient when moving to a lossless parser.
`let` bindings might be stored in a block, or they might not be, depending on special cases:

# Special cases not in a block
let x=1 ; end   ==>  (let (= x 1) (block))
let x::1 ; end  ==>  (let (:: x 1) (block))
let x ; end     ==>  (let x (block))

# In a block
let x=1,y=2 ; end  ==>  (let (block (= x 1) (= y 2)) (block))
let x+=1 ; end     ==>  (let (block (+= x 1)) (block))
The `elseif` condition is always in a block, but the `if` condition is not, presumably because of the need to add a line number node in the flisp parser: `if a xx elseif b yy end ==> (if a (block xx) (elseif (block b) (block yy)))`.
Spaces are allowed between import dots: `import . .A` is allowed, and parsed the same as `import ..A`.

`import A..` produces `(import (. A .))`, which is arguably nonsensical, as `.` can't be a normal identifier.
The raw string escaping rules are super confusing for backslashes near the end of the string: `raw"\\\\ "` contains four backslashes, whereas `raw"\\\\"` contains only two. However, this was an intentional feature to allow all strings to be represented, and it's unclear whether the situation can be improved.
In braces after a macrocall, `@S{a b}` is invalid, but both `@S{a,b}` and `@S {a b}` parse. Conversely, `@S[a b]` parses.
Macro names and invocations are post-processed from the output of `parse-atom` / `parse-call`, which leads to some surprising and questionable constructs which "work":

@(((((a))))) x  ==>  (macrocall @a x)
@(x + y)  ==>  (macrocall @+ x y)    # ok, kinda cute and has some weird logic to it... but what?
@(f(x))  ==>  (macrocall @f x)
Allowing `@` first in macro module paths (eg `@A.B.x` instead of `A.B.@x`) seems like unnecessary variation in syntax. It makes parsing valid macro module paths more complex, and leads to oddities like `@$.x y ==> (macrocall ($ (quote x)) y)`, where the `$` is first parsed as a macro name but turns out to be the module name after the `.` is parsed. But `$` can never be a valid module name in normal Julia code, so this makes no sense.
Triple quoted `var"""##"""` identifiers are allowed. But it's not clear these are required or desired, given that they come with the complex triple-quoted string deindentation rules.
Deindentation of triple quoted strings with mismatched whitespace is weird when there's nothing but whitespace. For example, we have `"\"\"\"\n \n \n \"\"\"" ==> "\n \n"`, so the middle line of whitespace here isn't dedented but the other two longer lines are. It seems more consistent that either (a) the middle line should be deindented completely, or (b) all lines should be dedented only one character, as that's the matching prefix.
Parsing of anonymous function arguments is somewhat inconsistent: `function (xs...) \n body end` parses the argument list as `(... xs)`, whereas `function (x) \n body end` parses the argument list as `(tuple x)`.
The difference between multidimensional vs flattened iterators is subtle, and perhaps too syntactically permissive. For example:
- `[(x,y) for x in 1:10, y in 1:10]` is a multidimensional iterator
- `[(x,y) for x in 1:10 for y in 1:10]` is a flattened iterator
- `[(x,y) for x in 1:10, y in 1:10 if y < x]` is a flattened iterator

Comparisons to other packages
The official Julia compiler frontend lives in the Julia source tree. It's mostly contained in just a few `.scm` and `.c` files. There are two issues with the official reference frontend which suggest a rewrite.
First, there's no support for precise source locations and the existing data structures (bare flisp lists) can't easily be extended to add these. Fixing this would require changes to nearly all of the code.
Second, it's written in flisp: an aesthetically pleasing, minimal but obscure implementation of Scheme. Learning Scheme is actually a good way to appreciate some of Julia's design inspiration, but it's quite a barrier for developers of Julia language tooling. (Flisp has no user-level documentation, but non-schemers can refer to the Racket documentation, which is quite compatible for basic things.) In addition to the social factors, having the embedded flisp interpreter and runtime with its own separate data structures and FFI is complex and inefficient.
JuliaParser.jl was a direct port of Julia's flisp reference parser, but was abandoned around Julia 0.5 or so. It doesn't support lossless parsing, and adding that would amount to a full rewrite. Given the divergence from the flisp reference parser since Julia 0.5, it seemed better just to start from the reference parser instead.
Tokenize.jl is a fast lexer for Julia code. The code from Tokenize has been imported and used in JuliaSyntax, with some major modifications as discussed in the lexer implementation section.
CSTParser.jl is a (mostly?) lossless parser with goals quite similar to JuliaParser, used extensively in the VSCode / LanguageServer / JuliaFormatter ecosystem. CSTParser is very useful, but I do find the implementation hard to understand, and I wanted to try a fresh approach with a focus on, among other things, eventual integration with `Core`. The design of `rust-analyzer` is very clean, well documented, and a great source of inspiration. A big benefit of the JuliaSyntax parser is that it separates the parser code from the tree data structures entirely, which should give a lot of flexibility in experimenting with various tree representations.
I also want JuliaSyntax to tackle macro expansion and other lowering steps, and provide APIs for this which can be used by both the core language and the editor tooling.
Using a modern production-ready parser generator like `tree-sitter` is an interesting option, and some progress has already been made in tree-sitter-julia. But I feel the grammars for parser generators are only marginally more expressive than writing the parser by hand, after accounting for the effort spent on the weird edge cases of a real language and on writing the parser's tests and "supporting code".
On the other hand, a hand-written parser is completely flexible and can be mutually understood with the reference implementation, so I chose that approach for JuliaSyntax.
Resources
Here are a few links to relevant resources.
Persistence, façades and Roslyn’s red-green trees
rust-analyzer seems to be very close to what I'm building here, and has come to the same conclusions on green tree layout with explicit trivia nodes. Their document on internals is great.
In general I think it's unclear whether we want typed ASTs in Julia, and we particularly need to deal with the fact that `Expr` is the existing public interface. Could we have an `Expr2` wrap `SyntaxNode`?
Not all the design decisions in rust-analyzer are finalized, but the architecture document is a fantastic source of design inspiration.
RSLint is a linter for JavaScript, built in Rust. It uses the same parsing infrastructure and green tree libraries as rust-analyzer. There's an excellent and friendly high level overview of how all this works in the rslint parsing devdocs.
Points of note:
Backtracking and restarting the parser on error is actually quite simple in the architecture we (mostly) share with rust-analyzer: "... events allow us to cheaply backtrack the parser by simply draining the events and resetting the token source cursor back to some place."
The section on error recovery is interesting; they talk about various error recovery strategies.
The paper P2429 - Concepts Error Messages for Humans is C++ centric, but has a nice review of quality error reporting in various compilers including Elm, ReasonML, Flow, D and Rust.
Some Rust-specific resources: the `println!` macro shows how diagnostics can be emitted from macros.

"Modern parser generator" has a lot of practical notes on writing parsers.
Some notes about stateful lexers for parsing shell-like string interpolations: http://www.oilshell.org/blog/2017/12/17.html
Design notes
The following are some fairly disorganized design notes covering a mixture of things which have already been done and musings about further work.
The tree data structure design here is tricky: a lossless representation must include trivia such as the parentheses in `2*(x + y)`, and the explicit vs implicit multiplication symbol in `2*x` vs `2x`. Having so many use cases suggests it might be best to have several different tree types with a common interface, rather than one main abstract syntax tree type. But it seems useful to figure this out by prototyping several important work flows. For example, error reporting in lowering: `[a, b] = (c, d)` should report "invalid assignment location `[a, b]`", but at a precise source location.

Raw syntax tree (or "Green tree" in the terminology from Roslyn)
We want `GreenNode` to be minimal and lossless. The simplest idea possible is to have nodes which store just a syntax kind and a span, with no text; a sketch follows.
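For illustration, a minimal green node along these lines might look like the following sketch (a hypothetical type, not JuliaSyntax's actual `GreenNode` definition):

```julia
# Hypothetical minimal green node, for illustration only; not
# JuliaSyntax's actual definition.
struct MiniGreenNode
    kind::UInt16                      # syntax kind as a raw integer tag
    span::UInt32                      # width of this node in bytes
    children::Vector{MiniGreenNode}   # empty for leaf tokens
end
```

Because only relative widths are stored, absolute byte positions can be recomputed on the fly while traversing from the root, and unchanged subtrees can be shared between edits.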
Call syntax represents a challenge for the AST vs green tree in terms of node placement / iteration for infix operators vs normal prefix function calls:
- `a + 1` vs `+(a, 1)`
- `a + 1 + 2` vs `+(a, 1, 2)`

Clearly in the AST's interface we need to abstract over this placement, for example with something like the normal Julia AST's iteration order.
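In the standard `Expr` AST the two spellings are already indistinguishable, which is the kind of abstraction we'd want here:

```julia
julia> Meta.parse("a + 1") == Meta.parse("+(a, 1)")
true

julia> Meta.parse("+(a, 1)")
:(a + 1)
```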
By pointing to green tree nodes, AST nodes become traceable back to the original source.
Unlike most languages, designing a new AST for Julia is tricky because the existing `Expr` is a very public API used in every macro expansion. User-defined macro expansions interpose between the source text and lowering, and using `Expr` loses source information in many ways.
There seem to be a few ways forward:
- Maybe we can give `Expr` some new semi-hidden fields to point back to the green tree nodes that the `Expr` or its `args` list came from?
- Maybe we can keep using `Expr` during macro expansion, and try to recover source information after macro expansion using heuristics. Likely the presence of correct hygiene can help with this.

One option which may help bridge between locationless ASTs and something new may be to have wrappers for the small number of literal types we need to cover. For example:
SourceSymbol <: AbstractSymbol
SourceInt <: Integer
SourceString <: AbstractString
Having source location attached to symbols would potentially solve most of the hygiene problem. There's still the problem of macro helper functions which use symbol literals; we can't very well be changing the meaning of `:x`! Perhaps the trick there is to try capturing the current module at the location of the interpolation syntax. Eg, if you do `:(y + $x)`, lowering expands this to `Core._expr(:call, :+, :y, x)`, but it could expand it to something like `Core._expr(:call, :+, :y, _add_source_symbol(_module_we_are_lowering_into, x))`?
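A toy sketch of that idea; `SourceSymbol` and `_add_source_symbol` are the hypothetical names from the paragraph above, not real Base or JuliaSyntax APIs:

```julia
# Hypothetical: a symbol carrying the module it was interpolated from.
struct SourceSymbol
    name::Symbol
    mod::Module
end

# Hypothetical helper inserted by lowering at interpolation sites:
# attach the current module to plain symbols, pass everything else through.
_add_source_symbol(mod::Module, x) = x isa Symbol ? SourceSymbol(x, mod) : x
```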
Some disorganized musings about error recovery
Different types of errors seem to occur; for example, a stray `=` in `parse_atom`, or a closing token inside an infix expression. Here we can emit a `K"error"`, but we can't descend further into the parse tree; we must pop several recursive frames off. Seems tricky!

A typical structure is as follows:
function parse_foo(ps)
    mark = position(ps)
    parse_bar(ps)  # What if this fails?
    if peek(ps) == K"some-token"
        bump(ps)
        parse_baz(ps)  # What if this fails?
        emit(ps, mark, K"foo")
    end
end
Emitting plain error tokens works well for unfinished infix expressions:
begin
a = x +
end
The "missing end" problem is tricky, as the intermediate syntax is valid; the problem is often only obvious until we get to EOF.
Missing end
function f()
    begin
        a = 10
    end
# <-- Indentation would be wrong if g() was an inner function of f.
function g()
end
It seems like ideal error recovery would need to backtrack in this case: for example, rewinding to before `g()` was parsed and closing off `f()` at that point, so that `g()` becomes a sibling rather than a child of `f()`.
Missing commas or closing brackets in nested structures also present the existing parser with a problem.
f(a,
  g(b,
    c    # -- missing comma?
    d),
  e)

Again the local indentation might tell a story:

f(a,
  g(b,
    c    # -- missing closing `)` ?
  d)

But not always!

f(a,
  g(b,
    c    # -- missing closing `,` ?
  d))
Another particularly difficult problem for diagnostics in the current system is broken parentheses or double quotes in string interpolations, especially when nested.
Fun research questions
Can we learn fast and reasonably accurate recovery heuristics for when the parser encounters broken syntax, rather than hand-coding these? How would we set the parser up so that training works and injecting the model is nonintrusive? If the model is embedded in and works together with the parser, can it be made compact enough that training is fast and the model itself is tiny?
Given source and syntax tree, can we regress/learn a generative model of indentation from the syntax tree? Source formatting involves a big pile of heuristics to get something which "looks nice"... and ML systems have become very good at heuristics. Also, we've got huge piles of training data — just choose some high quality, tastefully hand-formatted libraries.
This package ships as part of the Julia stdlib.
SparseArrays.jl provides functionality for working with sparse arrays in Julia.
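A small usage sketch with the standard SparseArrays API:

```julia
using SparseArrays

# 3×3 sparse matrix from (row, column, value) triplets
A = sparse([1, 2, 3], [1, 2, 3], [1.0, 2.0, 3.0])

nnz(A)               # 3 stored entries
B = spzeros(3, 3)    # empty 3×3 sparse matrix
A * [1.0, 1.0, 1.0]  # sparse-dense matrix-vector product
```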
To use a newer version of this package, you need to build Julia from scratch. The build process is the same as any other build, except that `DEPS_GIT="SparseArrays"` should be passed to `make`, i.e. `make DEPS_GIT="SparseArrays"`. Then, after the build is complete, the git repo in `stdlib/SparseArrays` can be used to check out the desired commit/branch. Alternatively, you can change the commit used in `stdlib/SparseArrays.version`. It is also possible to do both, to get an up to date git repo directly. There is no need to rebuild Julia when `stdlib/SparseArrays` (or `SparseArrays-<sha1>`, if `DEPS_GIT` was not used) changes, as the package is not in the sysimage; having a git repo is simply convenient.
It's also possible to load a development version of the package using the trick described in the section "Using the development version of Pkg.jl" in the Pkg.jl repo, but the capabilities are limited, as all other packages will depend on the stdlib version of the package and will not work with the modified package.
The main environment may become inconsistent, so you might need to run `Pkg.instantiate()` and/or `Pkg.resolve()` in the main or project environments if Julia complains about missing `Serialization.jl` in this package's dependencies.
For older versions (1.8 and before), `SuiteSparse.jl` needs to be bumped too.
Author: JuliaSparse
Source Code: https://github.com/JuliaSparse/SparseArrays.jl
License: View license