zwarich

Does anyone have a favorite strategy for mismatched bracket error recovery in recursive descent parsers?

@zwarich My favorite strategy is technically *before* the parser because I find it really hard to do well there:

Detect during lexing, and then post-process the token stream, ideally using indent to guide fixes.

Some of this is implemented, but not the indentation bit:
github.com/carbon-language/car

There is a TODO, and we have the indent data; we just need someone to write the code to peek at the indent and select good fixes until we run out, and then run the greedy algorithm to fix anything left.
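For readers following along, here is a minimal sketch of what a greedy bracket-balancing pass over a token stream could look like. It is not Carbon's actual lexer code: the `Token` class, the `balance` function, and the specific policy (insert missing closers when a matching opener is still open, otherwise drop the stray closer) are all assumptions made for illustration.

```python
from dataclasses import dataclass

PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


@dataclass
class Token:
    text: str
    recovered: bool = False  # True if this token was synthesized by recovery


def balance(tokens):
    """Hypothetical greedy fix-up: return (fixed_tokens, errors)."""
    out, errors, stack = [], [], []  # stack holds the closers we still expect
    for i, text in enumerate(tokens):
        if text in PAIRS:
            stack.append(PAIRS[text])
            out.append(Token(text))
        elif text in CLOSERS:
            if text in stack:
                # An enclosing group expects this closer: first close any
                # inner groups that were left open.
                while stack[-1] != text:
                    missing = stack.pop()
                    out.append(Token(missing, recovered=True))
                    errors.append(f"token {i}: inserted missing '{missing}'")
                stack.pop()
                out.append(Token(text))
            else:
                # Nothing open matches this closer, so drop it.
                errors.append(f"token {i}: dropped unmatched '{text}'")
        else:
            out.append(Token(text))
    # Close whatever is still open at end of input.
    while stack:
        missing = stack.pop()
        out.append(Token(missing, recovered=True))
        errors.append(f"end of input: inserted missing '{missing}'")
    return out, errors
```

With this policy the parser only ever sees balanced delimiters, and the `recovered` flag records which tokens were invented so later stages can treat them differently.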

Link preview (GitHub): carbon-lang/toolchain/lex/lex.cpp at trunk · carbon-language/carbon-lang. Carbon Language's main repository: documents, design, implementation, and related tools. (NOTE: Carbon Language is experimental; see README)

@chandlerc @zwarich IMHO for a new language like Carbon, even if you have braces rather than indentation-defined block structure it makes sense to enforce indentation consistency with braces (and other grouping tokens) in the language definition itself rather than leaving it as a degree of freedom in individual lexer/parser implementations.

@pervognsen @zwarich Mostly? We definitely thought about that but didn't *quite* go that direction.

Specifically, it didn't seem worth *forcing* the indent check and producing compile errors when it is wrong. For example, when copy/pasting code with too much or too little indentation -- it seems more useful to the user to compile that code than to reject it.

We then expect the formatter and linter to enforce consistent indentation. And then design error recovery entirely around consistent indentation.
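As an illustration of recovery designed around consistent indentation, the sketch below guesses where a missing closing brace belongs by looking for the first later line that is indented no deeper than the line holding the opener. This is a hypothetical heuristic, not something the thread or Carbon specifies; `indent_of` and `guess_close_line` are made-up names, and only leading spaces are counted.

```python
def indent_of(line):
    """Number of leading spaces (tabs are not handled in this sketch)."""
    return len(line) - len(line.lstrip(" "))


def guess_close_line(lines, open_line):
    """Return the index of the line before which a missing closer for the
    group opened on `open_line` should be inserted.

    Assumes the code was formatted consistently, so the group's body is
    indented deeper than the line that opened it.
    """
    base = indent_of(lines[open_line])
    for i in range(open_line + 1, len(lines)):
        # Skip blank lines; stop at the first line that dedents back to
        # (or past) the opener's indentation.
        if lines[i].strip() and indent_of(lines[i]) <= base:
            return i
    return len(lines)  # no dedent found: close at end of input
```

For example, given the lines `fn F() {`, `  if (x) {`, `    G();`, `  H();`, `}`, calling `guess_close_line(lines, 1)` picks index 3, so the missing brace would go just before `  H();`.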

@chandlerc I came up with something similar, but was curious whether in practice it's really enough to do this without the grammatical context that you would have from a parser. Curious to see how it works out for Carbon in practice.

@zwarich yeah, definitely interesting to see how it pans out.

Generally, my feeling is that recovering balanced delimiters somewhat in isolation is likely to give users less surprising recovery than using more contextual cues.

But the big win IMO is the simplicity (and speed) of the parser due to not needing to try to do any of this stuff.

@chandlerc Yeah, I don't really think there's a great way to do this otherwise in a recursive descent parser. All of the other comparable solutions I know of rely on modifying either a shift-reduce parser or a general CF parser.

Another interesting question is whether you disable all other errors within a recovery pair.

@zwarich Yeah, that's a question I really wonder about, but we don't have any real experience playing with options here.

If we start using indentation, I would expect us to be able to make a decent guess at "absurd" recoveries and disable errors within those. But it's definitely an interesting area to explore what actually works best for users...
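One way to picture "disabling errors within a recovery pair" is as a filter over diagnostics. The sketch below is only an assumption about how that might be modeled: diagnostics are `(token_position, message)` tuples, recovered pairs are inclusive `(start, end)` token ranges, and `suppress_in_recovered` is a hypothetical helper.

```python
def suppress_in_recovered(diagnostics, recovered_ranges):
    """Drop diagnostics whose position falls inside any recovered pair.

    diagnostics: list of (token_position, message) tuples.
    recovered_ranges: list of inclusive (start, end) token positions that
    delimit bracket pairs created or repaired by error recovery.
    """
    def inside(pos):
        return any(start <= pos <= end for start, end in recovered_ranges)

    return [d for d in diagnostics if not inside(d[0])]
```

Whether to apply this to every recovered pair or only to recoveries judged implausible is exactly the open question in the thread; the filter itself works the same either way.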