The winnow library for #RustLang is the most fun I've had writing parsers in my entire life. It's taken a thing that I've always dreaded and turned it into a joy.
I love two things about it:
1. Parsers take their input as `&mut &[u8]`, `&mut &str`, `&mut <token stream ref>`, etc., and advance it in place. That is simply so much nicer to use than consuming a `&[u8]` and returning the remaining `&[u8]`.
2. It isn't all-encompassing! It encourages you to interleave imperative and functional code, just like idiomatic Rust in general.
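To make the first point concrete, here is a std-only sketch contrasting the two styles with a "one or more digits" parser. Both function names are made up for illustration and are not part of nom or winnow; the point is only where the remaining input lives.

```rust
// Two hypothetical "one or more digits" parsers; neither is nom's nor winnow's real API.

/// By-value style: take the input, return (remaining input, output).
fn digits_by_value(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return None; // require at least one digit
    }
    Some((&input[end..], &input[..end]))
}

/// `&mut &str` style: advance the caller's slice in place, return only the output.
fn digits_in_place<'a>(input: &mut &'a str) -> Option<&'a str> {
    let s: &'a str = *input; // copy the inner reference out
    let end = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    if end == 0 {
        return None; // on failure the caller's slice is left untouched
    }
    *input = &s[end..];
    Some(&s[..end])
}

fn main() {
    // By value: every step has to rebind the remainder by hand.
    let (rest, a) = digits_by_value("12,34").unwrap();
    let rest = rest.strip_prefix(',').unwrap();
    let (rest, b) = digits_by_value(rest).unwrap();
    assert_eq!((a, b, rest), ("12", "34", ""));

    // In place: the same `input` variable is simply moved forward.
    let mut input = "12,34";
    let a = digits_in_place(&mut input).unwrap();
    input = input.strip_prefix(',').unwrap();
    let b = digits_in_place(&mut input).unwrap();
    assert_eq!((a, b, input), ("12", "34", ""));
}
```

The by-value style makes every call site responsible for threading the remainder through; the `&mut &str` style keeps that bookkeeping inside the parser.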
I've written parsers before using:
- lex/yacc
- nom
- hand-written recursive descent
winnow just stands out as so much nicer. Hand-written parsers will probably still produce better error messages, but winnow has upcoming support for error recovery that I'm really looking forward to.
(For nextest's DSL we do manual error recovery with winnow, which is possible but annoying.)
I cannot emphasize enough how much I like the ability to interleave imperative and functional/combinator-style parsing. Sometimes I don't want to have to think about the right combinator -- in those cases I can just write a for-loop myself. It's awesome, and less error-prone than nom!
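A minimal std-only sketch of that mix, assuming hypothetical helpers `parse_u32` and `literal` written in the `&mut &str` style (they are not winnow's API). The count at the front of the input simply drives a plain `for` loop instead of a search for the right combinator.

```rust
// Hypothetical small parsers in the `&mut &str` style; not winnow's API.

/// Parse one or more ASCII digits as a u32, advancing `input` past them.
fn parse_u32(input: &mut &str) -> Option<u32> {
    let s: &str = *input;
    let end = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let n = s[..end].parse::<u32>().ok()?; // fails on empty or overflowing input
    *input = &s[end..];
    Some(n)
}

/// Consume `expected` if it is the next character.
fn literal(input: &mut &str, expected: char) -> Option<()> {
    let s: &str = *input;
    *input = s.strip_prefix(expected)?;
    Some(())
}

/// Parse a count-prefixed list such as "3:10,20,30".
/// The count drives an ordinary `for` loop rather than a combinator.
fn parse_counted(input: &mut &str) -> Option<Vec<u32>> {
    let count = parse_u32(input)?;
    literal(input, ':')?;
    let mut items = Vec::new();
    for i in 0..count {
        if i > 0 {
            literal(input, ',')?; // separators only between items
        }
        items.push(parse_u32(input)?);
    }
    Some(items)
}

fn main() {
    let mut input = "3:10,20,30 rest";
    assert_eq!(parse_counted(&mut input), Some(vec![10, 20, 30]));
    assert_eq!(input, " rest"); // the cursor stops right after the list
}
```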
@rain yeah, I love this. When writing the parser for KittyCAD's programming language, there were some times where I had really weird logic I couldn't express in a combinator pipeline; a for loop was SO much easier. It's rare, but really helpful when you need it.
@adam_chal I think it's really important for a well-designed framework to be able to fall back to something more familiar, whether for beginners who aren't used to it or for advanced users who need something very complex.
@rain Oh, good! We need to look at it, in that case.
@rain I've looked at winnow several times. Every time I quickly become overwhelmed and completely lost.
The docs don't seem to give a good explanation of *how* to use the crate to do something. How do you decide which combinators to use?
The function signatures are a mess too, further adding to the confusion. And every time the docs say "this is simple" or "this is easy" etc., it's really discouraging if you are struggling to understand.
@bigbass Sounds like it's been a real struggle for you, sorry!
Do you have experience with parser combinator libraries? If not, it might be worth trying to start with something simple, like a parser for a list of integers terminated by newlines.
@bigbass you'll want to do that in two parts:
1. parse a single integer -- accept 1 or more digits
2. use the repeat_till, terminated, and eof combinators to put it together
@bigbass You can definitely do this without a parser library! But it makes an easy first exercise for getting started with one.
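For comparison, a std-only sketch of the no-library route (the name `parse_lines` is made up): split the input into lines, then parse each one. Note that `str::lines` also accepts a final line without a trailing newline, so it is slightly more lenient than "terminated by newlines".

```rust
use std::num::ParseIntError;

/// Parse newline-terminated integers without any parser library.
fn parse_lines(input: &str) -> Result<Vec<i64>, ParseIntError> {
    // `lines()` splits on '\n' (and strips a trailing '\r'); each piece must be an integer.
    input.lines().map(|line| line.parse::<i64>()).collect()
}

fn main() {
    assert_eq!(parse_lines("1\n22\n333\n"), Ok(vec![1, 22, 333]));
    assert!(parse_lines("1\nx\n").is_err());
}
```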
@rain I've written my own text and binary parsers, and did some stuff with pest (PEG). But nothing with combinators. The winnow concept seems simple on the surface, but it isn't obvious how to actually use it.
My natural instinct for your example is to split the string on a delimiter, but that doesn't seem to fit winnow's approach. My guess would have been to somehow use take_until, discard the newline, and wrap that in a repeat. But I'm guessing that would error at EOF.
@bigbass Yes, generally parsers like winnow read incrementally, left-to-right, rather than doing a top-level analysis of the whole input buffer.
In this case you'd use three combinators:
1. repeat_till to apply a parser repeatedly (within a specified range of repetitions) until a terminating parser succeeds
2. terminated to run two parsers one after the other, keeping the result of the first and discarding the result of the second
3. eof as the terminating parser for repeat_till.
give it a go!
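For anyone following along, here is a std-only sketch of that three-combinator structure. The functions `parse_int`, `literal`, `terminated`, `repeat_till` and `eof` are hand-rolled stand-ins named after winnow's combinators; their simplified signatures (returning `Option`, no repetition range) are assumptions for illustration, not winnow's actual API.

```rust
// Hand-rolled stand-ins named after winnow's combinators. Signatures are
// simplified: `None` means "did not match", and there is no repetition range.

/// One or more ASCII digits, converted to a u64.
fn parse_int(input: &mut &str) -> Option<u64> {
    let s: &str = *input;
    let end = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let n = s[..end].parse::<u64>().ok()?; // empty or overflowing input fails
    *input = &s[end..];
    Some(n)
}

/// Match exactly one expected character.
fn literal<'a>(expected: char) -> impl FnMut(&mut &'a str) -> Option<char> {
    move |input: &mut &'a str| {
        let s: &'a str = *input;
        *input = s.strip_prefix(expected)?;
        Some(expected)
    }
}

/// Succeed only when no input is left.
fn eof(input: &mut &str) -> Option<()> {
    if input.is_empty() { Some(()) } else { None }
}

/// Run `first`, then `second`; keep only `first`'s output.
fn terminated<'a, O1, O2>(
    mut first: impl FnMut(&mut &'a str) -> Option<O1>,
    mut second: impl FnMut(&mut &'a str) -> Option<O2>,
) -> impl FnMut(&mut &'a str) -> Option<O1> {
    move |input: &mut &'a str| {
        let out = first(input)?;
        second(input)?;
        Some(out)
    }
}

/// Apply `item` repeatedly until `end` succeeds; return the items and `end`'s output.
fn repeat_till<'a, O, E>(
    mut item: impl FnMut(&mut &'a str) -> Option<O>,
    mut end: impl FnMut(&mut &'a str) -> Option<E>,
) -> impl FnMut(&mut &'a str) -> Option<(Vec<O>, E)> {
    move |input: &mut &'a str| {
        let mut items = Vec::new();
        loop {
            // Try the terminator first; if it fails, rewind and parse another item.
            let checkpoint: &'a str = *input;
            if let Some(e) = end(input) {
                return Some((items, e));
            }
            *input = checkpoint;
            let before = input.len();
            items.push(item(input)?);
            if input.len() == before {
                return None; // guard: an item that consumes nothing would loop forever
            }
        }
    }
}

fn main() {
    let mut input = "1\n22\n333\n";
    let mut parser = repeat_till(terminated(parse_int, literal('\n')), eof);
    let (numbers, ()) = parser(&mut input).unwrap();
    assert_eq!(numbers, vec![1, 22, 333]);
}
```

Because each item must end in `\n`, a final line without a trailing newline makes `terminated` fail and the whole parse returns `None`.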
@bigbass In winnow, character and string literals are also parsers, so you can just do:
terminated(parse_int, '\n')
or
terminated(parse_int, b'\n')
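To see how a literal can stand in for a parser, here is a sketch with a made-up mini `Parser` trait implemented both for `char` and for any function or closure of the right shape. It only mirrors the idea; winnow's real `Parser` trait, its error types, and the byte-literal form (`b'\n'`) are not reproduced here, and `parse_int` is a hypothetical helper.

```rust
// A hypothetical mini `Parser` trait, only to illustrate that a `char`
// literal can itself be a parser. This is NOT winnow's trait.
trait Parser<'a, O> {
    fn parse_next(&mut self, input: &mut &'a str) -> Option<O>;
}

// Any function or closure with the right shape is a parser.
impl<'a, O, F> Parser<'a, O> for F
where
    F: FnMut(&mut &'a str) -> Option<O>,
{
    fn parse_next(&mut self, input: &mut &'a str) -> Option<O> {
        self(input)
    }
}

// A `char` is a parser that matches exactly that character.
impl<'a> Parser<'a, char> for char {
    fn parse_next(&mut self, input: &mut &'a str) -> Option<char> {
        let s: &'a str = *input;
        *input = s.strip_prefix(*self)?;
        Some(*self)
    }
}

/// Run `first`, then `second`; keep only `first`'s output.
fn terminated<'a, O1, O2>(
    mut first: impl Parser<'a, O1>,
    mut second: impl Parser<'a, O2>,
) -> impl FnMut(&mut &'a str) -> Option<O1> {
    move |input: &mut &'a str| {
        let out = first.parse_next(input)?;
        second.parse_next(input)?;
        Some(out)
    }
}

/// One or more ASCII digits, converted to a u64.
fn parse_int(input: &mut &str) -> Option<u64> {
    let s: &str = *input;
    let end = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let n = s[..end].parse::<u64>().ok()?;
    *input = &s[end..];
    Some(n)
}

fn main() {
    // The literal '\n' goes exactly where a parser function would go.
    let mut line = terminated(parse_int, '\n');
    let mut input = "42\n7\n";
    assert_eq!(line(&mut input), Some(42));
    assert_eq!(input, "7\n");
}
```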
@rain Well perhaps I'll give it another shot sometime. Appreciate your help and encouragement though!
Still, I think the crate's documentation could be better, or at least expanded with more explanation and guidance.
@bigbass I do agree that I've found the documentation to be slightly lacking. But it's still a work in progress, and I think I'll submit some PRs to improve the situation :)
@bigbass @rain Sounds like there is a gap in mental models. I remember running into that when I first used parser combinators (`nom`), but I got past it once I had to dig deeper while maintaining `toml_edit` (which used `combine` at the time). Unfortunately, it's been long enough that the curse of knowledge makes it hard for me to see how to bridge that gap.
@epage @rain I'm available via the Rust Discord if you wish to talk further.
One example is take_till vs take_until. The two names mean essentially the same thing in English, and their descriptions in the docs' list of combinators seem to say the same thing in different words.
Another is the phrase "the longest list of bytes/chars"; it feels ambiguous. Is there a chance it'll return some kind of shorter list? "longest" is a comparison, but it's not clear what the alternative outcome there is.
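The difference can be sketched with std-only stand-ins (hypothetical functions, not winnow's API) that follow the semantics as described in winnow's docs, as best I understand them: `take_till` checks one character at a time against a predicate, `take_until` searches for a literal that may span several characters, and "longest" just means greedy, so a `take_while`-style parser never stops before the first non-matching character.

```rust
// Hypothetical std-only stand-ins for the semantics of take_till,
// take_until and take_while; not winnow's API.

/// Like take_till: stop at the first *character* matching a predicate.
/// If no character matches, everything is taken.
fn take_till<'a>(input: &mut &'a str, stop: impl Fn(char) -> bool) -> &'a str {
    let s: &'a str = *input;
    let end = s.find(|c: char| stop(c)).unwrap_or(s.len());
    *input = &s[end..];
    &s[..end]
}

/// Like take_until: stop at the first occurrence of a *literal substring*,
/// without consuming it. Fails (None) if the literal never appears.
fn take_until<'a>(input: &mut &'a str, literal: &str) -> Option<&'a str> {
    let s: &'a str = *input;
    let end = s.find(literal)?;
    *input = &s[end..];
    Some(&s[..end])
}

/// "Longest" = greedy: keep every leading character that matches.
fn take_while<'a>(input: &mut &'a str, keep: impl Fn(char) -> bool) -> &'a str {
    take_till(input, |c| !keep(c))
}

fn main() {
    // take_till: a per-character test; '-' and ';' are both stop characters.
    let mut a = "key-value;rest";
    assert_eq!(take_till(&mut a, |c| c == '-' || c == ';'), "key");
    assert_eq!(a, "-value;rest");

    // take_until: a multi-character literal.
    let mut b = "body-->tail";
    assert_eq!(take_until(&mut b, "-->"), Some("body"));
    assert_eq!(b, "-->tail");

    // take_until fails when the literal never shows up; input is untouched.
    let mut c = "no terminator";
    assert_eq!(take_until(&mut c, "-->"), None);
    assert_eq!(c, "no terminator");

    // Greedy: all leading digits, never a shorter prefix.
    let mut d = "2024abc";
    assert_eq!(take_while(&mut d, |c| c.is_ascii_digit()), "2024");
}
```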
@epage @rain I agree combinator parsers seem to require a particular mindset, one that I've struggled to grasp. Plenty of people say it's really good for creating parsers, but I'm just not seeing it.
Another major concern is that even if I did manage to learn, and used it to write a library, anyone wishing to contribute would then need to go through the whole ordeal of learning winnow too.
@bigbass @rain For anyone following along, I've been making some documentation updates
- https://github.com/winnow-rs/winnow/pull/485
- https://github.com/winnow-rs/winnow/pull/486
- https://github.com/winnow-rs/winnow/pull/491
All released. Always looking for further ways of improving things.
@rain @bigbass I've just published a parser package ("sipp") to crates.io and I have to admit that winnow does sound a bit complicated (but also potentially powerful and elegant if you can make sense of it).
Using my sipp parser, a newline-separated list of numbers would need a while loop like this:
while let Some(_) = parser.peek()? {
    if let Some(num) = parser.read_up_to('\n')? {
        println!("Found num {}.", num);
    }
    parser.accept('\n')?;
}
@rain Point 1 sounds like it brings the ergonomics closer to using parsec in Haskell, since that library uses monads to make returning the updated position automatic and implicit, so that's a nice detail. A mutable reference would be the correct way to translate a state monad from Haskell to Rust.