rain 🌦️

The winnow library for Rust is the most fun I've had writing parsers in my entire life. It's taken a thing that I've always dreaded and turned it into a joy.

I love two things about it:

1. It uses &mut &[u8], &mut &str, &mut <token stream ref>, etc. That is simply so much nicer to use than consuming a &[u8] and returning a &[u8].

2. It isn't all-encompassing! It encourages you to interleave imperative and functional code, just like idiomatic Rust in general.
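To make point 1 concrete, here is a minimal std-only sketch of the two calling conventions. It does not use winnow's API. The function names (digits_returning, digits_in_place, pair_returning, pair_in_place) are hypothetical and exist only for this illustration.

```rust
/// Style A: take the input by value and hand back `(rest, parsed)`.
fn digits_returning(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return None; // need at least one digit
    }
    let (digits, rest) = input.split_at(end);
    Some((rest, digits))
}

/// Style B: advance the caller's `&str` in place and return only the value.
fn digits_in_place<'a>(input: &mut &'a str) -> Option<&'a str> {
    let current: &'a str = *input;
    let (rest, digits) = digits_returning(current)?;
    *input = rest;
    Some(digits)
}

/// Parse "<digits>x<digits>" in style A: every step threads `rest` by hand.
fn pair_returning(input: &str) -> Option<(&str, (&str, &str))> {
    let (rest, a) = digits_returning(input)?;
    let rest = rest.strip_prefix('x')?;
    let (rest, b) = digits_returning(rest)?;
    Some((rest, (a, b)))
}

/// The same grammar in style B: the steps read top to bottom,
/// and the position is tracked by the `&mut` reference.
fn pair_in_place<'a>(input: &mut &'a str) -> Option<(&'a str, &'a str)> {
    let a = digits_in_place(input)?;
    let current: &'a str = *input;
    *input = current.strip_prefix('x')?;
    let b = digits_in_place(input)?;
    Some((a, b))
}
```

In style B the caller never has to remember which `rest` is the current one, which is roughly what makes the `&mut &str` convention pleasant to use.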

docs.rs/winnow

I've written parsers before using:

- lex/yacc
- nom
- hand-written recursive descent

winnow just stands out as so much nicer. Hand-written parsers will probably still produce better error messages, but winnow has upcoming support for error recovery that I'm really looking forward to.

(For nextest's DSL we do manual error recovery with winnow, which is possible but annoying.)

I cannot emphasize enough how much I like the ability to interleave imperative and functional/combinator-style parsing. Sometimes I don't want to have to think about the right combinator -- in those cases I can just write a for-loop myself. It's awesome, and less error-prone than nom!
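As a rough illustration of that interleaving, here is a std-only sketch (not winnow's API) of a count-prefixed list such as "3:a,bc,d". The small parsers are ordinary functions over `&mut &str`, and the repetition is a plain for-loop. The helper names (expect_char, one_or_more, counted_words) are made up for this example.

```rust
/// Consume exactly one expected character, or fail.
fn expect_char(input: &mut &str, expected: char) -> Option<()> {
    let current: &str = *input;
    *input = current.strip_prefix(expected)?;
    Some(())
}

/// Consume one or more characters accepted by `keep` and return them.
fn one_or_more<'a>(input: &mut &'a str, keep: impl Fn(char) -> bool) -> Option<&'a str> {
    let current: &'a str = *input;
    let end = current
        .find(|c: char| !keep(c))
        .unwrap_or(current.len());
    if end == 0 {
        return None;
    }
    let (taken, rest) = current.split_at(end);
    *input = rest;
    Some(taken)
}

/// Parse "<count>:<word>,<word>,..." with exactly `count` words.
/// The count is only known at parse time, so a loop is the easy way out.
fn counted_words<'a>(input: &mut &'a str) -> Option<Vec<&'a str>> {
    let count: usize = one_or_more(input, |c| c.is_ascii_digit())?
        .parse()
        .ok()?;
    expect_char(input, ':')?;

    let mut words = Vec::new();
    for i in 0..count {
        if i > 0 {
            expect_char(input, ',')?; // separator between words only
        }
        words.push(one_or_more(input, |c| c.is_ascii_alphabetic())?);
    }
    Some(words)
}
```

For example, calling counted_words on "3:a,bc,d rest" yields the three words and leaves " rest" in the input.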

@rain yeah, I love this. When writing the parser for KittyCAD's programming language, there were some times where I had really weird logic I couldn't express in a combinator pipeline, a for loop was SO much easier. It's rare, but really helpful when you need it.

@adam_chal I think it's really important for a well-designed framework to be able to fall back to something more familiar. Whether it's for beginners who aren't used to it, or for advanced users who need something very complex

@rain Yeah, especially for combinators, it's nice to be able to write your own, the same way the library author did (i.e. loops).

Also a big reason to like Winnow: @epage usually responds to and fixes bug reports, and cuts a new patch release, within a few hours. Seriously amazing.

@rain oh, good! we need to look at it, in that case

@rain I've looked at winnow several times. Every time I quickly become overwhelmed and completely lost.

The docs don't seem to give a good explanation of *how* to use the crate to do something. How do you decide which combinators to use?

The function signatures are a mess too, further adding to the confusion. And every time the docs say "this is simple" or "this is easy" etc., it's really discouraging if you are struggling to understand.

@bigbass Sounds like it's been a real struggle for you, sorry!

Do you have experience with parser combinator libraries? If not, it might be worth trying to start with something simple, like a parser for a list of integers terminated by newlines.

@bigbass you'll want to do that in two parts:

1. parse a single integer -- accept 1 or more digits

2. use the repeat_till, terminated, and eof combinators to put it together

@bigbass You can definitely do this without a parser library! But it may be an easy way to get started.

@rain I've written my own text and binary parsers, and did some stuff with pest (PEG). But nothing with combinators. The winnow concept seems simple on the surface, but it isn't obvious how to actually use it.

My natural instinct for your example is to split the string based on a delimiter. But that doesn't seem to work with winnow's mentality. My guess would have been to somehow use take_until and discard the newline, and wrap that in a repeat. But I'm guessing that would error at EOF.

@bigbass Yes, generally parsers like winnow read incrementally, left-to-right, rather than doing a top-level analysis of the whole input buffer.

In this case you'd use three combinators:

1. repeat_till to repeat a parser until a second, terminating parser succeeds
2. terminated to take two parsers one after the other, and discard the result of the second parser
3. eof as the termination condition for repeat_till.

give it a go!

@bigbass In winnow, character and string literals are also parsers, so you can just do:

terminated(parse_int, '\n')

or

terminated(parse_int, b'\n')
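Putting the pieces together, here is a std-only sketch of how the three combinators compose. The functions named terminated, repeat_till and eof below are hand-rolled stand-ins written for this example, not winnow's own functions, and their signatures are simplified (winnow's repeat_till, for instance, also takes a repetition range and returns the terminator's output alongside the items). Failure is modelled as None rather than a real error type.

```rust
/// Step 1: one or more ASCII digits, parsed into a u64.
fn parse_int(input: &mut &str) -> Option<u64> {
    let current: &str = *input;
    let end = current
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(current.len());
    if end == 0 {
        return None;
    }
    let (digits, rest) = current.split_at(end);
    let value: u64 = digits.parse().ok()?;
    *input = rest;
    Some(value)
}

/// A single '\n'.
fn newline(input: &mut &str) -> Option<()> {
    let current: &str = *input;
    *input = current.strip_prefix('\n')?;
    Some(())
}

/// Succeeds only when nothing is left to read.
fn eof(input: &mut &str) -> Option<()> {
    if input.is_empty() { Some(()) } else { None }
}

/// Run `first`, then `second`, and keep only the output of `first`.
fn terminated<'a, T, U>(
    mut first: impl FnMut(&mut &'a str) -> Option<T>,
    mut second: impl FnMut(&mut &'a str) -> Option<U>,
) -> impl FnMut(&mut &'a str) -> Option<T> {
    move |input: &mut &'a str| {
        let value = first(input)?;
        second(input)?;
        Some(value)
    }
}

/// Repeat `item` until `end` succeeds, collecting the items.
fn repeat_till<'a, T, U>(
    mut item: impl FnMut(&mut &'a str) -> Option<T>,
    mut end: impl FnMut(&mut &'a str) -> Option<U>,
) -> impl FnMut(&mut &'a str) -> Option<Vec<T>> {
    move |input: &mut &'a str| {
        let mut items = Vec::new();
        loop {
            let checkpoint = *input;
            if end(input).is_some() {
                return Some(items);
            }
            *input = checkpoint; // undo anything `end` consumed before failing
            items.push(item(input)?);
        }
    }
}

/// Step 2: integers, each followed by '\n', until end of input.
fn parse_ints(input: &mut &str) -> Option<Vec<u64>> {
    let mut parser = repeat_till(terminated(parse_int, newline), eof);
    parser(input)
}
```

Note that the input is read left to right and never split up front: reaching the end of input is simply the condition that stops the repetition, so it is not an error.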

@rain Well perhaps I'll give it another shot sometime. Appreciate your help and encouragement though!

Still, I think the crate's documentation could be better, or at least expanded with more explanation and guidance.

@bigbass I do agree that I've found the documentation to be slightly lacking. But it's still a work in progress, and I think I'll submit some PRs to improve the situation :)

@rain @bigbass strangely this seems to be a common issue with parser combinator libraries. I wonder why? E.g. Parsec's (Python) documentation is also *highly* useless as an introduction, and the "Examples" section is just weird. That it's inspired by a Haskell lib is not an "example" ...?

pythonhosted.org/parsec/

@dngrs @bigbass oof, that's not great. I think parser combinators are a thing you have to spend a few hours learning, but that's not helpful at all.

FWIW I think winnow's basic example is pretty good, a nontrivial but still simple use of the basic combinators.

docs.rs/winnow/latest/winnow/#

@bigbass @rain Sounds like there is a gap with mental models. I remember running into that when I first used parser combinators (`nom`) but overcame that as I had to dig into it more in maintaining `toml_edit` (which used `combine` at that time). Unfortunately, it's been long enough that I have the curse of knowledge making it harder to see how to bridge that gap.

@bigbass @rain
If you'd be available for it, I'd be interested in interviewing you to see if we can find the essence of that gap to see how it could be addressed.

@epage @rain I'm available via the Rust Discord if you wish to talk further.

One example is take_till vs take_until. Semantically those names mean the same thing, and their descriptions in the docs' list of combinators seem to say the same thing, just with different words.

Another is the phrase "the longest list of bytes/chars"; it feels ambiguous. Is there a chance it'll return some kind of shorter list? "longest" is a comparison, but it's not clear what the alternative outcome there is.
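On take_till vs take_until, my understanding is that the difference lies in what the stopping condition is: take_till tests each token against a predicate or set, while take_until searches for a literal sequence and fails if that literal never appears. The std-only sketch below illustrates that distinction with hand-written stand-ins (take_till_sketch and take_until_sketch are hypothetical names, not winnow functions, and winnow's real versions have different signatures).

```rust
/// Idea behind take_till: consume characters until `stop` returns true
/// for one of them. If no character stops it, everything is consumed.
fn take_till_sketch<'a>(input: &mut &'a str, stop: impl Fn(char) -> bool) -> &'a str {
    let current: &'a str = *input;
    let end = current
        .find(|c: char| stop(c))
        .unwrap_or(current.len());
    let (taken, rest) = current.split_at(end);
    *input = rest;
    taken
}

/// Idea behind take_until: consume up to (not including) the first
/// occurrence of a literal `pattern`. If the pattern is absent, fail
/// and leave the input untouched.
fn take_until_sketch<'a>(input: &mut &'a str, pattern: &str) -> Option<&'a str> {
    let current: &'a str = *input;
    let end = current.find(pattern)?;
    let (taken, rest) = current.split_at(end);
    *input = rest;
    Some(taken)
}
```

So a per-character condition such as "stop at '=' or ';'" fits the first shape, while "stop at the multi-character marker `-->`" fits the second.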

@epage @rain I agree combinator parsers seem to require a particular mindset, one that I've struggled to grasp. There's plenty of people who say it's really good for creating parsers, but I'm just not seeing it.

Another major concern is that even if I did manage to learn, and used it to write a library, anyone wishing to contribute would then need to go through the whole ordeal of learning winnow too.

@epage @bigbass these improvements are awesome -- particularly love the non-generic example one! I think most combinators people will write by themselves will probably be non-generic, so it's helpful to have examples to work with.

@rain @bigbass I've just published a parser package ("sipp") to crates.io and I have to admit that winnow does sound a bit complicated (but also potentially powerful and elegant if you can make sense of it).

Using my sipp parser, a newline-separated list of numbers would need a while loop like this:

while let Some(_) = parser.peek()? {
    if let Some(num) = parser.read_up_to('\n')? {
        println!("Found num {}.", num);
    }
    parser.accept('\n')?;
}

@rain number 1 sounds like it would make it more similar to the ergonomics of using parsec in haskell, since that library uses monads to make returning the updated position automatic and implicit, so that’s a nice detail. a mutable reference would be the correct way to translate a state monad from haskell to rust