hmm, this musl-perf thing effectively makes musl LGPL, but they don't supply the source for the glibc string functions they borrowed.

**Ariadne Conill** @ariadne@treehouse.systems · Aug 12, 2024

one of the patches they apply changes the stdio default buffer size from 1024 bytes to 8192 bytes. why? who knows, no rationale is provided.

i guess the thinking is to align on a page boundary, but why *two* pages?
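
For comparison, a program that wants a larger stdio buffer can ask for one per stream instead of relying on a patched libc default. The following is a minimal sketch using the standard `setvbuf` call; the helper name `use_big_buffer` and the constant `APP_BUFSIZE` are made up for illustration and come from neither musl nor the patch set.

```c
#include <assert.h>
#include <stdio.h>

enum { APP_BUFSIZE = 8192 };  /* hypothetical per-application choice */

/* Give `f` a fully buffered, caller-owned buffer of APP_BUFSIZE bytes.
 * Call this before any other I/O on the stream; `buf` must stay valid
 * until the stream is closed. Returns 0 on success, nonzero on failure.
 * A caller-supplied buffer is used because the C standard only says the
 * size "may" be honoured when the buffer argument is NULL. */
int use_big_buffer(FILE *f, char buf[APP_BUFSIZE])
{
    assert(f != NULL && buf != NULL);
    return setvbuf(f, buf, _IOFBF, APP_BUFSIZE);
}
```

A caller would typically pass a `static char buf[APP_BUFSIZE];` right after opening the stream, which keeps the policy in the application rather than in libc.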

**Ariadne Conill** @ariadne@treehouse.systems · Aug 12, 2024

outside of adding glibc string functions and the bufsize change, they add ifunc support to the linker.

ifuncs are awful because they make program execution inconsistent across different microarchitectures

Cassandrich @dalias@hachyderm.io

@ariadne Ifunc is just an utterly dumb way to do runtime microarch specific code selection.

Aug 13, 2024, 01:59 AM

**Ariadne Conill** @ariadne@treehouse.systems · Aug 13, 2024

@dalias yeah, i agree. it would be nice to have some of those AVX string implementations though in musl.

**Cassandrich** @dalias · Aug 13, 2024

@ariadne Possibly. We have a tentative roadmap for a reasonable way to do that involving nothing like ifunk.

**Val Packett** @valpackett@treehouse.systems · Aug 13, 2024

@dalias @ariadne what makes it "dumb"? AFAIK it's just the least-overhead way, applying the selection at the ELF relocations level seems like the correct place to do it to me

**Robert D. French** @robertdfrench@mastodon.social · Aug 30, 2024

@valpackett @dalias @ariadne Oooh, let me try to answer that! I think the "dumb" comes from the fact that there are other, more portable ways to implement "runtime selection of features" (i.e. function pointers stored in a protected page) that don't allow libraries to provide *arbitrary plugins* for the dynamic linker.

I wrote a whole mini-thesis on this if you are interested in getting INSANELY deep down in the weeds: https://github.com/robertdfrench/ifuncd-up

**equi** @equinox@chaos.social · Aug 31, 2024

@robertdfrench @valpackett @dalias @ariadne

x86 microarchitecture levels [ https://www.phoronix.com/news/GCC-11-x86-64-Feature-Levels ] would probably address 90% of IFUNC uses, I wish we could get that rolling more... (And maybe have an ARM64 equivalent?)

But distro & package support seems spotty on this for now :'(

**equi** @equinox@chaos.social · Aug 31, 2024

@robertdfrench @valpackett @dalias @ariadne (to be clear this means shipping 5 copies of binaries, i.e. it does also dissimilarize what code is running - and thus a new factor in bugs - but at least you can quite easily tell what you ended up with. And switching to another variant is just moving files around.)

**Robert D. French** @robertdfrench@mastodon.social · Aug 31, 2024

@equinox @valpackett @dalias @ariadne This is a great solution for systems that are installed and operated on the same hardware, but VM & Container images have to boot without guidance from the package manager (until SystemD grows its own package manager, which it should!)

**equi** @equinox@chaos.social · Aug 31, 2024

@robertdfrench @valpackett @dalias @ariadne that's not how this works, all 5 binaries are part of the same package; the dynamic linker chooses which one to load at program start. They're in different subdirectories under /lib. (...needs work for /bin...)

Of course the package is then 5x in size for binaries, which depending on your use case can be anywhere from irrelevant to a dealbreaker.

**Robert D. French** @robertdfrench@mastodon.social · Aug 31, 2024

@equinox @valpackett @dalias @ariadne oh you want the linker making the choice? Yeah, I could get behind that. You could go even further and mark symbols in the same binary as being variants for each micro-architecture, and then let the linker assemble it based on its own feature detection decisions. Like if ifunc were a table rather than ARBITRARY CODE.

I endorse this solution wholeheartedly.
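
One way to picture "a table rather than ARBITRARY CODE" is a list of variants, each tagged with the feature bits it needs, and a single generic routine that picks the first one the machine satisfies. This is a sketch only: the feature bits, the `struct variant` layout and the function names are invented for illustration and do not correspond to any existing ELF mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits; a real scheme would define these per ISA. */
enum { FEAT_POPCNT = 1 << 0, FEAT_AVX2 = 1 << 1 };

typedef int (*count_fn)(unsigned long long);

/* Stand-in implementations of a bit count. */
static int count_generic(unsigned long long x)
{
    int n = 0;
    for (; x != 0; x >>= 1)
        n += (int)(x & 1);
    return n;
}

static int count_fast(unsigned long long x)
{
    int n = 0;
    for (; x != 0; x &= x - 1)  /* clear the lowest set bit each round */
        n++;
    return n;
}

struct variant {
    unsigned required;  /* feature bits this variant needs */
    count_fn fn;
};

/* Ordered from most to least demanding; the last entry needs nothing. */
static const struct variant count_variants[] = {
    { FEAT_POPCNT, count_fast },
    { 0,           count_generic },
};

/* Generic selection: a pure data lookup, no library-supplied code runs.
 * Returns NULL if no entry matches. */
count_fn select_variant(const struct variant *table, size_t n,
                        unsigned available)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if ((table[i].required & ~available) == 0)
            return table[i].fn;
    return NULL;
}
```

Because the loader only compares bitmasks, the set of things a library can make it do is limited to "pick one of these symbols".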

**Cassandrich** @dalias · Aug 31, 2024

@robertdfrench @equinox @valpackett @ariadne The linker doesn't even need to make the choice. The system can just be configured to symlink the ones to a tmpfs or bind mount them over the default baseline-portable ones or add a directory to the path search file as appropriate for the running hardware.

This is why #musl does not (and won't) have uarch-optimization-resolving logic in ldso. It's easily factored to a better policy layer.
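
A policy layer of that kind could be as small as a boot-time helper that works out which variant directory applies and hands the result to whatever does the symlinking or bind mounting. The sketch below is an assumption-heavy illustration: the function names, the `"baseline"` label and the directory layout are hypothetical, the feature check needs GCC or Clang on x86-64, and it tests only a representative subset of each level's requirements rather than the full psABI definition.

```c
#include <assert.h>
#include <stdio.h>

/* Name of the highest x86-64 level this sketch recognises, or "baseline".
 * Simplified: only some of the features of each level are checked. */
const char *uarch_level(void)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __builtin_cpu_init();
    int v2 = __builtin_cpu_supports("ssse3")
          && __builtin_cpu_supports("sse4.1")
          && __builtin_cpu_supports("sse4.2")
          && __builtin_cpu_supports("popcnt");
    int v3 = v2
          && __builtin_cpu_supports("avx")
          && __builtin_cpu_supports("avx2")
          && __builtin_cpu_supports("bmi")
          && __builtin_cpu_supports("bmi2")
          && __builtin_cpu_supports("fma");
    int v4 = v3
          && __builtin_cpu_supports("avx512f")
          && __builtin_cpu_supports("avx512bw")
          && __builtin_cpu_supports("avx512cd")
          && __builtin_cpu_supports("avx512dq")
          && __builtin_cpu_supports("avx512vl");
    if (v4) return "x86-64-v4";
    if (v3) return "x86-64-v3";
    if (v2) return "x86-64-v2";
#endif
    return "baseline";
}

/* Write "<root>/<level>" into out; returns what snprintf returns. */
int variant_path(char *out, size_t cap, const char *root)
{
    assert(out != NULL && root != NULL);
    return snprintf(out, cap, "%s/%s", root, uarch_level());
}
```

An init script could print the result of `variant_path` for a root of its choosing and then mount or link that directory, which keeps the decision visible to the administrator and out of ldso.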

**equi** @equinox@chaos.social · Aug 31, 2024

@dalias @robertdfrench @valpackett @ariadne this conveniently also works for /bin, it's just... "less obvious"... where to put the uarch subdirs. Not that it needs a huge standard or anything.

Really just a question of build and packaging.

P.S.: I'm really eager on this because I have good reasons to want POPCOUNT. Which is only missing on the very oldest x86_64 CPUs :'(

**Robert D. French** @robertdfrench@mastodon.social · Aug 31, 2024

@dalias @equinox @valpackett @ariadne yeah okay, I'll allow it. That approach would give a lot more administrative visibility anyways, since you could just run `mount` instead of having to query the linker for its decisions.

However... we do already have the expectation that you can query the linker for how it would resolve dependencies. So if it can't give you the "whole" picture, that might confuse folks.

**James Henstridge** @jamesh@aus.social · Aug 31, 2024

@robertdfrench @valpackett @dalias @ariadne Your benchmarks don't seem to be testing the use case ifuncs purport to improve. You're basically just showing that there is overhead in routing function calls via the dynamic linker compared to doing them direct, which is true but not particularly interesting.

For the use in glibc, they're already paying the PLT indirection cost. So the ifunc use lets them avoid a second indirection to pick the implementation.

A more useful benchmark would be to put your increment_counter() implementations in a shared library called by your benchmark harness.

**Robert D. French** @robertdfrench@mastodon.social · Aug 31, 2024

@jamesh @valpackett @dalias @ariadne yeah I have been feeling a little unsure about those tests for a while. Let me take another crack at them and see what comes out.

**Robert D. French** @robertdfrench@mastodon.social · Sep 3, 2024

@jamesh @valpackett @dalias @ariadne What do you think about this? https://github.com/robertdfrench/ifuncd-up/pull/21

Every different approach has its own libincrement that contains two different runtime-selectable increment implementations, so the cost now reflects making all of those available via the PLT.

This does not seem to change the fact that ifunc does not outperform function pointers, nor does it meaningfully outperform the worst case strategy of just checking the CPU features every single time.

**Cassandrich** @dalias · Sep 3, 2024

@robertdfrench @jamesh @valpackett @ariadne What's been obvious to me for a long time is that, even if there were a performance advantage to ifunc, it could only be when the entire function call is so short that call overhead can be a significant portion of overall time.

On the other hand, use of a uarch-optimized variant for something like memcpy is only going to make any sense when the operation is above a certain size/time threshold.
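
The size-threshold argument can be illustrated with a copy routine that never touches the dispatch mechanism for short inputs. This is a sketch, not musl code: `copy_bytes`, `copy_tuned`, the `tuned_calls` counter and the 256-byte cutoff are all hypothetical, and the "tuned" routine simply defers to `memcpy` as a stand-in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BIG_COPY_THRESHOLD = 256 };  /* hypothetical cutoff; needs measuring */

typedef void *(*copy_fn)(void *, const void *, size_t);

static size_t tuned_calls;  /* counts trips through the tuned path */

/* Stand-in for a uarch-tuned routine; here it just defers to memcpy. */
static void *copy_tuned(void *d, const void *s, size_t n)
{
    tuned_calls++;
    return memcpy(d, s, n);
}

/* Would be chosen once at startup by some mechanism; fixed in this sketch. */
static copy_fn big_copy = copy_tuned;

void *copy_bytes(void *dst, const void *src, size_t n)
{
    if (n < BIG_COPY_THRESHOLD) {
        /* Short copies: a plain loop, no indirection at all. */
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n--)
            *d++ = *s++;
        return dst;
    }
    /* Long copies: the indirect call is small next to the copy itself. */
    assert(big_copy != NULL);
    return big_copy(dst, src, n);
}
```

With this shape the cost of the selection mechanism only appears on calls long enough that it is unlikely to matter, which is the case where a tuned variant has a chance of paying off.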

**Cassandrich** @dalias · Sep 3, 2024