hachyderm.io is one of the many independent Mastodon servers you can use to participate in the fediverse.

hmm, this musl-perf thing effectively makes musl LGPL, but they don't supply the source for the glibc string functions they borrowed.

one of the patches they apply changes the stdio default buffer size from 1024 bytes to 8192 bytes. why? who knows, no rationale is provided.

i guess the thinking is to align on a page boundary, but why *two* pages?
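A program that wants a particular stdio buffer size doesn't have to depend on the libc default. Standard C's `setvbuf` sets one per stream. Below is a minimal sketch. The helper name `use_8k_buffer` is ours, not from musl or musl-perf. 8192 bytes is two pages only on a system with 4 KiB pages.

```c
#include <assert.h>
#include <stdio.h>

/* 8192 bytes: two pages on a system with 4096-byte pages. */
#define WANT_BUFSIZE 8192

/* Make `f` fully buffered with a caller-supplied WANT_BUFSIZE-byte buffer.
   Must be called before any other operation on `f`, and `buf` must stay
   valid until `f` is closed. Returns 0 on success, like setvbuf(). */
static int use_8k_buffer(FILE *f, char *buf) {
    return setvbuf(f, buf, _IOFBF, WANT_BUFSIZE);
}
```

The caller owns the buffer (e.g. a `static` array), which sidesteps the fact that what a libc does with a NULL buffer argument is implementation-specific.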

outside of adding glibc string functions and the bufsize change, they add ifunc support to the linker.

ifuncs are awful because they make program execution inconsistent across different microarchitectures.
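For reference, this is roughly what a GNU ifunc looks like in C with GCC or Clang on glibc. The symbol is bound to a *resolver*, which the dynamic linker runs while processing relocations. Every call then lands on whatever pointer the resolver returned. All names here (`sketch_add`, `add_tuned`, `resolve_sketch_add`) are made up, and the AVX2 check is only an example criterion. The `#else` branch is a plain-function fallback for environments without ifunc support, such as musl.

```c
#include <assert.h>
#include <stdlib.h>  /* on glibc this pulls in <features.h>, which defines __GLIBC__ */

typedef int (*add_fn)(int, int);

static int add_generic(int a, int b) { return a + b; }
/* Stand-in for a microarchitecture-specific variant (same result here). */
static int add_tuned(int a, int b) { return b + a; }

#if defined(__GLIBC__) && defined(__ELF__) && (defined(__GNUC__) || defined(__clang__))
/* The resolver: run once by the dynamic linker (or by startup code in a
   static binary) while relocations are processed. Its return value is
   what every later call to sketch_add() jumps to. */
static add_fn resolve_sketch_add(void) {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();  /* required here: resolvers run before constructors */
    if (__builtin_cpu_supports("avx2"))
        return add_tuned;  /* AVX2 is only an example criterion */
#endif
    return add_generic;
}
int sketch_add(int a, int b) __attribute__((ifunc("resolve_sketch_add")));
#else
/* No GNU ifunc (e.g. musl): an ordinary function, always the generic path. */
int sketch_add(int a, int b) { return add_generic(a, b); }
#endif
```

The resolver is arbitrary library code running inside the dynamic linker, and which variant it picks depends on the machine. Both points come up again later in the thread.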

Cassandrich

@ariadne Ifunc is just an utterly dumb way to do runtime microarch-specific code selection.

@dalias yeah, i agree. it would be nice to have some of those AVX string implementations though in musl.

@ariadne Possibly. We have a tentative roadmap for a reasonable way to do that involving nothing like ifunk.

@dalias @ariadne what makes it "dumb"? AFAIK it's just the least-overhead way; applying the selection at the ELF relocation level seems like the correct place to do it, to me.

@valpackett @dalias @ariadne Oooh, let me try to answer that! I think the "dumb" comes from the fact that there are other, more portable ways to implement "runtime selection of features" (e.g. function pointers stored in a write-protected page) that don't allow libraries to provide *arbitrary plugins* for the dynamic linker.

I wrote a whole mini-thesis on this if you are interested in getting INSANELY deep down in the weeds: github.com/robertdfrench/ifunc

GitHub - robertdfrench/ifuncd-up: GNU IFUNC is the real culprit behind CVE-2024-3094
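The function-pointer alternative can be sketched in portable POSIX C. Pick an implementation once, store the pointer in a page of its own, then `mprotect` that page read-only so the pointer can't be overwritten later. Everything named here is hypothetical and not taken from the linked repository: `dispatch_table`, `init_dispatch`, and the `have_fast_path` flag, which stands in for real CPU feature detection.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

typedef uint64_t (*add_fn)(uint64_t, uint64_t);

static uint64_t add_generic(uint64_t a, uint64_t b) { return a + b; }
/* Stand-in for a microarchitecture-specific variant. */
static uint64_t add_tuned(uint64_t a, uint64_t b) { return b + a; }

struct dispatch_table {
    add_fn add;
};

static const struct dispatch_table *dispatch; /* points into the protected page */

/* Choose implementations once, then make the table read-only.
   Returns 0 on success, -1 on failure. */
static int init_dispatch(int have_fast_path) {
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    size_t page = (size_t)ps;
    void *mem = mmap(NULL, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    struct dispatch_table *t = mem;
    t->add = have_fast_path ? add_tuned : add_generic;
    if (mprotect(mem, page, PROT_READ) != 0) {  /* later writes now fault */
        munmap(mem, page);
        return -1;
    }
    dispatch = t;
    return 0;
}

/* Public entry point: one indirect call through the read-only table. */
static uint64_t dispatch_add(uint64_t a, uint64_t b) {
    assert(dispatch != NULL && "init_dispatch() must run first");
    return dispatch->add(a, b);
}
```

Unlike an ifunc resolver, the selection logic here runs as ordinary program code at a time the program chooses, not inside the dynamic linker.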

@robertdfrench @valpackett @dalias @ariadne

x86 microarchitecture levels [ phoronix.com/news/GCC-11-x86-6 ] would probably address 90% of IFUNC uses. I wish we could get that rolling more... (And maybe have an ARM64 equivalent?)

But distro & package support seems spotty on this for now :'(

www.phoronix.com: GCC 11's x86-64 Microarchitecture Feature Levels Are Ready To Roll
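Each x86-64 psABI level is a cumulative set of CPU features on top of the previous one:

- v2 adds e.g. POPCNT and SSE4.2.
- v3 adds AVX2, BMI1/2, FMA and others.
- v4 adds a core set of AVX-512 extensions.

The sketch below maps a detected-feature bitmask to the highest level it satisfies. The bit assignments are our own, and a real detector would fill the mask from CPUID.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments for detected CPU features. */
enum {
    F_CX16 = 1 << 0, F_LAHF_SAHF = 1 << 1, F_POPCNT = 1 << 2, F_SSE3 = 1 << 3,
    F_SSE4_1 = 1 << 4, F_SSE4_2 = 1 << 5, F_SSSE3 = 1 << 6,
    F_AVX = 1 << 7, F_AVX2 = 1 << 8, F_BMI1 = 1 << 9, F_BMI2 = 1 << 10,
    F_F16C = 1 << 11, F_FMA = 1 << 12, F_LZCNT = 1 << 13, F_MOVBE = 1 << 14,
    F_OSXSAVE = 1 << 15,
    F_AVX512F = 1 << 16, F_AVX512BW = 1 << 17, F_AVX512CD = 1 << 18,
    F_AVX512DQ = 1 << 19, F_AVX512VL = 1 << 20
};

/* Features each level adds on top of the previous one (per the x86-64 psABI). */
#define LEVEL2_ADDS (F_CX16 | F_LAHF_SAHF | F_POPCNT | F_SSE3 | F_SSE4_1 | F_SSE4_2 | F_SSSE3)
#define LEVEL3_ADDS (F_AVX | F_AVX2 | F_BMI1 | F_BMI2 | F_F16C | F_FMA | F_LZCNT | F_MOVBE | F_OSXSAVE)
#define LEVEL4_ADDS (F_AVX512F | F_AVX512BW | F_AVX512CD | F_AVX512DQ | F_AVX512VL)

/* Highest level whose cumulative requirements are all present.
   Level 1 (baseline x86-64) is assumed on any x86-64 CPU. */
static int x86_64_level(uint32_t have) {
    static const uint32_t adds[] = { LEVEL2_ADDS, LEVEL3_ADDS, LEVEL4_ADDS };
    uint32_t need = 0;
    int level = 1;
    for (int i = 0; i < 3; i++) {
        need |= adds[i];
        if ((have & need) != need)
            break;  /* levels are cumulative: stop at the first gap */
        level = i + 2;
    }
    return level;
}
```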

@robertdfrench @valpackett @dalias @ariadne (to be clear, this means shipping 5 copies of binaries, i.e. it does also make the code that actually runs differ between machines, and thus adds a new factor in bugs, but at least you can quite easily tell which variant you ended up with. And switching to another variant is just moving files around.)

@equinox @valpackett @dalias @ariadne This is a great solution for systems that are installed and operated on the same hardware, but VM & container images have to boot without guidance from the package manager (until systemd grows its own package manager, which it should!)

@robertdfrench @valpackett @dalias @ariadne that's not how this works, all 5 binaries are part of the same package; the dynamic linker chooses which one to load at program start. They're in different subdirectories under /lib. (...needs work for /bin...)

Of course the package is then 5x in size for binaries, which depending on your use case can be anywhere from irrelevant to a dealbreaker.

@equinox @valpackett @dalias @ariadne oh you want the linker making the choice? Yeah, I could get behind that. You could go even further and mark symbols in the same binary as being variants for each micro-architecture, and then let the linker assemble it based on its own feature detection decisions. Like if ifunc were a table rather than ARBITRARY CODE.

I endorse this solution wholeheartedly.
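Here is one way to picture "ifunc as a table rather than arbitrary code". Each symbol carries a static list of variants, each with the features it needs, ordered most demanding first. The loader picks the first entry the CPU satisfies, so no library code runs during selection. This is a hypothetical design sketched in C; `FEAT_SSE42`, `FEAT_AVX2` and the variant names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature bits, as a loader might report them. */
#define FEAT_SSE42 (1u << 0)
#define FEAT_AVX2  (1u << 1)

typedef size_t (*strlen_fn)(const char *);

/* Three stand-in variants (deliberately different bodies, same result). */
static size_t strlen_baseline(const char *s) {
    const char *p = s;
    while (*p) p++;
    return (size_t)(p - s);
}
static size_t strlen_sse42(const char *s) {
    size_t i = 0;
    while (s[i] != '\0') i++;
    return i;
}
static size_t strlen_avx2(const char *s) { return strlen(s); }

/* A variant table is plain data: required features plus an implementation. */
struct variant {
    uint32_t needs;
    strlen_fn impl;
};

/* Ordered most demanding first; the last entry needs nothing. */
static const struct variant strlen_variants[] = {
    { FEAT_SSE42 | FEAT_AVX2, strlen_avx2 },
    { FEAT_SSE42,             strlen_sse42 },
    { 0,                      strlen_baseline },
};

/* What the loader would do: first entry whose requirements are all present. */
static strlen_fn select_variant(const struct variant *table, size_t n, uint32_t have) {
    for (size_t i = 0; i < n; i++)
        if ((table[i].needs & ~have) == 0)
            return table[i].impl;
    return NULL;
}
```

Because the table is plain data, a tool could also report which variant would be chosen without executing anything from the library.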

@robertdfrench @equinox @valpackett @ariadne The linker doesn't even need to make the choice. The system can just be configured, as appropriate for the running hardware, to symlink the right variants into a tmpfs, bind-mount them over the default baseline-portable ones, or add a directory to the search path file.

This is why musl does not (and won't) have uarch-optimization-resolving logic in its ldso. It's easily factored out to a better policy layer.
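As an example of that policy layer, a boot-time tool could write musl's path file, `/etc/ld-musl-$ARCH.path`, which lists library directories separated by newlines or colons. It would put the most specific variant directories first. The sketch below generates that list. It assumes hypothetical per-level directories such as `/usr/lib/x86-64-v3`, and it keeps the conventional default directories at the end, because the file replaces the built-in search path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a newline-separated library search list for a machine at
   x86-64 level `level` (1-4): most specific directories first, then the
   usual defaults. Returns the length written, or -1 if `size` is too small. */
static int build_path_list(char *buf, size_t size, int level) {
    size_t used = 0;
    int n;
    for (int l = level; l >= 2; l--) {  /* a v3 machine can also use v2 builds */
        n = snprintf(buf + used, size - used, "/usr/lib/x86-64-v%d\n", l);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }
    n = snprintf(buf + used, size - used, "/lib\n/usr/local/lib\n/usr/lib\n");
    if (n < 0 || (size_t)n >= size - used)
        return -1;
    used += (size_t)n;
    return (int)used;
}
```

The loader stays unchanged. Checking the active configuration is just reading one text file.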

@dalias @robertdfrench @valpackett @ariadne this conveniently also works for /bin, it's just... "less obvious"... where to put the uarch subdirs. Not that it needs a huge standard or anything.

Really just a question of build and packaging.

P.S.: I'm really eager for this because I have good reasons to want POPCNT, which is only missing on the very oldest x86_64 CPUs :'(
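On the POPCNT point, C code can be written so that it becomes a single POPCNT instruction when built for a target that has it, such as `-mpopcnt` or `-march=x86-64-v2` with GCC or Clang. Otherwise it becomes a generic sequence or a library call. Below is a minimal sketch using the GCC/Clang builtin, plus the classic SWAR bit-twiddling fallback for other compilers. The function names are ours.

```c
#include <assert.h>
#include <stdint.h>

/* Portable SWAR popcount: sum bits in 2-, 4-, then 8-bit fields,
   then add all byte counts together with one multiply. */
static unsigned popcount64_portable(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}

static unsigned popcount64(uint64_t x) {
#if defined(__GNUC__) || defined(__clang__)
    /* One POPCNT instruction when the target enables it. */
    return (unsigned)__builtin_popcountll(x);
#else
    return popcount64_portable(x);
#endif
}
```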

@dalias @equinox @valpackett @ariadne yeah okay, I'll allow it. That approach would give a lot more administrative visibility anyways, since you could just run `mount` instead of having to query the linker for its decisions.

However... we do already have the expectation that you can query the linker for how it would resolve dependencies. So if it can't give you the "whole" picture, that might confuse folks.

@robertdfrench @valpackett @dalias @ariadne Your benchmarks don't seem to be testing the use case ifuncs purport to improve. You're basically just showing that there is overhead in routing function calls via the dynamic linker compared to doing them direct, which is true but not particularly interesting.

For the use in glibc, they're already paying the PLT indirection cost. So the ifunc use lets them avoid a second indirection to pick the implementation.

A more useful benchmark would be to put your increment_counter() implementations in a shared library called by your benchmark harness.

@jamesh @valpackett @dalias @ariadne yeah I have been feeling a little unsure about those tests for a while. Let me take another crack at them and see what comes out.

@jamesh @valpackett @dalias @ariadne What do you think about this? github.com/robertdfrench/ifunc

Every different approach has its own libincrement that contains two different runtime-selectable increment implementations, so the cost now reflects making all of those available via the PLT.

This does not seem to change the fact that ifunc does not outperform function pointers, nor does it meaningfully outperform the worst case strategy of just checking the CPU features every single time.

GitHub: Measure invocation cost for dynamic symbols. Fixes #20 by robertdfrench · Pull Request #21 · robertdfrench/ifuncd-up (by robertdfrench)

@robertdfrench @jamesh @valpackett @ariadne What's been obvious to me for a long time is that, even if there were a performance advantage to ifunc, it could only be when the entire function call is so short that call overhead can be a significant portion of overall time.

On the other hand, use of a uarch-optimized variant for something like memcpy is only going to make any sense when the operation is above a certain size/time threshold.
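Here is a back-of-the-envelope version of this argument. It assumes a fixed per-call dispatch cost $c$ and an operation whose cost $t(n)$ grows with its size $n$. The symbols are ours, not from any benchmark.

```latex
f(n) = \frac{c}{c + t(n)}, \qquad t(n) \approx \alpha + \beta n
\;\Longrightarrow\; f(n) \to 0 \quad \text{as } n \to \infty

\text{use the optimized variant only if}\quad
t_{\text{generic}}(n) - t_{\text{opt}}(n) > c
```

Dispatch overhead $f(n)$ only matters when $t(n)$ is comparable to $c$, i.e. for small $n$. A specialized variant only pays off once its saving exceeds $c$, which in practice means past some size threshold.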

@robertdfrench @jamesh @valpackett @ariadne This is why the proposed direction for further uarch-optimized string ops, etc. in musl is not to have full asm functions selected at runtime, but to allow archs to provide uarch-optimized "bulk middle" operations that only get called for large operations, don't have any alignment/edge-case logic, and that get called from the generic C function only past a threshold where they could help (and where call cost is tiny %).
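Here is a rough sketch of that shape in plain C. The generic function handles alignment and the ragged tail itself. It calls an arch-provided "bulk middle" helper only past a size threshold. `bulk_middle_copy`, `BULK_BLOCK` and `BULK_THRESHOLD` are hypothetical names and values. The helper body is a portable stand-in for what would be vectorized code; this is not musl's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BULK_BLOCK 16       /* hypothetical block size of the arch helper */
#define BULK_THRESHOLD 256  /* hypothetical cutoff; would be tuned per arch */

/* Hypothetical arch hook: d is BULK_BLOCK-aligned and n is a multiple of
   BULK_BLOCK, so it needs no edge-case logic. This portable stand-in just
   copies block by block; an arch could replace it with vector code. */
static void bulk_middle_copy(unsigned char *d, const unsigned char *s, size_t n) {
    assert((uintptr_t)d % BULK_BLOCK == 0 && n % BULK_BLOCK == 0);
    for (size_t i = 0; i < n; i += BULK_BLOCK)
        for (size_t j = 0; j < BULK_BLOCK; j++)
            d[i + j] = s[i + j];
}

/* Generic C copy: handles head and tail itself and calls the bulk hook only
   when the copy is large enough that the call cost is negligible. */
static void *sketch_memcpy(void *restrict dst, const void *restrict src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    if (n >= BULK_THRESHOLD) {
        while ((uintptr_t)d % BULK_BLOCK != 0) {  /* head: align destination */
            *d++ = *s++;
            n--;
        }
        size_t middle = n - n % BULK_BLOCK;
        bulk_middle_copy(d, s, middle);
        d += middle;
        s += middle;
        n -= middle;
    }
    while (n--)  /* tail, or the whole copy when it is small */
        *d++ = *s++;
    return dst;
}
```

Small copies never reach the helper, so its call cost only appears where it is a tiny fraction of the total.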

@robertdfrench I'd suggest structuring your benchmarks so the choice of implementation is made on the library side in all situations: currently it is done on the library side in some cases and in the benchmark program side in others.

Ideally the benchmark harness would be identical in each case, perhaps even letting you swap one library implementation for another.
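The harness below never chooses an implementation. It just calls the exported `increment_counter()` in a loop and times it with `clock_gettime(CLOCK_MONOTONIC)`. A trivial stand-in definition sits in the same file so the example compiles on its own. In the suggested setup it would live in `libincrement`, and each library variant would pick its implementation internally.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Stand-in for the library's exported function. In the real setup this
   lives in a shared library, and only the library differs between runs. */
static volatile uint64_t counter;
void increment_counter(void) { counter++; }

/* The harness: identical for every library variant; it only calls the
   exported symbol. Returns the mean wall-clock nanoseconds per call. */
static double ns_per_call(long iters) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iters; i++)
        increment_counter();
    clock_gettime(CLOCK_MONOTONIC, &end);
    double ns = (double)(end.tv_sec - start.tv_sec) * 1e9
              + (double)(end.tv_nsec - start.tv_nsec);
    return iters > 0 ? ns / (double)iters : 0.0;
}
```

In this single-file form the compiler may inline the stand-in, so meaningful numbers need the shared-library build. There, every call goes through the PLT exactly as in the scenario being measured.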