Hachyderm @hachyderm

**Ariadne Conill** @ariadne@treehouse.systems · Aug 12, 2024

hmm, this musl-perf thing effectively makes musl LGPL, but they don't supply the source for the glibc string functions they borrowed.

**Ariadne Conill** @ariadne@treehouse.systems · Aug 12, 2024

one of the patches they apply changes the stdio default buffer size from 1024 bytes to 8192 bytes. why? who knows, no rationale is provided.

i guess the thinking is to align on a page boundary, but why *two* pages?
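
(Editor's note: the "two pages" arithmetic can be sketched in C. This is only an illustration, assuming the common 4096-byte page size, which the post implies but does not state. `pages_for` and `use_8k_buffer` are hypothetical helper names, not part of musl or the patch set; `setvbuf` is the standard C call for choosing a stream's buffer.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Number of pages needed to hold `bytes`, rounding up.
   Hypothetical helper, used only to show the arithmetic. */
size_t pages_for(size_t bytes, size_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}

/* Ask for a fully buffered 8192-byte buffer on `stream`.
   Must be called before any other operation on the stream.
   Returns 0 on success, nonzero on failure. */
int use_8k_buffer(FILE *stream)
{
    static char buf[8192];  /* static: must outlive the stream's use of it */
    return setvbuf(stream, buf, _IOFBF, sizeof buf);
}
```

With 4096-byte pages, `pages_for(8192, 4096)` is 2 and `pages_for(1024, 4096)` is 1. Note that a buffer whose size is a multiple of the page size is not automatically page-aligned.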

**Ariadne Conill** @ariadne@treehouse.systems · Aug 12, 2024

outside of adding glibc string functions and the bufsize change, they add ifunc support to the linker.

ifuncs are awful because they make program execution inconsistent across different microarchitectures
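
(Editor's note: for readers who have not seen one, here is a minimal sketch of a GNU ifunc. It assumes GCC or Clang targeting ELF on x86 with a libc and dynamic linker that support ifunc, such as glibc; elsewhere it falls back to a plain function. The names `sum_generic`, `sum_avx2`, `resolve_sum`, and `sum_ifunc` are hypothetical and do not come from the musl-perf patches.)

```c
/* Two hypothetical implementations of one operation. The "avx2" one is a
   stand-in: a real variant would use wider instructions. */
static int sum_generic(int a, int b) { return a + b; }
static int sum_avx2(int a, int b)    { return a + b; }

#if defined(__ELF__) && defined(__GNUC__) && \
    (defined(__x86_64__) || defined(__i386__))

/* Resolver: run by the dynamic linker when it processes the relocation
   for sum_ifunc (at load time, or on first call under lazy binding).
   Whatever it returns becomes the target of sum_ifunc on this machine. */
static int (*resolve_sum(void))(int, int)
{
    __builtin_cpu_init();  /* needed before __builtin_cpu_supports in a resolver */
    return __builtin_cpu_supports("avx2") ? sum_avx2 : sum_generic;
}

int sum_ifunc(int a, int b) __attribute__((ifunc("resolve_sum")));

#else

/* Fallback for targets without ifunc support: always the generic path. */
int sum_ifunc(int a, int b)
{
    (void)sum_avx2;
    return sum_generic(a, b);
}

#endif
```

A caller simply writes `sum_ifunc(2, 3)`. Which body runs depends on the CPU the process started on, which is the inconsistency the post objects to.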

**Cassandrich** @dalias · Aug 13, 2024

@ariadne Ifunc is just an utterly dumb way to do runtime microarch specific code selection.

**Val Packett** @valpackett@treehouse.systems · Aug 13, 2024

@dalias @ariadne what makes it "dumb"? AFAIK it's just the least-overhead way, applying the selection at the ELF relocations level seems like the correct place to do it to me

**Robert D. French** @robertdfrench@mastodon.social · Aug 30, 2024

@valpackett @dalias @ariadne Oooh, let me try to answer that! I think the "dumb" comes from the fact that there are other, more portable ways to implement "runtime selection of features" (i.e. function pointers stored in a protected page) that don't allow libraries to provide *arbitrary plugins* for the dynamic linker.

I wrote a whole mini-thesis on this if you are interested in getting INSANELY deep down in the weeds: https://github.com/robertdfrench/ifuncd-up

GitHub: robertdfrench/ifuncd-up, "GNU IFUNC is the real culprit behind CVE-2024-3094"
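
(Editor's note: the "function pointers stored in a protected page" alternative mentioned above might look roughly like the following. This is a sketch under assumptions, not code from the linked repository: it assumes a POSIX system with `mmap` and `mprotect`, and `dispatch_init`, `sum_dispatch`, and `cpu_has_fast_path` are hypothetical names. The feature probe is a stub.)

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*sum_fn)(int, int);

static int sum_generic(int a, int b) { return a + b; }
static int sum_tuned(int a, int b)   { return a + b; }  /* stand-in variant */

/* Hypothetical probe; a real one would query CPUID, getauxval(), etc. */
static int cpu_has_fast_path(void) { return 0; }

/* Points into a private page that is made read-only after setup. */
static sum_fn *dispatch_slot;

/* Pick an implementation once, store the pointer, then write-protect it.
   Returns 0 on success, -1 on failure. */
int dispatch_init(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    dispatch_slot = p;
    *dispatch_slot = cpu_has_fast_path() ? sum_tuned : sum_generic;

    /* After this, overwriting the pointer faults instead of succeeding. */
    return mprotect(p, (size_t)page, PROT_READ) == 0 ? 0 : -1;
}

/* Every call pays one extra indirect jump through the protected slot. */
int sum_dispatch(int a, int b)
{
    assert(dispatch_slot != NULL);  /* dispatch_init() must have run */
    return (*dispatch_slot)(a, b);
}
```

The program calls `dispatch_init()` once at startup and `sum_dispatch()` afterwards. The selection logic is ordinary application code rather than something the dynamic linker executes on the library's behalf.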

**James Henstridge** @jamesh@aus.social · Aug 31, 2024

@robertdfrench @valpackett @dalias @ariadne Your benchmarks don't seem to be testing the use case ifuncs purport to improve. You're basically just showing that there is overhead in routing function calls via the dynamic linker compared to doing them direct, which is true but not particularly interesting.

For the use in glibc, they're already paying the PLT indirection cost. So the ifunc use lets them avoid a second indirection to pick the implementation.

A more useful benchmark would be to put your increment_counter() implementations in a shared library called by your benchmark harness.

**Robert D. French** @robertdfrench@mastodon.social · Sep 3, 2024

@jamesh @valpackett @dalias @ariadne What do you think about this? https://github.com/robertdfrench/ifuncd-up/pull/21

Every different approach has its own libincrement that contains two different runtime-selectable increment implementations, so the cost now reflects making all of those available via the PLT.

This does not seem to change the fact that ifunc does not outperform function pointers, nor does it meaningfully outperform the worst case strategy of just checking the CPU features every single time.

GitHub: "Measure invocation cost for dynamic symbols. Fixes #20", Pull Request #21 by robertdfrench, robertdfrench/ifuncd-up

**Cassandrich** @dalias@hachyderm.io

@robertdfrench @jamesh @valpackett @ariadne What's been obvious to me for a long time is that, even if there were a performance advantage to ifunc, it could only be when the entire function call is so short that call overhead can be a significant portion of overall time.

On the other hand, use of a uarch-optimized variant for something like memcpy is only going to make any sense when the operation is above a certain size/time threshold.

Sep 3, 2024, 03:12 PM

**Cassandrich** @dalias · Sep 3, 2024

@robertdfrench @jamesh @valpackett @ariadne This is why the proposed direction for further uarch-optimized string ops, etc. in #musl is not to have full asm functions selected at runtime, but to allow archs to provide uarch-optimized "bulk middle" operations that only get called for large operations, don't have any alignment/edge-case logic, and that get called from the generic C function only past a threshold where they could help (and where call cost is tiny %).
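
(Editor's note: one way to picture the "bulk middle" idea is the sketch below. It is not musl code or the actual proposal. The 64-byte block size, the 1024-byte threshold, and the names `bulk_copy_blocks` and `copy_bulk_middle` are all assumptions made for illustration, and the bulk routine is written in plain C where an arch would supply tuned code.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BULK_BLOCK     64u    /* assumed block size for the bulk routine */
#define BULK_THRESHOLD 1024u  /* assumed cutoff; a real one would be measured */

/* Stand-in for an arch-provided routine. It copies whole blocks only and
   expects an aligned destination: no head, tail, or edge-case logic. */
static void bulk_copy_blocks(unsigned char *d, const unsigned char *s,
                             size_t nblocks)
{
    assert((uintptr_t)d % BULK_BLOCK == 0);
    for (size_t i = 0; i < nblocks * BULK_BLOCK; i++)
        d[i] = s[i];
}

/* Generic C copy that hands only the large, aligned middle to the bulk
   routine. Regions must not overlap, as with memcpy. */
void *copy_bulk_middle(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n >= BULK_THRESHOLD) {
        /* Head: byte copy until the destination is block-aligned. */
        while ((uintptr_t)d % BULK_BLOCK) {
            *d++ = *s++;
            n--;
        }
        /* Middle: whole blocks go to the bulk routine. */
        size_t nblocks = n / BULK_BLOCK;
        bulk_copy_blocks(d, s, nblocks);
        d += nblocks * BULK_BLOCK;
        s += nblocks * BULK_BLOCK;
        n -= nblocks * BULK_BLOCK;
    }

    /* Tail, or the entire copy when n was below the threshold. */
    while (n--)
        *d++ = *s++;
    return dst;
}
```

Small copies never reach the bulk routine, so they pay no dispatch cost at all. Large copies pay one extra call, which is a small fraction of the total work.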

**Robert D. French** @robertdfrench@mastodon.social · Sep 4, 2024

@dalias @jamesh @valpackett @ariadne spot on.
