Had a bunch of thoughts about the recent safety stuff, way more than fit in a social media post... Blog post story time! (It's a bit of a ramble, sorry about that...)
https://chandlerc.blog/posts/2024/11/story-time-bounds-checking/
@chandlerc I'm not a fan of using %-diffs to make an argument about the effectiveness of performance improvements. More often than not, these numbers just lead people astray.
For all we know, the 0.3% penalty might just be so small because it's being overshadowed by some other severe inefficiency in the codebase.
There's an interesting effect where inefficient code will suffer *less* from adding *more* inefficient code, because it's already bottlenecked.
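To make that concrete, here's a toy back-of-the-envelope sketch of the dilution effect (the two-bucket model and every number in it are assumptions picked purely for illustration, not anything from the measurements being discussed):

```python
# Toy model: split runtime into a compute-bound bucket (30%) and a
# memory-stall bucket (70%, the existing bottleneck), and assume the added
# bounds checks only slow down the compute-bound bucket.
# Every number here is hypothetical, chosen purely for illustration.

compute_fraction = 0.3   # fraction of runtime bound by instruction throughput (assumed)
check_overhead = 0.05    # checks add 5% more work to that compute-bound bucket (assumed)

# The end-to-end slowdown scales with the compute-bound fraction, so an
# already-bottlenecked workload appears to pay far less than the raw overhead.
end_to_end = compute_fraction * check_overhead
print(f"raw overhead: {check_overhead:.1%}, observed slowdown: {end_to_end:.1%}")
# raw overhead: 5.0%, observed slowdown: 1.5%
```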
@dist1ll So, if you look at the referenced blog post[1], we actually clarified what this represented. This is 0.3% across Google's entire main production fleet. Our fleet's performance is dominated by its hottest services, which are a relatively small percentage of the total: your classic long-tailed distribution. Those services are **incredibly** optimized systems. We have large teams doing nothing but removing every tiny inefficiency we can find.
[1]: https://security.googleblog.com/2024/11/retrofitting-spatial-safety-to-hundreds.html
@dist1ll We've also published pretty in-depth articles about this environment if you want to get a better sense of it:
https://dl.acm.org/doi/abs/10.1145/2749469.2750392
https://dl.acm.org/doi/abs/10.1145/3620666.3651350
@chandlerc (Thanks for the articles and response)
I'm curious, how much of that optimization is done on the infra side compared to the application side? I was under the impression that orgs like this prioritize infra optimizations: PGO, data structures, stdlib stuff like memcpy, improving compilers, etc.
Perhaps I'm way off base. I guess what I'm curious about is how much effort is spent on application-specific optimizations, things that perhaps *don't* carry over to other parts of the codebase.
@dist1ll The larger applications have their own teams driving application-side optimizations. That covers a *lot* of them.
And then we also have a large team that drives infrastructure-level optimizations, just like the ones you mention.
It's a joint effort, and both teams talk extensively. So these systems are *very* well optimized. There are huge incentives to find and fix any significant inefficiencies.
@chandlerc Makes sense. In that case, congrats on getting such low overheads! Happy to see much of the long-standing FUD around efficient spatial safety challenged.