I've created a #blog post comparing the performance of standard library 'sort' calls in different languages. Chapel's sort uses composable parallelism to be 10x faster than other popular languages for a test sort on my PC.
@mppf sorting apples and oranges.
@manpacket @mppf Here is a PR for a better comparison:
https://github.com/chapel-lang/chapel/pull/24302
Why would you use the rand crate for random number generation but not the rayon crate for parallel sorting? Read my comment in the PR about Rust's standard library design and who maintains crates like rayon.
You probably also know that THE sorting algorithm doesn't exist. You need to compare the same algorithm…
@mo8it @manpacket @mppf A while ago I attempted to contribute rayon to https://github.com/Voultapher/sort-research-rs, which benchmarks quite a few sorting algorithms. Understandably, the maintainer was not interested in parallel sorts, but it's still quite a nice resource.
@mo8it @mppf It is still comparing a sort with safety checks on against one with them off, and a radix sort that only works for numbers against what I'm assuming is quicksort. For a fair comparison we need to turn those safety checks back on and start sorting strings. Or u128 numbers.
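To make the radix-versus-comparison point concrete, here is a minimal sketch of a least-significant-digit radix sort over 32-bit unsigned integers. It is an illustration only, not the implementation used by Chapel or any other language in the benchmark; the function name `radix_sort_u32` is hypothetical. It never compares two elements; it only buckets them by the bytes of the key, which is why the approach does not carry over directly to arbitrary types with a custom ordering.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: LSD radix sort on fixed-width integer keys,
// processing one byte (8 bits) per pass, four passes in total.
std::vector<std::uint32_t> radix_sort_u32(std::vector<std::uint32_t> keys) {
    std::vector<std::uint32_t> buffer(keys.size());
    for (int shift = 0; shift < 32; shift += 8) {
        // offsets[d + 1] first counts how many keys have digit d.
        std::array<std::size_t, 257> offsets{};
        for (std::uint32_t k : keys) {
            ++offsets[((k >> shift) & 0xFFu) + 1];
        }
        // Prefix sum: offsets[d] becomes the first output index for digit d.
        for (std::size_t i = 1; i < offsets.size(); ++i) {
            offsets[i] += offsets[i - 1];
        }
        // Stable scatter into the buffer, then swap for the next pass.
        for (std::uint32_t k : keys) {
            buffer[offsets[(k >> shift) & 0xFFu]++] = k;
        }
        keys.swap(buffer);
    }
    return keys;
}
```

Calling `radix_sort_u32({170, 45, 75})` returns the values in ascending order. A comparison sort such as `std::sort` does more work per element, but only needs a "less than" relation, so it also handles strings and user-defined types.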
Next there's the question of why sort in most languages is not parallel: usually you don't have gigabytes of numbers to sort, you might have 20 of them. So benchmarking multiple input sizes is a logical choice, along with two versions of Rust: parallel and single-threaded.
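A rough sketch of what timing across several input sizes could look like, using only the C++ standard library. The names `SortTiming`, `time_std_sort`, and `time_across_sizes` are made up for illustration, and this covers only the single-threaded case; a parallel variant (for example via `std::execution::par`) depends on toolchain support and is left out here.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical result record for one measurement.
struct SortTiming {
    std::size_t size;   // number of elements sorted
    double seconds;     // wall-clock time spent inside std::sort
    bool sorted;        // sanity check on the output
};

// Fill a vector with pseudo-random 64-bit values and time one std::sort call.
SortTiming time_std_sort(std::size_t size, std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::vector<std::uint64_t> data(size);
    for (auto& x : data) {
        x = rng();
    }
    const auto start = std::chrono::steady_clock::now();
    std::sort(data.begin(), data.end());
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return {size, elapsed.count(), std::is_sorted(data.begin(), data.end())};
}

// Run the measurement once per requested size, from tiny to large inputs.
std::vector<SortTiming> time_across_sizes(const std::vector<std::size_t>& sizes,
                                          std::uint64_t seed) {
    std::vector<SortTiming> results;
    results.reserve(sizes.size());
    for (std::size_t n : sizes) {
        results.push_back(time_std_sort(n, seed));
    }
    return results;
}
```

Something like `time_across_sizes({20, 1000, 1000000}, 42)` would give one timing per size. A single run per size is noisy, especially for 20 elements, so a real benchmark would repeat each measurement many times.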
@mo8it @mppf Then there's C and C++. Sorting is one of the things that C++ can do better than C thanks to templates. C's qsort must call the comparison function through a pointer, which adds overhead; C++ can inline it. Looking at the benchmark sources now, I see two versions that use qsort, one compiled by a C compiler and one by a C++ compiler...
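The difference between the two styles can be sketched as follows. This is an illustrative example, not the code from the benchmark; the function names `compare_u64`, `sort_with_qsort`, and `sort_with_std_sort` are hypothetical. `std::qsort` receives an opaque function pointer, so each comparison is typically an indirect call, while `std::sort` is a template instantiated for the specific comparator, which the compiler is free to inline.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// C-style comparator: qsort only sees a function pointer and void* arguments.
static int compare_u64(const void* a, const void* b) {
    const std::uint64_t x = *static_cast<const std::uint64_t*>(a);
    const std::uint64_t y = *static_cast<const std::uint64_t*>(b);
    return (x > y) - (x < y);  // -1, 0, or 1 without overflow
}

// The qsort route, which is what a C program would use.
std::vector<std::uint64_t> sort_with_qsort(std::vector<std::uint64_t> v) {
    std::qsort(v.data(), v.size(), sizeof(std::uint64_t), compare_u64);
    return v;
}

// The templated route: the lambda's type is part of the instantiation,
// so the comparison can be inlined into the sorting loop.
std::vector<std::uint64_t> sort_with_std_sort(std::vector<std::uint64_t> v) {
    std::sort(v.begin(), v.end(),
              [](std::uint64_t a, std::uint64_t b) { return a < b; });
    return v;
}
```

Both functions produce the same ordering. Compiling the qsort version with a C++ compiler does not change how the comparator is called, so a C++ entry that uses qsort is not measuring what `std::sort` can do.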
@mppf this is a rather disingenuous comparison, since your headline figure is only true for a system with 16 cores. Surely a more honest headline is "Chapel's sort is faster than other languages by a factor depending on how many cores you're allowed to use"? Because on an RPi4 it's not going to be 10x faster!