Stability of std sort was undertested before this change. Add a fuzz
test for more confidence.
Specifically, we used to have a single example test that used an array
of eight elements. That ends up exercising only a tiny fraction of
sorting logic, as it hits a hard-coded sorting network due to small
size.
The old definitions had some problems:
- In `lowerBound`, the `lhs` (left hand side) argument was passed on the
right hand side.
- In `upperBound`, the `greaterThan` function needed to return
`greaterThanOrEqual` for the function work, so either the name or the
implementation is incorrect.
To fix both problems, define the functions in terms of a `lessThan` function that returns `lhs < rhs`.
The is more consistent with the rest of `sort.zig` and it's also how C++ implements lower/upperBound (1)(2).
(1) https://en.cppreference.com/w/cpp/algorithm/lower_bound
(2) https://en.cppreference.com/w/cpp/algorithm/upper_bound
- Rewrite doc comments.
- Add a couple of more test cases.
- Add docstring for std.sort.binarySearch
Authored by https://github.com/CraigglesO
Original Discussion: #9890
I already had to create these functions and test cases for my own
project so I decided to contribute to the main code base in hopes it
would simplify my own.
I realize this is still under discussion but this was a trivial amount
of work so I thought I could help nudge the discussion towards a
decision.
Why add these to the standard library
To better illustrate and solidify their value, the standard library's
"sort" module already contains several binary search queries on arrays
such as binarySearch and internal functions binaryFirst & binaryLast. A
final example of its use: the Zig code itself created and used a
bounding search in the linker.
There still lacks the ability to allow the programmer themselves to
search the array for a rough position and find an index to read &/
update.
Adding these functions would also help to complement dynamic structures
like ArrayList with it's insert function.
Example Case
I'm building a library in Zig for GIS geometry. To store points, lines,
and polygons each 3D point is first translated into what's called an
S2CellId. This is a fancy way of saying I reduce the Earth's spherical
data into a 1D Hilbert Curve with cm precision. This gives me 2
convenient truths:
Hilbert Curves have locality of reference.
All points can be stored inside a 1D array
Since lowerBound and upperBound to find data inside a radius for
instance. If I'm interested in a specific cell at a specific "level" and
want to iterate all duplicates, equalRange is a best fit.
This reverts commit 0c99ba1eab, reversing
changes made to 5f92b070bf.
This caused a CI failure when it landed in master branch due to a
128-bit `@byteSwap` in std.mem.
(Firstly, I changed `n` to `b`, as that is less confusing. It's not a length, it's a right boundary.)
The invariant maintained is `cur < b`. In the worst case `2*cur + 1` results in a maximum of `2b`. Since `2b` is not guaranteed to be lower than `maxInt`, we have to add one overflow check to `siftDown` to make sure we avoid undefined behavior.
LLVM also seems to have a nicer time compiling this version of the function. It is about 2x faster in my tests (I think LLVM was stumped by the `child += @intFromBool` line), and adding/removing the overflow check has a negligible performance difference on my machine. Of course, we could check `2b <= maxInt` in the parent function, and dispatch to a version of the function without the overflow check in the common case, but that probably is not worth the code size just to eliminate a single instruction.
Forcing the key to be of the same type as the sorted items used during
the search is a valid use case.
There, however, exists some cases where the key and the items are of
heterogeneous types, like searching for a code point in ordered ranges
of code points:
```zig
const CodePoint = u21;
const CodePointRange = [2]CodePoint;
const valid_ranges = &[_]CodePointRange{
// an ordered array of ranges
};
fn orderCodePointAndRange(
context: void,
code_point: CodePoint,
range: CodePointRange
) std.math.Order {
_ = context;
if (code_point < range[0]) {
return .lt;
}
if (code_point > range[1]) {
return .gt;
}
return .eq;
}
fn isValidCodePoint(code_point: CodePoint) bool {
return std.sort.binarySearch(
CodePointRange,
code_point,
valid_ranges,
void,
orderCodePointAndRange
) != null;
}
```
It is so expected that `std.sort.binarySearch` should therefore support
both homogeneous and heterogeneous keys.
This also adds `std.sort.sortContext` and
`std.sort.insertionSortContext` which are more advanced methods that
allow overriding the `swap` method. The former calls the latter for now
because reworking the main sort implementation is a big task that can be
done later without any changes to the API.
These changes have been made to resolve issue #10037. The `Random`
interface was implemented in such a way that causes significant slowdown
when calling the `fill` function of the rng used.
The `Random` interface is no longer stored in a field of the rng, and is
instead returned by the child function `random()` of the rng. This
avoids the performance issues caused by the interface.
We already have a LICENSE file that covers the Zig Standard Library. We
no longer need to remind everyone that the license is MIT in every single
file.
Previously this was introduced to clarify the situation for a fork of
Zig that made Zig's LICENSE file harder to find, and replaced it with
their own license that required annual payments to their company.
However that fork now appears to be dead. So there is no need to
reinforce the copyright notice in every single file.
Conflicts:
* doc/langref.html.in
* lib/std/enums.zig
* lib/std/fmt.zig
* lib/std/hash/auto_hash.zig
* lib/std/math.zig
* lib/std/mem.zig
* lib/std/meta.zig
* test/behavior/alignof.zig
* test/behavior/bitcast.zig
* test/behavior/bugs/1421.zig
* test/behavior/cast.zig
* test/behavior/ptrcast.zig
* test/behavior/type_info.zig
* test/behavior/vector.zig
Master branch added `try` to a bunch of testing function calls, and some
lines also had changed how to refer to the native architecture and other
`@import("builtin")` stuff.