When filling the last (len % 4) bytes of a buffer, the random number n was only being shifted right by 4 bits for each byte instead of 8. A random u16, for example, would always have its middle two nybbles be equal when generated this way. For comparison, Isaac64.zig, Sfc64.zig, and Xoroshiro128.zig all correctly shift right by 8 bits for each of the last bytes in their nearly identical fill functions.
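The effect is easy to reproduce outside Zig. The following is a minimal Python sketch, not the actual fill code; `fill_tail` and its parameters are hypothetical names used only for illustration. It emits the low byte of `n` and then shifts by either 4 bits (the bug) or 8 bits (the fix).

```python
def fill_tail(n: int, count: int, shift: int) -> bytes:
    """Emit `count` bytes from n, shifting n right by `shift` bits after each byte."""
    out = bytearray()
    for _ in range(count):
        out.append(n & 0xFF)  # keep only the low 8 bits
        n >>= shift
    return bytes(out)


def middle_nybbles_equal(two_bytes: bytes) -> bool:
    """Interpret two bytes as a little-endian u16 and compare its middle nybbles."""
    value = int.from_bytes(two_bytes, "little")
    return (value >> 4) & 0xF == (value >> 8) & 0xF


buggy = fill_tail(0xABCD, 2, shift=4)  # second byte reuses bits 4..7 of n
fixed = fill_tail(0xABCD, 2, shift=8)  # second byte uses bits 8..15 of n
```

With a 4-bit shift, the high nybble of the first byte and the low nybble of the second byte are both bits 4..7 of `n`, which is why the middle nybbles always match.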
This makes a few changes to the base64 codecs.
* The padding character is optional. The common "URL-safe" variant, in
particular, is generally not used with padding. This is also the case for
password hashes, so having this will avoid code duplication with bcrypt,
scrypt and other functions.
* The URL-safe variant is added. Instead of having individual constants
for each parameter of each variant, we are now grouping these in a
struct. So, `standard_pad_char` just becomes `standard.pad_char`.
* Types are not `snake_case`'d any more. So, `standard_encoder` becomes
`standard.Encoder`, as it is a type.
* Creating a decoder with ignored characters required the alphabet and
padding. Now, `standard.decoderWithIgnore(<ignored chars>)` returns a
decoder with the standard parameters and the set of ignored chars.
* Whatever applies to `standard.*` obviously also works with `url_safe.*`.
* The `calcSize()` interface was inconsistent, taking a length in the
encoder, and a slice in the decoder. Rename the variant that takes a
slice to `calcSizeForSlice()`.
* In the decoder with ignored characters, add `calcSizeUpperBound()`,
which is more useful than the one that takes a slice in order to size
a fixed buffer before we have the data.
* Return `error.InvalidCharacter` when the input actually contains
characters that are neither padding nor part of the alphabet. If we
hit a padding issue (which includes extra bits at the end),
consistently return `error.InvalidPadding`.
* Don't keep the `char_in_alphabet` array permanently in a decoder;
it is only required for sanity checks during initialization.
* Tests are unchanged, but now cover both the standard (padded) and
the url-safe (non-padded) variants.
* Add an error set, rename `OutputTooSmallError` to `NoSpaceLeft`
to match the `hex2bin` equivalent.
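To illustrate what an unpadded URL-safe codec produces, here is a small Python sketch built on the standard `base64` module. It is not the Zig API; `encode_url_safe_no_pad`, `decode_url_safe_no_pad`, and `calc_size` are hypothetical helpers that only mirror the ideas listed above.

```python
import base64


def encode_url_safe_no_pad(data: bytes) -> str:
    """URL-safe alphabet ('-' and '_'), with the trailing '=' padding removed."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_url_safe_no_pad(text: str) -> bytes:
    """Re-add the padding the Python decoder expects, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def calc_size(source_len: int, padded: bool) -> int:
    """Encoded length for `source_len` input bytes."""
    if padded:
        return (source_len + 2) // 3 * 4  # always a multiple of 4
    return (source_len * 4 + 2) // 3      # ceil(4 * n / 3), no padding


token = encode_url_safe_no_pad(b"\xfb\xff")
roundtrip = decode_url_safe_no_pad(token)
```

Taking a length rather than a slice, as `calc_size` does here, is what makes it possible to size a buffer before the data exists.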
Introduce "inline" variants of ZIR tags:
* block => block_inline
* repeat => repeat_inline
* break => break_inline
* condbr => condbr_inline
The inline variants perform control flow at compile-time, and they
utilize the return value of `Sema.analyzeBody`.
`analyzeBody` now returns an Index, not a Ref, which is the ZIR index of
a break instruction. This effectively communicates both the intended
break target block as well as the operand, allowing parent blocks to
find out whether they, in turn, should return the break instruction up the
call stack, or accept the operand as the block's result and continue
analyzing instructions in the block.
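The propagation rule can be sketched with a toy model. This is a Python illustration under simplified assumptions, not the `Sema` code: the `Inst` record, the `"const"` tag, and the `results` map are hypothetical, and only `block_inline` and `break_inline` are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Inst:
    tag: str                                   # "block_inline", "break_inline" or "const"
    block: int = -1                            # break_inline: index of the target block
    operand: int = 0                           # break_inline / const: the value carried
    body: list = field(default_factory=list)   # block_inline: child instruction indexes


def analyze_body(insts, body, results):
    """Return the index of the break instruction that ends this body."""
    for idx in body:
        inst = insts[idx]
        if inst.tag == "break_inline":
            return idx
        if inst.tag == "block_inline":
            break_idx = analyze_body(insts, inst.body, results)
            brk = insts[break_idx]
            if brk.block != idx:
                return break_idx           # not our break: hand it up the call stack
            results[idx] = brk.operand     # our break: operand becomes the block result
            continue
        results[idx] = inst.operand
    raise AssertionError("a body must end with a break")


# Index 0 is the implied root block; instruction 2 breaks out of block 1 only.
insts = [
    Inst("block_inline", body=[1, 3]),
    Inst("block_inline", body=[2]),
    Inst("break_inline", block=1, operand=7),
    Inst("break_inline", block=0, operand=9),
]
results = {}
final_break = analyze_body(insts, insts[0].body, results)
```

Returning the index of the break, rather than just a value, is what lets each parent compare the break's target against its own index.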
Additionally:
* removed the deprecated ZIR tag `block_comptime`.
* removed `break_void_node` so that all break instructions use the same Data.
* zir.Code: remove the `root_start` and `root_len` fields. There is now
implied to be a block at index 0 for the root body. This is so that
`break_inline` has something to point at and we no longer need the
special instruction `break_flat`.
* implement source location byteOffset() for .node_offset_if_cond
.node_offset_for_cond is probably redundant and can be deleted.
We don't have `comptime var` supported yet, so this commit adds a test
that at least makes sure the condition is required to be comptime known
for `inline while`.
This provides us greatly increased type safety and prevents the common
mistake of using a zir.Inst.Ref where a zir.Inst.Index was expected or
vice-versa. It also increases the ergonomics of using the typed values
which can be directly referenced with a Ref over the previous zir.Const
approach.
The main pain point is casting between a []Ref and []u32, which could be
alleviated in the future with a new std.mem function.
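One way to picture the Ref/Index split is a number space where the low values name well-known typed constants and everything above maps onto instruction indexes. The Python sketch below assumes that scheme; the `Const` members and `REF_START_INDEX` are hypothetical and do not reflect the real enum contents. `NewType` only helps a static type checker, which is the closest Python analogue to the distinct Zig types.

```python
from enum import IntEnum
from typing import NewType, Optional

Index = NewType("Index", int)  # position in the instruction array
Ref = NewType("Ref", int)      # either a well-known constant or an instruction


class Const(IntEnum):
    """Hypothetical well-known values addressable directly through a Ref."""
    none = 0
    bool_true = 1
    bool_false = 2
    void_value = 3


REF_START_INDEX = len(Const)  # first Ref value that denotes an instruction


def index_to_ref(index: Index) -> Ref:
    return Ref(REF_START_INDEX + index)


def ref_to_index(ref: Ref) -> Optional[Index]:
    """Return the instruction index, or None when the Ref names a constant."""
    if ref >= REF_START_INDEX:
        return Index(ref - REF_START_INDEX)
    return None
```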
This is useful for build.zig files to check in some cases, for example
to adhere to the convention of installing config to /etc instead of
/usr/etc on linux when using the /usr prefix. Perhaps std.build will
handle such common cases eventually, but that is not yet the case.
We are now passing this test:
```zig
export fn _start() noreturn {}
```
```
test.zig:1:30: error: expected noreturn, found void
```
I ran into an issue where we get an integer overflow trying to compute
node index offsets from the containing Decl. The problem is that the
parser adds the Decl node after adding the child nodes. For some things,
it is easy to reserve the node index and then set it later. For this
case, however, it is not a trivial code change, because the tokens that
follow the decl determine whether we want to add a new node or not.
Possible strategies here:
1. Rework the parser code to make sure that Decl nodes are before
children nodes in the AST node array.
2. Use signed integers for Decl node offsets.
3. Just flip the order of subtraction and addition. Expect Decl Node
index to be greater than children Node indexes.
I opted for (3) because it seems like the simplest thing to do. We'll
want to unify the logic for computing the offsets though because if the
logic gets repeated, it will probably get repeated wrong.
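A small Python sketch of strategy (3), emulating unsigned arithmetic with an explicit check. The helper names are hypothetical; the point is only that keeping the subtraction in one pair of functions avoids getting the operand order wrong in some call sites.

```python
def checked_sub_unsigned(a: int, b: int) -> int:
    """Subtract like an unsigned integer would: a negative result is an overflow."""
    if a < b:
        raise OverflowError(f"{a} - {b} underflows an unsigned integer")
    return a - b


def node_to_offset(decl_node: int, child_node: int) -> int:
    """Strategy 3: the Decl index is expected to be >= any child index."""
    return checked_sub_unsigned(decl_node, child_node)


def offset_to_node(decl_node: int, offset: int) -> int:
    """Inverse of node_to_offset, kept next to it so the two cannot drift apart."""
    return checked_sub_unsigned(decl_node, offset)


# The parser appended the children first, so the Decl has the larger index.
decl_node, child_node = 10, 7
offset = node_to_offset(decl_node, child_node)
```

Computing `child_node - decl_node` instead is the order that overflows when the Decl is added after its children.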
Next up is reworking the seam between the LazySrcLoc emitted by Sema
and the byte offsets currently expected by codegen.
And then the big one: updating astgen.zig to use the new memory layout.
See https://eprint.iacr.org/2019/1492.pdf for justification.
8 rounds ChaCha20 provides a 2.5x speedup, and is still believed
to be safe.
Round-reduced versions are actually deployed (ex: Android filesystem
encryption), and thanks to the magic of comptime, it doesn't take much
to support them.
This also makes the ChaCha20 code more consistent with the Salsa20 code,
removing internal functions that were not part of the public API any more.
No breaking changes; the public API remains backwards compatible.
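To show how little changes between the variants, here is a Python sketch of the ChaCha block function with the round count as a parameter. It is a runtime analogue of what comptime makes possible in Zig, not the std.crypto code; `chacha_block` and `chacha8_block` are illustrative names, and the layout assumed is the IETF one (32-bit counter, 96-bit nonce).

```python
import struct
from functools import partial

MASK = 0xFFFFFFFF


def _rotl32(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & MASK


def _quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    s[a] = (s[a] + s[b]) & MASK; s[d] = _rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK; s[b] = _rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK; s[d] = _rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK; s[b] = _rotl32(s[b] ^ s[c], 7)


def chacha_block(key: bytes, counter: int, nonce: bytes, rounds: int = 20) -> bytes:
    """Produce one 64-byte keystream block using `rounds` rounds (must be even)."""
    assert len(key) == 32 and len(nonce) == 12 and rounds % 2 == 0
    init = list(struct.unpack("<4I", b"expand 32-byte k"))
    init += struct.unpack("<8I", key)
    init.append(counter & MASK)
    init += struct.unpack("<3I", nonce)
    s = init[:]
    for _ in range(rounds // 2):
        # One column round followed by one diagonal round.
        _quarter_round(s, 0, 4, 8, 12)
        _quarter_round(s, 1, 5, 9, 13)
        _quarter_round(s, 2, 6, 10, 14)
        _quarter_round(s, 3, 7, 11, 15)
        _quarter_round(s, 0, 5, 10, 15)
        _quarter_round(s, 1, 6, 11, 12)
        _quarter_round(s, 2, 7, 8, 13)
        _quarter_round(s, 3, 4, 9, 14)
    return struct.pack("<16I", *((x + y) & MASK for x, y in zip(s, init)))


# The reduced-round variant is the same function with a different constant.
chacha8_block = partial(chacha_block, rounds=8)
```

Only the loop bound differs between the variants, which is why supporting them costs so little code.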
Bring this in line with how variable declarations are handled.
Open a new indentation level for the initialization expression to handle
nested expressions like blocks.
Closes #7618
The memory layout for ZIR instructions is completely reworked. See
zir.zig for those changes. Some new types:
* `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating
each instruction independently, there is now a Tag and 8 bytes of
data available for all ZIR instructions. Small instructions fit
within these 8 bytes; larger ones use 4 bytes for an index into
`extra`. There is also `string_bytes` so that we can have 4 byte
references to strings. `zir.Inst.Tag` describes how to interpret
those 8 bytes of data.
- This is shared by all `Block` scopes.
* `Module.WipZirCode`: represents an in-progress `zir.Code`. In this
structure, the arrays are mutable, and get resized as we add/delete
things. There is extra state to keep track of things. This struct is
stored on the stack. Once it is finished, it produces an immutable
`zir.Code`, which will remain on the heap for the duration of a
function's existence.
- This is shared by all `GenZir` scopes.
* `Sema`: represents in-progress semantic analysis of a `zir.Code`.
This data is stored on the stack and is shared among all `Block`
scopes. It is now the main "self" argument to everything in the file
that was previously named `zir_sema.zig`.
Additionally, I moved some logic that was in `Module` into here.
`Module.Fn` now stores its parameter names inside the `zir.Code`,
instead of inside ZIR instructions. When the TZIR memory layout
reworking time comes, codegen will be able to reference this data
directly instead of duplicating it.
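The `zir.Code` layout described above can be pictured with a Python sketch. It is a loose model under stated assumptions, not the contents of zir.zig: the tag names, the `EXTRA_FIELD_COUNT` table, and the null-terminated string encoding are all hypothetical choices made for the illustration.

```python
import struct


class Code:
    """Toy model: one tag plus exactly 8 data bytes per instruction."""

    # Hypothetical: how many `extra` fields each large instruction owns.
    EXTRA_FIELD_COUNT = {"call": 3}

    def __init__(self):
        self.tags = []                   # how to interpret each 8-byte slot
        self.data = bytearray()          # 8 bytes per instruction, back to back
        self.extra = []                  # u32 overflow storage for large payloads
        self.string_bytes = bytearray()  # all strings, referenced by 4-byte offset

    def _push(self, tag: str, payload: bytes) -> int:
        assert len(payload) == 8
        self.tags.append(tag)
        self.data += payload
        return len(self.tags) - 1

    def add_bin_op(self, tag: str, lhs: int, rhs: int) -> int:
        """Small instruction: both u32 operands fit in the 8 bytes."""
        return self._push(tag, struct.pack("<II", lhs, rhs))

    def add_with_extra(self, tag: str, fields: list) -> int:
        """Large instruction: 4 bytes hold an index into `extra`."""
        assert len(fields) == self.EXTRA_FIELD_COUNT[tag]
        extra_index = len(self.extra)
        self.extra.extend(fields)
        return self._push(tag, struct.pack("<II", extra_index, 0))

    def add_string(self, text: str) -> int:
        start = len(self.string_bytes)
        self.string_bytes += text.encode("utf-8") + b"\0"
        return start

    def extra_fields(self, inst: int) -> list:
        extra_index, _ = struct.unpack_from("<II", self.data, inst * 8)
        count = self.EXTRA_FIELD_COUNT[self.tags[inst]]
        return self.extra[extra_index:extra_index + count]

    def string_at(self, start: int) -> str:
        end = self.string_bytes.index(b"\0", start)
        return self.string_bytes[start:end].decode("utf-8")


code = Code()
name = code.add_string("main")
add = code.add_bin_op("add", 1, 2)
call = code.add_with_extra("call", [name, 7, 9])
```

Because every instruction occupies a fixed-size slot, the whole program lives in a handful of flat arrays instead of one allocation per instruction.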
astgen.zig is (so far) almost entirely untouched, but nearly all of it
will need to be reworked to adhere to this new memory layout structure.
I have no benchmarks to report yet, as I am still working through
compile errors and fixing various things that I broke in this branch.
Overhaul of Source Locations:
Previously we used `usize` everywhere to mean byte offset, but sometimes
it also meant other things. This was error prone and also made us do
unnecessary work, and store unnecessary bytes in memory.
Now there are more types involved in source locations, and more ways
to describe a source location.
* AllErrors.Message: embrace the assumption that files always have fewer
than 1 << 32 bytes (4 GiB).
* SrcLoc gets more complicated, to model more complicated source
locations.
* Introduce LazySrcLoc, which can model interesting source locations
with very little stored state. Useful for avoiding doing unnecessary
work when no compile errors occur.
Also, previously, we had `src: usize` on every ZIR instruction. This is
no longer the case. Each instruction now determines whether it even cares
about source location, and if so, how that source location is stored.
This requires more careful work inside `Sema`, but it results in fewer
bytes stored on the heap, without compromising accuracy and power of
compile error messages.
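The idea behind a lazy source location can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual `LazySrcLoc` definition: the two `kind` values and the `node_start_bytes` table are hypothetical stand-ins for whatever the AST provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LazySrcLoc:
    kind: str   # hypothetical kinds: "byte_offset" or "node_offset"
    value: int  # small integer; its meaning depends on `kind`


def resolve_byte_offset(loc: LazySrcLoc, decl_node: int, node_start_bytes: list) -> int:
    """Do the expensive lookup only when an error actually needs reporting."""
    if loc.kind == "byte_offset":
        return loc.value
    if loc.kind == "node_offset":
        # The instruction stored only a node index relative to its Decl.
        return node_start_bytes[decl_node + loc.value]
    raise ValueError(f"unknown source location kind: {loc.kind}")


# Hypothetical table mapping AST node index to the byte where the node starts.
node_start_bytes = [0, 10, 25, 40]
loc = LazySrcLoc("node_offset", 2)
offset = resolve_byte_offset(loc, decl_node=1, node_start_bytes=node_start_bytes)
```

On the success path nothing is resolved at all, so the instruction pays only for the small stored integer.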
Miscellaneous:
* std.zig: string literals have more helpful result values for
reporting errors. There is now a lower level API and a higher level
API.
- side note: I noticed that the string literal logic needs some love.
There is some unnecessarily hacky code there.
* cut & pasted some TZIR logic that was in zir.zig to ir.zig. This
probably broke stuff and needs to get fixed.
* Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't
think this is quite how this code will be organized. Need some more
careful planning about how to implement structs, unions, enums. They
need to be independent Decls, just like a top level function.
The main realization here was that getting rid of the early returns
in renderWhile() and rewriting the logic into a mostly unified execution
path took things from ~200 lines to ~100 lines and improved consistency
by deduplicating code.
Also add several test cases and fix a few issues along the way:
Fixes https://github.com/ziglang/zig/issues/6114
Fixes https://github.com/ziglang/zig/issues/8022
Add failing testcase to reproduce issue 8088
Tidy up renderWhile(), factoring out renderWhilePayload()
Ensure correct newline is used before 'then' token in while/for/if
Handle indents for 'if' inside 'for' or 'while'
Stop special-casing 'if' compared to 'for' and 'while'
liburing commit: 1bafb3ce5f
As stated in the liburing commit message, this fixes a regression,
reverting code that was added speculatively to avoid a syscall in some
cases.
The modification to the grammar in the comment is in line with the
grammar in the zig-spec repo.
Note: checking if the previous token is a colon is insufficient to tell
if a block has a label; the identifier must be checked for as well. This
can be seen in sentinel terminated slicing: `foo[0..1:{}]`
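A Python sketch of the two checks, using hand-written token tag lists. The tag names and the `block_has_label` helper are hypothetical and the check is a simplification of what the real parser and renderer do.

```python
def colon_only_check(tags: list, lbrace: int) -> bool:
    """The insufficient check: only looks at the token before the '{'."""
    return lbrace >= 1 and tags[lbrace - 1] == "colon"


def block_has_label(tags: list, lbrace: int) -> bool:
    """Require `identifier :` immediately before the '{'."""
    return (
        lbrace >= 2
        and tags[lbrace - 1] == "colon"
        and tags[lbrace - 2] == "identifier"
    )


# blk: {}
labeled = ["identifier", "colon", "l_brace", "r_brace"]
# foo[0..1:{}]
sliced = [
    "identifier", "l_bracket", "number_literal", "ellipsis2",
    "number_literal", "colon", "l_brace", "r_brace", "r_bracket",
]
```

In the slicing case the token before the colon is a number literal, so only the stricter check rejects it.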
In order to update the printed progress string the code tried to move
the cursor N cells to the left, where N is the number of written bytes,
and then clear the remaining part of the line.
This strategy has two main issues:
- Is only valid if the number of characters is equal to the number of
written bytes,
- Is only valid if the line doesn't get too long.
The second point is the main motivation for this change: when the line
becomes too long, the terminal wraps it to a new physical line. This
means that moving the cursor to the left is no longer enough, because
once the cursor reaches the left border it cannot move any further.
The wrapped line is still stored by the terminal as a single line,
despite now taking more than a single one when displayed. If you try to
resize the terminal you'll notice how the contents are reflowed and are
essentially illegible.
Querying the cursor position on non-Windows systems (plot twist,
Microsoft suggests using VT escape sequences on newer systems) is
extremely cumbersome so let's do something different.
Before printing anything let's save the cursor position and clear the
screen below the cursor. This way we ensure there's absolutely no trace
of stale data on screen. After the message is printed we simply
restore the cursor position.
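The save, clear, print, restore sequence can be sketched in Python with standard VT escape codes. Which exact sequences the Zig code emits is not stated here, so the DECSC/DECRC pair below is an assumption, and `refresh` is a hypothetical helper.

```python
import io

SAVE_CURSOR = "\x1b7"     # DECSC: save cursor position (assumed choice)
RESTORE_CURSOR = "\x1b8"  # DECRC: restore the saved position (assumed choice)
CLEAR_BELOW = "\x1b[0J"   # ED 0: erase from the cursor to the end of the screen


def refresh(out, message: str) -> None:
    """Redraw the progress message without tracking how many cells it used."""
    out.write(SAVE_CURSOR + CLEAR_BELOW)
    out.write(message)
    out.write(RESTORE_CURSOR)


# Writing to a buffer here; a real caller would pass sys.stderr.
buf = io.StringIO()
refresh(buf, "[1/3] compiling")
refresh(buf, "[2/3] linking")
```

Because the cursor returns to the saved position after every update, neither multi-byte characters nor wrapped lines affect the next redraw.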
Currently `//  zig fmt: off` does not work as there are two spaces
after the `//` instead of one. This can cause confusion, so allow
arbitrary whitespace before the `zig fmt: (off|on)` in the comment but
trim this whitespace to the canonical single space in the output.
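The intended behavior can be sketched with a regular expression. This is a Python illustration, not the zig fmt implementation; `canonicalize_fmt_directive` is a hypothetical name, and accepting zero spaces after `//` is an assumption about what "arbitrary whitespace" covers.

```python
import re
from typing import Optional

# Any run of spaces or tabs (assumed to include none) between `//` and the directive.
_FMT_DIRECTIVE = re.compile(r"^//[ \t]*zig fmt: (off|on)[ \t]*$")


def canonicalize_fmt_directive(comment: str) -> Optional[str]:
    """Return the canonical directive, or None if the comment is not one."""
    match = _FMT_DIRECTIVE.match(comment.strip())
    if match is None:
        return None
    return f"// zig fmt: {match.group(1)}"


canonical = canonicalize_fmt_directive("//  zig fmt: off")
```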
Let's follow the road paved by the removal of 'z'/'Z', the Formatter
pattern is nice enough to let us remove the remaining four special cases
and declare u8 slices free from any special casing!
OCB has been around for a long time.
It's simpler, faster and more secure than AES-GCM.
RFC 7253 was published in 2014. OCB also won the CAESAR competition
along with AEGIS.
It's been implemented in OpenSSL and other libraries for years.
So, why isn't everybody using it instead of GCM? And why don't we
have it in Zig already?
The sad reason for this was patents. GCM was invented only to work
around these patents, and for all this time, OCB was that nice
thing that everybody knew existed but that couldn't be freely used.
That just changed. The OCB patents are now abandoned, and OCB's
author just announced that OCB was officially public domain.
Besides the new order being consistent with the ThreadPool API and making
more sense, this shuffling allows the context argument type to be written
in terms of the startFn arguments, reducing the use of anytype (e.g. fewer
explicit casts when using comptime_int parameters, yay).
Sorry for the breakage.
Closes #8082
This completes the process. All target CPU features are now
auto-generated by the tools/update_cpu_features.zig script, which
contains all the overrides.
Invoking this tool against LLVM 12rc2 now produces an empty git diff.
With this change, added & modified cpus & features participate in the
same pruning system, and sorting takes into account the zig name, not
the pre-modified llvm name.
The modified target files in this commit are due to the improved
sorting and pruning.
The script now fully supports extra cpus & features.
The tools/update_cpu_features script is coming along, and generates
correct information for all these targets. The remaining targets are:
* arm
* aarch64
* amdgpu
* riscv
I will commit them once the issues with the updater tool are resolved.
This introduces {'} to indicate escape for a single-quoted string,
and {} to indicate escape for a double quoted string.
Without this, there would be unnecessary \' inside double quoted
strings, and unnecessary \" inside single quoted strings.
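The difference between the two modes comes down to which quote character gets a backslash. A minimal Python sketch, where `escape_for_quote` is a hypothetical helper and the set of other escaped characters is an assumption kept deliberately small:

```python
def escape_for_quote(text: str, quote: str) -> str:
    """Escape `text` for a literal delimited by `quote` (either ' or ")."""
    assert quote in ("'", '"')
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")
        elif ch == quote:
            out.append("\\" + ch)  # only the delimiter in use needs escaping
        elif ch == "\n":
            out.append("\\n")
        else:
            out.append(ch)
    return "".join(out)


in_double = escape_for_quote("it's", '"')  # apostrophe left alone
in_single = escape_for_quote("it's", "'")  # apostrophe escaped
```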
Motivated by the llvm12 branch, in the new tool I am writing for
updating target CPU features.
Conflicts:
* src/clang.zig
* src/llvm.zig
- this file got moved to src/llvm/bindings.zig in master branch so I
had to put the new LLVM arch/os enum tags into it.
* lib/std/target.zig, src/stage1/target.cpp
- haiku had an inconsistency with its default target ABI, gnu vs
eabi. In this commit we make it gnu in both places to match the
latest changes by @hoanga.
* src/translate_c.zig
This is an accident from a merge conflict introduced in
7edb204edf.
I believe the new pipe2 code is supposed to work for all posix-like
systems. If haiku needs special handling here, it should be
re-introduced.
* no isHaiku() function since there is not more than one os tag that
this applies to.
* clean up some control flow into a switch
* add some TODO comments to investigate panics that suspiciously look
like they should be compile errors (see #363)
Reverts bf642204b3 and uses a different
workaround, suggested by @LemonBoy.
There is either a compiler bug or a design flaw somewhere around here.
It does not have to block this branch, but I need to understand exactly
what's going on here and make it so that nobody ever has to run into
this problem again.