In the pre-decompression code, use the appropriate largest possible
rep movs and rep stos to move code and clear bss, respectively. For
reverse copy, do note that the initial values are supposed to be the
address of the first (highest) copy datum, not one byte beyond the end
of the buffer.
rep strings are not necessarily the fastest way to perform these
operations on all current processors, but are likely to be in the
future, and perhaps more importantly, we want to encourage the
architecturally right thing to do here.
This also fixes a couple of trivial inefficiencies on 64 bits.
[ Impact: trivial performance enhancement, increase code similarity ]
Signed-off-by: H. Peter Anvin <hpa@zytor.com>