Conversation
|
That s quite an addition ... worth discussing. |
f9f7291 to
bf8a527
Compare
|
This is indeed a lot of code. I think this should be discussed on the list, and possibly voted on. |
|
asked on php-internals mailing list: https://marc.info/?l=php-internals&m=170568974400302&w=2 |
ext/hash/php_hash_blake3.h
Outdated
| #define PHP_BLAKE3_CTX blake3_hasher | ||
| // help: is V correct? | ||
| //#define PHP_BLAKE3_SPEC "b8b8qb64bbbbb1760" | ||
| #define PHP_BLAKE3_SPEC "L8L8Qa64CCCCL8Ca1760" |
There was a problem hiding this comment.
can someone double-check this? because i couldn't wrap my head around it, and this is just my best guess based on
php-src/ext/hash/blake3/upstream_blake3/c/blake3.h
Lines 42 to 61 in 882466b
There was a problem hiding this comment.
How did you come up with this? From what I see in
Lines 211 to 230 in 6d4598e
There was a problem hiding this comment.
Look at line 150, uppercase Q still exists:
Line 150 in 6d4598e
Pretty sure uppercase C also existed with the meaning unsigned char or uint8_t when it was written, but it seems to have been removed! I'll try to re-write it again.. thanks!
There was a problem hiding this comment.
Still can't get it quite right, do you see anything off here?
#define PHP_BLAKE3_SPEC /* uint32_t key[8]; */"l8" \
/* uint32_t cv[8]; */ "l8" \
/* uint64_t chunk_counter; */ "q1" \
/* uint8_t buf[BLAKE3_BLOCK_LEN]; */ "b64" \
/* uint8_t buf_len; */ "b1" \
/* uint8_t blocks_compressed */ "b1" \
/* uint8_t flags; */ "b1" \
/* uint8_t cv_stack_len; */ "b1" \
/* uint8_t cv_stack[(BLAKE3_MAX_DEPTH + 1) * BLAKE3_OUT_LEN]; */ "b1760" \
"."It still hits
Lines 1493 to 1495 in 6d4598e
There was a problem hiding this comment.
I found a possible solution: it seems the compiler insert 5 alignment-padding-bytes after uint8_t buf_len;,
and an additional 7 padding bytes after uint8_t cv_stack[(BLAKE3_MAX_DEPTH + 1) * BLAKE3_OUT_LEN];, if the spec is changed to
#define PHP_BLAKE3_SPEC /* uint32_t key[8]; */"l8" \
/* uint32_t cv[8]; */ "l8" \
/* uint64_t chunk_counter; */ "q" \
/* uint8_t buf[BLAKE3_BLOCK_LEN]; */ "b64" \
/* uint8_t buf_len; */ "b" \
/* skip 5 bytes of alignment padding in chunk */ "B5" \
/* uint8_t blocks_compressed */ "b" \
/* uint8_t flags; */ "b" \
/* uint8_t cv_stack_len; */ "b" \
/* uint8_t cv_stack[(BLAKE3_MAX_DEPTH + 1) * BLAKE3_OUT_LEN]; */ "b1760" \
/* skip 7 trailing alignment bytes */ "B7" \
"."
serialization actually works on my system!
But this seems like the kind of thing different compilers, and even the same compiler with different optimization settings, may choose to do differently. For example, I wouldn't trust gcc -Ofast (optimize-for-speed) and gcc -Os (optimize-for-size) to do the exact same alignments paddings in the exact same location 🤔 Nor would I trust the compiler to do the exact same alignment padding across 32bit and 64bit builds..
I might be wrong but.. idk!
There was a problem hiding this comment.
I think 5 bytes is correct only on 64bit but it should not be correct on 32bit. Also the compiler is free to choose how to do padding so your choice of the padding place is basically relaying on undefined behaviour and it could theoretical differ between compilers. As I mentioned, the only way how to be sure is to define fields as compiler must not re-order them.
There was a problem hiding this comment.
From what I see, there's also option to define your own serialization function but not sure if it would make it any simpler.
There was a problem hiding this comment.
Well it could be cleaner because it wouldn't rely on struct memory layout...
There was a problem hiding this comment.
I think 5 bytes is correct only on 64bit but it should not be correct on 32bit
well 263b2f6 works on 32bit (tested locally on a debian 12 32bit docker image, and now confirmed on the 32bit github CI runner)
From what I see, there's also option to define your own serialization function but not sure if it would make it any simpler.
hmm could you link to an example? I haven't seen it. A custom serializer sounds like the safest bet across compilers/architectures/configurations
There was a problem hiding this comment.
well 263b2f6 works on 32bit (tested locally on a debian 12 32bit docker image, and now confirmed on the 32bit github CI runner)
so it pads to 64bit on 32bit? interesting. I thought that it gets padded to 32bit on 32bit arch but probably I'm wrong.
hmm could you link to an example?
You can define your own php_hash_serialize_func_t and php_hash_unserialize_func_t that you can set to php_hash_ops instead of php_hash_serialize / php_hash_unserialize. Then you should be able to do a custom serialization / unserialization and ignore the spec.
|
post on the mailing list worth re-posting here: wrote some benchmarks, but the TL;DR is: array(6) {
["O2-portable-march"]=>
array(2) {
["microseconds_for_16_kib"]=>
int(29295)
["mb_per_second"]=>
float(533.3674688513398)
}
["O2-portable"]=>
array(2) {
["microseconds_for_16_kib"]=>
int(13876)
["mb_per_second"]=>
float(1126.0449697319111)
}
["O2-sse2"]=>
array(2) {
["microseconds_for_16_kib"]=>
int(4969)
["mb_per_second"]=>
float(3144.4958744214127)
}
["O2-sse41"]=>
array(2) {
["microseconds_for_16_kib"]=>
int(4688)
["mb_per_second"]=>
float(3332.977815699659)
}
["O2-avx2"]=>
array(2) {
["microseconds_for_16_kib"]=>
int(2384)
["mb_per_second"]=>
float(6554.1107382550335)
}
["O2-avx512"]=>
array(2) {
["microseconds_for_16_kib"]=>
int(1753)
["mb_per_second"]=>
float(8913.291500285226)
}
}Edit: -O2 portable: 596MB/s Again, even with -march=native, the compiler cannot make the portable |
|
It's missing license in each file so that will need to be added. |
|
@bukka added the LICENSE file from https://github.com/BLAKE3-team/BLAKE3/blob/master/LICENSE , |
|
Regarding the license, can we maybe add it to the README.REDIST.BINS file? Number |
|
I think the added LICENSE file and a note in README.REDIST.BINS should be enough for upstream files. You should add a header to the files that you added - hash_blake3.c and php_hash_blake3.h |
|
Having this would be good |
|
More than 2 years since the first attempt to get Blake3 into PHP, why is it so hard? |
the main problem this round was that I just lost steam. Got occupied with other stuff and forgot about this. You reminded me though! |
I'm requesting re-review. |
|
-1 from me I think it is a terrible idea to bundle more crypto stuff in PHP Is there an RFC for this ? |
|
@remicollet imo we should support BLAKE3 for the same reason we support xxHash (added 8.1.0): speed. BLAKE3 offers SHA3-256-like security at much higher speed than SHA3-256. Quoting Exactly how we support BLAKE3 is not that important, if OpenSSL/Sodium starts offering BLAKE3, I wouldn't mind requiring OpenSSL/Sodium for BLAKE3 support. But they don't support BLAKE3 (yet?). As for RFC, made a draft a year ago, but now I can't even find it. I'll make a new one. |
So effort should go there, not here. |
|
RFC draft: https://wiki.php.net/rfc/blake3 |
bukka
left a comment
There was a problem hiding this comment.
Except the missing serialization, it seem ok to me. It would be also good to add a bit more tests to test some predefined vectors to be sure all is good.
I think it's good already for the RFC so you can announce it.
In terms of the maintanence, the hash extension bundles already quite a few of algos so this is just another one. It's taken from upstream like some other algorithms so I don't think it matters that much the amount of code. The upstream seems to be quite well maintained.
ext/hash/php_hash_blake3.h
Outdated
| #define PHP_BLAKE3_CTX blake3_hasher | ||
| // help: is V correct? | ||
| //#define PHP_BLAKE3_SPEC "b8b8qb64bbbbb1760" | ||
| #define PHP_BLAKE3_SPEC "L8L8Qa64CCCCL8Ca1760" |
There was a problem hiding this comment.
How did you come up with this? From what I see in
Lines 211 to 230 in 6d4598e
| @@ -73,6 +106,8 @@ PHP_INSTALL_HEADERS([ext/hash], m4_normalize([ | |||
| php_hash_ripemd.h | |||
| php_hash_sha.h | |||
| php_hash_sha3.h | |||
| php_hash_blake3.h | |||
| blake3/upstream_blake3/c/blake3.h | |||
There was a problem hiding this comment.
Why is there that extra upstream_blake3 directory. Cannot this go directly to blake3?
There was a problem hiding this comment.
the directory separate code written for the php-src integration from the substantial amount of code cloned directly from https://github.com/BLAKE3-team/BLAKE3/
Cannot this go directly to blake3?
It can, but I'd prefer if it didn't.
There was a problem hiding this comment.
I don't really understand why it needs extra subdirectory. The blake3 directly contains only upstream_blake3 directory which really doesn't make any sense to me. And you also have c directory which I can see is from upstream. So it should really be just blake3/c .
|
Right now there is a problem with https://dev.mysql.com/ causing Windows CI's to fail, specifically https://dev.mysql.com/get/Downloads/MySQL-8.0/mysql-8.0.31-winx64.zip returns HTTP 403 Forbidden. It's not this PR's fault 🤔 edit: made a dedicated issue for it: #17561 |
|
PHP needs fast hash, can the RFC be moved to voting? |
|
@mvorisek last time I brought it up on the internals mailing list, they were worried about how the ext/hash API was getting cluttered, how it would increase compile time, how it would be a maintenance burden, and started discussing how less-frequently-used-hashes could be moved into pecl extensions (iirc) ... I dropped out of the conversation rather quickly 😅 |
|
Hmm, I cannot comment on the compile times, but given this hash algo is currently the best one given its entropy, cryptographic safety and speed, I would say it should be integrated into the core and made available on every platform. The maintanance should be also minimal given the source can be verified using CI and thus only the integration needs to be maintained mainly. Given these, I belive there is enough rationale to integrate this PR. What about simply opening the voting with a possible optional question if the CI sources verification should be added as well. If the voting would pass, great, if not, we cannot do much other than pecl.. |
|
It's hard to say if those comments would mean that the RFC cannot pass. I don't think it will be much more extra maintenance than other hashes as they are just taken from upstream. In this case the upstream is actually quite lively compare to others. The deciding factor should be also what's good for our users. So I would try to at least put it to vote and see the result then. I think it's worth a try for sure. |
|
@divinity76 It would certainly help if you'd update the benchmarks to include the latest SHA-2 changes. At least the SSE2 variant should be available everywhere. So it would probably make sense to list SHA-2 separately with the SSE2 implementation and the SHA-NI implementation that should be supported by your CPU. |
|
And looking back at the thread, you should also specify which of the Blake2 implementations is being used (SSE2, SSE4.1, AVX2, AVX512), because otherwise it's not an apples to apples comparison. |
|
It's been so long, why isn't the algorithm built into PHP yet? What problems remain? |
Because an RFC was requested but not yet brought to discussion+vote |
squash and rebase https://github.com/php/php-src/tree/896351a76d138071a21fea1ba20e91d5d0859c1b Co-authored-by: divinity76 <hans@loltek.net> Co-authored-by: Peter Kokot <peterkokot@gmail.com> Co-authored-by: Jakub Zelenka <bukka@php.net>
there's padding issues, idk how portable this is... The CI's should be interesting.
I don't have a 32bit system to test on locally, so testing on CI for now..
Also BLAKE3-team/BLAKE3#382 and BLAKE3-team/BLAKE3#443 have both been accepted upstream, so we can cleanly use upstream BLAKE3 sources without php-specific patches, at least for now :)



BLAKE3 is a very fast cryptographically secure hash. It is the latest iteration of the BLAKE hash, which was a SHA3 finalist (but lost to Keccak in the final round, for being too similar to SHA2).

RFC: https://wiki.php.net/rfc/blake3
Check this speed chart:
and this

/ext/hash/bench.phprun where BLAKE3 is much faster than every other secure hash, and beats several non-cryptographic hashes (AMD Ryzen 9 7950x):Quoting Google Bard: