The example record in section 8 of the CDXJ spec contains two digest fields: digest and recordDigest.
Only digest is mentioned in the spec.
org,example)/index.html 20220106150849300 {"url":"https://example.org/index.html","digest":"sha-256:a8c5ac6f47aa34c5c5183daedc6ebbc7ca1e53fd2ec7db5e98d71bffb163b2ce","mime":"image/png","offset":283,"length":2269,"recordDigest":"sha256:e520b333999144ff38f593f6d76f5333d24895701953b2ea0507ed041d20ca2c","status":200,"filename":"data.warc.gz"}
As I understand it, the digest value can be copied from the WARC-Payload-Digest field in the WARC header, but on rereading the WARC spec this is not entirely clear.
What does the extra recordDigest field refer to?
The two values are different, so they presumably refer to different things.
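For anyone who wants to check this quickly, here is a minimal Python sketch that parses the example record and compares the two fields. It assumes the line layout shown in the example (searchable key, timestamp, then a JSON block, separated by single spaces). The helper names `parse_cdxj_line` and `split_digest` are my own, not from any library. Note that the two fields also spell the algorithm label differently (`sha-256` vs `sha256`), so the sketch normalises the label before comparing.

```python
import json

# The example record from section 8 of the spec, copied verbatim.
CDXJ_LINE = (
    'org,example)/index.html 20220106150849300 '
    '{"url":"https://example.org/index.html",'
    '"digest":"sha-256:a8c5ac6f47aa34c5c5183daedc6ebbc7ca1e53fd2ec7db5e98d71bffb163b2ce",'
    '"mime":"image/png","offset":283,"length":2269,'
    '"recordDigest":"sha256:e520b333999144ff38f593f6d76f5333d24895701953b2ea0507ed041d20ca2c",'
    '"status":200,"filename":"data.warc.gz"}'
)


def parse_cdxj_line(line):
    """Split a line into (key, timestamp, fields).

    Assumes single-space separators, as in the example record.
    """
    key, timestamp, json_block = line.split(" ", 2)
    return key, timestamp, json.loads(json_block)


def split_digest(value):
    """Split 'algorithm:hex' and normalise the label (hypothetical helper)."""
    algorithm, _, hex_value = value.partition(":")
    return algorithm.replace("-", "").lower(), hex_value


key, timestamp, fields = parse_cdxj_line(CDXJ_LINE)
digest_algo, digest_hex = split_digest(fields["digest"])
record_algo, record_hex = split_digest(fields["recordDigest"])

print("same algorithm:", digest_algo == record_algo)
print("same value:", digest_hex == record_hex)
```

Both fields name the same algorithm once the label is normalised, but the hex values differ, so they are hashes of different byte ranges. The sketch does not tell us which range recordDigest covers; that still needs an answer from the spec.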
Simply removing recordDigest from the example in the spec would clear up some confusion.