POST request replay fails for WordPress REST API endpoints #984

@rosemlondon

Describe the bug

Several archived pages fail to load dynamic content (news listings, publications, events, datasets) because PyWB cannot consistently replay POST requests to a WordPress REST API endpoint. The pages appear to load, but the content blocks populated by the API remain empty.

This appears to be a variant of the known POST request replay limitation documented in Issue #768 and the POST Request Replay wiki page.

Steps to reproduce the bug

Visit the following example pages in The National Archives' web archive, which we have been working on:

https://webarchive.nationalarchives.gov.uk/ukgwa/20260206124441/https://www.nceo.ac.uk/news/
https://webarchive.nationalarchives.gov.uk/ukgwa/20260206124441/https://www.nceo.ac.uk/news/latest-events/
https://webarchive.nationalarchives.gov.uk/ukgwa/20260106092954/https://www.nceo.ac.uk/data-facilities/datasets-tools/
https://webarchive.nationalarchives.gov.uk/ukgwa/20260106092954/https://www.nceo.ac.uk/our-research/publications/

The dynamic content blocks (news, publications, events) under the HTML content-area div do not populate; see the screenshots below for a comparison. In replay, the POST request to https://www.nceo.ac.uk/wp-json/v2/archive-block/results is not consistently resolved to the captured response, so the frontend receives no usable data for those blocks.
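
To reproduce the failing request outside the browser, a minimal sketch along these lines may help. It only builds the form-encoded POST against the replay URL and does not send it. The form fields shown are hypothetical placeholders (the real ones need to be copied from the browser's network tab), and the in-page request goes through PyWB's client-side rewriting, so its URL form may differ from the one assumed here.

```python
from urllib.parse import urlencode
from urllib.request import Request

REPLAY_PREFIX = "https://webarchive.nationalarchives.gov.uk/ukgwa"
ENDPOINT = "https://www.nceo.ac.uk/wp-json/v2/archive-block/results"


def build_replay_post(timestamp: str, fields: dict) -> Request:
    """Build (but do not send) a form-encoded POST to the archived endpoint."""
    body = urlencode(fields).encode("ascii")
    return Request(
        f"{REPLAY_PREFIX}/{timestamp}/{ENDPOINT}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical form fields: replace with the ones the page actually sends.
req = build_replay_post("20260206124441", {"post_type": "news", "page": "1"})
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Passing `req` to `urllib.request.urlopen` would send it; a replay miss that returns 404 surfaces there as an `HTTPError`.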

Expected behavior

Dynamic content should render as captured, with POST API responses replayed correctly from the WARC.

Screenshots

See PyWB replay of News page below:

Image

And the live version of the same page:

Image

Environment

  • OS: macOS Tahoe 26.3
  • Browser: Chrome
  • Version: 145.0.7632.159

Additional context

This appears to be a replay-time POST matching issue rather than a capture gap. The WARC contains successful 200 responses for the endpoint, but during replay PyWB does not always match or reconstruct the original POST request context (method, body canonicalization and index key). When that match fails, a request to the same URL gets a 404 or no response at all. This is consistent with the known PyWB limitations around POST replay and index compatibility. The POST body is standard application/x-www-form-urlencoded and relatively small.
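
To illustrate why the match can miss, here is a rough sketch of folding the method and a form-encoded body into a single lookup key. It is not PyWB's actual code: the `__wb_method` parameter follows the naming we understand PyWB to use when appending POST data to a URL, while the sorting step, the `post_lookup_key` helper and the field names (including `nonce`) are our own assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlencode


def post_lookup_key(url: str, body: str) -> str:
    """Fold the method and a form-encoded body into one lookup URL."""
    # Sort the fields so that parameter order alone does not change the key.
    params = sorted(parse_qsl(body, keep_blank_values=True))
    query = urlencode([("__wb_method", "POST")] + params)
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}{query}"


ENDPOINT = "https://www.nceo.ac.uk/wp-json/v2/archive-block/results"

# Hypothetical bodies; the real field names come from the captured request.
captured = post_lookup_key(ENDPOINT, "post_type=news&page=1&nonce=abc123")
reordered = post_lookup_key(ENDPOINT, "page=1&nonce=abc123&post_type=news")
replayed = post_lookup_key(ENDPOINT, "post_type=news&page=1&nonce=zzz999")

print(captured == reordered)  # same fields, different order
print(captured == replayed)   # one value differs between capture and replay
```

Under these assumptions, a reordered body still produces the same key, but any field whose value changes between capture and replay produces a different key, so an exact lookup would not find the captured response.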

There is an additional complicating factor: the endpoint https://www.nceo.ac.uk/wp-json/v2/archive-block/results is POST-only. A GET request to the same URL returns a 404 rest_no_route error, which makes it difficult to inspect or test the response directly in a live environment and rules out standard fuzzy matching workarounds.
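
When triaging, it can help to tell this WordPress route error apart from a replay miss, since both arrive as a 404. The sketch below assumes the error body is JSON with a top-level `code` field, which is the general shape of WordPress REST API errors; the sample body is trimmed and illustrative, not copied from a capture, and `is_rest_no_route` is a hypothetical helper.

```python
import json


def is_rest_no_route(status: int, body: str) -> bool:
    """Return True if a response looks like a WordPress rest_no_route error."""
    if status != 404:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON, e.g. an HTML "not found" page from the replay system.
        return False
    return isinstance(payload, dict) and payload.get("code") == "rest_no_route"


# Illustrative, trimmed error body (assumed shape).
sample = '{"code": "rest_no_route", "data": {"status": 404}}'
print(is_rest_no_route(404, sample))
print(is_rest_no_route(404, "<html>Not Found</html>"))
```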

This issue is similar to the previously raised #768. However, it is not specific to the OutbackCDX backend, and the POST-only nature of the endpoint (GET → 404 rest_no_route) eliminates fallback matching strategies that might otherwise apply.
