cURL requests with page numbers always return page 1 #783

@famosab

I am trying to programmatically access clinical information for multiple donors in multiple projects using the ICGC API. The goal is to filter the donors by a specific cancer type, in this case Cholangiocarcinoma. I have written a bash script to retrieve the donor IDs, but I ran into a problem: some projects have more than 100 donors, so their results span multiple pages.

Here is the script I have so far:

#!/bin/bash

# Define the list of projects
projects=("BTCA-SG" "BTCA-JP" "LIHC-US" "LIRI-JP")

# Loop through the projects and generate a list of donors
for project in "${projects[@]}"; do
    page=1
    while true; do
        result=$(curl -X GET --header 'Accept: application/json' "https://dcc.icgc.org/api/v1/projects/${project}/donors?size=1000&page=${page}" | jq -r '.hits[].id')
        if [[ -z "$result" ]]; then
            break
        fi
        echo "$result"
        page=$((page+1))
    done
done > donor_ids.txt

Changing the page parameter in the API request has no effect: every request returns the same set of results, whereas I expected each increment of page to return the next page. I would appreciate any guidance on how to handle pagination properly and retrieve all the donor IDs for the specified projects.

Thank you in advance for your help!
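
One possible direction, offered as a minimal sketch rather than a confirmed fix: if the endpoint paginates by offset instead of by page number, the loop can advance an offset by the page size on each request. The `from` parameter, its 1-based start, and the cap of 100 results per request are all assumptions here and should be checked against the API documentation; `fetch_ids` and `collect_ids` are hypothetical helper names. The sketch also stops when a slice repeats, because the original loop would never terminate if every request returned the same non-empty page.

```shell
#!/bin/bash
# Sketch of offset-based pagination. ASSUMPTIONS, not confirmed in this issue:
# the endpoint accepts a 1-based `from` offset alongside `size`, and the
# server caps `size` at 100.

PAGE_SIZE=100

# Hypothetical helper: print the donor IDs in one slice of one project.
fetch_ids() {
    local project=$1 from=$2 size=$3
    curl -s --header 'Accept: application/json' \
        "https://dcc.icgc.org/api/v1/projects/${project}/donors?from=${from}&size=${size}" |
        jq -r '.hits[].id'
}

# Hypothetical helper: print every donor ID of one project, slice by slice.
collect_ids() {
    local project=$1 from=1 result previous=""
    while true; do
        result=$(fetch_ids "$project" "$from" "$PAGE_SIZE")
        # Stop at the first empty slice.
        [[ -z "$result" ]] && break
        # Stop if the server ignored the offset and repeated the last slice;
        # otherwise this loop would never terminate.
        if [[ "$result" == "$previous" ]]; then
            echo "offset ignored for ${project} at from=${from}" >&2
            return 1
        fi
        echo "$result"
        previous=$result
        from=$((from + PAGE_SIZE))
    done
}
```

With the `projects` array from the script above, the helpers could be driven by `for project in "${projects[@]}"; do collect_ids "$project"; done > donor_ids.txt`. Keeping the request in `fetch_ids` separate from the loop makes it easy to swap in a different parameter name if the offset assumption turns out to be wrong.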
