Skip to content

Add Confluence Data Center PAT detector#4886

Open
amanfcp wants to merge 1 commit intomainfrom
INS-400
Open

Add Confluence Data Center PAT detector#4886
amanfcp wants to merge 1 commit intomainfrom
INS-400

Conversation

@amanfcp
Copy link
Copy Markdown
Contributor

@amanfcp amanfcp commented Apr 14, 2026

Summary

Adds a detector for Confluence Data Center Personal Access Tokens.

  • No branded prefix. DC PATs are plain 44-char base64. To avoid drowning in false positives, the detector base64-decodes each candidate and checks for the <numeric_id>:<random_bytes> structural shape at the byte level
  • URL regex captures http?://host(:port)?, not just https://. On-prem Confluence commonly runs plain HTTP inside corporate networks and on non-standard ports (:8090, :8443).
  • Token-only emission. When no URL is found in the chunk, we still emit an unverified result.

Testing

  • Unit tests: pattern coverage, verification coverage via gock.
  • No CI integration test. Confluence Data Center requires a paid Atlassian license.
  • Manual end-to-end: stood up Confluence DC locally via the official Docker image using Atlassian's time-bomb licenses for testing server apps, created and verified valid PATs.

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

Note

Medium Risk
Adds a new detector with optional live HTTP verification against self-hosted Confluence instances; risk is mainly false positives/extra network calls and endpoint pairing behavior during scans.

Overview
Adds a new ConfluenceDataCenter detector that identifies Confluence Data Center Personal Access Tokens by matching 44-char base64 candidates and post-filtering them via base64 decode to the expected <numeric_id>:<bytes> structure, then optionally pairing them with nearby instance base URLs.

When verification is enabled, it validates tokens by calling GET /rest/api/user/current with Bearer auth (caching DNS failures to avoid repeated lookups). The detector is registered in default detector lists and a new DetectorType_ConfluenceDataCenter enum value is added, with unit tests covering pattern extraction, token-only emission, URL pairing, and verification status handling.

Reviewed by Cursor Bugbot for commit 6a39daf. Bugbot is set up for automated code reviews on this repo. Configure here.

@amanfcp amanfcp requested a review from a team April 14, 2026 17:16
@amanfcp amanfcp requested review from a team as code owners April 14, 2026 17:16
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 6a39daf. Configure here.

keywords = []string{"confluence", "atlassian", "wiki"}

// 44-char base64 PAT; decoded form must match the structural check below.
tokenPat = regexp.MustCompile(detectors.PrefixRegex(keywords) + `\b([A-Za-z0-9+/]{44})\b`)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing \b silently misses tokens ending in + or /

Medium Severity

The tokenPat regex uses \b (word boundary) around a character class [A-Za-z0-9+/]{44} that includes + and /. Since \b only considers [A-Za-z0-9_] as word characters, the trailing \b will fail to match whenever the 44th character happens to be + or / and is followed by whitespace, a newline, or end-of-string. This silently drops roughly 3% of structurally valid PATs (2 out of 64 base64 characters). Both test tokens conveniently end in alphanumeric characters (z and K), so this gap isn't caught by tests.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 6a39daf. Configure here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is valid.

for baseURL := range uniqueURLs {
if invalidHosts.Exists(baseURL) {
continue
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unreachable host handling inconsistently drops subsequent token results

Medium Severity

When errNoHost occurs during verification, the host is cached in invalidHosts but the result is still emitted with a verification error. However, the check at line 122 silently skips any cached invalid host via continue, emitting no result at all. This means the first token encountering an unreachable host gets a result, but subsequent tokens for the same host are silently dropped — not even as token-only results, since len(uniqueURLs) > 0 prevents that fallback. Because Go map iteration is randomized, which token gets reported is non-deterministic. The comparable artifactory detector avoids this by consistently skipping results on errNoHost with continue.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 6a39daf. Configure here.

keywords = []string{"confluence", "atlassian", "wiki"}

// 44-char base64 PAT; decoded form must match the structural check below.
tokenPat = regexp.MustCompile(detectors.PrefixRegex(keywords) + `\b([A-Za-z0-9+/]{44})\b`)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way we can tighten this regex? Although I see that you have an additional check that checks the structure of the pattern, which is good.

Jira also has a similar pattern (same <numeric id>:<random bytes> structure). I found that having this structure also means that the credential will always start with an M, N or O. By any chance is that the same case with Confluence?

Comment on lines +68 to +86
// isStructuralPAT decodes a candidate base64 string and checks that it matches
// the "<numeric id>:<random bytes>" structure used by Confluence DC PATs:
// one or more ASCII digits, a colon, then at least one more byte.
func isStructuralPAT(candidate string) bool {
raw, err := base64.StdEncoding.DecodeString(candidate)
if err != nil {
return false
}
colon := bytes.IndexByte(raw, ':')
if colon <= 0 || colon == len(raw)-1 {
return false
}
for _, b := range raw[:colon] {
if b < '0' || b > '9' {
return false
}
}
return true
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this. It's a great check for noise reduction. Will implement this in my Jira PR as well.

Comment on lines +111 to +118
if len(uniqueURLs) == 0 {
// Token-only: report unverified since we can't reach a host.
results = append(results, detectors.Result{
DetectorType: detector_typepb.DetectorType_ConfluenceDataCenter,
Raw: []byte(token),
RawV2: []byte(token),
})
continue
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of reporting tokens as unverified if no URL was configured/found. Even though it can be misleading in case the token was actually live, it is still better than not reporting it at all.

A suggestion: we can indicate this (as a message in ExtraData maybe?) that we were unable to verify the token because of absence of URL/host. That can serve as a good distinction for the user.

switch resp.StatusCode {
case http.StatusOK:
return true, nil
case http.StatusUnauthorized, http.StatusForbidden:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that the API can return both 401 and 403 for an expired/invalid token? Usually it's one of these, and 403 can indicate lack of permissions instead of the credential being invalid.

Copy link
Copy Markdown
Contributor

@mustansir14 mustansir14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me. I have some questions/suggestions which you can look into.

Also it seems the credential pattern for this is the same as Jira Data Center, so there may be some overlap in results. I guess that's okay?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants