Reconcile bare/rich component identities in ComponentsFound #1760
Force-pushed from eefbb60 to 0ec6835.
👋 Hi! It looks like you modified some files in the
If none of the above scenarios apply, feel free to ignore this comment 🙂
Pull request overview
This PR updates the orchestrator’s component aggregation so that, within a single detector run, components registered under a “bare” identity (no provenance URLs) are reconciled into their corresponding “rich” identities (same BaseId, but with DownloadUrl/SourceUrl). This ensures ComponentsFound reflects a single coherent identity while retaining important metadata and graph-derived fields.
Changes:
- Reconcile detected components by `BaseId`, merging bare metadata into rich entries and dropping the bare entries.
- Extend graph translation to associate graph data with components by either `Id` or `BaseId`, so rich components pick up graph info from bare-Id graphs.
- Add unit tests covering reconciliation behavior at both the `ComponentRecorder` and graph translation layers.
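As a rough illustration of the recorder-level reconciliation described above, the Python sketch below groups components by BaseId, folds bare metadata into every rich entry sharing that BaseId, and drops the bare entries only when a rich counterpart exists. It is not the C# code in ComponentRecorder.cs: the `DetectedComponent` type, the `reconcile` function, the Id strings, and the rule that a component is bare when its Id equals its BaseId are all hypothetical, and only the licenses, suppliers, and containers fields named in the commit message are merged.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DetectedComponent:
    # Hypothetical model: base_id omits provenance, while id may append a
    # DownloadUrl/SourceUrl. Metadata is stored as sets of strings.
    id: str
    base_id: str
    licenses: set = field(default_factory=set)
    suppliers: set = field(default_factory=set)
    containers: set = field(default_factory=set)

    @property
    def is_bare(self) -> bool:
        # Assumption: a bare component's Id carries nothing beyond its BaseId.
        return self.id == self.base_id


def reconcile(components):
    """Group by base_id, merge bare metadata into each rich entry, drop bare entries that have a rich counterpart."""
    groups = defaultdict(list)
    for component in components:
        groups[component.base_id].append(component)

    result = []
    for group in groups.values():
        bare = [c for c in group if c.is_bare]
        rich = [c for c in group if not c.is_bare]
        if not rich:
            # No rich counterpart: keep the bare entries as they are.
            result.extend(bare)
            continue
        for r in rich:
            # Every rich identity receives the union of the bare metadata (mutated in place).
            for b in bare:
                r.licenses |= b.licenses
                r.suppliers |= b.suppliers
                r.containers |= b.containers
            result.append(r)
    return result


if __name__ == "__main__":
    # Hypothetical Id format, for illustration only.
    bare_entry = DetectedComponent("lodash 4.17.21", "lodash 4.17.21", licenses={"MIT"})
    rich_entry = DetectedComponent("lodash 4.17.21 https://example.invalid/lodash.tgz", "lodash 4.17.21")
    merged = reconcile([bare_entry, rich_entry])
    print([c.id for c in merged])  # ['lodash 4.17.21 https://example.invalid/lodash.tgz']
    print(merged[0].licenses)  # {'MIT'}
```

Merging into every rich entry, rather than picking one, keeps distinct rich identities (for example, the same package resolved from two different download URLs) as separate results, each carrying the bare metadata.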
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/Microsoft.ComponentDetection.Common/DependencyGraph/ComponentRecorder.cs | Groups detected components by BaseId and merges bare metadata into rich components. |
| src/Microsoft.ComponentDetection.Orchestrator/Services/GraphTranslation/DefaultGraphTranslationService.cs | Applies graph roots/ancestors/dev-dep/scope/locations using Id or BaseId matching. |
| test/Microsoft.ComponentDetection.Common.Tests/ComponentRecorderTests.cs | Adds tests validating bare/rich reconciliation behavior for component aggregation. |
| test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DefaultGraphTranslationServiceTests.cs | Adds tests validating rich components absorb graph data from bare-Id graphs. |
Force-pushed from 0ec6835 to f1ee3e5.
Force-pushed from f1ee3e5 to 699abfc.
Pull request overview
This PR addresses identity fragmentation in ComponentsFound when the same package is registered with both a bare Id (no DownloadUrl/SourceUrl) and a rich Id (with URL data) within a single detector, by reconciling on BaseId so rich components absorb metadata/graph enrichment from their bare counterparts.
Changes:
- Reconcile `ComponentRecorder.GetDetectedComponents()` output by grouping on `BaseId` and merging bare component metadata into all rich entries.
- Extend `DefaultGraphTranslationService` enrichment to match dependency graphs by either full Id or `BaseId`, allowing rich entries to pick up roots/ancestors/scope/devDep/locations from bare-Id graphs.
- Add unit tests and supporting spike/sample artifacts documenting npm lockfile behavior and identity reconciliation rationale.
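The graph-side matching described above can be pictured with a small Python sketch. `DependencyGraph`, `resolve_graph_id`, and `gather_graph_data` are hypothetical names, not the service's API: a graph is first checked for the component's full Id and only then for its BaseId, and the matched key is used for the subsequent lookups, so a rich component inherits roots, dev-dependency flags, and file locations recorded under the bare Id. Ancestors and scope are omitted for brevity, and how the real service combines per-graph dev-dependency flags is not modeled; the sketch just collects them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DependencyGraph:
    # Hypothetical per-file graph: maps component ids to their root ids and dev-dependency flags.
    location: str
    roots: dict = field(default_factory=dict)
    dev_dependency: dict = field(default_factory=dict)

    def contains(self, component_id: str) -> bool:
        return component_id in self.roots


def resolve_graph_id(graph: DependencyGraph, component_id: str, base_id: str) -> Optional[str]:
    """Return the key under which the graph stores the component: full Id first, then BaseId."""
    if graph.contains(component_id):
        return component_id
    if base_id != component_id and graph.contains(base_id):
        return base_id
    return None


def gather_graph_data(component_id: str, base_id: str, graphs):
    """Collect roots, dev-dependency flags and file locations from every graph that knows the component."""
    roots, dev_flags, locations = set(), [], []
    for graph in graphs:
        graph_id = resolve_graph_id(graph, component_id, base_id)
        if graph_id is None:
            continue
        # Query the graph with the resolved key, which may be the bare BaseId.
        roots |= graph.roots[graph_id]
        dev_flags.append(graph.dev_dependency.get(graph_id, False))
        locations.append(graph.location)
    return roots, dev_flags, locations
```

Checking the full Id before the BaseId means a graph that already records the rich identity is never shadowed by a bare entry; this ordering follows the "fall back to BaseId" wording in the later review summary, but it is an assumption about the actual implementation.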
| File | Description |
|---|---|
| src/Microsoft.ComponentDetection.Common/DependencyGraph/ComponentRecorder.cs | Reconciles bare vs rich detected components by BaseId and merges selected metadata into rich entries. |
| src/Microsoft.ComponentDetection.Orchestrator/Services/GraphTranslation/DefaultGraphTranslationService.cs | Updates graph enrichment to treat rich components as present when graphs contain the bare BaseId. |
| test/Microsoft.ComponentDetection.Common.Tests/ComponentRecorderTests.cs | Adds tests validating bare→rich subsumption and metadata merge semantics in GetDetectedComponents(). |
| test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DefaultGraphTranslationServiceTests.cs | Adds tests ensuring graph-derived data is transferred from bare-Id graphs to rich components. |
| docs/component-identity-reconciliation-design.md | Design doc describing reconciliation points and semantics for ComponentsFound vs DependencyGraphs. |
| docs/component-identity-merging.md | Expanded design discussion and scenarios for bare/rich merging behavior. |
| docs/npm-detector-spike-plan.md | Work breakdown and analysis plan for npm detector metadata population (spike). |
| docs/npm-detector-spike-findings.md | Spike findings summarizing lockfile field availability and detector gaps. |
| docs/npm-lockfile-samples/README.md | Documentation of npm lockfile structural differences and detector read paths. |
| docs/npm-lockfile-samples/v1-lockfile-sample.json | Trimmed v1 lockfile excerpt used as documentation reference. |
| docs/npm-lockfile-samples/v2-lockfile-sample.json | Trimmed v2 lockfile excerpt used as documentation reference. |
| docs/npm-lockfile-samples/v3-lockfile-sample.json | Trimmed v3 lockfile excerpt used as documentation reference. |
| test-npm-spike/package.json | Spike project input for npm v3 lockfile generation/repro. |
| test-npm-spike/package-lock.json | Spike npm v3 lockfile (full) captured for repro. |
| test-npm-spike/baseline-output/GovCompDisc_Log_20260313102052505_25872.log | Captured run log artifact for the spike. |
| test-npm-spike/baseline-output/ScanManifest_20260313102052512.json | Captured scan manifest artifact for the spike run. |
| test-npm-spike-v1/package.json | Spike project input for npm v2 lockfile generation/repro. |
| test-npm-spike-v1/package-lock.json | Spike npm v2 lockfile (full) captured for repro. |
| test-npm-spike-v1/baseline-output/GovCompDisc_Log_20260313102109400_18928.log | Captured run log artifact for the spike. |
| test-npm-spike-v1/baseline-output/ScanManifest_20260313102109408.json | Captured scan manifest artifact for the spike run. |
| test-npm-spike-v1-only/package.json | Spike project input for npm v1 lockfile generation/repro. |
| test-npm-spike-v1-only/package-lock.json | Spike npm v1 lockfile (full) captured for repro. |
| test-npm-spike-v1-only/baseline-output/GovCompDisc_Log_20260313102122587_32472.log | Captured run log artifact for the spike. |
| test-npm-spike-v1-only/baseline-output/ScanManifest_20260313102122594.json | Captured scan manifest artifact for the spike run. |
Copilot's findings
Files not reviewed (3)
- test-npm-spike-v1-only/package-lock.json: Language not supported
- test-npm-spike-v1/package-lock.json: Language not supported
- test-npm-spike/package-lock.json: Language not supported
- Files reviewed: 16/24 changed files
- Comments generated: 1
Within a single detector, components registered under bare Ids (no DownloadUrl/SourceUrl) are now merged into their rich counterparts sharing the same BaseId.

Changes:
- ComponentRecorder.GetDetectedComponents(): group by BaseId, merge bare metadata (licenses, suppliers, containers) into all rich entries
- DefaultGraphTranslationService.GatherSetOfDetectedComponentsUnmerged(): extend graph lookup to match on BaseId so rich components absorb graph data (roots, ancestors, devDep, scope, file paths) from bare-Id graphs
- Tests for both reconciliation levels

Addresses work item #2372676.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from 699abfc to b06a7d2.
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
This PR reconciles “bare” and “rich” component identities so that bare-Id registrations are merged into rich counterparts (and rich components can still pick up dependency-graph data recorded under bare Ids).
Changes:
- Update `ComponentRecorder.GetDetectedComponents()` to group by `BaseId` and merge bare metadata into all rich entries sharing that `BaseId`.
- Update `DefaultGraphTranslationService` to apply dependency-graph data to rich components even when the graph stored the component under `BaseId`.
- Add tests covering reconciliation behavior at both the recorder and graph-translation layers.
| File | Description |
|---|---|
| test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DefaultGraphTranslationServiceTests.cs | Adds tests ensuring rich components absorb graph data recorded under bare Ids. |
| test/Microsoft.ComponentDetection.Common.Tests/ComponentRecorderTests.cs | Adds tests verifying bare/rich reconciliation and metadata merging in GetDetectedComponents(). |
| src/Microsoft.ComponentDetection.Orchestrator/Services/GraphTranslation/DefaultGraphTranslationService.cs | Extends graph lookup to match on BaseId and plumbs the resolved graph id into graph queries. |
| src/Microsoft.ComponentDetection.Common/DependencyGraph/ComponentRecorder.cs | Implements grouping by BaseId, merges bare metadata into rich components, and refactors merge helpers. |
Copilot's findings
- Files reviewed: 4/4 changed files
- Comments generated: 4
…t-identity-merging-level1
…s://github.com/microsoft/component-detection into user/aamaini/component-identity-merging-level1
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR reconciles component identities when the same package is registered under both a “bare” identity (BaseId only) and one or more “rich” identities (Id includes provenance like DownloadUrl/SourceUrl), ensuring ComponentsFound and graph-derived metadata are attributed to the rich component(s).
Changes:
- Update `ComponentRecorder.GetDetectedComponents()` to group by `BaseId`, merge bare metadata into rich entries, and drop bare entries when a rich entry exists.
- Update `DefaultGraphTranslationService.GatherSetOfDetectedComponentsUnmerged()` to match dependency-graph nodes by `Id` or `BaseId` so rich components can absorb graph data from bare-Id graphs.
- Add unit tests covering reconciliation in both the recorder and graph translation layers.
| File | Description |
|---|---|
| src/Microsoft.ComponentDetection.Common/DependencyGraph/ComponentRecorder.cs | Reconciles detected components by BaseId, merging bare metadata into rich entries and preserving multiple distinct rich identities. |
| src/Microsoft.ComponentDetection.Orchestrator/Services/GraphTranslation/DefaultGraphTranslationService.cs | Extends graph membership checks to fall back to BaseId for rich components so they inherit roots/ancestors/devDep/scope/locations from bare-Id graphs. |
| test/Microsoft.ComponentDetection.Common.Tests/ComponentRecorderTests.cs | Adds tests validating bare↔rich reconciliation behavior and metadata merging in GetDetectedComponents(). |
| test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DefaultGraphTranslationServiceTests.cs | Adds tests validating that rich components pick up graph-derived data from graphs keyed by bare Ids. |
Copilot's findings
- Files reviewed: 4/4 changed files
- Comments generated: 0 new
Codecov Report: ✅ All modified and coverable lines are covered by tests.