fix(stabilization): all 18 audit fixes from STABILIZATION-PLAN.md#422
fix(stabilization): all 18 audit fixes from STABILIZATION-PLAN.md#422
Conversation
… and performance - Audit 002: Address unsafe `(T)(object)` casts in `DeserializeResponseAsync<T>` to prevent runtime exceptions and improve type safety. - Audit 003: Fix asymmetric serializer/deserializer maps in `TeamsActivityType` to prevent data loss during serialization. - Audit 004: Resolve silent `as`-cast failures in `TeamsActivity` property shadowing to ensure proper type handling. - Audit 005: Prevent multiple calls to `BuildServiceProvider()` during startup to avoid resource leaks and improve DI correctness. - Audit 006: Ensure `ConfigurationManager<OpenIdConnectConfiguration>` is disposed properly to prevent memory leaks. - Audit 007: Validate `object.ToString()` on property bag values to avoid unexpected string representations. - Audit 008: Preserve unknown entity types during JSON deserialization to prevent data loss and improve extensibility. - Audit 009: Replace fragile exception filtering by message string in `TeamsStreamingWriter` with more robust error handling. - Audit 010: Optimize string concatenation in `TeamsStreamingWriter` to use `StringBuilder` for better performance. - Audit 011: Register missing `TeamsActivity` base type in `ActivitySerializerMap` to ensure correct serialization behavior.
… copies, and input validation
… prioritized issue list
CompatAdapter.ProcessAsync was overwriting the shared OnActivity field on TeamsBotApplication for every request. Under concurrent load this caused request N's handler to overwrite request N-1's handler, so each request could execute the wrong compat callback. Fix: add a new BotApplication.ProcessAsync(HttpContext, handler, ct) overload that accepts a per-request activity handler. CompatAdapter now passes a locally-captured async-lambda to this overload instead of mutating the shared field. The original single-argument overload is preserved and delegates to the new one using OnActivity as the fallback. Tests: two new unit tests in BotApplicationTests assert (1) the per-request handler fires and the shared field does not, and (2) 20 concurrent requests each see exactly their own activity with no cross-contamination. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…solver The IssuerSigningKeyResolver delegate was calling ConfigurationManager.GetConfigurationAsync().GetAwaiter().GetResult() on the hot request path. Under load this blocked thread-pool threads while waiting on network I/O (OIDC discovery), risking starvation. Fix: introduce JwksKeyCache — a per-authority cache that kicks off a background Task on construction so the volatile _cached field is populated before the first real request arrives. GetKeys() reads from the volatile field (allocation-free, non-blocking) on the warm path. On cold start (background fetch not yet done) it falls back to a Task.Run-wrapped fetch which avoids any SynchronizationContext deadlock. Tests: three new unit tests verify that (1) keys are served after background warm-up, (2) 50 concurrent GetKeys() calls all succeed without exceptions, and (3) warm-cache GetKeys() completes in under 100 ms (no network I/O). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…in BotAuthenticationHandler Two crash vectors on the hot outbound request path: 1. (017) Guid.Parse(agenticIdentity.AgenticUserId) threw FormatException for any caller that passes a non-GUID string as the user ID. Fixed by switching to Guid.TryParse; when parsing fails the handler logs a warning and falls through to app-only token acquisition instead of crashing. 2. (018) new JwtSecurityToken(token) (deprecated ctor) could throw ArgumentException or JsonException for opaque/non-JWT tokens (e.g. managed-identity tokens). Fixed by replacing it with JsonWebToken (non-deprecated) and wrapping the call in a try/catch that traces the failure and continues; trace-logging must never crash a send. Tests: three new unit tests cover the valid-GUID agentic path, the malformed-GUID fallback, and the non-JWT opaque-token non-crash case. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…lizerMap
6 of 8 activity subtypes were absent from ActivitySerializerMap, so
TeamsActivity.ToJson() silently fell back to the base TeamsActivity
serializer and dropped all subtype-specific fields (e.g. reactionsAdded,
membersAdded, name/value for invokes) on serialization.
Fix: add all 7 missing types to ActivitySerializerMap (MessageReaction,
MessageUpdate, MessageDelete, ConversationUpdate, InstallUpdate, Invoke,
Event). Register them in TeamsActivityJsonContext so the STJ source
generator emits the required JsonTypeInfo<T> properties.
StreamingActivity remains the only serializer-only entry (no inbound
deserialization path).
Tests: four new unit tests assert
(1) every deserializer type has a matching serializer entry,
(2) serializer.Count == deserializer.Count + 1 (StreamingActivity),
(3) round-trips for MessageReaction and ConversationUpdate preserve
subtype fields in JSON, and
(4) MessageReactionActivity specifically preserves reactionsAdded and
replyToId after a full round-trip.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…operty getters The shadowed From/Recipient/Conversation/ChannelData getters used a bare 'as' cast which silently returned null whenever the base field held a plain base type (e.g. ConversationAccount instead of TeamsConversationAccount). This happened whenever an activity was deserialized directly into CoreActivity and then accessed via TeamsActivity without going through the constructor. Fix: each getter now falls back to an explicit wrapping constructor call (TeamsConversationAccount.FromConversationAccount, TeamsConversation.FromConversation, TeamsChannelData ctor) when the 'as' cast returns null but the field is non-null. This preserves all fields and avoids silent data loss. Tests: four new unit tests assert that reading From, Recipient, Conversation, and ChannelData never returns null when the base field is populated with a plain base-type instance. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…string path DeserializeResponseAsync used (T)(object)responseString in two places to return the raw string when T == string. This pattern compiles but throws InvalidCastException at runtime if T ever deviates from string; the cast also obfuscates intent. Fix: extract DeserializeAsString<T>() that explicitly handles the T==string branch: tries JsonSerializer.Deserialize<string>() to unwrap a JSON-quoted response body, falls back to the raw text on JsonException. The (T)(object) cast is now confined to one documented place with a clear comment. Tests: four new unit tests verify JSON-quoted string unwrapping, plain-text passthrough, standard JSON-object deserialization, and empty-body null return. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Calling services.BuildServiceProvider() inside extension methods creates a second, throwaway DI container that can register duplicate singleton instances (one per BuildServiceProvider() call) and captures resources that are never disposed. This violates the ASP.NET Core DI guidelines. Two call sites removed: - BotConfig.Resolve(IServiceCollection) — used by AddConversationClient and related extensions - JwtExtensions.ResolveBotConfig — used by AddBotAuthentication / AddBotAuthorization Fix: both methods now read IConfiguration directly from the service descriptor list (using ImplementationInstance, which is how AddSingleton<IConfiguration> registers it). If IConfiguration is absent, an InvalidOperationException is thrown immediately with a clear, actionable message instead of silently building a container that later leaks. Tests: four new unit tests cover the happy path (IConfiguration present), the fail-fast path (IConfiguration absent), and the AddBotAuthentication surface in both cases. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ializer STJ path SendMeetingNotificationAsync used Newtonsoft.Json.JsonConvert.SerializeObject to serialize the Bot Framework MeetingNotificationBase and then immediately deserialized the result with System.Text.Json. The cross-framework round-trip drops fields silently when the two serializers disagree on property naming conventions (Newtonsoft is PascalCase by default, STJ by default is also PascalCase but respects JsonPropertyName attributes differently). Fix: serialize with STJ using the same s_jsonOptions instance that is used for deserialization. A single serializer guarantees that property names are handled consistently in both legs of the conversion. Tests: two new unit tests verify that a TargetedMeetingNotification round-trip preserves type, recipients, and arbitrary top-level property keys. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…tException in CompatTeamsInfo Five batch-messaging methods cast IActivity to Activity with a bare hard cast that throws InvalidCastException for any non-Activity IActivity implementation, producing a confusing runtime error that hides the root cause. Fix: extract RequireActivity(IActivity?, string) helper that uses 'as' and throws ArgumentException with a descriptive message (including the actual runtime type) when the cast fails. The guard fires before GetTeamsApiClient so the error is reported immediately without requiring a valid TurnState. The method signatures intentionally keep IActivity as the parameter type because the public API must remain compatible with the Bot Framework SDK. Tests: two new unit tests confirm that SendMessageToListOfUsersAsync and SendMessageToListOfChannelsAsync throw ArgumentException (not InvalidCastException) and include the parameter name in the exception. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…CitationEntity Two shallow-copy bugs where mutations of a copy propagated back to the original: (013) CoreActivity copy constructor assigned the same object references for ChannelData, From, Recipient, Conversation, Entities, Attachments, Value, and Properties. Fixed by: - Serializing/deserializing ChannelData, From, Recipient, Conversation via the existing STJ CoreActivityJsonContext type infos. - Cloning JsonArray (Entities, Attachments) and JsonNode (Value) via JsonNode.Parse(node.ToJsonString()). - Copying Properties into a new ExtendedPropertiesDictionary. TeamsConversationAccount.FromConversationAccount also shared the Properties dictionary reference. Fixed by copying entries into a fresh dictionary. (015) CitationEntity copy constructor used new List<CitationClaim>(source) which copies list items by reference. Fixed by serializing/deserializing each CitationClaim via TeamsActivityJsonContext so the nested CitationAppearanceDocument is also independently copied. Tests: five new unit tests assert that From, Conversation, Properties, Value, and CitationEntity.Citation are all independently mutable after a copy. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ble reference leak Build() returned the private _attachment field directly. Any caller that kept a reference to the builder and called a With* method after Build() would silently mutate the already-returned attachment. This is a mutable-reference-leak bug. Fix: add a _built flag. Build() sets the flag on first call and throws InvalidOperationException on any subsequent call, making post-Build mutations an immediate, visible error rather than a silent corruption. Tests: three new unit tests cover the happy path (first Build succeeds), the double-Build guard, and the mutation-after-Build scenario. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ing them EntityList.FromJsonArray returned null for any unrecognised type string and immediately skipped the entity, causing silent data loss for custom entity types such as BotMessageMetadata used by Power Automate. Fix (deserialization): change the switch default arm from '_ => null' to '_ => item.Deserialize<Entity>(options)'. The base Entity class already has [JsonExtensionData] on Properties so all unrecognised JSON fields survive the round-trip in that dictionary. Fix (serialization): ToJsonArray() was crashing with NotSupportedException when a Properties value was a JsonElement (produced by STJ [JsonExtensionData] during deserialization). Added a switch that re-parses JsonElement values via JsonNode.Parse(je.GetRawText()) before writing them into the output JsonObject. Tests: two new unit tests verify that (1) unknown types produce a non-null base Entity with the correct Type, and (2) known extension-data fields are preserved in Entity.Properties after deserialization. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
… in compat layer All ConversationAccount property reads in ToCompatChannelAccount used value?.ToString() on object? values from the Properties dictionary. When those values are JsonElement (the runtime type produced by STJ [JsonExtensionData] deserialization), ToString() returns the raw JSON text — which includes surrounding quotes for strings — instead of the actual string value. Fix: add ExtractString(object?) helper that uses je.GetString() for JsonElement values with ValueKind == String (the correct, unescaped value) and je.GetRawText() for other JsonElement kinds. All property read sites in ToCompatChannelAccount (both overloads, 7 + 6 call sites) now use this helper. Tests: six new unit tests cover plain strings, null, JsonElement string extraction, raw-text fallback for non-string JSON elements, and an end-to-end round-trip through ToCompatChannelAccount with STJ-deserialized JsonElement properties. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ellation detection AppendResponseAsync caught HttpRequestException by matching the message text "Content stream was cancelled by user". This breaks silently whenever the Teams API or HttpClient wording changes across SDK or runtime versions. Fix: extract IsStreamCancelled(HttpRequestException) helper that checks: 1. HTTP status code 499 "Client Closed Request" — the structured Teams API signal 2. HTTP 408 Request Timeout — a fallback structured signal 3. Message text — retained as a secondary fallback for environments where the status code is not propagated (network-level errors) Non-cancellation errors (e.g. HTTP 500) are still propagated to the caller. Tests: four new unit tests verify that status-code 499, legacy message text, and an additional legacy message variant all set the cancelled flag silently, while an unrelated 500 error propagates correctly from AppendResponseAsync. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…r with StringBuilder _accumulated += chunk creates a new string allocation on every AppendResponseAsync call, producing O(n²) memory allocations as chunk count grows. Under high streaming workloads with many small chunks this degrades throughput noticeably. Fix: replace the string field with a StringBuilder field. Append(chunk) is O(1) amortized; all read sites call .ToString() to materialise the string only when it is actually needed (at each send boundary). The guard condition is updated to check _accumulated.Length == 0 instead of string.IsNullOrEmpty. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…stration TurnMiddleware used a plain List<ITurnMiddleware> with no enforcement that writes stop after the application starts. Any code that held a reference to the pipeline could silently add middleware after the first request arrived, causing non-deterministic behaviour. Fix: separate the build-phase list (_building: List<T>) from the frozen read-path array (_frozen: T[]). Freeze() copies the list into the array and sets _isFrozen = true. Subsequent Use() calls throw InvalidOperationException. RunPipelineAsync and GetEnumerator read from the frozen array when frozen, falling back to the building list during startup. BotApplication exposes FreezeMiddleware() as the public hook. Callers should call it from IHostedService.StartAsync or an equivalent startup extension. Tests: two new unit tests verify that (1) UseMiddleware throws after FreezeMiddleware() is called, and (2) previously registered middleware still executes correctly after freezing. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…CompatConversations GetActivityMembersWithHttpMessagesAsync and GetConversationsWithHttpMessagesAsync used the null-forgiving operator (ServiceUrl!) to suppress the nullable warning instead of validating the value. When ServiceUrl is null the Uri constructor throws NullReferenceException with no useful context. Fix: add ArgumentException.ThrowIfNullOrWhiteSpace(ServiceUrl) at the entry point of each method (before any network call), matching the pattern already used by the other methods in the same class. Remove the null-forgiving operator. Tests: two new unit tests assert that both methods throw ArgumentException (NullArgumentException is a subtype) when ServiceUrl is null. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…rClient.Dispose()
GC.SuppressFinalize(this) suppresses the finalizer call on the next GC cycle.
Calling it on a class that has no finalizer is harmless but misleading — it
implies there is a ~CompatConnectorClient() to suppress. The associated
boilerplate comment ("Do not change this code. Put cleanup code in
'Dispose(bool disposing)' method") referred to a non-existent Dispose(bool)
overload, adding further confusion.
Fix: replace the body with a single explanatory comment that states there are
no unmanaged resources and no finalizer. Remove GC.SuppressFinalize.
Tests: two new unit tests verify that Dispose() is a safe no-op and is
idempotent (calling it twice does not throw).
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
|
Looked at fix for: A-012 fix(A-012): eliminate race condition on OnActivity in CompatAdapter — ProcessAsync was overwriting a shared field per-request, corrupting handlers under concurrency. Added a per-request overload to BotApplication.ProcessAsync. Fixes the race condition. Looks good to me. Not sure how to review this PR since there are a lot of changes. Split this into at least 3? |
| OpenIdConnectConfiguration config = manager.GetConfigurationAsync(CancellationToken.None).GetAwaiter().GetResult(); | ||
| ``` | ||
|
|
||
| `IssuerSigningKeyResolver` is a synchronous `Func<>` callback. Because ASP.NET Core uses a thread-pool-based async pipeline, calling `.GetAwaiter().GetResult()` here risks **thread-pool starvation**: the thread that calls `GetResult()` is blocked waiting for the async operation to complete, but that operation may itself need a thread-pool thread to continue, leading to a deadlock or severe latency degradation under load. |
There was a problem hiding this comment.
GetAwaiter().GetResult() blocks the thread-pool thread and so we should also look at other parts of the codebase to flag this pattern.
There was a problem hiding this comment.
that's what the audit did, seems this is the only place in core (there are more in main)
Can you resolve this comment?
| Pre-fetch and cache the signing keys in a `Lazy<Task<IEnumerable<SecurityKey>>>` or a background `Timer`, so the synchronous resolver can return from an already-populated in-memory cache without ever blocking: | ||
|
|
||
| 1. Replace the `ConcurrentDictionary<string, ConfigurationManager<OpenIdConnectConfiguration>>` with a `ConcurrentDictionary<string, Lazy<IEnumerable<SecurityKey>>>`. | ||
| 2. On first access, start the async fetch on a **background task** (fire-and-forget with error handling). Return an empty key set (authentication will fail-fast) until the cache warms up. |
There was a problem hiding this comment.
We should be able to force a cache warm up instead of failing on the first try.
There was a problem hiding this comment.
I'm not able to make it fail on the first request, note that the suggested implementation is not using Lazy, can you review the actual fix?
|
|
||
| 1. Replace the `ConcurrentDictionary<string, ConfigurationManager<OpenIdConnectConfiguration>>` with a `ConcurrentDictionary<string, Lazy<IEnumerable<SecurityKey>>>`. | ||
| 2. On first access, start the async fetch on a **background task** (fire-and-forget with error handling). Return an empty key set (authentication will fail-fast) until the cache warms up. | ||
| 3. Schedule periodic refresh (e.g., every 24 hours) to pick up key rotations. |
There was a problem hiding this comment.
We should sync this with known key rotation time periods
There was a problem hiding this comment.
I'm not sure how often they rotate the keys, and we will need to get into long-haul testing to verify. worst case, an app restart will fix it.. no?
I'm not sure how the keys are refreshed in 2.0/2.1 before this fix, but having a blocking call in a hot code path does not seems right to me.
| @@ -0,0 +1,104 @@ | |||
| # Audit Issue 002: Unsafe `(T)(object)` Cast in Generic HTTP Response Deserializer | |||
|
|
|||
| **Severity:** Critical | |||
There was a problem hiding this comment.
This doesn't seem critical
There was a problem hiding this comment.
For a library code, where we accept any T, this might be a problem, specially when T is string.
I dont see any issue with the suggested fix.
| @@ -0,0 +1,109 @@ | |||
| # Audit Issue 003: Asymmetric Serializer / Deserializer Maps Causing Silent Data Loss | |||
There was a problem hiding this comment.
This is a feature not a bug. We wouldn't want to serialize a MessageUpdate/MessageDelete/ConversationUpdate/InstallationUpdate activity
There was a problem hiding this comment.
agree on the intent, but the fix just avoids issues if a customer call toJson from onMessageDelete. I dont see much harm, but @MehakBindra might have more insights.
|
|
||
| 1. **Type mismatch at runtime** — If the base property was set to a `ConversationAccount` (not a `TeamsConversationAccount`), `From` returns `null` with no diagnostic. Callers that assume the property is non-null (because it was set) receive `null` and may NullReferenceException later, far from the assignment site. | ||
|
|
||
| 2. **Polymorphism violation** — Code holding a `CoreActivity` reference gets different property values than code holding a `TeamsActivity` reference to the _same object_. This violates the Liskov Substitution Principle and can cause subtle bugs in generic code that accepts `CoreActivity`. |
There was a problem hiding this comment.
This is problematic b.c. TeamsActivity (i assume) is used in general methods that accept a CoreActivity parameter
There was a problem hiding this comment.
The real issue is with the shadow properties, and rebase. I'm thinking to remove channelData from CoreActivity, so we won't have these issues
| @@ -0,0 +1,109 @@ | |||
| # Audit Issue 005: `BuildServiceProvider()` Called Multiple Times During Startup | |||
Improve extraction of IConfiguration from IServiceCollection to handle both ImplementationInstance and ImplementationFactory registrations. This ensures compatibility with HostApplicationBuilder and prevents errors when resolving configuration in various DI setups. An explicit error is thrown if IConfiguration is not found.
Summary
Implements every fix from
core/docs/stabilization/STABILIZATION-PLAN.md. All 23 raw audits (consolidated to 18 work items) are addressed with one commit per issue and accompanying unit tests.Phase 1 — Critical (production risk)
fix(A-012): eliminate race condition onOnActivityinCompatAdapter—ProcessAsyncwas overwriting a shared field per-request, corrupting handlers under concurrency. Added a per-request overload toBotApplication.ProcessAsync.fix(A-001): replace blocking.GetAwaiter().GetResult()in JWT key resolver — introducedJwksKeyCachewith background pre-warming so the synchronousIssuerSigningKeyResolverserves from a volatile cache without ever blocking a thread-pool thread on the hot path.Phase 2 — High (silent data loss / crash)
fix(AUTH-01):Guid.Parse→Guid.TryParse+ fallback;new JwtSecurityToken→JsonWebTokenwrapped in try/catch.fix(SER-01): fill 7 missing entries inActivitySerializerMapto matchActivityDeserializerMap; register all types inTeamsActivityJsonContext.fix(CAST-01): null-safe as-cast fallback inTeamsActivityshadowed property getters (From/Recipient/Conversation/ChannelData); also fixedTeamsConversationAccount.FromConversationAccountsharing the Properties reference.fix(A-002): replace fragile(T)(object)double-cast inBotHttpClientstring deserialization path with a typedDeserializeAsString<T>helper.fix(DI-01): remove twoBuildServiceProvider()calls during DI registration (BotConfig.Resolve,JwtExtensions.ResolveBotConfig); fail-fast withInvalidOperationExceptionwhenIConfigurationis absent.Phase 3 — Medium (correctness defects)
fix(A-021): replace Newtonsoft→STJ two-leg round-trip with single STJ serializer inSendMeetingNotificationAsync.fix(A-016): replace 5 direct(Activity)casts withas Activity ?? throw ArgumentExceptioninCompatTeamsInfo; guard fires beforeGetTeamsApiClientfor early, descriptive errors.fix(COPY-01): deep-copy all mutable reference fields inCoreActivitycopy constructor (via STJ round-trip); fixCitationEntityto clone eachCitationClaim; fixTeamsConversationAccount.FromConversationAccountcopying Properties by reference.fix(A-014): sealTeamsAttachmentBuilderafterBuild()with a_builtflag; second call throwsInvalidOperationException.fix(A-008): preserve unknown entity types as baseEntityinstead ofnull; fixToJsonArrayto handleJsonElementextension-data values.fix(A-007): addExtractString(object?)helper inCompatActivitythat usesje.GetString()forJsonElementvalues instead of.ToString()(which includes JSON quotes).fix(A-009): replace single fragile message-string match withIsStreamCancelledhelper that checks HTTP status 499, 408, and text as a fallback.Phase 4 — Low (hygiene)
fix(A-010): replace_accumulated += chunk(O(n²)) withStringBuilder.AppendinTeamsStreamingWriter.fix(A-020): addTurnMiddleware.Freeze()+BotApplication.FreezeMiddleware()to seal the pipeline after startup.fix(A-022): addArgumentException.ThrowIfNullOrWhiteSpace(ServiceUrl)guard in twoCompatConversationsmethods that usednew Uri(ServiceUrl!).fix(A-019): remove misleadingGC.SuppressFinalizefromCompatConnectorClient.Dispose().Test plan
🤖 Generated with Claude Code