[fix] Fix HTML logger exception on invalid XML chars in test display names by nohwnd · Pull Request #16051 · microsoft/vstest

nohwnd · 2026-05-22T01:56:27Z

🤖 This is an automated fix generated by the Issue Triage & Auto-Fix workflow.

Root Cause

When a test's DisplayName (e.g. set via a [DataRow] attribute) contains XML 1.0 invalid control characters such as \x01–\x08, \x0B, \x0C, \x0E–\x1F, DataContractSerializer.WriteObject() in HtmlLogger.PopulateHtmlFile() throws an XmlException:

Html Logger Error : '', hexadecimal value 0x01, is an invalid character.

The HTML report is silently not generated even though tests continue to pass. The XML 1.0 spec forbids these code points entirely — they cannot appear even as character references.

Fix

Apply the same sanitization already used by TrxLogger's XmlPersistence.RemoveInvalidXmlChar(): replace XML-invalid characters with their Unicode escape representation (e.g. \u0001) before storing them in the HTML logger object model.

Added a private RemoveInvalidXmlChars(string?) helper to HtmlLogger and applied it to DisplayName, FullyQualifiedName, ErrorStackTrace, and ErrorMessage in TestResultHandler().

Test

Added unit test TestResultHandlerShouldSanitizeInvalidXmlCharsInDisplayName that verifies control characters (\x01, \x02, \x03) in DisplayName, ErrorMessage, and ErrorStackTrace are replaced with their \uXXXX escape sequences.

🔍 Triaged by Issue Repro Triage & Auto-Fix 🔍

When a test's DisplayName (e.g. from a DataRow attribute) contains XML 1.0 invalid control characters such as 0x01-0x08, 0x0B, 0x0C, 0x0E-0x1F, DataContractSerializer throws XmlException and silently prevents the HTML report from being generated. Apply the same sanitization pattern already used by TrxLogger's XmlPersistence to replace invalid XML characters with their Unicode escape representation (e.g. \u0001) before they are stored in the HTML logger object model. Fixes #10431 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR fixes a failure mode in the HTML logger where DataContractSerializer.WriteObject() can throw when test result strings contain XML 1.0–invalid control characters (e.g., from [DataRow] display names), preventing the HTML report from being generated.

Changes:

Sanitize DisplayName, FullyQualifiedName, ErrorMessage, and ErrorStackTrace in HtmlLogger.TestResultHandler() before storing them in the HTML logger object model.
Add RemoveInvalidXmlChars(string?) helper to escape invalid XML characters as \uXXXX.
Add a unit test validating escaping for control characters in result fields.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
`src/Microsoft.TestPlatform.Extensions.HtmlLogger/HtmlLogger.cs`	Adds XML-invalid character sanitization for key serialized fields and introduces a helper used during result handling.
`test/Microsoft.TestPlatform.Extensions.HtmlLogger.UnitTests/HtmlLoggerTests.cs`	Adds coverage to ensure control characters are escaped instead of causing serialization failures.

+        // From xml spec (http://www.w3.org/TR/xml/#charsets) valid chars:
+        // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
+        const string invalidChar = @"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD]";
+        return Regex.Replace(str, invalidChar, m => $@"\u{(ushort)m.Value[0]:x4}");
+    }


+    /// <summary>
+    /// Removes characters that are invalid in XML 1.0 from a string.
+    /// </summary>
+    /// <remarks>
+    /// XML 1.0 valid characters: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD].
+    /// Control characters in the range #x00-#x08, #x0B, #x0C, #x0E-#x1F are not valid and
+    /// will cause <see cref="DataContractSerializer"/> to throw an <see cref="System.Xml.XmlException"/>.
+    /// Invalid characters are replaced with their Unicode escape representation.
+    /// </remarks>


    }

+    [TestMethod]
+    public void TestResultHandlerShouldSanitizeInvalidXmlCharsInDisplayName()


nohwnd

Review: Fix HTML logger exception on invalid XML chars

The fix correctly addresses the root cause — DataContractSerializer throwing on XML 1.0 invalid control characters in test display names. Two issues worth addressing before merge:

Correctness (surrogate pairs) — The regex excludes the entire surrogate range \uD800–\uDFFF, which means valid supplementary Unicode characters (emoji, etc.) in test display names are silently corrupted to \uD83D\uDE00-style sequences. Since DataContractSerializer handles valid surrogate pairs correctly, this sanitization is overly aggressive. See inline comment at line 476.
Performance (static regex) — The regex should be a static readonly compiled field rather than recreated per-call via the string overload. See inline comment at line 474.

The primary fix (replacing control chars \x00–\x1F minus the whitespace ones) is correct and the test coverage is good.

🧠 Reviewed by Expert Code Reviewer

🧠 Reviewed by Expert Code Reviewer 🧠

nohwnd · 2026-05-22T02:13:59Z

+        }
+
+        // From xml spec (http://www.w3.org/TR/xml/#charsets) valid chars:
+        // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]


[Correctness] Surrogate pairs (supplementary Unicode characters) are incorrectly stripped

The regex [^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD] has a gap between \uD7FF and \uE000 that covers the entire surrogate range \uD800–\uDFFF. In .NET strings, supplementary Unicode characters (U+10000–U+10FFFF, including all emoji) are encoded as surrogate pairs — two char values both in that range.

DataContractSerializer / XmlWriter handle surrogate pairs correctly and serialize them as the corresponding supplementary Unicode code point, which is valid XML 1.0 ([#x10000-#x10FFFF]). So DataContractSerializer will not throw on a string like "Test(😀)".

The current code will corrupt such strings: "Test(😀)" → "Test(\uD83D\uDE00)" — two broken escape sequences instead of the emoji. A test whose [DataRow] argument contains an emoji will have its display name silently mangled in the HTML report.

Suggested fix — allow valid surrogate pairs to pass through by checking for a complete surrogate pair before replacing:

private static readonly Regex s_invalidXmlCharsRegex = new Regex(@"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]", RegexOptions.Compiled); private static string? RemoveInvalidXmlChars(string? str) { if (str is null) return null; // Allow valid surrogate pairs (they represent U+10000–U+10FFFF, which are valid XML chars). // Only lone surrogates are truly invalid. return s_invalidXmlCharsRegex.Replace(str, m => m.Value.Length == 1 ? $@"\u{(ushort)m.Value[0]:x4}" : m.Value); }

Alternatively, use char.IsHighSurrogate/char.IsLowSurrogate in a manual loop, which is clearer and avoids nested regex complexity.

Fixed — the surrogate range \uD800-\uDFFF is now excluded from the negated character class in the first alternative, so valid surrogate pairs pass through untouched. Two additional alternatives handle lone surrogates (unpaired high or low) using lookahead/lookbehind. Added a test that verifies "Test(😀)" is preserved as-is after sanitization.

🔧 Iterated by PR Iteration Agent 🔧

nohwnd · 2026-05-22T02:13:59Z

+        {
+            return null;
+        }
+


[Performance] Regex is re-instantiated on every call via the string overload

Regex.Replace(str, invalidChar, ...) with a const string pattern looks up (and possibly compiles) the regex on every invocation via .NET's internal cache (size 15). This method is called 4× per test result; for a run with thousands of tests this accumulates.

Since the project targets netstandard2.0 and net48, [GeneratedRegex] isn't available, but a static readonly field with RegexOptions.Compiled eliminates the cache lookup overhead:

private static readonly Regex s_invalidXmlCharsRegex = new Regex( @"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD]", RegexOptions.Compiled);

Minor, but worth fixing given the hot-path nature of TestResultHandler.

Fixed — the regex is now a private static readonly Regex InvalidXmlCharsRegex with RegexOptions.Compiled, eliminating the per-call lookup overhead.

🔧 Iterated by PR Iteration Agent 🔧

…rs correctly - Use a static readonly compiled Regex instead of re-creating on every call - Exclude the surrogate range from the negated char class in the first alternative so valid surrogate pairs are not matched; add explicit lone-surrogate alternatives with lookahead/lookbehind to catch only invalid lone surrogates - Add test verifying emoji (valid surrogate pair) passes through unchanged Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

nohwnd · 2026-05-22T02:35:56Z

Commit pushed: 82ef8e5

🔧 Iterated by PR Iteration Agent 🔧

nohwnd

Both previous findings are resolved in this revision:

[Performance] Regex is now a private static readonly field with RegexOptions.Compiled — no per-call overhead.
[Correctness] The regex now correctly handles surrogate pairs: valid high+low surrogate pairs pass through unchanged; only lone surrogates are replaced with their \uXXXX escape. The three-alternative pattern is logically correct, including the edge cases of a high surrogate at end-of-string and a low surrogate at start-of-string.

Test coverage is solid — both the invalid control char case and the emoji (surrogate pair) preservation case are tested.

The fix is correct and complete. ✅

🧠 Reviewed by Expert Code Reviewer 🧠

🧠 Reviewed by Expert Code Reviewer 🧠

Copilot AI review requested due to automatic review settings May 22, 2026 01:56

nohwnd added the bug label May 22, 2026

Copilot started reviewing on behalf of nohwnd May 22, 2026 01:56 View session

Copilot AI reviewed May 22, 2026

View reviewed changes

nohwnd commented May 22, 2026

View reviewed changes

This was referenced May 22, 2026

Daily Maintenance Digest — 2026-05-25 #16049

Open

Fix HTML logger crash on invalid XML chars in test names #16052

Draft

Daily Maintenance Digest— 2026-05-26 #16066

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[fix] Fix HTML logger exception on invalid XML chars in test display names#16051

[fix] Fix HTML logger exception on invalid XML chars in test display names#16051
nohwnd wants to merge 2 commits into
mainfrom
fix/issue-10431-html-logger-special-chars-390ce20d95623f88

nohwnd commented May 22, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

nohwnd left a comment

Uh oh!

nohwnd May 22, 2026

Uh oh!

nohwnd May 22, 2026

Uh oh!

nohwnd May 22, 2026

Uh oh!

nohwnd May 22, 2026

Uh oh!

nohwnd commented May 22, 2026

Uh oh!

nohwnd left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nohwnd commented May 22, 2026

Root Cause

Fix

Test

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

nohwnd left a comment

Choose a reason for hiding this comment

Review: Fix HTML logger exception on invalid XML chars

Uh oh!

nohwnd May 22, 2026

Choose a reason for hiding this comment

Uh oh!

nohwnd May 22, 2026

Choose a reason for hiding this comment

Uh oh!

nohwnd May 22, 2026

Choose a reason for hiding this comment

Uh oh!

nohwnd May 22, 2026

Choose a reason for hiding this comment

Uh oh!

nohwnd commented May 22, 2026

Uh oh!

nohwnd left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants