Fix HTML logger crash on invalid XML chars in test names #16052

Draft
nohwnd wants to merge 2 commits into microsoft:main from nohwnd:fix/html-logger-invalid-xml-chars

nohwnd commented May 22, 2026

Fix #10431

When test display names contain control characters that are invalid in XML 1.0 (\x01-\x08, \x0B, \x0C, \x0E-\x1F), DataContractSerializer throws an XmlException and the HTML report is silently broken: test entries with those characters are missing from the output.
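To check whether a particular display name falls into this category, something like the following could be used. It is a minimal sketch that relies on `XmlConvert.VerifyXmlChars` from `System.Xml` as a stand-in for the serializer's character check; the class and method names are hypothetical and are not part of the PR.

```csharp
using System;
using System.Xml;

// Hypothetical helper for illustration only; not taken from the PR.
public static class XmlCharCheckSketch
{
    // Returns true when every character (or surrogate pair) in the text
    // is a legal XML 1.0 character, false otherwise.
    public static bool IsValidXmlText(string text)
    {
        try
        {
            // VerifyXmlChars returns its argument when all characters are valid
            // and throws XmlException on the first invalid one.
            XmlConvert.VerifyXmlChars(text);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

Under these assumptions, `IsValidXmlText("Test(normal)")` returns true, while `IsValidXmlText("TestMethod(\x01value)")` returns false. A null argument is not handled here and would surface as an `ArgumentNullException`.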

Sanitize those characters before storing them in the HTML logger object model, the same approach TrxLogger already uses. Invalid chars are replaced with their \uXXXX escape representation. Valid surrogate pairs (emoji etc.) pass through unchanged.

Applied to DisplayName, FullyQualifiedName, ErrorStackTrace, and ErrorMessage in TestResultHandler.

Before: HTML report with control chars in test names (test entries missing, only Test(normal) shows up):

[Screenshot: before-broken]

After: control chars sanitized to \u0001, all three tests visible:

[Screenshot: after-fixed]

nohwnd and others added 2 commits May 22, 2026 08:44
When a test's DisplayName (e.g. from a DataRow attribute) contains XML 1.0
invalid control characters such as 0x01-0x08, 0x0B, 0x0C, 0x0E-0x1F,
DataContractSerializer throws XmlException and silently prevents the HTML
report from being generated.

Apply the same sanitization pattern already used by TrxLogger's XmlPersistence
to replace invalid XML characters with their Unicode escape representation
(e.g. \u0001) before they are stored in the HTML logger object model.

Fixes microsoft#10431

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rs correctly

- Use a static readonly compiled Regex instead of re-creating on every call
- Exclude the surrogate range from the negated char class in the first
  alternative so valid surrogate pairs are not matched; add explicit
  lone-surrogate alternatives with lookahead/lookbehind to catch only
  invalid lone surrogates
- Add test verifying emoji (valid surrogate pair) passes through unchanged

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
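The regex shape described in the second commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: the class name, member names, exact pattern text, and the lowercase hex format of the escape are all guesses made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; names and pattern are not copied from the PR.
public static class XmlSanitizerSketch
{
    // Built once and reused instead of being re-created on every call.
    private static readonly Regex InvalidXmlChars = new Regex(
        // 1) Any UTF-16 unit outside the XML 1.0 ranges. The surrogate block
        //    \uD800-\uDFFF is listed as allowed here so valid pairs are not matched.
        @"[^\x09\x0A\x0D\x20-\uD7FF\uD800-\uDFFF\uE000-\uFFFD]" +
        // 2) A high surrogate that is not followed by a low surrogate.
        @"|[\uD800-\uDBFF](?![\uDC00-\uDFFF])" +
        // 3) A low surrogate that is not preceded by a high surrogate.
        @"|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]",
        RegexOptions.Compiled);

    public static string Sanitize(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        // Every match is a single UTF-16 unit; write it as a literal \uXXXX escape.
        return InvalidXmlChars.Replace(
            input,
            m => "\\u" + ((int)m.Value[0]).ToString("x4"));
    }
}
```

With this sketch, `Sanitize("TestMethod(\x01value)")` yields `TestMethod(\u0001value)`, a string containing the emoji pair `\uD83D\uDE00` comes back unchanged, and a lone `\uD83D` is escaped as `\ud83d`.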
Copilot AI review requested due to automatic review settings May 22, 2026 06:45

Copilot AI left a comment

Pull request overview

This PR fixes a failure mode in the HTML logger where DataContractSerializer can throw XmlException (preventing HTML report generation) when test names or failure details include XML 1.0–invalid control characters. It introduces XML-character sanitization (aligned with the TRX logger’s approach) before persisting the HTML logger object model.

Changes:

  • Added XML 1.0 invalid-character detection (with surrogate-pair preservation) and sanitization to HtmlLogger.
  • Applied sanitization to DisplayName, FullyQualifiedName, ErrorMessage, and ErrorStackTrace in TestResultHandler.
  • Added unit tests covering invalid control character replacement and preservation of valid surrogate pairs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/Microsoft.TestPlatform.Extensions.HtmlLogger/HtmlLogger.cs | Sanitizes XML-invalid characters before serializing test result fields; adds regex + helper. |
| test/Microsoft.TestPlatform.Extensions.HtmlLogger.UnitTests/HtmlLoggerTests.cs | Adds unit tests for invalid XML control characters and surrogate-pair preservation. |

Comment on lines +251 to +273
    [TestMethod]
    public void TestResultHandlerShouldSanitizeInvalidXmlCharsInDisplayName()
    {
        // Characters like \x01 (SOH) are invalid in XML 1.0 and would cause DataContractSerializer to throw.
        var testCase = CreateTestCase("Pass1");
        testCase.FullyQualifiedName = "fully";
        testCase.Source = "abc/def.dll";

        var testResult = new ObjectModel.TestResult(testCase)
        {
            DisplayName = "TestMethod(\x01value)",
            ErrorMessage = "error\x02message",
            ErrorStackTrace = "stack\x03trace",
        };

        _htmlLogger.TestResultHandler(new object(), new Mock<TestResultEventArgs>(testResult).Object);

        var result = _htmlLogger.TestRunDetails!.ResultCollectionList!.First().ResultList!.First();

        Assert.AreEqual(@"TestMethod(\u0001value)", result.DisplayName);
        Assert.AreEqual(@"error\u0002message", result.ErrorMessage);
        Assert.AreEqual(@"stack\u0003trace", result.ErrorStackTrace);
    }
Comment on lines +471 to +474
/// XML 1.0 valid characters: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD].
/// Control characters in the range #x00-#x08, #x0B, #x0C, #x0E-#x1F are not valid and
/// will cause <see cref="DataContractSerializer"/> to throw an <see cref="System.Xml.XmlException"/>.
/// Invalid characters are replaced with their Unicode escape representation.
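The character ranges listed in this doc comment can also be read as a plain predicate over UTF-16 code units. The sketch below is a hypothetical diagnostic helper, not code from the PR: it returns the index of the first character that falls outside those ranges, treating a high surrogate immediately followed by a low surrogate as valid.

```csharp
using System;

// Hypothetical diagnostic helper for illustration; not part of the PR.
public static class XmlCharScanSketch
{
    // XML 1.0 Char production restricted to single UTF-16 units:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
    private static bool IsXml10Char(char c) =>
        c == '\x09' || c == '\x0A' || c == '\x0D'
        || (c >= '\x20' && c <= '\uD7FF')
        || (c >= '\uE000' && c <= '\uFFFD');

    // Returns the index of the first invalid UTF-16 unit, or -1 if none is found.
    public static int IndexOfFirstInvalid(string input)
    {
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];

            // A well-formed surrogate pair encodes a code point above #xFFFF; skip both units.
            if (char.IsHighSurrogate(c) && i + 1 < input.Length && char.IsLowSurrogate(input[i + 1]))
            {
                i++;
                continue;
            }

            // Lone surrogates land here too, because they are outside the allowed ranges.
            if (!IsXml10Char(c))
            {
                return i;
            }
        }

        return -1;
    }
}
```

For example, `IndexOfFirstInvalid("stack\x03trace")` returns 5, and a string made only of ordinary text and emoji returns -1.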

Development

Successfully merging this pull request may close these issues.

dotnet test html logger throws exception when using special characters in DataRow attributes.
