Further XSS fixes in link attrs (#703) by Crozzers · Pull Request #705 · trentm/python-markdown2

Crozzers · 2026-05-14T21:44:12Z

Further fixes for #703, specifically the follow-up issues raised in this comment: #703 (comment)

Issue 1:
HTML-encoded colons function as normal colons in hrefs, because browsers decode character references in attribute values before the URL is interpreted. So `javascript&colon;alert()` is equivalent to `javascript:alert()`.

Fixed this by checking for these sequences alongside colons.
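A minimal Python sketch of the idea, using a hypothetical `effective_scheme` helper rather than the PR's actual code. Instead of matching each encoded-colon spelling alongside `:`, it swaps in a different technique: decode character references with the standard library's `html.unescape` first, then read the scheme. The `UNSAFE_SCHEMES` deny-list is also an assumption for illustration.

```python
import html
import re

# Hypothetical deny-list, for illustration only.
UNSAFE_SCHEMES = {"javascript", "vbscript"}

# U+0000..U+0020: C0 control characters plus space.
_C0_AND_SPACE = "".join(chr(c) for c in range(0x21))


def effective_scheme(href):
    """Return the scheme a browser would most likely see, or None."""
    # Browsers decode character references in attribute values, so
    # "&colon;", "&#58;" and "&#x3a;" all become ":".
    decoded = html.unescape(href)
    # The URL parser also drops ASCII tab/newline and trims leading and
    # trailing C0 control characters and spaces.
    decoded = re.sub(r"[\t\n\r]", "", decoded).strip(_C0_AND_SPACE)
    scheme, sep, _ = decoded.partition(":")
    return scheme.lower() if sep else None


def is_unsafe_href(href):
    return effective_scheme(href) in UNSAFE_SCHEMES
```

Decoding first means every spelling of the colon is handled by one code path, instead of a list of sequences that has to be kept in sync with the entity table.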

Issue 2:
The `safe_href` regex we use allows URL domains with ports. This was intended for links like `localhost:880/abcdef`, but it could be abused by making the JS look like a domain with a port, like so:

javascript:1/alert();
^ domain  ^ port
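To make the ambiguity concrete, here is a sketch with a hypothetical, simplified host-and-port pattern (`HOST_PORT_RE` is not markdown2's real `safe_href` regex). The naive check accepts both inputs; the stricter variant rejects a "host" that is really a script scheme, using an assumed `SCRIPT_SCHEMES` deny-list.

```python
import re

# Hypothetical, simplified "host:port" branch of a safe-href check.
HOST_PORT_RE = re.compile(r"^([A-Za-z0-9.-]+):(\d+)(?:/|$)")

# Hypothetical deny-list of schemes that execute script.
SCRIPT_SCHEMES = {"javascript", "vbscript"}


def naive_is_host_port(href):
    # "javascript:1/alert();" matches: "javascript" as host, "1" as port.
    return HOST_PORT_RE.match(href) is not None


def stricter_is_host_port(href):
    m = HOST_PORT_RE.match(href)
    if m is None:
        return False
    # A browser reads "javascript:1/alert()" as scheme + script body,
    # not host + port, so refuse "hosts" that are really script schemes.
    return m.group(1).lower() not in SCRIPT_SCHEMES
```

A deny-list only covers the schemes it names, so an allowlist of accepted schemes is usually the sturdier shape for this kind of check.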

Issue 3:
This one used some quirky markdown in an image title attr to escape the link. Fixed by hashing the title attr to prevent reprocessing, much like we do with `alt=`.
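A rough sketch of the hash-to-protect approach, using hypothetical `Stash`, `hash_text` and `toy_emphasis` helpers (the `md5-` prefix mirrors the placeholders visible elsewhere in this thread, but none of this is markdown2's actual code). The escaped title is swapped for an opaque key before later passes run, so a toy emphasis pass cannot rewrite it, and the key is swapped back at the end.

```python
import hashlib
import html
import re


def hash_text(s):
    # Hypothetical placeholder scheme: "md5-" plus a hex digest.
    return "md5-" + hashlib.md5(s.encode("utf-8")).hexdigest()


class Stash:
    """Swap finished attribute values for opaque keys, restore them at the end."""

    def __init__(self):
        self._values = {}

    def protect(self, value):
        key = hash_text(value)
        self._values[key] = value
        return key

    def restore(self, text):
        for key, value in self._values.items():
            text = text.replace(key, value)
        return text


def toy_emphasis(text):
    # Toy later pass: *x* -> <em>x</em>. Real passes are more involved.
    return re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)


def render_img(src, title, stash):
    # Escape once, then hide the value from every later pass.
    safe_title = stash.protect(html.escape(title, quote=True))
    return f'<img src="{html.escape(src, quote=True)}" title="{safe_title}" />'


stash = Stash()
out = stash.restore(toy_emphasis(render_img("x", 'a*b"c*', stash)))
# out == '<img src="x" title="a*b&quot;c*" />'
```

Without the stash, the same emphasis pass would inject `<em>` into the middle of the title attribute, which is exactly the kind of reprocessing the fix avoids.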

JorianWoltjer · 2026-05-15T17:31:23Z

Nice work, unfortunately the fuzzer found another bypass on this new branch 😅

Input:

![](`<A B="
" onerror="alert(origin)">`)

Output:

<p><img src="code&gt;&lt;A B="
" onerror="alert(origin)"&gt;&lt;/code" alt="" /></p>


Crozzers · 2026-05-23T09:59:54Z

> Nice work, unfortunately the fuzzer found another bypass on this new branch 😅

Managed to fix this. It turns out we were hashing the code and HTML spans and were meant to unhash them again during URL encoding, but the while loop was incorrect and didn't recursively unhash everything.
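The loop bug described in the commit message for 456f8a9 can be reproduced in isolation. This is a sketch with hypothetical `unhash_buggy`/`unhash_fixed` functions and a made-up placeholder stash, not the library's code; it assumes placeholders never expand into text that contains themselves, otherwise the fixed loop could run forever.

```python
def unhash_buggy(text, stash):
    orig = None
    while orig != text:
        for key, value in stash.items():
            text = text.replace(key, value)
        orig = text  # snapshot taken AFTER the pass: the loop always exits here
    return text


def unhash_fixed(text, stash):
    orig = None
    while orig != text:
        orig = text  # snapshot BEFORE any transformation
        for key, value in stash.items():
            text = text.replace(key, value)
    return text  # only exits once a full pass changes nothing


# Made-up placeholders: the outer one expands into text containing the inner one.
STASH = {
    "md5-inner": "<code>x</code>",
    "md5-outer": "<span>md5-inner</span>",
}
# unhash_buggy("md5-outer", STASH) -> "<span>md5-inner</span>"  (placeholder leaks)
# unhash_fixed("md5-outer", STASH) -> "<span><code>x</code></span>"
```

With the snapshot at the end, `orig != text` is always false on the second check, so the buggy version performs exactly one pass regardless of how deeply placeholders are nested.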

JorianWoltjer · 2026-05-23T11:21:52Z

Another bypass 😅

Input:

![x](<"`"![x][id]
[id]: x "<A B="" onerror="alert(origin)">`

Output:

<p>![x](&lt;"`"<img src="x &quot;&lt;A B="" onerror="alert(origin)"&gt;`" alt="x" /></p>

Crozzers · 2026-05-24T09:03:44Z

> Another bypass 😅

Another fix. That one was smuggling the XSS through link definitions, so I've changed the `_protect_url` function to do all the escaping (unhashing code and HTML spans, escaping bold/em), and `LinkProcessor` now runs all image and anchor URLs through it.
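A sketch of the single-choke-point idea, assuming a hypothetical `protect_url` that every inline, reference-definition and image URL would pass through. The step order (expand placeholders to a fixed point, HTML-escape, then hide `*` and `_` from later emphasis passes) is an assumption, and the numeric entities stand in for whatever escaping the library really uses.

```python
import html


def protect_url(url, stash):
    """Hypothetical single choke point for every link and image URL."""
    # 1. Expand code/HTML-span placeholders until nothing changes.
    orig = None
    while orig != url:
        orig = url
        for key, value in stash.items():
            url = url.replace(key, value)
    # 2. Escape characters that could close or break out of the attribute.
    url = html.escape(url, quote=True)
    # 3. Hide emphasis markers from later bold/em passes (illustrative entities).
    return url.replace("*", "&#42;").replace("_", "&#95;")
```

Routing link definitions through the same function as inline links means a payload like the one above gets identical treatment no matter which syntax carries it.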

JorianWoltjer · 2026-05-24T17:16:47Z

Took a few minutes more than usual this time, but the fuzzer pulled through again on the latest commit:

Input:

- [x]
   1. - [x]
___
[x](`")}<img src="x`" onerror="alert(origin)">
___

Output:

<ul>
<li>[x]
<ol>
<li><ul>
<li>[x]</li>
</ul></li>

</ol></li>
</ul>

<hr />

[x](`")}<img src="x`" onerror="alert(origin)">

<hr />


Crozzers · 2026-06-13T15:49:55Z

Apologies, it's been a few weeks. I haven't had much of a chance to look at this again.

> Took a few minutes more than usual this time

Maybe I'm winning then ;).

With this one, the issue was in how `_hash_html_blocks` works and how we output lists.

That list gets converted to this:

<ul>
<li>[x]
<ol>
<li><ul>
<li>[x]</li>
</ul></li>
</ol></li>
</ul>

The block hasher previously tried to figure out where this list started and ended by ONLY looking at the first tag on each line, since it's meant to run against neatly formatted HTML.

This list tripped it up because the sublists open with tags at the end of a line, so it hashed only the first half of the list:

md5-<hash>

</ol></li>
</ul>

Then, because of the tag imbalance and the HRs, it ended up hashing the rest of the sample, including the actual XSS payload, which protected it from processing and let it smuggle itself through.

Maybe the proper fix would be to change how we output lists, so that they're indented and formatted properly. However, this would change the output of 50+ test files, so to keep disruption and churn to a minimum I've changed the hashing process instead.

I've changed the process so that when it's looking at these blocks it will account for the tags not being at the exact start of the line.
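A simplified model of the end-of-block question, using a hypothetical `block_end` helper (markdown2's real `_hash_html_blocks` is regex-based and considerably more involved). It tracks `<ul>` depth either from the first tag at the start of each line only, as the old behaviour is described above, or from every tag on the line.

```python
import re

ANY_TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


def block_end(lines, tag, scan_whole_line):
    """Index of the line that closes the first <tag> block, or None (hypothetical)."""
    open_re = re.compile(rf"<{tag}\b[^>]*>", re.IGNORECASE)
    close_re = re.compile(rf"</{tag}\s*>", re.IGNORECASE)
    depth = 0
    for i, line in enumerate(lines):
        if scan_whole_line:
            tags = ANY_TAG_RE.findall(line)  # every tag on the line
        else:
            m = ANY_TAG_RE.match(line)  # old idea: first tag, line start only
            tags = [m.group(0)] if m else []
        for t in tags:
            if open_re.fullmatch(t):
                depth += 1
            elif close_re.fullmatch(t):
                depth -= 1
                if depth == 0:
                    return i
    return None


LIST_LINES = """<ul>
<li>[x]
<ol>
<li><ul>
<li>[x]</li>
</ul></li>
</ol></li>
</ul>""".splitlines()
```

On this list, the first-tag-only scan never sees the nested `<ul>` on the `<li><ul>` line, so it closes the block at the `</ul></li>` line (index 5), matching the partially hashed output quoted earlier. Scanning whole lines finds the real end on the last line (index 7).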

A small side effect of this change is that the output for the test case from #584 has changed. It still doesn't crash (which was the main point of that issue), but the output differs slightly because it's no longer hashed.

JorianWoltjer · 2026-06-14T08:29:19Z

Nice, another bypass on your Crozzers:further-xss-fixes branch:

Input:

<http://onclick=alert(origin)//![](x)>

Output:

<p><http://onclick=alert(origin)//<img src="x" alt="" />&gt;</p>Click me

Click on the text and it triggers the alert(origin).

Crozzers · 2026-06-14T15:10:51Z

Managed to get this one too. The issue was that the link processor was processing links even inside autolinks.

Added a check for this.
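A sketch of what such a check could look like, with hypothetical, simplified `AUTOLINK_RE` and `INLINE_IMG_RE` patterns: any image match that starts inside an autolink span is left as-is for the autolink pass to handle. It only covers images, and it is not the check that was actually added.

```python
import html
import re

# Hypothetical, simplified patterns; markdown2's real ones differ.
AUTOLINK_RE = re.compile(r"<(?:https?|ftp)://[^>\s]*>")
INLINE_IMG_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]*)\)")


def render_images(text):
    spans = [m.span() for m in AUTOLINK_RE.finditer(text)]

    def repl(m):
        # Skip any image whose match starts inside an autolink span.
        if any(start <= m.start() < end for start, end in spans):
            return m.group(0)
        src = html.escape(m.group(2), quote=True)
        alt = html.escape(m.group(1), quote=True)
        return f'<img src="{src}" alt="{alt}" />'

    return INLINE_IMG_RE.sub(repl, text)
```

Leaving the inner text untouched means the autolink pass later sees one URL string and can escape it as a whole, rather than receiving half-rendered HTML.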

nicholasserra · 2026-06-15T17:15:11Z

You folks are killin it, thank you. Ping me when you wanna land this chunk

JorianWoltjer · 2026-06-16T18:00:15Z

Aaaaand another one, took another few minutes now (also looks pretty complicated), maybe getting close 😆

Input:

---
* ```
    * ```
	
	x
```
---
```) <script>alert(origin)</script>```
"
---

Output:

<hr />

<ul>
<li><p>```</p>

<ul>
<li>```</li>
</ul>

<p>x</p>

<h2>```</h2></li>
</ul>

```) <script>alert(origin)</script>```
<h2>"</h2>

Crozzers · 2026-06-20T16:03:02Z

This one was easier. In HTML hashing we were checking for balanced tags but not accounting for the fact that a `<hr />` tag is balanced by itself.

Added a check for void elements that short-circuits and reports the tag as balanced.
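A sketch of the short-circuit, assuming a hypothetical `tag_is_balanced` helper that compares open and close counts for a block's first tag. The void element list is the one from the WHATWG HTML standard; the `check_void` flag exists only to show the old behaviour on `<hr />`.

```python
import re

# HTML void elements per the WHATWG HTML standard.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})

TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\b[^>]*>")


def tag_is_balanced(block, check_void=True):
    first = TAG_RE.search(block)
    if first is None:
        return True
    name = first.group(2).lower()
    if check_void and name in VOID_ELEMENTS:
        return True  # <hr />, <br>, ... never take a closing tag
    opens = closes = 0
    for m in TAG_RE.finditer(block):
        if m.group(2).lower() != name:
            continue
        if m.group(1):
            closes += 1
        else:
            opens += 1
    return opens == closes
```

Without the short-circuit, `<hr />` counts as one open tag with no matching close, so the checker keeps looking for an end that never comes.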

Weirdly enough, I remember looking at void elements recently, although I can't remember why. I don't think I added it in a previous PR? Maybe I'm going crazy.

JorianWoltjer · 2026-06-22T09:41:41Z

Another bypass using autolinks:

Input:

<http://onclick=alert(origin)//[`Click me`]()>

Output:

<p><http://onclick=alert(origin)//<a href="#"><code>Click me</code></a>&gt;</p>

Crozzers added 3 commits May 14, 2026 22:23

Fix XSS from HTML encoded colons in hrefs

3b96ec1

Fix XSS from making javascript: hrefs look like domains with ports

a11ce82

Fix onerror XSS in image title attr

82b4482

Crozzers marked this pull request as draft May 16, 2026 11:16

Crozzers added 2 commits May 23, 2026 10:51

Fix incomplete recursive unhashing of spans

456f8a9

Issue was a while loop comparison. We did `orig != text` but assigned `orig = text` at the end of the loop, where it should have been at the start, before any transformations take place

Update github actions versions

c173c12

Crozzers marked this pull request as ready for review May 23, 2026 09:58

Fix smuggling XSS into link def URLs

b0dd0b3

Fix HTML block hashing messing up if open/close tags not at start of line

c7a75f6

Fix links being processed within autolink syntax

e7b0ba1

Fix tag balance checkers not accounting for void tags

21eb34b

Crozzers wants to merge 9 commits into trentm:master from Crozzers:further-xss-fixes


3 participants
