Skip to content

Further XSS fixes in link attrs (#703)#705

Open
Crozzers wants to merge 9 commits into
trentm:masterfrom
Crozzers:further-xss-fixes
Open

Further XSS fixes in link attrs (#703)#705
Crozzers wants to merge 9 commits into
trentm:masterfrom
Crozzers:further-xss-fixes

Conversation

@Crozzers

Copy link
Copy Markdown
Contributor

Further fixes for #703, specifically the follow up issues raised in this comment: #703 (comment)

Issue 1:
For some reason html encoded colons function as normal colons in hrefs, so javascript:alert() is equal to javascript:alert().

Fixed this by checking for these sequences alongside colons.

Issue 2:
The safe_href regex we use allows for URL domains with ports. This was intended for links like localhost:880/abcdef but could be abused by making the JS look like a domain with a port, like so:

javascript:1/alert();
^ domain  ^ port

Issue 3:
This used some quirky markdown in a image title attr to escape the link. Fixed by hashing the title attr to prevent reprocessing, much like we do with alt=

@JorianWoltjer

Copy link
Copy Markdown

Nice work, unfortunately the fuzzer found another bypass on this new branch 😅

Input:

![](`<A B="
" onerror="alert(origin)">`)

Output:

<p><img src="code&gt;&lt;A B="
" onerror="alert(origin)"&gt;&lt;/code" alt="" /></p>

@Crozzers Crozzers marked this pull request as draft May 16, 2026 11:16
Crozzers added 2 commits May 23, 2026 10:51
Issue was a while loop comparison. We did `orig != text` but assigned `orig = text` at the end of the loop,
where it should have been at the start, before any transformations take place
@Crozzers Crozzers marked this pull request as ready for review May 23, 2026 09:58
@Crozzers

Copy link
Copy Markdown
Contributor Author

Nice work, unfortunately the fuzzer found another bypass on this new branch 😅

Input:

![](`<A B="
" onerror="alert(origin)">`)

Output:

<p><img src="code&gt;&lt;A B="
" onerror="alert(origin)"&gt;&lt;/code" alt="" /></p>

Managed to fix this. Turns out we were hashing the code and spans, and we were meant to unhash them again in the URL encoding, but the while loop was incorrect, and didn't properly recursively unhash everything.

@JorianWoltjer

Copy link
Copy Markdown

Another bypass 😅

Input:

![x](<"`"![x][id]
[id]: x "<A B="" onerror="alert(origin)">`

Output:

<p>![x](&lt;"`"<img src="x &quot;&lt;A B="" onerror="alert(origin)"&gt;`" alt="x" /></p>

@Crozzers

Copy link
Copy Markdown
Contributor Author

Another bypass 😅

Another fix. That one was smuggling the XSS through link definitions, so I've changed the _protect_url function to do all the escaping (unhashing code+html spans, escaping bold/em), and LinkProcessor should run all image and anchor URLs through it

@JorianWoltjer

Copy link
Copy Markdown

Took a few minutes more than usual this time, but fuzzer pulled through again on the latest commit:

Input:

- [x]
   1. - [x]
___
[x](`")}<img src="x`" onerror="alert(origin)">
___

Output:

<ul>
<li>[x]
<ol>
<li><ul>
<li>[x]</li>
</ul></li>

</ol></li>
</ul>

<hr />

[x](`")}<img src="x`" onerror="alert(origin)">

<hr />

@Crozzers

Crozzers commented Jun 13, 2026

Copy link
Copy Markdown
Contributor Author

Apologies, been a few weeks. Haven't had much of a chance to look at this again

Took a few minutes more than usual this time

Maybe I'm winning then ;).


This one, the issue was in how _hash_html_blocks works, and how we output lists.

That list gets converted to this:

<ul>
<li>[x]
<ol>
<li><ul>
<li>[x]</li>
</ul></li>
</ol></li>
</ul>

The block hasher would previously try and figure out where this list started and ended by ONLY looking at the first tag on the line, since it's meant to run against neatly formatted HTML.

This list would trip it up as the sublists open with tags at the end of the line, so hash the first half of the list:

md5-<hash>

</ol></li>
</ul>

And then because of the tag imbalance and the HRs it would end up hashing the rest of the sample, including the actual XSS bit, protecting it from processing and allowing it to smuggle itself through.

Maybe the proper fix would be to change how we output lists, so that they're indented and formatted properly. However, this would change the output of 50+ test files, so for least disruption and churn I've changed the hashing process instead.

I've changed the process so that when it's looking at these blocks it will account for the tags not being at the exact start of the line.


Small side effect of this change was the test case from #584 has changed. It doesn't crash still (which was the main meat of the issue) but the output has changed slightly as it's no longer hashed.

@JorianWoltjer

Copy link
Copy Markdown

Nice, another bypass on your Crozzers:further-xss-fixes branch:

Input:

<http://onclick=alert(origin)//![](x)>

Output:

<p><http://onclick=alert(origin)//<img src="x" alt="" />&gt;</p>Click me

Click on the text and it triggers the alert(origin).

@Crozzers

Copy link
Copy Markdown
Contributor Author

Managed to get this one too. Issue was the link processor was processing links even within autolinks.

Added a check for this

@nicholasserra

Copy link
Copy Markdown
Collaborator

You folks are killin it, thank you. Ping me when you wanna land this chunk

@JorianWoltjer

Copy link
Copy Markdown

Aaaaand another one, took another few minutes now (also looks pretty complicated), maybe getting close 😆

Input:

---
* ```
    * ```
	
	x
```
---
```) <script>alert(origin)</script>```
"
---

Output:

<hr />

<ul>
<li><p>```</p>

<ul>
<li>```</li>
</ul>

<p>x</p>

<h2>```</h2></li>
</ul>

```) <script>alert(origin)</script>```
<h2>"</h2>

@Crozzers

Crozzers commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

This one was easier. In HTML hashing we were checking for balanced tags, not accounting for the fact that a <hr /> tag is completely balanced by itself.

Added a check for void elements that short circuits and returns that it's balanced


Weirdly enough, I remember looking at void elements recently, although can't remember why. Don't think I added it in a previous PR? Maybe I'm going crazy

@JorianWoltjer

Copy link
Copy Markdown

Another bypass using autolinks:

Input:

<http://onclick=alert(origin)//[`Click me`]()>

Output:

<p><http://onclick=alert(origin)//<a href="#"><code>Click me</code></a>&gt;</p>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants