Further XSS fixes in link attrs (#703)#705
Conversation
|
Nice work, unfortunately the fuzzer found another bypass on this new branch 😅 Input: ">`)Output: <p><img src="code><A B="
" onerror="alert(origin)"></code" alt="" /></p> |
Issue was a while loop comparison. We did `orig != text` but assigned `orig = text` at the end of the loop, where it should have been at the start, before any transformations take place
Managed to fix this. Turns out we were hashing the code and spans, and we were meant to unhash them again in the URL encoding, but the while loop was incorrect, and didn't properly recursively unhash everything. |
|
Another bypass 😅 Input: ">`Output: <p>">`" alt="x" /></p> |
Another fix. That one was smuggling the XSS through link definitions, so I've changed the |
|
Took a few minutes more than usual this time, but fuzzer pulled through again on the latest commit: Input: - [x]
1. - [x]
___
[x](`")}<img src="x`" onerror="alert(origin)">
___Output: <ul>
<li>[x]
<ol>
<li><ul>
<li>[x]</li>
</ul></li>
</ol></li>
</ul>
<hr />
[x](`")}<img src="x`" onerror="alert(origin)">
<hr /> |
|
Apologies, been a few weeks. Haven't had much of a chance to look at this again
Maybe I'm winning then ;). This one, the issue was in how That list gets converted to this: <ul>
<li>[x]
<ol>
<li><ul>
<li>[x]</li>
</ul></li>
</ol></li>
</ul>The block hasher would previously try and figure out where this list started and ended by ONLY looking at the first tag on the line, since it's meant to run against neatly formatted HTML. This list would trip it up as the sublists open with tags at the end of the line, so hash the first half of the list: md5-<hash>
</ol></li>
</ul>And then because of the tag imbalance and the HRs it would end up hashing the rest of the sample, including the actual XSS bit, protecting it from processing and allowing it to smuggle itself through. Maybe the proper fix would be to change how we output lists, so that they're indented and formatted properly. However, this would change the output of 50+ test files, so for least disruption and churn I've changed the hashing process instead. I've changed the process so that when it's looking at these blocks it will account for the tags not being at the exact start of the line. Small side effect of this change was the test case from #584 has changed. It doesn't crash still (which was the main meat of the issue) but the output has changed slightly as it's no longer hashed. |
|
Nice, another bypass on your Input: <http://onclick=alert(origin)//>Output: <p><http://onclick=alert(origin)//<img src="x" alt="" />></p>Click meClick on the text and it triggers the |
|
Managed to get this one too. Issue was the link processor was processing links even within autolinks. Added a check for this |
|
You folks are killin it, thank you. Ping me when you wanna land this chunk |
|
Aaaaand another one, took another few minutes now (also looks pretty complicated), maybe getting close 😆 Input: ---
* ```
* ```
x
```
---
```) <script>alert(origin)</script>```
"
---Output: <hr />
<ul>
<li><p>```</p>
<ul>
<li>```</li>
</ul>
<p>x</p>
<h2>```</h2></li>
</ul>
```) <script>alert(origin)</script>```
<h2>"</h2> |
|
This one was easier. In HTML hashing we were checking for balanced tags, not accounting for the fact that a Added a check for void elements that short circuits and returns that it's balanced Weirdly enough, I remember looking at void elements recently, although can't remember why. Don't think I added it in a previous PR? Maybe I'm going crazy |
|
Another bypass using autolinks: Input: <http://onclick=alert(origin)//[`Click me`]()>Output: <p><http://onclick=alert(origin)//<a href="#"><code>Click me</code></a>></p> |
Further fixes for #703, specifically the follow up issues raised in this comment: #703 (comment)
Issue 1:
For some reason html encoded colons function as normal colons in hrefs, so
javascript:alert()is equal tojavascript:alert().Fixed this by checking for these sequences alongside colons.
Issue 2:
The safe_href regex we use allows for URL domains with ports. This was intended for links like
localhost:880/abcdefbut could be abused by making the JS look like a domain with a port, like so:Issue 3:
This used some quirky markdown in a image title attr to escape the link. Fixed by hashing the title attr to prevent reprocessing, much like we do with
alt=