The bug was found by Masato Kinugawa, and LiveOverflow made a video about it that went viral. It is well worth watching, and you can find a lot of other great material on his YouTube channel as well.
The video clearly explains the bug, but practice makes perfect, so we’ve created a tutorial challenge about it, where you can:
- Exploit the same vulnerability
- Play with the HTML parser of your browser
- See how the vulnerable version of the sanitizer worked
We also think it’s worth highlighting the key elements and main points of the story, which is the goal of this post.
Dealing with user input
Then let’s sanitize it on the client-side
It sounds really wrong - so why would Google do that? Because parsing HTML isn’t as easy as you may think. The specification is really complex, and implementations differ between browsers as well. Additionally, browsers don’t simply parse the HTML - they fix malformed code, complete missing tags, pay attention to headers and load external resources. Implementing and maintaining a library that mimics all of that would be really hard, especially given the different versions of the different browsers. Seen that way, letting the client do the parsing makes more sense.
But how to do it securely?
There’s a very special <template> tag which is perfect for the job. Its content is parsed, but not rendered. That means the browser does its magic (like fixing the missing closing tags), but it won’t execute scripts or load images. The basic concept of sanitizing HTML on the client is the following:
- Loading the user input into a <template> tag and letting the browser parse it
- Removing scripts and unwanted tags and attributes (by using a whitelist for example)
- The result can now be used securely in the HTML code
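The three steps above can be sketched in a few lines of JavaScript. This is only a minimal illustration of the concept, not the actual code of Closure or DOMPurify: the names naiveSanitize, isAllowedAttribute, ALLOWED_TAGS and ALLOWED_ATTRS, as well as the tiny allowlists themselves, are assumptions made up for this example.

```javascript
// Hypothetical allowlists for illustration; a real sanitizer needs far more care.
const ALLOWED_TAGS = new Set(["A", "B", "I", "P", "EM", "STRONG", "UL", "OL", "LI"]);
const ALLOWED_ATTRS = new Set(["href", "title"]);

// Pure helper: decide whether an attribute may stay on an element.
function isAllowedAttribute(name, value) {
  const lower = name.toLowerCase();
  if (!ALLOWED_ATTRS.has(lower)) return false; // drops onerror, onclick, style, ...
  if (lower === "href") {
    // Only accept a few URL shapes, so javascript: URLs are rejected.
    return /^(https?:|mailto:|\/|#)/i.test(value.trim());
  }
  return true;
}

// `doc` is expected to be a browser Document.
function naiveSanitize(html, doc) {
  // Step 1: let the browser parse the input without rendering or executing it.
  const template = doc.createElement("template");
  template.innerHTML = html;

  // Step 2: remove every tag and attribute that is not on the allowlist.
  for (const el of Array.from(template.content.querySelectorAll("*"))) {
    if (!ALLOWED_TAGS.has(el.tagName)) {
      el.remove();
      continue;
    }
    for (const attr of Array.from(el.attributes)) {
      if (!isAllowedAttribute(attr.name, attr.value)) {
        el.removeAttribute(attr.name);
      }
    }
  }

  // Step 3: hand the cleaned markup back as a string.
  return template.innerHTML;
}
```

In a browser console, calling naiveSanitize('<p onclick="x()">hi<script>alert(1)</script></p>', document) should drop both the onclick attribute and the script element. Note that the sketch returns a string, so the result gets parsed a second time when the page finally uses it.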
And here comes the <noscript> tag which is really special as well. The specification says:
The noscript element represents nothing if scripting is enabled, and represents its children if scripting is disabled. It’s used to present different markup to user agents that support scripting and those that don’t support scripting, by affecting how the document is parsed.
Inside the <template> element scripting is disabled, but in the browser (after using the sanitized HTML in the DOM) scripting is enabled. Combining this fact with the helpful behavior of the browsers (where they finish incomplete tags) led to the ultimate payload, which could’ve been used to bypass Google’s Closure and Cure53’s DOMPurify libraries (both are popular HTML sanitizers):
<noscript><p title='</noscript><img src=x onerror=alert(1)>'>
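Inside the template element, where scripting is disabled, the content of <noscript> is parsed as ordinary HTML, so everything between the single quotes, including </noscript> and the <img> tag, is just the value of the title attribute of the <p> element. The sanitizer sees nothing dangerous.
But when the same markup is used in a scripting-enabled context, the content of <noscript> is treated as raw text that ends at the first </noscript>, so the <img> tag that follows becomes a real element with a working onerror handler.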
And boom - the alert(1) is executed. It’s a really awesome example of how weird browsers can be and a motivation for all the bug bounty hunters out there. I bet almost none of us thought that we’d see a working XSS on the homepage of Google.
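The difference between the two contexts can be checked directly in a browser console. The sketch below is an illustration, not part of the original write-up: compareParsing is a made-up helper name, and the onerror handler is swapped from alert(1) to console.log(1), because assigning innerHTML on an element of the live document really does fire the handler.

```javascript
// Same shape as the payload above, with a harmless handler (assumption for the demo).
const PAYLOAD = "<noscript><p title='</noscript><img src=x onerror=console.log(1)>'>";

// Parse `html` in both contexts and report whether a real <img> element appears.
// `doc` is expected to be the Document of a page with scripting enabled.
function compareParsing(html, doc) {
  // <template> content belongs to an inert document: scripting is disabled there.
  const template = doc.createElement("template");
  template.innerHTML = html;

  // A <div> created by the live document is parsed with scripting enabled,
  // even though it is never attached to the page.
  const container = doc.createElement("div");
  container.innerHTML = html;

  return {
    imgInTemplate: template.content.querySelector("img") !== null,
    imgInLiveDom: container.querySelector("img") !== null,
  };
}

// Only runs where a DOM is available (for example the browser console).
if (typeof document !== "undefined") {
  console.log(compareParsing(PAYLOAD, document));
}
```

In a browser with JavaScript enabled, this is expected to report no <img> inside the template and one in the live DOM, which is exactly the gap a sanitizer working on the template cannot see.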
How could it have been avoided?
The funny thing is that they (at Google) knew about this attack vector already and fixed the code a long time ago. However, they didn’t add any unit tests, and when someone later reverted the fix, the build passed (because it didn’t break the non-existent test) and the vulnerability ended up in the production code. So I think the most important takeaway from the story is that tests are really important.
I’ve watched the video multiple times, read many comments about this vulnerability and played with the JS debugger for hours, but none of that answered what I think is the real question here: why was the user input parsed as HTML in the search engine at all? But even if that hadn’t been the case, the payload probably could’ve worked in Gmail.
Now it’s your turn to analyze the bug interactively by solving our tutorial challenge about it. Have fun!