Acid 2 screenshots
This is a series of screenshots of current major browsers (or browsers that pass), rendering the Acid 2 test. Each can be compared with the screenshots of Internet Explorer's current releases - currently the most popular browser, and also the one with the poorest support for Internet standards. Congratulations to everyone who passes: Safari, Prince, Opera, iCab, Konqueror, Mozilla/Firefox, Obigo, Tkhtml, and finally Internet Explorer. You help to make the Web a more reliable place for Web developers.
iCab and Konqueror almost passed (and claimed to pass) before Opera, but they both failed to apply one of the styles required by the test, and as a result they displayed a scrollbar even though they should not have. This was fixed in later releases, after Opera 9 was released. For more information, see my notes about the scrollbar below.
Notes about the test:
- The forehead is a fixed position element, so it will stay still when you scroll. That is correct behaviour. The test only lines up when the page is at the position it jumps to after you click the link. That is the correct response.
- The test may not pass if you make your minimum font size too big, or if you zoom, etc. The test itself is intentionally designed to only work assuming a 'normal' setup, so this is expected behaviour. Complain to the authors if you do not like it :)
- Some of the CSS used in the test is invalid. This is deliberate. CSS has very strict error handling rules, and part of the test checks if the browser follows these rules correctly.
- Resizing the window may cause text at the top to wrap, and scroll the page a little, so you will need to click the link again - see note 2.
- Making the window very narrow will make parts of the test become too narrow (such as the chin) - this is because they use the table layout algorithm, and again, this is expected behaviour.
- When the test was written, no browser passed. The test was intentionally written to ensure that it picked up on bugs in all browsers.
- The nose is 1 px smaller in Safari and Konqueror than it is in the reference rendering, and is 1 px offset in Firefox 3 compared with the reference rendering (which itself was prepared in Firefox with lots of hacks for Firefox's bugs, just so you know). According to the author of the test, this does not constitute a failure, since this behaviour relating to border intersection is undefined, and either response is correct.
- To anyone who claims that browsers should all pass already because CSS is easy to implement: when you have written your own engine that passes the test, and still manages to render the majority of Web pages correctly, then you can try claiming it was easy ;) Until then, just accept that passing this test is certainly not easy.
Opera
Opera is produced by Opera Software, and is their main browser release. Opera 9 passes the Acid 2 test, making it the second browser to do so, and the first browser for Windows or Linux/UNIX to pass. It was also the first Acid 2-passing browser available for download on all of Windows, Linux/UNIX and Mac. Although it is the second browser to pass, Opera is the third application to pass, as Prince - an application for converting XML/HTML+CSS into PDF - was second, after Safari.
The biggest jumps were clearly from 3 to 3.6 (when CSS support was introduced), and from 6 to 7 (when the new Presto engine was used). It is somewhat worrying that IE 6 renders Acid 2 very similarly to Opera 3.6, and that the hyped IE 7 renders it about as well as Opera 4 in terms of number of mistakes (Opera 4 was released in 2000, 6 years before IE 7).
Notes:
- The screenshots are taken with the mouse hovering rows 6-9, to show if the :hover selector is recognised, so yes, the nose will be blue as required by the test. It will be black if the mouse is not hovering those rows. Try using Opera yourself if you need clarification.
- The test will not work if you enable small screen rendering or Fit To Width (with a narrow window). That is intentional behaviour.
- Rendering the Acid 2 test on mobiles is more difficult. The Acid 2 test itself is too big to fit on most mobile screens (meaning the chin gets broken), and scrolling to see it all (correctly) breaks the forehead. You will either need a high resolution device, such as the one shown here (with 'fit to screen' reformatting disabled), or you will need to:
- Use Opera's zoom feature to zoom out (preferably to a convenient number such as 50% to prevent rounding errors).
- Disable Opera's small screen or 'fit to screen' reformatting (since the test itself does not have any CSS specifically targeting handheld media).
- You might also have to use fullscreen mode, if it is available.
- Activate the 'Take The Acid2 Test' link only after you have done these other things.
The test itself was published in early April 2005, just a week before the release of Opera 8. For what it's worth, Opera 4 was the first version to pass the original (CSS1) Acid test, which was published around the same time as Opera 3, in 1997. As far as I can tell, it was the third browser to pass that test.
- Opera 1 - Sorry fans, I cannot send you a copy
- Opera 2
- Opera 3
- Opera 3.6
- Opera 4
- Opera 5
- Opera 6
- Opera 7
- Opera 7.5 (the eyes showed up when I took the screenshot, but I have not been able to get them to appear afterwards - go figure)
- Opera 8 - The Acid 2 test was published just a week before Opera 8.0 was released
- Opera 8.5
- Opera 9 - See the notes at the top of the section about the colour of the nose
- Opera Mobile 9 on Series 60 - The first mobile browser to pass the Acid 2 test (the hover effect will work on devices that have a mouse)
KHTML/WebKit Konqueror and Safari
These browsers have a somewhat mixed lineage. The engine originally started out in Konqueror - the file manager of the Linux/UNIX KDE desktop environment, which also included a browser. At version 3.3, Apple made their own port for use on Mac. The two are now developed independently, but are sometimes able to share code or algorithms. Safari is generally further ahead than Konqueror, but is tied very heavily into the OS functionality. Both engines can be used by other browsers on the same platforms, but this is not done very often (OmniWeb being one of the few adopters).
Safari passed Acid 2 before Konqueror, and in fact, before any other browser. It was also the first public final release to pass, at version 2.0.2. Konqueror 3.5 claimed to pass third, but although it got the Acid 2 face correct, it failed to hide the viewport scrollbar, meaning that it failed the test (although that went unnoticed for a while, until after the original publication of this article). Konqueror 3.5.2 (public final) passes the test, making it the fourth browser to pass, after Safari, Opera, and iCab.
Konqueror 3 was the first KHTML browser to pass the original (CSS1) Acid test, which was published 3 years before the first Konqueror browser was released. Acid 2 was published while Konqueror was at version 3.4, at about the same time as the Safari 1.3 release.
- OmniWeb 4.2 - a completely different rendering engine to the others (included only for the sake of interest), and it is easy to see why it was abandoned in favour of the Safari engine in OmniWeb 4.5
- Konqueror 3.1
- Konqueror 3.3 - just after Apple branched it to create Safari
- OmniWeb 4.5, using the same engine version as Safari 1.0
- Safari 1.3
- Safari 2.0.2
- Konqueror 3.5.2
Supposedly the Series 60 browser development builds should also pass the Acid 2 test, making it the third mobile browser to do so, but I have been unable to compile it to prove this for myself.
Several other browsers use the same rendering engine, and as such they will pass when they update their engines to the passing version (such as Shiira, OmniWeb, and RapidWeaver). I will not list them individually, since they all reuse each other's work, instead of passing the test for themselves. They are all treated as a single program.
Prince
Prince is special, since it is the first (and currently only) non-browser to pass the Acid 2 test. It is an application that converts CSS styled XML or HTML into PDF. It presents some unique problems with the test itself, since it does not have a scrolling canvas. As a result, it is hard to see how the fixed position element is rendered (perhaps it is drawn on all target pages), or how the correct offset is obtained to ensure it lines up correctly (this may be by pure chance, based on the position of the page breaks).
At least it does not have a problem with a scrollbar, since paper - the target medium - does not have such an interface. The target medium also does not have an interactive pointing device, so the hover effect on the nose does not work, but the test is supported to the maximum extent of the target medium. This allowed Prince 5.1 to be the second program to pass the Acid 2 test, after Safari and ahead of Opera. (It was the fourth program to claim to pass, after Safari, iCab and Konqueror - see the Konqueror and iCab sections for more details).
Prince 5.1 is also the first version to pass the original (CSS1) Acid test, which was published before Prince even existed. It fails to render the radio inputs, but that is a limitation of the target medium (radio inputs are not available in PDF or on paper). Earlier versions may have been capable of passing the CSS part of the Acid 1 test, but they could only parse XML, not the HTML used by the Acid 1 and 2 tests.
As well as the limitations of the target medium, Prince is also unique in that it does not actually display the test itself; it converts it into another file format. The test then has to be rendered by a printer, or a PDF viewer or plugin. This means that although its own conversion of the test is correct, how it appears is at the mercy of the application being used to display or print the PDF.
In reality, Prince helps to show the rendering problems of PDF viewers. None of the many applications I tried were capable of rendering the PDF correctly. Most apply ugly antialiasing that breaks the image, some leave narrow gaps over parts of the image, some do not apply the image transparency, some do not align the image layers correctly, some make parts of the image too large, and some make the image too small at 100% scale (some failed to display anything at all). In other words, it was not possible to obtain a perfect screenshot. The best rendering was provided by Acrobat 5 at 132.5% scale. See the Prince Acid 2 FAQ for more information. The final screenshot was taken by combining unbroken parts of several renderings, with careful zooming and scaling to make viewers provide the least broken version of the output.
- Prince 5.0 - debugging output showing that it cannot render HTML
- Prince 5.0 using an XHTML version of the test - this shows some of the antialiasing problems (fuzzy edges of black parts), odd lines, and oversized parts (several black borders are 1px too big) caused by the PDF viewer (Acrobat 7)
- Prince 5.1 output rendered by Acrobat 5 - several rounding errors are visible in Acrobat's scaling; many of the black bars are 1px too big, and the eyes are 1px too narrow
- Prince 5.1
Try the PDF output, produced by Prince 5.1.
iCab
iCab is an independent browser written by only two developers: Alexander Clauss (everything except the JavaScript engine) and Thomas Much (the JavaScript engine). The browser itself is relatively unknown, and has spent its life in a perpetual beta stage. Although its development has been a little less steady than that of other browsers (the gap between 2 and 3 was exceptionally long), the overall increase in standards support makes up for the delays, and the jump between 2 and 3 is quite staggering, clearly far larger than any single jump made by the other browsers.
It passed the Acid 2 test earlier than several of the more popular browsers. In part this was due to the lack of pressurised release schedules and a very small user base, but it was also because, unlike those browsers, it did not have to worry about regressions (where a page that used to work stops working), as very few pages worked properly in it before version 3 anyway.
iCab was the second browser to claim to pass Acid 2, but although it got the Acid 2 face correct, it failed to hide the viewport scrollbar, meaning that it failed the test (although that went unnoticed for a while, until after the original publication of this article). iCab 3.0.2b400 now has a preference that can be used to allow the test to hide the scrollbar, making it the third browser to pass the test, after Safari and Opera. iCab 3 is also the first version to pass the original (CSS1) Acid test, which was published before iCab existed on Mac - at that time it was the Atari CAB browser. Acid 2 was published while iCab was at version 2.9.
Note that for some users, upgrading from iCab 2 to iCab 3 may cause the test to fail, because their default font sizes in iCab 2 were too large to allow the test to pass. If this affects you, use iCab - Preferences - Fonts & Language - Factory Settings, then reload the test.
- iCab 2 - this rendering is based loosely on CSS 1 with many mistakes - very artistic; to me, looks either like a hand reaching out of a red sea, or a badly drawn character from Henry's Cat
- iCab 3
Although iCab 3 passed the test independently, iCab 4 has switched to using Safari's engine.
Mozilla/Firefox/Netscape
The Mozilla browsers are fairly well known, mainly due to the fact that the Mozilla project is derived from the original Netscape browser (although the engine was rewritten). Once the great innovator of the Web, Netscape failed badly with Netscape 4, and the Mozilla project started instead. The long and slow release cycles allowed Internet Explorer to storm into the lead and become the most commonly used browser, then forget about keeping up to date, and yet still remain the most used, despite now being behind the others. The main Mozilla browser is now Firefox, but the Mozilla suite, Netscape, and countless other interfaces are also available. Firefox 1.5 is slightly ahead of the Mozilla suite 1.8.
Firefox 2.0 was originally planned to use the 1.6 engine branch, but this was abandoned as it would have delayed the release, meaning that Firefox 2 has the same engine as 1.5. The Reflow branch has been developed further than the original 1.6 branch, and passes the Acid 2 test, making it the fifth browser to do so. This branch is in all Firefox 3 builds after alpha 1. The screenshot was prepared in the first reflow branch builds that passed, compiled from the CVS source; they were certainly nowhere near ready for public use yet (images failed to appear on many pages, several form inputs did not work, layout did not cope with resizing, parts of the page or interface randomly appeared and disappeared, and fixed position elements jumped and jittered when the page was scrolled). The released version 3 passes the test without breaking the existing functionality.
Mozilla 1 was the first version (and as far as I can tell, the first browser) that passed the original (CSS1) Acid test, which was published about the same time as Netscape 4.0 (over a year before the version shown here). Acid 2 was published while Firefox was at version 1.
- Netscape 1
- Netscape 2
- Netscape 3
- Netscape 4.8 (I edited the test to allow Netscape 4 to use the CSS, since its broken error handling means it would normally ignore it completely)
- Mozilla 1
- Firefox 1
- Firefox 1.5 and proposed 2.0
- Unused Firefox 1.6 branch
- Firefox reflow branch (proposed Firefox 3)
Several other browsers use the same rendering engine, and as such they will pass when they update their engines to the passing version (such as Camino). I will not list them individually, since they all reuse each other's work, instead of passing the test for themselves. They are all treated as a single program.
Obigo
Obigo is a relatively unknown mobile browser. Teleca recently announced that internal versions pass Acid 2, making it the sixth browser to do so after Safari, Opera, iCab, Konqueror, and Firefox. It is the second mobile browser to pass, after Opera Mobile. It is the seventh program to pass (Prince was second).
They have a small Flash movie demonstrating that they pass, and claim it to be flawless. I cannot verify if it is flawless, as their movie is very low resolution, and seems to have been taken with a hand-held camera instead of a proper screenshot, so a mistake of a few pixels would not be distinguishable. For now, I will take them at their word, and assume they pass.
Note, however, that users have reported - with screenshots - that subsequent releases clearly fail the test. Without knowing the version number that is expected to pass, it is not possible to know whether the failing version is one of the versions that Teleca claimed should pass. It must be noted, however, that the released version fails due to only one remaining bug, and it is unlikely to be a build created from a branch taken just before the last bug needed to pass was fixed. It is more probably a subsequent release that was supposed to pass, but has in fact regressed to a point where it now fails. It therefore seems safe to assume that Obigo does not, in fact, pass the Acid 2 test, and only briefly passed in an internal build that was never publicly released. Depending on your definition, this could constitute an overall fail.
Note that I cannot produce screenshots of any versions, as there is no freely available SDK that I could find. No version numbers are given.
- Obigo internal build
- Obigo public released build clearly failing the test, caused by the object fallback not being correctly triggered by the 404 error page. (The chin is also too narrow, but this does not constitute a failure, as it relates to the narrow window of the mobile device and the automatic table layout algorithm - see the notes about the test above.)
Tkhtml Html Viewer
Tkhtml is a rendering engine designed to be used by Tcl applications. The Html Viewer (misnomer) is the demonstration browser using that engine. Tkhtml2 was a part of the ActiveTcl distribution. Tkhtml3 is a new engine designed to bring Tkhtml up to date. Currently, the engine is incapable of running complex JavaScript (so many Web pages are unusable), but its CSS is surprisingly good.
Version 3 passes the Acid 2 test, making it the seventh browser to do so, after Safari, Opera, iCab, Konqueror, Firefox, and Obigo. It is the eighth program to pass (Prince was second). It is also the first version that passes the original (CSS1) Acid test, which was published before Tkhtml existed. Acid 2 was published while Tkhtml was at version 2.
The first version to pass Acid 2 had several problems with the parts of CSS 2 that Acid 2 is designed to test: its generated content support was very problematic, it sometimes failed to apply selectors that it normally understands, it could not combine certain selectors, it could not use overflow correctly, and it failed with many table styles, font and text spacing settings, the advanced box model (for replaced elements), layering of positioned elements and text, and even borders. It is amazing that it passed Acid 2 at all, but it seems that the bare minimum was implemented to allow it to pass, before work started on the other applications of the styles the test uses. As a result, many pages worked very well, but many looked very bad (most pages on this site showed several problems with that version of the Tkhtml engine). A few of these problems have since been fixed (escape characters in CSS strings, ignoring unknown media blocks, and combining selectors), so that most - but not all - of this site now works, but it still has trouble with non-visible overflow, incorrectly layers text on top of positioned elements, and cannot work with images or counters in generated content.
Internet Explorer on Windows
Before IE 8, this was the only major browser not to pass the test in a public final release, and IE 6 and 7 do not just fail, they fail quite spectacularly. Early on, IE was one of the great innovators. It was the first browser to support CSS and DOM, but since Microsoft stopped updating it in IE 6, it has watched the other browsers fly past it and become much more accomplished at Internet standards. The few changes in IE 7, released in late 2006, do not bring it much closer to passing. The CSS 2 selectors and the min-/max-width and min-/max-height properties seem to make a slight curve at the top, but the rendering is still a long way off. IE is showing its age, and it has a lot of catching up to do. Sadly it is taking much longer than the other browsers, and IE will continue to drag the Web down for as long as it remains so common.
Internet Explorer 8 passes the Acid 2 test, making it the eighth browser to do so, after Safari, Opera, iCab, Konqueror, Firefox, Obigo, and Tkhtml. It is the ninth program to pass (Prince was second). However, IE 8 beta 1 has a bug that causes it to fail when the test is hosted on other sites - specifically, the object fallback fails when it references a file on an external site. This counts as a pass in my book, since it passes on the original site. However, it could be considered a fail, depending on your viewpoint. It is clearly less than ideal, but at least they are finally taking the test seriously, and it passes on any site in beta 2.
IE 6 was the first version to pass the original (CSS1) Acid test, which was published about the same time as IE 4. Acid 2 was published while IE was at version 6, and was written as a direct challenge to all browsers, but especially IE (since it had fallen far behind the others, but was the most popular).
- Internet Explorer 3, which supposedly supported CSS
- Internet Explorer 4 - one of the most artistically wrong renderings
- Internet Explorer 5
- Internet Explorer 5.5
- Internet Explorer 6
- Internet Explorer 7 (if the browser window is any bigger, the forehead becomes detached from its correct position, due to a missing margin at the bottom of the page - I decided to be nice to IE 7 when taking the main screenshot)
- Internet Explorer 8 (beta 1)
- Internet Explorer 8 (beta 1) with the test hosted on its new home, showing the failure. Subsequent public releases now pass the test on both the old and new URLs.
(Internet) Explorer on Mac
The Mac version of Internet Explorer has been discontinued, and is no longer a major browser, having been replaced by Safari. It uses a different engine from the Windows version, and responds very differently to the Acid tests. Originally reasonably good at CSS, it quickly became outdated and problematic, and it was a welcome change when Safari was announced and IE Mac was discontinued. In version 5, it was (as far as I can tell) the second browser to pass the original (CSS1) Acid test, which itself was published while Mac IE was at version 3. Acid 2 was published a few years after Mac IE was discontinued at version 5.2.
- (Internet) Explorer 4 Mac
- (Internet) Explorer 4.5 Mac
- (Internet) Explorer 5.2 Mac (I edited the test to allow IE 5 Mac to display the page, since its mishandling of overflow normally makes the page completely blank)
Notes about the scrollbar
Are browsers allowed to show a scrollbar on the test?
Part of the Acid 2 test is the style that hides the scrollbar on the viewport (that is the browser window to you and me). The Acid 2 guide neglects to mention this style, and it is not in the reference rendering provided by the guide, as it concentrates only on the face, but the guide is only (as its name suggests) a guide, and is not the official test. There can be absolutely no doubt that the scrollbar is a part of the test, since it is mentioned inside a comment in the source code of the test itself:
html { ... overflow: hidden; /* hides scrollbars on viewport, see 11.1.1:3 */ ... }
(In case you are wondering, the spec paragraph referenced by the comment only describes how overflow propagates to the viewport; it does not say what the actual effect of that should be.) Some people have suggested that a browser does not have to support this to pass the test. Their reasoning is as follows:
- The CSS 2.1 spec says that with hidden overflow, "no scrolling user interface should be provided".
- The CSS 2.1 spec says that "The key word[..] [...] 'SHOULD' [...] in this document [is] to be interpreted as described in RFC 2119 [...]. However, for readability, [this] word[..] do[es] not appear in all uppercase letters in this specification".
- RFC 2119 states that "SHOULD: This word, or the adjective 'RECOMMENDED', mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course".
- Theoretically, if a browser vendor thought long and hard about it, and decided that allowing the scrollbar to be removed causes accessibility problems (leaving the user with a potentially confusing interface), they could show a scrollbar, and still be considered CSS 2.1 conformant.
However, the test is not about finding loopholes. It is about trying to get a reliable response across browsers, so that authors can use a style or technology, and know exactly how it will work. It is not (just) about CSS 2.1. Should a browser that fails to support PNG transparency still be allowed to claim to pass, since that is not in the CSS spec? Should a browser that fails to support data URIs still be allowed to claim to pass, since they are not in the CSS spec either? No, of course not, because whether or not they are in the spec is irrelevant. They are still part of the test.
Here is a simple example: I have prepared a simple test that uses styles from CSS 1 (and requires HTML and image support). The ethical question you have to ask yourself is: does Internet Explorer 6 pass the test? It's simple enough. By all means read the source, and check the relevant part of the CSS 1 spec, which states "UAs may treat 'fixed' as 'scroll'. However, it is recommended they interpret 'fixed' correctly, at least on the HTML and BODY elements".
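To make that loophole concrete, here is a hypothetical rule of the kind CSS 1 leaves open. It is not taken from the test linked above; the selector `div.panel` and the image file name are made up purely for illustration.

```css
/* Hypothetical illustration only; not the source of the test above.
   The fixed background is on an ordinary element, not on HTML or BODY. */
div.panel {
  background-image: url(pattern.png); /* hypothetical image file */
  background-repeat: no-repeat;
  background-attachment: fixed; /* CSS 1: UAs "may treat 'fixed' as 'scroll'" */
}
```

A browser that renders this background as if it were 'scroll' still conforms to CSS 1, since the recommendation to honour 'fixed' only covers the HTML and BODY elements. Whether it passes a test that expects the background to stay in place is a different question.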
In my very strong opinion, no, it does not pass, because the question was "does Internet Explorer 6 pass the test", not "does it conform with the relevant parts of the CSS 1 specification". It did not meet the pass criteria for the test. The test tests what it wants to test. The test is not the spec. If you don't like it, ask the authors of the Acid 2 test to change the test.
Note, if you disagree, please do not email me saying you disagree. The question is one for you to solve for yourself. I have already stated my opinion, and you are unlikely to change it.
In the case of the Acid 2 test scrollbar, being able to hide it is genuinely useful: you can then render the body separately from the HTML element, and display a scrollbar on the body instead.
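A minimal sketch of that technique, assuming a page that wants its scrollbar on the body rather than on the viewport (the height values are one possible choice, not the only one):

```css
/* The root element's non-visible overflow is applied to the viewport,
   so the viewport shows no scrollbar (CSS 2.1 section 11.1.1). */
html {
  height: 100%;
  overflow: hidden;
}

/* Since the root's overflow is not 'visible', BODY's overflow is not
   propagated to the viewport; BODY clips and scrolls its own content. */
body {
  height: 100%;
  margin: 0;
  overflow: auto;
}
```

If the html rule used overflow: visible instead, CSS 2.1 says user agents should apply the body's overflow: auto to the viewport, which would put the scrollbar back on the browser window.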
There was also the suggestion that a browser's claim to pass should not be discounted because they failed to apply one style. Well, this sample shows just one style error (incorrect float and inline layering, in Opera 9 Preview 2), and this sample also shows just one style error (a failed margin collapse). Neither of those would be considered a pass either. This is not a school exam where 60% is a pass grade. It is 100% or nothing. Almost passing is not passing.
Some people also suggested that since the WaSP (hosts of the test) said the browser passed, that browser passed. In most cases, the WaSP took the browser vendors at their word (the vendors believed they passed, after all, so that is what they told the WaSP). However, in the case of Konqueror, there was a screenshot showing a scrollbar. As a simple response: the WaSP are humans too, and like the vendors, some of them missed the style requiring the scrollbar to be hidden. And in case you were wondering, the person who announced that iCab and Konqueror had passed on the WaSP blog was not the author of the test. They make mistakes. Get over it.
How can they scroll without a scrolling interface?
That's right, the test requires the page to scroll when you click on the link, even though no scrollbar should be provided. I have spoken with the test author about this, and generally, it is included because that is what many browsers did. It is not specified explicitly anywhere.
However, the CSS 2.1 specification says "no scrolling user interface should be provided". It never said that the browser could not scroll without an interface. HTML 4.01 says that "By activating these links [...], users may visit these resources", meaning that the browser needs to scroll the page to allow the user to visit that resource. OK, it is not very well specified in the specs, but nonetheless, it is in the test, and it is what is in the test that is important when you are trying to pass the test.
The official guide to the test goes on to say that the page may also be scrolled (implying that this can be done manually by the user, even though they have no scrollbar): "If the Acid2 page is scrolled, the scalp will stay fixed in place, becoming unstuck from the rest of the face, which will scroll."
What is a scrolling mechanism?
CSS 2.1 is not very explicit as to what constitutes a scrolling mechanism. It is mentioned in section 11.1.1 as "a scrolling mechanism that is visible on the screen (such as a scroll bar or a panner) [...] In the case of a scrollbar being placed on an edge of the element's box, it should be inserted between the inner border edge and the outer padding edge. Any space taken up by the scrollbars should be subtracted from the computed width/height, thus preserving the inner border edge."
However, the browsers (Safari, Opera, iCab, Konqueror, Firefox, and Tkhtml [and Internet Explorer] - Obigo not tested) still allow panning and/or scrolling with the arrow keys or mousewheel. So how can they pass? Well, for a start, the CSS 2.1 spec neglects to say what exactly should be hidden. It mentions that "no scrolling user interface should be provided". Different wording. Interface vs Mechanism.
Like many, I was confused by this, so I asked an author of the spec (who happens to be the main author of the test). It was apparently intended to refer to an initially visible mechanism, such as a scrollbar. Panners are not initially visible, and key scrolling does not have any visible indicator. It needs some better wording I think, but welcome to reality.
But even if it is changed to say that these ways to scroll are not allowed (meaning that the browsers would be slightly violating the CSS 2.1 spec), that would not matter to the passing of the Acid 2 test. Why? Because the test explicitly asks for them to "hide[..] scrollbars on viewport". They do that correctly. Meaning they pass. *Phew*.
Of course, if you disagree, you are free to contact the authors of the test, and ask them to change the wording in the test. However, note that right now, we have a fairly consistent cross browser response, which is helpful to everyone, and users can scroll if they need to, which is also helpful, especially if things reshuffle when the window is resized. In addition, they comply with the intention of the relevant part of the spec.