An explanation
A few people have asked me to publish the browser statistics for this site. Statistics like this generally only make sense if your site is large enough and popular enough to intersect an appropriate cross-section of the relevant population. Although this site is easily large enough for this (several GB of bandwidth usage per month), I do not, and will not, publish statistics. And I have good reasons not to.
- Browser statistics are the cause of the problems that I hope to help you avoid.
- You look at your statistics. You see that most of your visitors are using browser X. So you decide to design your site so that it works in browser X.
- Someone turns up using browser Y, finds that your site is not written correctly, and discovers that it does not work in browser Y. They try a few tricks, nothing works, and so, a little annoyed, they are forced to use browser X whenever they visit.
- Someone else turns up using browser Z, finds that your site is not written correctly, and it does not work. Frustrated, they go away, and never return to your site.
- You look at your stats again, and they confirm that your visitors are all using browser X. Hardly surprising, since you won't let them use anything else.
- And so the cycle continues.
- Statistics are just an excuse for lazy web development, one of the things I really despise. They are used as an excuse to develop for just one or two browsers. As a result, when someone uses another browser, for whatever reason - accessibility, ease of use, security, whatever - they are insulted by being told to use something else, which may not even be capable of serving their needs.
- If you design properly in the first place, it does not matter what browser they use. And your statistics become unnecessary.
- And when the market changes, and people begin to use other browsers, reliance on statistics will mean that the site needs to be redone to cater for the new browser. For example, after the release of Firefox 1.0, and the subsequent growth in its use, I heard from a large number of people saying they needed to redo their sites to make them work in it (for some reason, the previous existence of Opera, Mozilla, Safari and Konqueror did not seem to be reason enough). If those sites had been designed properly in the first place, this would not have been necessary. Do not rely on browser statistics; just design properly.
- It is impossible to accurately identify a browser by its User-Agent HTTP header, since many, many browsers are forced (by badly written sites) to mis-identify themselves as better-known browsers. Although some browsers (like Opera) are easy to identify despite this, with others there is no way to distinguish them from the browsers they pretend to be. The only way to accurately identify a browser is by using JavaScript, and even then, it is not possible to know every possible browser that might be in use. If stats are done using server-side checks, it is a guarantee that you will get your stats wrong. If stats are done using JavaScript, it is a guarantee that they will also be wrong, as you will not be able to identify browsers that do not support JavaScript, or have it disabled. Since I wish to cater for these people, I will not use scripts as a basis for stats, but I also refuse to get incorrect stats by using server-side detection.
- If collecting stats, should each page request count? Or just one per browser? Or just one per IP address? And then what about people who share a connection - common with universities and corporations, most wireless connections, and even an entire country? What about ISPs like AOL, where one user may have more than 20 IP addresses for consecutive or even simultaneous requests, but those same few IP addresses hide all the millions of AOL users? What about people who share proxy connections or transcoders like Opera Mini? Perhaps just count people who load the site start page? Then what about those who use bookmarks or a search engine to enter part way through the site? Or a person testing multiple browsers? Or should a person who loads just one page, realises it was not what they wanted, and goes away count as much as a person who looks at 100 pages? How about a bot that crawls every page on my site? Or should a person who whips through 10 pages without reading them be counted as much as a person who only reads one page, but takes 20 minutes to read it carefully? And should cookies be used to avoid counting the same person twice (I refuse to use cookies, and European law even demands that you get permission from each user for this sort of purpose)? And what about people who reject them? And what about people who use Google's cache instead of loading the real page? Should I use an image to track them? And then what if they have images disabled? Or since external CSS files are cached, should I check only when that is loaded? And what if they disable CSS? Look, it is ridiculous! Statistics are impossible. Even if it were possible to accurately identify the browser (which it is not), it is impossible to get statistics that accurately (or even remotely) represent usage.
- I use my own site more than most other people, and I will sway results, as not only do I use my own preferred browser most of the time, but I also regularly test many other browsers on several pages of my site.
- Spammers use crawler bots to look for email addresses, and for comment forms they can spam. I regularly see these crawling over my site. In fact, on some sites, these bots appear more often than real visitors. Virtually all of these bots identify themselves as Internet Explorer, and they will substantially sway results towards IE, giving it an inordinately high result and making it appear more popular than it actually is. Even supposedly legitimate software is known to pre-scan search results while identifying itself as Internet Explorer. As an example, AVG's extremely abusive malware scanner can be indistinguishable from real IE browsing, and has been shown to be responsible for a staggering 6% to 90% of all traffic received by some sites, particularly those with good search engine rankings.
- Opera (in particular, but not exclusively) has very efficient caching policies, so someone could actually read a page several times and only show up in the stats once. By contrast, other browsers (such as Internet Explorer and Firefox) will usually make a request for every viewing, including for all linked stylesheets, JavaScript files and images. And they will sometimes also do the same while using the 'Back' and 'Forward' buttons (depending on headers sent by the server). This makes Firefox and Internet Explorer look significantly more popular than they are, and makes Opera look much less popular than it actually is. This only gets worse if a page uses scripts or conditional comments to serve extra content to specific browsers. The same applies to text-based browsers, which not only have generally efficient caching, but do not download images, stylesheets or JavaScript files at all.
- Mozilla/Firefox incorporates link preloading. This means that if the page uses <link> tags to link to the next or previous pages (used by many sites, especially blogs), the browser will preload the linked page, even if the user never visits it. This produces a higher number of perceived hits than the user is actually creating. With many server-processed pages (including most pages on this site), that functionality misfires and downloads the page a second time if the user does visit it, which obviously sways results even more.
- Internet Explorer contains several caching bugs; it often fails to cache pages that are served with gzip compression (used by many, many sites, including Google, this site, all sites hosted by Dreamhosts, etc etc etc). It usually fails to cache .htc behaviour files, or images/CSS/JavaScript files that are loaded as a result of running a .htc file, and will often load them once for every element they apply to, and again every time the mouse moves over them. This can result in hundreds of extra requests for a single page (400 per visitor on one of my pages, until I removed the behaviour), and as a result, Internet Explorer can look far more popular than it actually is.
- Many ISPs have transparent caching proxies, often without the user being aware of it (many sites even sit behind such a proxy implemented by their hosting service). This means that virtually all requests are handled by caching servers, and pages are updated from the site itself on a much less regular basis. As a result, the site itself sees a totally unrealistic representation of the actual number of page requests. Some ISPs also do not allow certain browsers to reload pages from the originating server, meaning those browsers are hardly represented at all in the site stats unless they happen to be the first browser to make a request after cache expiry. An example of this is NTLWorld, who do not allow Opera to reload pages on modem services.
- I aim the content of this site at technically minded people, and hope to enlighten them about the benefits of other browsers, so I get an abnormally high number of users of these alternatives. This in itself is a good thing, as the more people use these browsers, the better things will be, but it still does not reflect general reality.
- This site is aimed at web developers, not normal users. Many of them originally find this site while looking for help to make a site work in a browser that they do not normally use. There is no point in me telling you what you use. You already know what you use. What is more important is catering for what your viewers will use, or would use, if only your site worked in them.
- I offer testing suites so that browsers can be put through their paces, and as a result, I expect a significant number of my visitors to be using browsers they would not normally use, simply for the purposes of testing.
- I regularly report bugs to public listings, or vendor bug tracking systems, and I have hundreds of bug reports and testcases for IE, Mozilla, Opera, Safari, Konqueror, iCab, ICEbrowser, Escape, etc. I get regular visits from vendors and other authors testing the relevant browsers to check on the status of the fixes for the bugs. This also causes a significant sway on my results.
- I am an Opera user (someone who has suffered from statistically driven sites before). I have a significant section of this site devoted to Opera information, documentation and tools. I get many visits from other Opera users looking for feature information, or general hints for customizing Opera. This will make Opera appear far more popular than the other browsers (a distinction I believe it deserves, but one that is not entirely accurate).
- While helping to produce scripts or setups for my friends or site visitors, I often create files which I place in temporary folders so that they can look at them. Mostly, these are for Opera users, again altering its results.
- When stats are published, in general, you end up with the fanboy factor. The fanboy sees the stats, decides that their favourite browser is not ranked high enough, so sets the page to automatically reload every second or so, in order to boost the ratings of their browser. Opera actually has an auto-reload feature built in, and Firefox has a similar extension (it is not designed for abuse like this, but I have known people to use it for this purpose). Fanboys do nothing but harm to the name of their chosen browser, and I don't want to allow them to try this sort of stupid popularity game with my site.
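The User-Agent problem described above can be made concrete with a minimal Python sketch of the kind of naive server-side sniffing that stats packages perform. The header strings are illustrative of real-world masking patterns rather than captured data, and `guess_browser` is a hypothetical helper, not any real library:

```python
# Naive server-side User-Agent sniffing, of the kind stats packages rely on.
# The strings below are illustrative: a browser configured to fully mask
# itself sends a header that is byte-for-byte identical to the one the
# genuine browser sends, so no server-side test can tell them apart.

def guess_browser(user_agent: str) -> str:
    """Hypothetical sniffer: map a User-Agent string to a browser name."""
    if "Opera" in user_agent:
        return "Opera"  # only works when Opera chooses to say so
    if "MSIE" in user_agent:
        return "Internet Explorer"
    return "Unknown"

# Opera identifying as IE but still appending its own name - detectable:
partial_mask = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 8.54"
# A browser fully masking itself as IE - indistinguishable from the real thing:
full_mask = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
real_ie = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

print(guess_browser(partial_mask))  # Opera
print(guess_browser(full_mask))     # Internet Explorer - wrong
print(full_mask == real_ie)         # True
```

The last line is the whole point: when the masking is complete, the two headers are equal, and no amount of clever parsing can recover the truth.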
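The counting questions above are not just rhetoric; the same access log gives completely different "visitor" totals depending on which rule you choose. Here is a toy sketch in Python, with every entry invented for illustration:

```python
# One day of requests as (ip, path) pairs: a proxy IP shared by several
# users, a crawler hitting every page, and one careful reader.
log = [
    ("10.0.0.1", "/"),           # shared proxy, user A
    ("10.0.0.1", "/page1"),      # shared proxy, user B
    ("10.0.0.1", "/page2"),      # shared proxy, user A again
    ("192.0.2.7", "/"),          # crawler
    ("192.0.2.7", "/page1"),     # crawler
    ("192.0.2.7", "/page2"),     # crawler
    ("198.51.100.3", "/page1"),  # one careful reader, one page
]

per_request = len(log)                           # count every request
per_ip = len({ip for ip, _ in log})              # count unique IP addresses
start_page = sum(1 for _, p in log if p == "/")  # count start-page loads

print(per_request, per_ip, start_page)  # 7 3 2
```

Seven requests, three IP addresses, two start-page loads: each rule is defensible, and each gives a different number.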
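On the preloading point: Firefox's link prefetcher has historically marked its requests with an "X-Moz: prefetch" request header, so a log processor could in principle discount them. This is only a sketch under that assumption; verify the header against the browsers you actually see before relying on it:

```python
def is_prefetch(headers: dict) -> bool:
    """Return True if a request looks like a Firefox link prefetch.

    Assumes the prefetcher sends "X-Moz: prefetch"; other browsers,
    and other Firefox versions, may mark prefetches differently or
    not at all.
    """
    # Header names are case-insensitive in HTTP, so normalise first.
    normalised = {k.lower(): v.lower() for k, v in headers.items()}
    return normalised.get("x-moz") == "prefetch"

print(is_prefetch({"X-Moz": "prefetch"}))          # True
print(is_prefetch({"User-Agent": "Mozilla/5.0"}))  # False
```

Even with such filtering, a prefetch that is later actually visited still appears twice in the raw log, which is exactly the double-counting described above.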
So no, I will not collect statistics, and I certainly will not publish them. And I recommend that you do the same, instead of allowing statistics to lie to you, and cause problems for your visitors.
Virtuelvis has a nice blog entry with some additional points about browser download stats.