Did Microsoft do it again?


Author: Jesper Nøhr (jespern_AT_opera_DOT_com), October 21, 2004.

Prelude

Some time ago, Haakon Wium Lie wrote an article with a detailed explanation and analysis of why the then-current MSN.com didn't work in Opera. After this 'stunt', Microsoft actually got around to fixing the problem. It seems that something similar has happened again, this time on msnbc.com. The front page seems to work perfectly, but if you click one of the articles on the site, you get redirected to a page stating that the requested URL could not be found. What's interesting is that this only seems to happen when your browser claims to be Opera.

Since I work for Opera Software, I will try to be as objective as possible in this analysis. Bias aside, this issue just screams to be analysed, and that is what this article does.

Proof

Every claim must be proven. I will do that by 'disguising' myself as several browsers and trying to obtain the same page with each of them. Since Microsoft seems to do this odd redirection server-side, I cannot point to any code of theirs as proof; I can only observe how the server responds.

Here's a table of the user agents I have tried:

HTTP code  Works  User-Agent
200        Yes    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
302        No     Opera/7.60 (X11; Linux i686; U; en)
302        No     Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.1) Opera 7.0 [en]
302        No     Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.5
200        Yes    Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040928 Firefox/0.9.3
302        No     Links (0.99; Linux 2.6.8-1-686-smp i686; 94x42)
302        No     Lynx/2.8.5rel.1 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/1.0.16
200        Yes    Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0.2) Gecko/20021120 Netscape/7.01
200        Yes    Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/103u (KHTML, like Gecko) safari/100
302        No     Mozilla/6.0 (compatible; Konqueror/4.2; i686 FreeBSD 6.4; 20060308)
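
A table like the one above could be collected with a short script. The following is a minimal sketch in Python, not necessarily how these results were gathered: the names `NoRedirect` and `probe` are made up for illustration, and only the standard `urllib` module is assumed. It sends a request with a chosen User-Agent string, refuses to follow redirects, and returns the raw HTTP status code.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow redirects (hypothetical helper)."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response
        # instead of silently fetching the redirect target.
        return None


def probe(url, user_agent, opener=None, timeout=10):
    """Return the HTTP status code the server sends for this User-Agent."""
    if opener is None:
        opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Unfollowed redirects (302) and error pages (404, ...) end up here.
        return err.code
```

Calling `probe("http://www.msnbc.msn.com/id/6260506/", ua)` once per string in the table would be expected to show the same 200/302 split only for as long as the server behaves as described in this article.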

Okay, so this is not just Opera.

They seem to block text browsers (Links and Lynx), Opera and Konqueror. So what do these browsers have in common? Hmm. Well, they all claim to be running on something other than MS Windows. No, that's not true: we tried identifying as Opera on NT 5.1, and we still got blocked. Opera on Mac works, though. Might they be blocking non-Windows operating systems, plus browsers with 'Opera' in the user-agent string? Very doubtful. Identifying as Firefox worked, even though that string says we are running both Linux *and* X11, so the word "Linux" doesn't seem to trigger whatever is refusing to output the page. The first user-agent string in the table (equivalent to standard IE6) works. If we append the string 'Opera 7.5' to that exact same string, it fails.
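
The elimination above can be made mechanical. This is only an illustrative sketch: the observations are copied from the table, while the helper `counterexamples` and the three hypotheses are assumptions chosen to mirror the reasoning in the text, not anything the server is known to do. For each hypothesis it lists the rows the hypothesis gets wrong.

```python
# (user-agent string, HTTP status) pairs copied from the table above.
OBSERVATIONS = [
    ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", 200),
    ("Opera/7.60 (X11; Linux i686; U; en)", 302),
    ("Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.1) Opera 7.0 [en]", 302),
    ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.5", 302),
    ("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040928 Firefox/0.9.3", 200),
    ("Links (0.99; Linux 2.6.8-1-686-smp i686; 94x42)", 302),
    ("Lynx/2.8.5rel.1 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/1.0.16", 302),
    ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0.2) Gecko/20021120 Netscape/7.01", 200),
    ("Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/103u (KHTML, like Gecko) safari/100", 200),
    ("Mozilla/6.0 (compatible; Konqueror/4.2; i686 FreeBSD 6.4; 20060308)", 302),
]


def counterexamples(predicts_block, observations=OBSERVATIONS):
    """Return the rows where a hypothesis disagrees with what was observed."""
    return [
        (ua, status)
        for ua, status in observations
        if predicts_block(ua) != (status == 302)
    ]


# Hypothetical blocking rules, one per guess discussed in the text.
HYPOTHESES = {
    "not on Windows": lambda ua: "Windows" not in ua,
    "mentions Linux": lambda ua: "Linux" in ua,
    "mentions Opera": lambda ua: "Opera" in ua,
}

for name, predicts_block in HYPOTHESES.items():
    misses = counterexamples(predicts_block)
    print(f"{name}: {len(misses)} row(s) not explained")
```

On these ten rows, none of the three simple rules explains everything. The 'mentions Opera' rule never predicts a block that did not happen; it only fails to account for Links, Lynx and Konqueror.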

I decided to try replacing the string "Opera 7.5" with something else, just to see whether they block simply *because* something is appended to the string. I tried this:

% wget --user-agent "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Bogus 1.0" \
            http://www.msnbc.msn.com/id/6260506/
--09:10:34--  http://www.msnbc.msn.com/id/6260506/
           => `index.html'

Resolving www.msnbc.msn.com... 207.46.245.32, 207.46.150.52, 207.46.150.51, ...
Connecting to www.msnbc.msn.com[207.46.245.32]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 41,186 [text/html]

100%[==================================================>] 41,186        63.47K/s             

09:10:35 (63.37 KB/s) - `index.html' saved [41186/41186]
          

Very surprising. I also tried replacing 'Bogus 1.0' with many other strings, and those worked too. So *apparently* they are blocking Opera specifically, and they must be aware of it; someone has taken the time to do it. The proof is that if you replace 'Bogus' with 'Opera', you get redirected to a page saying 'this page is non-existent'.

Talk-show hosts work, though.

As Haakon did in the article referenced above, I tried replacing the word 'Opera' with 'Oprah'. The user-agent string remains exactly the same as the one that got blocked, *only* with the word 'Opera' replaced by 'Oprah'. Guess what: we got the page we were looking for. 'Letterman 5.0' and 'JayLeno 10.5' work as well.
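
The point of these substitutions is that each test string differs from the blocked one in a single token. A small sketch of how such variants could be generated follows; the helper `swap_token` is hypothetical, and the choice of the IE6 string with 'Opera 7.5' appended as the blocked starting point is an assumption, since the text does not say which blocked string was used.

```python
def swap_token(user_agent, old, new):
    """Replace exactly one occurrence of `old`, so only that token changes."""
    if user_agent.count(old) != 1:
        raise ValueError(f"expected exactly one {old!r} in {user_agent!r}")
    return user_agent.replace(old, new)


# Assumed starting point: the IE6 string with 'Opera 7.5' appended (blocked).
BLOCKED_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.5"

VARIANTS = [
    swap_token(BLOCKED_UA, "Opera", "Oprah"),
    swap_token(BLOCKED_UA, "Opera 7.5", "Bogus 1.0"),
    swap_token(BLOCKED_UA, "Opera 7.5", "Letterman 5.0"),
    swap_token(BLOCKED_UA, "Opera 7.5", "JayLeno 10.5"),
]

for variant in VARIANTS:
    print(variant)
```

Each printed string can then be passed to wget's `--user-agent` option in place of the blocked one.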

Not Referer-based either.

I know some of you might be thinking that this has something to do with the Referer: header when requesting the page with wget. That, however, doesn't explain why identifying as IE in wget works: the requests are otherwise identical, and only the user-agent string differs. So it's not Referer-related either.

Does Opera render the page wrong?

In a last attempt to justify this for Microsoft, I desperately made a local mirror of the page and tried to render it in Opera. There might be a reason for Microsoft to do this, such as "protecting" users from viewing the page in Opera if Opera didn't render it correctly. This is the result:

Internet Explorer's rendering
[Screenshot: rendering of MSNBC.com in Internet Explorer 6.0.]

Opera 7.6's rendering
[Screenshot: rendering of MSNBC.com in Opera 7.60.]

The difference? Some images are missing (due to wget issues), and some elements are placed differently. Does that make the page virtually unreadable? Definitely not.

Why would Microsoft be doing this?

One possible answer is that these are competing browsers, and that their users are blocked for that reason. It is doubtful, though: after the last publicity stunt, I doubt that Microsoft wants to go through another. Another possible answer is that this is a precaution, taken because these browsers supposedly cannot render the site or give the end user the features the website is meant to offer. This would explain why they also seem to refuse Lynx, Links and Konqueror. Or the reason could simply be that they forgot to remove this from their code base. You probably expect me to state some profound and noble reason why they would do this. Unfortunately, I can't think of any. I have tried not to force any conclusion on the reader in this article. You decide what you think.