Email conversation
From | Barry Samuels |
To | Me |
Subject | 'People who like to make pages available offline' |
Date | 28 November 2005 19:53 |
Tarquin
I have just stumbled upon your web site when doing a Google search about
Netfront, which is the browser supplied with my [Phone], and I must
compliment you on your site even though I have seen only a small part.
One thing that makes steam come out of my ears is web sites that cater
for specific browsers or add-ons only, particularly sites that one cannot
even enter unless Macromedia Flash is installed.
Would you object if, when I complain to those sites, I refer them to your
site?
However - I digress. My specific reason for contacting you was that I
noticed the following:
"Making pages available offline (using Internet Explorer favourites) is
permitted, but you will not be allowed to extend this to pages that are
linked to by the current page."
How do you do this? My site is hosted using Apache and I have an .htaccess
file which redirects known offline browsers to a page explaining why I don't
permit access to offline browsers. I know in some cases visitors using some
varieties of Windows appear to automatically cache every page linked from
the current page and I would like to stop it.
My reasons are similar to yours, although it is not causing a bandwidth
problem at present; I'd prefer prevention rather than cure.
Kind regards
Barry Samuels
From | Me |
To | Barry Samuels |
Subject | Re: 'People who like to make pages available offline' |
Date | 28 November 2005 20:34 |
Barry,
> I have just stumbled upon your web site when doing a Google search about
> Netfront, which is the browser supplied with my [Phone]
My condolences ;) - I have never been tremendously impressed with the
rendering, bandwidth handling, or broken DOM support of that browser. But
yes, it is still very popular on devices, and there is no reason to actively
block it, or rely on technologies that are clearly more than a device
browser can handle.
> Would you object if, when I complain to those sites, I refer them to your
> site?
You are welcome to do so if you want, yes :) Another similarly minded site
is: http://www.quirksmode.org/
> "Making pages available offline (using Internet Explorer favourites) is
> permitted, but you will not be allowed to extend this to pages that are
> linked to by the current page."
>
> How do you do this?
There are several techniques used on the site:
1. robots.txt tells IE's inbuilt offline crawler not to index anything -
done using this:
User-agent: MSIECrawler
Disallow: /
2. several pages link to directories blocked by robots.txt - a server side
check (done by my hosting service) makes sure that if any crawler violates
the rules in robots.txt and attempts to index those directories, it is
instantly banned (this catches spam crawlers, as they generally do not obey
the rules).
3. more server side checks ensure that if too many requests are made for the
same pages within a certain space of time, that will also earn an instant
ban. This is because a lot of my pages link back to the front page of the
site, and many spam crawlers are very badly behaved, and will repeatedly
request the same page, causing them to get stuck in infinite loops, wasting
bandwidth.
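As a rough illustration of what the robots.txt rule in item 1 asks of a
well-behaved crawler, here is a minimal sketch using Python's standard
urllib.robotparser module. The "OtherBrowser" agent name and the
/index.html path are made up for the example; only the two robots.txt
lines come from the site.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines from item 1.
ROBOTS_LINES = [
    "User-agent: MSIECrawler",
    "Disallow: /",
]


def build_parser(lines):
    """Parse robots.txt lines held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


parser = build_parser(ROBOTS_LINES)

# IE's offline crawler is told to stay away from everything.
msie_allowed = parser.can_fetch("MSIECrawler", "/index.html")

# "OtherBrowser" is a hypothetical agent; no rule names it, so it is allowed.
other_allowed = parser.can_fetch("OtherBrowser", "/index.html")

print(msie_allowed, other_allowed)
```

This only shows how an obedient crawler would read the rule. Nothing in
robots.txt is enforced by itself, which is why the server side checks in
items 2 and 3 exist.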
When an address is banned, any requests it tries to make are ignored.
This is done via Apache, but I am not sure exactly how all of it works; I
just followed the instructions of my host :)
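Since I do not know the details of my host's Apache setup, the following
is only a sketch of the general idea behind items 2 and 3, written in
Python rather than as Apache configuration. Everything in it (the BanList
class, the /trap/ prefix, the limit of 3 requests per 10 seconds) is
invented for illustration and is not what my host actually runs.

```python
import time
from collections import defaultdict, deque


class BanList:
    """Hypothetical ban tracker: trap directories plus a repeat-request limit."""

    def __init__(self, trap_prefixes, max_hits, window_seconds):
        self.trap_prefixes = tuple(trap_prefixes)  # paths disallowed in robots.txt
        self.max_hits = max_hits                   # allowed hits per page per window
        self.window = window_seconds
        self.banned = set()
        self.hits = defaultdict(deque)             # (address, path) -> timestamps

    def allow(self, address, path, now=None):
        """Return True if the request should be served, False if ignored."""
        if now is None:
            now = time.monotonic()
        if address in self.banned:
            return False
        # Item 2: anything requesting a trap directory ignored robots.txt.
        if path.startswith(self.trap_prefixes):
            self.banned.add(address)
            return False
        # Item 3: too many requests for the same page within the window.
        stamps = self.hits[(address, path)]
        stamps.append(now)
        while stamps and now - stamps[0] > self.window:
            stamps.popleft()
        if len(stamps) > self.max_hits:
            self.banned.add(address)
            return False
        return True


bans = BanList(trap_prefixes=["/trap/"], max_hits=3, window_seconds=10)
print(bans.allow("192.0.2.1", "/trap/page.html", now=0))  # trap hit
print(bans.allow("192.0.2.1", "/", now=1))                # still banned
```

A real setup would also need to expire old bans and cope with many
visitors sharing one address, which this sketch ignores.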
Mark 'Tarquin' Wilton-Jones - author of http://www.howtocreate.co.uk/