Doug


Email conversation

From: Doug
To: Me
Subject: accessing data from an external html page
Date: 4 May 2005 18:07
I hate to bother you, because I know that with such a useful page you must
get a lot of email, but I have searched high and low for a solution.  I am
an experienced programmer, so I don't need any hand-holding; just a
straight-up answer would be great.

I have a small meta search engine online ( [URL] ).  We
would like to put a preview function into our results pages.  I already
programmed a solution using iframes to include each page.  When the user
puts their mouse over the link it shows the iframe with the page already
preloaded.  Works really nifty!
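Doug doesn't show his code, so what follows is only a minimal sketch of the kind of hover preview he describes. It assumes that each result link has a matching iframe whose src is already set, so the page preloads while the iframe is hidden. The helper name `attachPreview` and the element ids in the usage comment are hypothetical.

```javascript
// Hypothetical helper: show a preloaded preview iframe while the mouse is
// over its result link, and hide it again when the mouse leaves.
function attachPreview(link, iframe) {
  // The iframe's src is assumed to be set already, so the page loads
  // while hidden and appears immediately on hover.
  iframe.style.display = 'none';
  link.addEventListener('mouseover', function () {
    iframe.style.display = 'block';
  });
  link.addEventListener('mouseout', function () {
    iframe.style.display = 'none';
  });
}

// Usage in a browser (ids are hypothetical):
// attachPreview(document.getElementById('result1'),
//               document.getElementById('preview1'));
```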

Now the problem is that some websites have "frame breakout" scripts.  When
the iframe loads these pages, they automatically take over the entire
session (top.location = self.location or something like that).  So I
thought a solution might be to load the pages somehow, filter the JavaScript
out, then display them in a layer or iframe or something.  I know there are
a lot of security restrictions for cross-domain stuff, but your page had so
many techniques that I thought it might be possible.
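For reference, a typical frame-breakout script looks roughly like the sketch below; real sites vary in the details. The `win` parameter is a hypothetical stand-in for the page's own `window`, added only so that the check can be exercised outside a browser.

```javascript
// Sketch of a common frame-breakout check. In a real page it runs at load
// time with win === window.
function breakOutOfFrames(win) {
  if (win.top !== win.self) {
    // The page is framed: navigate the whole browser window to this page.
    win.top.location = win.self.location.href;
    return true;
  }
  return false; // Already the top-level page; nothing to do.
}

// In a browser: breakOutOfFrames(window);
```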

PS.  I also thought of redefining the "top" object or overriding its
properties or methods.  Any comments on whether this is possible, or
whether it would be effective, would be great, since the other frames might
have their own top object... or are they shared?

Well, any info you can send my way would be most appreciated!

Thanks in advance, Doug
From: Me
To: Doug
Subject: Re: accessing data from an external html page
Date: 5 May 2005 14:01
Doug,

> Now the problem is that some websites have "frame breakout" scripts.
> When the iframe loads these pages, they automatically take over the
> entire session

You are quite right; the cross-domain restrictions make this pretty much
impossible to avoid. You cannot load their page into an iframe if they
refuse to allow it.

In theory, you could use an <object> element instead of an iframe, but IE
would almost certainly fail at this.

> PS.  I also thought of redefining the "top" object or overriding its
> properties or methods.

'top' is a property unique to each frame, so you cannot override the child's
'top' from the parent frame. However, the child's 'top' refers to your own
(parent) window, and 'location' is just a property of that window, so you
may be able to delete it from your own page (not sure how successful this
might be):
delete location; //run in the parent page; if it works, framed pages can't change it.
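Whether `delete location` has any effect depends on whether the browser makes `location` a configurable property of the window. Current browsers that follow the HTML specification define `window.location` as non-configurable, so the delete is refused and a framed page's `top.location = ...` still navigates; older browsers may have behaved differently. A minimal sketch of a check, using a hypothetical helper name `tryDelete`:

```javascript
// Hypothetical helper: attempt to remove a property and report whether it
// is really gone afterwards. Reflect.deleteProperty returns false, rather
// than throwing, when the property is non-configurable.
function tryDelete(obj, name) {
  Reflect.deleteProperty(obj, name);
  return !(name in obj); // also false if an inherited copy is still visible
}

// In the parent page: tryDelete(window, 'location')
// is expected to return false in current browsers.

const plain = { location: 'x' };
console.log(tryDelete(plain, 'location')); // true

const locked = {};
Object.defineProperty(locked, 'location', { value: 'x', configurable: false });
console.log(tryDelete(locked, 'location')); // false
```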

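The most reliable way to do what you need is a URL proxy: your server
fetches the page and serves it to the iframe from your own domain, so you can
rewrite the breakout script before the browser ever sees it. This would eat
your bandwidth, and you would need to set the base href of the page to its
original location (so that all the links and images work):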

<iframe src="getPage.php?url=http://example.com/"></iframe>

then in getPage.php:
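<?php
// getPage.php: fetch the requested page and serve it from this domain.
// fopen() can only open remote URLs when allow_url_fopen is enabled.
if( isset($_GET['url']) && preg_match('/^https?:\/\//i',$_GET['url']) ) {
 if( $file = fopen($_GET['url'],'rb') ) {
  $contents = '';
  while( !feof( $file ) ) { $contents .= fread( $file, 8192 ); }
  fclose( $file );
  // HTML-escape the URL; urlencode() would mangle "http://" and break the base href
  $base = '<base href="'.htmlspecialchars($_GET['url']).'">';
  // the base element must precede any relative URLs, so put it right after <head>
  if( preg_match('/<head\b[^>]*>/i', $contents, $m, PREG_OFFSET_CAPTURE) ) {
    $pos = $m[0][1] + strlen($m[0][0]);
    $contents = substr($contents, 0, $pos).$base.substr($contents, $pos);
  } else {
    $contents = $base.$contents;
  }
  //this regexp may have an ugly effect on some pages - you will need to
  //tailor the regexp to suit your needs.
  $contents = preg_replace('/(=|\s|\()top\b/i','${1}window',$contents);
  print $contents;
 }
}
?>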
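For comparison, the page-rewriting part of such a proxy can be written as a pure function. This is a sketch in JavaScript (Node) rather than PHP, using hypothetical names `rewritePage` and `escapeAttr`. It performs only the two string edits (adding a base element, and turning `top` into `window`), not the fetching. It places the base element straight after `<head>` so that it precedes any relative URLs, and it shares the `top` regexp's caveat about altering ordinary page text.

```javascript
// Hypothetical helper: escape a string for use inside a double-quoted
// HTML attribute.
function escapeAttr(s) {
  return s.replace(/&/g, '&amp;').replace(/"/g, '&quot;')
          .replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical sketch of the proxy's rewriting step.
function rewritePage(html, originalUrl) {
  const base = '<base href="' + escapeAttr(originalUrl) + '">';
  // Insert the base element right after the opening <head> tag, or at the
  // very start if the page has no <head>.
  const head = /<head\b[^>]*>/i.exec(html);
  const out = head
    ? html.slice(0, head.index + head[0].length) + base +
      html.slice(head.index + head[0].length)
    : base + html;
  // Same heuristic as the PHP: "top" after "=", whitespace or "(" becomes
  // "window". It can also change visible text, so tailor it per site.
  return out.replace(/(=|\s|\()top\b/gi, '$1window');
}
```

Splicing the base element in at the match offset, rather than passing it through a replacement string, means that a `$` in the URL is never read as a backreference.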


Hope this helps

Mark 'Tarquin' Wilton-Jones - author of http://www.howtocreate.co.uk/
From: Doug
To: Me
Subject: Re: accessing data from an external html page
Date: 5 May 2005 14:18
Yah the proxy option I ruled out.  The bandwidth would be too high.
I will try deleting the location object.  I doubt this will work but it is
my last try before shutting down this idea.  Too bad because we have
rave reviews from beta testers.  THANK YOU very much!  Really
appreciate your thoughts.  I will let you know if I have any success.

Later, Doug
This site was created by Mark "Tarquin" Wilton-Jones.