Using Google to Get Link Info

Posted on February 19, 2006 4:00 AM in Web Development
Warning: This blog entry was written two or more years ago. Therefore, it may contain broken links, outdated or misleading content, or information that is just plain wrong. Please read on with caution.

Let's say one day you get a little zealous and decide you want to write a script that will automatically add title attributes to links on your site. Or let's say you get really zealous and decide to write a script that will automatically generate hover boxes like the ones over at Henrik Gemal's site that contain the appropriate title and description for any given link.

If you really want to be dynamic, you'll need to figure out some way to extract the correct page title for the link and, potentially, an excerpt from the text or a good site description. The former is pretty easy, assuming the site you're linking to has a <title> tag in its <head> block. The latter isn't as straightforward, though. You could look for paragraph text and extract a little to use as your description, but that is hit-and-miss. Perhaps a better option would be to look for the meta description tag and extract its contents as your site description. The only problem with that approach is that not everyone includes that information in their site's markup.
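To make the direct approach concrete, here is a minimal sketch of pulling the title and the meta description out of a page's markup with PHP's DOM extension. The function name `extract_link_info` and the sample markup are made up for illustration; either value comes back as `null` when the page doesn't supply it, which is exactly the weakness described above.

```php
<?php
// Hypothetical helper (not part of the original post): extract the <title>
// text and the meta description from an HTML string using DOMDocument.
function extract_link_info(string $html): array
{
    $doc = new DOMDocument();
    // Real-world markup is often invalid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Use the first <title> element, if there is one.
    $title = null;
    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $title = trim($titles->item(0)->textContent);
    }

    // Look for <meta name="description" content="...">.
    $description = null;
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('name')) === 'description') {
            $description = trim($meta->getAttribute('content'));
            break;
        }
    }

    return ['title' => $title, 'description' => $description];
}

// Sample markup, invented for this example.
$sample = '<html><head><title>Example Page</title>'
        . '<meta name="description" content="A short summary."></head>'
        . '<body><p>Hello</p></body></html>';
$info = extract_link_info($sample);
```

For a live page you would pass in the result of `file_get_contents()` for the link's URL, after checking that the fetch did not return `false`.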

So why not avoid these options altogether and go to the king of all data, Google? Ideally, you could tap into their API to get the information you need, but you could also be a bit more geeky and use a regular expression to get at the info. Have you ever tried searching Google for a link instead of a term or set of terms? A search for bernzilla.com, for instance, returns both the title of my site and a brief description. This should be perfect for your needs.

Okay, you say, but how do I get at that information dynamically? Well, I'll show you a bit of PHP code to do just that:

<?php
// first set up some definitions
define("GOOGLE_QUERY", "http://www.google.com/search?q=");
define("WEBSITE_URI", "http://www.bernzilla.com/");
// capture the text of the first class=l link, then the text after the next font tag
define("REGEX", "/class=l.+?>(.+?)<.+?font.+?>(.+?)</is");

// defaults, in case the lookup fails
$title = "";
$description = "";

// utilize Google for title and description (URL-encode the query value)
$google = file_get_contents(GOOGLE_QUERY . urlencode(WEBSITE_URI));

// if the page was fetched and data was extracted successfully
if ($google !== false && preg_match(REGEX, $google, $matches))
{
   // store the data
   $title = $matches[1];
   $description = $matches[2];
}
?>

Once you've grabbed what you need, you can go ahead and use the data to dynamically populate the title and description in your ambitious hover boxes (or less ambitious but equally important title attributes).
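As a rough sketch of that last step, the snippet below turns a title and description into a link with a title attribute. The function name `build_titled_link` and the sample values are invented for illustration. It assumes the scraped strings may still contain markup or HTML entities, so it reduces them to plain text before escaping them for output.

```php
<?php
// Hypothetical helper (not part of the original post): build a link whose
// title attribute carries the looked-up title and description.
function build_titled_link(string $url, string $text, string $title, string $description): string
{
    // Scraped strings may contain tags and entities; reduce them to plain
    // text first so they can be escaped exactly once below.
    $clean = function (string $s): string {
        return trim(html_entity_decode(strip_tags($s), ENT_QUOTES, 'UTF-8'));
    };
    $tooltip = $clean($title) . ': ' . $clean($description);

    return sprintf(
        '<a href="%s" title="%s">%s</a>',
        htmlspecialchars($url, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($tooltip, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($text, ENT_QUOTES, 'UTF-8')
    );
}

// Sample values, invented for this example.
$link = build_titled_link(
    'http://www.example.com/',
    'Example',
    'Example <b>Site</b>',
    'Tips &amp; tricks'
);
echo $link, "\n";
// prints: <a href="http://www.example.com/" title="Example Site: Tips &amp; tricks">Example</a>
```

In the script above, you would pass `$title` and `$description` as the last two arguments once the match has succeeded.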

Comments

tutor on September 20, 2009 at 6:33 PM:

I wish Google still worked like this. Does anyone know a modern approach to strip out the description using PHP?
