Handling gzip-compressed (or Not) Content in Python

Posted on June 02, 2012 10:11 PM in Programming

I was recently working with an HTTP-based API that supported gzip compression. While figuring out how best to request and then handle gzip-compressed data in a Python script, I ran across a Dive Into Python entry and a Stack Overflow discussion on the subject. The former goes into great detail about requesting compressed data and handling it when it's returned, and the latter is largely based on the same information, but neither sheds any light on what to do if the data comes back uncompressed. That's a fairly significant oversight: as a user agent, you can send an Accept-Encoding: gzip header with every request, but that header only advertises what you're willing to accept, and the web server is free to respond with uncompressed content anyway.

To fill in the rest of the picture, I came up with the following function, which requests gzip-compressed data and handles the returned content whether it's compressed or not:

import gzip
from StringIO import StringIO
import urllib2

# helper function for fetching content from a URL
def fetch_url(url):

  # attempt to fetch the URL's contents with gzip compression
  request = urllib2.Request(url)
  request.add_header('Accept-encoding', 'gzip')
  response = urllib2.urlopen(request)

  # get ready for the content
  content = ''

  # if the response is gzip-encoded as expected
  if response.info().get('Content-Encoding') == 'gzip':

    # read the encoded response into a buffer
    buffer = StringIO(response.read())

    # gzip decode the response
    f = gzip.GzipFile(fileobj=buffer)

    # store the result
    content = f.read()

    # close the buffer
    buffer.close()

  # else if the response isn't gzip-encoded
  else:

    # store the result
    content = response.read()

  # return the content
  return content

I tested the function with and without the add_header() line and it worked as desired in both cases. Hopefully this will prove useful for someone else.
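
The function above targets Python 2; the StringIO and urllib2 modules it imports no longer exist under those names in Python 3. A minimal sketch of the same idea for Python 3 might look like the following. It assumes the whole body fits in memory, it returns bytes rather than text, and the decode_body helper is a hypothetical name introduced here for illustration, not part of any library.

```python
import gzip
import urllib.request


def decode_body(body, content_encoding):
    """Return the response body as bytes, gunzipping only when needed."""
    # Normalize the header value, since it may be missing or vary in case
    if (content_encoding or "").strip().lower() == "gzip":
        return gzip.decompress(body)
    # The server did not gzip the response, so use the body as-is
    return body


def fetch_url(url):
    """Request gzip-compressed content, but accept an uncompressed reply."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response:
        encoding = response.headers.get("Content-Encoding")
        return decode_body(response.read(), encoding)
```

Splitting the decoding step into its own helper makes it possible to check both branches without a network connection, for example by passing in gzip.compress(b"some bytes") with "gzip" and the same bytes uncompressed with None. Callers that want text would still need to decode the returned bytes with the appropriate character set.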
