I was recently working with an HTTP-based API that supported gzip compression. While figuring out how best to request and handle gzip-compressed data in a Python script, I ran across a Dive Into Python entry and a discussion on the subject at Stack Overflow. The former goes into great detail about requesting compressed data and handling it when it's returned, and the latter is largely based on that same information, but neither sheds any light on what to do if the data comes back uncompressed. That's a fairly significant oversight: as a user agent, you can request gzip-compressed data as often as you like, but the web servers you request it from aren't guaranteed to return it that way.
To fill in the rest of the picture, I came up with the following function, which requests gzip-compressed data and handles the returned content whether it's compressed or not:
import gzip
from StringIO import StringIO
import urllib2

# helper function for fetching content from a URL
def fetch_url(url):
    # attempt to fetch the URL's contents with gzip compression
    request = urllib2.Request(url)
    request.add_header('Accept-encoding', 'gzip')
    response = urllib2.urlopen(request)
    # get ready for the content
    content = ''
    # if the response is gzip-encoded as expected
    if response.info().get('Content-Encoding') == 'gzip':
        # read the encoded response into a buffer
        buffer = StringIO(response.read())
        # gzip decode the response
        f = gzip.GzipFile(fileobj=buffer)
        # store the result
        content = f.read()
        # close the buffer
        buffer.close()
    # else if the response isn't gzip-encoded
    else:
        # store the result
        content = response.read()
    # return the content
    return content
I tested the function with and without the add_header() line and it worked as desired in both cases. Hopefully this will prove useful for someone else.
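The function above targets Python 2. In Python 3, urllib2 became urllib.request and the StringIO module moved into io, so for anyone on Python 3 the same idea might be sketched roughly as follows. This is a minimal sketch and not something I ran against the original API: the decode_body() helper is a name introduced here purely for illustration, and the case-insensitive header comparison is an extra assumption beyond what the original function does.

```python
import gzip
import urllib.request


def decode_body(body, content_encoding):
    """Return the response body as bytes, gunzipping only when needed.

    Hypothetical helper: it is split out so the compressed and
    uncompressed branches can be exercised without a network call.
    """
    # Normalize the header value before comparing (an added assumption;
    # the Python 2 version compares against 'gzip' exactly).
    if (content_encoding or '').strip().lower() == 'gzip':
        return gzip.decompress(body)
    # The server did not gzip the response, so return the body untouched.
    return body


def fetch_url(url):
    """Request gzip-compressed content and return decoded bytes either way."""
    request = urllib.request.Request(url)
    request.add_header('Accept-Encoding', 'gzip')
    with urllib.request.urlopen(request) as response:
        encoding = response.headers.get('Content-Encoding')
        return decode_body(response.read(), encoding)


# Exercise both branches locally, without depending on a remote server.
compressed = gzip.compress(b'hello')
assert decode_body(compressed, 'gzip') == b'hello'
assert decode_body(b'hello', None) == b'hello'
```

Keeping the decoding step in its own function means both paths can be checked with in-memory data, which is handy because you can't force a remote server to skip compression on demand. Note that this version returns bytes, so the caller still has to decode them to text if that's what's needed.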