Blogroll

Using Python to convert OPML to HTML

If you follow a number of blogs in a feed reader such as Feedly, wouldn’t it be great if you could turn the OPML export directly into nicely formatted HTML for a bulleted list on your own blog, complete with each author’s own description of their blog? That’s what I thought, so I wrote this Python script to do exactly that.

It goes through each feed listed in an OPML file, loads the feed, and reads its description, then compiles them all into a single chunk of HTML: a list of links ready to drop into a page on a blog. Here’s how you might call the script:

python opml2html.py subscriptions.opml > html.txt
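
To make the first two steps concrete, here is a minimal sketch of how the feeds can be picked out of an OPML file and how a description can be read from a feed. It is written for Python 3 rather than the Python 2 of the script itself, and it works on in-memory strings instead of fetching anything. The sample OPML and RSS are made up for illustration (the blog names and example.com/example.org URLs are placeholders), and `list_feeds` and `first_description` are hypothetical helper names, not part of the script.

```python
import xml.etree.ElementTree as ET

# A made-up OPML export; a folder outline wraps the actual feed outlines.
SAMPLE_OPML = """<opml version="1.0">
  <head><title>Subscriptions</title></head>
  <body>
    <outline text="Friends" title="Friends">
      <outline type="rss" text="Example Blog" title="Example Blog"
               xmlUrl="https://example.com/feed" htmlUrl="https://example.com/"/>
      <outline type="rss" text="Another Blog" title="Another Blog"
               xmlUrl="https://example.org/rss" htmlUrl="https://example.org/"/>
    </outline>
  </body>
</opml>"""

# A made-up RSS feed with a channel description and one item.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <description>Thoughts on nothing in particular</description>
    <item><title>Hello</title><description>First post</description></item>
  </channel>
</rss>"""


def list_feeds(opml_text):
    """Return (title, site_url, feed_url) for every outline that describes a blog."""
    root = ET.fromstring(opml_text)
    feeds = []
    # ".//outline" matches outlines at any depth, so folders are visited too
    for outline in root.findall(".//outline"):
        attrs = outline.attrib
        # folder outlines have no htmlUrl, so they are skipped here
        if "text" in attrs and "htmlUrl" in attrs:
            feeds.append((attrs["text"], attrs["htmlUrl"], attrs.get("xmlUrl", "")))
    return feeds


def first_description(rss_text):
    """Return the first description found under the channel, or a fallback."""
    root = ET.fromstring(rss_text)
    # results come back in document order, so the channel's own description
    # is first as long as it appears before the items
    found = root.findall("channel//description")
    if not found or found[0].text is None:
        return "No description..."
    return found[0].text


for title, site_url, feed_url in list_feeds(SAMPLE_OPML):
    print(title, site_url, feed_url)
print(first_description(SAMPLE_RSS))
```

The folder outline ("Friends") is matched by the search but dropped by the attribute check, which is the same reason the script tests for both `text` and `htmlUrl` before treating an outline as a blog.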

And here’s the script to do the work:

import sys
import urllib2
import xml.etree.ElementTree as ET

# Prepare a blog object
class Blog:
    def __init__(self,title,url,rss,description):
        self.Title = title
        self.URL = url
        self.RSS = rss
        self.Description = description

# Prepare a blog list
blogs = []

# get the filename passed in
filename = sys.argv[1]
# progress messages go to stderr so they stay out of the redirected HTML
print >> sys.stderr, 'Processing ' + filename

# load and parse the file
opml_tree = ET.parse(filename)
opml_root = opml_tree.getroot()

# find the feeds
feeds = opml_root.findall(".//outline")

# loop through the feeds and collect the details of each blog
for feed in feeds :

    # Check we have the text and htmlUrl attributes at least (the title and url of the blog)
    if "text" in feed.attrib :

        if "htmlUrl" in feed.attrib :

            # get the properties of the feed
            feed_title = feed.attrib['text']
            feed_url = feed.attrib['htmlUrl']

            feed_description = ""
            feed_rss = ""

            if "xmlUrl" in feed.attrib :

                feed_rss = feed.attrib['xmlUrl']
                
                # progress message (stderr, so it stays out of the redirected HTML)
                print >> sys.stderr, feed_rss
                
                try:
                    
                    feed_tree = ET.parse(urllib2.urlopen(feed_rss))
                    feed_root = feed_tree.getroot()
                    descriptions = feed_root.findall('channel//description')

                    if descriptions[0].text is None :
                        feed_description = "No description..."
                    else :
                        feed_description = descriptions[0].text

                    
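                except IndexError:
                    # the feed has no description element under channel
                    feed_description = "No description..."
                except ET.ParseError:
                    # the feed is not well-formed XML
                    feed_description = "No description..."
                except urllib2.URLError:
                    # also covers urllib2.HTTPError, which is a subclass of URLError
                    feed_description = "RSS Feed Not Found..."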

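                # progress messages (stderr, so they stay out of the redirected HTML)
                print >> sys.stderr, feed_description
                print >> sys.stderr, "-"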


            blog = Blog(feed_title,feed_url,feed_rss,feed_description)
            blogs.append(blog)

# Sort the blogs
blogs.sort(key=lambda blog: blog.Title)

# start HTML output
html = "\n"

# output HTML
print html
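
As a rough sketch of how a sorted list of blogs could be turned into a bulleted list of links (Python 3 here, and not the original code): `render_blogroll`, the `<ul>`/`<li>` layout, the case-insensitive sort and the sample entries are all assumptions made for illustration.

```python
from html import escape


def render_blogroll(blogs):
    """Build an HTML bulleted list from (title, url, description) tuples."""
    lines = ["<ul>"]
    # sort case-insensitively by title so the list reads alphabetically
    for title, url, description in sorted(blogs, key=lambda b: b[0].lower()):
        lines.append(
            '<li><a href="{0}">{1}</a> - {2}</li>'.format(
                escape(url, quote=True), escape(title), escape(description)
            )
        )
    lines.append("</ul>")
    return "\n".join(lines)


# made-up entries for illustration
sample = [
    ("Zebra Notes", "https://example.org/", "Stripes & more"),
    ("Example Blog", "https://example.com/", "No description..."),
]
print(render_blogroll(sample))
```

Escaping the title and description keeps stray `&` or `<` characters in a feed from breaking the page. Some feeds put HTML in their description on purpose, in which case escaping will show the tags literally, so that part is a judgment call.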
Posted by Jonathan Beckett in Notes