RSS4website.com
 



 

    Making Headlines with RSS

    Using Rich Site Summaries To Draw New Visitors

    By Jonathan Eisenzopf

    In the early years of the Web, most sites were not concerned about sharing data with other sites. Today, the trend is that sites are increasingly interdependent and many rely upon integrating content that originates somewhere else. Such content might include news feeds, events listings, a set of project updates, and even interchange of corporate information. Effective integration usually requires a good deal of effort on the part of the information provider, as well as the recipient of each unique data source.

    Sharing content among sites is most often called syndication, a term we associate with licensed content such as TV reruns and newspaper columns. Providing content from one source for distribution in many different channels is what a syndicate does, and it usually requires an established business relationship. Companies like iSyndicate.com and specifications such as Internet Content Exchange (ICE) are examples of attempts to apply the traditional syndication model to the Web. (For more information on ICE, see "Self-Service Syndication with ICE," Web Techniques, November 1999.) However, the Web also offers a new open-ended syndication model that's hardly traditional.

    The basis for this new model is an XML-based format known as Rich Site Summary. RSS was first developed by Netscape to drive channels for Netscape Netcenter. Netscape no longer seems to be leading the RSS effort, but others, such as Dave Winer of Userland Software, have picked it up. More importantly, content providers like Slashdot, the Motley Fool, Wired News, and Linux Today have been adopting RSS as a means of circulating headlines and links to new stories on their sites. RSS is becoming a vital "What's New" mechanism that serves a variety of purposes while helping to attract traffic from many different locations on the Web. RSS seems to be succeeding because it's a simple way to solve a common problem that extends far beyond the idea of syndication. RSS is a better way to share data than more common approaches, such as fetching and parsing HTML, or using proprietary APIs, database dumps, and cobranding.

    Grabbing and parsing the HTML from a provider's Web site is the most common way to share data. The problem with this cut-and-paste method is that an application must be developed and maintained for each data source. These applications will most likely have to change each time the provider changes the HTML presentation. This can quickly become cumbersome and cost prohibitive when gathering information from multiple sources.

    APIs that let partners access data are an improvement, but they also can create problems. First, APIs are usually language dependent, and hence may require core competencies unavailable in-house. Second, APIs are not extensible: You are constrained to the data and functionality that the API provides. Third, each API will be implemented differently based on the habits and needs of the programmers that developed it. You'll have to maintain in-house expertise for each API you use.

    Web sites also exchange data via database dumps. But the data must be converted on both ends and you don't necessarily eliminate the problem of dealing with multiple data formats. This option would actually work if all content providers used the same data model for delivering information, an improbable scenario.

    Cobranding is a method in which the information provider hosts custom versions of the application for each customer. This works out nicely for subscribers that don't have any programming resources. The problem is that the data is either presented in a generic format that doesn't fit the customer's interface, or it requires that the content provider maintain a cobranding template for each customer. While this is a good solution, the functionality is limited to what the Web application can provide. It also requires a large amount of planning and development on the provider's part. However, this technique has worked out nicely for companies like Amazon.com that allow users to sign up and sell books from their own Web sites.

    Under the RSS model, each site publishes a file describing the contents of its "channel." Other sites can subscribe to that channel and grab its contents. The RSS file could be converted to HTML and displayed directly on a subscriber site, or it might be edited first to select only those items that are appropriate for the site's audience. The nice thing about RSS, of course, is that once you've built the system to subscribe to one RSS channel, you can subscribe to thousands of them.

    RSS Syntax

    RSS is an XML grammar for sharing data. That means that an RSS file contains placeholders for data, which are identified by a starting and ending tag. The first task required to RSS-enable your site is to create such a file on your Web server. This RSS file contains the title and description of items that you want to promote on your site. As you'll see, an RSS file is usually generated by a simple program but it can also be created by hand.

    Like any XML document, the first line of an RSS file contains an XML declaration:

    <?xml version="1.0"?>

    While the XML declaration isn't required, it is recommended for backwards compatibility.

    The next item in an RSS file is the document type declaration, which identifies the file as an RSS document and points to the RSS DTD. A validating parser needs it to test whether the file is valid against the rules of that DTD:

    <!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
    "http://my.netscape.com/publish/formats/rss-0.91.dtd">

    The rss element is the root element of an RSS file; that is, the top-level element that contains the rest of the document. The rss element must specify the version attribute. (The current version is 0.91.) It may also contain an encoding attribute (the default is UTF-8):

    <rss version="0.91" encoding="ISO_8859-1">

    An rss element may contain one and only one channel element. This element will contain the individual items. Each channel must contain the following elements:

    • title - the name of the channel
    • description - a short description of the channel
    • link - the URL of the channel Web site
    • language - the language code of the channel. A list of values is available from my.netscape.com. The code for U.S. English is en-us
    • one or more item elements

    A channel may also contain the following optional elements:

    • rating - the PICS rating for the channel Web site. PICS ratings are assigned by an independent agency. A list is available at www.w3.org/PICS/raters
    • copyright - content copyright
    • pubDate - date the channel was published
    • lastBuildDate - date the RSS was last updated
    • docs - additional information about the channel
    • managingEditor - channel's managing editor
    • webMaster - channel Webmaster
    • image - channel image
    • textinput - allows a user to send an HTML form text input string to a URL
    • skipHours - the hours that an aggregator should not collect the RSS file
    • skipDays - the weekdays that an aggregator should not collect the RSS file

    See Listing One for a complete example of an RSS channel for XML.com. [Editor's Note: All listings referenced in this article are available online at www.webtechniques.com/sourcecode.]

    A channel may contain an image or logo. The image element must contain the image title, commonly used as the ALT attribute when converted to an HTML image element, and the URL of the image itself.

    The image element may also include the following optional elements:

    • link - a URL that the image should be linked to
    • width - the image width
    • height - the image height
    • description - an area for additional text

    The textinput element lets users enter data in an HTML text field and submit it to a URL. It contains the following subelements:

    • title - label of the submit button
    • description - text input description
    • name - text input name
    • link - URL to which to send the input

    For example, the Freshmeat channel in Listing Two (available online) contains a textinput element that lets users search the application database.

    Each channel can contain up to 15 items. Actually, you can include more, but if you do, Netscape Netcenter won't accept the file. Each item contains a title, link, and description. The item elements are the real meat of the RSS file. They provide the headlines and summaries of the content you want to share with other sites.

    The RSS specification includes all HTML entities for convenience; however, you can't include any HTML elements, such as <p>. For the RSS file to remain valid you should use only those elements that have been defined in the specification. Additionally, you must follow a few basic XML guidelines for the file to be well formed. An XML parser can't properly parse an XML file unless the file follows these well-formedness rules:

    1) Each starting tag must have an ending tag.

    2) Special characters such as &, ", <, and > must be encoded as the entities &amp;, &quot;, &lt;, and &gt;.

    3) XML elements must be properly nested; that is, each end tag must close its element at the same level in the tree as the start tag.
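Rule 2 is the one most often broken when headlines are pasted into a file by hand. The following is a minimal sketch of an escaping routine that uses only core Perl; the name escape_xml and the sample headline are assumptions made for illustration and are not part of the RSS specification or of any module discussed here.

```perl
use strict;
use warnings;

# Hypothetical helper: encode the characters that XML treats as markup.
sub escape_xml {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so the entities below are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

my $headline = 'Q&A: Is "<p>" allowed?';
print "<title>", escape_xml($headline), "</title>\n";
# prints: <title>Q&amp;A: Is &quot;&lt;p&gt;&quot; allowed?</title>
```

Replacing the ampersand first matters: if it ran last, it would re-encode the ampersands introduced by the other three substitutions.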

    Creating an RSS File

    I've written a Perl module, XML::RSS, that makes it easy to maintain and parse RSS files. XML::RSS requires the XML::Parser module maintained by Clark Cooper. Both are available through CPAN. In addition to those described in this article, I've also developed a number of other freely available RSS tools for gathering, editing, and displaying RSS files, most of which are available at motherofperl.com. Instructions on installing the XML::RSS module are also available from the site.

    To use the module in a Perl program, you must first load the module into memory and create a new instance of the class:

    use XML::RSS;
    my $rss = XML::RSS->new;

    Optionally, you can pass the RSS version and the character encoding into the new method when creating a new instance:

    my $rss = XML::RSS->new(version => '0.91', encoding => 'ISO_8859-1');

    XML::RSS simplifies several common tasks related to maintaining an RSS file. First, the module abstracts the XML syntax into a number of class methods. For each RSS element, there is a related method. Each element method operates in a similar fashion. For example, to set values for the channel element, we would call the channel method and pass it an associative array, which contains the names and values of each channel subelement (see Example 1).
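The printed examples are not reproduced in this copy of the article, so here is a minimal sketch of what a call to the channel method might look like. The channel values are placeholders on the reserved example.com domain, not data from the article.

```perl
use strict;
use warnings;
use XML::RSS;

my $rss = XML::RSS->new(version => '0.91');

# Each key names a channel subelement; the values are placeholders.
$rss->channel(
    title       => 'Example Headlines',
    link        => 'http://www.example.com/',
    description => 'News from a hypothetical example site',
    language    => 'en-us',
);

# Passing a single subelement name returns its current value.
print $rss->channel('title'), "\n";

# as_string returns the whole document as RSS text.
print $rss->as_string;
```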

    You can also use these methods to modify values of the RSS. For example, to change the URL of the RSS image, you might use the following:

    $rss->image(url => 'http://freshmeat.net/images/fm.mini.jpg');

    Because there are multiple items in an RSS file, the add_item method is used to add a new RSS item, as shown in Example 2.

    By default, the add_item method appends the item to the end of the list, but you can also force the item to be inserted at the beginning by setting the mode to insert, as shown in Example 3.
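As a rough illustration of the two modes (a sketch with made-up headlines and URLs, not a copy of the article's printed examples), the following adds one item in the default append mode and a second with mode set to insert, then prints the titles in their stored order.

```perl
use strict;
use warnings;
use XML::RSS;

my $rss = XML::RSS->new(version => '0.91');
$rss->channel(
    title       => 'Example Headlines',
    link        => 'http://www.example.com/',
    description => 'News from a hypothetical example site',
    language    => 'en-us',
);

# Default mode: the item is appended to the end of the list.
$rss->add_item(
    title       => 'Older story',
    link        => 'http://www.example.com/older.html',
    description => 'A placeholder summary',
);

# mode => 'insert' places the item at the front of the list instead.
$rss->add_item(
    title       => 'Breaking story',
    link        => 'http://www.example.com/breaking.html',
    description => 'Another placeholder summary',
    mode        => 'insert',
);

# Titles come out newest first: the inserted item, then the appended one.
print "$_->{title}\n" for @{ $rss->{items} };
```

Insert mode is usually what a headline feed wants, since readers and aggregators tend to show items in file order.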

    Retrieving the values of an RSS file is also simple, but first, you probably want to parse an RSS file. The parsefile method takes the RSS filename as its only parameter and transforms it into a multidimensional hash:

    $rss->parsefile("fm.rss");

    To access the value of a subelement, simply pass the name of the subelement into the method. For example, to retrieve the value of the textinput description:

    my $ti_desc = $rss->textinput("description");

    The element method will return the value of the subelement. Once you've created and/or modified the RSS file, you can save it with the save method:

    $rss->save("fm.rss");
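To try the accessors without a file on disk, the module's parse method accepts the RSS text as a string. The sketch below assumes a small hand-written channel; every value in it is a placeholder.

```perl
use strict;
use warnings;
use XML::RSS;

# A tiny hand-written RSS 0.91 document with placeholder values.
my $xml = <<'END_RSS';
<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example Channel</title>
    <link>http://www.example.com/</link>
    <description>A hypothetical channel</description>
    <language>en-us</language>
    <item>
      <title>First headline</title>
      <link>http://www.example.com/1.html</link>
    </item>
    <textinput>
      <title>Search</title>
      <description>Search the example site</description>
      <name>query</name>
      <link>http://www.example.com/search</link>
    </textinput>
  </channel>
</rss>
END_RSS

my $rss = XML::RSS->new;
$rss->parse($xml);    # parse() takes a string; parsefile() takes a filename

# Element methods return a subelement's value when given its name.
print "Channel:    ", $rss->channel('title'), "\n";
print "Text input: ", $rss->textinput('description'), "\n";
print "First item: ", $rss->{items}[0]{title}, "\n";
```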

    Before you can begin syndicating content, you'll need to set up a process to keep the RSS file up-to-date. Optimally, when a new item is posted to your Web site, it will also show up in the RSS file. You can maintain this file by hand, but I suspect most will prefer to automate the process with a script. Listing Two is a Perl script that uses the XML::RSS module to create a channel for Freshmeat and save it to fm.rdf. The output of the script is contained in Listing Three.

    The XML::RSS module also makes it easy to update an RSS file. Listing Four is a short script that inserts a new item in our Freshmeat RSS file. Notice the order of the script. First, we load the module into memory with the use statement. Then we create a new instance of the class with the new method, setting the RSS version to 0.91. Next we parse an RSS file with the parsefile method, insert a new item with the add_item method, and then save the RSS to a file with the save method.
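The sequence just described (load, create, parse, insert, save) can be sketched as follows. This is not Listing Four: the file name, channel values, and headlines are assumptions, and the script first writes a small starting file into a temporary directory so that it can run on its own. The trimming step is an addition that keeps the channel within the 15-item limit mentioned earlier.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use XML::RSS;

# Hypothetical file location; CLEANUP removes the directory at exit.
my $file = tempdir(CLEANUP => 1) . '/example.rss';

# Create a starting file so there is something to update.
my $rss = XML::RSS->new(version => '0.91');
$rss->channel(
    title       => 'Example Headlines',
    link        => 'http://www.example.com/',
    description => 'News from a hypothetical example site',
    language    => 'en-us',
);
$rss->add_item(
    title => 'Existing story',
    link  => 'http://www.example.com/existing.html',
);
$rss->save($file);

# The update itself: parse the file, make room, insert, and save.
my $update = XML::RSS->new(version => '0.91');
$update->parsefile($file);

# Drop the oldest items so the channel never exceeds 15 after the insert.
pop @{ $update->{items} } while @{ $update->{items} } >= 15;

$update->add_item(
    title => 'New story',
    link  => 'http://www.example.com/new.html',
    mode  => 'insert',
);
$update->save($file);
```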

    Converting an RSS File to HTML

    The previous two examples demonstrated how to maintain an RSS file, but what if you're on the receiving end? The easiest method of displaying an RSS file on a Web site is to convert it to HTML and use an SSI to bring the content into a template. Listing Five does just that. It's a command-line script that takes a filename or URL as a parameter, iterates through the XML::RSS internal structure, and prints the HTML equivalent. If the command-line parameter is an HTTP URL, the RSS file is fetched from the remote Web server via the LWP::Simple module.

    In Listing Five, we iterate through items inside a foreach loop, printing the corresponding title and link. The last part of the subroutine prints the HTML form using the textinput subelements. The result is an HTML form field that lets a user search for applications on Freshmeat. Listing Six is the output of the script when using Listing One as the input file.

    Now that we have the XML.com channel in an HTML format, we can include it on our Web site. The majority of the script is contained in the print_html subroutine, which handles the RSS-to-HTML conversion. Most of the subroutine is actually HTML code.

    The first few lines of the subroutine print a table header that contains the channel title, link, and image. As I mentioned previously, the XML::RSS module builds a multidimensional hash that represents the RSS file. The hash can be accessed directly instead of using the class methods. For example, the channel title and link are contained in

    $rss->{'channel'}->{'title'}

    and in

    $rss->{'channel'}->{'link'},

    respectively. Likewise, the image URL and link are contained in

    $rss->{'image'}->{'url'}

    and

    $rss->{'image'}->{'link'},

    while

    $rss->{'items'}

    is a reference to the array of RSS items.
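A stripped-down sketch of walking that structure to produce HTML might look like the following. It is not Listing Five: the helper name channel_to_html, the list markup, and the channel data are assumptions, and it skips the image and textinput handling the article describes.

```perl
use strict;
use warnings;
use XML::RSS;

# Hypothetical helper: render the channel heading and items as an HTML list.
# Values are inserted as-is; a production script should HTML-escape them.
sub channel_to_html {
    my ($rss) = @_;
    my $html = sprintf qq{<h3><a href="%s">%s</a></h3>\n<ul>\n},
        $rss->{channel}{link}, $rss->{channel}{title};
    for my $item (@{ $rss->{items} }) {
        $html .= sprintf qq{  <li><a href="%s">%s</a></li>\n},
            $item->{link}, $item->{title};
    }
    return $html . "</ul>\n";
}

my $rss = XML::RSS->new(version => '0.91');
$rss->channel(
    title       => 'Example Headlines',
    link        => 'http://www.example.com/',
    description => 'News from a hypothetical example site',
    language    => 'en-us',
);
$rss->add_item(
    title => 'First headline',
    link  => 'http://www.example.com/1.html',
);

print channel_to_html($rss);
```

Reading the hash directly keeps the loop short, at the cost of depending on the module's internal layout rather than its methods.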

    Syndication

    Once an RSS file exists, any other site can grab it regularly. RSS standardizes a format for the delivery of content. This makes it easier for a content provider to distribute content broadly, and for an affiliate to receive and process content from multiple sources. However, in most cases only the headlines are distributed, not the content itself, which means that users must visit your site to read a story that interests them. This suits content providers that use ad banners as a primary source of revenue, a model that depends on a large volume of users reading their content on a regular basis. For such sites, the RSS format is a marriage made in heaven for extending readership, which explains why most early adopters have been news providers.

    Here's how it works. First, start generating one or more RSS files for your Web site. Drop the headline into the title element and give a teaser or summary in the description element. Drop the content URL into the link element. Second, make the RSS file available on your Web server and register it on as many aggregators as you can (see "Online"). Traffic to your Web site will increase as users add your RSS channel to their Web sites and news readers.

    Once you're publishing an RSS file you can begin to flow content into new venues such as mailing lists, PDAs, cell phones, and set-top boxes. For example, you may decide to offer headlines in a PDA-friendly format, or create a weekly email newsletter comprising what's new on your Web site. More importantly, you can now flow data between partner or affiliate Web sites.

    Let's pretend for a moment that your site is part of a Web development affiliate network. Each site focuses on a particular specialty or area of interest. You would like to cross-promote headlines among sites to maximize readership. Also, there are times when one affiliate may carry content that directly relates to another site's readership. Cross-posting this information is in the interest of all affiliates. If a Web site makes its RSS files available, its affiliates can easily integrate the provider's headlines. When users read the headline and click on the link to read the story, both sites get their page views.

    Aggregation

    The practice of gathering multiple RSS channels into one central location is called aggregation. While most aggregator Web sites share a common goal -- gathering content -- they serve different purposes. For example, my.netscape.com offers its feeds as channels to Netcenter users, whereas iSyndicate.com offers news feeds primarily for use on other Web sites. Another implementation of aggregation is Dave Winer's my.userland.com, which offers a service similar to my.netscape.com. However, the aggregator also offers aggregate feeds, which send new content to partners via XML-RPC function calls. The benefit of using aggregators is that they make many feeds available from one place. Furthermore, an aggregator may offer tools or solutions that allow partners to customize feeds and minimize the integration effort. In addition, an aggregator site might provide tools and services that make it easier for content providers to syndicate their information.

    Weblogs

    One of the more interesting trends the Web has seen in the past months is the advent of the Weblog. A Weblog is a portal to the life of an individual or group. The ideas posted on a Weblog often include personal, political, technical, or editorial comments that are significant to the author. The Web site that popularized the Weblog is probably Slashdot.org, a site that posts interesting technology tidbits for computer geeks. Scripting.com, an earlier example of a Weblog, is a site at which readers get a personal insight into the mind of Dave Winer. Dave often combines his opinions of technical innovations with politics, philosophy, and history, which makes for an interesting daily read.

    It turns out that RSS is a good foundation for creating a Weblog. An example is PerlXML.com, a site containing Perl/XML resources and news. A simple CGI script that uses the XML::RSS Perl module is used to add new headlines. The script updates both the front-page HTML file and the RSS headlines, which are then picked up by several aggregators including my.netscape.com and my.userland.com. This dual-purpose method relieves the Weblog editor of updating multiple files. Instead, the editor can focus on his or her job and let an application on the Web server do the work behind the scenes.

    The Future

    RSS can be used easily as a generic format for exchanging content on the Web. More Web sites are using XML and RSS as they discover that the technologies help promote traffic to a site. RSS is a good starting point for many Webmasters who aren't ready to immerse themselves in XML yet.

    It's important to note that while RSS is capable of syndicating content headlines, there are other XML formats like XMLNews and ICE that are better suited for handling larger syndication systems.



    Jonathan is vice president of technology at Whirlwind Interactive (www.wwind.com) and can be reached via email at eisen@wwind.com.

  • Source: http://www.newarchitectmag.com/archives/2000/02/eisenzopf/


 


Copyright © 2004 RSS4website.com