RSS Over CSV

DRAFT SPECIFICATION
Version 0.1

This specification describes a method of encoding RSS data as a comma-separated plain text file.

Rationale

XML can get quite messy. RSS Over CSV is an alternative method of outputting a site summary that is easier to generate and easier to parse. It is recommended that RSS Over CSV be implemented alongside regular RSS rather than instead of it.

Syntax

CSV Syntax

Over the years, many variants of the CSV format have evolved, each offering its own method of escaping special characters. The variant described here is intended to be compatible with as many existing user agents as possible.

CSV is the encoding of a two dimensional array, referred to in terms of rows and columns. Where a row and column intersect, there is a cell which may contain a value. The following is a non-CSV representation of an array with four rows and three columns:

One     Un       Ein
Two     Deux     Zwei
Three   Trois    Drei
Four    Quatre   Vier

When encoding to CSV, the array is output one row at a time into a plain text file, separating each row with a new-line character. Within each row, cells are separated by comma characters.

The example above might be represented in a CSV file as:

One,Un,Ein
Two,Deux,Zwei
Three,Trois,Drei
Four,Quatre,Vier

Line Endings

RSS Over CSV files SHOULD use Unix-style line endings (ASCII code 10), but may use Windows-style endings (ASCII 13 followed by ASCII 10). They MUST NOT use classic Mac OS line endings (ASCII 13 alone). Files MUST use the same line ending consistently on every line.
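
As a rough illustration of the rule above, a receiving agent could classify a file's line endings before parsing it. The following Python sketch is not part of this specification; the function name check_line_endings and its return values are invented for the example.

```python
def check_line_endings(data: bytes) -> str:
    """Classify the line endings of an RSS Over CSV file given as raw bytes.

    Returns "unix" or "windows" (hypothetical labels), or raises ValueError
    if the file uses bare CR endings or mixes the two permitted styles.
    """
    crlf = data.count(b"\r\n")
    lone_cr = data.count(b"\r") - crlf  # CR not followed by LF
    lone_lf = data.count(b"\n") - crlf  # LF not preceded by CR
    if lone_cr:
        raise ValueError("bare CR (ASCII 13) line endings are not allowed")
    if crlf and lone_lf:
        raise ValueError("line endings are not consistent")
    return "windows" if crlf else "unix"
```

A file with no line breaks at all is reported as "unix" here, which is an arbitrary choice made for the sketch.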

Escaping Special Characters

As the above suggests, commas, carriage returns and new-line characters have a special meaning in CSV files, and so need some form of escaping when they occur in cell data.

Dealing with Line Breaks in Cell Data

Line breaks in cell data MUST be stripped out before writing the data to the CSV file. They SHOULD be replaced with other whitespace characters, such as tab or space.
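
One possible way to do this in Python is sketched below. The function name flatten_line_breaks is hypothetical, and replacing each line break with a single space is just one of the permitted choices.

```python
import re


def flatten_line_breaks(value: str) -> str:
    """Replace every line break in a cell value with a single space.

    CR LF pairs are matched first so that they become one space, not two.
    """
    return re.sub(r"\r\n|\r|\n", " ", value)
```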

Dealing with Commas in Cell Data

Any cells containing a comma character MUST have their value wrapped with double-quote marks. Cells not containing commas may be similarly wrapped, but do not have to be.

For example, the following table:

Soap Opera          Setting
EastEnders          London, UK
Coronation Street   Manchester, UK
Home & Away         Sydney, Australia

Might be encoded as:

Soap Opera,Setting
"EastEnders","London, UK"
Coronation Street,"Manchester, UK"
Home & Away,"Sydney, Australia"

(Note that “EastEnders” did not need to be quoted, but can be.)

When reassembling the CSV file into a two-dimensional array, the outer quote marks should not form part of the final cell data.

Dealing with Double-Quotes in Cell Data

Because double quotes are used to wrap cell values, they are special characters as well, and need some form of escaping when they occur within cell data. Cells containing double quotes MUST be wrapped with outer double quotes. Double quotes within the cell MUST be escaped by “doubling up” the quotes. So the cell:

She said, "there's something in the wood shed."

is encoded as:

"She said, ""there's something in the wood shed."""

Encoding of White Space

Leading and trailing whitespace in cell values MUST be ignored unless the whitespace is enclosed within quote marks. So the following lines are considered equivalent:

Alpha,      Beta      ,Gamma, Delta
Alpha,Beta,Gamma    ,Delta

But the following lines are not:

Alpha,Beta  ,Gamma,Delta
Alpha,"Beta  ",Gamma,Delta
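
The quoting and white space rules in this section can be sketched as a small encoder and parser for a single row. This is an illustrative Python sketch under some assumptions, not a reference implementation: the names encode_cell, encode_row and parse_row are invented here, line breaks are assumed to have been removed from cell data already, and "whitespace" is taken to mean spaces and tabs.

```python
def encode_cell(value: str) -> str:
    """Encode one cell, adding outer quotes only where the rules need them."""
    needs_quotes = (
        "," in value
        or '"' in value
        or value != value.strip()  # leading/trailing whitespace is significant
    )
    if needs_quotes:
        return '"' + value.replace('"', '""') + '"'
    return value


def encode_row(cells) -> str:
    """Join encoded cells with commas (no line ending is appended)."""
    return ",".join(encode_cell(cell) for cell in cells)


def parse_row(line: str) -> list:
    """Split one line (without its line ending) into cell values."""
    cells = []
    i, n = 0, len(line)
    while True:
        # Leading whitespace outside quotes is ignored.
        while i < n and line[i] in " \t":
            i += 1
        if i < n and line[i] == '"':
            i += 1  # step over the opening quote
            buf = []
            while i < n:
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        buf.append('"')  # doubled quote is a literal quote
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                buf.append(line[i])
                i += 1
            cells.append("".join(buf))
            # Ignore trailing whitespace (and, leniently, anything else)
            # between the closing quote and the next comma.
            while i < n and line[i] != ",":
                i += 1
        else:
            start = i
            while i < n and line[i] != ",":
                i += 1
            cells.append(line[start:i].strip(" \t"))
        if i >= n:
            break
        i += 1  # step over the comma
    return cells
```

With this sketch, the two lines given as equivalent above both parse to the cells Alpha, Beta, Gamma and Delta, while the quoted "Beta  " keeps its two trailing spaces.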

Encoding RSS

Now that we have a specific method of encoding a two-dimensional array (table) into a CSV file, we can deal with how RSS is fitted into a two-dimensional array.

An RSS file typically contains one or more channel elements, each of which contains one or more item elements. These channel and item elements are encoded to a table such that each channel element forms one row of the table. Item elements contained within that channel form subsequent table rows until either another channel element is encountered or the end of the file is reached.

After the first row (which contains column headings), the next row SHOULD encode a channel, not an item.

Column Headings

The first row of the table MUST provide column headings. The following columns MUST be present:

RSS Element
Title
Link
Description

Additionally, a “Language” column SHOULD be present.

Other columns may be present, but SHOULD correspond to metadata elements defined in the RSS 2.0 specification. These columns may occur in any order. Receiving agents MUST accept columns in any order.

Column headings MUST be considered case-insensitive. Thus a “link” column is the same as “Link”, “LINK” or “LiNk”.
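
A receiving agent might handle both requirements (any column order, case-insensitive headings) by building a lookup from normalised heading to column position. The Python sketch below is one way to do that; the name index_columns is hypothetical, and rejecting duplicate headings is an assumption of the sketch, not a rule of this specification.

```python
def index_columns(headings):
    """Map each case-folded column heading to its position in the row."""
    index = {}
    for position, heading in enumerate(headings):
        key = heading.strip().casefold()
        if key in index:
            # Assumption: two headings that differ only in case are an error.
            raise ValueError(f"duplicate column heading: {heading!r}")
        index[key] = position
    return index


# Usage: look up a cell by heading, whatever the column order or case.
columns = index_columns(["RSS Element", "Title", "LiNk", "Description"])
row = ["item", "An example", "http://example.com/1", "Some text"]
link = row[columns["link"]]
```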

“RSS Element” Column

Cells in this column MUST have a value of “item” or “channel”. This is case-insensitive.

The cell value informs the user agent whether this row encodes a channel element or an item element. When a channel element is encountered, the following item elements are assumed to belong to that channel until another channel is encountered, or the end of the file is reached.
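
The grouping described above can be sketched in Python as follows. The sketch assumes the file has already been parsed into a table (a list of rows, each a list of cell strings, with the headings in the first row). The function name group_channels and the shape of its result are invented for illustration, and raising an error for an item that precedes any channel is a choice made here, not something this specification requires.

```python
def group_channels(table):
    """Group item rows under the channel row that precedes them."""
    headings = [heading.strip().lower() for heading in table[0]]
    kind_column = headings.index("rss element")
    channels = []
    for row in table[1:]:
        record = dict(zip(headings, row))
        kind = row[kind_column].lower()  # "item"/"channel" is case-insensitive
        if kind == "channel":
            channels.append({"channel": record, "items": []})
        elif kind == "item":
            if not channels:
                # Assumption: this sketch rejects items with no channel.
                raise ValueError("item row appears before any channel row")
            channels[-1]["items"].append(record)
        else:
            raise ValueError(f"unexpected RSS Element value: {row[kind_column]!r}")
    return channels
```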

Other Columns

The other columns correspond to the link, title, description, language and other metadata elements found in RSS.

Examples

Example RSS file taken from http://developer.apple.com/internet/webcontent/rsssample_source.html:

<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>scottandrew.com JavaScript and DHTML Channel</title>
    <link>http://www.scottandrew.com</link>
    <description>DHTML, DOM and JavaScript snippets from scottandrew.com</description>
    <language>en-us</language>
    <item>
      <title>DHTML Animation Array Generator</title>
      <description>Robert points us to the first third-party tool for the DomAPI: The Animation Array Generator, a visual tool for creating...</description>
      <link>http://www.scottandrew.com/weblog/2002_06#a000395</link>
    </item>
    <item>
      <title>DOM and Extended Entries</title>
      <description>Aarondot: A Better Way To Display Extended Entries. Very cool, and uses the DOM and JavaScript to reveal the extended...</description>
      <link>http://www.scottandrew.com/weblog/2002_06#a000373</link>
    </item>
    <item>
      <title>cellspacing and the DOM</title>
      <description>By the way, if you're using the DOM to generate TABLE elements, you have to use setAttribute() to set the...</description>
      <link>http://www.scottandrew.com/weblog/2002_05#a000365</link>
    </item>
    <item>
      <title>contenteditable for Mozilla</title>
      <description>The folks art Q42, creator of Quek (cute little avatar/chat) and Xopus (browser-based WYSIWYG XML-editor) have released code that simulates...</description>
      <link>http://www.scottandrew.com/weblog/2002_05#a000361</link>
    </item>
  </channel>
</rss>

Translated into RSS Over CSV:

RSS Element, Title, Link, Description
channel, scottandrew.com JavaScript and DHTML Channel, http://www.scottandrew.com, "DHTML, DOM and JavaScript snippets from scottandrew.com"
item, DHTML Animation Array Generator, http://www.scottandrew.com/weblog/2002_06#a000395, "Robert points us to the first third-party tool for the DomAPI: The Animation Array Generator, a visual tool for creating..."
item, DOM and Extended Entries, http://www.scottandrew.com/weblog/2002_06#a000373, "Aarondot: A Better Way To Display Extended Entries. Very cool, and uses the DOM and JavaScript to reveal the extended..."
item,"cellspacing and the DOM",   "http://www.scottandrew.com/weblog/2002_05#a000365","By the way, if you're using the DOM to generate TABLE elements, you have to use setAttribute() to set the..."
item,  contenteditable for Mozilla,http://www.scottandrew.com/weblog/2002_05#a000361, "The folks art Q42, creator of Quek (cute little avatar/chat) and Xopus (browser-based WYSIWYG XML-editor) have released code that simulates..."
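
A translation like the one above could be automated along the following lines. This Python sketch uses the standard library's xml.etree.ElementTree and assumes an RSS 0.91 or 2.0 style document without XML namespaces. The names rss_to_csv, encode_cell and FIELDS are hypothetical, and only the four columns used in the example are written.

```python
import re
import xml.etree.ElementTree as ET

# Child elements copied into the Title, Link and Description columns.
FIELDS = ["title", "link", "description"]


def encode_cell(value: str) -> str:
    """Flatten line breaks, then quote and escape the cell if needed."""
    value = re.sub(r"\r\n|\r|\n", " ", value)
    if "," in value or '"' in value or value != value.strip():
        return '"' + value.replace('"', '""') + '"'
    return value


def rss_to_csv(xml_text: str) -> str:
    """Convert RSS text into RSS Over CSV with Unix-style line endings."""
    root = ET.fromstring(xml_text)
    lines = ["RSS Element,Title,Link,Description"]
    for channel in root.iter("channel"):
        # The channel row comes first, followed by its item rows.
        rows = [("channel", channel)]
        rows += [("item", item) for item in channel.findall("item")]
        for kind, element in rows:
            cells = [kind] + [element.findtext(f, default="") for f in FIELDS]
            lines.append(",".join(encode_cell(cell) for cell in cells))
    return "\n".join(lines) + "\n"
```

The output of this sketch has no spaces after the commas and quotes only the cells that need it, so it will not match the example above character for character, though under the white space rules the two should decode to the same table.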

Credits

RSS Over CSV was devised by Toby Inkster as part of the demiblog project.