DRAFT SPECIFICATION
Version 0.1
This specification describes a method of encoding RSS data as a comma-separated plain text file.
XML can get quite messy. Here is an alternative method of outputting a site summary that is easier to generate and easier to parse. It is recommended that RSS Over CSV be implemented alongside regular RSS rather than instead of it.
Over the years, many variants of the CSV file format have evolved, each offering its own method of escaping special characters. The variant described here is intended to be compatible with as many existing user agents as possible.
CSV is the encoding of a two-dimensional array, referred to in terms of rows and columns. Where a row and column intersect, there is a cell which may contain a value. The following is a non-CSV representation of an array with four rows and three columns:
One     Un       Ein
Two     Deux     Zwei
Three   Trois    Drei
Four    Quatros  Vier
When encoding to CSV, the array is output one row at a time into a plain text file, separating each row with a new-line character. Within each row, cells are separated by comma characters.
The example above might be represented in a CSV file as:
One,Un,Ein
Two,Deux,Zwei
Three,Trois,Drei
Four,Quatros,Vier
RSS Over CSV files SHOULD use Unix-style line endings (ASCII code 10), but may use Windows-style endings (ASCII 13 followed by ASCII 10). They MUST NOT use Mac OS-style endings (ASCII 13 alone). Files MUST end each line consistently.
It should be clear from the above that commas, carriage returns and new-line characters have a special meaning in CSV files, and thus need some form of escaping when they occur in real character data.
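The line-ending rules can be checked mechanically. The following is a minimal sketch in Python, not part of the specification; the function name `classify_line_endings` is hypothetical, and the choice to report an empty file as "unix" is an assumption.

```python
def classify_line_endings(data: bytes) -> str:
    """Return 'unix' or 'windows'; raise ValueError for forbidden endings.

    Hypothetical helper: works on raw bytes so that no newline
    translation happens before the check.
    """
    crlf = data.count(b"\r\n")
    bare_cr = data.count(b"\r") - crlf  # CR not followed by LF
    bare_lf = data.count(b"\n") - crlf  # LF not preceded by CR
    if bare_cr:
        raise ValueError("bare CR (Mac OS-style) line endings are not allowed")
    if crlf and bare_lf:
        raise ValueError("file mixes LF and CRLF line endings")
    # Assumption: a file with no line endings at all is reported as 'unix'.
    return "windows" if crlf else "unix"
```

A receiving agent might call this on the raw file contents before splitting the data into rows.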
Line breaks in cell data MUST be stripped out before writing the data to the CSV file. They SHOULD be replaced with other whitespace characters, such as tab or space.
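As an illustration of this rule, here is a small Python sketch that replaces each line break in a cell value with a single space. The function name `strip_line_breaks` is hypothetical, and using a space (rather than a tab) as the replacement is just one of the choices the rule allows.

```python
import re


def strip_line_breaks(value: str) -> str:
    """Replace every CRLF, CR or LF in a cell value with one space."""
    # CRLF is listed first so that it becomes one space, not two.
    return re.sub(r"\r\n|\r|\n", " ", value)
```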
Any cells containing a comma character MUST have their value wrapped with double-quote marks. Cells not containing commas may be similarly wrapped, but do not have to be.
For example, the following table:
Soap Opera          Setting
EastEnders          London, UK
Coronation Street   Manchester, UK
Home & Away         Sydney, Australia
Might be encoded as:
Soap Opera,Setting
"EastEnders","London, UK"
Coronation Street,"Manchester, UK"
Home & Away,"Sydney, Australia"
(Note that “EastEnders” did not need to be quoted, but can be.)
When reassembling the CSV file into a two-dimensional array, the outer quote marks should not form part of the final cell data.
Double quotes are therefore special characters as well, and need some form of escaping when they occur within cell data. Cells containing double quotes MUST be wrapped with outer double quotes. Double quotes within the cell MUST be escaped by “doubling up” the quotes. So the cell:
She said, "there's something in the wood shed."
is encoded as:
"She said, ""there's something in the wood shed."""
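The quoting rules for commas and double quotes can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation; the names `encode_cell` and `encode_row` are hypothetical, and the sketch assumes line breaks have already been removed from the cell values.

```python
from typing import List


def encode_cell(value: str) -> str:
    """Quote a cell only when it contains a comma or a double quote."""
    if "," in value or '"' in value:
        # Double up embedded quotes, then wrap the cell in outer quotes.
        return '"' + value.replace('"', '""') + '"'
    return value


def encode_row(cells: List[str]) -> str:
    """Join the encoded cells of one row with commas."""
    return ",".join(encode_cell(cell) for cell in cells)
```

Cells that need no quoting are left bare here, although wrapping them in quotes would be equally valid.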
Leading and trailing whitespace in cell values MUST be ignored unless the whitespace is enclosed within quote marks. So the following lines are considered equivalent:
Alpha, Beta ,Gamma, Delta
Alpha,Beta,Gamma ,Delta
But the following lines are not:
Alpha,Beta ,Gamma,Delta
Alpha,"Beta ",Gamma,Delta
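To show how the quoting and whitespace rules interact on the receiving side, here is a sketch of a row parser in Python. The name `parse_row` is hypothetical. Two behaviours are assumptions not stated in this specification: any stray text between a closing quote and the next comma is ignored, and an unterminated quoted cell runs to the end of the line.

```python
from typing import List


def parse_row(line: str) -> List[str]:
    """Split one CSV line (without its line ending) into cell values."""
    cells = []
    i, n = 0, len(line)
    while True:
        # Whitespace before a cell is never significant.
        while i < n and line[i] in " \t":
            i += 1
        if i < n and line[i] == '"':
            # Quoted cell: keep everything inside the quotes verbatim.
            i += 1
            buf = []
            while i < n:
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        buf.append('"')  # doubled quote is a literal quote
                        i += 2
                    else:
                        i += 1  # closing quote
                        break
                else:
                    buf.append(line[i])
                    i += 1
            cell = "".join(buf)
            # Assumption: skip anything left before the next comma.
            while i < n and line[i] != ",":
                i += 1
        else:
            # Unquoted cell: read up to the next comma and trim it.
            start = i
            while i < n and line[i] != ",":
                i += 1
            cell = line[start:i].strip()
        cells.append(cell)
        if i >= n:
            break
        i += 1  # step over the comma
    return cells
```

With this sketch, the first pair of lines above parse to the same list of cells, while the second pair differ in the second cell ("Beta" versus "Beta " with a trailing space).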
Now that we have a specific method of encoding a two-dimensional array (table) into a CSV file, we can deal with how RSS can be fitted into a two-dimensional array.
An RSS file typically contains one or more channel elements, each of which contains one or more item elements. These channel and item elements are encoded to a table such that each channel element forms one row of the table. Item elements contained within that channel form subsequent table rows until either another channel element is encountered or the end of the file is reached.
After the first row (which contains column headings), the next row SHOULD encode a channel, not an item.
The first row of the table MUST provide column headings. The following columns MUST be present:
Additionally, a “Language” column SHOULD be present.
Other columns may be present, but SHOULD correspond to metadata elements defined in the RSS 2.0 specification. These columns may occur in any order. Receiving agents MUST accept columns in any order.
Column headings MUST be considered case-insensitive. Thus a “link” column is the same as “Link”, “LINK” or “LiNk”.
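One way a receiving agent might honour both the any-order rule and the case-insensitivity rule is to build a lookup table from lower-cased headings to column positions. The Python sketch below is illustrative only; the names `index_columns` and `get_cell` are hypothetical, and returning an empty string for an absent column is an assumption.

```python
from typing import Dict, List


def index_columns(headings: List[str]) -> Dict[str, int]:
    """Map each lower-cased column heading to its position in the row."""
    return {heading.strip().lower(): position
            for position, heading in enumerate(headings)}


def get_cell(row: List[str], columns: Dict[str, int], name: str,
             default: str = "") -> str:
    """Look up a cell by column name, ignoring the case of the name."""
    position = columns.get(name.lower())
    if position is None or position >= len(row):
        return default  # assumption: missing columns read as empty
    return row[position]
```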
Cells in the “RSS Element” column MUST have a value of “item” or “channel”. The value is case-insensitive.
The cell value informs the user agent whether this row encodes a channel element or an item element. When a channel element is encountered, the following item elements are assumed to belong to that channel until another channel is encountered, or the end of the file is reached.
The other columns correspond to the link, title, description, language, etc. metadata elements found in RSS.
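The grouping of item rows under the preceding channel row can be sketched as follows in Python. The function name `group_rows` is hypothetical, and the heading “RSS Element” is taken from the worked example in this document. Because the rule that a channel comes first is only a SHOULD, this sketch assumes that items appearing before any channel are collected under an anonymous channel with no metadata; a stricter agent might reject such input instead.

```python
from typing import List


def group_rows(rows: List[List[str]]) -> List[dict]:
    """Turn parsed rows (heading row first) into channels with items."""
    header = [heading.strip().lower() for heading in rows[0]]
    kind_col = header.index("rss element")
    channels = []
    current = None
    for row in rows[1:]:
        # Metadata for this row, keyed by lower-cased column heading.
        record = {name: value for name, value in zip(header, row)
                  if name != "rss element"}
        kind = row[kind_col].strip().lower()
        if kind == "channel":
            current = {**record, "items": []}
            channels.append(current)
        elif kind == "item":
            if current is None:
                # Assumption: items before any channel get an
                # anonymous channel rather than causing an error.
                current = {"items": []}
                channels.append(current)
            current["items"].append(record)
        else:
            raise ValueError(f"unexpected RSS Element value: {row[kind_col]!r}")
    return channels
```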
Example RSS file taken from http://developer.apple.com/internet/webcontent/rsssample_source.html:
<?xml version="1.0"?>
<rss version="0.91">
<channel>
<title>scottandrew.com JavaScript and DHTML Channel</title>
<link>http://www.scottandrew.com</link>
<description>DHTML, DOM and JavaScript snippets from scottandrew.com</description>
<language>en-us</language>
<item>
<title>DHTML Animation Array Generator</title>
<description>Robert points us to the first third-party tool for the DomAPI: The Animation Array Generator, a visual tool for creating...</description>
<link>http://www.scottandrew.com/weblog/2002_06#a000395</link>
</item>
<item>
<title>DOM and Extended Entries</title>
<description>Aarondot: A Better Way To Display Extended Entries. Very cool, and uses the DOM and JavaScript to reveal the extended...</description>
<link>http://www.scottandrew.com/weblog/2002_06#a000373</link>
</item>
<item>
<title>cellspacing and the DOM</title>
<description>By the way, if you're using the DOM to generate TABLE elements, you have to use setAttribute() to set the...</description>
<link>http://www.scottandrew.com/weblog/2002_05#a000365</link>
</item>
<item>
<title>contenteditable for Mozilla</title>
<description>The folks art Q42, creator of Quek (cute little avatar/chat) and Xopus (browser-based WYSIWYG XML-editor) have released code that simulates...</description>
<link>http://www.scottandrew.com/weblog/2002_05#a000361</link>
</item>
</channel>
</rss>
Translated into RSS Over CSV:
RSS Element, Title, Link, Description
channel, scottandrew.com JavaScript and DHTML Channel, http://www.scottandrew.com, "DHTML, DOM and JavaScript snippets from scottandrew.com"
item, DHTML Animation Array Generator, http://www.scottandrew.com/weblog/2002_06#a000395, "Robert points us to the first third-party tool for the DomAPI: The Animation Array Generator, a visual tool for creating..."
item, DOM and Extended Entries, http://www.scottandrew.com/weblog/2002_06#a000373, "Aarondot: A Better Way To Display Extended Entries. Very cool, and uses the DOM and JavaScript to reveal the extended..."
item,"cellspacing and the DOM", "http://www.scottandrew.com/weblog/2002_05#a000365","By the way, if you're using the DOM to generate TABLE elements, you have to use setAttribute() to set the..."
item, contenteditable for Mozilla,http://www.scottandrew.com/weblog/2002_05#a000361, "The folks art Q42, creator of Quek (cute little avatar/chat) and Xopus (browser-based WYSIWYG XML-editor) have released code that simulates..."
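A translation like the one above could be automated. The following Python sketch uses the standard library's `xml.etree.ElementTree` to read a namespace-free RSS document and emit RSS Over CSV. The names `rss_to_csv`, `encode_cell` and `COLUMNS` are hypothetical, and the column set is an assumption chosen to match the example. Unlike the hand-written example, the sketch writes no optional whitespace after commas and quotes cells only when required. It also collapses every run of whitespace in a cell to a single space, which goes slightly further than the line-break rule demands.

```python
import xml.etree.ElementTree as ET

# Assumption: metadata columns chosen to match the example above.
COLUMNS = ["Title", "Link", "Description"]


def encode_cell(value: str) -> str:
    """Remove line breaks, then quote the cell if it needs quoting."""
    value = " ".join(value.split())  # collapses all whitespace runs
    if "," in value or '"' in value:
        return '"' + value.replace('"', '""') + '"'
    return value


def encode_element(kind: str, element: ET.Element) -> str:
    """Encode one channel or item element as a single CSV row."""
    cells = [kind] + [element.findtext(name.lower(), default="")
                      for name in COLUMNS]
    return ",".join(encode_cell(cell) for cell in cells)


def rss_to_csv(xml_text: str) -> str:
    """Convert an RSS document to RSS Over CSV with Unix line endings."""
    root = ET.fromstring(xml_text)
    lines = [",".join(["RSS Element"] + COLUMNS)]
    for channel in root.iter("channel"):
        lines.append(encode_element("channel", channel))
        for item in channel.iter("item"):
            lines.append(encode_element("item", item))
    return "\n".join(lines) + "\n"
```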
RSS Over CSV devised by Toby Inkster as part of the demiblog project.