h1. RSS Over CSV

*DRAFT SPECIFICATION*

*Version 0.1*

This specification describes a method of encoding RSS data as a comma-separated plain text file.

h2. Rationale

XML can get quite messy. Here is an alternative method of outputting a site summary that is easier to generate and easier to parse. It is recommended that RSS Over CSV be implemented *as well as*, rather than instead of, regular RSS.

h2. Syntax

h3. CSV Syntax

Over the years, many variants of the CSV format have evolved, offering different methods of escaping special characters. The variant described here is intended to be compatible with as many existing user agents as possible.

CSV is the encoding of a two-dimensional array, referred to in terms of _rows_ and _columns_. Where a row and column intersect, there is a _cell_ which may contain a _value_. The following is a non-CSV representation of an array with four rows and three columns:

====
One    Un       Ein
Two    Deux     Zwei
Three  Trois    Drei
Four   Quatros  Vier====

When encoding to CSV, the array is output one row at a time into a plain text file, separating each row with a new-line character. Within each row, cells are separated by _comma_ characters. The example above might be represented in a CSV file as:

====
One,Un,Ein
Two,Deux,Zwei
Three,Trois,Drei
Four,Quatros,Vier====

h4. Line Endings

RSS Over CSV files *SHOULD* use Unix-style line endings (ASCII code 10), but may use Windows-style endings (ASCII 13 followed by ASCII 10). They *MUST NOT* use classic Mac OS endings (ASCII 13 alone). Files *MUST* end each line consistently.

h4. Escaping Special Characters

As the above suggests, commas, carriage returns and new-line characters have a special meaning in CSV files, and so need some form of escaping when they occur in real character data.

h5. Dealing with Line Breaks in Cell Data

Line breaks in cell data *MUST* be stripped out before writing the data to the CSV file. They *SHOULD* be replaced with other whitespace characters, such as tab or space.

h5. Dealing with Commas in Cell Data

Any cell containing a comma character *MUST* have its value wrapped in double-quote marks. Cells not containing commas may be similarly wrapped, but do not have to be. For example, the following table:

====
Soap Opera         Setting
EastEnders         London, UK
Coronation Street  Manchester, UK
Home & Away        Sydney, Australia====

might be encoded as:

====
Soap Opera,Setting
"EastEnders","London, UK"
Coronation Street,"Manchester, UK"
Home & Away,"Sydney, Australia"====

(Note that "EastEnders" did not need to be quoted, but can be.) When reassembling the CSV file into a two-dimensional array, the outer quote marks should not form part of the final cell data.

h5. Dealing with Double-Quotes in Cell Data

Double quotes are therefore special characters as well, and need some form of escaping when they occur within cell data. Cells containing double quotes *MUST* be wrapped in outer double quotes. Double quotes within the cell *MUST* be escaped by "doubling up" the quotes. So the cell:

====
She said, "there's something in the wood shed."====

is encoded as:

====
"She said, ""there's something in the wood shed."""====

h4. Encoding of White Space

Leading and trailing whitespace in cell values *MUST* be ignored unless the whitespace is enclosed within quote marks. So the following lines are considered equivalent:

====
Alpha, Beta ,Gamma, Delta
Alpha,Beta,Gamma ,Delta====

But the following lines are not:

====
Alpha,Beta ,Gamma,Delta
Alpha,"Beta ",Gamma,Delta====

h3. Encoding RSS

Now that we have a specific method of encoding a two-dimensional array (table) into a CSV file, we can deal with how RSS is fitted into a two-dimensional array.

An RSS file typically contains one or more _channel_ elements, each of which contains one or more _item_ elements. These _channel_ and _item_ elements are encoded to a table such that each _channel_ element forms one row of the table. _Item_ elements contained within that _channel_ form subsequent table rows until either another _channel_ element is encountered or the end of the file is reached.

After the first row (which contains column headings), the next row *SHOULD* encode a channel, not an item.

h4. Column Headings

The first row of the table *MUST* provide column headings. The following columns *MUST* be present:

* RSS Element
* Title
* Link
* Description

Additionally, a "Language" column *SHOULD* be present. Other columns may be present, but *SHOULD* correspond to metadata elements defined in the RSS 2.0 specification.

These columns may occur in any order. Receiving agents *MUST* accept columns in any order. Column headings *MUST* be considered case-insensitive. Thus a "link" column is the same as "Link", "LINK" or "LiNk".

h4. "RSS Element" Column

Cells in this column *MUST* have a value of "item" or "channel". This is case-insensitive. The cell value informs the user agent whether this row encodes a _channel_ element or an _item_ element. When a _channel_ element is encountered, the following _item_ elements are assumed to belong to that _channel_ until another _channel_ is encountered, or the end of the file is reached.

h4. Other Columns

The other columns correspond to the _link_, _title_, _description_, _language_, etc. metadata elements found in RSS.

h2. Examples

Example encoding of the sample RSS file taken from "http://developer.apple.com/internet/webcontent/rsssample_source.html":http://developer.apple.com/internet/webcontent/rsssample_source.html:

====|code::xml|
RSS Element, Title, Link, Description
channel, scottandrew.com JavaScript and DHTML Channel, http://www.scottandrew.com, "DHTML, DOM and JavaScript snippets from scottandrew.com"
item, DHTML Animation Array Generator, http://www.scottandrew.com/weblog/2002_06#a000395, "Robert points us to the first third-party tool for the DomAPI: The Animation Array Generator, a visual tool for creating..."
item, DOM and Extended Entries, http://www.scottandrew.com/weblog/2002_06#a000373, "Aarondot: A Better Way To Display Extended Entries. Very cool, and uses the DOM and JavaScript to reveal the extended..."
item,"cellspacing and the DOM", "http://www.scottandrew.com/weblog/2002_05#a000365","By the way, if you're using the DOM to generate TABLE elements, you have to use setAttribute() to set the..."
item, contenteditable for Mozilla,http://www.scottandrew.com/weblog/2002_05#a000361, "The folks at Q42, creator of Quek (cute little avatar/chat) and Xopus (browser-based WYSIWYG XML-editor) have released code that simulates..."====

h2. Credits

RSS Over CSV was devised by "Toby Inkster":http://tobyinkster.co.uk/ as part of the "demiblog":http://demiblog.org/ project.
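
h2. Non-Normative Implementation Sketch

The escaping, whitespace and row-grouping rules described under Syntax can be illustrated with a short Python sketch. It is not part of the specification and is only one possible reading of it. The function names @encode_cell@, @parse_row@ and @parse_feed@ are hypothetical, as is the choice to store a channel's items under an "items" key. Rejecting an item row that appears before any channel row is an assumption that is stricter than the *SHOULD* in the text. Quoting cells that have leading or trailing whitespace is likewise inferred from the whitespace rule rather than stated outright.

```python
REQUIRED = ("rss element", "title", "link", "description")


def encode_cell(value):
    """Encode one cell value for writing (hypothetical helper)."""
    # Line breaks must be stripped; here each one becomes a single space.
    value = value.replace("\r\n", " ").replace("\r", " ").replace("\n", " ")
    # Quote when the cell holds a comma, a double quote, or outer whitespace.
    if "," in value or '"' in value or value != value.strip():
        return '"' + value.replace('"', '""') + '"'
    return value


def parse_row(line):
    """Split one line (without its line ending) into a list of cell values."""
    cells = []
    i, n = 0, len(line)
    while True:
        # Leading whitespace outside quote marks is ignored.
        while i < n and line[i] in " \t":
            i += 1
        if i < n and line[i] == '"':
            # Quoted cell: read to the closing quote, un-doubling "" pairs.
            i += 1
            buf = []
            while i < n:
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        buf.append('"')
                        i += 2
                        continue
                    i += 1
                    break
                buf.append(line[i])
                i += 1
            value = "".join(buf)
            # Skip anything (normally just whitespace) before the next comma.
            while i < n and line[i] != ",":
                i += 1
        else:
            # Unquoted cell: runs to the next comma, outer whitespace trimmed.
            start = i
            while i < n and line[i] != ",":
                i += 1
            value = line[start:i].strip(" \t")
        cells.append(value)
        if i >= n:
            break
        i += 1  # step over the comma
    return cells


def parse_feed(text):
    """Parse a whole file into a list of channel dicts, each with 'items'."""
    # Accept both Unix (LF) and Windows (CRLF) line endings; skip blank lines.
    lines = [ln for ln in text.replace("\r\n", "\n").split("\n") if ln.strip()]
    if not lines:
        raise ValueError("empty file: a heading row is required")
    # Column headings are case-insensitive and may come in any order.
    headings = [h.lower() for h in parse_row(lines[0])]
    missing = [h for h in REQUIRED if h not in headings]
    if missing:
        raise ValueError("missing required columns: " + ", ".join(missing))
    channels = []
    for line in lines[1:]:
        row = dict(zip(headings, parse_row(line)))
        kind = row.pop("rss element", "").lower()
        if kind == "channel":
            row["items"] = []
            channels.append(row)
        elif kind == "item":
            if not channels:
                # Assumption: treat an item before any channel as an error.
                raise ValueError("item row found before any channel row")
            channels[-1]["items"].append(row)
        else:
            raise ValueError("RSS Element must be 'item' or 'channel'")
    return channels
```

Calling @parse_feed@ on the text of the example file above should yield a single channel containing four items, with each row's cells keyed by lower-cased column heading. A hand-written row parser is used here instead of a general-purpose CSV library so that the whitespace rule (ignored outside quote marks, preserved inside them) is applied exactly as described.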