July 30th, 2007 - by Golgotha

I was asked a while back to try to explain, in plain English, the algorithm for associating header cells with data cells in the HTML5 working draft. I am not a member of the HTML5 working group, and I have not been involved with creating this algorithm at all, so anything you read here is merely my interpretation of the working group’s words.

There has been some debate about this part of the HTML5 specification, since the working group proposes to remove some HTML4 attributes that are meant to improve accessibility: the abbr, summary, headers and axis attributes. One particularly interesting discussion, in which I think the proponents of retaining these attributes made a convincing case, can be found on Juicy Studio.

Let’s look closer at the new HTML5 table specifications.

The algorithm I was asked to explain is actually very straightforward in itself. The problem is that it depends on a far more complex algorithm for finding out which row(s) and column(s) each cell in a table belongs to. I’ll get back to the header/data relationship algorithm in a bit, but I’d like to take a look at some other things first.

Columns and Column Groups

As I read the specification for tabular data in HTML5, the first thing that caught my attention was that HTML5 doesn’t allow col elements as immediate children of a table. In HTML4 we can either specify a number of columns or a number of column groups (but not a mix). Thus, in HTML5, we will always have to specify a column group, even if there is only one, if we want to declare columns.

This is strange, because I thought the HTML5 working group was basing its specification on contemporary use of HTML. I’m sure I’m not the only one to use col elements without an explicit colgroup.

Table Footers

The next difference from HTML4 is that HTML5 allows the tfoot element to appear after the tbody elements, which seems a bit odd since it defeats the purpose of a table footer. The reason why tfoot must appear before the first tbody in HTML4 is that a user agent shouldn’t have to parse the entire table to get to the footer information.

Let’s say we have a massive data table with a few thousand rows. When printing, we would like the table header and footer to appear on every page. With the HTML4 model, a user agent will know what the footer contains before it starts rendering the table body. In combination with using the fixed table layout algorithm in CSS2, this will allow it to render the table incrementally in one pass.

With the HTML5 model, the user agent has to read and parse the entire table to know whether or not there is a footer to render. This two-pass approach may take a significant amount of time for a large table.

Cell Content

A third divergence from HTML4 is that table cells – th and td elements – may contain either block-level elements or inline-level content, but not both. In HTML4 they can contain a mix. This is an improvement, since it is not semantically appropriate to mix block and inline content on the same structural level.

Associating Header Cells with Data Cells

Let’s go back to the header/data relationships, then.

The only method for explicitly associating header cells (th) with data cells (td) in HTML5 is by using the scope attribute on the header cells. This attribute can take one of four legal values: row, col, rowgroup or colgroup. As demonstrated in the comments of the Juicy Studio article, this makes some complex tables impossible to mark up correctly with HTML5.

Unfortunately, assistive technologies like screen readers have poor support for scope, since it is a non-trivial task to determine which row and column a cell belongs to (more on this below). Screen readers usually support headers well, but that attribute is forbidden by the HTML5 specification.

The explicit values for the scope attribute are straightforward. A th element with scope="row" is associated with all td elements after the th element in that row. A th element with scope="col" is associated with all td elements below the th element in that column. For scope="rowgroup" the th element is associated with all td elements after the th element, starting at the row that contains the th and stretching to the end of the row group. For scope="colgroup" the th element is associated with all td elements in that column group that occur after the th element.
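To make the four explicit values a little more concrete, here is a minimal Python sketch of how I read the rules above. It is not the specification's algorithm. The Cell class, the explicit_targets function and the (first, last) group ranges are names I made up for illustration, and the sketch assumes that every cell occupies exactly one slot (no rowspan or colspan). For rowgroup and colgroup I have assumed that "after the th element" means "in the header's own row or column and onwards", which is my interpretation only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Group = Tuple[int, int]  # hypothetical (first index, last index), inclusive


@dataclass(frozen=True)
class Cell:
    kind: str                    # "th" or "td"
    row: int                     # zero-based row index
    col: int                     # zero-based column index
    scope: Optional[str] = None  # only meaningful on "th" cells


def explicit_targets(header: Cell, cells: List[Cell],
                     row_groups: List[Group],
                     col_groups: List[Group]) -> List[Cell]:
    """Return the td cells a th with an explicit scope applies to."""
    data = [c for c in cells if c.kind == "td"]
    if header.scope == "row":
        # All data cells after the header in the same row.
        return [c for c in data
                if c.row == header.row and c.col > header.col]
    if header.scope == "col":
        # All data cells below the header in the same column.
        return [c for c in data
                if c.col == header.col and c.row > header.row]
    if header.scope == "rowgroup":
        # From the header's row to the end of its row group (assumed reading).
        _, last = next(g for g in row_groups if g[0] <= header.row <= g[1])
        return [c for c in data
                if header.row <= c.row <= last and c.col >= header.col]
    if header.scope == "colgroup":
        # From the header's column to the end of its column group (assumed reading).
        _, last = next(g for g in col_groups if g[0] <= header.col <= g[1])
        return [c for c in data
                if header.col <= c.col <= last and c.row >= header.row]
    raise ValueError(f"not an explicit scope value: {header.scope!r}")
```

For a 3×3 table with column headers in the first row and row headers in the first column, a th with scope="col" at row 0, column 1 would be associated with the two td cells below it, and nothing else.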

For header cells that do not have an explicit scope attribute, the association is automatic. In this case, only header cells in the first row and/or first column of a table are associated with data cells. Header cells in the first row are associated with all data cells in the same column, while header cells in the first column of a row are associated with all data cells in that row.

Thus, if we need to have a multi-row table header, we must use explicit scope attributes to associate the header cells with the corresponding data cells.

With the automatic association algorithm, any header cell that is not in the first row and not in the first column will not be associated with any data cells.
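The automatic case can be sketched in a few lines of Python. Again, this is only my reading of the rules described above, not the specification's own wording. The Cell class and the auto_targets function are hypothetical names, cells are assumed to occupy a single slot each, and I have assumed that a header cell sitting in both the first row and the first column is treated as a first-row header.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cell:
    kind: str  # "th" or "td"
    row: int   # zero-based row index
    col: int   # zero-based column index


def auto_targets(header: Cell, cells: List[Cell]) -> List[Cell]:
    """Return the td cells a th without a scope attribute applies to."""
    data = [c for c in cells if c.kind == "td"]
    if header.row == 0:
        # First row of the table: every data cell in the same column.
        return [c for c in data if c.col == header.col]
    if header.col == 0:
        # First cell of a row: every data cell in that row.
        return [c for c in data if c.row == header.row]
    # Anywhere else: no association at all.
    return []
```

With a two-row table header, a th in the second header row (and not in the first column) falls through to the last branch and ends up associated with nothing, which is exactly the limitation described above.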

Where Does a Cell Live?

As I said before, this algorithm appears fairly straightforward, albeit limited to simple tables. That appearance is, however, deceptive. Or, rather, the algorithm itself is simple enough, but the tricky part is to determine which cells exist in a given row, column, row group or column group. The algorithm for this is also given, in excruciating detail, in the specification. Explaining it in normal prose would not only be tediously lengthy, it would also be impenetrable.

You may wonder why this should have to be so complicated. A table is just a grid, right? A rectangular shape that has X rows and Y columns. In many cases, this is true, but the HTML table model allows far more complicated tables than that.

Using the rowspan and colspan attributes of cells, we can make some very complex table models, which is why the table forming algorithm is quite complicated.
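To give a feel for why spans complicate things, here is a much simplified Python sketch of how cells could be placed on a grid. It is not the table forming algorithm from the specification: the anchor_cells function and the (name, rowspan, colspan) tuples are my own invention, spans of zero are not handled, overlapping cells are not detected, and a rowspan running past the last row is not treated specially.

```python
from typing import Dict, List, Tuple

CellSpec = Tuple[str, int, int]  # hypothetical (name, rowspan, colspan)
Slot = Tuple[int, int]           # (row, column), both zero-based


def anchor_cells(rows: List[List[CellSpec]]
                 ) -> Tuple[Dict[str, Slot], Dict[Slot, str]]:
    """Place each cell on the grid and record which slots it covers."""
    occupied: Dict[Slot, str] = {}  # slot -> name of the cell covering it
    anchors: Dict[str, Slot] = {}   # name -> top-left slot of the cell
    for r, row in enumerate(rows):
        c = 0
        for name, rowspan, colspan in row:
            # Skip slots already taken by a cell spanning down from above.
            while (r, c) in occupied:
                c += 1
            anchors[name] = (r, c)
            # Mark every slot the cell covers.
            for dr in range(rowspan):
                for dc in range(colspan):
                    occupied[(r + dr, c + dc)] = name
            c += colspan
    return anchors, occupied
```

If the first row holds a cell A with rowspan="2" followed by a cell B with colspan="2", the first cell written in the second row is not anchored in the first column, because A already covers that slot. That is the kind of bookkeeping a user agent must do before it can say which column a cell belongs to.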

Incidentally, HTML5 appears to change the valid values for the colspan attribute in a small but significant way. In HTML4, both rowspan and colspan values are non-negative integers, i.e., 0, 1, 2, 3, … The value zero is special, because it means the cell spans to the end of the row group or column group. In HTML5, rowspan="0" is still allowed, but the colspan attribute only allows integers greater than zero.
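Expressed as a small Python check, the difference as I understand it from the draft looks like this. The is_valid_span function is a hypothetical helper of mine, and it only covers the value ranges described in the paragraph above, not any other parsing rules the specification may impose.

```python
def is_valid_span(attribute: str, text: str) -> bool:
    """Check a rowspan/colspan value against the ranges described above."""
    if attribute not in ("rowspan", "colspan"):
        raise ValueError(f"unexpected attribute: {attribute!r}")
    # Only plain ASCII digits count as an integer here.
    if not (text.isascii() and text.isdigit()):
        return False
    value = int(text)
    if attribute == "rowspan":
        return value >= 0  # zero is still allowed for rowspan
    return value > 0       # colspan must be greater than zero
```

So is_valid_span("rowspan", "0") is accepted, while is_valid_span("colspan", "0") is rejected.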

One of the main differences between HTML4 and HTML5 is that the latter defines how errors should be handled. The HTML4 specification doesn’t say what should happen if an author specifies rowspan="2" on a cell in the last row of the table, for instance. HTML5, on the other hand, says that the algorithm must be aborted at that point and that it must return the table model it has assembled so far.

HTML5 also deprecates layout tables, but that’s another story.

Summary

This is what an HTML5-compliant user agent is required to do when it encounters a table element:

  1. Identify the caption element, if present. (It must be the first child of the table element, if it is.)
  2. Identify all row groups, rows, column groups and columns in the table. Also associate each cell with the row and column where it is ‘anchored’. This algorithm is quite complex.
  3. For each header cell, associate it with the data cells for which it applies. If the scope attribute is omitted, only cells in the first row of the table and cells in the first column of a row are associated with data cells.

3 Responses to “HTML5 Tables”

1 Lachlan Hunt

The reason colgroup is now required in HTML5 is based on the way IE parses col elements. In IE’s DOM, col elements are always children of a colgroup element. But the start tags for colgroup are optional, so you don’t have to write them in your HTML markup.

Requiring scope for multi-row header cells seems like a bug in the spec (assuming you haven’t misinterpreted it). The association algorithm should still be able to reliably handle header cells, no matter how many rows they span. Similarly, for row headers spanning multiple columns.

2 AutisticCuckoo

For the “auto” state, the spec says, “If the header cell is not in the first row of the table, or not in the first cell of a row, then don’t assign the header cell to any data cells.”

Thus if you have a THEAD that contains more than one TR, you will have to use explicit SCOPE attributes for THs in the 2nd, 3rd, etc., rows.

I’m not talking about TH elements with ROWSPAN="2", but about multiple TR elements containing multiple levels of column or column group headers.

3 ses5909

Thanks for bringing it down to my level tommy. I’ve got this post saved.
