Thoughts and Notes Ideas that stay with me long enough to get written down

17 Jan 2006

Unique items in xsl:for-each

xsl:for-each is very useful for looping through elements, especially elements that you have grouped together using the Muenchian Method. (BTW, name dropping here: I used to work with Steve Muench, and he's one of the smartest, most dedicated people I've ever met. If you're just starting to develop a data-driven Java application, you owe it to yourself to try ADF, especially if you are considering similar frameworks like Spring.)
There's one problem with xsl:for-each, though: it doesn't have any kind of uniqueness testing. This is a problem for tasks like building indexes, which need multi-level uniqueness. For example, not only do you want just one index entry for "validation", you also want just one child entry "SQL" under "validation".
Here's an example XML snippet.
[code lang="xml"]

SQL validation

is ...

...

Validation for SQL queries

...

[/code]
I need the index to look like this:

validation
SQL 3, 7
queries 7

where the first indexterm element composes on page 3 and the second is on page 7.
If you are using RenderX XEP's extension for indexing, the resulting FO should look something like this:
[code lang="xml"]

Validation

SQL

queries

[/code]
Using grouping, I can easily ensure that I only process "validation" once. The trouble comes when I loop through all the child indexterms.
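For reference, here is a minimal sketch of that top-level grouping in XSLT 1.0. It rests on assumptions that are not confirmed anywhere in this post: that each top-level indexterm element carries its term as its first text node, and that child terms are nested indexterm elements. The key name is the one used in this post. The match pattern, the use expression, and the plain-text output are made up for illustration.
[code lang="xml"]
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumption: a top-level indexterm is one that has no indexterm parent,
       and its term is the first text node of the element. -->
  <xsl:key name="top-level-indexterms"
           match="indexterm[not(parent::indexterm)]"
           use="normalize-space(text()[1])"/>

  <xsl:template match="/">
    <!-- Muenchian grouping: keep an indexterm only if it is the first node
         that the key returns for its own term. -->
    <xsl:for-each select="//indexterm[not(parent::indexterm)]
        [generate-id() = generate-id(key('top-level-indexterms',
                                         normalize-space(text()[1]))[1])]">
      <xsl:sort select="normalize-space(text()[1])"/>
      <!-- Illustrative output only: one line per distinct top-level term. -->
      <xsl:value-of select="normalize-space(text()[1])"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
[/code]
Under those assumptions, each distinct top-level term is visited exactly once, no matter how many indexterm elements share it. The comparison is case-sensitive, so "Validation" and "validation" would land in separate groups unless the terms are normalized first.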
Thinking this would be easy, I tried a grouping expression (in this example, assume $parent_term is "validation" and the result of text() is "SQL"):
[code lang="xml"]
<xsl:for-each select="key('top-level-indexterms', $parent_term)/
    descendant::indexterm[generate-id(key('second-level-indexterms', text())[1]) = generate-id(.)]">
[/code]
but that doesn't work. The key matches every top-level indexterm with the value "validation", which is both of the top-level indexterms above. The descendant::indexterm step then finds the first child of that term with the value "SQL", and that's true for both elements in the example above.
Clearly, I can't look at just the child indexterms of the first indexterm with the value "validation"; I need to process all of the child indexterms of every indexterm with the value "validation". Since these elements are children of different elements, I can't test their uniqueness using generate-id(). So how do I ensure that I only process one indexterm with the value "SQL" that is a child of an indexterm "validation"?
I tried a few other things, but I couldn't get any closer. Either I duplicated the "SQL" entry, or I lost the other child entries. Eventually I just added a second transformation step to remove duplicates.
What I'd like to see is an attribute for xsl:sort called "unique" that sorts the entries, but restricts the loop to unique occurrences. It'd be tricky (which part of the node or node-tree has to be unique?), but very valuable.
If you can help me with a solution, I'd really appreciate it. I'd hate to suggest an addition to XSLT when there's a solution already available.