Extracting Unique Values with XSL


"You can't be a pimp and a prostitute, too."
White Stripes / Icky Thump

Posted on September 17, 2004 5:51 PM in XML
Warning: This blog entry was written two or more years ago. Therefore, it may contain broken links, out-dated or misleading content, or information that is just plain wrong. Please read on with caution.

Anyone who has used XSL will tell you that it's nothing short of powerful. For a quick and dirty example of what can be done with XSL, just take a look at what Mark Pilgrim has done to his Atom feed (if you're interested in how he did it, you can read all about it).

A problem I ran into recently when trying to transform an XML document using XSL was that I only wanted my transformation to grab values that were unique. So for instance, consider the following example:

<?xml version="1.0" ?>
<celebrities>
 <celebrity>
  <name first="Marilyn" last="Monroe" />
  <birthdate value="1926-06-01" />
 </celebrity>
 <celebrity>
  <name first="Damon" last="Wayans" />
  <birthdate value="1960-09-04" />
 </celebrity>
 <celebrity>
  <name first="Marilyn" last="Manson" />
  <birthdate value="1969-01-05" />
 </celebrity>
</celebrities>

Now let's suppose I want to extract all the celebrity first names from that document, but I want each name in the list to be unique (i.e. I only want 2 names returned instead of 3). A quick search around the Web revealed a consensus on which approach to take. The only catch was that I could not, for the life of me, get it to work.

If you attempt to find a solution to this problem, you'll doubtless run across several references to XPath's preceding-sibling axis. The problem is, most people (myself included) can't get it to work in complex situations, mostly because there are so few samples of the axis's usage available online.

This story has a happy ending, though. While sifting through tons of preceding-sibling content, I happened to stumble upon an entirely different solution involving XSL's key element. This approach, which is similar to hashing all the values, is not only clearly documented online, but it's also relatively easy to follow.

The following XSL transformation will return the list I'm after:

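<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Index every name element by the value of its first attribute -->
  <xsl:key name="names" match="name" use="@first"/>
  <xsl:template match="/">
    <!-- Keep a name only if it is the first node the key returns for its @first value -->
    <xsl:for-each select="//name[generate-id() = generate-id(key('names',@first)[1])]">
      <xsl:value-of select="@first"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>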

As you can see, I first establish a key named "names," telling it to match name elements and index them by their first attribute. Then, when it comes time to loop over the name elements in the document, I generate an id for the current element and make sure it is equal to the id of the first element the key returns for that same first name. Only the first name element with each distinct first attribute gets through. Make sense? Well, it doesn't have to, because it works, and that's more than I can say about the preceding-sibling approach.
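The same key can do more than de-duplicate. Because key('names', @first) returns every name element sharing a given first name, the loop can also list the members of each group. The sketch below assumes the celebrities document above and plain-text output. It is one way to extend the technique, not the only one.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Same key as above: name elements indexed by their first attribute -->
  <xsl:key name="names" match="name" use="@first"/>

  <xsl:template match="/">
    <!-- One iteration per distinct first name (the first member of each key group) -->
    <xsl:for-each select="//name[generate-id() = generate-id(key('names', @first)[1])]">
      <xsl:value-of select="@first"/>
      <xsl:text>:</xsl:text>
      <!-- key() returns the whole group, so list every last name sharing this first name -->
      <xsl:for-each select="key('names', @first)">
        <xsl:text> </xsl:text>
        <xsl:value-of select="@last"/>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Against the sample document, this should print Marilyn followed by Monroe and Manson, then Damon followed by Wayans.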

If you have any questions about the key-based approach or can lend any insight regarding the preceding-sibling approach, please feel free to let me know.
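On the preceding-sibling question, one likely stumbling block in this particular document is that the name elements are not siblings of one another, because each sits inside its own celebrity element. The sibling test therefore has to run at the celebrity level. It also needs not(... = ...) rather than !=. Here is a sketch under those assumptions:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- celebrity elements are siblings, so compare each one's first name
         against the first names of all earlier celebrity siblings -->
    <xsl:for-each select="/celebrities/celebrity[not(name/@first = preceding-sibling::celebrity/name/@first)]">
      <xsl:value-of select="name/@first"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

This should also yield Marilyn and Damon. However, it compares each celebrity against every earlier sibling, so on large documents it tends to be slower than the key lookup.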

Comments

Damon on November 22, 2005 at 8:44 PM:

The following thread and site were helpful in understanding why my preceding-sibling wasn't working. I was having a similar problem, but wanted unique elements within specific nodes, not the entire document. It turns out that preceding-sibling itself returns a node-set, which is in document order... which means logical usage always gets you the first member in the document. It's explained better in the links:

http://www.biglist.com/lists/xsl-list/archives/200302/msg00368.html

http://www.dpawson.co.uk/xsl/sect2/N1641.html#d2340e325


Jesse Weinstein on February 07, 2007 at 10:42 PM:

There's a rather critical typo in the above. The line

generate-id(key('names',@first))[1])]">

should be

generate-id(key('names',@first)[1])]">

with only one close paren following the @first.


Abhinav Maheshwari on September 14, 2007 at 4:18 PM:

Bernie,

this can be done using the "following" axis, although "preceding" will not work. The example has been taken from Dave Pawson's site, which is a wonderful resource for XSLT solutions.

Warm regards,
Abhinav


<location>
<state>xxxx</state>
</location>

<location>
<state>yyyy</state>
</location>

<location>
<state>xxxx</state>
</location>


The desired output is:

xxxx
yyyy

That is, duplicate values of state should not be printed. This can be done as follows.


<xsl:variable name="unique-list"
select="//state[not(.=following::state)]" />

<xsl:for-each select="$unique-list">
<xsl:value-of select="." />
</xsl:for-each>
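One thing worth checking in the snippet above is that following:: keeps the last occurrence of each value. For the sample, the loop therefore visits yyyy before xxxx. The sketch below keeps the first occurrence instead, which matches the desired order, by switching the axis to preceding::. Because the sample has no single root, it assumes the location elements are wrapped in one (any name will do).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Keep a state only if no earlier state has the same value,
         i.e. the first occurrence of each value, in document order.
         Assumes the location elements sit under a single root element. -->
    <xsl:variable name="unique-list" select="//location/state[not(. = preceding::state)]"/>
    <xsl:for-each select="$unique-list">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```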


Tommy on October 02, 2007 at 6:48 PM:

Awesome tip...I found the preceding sibling thing a bunch of times and it never worked for me either :(

Anyway, I used your way and it works!


Bob Duckworth on January 15, 2008 at 9:10 AM:

This was extremely helpful and worked perfectly (when the typo was corrected!)

Top tip.


Inigo on January 29, 2008 at 8:45 AM:

If you're using XSLT 2.0, then instead you can use:

<xsl:for-each-group select="//state" group-by=".">
<xsl:sequence select="current-group()[1]" />
</xsl:for-each-group>

(updating Abhinav's example above)
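Here is Inigo's snippet as a complete stylesheet, offered as a sketch. It needs an XSLT 2.0 processor such as Saxon and assumes the location elements are wrapped in a single root element. In XSLT 2.0, distinct-values(//state) is another option when only the string values matter.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- One group per distinct state value, processed in order of first appearance -->
    <xsl:for-each-group select="//location/state" group-by=".">
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```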


squeaky on March 15, 2008 at 4:51 AM:

Buddy - You saved my life ;)


Vithal on June 18, 2008 at 5:26 AM:

Abhinav's tip worked like a charm for me!


Kumar on July 15, 2008 at 12:01 PM:

I need a unique list of combinations of the city and state elements. I tried the above solution, which didn't work. Any help will be welcome.

<location>
<city>aaaa</city>
<state>xxxx</state>
</location>

<location>
<city>aaaa</city>
<state>yyyy</state>
</location>

<location>
<city>aaaa</city>
<state>yyyy</state>
</location>

<location>
<city>bbbb</city>
<state>xxxx</state>
</location>

<location>
<city>bbbb</city>
<state>xxxx</state>
</location>

<location>
<city>bbbb</city>
<state>yyyy</state>
</location>
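One common approach for this case is to build the key from both child elements with concat(). This is a sketch that assumes the location elements share a single root. The '|' separator is an arbitrary choice meant to keep pairs such as "ab" + "c" and "a" + "bc" apart, so pick any character that cannot occur in the data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index each location by its city/state pair; '|' keeps the two parts distinct -->
  <xsl:key name="city-state" match="location" use="concat(city, '|', state)"/>

  <xsl:template match="/">
    <!-- Keep only the first location in each city/state group -->
    <xsl:for-each select="//location[generate-id() =
        generate-id(key('city-state', concat(city, '|', state))[1])]">
      <xsl:value-of select="city"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="state"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample above, this should produce four lines: aaaa, xxxx / aaaa, yyyy / bbbb, xxxx / bbbb, yyyy.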


Tommy E on August 04, 2008 at 11:25 AM:

Great advice. I have been looking awhile for a way to pick out unique elements and this works. Thanks.


jlh on August 10, 2008 at 3:55 PM:

Abhinav's tip is excellent! One quibble: the code as written will pick up all state elements, whether or not they are contained in location elements. It might be better to write:

select="//location/state[not(.=following::state)]" />


Douglas on November 04, 2008 at 9:46 AM:

Dude this article and those comments saved my life. Thank you!


Sundar on December 17, 2008 at 6:56 AM:

Thanks, this is a really great, helpful reference.


Gene Wood on January 23, 2009 at 4:37 PM:

Whoops, guess I'll have to encode this comment myself. Here goes a second try:

A follow-up to Abhinav Maheshwari's comment above. If you'd like to do this with an attribute instead, here is an example:

XML:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="b.xsl"?>

<xml>
<location state="xxxx">a</location>
<location state="yyyy">b</location>
<location state="xxxx">c</location>
</xml>

XSL:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
<xsl:variable name="unique-list" select="/xml/location[not(@state=following::location/@state)]" />
<xsl:for-each select="$unique-list">
<xsl:value-of select="@state" />
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>


Manasa on April 20, 2009 at 4:27 AM:

Thanks for the example using the attribute; it worked fine. But I need to set a parameter for the attribute name so that I can use $attributeName to fetch the unique records based on that attribute. Any idea about this?
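One way to do this is to match the attribute by name() instead of hard-coding @state. This is a sketch that assumes Gene's sample document and a hypothetical parameter named attributeName. It compares against preceding:: so it keeps the first occurrence of each value. The key-based approach is harder to parameterize because in XSLT 1.0 a key's use attribute may not reference a variable.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical parameter: which attribute to de-duplicate on (defaults to state) -->
  <xsl:param name="attributeName" select="'state'"/>

  <xsl:template match="/">
    <!-- Locations that carry the attribute and whose value no earlier location shares -->
    <xsl:for-each select="/xml/location[@*[name() = $attributeName]]
        [not(@*[name() = $attributeName] = preceding::location/@*[name() = $attributeName])]">
      <xsl:value-of select="@*[name() = $attributeName]"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With xsltproc, for instance, the value can be supplied on the command line as --stringparam attributeName state.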


dilip on May 09, 2009 at 8:42 AM:

If you add one more attribute to it, then @state will contain the values of both attributes,
i.e.


a
b
c


Tom on May 18, 2009 at 5:45 AM:

Works! Thank you very much.
However, all future readers of the above article would probably appreciate a correction regarding the typo mentioned by Jesse Weinstein in the second comment.
Btw: The links to the original documentation of the trick are broken.
Apart from that, it helped me very much and saved my day. Thanks again.


charan on May 21, 2009 at 7:14 AM:

It is really excellent. I used the XSLT transformation very well.


Bernie Zimmermann on May 24, 2009 at 11:20 PM:

Thanks to Jesse for pointing out the typo and to Tom for reminding me I need to update the original article. I've now updated it, so hopefully fewer people will be confused by the example given going forward.


Dennis on September 30, 2009 at 6:45 AM:

Yeeesss! Thank you very much.


Lo-Tan on December 02, 2009 at 8:10 AM:

I believe that this is actually called the Muenchian method.

http://www.jenitennison.com/xslt/grouping/muenchian.html

Just nice to know, and give credit where it's due.


Steven Wilber on February 16, 2010 at 7:22 AM:

I don't know if this will help anyone out in the future, but I have been struggling for a few hours with the following XML:

I wanted to select unique years and started off with the obvious xpath:

years/year[@value != preceding-sibling::year/@value]


However it only returned the 2009 node and no 2010 nodes. After much searching and messing around with XPath I finally made it work with only a very slight and curious tweak. I replaced the '!=' with a 'not', ie.:

years/year[not(@value = preceding-sibling::year/@value)]


This works a treat, but I can't really see what the difference between them is, apart from a few hours of my life.

Hey ho! Maybe this will help someone else.

Cheers

Steve


Steven Wilber on February 16, 2010 at 7:24 AM:

Okay, I'll encode the XML myself for you to see:

<years>
<year value="2010"></year>
<year value="2010"></year>
<year value="2010"></year>
<year value="2009"></year>
</years>

Cheers

Steve
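The difference comes from how XPath 1.0 compares node-sets. A comparison involving a node-set is true if it holds for at least one node in the set, and it is false if the set is empty. So @value != preceding-sibling::year/@value means "differs from some earlier year." That is false for the first year, which has no earlier siblings, and also for the later 2010s, whose earlier siblings are all 2010. It is true for 2009. By contrast, not(@value = ...) means "equals none of the earlier years," which is true for the first 2010 and for 2009. The sketch below runs both expressions against Steven's XML:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- != : true if SOME earlier sibling differs; false when there are no earlier siblings -->
    <xsl:text>!= :</xsl:text>
    <xsl:for-each select="years/year[@value != preceding-sibling::year/@value]">
      <xsl:text> </xsl:text>
      <xsl:value-of select="@value"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>

    <!-- not(=) : true if NO earlier sibling has the same value -->
    <xsl:text>not(=) :</xsl:text>
    <xsl:for-each select="years/year[not(@value = preceding-sibling::year/@value)]">
      <xsl:text> </xsl:text>
      <xsl:value-of select="@value"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The first line should list only 2009, and the second should list 2010 and then 2009.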


prisilla on April 04, 2010 at 7:27 PM:

Hi,

Thank you so much for the post. You saved me.


Vyom Dixit on November 13, 2010 at 1:53 AM:

Hi, thanks.
It's working perfectly. I needed to get distinct values from various nodes, and I finally got them by using the not(.=following::...) syntax.
Thanks again.


Dave Pickard on January 25, 2011 at 4:59 AM:

Still a great tip more than 6 years on :-)


Xavi on March 24, 2011 at 9:40 AM:

Awesome, it worked! Thanks!


Kenny on April 20, 2011 at 11:55 AM:

This was a lifesaver since I don't appear to have an XSLT 2.0 capable processor (still using classic ASP). Need to process the following nodes too now, so will have to figure that one.

XSLT is very elegant in my opinion.


Dale on June 27, 2011 at 3:12 PM:

Still good value. It's worked for me too.
Thanks


Muzietto on July 06, 2011 at 3:50 AM:

Thank you very much. Still valuable and helpful.

You could mention that this mechanism also works for selecting distinct child nodes instead of attributes. All you have to do is omit the @.

Have a good day,
Marco
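To illustrate Marco's point with Abhinav's sample, keying on the state child element instead of an attribute just means writing use="state". This sketch assumes the location elements are wrapped in a single root. Like the article's approach, it emits each value at its first occurrence.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- use="state" (no @) keys each location on the text of its state child element -->
  <xsl:key name="states" match="location" use="state"/>

  <xsl:template match="/">
    <!-- First location for each distinct state value -->
    <xsl:for-each select="//location[generate-id() = generate-id(key('states', state)[1])]">
      <xsl:value-of select="state"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```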


Preeti Raj on July 29, 2011 at 2:27 AM:

Abhinav's sample saved my weekend. Thanks to all the contributors, this is such a useful article.


Devika on May 22, 2012 at 5:11 AM:

Thanks, Abhinav. Your code saved me.

Also, a big thanks to Bernie for creating this forum.

You folks rock :D


mc on June 21, 2012 at 8:06 PM:

Thanks heaps, this code was very useful.


modest_alpaca on October 10, 2012 at 11:02 PM:

Great Tutorial, honesty and clarity.
You have also saved my life.....either that or the Computer from having a swim in the ocean.


amul on December 23, 2012 at 11:03 PM:

I need to loop over only the first and second values if there are multiple unique values. Please help.


Thanks
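If the goal is to process only the first two unique values, one option is to test position() inside the Muenchian loop. This sketch assumes that reading of the question and uses the celebrities example from the article. Inside xsl:for-each, position() counts only the nodes that passed the uniqueness filter, so it numbers the unique values. The "<" must be written as &lt; inside an attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:key name="names" match="name" use="@first"/>

  <xsl:template match="/">
    <xsl:for-each select="//name[generate-id() = generate-id(key('names', @first)[1])]">
      <!-- position() is the index within the de-duplicated node-set -->
      <xsl:if test="position() &lt;= 2">
        <xsl:value-of select="@first"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```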
