Monday, 1 April 2013

Groovy Xml Series: Querying Xml with GPath

The most common way of querying XML in Groovy is GPath. The official page describes it like this:

"GPath is a path expression language integrated into Groovy which allows parts of nested structured data to be identified. In this sense, it has similar aims and scope as XPath does for XML. The two main places where you use GPath expressions is when dealing with nested POJOs or when dealing with XML"

So it's similar to XPath expressions, and you can use it not only with XML but also with POJO classes. OK, so let's begin.
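To make the POJO side of that concrete, here is a minimal sketch of GPath over plain objects. The Book and Author classes are made up for this example; they are not part of the code used in the rest of the post.

```groovy
// Hypothetical classes, only here to illustrate GPath over plain objects
class Author { String name }
class Book { String title; Author author }

def books = [
    new Book(title: 'Catcher in the Rye', author: new Author(name: 'JD Salinger')),
    new Book(title: 'Alice in Wonderland', author: new Author(name: 'Lewis Carroll'))
]

// A property read on a list is applied to every element of the list
def titles = books.title
def authorNames = books.author.name

// Lists are zero-based, so [0] is the first book
def firstAuthor = books[0].author.name

println titles
println authorNames
println firstAuthor
```

The same a.b.c style is what we will use against XML below.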

Given the following XML:

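      <response>
          <value>
              <books>
                  <book id="1">
                      <title>Don Xijote</title>
                      <author id="1">Manuel De Cervantes</author>
                  </book>
                  <book id="2">
                      <title>Catcher in the Rye</title>
                      <author id="2">JD Salinger</author>
                  </book>
                  <book id="3">
                      <title>Alice in Wonderland</title>
                      <author id="3">Lewis Carroll</author>
                  </book>
                  <book id="4">
                      <title>Don Xijote</title>
                      <author id="4">Manuel De Cervantes</author>
                  </book>
              </books>
          </value>
      </response>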

Node's text content

The first thing we are going to do is get a value using POJO notation. Let's get the name of the first book's author (the code is available on GitHub).

      def "Using POJO notation: Getting a node using POJOs notation a.b.c"(){
          setup: "Parsing the document"
              def response = new XmlSlurper().parse(xmlFile)
          when: "Trying to get a given node using the a.b.c notation"
              def authorNode = response.value.books.book[0].author
          then: "We can check the author's value"
              authorNode.text() == 'Manuel De Cervantes'
      }

So first we parse the document with XmlSlurper (xmlFile is a variable that points to the XML document, for example a java.io.File). The value returned by parse() has to be treated as the root of the XML document, which in this case is "response".

That's why we start traversing the document from response and then follow value.books.book[0].author. Note that in XPath node indexes start at [1], whereas GPath, being Java-based, starts at index [0].
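If you want to try the same traversal outside a Spock specification, a small script along these lines should do. It assumes Groovy 3 or later (on older versions XmlSlurper lives in groovy.util and needs no import), and the inline XML is a shortened stand-in for xmlFile.

```groovy
import groovy.xml.XmlSlurper

// Shortened stand-in for xmlFile (an assumption made for this example)
def xml = '''<response>
  <value>
    <books>
      <book id="1"><title>Don Xijote</title><author id="1">Manuel De Cervantes</author></book>
      <book id="2"><title>Catcher in the Rye</title><author id="2">JD Salinger</author></book>
    </books>
  </value>
</response>'''

// The object returned by parseText() is the root element, <response>
def response = new XmlSlurper().parseText(xml)

// Zero-based index: book[0] is the first <book>
def firstAuthor = response.value.books.book[0].author.text()
def secondTitle = response.value.books.book[1].title.text()

println firstAuthor
println secondTitle
```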

GPathResult (XmlSlurper) and Node (XmlParser)

In the end we have the "author" node, and because we want the text inside that node we call the text() method. The "author" node is an instance of GPathResult, and text() is a method that returns the content of that node as a String.

When using GPath on XML parsed with XmlSlurper, the result is a GPathResult object. GPathResult has many other convenient methods to convert the text inside a node to another type, such as:

  • toInteger()
  • toFloat()
  • toBigInteger()
  • ...
All these methods try to convert a String to a certain type.
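A quick sketch of those conversions, assuming Groovy 3 or later for the import and using a one-element document invented for the example:

```groovy
import groovy.xml.XmlSlurper

// Tiny made-up document, just enough to have an attribute to convert
def book = new XmlSlurper().parseText('<book id="3"><title>Alice in Wonderland</title></book>')

// Each call converts the attribute's text ("3") to the requested type
def asInteger = book.@id.toInteger()
def asFloat = book.@id.toFloat()
def asBigInteger = book.@id.toBigInteger()

println asInteger.getClass().simpleName
println asFloat.getClass().simpleName
println asBigInteger.getClass().simpleName
```

If the text is not a valid number, the underlying String conversion throws a NumberFormatException, so these methods are best used on values you know to be numeric.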

If we were using XML parsed with XmlParser we would be dealing with instances of type Node. Still, all the actions applied to GPathResult in these examples could be applied to a Node as well. The creators of both parsers took GPath compatibility into account.
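Here is a rough sketch of the same kind of query with XmlParser, again assuming Groovy 3 or later for the import and a shortened inline document. The path is identical; the main difference is that an attribute comes back as a plain String rather than a GPathResult.

```groovy
import groovy.xml.XmlParser

// Shortened stand-in for xmlFile (an assumption made for this example)
def xml = '''<response>
  <value>
    <books>
      <book id="1"><title>Don Xijote</title><author id="1">Manuel De Cervantes</author></book>
    </books>
  </value>
</response>'''

// XmlParser returns a groovy.util.Node instead of a GPathResult
def response = new XmlParser().parseText(xml)
def firstBook = response.value.books.book[0]

// text() is available on the list of matching <author> nodes
def author = firstBook.author.text()

// Here the attribute is a String, so toInteger() is the String method
def bookId = firstBook.@id.toInteger()

println author
println bookId
```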

Attribute's content

The next step is to get a value from a node's attribute. In the following sample we want to get the id of the first book's author. We'll use two different approaches. Let's see the code first:

      def "Using POJO notation: Getting an attribute's value using POJOs notation a.b.c"(){
          setup: "Parsing the document"
              def response = new XmlSlurper().parse(xmlFile)
          when: "Trying to get a given node using the a.b.c notation"
              def firstBook = response.value.books.book[0]
              def firstAuthorIdNode1 = firstBook.author.@id
              def firstAuthorIdNode2 = firstBook.author['@id']
          then: "Getting the id's value"
              firstAuthorIdNode1.toInteger() == 1
              firstAuthorIdNode2.toInteger() == 1
      }

Again we first parse the document and then, using POJO notation, get the first book node. Now take a look at the two expressions:

      firstBook.author.@id
      firstBook.author['@id']

I especially like the former notation because it is more straightforward and meaningful. The latter is more like using an instance of a map (which I guess it should be eventually).
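As a standalone sketch (Groovy 3 or later assumed for the import, inline XML standing in for xmlFile), both notations give the same attribute:

```groovy
import groovy.xml.XmlSlurper

// Shortened stand-in for xmlFile (an assumption made for this example)
def xml = '''<response>
  <value>
    <books>
      <book id="1"><title>Don Xijote</title><author id="1">Manuel De Cervantes</author></book>
    </books>
  </value>
</response>'''

def response = new XmlSlurper().parseText(xml)
def firstBook = response.value.books.book[0]

// Attribute notation
def idFromAttribute = firstBook.author.@id
// Map-like notation, the '@' prefix marks the key as an attribute name
def idFromSubscript = firstBook.author['@id']

println idFromAttribute.text()
println idFromSubscript.text()
```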

Speeding things up: "breadthFirst()" and "depthFirst()"

If you have ever used XPath, you have used expressions like
  • "//" : Look everywhere
  • "/following-sibling::othernode" : Look for a node "othernode" in the same level

More or less we have their counterparts in GPath: the methods breadthFirst() and depthFirst(), plus the shortcuts '*' and '**'. Note that '*' is shorthand for children(), so it only looks at the direct children of a node, while '**' is shorthand for depthFirst(). The first example shows a simple use of '*'.

      def "Using '*': Getting a node using the children operator '*'"(){
          setup: "Parsing the document"
              def response = new XmlSlurper().parse(xmlFile)
          when: "Looking for the node having the name 'book'"
          and: "with attribute id equals to 2"
              /* '*' is shorthand for children(): it looks among the
                 direct children of a node, here the children of 'books' */
              def catcherInTheRye = response.value.books.'*'.find{node->
                  /* node.@id == '2' could be expressed as node['@id'] == '2' */
                  node.name() == 'book' && node.@id == '2'
              }
          then: "Getting the title's value"
              catcherInTheRye.title.text() == 'Catcher in the Rye'
      }

This Spock specification looks among the direct children of the "books" node; '*' does not go any deeper into the tree. The expression inside the closure decides which of those children is returned.

That expression says "Look for any node with a tag name equal to 'book' and an id with a value of '2'".
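The following script is a small sketch of that lookup, with Groovy 3 or later assumed for the import and a shortened inline document in place of xmlFile. It also lists the names that '*' returns one level up, to show that only direct children are visited.

```groovy
import groovy.xml.XmlSlurper

// Shortened stand-in for xmlFile (an assumption made for this example)
def xml = '''<response>
  <value>
    <books>
      <book id="1"><title>Don Xijote</title><author id="1">Manuel De Cervantes</author></book>
      <book id="2"><title>Catcher in the Rye</title><author id="2">JD Salinger</author></book>
      <book id="3"><title>Alice in Wonderland</title><author id="3">Lewis Carroll</author></book>
    </books>
  </value>
</response>'''

def response = new XmlSlurper().parseText(xml)

// '*' gives the direct children of <books>; find() returns the first match
def catcherInTheRye = response.value.books.'*'.find { node ->
    node.name() == 'book' && node.@id == '2'
}

// One level up, '*' only sees <books>, not the <book> nodes below it
def childrenOfValue = response.value.'*'.collect { it.name() }

println catcherInTheRye.title.text()
println childrenOfValue
```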

Today I woke up very lazy and I'd like to look for a given value without caring where it might be. The only thing I know is that I need the id of the book written by "Lewis Carroll". How do I do that? Using depthFirst().

      def "Using '**': Getting a node using depthFirst operator '**'"(){
          setup: "parsing the document"
              def response = new XmlSlurper().parse(xmlFile)
          when: "Using the depthFirst operator we can look for something"
          and: "it doesn't matter how deep the node is"
          and: "Let's say we want to look for the book's id of the book written by Lewis Carroll"
              /* Beware of the name I used for the closure's parameter. It may look like
                 the ** is too smart, but it isn't. It's just that I'm sure only books will
                 match the query. To avoid any confusion I'd rather use 'node' */
              def bookId = response.'**'.find{book->
                  book.author.text() == 'Lewis Carroll'
              }.@id
          then: "The bookId should be 3"
              bookId == "3"
      }

It's definitely shorter than using POJO notation, isn't it? depthFirst() is the same as looking for something "everywhere in the tree from this point down". In this case we've used the method find(Closure cl) to find just the first occurrence.
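A standalone sketch of the same search, assuming Groovy 3 or later for the import and an inline copy of the document. The closure parameter is called node here, and the extra name() check is my own addition to make it explicit that only book elements can match.

```groovy
import groovy.xml.XmlSlurper

// Inline stand-in for xmlFile (an assumption made for this example)
def xml = '''<response>
  <value>
    <books>
      <book id="1"><title>Don Xijote</title><author id="1">Manuel De Cervantes</author></book>
      <book id="2"><title>Catcher in the Rye</title><author id="2">JD Salinger</author></book>
      <book id="3"><title>Alice in Wonderland</title><author id="3">Lewis Carroll</author></book>
      <book id="4"><title>Don Xijote</title><author id="4">Manuel De Cervantes</author></book>
    </books>
  </value>
</response>'''

def response = new XmlSlurper().parseText(xml)

// '**' walks <response> and everything below it, depth first;
// find() stops at the first node for which the closure is true
def bookId = response.'**'.find { node ->
    node.name() == 'book' && node.author.text() == 'Lewis Carroll'
}.@id

println bookId.text()
```

Keep in mind that find() returns null when nothing matches, so the trailing .@id would fail in that case.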

What if we want to collect all the books' titles?

      def "Using '**': Collecting all titles"(){
          setup: "parsing the document"
              def response = new XmlSlurper().parse(xmlFile)
          when: "Looking for all titles within the document"
              def titles = response.'**'.findAll{node-> node.name() == 'title'}*.text()
          then: "There should be only four"
              titles.size() == 4
      }
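For reference, here is roughly what that collection looks like as a plain script. Groovy 3 or later is assumed for the import, and the inline XML stands in for xmlFile.

```groovy
import groovy.xml.XmlSlurper

// Inline stand-in for xmlFile (an assumption made for this example)
def xml = '''<response>
  <value>
    <books>
      <book id="1"><title>Don Xijote</title><author id="1">Manuel De Cervantes</author></book>
      <book id="2"><title>Catcher in the Rye</title><author id="2">JD Salinger</author></book>
      <book id="3"><title>Alice in Wonderland</title><author id="3">Lewis Carroll</author></book>
      <book id="4"><title>Don Xijote</title><author id="4">Manuel De Cervantes</author></book>
    </books>
  </value>
</response>'''

def response = new XmlSlurper().parseText(xml)

// findAll() keeps every <title> node found by the depth-first walk,
// and *.text() turns each of them into its String content
def titles = response.'**'.findAll { node -> node.name() == 'title' }*.text()

println titles
```

Since the result is an ordinary list of Strings, the usual collection methods apply, for example titles.unique(false) to drop the repeated title.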

I've mentioned there are some useful methods that convert a node's value to an integer, a float, etc. Those methods can be convenient when doing comparisons like this:

      def "Using findAll: Collecting all titles"(){
          setup: "parsing the document"
              def response = new XmlSlurper().parse(xmlFile)
          when: "Looking for all titles of books with an id greater than 2"
              def titles = response.value.books.book.findAll{book->
                  /* You can use toInteger() over the GPathResult object */
                  book.@id.toInteger() > 2
              }*.title
          then: "There should be only two"
              titles.size() == 2
      }

In this case the number 2 has been hardcoded, but imagine that the value could have come from any other source (GORM ids, etc.).
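A minimal sketch of that idea, with the threshold held in a variable. The name minId is hypothetical, Groovy 3 or later is assumed for the import, and the inline XML stands in for xmlFile.

```groovy
import groovy.xml.XmlSlurper

// Inline stand-in for xmlFile (an assumption made for this example)
def xml = '''<response>
  <value>
    <books>
      <book id="1"><title>Don Xijote</title><author id="1">Manuel De Cervantes</author></book>
      <book id="2"><title>Catcher in the Rye</title><author id="2">JD Salinger</author></book>
      <book id="3"><title>Alice in Wonderland</title><author id="3">Lewis Carroll</author></book>
      <book id="4"><title>Don Xijote</title><author id="4">Manuel De Cervantes</author></book>
    </books>
  </value>
</response>'''

def response = new XmlSlurper().parseText(xml)

// Hypothetical threshold; in a real application it could come from anywhere
def minId = 2

// toInteger() makes the comparison numeric instead of textual
def titles = response.value.books.book
    .findAll { book -> book.@id.toInteger() > minId }
    .collect { book -> book.title.text() }

println titles
```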



  1. Hi, what if I wanted to put a part of the GPath as a variable?

    def gpathPiece = '**'.find{book-> book.author.text() == 'Lewis Carroll'}.@id

    def bookId = response.gpathPiece //This part is not working for me
    I am trying to use a variable to represent that piece of the GPath, but it doesn't work. Confirmed that when I put that variable's value directly back into the GPath, it works.

    Any suggestions? Thanks
