Thursday 21 March 2013

Groovy Xml Series: Parsing Xml

This week I've been reviewing how we deal with XML in Groovy. My curiosity was sparked by a thread on the Groovy mailing list about the use of XmlUtil, and I decided it was time to refresh my knowledge of the Groovy API for handling XML.

First things first: before we can play with XML we need to parse it. The two classes I've been practicing with this week are XmlSlurper and XmlParser, both located in the groovy.util package (which means we don't have to import them).

Both take the same approach to parsing: create an instance and then call one of the parse(...) or parseText(String) methods available in both:
 
   def parsedByXmlSlurper = new XmlSlurper().parseText(xmlText)
   def parsedByXmlParser = new XmlParser().parseText(xmlText)
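
To make the parse(...) versus parseText(String) choice a bit more concrete, here is a minimal sketch. The tiny xmlText document and the variable names are made up for illustration. The imports assume Groovy 3 or later, where both classes are also available in groovy.xml; on older versions they live in groovy.util and the import lines can be dropped.

```groovy
// Assumption: Groovy 3+ (on older versions, remove these two imports)
import groovy.xml.XmlParser
import groovy.xml.XmlSlurper

// Hypothetical sample document, just for this sketch
def xmlText = '<books><book><title>Don Xijote</title></book></books>'

// parseText(String) takes the XML content itself
def fromText = new XmlSlurper().parseText(xmlText)

// parse(...) takes a source of XML instead, here a Reader
def fromReader = new XmlSlurper().parse(new StringReader(xmlText))

assert fromText.book.title.text() == fromReader.book.title.text()

// XmlParser offers the same two entry points
def nodeFromText = new XmlParser().parseText(xmlText)
def nodeFromReader = new XmlParser().parse(new StringReader(xmlText))

assert nodeFromText.book.title.text() == nodeFromReader.book.title.text()
```

Both classes also have parse(...) overloads for a File or an InputStream, so the same pattern applies when the XML comes from disk.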

So what is the difference between them?

Well let's see the similarities first:
  1. Both are based on SAX, so both have a low memory footprint
  2. Both can update/transform the XML
Differences then:
  1. XmlSlurper evaluates the structure lazily, so if you update the XML you'll have to evaluate the whole tree again.
  2. XmlSlurper returns GPathResult when parsing an xml
  3. XmlParser returns Node objects when parsing an xml
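
The difference in return types can be checked with a few assertions. This is only a sketch: the sample document is invented, and the imports assume Groovy 3 or later (on older versions GPathResult lives in groovy.util.slurpersupport and the parser imports are not needed).

```groovy
// Assumption: Groovy 3+ package names
import groovy.xml.XmlParser
import groovy.xml.XmlSlurper
import groovy.xml.slurpersupport.GPathResult

// Hypothetical sample document, just for this sketch
def xmlText = '<books><book><title>Don Xijote</title></book></books>'

def slurped = new XmlSlurper().parseText(xmlText)
def parsed = new XmlParser().parseText(xmlText)

// XmlSlurper hands back a GPathResult...
assert slurped instanceof GPathResult
// ...while XmlParser hands back a groovy.util.Node
assert parsed instanceof Node

// Navigating further stays within the same family of types
assert slurped.book instanceof GPathResult
assert parsed.book instanceof NodeList
```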
When to use one or the other? Reading an answer on StackOverflow, a few ideas came up:
  • If you want to transform an existing document into another, XmlSlurper is the choice, but if you want to update and read at the same time, XmlParser is the choice. The rationale is that a node you create with XmlSlurper won't be available until you parse the document again with another XmlSlurper instance.
  • If you only need to read a few nodes, XmlSlurper is for you: "...I would say that the slurper is faster if you just have to read a few nodes, since it will not have to create a complete structure in memory"
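
The update-and-read point is easier to see in code. The following is a rough sketch under a few assumptions: the sample document and the 'Another title' value are invented, the imports assume Groovy 3 or later, and re-parsing goes through XmlUtil.serialize, which is just one possible way to get the updated XML back as text.

```groovy
// Assumption: Groovy 3+ (on older versions only XmlUtil needs an import)
import groovy.xml.XmlParser
import groovy.xml.XmlSlurper
import groovy.xml.XmlUtil

// Hypothetical sample document, just for this sketch
def xmlText = '<books><book><title>Don Xijote</title></book></books>'

// XmlParser: the new node is part of the tree right away
def parsed = new XmlParser().parseText(xmlText)
parsed.appendNode('book').appendNode('title', 'Another title')
assert parsed.book.size() == 2
assert parsed.book[1].title.text() == 'Another title'

// XmlSlurper: the change is recorded but not visible to queries yet
def slurped = new XmlSlurper().parseText(xmlText)
slurped.appendNode {
    book { title('Another title') }
}
assert slurped.book.size() == 1

// Serializing and parsing again makes the new node queryable
def reparsed = new XmlSlurper().parseText(XmlUtil.serialize(slurped))
assert reparsed.book.size() == 2
```

So if the code needs to query what it has just written, XmlParser saves the extra serialize-and-parse round trip.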
So far my experience is that both classes work pretty much the same way. Even the GPath expressions are used the same way with both (both support breadthFirst() and depthFirst()). So I guess the choice depends on the write/read frequency.
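
As a quick illustration of how similar the traversal calls look, here is a small sketch that searches for the title element with depthFirst() and breadthFirst() on both results. The sample document is made up, and the imports assume Groovy 3 or later (drop them on older versions).

```groovy
// Assumption: Groovy 3+ (on older versions, remove these two imports)
import groovy.xml.XmlParser
import groovy.xml.XmlSlurper

// Hypothetical sample document, just for this sketch
def xmlText = '<books><book><title>Don Xijote</title><author>Manuel De Cervantes</author></book></books>'

def slurped = new XmlSlurper().parseText(xmlText)
def parsed = new XmlParser().parseText(xmlText)

// Same call, same closure, whichever parser produced the tree
def titleBySlurper = slurped.depthFirst().find { it.name() == 'title' }
def titleByParser = parsed.depthFirst().find { it.name() == 'title' }

assert titleBySlurper.text() == 'Don Xijote'
assert titleByParser.text() == 'Don Xijote'

// breadthFirst() is available on both as well
def authorBySlurper = slurped.breadthFirst().find { it.name() == 'author' }
def authorByParser = parsed.breadthFirst().find { it.name() == 'author' }

assert authorBySlurper.text() == authorByParser.text()
```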

So let's say we have the following document:


 
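    def xml = """
        <response>
            <value>
                <books>
                    <book>
                        <title>Don Xijote</title>
                        <author>Manuel De Cervantes</author>
                    </book>
                </books>
            </value>
        </response>
    """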


I've created a couple of Spock specs that parse the document and run a simple query against it. First using XmlParser:

    def "Parsing an xml from a String"() {
        setup: "The parser"
            def parser = new XmlParser()
        when: "Parsing the xml"
            def response = parser.parseText(xml)
        then: "Checking the xml's content"
            response.value.books.book[0].title.text() == "Don Xijote"
    }

And then using XmlSlurper:

    def "Parsing xml from a String"() {
        setup: "Creating an instance of XmlSlurper"
            def parser = new XmlSlurper()
        when: "Parsing the xml as text"
            def responseNode = parser.parseText(xml)
        then: "You can check the xml's content"
            responseNode.value.books.book[0].title.text() == "Don Xijote"
    }

Can you see the difference? There is none, apart from the name of the parser class. In the next entry I will write about inserting, updating and deleting nodes with both XmlSlurper and XmlParser.

I almost forgot: all the code is on GitHub if you want to check it out!
