How to use the DOM parser to traverse / parse an XML file using recursion in Java

Let us start with an example. The following sample XML document represents the details of some books.
<?xml version="1.0" encoding="UTF-8"?>
<books>
   <book id="954">
      <title>Effective Java</title>
      <author>Joshua Bloch</author>
      <year> 2009 </year>
   </book>
   <book id="777">
      <title>Effective Java</title>
      <author>Scott Meyers</author>
      <year> 2010 </year>
   </book>
</books>


Figure: tree representation of the books XML above


An XML parser checks that a document is well-formed. A DOM parser additionally builds a tree of the data in memory. An XML document can have an optional Document Type Definition (DTD) or XML Schema that defines the document structure. If the XML document adheres to that structure, it is valid.
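
To make the difference between "well-formed" and "valid" concrete, here is a minimal sketch of DTD validation with the JAXP DOM parser. The class name DtdValidationSketch, the helper isValid, and the small internal DTD are assumptions made for illustration; they are not part of books.xml as shown above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationSketch {

    // Hypothetical internal DTD: a <books> root holding <book> elements,
    // each with a required id attribute and a single <title> child.
    static final String DTD =
          "<!DOCTYPE books [\n"
        + "  <!ELEMENT books (book*)>\n"
        + "  <!ELEMENT book (title)>\n"
        + "  <!ATTLIST book id CDATA #REQUIRED>\n"
        + "  <!ELEMENT title (#PCDATA)>\n"
        + "]>\n";

    // Returns true only if the XML is well-formed and the validating
    // parser reports no validity errors against the DTD.
    static boolean isValid(String xml) {
        final boolean[] valid = {true};
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(true); // turn on DTD validation
            DocumentBuilder db = dbf.newDocumentBuilder();
            db.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { valid[0] = false; } // validity error
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            return false; // not well-formed, or could not be read
        }
        return valid[0];
    }

    public static void main(String[] args) {
        String good = DTD + "<books><book id=\"954\"><title>Effective Java</title></book></books>";
        String bad = DTD + "<books><book><title>Effective Java</title></book></books>"; // id missing
        System.out.println("good is valid: " + isValid(good));
        System.out.println("bad is valid: " + isValid(bad));
    }
}
```

Validity errors are reported through the ErrorHandler rather than thrown, which is why the sketch records them in a flag instead of relying on an exception.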

Now let us see how to access and use an XML document from the Java programming language. There are several ways:
1. Through the parsers of the Java API for XML Processing (JAXP). Among others, JAXP provides these two parser APIs:
        i) Simple API for XML (SAX)        ii) Document Object Model (DOM)

2. Through the Java Architecture for XML Binding (JAXB) API.
3. Using JDOM, an open-source API.
4. Using Apache Xerces.

In this tutorial, we are going to parse the books XML file above using the DOM parser.
Java developers can use the DOM parser in an application through the JAXP API. The DOM parser creates a tree of objects that represents the data in the whole document and keeps that tree in memory. The program can then traverse the tree to access or modify the data.

Steps to parse an XML file using the DOM parser:

1. Create a document builder factory, which is used to obtain a parser that produces DOM object trees from XML documents.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); creates a new factory instance.

2. Create a document builder to obtain DOM Document instances from an XML document.
DocumentBuilder db = dbf.newDocumentBuilder(); creates a new instance of DocumentBuilder. XML can be parsed using this instance.

3. Get the DOM Document object by parsing the content of the given XML file.
Document dom = db.parse(books); returns the DOM Document object, where books is a File object for the XML file. The other input sources accepted by the parse method are InputStreams, URIs given as Strings, and SAX InputSources.

4. Access / manipulate the XML document using various methods.
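
Steps 1 to 3 can be sketched as follows. To keep the example self-contained it parses from an in-memory InputStream rather than a File; the class name ParseStepsSketch and the helper parse are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ParseStepsSketch {

    // Parses XML text into a DOM Document using the three steps above.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); // step 1
        DocumentBuilder db = dbf.newDocumentBuilder();                     // step 2
        // step 3: here the parse(InputStream) overload instead of parse(File)
        return db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document dom = parse("<books><book id=\"954\"><title>Effective Java</title></book></books>");
        System.out.println("Root element: " + dom.getDocumentElement().getNodeName());
    }
}
```

Swapping the argument of db.parse for new File("books.xml") gives the file-based form described in step 3.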

Some useful methods for getting a node list, elements, a node, and a node value are as follows:

To get the node list of all the elements in the document with a given tag name:
NodeList nodes = dom.getElementsByTagName("book"); where book is the tag name. The nodes are returned in document order, that is, in the order of a preorder traversal of the document tree.

To get a single node from the above node list. The items in the NodeList are accessible by index, starting from 0.
Node node = nodes.item(index);

To get the element for the given node. The cast is only safe when node.getNodeType() is Node.ELEMENT_NODE.
Element element = (Element) node;

To get all child nodes of the first element with a particular tag inside the above element:
NodeList titleChildren = element.getElementsByTagName("title").item(0).getChildNodes(); where title is the tag.
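
Putting these calls together, the following sketch lists each book's id attribute and title. The class name BookListSketch, the helper describeBooks, and the shortened inline XML are assumptions made for illustration; getAttribute is a standard method of org.w3c.dom.Element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BookListSketch {

    // Shortened copy of the sample document (author and year omitted).
    static final String XML = "<books>"
        + "<book id=\"954\"><title>Effective Java</title></book>"
        + "<book id=\"777\"><title>Effective Java</title></book>"
        + "</books>";

    // Returns one "id: title" line per <book> element.
    static List<String> describeBooks(String xml) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        List<String> lines = new ArrayList<>();
        NodeList nodes = dom.getElementsByTagName("book"); // all <book> elements, in document order
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // guard before casting
            }
            Element element = (Element) node;
            // children of the first <title>; the first child is its text node
            NodeList titleChildren = element.getElementsByTagName("title").item(0).getChildNodes();
            lines.add(element.getAttribute("id") + ": " + titleChildren.item(0).getNodeValue());
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        for (String line : describeBooks(XML)) {
            System.out.println(line);
        }
    }
}
```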

To get the children of the root node:
Element root = dom.getDocumentElement(); // gets the root node.
NodeList children = root.getChildNodes();
// returns the children of root, including any whitespace text nodes between elements.

To get node information, getNodeName() and getNodeValue() can be used.
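
The sketch below shows what these calls return for the children of a small root element. The class name RootChildrenSketch and the helper describeRootChildren are assumptions made for illustration. Note that the line breaks and indentation between elements show up as text nodes named #text, and that getNodeValue() is null for element nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RootChildrenSketch {

    // One line per child of the root: node name, kind of node, and node value.
    static List<String> describeRootChildren(String xml) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Element root = dom.getDocumentElement();
        NodeList children = root.getChildNodes();
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            String kind;
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                kind = "element";
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                kind = "text";
            } else {
                kind = "other";
            }
            String value = child.getNodeValue(); // null for elements
            String shown = (value == null) ? "null" : "\"" + value.trim() + "\"";
            lines.add(child.getNodeName() + " (" + kind + "): " + shown);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books>\n  <book id=\"954\"/>\n</books>";
        for (String line : describeRootChildren(xml)) {
            System.out.println(line);
        }
    }
}
```

For this input the root has three children: a whitespace text node, the book element, and another whitespace text node.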

A tree structure can be traversed easily using recursion. The following code traverses the entire books XML document recursively and prints the name and value of every node (the value is null for element nodes). This code can be used to traverse any XML document by changing the XML file name.


import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;

public class DOMParser1 {

    public static void main(String[] args) {
        try {
            File books = new File("books.xml");
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(books);
            // merge adjacent text nodes so each run of text is a single node
            doc.getDocumentElement().normalize();
            Element root = doc.getDocumentElement(); // gets the root element
            bookxml_traverse_DOM(root);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    // Prints the name and value of the given node, then visits its children recursively (preorder).
    private static void bookxml_traverse_DOM(Node element) {
        // getNodeValue() is null for element nodes and holds the text for text nodes
        System.out.println(element.getNodeName() + " = " + element.getNodeValue());

        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            bookxml_traverse_DOM(child);
        }
    }
}
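
The traversal above also prints the whitespace-only text nodes that sit between elements. As a variation, the following sketch indents by depth and skips whitespace-only text. The class name IndentedTraversalSketch and the helpers traverse, indent, and dump are assumptions made for illustration; it handles only element and text nodes and passes over other node types.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class IndentedTraversalSketch {

    // Appends two spaces per level of depth.
    static StringBuilder indent(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        return out;
    }

    // Preorder traversal: handle the node itself, then recurse into each child.
    static void traverse(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) { // skip whitespace-only text nodes
                indent(out, depth).append(text).append('\n');
            }
            return; // text nodes have no children
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            indent(out, depth).append('<').append(node.getNodeName()).append(">\n");
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            traverse(child, depth + 1, out);
        }
    }

    // Parses the XML text and returns the indented outline as a string.
    static String dump(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        traverse(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(dump("<books>\n  <book>\n    <title> Effective Java </title>\n  </book>\n</books>"));
    }
}
```

Collecting the output in a StringBuilder instead of printing directly is a design choice of this sketch; it keeps the recursion easy to check.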

The following code fragment can be used to find the value (text node) of the title, author, and year tags of books.xml.


NodeList bookNodes = doc.getElementsByTagName("book"); // all book nodes
for (int i = 0; i < bookNodes.getLength(); i++) {
    Node bookNode = bookNodes.item(i);

    if (bookNode.getNodeType() == Node.ELEMENT_NODE) {
        Element element = (Element) bookNode;
        System.out.println("Book Title: " + getTextNode("title", element));
        System.out.println("Author Name: " + getTextNode("author", element));
        System.out.println("Year of Publishing: " + getTextNode("year", element));
    }
}

// getTextNode method

private static String getTextNode(String tag, Element element) {
    // children of the first element with the given tag inside this element
    NodeList nodes = element.getElementsByTagName(tag).item(0).getChildNodes();
    Node node = nodes.item(0);
    return node.getNodeValue(); // value of the first (here the only) text node
}
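
The getTextNode method above assumes that the tag exists and has a text child; otherwise it throws a NullPointerException. It also returns the text with its surrounding spaces, as in " 2009 ". A more defensive variant might look like the following sketch. The class name SafeTextSketch and the helper textOf are assumptions made for illustration; getTextContent is a standard org.w3c.dom.Node method.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SafeTextSketch {

    // Returns the trimmed text of the first <tag> inside parent,
    // or null when there is no such element.
    static String textOf(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        if (matches.getLength() == 0) {
            return null; // tag not present
        }
        // getTextContent() concatenates all text below the element ("" if there is none)
        return matches.item(0).getTextContent().trim();
    }

    // Parses XML text and returns its root element.
    static Element rootOf(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element book = rootOf("<book id=\"954\"><title>Effective Java</title><year> 2009 </year></book>");
        System.out.println("Year of Publishing: " + textOf(book, "year"));
        System.out.println("ISBN: " + textOf(book, "isbn")); // no such tag in this sample
    }
}
```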


References:
http://docs.oracle.com/javase/tutorial/jaxp/
http://docs.oracle.com/javase/tutorial/jaxp/dom/readingXML.html
