Monday, November 21, 2011

Parsing XML Files in Java

How to read an XML file in Java – (DOM Parser)

The DOM interface is the easiest XML parser to understand and use. It parses an entire XML document and loads it into memory, modelling it as a tree of objects for easy traversal or manipulation.
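
To make the "load into memory, then traverse or manipulate" idea concrete, here is a minimal sketch. It parses a small XML string instead of a file; the class name DomTraversalSketch, the helper names and the sample XML are assumptions made up for illustration, not part of the original example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomTraversalSketch {

    // Parses an XML string into an in-memory DOM tree.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Counts the direct element children of a node, skipping text and comments.
    public static int countChildElements(Node parent) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data, not the file.xml used in this post.
        String xml = "<company><staff><firstname>yong</firstname></staff><staff/></company>";
        Document doc = parse(xml);
        Element root = doc.getDocumentElement();

        System.out.println(root.getNodeName());        // prints company
        System.out.println(countChildElements(root));  // prints 2

        // Manipulation: the tree can be changed after parsing.
        root.appendChild(doc.createElement("staff"));
        System.out.println(countChildElements(root));  // prints 3
    }
}
```

Because the whole tree stays in memory, any node can be visited or modified in any order after parsing. The price is that the entire document has to fit in memory.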

Note
The DOM parser is slow and consumes a lot of memory when it loads an XML document that contains a lot of data. In that case, consider the SAX parser instead: SAX is faster than DOM and uses less memory.

DOM Parser Example
A DOM XML parser reads the XML file below and prints out each element one by one.

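File : file.xml (listing reconstructed from the program output shown under "Result:"; any attributes in the original file are unknown)

<?xml version="1.0"?>
<company>
    <staff>
        <firstname>yong</firstname>
        <lastname>mook kim</lastname>
        <nickname>mkyong</nickname>
        <salary>100000</salary>
    </staff>
    <staff>
        <firstname>low</firstname>
        <lastname>yin fong</lastname>
        <nickname>fong fong</nickname>
        <salary>200000</salary>
    </staff>
</company>
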
File : ReadXMLFile.java – A Java class to read the above XML file.

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;

public class ReadXMLFile {

    public static void main(String[] args) {

        try {

            // Parse the whole file into an in-memory DOM tree.
            File fXmlFile = new File("c:\\file.xml");
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(fXmlFile);

            // Merge adjacent text nodes so each element's text is a single node.
            doc.getDocumentElement().normalize();

            System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
            NodeList nList = doc.getElementsByTagName("staff");
            System.out.println("-----------------------");

            for (int temp = 0; temp < nList.getLength(); temp++) {

                Node nNode = nList.item(temp);
                if (nNode.getNodeType() == Node.ELEMENT_NODE) {

                    Element eElement = (Element) nNode;

                    System.out.println("First Name : " + getTagValue("firstname", eElement));
                    System.out.println("Last Name : " + getTagValue("lastname", eElement));
                    System.out.println("Nick Name : " + getTagValue("nickname", eElement));
                    System.out.println("Salary : " + getTagValue("salary", eElement));

                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Returns the text of the first sTag element inside eElement,
    // or an empty string if there is no such element.
    private static String getTagValue(String sTag, Element eElement) {
        NodeList nlList = eElement.getElementsByTagName(sTag);
        if (nlList.getLength() == 0) {
            return "";
        }
        return nlList.item(0).getTextContent();
    }

}


Result:

Root element :company
-----------------------
First Name : yong
Last Name : mook kim
Nick Name : mkyong
Salary : 100000
First Name : low
Last Name : yin fong
Nick Name : fong fong
Salary : 200000



How to read an XML file in Java – (SAX Parser)


The SAX parser works differently from the DOM parser: it neither loads the XML document into memory nor creates an object representation of it. Instead, the SAX parser uses callback methods (org.xml.sax.helpers.DefaultHandler) to inform clients of the XML document structure.

Note
SAX Parser is faster and uses less memory than DOM parser.

See the following SAX callback methods:
- startDocument() and endDocument() – Methods called at the start and end of an XML document.
- startElement() and endElement() – Methods called at the start and end of a document element.
- characters() – Method called with the text content between the start and end tags of an XML document element.
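
The order in which these callbacks fire can be sketched with a small handler that records every event. This is an illustration only: the class names SaxCallbackSketch and EventRecorder and the sample XML are assumptions, not part of the original example. The handler collects text in a StringBuilder because a parser is allowed to deliver one element's text in several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCallbackSketch {

    // Records each callback as a line of text so the event order is visible.
    static class EventRecorder extends DefaultHandler {
        final List<String> events = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startDocument() {
            events.add("startDocument");
        }

        @Override
        public void endDocument() {
            events.add("endDocument");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            text.setLength(0); // start collecting text for the new element
            events.add("start:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called more than once per text node, so append rather than assign.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            if (!value.isEmpty()) {
                events.add("text:" + value);
            }
            text.setLength(0);
            events.add("end:" + qName);
        }
    }

    // Parses an XML string and returns the recorded events in order.
    public static List<String> record(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        EventRecorder recorder = new EventRecorder();
        parser.parse(new InputSource(new StringReader(xml)), recorder);
        return recorder.events;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data, not the file.xml used in this post.
        for (String event : record("<staff><firstname>yong</firstname></staff>")) {
            System.out.println(event);
        }
    }
}
```

Nothing is kept in memory except what the handler chooses to store, which is why this style suits large documents.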


File : file.xml – The same XML file as in the DOM example.

File : ReadXMLFile.java – A Java class to read the above XML file.

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ReadXMLFile {

    public static void main(String[] args) {

        try {

            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();

            DefaultHandler handler = new DefaultHandler() {

                // Flags recording which element has just been opened.
                boolean bfname = false;
                boolean blname = false;
                boolean bnname = false;
                boolean bsalary = false;

                @Override
                public void startElement(String uri, String localName, String qName,
                        Attributes attributes) throws SAXException {

                    System.out.println("Start Element :" + qName);

                    if (qName.equalsIgnoreCase("FIRSTNAME")) {
                        bfname = true;
                    }

                    if (qName.equalsIgnoreCase("LASTNAME")) {
                        blname = true;
                    }

                    if (qName.equalsIgnoreCase("NICKNAME")) {
                        bnname = true;
                    }

                    if (qName.equalsIgnoreCase("SALARY")) {
                        bsalary = true;
                    }

                }

                @Override
                public void endElement(String uri, String localName,
                        String qName) throws SAXException {

                    System.out.println("End Element :" + qName);

                }

                // Prints the text of the element flagged in startElement().
                // Note: a parser may split one element's text across several
                // characters() calls; this simple handler prints only the first chunk.
                @Override
                public void characters(char[] ch, int start, int length) throws SAXException {

                    if (bfname) {
                        System.out.println("First Name : " + new String(ch, start, length));
                        bfname = false;
                    }

                    if (blname) {
                        System.out.println("Last Name : " + new String(ch, start, length));
                        blname = false;
                    }

                    if (bnname) {
                        System.out.println("Nick Name : " + new String(ch, start, length));
                        bnname = false;
                    }

                    if (bsalary) {
                        System.out.println("Salary : " + new String(ch, start, length));
                        bsalary = false;
                    }

                }

            };

            saxParser.parse("c:\\file.xml", handler);

        } catch (Exception e) {
            e.printStackTrace();
        }

    }

}


Result:

Start Element :company
Start Element :staff
Start Element :firstname
First Name : yong
End Element :firstname
Start Element :lastname
Last Name : mook kim
End Element :lastname
Start Element :nickname
Nick Name : mkyong
End Element :nickname
Start Element :salary
Salary : 100000
End Element :salary
End Element :staff
Start Element :staff
Start Element :firstname
First Name : low
End Element :firstname
Start Element :lastname
Last Name : yin fong
End Element :lastname
Start Element :nickname
Nick Name : fong fong
End Element :nickname
Start Element :salary
Salary : 200000
End Element :salary
End Element :staff
End Element :company


Warning
This example may throw an exception for a UTF-8 XML file.

If you parse an XML file that contains some special UTF-8 characters, the parser may throw an “Invalid byte 1 of 1-byte UTF-8 sequence” exception.

com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: 
Invalid byte 1 of 1-byte UTF-8 sequence.

For example, an XML file that contains the special UTF-8 character “§” can trigger it.

To fix it, pass the parser a SAX InputSource that reads the file through a Reader with an explicit encoding, like this:
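// Needs these imports: java.io.File, java.io.FileInputStream, java.io.InputStream,
// java.io.InputStreamReader, java.io.Reader, org.xml.sax.InputSource
File file = new File("c:\\file-utf.xml");
InputStream inputStream = new FileInputStream(file);
Reader reader = new InputStreamReader(inputStream, "UTF-8");

InputSource is = new InputSource(reader);
is.setEncoding("UTF-8");

saxParser.parse(is, handler);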
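
The effect of decoding the bytes yourself can be sketched without touching the file system. This sketch rests on an assumption the post does not state: that the error usually appears when the bytes on disk are not really UTF-8, for example when the file was saved as ISO-8859-1. The class name EncodingSketch, its helper methods and the sample XML are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingSketch {

    // Collects all character data seen during the parse.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Hands the raw bytes to the parser, which then has to detect the encoding itself.
    public static String parseBytes(byte[] data) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextCollector collector = new TextCollector();
        parser.parse(new InputSource(new ByteArrayInputStream(data)), collector);
        return collector.text.toString();
    }

    // Decodes the bytes with the given charset first, so the parser only sees characters.
    public static String parseWithCharset(byte[] data, Charset charset) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextCollector collector = new TextCollector();
        Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset);
        parser.parse(new InputSource(reader), collector);
        return collector.text.toString();
    }

    public static void main(String[] args) throws Exception {
        // "\u00A7" is the section sign; the XML has no encoding declaration,
        // so a parser reading raw bytes assumes UTF-8.
        byte[] latin1 = "<note>\u00A7 1</note>".getBytes(StandardCharsets.ISO_8859_1);

        try {
            parseBytes(latin1);
            System.out.println("parsed raw bytes");
        } catch (Exception e) {
            System.out.println("raw bytes failed: " + e.getMessage());
        }

        // Decoding with the charset the bytes were really written in succeeds.
        System.out.println(parseWithCharset(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

If this assumption matches your case, the charset given to InputStreamReader should be the one the file was actually saved in. That may or may not be "UTF-8".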

