Parsing Extremely Large XML Files

Question:

I’m trying to read a very large XML file, about 500MB. The content is a list of records: roughly 100,000 nodes with the same tag, each containing a single record. Is there a limit on the file size or number of nodes that can be processed?

Answer:

There is no limit other than available memory. A DOM-style (Document Object Model) parser loads the entire XML document into memory and stores it as a tree of objects, so it is a poor choice for extremely large XML files containing a huge number of elements (nodes).
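For illustration, here is roughly what the DOM approach looks like in Python's standard library (the file name "records.xml" and tag name "record" are placeholders, not from the question); a single call pulls the whole document into memory:

    from xml.dom import minidom

    # Parses the complete document and builds the full object tree in RAM.
    # For a 500MB file this typically needs several times that in memory.
    doc = minidom.parse("records.xml")
    records = doc.getElementsByTagName("record")  # "record" is an assumed tag name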

A better choice would be a SAX-style parser, which streams the document and reports elements as events instead of building an in-memory tree. See https://en.wikipedia.org/wiki/Simple_API_for_XML
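Here is a rough sketch of the SAX approach using Python's standard xml.sax module. The repeating tag name "record" and the file name "records.xml" are assumptions; substitute the names from your actual data. The handler receives start/end element events as the parser streams through the file, so only one record is held in memory at a time:

    import xml.sax

    class RecordHandler(xml.sax.ContentHandler):
        def __init__(self):
            super().__init__()
            self.in_record = False
            self.buffer = []
            self.count = 0

        def startElement(self, name, attrs):
            # "record" is the assumed name of the repeating element
            if name == "record":
                self.in_record = True
                self.buffer = []

        def characters(self, content):
            if self.in_record:
                self.buffer.append(content)

        def endElement(self, name):
            if name == "record":
                self.in_record = False
                self.count += 1
                text = "".join(self.buffer)
                # Process one record here (parse fields, write to a database,
                # etc.), then discard it; memory use stays flat no matter how
                # many records follow.

    handler = RecordHandler()
    xml.sax.parse("records.xml", handler)  # streams the file, never builds a full tree
    print(handler.count, "records processed")

If you would rather work with element objects than raw events, Python's xml.etree.ElementTree.iterparse offers a similar streaming approach; the key either way is to process each record as it arrives and discard it, rather than building the whole tree.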

Opinion:  

In my opinion, a format such as XML should never be used for huge datasets. The original mistake was made when the software architect chose XML as the data storage format. Repeating the same opening and closing tags for every record severely bloats the data and imposes memory and storage overhead that could have been avoided.
