What is New in 2.11
Version 2.11, simultaneously available in C, Java, C++, and C#, is the latest release of VTD-XML. So what is new? The short answer: (1) It is more standards-compliant, conforming strictly to the XPath 1.0 spec’s notion of node(). (2) It introduces a major performance improvement for XPath expressions involving a simple position index. (3) It introduces a major performance improvement for XPath expressions containing complex predicates that involve absolute location path expressions. (4) It also contains various bug fixes for issues reported by VTD-XML users.
Change to node() Interpretation
Before 2.11, node() in a location step of an XPath expression was interpreted as equivalent to *, i.e., an element node with any name. With 2.11, the same node() matches a node of any of these types: element, text, comment, or processing instruction, as defined by the XPath 1.0 spec.
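The difference between * and node() can be seen with any XPath 1.0 engine. The sketch below is only an illustration of the spec behavior: it uses the XPath engine bundled with the JDK rather than VTD-XML, and the class name, method name, and sample document are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: JDK XPath engine, not VTD-XML. All names here are hypothetical.
final class NodeTestDemo {
    // <root> has four children: an element, a text node, a comment,
    // and a processing instruction.
    static final String XML = "<root><a/>text<!--note--><?pi data?></root>";

    /** Counts the nodes that an XPath 1.0 expression selects from XML. */
    static int count(String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList hits = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // * selects element children only.
        System.out.println("/root/*      -> " + count("/root/*"));
        // node() selects element, text, comment, and processing-instruction children.
        System.out.println("/root/node() -> " + count("/root/node()"));
    }
}
```

An expression that relied on the pre-2.11 behavior, where node() meant "any element", may therefore select more nodes after upgrading; replacing node() with * should restore the old result.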
Performance Improvement for Simple Position Index
A simple position index is a constant index value in a predicate; a quick example is “a[2]/b[1].” The 2.11 XPath engine now detects this use case and exits the execution loop early, resulting in faster execution. The amount of improvement depends on how frequently a simple index is used in each location step. In some cases, a 50% to 70% execution speedup is possible.
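The idea behind the early exit can be sketched in a few lines. This is not VTD-XML’s actual implementation; the class, the method, and the visited counter are hypothetical and only show why a constant index lets a scan stop before it reaches the last sibling.

```java
import java.util.Arrays;
import java.util.List;

// Conceptual sketch only, not VTD-XML internals. All names are hypothetical.
final class PositionIndexSketch {
    /** Number of siblings inspected by the most recent call (for illustration). */
    static int visited;

    /**
     * Returns the list position of the n-th (1-based) child called name,
     * or -1 if there are fewer than n such children.
     */
    static int nthMatch(List<String> childNames, String name, int n) {
        visited = 0;
        int seen = 0;
        for (int i = 0; i < childNames.size(); i++) {
            visited++;
            if (childNames.get(i).equals(name) && ++seen == n) {
                return i; // a[n] can match only one node, so stop scanning here
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("a", "b", "a", "a", "b", "a");
        int pos = nthMatch(children, "a", 2);
        System.out.println("a[2] found at position " + pos
                + " after visiting " + visited + " of " + children.size() + " siblings");
    }
}
```

An engine that does not recognize the constant index would test the predicate against every sibling, so the saving grows with the number of siblings that follow the match.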
Performance Improvement for Predicates Containing Absolute Path Expressions
A quick example is //a[//abc/@val='1']. Notice that the predicate contains //abc, which is an absolute path expression. Before 2.11, this expression triggered repeated evaluation of //abc to determine whether the predicate is true or false, and the processing cost increased rapidly with the size of the document. This release caches the evaluation result so the absolute path expression is evaluated only once. Please note that this feature is enabled by default; you can turn it off (we don’t recommend it) by invoking AutoPilot’s enableCaching method and passing it “false.”
How much of an improvement can you expect to see? It depends on the size of the document, the complexity of the predicates, and other factors. Sometimes you will achieve astonishing results. Consider the following expression.
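Caching works here because an absolute path starts from the document root, so its result does not depend on which context node the predicate is being tested against. The following sketch illustrates that reasoning with a stand-in evaluator; it is not VTD-XML’s code, and every name in it is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Conceptual sketch only, not VTD-XML internals. All names are hypothetical.
final class AbsolutePathCacheSketch {
    /** Counts how often the (expensive) absolute path was evaluated. */
    static int evaluations;

    /** Stand-in for evaluating //abc/@val against the whole document. */
    static List<String> evaluateAbsolutePath() {
        evaluations++;
        return Arrays.asList("1", "7");
    }

    /** Tests the predicate [//abc/@val='1'] once per context node. */
    static int countMatches(int contextNodes, boolean caching) {
        evaluations = 0;
        List<String> cached = null;
        int matches = 0;
        for (int i = 0; i < contextNodes; i++) {
            List<String> values;
            if (caching) {
                if (cached == null) {
                    cached = evaluateAbsolutePath(); // evaluated on first use only
                }
                values = cached;
            } else {
                values = evaluateAbsolutePath(); // re-evaluated for every context node
            }
            if (values.contains("1")) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        countMatches(1000, false);
        System.out.println("without caching: " + evaluations + " evaluations");
        countMatches(1000, true);
        System.out.println("with caching:    " + evaluations + " evaluations");
    }
}
```

Without the cache, the work is roughly the number of context nodes multiplied by the cost of a full-document scan, which is why the cost climbs so quickly as documents grow.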
//CDResults[../../../TargetName/@Value=//SiteInformation[TargetName/@Value!=//SiteInformation[1]/TargetName/@Value and TargetName/@Value!=//SiteInformation[TargetName/@Value!=//SiteInformation[1]/TargetName/@Value][1]/TargetName/@Value][1]/TargetName/@Value]/BottomCD/@Value
Running this expression on a 22MB XML document in Java would take many hours in virtually all XPath implementations, including the 2.10 version of VTD-XML. With 2.11, it took less than 5 seconds on a three-year-old commodity PC.
Bug fixes
There are also other bug fixes, covering XMLModifier’s deletion capabilities and the permissiveness of sub-node deletion.
Hi,
During my testing, I learned that AutoPilot does not support an XPath expression that starts with a predicate, that is, with “[”. Is this true?
Thanks,
That does not seem to be a valid XPath expression.
Hi,
I need to parse and modify an XML document that has attribute values longer than 2^20-1 (the maximum token length). Currently VTDGen throws “Token Length Error: Attr val too long”.
Is there any way that I can parse this XML? I saw some comments on the VTD official site that one can “add another 32 bit if 64 bits are not enough”. How is this possible?
Or is it possible for the attribute value to be stored in multiple tokens (as is done for text)?
Thanks.
I am sorry to hear that, but it is a limitation that you have to deal with (I am referring to the token length for an attribute value being limited to 2^20-1).
Hi jimmyzhang,
Thanks for your answer. So there is no workaround for this limitation? Like using multiple tokens for an attribute value? Or using a 96-bit token?
Thanks.
It won’t be an easy fix. You don’t seem to be using attributes the right way: an attribute value should never be very long.
The XML files are not created by me; I just need to parse and modify them. I have to deal with them as they are. Please let me know if any solution comes up.
Thanks for the help.
It seems that VTDNav.toNormalizedString cannot handle tags that are empty and self-closed. See this example:
http://www2.freefarm.se/EmptyTagTest.java
Will you do a bug fix for that, or is there a workaround?
Can you send a code sample to jzhang@ximpleware.com?