XPath for Actionscript 2.0
Table of Contents
- Introduction
- Introducing XPath
- XPath for Flash and Actionscript 2.0
- Adobe’s XPathAPI
- XFactorstudio’s XPath
- Conclusion
Introduction
If you are reading this, you’ve probably used XML data in your Flash applications before, and you are probably bald by now from pulling your hair out. If one thing makes you pull your hair out when programming Actionscript, it’s working with XML data through the native Flash XML object.
Flash makes it relatively easy to load XML data from a server. To Flash it doesn’t really matter whether the data comes from a static XML file or from an XML stream generated at request time by the server, using e.g. a PHP script or some other server-side scripting language. Loading XML data into Flash is a relatively simple and straightforward task.
After loading the XML data is usually when the nightmare begins. Once the XML is in memory, we need to access the data, which means traversing the XML DOM (Document Object Model) to reach the nodes we want. The XML object lets us do this through built-in properties such as firstChild and childNodes. If you have used the XML object before, I’m sure you have written statements that looked something like this:
```actionscript
myXML.childNodes[1].childNodes[3].firstChild.nodeValue;
```
What a mess! Besides being completely unreadable, can you tell what data this statement refers to without dissecting the XML structure first? Every node is assumed to sit at a hard-coded position. Imagine you have the following XML data structure loaded into an XML object called xmlBooks:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<books>
  <book id='0'>
    <author>J.R.R. Tolkien</author>
    <title>The Lord of the Rings</title>
    <isbn>0618346252</isbn>
    <price>34.95</price>
  </book>
  <book id='1'>
    <author>Dan Brown</author>
    <title>The Da Vinci Code</title>
    <isbn>1400079179</isbn>
    <price>17.95</price>
  </book>
</books>
```
If we want to retrieve the author of the second book, The Da Vinci Code, we could use the following statement to access that data using the XML object:
```actionscript
xmlBooks.firstChild.childNodes[1].childNodes[0].firstChild.nodeValue;
```
What we’re really doing here is telling the XML object to get a reference to its firstChild node, which is the <books> tag in our data example. From that node we get the second <book> node by indexing the childNodes array of <books> (remember that array indexes start at zero, so index 1 targets the second element). Then we target the first node in the <book> tag (index 0), which is the <author> tag, and read the nodeValue of that tag’s firstChild, the text node holding the actual data we want. (This only works if ignoreWhite was set to true before the XML was loaded; otherwise the whitespace between the tags becomes extra text nodes and shifts every index.)
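Broken into named steps, the traversal to the second book’s author looks like this. This is a minimal sketch that parses a trimmed copy of the books data inline with parseXML instead of loading books.xml; the variable names are only for illustration.

```actionscript
// Parse a trimmed copy of the books data inline instead of loading books.xml.
var xmlBooks:XML = new XML();
xmlBooks.ignoreWhite = true; // set before parsing so whitespace between tags is not turned into text nodes
xmlBooks.parseXML("<books>"
    + "<book id='0'><author>J.R.R. Tolkien</author><title>The Lord of the Rings</title></book>"
    + "<book id='1'><author>Dan Brown</author><title>The Da Vinci Code</title></book>"
    + "</books>");

var booksNode:XMLNode = xmlBooks.firstChild;       // <books>
var secondBook:XMLNode = booksNode.childNodes[1];  // second <book> (index 1)
var authorNode:XMLNode = secondBook.childNodes[0]; // <author> (index 0)
var authorText:XMLNode = authorNode.firstChild;    // text node inside <author>
trace(authorText.nodeValue);                       // Dan Brown
```

Naming the steps documents them, but every index is still hard-coded: the code is only correct as long as the elements stay in exactly this order.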
Did you actually follow all of that? There must be a better way of doing this!
Introducing XPath
I’m sure you’ve worked with files and folders on a computer drive before. Files and folders are usually stored in other folders; this gives structure to your drive and keeps it from turning into a big mess. If I showed you the path to a file, you’d probably know exactly which file I meant:
C:\Program Files\Macromedia\Flash 8\Readme.txt
Here we’re talking about a file called Readme.txt that is located on the C: drive in the Flash 8 program folder. Imagine if we couldn’t specify folders this way but had to use some obscure syntax like this:
```actionscript
myDrive.folders[4].folders[6].folders[2].files[4].name;
```
Would you understand which file we were talking about? Of course not, let alone what would happen if the file at index 2 got deleted. Would our file then move to index 3? If so, all our shell scripts, batch files, program code or whatever else used that file would have to be adjusted just because another file got deleted! This is pretty messy stuff. It looks kind of familiar, though, doesn’t it? This way of accessing files is exactly what you are doing when you traverse the XML DOM using the firstChild and childNodes properties of the XML object.
With XPath, on the other hand, you specify a query to retrieve data. This query is just a special kind of path, much like the paths you use to access the files on your drive (hence the name XPath, short for XML Path Language). Going back to our books example from the previous section, if we want to retrieve the name of the author of the second book using an XPath query, we could specify a path, or query, like this:
"/books/book[2]/author"
Now, that looks a lot more civilized, doesn’t it? If you have never seen an XPath query before, the number 2 between the brackets might look odd at first, since you might expect a 1 to target the second book element. This is because positions in XPath are one-based, not zero-based like array indexes in Actionscript.
Once you are comfortable with XPath, you will find that a query you wrote a couple of months earlier reads a lot more pleasantly than the cryptic statements you had to write with the XML object; XPath gives much more context to what you are trying to accomplish. Furthermore, if the layout of your XML data structure changes for some reason, your code is far less likely to break, because an XPath path selects nodes by their names and nesting rather than by their position among their siblings (unless you explicitly ask for a position, as with book[2]). With the XML object you had to target a specific node at a specific location, like index 1 of the childNodes array, to get to the data you want. If your data structure changed, your code would probably, though not necessarily, need to change as well.
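To see why name-based paths survive reordering, here is a sketch using only the native XML object. The helper childByName is hypothetical (it is not part of Flash or any XPath library); it follows element names the way an XPath location step does, and the sample data deliberately puts <title> before <author>.

```actionscript
// Hypothetical helper: returns the first child element of parent whose
// nodeName matches name, or null. Its position among siblings does not matter.
function childByName(parent:XMLNode, name:String):XMLNode {
    for (var child:XMLNode = parent.firstChild; child != null; child = child.nextSibling) {
        if (child.nodeType == 1 && child.nodeName == name) { // 1 = element node
            return child;
        }
    }
    return null;
}

var doc:XML = new XML();
doc.ignoreWhite = true;
// <title> now comes before <author> inside <book>.
doc.parseXML("<books><book id='0'><title>The Lord of the Rings</title>"
    + "<author>J.R.R. Tolkien</author></book></books>");

// The hard-coded index now silently returns the wrong element's text.
trace(doc.firstChild.childNodes[0].childNodes[0].firstChild.nodeValue); // The Lord of the Rings

// Looking children up by name still finds the author.
var book:XMLNode = childByName(doc.firstChild, "book");
var author:XMLNode = childByName(book, "author");
trace(author.firstChild.nodeValue); // J.R.R. Tolkien
```

An XPath engine does this kind of name matching for you, for every step of the path.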
Now, I could start writing a complete tutorial on XPath, but whole books have been written on the subject and I don’t feel I should add another book-sized text to the literature about XPath. Instead, I’m going to direct you to an excellent online tutorial on XPath from the W3 Schools, so you can get a feel for XPath and the query syntax it uses.
This excellent tutorial by the W3 Schools is a very good introduction to XPath and an excellent reference as well. For me it’s the first stop whenever I need to know something about XPath. There are of course more in depth texts available out there, but the W3 Schools introduction covers most of the things you likely need when working with XPath.
XPath for Flash and Actionscript 2.0
If you’ve read through the W3 Schools tutorial mentioned in the previous section then you should have a pretty good idea of what XPath is by now and what it can do for you.
If you’re using Flash 8, XPath support comes as a default package. However, the XPath implementation that ships with Flash 8 is not fully compliant with the XPath specification. You can still do basic XPath queries with it, and if you just need basic XML parsing, the Adobe XPathAPI is usually good enough.
If you decide that you need more power to parse your XML documents, you might consider XFactorstudio’s XPath implementation. This open source XPath package by XFactorstudio can be downloaded from their website free of charge and is available not only for Flash but also for the excellent open source Actionscript 2.0 compiler, MTASC.
In the next two sections we are going to have a look at both APIs. Based on what is discussed, you should be able to decide which API to use in your own applications.
Adobe’s XPathAPI
As mentioned in an earlier section, the Adobe XPathAPI is not a fully compliant implementation of the XPath specification. It is, however, good enough for most applications and definitely a better choice than the plain old vanilla XML object.
Because the XPathAPI class from Adobe is not a full implementation of the XPath specification, a couple of things need to be done differently. For example, the Adobe XPathAPI doesn’t support the iterator (the positional predicate, such as the [2] in "/books/book[2]/author") that we used in an earlier section to retrieve the name of the author of a book. Even though this is basic XPath functionality, the Adobe XPathAPI doesn’t support it. To overcome limitations like this, you usually need to split your query into two separate queries. But let’s look at what the XPathAPI can do for us instead of discussing everything it can’t. You will certainly run into limitations when using the XPathAPI, and for most of them you can figure out a workaround.
The XPathAPI interface
The Adobe XPathAPI has only four methods on its public interface, of which you’re probably going to use only two in your day-to-day work. Let’s have a closer look at these methods and how they work.
| Method | Description |
|---|---|
| selectNodeList | Returns a list (array) of nodes that are specified by the query. |
| selectSingleNode | Returns a value of a specified node. |
| setNodeValue | Set the value of a specified node. |
| getEvalString | This method returns the evaluation string for a specified node. |
To use the methods of the XPathAPI class, you need to import the class into your script with the import statement. Once you have loaded an XML document into an XML object, you can start using the XPathAPI class. Since all the public methods of the XPathAPI class are static, you don’t need to create an instance of it.
```actionscript
import mx.xpath.XPathAPI;

var books : XML;
books = new XML();
books.ignoreWhite = true; // skip whitespace-only text nodes between tags
books.onLoad = function(success : Boolean) : Void
{
    var titles : Array;
    // A single static call returns the matching nodes as an array
    titles = XPathAPI.selectNodeList(books.firstChild, "/books/*/title/*");
    trace( titles );
};
books.load("books.xml");
```
The above example loads an XML file named books.xml (which contains the books XML data we used in previous sections). When the file has finished loading, it runs an XPath query through the XPathAPI class to retrieve an array holding the titles of all books and writes the result to the Output panel of the Flash IDE. Requesting the array of book titles takes just a single line of code! This is the sort of power and simplicity you can’t achieve with the plain XML object or manual parsing. Welcome to the world of XPath!
In the next couple of sections we’ll discuss the public interface methods of the XPathAPI class. If you need more information on the XPathAPI class then please refer to the Flash help or the Adobe livedocs for more information.
selectNodeList
Syntax:
XPathAPI.selectNodeList(node : XMLNode, query : String) : Array
Description:
The selectNodeList method is one you will use all the time with the XPathAPI class. It returns an array of the elements matched by the XPath query you pass to it. We saw selectNodeList in action when we introduced the XPathAPI in the previous section, where we used it to retrieve the titles of all the book elements in the books example.
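Because selectNodeList hands back XMLNode objects, you still have to read the text out of each node yourself. A sketch of that conversion, assuming the plain path "/books/book/title" works with the XPathAPI the way "/books/book" does elsewhere in this article, and that books.xml holds the books data:

```actionscript
import mx.xpath.XPathAPI;

var books:XML = new XML();
books.ignoreWhite = true;
books.onLoad = function(success:Boolean):Void {
    if (!success) {
        trace("Could not load books.xml");
        return;
    }
    // selectNodeList returns XMLNode objects, not strings.
    var titleNodes:Array = XPathAPI.selectNodeList(books.firstChild, "/books/book/title");
    var titles:Array = [];
    for (var i:Number = 0; i < titleNodes.length; i++) {
        // The text of each <title> lives in its first child, a text node.
        titles.push(titleNodes[i].firstChild.nodeValue);
    }
    trace(titles.join(", "));
};
books.load("books.xml");
```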
Because you will probably use the selectNodeList method the most, it is also where you are most likely to run into the limitations of the XPathAPI’s XPath implementation. I will therefore demonstrate how to overcome one of the most common problems with the XPathAPI class: the lack of support for iterators.
As you might remember from a previous section, you can use an iterator to specify a specific node in your XPath query. Taking a look again at our books example:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<books>
  <book id='0'>
    <author>J.R.R. Tolkien</author>
    <title>The Lord of the Rings</title>
    <isbn>0618346252</isbn>
    <price>34.95</price>
  </book>
  <book id='1'>
    <author>Dan Brown</author>
    <title>The Da Vinci Code</title>
    <isbn>1400079179</isbn>
    <price>17.95</price>
  </book>
</books>
```
If we wanted to explicitly access the second book we could use the XPath query "/books/book[2]" (If you have read the W3 Schools tutorial on XPath pointed out to you earlier then you might remember that XPath indexes start at 1 and not at 0, so we specify index 2).
The problem, however, is that the Adobe XPathAPI doesn’t support the iterator. There are two ways to overcome this. The first solution is to use an id attribute. Instead of specifying an iterator in the query, we can reference the id attribute of the <book> tag with the query "/books/book[@id='1']". This works fine and returns an array of one element holding a reference to the XMLNode object for the second book. However, what if you have no control over the XML format? Imagine the XML comes from some external RSS feed and the tag you want to target doesn’t have an id attribute.
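The id-based workaround could look like the sketch below. It relies on the article’s observation that selectNodeList accepts the attribute predicate "/books/book[@id='1']"; the inline data is a trimmed copy of the books example.

```actionscript
import mx.xpath.XPathAPI;

var books:XML = new XML();
books.ignoreWhite = true;
books.parseXML("<books>"
    + "<book id='0'><author>J.R.R. Tolkien</author><title>The Lord of the Rings</title></book>"
    + "<book id='1'><author>Dan Brown</author><title>The Da Vinci Code</title></book>"
    + "</books>");

// Select the book by its id attribute instead of by its position.
var matches:Array = XPathAPI.selectNodeList(books.firstChild, "/books/book[@id='1']");
if (matches.length > 0) {
    var secondBook:XMLNode = matches[0];
    trace(secondBook.attributes.id);                   // value of the id attribute
    trace(secondBook.firstChild.firstChild.nodeValue); // text of the book's first child, <author>
}
```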
The second solution, then, is to split the query into two steps. This is also the preferred approach, because it would be odd to change your data set just because of the limitations of the API you happen to be using. To split the query (I show how in the next section, where we discuss the selectSingleNode method), the first step is to retrieve an array of all book elements with a query like "/books/book", and the second step is to take the book element at index 1 of that array (remember, array indexes in Actionscript start at 0). It takes a bit more code, but in the end it works just fine, and thanks to XPath your code still makes sense and remains readable.
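A sketch of this two-step workaround follows. It assumes, as in the other XPathAPI examples in this article, that a query passed to the XPathAPI starts with the name of the node you hand it, so "/book/author" is evaluated against a single <book> node.

```actionscript
import mx.xpath.XPathAPI;

var books:XML = new XML();
books.ignoreWhite = true;
books.parseXML("<books>"
    + "<book id='0'><author>J.R.R. Tolkien</author><title>The Lord of the Rings</title></book>"
    + "<book id='1'><author>Dan Brown</author><title>The Da Vinci Code</title></book>"
    + "</books>");

// Step 1: select every <book> element; no positional predicate is needed.
var bookNodes:Array = XPathAPI.selectNodeList(books.firstChild, "/books/book");

// Step 2: pick the second book by array index. XPath position [2] becomes
// index 1 here, because Actionscript arrays are zero-based.
var secondBook:XMLNode = bookNodes[1];

// Run a simple query against that one <book> node.
var authorNodes:Array = XPathAPI.selectNodeList(secondBook, "/book/author");
if (authorNodes.length > 0) {
    trace(authorNodes[0].firstChild.nodeValue);
}
```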
selectSingleNode
Syntax:
XPathAPI.selectSingleNode(node : XMLNode, query : String) : String
Description:
The selectSingleNode method is another method you will likely use a lot when working with the Adobe XPathAPI class. It returns the value of a node instead of an array with a list of nodes, which is what the selectNodeList method discussed in the previous section returns.
When using the selectSingleNode method, be aware of the limitations of the Adobe XPathAPI class as well. Just as with selectNodeList, the iterator is not supported, and on top of that, attributes are not supported either! So you can’t write a query like "/books/book[@id='1']/title" to request the title of the second book by the unique id attribute of the book tag.
In practical terms, the selectSingleNode method can only be used with very simple queries. Nevertheless, you will probably use it on a regular basis once you have reduced the data to a simple structure to operate on:
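```actionscript
import mx.xpath.XPathAPI;

// Assumes books is the XML object from the earlier example, already loaded.
var i : Number;
var bookNodes : Array;
// Step 1: select every <book> element as an array of XMLNode objects
bookNodes = XPathAPI.selectNodeList(books.firstChild, "/books/book");
for(i = 0; i < bookNodes.length; i++)
{
    // Step 2: run a simple query against each <book> node on its own
    trace( XPathAPI.selectSingleNode(bookNodes[i], "/book/title") );
}
```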
The above example first requests an array of book elements. A for loop then iterates through that array and, for every book element found, selects the title with the selectSingleNode method. As you can see, code like this is perfectly readable: even if you didn't know what the XML data structure looks like, you can still understand what is going on and what kind of data the code operates on.
setNodeValue
Syntax:
XPathAPI.setNodeValue(node : XMLNode, query : String, value : String) : Number
Description:
Whenever you need to change a value in the XML data, you can use the setNodeValue method. Like the other XPathAPI methods, setNodeValue has its limitations. You can only write string values into nodes that already exist, so you can't create new nodes with this method. Technically you can write a string value into any node, but that is a rather useless application of this method. And, just like the other XPathAPI methods, setNodeValue doesn’t seem to know the iterator, nor does it seem able to handle attributes.
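```actionscript
import mx.xpath.XPathAPI;

// Assumes books is the XML object from the earlier example, already loaded.
var i : Number;
var bookNodes : Array;
bookNodes = XPathAPI.selectNodeList(books.firstChild, "/books/book");
for(i = 0; i < bookNodes.length; i++)
{
    // Query relative to each <book> node: overwrite the text of its <author>
    XPathAPI.setNodeValue(bookNodes[i], "/book/author", "Adrien Lyon");
}
```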
In the above example we use the setNodeValue method to change the name of the author of every book element. Just as with the selectSingleNode method, you first need to reduce the complexity of the XML you want to work on; only then does the setNodeValue method seem to work correctly.
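Since setNodeValue only writes into nodes that already exist, adding a new element means falling back to the native XML object. A minimal sketch using the standard createElement, createTextNode and appendChild methods; the helper makeTextElement and the placeholder values "Example Author" and "Example Title" are hypothetical, for illustration only.

```actionscript
var books:XML = new XML();
books.ignoreWhite = true;
books.parseXML("<books><book id='0'><author>J.R.R. Tolkien</author>"
    + "<title>The Lord of the Rings</title></book></books>");

// Hypothetical helper: builds <name>text</name> with the XML object's factory methods.
function makeTextElement(doc:XML, name:String, text:String):XMLNode {
    var element:XMLNode = doc.createElement(name);
    element.appendChild(doc.createTextNode(text));
    return element;
}

// Build a new <book id="1"> with placeholder content and append it to <books>.
var newBook:XMLNode = books.createElement("book");
newBook.attributes.id = "1";
newBook.appendChild(makeTextElement(books, "author", "Example Author"));
newBook.appendChild(makeTextElement(books, "title", "Example Title"));
books.firstChild.appendChild(newBook);

trace(books.toString());
```

Once the node exists, XPath queries can select it like any other book.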
getEvalString
Syntax:
XPathAPI.getEvalString(node : XMLNode, query : String) : String
Description:
The getEvalString method takes an XPath query and returns an evaluation string: the chain of properties you would normally write by hand to reach the same node with the plain XML object. Did that make sense? Let’s look at an example using our books XML data:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<books>
  <book id='0'>
    <author>J.R.R. Tolkien</author>
    <title>The Lord of the Rings</title>
    <isbn>0618346252</isbn>
    <price>34.95</price>
  </book>
  <book id='1'>
    <author>Dan Brown</author>
    <title>The Da Vinci Code</title>
    <isbn>1400079179</isbn>
    <price>17.95</price>
  </book>
</books>
```
If we use the getEvalString method with the following query:
```actionscript
XPathAPI.getEvalString(books.firstChild, "/books/book[@id='0']/author");
```
Then it would return the following string:
.childNodes.0.childNodes.1.firstChild.nodeValue
This looks very similar to the kind of path you would normally write with the XML object. If you look at the path returned, you can see what it means and trace it back to the title node of the first book. However, with this method I was unable to retrieve the path for the second book using the following query:
```actionscript
XPathAPI.getEvalString(books.firstChild, "/books/book[@id='1']/author");
```
Notice that the query now asks for the book whose id attribute is 1 instead of 0. It should therefore return an evaluation string pointing to the second book. Instead, it returned an evaluation string pointing to the first book, just as the first query did. I assume this is one of those quirks of the XPathAPI class.
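To make an evaluation string such as ".childNodes.0.childNodes.1.firstChild.nodeValue" concrete, the sketch below follows one by hand, one property at a time. The helper followEvalString is hypothetical, and the sketch assumes the string is relative to the node that was passed to getEvalString.

```actionscript
// Hypothetical helper: walks an eval string one property at a time.
function followEvalString(start:Object, evalString:String):Object {
    var parts:Array = evalString.split(".");
    var current:Object = start;
    for (var i:Number = 0; i < parts.length; i++) {
        if (parts[i] == "") {
            continue; // the leading "." produces an empty first part
        }
        current = current[parts[i]]; // "0" works as an array index, too
        if (current == undefined) {
            return undefined; // the path does not exist in this document
        }
    }
    return current;
}

var books:XML = new XML();
books.ignoreWhite = true;
books.parseXML("<books>"
    + "<book id='0'><author>J.R.R. Tolkien</author><title>The Lord of the Rings</title></book>"
    + "<book id='1'><author>Dan Brown</author><title>The Da Vinci Code</title></book>"
    + "</books>");

// Assumes the eval string is relative to books.firstChild, the node passed to getEvalString.
trace(followEvalString(books.firstChild, ".childNodes.0.childNodes.1.firstChild.nodeValue"));
```

Walking the string like this is a quick way to check which node a returned evaluation string actually points at.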
XFactorstudio’s XPath
Finally, we've reached the crème de la crème of this article. This section discusses the XPath API provided by XFactorstudio, which seems to be the most complete and robust XPath implementation available for Actionscript 2.0. Besides being the most complete XPath implementation, it is available not only for Flash but also for the excellent open source Actionscript 2.0 compiler, MTASC.
Everything that is missing from Adobe's implementation of the XPath specification is implemented in XFactorstudio's XPath API. With XFactorstudio’s implementation you can use iterators, attributes, functions and many more advanced XPath expressions that the Adobe XPathAPI cannot handle. So sit back and enjoy: in the next sections we’re going to discuss XFactorstudio's XPath API.
The XFactorstudio XPath API interface
Besides being a much more complete implementation of XPath, XFactorstudio's API also provides a richer interface: nine public methods instead of the four available in Adobe's XPathAPI. But, just as with the Adobe XPathAPI, you will probably use only a handful of these methods in your day-to-day work. The following table lists the public methods of the XFactorstudio XPath API that we are going to discuss in this article.
| Method | Description |
|---|---|
| selectNodes | Returns an array of the nodes that match the XPath query passed to the selectNodes method. |
| selectSingleNode | Returns the XMLNode object of a single node. |
| selectNodesAsString | Returns an array with the String values of the matching nodes. |
| selectNodesAsNumber | Returns an array with the Number values of the matching nodes. |
| selectNodesAsBoolean | Returns an array with the Boolean values of the matching nodes. |
As you can see, the interface of the XFactorstudio XPath API is fairly complete. Let’s look at a simple example of the XFactorstudio XPath API in action. The example here gives exactly the same result as the one in the section where we introduced the Adobe XPathAPI and traced the titles of all book elements in the books.xml file.
```actionscript
import com.xfactorstudio.xml.xpath.XPath;

var books : XML;
books = new XML();
books.ignoreWhite = true;
books.onLoad = function(success : Boolean) : Void
{
    var titles : Array;
    titles = XPath.selectNodesAsString(books, "/books/book/title");
    trace( titles );
};
books.load("books.xml");
```
The above example uses the selectNodesAsString method to retrieve the titles of all the books; we’ll look at this method in more detail in a later section. The example shows that the Adobe XPathAPI and the XFactorstudio XPath API take different approaches: not only is the method call different, but the query needed to get the result is quite different as well. You can rely on XFactorstudio's query syntax to follow the XPath specification, whereas Adobe's XPathAPI class needs some fiddling from time to time to get a satisfying result.
selectNodes
Syntax:
XPath.selectNodes(node : XMLNode, query : String) : Array
Description:
The selectNodes method is one of the methods you will probably use a lot with the XFactorstudio XPath API. With selectNodes you can select a whole set of nodes that match a given query and receive an array of XMLNode objects for further processing. Best of all, as with all the other XFactorstudio XPath API methods, you can use advanced XPath queries when making your selection; you’re not limited to the basic features the Adobe XPathAPI class has to offer.
So, if we would want to select all the book elements from the XML data as XMLNode objects, we could use a query like this:
```actionscript
XPath.selectNodes(books, "/books/book");
```
The above example returns an array with all the book elements found under the books node. With more advanced queries you can reach almost any data stored in the XML with a single expression. For example, to select the id attributes of all the book nodes, you could use a query like this:
```actionscript
XPath.selectNodes(books, "/books/book/@id");
```
As you can see, the selectNodes method is a very powerful method to use. It gives you direct access to a complete dataset stored in the XML.
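The predicates the Adobe XPathAPI could not handle can go straight into a selectNodes query. A sketch, assuming XFactorstudio evaluates positional and comparison predicates as XPath 1.0 defines them and returns XMLNode objects as described above; the inline data is a trimmed copy of the books example.

```actionscript
import com.xfactorstudio.xml.xpath.XPath;

var books:XML = new XML();
books.ignoreWhite = true;
books.parseXML("<books>"
    + "<book id='0'><title>The Lord of the Rings</title><price>34.95</price></book>"
    + "<book id='1'><title>The Da Vinci Code</title><price>17.95</price></book>"
    + "</books>");

// Positional predicate: XPath positions start at 1, so [2] is the second <book>.
var secondBook:Array = XPath.selectNodes(books, "/books/book[2]");

// Comparison predicate: every <book> whose <price> is greater than 20.
var expensive:Array = XPath.selectNodes(books, "/books/book[price > 20]");

if (secondBook.length > 0) {
    trace("second book id: " + secondBook[0].attributes.id);
}
for (var i:Number = 0; i < expensive.length; i++) {
    trace("over 20: book id " + expensive[i].attributes.id);
}
```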
selectNodesAsString, selectNodesAsNumber, selectNodesAsBoolean
Syntax:
XPath.selectNodesAsString(node : XMLNode, query : String) : Array
XPath.selectNodesAsNumber(node : XMLNode, query : String) : Array
XPath.selectNodesAsBoolean(node : XMLNode, query : String) : Array
Description:
Whereas selectNodes returns an array of XMLNode objects, the three very similar methods discussed in this section, selectNodesAsString, selectNodesAsNumber and selectNodesAsBoolean, each return an array with the actual node values. The XMLNode objects returned by selectNodes still need further processing before their data can be used; with any of these three methods you get direct access to the XML data.
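A sketch of what direct access buys you: titles come back as strings and prices as numbers that can be summed without any parsing. It assumes the signatures listed above; the inline data is a trimmed copy of the books example.

```actionscript
import com.xfactorstudio.xml.xpath.XPath;

var books:XML = new XML();
books.ignoreWhite = true;
books.parseXML("<books>"
    + "<book id='0'><title>The Lord of the Rings</title><price>34.95</price></book>"
    + "<book id='1'><title>The Da Vinci Code</title><price>17.95</price></book>"
    + "</books>");

// Values come back ready to use: Strings for titles, Numbers for prices.
var titles:Array = XPath.selectNodesAsString(books, "/books/book/title");
var prices:Array = XPath.selectNodesAsNumber(books, "/books/book/price");

var total:Number = 0;
for (var i:Number = 0; i < prices.length; i++) {
    total += prices[i]; // already a Number, so no parseFloat is needed
}
trace(titles.join(", "));
trace("Total: " + total);
```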
selectSingleNode
Syntax:
XPath.selectSingleNode(node : XMLNode, path : String) : XMLNode
Description:
With the selectSingleNode method you can easily retrieve data stored in XML nodes. Its purpose is to target one specific node so you can get at the data stored in it, rather than to make a selection and get the result back as a set of records in an array.
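Compared with the two-step workaround needed for the Adobe XPathAPI, one query is enough here. A sketch, assuming selectSingleNode is called statically on the XPath class like selectNodes and returns null when nothing matches:

```actionscript
import com.xfactorstudio.xml.xpath.XPath;

var books:XML = new XML();
books.ignoreWhite = true;
books.parseXML("<books>"
    + "<book id='0'><author>J.R.R. Tolkien</author><title>The Lord of the Rings</title></book>"
    + "<book id='1'><author>Dan Brown</author><title>The Da Vinci Code</title></book>"
    + "</books>");

// The positional predicate stays inside the path: no array indexing needed.
var titleNode:XMLNode = XPath.selectSingleNode(books, "/books/book[2]/title");
if (titleNode != null) {
    // The node's text lives in its first child, a text node.
    trace(titleNode.firstChild.nodeValue);
}

// Attribute predicates, which the Adobe XPathAPI's selectSingleNode did not accept, work the same way.
var authorNode:XMLNode = XPath.selectSingleNode(books, "/books/book[@id='1']/author");
if (authorNode != null) {
    trace(authorNode.firstChild.nodeValue);
}
```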
Conclusion
Having discussed both APIs, we can conclude that both have their pros and cons. A definite downside of the Adobe XPathAPI is its incomplete implementation of the XPath specification. The XFactorstudio XPath API really shines when it comes to the correctness and robustness of the XPath expressions you can use. On the other hand, when all you need is a simple XPath parsing package, or when you create components to be used within the Adobe Flash environment, the Adobe XPathAPI is definitely the best choice because you eliminate dependencies on external sources. The only real downside of the XFactorstudio XPath API is its sheer size: it adds a whopping 14 KB to the compressed SWF, compared with the 4 KB the Adobe XPathAPI requires.
All in all, it goes without saying that using XPath when working with XML data in Flash, or even outside Flash, is a definite advantage; it’s up to you, however, to decide which of these two APIs is the best choice for your application.