Skip to main content

Article including examples on How to Query XML files using VBScript.

---
title: Query XML Files Using VBScript
subtitle: The Microsoft Scripting Guys
author: Greg Stemp and Jean Ross
date: October 1, 2008
source: https://technet.microsoft.com/en-us/library/2008.10.heyscriptingguy.aspx
---

The famous Greek philosopher Socrates (who died in the year 399 BC, shortly
before the Scripting Editor was born) is perhaps best known for saying, "An
unexamined life is not worth living." Most people are familiar with this saying.
Remarkably enough, however, the Scripting Guys recently discovered that Socrates
never said that "an unexamined life is not worth living." As it turns out, that
was a mistranslation made by a medieval scribe hundreds of years ago. Instead,
what Socrates actually said was this: "An unexamined XML file is not worth
having."

Now there's a philosophical saying that actually makes sense! XML is becoming
more and more popular these days; a quick search of one of the Scripting Guys'
test computers uncovered more than 500 XML files used by various systems or
applications.

Needless to say, a lot of very important data is now stored in XML format. And
that's fine. Unless, of course, that data goes unexamined. And, sadly, that's
often the case, if for no other reason than this: many people don't have any
idea how to even go about examining an XML file. In particular, they have no
idea how to query and search that XML file.

Note If you're not up on your Greek philosophers, Socrates was a big proponent
of going around talking about things but not all that big on going around doing
things; in fact, his wife, Xanthippe, referred to him as a "worthless idler."
You can see why the Scripting Guys have a soft spot in their hearts for
Socrates.

Of course, some of you who are reading this might be thinking, "Wait a second:
didn't the Scripting Guys already discuss searching an XML file in a previous
column ("Chasing Cars... and XML" at [technet.microsoft.com/magazine/cc162506])? In
that case, why are they revisiting that topic now; are the Scripting Guys so
lazy that they're willing to write the same column over and over again?"

Believe it or not, we're even lazier than that. But that's not the primary
reason we're revisiting this topic. In our previous column, we talked about
working with an XML file that was structured like the code in **Figure 1**, with
each and every property value representing a single node in the file.

## Figure 1 XML file with nodes only

```xml
<?xml version='1.0'?>
  <INVENTORY>
    <COMPUTER>
      <os>Windows XP</os>
      <department>Human Resources
      </department>
      <name>atl-ws-001</name>
    </COMPUTER>
    <COMPUTER>
      <os>Windows XP</os>
      <department>Finance</department>
      <name>atl-ws-002</name>
    </COMPUTER>
    <COMPUTER>
      <os>Windows Vista</os>
      <department>Finance</department>
      <name>atl-ws-003</name>
    </COMPUTER>
  </INVENTORY>
```

That's nice, except---as people were quick to write in and point out---that's not
the only way that XML files can be structured. In the preceding example, each
computer in the file has its own node; in turn, each of those nodes has a number
of child nodes (os, department, and name). However, it's also possible to
construct an XML file in which the individual nodes don't have any child nodes;
instead, additional property values are configured as attributes. You know, like
the file shown in **Figure 2**.

## Figure 2 XML file with attributes

```xml
<?xml version='1.0'?>
  <HARDWARE>
      <COMPUTER os="Windows XP" department="Human Resources">atl-ws-001</COMPUTER>
      <COMPUTER os="Windows XP" department="Finance">atl-ws-002</COMPUTER>
      <COMPUTER os="Windows Server 2003" department="IT">atl-fs-003</COMPUTER>
      <COMPUTER os="Windows Vista" department="IT">atl-ws-004</COMPUTER>
      <COMPUTER os="Windows Vista" department="Human Resources">atl-ws-005</COMPUTER>
      <COMPUTER os="Windows Vista" department="Finance">atl-ws-006</COMPUTER>
      <COMPUTER os="Windows XP" department="Sales">atl-ws-007</COMPUTER>
      <COMPUTER os="Windows Server 2008" department="IT">atl-fs-008</COMPUTER>
      <COMPUTER os="Windows XP" department="Human Resources">atl-ws-009</COMPUTER>
      <COMPUTER os="Windows Vista" department="Sales">atl-ws-010</COMPUTER>
  </HARDWARE>
```

Is that a problem? As a matter of fact, it is. The script we showed you a long
time ago, in an issue far, far away, won't work with an XML file structured like
the one shown in **Figure 2**. It's just not going to happen. And that is
definitely a problem because many of you need to be able to read this type of
XML file.

All you had to do was ask. Well, that and also wait a year and a half before we
finally got around to addressing this issue.

Before we go much further, let's take a closer look at our XML file. In this
case, we have a file with a primary node named HARDWARE; each individual
computer exists as a child node of HARDWARE. In addition to this, each of these
nodes has a pair of attributes: os (used to store the name of the operating
system installed on the computer) and department (used to store the name of the
department that owns the computer).

Suppose, as an example, that we would like to get a list of all the computers
that are running Windows XP. Can we do that using a script? Of course we can:

```vbscript
Set xmlDoc = CreateObject("Microsoft.XMLDOM")

xmlDoc.Async = "False"
xmlDoc.Load("HARDWARE.xml")

Set colNodes=xmlDoc.selectNodes ("//HARDWARE/COMPUTER[@os='Windows XP']")

For Each objNode in colNodes
    Wscript.Echo objNode.Text
Next
```

Let's see what we have here. To begin with, we create an instance of the
Microsoft.XMLDOM object; as the name implies, this is the object that enables us
to work with XML files. Once we've created the object, we next set the Async
property to False; this allows the script know that we want to load the document
synchronously rather than asynchronously. Why does the script care about that?
To be honest, it probably doesn't; after all, scripts (like Scripting Guys) are
inanimate objects.

However, you should care. If we loaded the document asynchronously, the script
would continue to run even if the document had not been fully loaded. That's not
good: imagine the problems that might occur should you try to perform a task on
a document that doesn't exist yet. Loading an XML file synchronously guarantees
that the file will be fully loaded before our script proceeds.

By amazing coincidence, we're now at the point where we load our XML file.
That's something we do by calling the Load method and opening the file
`C:\Scripts\HARDWARE.xml`, which brings us to this line of code:

```vbscript
Set colNodes=xmlDoc.selectNodes ("//HARDWARE/COMPUTER[@os='Windows XP']")
```

What we're doing here is we're using the SelectNodes method to query the XML
file and to tell the script which XML nodes we want to retrieve. Notice the
syntax, which, admittedly, is a little unusual.

To begin with, we need to specify the path within the XML file. Our XML file has
a top-level node named HARDWARE, followed by a series of second-level nodes
named COMPUTER. Each of these second-level nodes represents a single record in
our XML data file. That's why our selectNodes query starts out like this:

```vbscript
//HARDWARE/COMPUTER
```

At that point we encounter this construction:

```vbscript
[@os='Windows XP']
```

This portion of the query is similar to a WHERE clause in a standard SQL query.
With SQL we might use a database query similar to this to retrieve a list of all
the computers running the Windows XP operating system:

```sql
SELECT Name FROM Hardware
WHERE OS = 'Windows XP'
```

With the SelectNodes method we're doing something similar: we're asking to get
back a list of all the computers where the os attribute (@os) is equal to
Windows XP. And to indicate that this is a WHERE clause, we enclose the entire
clause in square brackets.

Note This type of query is known as an XPath query. To find out more about
XPath, look at [msdn.microsoft.com/library/ms256115.aspx].

Like we said, it's a little weird, but it works. What if we wanted to get a list
of all the computers that belong to the Finance department? That's no big deal;
our call to selectNodes would look like this:

```vbscript
Set colNodes=xmlDoc.selectNodes ("//HARDWARE/COMPUTER" & "[@department='Finance']")
```

Again, our WHERE clause is enclosed in square brackets, we preface the attribute
name (department) with the at sign (@), and then we indicate the value we're
interested in.

Simple, huh?

After we have called the selectNodes method, we can then echo back the computer
names simply by looping through the collection of returned values and echoing
back the Text property, as shown in the following:

```vbscript
For Each objNode in colNodes
    Wscript.Echo objNode.Text
Next
```

And what sort of information can we expect to get back? To tell you the truth,
we fully expect to get back information like this:

```
atl-ws-001
atl-ws-002
atl-ws-007
atl-ws-009
```

In other words, it returns the names of all the computers that are still running
Windows XP.

Incidentally, you're not limited to reporting back just the computer name;
because the computer name is the "default" value for each node, that's simply
what we get back if we reference the Text property.

Alternatively, we could specify exactly which attribute values we want returned.
For example, take a look at this modified For Each loop:

```vbscript
For Each objNode in colNodes
    Wscript.Echo objNode.Text
    Wscript.Echo objNode.Attributes. _
        getNamedItem("department").Text
    Wscript.Echo
Next
```

As you can see, in this loop we're still echoing back the value of the Text
property; however, we've also tacked on this line of code:

```vbscript
Wscript.Echo objNode.Attributes. _
    getNamedItem("department").Text
```

In this case, we're using the get­NamedItem method to retrieve the value of the
department attribute; we then echo back the Text property for that attribute.
That's how we can specify which attribute values we do (and do not) echo back to
the screen. (Note too that we added on a `Wscript.Echo` command to put a blank
line between records.) When we run this script, we should get back the
following:

```
atl-ws-001
Human Resources

atl-ws-002
Finance

atl-ws-007
Sales

atl-ws-009
Human Resources
```

And yes, you can write more complex queries if you choose. For example, suppose
you'd like a list of all the computers running either Windows XP or Windows
Vista. In SQL, you'd do that by writing an OR query. Guess what? You do the
exact same thing when querying an XML file:

```vbscript
Set colNodes=xmlDoc.selectNodes _
    ("//HARDWARE/COMPUTER" & _
     "[@os='Windows XP' or " & _
     "@os='Windows Vista']")
```

See what we did here? Inside a single set of square brackets we provided two
criteria: `@os='Windows XP' or @os='Windows Vista'`. That's all you have to do,
and it's going to give us back a report that looks like **Figure 3**.

## Figure 3 Result of OR query

```
atl-ws-001
Human Resources

atl-ws-002
Finance

atl-ws-004
IT

atl-ws-005
Human Resources

atl-ws-006
Finance

atl-ws-007
Sales

atl-ws-009
Human Resources

atl-ws-010
Sales
```

If you double-check the XML file, you'll see that the computers running Windows
XP or Windows Vista are included in the output; computers running Windows Server
2003 or Windows Server 2008 are not included in the output. Nor should they be.

You have a question? You'd like to write a more restricted query, one that
returns only those computers that are running Windows XP and belong to the
Finance department? As you might have guessed, that's easy; all you have to do
is write an AND query like this:

```vbscript
Set colNodes=xmlDoc.selectNodes _
    ("//HARDWARE/COMPUTER" & _
     "[@os='Windows XP' and " & _
     "@department='Finance']")
```

Doing that will then return the following dataset:

```
atl-ws-002
Finance
```

Why just one item in our report? You got it: because the Finance department has
only a single computer that's running Windows XP.

We hope we've helped all of you to deal with this particular type of XML file.
If it turns out that there's yet another type of XML file lurking out there,
well, good luck with that.

Because we started this month's column off with a famous saying, we thought it
would be fitting to end the column on a similar note. With that, we'd like to
close with the words of Kaiser Wilhelm II, words we Scripting Guys hold dear: "I
herewith renounce for all time claims to the throne of Prussia and to the German
Imperial throne connected therewith."

[technet.microsoft.com/magazine/cc162506]: https://technet.microsoft.com/magazine/cc162506
[msdn.microsoft.com/library/ms256115.aspx]: https://msdn.microsoft.com/library/ms256115.aspx