Introducing the PBCore Validator

Since writing my blog post on “Common” PBCore Errors, I’ve been toying with the idea of writing a PBCore validator: a tool that automatically checks whether an XML file complies with the PBCore standard and best practices, with the aim of improving interoperability among PBCore implementations. That tool now exists, and you can check it out at pbcorevalidator.org.

Like the W3C’s HTML Validator, this is designed to be a tool that you can use to check the conformance of your documents with established standards. Parts of the PBCore standard, however, are more subjective or less well defined than HTML, so the results of the validator are also somewhat subjective.

Using the Validator

To use the validator, visit pbcorevalidator.org. There, you can either upload a file containing a PBCore record or you can paste the contents of a PBCore record into the text area. Click “Validate PBCore”, and you’ll be shown your results.

Types of Results

The PBCore validator performs three types of checks on your document:

XML Well-Formedness. The validator ensures that your document is well-formed XML: every tag is closed, there are no improper entities (e.g., stray unescaped ampersands), and so on. If your document fails this step, the validator cannot continue, so you will not get any PBCore-specific error messages.
XSD Compliance. The validator next checks your document against the PBCore XML Schema. This is probably the most crucial part of the validity checking: it ensures that all the tags you provide are part of PBCore, that you are not missing any required tags, that every tag is inside the parent tags that the PBCore standard requires, and so on. Note that the errors from the schema checker can be somewhat confusing; if you have trouble interpreting the output, feel free to post in the comments below, and I’ll do my best to translate into English for you.
Best Practices Checks. The validator then checks your document against a number of “PBCore Best Practices.” These include the requirement that some types of values come from the PBCore picklists, a suggestion against putting more than one value in a single tag, and a check for the proper formatting of personal names. This is probably the most nebulous category of ‘error’—you should review the output carefully and ignore it if, in fact, you know what you’re doing.
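The staged flow described above can be sketched in a few lines of Ruby. This is not the validator's actual source, only a minimal illustration under some loud assumptions: it uses REXML for the well-formedness stage, skips the XSD stage entirely (that stage needs the schema file and an XSD-aware library), and implements just one best-practice rule, treating a semicolon as a sign that a tag holds more than one value. The method names are hypothetical.

```ruby
require "rexml/document"

# Stage 1: well-formedness. Returns a short error message, or nil if the
# document parses.
def well_formedness_error(xml)
  REXML::Document.new(xml)
  nil
rescue REXML::ParseException => e
  e.message.lines.first.strip
end

# Stage 3 (one illustrative rule only): flag elements whose text seems to
# hold several values. Treating ";" as the separator is an assumption.
def multiple_value_warnings(xml)
  doc = REXML::Document.new(xml)
  warnings = []
  doc.each_recursive do |element|
    text = element.text.to_s
    next unless text.include?(";")
    warnings << "<#{element.name}> may contain more than one value: #{text.strip.inspect}"
  end
  warnings
end

# Run the stages in order, stopping early if the document does not parse.
# Stage 2 (XSD compliance) is deliberately omitted from this sketch.
def check_document(xml)
  error = well_formedness_error(xml)
  return { errors: [error], warnings: [] } if error

  { errors: [], warnings: multiple_value_warnings(xml) }
end
```

Calling check_document on a string with a mismatched tag returns a single error and no warnings, which mirrors the behavior described above: a document that is not well-formed never reaches the PBCore-specific checks.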

Known Bugs

There are some known bugs in the validator.

The first is that, as mentioned above, the messages from the XSD compliance-checking step are very difficult to interpret if you’re not familiar with XSD. For example, a missing format identifier tag will be reported instead as an unexpected appearance of whatever tag comes next.

Another issue, which may or may not be considered a bug, is that the XSD file, and thus the compliance check, is very strict about the order in which tags appear in a PBCore file. So, for example, if you put a format identifier tag at the end of your instantiation tag, the validator will not accept it there, because the XSD says that it is supposed to be at the beginning. I’m unaware, however, of any implementations of PBCore readers which actually require this ordering.
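A toy model may help explain both of these behaviors. The sketch below is not how any real schema checker is implemented; it simply walks a list of child tag names against a fixed, ordered content model (in the spirit of an XSD sequence) and reports the first child that does not fit. The slot names "first", "second", and "third" are placeholders rather than PBCore tags, and each name is assumed to occur at most once.

```ruby
# Toy ordered content model. Each slot is a hypothetical child name plus
# whether it is required.
SEQUENCE = [
  { name: "first",  required: true  },
  { name: "second", required: false },
  { name: "third",  required: true  },
].freeze

# Walk the children in document order, advancing through SEQUENCE.
# Returns nil when the children fit, otherwise a message about the first
# problem found.
def first_sequence_error(children)
  position = 0
  children.each do |child|
    # Skip over optional slots that this child does not fill.
    while position < SEQUENCE.length &&
          SEQUENCE[position][:name] != child &&
          !SEQUENCE[position][:required]
      position += 1
    end
    expected = SEQUENCE[position]
    if expected.nil? || expected[:name] != child
      wanted = expected ? "expected '#{expected[:name]}'" : "expected no more elements"
      return "unexpected element '#{child}' (#{wanted})"
    end
    position += 1
  end
  # All children fit; make sure no required slot was left unfilled.
  missing = SEQUENCE.drop(position).find { |slot| slot[:required] }
  missing ? "missing required element '#{missing[:name]}'" : nil
end
```

With this model, first_sequence_error(["second", "third"]) complains that 'second' is unexpected instead of saying that 'first' is missing, and first_sequence_error(["first", "third", "second"]) rejects 'second' purely because of where it appears.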

A final issue—which may perhaps be considered more of a UI deficiency than a true bug—is that there is no separation in the output among the different classes of error. So a potential best practices violation (which may not be a violation of PBCore at all) is displayed with the same level of severity as an actual unambiguous violation of the PBCore standard.
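One possible way to address this, sketched here as an assumption rather than a description of the validator's code, is to tag each finding with the stage that produced it and map stages to a display severity. The Finding struct, the stage labels, and the two severity levels are all hypothetical.

```ruby
# One finding from any stage. `kind` is :well_formedness, :schema, or
# :best_practice; these labels are assumptions made for this sketch.
Finding = Struct.new(:kind, :message)

# Map each kind of finding to a display severity.
SEVERITY = {
  well_formedness: :error,
  schema: :error,
  best_practice: :warning,
}.freeze

# Group messages by severity so that errors can be shown apart from
# warnings. An unknown kind raises KeyError instead of being hidden.
def group_by_severity(findings)
  grouped = { error: [], warning: [] }
  findings.each { |finding| grouped[SEVERITY.fetch(finding.kind)] << finding.message }
  grouped
end
```

The output page could then render the :error group first and the :warning group under a softer heading, so that a best-practice suggestion no longer looks like a violation of the standard.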

Potential Controversies

There are a few things I’ve done with the validator which may be controversial. I’d appreciate the feedback of the PBCore community on how to proceed.

First, I’ve had to modify the XSD provided by PBCore in two different ways. The first change, which I imagine will not be so controversial, was to change all xml:lang="eng" attributes into xml:lang="en", because language tags in XML conventionally use the two-letter code where one exists and my XML schema parser refused to proceed otherwise. The second change was to modify the schema to require certain tags which the text of the PBCore specification states are required but which the schema had marked as optional (minOccurs="0").
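For anyone who wants to reproduce these two changes against their own copy of the schema, here is one way they could be scripted. The original edits may well have been made by hand; this is only a sketch. The XSD fragment and the element name exampleRequiredElement are placeholders, not taken from the real PBCore schema, and the pattern only handles a minOccurs attribute that follows the name attribute. Removing minOccurs="0" makes the element required because minOccurs defaults to 1.

```ruby
# Hypothetical XSD fragment; the element name is a placeholder.
SCHEMA_FRAGMENT = <<~XSD
  <xs:element name="exampleRequiredElement" minOccurs="0">
    <xs:annotation>
      <xs:documentation xml:lang="eng">Example.</xs:documentation>
    </xs:annotation>
  </xs:element>
XSD

# Apply both changes to a schema held in a string and return the result.
def patch_schema(xsd, required_names)
  # Change 1: switch the three-letter language code to the two-letter one.
  patched = xsd.gsub('xml:lang="eng"', 'xml:lang="en"')

  # Change 2: drop minOccurs="0" from the named element declarations only.
  required_names.each do |name|
    pattern = /(<xs:element\b[^>]*\bname="#{Regexp.escape(name)}"[^>]*?)\s+minOccurs="0"/
    patched = patched.gsub(pattern, '\1')
  end
  patched
end

puts patch_schema(SCHEMA_FRAGMENT, ["exampleRequiredElement"])
```

Listing the required names explicitly keeps the second change reviewable: each entry can be checked against the text of the specification before the schema is tightened.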

Second, almost everything in the “best practices” section of the validator is open to discussion: both whether the existing checks really are best practices and which best practices are not yet checked.

How to Help

The validator is written in Ruby and released as Free Software (sometimes called “Open Source Software”) under the terms of the GNU General Public License, version 3 or later. You can browse or download the source code, and if you know Ruby, feel free to contribute. Even if you don’t know Ruby but can help make the validator less ugly, your help would be much appreciated!

Another way that anyone can help is to run files through the validator, note the output, and make suggestions for improvements or changes in the comments below.

Thanks, and enjoy the validator!

Posted on Monday the 2nd of February 2009 at 11:59 AM

mlcastle posted this