Introducing the PBCore Validator

Since writing my blog post on “Common” PBCore Errors, I’ve been toying with the idea of writing a PBCore validator: a tool that automatically checks whether an XML file complies with the PBCore standard and its best practices, with the aim of improving interoperability among PBCore implementations. That tool now exists, and you can try it at pbcorevalidator.org.

Like the W3C’s HTML Validator, it is a tool for checking that your documents conform to an established standard. Parts of the PBCore standard, however, are more subjective or less well defined than HTML, so some of the validator’s results are subjective too.

Using the Validator

To use the validator, visit pbcorevalidator.org. There, you can either upload a file containing a PBCore record or you can paste the contents of a PBCore record into the text area. Click “Validate PBCore”, and you’ll be shown your results.

Types of Results

The PBCore validator performs three types of checks on your document:

Known Bugs

There are some known bugs in the validator.

The first is that, as mentioned above, the messages from the XSD compliance-checking step are very difficult to interpret if you’re not familiar with XSD. For example, a missing format identifier tag will be reported instead as an unexpected appearance of whatever tag comes next.

Another issue, which may or may not count as a bug, is that the XSD file, and therefore the compliance check, is very strict about the order in which tags appear in the PBCore file. For example, if a format identifier tag appears at the end of your instantiation tag, the validator will not recognize it, because the XSD says it belongs at the beginning. I’m not aware, however, of any PBCore readers that actually require this ordering.
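Both behaviours described above (a missing tag reported as an unexpected next tag, and a tag in the wrong position going unrecognized) follow from the way a strictly ordered sequence is checked. The sketch below is a deliberately simplified model in Ruby, not the validator's code and not a real XSD engine. The element names, the SEQUENCE table, the first_sequence_error function, and the wording of the messages are all assumptions made for illustration.

```ruby
# Simplified model of checking children against a strictly ordered sequence.
# NOT the validator's code or a real XSD engine; names are illustrative.
SEQUENCE = [
  { name: "formatIdentifier", required: true },
  { name: "formatLocation",   required: true },
  { name: "formatMediaType",  required: false }
].freeze

# Walks the children left to right against the expected sequence and returns
# the first error message, or nil when the children match.
def first_sequence_error(children, sequence = SEQUENCE)
  position = 0
  children.each do |child|
    # Skip optional slots that this child does not fill.
    while position < sequence.size &&
          sequence[position][:name] != child &&
          !sequence[position][:required]
      position += 1
    end
    expected = sequence[position]
    if expected.nil? || expected[:name] != child
      wanted = expected ? expected[:name] : "end of parent"
      return "Element '#{child}': not expected here. Expected: #{wanted}"
    end
    position += 1
  end
  # All children matched; check for required slots that were never filled.
  missing = sequence.drop(position).find { |slot| slot[:required] }
  missing ? "Missing required element '#{missing[:name]}'" : nil
end

# Identifier missing: the complaint is about the tag that comes next.
puts first_sequence_error(["formatLocation"])
# Identifier present but last: the checker stops before it ever reaches it.
puts first_sequence_error(["formatLocation", "formatIdentifier"])
```

In this model the checker only ever compares the current tag with the slot it expects next, so it cannot tell "missing" from "present but later". That is one plausible reason the two cases produce the same message.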

A final issue, perhaps more of a UI deficiency than a true bug, is that the output does not separate the different classes of error. A potential best-practices violation (which may not violate PBCore at all) is displayed with the same severity as an unambiguous violation of the PBCore standard.
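One possible way to separate the classes of error is to attach a severity to each message and group the output by it. The Ruby sketch below is only an illustration of that idea. The Message struct, the :error and :warning severities, and the group_by_severity function are hypothetical and do not describe the validator's actual data model.

```ruby
# Hypothetical message structure; the names and severities are assumptions
# for illustration, not the validator's actual data model.
Message = Struct.new(:severity, :text)

# Definite violations of the standard first, best-practice suggestions second.
SEVERITY_ORDER = [:error, :warning].freeze

# Groups message texts by severity, always returning every severity key in
# SEVERITY_ORDER (with an empty list when nothing was reported for it).
def group_by_severity(messages)
  grouped = messages.group_by(&:severity)
  SEVERITY_ORDER.each_with_object({}) do |severity, result|
    result[severity] = grouped.fetch(severity, []).map(&:text)
  end
end

messages = [
  Message.new(:warning, "Possible best-practice issue (example text)"),
  Message.new(:error,   "Schema violation (example text)")
]

group_by_severity(messages).each do |severity, texts|
  puts "#{severity}: #{texts.join('; ')}"
end
```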

Potential Controversies

There are a few things I’ve done with the validator which may be controversial. I’d appreciate the feedback of the PBCore community on how to proceed.

First, I’ve had to modify the XSD provided by PBCore in two ways. The first change, which I imagine will not be so controversial, was to change every xml:lang="eng" attribute to xml:lang="en", because the language-tag rules that xml:lang follows call for the two-letter code where one exists, and my XML schema parser refused to proceed otherwise. The second change was to make the schema require certain tags that the text of the PBCore specification says are required but that the schema had marked as optional (minOccurs="0").
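To make the two changes concrete, here is a rough text-level sketch in Ruby of what such a patch could look like. It is not the script actually used. The patch_schema function, the xs: namespace prefix, and the element names in the sample are assumptions. Removing minOccurs="0" works because an element declaration without minOccurs defaults to minOccurs="1" in XML Schema.

```ruby
# Text-level sketch of the two schema changes. The function name, the xs:
# prefix, and the sample element names are hypothetical.
def patch_schema(xsd_text, required_names)
  # Change 1: replace the three-letter language code with the two-letter one.
  patched = xsd_text.gsub('xml:lang="eng"', 'xml:lang="en"')

  # Change 2: drop minOccurs="0" from the named element declarations so the
  # XSD default of minOccurs="1" applies and the element becomes required.
  required_names.each do |name|
    start_tag = /<xs:element\b[^>]*\bname="#{Regexp.escape(name)}"[^>]*>/
    patched = patched.gsub(start_tag) do |tag|
      tag.sub(/\s+minOccurs="0"/, "")
    end
  end
  patched
end

sample = <<~XSD
  <xs:documentation xml:lang="eng">Format ID</xs:documentation>
  <xs:element name="formatIdentifier" type="xs:string" minOccurs="0"/>
  <xs:element minOccurs="0" name="formatLocation" type="xs:string"/>
XSD

# Only formatIdentifier is made required; formatLocation stays optional.
puts patch_schema(sample, ["formatIdentifier"])
```

A regex pass like this is fragile on arbitrary XML; for anything beyond a one-off edit, an XML parser would be the safer choice.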

Second, almost everything in the “best practices” section of the validator is open to discussion, both as to whether each check really reflects a best practice and as to which best practices are not yet checked.

How to Help

The validator is written in Ruby and released as Free Software (sometimes called “Open Source Software”) under the terms of the GNU General Public License, version 3 or later. You can browse or download the source code, and if you know Ruby, feel free to contribute. Even if you don’t know Ruby but can help make the validator less ugly, your help would be much appreciated!

Another way that anyone can help is to run files through the validator, note the output, and make suggestions for improvements or changes in the comments below.

Thanks, and enjoy the validator!

