See other examples too: Creating canonical URNs in Philologist.
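The XML editor oXygen (which we had to have available for the Leipzig Hackathon in October 2012) offers several very efficient tools and utilities for exploring any TEI XML file, for example the file of Athenaeus' Deipnosophists.
Here you can see how to use oXygen to search a TEI XML file with XPath and to extract the results with an XSL stylesheet. Start by entering this expression into the XPath toolbar: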
//name
This XPath expression means: find every element, anywhere in the XML file, marked with the <name> tag.
After the search (reported as “XPath - in progress” at the bottom of the window), a list of results is shown in a lower pane. From there you can jump from one result in the XML to another, save the results with a right mouse click, etc.
Try moving between different results. Also examine the “XPath location” column in the results list.
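The same search can be tried outside oXygen. What follows is only a minimal sketch in Python, assuming the standard-library module xml.etree.ElementTree and a small invented, un-namespaced fragment (SAMPLE) in place of the real Athenaeus file. A real TEI file usually declares the TEI namespace, which would need a prefix mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment standing in for the Athenaeus file (not real data).
SAMPLE = """<text>
  <p>In <name type="month">Hekatombaion</name> <name type="person">Plato</name>
  met <name>Ulpian</name> and <name type="person">Kynulkos</name>.</p>
</text>"""

root = ET.fromstring(SAMPLE)

# ".//name" is ElementTree's spelling of //name:
# every name element at any depth below the root.
names = root.findall(".//name")

for el in names:
    # Show the tag, its attributes, and the text it wraps.
    print(el.tag, el.attrib, el.text)
```

The list returned by findall plays the role of the results pane: one entry per match, in document order.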
Then we can additionally filter “anything tagged with name”, selecting just the tags that have a type attribute (as in <name type="person">). Enter this into the XPath toolbar:
//name[@type]
To filter further, we can select all name tags whose type attribute has the value month. Take care to close all the square brackets, as shown:
//name[@type[. eq 'month']]
So, how many names of the “month” type are there in Athenaeus' XML?
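To see what the two filters do, here is a hedged sketch in Python, again assuming the standard-library xml.etree.ElementTree and an invented, un-namespaced SAMPLE fragment. ElementTree does not understand the XPath 2.0 operator eq, so the sketch uses the XPath 1.0 form [@type='month'], which selects the same elements as @type[. eq 'month'].

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment standing in for the Athenaeus file (not real data).
SAMPLE = """<text>
  <p>In <name type="month">Hekatombaion</name> <name type="person">Plato</name>
  met <name>Ulpian</name> and <name type="person">Kynulkos</name>.</p>
</text>"""

root = ET.fromstring(SAMPLE)

# //name[@type]: only name elements that carry a type attribute.
with_type = root.findall(".//name[@type]")

# //name[@type[. eq 'month']], written in the XPath 1.0 form ElementTree supports.
months = root.findall(".//name[@type='month']")

print(len(with_type), "names have a type attribute")
print(len(months), "of them are months")
```

Counting the matches with len answers the “how many” question for the sample; in oXygen the same number can be read off the results list.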
If we want to find names of another type, we just change what we write after eq, e.g. @type[. eq 'person'] etc.
While you're doing this, you're both mastering XPath and examining the markup in Athenaeus' XML.
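Instead of trying one type value at a time, all values can be tallied in one pass. This is only an illustrative sketch in Python, assuming the standard library and an invented, un-namespaced SAMPLE fragment; the label "(no type)" is a placeholder introduced here for names without a type attribute.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment standing in for the Athenaeus file (not real data).
SAMPLE = """<text>
  <p>In <name type="month">Hekatombaion</name> <name type="person">Plato</name>
  met <name>Ulpian</name> and <name type="person">Kynulkos</name>.</p>
</text>"""

root = ET.fromstring(SAMPLE)

# Count every value of the type attribute on name elements;
# names without the attribute are grouped under a placeholder label.
type_counts = Counter(el.get("type", "(no type)") for el in root.iter("name"))

for value, count in type_counts.most_common():
    print(value, count)
```

A tally like this gives a quick overview of which type values the markup uses before any of them is searched for individually.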
Once we have found a set of tags (and data) that interests us, we can transform the XML file, e.g. by discarding everything but the interesting set. Let's say we want to find names which are incorrectly tagged in Athenaeus. An examination will show that there are several types of “incorrectness”; one of them is that the names are marked with the tag rs (TEI shorthand for “referring string”, marking any kind of reference to something else) carrying the attribute type="nomorph".
As an exercise, use the XPath toolbar to find all rs tags.
To “export” the tagset we have found (i.e. to discard everything else from the XML file), we have to write a set of instructions (a program) known as an XSL stylesheet. XPath is of great importance for such stylesheets, as it tells the program which tags to include and which to discard.
oXygen has everything we need to write XSL and to process XML files with it.
The XSL we'll be using is here (it includes comments on key instructions):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns:TEI="http://www.tei-c.org/ns/1.0">
  <!-- XSL stylesheet to select only some XML elements, discarding everything else -->
  <!-- produce XML as output: -->
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <!-- select only the rs elements of type nomorph: -->
  <xsl:template match="TEI:rs[@type[. eq 'nomorph']]">
    <!-- copy the text inside the elements: -->
    <xsl:copy-of select="text()"/>
    <!-- add several newlines (&#10; is a line feed) for easier reading etc: -->
    <xsl:text>&#10;</xsl:text><xsl:text>&#10;</xsl:text>
  </xsl:template>
  <!-- remove text content of all other XML nodes: -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
First, create a new XSL file in oXygen and save the stylesheet above in it.
Now we have to connect the XML file (our Athenaeus) with the XSL instructions. See the oXygen help on transformation scenarios for more details.
If everything went well, you should get a long list (how many lines are there?) of Greek words, each starting with a capital letter, in the order in which they appear in the Deipnosophists.
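To check the logic of the stylesheet without oXygen, the selection can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the transformation itself: it uses the standard-library xml.etree.ElementTree, an invented SAMPLE document in the TEI namespace, and a hypothetical helper direct_text that imitates select="text()" by keeping only the text nodes that are direct children of each rs element.

```python
import xml.etree.ElementTree as ET

# Same namespace URI as the TEI prefix declared in the stylesheet.
TEI_NS = {"TEI": "http://www.tei-c.org/ns/1.0"}

# Hypothetical TEI fragment (not real data from the Deipnosophists).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p>alpha <rs type="nomorph">Beta</rs> gamma <rs type="other">Delta</rs>
<rs type="nomorph">Eps<hi>X</hi>ilon</rs></p>
</body></text></TEI>"""


def direct_text(el):
    """Join the text nodes that are direct children of el, like select="text()"."""
    parts = [el.text or ""]
    # Text following each child element is also a direct text node of el.
    parts.extend(child.tail or "" for child in el)
    return "".join(parts)


root = ET.fromstring(SAMPLE)

# Equivalent of match="TEI:rs[@type[. eq 'nomorph']]" in XPath 1.0 form.
hits = root.findall(".//TEI:rs[@type='nomorph']", TEI_NS)

lines = [direct_text(el) for el in hits]
print("\n".join(lines))
print(len(lines), "lines")
```

Everything outside the matched rs elements is simply never collected, which corresponds to the empty text() template in the stylesheet; len(lines) gives the line count asked about above.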