See other examples too: Creating canonical URNs in Philologist.

Athenaeus in oXygen: examples and exercises

The XML editor oXygen (which we needed for the Leipzig Hackathon in October 2012) offers several efficient tools for exploring any TEI XML file, for example the file of Athenaeus' Deipnosophists.

Here you can see how to use oXygen to:

  • find a set of tags in the Athenaeus file
  • manipulate this found set (with a simple XSL stylesheet)

Finding a set of tags in an XML file with oXygen

  1. Get a complete text of the Deipnosophists in TEI XML (available as a Google Drive document shared with Hackathon members).
  2. Open it in oXygen. Everything should be OK, i.e. the file should validate, with no red error markers on the right-hand scrollbar beside the text.
  3. Use the oXygen XPath toolbar, located in the upper left-hand corner above the text, where you can enter any XPath expression. XPath is a language for finding specific parts of XML files, treating them as XML structure rather than as plain text. Here is a screenshot of the XPath toolbar region:1) [screenshot: oXygen XPath toolbar]
  4. Enter the following into the XPath toolbar:
//name

This XPath expression means: find every element, anywhere in the XML file, tagged as name (i.e. every <name> element).

While the search runs, the bottom of the window reports “XPath - in progress”. When it finishes, the results are listed in a pane at the bottom of the window. From that pane you can jump from one result to another in the XML, save the results with a right-click, and so on.

Try moving between different results. Also examine the “XPath location” column in the results list.
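If you want to try the same query outside oXygen, Python's standard library module xml.etree.ElementTree understands a limited subset of XPath. The minimal sketch below assumes a tiny invented TEI fragment (SAMPLE) in place of the real Athenaeus file. The names SAMPLE, TEI and all_names are hypothetical, and the Greek names and their markup are illustrative only. Unlike the //name typed into oXygen above, ElementTree needs the TEI namespace mapped to a prefix explicitly.

```python
import xml.etree.ElementTree as ET

# TEI elements live in this namespace; ElementTree only matches them
# if the namespace is mapped to a prefix (here the hypothetical "tei").
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented miniature TEI document, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p><name type="person">Λαρήνσιος</name> and <name>Οὐλπιανός</name>
in <name type="month">Ποσιδεών</name>.</p>
<p><name type="person">Κύνουλκος</name></p>
</body></text></TEI>"""

def all_names(root):
    """Roughly the XPath //name: every name element at any depth."""
    return root.findall(".//tei:name", TEI)

root = ET.fromstring(SAMPLE)
names = all_names(root)
print(len(names), [n.text for n in names])
```

To run it on the real file, you would replace `ET.fromstring(SAMPLE)` with `ET.parse(path).getroot()`, where `path` points to your local copy of the Deipnosophists. This assumes the file parses with a plain XML parser.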

Next, we can narrow down “anything tagged with name” to just the name elements that have a type attribute (as in <name type="person">). Enter this into the XPath toolbar:

 //name[@type]

To filter further, we can select only the name elements whose type attribute has the value month. Take care to close all the square brackets, as shown:

 //name[@type[. eq 'month']]

So, how many names of the “month” type are there in Athenaeus' XML?
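The two filters can be sketched the same way with ElementTree. Its [@attr] and [@attr='value'] predicates roughly correspond to the [@type] and [@type[. eq 'month']] filters above. This is again a minimal sketch on an invented sample, so the counts it prints are not counts from the real file.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented miniature TEI document, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p><name type="person">Λαρήνσιος</name> <name>Οὐλπιανός</name>
<name type="month">Ποσιδεών</name> <name type="month">Γαμηλιών</name></p>
</body></text></TEI>"""

root = ET.fromstring(SAMPLE)

# Roughly //name[@type]: name elements that carry any type attribute.
typed = root.findall(".//tei:name[@type]", TEI)

# Roughly //name[@type[. eq 'month']]: only those whose type is "month".
months = root.findall(".//tei:name[@type='month']", TEI)

print(len(typed), len(months))
```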

To find names of another type, just change the value after eq, e.g. //name[@type[. eq 'person']].
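Rather than changing the value after eq one type at a time, a short script can tally every type value at once. This sketch uses collections.Counter over an invented sample. The helper count_name_types is hypothetical, and name elements without a type attribute are left out of the tally.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented miniature TEI document, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p><name type="person">Λαρήνσιος</name> <name type="person">Οὐλπιανός</name>
<name type="month">Ποσιδεών</name> <name type="place">Ναύκρατις</name>
<name>Κύνουλκος</name></p>
</body></text></TEI>"""

def count_name_types(root):
    """Count name elements per value of their type attribute (untyped ones are skipped)."""
    return Counter(el.get("type") for el in root.findall(".//tei:name[@type]", TEI))

counts = count_name_types(ET.fromstring(SAMPLE))
for value, n in counts.most_common():
    print(value, n)
```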

While you're doing this, you're both mastering XPath and examining the markup in Athenaeus' XML.2)

Extract a found set from an XML file

Once we have found a set of tags (and data) that interests us, we can transform the XML file, e.g. by discarding everything except that set. Let's say we want to find names that are incorrectly tagged in Athenaeus. Examining the file shows several types of “incorrectness”. One of them is that names are marked with the rs tag (TEI shorthand for “referring string”, marking any kind of reference to something else) carrying the attribute type="nomorph".

As an exercise, use the XPath toolbar to find all rs tags.

To “export” the set of tags we have found (i.e. to discard everything else from the XML file), we have to write a set of instructions, a program, known as an XSL stylesheet. XPath is central to such stylesheets: it tells the program which elements to include and which to discard.

oXygen has everything we need to write XSL stylesheets and to process XML files with them.

Here is the XSL stylesheet we will use (its comments explain the key instructions):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns:TEI="http://www.tei-c.org/ns/1.0">
    <!-- XSL stylesheet to select only some XML elements, discarding everything else -->

    <!-- output plain text: only the selected strings and newlines are written -->
    <xsl:output method="text"/>

    <!-- select only the rs elements whose type attribute is 'nomorph': -->
    <xsl:template match="TEI:rs[@type[. eq 'nomorph']]">
        <!-- copy the text directly inside the element: -->
        <xsl:copy-of select="text()"/>
        <!-- add two newlines after each result for easier reading: -->
        <xsl:text>&#10;&#10;</xsl:text>
    </xsl:template>

    <!-- suppress the text content of all other XML nodes; the built-in rules
         still walk through every element, so the rs template is reached: -->
    <xsl:template match="text()"/>

</xsl:stylesheet>
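For comparison, here is a rough Python approximation of what the stylesheet does, written with xml.etree.ElementTree. Python's standard library has no XSLT 2.0 processor, so a hand-written traversal stands in for it. The code imitates XSLT's built-in template rules by visiting every element. It emits only the direct text of each rs element whose type is nomorph, followed by two newlines, and drops all other text. The sample document and the function names (direct_text, collect, extract_nomorph) are invented for illustration. Treat it as a sketch for understanding the stylesheet, not as a replacement for running it in oXygen.

```python
import xml.etree.ElementTree as ET

# Clark notation ({namespace}localname) for the TEI rs element.
TEI_RS = "{http://www.tei-c.org/ns/1.0}rs"

# Invented miniature TEI document, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p>text <rs type="nomorph">Ἀρχέστρατος</rs> text <rs type="person">Οὐλπιανός</rs>
text <rs type="nomorph">Δίφιλος</rs></p>
</body></text></TEI>"""

def direct_text(el):
    """Join only the text nodes that are direct children of el, as
    select="text()" does; text inside nested child elements is skipped."""
    return (el.text or "") + "".join(child.tail or "" for child in el)

def collect(el, out):
    """Imitate XSLT's built-in rules: descend through elements, emit matches."""
    if el.tag == TEI_RS and el.get("type") == "nomorph":
        # Counterpart of the TEI:rs[@type[. eq 'nomorph']] template.
        out.append(direct_text(el) + "\n\n")
        return  # that template has no apply-templates, so children are not visited
    for child in el:
        collect(child, out)

def extract_nomorph(root):
    out = []
    collect(root, out)
    return "".join(out)

print(extract_nomorph(ET.fromstring(SAMPLE)), end="")
```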

Create an XSL file with oXygen

First, create an XSL file with oXygen:

  1. From the main oXygen menu, select File / New / XSLT Stylesheet. Select “Create”.
  2. Delete the default content created by oXygen and paste in the XSL quoted above.
  3. Check that it validates without errors.
  4. Name the file and save it somewhere where you can find it (File / Save)

Create an XSL transformation with oXygen

Now we have to connect the XML file (our Athenaeus) with XSL instructions. See oXygen help on transformation scenarios for more details.

  1. From the XML file window, select Document / Transformation / Configure transformation scenario. Alternatively, click the wrench (spanner) symbol with a triangle next to it.
  2. Select New / XML transformation with XSLT.
  3. As “name”, type the name you want for the scenario.
  4. As “XSL URL”, give the (local) address of our XSL file. You can browse for the file with the folder symbol.
  5. As “Transformer”, select Saxon-EE 9… (for XSL 2.0)
  6. Click OK
  7. Click “Apply associated”. You'll see the message “Transformation in progress”.
  8. A lower pane opens where you'll see the results. You can save them as a new file, select them and paste them elsewhere, etc.

If everything went well, you should get a long list (how many lines does it have?) of Greek words, each starting with a capital letter, in the order in which they appear in the Deipnosophists.
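To answer the “how many lines” question without counting by hand, you could save the result pane to a file and count its non-blank lines. This minimal Python sketch relies on the stylesheet's output format, where each match is followed by two newlines, so every other line is blank. It also assumes that no match itself contains a line break. The function count_results is hypothetical.

```python
def count_results(output_text):
    """Count the non-blank lines in the transformation result."""
    return sum(1 for line in output_text.splitlines() if line.strip())

# Stand-in for the saved result: each match is followed by two newlines.
example = "Ἀρχέστρατος\n\nΔίφιλος\n\n"
print(count_results(example))
```

For a saved file you would pass in its contents, e.g. `open("result.txt", encoding="utf-8").read()`. The file name here is only an example.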

2) An idiosyncratic selection of XML and XPath tools and information can be found in this collection of bookmarks.
 
z/ath-ox.txt · Last modified: 28. 10. 2012. 11:31 by njovanov
 