====== Add morphological information to aligned text ======

An XQuery to join two XML files: one containing sentences aligned in the Alpheios online editor, and one containing morphological information obtained from the Perseus Project's Morpheus parser via the SoSoL API.

**Caveat**: to conform to the Perseus treebank XML format, every punctuation mark in the sentence to be aligned should be separated from the preceding word by an added space.

===== Input =====

**What we need**: an XML file with morphological information (''treebank.xml'') and an XML file with aligned texts (''alignment.xml''). In real use, the full local path of each file has to be given (e. g. ''/home/user1/treebank.xml'').

Example of alignment file:

Dum paucos dies ad Vesontionem (...)

Example of treebank file:
(...)

===== The XQuery =====

Luckily, the XQuery is quite simple. [[http://stackoverflow.com/questions/484192/xquery-multiple-xml-files|Stack Overflow]] helped, as usual. The query walks through the words of the first ''wds'' element in the alignment file and pairs each one with the treebank word at the same position in the sentence; this is why punctuation has to be tokenized the same way in both files.

  (: add morphological information to aligned texts :)
  {
    for $i in doc("alignment.xml")//*:wds[1]/*:w,
        $p in doc("treebank.xml")//*:sentence/*:word[(count($i/preceding-sibling::*:w) + 1)]
    return
      element w {
        attribute n {$i/@n},
        $i/*:text,
        $i/*:refs,
        $p
      }
  }

===== The result =====

An excerpt here:

Dum paucos dies ad (...)

===== Discussion =====

A morphologically enriched aligned XML text enables us to construct, e. g., a Moodle question in which students check whether the automatic parsing was correct (with the translation there to help them). It is also a step towards [[http://docs.moodle.org/23/en/Embedded_Answers_%28Cloze%29_question_type|Moodle cloze exercises]] in which the lemma is given and the student has to supply the form that fits the sentence.

We will also consider using the part-of-speech information (''postag'') from the treebank. And, of course, a fully treebanked file (with dependencies marked) can be used as well; it will enable us to combine syntax, words, translation and grammatical information in exercises.
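
The cloze idea from the discussion could be sketched roughly as follows. This is only an illustration under several assumptions: the result of the query has been saved as ''enriched.xml'' (a hypothetical file name), each ''w'' element contains exactly one treebank ''word'' element carrying ''form'' and ''lemma'' attributes as in the Perseus treebank format, and the gaps use Moodle's ''{1:SHORTANSWER:=...}'' cloze notation.

```xquery
(: Sketch only: build a Moodle cloze string from the enriched file.
   "enriched.xml" is a hypothetical name for the saved query result;
   the form and lemma attributes are assumed to follow the Perseus
   treebank format. :)
string-join(
  for $w in doc("enriched.xml")//*:w
  let $form  := string($w/*:word/@form),
      $lemma := string($w/*:word/@lemma)
  return
    (: one short-answer gap per word, followed by the lemma as a hint :)
    concat("{1:SHORTANSWER:=", $form, "} (", $lemma, ")"),
  " "
)
```

As written, the sketch turns every token into a gap, punctuation included; a real exercise would probably filter the words first, for instance by ''postag''.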