The Perseus Annis environment enables syntactic searches in annotated Greek and Latin texts. However, the query syntax is neither simple nor self-evident.
For the patient (or the impatient?), the syntax of the Annis query language, applied to a different set of corpora, is documented here: ANNIS2 – Search and Visualization in Multilevel Linguistic Corpora.
Others can read on to try out my recipes for finding things in the Greek and Latin corpora of Perseus Annis.
Syntactic annotation documentation for Latin is here: Guidelines for the Syntactic Annotation of Latin Treebanks.
Here is how I searched for nouns in the nominative.
case="nominative" & POS="noun" & #1 _=_ #2
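There turn out to be 212 nominative nouns in the annotated Cicero corpus. It seems that the clause & #1 _=_ #2 is obligatory (I first tried without it, to no avail). It means that the first and the second condition both apply to the same word: in the Annis documentation, _=_ is the "identical coverage" operator, so both search terms must cover exactly the same token.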
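To make the role of _=_ concrete, here is a minimal Python sketch. It is a toy emulation on a hypothetical four-word mini-corpus, not how Annis works internally. Each search term selects a set of tokens. Pairing the two sets freely gives a cross product, whereas _=_ keeps only the pairs in which both terms sit on the same token.

```python
# Toy emulation of: case="nominative" & POS="noun" & #1 _=_ #2
# Hypothetical mini-corpus; each token is a dict of its annotations.
tokens = [
    {"form": "Caesar", "POS": "noun", "case": "nominative"},
    {"form": "Gallos", "POS": "noun", "case": "accusative"},
    {"form": "fortes", "POS": "adjective", "case": "nominative"},
    {"form": "vicit", "POS": "verb"},
]

def term(attr, value):
    """Indices of the tokens matched by one search term, e.g. POS="noun"."""
    return [i for i, t in enumerate(tokens) if t.get(attr) == value]

n1 = term("case", "nominative")  # search term #1
n2 = term("POS", "noun")         # search term #2

# Unlinked, every #1 would pair with every #2 (a cross product).
all_pairs = [(a, b) for a in n1 for b in n2]

# With #1 _=_ #2, both terms must cover the same token.
same_token = [(a, b) for a in n1 for b in n2 if a == b]

print(len(all_pairs))                              # 4
print([tokens[a]["form"] for a, _ in same_token])  # ['Caesar']
```

Only "Caesar" is both a noun and nominative. The adjective "fortes" and the accusative "Gallos" each satisfy just one of the two conditions.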
There are 105 participles in the accusative:
case="accusative" & POS="participle" & #1 _=_ #2
Find all verbs modified by adverbs.
POS="verb" & POS="adverb" & #1 ->parent #2
In a syntactic tree, the verb (element #1) is “parent” of the adverb (element #2).
A nice variation:
POS="adjective" & POS="adverb" & #1 ->parent #2
Is there a noun governing an adverb?
Find any word (form) which, in the sentence's tree, is the subject of a verb:
form & POS="verb" & #2 ->parent[relation="SBJ"] #1
Discussion: the first two conditions find any form and any verb, and & connects the conditions. The third condition selects element number 2 if it is the parent node of element number 1 (i.e. if it is connected with it), on the condition that their relationship is "subject" (SBJ). The expression finds elements regardless of how many other words stand between them (but only within the same sentence).
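The same logic can be sketched in Python on a hypothetical annotated sentence. Each token stores the index of its parent ("head") and the label of the arc ("relation"). The data layout and labels are my own assumption for illustration, not the Perseus storage format. The check looks only at the arc and never at word order, which is why intervening words do not matter.

```python
# Toy emulation of: form & POS="verb" & #2 ->parent[relation="SBJ"] #1
# Hypothetical annotated sentence "Caesar Gallos celeriter vicit".
# "head" = index of the parent token (None for the root);
# "relation" = label of the arc from the parent to this token.
sentence = [
    {"form": "Caesar",    "POS": "noun",   "head": 3,    "relation": "SBJ"},
    {"form": "Gallos",    "POS": "noun",   "head": 3,    "relation": "OBJ"},
    {"form": "celeriter", "POS": "adverb", "head": 3,    "relation": "ADV"},
    {"form": "vicit",     "POS": "verb",   "head": None, "relation": "PRED"},
]

def parent_of(child, parent, relation=None):
    """True if `parent` governs `child`, optionally with the given arc label."""
    tok = sentence[child]
    return tok["head"] == parent and (relation is None or tok["relation"] == relation)

verbs = [i for i, t in enumerate(sentence) if t["POS"] == "verb"]  # term #2

matches = [
    (sentence[f]["form"], sentence[v]["form"])
    for f in range(len(sentence))   # term #1: any form
    for v in verbs
    if parent_of(f, v, "SBJ")       # two words stand in between; distance is irrelevant
]
print(matches)  # [('Caesar', 'vicit')]
```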
Caesar has 99 such cases, Plato 252. On corpora larger than 10,000 tokens I get a timeout.
Find any nominative which is the subject of a verb (not a rocket-science syntactic experiment, I know):
case="nominative" & POS="verb" & #2 ->parent[relation="SBJ"] #1
Plato has 178 results (on 6097 tokens in the corpus).
Find any participle which is the subject of a verb (you get the idea):
POS="participle" & POS="verb" & #2 ->parent[relation="SBJ"] #1
Well, the Plato corpus contains 13 such cases, and quite thorny ones at that: I have to figure out how to deal with predicative expressions.
1. Find subjects in the nominative
POS="noun" & case="nominative" & POS="verb" & #1 _=_ #2 & #3 ->parent[relation="SBJ"] #2
1.a Find subjects in the nominative, predicates in the indicative
POS="noun" & case="nominative" & POS="verb" & mood="indicative" & #1 _=_ #2 & #3 = #4 & #3 ->parent[relation="SBJ"] #2
2. Find SPO (subject-predicate-object) structures with the direct object in the accusative, the predicate in the indicative, and the subject in the nominative
POS="noun" & case="nominative" & POS="verb" & mood="indicative" & POS="noun" & case="accusative" & #1 _=_ #2 & #3 _=_ #4 & #5 _=_ #6 & #3 ->parent[relation="SBJ"] #2 & #3 ->parent[relation="OBJ"] #5
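Recipe 2 combines the two tricks used so far. The _=_ pairs stack several annotations on one token, and two ->parent arcs start from the same verb. Here is a toy Python sketch of that combination on a hypothetical three-word sentence. The annotation layout is an assumption for illustration only.

```python
# Toy emulation of recipe 2: #1/#2 = noun in the nominative, #3/#4 = indicative verb,
# #5/#6 = noun in the accusative; #3 ->parent[SBJ] #2 and #3 ->parent[OBJ] #5.
# Hypothetical sentence "Caesar Gallos vicit".
sentence = [
    {"form": "Caesar", "POS": "noun", "case": "nominative", "head": 2, "relation": "SBJ"},
    {"form": "Gallos", "POS": "noun", "case": "accusative", "head": 2, "relation": "OBJ"},
    {"form": "vicit",  "POS": "verb", "mood": "indicative", "head": None, "relation": "PRED"},
]

def has(i, **annotations):
    """All listed annotations hold on one token (what the _=_ pairs achieve)."""
    return all(sentence[i].get(k) == v for k, v in annotations.items())

def governs(parent, child, relation):
    """The ->parent[relation=...] arc from parent to child."""
    return sentence[child]["head"] == parent and sentence[child]["relation"] == relation

idx = range(len(sentence))
spo = [
    (sentence[s]["form"], sentence[v]["form"], sentence[o]["form"])
    for s in idx if has(s, POS="noun", case="nominative")
    for v in idx if has(v, POS="verb", mood="indicative")
    for o in idx if has(o, POS="noun", case="accusative")
    if governs(v, s, "SBJ") and governs(v, o, "OBJ")
]
print(spo)  # [('Caesar', 'vicit', 'Gallos')]
```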
Tip of the day: check out the “Arch Dependency” tab beneath each result; the diagrams are great and useful.
The following Annis query:
form & LEMMA="sum" & #2 ->parent[relation="PNOM"] #1
finds sentences of the type “Sapientes beati sunt”, in which a form of sum governs a predicate nominal (PNOM).
The query:
form & form & form & #1 ->parent[relation="SBJ"] #2 & #2 ->parent[relation="SBJ"] #3
finds sentences such as fuere qui crederent, in which a verb (crederent) is itself the subject of another verb (fuere). We can make this confusing point (a verb as SBJ) even more prominent:
POS="verb" & POS="verb" & form & #1 ->parent[relation="SBJ"] #2 & #2 ->parent[relation="SBJ"] #3
Greek has to be entered in Unicode, with accents. This query for the form δικάζου, typed without the accent, won't produce any results on the Plato corpus:
form="δικαζου"
Betacode doesn't work either:
form="dika/zou"
This search, however, finds one occurrence:
form="δικάζου"
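The failed queries come down to exact string matching: without the accent, the form is simply a different string. A quick check with Python's standard unicodedata module shows this. It also shows one further pitfall that may be worth knowing about. The accented alpha can be encoded either with tonos (U+03AC) or with oxia (U+1F71), and NFC normalization maps the oxia form to the tonos form. I have not checked which encoding Perseus Annis stores, so that part is an assumption. Still, if a correctly accented query copied from elsewhere finds nothing, normalizing it may help.

```python
import unicodedata

query_plain = "δικαζου"     # typed without the accent
stored = "δικ\u03acζου"     # δικάζου, with alpha-tonos, as in the corpus

print(query_plain == stored)  # False: the accent makes it a different string

def strip_accents(s):
    """Remove combining diacritics after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

print(strip_accents(stored) == query_plain)  # True

# Same visible letter, two code points: alpha with oxia vs. alpha with tonos.
oxia = "\u1f71"
tonos = "\u03ac"
print(oxia == tonos, unicodedata.normalize("NFC", oxia) == tonos)  # False True
```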
Search for all forms of δικάζω (two in the Plato corpus):
LEMMA="δικάζω"
Find only the participles of δικάζω (there is exactly one; I proudly use what I have already learned):
LEMMA="δικάζω" & POS="participle" & #1 _=_ #2
This one (with the operator = instead of _=_) produces the same result in this context. I should read up on Annis operators.
LEMMA="δικάζω" & POS="participle" & #1 = #2
We want to find phrases of the type φίλοι γάρ εἰσιν.
case="nominative" & LEMMA="γάρ" & LEMMA="εἰμί" & #1 . #2 & #2 . #3
This search finds 8 results in the Aeschylus corpus.
It seems that an Annis query must be written in pairs (#1 . #2 & #2 . #3); the version #1 . #2 . #3 is not valid.
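The "pairs" rule makes sense once each operator is seen as a test on exactly two search terms. A chain of three words is then just two such tests joined by &. Here is a toy Python sketch of the φίλοι γάρ εἰσιν query. The annotations of the three tokens are hypothetical.

```python
# Toy emulation of:
#   case="nominative" & LEMMA="γάρ" & LEMMA="εἰμί" & #1 . #2 & #2 . #3
# "." (direct precedence) is binary: it relates exactly two search terms.
# Hypothetical annotated phrase:
tokens = [
    {"form": "φίλοι", "LEMMA": "φίλος", "case": "nominative"},
    {"form": "γάρ",   "LEMMA": "γάρ"},
    {"form": "εἰσιν", "LEMMA": "εἰμί"},
]

def directly_precedes(a, b):
    """The "." operator: token b immediately follows token a."""
    return b == a + 1

idx = range(len(tokens))
hits = [
    " ".join(tokens[i]["form"] for i in (n1, n2, n3))
    for n1 in idx if tokens[n1].get("case") == "nominative"
    for n2 in idx if tokens[n2]["LEMMA"] == "γάρ"
    for n3 in idx if tokens[n3]["LEMMA"] == "εἰμί"
    # the chain "#1 . #2 . #3" is expressed as two pairwise constraints
    if directly_precedes(n1, n2) and directly_precedes(n2, n3)
]
print(hits)  # ['φίλοι γάρ εἰσιν']
```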
Find phrases like the one above, but with the nominative as the subject:
case="nominative" & LEMMA="γάρ" & LEMMA="εἰμί" & #1 . #2 & #3 ->parent[relation="SBJ"] #1
Plato corpus: 1 result; in the others I get a timeout.
Find attributes and governing nouns (or whatever):
form & form & #2 ->parent[relation="ATR"] #1
786 results in Plato corpus, including phrases such as ἐν Λυκείῳ.
The other way around:
form & form & #1 ->parent[relation="ATR"] #2
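Produces 786 results as well, but in a different order (ἐμὸς πατήρ now comes first). This is to be expected: both search terms are simply form, so swapping #1 and #2 merely relabels the two ends of each ATR arc.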
Find all attributive phrases with πατήρ:
form & LEMMA="πατήρ" & #2 ->parent[relation="ATR"] #1
20 results in Plato. ἐμὸς πατήρ is one, ὁ (ἐμὸς) πατήρ another. (Can the query be extended to find multiple attributes of the same noun?)
While most of the categories offered by Perseus Annis seem familiar from the classroom, the strangely named “arity” operator is something else. It is a “meta-operator” which, given the “arity number”, selects only those search terms that directly govern exactly that many other words and sentence elements.
E. g. to find all verbs governing four other elements:
POS="verb" & #1:arity=4
In the Cicero corpus there are ninety such situations. By studying the Arch Dependency diagrams, you'll discover that a comma can also be governed by the given element.
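Arity can be pictured as counting a node's children in the tree. Here is a toy Python sketch, assuming (as I read the Annis documentation) that only direct children count, not grandchildren. The sentence and its heads are hypothetical. The comma is included to mirror the observation above that punctuation can be among the governed elements.

```python
from collections import Counter

# Toy emulation of: POS="verb" & #1:arity=4
# arity = number of tokens a node directly governs (its children).
# Hypothetical sentence "Tum , Caesar Gallos vicit"; heads assumed for illustration.
sentence = [
    {"form": "Tum",    "POS": "adverb",      "head": 4},
    {"form": ",",      "POS": "punctuation", "head": 4},  # a comma counts as a child too
    {"form": "Caesar", "POS": "noun",        "head": 4},
    {"form": "Gallos", "POS": "noun",        "head": 4},
    {"form": "vicit",  "POS": "verb",        "head": None},
]

# Count how many tokens point to each head index.
arity = Counter(t["head"] for t in sentence if t["head"] is not None)

hits = [t["form"] for i, t in enumerate(sentence)
        if t["POS"] == "verb" and arity[i] == 4]
print(hits)  # ['vicit']
```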
Now, one of the verbs governing four elements is “interficio”. If we want to concentrate on forms of interficio governing four elements, we do it like this:
LEMMA="interficio" & #1:arity=4
Can you decode what is found by this search? (If not, try pasting it into the Annis search interface!)
POS="noun" & #1:arity=4
Our colleague Šime Demo (Croatian Studies, University of Zagreb) thought of a beautiful search:
POS="adjective" & POS="preposition" & POS="noun" & #3 ->parent #1 & #2 ->parent #3 & #1 .* #2 & #2 .* #3
This searches for prepositional phrases of the type “magnis in periculis”, i.e. with the preposition interposed (adjective – preposition – noun). On the currently available Latin corpus, the pattern sharply distinguishes prose (Caesar, Cicero, Sallust, Petronius) from poetry (Propertius, Vergil): it is frequent in poetry and rare in prose. Šime, thanks!
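Šime's query mixes tree conditions (->parent) with word-order conditions (.*, indirect precedence: the left term comes somewhere before the right one). Here is a toy Python sketch on two hypothetical annotations of the same three words, one with the preposition interposed and one without. The head assignments are my assumption for illustration.

```python
# Toy emulation of:
#   POS="adjective" & POS="preposition" & POS="noun"
#   & #3 ->parent #1 & #2 ->parent #3 & #1 .* #2 & #2 .* #3
def interposed_pps(sentence):
    idx = range(len(sentence))
    return [
        " ".join(sentence[i]["form"] for i in (a, p, n))
        for a in idx if sentence[a]["POS"] == "adjective"
        for p in idx if sentence[p]["POS"] == "preposition"
        for n in idx if sentence[n]["POS"] == "noun"
        if sentence[a]["head"] == n    # #3 ->parent #1: noun governs adjective
        and sentence[n]["head"] == p   # #2 ->parent #3: preposition governs noun
        and a < p < n                  # #1 .* #2 & #2 .* #3: word order
    ]

# Hypothetical annotations (heads are token indices).
interposed = [
    {"form": "magnis",    "POS": "adjective",   "head": 2},
    {"form": "in",        "POS": "preposition", "head": None},
    {"form": "periculis", "POS": "noun",        "head": 1},
]
plain = [
    {"form": "in",        "POS": "preposition", "head": None},
    {"form": "magnis",    "POS": "adjective",   "head": 2},
    {"form": "periculis", "POS": "noun",        "head": 0},
]
print(interposed_pps(interposed))  # ['magnis in periculis']
print(interposed_pps(plain))       # []
```

Both versions have the same tree. Only the word-order conditions tell them apart, which is exactly what lets the query separate the two styles.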
A seemingly simple and self-explanatory syntactic relationship is coordination. However, treebank notation represents it somewhat unexpectedly, so it has to be learned anew.
Take a simple Latin sentence made up of three clauses, connected asyndetically (just with commas):
Ego scribo, tu legis, ille pingit.
In treebank notation, the root of this sentence is the (first) comma; the three predicates and the other comma depend on it (the full stop is on the same level as the root).
Here is an Annis QL query that finds all kinds of coordination by finding a node and its child connected through the “COORD” relationship, or arc:
form & form & #1 ->parent[relation="COORD"] #2
We can modify the query to find (“filter”) just commas as roots:
form="," & form & #1 ->parent[relation="COORD"] #2
A similar but more complex case from the Caesar corpus annotated in Perseus is part of Caes. Gal. 2.34:
ad Venetos, Venellos, Osismos, Coriosolitas, Esuvios, Aulercos, Redones, quae sunt maritimae civitates Oceanumque attingunt
In absolute numbers, Cicero seems to have more cases of the coordinating comma than Caesar. But in the Perseus annotated corpus Cicero's 6229 tokens yield 12 cases, while Caesar's 1488 yield 5, so the relative ratio is actually about 0.19 percent for Cicero and about 0.34 percent for Caesar. Sallust, who has 27 cases in 12311 tokens, falls between Cicero and Caesar, at about 0.22 percent of his corpus. Jerome, with 8382 tokens, seems to have no coordinating commas at all, which is slightly strange.
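The comparison only works with relative figures, so here is the arithmetic spelled out in Python. It uses the counts and token totals quoted above and nothing else.

```python
# Coordinating commas per author: (cases found, tokens in the corpus),
# as reported above for the Perseus annotated corpus.
corpora = {
    "Cicero":  (12, 6229),
    "Caesar":  (5, 1488),
    "Sallust": (27, 12311),
    "Jerome":  (0, 8382),
}

percent_by_author = {
    author: 100 * cases / tokens for author, (cases, tokens) in corpora.items()
}

for author, percent in percent_by_author.items():
    print(f"{author}: {percent:.2f} %")
# Cicero: 0.19 %
# Caesar: 0.34 %
# Sallust: 0.22 %
# Jerome: 0.00 %
```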
Problem: a difficult sentence has to be syntactically annotated.
Quicquid oritur, causam habeat a natura necesse est
(Cic. div. 2, 60)
A proposed annotation is here. But is it correct?
To test it, we write an Annis query and see if there is anything similar in the annotated corpora:
POS="verb" & POS="verb" & #1 ->parent[relation="SBJ"] #2
(“Find two verbs of which one governs the other, where their relationship is labeled SBJ.”)
This search finds, among other results, the well-known passage from Cicero's In Catilinam:
quod eam [sicam] necesse putas esse in consulis corpore defigere
(Cic. Catil. 1, 16)
I guess this confirms my annotation.
The tool: Alpheios treebank editor.
Annotations can be found in Perseus Annis.
Some exercises done by NJ.
POS="verb" & POS="adverb" & #1 ->parent[relation="ADV"] #2
Or, even more precisely (with the adverb immediately preceding the verb):
POS="verb" & POS="adverb" & #1 ->parent[relation="ADV"] #2 & #2 . #1
Additional exercise: using Annis notation, try to find similar sentences in the Perseus annotated corpus.
Sentences from the “new Menge” (taken from classical authors):
Source: Pinkster, Harm (1942-) [1990], Latin Syntax and Semantics, xii, 320 p.
Found with Annis QL: the subject is a noun in the nominative, the predicate is a verb in the indicative, and it has a direct object in the accusative. Examples are shortened here (most of the sentence is omitted).
And with any word in the nominative: