Goal: check whether Latin literature passages used by Holberg occur in CroALa.
Ov. Metam. I, 91-2 Pœna metusqve aberant, nec verba minantia fixo Ære Ligabantur.
<cit> <bibl>Ov. Metam. I, 91-2</bibl> <quote>Poena metusqve aberant, nec verba minantia fixo<lb/> aere Ligabantur.<lb/></quote> </cit>
Remove the bibl element and everything inside it, leaving just what's inside the quote. The XPath for this is: //quote/text()
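A minimal sketch of running that XPath from the shell. The source does not say which tool evaluated it, so the use of xmllint (from libxml2) is an assumption, and the function name extract_quotes is made up for illustration.

```shell
#!/bin/bash
# Assumption: xmllint (libxml2) is installed; the original notes do not
# name the XPath tool that was actually used.

# Print the text nodes of every <quote> element in the XML file $1.
# <bibl> is a sibling of <quote>, so its content is never selected,
# and <lb/> elements carry no text, so they drop out as well.
extract_quotes() {
    xmllint --xpath '//quote/text()' "$1"
}
```

Called as `extract_quotes file.xml` (file name hypothetical), this prints the quotation text only. How the individual text nodes are separated in the output depends on the xmllint version.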
morte dolores si non a genero a genero fratrum ab hospite ab hospite tutus aberant nec aberant nec verba abfuit arbor accipit ille acclinavit in acclinavit in illum acer eqvus acer eqvus cum ad imitandum ad imitandum non ad manes ad manes junctae ad superos ad superos Astraea aderisqve dolentibus aditus et aere canoro aere Ligabantur aeterna vocabat aethera contra ...
Links after the number lead to CroALa queries on uppercase strings.
A second, slightly improved version of results (160 combinations found):
First pass, 167 results:
1. zacroala.sh transforms words into Philologic regexes.
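#!/bin/bash
# Jovanovic, 2012-10, format a list of words for CroALa orthographic search
# usage: ./zacroala.sh filename

# take argument filename:
file=$1

# make various character replacements for Philologic crapser search:
#   uppercase everything; J, Y -> I; V -> U; space -> +
#   AE, OE -> [AO]?E (which also matches a plain E)
#   doubled consonants: the first of the pair becomes optional
#   every H becomes optional; T followed by anything but T, H or ? becomes TH?
#   a * is appended to every line
cat "${file}" \
  | tr '[:lower:]' '[:upper:]' \
  | tr "JY" "II" \
  | tr "V" "U" \
  | tr " " "+" \
  | sed 's/\([AO]\)E/[AO]?E/g' \
  | sed 's/\([BCDFGHLMNPRST]\)\1/\1?\1/g' \
  | sed 's/H/H?/g' \
  | sed 's/T\([^TH?]\)/TH?\1/g' \
  | sed 's/\(.*\)/\1*/g' - >> "${file}-zacroala"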
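To see what these replacements do to a single phrase, the same pipeline can be wrapped in a function. This is only a sketch: the function name to_croala_regex is made up, it reads one phrase from its argument instead of a file, and it assumes GNU tr and sed.

```shell
#!/bin/bash
# Sketch: the zacroala.sh replacements applied to one phrase ($1).
# to_croala_regex is a hypothetical name, not part of the original scripts.
to_croala_regex() {
    printf '%s\n' "$1" \
      | tr '[:lower:]' '[:upper:]' \
      | tr 'JY' 'II' \
      | tr 'V' 'U' \
      | tr ' ' '+' \
      | sed 's/[AO]E/[AO]?E/g' \
      | sed 's/\([BCDFGHLMNPRST]\)\1/\1?\1/g' \
      | sed 's/H/H?/g' \
      | sed 's/T\([^TH?]\)/TH?\1/g' \
      | sed 's/$/*/'
}

to_croala_regex 'aere Ligabantur'   # prints [AO]?ERE+LIGABANTH?UR*
to_croala_regex 'Poena metusqve'    # prints P[AO]?ENA+METH?USQUE*
```

Read as regular expressions, `[AO]?E` matches E, AE or OE, and `TH?` matches T or TH, so one query covers several spellings of the same word.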
2. localcaula.sh sends a list of queries (via curl) to a Philologic installation and sorts results into positives (with hits) and negatives (no occurrences found).
The HTMLized bash script is here.
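Since the script itself is not reproduced here, the following is only a rough sketch of the idea and not the actual localcaula.sh. The base URL, the text that marks an empty result page, the output file names, and the function names are all placeholder assumptions.

```shell
#!/bin/bash
# Sketch only. BASE_URL and NO_HITS_MARKER are placeholders, not values
# taken from localcaula.sh or from any particular Philologic installation.
BASE_URL="http://localhost/cgi-bin/search3t?dbname=croala&OUTPUT=TF&word="
NO_HITS_MARKER="No occurrences found"   # assumed text of an empty result page

# Fetch the result page for one query. --globoff stops curl from treating
# the [ ] in the regexes as URL ranges.
fetch_query() {
    curl --silent --globoff --max-time 30 "${BASE_URL}$1"
}

# Read queries (one per line) from file $1 and append each one to
# $1-positives or $1-negatives, depending on the page that comes back.
sort_queries() {
    local file=$1 query page
    while IFS= read -r query; do
        [ -z "$query" ] && continue
        if ! page=$(fetch_query "$query"); then
            echo "fetch failed: $query" >&2
            continue
        fi
        if printf '%s' "$page" | grep -qF "$NO_HITS_MARKER"; then
            printf '%s\n' "$query" >> "${file}-negatives"
        else
            printf '%s\n' "$query" >> "${file}-positives"
        fi
    done < "$file"
}
```

Keeping the download in its own function means the sorting logic can be tried out with a stub in place of curl, before any queries are sent to a real server.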
3. zacr-rez.sh transforms a list of positive results into an HTML list with live links to CroALa.
#!/bin/bash
# Jovanovic, 2012-10, transforms a list of results into live links for CroALa
# usage: ./zacr-rez.sh filename

# take argument, find file
file=$1

sed 's/^\([^ ]\+\) \([^ ]\+\)/\1+\2/g' "${file}" \
  | sed 's/^\([^ ]\+\) \([^ ]\+\)/\1+\2/g' \
  | sed 's/+=/ =/g' \
  | sed 's#^\(.*\)\( =.*\)#<li><a href="http://www.ffzg.unizg.hr/klafil/croala/cgi-bin/search3t?dbname=croala\&word=\1\&OUTPUT=TF">\1</a>\2#g' > "${file}.html"
# end of script
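A worked example of the same substitutions on a single line. The input format (words separated by spaces, then " = " and a count) is inferred from the sed patterns and is an assumption, as is the function name to_croala_link. GNU sed is assumed because of `\+`.

```shell
#!/bin/bash
# Sketch: the zacr-rez.sh substitutions applied to one result line ($1).
# to_croala_link is a hypothetical name; the "WORDS = count" input format
# is inferred from the sed patterns, not documented in the source.
to_croala_link() {
    printf '%s\n' "$1" \
      | sed 's/^\([^ ]\+\) \([^ ]\+\)/\1+\2/' \
      | sed 's/^\([^ ]\+\) \([^ ]\+\)/\1+\2/' \
      | sed 's/+=/ =/' \
      | sed 's#^\(.*\)\( =.*\)#<li><a href="http://www.ffzg.unizg.hr/klafil/croala/cgi-bin/search3t?dbname=croala\&word=\1\&OUTPUT=TF">\1</a>\2#'
}

to_croala_link 'ACER EQUUS = 2'
# prints <li><a href="http://www.ffzg.unizg.hr/klafil/croala/cgi-bin/search3t?dbname=croala&word=ACER+EQUUS&OUTPUT=TF">ACER+EQUUS</a> = 2
```

The first substitution runs twice so that up to three words are joined with `+`. On a two-word line the second pass also swallows the `=`, which is why `+=` is turned back into ` =` afterwards.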