THE DEPARTMENT’S INTERNATIONAL PROJECTS
ACCURAT – Analysis and Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation (7th Framework Program, FP 7)
- FP7 project principal researcher: Marko Tadić, Ph.D.
- Associates: Boško Bekavac, Ph.D., Ivana Simeon, Ph.D., Krešimir Šojat, Ph.D.
Project description: The aim of the ACCURAT project was to explore methods and techniques for overcoming one of the central problems in machine translation – the lack of language resources for insufficiently supported MT areas. The main goal of the project was to find, analyze and evaluate new methods that would allow comparable corpora to be used to supplement scarce language resources for certain language pairs and textual domains, and ultimately to improve MT output. Further details
LetsMT! – Platform for Online Sharing of Training Data and Building User Tailored MT (Competitiveness and Innovation Program, ICT-PSP)
- Croatian team leader: Marko Tadić, Ph.D.
- Associates: Boško Bekavac, Ph.D., Ivana Simeon, Ph.D., Daniela Katunar
Project description: Statistical machine translation has become the leading paradigm in recent years, and its quality has been shown to depend largely on the volume of training data. Since most parallel texts today are in larger languages, statistical MT systems perform better for those language pairs than for pairs involving languages with scarce parallel language resources. The LetsMT! project aimed to remedy this scarcity by building a technological platform on which users could build their own statistical MT systems from their own or publicly available comparable texts. Further details
CESAR – Central and South-East European Resources (Competitiveness and Innovation Program, ICT-PSP)
- Croatian Team Leader: Marko Tadić, Ph.D.
- Associates: Boško Bekavac, Ph.D., Krešimir Šojat, Ph.D., Daniela Katunar
Project description: The CESAR project, working in close coordination with the META-NET network of excellence and with sensitivity to practices within the research community, aimed to resolve the scarcity of language resources, tools and services for Central and South-East European languages. This aim was achieved by improving, upgrading, standardizing and interlinking a broad range of language resources and tools, and by making them available through an open language infrastructure. The project provided access to a comprehensive set of language resources and tools for Bulgarian, Croatian, Hungarian, Polish, Slovak and Serbian. Resources included monolingual and multilingual spoken and written textual databases, corpora, dictionaries and wordnets, while tools included tokenizers, lemmatizers, taggers and parsers.
CLARIN – Common Language Resources and Technology Infrastructure (7th Framework Program, FP 7)
- FP7 project principal researcher: Marko Tadić, Ph.D.
Project description: The broadly conceived pan-European cooperation project aimed to create, coordinate and make available language resources and technologies for end users. CLARIN offers tools to researchers in humanities and social sciences, providing them with computer-supported processing of language as the carrier of cultural content and knowledge, means of communication, constituent of identity and object of research. Further details
CADIAL – Computer Aided Document Indexing for Accessing Legislation (FP 7)
- FP7 project principal researcher: Marko Tadić, Ph.D.
- Further details
Xlike – Cross-lingual Knowledge Extraction
- Associates: Boško Bekavac, Ph.D., Krešimir Šojat, Ph.D.
- Further details
PARSEME: PARSing and Multi-word Expressions (ICT-COST)
NETWORDS – The European Network on Word Structure, Cross-disciplinary approaches to understanding word structure in the languages of Europe
- Croatian team leader: Ida Raffaelli, Ph.D.
- Associate: Daniela Katunar
- Further details
Evolution of Semantic Systems (EoSS) – A research initiative of the Max Planck Institute for Psycholinguistics
- Croatian team leaders: Ranko Matasović, Ph.D., Ida Raffaelli, Ph.D. (since 2012)
- Associates: Jana Willer-Gold, Ph.D., Tena Gnjatović, Daniela Katunar
Project description: The main aim of the EoSS project is to investigate how meanings vary over space and change over time. We focus on different kinds of categories: containers (kinds of objects), colour (attributes of objects), body parts (parts of objects), and spatial relations (how objects are related to one another). Further details
MZT PROGRAM – The scientific program Computational Linguistic Models and Language Technologies for Croatian encompasses 5 projects funded by the Ministry of Science, Education and Sports, two of which are carried out at the Department
- Principal researcher: Marko Tadić, Ph.D.
- Associates: Krešimir Šojat, Ph.D., Božo Bekavac, Ph.D., Daniela Katunar
Project description: The fundamental goal of this interdisciplinary program is to study and develop theoretical models for individual subsystems of the Croatian language. Building on these findings, computational applications of the models, namely computationally supported resources and tools, will be developed. Further details
THE DEPARTMENT’S NATIONAL PROJECTS (Ministry of Science, Education and Sports/Ministry of Science and Technology)
Lexical Semantics in Building the Croatian WordNet
- Principal researcher: Ida Raffaelli, Ph.D.
- Associates: Boško Bekavac, Ph.D., Krešimir Šojat, Ph.D., Daniela Katunar
Project description: The goals of the research are: (1) identifying and defining conceptual and lexical distinctive features of Croatian with respect to other languages, i.e. the description of the lexical and semantic system of Croatian; (2) development of the Croatian WordNet, which should be compatible with the processed languages included in the EuroWordNet 1 and 2 and BalkaNet projects. Further details
Croatian Language Resources and Their Annotation (2007-2012)
- Principal researcher: Marko Tadić, Ph.D.
- Associates: Ivana Simeon, Ph.D., Boško Bekavac, Ph.D., Daniela Katunar
Project description: The goal of this project is to bring Croatian corpora on a par with corpora of larger languages in several ways: (1) by expanding the existing Croatian National Corpus (CNC) from 101 million to at least 200 million tokens; (2) by supplementing the CNC with morphological (POS, grammatical categories, lemmas), syntactic (syntactic segments, sentence structures) and semantic (lexical meaning tags from the Croatian WordNet) tags; (3) by carrying out fundamental statistical studies of the representation, frequency and distribution of linguistic units and their combinations at several levels; (4) by compiling a number of smaller Croatian corpora for individual specialist domains; (5) by compiling a number of smaller parallel corpora with Croatian as a member of the language pair. Further details
The Croatian Language in a Comparative Perspective (2006-2011)
- Principal researcher: Ranko Matasović, Ph.D.
- Associates: Mate Kapović, Ph.D., Tena Gnjatović
Project description: The aim of the project was a comparative analysis of Croatian from a typological and historical perspective. Within the project, about 40 papers and two books were published, and two international conferences were organized. The project contributed to a better understanding of the position of Croatian among Slavic and Indo-European languages, as well as of its distinctive typological features among the languages of the world.
Dictionary Range and Structure in Educational Processes (2004 – 2007)
- Principal researcher: Vlasta Erdeljac, Ph.D.
- Associate: Dubravko Škiljan, Ph.D.
- Further details
Language Identity Construction and Structure (2007-2012)
- Principal researchers: Dubravko Škiljan, Ph.D. (2007), Vlasta Erdeljac, Ph.D.
- Associates: Mislava Bertoša, Ph.D., Jana Willer-Gold, Ph.D., Martina Sekulić, Bojan Glavašević
Project description: (1) an analysis of language policies as instruments that form collective language consciousness provides the basis for formulating Croatian language policy in the process of accession to the European Union; (2) studying the process of symbolic interaction in complex and multilingual communities helps to meet the preconditions for multicultural education and for developing a society of tolerance and understanding of others; (3) an analysis of the structure of the mental lexicon allows improvements in the teaching and learning of native and foreign languages, a better understanding of language processing in various language and speech pathologies, and the creation of adequate lexical databases (according to speaker types and the specifics of their language use). Further details
CroDeriV (A morphological database of Croatian verbs)
- Principal researcher: Krešimir Šojat, Ph.D.
- Associate: Ida Raffaelli, Ph.D.
- Further details
Development of Croatian Language Resources (2002-2006)
- Principal researcher: Marko Tadić, Ph.D.
- Associates: Ivana Simeon, Krešimir Šojat, Ph.D., Boško Bekavac, Ph.D.
- Further details
Typology of Annotation in Semiology and Semiotics (2002-2006)
- Principal researcher: Marin Andrijašević, Ph.D.
- Associate: Mislava Bertoša, Ph.D.
PARTICIPATION IN EXTRADEPARTMENTAL PROJECTS
LINEE – Languages in a Network of European Excellence (2006-2010, FP6)
- Associate: Mislava Bertoša, Ph.D.
Bilateral Slovenian-Croatian scientific project Interlingual and Intercultural Connections and the Construction of National Identity in Slovenian and Croatian Tourist Discourse (2009 – 2011)
- Principal researcher: Vesna Muhvić-Dimanovski, Ph.D.
- Associate: Mislava Bertoša, Ph.D.
Theoretical and Cognitive Linguistic Research of Croatian and Other Languages (2007-)
- Principal researcher: Milena Žic Fuchs, Ph.D.
- Associate: Ida Raffaelli, Ph.D.
- Further details
OTHER ACTIVITIES
LingChat – Thursday discussion group for teachers and students
- Organizers: Jana Willer-Gold, Ph.D., Tena Gnjatović, Daniela Katunar
Workshop: Linguistic features of Arbëresh (2012)
- Organizers: Jana Willer-Gold, Ph.D., Tena Gnjatović
LingLab – teachers’ and students’ group focusing on psycholinguistic topics
- Principal researcher: Vlasta Erdeljac, Ph.D.
NetWordS Summer School, Interdisciplinary Approaches to Exploring the Mental Lexicon, 2012 (Dubrovnik)
- Organizer: Ida Raffaelli, Ph.D.
9th Mediterranean Morphology Meeting, Morphology and Semantics, 2013 (Dubrovnik)
- Co-organizer: Ida Raffaelli, Ph.D.
The Eighth International Conference, Formal Approaches to South Slavic and Balkan Languages (FASSBL-8), 2012 (Dubrovnik)
- Co-organizer: Marko Tadić, Ph.D.
International Workshop on Balto-Slavic Accentology
- Initiator: Mate Kapović, Ph.D.