Sample Use Cases of the CLARIN-D infrastructure
WebMAUS-Basic: Automatic phonetic labelling and segmentation of a single German recording with text
Interviews and conversations are often recorded and later transcribed. The web service WebMAUS-Basic, which is available in the CLARIN infrastructure, automatically combines an audio recording with its text transcription so that phones and words are time-aligned with the audio signal.
Especially relevant for
anybody who has audio recordings and transcriptions, for example researchers working in:
- Linguistics
- Phonetics
- Anthropology
- Ethnology
- Media studies
- Educational research
- Conversation analysis
- Speech pathology
- Political science
- Speech technology
WebMAUS-Pipeline: Dealing with long video interviews with interlocutor speech, noise, long silent intervals, etc.
Very long recordings (typically up to several hours for video interviews) are difficult to time-align. The Bavarian Archive for Speech Signals (BAS) therefore offers a web service that automatically splits long recordings into so-called chunks, segments each chunk individually, and combines the results into a common file, as demonstrated in this use case.
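The chunk-and-merge idea can be illustrated with a minimal Python sketch. It is not the BAS implementation: it assumes that silent intervals have already been detected (as (start, end) pairs in seconds), cuts at the midpoint of a silence once a chunk has reached a hypothetical minimum length, and then shifts each chunk's local segment times by the chunk's start time to obtain one common timeline. The names `split_at_silences`, `merge_chunks` and `Segment` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    label: str    # e.g. a word or phone label
    start: float  # seconds
    end: float    # seconds


def split_at_silences(duration, silences, min_len):
    """Return (start, end) chunk boundaries in seconds.

    Simplified heuristic (an assumption, not the BAS method): cut at the
    midpoint of a silent interval once the current chunk is at least
    min_len seconds long.
    """
    cuts = [0.0]
    for s, e in sorted(silences):
        mid = (s + e) / 2
        if mid - cuts[-1] >= min_len:
            cuts.append(mid)
    if cuts[-1] < duration:
        cuts.append(duration)
    return list(zip(cuts[:-1], cuts[1:]))


def merge_chunks(chunks, chunk_segments):
    """Shift each chunk's segments (times relative to the chunk start)
    onto the timeline of the whole recording and concatenate them."""
    merged = []
    for (offset, _end), segments in zip(chunks, chunk_segments):
        for seg in segments:
            merged.append(Segment(seg.label, seg.start + offset, seg.end + offset))
    return sorted(merged, key=lambda seg: seg.start)


# Hypothetical 100 s recording with three detected silences.
chunks = split_at_silences(100.0, [(10.0, 12.0), (30.0, 32.0), (70.0, 74.0)], min_len=25.0)
print(chunks)  # [(0.0, 31.0), (31.0, 72.0), (72.0, 100.0)]

# Hypothetical per-chunk segmentations, with times relative to each chunk.
per_chunk = [
    [Segment("guten", 0.5, 0.8)],
    [Segment("Tag", 1.0, 1.4)],
    [Segment("danke", 2.0, 2.5)],
]
for seg in merge_chunks(chunks, per_chunk):
    print(seg)
```

Cutting inside silences keeps words from being split across chunk boundaries, which is why the sketch only ever places boundaries there.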
Especially relevant for
- Linguistics
- Phonetics
- Phonology
- Speech Technology
From text to phonological pronunciation
The orthography of many languages does not encode the precise pronunciation of the corresponding spoken utterance. In such cases, it is useful to be able to transform a text automatically into a phonological encoding (e.g. for speech synthesis). The CLARIN web service G2P (grapheme-to-phoneme conversion) provides such a tool for many languages.
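To illustrate what such a conversion does, here is a toy rule-based converter in Python. It is not how the G2P service works internally: the rule table is a tiny, hypothetical fragment for German with SAMPA-style symbols, and the converter simply applies the longest matching grapheme sequence from left to right. Real grapheme-to-phoneme conversion must also take context into account (for example syllable position and word stress), which this sketch ignores.

```python
# Hypothetical mini rule table: grapheme sequence -> SAMPA-style phoneme.
RULES = {
    "sch": "S",
    "ei": "aI",
    "ie": "i:",
    "w": "v",
    "n": "n",
}


def g2p(word, rules=RULES):
    """Greedy longest-match grapheme-to-phoneme conversion (toy example)."""
    word = word.lower()
    max_len = max(len(g) for g in rules)
    phonemes = []
    i = 0
    while i < len(word):
        # Try the longest grapheme sequence first, then shorter ones.
        for n in range(max_len, 0, -1):
            chunk = word[i:i + n]
            if chunk in rules:
                phonemes.append(rules[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phonemes


print(" ".join(g2p("Schwein")))  # S v aI n
```

Longest match matters here: without it, "sch" would be read letter by letter instead of as the single phoneme S.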
Especially relevant for
- Linguistics
- Phonetics
- Anthropology
- Ethnology
WebMAUS-Basic: Automatic phonetic labelling and segmentation of multiple Hungarian recordings
Interviews and conversations are often recorded and later transcribed. The web service WebMAUS, which is available in the CLARIN infrastructure, provides tools to combine audio recordings and transcriptions so that the words are time-aligned with the audio signals.
Especially relevant for
- Linguistics
- Phonetics
- Speech Technology
- Anthropology
- Ethnology
- Media Science
Cross-corpus search and download of recordings from the BAS CLARIN repository
Large collections of speech recordings and annotations consist of several sub-corpora. CLARIN provides such datasets for academic research; access requires authentication as a member of the academic community.
Especially relevant for
- Humanities scholars interested in empirical speech data
- Developers in speech technology
Support for Enhanced Publications in CLARIN: Citation, Archiving and Access to research data
Repositories hold research data that is available under certain conditions. Because repositories are permanent archiving installations, the data they hold can be cited and thus made visible. This allows data to be reused, attributed to its creator, and used to reproduce research results. The conditions for accessing research data differ from repository to repository.
Especially relevant for
- Humanities scholars working with empirical speech data
- Developers of speech technology
DiaCollo: Collocation analysis with a diachronic perspective
The meaning of a word can be revealed by the context in which it appears. Changes in a word's meaning will therefore often be directly associated with changes in its characteristic combinations (the set of words with which it typically occurs together, its collocates). DiaCollo is a software tool for the discovery, comparison, and interactive visualization of typical word combinations for user-specified target terms.
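The basic idea of a diachronic collocation analysis can be sketched in a few lines of Python: group texts into time periods and, within each period, count the words occurring near a target word. This is a simplified illustration, not DiaCollo's implementation: the corpus is a hypothetical list of (year, tokens) pairs, the window size and period length are arbitrary assumptions, and collocation tools usually rank collocates with statistical association measures rather than the raw co-occurrence counts used here.

```python
from collections import Counter, defaultdict


def collocates_by_period(docs, target, window=2, period_len=10):
    """Count words within `window` tokens of `target`, per time period.

    docs: iterable of (year, tokens) pairs.
    Returns {period_start_year: Counter of collocates}.
    """
    result = defaultdict(Counter)
    for year, tokens in docs:
        period = year - year % period_len  # e.g. 1955 -> 1950 for period_len=10
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    result[period][tokens[j]] += 1
    return dict(result)


# Hypothetical toy corpus.
docs = [
    (1912, ["the", "great", "war", "began"]),
    (1950, ["the", "cold", "war", "began"]),
    (1955, ["the", "cold", "war", "continued"]),
]
for period, counts in sorted(collocates_by_period(docs, "war", window=1, period_len=50).items()):
    print(period, counts.most_common())
```

Comparing the per-period collocate lists (here "great" versus "cold") is what makes a shift in a word's typical contexts visible.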
Especially relevant for
- Historians
- Political scientists
- Philologists
- Linguists
Using Automatic Annotation Tools for Transcription Files
The EXMARaLDA Partitur-Editor provides access to the web services offered by WebLicht and the CLARIN-D infrastructure. In WebLicht, processing workflows can be defined once and later re-used with a single click.
Especially relevant for
anybody who works with the EXMARaLDA Partitur-Editor and wants to annotate their files automatically, for example:
- Linguists
- Anthropologists
- Political scientists, especially those working with video and audio files
Where do you say
Many linguistic resources contain geographic information, for example the location of a recording or the birthplace of a speaker. The tool Wo sagt man (German for "where do you say") uses data from the Database for Spoken German (Datenbank für Gesprochenes Deutsch, DGD) and shows on a map the areas in which an expression has been recorded.
Especially relevant for
- Dialectologists
- Historians interested in specific regions
- Philologists
Context Search of Words in Distributed Corpora
The CLARIN Federated Content Search (CLARIN FCS) allows users to search with a single query in language resources that are archived in different repositories. The aggregator converts the results so that they can be processed further in WebLicht, for example to perform Named Entity Recognition.
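CLARIN FCS builds on the SRU (Search/Retrieve via URL) protocol, so a federated search essentially means sending the same query to several endpoints. The Python sketch below only builds SRU 1.2 `searchRetrieve` request URLs for a list of endpoints. The endpoint URLs are hypothetical placeholders, no network request is made, and real FCS requests may need further parameters defined in the CLARIN-FCS specification.

```python
from urllib.parse import urlencode

# Hypothetical placeholder endpoints, not real FCS services.
ENDPOINTS = [
    "https://repository-a.example.org/fcs",
    "https://repository-b.example.org/fcs",
]


def build_search_requests(endpoints, cql_query, max_records=10):
    """Build one SRU 1.2 searchRetrieve URL per endpoint for the same query."""
    params = urlencode({
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,            # query string, URL-encoded by urlencode
        "maximumRecords": max_records,
    })
    return [f"{endpoint}?{params}" for endpoint in endpoints]


for url in build_search_requests(ENDPOINTS, "Haus"):
    print(url)
```

An aggregator would then fetch each URL, parse the XML responses, and merge the hits into one result list.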
Especially relevant for
- Linguists
- Computational Linguists
Word-level-based comparative text analysis
Many humanities research questions about specific text resources can be reduced to an analysis of vocabulary, and comparison between texts is of particular interest. This use case demonstrates how researchers can approach such questions of their own.
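One simple way to compare the vocabulary of two texts is to contrast relative word frequencies. The Python sketch below computes a smoothed log2 ratio per word: positive values mark words more typical of text A, negative values words more typical of text B. This is one common measure chosen for illustration, not necessarily the method used in this use case; the smoothing constant is an assumption, and a real analysis would start from properly tokenized, possibly lemmatized text.

```python
import math
from collections import Counter


def log_ratios(tokens_a, tokens_b, smoothing=0.5):
    """Return (word, log2 ratio) pairs sorted from most A-typical to most B-typical.

    Add-`smoothing` counts avoid division by zero for words that occur
    in only one of the two texts.
    """
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    vocab = set(ca) | set(cb)
    denom_a = len(tokens_a) + smoothing * len(vocab)
    denom_b = len(tokens_b) + smoothing * len(vocab)
    scores = {}
    for w in vocab:
        pa = (ca[w] + smoothing) / denom_a  # smoothed relative frequency in A
        pb = (cb[w] + smoothing) / denom_b  # smoothed relative frequency in B
        scores[w] = math.log2(pa / pb)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy texts.
text_a = "peace peace treaty".split()
text_b = "war treaty".split()
for word, score in log_ratios(text_a, text_b):
    print(f"{word:10s} {score:+.2f}")
```

Using relative rather than absolute frequencies keeps texts of different lengths comparable.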
Especially relevant for
All scholars that compare texts or vocabulary, including:
- Historians
- Political scientists
- Philologists
Content analysis of biographical data supported by computational linguistics
Our web service "Textuelle Emigrationsanalyse" (German for 'textual emigration analysis') shows how facts about emigration, extracted from large text corpora with computational linguistic techniques within the CLARIN infrastructure, can be explored. The results can be viewed in tabular form, on a map with geographical information, or organized by person.
Especially relevant for
- Historians
- Political scientists
- Literary scholars
Automatic markup of personal and place names in textual sources
Books, articles, and manuscripts often contain information about people, geographical locations, and organizations. With this tool, such names can be automatically marked up and categorized.
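Automatic markup of this kind is usually done with named entity recognition. As a much simpler stand-in that still shows what the resulting markup can look like, the Python sketch below tags names from a small hypothetical gazetteer (a list of known names) with TEI-style `persName` and `placeName` elements. The element names are chosen for illustration only. Unlike named entity recognition, a gazetteer cannot recognize names it does not already contain.

```python
import re

# Hypothetical gazetteer: known name -> TEI-style element name.
GAZETTEER = {
    "Johann Wolfgang von Goethe": "persName",
    "Goethe": "persName",
    "Weimar": "placeName",
    "Berlin": "placeName",
}


def mark_names(text, gazetteer):
    """Wrap every gazetteer name found in `text` in its markup element."""
    if not gazetteer:
        return text
    # Longest names first, so "Johann Wolfgang von Goethe" wins over "Goethe".
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")

    def tag(match):
        name = match.group(1)
        element = gazetteer[name]
        return f"<{element}>{name}</{element}>"

    return pattern.sub(tag, text)


print(mark_names("Goethe lived in Weimar.", GAZETTEER))
```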
Especially relevant for
- Historians
- Political scientists
- Literary scholars