CLARIN-D Blog

Data Management in the Humanities: Progress in the Standardisation of Metadata Formats for Language-related Research Data


In July 2019, the International Organization for Standardization (ISO) published a new standard that helps to describe language-related research data meaningfully and sustainably when the data is archived. The standard ISO 24622-2 "Component Metadata Specification Language" standardizes procedures for defining description schemas tailored to the requirements of specific types of research data.
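To make the idea of a schema tailored to one type of data more concrete, here is a minimal Python sketch that assembles two hypothetical schemas ("profiles") from reusable building blocks ("components"). It models only the concept, assuming that a profile is simply a list of components and a component is a list of named fields. It is not the specification language that ISO 24622-2 actually defines, and all component and field names are invented for illustration.

```python
from dataclasses import dataclass, field

# Simplified, hypothetical model of component-based metadata schemas.
# It only illustrates the idea of reusable building blocks and is NOT
# the specification language defined by ISO 24622-2.


@dataclass(frozen=True)
class Component:
    """A reusable group of metadata fields (data categories)."""
    name: str
    fields: tuple[str, ...]


@dataclass
class Profile:
    """A schema for one type of research data, assembled from components."""
    name: str
    components: list[Component] = field(default_factory=list)

    def field_paths(self) -> list[str]:
        # Flatten the profile into "Component/field" paths.
        return [f"{comp.name}/{fld}" for comp in self.components for fld in comp.fields]


def shared_fields(a: Profile, b: Profile) -> list[str]:
    """Return the field paths two profiles have in common, sorted."""
    return sorted(set(a.field_paths()) & set(b.field_paths()))


# A component that many kinds of language data could share (hypothetical).
LANGUAGE = Component("Language", ("languageName", "isoCode"))

# Components specific to one kind of data (hypothetical field names).
EXPERIMENT = Component(
    "Experiment",
    ("numberOfParticipants", "researchQuestion", "recordingSystem"),
)
TEXT_COLLECTION = Component(
    "TextCollection",
    ("numberOfWords", "textSource", "author"),
)

experiment_profile = Profile("PsychologicalExperiment", [LANGUAGE, EXPERIMENT])
corpus_profile = Profile("TextCorpus", [LANGUAGE, TEXT_COLLECTION])

print(shared_fields(experiment_profile, corpus_profile))
# ['Language/isoCode', 'Language/languageName']
```

Because both profiles reuse the same hypothetical Language component, the fields they have in common are identical by construction, while each profile keeps its own data-specific fields.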

When research data is archived, information about the data is collected and made available so that other researchers can find the data and judge its relevance from the description. In addition, potential users can get an idea of how they might incorporate the data into their own research and use it to answer their own research questions. These descriptions are called metadata.

Experience shows that, because types of research data and research questions differ so widely, it is very difficult to find an all-encompassing, universal pattern, or schema, according to which these descriptions can be created. For instance, psychological experiments (number of test persons, research question, independent and dependent variables, recording system, etc.) are described differently from collections of texts for grammatical investigations or for the creation of word embeddings (number of "words", language, length of texts, source of texts, age of texts, authors, etc.). Even libraries, despite their long tradition, use a variety of metadata formats for books, e.g. Dublin Core, MARC 21, PREMIS, MODS. Many metadata schemas have some fields, also called data categories, that resemble each other, as well as some areas where they differ. In order to enable both an adequate description of research data and the utilisation of similar metadata


WebLicht Tutorial

https://youtu.be/3RgRCEa6Smo

This video tutorial shows one of several ways to use WebLicht. WebLicht is a web application provided by CLARIN-D that lets you build toolchains for linguistic annotation on different layers such as morphology, syntax or named entity recognition.

To get started, log in with your CLARIN account or another university account. Click the Start button and select Easy Mode, which supplies a predefined toolchain. You can then analyze text that you type or paste into the input window, use a sample text provided by WebLicht, or upload a text file. Next, select the annotation layer you want and click Run to get a detailed analysis of your selection. You can then download the complete file or parts of it as .csv or .xml.

If the predefined toolchain does not meet your needs, you can switch to Advanced Mode and build your own customized toolchain. If you have questions or suggestions, or want to report a problem, you can always contact the Helpdesk.


How to use COALA

https://youtu.be/yB090931YdM

COALA is a tool that converts simple text tables into CLARIN metadata files (CMDI) for multimodal corpora. If you want to learn more about CMDI in general, please refer to this page. Like many other CLARIN tools, COALA is a free web service; it is available on the website of the Bavarian Archive for Speech Signals (BAS).

To get your CMDI files, simply upload your files, give your corpus a name and a title, and click the green COALA button to convert them. Within a few seconds you can download a zipped file containing all the metadata for your corpus.
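The core idea of such a conversion, turning the rows of a table into structured metadata elements, can be sketched in a few lines of Python. This is a hypothetical illustration, not COALA's implementation: the column names, element names and table layout are assumptions, and the output is plain XML rather than a valid CMDI file, which would also need a header referencing a registered CMDI profile.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input table: one row per recording session.
# The column names are invented for this sketch and do not reproduce
# COALA's actual table templates.
TABLE = """session\tspeaker\tlanguage\tduration_s
s01\tspk1\tdeu\t312
s02\tspk2\tdeu\t287
"""


def table_to_xml(corpus_name: str, title: str, table_text: str) -> ET.Element:
    """Turn a tab-separated table into a simple XML metadata tree.

    Element names are illustrative only; a real CMDI file follows a
    registered profile and carries a CMD header, which this sketch omits.
    """
    root = ET.Element("CorpusMetadata", name=corpus_name)
    ET.SubElement(root, "Title").text = title
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        # Each row becomes one Session element, identified by its "session" column.
        session = ET.SubElement(root, "Session", id=row["session"])
        for column, value in row.items():
            if column != "session":
                # Every other column becomes a child element named after its header.
                ET.SubElement(session, column).text = value
    return root


root = table_to_xml("demo_corpus", "Demo Corpus", TABLE)
ET.indent(root)  # pretty-print in place (Python 3.9+)
print(ET.tostring(root, encoding="unicode"))
```

Keeping one element per row and one child per column makes the mapping easy to check against the input table, which mirrors the advice to look at your tables first when a conversion goes wrong.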

If you encounter any problems, you might first want to check the log messages at the top to see whether something went wrong. For further problems or questions, there is a detailed description of the web service along with downloadable templates that show example inputs. Perhaps something is wrong with your tables? If you can't find what you are looking for, you can also get help from the Helpdesk.


Learning DH and Networking across Europe with CLARIN


Take 70 international young scholars in the digital humanities (DH), 11 different classes taught by experienced experts and a couple of presentations by scholars showing their work from various DH subfields, then add a social program with excursions to museums and cultural sites: voilà! In summer 2017, "Culture & Technology" - The European Summer University in Digital Humanities (ESU) was an excellent venue for scholars to learn about and practice DH methods, broaden their horizons to different research questions in DH, and build international networks of expertise.

Existing tools and data sets were used to demonstrate use cases and to work on classroom projects based on participants’ interests. CLARIN, a major contributor to the DH infrastructure in Europe, strongly supported these activities by sponsoring classes related to CLARIN services, which provide tools, data sets, and workflows.

ESU 2017 organizers: Elisabeth Burr and her team organized the summer school at Leipzig University

Organized by an enthusiastic team around Elisabeth Burr, the summer school was established at Leipzig University, Germany, in 2009. In 2017 it was again co-sponsored by CLARIN, alongside funding from Leipzig University, the German Academic Exchange Service (DAAD), and other national and international institutions. This allowed about 70 participants from almost every part of the world to take part in the summer school, including intensive courses in small groups that applied DH methods to research questions. With participants from Russia to the USA, and the majority coming from European countries ranging from Bulgaria to France, the summer school was an international networking event for young scholars and international experts in DH.

Participants of ESU 2017 listening to a presentation on an international art project