Curation and integration of spoken academic language resources of the GeWiss project into the CLARIN infrastructure (WG 1, project 2)
The second curation project of Working Group 1 started in April 2013 under the leadership of the Herder-Institute (University of Leipzig), in co-operation with the CLARIN-D centres at IDS Mannheim, Leipzig University and Hamburg University (HZSK). The curated resources will be made available through the existing web portal of the GeWiss project (content in German). In addition, the IDS will make the data available through its CLARIN service centre infrastructure.
Project content
The curation project aims to integrate all existing published and unpublished resources of the GeWiss project and to make them available to the academic community at large in a format compatible with CLARIN standards. The tri-national research project “Spoken Academic Discourse: German in comparison to English and Polish” (GeWiss, 2009 – 2013) set out to construct a multilingual, comparable corpus of spoken German, English, and Polish academic language. A beta version of the GeWiss core corpus was published in March 2013. It comprises two genres of academic discourse, oral presentations and oral examinations, in German, English, and Polish, recorded in German, British, and Polish academic contexts. The curation project will extend the core corpus with two additional resources: a corpus of student presentations given in L2 German, recorded in the Bulgarian academic context, and a corpus of Italian-language conference papers recorded in the Italian academic context. In addition, the search options of the resource will be improved by integrating a pragmatically annotated subcorpus. Finally, the curation project will transfer the existing metadata of the GeWiss corpora into the CMDI (Component MetaData Infrastructure) format, register persistent identifiers (PIDs) for the subcorpora and their components, and make the entire resource searchable via the Virtual Language Observatory (VLO).
GeWiss will provide the academic community with the first freely available corpus resource for the comparative analysis of spoken academic language. Such resources are particularly valuable because recording and transcribing spoken data is so time-consuming and labour-intensive that it can hardly be accomplished within individual research projects. By helping to secure the long-term availability of this resource, the curation project also improves the infrastructure for research on spoken academic discourse.
The individual stages and work packages of the curation project will be documented as workflows, providing an infrastructural basis both for the long-term integration of additional resources and for the further improvement of the search options. This will help pave the way for the GeWiss resource to grow into a key reference corpus for the comparative analysis of spoken academic discourse.
Duration
- 01.04.2013 – 31.03.2014
Applicants
- Prof. Dr. Christian Fandrych (Herder-Institute, University of Leipzig)
Responsible Institutions
- Herder-Institute, University of Leipzig (GeWiss)
- CLARIN-D service centre ASV, University of Leipzig
- CLARIN-D service centre IDS Mannheim
- CLARIN-D service centre HZSK, University of Hamburg
Executive Staff
- Daniel Jettka (HZSK, University of Hamburg)
- Cordula Meißner (Herder-Institute, University of Leipzig)
References
- The GeWiss core corpus is available in a beta version via the web portal https://gewiss.uni-leipzig.de.
- Daniel Jettka: Poster presentation of the curation project CLARIN-KP-GeWiss, CLARIN-D M24 Workshop, 27–28 June 2013, Stadsschouwburg Nijmegen.
- Cordula Meißner, Daniel Jettka & Christian Fandrych: CLARIN-KP-GeWiss: Das zweite Kurationsprojekt der F-AG 1 „Deutsche Philologie“. In: CLARIN-D-Newsletter, No. 4, May 2013.
- Christian Fandrych, Cordula Meißner & Adriana Slavcheva (2012): “The GeWiss Corpus: Comparing Spoken Academic German, English and Polish”. In: Schmidt, Thomas/Wörner, Kai (eds.): Multilingual corpora and multilingual corpus analysis. Amsterdam: Benjamins. 319 – 337. (= Hamburg Studies in Multilingualism 14).
- Christian Fandrych, Cordula Meißner & Adriana Slavcheva (in press): “Das Korpusprojekt „Gesprochene Wissenschaftssprache kontrastiv“ und seine Relevanz für die Vermittlung des Deutschen als Wissenschaftssprache”. In: Mackus, Nicole/Möhring, Jupp (eds.): Wege für Bildung, Beruf und Gesellschaft - mit Deutsch als Fremd- und Zweitsprache. 38. Jahrestagung des Fachverbandes Deutsch als Fremdsprache an der Universität Leipzig 2011. Göttingen: Universitätsverlag.
- Christian Fandrych, Cordula Meißner & Adriana Slavcheva (eds.) (in preparation): Gesprochene Wissenschaftssprache: Korpusmethodische Fragen und empirische Analysen. Heidelberg: Synchron-Verlag. (= Wissenschaftskommunikation).