CLARIN-D Blog

ESU Experience Report: Fellow Viviana Pezzullo

My name is Viviana Pezzullo and I am one of the participants of the European Summer University in Digital Humanities and a recipient of the CLARIN-D Fellowship. I would like to publicly thank CLARIN-D for giving me the opportunity to take part in such an amazing experience, which has allowed me to improve my project and to meet young scholars from all over the world.

For two weeks, I joined Dr. Carol Chiodo's and Dr. Lauren Tilton's workshop titled Hands on Humanities Data. The workshop was a perfect fit for the project on which I am currently working: a digital collection of letters belonging to the Badia family and exchanged during the 1940s. Thanks to my instructors, I now have a better understanding of how to organize my data and how to make better use of them. The first week focused on understanding what kind of data one is dealing with and how to use Google Sheets to work collaboratively. After discussing my method with my peers and with Dr. Chiodo and Dr. Tilton, I recognized the unintentional mistakes I had made throughout my data collection. This gave me the opportunity to clean up my cluttered data set.

The second week focused on the tools that one can use not only to study data but also to showcase them. For instance, our group discovered Voyant, which I consider an impressive tool for anyone who wants to approach text analysis but is not yet familiar with Python or R. Furthermore, I learned how to create a timeline (using TimelineJS) and a map (using Carto and StoryMap), which I will definitely

Read more

ESU Experience Report: Fellow Cecília Magalhães

SHORT TEXT, GREAT EXPERIENCE:
a two-week story about the ESU in Digital Humanities at Leipzig University

By Cecília Magalhães - August 3rd, 2018.

In the last two weeks of July, the city of Leipzig was the perfect stage for productive academic practice in the Digital Humanities field. In this period, the "European Summer University in Digital Humanities", hosted by the University of Leipzig, offered a variety of workshops with distinct approaches, covering textual and data analysis, XML-TEI schemas, data visualisation and more. Furthermore, the event featured international talks which, beyond reinforcing some of the technical and theoretical subjects already practised in class, reminded us of the importance of actively taking part, as digital humanists, in the academic, political and social discussions within the global DH community.


Within this panorama, I took part in the "Hands-on Humanities Data Workshop - Creation, Discovery and Analysis", led by Lauren Tilton (University of Richmond, USA) and Carol Chiodo (Princeton University, USA). The classes focused initially on introducing the concept of data in its complexity and diversity. Beyond that, we discussed how to handle multiple kinds of data by critically selecting and combining tools and platforms. Text analysis, text mining, topic modelling, mapping, network analysis and data visualisation were topics presented through fascinating examples of DH projects.

[Figure: What is DH? Hands-on first class with Carol Chiodo. Photo credit: Cecília Magalhães]

I have been investigating the interactional component of the LdoD Archive (PORTELA & SILVA, 2018, ldod.uc.pt), a digital edition based on the textual fragments of Fernando Pessoa's unfinished book, the Book of Disquiet. This platform creatively enhances the users' reading, editing and writing practices through

Read more

ESU Experience Report: Fellow Laura Ivaska

Learning how to do Stylometry with style at the European Summer University in Digital Humanities 2018 in Leipzig

I was honored to attend the European Summer University in Digital Humanities “Culture and Technology” at the University of Leipzig in July 2018 as a CLARIN-D Fellow. During the two weeks, some hundred students and teachers from all over the world attended and taught workshops on various topics in Digital Humanities, ranging from Project Management to Reflected Text Analysis and from XML-TEI to Computer Vision. I participated in the workshop on stylometry, a method for studying the similarities and differences between (literary) texts that is often used for authorship attribution - that is, to answer questions such as who Elena Ferrante is and whether Robert Galbraith is actually J.K. Rowling.

In the workshop, I learned how to use the stylo package in R and became familiar with concepts such as Delta and Manhattan distance, as well as the functions oppose() and rolling.classify(). Although stylometry is based on statistics and the stylo package is written in the programming language R, the workshop is also suitable for beginners, as the stylo package has a graphical interface just like ordinary desktop software, and the teachers explained the mathematical equations behind the statistical analyses.
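
To make those concepts concrete, here is a minimal Python sketch of the idea behind Burrows' Delta: represent each text by the z-scored frequencies of its most frequent words, then compare texts by (normalized) Manhattan distance. This is my own illustration rather than workshop material or stylo code, and the toy corpus is invented; in practice you would run the stylo package on full-length texts.

```python
import statistics
from collections import Counter

def mfw_profile(tokens, mfw):
    """Relative frequency of each most-frequent word in one text."""
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in mfw]

def delta(texts, n_mfw=100):
    """Burrows' Delta: mean Manhattan distance between z-scored MFW frequencies."""
    # 1. Find the most frequent words (MFW) across the whole corpus
    all_tokens = [t for toks in texts.values() for t in toks]
    mfw = [w for w, _ in Counter(all_tokens).most_common(n_mfw)]
    profiles = {name: mfw_profile(toks, mfw) for name, toks in texts.items()}
    names = list(profiles)
    # 2. z-score each word's frequency across the texts
    zscores = {name: [] for name in names}
    for i in range(len(mfw)):
        col = [profiles[n][i] for n in names]
        mu, sd = statistics.mean(col), statistics.pstdev(col) or 1.0
        for n in names:
            zscores[n].append((profiles[n][i] - mu) / sd)
    # 3. Delta = mean absolute difference of z-scores (Manhattan distance / n)
    def dist(a, b):
        return sum(abs(x - y) for x, y in zip(zscores[a], zscores[b])) / len(mfw)
    return {(a, b): round(dist(a, b), 3) for a in names for b in names if a < b}

# Invented toy corpus; real analyses use whole novels, not single sentences
texts = {
    "ferrante": "my friend and i walked through the streets of naples".split(),
    "galbraith": "the detective walked slowly down the dark street".split(),
    "rowling": "the young wizard walked down the dark corridor".split(),
}
print(delta(texts, n_mfw=10))
```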

The workshop – like the whole Summer University – was very intensive, meaning that there was much to learn in a short amount of time, but also that I returned home with skills that will allow me to do my own stylometric analyses. Additionally, the workshop included a small yet important side note on generating data visualizations with software such as Gephi and jMol. It was great to learn about suitable

Read more

ESU Experience Report: Fellow Erdal Ayan

A Little About Me

My name is Erdal Ayan. I am an academic assistant at the Herder Institut in Marburg, Germany. I have also been a master's student in Informatics at Philipps University Marburg since September 2017. At the moment, I am working on data visualization, big data processing, text analysis and corpus building as part of my studies and my work at the institute. I am very curious about building corpora for educational and research purposes.

Special Thanks to Supporters

Actually, I did not know about the summer university in Leipzig until I got an informative email from the head of my department, Barbara Fichtl, who also deserves special thanks. :) I want to thank CLARIN-D for providing me with a fellowship, the organizing committee of the European Summer University for accepting me, and the administration of the Herder Institut, Marburg, for supporting my participation in the workshop and academic activities during my stay in Leipzig. Nor do I want to forget to thank Prof. Dr. Elisabeth Burr and her hard-working assistants for their efforts in welcoming and hosting us for almost two weeks in Leipzig.

About the Workshop and My Experience

From 17.07. to 27.07.2018 I participated in the workshop titled “Word Vectors and Corpus Text Mining with Python” as part of “Culture & Technology” - the European Summer University (ESU) in Digital Humanities in Leipzig. The workshop was organized and taught by Eun Seo Jo, a PhD candidate at Stanford University, USA, and an expert on digital humanities and the arts. I should mention that I learned a lot more than I expected beforehand. The content of the course focused extensively on processing large-scale texts, common concepts
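
To give a flavour of that content, here is a minimal sketch of training word vectors in Python with the gensim library. Gensim is my choice for illustration, not necessarily what the workshop used, and the toy sentences are invented; real word vectors need far larger corpora.

```python
from gensim.models import Word2Vec

# Invented toy corpus; in practice you would tokenize a large text collection
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["a", "corpus", "is", "a", "collection", "of", "texts"],
]

# Train a small skip-gram model (parameter names follow gensim 4.x)
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=50)

# Words that appear in similar contexts end up with similar vectors
print(model.wv.most_similar("king", topn=3))
```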

Read more

WebLicht Tutorial

https://youtu.be/3RgRCEa6Smo

This video tutorial shows one of several ways to use WebLicht. WebLicht is a web application provided by CLARIN-D that allows you to build toolchains for linguistic annotation on different layers such as morphology, syntax or named entity recognition.

To get started, you have to log in via your CLARIN account or another university affiliation account. After clicking the Start button and selecting Easy Mode, which supplies you with a pre-defined toolchain, you can either analyze text that you type in directly or copy-paste into the corresponding window, use a sample text provided by WebLicht, or upload a text file. Now you can select your preferred layer of annotation and hit Run to get a detailed analysis for the selection you have made. It is then possible to download the complete file or parts of it as .csv or .xml.
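
Once downloaded, the .xml result can also be processed outside WebLicht. WebLicht's XML output is in the TCF format; the short Python sketch below pulls tokens and their part-of-speech tags out of such a file. The file name is invented, and the namespace and element names reflect my reading of TCF 0.4, so verify them against your actual download.

```python
import xml.etree.ElementTree as ET

TC = "{http://www.dspin.de/data/textcorpus}"  # TCF TextCorpus namespace (assumed)

tree = ET.parse("weblicht_result.xml")        # hypothetical downloaded file
root = tree.getroot()

# Map token IDs to token strings
tokens = {tok.get("ID"): tok.text for tok in root.iter(f"{TC}token")}

# POS tags reference tokens by ID, so pair each tag with its token(s)
for tag in root.iter(f"{TC}tag"):
    for tid in (tag.get("tokenIDs") or "").split():
        print(tokens.get(tid), tag.text)
```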

In case the pre-defined toolchain does not satisfy your needs, you can also switch to the Advanced Mode where you can build your own customized toolchain. You can always refer to the Helpdesk if you have any questions, suggestions, or problems that you want to report. 

Read more

How to use COALA

https://youtu.be/yB090931YdM

COALA is a tool that converts simple text tables into CLARIN metadata files (CMDI) for multimodal corpora. If you want to learn more about CMDI in general, we refer to this page. Like many other CLARIN tools, COALA is a free web service that can be found on the website of the Bavarian Archive for Speech Signals (BAS).

To get your CMDI files, you just have to upload your files, give your corpus a name and a title, and hit the green COALA button to convert them. Within a few seconds you can download a zipped file which contains all the metadata for your corpus.
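
Because the result is an ordinary zip archive of CMDI (XML) files, you can inspect it with a few lines of Python; the archive name below is just an example.

```python
import zipfile

# Hypothetical name for the archive downloaded from COALA
with zipfile.ZipFile("my_corpus_cmdi.zip") as archive:
    for name in archive.namelist():   # one metadata file per resource
        print(name)
    # Peek into the first metadata file to check its contents
    print(archive.read(archive.namelist()[0]).decode("utf-8")[:500])
```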

If you encounter any problems, you might first want to check the logging messages at the top to see whether something went wrong. If you have further problems or questions, there is a detailed description of the web service, along with templates that you can download to see example inputs. Maybe there is something wrong with your tables? In case you can't find what you are looking for, there is also the possibility to get help at the Helpdesk.

Read more

Online Perception Experiments with Percy

What is it?

Percy is a device-independent tool for performing online perception experiments. Researchers can learn something about spoken language by setting up an experiment design in which participants listen to audio stimuli and then give their judgements about them.

For Whom is it?

Percy is a tool for researchers who want to find out something about spoken language, but it is also quite interesting for the participants, as they can give judgements about the stimuli and manipulate them.

And the Details?

To define an experiment design, a researcher first has to think about which stimuli, input options and questions he or she wants to present to the participants. The researcher can choose among three options for setting up the experiment design: he or she can (1) use the built-in editor, (2) use the default user interface or (3) contact percy@phonetik.uni-muenchen.de for a more advanced experiment design. There is also the possibility to choose from a set of experiments that have already been conducted.
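
To make the idea of an experiment design concrete, here is a purely hypothetical sketch, written as a Python dictionary, of the three ingredients named above: stimuli, input options and a question. Percy's actual configuration format may look quite different.

```python
# Purely hypothetical experiment design; Percy's real format may differ.
experiment = {
    "stimuli": [                         # audio files the participant hears
        {"id": "s1", "audio": "stimuli/vowel_a_speaker1.wav"},
        {"id": "s2", "audio": "stimuli/vowel_a_speaker2.wav"},
    ],
    "question": "How natural does this utterance sound?",
    "input_options": {                   # how participants give their judgement
        "type": "likert",
        "scale": [1, 2, 3, 4, 5],
        "labels": {"1": "very unnatural", "5": "very natural"},
    },
    "randomize_order": True,             # present stimuli in random order
}
```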

Read more

How to use WebMAUS

https://youtu.be/G-TVDx5KQBs

This video tutorial about WebMAUS - the Munich AUtomatic Segmentation - explains how you can easily generate a TextGrid file that aligns an audio signal to a transcription. If you want to learn more about WebMAUS in general, click here. The procedure for obtaining the TextGrid is quite simple: you just need a text file containing the transcription and the corresponding audio file with the spoken language, and you feed them into the application via drag-and-drop (careful: the files need to have the same name).

After this step, a menu drops down where you can select your preferences and hit the 'run' button. After a few seconds, WebMAUS has created a TextGrid for you, which you can download, open in Praat along with your audio file, check where WebMAUS has segmented your file, and process further.
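
If you prefer to script this step rather than use the browser, the BAS web services also expose MAUS over HTTP. The Python sketch below posts an audio file and a transcription with the requests library; the endpoint URL and parameter names follow my reading of the BAS web services documentation and should be treated as assumptions to verify there, and the file names are invented.

```python
import requests

# Endpoint and parameter names as I understand them from the BAS
# web services documentation - verify before relying on this.
URL = "https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runMAUSBasic"

with open("recording.wav", "rb") as audio, open("recording.txt", "rb") as text:
    response = requests.post(
        URL,
        files={"SIGNAL": audio, "TEXT": text},
        data={"LANGUAGE": "deu-DE", "OUTFORMAT": "TextGrid"},
    )

# The reply is XML that contains a download link for the finished TextGrid
print(response.text)
```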

Read more

WebMAUS Introduction

https://youtu.be/7lI-gOShtFA

This video tutorial gives a brief introduction to the Munich AUtomatic Segmentation - or WebMAUS. It is a tool that aligns speech signals to linguistic categories, which makes it possible, among other things, to align the audio signal of a video to its transcript. As input, WebMAUS needs an audio signal (for instance the audio track of a video) and some kind of transcription of the spoken text.

To get the actual output, the input text first needs to be normalized. With the Balloon tool, the expected (canonical) pronunciation is created in SAMPA (a machine-readable phonetic alphabet). In the next step, all other possible pronunciation variants are generated along with their probabilities. These variants are represented in a probabilistic graph, in which WebMAUS then searches for the path of phonetic units that was actually spoken. The outcome is a transcript of the actual pronunciation along with its segmentation.
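
The final search step can be pictured with a tiny, purely conceptual Python sketch (this is not WebMAUS code): given pronunciation variants weighted by prior probability and an invented acoustic score for how well each variant matches the signal, the decoder picks the variant with the best combined score.

```python
import math

# Hypothetical pronunciation variants for one word, in SAMPA, with the
# prior probabilities assigned by the variant-generation step
priors = {
    "hast": 0.7,   # canonical pronunciation
    "has":  0.2,   # final /t/ deleted
    "hasd": 0.1,   # final /t/ voiced
}

# Invented acoustic scores: how well each variant matches the audio
acoustic = {"hast": 0.2, "has": 0.9, "hasd": 0.4}

def combined_score(variant):
    # Combine prior and acoustic evidence in log space, as real decoders do
    return math.log(priors[variant]) + math.log(acoustic[variant])

best = max(priors, key=combined_score)
print(best)  # "has": the acoustic evidence outweighs the canonical prior
```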

There is an open-source download and a web application. Usage is free for all academic users in Europe.

Read more