DiaCollo: collocation analysis in diachronic perspective

In the words of the philosopher of language Ludwig Wittgenstein: “the meaning of a word is its usage in the language” (Philosophical Investigations, Part I, section 43). In other words, the meaning of a word can be revealed by the contexts in which it appears. An ambiguous word such as ‘bank’ can be disambiguated given its context: the ‘bank’ bounding a body of water will tend to occur together with terms like “river”, “lake”, or “slope”, while the ‘bank’ which is a financial institution will tend to occur together with expressions like “money”, “cheque”, or “go to”.

Changes in a word's meaning will therefore often be directly associated with changes in its characteristic combinations (the set of words with which it typically occurs together, its collocates). Even political, cultural, or social changes relating to a central term can be revealed and traced through its typical combinations (see the example for ‘revolution’ below).
 
DiaCollo is a software tool for the discovery, comparison, and interactive visualization of the typical word combinations for a user-specified target term. Characteristic word combination profiles based on various underlying text corpora can be requested for a particular time period, as well as direct comparisons between different time periods. In addition to traditional static tabular display formats, a number of intuitive interactive online visualizations for query result data are also available.

Especially relevant for:

  • historians
  • political scientists
  • philologists
  • linguists

Starting point:

Interest in tracking the changes in use of one or more period- or discourse-typical words over a particular time period.

Task:

Exposing a general or discourse-specific shift in meaning by changes in characteristic collocations.

Solution:

This task can be accomplished with the help of the DiaCollo tool and the resources provided by the CLARIN-D infrastructure.

Related CLARIN-D tools and services

  • WebLicht web-based analysis tool
  • DTA::CAB historical German text analysis service

More information:

A use-case of the CLARIN-D center in the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW).

Participating projects:

Extensive tutorial: DiaCollo Tutorial (http://kaskade.dwds.de/diacollo-tutorial/) (currently German only)

Version from: 12th April, 2016

Short guide on how to use DiaCollo

  1. Visit the DiaCollo query form in a browser to query data from the German Text Archive corpus.
  2. Type the word ‘Revolution’ in the QUERY field.
  3. Select “Cloud” from the FORMAT menu. Leave the rest of the fields unchanged.
  4. Click on the “submit” button (next to the QUERY field).
  5. In the box beneath the query section, the words that typically appear with ‘Revolution’ will be displayed. The window initially shows the situation in 1610. The presentation format is a word-cloud: the displayed words will differ in size and color based on their association strengths with respect to the target word, ‘Revolution’.
  6. Directly above the display area is a time-line beginning at 1610 and ending at 1910, divided into intervals of 10 years each. To the left of the display area is a scale of the (relative) association strengths for the displayed items for easier interpretation of the results.
  7. Clicking on a date in the time-line (e.g. 1790) will display the typical combinations for ‘Revolution’ in the corresponding decade; clicking on a word in the display area will open a window containing detailed information on that word, including a direct link to the respective underlying corpus hits. Alternatively, you can click on the “play” button to the left of the time-line to start an animation of the changes in typical word combinations over time. Playback speed can be adjusted with the vertical slider next to the “play” button.
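
The form-based recipe above could also be expressed as a single request URL, which is handy for bookmarking or sharing a query. The following is only a minimal sketch under stated assumptions: the base address is a placeholder (the guide refers only to "the DiaCollo query form"), and the lower-case parameter names are guessed from the field labels QUERY, DATE, SLICE, KBEST and FORMAT. Check the online help pages of the form you are using for the actual names.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the address of the DiaCollo
# query form you actually use. This URL is a placeholder only.
BASE_URL = "https://example.org/diacollo/"


def build_query_url(query, fmt="cloud", date="", slice_size=10, kbest=10,
                    base_url=BASE_URL):
    """Build a GET URL that mirrors the form fields named in the guide.

    The parameter names (query, date, slice, kbest, format) are an
    assumption derived from the field labels, not a documented API.
    """
    params = {
        "query": query,        # QUERY: the target term
        "date": date,          # DATE: empty means "whole corpus period"
        "slice": slice_size,   # SLICE: interval size in years
        "kbest": kbest,        # KBEST: maximum number of collocates
        "format": fmt,         # FORMAT: e.g. the word-cloud display
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Steps 2 to 4 of the guide: query 'Revolution', format "Cloud".
    print(build_query_url("Revolution"))
```

The sketch only assembles the URL and does not contact any server, so it can be run safely before the real endpoint and parameter names have been confirmed.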

You can modify the basic recipe above in various ways, for example by changing the queried time period (DATE) and/or the size of the intervals on the time-line (SLICE). You can also change the maximal number of displayed collocates (KBEST) or the mode of visual presentation (FORMAT). Additional corpora and further modes of application are also available. For instance, you can use DiaCollo to display the differences or the similarities between two different words on the basis of their typical collocates over a given time period, or to directly compare the typical collocates of a single word in two different time periods. Further details and examples can be found in the full CLARIN-D DiaCollo use-case (in German), as well as in DiaCollo's online help pages.
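
To make the roles of SLICE and KBEST and the idea of comparing two periods more concrete, here is a small illustrative sketch. It is not DiaCollo's implementation: the function names are hypothetical, the ranking by absolute score difference is a stand-in for whatever comparison DiaCollo actually performs, and the collocate names and scores in the usage part are invented placeholders rather than corpus results.

```python
def slice_start(year, slice_size=10):
    """Map a year to the first year of its time-line interval."""
    return year - year % slice_size


def timeline(first, last, slice_size=10):
    """List the interval start years shown on the time-line."""
    return list(range(slice_start(first, slice_size), last + 1, slice_size))


def compare_profiles(early, late, kbest=10):
    """Rank collocates by how much their score changed between periods.

    `early` and `late` map collocate -> association score. A collocate
    missing from one profile counts as score 0.0 there. Ranking by
    absolute difference is an assumption made for this illustration.
    """
    words = set(early) | set(late)
    diffs = {w: late.get(w, 0.0) - early.get(w, 0.0) for w in words}
    # Largest absolute change first; ties broken alphabetically.
    ranked = sorted(diffs.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
    return ranked[:kbest]


if __name__ == "__main__":
    # The guide's default time-line: 1610 to 1910 in 10-year intervals.
    print(len(timeline(1610, 1910)), "intervals")

    # Invented placeholder profiles, NOT real DiaCollo output.
    early = {"collocate_a": 2.0, "collocate_b": 1.0}
    late = {"collocate_b": 3.0, "collocate_c": 0.5}
    for word, change in compare_profiles(early, late, kbest=2):
        print(word, change)
```

Changing `slice_size` coarsens or refines the time-line in the same way as the SLICE field, while `kbest` caps the number of collocates returned, as the KBEST field does for the display.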

Additional versions of this guide

A more detailed guide with examples in German is available in PDF format.