1 Corpus

2 Dictionary draft

3 Postediting

An automatically generated dictionary draft requires postediting by a team of human editors. With the Dictionary Express method, editors never edit a complete entry. Instead, they edit one entry component at a time, for example word senses, across all entries in the dictionary.

This component-by-component approach (rather than the traditional entry-by-entry approach) makes it possible to quickly train the editors in one clearly defined task. The editors do not even need to be expert lexicographers.

Headword list postediting

Dictionary postediting starts with the headword list. The editors assign flags to the candidate items to identify those which qualify as dictionary headwords. Keyboard shortcuts and automatic advancing to the next entry make the task even more efficient.

Image on the right: An example of the headword list editing interface. The concrete flags differ between dictionary projects based on the requirements of the customer.

Word sense postediting

With Dictionary Express, the word senses of a headword are suggested automatically based on collocations. The editors name the senses, and validate or reassign the collocations to the correct senses. The sense names can be included in the entries as sense flags or labels (also called disambiguating glosses).

During this process, the editors also merge or split word senses as necessary.

The postedited word sense information is then fed back to the source corpus and used for generating the remaining entry components, e.g. examples, which are identified taking word senses into account.

Image on the right: Word sense postediting interface. Automatically suggested word senses of the English word bat. Normally, a higher number of collocations is presented to the editor for each sense. This is only an example.

1

Editors name the senses and add new ones if required. The sense names can be included in the dictionary as sense labels or flags.

2

Editors validate the collocations, reassign them to different senses, mark them as MIXED (non-disambiguating collocations) or exclude them completely.

3

A link to sentences from which this collocation was extracted can be used by the editors to refer to the source corpus data to better understand the usage and context.
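The editing operations described above (naming senses, reassigning collocations, marking them as MIXED or excluding them, merging senses) can be pictured with a small data model. This is only an illustrative sketch, not the Dictionary Express implementation: the class SenseDraft, its method names and the sample collocations for bat are all made up for the example.

```python
from dataclasses import dataclass, field

# Special targets for collocations that do not belong to a single sense.
MIXED = "MIXED"        # non-disambiguating collocations
EXCLUDED = "EXCLUDED"  # collocations removed from the entry completely


@dataclass
class SenseDraft:
    """Hypothetical container for one headword's senses during postediting."""
    headword: str
    # sense name -> collocations currently assigned to that sense
    senses: dict[str, set[str]] = field(default_factory=dict)
    mixed: set[str] = field(default_factory=set)
    excluded: set[str] = field(default_factory=set)

    def _detach(self, collocation: str) -> None:
        """Remove the collocation from wherever it is currently stored."""
        for members in self.senses.values():
            members.discard(collocation)
        self.mixed.discard(collocation)
        self.excluded.discard(collocation)

    def rename_sense(self, old: str, new: str) -> None:
        """Replace an automatic placeholder name with the editor's sense name."""
        self.senses[new] = self.senses.pop(old)

    def assign(self, collocation: str, target: str) -> None:
        """Move a collocation to a sense (created if new), MIXED or EXCLUDED."""
        self._detach(collocation)
        if target == MIXED:
            self.mixed.add(collocation)
        elif target == EXCLUDED:
            self.excluded.add(collocation)
        else:
            self.senses.setdefault(target, set()).add(collocation)

    def merge_senses(self, keep: str, absorb: str) -> None:
        """Merge two senses, keeping the name of the first."""
        self.senses[keep] |= self.senses.pop(absorb)


# Invented sample data: two automatically suggested senses of "bat".
draft = SenseDraft("bat", {
    "sense 1": {"baseball bat", "vampire bat"},
    "sense 2": {"fruit bat"},
})
draft.rename_sense("sense 1", "sports equipment")
draft.rename_sense("sense 2", "animal")
draft.assign("vampire bat", "animal")  # reassign to the correct sense
draft.assign("old bat", MIXED)         # does not disambiguate the senses
```

Splitting a sense corresponds to calling assign with a new sense name for some of its collocations.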

Example sentences

Example sentences extracted using the GDEX method are validated and edited by the editors in an easy-to-use interface. Editors mark each example as good or bad, or edit it. For example, they change a question to an affirmative sentence or remove wording which does not contribute to understanding the usage.

Image on the right: Interface for editing example sentences. The sentences are grouped by the word senses identified in the previous step. The editors may also be required to pick one best example or to mark only a specified number of examples as good. The interface will then contain the corresponding controls and will check that the conditions are met.
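The kind of condition check mentioned in the caption could look roughly like the sketch below. It is an assumption-laden illustration, not the actual interface logic: the function name, the "good" / "bad" / "best" labels and both parameters are hypothetical, and "a specified number" is interpreted here as an exact count, although a project might instead set an upper limit.

```python
from collections import Counter


def check_example_marks(marks, require_one_best=False, good_count=None):
    """Check the editor's marks for the examples of one word sense.

    marks: one label per example sentence, "good", "bad" or "best"
           (hypothetical encoding).
    Returns a list of problem messages; an empty list means all
    configured conditions are met.
    """
    counts = Counter(marks)
    problems = []
    # Condition 1 (optional): exactly one example is picked as the best one.
    if require_one_best and counts["best"] != 1:
        problems.append(
            f"expected exactly 1 best example, found {counts['best']}"
        )
    # Condition 2 (optional): exactly `good_count` examples are marked good.
    if good_count is not None and counts["good"] != good_count:
        problems.append(
            f"expected {good_count} good examples, found {counts['good']}"
        )
    return problems
```

For instance, check_example_marks(["good", "bad"], require_one_best=True) reports one problem because no example has been picked as best, while the same marks with good_count=1 and no best-example requirement pass.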

Thesaurus

synonyms, antonyms and similar words (semantic field)

Dictionary entries can also contain thesaurus items. The candidate words are generated automatically using the thesaurus tool in Sketch Engine. The editors then sort the suggestions into three groups:

  • synonyms – words with a similar meaning
  • antonyms – words with an opposite meaning
  • similar words – words which belong to the same semantic group (semantic field) but are neither synonyms nor antonyms