TEI Lex-0

— A baseline encoding for lexicographic data

5. Senses

In the current TEI Dictionary Chapter, the content model of <entry> allows sense-related information to appear directly within <entry>. TEI Lex-0 prescribes a stricter use of these elements so that sense-related information is grouped within the <sense> element, in accordance with the underlying semasiological model implemented in the TEI Guidelines.

<sense> should therefore be considered mandatory for any dictionary entry that actually provides sense information for the headword. Later in this document, we consider some additional specific cases, e.g. “referencing” entries (entries that simply point to other entries) and inflectional lexica (dictionaries that describe word forms only), where <sense> is not a mandatory child of <entry>.

As a consequence of making the use of <sense> more systematic within <entry>, we have seen (see section on <entry>) that some elements are no longer allowed as children of <entry>. We provide here a specific background for each of them:

  • <def> is clearly intended to provide a prose description of a meaning within a <sense> element and should not appear in any other context;
  • In the same way, it is recommended that <cit> be used exclusively as a child of <sense>, or when necessary within <dictScrap>;
  • The case of <hom> is peculiar since it provides a subordinate organization to an entry which is redundant in relation to what <sense> allows one to represent. <hom> is not allowed in TEI Lex-0.

Note: When one has to deal with information that does not fit a <sense>-based organization, for instance when retro-digitizing an existing dictionary source, the use of <dictScrap> is recommended. Further steps in the encoding of the lexical content may lead to a more precise encoding in a second phase.

In TEI Lex-0, <sense> has a mandatory xml:id.
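
As a minimal sketch of these constraints, the hypothetical entry below keeps <def> and <cit> inside a <sense> that carries its own xml:id. The headword, the wording and all identifiers (EX.lamp, EX.lamp.1) are invented for illustration and do not come from any dictionary:

    <entry xml:id="EX.lamp" xml:lang="en">
     <!-- hypothetical entry; every xml:id value here is illustrative -->
     <form type="lemma">
      <orth>lamp</orth>
     </form>
     <gramGrp>
      <gram type="pos">noun</gram>
     </gramGrp>
     <!-- sense-related information is grouped in <sense>, which has an xml:id -->
     <sense xml:id="EX.lamp.1">
      <def>a device that gives light</def>
      <cit type="example">
       <quote>She switched on the lamp.</quote>
      </cit>
     </sense>
    </entry>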

5.1. Limiting contexts for def

In the current TEI Guidelines, <def> is allowed within a number of elements besides <sense>.

TEI Lex-0 allows the use of <def> in <sense> only. All other existing contexts would be implemented by embedding <def> within a <sense>.
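
The restriction can be illustrated with a small before-and-after sketch. The headword and identifiers are invented for illustration. The first fragment shows the looser encoding, with <def> as a direct child of <entry>; the second shows one possible TEI Lex-0 counterpart:

    <!-- looser encoding: <def> directly inside <entry>; not allowed in TEI Lex-0 -->
    <entry xml:id="EX.quill" xml:lang="en">
     <form type="lemma">
      <orth>quill</orth>
     </form>
     <def>a pen made from a feather</def>
    </entry>

    <!-- TEI Lex-0: the same <def> embedded in a <sense> with its own xml:id -->
    <entry xml:id="EX.quill" xml:lang="en">
     <form type="lemma">
      <orth>quill</orth>
     </form>
     <sense xml:id="EX.quill.1">
      <def>a pen made from a feather</def>
     </sense>
    </entry>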

5.2. Glosses

In the lexicographic literature, gloss is a rather amorphous category. Zgusta, in his classic Manual of Lexicography (1971), defines it as "any descriptive or explanatory note within the entry" which includes "short comments, explanatory remarks, semantic characteristics or qualifications" (270). Atkins and Rundell (2008) see the gloss as "a more informal explanation of the meaning of a multiword expression or example (or even part of one) in the entry,[...] chiefly used in monolingual dictionaries for learners, to help understanding" (209). While one could argue about the statement that this type of lexicographic construct is used "chiefly... in monolingual dictionaries for learners", it is certainly the case that glosses are expected to help users better understand or more easily locate the particular meaning of a word that they are looking up.

In other words, the prototypical gloss contextualizes and clarifies the meaning of the word. Take this example from Zgusta:
  1. fugitive (of persons)
  2. fugitive (verses)
Here, glosses are used to signal the meaning of fugitive: in the first sense, "fugitive" refers to persons, and in the second, to verses. In TEI Lex-0, this could be represented as:
    <entry xml:id="ED.fugitive" xml:lang="en">
     <form type="lemma">
      <orth>fugitive</orth>
     </form>
     <sense n="1">
      <gloss>(of persons)</gloss>
     </sense>
     <sense n="2">
      <gloss>(verses)</gloss>
     </sense>
    </entry>
Glosses, however, are not definitions: one can imagine the above two senses to contain proper lexicographic definitions as well:
    <entry xml:id="ED.fugitive" xml:lang="en">
     <form type="lemma">
      <orth>fugitive</orth>
     </form>
     <sense n="1">
      <gloss>(of persons)</gloss>
      <def>given to, or in the act of, running away from a place, especially to avoid arrest or persecution.</def>
     </sense>
     <sense n="2">
      <gloss>(verses)</gloss>
      <def>concerned or dealing with subjects of passing interest; ephemeral, occasional.</def>
     </sense>
    </entry>
Zgusta notes a certain amount of overlap between glosses and other categories, "the most important probably being that of the examples" (ibid.). This is especially evident in sense no. 2 above, where "fugitive verses" or "~ verses" could have been used as an example. The absence of the lemma or a lemma reference in "(verses)", as well as the brackets, clearly indicates that the whole construct is not to be read as an example, but rather as a semantic signpost for the given sense.
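
To make the contrast concrete, here is a sketch of how sense no. 2 might look if the source had printed "fugitive verses" as an example rather than as a bracketed gloss. This is a hypothetical alternative, not an encoding of the entry as actually printed, and the identifier ED.fugitive.2 is an assumption:

    <sense n="2" xml:id="ED.fugitive.2">
     <!-- hypothetical reading: the source prints "fugitive verses" as an example -->
     <def>concerned or dealing with subjects of passing interest; ephemeral, occasional.</def>
     <cit type="example">
      <quote>fugitive verses</quote>
     </cit>
    </sense>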

On sense-distinguishing grammatical properties, see the section Grammatical properties in senses.

5.2.1. Glossing examples

Semantic glosses can occur at different levels of the entry hierarchy. In the previous section, we saw examples in which glosses were used as a kind of semantic shorthand for an individual sense. They can, however, be used to further qualify individual examples in the entry. Take, for instance, this entry from the Longman Dictionary of Contemporary English (2003):

living /... / adj 1 alive now [...] | The sun affects all living things (=people, animals, and plants). | A living language (=one that people still use) [….]

In TEI Lex-0, this entry would be represented as:

    <entry xml:id="LDOCE.living" xml:lang="en"
     type="mainEntry">
     <form type="lemma">
      <orth>living</orth>
     </form>
     <gramGrp>
      <gram type="pos">adj</gram>
     </gramGrp>
     <sense n="1" xml:id="LDOCE.living.1">
      <num>1</num>
      <def>alive now 
       <!--[...] -->
      </def>
      <metamark>|</metamark>
      <cit type="example">
       <quote>The sun affects all <ref type="entry" scope="currentEntry">living</ref>
           things <gloss>(=people, animals, and plants)</gloss>.</quote>
      </cit>
      <metamark>|</metamark>
      <cit type="example">
       <quote>A <ref type="entry" scope="currentEntry">living</ref> language <gloss>(=one
             that people still use)</gloss>
        <!--[….] -->
       </quote>
      </cit>
     </sense>
    </entry>
Gadsby (ed.) (2003)

5.3. Grammatical properties

In some dictionaries, individual senses may be associated with grammatical properties, such as part of speech or gender, that differ from the rest of the entry: for instance, a particular sense of a countable noun may be used only in the plural. In such cases, <gramGrp> is naturally placed inside the given <sense>.

Consider, for instance, the second sense of this entry:

    <sense xml:id="DLPC.antepassado_b_2" n="2">
     <gramGrp>
      <gram type="number">pl.</gram>
     </gramGrp>
     <def>Pessoas anteriormente ao momento actual.</def>
     <xr type="synonymy">
      <ref type="sense">antecessores</ref>
     </xr>
     <xr type="antonymy">
      <ref type="sense">vindouros</ref>
     </xr>
     <cit type="example">
      <quote>Herdámos estes costumes dos nossos antepassados.</quote>
     </cit>
     <cit type="example">
      <quote>Culto dos antepassados.</quote>
     </cit>
    </sense>
DLPC (2001)

5.3.1. Grammatical glosses?

Zgusta also uses "gloss" to describe "grammatical indications in the broadest sense of the word" (1971, 240), using an example familiar from Latin (and many other) dictionaries:

  1. petere aliquid ab aliquo [to ask for something from somebody]
  2. petere Romam [to rush to Rome]

In theory, one could choose to encode such phenomena using <gloss>, but TEI Lex-0 recommends a clear separation of roles: <gloss> should be used for semantic or pragmatic information, whereas grammatical information should be encoded using the familiar gramGrp/gram constructs:

    <sense n="1" xml:id="LD.peto.1">
     <gramGrp>
      <gram type="government">aliquid ab aliquo</gram>
     </gramGrp>
    </sense>
    <sense n="2" xml:id="LD.peto.2">
     <gramGrp>
      <gram type="government">Romam</gram>
     </gramGrp>
    </sense>

Here, too, it is important to note the possibility of ambiguity: unlike "petere aliquid ab aliquo", "petere Romam" could be interpreted as an example. The decision on such ambiguous cases should never be taken in isolation: editors of a digital edition need to consider the conventions of the dictionary as a whole before advising encoders on how to mark up such ambiguous cases.
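
For comparison, if the conventions of the dictionary as a whole suggested that "petere Romam" is an example rather than a statement about government, the second sense could be sketched along the following lines. This is one possible alternative under that assumption, not a recommendation for this particular entry:

    <sense n="2" xml:id="LD.peto.2">
     <!-- alternative reading: "petere Romam" treated as a usage example -->
     <cit type="example">
      <quote>petere Romam</quote>
     </cit>
    </sense>

Whichever reading is chosen, it should be applied consistently to all comparable cases in the same dictionary.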

5.3.2. Nested entries vs. multiple senses

While TEI Lex-0 has been created to simplify the choices available for encoding various lexicographic components, certain levels of ambiguity remain, often due to the highly condensed nature of dictionary content.

Consider, for instance, this entry:

Is this an entry with two senses? Or are these two entries that were merged into one for the sake of typographic density?

The answer is as much in the eyes of the beholder as it is in the eyes of the lexicographers behind the dictionary that the entry stems from, in this case The Chambers Dictionary. Both the encoder and the lexicographers, however, are influenced by the lexicographic and linguistic traditions in which they operate. For an overview of the homonymy-polysemy dilemma, see, for instance, Zöfgen 1989.

It can't be stressed enough that the goal of dictionary encoding is not to resolve linguistic disputes or evaluate lexicographic traditions but rather to create consistent, if abstracted, representations of lexicographic architectures.

So, what can we do in this particular case? Should we encode gash as an entry consisting of senses, each with a different part of speech, like this:

    <entry xml:id="CHDOEL.gash2" xml:lang="en">
     <!--this, as we'll explain later, is valid but not the preferred encoding-->
     <form type="lemma">
      <orth>gash</orth>
      <pron>gash</pron>
     </form>
     <lbl type="homNum" rend="sup">2</lbl>
     <sense xml:id="CHDOEL.gash2.1">
      <pc>(</pc>
      <usg type="socioCultural" expand="slang">sl</usg>
      <pc>)</pc>
      <gramGrp>
       <gram type="pos">adj</gram>
      </gramGrp>
      <def>spare, extra</def>
      <pc>.</pc>
     </sense>
     <metamark function="senseSeparator"></metamark>
     <sense xml:id="CHDOEL.gash2.2">
      <gramGrp>
       <gram type="pos">n</gram>
      </gramGrp>
      <pc>(</pc>
      <usg type="temporal" expand="originally">orig</usg>
      <lbl>and esp</lbl>
      <usg type="domain" expand="nautical">naut</usg>
      <pc>)</pc>
      <def>rubbish, waste</def>
      <pc>.</pc>
     </sense>
    </entry>

This is surely valid TEI Lex-0. There is conceptually nothing wrong with this encoding: it adequately represents the structure implied by the source text.

We should, however, try to look at the issue at hand from a broader, comparative, perspective.

  • In the Portuguese polysemous entry antepassado above, we had a case in which one particular sense (used in plural only) deviated from the other senses (which are used in both singular and plural). Since the senses were numbered in the original, there was never any doubt about how we would encode this. It was clear from the outset:
    • that the semantic information in that entry was grouped by a construct called <sense>;
    • that senses inherited grammatical properties from the entry as a whole (i.e. entry/gramGrp);
    • that, implicitly, we could assume that each sense can be used with the noun in both singular and plural; and
    • that the plural-only sense was grammatically exceptional (hence entry/sense/gramGrp).
  • The English example is different: gash as an adjective and as a noun are grammatical homonyms. If we encode them, as we did above, as two senses within one entry, we end up with an entry in which there is no inheritance (of grammatical properties) and only exceptions (at each sense-level).

Because TEI Lex-0 is aimed at creating a baseline encoding to facilitate data exchange and comparison between different dictionaries, we recommend encoding grammatical homonyms in TEI Lex-0 as nested entries and using <gramGrp> in <sense> constructs to mark up sense-specific deviations from the rule of grammatical inheritance.

For that reason, our preferred encoding of gash as an adjective and a noun would be:

    <entry xml:id="CH.gash2" xml:lang="en">
     <form type="lemma">
      <orth>gash</orth>
      <pron>gash</pron>
     </form>
     <lbl type="homNum" rend="sup">2</lbl>
     <entry xml:id="CH.gash2.1" xml:lang="en"
      type="homonymicEntry">
      <sense xml:id="CH.gash2.1.1">
       <pc>(</pc>
       <usg type="socioCultural" expand="slang">sl</usg>
       <pc>)</pc>
       <gramGrp>
        <gram type="pos">adj</gram>
       </gramGrp>
       <def>spare, extra</def>
       <pc>.</pc>
      </sense>
     </entry>
     <metamark function="entrySeparator"></metamark>
     <entry xml:id="CH.gash2.2" xml:lang="en"
      type="homonymicEntry">
      <gramGrp>
       <gram type="pos">n</gram>
      </gramGrp>
      <sense xml:id="CH.gash2.2.1">
       <pc>(</pc>
       <usg type="temporal" expand="originally">orig</usg>
       <lbl>and esp</lbl>
       <usg type="domain" expand="nautical">naut</usg>
       <pc>)</pc>
       <def>rubbish, waste</def>
       <pc>.</pc>
      </sense>
     </entry>
    </entry>

For an example in which grammatical homonyms have themselves multiple senses, one of which is grammatically constrained, see, for instance:

    <entry xml:id="ED.aid" xml:lang="en">
     <form type="lemma">
      <orth>aid</orth>
      <pron>/ed/</pron>
     </form>
     <entry xml:id="ED.aid_n" xml:lang="en"
      type="homonymicEntry">
      <gramGrp>
       <gram type="pos">noun</gram>
      </gramGrp>
      <sense xml:id="ED.aid_n.1" n="1">
       <num>1.</num>
       <gramGrp>
        <gram type="number"
         value="singularia tantum"/>
       </gramGrp>
       <def>help, especially money, food or other gifts given to people living in
           difficult conditions</def>
       <metamark function="exampleMarker"></metamark>
       <cit type="example">
        <quote>aid to the earth-quake zone</quote>
       </cit>
       <cit type="example">
        <quote>an aid worker</quote>
       </cit>
       <note>(NOTE: This meaning of aid has no plural.)</note>
       <metamark function="relatedEntryMarker"></metamark>
       <entry type="relatedEntry"
        xml:id="ED.aid_n.1.in_aid_of" xml:lang="en">
        <form type="lemma">
         <orth>in aid of</orth>
        </form>
        <sense xml:id="ED.aid_n.1.in_aid_of.1">
         <def>in order to help</def>
         <metamark function="exampleMarker"></metamark>
         <cit type="example">
          <quote>We give money in aid of the Red Cross.</quote>
         </cit>
         <metamark function="exampleMarker"></metamark>
         <cit type="example">
          <quote>They are collecting money in aid of refugees.</quote>
         </cit>
        </sense>
       </entry>
      </sense>
      <sense xml:id="ED.aid_n.2" n="2">
       <num>2.</num>
       <def>thing which helps you to do something</def>
       <metamark function="exampleMarker"></metamark>
       <cit type="example">
        <quote>kitchen aids</quote>
       </cit>
      </sense>
     </entry>
     <metamark function="subentryMarker"></metamark>
     <entry xml:id="ED.aid_v" xml:lang="en"
      type="homonymicEntry">
      <gramGrp>
       <gram type="pos">verb</gram>
      </gramGrp>
      <sense xml:id="ED.aid.v.1" n="1">
       <num>1.</num>
       <def>to help something to happen</def>
      </sense>
      <sense xml:id="ED.aid.v.2" n="2">
       <num>2.</num>
       <def>to help someone</def>
      </sense>
     </entry>
    </entry>