1. Header
A lexical resource encoded in TEI Lex-0 must, like any TEI file, start with the root <TEI> element, which, in turn, must contain a <teiHeader> element.
Unlike TEI P5, however, TEI Lex-0 requires the @type attribute on the root <TEI> element, with the value "lex-0".
A TEI header contains information about the lexical resource itself, its source(s), its encoding, and its revisions. Proper, structured metadata of this kind is equally important for scholars using the resource, for software processing it, and for cataloguers in libraries and archives.
The TEI header of a lexical resource has five major parts:
- a file description, tagged <fileDesc>, provides a full bibliographic description of the electronic lexical resource itself as well as the source(s), analogue or digital, from which it may have been derived. For details, see section File Description below.
- an encoding description, tagged <encodingDesc>, describes the relationship between the electronic resource and its source(s). It allows for detailed description of whether (or how) the electronic resource was produced, transcribed or normalized, how the encoder resolved ambiguities in the source, what levels of encoding or analysis were applied etc.
- a profile description, tagged <profileDesc>, contains classificatory and contextual information about the lexical resource including its object and working languages.
- a container for external metadata, tagged <xenoData>, contains metadata from non-TEI schemas, for instance Dublin Core, MARCXML or MODS, if available.
- a revision history, tagged <revisionDesc>, contains a list of changes made during the development of the lexical resource, both before and after its official release.
Of these, two elements are required in TEI Lex-0: <fileDesc> and <profileDesc>. It is highly recommended to include additional information in <encodingDesc>. It is also an example of good practice to record changes in <revisionDesc>.
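As a rough orientation, the skeleton below sketches how these parts might fit together in a minimal TEI Lex-0 file. The title, publisher, licence and language shown are placeholders rather than values from a real resource, and the optional parts are indicated only as comments. It is an outline, not a file that is meant to validate as it stands.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0" type="lex-0">
  <teiHeader>
    <!-- required: bibliographic description of the resource -->
    <fileDesc>
      <titleStmt>
        <title type="full">Placeholder Dictionary Title</title>
      </titleStmt>
      <publicationStmt>
        <publisher>Placeholder Publisher</publisher>
        <availability>
          <licence target="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</licence>
        </availability>
      </publicationStmt>
      <!-- optional in TEI Lex-0: <sourceDesc> -->
    </fileDesc>
    <!-- highly recommended: <encodingDesc> -->
    <!-- required: languages used in the resource -->
    <profileDesc>
      <langUsage>
        <language ident="en" role="objectLanguage">
          <name>English</name>
        </language>
      </langUsage>
    </profileDesc>
    <!-- optional: <xenoData>, <revisionDesc> -->
  </teiHeader>
  <text>
    <body>
      <!-- dictionary entries go here -->
    </body>
  </text>
</TEI>
```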
1.1. File description
The bibliographic description of the given machine-readable lexical resource is absolutely essential for identifying the basic information about the resource itself, its creators and publishers as well as the conditions under which it is made available to the public.
The elements that make up <fileDesc> are:
- titleStmt (title statement) groups information about the title of a work and those responsible for its content.
- editionStmt (edition statement) groups information relating to one edition of a text.
- extent (extent) describes the approximate size of a text stored on some carrier medium or of some other object, digital or non-digital, specified in any convenient units.
- publicationStmt (publication statement) groups information concerning the publication or distribution of an electronic or other text.
- seriesStmt (series statement) groups information about the series, if any, to which a publication belongs.
- sourceDesc (source description) describes the source(s) from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as ‘born digital’ for a text which has no previous existence.
<fileDesc> is a mandatory element in plain TEI as well, but in TEI Lex-0 there are some additional constraints and recommendations related to the content of this element.
- In <titleStmt>, TEI Lex-0 recommends the use of type on <title> (with values either full or abbr) to record both the full bibliographic title of the lexicographic resource and the preferred abbreviated title for easy reference, should one exist.
<titleStmt>
<title type="full">Lexicon Serbico-Germanico-Latinum</title>
<title type="abbr">LSGL</title>
</titleStmt>
- In <titleStmt>, TEI Lex-0 recommends the use of <persName> and <orgName> to distinguish between the names of persons and organizations. This is especially important since in some cases, the name of an institution is used to take up the collective authorship of a work.
- When using <persName>, TEI Lex-0 recommends further structuring the name with the elements <forename> and <surname>.
- In <publicationStmt>, TEI Lex-0 requires the use of <availability> to record the <licence> of the given lexicographic resource. In other words, a TEI Lex-0 resource must include explicit information on the conditions under which it can be used.
<publicationStmt>
<publisher>Ústav pro jazyk český AV ČR, v. v. i.</publisher>
<pubPlace>Praha</pubPlace>
<availability>
<licence target="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International (CC BY 4.0)</licence>
</availability>
</publicationStmt>
StčS (1999-2011)
- In addition to <publisher> and <distributor>, the <publicationStmt> in TEI Lex-0 may include information on any other <authority> responsible for creating or making the resource available.
- If using <authority>, TEI Lex-0 requires the use of role with values funder, sponsor or rightsHolder.
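The recommendations above on names and authorities could be combined as in the following sketch. All names are invented placeholders, and the choice of <author> and <editor> as wrappers for the names is an assumption made for illustration only.

```xml
<fileDesc>
  <titleStmt>
    <title type="full">Placeholder Dictionary Title</title>
    <title type="abbr">PDT</title>
    <!-- a person: name structured with forename and surname -->
    <author>
      <persName>
        <forename>Jane</forename>
        <surname>Doe</surname>
      </persName>
    </author>
    <!-- an institution standing in for collective responsibility -->
    <editor>
      <orgName>Placeholder Academy of Sciences</orgName>
    </editor>
  </titleStmt>
  <publicationStmt>
    <publisher>Placeholder Publisher</publisher>
    <!-- role takes one of: funder, sponsor, rightsHolder -->
    <authority role="funder">Placeholder Research Foundation</authority>
    <availability>
      <licence target="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International (CC BY 4.0)</licence>
    </availability>
  </publicationStmt>
</fileDesc>
```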
1.1.1. Source description
In TEI Lex-0, <sourceDesc> is an optional element. Born-digital resources or those which cannot be properly sourced do not require a <sourceDesc>.
If a resource is sourced, <sourceDesc> in TEI Lex-0 requires that the sources be grouped in <listBibl> elements:
- <listBibl type="dictionaries"></listBibl> lists all the dictionaries that were used as a source for the given dictionary; if you are retrodigitizing a print dictionary, your <listBibl> may include only one <biblStruct> with the bibliographic information about your print source;
- <listBibl type="literature"></listBibl> groups all the literature: for instance, all the sources used by the dictionary author to illustrate examples;
- <listBibl type="corpora"></listBibl> groups the information on all the corpora that were used in the production of the given lexicographic resource.
TEI Lex-0 requires the use of <biblStruct> for structuring bibliographic information about each individual source. This, too, is a departure from vanilla TEI which is more permissive in this respect.
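For the corpora group, a minimal sketch might look as follows. The corpus title, publisher and date are hypothetical placeholders. The structure assumes only the general TEI rule that <monogr> needs a <title> and an <imprint>.

```xml
<sourceDesc>
  <listBibl type="corpora">
    <biblStruct>
      <monogr>
        <!-- placeholder corpus, not a real resource -->
        <title>Placeholder Reference Corpus</title>
        <imprint>
          <publisher>Placeholder Institute</publisher>
          <date>2020</date>
        </imprint>
      </monogr>
    </biblStruct>
  </listBibl>
</sourceDesc>
```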
<sourceDesc>
<listBibl type="dictionaries">
<biblStruct>
<monogr>
<title>Vocabulário Ortográfico da Língua Portuguesa</title>
<author>
<orgName>Academia das Ciências</orgName>
</author>
<imprint>
<publisher>Imprensa Nacional de Lisboa</publisher>
<date>1940</date>
</imprint>
<extent>
<measure unit="volumes" quantity="1">1 volume</measure>
<measure unit="pages" quantity="821">821 pp.</measure>
</extent>
</monogr>
</biblStruct>
</listBibl>
</sourceDesc>
VOLP (1940)
<sourceDesc>
<listBibl type="dictionaries">
<biblStruct>
<monogr>
<author>
<persName>
<forename>Wolfgang</forename>
<surname>Pfeifer</surname>
</persName>
</author>
<title>Etymologisches Wörterbuch des Deutschen</title>
<edition>2</edition>
<imprint>
<publisher>Akademie Verlag</publisher>
<pubPlace>Berlin</pubPlace>
<date>1993</date>
<note>with additional notes by the author</note>
</imprint>
</monogr>
</biblStruct>
</listBibl>
</sourceDesc>
EtymWB-XML (2009)
<sourceDesc>
<listBibl type="dictionaries">
<biblStruct>
<monogr>
<title level="m" type="main">Staročeský slovník</title>
<title level="m" type="sub">[Seš.] 1–26: na – při</title>
<editor>
<persName>
<forename>Bohuslav</forename>
<surname>Havránek</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Vladimír</forename>
<surname>Šmilauer</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Václav</forename>
<surname>Křístek</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Jan</forename>
<surname>Petr</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Igor</forename>
<surname>Němec</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Emanuel</forename>
<surname>Michálek</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Jaroslava</forename>
<surname>Pečírková</surname>
</persName>
</editor>
<imprint>
<date>1968–2008</date>
<publisher>Academia</publisher>
<pubPlace>Praha</pubPlace>
</imprint>
</monogr>
</biblStruct>
<biblStruct>
<monogr>
<title level="m" type="main">Slovník staročeský</title>
<title level="m" type="sub">A – J</title>
<author>
<persName>
<forename>Jan</forename>
<surname>Gebauer</surname>
</persName>
</author>
<edition>druhé, nezměněné vydání</edition>
<imprint>
<date>1970</date>
<publisher>Academia</publisher>
<pubPlace>Praha</pubPlace>
</imprint>
</monogr>
</biblStruct>
<biblStruct>
<monogr>
<title level="m" type="main">Slovník staročeský</title>
<title level="m" type="sub">K – N</title>
<author>
<persName>
<forename>Jan</forename>
<surname>Gebauer</surname>
</persName>
</author>
<edition>druhé, nezměněné vydání</edition>
<imprint>
<date>1970</date>
<publisher>Academia</publisher>
<pubPlace>Praha</pubPlace>
</imprint>
</monogr>
</biblStruct>
</listBibl>
</sourceDesc>
StčS (1999-2011)
<sourceDesc>
<listBibl type="dictionaries">
<biblStruct>
<monogr>
<title level="m" type="main">Diccionario da lingua portugueza composto pelo padre D. Rafael Bluteau, reformado, e accrescentado por Antonio de Moraes Silva, natural do Rio de Janeiro</title>
<title level="m" type="sub">A – K</title>
<author>
<persName>
<forename>António de</forename>
<surname>Morais Silva</surname>
</persName>
</author>
<imprint>
<publisher>Officina de Simão Thaddeo Ferreira</publisher>
<pubPlace>Lisboa</pubPlace>
<date when="1789">1789</date>
<note>Com Licença da Real Meza da Comissão Geral, sobre o Exame, e Censura dos Livros.</note>
<note>Vende-ſe na loja de Borel Borel, e Companhia, quaſi defronte da Igreja nova de Noſſa Senhora dos Martyres, na eſquina.</note>
</imprint>
<biblScope unit="volume" n="1">Tomo primeiro</biblScope>
<extent>
<measure unit="pages" quantity="752">752 pp.</measure>
</extent>
</monogr>
</biblStruct>
<biblStruct>
<monogr>
<title level="m" type="main">Diccionario da lingua portugueza composto pelo padre D. Rafael Bluteau, reformado, e accrescentado por Antonio de Moraes Silva, natural do Rio de Janeiro</title>
<title level="m" type="sub">L – Z</title>
<author corresp="https://isni.org/isni/0000000083438040">
<persName>
<forename>António de</forename>
<surname>Morais Silva</surname>
</persName>
</author>
<imprint>
<publisher>Officina de Simão Thaddeo Ferreira</publisher>
<pubPlace>Lisboa</pubPlace>
<date>1789</date>
</imprint>
<biblScope unit="volume" n="2">Tomo segundo</biblScope>
<extent>
<measure unit="pages" quantity="541">541 pp.</measure>
</extent>
</monogr>
</biblStruct>
</listBibl>
<listBibl type="literature">
<biblStruct>
<monogr corresp="https://purl.pt/29333">
<title>Abecedario Real e Regia Instrucçam dos Principes Lusitanos, composto de 63. discursos Politicos, &amp; Moraes : offerecido ao Serenissimo Principe Dom Joam N.S. / pelo M.R.P. Fr. Joam dos Prazeres, Prègador Gèral, &amp; Chronista mòr da Religiaõ do Principe dos Patriarcas Sam Bento</title>
<author>
<persName>
<surname>Prazeres</surname>
<forename>João dos</forename>
</persName>
</author>
<imprint>
<date>1692</date>
<pubPlace>Lisboa</pubPlace>
<publisher>na Officina de Miguel Deslandes, Impressor de S. Magestade</publisher>
<note>More information found in BND ; 191 p.</note>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="Academ.sing">
<monogr corresp="https://purl.pt/21936">
<title>Academia dos ſingulares de Lisboa dedicadas a Apollo</title>
<author>
<persName>
<surname>Faria</surname>
<forename>André Leitão de</forename>
</persName>
</author>
<imprint>
<date>1665</date>
<pubPlace>Lisboa</pubPlace>
<publisher>na Officina de Henrique Valente de Oliveira</publisher>
<biblScope unit="volume">2 t. em 2 vol.</biblScope>
<note>More information found in BND; 2 vol.</note>
</imprint>
</monogr>
</biblStruct>
<!-- [...] -->
</listBibl>
</sourceDesc>
Silva (1789)
1.2. Encoding description
<encodingDesc> is an optional element, which can be used to document the methods and editorial principles which governed the transcription or encoding of the lexicographic resource in hand and may also include sets of coded definitions used elsewhere in the text.
For an explanation of how to encode a taxonomy of domain labels to be used for encoding usage labels, see section on hierarchical usage labels.
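As a rough illustration of what a set of coded definitions in <encodingDesc> can look like, the sketch below declares a small two-level taxonomy with the standard TEI elements <classDecl>, <taxonomy>, <category> and <catDesc>. The identifiers and labels are hypothetical. The section on hierarchical usage labels remains the authoritative description of how TEI Lex-0 expects such taxonomies to be built.

```xml
<encodingDesc>
  <classDecl>
    <!-- hypothetical taxonomy of domain labels -->
    <taxonomy xml:id="domains">
      <category xml:id="domain.science">
        <catDesc>Science</catDesc>
        <!-- nested category: a subdomain of the one above -->
        <category xml:id="domain.science.botany">
          <catDesc>Botany</catDesc>
        </category>
      </category>
    </taxonomy>
  </classDecl>
</encodingDesc>
```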
1.3. Profile description
In TEI P5, <profileDesc> is an optional element, whereas in TEI Lex-0, it is required: <profileDesc> itself requires <langUsage> and <langUsage> requires at least one <language> element. This is because for any lexical or lexicographic resource, it is essential to identify and record the language(s) used as part of the resource metadata.
Regarding the use of the required attribute role and its possible values (objectLanguage, workingLanguage, sourceLanguage or targetLanguage), see the specification details for <language>.
1.3.1. Language profiles
Lexicographic resources often deal not only with well-established languages, but also with language varieties: historical stages, regional dialects, sociolects, writing conventions, or project-specific subsets of a language. TEI Lex-0 requires that each language (or variety) used in the resource be identified with a standard language tag, and described in a structured way in the TEI header.
At the level of identification, TEI Lex-0 follows established practice: use IETF BCP 47 language tags (usually grounded in ISO 639) wherever possible. Because language varieties are not exhaustively standardized, projects may need to create private-use tags. If you do so, make sure to document them explicitly in the header so that the meaning of each tag is clear to both humans and software.
In TEI Lex-0, language profiles are recorded in profileDesc/langUsage. Each <language> element provides:
- a required identifier (ident) and a required functional role (role) such as objectLanguage or workingLanguage;
- one or more human-readable names (<name>), optionally multilingual using xml:lang;
- optional additional identifiers (<ident>) and descriptive metadata that characterize a language variety.
The language name can be given as plain text directly inside <language>:
<language ident="en" role="objectLanguage">English</language>
This remains technically valid for backwards compatibility. Moving forward, however, users are strongly encouraged to follow this pattern:
<language ident="en" role="objectLanguage">
<name>English</name>
</language>
so that the language name can be processed consistently alongside other descriptive dimensions in a way that is compatible with ISO 21636. The most common of these dimensions are:
- space (where the variety is used), using <settingDesc> with <place> or <listPlace>;
- time (when the variety is used), using <date> and its dating attributes;
- social group (who uses the variety), using <personGrp> (and, when needed, <langKnowledge>);
- medium (how it is communicated), using <channel> (e.g. spoken vs. written).
<language> is a very flexible element in TEI Lex-0. It allows you to record language profiles that range from basic identification of a dictionary's object and working languages to full-blown profiles describing a language variety or its speakers.
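The space dimension could be recorded along the following lines. This is a sketch only: the variety, the place name and the order of the child elements inside <language> are assumptions chosen for illustration, and should be checked against the specification of <language>.

```xml
<langUsage>
  <language ident="de-AT" role="objectLanguage">
    <name xml:lang="en">Austrian German</name>
    <!-- space: where the variety is used -->
    <settingDesc>
      <place>
        <placeName>Austria</placeName>
      </place>
    </settingDesc>
    <!-- medium: how the variety is communicated -->
    <channel mode="w">written</channel>
  </language>
</langUsage>
```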
<langUsage>
<language ident="ru-x-lit19c" role="objectLanguage">
<name xml:lang="ru">Русский литературный язык XIX века</name>
<name xml:lang="en">19th-century literary Russian</name>
<date notBefore="1800" notAfter="1899"/>
</language>
<language ident="ru" role="workingLanguage">
<name xml:lang="ru">Современный русский язык</name>
<name xml:lang="en">Contemporary Russian</name>
</language>
</langUsage>
<langUsage>
<language role="sourceLanguage" ident="yi">
<name xml:lang="en">Yiddish</name>
<name xml:lang="yi">ייִדיש</name>
</language>
<language role="targetLanguage" ident="en">
<name xml:lang="en">English</name>
<name xml:lang="yi">ענגליש</name>
</language>
</langUsage>
<langUsage>
<language status="active" role="objectLanguage" ident="fr">
<ident type="languageIdentifier" subtype="ISO639-2B">fre</ident>
<ident type="languageIdentifier" subtype="ISO639-2T">fra</ident>
<ident type="languageIdentifier" subtype="ISO639-1">fr</ident>
<name type="languageName" role="languageReferenceName" xml:lang="en">French</name>
<name type="languageName" xml:lang="fr">Français</name>
</language>
</langUsage>
<langUsage>
<language role="objectLanguage" ident="ar-x-shawi">
<name type="languageVariety">Shawi Arabic</name>
<personGrp>
<age>Different ages</age>
<gender>All genders</gender>
<faith>Muslim (Sunni)</faith>
<education>No specific level of education. Any formal education received in Turkish, no formal education in Arabic</education>
<nationality type="citizenship">Turkish</nationality>
<nationality type="ethnicity">Arab</nationality>
<residence/>
<occupation/>
<socecStatus>The speakers of this variety were formerly sheep and goat rearing nomads.</socecStatus>
<langKnowledge>
<langKnown tag="ar-x-shawi"/>
<langKnown tag="tr"/>
</langKnowledge>
<note/>
</personGrp>
<channel mode="s">spoken</channel>
</language>
</langUsage>
1.4. Revision description
<revisionDesc> is optional in both TEI and TEI Lex-0. The element is used to document the revision history of the given file. Each recorded revision should use the <change> element, together with the appropriate attributes: when to indicate the date of the change, resp to assign responsibility, and n to assign a number to the change.
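A revision history using these attributes might be sketched as follows. The dates, the change descriptions and the pointer #JD are hypothetical. The pointer is assumed to refer to a person identified elsewhere in the header. Listing the most recent change first is a common TEI convention rather than a requirement stated here.

```xml
<revisionDesc>
  <!-- most recent change first -->
  <change n="2" when="2024-03-01" resp="#JD">Corrected the licence information in the header.</change>
  <change n="1" when="2024-01-15" resp="#JD">Created the initial version of the file.</change>
</revisionDesc>
```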





