10. Patterns
10.1. Inheritance of xml:lang
Some elements in TEI Lex-0, such as <entry>, have a required xml:lang attribute; others, such as <form> or <quote>, do not. In general, TEI Lex-0, unlike TEI, recommends attaching xml:lang to so-called container elements (for instance, <entry> and <cit>) rather than to individual word forms or textual segments.
TODO: Add some examples
So how can we extract all orthographic forms in a particular language, for instance English? We can use an XPath expression like this: //orth[ancestor-or-self::*[@xml:lang][1][@xml:lang='en']].
This XPath expression identifies:
- every orth element, regardless of where it is in the document (//),
- but only if the element itself or one of its ancestors has an @xml:lang attribute (ancestor-or-self::*[@xml:lang]);
- when looking for elements with the @xml:lang attribute, we stop at the first one found, i.e. the nearest one, which may be the orth element itself ([1]);
- finally, we keep only those orth elements whose nearest @xml:lang value is 'en' ([@xml:lang='en']).
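The four steps above can also be mimicked without a full XPath engine. The following Python sketch uses only the standard library's xml.etree.ElementTree, whose limited XPath support has no ancestor axis, so it walks the tree top-down and passes the nearest xml:lang value down to every element. The sample entries, the translationEquivalent type value and the function names iter_with_lang and orths_in_language are illustrative assumptions, not part of TEI Lex-0; the TEI namespace is omitted so the element names match the XPath above.

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:lang under the XML namespace URI, to which the
# reserved "xml" prefix is always bound.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: two entries, each with a translation in the other language.
# The TEI namespace is left out so element names match the un-prefixed XPath.
SAMPLE = """<body>
  <entry xml:lang="en">
    <form type="lemma"><orth>book</orth></form>
    <sense>
      <cit type="translationEquivalent" xml:lang="fr">
        <form><orth>livre</orth></form>
      </cit>
    </sense>
  </entry>
  <entry xml:lang="fr">
    <form type="lemma"><orth>chat</orth></form>
    <sense>
      <cit type="translationEquivalent" xml:lang="en">
        <form><orth>cat</orth></form>
      </cit>
    </sense>
  </entry>
</body>"""


def iter_with_lang(elem, inherited=None):
    """Yield (element, language) pairs in document order, where language is
    the xml:lang of the nearest ancestor-or-self, or None if there is none."""
    lang = elem.get(XML_LANG, inherited)  # own xml:lang overrides the inherited one
    yield elem, lang
    for child in elem:
        yield from iter_with_lang(child, lang)


def orths_in_language(root, code):
    """Rough equivalent of //orth[ancestor-or-self::*[@xml:lang][1][@xml:lang=code]]."""
    return [el for el, lang in iter_with_lang(root)
            if el.tag == "orth" and lang == code]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print([orth.text for orth in orths_in_language(root, "en")])  # ['book', 'cat']
```

Passing the inherited value down the tree plays the role of the [1] predicate: an xml:lang on a <cit> overrides the one on the enclosing <entry>, so "livre" counts as French even though it sits inside an English entry.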
If your dictionary uses multiple language tags for one language (such as 'en', 'en-GB' and 'en-US') and you want to capture all of these varieties with a single XPath expression, you can use the XPath lang() function, as in: //orth[ancestor-or-self::*[@xml:lang][1][lang('en')]].
While the predicate [@xml:lang='en'] matches only those elements whose xml:lang is exactly 'en', the predicate [lang('en')] matches all elements whose language is tagged either as English ('en') or as one of its 'sublanguages', i.e. any tag beginning with 'en-', such as 'en-GB'. The lang() comparison also ignores case.
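To make the difference between the two predicates concrete, here is a hedged Python sketch. The function lang_matches reimplements the comparison that the XPath 1.0 specification defines for lang() (equality ignoring case, optionally after dropping a suffix that starts with '-'), and orths_matching applies either test to the inherited language of each orth. The helper names and the sample entries are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical entries tagged with English varieties, plus one French entry.
SAMPLE = """<body>
  <entry xml:lang="en-GB"><form><orth>colour</orth></form></entry>
  <entry xml:lang="en-US"><form><orth>color</orth></form></entry>
  <entry xml:lang="en"><form><orth>book</orth></form></entry>
  <entry xml:lang="fr"><form><orth>couleur</orth></form></entry>
</body>"""


def lang_matches(tag, code):
    """Mimic the test XPath 1.0 lang(code) applies to an xml:lang value:
    equal to code ignoring case, or equal to it once a suffix starting
    with '-' is dropped (so 'en-GB' matches 'en', but 'eng' does not)."""
    if tag is None:  # no xml:lang in scope: lang() returns false
        return False
    tag, code = tag.lower(), code.lower()
    return tag == code or tag.startswith(code + "-")


def orths_matching(root, test):
    """Collect orth texts whose nearest ancestor-or-self xml:lang passes test."""
    found = []

    def walk(elem, inherited):
        lang = elem.get(XML_LANG, inherited)  # nearest xml:lang so far
        if elem.tag == "orth" and test(lang):
            found.append(elem.text)
        for child in elem:
            walk(child, lang)

    walk(root, None)
    return found


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    # Like [@xml:lang='en']: exact match only.
    print(orths_matching(root, lambda lang: lang == "en"))  # ['book']
    # Like [lang('en')]: 'en' and its sublanguages.
    print(orths_matching(root, lambda lang: lang_matches(lang, "en")))  # ['colour', 'color', 'book']
```

In XPath 1.0, lang() itself takes the language from the nearest xml:lang on the context node or its ancestors, which is why the sketch applies it to the inherited value rather than only to attributes on <orth>.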
If you are new to XPath, you can check out the DARIAH-Campus tutorial XPath for Dictionary Nerds.

