Efficient Subtyping for Unordered XML Types

While XML is an ordered data format, many applications outside the document processing area just drop ordering and manipulate XML data as they were unordered. In these contexts, hence, XML is essentially used as a way for representing unordered, unranked trees. The wide use of unordered XML data should be coupled with a careful and detailed analysis of their theoretical properties. One of the operations that is mostly affected by the presence of a global ordering relation is semantic subtype-checking, i.e., language inclusion. In an unordered context, inclusion has been proved to be inherently more complex than in the ordered case: in particular, subtype-checking for ordered single-type EDTDs is in PSPACE, while the same operation for single-type EDTDs with unordered types is in EXPSPACE (the same complexity result holds for unordered DTDs). Comparing two unordered XML types for inclusion, hence, is very expensive; as a consequence, it becomes very important to identify restrictions defining type classes for which inclusion is tractable or, at least, less complex. This paper identifies two large subclasses of unordered XML types for which inclusion can be computed by an EXPTIME and a PTIME algorithm, respectively. These classes are defined by restrictions on the use of element, repetition, and union types, and comprise many DTDs and XML Schemas used in practice.