Publications
Authors:
  • Boris Chidlovskii
Citation:
KRDB'01 Workshop (Knowledge Representation and Databases), Rome, Italy, September 15, 2001
Abstract:
New XML schema languages have recently been proposed to replace Document Type Definitions (DTDs) as
the schema mechanism for XML data. These languages consistently combine grammar-based constructions with
constraint- and pattern-based ones and have greater expressive power than DTDs. As schemas remain optional
for XML data, we address the problem of schema extraction from XML data. We model XML schemas as
extended context-free grammars and propose a schema extraction algorithm based on methods of
grammatical inference. The extraction algorithm also copes with the schema determinism requirement imposed
by XML DTDs and XML Schema languages. We report the results of tests on real XML collections.
Year:
2001
Report number:
2001/200