XML Database



Introduction The Birdstep Database Engine XML Interface (IBXML) is a module that enables the Birdstep Database Engine to use XML natively. It provides two standardized programming interfaces for XML, the Simple API for XML (SAX) and the Document Object Model (DOM), in addition to a low-level interface for the native IBXML structures. The Extensible Markup Language (XML) is a meta-language defined by the World Wide Web Consortium (W3C) and is a subset of the Standard Generalized Markup Language (SGML) defined in ISO standard 8879:1986. XML is thus related to HTML, but it is neither a predefined set of tags of the kind defined for HTML nor a standardized template for producing documents of different kinds. Rather, XML specifies a set of languages that can be used to annotate arbitrary character data, to describe hierarchical structures and to attach meta-data to character data.

Features Birdstep DataBase Engine is an ultra-small footprint database for Linux, VxWorks, pSOS, QNX, EPOC, Solaris, FreeBSD, Windows (2000, NT, 9x, CE, PocketPC) and PalmOS. The fully featured database engine fits in less than 50Kb, so a complete implementation built on Birdstep DataBase Engine can fit in less than 100Kb. Birdstep DataBase Engine searches for data using a highly efficient search structure that replaces indexes, thus saving significant space and improving performance compared to rival systems.

Capabilities Birdstep DataBase Engine can search, insert, update, and delete data in a database created from an XML file. The database can be part of an application.

Cost Free product

IBXML The Birdstep DataBase Engine XML Interface (IBXML) provides the application programmer with two standardized programming interfaces for XML, the Simple API for XML (SAX) and the Document Object Model (DOM), in addition to a low-level interface to the native IBXML implementation.

- IBXML data structure IBXML is implemented using the Birdstep DataBase Engine notions of fields, inner relations and user types. The stored XML document can be thought of as a unidirectional graph, e.g. the graph in Figure 2, which represents the sample XML document in Figure 1. In this figure each box is called a node. Each of these nodes corresponds to a single field in a Birdstep DataBase Engine database, and each logical unit of information in an XML document is stored using a separate node.

Figure 1. Fragment of an XML document

<?xml version='1.0'?>
<!doctype list>
 <magazine frequency="weekly">
  <title>XML World</title>

Figure 2. IBXML storage graph of sample XML document in Figure 1

The nodes in the document body, i.e. the nodes not part of the prolog chain, are connected by pointers, which are divided into two groups: implicit pointers and explicit pointers. The implicit pointers are provided by the inner-relation mechanism of Birdstep DataBase Engine, which means that a sequence of nodes connected only by implicit pointers constitutes an inner relation in the database. An explicit pointer is the contents of a node with a certain identifier containing a database address. A node containing an explicit pointer is then also part of an inner relation.
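The distinction between implicit and explicit pointers can be illustrated with a small sketch. This is a toy model, not the actual IBXML layout shown in Figure 2: the names `Node`, `Store`, `add_relation` and `walk`, the node identifiers, and the way the sample document is split into relations are all assumptions made for illustration. An inner relation is modeled as an ordered list (the implicit pointers are simply the list order), and an explicit pointer is a node whose content is the address of another relation.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass
class Node:
    """One logical unit of information (hypothetical model of one field)."""
    identifier: str   # e.g. "element", "attribute", "text" or "pointer"
    content: str      # the data, or a database address for "pointer" nodes


class Store:
    """Toy stand-in for a database holding inner relations."""

    def __init__(self) -> None:
        self.relations: Dict[int, List[Node]] = {}

    def add_relation(self, nodes: List[Node]) -> int:
        """Store a sequence of nodes and return its (made-up) address."""
        address = len(self.relations) + 1
        self.relations[address] = nodes
        return address

    def walk(self, address: int) -> Iterator[Tuple[str, str]]:
        """Visit nodes in document order.

        Moving to the next list item follows an implicit pointer;
        a "pointer" node is an explicit pointer to another relation.
        """
        for node in self.relations[address]:
            if node.identifier == "pointer":
                yield from self.walk(int(node.content))
            else:
                yield (node.identifier, node.content)


store = Store()
title_address = store.add_relation(
    [Node("element", "title"), Node("text", "XML World")]
)
magazine_address = store.add_relation([
    Node("element", "magazine"),
    Node("attribute", 'frequency="weekly"'),
    Node("pointer", str(title_address)),  # explicit pointer to the title relation
])

visited = list(store.walk(magazine_address))
```

Note that the node holding the explicit pointer is itself a member of the `magazine` relation, which mirrors the statement that a node containing an explicit pointer is also part of an inner relation.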

- IBXML low-level interface The IBXML database I/O component interfaces with IBAPI and is responsible for all reading and writing of data in the database. The component plays multiple roles when interacting with the database depending on whether data is retrieved or stored (see Figure 3). Compared to the relationship between an application and an XML parser for XML documents stored as pure text, the IBXML database I/O component functions as an XML parser when retrieving an XML document stored in a Birdstep DataBase Engine database, and it plays the role of an application or an XML generator when storing XML documents.

Figure 3. Role of IBXML I/O components.

The low-level interface of IBXML provides access to operations on IBXML databases at the node level. Nodes may be created, modified, deleted and relinked. If used only for outputting information, the IBXML low-level interface performs the same set of operations on the data stored in an IBXML database as an XML parser performs on an XML document in plain text. Application developers may use the low-level interface in special circumstances, but the functions are not primarily intended for end-user applications, as they do not guarantee that the resulting IBXML storage graph is internally consistent and represents a well-formed XML document.

- SAX The IBXML SAX interface component provides the SAX version 1.0 interface and implements drivers for parsing XML documents stored as text and for retrieving XML documents from databases. It also implements handlers for outputting XML documents as text and for storing XML documents in databases (see Figure 4). In a sense these drivers and handlers are wrappers for the IBXML database I/O component and for XML generators of different kinds (see Figure 5).

Figure 4. SAX subcomponents.

Figure 5. Roles of SAX subcomponents.

The Simple API for XML (SAX) describes an event-driven interface to the process of parsing XML documents. SAX is an API in the public domain, developed by individuals on the XML-DEV mailing list. It does not have a formal specification document, but is defined by a public domain implementation in the Java programming language. An event-driven interface provides a mechanism for notifying the application code as the underlying parser recognizes XML syntactic constructions in the document.
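To make the event-driven style concrete, here is a minimal sketch using Python's standard `xml.sax` module. It is not the IBXML SAX component: Python's module follows a later SAX version than the SAX 1.0 interface described here, and the `EventLogger` class and the completed sample document (closing tags added to the fragment in Figure 1) are assumptions for illustration. The parser calls the handler's methods as it recognizes each construct.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each notification the parser sends to the application."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called when the parser recognizes a start-tag.
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # Called for character data; a parser may deliver it in several chunks.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        # Called when the parser recognizes an end-tag.
        self.events.append(("end", name))


# Assumed well-formed completion of the fragment in Figure 1.
DOCUMENT = b'<magazine frequency="weekly"><title>XML World</title></magazine>'

logger = EventLogger()
xml.sax.parseString(DOCUMENT, logger)

for event in logger.events:
    print(event)
```

The application never sees the document as a whole; it only reacts to the sequence of start, text and end notifications.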

The SAX interface is implemented in C++ and thus sometimes deviates from the defining public domain Java implementation due to language differences. The SAX interface component also includes drivers for parsing XML stored as plain text and in IBXML databases, and it features handlers for storing data in the same two formats.
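The pairing of a driver with a handler can be sketched as follows, again in Python rather than the C++ of IBXML. The `STORED` structure and the `drive` and `store_to_text` functions are hypothetical stand-ins: `STORED` plays the role of a document already held in a database, `drive` acts as a driver that turns stored data into SAX events, and the standard library's `XMLGenerator` acts as a handler that outputs those events as text.

```python
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Hypothetical stored form: (tag, attributes, children), where each child is
# either a nested tuple of the same shape or a string of character data.
STORED = ("magazine", {"frequency": "weekly"}, [("title", {}, ["XML World"])])


def drive(node, handler):
    """Driver role: walk stored data and emit SAX events, like a parser would."""
    tag, attributes, children = node
    handler.startElement(tag, AttributesImpl(attributes))
    for child in children:
        if isinstance(child, str):
            handler.characters(child)
        else:
            drive(child, handler)
    handler.endElement(tag)


def store_to_text(root):
    """Connect the driver to a handler that writes XML as plain text."""
    out = io.StringIO()
    handler = XMLGenerator(out, encoding="utf-8")
    handler.startDocument()
    drive(root, handler)
    handler.endDocument()
    return out.getvalue()


text = store_to_text(STORED)
print(text)
```

Because the handler only sees events, the same `drive` function could feed a handler that stores the data elsewhere instead of printing it, which is the point of separating drivers from handlers.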

- DOM For some classes of applications, using SAX or interfacing directly with an XML parser may be the ideal way to process XML documents. If the application is expected to handle XML documents with as little latency as possible, or to handle documents too large to fit in memory, it needs to process each event as it occurs in the document.

The problem with using SAX is that the application has to set up event handlers for all elements it cares about and build its own data structures on the fly as the events occur. Rather than responding to each event, it would be easier if the entire tree were already loaded into memory and it were possible to walk the tree and manipulate parts of it in a simple way.
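The bookkeeping this implies can be seen in a short sketch based on Python's standard `xml.sax` module (not the IBXML SAX component). The `TreeBuilder` class and its dictionary layout are assumptions for illustration. The handler has to maintain its own stack of open elements just to remember where each piece of data belongs.

```python
import xml.sax


class TreeBuilder(xml.sax.ContentHandler):
    """Builds a nested dictionary structure from SAX events."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []  # elements whose end-tag has not been seen yet

    def startElement(self, name, attrs):
        element = {
            "tag": name,
            "attributes": dict(attrs.items()),
            "text": "",
            "children": [],
        }
        if self._stack:
            self._stack[-1]["children"].append(element)
        else:
            self.root = element
        self._stack.append(element)

    def characters(self, content):
        # Character data belongs to the innermost open element.
        if self._stack:
            self._stack[-1]["text"] += content

    def endElement(self, name):
        self._stack.pop()


builder = TreeBuilder()
xml.sax.parseString(
    b'<magazine frequency="weekly"><title>XML World</title></magazine>',
    builder,
)
tree = builder.root
```

Every application that needs random access to the document ends up writing some variant of this code, which is the gap a standardized object model fills.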

Just as an XML parser in general and SAX in particular add a layer of abstraction over the actual textual representation of the XML document, the Document Object Model (DOM) adds a layer of abstraction on top of the entire document. DOM standardizes the object model representing an XML document and defines a language- and platform-neutral interface to the structure and style of XML documents, which a process may dynamically access and update. Elements are considered nodes in a tree instead of being composed of start- and end-tags. Nodes may have parents and children, and they may have internal properties which can be modified using objects and methods.
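As an illustration of walking and updating a document through the DOM, the following sketch uses Python's standard `xml.dom.minidom` module, which is an independent DOM implementation and not the IBXML DOM component. The sample document is an assumed well-formed completion of the fragment in Figure 1.

```python
from xml.dom import minidom

document = minidom.parseString(
    '<magazine frequency="weekly"><title>XML World</title></magazine>'
)

# Elements are nodes in a tree, reachable from the document element.
root = document.documentElement
title = root.getElementsByTagName("title")[0]

# Nodes know their parents and children.
parent_name = title.parentNode.tagName
title_text = title.firstChild.data

# Internal properties, such as attributes, are read and modified via methods.
old_frequency = root.getAttribute("frequency")
root.setAttribute("frequency", "monthly")

serialized = root.toxml()
print(serialized)
```

Unlike the event-driven style, the application here navigates in any order and changes the document in place before serializing it again.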

Figure 6 shows the DOM tree representing the sample XML document in Figure 1. Each circle, box and diamond in the figure is referred to as a node. The top-level <bookstore>-node is the root node of the tree. Note that attributes (the diamond in Figure 6) are shown by name only, because the attribute name and the attribute value both belong to the same node.

Figure 6. DOM tree representing sample XML document in Figure 1

The DOM level 1 specification consists of two parts: DOM core and DOM HTML. DOM level 1 core contains all functions necessary to manipulate XML documents, whereas DOM level 1 HTML adds support for HTML documents.

As of IBXML version 1.0β, the IBXML DOM interface component has not been implemented. It will support DOM core level 1 and, at a later stage, DOM level 2. Access to the DOM will be offered through CORBA, which will interface directly with the DOM component of IBXML. Language bindings for C++ using the DOM component of IBXML will also be available. The IBXML storage model is designed to be as close to the DOM as possible, but still simple enough to be efficient when outputting data sequentially, i.e. through SAX. Since IBXML stores XML data in structures that are close to the DOM, it does not have to load an entire XML document into memory before the user may access it. The DOM methods and objects can be accessed while the document is in the IBAPI buffer cache. This is a major difference from the approach necessary when the XML document is stored as a sequential chain of entities.