Transformation API For XML (TrAX)

Edit Date: November 12, 2000

Introduction

This overview describes the set of APIs contained in javax.xml.transform, javax.xml.transform.stream, javax.xml.transform.dom, and javax.xml.transform.sax. For the sake of brevity, these interfaces are referred to as TrAX (Transformation API for XML).

There is a broad need for Java applications to be able to transform XML and related tree-shaped data structures. In fact, XML is not normally very useful to an application without going through some sort of transformation, unless the semantic structure is used directly as data. Almost all XML-related applications need to perform transformations. Transformations may be described by Java code, Perl code, XSLT stylesheets, other types of script, or proprietary formats. A transformation may take one or more inputs, each of which may be a URL, an XML stream, a DOM tree, SAX events, or a proprietary format or data structure. The output types are much the same as the input types, but different inputs may need to be combined with different outputs.

The great challenge of a transformation API is how to deal with all the possible combinations of inputs and outputs, without becoming specialized for any of the given types.

The Java community will greatly benefit from a common API that will allow them to understand and apply a single model, write to consistent interfaces, and apply the transformations polymorphically. TrAX attempts to define a model that is clean and generic, yet fills general application requirements across a wide variety of uses.

General Terminology

This section will explain some general terminology used in this document. Technical terminology will be explained in the Model section. In many cases, the general terminology overlaps with the technical terminology.

Requirements

The following requirements have been determined from the broad experience with XML projects of the various members participating in the JCP.

  1. TrAX must provide a clean, simple interface for simple uses.
  2. TrAX must be powerful enough to be applied to a wide range of uses, such as e-commerce, content management, server content delivery, and client applications.
  3. A processor that implements a TrAX interface must be optimizable. Performance is a critical issue for most transformation use cases.
  4. As a specialization of the above requirement, a TrAX processor must be able to support a compiled model, so that a single set of transformation instructions can be compiled, optimized, and applied to a large set of input sources.
  5. TrAX must not be dependent on any given type of transformation instructions. For instance, it must remain independent of XSLT.
  6. TrAX must be able to allow processors to transform DOM trees.
  7. TrAX must be able to allow processors to produce DOM trees.
  8. TrAX must allow processors to transform SAX events.
  9. TrAX must allow processors to produce SAX events.
  10. TrAX must allow processors to transform streams of XML.
  11. TrAX must allow processors to produce XML, HTML, and other types of streams.
  12. TrAX must allow processors to implement the various combinations of inputs and outputs within a single processor.
  13. TrAX must allow processors to implement only a limited set of inputs. For instance, it should be possible to write a processor that implements the TrAX interfaces and that only processes DOM trees, not streams or SAX events.
  14. TrAX should allow a processor to implement transformations of proprietary data structures. For instance, it should be possible to implement a processor that provides the TrAX interfaces and performs transformations of JDOM trees.
  15. TrAX must allow the setting of serialization properties, without constraint as to what the details of those properties are.
  16. TrAX must allow the setting of parameters to the transformation instructions.
  17. TrAX must support the setting of parameters and properties as XML Namespaced items (i.e., qualified names).
  18. TrAX must support URL resolution from within the transformation, and have it return the needed data structure.
  19. TrAX must have a mechanism for reporting errors and warnings to the calling application.
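
As an illustration of requirements 15, 16, 18, and 19, the following is a minimal sketch written against the javax.xml.transform interfaces as published in the JDK. The class name TraxConfigSketch, the inline stylesheet, and the parameter name "who" are assumptions made for this example only. XSLT is used merely because it is the instruction type a stock JDK can process. The parameter name is unqualified here; the javax.xml.transform documentation describes qualified names as the namespace URI in curly braces followed by the local name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TraxConfigSketch {
    // Hypothetical stylesheet with one top-level parameter named "who".
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:param name='who' select=\"'nobody'\"/>"
      + "<xsl:template match='/'><hello><xsl:value-of select='$who'/></hello></xsl:template>"
      + "</xsl:stylesheet>";

    public static String run(List<String> problems) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(XSL)));

        // Requirement 19: warnings and errors are reported to the application.
        transformer.setErrorListener(new ErrorListener() {
            @Override public void warning(TransformerException e) {
                problems.add("warning: " + e.getMessage());
            }
            @Override public void error(TransformerException e) throws TransformerException {
                throw e;
            }
            @Override public void fatalError(TransformerException e) throws TransformerException {
                throw e;
            }
        });

        // Requirement 18: a resolver may map URIs to Sources. Returning null
        // asks the processor to fall back to its own resolution.
        transformer.setURIResolver((href, base) -> null);

        // Requirement 16: a parameter for the transformation instructions.
        transformer.setParameter("who", "TrAX");

        // Requirement 15: serialization properties are plain name/value pairs.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader("<doc/>")),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        List<String> problems = new ArrayList<>();
        System.out.println(run(problems));
        System.out.println("reported problems: " + problems);
    }
}
```

The listener in this sketch records warnings and rethrows errors; an application could equally log them or abort on the first warning.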

Model

This section defines the abstract model for TrAX, apart from the details of the interfaces.

A TrAX TransformerFactory is an object that processes transformation instructions and produces Templates (in the technical terminology). A Templates object provides a Transformer, which transforms one or more Sources into one or more Results.
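
To make the pairing of Sources and Results concrete, here is a minimal sketch, assuming the javax.xml.transform.stream, javax.xml.transform.dom, and javax.xml.transform.sax classes as published in the JDK. It uses a Transformer obtained without any instructions, which simply copies its input, to show one Transformer feeding a stream into a DOM tree and then the DOM tree into SAX events. The class name SourceResultSketch and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SourceResultSketch {
    public static List<String> run() throws TransformerException {
        // With no instructions supplied, the Transformer copies Source to Result.
        Transformer copier = TransformerFactory.newInstance().newTransformer();

        // Stream in, DOM tree out.
        DOMResult tree = new DOMResult();
        copier.transform(new StreamSource(new StringReader("<a><b/><b/></a>")), tree);

        // DOM tree in, SAX events out. The handler records element names.
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName);
            }
        };
        copier.transform(new DOMSource(tree.getNode()), new SAXResult(handler));
        return names;
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(run());
    }
}
```

The calling code is the same for every combination; only the Source and Result objects change.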

To use the TrAX interface, you create a TransformerFactory, which may directly provide a Transformer, or which can provide Templates from a variety of Sources. The Templates object is a processed or compiled representation of the transformation instructions, and provides a Transformer. The Transformer processes a Source according to the instructions found in the Templates, and produces a Result.
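
The flow described above can be sketched as follows, assuming the javax.xml.transform interfaces as published in the JDK. The class name TraxModelSketch, the inline stylesheet, and the sample inputs are hypothetical, and XSLT stands in for whatever instruction type a given processor supports. The instructions are processed once into a Templates object, which then supplies a fresh Transformer for each input.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TraxModelSketch {
    // Hypothetical transformation instructions, expressed here as XSLT.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/doc'>"
      + "<greeting><xsl:value-of select='.'/></greeting>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Step 1: the factory processes the instructions into a Templates object.
    public static Templates compile() throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        return factory.newTemplates(new StreamSource(new StringReader(XSL)));
    }

    // Steps 2 and 3: Templates provides a Transformer, which turns a Source
    // into a Result.
    public static String transform(Templates templates, String xml)
            throws TransformerException {
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        Templates templates = compile();            // processed once
        System.out.println(transform(templates, "<doc>hello</doc>"));
        System.out.println(transform(templates, "<doc>world</doc>"));
    }
}
```

Keeping the Templates object and creating a new Transformer per input is what allows a processor to support the compiled model called for in requirement 4.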

The process of transformation from a tree, either in the form of an object model, or in the form of parse events, into a stream, is known as serialization. We believe this is the most suitable term for this process, despite the overlap with Java object serialization.
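
Serialization in this sense can be sketched as below, assuming the javax.xml.transform.dom and javax.xml.transform.stream classes as published in the JDK. A small DOM tree is built in memory and written out as a stream by a Transformer that was created without instructions. The class name SerializationSketch and the element and attribute names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SerializationSketch {
    public static String serialize()
            throws ParserConfigurationException, TransformerException {
        // Build a tiny tree: <order id="42">widgets</order>
        Document doc = DocumentBuilderFactory.newInstance()
                                             .newDocumentBuilder()
                                             .newDocument();
        Element root = doc.createElement("order");
        root.setAttribute("id", "42");
        root.appendChild(doc.createTextNode("widgets"));
        doc.appendChild(root);

        // A Transformer created without instructions copies the tree to the
        // Result; with a StreamResult that copy is the serialization step.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize());
    }
}
```

Nothing here involves java.io.Serializable; the output is XML text, not a Java object stream.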

TrAX Patterns