Working XML: UML, XMI, and code generation

 

Design XML vocabularies with UML tools
 

Since XML has become mainstream, a lot of interest in the design of XML applications has emerged. More specifically, many organizations want to integrate the design of XML applications with the design of their other applications. Adopting one common methodology -- or at least one common set of tools -- is a worthwhile exercise.

As far as XML goes, design activities are centered around the data model. Indeed, because XML is a markup language, it is concerned solely with the organization of information -- unlike, say, the Java language, which deals with both the data model (class hierarchy) and data manipulation (methods).

This article is the first in a new series for the Working XML column that will explore the use of UML modeling tools (such as IBM Rational Rose) and XSLT to design XML applications. In this introductory article, I will discuss the basics of data modeling and introduce the techniques that I will cover in the next three articles.

Data modeling
The Concise Oxford Dictionary (Oxford University Press, 2001) has no fewer than seven definitions of the noun "model." For the purposes of this column series, the following definition is appropriate: "a simplified (often mathematical) description of a system etc., to assist calculations and predictions." The three keywords in this definition are simplified, description, and assist.

According to this definition, a model is a description of a system. This statement is crucial; the model is not the system itself but a formalized representation of the system. In the specific case of XML, the system consists of documents encoded according to a specific vocabulary.

The second aspect of the definition is that a model is a simplified representation. It is not as complex or as rich as the system being modeled. Many systems are designed to tackle complex problems so they are complex by nature. For example, look at the complexity of a vocabulary like DocBook: It is designed for publishing technical books and documentation (the Linux documentation is published in DocBook, among others). Because technical books and documentation are complex, DocBook is very complex (see Resources).

Yet humans are somewhat limited in the amount of information they can process at any one time. When most people work on a complex issue, they like (or need) to break it down into smaller, more manageable issues. Models are built to address that need. A model simplifies a complex system by exposing only some aspects of the system.

The last keyword in the definition is assist. Models are not built in a vacuum, but they serve a very specific purpose: to help the designers reason about a system. A model is not imbued with magical virtues; it is only a tool for achieving a specific goal more efficiently. The goal is never to build a model, but to address the system.

The operative nature of models is closely related to the simplification I mentioned above. To simplify means to choose those elements of the system that are worth including and those that should be discarded. The selection is guided by the goals of the model, such as which calculations and predictions you are trying to assist.

Simplification and modeling
It is difficult to emphasize enough that models are a simplification of an actual system. Again, it is impossible to tackle a complex system unless it is broken down into smaller, simpler elements. In practice, one model is not always enough and a complex system may be represented by a range of models, from simple to complex.

The modeling process may start with a sketch of a system on the legendary napkin (a white board or a regular sheet of paper are good alternatives). The first model is usually very rough, ignoring most aspects of the system other than the few aspects that the designer has identified as essential, either because they are particularly complex or because they are key differentiating factors.

This rough model will be refined into one or more models of increasing sophistication and complexity. Each iteration incorporates more elements from the actual system until all the relevant aspects have been incorporated into the model. Ultimately, you'll reach the implementation data model that defines all the aspects that the system can manage.

With XML, the implementation will be an XML schema. Alternatives include a DTD, RELAX NG, or WSDL (see Resources). Although technical differences between these implementations exist, in this series I will treat them as variations on XML schemas.

The industry generally takes two views on the relationship between the models and the XML schema. Some authors draw a clear line between the design models, typically UML models or entity-relationship models which are supposed to be abstract, and the XML schemas which include lots of implementation details. This distinction promotes a clean separation between the modeling activity and the implementation activity. Modeling is typically done by business analysts, while implementation is the responsibility of technicians. This division of work mimics the division of work between the analyst and the developer in typical application development.

While I think the separation is sensible for programming, I am not sure it is always applicable to XML modeling -- which leads me to the second industry perspective on this relationship. An XML schema is a model of a document and, as you will see in my next article, it is not dramatically more sophisticated than a good UML model. Granted, an XML schema contains a lot of technical information, but it is not uncommon for a UML model to capture almost as much technical information. So I prefer to view the XML schema as part of a continuum of models, from the high-level model to the low-level one.

Viewing the schema as just another model is particularly relevant when you install tools to assist in the modeling, as I will suggest in the remaining articles in this series.

Simplification and graphics
One of the most effective simplifications used in modeling is graphics. The mind finds it easier to work with a graphic than with a long list of complex instructions. Most modeling methodologies are built on a visual language such as UML, entity-relationships, or flow charts.

When it comes to XML schemas, opinions on the best visual language generally fall into two camps. One approach is to use an XML-specific language; the other is to use a more generic modeling language. Products like XML Spy or TurboXML (see Resources) use a custom graphical tree representation to manipulate XML schemas. A visual rendering might look similar to Figure 1:

Figure 1. Visual XML structure
XML Structure

The alternative is to use a standard modeling language, such as UML, for this purpose. Figure 2 is a UML model that is similar to Figure 1:

Figure 2. UML for XML modeling
UML model

Each approach has its benefits and drawbacks. XML-specific symbols are a perfect match for the XML constructs: It is easy to identify an XML sequence, an XML choice, elements, attributes, and more. It is also possible to specify all the technical information in a simple and natural way. Until recently, many designers of XML applications would have recommended this approach because it is simple and effective.

The price to pay is that the modeling and the tools often do not integrate well with the rest of the development effort. While this approach remains suitable for small XML projects, it does not scale well. It is difficult to work with a large, complex model because the visual language offers only one level of abstraction. It is also difficult to work on large projects that combine XML, Java, Web Services, and SQL because everybody else in the team may be using UML.

UML is best suited for medium and large scale projects for two reasons:

  1. UML applies to Java, C++, Python, PHP, SQL, Web services, and just about any other development technology. Its universality reduces the training needs (one language works for everybody), and it is easier to share designs across the team.
  2. UML diagrams can show as much or as little information as necessary, so it is possible to prepare several models of increasing sophistication with the same tool.

The major downside of UML is that it is less friendly when working with the low-level aspects of modeling. For example, it is easy to order the elements of a sequence in a tree, but it is very tricky to do so in UML.

UML and XML
I plan to revisit this topic at length in the next few articles. For now, it suffices to say that many mappings are possible between an XML schema and a UML model. UML supports several diagrams, including use case diagrams, package diagrams, sequence diagrams, and activity diagrams.

The most suitable diagram for my purposes here is the class diagram, which represents an object-oriented data model.

Figure 3 is a very simple UML model for a person. It consists of two classes, one for a person's primary data ("person") and one for his or her "address". The rectangle is the symbol for a class and is divided into three parts: class name, attributes, and methods. Because you'll be modeling data rather than behavior, you can ignore the methods.

Figure 3. UML person model
Person model

Relationships between classes are represented through associations. In a model, an association is drawn as a line. The line may be adorned with connectors to differentiate associations. For example, in Figure 3 the solid diamond indicates that the relationship is a composition -- in other words, that instances of the address class can exist only within the context of a person instance.

Note that many options are available for mapping UML constructs to XML, with UML attributes being the best illustration. In UML, an attribute is a field attached to a class. In the Java language, only one sensible mapping exists: The attribute becomes a field of the class. In contrast, in XML the attribute may map either to a subelement or to a proper XML attribute. I'll revisit this topic in future articles.
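To make the two XML options concrete, the following Python sketch (standard library only, and not part of the stylesheets in this series) renders one UML attribute both ways. The person class comes from Figure 3; the name attribute, the sample value, and the function names are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

def as_subelement(class_name, attr_name, value):
    """Map a UML attribute to a child element of the class element."""
    parent = ET.Element(class_name)
    child = ET.SubElement(parent, attr_name)
    child.text = value
    return ET.tostring(parent, encoding="unicode")

def as_xml_attribute(class_name, attr_name, value):
    """Map the same UML attribute to an XML attribute of the class element."""
    parent = ET.Element(class_name)
    parent.set(attr_name, value)
    return ET.tostring(parent, encoding="unicode")

# Hypothetical attribute "name" on the person class from Figure 3.
print(as_subelement("person", "name", "John Doe"))
print(as_xml_attribute("person", "name", "John Doe"))
```

Both results carry the same information; which form to generate is a mapping decision rather than something the UML model dictates.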

Round-trip schema derivation
When working with UML models, you can try several different approaches:

For all but the simplest models, you will want to use a modeling tool. At first sight, a modeling tool may appear to be nothing more than a glorified drawing tool, but it offers much more. A modeling tool understands the model and therefore can provide a lot of assistance to the designer. For example, when you add a class to a diagram, it can draw all of that class's relationships automatically.

Automatic derivation
As I have stated already, I believe the XML schema is just a specific rendering of a very detailed model. Therefore, it is essential to derive the XML schema automatically from the UML model.

Looking at a UML model and attempting to code it in XML schema form can be very time consuming and error-prone. Chances are you will miss some elements or attributes, and it's easy to get relationships wrong. Fortunately, this process is easy to automate if you establish a one-to-one mapping between UML constructs and XML schema statements.

You will find a number of tools that can be used to derive schemas from models automatically, including:

In this series, I will propose a solution built on XSLT and XML Metadata Interchange (XMI). XMI is a standard format that you can use to export UML models in XML. It was originally designed to allow the exporting/importing of models between different tools, but since it is XML, you can manipulate it in XSLT.

In my work, I have found it very advantageous to work with XMI and XSLT for the following reasons:

I have another criterion, which may or may not be relevant for you. I work mostly in e-commerce, so the models I work on are a collaborative effort between several companies. Because different companies may adopt different tools, I can't impose one proprietary product on an entire team when I'm working on a collaborative effort. Because XMI is an industry standard, a solution that builds on XMI generally works well for the whole team.

Figure 4 illustrates this process. I write one or two stylesheets to derive the XML schema from the XMI document, that is, from the UML model.

Figure 4. Deriving the schema
Conversion

I may also prepare a stylesheet that implements the reverse procedure: from XML schema to XMI. This stylesheet is particularly useful when working with existing schemas that have not been modeled in UML.

Till next time
In this first article in this new series, I reviewed the principles behind document modeling and surveyed how to model an XML schema in UML. More importantly, I showed you tools for generating the XML schema automatically from the UML model. Automatic generation is possible using XMI and an XSLT stylesheet. I will present an example of this stylesheet in the next installment.

Part 2

This column is currently focused on modeling, UML, and XML. More specifically, I am exploring the use of UML modeling for XML development and in particular how XSLT stylesheets can help through automatic derivation.

As XML has become a common feature in development projects, many developers have grown interested in integrating XML with the rest of their development. While many organizations still rely on ad-hoc tools for XML development, the trend is towards adopting the same methodology for XML -- or at least one common set of tools -- that is already in use for other development needs, such as Java technology, databases, or the Web.

Automatic derivation
As discussed in part 1, a model is a simplified description of a system that can assist in calculations and predictions. In the context of this article, the system is always an XML vocabulary.

Figure 1 illustrates the modeling cycle as a continuum of models. The first models are drawn on the whiteboard (or on a sheet of paper) and tend to be informal. At this stage, the goal is to give all participants (users, developers, designers) a chance to express themselves freely.

Figure 1. A continuum of models
A continuum of models

The next step is to draw a UML model (or several models if the vocabulary is complex). The UML model is more refined and formal, but it remains synthetic and readable because it is intended primarily as a communication device between the team members.

The last model is the XML schema, which is the most precise of them all. Its goal is to allow the parser to validate XML documents against the vocabulary definition, so it can forgo readability in favor of precision.

The major difference between all these models is their goal: from informal communication to precise, formal validation by the parser. The difference is not in the nature of the models (simplified description of an XML vocabulary), but in the level of assistance each model provides.

If you think of a continuum of models, from the least precise to the most formal, it makes sense to look into automatic derivation -- the process of generating one model automatically from an earlier model. Obviously, automatic derivation works well only if the two models are equally descriptive, which somewhat conflicts with the idea of some models being more descriptive than others. Addressing the different levels of description in the models will be the topic of the next column; here, I will focus on derivation.

XML Metadata Interchange (XMI)
You will recall from the previous installment that I implemented automatic derivation through XMI and XSLT. Assuming that you are already familiar with XML schema (if you're not, see Resources), I will introduce XML Metadata Interchange (XMI) in this section.

Vocabularies and compatibility
XMI is a sophisticated specification (version 1.2 is over 400 pages), so, in this article, I will limit myself to the bare minimum description needed for automatic derivation.

XMI does not specify an XML vocabulary, but rather an algorithm that generates vocabularies for metamodels. In other words, XMI does not define Class, Attribute, Association, or other tags as you would expect. Instead, XMI specifies how to create tags for concepts in a metamodel. I know that's a lot of models to work with, but bear with me -- it will become clearer in a moment.

Therefore XMI is not so much a vocabulary as a framework. Unfortunately, this means that no two tools interpret this framework in the same way. Differences also exist between different versions of the same tool: Rational Rose originally supported XMI through an add-on developed by Unisys. The latest versions of Rational XDE have built-in support for XMI, but it's a slightly different variant. The differences are not necessarily significant, but they may cause incompatibilities. In practice, it makes sense to target your stylesheets to the one or two tools that are used in your community and not worry about the rest.

In this article, rather than adopting one specific version of XMI, I will stick with the examples published by the OMG. Although no tool is directly compatible with the samples, this is good middle ground. Adapting them to your tool of choice will not be difficult.

The XMI header
Although it mostly specifies an algorithm, XMI also defines a few tags and attributes. You will need the following:

The metamodel
The UML metamodel is a model that describes the UML language -- specifically, it describes classes, attributes, associations, packages, collaborations, use cases, actors, messages, states, and all the other concepts in the UML language. For coherence, the metamodel is written in UML.

The prefix "meta" indicates that the metamodel describes a model of a model. Likewise, XML is a metalanguage because it's a language that describes languages.

The UML metamodel is published in the UML specification. More specifically, XMI uses the "UML Model Interchange" described in chapter 5 of the UML specification (see Resources).

Be warned that the UML metamodel is rather large and intimidating. I can only give you a flavor for it in this article. Figure 2 is an excerpt from the metamodel that describes the class, one of the central concepts in class diagrams.

Figure 2. The metamodel for a class
The metamodel for a class

In the metamodel, the class concept is modeled as the metaclass Class which inherits from the abstract metaclass Classifier. Classifier is the parent for Class, Interface, and Datatype (the latter two are not represented in Figure 2). The inheritance chain continues to: GeneralizableElement, which represents all concepts that can be generalized (inherited from); ModelElement, which represents all abstractions in the model (such as namespace, constraints, and class); and finally Element, the topmost metaclass. Each of these metaclasses has attributes from which Class inherits.

 

XMI variations
Few vendors document their XMI variant. The workaround is to create a small model and export it. Open the file in a text editor and review it.

The XMI elements and attributes (XMI.header, XMI.content, or xmi.id) serve as a roadmap through the file.

Look for the main elements from the metamodel (such as Class, Attribute, Association) and see how they are mapped into XML. It helps if you have an excerpt from the metamodel handy.

The differences are mostly cosmetic: It seems no two applications use the same namespace. Some applications encode metamodel attributes as XML elements while others use XML attributes (as in Listing 1). In practice, it is very easy to recognize the difference when you compare to the metamodel.

A composition exists between Classifier and Feature, which is the parent of StructuralFeature. Attribute is derived from StructuralFeature.

Confused by the metamodel? Try to forget it's a metamodel, try to forget it's about UML, and look at it as an ordinary model. Figure 2 is simply pointing out the concept of Class, which is a highly specialized element that is related to interface and data type (through its inheritance from Classifier). Class has a name, visibility, and many more attributes. Finally, there is an association between Class and Attribute.

So Figure 2 formally expresses that a class has a name, visibility, and other properties, and that it may have attributes. Indeed, Figure 2 is the definition of a UML class. If you find this confusing, it's probably because the definition itself is written in UML!
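Read as an ordinary model, the inheritance chain can be sketched in a few lines of Python. This is only an illustration, not part of the XMI tooling. The attribute names are the ones that appear on UML:Class in Listing 1, but assigning each attribute to a particular metaclass is an assumption that should be checked against the UML specification.

```python
from dataclasses import dataclass, fields

@dataclass
class Element:
    """Topmost metaclass; it contributes no attributes in this sketch."""

@dataclass
class ModelElement(Element):
    # Assumed placement: properties shared by every abstraction in the model.
    name: str = ""
    visibility: str = "public"
    isSpecification: bool = False

@dataclass
class GeneralizableElement(ModelElement):
    # Assumed placement: properties of anything that can be inherited from.
    isRoot: bool = False
    isLeaf: bool = False
    isAbstract: bool = False

@dataclass
class Classifier(GeneralizableElement):
    """Abstract parent of Class, Interface, and Datatype."""

@dataclass
class Class(Classifier):
    isActive: bool = False

# The address class of Listing 1, expressed as an instance of the metaclass.
address = Class(name="address", isRoot=True, isLeaf=True)
print([f.name for f in fields(address)])
```

Every property listed for the instance except isActive is inherited, which is the point Figure 2 makes about Class.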

I have intentionally simplified Figure 2 to ignore namespace, constraint, stereotype, inheritance, and many other aspects of what makes a class a class. Trust me, they are included in the complete UML metamodel but they are not useful for this article.

Why bother with the metamodel? Because when you feed it to the XMI algorithm, you get an XML vocabulary for UML. As an example, Listing 1 is an XMI representation of Figure 3 (using the variation of XMI illustrated in the specification -- see above):

Figure 3. A UML model for an address
A UML model for an address

Listing 1. The address exported to XMI

<?xml version="1.0"?>
<XMI xmi.version="1.2" xmlns:UML="org.omg/UML/1.4">
 <XMI.header>
  <XMI.documentation>
   <XMI.exporter>ananas.org stylesheet</XMI.exporter>
  </XMI.documentation>
  <XMI.metamodel xmi.name="UML" xmi.version="1.4"/>
 </XMI.header>
 <XMI.content>
  <UML:Model xmi.id="M.1" name="address" visibility="public"
              isSpecification="false" isRoot="false"
              isLeaf="false" isAbstract="false">
   <UML:Namespace.ownedElement>
    <UML:Class xmi.id="C.1" name="address" visibility="public"
               isSpecification="false" namespace="M.1" isRoot="true"
               isLeaf="true" isAbstract="false" isActive="false">
     <UML:Classifier.feature>
      <UML:Attribute xmi.id="A.1" name="name" visibility="private"
                     isSpecification="false" ownerScope="instance"/>
      <UML:Attribute xmi.id="A.2" name="street" visibility="private"
                     isSpecification="false" ownerScope="instance"/>
      <UML:Attribute xmi.id="A.3" name="zip" visibility="private"
                     isSpecification="false" ownerScope="instance"/>
      <UML:Attribute xmi.id="A.4" name="region" visibility="private"
                     isSpecification="false" ownerScope="instance"/>
      <UML:Attribute xmi.id="A.5" name="city" visibility="private"
                     isSpecification="false" ownerScope="instance"/>
      <UML:Attribute xmi.id="A.6" name="country" visibility="private"
                     isSpecification="false" ownerScope="instance"/>
     </UML:Classifier.feature>
    </UML:Class>
   </UML:Namespace.ownedElement>
  </UML:Model>
 </XMI.content>
</XMI>

Notice how the XML elements and attributes in Listing 1 match the classes and attributes in Figure 2. You've now come full circle: The XMI document is a direct representation of the UML metamodel because the UML metamodel is a description of UML itself.
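One quick way to check this correspondence is to walk the XMI file and list what it contains. The sketch below uses Python's standard xml.etree.ElementTree module rather than XSLT, purely for illustration. The embedded document is a trimmed copy of Listing 1, and the function name list_classes is hypothetical.

```python
import xml.etree.ElementTree as ET

UML = "org.omg/UML/1.4"  # namespace URI declared in Listing 1

# Trimmed copy of Listing 1: most attributes and three UML:Attribute entries removed.
XMI_DOC = """<?xml version="1.0"?>
<XMI xmi.version="1.2" xmlns:UML="org.omg/UML/1.4">
 <XMI.content>
  <UML:Model xmi.id="M.1" name="address">
   <UML:Namespace.ownedElement>
    <UML:Class xmi.id="C.1" name="address">
     <UML:Classifier.feature>
      <UML:Attribute xmi.id="A.1" name="name"/>
      <UML:Attribute xmi.id="A.2" name="street"/>
      <UML:Attribute xmi.id="A.3" name="zip"/>
     </UML:Classifier.feature>
    </UML:Class>
   </UML:Namespace.ownedElement>
  </UML:Model>
 </XMI.content>
</XMI>"""

def list_classes(xmi_text):
    """Return {class name: [attribute names]} for every UML:Class found."""
    root = ET.fromstring(xmi_text)
    result = {}
    for cls in root.iter(f"{{{UML}}}Class"):
        attrs = [a.get("name") for a in cls.iter(f"{{{UML}}}Attribute")]
        result[cls.get("name")] = attrs
    return result

print(list_classes(XMI_DOC))
```

Running the same function against a file exported by your own tool is a cheap way to find out whether its XMI variant uses the same element names and namespace.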

Presentation aspects
A portion of the UML metamodel deals with the visual representation of concepts -- where to draw the concepts on screen. I don't process that information in my stylesheets for two reasons:

XSLT stylesheets
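Now that you have the key to reading XMI files, it's easy to map XMI tags to their XML schema equivalents. One possible mapping is: UML:Model becomes xs:schema, with a target namespace built from the model name; UML:Class becomes an xs:element with a complex type that contains a sequence; and UML:Attribute becomes an xs:element of type xs:string within that sequence.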

Listing 2 is an XSLT stylesheet that implements the mapping:

Listing 2. XML schema derivation

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:UML="org.omg/UML/1.4"
                exclude-result-prefixes="UML"
                version="1.0">

<xsl:output indent="yes"/>

<xsl:template match="XMI[@xmi.version='1.2']">
   <xsl:apply-templates select="XMI.content/UML:Model"/>
</xsl:template>

<xsl:template match="XMI">
   <xsl:message terminate="yes">Unknown XMI version</xsl:message>
</xsl:template>

<xsl:template match="UML:Model">
   <xs:schema targetNamespace="http://psol.com/uml/{@name}">
      <xsl:apply-templates/>
   </xs:schema>
</xsl:template>

<xsl:template match="UML:Namespace.ownedElement/UML:Class">
   <xs:element name="{@name}">
      <xs:complexType>
         <xs:sequence>
            <xsl:apply-templates/>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
</xsl:template>

<xsl:template match="UML:Attribute">
   <xs:element name="{@name}" type="xs:string"/>
</xsl:template>

<xsl:template match="text()">
   <xsl:value-of select="normalize-space(.)"/>
</xsl:template>

</xsl:stylesheet>

Obviously, the stylesheet in Listing 2 is still very limited (and it does very limited error checking) because it only supports a small subset of the UML metamodel. It ignores packages, interfaces, associations, and more. You can enrich the stylesheet and support those concepts with a simple extension of the process I've shown you so far: Study the appropriate portion of the UML metamodel, define a mapping to XML schema, and implement it.
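For readers who want to experiment with the mapping outside an XSLT processor, the same three rules can be sketched in Python with the standard library. This is a stand-in for illustration, not a replacement for Listing 2. The function name derive_schema and the small SAMPLE document are assumptions; the namespaces and the target namespace prefix are taken from the listings.

```python
import xml.etree.ElementTree as ET

UML = "org.omg/UML/1.4"                    # namespace used in Listing 1
XS = "http://www.w3.org/2001/XMLSchema"
NS = {"UML": UML}
ET.register_namespace("xs", XS)            # serialize schema elements with the xs: prefix

# Hypothetical, cut-down input in the shape of Listing 1.
SAMPLE = """<XMI xmi.version="1.2" xmlns:UML="org.omg/UML/1.4">
 <XMI.content>
  <UML:Model name="address">
   <UML:Namespace.ownedElement>
    <UML:Class name="address">
     <UML:Classifier.feature>
      <UML:Attribute name="name"/>
      <UML:Attribute name="street"/>
     </UML:Classifier.feature>
    </UML:Class>
   </UML:Namespace.ownedElement>
  </UML:Model>
 </XMI.content>
</XMI>"""

def derive_schema(xmi_text):
    """Model -> xs:schema, Class -> xs:element, Attribute -> xs:element of type xs:string."""
    xmi = ET.fromstring(xmi_text)
    if xmi.get("xmi.version") != "1.2":
        raise ValueError("Unknown XMI version")
    model = xmi.find("XMI.content/UML:Model", NS)
    schema = ET.Element(f"{{{XS}}}schema",
                        targetNamespace="http://psol.com/uml/" + model.get("name"))
    for cls in model.findall("UML:Namespace.ownedElement/UML:Class", NS):
        element = ET.SubElement(schema, f"{{{XS}}}element", name=cls.get("name"))
        complex_type = ET.SubElement(element, f"{{{XS}}}complexType")
        sequence = ET.SubElement(complex_type, f"{{{XS}}}sequence")
        for attr in cls.findall("UML:Classifier.feature/UML:Attribute", NS):
            ET.SubElement(sequence, f"{{{XS}}}element",
                          name=attr.get("name"), type="xs:string")
    return schema

print(ET.tostring(derive_schema(SAMPLE), encoding="unicode"))
```

Like the stylesheet, the sketch ignores everything except models, classes, and attributes, so supporting another concept means adding one more rule of the same shape.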

Vice versa
Listing 2 is very handy if you follow the normal modeling workflow: from the least detailed to the most detailed model. Frequently, however, you will find that an XML schema already exists, and that it should serve as the starting point for your work. It would be tedious to recreate the UML model by hand, so a stylesheet that implements the reverse mapping is handy. Listing 3 is an example:

Listing 3. Reverse derivation (from XML schema to UML)

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:UML="org.omg/UML/1.4"
                exclude-result-prefixes="xs"
                version="1.0">

<xsl:output indent="yes"/>

<xsl:template match="xs:schema">
 <XMI xmi.version="1.2">
  <XMI.header>
   <XMI.documentation>
    <XMI.exporter>dW simple stylesheet</XMI.exporter>
   </XMI.documentation>
   <XMI.metamodel xmi.name="UML" xmi.version="1.4"/>
  </XMI.header>
  <XMI.content>
   <UML:Model xmi.id="{generate-id()}" 
     name="{substring-after(@targetNamespace,'http://psol.com/uml/')}"
     visibility="public" isSpecification="false"
     isRoot="false" isLeaf="false" isAbstract="false">
     <UML:Namespace.ownedElement>
       <xsl:apply-templates/>
     </UML:Namespace.ownedElement>
   </UML:Model>
  </XMI.content>
 </XMI>
</xsl:template>

<xsl:template match="xs:element">
 <UML:Class xmi.id="{generate-id()}" name="{@name}"
    visibility="public" isSpecification="false" isRoot="true"
    isLeaf="true" isAbstract="false" isActive="false">
    <xsl:apply-templates/>
 </UML:Class>
</xsl:template>

<xsl:template match="xs:sequence">
 <UML:Classifier.feature>
  <xsl:apply-templates/>
 </UML:Classifier.feature>
</xsl:template>

<xsl:template match="xs:sequence/xs:element">
 <UML:Attribute xmi.id="{generate-id(.)}" name="{@name}"
                visibility="private" isSpecification="false"
                ownerScope="instance"/>
</xsl:template>

</xsl:stylesheet>

Towards more comprehensive stylesheets
To say that the stylesheets I have introduced in this article are simplistic would be an understatement. They are less than 50 lines long and deal with a small subset of the UML metamodel. Real-world stylesheets recognize many more UML concepts, and typically weigh in at 500 lines or more. My goal in this installment has been to introduce the concepts behind automatic model derivation:

In this article, I have had to make simplifications. If you try to extend the stylesheets from Listings 2 and 3, you may encounter two problems:

Solving these two problems is the topic of my next two column installments.

Part 3

Use stereotypes and tags to store information in the model
 

In the last two installments of the Working XML column, I explored modeling and, more specifically, the use of UML modeling for XML application development. Modeling is an important aspect of XML development. After all, XML is a structured language, so structuring and organizing information is the raison d'être of XML. This series of articles focuses on how to combine the XML-specific modeling languages with UML, the industry standard for software development.

Different forms of modeling
When it comes to modeling, I have explained my bias at length in the previous two articles: Briefly, I believe that the most reasonable strategy is to view modeling as a continuous activity that starts with an open discussion in front of a whiteboard (or a piece of paper in smaller offices) and ends with the production of a W3C XML Schema or a WSDL file.

At each step, the model is refined and made more formal. Bearing in mind that a model is a simplified representation of a system that is created to assist in understanding the system, it seems logical that your model will become more complete as your understanding of the system deepens.

Therefore, I believe it is crucial that you use tools to support your modeling activity -- tools that will help you refine the models. I have witnessed several nightmare projects in which modeling failed, and the one thing they all had in common was a lack of integration between the modeling and development activities. Fortunately, a workaround is as simple as deploying tools to integrate the two activities.

Ideally, you want any changes in the model to be instantaneously reported in the implementation, but it is seldom possible to achieve this ideal. For example, Java code generators may generate and update skeleton implementations but they cannot update the algorithms. With XML, you can achieve the ideal because you are always working with data models. The UML model and the XML Schema are both data models, though they use different languages and usually offer different levels of detail.

Stylesheets, XMI, and XML Schema
In part 2, I introduced two stylesheets: The first converts UML models saved in XMI into XML Schema; the second performs the reverse operation, generating an XMI file from an XML Schema.

The transformation relies on a mapping from the UML metamodel -- the data model into which UML models are saved -- to an XML Schema. Every UML concept (class, attribute, association, and more) is represented in the UML metamodel, so if you establish that a UML class should become an XML element, the stylesheet simply transforms UML:Class from XMI into an xs:element in the XML Schema.

A simple stylesheet can automate the tedious process of implementing UML models as XML Schemas, but to fit all this material in this article, I have made numerous simplifications.

Note that while most of this discussion centers on the transformation from UML models to XML Schemas, the reverse transformation is helpful too. For example, you might need to include in your project elements developed by another team or company which are available as XML Schema only.

More powerful stylesheets
Here, I revisit the stylesheets and address one of the two issues that I identified in part 2: the lack of implementation information in the UML model. This problem develops for two reasons:

Extending UML
From the outset, the designers of UML recognized that UML would have applications that they had not anticipated, so they implemented extension mechanisms that allow users to improve UML.

One of these extension mechanisms is the stereotype, which allows users to define new concepts in UML that refine existing concepts. Using a stereotype, the user can basically say to the modeling tool "I have a concept that is almost a class (or an object, actor, association, and so forth), but is more specialized."

For all practical purposes, a stereotype is a form of inheritance in the metamodel. It helps to think of a stereotype as a descendant of a UML concept.

In fact, if you read the UML specification you will see that the standard itself uses stereotypes quite heavily. For example:

Of course these are in addition to any user-defined stereotypes.

Graphically, stereotypes are indicated by a keyword in angle brackets, such as <<requirement>> in Figure 1. Note that UML allows users to redefine the icon for a stereotype, but few tools implement this feature.

Figure 1. A stereotype in UML
The requirement stereotype on a comment

A second extension mechanism that is closely linked to stereotypes is the tag. While stereotypes allow users to define new concepts in UML, tags allow users to store additional information about these new concepts. While a stereotype offers metamodel-level inheritance, a tag offers a metamodel-level mechanism to add attribute-like information to a stereotype.

A stereotype for XML
I'll start by showing you how to specify a hierarchy. You'll define a stereotype root to mark elements that could be a root in the XML hierarchy. You will mark one or more classes with the stereotype, and the stylesheet will implement them as global elements. (In an XML Schema, only global elements can become a root.)

Note that more than one element can be marked as a root, which is in keeping with the XML Schema language. Note also that I have decided to call the stereotype "root" and not "global" (for "global element"). Several possible mappings can exist between UML and XML; for example, you can make all elements global in the Schema to expose the elements for reuse in other Schemas, or you can make as many elements local as possible. (In fact, there's a sophisticated theory behind the use of global and local elements -- see Resources.)
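To make the distinction concrete, here is a small hand-written schema that mixes both styles. It is only a sketch: the element names are borrowed from this article's model, the content models are cut down, and the target namespace is omitted to keep the ref simple.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <!-- global: declared directly under xs:schema, so it can be a
        document root and can be reused through ref -->
   <xs:element name="address">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="city" type="xs:string"/>
         </xs:sequence>
      </xs:complexType>
   </xs:element>

   <!-- also global, and therefore another possible root -->
   <xs:element name="person">
      <xs:complexType>
         <xs:sequence>
            <!-- reference to the global declaration above -->
            <xs:element ref="address"/>
            <!-- local: declared in place, visible only inside person -->
            <xs:element name="job" type="xs:string"/>
         </xs:sequence>
      </xs:complexType>
   </xs:element>

</xs:schema>
```

With this schema, a document may start with either person or address, but never with job.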

I believe these implementation choices should not be exposed in the UML model. Amongst other reasons, this allows me to change my implementation rules without revisiting the UML model. Indeed, if I mark elements as "global" and I decide to change the implementation rules -- for example to expose all elements as global -- I need to revisit the entire model and change most of the stereotypes. I don't like having that much exposure to implementation details in a UML model. I plan to explore this issue in more detail in my next article.

The vocabulary that you're working on will also affect what you'll want to expose as a stereotype. For example, when working on vocabulary for publishing- or document-type applications, you may want to define an attribute stereotype. When working on a vocabulary for data storage, it may be more sensible to have fixed rules as to what maps to attributes versus elements (for instance, the stylesheet I present in this article maps everything to XML elements).

Tags for XML
Similarly, I have defined a position tag that specifies the order in which elements must appear in a sequence in XML. The problem here is that the order of elements in the schema must not change from one conversion to another. This is particularly a problem when working with associations; UML tools don't order associations, so you cannot rely on a tool to output them in the same order every time.

One workaround is to order associations by name; another is to use tags to control the position explicitly. I have found that the latter solution is often preferable.

Stylesheet implementation
Figure 2 is a UML model that defines three classes (person, address, and job) and the associations between them. This model is an extension of the model introduced in the previous article.

Figure 2. A UML model
A model with associations

 

Listing 1. The model in XMI
<?xml version="1.0"?>
<XMI xmi.version="1.2" xmlns:UML="org.omg/UML/1.4">
   <XMI.header>
      <XMI.documentation>
         <XMI.exporter>ananas.org stylesheet</XMI.exporter>
      </XMI.documentation>
      <XMI.metamodel xmi.name="UML" xmi.version="1.4"/>
   </XMI.header>
   <XMI.content>
      <UML:Model xmi.id="M.1" name="address" visibility="public"
                 isSpecification="false" isRoot="false"
                 isLeaf="false" isAbstract="false">
         <UML:Namespace.ownedElement>
            <UML:Stereotype xmi.id="S.1" name="root" visibility="public"
                            isSpecification="false" baseClass="Class">
            </UML:Stereotype>
            <UML:TagDefinition xmi.id="TD.1" name="position"
                               isSpecification="false" tagType="String">
               <UML:TagDefinition.owner>
                  <UML:Stereotype xmi.idref="S.1"/>
               </UML:TagDefinition.owner>
            </UML:TagDefinition>
            <UML:Class xmi.id="C.1" name="person" visibility="public"
                       isSpecification="false" namespace="M.1" isRoot="true"
                       isLeaf="true" isAbstract="false" isActive="false">
               <UML:ModelElement.stereotype>
                  <UML:Stereotype xmi.idref="S.1"/>
               </UML:ModelElement.stereotype>
               <UML:Classifier.feature>
                  <UML:Attribute xmi.id="A.1" name="name" visibility="private"
                                 isSpecification="false" ownerScope="instance"/>
                  <UML:Attribute xmi.id="A.2" name="telephone" visibility="private"
                                 isSpecification="false" ownerScope="instance">
                     <UML:StructuralFeature.multiplicity>
                        <UML:Multiplicity>
                           <UML:Multiplicity.range>
                              <UML:MultiplicityRange lower="0" upper="5"/>
                           </UML:Multiplicity.range>
                        </UML:Multiplicity>
                     </UML:StructuralFeature.multiplicity>
                  </UML:Attribute>
                  <UML:Attribute xmi.id="A.3" name="fax" visibility="private"
                                 isSpecification="false" ownerScope="instance">
                     <UML:StructuralFeature.multiplicity>
                        <UML:Multiplicity>
                           <UML:Multiplicity.range>
                              <UML:MultiplicityRange lower="0" upper="5"/>
                           </UML:Multiplicity.range>
                        </UML:Multiplicity>
                     </UML:StructuralFeature.multiplicity>
                  </UML:Attribute>
                  <UML:Attribute xmi.id="A.4" name="email" visibility="private"
                                 isSpecification="false" ownerScope="instance">
                     <UML:StructuralFeature.multiplicity>
                        <UML:Multiplicity>
                           <UML:Multiplicity.range>
                              <UML:MultiplicityRange lower="0" upper="5"/>
                           </UML:Multiplicity.range>
                        </UML:Multiplicity>
                     </UML:StructuralFeature.multiplicity>
                  </UML:Attribute>
               </UML:Classifier.feature>
            </UML:Class>
            <UML:Class xmi.id="C.2" name="address" visibility="public"
                       isSpecification="false" namespace="M.1" isRoot="true"
                       isLeaf="true" isAbstract="false" isActive="false">
               <UML:Classifier.feature>
                  <UML:Attribute xmi.id="A.5" name="name" visibility="private"
                                 isSpecification="false" ownerScope="instance"/>
                  <UML:Attribute xmi.id="A.6" name="street" visibility="private"
                                 isSpecification="false" ownerScope="instance"/>
                  <UML:Attribute xmi.id="A.7" name="zip" visibility="private"
                                 isSpecification="false" ownerScope="instance"/>
                  <UML:Attribute xmi.id="A.8" name="region" visibility="private"
                                 isSpecification="false" ownerScope="instance"/>
                  <UML:Attribute xmi.id="A.9" name="city" visibility="private"
                                 isSpecification="false" ownerScope="instance"/>
                  <UML:Attribute xmi.id="A.10" name="country" visibility="private"
                                 isSpecification="false" ownerScope="instance"/>
               </UML:Classifier.feature>
            </UML:Class>
            <UML:Class xmi.id="C.3" name="job" visibility="public"
                       isSpecification="false" namespace="M.1" isRoot="true"
                       isLeaf="true" isAbstract="false" isActive="false">
               <UML:Classifier.feature>
                  <UML:Attribute xmi.id="A.11" name="title" visibility="private"
                                 isSpecification="false" ownerScope="instance"/>
                  <UML:Attribute xmi.id="A.12" name="organization" visibility="private"
                                 isSpecification="false" ownerScope="instance"/>
               </UML:Classifier.feature>
            </UML:Class>
            <UML:Association xmi.id="AS.1" isSpecification="false">
               <UML:ModelElement.taggedValue>
                  <UML:TaggedValue xmi.id="T.1" isSpecification="false"
                                   dataValue="20">
                     <UML:TaggedValue.type>
                        <UML:TagDefinition xmi.idref="TD.1"/>
                     </UML:TaggedValue.type>
                  </UML:TaggedValue>
               </UML:ModelElement.taggedValue>
               <UML:Association.connection>
                  <UML:AssociationEnd xmi.id="AE.1" visibility="public"
                                      isSpecification="false"
                                      isNavigable="true">
                     <UML:AssociationEnd.participant>
                        <UML:Class xmi.idref="C.1"/>
                     </UML:AssociationEnd.participant>
                  </UML:AssociationEnd>
                  <UML:AssociationEnd xmi.id="AE.2" visibility="public"
                                      isSpecification="false"
                                      isNavigable="true">
                     <UML:AssociationEnd.participant>
                        <UML:Class xmi.idref="C.2"/>
                     </UML:AssociationEnd.participant>
                     <UML:AssociationEnd.multiplicity>
                        <UML:Multiplicity>
                           <UML:Multiplicity.range>
                              <UML:MultiplicityRange lower="1" upper="5"/>
                           </UML:Multiplicity.range>
                        </UML:Multiplicity>
                     </UML:AssociationEnd.multiplicity>
                  </UML:AssociationEnd>
               </UML:Association.connection>
            </UML:Association>
            <UML:Association xmi.id="AS.2" isSpecification="false">
               <UML:ModelElement.taggedValue>
                  <UML:TaggedValue xmi.id="T.2" isSpecification="false"
                                   dataValue="120">
                     <UML:TaggedValue.type>
                        <UML:TagDefinition xmi.idref="TD.1"/>
                     </UML:TaggedValue.type>
                  </UML:TaggedValue>
               </UML:ModelElement.taggedValue>
               <UML:Association.connection>
                  <UML:AssociationEnd xmi.id="AE.3" visibility="public"
                                      isSpecification="false"
                                      isNavigable="true">
                     <UML:AssociationEnd.participant>
                        <UML:Class xmi.idref="C.1"/>
                     </UML:AssociationEnd.participant>
                  </UML:AssociationEnd>
                  <UML:AssociationEnd xmi.id="AE.4" visibility="public"
                                      isSpecification="false"
                                      isNavigable="true">
                     <UML:AssociationEnd.participant>
                        <UML:Class xmi.idref="C.3"/>
                     </UML:AssociationEnd.participant>
                  </UML:AssociationEnd>
               </UML:Association.connection>
            </UML:Association>
         </UML:Namespace.ownedElement>
      </UML:Model>
   </XMI.content>
</XMI>

 

Listing 1 is the same model, exported to XMI. As in the last article, this listing is based on the XMI standard and not on the XMI document that's produced by any particular tool, but it is easy to adapt to a specific tool. New in this listing are stereotypes, tags, associations, and multiplicity. As I explained in the previous article, you will need to review the UML metamodel to interpret these. They follow the logic introduced previously, so I won't comment on them any further.

 

Listing 2. The updated stylesheet
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:xi="http://ananas.org/2002/xi/rules"
                xmlns:UML="org.omg/UML/1.4"
                exclude-result-prefixes="UML"
                version="1.0"
                xmlns:xalan="http://xml.apache.org/xslt">

<xsl:output xi:suffix="xsd" indent="yes" xalan:indent-amount="2"/>

<xsl:variable name="root-id" select="/XMI/XMI.content/UML:Model/UML:Namespace.ownedElement/UML:Stereotype[@name='root']/@xmi.id"/>
<xsl:variable name="position-id" select="/XMI/XMI.content/UML:Model/UML:Namespace.ownedElement/UML:TagDefinition[@name='position']/@xmi.id"/>

<xsl:template match="XMI[@xmi.version='1.2']">
   <xsl:apply-templates select="XMI.content/UML:Model"/>
</xsl:template>

<xsl:template match="XMI">
   <xsl:message terminate="yes">Unknown XMI version</xsl:message>
</xsl:template>

<xsl:template match="UML:Model">
   <xs:schema targetNamespace="http://psol.com/uml/{@name}">
      <xsl:apply-templates select="UML:Namespace.ownedElement/UML:Class[UML:ModelElement.stereotype/UML:Stereotype/@xmi.idref=$root-id]"/>
   </xs:schema>
</xsl:template>

<xsl:template match="UML:Class">
   <xsl:param name="multiplicity"/>
   <xsl:variable name="id" select="@xmi.id"/>
   <xs:element name="{@name}">
      <!-- test because the processor reports an error if the parameter was not set
           (i.e. for root elements, the parameter is never set)                     -->
      <xsl:if test="$multiplicity">
         <xsl:apply-templates select="$multiplicity"/>
      </xsl:if>
      <xs:complexType>
         <xs:sequence>
            <xsl:apply-templates/>
            <xsl:apply-templates select="//UML:Association[UML:Association.connection/UML:AssociationEnd[1]/UML:AssociationEnd.participant/UML:Class[@xmi.idref=$id]]"
                                 mode="association">
               <xsl:sort select="UML:ModelElement.taggedValue/UML:TaggedValue[UML:TaggedValue.type/UML:TagDefinition[@xmi.idref=$position-id]]/@dataValue"
                         data-type="number"/>
            </xsl:apply-templates>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
</xsl:template>

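<xsl:template match="UML:Association" mode="association">
   <!-- the second end of the association is the class to nest -->
   <xsl:variable name="end" select="UML:Association.connection/UML:AssociationEnd[2]"/>
   <xsl:variable name="id" select="$end/UML:AssociationEnd.participant/UML:Class/@xmi.idref"/>
   <xsl:apply-templates select="//UML:Class[@xmi.id=$id]" mode="association">
      <!-- pass only that end's multiplicity, so a multiplicity on the first end is ignored -->
      <xsl:with-param name="multiplicity" select="$end//UML:MultiplicityRange"/>
   </xsl:apply-templates>
</xsl:template>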

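<!-- XSLT 1.0 does not allow variable references in match patterns, so the
     lookup of the root stereotype's id is spelled out here instead of $root-id -->
<xsl:template match="UML:Class[UML:ModelElement.stereotype/UML:Stereotype/@xmi.idref=/XMI/XMI.content/UML:Model/UML:Namespace.ownedElement/UML:Stereotype[@name='root']/@xmi.id]" mode="association">
   <xsl:param name="multiplicity"/>
   <xs:element ref="{@name}">
      <xsl:apply-templates select="$multiplicity"/>
   </xs:element>
</xsl:template>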

<xsl:template match="UML:Class" mode="association">
   <xsl:param name="multiplicity"/>
   <xsl:apply-templates select=".">
      <xsl:with-param name="multiplicity" select="$multiplicity"/>
   </xsl:apply-templates>
</xsl:template>

<xsl:template match="UML:Attribute">
   <xs:element name="{@name}" type="xs:string">
      <xsl:apply-templates select=".//UML:MultiplicityRange"/>
   </xs:element>
</xsl:template>

<xsl:template match="UML:MultiplicityRange">
   <xsl:if test="@lower and number(@lower) != 1">
      <xsl:attribute name="minOccurs"><xsl:value-of select="@lower"/></xsl:attribute>
   </xsl:if>
   <xsl:choose>
      <xsl:when test="@upper and number(@upper) = -1">
         <xsl:attribute name="maxOccurs">unbounded</xsl:attribute>
      </xsl:when>
      <xsl:when test="@upper and number(@upper) != 1">
         <xsl:attribute name="maxOccurs"><xsl:value-of select="@upper"/></xsl:attribute>
      </xsl:when>
   </xsl:choose>
</xsl:template>

<xsl:template match="text()">
   <xsl:value-of select="normalize-space(.)"/>
</xsl:template>

</xsl:stylesheet>

 

 

Listing 2 is an updated version of the stylesheet that converts this XMI model into an XML Schema. Most of the new code is similar to code introduced in the last article, except for the following templates:

I'll leave updating the other stylesheet, which converts XML Schemas into XMI, as an exercise for the reader.
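To check your own adaptation, it helps to know what to expect. Working through Listing 2 by hand against Listing 1, the result should have roughly the shape below. Treat it as a hand-derived sketch rather than captured output: extra namespace declarations are omitted, and attribute order and indentation depend on the processor.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://psol.com/uml/address">
  <!-- person carries the root stereotype, so it is the only global element -->
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="telephone" type="xs:string" minOccurs="0" maxOccurs="5"/>
        <xs:element name="fax" type="xs:string" minOccurs="0" maxOccurs="5"/>
        <xs:element name="email" type="xs:string" minOccurs="0" maxOccurs="5"/>
        <!-- position 20 sorts before position 120: address, then job -->
        <xs:element name="address" maxOccurs="5">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="street" type="xs:string"/>
              <xs:element name="zip" type="xs:string"/>
              <xs:element name="region" type="xs:string"/>
              <xs:element name="city" type="xs:string"/>
              <xs:element name="country" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="job">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="organization" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because address and job are not marked as roots, they end up as local elements nested inside person. Note also that a lower bound of 1 produces no minOccurs attribute, since 1 is the XML Schema default.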

Conclusion
With the material introduced in this series so far, you should be able to prepare your own stylesheets to convert any UML model into XML Schemas. I trust that as you gain experience with the technique, you will find that UML modeling is one of the easiest ways to design XML Schemas.

Although the stylesheets I have introduced are limited to a subset of the UML metamodel, they provide a good starting point from which to design more powerful models.

In the next article, I'll show you how to solve the last remaining problem: designing a stylesheet when more than one mapping from UML to XML is possible.
 

Part 4

Concept mapping
 

In this final article in his series on UML and XML, Benoît wraps up the technique. He discusses the need to simplify the model by burying some of the logic in the XSLT stylesheet. He also points out several common pitfalls.

This article concludes this series on modeling XML applications with the industry-standard UML. The previous installments (see Resources) left one question open: What if you have more than one possible relationship between the UML model and the XML vocabulary? The article further refines what has been an ongoing theme for the series: Modeling is about simplifying reality for practical purposes.

As you have seen, I advocate a realistic and flexible approach that is tailored to the needs of your projects. This article will work on the few remaining loose threads to help you apply this material in your context. With the stylesheets introduced so far and a modeling tool (such as IBM® Rational Rose®), it is easy to start modeling your XML project in an industry-compliant way.

Modeling
As I have stated many times in this series, a model is not drawn in a vacuum; it is created to serve a specific purpose. A model is a simplified representation of certain aspects of reality, and this simplification makes it easier to analyze the underlying reality and ultimately understand it better.

For this series, the reality is an XML vocabulary or a Web service. Admittedly, that's already an abstract reality. Yet if you try to read the text of an XML schema, you will quickly understand why it pays to simplify it. The amount of distraction that's caused by the (somewhat convoluted) schema syntax cries out for simplification. One of the most obvious gains of UML is that, because it's a graphic language, it is more readable than markup. Another advantage is that UML offers a more synthetic view: One glance at a model gives you a rough idea of the number of classes and the complexity of the relationships. Last but not least, UML drops many low-level syntactical details such as namespace prefixes, local and global elements, and whether a concept is an element or an attribute.

Ideally, modeling will help you better understand your application, and therefore produce more suitable XML vocabularies or design stronger Web services APIs.

Model refinement
How much simplification is appropriate depends on the specifics of your application as well as how refined the model is. As I've shown you, modeling is not a one-shot deal (except with the most trivial applications). Typically, modeling starts with an informal session in which you collect the basic definitions and the most simple relationships in the model. You then refine the model during several review sessions. These iterations gradually build a more formal and more complete model (typically moving from the whiteboard to the modeling tool). Ultimately, the UML model is converted into an XML schema, which is a very formal description of the XML vocabulary. Alternatively, the model could be processed into a WSDL file -- again a formal description of a Web service. You can use the same model to generate Java classes.

Take a look at the final stage: the processing of the UML model into the more precise XML schema. As you have seen, this is surprisingly easy to do if you follow a simple methodology. Many UML modeling tools (such as IBM Rational Rose) store the model according to the definitions of the UML metamodel.

Simply put, the UML metamodel is the set of classes that represent a UML model. From comments to packages, including classes themselves, every concept in UML has a metaclass. The Object Management Group (OMG) has also standardized an XML representation of the metamodel, XML Metadata Interchange (XMI). XMI makes the model accessible to XML developers. Actually, I should say "sort of standardized" because different modeling tools (and even different versions of the same tool) can interpret XMI differently. In practice, the differences are small and it's trivial to cope with them in the stylesheet.

Anyway, to generate an XML schema from a UML model, it suffices to decide which concept from the UML metamodel matches which XML schema tag. For example, it's obvious that a UML class will become an XML element. Since the UML metamodel is stored in XML, generating the schema is as simple as writing an XSLT template that matches all instances of UML:Class and converts them to xs:element.

It is also possible (and often desirable) to implement the reverse stylesheet to generate UML models from XML schemas. This is particularly handy when you need to integrate standard vocabularies into your design. Such vocabularies are seldom distributed in UML form, but rather as XML schemas. With the appropriate stylesheet, it does not take long to reverse-engineer them into a model.

And beyond
To write a stylesheet that matches elements from XMI and transforms them into XML schema elements is conceptually simple enough. The stylesheets I provided in my previous articles were neither particularly long nor abnormally complex.

That's the theory, at least. In practice, things can get out of hand if you aren't careful. First, the XSLT coding -- although never dramatic -- can be involved. Review the stylesheet I introduced in part 3 and pay special attention to the template for UML:Class. It is far from the most complex XSLT template I have written, but it's not as straightforward as the discussion above would lead you to believe. So make sure you hone your XSLT skills before you tackle this project (or just rip off my stylesheet).

Secondly, and more importantly, it is not always simple to decide which UML concept matches which XML concept. In the previous article, I pointed to stereotypes and tags as tools to extend the UML model and support XML concepts that have no equivalent in UML. Stereotypes and tags are helpful, and you may be tempted to cover every single aspect of XML schema through stereotypes and tags. Resist the temptation.

Remember that by definition a model is a simplification, so it makes sense that a model, even a refined one, should not include all of the nitty-gritty implementation details. Many aspects are best left out of the model and buried in the stylesheet itself.

Decision time
The W3C XML Schema Recommendation is complex, and you probably don't need to worry about its every detail, so it pays to decide on a subset of the features that you need and will use for a given project. Don't waste your time with the rest of the recommendation.

A clearer model
What should you include and what should you leave out? Unfortunately, I do not have a specific answer to this question. The best course of action is to include those aspects that are important for your project and leave out the rest. Of course, that leaves the issue of deciding what is important to your project.

For most applications, you want to bury the differences between global and local elements, as well as between elements and attributes. In practice, it is more important to define the data fields that are needed than to decide whether those fields should be elements or attributes. After all, elements and attributes are just that: fields where you can store data.

Although there are exceptions, the distinction between elements and attributes is often an implementation detail that's largely irrelevant and distracting for the designer. Therefore, although it may be tempting to use different UML concepts (and possibly stereotypes) to model elements and attributes, I advise you not to.

Why ignore the distinction between elements and attributes? Because it confuses the model and adds very little useful information. Compare the three models in Figure 1:

Figure 1. Three UML models
Three UML models

In counter-clockwise order, the first model (top left) uses stereotypes to mark elements. The second model (bottom) reserves UML attributes for XML attributes, and models XML elements as associations (different UML concepts mark the differences in XML). The last model (top right) makes no such distinction. Which one is the clearest? Keep in mind, this is a simple model. Imagine if there were dozens of classes in each case -- which one would be the most readable? Which one would print out on the smallest amount of paper? It should be clear that the last model is the most readable.

It pays to treat a UML diagram as a user interface. As much as possible, you should minimize clutter and encode the information in a concise and readable way.

Obviously, if you take information away from the UML model, you need to make it available somewhere else. That's the role of the XSLT stylesheet: It must not only convert between UML and XML, but also implement rules that ensure an efficient conversion. The stylesheet introduced in the previous articles makes the following decisions:

These two simple rules suffice to add all the information that's missing from the model. As an added bonus, if I decide to change the rule (for example, to no longer use local elements), I only need to change the stylesheet; I don't need to change the model. If those details were written in the model, I would need to update it.

But not simplistic
You may disagree with my position on attributes. Although attributes are not important in this particular application, your application may differ. Different projects emphasize different aspects of the XML syntax:

So if attributes are crucially important to your project, make them visible in the model. Again, think of the UML diagram as a user interface into your XML vocabulary. You don't want the UI to hide essential aspects. The model is a tool that has no firm rules on what must appear and what must be left out. You should capture all the information needed for your application and no more.
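If you do decide to expose attributes, one way to go about it is sketched below. This is not one of the stylesheets from this series: it assumes a hypothetical stereotype named attribute applied to UML attributes, it reuses the UML namespace URI from the earlier listings, and it leaves out multiplicity and associations to stay short.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:UML="org.omg/UML/1.4"
                exclude-result-prefixes="UML"
                version="1.0">

<xsl:output indent="yes"/>

<!-- id of the hypothetical "attribute" stereotype; an empty node-set
     if the model does not define it -->
<xsl:variable name="attribute-id"
              select="//UML:Stereotype[@name='attribute']/@xmi.id"/>

<xsl:template match="/">
   <xs:schema>
      <xsl:apply-templates select="//UML:Class[@name]"/>
   </xs:schema>
</xsl:template>

<xsl:template match="UML:Class">
   <xsl:variable name="features" select="UML:Classifier.feature/UML:Attribute"/>
   <xs:element name="{@name}">
      <xs:complexType>
         <!-- unmarked UML attributes stay XML elements -->
         <xs:sequence>
            <xsl:apply-templates select="$features[not(UML:ModelElement.stereotype/UML:Stereotype/@xmi.idref=$attribute-id)]"/>
         </xs:sequence>
         <!-- marked ones become XML attributes, declared after the sequence -->
         <xsl:apply-templates select="$features[UML:ModelElement.stereotype/UML:Stereotype/@xmi.idref=$attribute-id]"
                              mode="attribute"/>
      </xs:complexType>
   </xs:element>
</xsl:template>

<xsl:template match="UML:Attribute">
   <xs:element name="{@name}" type="xs:string"/>
</xsl:template>

<xsl:template match="UML:Attribute" mode="attribute">
   <xs:attribute name="{@name}" type="xs:string"/>
</xsl:template>

</xsl:stylesheet>
```

If the model never uses the stereotype, the variable is empty and every UML attribute falls back to an XML element, so the default rule stays in the stylesheet and the model only records the exceptions.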

As you can see, I am not advocating that a single standard mapping can work perfectly for all of your XML applications. XML applications have such diversity that I cannot envision a single mapping that is appropriate for all of them.

Obviously 95% of the UML representation of XML is common to every project, and you can use the stylesheets I have provided as a good starting point. Note that the situation is similar for SQL code generation: You often need to fine-tune code generation to your database.

A word of warning
While I am encouraging you to fine-tune the UML-to-XML mapping to best suit the needs of your project, I recommend that you do so within the framework of UML. You do not need to deviate from standard UML.

Here's an example. Figure 2 is a model that I have often seen used, and it follows bad practice. Specifically, the problem lies in the make-element and namespace-uri attributes.

Figure 2. Poor modeling practice
Poor modeling practice

Presumably, an address does not have a make-element attribute. Instead, the attribute is a hint to the stylesheet to generate a given syntax. The attribute encodes information not about the address, but about the XML coding of the address. This is both dangerous and useless.

It is dangerous because it perverts the definition of UML attributes. An attribute should provide information about its class, not about XML syntax. The result is non-portable, it confuses readers of the model, and it may result in serious maintenance headaches.

Furthermore, it is totally useless because UML provides the extension mechanism (stereotypes and tags) to address this need. If you need to identify a special sort of class or add metalevel information to a class, then you must use UML extensions. Again, while I am advocating that you customize your UML-to-XML mapping to best suit your project, I strongly recommend you do so in the standard manner.
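As an illustration of the standard route, the namespace URI could travel as a tagged value on the model instead of as a class attribute. The sketch below assumes a hypothetical tag definition named namespace-uri whose tagged value is attached to UML:Model; both are my assumptions for this example, not part of the earlier listings.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:UML="org.omg/UML/1.4"
                exclude-result-prefixes="UML"
                version="1.0">

<xsl:output indent="yes"/>

<!-- id of the hypothetical "namespace-uri" tag definition -->
<xsl:variable name="namespace-id"
              select="//UML:TagDefinition[@name='namespace-uri']/@xmi.id"/>

<xsl:template match="/">
   <xsl:apply-templates select="//UML:Model"/>
</xsl:template>

<xsl:template match="UML:Model">
   <!-- the tagged value on the model that is typed by that definition -->
   <xsl:variable name="uri"
                 select="UML:ModelElement.taggedValue/UML:TaggedValue[UML:TaggedValue.type/UML:TagDefinition/@xmi.idref=$namespace-id]/@dataValue"/>
   <xs:schema>
      <!-- emit targetNamespace only when the tag is present -->
      <xsl:if test="$uri">
         <xsl:attribute name="targetNamespace">
            <xsl:value-of select="$uri"/>
         </xsl:attribute>
      </xsl:if>
      <xsl:apply-templates select="UML:Namespace.ownedElement/UML:Class"/>
   </xs:schema>
</xsl:template>

<xsl:template match="UML:Class">
   <xs:element name="{@name}"/>
</xsl:template>

</xsl:stylesheet>
```

The address class keeps only the attributes that describe an address, and the XML-specific hint lives where UML expects metalevel information to live. A make-element hint could be handled the same way with a stereotype.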

Conclusion
I trust this series has given you insight into UML modeling of XML applications. Interest in modeling XML applications with UML is growing, if only because UML models can be shared with Java, C++, and other languages. I have reviewed the tools (UML metamodel, XMI, and XSLT) to make modeling of XML applications a reality. With a modeling tool and the stylesheets I have provided, you are ready to go.

Resources

About the author
Benoît Marchal is a Belgian consultant. He is the author of XML by Example, Second Edition and other XML books.