Tuesday, 16 July 2013

DTDs and XML Schemas

Describing your Data: DTDs and XML Schemas

If you've been developing with XML for even a short period of time, you are likely to have reached the point of wanting to describe your XML data structures. Document Type Definitions (DTDs) and XML Schemas are key technologies in this area.

Although neither is strictly required for XML development, both DTDs and XML Schemas are important parts of the XML toolbox. DTDs have been around for over twenty years as a part of SGML, while XML Schemas are relative newcomers. Though they use very different syntax and take different approaches to describing document structures, both mechanisms occupy the same turf. The W3C seems to be grooming XML Schemas as a replacement for DTDs, but it isn't yet clear how quickly the transition will be made. DTDs are here and now, while XML Schemas are, in large part, for the future.

What DTDs and XML Schemas Do

Document Type Definitions and XML Schemas both provide descriptions of document structures. The emphasis is on making those descriptions readable to automated processors such as parsers, editors, and other XML-based tools. They can also carry information for human consumption, describing what different elements should contain, how they should be used, and what interactions may take place between parts of a document. Although they use very different syntax to achieve this task, they both create documentation.

Perhaps the most important thing DTDs and XML Schemas do is set expectations, using a formal vocabulary and other information to lay ground rules for document structures. Two parsers, given a document and a DTD, should have the same opinions about whether that document is valid, and different schema processors should similarly agree on whether or not a document conforms to the rules in a given schema. XML editing applications can use DTDs and schemas as frameworks, letting users create documents that meet these expectations. Similarly, developers can use DTDs and XML Schemas as a foundation on which to plan transformations from one format to another. By agreeing to a given DTD or schema, a group of developers has accepted a set of rules about document vocabulary and structure. While this doesn't solve all the problems of application development, it does at least mean that independent development of tools that process these documents is a lot easier.
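To make the "two parsers should agree" point concrete, here is a minimal sketch that uses the JDK's built-in JAXP parser (`DocumentBuilderFactory` with `setValidating(true)`) to check a document against an internal DTD subset. The `note`/`to`/`body` vocabulary is a made-up example, and treating any reported validity error as "invalid" is an assumption of this sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DtdValidation {

    // Returns true if the document is valid against its DTD; throws if it is not well-formed.
    public static boolean isValid(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // check the document against its DOCTYPE's DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        final boolean[] valid = {true};
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            // Validity errors are recoverable: record them and keep parsing.
            public void error(SAXParseException e) { valid[0] = false; }
            // Well-formedness errors are fatal: rethrow them.
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return valid[0];
    }

    public static void main(String[] args) throws Exception {
        String dtd = "<!DOCTYPE note [\n"
                + "  <!ELEMENT note (to, body)>\n"
                + "  <!ELEMENT to (#PCDATA)>\n"
                + "  <!ELEMENT body (#PCDATA)>\n"
                + "]>\n";
        System.out.println(isValid(dtd + "<note><to>Ann</to><body>Hi</body></note>")); // true
        System.out.println(isValid(dtd + "<note><body>Hi</body></note>"));            // false: <to> missing
    }
}
```

Any conforming validating parser given the same document and DTD should reach the same two verdicts.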

Schemas and DTDs also contribute to document content in a number of ways:

  • Providing defaults for attributes: in addition to providing constraints on attribute content, DTDs and XML Schemas allow developers to specify default values that should be used if no value was set in the content explicitly.
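  • Entity declaration: DTDs provide for the declaration of parsed entities, which can be referenced from within documents to include content. (The final XML Schema 1.0 Recommendation has no equivalent entity declarations, so documents that rely on entities still need a DTD.)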
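As an illustration of both features, this sketch (assuming the JDK's JAXP DOM parser) declares a default attribute value and a parsed internal entity in a DTD, then shows that the parser fills in the default and expands the entity. The `order` element, its `currency` attribute, and the `co` entity are hypothetical names chosen for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DtdDefaults {

    // Hypothetical document: a DTD default for 'currency' and an internal entity 'co'.
    public static final String XML =
            "<!DOCTYPE order [\n"
          + "  <!ELEMENT order (#PCDATA)>\n"
          + "  <!ATTLIST order currency CDATA \"USD\">\n"
          + "  <!ENTITY co \"Example Corp\">\n"
          + "]>\n"
          + "<order>Supplied by &co;</order>";

    // Parses without validation; internal-subset defaults and entities are still applied.
    public static Element parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element order = parse(XML);
        System.out.println(order.getAttribute("currency")); // USD (supplied by the DTD)
        System.out.println(order.getTextContent());         // Supplied by Example Corp
    }
}
```

Note that the document itself never sets `currency`; the value comes entirely from the DTD.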

Schemas and DTDs may also describe "notations" and "unparsed entities", adding information to documents that applications may use to interpret their content.

Where DTDs and Schemas Come From

The main thrust of development work, initially for XML 1.0 and its DTDs, and now for XML Schemas, is taking place at the World Wide Web Consortium (W3C). However, the W3C is not the only source for schema languages. At least five other schema proposals have been developed and many of them are in actual use -- notably, Microsoft's XML-Data, which is used for its BizTalk initiative. Most of these proposals are feeding into the main W3C-sanctioned development process. The main contenders in the schema arena, including DTDs, are listed below:

  • DTDs - Document Type Definitions were originally developed for XML's predecessor, SGML. They use a very compact syntax and provide document-oriented data typing. XML DTDs are a subset of those available in SGML, and the rules for using XML DTDs account for much of the complexity of XML 1.0. Complete XML DTD support is (or should be) built into all validating XML parsers, and some XML DTD support is built into all XML parsers.

  • XML-Data/XML-Data Reduced - Based on a proposal that Microsoft and others submitted to the W3C even before XML 1.0 was completed, this schema proposal is used in Microsoft's BizTalk framework. XML-Data provides a large set of data types more appropriate to database and program interchange. XML-Data support is built into Microsoft's XML parser.

  • Document Content Description (DCD) - Created in a joint effort between IBM and Microsoft, DCD uses some ideas from XML-Data and some syntax from another W3C project, Resource Description Framework (RDF).

  • Schema for Object-Oriented XML (SOX) - SOX was developed by Veo Systems (since acquired by Commerce One) and provides functionality like inheritance to XML structures. SOX has gone through multiple versions; the latest is SOX version 2.

  • Document Description Markup Language (DDML) - DDML was developed on the XML-dev mailing list, creating a schema language with a subset of DTD functionality. Development of DDML (which was once known as XSchema) halted when the W3C XML Schema Activity began.

Although you can start work with any of the above tools today (DTDs being the most widely supported), once the W3C specification is complete, W3C XML Schemas will probably be the safest long-term choice. Fortunately, converting among different schema formats isn't especially difficult, and tools are available to help you in the process.

How Schemas Differ from DTDs

The first, and probably most significant, difference between XML Schemas and XML DTDs is that XML Schemas use XML document syntax. While transforming the syntax to XML doesn't automatically improve the quality of the description, it does make those descriptions far more extensible than they were in the original DTD syntax. Declarations can have richer and more complex internal structures than declarations in DTDs, and schema designers can take advantage of XML's containment hierarchies to add extra information where appropriate -- even sophisticated information like documentation. There are a few other benefits of this approach: XML Schemas can be stored along with other XML documents in XML-oriented data stores, referenced, and even styled, using tools like XLink, XPointer, and XSL.
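Because a W3C XML Schema is itself an XML document, ordinary XML tools can read it. This sketch uses a namespace-aware JAXP DOM parser to list the element declarations in a small schema and pull out its embedded documentation. The schema content is a hypothetical example written against the final XML Schema 1.0 namespace (`http://www.w3.org/2001/XMLSchema`), which postdates the drafts discussed here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SchemaAsXml {

    static final String XS = "http://www.w3.org/2001/XMLSchema";

    // A small hypothetical schema with embedded human-readable documentation.
    public static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='note'>"
          + "<xs:annotation><xs:documentation>A short message.</xs:documentation></xs:annotation>"
          + "</xs:element>"
          + "<xs:element name='to' type='xs:string'/>"
          + "</xs:schema>";

    // Parses the schema as an ordinary XML document.
    static Document load(String xsd) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // needed for getElementsByTagNameNS
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xsd)));
    }

    // Returns the 'name' of every xs:element declaration, in document order.
    public static List<String> declaredElements(String xsd) throws Exception {
        NodeList nodes = load(xsd).getElementsByTagNameNS(XS, "element");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(((Element) nodes.item(i)).getAttribute("name"));
        }
        return names;
    }

    // Returns the text of the first xs:documentation element.
    public static String firstDocumentation(String xsd) throws Exception {
        return load(xsd).getElementsByTagNameNS(XS, "documentation").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(declaredElements(XSD));   // [note, to]
        System.out.println(firstDocumentation(XSD)); // A short message.
    }
}
```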

The largest functional addition XML Schemas make is a vastly improved data typing system. XML Schemas provide data-oriented data types in addition to the more document-oriented data types XML 1.0 DTDs support, making XML more suitable for data interchange applications. Built-in datatypes include strings, booleans, and time values, and the XML Schemas draft provides a mechanism for deriving additional data types. Using that system, the draft provides support for all of the XML 1.0 data types (NMTOKENS, IDREFS, etc.) as well as data-specific types like decimal, integer, date, and time. Using XML Schemas, developers can build their own libraries of easily interchanged data types and use them inside schemas or across multiple schemas.
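A short sketch of data-oriented typing, assuming the final XML Schema 1.0 syntax and the JDK's `javax.xml.validation` API. The schema uses the built-in `xs:decimal` and `xs:date` types plus a hypothetical `quantity` type derived from `xs:integer` by restriction, so a document with a zero quantity is rejected even though a DTD would have accepted it as plain text.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class TypedValidation {

    // Hypothetical schema: built-in decimal/date types plus a derived 'quantity' type.
    public static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:simpleType name='quantity'>"
          + "<xs:restriction base='xs:integer'><xs:minInclusive value='1'/></xs:restriction>"
          + "</xs:simpleType>"
          + "<xs:element name='item'><xs:complexType><xs:sequence>"
          + "<xs:element name='price' type='xs:decimal'/>"
          + "<xs:element name='shipped' type='xs:date'/>"
          + "<xs:element name='qty' type='quantity'/>"
          + "</xs:sequence></xs:complexType></xs:element>"
          + "</xs:schema>";

    // Returns true if xml is valid against xsd; with no ErrorHandler set, the Validator throws on errors.
    public static boolean isValid(String xsd, String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(xsd)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String ok  = "<item><price>19.99</price><shipped>2013-07-16</shipped><qty>2</qty></item>";
        String bad = "<item><price>19.99</price><shipped>2013-07-16</shipped><qty>0</qty></item>";
        System.out.println(isValid(XSD, ok));  // true
        System.out.println(isValid(XSD, bad)); // false: 0 is below minInclusive 1
    }
}
```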

The current draft of XML Schemas also uses a very different style from DTDs for declaring elements and attributes. In addition to declaring elements and attributes individually, developers can create models -- archetypes -- that can be applied to multiple elements and refined if necessary. This provides a lot of the functionality SOX had developed to support object-oriented concepts like inheritance. Archetype development and refinement will probably become the mark of the high-end schema developer, much as the effective use of parameter entities was the mark of the high-end DTD developer. Archetypes should be easier to model and use consistently, however.
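In the final XML Schema 1.0 Recommendation, the archetype idea survives as named complex types, and refinement is expressed with `xs:extension` (or `xs:restriction`). The sketch below, with hypothetical `address` and `usAddress` types, derives a US address from a base address model and checks documents using the JDK's `javax.xml.validation` API.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class TypeExtension {

    public static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          // Base model, reusable by any element.
          + "<xs:complexType name='address'><xs:sequence>"
          + "<xs:element name='street' type='xs:string'/>"
          + "<xs:element name='city' type='xs:string'/>"
          + "</xs:sequence></xs:complexType>"
          // Refinement: usAddress inherits street/city and appends zip.
          + "<xs:complexType name='usAddress'><xs:complexContent>"
          + "<xs:extension base='address'><xs:sequence>"
          + "<xs:element name='zip' type='xs:string'/>"
          + "</xs:sequence></xs:extension>"
          + "</xs:complexContent></xs:complexType>"
          + "<xs:element name='addresses'><xs:complexType><xs:sequence>"
          + "<xs:element name='billing' type='address'/>"
          + "<xs:element name='shipping' type='usAddress'/>"
          + "</xs:sequence></xs:complexType></xs:element>"
          + "</xs:schema>";

    // Returns true if xml is valid against xsd; the default Validator throws on any error.
    public static boolean isValid(String xsd, String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(xsd)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String billing = "<billing><street>1 Main St</street><city>Springfield</city></billing>";
        String ok  = "<addresses>" + billing
                + "<shipping><street>2 Oak Ave</street><city>Springfield</city><zip>12345</zip></shipping></addresses>";
        String bad = "<addresses>" + billing
                + "<shipping><street>2 Oak Ave</street><city>Springfield</city></shipping></addresses>";
        System.out.println(isValid(XSD, ok));  // true
        System.out.println(isValid(XSD, bad)); // false: usAddress requires zip
    }
}
```

The base model is written once; a change to `address` flows into every type derived from it, which is the kind of reuse parameter entities only approximate in DTDs.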

XML Schemas also support namespaces, a key feature of the W3C's vision for the future of XML. While it probably wouldn't be impossible to integrate DTDs and namespaces, the W3C has decided to move on, supporting namespaces in its newer developments and not retrofitting XML 1.0. In many cases, provided that namespace prefixes don't change or simply aren't used, DTDs can work just fine with namespaces, and should be able to interoperate with namespaces and schema processing that relies on namespaces. There will be a few cases, however, where namespaces may force developers to use the newer schemas rather than the older DTDs.
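Schema processing matches elements by namespace URI rather than by prefix, which is why prefixes matter to DTDs (where `b:book` is just a literal name) but not to schemas. In this sketch, which assumes a hypothetical namespace `urn:example:books` and the JDK's `javax.xml.validation` API, the same content validates whether it uses a `b:` prefix or a default namespace declaration, while an un-namespaced copy fails.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class NamespaceValidation {

    // Hypothetical schema: 'book' and 'title' live in urn:example:books.
    public static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
          + " targetNamespace='urn:example:books' elementFormDefault='qualified'>"
          + "<xs:element name='book'><xs:complexType><xs:sequence>"
          + "<xs:element name='title' type='xs:string'/>"
          + "</xs:sequence></xs:complexType></xs:element>"
          + "</xs:schema>";

    // Returns true if xml is valid against xsd; the default Validator throws on any error.
    public static boolean isValid(String xsd, String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(xsd)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Same namespace URI, different prefix choices: both valid.
        System.out.println(isValid(XSD, "<b:book xmlns:b='urn:example:books'><b:title>XML</b:title></b:book>")); // true
        System.out.println(isValid(XSD, "<book xmlns='urn:example:books'><title>XML</title></book>"));           // true
        // No namespace at all: the schema has no declaration for it.
        System.out.println(isValid(XSD, "<book><title>XML</title></book>"));                                      // false
    }
}
```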

Alternative Approaches

As exciting as XML Schemas are, there have been a few suggestions for very different approaches that also hold promise. Both Rick Jelliffe's Schematron and the Document Structure Description (DSD), from AT&T Labs and the University of Aarhus, look at documents from a more complex perspective than containment, and use tools derived from style languages -- Schematron is based on XSL, while DSD works from CSS -- to examine documents more closely.

Schematron allows developers to ask about the existence and contents of paths through documents rather than specify containment structures, and places great importance on producing human-readable results. Schematron processing, which can use XSL tools, can produce complete reports on the content and structure of documents, rather than a simple yes/no validation with error reporting.
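Schematron itself is not part of the JDK, so the following is only a Schematron-style illustration, not a Schematron implementation: each rule pairs an XPath test with a human-readable message, and the result is a list of failed assertions rather than a bare yes/no. The `order`/`item` vocabulary and the rules are hypothetical; `javax.xml.xpath` is the standard JDK XPath API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PathAssertions {

    // Evaluates each XPath test against the document and collects the messages of failed tests.
    public static List<String> report(String xml, Map<String, String> rules) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            Boolean holds = (Boolean) xpath.evaluate(rule.getKey(), doc, XPathConstants.BOOLEAN);
            if (!holds) {
                failures.add(rule.getValue());
            }
        }
        return failures;
    }

    // Hypothetical rules about an <order> document; LinkedHashMap keeps report order stable.
    public static Map<String, String> orderRules() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("count(/order/item) > 0", "An order must contain at least one item.");
        rules.put("not(/order/item[not(@price)])", "Every item needs a price attribute.");
        rules.put("/order/@total = sum(/order/item/@price)", "The order total must equal the sum of item prices.");
        return rules;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(report("<order total='30'><item price='10'/><item price='20'/></order>", orderRules()));
        // []
        System.out.println(report("<order total='25'><item price='10'/><item/></order>", orderRules()));
        // [Every item needs a price attribute., The order total must equal the sum of item prices.]
    }
}
```

The third rule is the kind of cross-field check that a containment grammar (DTD or XML Schema) cannot express.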

DSD comes from somewhat similar origins, but uses its own vocabulary to create document descriptions rather than building on the XSL processing model. DSD schemas look much more like the W3C's XML Schemas, but support a different set of tests and have a much greater focus on tasks like providing default content for attributes and elements. DSD allows for context-sensitive rules, where the required usage of a given element may change depending on how it is used in a document. Attributes which are optional in one context may be required in another context. Declarations may impose order on some elements but not on others, making it possible to create 'floating' elements. An open-source implementation in C is available, which adds error information to the document as it is processed, giving applications or users a chance to react to the errors.

It isn't clear at this point whether these approaches will be integrated with XML Schemas at some level, or if they'll be useful tools for supplementing or replacing XML Schemas on particular kinds of projects. In any case, both of these projects are worth further investigation.

Planning Around DTDs and Schemas

Transitioning from one technology to another is often difficult, but at least the transition from DTDs to schemas involves only the descriptions of documents, requiring at most minor changes to the documents themselves. It isn't yet clear whether it's time to begin the transition, as the latest public draft of XML Schemas came with a warning on the XML-dev mailing list that future drafts may bring significant changes. XML Schemas are still far from stable, so probably only the most enthusiastic early adopters should be considering them at this point.

Although XML Schemas may not yet be ready, XML-based projects should be prepared for their eventual arrival. There are several strategies for handling this transition that may be appropriate to different kinds of projects and different developer needs.

  • Develop DTDs with an eye toward future conversion to schemas. Automated tools for converting among schema formats, like Extensibility's XML Authority, are already available and are likely to grow to include the final W3C XML Schemas.

  • Use other schema formats, like XML-Data and SOX. This lets developers take advantage of features like data typing immediately, and conversions from these experimental schema formats to the new XML Schemas shouldn't be prohibitively difficult.

  • Create well-formed documents for now, ignoring DTDs and schemas in their current incarnation. It's not always easy to retrofit a schema onto a set of documents, but this may be appropriate where the format of existing data sources (like databases) ensures that there won't be wild variations in structure. When schemas arrive, you can add them to your processing.

  • Ignore DTDs and schemas completely, and only work with well-formed documents. If you don't need structure checking, this may be a perfectly appropriate strategy.

  • Plan to stick to DTDs. They're here now, and they'll still be here later. If your XML has to be processed by SGML tools, this may be the best route. Keeping your DTDs around, even if you supplement them with equivalent XML Schemas, will preserve interoperability.
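For the "well-formed only" strategies above, a non-validating parser is enough. This sketch uses the JDK's SAX parser, which does not validate by default: malformed markup is rejected, but a document that breaks the rules of its own DTD is still accepted, because validity is never checked.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormed {

    // True if the text parses as well-formed XML; no DTD or schema validation is done.
    public static boolean isWellFormed(String xml) {
        SAXParserFactory factory = SAXParserFactory.newInstance(); // non-validating by default
        factory.setNamespaceAware(true);
        try {
            // DefaultHandler ignores content and rethrows fatal (well-formedness) errors.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>")); // true
        System.out.println(isWellFormed("<a><b></a>"));  // false: <b> is never closed
        // Well-formed but invalid against its own DTD (a is declared EMPTY): still accepted.
        System.out.println(isWellFormed("<!DOCTYPE a [<!ELEMENT a EMPTY>]><a>text</a>")); // true
    }
}
```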

There is no single answer for handling this transition that applies to all XML projects. If all your XML work involves documents, DTDs may be a perfectly adequate tool for your needs, and schemas might only be a distraction. If you're trying to manage data interchange between databases of different kinds, the data typing functionality that schemas provide may drive you to use XML-Data or SOX today, and XML Schemas when they arrive.

The Future for DTDs and Schemas

Right now there are too many options for describing your data, but in the future they will probably slim down to three: DTDs, for legacy XML 1.0 applications and integration with SGML; XML Schemas; and plain old well-formed documents, for situations where describing document structures is unnecessary or counterproductive. Whatever you do with DTDs and XML Schemas, remember that their usage should be considered a part of document format specification and documentation. Where documentation is important, these tools will be important, both to set expectations and to spare applications the task of checking document structures themselves.

The DSD and Schematron approaches will probably receive more attention in future development as well; Schematron is already an easy and useful supplement to both DTD and XML Schema processing. Both of these tools provide functionality that goes beyond anything the W3C has currently released, demonstrating that there are multiple useful approaches to describing document structures. While it seems unlikely that developers will want to create a DTD, an XML Schema, a Schematron schema, and a DSD, all for the same document, they are all important new tools in the XML developer's toolkit.
