Why do we validate XML data

Validation can be considered a “firewall” against the diversity of XML. We need such firewalls principally in two situations: to serve as actual firewalls when we receive documents from the external world (as is commonly the case with Web Services and other XML communications), and to provide check points when we design processes as pipelines of transformations. By validating documents against schemas, you can ensure that the documents’ contents conform to your expected set of rules, simplifying the code needed to process them.

Validation of documents can substantially reduce the risk of processing XML documents received from sources beyond your control. It doesn’t remove either the need to follow the administration rules of your chosen communication protocol or the need to write robust applications, but it’s a useful additional layer of tests that fits between the communications interface and your internal code.

Validation can take place at several levels. Structural validation makes certain that XML element and attribute structures meet specified requirements, but doesn’t clarify much about the textual content of those structures. Data validation looks more closely at the contents of those structures, ensuring that they conform to rules about what type of information should be present. Other kinds of validation, often called business rules, may check relationships between information and a higher level of sanity-checking, but this is usually the domain of procedural code, not schema-based validation.

XML is a good foundation for pipelines of transformations using widely available tools. Since each of these transformations introduces a risk of error, and each error is easier to fix when detected near its source, it is good practice to introduce check points in the pipeline where the documents are validated. Some applications will find that validating after each step is an overhead cost they can’t bear, while others will find that it is crucial to detect the errors just as they happen, before they can cause any harm and when they are still easy to diagnose. Different situations may have different validation requirements, and it may make sense to validate more heavily during pipeline design than during production deployment.

Source: “XML Schema”, Section 1.1, “What Schemas Do for XML”, O’Reilly
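The check points described in the passage above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `<order>`/`<id>` document shape, the check `require_order_id`, and the steps `trim_text` and `drop_ids` are all hypothetical, and the check is hand-written rather than schema-based. The `validate_each_step` flag mirrors the idea of validating heavily during pipeline design and less in production.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

Step = Callable[[ET.Element], ET.Element]
Check = Callable[[ET.Element], List[str]]


def require_order_id(root: ET.Element) -> List[str]:
    """Hypothetical check: every <order> must carry a non-empty <id>."""
    errors = []
    for i, order in enumerate(root.iter("order")):
        if not (order.findtext("id") or "").strip():
            errors.append(f"order #{i}: missing <id>")
    return errors


def _raise_on_errors(errors: List[str], where: str) -> None:
    if errors:
        raise ValueError(f"validation failed {where}: {errors}")


def run_pipeline(root: ET.Element, steps: List[Step], check: Check,
                 validate_each_step: bool = True) -> ET.Element:
    """Apply each step in turn, optionally validating after every step."""
    for step in steps:
        root = step(root)
        if validate_each_step:
            # Check point: report the error next to the step that caused it.
            _raise_on_errors(check(root), f"after step {step.__name__}")
    # The final output is always validated, even with per-step checks off.
    _raise_on_errors(check(root), "at pipeline output")
    return root


def trim_text(root: ET.Element) -> ET.Element:
    """Example step: strip surrounding whitespace from all element text."""
    for el in root.iter():
        if el.text:
            el.text = el.text.strip()
    return root


def drop_ids(root: ET.Element) -> ET.Element:
    """Deliberately faulty step, to show where the error gets reported."""
    for order in list(root.iter("order")):
        for id_el in order.findall("id"):
            order.remove(id_el)
    return root
```

With per-step validation on, a failure names the step that introduced it (here `drop_ids`). With it off, the same failure is only caught at the pipeline output, where it is harder to trace back.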

Requirements on XML document validation

  • Prerequisites

      • The XML Schema document should be valid.
      • The XML instance document should be well formed.
      • The XML instance document should specify its XML schema.

  • Required validations

      • Does the document contain the required fields, in the required order and with the required number of occurrences?
      • Does each field satisfy its lexical representation requirements, e.g. ISO 8601 date and time formats such as CCYY-MM-DD and HHMM?
      • Is the data free of illegal characters, e.g. control characters (TAB, LINEFEED, etc.)?
      • Does the document satisfy the conditional requirements on its fields? For instance, some field SHOULD NOT be omitted when detail information is not reported.
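To make the required validations concrete, here is a minimal sketch that performs each of them by hand using only the Python standard library. It is not schema-based validation; in practice most of these rules would be expressed in the XML schema itself. The `<report>` document with `<date>`, `<time>`, `<summary>` and `<detail>` children, the `SPEC` table, and the rule "`<summary>` is required when `<detail>` is absent" are all assumptions made up for illustration.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical content model: (tag, min occurrences, max occurrences), in order.
SPEC = [("date", 1, 1), ("time", 1, 1), ("summary", 0, 1), ("detail", 0, 1)]

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")          # CCYY-MM-DD
TIME_RE = re.compile(r"([01]\d|2[0-3])[0-5]\d")     # HHMM
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")      # includes TAB and LINEFEED


def check_structure(root):
    """Required fields, in the required order and occurrence."""
    errors, pos = [], 0
    tags = [child.tag for child in root]
    for tag, lo, hi in SPEC:
        count = 0
        while pos < len(tags) and tags[pos] == tag and count < hi:
            pos += 1
            count += 1
        if count < lo:
            errors.append(f"<{tag}>: expected at least {lo}, found {count}")
    if pos < len(tags):
        errors.append(f"unexpected or misplaced element <{tags[pos]}>")
    return errors


def is_iso_date(text):
    """Lexical form CCYY-MM-DD that also names a real calendar date."""
    if not DATE_RE.fullmatch(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def validate(xml_text):
    """Return a list of error messages; an empty list means the document passed."""
    try:
        root = ET.fromstring(xml_text)  # prerequisite: well-formedness
    except ET.ParseError as exc:
        return [f"not well formed: {exc}"]

    errors = check_structure(root)
    for child in root:
        text = child.text or ""
        if CONTROL_CHARS.search(text):
            errors.append(f"<{child.tag}>: contains control characters")
        elif child.tag == "date" and not is_iso_date(text):
            errors.append(f"<date>: '{text}' is not a valid CCYY-MM-DD date")
        elif child.tag == "time" and not TIME_RE.fullmatch(text):
            errors.append(f"<time>: '{text}' is not a valid HHMM time")

    # Hypothetical conditional rule: <summary> is required when <detail> is absent.
    if root.find("detail") is None and root.find("summary") is None:
        errors.append("<summary> must be present when <detail> is not reported")
    return errors
```

Collecting every problem into a list, rather than stopping at the first one, gives the sender a single complete report of what to fix.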