Wednesday, January 23, 2008

Web Service Clients in Browsers

A quick Google reveals several toolkits for talking to Web Services from JavaScript. The purpose of this post is to toot the horn of a new one. Apache CXF version 2.1 will include support for browser-resident JavaScript clients, and you can try it out, right now, by downloading a snapshot.

What's so special about the CXF JavaScript support?

To begin with, it really does handle standard-conforming web services. Several of the other packages out there lack true support for XML namespaces and XML Schema. Some of them require, for example, that the messages use fixed prefixes for XML namespaces, contrary to the XML spec. The CXF client handles some of the more complex XML Schema cases (xs:any or xs:anyType), though there are some extreme cases (xs:choice) that it doesn't do.

Along with the rest of CXF, it comes under an Apache license, so that those of you with an allergy to the GPL need have no fear.

It offers a very convenient mechanism for delivering the JavaScript code to the browser. A URL like:
delivers the actual JavaScript. Put that URL in a script tag, and you're all set.

It handles MTOM attachments. Why would you want MTOM attachments in a browser, where you can't have any binary at all? You would want them to attach non-XML-1.0 text without paying the base64 tax.

Note that this is a code-generation scheme. CXF generates a JavaScript 'class' (a constructor with some functions on its prototype) for your service, and more JavaScript classes for your bean classes. Incoming messages are deserialized from XML into these objects, and outgoing messages get the reverse treatment. It is not a dynamic scheme in which you hand in a set of parameters and the code tries to map them to some operation of a service.

Using full web services gives you much more flexibility than using REST or other simplified mechanisms. If you have a use for all of that, please give the CXF JavaScript facility a whirl.

Sunday, January 20, 2008

XML 1.0 versus Web Services

On the cxf-user mailing list, we see a question over and over:

"Why can't I send an escape character to my web service?"

On further examination, the symptoms are always the same. The schema type is xs:string. The data contains a C0 control character, such as Form Feed or Escape. The result is either an error or a character that silently disappears. And the OP is annoyed.

The OP generally gets even more annoyed when we reveal the ugly truth: there's no good solution to this.


Long ago, when the world was young, the W3C created the specification for XML 1.0. They chose Unicode as the fundamental representation of text, so that everyone's favorite poem could be captured in an XML document.

However, they made the mistake of actually reading the Unicode specification in detail (an exercise for the insomniac if ever there was one). And they spotted the presence of the dusty, musty, old-fashioned ASCII control characters.

Having noticed them, they banished them from XML as per section 2.2. You may think I'm being bombastic with the term 'banished,' but it's really the simple truth. An XML document cannot contain any characters outside of production [2] of section 2.2. A character represented with an & is still a character. So a character reference such as &#x1B; is no more permitted than the same character sitting there, literally, in the document.
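To make the rule concrete, here is a minimal sketch of a check against production [2]. The class and method names (XmlChars, isXmlChar, firstInvalidIndex) are invented for illustration and are not part of CXF or any parser; the ranges are the ones listed in section 2.2 of XML 1.0.

```java
/** Hypothetical helper: tests text against the XML 1.0 Char production. */
final class XmlChars {
    private XmlChars() {}

    /** True if the code point is allowed by production [2] (Char) of XML 1.0. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Index of the first char that XML 1.0 forbids, or -1 if the text is clean. */
    static int firstInvalidIndex(String text) {
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            // Step over both halves of a surrogate pair when there is one.
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

Running firstInvalidIndex over outgoing strings before they reach the toolkit at least turns a silent loss into a diagnosable error; it does nothing to get the form feed across the wire.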

Here comes the nasty part. Consider a simple Web Service. The service has a WSDL, the WSDL has a schema, and the schema specifies a string. An xsd:string. The web service binding will, quite cheerfully, map this to a Java String or a C# string. And now the fuse begins to burn...

Java Strings and C# strings hold any Unicode characters. Not just the ones that are valid in XML. xsd:string values, on the other hand, describe the XML content model, and so cannot contain control characters. By a certain logic, toolkits should refuse to map xsd:string to plain old String data types. They should map them to some class that checks for compliance with XML. You can imagine how popular that would be.

This problem seems to have become more visible of late. Why? Because more XML parsers are paying attention to section 2.2 of the specification. A few years back, all the common Java XML parsers ignored the restriction, and only the Microsoft DOM made a conspicuous point of rejecting invalid characters. Now, mainstream parsers, as used by mainstream web service toolkits, are paying attention. In the case of Woodstox, sadly, the attention being paid consists of discarding the rejects rather than diagnosing them.

Application developers are not happy. They want to send document content through their web service, without worrying about the occasional stray form feed.

What is to be done?

Well, there's XML 1.1. It does not forbid these characters. However, all of the web service specifications demand XML 1.0, and there's no sign on the horizon of any alternative. So there's no help there.

There's base64. Particularly for short strings, xsd:base64Binary is the only practical solution. Sadly, data bindings for web services don't give you much help here. You'd like to @nnotate that you want to have a Java String as the Java datatype, xsd:base64Binary as the schema datatype, and let the generated code take care of everything else. No such luck. You can call mystring.getBytes("utf-8") and pass the resulting byte[] into your service, and reverse the process on the other end.
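The hand-rolled round trip described above might look like the following sketch. The helper class TextAsBinary and its method names are made up for illustration; the byte[] is what you would hand to a service operation whose schema type is xsd:base64Binary. The wireForm method uses java.util.Base64 from current JDKs only to show what ends up in the message, since the data binding normally does that step for you.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper for carrying arbitrary text in an xsd:base64Binary field. */
final class TextAsBinary {
    private TextAsBinary() {}

    /** Sender side: the bytes to pass into the service in place of the String. */
    static byte[] toPayload(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Receiver side: rebuild the original String from the bytes. */
    static String fromPayload(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    /** For inspection only: the base64 text that appears on the wire. */
    static String wireForm(byte[] payload) {
        return Base64.getEncoder().encodeToString(payload);
    }
}
```

Naming the charset explicitly on both ends is the point of the exercise; relying on the platform default is how the non-ASCII characters get mangled.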

Be careful with that UTF-8, by the way. In JavaScript, in particular, there are many base64 packages floating around that assume that the data will be ASCII.

If you have a lot of data, it's time to contemplate attachments.

I just finished teaching CXF's JavaScript client generator to handle MTOM attachments for this purpose. Half the battle was the sloppy documentation on MTOM on the web. Beware of Metro's documentation here. It has bugs in the example of the wire traffic and bugs in the schema for xmime:ContentType. Other than that it's quite informative.

By the time I was done, a Java side DataHandler with content type of 'text/plain;charset=utf-8' was mapped, bidirectionally, to a JavaScript variable in the browser, with an MTOM attachment in between.

Anyone who needs to ship arbitrary textual content through a web service has to think about this. If your application can scoop up a form feed, there is an angel with a flaming sword standing between you and the convenience of xsd:string. You have been warned.

Saturday, January 19, 2008

Web Services: Contract? Code? Who's On First?

Just about all of the writing on Web Services recommends contract-first development. Even packages that have extensive support for code-first, and weaker support for contract-first, feel compelled to salute this flag.

The advice sounds good. A strong contract avoids ambiguity and maximizes inter-operation. What could be wrong with that?

There is one tiny detail missing, however: the language in which you must specify this contract. Like 18th century English legal contracts written in 17th century French and medieval Latin, web service contracts are written in two languages that few really understand and which offer a wide variety of pitfalls to the unwary. These languages are XML Schema and WSDL.

This doesn't necessarily imply that contract-first development is bankrupt and code-first the only true way. Code-first has pitfalls all of its own. It does imply that you need to have a strong understanding of the interactions between the two models if you want to create and deploy services with a minimum of fuss and bother.

Before I dive in, a word of positioning. There are some cases in which the goal of a web service is to transfer a particular, complex, XML document from one place to another. I've never seen the point of this, myself. This posting is addressed to those writing all the rest of the web services in the world, which have the goal of moving some specified set of information from one place to another, which will only incidentally exist in XML as part of the process of moving it from here to there.

On with the show. What's the fuss with XML Schema? XML Schema is a specification language for arbitrary XML documents with arbitrarily complex content models. XML documents derive, as all of you know, from SGML, which was intended as a markup language. Thus, XML schema must cope with mixed content models.

I don't think that many people contemplating contract-first development need to be told to avoid mixed content model schemas.

XML Schema has a type model. If you are considering contract-first development, you have to understand it. I can promise you that this will not be an easy task. The XML Schema type model is fundamentally different from that of C++, Java, C#, Python, Perl, Common Lisp, and every other language with inheritance that I've ever encountered.

Stop and savor the irony. The idea of contract-first development is to use a specification that is agnostic as to programming language. It could be seen as an achievement, of a sort, to come up with a specification that is equally incompatible with all known programming languages.

This isn't the fault of the XML Schema designers. They didn't set out to build a data model for use behind many programming languages. They set out to build a specification language for XML documents.

Over and above the type inheritance model, XML schema includes several elements that map poorly to programming languages, such as xs:any and xs:choice.

My purpose here is not to write a jeremiad against XML schema. Rather, it is to suggest a practical approach to constructing web service contracts.

The best way to understand the implications of XML schema constructs is to convert them to code and see what they look like. In the Java universe, JAXB is the predominant mapping, and JAXB comes with xjc.

If you run some schemas through xjc and examine the results, you will begin to see some patterns. First and foremost, you'll see that the resulting code is infested with snails. That is to say, it has many, many, @ annotations.

Many of those annotations are redundant, especially with more modern toolkits such as CXF. Some of them are there so that you can edit the code fairly aggressively without changing the contract.

Next, you can try the opposite experiment. Write straightforward, simple, interfaces and bean types, and then run schemagen (for the beans) or a full java2ws tool (to include the interfaces and operations). Read the resulting XML schema.

Notice that straightforward code constructs lead to relatively simple, stereotypical, XML schema. Using those same constructs in schema leads to simple, readable, code. Let me give you an example of the opposite.

You might, some day, feel inclined to put the following in your schema:

<xs:any namespace="##any"/>

This allows any element to occur. I've seen code that generates schemas like this 'to allow for expansion.' Now, what happens if you map that to code with JAXB? What happens is this: you get an @XmlAnyElement annotation. There's just one problem: @XmlAnyElement doesn't correspond to '##any'. It corresponds to '##other'. In other words, the tools silently change your contract on you.

How does this happen? Well, I'm not a student of the history of JAXB, but my sense is that its designers didn't want to tie themselves inextricably to XML Schema. They wanted a set of snails that could, perhaps, be mapped to some other schema specification, like RELAX NG.

For example, consider arrays. In XML Schema, the closest thing to an array is the minOccurs and maxOccurs attributes of elements. Optional elements have 0, 1. Required elements have 1, 1. Arrays are generally 0, N, where N can be 'unbounded.' (Don't ask me what you get in JAXB if you specify, say, 12, 15.)

In JAXB, you have 'required' on an element. An element with required=true gets 1,1. An element with required=false gets 0,1. And a Java array gets 0,unbounded.
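The three cases just listed can be written down as a tiny lookup. This is only a model of the rule as described in this post, with an invented class name (Occurs); it is not JAXB code, and real JAXB has more cases than these three.

```java
/** Hypothetical model of the occurrence rule described in the text; not part of JAXB. */
final class Occurs {
    final int min;
    final String max; // a number, or "unbounded"

    Occurs(int min, String max) {
        this.min = min;
        this.max = max;
    }

    /** Maps the two code-side facts (required? array?) to schema occurrence bounds. */
    static Occurs forProperty(boolean required, boolean isArray) {
        if (isArray) {
            return new Occurs(0, "unbounded"); // Java array: 0,unbounded
        }
        return new Occurs(required ? 1 : 0, "1"); // required: 1,1; optional: 0,1
    }

    /** Renders the bounds the way they would appear on an xs:element. */
    String toSchemaAttributes() {
        return "minOccurs=\"" + min + "\" maxOccurs=\"" + max + "\"";
    }
}
```

Notice what the table cannot express: there is no input that produces 12, 15. The code side simply has fewer knobs than the schema side.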

There are similar dances with the XML Schema 'form' attribute.

What's a person to do about all of this? Well, I offer a possible prescription.

Unless you are already a scholar of XML Schema, ignore the prescriptions of contract-first.

The first step is to design a code-first contract as a contract. Don't try to 'remote' whatever you have lying around by slapping a few snails on it. You might protest, 'Now I have extra classes all over the place and I have to copy all my data from my real objects to these special contract objects.' Well, my friend, you'd be in the same position if you used contract-first, only with a lot more snails and much more confusing code.
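For a sense of what those 'extra' classes amount to, here is a minimal sketch. Every name in it (Customer, CustomerInfo, CustomerMapper) is hypothetical, and the annotations are left off so that it compiles without a toolkit on the classpath. The contract bean is deliberately boring: plain fields, a no-argument constructor, getters and setters.

```java
import java.util.ArrayList;
import java.util.List;

/** Internal domain object (hypothetical); free to change as the application evolves. */
final class Customer {
    final long id;
    final String givenName;
    final String familyName;
    final List<String> internalNotes = new ArrayList<>(); // never meant to cross the wire

    Customer(long id, String givenName, String familyName) {
        this.id = id;
        this.givenName = givenName;
        this.familyName = familyName;
    }
}

/** Contract bean (hypothetical): the only shape the service exposes. */
class CustomerInfo {
    private long customerId;
    private String displayName;

    public long getCustomerId() { return customerId; }
    public void setCustomerId(long customerId) { this.customerId = customerId; }
    public String getDisplayName() { return displayName; }
    public void setDisplayName(String displayName) { this.displayName = displayName; }
}

/** The copying step: the one place that knows about both shapes. */
final class CustomerMapper {
    private CustomerMapper() {}

    static CustomerInfo toContract(Customer customer) {
        CustomerInfo info = new CustomerInfo();
        info.setCustomerId(customer.id);
        info.setDisplayName(customer.givenName + " " + customer.familyName);
        return info;
    }
}
```

The copying is tedious, but it is also the seam that lets Customer grow new fields without the schema changing underneath your clients.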

As a particular point here, consider using Document/Literal/Bare. That's right, bare. Not wrapped. One of the causes of confusion in code-first development is conflicts between the front-end (e.g. JAX-WS) and the data binding (e.g. JAXB). In a bare service, the front end is narrowly focussed on the interface, and the data is under the control of the binding. You don't have to worry about who wins a war between @XmlRootElement and @WebParam.

The second step is to review the schema. Pull the WSDL with the appropriate tool, and study it. Make sure you understand it. If it has strange quirks, adjust your code until it is clean.

The third step is to freeze the code. A contract should sit still until you make an organized, intentional, decision to evolve it. This is another justification for those 'extra' objects. They allow the contract to stay put while the code beneath it evolves.

If you adopt this discipline, you will find that it restricts you to a relatively simple set of constructs. Java Map objects are right out. Complex polymorphism will fall by the wayside.

This is the cost of interoperability. If you don't care about interoperability, then you don't need any of this. If Web Services are just an RPC mechanism, and you control both ends in real time, then you can write any code-first thing you like. Just don't come looking for too much help when the more obscure code-first constructs don't do precisely what you want.

So, let's review the good news. You don't have to become an XML Schema adept to build a web service that interoperates. You don't have to become a conchologist, either. You do have to be aware of the dual nature of what you do in code and schema.