What's This XML Stuff Anyway?
Posted on 13 Mar 2000
by Scott Dexter (sgd)
You've heard of it enough to add 'XML' to the list of words you hate; you've seen it in materials gloating about how XML is going to change the future. Okay, sure. But how?
You've read just enough technical articles to know what XML looks like, but that leaves you wondering just what XML does and, more importantly --if it's going to change the future-- how you start thinking about it in real-world terms. This article is by no means the definitive demystification of XML; lots of people beat me to it (references). And considering XML is probably the #1 Developer's Cool Thing to Play With at the moment, resources abound (references). My intention here is to take you out of XML Purgatory and get you thinking about how XML could work in your next project.
So what about this DTD thingie?
First of all, let me square up the DTD thing. A DTD (Document Type Definition) is used to validate the XML file that references it. In other words, a DTD enforces the structure of the XML file: which elements and attributes may appear, how they nest, and (to a limited extent) what values they can hold. It is perfectly okay to have XML files floating around that don't reference a DTD, as long as the XML is "well formed," which is to say it follows XML's syntax rules and a parser can read it; it just doesn't have anything behind it to make sure someone didn't screw something up. The DTD outlines the hierarchy and fields and attributes and what kind of content they can have in them. I think of it as equivalent to adding referential integrity to a database --let the DTD and the XML parser do the work of figuring out if the file is perfect.
How many of your HTML files have a DTD reference in them? --Did you know that they should have one? (If you answered, "I type it in religiously because my idiotic WYSIWYG editor won't put it in," then you might want to contact the W3C; they need people like you.) --The same optional declaration applies to XML files. "Optional" in the sense that so far, we can get away without it.
DTDs are a Good Thing™. If you plan on working with XML with any regularity, you should get in the habit of creating and using them, but for your first steps into XML-land, you can work without 'em.
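To make the "well formed" versus "checked against a DTD" distinction concrete, here is a minimal sketch in Python. The recipe document, its element names, and the helper `is_well_formed` are made up for illustration. Note the assumption baked in: Python's standard `xml.etree.ElementTree` parser only checks well-formedness. It reads past the inline DTD without enforcing it, so actual DTD validation would need a validating parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical document carrying its own DTD in the DOCTYPE block.
WITH_DTD = """<?xml version="1.0"?>
<!DOCTYPE recipe [
  <!ELEMENT recipe (name, ingredient+)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
]>
<recipe>
  <name>Mom's cookies</name>
  <ingredient amount="2 cups">flour</ingredient>
</recipe>"""

# Not well formed: the <name> element is never closed.
BROKEN = "<recipe><name>Mom's cookies</recipe>"


def is_well_formed(text):
    """Return True if the parser can read the text as XML at all."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(WITH_DTD))
print(is_well_formed(BROKEN))
```

The first document passes and the second fails, and the DTD played no part in either result. That is the "we can get away without it" situation described above.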
I can use XML to display my mom's cookie recipe in Netscape, right?
Yes, you can. But in my mind, XML is more than presenting data. Technically, XML isn't about presenting data at all. It's about describing data. Right now, XML's biggest payoff is in business-to-business arenas, where well-known models like EDI (Electronic Data Interchange) are used by really big companies to shuffle data around. In non-ten-cent words, you can replace listless comma-delimited files with context-rich XML.
So XML is for companies like EDS and Microsoft to rework things and make more $$?
Well, I guess that's one way to look at it. But wait! It has a great place for us small kids too. Say you just nabbed a client that wants you to build a database-driven site for them. Great! --Thing is, they want to be able to send you data updates, oh, four or five times a month. Depending on the size of the data*, taking the time to define an XML schema (an informal schema between you and your client, not the formal XML Schema) --which you can look at as an extension of the db design-- and handing that to your client is light years beyond squabbling over file formats, how your client is going to get the data out of his back-end software, and how you're going to import it into your db. --Not only does this save you development time, it can be a faster way of doing things (letting an XML parser rip through text files can be faster than Visual Basic's file i/o). XML can be sorted, worked with, added to, and categorized right there as it sits in the text file. And here's a bonus: it's reusable and eXtendable. So when you sign on another client who wants a site "just like that one," you've got all your stuff ready to go. Take this a step further: create a DTD for it so all the world can use it, be the only kid on your block who is doing XML data exchange, and become a pillar in your community.
*Because XML wraps markup around everything, and it is plain text, it could take a 25K comma-delimited file and make it a monster. There are people worrying about that; the W3C has a Working Group assigned to figure it out. The one thing I learned in school comes back to me on a daily basis: "It Depends." Now back to your program.
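The client-update scenario above might look something like this in practice. This is only a sketch under stated assumptions: the `<products>` format, the `products` table, and the `import_update` helper are all hypothetical, and an in-memory SQLite database stands in for whatever db you actually run.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical update file in the informal format agreed on with the client.
UPDATE_XML = """<products>
  <product sku="A100"><name>Widget</name><price>9.95</price></product>
  <product sku="B200"><name>Gadget</name><price>24.50</price></product>
</products>"""


def import_update(conn, xml_text):
    """Load every <product> in the update into the products table."""
    root = ET.fromstring(xml_text)
    rows = [
        (p.get("sku"), p.findtext("name"), float(p.findtext("price")))
        for p in root.findall("product")
    ]
    # INSERT OR REPLACE lets a repeat update overwrite an existing sku.
    conn.executemany(
        "INSERT OR REPLACE INTO products (sku, name, price) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT, price REAL)")
count = import_update(conn, UPDATE_XML)
```

The next client who wants a site "just like that one" gets the same format and the same import routine; only the data changes.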
Okay, I get it, but I need a mantra
XML is not about presenting data. Get that outta yer head. HTML is about presenting data. XML is about describing the data; giving the data context. XSL is the middle ground to get the data into HTML's hands.
Here's your mantra: XML packages data with its context.
You package data without having to know what the other guy is going to do to it --you don't have to know. Once packaged, it can go anywhere. And you can suck in data from anywhere. Think about your cellphone hitting a web server (over http) and nabbing a little piece of XML that describes a couple stock quotes. Your browser at home can grab the identical data. That app on the web server doesn't care what the client is, it sends the same XML packlet (package+'let'). The client app is the one that has to worry about how to present it. I'm not going into our challenges as developers of these clients here.
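Here is a rough sketch of that idea: one XML packlet, two clients that each decide how to present it. The `<quotes>` format, the ticker symbols, and both render functions are invented for illustration; a real phone or browser client would of course look nothing like this.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical packlet the web server hands to every client, unchanged.
QUOTES_XML = """<quotes>
  <quote symbol="AAA" price="12.50"/>
  <quote symbol="BBB" price="101.25"/>
</quotes>"""


def render_for_phone(xml_text):
    """Squeeze the quotes into one short line of text."""
    root = ET.fromstring(xml_text)
    return " | ".join(
        f"{q.get('symbol')} {q.get('price')}" for q in root.findall("quote")
    )


def render_for_browser(xml_text):
    """Turn the same quotes into an HTML table."""
    root = ET.fromstring(xml_text)
    rows = "".join(
        f"<tr><td>{escape(q.get('symbol'))}</td><td>{escape(q.get('price'))}</td></tr>"
        for q in root.findall("quote")
    )
    return f"<table>{rows}</table>"


print(render_for_phone(QUOTES_XML))
print(render_for_browser(QUOTES_XML))
```

The server-side data never changes; all the presentation decisions live in the two client functions.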
"With its context?" Huh?
Keeping data with its context isn't saying anything more than this: Describe the data in a structured, orderly way that is unambiguous and clear to the human reader.
For example, data without context, such as found in comma-delimited files across the globe, might look like this:
Harry,Bob,Sue,1,2,3,CEO,VP,XML Guru,123 Main Street
Unless you are the only person who ever deals with this data, or have the db handy or some documentation (that may or may not exist), you have no idea what those data values mean. Sure, you could guess, but what if you're wrong? --There's no context.
Data with context looks like this:
<office>
  <employees>
    <employee id="1" title="CEO">Harry</employee>
    <employee id="2" title="VP">Bob</employee>
    <employee id="3" title="XML Guru">Sue</employee>
  </employees>
  <address>123 Main St.</address>
</office>
Now we've got context! You know exactly what the data is talking about. The super-bonus is your local XML parser knows, too. If we get adventurous and write a DTD to describe the above data, then instead of haggling over file formats and fixed-length vs. delimited and other such nonsense, you just hand over the DTD to the other developers and you've got time to go play in the flowers.
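To see the "your parser knows, too" part in action, here is a minimal sketch that feeds the office data to Python's standard `xml.etree.ElementTree` parser. The choice of Python and the variable names are mine; any XML parser would do the same job.

```python
import xml.etree.ElementTree as ET

OFFICE_XML = """<office>
  <employees>
    <employee id="1" title="CEO">Harry</employee>
    <employee id="2" title="VP">Bob</employee>
    <employee id="3" title="XML Guru">Sue</employee>
  </employees>
  <address>123 Main St.</address>
</office>"""

root = ET.fromstring(OFFICE_XML)

# Each value comes back attached to its name, so there is nothing to guess.
staff = [(e.get("id"), e.get("title"), e.text) for e in root.iter("employee")]
address = root.findtext("address")

for emp_id, title, name in staff:
    print(f"{name} is employee #{emp_id}, title: {title}")
print("Office address:", address)
```

Try getting that out of the comma-delimited line without a cheat sheet telling you which column is which.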
By the way, the <office> snippet above is perfectly well-formed XML. It doesn't reference a DTD, but so what? We can still work with it: copy it and paste it into an empty text file, name it something.xml, and open it in IE4 or IE5. Notice IE recognizes the tree structure and you can expand/collapse the <office> nodes. Congrats, you're no longer in XML Purgatory.
Okay, now where?
Like any new toy, it's pretty much useless unless you apply it to something. I remember trying to learn Lotus 1-2-3 way back in the day, but I didn't get very far because I had no data to work with --it's tough trying to create a pie chart when you have no data.
Think about data you use every day that could live in XML-land. Your family tree? Your CD collection? The registered users on your site? A way to package and store and organize --and automagically unleash to the world-- the <tip>s you see on thelist emails?
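As a starting point, here is a small sketch that packages a CD collection as XML. Everything in it is a placeholder: the `<collection>` and `<cd>` element names, the attribute choices, and the sample records are made up, so swap in whatever structure fits your own data.

```python
import xml.etree.ElementTree as ET

# Placeholder records; in real life these would come from your own list or db.
cds = [
    {"artist": "Example Artist", "title": "First Album", "year": "1997"},
    {"artist": "Another Artist", "title": "Second Album", "year": "1999"},
]


def build_collection(records):
    """Wrap each record in a <cd> element under one <collection> root."""
    root = ET.Element("collection")
    for rec in records:
        cd = ET.SubElement(root, "cd", artist=rec["artist"], year=rec["year"])
        cd.text = rec["title"]
    return root


collection = build_collection(cds)
xml_text = ET.tostring(collection, encoding="unicode")
print(xml_text)
```

Save the output as a .xml file and you have something real to poke at with a parser, a browser, or eventually a DTD.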
Resources I like
- This is a great place to start. I use it as my XML launching pad. Discussion, HowTo, Specs, it's all there.
- Microsoft's XML Developer Center
- Stop snickering. XML is one spot where MS is actually doing quite a fine job. And chances are you will be interfacing with the MS way of doing XML, so it's a resource. You'll find that you end up there more than you'd admit, but that's okay, we all do it.
- The World Wide Web Consortium (W3C)
- If you get to the point where you actually have to do file transfers with a client, take their developer(s) by the hand and head here. WDDX is a cross-platform developer's toolkit and DTD for XML file exchange. The principal evangelist is some guy named Allaire, er something
- Microsoft's initiative to coordinate using XML for business-to-business. I hereby inaugurate the first annual "My XML site can beat up your XML site" with BizTalk and WDDX as our first contenders.
(it's how I see it, anyway)