Thursday, May 07, 2009
In this exclusive interview, XML guru Norm Walsh chats with Scott Abel, The Content Wrangler, about structured content, content standards, and the future of publishing. Read this interview and you’ll learn why XML documents aren’t a good fit for relational databases and how university professors are creating custom textbooks for students, and you’ll find links to several innovative projects that are demonstrating the power of XML and its cousin XQuery.
[Note: I’ll be blogging from the MarkLogic User Conference, May 11-14, 2009, where I’ll be reporting on topics including those mentioned in this article. You can follow my adventures on the conference blog and on Twitter.]
TCW: Norm, thanks for taking time to chat with me today. We’ve known each other for some time now, but, for our readers who don’t know who you are, tell us a little about yourself and your connection to XML.
NW: Sure. I’ve been doing XML since we spelled it SGML. I started with Standard Generalized Markup Language back in the mid-nineties. My day job now is a wonderful combination of development work, helping customers build cool stuff with XML and XQuery, standards work at organizations like the W3C, pre-sales engagements talking about interesting and sometimes hard problems, speaking at conferences, working on community outreach programs, and other “evangelism” sorts of things.
I was an elected member of the W3C Technical Architecture Group for eight years. I’m also chair of the XML Processing Model Working Group at the W3C, co-chair of the XML Core Working Group, and a member of the XQuery Working Group. At OASIS, I’m chair of the DocBook Technical Committee and a member of the RELAX NG Technical Committee.
TCW: Wow! That’s a lot of committee work. Thankfully, the work you do helping these groups also benefits what you do for your employer, MarkLogic. When you joined the company, there were a few people in the industry who were really surprised. After all, you were looked upon as a rock star in the XML arena. Why did you decide to leave Sun Microsystems after so many years of employment?
NW: I’m not entirely comfortable with the notion of “rock star,” but between DocBook, open source projects, and standards work, I guess I have become fairly well known.
Anyway, why did I leave Sun? I have tremendous passion for XML; let’s say that over time I felt like my vision for XML and Sun’s vision, as I perceived it, became so divergent that I decided to make a change.
As soon as I started talking to people at Mark Logic and had a chance to play with the server, I knew I’d found a group of exceptionally sharp folks who shared my passion for XML. A year and a few days after joining, I’ve never once felt otherwise.
TCW: After you joined Mark Logic, I recall you writing a blog post detailing how Dave Kellogg, the CEO of MarkLogic, challenged you to think differently about XML. Tell us a little about the challenge, what you thought before, and what made you “think differently.”
NW: That post is “Thinking differently about XML”.
I’d been at Mark Logic for a few months; I’d been thrown into a couple of small projects almost on day one, so I’d been busy. I was still trying to sink my teeth into the server, and I wanted to develop something a little bit bigger.
In the course of building this project, I ran into some performance issues. I posted some basic questions to an internal discussion list and one of the folks who replied was Dave Kellogg. His response wasn’t a challenge as much as a clear, patient explanation of how I had the wrong end of the stick.
I was used to thinking about XML in terms of a number of documents. The exact details escape me now, but roughly speaking I was trying to get all the documents I needed, then reach inside each to find the elements I needed, then process those. Dave’s observation was that I now had this great big, honking fast database that understands XML “natively” and has everything indexed for fast access to XML. Instead of grabbing everything I might need and then filtering through it, I should push the constraints down into the database. Instead of applying XPath expressions to a document I had in hand, I could apply them to the whole database and get nearly instantaneous answers.
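[Editor’s sketch: the contrast Norm describes can be illustrated with a small Python example. This is a toy model under stated assumptions, not MarkLogic’s API: `ToyStore`, its methods, and the sample documents are all hypothetical. The first function fetches every document and filters on the client; the second approach pushes the constraint into a store that indexed element values when the documents were loaded.]

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample corpus: URI -> XML text.
DOCS = {
    "/books/a.xml": "<book><title>Structured Programming</title><author>Dahl</author></book>",
    "/books/b.xml": "<book><title>XML in a Nutshell</title><author>Harold</author></book>",
    "/books/c.xml": "<book><title>DocBook</title><author>Walsh</author></book>",
}


def fetch_then_filter(docs, tag, text):
    """Client-side approach: parse every document, then look inside each one."""
    hits = []
    for uri, xml in docs.items():
        root = ET.fromstring(xml)
        if any(el.text == text for el in root.iter(tag)):
            hits.append(uri)
    return sorted(hits)


class ToyStore:
    """Toy stand-in for a database that indexes element values at load time."""

    def __init__(self):
        self.index = defaultdict(set)  # (tag, text) -> set of URIs

    def insert(self, uri, xml):
        # Index every non-empty element value once, when the document arrives.
        for el in ET.fromstring(xml).iter():
            if el.text and el.text.strip():
                self.index[(el.tag, el.text.strip())].add(uri)

    def query(self, tag, text):
        """Pushed-down constraint: answered from the index, no parsing at query time."""
        return sorted(self.index.get((tag, text), ()))


store = ToyStore()
for uri, xml in DOCS.items():
    store.insert(uri, xml)

print(fetch_then_filter(DOCS, "author", "Walsh"))
print(store.query("author", "Walsh"))
```

[Both calls find the same document. The difference is where the work happens: the first parses the whole corpus on every request, while the second does a single index lookup.]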
The app ran faster, I learned something pretty cool, and the *CEO* had taken the time to answer some newbie questions on an internal list. I thought that spoke volumes.
TCW: That’s a great example of good leadership and one of the reasons your CEO is admired by many others. In fact, his blog just won an SIIA Codie Award! And, the solutions your clients are creating using Mark Logic’s products are nothing short of miraculous, as far as I’m concerned. MarkLogic Server, for instance, has made it possible for organizations to see trends in—and answer questions derived from—unstructured and structured content, together in one repository. What is MarkLogic Server? Why is it so useful? And, what can it help organizations do today, that was impossible—or, at least extremely difficult—to do in the past?
NW: MarkLogic Server is a platform for rapidly building and deploying XML content applications. It’s a highly scalable native XML repository that can store and retrieve XML content and perform powerful search and analytics on it.
I’m a document guy. I think that most of what’s really important to an organization is bound up in documents one way or another. I’m not denying that there are huge quantities of tabular data out there, but documents provide the context for that data. The ideal way to store documents, so that you can extract the most value from them, is XML.
Because we have access to the structure and content of documents and metadata about them, we can do so much with them. Searching comes up a lot, of course, and we can easily provide both full-text searching: find me all the documents about “structured programming”, and faceted navigation: refine these search results by selecting only documents written by a specific author.
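[Editor’s sketch: a minimal Python illustration of the two search styles mentioned above, full-text search followed by faceted refinement. The records and the function names `search`, `facet_counts`, and `refine` are hypothetical; a real server would answer these from indexes rather than by scanning a list.]

```python
from collections import Counter

# Hypothetical documents: some metadata plus body text.
DOCS = [
    {"uri": "/1", "author": "Dahl", "text": "Notes on structured programming"},
    {"uri": "/2", "author": "Dijkstra", "text": "Structured programming and goto"},
    {"uri": "/3", "author": "Dahl", "text": "Simula and object orientation"},
]


def search(docs, phrase):
    """Full-text search: keep documents whose text contains the phrase."""
    needle = phrase.lower()
    return [d for d in docs if needle in d["text"].lower()]


def facet_counts(results, field):
    """Facet: count how many results fall under each value of a metadata field."""
    return Counter(d[field] for d in results)


def refine(results, field, value):
    """Faceted navigation: narrow the results to one facet value."""
    return [d for d in results if d[field] == value]


results = search(DOCS, "structured programming")
print(facet_counts(results, "author"))
print([d["uri"] for d in refine(results, "author", "Dahl")])
```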
Alerting is important to a lot of people. Instead of querying a corpus of documents to find items of interest, you let the server do the work. By storing the queries, you can get the server to respond on the fly when new documents are inserted that match your criteria.
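[Editor’s sketch: alerting inverts the usual flow, because the queries are stored and each new document is tested against them on insert. The Python below is a toy model only; `AlertingStore`, `register_alert`, and the sample alert are hypothetical and do not represent MarkLogic’s alerting API.]

```python
class AlertingStore:
    """Toy document store that checks stored queries whenever a document arrives."""

    def __init__(self):
        self.docs = {}
        self.alerts = []  # list of (name, predicate, callback)

    def register_alert(self, name, predicate, callback):
        """Store a query (here a plain predicate) and the action to take on a match."""
        self.alerts.append((name, predicate, callback))

    def insert(self, uri, text):
        """Save the document, then fire every stored query that matches it."""
        self.docs[uri] = text
        for name, predicate, callback in self.alerts:
            if predicate(text):
                callback(name, uri)


fired = []
store = AlertingStore()
store.register_alert(
    "xquery-news",
    lambda text: "xquery" in text.lower(),
    lambda name, uri: fired.append((name, uri)),
)
store.insert("/a.xml", "An introduction to XQuery")
store.insert("/b.xml", "Cooking tips")
print(fired)
```

[Only the first insert triggers the callback; nobody has to re-run a search to learn that a matching document arrived.]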
An area that I’ve been excited about for a while is geospatial applications. A *lot* of people are now carrying around devices that know exactly where they are, so I think the ability to quickly perform geospatial queries is going to become increasingly important.
One of the things that really impresses me about our core engineering team is how dedicated they are to maintaining the composability of features. Full text, structured and geospatial searching, for example, are all independent features, but you can compose them together arbitrarily and it “just works” *at speed*.
TCW: Publishers really see immediate benefit from using MarkLogic Server. Tell us about a few implementations by publishers and describe the value they’re receiving as a result.
NW: Custom publishing is a hot topic. At the MarkLogic User Conference (May 12-14 in San Francisco), Wiley is going to demonstrate Wiley Custom Select, an excellent example of a custom publishing application that I worked on recently. It allows professors to mix and match content from different textbooks to build their own custom textbook for a course. They can even upload their own content to be included in the book.
By putting all of the textbooks in MarkLogic Server, we can dynamically assemble a custom textbook in real time. We give professors incredible freedom because we have effectively instantaneous access to every book in the system.
Another cool one I saw was a medical imaging application. A technician looking at an x-ray could enter a speculative diagnosis and the system would search a huge library of medical textbooks and journals. In this case, the system didn’t return whole documents, it returned just examples of x-rays that were diagnostic of the condition the technician entered. This provides instant access to exactly the x-rays that the tech wanted to use for comparison as an aid to making a final diagnosis.
TCW: Those are some great examples. The changing nature of consumer information consumption habits has led to drastic changes in the journalism arena. Are any publishers using MarkLogic Server to help them engage their readers online and to provide ...
Filed under: Content Management : Publishing : Structured Content : Unstructured Content : XML : XQuery
Wednesday, October 29, 2008
CMS Watch, an independent analyst firm that evaluates content technologies, has developed a series of online courses to help business and technology managers become more informed customers and decision-makers.
At a time when digital content doubles every three years, there is a worldwide shortage of expertise in how content technologies work. “Much of that expertise lies with vendors or integrators who have a vested interest in a particular solution,” noted CMS Watch principal, Theresa Regli, “and therefore enterprise technology selection and strategy teams have to work hard to find trusted advice.”
CMS Watch is helping to fill this knowledge gap by providing vendor-neutral expertise on how different types of content technology tools really work, with online courses based on analytical frameworks developed for CMS Watch product evaluation reports.
Each course is designed to give managers the smarts to engage with vendors and consultants on equal terms, to make the right decisions going forward about whether, where, and how to employ content technologies in their enterprises.
CMS Watch’s latest course, announced today, teaches business managers how to evaluate and “place” SharePoint within their enterprise. Other courses include introductions to Web Content Management and Enterprise Content Management technologies. Future courses will cover E-Discovery, Digital Asset Management, Portal, Web Analytics, and Social Software technologies, as well as web operations management and comparative web application development platforms.
Developed and narrated by CMS Watch experts and selected partners, each course packs a dense amount of information into four hours, via digestible one-hour modules. “Participants can follow a course from any network-accessible computer, anywhere in the world, at their own pace,” noted CMS Watch principal, Alan Pelz-Sharpe.
“Technology buyers around the world have many of the same basic questions,” said CMS Watch founder, Tony Byrne, “so we decided to put what we know about how these technologies really work into a format where customers can learn what they need to know without leaving their offices.”
Wednesday, October 15, 2008
Meet Calvin Hendryx-Parker, co-founder and Director of Engineering at Six Feet Up, Inc., and a member of The Content Wrangler Community. Calvin oversees open source content management systems implemented in Plone, CMF, and Zope. He is a proponent of web standards to ensure interoperability with other platforms.
Thursday, October 09, 2008
Meet Mugdha Amin, Senior Instructional Designer at HTS and a member of The Content Wrangler Community. Mugdha specializes in workplace learning and change readiness with “proven experience” defining and developing content for medium to large scale e-learning solutions.
Thursday, September 18, 2008
Intelligent Content 2009 has announced a call for presenters. The event, to be held January 29-30, 2009 at Le Parker Méridien Palm Springs, needs presenters who are creating, managing, and delivering intelligent content and who can present on such topics as:
The organizers are seeking submissions (presentations, case studies, panel sessions, workshops, and interactive demonstrations) that are visionary and practical. But, more than anything, the organizers are seeking sessions that will help attendees learn something useful, something they can use when they return to the office. Case studies of content projects (web, print, and/or mobile) are highly desired, as are presentations on content problems solved by social networks or via mashups. Anything goes. If you are doing some really forward-looking work, let the organizers know.
