What is wrong with OOXML
Peter Murray-Rust has voiced his opinion on his blog about using OOXML for archiving chemistry documents:
The reason I currently like OOXML is that we can make it work and that we have material in Word that we can use. I’ll be demoing it publicly in a week’s time (more later). If we had material in ODT we’d use that, but we don’t.
…
My worry about Open Office (which emits ODT) is that I don’t yet believe that has reached a state where I could evangelize it without it falling over or being too difficult to install.
I would much rather recommend the ODF (ODT) format, which is a truly open ISO standard (approved on May 1st, 2006). OpenOffice.org is only one of the tools that can generate it; there are several others, as well as various converters available for MS Word users (e.g. Sun’s MS Office plugin, the Clever Age ODF translator).
His point is that it is still better to use OOXML than the binary doc format of MS Word. I disagree: I think OOXML is just as bad as the binary Word doc, for these reasons:
- It is a single-vendor format with patent-encumbered binary extensions, so it might as well be called proprietary. OOXML cannot be implemented in GPL-licensed open source software because its patent terms are incompatible with the GPL.
- The national bodies raised over a thousand unique objections to technical details of the format during the ISO process (see also the wiki collection). Fewer than 20 percent of these were discussed at the Ballot Resolution Meeting (BRM), and most of those were not resolved to the satisfaction of the opponents. You can find a good collection of the remaining problems here.
- It was accepted as a standard through blatant manipulation, ballot stuffing and corruption at various levels; see some of the history here and here. More irregularities: Poland’s new rule: no vote equals yes; Cuba’s No vote counted as yes; Microsoft-friendly “yes-men” invaded Belgium’s Technical Committee; Denmark voted yes by consensus while 50% opposed; interesting vote counting in Croatia (14 No + 3 Yes = Yes); how the Philippines changed their vote from no to yes.
- ISO violated WTO rules by approving a standard that duplicates an existing one (ODF), according to Tineke Egyedi, president of the European Academy for Standardisation.
- OOXML reinvents the wheel, ignoring and replacing mature standards like SVG, MathML, XForms and even XML. The most prominent example is the neglect of MathML: OOXML defines its own formula markup language (OOMML) instead.
- OOXML requires undisclosed copyrighted material from Microsoft Office. The earlier problem of the undisclosed Border Style art was acknowledged and fixed on February 22nd, 2008; however, Part 4, section 2.18.94 (ST_TextEffect, Animated Text Effects) describes VML art that is not included in the specification.
- OOXML does not provide the binary-to-XML mapping that is required to fully represent the existing corpus of user documents, so no other application supporting OOXML will be able to faithfully or fully recreate the look of Microsoft’s legacy binary documents. Although Microsoft posted the binary Office document specifications (15 Feb 2008), no standardized mappings were offered during the BRM, despite requests from the US, the United Kingdom, Brazil and Malaysia, amongst others.
- Markets cannot rely on ISO standards with calculation errors, and OOXML spreadsheet formulas still produce them. Although the CEILING function was recognised to have a legacy bug and fixed during the BRM, other mathematical inaccuracies remain in OOXML’s spreadsheet functions: the FLOOR function has been identified as having similar inaccuracies for negative numbers. This problem needs careful study. Recall that Intel faced a consumer scandal and losses when its Pentium chip was found to have a floating-point division error, and that the Y2K problem, a standardization issue, cost billions in damage control.
- Macro functionality is not properly defined. Section 2.16.5.41 defines a “MACROBUTTON” field that allows the definition of a button in the document that will trigger a macro, but little is said about how the macro is stored or bound, what APIs are available, or what the security model for this feature is. ECMA’s disposition (approved in batch by the BRM without discussion or opportunity for objection) was something quite different and unsatisfactory. ECMA simply added: “The mechanism by which the command specified by text in field-argument-1 is located and/or executed by an application is ‘implementation-defined’.” Unfortunately, with this addition, not only is cross-platform interoperability of this feature impossible, but it is also unlikely that vendors will be able to implement a reasonable security policy to detect, scan or block macros included in documents.
- More than 850 additional technical problems raised during the ISO process have not been resolved; I will not list all of them here.
In a single sentence: OOXML is nothing more than a marketing check-box for Microsoft, so that they can now claim to have an open ISO standard document format, but in reality it is neither open nor standard by any rational definition of the words.
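The FLOOR complaint in the list above turns on how “round to a multiple” is defined for negative numbers. The sketch below does not reproduce the OOXML definition of FLOOR (the two function names are hypothetical); it only shows that two plausible readings, rounding toward negative infinity and rounding toward zero, agree for positive inputs but diverge for negative ones, which is why a standard has to pin the rule down exactly.

```python
import math


def floor_to_multiple(x: float, significance: float) -> float:
    """Round x toward negative infinity to a multiple of significance."""
    return math.floor(x / significance) * significance


def truncate_to_multiple(x: float, significance: float) -> float:
    """Round x toward zero to a multiple of significance."""
    return math.trunc(x / significance) * significance


# For 2.5 both readings give 2.0; for -2.5 they give -3.0 and -2.0.
for value in (2.5, -2.5):
    print(value, floor_to_multiple(value, 1.0), truncate_to_multiple(value, 1.0))
```

Floating-point division adds its own edge cases on top of this (in IEEE doubles, `0.3 / 0.1` is slightly below 3, so `math.floor(0.3 / 0.1)` is 2), which is another reason spreadsheet functions need precise specifications.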
ZZ.

May 11th, 2008 at 7:13 pm
I agree
May 12th, 2008 at 4:09 am
Peter, most visibly, has tried to get a foothold with the datument concept. There clearly was not enough critical mass to get such things going. The audience is happy with ‘hamburgers’ (OOXML, PDF, ODF) and currently has zero incentive to do things properly… Even LaTeX has very limited means for semantic markup of chemical information. HTML does allow that, and has been the focus of at least some of the work Peter has done.
Something I thought was rather promising, semantically embedding chemical data in RSS feeds (the CMLRSS paper in JCIM), has seen little uptake outside the author group.
While not happy about the choice (I don’t use Word, so I will not be able to comment on these efforts of Peter’s), I do understand his wish to push semantics into the authoring world. See further comments on this trade-off in Peter’s blog:
http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1091#comments
Let me pop the big question here too then:
- did ‘we’ fail to deliver a semantic authoring tool for chemistry?
- and, should we settle for OOXML because that’s what people use?
From your blog item, you’re not in favor of settling… but what would you consider a good strategy to get better tools developed and used?
Do you think the Blue Obelisk [1] would manage to attract enough critical mass from the KOffice/OpenOffice communities to work on the ICE equivalent Peter mentioned? Should we try to bootstrap this?
[1] http://www.blueobelisk.org/
May 12th, 2008 at 10:15 am
Egon, I understand that Peter is pushing for the “Right Thing™”, and I completely agree with the intentions. I just don’t agree with the compromise he is willing to make, i.e. to work with the binary doc format or the XML dump of the same (OOXML), which still has all the technical problems, binary blobs, patent problems, and application and platform dependence.
In response to your questions:
1. Yes, so far ‘we’ have failed to deliver semantic authoring tools, but we can still work on them. As for the existing documents, my company SimBioSys has a tool to get the chemical semantics out of PDF files: http://www.simbiosys.ca/clide/index.html
2. No, we shouldn’t settle, of course!
People use it not by choice, but because the monopoly pushes it onto them. We have to work hard to get semantics into Open Standard documents. If we roll over and let the monopoly win, then even if we get semantics into documents we will have to keep paying through the nose to use and access our own data!
ZZ
May 13th, 2008 at 2:53 am
[…] Claims about ODF support are typically meaningless
I know I’m repeating myself a bit. But as you know there’s a Wikipedia page about applications that support the Open Document Format and it gets quoted and linked to. A lot.
I linked to Peter Murray Rust yesterday, and one of the commenters on his blog also talks about the number of implementations.
OpenOffice.org is only one of the tools that can generate it, there are several others as well as various converters (e.g. SUN’s MS Office plugin, Clever Age ODF translator) available for MS Word users.
But folks, as of 2008-05-13 mid afternoon Toowoomba time, that Wikipedia page is not much help to people who might want to, like, you know, work on real documents. GoogleDocs, for example, will throw away your styles if you happen to care about them. And why would you? It’s a web two-dot-oh world now; what do we need with styles?
I’m going to get around to editing that page, but I’m not an expert so I have held off. I just know that it’s not useful. Lots of things on that list don’t work with my ODF documents, all created with applications derived from OpenOffice.org. As part of my homework for when I do engage the page I went to the spec.
What does the ODF spec (v1.1) say about conformance? You need only read the first sentence of this extract, which I have highlighted for your convenience.
The OpenDocument specification does not specify which elements and attributes conforming application must, should, or may support. The intention behind this is to ensure that the OpenDocument specification can be used by as many implementations as possible, even if these applications do not support some or many of the elements and attributes defined in this specification.
Viewer applications for instance may not support all editing relates elements and attributes (like change tracking), other application may support only the content related elements and attributes, but none of the style related ones. Even typical office applications may only support a subset of the elements and attributes defined in this specification. They may for instance not support lists within text boxes or may not support some of the language related element and attributes.
So you don’t have to do anything in particular to claim to support ODF. Maybe just allowing the top level element would be enough? We clearly need to add some detail to the Wikipedia page about who supports what specifically.
My working definition of support for ODF (the text format is actually what I care about) is people using our word processing templates to edit files interoperably with Writer and some other application. And you know what? There’s nothing outrageous in any of our templates, but the only applications that work are ones that are based on the original StarOffice code base that spawned OpenOffice.org. This is unsurprising. The file format was built around OpenOffice.org. Lots of people point out that nobody but Microsoft will be able to build an office suite that supports OOXML. People, this is the same process at work.
Let me tell you about a quick experiment I did. I looked up the bit in the ODF spec about lists. Apparently you can have a thing called a list-header at the start of the list.
A list header is a special kind of list item. It contains one or more paragraphs that are displayed before a list. The paragraphs are formatted like list items but they do not have a preceding number or bullet. The list header is represented by the list header element.
So I made a .odt using NeoOffice, put in a list using the default style List 1, saved it, unzipped it and added a list-header to the start of the list, then rezipped it and opened it in NeoOffice.
Hmm, well it does display, but the Writer interface seems to know nothing about list headers. The only way to create one seems to be outside of the application. Having a feature like this creates some very serious weirdness. You can load up a document with multiple paragraphs in the list header and they are preserved, but if you try to add one then you just get a normal list item.
Now I know this is not quite in line with what I’m saying about the file format being largely derived from OpenOffice.org. I don’t know the history of this element but OpenOffice.org doesn’t support it in any useful way. Looks like something added by a standards committee raised on SGML with a clear idea of what makes a ‘good’ document format and not much consideration about what makes a usable word processor interface, but that’s just a guess.
Let’s talk about the formatting I got from my experiment. Is this what the spec really means? See how the list header (Header 1) is indented? Not formatting I can imagine using. But it seems to be what the spec says: “The paragraphs are formatted like list items but they do not have a preceding number or bullet.”
How many of the ODF cheer squad have read the standard? Dealt with document interop issues? Me, I have only glanced at the OOXML spec and so I don’t go around telling people about how bad it is, but on the few occasions I have looked at ODF and tested support for various things in Writer I have found problems or lack of support in OOo. The good thing with OpenOffice.org, of course, is that you can tell Sun about the bug and it will likely get fixed sooner or later, particularly if you can rally enough supporters to vote.
The bottom line is that if you want to work with ODT you have to check out whether the other applications are going to support the bit you want to use. I will get around to changing that Wikipedia page to be more useful. […]
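The list-header experiment described in the trackback (unzip the .odt, add a list-header at the start of the list, rezip) can be sketched in Python with only the standard library. This is an illustrative sketch, not the commenter’s actual procedure: the function name `add_list_header` is hypothetical, and it assumes the package has the usual `mimetype` and `content.xml` entries and that the first `text:list` in `content.xml` is the one to modify.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

# Register the usual prefixes so ElementTree writes "text:list", not "ns0:list".
for prefix, uri in (("office", OFFICE_NS), ("text", TEXT_NS),
                    ("style", STYLE_NS), ("fo", FO_NS)):
    ET.register_namespace(prefix, uri)


def add_list_header(odt_bytes: bytes, header_text: str) -> bytes:
    """Return a copy of an .odt whose first text:list starts with a text:list-header."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as src:
        root = ET.fromstring(src.read("content.xml"))
        first_list = root.find(f".//{{{TEXT_NS}}}list")
        if first_list is None:
            raise ValueError("content.xml contains no text:list")

        # The list header holds ordinary paragraphs and must precede the list items.
        header = ET.Element(f"{{{TEXT_NS}}}list-header")
        ET.SubElement(header, f"{{{TEXT_NS}}}p").text = header_text
        first_list.insert(0, header)
        new_content = ET.tostring(root, encoding="UTF-8", xml_declaration=True)

        out = io.BytesIO()
        with zipfile.ZipFile(out, "w") as dst:
            # ODF packages expect "mimetype" as the first entry, stored uncompressed.
            dst.writestr(zipfile.ZipInfo("mimetype"), src.read("mimetype"),
                         compress_type=zipfile.ZIP_STORED)
            for info in src.infolist():
                if info.filename == "mimetype":
                    continue
                data = new_content if info.filename == "content.xml" else src.read(info.filename)
                dst.writestr(info.filename, data, compress_type=zipfile.ZIP_DEFLATED)
    return out.getvalue()
```

With a real NeoOffice or OpenOffice.org file you would read the bytes from disk, pass them to `add_list_header`, and write the result to a new .odt. One caveat of this sketch: ElementTree drops namespace declarations that no element or attribute uses, and ODF sometimes carries prefixes inside attribute values (spreadsheet formulas, for instance), so a more careful tool would use an XML library that preserves the original declarations.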