Chemical software quality 3 - polar surface area
This is my third post in the ongoing debate on chemical software quality. To give a quick look-up table for those who join the discussion late, let me throw in some pointers as a time-line:
1. Egon Willighagen wrote a blog post about unit testing in CDK: Finding differences between IChemObjects
2. Peter Murray Rust responded: The Blue Obelisk - Egon’s diff is boring, making some general comments about chemical software quality
3. Egon continued his unit testing story: Finding differences between IChemObjects #2
4. Egon responded to PMR (2): Good Scientists Pimp there Research (was: Damn, I’m boring…)
5. I responded to PMR (2, also citing 1,3,4): Research and software testing
6. PMR responded to my post (5): Quality in chemical software - a debate
7. I responded to PMR (6): Quality in chemical software - the debate continues
8. PMR responded to (7): Quality is emerging in chemical software
9. Egon responded to (2,5,6,7) in his post: Recovering full mass spectra from GC-MS data
In post (9) Egon points out that the annual competition and benchmark results I referred to in (7) have very little connection with unit testing and basic software quality (i.e. detecting, fixing and avoiding bugs), which is the focus of the unit testing discussed in (1,3,5). I agree, and that is exactly why I focused on that definition of software quality in my first post (5) in this debate. PMR has accepted my defense of closed source software quality in the software engineering sense (“I am prepared to believe that a company is able to reproduce its own results internally and I suspect that the quality is better than it was 10 years ago.”) and moved on to a different definition of chemical software quality in (6), one that has to do with assessing the scientific value of the answers provided by the software. This is what I addressed in (7), so I am not confusing the quality addressed by unit testing with the quality addressed by competitions or benchmarks, and I hope it is equally clear to everyone else that these are two very distinct issues. It seems everybody is in agreement that unit testing is crucial, and we now have a common understanding that it has traditionally been applied in the commercial chemical software world. Now I will respond to the new points raised by PMR’s post (8):
PMR: By a tradition of quality I mean that there is a communal understanding that quality matters. Although quality is a wide term it is often difficult to discuss unless it is measured.
Indeed, and we already touched on two very different meanings of the word as I elaborated above.
PMR: Leaving aside the stochastic aspect - which we agree on (and which makes quality assessment much harder) my concern is not whether a given calculation is reproducible when confined to a manufacturer’s platform, but whether the results have been assessed as meaningful. Now I agree that this is not easy, but unless the manufacturers develop interoperable standards then the quality of the result is only assessable by public assessment, requiring standard data sets and standard results. I gave the example of “(total) polar surface area” which should, in principle, be computable reproducibly by all manufacturers. But only if it is defined in a manner that all agree upon. Otherwise we have as many different values as there are manufacturers. And I would contend that - unless each has a clear definition of the algorithm and the property calculated - this is a lack of quality.
Well, the question “what is the (total) polar surface area of a molecule?” really belongs to the category of computing non-observables. I would say it is about as well defined as the question “what is the favourite color?” Of course, there is no single “correct” answer to either one. You need to specify the question far more precisely if you want to get a meaningful answer.
First, which surface are you talking about? The van der Waals surface, the solvent accessible surface, the solvent excluded (aka Connolly) surface, or one of the iso-surfaces, e.g. an electron density iso-surface at a given cut-off? If you choose one of the first three, what radius values should be used to define the atom spheres? Each force field has a different set of vdW radii. If you choose an electron density iso-surface, what cut-off value should be chosen? All of these choices will significantly alter the “correct answer” to the question.
Then, how do you define which part of the surface is considered polar? For the vdW surface or the SAS it may be defined by atom type, e.g. O and N being polar and hydrophobic carbons being apolar. But what about aromatic carbons next to a nitrogen, or the carbon of a charged group such as carboxylate: should those be considered polar or not? Or should polarity be defined by computing the net effect of all atom-based partial charges at every single point of the surface and summing that up via a surface integral? How do you assign partial charge values for that? Or should you use a QM method to compute the charges at surface points? What level of theory should be used?
All these questions and choices are outside the scope of software correctness or quality. There is a correct answer for each variation of the question and each set of parameters chosen. Once the choices and parameters are fixed, you can ask how accurately a given piece of software computes the polar surface area for that specification. Simply put, this “property” is not a single property but a whole range of them.
Oh, and don’t forget the conformation, because a lot of interesting molecules are flexible and the PSA will depend on which conformation you use to compute it. The lowest energy conformation may not be the most relevant one if you are interested in bio-activity against a specific target; for that you need to know the bioactive conformation. So before we can even begin to address the question of software quality metrics, we need to define the problems precisely. Otherwise, you may get a totally different result from every software package, and all of them can be accurate.
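To make this concrete, here is a minimal sketch of how the “same” polar surface area changes with the specification. It is not any vendor’s algorithm: it uses a simple Shrake-Rupley style point-sampling estimate over atom spheres, and everything specific in it is an assumption made for illustration. The coordinates of the toy fragment and its two “conformations” are made up, the `ALT` radius set is hypothetical, and the function names (`sphere_points`, `atom_areas`, `polar_area`) are my own. Only the `BONDI` values are the commonly quoted Bondi vdW radii.

```python
import math

# Bondi van der Waals radii in angstroms; force fields use other values.
BONDI = {"C": 1.70, "N": 1.55, "O": 1.52}
# Hypothetical alternative radius set, invented here only to show sensitivity.
ALT = {"C": 1.90, "N": 1.70, "O": 1.60}

# Toy fragment (element, x, y, z) in angstroms; coordinates are illustrative only.
CORE = [("C", 0.0, 0.0, 0.0), ("O", 1.23, 0.0, 0.0), ("N", -0.70, 1.15, 0.0)]
# Two made-up "conformations": an extra carbon far from, or close to, the polar atoms.
EXTENDED = CORE + [("C", 7.0, 0.0, 0.0)]
FOLDED = CORE + [("C", 1.23, 3.0, 0.0)]


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral construction)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return points


def atom_areas(atoms, radii, probe=0.0, n=2000):
    """Per-atom exposed area by point sampling.

    probe=0.0 gives the van der Waals surface; probe=1.4 gives a
    solvent accessible surface for a water-sized probe.
    """
    unit = sphere_points(n)
    areas = []
    for i, (element, x, y, z) in enumerate(atoms):
        ri = radii[element] + probe
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = x + ri * ux, y + ri * uy, z + ri * uz
            buried = False
            for j, (other, ox, oy, oz) in enumerate(atoms):
                if j == i:
                    continue
                rj = radii[other] + probe
                if (px - ox) ** 2 + (py - oy) ** 2 + (pz - oz) ** 2 < rj * rj:
                    buried = True
                    break
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * ri * ri * exposed / n)
    return areas


def polar_area(atoms, radii, polar_elements, probe=0.0):
    """Sum the exposed area of atoms whose element counts as polar."""
    areas = atom_areas(atoms, radii, probe)
    return sum(a for atom, a in zip(atoms, areas) if atom[0] in polar_elements)


if __name__ == "__main__":
    n_o = {"N", "O"}
    cases = [
        ("vdW surface, Bondi radii, N/O polar", EXTENDED, BONDI, n_o, 0.0),
        ("vdW surface, alternative radii", EXTENDED, ALT, n_o, 0.0),
        ("SAS with 1.4 A probe", EXTENDED, BONDI, n_o, 1.4),
        ("carbon also counted as polar", EXTENDED, BONDI, {"C", "N", "O"}, 0.0),
        ("folded conformation", FOLDED, BONDI, n_o, 0.0),
    ]
    for label, atoms, radii, polar, probe in cases:
        print(f"{label}: {polar_area(atoms, radii, polar, probe):.1f} A^2")
```

Each line printed is a perfectly reasonable answer to a different, precisely specified question. A real implementation would add further choices on top of these (hydrogens, partial charges, a proper solvent excluded surface), which only widens the range.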
PMR: I have not - and will not - claim that the Open Source movement in chemistry is of higher quality than closed source.
This statement is easy to refute by a verbatim quote from PMR’s post (2):
PMR: So the Blue Obelisk is emerging as the main area which takes quality in chemical software and chemical data seriously. More organisations are taking Open Source seriously. I met a chemical software company last week - no names - who is seriously looking at Open Source and thinking of integrating its competitors’ products. Perhaps not RSN, but they are looking at it.
And when they do they will find the Blue Obelisk is the only place for software and data quality.
Notice the word “only” (emphasis is mine) in the last sentence. This is clearly an even stronger statement than claiming OS to be of higher quality than closed source: it implies there is no quality in closed source chemical software at all. Incidentally, this is the statement that “inspired” me to enter this debate in the first place.
PMR: I said there was no tradition of quality. As a result of your post I will moderate this statement slightly.
PMR: I agree this, but note that many of these are very recent. So I would be prepared to say that in certain fields a tradition of quality metrics is starting to emerge. Almost all of these relate to docking into proteins and are driven, at least in part, by the tradition of competitions in proteins such as CASP which has for many years been involved in predicting protein structure.
So I wish them well and will now exclude docking (but not QSAR) from my remarks.
Thank you very much, Peter. I am glad you changed your view about docking. This area has been the main focus of my research and development for the past 6 years and I believe we have very good results measured with sound scientific quality metrics. Mind you, some vendors in the past have used rather questionable metrics to report good results, and I have explained in that post how such a metric can lead to ridiculous results.
ZZ
