Research and software testing
And one major way is writing “unit tests”. Is that boring? Extremely. Do you get publications by writing unit tests? No. Are they simple to write? Not when you start, but they get easier.
Of course, writing unit tests for chemistry software is not chemistry research, so you do not get to write chemistry publications about it. However, it is an active topic in computer science. If you hop over to the ACM Digital Library and enter the search “unit test”, you get 19,314 hits, all in peer-reviewed journals. Just to show you a few example hits:
| Automatic extraction of abstract-object-state machines from unit-test executions |
| Software unit test coverage and adequacy |
| Carving differential unit test cases from system test cases |
When you read further in Peter’s blog entry, you see these statements:
The chemical software and data industry has no tradition of quality. I’ve known it for 30 years and I’ve never seen a commercial company output quality metrics.
Now, this is a bold claim if I have ever seen one. I am sure most commercial vendors who produce chemical software employ computer science or software engineering graduates, who were taught the industry's standard unit testing and regression practices at school as part of the standard curriculum. How do I know that? Because not only do I have a BSc and an MSc myself in computer science (my PhD is in computational chemistry, so that does not fall under CS), but I also spent 3 years as a teaching assistant at ELTE Budapest teaching programming methodology courses, including these techniques, to CS undergraduates.
Of course, I can only speak with authority about my own chemical software company, so let me elaborate on how we do software testing. Our system consists of several compact software modules with well-defined input and output data objects. These modules can be linked into a pipeline to perform complex tasks like docking or retrosynthetic analysis. Each of the modules has a unit test bed, which consists of a test engine, a set of test scripts, some input and output data files, and expected error report files. The test engine reads the test script, loads or extracts the input data from the script, executes functions of the module and tests the responses, comparing the results returned to the expected data from the script or data files. There are four distinct types of tests:
Func - functionality test; valid calls and parameters; checking certain scenarios to see if the module functions properly based on the script
Speed - performance test; valid calls and parameters; should be run with optimised compilation, debug turned off; measures speed
Error - testing of the exception handling; valid calls, parameters simulating extreme scenarios (e.g. file does not exist or incorrect file format used) that may happen in a valid usage scenario due to wrong data being passed to the program by the user
Robust - robustness test; invalid call sequences and/or parameters to see whether the sanity checks (asserts) are thorough and complete. These test for programming errors in the integration pipeline, e.g. NIL pointers passed for required data input or calls made to uninitialized objects.
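To make the four categories a bit more concrete, here is a minimal sketch of how a test engine could decide whether a single test passed. This is not our actual engine: the names TestKind, TestOutcome, Expectation and passes are hypothetical, and the rule that Error and Robust tests pass when every expected message shows up in the response is my reading of the description above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical names throughout; the real test engine is not shown in this post.
enum class TestKind { Func, Speed, Error, Robust };

// What the module under test actually produced for one scripted call.
struct TestOutcome {
    std::string output;               // data returned by the module
    std::vector<std::string> errors;  // error messages reported by the module
    double seconds = 0.0;             // time taken by the call
};

// What the test script or data files say should have happened.
struct Expectation {
    std::string output;               // expected data (Func tests)
    std::vector<std::string> errors;  // expected error messages (Error/Robust tests)
    double max_seconds = 0.0;         // time budget (Speed tests), an assumed criterion
};

bool passes(TestKind kind, const TestOutcome& got, const Expectation& want) {
    switch (kind) {
    case TestKind::Func:
        // Valid input: no errors, and the result matches the expected data.
        return got.errors.empty() && got.output == want.output;
    case TestKind::Speed:
        // Valid input: no errors, and the call stays within the time budget.
        return got.errors.empty() && got.seconds <= want.max_seconds;
    case TestKind::Error:
    case TestKind::Robust:
        // Every expected error message must appear in the module's response.
        return std::all_of(want.errors.begin(), want.errors.end(),
                           [&](const std::string& msg) {
                               return std::find(got.errors.begin(), got.errors.end(),
                                                msg) != got.errors.end();
                           });
    }
    return false;
}
```

Error and Robust share one rule here because both are judged against an expected error list; what differs is the kind of input the script feeds to the module.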
The last two categories have associated expected error files, where the error messages are listed that are expected to be in the response from the module that is being tested. An example functional test script is here from the MolFragGraph module. As you can see it contains a simple language, one command per line starting with a keyword followed by optional parameters and a data block. Of course, writing such scripts is boring, so we typically write only a few of them when a new module is developed. Then we add code like this to the program:
DBGMESSNLF(DEB_SCRIPT, "SCRIPT: MarkGridHead ClientID=0 NumLines="<
<<" NumLineItems="<
<<" Low="<<_p_info->unit_min
<<" Dim="<<_p_info->unit_dims
<<" CellSize="<<_p_info->cell_size<<"\n");
This is a macro call that is controlled by a debug flag (DEB_SCRIPT). If that flag is turned on at run time, the code writes a line into the log file, identified by the "SCRIPT:" header and containing one complete test script line along with its parameters and data. When we run an integrated software pipeline, we can generate a log file containing the actual data passed into and output from any given module, in the format required by the test bed scripts. This allows us to generate test scripts automatically for any of the modules by running an integrated pipeline on a practical input case.
If we find a bug, we reproduce it with a debug version of the code, and can then immediately generate a test script for each module involved and test the modules separately to identify the root of the problem. Once the bug is fixed, we can generate the correct expected output for each module for the test case. This comes in very handy for generating regression tests: if later changes to the code break any previously fixed functionality, we notice because the corresponding test script fails.
Of course, the running of all these tests is automated in a nightly build and test script. Each module is assigned to a developer who is responsible for it. When a test script fails during the automated nightly test, the developer gets an email notification and can fix the problem the next day. As a quality metric, we produce tables each night similar to the VTK dashboard (I cannot show you our own for confidentiality reasons). We have been doing development with quality control at SimBioSys since the start of the company in 1996.
I have also worked at a larger software company for medical imaging, where software development was carried out under an ISO 9001 certified methodology, and I have implemented the same principles (with some more automation) at SimBioSys even though we have not applied for the certification, which is a long bureaucratic process with a significant cost.
So what is the take-home message from this post? That software unit and regression testing is a very important, serious (although boring) part of chemistry software development, and it is not limited to (nor invented by) open source groups like the Blue Obelisk, which is NOT the only place for software and data quality, contrary to what PMR would like you to believe.
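The log-to-script step described above could look roughly like the following sketch. It assumes that each script command sits on one log line after the "SCRIPT: " header, possibly preceded by some logger prefix; the function name extract_script is made up for illustration, and the real log layout may differ.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Collect the test script commands embedded in a debug log.
// Assumption: every command is logged on a single line as "SCRIPT: <command ...>".
std::vector<std::string> extract_script(std::istream& log) {
    const std::string tag = "SCRIPT: ";
    std::vector<std::string> commands;
    std::string line;
    while (std::getline(log, line)) {
        const std::string::size_type pos = line.find(tag);
        if (pos != std::string::npos) {
            // Keep only what follows the header: one complete script line.
            commands.push_back(line.substr(pos + tag.size()));
        }
    }
    return commands;
}
```

Feeding it a std::ifstream opened on the log file, or a std::istringstream in a quick experiment, returns the commands in the order they were logged, ready to be written out as a test script.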
ZZ

June 3rd, 2008 at 9:05 pm
I’ve been reading PMR’s posts and his views of “commercial cheminformatics” and, having been in the environment for over a decade, find his views truly insulting. The industry is full of hard-working, innovative and QUALITY-CONSCIOUS people who KNOW how to develop and care about delivering good software to their users. I don’t doubt the value of Open Source so I suggest he focus on singing its praises rather than shooting non-Open Source. The ongoing bias towards “Openness is Superior” and everything else be damned is simply wrong. I understand his agenda…but it’s his. Mind you, he hasn’t stopped thrashing me for ChemSpider not being Open Source yet sings the praises of PubChem. PubChem is NOT Open Source so I guess it’s their Open Data? They have policies about their data too..it is not declared Open Data. So, Pubchem is bad? No..I don’t think so.
As you say, PMR is now showing affection to Microsoft contrary to his Open Source position. Why? Because Microsoft’s solution works (oh, and he is getting eChemistry funding from Microsoft too as he willingly admits). I KNOW Microsoft tests their software. One of my closest friends HAS that role at Microsoft.
So, what was the point of PMR’s post…I think it’s that Egon and Peter are doing what they should be doing in terms of testing? Oh. Ok. The rest of the industry does it anyway.
June 4th, 2008 at 9:13 am
As someone who’s part of the “Blue Obelisk,” I’d suggest you take Peter’s words with a large grain of salt. As ChemSpiderMan indicates, he has a clear agenda, which is not necessarily shared by others. One might argue he doesn’t even reflect his own views — if he’s such a believer in open source software, why does he use a Windows laptop?
In some very real sense, Peter has become the chemical software version of Richard Stallman. (I don’t write those words lightly.) He’s doing more talking than writing code. Indeed, when he contributed CML code to Open Babel, he wrote all of 6 unit tests. And he has since disappeared, except occasional complaints that CML support doesn’t match his ever-changing standard.
Personally, when Peter goes off on a rant, I tend to shrug my shoulders. I don’t read his blog much anymore, and haven’t commented in ages.