Computer Assisted Estimation of Synthetic Accessibility (CAESA)
The program uses the technology of expert systems to mimic methods
used by an experienced synthetic chemist to estimate the ease of synthesis
of a particular compound. Potential starting materials are selected from
databases of available compounds and the structural features that make a compound
difficult to synthesize are identified. The combination of these two types
of information provides a realistic estimate of the likely ease of synthesis.
The program comprises a set of modules that estimate synthetic accessibility
using a knowledge-based approach.
New techniques have been developed which permit rapid searches for potential
starting materials within databases of available compounds (such as the Fine
Chemicals Directory or in-house databases). Potential synthetic routes are
established between all compounds in these databases and the target structures.
The length and quality of each route are assessed and the best routes are selected.
Several efficiency measures have been implemented to keep the search
fast. The user is provided with structures of the potential starting
materials along with information concerning the quality of the synthetic
routes from individual starting materials.
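
The selection step described above can be illustrated with a minimal sketch. It is not CAESA's actual algorithm: the Route record, the numeric quality score, and the ordering (shorter routes first, higher quality breaking ties) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical record for one candidate route to the target."""
    starting_material: str
    steps: int        # route length (number of reaction steps)
    quality: float    # assumed score in [0, 1]; higher is better


def best_routes(routes, keep=3):
    """Rank routes by length, then by quality, and keep the top few.

    The ordering is an assumption: fewer steps win, and quality
    breaks ties between routes of equal length.
    """
    ranked = sorted(routes, key=lambda r: (r.steps, -r.quality))
    return ranked[:keep]


candidates = [
    Route("A", steps=5, quality=0.9),
    Route("B", steps=3, quality=0.6),
    Route("C", steps=3, quality=0.8),
]
for route in best_routes(candidates, keep=2):
    print(route.starting_material, route.steps, route.quality)
```

A real implementation would likely combine length and quality into a single figure of merit instead of using a strict lexicographic order.
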
The presence of complexity-enhancing structural features (such as chiral
centres or complex topological features) is detected in each target molecule.
These features are used to assess the ease of synthesis of a compound. Only
features that are not contained within the selected starting materials are
considered to contribute towards the synthetic difficulty of the compound.
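
The rule that only features absent from the starting material add to the difficulty can be sketched as a multiset difference. The feature names and penalty weights below are hypothetical placeholders, not values used by the program.

```python
from collections import Counter

# Hypothetical penalty per complexity-enhancing feature (assumed values).
FEATURE_PENALTY = {
    "chiral_centre": 2.0,
    "bridged_ring": 3.0,
    "spiro_centre": 2.5,
}


def uncovered_features(target_features, starting_material_features):
    """Features of the target that the starting material does not supply.

    Counter subtraction keeps only positive counts, so a feature that is
    already present in the starting material is not counted again.
    """
    return Counter(target_features) - Counter(starting_material_features)


def difficulty_score(target_features, starting_material_features,
                     penalties=FEATURE_PENALTY):
    """Sum the penalties of the features that still have to be built."""
    remaining = uncovered_features(target_features, starting_material_features)
    return sum(penalties.get(name, 1.0) * count
               for name, count in remaining.items())


target = ["chiral_centre", "chiral_centre", "bridged_ring"]
starting_material = ["chiral_centre"]
print(difficulty_score(target, starting_material))
```

In this sketch a starting material that already carries one of the two chiral centres lowers the score, which mirrors the idea that a well-chosen starting material removes part of the synthetic difficulty.
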
A recent development in expert systems, causal networks, is used to combine
the information from a variety of sources and calculate an index of synthetic
accessibility.
The program uses two sources of information, both of which can be customized
to provide estimates specific to the chemistry employed in the company:
a database of available starting materials and
a reaction knowledge-base.
The database of available starting materials is customizable with a converter
program provided in the CAESA package. This converter takes custom databases
in MDL molfile format and converts them off-line into an internal CAESA format.
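
As an illustration of the input side of such a conversion, the sketch below reads the atoms and bonds from a V2000 MDL molfile using the fixed-width columns of that format. The internal CAESA format is not described here, so the returned dictionary is only a stand-in, and the function name parse_molfile is hypothetical; it is not part of the CAESA converter.

```python
def parse_molfile(text):
    """Read name, atom symbols and bonds from a V2000 molfile string.

    Layout assumed: three header lines, a counts line, then the atom
    block and the bond block, all with fixed-width columns.
    """
    lines = text.splitlines()
    counts = lines[3]
    n_atoms = int(counts[0:3])   # columns 1-3: number of atoms
    n_bonds = int(counts[3:6])   # columns 4-6: number of bonds

    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]

    # Atom symbol occupies columns 32-34 of each atom line.
    atoms = [line[31:34].strip() for line in atom_lines]
    # Each bond line holds: first atom, second atom, bond type.
    bonds = [(int(line[0:3]), int(line[3:6]), int(line[6:9]))
             for line in bond_lines]
    return {"name": lines[0].strip(), "atoms": atoms, "bonds": bonds}


# Small hand-written example (ethanol heavy atoms, approximate coordinates).
ETHANOL = "\n".join([
    "ethanol",
    "  example",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.2000    1.2000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "  2  3  1  0  0  0  0",
    "M  END",
])

record = parse_molfile(ETHANOL)
print(record["name"], record["atoms"], record["bonds"])
```

The sketch ignores charges, stereo flags, property lines and the V3000 variant of the format, all of which a production converter would need to handle.
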
Extension of the reaction knowledge-base is a more manual process. There are
about 180 basic reactions supplied with the program. The reactions are described
in PATRAN, an ASCII (human-readable) pattern language designed
specifically for reactions. It
is similar to the SMILES notation, but extended to handle reactions, not only
individual molecules. PATRAN also allows the use of generic atom types. The
reaction knowledge base of CAESA can be extended by the user to include additional
reactions specific to the chemistry applied in the company.