List of Abstracts
-
Chemical Structure Recognition and Generic Text Interpretation
in the CLiDE project
P. Ibison, F. Kam, R.W. Simpson, C. Tonnelier, T. Venczel
and A.P. Johnson
Proceedings of Online Information 92, 1992, London, England
Abstract: Chemical information, especially that concerning chemical
reactions, is becoming increasingly available in a variety of computer-readable
databases. However, the creation of these databases is a time-consuming
and expensive process. CLiDE (Chemical Literature Data Extraction) is a
new software project to help solve the problem of building substance and
reaction databases. CLiDE uses a combination of imaging and artificial
intelligence techniques to recognize a range of chemical diagrams and extract
the information they contain. The steps necessary to transform a chemical
structure drawing into a computer-readable output are detailed. The interpretation
of the generic structures is discussed.
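CLiDE's target output is a set of connection tables: a list of atoms plus a list of bonds between atom indices. The Python sketch below is a minimal, hypothetical illustration of that data structure. The `Atom`, `Bond` and `ConnectionTable` names and their fields are assumptions made for this example. They are not CLiDE's actual output format.

```python
from dataclasses import dataclass, field

@dataclass
class Atom:
    element: str   # element symbol read from a text label, e.g. "O"
    x: float       # position in the image (pixels); assumed field
    y: float

@dataclass
class Bond:
    a: int          # index of the first atom
    b: int          # index of the second atom
    order: int = 1  # 1 = single, 2 = double, 3 = triple

@dataclass
class ConnectionTable:
    atoms: list = field(default_factory=list)
    bonds: list = field(default_factory=list)

    def add_atom(self, element, x, y):
        """Append an atom and return its index."""
        self.atoms.append(Atom(element, x, y))
        return len(self.atoms) - 1

    def add_bond(self, a, b, order=1):
        self.bonds.append(Bond(a, b, order))

    def neighbours(self, i):
        """Return indices of atoms bonded to atom i."""
        return [bd.b if bd.a == i else bd.a for bd in self.bonds if i in (bd.a, bd.b)]

# Ethanol drawn as C-C-O (hydrogens implicit)
ct = ConnectionTable()
c1 = ct.add_atom("C", 0, 0)
c2 = ct.add_atom("C", 30, 0)
o = ct.add_atom("O", 60, 0)
ct.add_bond(c1, c2)
ct.add_bond(c2, o)
print(ct.neighbours(c2))  # [0, 2]
```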
-
Chemical Literature Data Extraction: Bond Crossing
in Single and Multiple Structures
F. Kam, R.W. Simpson, C. Tonnelier, T. Venczel and A.P. Johnson
Proceedings of the 1992 Chemical Information Conference, 1992, Annecy,
France
Abstract: The procedure to convert a scanned image of a page of chemical
structure diagrams (with accompanying text) into a set of connection tables
is one of the primary aims of the CLiDE project. These connection tables
can be used in a variety of computer-based applications such as building
and maintaining databases. The image is decomposed into component graphics
and text which are further analysed to find the lines, wedges, and chemical
text strings. In an interpretation phase the connection tables for the
molecules are built from these items. The correct interpretation of chemical
bonding in the image is often hampered by the constraints of representing
a three-dimensional molecule in two dimensions where one bond may be drawn
over another. A method of identifying and successfully dealing with these
situations is described. A related situation where a bond is drawn crossing
a ring implying an undetermined point of attachment is also solved. Examples
are presented to illustrate these situations, and the rules implemented
within the CLiDE program to handle these structures are discussed.
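The abstract does not spell out the detection method. One generic way to flag bonds drawn over one another is a segment-intersection test. Two bond lines that cross at an interior point of both, rather than meeting at a shared endpoint, become candidates for a crossing instead of a new atom. The Python sketch below assumes each bond is given as a pair of endpoint coordinates. `orient`, `properly_cross` and `crossing_pairs` are hypothetical names, and this is not the rule set implemented in CLiDE.

```python
def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def properly_cross(s1, s2):
    """True if segments s1 and s2 meet at a single point interior to both."""
    (p1, p2), (p3, p4) = s1, s2
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    # Endpoints of each segment must lie strictly on opposite sides of the other
    return d1 * d2 < 0 and d3 * d4 < 0

def crossing_pairs(bonds):
    """Index pairs of bonds drawn over one another.

    Hypothetical rule: such pairs stay two separate bonds instead of being
    split at an implied atom at the intersection point.
    """
    pairs = []
    for i in range(len(bonds)):
        for j in range(i + 1, len(bonds)):
            if properly_cross(bonds[i], bonds[j]):
                pairs.append((i, j))
    return pairs

# Bonds 0 and 1 form an "X"; bond 2 merely shares an endpoint with bond 0
bonds = [((0, 0), (10, 10)), ((0, 10), (10, 0)), ((10, 10), (20, 10))]
print(crossing_pairs(bonds))  # [(0, 1)]
```

Requiring strict sign changes means bonds that only touch at a shared atom position are never reported as crossings.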
-
Chemical Literature Data Extraction: The CLiDE Project
P. Ibison, M. Jacquot, F. Kam, A. G. Neville, R.W. Simpson,
C. Tonnelier, T. Venczel and A.P. Johnson
Journal of Chemical Information and Computer Sciences, vol. 33, no. 3, pp:
338-344, 1993
Abstract: Chemical information, especially that concerning chemical
reactions, is becoming increasingly available in a variety of computer-readable
databases. However, the creation of these databases is a time-consuming
and expensive process. CLiDE (Chemical Literature Data Extraction) is a
new software project to help solve the problem of building substance and
reaction databases. CLiDE uses a combination of imaging and artificial
intelligence techniques to recognize a range of chemical diagrams and extract
the information they contain. The steps necessary to transform a chemical
structure drawing into a computer-readable output are detailed. Several
examples are given to illustrate the scope of the current work.
-
(Chem)DeTeX Automatic Generation of a Markup Language
Description of (Chemical) Documents from Bitmap Images
Aniko Simon, Jean-Christophe Pret and A. Peter Johnson
Proc. of the Third International Conference on Document Analysis and
Recognition (ICDAR'95)
vol. I, pp: 458-462, 1995, Montreal, Canada
Abstract: This paper presents a novel view of document processing, as
being the reverse process to TeX. This concept simplifies the analysis
of the physical structure of documents, and also suggests the use of a
style file for layout recognition. An algorithm is given for both phases,
layout analysis and layout recognition. The bottom-up layout analysis method
employed is based on Kruskal's algorithm and uses the distances between
the components to construct the physical page structure. The algorithm
is linear with respect to the number of connected components. For layout
recognition, a document style description language (DSDL) is introduced.
This helps a fault-tolerant, recursive parsing algorithm to label the blocks
of the document. The presented methods were designed to be used for scientific
publications (papers, reports, books), but could be applied to a broader
range of documents.
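Kruskal's algorithm processes candidate edges in order of increasing length and merges components with a union-find structure. Stopping once edges exceed a distance threshold yields bottom-up groups. The Python sketch below illustrates that idea under stated assumptions. Components are bounding boxes `(x0, y0, x1, y1)`. The `box_gap` metric and the `threshold` value are hypothetical stand-ins for the paper's distance measure. The all-pairs edge list makes this version O(n² log n), not the linear-time algorithm the abstract reports.

```python
def box_gap(a, b):
    """Hypothetical distance: largest axis-aligned gap between boxes; 0 if they overlap."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return max(dx, dy)

class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node directly at the root
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Union by rank; returns False if a and b were already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True

def group_components(boxes, threshold):
    """Kruskal-style grouping: merge pairs by increasing distance up to threshold."""
    edges = sorted(
        (box_gap(boxes[i], boxes[j]), i, j)
        for i in range(len(boxes))
        for j in range(i + 1, len(boxes))
    )
    ds = DisjointSet(len(boxes))
    for dist, i, j in edges:
        if dist > threshold:
            break  # all remaining edges are at least as long
        ds.union(i, j)
    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(ds.find(i), []).append(i)
    return sorted(groups.values())

# Three characters of one word and one distant character
boxes = [(0, 0, 8, 10), (10, 0, 18, 10), (20, 0, 28, 10), (100, 0, 108, 10)]
print(group_components(boxes, threshold=5))  # [[0, 1, 2], [3]]
```

Varying `threshold` changes how coarse the resulting groups are.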
-
Recent Advances in the CLiDE Project: Logical Layout
Analysis of Chemical Documents
Aniko Simon and A. Peter Johnson
Journal of Chemical Information and Computer Sciences, vol. 37, no. 1, pp:
109-116, 1997
Abstract: The CLiDE system for chemistry document image processing consists
of three major steps: physical layout analysis, recognition of the primitives,
and logical layout analysis. This paper presents the new methods for logical
layout analysis: role assignment to the elements of the document using a
style description language. The results are illustrated by application
to generic reaction interpretation.
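As a rough illustration of role assignment driven by a style description, the Python sketch below labels each layout block with the first role whose rule it satisfies. The block features (`font_size`, `top`), the rules in `STYLE`, and the first-match strategy are all assumptions made for this example. They do not reproduce DSDL syntax or CLiDE's fault-tolerant parser.

```python
from dataclasses import dataclass

@dataclass
class Block:
    text: str
    font_size: float  # assumed feature from physical layout analysis
    top: float        # distance from top of page (pixels); assumed feature

# Hypothetical style description: ordered (role, rule) pairs.
STYLE = [
    ("title",   lambda b: b.font_size >= 16 and b.top < 150),
    ("heading", lambda b: b.font_size >= 12 and b.text[:1].isdigit()),
    ("caption", lambda b: b.text.startswith(("Figure", "Scheme", "Table"))),
    ("body",    lambda b: True),  # fallback so every block receives a role
]

def assign_roles(blocks, style=STYLE):
    """Label each block with the first role whose rule it satisfies."""
    return [next(role for role, rule in style if rule(b)) for b in blocks]

page = [
    Block("Synthesis of Compound 3", 18, 40),
    Block("2. Results", 12, 300),
    Block("Scheme 1. Preparation of 3.", 9, 500),
    Block("The reaction proceeded smoothly...", 10, 520),
]
print(assign_roles(page))  # ['title', 'heading', 'caption', 'body']
```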
-
A Fast Algorithm for Bottom-Up Document Layout Analysis
Aniko Simon, Jean-Christophe Pret and A. Peter Johnson
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.
19, no. 3, pp: 273-277, 1997
Abstract: This paper describes a new bottom-up method for document layout
analysis. The algorithm was implemented in the CLiDE (Chemical Literature
Data Extraction) system, but the method described here is suitable for
a broader range of documents. It is based on Kruskal's algorithm and uses
a special distance-metric between the components to construct the physical
page structure. The method has all the major advantages of the bottom-up
systems: independence from different text spacing and independence
from different block alignments. The algorithm's computational complexity
is reduced to linear by using heuristics and path compression.
-
CLiDE Pro: the latest generation of CLiDE, a tool for optical chemical structure recognition.
Aniko T. Valko and A. Peter Johnson
Journal of Chemical Information and Modeling, vol. 49, no. 4, pp: 780-787, 2009
Abstract: We present CLiDE Pro, the latest output of the long-term CLiDE project for the development of tools for automatic extraction of chemical information from the literature. CLiDE Pro is concerned with the extraction of chemical structure and generic structure information from electronic images of chemical molecules available online as well as pages of scanned chemical documents. The information is extracted in three phases: first, the image is segmented into text and graphical regions; then the graphical regions are analyzed and, where possible, the connection tables are reconstructed; finally, any generic structures are interpreted by matching R-groups found in structure diagrams with those located in the text. The program has been tested on a large set of images of chemical structures originating from various sources. The results demonstrate good performance in the reconstruction of connection tables, with few errors in the interpretation of the individual drawing features found in the structure diagrams. This full test set is presented for use in the validation of other similar systems.
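The final phase, matching R-groups in the diagram with definitions in the text, can be sketched as follows. This Python example is hypothetical. It assumes text definitions of the form `R1 = Me, Et` separated by semicolons and uses a made-up regular expression, `DEF_PATTERN`. It enumerates every combination of matched substituents and reports labels that have no definition in the text.

```python
import itertools
import re

# Hypothetical pattern for definitions such as "R1 = Me, Et"; assumes ";" ends each definition
DEF_PATTERN = re.compile(r"\b(R\d*)\s*=\s*([^;]+)")

def parse_rgroup_text(text):
    """Map each R-label defined in the text to its list of substituents."""
    return {
        label: [s.strip() for s in values.split(",") if s.strip()]
        for label, values in DEF_PATTERN.findall(text)
    }

def match_rgroups(diagram_labels, text):
    """Pair R-labels from a structure diagram with text definitions.

    Returns (list of substitution dicts, labels left undefined).
    """
    defs = parse_rgroup_text(text)
    matched = [lab for lab in diagram_labels if lab in defs]
    missing = [lab for lab in diagram_labels if lab not in defs]
    combos = [
        dict(zip(matched, choice))
        for choice in itertools.product(*(defs[lab] for lab in matched))
    ]
    return combos, missing

combos, missing = match_rgroups(["R1", "R2"], "where R1 = Me, Et; R2 = H, Cl")
print(len(combos), missing)  # 4 []
print(combos[0])             # {'R1': 'Me', 'R2': 'H'}
```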