
CLiDE:
Chemical Literature Data Extraction


Validation

Reference:

  1. Aniko T. Valko and A. Peter Johnson: "CLiDE Pro: The Latest Generation of CLiDE, a Tool for Optical Chemical Structure Recognition"
    J. Chem. Inf. Model., 2009, 49 (4), pp 780-787
    DOI: 10.1021/ci800449t


The following test set (as reported in Ref. 1) was processed with CLiDE Pro in "batch mode". The images were processed and saved automatically. No manual correction of the results was applied (although CLiDE provides various editors for this purpose). The input and output files, along with the corresponding processing time and success rates for the test set, are provided below:

Objects              Total number   No. of objects containing errors   Success rate
images               454            52                                 88.55%
structure diagrams   519            53                                 89.79%

Download:
Input: dataset
Output: SDF files

Time: 6 min for the total set (ca. 1 sec / image)
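The success rates above can be reproduced from the raw counts. The following is a minimal sketch, assuming that "success rate" means the share of objects processed without any error; the function names are illustrative and are not part of CLiDE.

```python
def success_rate(total: int, with_errors: int) -> float:
    """Percentage of objects that were processed without any error."""
    return 100.0 * (total - with_errors) / total


def seconds_per_object(total_seconds: float, total: int) -> float:
    """Average processing time per object, in seconds."""
    return total_seconds / total


# Counts taken from the table above.
print(f"images:             {success_rate(454, 52):.2f}%")   # 88.55%
print(f"structure diagrams: {success_rate(519, 53):.2f}%")   # 89.79%

# 6 minutes for 454 images, i.e. roughly the quoted "ca. 1 sec / image".
print(f"time per image:     {seconds_per_object(6 * 60, 454):.2f} s")  # 0.79 s
```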

Below you'll find OLD data & results (see Note-1) obtained with CLiDE v2.1 - from the 20th century :-)
(CLiDE Lite and Full v 2.1 were developed from 1991-1999)

Note-1: These older datasets and results are kept only because the scientific community used these datasets as STANDARD / reference validation sets - i.e. the CLiDE development team under the guidance of Prof. A. Peter Johnson showed leadership in validating Chemical OCR methods from the early days.

# images   HW                                       Proc. time
14         366 MHz PII laptop with 160 MB memory    30 sec
14         900 MHz ATHLON PC with 768 MB memory     8 sec
96         366 MHz PII laptop with 160 MB memory    650 sec (ca. 11 minutes)
96         900 MHz ATHLON PC with 768 MB memory     230 sec (ca. 4 minutes)

Input files: in each set, the images (14 and 96, respectively) were processed in Windows bitmap file format.

Output (for each set):
Output 1: saved in MOL file format
Output 2: saved in ChemDraw file format
Output 3: saved in CLiDE file format
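To compare the two machines, the processing times reported above can be turned into per-image times and speed-up factors. This is a rough sketch only: the dictionary layout and function names are assumptions made for illustration, and the figures are the ones quoted above.

```python
# Processing times in seconds, keyed by number of images and then by machine.
timings = {
    14: {"366 MHz PII": 30, "900 MHz ATHLON": 8},
    96: {"366 MHz PII": 650, "900 MHz ATHLON": 230},
}


def per_image(seconds: float, n_images: int) -> float:
    """Average processing time per image, in seconds."""
    return seconds / n_images


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """How many times faster the second machine finished the same set."""
    return slow_seconds / fast_seconds


for n_images, by_machine in timings.items():
    for machine, seconds in by_machine.items():
        print(f"{n_images} images on {machine}: "
              f"{per_image(seconds, n_images):.2f} s / image")
    factor = speedup(by_machine["366 MHz PII"], by_machine["900 MHz ATHLON"])
    print(f"{n_images} images: speed-up {factor:.2f}x")
```

The speed-up differs between the two sets, so a single per-image figure should not be extrapolated from one set to the other.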



Copyright © 2010 SimBioSys Inc., All rights reserved.