By J. W. Kennedy, Louis V. Quintas

It has been said that modern molecular theory is founded on essentially graph-like models located in some appropriate embedding space. The idea may be extended to physical theory, and it is this that provides the raison d'être for this collection of papers. Today there is almost no branch of chemistry, including its more recent relatives in polymer science and biology, that is not enriched by (or enriching) the mathematical theory of graphs. The impact of graph-theoretical thinking in physics has, with some notable exceptions, developed more slowly. In 1847, G.R. Kirchhoff founded the theory of electrical networks as a graph-theoretical structure, and thereby also made significant contributions to the mathematics of graph theory. This tradition has continued into the newer sciences such as telecommunications, computer science and information science.

The points in Fig. 2 are merely a rotation of the points in Fig. 1. Hence, distances of points from the origin in Fig. 2 are the same as in the original Fig. 1. In terms of straight Euclidean distance, point 9 is farther from the origin (near point 5) than is point 0. However, in another sense, point 0 is about as far from the general elliptical pattern of points. Another rescaling possibility is to rescale the principal components so that each has standard deviation one. Fig. 3 shows the result of this rescaling.
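The rescaling described above can be illustrated with a minimal sketch. It assumes the principal component scores are already available as rows of numbers, one row per point; the function name `rescale_to_unit_sd` and the sample scores are hypothetical and are not taken from the text.

```python
import statistics

def rescale_to_unit_sd(scores):
    """Divide every principal component (column) by its sample standard deviation."""
    n_components = len(scores[0])
    # Sample standard deviation of each column of scores.
    sds = [statistics.stdev([row[j] for row in scores]) for j in range(n_components)]
    return [[row[j] / sds[j] for j in range(n_components)] for row in scores]

# Hypothetical scores: the first component spreads far more widely than the second.
scores = [[-4.0, 0.5], [-2.0, -0.5], [0.0, 0.0], [2.0, 0.5], [4.0, -0.5]]
scaled = rescale_to_unit_sd(scores)
```

After rescaling, a displacement of one standard deviation counts the same along every component, so a point lying well outside the elliptical cloud along a low-variance component is no longer masked by the larger spread of the first component.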

Formula       Name
C2H6          Ethane
C5H10O2       Formic acid, butyl ester
C9H8O3        2-Propenoic acid, 3-(2-hydroxyphenyl)-, (E)-
C8H7ClO2      Benzeneacetic acid, 4-chloro-
C7H8O         Benzenemethanol
C7H4Cl2O      Benzaldehyde, 3,4-dichloro-
C7H3F5O       Benzene, pentafluoromethoxy-
C8H4F3NO      Benzene, 1-isocyanato-3-(trifluoromethyl)-
[illegible]   1-Naphthalenol, acetate
[illegible]   Butanamide, N-phenyl-

Determining structural similarity of chemicals

Fig. 4. PC1 versus PC2 for 3692 chemicals. The ten chemicals chosen randomly are indicated with a •.

In finding nearest neighbors, the distance between two points (chemicals) $X_1$ and $X_2$ is given by

\[ D = \left[ \sum_{i=1}^{10} \left( PC_i(X_1) - PC_i(X_2) \right)^2 \right]^{1/2} \qquad (12) \]

where $PC_i$ is the $i$th scaled principal component. This defines a numerical measure of dissimilarity which is calculated solely from the chemical structure. A visual inspection of the resulting 'similar' structures can be used to evaluate the utility and limitations of this approach in selecting structural analogs.

Results

To compute principal components, each of the 90 variables was transformed by the logarithm of the variable plus one.
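As a rough sketch of how the distance in Eq. (12) and the log transform might be computed, the following uses plain Python. The helper names (`log_plus_one`, `pc_distance`, `nearest_neighbors`), the chemical labels, and the three-component scores are all hypothetical; the text uses ten scaled principal components, and the principal component computation itself is not shown here.

```python
import math

def log_plus_one(variables):
    """Transform each variable x to log(x + 1), as done before computing components."""
    return [math.log(x + 1.0) for x in variables]

def pc_distance(a, b):
    """Euclidean distance between two points in scaled principal component space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(target, library, k):
    """Return the names of the k library entries closest to the target point."""
    ranked = sorted(library, key=lambda name: pc_distance(target, library[name]))
    return ranked[:k]

# Hypothetical scaled scores with three components instead of ten, for brevity.
query = [0.0, 0.0, 0.0]
library = {
    "A": [3.0, 4.0, 0.0],
    "B": [1.0, 0.0, 0.0],
    "C": [0.0, 2.0, 0.0],
}
neighbors = nearest_neighbors(query, library, 2)
```

Here `neighbors` holds the two entries with the smallest dissimilarity to the query, which is the set of candidates one would then inspect visually as possible structural analogs.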