## Topological Descriptors

Two widely applied examples of 2D molecular descriptors are molecular connectivity indices (MCI) and atom pair (AP) descriptors, the latter initially developed by Carhart et al.29 Most 2D QSAR methods are based on graph-theoretic indices, which have been extensively studied by Randic32 and by Kier and Hall.33-38 Although the physicochemical meaning of these structural indices is unclear, they represent different aspects of molecular structure. These topological indices have been successfully combined with MLR analysis.39 They have been extensively applied in analytical chemistry, toxicity analysis, and other areas of biological activity prediction.40-43

The popular MolConnZ software44 computes a wide range of topological indices of molecular structure. These indices include (but are not limited to) the following descriptors: simple and valence path, cluster, path/cluster, and chain molecular connectivity indices; kappa molecular shape indices; topological and electro-topological state indices; differential connectivity indices; graph radius and diameter; Wiener and Platt indices; Shannon and Bonchev-Trinajstic information indices; counts of different vertices; and counts of paths and edges between different kinds of vertices.
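To make two of the listed indices concrete, the following is a minimal sketch of the Wiener index (the sum of shortest-path bond counts over all atom pairs) and the first-order simple connectivity index (the sum over bonds of 1/sqrt(d_i d_j), where d is the vertex degree) on a hydrogen-suppressed molecular graph. It is not MolConnZ code; the function names and the adjacency-dictionary input format are assumptions made for illustration.

```python
from collections import deque
from math import sqrt


def bond_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def wiener_index(adjacency):
    """Sum of shortest-path bond counts over all unordered atom pairs."""
    total = sum(sum(bond_distances(adjacency, atom).values()) for atom in adjacency)
    return total // 2  # every pair was counted from both ends


def connectivity_index(adjacency):
    """First-order simple connectivity index: sum over bonds of 1/sqrt(d_i * d_j)."""
    chi = 0.0
    for i, nbrs in adjacency.items():
        for j in nbrs:
            if i < j:  # visit each bond once
                chi += 1.0 / sqrt(len(adjacency[i]) * len(adjacency[j]))
    return chi


# Hydrogen-suppressed graphs (hypothetical inputs): atom index -> bonded atom indices
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

for name, graph in (("n-butane", n_butane), ("isobutane", isobutane)):
    print(name, wiener_index(graph), round(connectivity_index(graph), 4))
```

Both indices are lower for the branched isomer than for the linear one, which illustrates how such indices encode branching even though they carry no direct physicochemical meaning.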

Overall, MolConnZ produces over 400 different descriptors. Most of these descriptors characterize chemical structure, but several depend on the arbitrary numbering of atoms in a molecule and are introduced solely for bookkeeping purposes. In a typical QSAR study, only about half of all possible MolConnZ descriptors are eventually used after deleting descriptors that are zero for every compound or have zero variance across the data set. Figure 1 provides a summary of these molecular descriptors and presents some algorithms used in their derivation.
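The filtering step described above can be sketched as follows. This is only an illustration under stated assumptions: the descriptor names (`chi1`, `kappa2`, `nS`) and values are made up, and the function `drop_constant_descriptors` is hypothetical rather than part of any descriptor package. A column that is zero for every compound is a special case of a constant column, so a single check covers both criteria.

```python
def drop_constant_descriptors(names, matrix):
    """Remove descriptor columns that take the same value for every compound.

    `matrix` is a list of rows (one row per compound, one column per descriptor).
    All-zero columns are constant, so they are removed by the same test.
    """
    if not matrix:
        return list(names), []
    columns = list(zip(*matrix))
    keep = [k for k, col in enumerate(columns) if max(col) != min(col)]
    kept_names = [names[k] for k in keep]
    kept_matrix = [[row[k] for k in keep] for row in matrix]
    return kept_names, kept_matrix


# Made-up descriptor table for three compounds (illustrative values only)
names = ["chi1", "kappa2", "nS"]
matrix = [
    [1.91, 1.5, 0],
    [1.73, 1.5, 0],
    [2.41, 1.5, 0],
]

kept_names, kept_matrix = drop_constant_descriptors(names, matrix)
print(kept_names)
```

Here `kappa2` is dropped for having zero variance and `nS` for being zero throughout, leaving only `chi1`.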

The idea of using atom pairs as molecular features in structure-activity relationship (SAR) studies was first proposed by Carhart et al.29 AP descriptors are defined by their atom types and topological distance bins. An AP is a substructure defined by two atom types and the shortest path separation (or graph distance) between the atoms. The graph distance is defined as the smallest number of atoms along the path connecting two atoms in a molecular structure. The general form of an atom pair descriptor is as follows:

atom type i------(distance)------atom type j

where atom chemical types are typically defined by the user. For example, 15 atom types can be defined using SYBYL mol2 format as follows: (1) negative charge center, NCC; (2) positive charge center, PCC; (3) hydrogen bond acceptor, HA; (4) hydrogen bond donor, HD; (5) aromatic ring center, ARC; (6) nitrogen atoms, N; (7) oxygen atoms, O; (8) sulfur atoms, S; (9) phosphorus atoms, P; (10) fluorine atoms, FL; (11) chlorine, bromine, and iodine atoms, HAL; (12) carbon atoms, C; (13) all other elements, OE; (14) triple bond center, TBC; (15) double bond center, DBC.

The total number of pairwise combinations of all 15 atom types, including pairs of identical types, is 120 (15 x 16 / 2). Distance bins must also be defined to discriminate between identical atom pairs that are separated by different graph distances and therefore represent different molecular substructures. For example, 15 distance bins can be introduced, covering graph distances from zero (i.e., zero atoms separating an atom pair) to 14 and greater. In this scheme, a total of 1800 (120 x 15) AP descriptors can be generated for any molecular structure. An example of an AP descriptor is shown in Figure 2.

Frequently, as applied to particular data sets, many of the theoretically possible AP descriptors have zero value (implying that certain atom types or atom pairs are absent in the molecular structures).
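The counting scheme above can be sketched in a few lines. This is a simplified illustration, not the implementation of Carhart et al. or of any package. It assumes each atom carries exactly one type label, whereas in practice an atom may match several types and centers such as ARC are pseudo-atoms. It also reads "graph distance" as the number of atoms lying between the pair, so directly bonded atoms fall in bin 0. The names `atom_pair_counts` and `bonds_from` are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations_with_replacement

ATOM_TYPES = ["NCC", "PCC", "HA", "HD", "ARC", "N", "O", "S",
              "P", "FL", "HAL", "C", "OE", "TBC", "DBC"]
N_BINS = 15  # separations 0..13 plus one bin for "14 and greater"

# Unordered type pairs, identical types included: 15 * 16 / 2
n_type_pairs = len(list(combinations_with_replacement(ATOM_TYPES, 2)))
n_descriptors = n_type_pairs * N_BINS


def bonds_from(adjacency, start):
    """Breadth-first search: number of bonds from `start` to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def atom_pair_counts(atom_types, bonds):
    """Count (type_a, type_b, distance_bin) occurrences over all atom pairs."""
    n = len(atom_types)
    adjacency = {i: [] for i in range(n)}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    counts = Counter()
    for i in range(n):
        dist = bonds_from(adjacency, i)
        for j in range(i + 1, n):
            if j not in dist:
                continue  # atoms in disconnected fragments form no pair
            separation = dist[j] - 1  # assumption: atoms between i and j
            bin_index = min(separation, N_BINS - 1)
            type_a, type_b = sorted((atom_types[i], atom_types[j]))
            counts[(type_a, type_b, bin_index)] += 1
    return counts


# Hydrogen-suppressed ethanol, C-C-O, with one type label per atom
ethanol_counts = atom_pair_counts(["C", "C", "O"], [(0, 1), (1, 2)])
print(n_type_pairs, n_descriptors)
print(dict(ethanol_counts))
```

For ethanol only three of the 1800 possible descriptors are nonzero, which shows why most AP descriptors vanish for any given data set.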

Dragon descriptors45 include different groups: constitutional descriptors, topological indices, molecular walk counts, BCUT descriptors, Galvez topological charge indices, 2D autocorrelations, charge indices, aromaticity indices, Randic molecular profiles, geometrical descriptors, radial distribution function (RDF) descriptors, 3D-MoRSE descriptors,
