Large databases of chemical reactions provide new data-mining opportunities and challenges. Key challenges result from the imperfect quality of the data and the fact that many of these reactions are not properly balanced or atom-mapped. Here, we describe ReactionMap, an efficient atom-mapping algorithm. Our approach combines maximum common chemical subgraph search with minimization of an assignment cost function derived empirically from training data. We use a set of over 259,000 balanced atom-mapped reactions from the SPRESI commercial database to train the system, and we validate it on random sets of 1000 and 17,996 reactions sampled from this pool. These large test sets represent a broad range of chemical reaction types, and ReactionMap correctly maps about 99% of the atoms and about 96% of the reactions, with a mean time per mapping of 2 s. Most correctly mapped reactions are mapped with high confidence. Mapping accuracy compares favorably with ChemAxon's AutoMapper, versions 5 and 6.1, and the DREAM Web tool, which correctly map 60.7%, 86.5%, and 90.3% of the reactions, respectively, on the same data set. A ReactionMap server is available on the ChemDB Web portal at http://cdb.ics.uci.edu.
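The idea of choosing an atom mapping by minimizing an assignment cost can be illustrated with a toy sketch. This is not ReactionMap's method: it omits maximum common subgraph search entirely and replaces the empirically trained cost function with a simple stand-in, the number of bonds broken plus bonds formed (ignoring bond order), searched by brute force over element-preserving assignments. The functions `bond_edit_cost` and `min_cost_mapping` and the atom/bond encoding (element symbols plus index-pair bond lists, heavy atoms only) are hypothetical, and the exhaustive search is exponential, so it only suits reactions with a handful of atoms.

```python
from itertools import permutations, product


def bond_edit_cost(mapping, r_bonds, p_bonds):
    """Hypothetical cost: bonds broken plus bonds formed under a reactant->product mapping.

    Bond orders are ignored; each bond is an unordered pair of atom indices.
    """
    mapped = {frozenset((mapping[a], mapping[b])) for a, b in r_bonds}
    target = {frozenset(bond) for bond in p_bonds}
    # Symmetric difference = reactant bonds not kept + product bonds newly formed.
    return len(mapped ^ target)


def min_cost_mapping(r_atoms, r_bonds, p_atoms, p_bonds):
    """Brute-force search over element-preserving bijections (toy sketch only)."""
    if sorted(r_atoms) != sorted(p_atoms):
        raise ValueError("reaction is not balanced")
    elements = sorted(set(r_atoms))
    # Atoms may only map to atoms of the same element.
    r_groups = [[i for i, e in enumerate(r_atoms) if e == el] for el in elements]
    p_groups = [[j for j, e in enumerate(p_atoms) if e == el] for el in elements]
    best_map, best_cost = None, None
    for choice in product(*(permutations(pg) for pg in p_groups)):
        mapping = {}
        for rg, perm in zip(r_groups, choice):
            mapping.update(zip(rg, perm))
        cost = bond_edit_cost(mapping, r_bonds, p_bonds)
        if best_cost is None or cost < best_cost:
            best_map, best_cost = mapping, cost
    return best_map, best_cost


if __name__ == "__main__":
    # Ethanol heavy atoms (C-C-O) renumbered on the product side as O-C-C:
    # the minimum-cost mapping recovers the renumbering with zero bond edits.
    print(min_cost_mapping(["C", "C", "O"], [(0, 1), (1, 2)],
                           ["O", "C", "C"], [(0, 1), (1, 2)]))
    # prints ({0: 2, 1: 1, 2: 0}, 0)
```

A learned cost, as described in the abstract, would replace the uniform bond-edit count with weights that make chemically common transformations cheaper than rare ones.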