SMiles ARbitrary Target Specification (SMARTS) is a language for formulating chemical patterns, such as substructures, in molecules [1]. To support the evaluation of algorithms that search for chemical patterns in molecules, we present a collection of SMARTS expressions extracted from various literature sources [2-13] and a collection of SMARTS-molecule pairs created from the ZINC database. In addition, a test case consisting of a highly symmetric SMARTS-SMILES pair and a subset of the ZINC lead-like database is provided. If you use this set or any subset, please cite:
Ehrlich, H.-C.; Rarey, M.: Systematic benchmark of substructure search in molecular graphs - from Ullmann to VF2. J Cheminf 2012, 4:13. DOI: 10.1186/1758-2946-4-13 (Open Access)
and the original sources accordingly.
The following table includes the literature references and links to the files containing the corresponding SMARTS expressions:
Note that the original paper [12] contains patterns in SLN notation. A conversion into SMARTS was performed by R. Guha using Cactvs [14]. For further information on the conversion, see Rajarshi Guha's blog entry: http://blog.rguha.net/?p=850.
The following sets contain the literature SMARTS files; different versions of a subset of the PAINS SMARTS; sets of SMARTS-SMILES pairs for evaluating the influence of substructure and molecule size on algorithmic runtime; the first 100k molecules from the ZINC lead-like database [15] as of 12 February 2011, representing a small database; and a phenyl ring-fullerene pair file as a worst-case symmetry search case.
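To illustrate why a highly symmetric pair makes a hard test case, here is a minimal sketch of naive backtracking subgraph matching on unlabeled graphs. It is not the Ullmann or VF2 implementation evaluated in the benchmark paper, and it ignores atom and bond types. The names cycle_graph, fused_hexagons and count_embeddings are hypothetical helpers introduced only for this illustration.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # node -> set of neighbouring nodes


def cycle_graph(n: int) -> Graph:
    """Unlabeled ring of n atoms (hypothetical helper)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def fused_hexagons() -> Graph:
    """Two six-rings sharing one bond: a 10-ring plus the chord 0-5."""
    g = cycle_graph(10)
    g[0].add(5)
    g[5].add(0)
    return g


def count_embeddings(pattern: Graph, target: Graph) -> int:
    """Count injective node mappings that preserve every pattern edge."""
    order = sorted(pattern)          # fixed matching order
    mapping: Dict[int, int] = {}     # pattern node -> target node
    used: Set[int] = set()           # target nodes already taken

    def extend(depth: int) -> int:
        if depth == len(order):
            return 1                 # all pattern nodes placed
        p = order[depth]
        total = 0
        for t in target:
            if t in used:
                continue
            # Each already-mapped neighbour of p must map to a neighbour of t.
            if all(mapping[q] in target[t] for q in pattern[p] if q in mapping):
                mapping[p] = t
                used.add(t)
                total += extend(depth + 1)
                del mapping[p]
                used.discard(t)
        return total

    return extend(0)


print(count_embeddings(cycle_graph(6), cycle_graph(6)))   # 12
print(count_embeddings(cycle_graph(6), fused_hexagons())) # 24
```

A bare six-ring already maps onto itself in 12 ways (6 rotations times 2 reflections), and every additional six-ring in the target contributes another 12 mappings. A matcher that enumerates all mappings therefore does work proportional to the pattern's symmetry multiplied by the number of matching rings, which is what a ring pattern searched in a fullerene is meant to stress.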
An extension of the benchmark set with the SMARTS published in: Kenny, P.; Montanari, C.; Prokopczyk, I.: ClogPalk: a method for predicting alkane/water partition coefficient. Journal of Computer-Aided Molecular Design, Springer Netherlands, 2013, 27, 389-402. See the changelog for a detailed overview of the changes.
A minor revision of the benchmark set that adds SMARTS expressions that were missing and corrects ones that were incorrect. See the changelog for a detailed overview of the changes. Thanks to Andrew Dalke, who provided us with many hints for improving the dataset.
[1] Daylight Theory Manual: http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
[2] Hann M, Hudson B, Lewell X, Lifely R, Miller L, Ramsden N: Strategic pooling of compounds for high-throughput screening. J Chem Inf Comput Sci, 1999, 39(5):897–902. [http://pubs.acs.org/doi/abs/10.1021/ci990423o]
[3] Walters W, Murcko MA: Prediction of 'drug-likeness'. Adv Drug Delivery Rev, 2002, 54(3):255–271. [http://www.sciencedirect.com/science/article/pii/S0169409X02000030]. [Computational Methods for the Prediction of ADME and Toxicity]
[4] Abolmaali SFB, Wegner JK, Zell A: The compressed feature matrix - a fast method for feature based substructure search. J Mol Model, 2003, 9:235–241. DOI:10.1007/s00894-003-0126-0
[5] Olah M, Bologa C, Oprea TI: An automated PLS search for biologically relevant QSAR descriptors. J Comput Aided Mol Des, 2004, 18:437–449. DOI:10.1007/s10822-004-4060-8
[6] Maass P, Schulz-Gasch T, Stahl M, Rarey M: Recore: a fast and versatile method for scaffold hopping based on small molecule crystal structure conformations. J Chem Inf Model, 2007, 47(2):390–399. [http://pubs.acs.org/doi/abs/10.1021/ci060094h]. [PMID: 17305328]
[7] Degen J, Wegscheid-Gerlach C, Zaliani A, Rarey M: On the art of compiling and using 'drug-like' chemical fragment spaces. ChemMedChem, 2008, 3:1503–1507. DOI:10.1002/cmdc.200800178
[8] Ahmed HEA, Vogt M, Bajorath J: Design and evaluation of bonded atom pair descriptors. J Chem Inf Model, 2010, 50:487–499. DOI:10.1021/ci900512g
[9] Daylight SMARTS examples; Daylight Chemical Information Systems, Inc., Laguna Niguel, CA; http://www.daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html. Accessed May 25, 2010.
[10] Agrafiotis DK, Gibbs AC, Zhu F, Izrailev S, Martin E: Conformational sampling of bioactive molecules: a comparative study. J Chem Inf Model, 2007, 47(3):1067–1086. [http://pubs.acs.org/doi/abs/10.1021/ci6005454]
[11] Enoch SJ, Madden JC, Cronin MTD: Identification of mechanisms of toxic action for skin sensitisation using a SMARTS pattern based approach. SAR QSAR Environ Res, 2008, 19(5-6):555–578.
[12] Baell JB, Holloway GA: New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for their Exclusion in Bioassays. J Med Chem, 2010, 53(7):2719–2740. DOI:10.1021/jm901137j
[13] Kenny P, Montanari C, Prokopczyk I: ClogPalk: a method for predicting alkane/water partition coefficient. Journal of Computer-Aided Molecular Design, Springer Netherlands, 2013, 27:389–402. DOI:10.1007/s10822-013-9655-5
[14] Ihlenfeldt WD, Takahashi Y, Abe H, Sasaki S: Computation and management of chemical properties in CACTVS: An extensible networked approach toward modularity and compatibility. J Chem Inf Comput Sci, 1994, 34:109–116. DOI:10.1021/ci00017a013
[15] ZINC Database: http://zinc.docking.org