Cheminformatics Tutorials - Herong's Tutorial Examples - v2.01, by Herong Yang
RDKit Substructure Search with SMARTS
This section provides a tutorial example on how to perform substructure search with a SMARTS pattern using the RDKit library.
RDKit also supports substructure search with SMARTS (SMiles ARbitrary Target Specification) patterns. SMARTS is an extension of SMILES (Simplified Molecular Input Line Entry System).
SMARTS is a language for describing molecular substructure patterns and is widely used in cheminformatics.
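To make the idea of a substructure pattern concrete, here is a minimal sketch that counts matches of three small SMARTS patterns in paracetamol, the first molecule used later in this section. The pattern names in the dictionary and the variable names are only illustrative labels chosen for this sketch.

```python
from rdkit import Chem

# Paracetamol, given as a SMILES string
mol = Chem.MolFromSmiles('CC(=O)Nc1ccc(O)cc1')

# SMARTS primitives used here:
#   [#6]  any carbon atom (aliphatic or aromatic)
#   c     aromatic carbon only
#   !:    any bond that is not aromatic
patterns = {
    'any carbon': '[#6]',
    'aromatic carbon': 'c',
    'non-aromatic bond between two carbons': '[#6]!:[#6]',
}

counts = {}
for name, smarts in patterns.items():
    query = Chem.MolFromSmarts(smarts)
    # GetSubstructMatches returns one tuple of atom indices per match
    counts[name] = len(mol.GetSubstructMatches(query))
    print(name, smarts, counts[name])
```

For this molecule the three counts come out as 8, 6 and 1: eight carbons in total, six of them in the aromatic ring, and a single carbon-carbon bond outside the ring.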
Here is a short example of substructure search with SMARTS using the RDKit library, revised from "A Brief Introduction to SMARTS" by Daniel Russo at https://russodanielp.github.io/blog/a-brief-introduction-to-smarts/. The image generated by the last line will be displayed automatically in Jupyter Notebook.
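from rdkit import Chem
from rdkit.Chem import Draw
# Parse the SMILES strings into RDKit molecules
mols = ['CC(=O)Nc1ccc(O)cc1', 'CC(N)Cc1ccccc1', 'CC(CS)C(=O)N1CCCC1C(=O)O']
mols = list(map(Chem.MolFromSmiles, mols))
# SMARTS query: two carbon atoms joined by a non-aromatic bond
sub = Chem.MolFromSmarts('[#6]!:[#6]')
# For each molecule, collect the first atom index of every match
highlights = [[a[0] for a in m.GetSubstructMatches(sub)] for m in mols]
# Draw the molecules in a grid with the collected atoms highlighted
Draw.MolsToGridImage(mols, highlightAtomLists=highlights)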
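The example above keeps only the first atom index of each match. As a sketch of what the matches themselves look like, the following variant collects the full match tuples and builds an alternative highlight list containing every matched atom. The names all_matches and full_highlights are introduced here for illustration and are not part of the original example.

```python
from rdkit import Chem

smiles = ['CC(=O)Nc1ccc(O)cc1', 'CC(N)Cc1ccccc1', 'CC(CS)C(=O)N1CCCC1C(=O)O']
mols = [Chem.MolFromSmiles(s) for s in smiles]
sub = Chem.MolFromSmarts('[#6]!:[#6]')

# Each match is a tuple of atom indices, one index per atom in the query
all_matches = [m.GetSubstructMatches(sub) for m in mols]

# Variant of the highlight list: keep every atom of every match,
# de-duplicated and sorted, instead of only the first atom
full_highlights = [sorted({i for match in matches for i in match})
                   for matches in all_matches]

for s, matches, atoms in zip(smiles, all_matches, full_highlights):
    print(s, len(matches), atoms)
```

The pattern matches 1, 3 and 7 bonds in the three molecules respectively. If both ends of each matched bond should be highlighted, full_highlights can be passed as highlightAtomLists in place of highlights.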
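Note that two nice Python features are used in the example: the map() function, which applies Chem.MolFromSmiles to each SMILES string, and a nested list comprehension, which collects the matched atom indices for each molecule.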
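As a plain-Python illustration of map() and the nested list comprehension used in the example, the sketch below runs without RDKit. The match tuples in matches_per_mol are hand-written stand-in values for what GetSubstructMatches might return, not computed results.

```python
# Plain-Python illustration; no RDKit needed.
smiles = ['CC(=O)Nc1ccc(O)cc1', 'CC(N)Cc1ccccc1']

# map() applies a function to every item; list() materializes the result
lengths_map = list(map(len, smiles))
# The same thing written as a list comprehension
lengths_comp = [len(s) for s in smiles]

# Hand-written stand-in for per-molecule match tuples (illustrative values)
matches_per_mol = [((0, 1),), ((0, 1), (1, 3), (3, 4))]
# Nested comprehension: outer loop over molecules, inner loop over matches
first_atoms = [[a[0] for a in matches] for matches in matches_per_mol]

# Equivalent explicit loops
first_atoms_loop = []
for matches in matches_per_mol:
    row = []
    for a in matches:
        row.append(a[0])
    first_atoms_loop.append(row)

print(lengths_map, first_atoms)
```

The comprehension form and the explicit loops produce the same nested list; the comprehension is simply more compact.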