Cheminformatics Tutorials - Herong's Tutorial Examples - v2.01, by Herong Yang
Impact of 'invariants' on GetMorganFingerprint()
This section provides a tutorial example on the impact of the 'invariants' option on fingerprint generation with the rdkit.Chem.rdMolDescriptors.GetMorganFingerprint() function.
The 'invariants' option in the rdkit.Chem.rdMolDescriptors.GetMorganFingerprint() function call allows you to override the initial identifiers, which are normally hashed from predefined invariants (descriptors) on each atom node. In effect, you are providing the radius=0 layer of the Morgan fingerprint yourself, as a list with one integer per atom.
By default, RDKit computes the initial identifier of each atom node from a fixed set of atom invariants, similar to the atom invariants defined by Daylight Chemical Information Systems Inc.
1. For example, the following code shows the difference between the default initial identifiers and user-provided initial identifiers. Passing an empty list, invariants=[], keeps the defaults:
from rdkit.Chem import AllChem

mol = AllChem.MolFromSmiles('CC')

bitInfo = {}
fp = AllChem.GetMorganFingerprint(mol, 1, invariants=[], bitInfo=bitInfo)
print(bitInfo)

bitInfo = {}
fp = AllChem.GetMorganFingerprint(mol, 1, invariants=[1,2], bitInfo=bitInfo)
print(bitInfo)

# output:
# {2246728737: ((0, 0), (1, 0)), 3545175291: ((0, 1),)}
# {1: ((0, 0),), 2: ((1, 0),), 3205495808: ((0, 1),)}
As you can see from the output, when initial identifiers are provided, the Morgan generator returns them unchanged as the radius=0 identifiers (1 and 2 here), and uses them to derive the radius=1 identifier, which changes as well (3205495808 instead of 3545175291).
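To make the mechanism more concrete, here is a simplified pure-Python sketch of a Morgan-style iteration. It is not RDKit's implementation: toy_hash() and toy_morgan() are hypothetical names, the hash is an arbitrary stand-in, and duplicate handling is reduced to "the first atom wins". The identifiers it produces will not match RDKit's numbers. Only the shape of the result is comparable: the radius=0 identifiers are the values you passed in, and each radius=1 identifier is hashed from an atom's own identifier plus those of its neighbors.

```python
import hashlib

def toy_hash(*values):
    """Deterministic 32-bit hash of integers (a stand-in, not RDKit's hash)."""
    data = ",".join(str(v) for v in values).encode("ascii")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def toy_morgan(bonds, invariants, radius):
    """Return {identifier: [(atom_index, radius), ...]} for a small graph.

    bonds: list of (atom_i, atom_j, bond_order) tuples
    invariants: one initial identifier per atom, used as-is at radius 0
    """
    n = len(invariants)
    neighbors = {i: [] for i in range(n)}
    for bond_idx, (i, j, order) in enumerate(bonds):
        neighbors[i].append((j, order, bond_idx))
        neighbors[j].append((i, order, bond_idx))

    current = list(invariants)
    covered = [frozenset() for _ in range(n)]  # bonds inside each environment
    seen = set()                               # environments already reported
    info = {}

    # Radius 0: the initial identifiers are reported unchanged.
    for atom, ident in enumerate(current):
        info.setdefault(ident, []).append((atom, 0))

    for layer in range(1, radius + 1):
        updated, new_covered = [], []
        for atom in range(n):
            # Combine the atom's identifier with its sorted neighbor data.
            pairs = sorted((order, current[nbr]) for nbr, order, _ in neighbors[atom])
            flat = [v for pair in pairs for v in pair]
            updated.append(toy_hash(layer, current[atom], *flat))
            # Track which bonds the grown environment now covers.
            env = covered[atom].union(
                *(covered[nbr] | {b} for nbr, _, b in neighbors[atom]))
            new_covered.append(env)
        for atom in range(n):
            env = new_covered[atom]
            # Simplification: report an environment only the first time it is seen.
            if env and env not in seen:
                seen.add(env)
                info.setdefault(updated[atom], []).append((atom, layer))
        current, covered = updated, new_covered
    return info

# Ethane as a graph: two atoms joined by one single bond.
ethane = [(0, 1, 1)]
print(toy_morgan(ethane, [7, 7], 1))   # both atoms share one radius=0 identifier
print(toy_morgan(ethane, [1, 2], 1))   # each atom keeps its own radius=0 identifier
```

In both calls the result has a single radius=1 entry, because the two atom environments cover the same bond and the second one is dropped as a duplicate.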
2. You can retrieve the default initial identifiers of a given molecule by calling the GetConnectivityInvariants(mol) function. Then provide them to the GetMorganFingerprint() call. You will get the same fingerprint as with the default invariants.
from rdkit.Chem import AllChem

mol = AllChem.MolFromSmiles('CC')

# Fingerprint built from explicitly supplied default invariants
bitInfo = {}
inv = AllChem.GetConnectivityInvariants(mol)
print(inv)
fp = AllChem.GetMorganFingerprint(mol, 1, invariants=inv, bitInfo=bitInfo)
print(bitInfo)

# Fingerprint built with the defaults (empty invariants list)
bitInfo = {}
fp = AllChem.GetMorganFingerprint(mol, 1, invariants=[], bitInfo=bitInfo)
print(bitInfo)

# output:
# [2246728737, 2246728737]
# {2246728737: ((0, 0), (1, 0)), 3545175291: ((0, 1),)}
# {2246728737: ((0, 0), (1, 0)), 3545175291: ((0, 1),)}
Conclusion: The "invariants=[...]" option is not easy to use. You have to build your own initial identifier on each atom node, by collecting a set of invariant values and hashing them into an integer.
If you really want to build your own initial identifiers, you can follow this example based on the code provided by Greg Landrum at mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg09400.html.
from rdkit import Chem
from rdkit.Chem import AllChem

def generateECFPAtomInvariant(mol, discrete_charges=False):
    # Note: discrete_charges is accepted but not used in this version.
    pt = Chem.GetPeriodicTable()
    num_atoms = mol.GetNumAtoms()
    invariants = [0]*num_atoms
    ring_info = mol.GetRingInfo()
    for i, a in enumerate(mol.GetAtoms()):
        # Collect the invariant values of this atom.
        descriptors = []
        descriptors.append(a.GetAtomicNum())
        descriptors.append(a.GetTotalDegree())
        descriptors.append(a.GetTotalNumHs())
        descriptors.append(a.GetFormalCharge())
        descriptors.append(a.GetMass() - pt.GetAtomicWeight(a.GetSymbol()))
        if ring_info.NumAtomRings(i):
            descriptors.append(1)
        # Hash the values into a 32-bit integer with Python's built-in hash().
        invariants[i] = hash(tuple(descriptors)) & 0xffffffff
    return invariants

mol = Chem.MolFromSmiles('C')
print(generateECFPAtomInvariant(mol))
print(AllChem.GetConnectivityInvariants(mol))

# output:
# [2286409670]
# [2246733040]
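The generateECFPAtomInvariant() function does not generate the same identifier as AllChem.GetConnectivityInvariants(). This is expected: the function hashes the collected values with Python's built-in hash(), while RDKit computes its identifiers with its own hash function in C++, so the two integers differ even if the underlying atom invariants agree. Note that the discrete_charges parameter is not used in the function body as written, so adding the "discrete_charges=True" option would not change the result.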
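If you only need identifiers that are stable and consistent within your own project, any deterministic hash will do. The following is a minimal sketch, assuming all invariant values are integers. The helper name pack_invariants() and the choice of CRC-32 are illustrative assumptions, not part of RDKit, and the resulting numbers will not match GetConnectivityInvariants() either.

```python
import struct
import zlib

def pack_invariants(descriptors):
    """Hash a sequence of integer descriptors into a stable 32-bit identifier.

    Hypothetical helper: CRC-32 is used only because it is deterministic
    and available in the standard library.
    """
    # Serialize every value as a signed 64-bit little-endian integer.
    payload = struct.pack(f"<{len(descriptors)}q", *descriptors)
    return zlib.crc32(payload) & 0xFFFFFFFF

# Assumed descriptor order: atomic number, total degree, total hydrogens,
# formal charge, integer isotope offset.
neutral_carbon = (6, 4, 4, 0, 0)
charged_carbon = (6, 4, 4, 1, 0)   # same values, but formal charge +1

print(pack_invariants(neutral_carbon))
print(pack_invariants(charged_carbon))
```

Atoms with equal descriptor values always get the same identifier, and the list of identifiers can then be passed to GetMorganFingerprint() through the invariants option, one entry per atom.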