Cheminformatics Tutorials - Herong's Tutorial Examples

https://www.herongyang.com/Cheminformatics

Copyright © 2019-2024 Herong Yang. All rights reserved.

This book is a collection of notes and tutorial examples written by the author while he was learning cheminformatics and related tools. Topics include: SMILES (Simplified Molecular-Input Line-Entry System) specifications; the Open Babel chemical toolbox for file format conversion; fingerprint index files used by Open Babel for fast search; RDKit for cheminformatics and machine learning; substructure search and decomposition with RDKit; RDKit performance on large molecule datasets; molecular fingerprint generation methods; and AlphaFold, an AI system that predicts a protein’s 3D structure. Updated in 2024 (Version v2.03) with minor updates.

Table of Contents

About This Book

SMILES (Simplified Molecular-Input Line-Entry System)

What Is SMILES

What Is Canonical SMILES

Atom Representations in SMILES

Bond Representations in SMILES

Branch Representations in SMILES

Ring Representations in SMILES

Disconnected Structures in SMILES

Charge Representations in SMILES

Isotope Representations in SMILES

Directional Bonds in SMILES

Tetrahedral Centers in SMILES

Chirality Representations in SMILES

Hydrogen Representations in SMILES

Open Babel: The Open Source Chemistry Toolbox

What Is Open Babel

Install Open Babel with Anaconda

Install Open Babel on Windows Computers

Run Open Babel GUI on Windows Computers

Change Display Command on Open Babel GUI

Open Babel Installation Options on Linux

Install Open Babel Binary Package on CentOS

"Open Babel Error in LoadAllPlugins" Error

Install Open Babel from Source Code

Install Open Babel 2.4.1 from Source Code

Open Babel Installation Options on macOS

Install Open Babel Binary Package on macOS

Using Open Babel Command: "obabel"

What Is "obabel" Command

"obabel -i ..." - Input Data Format and Source

"obabel -o ... -O" - Output Data Format and Destination

"obabel -... --..." - Generic Conversion Options

"obabel" Command Option Argument Syntax

"obabel ... --gen2D" - Calculated 2D Coordinates

"obabel ... -f # -l #" - Split Large Molecule File

"obabel -h/-d" - Add/Remove Hydrogens in Molecule Data

"obabel --append ..." - Calculate Molecule Properties

"obabel -L formats" - List of File Formats Supported

"obabel -a..." - Extra Options for Input Reading

"obabel -x..." - Extra Options for Output Writing

"obabel" vs. "babel" Open Babel Commands

Generating SVG Pictures with Open Babel

"obabel -o svg" - Molecule Picture in SVG

"obabel -:... -o svg" - Generate SVG from SMILES

"obabel ... -o svg -xi" - Show Atom Indices in SVG

"obabel ... -o svg -xS" - Ball/Stick Depiction in SVG

"obabel ... -o svg -xX" - Hide Implicit H in SVG

"obabel ... -o svg -xC" - Hide Terminal C in SVG

"obabel ... -o svg -xP300" - Control Image Size

"obabel ... -o svg" - Two "svg" XML Tag Levels

"obabel ... -o svg -xd" - Hide Molecule Name

"babel ... -o svg -xd -xP300" - Open Babel 2.4 Bug

Scale SVG Images using "viewBox" Attribute

Substructure Search with Open Babel

"obabel -s ..." Command - Substructure Search

Substructure Search with Wildcard Atom "*"

Substructure Search with Wildcard Bond "~"

Substructure Search with SMARTS Expressions

Similarity Search with Open Babel

Fingerprint Index for Fastsearch with Open Babel

Stereochemistry with Open Babel

What Is Stereochemistry

Read Stereoinformation from Input with Open Babel

Stereo Perception Performed by Open Babel

Write Stereoinformation to Output by Open Babel

Wedge-Hash Bond Changed by Open Babel

Hash Bond with Solid Line by Open Babel

Hash over Double Bond by Open Babel

Command Line Tools Provided by Open Babel

List of Open Babel Command Line Tools

"obchiral" - Print Chirality Information

"obconformer" - Generate Best Conformer

"obenergy" - Calculate Molecule Energy

"obfit" - Superimpose Two Molecules

"obgen" - Generate Molecule 3D Structures

"obgrep" - Search Molecules using SMARTS

"obminimize" - Optimize Geometry/Energy of Molecule

"obprobe" - Create Electrostatic Probe Grid

"obrotamer" - Generate Random Rotational Isomers

"obrotate" - Rotate Dihedral Angles with SMARTS

RDKit: Open-Source Cheminformatics Software

What Is RDKit

RDKit Installation Options

Install RDKit in an Anaconda Environment

Install RDKit Binary Package for CentOS

Build RDKit from Source Code on CentOS System

Compile, Link and Run RDKit C++ API Examples

Try Python API with RDKit Native Code

rdkit.Chem.rdchem - The Core Module

What Is rdkit.Chem.rdchem Module

rdkit.Chem.rdchem.Mol - The Molecule Class

rdkit.Chem.rdchem.Atom - The Atom Class

rdkit.Chem.rdchem.Bond - The Bond Class

rdkit.Chem.rdchem.RWMol - The RWMol Class

rdkit.Chem.rdmolfiles - Molecular File Module

What Is rdkit.Chem.rdmolfiles Module

MolFromSmiles/MolToSmiles for SMILES Format

MolFromMolBlock/MolToMolBlock for Mol Block

SmilesMolSupplier/SDWriter for SMILES Files

SDMolSupplier/SDWriter for SDF Files

rdkit.Chem.rdDepictor - Compute 2D Coordinates

What Is rdkit.Chem.rdDepictor Module

rdkit.Chem.Draw - Handle Molecule Images

What Is rdkit.Chem.Draw Module

MolToImage/MolToFile - Molecule PNG Image

rdkit.Chem.Draw.MolDrawing.DrawingOptions Class

rdkit.Chem.Draw.rdMolDraw2D.MolDraw2DCairo - 2D Molecule Drawing

rdkit.Chem.Draw.rdMolDraw2D.MolDraw2DCairo - Molecule PNG Image

rdkit.Chem.Draw.rdMolDraw2D.MolDraw2DSVG - Molecule SVG Image

rdkit.Chem.Draw.rdMolDraw2D.MolDrawOptions - Drawing Options

Drawing Diagrams with MolDraw2DCairo and MolDraw2DSVG

Molecule Substructure Search with RDKit

RDKit m.HasSubstructMatch(s) - Substructure Match

RDKit GenerateDepictionMatching2DStructure(m, s) - Substructure Orientation

RDKit rdMolDraw2D.PrepareAndDrawMolecule - Substructure Highlight

RDKit Substructure Search with SMARTS

rdkit.Chem.rdFMCS - Maximum Common Substructure

rdkit.Chem.rdSubstructLibrary - Substructure Library

Substructure Library in Binary and SMILES Formats

rdkit.Chem.rdmolops - Molecule Operations

What Is rdkit.Chem.rdmolops Module

Molecule Similarity Based on Fingerprints with RDKit

Molecule Core and Sidechains Decomposition with RDKit

R-Group Decomposition with RDKit

Daylight Fingerprint Generator in RDKit

What Is Daylight Fingerprint Generator in RDKit

RDKFingerprint() Method in RDKit

Impact of 'useBondOrder' on RDKFingerprint()

Impact of 'branchedPaths' on RDKFingerprint()

Impact of 'maxPath' on RDKFingerprint()

Impact of 'fpSize' on RDKFingerprint()

Impact of 'tgtDensity' on RDKFingerprint()

Impact of 'nBitsPerHash' on RDKFingerprint()

UnfoldedRDKFingerprintCountBased() Method in RDKit

GetRDKitFPGenerator() Method in RDKit

Morgan Fingerprint Generator in RDKit

What Is Morgan Fingerprint Generator in RDKit

GetMorganFingerprint() Method in RDKit

Impact of 'radius' on GetMorganFingerprint()

Impact of 'useCounts' on GetMorganFingerprint()

Impact of 'invariants' on GetMorganFingerprint()

Impact of 'useBondTypes' on GetMorganFingerprint()

Impact of 'fromAtoms' on GetMorganFingerprint()

GetMorganFingerprintAsBitVect() Method in RDKit

Impact of 'nBits' on GetMorganFingerprintAsBitVect()

GetHashedMorganFingerprint() Method in RDKit

Impact of 'nBits' on GetHashedMorganFingerprint()

GetMorganGenerator() Method in RDKit

Morgan Fingerprint Generator in RDKit for FCFP

RDKit Performance on Substructure Search

Introduction to Molecular Fingerprints

OCSR (Optical Chemical Structure Recognition)

StoneMIND Collector - Information Extraction System

Install StoneMIND Collector Client on Windows

Use StoneMIND Collector on Windows

Stop StoneMIND Collector on Windows

Use StoneMIND Collector Web Interface

AlphaFold - Protein Structure Prediction

What Is AlphaFold

Open Source Code for AlphaFold

Download AlphaFold Package and Databases

Resources and Tools

Cheminformatics Related Terminologies

References

Full Version in PDF/EPUB

Keywords: Cheminformatics, Molecule, DNA, Gene, BioTech