Molecule Module

The enhanced Molecule module extends Rxn-INSIGHT with PubChem lookups, fingerprint-based similarity calculations, functional group and ring detection, and reaction search.

Features

  • PubChem Integration: Retrieve chemical names, descriptions, and identifiers

  • Similarity Calculation: Compare molecules using Morgan fingerprints and Tanimoto similarity

  • Functional Group Analysis: Identify chemical functional groups

  • Ring System Detection: Extract ring structures from molecules

  • Reaction Search: Find reactions that produce the molecule, or whose products have a similar scaffold

Installation

The enhanced Molecule module requires additional dependencies:

pip install requests tqdm
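
To confirm that both packages are importable before relying on the PubChem features, a quick check along the following lines may help. This is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and not part of Rxn-INSIGHT.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["requests", "tqdm"])
    if missing:
        print(f"Missing packages: {', '.join(missing)}")
    else:
        print("All additional dependencies are available.")
```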

Basic Usage

Creating a Molecule and Analyzing Properties

import rxn_insight as ri

# Create a molecule from SMILES
mol = ri.Molecule("c1ccc(C(=O)O)cc1")  # Benzoic acid

# Get basic properties
print(f"SMILES: {mol.smiles}")
print(f"InChI: {mol.inchi}")
print(f"InChIKey: {mol.inchikey}")

# Identify functional groups
functional_groups = mol.get_functional_groups()
print(f"Functional groups: {functional_groups}")

# Find ring systems
rings = mol.get_rings()
print(f"Ring systems: {rings}")

# Get molecular scaffold
print(f"Scaffold: {mol.scaffold}")

PubChem Integration

Retrieve chemical information from PubChem:

# Create a molecule with PubChem information
aspirin = ri.Molecule("CC(=O)OC1=CC=CC=C1C(=O)O", allow_pubchem=True)

# Access PubChem information
print(f"IUPAC Name: {aspirin.iupac_name}")
print(f"Common Name: {aspirin.trivial_name}")
print(f"PubChem CID: {aspirin.cid}")

# Description may contain information about the molecule's uses and properties
if aspirin.description:
    print(f"Description excerpt: {aspirin.description[:100]}...")

Calculating Molecular Similarity

Compare molecules using fingerprint-based similarity:

# Create reference molecule
benzene = ri.Molecule("c1ccccc1")

# Compare with other molecules
similarities = {
    "Toluene": benzene.calculate_similarity("Cc1ccccc1"),
    "Phenol": benzene.calculate_similarity("Oc1ccccc1"),
    "Naphthalene": benzene.calculate_similarity("c1ccc2ccccc2c1"),
    "Cyclohexane": benzene.calculate_similarity("C1CCCCC1")
}

# Print results
for name, similarity in similarities.items():
    print(f"Similarity to {name}: {similarity:.3f}")

Searching for Reactions

Find reactions that produce the molecule:

import pandas as pd
from rxn_insight.molecule import Molecule

# Load a reaction database (placeholder path; point this at your own Parquet file)
df_rxns = pd.read_parquet("your_reaction_database.gzip")

# Create a molecule
biphenyl = Molecule("c1ccc(-c2ccccc2)cc1")

# Find reactions that produce this molecule
reactions = biphenyl.search_reactions(df_rxns)

# Print the reactions
if reactions is not None and len(reactions) > 0:
    print(f"Found {len(reactions)} reactions producing this molecule:")
    for i, (_, row) in enumerate(reactions.head(3).iterrows(), start=1):
        print(f"\nReaction {i}:")
        print(f"SMILES: {row['REACTION']}")
        print(f"Class: {row['CLASS']}")
        print(f"Conditions: {row['SOLVENT']}, {row['CATALYST']}, {row['REAGENT']}")
else:
    print("No reactions found for this molecule.")

Finding Similar Reactions by Scaffold

Search for reactions with products having similar scaffolds:

# Find reactions with similar scaffolds
# (reuses biphenyl and df_rxns from the previous example)
similar_rxns = biphenyl.search_reactions_by_scaffold(
    df_rxns,
    threshold=0.6,
    max_return=10,
    fp="Morgan"
)

# Print the similar reactions
if similar_rxns is not None and len(similar_rxns) > 0:
    print(f"Found {len(similar_rxns)} reactions with similar scaffolds:")
    for i, (_, row) in enumerate(similar_rxns.head(3).iterrows(), start=1):
        print(f"\nSimilar reaction {i} (Similarity: {row['SIMILARITY']:.2f}):")
        # Take the text after the last '>' so reactions written with agents (A>B>C) also work
        print(f"Product: {row['REACTION'].split('>')[-1]}")
        print(f"Reaction: {row['REACTION']}")
        print(f"Class: {row['CLASS']}")
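
When more than the product side is needed, the reaction SMILES can be split into its three parts. The sketch below assumes the standard reactants>agents>products layout with '.' between components; the helper name split_reaction and the example reaction string are hand-written for illustration and are not part of Rxn-INSIGHT. Note that splitting on '.' also separates the ions of a salt into individual components.

```python
def split_reaction(rxn_smiles):
    """Split a reaction SMILES into (reactants, agents, products) lists of SMILES."""
    parts = rxn_smiles.split(">")
    if len(parts) != 3:
        raise ValueError(f"Expected 'reactants>agents>products', got: {rxn_smiles!r}")
    # Components within each part are separated by '.'; an empty part yields an empty list.
    return tuple(part.split(".") if part else [] for part in parts)


# Illustrative reaction: bromobenzene + phenylboronic acid giving biphenyl.
reactants, agents, products = split_reaction(
    "Brc1ccccc1.OB(O)c1ccccc1>>c1ccc(-c2ccccc2)cc1"
)
print(f"Reactants: {reactants}")
print(f"Agents: {agents}")
print(f"Products: {products}")
```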

API Reference

class Molecule(smi, allow_pubchem=False)

A class to handle and analyze molecular structures.

Parameters:
  • smi (str) – SMILES string of the molecule

  • allow_pubchem (bool) – Whether to fetch PubChem information (default: False)

mol

RDKit molecule object

smiles

SMILES representation of the molecule

inchi

InChI identifier of the molecule

inchikey

InChIKey identifier of the molecule

trivial_name

Common name from PubChem (if available)

iupac_name

IUPAC name from PubChem (if available)

description

Description from PubChem (if available)

cid

PubChem compound ID (if available)

functional_groups

List of functional groups in the molecule

rings

Ring structures in the molecule

scaffold

Murcko scaffold of the molecule

maccs_fp

MACCS fingerprint

morgan_fp

Morgan fingerprint

reactions

DataFrame of reactions involving this molecule

get_pubchem_information()

Retrieves information about the molecule from PubChem’s REST API.
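
Which endpoints the method calls is not documented here. As an illustration of what a PubChem REST lookup looks like, the sketch below builds a PUG REST property URL from an InChIKey and parses a response of the shape that service returns. The helper names property_url and parse_properties are hypothetical, the sample payload is hand-written rather than fetched, and the module's actual requests may differ.

```python
import json
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(inchikey, properties=("IUPACName",)):
    """Build a PUG REST URL requesting compound properties for an InChIKey."""
    encoded = quote(inchikey, safe="")
    return f"{PUG_REST}/compound/inchikey/{encoded}/property/{','.join(properties)}/JSON"


def parse_properties(payload):
    """Return the first property record in a PUG REST JSON response, or None."""
    records = json.loads(payload).get("PropertyTable", {}).get("Properties", [])
    return records[0] if records else None


# Hand-written sample in the shape of a property response (aspirin, CID 2244).
sample = (
    '{"PropertyTable": {"Properties": '
    '[{"CID": 2244, "IUPACName": "2-acetyloxybenzoic acid"}]}}'
)

print(property_url("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
record = parse_properties(sample)
print(record["CID"], record["IUPACName"])
```

Returning None when no record is present mirrors the "if available" wording used for the PubChem attributes above, so callers can test for a missing result before using it.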

calculate_similarity(smi)

Calculates the Tanimoto similarity between this molecule and another one, based on their Morgan fingerprints.

Parameters:

smi (str) – SMILES string of the molecule to compare with

Returns:

Tanimoto similarity value between 0 and 1, where 1 means the two fingerprints are identical

Return type:

float
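
For reference, the Tanimoto coefficient of two fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below computes it on plain Python sets of on-bit indices. The function name tanimoto and the toy bit sets are illustrative only; they are not computed from real molecules and this is not the library's implementation.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient |A & B| / |A | B| for two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Two empty fingerprints: this sketch returns 0.0 to avoid dividing by zero.
        return 0.0
    return len(a & b) / len(union)


# Toy on-bit sets standing in for real fingerprints.
fp_query = {1, 4, 7, 9}
fp_other = {1, 4, 8}

# 2 shared bits out of 5 distinct bits
print(f"{tanimoto(fp_query, fp_other):.3f}")
```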

search_reactions(df)

Searches for reactions involving the molecule as a product.

Parameters:

df (pandas.DataFrame) – The DataFrame to search for reactions

Returns:

DataFrame containing matching reactions

Return type:

pandas.DataFrame

search_reactions_by_scaffold(df, threshold=0.5, max_return=100, fp='MACCS')

Searches for reactions whose products have scaffolds similar to this molecule’s scaffold.

Parameters:
  • df (pandas.DataFrame) – DataFrame containing reactions to search

  • threshold (float) – Similarity threshold to apply (0-1; default: 0.5)

  • max_return (int) – Maximum number of reactions to return (default: 100)

  • fp (str) – Type of fingerprint to use, 'MACCS' or 'Morgan' (default: 'MACCS')

Returns:

DataFrame of similar reactions, sorted by similarity

Return type:

pandas.DataFrame
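
The way threshold and max_return interact can be sketched on precomputed scores: filter by the threshold, sort by similarity in descending order, then truncate. The function name rank_by_similarity, the reaction labels, and the scores are made up for illustration. Treating the threshold as inclusive is an assumption of this sketch, not documented behavior.

```python
def rank_by_similarity(scored, threshold=0.5, max_return=100):
    """Keep (item, similarity) pairs at or above threshold, best first, capped at max_return."""
    kept = [(item, sim) for item, sim in scored if sim >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_return]


# Made-up similarity scores for four hypothetical reactions.
scored = [("rxn_a", 0.42), ("rxn_b", 0.91), ("rxn_c", 0.60), ("rxn_d", 0.75)]

for name, sim in rank_by_similarity(scored, threshold=0.6, max_return=2):
    print(f"{name}: {sim:.2f}")
```

Filtering before sorting keeps the sort small when most candidates fall below the threshold.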

get_functional_groups(df=None)

Identifies and returns the functional groups present in the molecule.

Parameters:

df (pandas.DataFrame) – DataFrame containing functional group patterns (default: None)

Returns:

List of functional group names found in the molecule

Return type:

list[str]

get_rings()

Identifies and returns rings in the molecule.

Returns:

List of ring SMILES strings found in the molecule

Return type:

list[str]