# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

QSARion is a chemical agent application that predicts molecular properties from a SMILES (Simplified Molecular Input Line Entry System) string. It is deployed as a Hugging Face Space, uses Gradio for the web interface, and builds on the smolagents framework for multi-step reasoning and tool use.

## Architecture

### Core Components

- **app.py**: Main application entry point containing agent configuration and tool definitions
- **Gradio_UI.py**: Custom Gradio interface components for streaming agent interactions
- **agent.json**: Agent configuration file defining tools, model settings, and prompt templates
- **prompts.yaml**: System prompt templates and instructions for the agent
- **tools/**: Directory containing custom tool implementations
  - **final_answer.py**: Tool for providing final answers
  - **web_search.py**: DuckDuckGo search functionality
  - **visit_webpage.py**: Web page content retrieval

### Key Dependencies

- **smolagents==1.9.2**: Core agent framework for multi-step reasoning (Feb 15, 2025)
- **jaqpot-python-sdk==6.1.0**: QSAR modeling and prediction capabilities
- **openinference-instrumentation-smolagents==0.1.6**: OpenTelemetry instrumentation compatible with the pinned smolagents version (Feb 18, 2025)
- **gradio**: Web interface framework (installed via smolagents)
- **pandas, rdkit, pubchempy**: Data science and cheminformatics (latest compatible versions)
- **opentelemetry-sdk, duckduckgo-search**: Supporting libraries (versions resolved by pip)

**Important**: Only the key packages (smolagents, jaqpot-python-sdk, and the smolagents instrumentation) are pinned for compatibility. All other packages use their latest versions to avoid dependency conflicts in HF Spaces.
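
To check the pinning policy above against a local environment, the sketch below compares installed versions with the three pinned packages using only the standard library (`importlib.metadata`). The `check_pins` helper and the `PINNED` table are hypothetical and not part of the repository; the pins are copied from the Key Dependencies list and would need updating whenever `requirements.txt` changes.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Hypothetical table: pins copied from the Key Dependencies list; all other packages float.
PINNED = {
    "smolagents": "1.9.2",
    "jaqpot-python-sdk": "6.1.0",
    "openinference-instrumentation-smolagents": "0.1.6",
}


def check_pins(pins: dict[str, str]) -> dict[str, tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for every pin that is not satisfied.

    `installed` is None when the package is missing from the environment.
    """
    mismatches: dict[str, tuple[str, Optional[str]]] = {}
    for package, expected in pins.items():
        try:
            installed: Optional[str] = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    for package, (expected, installed) in problems.items():
        print(f"{package}: expected {expected}, found {installed or 'not installed'}")
    if not problems:
        print("All pinned packages match")
```

Only exact equality is checked here, which matches the `==` pins; the unpinned packages are deliberately left out.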
## Deployment (Hugging Face Spaces)

This application runs on Hugging Face Spaces using the configuration in the YAML header of `README.md`:

- **Runtime**: Python 3.11
- **Framework**: Gradio 5.32.1
- **Entry point**: `app.py`

### Version Management

The key dependencies in `requirements.txt` are pinned to prevent compatibility issues in the HF Spaces environment:

- OpenTelemetry instrumentation requires a specific, compatible smolagents version
- RDKit versions can affect molecular property calculations, so results may shift slightly while RDKit is left unpinned
- Pinned versions keep behavior consistent across Space rebuilds

### Space Configuration

Required environment variables:

- `LANGFUSE_PUBLIC_KEY`: For OpenTelemetry tracing
- `LANGFUSE_SECRET_KEY`: For OpenTelemetry tracing

### Common Issues

1. **Space startup fails with OpenTelemetry errors:**
   - The application includes error handling to continue without tracing
   - Check the Space logs for "✓ OpenTelemetry instrumentation successfully enabled" or for warning messages
2. **Agent fails on simple molecules (e.g., ethane):**
   - PubChem API connectivity issues
   - QSAR model availability varies by compound type
   - The agent should automatically retry with different approaches

### Local Development

If forking for local development:

```bash
pip install -r requirements.txt
python app.py
```

Set the `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` environment variables to enable tracing.
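
The "continue without tracing" behavior described under Common Issues can be sketched as follows. This assumes Langfuse's OTLP endpoint with HTTP Basic authentication built from the two keys, the Langfuse Cloud EU host, and that the `opentelemetry-exporter-otlp` package is installed alongside `opentelemetry-sdk`. `langfuse_otlp_env` and `enable_tracing` are hypothetical helpers, not the code in `app.py`; the success message only mirrors the one to look for in the Space logs.

```python
import base64
import os
from collections.abc import Mapping
from typing import Optional

# Assumption: Langfuse Cloud (EU region); US-region or self-hosted projects use another host.
LANGFUSE_OTEL_ENDPOINT = "https://cloud.langfuse.com/api/public/otel"


def langfuse_otlp_env(env: Mapping[str, str]) -> Optional[dict[str, str]]:
    """Build OTLP exporter settings from the Langfuse keys, or None if either key is missing."""
    public_key = env.get("LANGFUSE_PUBLIC_KEY")
    secret_key = env.get("LANGFUSE_SECRET_KEY")
    if not public_key or not secret_key:
        return None
    # Basic auth token over "public:secret", passed to the exporter as an OTLP header.
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": LANGFUSE_OTEL_ENDPOINT,
        "OTEL_EXPORTER_OTLP_HEADERS": f"Authorization=Basic {token}",
    }


def enable_tracing() -> bool:
    """Try to instrument smolagents; return False instead of failing startup."""
    settings = langfuse_otlp_env(os.environ)
    if settings is None:
        print("Warning: Langfuse keys not set; continuing without tracing")
        return False
    os.environ.update(settings)  # the OTLP exporter reads these variables
    try:
        from openinference.instrumentation.smolagents import SmolagentsInstrumentor
        from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
        from opentelemetry.sdk.trace import TracerProvider
        from opentelemetry.sdk.trace.export import SimpleSpanProcessor

        provider = TracerProvider()
        provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter()))
        SmolagentsInstrumentor().instrument(tracer_provider=provider)
    except Exception as exc:  # any import or setup error leaves the app running untraced
        print(f"Warning: tracing disabled ({exc})")
        return False
    print("✓ OpenTelemetry instrumentation successfully enabled")
    return True
```

Keeping the key handling in a pure function makes the missing-key path easy to test without touching the network or the OpenTelemetry packages.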
### Key Configuration

- **Model**: Qwen/Qwen2.5-Coder-32B-Instruct from Hugging Face
- **Tools**: `web_search`, `visit_webpage`, `final_answer`, `qsartoolbox_get_property_value_tool`, `retrieve_smiles_tool`
- **Max Steps**: 6 (configurable in `agent.json`)
- **Temperature**: 0.5

## Architecture Details

### Agent Configuration

The agent is configured through `agent.json` with:

- **Planning System**: Multi-step fact gathering and plan generation
- **Tool Integration**: Custom tools for chemical property prediction and web research
- **Prompt Templates**: Structured templates for the different phases (planning, execution, managed agents)
- **Token Management**: Input/output token tracking for cost monitoring

### Chemical Tools Integration

The application integrates with multiple QSAR (Quantitative Structure-Activity Relationship) models:

- **QSAR Toolbox API**: 200+ pre-trained models for various molecular properties
- **PubChem Integration**: Automatic SMILES retrieval from compound names
- **Property Categories**: Toxicity, environmental fate, physicochemical properties, endocrine disruption

### Observability

Uses OpenTelemetry with Langfuse for:

- Agent step tracking
- Token usage monitoring
- Performance analytics
- Error tracking

## Common Patterns

### Tool Development

Custom tools should inherit from `smolagents.tools.Tool` and implement:

- `name`: Tool identifier
- `description`: Purpose and usage description
- `inputs`: Input parameter specifications (a dict mapping each `forward()` argument to its type and description)
- `output_type`: Expected return type
- `forward()`: Main execution method

### Agent Interaction Flow

1. **Task Input**: User provides a molecular property prediction request
2. **Planning**: Agent analyzes the requirements and creates a step-by-step plan
3. **Tool Execution**: Sequential tool calls (SMILES retrieval, property prediction, web search)
4. **Result Synthesis**: Combines tool outputs into a comprehensive answer
5. **Final Answer**: Structured response with predictions, confidence, and context

### Gradio Interface

The custom `GradioUI` class extends smolagents functionality with:

- Streaming message display
- Step-by-step execution visualization
- Token usage reporting
- Error handling and recovery

## Environment Variables

Required for tracing integration:

- `LANGFUSE_PUBLIC_KEY`: Langfuse public key
- `LANGFUSE_SECRET_KEY`: Langfuse secret key

## File Structure Significance

- `agent.json`: Central configuration for agent behavior, tools, and model settings
- `prompts.yaml`: Contains the core system prompt, which prioritizes QSAR Toolbox accuracy
- Tool files implement the actual chemical property prediction and web research capabilities
- Keeping configuration (`agent.json`, `prompts.yaml`) separate from tool code allows agent behavior to be modified without code changes
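
The settings listed under Key Configuration can be read back out of `agent.json` with the standard library. This sketch assumes the file follows the layout smolagents writes when saving an agent (top-level `tools` and `max_steps`, with model settings nested under `model` and then `data`); `summarize_agent_config` is a hypothetical helper, and the repository's actual file may differ.

```python
import json
from typing import Any


def summarize_agent_config(raw: str) -> dict[str, Any]:
    """Extract model, sampling and step settings from an agent.json document.

    Assumes the smolagents save layout: top-level "tools" and "max_steps",
    with model settings nested under "model" -> "data".
    """
    config = json.loads(raw)
    # Tolerate a missing or null "model" / "data" section.
    model_data = (config.get("model") or {}).get("data") or {}
    return {
        "model_id": model_data.get("model_id"),
        "temperature": model_data.get("temperature"),
        "max_steps": config.get("max_steps"),
        "tools": list(config.get("tools") or []),
    }


if __name__ == "__main__":
    # Hypothetical excerpt mirroring the values listed under Key Configuration.
    example = json.dumps(
        {
            "tools": ["web_search", "visit_webpage", "final_answer"],
            "model": {
                "class": "HfApiModel",
                "data": {"model_id": "Qwen/Qwen2.5-Coder-32B-Instruct", "temperature": 0.5},
            },
            "max_steps": 6,
        }
    )
    print(summarize_agent_config(example))
```

To inspect the real file, pass `Path("agent.json").read_text()` (from `pathlib`) instead of the example string.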
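
A minimal tool following the Tool Development pattern under Common Patterns might look like this. `PropertyLookupTool` and its injected `predict` callable are hypothetical stand-ins for a QSAR backend, not how `qsartoolbox_get_property_value_tool` is implemented. The `try`/`except ImportError` fallback only exists so the sketch can run without smolagents installed. With smolagents present, the keys of `inputs` must match the parameters of `forward()`, and `output_type` must be one of the type names smolagents accepts, such as `"string"`.

```python
from typing import Callable

try:
    from smolagents import Tool
except ImportError:  # fallback so the sketch runs without smolagents installed
    Tool = object  # type: ignore[misc,assignment]


class PropertyLookupTool(Tool):
    """Hypothetical tool wrapping a QSAR prediction backend behind the Tool interface."""

    name = "property_lookup"
    description = (
        "Predicts a molecular property for a SMILES string using a QSAR model. "
        "Returns the predicted value as text."
    )
    inputs = {
        "smiles": {"type": "string", "description": "SMILES string of the molecule"},
        "property_name": {"type": "string", "description": "Name of the property to predict"},
    }
    output_type = "string"

    def __init__(self, predict: Callable[[str, str], float]):
        super().__init__()
        self._predict = predict  # injected backend (assumption), e.g. a QSAR client wrapper

    def forward(self, smiles: str, property_name: str) -> str:
        smiles = smiles.strip()
        if not smiles:
            return "Error: empty SMILES string"
        try:
            value = self._predict(smiles, property_name)
        except Exception as exc:
            # Design choice for this sketch: report the failure as text instead of raising.
            return f"Error predicting {property_name} for {smiles}: {exc}"
        return f"{property_name} for {smiles}: {value}"
```

Injecting `predict` keeps the tool testable with a fake backend, for example `PropertyLookupTool(lambda smiles, prop: 1.5).forward("CCO", "LogP")`.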