---
language:
- en
license: gemma
library_name: transformers
tags:
- function-calling
- agent-routing
- multi-agent
- lora
- peft
- gemma
- functiongemma
- customer-support
- e-commerce
base_model: google/functiongemma-270m-it
datasets:
- scionoftech/functiongemma-e-commerce-dataset
model-index:
- name: functiongemma-270m-ecommerce-router
  results:
  - task:
      type: text-classification
      name: Agent Routing
    dataset:
      name: E-commerce Customer Support Routing
      type: scionoftech/ecommerce-agent-routing
    metrics:
    - type: accuracy
      value: 89.4
      name: Routing Accuracy
    - type: f1
      value: 89.0
      name: Macro F1 Score
---

	# FunctionGemma 270M - E-Commerce Multi-Agent Router

	Fine-tuned version of [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it) for intelligent routing of customer queries across 7 specialized agents in e-commerce customer support systems.

	## Model Description

This model shows how FunctionGemma can be adapted beyond mobile actions to multi-agent orchestration in enterprise systems. It routes natural-language customer queries to the appropriate specialized agent with 89.4% accuracy on the test set.

Key Achievement: Replaces brittle keyword matching (52-58% accuracy) and rule-based routing (65-70%) with a learned router that trains only 1.47M parameters (0.55% of the model).

	### Architecture

	- Base Model: google/functiongemma-270m-it (270M parameters)
	- Fine-tuning Method: LoRA (Low-Rank Adaptation)
	- Trainable Parameters: 1,474,560 (0.55%)
	- LoRA Rank: 16
	- LoRA Alpha: 32
	- Target Modules: q_proj, k_proj, v_proj, o_proj
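The trainable-parameter figure can be checked by hand. LoRA adds two low-rank matrices, A (r × d_in) and B (d_out × r), to each targeted projection, so each module contributes r·(d_in + d_out) parameters. The sketch below assumes the Gemma 3 270M attention shapes (hidden size 640, 18 decoder layers, 4 query heads, 1 key/value head, head dimension 256); these values are assumptions, so verify them against the base model's `config.json`. It also computes the LoRA scaling factor lora_alpha / r.

```python
# Assumed Gemma 3 270M attention dimensions (verify against the base model's config.json).
HIDDEN_SIZE = 640
NUM_LAYERS = 18
NUM_Q_HEADS = 4
NUM_KV_HEADS = 1
HEAD_DIM = 256

# LoRA hyperparameters from this card.
RANK = 16
ALPHA = 32


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


q_dim = NUM_Q_HEADS * HEAD_DIM    # 1024
kv_dim = NUM_KV_HEADS * HEAD_DIM  # 256

# (input, output) sizes of each targeted projection in one decoder layer.
module_shapes = {
    "q_proj": (HIDDEN_SIZE, q_dim),
    "k_proj": (HIDDEN_SIZE, kv_dim),
    "v_proj": (HIDDEN_SIZE, kv_dim),
    "o_proj": (q_dim, HIDDEN_SIZE),
}

per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in module_shapes.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # LoRA multiplies the low-rank update by alpha / r

print(f"per layer: {per_layer:,}")                # per layer: 81,920
print(f"total: {total:,}")                        # total: 1,474,560
print(f"fraction of 270M: {total / 270e6:.2%}")   # fraction of 270M: 0.55%
print(f"scaling: {scaling}")                      # scaling: 2.0
```

Under these assumed shapes the count matches the 1,474,560 reported above, and the update is scaled by 32 / 16 = 2.0.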

	### Training Details

	- Training Data: 12,550 synthetic customer queries (balanced across 7 agents)
	- Training Time: 45 minutes on Google Colab T4 GPU
	- Framework: Hugging Face Transformers + PEFT + TRL
	- Quantization: 4-bit NF4 during training
	- Optimizer: paged_adamw_8bit
	- Learning Rate: 2e-4
	- Epochs: 3
	- Batch Size: 4 (effective 16 with gradient accumulation)

	## Intended Use

	### Primary Use Case
Multi-agent customer support routing for e-commerce platforms:
- Route queries to the order management, product search, product details, returns, payment, account, and technical support agents
- Support multi-turn interactions, with conversation state managed outside the model
- Switch to a different agent when the customer's request changes mid-conversation

	### Supported Agents

	The model routes queries to 7 specialized agents:

	1. Order Management (`route_to_order_agent`) - Track orders, update delivery, cancel orders
	2. Product Search (`route_to_search_agent`) - Search catalog, check availability, recommendations
	3. Product Details (`route_to_details_agent`) - Specifications, reviews, comparisons
	4. Returns & Refunds (`route_to_returns_agent`) - Initiate returns, process refunds, exchanges
	5. Account Management (`route_to_account_agent`) - Update profile, manage addresses, security
	6. Payment Support (`route_to_payment_agent`) - Resolve payment issues, update methods, billing
	7. Technical Support (`route_to_technical_agent`) - Fix app/website issues, login problems
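In an application, the function names above have to be mapped to agent handlers. The sketch below shows one way to do that, assuming the agent labels used in the test-set report (`order_management`, `product_search`, ...) serve as internal identifiers. `AGENT_FOR_FUNCTION`, `dispatch`, and the placeholder handlers are hypothetical names, not part of this repository or the funcroute package.

```python
from typing import Callable, Dict

# Function names emitted by the router -> agent labels used in the evaluation report.
AGENT_FOR_FUNCTION: Dict[str, str] = {
    "route_to_order_agent": "order_management",
    "route_to_search_agent": "product_search",
    "route_to_details_agent": "product_details",
    "route_to_returns_agent": "returns_refunds",
    "route_to_account_agent": "account_management",
    "route_to_payment_agent": "payment_support",
    "route_to_technical_agent": "technical_support",
}


def dispatch(
    function_name: str,
    query: str,
    handlers: Dict[str, Callable[[str], str]],
    fallback: Callable[[str], str],
) -> str:
    """Call the handler for the routed agent, or the fallback for unknown names."""
    agent = AGENT_FOR_FUNCTION.get(function_name)
    handler = handlers.get(agent) if agent is not None else None
    return handler(query) if handler is not None else fallback(query)


def make_placeholder(label: str) -> Callable[[str], str]:
    """Hypothetical stand-in for a real agent; it just echoes the query."""
    return lambda q: f"{label} handled: {q}"


def escalate_to_human(query: str) -> str:
    """Hypothetical fallback for unrecognized routing output."""
    return f"escalated to human: {query}"


handlers = {label: make_placeholder(label) for label in AGENT_FOR_FUNCTION.values()}

print(dispatch("route_to_order_agent", "Where is my order?", handlers, escalate_to_human))
# order_management handled: Where is my order?
print(dispatch("unknown", "I need help", handlers, escalate_to_human))
# escalated to human: I need help
```

Keeping the mapping in one table means an unrecognized name (such as the `"unknown"` returned by the parser further down) always reaches the fallback path instead of raising.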

	### Out-of-Scope Use

	- ❌ General-purpose chatbot (use base Gemma models instead)
	- ❌ Direct dialogue generation (this is a routing model)
	- ❌ More than 20 agents (context window limitations)
	- ❌ Non-customer-support domains without fine-tuning

	## Performance

	### Test Set Results

	```
Overall Accuracy: 89.40% (1,684/1,883 correct)

Per-Agent Performance:
order_management     92.3%  (251/272)
product_search       91.1%  (257/282)
product_details      94.7%  (233/246)
returns_refunds      88.2%  (238/270)
account_management   85.1%  (229/269)
payment_support      89.5%  (241/269)
technical_support    87.0%  (234/269)
	```
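The per-agent figures above are per-class accuracy (recall), and the model-index also reports macro F1. The sketch below shows how these numbers, and the most frequent misrouting pairs listed under Known Limitations, could be computed from aligned lists of gold and predicted labels in plain Python. The helper names are hypothetical, and the toy labels in the usage lines are invented for illustration, not taken from the test set.

```python
from collections import Counter
from typing import Dict, List, Tuple


def routing_metrics(gold: List[str], pred: List[str]) -> Tuple[float, Dict[str, float], float]:
    """Return overall accuracy, per-class accuracy (recall), and macro F1 over gold labels."""
    if not gold or len(gold) != len(pred):
        raise ValueError("need non-empty, aligned label lists")
    pairs = Counter(zip(gold, pred))  # (gold, predicted) -> count

    accuracy = sum(c for (g, p), c in pairs.items() if g == p) / len(gold)

    per_class: Dict[str, float] = {}
    f1_scores: List[float] = []
    for label in sorted(set(gold)):
        tp = pairs[(label, label)]
        support = sum(c for (g, _), c in pairs.items() if g == label)
        predicted = sum(c for (_, p), c in pairs.items() if p == label)
        recall = tp / support
        precision = tp / predicted if predicted else 0.0
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        per_class[label] = recall
        f1_scores.append(f1)

    return accuracy, per_class, sum(f1_scores) / len(f1_scores)


def top_confusions(gold: List[str], pred: List[str], k: int = 3):
    """Most frequent (gold, predicted) pairs among misrouted queries, counted per direction."""
    errors = Counter((g, p) for g, p in zip(gold, pred) if g != p)
    return errors.most_common(k)


# Toy example (invented labels, not test-set data).
gold = ["order_management", "order_management", "returns_refunds", "returns_refunds"]
pred = ["order_management", "returns_refunds", "returns_refunds", "returns_refunds"]

acc, per_class, macro_f1 = routing_metrics(gold, pred)
print(acc)                   # 0.75
print(per_class)             # {'order_management': 0.5, 'returns_refunds': 1.0}
print(round(macro_f1, 4))    # 0.7333
print(top_confusions(gold, pred, 1))
# [(('order_management', 'returns_refunds'), 1)]
```

Predictions outside the gold label set (for example `"unknown"`) count as errors but do not form their own class in the macro average.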

	### Comparison to Baselines

| Approach | Accuracy | Latency | Memory |
|----------|----------|---------|--------|
| Keyword Matching | 52-58% | 5 ms | Negligible |
| Rule-based (100 rules) | 65-70% | 8 ms | Negligible |
| BERT Classifier (300M) | 82-85% | 45 ms | 400 MB |
| This Model (LoRA) | 89.4% | 127 ms | 2.1 GB |
| GPT-4 API (zero-shot) | 85-90% | 2500 ms | N/A (cloud API) |

	### Latency Breakdown (T4 GPU)

	- Routing Decision: 127ms average
	- Agent Execution: ~52ms average
	- Total End-to-End: ~179ms average
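Average latencies like these depend on how they are measured (warm-up runs, batch size, whether tokenization and decoding are included). A minimal timing harness is sketched below, assuming a `route` callable that takes one query string and returns only after the result is ready; `measure_latency_ms` is a hypothetical helper, and the lambda in the usage line is a stand-in for a function wrapping the Quick Start's `model.generate` call. For GPU inference, make sure the callable returns only after GPU work has finished (for example, by decoding the output on the CPU) so the timer does not stop early.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def measure_latency_ms(
    route: Callable[[str], object],
    queries: Sequence[str],
    warmup: int = 3,
) -> Dict[str, float]:
    """Time route(query) for each query after a few warm-up calls; report milliseconds."""
    if not queries:
        raise ValueError("queries must be non-empty")
    for q in list(queries)[:warmup]:
        route(q)  # warm-up: keep one-time costs (CUDA init, caches) out of the timings

    timings = []
    for q in queries:
        start = time.perf_counter()
        route(q)
        timings.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.mean(timings),
        "p50_ms": statistics.median(timings),
        "max_ms": max(timings),
    }


# Usage with a stand-in router; replace the lambda with a real routing function.
stats = measure_latency_ms(lambda q: q.lower(), ["Where is my order?"] * 20)
print(sorted(stats))  # ['max_ms', 'mean_ms', 'p50_ms']
```

Reporting the median alongside the mean helps show whether a few slow calls are inflating the average.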

	## How to Use

	### Installation

	```bash
	pip install transformers peft torch accelerate bitsandbytes
	```

	### Quick Start

	```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "google/functiongemma-270m-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Load LoRA adapters
model = PeftModel.from_pretrained(
    base_model,
    "scionoftech/functiongemma-270m-ecommerce-router",
)

tokenizer = AutoTokenizer.from_pretrained("google/functiongemma-270m-it")

# Define available agents
agent_declarations = """<start_function_declaration>
route_to_order_agent(): Track, update, or cancel customer orders
route_to_search_agent(): Search products, check availability
route_to_details_agent(): Get product specifications and reviews
route_to_returns_agent(): Handle returns, refunds, exchanges
route_to_account_agent(): Manage user profile and settings
route_to_payment_agent(): Resolve payment and billing issues
route_to_technical_agent(): Fix app, website, login issues
<end_function_declaration>"""

# Route a query
query = "Where is my order?"

prompt = f"""<start_of_turn>user
{agent_declarations}

User query: {query}<end_of_turn>
<start_of_turn>model
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=30,
        do_sample=False,  # greedy decoding for deterministic routing
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens; keep special tokens so the call tags survive
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False)
print(response)
# Output: <function_call>route_to_order_agent</function_call>
	```

	### Production Deployment (4-bit Quantization)

	```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization config (requires the bitsandbytes package)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the base model with quantization
base_model = AutoModelForCausalLM.from_pretrained(
    "google/functiongemma-270m-it",
    quantization_config=quant_config,
    device_map="auto",
)

# Attach the LoRA adapters on top of the quantized base model
model = PeftModel.from_pretrained(
    base_model,
    "scionoftech/functiongemma-270m-ecommerce-router",
)
tokenizer = AutoTokenizer.from_pretrained("google/functiongemma-270m-it")

# Result: 180 MB model, 132ms latency, 89.1% accuracy
	```

	### Parsing Function Calls

	```python
import re

def extract_agent_function(response: str) -> str:
    """Extract the function name from FunctionGemma output, or "unknown" if none is found."""
    match = re.search(r"<function_call>([a-zA-Z_]+)</function_call>", response)
    return match.group(1) if match else "unknown"

# Usage (`response` comes from the Quick Start example)
agent = extract_agent_function(response)
print(f"Route to: {agent}")
# Output: Route to: route_to_order_agent
	```
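The regex above accepts any identifier inside the tags, including a name that was never declared. A stricter variant, sketched here as an assumption rather than the repository's parser, checks the extracted name against the seven declared functions and returns `None` otherwise, so the caller can ask a clarifying question or hand off to a human. It also tolerates whitespace inside the tags and an optional `()` after the name; whether the model ever emits those is not documented in this card. `parse_route`, `ALLOWED_FUNCTIONS`, and `CALL_PATTERN` are hypothetical names.

```python
import re
from typing import Optional

# The seven routing functions declared in the prompt.
ALLOWED_FUNCTIONS = frozenset({
    "route_to_order_agent",
    "route_to_search_agent",
    "route_to_details_agent",
    "route_to_returns_agent",
    "route_to_account_agent",
    "route_to_payment_agent",
    "route_to_technical_agent",
})

# Tolerates surrounding whitespace and an optional "()" after the name.
CALL_PATTERN = re.compile(
    r"<function_call>\s*([A-Za-z_][A-Za-z0-9_]*)\s*(?:\(\s*\))?\s*</function_call>"
)


def parse_route(response: str) -> Optional[str]:
    """Return a declared routing function name, or None if absent or undeclared."""
    match = CALL_PATTERN.search(response)
    if match is None:
        return None
    name = match.group(1)
    return name if name in ALLOWED_FUNCTIONS else None


print(parse_route("<function_call>route_to_order_agent</function_call>"))
# route_to_order_agent
print(parse_route("<function_call>route_to_shipping_agent</function_call>"))
# None
```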

	## Training Procedure

	### Dataset Preparation

	Generated 12,550 synthetic examples with linguistic variations:

	```python
# Example training format
{
    "query": "Please track my package ASAP",
    "function": "route_to_order_agent",
    "agent": "order_management"
}
	```
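Each record has to be turned into the text the model is trained on. The card does not show the exact training template, so the sketch below assumes it mirrors the inference prompt from the Quick Start: declarations and query in the user turn, and `<function_call>NAME</function_call>` followed by `<end_of_turn>` as the model's completion. `to_sft_example` and the prompt/completion split are hypothetical; check the training notebook for the actual format. Recent versions of TRL's `SFTTrainer` accept datasets with `prompt` and `completion` columns, which is why the output uses those keys.

```python
from typing import Dict

# Same declaration block as the Quick Start example.
AGENT_DECLARATIONS = """<start_function_declaration>
route_to_order_agent(): Track, update, or cancel customer orders
route_to_search_agent(): Search products, check availability
route_to_details_agent(): Get product specifications and reviews
route_to_returns_agent(): Handle returns, refunds, exchanges
route_to_account_agent(): Manage user profile and settings
route_to_payment_agent(): Resolve payment and billing issues
route_to_technical_agent(): Fix app, website, login issues
<end_function_declaration>"""


def to_sft_example(record: Dict[str, str]) -> Dict[str, str]:
    """Convert a {"query", "function", "agent"} record into a prompt/completion pair."""
    # Prompt text identical to the Quick Start f-string.
    prompt = (
        "<start_of_turn>user\n"
        f"{AGENT_DECLARATIONS}\n\n"
        f"User query: {record['query']}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )
    # Assumed target: the call tag the Quick Start expects, then end of turn.
    completion = f"<function_call>{record['function']}</function_call><end_of_turn>"
    return {"prompt": prompt, "completion": completion}


example = to_sft_example({
    "query": "Please track my package ASAP",
    "function": "route_to_order_agent",
    "agent": "order_management",
})
print(example["completion"])
# <function_call>route_to_order_agent</function_call><end_of_turn>
```

Keeping training and inference prompts byte-identical avoids a train/serve mismatch that a small model may be sensitive to.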

	Variations included:
	- Polite forms: "Please", "Could you", "Can you"
	- Casual starters: "Hey", "Hi", "Um"
	- Urgency markers: "ASAP", "urgently", "immediately"
	- Edge cases and ambiguous queries
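The variation types listed above can be produced by combining a base request with optional openers and closers. This is a hypothetical sketch of that idea, not the generator used for the released dataset; the marker lists come from the bullets above, and the fixed seed keeps the sample reproducible. `augment` is an illustrative name.

```python
import itertools
import random
from typing import List

POLITE = ["Please", "Could you", "Can you"]
CASUAL = ["Hey,", "Hi,", "Um,"]
URGENCY = ["ASAP", "urgently", "immediately"]


def augment(base: str, n: int, seed: int = 0) -> List[str]:
    """Return up to n distinct variants of a lowercase base request such as 'track my package'."""
    openers = [""] + POLITE + CASUAL
    closers = [""] + URGENCY
    variants = set()
    for opener, closer in itertools.product(openers, closers):
        text = " ".join(part for part in (opener, base, closer) if part)
        if opener in POLITE[1:]:
            text += "?"  # "Could you ..." / "Can you ..." read as questions
        variants.add(text[0].upper() + text[1:])
    rng = random.Random(seed)
    return rng.sample(sorted(variants), k=min(n, len(variants)))


print(augment("track my package", 3))
```

Sampling from a sorted list (rather than a set) keeps the output reproducible across runs for a given seed.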

	### Training Configuration

	```python
from transformers import TrainingArguments
from trl import SFTTrainer
from peft import LoraConfig

# LoRA config
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Training args
training_args = TrainingArguments(
    output_dir="./functiongemma-ecommerce-router",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size 4 x 4 = 16
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    weight_decay=0.01,
    bf16=True,
    optim="paged_adamw_8bit",
    logging_steps=20,
    eval_strategy="epoch",  # named evaluation_strategy in older Transformers releases
    save_strategy="epoch",
)
	```
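With `lr_scheduler_type="cosine"` and `warmup_ratio=0.1`, the learning rate rises linearly from 0 to 2e-4 over roughly the first 10% of optimizer steps, then decays along a half cosine towards 0. The function below reproduces that shape for illustration, following the formula of Transformers' cosine-with-warmup schedule; `lr_at` is a hypothetical helper, and `TOTAL_STEPS` is a placeholder because the real value depends on the size of the training split (about training examples / 16 × 3 epochs).

```python
import math

PEAK_LR = 2e-4
WARMUP_RATIO = 0.1


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = 1000  # placeholder; depends on the training split size
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at(step, TOTAL_STEPS):.2e}")
```

The warmup phase keeps early updates small while the freshly initialized LoRA matrices settle, and the cosine tail lowers the learning rate for the final epoch.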

	### Training Results

	- Final Training Loss: 0.0182
	- Final Validation Loss: 0.0198
	- Training Time: 45 minutes (T4 GPU)
	- Peak Memory: 11.2 GB

	## Limitations and Biases

	### Known Limitations

1. Ambiguous Queries: The 10.6% error rate is concentrated in genuinely ambiguous queries
   - Example: "I need help" (could be any agent)
   - Mitigation: Ask a clarifying question when routing confidence is below 0.7

2. Context Dependency: Multi-turn interactions require conversation state to be managed outside the model
   - Solution: Use durable workflow orchestrators (Temporal, Cadence)

3. Agent Confusion: Most common misclassifications:
   - Returns ↔ Order Management (12 cases)
   - Account ↔ Payment (8 cases)
   - Technical ↔ Product Details (6 cases)

4. Language: Trained only on English queries
   - For multilingual support, fine-tune on translated datasets
	### Biases

	- Domain-Specific: Trained exclusively on e-commerce customer support
	- Synthetic Data: Generated examples may not capture all real-world variations
- Agent Distribution: The balanced training set may not reflect how real queries are distributed across agents

	## Ethical Considerations

	- Misrouting Impact: Incorrect routing may frustrate customers or delay issue resolution
	- Recommendation: Implement fallback to human agents for low-confidence predictions
	- Privacy: Model doesn't store user data; conversation state managed externally
	- Fairness: Ensure equal routing performance across user demographics

	## Citation

	If you use this model in your research or production systems, please cite:

	```bibtex
	@misc{functiongemma-ecommerce-router,
	author = {Sai Kumar Yava},
	title = {FunctionGemma 270M Fine-tuned for E-Commerce Multi-Agent Routing},
	year = {2025},
	publisher = {HuggingFace},
	howpublished = {\url{https://huggingface.co/scionoftech/functiongemma-270m-ecommerce-router}},
	}

@misc{functiongemma2025,
  title = {FunctionGemma: Bringing bespoke function calling to the edge},
  author = {{Google DeepMind}},
  year = {2025},
  howpublished = {\url{https://blog.google/technology/developers/functiongemma/}}
}
	```

	## Acknowledgments

	- Google DeepMind for FunctionGemma base model
	- Hugging Face for PEFT and Transformers libraries
	- The open-source AI community

	## License

	This model inherits the Gemma license from the base model. See [Gemma Terms of Use](https://ai.google.dev/gemma/terms).

	Commercial Use: Permitted under Gemma license terms.

	## Related Resources

	- Blog Post: [Full implementation guide](https://medium.com/@saikumaryava/beyond-mobile-actions-exploring-functiongemma-for-intelligent-multi-agent-orchestration-242dc0273f93)
	- Funcroute python package: [funcroute](https://github.com/scionoftech/funcroute)
	- Training Notebook: [Google Colab](https://colab.research.google.com/github/scionoftech/functiongemma-finetuning-e-commerce/blob/main/FunctionGemma_fine_tuning.ipynb)
	- GitHub Repository: [Complete code](https://github.com/scionoftech/functiongemma-finetuning-e-commerce)
	- Dataset: [Training data](https://huggingface.co/datasets/scionoftech/functiongemma-e-commerce-dataset)
	- Base Model: [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it)

	## Updates

	- 2025-12-25: Initial release - 89.4% routing accuracy on e-commerce customer support

	---

	Questions? Open an issue on [GitHub](https://github.com/scionoftech/functiongemma-finetuning-e-commerce/issues)