Sustainable AI Inference Framework
GreenInfer is a production-ready green AI orchestration framework that routes every prompt to the most energy-efficient model capable of answering it accurately, cutting AI compute energy by up to 97%.
Every prompt passes through a 7-layer orchestration pipeline that scores complexity, optimizes tokens, and routes to the right model tier.
User sends their query to the system
DistilBERT classifier rates query complexity 0-100 with 98.9% validation accuracy
T5 model removes filler tokens (avg 35% reduction)
Checks ERCOT grid carbon intensity for carbon-aware scheduling
Routes to Small / Medium / Large tier based on complexity
Returns response with complete energy metrics
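The routing step above can be sketched in a few lines of Python. This is a minimal illustration, not GreenInfer's actual code: the cut-off scores `SMALL_MAX` and `MEDIUM_MAX`, the `Tier` class, and the `route` function are all hypothetical names and values chosen for the example. Only the model names and per-query energy figures come from the tier descriptions on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    model: str
    energy_mwh: float  # per-query energy figure quoted for this tier


TIERS = {
    "small": Tier("small", "Llama 3.2 1B", 0.9),
    "medium": Tier("medium", "Llama 3.1 8B", 3.8),
    "large": Tier("large", "Llama 3.3 70B", 48.0),
}

# Hypothetical cut-offs on the 0-100 complexity score (assumed, not documented).
SMALL_MAX = 40
MEDIUM_MAX = 75


def route(score: int) -> Tier:
    """Map a 0-100 complexity score to the cheapest tier assumed to handle it."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in 0-100, got {score}")
    if score <= SMALL_MAX:
        return TIERS["small"]
    if score <= MEDIUM_MAX:
        return TIERS["medium"]
    return TIERS["large"]
```

With these assumed cut-offs, `route(10)` selects the small tier and `route(90)` the large one. In a real deployment the thresholds would be tuned against answer quality rather than fixed by hand.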
Small tier
Model: Llama 3.2 1B
Energy: 0.9 mWh per query
Use Cases: Simple queries, factual questions, definitions
Traffic: 55% of all queries
Medium tier
Model: Llama 3.1 8B
Energy: 3.8 mWh per query
Use Cases: Reasoning tasks, explanations, analysis
Traffic: 30% of all queries
Large tier
Model: Llama 3.3 70B
Energy: 48.0 mWh per query
Use Cases: Complex reasoning, code generation, expert tasks
Traffic: 15% of all queries
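As a worked check of the tier figures above, the sketch below computes the traffic-weighted energy per query and compares it with sending every query to the 70B model. The function names are illustrative. The calculation uses only the per-tier energy and traffic numbers listed here, so it does not account for token optimization or carbon-aware deferral.

```python
# (energy in mWh per query, share of traffic) for each tier, as listed above.
TIER_STATS = {
    "small": (0.9, 0.55),
    "medium": (3.8, 0.30),
    "large": (48.0, 0.15),
}


def blended_energy_mwh(stats: dict) -> float:
    """Traffic-weighted average energy per query, in mWh."""
    return sum(energy * share for energy, share in stats.values())


def savings_vs_large(stats: dict) -> float:
    """Fractional energy saved relative to routing every query to the large tier."""
    large_energy = stats["large"][0]
    return 1 - blended_energy_mwh(stats) / large_energy


if __name__ == "__main__":
    print(f"blended: {blended_energy_mwh(TIER_STATS):.3f} mWh per query")
    print(f"savings vs. all-large: {savings_vs_large(TIER_STATS):.1%}")
```

The blended figure is 0.55 × 0.9 + 0.30 × 3.8 + 0.15 × 48.0 = 8.835 mWh per query, roughly 82% less than the 48.0 mWh of an all-large baseline. For a single query answered by the small tier instead of the large one, the saving is 1 − 0.9/48.0, or about 98%.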
A DistilBERT classifier trained on 600 labeled examples rates the complexity of every prompt from 0 to 100, with 98.9% validation accuracy across 4 tiers.
A T5 model silently rewrites prompts to remove filler words before inference, cutting input tokens by an average of 35% with no quality loss.
Uses hourly ERCOT grid intensity estimates to defer expensive queries when the grid's carbon intensity is high, reducing CO₂ further.
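One way to picture the deferral decision is the sketch below. It assumes an hourly list of grid carbon-intensity estimates (gCO₂/kWh) is already available. The threshold value, the `max_delay_hours` limit, and the function name `pick_run_hour` are hypothetical and not taken from GreenInfer.

```python
from typing import Sequence

# Hypothetical threshold above which the grid is treated as "dirty" (gCO2/kWh).
DIRTY_THRESHOLD = 450.0


def pick_run_hour(
    hourly_intensity: Sequence[float],
    now_hour: int,
    max_delay_hours: int,
    threshold: float = DIRTY_THRESHOLD,
) -> int:
    """Return the index of the hour in which an expensive query should run.

    Runs immediately if the current hour is at or below the threshold.
    Otherwise picks the cleanest hour within the allowed delay window,
    preferring the earliest hour when several are equally clean.
    """
    if hourly_intensity[now_hour] <= threshold:
        return now_hour
    last = min(now_hour + max_delay_hours, len(hourly_intensity) - 1)
    window = range(now_hour, last + 1)
    return min(window, key=lambda hour: hourly_intensity[hour])
```

For example, with estimates of `[500, 480, 300, 350]` and a two-hour delay budget starting at hour 0, the sketch defers the query to hour 2. Capping the delay keeps latency bounded even when the grid stays above the threshold for a long stretch.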
Starts with the smallest model and escalates to the medium or large model only if the answer's confidence falls below a threshold, an approach inspired by FrugalGPT.
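The escalation logic can be sketched as a simple cascade. The example assumes each model is a callable returning an answer together with a confidence in [0, 1]. That interface, the default threshold of 0.8, and the `cascade` function are assumptions made for illustration, not GreenInfer's documented API.

```python
from typing import Callable, Sequence, Tuple

# Assumed interface: a model takes a prompt and returns (answer, confidence).
Model = Callable[[str], Tuple[str, float]]


def cascade(
    prompt: str,
    models: Sequence[Tuple[str, Model]],
    threshold: float = 0.8,  # hypothetical confidence cut-off
) -> Tuple[str, str]:
    """Try models from cheapest to most expensive.

    Returns (tier_name, answer) from the first model whose confidence meets
    the threshold, or from the last model if none does.
    """
    for index, (name, model) in enumerate(models):
        answer, confidence = model(prompt)
        is_last = index == len(models) - 1
        if confidence >= threshold or is_last:
            return name, answer
    raise ValueError("at least one model is required")
```

Note the trade-off: an escalated query pays for every tier it passes through, so a cascade saves energy only when most queries stop at the small model.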
For complex queries, shows a summary and outline first. The user confirms before the full, expensive response runs.
Every answer shows energy used in mWh, CO₂ emitted, tokens saved, and a Green Efficiency Score 0-100 with an improvement tip.
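Two of these metrics follow from simple arithmetic, sketched below. The unit conversion is standard (1 mWh = 10⁻⁶ kWh). The grid intensity passed in is an input the caller must supply, and the function names are illustrative. The Green Efficiency Score is left out because its formula is not described here.

```python
def co2_grams(energy_mwh: float, grid_g_per_kwh: float) -> float:
    """CO2 emitted by one query, in grams.

    energy_mwh is in milliwatt-hours; 1 mWh equals 1e-6 kWh.
    grid_g_per_kwh is the grid carbon intensity in gCO2 per kWh.
    """
    return energy_mwh * 1e-6 * grid_g_per_kwh


def tokens_saved(original_tokens: int, optimized_tokens: int) -> int:
    """Input tokens removed by prompt optimization."""
    return original_tokens - optimized_tokens
```

For instance, a 48.0 mWh query on a grid at an assumed 400 gCO₂/kWh emits 48.0 × 10⁻⁶ × 400 = 0.0192 g of CO₂.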
Framework Components:
Open Source: The entire framework is available on GitHub for developers to integrate green AI into their applications.