OpenAI Integration & Response
Author: Georgi Peev
This document details how the WFP Chatbot integrates with OpenAI's API and handles responses.
Environment Setup
Before working with the OpenAI integration:
- Create and activate a Python virtual environment:
# Create venv
python -m venv venv
# Activate venv
# On Windows:
venv\Scripts\activate
# On Unix/MacOS:
source venv/bin/activate
- Install required dependencies:
pip install -r requirements.txt
- Set up OpenAI API key:
# Add to .env file
OPENAI_API_KEY=your_api_key_here
OpenAI Integration
Model Configuration
The chatbot supports multiple OpenAI models with different configurations:
Model | Token Limit | Input Cost (USD/token) | Output Cost (USD/token) |
---|---|---|---|
gpt-4o | 26,000 | $0.0000025 | $0.00001 |
gpt-3.5-turbo | 4,096 | $0.0000005 | $0.0000015 |
o1-mini | 26,000 | $0.0000025 | $0.00001 |
Note: The token limits for gpt-4o and o1-mini are set to 26,000 (below the organizational limit of 30,000) for safety.
Authentication
The integration uses OpenAI's API key, which should be set in the environment variables:
OPENAI_API_KEY=your_api_key_here
Request Flow
-
Query Processing
- User query is received
- Relevant documents are retrieved via similarity search
- If country report specified: limit=1 document
- If no country report: uses user-specified limit (default: 5)
- Documents are compressed to optimize token usage
- Conversation context is gathered from recent interactions
-
Message Construction
- System message defines the chatbot's role and guidelines
- User message combines:
- Recent conversation context
- Country report context (if specified)
- Relevant document content
-
OpenAI API Call
completion = openai.chat.completions.create(
model=chatbot_type,
messages=messages,
temperature=0.7
)
Error Handling
-
Rate Limit Handling
- On hitting rate limits, waits 20 seconds
- Makes exactly one retry attempt
- If retry fails, error is propagated
-
Context Length Management
- Detects maximum context length errors
- Automatically reduces context by half and retries
- Preserves core functionality while adapting to limits
-
General Error Handling
- Catches and logs all OpenAI API errors
- Propagates errors for proper client notification
Response Handling
Token Management
-
Daily Usage Tracking
- Monitors token usage in MongoDB
- Enforces daily token limit (default: 26,000 tokens)
- Tracks costs for both input and output tokens
-
Token Optimization
- Compresses document content before API calls
- Truncates text to stay within model limits
- Caches compressed documents for efficiency
Document Compression
The chatbot uses a compression system to maximize context while minimizing token usage:
-
Compression Process
# For each relevant document
compressed_doc, res, is_compressed = prompt_compressor.compress_document(doc, doc["_id"])- Summarizes document content while preserving key information
- Removes redundant or less relevant details
- Maintains essential facts, statistics, and relationships
- Uses GPT-based compression for intelligent summarization
-
Caching Strategy
- Compressed versions are cached by document ID
- Cache hits avoid recompression costs and API calls
- Compression stats are tracked:
print(f"Compression stats:\n Before: {real_tokens}\n After: {compressed_tokens}")
print(f"Cost saved: ${(input_cost - cost_compression):.8f}")
Context Building
The chatbot builds context for OpenAI in several layers:
-
Conversation History
- Maintains recent interactions (default: 5 turns)
- More recent messages have higher priority
- Used to maintain conversation coherence
- Format:
f"Recent messages:\n{conversation_context}\n\n"
-
Document Context
- Country-specific reports (if applicable):
f"Provide your answer solely based on the following context which is about a country report:\n{report_context}"
- General knowledge base:
f"Use the following context:\n{search_result}"
- Country-specific reports (if applicable):
Response Formatting
The chatbot's responses follow strict formatting and content guidelines:
Category | Guidelines |
---|---|
Language | - Clear and concise - Professional tone - No technical jargon |
Content Restrictions | - No future predictions - No political opinions - No food aid recommendations - No sensitive information |
Data Usage | - Only use provided context - No external data sources - Cite sources when available |
Formatting | - Use markdown for structure - Include relevant headers - Maintain consistent style |
-
Supported Markdown Syntax
- Headers (h1-h3) for section organization
- Bold text for key statistics
- Bullet points for listing facts
- Ordered lists for sequential information
- Blockquotes for data limitations
- URLs (only for hungermap.wfp.org)
- Tables for data comparison
- Task lists for status indicators
-
Content Requirements
- Use only data from provided context
- Include precise statistics when available
- State time periods for any trends
- Acknowledge data limitations explicitly
- Use English names for locations
- Reference only WFP Hunger Map as source
-
Content Restrictions
- No future predictions
- No political opinions
- No food aid recommendations
- No sensitive information
- No speculation beyond data
- No unsupported markdown syntax
The formatting ensures responses are:
- Consistently structured
- Easy to read
- Suitable for frontend rendering
- Professional in tone
- Data-focused and factual
Monitoring and Optimization
-
Performance Monitoring
- Tracks compression statistics
- Monitors token usage and costs
- Records cache hit rates
-
Cost Optimization
- Calculates and logs cost savings from compression
- Uses appropriate models based on context size
- Caches frequently used data
Configuration
Key configuration options include:
DAILY_TOKEN_LIMIT=26000
MONGODB_URI=your_mongodb_uri
MONGODB_DB=your_database_name
These settings can be adjusted to balance cost, performance, and functionality requirements.