Prompt Compressor
Author: Muhammed Emre Bayraktaroglu
This documentation provides an overview of the PromptCompressor module, which compresses (summarizes) large input prompts using low-cost Large Language Models such as gpt-3.5-turbo and stores the compressed prompts in a MongoDB cache. Caching helps reduce repeated token usage and thus lowers costs, making the module a crucial part of the chatbot's token optimization strategy.
Overview
The PromptCompressor:
- Summarizes longer prompts into concise, essential content while preserving key details.
- Caches the compressed results in a MongoDB collection, avoiding re-computation.
- Provides cost estimation for each compression step (based on the model used).
- Offers different compression models to suit various needs.
Requirements and Setup
Before using the PromptCompressor, ensure you have the following:
- Python 3.8+
- A valid OpenAI API Key or other model credentials (as needed by the compressor).
- A running MongoDB instance (local or cloud) and credentials/connection strings.
Environment Variables
Configure these environment variables in your system or in a .env file:
- OPENAI_API_KEY: The API key for your OpenAI (or compatible) LLM.
- MONGODB_URI: Connection string for your MongoDB database.
- MONGODB_DB_PROMPT_COMPRESSOR_CACHE: The name of the database used to store the compressed prompt cache (in our example, prompt_compressor_db).
Example .env file:
```
OPENAI_API_KEY=your_openai_api_key
MONGODB_URI=url_to_your_mongodb_instance
MONGODB_DB_PROMPT_COMPRESSOR_CACHE=prompt_compressor_db
```
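To verify that the configuration is picked up, here is a minimal sketch using python-dotenv and pymongo (both packages are assumptions, not stated dependencies of the module; the variable names follow the list above):

```python
import os

from dotenv import load_dotenv
from pymongo import MongoClient

# Load the variables from a local .env file into the process environment.
load_dotenv()

mongo_uri = os.getenv("MONGODB_URI")
db_name = os.getenv("MONGODB_DB_PROMPT_COMPRESSOR_CACHE")

# Connect and ping the server to confirm the connection string works.
client = MongoClient(mongo_uri)
client.admin.command("ping")
print(f"Connected to MongoDB; cache database: {db_name}")
```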
Implementation
Models
Currently, the PromptCompressor supports three models: gpt-4o, o1-mini, and gpt-3.5-turbo. You can add more models as needed.
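The constructor below refers to these models through a Models helper. Its definition is not reproduced in this documentation, but it presumably maps constant names to API model identifiers, roughly like this (the exact identifiers are an assumption):

```python
class Models:
    # API model identifiers for the supported compression models.
    GPT_4O = "gpt-4o"
    O1_MINI = "o1-mini"
    GPT_35_TURBO = "gpt-3.5-turbo"
```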
Costs for Different Models
Model Name | Input Cost (USD per token) | Output Cost (USD per token) |
---|---|---|
GPT-3.5-Turbo | 0.000003 | 0.000006 |
GPT-4o | 0.0000025 | 0.00001 |
O1-Mini | 0.000003 | 0.000012 |
Source: OpenAI API Reference Pricing
The costs for the calculations are implemented in the ModelCosts class.
Note: These are example costs; update them according to your usage and actual model pricing.
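The exact shape of ModelCosts is not reproduced here; below is a minimal sketch of how such a class might hold the per-token rates from the table above (names and structure are assumptions):

```python
class ModelCosts:
    """Per-token USD rates used to estimate compression cost (illustrative values)."""

    # model name -> (input cost per token, output cost per token)
    RATES = {
        "gpt-3.5-turbo": (0.000003, 0.000006),
        "gpt-4o": (0.0000025, 0.00001),
        "o1-mini": (0.000003, 0.000012),
    }

    @classmethod
    def estimate(cls, model: str, input_tokens: int, output_tokens: int) -> float:
        """Estimate the USD cost of a single LLM call."""
        input_rate, output_rate = cls.RATES[model]
        return input_tokens * input_rate + output_tokens * output_rate
```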
Class: PromptCompressor
This class defines the core functionality of the PromptCompressor module. It includes methods for compressing documents, caching compressed prompts, and retrieving them from the cache.
Constructor
```python
def __init__(self, model_name="microsoft/phi-2"):
    """
    Initialize the Prompt Compressor.

    Args:
        model_name (str): Pre-trained LLM model to be used for compression.
            Default is "microsoft/phi-2".
    """
    self.model_name = model_name
    self.device = "cpu"
    self.oai_tokenizer = tiktoken.encoding_for_model(Models.GPT_4O)
```
- model_name (str): Defines which model to use when compressing your prompts. Default is microsoft/phi-2. This variable is for a possible LLMLingua implementation and can be ignored for now.
- device (str): Currently set to "cpu", but you can modify it for GPU usage if supported by your environment. This variable is for a possible LLMLingua implementation and can be ignored for now.
- oai_tokenizer (Encoding): An encoder/decoder for measuring token length to estimate usage.
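For illustration, counting tokens with the stored tokenizer might look like this (a sketch assuming the tiktoken package):

```python
import tiktoken

# The same encoder the constructor stores in oai_tokenizer.
enc = tiktoken.encoding_for_model("gpt-4o")

text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
# encode() returns a list of token IDs; its length drives the cost estimate.
print(len(enc.encode(text)), "tokens")
```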
Methods
Function: cacheable_item(obj_id, compressed_prompt)
Creates a dictionary that can be inserted into MongoDB to store the compressed prompt.
Arguments:
- obj_id (int): Unique ID for the document or prompt.
- compressed_prompt (str): The summarized version of the input content.
Returns:
- A dictionary containing the compressed prompt and the document ID.
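A sketch of what this method likely returns; the field names doc_id and compressed_prompt follow the insert_cache description below:

```python
def cacheable_item(self, obj_id: int, compressed_prompt: str) -> dict:
    # Minimal cache record: the document ID plus its compressed form.
    return {"doc_id": obj_id, "compressed_prompt": compressed_prompt}
```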
Function: insert_cache(item)
Inserts a compressed prompt dictionary into MongoDB.
Arguments:
- item (dict): A dictionary containing at least doc_id and compressed_prompt.
Returns:
- Nothing. A message is printed if the caching is successful.
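A minimal sketch of the insertion step, assuming collection is a pymongo collection handle from the cache database shown earlier (the collection name is not specified in this documentation):

```python
def insert_cache(collection, item: dict) -> None:
    # Persist the record so later calls can skip re-compression.
    collection.insert_one(item)
    print(f"Cached compressed prompt for doc_id={item['doc_id']}.")
```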
Function: is_cached(obj_id)
Checks if a compressed prompt is already cached.
Arguments:
- obj_id (int): The ID of the document to look up.
Returns:
- A boolean value indicating whether the document is already cached.
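A sketch of the cache check under the same assumptions:

```python
def is_cached(collection, obj_id: int) -> bool:
    # find_one returns None when no matching record exists.
    return collection.find_one({"doc_id": obj_id}) is not None
```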
Function: get_cached_prompt(obj_id)
Retrieves a compressed prompt from the MongoDB cache if it exists.
Arguments:
- obj_id (int): The ID of the document to retrieve.
Returns:
- The cached compressed prompt.
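And a sketch of the retrieval step:

```python
def get_cached_prompt(collection, obj_id: int) -> str:
    record = collection.find_one({"doc_id": obj_id})
    # Return only the stored summary, not the whole cache record.
    return record["compressed_prompt"]
```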
Function: compress_document(content, id_of_document)
Compresses a larger piece of text content using a chosen model. If the document has been compressed before, it uses the cached version (no additional token usage).
Arguments:
- content (str): The text content to be compressed.
- id_of_document (int): A unique identifier for the document to cache/retrieve.
Returns:
- The compressed prompt
- A dictionary containing the number of tokens of the uncompressed prompt, the number of tokens of the compressed prompt, the compression rate, and the cost of the compression
- A boolean success flag (unpacked as success in the usage example below)
Usage Example
Below is a minimal usage example demonstrating how to instantiate and use the PromptCompressor to compress a document.
```python
from prompt_compressor import PromptCompressor

# Initialize compressor
compressor = PromptCompressor()

# Example content to compress
content = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur lobortis..."

# A unique identifier for the document to cache/retrieve
id_of_document = "unique_document_id"

compressed_prompt, info_vector, success = compressor.compress_document(content, id_of_document)

if success:
    print("Compression successful!")
    print("Compressed Prompt:", compressed_prompt)
    print("Info Vector:", info_vector)
else:
    print("Compression failed or the document was already cached.")
```
System Prompt for the Compressor
The Compressor utilizes a low-cost LLM to summarize the content. The system prompt used is as follows:
```python
system_instruction = """
You are tasked with compressing detailed documents into a
concise and structured bullet-point summary format.
Your goal is to retain all essential information,
including numerical values, dates, locations, and event types,
while omitting redundant descriptions.
Follow these instructions carefully:
1. Include the following key details for each document:
   • Document name
   • Type of event (if applicable)
   • Country name
   • Key metrics (e.g., population affected, percentages, analysis periods, phases, or severities)
   • Event details (e.g., IDs, hazard names, locations, dates of creation and updates)
   • Data source or labels (if mentioned)
2. Format the summary in bullet points for clarity.
3. Ensure that no data is lost but avoid repeating the same information multiple times.
4. Use clear and concise language to capture the most relevant details.
"""
```
This prompt ensures that the compressor focuses on the most relevant details, preserving all data while avoiding redundancy.
What Happens Under the Hood?
- Token Counting: The content length (in tokens) is measured using the tokenizer.
- Model Selection: If the document is too large, a more robust (but more expensive) model (GPT-4o) may be used. Otherwise, GPT-3.5-Turbo (or your specified default model) is used.
- Cache Check: If the document has been compressed before (is_cached), no additional cost is incurred, and the cached summary is retrieved.
- LLM Call: If not cached, the content is sent to the Large Language Model with a system prompt for bullet-point summarization.
- Summary and Caching: The response is saved both for immediate usage and for future retrieval (insert_cache).
- Cost Calculation: The system calculates how many tokens were used (both input and output) and multiplies them by the relevant cost rates.
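Putting these steps together, here is a condensed sketch of the flow, assuming the OpenAI Python SDK, the system_instruction string above, and the ModelCosts sketch from earlier (an illustration, not the module's literal implementation):

```python
from openai import OpenAI

def compress(collection, client: OpenAI, content: str, doc_id: int,
             model: str = "gpt-3.5-turbo"):
    # Cache check: reuse an earlier summary at zero additional token cost.
    cached = collection.find_one({"doc_id": doc_id})
    if cached is not None:
        return cached["compressed_prompt"], 0.0

    # LLM call: bullet-point summarization using the system prompt above.
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_instruction},
            {"role": "user", "content": content},
        ],
    )
    summary = response.choices[0].message.content

    # Summary and caching: store the result for future retrieval.
    collection.insert_one({"doc_id": doc_id, "compressed_prompt": summary})

    # Cost calculation: input and output tokens times the per-token rates.
    cost = ModelCosts.estimate(model, response.usage.prompt_tokens,
                               response.usage.completion_tokens)
    return summary, cost
```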
Conclusion
The PromptCompressor module streamlines the process of summarizing lengthy prompts into manageable chunks and storing them in MongoDB. By leveraging caching, this approach:
- Saves on repeated LLM usage.
- Quickly retrieves previously summarized documents.
- Reduces token consumption and cost.