
Keep tokens limited #653

Open

DanijelDomazet opened this issue Dec 10, 2024 · 0 comments

DanijelDomazet commented Dec 10, 2024

Limit tokens

I'm using the --chat option constantly.

    sgpt --chat mylongchat "Please calculate 2+2?"

After a few days, I got this error:

    RateLimitError: Error code: 429 -
    {'error':
    {'message': 'Request too large for gpt-4o in organization org-78asdf87asdf9aaa8976 on tokens per min (TPM):
    Limit 30000, Requested 31538.
    The input or output tokens must be reduced in order to run successfully.
    Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
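
For reference, the request size that triggers this 429 can be estimated client-side with tiktoken before sending. A minimal sketch (the prompt string is a placeholder, and 30000 is the TPM limit from the error above):

    import tiktoken

    TPM_LIMIT = 30000  # from the error message above

    encoder = tiktoken.get_encoding("o200k_base")  # encoding used by gpt-4o
    prompt = "the whole chat history joined into one string"  # placeholder
    token_count = len(encoder.encode(prompt))
    print(token_count, "tokens;", "over limit" if token_count > TPM_LIMIT else "ok")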

The only thing that helps here is to start a new chat:

    sgpt --chat myNEWlongchat "Please calculate 2+2?"

But then I lose all the chat context, which I would like to avoid.

Instead of limiting the chat HISTORY, limit the input tokens: when the token count exceeds the quota, delete the oldest message(s) from the chat history.

Example

Something like this (added to class ChatSession):

    import json

    import tiktoken

    TOKEN_LIMIT = 30000

    def _limit_tokens(self, chat_id: str) -> None:
        while self._count_tokens(chat_id) > TOKEN_LIMIT:
            messages = self._read(chat_id)
            if len(messages) <= 3:
                # Nothing older left to drop; avoid an endless loop.
                break
            # Remove the 2nd and 3rd messages (the oldest question/answer
            # pair), keeping the system prompt at index 0.
            truncated_messages = messages[:1] + messages[3:]
            self._write(truncated_messages, chat_id)

    def _count_tokens(self, chat_id: str) -> int:
        file_path = self.storage_path / chat_id
        parsed_cache = json.loads(file_path.read_text())
        text_to_encode = " ".join(
            message["content"] for message in parsed_cache if "content" in message
        )
        tokenizer = tiktoken.get_encoding("o200k_base")  # encoding used by gpt-4o
        return len(tokenizer.encode(text_to_encode))

Call _limit_tokens() from ChatSession.wrapper().
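
For illustration, here is the same trimming loop as a self-contained script, outside of shell_gpt (the message shape, the tiny TOKEN_LIMIT, and the helper names are made up for the demo):

    import tiktoken

    TOKEN_LIMIT = 50  # tiny budget so the demo actually trims

    def count_tokens(messages: list) -> int:
        text = " ".join(m["content"] for m in messages if "content" in m)
        return len(tiktoken.get_encoding("o200k_base").encode(text))

    def limit_tokens(messages: list) -> list:
        while count_tokens(messages) > TOKEN_LIMIT and len(messages) > 3:
            # Drop the oldest question/answer pair, keep the system prompt.
            messages = messages[:1] + messages[3:]
        return messages

    history = [{"role": "system", "content": "You are a helpful assistant."}]
    for i in range(10):
        history.append({"role": "user", "content": f"Question number {i}?"})
        history.append({"role": "assistant", "content": f"Answer number {i}."})

    trimmed = limit_tokens(history)
    print(len(history), "->", len(trimmed), "messages,", count_tokens(trimmed), "tokens")

This keeps the system prompt and the newest exchanges, so the running chat stays coherent while the request size stays under the budget.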

DanijelDomazet changed the title Keep "tokens per minute" limited Keep tokens limited Dec 10, 2024