Optimising bhindi usage through chat management


You may have noticed how quickly your Pro subscription gets drained, even when you haven’t had many conversations. This happens when chats are not properly managed and optimised. At the core of the AI economy is a simple unit of measurement: the token. Bhindiai charges you by tokens, so it is important to understand and manage them.

In this blog, we’ll discuss how you can optimise your Bhindi usage, avoid hitting “model limit exceed”, and get the best out of Bhindiai.

What are tokens? Why are they so important?

In Natural Language Processing (NLP), tokens are the basic units of text. When a piece of text is broken down into smaller components, such as words or subwords, each component is a token. Tokens are the units that Bhindiai, like any AI model, uses to read your prompt and write its response. The model generates one token at a time, predicting each next token from all the tokens it has seen so far.

You can think of a token as a whole word, part of a word, an individual character, or even a punctuation mark. “Tokenization” is an important concept in LLMs: it is the process by which a large piece of input text is broken down into smaller, manageable segments, or tokens. These tokens are what the AI actually takes in to process your input.
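To make the idea concrete, here is a minimal sketch that splits text into words and punctuation marks. It is an illustration only: the function name `rough_tokenize` is made up for this post, and real models (Bhindiai included, presumably) use learned subword vocabularies, so their actual token counts will differ.

```python
import re

def rough_tokenize(text: str) -> list[str]:
    """Split text into words and single punctuation marks.

    Illustration only: real LLM tokenizers use learned subword
    vocabularies, so their counts will not match this one.
    """
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches one punctuation character. Whitespace is dropped.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = rough_tokenize("Hello, world! Tokens matter.")
print(tokens)       # ['Hello', ',', 'world', '!', 'Tokens', 'matter', '.']
print(len(tokens))  # 7
```

Even in this simplified view, a four-word sentence already costs seven tokens, because punctuation counts too.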

Every model has a “maximum token limit” for a single conversation or prompt, also called the context window. This limit determines how much information the model can “remember” and process at once.

While using Bhindiai, both types of tokens, input and output, cost money, and the entire context of the conversation is counted as input tokens for every single message.

Bhindiai provides around 100k tokens overall and 5k tokens for a single prompt.
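As a rough way to reason about these figures, the sketch below checks a prompt against both numbers before sending it. It reads the 100k figure as a total budget and the 5k figure as a per-prompt cap, and it estimates tokens at about 4 characters each, a common rule of thumb for English text. All names here are hypothetical, and none of this is Bhindiai's actual accounting.

```python
PER_PROMPT_LIMIT = 5_000   # "5k tokens for a single prompt" (figure from this post)
OVERALL_LIMIT = 100_000    # "around 100k tokens overall" (figure from this post)
CHARS_PER_TOKEN = 4        # assumption: rough rule of thumb for English text

def estimate_tokens(text: str) -> int:
    """Rough estimate using ceiling division, so any non-empty text is >= 1 token."""
    return -(-len(text) // CHARS_PER_TOKEN)

def fits_limits(prompt: str, tokens_used_so_far: int) -> bool:
    """True if the prompt fits both the per-prompt cap and the remaining budget."""
    needed = estimate_tokens(prompt)
    return needed <= PER_PROMPT_LIMIT and tokens_used_so_far + needed <= OVERALL_LIMIT

prompt = "a" * 400
print(estimate_tokens(prompt))      # 100
print(fits_limits(prompt, 50_000))  # True
print(fits_limits(prompt, 99_950))  # False: 99,950 + 100 exceeds 100,000
```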

When using Bhindiai, you are functioning in a token economy with two different currencies:

  1. input tokens - the text that you give to bhindiai

  2. output tokens - the output text that bhindi generates (answers)

You are charged for both input and output tokens.

Why Shorter Chats Save You Money

When you continue a single chat, bhindi needs to “remember” everything you’ve said so far to maintain context. In order to do this, the entire conversation history (all of your previous questions and all of bhindi’s previous answers) is sent back to the model with every new prompt.

total tokens used per message = (new prompt tokens + full conversation history tokens) + Bhindi response tokens
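The formula above can be worked through with a small sketch. The numbers are made up (50 tokens per question, 200 per answer), and `chat_token_usage` is a hypothetical helper that simply applies the formula turn by turn.

```python
def chat_token_usage(turns: list[tuple[int, int]]) -> list[int]:
    """Return the tokens billed for each message in one continuous chat.

    Each turn is (prompt_tokens, response_tokens). The history sent with
    a message is every earlier prompt and every earlier response.
    """
    history = 0
    billed = []
    for prompt, response in turns:
        # total = (new prompt + full history) + response
        billed.append(prompt + history + response)
        history += prompt + response
    return billed

turns = [(50, 200)] * 5           # five identical question/answer pairs
per_message = chat_token_usage(turns)
print(per_message)                # [250, 500, 750, 1000, 1250]
print(sum(per_message))           # 3750 tokens in one long chat
print(5 * (50 + 200))             # 1250 tokens if each question started a fresh chat
```

With these made-up numbers, five questions in one chat cost three times as much as the same five questions asked in separate chats, and the gap keeps widening with every extra message.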

Longer chats have a hidden cost: they drain your tokens rapidly.

To minimise your cost, you can control the format of your input in a few ways:

  1. Create structured input templates that accomplish your goals with fewer tokens

  2. Pre-process user inputs to remove redundant information

  3. Consider whether additional context is necessary for each interaction
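The first two points can be sketched as follows. This is a minimal illustration, not a Bhindiai feature: `compact_prompt` and the `TEMPLATE` fields (`task`, `context`, `output_format`) are hypothetical names chosen for this example.

```python
import re

def compact_prompt(text: str) -> str:
    """Collapse extra whitespace and drop exact duplicate lines, keeping order."""
    seen = set()
    kept = []
    for line in text.splitlines():
        line = re.sub(r"\s+", " ", line).strip()
        # Skip blank lines and lines that already appeared earlier.
        if line and line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)

# A short, fixed structure so every prompt carries only what is needed.
TEMPLATE = "Task: {task}\nContext: {context}\nOutput format: {output_format}"

raw = "Summarise   the report.\n\nSummarise the report.\nFocus on   Q3 revenue."
print(compact_prompt(raw))
# Summarise the report.
# Focus on Q3 revenue.

print(TEMPLATE.format(task="Summarise", context="Q3 report", output_format="3 bullets"))
```

The third point, deciding whether extra context is needed at all, remains a judgment call that no pre-processing step can make for you.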

Some dos and don’ts while talking to Bhindi

In a Long Chat: Your 15th message might include the full text of the prior 14 questions and 14 answers. This massive chunk of text fills your context window, and you are billed for all the tokens it contains every time you press “send”.

In a New Chat: Your 1st message is sent with no prior history. You are only paying for the tokens in your current, precise prompt!