Tokens are a big reason today’s generative AI falls short
Generative AI models don’t process text the way humans do; their “token”-based nature may help explain some of their strange behaviors.
Depending on how a model is prompted, for example “once upon a” versus “once upon a ” (note the trailing space), you may get completely different responses.
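To see why, consider how the text is actually encoded before the model ever sees it. A minimal sketch, assuming OpenAI’s open-source tiktoken library and its cl100k_base vocabulary (the article does not name a specific tokenizer):

```python
# A minimal sketch of why a trailing space changes what the model sees.
# Assumes tiktoken and its cl100k_base BPE vocabulary for illustration.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

without_space = enc.encode("once upon a")
with_space = enc.encode("once upon a ")

print(without_space)                  # a short list of integer token IDs
print(with_space)                     # a different list: the trailing space alters the encoding
print(without_space == with_space)    # False -- the model receives two different sequences
```

Because the two prompts map to different token sequences, the model is, in effect, answering two different questions.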
It can take a transformer twice as long to complete a task phrased in a non-English language versus the same task phrased in English.
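The gap largely comes down to tokenization: many non-English scripts are carved into more tokens per word, so the model works through a longer sequence for the same request. A rough sketch, again assuming tiktoken’s cl100k_base vocabulary (the translations below are approximate):

```python
# A rough illustration of why non-English text can cost more: the same
# question often splits into more tokens outside English.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English": "How are you doing today?",
    "Thai": "วันนี้คุณเป็นอย่างไรบ้าง",   # roughly the same question in Thai
    "Hindi": "आज आप कैसे हैं?",          # and in Hindi
}

for language, text in samples.items():
    tokens = enc.encode(text)
    print(f"{language}: {len(tokens)} tokens for {len(text)} characters")

# Non-Latin scripts typically yield more tokens per word, so the model does
# more work (and pays more attention-compute) for the same underlying request.
```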
MambaByte, which works with raw bytes rather than tokens, can ingest more data than transformers without a performance penalty.
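To illustrate what working with raw bytes rather than tokens means in practice, here is a small comparison, with tiktoken’s cl100k_base used purely as an assumed reference tokenizer:

```python
# A byte-level model like MambaByte reads the UTF-8 bytes of the text
# directly, skipping the tokenizer entirely, at the cost of a longer
# input sequence. The tiktoken comparison is an assumption for illustration.
import tiktoken

text = "Tokens are a big reason today's generative AI falls short."

byte_ids = list(text.encode("utf-8"))                          # one integer per byte, vocabulary of 256
token_ids = tiktoken.get_encoding("cl100k_base").encode(text)  # far fewer, larger-vocabulary IDs

print(len(byte_ids), "bytes vs", len(token_ids), "tokens")
# The byte sequence is several times longer, which is why byte-level models
# lean on architectures (such as Mamba's state-space layers) that handle
# long sequences more cheaply than standard transformer attention.
```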
Most models, from small on-device ones like Gemma to OpenAI’s industry-leading GPT-4o, are built on an architecture known as the transformer. Due to the way transformers conjure up associations between text and other types of data, they can’t take in or output raw text — at least not without a massive amount of compute.
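As a concrete picture of that intermediate step, here is a minimal sketch of tokenization, again assuming tiktoken’s cl100k_base vocabulary rather than any particular model’s tokenizer:

```python
# Text is broken into tokens and mapped to integer IDs; those IDs, not the
# raw characters, are what a transformer actually consumes and produces.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Generative AI models don't process text like humans."
token_ids = enc.encode(text)

# Show how the text was carved up into words, word fragments, and spaces.
pieces = [
    enc.decode_single_token_bytes(t).decode("utf-8", errors="replace")
    for t in token_ids
]
print(token_ids)
print(pieces)
```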