“Necessity is the mother of all innovation” is the mantra I grew up believing, and it has stood the test of time. When you are short on resources, you innovate and optimize to make the most of what you have. Consider the specific case study of DeepSeek: the researchers, working in a part of the world without access to the best or most plentiful computing resources, innovated and built an economical, optimized way to train their model, achieving performance on par with contemporary models from the “big” frontier labs.

More recently, when I talk to friends at these frontier labs, I see them boasting about how many tokens they burn every day, while other friends at FinTech companies or startups wince at the limited token budgets they have to manage. From my own experience at a previous employer, I remember having to triage tasks into “AI or not-to-AI.” At times I would leave a coding task unfinished mid-way, only to continue the next day after my token budget had replenished. These places have the “necessity” to innovate on token efficiency, but they lack the skills or the business objective to pursue it.

Ironically, the frontier AI labs have the skills to innovate, but they have “unlimited token access.” It may not even be in their best interest to optimize token usage, since tokens are their core business engine. So the context window keeps expanding. I don’t have any data to prove it, but if I had to guess, I would say AI tokens easily account for 2-3% of US internet traffic from the top 10 tech companies.

I truly believe there is a “need” for innovation around efficient token usage to truly democratize AI access. For agentic workflows, we already have a distributed setup in which multiple agents use different backend models. I wonder: could we employ a similar distributed setup for a single inference request? In this model, I would host some open-source models locally (on-prem/on-device) and use them to serve part of the inference request, reaching out to the “big” models only for complex or critical tasks, as in the sketch below.
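Here is a minimal sketch of what such a router could look like. It assumes a local OpenAI-compatible endpoint (Ollama exposes one on port 11434 by default) serving an open-source model, plus a hosted frontier model for escalation; the `looks_complex` heuristic and the model names are illustrative placeholders, not a prescription.

```python
# A minimal routing sketch: serve cheap requests from a local open-source
# model, escalate only complex/critical ones to a hosted frontier model.
# Assumptions: Ollama (or any OpenAI-compatible server) running locally,
# and OPENAI_API_KEY set in the environment for the remote client.

from openai import OpenAI

# Local open-source model (on-prem/on-device).
local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
# Hosted "big" model; the client reads OPENAI_API_KEY from the environment.
remote = OpenAI()


def looks_complex(prompt: str) -> bool:
    """Hypothetical heuristic: escalate long or code-heavy prompts.

    In practice this could be a small classifier, or a confidence check
    on the local model's own draft answer.
    """
    return len(prompt) > 2000 or "refactor" in prompt.lower()


def route(prompt: str) -> str:
    """Answer locally when possible; spend remote tokens only when needed."""
    if looks_complex(prompt):
        client, model = remote, "gpt-4o"       # frontier model (example name)
    else:
        client, model = local, "llama3.1:8b"   # local model (example name)
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


print(route("Summarize this paragraph in one sentence: ..."))
```

Even a crude router like this shifts the bulk of everyday requests off the metered frontier API, so the paid token budget is reserved for the requests that genuinely need it.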

In short, we need motivation, incentives, and dedicated effort to address the problem of optimized token usage.