I’ve spent the last few months in LA talking to a mix of VCs, founders, screenwriters, and directors about the current moment in AI and what makes it different from the past. The following is influenced both by those conversations and by “The Emerging Capabilities of Generative AI,” a discussion I took part in with James Cham from Bloomberg Beta at Fika Ventures last month.
The truth is, whether you’re aware of it or not, you’ve probably been using AI for years. In my case, I first heard about AI while working for Stanford Management Company. It was 2014, and the whole campus seemed to be talking about this exciting new thing called Computer Vision.
At the time, the conversation focused on two names: Andrew Ng and Fei-Fei Li, groundbreaking research scientists who were showing us how large datasets could help models recognize images and objects. What fascinated us back then was realizing how the relatively new computing platforms of Mobile and Cloud were underpinning the emergence of Big Data and, with it, the ability to train AI algorithms on large visual datasets.
Four years after Stanford, I joined Google, where I learned about “The Unreasonable Effectiveness of Data,” the paper Peter Norvig co-authored with Alon Halevy and Fernando Pereira. It argued that rudimentary algorithms trained on far more data could match or beat more sophisticated algorithms trained on less. It became clear to me then that data - whether it’s images, tweets, medical scans, click-through rates, or legal documents - is pivotal to training models. Data is what models have used to produce the exact type of AI that has defined our last twenty years as a society: prediction.
BIG DATA, SEARCH ENGINES, AND PREDICTION MACHINES: How we’ve been living with traditional AI
Whether as consumers or as employees, we’ve all used prediction machines. From 2013 to 2021, we saw the emergence of “Business Intelligence” or “Data Analytics.” If used correctly, AI can allow professionals to manipulate data, whether it’s about their product, service, or company, to produce their own predictions.
On the consumer side, Netflix, Amazon, and YouTube showed us how AI could predict what a viewer wanted to watch next, changing consumer habits forever. I call these products search engines applied to verticals. What we saw over the last decade was that AI didn’t exactly bring intelligence to the market, but achieved what Ajay Agrawal, Joshua Gans, and Avi Goldfarb summarized in 2018 as a “critical component of intelligence: prediction” (Prediction Machines: The Simple Economics of Artificial Intelligence).
While these prediction machines were reaching new academic and industry milestones, Google acquired DeepMind in 2014. Suddenly, the conversation changed to Superintelligence, AGI, and how models could become far smarter than the human brain. With AGI, we could use artificial intelligence to generate something that never existed before.
This was nine years ago, and at the time it felt like this future reality was far, far away. It wasn’t. Today, it seems we’ve already solved AGI and are now just playing in the margins.
WELCOME TO THE GENERATIVE ERA
We’re currently moving from a prediction era to a generative era. In 2014, the exercise was to clean and annotate data, train a model on it, and then decide whether an image showed a cat or a dog. Now we’re collaborating with Transformer models and creating a dog in the style of Matisse in the South of France.
As Sonya Huang and Pat Grady wrote in Sequoia’s 2022 report “Generative AI: A Creative New World,” with the help of GPT-3, up until recently, machines “were relegated to analysis and rote cognitive labor. But machines are just starting to get good at creating sensical and beautiful things. This new category is called ‘Generative AI,’ meaning the machine is generating something new rather than analyzing something that already exists.”
The key difference is that traditional AI is about prediction, while AGI is about accessing a set of skills that normally require decades of experience and mastery. AGI is the pursuit of one multi-modal model that understands image, voice, video, text, and action in any language, form, or format. At its best, AGI is a superior collaborator, asking the user to steer and refine the final output.
While traditional AI enabled the Information Age, AGI is about reducing the barriers to entry for highly skilled labor - whether it’s in the form of illustration, literature, medicine, law, science, or programming.
What comes next, after AI and AGI, is Superintelligence. The challenge is that even the people leading this effort towards AGI have not reached a consensus on what AGI means. As a society, we need to consider AGI as technology that can achieve things that are otherwise impossible, rather than a replacement for a human with a job. The future of our society depends on this distinction.
WE’RE IN A CORPORATE ARMS RACE
With the rise of AGI, a corporate arms race is currently underway to solve Superintelligence first. Research labs and corporate entities are competing over access to data, compute, and talent. We’re watching Google and Microsoft pit DeepMind against OpenAI to create their own Superintelligence by training AGI models on the world’s data, employing millions of annotators to provide feedback on the models’ mistakes. In the process, they’re spending billions on hardware, datasets, and talent to achieve a singular model that reflects a collective intelligence.
At the same time, we’re also witnessing the capabilities of the latest private AGI models become inexpensive or free in the open source community. Last week, a leaked Google memo argued what most of us already knew, which is that the open source community is on track to outcompete both Google and OpenAI. Just last month, Stanford researchers released Alpaca, their own version of ChatGPT, built for less than $600. The researchers took Meta’s LLaMA model and cheaply fine-tuned it on inputs and outputs from OpenAI’s text-davinci-003 model.
This means that capabilities approaching those of the most advanced models in the world can now be reproduced in the open for a fraction of the cost.
HOW THIS EARTHQUAKE IS FELT BY ALL
Everyone can now build new products, services, and companies that harness AGI as an expert skill set. The first wave of tooling is beginning to bridge the gap between an information-rich world and a knowledge-first environment.
And we’re already seeing products blend multimodal workflows for specialists. Runway helps video editors edit their videos through natural language. Typeface helps marketing execs create ads through prompts. These tools do a few tasks really well, but remain siloed. Most users still rely on antiquated systems to complete their day’s work, and use new products built on models tuned with RLHF for only a few specific tasks. As AGI research advances, so will our opportunity to rethink how we work, and the tools we need to streamline our processes.
WHERE THINGS ARE GOING: What could the future look like?
Research models are becoming increasingly good at completing tasks on a user’s behalf. Models like Adept’s ACT-1 show us how AGI can understand a user request and execute it, while ChatGPT Plugins illustrate how an LLM can reason and act on the user’s behalf on the internet.
Academia is beginning to see early evidence that a model can believably simulate human behavior (Stanford’s Generative Agents: Interactive Simulacra of Human Behavior paper, April 2023).
Developer tools like LangChain, meanwhile, enable LLM agents to be both data-aware and semi-autonomous, and LlamaIndex enables LLM agents to more proactively manage their data infrastructure.
All three signs - whether with research models, academic papers, or developer tools - indicate that soon models will be acting on our behalf, simulating human behavior to plan, reason, and execute tasks, whilst developer tools make it far easier to deploy these models at scale.
In my next letter, I’ll talk about how to develop products that rely on generative AI as a collaborative partner. We need to see this technology as altering not only the speed at which we work, but the teams we hire, the businesses we develop, and the ideas we now have the time to explore.