Retrieval augmented generation (RAG) is an important technique that pulls from external knowledge bases to help improve the quality of large language model (LLM) outputs. It also provides transparency into model sources that humans can cross-check.
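For readers who want to see the basic idea in code, here is a minimal sketch of naive RAG, written in plain Python rather than with any particular library. Everything in it is an assumption made for illustration: the `KNOWLEDGE_BASE` contents, the `STOPWORDS` list, the keyword-overlap scoring and the function names are hypothetical, and a production system would typically use embeddings and a vector store. The sketch retrieves the best-matching documents for a query and builds a prompt that carries source ids, so a human can cross-check where an answer came from.

```python
import re

# Hypothetical in-memory knowledge base; illustrative content only.
KNOWLEDGE_BASE = {
    "doc-1": "LlamaParse is a document parser for complex PDFs and tables.",
    "doc-2": "ETL pipelines extract, transform and load data on a schedule.",
    "doc-3": "Multi-agent systems split work across specialized agents.",
}

# Tiny illustrative stopword list so common words do not count as matches.
STOPWORDS = {"a", "the", "and", "is", "for", "on", "what", "does", "do"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(query, top_k=2):
    """Return ids of the documents sharing the most tokens with the query."""
    query_tokens = tokenize(query)
    scored = []
    for doc_id, text in KNOWLEDGE_BASE.items():
        overlap = len(query_tokens & tokenize(text))
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document id for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def build_prompt(query):
    """Assemble an LLM prompt that includes the retrieved text and its ids."""
    doc_ids = retrieve(query)
    context = "\n".join(f"[{d}] {KNOWLEDGE_BASE[d]}" for d in doc_ids)
    prompt = (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\n\nQuestion: {query}"
    )
    return prompt, doc_ids
```

Calling `build_prompt("What does a document parser do?")` returns a prompt containing the `[doc-1]` passage, which would then be sent to an LLM of your choice. The retrieval here is a single stateless lookup with no planning or tool use, which is the kind of setup Liu describes as naive.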
However, according to Jerry Liu, co-founder and CEO of LlamaIndex, basic RAG systems can have primitive interfaces and poor query understanding and planning. They also lack function calling or tool use and are stateless (with no memory). Data silos only exacerbate this problem. Liu spoke during VB Transform in San Francisco yesterday.
These shortcomings can make it difficult to productionize LLM apps at scale because of accuracy issues, scaling difficulties and too many required parameters, which call for deep technical expertise.
This means that there are many questions RAG simply can't answer.
"RAG was really just the beginning," Liu said onstage this week at VB Transform. Many core concepts of naive RAG are "kind of dumb" and make "very suboptimal decisions."
LlamaIndex aims to transcend these challenges by offering a platform that helps developers quickly and simply build next-generation LLM-powered apps. The framework offers data extraction that turns unstructured and semi-structured data into uniform, programmatically accessible formats; RAG that answers queries across internal data through question-answer systems and chatbots; and autonomous agents, Liu explained.
Synchronizing data so it's always fresh
It is critical to tie together all the different types of data within an enterprise, whether unstructured or structured, Liu noted. Multi-agent systems can then "tap into the wealth of heterogeneous data" that companies contain.
"Any LLM application is only as good as your data," said Liu. "If you don't have good data quality, you're not going to have good results."
LlamaCloud -- now available by waitlist -- features advanced extract, transform, load (ETL) capabilities. This allows developers to "synchronize data over time so it's always fresh," Liu explained. "When you ask a question, you're guaranteed to have the relevant context, no matter how complex or high level that question is."
LlamaIndex's interface can handle questions both simple and complex, as well as high-level research tasks, and outputs could include short answers, structured outputs or even research reports, he said.
The company's LlamaParse is an advanced document parser specifically aimed at reducing LLM hallucinations. Liu said it has 500,000 monthly downloads and 14,000 unique users, and has processed more than 13 million pages.
"LlamaParse is currently the best technology I have seen for parsing complex document structures for enterprise RAG pipelines," said Dean Barr, applied AI lead at global investment firm The Carlyle Group. "Its ability to preserve nested tables, extract challenging spatial layouts and images is key to maintaining data integrity in advanced RAG and agentic model building."
Liu explained that LlamaIndex's platform has been used in financial analyst assistance, centralized internet search, analytics dashboards for sensor data and internal LLM application development platforms, and in industries including technology, consulting, financial services and healthcare.
From simple agents to advanced multi-agent systems
Importantly, LlamaIndex layers on agentic reasoning to help provide better query understanding, planning and tool use over different data interfaces, Liu explained. It also incorporates multiple agents that offer specialization and parallelization, and that help optimize cost and reduce latency.
The issue with single-agent systems is that "the more stuff you try to cram into it, the more unreliable it becomes, even if the overall theoretical sophistication is higher," said Liu. Also, single agents can't solve infinite sets of tasks. "If you try to give an agent 10,000 tools, it doesn't really do very well."
Multi-agent systems let each agent specialize in a given task, he explained. They also bring systems-level benefits such as parallelization and improved cost and latency.
"The idea is that by working together and communicating, you can solve even higher-level tasks," said Liu.
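To make the specialization and parallelization points concrete, here is a small sketch using only Python's standard `asyncio` module. It is not LlamaIndex's implementation: the agent names (`search_agent`, `sql_agent`), the `AGENTS` registry and the `orchestrate` function are hypothetical, and the `asyncio.sleep` calls merely stand in for slow LLM or tool calls. Each agent handles one kind of task, and the orchestrator runs the subtasks concurrently so total latency is closer to the slowest agent than to the sum of all of them.

```python
import asyncio


async def search_agent(task):
    """Hypothetical specialist: pretends to run a web or document search."""
    await asyncio.sleep(0.1)  # stands in for a slow tool or LLM call
    return f"search results for {task!r}"


async def sql_agent(task):
    """Hypothetical specialist: pretends to query a structured database."""
    await asyncio.sleep(0.1)  # stands in for a slow tool or LLM call
    return f"query rows for {task!r}"


# Registry mapping a specialty name to the agent that handles it.
AGENTS = {"search": search_agent, "sql": sql_agent}


async def orchestrate(subtasks):
    """Dispatch each (agent_name, task) pair to its specialist, concurrently."""
    coroutines = [AGENTS[name](task) for name, task in subtasks]
    # gather() runs the coroutines concurrently and preserves input order.
    results = await asyncio.gather(*coroutines)
    return list(results)


def run(subtasks):
    """Synchronous entry point for scripts."""
    return asyncio.run(orchestrate(subtasks))
```

For example, `run([("search", "pricing news"), ("sql", "Q2 revenue")])` returns both answers in order after roughly one simulated call's worth of waiting rather than two. Keeping each agent's toolset small is one way to address the reliability problem Liu raises about cramming too much into a single agent.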