[1]
How Databricks is using synthetic data to simplify evaluation of AI agents
Enterprises are going all in on compound AI agents. They want these systems to reason and handle different tasks across different domains, but they are often stifled by the complex and time-consuming process of evaluating agent performance. Today, data ecosystem leader Databricks announced synthetic data capabilities to make this a tad easier for developers. The move, according to the company, will allow developers to generate high-quality artificial datasets within their workflows to evaluate the performance of in-development agentic systems. This will save them unnecessary back-and-forth with subject matter experts and bring agents to production more quickly. While it remains to be seen how exactly the synthetic data offering will work for enterprises using the Databricks Data Intelligence Platform, the Ali Ghodsi-led company claims that its internal tests show it can significantly improve agent performance across various metrics.

Databricks' play for evaluating AI agents

Databricks acquired MosaicML last year and has fully integrated the company's technology and models across its Data Intelligence Platform to give enterprises everything they need to build, deploy and evaluate machine learning (ML) and generative AI solutions using their data hosted in the company's lakehouse. Part of this work has revolved around helping teams build compound AI systems that can not only reason and respond with accuracy but also take actions such as opening and closing support tickets, responding to emails and making reservations. To this end, the company unveiled a whole new suite of Mosaic AI capabilities this year, including support for fine-tuning foundation models, a catalog for AI tools, and offerings for building and evaluating AI agents: Mosaic AI Agent Framework and Agent Evaluation. Today, the company is expanding Agent Evaluation with a new synthetic data generation API.

So far, Agent Evaluation has provided enterprises with two key capabilities. The first enables users and subject matter experts (SMEs) to manually define datasets with relevant questions and answers, creating a yardstick of sorts to rate the quality of answers provided by AI agents. The second enables the SMEs to use this yardstick to assess the agent and provide feedback (labels). This is backed by AI judges that automatically log human responses and feedback in a table and rate the agent's quality on metrics such as accuracy and harmfulness.

This approach works, but building evaluation datasets takes a lot of time. The reasons are easy to imagine: domain experts are not always available, the process is manual, and users often struggle to identify the most relevant questions and answers to provide "golden" examples of successful interactions. This is exactly where the synthetic data generation API comes into play, enabling developers to create high-quality evaluation datasets for preliminary assessment in a matter of minutes. It reduces the work of SMEs to final validation and fast-tracks iterative development, letting developers explore on their own how changes to the system (tuning models, changing retrieval or adding tools) alter quality. The company ran internal tests to see how datasets generated from the API can help evaluate and improve agents, and noted significant improvements across various metrics.
"We asked a researcher to use the synthetic data to evaluate and improve an agent's performance and then evaluated the resulting agent using the human-curated data," Eric Peter, AI platform and product leader at Databricks, told VentureBeat. "The results showed that across various metrics, the agent's performance improved significantly. For instance, we observed a nearly 2X increase in the agent's ability to find relevant documents (as measured by recall@10). Additionally, we saw improvements in the overall correctness of the agent's responses." How does it stand out? While there are plenty of tools that can generate synthetic datasets for evaluation, Databricks' offering stands out with its tight integration with Mosaic AI Agentic Evaluation -- meaning developers building on the company's platform don't have to leave their workflows. Peter noted that creating a dataset with the new API is a four-step process. Devs just have to parse their documents (saving them as a Delta Table in their lakehouse), pass the Delta Table to the synthetic data API, run the evaluation with the generated data and view the quality results. In contrast, using an external tool would mean several additional steps, including running (extract, transform and load (ETL) to move the parsed documents to an external environment that could run the synthetic data generation process; moving the generated data back to the Databricks platform; then transforming it to a format accepted by Agent Evaluation. Only after this can evaluation be executed. "We knew companies needed a turnkey API that was simple to use -- one line of code to generate data," Peter explained. "We also saw that many solutions on the market were offering simple open-source prompts that aren't tuned for quality. With this in mind, we made a significant investment in the quality of the generated data while still allowing developers to tune the pipeline for their unique enterprise requirements via a prompt-like interface. Finally, we knew most existing offerings needed to be imported into existing workflows, adding unnecessary complexity to the process. Instead, we built an SDK that was tightly integrated with the Databricks Data Intelligence Platform and Mosaic AI Agent Evaluation capabilities." Multiple enterprises using Databricks are already taking advantage of the synthetic data API as part of a private preview, and report a significant reduction in the time taken to improve the quality of their agents and deploy them into production. One of these customers, Chris Nishnick, director of artificial intelligence at Lippert, said their teams were able to use the API's data to improve relative model response quality by 60%, even before involving experts. More agent-centric capabilities in pipeline As the next step, the company plans to expand Mosaic AI Agent Evaluation with features to help domain experts modify the synthetic data for further accuracy as well as tools to manage its lifecycle. "In our preview, we learned that customers want several additional capabilities," said Peter. "First, they want a user interface for their domain experts to review and edit the synthetic evaluation data. Second, they want a way to govern and manage the lifecycle of their evaluation set in order to track changes and make updates from the domain expert review of the data instantly available to developers. To address these challenges, we are already testing several features with customers that we plan to launch early next year." 
Broadly, the developments are expected to boost adoption of Databricks' Mosaic AI offering, further strengthening the company's position as the go-to vendor for all things data and gen AI. But Snowflake is also catching up in the category and has made a series of product announcements, including a model partnership with Anthropic, for its Cortex AI product that allows enterprises to build gen AI apps. Earlier this year, Snowflake also acquired observability startup TruEra to provide AI application monitoring capabilities within Cortex.
[2]
Databricks introduces new API for generating synthetic datasets - SiliconANGLE
Databricks Inc. today introduced an application programming interface that customers can use to generate synthetic data for their machine learning projects. The API is available in Mosaic AI Agent Evaluation, a tool that the company offers as part of its flagship data lakehouse. The tool helps developers compare the output quality, cost and latency of artificial intelligence applications. Mosaic AI Agent Evaluation rolled out in June alongside Mosaic AI Agent Framework, which eases the task of implementing retrieval-augmented generation.

Synthetic data is information generated with the help of an AI for the sole purpose of neural network development. Creating training datasets in this manner is considerably faster and more cost-efficient than assembling them manually. Databricks' new API is geared toward generating question-and-answer collections, which are useful for developing applications powered by large language models.

Creating a dataset with the API is a three-step process. Developers must first upload a frame, or file collection, with business information relevant to the task their AI application will perform. Frames must be in a format supported by Apache Spark or Pandas. Spark is the open-source data processing engine that underpins Databricks' platform, while Pandas is a popular analytics tool for the Python programming language. After uploading the sample data, developers must specify the number of questions and answers the API should generate. They can optionally provide additional instructions to customize the API's output: a software team may specify the style in which the questions should be generated, the task for which they will be used and the end users who will interact with the AI application.

Inaccurate training data can reduce the quality of an AI model's output. As a result, companies often have subject matter experts review a synthetic dataset for errors before feeding it to a neural network. Databricks says it developed the API in a manner that eases this part of the workflow. "Importantly, the generated synthetic answer is a set of facts that are required to answer the question rather than a response written by the LLM," Databricks engineers detailed in a blog post today. "This approach has the distinct benefit of making it faster for an SME to review and edit these facts vs. a full, generated response."

Databricks plans to release several enhancements for the API early next year. A new graphical interface will enable dataset reviewers to more quickly check question-answer pairs for errors and add more pairs if necessary. Additionally, Databricks will add a tool for tracking how a company's synthetic datasets change over time.
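The article's description of the inputs (a Pandas- or Spark-compatible frame) and outputs (a question plus the set of facts required to answer it) suggests a shape like the following sketch. This is illustrative rather than an official API reference; the column names, import path and output schema are assumptions based on the text.

```python
import pandas as pd
from databricks.agents.evals import generate_evals_df  # import path per Databricks' examples; may vary

# The input "frame": business documents relevant to the application's task.
# Column names here are assumptions based on the article's description.
docs = pd.DataFrame({
    "content": [
        "The Model X widget ships with a two-year limited warranty.",
        "Returns are accepted within 30 days with proof of purchase.",
    ],
    "doc_uri": ["kb/warranty.md", "kb/returns.md"],
})

# Specify how many question/answer pairs to generate, plus optional
# instructions covering style, task and end users, as the article describes.
evals = generate_evals_df(
    docs,
    num_evals=10,
    question_guidelines=(
        "Persona: a retail customer. "
        "Style: short, direct questions about warranties and returns."
    ),
)

# Per the quoted blog post, each synthetic answer is a set of required facts
# rather than a fully written LLM response, e.g. (hypothetical record):
#   question:       "How long is the warranty on the Model X widget?"
#   expected_facts: ["The warranty lasts two years", "The warranty is limited"]
print(evals.head())
```

The facts-based answer format is the design choice the blog post highlights: an SME can confirm or strike individual facts far faster than they can proofread a full generated paragraph.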
Databricks introduces a new API for generating synthetic datasets, aimed at simplifying and accelerating the evaluation of AI agents. The capability promises to streamline developer workflows and improve agent performance across key metrics.
Databricks, a leader in the data ecosystem, has announced a new synthetic data generation API as part of its Mosaic AI Agent Evaluation tool. This innovation aims to simplify and accelerate the process of evaluating AI agents, addressing a significant challenge faced by enterprises developing compound AI systems 1.
Enterprises are increasingly adopting compound AI agents capable of reasoning and handling diverse tasks across different domains. However, the evaluation of these agents' performance has been a complex and time-consuming process, often requiring extensive involvement from subject matter experts (SMEs) 1.
The new synthetic data generation API allows developers to create high-quality artificial datasets within their workflows. This capability significantly reduces the time and effort required to evaluate in-development agentic systems 1.
Key features of the API include:
- Tight integration with Mosaic AI Agent Evaluation, so developers never have to leave their Databricks workflows
- A turnkey interface requiring as little as one line of code to generate data
- A generation pipeline tuned for data quality, with a prompt-like interface for adapting output to enterprise requirements
The process of creating a dataset with the new API involves four steps:
1. Parse the source documents and save them as a Delta Table in the lakehouse
2. Pass the Delta Table to the synthetic data API
3. Run the evaluation with the generated data
4. View the quality results
This streamlined approach eliminates the need for complex ETL processes and data transfers between different environments 1.
Databricks' internal tests have shown significant improvements in agent performance across various metrics when using the synthetic data for evaluation and improvement. Notable results include:
- A nearly 2X increase in the agent's ability to find relevant documents, as measured by recall@10
- Improvements in the overall correctness of the agent's responses
- A 60% improvement in relative model response quality reported by preview customer Lippert, even before involving experts
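For readers unfamiliar with the metric cited above, recall@10 measures what fraction of the documents relevant to a query appear among the agent's top 10 retrieved results. A small, self-contained illustration of the metric follows; this is a generic definition, not Databricks' implementation.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top-k retrieved results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

# Example: 2 of 3 relevant documents surface in the top 10 -> recall@10 ~= 0.67.
print(recall_at_k(["d7", "d2", "d9", "d4"], {"d2", "d4", "d5"}))
```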
Databricks plans to release several enhancements to the API early next year, including:
- A user interface for domain experts to review and edit synthetic evaluation data, and to add question-answer pairs where necessary
- Tools to govern and manage the lifecycle of evaluation sets, tracking how they change over time and making expert updates instantly available to developers
These upcoming features are expected to further streamline the evaluation process and improve the overall quality of AI agent development.
Reference
1. How Databricks is using synthetic data to simplify evaluation of AI agents, VentureBeat