Curated by THEOUTPOST
On Wed, 11 Dec, 12:07 AM UTC
2 Sources
[1]
Generative AI app testing platform Gentrace raises $8M to make LLM development more accessible - SiliconANGLE
Gentrace, a developer platform for testing and monitoring artificial intelligence applications, said today it has raised $8 million in an early-stage funding round led by Matrix Partners to expand its large language model testing product beyond engineering teams. Today's Series A round attracted participation from Headline and K9 Ventures and brings the company's total raised to date to more than $14 million.

Founded in 2023, Gentrace offers a testing and monitoring product that allows non-technical users to participate in the evaluation, testing and monitoring of AI applications. According to the company, as many industries rush to add generative AI to their offerings, development teams face the challenge of ensuring it remains reliable and safe. At the same time, the ability to evaluate and test large language models remains largely the domain of development and engineering teams, making it difficult to collaborate with product managers, subject matter experts, designers and quality assurance teams.

"Generative AI represents a paradigm shift in software development, but the reality is there's way too much noise and not enough signal on how to test and build them easily or correctly," said Doug Safreno, co-founder and chief executive of Gentrace. "We're not just creating another dev tool -- we're reimagining how entire organizations can collaborate and build better LLM products."

To help tackle this challenge, Gentrace announced Experiments, a tool that allows cross-functional teams to collaborate in purpose-built testing environments to assess AI model performance. Teams can test AI outputs directly, preview test outcomes, anticipate errors and explore scenarios while exchanging data and information freely between technical and non-technical members. The company's platform and Experiments interface with many existing tools and model providers, including OpenAI, vector database Pinecone Systems Inc. and visual LLM programming environment Rivet.

Early adopters, including companies such as Webflow and Quizlet, used the platform to help predict AI-related issues before they affected users. According to Quizlet, implementing Gentrace's platform increased its testing frequency from twice a month to more than 20 times per week, significantly improving its speed of iteration.

"Gentrace was the right product for us because it allowed us to implement our custom evaluations, which was crucial for our unique use cases," said Madeline Gilbert, a staff machine learning engineer at Quizlet. As an education technology company providing study tools for students and teachers, Quizlet uses generative AI to turn unstructured notes and materials into study tools. According to Gilbert, even minor changes, such as a comma in a prompt, could significantly change the predictability of models. Gentrace's solution allowed quality assurance teams and subject matter experts to evaluate and test quickly after any modification. "It's dramatically improved our ability to predict the impact of even small changes in our LLM implementations," said Gilbert.
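The comma example above is essentially a prompt-regression check: re-run a fixed set of inputs whenever the prompt changes and compare the results. A minimal sketch of that idea follows; it assumes an OpenAI-compatible Python client, and the model name, prompt templates, and sample inputs are illustrative placeholders, not Gentrace's or Quizlet's actual setup.

```python
# Minimal prompt-regression sketch: run the same inputs through two prompt
# variants and report how many outputs changed. Assumes an OpenAI-compatible
# client; the model name, templates, and inputs are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

OLD_TEMPLATE = "Turn these notes into three study questions: {notes}"
NEW_TEMPLATE = "Turn these notes into three study questions, {notes}"  # the "comma" change

FIXED_INPUTS = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "The French Revolution began in 1789 and ended in 1799.",
]

def run(template: str, notes: str) -> str:
    """Call the model once with the given prompt template and input."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,  # keep generation as deterministic as possible for comparison
        messages=[{"role": "user", "content": template.format(notes=notes)}],
    )
    return (resp.choices[0].message.content or "").strip()

if __name__ == "__main__":
    changed = 0
    for notes in FIXED_INPUTS:
        before, after = run(OLD_TEMPLATE, notes), run(NEW_TEMPLATE, notes)
        if before != after:
            changed += 1
    print(f"{changed}/{len(FIXED_INPUTS)} outputs changed after the prompt edit")
```

In practice, outputs are rarely compared as raw strings, since generation is nondeterministic; teams typically compare evaluator scores or structured properties of the output instead.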
[2]
Gentrace makes it easier for businesses to test AI-powered software
As businesses continue to integrate generative AI into their products, many find it challenging to actually test whether the AI is behaving correctly and giving useful answers. To help address this problem, a startup called Gentrace offers an integrated platform for testing software built around large language models.

Whereas traditional software can be subjected to automated tests to verify that, say, data submitted to a web form ended up properly formatted in a database, AI-powered software often can't be expected to behave in an exactly specified way in response to input, says Gentrace cofounder and CEO Doug Safreno. Customers can end up defining a set of test data to run through the AI after any changes to the AI model, the databases it interacts with, or other parameters. But without a testing platform, running those tests can mean maintaining spreadsheets of AI test prompts and manually logging whether they give satisfactory results. And while some automation is possible, such as verifying that an AI response contains certain keywords or even asking another AI system to confirm that a response looks satisfactory, complex testing often requires engineers to be heavily involved, even though other team members, like product managers, might know better what good output looks like, Safreno says.

"The problem becomes, nobody can look at it and collaborate on these tests and on these evaluation methods," he says. "As new product requirements come in, they're not being captured in the testing."
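The two automation styles Safreno describes, checking for expected keywords and asking a second model to grade an answer (often called LLM-as-judge), can be sketched in a few lines. This assumes an OpenAI-compatible Python client; the model names, prompts, and keywords are illustrative assumptions rather than anything from Gentrace's platform.

```python
# Sketch of two simple automated checks for LLM output: a keyword check and an
# LLM-as-judge check. Assumes an OpenAI-compatible client; models, prompts, and
# keywords below are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A tiny "test set": each prompt plus the keywords a good answer should mention.
TEST_CASES = [
    {"prompt": "Summarize the water cycle for a 10-year-old.",
     "keywords": ["evaporation", "condensation", "precipitation"]},
    {"prompt": "Explain what a prime number is in one sentence.",
     "keywords": ["divisible"]},
]

def generate(prompt: str) -> str:
    """The application under test: here just a single model call."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content or ""

def keyword_check(output: str, keywords: list[str]) -> bool:
    """Cheapest evaluator: does the answer mention the expected terms?"""
    return all(k.lower() in output.lower() for k in keywords)

def judge_check(prompt: str, output: str) -> bool:
    """LLM-as-judge evaluator: ask a second model whether the answer looks satisfactory."""
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                "Question:\n" + prompt + "\n\nAnswer:\n" + output +
                "\n\nIs this answer correct and helpful? Reply with only YES or NO."
            ),
        }],
    )
    return "YES" in (verdict.choices[0].message.content or "").upper()

if __name__ == "__main__":
    for case in TEST_CASES:
        out = generate(case["prompt"])
        print(case["prompt"][:40],
              "| keywords:", keyword_check(out, case["keywords"]),
              "| judge:", judge_check(case["prompt"], out))
```

Keyword checks are cheap but brittle, while a judge model handles open-ended answers at the cost of another API call and its own error rate; platforms like Gentrace exist partly so non-engineers can define and review such checks without writing this glue code themselves.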
Gentrace, a platform for testing and monitoring AI applications, has secured $8 million in Series A funding to expand its LLM testing capabilities beyond engineering teams, making AI development more accessible and collaborative.
Gentrace, a developer platform specializing in testing and monitoring artificial intelligence applications, has successfully raised $8 million in a Series A funding round. The investment was led by Matrix Partners, with participation from Headline and K9 Ventures, bringing the company's total funding to over $14 million [1]. This significant financial boost aims to expand Gentrace's large language model (LLM) testing product capabilities beyond engineering teams, making AI development more accessible and collaborative.
As industries rush to incorporate generative AI into their offerings, development teams face the critical challenge of ensuring AI applications remain reliable and safe. Traditionally, the evaluation and testing of large language models have been primarily confined to development and engineering teams, creating barriers for collaboration with other crucial stakeholders such as product managers, subject matter experts, designers, and quality assurance teams [1].
Doug Safreno, co-founder and CEO of Gentrace, emphasized the paradigm shift that generative AI represents in software development, stating, "We're not just creating another dev tool -- we're reimagining how entire organizations can collaborate and build better LLM products" [1].
To address these challenges, Gentrace has unveiled Experiments, a tool designed to facilitate cross-functional team collaboration in purpose-built testing environments. The tool allows teams to test AI outputs directly, preview test outcomes, anticipate errors, and explore scenarios while freely exchanging data between technical and non-technical members [1].
Experiments seamlessly integrates with existing tools and model providers, including OpenAI, Pinecone Systems Inc., and Rivet, enhancing its versatility and applicability across different AI development ecosystems [1].
Early adopters of Gentrace's platform, including companies like Webflow and Quizlet, have reported significant improvements in their AI development processes. Quizlet, an education technology company, experienced a dramatic increase in testing frequency, jumping from twice a month to over 20 times per week [1].
Madeline Gilbert, a staff machine learning engineer at Quizlet, praised Gentrace's solution, stating, "It's dramatically improved our ability to predict the impact of even small changes in our LLM implementations" [1]. The platform's flexibility in implementing custom evaluations proved crucial for Quizlet's unique use cases, particularly in creating AI-generated study tools from unstructured notes and materials.
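A "custom evaluation" in this sense is simply a domain-specific check applied to model output. As a hypothetical illustration in the spirit of the study-tool use case (not Quizlet's or Gentrace's actual code), a flashcard generator might be required to return valid JSON containing non-empty question/answer pairs, which a small evaluator can verify:

```python
# Hypothetical custom evaluator in the spirit of the study-tool use case
# described above (not Quizlet's or Gentrace's actual code): verify that a
# model's flashcard output is valid JSON with non-empty question/answer pairs.
import json

def evaluate_flashcards(raw_output: str, min_cards: int = 3) -> dict:
    """Return a small pass/fail report for one model output."""
    try:
        cards = json.loads(raw_output)
    except json.JSONDecodeError:
        return {"passed": False, "reason": "output is not valid JSON"}

    if not isinstance(cards, list) or len(cards) < min_cards:
        return {"passed": False, "reason": f"expected a list of at least {min_cards} cards"}

    for i, card in enumerate(cards):
        if not isinstance(card, dict):
            return {"passed": False, "reason": f"card {i} is not an object"}
        if not str(card.get("question", "")).strip() or not str(card.get("answer", "")).strip():
            return {"passed": False, "reason": f"card {i} has an empty question or answer"}

    return {"passed": True, "reason": "ok"}

if __name__ == "__main__":
    sample = (
        '[{"question": "What year did the French Revolution begin?", "answer": "1789"},'
        ' {"question": "What does photosynthesis produce?", "answer": "Chemical energy"},'
        ' {"question": "Who wrote Hamlet?", "answer": "Shakespeare"}]'
    )
    print(evaluate_flashcards(sample))
```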
Gentrace's platform addresses a critical need in the AI development landscape by providing a user-friendly interface that allows non-technical team members to participate in the evaluation, testing, and monitoring of AI applications [2]. This approach helps overcome the limitations of traditional software testing methods, which often fall short when applied to AI-powered systems.
Doug Safreno highlighted the challenges of maintaining spreadsheets of AI test prompts and manually logging results, emphasizing the need for a more streamlined and collaborative approach to AI testing [2]. By enabling product managers and other non-technical stakeholders to actively participate in the testing process, Gentrace aims to capture evolving product requirements more effectively and improve the overall quality of AI-powered software.
As businesses continue to integrate generative AI into their products, platforms like Gentrace are poised to play a crucial role in ensuring the reliability, safety, and effectiveness of AI applications across various industries.