Kolena and why we need industrial-strength tools for AI testing

Published on Sep 26, 2023

What’s the best AI model for your use case? Will it work without hallucinating? These questions plague every enterprise trying to adopt AI.

This stems from a huge shift in how we build software. The first wave of machine learning tooling was about in-house experimentation and training, and for good reason: the workflow was complex and third-party infrastructure tooling was insufficient. Traditional ML work focused on model architecture and training hyperparameters, all driving toward consistent performance on a company’s specific product.

[Illustration: the traditional ML workflow]

Foundation models changed everything

Zero-shot learning has turned ML product development into machine-learning-as-a-service. Enterprises can license and integrate an AI model without having to build a full-fledged MLOps and tooling stack. But which model should they choose? They now need a rigorous testing solution to compare the models on the market and validate the best one for their use cases.

[Illustration: the traditional machine learning pipeline]

Without a methodical way to verify performance, adopting foundation models carries significant risks: factuality, jailbreaking, privacy, hallucinations, and more. This is why the modern ML stack is test-driven. It’s all about gathering training data, fine-tuning models, and then checking that they consistently work as expected.

Kolena testing tools catalyze AI adoption

Kolena turns AI model comparison, testing, and validation into a science instead of a haphazard art. It lets developers build AI systems that are safe, reliable, and fair. By meticulously assessing every dimension of AI models and their data, Kolena takes a highly granular approach: unit testing for machine learning. It ensures that AI models undergo rigorous testing at the scenario level before deployment to users, significantly reducing risk to the business. That kind of peace of mind will catalyze adoption by bigger businesses, regulated sectors, and industries like healthcare that have no margin for error.
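To make “unit testing for machine learning” concrete, here is a minimal pytest-style sketch of scenario-level gating. It is a generic illustration, not Kolena’s actual API; the scenario names, filters, thresholds, and the model and dataset fixtures are all assumptions made for the example.

```python
# A generic sketch of "unit testing for ML" in plain pytest: score the
# model on one scenario (data slice) at a time, each with its own bar,
# instead of relying on a single aggregate metric. The scenario filters,
# thresholds, and the `model`/`dataset` fixtures are all hypothetical.
import pytest

SCENARIOS = {
    # scenario name -> (metadata filter, minimum acceptable accuracy)
    "night_images": (lambda ex: ex["lighting"] == "night", 0.90),
    "occluded_pedestrians": (lambda ex: ex["occlusion"] > 0.5, 0.85),
    "rainy_weather": (lambda ex: ex["weather"] == "rain", 0.88),
}

def accuracy(model, examples):
    """Fraction of examples the model labels correctly."""
    correct = sum(model.predict(ex["input"]) == ex["label"] for ex in examples)
    return correct / len(examples)

@pytest.mark.parametrize("name", sorted(SCENARIOS))
def test_scenario(name, model, dataset):
    # `model` and `dataset` are assumed pytest fixtures defined elsewhere.
    filter_fn, threshold = SCENARIOS[name]
    subset = [ex for ex in dataset if filter_fn(ex)]
    assert subset, f"scenario {name!r} has no test data"
    score = accuracy(model, subset)
    # Any single regressed scenario fails the build, even if the overall
    # test-set average still looks healthy.
    assert score >= threshold, f"{name}: accuracy {score:.3f} < {threshold}"
```

The design point is that each scenario behaves like a unit test: a release is blocked by its weakest slice, not by its average.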

When we led Kolena’s seed round, we spoke with many practitioners across the industry. There was a clear need, and no existing tooling, for building reliable, enterprise-ready AI products. One leader we spoke with at a well-known Silicon Valley company told us, “We do this using aggregate metrics currently, which leaves huge blind spots in our model validation process.” This is why customers were eager to use Kolena before the product was even built. They were dedicating significant headcount to ad hoc testing and were in such dire need of tooling that they considered building proprietary dashboards. Building in-house proved too complex, with big questions around data segmentation, testing diversity, and applying perturbations. Most enterprises preferred to buy rather than build, and now they have Kolena.
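The blind spots that aggregate metrics hide are easy to demonstrate. Here is a small, self-contained Python sketch with invented numbers: overall accuracy looks healthy at about 90%, while one data segment quietly fails almost half the time.

```python
# Why aggregate metrics leave blind spots: a model can score well overall
# while failing badly on one segment. All numbers below are invented.
from collections import defaultdict

results = [
    # (segment, correct?) -- predictions grouped by image condition
    *[("daytime", True)] * 940, *[("daytime", False)] * 60,
    *[("night", True)] * 55, *[("night", False)] * 45,
]

overall = sum(ok for _, ok in results) / len(results)
print(f"aggregate accuracy: {overall:.1%}")  # ~90.5% -- looks healthy

by_segment = defaultdict(list)
for segment, ok in results:
    by_segment[segment].append(ok)

for segment, oks in sorted(by_segment.items()):
    print(f"{segment}: {sum(oks) / len(oks):.1%}")
# daytime: 94.0%, night: 55.0% -- the failure the aggregate number hides
```

Slicing the same predictions by metadata is exactly the segmentation work teams were hand-rolling into dashboards, and it only gets harder as the number of scenarios, metrics, and perturbations grows.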

[Illustration: the flow for testing model performance]

As an AI-native venture fund that builds and tests its own models for investment sourcing and portfolio recruiting, SignalFire saw the need for Kolena early on and led its $6M seed round. We’ve since used our Beacon AI data platform to assist Kolena with commercial, technical, and leadership recruiting searches, and data pulls of potential customer lists. Now we’re excited to back its $15M Series A led by our friend David Hornik at Lobby Capital.

Kolena is fixing the broken testing workflow that even leading AI organizations like OpenAI still handle manually, wasting time and risking mistakes. Manual testing is tedious, time-consuming, and hard to manage at scale without the proper tooling. By giving time, energy, headcount, and assurance back to enterprises, Kolena is unlocking the next stage of AI adoption.

*Portfolio company founders listed above have not received any compensation for this feedback and did not invest in a SignalFire fund. Please refer to our disclosures page for additional disclosures.
