Kolena and why we need industrial-strength tools for AI testing

Published on Sep 26, 2023

What’s the best AI model for your use case? Will it work without hallucinating? These questions plague every enterprise trying to adopt AI.

This stems from a huge shift in how we build software. The first wave of machine learning tooling was about in-house experimentation and training, and for good reason: the workflow was complex and third-party infrastructure tooling was insufficient. Traditional ML work focused on model architecture and training hyperparameters, all driving toward consistent performance on a company’s specific product.

[Illustration: the traditional ML workflow]

Foundation models changed everything

Zero-shot learning has turned ML product development into machine-learning-as-a-service. Enterprises can license and integrate an AI model without having to build a full-fledged MLOps and tooling stack. But which model should they choose? They now need a rigorous testing solution to compare the models on the market and validate the best one for their use cases.

[Illustration: the traditional machine learning pipeline]

Without a methodical way to verify performance, adopting foundation models carries significant risks: factuality, jailbreaking, privacy, hallucinations, and more. This is why the modern ML stack is test-driven. It’s all about gathering training data, fine-tuning models, and then checking that they consistently work as expected.

Kolena testing tools catalyze AI adoption

Kolena turns AI model comparison, testing, and validation into a science instead of a haphazard art. It lets developers build AI systems that are safe, reliable, and fair. By meticulously assessing every dimension of AI models and their data, Kolena takes a highly granular approach: unit testing for machine learning. It ensures that AI models undergo rigorous testing at the scenario level before deployment to users, significantly reducing risk to the business. That kind of peace of mind will catalyze adoption by bigger businesses, regulated sectors, and industries like healthcare that have no margin for error.
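To make “unit testing for machine learning” concrete, here is a minimal pytest-style sketch of scenario-level gating. It is a generic illustration, not Kolena’s actual API; the scenario names, filters, thresholds, and the model and dataset fixtures are all assumptions made for the example.

```python
# A generic sketch of "unit testing for ML" in plain pytest: score the
# model on one scenario (data slice) at a time, each with its own bar,
# instead of relying on a single aggregate metric. The scenario filters,
# thresholds, and the `model`/`dataset` fixtures are all hypothetical.
import pytest

SCENARIOS = {
    # scenario name -> (metadata filter, minimum acceptable accuracy)
    "night_images": (lambda ex: ex["lighting"] == "night", 0.90),
    "occluded_pedestrians": (lambda ex: ex["occlusion"] > 0.5, 0.85),
    "rainy_weather": (lambda ex: ex["weather"] == "rain", 0.88),
}

def accuracy(model, examples):
    """Fraction of examples the model labels correctly."""
    correct = sum(model.predict(ex["input"]) == ex["label"] for ex in examples)
    return correct / len(examples)

@pytest.mark.parametrize("name", sorted(SCENARIOS))
def test_scenario(name, model, dataset):
    # `model` and `dataset` are assumed pytest fixtures defined elsewhere.
    filter_fn, threshold = SCENARIOS[name]
    subset = [ex for ex in dataset if filter_fn(ex)]
    assert subset, f"scenario {name!r} has no test data"
    score = accuracy(model, subset)
    # Any single regressed scenario fails the build, even if the overall
    # test-set average still looks healthy.
    assert score >= threshold, f"{name}: accuracy {score:.3f} < {threshold}"
```

The design point is that each scenario behaves like a unit test: a release is blocked by its weakest slice, not by its average.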

When we led Kolena’s seed round, we spoke with many practitioners across the industry. There was a clear need, and no existing tooling, for building reliable, enterprise-ready AI products. One leader we spoke with at a well-known Silicon Valley company told us, “We do this using aggregate metrics currently, which leaves huge blind spots in our model validation process.” This is why customers were eager to use Kolena before the product was even built. They were dedicating significant headcount to ad hoc testing and were in such dire need of tooling that they considered building proprietary dashboards. Building in-house proved too complex, with big questions around data segmentation, testing diversity, and applying perturbations. Most enterprises preferred to buy rather than build, and now they have Kolena.
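The blind spots that aggregate metrics hide are easy to demonstrate. Here is a small, self-contained Python sketch with invented numbers: overall accuracy looks healthy at about 90%, while one data segment quietly fails almost half the time.

```python
# Why aggregate metrics leave blind spots: a model can score well overall
# while failing badly on one segment. All numbers below are invented.
from collections import defaultdict

results = [
    # (segment, correct?) -- predictions grouped by image condition
    *[("daytime", True)] * 940, *[("daytime", False)] * 60,
    *[("night", True)] * 55, *[("night", False)] * 45,
]

overall = sum(ok for _, ok in results) / len(results)
print(f"aggregate accuracy: {overall:.1%}")  # ~90.5% -- looks healthy

by_segment = defaultdict(list)
for segment, ok in results:
    by_segment[segment].append(ok)

for segment, oks in sorted(by_segment.items()):
    print(f"{segment}: {sum(oks) / len(oks):.1%}")
# daytime: 94.0%, night: 55.0% -- the failure the aggregate number hides
```

Slicing the same predictions by metadata is exactly the segmentation work teams were hand-rolling into dashboards, and it only gets harder as the number of scenarios, metrics, and perturbations grows.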

[Illustration: the flow for testing model performance]

As an AI-native venture fund that builds and tests its own models for investment sourcing and portfolio recruiting, SignalFire saw the need for Kolena early on and led its $6M seed round. We’ve since used our Beacon AI data platform to assist Kolena with commercial, technical, and leadership recruiting searches, and data pulls of potential customer lists. Now we’re excited to back its $15M Series A led by our friend David Hornik at Lobby Capital.

Kolena is fixing the broken testing workflow that even leading AI organizations like OpenAI still handle manually, wasting time and risking mistakes. Manual testing is tedious, time-consuming, and hard to manage at scale without the proper tooling. By giving time, energy, headcount, and assurance back to enterprises, Kolena is unlocking the next stage of AI adoption.

*Portfolio company founders listed above have not received any compensation for this feedback and did not invest in a SignalFire fund. Please refer to our disclosures page for additional disclosures.
