At SignalFire, we’re not only investing in AI companies; we’re also building an AI-driven venture firm day to day. Artificial intelligence and machine learning have been at the core of our operations for more than ten years.
When we started in 2013, the firm set out to create a machine learning–powered talent platform to help our investment team spot high-potential founders and companies, and to help our portfolio companies improve their recruiting efforts. Since then, we’ve woven data into the fabric of how we operate as a firm, giving rise to our in-house platform, Beacon AI. Alongside daily fine-tuning through reinforcement learning from human feedback by our investment, talent, and data teams, we’ve expanded Beacon to incorporate LLMs over the last year.
- SignalFire has been building AI internally to help our investment and talent teams for a decade. Now we’re integrating LLMs into our Beacon AI data platform to speed up workflows and surface more accurate and comprehensive results for company and people searches.
- Applying GPT lets us surface concise summaries of the companies we’re evaluating and of similar companies that might be competitors, and it allows us to route high-potential companies to the right investment team members.
- LLMs let us unify different job titles that mean the same thing to expand results for recruiting searches, programmatically understand gaps in teams, and provide more sales leads with contact info to portfolio companies.
- Experimentation is critical to learn where LLMs perform best, when to use off-the-shelf models vs. in-house models, and how to minimize costs as we continue to pioneer new approaches to using AI in venture.
Beacon: SignalFire’s homebrewed AI
The Beacon AI platform does quite a few things for the firm. Before we dive into how we're utilizing LLMs, let’s cover what Beacon AI is in the first place, and why it’s transformative for VC.
At its core, Beacon AI is a massive data platform that tracks and ranks more than 80 million companies, 600 million people, and millions of open-source projects. To power it, we rely on 40 different datasets to help us paint a holistic picture of each company, person, and project. We then rely on a handful of proprietary machine learning algorithms developed by our team over the past decade for ranking.
AI for investment sourcing
But what does a platform that ranks people, companies, and open-source talent do for the firm? First, it helps us research every deal that we do. From understanding a startup’s talent ranking to the market the company serves, we can apply a wide swath of quantitative learnings to every company and founding team we consider. This helps us ask more relevant questions and make more principled decisions.
In addition to research, we use the platform to proactively recommend founders, companies, and open-source projects that our investors should contact. By tracking people's mobility and quantifying some of the essential aspects of teams, companies, and projects, we can identify opportunities for investment before they become available on the market, giving our investment team a sourcing superpower. If a company is hiring high-quality engineers or gaining momentum on GitHub, Beacon will surface it to make sure we’ve taken a look.
AI for portfolio success
We also use the platform to support our portfolio companies. Most venture capital firms try to assist their portfolios through human-centric services alone, which doesn’t scale as their portfolio grows. The alternative is for portfolio support to wane in efficacy, which also isn't optimal. So, while SignalFire does provide human-centric Portfolio Success as a Service, we also believe in augmenting and scaling our ability to serve people via software.
Beacon AI helps us power a talent search tool that our portfolio companies can use for free to find the best talent as they build their teams, including people who aren’t officially looking for work but who our AI suggests might be open to new opportunities. If someone has worked at the best companies in their industry or is a prolific GitHub contributor, but their current company is bleeding talent, we know to recommend them to our portfolio companies. For example, our portfolio company EvenUp has used Beacon extensively for recruiting as it rapidly scales: they recently sourced 100 top candidates for their first product designer role, leading to two finalists and a great hire.
Finally, Beacon helps our portfolio companies find prospective customers. It can cross-reference their ideal customer profile with our talent search engine to provide them with lists of sales leads, complete with contact info.
Rather than ceasing support for earlier investments or companies still finding product-market fit, building scalable AI lets us help our entire portfolio as it grows. Now, we’re adding in LLMs to make our Beacon AI even more powerful.
Dos and don’ts of AI for venture
Applied improperly, LLMs can hallucinate inaccurate information and destroy trust in relationships. That’s why we’ve been closely monitoring the development of LLMs in recent years to determine how they can be safely applied to venture.
We’ve unfortunately heard of some firms using LLMs to automate outreach to startups. We see this as a disrespectful waste of founders’ time. If an investor wants to learn more about a startup, they should do their homework and then reach out to the founders personally rather than secretly use AI to spam them.
Using LLMs to summarize pitch decks or memos so deal partners don’t have to read them in full is another questionable use case we’ve heard of. There are so many nuances to a pitch and business plan that could easily get lost or misinterpreted. You can use AI to efficiently scan the landscape for startups, but once you’re evaluating their vision, human experience and expertise are critical to avoiding costly mistakes.
Applying LLMs at SignalFire
A much more respectful approach is to use LLMs to optimize internal data gathering rather than decision-making and communication. AI is best for understanding the world at a scale humans can’t. That’s why over the past year we’ve been using LLMs to enrich our existing Beacon AI datasets and route what’s most important to our investors and portfolio companies. This breaks down into two core categories:
For our investment team
- Finding similar companies: Knowing all the companies in a given space is critical to making holistic investment decisions. For example, we may want to find every company that’s building tools to solve the healthcare labor shortage, regardless of whether they focus on nurses or doctors, sourcing or retention, or health systems or practitioners. LLMs allow us to analyze company websites and descriptions to find competitors and alternative approaches to a startup we’re considering, build market maps, and know how crowded a given market is.
- Simplifying company descriptions: By using LLMs to summarize and strip jargon from information scattered across a company’s site and documentation, we can provide our investors a quicker way to triage recommendations from Beacon about which startups to potentially talk to.
- Improving company sector tagging: LLMs let us more accurately label which sector a company is in (e.g., cybersecurity or healthcare). That way, we can route it to the investor with expertise in that market for the most informed decision.
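To make the idea of finding similar companies from their descriptions concrete, here is a minimal sketch in Python. It is not SignalFire's implementation: the `embed` function is a toy bag-of-words stand-in for a real LLM embedding, and the company names and descriptions are invented for illustration. It only shows the general pattern of turning descriptions into vectors and ranking candidates by cosine similarity.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an LLM embedding: a bag-of-words count vector (assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(target: str, candidates: dict[str, str]) -> list[tuple[str, float]]:
    """Rank candidate companies by description similarity to the target, highest first."""
    target_vec = embed(target)
    scored = [
        (name, cosine_similarity(target_vec, embed(description)))
        for name, description in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical companies and descriptions, invented for illustration.
candidates = {
    "NurseStaffCo": "staffing marketplace that helps hospitals hire nurses",
    "DocRetain": "retention software that helps hospitals keep doctors",
    "RideShareX": "mobile app for booking rides in cities",
}
ranked = rank_similar("platform that helps hospitals hire nurses and doctors", candidates)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

With a real embedding model in place of `embed`, the same ranking step would apply. A text score like this is best treated as one signal among several, since description similarity alone does not capture business model or customer segment.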
For our portfolio
- Recruiting across disparate job titles: To recommend the best talent for portfolio companies, we need to understand who has experience in a given role. Job titles aren’t consistent from company to company. For example, “product managers” at Meta are the same as “program managers” at Microsoft. Many job titles also have unique modifiers like “mid-market” or “SMB” for go-to-market roles. In Beacon AI’s talent platform, we now use LLMs to classify and condense roles to make talent searches simpler for our users. Instead of searching through every permutation of job titles, you can select “product” to encapsulate all product jobs.
- Providing accurate sales leads: We’re using LLMs to give our portfolio companies better lists of contact information for people matching their ideal customer profiles. By analyzing the size, industry, business model, job titles, and hierarchy of potential customers, we can share which are most likely to be high-value buyers with short sales cycles.
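The job-title unification described above can be sketched as a two-step normalization: strip modifiers, then map what remains to a canonical role family. The sketch below uses a hand-written lookup table as a stand-in for the LLM classifier. The modifier list, the role table, and the function name `normalize_title` are all hypothetical and chosen only to mirror the examples in the text.

```python
import re

# Hypothetical modifier list and role table, invented for illustration.
# In the approach described above, an LLM would do this classification instead.
MODIFIERS = {"senior", "junior", "lead", "staff", "principal", "mid-market", "smb", "enterprise"}
CANONICAL_ROLES = {
    "product manager": "product",
    "program manager": "product",
    "account executive": "sales",
    "software engineer": "engineering",
}


def normalize_title(title: str) -> str:
    """Strip known modifiers, then map the remaining title to a role family."""
    words = [
        word
        for word in re.split(r"[\s,]+", title.lower().strip())
        if word and word not in MODIFIERS
    ]
    core_title = " ".join(words)
    # Titles not found in the table fall back to a catch-all bucket.
    return CANONICAL_ROLES.get(core_title, "other")


for raw_title in ["Senior Product Manager", "Program Manager", "Mid-Market Account Executive"]:
    print(raw_title, "->", normalize_title(raw_title))
```

A static table like this breaks down quickly on unseen titles, which is presumably where an LLM classifier earns its keep. The output shape would be the same: many raw titles collapsing into a small set of searchable role families.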
By building, using, and investing in AI every day, we’ve learned a lot about how the venture world should think about this technology. It’s not cheap, fast, or easy to spin up high-performance systems, so what’s most important is committing to a long-term experimentation cycle. Some of the other key takeaways from SignalFire’s experiences are:
- Off-the-shelf LLMs provide a remarkable tool for testing a concept, though they require lots of iteration before moving to production. What used to take months to test can now take just days. We’ve been able to go from hypothesis to test to production much more quickly by using GPT and other off-the-shelf models.
- Price is still a barrier for substantial projects. We use LLMs to classify a subset of the companies and people we track, not the entire corpus. If we were classifying all 600 million people and 80 million companies, our costs would skyrocket. That said, the prices for embeddings and other processes are racing to the bottom. LLMs today are far more affordable and practical than they were a year ago.
- The decision to use either in-house LLMs or off-the-shelf models requires consideration of privacy and data moats as well as product and data quality. Most technology is a double-edged sword, and off-the-shelf LLMs are no different. Despite the speed of execution they provide, data leakage is a risk. We limit our current use of LLMs to data that’s safe for ingestion, and will continue to weigh tradeoffs as we expand their usage to ensure security.
- Similar companies are still complex to identify with LLMs. That’s because much more goes into similarity than a company’s description. Sector, core customer segment, funding, go-to-market strategy, and more all contribute to “true” similarity. Determining Uber and Lyft’s similarity is pretty simple, but finding direct analogues in cybersecurity is much more complex because of the industry’s variety of subsectors and specializations. Two companies’ text descriptions might look very similar to an LLM and produce similar text embeddings, yet the companies could differ in business model, customers, and technology. Fortunately, there are huge benefits to being directionally good enough as we work toward comprehensive perfection.
- LLMs can mitigate data quality and access risks: None of the 40+ datasets we use is perfect. Data quality is our most significant product requirement, though, so we’re constantly looking for ways to improve it. For example, our tests found that LLMs had precision and recall of roughly 75% for company sector tagging. This is remarkably good model performance, even compared to the best human-curated multitagged datasets on the market. LLMs can enrich our sector data to fill gaps where the hand-curated service failed. We expect many data providers will move to a blended human + AI/ML tagging system, creating better coverage for us. It also means that if a data provider ceased operations or dropped in quality, we could use the LLM as a stopgap until we found a replacement.
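For readers unfamiliar with how precision and recall apply to multi-tag sector labeling, the following sketch shows one common way to compute them (micro-averaging across all company-tag pairs). The article does not say which averaging method was used, so treat this as an assumption. The function name and the toy data are hypothetical, and the data is deliberately constructed so both metrics come out to 0.75. It does not reproduce any real evaluation.

```python
def micro_precision_recall(
    predicted: dict[str, set[str]], truth: dict[str, set[str]]
) -> tuple[float, float]:
    """Micro-averaged precision and recall over every (company, tag) pair.

    Only companies present in `truth` are scored; extra predictions are ignored.
    """
    true_pos = false_pos = false_neg = 0
    for company, true_tags in truth.items():
        predicted_tags = predicted.get(company, set())
        true_pos += len(predicted_tags & true_tags)   # tags correctly predicted
        false_pos += len(predicted_tags - true_tags)  # tags predicted but wrong
        false_neg += len(true_tags - predicted_tags)  # tags that were missed
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Toy labels, invented for illustration only.
truth = {
    "CompanyA": {"cybersecurity"},
    "CompanyB": {"healthcare", "ai"},
    "CompanyC": {"fintech"},
}
predicted = {
    "CompanyA": {"cybersecurity"},
    "CompanyB": {"healthcare"},     # misses "ai" (a false negative)
    "CompanyC": {"fintech", "ai"},  # adds a wrong "ai" tag (a false positive)
}
precision, recall = micro_precision_recall(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision answers "of the tags the model assigned, how many were right?" and recall answers "of the correct tags, how many did the model find?". Reporting both matters for multi-tag data, where a model can score well on one by sacrificing the other.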
With AI and PhD data scientists, machine learning engineers, and experienced product managers on staff, it’s in SignalFire’s DNA to keep pushing the envelope with AI. Our team has a range of new experiments in the works that we’ll report back on soon. We all know AI will transform venture, and we’re excited to pioneer exactly how.
Learn more about our new AI program, the SignalFire AI Lab, here.