When data gets structured, value emerges. We’ve seen it over and over. Google structured web links into PageRank. Facebook structured your social graph into content ranking. Tesla is turning footage of city streets into navigation algorithms. Documents are another near-infinite naturally occurring resource of unstructured data. Docugami can become a generational technology company by distilling what’s inside.
85% of enterprise information is trapped as unstructured “dark data.” That’s why Jean Paoli is finishing the job he started more than 20 years ago when he co-created XML for the World Wide Web Consortium. As Paoli himself admits, he is “obsessed with documents,” and has devoted his career to unlocking the potential of unstructured data. It’s what drove much of his work at Microsoft, where he helped create four billion-dollar businesses and spearheaded the effort to add a data layer to Office documents. 
Now, with his new startup Docugami, he’s built customizable AI that will enable businesses to convert their unstructured information into data and put it to use.
“Only 15% of enterprise data is in a database. Everything else is a big mess,” explains Paoli. But by bridging the languages of humans and computers with Docugami, “We can literally change how information flows across the enterprise.”
That vision of a common tongue for data and Paoli’s extraordinary founder-market fit convinced us at SignalFire to lead Docugami’s seed round. “Docugami can make the world’s unstructured document text (in PDFs, contracts, and Word files) structured, organized, and accessible for a myriad of use cases,” SignalFire’s CTO and former Googler Ilya Kirnos says, echoing his former employer’s tagline.
SignalFire’s Beacon data engine first spotted Docugami in 2018 when it detected an uncanny level of engineering talent at the new startup. “I remember this company came up in our Beacon Alerts meeting, and I go to their homepage and under Jean’s career highlights it lists him as a co-creator of XML, and I thought ‘Sure, yeah. And I’m the emperor of Atlantis,’” Kirnos remembers with a laugh. Turns out it was true. The other four co-founders all had deep domain expertise too. Andrew Begun, Taqi Jaffri, Mike Palmer, and Martin Sawicki were responsible for shipping critical parts of Word, Outlook, Office, and other products used by hundreds of millions of people.
“You can’t think of a more canonical team to go after this document problem.”

The Father Of Semi-Structured Data

It feels a little bit like destiny. Paoli was born in Lebanon, home of the Phoenicians who invented the alphabet, and grew up speaking French, which he credits with his appreciation of the written word despite becoming an engineer. Even as a child, he was drawn to seeing and applying structure to the world, building an enormous electric abacus out of a bed’s headboard. 
Paoli rose through the French computer science institute Inria and studied with Gilles Kahn, the famed researcher and pioneer of the semantics of programming languages. “I was always the engineer working with scientists,” Paoli remembers from his time leading two Inria-incubated startups. That’s when he got the first taste of the problem that would define his career. “There are text and documents that humans understand, but there’s this other thing called data that computers understand. Why are these different and how can we bridge that divide?”

Docugami CEO Jean Paoli speaks at O’Reilly OSCON 2010. Image via James Duncan Davidson via O’Reilly Media 

Then Paoli got the big call. Bill Gates was staffing up Microsoft to embrace the web, and the company wanted him on the nascent Internet Explorer team. He joined under one condition. “I’m going to work on something that moves data on the internet, not on how the data looks.” In 1996, for the W3C, he co-created the Extensible Markup Language (XML), a format that’s both human-readable and machine-readable. That led to the X at the end of the DOCX, PPTX, and XLSX file formats we use today, augmenting these documents with a structured data layer.
But while this effort started to dig the wells to the crude information buried inside of documents, someone still needed to extract and refine it into semantic data that can fuel a business. Over the course of his Microsoft career, Paoli built four businesses that each grew to over $1 billion in revenue. In his final role, he led Microsoft Open Technologies, accelerating open source technologies and paving the way for more than 60% of Microsoft Azure usage today.
After 20 years at Microsoft, Paoli recognized that advances in AI and cloud infrastructure had finally reached the point where he could use them to solve the “document dysfunction” problem he had been pursuing his entire career. He left Microsoft to start his seventh business, Docugami, and build the missing algorithms and services that would transform documents into data.

How Docugami works

Docugami combines deep learning, natural language processing, Bayesian, evolutionary, and other AI techniques with declarative markup approaches to scan and understand business documents in any file format. It can examine a large group of documents, categorize them by type and function, and identify common and unique elements. From there, Docugami’s AI can recognize, catalog, and analyze items across documents such as:
  • Payments
  • Dates
  • Contract terms
  • Patterns
  • Deadlines
  • Disparities
  • Relationships between terms and clauses
Docugami then generates automatic reports and summaries, helps you author new documents, and feeds the data into your other software like CRMs, robotic process automation (RPA) tools, or analytics dashboards.
For example, Docugami can help a bank compare terms across its entire loan portfolio, a government agency identify agreements that need auditing due to regulatory change, a real estate firm track millions of dollars in contractual obligations, or a health clinic simplify the process of doctors creating patient notes. 
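To make the bank example concrete, here is a minimal sketch of what “turning documents into data” means in practice. It is not Docugami’s method: it swaps the AI techniques described above for a few hand-written regular expressions, and the sample loan texts, the `LoanRecord` type, and the `extract` helper are all hypothetical names invented for illustration.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical sample documents; real contracts are far messier.
LOANS = {
    "loan_a.txt": "The Borrower shall pay $12,500.00 on or before March 1, 2021. "
                  "Interest accrues at 4.5% per annum.",
    "loan_b.txt": "Payment of $8,000 is due by July 15, 2021. "
                  "Interest accrues at 6% per annum.",
}

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hand-written patterns standing in for a learned model (an assumption).
MONEY = re.compile(r"\$(\d[\d,]*(?:\.\d{2})?)")
DATE = re.compile(r"(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4})")
RATE = re.compile(r"(\d+(?:\.\d+)?)% per annum")


@dataclass
class LoanRecord:
    source: str
    amount: float
    due: date
    rate: float


def extract(source: str, text: str) -> LoanRecord:
    """Pull one payment amount, due date, and interest rate out of free text."""
    money, when, rate = MONEY.search(text), DATE.search(text), RATE.search(text)
    if not (money and when and rate):
        raise ValueError(f"could not find all fields in {source}")
    return LoanRecord(
        source=source,
        amount=float(money.group(1).replace(",", "")),
        due=date(int(when.group(3)), MONTHS.index(when.group(1)) + 1,
                 int(when.group(2))),
        rate=float(rate.group(1)),
    )


# Once the text is structured, comparing terms across the portfolio is a query.
records = [extract(name, text) for name, text in LOANS.items()]
highest = max(records, key=lambda r: r.rate)
print(f"Highest rate: {highest.rate}% in {highest.source}")
```

The patterns break as soon as a contract phrases a term differently, which is exactly the gap a system that learns from each business’s own documents is meant to close.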
In an open letter to the software industry, Paoli writes that “We envision a world where AI helps people construct documents that are engineered for maximum data reuse from the start, fostering human creativity and unlocking billions of dollars in increased efficiency, improved compliance, and business insights for companies around the world.”

Docugami is focused on mid- to large-sized businesses in legal, finance, consulting, human resources, and real estate that live and die by contracts. Users can upload partnership deals, service agreements, NDAs, leases, loans, RFPs, proposals, and all manner of sales contracts. Docugami then lets their teams and technology understand exactly what’s inside and helps them create new documents. The technology can be customized in 30 minutes to understand the nuances of a particular business and can easily scale to other verticals such as healthcare or manufacturing. Docugami protects privacy by never applying AI learnings from one customer’s documents to another customer’s.
After being founded in March 2018, the startup was processing real customer data sets within months thanks to the credibility of the team. Currently, Docugami has dozens of businesses and organizations using its private beta, across a diverse array of industry sectors, including professional services, construction, real estate, law, health care, finance, and many more. Docugami’s document engineering solution is available in private beta today (sign up here for early access) ahead of its public self-serve launch in 2021.

SignalFire digs in with Docugami

“When I entered, I saw dozens of people actually working, doing stuff in a very nicely disorganized way with a bunch of tables and whiteboards. I thought, ‘Whoa, this looks like how I work,’” Paoli recalls from his first visit to SignalFire’s San Francisco office in 2019. Encouraged by a recommendation from SignalFire portfolio company Grammarly, Paoli took the meeting.
Another factor stood out for Paoli at his first sit-down with SignalFire. “Finally, somebody asked me a question that wasn’t ‘how much ARR (annual recurring revenue) do you have?’ or ‘Why doesn’t this screen have a nicer design?’”
Instead, the first question Paoli received was “Does this thing work?” followed shortly by “Well, can we try it?” from Kirnos and our AI Ph.D. Adam Vogel. Since roughly a quarter of our team at SignalFire are data scientists and programmers building out our Beacon Recruiting and Market Data engines that we provide to our portfolio, we actually have engineering bays (or we did before going fully remote due to COVID-19).
“It was awesome. It reminded me a lot of Microsoft. I did not need to repeat what we were trying to do 15 times,” Paoli remembers. “You know, it felt like a very familiar place, like ‘Okay, I’m talking to techies.’”

Docugami co-founders Andrew Begun, Taqi Jaffri, Martin Sawicki, and Mike Palmer (inset). The whole staff could fit in a tiny conference room in 2018. Today, the company has grown to nearly 30 engineers, scientists, and business leaders.

 

SignalFire’s engineering bay

In fact, Vogel and Paoli were geeking out together before they’d even met. After Beacon first sniffed out Docugami, Vogel tried the demo and started asking about their named entity recognizer and how the algorithm was trained. He tells me Docugami’s team responded, “‘What? You opened that? Nobody ever goes there!’ And I was like, ‘What are you talking about? These are the goods! The rest is just UI!’”
Paoli says, “Adam’s role was critical! He went and tested the stuff, he tried it with his own documents. If he found bugs, which you would expect with a pre-seed startup, he went around them. He was not scared about it. We had these experiences with a few other VCs telling us ‘Oh your dialog box doesn’t work.’ Really? Are you serious? I’m showing you an AI algorithm and you’re telling me the dialog box doesn’t work,” he says with a grin. Kirnos beams, “We’re proud to come across as more entrepreneurial and more like builders versus asset managers or financiers.”
Beyond the code, it’s been a pleasure to support a legend like Paoli. We’ve used our Beacon Recruiting tech and talent team to help fill several of Docugami’s engineering roles. In just six months after receiving their seed round from SignalFire, Docugami grew from 8 to 30 employees, hiring senior engineering, scientific, sales, and operational leaders, and continues to hire with a commitment to both excellence and diversity. Paoli and his team have joined 18 of our expert council events on topics like enterprise influencer marketing and freemium strategies. And our executive briefing team has introduced Docugami to sales leads inside and outside of our portfolio that have turned into pilot deals.

For a fund full of engineers, it’s easy to get excited about the potential of Docugami. “There are a lot of office workers who have this mind-crushing work of data extraction where they’re doing not quite the same thing over and over,” Vogel explains. But Docugami can learn and be customized to automate what otherwise couldn’t be. “I really think this could be a liberating technology for people with crap office jobs.”
“You can take all the data that lives inside a PDF that isn’t recorded anywhere else, no one knows how to find, and no one knows how to aggregate, and all of a sudden you can run a query on this unstructured data as easily as if it was a spreadsheet,” Kirnos says.
“The market is just absolutely ginormous because every company has contracts,” Kirnos explains. “At Google, I saw the power of making information organized and accessible. Now I see it with Docugami.”


Josh Constine is a Principal investor and Head of Content for SignalFire. He focuses on early stage consumer startups, advising portfolio companies on PR, and writing to inspire and educate founders. Constine was previously Editor-At-Large for TechCrunch, where he spent 8 years writing 4000 blog posts about young startups and social giants. He's led 200 on-stage talks around the world and was ranked the #1 most cited tech journalist by Techmeme from 2016 to 2020.