Terbium Labs Relies on MapR Technology to Discover Stolen Data on the Dark Web

The Business

Terbium Labs is a security startup offering technology that proactively discovers when information stolen from companies shows up on hidden criminal websites.

By registering fingerprints of companies’ most valuable data and comparing them with fingerprints gathered from across the internet, Terbium’s Matchlight system can automatically and immediately alert companies when their data appears in unexpected places online, including the dark web.

The Challenge

The theft of sensitive data costs global industry over $445 billion each year [1]. Even the most robust security can’t stop today’s sophisticated attackers and insider threats. The average data breach takes over 200 days to discover, giving adversaries months or even years to exploit a security incident.

“We started with the thesis that defensive technologies are no longer sufficient for organizations to protect their data,” explains Terbium Labs CEO Danny Rogers. “Companies with sensitive data need to shift from a purely defensive posture to a proactive one, planning for inevitable breaches and how to discover and remedy any breaches as quickly as possible.”

MapR Solution

When Terbium Labs was developing its data intelligence solution, it evaluated several distributions of Hadoop. The team built a small cluster for development before moving to a cloud provider. “We started with Cloudera and then tried other distributions,” explains Terbium Labs CTO Michael Moore. “All of the Java-heavy Hadoop distributions we tried broke. You name it, we tried it, and none of them worked.”

The Terbium Matchlight solution operates on all types of digital assets, from source code to documents, resulting in datasets that are extremely complex. This challenging data environment means that Terbium needs a Hadoop platform that is more efficient, more stable, and more reliable than the traditional, Java-based open source distributions.

“We’re a very technical company. We push the limits of the technology,” continues Moore. “We respect really good technology when we see it, so we were excited to try MapR.” Terbium chose the MapR Distribution including Apache Hadoop to deploy its data fingerprinting technology across a cloud-scale analytic platform.


More Stable, Efficient and Reliable
Since MapR technology is implemented in native code rather than through a Java virtual machine, its Hadoop distribution is significantly more resource efficient. “When other distributions add that much more overhead, everything becomes more difficult and that much more unstable,” says Moore. “This means our per-fingerprint storage and processing cost with MapR is significantly lower than with any other Hadoop implementation.”

“Given the volume, scale and speeds at which we use it, MapR is the only choice that can handle the kinds of things we put the system through,” says Moore. “MapR is the only choice. It’s that good. Nothing else works.”

Manages Complexity and Scalability
Terbium’s fingerprinting technology has multiple layers of complexity. “The indexing we do is computationally complex because we store everything as fingerprints. We never store raw information,” explains Moore. Once generated, fingerprints can be matched without accessing, storing, or modifying the original data.
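
To make the idea of matching on fingerprints rather than raw data concrete, here is a minimal sketch. It is not Terbium’s actual algorithm, which the article does not describe. It assumes a simple scheme in which overlapping word windows are hashed with SHA-256; the function names, the window size, and the sample strings are all hypothetical.

```python
import hashlib


def fingerprints(text: str, window: int = 5) -> set[str]:
    """Hash every overlapping `window`-word span of `text` (illustrative scheme only)."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < window:
        spans = [" ".join(words)]
    else:
        spans = [" ".join(words[i:i + window]) for i in range(len(words) - window + 1)]
    # Only the digests are kept; the raw spans are discarded.
    return {hashlib.sha256(span.encode("utf-8")).hexdigest() for span in spans}


def match_ratio(registered: set[str], observed: set[str]) -> float:
    """Fraction of registered fingerprints that also appear in the observed set."""
    if not registered:
        return 0.0
    return len(registered & observed) / len(registered)


# Hypothetical data: a registered asset and two pages collected elsewhere.
registered = fingerprints("the quick brown fox jumps over the lazy dog")
leaked_page = fingerprints("leaked: the quick brown fox jumps over the lazy dog today")
unrelated_page = fingerprints("an entirely different sentence about cloud storage pricing tiers")

leak_score = match_ratio(registered, leaked_page)
clean_score = match_ratio(registered, unrelated_page)
```

In this sketch the comparison touches only the hash digests, so the party doing the matching never needs the original text. A production system would need a far more robust fingerprinting scheme and an index that scales to billions of entries.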

The company’s fingerprint database is also growing rapidly. It contains 340 billion fingerprints today and is growing by ten to fifteen billion every day. “We will easily get into the trillions of fingerprints. The only way we can get into that scale is with MapR technology,” says Rogers. “We are only as good as the data we collect, and our ability to collect more data depends on this key piece of technology.”
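
As a rough check on the “trillions” projection, the figures quoted above can be run through some back-of-the-envelope arithmetic. The sketch assumes the daily growth rate stays constant at the quoted ten to fifteen billion, which is a simplification; the function and variable names are illustrative.

```python
def days_to_reach(target: int, current: int, per_day: int) -> int:
    """Whole days needed to grow from `current` to `target` at a constant daily rate."""
    remaining = max(target - current, 0)
    return -(-remaining // per_day)  # ceiling division on integers


CURRENT_FINGERPRINTS = 340_000_000_000   # 340 billion, as quoted in the article
ONE_TRILLION = 1_000_000_000_000

# Bounds of the quoted growth range: ten to fifteen billion per day.
days_at_fast_rate = days_to_reach(ONE_TRILLION, CURRENT_FINGERPRINTS, 15_000_000_000)
days_at_slow_rate = days_to_reach(ONE_TRILLION, CURRENT_FINGERPRINTS, 10_000_000_000)

print(f"One trillion fingerprints in roughly {days_at_fast_rate} to {days_at_slow_rate} days")
```

Under these assumptions the database would pass one trillion fingerprints in about six to ten weeks.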

Early Results Look Promising
The startup is piloting its technology with half a dozen customers at Fortune 500 companies in financial services, healthcare, manufacturing, and technology. Early results from these pilots look promising. In a single day, Matchlight identified 30,000 new stolen credit cards and 6,000 newly compromised email addresses for sale on the dark web.

Terbium Solution Not Possible Without MapR
The team’s vision for the company is bold. “We want to shut down the market for stolen data,” says Rogers. “We want to be an insurance policy for companies to bring the time to detect breaches down from 230 days to fifteen minutes, so that all of the stolen data can be identified and neutralized.”

Rogers says MapR technology is essential to building their business. “MapR literally makes or breaks our product. We can only do what we can do because MapR and these big data technologies exist. If they didn’t exist, we wouldn’t be able to do what we do.”