Experian uses MapR to store and process more financial data faster, providing clients with enhanced insights and services.
Experian is an international information services organization with global revenues of $4.8 billion and 16,000 employees. The company has four primary business lines: credit services, decision analytics, direct-to-consumer products, and marketing services.
The rapid growth of the credit reference industry and of the market for credit risk management services has made the business increasingly reliant on ever larger amounts of consumer and business data. The result has been an explosion of big data, the data that is Experian's lifeblood.
As the company added new products and new sets of data quality rules, more data had to be processed in the same or less time. It was time to upgrade. But simply adding to the existing Windows/SAN system was too cumbersome and expensive.
Experian chose the MapR Distribution including Hadoop to move beyond the constraints of its in-house database while increasing processing power, lowering costs, and finding new ways to store data so that it is more easily accessible.
Experian's Data Development Technology Group upgraded to a six-node, Linux-based HPC cluster. "We have a single customer solution right now. But as we get new customers who can use this kind of capability, we can add additional nodes and storage and processing capacity at the same time," says Tom Thomas, director of the group, which is part of the Consumer Services Division.
"Our first solution includes well-known and defined metrics and aggregations. We leverage DMX-h from Syncsort to determine metrics for each record and pre-aggregate other metrics, which are then stored in Hadoop to be used in downstream analytics as well as real-time, rules-based actions," he says. "Our second solution follows a traditional data operations flow, except in this case we use DMX-h to prepare inbound source data that is then stored in the MapR Distribution including Hadoop. Then we run Experian-proprietary models that read the data via Hive and create client-specific and industry-unique results."
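Syncsort DMX-h and Experian's models are proprietary, so the following is not their code. As a rough, plain-Python illustration of the "per-record metrics plus pre-aggregation" pattern Thomas describes in the first solution, the sketch below computes one metric for every record and then rolls it up into a summary. The record fields, the utilization metric, and the per-region rollup are hypothetical choices made for the example; in the pipeline described here, both outputs would be stored in Hadoop rather than kept in memory.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical record layout; field names are illustrative only.
    consumer_id: str
    region: str
    balance: float
    credit_limit: float


def record_metrics(acct: Account) -> dict:
    """Per-record step: compute a metric (credit utilization) for one record."""
    utilization = acct.balance / acct.credit_limit if acct.credit_limit else 0.0
    return {"consumer_id": acct.consumer_id, "region": acct.region,
            "utilization": utilization}


def pre_aggregate(rows) -> dict:
    """Pre-aggregation step: roll per-record metrics up to per-region summaries."""
    totals = defaultdict(lambda: [0, 0.0])  # region -> [record count, utilization sum]
    for row in rows:
        bucket = totals[row["region"]]
        bucket[0] += 1
        bucket[1] += row["utilization"]
    return {region: {"accounts": n, "avg_utilization": s / n}
            for region, (n, s) in totals.items()}


# Usage with made-up sample records.
accounts = [
    Account("c1", "east", 500.0, 1000.0),
    Account("c2", "east", 250.0, 1000.0),
    Account("c3", "west", 0.0, 0.0),
]
per_record = [record_metrics(a) for a in accounts]
summary = pre_aggregate(per_record)
print(summary)
```

Keeping the per-record output alongside the summary mirrors the split described in the quote: detailed metrics remain available for downstream analytics, while the pre-aggregated figures are cheap to look up when a rules-based action needs them.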
By running Hadoop on MapR, the Experian team gained the following benefits:
Flexible data access with NFS
MapR is the only distribution for Hadoop that leverages the full power of NFS for remote access to shared disks across the network. “All our solutions leverage MapR NFS functionality,” Thomas continues. “This allows us to transition from our previous internal or SAN storage to Hadoop by mounting the cluster directly. In turn, this provides us with access to the data via HDFS and Hadoop environment tools, such as Hive.”
More storage and increased processing speed
"We are realizing increased processing speed, which leads to shorter delivery times. In addition, reduced storage expenses mean that we can store more, not acquire less. Both the company's internal operations and our clients have access to deeper data supporting and aiding insights into their business areas," he says.
Faster time to market for the business units
"Overall, we are seeing reduced storage expenses while gaining processing and storage capabilities and capacities. This translates into an improved speed to market for our business units. It also positions our group to grow our Hadoop ecosystem to meet future big data requirements," he says.