Lately I’ve talked to lots of people who are just wrapping their heads around the value of big data software such as Apache Hadoop, but who get stuck on the details. What kind of servers do I need to buy? What services do I need to install to make a “data lake”? How do I install those services so they are highly available and optimized for performance? How do I make sure I can quickly expand my environment if my use case takes off?
One commonly used solution to this dilemma is the appliance. Why not just buy a shrink-wrapped solution with (hopefully) all of these considerations built in? The appliance model works well in some cases, but not all. Appliances typically work well with “small data” software, where a one-size-fits-all box works for pretty much anybody. However, when the solution you want to deploy could grow to the point where it is the majority of your data center footprint, the model might not work that well, for a few reasons:
- It might not fit your typical hardware purchase process. Most companies have a strong relationship with one of the major server vendors, which brings benefits like a standard discount, a sparing strategy, and integration with server management and monitoring tools.
- Sizing can be tricky. What happens if you outgrow one “unit” of the appliance? Do you need to double your investment? Who is responsible for making multiple appliances talk to each other?
- It assumes you have floor space on which to plop a bunch of metal. What if you have a private cloud environment that you must deploy all new software on? What if your infrastructure is completely in the public cloud?
What if you could have the simplicity of deploying big data software without being tied to a specific hardware configuration? Well, you can, with our new “Auto-Provisioning Templates.” MapR Auto-Provisioning Templates take the infrastructure you have, whether it’s servers from your vendor of choice, a private cloud, or a public cloud, and wrap big data software around it as snugly as a well-fitting glove. The templates make all the right decisions for you out of the box, configuring services to be highly available and performance-optimized.
Since no single deployment fits every workload, the Auto-Provisioning Templates provide guidance in the form of predefined configurations that get you started with a platform suited to your expected workloads. These configurations include:
- Data Lake – Deploys the most common Apache Hadoop services for building a data lake, in which a variety of data types from numerous sources are integrated into a single platform.
- Data Exploration – Deploys services for schema-free interactive SQL exploration of data, including Apache Drill. This enables self-service querying on large data sets without the time-consuming, typically IT-led effort of building schemas up front.
- Operational Analytics – Deploys Hadoop with MapR-DB, the in-Hadoop NoSQL database, to run operational HBase applications and analytical applications in a single cluster.
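To give a feel for the schema-free exploration the Data Exploration template enables, here is a sketch of an Apache Drill query run directly against a raw JSON file through Drill's `dfs` storage plugin. The file path and field names below are hypothetical; the point is that no table definition or schema has to exist before you query:

```sql
-- Query a raw JSON file in place; Drill infers the structure
-- at read time (schema-on-read). Path and fields are illustrative.
SELECT t.customer_id,
       SUM(t.amount) AS total_spend
FROM dfs.`/data/raw/orders.json` AS t
WHERE t.region = 'EMEA'
GROUP BY t.customer_id
ORDER BY total_spend DESC
LIMIT 10;
```

Because Drill discovers the data's structure at query time, the same query pattern applies to self-describing formats like JSON and Parquet, which is what makes this template useful for ad hoc exploration before any formal schema work begins.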
The Auto-Provisioning Templates are included in all editions of the MapR Distribution.
Ready to get started? Try MapR today.