The Make vs Buy Decision: What You Need to Consider When Establishing a Big Data Platform

Tips for balancing leverage, risk, and program management

How times have changed—10-15 years ago, when you needed to store data for your application, it was likely structured data; the data fields were known ahead of time and didn’t change much. This made your “make vs. buy” task pretty straightforward—you could roll your own flat file or basic database, you could use a big RDBMS vendor, or you could go with a smaller solution like MySQL to embed in your project. Your decision was based on a few key criteria: the solution had to scale, the RDBMS needed to be able to handle your data and queries over the next three years, and the cost you paid to the OEM had to be within budget.

Fast forward to 2015: now we’re all experiencing the issues that Web 2.0 companies faced 10+ years ago, the issues that forced them to innovate. Their data was both structured and unstructured, the scale was large, data growth was accelerating and unpredictable, and the costs of hardware and software licenses for RDBMSs were astronomical. Scale-up architectures still could not solve the big data problem, no matter how much money those companies threw at the RDBMS vendors.

Are you facing an inflection point today with your software in terms of deciding which path to take, and how fast you can get there? Sooner or later, your BOD, CEO, or new CTO will arrive in your office and demand to know:

“What are we (you) doing about leveraging new big data tools for analytics?”  

Hopefully, you’ve had a skunkworks project that involved prototyping a NoSQL, scale-out big data architecture. But before you know it, your boss’s next question will be:

“How soon can we deploy in production, how much new revenue can we generate from analytics on our existing data, and how big are the new data-driven markets?”

So how can you navigate the make vs. buy decision today, in an environment with more software vendors and offerings than ever, with all of us touting that our product can boil the ocean? In future blog posts, I’ll explore two versions of this path: one for established companies that are already in market with ample customers and revenue, and one for startups that have a different focus and path in this technology lifecycle. In this blog post, I’ll provide tips for balancing leverage, risk, and program management.

Leverage your partners

Leverage is defined as the ability to accelerate the scale, speed, and schedule of your project. Selecting a partner/software that gives you leverage is essential to the success of your big data architecture. Leverage can be thought of as a force multiplier, where your 20-person team produces at the speed and quality of a 100-person team. Think of leverage as a long-term parameter; your software partner should continue to grow and deliver, so you get a 10x bump again with the next release.

Don’t be afraid to take on some risk

Risk and reward are two sides of the same coin. You can’t eliminate risk and expect to get a big return. To outperform and out-execute your competition, you need to take risks and implement change. If you’re not building a product that will outperform your competitors’ products by 10x, then most likely a competitor or a startup is building one right now that is 10x as powerful as yours.

Take heed of the saying, “Well-behaved people seldom make history.” Do you want the 10% better product featured on your resume, or the 10x project? I suggest going for the 10x one!

Choose a product that will grow with your needs

The three legs of a development program are your budget, schedule, and features. It’s important to identify which of these matters most to your project. Even with agile software development, you have a fixed-size team (budget) and defined sprint times that constrain the features available per sprint release. With a buy decision, you are increasing your feature set and speeding up time to market. The result? An improved, more predictable development program.

Be open and honest about the total costs over the life span of this release cycle. Keep in mind that open source software (OSS) is not really free; there are visible and hidden costs to OSS, just as with commercial offerings. Start with a product that will grow with your needs and that has proven production deployments. That way, you won’t have to refactor your code as you move from development into QA and then into production.
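To make the "total costs over the life span" point concrete, here is a minimal sketch of the arithmetic in Python. Everything in it is an assumption for illustration: the `Option` class, its field names, and all of the dollar figures are hypothetical placeholders, not data from any real project or vendor. Substitute your own numbers, including the hidden costs (staffing, training, migration) that are easy to leave out.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One make-or-buy candidate. All fields are illustrative assumptions."""
    name: str
    license_per_year: float        # license or subscription fees
    engineers: float               # full-time engineers to build and operate it
    engineer_cost_per_year: float  # fully loaded cost per engineer
    one_time_cost: float = 0.0     # migration, training, initial setup

    def total_cost(self, years: int) -> float:
        # Recurring cost per year = fees + staffing; add one-time costs once.
        yearly = self.license_per_year + self.engineers * self.engineer_cost_per_year
        return self.one_time_cost + yearly * years


def cheapest(options, years):
    """Return the option with the lowest total cost over the given horizon."""
    return min(options, key=lambda o: o.total_cost(years))


# Hypothetical placeholder figures; replace them with your own estimates.
build_on_oss = Option("build on OSS", 0, 4, 150_000, 50_000)
buy_commercial = Option("buy commercial", 200_000, 1, 150_000, 100_000)

if __name__ == "__main__":
    for option in (build_on_oss, buy_commercial):
        print(f"{option.name}: {option.total_cost(3):,.0f} over 3 years")
```

The outcome depends entirely on the inputs, so treat the output as a prompt for discussion rather than an answer. The useful part is that the license line and the staffing line sit side by side, which is where the "free" option usually stops looking free.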

Keep your team small and define your decision-making parameters

Create a small team that is empowered to investigate and decide which infrastructure software to use, with ownership spread across business strategy, product strategy, and program management (execution). If at all possible, limit ownership of the decision to three people. Have your discussions and arguments up front, and have all groups funnel their issues through this three-person team.

Each of these areas should have three key dimensions that are critical in selecting and implementing your data infrastructure software. Even though your product might seem too complex to reduce to three dimensions, it’s important to focus on the issues that are critical to the success of your project. If one of the dimensions is scale, decide whether you need read performance, write performance, or a balanced metric. You don’t need 14 parameters across 4 architectures that require a pivot table or a 4-point font on the spreadsheet; that level of complexity just obfuscates the discussion.

Suggested decision dimensions

[Figure: Build vs. buy decision dimensions]

Don’t fall into the trap of “analysis paralysis” that can lead to an executive review where someone says, “Well, the spreadsheet ranked choice ‘x’ over ‘y’ across the 100 dimensions by 5%.” Most likely, you’re having this review because the project missed its schedule, didn’t include all the features promised to your top customer, and is over budget. Nothing is more costly than doing a project a second time: lost reputations, lost market share, and double the project costs are not a pretty sight.

Focus on your core competency

You might have a top team of Stanford and MIT engineers, but you need to focus them on the core competency that differentiates your product in the marketplace. Even with agile development, you have a fixed-size team (budget) and defined sprint times that constrain the features available per sprint release.

Writing and maintaining open source is rewarding and valuable, but is it your core mission?  Creating a scale-out cluster for big data is a great learning experience, but is it your core mission? Take a look at NASCAR racing—do you see anyone making their own tires or engines, or do they look to partners to help them increase leverage and manage risk? The team focuses on their core competencies by building and tuning the cars, and executing on race day.

Your mission is to put a 10x product into market as soon as possible, and in the best environment. Your product should solve problems that your customers are willing to pay to have solved. Look to your big data platform partners to decrease your risk, speed up development, and give your team leverage.


