When I first started my internship, I wasn’t really sure what to expect. I knew the basics: MapR was a big data company, I was a technical marketing intern, and I would be doing quite a bit of competitive analysis. I originally found the position while searching for marketing internships in the Bay Area, and this one stood out to me when I read the job description, since it combined my interest in marketing with my hope of getting some experience in the tech industry.
MapR Platform Blog Posts
This blog post is the first in a series based on the ebook The Six Elements of Securing Big Data by security expert and thought leader Davi Ottenheimer. In his book, Davi outlines the rationale and key challenges of securing big data systems and applications. He does so with great anecdotes and good humor, making the book a good read whether you’re a white, grey, or black hat, a cyber superhero, or not a security expert at all.
Six months ago, we launched the Converge Community to provide a seamless way for Hadoop and Spark developers, data analysts, and administrators to engage in technical discussions and share expertise that advances the big data community as a whole.
In this week's Whiteboard Walkthrough, Ellen Friedman, Solutions Consultant at MapR, describes what happens when certain fundamental big data capabilities are engineered together as part of the same technology. This brief overview compares a converged data platform as the foundation for big data projects with solutions built from separate pieces.
It’s not just a concern when ordering coffee. Something similar can happen as we investigate new and innovative big data technologies and techniques. I used the cappuccino example in a talk I presented recently at the Strata + Hadoop World Conference in London. The talk, titled “Building Better Cross Team Communication,” highlighted how important it is, when two groups with different experience and skills come together, to identify and address the differences in how each side thinks the world works.
With stories of the thefts of millions of credit card records and sensitive employee data at some of the world’s largest companies and government agencies dominating recent headlines, it’s not surprising that organizations are doubling down on security. Security is finally starting to get top management’s attention.
Dale Kim, Sr. Director of Industry Solutions at MapR, describes the monitoring capabilities of the MapR Converged Data Platform, which give you a single view of all cluster operations. Because it leverages popular open source technologies, the monitoring system is customizable and extensible to meet the requirements of your big data deployment.
With the increasing amount of information that we use daily, technology is only becoming more and more important in everything we do. And businesses are seeing this at much greater scale than we do as consumers. There are many great examples of this in just about every industry.
Apache Spark is becoming very popular and widely used in the big data community. There are several reasons for Spark’s rapid traction: its in-memory processing capabilities, its built-in libraries for a wide range of use cases such as streaming, machine learning, and SQL, and the ability to develop in multiple languages such as Python and Scala.
In this week’s Whiteboard Walkthrough Part I, Ted Dunning, Chief Application Architect at MapR, explains the key capabilities required of a streaming platform in the context of micro-services and the advantages they offer.
In this week’s Whiteboard Walkthrough Part II, Ted Dunning, Chief Application Architect at MapR, talks about the design freedom gained by adopting a micro-services architecture based on streaming data. When you move, one step at a time, from an old-style architecture that depends too heavily on a shared global-state database to a stream-based flow architecture, the isolation between micro-services reduces strain on the original database and improves flexibility and often speed.
“Big Data” is no longer a buzzword. Businesses big and small that don’t invest now in big data technologies risk getting left behind as the marketplace becomes more and more data-driven. In fact, a recent McKinsey and Company report suggested that companies that invest in big data and analytics consistently outperform their peers in both productivity and revenue.
This post mentions message-driven architectures. In short, a message-driven architecture is a subset of service-oriented architecture (SOA), a very popular model that has been around for many years. As you go through this post, you will find that the foundational message-driven architecture is more of a competitor to the concepts of the enterprise service bus (ESB).
I was at the annual Hadoop Summit in San Jose last week. As usual, the MapR booth was buzzing with big data enthusiasts and experts alike. We showcased demos that spanned multiple topics including multi-cluster Hadoop monitoring using Grafana and Kibana (as part of our new Spyglass Initiative), IoT stream analysis using MapR Streams and Spark Streaming, and self-service big data analytics using Apache Drill.
Today we are proud to announce the Spyglass Initiative, focused on easy management, deep visibility, and full control. With this first release, MapR Monitoring gives administrators cluster monitoring capabilities, including collection of metrics and logs from nodes, services, and jobs, as well as dashboards.
Customers are flocking to Spark as their primary compute engine for big data use cases, and we received further proof of this last week when we ran an “Ask Us Anything about Spark” forum in the Converge Community, where our Spark experts answered questions from customers and partners in some great discussions.
Streaming data can be used as a long-term auditable history when you choose a messaging system with persistence, but is this approach practical in terms of the cost of storing years of data at scale? The answer is “yes”, particularly because of the way topic partitions are handled in MapR Streams. Here’s how it works.
Is there a case to be made for big data for security analytics? The answer is an unqualified “yes.” In fact, CSO Magazine called cyber security “the killer app” for big data analytics.
There’s been a lot of buzz and high expectations in the big data community around Apache Spark 2.0 and how it will impact the development of data pipelines, streaming applications, machine learning algorithms and all of the other use cases that Apache Spark is enabling.
In the beginning was data. How do we know this? Because many (if not all) creation stories from all cultures were essentially developed as an explanation of the world as observed by humans.
In this week's Whiteboard Walkthrough, Ellen Friedman, a consultant at MapR, talks about how to design a system to handle real-time applications, and also how to take advantage of streaming data beyond in-the-moment insights.
Apache Spark, a powerful general-purpose engine for processing large amounts of data, has seen a rapid increase in adoption since its release. Recognizing its impact very early on, MapR has supported and invested in Spark as part of our Hadoop distribution to enable enterprises to build applications with Spark and deploy them reliably in production.
Streaming data is now a big focus for many big data projects, including real-time applications, so there’s a lot of interest in excellent messaging technologies such as Apache Kafka or MapR Streams, which uses the Kafka 0.9 API.
With all the talk about Big Data, most organizations are barely out of the starting blocks when it comes to exploiting it for business benefit. Gartner estimates that 85% of Fortune 500 companies are still unable to exploit Big Data for competitive advantage.
“Big 10 banks fined $43bn over seven years for failures in customer reporting,” reads yesterday’s headline in the Financial Times, and I wonder how the power of big data could have helped save these billions of dollars.
The first Kafka Summit was recently held in San Francisco. While the conference was relatively small, at 600 attendees, it was encouraging to see the variety of companies that are embracing real-time data pipelines.
Technological innovation is one of the great stories of the 21st century. Over the past 15 years, technology companies have generated unprecedented wealth at a blistering pace, fueled by smart and capable teams of brilliant scientists and engineers.
We are honored to announce that MapR was named one of the Top 10 Banking Analytics Solution Providers for 2016 by Banking CIO Outlook magazine.
Organizations embracing big data are ready to put data to work, including looking for ways to effectively analyze data from a variety of sources in real time or near real time.
MapR, Cisco, and SAP have been collaborating for years to help you gain insight from all of your data sources. Today, we’re excited to announce that Cisco has developed an appliance that includes the MapR Converged Data Platform for SAP HANA, making it much easier and faster for you to harness the power of big data.
Editor's note: In this week's Whiteboard Walkthrough, Dale Kim, Sr. Director of Industry Solutions at MapR, discusses three examples of how the auditing capabilities in the MapR Converged Data Platform are beneficial for your big data environment.
There are substantial advantages to being able to make decisions at the speed required to respond to events in the moment. In fact, real time is at the foundation of many transformational applications. Let’s take a closer look at what real time really means, and why real time is required across the entire process.
In this week's Whiteboard Walkthrough, Dale Kim, Director of Industry Solutions at MapR, describes the 540° Customer View.
In my recent article for insideBIGDATA “Converged Data Platforms: Part of a Larger Trend”, I talked about the inevitable direction of technology architecture towards a limitless mainframe model, a converged data center that will be composed largely of open source technologies like Linux, KVM, Hadoop, Spark, Mesos, and OpenStack.
For almost seven years, MapR has been committed to advancing the understanding and application of open-source technology to solve big data challenges. Last year we delivered on the promise of Hadoop with the industry's only enterprise-grade, Converged Data Platform that supports a broad set of mission-critical and real-time production uses.
What’s clear to me is that we are in the midst of the biggest change in enterprise computing in decades: a shift in how data is stored, analyzed and processed is changing the way businesses operate and compete in the marketplace.
In the world of data warehouses and data marts, OLAP analysis has existed for many years. Concepts like drill down, drill across and roll ups have allowed business analysts and users to easily access and analyze data across a variety of dimensions such as product, customers and regions.
We are excited to share with you that Gartner has named MapR a Visionary in the Gartner 2016 Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics. Gartner evaluated 21 software vendors on 15 criteria for the quadrant.
Hadoop is a key data technology for Big Data, as everyone knows. But the question becomes, how can Big Data help make me more competitive, more efficient, and better able to detect fraud, security breaches, and other abuses?
In this week's Whiteboard Walkthrough, Tugdual Grall, Technical Evangelist at MapR, explains the advantages of a publish-subscribe model for real-time data streams.
Banks are among the many businesses taking advantage of big data and IoT opportunities, including for mobile payments, online banking, and smart kiosks, but the huge quantities of personally sensitive data from these activities must be protected at all stages.
One of the many high points in Disney’s Star Wars Episode VII: The Force Awakens was the return of several classic ships and other vehicles from the original trilogy, as well as the introduction of innovative new types of vehicles. With all the advanced technology on these ships, one can’t help but wonder what kind of big data software and analytics they would use for threat assessment and prediction, mission planning, and enemy ship tracking and identification.
Big data and Hadoop-based approaches are now widely recognized, but many still consider them new technologies. The potential benefit of these approaches is already clear, but can they deliver practical value now?
In this week's Whiteboard Walkthrough, Jim Scott, Director of Enterprise Strategy and Architecture at MapR, discusses a business use case that leverages the power of MapR Streams.
You may understand your work style at the office, but what if you were a developer reindeer at the North Pole?
Over the last 5 years of shipping product, we’ve watched our customers get enormous value out of storing and processing big data. The use cases range far and wide, from performing predictive maintenance on oil rigs to building fraud and risk models on financial transactions.
Today is very significant for MapR, with the introduction of MapR Streams and the industry’s first, and only, Converged Data Platform.
There’s good news in the world of NoSQL databases that will put a smile on the face of developers – and that should also make business leaders happy because it means shorter time-to-value. You can now enjoy the ease and flexibility of a document-style database with the power of extreme scalability and performance.
In this blog post, I will briefly summarize some of the key capabilities that customers are finding immensely valuable in Drill. I’ll also cover common use cases where Drill is deployed, as well as resources for getting started with Drill.
In this week's Whiteboard Walkthrough, Dale Kim, product marketing director, explains how document databases fit in your enterprise's use cases.
When we read “data journalism” articles, it often appears that journalists are walking a perilous line. In many cases, they’re working with data that is provided by the creators.
At Strata+Hadoop World in New York last week, MapR CMO Jack Norris talked about the Big Data Dividend – the ongoing, significant profits that are derived from data-driven applications. In his keynote, Jack provided a look at the bigger picture.
The MapR Distribution including Hadoop is now available on the Azure Fast Start. This solution enables push button deployment of MapR on the Azure cloud infrastructure, providing you with the solutions to turn your big data into big money.
Cloudera has announced a new open source project called Kudu, a technology described as a “complement to HDFS and Apache HBase... designed to fill gaps in Hadoop’s storage layer.” Apparently Cloudera’s development team “... eventually came to the conclusion that large architectural changes were necessary to achieve our goals”.
Today, MapR has announced the developer preview of MapR-DB with native support for JSON, and the new library OJAI (Open JSON Application Interface), pronounced "OH-hy."
Australian shoppers are some of the most digitally influenced in the world; a majority of Australians go online to research a product before buying it, according to a 2015 report by Deloitte.
How times have changed—10-15 years ago, when you needed to store data for your application, it was likely structured data; the data fields were known ahead of time and didn’t change much.
It's an exciting time for those in pharmaceutical research these days, given that research organizations can now leverage big data to improve their business.
MapR is glad to partner with SAP, and we are excited to see them bring leading-edge innovations to the market. Today we are thrilled to talk about a new offering from SAP that, together with the MapR data platform, helps you better serve your customers and simplify how your business works.
The explosion of data from new devices and technologies has forced the telecommunications industry to completely change the way it handles big data. Its traditional storage and analytics solutions cannot adequately manage the expanding, diverse volume of data generated today.
As you probably know (unless you’ve been living under an ant hill), Ant-Man is a fictional superhero who first appeared in Marvel comic books, and he’s also a proud founding member of The Avengers. He recently made his big-screen debut in this summer’s blockbuster movie, “Ant-Man,” which, as of last week, had already earned $116.8 million at the domestic box office and $234 million worldwide.
There’s a reason the industry refers to Big Data as “Big” Data. According to IBM, we create 2.5 quintillion bytes of data every day. Here’s another eye-opening stat: 90 percent of the data in the world today has been created in the last two years alone.
Hadoop has been a phenomenon for big data and operational workloads. It has transformed from its batch-oriented roots into an interactive platform by incorporating a number of components, including technologies that provide SQL and distributed in-memory capabilities.
We thought Mötley Crüe’s “Kickstart My Heart” was appropriate, since everyone is excited about kickstarting their Spark-based applications these days. That’s our theme for the Quick Start Solutions we’re announcing today at Spark Summit West: you can kick your Spark efforts into high gear with our Spark Quick Starts. You’ll be able to develop at high speed, use streaming data, and build applications faster.
Most of us have experienced the power of data-driven recommendations. Maybe you found a former colleague through LinkedIn’s “People You May Know” feature, or you watched a movie because Netflix suggested it. It’s also quite likely that you bought something Amazon.com recommended in its "Frequently Bought Together" section; recommendation engines are estimated to power approximately 30% of Amazon’s revenue. In each case, a recommendation engine narrows your choices to those that best meet your particular needs, and the systems these companies built incorporate algorithms that learn from past data. Customers benefit from a more tailored, personalized experience, which increases the likelihood that they’ll buy more products and services and stay loyal to that service provider or retailer. For the merchant or service provider, recommendation engines increase up-sell and cross-sell rates, reduce churn, and improve customer loyalty.
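To make “algorithms that learn from past data” concrete, here is a minimal Python sketch of one of the simplest approaches: counting how often items appear together in the same order. It is an illustrative assumption, not a description of how Amazon, Netflix, LinkedIn, or MapR build their recommenders, and the baskets and function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        # Deduplicate so an item bought twice in one basket counts once.
        for a, b in combinations(sorted(set(basket)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def bought_together(co, item, k=3):
    """Return up to k items most often bought alongside `item`."""
    # .get() avoids creating an empty entry for items never seen before.
    return [other for other, _ in co.get(item, Counter()).most_common(k)]


# Hypothetical purchase history: each inner list is one order.
baskets = [
    ["tent", "sleeping bag", "lantern"],
    ["tent", "sleeping bag"],
    ["tent", "stove"],
    ["lantern", "batteries"],
]

co = build_cooccurrence(baskets)
print(bought_together(co, "tent", k=1))  # ['sleeping bag']
```

Raw counts favor items that are popular overall, so production recommenders typically add some form of normalization or statistical scoring on top of simple co-occurrence.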
Today, MapR introduced Quick Start Solutions, a powerful package of services, software and training/certification to help you jump-start your deployments of enterprise data hub, security and marketing applications. These solutions address commonly implemented and high-value Hadoop use cases for Data Warehouse Optimization and Analytics, Security Log Analytics and Recommendation Engines.
As we close out the year, here is a look back at our 10 most popular blog posts of 2014. Our top posts include machine learning and time series data topics, new milestones for the Apache projects Drill and Spark, and hands-on technical explanations that save you time and headaches.
I often hear questions about how many drives to use per node in a cluster. For a long time, the norm was 4-6 drives per node, but lately I have been hearing more people suggest 12. At MapR, we have recommended 12 or 24 drives for quite some time to take full advantage of MapR-FS, but I still hear many people recommending smaller configurations. In fact, I think the norm is moving well beyond 12 drives. Lately, it is not uncommon for us to see boxes with up to 60 large drives. These are not the majority of systems by any stretch, but they have some very distinct advantages in terms of money (capex $/TB, opex $/TB) and power (opex W/TB).
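The cost and power argument comes down to amortization: each node carries fixed cost and power draw (CPU, memory, network, chassis) that gets spread over more terabytes as the drive count grows. The Python sketch below illustrates that arithmetic under loudly hypothetical assumptions. Every price, wattage, and capacity in it is a made-up placeholder rather than MapR or vendor data, and it simplifies by holding the fixed per-node cost constant even though denser chassis usually cost more.

```python
from dataclasses import dataclass


# All figures below are HYPOTHETICAL placeholders, not MapR or vendor pricing.
@dataclass(frozen=True)
class NodeConfig:
    drives: int
    drive_tb: float = 4.0        # capacity per drive (TB)
    drive_cost: float = 200.0    # $ per drive
    drive_watts: float = 8.0     # W per drive
    base_cost: float = 6000.0    # fixed $ per node: CPU, RAM, NIC, chassis
    base_watts: float = 250.0    # fixed W per node

    @property
    def raw_tb(self) -> float:
        return self.drives * self.drive_tb

    def capex_per_tb(self) -> float:
        # The fixed node cost is spread over every TB the node holds.
        return (self.base_cost + self.drives * self.drive_cost) / self.raw_tb

    def watts_per_tb(self) -> float:
        # Same amortization for power: fixed node draw plus per-drive draw.
        return (self.base_watts + self.drives * self.drive_watts) / self.raw_tb


if __name__ == "__main__":
    for n in (6, 12, 24, 60):
        node = NodeConfig(drives=n)
        print(f"{n:>2} drives: {node.raw_tb:5.0f} TB raw, "
              f"${node.capex_per_tb():6.1f}/TB, {node.watts_per_tb():5.2f} W/TB")
```

With these placeholder numbers, the 12-drive node works out to $175/TB and the 60-drive node to $75/TB; the direction of the trend, not the specific values, is the point.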
In this blog series, we’re showcasing the top 10 reasons customers are turning to MapR in order to create new insights and optimize their data-driven strategies. Here’s reason #2: MapR provides world record performance for Hadoop.
In this blog series, we’re showcasing the top 10 reasons customers are turning to MapR in order to create new insights and optimize their data-driven strategies. Here’s reason #5: MapR provides complete data protection and disaster recovery with real snapshots and mirroring.
In this blog series, we’re showcasing the top 10 reasons customers are turning to MapR in order to create new insights and optimize their data-driven strategies. Here’s reason #7: MapR provides the top-ranked NoSQL key-value database for current offering.
One cat, a radio collar, and a night on the town: this little adventure turned into an entertaining article by Andy Greenberg in Wired magazine on 8 August 2014 about the creative use of a feline investigator to find weak points in the security of the neighborhood’s Wi-Fi networks.
There are hundreds of open source and commercial projects that relate to Hadoop in one way or another. These projects can be divided into two categories: