Get an introduction to streaming analytics, which gives you real-time insight from captured events and big data. There are applications across industries, from finance to wine making, though two primary challenges must be addressed.
Apache Hadoop Blog Posts
This blog post is the first in a series based on the ebook The Six Elements of Securing Big Data by security expert and thought leader Davi Ottenheimer. In his book, Davi outlines the rationale and key challenges of securing big data systems and applications. He does so using some great anecdotes and with good humor, making the book a good read whether you’re a white/grey/black hat, cyber superhero, or even if you’re not a security expert at all.
It’s not just a concern when ordering coffee. Something similar can happen as we investigate new and innovative big data technologies and techniques. I used the cappuccino example in a talk I presented recently at the Strata + Hadoop World Conference in London. The talk, titled “Building Better Cross Team Communication,” highlighted the importance of identifying and addressing the difference in how each side thinks the world works when two groups that have different experience and skills come together.
Dale Kim, Sr. Director of Industry Solutions at MapR, describes the monitoring capabilities of the MapR Converged Data Platform, which easily give you a single view of all cluster operations. Leveraging popular open source technologies, the monitoring system is customizable and extensible to address the challenges of your big data deployment requirements.
With the increasing amount of information that we use daily, technology is only becoming more and more important in everything we do. And businesses are seeing this at much greater scale than we do as consumers. There are many great examples of this in just about every industry.
Apache Spark is becoming very popular and widely used in the big data community. There are several reasons for Spark getting such rapid traction. These include its in-memory processing capabilities, support for a wide range of engines for various use cases such as streaming, machine learning, and SQL, and the ability to develop in multiple languages such as Python and Scala.
“Big Data” is no longer a buzzword. Businesses big and small that don’t invest now in big data technologies risk getting left behind as the marketplace becomes more and more data-driven. In fact, a recent McKinsey and Company report suggested that companies that invest in big data and analytics consistently outperform their peers in both productivity and revenue.
Within this post you will see mention of message-driven architectures. In short, a message-driven architecture is a subset of a service-oriented architecture (SOA), a model that has been around for many years and is very popular. What you will find going through this post is that the foundational message-driven architecture is more of a competitor to the concepts of the enterprise service bus (ESB).
In January, I made predictions about six big data trends for 2016 (“What Will You Do in 2016?”). Now we’ve reached the mid-and-a-bit-more year, so it’s a good time to check them out and see how well these predictions match what has happened so far in 2016, what is surprising about that, and what’s likely to come in the second half of the year.
I was at the annual Hadoop Summit in San Jose last week. As usual, the MapR booth was buzzing with big data enthusiasts and experts alike. We showcased demos that spanned multiple topics including multi-cluster Hadoop monitoring using Grafana and Kibana (as part of our new Spyglass Initiative), IoT stream analysis using MapR Streams and Spark Streaming, and self-service big data analytics using Apache Drill.
Today we are proud to announce the Spyglass Initiative focused on easy management, deep visibility and full control. With this first release, MapR Monitoring empowers administrators with cluster monitoring capabilities, including metric and log collection from nodes, services and jobs, and dashboards.
Is there a case to be made for big data for security analytics? The answer is an unqualified “yes.” In fact CSO Magazine called cyber security “the killer app” for big data analytics.
In the beginning was data. How do we know this? Because many (if not all) creation stories from all cultures were essentially developed as an explanation of the world as observed by humans.
Standards and incentives for digitizing and sharing healthcare data, along with improvements and decreasing costs in storage and parallel processing on commodity hardware, are causing a big data revolution in healthcare, with the goal of better care at lower cost.
Apache Spark, a powerful general purpose engine for processing large amounts of data, has seen a rapid increase in its adoption since its release. Recognizing its impact very early on, MapR has supported and invested in Spark as part of our Hadoop distribution to enable enterprises to build applications with Spark and deploy it in production in a reliable manner.
Perhaps you’re old enough to remember when the library was the place we went to learn. We foraged through card catalogs, encyclopedias and the Reader's Guide to Periodical Literature in hopes that we’d be able to understand what was going on in other people’s minds when they decided what went where.
For all the talk about Big Data, most organizations are barely out of the starting blocks when it comes to exploiting it for business benefit. Gartner estimates that 85% of Fortune 500 companies are as yet unable to exploit Big Data for competitive advantage.
In some circles today there is a sort of ‘Hadoop vs. RDBMS’ debate ongoing. Often the discussion casts Hadoop as the obvious heir apparent in the data processing world, with RDBMS cast as your father’s Oldsmobile.
Organizations embracing big data are ready to put data to work, including looking for ways to effectively analyze data from a variety of sources in real time or near real time.
There are substantial advantages to being able to make decisions at the speed required to respond to events in the moment. In fact, real time is at the foundation of many transformational applications. Let’s take a closer look at what real time really means, and why real time is required across the entire process.
What’s clear to me is that we are in the midst of the biggest change in enterprise computing in decades: a shift in how data is stored, analyzed and processed is changing the way businesses operate and compete in the marketplace.
The number of organizations that are thinking about using Hadoop has grown astronomically over the past year. How do you know whether you’re ready to implement Hadoop, and what are the best practices?
In the world of data warehouses and data marts, OLAP analysis has existed for many years. Concepts like drill down, drill across and roll ups have allowed business analysts and users to easily access and analyze data across a variety of dimensions such as product, customers and regions.
We are excited to share with you that Gartner has named MapR a Visionary in the Gartner 2016 Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics. Gartner evaluated 21 software vendors on 15 criteria for the quadrant.
Hadoop is a key data technology for Big Data, as everyone knows. But the question becomes, how can Big Data help make me more competitive, more efficient, and better able to detect fraud, security breaches, and other abuses?
It’s the start of a new year -- we’re on the threshold of something new -- so let’s look forward to what you’re likely to be doing in 2016.
Banks are among the many businesses taking advantage of big data and IoT opportunities, including for mobile payments, online banking, and smart kiosks, but the huge quantities of personally sensitive data from these activities must be protected at all stages.
Big data and Hadoop-based approaches are now widely recognized but are still considered by many to be new technologies. The potential benefit of these approaches already is clear, but are they able to deliver practical value now?
The faster questions can be asked, the faster you can get answers. Waiting for data to be shipped off servers to a central processing platform takes time, and most businesses these days want to get as close to real time as possible.
Streaming data enables businesses to respond to customers as close to real time as possible. There are many different ways to leverage a streaming platform and utilizing Spark in your streaming architecture is easier than you might think.
As technology advances at breakneck speed, our lives are becoming increasingly digitized. From Twitter feeds to sensor data to medical devices, companies are drowning in big data yet starving for actionable information.
There’s good news in the world of NoSQL databases that will put a smile on the face of developers – and that should also make business leaders happy because it means shorter time-to-value. You can now enjoy the ease and flexibility of a document-style database with the power of extreme scalability and performance.
Walmart is an industry leader in global e-commerce and brick-and-mortar retail, and they’re also a leader in the use of Hadoop-based technologies to implement their new data-driven approach to business.
The recent Attunity and MapR webinar “Give your Enterprise a Spark: How to Deploy Hadoop with Spark in Production” proved to be highly interactive and engaging. As promised, Nitin and Rodan have provided follow-up answers to your questions.
The cost of waste, fraud and abuse in the healthcare industry is a key contributor to spiraling healthcare costs in the United States. In 2012, healthcare waste and abuse accounted for nearly $60 billion.
In this blog post, I’ll talk about the relationship between Spark and Hadoop, what Hadoop gives Spark, and what Spark gives Hadoop.
Australian shoppers are some of the most digitally influenced in the world; a majority of Australians go online to research a product before buying it, according to a 2015 report by Deloitte.
Recently, a new name has entered many of the conversations about big data. Some people see the popular newcomer Apache Spark™ as a more accessible and more powerful replacement for Hadoop, the original technology of choice for big data.
It's an exciting time for those in pharmaceutical research these days, given that research organizations can now leverage big data to improve their business.
The explosion of data from new devices and technologies has forced the telecommunications industry to completely change the way they handle big data. Their traditional storage and analytics solutions cannot adequately manage the expanding, diverse volume of data generated today.
Apache Hadoop is revolutionizing big data in more than one way. While the Hadoop platform introduced reliable distributed storage and processing, various packages such as Spark on top of Hadoop make it possible to build applications and analyze data much faster. Here are some cool ways the Hadoop stack is being used right now.
Organizations have struggled with critical performance and scalability shortcomings of conventional data integration for years, leading many to push heavy data integration workloads down to the data warehouse. As a result, core data integration experienced a shift from extract, transform, and load (ETL) to extract, load, and transform (ELT).
Every business has some kind of database. It’s right up there with word processors and spreadsheets as essential business software. Relational databases are one of the most popular types—but as useful as they are, they’re not necessarily the best for every situation. NoSQL databases are becoming popular because they can handle different types of data at scale more efficiently.
Hadoop has been a phenomenon for big data and operational workloads. It has transformed from its batch-oriented roots into an interactive platform by incorporating a number of components, including technologies that provide SQL and distributed in-memory capabilities.
Reducing operating costs and increasing efficiency are, and will always be, priorities for any business, but they become imperative when an industry is facing cyclical challenges. Given the current volatility in the oil market, the oil and gas industry is looking for solutions that can proactively address inefficiencies through better asset tracking and predictive maintenance.
Today, a significant number of our customers are deploying the MapR Big Data platform in the cloud. This is not only true for development and test use cases, but also for many more production ones, especially those in the Internet of Everything (IoE) space.
Lately I’ve talked to lots of people who are just getting their heads wrapped around the value of big data software such as Apache Hadoop, but are getting stuck figuring out the details. What kind of servers do I need to buy? What services do I need to install to make a “data lake”? How do I make sure I install the services in a way that makes them highly available, while being optimized for performance?
As you probably know, Apache Hadoop was inspired by Google’s MapReduce and Google File System papers and cultivated at Yahoo! It started as a large-scale distributed batch processing infrastructure, and was designed to meet the need for an affordable, scalable and flexible data structure that could be used for working with very large data sets.
The MapR Distribution including Hadoop is now available in a private IT sandbox environment on the Amazon Web Services (AWS) Test Drive. We’ve partnered with AWS to create this lab environment so that you can gain hands-on experience with Hadoop.
Gartner just released a comprehensive research report based on a survey highlighting the adoption trends around Hadoop, which sheds some light on where and how customers are getting value from Hadoop. Some of the key take-aways in the report include:
Apache Spark recently celebrated its five-year anniversary as an open source project. While we are always humbled and excited by the open source success of Spark, it gives us far greater pleasure in knowing that there are more and more organizations this year that are deploying Spark into production business applications.
For the past several years, organizations have been struggling to figure out how to deal with all of the new data that is streaming in all around them. From smartphones to production line sensors, everything is generating data.
So you’ve researched the general capabilities of Hadoop, and have worked with your colleagues to identify the first set of big data use cases to tackle. Now, you’re ready to take the plunge and select the right Hadoop solution to invest in. After thoroughly doing your homework, you might think the final selection step would be simple, knowing that open source packages appear to be the same across the different Hadoop distributions out there. Unfortunately, that is not the case.
I recently wrote an article about how to get a Big Data initiative back on track. In the comments section, a user challenged how such an initiative is different to a traditional analytics project. That’s a good question, and the answer is not immediately obvious to most. The key differentiators are not Hadoop, NoSQL, large datasets or any of the usual suspects. The difference is that it is impacting all organizational datasets, the overall flow of data in the long-term, as well as data processing and storage.
This is a tremendously exciting time for those who work in clinical genomics. The demand for cutting-edge technologies that deliver fast and accurate genome information has exploded. In 2013, close to 2000 genome sequencers were in operation. These genome sequencers produced a whopping 15 petabytes of sequence data, which included the sequencing of 300k human genomes.
In this week's Whiteboard Walkthrough, Dale Kim, Director of Industry Solutions at MapR, gets you up to speed on Apache Hadoop and NoSQL. He talks about the similarities and differences between the two, but most importantly how both technologies should be a requirement for any true big data environment.
Fraud represents the biggest loss for banks, accounting for upward of $1.744 billion in losses annually. The banking industry spends millions each year on technologies aimed at reducing fraud and retaining customers, but the spend does little in protecting banks. Let’s focus on why the current fraud detection approaches don't work as well as they should and how machine learning on big data can help.
Companies everywhere are excited about harnessing big data and putting it to work. Adopting a Hadoop distribution is a critical decision that has far-reaching ramifications for your organization. CITO Research recognizes this in its white paper, “Five Questions to Ask Before Choosing a Hadoop Distribution.”
To get value out of today’s big and fast data, organizations must evolve beyond traditional analytic cycles that are heavy with data transformation and schema management. The Hadoop revolution is about merging business analytics and production operations to create the ‘as-it-happens’ business.
Big data challenges and opportunities are rapidly spreading across a huge number of organizations, large and small, in a wide range of verticals. Not surprisingly, people are turning to scalable solutions such as Apache Hadoop and NoSQL-based technologies to meet these challenges. The choice of an excellent data platform and smart selections from among the many Hadoop ecosystem tools are of course important decisions to set yourself up for success, but there are also some other fundamental choices that can make a big difference in meeting and exceeding your goals, regardless of the particular project or tool involved.
Gigaom Research released a new report recently, titled “Extending Hadoop Towards the Data Lake.” According to the report, early data lake adopters are integrating Hadoop into organizational workflows, and are addressing challenges regarding the cleanliness, validity, and protection of their data. Their research resulted in some key findings.
Big data made a huge leap forward in 2014 in terms of the increased adoption of big data technologies among enterprises. Where is big data heading in 2015? At the beginning of this year, MapR CEO John Schroeder made several predictions about key aspects of big data and Hadoop, from data agility and processing data platforms to self-service and market consolidation. Check out our Big Data Trends in Action infographic below to learn about the five key trends that will reshape the way enterprises operate internally and connect with consumers.
Today, MapR introduced Quick Start Solutions, a powerful package of services, software and training/certification to help you jump-start your deployments of enterprise data hub, security and marketing applications. These solutions address commonly implemented and high-value Hadoop use cases for Data Warehouse Optimization and Analytics, Security Log Analytics and Recommendation Engines.
One of the best ways to figure out how to succeed with your own large-scale projects is to see what others are doing – what has worked for them and what has not.
Dr. Pramod Varma, Chief Architect and Technology Advisor to the Unique Identification Authority of India (UIDAI), gave an informative talk titled “Architecting World's Largest Biometric Identity System - Aadhaar Experience”. He began by explaining why the Aadhaar project was created. In India, the inability to prove one’s identity is one of the biggest barriers that prevents the poor from accessing benefits and subsidies. India is a country with 1.2 billion residents in over 640,000 villages. The Indian government spends $50 billion on direct subsidies (food coupons for rice, cooking gas, etc.) every year. Both public and private agencies in India require proof of identity before providing services or benefits to those living in India.
The following is a guest blog post from Sean Kandel, CTO & Co-founder of MapR Partner, Trifacta. It’s no secret that enterprises are increasingly adopting Hadoop for a variety of analytic purposes. The Hadoop software stack introduces entirely new economics for storing and processing data at scale. It also allows organizations unparalleled flexibility in how they’re able to leverage data of all shapes and sizes to uncover insights about their business.
MapR announced today that our SQL-on-Hadoop solution earned the highest score for Hadoop/data warehouse interoperability. MapR was among six vendors invited to participate in Gigaom Research’s January 2015 report, “Sector Roadmap: Hadoop/Data Warehouse Interoperability.” One of the key factors for our top placement in this competitive evaluation was the integration powers of Apache Drill’s technology included in the MapR Distribution. This report validates Apache Drill as a major advancement in data exploration given its schema flexibility, which makes it possible for you to immediately query complex data in native formats, such as schema-less data, nested data, and data with rapidly-evolving schemas, with minimal IT involvement.
Did you know that not all Hadoop distributions are the same? As Hadoop deployments grow, the architectural differences between Hadoop distributions begin to show dramatic cost differences. These differences can save you 20-50% in terms of total cost of ownership, as we detailed in a previous post. To make it easier for you to compare distributions and understand the true costs of deploying and running Hadoop, we’ve developed a simple self-service tool for Hadoop that uses your own data to show how the Hadoop distributions' costs stack up.
Today MapR announced the availability of free Hadoop On-Demand Training for developers, analysts and administrators. Hadoop On-Demand Training offers full-length courses on a range of Hadoop technologies for developers, data analysts and administrators. Designed in a format that meets your convenience, availability and flexibility needs, these courses will lead you on the path to becoming a certified Hadoop professional.
There are resources and there are Resources. The old statement is still valid: Your project's success is entirely dependent on the people you hire. Hadoop initiatives are no exception to this; if anything, these initiatives are particularly demanding. Here are a few reasons why: 1) Hadoop is an emerging technology. There is, and there will be, much hype around it, and the temptation to try this technology no matter what comes from specialists and companies, large and small. Emerging is synonymous with evolving when it comes to Hadoop; funnily enough, the book on Hadoop that was just released by O'Reilly and Pentaho (freely available to download here) will become outdated in a few months' time. The team members you require must be of a particular, fond-of-learning kind to be able to stay on top of the Hadoop evolution, choose which innovations to adopt and, at the end of the day, deliver well.
Information wants to be free. Open source is free. Moore’s law is making computing free. Free, Free, Free. Enough already with the free. In the real world, computing costs money. Making great products costs money. More efficient computing saves money. If you’re running a serious big data infrastructure, you must first focus on getting value, but once that’s done, you must make sure you are not bleeding money in a variety of ways. Is your cluster too big? Are you wasting storage? Do you spend too much time on admin? Are you building up technical debt? How much is downtime wasting? If you aren’t asking these questions, you are surely wasting money.
I recently joined MapR to lead the Product Management group. Since I used to consult with MapR four years ago (before the company was launched), it is natural that everyone wants to know, "What's different four years later?" The big answer is that the promise of Hadoop has turned into a reality for a wide array of situations, and the impact is really meaningful. Here are some first impressions after working at MapR for a few weeks. Note: I have to say up front that this is not meant to be a dispassionate research piece; it is just top-of-mind impressions.
Big data is a universal phenomenon. Every business sector and aspect of society is being touched by the expanding flood of information from sensors, social networks, and streaming data sources. The financial sector is riding this wave as well. We examine here some of the features and benefits of Hadoop (and its family of tools and services) that enable large-scale data processing in finance (and consequently in nearly every other sector).
Some people say I am biased toward certain technologies. That is a completely true statement! Granted, it does depend on the specific technology. But just because I may be biased toward certain technologies doesn’t mean I’m not objective or fair. When it comes to Hadoop, Apache Hadoop really is free–as in beer. But, in reality, unless you are a huge company with a massive team of engineers, you very likely are NOT going to be patching and building your own distribution of Hadoop to run internally. No matter what anyone says, if you are paying for support for your distribution of Hadoop in any way, then it is not truly free.
In this blog series, we’re showcasing the top 10 reasons customers are turning to MapR in order to create new insights and optimize their data-driven strategies. Previous posts in this series provided insights about nine good reasons why customers choose MapR: from security to performance and TCO to ease of integration. Here’s reason #1: High availability.
In this blog series, we’re showcasing the top 10 reasons customers are turning to MapR in order to create new insights and optimize their data-driven strategies. Here’s reason #2: MapR provides world record performance for Hadoop.
In this blog series, we’re showcasing the top 10 reasons customers are turning to MapR in order to create new insights and optimize their data-driven strategies. Here’s reason #3: MapR provides ease of data integration through industry standard interfaces for data access and movement such as read-write NFS, ODBC, REST APIs and LDAP. Read-write NFS access is one of the key reasons customers choose MapR.
In this blog series, we’re showcasing the top 10 reasons customers are turning to MapR in order to create new insights and optimize their data-driven strategies. Here’s reason #5: MapR provides complete data protection and disaster recovery with real snapshots and mirroring.
In this blog series, we’re showcasing the top 10 reasons customers are turning to MapR in order to create new insights and optimize their data-driven strategies. Here’s reason #9: MapR provides a read-write file system for real-time Hadoop.
To get real with Hadoop, you need a real enterprise-ready platform. Join us as we begin the countdown of the 10 top reasons our customers choose MapR. Reason #10....Enterprise-grade security. Security has always been important with enterprise customers, but today it’s non-negotiable.
Our daily commute may not feel like such a high tech experience, but whether you feel it or not – it is. Big data and Hadoop have revolutionized the transportation industry over the past several years. Whether in a car, a train, a plane or a delivery truck, we all use big data throughout our travels. Let’s go through a few specific use cases to spotlight transportation businesses that are using big data in a big way.
Five minutes is easily squandered without much thought; however with Hadoop, five minutes can make a big impact. John Schroeder, MapR CEO and Founder, recently used a five-minute keynote address to illustrate this point.
Following is an edited transcript of John's message.
Capping off the first day of 2014 Hadoop Summit, the Open Air, Open Source Celebration was a lively event bringing friends of MapR together to toast the Apache Drill and Apache Spark projects. Rather than words, we will let the photos tell the story.
Conversations and ideas flowed in the fresh air. Tomer Shiran, VP of Product Development, and Ted Dunning, Chief Application Architect, enjoyed talking Hadoop in the relaxed outdoor venue.
SQL-on-Hadoop just got easier this morning. Working together with the HP Vertica team, we are excited to announce general availability of the HP Vertica Analytics Platform running on the MapR Distribution for Apache Hadoop.
A while back, I presented a Big Data Glossary: A to ZZ. In separate articles, I discussed some of the different entries in the glossary. Here, I focus on H (Hadoop), which is the evolving but increasingly standardized big data computing platform.
The explosive growth in data and in big data technologies (that process and transform the data into knowledge) corresponds to a new industrial revolution. The raw materials and the machinery are different from past revolutions, but the fundamental features are not so different – new markets, new opportunities, new tools, and new wealth are being created at a remarkable pace.
I am not sure about other data scientists’ experiences in trying to explain Big Data and Data Science to family members, but I find that there is a lot of interest coupled with a lot of confusion. That’s not too surprising since there are many experts in the field who are also confounded by the mixed messages, the hype, the uncertain meanings of terminology, and where all of this is going.
MapR announced at the end of last week that we were among the select companies that Forrester Research Inc. invited to participate in its report entitled “The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014.” In this evaluation, MapR was cited as a Leader and achieved the highest score for Current Offering among all reviewed vendors.
We, at Flux7 Labs, a solutions company, help customers maximize performance/$. To help our customers make the right decisions we constantly research and evaluate the solutions available to customers and thereby build and strengthen our internal knowledge. As part of this research process, we evaluated the most common Hadoop distributions on various metrics. The distributions we tested were from Intel, Cloudera, Hortonworks, and MapR. This testing was done independently on all the distributions.
This past summer I had an amazing opportunity to work on the up and coming open source project Apache Drill. Working at MapR Technologies alongside a great group of engineers and with an expanding open source community has been an invaluable learning experience. SQL in Hadoop has been a very hot topic over the past year, and while there are a lot of different approaches being pursued, I believe Drill is making one of the strongest cases for providing speed, extensibility and longevity.
The Hadoop Space is Incredibly Competitive—a Knock-Down, Drag-Out Fight across the Major Distributions.
Myth or Reality?
M.C. Srivas, CTO, will participate on the Big Data Ecosystem partner panel at Splunk .conf2013.
Q: How specifically are you addressing variety, not merely volume and velocity?
Back in 2011, Paul Yang from Facebook’s engineering team published a fascinating blog post that detailed how they migrated a 30TB Hadoop cluster from one server base to another. As one of the world’s largest Hadoop deployments, Facebook has deployed several Hadoop clusters collectively numbering over 5,000 nodes, and has amassed an impressive roster of hundreds of H
Mike Gualtieri, Principal Analyst with Forrester Research, joined us for a webinar titled Productionizing Hadoop: Seven Architectural Best Practices. Following the webinar, Mike answered a number of questions from participants, including one about how to get started with Big Data.
MapR was proud to participate in the Developer Sandbox at Google I/O, Google’s exclusive annual developer conference held in San Francisco last week. The conference featured speakers from various industries as well as code labs and developer demos.
Expanding Big Data solutions to yet another level, MapR made two announcements today: MapR M7, which provides an enterprise-grade NoSQL and Hadoop solution to our customers, is now available; and LucidWorks Search will be distributed with the MapR Big Data Platform for Apache Hadoop, including with the new MapR M7 Edition.
M7 Now Available
Look for MapR at these events:
Hadoop users were excited to see the real-time Hadoop analytics demonstration at the Strata Conference in Santa Clara. By streaming the #strataconf twitter hashtag directly into a cluster during the conference, MapR displayed two real-time tag clouds showing a word bubble with the most frequently used words in conference tweets and a user name cloud of top tweeters. Watching the information change proved mesmerizing for some.
How did we do this? By bringing MapR and Storm together to capitalize on their strengths.
Gone in 60 seconds! Breaking the MinuteSort Record
Yuliya Feldman, Amit Hadke, Gera Shegalov, M. C. Srivas
Introduction
The MinuteSort test measures how much data a system can sort in 1 minute. The test requires that a random sequence of 100-byte records, each consisting of a 10-byte key and 90 bytes of payload, be arranged in either ascending or descending order. The earlier record had sorted 14 billion records totaling 1400 gigabytes in 60 seconds. The web site sortbenchmark.org keeps track of all such records.
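To make the record layout concrete, here is a minimal single-machine sketch in Python of what the test asks for: generate random 100-byte records, order them by their 10-byte key, and check the size arithmetic quoted above. It illustrates the data format only and is not how any record-setting run was implemented; the function names (`make_records`, `sort_records`, `total_gigabytes`) are hypothetical, and a gigabyte is assumed to mean 10^9 bytes.

```python
import random

RECORD_SIZE = 100   # bytes per record, as described in the test
KEY_SIZE = 10       # leading bytes of each record form the sort key
PAYLOAD_SIZE = RECORD_SIZE - KEY_SIZE  # remaining 90 bytes of payload


def make_records(count, seed=0):
    """Return `count` random 100-byte records (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        bytes(rng.getrandbits(8) for _ in range(RECORD_SIZE))
        for _ in range(count)
    ]


def sort_records(records, descending=False):
    """Order records by their 10-byte key; the payload rides along."""
    return sorted(records, key=lambda r: r[:KEY_SIZE], reverse=descending)


def total_gigabytes(record_count):
    """Total data volume, assuming 1 gigabyte = 10**9 bytes."""
    return record_count * RECORD_SIZE / 10**9


records = make_records(1000)
ordered = sort_records(records)

# 14 billion records of 100 bytes each come to 1400 gigabytes.
earlier_record_gb = total_gigabytes(14_000_000_000)
```

Sorting on a slice of the record keeps the key and payload together, which is why the sketch never splits them into separate structures. A real benchmark run distributes this work across many nodes and disks, which the sketch does not attempt.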
In it, author Dan Woods makes a couple of interesting observations:
An Earlier Result
In a new briefing, the 451 Group details MapR’s exciting past few months. MapR has continued to expand our product to meet our customers’ needs in whatever computing environment they may have: in the cloud, on-premises or both. The Amazon and Google announcements serve to establish MapR as the emerging de facto standard for Hadoop.
Recently, Ryan Rawson over at Drawn to Scale wrote on their decision to base their innovative product Spire on MapR’s distribution for Hadoop. I was excited to read Ryan’s posting for a couple of reasons: