<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>MapR Apache Hadoop Blog, News and Press Releases &#124; MapR Technologies</title>
	<atom:link href="http://www.mapr.com/index.php?option=com_wordpress&#038;Itemid=265&#038;lang=en&#038;feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.mapr.com?option=com_wordpress&#038;Itemid=265</link>
	<description></description>
	<lastBuildDate>Mon, 14 May 2012 05:47:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1.2</generator>
		<item>
		<title>Announcing the MapR Hive ODBC Driver</title>
		<link>http://www.mapr.com/?p=269&#038;option=com_wordpress&#038;Itemid=265</link>
		<comments>http://www.mapr.com/?p=269&#038;option=com_wordpress&#038;Itemid=265#comments</comments>
		<pubDate>Sat, 12 May 2012 05:40:33 +0000</pubDate>
		<dc:creator>Tomer</dc:creator>
				<category><![CDATA[MapR Technologies Blog]]></category>

		<guid isPermaLink="false">http://www.mapr.com/?p=269&#038;option=com_wordpress&#038;Itemid=265</guid>
		<description><![CDATA[I’m happy to announce that we just released the MapR Hive ODBC Driver. It is available to all MapR M3 and M5 users. The MapR Hive ODBC Driver is a standard ODBC 3.52 driver, allowing our users to leverage hundreds &#8230; <a href="http://www.mapr.com/?p=269&#038;option=com_wordpress&#038;Itemid=265">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I’m happy to announce that we just released the MapR Hive ODBC Driver. It is available to all MapR M3 and M5 users.</p>
<p>The MapR Hive ODBC Driver is a standard ODBC 3.52 driver, allowing our users to leverage hundreds of commercial and open source SQL-based tools, such as query builders and BI applications. For example, we’ve tested our Hive ODBC Connector with Excel, Tableau, MicroStrategy and a variety of 100% open source SQL tools, such as Kaimon (<a href="http://www.kaimon.cl/">http://www.kaimon.cl</a>).</p>
<p>As you may know, we deliver the most open Hadoop distribution, supporting the broadest ecosystem of applications. As a Hadoop distribution, we obviously support the Hadoop APIs, but we also support a standard NFS interface, which allows any file-based application to read and write data. With our new Hive ODBC Connector, we allow SQL-based applications to run SQL queries (via Hive).</p>
<p>Other Hadoop vendors have struggled to support connectivity with third-party SQL-based applications, resulting in specialized, proprietary connectors. For example, one Hadoop vendor had to build special-purpose connectors for Tableau and MicroStrategy (based on partial ODBC 2.x support), and even a rudimentary Web-based query builder. Another Hadoop vendor announced a specialized connector for Excel. We decided to take the more open approach of providing a single ODBC driver that complies with the latest ODBC 3.52 standard so that our users can utilize practically any SQL-based query builder or BI application with no special-purpose connectors.</p>
<p>Here’s a screenshot showing Kaimon, an open source SQL query builder, analyzing some Apache httpd logs via the MapR Hive ODBC Connector:<br />
<a href="http://www.mapr.com/components/com_wordpress/wp/wp-content/uploads/2012/05/uiodbcscreen11.jpg"><img class="alignnone size-full wp-image-272" title="uiodbcscreen1" src="http://www.mapr.com/components/com_wordpress/wp/wp-content/uploads/2012/05/uiodbcscreen11.jpg" alt="" width="819" height="460" /></a></p>
<p>Here’s a similar screenshot with Microsoft Excel:</p>
<p><a href="http://www.mapr.com/components/com_wordpress/wp/wp-content/uploads/2012/05/uiodbcscreen2.jpg"><img class="alignnone size-full wp-image-273" title="uiodbcscreen2" src="http://www.mapr.com/components/com_wordpress/wp/wp-content/uploads/2012/05/uiodbcscreen2.jpg" alt="" width="819" height="460" /></a></p>
<p>To get started, check out the Hive ODBC Connector page in our documentation: <a href="http://www.mapr.com/doc/display/MapR/Hive+ODBC+Connector">http://www.mapr.com/doc/display/MapR/Hive+ODBC+Connector</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mapr.com/?feed=rss2&amp;p=269&#038;option=com_wordpress&#038;Itemid=265</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Haystacks and Jet Packs &#8211; How Hadoop Changes Everything</title>
		<link>http://www.mapr.com/?p=262&#038;option=com_wordpress&#038;Itemid=265</link>
		<comments>http://www.mapr.com/?p=262&#038;option=com_wordpress&#038;Itemid=265#comments</comments>
		<pubDate>Mon, 23 Apr 2012 17:30:46 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[MapR Technologies Blog]]></category>

		<guid isPermaLink="false">http://www.mapr.com/?p=262&#038;option=com_wordpress&#038;Itemid=265</guid>
		<description><![CDATA[I thought for sure we&#8217;d have flying cars and jet packs by the 21st Century. It turns out that the new tools for data analysis are far more important. Only a few short years ago, the best tools available could &#8230; <a href="http://www.mapr.com/?p=262&#038;option=com_wordpress&#038;Itemid=265">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I thought for sure we&#8217;d have flying cars and jet packs by the 21st Century. It turns out that the new tools for data analysis are far more important.</p>
<p>Only a few short years ago, the best tools available could only analyze small representative samples of big data sets. To figure out who will win the election, ask randomly selected people. To establish a correlation between behavior and outcome, do a study on a small group. This is an excellent way to find the common cases, but does nothing to turn up interesting anomalies&#8211;it&#8217;s like looking for a needle in a haystack by searching one percent of the hay&#8211;or one thousandth of one percent.</p>
<p>The new tools for analyzing Big Data can search the entire haystack&#8211;and are scaling up to deal with bigger and bigger haystacks. You may have heard this by now, but Big Data is big. How big? The <a href="http://public.web.cern.ch/public/en/lhc/Computing-en.html">Large Hadron Collider</a> produces over a petabyte per month. The <a href="http://arstechnica.com/science/news/2012/04/future-telescope-array-drives-development-of-exabyte-processing.ars">Square Kilometre Array</a> radio telescope, which is being developed over the next decade or so, will produce at least an exabyte per day of astronomical data. According to <a href="http://techcrunch.com/2010/08/04/schmidt-data/">Google CEO Eric Schmidt</a>, every two days the world produces as much data as it did in total up until 2003. The amount of data produced by the world nearly doubles every year&#8211;and the world produced one zettabyte in 2010 (according to <a href="http://blog.mastermaq.ca/2011/06/28/1-2-zettabytes-of-data-created-in-2010/">IDC</a>).</p>
<p>It&#8217;s not just scientists, search engines, and social media sites that need to process this much data. Retailers, merchants, and credit card companies are looking for patterns in massive flows of transaction, click, and sentiment data to fine-tune marketing, prevent fraud, and optimize the customer experience. Here&#8217;s an example. Let&#8217;s say there&#8217;s a transaction on your credit card indicating that you bought a tank of gas. An hour or so later, another transaction shows that you ate lunch. If the restaurant is within 30 miles or so of the gas station, then the speed you would have had to travel&#8211;the card velocity&#8211;is a plausible 30 miles per hour. But if the restaurant is, say, 500 miles away, then the card velocity is a good indicator of fraud, unless you also own a jet pack. Only by processing data about every single transaction and its location can valuable insights like card velocity be gained.</p>
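<p>As a rough sketch of the card velocity check described above: compute the great-circle distance between two consecutive transactions and divide by the time between them. The <code>Transaction</code> class, the function names, the 100 mph threshold and the sample coordinates used in testing are illustrative assumptions, not part of any MapR or card-network API.</p>

```python
import math
from dataclasses import dataclass
from datetime import datetime

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass
class Transaction:
    """Hypothetical record: when and where a card was used."""
    when: datetime
    lat: float  # degrees
    lon: float  # degrees


def distance_miles(a: Transaction, b: Transaction) -> float:
    """Great-circle (haversine) distance between two transactions."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a.lat, a.lon, b.lat, b.lon))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def card_velocity_mph(a: Transaction, b: Transaction) -> float:
    """Speed the cardholder would have needed to make both purchases."""
    miles = distance_miles(a, b)
    hours = abs((b.when - a.when).total_seconds()) / 3600
    if hours == 0:
        # Simultaneous purchases: harmless if co-located, impossible otherwise.
        return 0.0 if miles == 0 else math.inf
    return miles / hours


def looks_fraudulent(a: Transaction, b: Transaction, max_mph: float = 100.0) -> bool:
    """Flag the pair when the implied speed exceeds an assumed threshold."""
    return card_velocity_mph(a, b) > max_mph
```

<p>In practice this check would run over every consecutive pair of transactions per card, which is exactly the kind of full-data scan that sampling cannot replace.</p>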
<p>The <a href="http://www.nytimes.com/2011/04/26/science/26planetarium.html?pagewanted=all">New York Times</a> reported that data measurement could be as important as the invention of the microscope. By taking a snapshot of the data at various points in time, of course, you can tune the microscope by preserving a history of previous results, which makes it possible to further refine data analysis models.</p>
<p>This is the true beginning of the 21st Century. The possibilities that will grow from this new technology can barely be imagined today. So as you start to search for the needle in your haystack or even the needle in your jet pack, we&#8217;d love to hear the ways MapR made it possible for you.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mapr.com/?feed=rss2&amp;p=262&#038;option=com_wordpress&#038;Itemid=265</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lights Out Data Center Ready Hadoop</title>
		<link>http://www.mapr.com/?p=258&#038;option=com_wordpress&#038;Itemid=265</link>
		<comments>http://www.mapr.com/?p=258&#038;option=com_wordpress&#038;Itemid=265#comments</comments>
		<pubDate>Thu, 12 Apr 2012 00:39:50 +0000</pubDate>
		<dc:creator>Jack</dc:creator>
				<category><![CDATA[MapR Technologies Blog]]></category>

		<guid isPermaLink="false">http://www.mapr.com/?p=258&#038;option=com_wordpress&#038;Itemid=265</guid>
		<description><![CDATA[What does it mean to be “Lights Out Data Center Ready”? It means that any failures whether hardware, software or user errors do not require immediate administrator action. On a scheduled basis administrators can visit the data center and perform &#8230; <a href="http://www.mapr.com/?p=258&#038;option=com_wordpress&#038;Itemid=265">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>What does it mean to be “Lights Out Data Center Ready”? It means that failures, whether caused by hardware, software or user error, do not require immediate administrator action. On a scheduled basis, administrators can visit the data center and perform maintenance that is now routine, not an emergency. Picture an administrator with a shopping cart full of disk drives casually moving through the aisle.</p>
<p>In discussions with customers it is immediately clear that they are confused by the various descriptions of Hadoop High Availability. My personal favorite is another vendor’s description of their HA as providing “Hot Manual Failover”. Huh? How is a manual failover process “hot”?  What generates the heat exactly &#8212; the flames by business users when the cluster is unavailable? This has to be the biggest oxymoron to hit business continuity since “Highly Available – Not”. At least it’s clear from the latter that it isn’t really Highly Available.</p>
<p>In contrast, MapR has been designed specifically for High Availability and is the only Hadoop distribution with no single points of failure. Other distributions use a single NameNode, and when that NameNode goes down, the entire cluster becomes unavailable and you lose data. With MapR, the NameNode function is distributed across the cluster. In a sense, MapR has a “No NameNode” architecture so there is no data loss or downtime, even in the face of multiple disk or node failures.</p>
<p>When we talk about high availability we’re talking about automated, stateful failover for all software and hardware errors. Automated re-replication of data means that your system will work through any errors without issues. MapR’s rolling upgrades guarantee high availability during routine hardware and software maintenance.</p>
<p>MapR is also built to give full data protection with Mirroring and Snapshots – features designed to efficiently maintain data integrity and business continuity across clusters and sites. This is significant because the replication that other Hadoop distributions use does not protect against user or application errors, which are simply replicated across the cluster; with MapR you are fully protected. MapR makes data protection easy and built in. Furthermore, writes to the original data incur no performance loss during a snapshot, and a petabyte snapshot can be performed in only seconds.</p>
<p>So when considering High Availability for Hadoop make sure to get the complete picture, and then you can safely turn off the lights.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mapr.com/?feed=rss2&amp;p=258&#038;option=com_wordpress&#038;Itemid=265</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Alan&#8217;s Partner Blog</title>
		<link>http://www.mapr.com/?p=251&#038;option=com_wordpress&#038;Itemid=265</link>
		<comments>http://www.mapr.com/?p=251&#038;option=com_wordpress&#038;Itemid=265#comments</comments>
		<pubDate>Tue, 06 Mar 2012 22:58:02 +0000</pubDate>
		<dc:creator>Alan</dc:creator>
		
		<guid isPermaLink="false">http://www.mapr.com/?p=251&#038;option=com_wordpress&#038;Itemid=265</guid>
		<description><![CDATA[Enterprises are benefitting directly from MapR’s partnerships. Cisco recently announced the integration of MapR Technology into the Cisco UCS platform and this week, MapR And Informatica Announced joint support to deliver high performance Big Data integration and analysis. Cisco has &#8230; <a href="http://www.mapr.com/?p=251&#038;option=com_wordpress&#038;Itemid=265">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Enterprises are benefitting directly from MapR’s partnerships. <a href="http://www.theregister.co.uk/2012/02/14/emc_cisco_greenplum_hadoop_stack/">Cisco recently announced</a> the integration of MapR Technology into the Cisco UCS platform and this week, <a href="http://www.informationweek.com/news/software/info_management/232601984">MapR And Informatica Announced</a> joint support to deliver high performance Big Data integration and analysis.</p>
<p>Cisco has published <a href="http://www.cisco.com/en/US/solutions/collateral/ns340/ns517/ns224/ns944/wp_greenplum.pdf">high performance reference configurations</a> and Informatica announced a series of integration capabilities with MapR including real-time streaming into MapR with their Ultra Messaging technology.  <a href="http://www.informationweek.com/news/software/info_management/232601984">InformationWeek</a> states:  “For now, MapR has taken the lead in providing a low-latency option for streaming big data directly into Hadoop&#8217;s core MapReduce processing environment, and that counts as an edge on rival distributors.”</p>
<p>The timing of the partnerships and interest in MapR makes sense.  For customers that are considering distributions for Apache Hadoop, MapR provides unique functionality such as automated failover, dynamic file-based access and streaming support, and enterprise-grade data protection.  The value of the MapR distribution is validated by the fact that the technology leaders in Storage (EMC), Data Management (Informatica), and Hardware (Cisco and others) are looking to MapR to help them address their customers’ Big Data needs.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mapr.com/?feed=rss2&amp;p=251&#038;option=com_wordpress&#038;Itemid=265</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Free Hadoop Training &#8211; Did I Get Your Attention?</title>
		<link>http://www.mapr.com/?p=241&#038;option=com_wordpress&#038;Itemid=265</link>
		<comments>http://www.mapr.com/?p=241&#038;option=com_wordpress&#038;Itemid=265#comments</comments>
		<pubDate>Thu, 23 Feb 2012 01:27:14 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[MapR Technologies Blog]]></category>

		<guid isPermaLink="false">http://www.mapr.com/?p=241&#038;option=com_wordpress&#038;Itemid=265</guid>
		<description><![CDATA[In the fall of last year, we launched the MapR Academy, the first free online training resource for Hadoop and MapReduce. Hadoop itself is so new that there is still a huge knowledge gap. Sure, there are a few experts &#8230; <a href="http://www.mapr.com/?p=241&#038;option=com_wordpress&#038;Itemid=265">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>In the fall of last year, we launched the <a href="http://academy.mapr.com">MapR Academy</a>, the first free online training resource for Hadoop and MapReduce. Hadoop itself is so new that there is still a huge knowledge gap. Sure, there are a few experts out there, but most people still don&#8217;t know what Hadoop is, or why they should be interested in the first place.</p>
<p>Imagine trying to explain digital photography to someone 20 years ago.  Film was more than adequate to the task at hand, very inexpensive, and  easy to use. Sophisticated cameras took care of exposure and focus,  abstracting away the &#8220;implementation details&#8221; and letting the  photographer focus on snapping a picture. What motivation could the  typical shutterbug possibly have for switching to digital photography?  Pictures are pictures, after all.</p>
<p>Big Data is in the same state today. It&#8217;s difficult to imagine what  benefits Hadoop can provide, in the same way that early adopters of  digital photography could not have anticipated the impact of social  media. Just as the digital revolution completely changed how we think  about images, Hadoop is a complete paradigm shift in the way data is  stored and analyzed. Hadoop is the key to a future we are only just  beginning to imagine.</p>
<p>One of the first tasks of the MapR Academy is to get you started on the  basics of this disruptive technology. As the MapR Academy grows, we&#8217;ll  be adding more focused tutorials and advanced classes. I&#8217;d like to  invite you to take a look around, watch a few videos, and let us know  what you would like to see next in the curriculum.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mapr.com/?feed=rss2&amp;p=241&#038;option=com_wordpress&#038;Itemid=265</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Top Misconceptions about Big Data and Hadoop</title>
		<link>http://www.mapr.com/?p=237&#038;option=com_wordpress&#038;Itemid=265</link>
		<comments>http://www.mapr.com/?p=237&#038;option=com_wordpress&#038;Itemid=265#comments</comments>
		<pubDate>Wed, 08 Feb 2012 04:18:12 +0000</pubDate>
		<dc:creator>Jack</dc:creator>
				<category><![CDATA[MapR Technologies Blog]]></category>

		<guid isPermaLink="false">http://www.mapr.com/?p=237&#038;option=com_wordpress&#038;Itemid=265</guid>
		<description><![CDATA[The Hadoop market is a fast growing, expanding and exciting ecosystem, but this can also be accompanied by confusion. I thought I’d take a stab at addressing some of the Big Misconceptions about Big Data.  I. First of all, the &#8230; <a href="http://www.mapr.com/?p=237&#038;option=com_wordpress&#038;Itemid=265">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The Hadoop market is a fast growing, expanding and exciting ecosystem, but this can also be accompanied by confusion. I thought I’d take a stab at addressing some of the Big Misconceptions about Big Data.</p>
<p>First of all, the term Big Data is approaching Cloud in its utter lack of descriptiveness. That said, Big Data is not simply about massive amounts of data – petabytes and beyond. Big Data represents a paradigm shift. It’s about new, unstructured data sources. It’s about avoiding schema definitions and transformations. There’s no need to structure data before you can derive benefits. It’s about bringing data and compute together to perform better and faster analysis. Through Hadoop, organizations can benefit even with relatively small amounts of data.</p>
<p>Because Hadoop has a funny name and is somewhat new, people assume it must be risky. Huge amounts of investment and work have addressed these concerns. Hadoop has emerged as a standard. The rich ecosystem around Hadoop has provided a lot of flexibility, choice, and trained professionals. There are production-grade distributions available (such as MapR) that provide full data protection, automatic stateful failover and business continuity. The deployed footprint, complementary products, and available technical resources all contribute to Hadoop adoption. And with that, the number and breadth of deployed Hadoop applications have expanded rapidly.</p>
<p>Another misconception about Hadoop is that it is a batch process. This is an artifact of the HDFS implementation and not a limitation of Hadoop per se. MapR, for example, provides full support for streaming analytics and real-time processing.</p>
<p>Perhaps the biggest misconception is that Hadoop is a single, monolithic component. Hadoop is a framework &#8212; a complete stack for distributing applications and data. Hadoop supports multiple programming paradigms and includes packages such as Pig, Hive and others. There are packages for data ingress/egress, ETL, and data integration, as well as specific components for machine learning. Most distributions integrate, test and harden these packages along with some proprietary extensions.</p>
<p>With respect to open source, the question about a distribution is not a simple binary “open” or “closed”. The question is which components are open and which areas the proprietary value-added components address. In the case of Cloudera, the proprietary extensions are in the management tools. MapR has chosen to innovate in the areas that provide the most benefits to customers while also being the most difficult for the community to effectively address. These also happen to be the areas that customers have the least desire to modify, such as the underlying storage services. MapR’s distribution includes value-added improvements along with all of the open source programming, data access, and machine learning packages.</p>
<p>These are some of the top misconceptions. Let me know what other areas you’d like us to address.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mapr.com/?feed=rss2&amp;p=237&#038;option=com_wordpress&#038;Itemid=265</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Top 10 NameNode-related problems</title>
		<link>http://www.mapr.com/?p=227&#038;option=com_wordpress&#038;Itemid=265</link>
		<comments>http://www.mapr.com/?p=227&#038;option=com_wordpress&#038;Itemid=265#comments</comments>
		<pubDate>Fri, 27 Jan 2012 00:48:13 +0000</pubDate>
		<dc:creator>Tomer</dc:creator>
				<category><![CDATA[MapR Technologies Blog]]></category>

		<guid isPermaLink="false">http://www.mapr.com/?p=227&#038;option=com_wordpress&#038;Itemid=265</guid>
		<description><![CDATA[After joining MapR back in 2009, I spent many months meeting with early Hadoop users and listening to their pain points. In many of these meetings, users described problems related to the HDFS architecture and the NameNode in particular. In this &#8230; <a href="http://www.mapr.com/?p=227&#038;option=com_wordpress&#038;Itemid=265">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>After joining MapR back in 2009, I spent many months meeting with early Hadoop users and listening to their pain points. In many of these meetings, users described problems related to the HDFS architecture and the NameNode in particular. In this blog post I wanted to share 10 NameNode-related issues that came up frequently in these meetings:</p>
<ol>
<li>We want HA, but the NameNode is a single point of failure. This results in downtime due to hardware failures and user errors. In addition, it is often non-trivial to recover from a NameNode failure, so our Hadoop administrators always need to be on call.</li>
<li>We want to run Hadoop with 100% commodity hardware. To run HDFS in production and not lose all our data in the event of a power outage, HDFS requires us to deploy a commercial NAS to which the NameNode can write a copy of its edit log. In addition to the prohibitive cost of a commercial NAS, the entire cluster goes down any time the NAS is down, because the NameNode needs to hard-mount the NAS (for consistency reasons).</li>
<li>We need both a NameNode and a Secondary NameNode. We read some documentation that suggested purchasing higher-end servers for these roles (e.g., dual power supplies). We only have 20 nodes in the cluster, so this represents a 15-20% hardware cost overhead with no real value (i.e., it doesn’t contribute to the overall capacity or throughput of the cluster).</li>
<li>We have a significant number of files. Even though we have hundreds of nodes in the cluster, the NameNode keeps all its metadata in memory, so we are limited to a maximum of only 50-100M files in the entire cluster. While we can work around that by concatenating files into larger files, that adds tremendous complexity. (Imagine what it would be like if you had to start combining the documents on your laptop into zip files because there was a severe limit on how many files you could have.)</li>
<li>We have a relatively small cluster, with only 10 nodes. Due to the DataNode-NameNode block report mechanism, we cannot exceed 100-200K blocks (or files) per node, thereby limiting our 10-node cluster to less than 2M files. While we can work around that by concatenating files into larger files, that adds tremendous complexity.</li>
<li>We hired a new engineer who did not understand the architectural issues and ran a simple directory traversal (the equivalent of the find command). This created so much load on the NameNode that it simply crashed, and the entire cluster was down.</li>
<li>We need much higher performance when creating and processing a large number of files (especially small files). Hadoop is extremely slow.</li>
<li>We have had outages and latency spikes due to garbage collection on the NameNode. Although we are using the CMS (concurrent mark and sweep) garbage collector, the NameNode still freezes occasionally, causing the DataNodes to lose connectivity (i.e., become blacklisted).</li>
<li>When we change permissions on a file (chmod 400 foo), the changes do not affect existing clients who have already opened the file. We have no way of knowing who the clients are. It’s impossible to know when the permission changes would really become effective, if at all.</li>
<li>We have lost data due to various errors on the NameNode. In one case, the root partition ran out of space, and the NameNode crashed with a corrupted edit log.</li>
</ol>
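<p>The arithmetic behind the block-report limit in item 5 can be sketched roughly as follows. The function name and the <code>replication</code> parameter are illustrative assumptions rather than anything taken from HDFS itself; the sketch also assumes the best case of one block per file. With no replication the numbers match the 2M figure quoted above, and with three replicas per block (the usual HDFS default) the ceiling drops to about a third of that.</p>

```python
def max_files_from_block_reports(nodes: int, blocks_per_node: int,
                                 replication: int = 1) -> int:
    """Upper bound on distinct files, assuming one block per file.

    Every replica of a block counts against some DataNode's block budget,
    so the cluster-wide replica budget is divided by the replication factor.
    """
    if nodes <= 0 or blocks_per_node <= 0 or replication <= 0:
        raise ValueError("all arguments must be positive")
    return (nodes * blocks_per_node) // replication


if __name__ == "__main__":
    # The 10-node cluster from item 5, at the high end of the quoted range.
    print(max_files_from_block_reports(10, 200_000))
    # Same cluster, but storing three replicas of every block.
    print(max_files_from_block_reports(10, 200_000, replication=3))
```

<p>Files larger than one block lower the ceiling further, since each additional block consumes more of the same per-node budget.</p>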
<p>When we looked at this list of NameNode-related problems, it was clear to all of us that the only viable solution was to eliminate the NameNode. Our engineering team spent two years re-architecting Hadoop’s storage layer (as well as advancing Hadoop’s MapReduce layer and developing the leading management suite for Hadoop).</p>
<p>The end result is that we have eliminated these 10 issues and many others. In my next blog post I’ll dive deeper into our no-NameNode architecture so that you can understand how it works, and why it really eliminates the issues with NameNode-based architectures (including all planned HDFS enhancements, such as HDFS Federation and HA NameNode). In the meantime, if you’ve run into other NameNode-related problems that I haven’t listed, let me know.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mapr.com/?feed=rss2&amp;p=227&#038;option=com_wordpress&#038;Itemid=265</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sr. Director of Business Development, Alan Geary, discusses why MapR is the best Hadoop Distribution for Partners</title>
		<link>http://www.mapr.com/?p=225&#038;option=com_wordpress&#038;Itemid=265</link>
		<comments>http://www.mapr.com/?p=225&#038;option=com_wordpress&#038;Itemid=265#comments</comments>
		<pubDate>Tue, 24 Jan 2012 08:48:31 +0000</pubDate>
		<dc:creator>Alan</dc:creator>
				<category><![CDATA[MapR Technologies Blog]]></category>

		<guid isPermaLink="false">http://www.mapr.com/?p=225&#038;option=com_wordpress&#038;Itemid=265</guid>
		<description><![CDATA[Having worked in hyper-growth companies, (3Com for 8 years, where we grew from a modest base to $5.5 Billion, and with VMware for the past 7 years) I’ve learned the keys to success. The formula for success I saw in &#8230; <a href="http://www.mapr.com/?p=225&#038;option=com_wordpress&#038;Itemid=265">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Having worked in hyper-growth companies (3Com for 8 years, where we grew from a modest base to $5.5 Billion, and VMware for the past 7 years), I’ve learned the keys to success.  The formula for success I saw in these high growth companies includes technology leadership, a great management team, and a solid strategy.  In our industry, the formula is relatively simple.  Putting that formula to work is the difficult part.  To me, MapR has the keys to success and is on the hyper-growth journey we all want to be a part of.</p>
<p>I believe another critical element, once the foundational formula is in place, is a commitment to partnering to achieve this growth.  If you are a consulting partner, MapR will work with you to position your business advisory services, POC or Pilots, and other services.  If you are a technology partner, we’ll work with you to differentiate our joint offering to customers.  We do that with benchmarking, interoperability statements, reference architecture, joint promotions, event marketing, etc.</p>
<p>In the short time I have been with MapR, I’ve been impressed by the involvement of partners with our customers and MapR’s commitment to the channel.   We want our partners to profit from services.  Hadoop expertise is in limited supply today, but that is a big opportunity for our partners.  The demand is there&#8211;we are seeing it from our customers.  Our partners will profit from that demand on both the consulting and training fronts.</p>
<p>Joining MapR was an easy decision for me.  If you are a consulting or technology partner, working with MapR should be an easy decision for you as well. I invite you to ask yourself a few simple questions:  Who is the best company to partner with?  Who provides the easiest to use and the most dependable product in the Big Data space?  Who isn’t going to compete for training or consulting dollars in your accounts?  Whose management team understands and is committed to partnering?  The answer is MapR.  To hear more, view my <a href="http://www.mapr.com/video/alangeary">video</a> and learn more about partnering with MapR.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mapr.com/?feed=rss2&amp;p=225&#038;option=com_wordpress&#038;Itemid=265</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Big Year for Hadoop</title>
		<link>http://www.mapr.com/?p=201&#038;option=com_wordpress&#038;Itemid=265</link>
		<comments>http://www.mapr.com/?p=201&#038;option=com_wordpress&#038;Itemid=265#comments</comments>
		<pubDate>Tue, 17 Jan 2012 08:30:16 +0000</pubDate>
		<dc:creator>Jack</dc:creator>
				<category><![CDATA[MapR Technologies Blog]]></category>

		<guid isPermaLink="false">http://www.mapr.com/?p=201&#038;option=com_wordpress&#038;Itemid=265</guid>
		<description><![CDATA[Our CEO, John Schroeder, was recently interviewed in the press and asked about his predictions for Hadoop in 2012. Simply put, he sees a Big year for Big Data. It’s not just the scale of data growth. John shared his &#8230; <a href="http://www.mapr.com/?p=201&#038;option=com_wordpress&#038;Itemid=265">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Our CEO, John Schroeder, was recently interviewed in the press and asked about his predictions for Hadoop in 2012. Simply put, he sees a Big year for Big Data. It’s not just the scale of data growth. John shared his view that the ability to process and analyze Big Data is changing the game for companies and it’s changing the game in every aspect of their business.</p>
<p>Many enterprise IT vendors are wrapping themselves in some sort of “Big Data” cloak. The storage vendors talk about how they’ve always understood Big Data and cite the petabytes of storage that customers manage with their technology. The database and data warehouse providers make similar claims. Virtually every vendor has some sort of Big Data presentation, and by virtually every vendor I am of course including the virtualization providers. When MapR refers to Big Data, we are talking about the MapReduce framework. This is the game-changing approach popularized by Google.</p>
<p>We tend to take Google’s dominance for granted, but when the Google search beta debuted in 1998 it was the 19th search engine on the market. The market was already well served by Yahoo, Excite, Infoseek, AltaVista, and a host of others. Within two short years, Google was the dominant player. The reason? MapReduce enabled Google to index much more data, much more quickly, and much more cheaply than any other provider. MapReduce is a paradigm shift, a new architecture that trumps existing approaches and gives any organization the same power to change its competitive landscape. Google published a white paper on MapReduce in 2004. Doug Cutting, who went on to join Yahoo, read the paper, and the result was Hadoop. We’ve seen Hadoop emerge as a robust ecosystem, with innovations happening across the Hadoop stack.</p>
<p>This is shaping up to be a big year. <a href="http://www.mapr.com/company/press-releases/mapr-ceo-sees-big-changes-in-big-data-in-2012">John’s predictions for 2012</a> cover five major developments in Big Data:<br />
•	Hadoop emerges as the safe platform choice for Big Data. The deployed footprint, complementary products, and available technical resources all reinforce the adoption of Hadoop.<br />
•	Real-time analytics take off. Analyzing streaming data, from application logs to messages, augments existing batch applications.<br />
•	Hadoop applications move from experimental to mission critical. The number and breadth of deployed Hadoop applications also expand.<br />
•	Consulting firms augment their offerings with Hadoop-specific consulting services, expanding the number of available services vendors. Organizations benefit from the large and growing pool of education and consultancy services.<br />
•	Big Data is no longer limited to companies that can ‘roll their own’ as the application ecosystem expands rapidly. In addition to the rapid expansion of Hadoop applications, 2012 sees the emergence of applications and services that leverage an underlying Hadoop engine.</p>
<p>We’re looking forward to a big year. We hope you join us.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mapr.com/?feed=rss2&amp;p=201&#038;option=com_wordpress&#038;Itemid=265</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Announcing version 1.2 of MapR&#8217;s Distribution for Hadoop</title>
		<link>http://www.mapr.com/?p=188&#038;option=com_wordpress&#038;Itemid=265</link>
		<comments>http://www.mapr.com/?p=188&#038;option=com_wordpress&#038;Itemid=265#comments</comments>
		<pubDate>Wed, 07 Dec 2011 07:10:27 +0000</pubDate>
		<dc:creator>Tomer</dc:creator>
				<category><![CDATA[MapR Technologies Blog]]></category>

		<guid isPermaLink="false">http://www.mapr.com/?p=188&#038;option=com_wordpress&#038;Itemid=265</guid>
		<description><![CDATA[Today we announced version 1.2 of the MapR Distribution for Apache Hadoop. With this release, MapR continues to push the envelope by making Hadoop more accessible to more users, more languages, and more platforms. This release includes numerous features and &#8230; <a href="http://www.mapr.com/?p=188&#038;option=com_wordpress&#038;Itemid=265">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Today we announced version 1.2 of the MapR Distribution for Apache Hadoop. With this release, MapR continues to push the envelope by making Hadoop more accessible to more users, more languages, and more platforms. This release includes numerous features and capabilities, including:</p>
<ul>
<li><strong>Ability to take advantage of the next-generation resource management framework:</strong> MapR users will be able to take advantage of MapReduce 2.0 once it is ready for production use. Although the community is expected to take several months to stabilize Hadoop 0.23, users will then get the combined benefits of MapReduce 2.0, such as backward compatibility and scalability, and of MapR’s unique capabilities, such as HA (no lost tasks or jobs during a JobTracker or ApplicationMaster failure) and the high-performance shuffle.</li>
</ul>
<ul>
<li><strong>High-performance native access library:</strong> With version 1.2, MapR provides a libhdfs implementation that bypasses Java altogether and provides high-performance access to the distributed file system from C/C++ applications and compatible scripting languages. There is no need to recompile applications that use libhdfs, since the API (header file) is identical.</li>
</ul>
<ul>
<li><strong>Upgrade of various packages including HBase, Hive and Pig:</strong> The HBase package in the MapR distribution has been upgraded to release 0.90.4. In addition, MapR has identified several critical stability and data corruption issues in 0.90.4, which we have addressed by backporting 15 fixes from future HBase releases. Versions of Hive and Pig have also been upgraded in the MapR distribution, so users can leverage the latest bug fixes and features available from these Apache projects.</li>
</ul>
<ul>
<li><strong>MapR Virtual Machine (VM):</strong> MapR now provides a VMware virtual machine that allows users to experiment with the MapR distribution. Although this environment is not suitable for performance or scale testing, it makes it easy to experiment with some of MapR’s unique capabilities, such as NFS and snapshots. The VM is also a great asset if you are new to Hadoop, because you can be up and running in any environment (e.g., your laptop) within minutes.</li>
</ul>
<ul>
<li><strong>Additional performance improvements:</strong> The MapR distribution is already 2-5x faster than other distributions on typical Hadoop workloads, including the standard DFSIO and Terasort benchmarks, resulting in a significant hardware cost reduction. The 1.2 release continues to push the envelope, with a number of performance improvements in the platform (file system and MapReduce layers).</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.mapr.com/?feed=rss2&amp;p=188&#038;option=com_wordpress&#038;Itemid=265</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

