Hive on MR3 & Spark on MR3

Hive and Spark powered by MR3, a new execution engine

What do we do?

We develop MR3, a new execution engine, and maintain its two applications: Hive on MR3 and Spark on MR3. With excellent performance and rich features, MR3 is a significant improvement over its predecessors, MapReduce and Tez. MR3 provides native support for both Hadoop and Kubernetes.

We provide a quick and ready solution to the following problems.

#1. You want to run Hive directly on Kubernetes.

As enterprise environments move to Kubernetes at an accelerating pace, the industry is looking for a way to run Hive directly on Kubernetes. Unfortunately, the only option available today is a workaround that first runs Hadoop on Kubernetes and then runs Hive on Hadoop. Hive on MR3 solves this problem by running Hive directly on Kubernetes.

#2. You want to upgrade an old version of Hive running on Hadoop (such as HDP or CDH).

Upgrading to a newer version of Hive is not a simple task. It usually requires upgrading several important dependencies of Hive and may even require upgrading Hadoop itself. In contrast, Hive on MR3 is very easy to install and requires no change to an existing installation of Hadoop.

#3. You are running Hive and Spark on Amazon EMR and want to reduce the cost.

For users of Hive and Spark on Amazon EMR, switching to MR3 on Amazon EKS immediately reduces the cost. Because MR3 is fault-tolerant, you can run worker Pods entirely on spot instances. Because autoscaling is fast, idle instances are quickly removed. Moreover, our solution automatically configures Hive and Spark to share Metastore, which makes it trivial to run Hive and Spark together.

Why is our solution better than alternatives?

#1. Hive on MR3 is stable with 700+ security and critical patches backported.

Anyone who manually builds Apache Hive 3 (such as version 3.1.3) soon discovers that it is not really ready for production use, because many important patches have not been merged into the source repository. We have backported over 700 important patches to Hive on MR3 and continue to backport more.

#2. Hive on MR3 achieves the speed of LLAP with no additional configuration.

LLAP (Low-Latency Analytical Processing) is a major component of Hive which allows it to far outperform competing technologies such as Presto and Spark SQL. Enabling LLAP, however, is excruciatingly difficult because of its complex architecture. Hive on MR3 automatically achieves the speed of LLAP, whether on Hadoop or on Kubernetes.

#3. Spark on MR3 maximizes resource utilization in multi-tenant environments.

While building a Spark cluster is relatively easy, optimizing compute resources for a multi-tenant cluster is far from trivial. This is because different Spark applications maintain their own sets of executors and do not share compute resources. Spark on MR3 enables multiple Spark applications to share compute resources, thus significantly increasing resource utilization.

Our solution is being deployed in production environments.

Intrusion, Inc.

Intrusion, a leader in cybersecurity, has adopted Hive/Spark on MR3 on Kubernetes for its data warehouse. Previously Intrusion was running HDP (Hortonworks Data Platform) based on Hadoop. All the existing code, including SQL queries, Java UDFs, and Python transform scripts, has been successfully migrated.

Ready to get started?

If you are interested in our solution, you can try it yourself or request a demo. If you have any questions about our solution, please contact us.

Latest news