Hive on MR3

The best way to run Apache Hive in production

What do we do?

We develop MR3, an execution engine for big data processing, and maintain its main application, Hive on MR3. MR3 provides native support for both Hadoop and Kubernetes.

We provide a quick and ready solution to the following problems.

#1. You want to run Hive directly on Kubernetes.

As enterprise environments move to Kubernetes at an accelerating pace, the industry is looking for a way to run Hive directly on Kubernetes. Unfortunately, the only option available today is a workaround that first runs Hadoop on Kubernetes and then runs Hive on Hadoop. Hive on MR3 is a perfect solution to this problem, and it is ready for you.

#2. You want to install Hive on Hadoop or upgrade an old version of Hive running on Hadoop.

Installing Hive or upgrading to a newer version of Hive is not a simple task. It usually requires upgrading a few important dependencies of Hive and may even require upgrading Hadoop itself. In contrast, Hive on MR3 is very easy to install and requires no change to an existing installation of Hadoop.

#3. You want to run Hive without installing Kubernetes or Hadoop.

MR3 supports standalone mode, which does not require a resource manager such as Hadoop or Kubernetes. In standalone mode, you can run Hive in virtually any type of cluster. Installing Hive on MR3 in standalone mode is also very easy.

Why is our solution better than alternatives?

#1. Hive on MR3 is stable with 700+ security and critical patches backported.

Anyone who manually builds Apache Hive 3 (such as version 3.1.3) soon discovers that it is not really ready for production use because many important patches have not been merged into the source repository. We have backported over 700 important patches to Hive on MR3 and continue to backport more.

#2. Hive on MR3 achieves the speed of LLAP and runs as fast as Trino.

LLAP (Low-Latency Analytical Processing) is a major component of Hive which allows it to far outperform competing technologies. Enabling LLAP, however, is excruciatingly difficult because of its complex architecture. Hive on MR3 automatically achieves the speed of LLAP with no additional configuration. In our latest performance evaluation, Hive on MR3 (with fault tolerance) runs as fast as Trino (without fault tolerance).

#3. Hive on MR3 can run concurrently with Spark on MR3.

Hive on MR3 can run concurrently with Spark on MR3, which is another major application of MR3. Compared with vanilla Spark, Spark on MR3 enables multiple Spark applications to share compute resources, thus significantly increasing resource utilization.

#4. Hive on MR3 involves no vendor lock-in.

Running Hive on MR3 carries no risk of vendor lock-in. Since Hive on MR3 runs with Hive Metastore, you can switch back to Apache Hive or an alternative technology at any time.

Our solution is being deployed in production environments.

Intrusion, USA

Intrusion, a leader in cybersecurity, has adopted Hive on MR3 on Kubernetes for its data warehouse. Previously Intrusion was running HDP (Hortonworks Data Platform). All the existing code, including SQL queries, Java UDFs, and Python transform scripts, has been successfully migrated.

Shuyun, China

Shuyun, a provider of data services for marketing enterprises, uses Hive on MR3 on Hadoop to power its 1-petabyte data warehouse. After switching from HDP to Hive on MR3, the data analysis department has seen its productivity increase by a factor of 10!

Ready to get started?

If you are interested in our solution, you can try it yourself or request a demo. If you have any questions about our solution, please contact us.

Latest news