Frequently Asked Questions

Basics

Q. What is MR3?

A. MR3 is a new execution engine for big data processing that provides native support for both Hadoop and Kubernetes.

Q. What applications of MR3 are available?

A. Currently the main application is Hive on MR3, which is Apache Hive running on MR3 as its execution engine.

Q. What components does Hive on MR3 use?

A. It uses Hive for MR3 (Apache Hive extended to use MR3 as the execution engine) and Tez for MR3 (Apache Tez extended to support MR3).

Q. Why does Hive on MR3 use Apache Tez?

A. Because compiling and running Hive requires the runtime library of Tez. MR3 implements all the core components expected of an execution engine and borrows only the runtime library of Tez.

Comparison with alternative technologies

Q. Who is the intended audience of Hive on MR3?

A. Anyone running Hive, Impala, Presto, or SparkSQL on Hadoop, and anyone planning to run SQL-based analytics on Kubernetes.

Q. I am currently using Hive-LLAP. Why should I use Hive on MR3?

A. Because Hive on MR3 is much easier to operate and more performant in concurrent environments. For the benefits of switching to Hive on MR3, see Comparison with Hive-LLAP in MR3docs.

Q. I am currently using Impala on Hadoop. Why should I use Hive on MR3?

A. Because Hive on MR3 is more mature. Besides, MR3 uses ephemeral Yarn Containers and thus allows the user to make better use of cluster resources.

Q. I am currently using Presto/SparkSQL on Hadoop. Why should I use Hive on MR3?

A. Because Hive on MR3 is much faster. In our evaluation based on the TPC-DS benchmark with a scale factor of 10 terabytes, Hive on MR3 runs about three times faster than Presto and about four times faster than SparkSQL with respect to the geometric mean of execution times. For concurrent queries, Hive on MR3 runs even faster. This means that Hive on MR3 requires only about a third as many nodes as Presto, and about a quarter as many as SparkSQL, in order to achieve the same level of throughput. Since both Presto and SparkSQL run with Hive Metastore, migrating to Hive on MR3 is straightforward.
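
For readers unfamiliar with the metric, the following is a minimal sketch of how a speedup "with respect to the geometric mean of execution times" can be computed. The function names and the sample timings are hypothetical and chosen only to illustrate the arithmetic. They are not measurements from our evaluation.

```python
import math


def geometric_mean(times):
    """Geometric mean of positive execution times (in seconds)."""
    if not times or any(t <= 0 for t in times):
        raise ValueError("execution times must be a non-empty list of positive numbers")
    # Average the logarithms, then exponentiate; this avoids overflow
    # when multiplying many large values together.
    return math.exp(sum(math.log(t) for t in times) / len(times))


def speedup(baseline_times, candidate_times):
    """How many times faster the candidate is than the baseline."""
    return geometric_mean(baseline_times) / geometric_mean(candidate_times)


# Placeholder per-query timings for illustration only (NOT benchmark results).
baseline = [30.0, 120.0, 480.0]
candidate = [10.0, 40.0, 160.0]

print(f"speedup: {speedup(baseline, candidate):.2f}x")
```

Unlike the arithmetic mean, the geometric mean is not dominated by a few long-running queries. Translating a speedup into a node count additionally assumes that throughput scales roughly linearly with the number of nodes.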

Q. Do you have experimental results comparing Hive on MR3 against Hive-LLAP, Impala, Presto, and SparkSQL?

A. Yes, we have evaluated these systems using the TPC-DS benchmark on four separate production-grade clusters. For the latest performance evaluation results, see our Blog.

License and the MR3 distribution

Q. Is MR3 an open source project?

A. No, it is a private project. On the other hand, both Hive for MR3 and Tez for MR3 are licensed under the Apache License 2.0. See below for why we do not release MR3 as open source.

Q. What does the MR3 distribution include?

A. It includes pre-built binaries of MR3 as well as all the tools necessary to rebuild Hive on MR3 for both Hadoop and Kubernetes. In this way, we enable users to customize Hive on MR3 by applying their own patches to Hive for MR3 and Tez for MR3.

Q. Is the MR3 distribution free to use?

A. Yes. Moreover, you may include MR3 as part of your software system for your own use or on behalf of a third party. For more details, check out License.

Q. What enterprise features of Hive on MR3 are missing in the MR3 distribution?

A. None. The MR3 distribution fully supports all the enterprise features available in Hive on MR3.

Q. What restrictions does the MR3 distribution have?

A. On Kubernetes, Hive on MR3 can use up to 512 gigabytes for the aggregate memory of worker Pods. For example, the user can use 16 nodes each with 32 gigabytes of memory or 8 nodes each with 64 gigabytes of memory for worker Pods. On Hadoop, there is no limit on the aggregate memory of worker Containers.
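
The two example configurations above both add up to exactly 512 gigabytes. The following is a minimal sketch for checking a planned cluster against the limit. The function names are hypothetical, and the sketch assumes that every node contributes the same amount of memory to worker Pods. It is not part of MR3.

```python
# Limit on the aggregate memory of worker Pods stated in this FAQ (gigabytes).
AGGREGATE_LIMIT_GB = 512


def aggregate_worker_memory_gb(num_nodes, memory_per_node_gb):
    """Total memory for worker Pods, assuming an equal share on every node."""
    return num_nodes * memory_per_node_gb


def fits_within_limit(num_nodes, memory_per_node_gb, limit_gb=AGGREGATE_LIMIT_GB):
    """True if the planned worker Pods stay within the aggregate limit."""
    return aggregate_worker_memory_gb(num_nodes, memory_per_node_gb) <= limit_gb


# The two configurations mentioned in the answer above.
for nodes, per_node in [(16, 32), (8, 64)]:
    total = aggregate_worker_memory_gb(nodes, per_node)
    print(f"{nodes} nodes x {per_node} GB = {total} GB, within limit: "
          f"{fits_within_limit(nodes, per_node)}")
```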

Q. How can I increase the limit on the aggregate memory of worker Pods on Kubernetes?

A. The user should purchase a commercial use license based on the new limit. For more details, check out Licensing and Support.

Q. Why do you not release MR3 as open source?

A. We do not release MR3 as open source for three reasons. First, unlike a typical open source product which often omits critical features in its community edition, the MR3 distribution fully supports all the enterprise features available in Hive on MR3. Second, we even enable the user to customize Hive on MR3 by applying patches to Hive for MR3 and Tez for MR3. Finally, our commercial-grade documentation is available for free in MR3docs. Hence we believe that MR3 is more open than popular open source products that do not release the source code for their enterprise editions.

Q. Can I purchase a source code license for MR3?

A. Yes. For more details, check out Licensing and Support. Note that purchasing a source code license automatically implies no limit on the aggregate memory of worker Pods on Kubernetes.

Evaluating Hive on MR3

Q. How can I test Hive on MR3 with minimum effort?

A. See Quick Start Guide in MR3docs to learn how to quickly test Hive on MR3 on a local machine, on Hadoop, on Kubernetes, on Amazon AWS, or in a Docker environment. In any case, no change to the underlying system is necessary.

Q. Where should I go for help while testing Hive on MR3?

A. The user can ask questions in the MR3 Google Group.

Q. Can I evaluate Hive on MR3 on Kubernetes with a higher limit on the aggregate memory of worker Pods?

A. Yes, we can provide the user with a custom MR3 distribution which is valid for two months. After two months of evaluation, the user can decide whether or not to purchase a commercial use license. For more details, contact us.

Q. Do you provide technical support for Hive on MR3?

A. For users who purchase a commercial use license or a source code license, we provide technical support to assist in the deployment and maintenance of Hive on MR3. For more details, check out Licensing and Support.

Q. How can I avoid vendor lock-in with Hive on MR3?

A. As Hive on MR3 runs with Hive Metastore, the user can switch back to Apache Hive or an alternative technology at any time.

Features

Q. Is MR3 fault-tolerant?

A. Yes. In fact, MR3 provides better support for fault tolerance than Tez and Spark. For more details, see Fault Tolerance in MR3docs.

Q. Does MR3 support Kerberos-based security?

A. Yes.

Q. Does Hive on MR3 support high availability?

A. Yes. Moreover, multiple instances of HiveServer2 can share common Yarn Containers or Kubernetes Pods. For more details, see High Availability in MR3docs.

Q. Does Hive on MR3 integrate with Apache Ranger?

A. Yes. For more details, see Integrating Apache Ranger in MR3docs.

Q. Does Hive on MR3 support autoscaling on Kubernetes?

A. Yes. For more details, see Autoscaling in MR3docs. The quick start guide On EKS with Autoscaling in MR3docs demonstrates the use of autoscaling on Amazon EKS.

Q. What plan do you have for extending MR3?

A. In addition to continuously updating MR3 to meet new requirements, we are always extending MR3 with new features. For more details, see Roadmap in MR3docs.

Q. What if I need new features in MR3?

A. In the rare case where you need an extension to MR3, just ask us, and we will be happy to implement your request at no charge.

MR3

Q. In what language is MR3 written?

A. MR3 is written in Scala.

Q. Can you give a short history of MR3?

A. The development of MR3 began in July 2015, after two years of preliminary research during which we developed a prototype execution engine called MR2 (written in Java). The first release, MR3 0.1, came out in March 2018. MR3 1.0 was released by DataMonad in February 2020.

Q. Is MR3 designed specifically for Apache Hive?

A. No, MR3 is a general-purpose execution engine and can execute MapReduce/Tez jobs. For example, Hive on MR3 supports compaction without relying on MapReduce because the MapReduce jobs that perform compaction are sent directly to MR3. For more details, see Compaction in MR3docs.