Hive on MR3 allows Apache Hive to run directly on Kubernetes without requiring an additional Hadoop layer. Spark on MR3 allows Apache Spark to share the Hive Metastore in the same Kubernetes cluster. The enabling technology is MR3, an execution engine that provides native support for Kubernetes. Our product can also package Grafana, Superset, and Apache Ranger.
As enterprise environments move toward Kubernetes at an accelerating pace, the industry is looking for a way to run Hive on Kubernetes. Unfortunately, the only solution available today is a makeshift one. It first runs Hadoop on Kubernetes and then runs Hive on Hadoop, thus introducing two layers of complexity. Hive on MR3 is a perfect solution to this problem, ready for you today.
LLAP (Low-Latency Analytical Processing) is a major component of Hive that allows it to far outperform competing technologies such as Presto and Spark SQL. Enabling LLAP, however, is excruciatingly difficult because of its complex architecture. Hive on MR3 automatically achieves the speed of LLAP, whether on Hadoop or on Kubernetes.
While building a Spark cluster on Kubernetes is relatively easy, optimizing compute resources for a multi-tenant cluster is far from trivial. This is because each Spark application maintains its own set of executors, and applications do not share compute resources with one another. Our solution enables multiple Spark applications to share compute resources, thus significantly increasing resource utilization.
For users of Hive and Spark on Amazon EMR, switching to MR3 on Amazon EKS reduces costs immediately. Thanks to fault tolerance in MR3, you can run worker Pods exclusively on spot instances. Thanks to fast autoscaling, idle instances are removed quickly.
As a general-purpose execution engine, MR3 can easily execute legacy MapReduce code on Kubernetes. All the strengths of MR3, such as concurrent DAGs, cross-DAG container reuse, fault tolerance, and autoscaling, are available when running MapReduce jobs.