The best way to run Hive on Hadoop, on Kubernetes, and on Amazon AWS
As the de facto standard for SQL-based analytics on Hadoop, Apache Hive is a mature data warehouse system widely used in industry. Unfortunately, upgrading Hive on Hadoop is a tough decision because it almost inevitably runs into new dependency problems. As a consequence, many users reluctantly keep their first installation of Hive and miss out on the substantial benefits of more recent releases.
LLAP (Low-Latency Analytical Processing) is a major component of Hive which allows it to far outperform competing technologies such as Presto and SparkSQL. Unfortunately, enabling and configuring LLAP is excruciatingly difficult because of its complex architecture. The difficulty is frustrating enough that many users reluctantly choose an alternative technology that is slower and less mature.
As the enterprise environment gravitates towards Kubernetes at an accelerating pace, the industry is urgently looking for a solution that will enable Hive to run on Kubernetes. Unfortunately, the only solution available today is a stopgap that first runs Hadoop on Kubernetes and then runs Hive on Hadoop, thus introducing two layers of complexity. The right approach is to use an execution engine capable of communicating directly with Kubernetes.
Hive on MR3 is a robust solution that addresses all of these pain points of Hive. Its core technology is a new execution engine, MR3, which provides native support for both Hadoop and Kubernetes. Hive on MR3 is a significant improvement over Apache Hive in terms of both simplicity of operation and efficiency in execution.
On Hadoop, MR3 allows users to easily switch between different versions of Hive without upgrading Hadoop. All the major versions of Hive, from Hive 1 to Hive 4, can run in the same cluster. Hive on MR3 automatically achieves the performance of LLAP and beyond without requiring any further configuration.
On Kubernetes, Hive on MR3 directly creates and destroys worker Pods. All the enterprise features remain available, such as high availability, Kerberos-based security, SSL data encryption, and authorization with Apache Ranger. On public clouds, Hive on MR3 can take advantage of autoscaling supported by MR3.
For users of Amazon AWS, Hive on MR3 includes key features for significantly reducing cost. With in-memory or NVMe caching, the separation of compute and storage continues to work without a performance penalty. With autoscaling, workers are created and destroyed dynamically to adapt to workload changes. With fault tolerance, spot instances can replace on-demand instances. For workloads that execute queries only sporadically, workers can run on AWS Fargate.
Because simplicity is the design principle of MR3, we are confident that you will like Hive on MR3 much better than Apache Hive, whether on Hadoop or on Kubernetes. So give it a try!
Hive on MR3 is also easy to customize. Because we provide everything necessary to rebuild Hive on MR3, you can quickly apply patches from Apache Hive or your own changes whenever necessary. So customize Hive on MR3 without losing any enterprise features!