The best way to operate Apache Hive in production
We develop MR3, an execution engine for big data processing, and maintain its main application, Apache Hive on MR3. MR3 provides native support for both Hadoop and Kubernetes, as well as a standalone mode. We actively maintain both Hive 3 and Hive 4.
#1. You want to install Hive on Hadoop or upgrade from an old version of Hive.
Installing Hive on Hadoop, or upgrading it to a newer version, is not a simple task. Hive on MR3 is very easy to install and requires no change to an existing installation of Hadoop.
#2. You want to run Hive directly on Kubernetes.
As enterprise environments move to Kubernetes at an accelerating pace, the industry is looking for a way to run Hive directly on Kubernetes. Hive on MR3 is a ready-to-use solution to this problem.
#3. You want to run Hive without installing Kubernetes or Hadoop.
MR3 supports a standalone mode, which does not require a resource manager such as Hadoop or Kubernetes. By exploiting standalone mode, you can run Hive on MR3 in virtually any type of cluster.
#4. You are running Hive or Spark on Amazon AWS and want to reduce the cost.
For users of Amazon AWS, switching to Hive on MR3 immediately reduces the cost in three ways. With fault tolerance in MR3, you can use spot instances for worker Pods. With fast autoscaling, idle instances are quickly removed. With the LLAP I/O cache, you can significantly reduce the network traffic to S3.
#1. Hive 3 on MR3 is stable with about 800 security and critical patches backported.
Anyone who manually builds Apache Hive 3 (such as version 3.1.3) soon discovers that it is not really ready for production use because many important patches have not been merged. We have backported about 800 important patches to Hive 3 on MR3 and continue to backport more.
#2. Hive on MR3 achieves the speed of LLAP and beyond.
LLAP (Low-Latency Analytical Processing) is a major component of Hive which allows it to far outperform competing technologies. Enabling LLAP, however, is excruciatingly difficult because of its complex architecture. Hive on MR3 automatically achieves the speed of LLAP and beyond with no additional configuration. For comparison, Hive on MR3 runs as fast as Trino and much faster than Spark on the 10TB TPC-DS benchmark.
#3. Hive on MR3 achieves a much higher throughput than Hive on Tez.
A common use case of Hive on Tez is to run ETL (Extract-Transform-Load) jobs. By virtue of its advanced resource sharing model, Hive on MR3 achieves a higher throughput and can thus deliver significant cost savings, especially when many ETL jobs run concurrently.
#4. Hive on MR3 supports Java 17.
Unlike Apache Hive, which still requires Java 8, Hive on MR3 can run with Java 17. By switching to Java 17, Hive on MR3 can reduce the running time by up to 30%.
#5. Hive on MR3 supports Remote Shuffle Service.
Remote Shuffle Service is being adopted by a growing number of technologies because of its many potential benefits. Hive on MR3 is also evolving fast to support Remote Shuffle Service. Currently, Hive on MR3 supports Apache Celeborn as its Remote Shuffle Service and can eliminate over 95% of local disk writes.
#6. No vendor lock-in.
Running Hive on MR3 means that there is no risk of vendor lock-in. Since Hive on MR3 runs with Hive Metastore, users can switch back to Apache Hive or an alternative technology at any time. Furthermore, users of Hive 4 on MR3 can leverage Iceberg as the storage layer.
You can customize Hive on MR3.
Hive on MR3 consists of three components: MR3, Tez for MR3, and Hive for MR3. Users can easily rebuild Hive on MR3 after backporting additional patches of their choice from Apache Hive.
Hive on MR3 is more open than popular open source products.
Unlike typical open source products, which often omit critical features in their community editions, we provide all the enterprise features available in MR3. Since users can also customize it, Hive on MR3 is in fact more open than popular open source products that do not release the source code for their enterprise editions.
Hive on MR3 is an affordable alternative to Cloudera solutions.
For users of Cloudera solutions, switching to Hive on MR3 can significantly reduce the cost. Moreover, operating Hive on MR3 is much simpler, especially on Kubernetes.
If you are interested in our solution, you can try it yourself or request a demo. For any questions about our solution or commercial licenses, please contact us.