Apache Hive data warehouse on Amazon AWS

Do you need a data warehouse built with Apache Hive on Amazon AWS?


Building a data warehouse with Apache Hive can be challenging. On Amazon AWS, existing solutions are hard to use because of the complexity inherited from the underlying Hadoop system: you need to learn not only Hive but also Hadoop.

An ideal solution should be easy to maintain by hiding, or even eliminating, the Hadoop layer. It should be cheap to operate, delivering excellent performance while minimizing wasted resources. Finally, it should not lock you into a single vendor.

Apache Hive data warehouse built on Amazon EKS

Our solution runs Apache Hive directly on Kubernetes without requiring an additional Hadoop layer. The enabling technology is MR3, a new execution engine that provides native support for Kubernetes. Our solution packages Hive with Grafana, Superset, and Apache Ranger. Because it does not require Hadoop, it runs on Amazon EKS and stores all data on S3 using standard open data formats such as ORC and Parquet.
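
To make the storage layer concrete, below is a minimal sketch of creating an S3-backed table in ORC format through a HiveServer2 endpoint. The hostname, credentials, bucket, and table schema are placeholders for illustration, not values from our deployment.

```python
# Minimal sketch: define an S3-backed external table in the open ORC format.
# Assumptions: a HiveServer2 endpoint is reachable (hostname, port, and username
# are placeholders) and the S3 bucket already exists.
from pyhive import hive

conn = hive.Connection(host="hiveserver2.example.com", port=10000, username="analyst")
cursor = conn.cursor()

# The data stays in your account, on S3, in an open columnar format.
cursor.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales (
        order_id BIGINT,
        amount   DOUBLE,
        order_ts TIMESTAMP
    )
    STORED AS ORC
    LOCATION 's3a://example-warehouse-bucket/warehouse/sales'
""")

cursor.close()
conn.close()
```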

Data analysts can concurrently access the data warehouse with built-in Superset or their favorite BI tools, while the administrator can control access with Apache Ranger.
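
As a hedged illustration of how access control could be configured, the sketch below creates a table-level policy through Apache Ranger's public REST API. The Ranger URL, credentials, Hive service name, and group name are placeholders, not values from our deployment.

```python
# Hedged sketch: grant SELECT on one table to a group of analysts via Apache Ranger.
# The Ranger admin URL, credentials, and Hive service name are placeholders.
import requests

ranger_url = "https://ranger.example.com:6182"   # placeholder Ranger admin endpoint
auth = ("admin", "changeme")                     # placeholder credentials

policy = {
    "service": "hive_service",                   # name of the Hive service in Ranger (assumption)
    "name": "analysts-read-sales",
    "resources": {
        "database": {"values": ["default"]},
        "table":    {"values": ["sales"]},
        "column":   {"values": ["*"]},
    },
    "policyItems": [
        {"accesses": [{"type": "select", "isAllowed": True}],
         "groups": ["analysts"]}
    ],
}

resp = requests.post(f"{ranger_url}/service/public/v2/api/policy", json=policy, auth=auth)
resp.raise_for_status()
```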

Reduce your AWS bill for the data warehouse


Our installation of Apache Hive will reduce your AWS bill significantly. On the TPC-DS benchmark, it runs at least twice as fast as competing technologies such as Presto and Spark 3, and thus requires far fewer compute resources. With autoscaling, Hive workers are created and destroyed dynamically to adapt to workload changes. With fault tolerance, spot instances can replace on-demand instances.
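
To give a rough idea of how spot capacity could back Hive workers on EKS, here is a hedged boto3 sketch that adds a spot-backed managed node group. The cluster name, subnet, IAM role, and scaling limits are placeholders; the dynamic creation and destruction of Hive workers themselves is handled by MR3, not by this call.

```python
# Hedged sketch: add a spot-backed managed node group to an existing EKS cluster
# so that fault-tolerant Hive workers can run on cheaper spot instances.
# All names and ARNs below are placeholders.
import boto3

eks = boto3.client("eks", region_name="us-east-1")

eks.create_nodegroup(
    clusterName="hive-mr3-cluster",                        # placeholder cluster name
    nodegroupName="hive-workers-spot",
    capacityType="SPOT",                                   # spot instances instead of on-demand
    instanceTypes=["m5d.2xlarge"],
    scalingConfig={"minSize": 1, "maxSize": 20, "desiredSize": 1},
    subnets=["subnet-0123456789abcdef0"],                  # placeholder subnet
    nodeRole="arn:aws:iam::123456789012:role/eksNodeRole", # placeholder IAM role
)
```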

How it works

We deploy our solution in your AWS account, so your sensitive data never leaves it. Our solution uses Apache Hive 3.1 with over 600 additional patches backported.

1. Connect

Grant us scoped permissions on your AWS account and specify the basic configuration for your data warehouse. We then deploy our solution on Amazon EKS.
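
For illustration, granting scoped permissions typically takes the form of a cross-account IAM role restricted to what the deployment needs. The sketch below shows one possible shape of such a role; the trusted account ID, external ID, role name, and attached policy are placeholders, not our actual requirements.

```python
# Hedged sketch: create a cross-account IAM role with scoped permissions.
# The trusted account ID, external ID, role name, and policy ARN are placeholders.
import json
import boto3

iam = boto3.client("iam")

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},   # placeholder vendor account
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": "example-external-id"}},
    }],
}

iam.create_role(
    RoleName="hive-mr3-deployer",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description="Scoped role used only to deploy and manage the Hive data warehouse",
)

# Attach only the permissions the deployment needs (an EKS managed policy as an example).
iam.attach_role_policy(
    RoleName="hive-mr3-deployer",
    PolicyArn="arn:aws:iam::aws:policy/AmazonEKSClusterPolicy",   # placeholder managed policy
)
```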

2. Execute

Once your data warehouse is ready, you can execute SQL queries right away with the built-in Superset or your favorite BI tool.
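
Beyond BI tools, any HiveServer2-compatible client can query the warehouse. A minimal sketch with PyHive, assuming a placeholder endpoint and the sales table from the earlier example:

```python
# Minimal sketch: run an ad-hoc SQL query against the warehouse from Python.
# The hostname, username, and table name are placeholders.
from pyhive import hive

conn = hive.Connection(host="hiveserver2.example.com", port=10000, username="analyst")
cursor = conn.cursor()

cursor.execute("SELECT order_id, amount FROM sales WHERE amount > 100 LIMIT 10")
for order_id, amount in cursor.fetchall():
    print(order_id, amount)

cursor.close()
conn.close()
```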

3. Manage

We manage your data warehouse transparently, so you can also control it with the AWS console/CLI or with the tools we provide.
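
Because the deployment consists of ordinary AWS resources, you can inspect it with standard tooling. A small sketch, assuming a placeholder cluster name and region:

```python
# Minimal sketch: inspect the EKS cluster backing the warehouse with plain boto3.
# The cluster name and region are placeholders.
import boto3

eks = boto3.client("eks", region_name="us-east-1")

cluster = eks.describe_cluster(name="hive-mr3-cluster")["cluster"]
print(cluster["status"], cluster["version"])

for nodegroup in eks.list_nodegroups(clusterName="hive-mr3-cluster")["nodegroups"]:
    print("node group:", nodegroup)
```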

Pay as you go & pay like Amazon EMR


We use a simple pricing plan: we charge only for the compute resources used by Hive workers, and exactly as much as the Amazon EMR price for the same compute resources. For example, since Amazon EMR charges $0.113 per hour for the m5d.2xlarge instance type, we also charge $0.113 per hour for that instance type.
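
To make the pricing concrete, here is a small worked example. The worker count and running hours are hypothetical; only the $0.113 per hour rate for m5d.2xlarge comes from the example above.

```python
# Hypothetical cost estimate: 10 m5d.2xlarge Hive workers running 8 hours a day.
# Only the hourly rate ($0.113 for m5d.2xlarge) comes from the pricing example above.
RATE_PER_HOUR = 0.113      # USD per hour, m5d.2xlarge (same as the Amazon EMR price)
workers = 10               # hypothetical number of Hive workers
hours_per_day = 8          # hypothetical daily usage
days_per_month = 30

monthly_charge = RATE_PER_HOUR * workers * hours_per_day * days_per_month
print(f"Estimated monthly charge for Hive workers: ${monthly_charge:,.2f}")   # $271.20
```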

With better performance and faster autoscaling, our solution offers a much cheaper option than Amazon EMR.

Ready to get started?

If you are interested in our solution, you can request a demo or start a 14-day free trial. You can also try our solution on your own; please contact us for details.
