Monday, September 23, 2019

Google Releases Cloud Dataproc for Kubernetes in Alpha - InfoQ.com

Google Cloud Dataproc is a managed data and analytics processing service based on the open-source Hadoop and Spark projects. Google has now announced the alpha availability of Cloud Dataproc for Kubernetes, which is intended to let customers process data more efficiently across platforms.

The Cloud Dataproc service has been generally available for over three years and now offers alpha access to Spark jobs on Google Kubernetes Engine (GKE), meaning developers and data scientists can run Apache Spark jobs on GKE clusters. Typically, Spark applications run on Hadoop YARN clusters; with Cloud Dataproc for Kubernetes, however, users get one central view that spans both YARN and Kubernetes clusters and no longer need to manage them separately. Furthermore, according to the announcement blog post, support for both cluster types will give enterprises more flexibility to modernize specific hybrid workloads while continuing to monitor YARN-based workloads.

Running Apache Spark on Kubernetes differs from running it on virtual-machine-based Hadoop clusters, such as those in the Cloud Dataproc service or in competing offerings like Amazon Web Services (AWS) Elastic MapReduce (EMR) and Microsoft's Azure HDInsight (HDI). Apache Spark is the first open-source processing engine Google brings to Cloud Dataproc on Kubernetes, and the company plans to bring other open-source analytics components to Kubernetes as well, such as Apache Flink, Presto and Apache Druid. Furthermore, products like Anthos, which makes GKE available virtually anywhere, allow customers to take Cloud Dataproc to their own data centers or, eventually, to the Amazon Elastic Kubernetes Service (EKS) and Azure Kubernetes Service (AKS).

In the same Google announcement blog post, Matt Aslett, research vice president at 451 Research, said:

Enterprises are increasingly looking for products and services that support data processing across multiple locations and platforms. The launch of Cloud Dataproc on Kubernetes is significant in that it provides customers with a single control plane for deploying and managing Apache Spark jobs on Google Kubernetes Engine in both public cloud and on-premises environments.

Customers who want to try out Cloud Dataproc for Kubernetes will have to apply for access by emailing Google. Note that the alpha release is intended for testing and experimentation purposes only. More details on Cloud Dataproc for Kubernetes are available in the "How to Get Started" blog post.



https://ift.tt/30nRBxu
