Google fuses SQL, Python, and Spark in Colab Enterprise push • The Register

Google is promising a single notebook environment for machine learning and data analytics, integrating SQL, Python, and Apache Spark in one place.

Readers might note that other prominent vendors in the data science and analytics market have also tried to tackle the schism between SQL/analytics and the machine learning workbench.

Yasmeen Ahmad, managing director, Google Data Cloud, said the greatest barrier to productivity in data science was having to switch between environments: querying a database or data warehouse with SQL, exporting the results and loading them into a Python notebook for machine learning, all while configuring a separate Spark cluster. Users might then switch to a BI tool just to visualize the results, she said.

“Our priority is to eliminate this friction by creating the single, intelligent environment an architect needs to engineer, build, and deploy – not just run predictive models.”

Google is therefore previewing a number of enhancements to its Colab Enterprise notebooks in its BigQuery data warehouse and the ML platform Vertex AI, which it says will turn these ideas into reality.

Within Colab Enterprise notebooks, Google is previewing native SQL cells that let users explore data with SQL and then pick up the results as a BigQuery DataFrame to build models in Python. BigQuery DataFrames is a Pythonic DataFrame and machine learning (ML) API powered by the BigQuery engine. The Chocolate Factory is also previewing interactive visualization cells, which generate editable charts in the same environment. Together, the vendor claimed, these features break down the barrier between SQL, Python, and visualization.

Colab Enterprise notebooks also include Google's Data Science Agent, which the company says it has enhanced so that the agent's detailed plans can call on tools, such as BigQuery ML for training and inference, BigQuery DataFrames for analysis in Python, or large-scale Spark transformations (currently in preview). Google announced BigQuery support for Apache Spark in 2022.

Google is not the only vendor to try to bridge the gap between data analytics and machine learning. For example, cloud data platform Snowflake introduced the Snowpark Connector in August. It builds on the Apache Spark community’s Spark Connect, which adopts a client-server architecture that allows any client application to connect to remote Spark clusters.

Snowflake says the Snowpark Connector lets Spark users run code in a client tied directly to its analytics engine, instead of managing a separate Spark cluster. By doing so, they can run all modern Spark DataFrame, Spark SQL, and user-defined function code within Snowflake.

Databricks moved to bring SQL support to its data lake environment, which includes Apache Spark, in 2020. ®
