June 22, 2024
Amazon and Snowflake partner to enable modern data streaming pipeline with Snowpipe and Data Firehose

What does a modern real-time data streaming pipeline look like?

For Amazon Web Services (AWS) and Snowflake, a modern data streaming pipeline makes it easy for organizations to move data from one platform to another in near real time.

The foundation of AI and data analytics is data, but streaming that data efficiently from one place to another can be difficult. For example, an organization might have a lot of data in AWS and be using Snowflake for data analysis. AWS and Snowflake have now partnered to make that scenario easier by integrating Amazon Data Firehose with Snowflake Snowpipe Streaming.

“No tech organization leader wants to force everybody to use one tool, because no one tool is generally the best for everything,” James Malone, senior director of product management at Snowflake, told VentureBeat in an exclusive interview. “That means that it becomes incumbent on Snowflake, on AWS and everybody to try and make things work seamlessly together, so I think this partnership is a reflection of making it easier to use multiple things together.”

Why AWS partnering with Snowflake for data streaming matters

Amazon Data Firehose is a fully managed service from AWS for delivering real-time streaming data from many different sources. Snowpipe Streaming is a feature in Snowflake that allows ingesting data into Snowflake tables in real time from external sources.

While it was possible in the past to get real-time streaming data from AWS into Snowflake, it wasn’t necessarily a seamless or optimized process. In an exclusive interview with VentureBeat, Mindy Ferguson, VP of messaging and streaming at AWS, said that the core goal of the partnership is to enable simplicity.

“Really, the way to think about streaming in the modern streaming data pipeline is that customers are telling us they want simplification and they also want to see reduced cost,” Ferguson said. “So that was part of how, how and why we built this.”

The data lake still matters, but can be simplified for real-time streaming

Ferguson noted that another goal of the partnership is to reduce latency in the streaming process itself, to better enable real-time capabilities. She said the integration removes several extra steps that an organization previously had to take to connect AWS and Snowflake.

Before the integration of Amazon Data Firehose and Snowflake Snowpipe Streaming, an organization could stream data from AWS to Snowflake by using a data lake layer on Amazon S3 as an intermediate step. Data coming from Data Firehose would first have to be ingested and land on S3, and from there Snowflake Snowpipe could pick up the data. The new integration simplifies the process by allowing data to go directly from Amazon Data Firehose into Snowflake Snowpipe Streaming.
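To make the producer side of such a pipeline concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the setup either company describes: it assumes the boto3 AWS SDK, and the function name `send_event` and the stream name `example-stream` are hypothetical. The destination, whether S3 or Snowflake, is configured on the Firehose stream itself rather than in producer code, so the sketch shows only the step of handing a record to Firehose.

```python
import json
from typing import Any


def send_event(client: Any, stream_name: str, event: dict) -> dict:
    """Hand one JSON event to a Firehose stream.

    `client` is expected to behave like boto3.client("firehose");
    `stream_name` is a hypothetical, pre-existing stream whose
    destination has already been configured on the AWS side.
    """
    # Firehose treats the payload as opaque bytes; a trailing newline
    # keeps consecutive JSON records separable downstream.
    payload = (json.dumps(event) + "\n").encode("utf-8")
    return client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": payload},
    )
```

With AWS credentials in place, a call might look like `send_event(boto3.client("firehose"), "example-stream", {"id": 1})`. Because the destination lives in the stream's configuration, producer code of this kind would not need to change when the intermediate S3 stop is removed.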

“If you want to get your data into Snowflake, there’s really no need to have the intermediate storage area and customers have made that stop at a number of places, S3 being one of them,” Ferguson said. 

She added that, in her view, as organizations think seriously about how to optimize real-time data streaming, they will consider how to use the storage of the streaming layer itself to move data along in real time and land it at a destination as fast as possible.

The integration is currently in public beta. While the AWS-Snowflake partnership now makes it easy to stream data from AWS into Snowflake for analysis, streaming in the other direction isn’t as optimized yet. When asked if there is support for easily streaming data from Snowflake into AWS for use with the Amazon Redshift data warehouse, Malone said that is not currently supported but hinted that it is a future capability.

“I won’t give any timelines but it’s something that has been a hot topic inside of Snowflake,” Malone said.
