Introduction to Spark Structured Streaming

Overview

IBM Client Center
1 Rogers Street
Cambridge, MA
United States

Thursday, 23 March 2017 - 6:00pm

Details

This is a reschedule of the Meetup originally planned for 2/9/2017, which we had to cancel because of the foot of snow that day.


In addition to RSVPing to the Meetup, we would greatly appreciate it if you also registered at the following link to expedite check-in on the day of the Meetup.


REGISTER HERE

Spark Structured Streaming


Structured Streaming is a new scalable and fault-tolerant stream processing engine built on the Spark SQL engine. It lets you express a streaming computation the same way you would express a computation on static data. This has two benefits. First, it enables code reuse, because essentially the same queries can be run on batch, interactive, or streaming data. Second, it simplifies streaming application development, because you can operate on streams of data with DataFrames just as you would on static data. Structured Streaming abstracts away the complexity of streaming analytics, so you can analyze streams without having to reason about the mechanics of streaming.

The Spark SQL engine takes care of running Structured Streaming queries, incrementally and continuously updating the result as streaming data continues to arrive. With Spark Structured Streaming, you can express streaming aggregations and event-time windows, and you can join streaming data to static data.
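
The idea of incrementally updating a result can be sketched in plain Python. This is only a conceptual model and does not use the Spark API: the sensor IDs, the `SENSOR_LOCATIONS` lookup table, the 10-minute window size, and the `update` function are all hypothetical names chosen for illustration. The sketch counts events per event-time window and location, and shows that folding the data in as small batches yields the same result as processing it all at once.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical static reference data: sensor id -> location.
SENSOR_LOCATIONS = {"s1": "lobby", "s2": "lab"}

# Assumed size of the tumbling event-time window.
WINDOW = timedelta(minutes=10)
EPOCH = datetime(1970, 1, 1)


def window_start(event_time):
    """Floor an event timestamp to the start of its tumbling window."""
    return event_time - ((event_time - EPOCH) % WINDOW)


def update(counts, events):
    """Fold (event_time, sensor_id) records into the running counts."""
    for event_time, sensor_id in events:
        # Enrich each streaming record with the static lookup table.
        location = SENSOR_LOCATIONS.get(sensor_id, "unknown")
        counts[(window_start(event_time), location)] += 1
    return counts


events = [
    (datetime(2017, 3, 23, 18, 1), "s1"),
    (datetime(2017, 3, 23, 18, 4), "s2"),
    (datetime(2017, 3, 23, 18, 9), "s1"),
    (datetime(2017, 3, 23, 18, 12), "s1"),
]

# Batch: the whole data set is processed in one pass.
batch_result = update(Counter(), events)

# Streaming: the same logic is applied to small batches as they arrive,
# and the running result is updated each time.
stream_result = Counter()
for micro_batch in (events[:1], events[1:3], events[3:]):
    update(stream_result, micro_batch)

print(batch_result == stream_result)
```

In Spark itself, the grouping step would typically be written with a DataFrame `groupBy` over the `window` function, and the source would be read with `spark.readStream` instead of `spark.read`; the engine then maintains the running state that `stream_result` stands in for here.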

In this session, we'll walk through the basics of Structured Streaming, its programming model, and its APIs. The concepts will be illustrated using code examples. Then, we'll walk through a demo that analyzes both static and streaming sensor data to show how the same queries can be used on each, which simplifies the development of streaming analytics applications, and how static and streaming data can be leveraged together.