How to understand streaming table in Flink?

These dynamic tables don’t necessarily exist anywhere — it’s simply an abstraction that may, or may not, be materialized, depending on the needs of the query being performed. For example, a query that is doing a simple projection

SELECT a, b FROM events

can be executed by simply streaming each record through a stateless Flink pipeline.

Also, Flink doesn’t operate on mini-batches — it processes each event one at a time. So there’s no physical “table”, or partial table, anywhere.

But some queries do require some state, perhaps very little, such as

SELECT count(*) FROM events

which needs nothing more than a single counter, while something like

SELECT key, count(*) FROM events GROUP BY key

will use Flink’s key-partitioned state (a sharded key-value store) to persist the current counter for each key. Different nodes in the cluster will be responsible for handling events for different keys.

Just as “normal” SQL takes one or more tables as input, and produces a table as output, stream SQL takes one or streams as input, and produces a stream as output. For example, the SELECT count(*) FROM events will produce the stream 1 2 3 4 5 ... as its result.

There are some good introductions to Flink SQL on YouTube:, and there are training materials on github with slides and exercises:

CLICK HERE to find out more related problems solutions.

Leave a Comment

Your email address will not be published.

Scroll to Top