Begun, The Catalog Wars Have

Chris

Jun 20, 2024

Data lake catalogs are pretty important, and vendors are figuring this out...

3 Comments

Kevin Liu

Jun 20, 2024

> Data lake catalogs have seen a flurry of activity over the past few weeks

I'm also announcing a catalog :)

https://kevinjqliu.substack.com/p/introducing-the-pythonic-iceberg

> Before continuing, readers should know that I owned Tabular shares.

nice!

> The layer above the table format—the catalog—is the more important layer; it’s where query engines integrate. All of the major table formats have moved upwards, growing a catalog. An integrated catalog, table format, and file format is compelling. Such a product contains the entry point to the data plane and much of the data plane itself. It is a good point to move further up the stack into the query engine layer.

Interoperable catalogs are even more important! With the Iceberg REST catalog specification, companies can bring their own catalog (data plane) to any vendor.

Robert Bastian

Jun 21, 2024

Spark Streaming and Flink operate on Kafka topics, so I'm curious why the catalog providers aren't supporting native Kafka topics in their metadata catalogs. Ideally, a single catalog would have all my tables and topics.

With Confluent's adoption of Iceberg via TableFlow, maybe they saw the writing on the wall?

Alex Merced

Jun 21, 2024

Nessie is a catalog that people shouldn’t sleep on. It has a growing ecosystem of adopters, with two platforms building on it, as covered in this article:

https://www.dremio.com/blog/the-nessie-ecosystem-and-the-reach-of-git-for-data-for-apache-iceberg/

I also have this article, a deep dive into the mechanics of Iceberg catalogs:

https://www.dremio.com/blog/the-evolution-of-apache-iceberg-catalogs/
