Browser Bankruptcy 2023: Consensus, Durable Execution, Streaming, HTAP, TSDBs, and more...
Everything in my tabs as I close out the year. Catch you in '24!
I started this blog on October 31, 2023, so it’ll be two months old as we start the new year. It’s been fun to write, and I plan to keep it going. If you haven’t yet read my most popular post, it’s a good place to start:
Rather than a recap or predictions post, I thought it’d be fun to share what I’ve got in my browser tabs—stuff I haven’t been able to get to yet. I hope you find a link or two that pique your interest.
Consensus
Viewstamped Replication sucked me back into consensus protocols this year.
Nezha: Deployable and High-Performance Consensus Using Synchronized Clocks
Building a Large-scale Distributed Storage System Based on Raft
Workflows, FaaS, durable execution
Durable execution blew up this year.
Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads
Lifting the veil on Meta’s microservice architecture: Analyses of topology and request workflows
Streaming
The trend toward S3 persistence for streaming (with WarpStream [$]) captured my interest.
Clonos: Consistent Causal Recovery for Highly-Available Streaming Dataflows
Streaming from Apache Iceberg - Building Low-Latency and Cost-Effective Data Pipelines
DBSP: Automatic Incremental View Maintenance for Rich Query Languages
HTAP/multi-model databases
As I dug more into S3 persistence, I found plenty of other exemplary systems (e.g. Neon, Turbopuffer, Quickwit). I’ve been thinking lately about what it means to have all our data directly on S3. Does it make hybrid transaction/analytical processing (HTAP) and multi-model databases easier to build or more likely to be successful?
Running OLAP and OLTP Workloads on the Same Cluster with Workload Prioritization
Introducing Compute-Compute Separation for Real-Time Analytics
The Beauty of HTAP: Defining a Modern Data Architecture with TiDB
Embedded databases
Litestream, LiteFS, libSQL, Turso, and SKDB have me pulling at the embedded (and edge) DB thread.
Building data-centric apps with a reactive relational database
Embedded databases (1): The harmony of DuckDB, KùzuDB and LanceDB
TreeLine: An Update-In-Place Key-Value Store for Modern Storage
Time-series databases
Some Prometheus (and FrostDB) spelunking led me to InfluxDB’s new(ish) IOx storage engine, which uses DataFusion and Parquet.
PostgreSQL
PostgreSQL and its extensions continue to be a pragmatic solution to, well, everything.
Introducing pgroll: zero-downtime, reversible, schema migrations for Postgres
The Great Re-shard: adding Postgres capacity (again) with zero downtime
Analytics
Analytics Twitter continues to be very—obnoxiously—loud.
Why we develop on data locally and how to finally stop (Part 1)
Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics
Don’t Hold My Data Hostage – A Case For Client Protocol Redesign
I occasionally invest in infrastructure startups. Companies that I’ve invested in are marked with a [$] in this newsletter. See my LinkedIn profile for a complete list.