Hybrid transactional and analytical processing (HTAP) requires processing transactional (OLTP) and analytical (OLAP) queries in isolation to remove the interference between them. To achieve this, it is necessary to maintain two replicas of the data, one dedicated to each type of query.
However, it is challenging to provide a consistent view for the two replicas within a storage system, because the storage system should allow analytical requests to efficiently read consistent and fresh data from transactional workloads and also scale to large data sizes with high availability.
To solve this problem, we propose a Raft-based HTAP database: TiDB. We design a multi-Raft storage system to materialize updates from transactional requests and to synchronize Raft logs with additional nodes that transform row format to column format, forming a column store.
This store is dedicated to analytical queries to efficiently read fresh and consistent data. Based on this storage system, we build an SQL engine to process large-scale distributed transactions and expensive analytical queries which optimally read row-format and column-format replicas of data.
In this talk, Ed will present TiDB and its columnar storage engine, TiFlash, from an engineering-design perspective. He will explain how TiDB uses and extends the Raft protocol to eliminate the impact of OLAP workloads on OLTP and achieve an industrial-grade HTAP database, and will share relevant benchmarks and use cases.
Program committee comment
The architecture and distributed algorithms underpinning a modern, production-ready NewSQL database, built from scratch in Go.