Overview
Note
The TrinityLake format specification assumes knowledge of certain data structures, algorithms, and database system and file system concepts. If anything described here is difficult to understand, please follow the corresponding links for further explanations, or make a contribution to help us improve this document.
Version
This document describes the TrinityLake format at version 0.0.0. Please see Versioning for the versioning semantics of this format.
Introduction
The TrinityLake format defines a LakeHouse-specific key-value map implemented using a B-epsilon tree.
- The keys of this map are IDs of objects in a LakeHouse
- The values of this map are location pointers to the Object Definitions
We call this tree the TrinityLake tree, and we call a LakeHouse implemented using the TrinityLake format a Trinity LakeHouse.
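As a rough illustration of the map described above, the following Python sketch models only the logical key-to-location lookup, not the B-epsilon tree itself. The key strings and location strings are hypothetical placeholders: the real encodings are defined by the Key Encoding Specification and the Location Specification, and `resolve` is an illustrative helper, not part of any TrinityLake API.

```python
from typing import Optional

# Logical view of the map: object ID (key) -> location pointer (value).
# Both the keys and the locations below are hypothetical placeholders.
tree_view: dict[str, str] = {
    "ns1": "defs/ns1.def",
    "ns1/table1": "defs/ns1-table1.def",
}


def resolve(key: str) -> Optional[str]:
    """Return the location of the object definition for a key, or None."""
    return tree_view.get(key)


print(resolve("ns1/table1"))  # location pointer of an existing object
print(resolve("ns9"))         # None: no such object in this LakeHouse
```

A reader would then load the Object Definition from the returned location; that step is omitted here because it depends on the Object Definition File Specification.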
The TrinityLake format contains the following specifications:
- The TrinityLake tree is persisted in storage and follows the Storage Specification.
- The TrinityLake tree is accessed and updated following the Transaction Specification.
- The object definitions are persisted in storage and follow the Object Definition File Specification.
- The key names in a TrinityLake tree follow the Key Encoding Specification.
- The locations used in a TrinityLake tree follow the Location Specification.
Example
Here is an example logical representation of a TrinityLake tree:
This Trinity LakeHouse is a tree of order 3, with the following objects:
- Namespace ns1
    - Table table1
- Namespace ns2
    - Table table1: this is an Apache Iceberg table, which further points to its own metadata JSON file, manifest Avro files, and Parquet/Avro/ORC data files, following the Iceberg table format specification. Note that these files in the Iceberg table are not considered part of TrinityLake and do not need to follow the Location Specification.
    - Index index1
- Namespace ns3
    - Materialized View mv1
    - Table table1
There is also a set of write operations in the write buffer of the TrinityLake tree, performed against these objects:
- Update the definition of the existing table table1 in namespace ns1
- Create a new materialized view mv2 in namespace ns2
- Delete the existing table table1 in namespace ns3
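To make the example concrete, here is a minimal Python sketch of how the effective contents of this LakeHouse could be derived by applying the buffered write operations on top of the persisted entries. It assumes, as is typical for B-epsilon trees, that buffered messages are pending writes that take precedence over older persisted values. The flat dictionary ignores the node layout of the order-3 tree, and the key strings, location strings, `Message` type, and `effective_view` helper are all hypothetical, not part of the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    """A buffered write; 'put' covers both create and update (assumption)."""
    op: str                         # "put" or "delete"
    key: str                        # hypothetical key encoding
    location: Optional[str] = None  # hypothetical location pointer


# Persisted entries of the example: key -> location of the definition.
persisted = {
    "ns1": "loc/ns1-v1",
    "ns1/table1": "loc/ns1-table1-v1",
    "ns2": "loc/ns2-v1",
    "ns2/table1": "loc/ns2-table1-v1",
    "ns2/index1": "loc/ns2-index1-v1",
    "ns3": "loc/ns3-v1",
    "ns3/mv1": "loc/ns3-mv1-v1",
    "ns3/table1": "loc/ns3-table1-v1",
}

# The three buffered operations from the example, oldest first.
write_buffer = [
    Message("put", "ns1/table1", "loc/ns1-table1-v2"),  # update table1 in ns1
    Message("put", "ns2/mv2", "loc/ns2-mv2-v1"),        # create mv2 in ns2
    Message("delete", "ns3/table1"),                    # delete table1 in ns3
]


def effective_view(entries: dict, buffer: list) -> dict:
    """Apply buffered messages in order on top of a copy of the entries."""
    view = dict(entries)
    for msg in buffer:
        if msg.op == "put":
            view[msg.key] = msg.location
        elif msg.op == "delete":
            view.pop(msg.key, None)
        else:
            raise ValueError(f"unknown operation: {msg.op}")
    return view


view = effective_view(persisted, write_buffer)
for key in sorted(view):
    print(key, "->", view[key])
```

In this sketch the resulting view points table1 in ns1 at its new definition, contains mv2 in ns2, and no longer contains table1 in ns3, while the persisted entries are left unchanged.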