Storage
The LakeHouse tree in general follows the storage layout of N-way search tree map.
Each node file is in the Apache Arrow IPC format with suffix .ipc
.
Node File Schema
ID | Name | Arrow Type | Description | Required? | Default |
---|---|---|---|---|---|
1 | key | String | Name of the key, following Key Specification | no | |
2 | pvalue | String | File location pointer to the value of the key, following Location Specification | no | |
3 | pnode | String | File location pointer to the value of the node, following Location Specification | no |
System-Internal Rows for Root Node
System-internal keys such as lakehouse
will appear as the top rows in the file.
Such keys do not exist in non-root node, and do not participate in the tree storage algorithm.
Node Pointers
To read the node pointers, the reader for the tree should skip the keys until it reaches the ones starting with [space]
.
There are exactly N
rows of the file are reserved for the N
pointers of each node for a tree of order N
.
The fist row in these N
rows must have key
and pvalue
as NULL
because the first pointer points to all
keys that are smaller than the key of the second row.
A node might not have all N
child nodes yet. If there are k <= N
child nodes,
There will be N-k
rows with all column values as NULL
s.
Write Buffer
The write buffer rows start after the node pointer rows.
These rows must have a key
that is not NULL
, and the pnode
is always NULL
.
- When the
pvalue
isNULL
, it is a message to delete thekey
. - When the
pvalue
is notNULL
, it is a message to set the currentpvalue
of the key in the tree to the new one in the write buffer.
Node File Size
Each node is targeted for the same specific size, which is configurable in the LakeHouse definition.
The estimated size of the N
rows should be:
N * (
namespace_name_size_max_bytes +
table_name_size_max_bytes +
file_name_size_max_Bytes +
5 // 1 initial byte, 4 bytes for schema ID
)
and this value must be less than the node file size. This remaining size is used as the write buffer for each node.
For users that would like to fine-tune the performance characteristics of a TrinityLake tree, this formula can be used to readjust the node file size to achieve the desired epsilon value.
Node File Name
Non-root node file name will be in the form of a base64 encoded UUID with suffix .ipc
.
For example, if a UUID 6fcb514b-b878-4c9d-95b7-8dc3a7ce6fd8
is generated for the node file,
the original file name of the node file will be b8tRS7h4TJ2Vt43Dp85v2A.ipc
,
and that further goes through the file name optimization
to produce the final node file name.
For root node file, please refer to Transaction Specification for more details since the name is involved as a part of the transaction process.