Iceberg Connector
TrinityLake can be used through the Spark Iceberg connector by leveraging the TrinityLake Iceberg Catalog or TrinityLake Iceberg REST Catalog integrations.
Configuration
TrinityLake Iceberg Catalog
To configure an Iceberg catalog with TrinityLake in Spark, you should:
- Add the TrinityLake Spark Iceberg runtime package to the Spark classpath
- Use the Spark Iceberg connector configuration for a custom catalog.
For example, to start a Spark shell session with a TrinityLake Iceberg catalog named demo:
spark-shell \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.0,io.trinitylake:trinitylake-spark-iceberg-runtime-3.5_2.12:0.0.1 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.demo=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.demo.catalog-impl=io.trinitylake.iceberg.TrinityLakeIcebergCatalog \
--conf spark.sql.catalog.demo.warehouse=s3://my-bucket
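Once the shell starts, the catalog behaves like any other Spark catalog. A minimal smoke test, assuming the demo catalog configured above and an existing lakehouse:

-- switch to the TrinityLake-backed catalog
USE demo;
-- list top-level namespaces to confirm the catalog is reachable
SHOW DATABASES;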
TrinityLake Iceberg REST Catalog
To configure an Iceberg REST catalog with TrinityLake in Spark, you should:
- Start your TrinityLake IRC server
- Add the TrinityLake Spark Iceberg runtime package to the Spark classpath
- Add the TrinityLake Spark extension io.trinitylake.spark.iceberg.TrinityLakeIcebergSparkExtensions to the list of Spark SQL extensions
- Use the Spark Iceberg connector configuration for IRC.
For example, to start a Spark shell session with a TrinityLake IRC catalog named demo that is running at http://localhost:8000:
spark-shell \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.0,io.trinitylake:trinitylake-spark-iceberg-runtime-3.5_2.12:0.0.1 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.trinitylake.spark.iceberg.TrinityLakeIcebergSparkExtensions \
--conf spark.sql.catalog.demo=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.demo.type=rest \
--conf spark.sql.catalog.demo.uri=http://localhost:8000
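As a quick check that the session is talking to the IRC server, you can create a namespace and an Iceberg table; a short sketch where the ns1 and t1 names are only illustrative:

-- create a namespace and an Iceberg table through the REST catalog
CREATE DATABASE demo.ns1;
CREATE TABLE demo.ns1.t1 (id INT, data STRING) USING iceberg;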
Operation Behavior
The TrinityLake Spark Iceberg connector offers the same operation behavior as the Iceberg catalog integration. See Operation Behavior in Iceberg Catalog for more details.
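For instance, routine reads and writes go through the standard Spark SQL surface. A short sketch, assuming the demo catalog and the illustrative ns1.t1 table from the sections above:

-- write a row and read it back through the TrinityLake catalog
INSERT INTO demo.ns1.t1 VALUES (1, 'abc');
SELECT * FROM demo.ns1.t1;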
Using System Namespace
The TrinityLake Spark Iceberg connector offers the same system namespace support as the Iceberg catalog integration to perform operations such as creating the lakehouse and listing distributed transactions. See Using System Namespace in Iceberg Catalog for more details.
For example:
-- create the lakehouse
CREATE DATABASE sys;
SHOW DATABASES IN sys;
---------
|name |
---------
|dtxns |
-- list distributed transactions in the lakehouse
SHOW DATABASES IN sys.dtxns;
------------
|name |
------------
|dtxn_123 |
|dtxn_455 |
Using Distributed Transaction
The TrinityLake Spark Iceberg connector offers the same distributed transaction support as the Iceberg catalog integration, using multi-level namespaces. See Using Distributed Transaction in Iceberg Catalog for more details.
For example:
-- create a transaction with ID 1234
CREATE DATABASE sys.dtxns.dtxn_1234
WITH DBPROPERTIES ('isolation-level'='serializable');
-- list tables under namespace ns1 in the transaction with ID 1234
SHOW TABLES IN sys.dtxns.dtxn_1234.ns1;
------
|name|
------
|t1 |
SELECT * FROM sys.dtxns.dtxn_1234.ns1.t1;
-----------
|id |data |
-----------
|1 |abc |
|2 |def |
INSERT INTO sys.dtxns.dtxn_1234.ns1.t1 VALUES (3, 'ghi');
-- commit transaction with ID 1234
ALTER DATABASE sys.dtxns.dtxn_1234
SET DBPROPERTIES ('commit' = 'true');