
Iceberg Connector

TrinityLake can be used through the Spark Iceberg connector by leveraging the TrinityLake Iceberg Catalog or TrinityLake Iceberg REST Catalog integrations.

Configuration

TrinityLake Iceberg Catalog

To configure an Iceberg catalog with TrinityLake in Spark, you should:

1. Add the Iceberg Spark runtime and the TrinityLake Spark Iceberg runtime to the Spark packages.
2. Enable the Iceberg Spark session extensions.
3. Register a Spark catalog using the Iceberg SparkCatalog implementation, with catalog-impl set to io.trinitylake.iceberg.TrinityLakeIcebergCatalog.
4. Set warehouse to the storage root location of the TrinityLake lakehouse.

For example, to start a Spark shell session with a TrinityLake Iceberg catalog named demo:

spark-shell \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.0,io.trinitylake:trinitylake-spark-iceberg-runtime-3.5_2.12:0.0.1 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.demo=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.demo.catalog-impl=io.trinitylake.iceberg.TrinityLakeIcebergCatalog \
  --conf spark.sql.catalog.demo.warehouse=s3://my-bucket
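
Once the session starts, the demo catalog can be used like any other Iceberg catalog in Spark SQL. The following is a minimal sketch, assuming the lakehouse at s3://my-bucket has already been created (see Using System Namespace below); the namespace and table names are illustrative:

-- create a namespace and an Iceberg table in the demo catalog (names are examples)
CREATE DATABASE demo.ns1;
CREATE TABLE demo.ns1.t1 (id INT, data STRING) USING iceberg;

-- write and read back a row
INSERT INTO demo.ns1.t1 VALUES (1, 'abc');
SELECT * FROM demo.ns1.t1;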

TrinityLake Iceberg REST Catalog

To configure an Iceberg REST catalog with TrinityLake in Spark, you should:

1. Add the Iceberg Spark runtime and the TrinityLake Spark Iceberg runtime to the Spark packages.
2. Enable the Iceberg Spark session extensions.
3. Register a Spark catalog using the Iceberg SparkCatalog implementation, with type set to rest.
4. Set uri to the address of the running TrinityLake Iceberg REST Catalog server.

For example, to start a Spark shell session with a TrinityLake Iceberg REST Catalog named demo running at http://localhost:8000:

spark-shell \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.0,io.trinitylake:trinitylake-spark-iceberg-runtime-3.5_2.12:0.0.1 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.demo=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.demo.type=rest \
  --conf spark.sql.catalog.demo.uri=http://localhost:8000
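
After the session starts, the catalog behaves the same way as the direct catalog integration above. A quick sanity check against the REST-backed catalog (the output depends on what already exists in the lakehouse):

SHOW DATABASES IN demo;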

Operation Behavior

The TrinityLake Spark Iceberg connector offers the same operation behavior as the Iceberg catalog integration. See Operation Behavior in Iceberg Catalog for more details.

Using System Namespace

The TrinityLake Spark Iceberg connector offers the same system namespace support as the Iceberg catalog integration to perform operations such as creating a lakehouse and listing distributed transactions. See Using System Namespace in Iceberg Catalog for more details.

For example:

-- create lakehouse
CREATE DATABASE sys;

SHOW DATABASES IN sys;
---------
|name   |
---------
|dtxns  |

-- list distributed transactions in the lakehouse
SHOW DATABASES IN sys.dtxns;
------------
|name      |
------------
|dtxn_123  |
|dtxn_455  |

Using Distributed Transaction

The TrinityLake Spark Iceberg connector offers the same distributed transaction support as the Iceberg catalog integration using multi-level namespaces. See Using Distributed Transaction in Iceberg Catalog for more details.

For example:

-- create a transaction with ID 1234
CREATE DATABASE sys.dtxns.dtxn_1234
       WITH DBPROPERTIES ('isolation-level'='serializable');

-- list tables in transaction of ID 1234 under namespace ns1
SHOW TABLES IN sys.dtxns.dtxn_1234.ns1;
------
|name|
------
|t1  |

SELECT * FROM sys.dtxns.dtxn_1234.ns1.t1;
-----------
|id |data |
-----------
|1  |abc  |
|2  |def  |

INSERT INTO sys.dtxns.dtxn_1234.ns1.t1 VALUES (3, 'ghi');

-- commit transaction with ID 1234
ALTER DATABASE sys.dtxns.dtxn_1234
      SET DBPROPERTIES ('commit' = 'true');