Native Connector
TrinityLake provides a native Spark connector, implemented as a Spark DataSource V2 (DSv2) catalog, for accessing a Trinity lakehouse. It also provides Spark SQL extensions for TrinityLake-specific SQL operations such as transactions.
Configuration
To configure a TrinityLake Spark DSv2 catalog, you should:
- Add the TrinityLake Spark runtime package io.trinitylake:trinitylake-spark-runtime-3.5_2.12 to the Spark classpath
- Start a Spark session or application using the io.trinitylake.spark.TrinityLakeSparkCatalog Spark catalog implementation
- Add the TrinityLake Spark SQL extensions io.trinitylake.spark.TrinityLakeSparkExtensions to your Spark SQL extensions list
- Set the following Spark catalog configurations:
Config name | Description | Required? | Default |
---|---|---|---|
storage.root | The root URI of the TrinityLake storage | Yes | |
storage.type | The type of storage | No | Inferred from storage.root scheme |
storage.ops.<key> | Any property configuration for a specific type of storage operation. See Storage for more details. | No | |
txn.isolation-level | The default isolation level for a transaction | No | Default setting in the Lakehouse Definition |
txn.ttl-millis | The default duration, in milliseconds, for which a transaction is valid | No | Default setting in the Lakehouse Definition |
For example:
spark-shell \
--packages io.trinitylake:trinitylake-spark-runtime-3.5_2.12:0.0.1 \
--conf spark.sql.extensions=io.trinitylake.spark.TrinityLakeSparkExtensions \
--conf spark.sql.catalog.demo=io.trinitylake.spark.TrinityLakeSparkCatalog \
--conf spark.sql.catalog.demo.storage.root=s3://my-bucket
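The same properties can also be assembled programmatically. The following is a minimal Python sketch, assuming a hypothetical helper named trinitylake_catalog_conf that is not part of TrinityLake. It only builds the Spark property names from the table above and does not start Spark; the txn.ttl-millis value is illustrative.

```python
# Hypothetical helper, not part of TrinityLake: it maps the catalog
# options from the configuration table onto Spark property names.
CATALOG_IMPL = "io.trinitylake.spark.TrinityLakeSparkCatalog"
EXTENSIONS_IMPL = "io.trinitylake.spark.TrinityLakeSparkExtensions"


def trinitylake_catalog_conf(catalog_name, storage_root, options=None):
    """Return a dict of Spark properties for one TrinityLake catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    conf = {
        "spark.sql.extensions": EXTENSIONS_IMPL,
        prefix: CATALOG_IMPL,
        # storage.root is the only required catalog option.
        f"{prefix}.storage.root": storage_root,
    }
    # Optional settings such as storage.type or txn.ttl-millis.
    for key, value in (options or {}).items():
        conf[f"{prefix}.{key}"] = str(value)
    return conf


conf = trinitylake_catalog_conf(
    "demo", "s3://my-bucket", {"txn.ttl-millis": 60000}  # illustrative value
)
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```

With PySpark installed, each key and value could instead be passed to SparkSession.builder.config(key, value) before calling getOrCreate().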
SQL Extensions
TrinityLake provides the following Spark SQL extensions for its Spark connector:
CREATE LAKEHOUSE
Create a new lakehouse at the configured storage root location.
CREATE LAKEHOUSE [ IF NOT EXISTS ]
[ LOCATION location ]
[ LHPROPERTIES ( property_key = property_value [ , ... ] ) ]
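To illustrate how the optional clauses compose, here is a minimal Python sketch that renders a statement following the grammar above. The helper names are hypothetical, and writing the location, property keys, and property values as single-quoted string literals is an assumption, since the grammar does not specify the literal form. The property key example.key is a placeholder, not a documented lakehouse property.

```python
# Hypothetical helpers, not part of TrinityLake.
def sql_string(value):
    """Single-quote a value (assumed literal form); reject embedded quotes."""
    text = str(value)
    if "'" in text:
        raise ValueError(f"quote characters are not supported: {text!r}")
    return f"'{text}'"


def create_lakehouse_sql(if_not_exists=False, location=None, properties=None):
    """Render CREATE LAKEHOUSE with its optional clauses in grammar order."""
    parts = ["CREATE LAKEHOUSE"]
    if if_not_exists:
        parts.append("IF NOT EXISTS")
    if location is not None:
        parts.append(f"LOCATION {sql_string(location)}")
    if properties:
        pairs = ", ".join(
            f"{sql_string(key)} = {sql_string(value)}"
            for key, value in properties.items()
        )
        parts.append(f"LHPROPERTIES ({pairs})")
    return " ".join(parts)


# "example.key" is a placeholder property name.
print(create_lakehouse_sql(if_not_exists=True, properties={"example.key": "v"}))
```

The resulting string would then be submitted through the Spark session that has the TrinityLake extensions enabled.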
BEGIN TRANSACTION
Begin a transaction.
BEGIN [ TRANSACTION ]
[ IDENTIFIED BY transaction_id ]
[ ISOLATION LEVEL { SERIALIZABLE | SNAPSHOT } ]
[ TXNPROPERTIES ( property_key = property_value [ , ... ] ) ]
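As a sketch of how the clauses fit together, the following Python helper renders a BEGIN statement and checks the isolation level against the two values the grammar allows. The helper name is hypothetical. Because the grammar does not say how transaction_id is written, the helper inserts it verbatim and leaves the literal form to the caller; TXNPROPERTIES is omitted for brevity.

```python
# Hypothetical helper, not part of TrinityLake.
ISOLATION_LEVELS = ("SERIALIZABLE", "SNAPSHOT")


def begin_transaction_sql(isolation_level=None, transaction_id=None):
    """Render BEGIN TRANSACTION with optional clauses in grammar order."""
    parts = ["BEGIN TRANSACTION"]
    if transaction_id is not None:
        # Inserted as given: the caller decides the literal form.
        parts.append(f"IDENTIFIED BY {transaction_id}")
    if isolation_level is not None:
        level = isolation_level.upper()
        if level not in ISOLATION_LEVELS:
            raise ValueError(
                f"isolation level must be one of {ISOLATION_LEVELS}, got {isolation_level!r}"
            )
        parts.append(f"ISOLATION LEVEL {level}")
    return " ".join(parts)


print(begin_transaction_sql(isolation_level="snapshot"))
```

Validating the level before rendering surfaces a typo as a Python error instead of a SQL parse failure.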
COMMIT TRANSACTION
Commit a transaction.
This command can only be used after executing BEGIN TRANSACTION or LOAD TRANSACTION.
SAVE TRANSACTION
Save the current transaction and exit the current transaction context.
This command can only be used after executing BEGIN TRANSACTION or LOAD TRANSACTION.
LOAD TRANSACTION
Load the transaction with the given transaction ID and resume its transaction context.
SET TRANSACTION ISOLATION LEVEL
Set the default isolation level for any new transactions in the session.