Skip to content

Latest commit

 

History

History
494 lines (370 loc) · 12.8 KB

File metadata and controls

494 lines (370 loc) · 12.8 KB

Lightdash Python SDK Guide

This guide covers the query builder, filtering system, and other features of the Lightdash Python SDK.

Quick Start

from lightdash import Client

client = Client(base_url="https://app.lightdash.cloud", token="your-token")
model = client.get_model("your_project_uuid", "your_model_name")

# Build and execute a query
result = (
    model.query()
    .metrics(model.metrics.revenue, model.metrics.profit)
    .dimensions(model.dimensions.country)
    .filter(model.dimensions.status == "active")
    .sort(model.metrics.revenue.desc())
    .limit(100)
    .execute()
)

df = result.to_df()

Query Builder

The SDK provides two patterns for building queries: single-call and chainable builder.

Single-Call Pattern

Pass all parameters at once—useful for simple queries:

query = model.query(
    metrics=[model.metrics.revenue, model.metrics.profit],
    dimensions=[model.dimensions.country],
    filters=model.dimensions.status == "active",
    sort=model.metrics.revenue.desc(),
    limit=100
)
result = query.execute()

Chainable Builder Pattern

Build queries incrementally with method chaining—each method returns a new immutable Query object:

query = (
    model.query()
    .metrics(model.metrics.revenue)
    .dimensions(model.dimensions.country, model.dimensions.date)
    .filter(model.dimensions.status == "active")
    .sort(model.metrics.revenue.desc())
    .limit(100)
)

Key characteristics:

  • Immutable: Each method returns a new Query object (safe for reuse)
  • Lazy evaluation: API calls only happen when .execute() is called
  • Order-independent: Methods can be called in any order
  • Composable: Create base queries and extend them
# Create a reusable base query
base = model.query().metrics(model.metrics.revenue).dimensions(model.dimensions.country)

# Extend it for different use cases
by_active = base.filter(model.dimensions.status == "active")
by_inactive = base.filter(model.dimensions.status == "inactive")

Filters

Filters let you constrain your query results. The SDK provides an intuitive, Pythonic API for creating filters.

Creating Filters with Operators

Use standard Python comparison operators on dimensions:

# Equality
f = model.dimensions.country == "USA"

# Inequality
f = model.dimensions.country != "USA"

# Numeric comparisons
f = model.dimensions.amount > 1000
f = model.dimensions.amount >= 1000
f = model.dimensions.amount < 500
f = model.dimensions.amount <= 500

Filter Helper Methods

For more complex filters, use helper methods:

# Check if value is in a list
f = model.dimensions.country.in_(["USA", "UK", "Canada"])

# String operations
f = model.dimensions.name.starts_with("John")
f = model.dimensions.name.ends_with("son")
f = model.dimensions.name.includes("Smith")

# Null checks
f = model.dimensions.email.is_null()
f = model.dimensions.email.is_not_null()

# Range checks
f = model.dimensions.order_date.between("2024-01-01", "2024-12-31")
f = model.dimensions.amount.not_between(100, 500)

Supported Operators by Data Type

Operator Numeric String Boolean Date
is null Yes Yes Yes Yes
is not null Yes Yes Yes Yes
equals / is Yes Yes Yes Yes
is not Yes Yes - Yes
is less than Yes - - -
is greater than Yes - - -
starts with - Yes - -
ends with - Yes - -
includes - Yes - -
in the last - - - Yes
in the next - - - Yes
in the current - - - Yes
is before - - - Yes
is after - - - Yes
is between Yes - - Yes
is not between Yes - - Yes

Combining Filters

Use & (AND) and | (OR) operators to combine filters:

# AND: Both conditions must be true
f = (model.dimensions.country == "USA") & (model.dimensions.amount > 1000)

# OR: Either condition must be true
f = (model.dimensions.status == "active") | (model.dimensions.status == "pending")

# Complex combinations
f = (
    (model.dimensions.country == "USA") &
    ((model.dimensions.amount > 1000) | (model.dimensions.priority == "high"))
)

Multiple .filter() calls on a query are combined with AND logic:

query = (
    model.query()
    .filter(model.dimensions.country == "USA")
    .filter(model.dimensions.amount > 1000)  # AND-ed with above
)

Multiple Filters on the Same Field

You can apply more than one condition to the same field—for example, to constrain a date dimension to a range:

query = model.query().filter(
    (model.dimensions.order_date >= "2026-01-01") &
    (model.dimensions.order_date <= "2026-01-31")
)

The same works across separate .filter() calls, since they are AND-ed together:

query = (
    model.query()
    .filter(model.dimensions.order_date >= "2026-01-01")
    .filter(model.dimensions.order_date <= "2026-01-31")
)

Filtering on Table Calculations

Define a table calculation, add it to the query, then filter on it with the same operators you use for dimensions. Table-calculation conditions are sent under filters.tableCalculations:

from lightdash import TableCalculation

profit_ratio = TableCalculation(
    name="profit_ratio",
    sql="${orders.profit} / ${orders.revenue}",
)

query = (
    model.query()
    .metrics(model.metrics.revenue, model.metrics.profit)
    .table_calculations(profit_ratio)
    .filter(profit_ratio > 0.2)  # only rows where the ratio exceeds 20%
)

Dimension and table-calculation filters can be combined with & (AND); each is serialized under its own key:

query = (
    model.query()
    .table_calculations(profit_ratio)
    .filter((model.dimensions.country == "USA") & (profit_ratio > 0.2))
)

Note on type: TableCalculation defaults type="number". The data type must be set for filter operators to compile — an untyped calculation is treated as a string by the API and rejects numeric operators like > or between. For a non-numeric calculation, pass type="string" (or date, timestamp, boolean).

Limitation — | (OR) across field types: Lightdash sends dimension, metric, and table-calculation filters as separate groups that are AND-ed together at the top level. An OR that mixes field types — e.g. (model.dimensions.country == "USA") | (profit_ratio > 0.2) — therefore cannot be expressed faithfully: it serializes as two independent groups that the server combines with AND, not OR. OR works as expected within a single field type (e.g. (profit_ratio > 0.8) | (profit_ratio < 0.2)).


Dimensions and Metrics

Access dimensions and metrics as attributes on the model:

# Access via attribute
country = model.dimensions.country
revenue = model.metrics.revenue

# List all available
all_dimensions = model.dimensions.list()
all_metrics = model.metrics.list()

Features:

  • Lazy loading: Fetched from API on first access, then cached
  • Fuzzy matching: Typos suggest closest matches
  • Tab completion: Works in Jupyter/IPython for discovery
  • Rich display: HTML rendering in notebooks

Sorting

Sort results using the .sort() method or Sort class:

from lightdash import Sort

# Using metric/dimension methods (recommended)
query = model.query().sort(model.metrics.revenue.desc())
query = model.query().sort(model.dimensions.country.asc())

# Multiple sorts
query = model.query().sort(
    model.metrics.revenue.desc(),
    model.dimensions.country.asc()
)

# Control null positioning
query = model.query().sort(model.dimensions.name.asc(nulls_first=True))

# Using Sort class directly
query = model.query().sort(Sort("orders_revenue", descending=True))

Results

Query results implement a unified ResultSet interface:

Converting Results

result = query.execute()

# To pandas DataFrame
df = result.to_df()  # or result.to_df(backend="pandas")

# To polars DataFrame
df = result.to_df(backend="polars")

# To list of dictionaries
records = result.to_records()

# To JSON string
json_str = result.to_json_str()

Iterating Over Results

# Iterate over rows
for row in result:
    print(row)

# Get total count
total = len(result)

Large Result Sets

There is no fixed SDK row cap. A query can return as many rows as the instance allows — its configured query.maxLimit (discovered automatically from /api/v1/health; e.g. 100,000 on Lightdash Cloud). Request more than that and the SDK raises a clear ValueError instead of letting the server silently return a truncated result:

# Fetch a large extract — pages are streamed transparently
result = model.query().metrics(model.metrics.revenue).limit(100_000).execute()
df = result.to_df()

# Asking for more than the instance allows fails loudly
model.query().limit(10_000_000).execute()
# ValueError: Limit 10000000 exceeds this instance's maximum query limit of 100000...

Large fetches page at the instance's maxPageSize to minimise round-trips, and every page uses the same size so no rows are skipped. To pull result sets larger than query.maxLimit, use a CSV/Excel export instead.

Pagination

For large result sets, results are paginated automatically:

result = query.execute()

# Access specific page
page_2 = result.page(2)

# Iterate through all pages
for page in result.iter_pages():
    process(page)

# Lazy DataFrame loading (polars only)
lazy_df = result.to_df_lazy()

Properties:

  • result.query_uuid - Unique identifier for the query
  • result.total_results - Total number of rows
  • result.total_pages - Number of pages
  • result.fields - Field metadata

Compiling to SQL

Use .compile() to get the warehouse SQL for a query without executing it. Nothing is run and no rows are fetched — it returns the exact SQL Lightdash would issue:

sql = (
    model.query()
    .metrics(model.metrics.revenue)
    .dimensions(model.dimensions.country)
    .filter(model.dimensions.status == "active")
    .compile()
)
print(sql)
# SELECT ... FROM ... WHERE ... GROUP BY ... LIMIT 500

This is handy for inspecting or debugging a query, or for running it directly against your warehouse when you need more rows than the query API returns — for example feeding it to BigQuery/bigframes, dbt, or a data pipeline:

import bigframes.pandas as bpd

sql = model.query().metrics(model.metrics.revenue).limit(10_000_000).compile()
df = bpd.read_gbq(sql)   # run it yourself, no Lightdash row cap

The query's limit is included in the SQL as LIMIT n; set a large limit (or strip the trailing clause) if you're running it yourself.


SQL Runner

Execute raw SQL queries directly against your data warehouse:

# Execute SQL
result = client.sql("SELECT * FROM orders WHERE status = 'active' LIMIT 100")
df = result.to_df()

# With custom limit
result = client.sql("SELECT * FROM orders", limit=1000)

# Introspection
tables = client.sql_runner.tables()
fields = client.sql_runner.fields("orders")
fields = client.sql_runner.fields("orders", schema="public")

Exception Handling

The SDK provides specific exceptions for different error conditions:

from lightdash import LightdashError, QueryError, QueryTimeout, QueryCancelled

try:
    result = query.execute()
except QueryTimeout as e:
    print(f"Query timed out: {e.query_uuid}")
except QueryCancelled as e:
    print(f"Query was cancelled: {e.query_uuid}")
except QueryError as e:
    print(f"Query failed: {e.message}")
except LightdashError as e:
    print(f"Lightdash error: {e.message} (status: {e.status_code})")

Exception hierarchy:

  • LightdashError - Base exception for all SDK errors
    • QueryError - Query execution failed (HTTP 400)
    • QueryTimeout - Query exceeded timeout (HTTP 408)
    • QueryCancelled - Query was cancelled (HTTP 499)

Complete Example

from lightdash import Client, QueryError, QueryTimeout

# Initialize client
client = Client(
    base_url="https://app.lightdash.cloud",
    token="your-api-token"
)

# Get a model
model = client.get_model("project-uuid", "orders")

# Build a query with filters
query = (
    model.query()
    .metrics(model.metrics.total_revenue, model.metrics.order_count)
    .dimensions(model.dimensions.country, model.dimensions.order_date)
    .filter(
        (model.dimensions.status == "completed") &
        (model.dimensions.order_date >= "2024-01-01")
    )
    .sort(model.metrics.total_revenue.desc())
    .limit(50)
)

try:
    result = query.execute()
    df = result.to_df()
    print(f"Fetched {len(result)} rows")
    print(df.head())
except QueryTimeout:
    print("Query took too long - try adding more filters")
except QueryError as e:
    print(f"Query failed: {e.message}")