This guide covers the query builder, filtering system, and other features of the Lightdash Python SDK.
from lightdash import Client
client = Client(base_url="https://app.lightdash.cloud", token="your-token")
model = client.get_model("your_project_uuid", "your_model_name")
# Build and execute a query
result = (
model.query()
.metrics(model.metrics.revenue, model.metrics.profit)
.dimensions(model.dimensions.country)
.filter(model.dimensions.status == "active")
.sort(model.metrics.revenue.desc())
.limit(100)
.execute()
)
df = result.to_df()The SDK provides two patterns for building queries: single-call and chainable builder.
Pass all parameters at once—useful for simple queries:
query = model.query(
metrics=[model.metrics.revenue, model.metrics.profit],
dimensions=[model.dimensions.country],
filters=model.dimensions.status == "active",
sort=model.metrics.revenue.desc(),
limit=100
)
result = query.execute()Build queries incrementally with method chaining—each method returns a new immutable Query object:
query = (
model.query()
.metrics(model.metrics.revenue)
.dimensions(model.dimensions.country, model.dimensions.date)
.filter(model.dimensions.status == "active")
.sort(model.metrics.revenue.desc())
.limit(100)
)Key characteristics:
- Immutable: Each method returns a new Query object (safe for reuse)
- Lazy evaluation: API calls only happen when
.execute()is called - Order-independent: Methods can be called in any order
- Composable: Create base queries and extend them
# Create a reusable base query
base = model.query().metrics(model.metrics.revenue).dimensions(model.dimensions.country)
# Extend it for different use cases
by_active = base.filter(model.dimensions.status == "active")
by_inactive = base.filter(model.dimensions.status == "inactive")Filters let you constrain your query results. The SDK provides an intuitive, Pythonic API for creating filters.
Use standard Python comparison operators on dimensions:
# Equality
f = model.dimensions.country == "USA"
# Inequality
f = model.dimensions.country != "USA"
# Numeric comparisons
f = model.dimensions.amount > 1000
f = model.dimensions.amount >= 1000
f = model.dimensions.amount < 500
f = model.dimensions.amount <= 500For more complex filters, use helper methods:
# Check if value is in a list
f = model.dimensions.country.in_(["USA", "UK", "Canada"])
# String operations
f = model.dimensions.name.starts_with("John")
f = model.dimensions.name.ends_with("son")
f = model.dimensions.name.includes("Smith")
# Null checks
f = model.dimensions.email.is_null()
f = model.dimensions.email.is_not_null()
# Range checks
f = model.dimensions.order_date.between("2024-01-01", "2024-12-31")
f = model.dimensions.amount.not_between(100, 500)| Operator | Numeric | String | Boolean | Date |
|---|---|---|---|---|
is null |
Yes | Yes | Yes | Yes |
is not null |
Yes | Yes | Yes | Yes |
equals / is |
Yes | Yes | Yes | Yes |
is not |
Yes | Yes | - | Yes |
is less than |
Yes | - | - | - |
is greater than |
Yes | - | - | - |
starts with |
- | Yes | - | - |
ends with |
- | Yes | - | - |
includes |
- | Yes | - | - |
in the last |
- | - | - | Yes |
in the next |
- | - | - | Yes |
in the current |
- | - | - | Yes |
is before |
- | - | - | Yes |
is after |
- | - | - | Yes |
is between |
Yes | - | - | Yes |
is not between |
Yes | - | - | Yes |
Use & (AND) and | (OR) operators to combine filters:
# AND: Both conditions must be true
f = (model.dimensions.country == "USA") & (model.dimensions.amount > 1000)
# OR: Either condition must be true
f = (model.dimensions.status == "active") | (model.dimensions.status == "pending")
# Complex combinations
f = (
(model.dimensions.country == "USA") &
((model.dimensions.amount > 1000) | (model.dimensions.priority == "high"))
)Multiple .filter() calls on a query are combined with AND logic:
query = (
model.query()
.filter(model.dimensions.country == "USA")
.filter(model.dimensions.amount > 1000) # AND-ed with above
)You can apply more than one condition to the same field—for example, to constrain a date dimension to a range:
query = model.query().filter(
(model.dimensions.order_date >= "2026-01-01") &
(model.dimensions.order_date <= "2026-01-31")
)The same works across separate .filter() calls, since they are AND-ed together:
query = (
model.query()
.filter(model.dimensions.order_date >= "2026-01-01")
.filter(model.dimensions.order_date <= "2026-01-31")
)Define a table calculation, add it to the query, then filter on it with the same
operators you use for dimensions. Table-calculation conditions are sent under
filters.tableCalculations:
from lightdash import TableCalculation
profit_ratio = TableCalculation(
name="profit_ratio",
sql="${orders.profit} / ${orders.revenue}",
)
query = (
model.query()
.metrics(model.metrics.revenue, model.metrics.profit)
.table_calculations(profit_ratio)
.filter(profit_ratio > 0.2) # only rows where the ratio exceeds 20%
)Dimension and table-calculation filters can be combined with & (AND); each is
serialized under its own key:
query = (
model.query()
.table_calculations(profit_ratio)
.filter((model.dimensions.country == "USA") & (profit_ratio > 0.2))
)Note on
type:TableCalculationdefaultstype="number". The data type must be set for filter operators to compile — an untyped calculation is treated as a string by the API and rejects numeric operators like>orbetween. For a non-numeric calculation, passtype="string"(ordate,timestamp,boolean).
Limitation —
|(OR) across field types: Lightdash sends dimension, metric, and table-calculation filters as separate groups that are AND-ed together at the top level. An OR that mixes field types — e.g.(model.dimensions.country == "USA") | (profit_ratio > 0.2)— therefore cannot be expressed faithfully: it serializes as two independent groups that the server combines with AND, not OR. OR works as expected within a single field type (e.g.(profit_ratio > 0.8) | (profit_ratio < 0.2)).
Access dimensions and metrics as attributes on the model:
# Access via attribute
country = model.dimensions.country
revenue = model.metrics.revenue
# List all available
all_dimensions = model.dimensions.list()
all_metrics = model.metrics.list()Features:
- Lazy loading: Fetched from API on first access, then cached
- Fuzzy matching: Typos suggest closest matches
- Tab completion: Works in Jupyter/IPython for discovery
- Rich display: HTML rendering in notebooks
Sort results using the .sort() method or Sort class:
from lightdash import Sort
# Using metric/dimension methods (recommended)
query = model.query().sort(model.metrics.revenue.desc())
query = model.query().sort(model.dimensions.country.asc())
# Multiple sorts
query = model.query().sort(
model.metrics.revenue.desc(),
model.dimensions.country.asc()
)
# Control null positioning
query = model.query().sort(model.dimensions.name.asc(nulls_first=True))
# Using Sort class directly
query = model.query().sort(Sort("orders_revenue", descending=True))Query results implement a unified ResultSet interface:
result = query.execute()
# To pandas DataFrame
df = result.to_df() # or result.to_df(backend="pandas")
# To polars DataFrame
df = result.to_df(backend="polars")
# To list of dictionaries
records = result.to_records()
# To JSON string
json_str = result.to_json_str()# Iterate over rows
for row in result:
print(row)
# Get total count
total = len(result)There is no fixed SDK row cap. A query can return as many rows as the instance
allows — its configured query.maxLimit (discovered automatically from
/api/v1/health; e.g. 100,000 on Lightdash Cloud). Request more than that and
the SDK raises a clear ValueError instead of letting the server silently
return a truncated result:
# Fetch a large extract — pages are streamed transparently
result = model.query().metrics(model.metrics.revenue).limit(100_000).execute()
df = result.to_df()
# Asking for more than the instance allows fails loudly
model.query().limit(10_000_000).execute()
# ValueError: Limit 10000000 exceeds this instance's maximum query limit of 100000...Large fetches page at the instance's maxPageSize to minimise round-trips, and
every page uses the same size so no rows are skipped. To pull result sets larger
than query.maxLimit, use a CSV/Excel export instead.
For large result sets, results are paginated automatically:
result = query.execute()
# Access specific page
page_2 = result.page(2)
# Iterate through all pages
for page in result.iter_pages():
process(page)
# Lazy DataFrame loading (polars only)
lazy_df = result.to_df_lazy()Properties:
result.query_uuid- Unique identifier for the queryresult.total_results- Total number of rowsresult.total_pages- Number of pagesresult.fields- Field metadata
Use .compile() to get the warehouse SQL for a query without executing it.
Nothing is run and no rows are fetched — it returns the exact SQL Lightdash
would issue:
sql = (
model.query()
.metrics(model.metrics.revenue)
.dimensions(model.dimensions.country)
.filter(model.dimensions.status == "active")
.compile()
)
print(sql)
# SELECT ... FROM ... WHERE ... GROUP BY ... LIMIT 500This is handy for inspecting or debugging a query, or for running it directly against your warehouse when you need more rows than the query API returns — for example feeding it to BigQuery/bigframes, dbt, or a data pipeline:
import bigframes.pandas as bpd
sql = model.query().metrics(model.metrics.revenue).limit(10_000_000).compile()
df = bpd.read_gbq(sql) # run it yourself, no Lightdash row capThe query's limit is included in the SQL as LIMIT n; set a large limit (or
strip the trailing clause) if you're running it yourself.
Execute raw SQL queries directly against your data warehouse:
# Execute SQL
result = client.sql("SELECT * FROM orders WHERE status = 'active' LIMIT 100")
df = result.to_df()
# With custom limit
result = client.sql("SELECT * FROM orders", limit=1000)
# Introspection
tables = client.sql_runner.tables()
fields = client.sql_runner.fields("orders")
fields = client.sql_runner.fields("orders", schema="public")The SDK provides specific exceptions for different error conditions:
from lightdash import LightdashError, QueryError, QueryTimeout, QueryCancelled
try:
result = query.execute()
except QueryTimeout as e:
print(f"Query timed out: {e.query_uuid}")
except QueryCancelled as e:
print(f"Query was cancelled: {e.query_uuid}")
except QueryError as e:
print(f"Query failed: {e.message}")
except LightdashError as e:
print(f"Lightdash error: {e.message} (status: {e.status_code})")Exception hierarchy:
LightdashError- Base exception for all SDK errorsQueryError- Query execution failed (HTTP 400)QueryTimeout- Query exceeded timeout (HTTP 408)QueryCancelled- Query was cancelled (HTTP 499)
from lightdash import Client, QueryError, QueryTimeout
# Initialize client
client = Client(
base_url="https://app.lightdash.cloud",
token="your-api-token"
)
# Get a model
model = client.get_model("project-uuid", "orders")
# Build a query with filters
query = (
model.query()
.metrics(model.metrics.total_revenue, model.metrics.order_count)
.dimensions(model.dimensions.country, model.dimensions.order_date)
.filter(
(model.dimensions.status == "completed") &
(model.dimensions.order_date >= "2024-01-01")
)
.sort(model.metrics.total_revenue.desc())
.limit(50)
)
try:
result = query.execute()
df = result.to_df()
print(f"Fetched {len(result)} rows")
print(df.head())
except QueryTimeout:
print("Query took too long - try adding more filters")
except QueryError as e:
print(f"Query failed: {e.message}")