diff --git a/api-reference/python/tilebox.datasets/Client.mdx b/api-reference/python/tilebox.datasets/Client.mdx
index 759078c..7dfd95e 100644
--- a/api-reference/python/tilebox.datasets/Client.mdx
+++ b/api-reference/python/tilebox.datasets/Client.mdx
@@ -47,8 +47,5 @@ client = Client(
url="https://api.tilebox.com",
token="YOUR_TILEBOX_API_KEY",
)
-
-# use HTTP/1.1 if your network blocks or breaks gRPC over HTTP/2
-client = Client(transport="http1")
```
diff --git a/api-reference/python/tilebox.workflows/Client.mdx b/api-reference/python/tilebox.workflows/Client.mdx
index 5963d71..8ec93c4 100644
--- a/api-reference/python/tilebox.workflows/Client.mdx
+++ b/api-reference/python/tilebox.workflows/Client.mdx
@@ -93,9 +93,6 @@ client = Client(
name="sentinel-2-runner",
)
-# use HTTP/1.1 if your network blocks or breaks gRPC over HTTP/2
-client = Client(transport="http1")
-
# access sub clients
job_client = client.jobs()
cluster_client = client.clusters()
diff --git a/assets/storage/s2_quicklook.jpg b/assets/storage/s2_quicklook.jpg
index 24a6b3d..71251a5 100644
Binary files a/assets/storage/s2_quicklook.jpg and b/assets/storage/s2_quicklook.jpg differ
diff --git a/assets/storage/usgs_quicklook.jpg b/assets/storage/usgs_quicklook.jpg
index 1f69af5..bdac750 100644
Binary files a/assets/storage/usgs_quicklook.jpg and b/assets/storage/usgs_quicklook.jpg differ
diff --git a/changelog.mdx b/changelog.mdx
index b1b3ea3..214e274 100644
--- a/changelog.mdx
+++ b/changelog.mdx
@@ -172,7 +172,7 @@ mode: center
- [Spatio Temporal datasets documentation](https://docs.tilebox.com/datasets/types/spatiotemporal)
- All open data now supports spatio-temporal queries
- - [Create your own spatio-temporal datasets](/guides/datasets/create)
+ - [Create your own spatio-temporal datasets](/guides/datasets/build-spatiotemporal-catalog)
- [Ingesting spatio-temporal data](/datasets/ingest)
@@ -198,5 +198,5 @@ mode: center
3. Use client.ingest() to ingest a `xarray.Dataset` or `pandas.DataFrame`
4. Query away!
- For detailed instructions, check out the [Creating a dataset](/guides/datasets/create) how-to guide.
+ For detailed instructions, check out the [Build a spatio-temporal catalog](/guides/datasets/build-spatiotemporal-catalog) how-to guide.
diff --git a/datasets/concepts/datasets.mdx b/datasets/concepts/datasets.mdx
index 3331fb4..dd840dd 100644
--- a/datasets/concepts/datasets.mdx
+++ b/datasets/concepts/datasets.mdx
@@ -11,11 +11,11 @@ icon: database
## Related Guides
-
- Learn how to create a Timeseries dataset using the Tilebox Console.
+
+ Learn how to create a custom dataset catalog with the Python SDK.
-
- Learn how to ingest an existing CSV dataset into a Timeseries dataset collection.
+
+ Learn how to ingest GeoParquet metadata into an existing spatio-temporal catalog.
@@ -151,7 +151,7 @@ Once you have your dataset object, you can use it to [list the available collect
## Creating / Updating a dataset
-You can create a dataset using the [Tilebox Console](/guides/datasets/create) or by using one of the available [client SDKs](/sdks/introduction).
+You can create a dataset using one of the available [client SDKs](/sdks/introduction), or use the [Tilebox Console](/console) when you prefer a visual schema editor.
```python Python
diff --git a/datasets/ingest.mdx b/datasets/ingest.mdx
index 7407f26..0e6ff2f 100644
--- a/datasets/ingest.mdx
+++ b/datasets/ingest.mdx
@@ -19,7 +19,7 @@ The examples on this page assume that you have access to a [Timeseries dataset](
- Check out the [Creating a dataset](/guides/datasets/create) guide for an example of how to create such a dataset.
+ Check out the [Build a spatio-temporal catalog](/guides/datasets/build-spatiotemporal-catalog) guide for an example of how to create such a dataset.
**MyCustomDataset schema**
diff --git a/docs.json b/docs.json
index 259b25c..6fb061b 100644
--- a/docs.json
+++ b/docs.json
@@ -147,18 +147,19 @@
{
"group": "Datasets",
"pages": [
- "guides/datasets/access-sentinel2-data",
"guides/datasets/query-satellite-data",
- "guides/datasets/create",
- "guides/datasets/ingest",
- "guides/datasets/ingest-format",
- "guides/datasets/build-spatiotemporal-catalog"
+ "guides/datasets/access-sentinel2-data",
+ "guides/datasets/access-usgs-landsat-data",
+ "guides/datasets/build-spatiotemporal-catalog",
+ "guides/datasets/ingest-into-spatiotemporal-catalog",
+ "guides/datasets/ingest-format"
]
},
{
"group": "Workflows",
"pages": [
"guides/workflows/run-your-first-workflow",
+ "guides/workflows/execute-tasks-in-parallel",
"guides/workflows/build-and-deploy-workflow",
"guides/workflows/deploy-to-your-compute",
"guides/workflows/debug-failed-run",
diff --git a/guides/cookbook.mdx b/guides/cookbook.mdx
index 02d16ee..d7a075a 100644
--- a/guides/cookbook.mdx
+++ b/guides/cookbook.mdx
@@ -12,40 +12,49 @@ export const cookbookSections = [
description: "Find, model, ingest, and query geospatial metadata with Tilebox Datasets.",
guides: [
{
- title: "Access Sentinel-2 data",
- href: "/guides/datasets/access-sentinel2-data",
- description: "Query Sentinel-2 metadata by area and time range.",
+ title: "Query open satellite data",
+ href: "/guides/datasets/query-satellite-data",
+ description: "Explore open data catalogs and query Sentinel-2 metadata by time and location.",
icon: "satellite",
level: "Beginner",
time: "5 min",
- tags: ["Open data", "Sentinel-2", "Spatial filters", "Python SDK"],
+ tags: ["Open data", "Sentinel-2", "Metadata queries", "Spatial filters"],
},
{
- title: "Query satellite data by time and location",
- href: "/guides/datasets/query-satellite-data",
- description: "Narrow open data catalogs before downloading product files.",
+ title: "Access Copernicus data",
+ href: "/guides/datasets/access-sentinel2-data",
+ description: "Download Copernicus product files with the storage client, using Sentinel-2 as an example.",
icon: "magnifying-glass-location",
level: "Beginner",
- time: "5 min",
- tags: ["Open data", "Temporal filters", "Spatial filters", "Storage clients"],
+ time: "10 min",
+ tags: ["Copernicus", "Storage clients", "Sentinel-2", "Product files"],
},
{
- title: "Creating a dataset",
- href: "/guides/datasets/create",
- description: "Create a custom dataset and define its schema in the Console.",
- icon: "database",
+ title: "Access USGS Landsat data",
+ href: "/guides/datasets/access-usgs-landsat-data",
+ description: "Download USGS Landsat product files with the storage client, using Landsat 8 as an example.",
+ icon: "satellite-dish",
level: "Beginner",
- time: "5 min",
- tags: ["Datasets", "Dataset schemas", "Console", "Timeseries datasets"],
+ time: "10 min",
+ tags: ["USGS", "Landsat 8", "Storage clients", "Product files"],
+ },
+ {
+ title: "Build a spatio-temporal catalog",
+ href: "/guides/datasets/build-spatiotemporal-catalog",
+ description: "Create, document, ingest, and query a custom geospatial catalog with the Python SDK.",
+ icon: "globe",
+ level: "Intermediate",
+ time: "20 min",
+ tags: ["Spatio-temporal datasets", "Dataset schemas", "Ingestion", "Python SDK"],
},
{
- title: "Ingesting data",
- href: "/guides/datasets/ingest",
- description: "Ingest GeoParquet data into a Tilebox timeseries dataset.",
+ title: "Ingest into a spatio-temporal catalog",
+ href: "/guides/datasets/ingest-into-spatiotemporal-catalog",
+ description: "Prepare GeoParquet metadata and ingest it into an existing spatio-temporal catalog.",
icon: "up-from-bracket",
level: "Beginner",
time: "15 min",
- tags: ["Ingestion", "Timeseries datasets", "GeoParquet", "Python SDK"],
+ tags: ["Ingestion", "Spatio-temporal datasets", "GeoParquet", "Python SDK"],
},
{
title: "Ingesting from common file formats",
@@ -56,15 +65,6 @@ export const cookbookSections = [
time: "20 min",
tags: ["Ingestion", "DataFrames", "xarray", "File formats"],
},
- {
- title: "Build a spatio-temporal catalog",
- href: "/guides/datasets/build-spatiotemporal-catalog",
- description: "Create a geospatial catalog that can be queried by time and geometry.",
- icon: "globe",
- level: "Intermediate",
- time: "15 min",
- tags: ["Spatio-temporal datasets", "Geometry", "Ingestion", "Catalogs"],
- },
],
},
{
@@ -80,6 +80,15 @@ export const cookbookSections = [
time: "10 min",
tags: ["Tasks", "Jobs", "Direct runners", "Python SDK"],
},
+ {
+ title: "Execute tasks in parallel",
+ href: "/guides/workflows/execute-tasks-in-parallel",
+ description: "Submit many subtasks and process them with several direct runners at the same time.",
+ icon: "arrows-split-up-and-left",
+ level: "Beginner",
+ time: "15 min",
+ tags: ["Tasks", "Subtasks", "Direct runners", "Parallelism"],
+ },
{
title: "Build and deploy a workflow project",
href: "/guides/workflows/build-and-deploy-workflow",
diff --git a/guides/datasets/access-sentinel2-data.mdx b/guides/datasets/access-sentinel2-data.mdx
index 2e60586..ba299d2 100644
--- a/guides/datasets/access-sentinel2-data.mdx
+++ b/guides/datasets/access-sentinel2-data.mdx
@@ -1,111 +1,145 @@
---
-title: Access Sentinel-2 data
-description: Query Sentinel-2 satellite imagery metadata from a Tilebox dataset, filtering results by geographic area and time range to find exactly the data you need.
+title: Access Copernicus data
+description: Download Copernicus Data Space products with the Tilebox Copernicus storage client, using Sentinel-2 as an example.
icon: database
---
-This guide assume you already [signed up](https://console.tilebox.com/sign-up) for a Tilebox account (free) and [created an API key](https://console.tilebox.com/settings/api-keys).
+Use this guide when you already have a Copernicus datapoint from a Tilebox metadata query and want to access the product files behind it. The example uses Sentinel-2 Level-2A data, but the same storage client pattern applies to Copernicus products supported by Tilebox.
-## Install Tilebox package
+Tilebox indexes product metadata as datasets. Product files remain in the Copernicus Data Space Ecosystem, so file access uses the `CopernicusStorageClient` with Copernicus S3 credentials.
-
-```bash uv
-uv add tilebox
-```
-```bash pip
-pip install tilebox
-```
-```bash poetry
-poetry add tilebox="*"
-```
-```bash pipenv
-pipenv install tilebox
-```
-
+## Prerequisites
+
+- You have a [Tilebox API key](/authentication).
+- You have installed the [Python SDK](/sdks/python/install).
+- You have a [Copernicus Data Space](https://dataspace.copernicus.eu/) account.
+- You have generated Copernicus [S3 credentials](https://eodata-s3keysmanager.dataspace.copernicus.eu/panel/s3-credentials).
-## Access Sentinel-2 metadata
+```bash
+uv add tilebox shapely
+```
-Query the Sentinel-2A satellite for level 2A data of October 2025 that cover the state of Colorado.
+## Select a Sentinel-2 datapoint
-
- Replace `YOUR_TILEBOX_API_KEY` with your actual API key, or omit the `token` parameter entirely if the `TILEBOX_API_KEY` environment variable is set.
-
+Start with a small metadata query and select one datapoint to access. For a deeper guide to open data discovery and metadata filtering, see [Query open data metadata](/guides/datasets/query-satellite-data).
-
```python Python
-from shapely import MultiPolygon
+from shapely import Polygon
from tilebox.datasets import Client
-area = MultiPolygon(
+area = Polygon(
[
- (((-109.05, 41.00), (-109.045, 37.0), (-102.05, 37.0), (-102.05, 41.00), (-109.05, 41.00)),),
+ (-109.05, 37.0),
+ (-102.05, 37.0),
+ (-102.05, 41.0),
+ (-109.05, 41.0),
+ (-109.05, 37.0),
]
)
-client = Client(token="YOUR_TILEBOX_API_KEY")
+client = Client()
collection = client.dataset("open_data.copernicus.sentinel2_msi").collection("S2A_S2MSI2A")
-data = collection.query(
+
+scenes = collection.query(
temporal_extent=("2025-10-01", "2025-11-01"),
spatial_extent=area,
show_progress=True,
)
-print(data)
+
+selected = scenes.where(scenes.cloud_cover < 10, drop=True).isel(time=0)
+print(selected.granule_name.item())
```
-
-
-```plaintext Output
- Size: 75kB
-Dimensions: (time: 169)
-Coordinates:
- * time (time) datetime64[ns] 1kB 2025-10-02T18:07:51.0240...
-Data variables: (12/23)
- id (time)
-The output shows that the query returned 169 data points metadata. The metadata is returned as an `xarray.Dataset`.
+
+ These credentials are Copernicus Data Space S3 credentials, not your Tilebox API key.
+
-Now you can check the metadata and decide which data points you want to download.
+## Download the complete product
-## Download the data from Copernicus Data Space
+Create a `CopernicusStorageClient` with your Copernicus S3 credentials, then call `download` when you need the complete Sentinel-2 product directory. The storage client resolves the product location from the Tilebox datapoint metadata and downloads the matching files into its local cache directory.
-Tilebox stores and indexes metadata about datasets but doesn't store the data files.
-Sentinel-2 data is stored in the Copernicus Data Space Ecosystem (CDSE). If you never used the CDSE before, [create an account](https://identity.dataspace.copernicus.eu/auth/realms/CDSE/protocol/openid-connect/auth?client_id=cdse-public&response_type=code&scope=openid&redirect_uri=https%3A//dataspace.copernicus.eu/account/confirmed/1) and then generate [S3 credentials here](https://eodata-s3keysmanager.dataspace.copernicus.eu/panel/s3-credentials).
+```python Python
+from tilebox.storage import CopernicusStorageClient
+
+storage = CopernicusStorageClient(
+    access_key="YOUR_ACCESS_KEY",  # Copernicus S3 access key, not a Tilebox API key
+    secret_access_key="YOUR_SECRET_ACCESS_KEY",
+)
+product_path = storage.download(selected)
+
+print(f"Downloaded {product_path.name} to {product_path}")
+print("Contents:")
+for path in product_path.iterdir():
+ print(f"- {path.relative_to(product_path)}")
+```
+
+```plaintext Output
+Downloaded S2A_MSIL2A_20251002T180751_N0511_R084_T13TEE_20251002T225842.SAFE to data/Sentinel-2/MSI/L2A/2025/10/02/S2A_MSIL2A_20251002T180751_N0511_R084_T13TEE_20251002T225842.SAFE
+Contents:
+- manifest.safe
+- GRANULE
+- INSPIRE.xml
+- MTD_MSIL2A.xml
+- DATASTRIP
+- HTML
+- rep_info
+- S2A_MSIL2A_20251002T180751_N0511_R084_T13TEE_20251002T225842-ql.jpg
+```
+
+## Download selected product files
+
+Sentinel-2 products contain many files, including metadata, masks, quicklook images, and bands at different resolutions. Use `list_objects` and `download_objects` when you only need specific files.
-
```python Python
-from tilebox.storage import CopernicusStorageClient
+objects = storage.list_objects(selected)
-storage_client = CopernicusStorageClient(
- access_key="YOUR_ACCESS_KEY",
- secret_access_key="YOUR_SECRET_ACCESS_KEY",
-)
+wanted_bands = ["B02_10m", "B03_10m", "B04_10m", "B08_10m"]
+band_objects = [
+ obj for obj in objects
+ if any(band in obj for band in wanted_bands)
+]
-for _, datapoint in data.groupby("time"):
- downloaded_data = storage_client.download(datapoint)
- print(f"Downloaded data to {downloaded_data}")
+for obj in band_objects:
+ print(obj)
+
+downloaded_files = storage.download_objects(selected, band_objects)
+print(downloaded_files)
```
-
-This code will download all 169 data points to a cache folder. Now you can work with the data, visualize it, and run your custom processor on it.
+Use this pattern when a workflow only needs a few bands or metadata files. It reduces transfer time and local storage compared with downloading the full `.SAFE` product.
+
+## Preview the product
+
+Many Copernicus products include a quicklook image. In a notebook, use `quicklook` to display the product preview without downloading the full product first.
-## Next Steps
+```python Python
+storage.quicklook(selected)
+```
+
+
+
+
+
+## Next steps
-
- Check out available open data datasets on Tilebox
+
+ Find Copernicus products by time, location, and metadata fields.
+
+
+ Learn about the other Tilebox storage clients for open data products.
+
+
+ Download Landsat product files with the USGS Landsat storage client.
diff --git a/guides/datasets/access-usgs-landsat-data.mdx b/guides/datasets/access-usgs-landsat-data.mdx
new file mode 100644
index 0000000..38fa5f3
--- /dev/null
+++ b/guides/datasets/access-usgs-landsat-data.mdx
@@ -0,0 +1,138 @@
+---
+title: Access USGS Landsat data
+description: Download USGS Landsat products with the Tilebox Landsat storage client, using Landsat 8 as an example.
+icon: satellite-dish
+---
+
+Use this guide when you already have a Landsat datapoint from a Tilebox metadata query and want to access the product files behind it. The example uses Landsat 8 Collection 2 Level-2 surface reflectance data.
+
+Tilebox indexes Landsat metadata as datasets. Product files remain in the USGS public cloud archive, so file access uses the `USGSLandsatStorageClient` and your AWS requester-pays setup.
+
+## Prerequisites
+
+- You have a [Tilebox API key](/authentication).
+- You have installed the [Python SDK](/sdks/python/install).
+- You have AWS credentials configured in your environment.
+- Your AWS account can access [requester-pays S3 buckets](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RequesterPaysBuckets.html).
+
+```bash
+uv add tilebox shapely
+```
+
+
+ USGS Landsat data is stored in a requester-pays S3 bucket. AWS bills your AWS account, not the bucket owner, for the requests and data transfer that your downloads generate.
+
+
+## Select a Landsat 8 datapoint
+
+Start with a small metadata query and select one datapoint to access. For a deeper guide to open data discovery and metadata filtering, see [Query open data metadata](/guides/datasets/query-satellite-data).
+
+```python Python
+from shapely import Polygon
+from tilebox.datasets import Client
+
+area = Polygon(
+ [
+ (-109.05, 37.0),
+ (-102.05, 37.0),
+ (-102.05, 41.0),
+ (-109.05, 41.0),
+ (-109.05, 37.0),
+ ]
+)
+
+client = Client()
+collection = client.dataset("open_data.usgs.landsat8_oli_tirs").collection("L2_SR")
+
+scenes = collection.query(
+ temporal_extent=("2024-08-01", "2024-08-15"),
+ spatial_extent=area,
+ show_progress=True,
+)
+
+selected = scenes.where(scenes.cloud_cover < 10, drop=True).isel(time=0)
+print(selected.granule_name.item())
+```
+
+## Create the Landsat storage client
+
+Create a `USGSLandsatStorageClient`. The client reads AWS credentials from your environment, such as `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, plus `AWS_SESSION_TOKEN` when you use temporary credentials.
+
+```python Python
+from tilebox.storage import USGSLandsatStorageClient
+
+storage = USGSLandsatStorageClient()
+```
+
+## Download the complete product
+
+Use `download` when you need the complete Landsat product directory. The storage client resolves the product location from the Tilebox datapoint metadata and downloads the matching files into the local cache.
+
+```python Python
+product_path = storage.download(selected)
+
+print(f"Downloaded {product_path.name} to {product_path}")
+print("Contents:")
+for path in product_path.iterdir():
+ print(f"- {path.relative_to(product_path)}")
+```
+
+```plaintext Output
+Downloaded LC08_L2SP_033033_20240808_20240814_02_T1 to ~/.cache/tilebox/collection02/level-2/standard/oli-tirs/2024/033/033/LC08_L2SP_033033_20240808_20240814_02_T1
+Contents:
+- LC08_L2SP_033033_20240808_20240814_02_T1_SR_B1.TIF
+- LC08_L2SP_033033_20240808_20240814_02_T1_SR_B2.TIF
+- LC08_L2SP_033033_20240808_20240814_02_T1_SR_B3.TIF
+- LC08_L2SP_033033_20240808_20240814_02_T1_SR_B4.TIF
+- LC08_L2SP_033033_20240808_20240814_02_T1_SR_B5.TIF
+- LC08_L2SP_033033_20240808_20240814_02_T1_SR_B6.TIF
+- LC08_L2SP_033033_20240808_20240814_02_T1_SR_B7.TIF
+- LC08_L2SP_033033_20240808_20240814_02_T1_QA_PIXEL.TIF
+- LC08_L2SP_033033_20240808_20240814_02_T1_MTL.json
+- LC08_L2SP_033033_20240808_20240814_02_T1_thumb_small.jpeg
+```
+
+## Download selected product files
+
+Landsat products contain surface reflectance bands, quality masks, thermal bands, metadata, and preview images. Use `list_objects` and `download_objects` when you only need specific files.
+
+```python Python
+objects = storage.list_objects(selected)
+
+rgb_bands = ["B4", "B3", "B2"]
+rgb_objects = [
+ obj for obj in objects
+ if any(obj.endswith(f"_{band}.TIF") for band in rgb_bands)
+]
+
+for obj in rgb_objects:
+ print(obj)
+
+downloaded_files = storage.download_objects(selected, rgb_objects)
+print(downloaded_files)
+```
+
+Use this pattern when a workflow only needs a few bands, masks, or metadata files. It reduces transfer time and local storage compared with downloading the full product.
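+
+If you select bands in several places, you can keep the suffix match in one helper. This is a minimal sketch that assumes `list_objects` returns object keys as strings, as the filter above implies. `select_band_objects` is a hypothetical helper, not part of the Tilebox SDK, and the example keys follow the naming pattern of the product listing above. Matching the `_{band}.TIF` suffix instead of a substring means a request for `B1` does not also select the thermal `ST_B10.TIF` file.
+
+```python Python
+def select_band_objects(objects, bands):
+    """Return the object keys whose file name ends with one of the requested bands.
+
+    Hypothetical helper, assuming Landsat Collection 2 file names such as `..._SR_B4.TIF`.
+    """
+    suffixes = tuple(f"_{band}.TIF" for band in bands)
+    # str.endswith accepts a tuple and returns True if any of the suffixes matches
+    return [obj for obj in objects if obj.endswith(suffixes)]
+
+
+example_keys = [
+    "LC08_L2SP_033033_20240808_20240814_02_T1_SR_B1.TIF",
+    "LC08_L2SP_033033_20240808_20240814_02_T1_SR_B4.TIF",
+    "LC08_L2SP_033033_20240808_20240814_02_T1_ST_B10.TIF",
+    "LC08_L2SP_033033_20240808_20240814_02_T1_MTL.json",
+]
+print(select_band_objects(example_keys, ["B1", "B4"]))
+```
+
+With the storage client from this guide, `select_band_objects(storage.list_objects(selected), ["B4", "B3", "B2"])` selects the same RGB keys as the list comprehension above.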
+
+## Preview the product
+
+Many Landsat products include a thumbnail image. In a notebook, use `quicklook` to display the product preview without downloading the full product first.
+
+```python Python
+storage.quicklook(selected)
+```
+
+
+
+
+
+## Next steps
+
+
+
+ Find Landsat products by time, location, and metadata fields.
+
+
+ Learn about the other Tilebox storage clients for open data products.
+
+
diff --git a/guides/datasets/build-spatiotemporal-catalog.mdx b/guides/datasets/build-spatiotemporal-catalog.mdx
index a209757..e97c8ca 100644
--- a/guides/datasets/build-spatiotemporal-catalog.mdx
+++ b/guides/datasets/build-spatiotemporal-catalog.mdx
@@ -1,25 +1,137 @@
---
title: Build a spatio-temporal catalog
-description: Create a custom spatio-temporal dataset and ingest geospatial metadata that can be queried by time and location.
+description: Create a custom spatio-temporal dataset catalog with the Python SDK, document its schema, ingest geospatial metadata, and query it by time and location.
icon: globe
---
Use a spatio-temporal dataset when each datapoint has both a time and a geometry. This is useful for internal imagery catalogs, derived products, ground truth data, regions of interest, and processing outputs that need geospatial lookup.
-This guide shows the shape of the workflow. For the full ingestion guide, see [Ingesting data](/guides/datasets/ingest).
+This guide creates an imagery catalog from code. You will define the dataset schema with the Python SDK, add field descriptions and examples for generated schema documentation, create a collection, ingest geospatial metadata, and query the catalog by time and location.
+
+## Prerequisites
+
+- You have a [Tilebox API key](/authentication).
+- You have installed the [Python SDK](/sdks/python/install).
+
+```bash
+uv add tilebox geopandas shapely
+```
+
+## Define the catalog schema
+
+Start by choosing the spatio-temporal dataset kind and the custom fields for your catalog. Tilebox adds the required `time`, `id`, `ingestion_time`, and `geometry` fields automatically.
+
+The example catalog tracks imagery products with a provider product ID, a storage location, cloud cover, and processing level. Field descriptions and example values become part of the generated schema documentation.
+
+```python Python
+from tilebox.datasets import Client
+from tilebox.datasets.data.datasets import DatasetKind
+
+client = Client()
+
+fields = [
+ {
+ "name": "product_id",
+ "type": str,
+ "description": "Stable product or scene identifier from the source catalog.",
+ "example_value": "LC08_L2SP_033033_20240808_20240814_02_T1",
+ },
+ {
+ "name": "location",
+ "type": str,
+ "description": "Storage path, object key, or provider-specific product location.",
+ "example_value": "s3://example-bucket/landsat/LC08_L2SP_033033_20240808_20240814_02_T1",
+ },
+ {
+ "name": "cloud_cover",
+ "type": float,
+ "description": "Cloud cover percentage for the product footprint.",
+ "example_value": "3.2",
+ },
+ {
+ "name": "processing_level",
+ "type": str,
+ "description": "Processing level or product type assigned by the source provider.",
+ "example_value": "L2_SR",
+ },
+]
+```
+
+Use field names that are stable and descriptive. You can add new fields at any time, but changing or removing existing fields requires emptying all collections in the dataset first, because existing datapoints must continue to match the dataset schema.
## Create the dataset
-Create a dataset in the [Tilebox Console](https://console.tilebox.com/datasets/my-datasets) and choose the **Spatio-temporal** kind. Tilebox adds the required `time`, `id`, `ingestion_time`, and `geometry` fields automatically.
+Call `create_or_update_dataset` with the dataset kind, code name, field list, and display name. The code name becomes the stable identifier used in SDK calls.
+
+```python Python
+dataset = client.create_or_update_dataset(
+ kind=DatasetKind.SPATIOTEMPORAL,
+ code_name="internal_imagery_catalog",
+ fields=fields,
+ name="Internal imagery catalog",
+)
+
+print(dataset)
+```
+
+If a dataset with the same code name already exists, `create_or_update_dataset` updates it instead of creating a duplicate. This makes the snippet safe to keep in a setup script.
+
+
+ The Python SDK currently sets the dataset display name and schema. Field-level `description` and `example_value` entries populate the generated schema documentation. Use the Tilebox Console when you want to add rich Markdown documentation to the dataset page.
+
+
+## Inspect the generated schema documentation
+
+Tilebox uses the dataset kind and field annotations to document the schema. Required fields are added by the dataset kind, and your custom fields appear with their descriptions and examples.
-Add fields that describe the products you want to catalog, such as:
+For this catalog, the complete schema includes:
-| Field | Type | Purpose |
+| Field | Category | Purpose |
| --- | --- | --- |
-| `product_id` | `string` | Stable product or scene identifier. |
-| `location` | `string` | Storage path, object key, or provider location. |
-| `cloud_cover` | `float64` | Optional quality or filtering field. |
-| `processing_level` | `string` | Product processing level or category. |
+| `time` | Required | Timestamp associated with the datapoint. |
+| `id` | Required | Tilebox-generated UUID for the datapoint. |
+| `ingestion_time` | Required | Time when Tilebox ingested the datapoint. |
+| `geometry` | Required | Geometry used for spatial queries. |
+| `product_id` | Custom | Stable product or scene identifier. |
+| `location` | Custom | Storage path or provider product location. |
+| `cloud_cover` | Custom | Cloud cover percentage for filtering. |
+| `processing_level` | Custom | Provider processing level or product type. |
+
+## Add richer dataset documentation
+
+Use field descriptions for schema-level documentation. Use the Console documentation editor when you want longer Markdown documentation for the dataset, such as provenance notes, quality caveats, ingestion rules, or examples for downstream users.
+
+
+
+
+
+
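+Open the dataset in the Console, click the edit pencil button on the documentation section, write your Markdown content, and click `Save`. If you don't see the edit pencil button, you don't have permission to edit the documentation. A short documentation block might look like this: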
+
+```md Markdown
+# Internal imagery catalog
+
+This dataset indexes analysis-ready imagery products used by the operations team.
+
+## Source
+
+Products are copied from the provider archive after validation.
+
+## Usage notes
+
+Use `cloud_cover < 10` for workflows that require mostly cloud-free scenes.
+```
+
+## Create a collection
+
+After creating the dataset, create a collection to hold datapoints. Collections let you organize datapoints within the same schema, for example by provider, product family, or processing pipeline.
+
+```python Python
+collection = dataset.get_or_create_collection("landsat_level_2")
+print(collection)
+```
## Prepare datapoints
@@ -36,6 +148,10 @@ products = products.rename(
"path": "location",
}
)
+
+products = products[
+ ["time", "geometry", "product_id", "location", "cloud_cover", "processing_level"]
+]
```
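+
+The column list above repeats the custom field names from `fields`. The following is a minimal sketch that derives the list from the schema definition instead, so the two stay in sync. It assumes that you supply `time` and `geometry` for each datapoint while Tilebox generates `id` and `ingestion_time`. `ingest_columns` is a hypothetical helper, not part of the Tilebox SDK.
+
+```python Python
+# Assumption: ingestion data supplies `time` and `geometry`, and Tilebox
+# generates `id` and `ingestion_time` for each datapoint.
+REQUIRED_INPUT_COLUMNS = ["time", "geometry"]
+
+
+def ingest_columns(fields):
+    """Return the ingestion column order: required inputs first, then custom schema fields."""
+    return REQUIRED_INPUT_COLUMNS + [field["name"] for field in fields]
+
+
+# Trimmed copy of the `fields` list from the schema definition above
+fields = [
+    {"name": "product_id", "type": str},
+    {"name": "location", "type": str},
+    {"name": "cloud_cover", "type": float},
+    {"name": "processing_level", "type": str},
+]
+
+print(ingest_columns(fields))
+```
+
+With a GeoPandas `GeoDataFrame`, `products = products[ingest_columns(fields)]` selects the same columns as the explicit list above.
+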
## Ingest the catalog
@@ -43,12 +159,6 @@ products = products.rename(
Ingest the prepared records into a collection.
```python Python
-from tilebox.datasets import Client
-
-client = Client()
-dataset = client.dataset("my_org.imagery_catalog")
-collection = dataset.collection("processed-scenes")
-
collection.ingest(products)
```
@@ -76,4 +186,7 @@ matches = collection.query(
Load CSV, Parquet, GeoParquet, and NetCDF data before ingestion.
+
+ Prepare GeoParquet metadata and ingest it into this catalog.
+
diff --git a/guides/datasets/create.mdx b/guides/datasets/create.mdx
deleted file mode 100644
index ec58f39..0000000
--- a/guides/datasets/create.mdx
+++ /dev/null
@@ -1,101 +0,0 @@
----
-title: Creating a dataset
-description: Walk through creating a custom dataset in the Tilebox Console, from choosing the right dataset kind to defining your schema fields and organizing groups.
-icon: database
----
-
-This page guides you through the process of creating a dataset in Tilebox using the [Tilebox Console](/console).
-
-## Related documentation
-
-
-
- Learn about Tilebox datasets and how to use them.
-
-
- Learn about Timeseries datasets, which link each data point to a specific point in time.
-
-
-
-## Creating a dataset in the Console
-
-
-
- Create a dataset in Tilebox by going to [My datasets](https://console.tilebox.com/datasets/my-datasets) and clicking the `Create dataset` button.
-
-
- Choose a [dataset kind](/datasets/concepts/datasets#dataset-types) from the dropdown menu. Required fields for the selected dataset kind are automatically added.
-
-
-
-
-
-
-
- Complete these fields:
-
- - `Name` is the name of your dataset.
- - `Code name` is a unique identifier for the dataset within your team. It's automatically generated, but you can adjust it if needed.
- - `Description` is a brief description of the dataset purpose.
-
-
-
-
-
-
-
- Specify the fields for your dataset.
-
- Each field has these properties:
- - `Name` is the name of the field (by convention it is recommended to be in `snake_case`).
- - `Type` is the data type of the field.
- - `Array` can be set to indicate that the field contains multiple values of the specified type.
- - `Description` is an optional brief description of the field. You can use it to provide more context and details about the data.
- - `Example value` is an optional example for this field. It can be useful for documentation purposes.
-
-
-
-
-
-
-
- Once you're done completing the fields, click "Create" to create and save the dataset. You are redirected to your newly created dataset.
-
-
-
-## Automatic dataset schema documentation
-
-By specifying the fields for your dataset, including the data type, description, and an example value for each one, Tilebox
-is capable of automatically generating a documentation page for your dataset schema.
-
-
-
-
-
-
-## Adding extra documentation
-
-You can also add custom documentation to your dataset, providing more context and details about the data included data.
-This documentation supports rich formatting, including links, tables, code snippets, and more.
-
-
-
-
-
-
-To add documentation, click the edit pencil button on the dataset page to open the documentation editor.
-You can use Markdown to format your documentation; you can include links, tables, code snippets, and other Markdown features.
-
-
- If you don't see the edit pencil button, you don't have the required permissions to edit the
- documentation.
-
-
-Once you are done editing the documentation, click the `Save` button to save your changes.
-
-## Changing the dataset schema
-
-You can always add new fields to a dataset.
-
-If you want to remove or edit existing fields, you'll first need to empty all collections in the dataset, to ensure that
-no existing data points relying on those fields exist. Then, you can freely edit the dataset schema in the console.
diff --git a/guides/datasets/ingest-format.mdx b/guides/datasets/ingest-format.mdx
index 5dbbe80..a424398 100644
--- a/guides/datasets/ingest-format.mdx
+++ b/guides/datasets/ingest-format.mdx
@@ -68,7 +68,7 @@ data = gpd.read_parquet("modis_MCD12Q1.geoparquet")
- For a step-by-step guide of ingesting a GeoParquet file, check out our [Ingesting data](/guides/datasets/ingest) guide.
+ For a step-by-step guide to ingesting a GeoParquet file, see [Ingest into a spatio-temporal catalog](/guides/datasets/ingest-into-spatiotemporal-catalog).
### Feather
@@ -135,4 +135,3 @@ collection = dataset.get_or_create_collection("Measurements")
collection.ingest(data)
```
-
diff --git a/guides/datasets/ingest-into-spatiotemporal-catalog.mdx b/guides/datasets/ingest-into-spatiotemporal-catalog.mdx
new file mode 100644
index 0000000..5202b02
--- /dev/null
+++ b/guides/datasets/ingest-into-spatiotemporal-catalog.mdx
@@ -0,0 +1,175 @@
+---
+title: Ingest into a spatio-temporal catalog
+sidebarTitle: Ingest into a catalog
+description: Prepare GeoParquet metadata and ingest it into an existing Tilebox spatio-temporal catalog.
+icon: up-from-bracket
+---
+
+Use this guide after [Build a spatio-temporal catalog](/guides/datasets/build-spatiotemporal-catalog). It assumes you have already created a spatio-temporal dataset and now want to load geospatial metadata into one of its collections.
+
+The example starts from a GeoParquet file, reshapes it to match the catalog schema, ingests it into Tilebox, and runs a time and location query against the new collection.
+
+
+ If your source data uses a different file format, see [Ingesting from common file formats](/guides/datasets/ingest-format) for examples of loading CSV, Parquet, GeoParquet, and NetCDF data before ingestion.
+
+
+## Prerequisites
+
+- You have a [Tilebox API key](/authentication).
+- You have installed the [Python SDK](/sdks/python/install).
+- You have created the catalog from [Build a spatio-temporal catalog](/guides/datasets/build-spatiotemporal-catalog), or an equivalent spatio-temporal dataset with matching fields.
+
+```bash
+uv add tilebox geopandas lonboard shapely
+```
+
+## Download the example metadata
+
+The example metadata is available as a [GeoParquet](https://geoparquet.org/) file:
+
+```bash
+curl -L \
+ -o modis_MCD12Q1.geoparquet \
+ https://storage.googleapis.com/tbx-web-assets-2bad228/docs/data-samples/modis_MCD12Q1.geoparquet
+```
+
+This file contains MODIS land cover product metadata, including timestamps and product footprints.
+
+## Read and preview the source data
+
+Read the GeoParquet file with GeoPandas. The resulting `GeoDataFrame` includes a `geometry` column, which Tilebox uses for spatial indexing in spatio-temporal datasets.
+
+```python Python
+import geopandas as gpd
+
+source = gpd.read_parquet("modis_MCD12Q1.geoparquet")
+source.head(5)
+```
+
+```plaintext Output
+ time end_time granule_name geometry horizontal_tile_number vertical_tile_number tile_id
+0 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00 MCD12Q1.A2001001.h00v08... POLYGON ((-180 10, -180 0, -170 0, ... 0 8 51000008
+1 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00 MCD12Q1.A2001001.h00v09... POLYGON ((-180 0, -180 -10, ... 0 9 51000009
+```
+
+You can inspect the footprints before ingestion with `lonboard`.
+
+```python Python
+from lonboard import viz
+
+viz(source, map_kwargs={"show_tooltip": True})
+```
+
+
+
+
+
+
+## Match the catalog schema
+
+Prepare a DataFrame with the fields required by the catalog. This example targets the schema from [Build a spatio-temporal catalog](/guides/datasets/build-spatiotemporal-catalog): `time`, `geometry`, `product_id`, `location`, `cloud_cover`, and `processing_level`.
+
+```python Python
+products = source.copy()
+
+products["product_id"] = products["granule_name"]
+products["location"] = products["granule_name"].map(
+ lambda name: f"modis://MCD12Q1/{name}"
+)
+products["cloud_cover"] = 0.0  # the source metadata has no cloud cover column, so use a placeholder
+products["processing_level"] = "MCD12Q1"
+
+products = products[
+ ["time", "geometry", "product_id", "location", "cloud_cover", "processing_level"]
+]
+
+products.head(5)
+```
+
+Keep the DataFrame columns aligned with the dataset schema. Required fields such as `id` and `ingestion_time` are generated by Tilebox during ingestion, so you do not include them in the input DataFrame.
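
If you want to catch schema mismatches before calling `ingest`, you can compare the prepared column names against the fields listed above. The sketch below is a hypothetical helper, not part of the Tilebox SDK: `check_catalog_columns`, `EXPECTED_COLUMNS`, and `GENERATED_COLUMNS` are illustrative names, and treating every extra column as an error is an assumption that may be stricter than Tilebox's own validation.

```python
from collections.abc import Iterable

# Fields from the example catalog schema in this guide; adjust for your own dataset.
EXPECTED_COLUMNS = {"time", "geometry", "product_id", "location", "cloud_cover", "processing_level"}

# Fields that Tilebox generates during ingestion, so the input should not contain them.
GENERATED_COLUMNS = {"id", "ingestion_time"}


def check_catalog_columns(columns: Iterable[str]) -> None:
    """Raise ValueError if `columns` does not match the expected catalog fields.

    Hypothetical helper for illustration; it is not part of the Tilebox SDK.
    """
    present = set(columns)
    missing = EXPECTED_COLUMNS - present
    generated = present & GENERATED_COLUMNS
    extra = present - EXPECTED_COLUMNS - GENERATED_COLUMNS

    problems = []
    if missing:
        problems.append(f"missing: {sorted(missing)}")
    if generated:
        problems.append(f"generated by Tilebox, remove: {sorted(generated)}")
    if extra:
        problems.append(f"not in schema: {sorted(extra)}")
    if problems:
        raise ValueError("; ".join(problems))


# Example: columns that match the schema pass silently.
check_catalog_columns(["time", "geometry", "product_id", "location", "cloud_cover", "processing_level"])
```

In this guide you would call `check_catalog_columns(products.columns)` right before `collection.ingest(products)`; a `GeoDataFrame` exposes its column names through `.columns` like any pandas DataFrame.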
+
+## Connect to the catalog collection
+
+Access the catalog dataset and create or reuse a collection for the MODIS products.
+
+```python Python
+from tilebox.datasets import Client
+
+client = Client()
+dataset = client.dataset("internal_imagery_catalog")
+collection = dataset.get_or_create_collection("modis_land_cover")
+```
+
+Replace `internal_imagery_catalog` with the code name of your catalog if you used a different value in the previous guide.
+
+## Ingest the products
+
+Ingest the prepared DataFrame into the collection. Tilebox validates each row against the dataset schema before storing it.
+
+```python Python
+datapoint_ids = collection.ingest(products)
+print(f"Successfully ingested {len(datapoint_ids)} datapoints.")
+```
+
+```plaintext Output
+Successfully ingested 7245 datapoints.
+```
+
+## Query the ingested catalog
+
+After ingestion, query the collection by time and location. The query model is the same one used by Tilebox open data catalogs.
+
+```python Python
+from shapely import Polygon
+
+area = Polygon(  # area roughly covering the contiguous United States
+ [
+ (-124.45, 49.19),
+ (-120.88, 29.31),
+ (-66.87, 24.77),
+ (-65.34, 47.84),
+ (-124.45, 49.19),
+ ]
+)
+
+matches = collection.query(
+ temporal_extent=("2015-01-01", "2020-01-01"),
+ spatial_extent=area,
+)
+
+matches[["product_id", "processing_level", "location"]]
+```
+
+```plaintext Output
+<xarray.Dataset> Size: 18kB
+Dimensions: (time: 110)
+Coordinates:
+ * time (time) datetime64[ns] 2015-01-01 ... 2019-01-01
+Data variables:
+ product_id (time) object 'MCD12Q1.A2015001.h10v03...' ...
+ processing_level (time) object 'MCD12Q1' 'MCD12Q1' ...
+ location (time) object 'modis://MCD12Q1/MCD12Q1.A2015001...' ...
+```
+
+## View the data in the Console
+
+You can also inspect ingested datapoints in the Tilebox Console. Open the dataset, select the collection, and click a datapoint to inspect its fields and geometry.
+
+
+
+
+
+
+## Next steps
+
+
+
+ Create and document the catalog schema used by this guide.
+
+
+ Learn more about querying datasets by time, location, collection, and ID.
+
+
+ Load CSV, Parquet, GeoParquet, and NetCDF data before ingestion.
+
+
diff --git a/guides/datasets/ingest.mdx b/guides/datasets/ingest.mdx
deleted file mode 100644
index 6cb7655..0000000
--- a/guides/datasets/ingest.mdx
+++ /dev/null
@@ -1,257 +0,0 @@
----
-title: Ingesting data
-description: Walk through the full process of ingesting GeoParquet data into a Tilebox timeseries dataset, from downloading source files to previewing the results.
-icon: up-from-bracket
----
-
-
-
- This guide is also available as a Google Colab notebook. Click here for an interactive version.
-
-
-
-This page guides you through the process of ingesting data into a Tilebox dataset. Starting from an existing
-dataset available as file in the [GeoParquet](https://geoparquet.org/) format, you'll go through the process of
-ingesting that data into Tilebox as a [Timeseries](/datasets/types/timeseries) dataset.
-
-
- If you have your data in a different format, check out the [Ingesting from common file formats](/guides/datasets/ingest-format) examples of how to ingest it.
-
-
-## Related documentation
-
-
-
- Learn about Tilebox datasets and how to use them.
-
-
- Learn how to ingest data into a Tilebox dataset.
-
-
-
-## Downloading the example dataset
-
-The dataset used in this example is available as a [GeoParquet](https://geoparquet.org/) file. You can download it
-from here: [modis_MCD12Q1.geoparquet](https://storage.googleapis.com/tbx-web-assets-2bad228/docs/data-samples/modis_MCD12Q1.geoparquet).
-
-## Installing the necessary packages
-
-This example uses a couple of python packages for reading parquet files and for visualizing the dataset. Install the
-required packages using your preferred package manager. For new projects, Tilebox recommend using [uv](https://docs.astral.sh/uv/).
-
-
-```bash uv
-uv add tilebox-datasets geopandas lonboard
-```
-```bash pip
-pip install tilebox-datasets geopandas lonboard
-```
-```bash poetry
-poetry add tilebox-datasets="*" geopandas="*" lonboard="*"
-```
-```bash pipenv
-pipenv install tilebox-datasets geopandas lonboard
-```
-
-
-## Reading and previewing the dataset
-
-The dataset is available as a [GeoParquet](https://geoparquet.org/) file. You can read it using the `geopandas.read_parquet` function.
-
-
-```python Python
-import geopandas as gpd
-
-modis_data = gpd.read_parquet("modis_MCD12Q1.geoparquet")
-modis_data.head(5)
-```
-
-
-
-```plaintext Output
- time end_time granule_name geometry horizontal_tile_number vertical_tile_number tile_id file_size checksum checksum_type day_night_flag browse_granule_id published_at
-0 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00 MCD12Q1.A2001001.h00v08.061.2022146024956.hdf POLYGON ((-180 10, -180 0, -170 0, -172.62252 ... 0 8 51000008 275957 941243048 CKSUM Day None 2022-06-23 10:54:43.824000+00:00
-1 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00 MCD12Q1.A2001001.h00v09.061.2022146024922.hdf POLYGON ((-180 0, -180 -10, -172.62252 -10, -1... 0 9 51000009 285389 3014510714 CKSUM Day None 2022-06-23 10:54:44.697000+00:00
-2 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00 MCD12Q1.A2001001.h00v10.061.2022146032851.hdf POLYGON ((-180 -10, -180 -20, -180 -20, -172.6... 0 10 51000010 358728 2908215698 CKSUM Day None 2022-06-23 10:54:44.669000+00:00
-3 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00 MCD12Q1.A2001001.h01v08.061.2022146025203.hdf POLYGON ((-172.62252 10, -170 0, -160 0, -162.... 1 8 51001008 146979 1397661843 CKSUM Day None 2022-06-23 10:54:44.309000+00:00
-4 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00 MCD12Q1.A2001001.h01v09.061.2022146025902.hdf POLYGON ((-170 0, -172.62252 -10, -162.46826 -... 1 9 51001009 148935 2314263965 CKSUM Day None 2022-06-23 10:54:44.023000+00:00
-```
-
-
-## Exploring it visually
-
-Geopandas comes with a built in explorer to visually explore the dataset.
-
-
-```python Python
-from lonboard import viz
-
-viz(modis_data, map_kwargs={"show_tooltip": True})
-```
-
-
-
-
-
-
-
-## Create a Tilebox dataset
-
-Now you'll create a [Spatio-temporal](/datasets/types/spatiotemporal) dataset with the same schema as the given MODIS dataset.
-To do so, you'll use the [Tilebox Console](/console), navigate to `My Datasets` and click `Create Dataset`. Then select
-`Spatio-temporal Dataset` as the dataset type.
-
-
- For more information on creating a dataset, check out the [Creating a dataset](/guides/datasets/create) guide for a
- Step by step guide.
-
-
-Now, to match the given MODIS dataset, you'll specify the following fields:
-
-| Field | Type | Note |
-| --- | --- | --- |
-| `granule_name` | string | MODIS granule name |
-| `end_time` | Timestamp | Measurement end time |
-| `horizontal_tile_number` | int64 | Horizontal modis tile number (0-35) |
-| `vertical_tile_number` | int64 | Vertical modis tile number (0-17) |
-| `tile_id` | int64 | Modis Tile ID |
-| `file_size` | uint64 | File size of the product in bytes |
-| `checksum` | string | Hash checksum of the file |
-| `checksum_type` | string | Checksum algorithm (MD5 / CKSUM) |
-| `day_night_flag` | int64 | Day / Night / Both |
-| `browse_granule_id` | string | Optional granule ID for browsing |
-| `published_at` | Timestamp | The time the product was published |
-
-In the console, this will look like the following:
-
-
-
-
-
-
-## Access the dataset from Python
-
-Your newly created dataset is now available. You can access it from Python. For this, you'll need to know the dataset slug,
-which was assigned automatically based on the specified `code_name`. To find out the slug, navigate to the dataset overview
-in the console.
-
-
-
-
-
-
-You can now instantiate the dataset client and access the dataset.
-
-
-```python Python
-from tilebox.datasets import Client
-
-client = Client()
-dataset = client.dataset("tilebox.modis") # replace with your dataset slug
-```
-
-
-## Create a collection
-
-Next, you'll create a collection to insert your data into.
-
-
-```python Python
-collection = dataset.get_or_create_collection("MCD12Q1")
-```
-
-
-## Ingest the data
-
-Now, you'll finally ingest the MODIS data into the collection.
-
-
-```python Python
-datapoint_ids = collection.ingest(modis_data)
-print(f"Successfully ingested {len(datapoint_ids)} datapoints!")
-```
-
-
-
-```plaintext Output
-Successfully ingested 7245 datapoints!
-```
-
-
-## Query the newly ingested data
-
-You can now query the newly ingested data. You can query a subset of the data for a specific time range.
-
-
- Since the data is now stored directly in the Tilebox dataset, you can query and access it from anywhere.
-
-
-
-```python Python
-from shapely import Polygon
-
-area = Polygon( # area roughly covering the US
- ((-124.45, 49.19), (-120.88, 29.31), (-66.87, 24.77), (-65.34, 47.84), (-124.45, 49.19)),
-)
-
-data = collection.query(
- temporal_extent=("2015-01-01", "2020-01-01"),
- spatial_extent=area
-)
-data
-```
-
-
-
-```plaintext Output
- Size: 28kB
-Dimensions: (time: 110)
-Coordinates:
- * time (time) datetime64[ns] 880B 2015-01-01 ... 2019-01-01
-Data variables: (12/14)
- id (time)
-
-
- For more information on accessing and querying data, check out [querying data](/datasets/query/querying-data).
-
-
-## View the data in the console
-
-You can also view your data in the Console, by navigate to the dataset, selecting the collection and then clicking
-on one of the data points.
-
-
-
-
-
-
-## Next steps
-
-Congrats. You've successfully ingested data into Tilebox. You can now explore the data in the console and use it for
-further processing and analysis.
-
-
-
- Learn all about [querying your newly created dataset](https://docs.tilebox.com/datasets/query)
-
-
- Explore the different dataset types available in Tilebox
-
-
- Check out a growing number of publicly available open data datasets on Tilebox
-
-
diff --git a/guides/datasets/query-satellite-data.mdx b/guides/datasets/query-satellite-data.mdx
index f9c1eec..d417192 100644
--- a/guides/datasets/query-satellite-data.mdx
+++ b/guides/datasets/query-satellite-data.mdx
@@ -1,12 +1,12 @@
---
-title: Query satellite data by time and location
-description: Find satellite metadata in a Tilebox open data catalog by combining temporal and spatial filters.
+title: Query open satellite data
+description: Explore available Tilebox open data catalogs and query Sentinel-2 metadata by time and location.
icon: satellite
---
-Use this guide when you know the area and time range you care about and want to find matching satellite products before downloading any files.
+Use this guide when you want to find satellite products in Tilebox open data catalogs before downloading any files. You will first inspect the available open data datasets, then query Sentinel-2 metadata by time and location.
-Tilebox Datasets stores searchable metadata for open data catalogs. Querying metadata first lets you narrow a large catalog to the scenes that match your workflow, notebook, or agent task.
+Tilebox Datasets stores searchable metadata for open Earth observation catalogs. Metadata queries are the fastest way to narrow a large catalog to the scenes that match your workflow, notebook, or agent task.
## Prerequisites
@@ -17,6 +17,47 @@ Tilebox Datasets stores searchable metadata for open data catalogs. Querying met
uv add tilebox shapely
```
+## Explore available open data datasets
+
+Tilebox exposes open data catalogs through the same dataset API as your private datasets. To get a list of available open data satellite datasets, run the following snippet.
+
+```python Python
+from tilebox.datasets import Client
+
+client = Client()
+datasets = client.datasets()
+print(datasets.open_data)
+```
+
+The output groups datasets by provider. Open data datasets include Copernicus Sentinel missions, USGS Landsat products, ASF SAR products, and other public catalogs that Tilebox has indexed.
+
+```plaintext Output
+asf:
+ ers_sar: European Remote Sensing Satellite (ERS) Synthetic Aperture Radar ...
+copernicus:
+ sentinel1_sar: The Sentinel-1 mission is the European Radar Observatory ...
+ sentinel2_msi: Sentinel-2 is equipped with an optical instrument payload ...
+ sentinel3_olci: OLCI (Ocean and Land Colour Instrument) is an optical ...
+ ...
+usgs:
+ ...
+ landsat8_oli_tirs: Landsat-8 Operational Land Imager and Thermal Infrared ...
+ landsat9_oli_tirs: Landsat-9 Operational Land Imager and Thermal Infrared ...
+```
+
+You can also browse open data datasets in the [Tilebox Console](https://console.tilebox.com/datasets/open-data) when you want descriptions, provider details, the dataset schema, and available collections before writing code.
+
+## Select the Sentinel-2 catalog
+
+Access the Sentinel-2 MSI dataset by its slug. The dataset contains collections for Sentinel-2 products such as `S2A_S2MSI2A`.
+
+```python Python
+sentinel2 = client.dataset("open_data.copernicus.sentinel2_msi")
+
+for name, collection in sentinel2.collections().items():
+ print(name, collection)
+```
+
## Define the search area
Create a polygon for the area you want to inspect. This example uses a bounding box around Colorado.
@@ -26,10 +67,12 @@ from shapely import Polygon
area = Polygon(
[
+ # lon, lat
(-109.05, 37.0),
(-102.05, 37.0),
(-102.05, 41.0),
(-109.05, 41.0),
+ # repeat the first vertex to close the ring
(-109.05, 37.0),
]
)
@@ -37,13 +80,9 @@ area = Polygon(
## Query Sentinel-2 metadata
-Select the Sentinel-2 MSI open data catalog and query a collection by time and location.
+Query the Sentinel-2 Level-2A collection by time and location. This returns metadata for matching scenes; it does not download image products.
```python Python
-from tilebox.datasets import Client
-
-client = Client()
-sentinel2 = client.dataset("open_data.copernicus.sentinel2_msi")
collection = sentinel2.collection("S2A_S2MSI2A")
scenes = collection.query(
@@ -55,32 +94,27 @@ scenes = collection.query(
print(scenes[["granule_name", "processing_level", "product_type"]])
```
-The result is an `xarray.Dataset` containing scene metadata. Use it to inspect candidate scenes, filter by metadata fields, or pass selected datapoints to a storage client.
+The result is an `xarray.Dataset` containing scene metadata. Use it to inspect candidate scenes, filter by metadata fields, or pass selected datapoints to a workflow task.
-## Download matching products
+## Filter the metadata result
-Metadata queries do not download product files. Use a [storage client](/datasets/storage/clients) when you want to read or download the files referenced by a datapoint.
+Metadata results behave like regular `xarray.Dataset` objects. You can filter, sort, or select scenes before deciding what to process next.
```python Python
-from pathlib import Path
-from tilebox.storage import CopernicusStorageClient
-
-storage = CopernicusStorageClient(
- cache_directory=Path("./data"),
- s3_access_key_id="YOUR_COPERNICUS_ACCESS_KEY",
- s3_secret_access_key="YOUR_COPERNICUS_SECRET_KEY",
-)
+low_cloud = scenes.where(scenes.cloud_cover < 10, drop=True)
+latest = low_cloud.sortby("time").isel(time=-1)
-first_scene = scenes.isel(time=0)
-path = storage.download(first_scene)
-print(path)
+print(latest.granule_name.item())
+print(latest.cloud_cover.item())
```
+Metadata queries do not download product files. Use a [storage client](/datasets/storage/clients) when you want to read or download the files referenced by a datapoint.
+
## Next steps
-
- Follow a longer Sentinel-2 example with output inspection.
+
+ Download Copernicus product files with the storage client.
Configure provider-specific clients for product access.
diff --git a/guides/workflows/deploy-to-your-compute.mdx b/guides/workflows/deploy-to-your-compute.mdx
index d501a03..90fbb03 100644
--- a/guides/workflows/deploy-to-your-compute.mdx
+++ b/guides/workflows/deploy-to-your-compute.mdx
@@ -51,15 +51,44 @@ tilebox runner start --cluster workflow-dev --debug
The runner watches its cluster, downloads missing release artifacts, starts the workflow runtime, and advertises the tasks it can execute. Updating a deployment changes what the runner can execute without rebuilding the runner process.
+## Bundle the release runner in a container
+
+For cloud or Kubernetes deployments, package the release runner into a small container image. The image only needs Python, `uv`, the Tilebox command-line tool, and any system dependencies your workflow runtime needs. The workflow code itself comes from the deployed workflow release.
+
+```dockerfile Dockerfile
+FROM python:3.13-slim
+COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
+
+# Install system dependencies.
+RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates curl git git-lfs openssh-client \
+ && rm -rf /var/lib/apt/lists/* \
+ && apt-get clean
+
+RUN curl -fsSL https://cli.tilebox.com/install.sh | TILEBOX_INSTALL_DIR=/usr/local/bin TILEBOX_NO_INSTALL_COMPLETIONS=1 sh
+
+# Required at runtime: set TILEBOX_CLUSTER to a valid cluster slug and
+# TILEBOX_API_KEY to an API key that can read deployments and claim tasks.
+ENV TILEBOX_CLUSTER=""
+ENV TILEBOX_API_KEY=""
+
+CMD ["tilebox", "runner", "start"]
+```
+
+Build and publish this image with your normal container workflow. At runtime, provide `TILEBOX_CLUSTER` and `TILEBOX_API_KEY` through your deployment system rather than baking secrets into the image.
+
+In Kubernetes, run the image as a `Deployment` and store `TILEBOX_API_KEY` in a `Secret`. Set `TILEBOX_CLUSTER` through the pod environment and scale replicas to increase runner concurrency. In Google Cloud, run the same image on Cloud Run jobs, GKE, or Compute Engine depending on your workload constraints. In AWS, run it on ECS, EKS, or EC2 and inject the API key through your secret manager or task definition.
+
## Scale runner processes
-Start multiple runners for the same cluster when you want more parallelism.
+Scale the number of runner containers or virtual machine instances when you want more parallelism. In Kubernetes, increase the `Deployment` replica count. In Google Cloud or AWS, use the scaling controls of the service that runs the container, such as Cloud Run, GKE, ECS, EKS, or an auto-scaling VM group. Each runner process connects to the same cluster and claims compatible tasks independently.
+
+As an alternative for local testing or constrained environments, you can run multiple runner processes inside one container or shell session. Use `tilebox parallel` only for that case.
```bash
tilebox parallel -n 4 -- tilebox runner start --cluster workflow-dev
```
-You can also run runners through your own process manager, VM image, container platform, or scheduler. Tilebox does not require the compute process to run in Tilebox-managed infrastructure.
+Tilebox does not require the runner process to run in Tilebox-managed infrastructure. Use the process manager, scheduler, or container platform that fits your compute environment.
## Verify cluster alignment
diff --git a/guides/workflows/execute-tasks-in-parallel.mdx b/guides/workflows/execute-tasks-in-parallel.mdx
new file mode 100644
index 0000000..9abfa31
--- /dev/null
+++ b/guides/workflows/execute-tasks-in-parallel.mdx
@@ -0,0 +1,172 @@
+---
+title: Execute tasks in parallel
+description: Submit multiple workflow subtasks and process them faster by running multiple direct runners at the same time.
+icon: arrows-split-up-and-left
+---
+
+Use this guide when a workflow can split work into independent tasks. You will create a small workflow that fans out into 20 sleep subtasks, run a first job with one direct runner, and then run a second job with five direct runners in parallel.
+
+The example is intentionally simple. `time.sleep` stands in for real work such as downloading scenes, processing tiles, calling a model, or writing output files.
+
+## Prerequisites
+
+- You have a [Tilebox API key](/authentication).
+- You have installed [`uv`](https://docs.astral.sh/uv/).
+- You have installed the [Tilebox command-line tool](/agents-and-ai-tools/tilebox-cli) if you want to use `tilebox parallel`.
+
+```bash
+export TILEBOX_API_KEY="YOUR_TILEBOX_API_KEY"
+```
+
+## Create the workflow file
+
+Create a file named `parallel_workflow.py`. The script uses inline `uv` dependencies, so you can run it directly with `uv run parallel_workflow.py`.
+
+```python parallel_workflow.py
+# /// script
+# dependencies = ["cyclopts", "tilebox"]
+# ///
+
+import time
+
+from cyclopts import App
+from tilebox.workflows import Client, ExecutionContext, Runner, Task
+
+
+app = App()
+
+
+class ParallelSleepWorkflow(Task):
+ count: int
+ seconds: float
+
+ def execute(self, context: ExecutionContext) -> None:
+ context.logger.info(
+ "Submitting sleep subtasks",
+ count=self.count,
+ seconds=self.seconds,
+ )
+ context.submit_subtasks(
+ [
+ SleepTask(index=index, seconds=self.seconds)
+ for index in range(self.count)
+ ]
+ )
+
+
+class SleepTask(Task):
+ index: int
+ seconds: float
+
+ def execute(self, context: ExecutionContext) -> None:
+ context.current_task.display = f"SleepTask({self.index})"
+ context.logger.info("Starting sleep task", index=self.index)
+ time.sleep(self.seconds)
+ context.logger.info("Finished sleep task", index=self.index)
+
+
+runner = Runner(tasks=[ParallelSleepWorkflow, SleepTask])
+
+
+def submit_job(count: int, seconds: float) -> None:
+ client = Client()
+ job = client.jobs().submit(
+ "parallel-sleep-workflow",
+ ParallelSleepWorkflow(count=count, seconds=seconds),
+ )
+ print(f"Submitted job: {job.id}")
+ print(f"Open in Console: https://console.tilebox.com/workflows/jobs/{job.id}")
+
+
+def run_runner() -> None:
+ client = Client()
+ runner.connect_to(client).run_all()
+
+
+@app.default
+def main(submit: bool = False, count: int = 20, seconds: float = 5.0) -> None:
+ """Run a direct runner, or submit a new job with --submit."""
+ if submit:
+ submit_job(count=count, seconds=seconds)
+ return
+
+ run_runner()
+
+
+if __name__ == "__main__":
+ app()
+```
+
+The root task, `ParallelSleepWorkflow`, does not do the slow work itself. It submits many independent `SleepTask` subtasks. Tilebox tracks the tasks in one job and lets any eligible runner claim queued work.
+
+## Submit a job
+
+Submit one job with 20 subtasks. The script exits after submitting the job, so no work is executed yet.
+
+```bash
+uv run parallel_workflow.py --submit --count 20 --seconds 5
+```
+
+Copy the job ID from the output. You can inspect it in the Console while runners process the queue.
+
+```plaintext Output
+Submitted job: 019f2c8c-3df2-4ed0-9d8f-8a4f19c47a7c
+Open in Console: https://console.tilebox.com/workflows/jobs/
+```
+
+## Run one direct runner
+
+Start one direct runner from the same file.
+
+```bash
+uv run parallel_workflow.py
+```
+
+The runner executes one task at a time and exits when no more work is available. With a single runner, the sleep subtasks run strictly one after another, so 20 subtasks of 5 seconds each take at least 100 seconds in total.
+
+## Run five direct runners
+
+Submit another job, then start five runner processes for the same workflow file.
+
+```bash
+uv run parallel_workflow.py --submit --count 20 --seconds 5
+tilebox parallel -n 5 -- uv run parallel_workflow.py
+```
+
+This starts five direct runners. Each process registers the same task classes and asks Tilebox for work. Tilebox assigns queued tasks across the available runners, so multiple `SleepTask` instances run at the same time.
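
To get a rough sense of the speedup, you can estimate the duration of the subtask phase. The sketch below assumes every `SleepTask` takes exactly `seconds`, that each direct runner executes one task at a time, and it ignores the root task, task claiming, and runner startup, so real jobs take somewhat longer. `estimated_sleep_phase_seconds` is a hypothetical helper for illustration only; Tilebox does not compute this value.

```python
import math


def estimated_sleep_phase_seconds(count: int, seconds: float, runners: int) -> float:
    """Idealized duration of the subtask phase for equal-length tasks.

    Hypothetical helper: ignores the root task, claim latency, and runner startup.
    """
    if runners < 1:
        raise ValueError("runners must be at least 1")
    # Each "wave" runs up to `runners` tasks side by side.
    waves = math.ceil(count / runners)
    return waves * seconds


for runners in (1, 5):
    # prints "1 100.0", then "5 20.0"
    print(runners, estimated_sleep_phase_seconds(count=20, seconds=5.0, runners=runners))
```

With the defaults from this guide, five runners cut the idealized subtask phase from about 100 seconds to about 20 seconds. In this model, adding more runners than there are queued subtasks does not help further, because each subtask runs on only one runner.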
+
+
+ Takeaway: use `tilebox parallel -n 5 -- uv run parallel_workflow.py` to start five local direct runners for the same workflow file.
+
+
+## What to expect
+
+The first runner to claim `ParallelSleepWorkflow` submits the subtasks. After that, all runners can claim compatible `SleepTask` tasks from the same job.
+
+In the Console, you should see:
+
+- one root task that submits the subtask fan-out
+- many `SleepTask(index)` tasks
+- multiple tasks running at overlapping times when five runners are active
+- logs from each task attached to the same job
+
+For command-line inspection, query logs or spans for the job:
+
+```bash
+tilebox job logs --json
+tilebox job spans --json
+```
+
+## Next steps
+
+
+
+ Learn how runners claim queued tasks and how direct runners differ from release runners.
+
+
+ Learn how parent tasks submit subtasks, define dependencies, and report progress.
+
+
+ Inspect task state, logs, traces, runner context, and cluster alignment.
+
+
diff --git a/index.mdx b/index.mdx
index 1e8df1a..cb595b2 100644
--- a/index.mdx
+++ b/index.mdx
@@ -73,7 +73,7 @@ export const HomeSearch = () => {
Get API Key
-
+
Ingest data
@@ -276,7 +276,7 @@ export const HomeSearch = () => {
-
+
diff --git a/quickstart.mdx b/quickstart.mdx
index 13bfeef..f4c0de0 100644
--- a/quickstart.mdx
+++ b/quickstart.mdx
@@ -117,11 +117,11 @@ If you prefer to work locally, follow these steps to get started.
Review the following guides to learn more about the modules that make up Tilebox:
-
- Learn how to create a Timeseries dataset using the Tilebox Console.
+
+ Learn how to create a custom dataset catalog with the Python SDK.
-
- Learn how to ingest an existing CSV dataset into a Timeseries dataset collection.
+
+ Learn how to ingest GeoParquet metadata into an existing spatio-temporal catalog.
Inspect task state, logs and traces when a workflow job fails.
diff --git a/workflows/build-and-deploy/project-structure.mdx b/workflows/build-and-deploy/project-structure.mdx
index 108ba82..0e13360 100644
--- a/workflows/build-and-deploy/project-structure.mdx
+++ b/workflows/build-and-deploy/project-structure.mdx
@@ -29,11 +29,33 @@ Use a layout where task code and the runner definition are importable from the p
+## Define the Python project
+
+Use a minimal `pyproject.toml` with the dependencies your workflow needs. For this example, only the Tilebox Python package is required.
+
+```toml pyproject.toml
+[project]
+name = "my-workflow"
+version = "0.1.0"
+requires-python = ">=3.11"
+dependencies = [
+ "tilebox",
+]
+```
+
+Create the package directory and an empty `__init__.py` file so Python can import `my_workflow.runner` from the project root.
+
+```bash
+mkdir -p my_workflow
+touch my_workflow/__init__.py
+uv lock
+```
+
## Define tasks
Put task classes in a module that can be imported during release validation.
-```python Python
+```python my_workflow/tasks.py
# my_workflow/tasks.py
from tilebox.workflows import ExecutionContext, Task
@@ -56,7 +78,7 @@ Use explicit identifiers for workflow code that will be published. A stable iden
Create a module that exports a `Runner` object. This object defines the task registrations for the workflow, and release builds import it during validation.
-```python Python
+```python my_workflow/runner.py
# my_workflow/runner.py
from tilebox.workflows import Runner
from tilebox.workflows.cache import LocalFileSystemCache