test: unskip two spark related tests#5895

Merged
lucasjia-aws merged 5 commits into aws:master from lucasjia-aws:fix/spark_related_tests on May 29, 2026
@lucasjia-aws lucasjia-aws commented May 27, 2026

Unskip test_to_pipeline_and_execute_with_lake_formation

This test was previously skipped because the CI account (729646638167) lacked the Lake Formation environment setup required for credential vending. The test creates a Feature Group with LakeFormationConfig(enabled=True) and runs a SageMaker Pipeline whose Spark job calls GetTemporaryCredentialsForTableV2 to ingest data into the offline store.

Root Cause of Previous Failure

The Feature Store Spark SDK calls lakeformation:GetTemporaryCredentialsForTableV2 (the V2 credential vending API) during ingestion when use_lake_formation_credentials=True. The call succeeds only when all of the following hold:

  1. The caller must not be a Data Lake Admin
  2. The caller must have Lake Formation table-level permissions (SELECT/INSERT)
  3. The S3 data location must be registered with Lake Formation
  4. AllowFullTableExternalDataAccess must be enabled in the account's data lake settings — this is the setting that authorizes third-party engines (like Spark running in SageMaker Training) to call the credential vending API

CI Account Configuration Applied

The following Lake Formation and IAM configurations were applied to account 729646638167 (us-west-2):

| Configuration | Detail |
|---|---|
| IAM inline policy on PullRequestBuildRole | lakeformation:* on * |
| LF Database grant | ALL on sagemaker_featurestore |
| LF Table wildcard grant | ALL on all tables in sagemaker_featurestore |
| LF Data Location Access | On s3://sagemaker-test-featurestore-us-west-2-729646638167 |
| Data Lake Admin | PullRequestBuildRole removed (credential vending doesn't work for admins) |
| AllowFullTableExternalDataAccess | true (required for Spark credential vending) |
| AllowExternalDataFiltering | true |
| AuthorizedSessionTagValueList | ["*"] |
| Trust policy on PullRequestBuildRole | Already includes lakeformation.amazonaws.com |
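
For anyone reproducing this setup in another account, the data lake settings rows above could be applied roughly as follows. This is a sketch, not the procedure that was actually run against the CI account: the function names are hypothetical, the role ARN must be supplied by the caller, and only the boto3 Lake Formation calls get_data_lake_settings and put_data_lake_settings are assumed. The IAM policy, LF grants, and data location registration are not covered here.

```python
import copy


def vending_ready_settings(current: dict, build_role_arn: str) -> dict:
    """Return a copy of a DataLakeSettings dict adjusted for credential vending.

    Hypothetical helper: `current` is expected to have the shape of the
    "DataLakeSettings" value returned by get_data_lake_settings.
    """
    settings = copy.deepcopy(current)
    # Credential vending does not work for Data Lake Admins, so drop the role.
    settings["DataLakeAdmins"] = [
        admin
        for admin in settings.get("DataLakeAdmins", [])
        if admin.get("DataLakePrincipalIdentifier") != build_role_arn
    ]
    # Settings listed in the table above.
    settings["AllowFullTableExternalDataAccess"] = True
    settings["AllowExternalDataFiltering"] = True
    settings["AuthorizedSessionTagValueList"] = ["*"]
    return settings


def apply_settings(build_role_arn: str, region: str = "us-west-2") -> None:
    """Read, adjust, and write back the account's data lake settings."""
    import boto3  # imported lazily so the pure helper above has no dependency

    client = boto3.client("lakeformation", region_name=region)
    current = client.get_data_lake_settings()["DataLakeSettings"]
    client.put_data_lake_settings(
        DataLakeSettings=vending_ready_settings(current, build_role_arn)
    )
```

Keeping the transformation separate from the API call makes it possible to review the resulting settings before writing them back.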

Fix test_decorator_with_spark_job (sagemaker-core)

Problem

The Spark integ test test_decorator_with_spark_job was failing with two issues:

  1. ModuleNotFoundError: No module named 'sagemaker': The Spark processing image (sagemaker-spark-processing:3.5-cpu-py312-v1) does not have sagemaker-core pre-installed. When smspark-submit runs spark_app.py, the statement from sagemaker.core.remote_function import invoke_function fails.

  2. ModuleNotFoundError: No module named '_pytest' — Pytest's assertion rewriting injects _pytest module references into the function bytecode. When cloudpickle serializes the @remote function containing assert, the serialized payload captures this dependency. The Spark container doesn't have pytest installed, so deserialization fails.
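
One way to confirm the second failure locally is to inspect which global names a function's bytecode refers to and whether any of them resolve to a module from the _pytest package. The helper below is a hypothetical diagnostic, not part of this PR. It assumes that pytest's rewriting exposes its helper module as a module-level global, and that cloudpickle captures the globals a function references when it serializes the function by value.

```python
import types


def pytest_module_refs(func):
    """List names used by func's code that resolve to _pytest modules.

    Only the function's own code object is inspected; nested functions
    and lambdas are not walked in this sketch.
    """
    refs = []
    for name in func.__code__.co_names:
        value = func.__globals__.get(name)
        if (
            isinstance(value, types.ModuleType)
            and value.__name__.split(".")[0] == "_pytest"
        ):
            refs.append(name)
    return refs
```

A non-empty result for a function that is about to be sent to a container without pytest suggests that deserialization there will try to import _pytest.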

Root Cause

The sagemaker-mlops Spark integ tests (test_feature_processor_integ.py) already solve this problem with a well-established pattern:

  1. Build local SDK wheels (sagemaker, sagemaker-core, sagemaker-mlops)
  2. Upload them to S3
  3. Install them in the Spark container via pre_execution_commands

The sagemaker-core Spark test was missing this entirely — it only passed spark_config=SparkConfig(...) without any pre_execution_commands or dependencies, relying on the container having the SDK pre-installed (which was true for the old v2 SDK images but not for v3/sagemaker-core).
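
The third step of that pattern amounts to turning wheel locations in S3 into shell commands that run before the job starts. The sketch below illustrates the idea only; it is not the code of get_pre_execution_commands() in the mlops tests. The function name and the /tmp/sdk_wheels directory are hypothetical, and it assumes the container provides the aws CLI and pip3.

```python
import posixpath
import shlex


def wheel_install_commands(wheel_s3_uris, workdir="/tmp/sdk_wheels"):
    """Build shell commands that download and install wheels from S3."""
    commands = [f"mkdir -p {shlex.quote(workdir)}"]
    for uri in wheel_s3_uris:
        # Keep the wheel's file name so pip can parse its version tags.
        local_path = posixpath.join(workdir, posixpath.basename(uri))
        commands.append(f"aws s3 cp {shlex.quote(uri)} {shlex.quote(local_path)}")
        commands.append(f"pip3 install --force-reinstall {shlex.quote(local_path)}")
    return commands
```

The returned list is the kind of value that would be passed as pre_execution_commands to the @remote decorator.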

Fix

Mirroring the existing pattern from sagemaker-mlops/tests/integ/feature_store/feature_processor/test_feature_processor_integ.py:

  1. Added spark_pre_execution_commands fixture in conftest.py — builds the local sagemaker-core wheel, uploads it to S3, and returns install commands. This directly mirrors get_pre_execution_commands() / get_wheel_file_s3_uri() in the mlops test.

  2. Added pre_execution_commands=spark_pre_execution_commands to the @remote decorator call in the test, ensuring the Spark container has the correct dev version of sagemaker-core installed — same as how mlops passes pre_execution_commands=pre_execution_commands to its @remote calls.

  3. Replaced assert with if/raise RuntimeError inside the remote function to avoid pytest's assertion rewriting polluting the serialized function with _pytest module references. (The mlops tests avoid this issue naturally because their remote functions don't use assert.)

  4. Added Spark 3.5 fallback for py312 in job.py — when the default Spark version doesn't support py312, fall back to Spark 3.5 which has a py312 image available.
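
The change in item 3 can be illustrated with a small check written the way the remote function now validates its result. The function name and the row-count scenario are hypothetical and are not taken from the test in this PR.

```python
def check_row_count(actual: int, expected: int) -> int:
    """Validate a result without `assert`.

    An explicit if/raise compiles to plain bytecode, so the function stays
    free of the helpers that pytest's assertion rewriting would add.
    """
    if actual != expected:
        raise RuntimeError(f"expected {expected} rows, got {actual}")
    return actual
```

Unlike assert, an explicit raise is also not stripped when Python runs with the -O flag, so the check behaves the same in the container regardless of interpreter options.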

@lucasjia-aws lucasjia-aws changed the title unskip two spark related tests test: unskip two spark related tests May 27, 2026
…test

The Spark processing image does not have sagemaker-core pre-installed.
Build the local dev wheel, upload to S3, and install it in the container
via pre_execution_commands, mirroring the pattern used in sagemaker-mlops
feature_processor integ tests.
Pytest's assertion rewriting injects _pytest module references into
the function bytecode. When cloudpickle serializes the function and
the Spark container deserializes it, it fails with
ModuleNotFoundError: No module named '_pytest' since pytest is not
installed in the container.
@lucasjia-aws lucasjia-aws merged commit 65d6c04 into aws:master May 29, 2026
15 of 25 checks passed