test: unskip two spark related tests#5895
Merged
Merged
Conversation
c8be4a6 to
0cf35eb
Compare
…test The Spark processing image does not have sagemaker-core pre-installed. Build the local dev wheel, upload to S3, and install it in the container via pre_execution_commands, mirroring the pattern used in sagemaker-mlops feature_processor integ tests.
Pytest's assertion rewriting injects _pytest module references into the function bytecode. When cloudpickle serializes the function and the Spark container deserializes it, it fails with ModuleNotFoundError: No module named '_pytest' since pytest is not installed in the container.
Collaborator
Author
Collaborator
Author
Collaborator
Author
Collaborator
Author
mujtaba1747
approved these changes
May 29, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Unskip
test_to_pipeline_and_execute_with_lake_formationThis test was previously skipped because the CI account (
729646638167) lacked the Lake Formation environment setup required for credential vending. The test creates a Feature Group withLakeFormationConfig(enabled=True)and runs a SageMaker Pipeline whose Spark job callsGetTemporaryCredentialsForTableV2to ingest data into the offline store.Root Cause of Previous Failure
The Feature Store Spark SDK calls
lakeformation:GetTemporaryCredentialsForTableV2(the V2 credential vending API) during ingestion whenuse_lake_formation_credentials=True. This API requires:AllowFullTableExternalDataAccessmust be enabled in the account's data lake settings — this is the setting that authorizes third-party engines (like Spark running in SageMaker Training) to call the credential vending APICI Account Configuration Applied
The following Lake Formation and IAM configurations were applied to account
729646638167(us-west-2):PullRequestBuildRolelakeformation:*on*sagemaker_featurestoresagemaker_featurestores3://sagemaker-test-featurestore-us-west-2-729646638167PullRequestBuildRoleremoved (credential vending doesn't work for admins)AllowFullTableExternalDataAccesstrue(required for Spark credential vending)AllowExternalDataFilteringtrueAuthorizedSessionTagValueList["*"]PullRequestBuildRolelakeformation.amazonaws.comFix
test_decorator_with_spark_job(sagemaker-core)Problem
The Spark integ test
test_decorator_with_spark_jobwas failing with two issues:ModuleNotFoundError: No module named 'sagemaker'— The Spark processing image (sagemaker-spark-processing:3.5-cpu-py312-v1) does not havesagemaker-corepre-installed. Whensmspark-submitrunsspark_app.py, it tries tofrom sagemaker.core.remote_function import invoke_functionand fails.ModuleNotFoundError: No module named '_pytest'— Pytest's assertion rewriting injects_pytestmodule references into the function bytecode. Whencloudpickleserializes the@remotefunction containingassert, the serialized payload captures this dependency. The Spark container doesn't havepytestinstalled, so deserialization fails.Root Cause
The sagemaker-mlops Spark integ tests (
test_feature_processor_integ.py) already solve this problem with a well-established pattern:sagemaker,sagemaker-core,sagemaker-mlops)pre_execution_commandsThe sagemaker-core Spark test was missing this entirely — it only passed
spark_config=SparkConfig(...)without anypre_execution_commandsordependencies, relying on the container having the SDK pre-installed (which was true for the old v2 SDK images but not for v3/sagemaker-core).Fix
Mirroring the existing pattern from
sagemaker-mlops/tests/integ/feature_store/feature_processor/test_feature_processor_integ.py:Added
spark_pre_execution_commandsfixture inconftest.py— builds the localsagemaker-corewheel, uploads it to S3, and returns install commands. This directly mirrorsget_pre_execution_commands()/get_wheel_file_s3_uri()in the mlops test.Added
pre_execution_commands=spark_pre_execution_commandsto the@remotedecorator call in the test, ensuring the Spark container has the correct dev version ofsagemaker-coreinstalled — same as how mlops passespre_execution_commands=pre_execution_commandsto its@remotecalls.Replaced
assertwithif/raise RuntimeErrorinside the remote function to avoid pytest's assertion rewriting polluting the serialized function with_pytestmodule references. (The mlops tests avoid this issue naturally because their remote functions don't useassert.)Added Spark 3.5 fallback for py312 in
job.py— when the default Spark version doesn't support py312, fall back to Spark 3.5 which has a py312 image available.