Skip to content

feat(aws): add --vpc-id flag to deploy into existing VPCs#850

Open
adrianriobo wants to merge 2 commits into
redhat-developer:mainfrom
adrianriobo:fix-849
Open

feat(aws): add --vpc-id flag to deploy into existing VPCs#850
adrianriobo wants to merge 2 commits into
redhat-developer:mainfrom
adrianriobo:fix-849

Conversation

@adrianriobo

Copy link
Copy Markdown
Collaborator

Summary

  • Adds a --vpc-id flag to all standard AWS host commands (rhel, windows, fedora, rhel-ai) so mapt can reuse an existing VPC instead of creating a new one
  • Spot AZ search is restricted to AZs that have subnets in the given VPC
  • On-demand AZ selection is restricted to the VPC's subnet AZs
  • Airgap is rejected when --vpc-id is set (mutually exclusive)
  • existingVPCNetwork() resolves a public subnet in the chosen AZ and reads the VPC/subnet as Pulumi read-only resources (ec2.GetVpc / ec2.GetSubnet), so no new networking infrastructure is created

Closes #849

Test plan

  • Deploy a RHEL host with --vpc-id <existing-vpc> and verify it lands in a subnet of that VPC
  • Deploy a Windows host with --vpc-id and confirm no new VPC is created
  • Confirm spot instance selection is limited to AZs with subnets in the given VPC
  • Confirm --airgap + --vpc-id is rejected with a clear error
  • Deploy without --vpc-id and confirm existing behaviour is unchanged

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 262f4259-357f-416c-83b9-f012194a77ce

📥 Commits

Reviewing files that changed from the base of the PR and between 85f501a and 8f2067b.

📒 Files selected for processing (9)
  • pkg/provider/aws/action/fedora/fedora.go
  • pkg/provider/aws/action/rhel-ai/rhelai.go
  • pkg/provider/aws/action/rhel/rhel.go
  • pkg/provider/aws/action/windows/windows.go
  • pkg/provider/aws/data/azs.go
  • pkg/provider/aws/data/network.go
  • pkg/provider/aws/data/spot.go
  • pkg/provider/aws/modules/ec2/compute/compute.go
  • pkg/provider/aws/modules/network/network.go
🚧 Files skipped from review as they are similar to previous changes (9)
  • pkg/provider/aws/data/spot.go
  • pkg/provider/aws/action/rhel-ai/rhelai.go
  • pkg/provider/aws/action/windows/windows.go
  • pkg/provider/aws/data/network.go
  • pkg/provider/aws/data/azs.go
  • pkg/provider/aws/modules/ec2/compute/compute.go
  • pkg/provider/aws/action/fedora/fedora.go
  • pkg/provider/aws/modules/network/network.go
  • pkg/provider/aws/action/rhel/rhel.go

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • Added support for deploying AWS hosts into an existing VPC using vpc-id (Tekton tasks included).
    • Provisioning workflows for Fedora, RHEL, RHEL AI, and Windows now accept vpc-id.
    • Spot placement and networking can be constrained to availability zones available in the chosen VPC.
  • Bug Fixes
    • Added validation to prevent combining vpc-id with airgap.
    • Private-subnet deployments now avoid public ingress/ready checks; public EIP/public endpoints and SSH ingress are applied only when networking is public.

Walkthrough

Adds an optional VpcID input for AWS host provisioning, threads it through CLI, task, action, allocation, network, and compute layers, and adds VPC-scoped subnet and availability-zone selection for spot and on-demand placement.

Changes

Fixed VPC Support

Layer / File(s) Summary
VpcID flag and task parameters
cmd/mapt/cmd/params/params.go, tkn/infra-aws-*.yaml, tkn/template/infra-aws-*.yaml
Adds the vpc-id input, registers it for CLI use, and conditionally appends --vpc-id in the Tekton provisioning commands.
Host create command wiring
cmd/mapt/cmd/aws/hosts/fedora.go, cmd/mapt/cmd/aws/hosts/rhel.go, cmd/mapt/cmd/aws/hosts/rhelai.go, cmd/mapt/cmd/aws/hosts/windows.go
Passes params.NetworkVpcID() into the host create argument structs.
Host action VpcID propagation
pkg/provider/aws/action/fedora/fedora.go, pkg/provider/aws/action/rhel/rhel.go, pkg/provider/aws/action/windows/windows.go, pkg/provider/aws/action/rhel-ai/rhelai.go, pkg/target/host/rhelai/api.go
Adds VpcID to action args and request structs, validates VpcID with Airgap, and forwards VpcID into allocation and network calls.
AZ-restricted allocation and spot selection
pkg/provider/aws/modules/allocation/allocation.go, pkg/provider/aws/modules/spot/stack.go, pkg/provider/aws/data/spot.go, pkg/provider/aws/data/azs.go
Derives allowed AZs from VPC subnets, filters on-demand AZ choice, threads AllowedAZs through spot selection, and updates AZ resolution for all availability zones.
Existing-VPC network path
pkg/provider/aws/modules/network/network.go, pkg/provider/aws/data/network.go
Adds existing-VPC network creation, subnet lookup helpers, and VPC/AZ subnet filtering for public subnet selection.
Compute wiring for public/private deployments
pkg/provider/aws/modules/ec2/compute/compute.go
Adjusts dependency wiring, public IP/EIP handling, target group creation, and host DNS selection based on network and EIP state.

Estimated code review effort: 4 (Complex) | ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 31.03% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly states the main change: adding a --vpc-id flag to deploy into existing VPCs.
Description check ✅ Passed The description matches the PR and covers the VPC, AZ filtering, and airgap restrictions.
Linked Issues check ✅ Passed The changes satisfy #849 by supporting fixed VPCs, restricting spot AZs, and rejecting airgap with --vpc-id.
Out of Scope Changes check ✅ Passed The additional networking and compute adjustments still support existing-VPC deployment and do not appear unrelated.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/provider/aws/action/fedora/fedora.go (1)

103-110: 🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

Restrict existing-VPC AZs to public-subnet AZs. GetSubnetAZsForVPC returns every AZ with any subnet, but existingVPCNetwork later requires a public subnet in the chosen AZ. In a VPC that has only private subnets in one AZ, allocation can pick that AZ and the deploy fails. Filter candidate AZs earlier or reject the VPC setup up front.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/provider/aws/action/fedora/fedora.go` around lines 103 - 110, Restrict AZ
selection for existing VPCs in the AWS fedora provider flow:
`GetSubnetAZsForVPC` is currently allowing any AZ with a subnet, but
`existingVPCNetwork` later needs a public subnet in the chosen AZ. Update the
allocation path around `allocation.Allocation` / `allocation.AllocationArgs` to
filter candidate AZs to only those with public subnets, or fail fast when the
VPC lacks a valid public-subnet AZ, so we never allocate an AZ that
`existingVPCNetwork` cannot deploy into.
pkg/provider/aws/data/network.go (1)

126-176: 🩺 Stability & Availability | 🔴 Critical | ⚡ Quick win

isPublic dereferences route.GatewayId before the nil check — crash risk now reachable via the new GetPublicSubnetIDInAZ.

gwID := *route.GatewayId
if route.GatewayId != nil && len(gwID) > 0 && gwID[:2] == "igw" {

*route.GatewayId is dereferenced unconditionally before checking route.GatewayId != nil. Route tables commonly contain routes without a GatewayId (NAT Gateway, VPC peering, Transit Gateway, prefix-list routes) — for those, this panics with a nil-pointer dereference. This helper is pre-existing but is now exercised on every VPC/AZ lookup performed by the new GetPublicSubnetIDInAZ (lines 126-148), used for every existing-VPC deployment, so any route table containing such a route will crash the deployment instead of returning a clean error.

🐛 Proposed fix
 	for _, routeTable := range routeTablesOutput.RouteTables {
 		for _, route := range routeTable.Routes {
-			gwID := *route.GatewayId
-			if route.GatewayId != nil && len(gwID) > 0 && gwID[:2] == "igw" {
+			if route.GatewayId != nil && strings.HasPrefix(*route.GatewayId, "igw") {
 				return nil
 			}
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/provider/aws/data/network.go` around lines 126 - 176, The isPublic helper
currently dereferences route.GatewayId before checking for nil, which can panic
on route tables that contain routes without a gateway ID. Update isPublic to
guard route.GatewayId first, then safely inspect the value before checking for
an internet gateway prefix. Keep the fix localized to isPublic, which is called
from GetPublicSubnetIDInAZ for subnet classification.
🧹 Nitpick comments (2)
cmd/mapt/cmd/params/params.go (1)

210-210: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Description omits on-demand AZ restriction.

VpcIDDesc only mentions spot search being restricted to VPC subnet AZs, but per the PR objectives on-demand allocation is also restricted to the VPC's subnet AZs. Worth updating the help text so --help output is accurate.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/mapt/cmd/params/params.go` at line 210, Update the VpcIDDesc help text to
mention that both spot search and on-demand allocation are restricted to AZs
with subnets in the specified VPC. Locate the VpcIDDesc constant in params.go
and revise the description so the --help output accurately reflects the VPC AZ
constraints while keeping the existing airgap note.
pkg/provider/aws/data/spot.go (1)

200-211: 🚀 Performance & Scalability | 🔵 Trivial | ⚖️ Poor tradeoff

Placement scores are still fetched for all opt-in regions before filtering by AllowedAZs.

When AllowedAZs is set (fixed VPC, single known region), getPlacementScores (line 188) still runs across nOpInRegions (all non-opt-in regions) before this filter discards everything outside the VPC's region. The adjacent comment already flags placement-score API quota as a concern ("to avoid exceeding the account quota"); restricting apiRegions up front to the VPC's region when AllowedAZs is known would reduce unnecessary API calls.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/provider/aws/data/spot.go` around lines 200 - 211, The placement-score
fetch in getPlacementScores is still querying all opt-in regions even when
AllowedAZs already pins the deployment to a known region. Update the apiRegions
selection logic in spot.go so that, when args.AllowedAZs is set, it limits the
regions passed into placement score collection to the VPC’s region before the
API call rather than filtering afterward. Keep the existing AllowedAZs filtering
as a safety net, but make the upfront region restriction part of the
getPlacementScores flow to avoid unnecessary quota usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/mapt/cmd/params/params.go`:
- Around line 234-240: `NetworkVpcID()` currently treats an empty `--vpc-id` as
present because it relies on `viper.IsSet`, which can return true for a
changed-but-empty flag. Update the `NetworkVpcID` helper in `params.go` to
return nil when the resolved VPC ID is empty (or whitespace) so
`NetworkArgs.VpcID` does not route into `existingVPCNetwork` with an invalid ID.
Use the `VpcID` flag lookup in `NetworkVpcID` as the single place to normalize
this behavior before downstream calls like `ec2.GetVpc`.

---

Outside diff comments:
In `@pkg/provider/aws/action/fedora/fedora.go`:
- Around line 103-110: Restrict AZ selection for existing VPCs in the AWS fedora
provider flow: `GetSubnetAZsForVPC` is currently allowing any AZ with a subnet,
but `existingVPCNetwork` later needs a public subnet in the chosen AZ. Update
the allocation path around `allocation.Allocation` / `allocation.AllocationArgs`
to filter candidate AZs to only those with public subnets, or fail fast when the
VPC lacks a valid public-subnet AZ, so we never allocate an AZ that
`existingVPCNetwork` cannot deploy into.

In `@pkg/provider/aws/data/network.go`:
- Around line 126-176: The isPublic helper currently dereferences
route.GatewayId before checking for nil, which can panic on route tables that
contain routes without a gateway ID. Update isPublic to guard route.GatewayId
first, then safely inspect the value before checking for an internet gateway
prefix. Keep the fix localized to isPublic, which is called from
GetPublicSubnetIDInAZ for subnet classification.

---

Nitpick comments:
In `@cmd/mapt/cmd/params/params.go`:
- Line 210: Update the VpcIDDesc help text to mention that both spot search and
on-demand allocation are restricted to AZs with subnets in the specified VPC.
Locate the VpcIDDesc constant in params.go and revise the description so the
--help output accurately reflects the VPC AZ constraints while keeping the
existing airgap note.

In `@pkg/provider/aws/data/spot.go`:
- Around line 200-211: The placement-score fetch in getPlacementScores is still
querying all opt-in regions even when AllowedAZs already pins the deployment to
a known region. Update the apiRegions selection logic in spot.go so that, when
args.AllowedAZs is set, it limits the regions passed into placement score
collection to the VPC’s region before the API call rather than filtering
afterward. Keep the existing AllowedAZs filtering as a safety net, but make the
upfront region restriction part of the getPlacementScores flow to avoid
unnecessary quota usage.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: dc6d32e0-eed8-4e14-a28d-345ef0cfe68e

📥 Commits

Reviewing files that changed from the base of the PR and between e504607 and b3a0f3d.

📒 Files selected for processing (15)
  • cmd/mapt/cmd/aws/hosts/fedora.go
  • cmd/mapt/cmd/aws/hosts/rhel.go
  • cmd/mapt/cmd/aws/hosts/rhelai.go
  • cmd/mapt/cmd/aws/hosts/windows.go
  • cmd/mapt/cmd/params/params.go
  • pkg/provider/aws/action/fedora/fedora.go
  • pkg/provider/aws/action/rhel-ai/rhelai.go
  • pkg/provider/aws/action/rhel/rhel.go
  • pkg/provider/aws/action/windows/windows.go
  • pkg/provider/aws/data/network.go
  • pkg/provider/aws/data/spot.go
  • pkg/provider/aws/modules/allocation/allocation.go
  • pkg/provider/aws/modules/network/network.go
  • pkg/provider/aws/modules/spot/stack.go
  • pkg/target/host/rhelai/api.go

Comment on lines +234 to +240
func NetworkVpcID() *string {
if viper.IsSet(VpcID) {
v := viper.GetString(VpcID)
return &v
}
return nil
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Empty-string --vpc-id treated as “set”.

viper.IsSet returns true once the flag is Changed, even if the value is "" (e.g. --vpc-id="$VPC_ID" with an unset env var in a CI script). NetworkVpcID() then returns a pointer to "" instead of nil, and downstream (NetworkArgs.VpcID != nil) will route into existingVPCNetwork and call ec2.GetVpc with an empty ID, failing deep in the Pulumi provider instead of failing fast with a clear message.

🔧 Proposed fix
 func NetworkVpcID() *string {
-	if viper.IsSet(VpcID) {
-		v := viper.GetString(VpcID)
-		return &v
-	}
-	return nil
+	if v := viper.GetString(VpcID); v != "" {
+		return &v
+	}
+	return nil
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func NetworkVpcID() *string {
if viper.IsSet(VpcID) {
v := viper.GetString(VpcID)
return &v
}
return nil
}
func NetworkVpcID() *string {
if v := viper.GetString(VpcID); v != "" {
return &v
}
return nil
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/mapt/cmd/params/params.go` around lines 234 - 240, `NetworkVpcID()`
currently treats an empty `--vpc-id` as present because it relies on
`viper.IsSet`, which can return true for a changed-but-empty flag. Update the
`NetworkVpcID` helper in `params.go` to return nil when the resolved VPC ID is
empty (or whitespace) so `NetworkArgs.VpcID` does not route into
`existingVPCNetwork` with an invalid ID. Use the `VpcID` flag lookup in
`NetworkVpcID` as the single place to normalize this behavior before downstream
calls like `ec2.GetVpc`.

@adrianriobo adrianriobo force-pushed the fix-849 branch 2 times, most recently from b49b745 to 3bcc437 Compare July 1, 2026 11:27
When --vpc-id is set, mapt reuses an existing VPC instead of creating
a new one. The flag is available on all standard AWS hosts (rhel,
windows, fedora, rhel-ai) and exposed as a param in all Tekton tasks.

Key behaviours:
- Spot AZ search is restricted to AZs that have subnets in the given VPC
- On-demand AZ selection is restricted to the VPC's subnet AZs
- Airgap is rejected when --vpc-id is set (mutually exclusive)
- existingVPCNetwork() resolves a public subnet in the chosen AZ and
  reads the VPC/subnet as pulumi read-only resources (ec2.GetVpc /
  ec2.GetSubnet), so no new networking infrastructure is created

Also fix spot AZ resolution: GetSpotPlacementScores can return AZ IDs
for zones not visible to the account via the default DescribeAvailabilityZones
call. describeAvailabilityZonesByRegions now uses AllAvailabilityZones: true
so all AZ IDs can be resolved to names during spot search, preventing
spurious "skipping AZ: az id not found" drops that led to no spot
option being found in accounts with SCP-restricted regions.

Closes redhat-developer#849

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/provider/aws/action/fedora/fedora.go (1)

86-88: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Duplicate validation across host actions.

Per the PR stack, this identical VpcID != nil && Airgap mutual-exclusion check is repeated verbatim across Fedora, RHEL, Windows, and RHEL AI actions. Consider extracting a single shared validator (e.g., in a common package) to avoid drift if the rule changes later.

♻️ Example shared helper
func ValidateVpcAirgapExclusivity(vpcID *string, airgap bool) error {
	if vpcID != nil && airgap {
		return fmt.Errorf("--vpc-id and --airgap are mutually exclusive")
	}
	return nil
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/provider/aws/action/fedora/fedora.go` around lines 86 - 88, The
VpcID/Airgap mutual-exclusion validation is duplicated across multiple host
action paths, so move the check in the Fedora action out of the local validation
block into a shared helper. Add a common validator (for example, a package-level
function like ValidateVpcAirgapExclusivity) and call it from the Fedora action’s
validation flow so the same rule is enforced consistently with RHEL, Windows,
and RHEL AI without repeated inline logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/provider/aws/data/network.go`:
- Around line 169-173: The gateway-ID prefix check in isPublic is broken because
it compares a 2-character slice of gwID against the 3-character literal "igw",
so internet-gateway routes are never recognized. Update the route detection
logic in the AWS network helpers (especially isPublic, which is used by
getPublicSubnets, GetRandomPublicSubnet, and GetPublicSubnetIDInAZ) to validate
the full prefix safely and with a bounds check that matches the prefix length,
so explicit route-table associations are detected correctly.
- Around line 126-148: GetPublicSubnetIDInAZ is missing the same main route
table fallback used by getPublicSubnets, so it can incorrectly report no public
subnet when subnets rely on the VPC main route table. Update
GetPublicSubnetIDInAZ to mirror the existing public-subnet detection flow by
checking isPublic for each subnet and treating ErrNoRouteTableBySubnetID as a
public-subnet case when appropriate, using the existing
isPublic/getPublicSubnets behavior as the reference point.

---

Nitpick comments:
In `@pkg/provider/aws/action/fedora/fedora.go`:
- Around line 86-88: The VpcID/Airgap mutual-exclusion validation is duplicated
across multiple host action paths, so move the check in the Fedora action out of
the local validation block into a shared helper. Add a common validator (for
example, a package-level function like ValidateVpcAirgapExclusivity) and call it
from the Fedora action’s validation flow so the same rule is enforced
consistently with RHEL, Windows, and RHEL AI without repeated inline logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 655c91f5-9c0b-489a-8cea-e9a2cd46cb1f

📥 Commits

Reviewing files that changed from the base of the PR and between b49b745 and 92d011a.

📒 Files selected for processing (24)
  • cmd/mapt/cmd/aws/hosts/fedora.go
  • cmd/mapt/cmd/aws/hosts/rhel.go
  • cmd/mapt/cmd/aws/hosts/rhelai.go
  • cmd/mapt/cmd/aws/hosts/windows.go
  • cmd/mapt/cmd/params/params.go
  • pkg/provider/aws/action/fedora/fedora.go
  • pkg/provider/aws/action/rhel-ai/rhelai.go
  • pkg/provider/aws/action/rhel/rhel.go
  • pkg/provider/aws/action/windows/windows.go
  • pkg/provider/aws/data/azs.go
  • pkg/provider/aws/data/network.go
  • pkg/provider/aws/data/spot.go
  • pkg/provider/aws/modules/allocation/allocation.go
  • pkg/provider/aws/modules/network/network.go
  • pkg/provider/aws/modules/spot/stack.go
  • pkg/target/host/rhelai/api.go
  • tkn/infra-aws-fedora.yaml
  • tkn/infra-aws-rhel-ai.yaml
  • tkn/infra-aws-rhel.yaml
  • tkn/infra-aws-windows-server.yaml
  • tkn/template/infra-aws-fedora.yaml
  • tkn/template/infra-aws-rhel-ai.yaml
  • tkn/template/infra-aws-rhel.yaml
  • tkn/template/infra-aws-windows-server.yaml
🚧 Files skipped from review as they are similar to previous changes (22)
  • tkn/template/infra-aws-fedora.yaml
  • tkn/infra-aws-windows-server.yaml
  • pkg/target/host/rhelai/api.go
  • cmd/mapt/cmd/aws/hosts/fedora.go
  • tkn/template/infra-aws-rhel.yaml
  • tkn/infra-aws-rhel.yaml
  • cmd/mapt/cmd/aws/hosts/rhelai.go
  • cmd/mapt/cmd/aws/hosts/windows.go
  • pkg/provider/aws/action/rhel-ai/rhelai.go
  • tkn/infra-aws-fedora.yaml
  • cmd/mapt/cmd/aws/hosts/rhel.go
  • tkn/template/infra-aws-windows-server.yaml
  • pkg/provider/aws/modules/spot/stack.go
  • pkg/provider/aws/action/windows/windows.go
  • tkn/infra-aws-rhel-ai.yaml
  • cmd/mapt/cmd/params/params.go
  • pkg/provider/aws/data/azs.go
  • tkn/template/infra-aws-rhel-ai.yaml
  • pkg/provider/aws/data/spot.go
  • pkg/provider/aws/modules/allocation/allocation.go
  • pkg/provider/aws/modules/network/network.go
  • pkg/provider/aws/action/rhel/rhel.go

Comment on lines +126 to +148
// GetPublicSubnetIDInAZ returns a public subnet ID in the given AZ within the specified VPC.
func GetPublicSubnetIDInAZ(ctx context.Context, region, vpcID, az string) (*string, error) {
cfg, err := getConfig(ctx, region)
if err != nil {
return nil, err
}
client := ec2.NewFromConfig(cfg)
subnetsOutput, err := client.DescribeSubnets(ctx, &ec2.DescribeSubnetsInput{
Filters: []ec2types.Filter{
{Name: aws.String(filterVPCID), Values: []string{vpcID}},
{Name: aws.String(filterAvailabilityZone), Values: []string{az}},
},
})
if err != nil {
return nil, fmt.Errorf("failed to describe subnets in VPC %s AZ %s: %w", vpcID, az, err)
}
for _, s := range subnetsOutput.Subnets {
if err := isPublic(ctx, client, *s.SubnetId); err == nil {
return s.SubnetId, nil
}
}
return nil, fmt.Errorf("no public subnet found in VPC %s AZ %s", vpcID, az)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Missing "main route table" fallback causes false negatives.

getPublicSubnets (lines 72-90) treats subnets as public when none of them have an explicit route table association (ErrNoRouteTableBySubnetID for all, implying the VPC's main route table applies). GetPublicSubnetIDInAZ doesn't replicate this fallback — for VPCs relying on the main route table (a common setup), this will always return "no public subnet found" even though a valid public subnet exists.

🐛 Proposed fix mirroring existing fallback
 	client := ec2.NewFromConfig(cfg)
 	subnetsOutput, err := client.DescribeSubnets(ctx, &ec2.DescribeSubnetsInput{
 		Filters: []ec2types.Filter{
 			{Name: aws.String(filterVPCID), Values: []string{vpcID}},
 			{Name: aws.String(filterAvailabilityZone), Values: []string{az}},
 		},
 	})
 	if err != nil {
 		return nil, fmt.Errorf("failed to describe subnets in VPC %s AZ %s: %w", vpcID, az, err)
 	}
-	for _, s := range subnetsOutput.Subnets {
-		if err := isPublic(ctx, client, *s.SubnetId); err == nil {
-			return s.SubnetId, nil
-		}
-	}
-	return nil, fmt.Errorf("no public subnet found in VPC %s AZ %s", vpcID, az)
+	noRouteTableCounter := 0
+	for _, s := range subnetsOutput.Subnets {
+		err := isPublic(ctx, client, *s.SubnetId)
+		if err == nil {
+			return s.SubnetId, nil
+		} else if err == ErrNoRouteTableBySubnetID {
+			noRouteTableCounter++
+		}
+	}
+	// If none of the subnets have an explicit route table association, the
+	// VPC's main route table applies, so treat the first subnet as public.
+	if len(subnetsOutput.Subnets) > 0 && noRouteTableCounter == len(subnetsOutput.Subnets) {
+		return subnetsOutput.Subnets[0].SubnetId, nil
+	}
+	return nil, fmt.Errorf("no public subnet found in VPC %s AZ %s", vpcID, az)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// GetPublicSubnetIDInAZ returns a public subnet ID in the given AZ within the specified VPC.
func GetPublicSubnetIDInAZ(ctx context.Context, region, vpcID, az string) (*string, error) {
cfg, err := getConfig(ctx, region)
if err != nil {
return nil, err
}
client := ec2.NewFromConfig(cfg)
subnetsOutput, err := client.DescribeSubnets(ctx, &ec2.DescribeSubnetsInput{
Filters: []ec2types.Filter{
{Name: aws.String(filterVPCID), Values: []string{vpcID}},
{Name: aws.String(filterAvailabilityZone), Values: []string{az}},
},
})
if err != nil {
return nil, fmt.Errorf("failed to describe subnets in VPC %s AZ %s: %w", vpcID, az, err)
}
for _, s := range subnetsOutput.Subnets {
if err := isPublic(ctx, client, *s.SubnetId); err == nil {
return s.SubnetId, nil
}
}
return nil, fmt.Errorf("no public subnet found in VPC %s AZ %s", vpcID, az)
}
// GetPublicSubnetIDInAZ returns a public subnet ID in the given AZ within the specified VPC.
func GetPublicSubnetIDInAZ(ctx context.Context, region, vpcID, az string) (*string, error) {
cfg, err := getConfig(ctx, region)
if err != nil {
return nil, err
}
client := ec2.NewFromConfig(cfg)
subnetsOutput, err := client.DescribeSubnets(ctx, &ec2.DescribeSubnetsInput{
Filters: []ec2types.Filter{
{Name: aws.String(filterVPCID), Values: []string{vpcID}},
{Name: aws.String(filterAvailabilityZone), Values: []string{az}},
},
})
if err != nil {
return nil, fmt.Errorf("failed to describe subnets in VPC %s AZ %s: %w", vpcID, az, err)
}
noRouteTableCounter := 0
for _, s := range subnetsOutput.Subnets {
err := isPublic(ctx, client, *s.SubnetId)
if err == nil {
return s.SubnetId, nil
} else if err == ErrNoRouteTableBySubnetID {
noRouteTableCounter++
}
}
// If none of the subnets have an explicit route table association, the
// VPC's main route table applies, so treat the first subnet as public.
if len(subnetsOutput.Subnets) > 0 && noRouteTableCounter == len(subnetsOutput.Subnets) {
return subnetsOutput.Subnets[0].SubnetId, nil
}
return nil, fmt.Errorf("no public subnet found in VPC %s AZ %s", vpcID, az)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/provider/aws/data/network.go` around lines 126 - 148,
GetPublicSubnetIDInAZ is missing the same main route table fallback used by
getPublicSubnets, so it can incorrectly report no public subnet when subnets
rely on the VPC main route table. Update GetPublicSubnetIDInAZ to mirror the
existing public-subnet detection flow by checking isPublic for each subnet and
treating ErrNoRouteTableBySubnetID as a public-subnet case when appropriate,
using the existing isPublic/getPublicSubnets behavior as the reference point.

Comment on lines +169 to +173
if route.GatewayId != nil {
gwID := *route.GatewayId
if len(gwID) > 0 && gwID[:2] == "igw" {
return nil
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win

Critical: gateway-ID check always evaluates to false.

gwID[:2] is a 2-character slice but is compared to the 3-character literal "igw". In Go, strings of different lengths are never equal, so this condition can never be true. isPublic will therefore never detect an internet-gateway route via this path, breaking not only the new GetPublicSubnetIDInAZ but also the pre-existing getPublicSubnets/GetRandomPublicSubnet for any VPC where subnets have explicit route-table associations. Also, len(gwID) > 0 is an insufficient bounds guard for a 2-byte slice.

🐛 Proposed fix
 			if route.GatewayId != nil {
 				gwID := *route.GatewayId
-				if len(gwID) > 0 && gwID[:2] == "igw" {
+				if len(gwID) >= 3 && gwID[:3] == "igw" {
 					return nil
 				}
 			}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if route.GatewayId != nil {
gwID := *route.GatewayId
if len(gwID) > 0 && gwID[:2] == "igw" {
return nil
}
if route.GatewayId != nil {
gwID := *route.GatewayId
if len(gwID) >= 3 && gwID[:3] == "igw" {
return nil
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/provider/aws/data/network.go` around lines 169 - 173, The gateway-ID
prefix check in isPublic is broken because it compares a 2-character slice of
gwID against the 3-character literal "igw", so internet-gateway routes are never
recognized. Update the route detection logic in the AWS network helpers
(especially isPublic, which is used by getPublicSubnets, GetRandomPublicSubnet,
and GetPublicSubnetIDInAZ) to validate the full prefix safely and with a bounds
check that matches the prefix length, so explicit route-table associations are
detected correctly.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/provider/aws/modules/network/network.go (1)

75-118: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Avoid creating an EIP for airgap networks.

The airgap branch sets IsPublic = true, so this block now allocates an EIP and passes it downstream, where compute can associate it to the isolated instance. Split EIP allocation from internal-LB creation.

Proposed restructuring
-	if result.IsPublic {
+	if result.IsPublic && !args.Airgap {
 		result.Eip, err = ec2.NewEip(ctx,
 			resourcesUtil.GetResourceName(args.Prefix, args.ID, "lbeip"),
 			&ec2.EipArgs{
 				Tags: mCtx.ResourceTags(),
 			})
 		if err != nil {
 			return nil, err
 		}
-		if args.CreateLoadBalancer {
-			lba := &loadBalancerArgs{
-				prefix: &args.Prefix,
-				id:     &args.ID,
-				mCtx:   mCtx,
-				subnet: result.Subnet,
-			}
-			if !args.Airgap {
-				lba.eip = result.Eip
-			}
-			result.LoadBalancer, err = loadBalancer(ctx, lba)
-			if err != nil {
-				return nil, err
-			}
+	}
+	if result.IsPublic && args.CreateLoadBalancer {
+		lba := &loadBalancerArgs{
+			prefix: &args.Prefix,
+			id:     &args.ID,
+			mCtx:   mCtx,
+			subnet: result.Subnet,
+		}
+		if !args.Airgap {
+			lba.eip = result.Eip
+		}
+		result.LoadBalancer, err = loadBalancer(ctx, lba)
+		if err != nil {
+			return nil, err
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/provider/aws/modules/network/network.go` around lines 75 - 118, The
airgap path in the network builder is incorrectly flowing through the
public-network EIP allocation logic because `result.IsPublic` is set to true, so
`network()` ends up creating an EIP and passing it into `loadBalancer()`. Update
the `network()` flow to separate public EIP creation from internal load balancer
setup: only call `ec2.NewEip` for truly public deployments, and ensure the
airgap branch can still create the load balancer without attaching `result.Eip`.
Use the existing `result.IsPublic`, `args.Airgap`, and `loadBalancerArgs`
symbols to keep the routing explicit.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/provider/aws/action/rhel-ai/rhelai.go`:
- Around line 296-298: The early return in the RHEL AI networking path is
incorrectly treating private-subnet deployments as successful even though
`rhaiisSetupScript()` is skipped. Update the logic around the `autoStart` flow
in the RHEL AI action so private deployments either fail with an explicit error
or use an alternate outbound-compatible setup path before proceeding. Make sure
the fix is applied in the same control flow that checks `nw.IsPublic`, and
verify `rhaiisSetupScript` still runs whenever required for successful service
configuration.

In `@pkg/provider/aws/modules/ec2/compute/compute.go`:
- Around line 291-302: The GetHostDnsName method in Compute currently falls back
to an empty string when there is no LB, EIP, or Instance, which causes private
existing-VPC spot deployments with only an ASG to export an empty host. Update
this path so the existing-VPC spot case is either explicitly rejected with a
clear error or resolved to the launched instance private IP before outputHost is
exported, using the Compute.GetHostDnsName and outputHost flow to locate the
change.

---

Outside diff comments:
In `@pkg/provider/aws/modules/network/network.go`:
- Around line 75-118: The airgap path in the network builder is incorrectly
flowing through the public-network EIP allocation logic because
`result.IsPublic` is set to true, so `network()` ends up creating an EIP and
passing it into `loadBalancer()`. Update the `network()` flow to separate public
EIP creation from internal load balancer setup: only call `ec2.NewEip` for truly
public deployments, and ensure the airgap branch can still create the load
balancer without attaching `result.Eip`. Use the existing `result.IsPublic`,
`args.Airgap`, and `loadBalancerArgs` symbols to keep the routing explicit.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: de56305a-1482-4790-a428-0933af9fae92

📥 Commits

Reviewing files that changed from the base of the PR and between 92d011a and 85f501a.

📒 Files selected for processing (9)
  • pkg/provider/aws/action/fedora/fedora.go
  • pkg/provider/aws/action/rhel-ai/rhelai.go
  • pkg/provider/aws/action/rhel/rhel.go
  • pkg/provider/aws/action/windows/windows.go
  • pkg/provider/aws/data/azs.go
  • pkg/provider/aws/data/network.go
  • pkg/provider/aws/data/spot.go
  • pkg/provider/aws/modules/ec2/compute/compute.go
  • pkg/provider/aws/modules/network/network.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/provider/aws/data/azs.go

Comment on lines +296 to +298
if !nw.IsPublic {
return nil
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Don’t report success when private RHEL AI cannot run auto-start setup.

This returns before rhaiisSetupScript() runs, so autoStart private-subnet deployments can succeed without configuring the service. Return an explicit error or move setup to an outbound-compatible mechanism.

Proposed guard
 	if !nw.IsPublic {
+		if r.autoStart {
+			return fmt.Errorf("auto-start is not supported in private subnets without SSH connectivity")
+		}
 		return nil
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if !nw.IsPublic {
return nil
}
if !nw.IsPublic {
if r.autoStart {
return fmt.Errorf("auto-start is not supported in private subnets without SSH connectivity")
}
return nil
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/provider/aws/action/rhel-ai/rhelai.go` around lines 296 - 298, The early
return in the RHEL AI networking path is incorrectly treating private-subnet
deployments as successful even though `rhaiisSetupScript()` is skipped. Update
the logic around the `autoStart` flow in the RHEL AI action so private
deployments either fail with an explicit error or use an alternate
outbound-compatible setup path before proceeding. Make sure the fix is applied
in the same control flow that checks `nw.IsPublic`, and verify
`rhaiisSetupScript` still runs whenever required for successful service
configuration.

Comment on lines +291 to +302
func (c *Compute) GetHostDnsName(public bool) pulumi.StringInput {
if c.LB != nil {
return c.LB.DnsName
}
return util.If(public, c.Eip.PublicDns, c.Eip.PrivateDns)
if c.Eip != nil {
return util.If(public, c.Eip.PublicDns, c.Eip.PrivateDns)
}
// Private subnet: no EIP or LB — return the instance's private IP.
if c.Instance != nil {
return c.Instance.PrivateIp
}
return pulumi.String("")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

Private spot deployments export an empty host.

For private existing-VPC spot, networking has no LB/EIP and compute only has an ASG, so this falls through to "". Reject that combination or add a way to resolve the launched instance private IP before exporting outputHost.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/provider/aws/modules/ec2/compute/compute.go` around lines 291 - 302, The
GetHostDnsName method in Compute currently falls back to an empty string when
there is no LB, EIP, or Instance, which causes private existing-VPC spot
deployments with only an ASG to export an empty host. Update this path so the
existing-VPC spot case is either explicitly rejected with a clear error or
resolved to the launched instance private IP before outputHost is exported,
using the Compute.GetHostDnsName and outputHost flow to locate the change.

When --vpc-id is set and the chosen AZ contains only private subnets,
mapt now falls back to using any available subnet instead of failing.
Machines in private subnets connect outbound only (e.g. GitLab runner
registration), so inbound SSH rules, EIP allocation, load balancer
creation and SSH readiness checks are all skipped automatically.

Key changes:
- existingVPCNetwork tries GetPublicSubnetIDInAZ first; on failure
  falls back to GetAnySubnetIDInAZ and marks IsPublic=false
- NetworkResult.IsPublic propagates to all host deploy functions;
  EIP and LB are only created when IsPublic is true
- ComputeRequest respects nil Eip: no AssociatePublicIpAddress, no
  EIP association, no LB target groups; GetHostDnsName falls back
  to the instance private IP when neither EIP nor LB is present
- Security groups omit all inbound rules for private subnet deploys
- GetSubnetAZsForVPC returns all AZs (public and private) since
  private-subnet AZs are now valid deployment targets
- GetAnySubnetIDInAZ added to data/network.go
- Spot AZ resolution hardened: describeAvailabilityZonesAllAsync
  falls back to the standard describe when AllAvailabilityZones:true
  is SCP-blocked; getPlacementScores restricts the placement score
  request to regions where AZ ID resolution actually succeeded

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(aws): Support for using fixed VPCs

1 participant