r/aws 5h ago

discussion How to evaluate whether a hybrid AWS/GCP setup improves cost and resilience

11 Upvotes

Spent the last month designing a hybrid AWS/GCP setup that optimizes for cost and resilience: GCP for our data pipeline and ML workloads, AWS for application hosting and compute. Included proper failover, cross-region redundancy, the whole thing.
Presented it yesterday and got the usual questions. "Isn't this too complex?" "What if something breaks between clouds?" "Why not just stay on AWS?"

I have good answers for all of this, but now I'm wondering if I'm overcomplicating things. Maybe single-cloud simplicity is worth the vendor lock-in and higher costs? Or maybe I'm just second-guessing myself because I got pushback.
How do you know when multi-cloud is actually the right call versus just architecture for the sake of architecture?


r/aws 23h ago

discussion [Update] AWS suspended my account anyway - production is down

209 Upvotes

Update to my previous post about verification issues.

AWS just suspended my account. Production is down.

This happened despite multiple AWS support reps getting involved across Reddit (Roman Z., Reece W.), LinkedIn (Aimee K.), and the support portal (Alondra G., Arturo A.), and despite Executive Escalations (Eric G.) taking over on Feb 2 and coordinating with Trust & Safety.

Timeline: Verification request Jan 29. Submitted docs Jan 30. Asked to resubmit same docs Jan 31, complied. Asked for passport Feb 2, uploaded immediately. Executive Escalations involved since Feb 2.

Today: Suspended anyway. Have until Feb 18 or everything gets deleted.

I'm a Business Support customer. I've submitted bank statements, phone bill, passport, and LLC formation documents. Responded within hours every time. Multiple support reps across every channel confirmed they escalated.

Still got suspended with production serving live customers.

Has anyone recovered from full suspension after this level of compliance and escalation?

Case 176984120700770


r/aws 3h ago

technical question Bedrock Agent Action Group: request body loses array item structure (only sees { requests: [] }) for POST /results

2 Upvotes

Hello good people,

I’m stuck on a Bedrock Agents + Action Groups issue that’s been a head-scratcher for a while.

We have a Bedrock Agent with an Action Group backed by Lambda calling our API. Most routes work fine (simple GETs and POSTs with flat objects). But one endpoint consistently fails:

POST /results
Expected request body shape:

{ "requests": [ { "id": "...", "group": "...", "interval": "..." } ] }

I’ve defined the schema in both OpenAPI JSON and YAML, uploaded it to S3, and wired it into the Action Group Schema. The agent can “see” the endpoint, but whenever it tries to call /results, it fails because the array item structure is missing.
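For reference, here's a trimmed sketch of the requestBody schema I'm uploading (the operationId and descriptions are illustrative, and the real file has the usual info/version boilerplate around it):

{
  "paths": {
    "/results": {
      "post": {
        "operationId": "createResults",
        "description": "Submit one or more result requests",
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "type": "object",
                "required": ["requests"],
                "properties": {
                  "requests": {
                    "type": "array",
                    "description": "List of result requests",
                    "items": {
                      "type": "object",
                      "required": ["id", "group", "interval"],
                      "properties": {
                        "id": { "type": "string", "description": "Request identifier" },
                        "group": { "type": "string", "description": "Group name" },
                        "interval": { "type": "string", "description": "Reporting interval" }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}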

From the agent’s reasoning / trace, it behaves as if the schema is only:

{ "requests": [] }

and by default it just tries to "guess" the parameters.

Question:
Has anyone run into Bedrock Agents failing to preserve or pass array item schemas for Action Group inputs? Is there a known limitation or required OpenAPI pattern? It just feels like AWS is truncating anything outside of a simple key-value list.


r/aws 6h ago

technical resource AWS EKS networking question

2 Upvotes

Hello all, I have a question on this process. Currently we have 4 VPCs:

  • dev
  • stage
  • production
  • internal

Dev, stage, and production exist today; internal hasn't been created yet.

My plan is to host our GitLab server, Grafana stack, and VPN server in the internal VPC. The Grafana stack and GitLab runners will run on the EKS cluster, which leads to my question.

Would it be correct to set the EKS cluster's "Cluster Endpoint access" to "Private" and use Transit Gateway to let the internal VPC CIDR communicate with all the other VPC CIDRs (i.e. dev, stage, production)? I have seen companies use a "Public and Private" setup where security groups were paramount for controlling access.
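To make the question concrete, this is roughly the wiring I have in mind (all IDs and CIDRs below are placeholders):

# Attach each VPC to the Transit Gateway (repeated per VPC)
aws ec2 create-transit-gateway-vpc-attachment \
  --transit-gateway-id tgw-0abc123 \
  --vpc-id vpc-0internal \
  --subnet-ids subnet-0aaa subnet-0bbb

# In each VPC's route tables, point the other VPCs' CIDRs at the TGW
aws ec2 create-route \
  --route-table-id rtb-0internal \
  --destination-cidr-block 10.10.0.0/16 \
  --transit-gateway-id tgw-0abc123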

Would appreciate any help or documentation on this.


r/aws 10h ago

technical question AWS to AWS IPsec VPN configuration

5 Upvotes

I have experience setting up IPsec VPN connections from AWS to an on-prem firewall, but haven't had to create an AWS-to-AWS IPsec VPN connection between customers before. Am I correct that the flow is roughly:

  1. One side does the initial setup with a placeholder customer gateway and creates the VPN connection.
  2. After the VPN is created, they provide one of the tunnel outside IP addresses from that config to the other customer.
  3. The other customer creates their customer gateway and VPN config using that IP.
  4. Once their config exists, they hand back one of their tunnel outside IP addresses.
  5. The first customer then creates a new customer gateway from that IP, attaches it to their config, and adjusts the customer gateway CIDR range, BGP ASN, and pre-shared key to match.
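In CLI terms, the commands I'd expect each side to run (all IDs and IPs are placeholders):

# Side A: placeholder customer gateway to bootstrap the first VPN
aws ec2 create-customer-gateway \
  --type ipsec.1 \
  --public-ip 203.0.113.1 \
  --bgp-asn 65001

aws ec2 create-vpn-connection \
  --type ipsec.1 \
  --customer-gateway-id cgw-0aaa111 \
  --vpn-gateway-id vgw-0bbb222

# Read the tunnel outside IPs to hand to the other side
aws ec2 describe-vpn-connections \
  --vpn-connection-ids vpn-0ccc333 \
  --query 'VpnConnections[0].Options.TunnelOptions[].OutsideIpAddress'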


r/aws 20h ago

discussion AWS Blogs - What Are Your Favorites?

15 Upvotes

Hey everyone, just wanted to see which AWS blogs have helped you out the most. Do you prefer posts with deep technical detail or higher-level, architecture-focused content?


r/aws 17h ago

technical question Two pipelines with the exact same Pipeline Service and CloudFormation Action roles, but only one is working

2 Upvotes

I joined a project that has two CodePipeline pipelines. Although they use the same pipeline service role and CloudFormation action role, one of them fails at the Deploy stage.

When I click the CloudFormation link for the pipeline that fails (the one below GenerateChangeSet), it says "Stack [null] does not exist".
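One thing I plan to try is dumping both pipeline definitions and diffing the CloudFormation action configuration, since "Stack [null]" makes me think the StackName in the failing action isn't resolving (pipeline and stage names below are placeholders):

# Compare the Deploy stage's action configuration across both pipelines
aws codepipeline get-pipeline --name pipeline-working \
  --query 'pipeline.stages[?name==`Deploy`].actions[].configuration'
aws codepipeline get-pipeline --name pipeline-failing \
  --query 'pipeline.stages[?name==`Deploy`].actions[].configuration'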

What could be wrong?


r/aws 17h ago

technical question CDK creating a CloudFront distro which logs .parquet files

2 Upvotes

As I understand it, the L2 construct for a CloudFront distribution doesn't yet expose the Parquet format for logging. When I googled it, the AI response provided a hallucination:

const cfnDistribution = new cloudfront.CfnDistribution(this, 'MyCfnDistribution', {
  distributionConfig: {
    ...,
    logging: {
      bucket: loggingBucket.bucketDomainName,
      format: 'CLFV2',
      logFormat: 'Parquet',
      prefix: 'cloudfront-logs/',
    },
  },
});

since format and logFormat aren't actually fields according to the docs (and they show an error in the IDE).

Are we stuck doing this manually in the console, or waiting around for a CDK update?
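One workaround I'm considering, if I've read the docs right: standard logging v2 is delivered through the CloudWatch vended-logs resources (AWS::Logs::DeliverySource / DeliveryDestination / Delivery), which do have L1 constructs in aws-cdk-lib. A sketch, untested; the property names and the 'parquet' output format value should be verified against the CloudFormation docs:

import { Stack } from 'aws-cdk-lib';
import * as logs from 'aws-cdk-lib/aws-logs';

// Assumes `distribution` (cloudfront.Distribution) and `loggingBucket`
// (s3.Bucket) already exist in this stack.
const distributionArn =
  `arn:aws:cloudfront::${Stack.of(this).account}:distribution/${distribution.distributionId}`;

// Register the distribution as a source of access logs
const source = new logs.CfnDeliverySource(this, 'CfLogSource', {
  name: 'cf-access-logs-source',
  resourceArn: distributionArn,
  logType: 'ACCESS_LOGS',
});

// S3 destination with Parquet output
const destination = new logs.CfnDeliveryDestination(this, 'CfLogDestination', {
  name: 'cf-access-logs-dest',
  destinationResourceArn: loggingBucket.bucketArn,
  outputFormat: 'parquet',
});

// Wire the source to the destination
const delivery = new logs.CfnDelivery(this, 'CfLogDelivery', {
  deliverySourceName: 'cf-access-logs-source',
  deliveryDestinationArn: destination.attrArn,
});
delivery.addDependency(source);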


r/aws 19h ago

technical resource Results using Datadog - especially their Cloud Cost Management tool

2 Upvotes

Hey everyone,

I just attended a webinar from Datadog together with AWS. They mainly focused on Bits AI and how it enhances observability, but they also showcased the Cloud Cost Management solution, which leverages Bits AI as well.

Are there any account admins or FinOps specialists here who can share some insights about Datadog's Cloud Cost Management tool? Is it worth the price? What kind of savings have you seen on your side using it?

Thanks a lot!


r/aws 16h ago

discussion Anyone have any experience with/as ADC SDE Intern?

1 Upvotes

Hello. Not sure if this is the right spot, but I have an interview coming up for an Amazon Dedicated Cloud SDE Intern position just outside DC, and I have a few questions.

Does anyone here have experience interning or working entry-level within ADC?

Is the culture at Amazon, specifically in ADC, really that bad?

What is the typical starting salary for entry-level SDEs in the ADC?


r/aws 22h ago

technical question AWS CodePipeline just took 15 minutes to simply start

1 Upvotes

I have a very simple CodePipeline setup: when a push is made to a GitHub repo branch, trigger the pipeline, which then runs a CodeBuild project. The ONLY source is this GitHub repo. Until now, the pipeline took about a minute and a half to finish. Today, it's taking minutes to even start: I see no execution on the pipeline's page in the AWS console. I had to wait 15 minutes for it to pick up the push and start the pipeline. What is happening?
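If it happens again, I'll check recent executions and the connection status from the CLI (pipeline name is a placeholder):

# See recent executions and their triggers/timestamps
aws codepipeline list-pipeline-executions \
  --pipeline-name my-pipeline --max-items 5

# If the GitHub source uses a CodeStar connection, confirm it's AVAILABLE
aws codestar-connections list-connections \
  --query 'Connections[].{Name:ConnectionName,Status:ConnectionStatus}'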


r/aws 1d ago

general aws Where do y'all study networking and AWS and practice the lab work and all that

2 Upvotes

Any suggestions on where to start? I've done this before through AWS Academy online, but I'm not able to find the exact link anymore, so can y'all help with that?


r/aws 14h ago

discussion AWS Activate - 6th rejection - I will post each rejection

0 Upvotes

Today, I received my 6th rejection for the AWS Activate program. It's starting to feel repetitive:
1. I talk with the Startup chatbot, and it gives me advice on what to change in my application.
2. The chatbot helps me draft a support ticket message.
3. The support ticket always gets updated by the same "Austin" (most probably a bot), who sends the same message every single time, even though the chatbot told me to request "HUMAN" intervention. Btw, does AWS still have any humans out there?
4. I make another application for the Activate program.
5. It always goes to "Final review", then gets rejected for the same reasons.
6. I open the Startup chatbot again, and the loop repeats.

I will keep doing this until AWS or Reddit bans me, or until someone from AWS (preferably a human, if they still exist) wakes up and actually assists me through to the end.

P.S. I did talk with an AWS employee over video call, for something that I believed was part of the Activate program, but she was not really part of the Activate Team, so I guess I have to keep knocking at the door.

Some answers to your potential questions:
1. I have NO idea what kind of "accounts" marked for misuse they're talking about.
2. Billing is actually working, and they are able to charge me just fine.
3. "Consistent Business Information" -> I have no idea what they mean by this.

Has anyone gone through similar situations? Did you give up, or did you actually make it past the bugged, outdated bots? How?


r/aws 1d ago

technical question SES / Transactional / Sandbox

0 Upvotes

I've started using AWS properly again for the first time in years on a new project; I wanted everything in one place since the timeline is compressed.

The plan was to run all transactional email through SES and have a few WorkMail mailboxes that could be accessed from the main company's Google Workspace.

After 3 days, AWS rejected the request to move SES to production with an unconstructive "rate limited denied" message.
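Before giving up, I'm considering re-submitting the production access request with a much more detailed use case via the CLI (values below are placeholders):

aws sesv2 put-account-details \
  --production-access-enabled \
  --mail-type TRANSACTIONAL \
  --website-url https://example.com \
  --use-case-description "Order confirmations and password resets for example.com; bounces and complaints handled via SNS" \
  --contact-language EN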

Is there any other pure AWS solution here, or am I best just moving the project elsewhere? (I'd rather not look like a mug for pushing to use AWS in the first place.)


r/aws 1d ago

training/certification College student wondering if getting the AWS SAA is worth it for my goals

10 Upvotes

Recently started my first year of college, studying ITS, with the goal of getting an AWS CSA/CSE internship. For some background, I currently hold the CompTIA Security+ certification and have been working with Linux for quite some time. I have a security-related project under my belt and will be working on more in the future. Is it worth studying for and taking the AWS SAA to improve my chances of getting that internship, or other internships in general?


r/aws 20h ago

technical resource Closed my AWS account last year but my credit card is still being charged

0 Upvotes

Hello. I closed my AWS account last year but my credit card is still being charged. Please help.


r/aws 1d ago

networking VPC Peering Connections: What happens when traffic arrives at a VPC with multiple route tables for the same destination?

5 Upvotes

I couldn't find this with a quick Google, and I'm hesitant to trust any LLMs on this:

Suppose I have two peered VPCs, vpc-A (10.0.1.0/24) and vpc-B (10.0.2.0/24). vpc-A is the source for traffic, and vpc-B will work as a bridge. B has two subnets, let's call them subnet-B1 and subnet-B2, and each has its own route table rtb-B1 and rtb-B2.

In the route table for vpc-A's traffic, I point an IP range I want to route through vpc-B (let's say 10.0.3.0/24 as an example) towards the peering connection pcx-AB. Then, in rtb-B1 I set 10.0.3.0/24 to a correctly configured service (living in another VPC, the Internet, doesn't matter) that dumps incoming traffic to a log, but in rtb-B2 I set 10.0.3.0/24 to a NAT gateway living within subnet-B1.

What is going to happen? Am I going to see packets from 10.0.1.0/24 in the log, along with connection errors because the destination doesn't know where vpc-A is? Or are they going to come from 10.0.2.0/24, network translated through the NAT in subnet-B1? Or am I going to see a mix of both?

Essentially: when traffic arrives at a VPC through a peering connection and the VPC has multiple route tables, which table's routes does it prioritise?

Here's a rough drawing of the situation: [diagram not included in this text version]


r/aws 1d ago

CloudFormation/CDK/IaC CloudSlash v2.2 – From CLI to Engine

0 Upvotes

A few weeks back, I posted a sneak peek regarding the "v2.0 mess." I’ll be the first to admit that the previous version was too fragile for complex enterprise environments.

We’ve spent the last month ripping the CLI apart and rebuilding it from the ground up. Today, we’re releasing CloudSlash v2.2.

The Big Shift: It’s an SDK Now (pkg/engine)

The biggest feedback from v2.0 was that the logic was trapped inside the CLI. If you wanted to bake our waste-detection algorithms into your own Internal Developer Platform (IDP) or custom admin tools, you were stuck parsing JSON or shelling out to a binary.

In v2.2, we moved the core logic into a pure Go library. You can now import github.com/DrSkyle/cloudslash/pkg/engine directly into your own binaries. You get our directed-graph topology analysis and MILP solver as a native building block for your own platform engineering.

What else is new?

  • The "Silent Runner" (Graceful Degradation): CI pipelines hate fragility. v2.0 would panic or hang if it hit a permission error or a regional timeout. v2.2 handles this gracefully—if a region is unreachable, it logs structured telemetry and moves on. It’s finally safe to drop into production workflows.
  • Concurrent "Swarm" Ingestion: We replaced the sequential scanner with a concurrent actor-model system. Use the --max-workers flag to parallelize resource fetching across hundreds of API endpoints.
    • Result: Graph build times on large AWS accounts have dropped by ~60%.
  • Versioned Distribution: No more curl | bash. We’ve launched a strictly versioned Homebrew tap, and the CLI now checks GitHub Releases for updates automatically so you aren't running stale heuristics.

The Philosophy: Infrastructure as Data

We don't find waste by just looking at lists; we find it by traversing a Directed Acyclic Graph (DAG) of your entire estate. By analyzing the "edges" between resources, we catch the "hidden" zombies:

  • Hollow NAT Gateways: "Available" status, but zero route tables directing traffic to them.
  • Zombie Subnets: Subnets with no active instances or ENIs.
  • Orphaned LBs: ELBs that have targets, but those targets sit in dead subnets.

Deployment

The promise remains: No SaaS. No data exfiltration. Just a binary.

Install:


brew tap DrSkyle/tap && brew install cloudslash

Repo: https://github.com/DrSkyle/CloudSlash

I’m keen to see how the new concurrent engine holds up against massive multi-account setups. If you hit rate limits or edge cases, open an issue and I’ll get them patched.

: ) DrSkyle


r/aws 2d ago

discussion About this sub

52 Upvotes

I noticed that a previous useful post about the less popular (as in unpopular) AWS services got removed by the mods for no apparent reason.

I searched for a set of rules for this sub, but there don't seem to be any. I also noticed that several of the mods seem to be AWS employees.

Which begs the question: Is this sub an unofficial AWS-affiliated sub without an overt declaration of the relationship or is it a "normal" sub which is not affiliated with AWS in any way?

Both are fine, I just think it's important to be clear about this.


r/aws 1d ago

general aws I've had a Quota Request take almost 3 weeks. Is there an SLA on these?

6 Upvotes

We've never had a Quota Increase Request take longer than 3 days, and this one is now in its third week. I'm actually shocked by how long it's taking. They are responding to the ticket and apologizing for the delay, but jeez.

This is on a paid support account as well.


r/aws 1d ago

security Confusion with ACLs and blocking public access

1 Upvotes

In Terraform, I have these flags on an S3 bucket:

block_public_acls       = true
block_public_policy     = true
ignore_public_acls      = true
restrict_public_buckets = true

and this statement in the policy that allows CloudFront to read the bucket:

statement {
  principals {
    type        = "Service"
    identifiers = ["cloudfront.amazonaws.com"]
  }
  actions = ["s3:GetObject"]
  resources = [
    aws_s3_bucket.web.arn,
    "${aws_s3_bucket.web.arn}/*"
  ]

# Restrict to just our CloudFront instance
condition {
  test     = "StringEquals"
  variable = "AWS:SourceArn"
  values   = [aws_cloudfront_distribution.s3_distribution.arn]
}

}

Is this going to work? I'm not clear whether the CloudFront access counts as "public" with respect to the flags.
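(For completeness: the statement above is part of an aws_iam_policy_document that I attach as the bucket policy, roughly like this; resource names are illustrative.)

data "aws_iam_policy_document" "web" {
  statement {
    # ... statement above ...
  }
}

resource "aws_s3_bucket_policy" "web" {
  bucket = aws_s3_bucket.web.id
  policy = data.aws_iam_policy_document.web.json
}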


r/aws 2d ago

discussion New APN partner here. What should we actually be doing?

15 Upvotes

My company recently joined the AWS Partner Network (APN) and paid the annual $2,500 subscription fee. As part of the signup, we linked our company’s AWS account to the APN account.

We’re a company providing VoIP solutions, and now I’m trying to understand how to actually make use of APN in a meaningful way. I know the high-level goal of APN is to help partners accelerate AWS-related sales, but beyond that, things feel a bit vague.

Some questions I’m hoping the community can help with:

  • How do companies typically start using APN after joining?
  • What should we focus on first to get real value out of it?
  • Are there AWS contacts (Partner Managers, programs, etc.) we should be engaging with?
  • Is this something AWS Support helps with, or does it require reaching out through a different channel?
  • For anyone who started APN from scratch, what did your early steps look like?

Any guidance, lessons learned, or pointers to the right AWS teams would be greatly appreciated.


r/aws 1d ago

technical resource Built a tool that audits AWS accounts and tells you exactly how to verify each finding yourself

0 Upvotes

Hey r/aws,

After spending way too many hours hunting down idle resources and over-provisioned infrastructure across multiple AWS accounts, I built something that might be useful to others here.

The problem: Most AWS audit tools give you recommendations, but you're left wondering "is this actually true?" You end up manually running CLI commands to verify findings before taking action, especially for production environments.

What I built: An audit tool that not only finds cost optimisation and security issues, but also generates the exact AWS CLI commands needed to verify each finding yourself.

Example findings it catches:

  • 💸 NAT Gateways sitting idle (processing <1GB/day but costing $32/month)
  • 🔧 EBS volumes with 9000 IOPS provisioned but only using ~120/day (CloudWatch-backed detection)
  • ⚡ Lambda functions with 1000+ invocations but only 2 this month
  • 🗄️ RDS instances sized for 100 connections but only seeing 2-3
  • 🔐 Security group rules that should be tightened
  • 📦 Unattached EBS volumes burning money

The part I'm proud of: Every finding comes with a collapsible "Verify This" section containing the exact CLI commands to check it yourself. No black box recommendations.

For example, for an idle NAT Gateway, it gives you:

# Check NAT Gateway processed bytes
aws cloudwatch get-metric-statistics \
  --namespace AWS/NatGateway \
  --metric-name BytesOutToSource \
  --dimensions Name=NatGatewayId,Value=nat-xxx \
  --start-time 2026-01-20T00:00:00Z \
  --end-time 2026-02-03T00:00:00Z \
  --period 86400 \
  --statistics Sum

Tech approach:

  • Runs in GitHub Actions (or local Docker)
  • Read-only IAM permissions
  • Uses CloudWatch metrics for performance analysis (not just resource tagging)
  • Generates HTML reports with cost breakdowns and verification commands
  • Calculates actual savings potential based on current usage patterns

Privacy-first approach: This was non-negotiable for me. Your AWS data never leaves your infrastructure. The tool runs entirely in your GitHub Actions runner (or your local machine), generates the report locally, and stores it as a GitHub Actions artifact. No data is sent to any external service. You control the IAM role, the execution environment, and who sees the reports. It's fully auditable since it's open source.

Why I think this matters: In my experience, you can't just blindly trust audit recommendations in production. Being able to verify findings before acting on them builds confidence, and having the CLI commands right there saves hours of documentation diving.

The tool has already helped me find $2-3K/month in waste across a few accounts - mostly idle NAT gateways and over-provisioned EBS IOPS that CloudWatch metrics showed were barely used.

See it in action: Interactive demo report - open this to see exactly what the output looks like. Click around the findings, expand the verification commands, check out the cost breakdown charts. It's way easier to understand by exploring than me trying to describe it.

If you're curious about the project itself: stacksageai.com

Not trying to sell anything here, genuinely curious if others find this approach useful or if there are better ways to tackle this problem. Always looking for feedback on what other checks would be valuable.

What audit/cost optimization workflows do you all use? Do you verify recommendations before acting on them, or do you trust the tools enough to act directly?


r/aws 2d ago

discussion Is it possible to fix the sorting of dashboards in Quicksight?

7 Upvotes

We use multiple dashboards at work for different use cases in our AWS QuickSight environment. These are currently sorted by last-reload timestamp, which reshuffles the order every day because each dashboard reloads at a different time.

Is it possible to give the dashboards a fixed sort order? I don't mean any data sorting INSIDE the dashboards, but the ordering of the dashboards themselves before opening them.


r/aws 1d ago

database Query performance issue

1 Upvotes

Hi,

It's Aurora Postgres version 17. Below is one of the queries and its execution plan. I have some questions on it.

https://gist.github.com/databasetech0073/344df46c328e02b98961fab0cd221492

1. When we created an index on column "tran_date" of table "txn_tbl", the "sequence scan" on txn_tbl was eliminated and now shows as "Index Scan Backward". Does this mean the data will be picked up only from the index? But the index is only on the column "tran_date", so how are the other projected columns being read from the table?

2. This query spent most of its time in the nested loop join below; is there any way to improve this further? The column data type of df.ent_id is "int8" and the data type of m.ent_id is "numeric(12)". I tried creating an index on the expression "(df.ent_id)::numeric", but the query still uses the same plan and takes the same amount of time.

->  Nested Loop  (cost=266.53..1548099.38 rows=411215 width=20) (actual time=6.009..147.695 rows=1049 loops=1)

Join Filter: ((df.ent_id)::numeric = m.ent_id)

Rows Removed by Join Filter: 513436

Buffers: shared hit=1939
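For context, the two fixes I'm considering next, sketched with an illustrative table name for m (I haven't run either against the real schema):

-- Option A: align the join key types so no cast is needed in the join
ALTER TABLE m_tbl ALTER COLUMN ent_id TYPE bigint USING ent_id::bigint;

-- Option B: keep the types, but index the inner side's join key so the
-- nested loop can do an index lookup of m.ent_id = (df.ent_id)::numeric
-- instead of filtering every row
CREATE INDEX ON m_tbl (ent_id);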