There's a Live BGP Router Running on My Portfolio

There is a section on this portfolio showing real router output from a BGP session I built and ran in AWS.

Not a simulation. Real FRR routers, peered over an IPSec tunnel, exchanging routes between autonomous systems. The session stayed established for over five weeks before I deliberately spun the lab down, because paying for idle EC2 instances between demos is not architecture, it is negligence. The entire stack redeploys from Terraform in minutes whenever it is needed.

I built it because I believe there is a meaningful difference between understanding something in theory and being able to operate it under real conditions. I wanted the portfolio to reflect that distinction. And I wanted to build something that forced me to solve the same class of problems I solve in production, but in a context where I could take it apart deliberately instead of under pressure.

Why BGP Specifically

BGP is the routing protocol that runs the internet. It is also the protocol that makes hybrid cloud connectivity work at the layer that actually matters.

When you connect an on-premises data center to AWS using Direct Connect or a Site-to-Site VPN, BGP handles route advertisement between the two environments. When you need to control which prefixes flow between your corporate network and your cloud VPCs, BGP is how you express that policy. When something breaks and traffic stops flowing between environments, BGP is usually where the investigation starts.

Most cloud engineers have never configured it directly. Most enterprise network engineers have configured it but never in a cloud context. The combination of both is genuinely uncommon, and it is the specific intersection where I have spent my career.

Building a live BGP lab was not an exercise in novelty. It was a way to keep the skills sharp and make the work visible to people who would not otherwise have a way to verify it.

What the Lab Actually Is

Two FRR (Free Range Routing) instances running on EC2. One represents a cloud-side router. One represents an on-premises router. They establish an eBGP peering session across a strongSwan IPSec tunnel, advertising routes between Autonomous Systems exactly the way a real hybrid environment does.

AS 65001 (cloud) ←── eBGP / TCP 179 / IPSec AES-256 ──→ AS 65002 (on-prem)

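This is the same architectural pattern as an AWS Site-to-Site VPN with dynamic routing, and the BGP layer behaves the same way over AWS Direct Connect, which carries the session on a private circuit that is not encrypted by default rather than inside a tunnel. The essentials are shared: an eBGP session between two autonomous systems, route advertisements flowing in both directions, and prefix filtering controlling what each side sees.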
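To make the prefix filtering concrete, here is a minimal Python sketch of how a prefix list decides what gets through. It is a model for illustration, not the lab's configuration: the `PrefixRule` class, the `allowed` helper, and the 10.20.0.0/16 range are all hypothetical. The matching rules follow the usual prefix-list semantics (exact match without `ge`/`le`, first match wins, implicit deny at the end).

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import Optional


@dataclass(frozen=True)
class PrefixRule:
    """One prefix-list entry (hypothetical model, not FRR's own structure)."""
    permit: bool
    prefix: str
    ge: Optional[int] = None  # minimum mask length, if set
    le: Optional[int] = None  # maximum mask length, if set

    def matches(self, candidate: str) -> bool:
        base = ip_network(self.prefix)
        net = ip_network(candidate)
        # The candidate must sit inside the rule's prefix to match at all.
        if net.version != base.version or not net.subnet_of(base):
            return False
        if self.ge is None and self.le is None:
            # Without ge/le, only the exact prefix matches.
            return net.prefixlen == base.prefixlen
        low = self.ge if self.ge is not None else base.prefixlen
        high = self.le if self.le is not None else net.max_prefixlen
        return low <= net.prefixlen <= high


def allowed(rules, candidate):
    """First matching rule wins; no match at all is an implicit deny."""
    for rule in rules:
        if rule.matches(candidate):
            return rule.permit
    return False


# Illustrative inbound policy (assumed addressing): reject a default route,
# accept the on-prem aggregate and its subnets down to /24.
ONPREM_IN = [
    PrefixRule(permit=False, prefix="0.0.0.0/0"),
    PrefixRule(permit=True, prefix="10.20.0.0/16", le=24),
]
```

With this policy, `allowed(ONPREM_IN, "10.20.5.0/24")` is accepted, while a /25 inside the same range, a default route, or any unrelated prefix falls through to the implicit deny.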

While the lab is deployed, a Lambda function behind an API Gateway endpoint executes router commands via AWS Systems Manager, and the portfolio renders the output in a terminal interface. The available views are the BGP summary, the BGP table, the routing table, and interface status, each runnable against either router. The terminal on the Projects page shows captured output from a real session.
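The Lambda-to-router path can be sketched roughly as follows. This is an illustration under assumptions, not the deployed function: the command names in `ALLOWED_COMMANDS`, the `run_router_command` helper, and the polling values are hypothetical. `send_command`, `get_command_invocation`, and the `AWS-RunShellScript` document are the standard SSM Run Command interface, and the SSM client is passed in so the logic can be exercised without AWS credentials.

```python
import time

# Allowlist mapping portfolio-facing names to vtysh commands. Only these
# strings ever reach the router; callers never supply shell text.
ALLOWED_COMMANDS = {
    "bgp-summary": "show bgp summary",
    "bgp-table": "show ip bgp",
    "routes": "show ip route",
    "interfaces": "show interface brief",
}


def run_router_command(ssm, instance_id, name, poll_seconds=1.0, max_polls=10):
    """Run one allowlisted vtysh command on an instance via SSM Run Command.

    Returns a (status, stdout) tuple. `ssm` is an SSM client, for example
    boto3.client("ssm") inside the Lambda.
    """
    if name not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowed: {name}")
    shell = f'vtysh -c "{ALLOWED_COMMANDS[name]}"'
    sent = ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": [shell]},
    )
    command_id = sent["Command"]["CommandId"]
    for _ in range(max_polls):
        # Sleep before the first lookup: the invocation record may not
        # exist yet immediately after send_command returns.
        time.sleep(poll_seconds)
        result = ssm.get_command_invocation(
            CommandId=command_id, InstanceId=instance_id
        )
        if result["Status"] not in ("Pending", "InProgress", "Delayed"):
            return result["Status"], result["StandardOutputContent"]
    return "TimedOut", ""
```

The allowlist is the design choice that matters here: a public endpoint that forwards arbitrary strings to a router shell would be a remote command execution service, so the endpoint only ever selects from a fixed menu.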

The whole stack is provisioned with Terraform. If I tear it down and rebuild it, it comes back to an identical state.

What Building It Actually Taught Me

Designing and operating this lab reinforced things I had internalized from production work but had never needed to articulate precisely until I was explaining them to an API endpoint.

Routing and reachability are not the same thing. A BGP session can be fully established, with routes present in the table, and traffic can still fail to flow. On EC2 the cause can sit entirely outside the routing protocol: a security group rule, a missing VPC route table entry, or the instance's source/destination check dropping forwarded packets. The gap between "routes are advertised" and "packets are actually moving" is where most BGP troubleshooting lives. Understanding why that gap exists in a lab, where you control every variable, makes you significantly faster at diagnosing it in production, where you do not.
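One way to keep the two questions separate is to answer them separately. The sketch below is a hypothetical helper, not part of the lab: `best_route` does a longest-prefix match against a list of prefixes (the control-plane question), and `diagnose` only then consults a caller-supplied `probe` function (the data-plane question). The prefixes used with it are assumed example values.

```python
from ipaddress import ip_address, ip_network


def best_route(destination, prefixes):
    """Longest-prefix match: the most specific prefix containing destination."""
    addr = ip_address(destination)
    candidates = [ip_network(p) for p in prefixes]
    matching = [
        net for net in candidates
        if net.version == addr.version and addr in net
    ]
    return max(matching, key=lambda net: net.prefixlen, default=None)


def diagnose(destination, prefixes, probe):
    """Separate 'is there a route?' from 'do packets actually arrive?'.

    `probe` is any callable taking the destination and returning True when
    a real packet made it through (a ping or TCP connect wrapper, say).
    """
    route = best_route(destination, prefixes)
    if route is None:
        return "no route: control-plane problem"
    if not probe(destination):
        return f"route {route} present but probe failed: data-plane problem"
    return f"reachable via {route}"
```

A destination that has a route but fails its probe is exactly the case described above: the table looks healthy, and the fault is somewhere the routing protocol cannot see.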

Encryption adds a layer of operational complexity that compounds everything else. The IPSec tunnel between the two FRR instances has to be healthy before BGP can establish. Debugging a BGP session on top of a flapping IPSec SA is a specific kind of diagnostic exercise. You have to work down the stack systematically instead of chasing the most visible symptom. That discipline is valuable in production environments where the two failure modes arrive at the same time.
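Working down the stack can be written as an ordered list of checks that stops at the first failure. This is a sketch with stubbed checks, and the layer names and the `walk_stack` helper are hypothetical. In practice each callable would wrap a real test, such as inspecting the IPsec SA state, attempting a TCP connection to port 179, or reading the BGP neighbor state.

```python
def walk_stack(checks):
    """Run checks from the lowest layer up and stop at the first failure.

    `checks` is an ordered list of (layer_name, zero-argument callable
    returning bool). Returns (failing_layer, layers_not_checked), or
    (None, []) when every layer passes.
    """
    for index, (name, check) in enumerate(checks):
        if not check():
            not_checked = [later for later, _ in checks[index + 1:]]
            return name, not_checked
    return None, []


# Stubbed example: the tunnel is up but TCP 179 is not getting through,
# so the BGP state is never examined because it cannot be the root cause.
example_checks = [
    ("ipsec-sa", lambda: True),
    ("tcp-179", lambda: False),
    ("bgp-established", lambda: False),
]
failing, skipped = walk_stack(example_checks)
```

The point of returning the unchecked layers is to make the discipline visible: everything above the first failure is a symptom until the lower layer is fixed.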

Infrastructure as code makes implicit assumptions explicit. When I provisioned this lab in Terraform, every assumption I had been making informally became something I had to write down: CIDR ranges, security group rules, route table entries, IAM policies for SSM access. Terraform does not tolerate vagueness. That constraint is useful because production does not tolerate it either.
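As a small example of turning an informal assumption into something checkable, the sketch below verifies that no two address ranges overlap, which is the kind of thing a hybrid design quietly depends on. The `overlapping_cidrs` helper, the names, and the CIDR values are hypothetical, not the lab's actual addressing, and the real lab expresses this in Terraform rather than Python.

```python
from ipaddress import ip_network
from itertools import combinations


def overlapping_cidrs(named_cidrs):
    """Return every pair of names whose CIDR ranges overlap."""
    nets = {name: ip_network(cidr) for name, cidr in named_cidrs.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.version == net_b.version and net_a.overlaps(net_b)
    ]


# Assumed example plan: distinct ranges for each side plus a tunnel link.
plan = {
    "cloud_vpc": "10.10.0.0/16",
    "onprem": "10.20.0.0/16",
    "tunnel": "169.254.10.0/30",
}
conflicts = overlapping_cidrs(plan)

# A later addition that collides with the cloud VPC is caught as a pair.
conflicts_after_change = overlapping_cidrs({**plan, "new_subnet": "10.10.4.0/24"})
```

An empty result for the original plan and a reported pair after the change is the whole value of the check: the assumption is written down once and enforced every time the plan changes.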

The most valuable thing a lab project can do is create the conditions for deliberate failure. In production, you prevent failures. In a well-designed lab, you engineer them on purpose and study what happens. The BGP lab has let me intentionally misconfigure route policies, drop IPSec sessions mid-transfer, and introduce path asymmetry to observe how the routing protocol responds. That kind of controlled experimentation does not have a classroom equivalent.

How This Connects to Production Work

In my current role, I design and maintain hybrid connectivity between on-premises data centers and cloud environments in AWS and Azure. The environments serve healthcare infrastructure, which means uptime requirements are real and the cost of a misconfigured route or an unexpected BGP session drop is not abstract.

The lab is a cleaner version of that work. Same protocol stack. Same architectural decisions. Same failure modes. The difference is that in the lab I can break things deliberately and learn from them at my own pace. In production, I apply what I learned in the lab to make sure things do not break in the first place.

That translation from deliberate practice to production discipline is the most accurate thing a portfolio project can demonstrate. It shows that the work did not stop at the tutorial. It shows that you understood the material well enough to build something real with it, operate it over time, and extract lessons that transfer back to the work you are actually paid to do.

The session ran continuously for over five weeks before I tore the lab down on purpose. The captured output in the Projects section shows the uptime counter from that run.

Uptime is something you prove, not claim. So is knowing when to turn things off.


A captured session from the lab is viewable in the Projects section, and the entire stack redeploys from Terraform in minutes.
