There is a section on this portfolio where you can run live router commands against a real BGP session.

Not a simulation. Not a screenshot. Real FRR routers running in AWS, peered over an IPSec tunnel, returning live BGP tables in response to your queries. You can see the neighbor relationships, the advertised prefixes, the uptime on the session. If the session is up, the dot pulses green. If it is down, it does not lie to you.

I built it because I believe there is a meaningful difference between understanding something in theory and being able to operate it under real conditions. I wanted the portfolio to reflect that distinction. And I wanted to build something that forced me to solve the same class of problems I solve in production, but in a context where I could take it apart deliberately instead of under pressure.


Why BGP Specifically

BGP is the routing protocol that runs the internet. It is also the protocol that makes hybrid cloud connectivity work at the routing layer, where reachability between environments is actually decided.

When you connect an on-premises data center to AWS using Direct Connect or a Site-to-Site VPN, BGP handles route advertisement between the two environments. When you need to control which prefixes flow between your corporate network and your cloud VPCs, BGP is how you express that policy. When something breaks and traffic stops flowing between environments, BGP is usually where the investigation starts.

Most cloud engineers have never configured it directly. Most enterprise network engineers have configured it but never in a cloud context. The combination of both is genuinely uncommon, and it is the specific intersection where I have spent my career.

Building a live BGP lab was not an exercise in novelty. It was a way to keep the skills sharp and make the work visible to people who would not otherwise have a way to verify it.


What the Lab Actually Is

Two FRR (FRRouting) instances running on EC2. One represents a cloud-side router. One represents an on-premises router. They establish an eBGP peering session across a strongSwan IPSec tunnel, advertising routes between Autonomous Systems exactly the way a real hybrid environment does.

AS 65001 (cloud)  ←── eBGP / TCP 179 / IPSec AES-256 ──→  AS 65002 (on-prem)

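This is the same architectural pattern that underpins an AWS Site-to-Site VPN with dynamic routing, and the BGP layer behaves the same way over Direct Connect, which is a private link that is not encrypted by default. The pattern is a BGP session over an encrypted transport, route advertisements flowing in both directions, and prefix filtering controlling what each side sees.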

A Lambda function sits behind an API Gateway endpoint and executes router commands via AWS Systems Manager. The portfolio site calls the API, and the output renders in a terminal interface in real time. You can run four commands against either router: BGP summary, BGP table, routing table, interface status.
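The request path above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code behind this site: the action names, the placeholder instance IDs, and both helper functions are hypothetical, and the vtysh show commands are an assumed mapping for the four actions. The one real API call is the boto3 SSM client's send_command with the AWS-RunShellScript document. The point of the design is the allow-list: the caller picks a key, never a command string.

```python
import shlex

# Assumed mapping from the four UI actions to FRR vtysh show commands.
ALLOWED_COMMANDS = {
    "bgp_summary": "show ip bgp summary",
    "bgp_table": "show ip bgp",
    "routing_table": "show ip route",
    "interface_status": "show interface brief",
}

# Hypothetical router names mapped to placeholder EC2 instance IDs.
ROUTERS = {
    "cloud": "i-0000000000000000a",
    "onprem": "i-0000000000000000b",
}


def build_send_command_args(router: str, action: str) -> dict:
    """Translate an allow-listed action into keyword arguments for SSM send_command."""
    if router not in ROUTERS:
        raise ValueError(f"unknown router: {router!r}")
    if action not in ALLOWED_COMMANDS:
        raise ValueError(f"action not allowed: {action!r}")
    # Only fixed strings from the allow-list ever reach the shell.
    shell_line = "sudo vtysh -c " + shlex.quote(ALLOWED_COMMANDS[action])
    return {
        "InstanceIds": [ROUTERS[router]],
        "DocumentName": "AWS-RunShellScript",
        "Parameters": {"commands": [shell_line]},
    }


def run_router_command(ssm_client, router: str, action: str) -> str:
    """Submit the command through an SSM client and return the command ID."""
    response = ssm_client.send_command(**build_send_command_args(router, action))
    return response["Command"]["CommandId"]
```

In a Lambda handler, ssm_client would be boto3.client("ssm"). send_command is asynchronous, so a real handler would then poll get_command_invocation with the returned command ID to collect the router output.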

The whole stack is provisioned with Terraform. If I tear it down and rebuild it, it comes back to an identical state.


What Building It Actually Taught Me

Designing and operating this lab reinforced things I had internalized from production work but had never needed to articulate precisely until I was explaining them to an API endpoint.

Routing and reachability are not the same thing. A BGP session can be fully established, with routes present in the table, and traffic can still fail to flow. The usual causes sit outside BGP entirely: a security group or ACL dropping the packets, a missing return route, or an IPSec policy that does not cover the prefix. The gap between "routes are advertised" and "packets are actually moving" is where most BGP troubleshooting lives. Understanding why that gap exists in a lab, where you control every variable, makes you significantly faster at diagnosing it in production, where you do not.
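The distinction between control plane and data plane can be made concrete with a small Python sketch. Everything here is illustrative: the function names, the addresses, and the idea of passing in a probe result as a boolean are assumptions, not the lab's tooling. It checks the session state first, then whether any advertised prefix covers the destination, and only then whether a probe actually got through.

```python
import ipaddress
from typing import Iterable, Optional


def covering_prefix(destination: str, advertised: Iterable[str]):
    """Return the longest advertised prefix that contains the destination, or None."""
    addr = ipaddress.ip_address(destination)
    networks = [ipaddress.ip_network(p) for p in advertised]
    matches = [n for n in networks if addr in n]
    return max(matches, key=lambda n: n.prefixlen, default=None)


def diagnose(session_state: str, destination: str,
             advertised: Iterable[str], probe_succeeded: bool) -> str:
    """Separate 'the route exists' from 'the packets arrive'."""
    if session_state != "Established":
        return "control plane: BGP session is not established"
    prefix: Optional[object] = covering_prefix(destination, advertised)
    if prefix is None:
        return "control plane: no advertised prefix covers the destination"
    if not probe_succeeded:
        # The interesting case: routing looks healthy, reachability is not.
        return f"data plane: route {prefix} is present but packets are not arriving"
    return f"reachable via {prefix}"


# Hypothetical example: session up, route present, ping failing.
print(diagnose("Established", "10.2.0.15", ["10.2.0.0/16", "10.2.0.0/24"], False))
```

The third branch is the one that matters. When it fires, the next place to look is everything BGP does not control: filters, return paths, and the tunnel's traffic selectors.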

Encryption adds a layer of operational complexity that compounds everything else. The IPSec tunnel between the two FRR instances has to be healthy before BGP can establish. Debugging a BGP session on top of a flapping IPSec SA is a specific kind of diagnostic exercise. You have to work down the stack systematically instead of chasing the most visible symptom. That discipline is valuable in production environments where the two failure modes arrive at the same time.
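Working down the stack can be expressed as an ordered list of checks where the first failure wins. The sketch below is Python with hypothetical check names and hard-coded observations standing in for real probes; it is meant to show the ordering discipline, not a monitoring implementation.

```python
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[], bool]]


def first_failing_layer(checks: List[Check]) -> Optional[str]:
    """Run checks from the bottom of the stack up; return the first that fails."""
    for name, is_healthy in checks:
        if not is_healthy():
            return name
    return None


# Hypothetical observations: the tunnel is down, so everything above it is too.
observed = {
    "ipsec_sa_installed": False,
    "tcp_179_reachable": False,
    "bgp_established": False,
    "prefixes_received": False,
}

# Order matters: each layer depends on the one before it.
checks: List[Check] = [
    ("IPSec SA", lambda: observed["ipsec_sa_installed"]),
    ("TCP 179 to peer", lambda: observed["tcp_179_reachable"]),
    ("BGP session", lambda: observed["bgp_established"]),
    ("Received prefixes", lambda: observed["prefixes_received"]),
]

print(first_failing_layer(checks))  # prints "IPSec SA"
```

Four things are broken in that example, but only one of them is the cause. Reporting the lowest failing layer keeps attention off the BGP session, which is the most visible symptom and the least useful place to start.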

Infrastructure as code makes implicit assumptions explicit. When I provisioned this lab in Terraform, every assumption I had been making informally became something I had to write down: CIDR ranges, security group rules, route table entries, IAM policies for SSM access. Terraform does not tolerate vagueness. That constraint is useful because production does not tolerate it either.
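One of those assumptions, that the address ranges on each side never overlap, is easy to write down as an executable check. This is a small Python sketch using only the standard library; the range names and CIDR values are made up for illustration and are not the lab's actual addressing.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def overlapping_pairs(named_cidrs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of named ranges whose address space overlaps."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in named_cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(networks, 2)
        if networks[a].overlaps(networks[b])
    ]


# Hypothetical addressing plan for a two-sided lab.
plan = {
    "cloud_vpc": "10.1.0.0/16",
    "onprem_lan": "10.2.0.0/16",
    "tunnel_inside": "169.254.10.0/30",
}

print(overlapping_pairs(plan))  # prints []
```

An empty list means the plan is consistent. Overlapping ranges on the two sides of a tunnel are exactly the kind of informal assumption that stays invisible until a route is advertised and traffic goes to the wrong place.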

The most valuable thing a lab project can do is create the conditions for deliberate failure. In production, you prevent failures. In a well-designed lab, you engineer them on purpose and study what happens. The BGP lab has let me intentionally misconfigure route policies, drop IPSec sessions mid-transfer, and introduce path asymmetry to observe how the routing protocol responds. That kind of controlled experimentation does not have a classroom equivalent.


How This Connects to Production Work

In my current role, I design and maintain hybrid connectivity between on-premises data centers and cloud environments in AWS and Azure. The environments serve healthcare infrastructure, which means uptime requirements are real and the cost of a misconfigured route or an unexpected BGP session drop is not abstract.

The lab is a cleaner version of that work. Same protocol stack. Same architectural decisions. Same failure modes. The difference is that in the lab I can break things deliberately and learn from them at my own pace. In production, I apply what I learned in the lab to make sure things do not break in the first place.

That translation from deliberate practice to production discipline is the most useful thing a portfolio project can demonstrate. It shows that the work did not stop at the tutorial. It shows that you understood the material well enough to build something real with it, operate it over time, and extract lessons that transfer back to the work you are actually paid to do.


The BGP session on this site has been running continuously since I launched it. You can check the session uptime and the timestamp on the last query yourself from the Projects section.

I keep it live because uptime is something you prove, not claim.

Want to see the live router output? It is running right now in the Projects section. Or if you are building a team that needs this kind of depth, let's talk.
