Author: Mike

  • Tagged Layer 3 Interfaces vs Router-on-a-Stick: Two Sides of the Same Coin

    Both tagged Layer 3 interfaces and router-on-a-stick use 802.1Q VLAN tagging to multiplex multiple Layer 3 networks over a single physical link. The concepts are nearly identical—the main differences lie in the platform, scale, and typical use cases. Let’s break down what makes them similar and where they diverge.


    The Foundation: 802.1Q VLAN Tagging

    Both designs rely on 802.1Q trunking to carry multiple VLANs across a single physical interface. Each VLAN gets its own Layer 3 subinterface (or logical unit), allowing a single link to handle multiple routed networks simultaneously.

    Think of it like a single fiber optic cable carrying multiple wavelengths of light (DWDM). One physical medium, multiple logical channels.

    Router-on-a-Stick: The Classic Pattern

    How It Works

    Router-on-a-stick connects a router to a Layer 2 switch via a single 802.1Q trunk. The router creates multiple subinterfaces on one physical port, with each subinterface handling routing for a specific VLAN.

    Configuration Example (Cisco Router):

    interface GigabitEthernet0/0
     description Trunk to Layer 2 Switch
     no ip address
    
    interface GigabitEthernet0/0.10
     description VLAN 10 - Finance
     encapsulation dot1Q 10
     ip address 192.168.10.1 255.255.255.0
    
    interface GigabitEthernet0/0.20
     description VLAN 20 - Engineering  
     encapsulation dot1Q 20
     ip address 192.168.20.1 255.255.255.0
    
    interface GigabitEthernet0/0.30
     description VLAN 30 - Guest
     encapsulation dot1Q 30
     ip address 192.168.30.1 255.255.255.0
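    The router side is only half the picture: the switch port facing the router must be configured as an 802.1Q trunk carrying the same VLANs. A hedged sketch of the matching switch-side configuration on a Cisco IOS Layer 2 switch (the port number is an assumption for illustration):

    ```
    ! Hypothetical switch port facing the router's Gi0/0
    interface GigabitEthernet0/1
     description Trunk to Router
     switchport trunk encapsulation dot1q
     switchport mode trunk
     switchport trunk allowed vlan 10,20,30
    ```

    Pruning the allowed VLAN list to exactly what the router routes keeps stray broadcast traffic off the trunk.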

    Primary Use Case

    Inter-VLAN routing in small to medium environments:

    • Branch offices with Layer 2 switches
    • Small campus networks
    • Budget-constrained deployments
    • Networks with light to moderate inter-VLAN traffic

    Tagged Layer 3 Interfaces: The Enterprise Pattern

    How It Works

    Tagged Layer 3 interfaces use the same 802.1Q subinterface concept, but typically on enterprise routers or Layer 3 switches connecting to other Layer 3 devices or provider networks. Rather than inter-VLAN routing for local users, these interfaces often carry:

    • Multiple customer connections (ISP/carrier use case)
    • Different VRFs or routing instances
    • Segregated services over shared infrastructure
    • WAN connections with multiple circuits

    Configuration Examples

    Juniper (Logical Units):

    set interfaces et-0/0/1 description "Carrier_Circuit_to_DMZ_Switch"
    set interfaces et-0/0/1 vlan-tagging
    
    set interfaces et-0/0/1 unit 200 description "ATT"
    set interfaces et-0/0/1 unit 200 vlan-id 200
    set interfaces et-0/0/1 unit 200 family inet address 10.23.59.1/30
    
    set interfaces et-0/0/1 unit 308 description "Zayo"
    set interfaces et-0/0/1 unit 308 vlan-id 308
    set interfaces et-0/0/1 unit 308 family inet address 10.23.58.1/30
    
    set interfaces et-0/0/1 unit 322 description "Lumen"
    set interfaces et-0/0/1 unit 322 vlan-id 322
    set interfaces et-0/0/1 unit 322 family inet address 10.23.57.1/30
    
    set interfaces et-0/0/1 unit 337 description "Verizon"
    set interfaces et-0/0/1 unit 337 vlan-id 337
    set interfaces et-0/0/1 unit 337 family inet address 10.23.56.1/30

    Arista (Subinterfaces with VRFs):

    interface Ethernet3
       description "Verizon"
       no switchport
    
    interface Ethernet3.3011
       description "Customer1"
       encapsulation dot1q vlan 3011
       vrf Cust1
       ip address 10.140.242.45/31
    
    interface Ethernet3.3012
       description "Customer2"
       encapsulation dot1q vlan 3012
       vrf Cust2
       ip address 10.140.242.49/31
    
    interface Ethernet3.3018
       description "Customer3"
       encapsulation dot1q vlan 3018
       vrf Customer3
       ip address 10.140.242.53/31
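    Once the subinterfaces are up, isolation can be checked per VRF. A hedged example of typical EOS-style verification commands (exact output varies by platform and software version):

    ```
    show ip interface brief        ! subinterfaces up with their addresses
    show vrf                       ! VRFs and the interfaces bound to each
    show ip route vrf Cust1        ! routes confined to Customer1's table
    ping vrf Cust1 10.140.242.44   ! reach the far side from within the VRF
    ```

    The key check is that a route learned in one VRF never appears in another's table.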

    Primary Use Cases

    Service multiplexing and network segregation:

    • Carrier/ISP networks serving multiple customers over shared infrastructure
    • Enterprise edge routers with multiple WAN circuits or partners
    • Data center interconnects (DCI) carrying multiple tenants
    • MPLS PE routers with VRF-segregated customers
    • DMZ/extranet environments with strict segmentation requirements

    Key Differences

    | Feature | Router-on-a-Stick | Tagged Layer 3 Interfaces |
    | --- | --- | --- |
    | Typical Platform | Small branch routers (ISR, etc.) | Enterprise routers (MX, ASR, 7xxx) |
    | Connected To | Layer 2 access switch | Layer 3 device, carrier, or upstream |
    | Primary Purpose | Inter-VLAN routing for end users | Service multiplexing, WAN aggregation |
    | Traffic Pattern | East-west (VLAN to VLAN) | North-south (external connections) |
    | VRF Usage | Rarely used | Common (customer/service isolation) |
    | Scale | Typically 3-10 VLANs | Dozens to hundreds |
    | Port Speed | 1G typical | 10G/40G/100G common |
    | Routing Complexity | Simple (default gateway role) | Complex (BGP, OSPF, policy routing) |

    The Real Difference: Context and Scale

    Technically, both designs are doing the same thing: using 802.1Q tagging to create multiple Layer 3 interfaces on a single physical port. The distinctions come down to:

    1. Network Location

    • Router-on-a-stick: Access layer, connecting to end-user VLANs
    • Tagged L3 interfaces: Edge/core, connecting to WAN, partners, or other infrastructure

    2. Traffic Type

    • Router-on-a-stick: Internal traffic between VLANs (Finance ↔ Engineering)
    • Tagged L3 interfaces: External services, customers, or carriers (Bank of America, Wells Fargo, Verizon, AT&T)

    3. Isolation Requirements

    • Router-on-a-stick: Simple VLAN separation, shared routing table
    • Tagged L3 interfaces: Often uses VRFs for strict routing isolation between customers/services

    4. Performance Expectations

    • Router-on-a-stick: Bandwidth bottleneck is an accepted trade-off for simplicity
    • Tagged L3 interfaces: High-speed links (10G+) with hardware-accelerated forwarding

    Real-World Example: Financial Services Edge Router

    In the Arista example above, a single physical interface to a carrier (Verizon) carries three completely isolated customer networks:

    • VLAN 3011: Customer1 connection (VRF: Cust1)
    • VLAN 3012: Customer2 connection (VRF: Cust2)
    • VLAN 3018: Customer3 connection (VRF: Customer3)

    Each subinterface exists in a separate VRF, ensuring complete routing isolation. Traffic from Customer1 can never leak into Customer2's VRF, even though both ride the same physical wire.

    This is service multiplexing—using 802.1Q to deliver multiple isolated services over shared infrastructure.

    When to Use Each Design

    Use Router-on-a-Stick When:

    • You need inter-VLAN routing in a small office or branch
    • You have Layer 2 switches and one router
    • Budget constraints prevent Layer 3 switching
    • Inter-VLAN traffic is moderate and predictable

    Use Tagged Layer 3 Interfaces When:

    • Connecting to carriers, partners, or WAN providers
    • You need strict traffic segregation (VRFs)
    • Multiplexing multiple customers or services over shared links
    • Building data center interconnects or MPLS PE infrastructure
    • Working with high-bandwidth circuits (10G+)

    Common Pitfalls and Considerations

    MTU and Fragmentation

    802.1Q tagging adds 4 bytes to each Ethernet frame, and how that interacts with MTU is platform-dependent. Many platforms carry the tag in addition to the standard 1500-byte payload, so the Layer 3 MTU is unaffected; others count Layer 2 overhead in the configured interface MTU (Juniper, for example) and need the physical MTU raised to preserve a full 1500-byte Layer 3 MTU on each unit. Always verify that the effective Layer 3 MTU matches on both ends of the link to avoid fragmentation or silently dropped packets.
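    On platforms that count Layer 2 overhead in the interface MTU, the fix is to raise the physical MTU to make room for the tag. A hedged Juniper-style sketch (values illustrative, not a recommendation for any specific platform or release):

    ```
    # Physical MTU includes L2 headers, so allow room for the 4-byte 802.1Q tag
    set interfaces et-0/0/1 mtu 1518
    # Each unit can then carry a full 1500-byte IP payload
    set interfaces et-0/0/1 unit 200 family inet mtu 1500
    ```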

    Native VLAN Considerations

    Some platforms allow a “native” (untagged) VLAN on trunk ports. Be explicit about whether you’re using this feature to avoid misconfigurations and potential security issues.

    Performance Monitoring

    Monitor each subinterface individually—don’t just look at the physical interface utilization. One busy subinterface can saturate the link and affect all others.
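    Per-subinterface counters can usually be pulled the same way as physical-interface counters. A hedged EOS-style example (interface names follow the Arista sketch earlier in this post):

    ```
    show interfaces Ethernet3.3011                         ! counters for one customer
    show interfaces counters rates | include Ethernet3     ! quick per-subinterface bps view
    ```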

    QoS and Traffic Shaping

    When multiplexing critical services, implement QoS policies to ensure high-priority traffic (e.g., VoIP, financial transactions) isn’t starved by bulk data transfers.

    Conclusion

    Router-on-a-stick and tagged Layer 3 interfaces are fundamentally the same technology—802.1Q subinterfaces providing Layer 3 routing over a single physical link. The key differences are:

    • Router-on-a-stick: Small-scale inter-VLAN routing for local users
    • Tagged L3 interfaces: Enterprise-scale service multiplexing with VRF isolation

    Both have their place in modern networks. Understanding when and why to use each pattern is essential for designing efficient, scalable infrastructure—whether you’re building a branch office network or connecting to major financial institutions over carrier circuits.


    Working with VLANs, VRFs, or enterprise routing? Let’s connect on LinkedIn

  • Fixing XCP-ng Live Migration Failures: Mixed CPU Generations in a Homelab Pool

    The Problem: When Your Homelab Becomes a Lesson in Enterprise Architecture

    I recently ran into an interesting issue with my XCP-ng homelab that taught me a valuable lesson about virtualization infrastructure design. If you’re running a mixed-hardware pool and your rolling updates keep failing with cryptic CANNOT_EVACUATE_HOST errors, this post is for you.

    The Setup

    My homelab consists of two hosts in an XCP-ng pool (managed via Xen Orchestra):

    • Hera: HP Z640 with Intel Xeon E5-2670 v3 (Haswell, 12c/24t @ 2.30GHz)
    • Zeus: Dell server with Intel Xeon E5-2650 v2 (Ivy Bridge, 8c/16t @ 2.60GHz)

    Seems reasonable, right? Both are Xeon E5 v2/v3 generation processors, both support virtualization, and they’ve been running happily together in a pool for quite some time.

    The Failure: Rolling Updates Hit a Wall

    When I attempted to perform a rolling pool update through Xen Orchestra, I was greeted with this error:

    CANNOT_EVACUATE_HOST(VM_INCOMPATIBLE_WITH_THIS_HOST,
    OpaqueRef:1de8f41d-c39c-b097-026d-c8b687dee6a1,
    OpaqueRef:4f9c343b-8ebd-9ade-7f9b-eaa22844b7dd,
    VM last booted on a CPU with features this host's CPU does not have.)
    

    Similarly, attempting to put Hera into maintenance mode resulted in:

    VM_INCOMPATIBLE_WITH_THIS_HOST(
    OpaqueRef:dd8ccb61-2e86-4853-880f-49f078b0e10d,
    OpaqueRef:4f9c343b-8ebd-9ade-7f9b-eaa22844b7dd,
    VM last booted on a CPU with features this host's CPU does not have.)
    

    The error message is clear enough: some VMs couldn’t be migrated because they were using CPU features that didn’t exist on the destination host.

    Understanding the Root Cause

    Here’s what was actually happening:

    The CPU Generation Gap

    While both hosts use Intel Xeon E5 processors, they’re from different microarchitecture generations:

    | Feature | Hera (E5-2670 v3) | Zeus (E5-2650 v2) |
    | --- | --- | --- |
    | Architecture | Haswell (2014) | Ivy Bridge (2013) |
    | Instruction Sets | AVX, AVX2, BMI2, FMA3 | AVX only |
    | Cores/Threads | 12c/24t | 8c/16t |
    | L3 Cache | 30 MB | 20 MB |

    The Haswell architecture (v3) introduced several new instruction sets that Ivy Bridge (v2) doesn’t support, including:

    • AVX2 (Advanced Vector Extensions 2)
    • BMI2 (Bit Manipulation Instructions 2)
    • FMA3 (Fused Multiply-Add 3)

    How VMs Lock to CPU Features

    When a VM boots on a host, it discovers and can utilize all available CPU features. The hypervisor essentially tells the VM: “Here are all the CPU instructions you can use.”

    Once a VM starts using these features, it expects them to remain available. During live migration, XCP-ng checks: “Does the destination host support all the CPU features this running VM is currently using?”

    In my case:

    • VMs booted on Hera discovered and started using AVX2 and other Haswell-specific features
    • When XCP-ng tried to migrate them to Zeus for patching, Zeus said “I don’t have AVX2”
    • Migration blocked → Pool evacuation failed → Rolling update failed

    The Simple Analogy

    Think of it like a phone app that requires iOS 17 trying to run on a phone with iOS 16. The app expects certain APIs to be available, and when they’re not, it simply won’t run. You can’t hot-swap the phone’s OS mid-operation.

    Finding the Problematic VMs

    The OpaqueRefs in the error messages are internal XAPI object references, not directly useful for identifying VMs. Here’s how I tracked down the culprits:

    List VMs by Host

    # Show all running VMs on Hera
    xe vm-list resident-on=$(xe host-list name-label="Hera - z640" --minimal) \
      power-state=running is-control-domain=false params=name-label,uuid
    

    Trial and Error Method

    Since I had a manageable number of VMs, I:

    1. Identified all VMs running on Hera
    2. Attempted to manually migrate each one to Zeus through XO
    3. The ones that failed were my incompatible VMs

    Through this process, I identified two VMs that couldn’t migrate.
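    With more VMs, the trial-and-error pass can be scripted. A hedged sketch (host name-labels are assumptions from my pool; `xe vm-migrate` failures are simply reported, not parsed):

    ```shell
    #!/bin/bash
    # Attempt to live-migrate every running VM on Hera to Zeus,
    # reporting the ones that fail (likely CPU-feature mismatches).
    SRC=$(xe host-list name-label="Hera - z640" --minimal)
    DST=$(xe host-list name-label="Zeus" --minimal)

    for vm in $(xe vm-list resident-on="$SRC" power-state=running \
                 is-control-domain=false params=uuid --minimal | tr ',' ' '); do
      name=$(xe vm-param-get uuid="$vm" param-name=name-label)
      if xe vm-migrate uuid="$vm" host-uuid="$DST" >/dev/null 2>&1; then
        echo "OK:     $name"
      else
        echo "FAILED: $name (check CPU feature compatibility)"
      fi
    done
    ```

    Note this actually performs the migrations that succeed, so run it in a window where moving VMs is acceptable.
    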

    The Solution: CPU Compatibility Mode

    XCP-ng provides a way to constrain VMs to use only CPU features available across all pool members. This is done via the platform:cpu-type parameter.

    Applying the Fix

    For each problematic VM:

    # Set CPU type to generic (lowest common denominator)
    xe vm-param-set uuid=<VM-UUID> platform:cpu-type=generic
    
    # Verify the setting
    xe vm-param-get uuid=<VM-UUID> param-name=platform
    
    # Reboot the VM for changes to take effect
    xe vm-reboot uuid=<VM-UUID>
    

    After rebooting, the VMs now only use CPU instructions available on both Haswell (v3) and Ivy Bridge (v2) processors.

    What “Generic” Actually Does

    Setting cpu-type=generic instructs the hypervisor to present the VM with a baseline CPU feature set that’s compatible across all hosts in the pool. The VM essentially runs in “compatibility mode,” using only the CPU features guaranteed to exist everywhere.
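    You can see what each host actually exposes by inspecting the host's `cpu_info` parameter, which includes the CPU feature masks. A hedged example (UUID placeholders to fill in; field names vary by XCP-ng version):

    ```
    # List hosts, then compare their CPU feature flags
    xe host-list params=name-label,uuid
    xe host-param-get uuid=<HERA-UUID> param-name=cpu_info
    xe host-param-get uuid=<ZEUS-UUID> param-name=cpu_info
    ```

    Differing feature masks between hosts are the telltale sign that VMs booted on the richer host may refuse to migrate.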

    Performance Impact

    For most workloads, the performance impact is negligible:

    • General compute: No noticeable difference
    • I/O-bound workloads: Unaffected
    • Specific AVX2-optimized applications: Minor performance reduction (typically <5%)

    The trade-off of slightly reduced performance for operational flexibility is well worth it in a homelab environment.

    Verification and Testing

    After applying the fix and rebooting the VMs:

    1. Test manual migration: Successfully migrated both VMs from Hera to Zeus
    2. Maintenance mode: Hera successfully evacuated all VMs to Zeus
    3. Rolling pool update: Completed without errors

    Success! The pool is now fully functional for automated updates.

    Prevention: Applying Pool-Wide

    To prevent this issue from occurring with other VMs in the future, you can apply CPU compatibility mode pool-wide:

    # Apply to all VMs in the pool
    for vm in $(xe vm-list is-control-domain=false params=uuid --minimal | tr ',' ' '); do 
      echo "Setting CPU compatibility for: $(xe vm-param-get uuid=$vm param-name=name-label)"
      xe vm-param-set uuid=$vm platform:cpu-type=generic
    done
    

    Important: VMs must be rebooted for this change to take effect. You can do this gradually during normal maintenance windows.

    The Bigger Lesson: Infrastructure Homogeneity

    This experience reinforced a fundamental principle of enterprise virtualization: infrastructure homogeneity matters.

    Why Matching Hardware is Critical

    Live Migration Requirements:

    • CPU instruction set compatibility
    • Same virtualization extensions (VT-x/AMD-V)
    • Compatible storage and network interfaces

    Operational Simplicity:

    • Predictable performance across the cluster
    • Simplified capacity planning
    • Reduced troubleshooting complexity

    High Availability:

    • VMs can failover to any host without constraints
    • Automated DRS/anti-affinity rules work seamlessly

    Enterprise Best Practices

    In production environments:

    1. Buy in matched sets: Purchase servers in pairs or groups with identical specs
    2. Lifecycle management: Refresh entire clusters together, not piecemeal
    3. Spare parts consistency: Keep compatible spare components
    4. Firmware alignment: Maintain consistent BIOS/firmware versions

    Homelab Reality

    Of course, homelabs are different:

    • We buy what’s affordable or available
    • Hardware comes from various sources (eBay, liquidation sales, hand-me-downs)
    • Mix-and-match is the norm, not the exception

    The good news? XCP-ng provides tools like CPU compatibility mode to work around these limitations.

    Alternative Solutions

    If CPU compatibility mode isn’t acceptable for your use case, consider these alternatives:

    Option 1: Separate Pools

    Run incompatible hosts as separate pools:

    Pros:

    • Each pool runs at full CPU capability
    • No performance compromises

    Cons:

    • No live migration between pools
    • More complex management
    • Reduced flexibility for workload placement

    Option 2: Hardware Standardization

    Upgrade or replace hosts to match specifications:

    Pros:

    • Full feature utilization
    • Operational simplicity
    • Better long-term scalability

    Cons:

    • Higher upfront cost
    • Requires hardware acquisition

    For my homelab, I’m keeping the CPU compatibility mode approach for now. E5-2670 v3 processors are relatively inexpensive on the secondary market (~$20-40), so upgrading Zeus to match Hera is a potential future project.

    Which CPU is Actually Better?

    For those curious, despite Zeus having a higher base clock (2.6 GHz vs 2.3 GHz), Hera is the superior host:

    • 50% more cores: 12c/24t vs 8c/16t = significantly better VM density
    • Newer architecture: Better IPC (instructions per clock)
    • Larger cache: 30MB vs 20MB
    • Advanced instructions: AVX2, BMI2, FMA3 for optimized workloads

    The lesson? More cores and newer architecture generally trump raw clock speed for virtualization workloads.

    Key Takeaways

    1. CPU compatibility matters: Mixed CPU generations in a pool can prevent live migration and automated updates
    2. CPU compatibility mode exists: The platform:cpu-type=generic parameter solves most heterogeneous pool issues
    3. Performance impact is minimal: For most workloads, compatibility mode has negligible performance cost
    4. Homogeneous infrastructure is ideal: Matching hardware simplifies operations and prevents these issues
    5. Homelabs are different: We work with what we have and use workarounds when necessary

    Troubleshooting Checklist

    If you encounter similar issues:

    • ☐ Check CPU models across all pool members
    • ☐ Verify CPU architecture generations match
    • ☐ Review VM placement and migration history
    • ☐ Test manual VM migration to identify incompatible VMs
    • ☐ Apply platform:cpu-type=generic to problematic VMs
    • ☐ Reboot VMs after applying CPU compatibility settings
    • ☐ Consider pool-wide application for future-proofing

    Conclusion

    What started as a frustrating “why won’t my rolling update work?” turned into a valuable learning experience about virtualization architecture fundamentals. The issue was quickly resolved with XCP-ng’s built-in CPU compatibility features, and I gained a deeper appreciation for why enterprise environments invest in hardware consistency.

    For fellow homelabbers running mixed hardware: don’t let CPU generation differences stop you. Apply CPU compatibility mode, reboot your VMs, and get back to the fun stuff—learning, breaking things, and building your infrastructure skills.

    Have you encountered similar issues in your homelab? How did you solve them? Connect with me on LinkedIn and let’s discuss!


    Environment Details:

    • Hypervisor: XCP-ng 8.x
    • Management: Xen Orchestra (latest)
    • Pool: 2 hosts (mixed Intel Xeon E5 v2/v3)
    • Issue: Rolling pool updates failing on CPU incompatibility


    Questions or thoughts? Connect with me on LinkedIn | About mmooresystems

  • Welcome to my journey


    After years of tinkering, breaking things, and occasionally fixing them in my homelab, I figured it was time to start documenting the journey.

    This site is where I’ll be sharing the lessons learned from building enterprise-grade infrastructure at home, the networking concepts that keep me up at night (in a good way), and the occasional “why didn’t anyone tell me this sooner?” moment.

    What to expect:

    • Deep dives into networking protocols (because understanding BGP shouldn’t require a PhD)
    • Homelab projects that actually work (and the 17 failed attempts before that)
    • Infrastructure tutorials for building resilient systems
    • The truth about working in network engineering and SRE roles

    First real post coming soon. In the meantime, check out the About Me page to learn more about who’s behind this chaos.

    Thanks for stopping by.

    – Mike


    Questions? Connect with me on LinkedIn.