The Problem: When Your Homelab Becomes a Lesson in Enterprise Architecture
I recently ran into an interesting issue with my XCP-ng homelab that taught me a valuable lesson about virtualization infrastructure design. If you’re running a mixed-hardware pool and your rolling updates keep failing with cryptic CANNOT_EVACUATE_HOST errors, this post is for you.
The Setup
My homelab consists of two hosts in an XCP-ng pool (managed via Xen Orchestra):
- Hera: HP Z640 with Intel Xeon E5-2670 v3 (Haswell, 12c/24t @ 2.30GHz)
- Zeus: Dell server with Intel Xeon E5-2650 v2 (Ivy Bridge, 8c/16t @ 2.60GHz)
Seems reasonable, right? Both are Xeon E5 processors from adjacent generations (v2 and v3), both support hardware virtualization, and they’ve been running happily together in a pool for quite some time.
The Failure: Rolling Updates Hit a Wall
When I attempted to perform a rolling pool update through Xen Orchestra, I was greeted with this error:
CANNOT_EVACUATE_HOST(VM_INCOMPATIBLE_WITH_THIS_HOST,
OpaqueRef:1de8f41d-c39c-b097-026d-c8b687dee6a1,
OpaqueRef:4f9c343b-8ebd-9ade-7f9b-eaa22844b7dd,
VM last booted on a CPU with features this host's CPU does not have.)
Similarly, attempting to put Hera into maintenance mode resulted in:
VM_INCOMPATIBLE_WITH_THIS_HOST(
OpaqueRef:dd8ccb61-2e86-4853-880f-49f078b0e10d,
OpaqueRef:4f9c343b-8ebd-9ade-7f9b-eaa22844b7dd,
VM last booted on a CPU with features this host's CPU does not have.)
The error message is clear enough: some VMs can’t be migrated because they last booted with CPU features that don’t exist on the destination host.
Understanding the Root Cause
Here’s what was actually happening:
The CPU Generation Gap
While both hosts use Intel Xeon E5 processors, they’re from different microarchitecture generations:
| Feature | Hera (E5-2670 v3) | Zeus (E5-2650 v2) |
|---|---|---|
| Architecture | Haswell (2014) | Ivy Bridge (2013) |
| Relevant Instruction Sets | AVX, AVX2, BMI2, FMA3 | AVX (no AVX2, BMI2, or FMA3) |
| Cores/Threads | 12c/24t | 8c/16t |
| L3 Cache | 30 MB | 20 MB |
The Haswell architecture (v3) introduced several new instruction sets that Ivy Bridge (v2) doesn’t support, including:
- AVX2 (Advanced Vector Extensions 2)
- BMI2 (Bit Manipulation Instructions 2)
- FMA3 (Fused Multiply-Add 3)
How VMs Lock to CPU Features
When a VM boots on a host, it discovers the CPU features the hypervisor exposes to it and is free to use any of them. The hypervisor essentially tells the VM: “Here are all the CPU instructions you can use.”
Once a VM has been offered these features, it expects them to remain available until its next boot. During live migration, XCP-ng checks: “Does the destination host support all the CPU features this VM was booted with?”
In my case:
- VMs booted on Hera discovered and started using AVX2 and other Haswell-specific features
- When XCP-ng tried to migrate them to Zeus for patching, Zeus said “I don’t have AVX2”
- Migration blocked → Pool evacuation failed → Rolling update failed
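The check described above amounts to a subset test: every feature the VM was booted with must also exist on the destination. Here is a minimal sketch of that idea in plain shell. The function names `missing_features` and `can_migrate` are hypothetical helpers, not part of XCP-ng, and the inputs are assumed to be space-separated flag names such as those on the `flags` line of `/proc/cpuinfo`.

```shell
# missing_features VM_FEATURES HOST_FEATURES
# Prints every feature in VM_FEATURES that HOST_FEATURES lacks, one per line.
# Both arguments are space-separated flag names (illustrative inputs only).
missing_features() {
    # Normalise the host list to " flag1 flag2 ... " so each flag can be
    # matched as a whole word.
    host_flags=" $(printf '%s ' $2)"
    for feature in $1; do
        case "$host_flags" in
            *" $feature "*) ;;                 # destination has this feature
            *) printf '%s\n' "$feature" ;;     # destination lacks it
        esac
    done
}

# can_migrate VM_FEATURES HOST_FEATURES
# Succeeds only when the destination offers every feature the VM booted with.
can_migrate() {
    [ -z "$(missing_features "$1" "$2")" ]
}
```

Called with a Haswell-style list as the first argument and an Ivy Bridge-style list as the second, `missing_features` would print flags such as `avx2`, `bmi2`, and `fma` (the Linux name for FMA3). The real check inside XAPI works on feature bitmasks rather than flag names, so treat this purely as an illustration of the logic.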
The Simple Analogy
Think of it like a phone app that requires iOS 17 trying to run on a phone with iOS 16. The app expects certain APIs to be available, and when they’re not, it simply won’t run. You can’t hot-swap the phone’s OS mid-operation.
Finding the Problematic VMs
The OpaqueRefs in the error messages are internal XAPI object references, not directly useful for identifying VMs. Here’s how I tracked down the culprits:
List VMs by Host
# Show all running VMs on Hera
xe vm-list resident-on="$(xe host-list name-label="Hera - z640" --minimal)" \
    power-state=running is-control-domain=false params=name-label,uuid
Trial and Error Method
Since I had a manageable number of VMs, I:
- Identified all VMs running on Hera
- Attempted to manually migrate each one to Zeus through XO
- The ones that failed were my incompatible VMs
Through this process, I identified two VMs that couldn’t migrate.
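With more than a handful of VMs, the same trial-and-error approach could be scripted from the CLI instead of clicking through XO. The sketch below assumes it runs on a pool member where `xe` is available. `try_migrations` is a hypothetical helper name, and note that it performs real live migrations: every compatible VM ends up on the destination host.

```shell
# try_migrations DEST_HOST VM_UUID...
# Attempts a live migration of each VM to DEST_HOST (name or UUID) and
# reports the ones that xe refuses to move. VMs that succeed really do move.
try_migrations() {
    dest=$1
    shift
    for vm in "$@"; do
        if ! xe vm-migrate uuid="$vm" host="$dest" live=true >/dev/null 2>&1; then
            name=$(xe vm-param-get uuid="$vm" param-name=name-label)
            printf 'FAILED: %s (%s)\n' "$name" "$vm"
        fi
    done
}

# Example call (commented out so nothing migrates by accident):
# try_migrations "<ZEUS-HOST-NAME>" $(xe vm-list resident-on=<HERA-UUID> \
#     power-state=running is-control-domain=false params=uuid --minimal | tr ',' ' ')
```

A failed migration can have causes other than CPU features (storage, memory, networking), so it is worth rerunning the failing `xe vm-migrate` command without the output redirection to read the actual error.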
The Solution: CPU Compatibility Mode
XCP-ng provides a way to constrain VMs to use only CPU features available across all pool members. This is done via the platform:cpu-type parameter.
Applying the Fix
For each problematic VM:
# Set CPU type to generic (lowest common denominator)
xe vm-param-set uuid=<VM-UUID> platform:cpu-type=generic
# Verify the setting
xe vm-param-get uuid=<VM-UUID> param-name=platform
# Reboot the VM for changes to take effect
xe vm-reboot uuid=<VM-UUID>
After rebooting, the VMs now only use CPU instructions available on both Haswell (v3) and Ivy Bridge (v2) processors.
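One way to confirm the change from inside a Linux guest is to look at the CPU flags the guest actually sees after the reboot. This is a small sketch, assuming a Linux guest with `/proc/cpuinfo`. `cpu_has_flag` is a hypothetical helper, and the optional second argument exists only so the function can be pointed at a saved copy of the file.

```shell
# cpu_has_flag FLAG [CPUINFO_FILE]
# Succeeds if FLAG appears on the first "flags" line of the cpuinfo file.
cpu_has_flag() {
    flag=$1
    cpuinfo=${2:-/proc/cpuinfo}
    # -w matches whole words only, so "avx" does not match "avx2"
    grep -m1 '^flags' "$cpuinfo" | grep -qw -- "$flag"
}

# Example: report on the Haswell-only features discussed earlier
# for f in avx2 bmi2 fma; do
#     if cpu_has_flag "$f"; then echo "$f: visible"; else echo "$f: hidden"; fi
# done
```

If the guest still reports `avx2` after a full reboot, the restricted feature set has not been applied to that VM.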
What “Generic” Actually Does
Setting cpu-type=generic instructs the hypervisor to present the VM with a baseline CPU feature set that’s compatible across all hosts in the pool. The VM essentially runs in “compatibility mode,” using only the CPU features guaranteed to exist everywhere.
Performance Impact
For most workloads, the performance impact is negligible:
- General compute: No noticeable difference
- I/O-bound workloads: Unaffected
- Specific AVX2-optimized applications: Minor performance reduction (typically <5%)
The trade-off of slightly reduced performance for operational flexibility is well worth it in a homelab environment.
Verification and Testing
After applying the fix and rebooting the VMs:
- Test manual migration: Successfully migrated both VMs from Hera to Zeus
- Maintenance mode: Hera successfully evacuated all VMs to Zeus
- Rolling pool update: Completed without errors
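The maintenance-mode test can also be run from the CLI. The sketch below is roughly what that step looks like with `xe`, assuming the host UUID is known. `evacuate_host` is a hypothetical wrapper, and Xen Orchestra may perform additional steps that are not shown here.

```shell
# evacuate_host HOST_UUID
# Disables the host, asks XAPI to move all resident VMs elsewhere, and
# succeeds only if no running guest VMs remain on it afterwards.
evacuate_host() {
    host=$1
    xe host-disable uuid="$host" || return 1
    if ! xe host-evacuate uuid="$host"; then
        echo "evacuation failed; re-enabling host" >&2
        xe host-enable uuid="$host"
        return 1
    fi
    remaining=$(xe vm-list resident-on="$host" is-control-domain=false \
        power-state=running params=uuid --minimal)
    [ -z "$remaining" ]
}

# After a successful test, bring the host back with:
# xe host-enable uuid=<HOST-UUID>
```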
Success! The pool is now fully functional for automated updates.
Prevention: Applying Pool-Wide
To prevent this issue from occurring with other VMs in the future, you can apply CPU compatibility mode pool-wide:
# Apply to all VMs in the pool
for vm in $(xe vm-list is-control-domain=false params=uuid --minimal | tr ',' ' '); do
    echo "Setting CPU compatibility for: $(xe vm-param-get uuid="$vm" param-name=name-label)"
    xe vm-param-set uuid="$vm" platform:cpu-type=generic
done
Important: VMs must be rebooted for this change to take effect. You can do this gradually during normal maintenance windows.
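Because the loop above touches every VM, it helps to be able to see afterwards which VMs actually carry the setting. The following sketch reads the `cpu-type` key back from each VM’s `platform` map. `report_cpu_type` is a hypothetical helper, and it only shows what is stored in the VM record, not whether the VM has been rebooted since.

```shell
# report_cpu_type VM_UUID...
# Prints "name<TAB>value" for each VM, where value is the platform:cpu-type
# setting or "(unset)" when the key is absent from the platform map.
report_cpu_type() {
    for vm in "$@"; do
        name=$(xe vm-param-get uuid="$vm" param-name=name-label)
        value=$(xe vm-param-get uuid="$vm" param-name=platform \
            param-key=cpu-type 2>/dev/null) || value="(unset)"
        printf '%s\t%s\n' "$name" "$value"
    done
}

# Example call:
# report_cpu_type $(xe vm-list is-control-domain=false params=uuid --minimal | tr ',' ' ')
```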
The Bigger Lesson: Infrastructure Homogeneity
This experience reinforced a fundamental principle of enterprise virtualization: infrastructure homogeneity matters.
Why Matching Hardware is Critical
Live Migration Requirements:
- CPU instruction set compatibility
- Same virtualization extensions (VT-x/AMD-V)
- Compatible storage and network interfaces
Operational Simplicity:
- Predictable performance across the cluster
- Simplified capacity planning
- Reduced troubleshooting complexity
High Availability:
- VMs can failover to any host without constraints
- Automated load balancing and anti-affinity rules (DRS, in VMware terms) work seamlessly
Enterprise Best Practices
In production environments:
- Buy in matched sets: Purchase servers in pairs or groups with identical specs
- Lifecycle management: Refresh entire clusters together, not piecemeal
- Spare parts consistency: Keep compatible spare components
- Firmware alignment: Maintain consistent BIOS/firmware versions
Homelab Reality
Of course, homelabs are different:
- We buy what’s affordable or available
- Hardware comes from various sources (eBay, liquidation sales, hand-me-downs)
- Mix-and-match is the norm, not the exception
The good news? XCP-ng provides tools like CPU compatibility mode to work around these limitations.
Alternative Solutions
If CPU compatibility mode isn’t acceptable for your use case, consider these alternatives:
Option 1: Separate Pools
Run incompatible hosts as separate pools:
Pros:
- Each pool runs at full CPU capability
- No performance compromises
Cons:
- No live migration between pools
- More complex management
- Reduced flexibility for workload placement
Option 2: Hardware Standardization
Upgrade or replace hosts to match specifications:
Pros:
- Full feature utilization
- Operational simplicity
- Better long-term scalability
Cons:
- Higher upfront cost
- Requires hardware acquisition
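For my homelab, I’m keeping the CPU compatibility mode approach for now. E5-2670 v3 processors are relatively inexpensive on the secondary market (~$20-40), but they use the LGA 2011-3 socket and DDR4 memory, so they won’t drop into Zeus’s Ivy Bridge (LGA 2011, DDR3) platform. Matching Hera would mean replacing the whole host, which is a potential future project.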
Which CPU is Actually Better?
For those curious, despite Zeus having a higher base clock (2.6 GHz vs 2.3 GHz), Hera is the superior host:
- 50% more cores: 12c/24t vs 8c/16t = significantly better VM density
- Newer architecture: Better IPC (instructions per clock)
- Larger cache: 30MB vs 20MB
- Advanced instructions: AVX2, BMI2, FMA3 for optimized workloads
The lesson? More cores and newer architecture generally trump raw clock speed for virtualization workloads.
Key Takeaways
- CPU compatibility matters: Mixed CPU generations in a pool can prevent live migration and automated updates
- CPU compatibility mode exists: The platform:cpu-type=generic parameter solves most heterogeneous pool issues
- Performance impact is minimal: For most workloads, compatibility mode has negligible performance cost
- Homogeneous infrastructure is ideal: Matching hardware simplifies operations and prevents these issues
- Homelabs are different: We work with what we have and use workarounds when necessary
Troubleshooting Checklist
If you encounter similar issues:
- ☐ Check CPU models across all pool members
- ☐ Verify CPU architecture generations match
- ☐ Review VM placement and migration history
- ☐ Test manual VM migration to identify incompatible VMs
- ☐ Apply platform:cpu-type=generic to problematic VMs
- ☐ Reboot VMs after applying CPU compatibility settings
- ☐ Consider pool-wide application for future-proofing
Conclusion
What started as a frustrating “why won’t my rolling update work?” turned into a valuable learning experience about virtualization architecture fundamentals. The issue was quickly resolved with XCP-ng’s built-in CPU compatibility features, and I gained a deeper appreciation for why enterprise environments invest in hardware consistency.
For fellow homelabbers running mixed hardware: don’t let CPU generation differences stop you. Apply CPU compatibility mode, reboot your VMs, and get back to the fun stuff: learning, breaking things, and building your infrastructure skills.
Have you encountered similar issues in your homelab? How did you solve them? Connect with me on LinkedIn and let’s discuss!
Environment Details:
- Hypervisor: XCP-ng 8.x
- Management: Xen Orchestra (latest)
- Pool: 2 hosts (mixed Intel Xeon E5 v2/v3)
- Issue: Rolling pool updates failing on CPU incompatibility