Well, that was a long weekend! My cloud management automation team and I started work bright and early Saturday morning, having notified our business unit customers several days in advance that the portal would be down on Saturday while we completed the upgrade. We spent approximately 30 hours over the weekend pushing through the upgrade. Most of that time was due to the large scale of our deployment (thousands of VMs, over 1,000 blueprints, 219 business groups, and hundreds of entitlements) and the need to test properly so that our customers would have a healthy environment on Monday morning.
Here are some notes that I took during the upgrade process – it is anything but simple, and a lot of improvements could be made by the vRealize engineering team to enhance the customer experience.
First, take snapshots of every VM, source and target. Take snapshots before and after Pre-Migration, and before and after Migration. This will allow you to completely roll back your environment in case something goes horribly wrong. Take SQL database backups as well, prior to the snapshot, so they are included in the snapshot.
Preparing the Source System
Before you begin the Pre-Migration and make any changes to the environments, understand that any lease extension requests or machine approval requests in flight will be lost during the migration process. For us this meant that we went ahead and approved any pending lease extension requests, notifying the Business Group Managers that we had done so, as a lost lease extension request could cause expiration of a machine. We deemed that new machine requests were less critical, as the owner could simply request the machine again.
Allow all pending workflows to complete, and uninstall any custom workflow stubs by using cloudutil from the CDK.
Self-Signed SSL Certificates and Pre-Migration
If you are migrating and want to keep your portal website URL the same, you’ll need to do this:
- Generate a self-signed certificate on your existing IaaS server.
- Import the self-signed certificate into both the existing IaaS server and your new IaaS server, so that it is a trusted certificate on each. If you’re not sure how to do this, see this article.
- Edit the binding in IIS manager for port 443 to use the self-signed certificate, instead of your current certificate.
- You’ll also need to edit both the Manager.config and Web.config files for the Manager service and Repository service to point to the FQDN used for the self-signed certificate, then restart the services, recycle the IIS application pools, and do an “iisreset.”
- Verify that you can browse to your existing vCAC portal, using the FQDN, from the new vRA IaaS server, and that:
  - You can still load the portal (Model Manager/Repository is working).
  - The SSL certificate is trusted and you don’t receive any security warnings.
The reason you have to do this is that both the Pre-Migration and the actual Migration require a trusted SSL certificate on the source system. Both our source system and target system will use “onecloud.mckesson.com” for the URL, but during the migration, I need to address source and target individually. I can’t just hack around this with a hosts file, because then the FQDN won’t match the SSL certificate’s common name, which makes the SSL certificate untrusted.
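The trust check described above can also be scripted. The following is a minimal Python sketch, not part of the original procedure: the function names are hypothetical, it uses only the standard library, and it assumes Python's default trust store on the new IaaS server includes the imported self-signed certificate. It also illustrates why the hosts-file shortcut fails: the name you connect to has to appear in the certificate.

```python
import socket
import ssl


def cert_names(cert):
    """Collect DNS names from a certificate dict as returned by SSLSocket.getpeercert()."""
    names = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.append(value)
    return names


def name_matches(hostname, cert):
    """Exact, case-insensitive comparison. Wildcard certificates are not handled here."""
    return hostname.lower() in (name.lower() for name in cert_names(cert))


def fetch_trusted_cert(hostname, port=443, timeout=10):
    """Connect with certificate verification and hostname checking enabled.

    Raises ssl.SSLCertVerificationError if the certificate is not trusted
    or does not match the hostname used for the connection.
    """
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()
```

Calling `fetch_trusted_cert` with the source server's FQDN from the new IaaS server either returns the certificate details or raises `ssl.SSLCertVerificationError`, which corresponds to the browser security warning you are trying to avoid.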
Once you have the SSL trust re-established using self-signed certificates, you can proceed with Pre-Migration.
One of the byproducts of having a long-running vCAC installation (ours has been up since 2013) is that you will inevitably have orphaned work items in the dbo.WorkItems table. Since these work items will likely be months or even years old, they are never going to complete successfully, and can be safely deleted. First, confirm that the table holds few or no recent work items. Once you have done so, you can delete them all:
DELETE from dbo.WorkItems
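The "check first, then delete" step can be expressed as a guard. The sketch below is only an illustration of that pattern: it runs against an in-memory SQLite table standing in for the real SQL Server database, and both the `CreatedAt` column and the `purge_if_stale` helper are hypothetical names, not the actual vCAC schema. Check the real table for a suitable timestamp column before adapting it.

```python
import sqlite3


def purge_if_stale(conn, cutoff_iso, max_recent=0):
    """Delete all work items, but only if few enough are newer than the cutoff.

    Returns the number of rows removed; raises RuntimeError otherwise.
    """
    # "CreatedAt" is a hypothetical column name used for illustration only.
    recent = conn.execute(
        "SELECT COUNT(*) FROM WorkItems WHERE CreatedAt >= ?", (cutoff_iso,)
    ).fetchone()[0]
    if recent > max_recent:
        raise RuntimeError(f"{recent} recent work items found; not deleting")
    total = conn.execute("SELECT COUNT(*) FROM WorkItems").fetchone()[0]
    with conn:  # commits on success, rolls back if the delete fails
        conn.execute("DELETE FROM WorkItems")
    return total


# Stand-in database so the sketch can run anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WorkItems (Id INTEGER PRIMARY KEY, CreatedAt TEXT)")
conn.executemany(
    "INSERT INTO WorkItems (CreatedAt) VALUES (?)",
    [("2013-11-02T08:00:00",), ("2014-03-15T12:30:00",)],
)
conn.commit()
```

With only old rows present, `purge_if_stale(conn, "2015-01-01T00:00:00")` removes both rows; if any row were newer than the cutoff, it would refuse and leave the table untouched.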
Performing the Migration
During the actual Migration, point the target system to the primary IaaS server, not the load balancer. It needs a self-signed certificate that is trusted, so you might need to update the IIS binding, as you did above on the source IaaS server.
After the Migration
One of the first things you need to do after the Migration, when you bring up all of the components in the new system, is to do an Inventory Data Collection across all of your Compute Resources so that the MoRef (Managed Object Reference) of each VM can be updated in the vRA database. I found a trick to do this automatically without having to manually kick it off from the portal:
UPDATE dbo.DataCollectionStatus SET LastCollectedTime=NULL, CollectionStartTime=NULL
What this does is set LastCollectedTime and CollectionStartTime to NULL for each Compute Resource so that vRA will immediately initiate a new Inventory Data Collection on each, just as if those Compute Resources were brand new and had never run data collection.
Guest Agent Reconfiguration
If you’re upgrading from 5.x to 6.x, you’ll need to update your Guest Agents to point to the IaaS load balancer, instead of the primary portal URL. This is due to the architectural change that placed the tcServer in front of the IaaS tier in the application. Here are steps to do this for each type of Guest Agent:
Windows Guest Agent
If you are using an older Windows Guest Agent, unfortunately, you’ll most likely have to upgrade to the new version, which is a port of the Linux Guest Agent. Download the new Guest Agent from https://vra-appliance/i/ and be sure to Unblock the downloaded file. Otherwise UAC will block the Guest Agent from running and you might waste a lot of time trying to figure out why (I know I did…). Right-click the Zip file you downloaded from the vRA Appliance, select Properties, then click Unblock.
By the way, this didn’t seem to be as much of a problem in 5.x, since the file came from the vCAC binaries. Now that you get the file from the website of the vRA Appliance, unless that website is in your Trusted Websites list, it will get blocked from execution automatically.
To uninstall the old Windows Service, run the following command:
To install the new Windows Service, run the following command:
C:\VRMGuestAgent\winservice.exe -i -p SSL -h iaas_load_balancer:443
If your SSL certificate has changed, you can remove the cert.pem file from C:\VRMGuestAgent, and the Guest Agent will automatically download the cert.pem from the IaaS load balancer upon execution.
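If you have many Windows machines to touch, the cert.pem cleanup can be scripted. This is a small sketch under assumptions: `clear_cached_cert` is a hypothetical helper, and it renames the file to `cert.pem.bak` rather than deleting it, so the old certificate can be restored if needed. The Guest Agent should then find no cert.pem and download a fresh one, as described above.

```python
from pathlib import Path


def clear_cached_cert(agent_dir):
    """Move cert.pem aside so the Guest Agent fetches a fresh copy on its next run.

    Returns the backup path, or None if no cert.pem was present.
    """
    cert = Path(agent_dir) / "cert.pem"
    if not cert.exists():
        return None
    backup = cert.with_name("cert.pem.bak")  # hypothetical backup name
    cert.replace(backup)  # overwrites any older backup
    return backup
```

For the layout described here, the call would be `clear_cached_cert(r"C:\VRMGuestAgent")`.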
Linux Guest Agent
We didn’t have to update the Linux Guest Agent from the 5.2 version. It still seems to work fine, but you will need to reconfigure it to point at the IaaS load balancer:
# rm -rf /usr/share/log
# cd /usr/share/gugent
# ./installgugent.sh iaas_load_balancer:443 ssl
Please note that the documentation for Linux Guest Agent installation is actually incorrect and will be updated (we filed a PR with VMware) to reflect the correct way to install the Linux Guest Agent shown above. The document says you should put a hyphen in front of the IaaS hostname. The hyphen is not necessary, and it will actually cause the connection to fail because the resulting hostname is invalid.
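If you roll the reconfiguration out with a script, a small guard can catch the stray hyphen before it reaches the installer. The following is a sketch under assumptions: `build_install_command` is a hypothetical helper, and it only assembles the argument list for the installgugent.sh call shown above, rejecting endpoints whose host labels start or end with a hyphen.

```python
def build_install_command(endpoint, protocol="ssl"):
    """Return the argument list for installgugent.sh after basic endpoint checks."""
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {endpoint!r}")
    for label in host.split("."):
        # A leading or trailing hyphen makes the hostname invalid.
        if not label or label.startswith("-") or label.endswith("-"):
            raise ValueError(f"invalid hostname in {endpoint!r}")
    return ["./installgugent.sh", f"{host}:{port}", protocol]
```

The resulting list can be handed to `subprocess.run(cmd, cwd="/usr/share/gugent", check=True)` on the Linux guest.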
I hope you have a successful migration. We were able to complete ours, but we still have a number of PRs open with VMware and are awaiting some critical patches. The good news is that if you are performing your upgrade after 6.2 Service Pack 1 is released, I am told that they will try to get most of the patches created for us into that release, so the product should be much more stable. Here is our shiny new Service Catalog: