Showing posts with label Case Study. Show all posts
Johnny Zhang
I was setting up the BCS team lab, and one of the steps was to install a new vCenter 4.0 U1 server with a SQL 2008 back end. During the installation, the installer complained that port 80 was in use. From past experience, I quickly checked whether IIS was running on the server. It was not.

A quick "netstat -a" showed that port 80 was listening for incoming requests. To find which process is using a port, you can run "netstat -bna"; the result showed that the process using the port was "SQLserver.exe". Wow, since when does SQL Server use port 80? I took a look at SQL Server Configuration Manager, which lists the SQL services. The one that came into the spotlight was "SQL Server Reporting Services". I recalled that on SQL 2005, Reporting Services required IIS. Since IIS was not running here, how did Reporting Services start? Right after I stopped SQL Server Reporting Services, I could move the installation forward (so Reporting Services was the one holding port 80). I did some quick research on SQL 2008 Reporting Services, and here is the reason:

“Reporting Services no longer depends on IIS to provide access to the SOAP endpoint. URLs no longer include Web sites in IIS. Reporting Services uses HTTP.SYS directly to listen for requests on a specific port that you define for report server URLs.”

If you need to run SQL 2008 Reporting Services on the same server as vCenter, keep in mind that it could prevent vCenter from starting, even when vCenter is already installed. To avoid any issues, change either the Reporting Services port or the vCenter port.
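Before running the vCenter installer, you could check whether anything is already accepting connections on port 80. The Python sketch below is only an illustration: `port_in_use` is a hypothetical helper, not part of any VMware or SQL Server tool, and it probes the loopback address by default. It tells you that the port is taken, not who owns it, so "netstat -bna" is still the way to find the owning process.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Hypothetical helper: True if something accepts TCP connections on host:port.

    A successful connect means a listener (for example HTTP.SYS on behalf of
    Reporting Services) already owns the port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            # connect_ex returns 0 on success instead of raising an exception
            return s.connect_ex((host, port)) == 0
        except OSError:
            # e.g. a timeout or an unresolvable host: treat as "not in use"
            return False
```

For example, `port_in_use(80)` returning True on the would-be vCenter server means something is already listening there and should be identified before the install.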
Johnny Zhang
Since most of you are upgrading from VI 3 to vSphere, I'd like to share an issue I worked on with one of the BCS customers.

I received this call when the customer failed to add their ESX 3.5 U4 host to their new vCenter 4.0 server. The error message they hit was "A general system error occurred" (always nice to see a message that gives no direction). As a first step, we tested connecting the VI Client directly to the ESX server, and that worked. Normally, that means hostd is working (this assumption is actually the key issue). So we focused on vpxa.
From the vpxa log:

[2010-05-20 11:29:01.793 server_name error 'App'] [VpxaHalCnxHostagent] caught InvalidProperty::Exception - filter creation failed, temptorary versioning workaround
[2010-05-20 11:29:01.820 server_name verbose 'App'] [VpxVmomi] Invoke error: vpxapi.VpxaService.login session: 5238dc5c-bd8b-fcdd-f450-ab23a826a413 Throw: vmodl.fault.SystemError

We tried many things and nothing worked, so I wanted to see what hostd reported during that time.

Here is the hostd log:
[2010-05-20 13:04:36.281 'App' 52411312 error] AdapterServer caught unexpected exception: vector::reserve
[2010-05-20 13:04:36.281 'App' 52411312 trivia] Completing task haTask vim.event.EventManager.createCollector-389 as 3
[2010-05-20 13:04:36.281 'PropertyProvider' 52411312 trivia] RecordOp 2: info, haTask--vim.event.EventManager.createCollector-389
[2010-05-20 13:04:36.281 'Vmomi' 52411312 info] Activation [N5Vmomi10ActivationE:0x9fd9780] : Invoke done [createCollector] on [vim.event.EventManager:ha-eventmgr]
[2010-05-20 13:04:36.281 'Vmomi' 52411312 info] Throw vmodl.fault.SystemError
[2010-05-20 13:04:36.281 'Vmomi' 52411312 info] Result:
(vmodl.fault.SystemError) {
dynamicType = ,
reason = "",
msg = ""
}
[2010-05-20 13:04:36.325 'Proxysvc Req00153' 18353072 trivia] Client HTTP stream read error
[2010-05-20 13:04:36.325 'Proxysvc Req00153' 18353072 trivia] Closed

The error message "AdapterServer caught unexpected exception: vector::reserve" led me to a problem report. However, that report did not involve vCenter. I asked the customer to connect to the ESX server with the VI Client, select any virtual machine, and then click the "Events" tab. Sure enough, it hit the "A general system error occurred" message, just like in the problem report. The good news is that patch 19 fixed this issue. We installed patch 19 (ESX350-200912401-BG - PATCH) and were then able to add those ESX servers to the new vCenter 4.0.
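The turning point in this case was matching a hostd error string against a known problem report. A minimal Python sketch of that kind of check is below; the `KNOWN_SIGNATURES` table and `find_known_issues` helper are hypothetical, the regular expression assumes the hostd line layout shown in the excerpt above, and the only signature listed is the one from this case.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical lookup table: log substring -> what it pointed to in this case.
KNOWN_SIGNATURES: Dict[str, str] = {
    "AdapterServer caught unexpected exception: vector::reserve":
        "ESX 3.5 hostd issue fixed by patch 19 (ESX350-200912401-BG)",
}

# hostd lines look like: [2010-05-20 13:04:36.281 'App' 52411312 error] message
LINE_RE = re.compile(
    r"^\[(?P<ts>\S+ \S+) '(?P<ctx>[^']*)' (?P<tid>\d+) (?P<level>\w+)\] (?P<msg>.*)$"
)


def find_known_issues(log_text: str) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: (timestamp, message, hint) for each known-signature line."""
    hits = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip continuation lines such as multi-line fault dumps
        for signature, hint in KNOWN_SIGNATURES.items():
            if signature in m.group("msg"):
                hits.append((m.group("ts"), m.group("msg"), hint))
    return hits
```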

If you already have patch 19 installed, you should not run into this issue. If you don't, keep this in your back pocket; it may come in handy.

Johnny Zhang
This is a vCenter case I worked on with one of the BCS customers.

Problem: vCenter starts, but soon afterward it crashes on its own. In the log we saw:

[2010-03-12 11:44:18.178 'App' 4500 error] An unrecoverable problem has occurred, stopping the VMware VirtualCenter service. Check database connectivity before restarting. Error: Error[VdbODBCError] (-1) "ODBC error: (23000) - [Microsoft][SQL Native Client][SQL Server]Violation of PRIMARY KEY constraint 'PK_VPX_IP_ADDRESS'. Cannot insert duplicate key in object 'dbo.VPX_IP_ADDRESS'." is returned when executing SQL statement "INSERT INTO VPX_IP_ADDRESS (ENTITY_ID, DEVICE_ID, IP_ADDRESS) VALUES (?, ?, ?)"
[2010-03-12 11:44:18.178 'App' 4500 verbose] Backtrace:
[2010-03-12 11:44:18.178 'App' 4500 info] Forcing shutdown of VMware VirtualCenter now

So we know the crash was caused by something trying to insert a duplicate row into the table dbo.VPX_IP_ADDRESS. Note that the thread ID is 4500; searching back through the log for earlier entries from the same thread leads us to the problem virtual machine or host. In our case, it was a VM:

[2010-03-12 11:44:17.865 'App' 4500 verbose] [VpxdMoEventManager] Event[2567833]: SOME_VM on SOME_ESX_HOST in SOME_CLUSTER is powered on

Now we could:
  1. Shut down the VM
  2. Remove the host from vCenter
  3. Stop the hostd/vpxa service(s) on that ESX server
  4. Start vCenter and confirm everything is working as designed
  5. Power down the VM and unregister it
  6. Add the host back to vCenter
  7. Check the database to make sure the VM is no longer there
  8. Register the VM again
Now, please don't ask me why one VM can bring down vCenter; that would be off topic :)
Johnny Zhang
I was working on a case with one of the BCS customers. The customer originally planned to extend a Windows 2003 VM's OS disk to a larger size. They extended the VMDK file with the VI Client and then extended the NTFS volume inside Windows (since this was a Windows 2003 guest, the customer needed a helper VM to do that). When the VM then failed to power on, the customer discovered a shocking fact: there was a snapshot. The customer then tried to shrink the VMDK file back to its original size, and a number of other things were also tried. So by the time we started working on this issue, the VMDK file and the snapshot were broken in many different ways. To make things worse, the last and only backup had been created right after the VMDK file was extended.

In the end, we did recover all the data on the VMDK. The VMDK was not good enough to boot the VM, but it was good enough to save the day. Here is how we did it:

First, list the size of the flat file by running "ls -l" in the VM directory.
Keep in mind that the number we need is the size of the flat file, in bytes. In my test setup it is 8589934592. We will need this number later. Next, open the VMDK descriptor file (the pointer file); in my case, snap_test.vmdk:
Note that we need the CID and the RW number.
The RW number is the flat disk size divided by 512 (the sector size in bytes); in my case, 8589934592 / 512 = 16777216. If this number is wrong, you will not be able to power on the VM, so this is the first thing we fixed.
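The RW calculation and the descriptor edit can be sketched in Python. This assumes a single-extent descriptor whose extent line has the form `RW <sectors> VMFS "<flat file>"`, with 512-byte sectors; `sectors_for` and `fix_extent_sectors` are hypothetical helpers, and something like this should only ever be run against a backup copy of the descriptor.

```python
import re

SECTOR_SIZE = 512  # bytes per sector; the RW count is expressed in sectors

# Matches the sector count on an extent line such as: RW 16777216 VMFS "x-flat.vmdk"
EXTENT_RE = re.compile(r"^RW \d+ ", re.MULTILINE)


def sectors_for(flat_size_bytes: int) -> int:
    """Hypothetical helper: turn the flat file size from `ls -l` into an RW count."""
    if flat_size_bytes % SECTOR_SIZE:
        raise ValueError("flat file size is not a whole number of 512-byte sectors")
    return flat_size_bytes // SECTOR_SIZE


def fix_extent_sectors(descriptor: str, sectors: int) -> str:
    """Hypothetical helper: set the sector count on the single RW extent line."""
    if len(EXTENT_RE.findall(descriptor)) != 1:
        raise ValueError("expected exactly one RW extent line (single flat file)")
    return EXTENT_RE.sub(f"RW {sectors} ", descriptor)
```

For the test setup above, `sectors_for(8589934592)` returns 16777216.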

Now let's open the snapshot VMDK descriptor file, snap_test-000001.vmdk.

In this file, "parentCID" should match the CID of snap_test.vmdk, in my case 4cc5f033. If it does not match, the VM will not power on (we fixed this one as well). The RW size will also be wrong, because it still records the original disk size. Change it to the same value as the parent's RW (16777216).
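A Python sketch of the two snapshot descriptor fixes follows. `fix_snapshot_descriptor` is a hypothetical helper; it assumes each descriptor has one `CID=` line and one `RW` extent line, copies the parent's CID into the snapshot's `parentCID=` line, and copies the parent's RW sector count onto the snapshot's extent line.

```python
import re

CID_RE = re.compile(r"^CID=(\w+)", re.MULTILINE)
PARENT_CID_RE = re.compile(r"^parentCID=\w+", re.MULTILINE)
RW_RE = re.compile(r"^RW (\d+) ", re.MULTILINE)


def fix_snapshot_descriptor(parent: str, snapshot: str) -> str:
    """Hypothetical helper: point the snapshot at its parent's CID and RW size."""
    cid = CID_RE.search(parent)
    rw = RW_RE.search(parent)
    if cid is None or rw is None:
        raise ValueError("parent descriptor is missing its CID= or RW line")
    # parentCID in the snapshot must equal the parent's CID
    fixed = PARENT_CID_RE.sub(f"parentCID={cid.group(1)}", snapshot, count=1)
    # the snapshot's RW count must match the (extended) parent size
    return RW_RE.sub(f"RW {rw.group(1)} ", fixed, count=1)
```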
Now we need a helper VM to attach this disk to as a second VMDK (not as the boot disk). When the helper VM's OS boots up, it will automatically run Check Disk against the VMDK and fix the errors. You should then be able to mount the disk inside the helper VM and copy the data off.