Computing Blogs

ESXCLI commands to configure VMware ESXi hosts for Starwind

I have not yet confirmed whether these settings survive upgrades.

Base commands as follows:

esxcli storage nmp device list | grep iSCSI
  Device Display Name: STARWIND iSCSI Disk (eui.46bed2d1ba1297d4)
  Device Display Name: STARWIND iSCSI Disk (eui.e1c4a9d50c1de985)
  Device Display Name: STARWIND iSCSI Disk (eui.75b468162d2473ac)
  Device Display Name: STARWIND iSCSI Disk (eui.0bd446722e88d9a6)

esxcli storage nmp device set --device eui.46bed2d1ba1297d4 --psp VMW_PSP_RR

esxcli storage nmp psp roundrobin deviceconfig set -d eui.46bed2d1ba1297d4 -t iops -I 1

 

Can be put in a script:

for i in $(ls /vmfs/devices/disks/ | grep '^eui\.' | cut -b 1-20 | sort -u); do
  echo "$i"
  esxcli storage nmp device set --device "$i" --psp VMW_PSP_RR
  esxcli storage nmp psp roundrobin deviceconfig set --device "$i" -t iops -I 1
done
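
The loop above touches every eui.* device on the host, StarWind or not. A minimal sketch of a narrower version follows, assuming the display name always contains "STARWIND iSCSI Disk" as in the listing further up. The function names and the ESXCLI override variable are my own invention, not part of esxcli.

```shell
#!/bin/sh
# ESXCLI is a hypothetical override: set ESXCLI=echo to preview the
# commands instead of running them.
ESXCLI=${ESXCLI:-esxcli}

# Read `esxcli storage nmp device list` output on stdin and print the
# eui.* identifier of every device whose display name is a StarWind disk.
starwind_devices() {
    sed -n 's/^ *Device Display Name: STARWIND iSCSI Disk (\(eui\.[0-9a-fA-F]*\)).*$/\1/p'
}

# Read device identifiers on stdin; set round robin with an IOPS limit of 1.
apply_round_robin() {
    while read -r dev; do
        $ESXCLI storage nmp device set --device "$dev" --psp VMW_PSP_RR </dev/null
        $ESXCLI storage nmp psp roundrobin deviceconfig set --device "$dev" -t iops -I 1 </dev/null
    done
}
```

Once the functions are loaded, usage would be: esxcli storage nmp device list | starwind_devices | apply_round_robin
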
Posted by admin in VMware

vCenter Error – Exception occurred in install precheck phase

I see this one a fair bit, both in my own lab and in production environments, when doing vCenter updates.

The simple fix is to SSH to the vCenter appliance, log on with the root credentials (not the Single Sign-On ones), open a bash shell by typing shell, then remove a conf file:

rm /etc/applmgmt/appliance/software_update_state.conf

Retry the update from the VAMI interface again.
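
If deleting the file outright feels too blunt, it can be moved aside instead. This is only a sketch: the function name and the default backup directory (/root) are my own choices, and I am assuming the updater is satisfied as long as the file is no longer at its original path.

```shell
#!/bin/sh
# Move the update state file out of the way, keeping a timestamped copy.
# $1 = state file (defaults to the path from the post)
# $2 = backup directory (hypothetical default: /root)
stash_update_state() {
    conf=${1:-/etc/applmgmt/appliance/software_update_state.conf}
    dest=${2:-/root}
    if [ -f "$conf" ]; then
        backup="$dest/$(basename "$conf").$(date +%Y%m%d%H%M%S).bak"
        mv "$conf" "$backup" && echo "moved $conf to $backup"
    else
        echo "nothing to do: $conf not found"
    fi
}
```

Run stash_update_state with no arguments on the appliance, then retry the update as before.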

 

Anecdotally, it seems to coincide with aborted upgrades, but I have not been able to confirm this.

Posted by admin in VMware

Backup Exec Returns error about Exchange database not mounted

TLDR

Restart the Microsoft Exchange Replication service, either via services.msc GUI, or via:

net stop MSExchangeRepl
net start MSExchangeRepl

 

Recently I was asked to look into why somebody was having the following error during Agent level backups of an Exchange 2013 server:

The job failed with the following error: The database specified for the snapshot was not backed up because the database was not mounted.

Once this had happened, all subsequent Exchange server backups would fail as well. They had always been resolving it with reboots of the Exchange server, based on an article they had found on Veritas, but this always meant a 10-15 minute outage, which they were trying to avoid – they are a small outfit with a single Exchange server.

Reading through the Veritas documentation, the database backups run through the Microsoft-supplied VSS writers, and it looks like the VSS process was getting stuck and hanging. From Exchange 2013 onwards, the VSS writer for Exchange is part of the Microsoft Exchange Replication service. Restarting this service resolved the failed backups.

I suspect it hangs because their Exchange server is virtualised, and their hypervisor’s disk system isn’t really fast enough, with occasional spikes in latency.

Posted by admin in Exchange

Exchange Mailbox Quarantined

Sometimes an Exchange mailbox will get quarantined due to mailbox threads hanging. Reasons may be corruption, AV scanning, backups, storage issues and all manner of things.

For Exchange 2013 onwards:

Check which mailboxes are quarantined:

Get-Mailbox | Get-MailboxStatistics | Where {$_.IsQuarantined -eq $True} |  fl DisplayName

Scan for corruption:

New-MailboxRepairRequest -Mailbox <logon_id_of_mailbox> -CorruptionType AggregateCounts,SearchFolder,ProvisionedFolder,FolderView

Check Repair Progress:

Get-MailboxRepairRequest -mailbox <logon_id_of_mailbox>

Re-enable mailbox:

Disable-MailboxQuarantine <logon_id_of_mailbox>
Posted by admin in Computing Blogs

New Edge and vCenter prompts

Under the new Chromium-based Edge browser, vCenter always prompts when using the Enhanced Authentication Plugin or VMRC.

To enable a checkbox to remember your choice, add the following registry value (you will have to create the Edge key):

HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Edge

dword: ExternalProtocolDialogShowAlwaysOpenCheckbox = 1
Posted by admin in VMware

Suppressing System logs on host are stored on non-persistent storage

This occurs if the scratch location points to RAM rather than a datastore, usually when ESX boots from flash media (as writing the logs would accelerate flash wear).

Obviously the correct fix is to point it to a datastore:

Syslog.global.logDir = [<datastore-name>] /scratch/log

Syslog.global.logDirUnique = true

(The second is needed if multiple ESX hosts are configured to write logs to the same location)

However, there may be scenarios where this is not desirable. For example, if all your datastore storage is VSAN (which doesn’t support having the ESX host that provides VSAN write its logs to the VSAN datastore), and you don’t care about the logs. You simply need the message suppressed so that it does not hide other errors.

Set:

Syslog.global.logHost = udp://127.0.0.1:514

(Or even to a valid syslog server if you have one)
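
The same settings can also be applied from the ESXi shell with esxcli system syslog config set, followed by a reload. A rough sketch follows; the function names and the ESXCLI override variable are mine, and I am assuming the [<datastore-name>] notation corresponds to /vmfs/volumes/<datastore-name>.

```shell
#!/bin/sh
# ESXCLI is a hypothetical override: set ESXCLI=echo to preview commands.
ESXCLI=${ESXCLI:-esxcli}

# Point the logs at a datastore directory, one subdirectory per host.
# $1 = datastore name (assumed to be mounted under /vmfs/volumes)
set_log_dir() {
    $ESXCLI system syslog config set --logdir="/vmfs/volumes/$1/scratch/log" --logdir-unique=true
}

# Give the host a syslog target so the warning clears.
# $1 = target, defaulting to the loopback address used in the post
set_log_host() {
    $ESXCLI system syslog config set --loghost="${1:-udp://127.0.0.1:514}"
}

# Make the syslog service pick up the new configuration.
reload_syslog() {
    $ESXCLI system syslog reload
}
```

For the suppress-only case that would be set_log_host followed by reload_syslog.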

Posted by admin in Computing Blogs, VMware

WSUS Snippets

For diagnosing slow downloads from MS, set download to foreground

$conf=(get-wsusserver).GetConfiguration()
$conf.BitsDownloadPriorityForeground=$true
$conf.save()

Restart BITS to take effect (WSUS has a dependency, so will also restart). Set back to false after testing, obviously, else you may overload your connection.

AdamJ clean-up script – http://www.adamj.org/clean-wsus.html – sadly now commercial :(, yet another example of someone who has been given all the info by everyone else, created a simple, but useful script that so many have become reliant on, and then personal greed has set in.

Limit WID database memory use – http://www.stugr.com/2013/01/24/wsus-limit-sql-windows-internal-database-memory/ – essentially these commands from a cmd prompt:

osql -E -S \\.\pipe\Microsoft##WID\tsql\query

exec sp_configure 'show advanced option', '1';
reconfigure;
exec sp_configure;
go

Check for max server memory in the output. It is likely to be the default of 2147483647 (the value is in MB), which is effectively unlimited.


exec sp_configure 'max server memory', 2048;
reconfigure with override;
go
quit

This sets to 2GB (max server memory = 2048MB)

Clean up disk

Occasionally it's necessary to delete all the downloaded content and redownload it, in order to reclaim disk space that the WSUS clean-up tasks fail to reclaim.

Stop WSUS

Stop BITS

Empty the BITS queue via PowerShell (this rarely works):

Import-Module BitsTransfer
Get-BitsTransfer -AllUsers | Remove-BitsTransfer

Empty the BITS queue (Last resort): (need to stop BITS again if you tried powershell method above)

del "%ALLUSERSPROFILE%\Application Data\Microsoft\Network\Downloader\*.*"

Delete the contents of the WSUSContent folder (but not the folder itself)

Tell WSUS to check all downloaded files are present, and download any missing ones (which will be all of them)

"%systemdrive%\Program Files\Update Services\Tools\wsusutil.exe" RESET

Above command will restart BITS and WSUS, else start them manually

A reboot may be needed if you used the del command to delete BITS queue

Other useful commands:

"%systemdrive%\Program Files\Update Services\Tools\wsusutil.exe" checkhealth
bitsadmin /list /allusers

Posted by admin in WSUS

Using Tape Drive under ESX6

Ultimately, using PCI passthru will always work better if available, but…

esxcli storage nmp device list
naa.500110a00152f5ba
   Device Display Name: HP Serial Attached SCSI Tape (naa.500110a00152f5ba)
   Storage Array Type: VMW_SATP_ALUA
   Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=0,TPG_state=AO}}
   Path Selection Policy: VMW_PSP_MRU
   Path Selection Policy Device Config: Current Path=vmhba3:C2:T0:L0
   Path Selection Policy Device Custom Config:
   Working Paths: vmhba3:C2:T0:L0
   Is USB: false

esxcli storage nmp satp list

esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -V "HP" -M "Ultrium 5-SCSI"

esxcli storage core claiming unclaim -t location -A vmhba3 -C 2 -T 0 -L 0       <<< Must match your HBA

esxcfg-rescan vmhba3                         <<< Must match HBA

esxcli storage nmp device list
naa.600508b1001cac88dec8ea77b73b7083
   Device Display Name: Local HP Disk (naa.600508b1001cac88dec8ea77b73b7083)
   Storage Array Type: VMW_SATP_LOCAL
   Storage Array Type Device Config: SATP VMW_SATP_LOCAL does not support device configuration.
   Path Selection Policy: VMW_PSP_FIXED
   Path Selection Policy Device Config: {preferred=vmhba4:C0:T0:L6;current=vmhba4:C0:T0:L6}
   Path Selection Policy Device Custom Config:
   Working Paths: vmhba4:C0:T0:L6
   Is USB: false

Reboot ESX
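
The -A, -C, -T and -L values for the unclaim command come straight from the runtime path shown under Working Paths. A small sketch that does the translation, so there is less to mistype; the function name is made up for illustration.

```shell
#!/bin/sh
# Turn a runtime path such as vmhba3:C2:T0:L0 into the arguments used by
# `esxcli storage core claiming unclaim`. Prints nothing if the path does
# not have the adapter:Cn:Tn:Ln shape.
unclaim_args() {
    echo "$1" | sed -n 's/^\(vmhba[0-9]*\):C\([0-9]*\):T\([0-9]*\):L\([0-9]*\)$/-t location -A \1 -C \2 -T \3 -L \4/p'
}
```

Usage would be along the lines of: esxcli storage core claiming unclaim $(unclaim_args vmhba3:C2:T0:L0)
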
References:

ESXi 5, HP P212 and LTO 5 tape drive goes offline

https://kb.vmware.com/s/article/1026157

Posted by admin in Computing Blogs, VMware

Opsview/Nagios MySQL DB corruption

Seems the runtime.nagios_servicechecks InnoDB table is prone to corruption if the server crashes.

Often /var/log/opsview/opsviewd.log has errors such as:
[2018/11/12 19:21:24] [import_ndologsd] [WARN] Failed to import 1542029786.120211
[2018/11/12 19:21:24] [import_ndologsd] [FATAL] Error for 1542029791.083171: Can't call method "execute" on an undefined value at /usr/local/nagios/bin/../lib/Opsview/Utils/NDOLogsImporter.pm line 1163.

 

Other InnoDB tables are also prone – to find out which, run:

mysqlcheck -u <user> -p runtime

This displays all the “good” tables up to the bad one. The bad table is the next one in the list after the last table reported as good. To see the full list, in the MySQL CLI, run:

mysql> use runtime;

mysql> show tables;
+----------------------------------------+
| Tables_in_runtime |
+----------------------------------------+
| nagios_acknowledgements |
| nagios_commands |
| nagios_commenthistory |
| nagios_comments |
| nagios_configfiles |
| nagios_configfilevariables |
| nagios_conninfo |
| nagios_contact_addresses |
| nagios_contact_notificationcommands |
| nagios_contactgroup_members |
| nagios_contactgroups |
| nagios_contactnotificationmethods |
| nagios_contactnotifications |
| nagios_contacts |
| nagios_contactstatus |
| nagios_customvariables |
| nagios_customvariablestatus |
| nagios_database_version |
| nagios_dbversion |
| nagios_downtimehistory |
| nagios_eventhandlers |
| nagios_externalcommands |
| nagios_flappinghistory |
| nagios_host_contactgroups |
| nagios_host_contacts |
| nagios_host_parenthosts |
| nagios_hostchecks |
| nagios_hostdependencies |
| nagios_hostescalation_contactgroups |
| nagios_hostescalation_contacts |
| nagios_hostescalations |
| nagios_hostgroup_members |
| nagios_hostgroups |
| nagios_hosts |
| nagios_hoststatus |
| nagios_instances |
| nagios_logentries |
| nagios_notifications |
| nagios_objects |
| nagios_processevents |
| nagios_programstatus |
| nagios_runtimevariables |
| nagios_scheduleddowntime |
| nagios_schema_version |
| nagios_service_contactgroups |
| nagios_service_contacts |
| nagios_servicechecks |
| nagios_servicedependencies |
| nagios_serviceescalation_contactgroups |
| nagios_serviceescalation_contacts |
| nagios_serviceescalations |
| nagios_servicegroup_members |
| nagios_servicegroups |
| nagios_services |
| nagios_servicestatus |
| nagios_statehistory |
| nagios_systemcommands |
| nagios_timedeventqueue |
| nagios_timedevents |
| nagios_timeperiod_timeranges |
| nagios_timeperiods |
| opsview_contact_hosts |
| opsview_contact_objects |
| opsview_contact_services |
| opsview_contacts |
| opsview_database_version |
| opsview_host_objects |
| opsview_host_services |
| opsview_hostgroup_hosts |
| opsview_hostgroups |
| opsview_hosts |
| opsview_hosts_matpaths |
| opsview_monitoringclusternodes |
| opsview_monitoringservers |
| opsview_performance_metrics |
| opsview_servicechecks |
| opsview_servicegroups |
| opsview_topology_map |
| opsview_viewports |
| schema_version |
| snmptrapdebug |
| snmptrapexceptions |
| snmptrapruledebug |
+----------------------------------------+
83 rows in set (0.00 sec)
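
Comparing the two lists by eye gets tedious with 83 tables. A minimal sketch of doing it with files instead, assuming mysqlcheck prints lines of the form "runtime.table_name   OK"; both function names are my own.

```shell
#!/bin/sh
# Read mysqlcheck output on stdin and print the names of tables reported
# as OK, with the leading "database." prefix removed.
ok_tables() {
    awk '$NF == "OK" { sub(/^[^.]*\./, "", $1); print $1 }'
}

# first_unchecked ALL_FILE OK_FILE
# Print the first table in ALL_FILE (one name per line, in "show tables"
# order) that does not appear in OK_FILE.
first_unchecked() {
    grep -vxF -f "$2" "$1" | head -n 1
}
```

Something like mysql -N -u <user> -p -e 'show tables' runtime > all.txt would produce the full list, and mysqlcheck -u <user> -p runtime | ok_tables > ok.txt the good ones. Then first_unchecked all.txt ok.txt names the suspect table.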

 

(Sometimes you might have to add innodb_force_recovery = 4 to the [mysqld] section of my.cnf and restart MySQL. Note - THIS IS DANGEROUS! Stop Opsview first. Remember to remove this line and restart MySQL before restarting Opsview.)

The resolution is to copy as much data as possible out to a duplicate table, drop the original table, and then copy the data back.

 

Stop Opsview:

/etc/init.d/opsview stop
/etc/init.d/opsview-agent stop
/etc/init.d/opsview-web stop

 

mysql -u root -p

use runtime;
create table nagios_servicechecksnew like nagios_servicechecks;
insert nagios_servicechecksnew select * from nagios_servicechecks where servicecheck_id not in (select servicecheck_id from nagios_servicechecksnew);

This will error after a while with a MySQL server crash. After each crash, rerun the command with a limit clause as below, starting with a high limit and slowly reducing it to 1.

insert nagios_servicechecksnew select * from nagios_servicechecks where servicecheck_id not in (select servicecheck_id from nagios_servicechecksnew) limit 1;
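
To save retyping the statement with a different limit each time, the series can be generated. This is just a sketch: halving the limit each round is my own choice (the post only says to start high and reduce slowly), and the function names are made up.

```shell
#!/bin/sh
# Print the limits to try: start at $1 and halve each time, ending at 1.
limits() {
    n=$1
    while [ "$n" -ge 1 ]; do
        echo "$n"
        [ "$n" -eq 1 ] && break
        n=$((n / 2))
    done
}

# Print the copy statement from the post with the given row limit.
batch_sql() {
    printf 'insert nagios_servicechecksnew select * from nagios_servicechecks where servicecheck_id not in (select servicecheck_id from nagios_servicechecksnew) limit %s;\n' "$1"
}
```

For example, for n in $(limits 1000); do batch_sql "$n"; done prints one statement per limit, ready to paste into the MySQL CLI. Each statement may need running more than once before moving down to the next limit.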

Delete table, and then duplicate info back

drop table nagios_servicechecks;
create table nagios_servicechecks like nagios_servicechecksnew;
insert nagios_servicechecks select * from nagios_servicechecksnew where servicecheck_id not in (select servicecheck_id from nagios_servicechecks);
drop table nagios_servicechecksnew;

 

Start Opsview

/etc/init.d/opsview start
/etc/init.d/opsview-agent start
/etc/init.d/opsview-web start

Posted by admin in Computing Blogs

Installing VMware Enhanced Client Integration Plugin in Windows 10

Browse to your vcenter, and select Flash option. At login page, download the client using link at bottom, and save to disk.

Run this file from an Administrative user.

For IE:

Add the base vcenter FQDN (or IP, if using just that) to Local Intranet zone.

Download the Trusted certs from link in bottom right of https://<vcenter fqdn>/

Extract download.zip

Import the 2 certs (MMC > Add Snapin > Certificates > Local Computer, right click trusted roots, select import, and point at one of the extracted certs. Repeat import for other)

Run IE as Administrator (i.e. use the “Run As Administrator” option, not just run as a user with Admin rights), and browse to vcenter FQDN, select Flash link, then on the popup, uncheck the box to ask every time, and select Allow.

All should work now

(If you use web proxies, ensure that https://vmware-plugin:8094 is in the proxy bypass list)

 

For Firefox

Browse to https://vmware-plugin:8094 (this is added to hosts file during client install) and accept the Exception for duff cert.

Browse to vcenter, and accept any Exceptions for duff certs.

All should work now

 

For Chrome

You’re a moron. Use a browser with some element of security.

Although it’s trivial to get the client to work, the fact is you should NEVER use Chrome on Windows until Google go back to the drawing board and rewrite it from scratch, thinking about security from the start.

Posted by admin in VMware