Dell AppAssure: Find offline agents and restart the agent service

At my place of business, we sell and support Dell AppAssure for most of our customers. Recently, Dell released version 5.4.1, and I noticed something interesting: occasionally, agents would go “offline” in the Core console. Until I have a chance to figure out why this is happening, I wrote a script that uses the AppAssure PowerShell APIs to find the offline agents and restart the remote AppAssureAgent service.

[Image: AACore-AgentsOffline]

At first, I wrote a script that searched through the AppAssure event log for the error stating which agents were offline in the past hour (the snapshot interval). However, this proved to be rather messy. Perhaps it’s because I don’t know enough about parsing event log messages, but it was hard to get the exact Agent’s computer name from the event message.

Here’s the code from my initial attempt, which includes code to create yet another event log entry telling me that the agent was restarted:

<#
Purpose:	Script to find AppAssure events that indicate an agent is not responding, then restart that service remotely.
Author:		Michael Kenning (mjkenning@gmail.com)
Version:	1.1 (15 MAY 2014) -- (Added code to create event log entry and send an alert email)
Usage: 		.\fixAAagent.ps1

Notes:		AppAssure event ID 35291 indicates a missed snapshot because the agent wasn't responding.
			Tested on AppAssure Core version 5.4.1.77
			The display names of all protected machines should not have the "or" or "has" strings in them as this will break event detection.
			Must create an event log source prior to executing this script, matching the source in the code below:
				e.g. New-EventLog -LogName Application -Source "Fix AA Agent"

#>

# Determine the time to search the log file
$startTime = (Get-Date) - (New-TimeSpan -Hour 1) # Start time is set to one hour prior to the current time. This can be adjusted based on need.
$endTime = (Get-Date) # End time is the current time

# Find the events we are looking for!
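$message = @(Get-WinEvent -FilterHashtable @{LogName='AppAssure'; Id=35291; StartTime=$startTime; EndTime=$endTime} -ErrorAction SilentlyContinue | Select-Object -ExpandProperty Message) # Returns the message text of each event (nothing if no events are found). You can adapt this to search other logs and for other event IDs.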

# Process each message and extract the hostname, then trim the string into a usable variable to pass to the restart code.
foreach ($item in $message){
	$pattern = "(([^on)]+)has)" # Note: [^on)] is a character class, not the word "on". This matches the run of characters other than "o", "n" and ")" that ends at the word "has".
	if ($item -match $pattern) { # Only execute the code below if there is text matching the pattern
		$computerName = $matches[0] # Return the match and assign it to a variable
		$computerName = $computerName.trim() # Trim spaces from the beginning and end of the variable
		$computerName = $computerName.trimend("has") # Trim the letters found at the end of the string
		$computerName = $computerName.trim() # Trim spaces from the beginning and end of the variable

		# Restart the service and create an event log entry.
		# This now sits inside the match check: $matches keeps its value from the previous
		# iteration when a match fails, so testing it separately could restart the wrong agent.
		$ServiceObj = Get-Service -Name "AppAssureAgent" -ComputerName $computerName -ErrorAction Stop # Get the remote service object.
		$result = Restart-Service -InputObject $ServiceObj -PassThru # Restart the service; -PassThru returns the service object so the result can be logged.

		$EventLogEntry = "==============================`nSummary Information:`nAgent found with broken connection:`n$computerName`nResult was: $($result.Status)"
		Write-EventLog -LogName Application -Source "Fix AA Agent" -EntryType Warning -EventID 1 -Message $EventLogEntry
	}
}

This did the job reasonably well, but I couldn’t use agent names with the “or” string in them. This customer has a “.org” domain name, which meant I had to tidy up all the display names to be host name only. Like I said, messy and not what I was looking for.
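Part of the trouble is that `[^on)]` in the pattern above is a character class (any character except "o", "n" or ")"), not the word "on". One way around it is to anchor on the whole words and capture whatever sits between them. The sketch below assumes the display name appears between the words "on" and "has", as the script comments suggest. The sample message and the `agent` group name are made up for illustration, since the exact wording of event 35291 isn't reproduced here.

```powershell
# Hypothetical message text: the real wording of event 35291 may differ.
$sample = 'The snapshot on fileserver01.example.org has been missed because the agent is not responding.'

# \bon\s+        the whole word "on" followed by whitespace
# (?<agent>\S+)  the display name, captured as a run of non-whitespace characters
# \s+has\b       whitespace, then the whole word "has"
$pattern = '\bon\s+(?<agent>\S+)\s+has\b'

if ($sample -match $pattern) {
    # The named group holds only the display name, so no trimming is needed.
    $computerName = $matches['agent']
    $computerName
}
```

A display name containing spaces would not match `\S+`, so this only suits host-style names such as the ".org" ones mentioned above.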

Here’s my second attempt, using the AppAssure APIs. Note, if you haven’t already done so, you will need to import the AppAssure PowerShell module by running the following command:

Import-Module "AppAssurePowerShellModule"
<#
Purpose:	Script to find AppAssure agents that are not responding, then start or restart that service remotely.
Author:		Michael Kenning (mjkenning@gmail.com)
Version:	1.0 (15 MAY 2014)
Usage: 		.\fixAAagent.ps1

Notes:		Tested on:
		1) AppAssure 5.4.1
		2) Windows 2012 R2 (Powershell v4)
#>

# Determine which Agents are not "Online"
$servers = Get-ProtectedServers | Where-Object Status -ne Online | Select-Object -ExpandProperty DisplayName

foreach ($server in $servers){
	# Get the Agent service of the server identified as not online.
	$ServiceObj = Get-Service -Name AppAssureAgent -ComputerName $server -ErrorAction Stop

	# If the service is anything but "Running", start the service
	if ($ServiceObj.Status -ne "Running"){
		Start-Service -InputObject $ServiceObj
	}
	# If the service is in a "Running" status, restart the service.
	else {
		Restart-Service -InputObject $ServiceObj
	}
}

As you can see, this script is much cleaner and neater. I set this script to run in a scheduled task every hour, just before the snapshots take place.
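For anyone wanting to reproduce the hourly schedule, a task can be registered from an elevated prompt with the built-in schtasks.exe. This is only a sketch: the task name, the C:\Scripts path, the 00:55 start time (which assumes snapshots fire on the hour) and the DOMAIN\svc-appassure account are all placeholders to adjust for your environment.

```powershell
# Placeholder values: adjust the task name, script path, start time and account.
$taskName = 'Fix AA Agent'
$command  = 'powershell.exe -NoProfile -ExecutionPolicy Bypass -File C:\Scripts\fixAAagent.ps1'

# /SC HOURLY with /MO 1 repeats the task every hour, starting at the /ST time.
# /RU names the account the task runs as; /RP * prompts for its password.
# That account needs rights to control services on the protected machines.
schtasks.exe /Create /TN $taskName /TR $command /SC HOURLY /MO 1 /ST 00:55 /RU 'DOMAIN\svc-appassure' /RP *
```

Since the task starts a fresh PowerShell session, the Import-Module line presumably needs to sit at the top of the script itself.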

Posted in AppAssure, Powershell
2 comments on “Dell AppAssure: Find offline agents and restart the agent service”
  1. Brian says:

    Michael,

    Thank you very much for this tip. It’s an issue that has plagued us for awhile, and this worked perfectly as a solution. We are running it as a scheduled task every hour.
