Dajbych.net


Health Monitoring of the Service Fabric App Upgrade



Deploying an update of any application is risky because the new code may contain new bugs. Unit testing reduces that risk, but some behavior depends on the workload, and some workloads are much harder to simulate in tests than others. Service Fabric can monitor the health of an application after a new version is deployed to the cluster. If the new version is not healthy, the upgrade can be rolled back to the previous version automatically. Setting up this protection against failed upgrades is relatively easy.

Create a new Service Fabric application with a stateful service named Stateful1, open the Stateful1.cs file, and replace its content with the following code:

using Microsoft.ServiceFabric.Services.Communication.Runtime;
using Microsoft.ServiceFabric.Services.Runtime;
using System;
using System.Collections.Generic;
using System.Fabric;
using System.Fabric.Health;
using System.Threading;
using System.Threading.Tasks;

namespace Stateful1 {
    internal sealed class Stateful1 : StatefulService {

        public Stateful1(StatefulServiceContext context) : base(context) { }

        protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners() {
            return new ServiceReplicaListener[0];
        }

        protected override async Task RunAsync(CancellationToken cancellationToken) {
            // Reuse the activation context this service already owns instead of
            // calling FabricRuntime.GetActivationContext() on every iteration.
            var activationContext = Context.CodePackageActivationContext;
            var version = activationContext.GetServiceManifestVersion();

            ServiceEventSource.Current.ServiceMessage(Context, "version: {0}", version);

            // Version 1.0.0 refreshes its report (every 10 s) long before it expires (after 1 min).
            // Any other version simulates a performance regression: the report expires
            // after 10 s, but the next refresh comes only after 30 s of work.
            var isOriginalVersion = version == "1.0.0";
            var timeToLive = isOriginalVersion ? TimeSpan.FromMinutes(1) : TimeSpan.FromSeconds(10);
            var workDuration = isOriginalVersion ? TimeSpan.FromSeconds(10) : TimeSpan.FromSeconds(30);

            while (!cancellationToken.IsCancellationRequested) {
                // Report the Watchdog property as Ok; the report is valid only for timeToLive.
                var healthInformation = new HealthInformation(nameof(Stateful1), "Watchdog", HealthState.Ok) {
                    TimeToLive = timeToLive
                };
                activationContext.ReportDeployedServicePackageHealth(healthInformation);

                // Simulated unit of work; throws OperationCanceledException when the replica shuts down.
                await Task.Delay(workDuration, cancellationToken);
            }
        }

    }
}

The HealthInformation class describes the health state of one property (here, Watchdog) reported by one source (here, Stateful1). The health of an entity, in this case the deployed service package, is aggregated from all of its reported properties. A health report can remain valid until it is replaced by a newer one, or it can be valid only for a limited time, in which case the reporter must refresh it periodically. For the latter, set the TimeToLive interval. When the interval expires and no new report has arrived, the property is evaluated as Error (as long as RemoveWhenExpired keeps its default value, false).
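To make the expiration rule concrete, here is a minimal C# sketch of a simplified model; it is not how the Service Fabric health store is implemented. The hypothetical EvaluateAt function treats a property as Ok while its latest report is younger than TimeToLive and as Error once that report has expired.

```csharp
using System;
using System.Linq;

// Simplified model, NOT the Service Fabric health store: a property is Ok while
// its most recent report is younger than TimeToLive, and Error once it has expired.
static string EvaluateAt(TimeSpan[] reportTimes, TimeSpan timeToLive, TimeSpan now) {
    // Reports sent after 'now' have not happened yet, so ignore them.
    var past = reportTimes.Where(t => t <= now).ToArray();
    if (past.Length == 0) return "NoReport";
    var latest = past.Max();
    return now < latest + timeToLive ? "Ok" : "Error";
}

// Timing of the newer version: a report at the start of each 30-second iteration, valid for 10 s.
var reports = new[] { TimeSpan.Zero, TimeSpan.FromSeconds(30) };
var ttl = TimeSpan.FromSeconds(10);

foreach (var seconds in new[] { 5, 15, 35, 45 }) {
    Console.WriteLine($"t = {seconds,2} s: {EvaluateAt(reports, ttl, TimeSpan.FromSeconds(seconds))}");
}
```

With this timing, the property is Ok at 5 s and 35 s and Error at 15 s and 45 s.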

The code above simulates two versions of the same service. In the first version (1.0.0), the unit of work takes 10 seconds, while the health report stays valid for 1 minute. Each iteration of the loop therefore refreshes the report long before it expires.

In any newer version, the unit of work takes 30 seconds, but the health report expires after only 10 seconds. This simulates an unexpected drop in service performance. The Watchdog property is in the Error state for 20 of every 30 seconds, so the service is unhealthy most of the time, and Service Fabric can detect this and stop the upgrade.
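The difference between the two versions can be checked with a small calculation. The following sketch uses a hypothetical HealthyFraction function and assumes, as a simplification, that sending a report takes no time and that each loop iteration lasts exactly the work duration. It samples every second and counts how often the latest report is still valid.

```csharp
using System;

// Fraction of sampled seconds in which the latest report has not yet expired,
// assuming a report is sent at the start of every iteration of length workSeconds.
static double HealthyFraction(int ttlSeconds, int workSeconds, int horizonSeconds) {
    int healthySeconds = 0;
    for (int t = 0; t < horizonSeconds; t++) {
        // Reports are sent at 0, workSeconds, 2 * workSeconds, ...
        int sinceLastReport = t % workSeconds;
        if (sinceLastReport < ttlSeconds) healthySeconds++;
    }
    return (double)healthySeconds / horizonSeconds;
}

var original = HealthyFraction(ttlSeconds: 60, workSeconds: 10, horizonSeconds: 300);
var newer = HealthyFraction(ttlSeconds: 10, workSeconds: 30, horizonSeconds: 300);
Console.WriteLine($"1.0.0: Ok {original:P0} of the time");
Console.WriteLine($"newer: Ok {newer:P0} of the time");
```

Under these assumptions the healthy fraction is min(TimeToLive, work) / work: all of the time for version 1.0.0 and one third of the time for the newer version.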

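Publish the application as usual. Then increase the application version, including the service manifest version, so that the service no longer reports version 1.0.0.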

Publish the application again, but this time check the Upgrade the Application option in the publish settings.

Click the Configure Upgrade Settings link and set the upgrade mode to Monitored. Verify that the FailureAction property is set to Rollback.
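The same settings can also be expressed in code with the FabricClient API from System.Fabric. The sketch below assumes a hypothetical application name fabric:/Application1 and a target version 1.0.1 that has already been registered in the cluster. It builds a monitored rolling upgrade with FailureAction set to Rollback and submits it only when started with a hypothetical --start argument.

```csharp
using System;
using System.Fabric;
using System.Fabric.Description;

// Mirrors the dialog settings: Monitored upgrade mode, FailureAction = Rollback.
static ApplicationUpgradeDescription BuildMonitoredUpgrade(Uri applicationName, string targetVersion) =>
    new ApplicationUpgradeDescription {
        ApplicationName = applicationName,
        TargetApplicationTypeVersion = targetVersion,
        UpgradePolicyDescription = new MonitoredRollingApplicationUpgradePolicyDescription {
            UpgradeMode = RollingUpgradeMode.Monitored,
            MonitoringPolicy = new RollingUpgradeMonitoringPolicy {
                FailureAction = UpgradeFailureAction.Rollback
            }
        }
    };

// Hypothetical application name and version, for illustration only.
var description = BuildMonitoredUpgrade(new Uri("fabric:/Application1"), "1.0.1");
var policy = (MonitoredRollingApplicationUpgradePolicyDescription)description.UpgradePolicyDescription;
Console.WriteLine($"{description.ApplicationName} -> {description.TargetApplicationTypeVersion}, " +
    $"mode {policy.UpgradeMode}, failure action {policy.MonitoringPolicy.FailureAction}");

if (args.Length > 0 && args[0] == "--start") {
    // Requires a reachable cluster; the parameterless FabricClient connects to the local one.
    using var client = new FabricClient();
    await client.ApplicationManager.UpgradeApplicationAsync(description);
}
```

The monitoring policy also exposes timing properties such as HealthCheckWaitDuration and HealthCheckRetryTimeout; the sketch leaves them at their defaults.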

Click the Publish button and open Service Fabric Explorer, which shows that one upgrade is in progress.

As the upgrade is processed, Service Fabric monitors the health of the application.

Because the service is unhealthy after the upgrade, the upgrade fails and the application is rolled back to the original version.
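Instead of watching Service Fabric Explorer, the upgrade state can also be polled from code. This sketch assumes a reachable cluster and takes the application name (for example, the hypothetical fabric:/Application1) as its only argument. It stops polling once the upgrade has completed in either direction.

```csharp
using System;
using System.Fabric;
using System.Threading.Tasks;

// Completed in either direction: the new version was accepted or the old one was restored.
// Other outcomes, such as an upgrade paused by FailureAction = Manual, keep the loop polling.
static bool IsTerminal(ApplicationUpgradeState state) =>
    state == ApplicationUpgradeState.RollingForwardCompleted ||
    state == ApplicationUpgradeState.RollingBackCompleted;

if (args.Length == 1) {
    var applicationName = new Uri(args[0]);
    using var client = new FabricClient(); // connects to the local cluster
    while (true) {
        var progress = await client.ApplicationManager.GetApplicationUpgradeProgressAsync(applicationName);
        Console.WriteLine($"{DateTime.Now:T} {progress.UpgradeState}");
        if (IsTerminal(progress.UpgradeState)) break;
        await Task.Delay(TimeSpan.FromSeconds(10));
    }
}
```

For the unhealthy version in this article, with FailureAction set to Rollback, one would expect the loop to end in the RollingBackCompleted state.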

Health monitoring can thus reflect the quality of a service and stop an upgrade that degrades it.