Service Discovery in .NET

Wednesday, September 25, 2024

Introduction

Service discovery is a common pattern in distributed systems, such as microservices, where a client does not need to know the exact address of a resource; instead, it asks some kind of database-backed service to look it up. This is very useful to avoid hardcoding such addresses, which are discovered dynamically at runtime instead. An implementation may offer additional functionality, such as checking the health of a service before returning it, or rotating the exact address it returns when there are several candidates.

Service discovery has been supported in .NET since version 8, through the Microsoft.Extensions.ServiceDiscovery NuGet package. It's a very simple implementation, which I will cover here; the code is available in Microsoft's Aspire repo. More advanced service discovery providers include, to name just a few:

All of these have libraries available for .NET, for free or otherwise, and some work in different ways - doing a runtime lookup vs updating a configuration file, for example - but I won't go into each of them. The Microsoft implementation essentially acts as a wrapper for the HttpClient, which replaces a dummy host with a real one. It comes with two implementations for the service resolver:

The default one gets the mappings from services to hosts from the configuration file (AddConfigurationServiceEndpointProvider, from Microsoft.Extensions.ServiceDiscovery)
Another one based on DNS and DNS SRV records (AddDnsServiceEndpointProvider or AddDnsSrvServiceEndpointProvider, from Microsoft.Extensions.ServiceDiscovery.Dns)

Let's first look at how it works internally.

Service Discovery for the HttpClient

I have written extensively about the HttpClient in recent times, and here I go again. As I said earlier in this post, the library that Microsoft makes available works for HttpClient only. It allows you to register a client using a service name as the host, and then have it dynamically replaced at runtime, before making the actual HTTP call. It operates in two steps:

We register the required services to the dependency injection (DI) framework
We apply it to one or all clients

It goes like this: for registering the services, we have the AddServiceDiscovery extension method, which uses the default service resolver implementation:

builder.Services.AddServiceDiscovery();

It is possible to configure a few options (ServiceDiscoveryOptions):

builder.Services.AddServiceDiscovery(static options =>
{
    options.RefreshPeriod = TimeSpan.FromHours(1);
    options.AllowAllSchemes = false;
    options.AllowedSchemes = ["https"];
});

When AllowAllSchemes is set to false, only the schemes present in the AllowedSchemes list are accepted; the default is to accept all. RefreshPeriod is the time period between polling attempts, and it defaults to 60 seconds.

Actually, the AddServiceDiscovery does two things:

Calls the base AddServiceDiscoveryCore method, which registers all the base stuff
Registers an endpoint provider that gets its options from the configuration by calling AddConfigurationServiceEndpointProvider

So, essentially, it is similar to these two calls:

builder.Services.AddConfigurationServiceEndpointProvider();
builder.Services.AddServiceDiscoveryCore();

Then, when we register a client, we use a different AddServiceDiscovery method:

builder.Services.AddHttpClient<TodoClient>("todo", static client =>
{
    client.DefaultRequestHeaders.Add(HeaderNames.Accept, MediaTypeNames.Application.Json);
    client.BaseAddress = new("https://todo");
}).AddServiceDiscovery();

What this one does is register a message handler for the named HttpClient; this is where all the magic happens. Please notice the https://todo URL, more on this in a moment!

We can also register service discovery for all registered clients:

builder.Services.ConfigureHttpClientDefaults(static client =>
{
    client.AddServiceDiscovery();
});

Now, when you run code like this:

var client = httpClientFactory.CreateClient("todo");
var results = await client.GetFromJsonAsync<List<Todo>>("/todos");

The framework will automatically provide the right host!
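To make the mechanism a bit more tangible, here is a minimal sketch of a message handler that swaps a placeholder host for a real one before the request is sent. This is not Microsoft's implementation, only an illustration of the idea; the HostRewritingHandler class, its Rewrite helper and the in-memory map of hosts are all made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical handler, for illustration only: replaces a placeholder host with a real one
public sealed class HostRewritingHandler : DelegatingHandler
{
    private readonly IReadOnlyDictionary<string, string> _hosts;

    public HostRewritingHandler(IReadOnlyDictionary<string, string> hosts, HttpMessageHandler inner) : base(inner)
    {
        ArgumentNullException.ThrowIfNull(hosts);
        _hosts = hosts;
    }

    // Pure helper, so that the rewriting rule can be exercised without any network traffic
    public static Uri Rewrite(Uri uri, IReadOnlyDictionary<string, string> hosts)
    {
        // Only absolute URIs carry a host that can be swapped
        if (uri.IsAbsoluteUri && hosts.TryGetValue(uri.Host, out var realHost))
        {
            // Keep the scheme, port, path and query; change only the host
            return new UriBuilder(uri) { Host = realHost }.Uri;
        }

        return uri;
    }

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        if (request.RequestUri is not null)
        {
            request.RequestUri = Rewrite(request.RequestUri, _hosts);
        }

        return base.SendAsync(request, cancellationToken);
    }
}
```

With a map containing "todo" -> "jsonplaceholder.typicode.com", a request to https://todo/todos would leave the handler pointing at the real host, with the path untouched. The real library goes further, as it also picks among several endpoints and refreshes them periodically.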

Service Discovery for gRPC

Now, you may have heard about gRPC and its implementation in .NET. It so happens that, by default, this implementation uses its own client, but it can be made to use our beloved HttpClient, and everything that goes with it, including service discovery!

We will need the Grpc.Net.ClientFactory Nuget package for this, and after we install it, we get the AddGrpcClient method:

builder.Services.AddGrpcClient<My.MyGrpcClient>(static client =>
{
    client.Address = new("https://grpc:5001");
}).AddServiceDiscovery();

And that's it. There's no way to do this for all configured gRPC clients at once; you need to enable each one individually with AddServiceDiscovery. I won't cover gRPC itself here, so I won't include the code for the client, as it is not relevant. Just make sure you have an entry in your configuration for "grpc".

Service Discovery Using the Configuration

As I said, the default provider loads its mappings from the configuration, normally the appsettings.json file. If we want, we can change the name of the configuration section to use (SectionName):

builder.Services.AddConfigurationServiceEndpointProvider(static options =>
{
    options.SectionName = "Services"; //this is the default
});
builder.Services.AddServiceDiscoveryCore();

If we just call AddServiceDiscovery, we get the defaults.

Now, let's see how the actual addresses are resolved! We need something like this in our configuration file:

{
    "Services": {
        "todo": {
            "https": [
                "jsonplaceholder.typicode.com",
                "jsonplaceholder2.typicode.com"
            ],
            "http": [
                "jsonplaceholder.typicode.com",
                "jsonplaceholder2.typicode.com"
            ]
        }
    }
}

A few things worth noting:

The section, by default, must be called "Services", but it can be changed through the SectionName property of the ConfigurationServiceEndpointProviderOptions, as I've shown before
Service registrations must match the name that is passed as the host on the HttpClient.BaseAddress property ("todo")
Registrations are different for HTTP and HTTPS endpoints
We only register host names here, not full URLs
There can be multiple hosts for the same service; they are returned in round-robin order by default
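As an illustration of the last point, round-robin selection over the hosts of a single service can be sketched as below. This is not the library's own selector; the RoundRobinHostSelector class is a hypothetical name used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical selector, for illustration only: cycles through the hosts of one service
public sealed class RoundRobinHostSelector
{
    private readonly IReadOnlyList<string> _hosts;
    private int _next = -1;

    public RoundRobinHostSelector(IReadOnlyList<string> hosts)
    {
        ArgumentNullException.ThrowIfNull(hosts);

        if (hosts.Count == 0)
        {
            throw new ArgumentException("At least one host is required", nameof(hosts));
        }

        _hosts = hosts;
    }

    public string Next()
    {
        // Interlocked keeps the counter safe across threads;
        // the uint cast keeps the index non-negative after the counter overflows
        var index = (uint)Interlocked.Increment(ref _next) % (uint)_hosts.Count;
        return _hosts[(int)index];
    }
}
```

Given the two hosts from the configuration above, successive calls to Next would alternate between them, starting with the first one.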

Service Discovery Using DNS

The second implementation option, which does not use configuration files, is available with the Microsoft.Extensions.ServiceDiscovery.Dns package. Actually, it's two options:

Resolve a service name using the DNS service (AddDnsServiceEndpointProvider)
Resolve a service name using the DNS SRV records (AddDnsSrvServiceEndpointProvider)

This can be useful, for example, when using Docker or other container technology. You can read about the RFC that is behind this here and the general idea behind using DNS for service discovery here.

DNS provides load-balanced service name resolution out of the box, so it can be a good bet. To use it, we need the AddDnsServiceEndpointProvider method:

builder.Services.AddDnsServiceEndpointProvider();
builder.Services.AddServiceDiscoveryCore();

It can take a few options (DnsServiceEndpointProviderOptions) as well:

builder.Services.AddDnsServiceEndpointProvider(static options =>
{
    options.DefaultRefreshPeriod = TimeSpan.FromMinutes(60);
});

The DNS SRV implementation uses SRV records for the lookup and is normally used with Docker or Kubernetes containers. The call to use is AddDnsSrvServiceEndpointProvider:

builder.Services.AddDnsSrvServiceEndpointProvider();
builder.Services.AddServiceDiscoveryCore();

And the options version (DnsSrvServiceEndpointProviderOptions):

builder.Services.AddDnsSrvServiceEndpointProvider(static options =>
{
    options.QuerySuffix = "my.domain";
});

Service Discovery with Yarp

Yarp is an open-source reverse proxy service written by Microsoft. It integrates nicely with the ASP.NET Core framework, and Microsoft, fortunately, also implemented service discovery support for it, through the Microsoft.Extensions.ServiceDiscovery.Yarp Nuget package!

We call AddHttpForwarderWithServiceDiscovery instead of just AddHttpForwarder:

builder.Services.AddHttpForwarderWithServiceDiscovery();

and the service discovery capabilities are added to Yarp, meaning, it will use the registered provider (remember that the default one is configuration-based) to resolve service names.

Custom Service Discovery

Now, what if we could provide our own service discovery resolver? We certainly can, and I will show how to do it. First, let's define a standard contract for our resolver:

public interface IServiceResolver
{
    ValueTask<string> Resolve(string serviceName, CancellationToken cancellationToken = default);
    bool FailOnNoResolve { get; }
}

This should be pretty straightforward. The Resolve method, depending on its actual implementation, can be asynchronous. The FailOnNoResolve property just means that the code should throw an exception if it's not possible to resolve a service name.

One issue with this interface is, we can only return a single host for a given service name. There are ways around it, like returning comma-separated values, for example, but I won't go into that here (perhaps on a future post).

Let's consider a few implementations, first, one that takes a configuration of registrations:

public sealed class ServiceResolver : IServiceResolver
{
    private readonly CustomServicesOptions _options;

    public ServiceResolver(CustomServicesOptions options)
    {
        ArgumentNullException.ThrowIfNull(options, nameof(options));
        _options = options;
    }

    public bool FailOnNoResolve => _options.FailOnNoResolve;

    public ValueTask<string> Resolve(string serviceName, CancellationToken cancellationToken = default)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(serviceName, nameof(serviceName));
        _options.Services.TryGetValue(serviceName, out var host);
        return ValueTask.FromResult(host!);
    }
}

And here is that configuration class:

public class CustomServicesOptions
{
    public CustomServicesOptions() { }

    internal CustomServicesOptions(IDictionary<string, string> services)
    {
        ArgumentNullException.ThrowIfNull(services, nameof(services));
        Services = new Dictionary<string, string>(services, StringComparer.InvariantCultureIgnoreCase);
    }

    public Dictionary<string, string> Services { get; } = new Dictionary<string, string>(StringComparer.InvariantCultureIgnoreCase);

    public bool FailOnNoResolve { get; set; }
}

As you can see, the CustomServicesOptions class uses a case-insensitive dictionary to store registrations, in the form service name -> host, pretty much what we had in the configuration file. And, of course, you can also load it from configuration:

var options = builder.Configuration.GetSection("CustomServices").Get<CustomServicesOptions>();

We can have an implementation that takes a resolver lambda function:

public sealed class FunctionServiceResolver : IServiceResolver
{
    private readonly Func<string, string> _resolver;

    public FunctionServiceResolver(Func<string, string> resolver)
    {
        ArgumentNullException.ThrowIfNull(resolver, nameof(resolver));
        _resolver = resolver;
    }

    public ValueTask<string> Resolve(string serviceName, CancellationToken cancellationToken = default)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(serviceName, nameof(serviceName));
        var host = _resolver(serviceName);
        return ValueTask.FromResult(host);
    }

    public bool FailOnNoResolve { get; init; }
}

This one is very flexible, as you can provide any implementation you want as the resolver function.

What if we want to store the services configuration out-of-process, for example, in a distributed cache? Let's see how, but first, we create an interface to state that intent (distributed resolver):

public interface IDistributedServiceResolver : IServiceResolver
{
    string? KeyPrefix { get; }
}

This one extends the IServiceResolver interface I introduced earlier, with just one additional (optional) property: KeyPrefix. If supplied, it will be used as a prefix to our service registrations. How we actually get them there is beyond the scope of this post! Let's create a configuration class to go along:

public class DistributedCustomServicesOptions
{
    public string? KeyPrefix { get; set; }
    public bool FailOnNoResolve { get; set; }
}

And here is an implementation that leverages the IDistributedCache standard interface:

public sealed class DistributedCacheServiceResolver : IDistributedServiceResolver
{
    private readonly IDistributedCache _cache;
    private readonly DistributedCustomServicesOptions? _options;

    public DistributedCacheServiceResolver(IDistributedCache cache, IOptions<DistributedCustomServicesOptions> options)
    {
        ArgumentNullException.ThrowIfNull(cache, nameof(cache));
        _cache = cache;
        _options = options?.Value;
    }

    public async ValueTask<string> Resolve(string serviceName, CancellationToken cancellationToken = default)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(serviceName, nameof(serviceName));
        var key = string.IsNullOrWhiteSpace(KeyPrefix) ? serviceName : $"{KeyPrefix}:{serviceName}";
        var host = await _cache.GetStringAsync(key, cancellationToken);
        return host!;
    }

    public string? KeyPrefix => _options?.KeyPrefix;

    public bool FailOnNoResolve => ((bool?)_options?.FailOnNoResolve).GetValueOrDefault();
}

As you can see, I made KeyPrefix and FailOnNoResolve depend on a configuration parameter to be passed in the constructor, using the Options Pattern. Because this class requires an IDistributedCache instance, we should register it using DI:

public static class ServiceCollectionExtensions
{
    public static IServiceCollection AddCustomServiceDiscovery<TServiceResolver>(this IServiceCollection services) where TServiceResolver : class, IServiceResolver
    {
        ArgumentNullException.ThrowIfNull(services, nameof(services));

        services.AddSingleton<IServiceResolver, TServiceResolver>();
        services.AddServiceDiscoveryCore();
        services.AddSingleton<IServiceEndpointProviderFactory, CustomServiceEndpointProviderFactory>();

        return services;
    }
}

Note the CustomServiceEndpointProviderFactory class, I'll talk about it in a moment. For now you just register it as:

builder.Services.Configure<DistributedCustomServicesOptions>(static options =>
{
    options.KeyPrefix = "my";
});
builder.Services.AddCustomServiceDiscovery<DistributedCacheServiceResolver>();

Of course, for this to work, you will need a working registered implementation of IDistributedCache, such as the Redis implementation (Microsoft.Extensions.Caching.StackExchangeRedis is the most commonly used), and, of course, a working Redis server. I won't go into the details of configuring these, but you can find it here.

To finalise, I need to show you what actually does the magic: the CustomServiceEndpointProviderFactory class, which implements IServiceEndpointProviderFactory. Without further ado:

public sealed class CustomServiceEndpointProviderFactory : IServiceEndpointProviderFactory
{
    private readonly ILogger<CustomServiceEndpointProviderFactory> _logger;
    private readonly IServiceResolver _serviceResolver;

    public CustomServiceEndpointProviderFactory(ILogger<CustomServiceEndpointProviderFactory> logger, IServiceResolver serviceResolver)
    {
        ArgumentNullException.ThrowIfNull(serviceResolver, nameof(serviceResolver));
        ArgumentNullException.ThrowIfNull(logger, nameof(logger));
        _logger = logger;
        _serviceResolver = serviceResolver;
        _logger.LogInformation("Registered '{serviceResolver}' service resolver", _serviceResolver);
    }

    bool IServiceEndpointProviderFactory.TryCreateProvider(ServiceEndpointQuery query, out IServiceEndpointProvider provider)
    {
        var serviceName = query.ToString()!;

        if (!TryFindEndPoint(serviceName, out var endPoint))
        {
            endPoint = new DnsEndPoint(serviceName, 0);
            _logger.LogInformation("Could not find endpoint for '{serviceName}'", serviceName);
        }

        provider = new CustomConfigurationServiceEndpointProvider(endPoint!);

        return true;
    }

    private bool TryFindEndPoint(string serviceName, out EndPoint? endPoint)
    {
        if ((serviceName.Contains("://", StringComparison.Ordinal) || !Uri.TryCreate($"fakescheme://{serviceName}", default, out var uri)) && !Uri.TryCreate(serviceName, default, out uri))
        {
            endPoint = null;
            return false;
        }

        var uriHost = uri.Host;
        var segmentSeparatorIndex = uriHost.IndexOf('.');
        string? host;

        if (uriHost.StartsWith('_') && segmentSeparatorIndex > 1 && uriHost[^1] != '.')
        {
            host = uriHost[(segmentSeparatorIndex + 1)..];
        }
        else
        {
            host = uriHost;
        }

        var port = uri.Port > 0 ? uri.Port : 0;

        if (!string.IsNullOrWhiteSpace(host))
        {
            if (TryResolveHost(host, out var hostResolved))
            {
                _logger.LogInformation("Resolved '{host}' to '{hostResolved}'", host, hostResolved);
            }
            else if (!_serviceResolver.FailOnNoResolve)
            {
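                if (IPAddress.TryParse(host, out var ip))
                {
                    _logger.LogInformation("Using '{ip}' as the host", ip);
                    endPoint = new IPEndPoint(ip, port);
                    return true;
                }

                try
                {
                    // Dns.GetHostAddresses throws a SocketException (System.Net.Sockets) for unknown hosts
                    if (Dns.GetHostAddresses(host).Length != 0)
                    {
                        hostResolved = host;
                        _logger.LogInformation("Keeping '{host}' as the host", host);
                    }
                }
                catch (SocketException)
                {
                    // Not resolvable through DNS either; hostResolved stays empty and we fall through
                }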
            }
            else
            {
                var ex = new InvalidOperationException($"Could not resolve service {serviceName}");
                _logger.LogError(ex, "Could not resolve service {serviceName}", serviceName);
                throw ex;
            }

            if (!string.IsNullOrWhiteSpace(hostResolved))
            {
                endPoint = new DnsEndPoint(hostResolved!, port);
                return true;
            }
        }

        _logger.LogWarning("Could not resolve service {host}", host);
        endPoint = null;
        return false;
    }

    private bool TryResolveHost(string service, out string? hostResolved)
    {
        hostResolved = _serviceResolver.Resolve(service).ConfigureAwait(false).GetAwaiter().GetResult();
        if (string.IsNullOrWhiteSpace(hostResolved))
        {
            _logger.LogWarning("No host for service {service} found", service);
        }

        return !string.IsNullOrWhiteSpace(hostResolved);
    }
}

I partially based this on Microsoft's implementation for PassThroughServiceEndpointProviderFactory. A brief explanation is in order:

The CustomServiceEndpointProviderFactory first tries to resolve the host using the supplied resolver
If it fails, and if FailOnNoResolve is set to true, it throws an exception
Otherwise, it tries to resolve the service name, as a name and as an IP address, and creates an appropriate endpoint
It then creates a CustomConfigurationServiceEndpointProvider with the endpoint
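One detail that is easy to miss in TryFindEndPoint is how it treats hosts that start with an underscore: a name in the form _endpoint.service is reduced to just the service part before the resolver is asked. The sketch below isolates that rule; the ServiceNames class and the StripEndpointName method are hypothetical names, introduced only for this illustration.

```csharp
using System;

public static class ServiceNames
{
    // Hypothetical helper mirroring the host handling in TryFindEndPoint:
    // "_endpoint.service" becomes "service", anything else is returned unchanged
    public static string StripEndpointName(string uriHost)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(uriHost);

        var segmentSeparatorIndex = uriHost.IndexOf('.');

        // The endpoint name must have at least one character after the underscore,
        // and the host must not end with a dot
        if (uriHost.StartsWith('_') && segmentSeparatorIndex > 1 && uriHost[^1] != '.')
        {
            return uriHost[(segmentSeparatorIndex + 1)..];
        }

        return uriHost;
    }
}
```

So a host such as _admin.todo would be looked up as todo, while a plain todo is looked up as is.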

Its companion class, CustomConfigurationServiceEndpointProvider (IServiceEndpointProvider implementation, based upon PassThroughServiceEndpointProvider), is really very simple:

public sealed class CustomConfigurationServiceEndpointProvider(EndPoint endpoint) : IServiceEndpointProvider
{
    public const string SectionName = "CustomServices";

    ValueTask IAsyncDisposable.DisposeAsync() => default;

    ValueTask IServiceEndpointProvider.PopulateAsync(IServiceEndpointBuilder endpoints, CancellationToken cancellationToken)
    {
        if (endpoints.Endpoints.Count == 0)
        {
            var serviceEndpoint = ServiceEndpoint.Create(endpoint);
            serviceEndpoint.Features.Set<IServiceEndpointProvider>(this);
            endpoints.Endpoints.Add(serviceEndpoint);
        }

        return default;
    }

    public override string ToString() => "Custom";
}

Now, all we need is a couple more extension methods to glue everything together:

public static class ServiceCollectionExtensions
{
    public static IServiceCollection AddCustomServiceDiscovery(this IServiceCollection services, CustomServicesOptions options)
    {
        ArgumentNullException.ThrowIfNull(services, nameof(services));
        ArgumentNullException.ThrowIfNull(options, nameof(options));

        services.AddSingleton<IServiceResolver>(new ServiceResolver(options));
        services.AddServiceDiscoveryCore();
        services.AddSingleton<IServiceEndpointProviderFactory, CustomServiceEndpointProviderFactory>();

        return services;
    }

    public static IServiceCollection AddCustomServiceDiscovery(this IServiceCollection services, Func<string, string> serviceResolver, bool failOnNoResolve = false)
    {
        ArgumentNullException.ThrowIfNull(services, nameof(services));
        ArgumentNullException.ThrowIfNull(serviceResolver, nameof(serviceResolver));

        services.AddSingleton<IServiceResolver>(new FunctionServiceResolver(serviceResolver) { FailOnNoResolve = failOnNoResolve });
        services.AddServiceDiscoveryCore();
        services.AddSingleton<IServiceEndpointProviderFactory, CustomServiceEndpointProviderFactory>();

        return services;
    }
}

And I guess you know how to use them! :-)
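Just in case, here is a sketch of how the registration might look in Program.cs. It is a fragment rather than a complete program: it relies on the types defined earlier in this post (CustomServicesOptions and the AddCustomServiceDiscovery extension methods), and the "todo" registration is only an example value.

```csharp
// Sketch of Program.cs; relies on the types defined earlier in this post
var builder = WebApplication.CreateBuilder(args);

// Option 1: fixed registrations (service name -> host)
var options = new CustomServicesOptions { FailOnNoResolve = true };
options.Services["todo"] = "jsonplaceholder.typicode.com";
builder.Services.AddCustomServiceDiscovery(options);

// Option 2 (instead of option 1): a resolver function;
// returning an empty string is treated as "not found" by the factory
// builder.Services.AddCustomServiceDiscovery(static name => name == "todo" ? "jsonplaceholder.typicode.com" : string.Empty);

// The client still needs AddServiceDiscovery so that the handler is applied
builder.Services.AddHttpClient("todo", static client =>
{
    client.BaseAddress = new("https://todo");
}).AddServiceDiscovery();
```

Only one of the two options should be registered, since both of them add an IServiceResolver and the factory takes a single one.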

Future Work for Custom Discovery

Some ideas to make this even better:

Allow returning multiple endpoints for a service
Checking the health of an endpoint before returning it (see this post)

Conclusion

And that is basically what I wanted to show you. If you look at my repo, https://github.com/rjperes/ServiceDiscovery, you will see a few more options, such as a cached and a composite resolver, and you can also try my Nuget package, ServiceDiscovery.NET.