Automate multi-cloud service failover with sameness groups

30min
|
Enterprise
Consul

Admin partitions are a multi-tenancy solution in Consul datacenters. You can use them to peer clusters across different datacenters. When services share the same names across Consul deployments with peered clusters, you can configure sameness groups to add automatic failover between services.

A sameness group is a logical collection of local admin partitions and remote admin partitions on cluster peers, where services with the same names are treated as the same service in terms of service failover. With sameness groups, you can setup and manage automatic service failover with fewer configuration steps.

In this tutorial, you will peer clusters deployed to different cloud providers and then configure a sameness group to automate multi-cloud failover in your Consul service mesh.

Scenario overview

HashiCups is a coffee-shop demo application. It has a microservices architecture and uses Consul service mesh to securely connect the services. In this tutorial, you will deploy HashiCups services on Kubernetes clusters deployed in two different cloud providers. By peering the Consul clusters, the services in one region can communicate with the services in the other. By using sameness groups, unavailable services from one peer will automatically fallback to services from another peer.

The architecture diagram of the scenario. It shows the Kubernetes environments and the flow of traffic from the client request through the Consul service mesh.

HashiCups uses the following microservices:

The nginx service is an NGINX instance that routes requests to the frontend microservice and serves as a reverse proxy to the public-api service.
The frontend service provides a React-based UI.
The public-api service is a GraphQL public API that communicates with the product-api and the payments services.
The product-api service stores the core HashiCups application logic, including authentication, coffee (product) information, and orders.
The product-api-db service is a Postgres database instance that stores user, product, and order information.
The payments service is a gRCP-based Java application service that handles customer payments.

Prerequisites

To complete this tutorial, you should already be familiar with admin partitions and cluster peering in Consul.

Enterprise Only

The functionality described in this tutorial requires Consul Enterprise. To explore Consul Enterprise features, you can sign up for a free 30-day trial.

To complete this tutorial, you need:

A valid Consul Enterprise license
An HCP account configured for use with Terraform
An AWS account configured for use with Terraform
aws-cli v2.0 or later
A Google Cloud account configured for use with Terraform
gcloud CLI v461.0.0 or later with the gke-cloud-auth-plugin plugin installed
kubectl v1.27 or later
git v2.0 or later
terraform v1.2 or later
consul-k8s v1.3.3
jq v1.6 or later

This tutorial uses Terraform automation to deploy the demo environment. You do not need to know Terraform to successfully complete this tutorial.

Clone example repository

Clone the GitHub repository containing the configuration files and resources.

$ git clone https://github.com/hashicorp-education/learn-consul-sameness-groups

Change directories to the newly cloned repository.

$ cd learn-consul-sameness-groups

The repository has the following structure:

The dc1-aws directory contains Terraform configuration to deploy an HCP Consul Dedicated cluster and an AWS EKS cluster in us-west-2.
The dc2-gcloud directory contains Terraform configuration to deploy an GKE cluster in us-central1-a.
The consul-peering directory contains Terraform configuration to automate peering of two Consul clusters.
The k8s-yamls directory contains Consul custom resource definitions (CRDs) that support this tutorial.
The hashicups-v1.0.2 directory contains YAML configuration files for deploying HashiCups.

Deploy Kubernetes clusters

In this section, you will deploy the infrastructure for this tutorial. You will use Terraform to create an HCP Consul cluster, deploy a Kubernetes cluster on each cloud provider, and deploy Consul dataplanes alongside services in each Kubernetes cluster.

Deploy HCP Consul Dedicated and the first Kubernetes cluster on AWS

Initialize the Terraform configuration for dc1-aws to download the necessary providers and modules.

$ terraform -chdir=dc1-aws init
Initializing the backend...
## ...
Initializing provider plugins...
## ...
Terraform has been successfully initialized!
## ...

By default, dc1-aws deploys to us-west-2. You can change the region your workloads run in by using the terraform.tfvars.example template file to create a terraform.tfvars file.

$ cp dc1-aws/terraform.tfvars.example dc1-aws/terraform.tfvars

dc1-aws/terraform.tfvars

vpc_region = "us-west-2"
hvn_region = "us-west-2"

Deploy the resources for dc1. Confirm the run by entering yes.

$ terraform -chdir=dc1-aws apply
## ...
Plan: 112 to add, 0 to change, 0 to destroy.
## ...
Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes
## ...

Apply complete! Resources: 112 added, 0 changed, 0 destroyed.

Outputs:

cluster_name = "learn-consul-sameness-dc1"
consul_datacenter = "learn-consul-sameness-dc1"
consul_token = <sensitive>
hcp_consul_ca = <sensitive>
region = "us-east-2"

It takes about 15 minutes to deploy your infrastructure. To save time while waiting, you may proceed to the next section of this tutorial and begin the second datacenter deployment in parallel.

Note

If your HCP account has access to multiple organizations or projects, you may encounter a Terraform error related to an unexpected number of organizations or projects. If you receive this error, use the HCP_PROJECT_ID environment variable to specify which HCP project you want your Consul cluster deployed in. For more information, refer to the Terraform HCP provider documentation.

After you deploy the first datacenter, configure the kubectl tool to interact with it. The following command stores the cluster connection information in the dc1 alias.

$ aws eks \
    update-kubeconfig \
    --region $(terraform -chdir=dc1-aws output -raw region) \
    --name $(terraform -chdir=dc1-aws output -raw cluster_name) \
    --alias=dc1

Deploy the second Kubernetes cluster

Place the contents of your Consul Enterprise license into a file named consul.hclic. A Consul license is a requirement for the second cluster only. The first cluster does not need a license because it uses HCP Consul Dedicated, which already contains the Enterprise feature set.

$ touch consul.hclic

Next, initialize the Terraform configuration for dc2-gcloud to download the necessary providers and modules.

$ terraform -chdir=dc2-gcloud init
Initializing the backend...
## ...
Initializing provider plugins...
## ...
Terraform has been successfully initialized!
## ...

Use the terraform.tfvars.example template file to create a terraform.tfvars file. Then set your Google Cloud project ID in the project variable. By default, dc2-gcloud deploys to us-central1-a. You have the option to change the region in the variables file.

$ cp dc2-gcloud/terraform.tfvars.example dc2-gcloud/terraform.tfvars

dc2-gcloud/terraform.tfvars

project = "XXXXXXXXXXXXXXXXXXXXXXX"
zone = "us-central1-a"

Then deploy the resources for dc2. To confirm the run, enter yes. It takes about 10 minutes to deploy your infrastructure.

$ terraform -chdir=dc2-gcloud apply
## ...
Plan: 45 to add, 0 to change, 0 to destroy.
## ...
Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes
## ...

Apply complete! Resources: 45 added, 0 changed, 0 destroyed.

Outputs:

get-credentials_command = "gcloud container clusters get-credentials --zone us-central1-a learn-consul-sameness-dc2"
project_id = "hc-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
rename-context_command = "kubectl config rename-context gke_hc-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx_us-central1-a_learn-consul-sameness-dc2 dc2"
set-project_command = "gcloud config set project hc-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
zone = "us-central1-a"

After you deploy the second datacenter, set the active Google Cloud project to reference the deployment in dc2.

$ gcloud config set project $(terraform -chdir=dc2-gcloud output -raw project_id)
Updated property [core/project].

Next, obtain the Google Cloud credentials required to interact with the dc2 deployment.

$ gcloud container clusters get-credentials --zone $(terraform -chdir=dc2-gcloud output -raw zone) learn-consul-sameness-dc2
Fetching cluster endpoint and auth data.
kubeconfig entry generated for learn-consul-sameness-dc2.

Configure kubectl to use the dc2 alias for the second datacenter.

$ kubectl config rename-context gke_$(terraform -chdir=dc2-gcloud output -raw project_id)_$(terraform -chdir=dc2-gcloud output -raw zone)_learn-consul-sameness-dc2 dc2
Context "gke_hc-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx_us-central1-a_learn-consul-sameness-dc2" renamed to "dc2".

Review infrastructure and service deployments

Terraform deploys Consul on both your Kubernetes platforms. By default, Consul deploys into its own dedicated namespace (consul). The following settings in the Consul Helm chart are mandatory for cluster peering.

global:
##...
  peering:
    enabled: true # mandatory for cluster peering
  tls:
    enabled: true # mandatory for cluster peering
##...
meshGateway:
  enabled: true # mandatory for k8s cluster peering
##...

Inspect the Kubernetes pods in the consul namespace to verify that Terraform deployed Consul in dc1. Notice that there are no Consul servers in this Kubernetes deployment because they are running on the HCP platform.

$ kubectl --context=dc1 --namespace=consul get pods
NAME                                           READY   STATUS    RESTARTS   AGE
consul-api-gateway-96bbfdb55-kc2zc             1/1     Running   0          41m
consul-connect-injector-6b9b644469-vfvnv       1/1     Running   0          42m
consul-mesh-gateway-7856f98dd8-bkt2x           1/1     Running   0          42m
consul-webhook-cert-manager-84d694f9c9-bmr25   1/1     Running   0          42m
prometheus-server-8455cbf87d-sdcxc             2/2     Running   0          42m

The Consul API Gateway enables browser access to the HashiCups application. Terraform deploys the API Gateway in both datacenters. For more information on Consul API Gateway, refer to the Consul API Gateway tutorial.

Next, inspect the Kubernetes pods in the consul namespace to verify that Terraform deployed Consul in dc2. Because dc2 is a self-managed installation of Consul Enterprise, it includes Consul server pods in the Running state. If any pods are in a failed state, run the kubectl --context=dc2 --namespace=consul logs <PODNAME> command to inspect the logs and get more information.

$ kubectl --context=dc2 --namespace=consul get pods
NAME                                           READY   STATUS    RESTARTS   AGE
consul-api-gateway-7c85b597b-lnr6w             1/1     Running   0          16m
consul-connect-injector-6d5864f969-tpmwf       1/1     Running   0          19m
consul-mesh-gateway-788cbd8448-7xjdb           1/1     Running   0          19m
consul-server-0                                1/1     Running   0          19m
consul-server-1                                1/1     Running   0          19m
consul-server-2                                1/1     Running   0          19m
consul-webhook-cert-manager-84d694f9c9-km4g5   1/1     Running   0          19m
prometheus-server-8455cbf87d-zth86             2/2     Running   0          19m

Explore HashiCups in browser (optional)

Open HashiCups from dc1 in your browser and verify that it is operational. If you receive an error, wait a few minutes for the deployment to be ready before trying again.

$ echo http://$(kubectl --context=dc1 --namespace=consul get services consul-api-gateway -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
http://a336ea2854e1c4f3294470eed4975c42-388180783.us-west-2.elb.amazonaws.com

Open HashiCups from dc2 in your browser and verify that it is also operational. It may also take a few minutes for this deployment to be ready.

$ echo http://$(kubectl --context=dc2 --namespace=consul get services consul-api-gateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
http://1.2.3.4

Peer the Consul clusters

Before you can use sameness groups with the two Consul clusters, you need to peer them. Cluster peering lets you connect two or more independent admin partitions so that services deployed to different Consul datacenters can communicate. In this section of the tutorial, you will use Terraform to peer the HCP Consul dedicated cluster in dc1 with the self-managed cluster in dc2. For more information, including how to peer clusters manually, refer to the Connect services between Consul datacenters with cluster peering tutorial.

Initialize the Terraform configuration to download the necessary providers and modules.

$ terraform -chdir=consul-peering init
Initializing the backend...
## ...
Initializing provider plugins...
## ...
Terraform has been successfully initialized!
## ...

In order to peer the two clusters, Terraform needs a variables file with the endpoint addresses and a valid token for dc1 and dc2. Run the following command to generate a variables file.

$ cat <<EOF \
>consul-peering/terraform.tfvars
dc1_address = "$(terraform -chdir=dc1-aws output -raw consul_public_url)"
dc1_token = "$(terraform -chdir=dc1-aws output -raw consul_token)"
dc2_address = "$(kubectl --context=dc2 get svc -n consul consul-ui -o jsonpath='{.status.loadBalancer.ingress.*.ip}')"
dc2_token = "$(kubectl --context=dc2 get secrets -n consul consul-bootstrap-acl-token -o go-template='{{.data.token | base64decode}}')"
dc2_certificateauthority = "$(kubectl --context=dc2 get secrets -n consul consul-ca-cert -o jsonpath='{.data.tls\.crt}')"
EOF

Inspect the contents of the generated variables file in consul-peering/terraform.tfvars - it contains the addresses of each Consul cluster, a token for performing the peering operations, as well as the CA certificate of dc2.

consul-peering/terraform.tfvars

dc1_address = "https://learn-consul-sameness-dc1.consul.3eecc579-d274-4792-ae72-22d2035a4c1d.aws.hashicorp.cloud"
dc1_token = "<sensitive>"
dc2_address = "34.135.181.34"
dc2_token = "<sensitive>"
dc2_certificateauthority = "..." # output trimmed for brevity

Deploy the cluster peering. Confirm the run by entering yes.

$ terraform -chdir=consul-peering apply                
## ...
Plan: 6 to add, 0 to change, 0 to destroy.
## ...
Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes
## ...

Apply complete! Resources: 6 added, 0 changed, 0 destroyed.

Create a sameness group

Sameness groups make automatic failover possible between services with identical names in different datacenters. If Consul namespaces are enabled, they must also match in order for the failover to happen. For more information about preparing your Consul network for sameness groups, refer to the recommendations for sameness groups.

To use sameness groups in your network, you need to perform the following steps in each datacenter:

Create the sameness group
Export services to sameness group members
Create service intentions

To configure this sameness group to failover to other service instances in the sameness group by default, create a configuration entry for the sameness group that sets spec.defaultForFailover=true and list the group members in the order you want to use in a failover scenario. Refer to failover with sameness groups for more information.

The following CRDs with these configurations are included in this tutorial's repository:

dc1-sg-hashicups.yaml

123456789apiVersion: consul.hashicorp.com/v1alpha1
kind: SamenessGroup 
metadata:
  name: hashicups
spec:
  defaultForFailover: true
  members:                                   
    - partition: default
    - peer: learn-consul-sameness-dc2-default

The spec.members.peer stanza contains the peered datacenter name learn-consul-sameness-dc2 from the cluster peering page in HCP Consul, joined with a dash and the peered partition name default. If you formatted cluster names differently, use the consul peering CLI command to return a cluster's active cluster peering connections. For a full list of attributes for the SamenessGroup CRD, refer to the Sameness Group configuration entries.

Apply the dc1-sg-hashicups.yaml resource to the first Consul cluster:

$ kubectl --context=dc1 apply -f k8s-yamls/dc1-sg-hashicups.yaml
samenessgroup.consul.hashicorp.com/hashicups created

Apply the dc2-sg-hashicups.yaml resource to the second Consul cluster:

$ kubectl --context=dc2 apply -f k8s-yamls/dc2-sg-hashicups.yaml
samenessgroup.consul.hashicorp.com/hashicups created

Export service to other partition in the sameness group

The goal of this tutorial is for the HashiCups application in dc1 to fallback to the public-api service in dc2. Because this fallback goes in one direction, you only need apply the exported service CRD in dc2.

To make the public-api service available to other members of the sameness group, apply an exported services CRD. In this CRD, the sameness group is the consumer for the exported services.

The following configuration file demonstrates how to format an ExportedServices CRD. In this example, Consul exports the public-api service in the local default namespace to the sameness group. The metadata.name stanza refers to the local partition that the service is being exported from. The spec.services.[].namespace stanza reflects the local partition’s namespace from which the service is exported.

1 2 3 4 5 6 7 8 9 10apiVersion: consul.hashicorp.com/v1alpha1
Kind: ExportedServices
metadata:
  name: default
spec:
  services:
    - name: public-api
      namespace: default
      consumers:
        - samenessGroup: hashicups

For more information about exporting services, including examples of CRDs that export multiple services at the same time, refer to the exported services configuration entry reference.

Apply the ExportedServices CRD in dc2.

$ kubectl --context=dc2 apply -f k8s-yamls/exp-hashicups.yaml
exportedservices.consul.hashicorp.com/default created

Verify that the service from dc2 was exported to dc1 successfully. The following command queries the dc1 cluster about its peering connection with dc2:

$ curl \
  --silent \
  --header "X-Consul-Token: $(terraform -chdir=dc1-aws output -raw consul_token)" \
  "$(terraform -chdir=dc1-aws output -raw consul_public_url)/v1/peering/learn-consul-sameness-dc2-default" \
  | jq

The following example of this command's output highlights the name of the datacenter peer, the state of the peering connection, and a list of imported services.

{
  "ID": "be47e9d9-f3bb-6df5-0f76-99a456f18858",
  "Name": "learn-consul-sameness-dc2-default",
  "Partition": "default",
  "State": "ACTIVE",
  "PeerCAPems": [ "OMITTED FOR BREVITY"  ],
  "StreamStatus": {
    "ImportedServices": [
      "default/default/public-api"
    ],
    "ExportedServices": null,
    "LastHeartbeat": "2024-02-27T10:43:40.240719241Z",
    "LastReceive": "2024-02-27T10:43:40.240719241Z",
    "LastSend": "2024-02-26T14:48:05.907301818Z"
  },
  "CreateIndex": 4251,
  "ModifyIndex": 22038,
  "Remote": {
    "Partition": "default",
    "Datacenter": "learn-consul-sameness-dc2",
    "Locality": {
      "Region": "us-central1",
      "Zone": ""
    }
  }
}

Then, access the Consul UI for dc1 to verify that the sameness group actively supports datacenter failover for the public-api service.

$ echo "$(terraform -chdir=dc1-aws output -raw consul_public_url)/ui/$(terraform -chdir=dc1-aws output -raw consul_datacenter)/services/public-api/routing"
https://learn-consul-sameness-dc1.consul.3eecc579-d274-4792-ae72-22d2035a4c1d.aws.hashicorp.cloud/ui/learn-consul-sameness-dc1/services/public-api/routing

In the Consul UI, click Services. Click public-api and then Routing.

HCP UI details for the public-api service

The Resolvers for public-api lists the local instance, as well as the name of the cluster peer that resolves the remote instance: learn-consul-sameness-dc2-default.

Simulate and observe service failure

Retrieve the API gateway URL of the first datacenter. Open it in your browser to view the HashiCups application.

$ export APIGW_URL=$(kubectl --context=dc1 get services --namespace=consul consul-api-gateway -o jsonpath='{.status.loadBalancer.ingress[0].hostname}') && echo "http://$APIGW_URL"
http://a336ea2854e1c4f3294470eed4975c42-388180783.us-west-2.elb.amazonaws.com

The nginx service connects to public-api to retrieve a list of coffees. When the services can communicate with each other, HashiCups displays a selection of coffees in your browser.

HashiCups UI front end page

To simulate failure, delete public-api service from dc1.

$ kubectl --context=dc1 delete -f hashicups-v1.0.2/public-api.yaml
service "public-api" deleted
serviceaccount "public-api" deleted
servicedefaults.consul.hashicorp.com "public-api" deleted
deployment.apps "public-api-v1" deleted

Verify that there are no active public-api deployments in dc1.

$ kubectl --context=dc1 get deployments public-api
Error from server (NotFound): deployments.apps "public-api" not found

Refresh the HashiCups UI in your web browser.

HashiCups UI showing no products

At this point, the frontend service in dc1 is configured to connect to its local instance of the public-api service, however there is currently no instance of public-api on dc1. Before the failover to use the instance of public-api hosted in the sameness group, you must configure a Consul service intention for the public-api service in dc2 to authorize traffic.

Authorize traffic between sameness group members and observe recovery

The ExportedServices CRD does not automatically grant permission to accept traffic from a remote service. You must also create a service intention that references the sameness group.

The following configuration file demonstrates how to format a ServiceIntentions CRD so that a service named public-api becomes available to all instances of nginx deployed in all members of the sameness group. In the following example, public-api is deployed to the default namespace and default partition in both the local datacenter and the remote sameness group. The ServiceIntentions CRD includes two rules. One rule authorize traffic to the local service and the other authorizes traffic to the sameness group.

intentions-samenessgroup.yaml

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: public-api
  namespace: default
spec:
  sources:
  - name: nginx
    namespace: default
    samenessGroup: hashicups
    action: allow
  - name: nginx
    namespace: default
    action: allow
  destination:
    name: public-api

Refer to create and manage intentions for more information about how to create and apply service intentions in Consul.

Apply the ServiceIntentions CRD on the destination cluster dc2.

$ kubectl --context=dc2 apply -f k8s-yamls/intentions-samenessgroup.yaml
serviceintentions.consul.hashicorp.com/public-api configured

Refresh the HashiCups UI in your web browser.

HashiCups UI front end page

With a service intention that allows the connection, Consul is now able to route the request for the public-api service in dc1 to the public-api instance in dc2. Because these services are in the sameness group and defaultForFailover is enabled, the HashiCups UI automatically recovers and shows the list of coffees.

Without sameness groups, you must define exported services and service intentions for each service and each admin partition. However, the number of CRDs required grows exponentially, and changes to a service in a single datacenter would require re-applying the CRDs after each change. In multi-cloud deployments, these updates would be applied separately for each cloud provider.

Sameness groups allow you to export and authorize services in a point-to-multipoint approach by managing one sameness group membership and intention per datacenter. The consumers of the service can reside in different regions or cloud providers without complicating the connectivity matrix. Enabling connectivity with sameness groups is not affected by the number of datacenters involved and always consists of two steps: one sameness group CRD per datacenter and one service intention per destination service.

Clean up environments

After you complete this tutorial, you can stop Consul, the demo application, and remove the Kubernetes cluster in each datacenter. Begin the cleanup by removing the peering between the two Consul clusters.

$ terraform -chdir=consul-peering destroy

##...

Plan: 0 to add, 0 to change, 6 to destroy.

##...

Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes

##...

Destroy complete! Resources: 6 destroyed.

Next, destroy the supporting infrastructure in your second datacenter. Due to race conditions with the various cloud resources created in this tutorial, you may need to run the destroy operation twice to ensure all resources have been properly removed.

$ terraform -chdir=dc2-gcloud destroy

##...

Plan: 0 to add, 0 to change, 112 to destroy.

##...

Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes

##...

Destroy complete! Resources: 112 destroyed.

Next, destroy the supporting infrastructure in your first datacenter. Due to race conditions with the various cloud resources created in this tutorial, you may need to run the destroy operation twice to ensure all resources have been properly removed.

$ terraform -chdir=dc1-aws destroy

##...

Plan: 0 to add, 0 to change, 45 to destroy.

##...

Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes

##...

Destroy complete! Resources: 45 destroyed.

Next steps

In this tutorial, you used Consul sameness groups to automate failover across two Consul clusters in different cloud providers. In the process, you learned about the benefits of using Sameness Groups for highly available applications over multiple cloud provider deployments.

Feel free to explore these tutorials and collections to learn more about Consul service mesh, microservices, and Kubernetes security.

API gateway

Explore tutorial library