Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bitnami/argo-cd] Make it possible to run ArgoCD in HA mode #27585

Merged
merged 7 commits into from
Jul 12, 2024

Conversation

maxnitze
Copy link
Contributor

@maxnitze maxnitze commented Jun 28, 2024

Description of the change

When the replicaCount of the application controller or the server deployments in the ArgoCD chart are set to a number higher than one, an additional env variable is set on the respective container ARGOCD_CONTROLLER_REPLICAS and ARGOCD_API_SERVER_REPLICAS.

In addition a change was added, that allows the application controller to be installed as a StatefulSet. I cannot find direct confirmation in the docs for it, but it seems the sharding of clusters, that is enabled when running with multiple application controller instances, is only working, when the pods are running as part of a stateful set. Otherwise they can't figure out their shard number and some clusters are not handled by any replica anymore (see here for example: argoproj/argo-cd#17016).

The "official" Helm chart also installs the application controller as a StatefulSet (see https://github.com/argoproj/argo-helm/tree/main/charts/argo-cd/templates/argocd-application-controller). There are cases, where it is installed as a Deployment, but that's for a (currently in alpha) feature called Dynamic Cluster Distribution.

Also a validation was added, that multiple replicas of the application controller can only be set if the kind is changed to a StatefulSet (see "Benefits" for an explanation why).

Benefits

As per ArgoCD documentation on high availability, both the application controller and the server should have an environment variable set to the number of replicas in case they are run with multiple.

The application controller uses this value to "shard clusters across multiple controller replicas". The server uses it to "divide the limit of concurrent login requests (ARGOCD_MAX_CONCURRENT_LOGIN_REQUESTS_COUNT) between each replica." The repo-server has no such environment variable.

Especially on the application controller, if the variable is not set, then the different replica will concurrently try to do the same operations on the clusters. Which results in constantly out-of-sync applications and other issues with the apps.

Possible drawbacks

None for the application itself.

Is this change breaking?

There are some interesting cases though.

Installations with Multiple Replicas of the Application Controller

In case you already have an installation with multiple replicas of the application controller, the chart will fail to validate, if you have not set the .Values.controller.kind to "StatefulSet". But in that case your application would not be running properly anyways. I had a similar setup for a week, or so, and the controller pods failed every couple of minutes due to memory issues created by the fact, that they concurrently worked on my clusters.

Env variable is set already in extraEnvVars

In case a user set the respective env variable already in the extraEnvVars block, this leads to a duplicate entry:

spec:
  replicas: 3
  templates:
    spec:
      containers:
        - name: controller
          env:
            - name: ARGOCD_CONTROLLER_REPLICAS
              value: "3" # <-- from the .Values.controller.replicaCount
            # ...
            - name: ARGOCD_CONTROLLER_REPLICAS
              value: "3" # <-- from .Values.controller.extraEnvVars

This is not a problem, since according to the documentation the last occurrence of an environment variable takes precedence.

Env variable is set already in extraEnvVarsCM or extraEnvVarsSecret

This one is a bit more tricky. According to the docs, the environment variables from env take precedence over those from envFrom. So in theory this could overwrite a value set in the extraEnvVarsCM or extraEnvVarsSecret.

BUT: This is only an issue, if those values differ. And since this value should be the same value as the number of replicas, this is a misconfiguration, from my perspective. I don't actually know what the implications of a wrong value are though. I could not find detailed info on the sharding of the application controller, but reading the linked docs I assume this could lead to clusters not being served by any of the replicas!

Applicable issues

None

Additional information

There are some issues and PRs in the ArgoCD repo talking about this:

Checklist

  • Chart version bumped in Chart.yaml according to semver. This is not necessary when the changes only affect README.md files.
  • Variables are documented in the values.yaml and added to the README.md using readme-generator-for-helm
  • Title of the pull request follows this pattern [bitnami/<name_of_the_chart>] Descriptive title
  • All commits signed off and in agreement of Developer Certificate of Origin (DCO)
@github-actions github-actions bot added argo-cd triage Triage is needed labels Jun 28, 2024
@github-actions github-actions bot requested a review from carrodher June 28, 2024 13:23
@carrodher carrodher added verify Execute verification workflow for these changes in-progress labels Jul 1, 2024
@github-actions github-actions bot removed the triage Triage is needed label Jul 1, 2024
@github-actions github-actions bot removed the request for review from carrodher July 1, 2024 06:41
@github-actions github-actions bot requested a review from fmulero July 1, 2024 06:42
@maxnitze maxnitze marked this pull request as draft July 1, 2024 13:36
@maxnitze
Copy link
Contributor Author

maxnitze commented Jul 1, 2024

Converted this back to a draft. I ran into some major issues with the sharding of the application controller today. Apparently it only really works, when the controller is installed as a StatefulSet :(

@maxnitze maxnitze force-pushed the feature/argocd-replica-env-variables branch 3 times, most recently from 43ff3c9 to cff50e4 Compare July 3, 2024 07:30
@maxnitze maxnitze marked this pull request as ready for review July 3, 2024 07:55
@maxnitze maxnitze changed the title [bitnami/argo-cd] Add replica env variables to app controller and server Jul 3, 2024
@maxnitze maxnitze force-pushed the feature/argocd-replica-env-variables branch 3 times, most recently from 694596a to 63cad23 Compare July 5, 2024 09:39
@maxnitze maxnitze force-pushed the feature/argocd-replica-env-variables branch from 2c03212 to 7d967b0 Compare July 9, 2024 09:53
Signed-off-by: Bitnami Containers <bitnami-bot@vmware.com>
Copy link
Collaborator

@fmulero fmulero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @maxnitze Thanks a lot for your contribution, this will be very useful for all users!

I left some comments, Could you give them a glance?

bitnami/argo-cd/templates/application-controller/_pod.tpl Outdated Show resolved Hide resolved
affinity: {{- include "common.tplvalues.render" ( dict "value" .Values.controller.affinity "context" $) | nindent 8 }}
{{- else }}
affinity:
podAffinity: {{- include "common.affinities.pods" (dict "type" .Values.controller.podAffinityPreset "component" "controller" "customLabels" $podLabels "context" $) | nindent 10 }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$podLabels variable is not declared

bitnami/argo-cd/templates/server/deployment.yaml Outdated Show resolved Hide resolved
bitnami/argo-cd/templates/_helpers.tpl Outdated Show resolved Hide resolved
bitnami/argo-cd/templates/application-controller/_pod.tpl Outdated Show resolved Hide resolved
…Labels variable

Signed-off-by: Max Nitze <max.nitze@mgm-tp.com>
@maxnitze
Copy link
Contributor Author

Hey @fmulero ,

thanks for the review! Looking at the Pod partial I saw, that the whole indentation was wrong. In the file itself, but also the calls to nindent were not adapted accordingly. Fixed that as well as your other remarks :)

Signed-off-by: Bitnami Containers <bitnami-bot@vmware.com>
…dation messages

Signed-off-by: Max Nitze <max.nitze@mgm-tp.com>
@maxnitze maxnitze requested a review from fmulero July 11, 2024 14:06
Copy link
Collaborator

@fmulero fmulero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @maxnitze for your contribution.

LGTM

@fmulero fmulero merged commit 8d231f8 into bitnami:main Jul 12, 2024
14 checks passed
@maxnitze maxnitze deleted the feature/argocd-replica-env-variables branch July 12, 2024 17:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
argo-cd solved verify Execute verification workflow for these changes
4 participants