Optimizing Renovate for GitLab with 500+ Repositories

Renovate is a great tool for keeping your code secure and up-to-date. But when you have 500+ repositories, it can be slow and inefficient…

At Notive, we like to keep all our deployed code secure and make sure that none of our production environments run code with known vulnerabilities.

However, as a software agency, we also have many repositories that are deployed in stable production environments. These projects are still important, but they are no longer actively being worked on, and as a result, developers will not update them as part of the biweekly development cycle.

To solve this issue, we recently introduced Renovate, which opens merge requests with package updates for us daily. This creates awareness of security updates for both actively maintained repositories and repositories that have become stale in terms of feature development.

Renovate checks your language's package files (package.json for npm, go.mod for Go), the services referenced in your docker-compose.yml, and the FROM statements in your Dockerfiles. It even checks your Terraform modules and Kustomize manifests for container images that have updates available.

This gives us a single tool responsible for every kind of dependency in our projects, whereas other tools focus on a single component (for example, we use Trivy to scan built container images).

Initial configuration

We introduced Renovate using the GitLab CI configuration that the Renovate project supplies for running it on GitLab runners: https://gitlab.com/renovate-bot/renovate-runner

include: 
    - project: 'renovate-bot/renovate-runner' 
      file: '/templates/renovate.gitlab-ci.yml'

This is a great starting point. However, when we ran this pipeline on a schedule, the resulting GitLab CI job ran for more than 60 minutes and hit our configured CI timeout of 1 hour.

We also observed that our runner slowed down, because every commit Renovate pushed to one of our other repositories triggered a CI pipeline in that repository as well.

Parallel runs

Running all projects in a single job has a few other downsides:

  • When the job crashed, all remaining repositories in the queue failed along with it.
  • The job log was enormous and exceeded the default GitLab CI log size limit, making it hard to trace any unusual behavior.
  • A single long-running job hogged resources on our GitLab CI runner.

Renovate recommends using parallel jobs as an improvement. Parallel jobs (GitLab's parallel:matrix keyword) run the same job several times simultaneously, each run with a different set of variables.

We updated our .gitlab-ci.yml to use a parallel matrix, based on the template provided by Renovate: https://github.com/renovatebot/docker-renovate/blob/HEAD/docs/gitlab.md

include: 
  - project: 'renovate-bot/renovate-runner' 
    file: '/templates/renovate-slim.gitlab-ci.yml' 
    ref: v8.81.6 
 
renovate: 
  variables: 
    RENOVATE_AUTODISCOVER: 'true' 
    RENOVATE_AUTODISCOVER_FILTER: '<group>/**' 
  script: 
    - renovate --write-discovered-repos=template/renovate-repos.json 
    - sed "s~###RENOVATE_REPOS###~$(cat template/renovate-repos.json)~" template/.gitlab-ci.yml > .gitlab-renovate-repos.yml 
  artifacts: 
    paths: 
      - template/renovate-repos.json
      - .gitlab-renovate-repos.yml 
 
renovate:repos: 
  stage: deploy 
  needs: 
    - renovate 
  inherit: 
    variables: false 
  trigger: 
    include: 
      - job: renovate 
        artifact: .gitlab-renovate-repos.yml

And the template/.gitlab-ci.yml file, which is used for the parallel job matrix:

include: 
  - project: 'renovate-bot/renovate-runner' 
    file: '/templates/renovate-slim.gitlab-ci.yml' 
    ref: v8.81.6 
 
variables: 
  RENOVATE_ONBOARDING: 'true' 
 
renovate: 
  parallel: 
    matrix: 
      - RENOVATE_EXTRA_FLAGS: ###RENOVATE_REPOS### 
  resource_group: $RENOVATE_EXTRA_FLAGS

Great! The initial job now discovers all the repositories, and the triggered child pipeline creates one parallel job per repository, using the array of repositories found during the autodiscovery stage as input.
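To see what the child pipeline actually receives, the sed step in the renovate job can be reproduced locally. The sketch below is a minimal stand-in, not Renovate's real output: it assumes a two-entry discovery file with hypothetical paths (my-group/api, my-group/web) and a trimmed copy of template/.gitlab-ci.yml that contains only the matrix section.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the placeholder substitution from the renovate job.
# The repository paths and the trimmed template are hypothetical stand-ins.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/template"

# Stand-in for the file written by: renovate --write-discovered-repos=...
printf '%s' '["my-group/api","my-group/web"]' > "$workdir/template/renovate-repos.json"

# Trimmed stand-in for template/.gitlab-ci.yml (only the matrix section)
cat > "$workdir/template/.gitlab-ci.yml" <<'EOF'
renovate:
  parallel:
    matrix:
      - RENOVATE_EXTRA_FLAGS: ###RENOVATE_REPOS###
  resource_group: $RENOVATE_EXTRA_FLAGS
EOF

# Same sed call as the pipeline: replace the placeholder with the JSON array
sed "s~###RENOVATE_REPOS###~$(cat "$workdir/template/renovate-repos.json")~" \
  "$workdir/template/.gitlab-ci.yml" > "$workdir/.gitlab-renovate-repos.yml"

# Show the child-pipeline file that the trigger job would include
cat "$workdir/.gitlab-renovate-repos.yml"
```

The generated file contains the line - RENOVATE_EXTRA_FLAGS: ["my-group/api","my-group/web"], a YAML flow sequence that GitLab expands into one job per entry. Using ~ as the sed delimiter works here because GitLab project paths cannot contain ~ or &, which sed would otherwise treat as the delimiter or as the matched text.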

We tried running this, but unfortunately, a single parallel:matrix job in GitLab can generate at most 200 jobs. Although there is an open issue to lift this limitation on self-hosted instances, it doesn't seem likely to be resolved anytime soon.

GitLab failed the pipeline because too many jobs were generated

Final optimization

Our last solution was to ensure that the jobs would fit within the 200-job limit. To achieve this, we decided to bundle repositories in small batches.

However, batching reintroduces some of the drawbacks of the single-job setup that we wanted to address with parallel runs. As a compromise, we grouped the repositories in batches of 5, with one parallel job per batch. This seemed like a good balance between the advantages and disadvantages of concurrency.
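The job-count arithmetic behind this choice is plain ceiling division. The sketch below uses two hypothetical helper functions (job_count and min_batch_size) and an illustrative figure of 500 repositories; your own count will differ.

```shell
#!/usr/bin/env bash
# Sketch: how many parallel jobs a batch size produces, and the smallest
# batch size that keeps a parallel:matrix within GitLab's 200-job limit.
set -euo pipefail

# job_count REPOS BATCH (hypothetical helper): ceil(REPOS / BATCH)
job_count() {
  local repos=$1 batch=$2
  echo $(( (repos + batch - 1) / batch ))
}

# min_batch_size REPOS LIMIT (hypothetical helper): ceil(REPOS / LIMIT),
# the smallest batch size whose job count does not exceed LIMIT
min_batch_size() {
  local repos=$1 limit=$2
  echo $(( (repos + limit - 1) / limit ))
}

repos=500   # illustrative; "500+" repositories in our case
limit=200   # maximum jobs GitLab generates for one parallel:matrix job

echo "batch size 1: $(job_count "$repos" 1) jobs"
echo "batch size 5: $(job_count "$repos" 5) jobs"
echo "minimum batch size: $(min_batch_size "$repos" "$limit")"
```

For 500 repositories it reports 500 jobs at a batch size of 1, 100 jobs at a batch size of 5, and a minimum batch size of 3. Batches of 5 therefore leave headroom until the repository count approaches 1,000.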

To implement this, we changed the renovate job in our .gitlab-ci.yml and post-processed the file generated by Renovate's discovery with some shell magic:

renovate: 
  variables: 
    RENOVATE_AUTODISCOVER: 'true' 
  script: 
    - renovate --write-discovered-repos=template/renovate-repos.json 
    - | 
      # Define the input JSON file 
      input_file="template/renovate-repos.json" 
 
      # Read the JSON array from the input file 
      json_array=$(cat "$input_file") 
 
      # Print one repository per line, then group them into lines of up to 5 space-separated repositories
      split_array=$(echo "$json_array" | jq -r '.[]' | xargs -n5)

      # Turn each line into a JSON string and collect all lines into a single JSON array
      merged_array=$(echo "$split_array" | jq -R . | jq -s '.')

      # Write the array to the output file on a single line so sed can inline it
      echo "$merged_array" | tr -d '\n' > "template/renovate-repos-merged.json"
 
    - sed "s~###RENOVATE_REPOS###~$(cat template/renovate-repos-merged.json)~" template/.gitlab-ci.yml > .gitlab-renovate-repos.yml 
  artifacts: 
    paths: 
      - template/renovate-repos-merged.json 
      - .gitlab-renovate-repos.yml

This approach limits the impact of an error in one repository to its own batch of five. It also lets our GitLab CI runners scale horizontally by running multiple jobs concurrently.
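The batching step in the script above can also be checked locally without installing jq. The sketch below is a dependency-light reimplementation, not the script we run in CI: the batch_repos function name is hypothetical, and it assumes repository paths contain no spaces, quotes, or backslashes, which holds for GitLab project paths.

```shell
#!/usr/bin/env bash
# Sketch: group repository paths into batches and emit a compact JSON array,
# using only xargs and printf (no jq). Assumes paths contain no spaces,
# quotes, or backslashes, because xargs would split or unquote them.
set -euo pipefail

# batch_repos SIZE (hypothetical helper)
# Reads one repository path per line from stdin and prints a JSON array whose
# elements are up to SIZE paths joined by single spaces.
batch_repos() {
  local size=$1
  printf '['
  # xargs -n groups its input into lines of at most SIZE arguments;
  # -r stops it from running echo once when there is no input at all.
  xargs -r -n "$size" echo | {
    first=1
    while IFS= read -r line; do
      # Separate elements with commas, but not before the first one
      if [ "$first" -eq 1 ]; then first=0; else printf ','; fi
      printf '"%s"' "$line"
    done
  }
  printf ']\n'
}

# Hypothetical repository paths standing in for the discovered list
printf '%s\n' group/a group/b group/c group/d group/e group/f group/g | batch_repos 5
```

With the seven sample paths it prints ["group/a group/b group/c group/d group/e","group/f group/g"]. For a non-empty discovery file, feeding it the repository paths one per line should give a JSON array equivalent to the one the pipeline writes; the jq version differs only in whitespace.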

Conclusion

Ultimately, a single parallel job takes approximately 5 minutes, and the entire pipeline completes in around 15 minutes. This timeframe is acceptable for us as we run the pipeline daily after working hours.

Ideally, GitLab will introduce a setting that lets self-hosted instances raise the maximum number of jobs a parallel matrix can spawn. That would let us return to one job per repository, which would improve error reporting and monitoring for our Renovate pipeline.

There is still a lot of Renovate configuration to add to better integrate it into our day-to-day workflow. For example, we need to configure default auto-merge settings and make the merge requests (MRs) it creates easier for our developers to manage. Nevertheless, having Renovate working for so many repositories is a great first step!