
Customizing your Kubernetes Deployment#

Overview#

This guide covers common ways to customize your Dagster Helm deployment: adding Kubernetes and Celery configuration at the pipeline and solid level, configuring Celery queues, and configuring your Helm release to use external resources.

Prerequisites#

We expect familiarity with the basic guide and advanced guide on deploying Dagster with Helm.

Walkthrough#

Solid or Pipeline Kubernetes Configuration#

The dagster-k8s/config tag lets users pass custom configuration to the Kubernetes Job, Job metadata, JobSpec, PodSpec, and PodTemplateSpec metadata. This configuration is specified in a solid's or pipeline's tags.

from dagster import pipeline, solid

@solid(
  tags = {
    'dagster-k8s/config': {
      'container_config': {
        'resources': {
          'requests': { 'cpu': '250m', 'memory': '64Mi' },
          'limits': { 'cpu': '500m', 'memory': '2560Mi' },
        }
      },
      'pod_template_spec_metadata': {
        'annotations': { 'cluster-autoscaler.kubernetes.io/safe-to-evict': 'true' }
      },
      'pod_spec_config': {
        'affinity': {
          'nodeAffinity': {
            'requiredDuringSchedulingIgnoredDuringExecution': {
              'nodeSelectorTerms': [{
                'matchExpressions': [{
                  'key': 'beta.kubernetes.io/os', 'operator': 'In', 'values': ['windows', 'linux'],
                }]
              }]
            }
          }
        }
      },
    },
  },
)
def my_solid(context):
  context.log.info('running')

@pipeline(
  tags = {
    'dagster-k8s/config': {
      'container_config': {
        'resources': {
          'requests': { 'cpu': '200m', 'memory': '32Mi' },
        }
      },
    }
  }
)
def my_pipeline():
  my_solid()
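
When many solids need similar resource settings, the nested tag dictionary can be generated instead of written by hand. The following is a minimal sketch in plain Python; `k8s_config_tags` is a hypothetical helper introduced here for illustration and is not part of dagster-k8s. It only builds the `container_config` resources section shown above.

```python
def k8s_config_tags(cpu_request, memory_request, cpu_limit=None, memory_limit=None):
    """Build a tags dict with a 'dagster-k8s/config' resources section.

    Hypothetical helper for illustration; not part of dagster-k8s.
    """
    resources = {"requests": {"cpu": cpu_request, "memory": memory_request}}

    # Only emit a 'limits' section when at least one limit was given.
    limits = {}
    if cpu_limit is not None:
        limits["cpu"] = cpu_limit
    if memory_limit is not None:
        limits["memory"] = memory_limit
    if limits:
        resources["limits"] = limits

    return {"dagster-k8s/config": {"container_config": {"resources": resources}}}


# Same values as the solid and pipeline examples above.
solid_tags = k8s_config_tags("250m", "64Mi", cpu_limit="500m", memory_limit="2560Mi")
pipeline_tags = k8s_config_tags("200m", "32Mi")
```

The resulting dictionaries could then be passed as `tags=solid_tags` or `tags=pipeline_tags` to the decorators.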

Configuring Celery Queues#

Users can configure multiple Celery queues (for example, one queue for each resource to be limited) and multiple Celery workers per queue via the runLauncher.config.celeryK8sRunLauncher.workerQueues section of values.yaml.

To route a solid to a specific queue, set the dagster-celery/queue tag on the solid.

By default, all solids will be sent to the default Celery queue named dagster.

from dagster import solid

@solid(
  tags = {
    'dagster-celery/queue': 'snowflake_queue',
  }
)
def my_solid(context):
  context.log.info('running')
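
With one queue per limited resource, it can help to keep the queue names in one place. The sketch below assumes a hypothetical mapping, `RESOURCE_QUEUES`, and a hypothetical helper, `queue_tags`; neither is part of dagster-celery. Solids with no matching resource fall back to the default queue name, dagster.

```python
DEFAULT_QUEUE = "dagster"  # default Celery queue name described above

# Hypothetical mapping from a limited resource to its Celery queue.
RESOURCE_QUEUES = {"snowflake": "snowflake_queue"}


def queue_tags(resource=None):
    """Return solid tags that route to the queue for `resource`.

    Falls back to the default queue when the resource has no entry.
    """
    queue = RESOURCE_QUEUES.get(resource, DEFAULT_QUEUE)
    return {"dagster-celery/queue": queue}


snowflake_tags = queue_tags("snowflake")
default_tags = queue_tags()
```

Each queue name used here would presumably also need a matching entry in the workerQueues section of values.yaml so that workers are listening on it.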

Celery Priority#

Users can set dagster-celery/run_priority on the pipeline tags to configure the baseline priority of all solids from that pipeline. To set priority at the solid level, users can set dagster-celery/priority on the solid tags to configure additional priority. When priorities are set on both the pipeline and solid, the sum of both priorities will be used.

from dagster import pipeline, solid

@solid(
  tags = {
    'dagster-celery/priority': 2,
  }
)
def my_solid(context):
  context.log.info('running')

@pipeline(
  tags = {
    'dagster-celery/run_priority': 3,
  }
)
def my_pipeline():
  my_solid()
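
To make the sum rule concrete, the sketch below computes the priority for the example above. It is an illustration of the arithmetic only, not dagster-celery's implementation; `effective_priority` is a hypothetical name, and treating a missing tag as 0 is an assumption.

```python
def effective_priority(pipeline_tags, solid_tags):
    """Sum the pipeline baseline priority and the solid priority.

    Assumption: a missing tag contributes 0.
    """
    run_priority = int(pipeline_tags.get("dagster-celery/run_priority", 0))
    solid_priority = int(solid_tags.get("dagster-celery/priority", 0))
    return run_priority + solid_priority


# Tags from the example above: 3 on the pipeline, 2 on the solid.
priority = effective_priority(
    {"dagster-celery/run_priority": 3},
    {"dagster-celery/priority": 2},
)
print(priority)  # 5
```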

Configuring an External Database#

In a real deployment, users will likely want to set up an external PostgreSQL database and configure the postgresql section of values.yaml.

postgresql:
  enabled: false
  postgresqlHost: "postgresqlHost"
  postgresqlUsername: "postgresqlUsername"
  postgresqlPassword: "postgresqlPassword"
  postgresqlDatabase: "postgresqlDatabase"
  service:
    port: 5432

Configuring an External Message Broker#

In a real deployment, users will likely want to set up an external message broker such as Redis, and configure the rabbitmq and redis sections of values.yaml.

rabbitmq:
  enabled: false

redis:
  enabled: true
  internal: false
  host: "redisHost"
  port: 6379
  brokerDbNumber: 0
  backendDbNumber: 0

Security#

Users will likely want to grant a ServiceAccount, bound to a properly scoped Role, permission to launch Jobs and create other Kubernetes resources.

Users will likely want to use Secrets for managing secure information such as database logins.
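
As an illustration of the Secrets approach, the sketch below builds a Kubernetes Secret manifest for a database password. Kubernetes expects values under `data` to be base64-encoded. The Secret name `my-dagster-db-secret`, the key `postgresql-password`, and the helper `secret_manifest` are hypothetical, and the sketch does not show how the Helm chart would reference the Secret.

```python
import base64
import json


def secret_manifest(name, values):
    """Build an Opaque Secret manifest with base64-encoded values."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": encoded,
    }


# Hypothetical Secret name and key; the password is a placeholder.
manifest = secret_manifest(
    "my-dagster-db-secret",
    {"postgresql-password": "postgresqlPassword"},
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.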

Conclusion#

You should now be familiar with the common ways to customize your Dagster Helm deployment.