DagsterDocs

Partitions

class dagster.Partition(value=None, name=None)[source]

Partition is the representation of a logical slice across an axis of a pipeline’s work

Parameters
  • value (Any) – The object for this partition

  • name (str) – Name for this partition

class dagster.PartitionSetDefinition(name, pipeline_name, partition_fn, solid_selection=None, mode=None, run_config_fn_for_partition=<function PartitionSetDefinition.<lambda>>, tags_fn_for_partition=<function PartitionSetDefinition.<lambda>>)[source]

Defines a partition set, representing the set of slices making up an axis of a pipeline

Parameters
  • name (str) – Name for this partition set

  • pipeline_name (str) – The name of the pipeline definition

  • partition_fn (Callable[void, List[Partition]]) – User-provided function to define the set of valid partition objects.

  • solid_selection (Optional[List[str]]) – A list of solid subselection (including single solid names) to execute with this partition. e.g. ['*some_solid+', 'other_solid']

  • mode (Optional[str]) – The mode to apply when executing this partition. (default: ‘default’)

  • run_config_fn_for_partition (Callable[[Partition], [Dict]]) – A function that takes a Partition and returns the run configuration that parameterizes the execution for this partition, as a dict

  • tags_fn_for_partition (Callable[[Partition], Optional[dict[str, str]]]) – A function that takes a Partition and returns a list of key value pairs that will be added to the generated run for this partition.

dagster.date_partition_range(start, end=None, delta_range='days', fmt=None, inclusive=False, timezone=None)[source]

Utility function that returns a partition generating function to be used in creating a PartitionSet definition.

Parameters
  • start (datetime) – Datetime capturing the start of the time range.

  • end (Optional(datetime)) – Datetime capturing the end of the partition. By default, the current time is used. The range is not inclusive of the end value.

  • delta_range (Optional(str)) – string representing the time duration of each partition. Must be a valid argument to pendulum.period.range (“days”, “hours”, “months”, etc.).

  • fmt (Optional(str)) – Format string to represent each partition by its start time

  • inclusive (Optional(bool)) – By default, the partition set only contains date interval partitions for which the end time of the interval is less than current time. In other words, the partition set contains date interval partitions that are completely in the past. If inclusive is set to True, then the partition set will include all date interval partitions for which the start time of the interval is less than the current time.

  • timezone (Optional(str)) – Timezone in which the partition values should be expressed.

Returns

Callable[[], List[Partition]]

dagster.identity_partition_selector(context, partition_set_def)[source]

Utility function for supplying a partition selector when creating a schedule from a partition set made of ``datetime``s that assumes the schedule always executes at the partition time.

It’s important that the cron string passed into create_schedule_definition match the partition set times. For example, a schedule created from a partition set with partitions for each day at midnight would create its partition selector as follows:

partition_set = PartitionSetDefinition(
    name='hello_world_partition_set',
    pipeline_name='hello_world_pipeline',
    partition_fn= date_partition_range(
        start=datetime.datetime(2021, 1, 1),
        delta_range="days",
        timezone="US/Central",
    )
    run_config_fn_for_partition=my_run_config_fn,
)

schedule_definition = partition_set.create_schedule_definition(
    "hello_world_daily_schedule",
    "0 0 * * *",
    partition_selector=identity_partition_selector,
    execution_timezone="US/Central",
)
dagster.create_offset_partition_selector(execution_time_to_partition_fn)[source]

Utility function for supplying a partition selector when creating a schedule from a partition set made of `datetime`s that assumes a fixed time offset between the partition time and the time at which the schedule executes.

It’s important to keep the cron string that’s supplied to PartitionSetDefinition.create_schedule_definition in sync with the offset that’s supplied to this function. For example, a schedule created from a partition set with partitions for each day at midnight that fills in the partition for day N at day N+1 at 10:00AM would create the partition selector as follows:

partition_set = PartitionSetDefinition(
    name='hello_world_partition_set',
    pipeline_name='hello_world_pipeline',
    partition_fn= date_partition_range(
        start=datetime.datetime(2021, 1, 1),
        delta_range="days",
        timezone="US/Central",
    )
    run_config_fn_for_partition=my_run_config_fn,
)

schedule_definition = partition_set.create_schedule_definition(
    "daily_10am_schedule",
    "0 10 * * *",
    partition_selector=create_offset_partition_selector(lambda d: d.subtract(hours=10, days=1))
    execution_timezone="US/Central",
)
Parameters
  • execution_time_to_partition_fn (Callable[[datetime.datetime], datetime.datetime]) – A

  • that maps the execution time of the schedule to the partition time. (function) –