Task Inputs
Task inputs are the data that a task needs to run. We’ll show you how to define them.
Understanding Task Inputs
Task inputs are a crucial component in defining the structure and flow of data processing pipelines. They serve two primary functions:
- Specifying the data requirements for each task
- Establishing the order of task execution
The task_inputs field is used to define these relationships, effectively creating a Directed Acyclic Graph (DAG) of task dependencies. This DAG ensures that tasks are executed in the correct order, with each task receiving its required inputs only after they have been produced by preceding tasks.
Key Characteristics of the Task Input DAG:
- Directed: Relationships between tasks have a specific direction, from input to output.
- Acyclic: The graph does not contain cycles, preventing infinite loops in task execution. That’s a fancy way to say “we make sure your data doesn’t end up chasing its own tail.”
- Graph: Tasks and their dependencies form a network structure.
In this example, we can observe the DAG structure:
- invoice_parsing is the root node, with no dependencies.
- extract_customer_info and extract_invoice_items both depend on parsed_invoice.
- create_invoice_qna depends on both invoice_items and customer_info.
This structure ensures that:
- Tasks are executed in the correct order
- Each task has access to its required inputs
- Parallel execution is possible for independent tasks
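The dependencies listed above can be grouped into stages of mutually independent tasks, which is what makes parallel execution possible. The sketch below uses a layered form of Kahn's topological sort. The DEPENDENCIES mapping and the execution_stages function are hypothetical. Which task produces which input (for example, extract_invoice_items producing invoice_items) is also an assumption.

```python
# Task -> tasks it waits on, following the dependencies described in the text.
# The producer of each input is assumed for this sketch.
DEPENDENCIES: dict[str, set[str]] = {
    "invoice_parsing": set(),
    "extract_customer_info": {"invoice_parsing"},
    "extract_invoice_items": {"invoice_parsing"},
    "create_invoice_qna": {"extract_invoice_items", "extract_customer_info"},
}


def execution_stages(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into stages; tasks within one stage are independent."""
    # Copy the sets so the caller's mapping is not mutated.
    remaining = {task: set(parents) for task, parents in deps.items()}
    stages: list[list[str]] = []
    while remaining:
        # A task is ready once all of its dependencies have already run.
        ready = sorted(task for task, parents in remaining.items() if not parents)
        if not ready:
            # Nothing is ready but tasks remain: they must form a cycle.
            raise ValueError("cycle detected among: " + ", ".join(sorted(remaining)))
        stages.append(ready)
        for task in ready:
            del remaining[task]
        # Completed tasks no longer block anything downstream.
        for parents in remaining.values():
            parents.difference_update(ready)
    return stages


for i, stage in enumerate(execution_stages(DEPENDENCIES), start=1):
    print(f"stage {i}: {stage}")
```

Running it prints three stages: invoice_parsing alone, then extract_customer_info and extract_invoice_items together, then create_invoice_qna.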
Validation and Execution
The system performs validation on the task input definitions to ensure the DAG’s integrity:
- Cyclic Dependency Check: Verifies that no cycles exist in the task dependencies.
- Existence Check: Confirms that all referenced inputs are defined.
- Type Consistency: Ensures that input and output data types are compatible.
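The three checks above could look roughly like this minimal Python sketch. TaskSpec, output_type, the typed task_inputs mapping, and validate are all hypothetical. Type names are compared as plain strings, which is a simplifying assumption. It is not how any particular system checks compatibility.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    # Hypothetical schema: `output_type` and typed `task_inputs` are assumptions.
    name: str
    output: str
    output_type: str
    task_inputs: dict[str, str] = field(default_factory=dict)  # input name -> expected type


def validate(tasks: list[TaskSpec]) -> list[str]:
    """Return a list of validation errors (empty if the DAG is valid)."""
    errors: list[str] = []
    by_output = {t.output: t for t in tasks}

    # Existence and type-consistency checks.
    for t in tasks:
        for inp, expected in t.task_inputs.items():
            producer = by_output.get(inp)
            if producer is None:
                errors.append(f"{t.name}: input '{inp}' is not produced by any task")
            elif producer.output_type != expected:
                errors.append(
                    f"{t.name}: input '{inp}' expects {expected}, "
                    f"but {producer.name} produces {producer.output_type}"
                )

    # Cycle check: depth-first search that tracks tasks still on the stack.
    deps = {
        t.name: [by_output[i].name for i in t.task_inputs if i in by_output]
        for t in tasks
    }
    visiting: set[str] = set()
    done: set[str] = set()

    def visit(name: str) -> bool:
        if name in done:
            return False
        if name in visiting:
            return True  # back edge: we reached a task that is still being visited
        visiting.add(name)
        found = any(visit(dep) for dep in deps[name])
        visiting.discard(name)
        done.add(name)
        return found

    if any(visit(t.name) for t in tasks):
        errors.append("cyclic dependency detected")
    return errors


tasks = [
    TaskSpec("invoice_parsing", "parsed_invoice", "Document"),
    TaskSpec("extract_customer_info", "customer_info", "Customer", {"parsed_invoice": "Document"}),
    TaskSpec("extract_invoice_items", "invoice_items", "list[Item]", {"parsed_invoice": "Document"}),
    TaskSpec("create_invoice_qna", "invoice_qna", "QnA",
             {"invoice_items": "list[Item]", "customer_info": "Customer"}),
]
print(validate(tasks))  # prints []
```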
During execution, the DAG is traversed to determine an execution order in which every task runs only after all of its inputs are available, potentially allowing independent task branches to be processed in parallel. As tasks complete, their outputs are stored and made available to downstream tasks.
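Below is a minimal sketch of this execution loop. It uses Python's standard graphlib.TopologicalSorter to hand out batches of ready tasks and a ThreadPoolExecutor to run each batch concurrently. The Pipeline layout, the run function, and the example task functions are hypothetical. Outputs are kept in an in-memory dict keyed by output name, which stands in for whatever storage the system actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Any, Callable

# Hypothetical pipeline layout: task name -> (output name, input names, function).
Pipeline = dict[str, tuple[str, list[str], Callable[..., Any]]]


def run(pipeline: Pipeline) -> dict[str, Any]:
    # Resolve each input to the task that produces it, giving task -> predecessors.
    producers = {out: name for name, (out, _, _) in pipeline.items()}
    graph = {name: {producers[i] for i in inputs} for name, (_, inputs, _) in pipeline.items()}
    store: dict[str, Any] = {}  # outputs of completed tasks, keyed by output name

    def execute(name: str) -> None:
        out, inputs, fn = pipeline[name]
        # Every input is already in the store because its producer has finished.
        # Each task writes a distinct key, which is safe enough for this sketch.
        store[out] = fn(**{i: store[i] for i in inputs})

    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = list(sorter.get_ready())
            # Tasks in the same batch are independent, so they may run in parallel.
            list(pool.map(execute, ready))
            sorter.done(*ready)
    return store


pipeline: Pipeline = {
    "invoice_parsing": ("parsed_invoice", [], lambda: {"customer": "ACME", "items": ["widget"]}),
    "extract_customer_info": ("customer_info", ["parsed_invoice"], lambda parsed_invoice: parsed_invoice["customer"]),
    "extract_invoice_items": ("invoice_items", ["parsed_invoice"], lambda parsed_invoice: parsed_invoice["items"]),
    "create_invoice_qna": (
        "invoice_qna",
        ["invoice_items", "customer_info"],
        lambda invoice_items, customer_info: f"{customer_info} bought {', '.join(invoice_items)}",
    ),
}
print(run(pipeline)["invoice_qna"])  # prints ACME bought widget
```

TopologicalSorter.prepare() raises graphlib.CycleError on a cyclic graph, so in this sketch an invalid pipeline fails before any task runs.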