Tasks are the heart of the dataset generation process. They define the steps used to generate the dataset, such as parsing or extraction. Tasks are written in YAML, and the synthetic data they produce adheres to the specified structure.

Task Structure

A task is a YAML object that contains the following fields:

  • task_id: The unique identifier for the task.
  • task_type: The type of task, such as generation or parsing.
  • task_properties: The properties required for the task. These properties can include the model, prompt, API key, etc.

tasks:
    parse_graphs: # This key is the `task_id`
        task_type: generation # The `task_type` of the task
        task_properties: # If the task requires any properties, define them here
            model: gpt-4o-mini
            prompt: What are the insights from the graph
            api_key: $OPENAI_API_KEY
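
To make the shape concrete, here is a minimal Python sketch that checks a task definition once it has been loaded into a dictionary. The `validate_tasks` helper and its messages are hypothetical and not part of Cyyrus. It assumes that `task_type` is mandatory and that `task_properties` is optional but must be a mapping when present.

```python
# Hypothetical helper, not part of Cyyrus: checks the task shape described above.
def validate_tasks(config: dict) -> list:
    """Return a list of problems found in the `tasks` mapping (empty if none)."""
    tasks = config.get("tasks")
    if not isinstance(tasks, dict):
        return ["top-level `tasks` mapping is missing"]

    problems = []
    for task_id, task in tasks.items():
        if not isinstance(task, dict):
            problems.append(f"{task_id}: task must be a mapping")
            continue
        # Assumption: every task needs a task_type.
        if "task_type" not in task:
            problems.append(f"{task_id}: missing `task_type`")
        # Assumption: task_properties may be omitted, but must be a mapping if given.
        if not isinstance(task.get("task_properties", {}), dict):
            problems.append(f"{task_id}: `task_properties` must be a mapping")
    return problems


# The example task, as a YAML loader would represent it.
config = {
    "tasks": {
        "parse_graphs": {
            "task_type": "generation",
            "task_properties": {
                "model": "gpt-4o-mini",
                "prompt": "What are the insights from the graph",
                "api_key": "$OPENAI_API_KEY",
            },
        }
    }
}

print(validate_tasks(config))  # []
```

Returning a list of problems instead of raising on the first one lets a caller report every malformed task in a single pass.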

Task Types

A task can be one of the following types:

  • generation: This type of task is used to generate data using a model.
  • parsing: This type of task is used to parse data from a source.
  • extraction: This type of task is used to extract columns from columnar data.
  • scraping: This type of task is used to scrape data from a website.
  • labelling: This type of task is used to assign zero-shot labels to data.

At present, Cyyrus supports generation and parsing tasks. Support for the other task types is planned.
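
As a rough illustration, the split between supported and planned task types could be checked before running a configuration. This is a sketch only: the `task_type_status` function and the two sets are hypothetical names built from the list above, not a Cyyrus API.

```python
# Task types from the list above, split by the support status stated in this section.
SUPPORTED_TYPES = {"generation", "parsing"}
PLANNED_TYPES = {"extraction", "scraping", "labelling"}


def task_type_status(task_type: str) -> str:
    """Classify a task_type as 'supported', 'planned', or 'unknown'."""
    if task_type in SUPPORTED_TYPES:
        return "supported"
    if task_type in PLANNED_TYPES:
        return "planned"
    return "unknown"


for name in ("generation", "scraping", "summarization"):
    print(name, "->", task_type_status(name))
```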

If you have a specific task type in mind, reach out and help shape our priorities. We’d love to chat.

Task Properties

Task properties are the parameters a task needs to run, such as the model, prompt, or API key. They are defined in the task_properties field of the task object.
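
Property values such as `$OPENAI_API_KEY` look like environment variable references. How Cyyrus resolves them is not described here, so the following is only a sketch of one plausible approach using Python's standard `os.path.expandvars`. The `resolve_properties` function is a hypothetical name, and the key value is a placeholder.

```python
import os


def resolve_properties(properties: dict) -> dict:
    """Expand $VAR references in string values; leave other values untouched."""
    return {
        key: os.path.expandvars(value) if isinstance(value, str) else value
        for key, value in properties.items()
    }


os.environ["OPENAI_API_KEY"] = "sk-example"  # placeholder value for this demo only
props = {"model": "gpt-4o-mini", "api_key": "$OPENAI_API_KEY"}
print(resolve_properties(props)["api_key"])  # sk-example
```

Keeping the reference in the YAML and resolving it at run time means the secret itself never has to be written into the task file.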

We have neat documentation for each of the task types. Check them out for more details.