Cyyrus doesn’t need a manual, but here’s one anyway.
dataset
section is the heart of our schema. It defines the overall structure and properties of our invoice dataset.
parsed_invoice
column is required in every record.experimental/sample
directory.Parsing
- Unf*ck unstructured data. Nothing more, nothing less.Generation
- Tap into latent space of Language Models to generate data. And more.Extraction
- Seed your data from an existing dataset. Be it a CSV, JSON, or a something else.Labelling
- Got bunch of images? Tag ‘em all!Scraping
- Got an unusable URL? Scrape it like you mean it.parsed_invoice
: This column will contain the output from our invoice parsing task.