
What is sieves?

sieves is a library for zero-shot document AI with structured generation. It facilitates the rapid prototyping of document AI pipelines with validated output. No training required.

It bundles common NLP utilities, document parsing, and text chunking capabilities together with ready-to-use tasks like classification and information extraction, all organized in an observable pipeline architecture. It's particularly valuable for rapid prototyping scenarios where structured output is needed but training data is scarce.

sieves is built around three key components that you should be familiar with in order to use it:

  1. Pipeline: The main orchestrator that runs your NLP tasks sequentially (define with Pipeline([...]) or chain with +)
  2. Task: Pre-built or custom NLP operations (classification, extraction, etc.)
  3. Doc: The fundamental data structure for document processing
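
To make the relationship between these three components concrete, here is a minimal toy model of the pattern. It is a sketch under stated assumptions, not the sieves implementation: ToyDoc, ToyTask, and ToyPipeline are hypothetical stand-ins written for illustration, and the real classes take different arguments. It only shows how a pipeline can run tasks in sequence over documents, and how list construction and chaining with + can produce the same result.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


# Hypothetical stand-ins for illustration only; these are NOT the sieves classes.
@dataclass
class ToyDoc:
    """Holds the document text plus the results each task wrote for it."""
    text: str
    results: dict[str, Any] = field(default_factory=dict)


@dataclass
class ToyTask:
    """A named operation that stores its output under its own ID on each doc."""
    task_id: str
    fn: Callable[[str], Any]

    def __call__(self, docs: list[ToyDoc]) -> list[ToyDoc]:
        for doc in docs:
            doc.results[self.task_id] = self.fn(doc.text)
        return docs

    def __add__(self, other: ToyTask | ToyPipeline) -> ToyPipeline:
        # task + task (or task + pipeline) yields a pipeline
        return ToyPipeline([self]) + other


class ToyPipeline:
    """Runs its tasks one after another over the same list of docs."""

    def __init__(self, tasks: list[ToyTask]) -> None:
        self.tasks = list(tasks)

    def __add__(self, other: ToyTask | ToyPipeline) -> ToyPipeline:
        extra = other.tasks if isinstance(other, ToyPipeline) else [other]
        return ToyPipeline(self.tasks + extra)

    def __call__(self, docs: list[ToyDoc]) -> list[ToyDoc]:
        for task in self.tasks:
            docs = task(docs)
        return docs


word_count = ToyTask("word_count", lambda text: len(text.split()))
is_question = ToyTask("is_question", lambda text: text.strip().endswith("?"))

explicit = ToyPipeline([word_count, is_question])  # list form
chained = word_count + is_question                 # chained form

docs = chained([ToyDoc("What is sieves?")])
print(docs[0].results)  # {'word_count': 3, 'is_question': True}
```

In this toy version each task writes to a results mapping on the document, so later tasks and the caller can inspect what earlier tasks produced. Consult the API reference for how the real Doc stores results.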

There are two more key components that you only need to know if you're a maintainer or want to create your own tasks for sieves:

  1. ModelWrapper: Backend implementations that power the tasks (outlines, dspy, langchain, etc.)
  2. Bridge: Connectors between Tasks and ModelWrappers
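
The division of labor between these two components can be pictured roughly as follows. This is a hedged sketch of the general idea only: ToyModelWrapper, KeywordModel, ToyClassificationBridge, and classify are hypothetical names invented for this example and do not reflect the actual sieves interfaces. The bridge translates a task into something the backend understands and validates what comes back, while the wrapper hides which backend produces the raw output.

```python
from typing import Protocol


# Hypothetical names for illustration only; NOT the sieves interfaces.
class ToyModelWrapper(Protocol):
    """Anything that can turn a prompt into raw text counts as a backend here."""

    def generate(self, prompt: str) -> str: ...


class KeywordModel:
    """Fake backend so the example runs without a real model."""

    def generate(self, prompt: str) -> str:
        return "positive" if "great" in prompt.lower() else "negative"


class ToyClassificationBridge:
    """Builds the backend input for a task and validates the backend output."""

    def __init__(self, labels: list[str]) -> None:
        self.labels = labels

    def build_prompt(self, text: str) -> str:
        return f"Classify as one of {self.labels}: {text}"

    def parse(self, raw: str) -> str:
        label = raw.strip().lower()
        if label not in self.labels:
            raise ValueError(f"unexpected label: {raw!r}")
        return label


def classify(text: str, bridge: ToyClassificationBridge, model: ToyModelWrapper) -> str:
    # task -> bridge (prompt) -> model wrapper (raw output) -> bridge (validated result)
    return bridge.parse(model.generate(bridge.build_prompt(text)))


bridge = ToyClassificationBridge(["positive", "negative"])
print(classify("This library is great", bridge, KeywordModel()))  # positive
```

Because the task only talks to the bridge, swapping KeywordModel for another backend in this sketch would not require changing the task itself.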

Next Steps

If you have specific questions, consult the API reference for each component you're working with. Each reference contains detailed information about parameters, configurations, and best practices.


Essential Links

For feedback, feature requests, or contributions, use our GitHub issue tracker.


sieves is maintained by Mantis, an AI consultancy. We help our clients solve business problems related to natural human language and speech. If that's something you're interested in, drop us a line!