What is sieves?
sieves is a library for zero-shot document AI with structured generation. It facilitates the rapid prototyping of
document AI pipelines with validated output. No training required.
It bundles common NLP utilities, document parsing, and text chunking capabilities together with ready-to-use tasks like classification and information extraction, all organized in an observable pipeline architecture. It's particularly valuable for rapid prototyping scenarios where structured output is needed but training data is scarce.
sieves is built around three key components that you should be familiar with in order to use it:
- Pipeline: The main orchestrator that runs your NLP tasks sequentially (define with `Pipeline([...])` or chain with `+`)
- Task: Pre-built or custom NLP operations (classification, extraction, etc.)
- Doc: The fundamental data structure for document processing
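To make the relationship between these three components concrete, here is a minimal, self-contained sketch of the pattern. It does not import or reproduce the sieves API: the `Doc`, `Task`, and `Pipeline` classes below are toy stand-ins, and their fields (`text`, `results`, `task_id`, `fn`) are assumptions made for illustration only. It shows how a pipeline can run tasks in order over documents and how `+` chaining can build the same pipeline as an explicit list.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Doc:
    """Toy document (hypothetical): raw text plus results keyed by task ID."""
    text: str
    results: dict[str, object] = field(default_factory=dict)


@dataclass
class Task:
    """Toy task (hypothetical): applies a function to each document's text."""
    task_id: str
    fn: Callable[[str], object]

    def __call__(self, docs: list[Doc]) -> list[Doc]:
        for doc in docs:
            doc.results[self.task_id] = self.fn(doc.text)
        return docs

    def __add__(self, other: Task) -> Pipeline:
        # task + task yields a two-step pipeline.
        return Pipeline([self, other])


class Pipeline:
    """Toy orchestrator (hypothetical): runs its tasks sequentially."""

    def __init__(self, tasks: list[Task]) -> None:
        self.tasks = list(tasks)

    def __add__(self, task: Task) -> Pipeline:
        # pipeline + task appends one more step.
        return Pipeline(self.tasks + [task])

    def __call__(self, docs: list[Doc]) -> list[Doc]:
        for task in self.tasks:
            docs = task(docs)
        return docs


word_count = Task("word_count", lambda text: len(text.split()))
is_question = Task("is_question", lambda text: text.strip().endswith("?"))

explicit = Pipeline([word_count, is_question])  # list form
chained = word_count + is_question              # chained form

docs = chained([Doc("What is sieves?")])
print(docs[0].results)  # {'word_count': 3, 'is_question': True}
```

In the real library, tasks are backed by language models rather than plain functions, so consult the API reference for the actual class signatures.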
There are two more key components that you only need to know if you're a maintainer or want to create your own tasks for
sieves:
- ModelWrapper: Backend implementations that power the tasks (outlines, dspy, langchain, etc.)
- Bridge: Connectors between Tasks and ModelWrappers
Next Steps
- Install sieves
- Dive into our guides, starting with the Getting Started Guide
- Understand different model configurations
- Learn about custom task creation
Consult the API reference for each component you're working with if you have specific questions. The reference pages contain detailed information about parameters, configurations, and best practices.
Essential Links
For feedback, feature requests, or contributions, use our GitHub issue tracker.
sieves is maintained by Mantis, an AI consultancy. We help our clients to solve business problems related to
natural human language and speech. If that's something you're interested in, drop us a line!