sieves
sieves is a Python library designed for zero-shot and few-shot NLP tasks that focuses on structured generation, allowing developers to build production-ready NLP prototypes without requiring training data. It provides a unified interface that wraps popular NLP tools (like outlines, dspy, langchain, and others) while ensuring structured outputs and observability.
It bundles common NLP utilities, document parsing, and text chunking capabilities together with ready-to-use tasks like classification and information extraction, all organized in an observable pipeline architecture. It's particularly valuable for rapid prototyping scenarios where structured output is needed but training data is scarce.
Quick Installation
You can install sieves with different options depending on your needs:
Core package with minimal dependencies:
pip install sieves
sieves relies on many other libraries to work properly. The minimal setup lets you manually install only the dependencies you need, which keeps the disk footprint small, but keep in mind that you won't be able to use any of the pre-built tasks with this setup.
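With the minimal setup it can help to check which optional backends are importable before building a pipeline. The sketch below uses only the standard library. The module names are the tools mentioned in this README; whether they match the exact packages sieves expects is an assumption, and installed_backends is a hypothetical helper, not part of sieves.

```python
from importlib.util import find_spec

# Import names of the tools mentioned in this README. That these are the exact
# modules sieves looks for is an assumption made for illustration.
OPTIONAL_BACKENDS = ("outlines", "dspy", "langchain")


def installed_backends(candidates=OPTIONAL_BACKENDS):
    """Map each top-level module name to True if it can be imported here."""
    return {name: find_spec(name) is not None for name in candidates}


if __name__ == "__main__":
    for name, available in installed_backends().items():
        status = "installed" if available else "missing"
        print(f"{name}: {status}")
```

Anything reported as missing can then be installed individually, or all at once through one of the extras described below.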
All dependencies for every feature, including all supported engines and utilities:
pip install "sieves[all]"
Specific Features
All document processing utilities (PDF parsing, chunking, etc.):
pip install "sieves[utils]"
All supported engines:
pip install "sieves[engines]"
Development Setup
- Set up uv.
- Install all dependencies for development, testing, and documentation generation with: uv pip install --system .[all,test]
Core Concepts
sieves is built around five key components:
- Pipeline: The main orchestrator that runs your NLP tasks sequentially
- Task: Pre-built or custom NLP operations (classification, extraction, etc.)
- Engine: Backend implementations that power the tasks (outlines, dspy, langchain, etc.)
- Bridge: Connectors between Tasks and Engines
- Doc: The fundamental data structure for document processing
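To make the relationship between these components concrete, here is a toy model in plain Python. It is not the sieves API: every name in it (ToyDoc, ToyClassification, ToyPipeline, keyword_engine) is hypothetical and exists only to illustrate how a pipeline might pass documents through tasks that delegate to an engine.

```python
from dataclasses import dataclass, field
from typing import Callable

# NOTE: every class and function here is a hypothetical stand-in,
# not part of the sieves API.


@dataclass
class ToyDoc:
    """Stand-in for a Doc: raw text plus results keyed by task name."""
    text: str
    results: dict = field(default_factory=dict)


def keyword_engine(prompt: str) -> str:
    """Stand-in for an Engine: a fake backend that needs no model."""
    return "positive" if "great" in prompt.lower() else "negative"


@dataclass
class ToyClassification:
    """Stand-in for a Task; build_prompt and parse play the Bridge role."""
    engine: Callable[[str], str]
    labels: tuple
    name: str = "classification"

    def build_prompt(self, doc: ToyDoc) -> str:
        return f"Classify as one of {self.labels}: {doc.text}"

    def parse(self, raw: str) -> str:
        # Keep the output structured: accept only a known label.
        if raw not in self.labels:
            raise ValueError(f"unexpected label: {raw!r}")
        return raw

    def __call__(self, doc: ToyDoc) -> ToyDoc:
        doc.results[self.name] = self.parse(self.engine(self.build_prompt(doc)))
        return doc


@dataclass
class ToyPipeline:
    """Stand-in for a Pipeline: runs each task on each doc, in order."""
    tasks: list

    def __call__(self, docs):
        processed = []
        for doc in docs:
            for task in self.tasks:
                doc = task(doc)
            processed.append(doc)
        return processed


if __name__ == "__main__":
    pipe = ToyPipeline([ToyClassification(keyword_engine, ("positive", "negative"))])
    for doc in pipe([ToyDoc("This library is great."), ToyDoc("It crashed again.")]):
        print(doc.text, "->", doc.results["classification"])
```

The point of the split is that the task only knows what it wants (a label from a fixed set), while the engine only knows how to turn a prompt into an answer, so either side can be swapped without touching the other.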
Essential Links
Guides
We've prepared several guides to help you get up to speed quickly:
- Getting Started - Start here! Learn the basic concepts and create your first pipeline.
- Document Preprocessing - Master document parsing, chunking, and text standardization.
- Creating Custom Tasks - Learn to create your own tasks when the built-in ones aren't enough.
- Saving and Loading Pipelines - Version and share your pipeline configurations.
Getting Help
- Check our GitHub Issues for common problems
- Review the documentation in the /docs/guides/ directory
- Join our community discussions (link to be added)
Next Steps
- Dive into our guides, starting with the Getting Started Guide
- Check out example pipelines in our repository
- Learn about custom task creation
- Understand different engine configurations
If you have specific questions, consult the API reference for each component you're working with. The references contain detailed information about parameters, configurations, and best practices.