sieves

sieves is a Python library designed for zero-shot and few-shot NLP tasks that focuses on structured generation, allowing developers to build production-ready NLP prototypes without requiring training data. It provides a unified interface that wraps popular NLP tools (like outlines, dspy, langchain, and others) while ensuring structured outputs and observability.

It bundles common NLP utilities, document parsing, and text chunking capabilities together with ready-to-use tasks like classification and information extraction, all organized in an observable pipeline architecture. It's particularly valuable for rapid prototyping scenarios where structured output is needed but training data is scarce.

Quick Installation

You can install sieves with different options depending on your needs.

Core package with minimal dependencies:

pip install sieves

Note that sieves relies on many other libraries to work properly. The minimal setup lets you manually install only the dependencies you need, which keeps the disk footprint small, but keep in mind that you won't be able to use any of the pre-built tasks with this setup.
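
With the minimal setup, it can be useful to check which optional backends are importable in the current environment before building on them. The following is a minimal sketch using only the Python standard library; it is not part of sieves, and the import names listed (outlines, dspy, langchain) are assumed from the tools named above rather than taken from the sieves dependency list.

```python
from importlib.util import find_spec

# Assumed import names of some backends mentioned above; not an exhaustive
# or authoritative list of what sieves supports.
OPTIONAL_BACKENDS = ("outlines", "dspy", "langchain")


def available_backends(names=OPTIONAL_BACKENDS):
    """Return a mapping of import name -> True if the module can be found."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, installed in available_backends().items():
        status = "installed" if installed else "missing"
        print(f"{name}: {status}")
```

Any backend reported as missing would need to be installed separately, or via one of the extras described below.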

All dependencies for every feature, including all supported engines and utilities:

pip install "sieves[all]"

Specific Features

All document processing utilities (PDF parsing, chunking, etc.):

pip install "sieves[utils]"

All supported engines:

pip install "sieves[engines]"

Development Setup

  1. Set up uv.
  2. Install all dependencies for development, testing, and documentation generation: uv pip install --system ".[all,test]"

Core Concepts

sieves is built around five key components:

  1. Pipeline: The main orchestrator that runs your NLP tasks sequentially
  2. Task: Pre-built or custom NLP operations (classification, extraction, etc.)
  3. Engine: Backend implementations that power the tasks (outlines, dspy, langchain, etc.)
  4. Bridge: Connectors between Tasks and Engines
  5. Doc: The fundamental data structure for document processing
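
To make the relationship between these components concrete, here is a toy sketch in plain Python. It is not the sieves API: every class and function name below (ToyDoc, ToyBridge, ToyTask, ToyPipeline, keyword_engine) is hypothetical, and the "engine" is a simple keyword rule standing in for a real backend. It only illustrates the flow described above, in which a pipeline runs tasks sequentially over documents and each task reaches its engine through a bridge.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator

# Hypothetical stand-ins for illustration only; NOT the sieves API.


@dataclass
class ToyDoc:
    """Document plus the results that tasks attach to it."""
    text: str
    results: dict = field(default_factory=dict)


# An "engine" is modeled here as any callable mapping a prompt to raw text.
ToyEngine = Callable[[str], str]


@dataclass
class ToyBridge:
    """Connects a task to an engine: builds the prompt, normalizes the reply."""
    prompt_template: str

    def run(self, engine: ToyEngine, doc: ToyDoc) -> str:
        raw = engine(self.prompt_template.format(text=doc.text))
        return raw.strip().lower()


@dataclass
class ToyTask:
    """One NLP operation; stores its output under task_id on the document."""
    task_id: str
    bridge: ToyBridge
    engine: ToyEngine

    def __call__(self, doc: ToyDoc) -> ToyDoc:
        doc.results[self.task_id] = self.bridge.run(self.engine, doc)
        return doc


@dataclass
class ToyPipeline:
    """Runs each task in order on every document."""
    tasks: list

    def __call__(self, docs: Iterable[ToyDoc]) -> Iterator[ToyDoc]:
        for doc in docs:
            for task in self.tasks:
                doc = task(doc)
            yield doc


def keyword_engine(prompt: str) -> str:
    """Fake engine: a keyword rule in place of a language model."""
    return " POSITIVE " if "great" in prompt.lower() else " NEGATIVE "


if __name__ == "__main__":
    bridge = ToyBridge("Classify the sentiment of: {text}")
    pipe = ToyPipeline([ToyTask("sentiment", bridge, keyword_engine)])
    for doc in pipe([ToyDoc("This library is great."), ToyDoc("It crashed again.")]):
        print(doc.text, "->", doc.results)
```

In this sketch, swapping the engine would only require passing a different callable, which is the kind of separation the Bridge and Engine components are described as providing. For the actual classes and signatures, refer to the sieves guides and API reference.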

Guides

We've prepared several guides to help you get up to speed quickly.

Getting Help

  • Check our GitHub Issues for common problems
  • Review the documentation in the /docs/guides/ directory
  • Join our community discussions (link to be added)

Next Steps

  • Dive into our guides, starting with the Getting Started Guide
  • Check out example pipelines in our repository
  • Learn about custom task creation
  • Understand different engine configurations

If you have specific questions, consult the API reference for the component you're working with. It contains detailed information about parameters, configurations, and best practices.