Getting Started
This guide shows how to use sieves for zero-shot and few-shot NLP tasks with structured generation.
Basic Concepts
sieves is built around four main concepts:
- Documents (Doc): The basic unit of text that you want to process. A document can be created from text or a URI.
- Models + ModelSettings: You pass a model from your chosen backend (Outlines, DSPy, LangChain, etc.) and optional ModelSettings (e.g., strict mode or inference arguments).
- Tasks: NLP operations you want to perform on your documents (classification, information extraction, etc.).
- Pipeline: A sequence of tasks that process your documents.
Quick Start Example
Here's a simple example that performs text classification:
import outlines
import transformers
from sieves import Pipeline, tasks, Doc, ModelSettings
# Set up model.
model_name = "HuggingFaceTB/SmolLM2-135M-Instruct"
model = outlines.models.from_transformers(
    transformers.AutoModelForCausalLM.from_pretrained(model_name),
    transformers.AutoTokenizer.from_pretrained(model_name)
)
# Define task.
task = tasks.Classification(
    labels=["science", "politics"],
    model=model,
    # If you want to change inference-time parameters, pass them like this:
    model_settings=ModelSettings(inference_kwargs={"max_new_tokens": 512})
)
# Define pipeline with at least one task.
pipeline = Pipeline(task)
# Define documents to analyze.
doc = Doc(text="The new telescope captures images of distant galaxies.")
# Run pipeline and print results.
docs = list(pipeline([doc]))
print(docs[0].results) # {'Classification': [('science', 1.0), ('politics', 0.0)]}
print(docs[0].meta) # {'Classification': {'raw': ['{ "science": 1.0, "politics": 0'], 'usage': {'input_tokens': 2, 'output_tokens': 2, 'chunks': [{'input_tokens': 2, 'output_tokens': 2}]}}, 'usage': {'input_tokens': 2, 'output_tokens': 2}}
# This produces: {'Classification': [('science', 1.0), ('politics', 0.0)]}
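The output above maps each task ID to a list of (label, score) pairs. As a minimal sketch, assuming that shape holds for your classification task, the highest-scoring label can be picked as follows. Note that top_label is a hypothetical helper written for illustration and is not part of the sieves API.

```python
# Result shape copied from the Quick Start output above.
results = {"Classification": [("science", 1.0), ("politics", 0.0)]}


def top_label(scored_labels: list[tuple[str, float]]) -> tuple[str, float]:
    """Return the (label, score) pair with the highest score.

    Hypothetical helper for illustration; not part of the sieves API.
    """
    if not scored_labels:
        raise ValueError("No labels to choose from.")
    return max(scored_labels, key=lambda pair: pair[1])


label, score = top_label(results["Classification"])
print(label, score)  # science 1.0
```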
Using Label Descriptions
You can improve classification accuracy by providing descriptions for each label. This is especially helpful when label names alone might be ambiguous:
import outlines
from transformers import AutoModelForCausalLM, AutoTokenizer
from sieves import Pipeline, tasks, Doc
doc = Doc(text="Special relativity applies to all physical phenomena in the absence of gravity.")
model_name = "HuggingFaceTB/SmolLM-135M-Instruct"
model = outlines.models.from_transformers(
    AutoModelForCausalLM.from_pretrained(model_name),
    AutoTokenizer.from_pretrained(model_name)
)
# Use dict format to provide descriptions.
pipeline = Pipeline([
    tasks.predictive.Classification(
        labels={
            "science": "Scientific topics including physics, biology, chemistry, and natural sciences",
            "politics": "Political news, government affairs, elections, and policy discussions"
        },
        model=model,
    )
])
for doc in pipeline([doc]):
    print(doc.results)
Working with Documents
Documents can be created in several ways:
from sieves import Doc
# From text.
doc = Doc(text="Your text here")
from sieves import Doc
# From a file (requires docling).
doc = Doc(uri="path/to/your/file.pdf")
# With metadata.
doc = Doc(
    text="Your text here",
    meta={"source": "example", "date": "2025-01-31"}
)
Note: File-based ingestion (Docling/Marker/...) is optional and not installed by default. To enable it, install the ingestion extra or the specific libraries you need:
pip install "sieves[ingestion]"
Advanced Example: PDF Processing Pipeline
Here's a more involved example that:
- Parses a PDF document
- Chunks it into smaller pieces
- Performs information extraction on each chunk
import chonkie
import outlines
import tokenizers
import pydantic
from transformers import AutoModelForCausalLM, AutoTokenizer
from sieves import Doc, tasks
# Create a tokenizer for chunking.
tokenizer = tokenizers.Tokenizer.from_pretrained("bert-base-uncased")
# Initialize components.
chunker = tasks.Chunking(
    chunker=chonkie.TokenChunker(tokenizer, chunk_size=512, chunk_overlap=50)
)
# Choose a model for information extraction.
model_name = "HuggingFaceTB/SmolLM2-135M-Instruct"
model = outlines.models.from_transformers(
    AutoModelForCausalLM.from_pretrained(model_name),
    AutoTokenizer.from_pretrained(model_name)
)
# Define the structure of information you want to extract.
class PersonInfo(pydantic.BaseModel, frozen=True):
    name: str
    age: int | None = None
    occupation: str | None = None
# Create an information extraction task.
extractor = tasks.predictive.InformationExtraction(
    entity_type=PersonInfo,
    model=model,
)
# Create the pipeline (use + for succinct chaining).
pipeline = chunker + extractor
# Process a document.
doc = Doc(text="Marie Curie died at the age of 66 years.")
results = list(pipeline([doc]))
# Access the extracted information.
for result in results:
    print(result.results["InformationExtraction"])
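To make the chunk_size and chunk_overlap parameters used above more concrete, here is a minimal pure-Python sketch of overlapping token windows. It illustrates the general technique only and is not chonkie's actual implementation; chunk_tokens is a hypothetical helper, and the integer "tokens" stand in for real tokenizer output.

```python
def chunk_tokens(tokens: list, chunk_size: int, chunk_overlap: int) -> list[list]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens.

    Hypothetical helper for illustration; not how chonkie is implemented.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive.")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size).")

    # Each new window starts this many tokens after the previous one.
    stride = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window reaches the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


chunks = chunk_tokens(list(range(10)), chunk_size=4, chunk_overlap=1)
print(chunks)  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Each chunk repeats the last token of the previous one, so text that straddles a chunk boundary is still seen in full by at least one chunk when the overlap is large enough.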
Supported Models
sieves supports multiple frameworks for interacting with zero-shot language models - see
https://sieves.ai/guides/models/ for an overview.
You pass supported models directly to PredictiveTask. Optionally, you can include ModelSettings to
influence model initialization and runtime behavior.
Batching is configured on each task via batch_size:
- batch_size = -1 processes all inputs at once (default)
- batch_size = N processes N docs per batch
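The two batch_size settings above can be pictured with a small pure-Python sketch. This is an illustration of the described behavior under stated assumptions, not sieves' internal batching code; iter_batches is a hypothetical helper.

```python
from __future__ import annotations

from collections.abc import Iterator, Sequence
from typing import TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = -1) -> Iterator[list[T]]:
    """Yield batches of items; -1 means a single batch with everything.

    Hypothetical helper for illustration; not part of the sieves API.
    """
    if batch_size == -1:
        if items:
            yield list(items)
        return
    if batch_size < 1:
        raise ValueError("batch_size must be -1 or a positive integer.")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


docs = ["a", "b", "c", "d", "e"]
print(list(iter_batches(docs)))     # [['a', 'b', 'c', 'd', 'e']]
print(list(iter_batches(docs, 2)))  # [['a', 'b'], ['c', 'd'], ['e']]
```

Smaller batches generally trade throughput for lower peak memory use, which can matter when running local models.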
ModelSettings (optional)
ModelSettings controls the details of how the model's structured generation is run. It lets you configure:
- strict: Whether to raise encountered errors or to assign placeholder values for documents that failed to process.
- init_kwargs: Model-specific arguments that will be passed to the model's structured generation abstraction at its initialization.
- inference_kwargs: Model-specific arguments that will be passed to the model's structured generation abstraction during inference.
- config_kwargs: Model-specific arguments that will be applied to the model after task initialization.
- inference_mode: Model-specific modes for structured generation. Don't change this unless you know exactly what you're doing.
Example:
from sieves.model_wrappers.utils import ModelSettings
from sieves import tasks
classifier = tasks.Classification(
    labels={
        "science": "Scientific topics and research",
        "politics": "Political news and government"
    },
    model=model,
    model_settings=ModelSettings(strict=True),
    batch_size=8,
)
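The strict setting is described as choosing between raising errors and assigning placeholder values. The following pure-Python sketch shows that general pattern. It is not sieves' implementation: process_all and flaky_predict are hypothetical, and using None as the placeholder is an assumption made for illustration.

```python
from __future__ import annotations

from collections.abc import Callable


def process_all(
    texts: list[str], predict: Callable[[str], str], strict: bool
) -> list[str | None]:
    """Run predict on each text, either re-raising or masking failures.

    Hypothetical helper for illustration; not part of the sieves API.
    """
    results: list[str | None] = []
    for text in texts:
        try:
            results.append(predict(text))
        except Exception:
            if strict:
                # strict=True: surface the error to the caller.
                raise
            # strict=False: keep going with a placeholder (assumed: None).
            results.append(None)
    return results


def flaky_predict(text: str) -> str:
    """Stand-in for a model call that fails on empty input."""
    if not text:
        raise ValueError("empty document")
    return "science"


print(process_all(["telescope", ""], flaky_predict, strict=False))  # ['science', None]
```

With strict=True the same call would raise the ValueError from the second document instead of returning a partial result.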
To specify an inference mode (model type-specific):
import outlines
from transformers import AutoModelForCausalLM, AutoTokenizer
from sieves.model_wrappers import outlines_
from sieves.model_wrappers.utils import ModelSettings
from sieves import tasks
model_name = "HuggingFaceTB/SmolLM-135M-Instruct"
model = outlines.models.from_transformers(
    AutoModelForCausalLM.from_pretrained(model_name),
    AutoTokenizer.from_pretrained(model_name)
)
classifier = tasks.Classification(
    labels=["science", "politics"],
    model=model,
    model_settings=ModelSettings(
        strict=True,
        inference_mode=outlines_.InferenceMode.json  # Specifies how to parse results.
    ),
    batch_size=8,
)
Evaluating Your Pipeline
sieves provides built-in support for measuring the performance of your tasks against ground-truth data.
To evaluate a task, you must provide the expected results in the .gold field of your Doc objects. The evaluation engine then compares the model's .results with your .gold data.
from sieves import Doc, Pipeline, tasks
# 1. Set up a pipeline (`model` is created as in the Quick Start example).
task = tasks.Classification(labels=["science", "politics"], model=model, task_id="clf")
pipeline = Pipeline(task)
# 2. Prepare test documents with ground-truth (gold) data.
doc = Doc(text="The telescope discovered a new galaxy.")
doc.gold["clf"] = "science" # Set expected output for task 'clf'
# 3. Run inference.
docs = list(pipeline([doc]))
# 4. Evaluate performance.
report = pipeline.evaluate(docs)
# Print results.
print(report["clf"].metrics) # e.g., {'F1 (Macro)': 0.5}
Different tasks use different evaluation strategies:
- Deterministic Metrics: Tasks like Classification and NER use standard metrics (e.g., accuracy, f1-score) to provide precise measurements.
- Model-based Evaluation: Generative tasks like Summarization and Translation require a "judge" model (a dspy.LM instance) to assess semantic similarity.
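For intuition about a deterministic metric such as the macro F1 score mentioned above, here is a minimal pure-Python sketch. It is not how sieves computes its metrics: macro_f1 is a hypothetical helper, and averaging over the union of gold and predicted labels is an assumption (libraries differ on which labels they include).

```python
def macro_f1(gold: list[str], predicted: list[str]) -> float:
    """Compute the unweighted mean of per-label F1 scores.

    Hypothetical helper for illustration; not part of the sieves API.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length.")
    labels = sorted(set(gold) | set(predicted))
    if not labels:
        raise ValueError("At least one example is required.")

    f1_scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denominator = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)


gold = ["science", "politics", "science"]
predicted = ["science", "science", "science"]
print(macro_f1(gold, predicted))
```

Here "science" gets an F1 of 0.8 (two true positives, one false positive) and "politics" gets 0.0 (one false negative), so the macro average is 0.4. Because every label counts equally, a rarely occurring label that is always missed pulls the score down sharply.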