Chonkie
Splits documents into smaller segments (chunks).
            Chonkie
    
              Bases: Task
Chunker wrapping the chonkie library.
Source code in sieves/tasks/preprocessing/chunking/chonkie_.py
            id
  
      property
  
    Return task ID.
Used by the pipeline for results and dependency management.
Returns:
| Type | Description | 
|---|---|
| str | Task ID. | 
            __add__(other)
    Chain this task with another task or pipeline using the + operator.
This returns a new Pipeline that executes this task first, followed by the
task(s) in other. The original task(s)/pipeline are not mutated.
Cache semantics:
- If other is a Pipeline, the resulting pipeline adopts other's
  use_cache setting (because the left-hand side is a single task).
- If other is a Task, the resulting pipeline defaults to use_cache=True.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| other | Task \| Pipeline | A  | required | 
Returns:
| Type | Description | 
|---|---|
| Pipeline | A new  | 
Raises:
| Type | Description | 
|---|---|
| TypeError | If  | 
Source code in sieves/tasks/core.py
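The cache semantics described for `__add__` can be illustrated with a minimal sketch. `ToyTask` and `ToyPipeline` are hypothetical stand-ins written for this example, not the sieves `Task` and `Pipeline` classes; only the chaining and `use_cache` rules stated above are modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyTask:
    """Hypothetical stand-in for a task (not the sieves Task class)."""

    task_id: str

    def __add__(self, other: ToyTask | ToyPipeline) -> ToyPipeline:
        if isinstance(other, ToyPipeline):
            # The left-hand side is a single task, so the result adopts
            # the right-hand pipeline's use_cache setting.
            return ToyPipeline([self, *other.tasks], use_cache=other.use_cache)
        if isinstance(other, ToyTask):
            # Task + Task defaults to use_cache=True.
            return ToyPipeline([self, other], use_cache=True)
        raise TypeError(f"Cannot chain ToyTask with {type(other).__name__}")


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for a pipeline (not the sieves Pipeline class)."""

    tasks: list[ToyTask] = field(default_factory=list)
    use_cache: bool = True


chunk = ToyTask("chunker")
classify = ToyTask("classifier")

combined = chunk + classify
uncached = chunk + ToyPipeline([classify], use_cache=False)
```

In this sketch `combined.use_cache` is `True` and `uncached.use_cache` is `False`, and neither operand is mutated because each `+` builds a new `ToyPipeline`.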
            __call__(docs)
    Split documents into chunks.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| docs | Iterable[Doc] | Documents to split. | required | 
Returns:
| Type | Description | 
|---|---|
| Iterable[Doc] | Split documents. | 
Source code in sieves/tasks/preprocessing/chunking/chonkie_.py
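As a rough illustration of the `Iterable[Doc]` in, `Iterable[Doc]` out contract, the sketch below uses a hypothetical `ToyDoc` and a fixed-size word splitter in place of the sieves `Doc` class and a chonkie chunker. The `chunks` attribute and the splitting rule are assumptions made for this example only.

```python
from collections.abc import Iterable, Iterator
from dataclasses import dataclass, field


@dataclass
class ToyDoc:
    """Hypothetical document: raw text plus the chunks derived from it."""

    text: str
    chunks: list[str] = field(default_factory=list)


def split_words(text: str, size: int = 4) -> list[str]:
    """Stand-in chunker: fixed-size windows of whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def chunk_docs(docs: Iterable[ToyDoc]) -> Iterator[ToyDoc]:
    """Yield each document with its chunks filled in."""
    for doc in docs:
        doc.chunks = split_words(doc.text)
        yield doc


result = list(chunk_docs([ToyDoc("one two three four five six")]))
```

Here `result[0].chunks` is `["one two three four", "five six"]`. A real chunker would typically split on tokens, sentences, or semantic boundaries rather than a fixed word count.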
            __init__(chunker, task_id=None, include_meta=False, batch_size=-1)
    Initialize chunker.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| task_id | str \| None | Task ID. | None | 
| include_meta | bool | Whether to include meta information generated by the task. | False | 
| batch_size | int | Batch size to use for processing. Use -1 to process all documents at once. | -1 | 
Source code in sieves/tasks/preprocessing/chunking/chonkie_.py
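The `batch_size` convention, where `-1` means "process all documents at once", can be sketched with a small helper. The function `batched` below is hypothetical and written for this example; it is not taken from the sieves source.

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int = -1) -> Iterator[list[T]]:
    """Group items into lists of at most batch_size; -1 yields one batch."""
    iterator = iter(items)
    if batch_size == -1:
        # Sentinel value: consume everything as a single batch.
        batch = list(iterator)
        if batch:
            yield batch
        return
    if batch_size < 1:
        raise ValueError("batch_size must be -1 or a positive integer")
    # Pull batch_size items at a time until the iterator is exhausted.
    while batch := list(islice(iterator, batch_size)):
        yield batch


all_at_once = list(batched(range(5)))
in_pairs = list(batched(range(5), batch_size=2))
```

With these inputs `all_at_once` is `[[0, 1, 2, 3, 4]]` and `in_pairs` is `[[0, 1], [2, 3], [4]]`.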
            deserialize(config, **kwargs)
  
      classmethod
  
    Generate Task instance from config.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| config | Config | Config to generate instance from. | required | 
| kwargs | dict[str, Any] | Values to inject into loaded config. | {} | 
Returns:
| Type | Description | 
|---|---|
| Task | Deserialized Task instance. | 
Source code in sieves/tasks/core.py
            serialize()
    Serialize task.
Returns:
| Type | Description | 
|---|---|
| Config | Config instance. | 
Source code in sieves/tasks/core.py
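The relationship between `serialize()` and `deserialize(config, **kwargs)` can be pictured as a round trip in which keyword arguments are injected into the loaded config. The sketch below is an assumption-laden illustration: `ToyChunkTask` is hypothetical, and a plain `dict` stands in for the `Config` type.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class ToyChunkTask:
    """Hypothetical task holding only plain, serializable settings."""

    task_id: str
    include_meta: bool = False
    batch_size: int = -1

    def serialize(self) -> dict[str, Any]:
        # A plain dict stands in for the Config instance.
        return asdict(self)

    @classmethod
    def deserialize(cls, config: dict[str, Any], **kwargs: Any) -> ToyChunkTask:
        # Values passed as kwargs are injected into the loaded config
        # and take precedence over stored values.
        return cls(**{**config, **kwargs})


original = ToyChunkTask(task_id="chunker")
config = original.serialize()
restored = ToyChunkTask.deserialize(config)
overridden = ToyChunkTask.deserialize(config, batch_size=8)
```

In this sketch `restored == original`, while `overridden.batch_size` is `8`. Injection of this kind is commonly used for values that cannot be stored in a config, such as live objects, though whether that applies to the `chunker` argument here is not stated in this reference.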