Docling
Wrapper around Docling for converting complex files into Markdown.
Docling
Bases: Task
Parser wrapping the docling library to convert files into documents.
Source code in sieves/tasks/preprocessing/ocr/docling_.py
id
property
Returns task ID. Used by pipeline for results and dependency management.
Returns:
Type | Description |
---|---|
str | Task ID. |
__call__(docs)
Parse resources using docling.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
docs | Iterable[Doc] | Resources to process. | required |
Returns:
Type | Description |
---|---|
Iterable[Doc] | Parsed documents. |
Source code in sieves/tasks/preprocessing/ocr/docling_.py
__init__(converter=None, export_format='markdown', task_id=None, show_progress=True, include_meta=False)
Initialize the docling parser.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
converter | DocumentConverter | Docling parser instance. | None |
task_id | str \| None | Task ID. | None |
show_progress | bool | Whether to show a progress bar for processed documents. | True |
include_meta | bool | Whether to include meta information generated by the task. | False |
Source code in sieves/tasks/preprocessing/ocr/docling_.py
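To make the constructor and call contract described above more concrete, here is a minimal, self-contained sketch. It does not use the library itself: `Doc` and `ToyParser` are hypothetical stand-ins that only mirror the documented parameter names and the `Iterable[Doc]` in, `Iterable[Doc]` out shape of `__call__`. The fallback converter, the class-name default for the task ID, and the way meta information is stored are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, Iterator


@dataclass
class Doc:
    """Hypothetical stand-in for the pipeline's Doc type."""

    uri: str
    text: str | None = None
    meta: dict[str, Any] = field(default_factory=dict)


class ToyParser:
    """Stand-in that mirrors the documented constructor and call shape."""

    def __init__(
        self,
        converter: Callable[[str], str] | None = None,
        export_format: str = "markdown",
        task_id: str | None = None,
        show_progress: bool = True,
        include_meta: bool = False,
    ) -> None:
        # Assumption: fall back to a trivial converter when none is supplied.
        self._converter = converter or (lambda uri: f"# {uri}")
        self._export_format = export_format
        # Assumption: default the task ID to the class name.
        self._task_id = task_id or type(self).__name__
        self._show_progress = show_progress  # Accepted but unused in this sketch.
        self._include_meta = include_meta

    @property
    def id(self) -> str:
        return self._task_id

    def __call__(self, docs: Iterable[Doc]) -> Iterator[Doc]:
        for doc in docs:
            doc.text = self._converter(doc.uri)
            if self._include_meta:
                # Assumption: meta information is stored under the task ID.
                doc.meta[self.id] = {"export_format": self._export_format}
            yield doc


parser = ToyParser(task_id="parse", include_meta=True)
parsed = list(parser([Doc(uri="report.pdf")]))
print(parsed[0].text)
```

Because `__call__` in this sketch is a generator, nothing is converted until the result is consumed, which is why the usage line wraps the call in `list(...)`.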
deserialize(config, **kwargs)
classmethod
Generate Task instance from config.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | Config | Config to generate instance from. | required |
kwargs | dict[str, Any] | Values to inject into loaded config. | {} |
Returns:
Type | Description |
---|---|
Task | Deserialized Task instance. |
Source code in sieves/tasks/core.py
serialize()
Serializes task.
Returns:
Type | Description |
---|---|
Config | Config instance. |
Source code in sieves/tasks/core.py
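The interplay between `serialize()` and `deserialize(config, **kwargs)` can be pictured as a round trip in which `kwargs` supply values the stored config cannot carry, such as a live converter object. The sketch below is a rough illustration under that assumption, not the library's implementation: `ToyTask` is a hypothetical class, and a plain `dict` stands in for the real `Config` type.

```python
from __future__ import annotations

from typing import Any


class ToyTask:
    """Hypothetical task used only to illustrate the round trip."""

    def __init__(
        self,
        converter: Any = None,
        export_format: str = "markdown",
        task_id: str | None = None,
    ) -> None:
        self.converter = converter
        self.export_format = export_format
        self.task_id = task_id or type(self).__name__

    def serialize(self) -> dict[str, Any]:
        # Assumption: a live converter object is not stored, only a placeholder.
        return {
            "converter": None,
            "export_format": self.export_format,
            "task_id": self.task_id,
        }

    @classmethod
    def deserialize(cls, config: dict[str, Any], **kwargs: Any) -> "ToyTask":
        # Values passed as kwargs are injected into the loaded config
        # and take precedence over the stored entries.
        return cls(**{**config, **kwargs})


original = ToyTask(converter=object(), export_format="html", task_id="parse")
config = original.serialize()
restored = ToyTask.deserialize(config, converter="my-converter")
```

Here `restored` keeps the serialized `export_format` and `task_id`, while the converter comes from the injected keyword argument.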