Annotation Project

The AnnotationProject class represents an annotation project within BerryDB. You typically obtain an instance of this class by calling the BerryDB.create_annotation_project() method.

Once you have an AnnotationProject instance, you can use its methods to configure the project, populate it with data, connect it to machine learning backends, and manage annotations and predictions.

AnnotationProject Methods

AnnotationProject.setup_label_config(label_config)

The setup_label_config method configures the label settings for the specified annotation project. This configuration defines the labeling structure and rules to ensure consistent and organized annotation of data within the project. It is essential for tasks such as Named Entity Recognition (NER), image or text classification, and other annotation-based workflows where specific labels are applied to different data points. Without this setup, annotations and predictions added to the project will not be visible in the UI, making it a critical step for visualizing and managing the labeled data.

Parameters:

label_config (str): A string representing the label configuration. This configuration defines the labels that will be used within the project and their respective properties. The format typically follows a predefined structure that outlines label categories, attributes, and potentially hierarchical relationships between labels.

Returns:

Dict: A dict with a message indicating whether the label configuration was successfully applied to the project. In case of an error (e.g., invalid project ID or configuration format), the returned message will provide details about the failure.

Example

# Project ID, shown for reference only; setup_label_config is called on the project instance below
project_id = 12345  # ID of the project you want to configure

# Label config for Topic Modeling
project_config = '''
<View>
<Text name="text" value="$content.text"/>
<View style="box-shadow: 2px 2px 5px #999; padding: 20px; margin-top: 2em; border-radius: 5px;">
    <Header value="Choose the topic of the text"/>
    <Choices name="sentiment" toName="text" choice="single" showInLine="true">
    <Choice value="Introduction"/>
    <Choice value="Plan Information"/>
    <Choice value="Eligibility"/>
    <Choice value="Benefits"/>
    <Choice value="Financial Information"/>
    <Choice value="Provider Information"/>
    <Choice value="Claims and Reimbursements"/>
    <Choice value="Administrative Information"/>
    <Choice value="Legal and Regulatory Information"/>
    <Choice value="Dates and Deadlines"/>
    </Choices>
</View>
</View>
'''

# Assume 'my_annotation_project' is an instance of AnnotationProject
# Setup label for Topic Modeling
label_config = my_annotation_project.setup_label_config(project_config)
if label_config:
    print("Label config setup successful!")
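
Because the label config is passed as a plain string, a malformed config is easy to submit by accident. The sketch below is one way to check the string locally before calling setup_label_config. It uses only Python's standard xml.etree.ElementTree module; the helper name summarize_label_config and the shortened sample config are illustrative assumptions and are not part of the BerryDB SDK.

```python
import xml.etree.ElementTree as ET


def summarize_label_config(label_config: str) -> dict:
    """Parse a label config string and list its controls and choice values.

    Hypothetical helper (not part of the BerryDB SDK).
    Raises xml.etree.ElementTree.ParseError if the string is not well-formed XML.
    """
    root = ET.fromstring(label_config.strip())
    # Map each <Choices> control name to the object it labels (its toName).
    controls = {el.get("name"): el.get("toName") for el in root.iter("Choices")}
    # Collect every <Choice> value in document order.
    choices = [el.get("value") for el in root.iter("Choice")]
    return {"root": root.tag, "controls": controls, "choices": choices}


# Shortened sample config, assumed for illustration only.
sample_config = '''
<View>
  <Text name="text" value="$content.text"/>
  <Choices name="topic" toName="text" choice="single">
    <Choice value="Benefits"/>
    <Choice value="Eligibility"/>
  </Choices>
</View>
'''

summary = summarize_label_config(sample_config)
print(summary)
```

This only confirms that the string is well-formed XML and shows which labels it declares. Whether BerryDB accepts a given config is still decided by the service.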

AnnotationProject.populate(database)

The populate method is designed to fill a specific annotation project with data retrieved from a designated database. This process allows users to efficiently import existing data for annotation tasks, enabling faster project setup and reducing the need for manual data entry.

Parameters:

database (Database | str): The BerryDB database from which data will be sourced. This can be an instance of the Database class or a string representing the name of the database. If a string is provided, the SDK will attempt to connect to this database using the berrydb_api_key associated with this AnnotationProject.

Returns:

Dict | None: A dictionary containing the API response upon successful population, or None if an error occurs during the process. The dictionary typically includes details about the import task.

Example

# Define your required parameters
database_name = "PatientDB"  # Name of the database from which to populate data

# Assume 'my_annotation_project' is an instance of AnnotationProject
# Option 1: Using database name (string)
result_from_name = my_annotation_project.populate(database_name)
if result_from_name:
    print(f"Successfully populated project using database name: {database_name}")

# Option 2: Using a Database object (obtained when you create or connect to a database)
from berrydb import BerryDB
my_database_object = BerryDB.connect(api_key="your_api_key", database_name=database_name)
result_from_object = my_annotation_project.populate(my_database_object)

AnnotationProject.connect_to_ml(model=None, model_url=None, model_title='ML Model')

The connect_to_ml method establishes a connection between a specified annotation project and a Machine Learning (ML) model backend. This integration enables the project to leverage the capabilities of the ML model for tasks such as predictions, automated annotations, or data analysis, thereby enhancing the annotation workflow.

Tip

You can find your BerryDB ML Model URLs for model_url here.

Parameters:

model_url (str): The URL of the ML backend to which the project will be connected. This URL should point to a deployed ML model or service that is accessible from the BerryDB environment.
model_title (str, optional): A title for the connection, which provides context for the integration. By default, this is set to 'ML Model', but you can customize it to better describe the specific ML model being used.

Returns:

None: This method does not return a value. Instead, it establishes the connection and prints a success or failure message indicating the result of the operation.

Example

ml_url = "http://app.berrydb.io/berrydb/model/model-id"
ml_title = "Text Classification Model"

# Assume 'my_annotation_project' is an instance of AnnotationProject
# Connect the annotation project to the specified ML model
my_annotation_project.connect_to_ml(model_url=ml_url, model_title=ml_title)

AnnotationProject.retrieve_prediction(task_ids)

The retrieve_prediction method retrieves predictions for one or more tasks/records within an annotation project. This allows users to associate model outputs with individual tasks, facilitating the annotation process and improving overall data management.

Parameters:

task_ids (List[int]): The identifiers of the tasks for which predictions will be retrieved. These IDs should correspond to valid tasks within the specified project.

Returns:

None: This method does not return a value. Instead, it retrieves predictions for the specified tasks asynchronously.

Example

task_ids = [12340, 12341, 12342]

# Assume 'my_annotation_project' is an instance of AnnotationProject
# Retrieve predictions for the specified tasks
my_annotation_project.retrieve_prediction(task_ids)
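
When a project holds many tasks, it may be convenient to request predictions in smaller batches rather than in one large call. The sketch below shows one way to split a list of task IDs. The chunked helper and the batch size of 2 are assumptions made for illustration; this page does not state any limit on how many IDs retrieve_prediction accepts.

```python
from typing import Iterator, List


def chunked(ids: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `ids`, each holding at most `size` items.

    Hypothetical helper (not part of the BerryDB SDK).
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


task_ids = [12340, 12341, 12342, 12343, 12344]
batches = list(chunked(task_ids, 2))

for batch in batches:
    # With a real project instance, each batch would be passed along, e.g.:
    # my_annotation_project.retrieve_prediction(batch)
    print(batch)
```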

AnnotationProject.create_prediction(task_id, prediction)

The create_prediction method adds a custom prediction to a specific task/record within an annotation project. This allows users to associate automated insights or model outputs with individual tasks, facilitating the annotation process and improving overall data management.

Parameters:

task_id (int): The unique identifier of the task to which the prediction will be added. This ID should correspond to a valid task within the specified project.
prediction (dict): A dictionary containing the prediction data to be associated with the task/record. The structure of this dictionary should align with the expected format for predictions in your project based on the setup configuration. It may include fields such as labels, polygon_labels, and confidence scores, along with additional metadata.

Returns:

None: This method does not return a value. Instead, it adds the prediction to the specified task and prints a success or failure message to indicate the result of the operation.

Example

task_id = 67890
prediction = {
    "label": "Positive",
    "confidence": 0.95,
    "additional_info": "Predicted based on sentiment analysis model."
}

# Assume 'my_annotation_project' is an instance of AnnotationProject
# Create a prediction for the specified task
my_annotation_project.create_prediction(task_id, prediction)

AnnotationProject.create_annotation(task_id, annotation)

Creates an annotation for a task in the annotation project.

Parameters:

task_id (int): The unique identifier of the task that the annotation will be associated with. This ID must match an existing task within the specified project.
annotation (Dict): A dictionary containing the details of the annotation to be added to the task. The structure of this dictionary should reflect the required fields and values for your annotation, such as labels, coordinates for bounding boxes, or any relevant metadata.

Returns:

None: This method does not return a value. Instead, it adds the annotation to the specified task and prints a success or failure message indicating the result of the operation.

Example

task_id = 67890
annotation = {
    "label": "Cat",
    "bounding_box": {
        "x": 50,
        "y": 30,
        "width": 100,
        "height": 80
    },
    "confidence": 0.97
}

# Assume 'my_annotation_project' is an instance of AnnotationProject
# Create an annotation for the specified task
my_annotation_project.create_annotation(task_id, annotation)
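
The get_task_data example on this page passes create_annotation a dictionary with a "result" list whose entries carry from_name, to_name, type, and value keys. As a minimal sketch, assuming that shape is what your label config expects, a small helper can build such a dictionary for a choices-style label. The helper name build_choices_annotation is hypothetical and not part of the BerryDB SDK.

```python
from typing import Any, Dict, List


def build_choices_annotation(
    from_name: str, to_name: str, choices: List[str]
) -> Dict[str, Any]:
    """Build an annotation dict in the "result" list shape.

    Hypothetical helper (not part of the BerryDB SDK).
    from_name and to_name should match the control and object names
    declared in your label config.
    """
    if not choices:
        raise ValueError("at least one choice is required")
    return {
        "result": [
            {
                "from_name": from_name,
                "to_name": to_name,
                "type": "choices",
                "value": {"choices": list(choices)},
            }
        ]
    }


# Names taken from the label config example earlier on this page.
annotation = build_choices_annotation("sentiment", "text", ["Benefits"])
print(annotation)
```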

AnnotationProject.attach_annotations_config(annotations_config)

Attaches a predefined annotations configuration to the annotation project.

This method links an AnnotationsConfig instance to the current project. The AnnotationsConfig defines how annotations should be generated, potentially using LLMs, specific prompts, and data transformations. Attaching it to a project enables automated annotation workflows based on that configuration.

Note

The annotations_config parameter must be an instance of AnnotationsConfig. Please refer to the Annotations Config page for details on how to create and save an AnnotationsConfig object using its builder.

Parameters:

annotations_config (AnnotationsConfig): An instance of AnnotationsConfig that has been previously created and saved. The name attribute of this config is used to identify it in BerryDB.

Returns:

dict: A dictionary containing the response from BerryDB, typically confirming that the configuration has been successfully attached.

Raises:

ValueError: If the annotations_config does not have a name.

Example:

from berrydb import AnnotationsConfig

# Assume 'my_annotation_project' is an instance of AnnotationProject
# and 'berrydb_api_key' is your BerryDB API key.

# First, create and save an AnnotationsConfig
ner_config = (
    AnnotationsConfig.builder()
    .name("my-project-ner-config")
    .input_transform_expression("data.text_content")
    .output_transform_expression("annotations.ner_tags")
    .llm_provider("openai")
    .llm_model("gpt-4o-mini")
    .prompt("Extract named entities: {{input}}")
    .build()
)
ner_config.save(berrydb_api_key) # Save it to BerryDB

# Now, attach this saved config to your annotation project
try:
    my_annotation_project.attach_annotations_config(ner_config)
    print(f"Successfully attached '{ner_config.name}' to project '{my_annotation_project.project_name()}'.")
except Exception as e:
    print(f"Error attaching annotations config: {e}")

AnnotationProject.get_task_data()

Retrieves a list of all tasks within the annotation project, along with their corresponding BerryDB document IDs and database names.

This method fetches information for every task in the project. Each task typically represents a single data item that has been imported into the annotation project. The task_id returned for each task can then be used with other methods like retrieve_prediction, create_annotation, or create_prediction to interact with individual tasks.

Returns:

List[Dict[str, Any]]: A list of dictionaries, where each dictionary represents a task and contains the following keys:
- 'task_id' (int): The unique identifier of the task within the annotation project.
- 'document_id' (str): The original identifier of the document in BerryDB from which this task was created.
- 'database_name' (str): The name of the BerryDB database from which the original document was sourced.

Raises:

Exception: If there’s an issue fetching the tasks from BerryDB, for example, due to network issues or if the project is not found.

Example:

# Assuming 'my_annotation_project' is an instance of AnnotationProject

try:
    all_tasks_info = my_annotation_project.get_task_data()
    if all_tasks_info:
        print(f"Found {len(all_tasks_info)} tasks in project '{my_annotation_project.project_name()}':")
        for task_info in all_tasks_info:
            task_id = task_info['task_id']

            # Example: Request a prediction for this task
            # (retrieve_prediction is documented as returning None and running asynchronously)
            try:
                my_annotation_project.retrieve_prediction(task_ids=[task_id])
                print(f"    Requested prediction for task {task_id}")
            except Exception as pred_e:
                print(f"    Could not retrieve prediction for task {task_id}: {pred_e}")

            # Example: Create an example annotation for this task
            ex_annotation_data = {
                "result": [
                    {
                        "from_name": "label", # Matches your label config
                        "to_name": "text",   # Matches your label config
                        "type": "choices",   # Matches your label config
                        "value": {"choices": ["SomeLabel"]}
                    }
                ]
            }
            try:
                my_annotation_project.create_annotation(task_id, ex_annotation_data)
                print(f"    Successfully created annotation for task {task_id}")
            except Exception as ann_e:
                print(f"    Could not create annotation for task {task_id}: {ann_e}")

    else:
        print(f"No tasks found in project '{my_annotation_project.project_name()}'.")
except Exception as e:
    print(f"Error retrieving task data: {e}")
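
Since each task dictionary carries its source database_name, one simple follow-up is to group task IDs by the database they came from. The sketch below does this on hand-written sample data that mirrors the documented keys. The helper name, the document IDs, and the "ClaimsDB" database name are made up for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_task_ids_by_database(tasks: List[Dict[str, Any]]) -> Dict[str, List[int]]:
    """Group task IDs by the 'database_name' key of each task dictionary.

    Hypothetical helper (not part of the BerryDB SDK).
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for task in tasks:
        grouped[task["database_name"]].append(task["task_id"])
    return dict(grouped)


# Sample data shaped like the documented get_task_data() return value.
sample_tasks = [
    {"task_id": 1, "document_id": "doc-a", "database_name": "PatientDB"},
    {"task_id": 2, "document_id": "doc-b", "database_name": "ClaimsDB"},
    {"task_id": 3, "document_id": "doc-c", "database_name": "PatientDB"},
]

grouped = group_task_ids_by_database(sample_tasks)
print(grouped)
```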
