Annotation Project
The AnnotationProject class represents an annotation project within BerryDB. You typically obtain an instance of this class by calling the BerryDB.create_annotation_project() method.
Once you have an AnnotationProject instance, you can use its methods to configure the project, populate it with data, connect it to machine learning backends, and manage annotations and predictions.
AnnotationProject Methods
- AnnotationProject.setup_label_config(label_config)
The setup_label_config method configures the label settings for the annotation project. The configuration defines the labeling structure and rules so that data in the project is annotated consistently. It applies to tasks such as Named Entity Recognition (NER), image or text classification, and other workflows where specific labels are applied to data points. Without this setup, annotations and predictions added to the project are not visible in the UI, so configure the labels before adding either.
Parameters:
label_config (str): A string containing the label configuration. It defines the labels used within the project and their properties. The format follows a predefined structure that outlines label categories, attributes, and potentially hierarchical relationships between labels.
Returns:
Dict: A dict with a message indicating whether the label configuration was successfully applied to the project. If an error occurs (e.g., an invalid project ID or configuration format), the message provides details about the failure.
Example
    # Define the required parameters
    project_id = 12345  # ID of the project you want to configure

    # Label config for Topic Modeling
    project_config = '''
    <View>
      <Text name="text" value="$content.text"/>
      <View style="box-shadow: 2px 2px 5px #999; padding: 20px; margin-top: 2em; border-radius: 5px;">
        <Header value="Choose text sentiment"/>
        <Choices name="sentiment" toName="text" choice="single" showInLine="true">
          <Choice value="Introduction"/>
          <Choice value="Plan Information"/>
          <Choice value="Eligibility"/>
          <Choice value="Benefits"/>
          <Choice value="Financial Information"/>
          <Choice value="Provider Information"/>
          <Choice value="Claims and Reimbursements"/>
          <Choice value="Administrative Information"/>
          <Choice value="Legal and Regulatory Information"/>
          <Choice value="Dates and Deadlines"/>
        </Choices>
      </View>
    </View>
    '''

    # Assume 'my_annotation_project' is an instance of AnnotationProject
    # Set up the label config for Topic Modeling
    label_config = my_annotation_project.setup_label_config(project_config)
    if label_config:
        print("Label config setup successful!")
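Because the label configuration is an XML string, a malformed tag or an unescaped character is easy to miss when the string is written by hand. The following is a minimal sketch, not part of the BerryDB SDK, that builds a single-choice configuration from a list of labels using only the Python standard library. The helper name build_choices_config, its parameters, and the control name "topic" are hypothetical; the tag and attribute names are copied from the example above.

```python
import xml.etree.ElementTree as ET


def build_choices_config(labels, text_value="$content.text",
                         header="Choose a topic"):
    """Build a single-choice label config string (hypothetical helper)."""
    if not labels:
        raise ValueError("at least one label is required")

    view = ET.Element("View")
    ET.SubElement(view, "Text", {"name": "text", "value": text_value})
    ET.SubElement(view, "Header", {"value": header})
    choices = ET.SubElement(view, "Choices", {
        "name": "topic", "toName": "text",
        "choice": "single", "showInLine": "true",
    })
    for label in labels:
        ET.SubElement(choices, "Choice", {"value": label})
    # ElementTree escapes characters such as & and < in attribute values.
    return ET.tostring(view, encoding="unicode")


project_config = build_choices_config(["Introduction", "Eligibility", "Benefits"])
# Parsing the result again confirms that the string is well-formed XML.
ET.fromstring(project_config)
```

The resulting string can then be passed to setup_label_config(project_config) in the same way as the hand-written configuration.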
- AnnotationProject.populate(database)
The populate method fills the annotation project with data retrieved from a designated database. This lets you import existing data for annotation tasks, which speeds up project setup and reduces manual data entry.
Parameters:
database (Database | str): The BerryDB database from which data will be sourced. This can be an instance of the Database class or a string containing the name of the database. If a string is provided, the SDK attempts to connect to this database using the berrydb_api_key associated with this AnnotationProject.
Returns:
Dict | None: A dictionary containing the API response upon successful population, or None if an error occurs during the process. The dictionary typically includes details about the import task.
Example
    # Define your required parameters
    database_name = "PatientDB"  # Name of the database from which to populate data

    # Assume 'my_annotation_project' is an instance of AnnotationProject

    # Option 1: Using database name (string)
    result_from_name = my_annotation_project.populate(database_name)
    if result_from_name:
        print(f"Successfully populated project using database name: {database_name}")

    # Option 2: Using a Database object (when you either create or connect to a database)
    from berrydb import BerryDB
    my_database_object = BerryDB.connect(api_key="your_api_key", database_name=database_name)
    result_from_object = my_annotation_project.populate(my_database_object)
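Since populate signals failure by returning None rather than raising, a script that continues past a failed import may only notice the problem later. One possible approach, sketched here under the assumption that the documented return behavior holds, is a small wrapper that turns a None result into an exception. The name populate_or_raise is hypothetical and is not part of the SDK.

```python
def populate_or_raise(project, database):
    """Call project.populate() and fail loudly if it reports an error.

    'project' is assumed to be an AnnotationProject; any object with a
    populate() method works. This wrapper is not part of the SDK.
    """
    result = project.populate(database)
    if result is None:
        # populate() is documented to return None when an error occurs.
        name = database if isinstance(database, str) else type(database).__name__
        raise RuntimeError(f"Populating the project from {name!r} failed")
    return result
```

A call such as populate_or_raise(my_annotation_project, "PatientDB") then either returns the API response dictionary or stops the script at the point of failure.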
- AnnotationProject.connect_to_ml(model=None, model_url=None, model_title='ML Model')
The connect_to_ml method connects the annotation project to a Machine Learning (ML) model backend. The project can then use the model for tasks such as predictions, automated annotations, or data analysis.
Tip
You can find your BerryDB ML Model URLs for model_url here.
Parameters:
model_url (str): The URL of the ML backend to which the project will be connected. This URL should point to a deployed ML model or service that is accessible from the BerryDB environment.
model_title (str, optional): A title for the connection, which provides context for the integration. By default, this is set to "ML Model", but you can customize it to better describe the specific ML model being used.
Returns:
None: This method does not return a value. Instead, it establishes the connection and prints a success or failure message indicating the result of the operation.
Example
    ml_url = "http://app.berrydb.io/berrydb/model/model-id"
    ml_title = "Text Classification Model"

    # Assume 'my_annotation_project' is an instance of AnnotationProject
    # Connect the annotation project to the specified ML model.
    # Keyword arguments are used because the first positional parameter is 'model'.
    my_annotation_project.connect_to_ml(model_url=ml_url, model_title=ml_title)
- AnnotationProject.retrieve_prediction(task_ids)
The retrieve_prediction method requests predictions for one or more tasks/records within an annotation project. This lets you associate model outputs with individual tasks, which supports the annotation process.
Parameters:
task_ids (List[int]): The identifiers of the tasks for which predictions will be retrieved. These IDs should correspond to valid tasks within the project.
Returns:
None: This method does not return a value. Instead, it retrieves predictions for the specified tasks asynchronously.
Example
    task_ids = [12340, 12341, 12342]

    # Assume 'ner_proj' is an instance of AnnotationProject
    # Retrieve predictions for the specified tasks
    ner_proj.retrieve_prediction(task_ids)
- AnnotationProject.create_prediction(task_id, prediction)
The create_prediction method adds a custom prediction to a specific task/record within an annotation project. This lets you associate automated insights or model outputs with individual tasks, which supports the annotation process.
Parameters:
task_id (int): The unique identifier of the task to which the prediction will be added. This ID should correspond to a valid task within the project.
prediction (dict): A dictionary containing the prediction data to be associated with the task/record. Its structure should align with the format expected for predictions in your project, based on the label configuration. It may include fields such as labels, polygon_labels, confidence scores, and additional metadata.
Returns:
None: This method does not return a value. Instead, it adds the prediction to the specified task and prints a success or failure message indicating the result of the operation.
Example
    task_id = 67890
    prediction = {
        "label": "Positive",
        "confidence": 0.95,
        "additional_info": "Predicted based on sentiment analysis model."
    }

    # Assume 'my_annotation_project' is an instance of AnnotationProject
    # Create a prediction for the specified task
    my_annotation_project.create_prediction(task_id, prediction)
- AnnotationProject.create_annotation(task_id, annotation)
Create an annotation for a task in the annotation project.
Parameters:
task_id (int): The unique identifier of the task that the annotation will be associated with. This ID must match an existing task within the project.
annotation (Dict): A dictionary containing the details of the annotation to be added to the task. Its structure should reflect the fields and values required for your annotation, such as labels, coordinates for bounding boxes, or any relevant metadata.
Returns:
None: This method does not return a value. Instead, it adds the annotation to the specified task and prints a success or failure message indicating the result of the operation.
Example
    task_id = 67890
    annotation = {
        "label": "Cat",
        "bounding_box": {
            "x": 50,
            "y": 30,
            "width": 100,
            "height": 80
        },
        "confidence": 0.97
    }

    # Assume 'my_annotation_project' is an instance of AnnotationProject
    # Create an annotation for the specified task
    my_annotation_project.create_annotation(task_id, annotation)
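The get_task_data example on this page passes create_annotation a payload with a "result" list whose from_name, to_name, and type are marked as needing to match the label config. The following is a minimal sketch of building such a payload for a choices control and checking it against the config before sending it. The helper name build_choices_annotation and the sample config are hypothetical; the helper only inspects the XML and does not call BerryDB.

```python
import xml.etree.ElementTree as ET


def build_choices_annotation(label_config, from_name, to_name, choices):
    """Build a 'choices' annotation payload, checked against a label config.

    Hypothetical helper: it verifies that the control name, its target, and
    the chosen values exist in the XML, then returns the payload dict.
    """
    root = ET.fromstring(label_config)
    control = next(
        (el for el in root.iter("Choices") if el.get("name") == from_name), None
    )
    if control is None:
        raise ValueError(f"no <Choices name={from_name!r}> in label config")
    if control.get("toName") != to_name:
        raise ValueError(f"{from_name!r} does not target {to_name!r}")
    allowed = {c.get("value") for c in control.iter("Choice")}
    unknown = [c for c in choices if c not in allowed]
    if unknown:
        raise ValueError(f"choices not in label config: {unknown}")
    return {
        "result": [{
            "from_name": from_name,
            "to_name": to_name,
            "type": "choices",
            "value": {"choices": list(choices)},
        }]
    }


# Sample config for illustration only.
label_config = """
<View>
  <Text name="text" value="$content.text"/>
  <Choices name="label" toName="text" choice="single">
    <Choice value="Benefits"/>
    <Choice value="Eligibility"/>
  </Choices>
</View>
"""
annotation = build_choices_annotation(label_config, "label", "text", ["Benefits"])
```

The returned dictionary can then be passed as the annotation argument of create_annotation(task_id, annotation).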
- AnnotationProject.attach_annotations_config(annotations_config)
Attaches a predefined annotations configuration to the annotation project.
This method links an AnnotationsConfig instance to the current project. The AnnotationsConfig defines how annotations should be generated, potentially using LLMs, specific prompts, and data transformations. Attaching it to a project enables automated annotation workflows based on that configuration.
Note
The annotations_config parameter must be an instance of AnnotationsConfig. Please refer to the Annotations Config page for details on how to create and save an AnnotationsConfig object using its builder.
Parameters:
annotations_config (AnnotationsConfig): An instance of AnnotationsConfig that has been previously created and saved. The name attribute of this config is used to identify it in BerryDB.
Returns:
dict: A dictionary containing the response from BerryDB, typically confirming that the configuration has been successfully attached.
Raises:
ValueError: If the annotations_config does not have a name.
Example:
    from berrydb import AnnotationsConfig

    # Assume 'my_annotation_project' is an instance of AnnotationProject
    # and 'berrydb_api_key' is your BerryDB API key.

    # First, create and save an AnnotationsConfig
    ner_config = (
        AnnotationsConfig.builder()
        .name("my-project-ner-config")
        .input_transform_expression("data.text_content")
        .output_transform_expression("annotations.ner_tags")
        .llm_provider("openai")
        .llm_model("gpt-4o-mini")
        .prompt("Extract named entities: {{input}}")
        .build()
    )
    ner_config.save(berrydb_api_key)  # Save it to BerryDB

    # Now, attach this saved config to your annotation project
    try:
        my_annotation_project.attach_annotations_config(ner_config)
        print(f"Successfully attached '{ner_config.name}' to project '{my_annotation_project.project_name()}'.")
    except Exception as e:
        print(f"Error attaching annotations config: {e}")
- AnnotationProject.get_task_data()
Retrieves a list of all tasks within the annotation project, along with their corresponding BerryDB document IDs and database names.
This method fetches information for every task in the project. Each task typically represents a single data item that has been imported into the annotation project. The task_id returned for each task can then be used with other methods like retrieve_prediction, create_annotation, or create_prediction to interact with individual tasks.
Returns:
List[Dict[str, Any]]: A list of dictionaries, where each dictionary represents a task and contains the following keys:
'task_id' (int): The unique identifier of the task within the annotation project.
'document_id' (str): The original identifier of the document in BerryDB from which this task was created.
'database_name' (str): The name of the BerryDB database from which the original document was sourced.
Raises:
Exception: If there is an issue fetching the tasks from BerryDB, for example, due to network issues or if the project is not found.
Example:
    # Assuming 'my_annotation_project' is an instance of AnnotationProject
    try:
        all_tasks_info = my_annotation_project.get_task_data()
        if all_tasks_info:
            print(f"Found {len(all_tasks_info)} tasks in project '{my_annotation_project.project_name()}':")
            for task_info in all_tasks_info:
                task_id = task_info['task_id']

                # Example: Retrieve prediction for this task.
                # retrieve_prediction returns None and runs asynchronously,
                # so there is no result to inspect here.
                try:
                    my_annotation_project.retrieve_prediction(task_ids=[task_id])
                    print(f"  Requested prediction for task {task_id}")
                except Exception as pred_e:
                    print(f"  Could not retrieve prediction for task {task_id}: {pred_e}")

                # Example: Create an example annotation for this task
                ex_annotation_data = {
                    "result": [
                        {
                            "from_name": "label",  # Matches your label config
                            "to_name": "text",     # Matches your label config
                            "type": "choices",     # Matches your label config
                            "value": {"choices": ["SomeLabel"]}
                        }
                    ]
                }
                try:
                    my_annotation_project.create_annotation(task_id, ex_annotation_data)
                    print(f"  Successfully created annotation for task {task_id}")
                except Exception as ann_e:
                    print(f"  Could not create annotation for task {task_id}: {ann_e}")
        else:
            print(f"No tasks found in project '{my_annotation_project.project_name()}'.")
    except Exception as e:
        print(f"Error retrieving task data: {e}")
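When a project holds tasks from more than one database, it can help to group the task IDs before requesting predictions in bulk. The following is a minimal sketch that works on the list-of-dictionaries shape documented for get_task_data. The function name task_ids_by_database is hypothetical and the sample values are made up for illustration.

```python
from collections import defaultdict


def task_ids_by_database(tasks):
    """Group task IDs by the database each task's document came from."""
    grouped = defaultdict(list)
    for task in tasks:
        grouped[task["database_name"]].append(task["task_id"])
    return dict(grouped)


# Sample data in the documented shape; the values are made up.
sample_tasks = [
    {"task_id": 101, "document_id": "doc-a", "database_name": "PatientDB"},
    {"task_id": 102, "document_id": "doc-b", "database_name": "PatientDB"},
    {"task_id": 201, "document_id": "doc-c", "database_name": "ClaimsDB"},
]
grouped = task_ids_by_database(sample_tasks)
print(grouped)
```

Each list in the result can be passed directly as the task_ids argument of retrieve_prediction.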