Dagster & Chroma
The Chroma library allows you to easily interact with Chroma's vector database capabilities to build AI-powered data pipelines in Dagster. You can perform vector similarity searches, manage schemas, and handle data operations directly from your Dagster assets.
Installation
- uv
- pip
uv add dagster-chroma
pip install dagster-chroma
Example
import os
from dagster_chroma import ChromaResource, HttpConfig, LocalConfig
import dagster as dg
@dg.asset
def my_table(chroma: ChromaResource):
    with chroma.get_client() as chroma_client:
        collection = chroma_client.create_collection("fruits")
        collection.add(
            documents=[
                "This is a document about oranges",
                "This is a document about pineapples",
                "This is a document about strawberries",
                "This is a document about cucumbers",
            ],
            ids=["oranges", "pineapples", "strawberries", "cucumbers"],
        )
        results = collection.query(
            query_texts=["hawaii"],
            n_results=1,
        )
defs = dg.Definitions(
    assets=[my_table],
    resources={
        "chroma": ChromaResource(
            connection_config=LocalConfig(persistence_path="./chroma")
            if os.getenv("DEV")
            else HttpConfig(host="192.168.0.10", port=8000)
        ),
    },
)
About Chroma
Chroma is the open-source AI application database. Chroma makes it easy to build LLM apps by making knowledge, facts, and skills pluggable for LLMs. It provides a simple API for storing and querying embeddings, documents, and metadata. Chroma can be used to build semantic search, question answering, and other AI-powered applications. The database can run embedded in your application or as a separate service.