Prompts
We will be working with the API for the National Renewable Energy Laboratory (NREL). This API allows us to find nearby alternative fuel stations. To use this API, we need to provide our latitude and longitude and some information about our vehicle. To make this user-friendly, we can put an AI model in front of the API to translate free text into the information it needs.
Using AI models like Claude, we can get coordinates from an address and extract information about our vehicle. For example, if we asked: "I'm near The Art Institute of Chicago and driving a Kia EV9, what are my coordinates and my vehicle type?", the answer might look like this:
Your coordinates will depend on your exact location near The Art Institute of Chicago, but its approximate coordinates are 41.8796° N, 87.6237° W. If you're nearby, your latitude and longitude should be close to these.
Your vehicle type is a Kia EV9, which is a fully electric SUV. Let me know if you need parking suggestions or other assistance nearby!
This is helpful, but has a couple of issues. First, it requires the user to know what to include for our backend API. We don't want to count on the user to include that information. Also, the response we get back will be hard to parse, and we cannot guarantee that it will always be in the same format.
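To make the parsing problem concrete, here is a minimal sketch of what extracting coordinates from a free-text reply could look like. The regular expression and the `parse_coordinates` helper are hypothetical and written only against the phrasing of the sample reply above; the second string is an invented rewording of the same answer.

```python
import re

# Hypothetical pattern tailored to phrasing like "41.8796° N, 87.6237° W".
COORD_PATTERN = re.compile(r"(\d+\.\d+)° ([NS]), (\d+\.\d+)° ([EW])")


def parse_coordinates(reply: str):
    """Return (latitude, longitude) as signed decimals, or None if nothing matches."""
    match = COORD_PATTERN.search(reply)
    if match is None:
        return None
    lat, ns, lon, ew = match.groups()
    # South and West are negative in decimal notation.
    latitude = float(lat) * (1 if ns == "N" else -1)
    longitude = float(lon) * (1 if ew == "E" else -1)
    return latitude, longitude


sample = "its approximate coordinates are 41.8796° N, 87.6237° W."
reworded = "It sits at latitude 41.8796 and longitude -87.6237."

print(parse_coordinates(sample))    # (41.8796, -87.6237)
print(parse_coordinates(reworded))  # None
```

The same information phrased slightly differently slips past the pattern, which is the kind of fragility a fixed output format avoids.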
Prompts
To get around this, we can employ prompt engineering. Prompt engineering is the process of crafting effective prompts to guide AI models toward generating accurate, relevant, or desired outputs. When including the output of AI models in data pipelines, one of the most important things is to ensure the output is in a format we expect. In this case, we know the information we want back from the AI model. Instead of a free text response, it would be helpful to have something easier to parse, like JSON, only containing the fields we need to make the API call.
We can write a prompt that tells the model our desired outcome and provides an example. In our case, a prompt might look like this:
PROMPT_LOCATION = """
Given a location and vehicle, return the latitude (as a decimal, range -90 to 90)
and longitude (as a decimal, range -180 to 180) and fuel type of the vehicle
(enum 'ELEC', 'BD' or 'all').
If the location cannot be found, return the status as zero. Electric vehicles map to 'ELEC',
biodiesel to 'BD', anything else should be marked as 'all'.
Return everything as a JSON object.
<example>
Input: I'm at 1600 Pennsylvania Avenue NW, Washington, DC 20500 with a Tesla Model 3
Output:
{{
    "latitude": 38.8977,
    "longitude": -77.0365,
    "fuel_type": "ELEC"
}}
</example>
Input: {location}
"""
Now we can use this prompt with Claude. Within our Dagster asset (user_input_prompt) we can use the AnthropicResource to easily interact with the Anthropic client. We will also want to include a run configuration for the asset so we can reuse this same pipeline with slightly different inputs. Finally, since we can ensure the response format from Claude with our prompt engineering, we can define a more specific output for the asset. Using Pydantic, we can define the exact schema we expect.
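import json
from typing import Literal

import dagster as dg
from dagster_anthropic import AnthropicResource
from pydantic import BaseModel


class UserInputSchema(BaseModel):
    latitude: float
    longitude: float
    fuel_type: Literal["all", "ELEC", "BD"]


class InputLocation(dg.Config):
    location: str


@dg.asset(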
    kinds={"anthropic"},
    description="Determine location and vehicle type from an input",
)
def user_input_prompt(
    context: dg.AssetExecutionContext,
    config: InputLocation,
    anthropic: AnthropicResource,
) -> UserInputSchema:
    prompt = PROMPT_LOCATION.format(location=config.location)
    with anthropic.get_client(context) as client:
        resp = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
    message = resp.content[0].text
    schema = UserInputSchema(**json.loads(message))
    context.log.info(schema)
    return schema
Looking at the final asset, you can see the pieces working in unison. We combine the input from the run configuration into our prompt, which returns a result we can assume is JSON. Then we can unpack that JSON into our UserInputSchema schema to get further validation that our result matches what we expected.
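The asset assumes the reply is pure JSON. If the model wraps the object in extra prose, `json.loads` raises an error. A minimal sketch of a more defensive parse follows; `extract_json_object` is a hypothetical helper, not part of Dagster or the Anthropic client, and the sample reply is invented for illustration.

```python
import json


def extract_json_object(reply: str) -> dict:
    """Parse the span from the first '{' to the last '}' in a model reply.

    Raises ValueError if no such span exists. json.JSONDecodeError, raised
    when the span is not valid JSON, is itself a subclass of ValueError.
    """
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start : end + 1])


reply = (
    "Here is the result:\n"
    '{"latitude": 41.8796, "longitude": -87.6237, "fuel_type": "ELEC"}\n'
    "Let me know if you need anything else."
)
data = extract_json_object(reply)
print(data)
```

The resulting dictionary could then be unpacked into the Pydantic model, for example `UserInputSchema(**data)`, so that field types and the allowed `fuel_type` values are still checked.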
Prompt engineering also lets the prompt supply the context so the user no longer has to. The user's input can now be shortened to something like: "I'm near The Art Institute of Chicago and driving a Kia EV9", which results in:
latitude=41.8796 longitude=-87.6237 fuel_type='ELEC'
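With the three fields in hand, building the request for the fuel station lookup becomes mechanical. The sketch below only constructs a URL and makes no network call. The endpoint path and parameter names are assumptions based on NREL's public alternative fuel stations API and should be checked against its documentation; `build_nearest_url` is a hypothetical helper and `DEMO_KEY` is a placeholder API key.

```python
from urllib.parse import urlencode

# Assumed endpoint for NREL's "nearest stations" lookup; verify before use.
NEAREST_URL = "https://developer.nrel.gov/api/alt-fuel-stations/v1/nearest.json"


def build_nearest_url(
    latitude: float, longitude: float, fuel_type: str, api_key: str
) -> str:
    """Encode the parsed fields as query parameters on the lookup URL."""
    params = {
        "api_key": api_key,
        "latitude": latitude,
        "longitude": longitude,
        "fuel_type": fuel_type,
    }
    return f"{NEAREST_URL}?{urlencode(params)}"


url = build_nearest_url(41.8796, -87.6237, "ELEC", api_key="DEMO_KEY")
print(url)
```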
Next steps
- Continue this example with a custom resource