Appearance
Testing Ask CDP AI
This repo includes a repeatable workflow for testing reviewed Ask CDP AI questions against the local AI server. Use it when improving answer quality, grounding, prompt behavior, or response reliability.
Run these commands from the ai-server directory unless noted otherwise.
Test Data
Reviewed free-text feedback lives in:
bash
comments.csvReviewed location fixtures live in:
bash
org-data/<orgId>.jsonThe fixtures are endpoint-shaped location profiles fetched from the backend path GET /api/v1/locations/id/{organization_id}. They are used as locationData when replaying questions.
Approved follow-up question candidates live in:
bash
app/data/approved_follow_up_questions.jsonRun One Question
Use this when tuning a specific bad answer:
bash
script/run-test-question 831391 "has the city ranked the hazards in their disclosure?"The script:
- Starts a temporary local location stub backed by
org-data. - Starts the AI server on a free local port.
- Sends the question to
/v1/chat/completions. - Prints the assistant response.
- Cleans up both temporary servers.
It reads API keys from .env. Transient failures are retried by default.
Optional retry controls:
bash
TEST_QUESTION_RETRIES=4 TEST_QUESTION_BACKOFF=2 \
script/run-test-question 10894 "tell me about climate mitigation in LA"Run A Batch
Start the API server separately:
bash
uv run uvicorn app.main:app --host 127.0.0.1 --port 8088In another shell, run reviewed comment cases:
bash
uv run python utility_scripts/run_chat_eval.py \
--comments-csv comments.csv \
--org-data-dir org-data \
--output tmp-reviewed-question-results.jsonFor transient upstream issues:
bash
uv run python utility_scripts/run_chat_eval.py \
--comments-csv comments.csv \
--org-data-dir org-data \
--retries 2 \
--retry-backoff 1 \
--output tmp-reviewed-question-results.jsonTo inspect normalized payloads without calling the LLM:
bash
uv run python utility_scripts/run_chat_eval.py \
--comments-csv comments.csv \
--org-data-dir org-data \
--dry-run-summary \
--limit 5Focus A Subset
Run only a few cases:
bash
uv run python utility_scripts/run_chat_eval.py \
--comments-csv comments.csv \
--org-data-dir org-data \
--limit 5Filter by location text:
bash
uv run python utility_scripts/run_chat_eval.py \
--comments-csv comments.csv \
--org-data-dir org-data \
--location-filter "Rio"Improving A Specific Result
Recommended loop:
- Pick a reviewed case from
comments.csv. - Run it with
script/run-test-question. - Compare the answer with reviewer feedback.
- Make the smallest targeted change.
- Rerun the single question.
- Rerun the batch or relevant filtered subset.
- Run pytest.
Common fix locations:
ai-server/app/prompts/system_prompt.md: user-facing behavior, refusal style, grounding rules, terminology.ai-server/app/prompts/suggest_follow_ups.md: follow-up selection behavior.ai-server/app/prompts.py: prompt loading and location-context formatting.ai-server/app/follow_ups.py: follow-up candidate selection and request shaping.ai-server/utility_scripts/run_chat_eval.py: batch case loading, retry behavior, normalized eval payloads.ai-server/comments.csv: reviewed question source.ai-server/org-data: endpoint-shaped reviewed location fixtures.
Local prompt files are re-read on each request, so edits to app/prompts/system_prompt.md should affect the next local API call without a server restart. Remote prompts loaded through the SYSTEM_PROMPT URL are cached for SYSTEM_PROMPT_CACHE_SECONDS.
Automated Tests
Run all AI server tests:
bash
uv run pytestRelevant test files:
ai-server/tests/test_chatbot.py: chat behavior and error mapping.ai-server/tests/test_follow_ups.py: follow-up selection.ai-server/tests/test_openai_api.py: OpenAI-compatible API contract.ai-server/tests/test_prompts.py: prompt loading and formatting.ai-server/tests/test_run_chat_eval.py: batch evaluation helper behavior.