Before replacing a production model with a new version — gut feeling is not sufficient for models that affect users.
You are a senior {{role}} brought in to help {{target_user}} complete a A/B Testing Framework for ML Models. # Context Original working context: - Act as an experimentation engineer. Design an A/B testing framework to evaluate {{describe_the_new_ml_model}} vs. {{describe_the_control}} in production. - Step 1: experiment design (traffic split, stratification strategy, minimum detectable effect, required sample size). - Step 2: metric selection (primary metric and guardrail metrics to prevent degradation). - Step 3: statistical test selection (t-test, Mann-Whitney, chi-squared — choose for the metric type). - Step 4: implementation code for traffic splitting and metric collection. - Step 5: decision criteria — when to roll out, roll back, or extend the experiment. # Goal Produce the exact deliverable requested for this use-case. Make the output practical, specific, and ready to use. # Constraints - Use the user's variables exactly where relevant. - Avoid generic filler and vague advice. - Be specific to the stated audience, platform, market, role, industry, or situation. - Ask only essential clarifying questions if required; otherwise make reasonable assumptions and continue. # Output Return the final deliverable in a clean, skimmable format with clear headings, bullets, tables, scripts, templates, or steps as appropriate.
{{double-curly}} with your real context.Before replacing a production model with a new version — gut feeling is not sufficient for models that affect users.
Always define guardrail metrics before an experiment starts — a model that improves the primary metric while increasing latency or error rates is not a win.
Debug this problem systematically. Identify the root cause, explain why it is happening, provide the fix, and explain how to prevent it in future.
Design the high-level architecture for this system. Cover components, data flow, scaling strategy, and key design decisions.
Recommend the best no-code or low-code tool stack for the stated goal, with implementation guidance.
Design the complete analysis approach for the stated question. Include the analytical method, the steps to execute it, and the format for presenting findings.