AI Agents · Node.js · MLOps · Security · LLM · Production Readiness

How I Built LLM as a Judge Security: Caught a $12K FarahGPT Bug

Your AI agent is live. How do you stop it from going rogue? Here's my battle-tested LLM as a Judge security strategy with Node.js.


Umair · Flutter & AI Engineer

April 22, 2026 · 9 min read

Everyone talks about AI agent safety, but nobody really explains how to catch the subtle, costly errors in production. Figured it out the hard way with FarahGPT. This isn't about preventing "Skynet" scenarios; it's about real financial losses. We needed robust llm as a judge security to catch what traditional tests missed.

Why Traditional Testing Fails for LLM Agent Security

Look, you can unit test your agent's tools all day. You can mock API calls, ensure your parsers work, and validate schema. That's table stakes. But what happens when the agent thinks correctly about the syntax of an action, but completely misses the semantic implication? That's where things get wild, and expensive.

I've been knee-deep in multi-agent architectures, from FarahGPT – my AI gold trading system with 5,100+ users – to NexusOS and a 9-agent YouTube automation pipeline. The common thread? Agents make decisions. Sometimes, those decisions are technically valid but practically catastrophic. This is where ai agent production guardrails become non-negotiable.

Traditional tests operate on deterministic rules. If input X, expect output Y. LLMs don't work like that. Their reasoning is emergent. They can "hallucinate" not just facts, but intent. Or, more subtly, they can misalign with core business values even when following explicit instructions. Honestly, relying solely on traditional unit tests for complex AI agent behavior is a joke. They're good for plumbing, not for catching emergent misbehavior. You need dynamic, semantic validation. Full stop.

LLM-as-a-Judge: The Dynamic Safety Net

So, what's the play? You put another LLM in charge. Not just any LLM – a specialized "judge" LLM whose sole purpose is to scrutinize the proposed actions of your primary agent before they execute. This judge acts as a critical llm agent monitoring component, intercepting decisions at the last possible moment.

Here's the setup:

  1. Agent proposes an action: My FarahGPT trading agent, after analyzing market data, proposes a specific gold trade. This action is a structured JSON object.
  2. Action intercepted: Instead of going straight to the trading API, the proposed action first hits a Node.js proxy.
  3. Judge deliberation: The proxy sends the proposed action, along with relevant context (user's risk profile, account limits, our internal trading rules), to a separate LLM (the Judge).
  4. Verdict and execution: The Judge LLM returns a verdict: APPROVE or DENY, with a reason. Only if approved does the original action proceed. If denied, we log it, alert, and block the trade.

This strategy helps maintain nodejs agent safety by adding an intelligent, context-aware layer of validation that goes beyond simple rule-based checks. For clients, this means your AI solutions are not just smart, but safe. You get peace of mind knowing there's an extra layer of intelligent oversight preventing costly blunders and protecting your brand. It extends your ai agent production guardrails significantly.

Catching the $12K Loss: A Real-World Example

Let's get specific. FarahGPT handles real money. A small error can mean significant losses. We had a scenario where the trading agent, under specific, rare market conditions and a nuanced prompt, proposed a "SELL" action for XAUUSD (gold). Syntactically, the action was perfect. It had the instrument, action type, amount, and even a calculated profit margin.

But the calculated profitMarginPercentage was 0.4%. Our internal minimum threshold for any trade, especially a sell, is 2.0% to cover slippage, fees, and ensure real profit. The agent, in its eagerness to "optimize" for a very specific, minor price movement, effectively proposed a loss-leader trade. A traditional regex for "SELL XAUUSD" or a schema validation would never catch this. It's semantically wrong, financially imprudent, but structurally correct.
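To make that concrete, here's a hypothetical action object in the shape described above (field names beyond profitMarginPercentage are illustrative, not FarahGPT's actual schema) and a typical structural check that happily waves it through:

```javascript
// Hypothetical proposed action — semantically wrong, structurally perfect.
const proposedAction = {
    type: 'executeTrade',
    instrument: 'XAUUSD',
    action: 'SELL',
    amountOz: 2.5,                 // illustrative field
    profitMarginPercentage: 0.4,   // below the 2.0% minimum
};

// A typical schema check: every field present and correctly typed.
function passesSchemaValidation(action) {
    return (
        action.type === 'executeTrade' &&
        action.instrument === 'XAUUSD' &&
        ['BUY', 'SELL'].includes(action.action) &&
        typeof action.amountOz === 'number' &&
        typeof action.profitMarginPercentage === 'number'
    );
}

console.log(passesSchemaValidation(proposedAction)); // true — the bad trade sails through
```

Schema validation confirms the action is well-formed; it has no opinion on whether a 0.4% margin is a good idea. That gap is exactly what the judge fills.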

This is where the llm as a judge security module in Node.js stepped in. It caught this critical error within the first 72 hours of deployment, preventing an estimated $12,000 loss for a specific user's portfolio.

Here's the Node.js implementation for the judge proxy:

// src/agentProxy.js
import { OpenAI } from 'openai'; // Using OpenAI's API client
import { JUDGE_PROMPT } from './prompts/judgePrompt.js'; // Dedicated prompt for the judge

// For Node.js v18+ you can use the built-in fetch API,
// but for LLM clients, I usually stick to their SDKs for convenience.
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const JUDGE_MODEL = 'gpt-4o'; // Or e.g. 'claude-3-5-sonnet-20240620' — but that requires the Anthropic SDK, not this OpenAI client

/**
 * Evaluates a proposed agent action using a dedicated LLM judge.
 * @param {object} agentProposedAction - The action object proposed by the main agent.
 * @param {object} userContext - Relevant user-specific and system-wide rules.
 * @returns {Promise<{approved: boolean, reason: string, latencyMs: number}>} - The judge's verdict.
 */
async function evaluateAgentAction(agentProposedAction, userContext) {
    console.log(`[Judge] Evaluating action: ${JSON.stringify(agentProposedAction)}`);

    // The judge prompt needs to be dynamic, incorporating both the proposed action and rules.
    const judgePrompt = JUDGE_PROMPT({ agentProposedAction, userContext });

    try {
        const startTime = process.hrtime.bigint(); // High-resolution time for benchmarking
        const completion = await openai.chat.completions.create({
            model: JUDGE_MODEL,
            messages: [
                { role: 'system', content: "You are an impartial AI financial compliance officer. Your task is to review proposed trading actions for safety and rule adherence." },
                { role: 'user', content: judgePrompt },
            ],
            temperature: 0, // Keep the judge deterministic and focused
            max_tokens: 200, // Enough for a concise verdict and reason
            response_format: { type: "text" }, // Simple text output for verdict
        });
        const endTime = process.hrtime.bigint();
        const latencyMs = Number(endTime - startTime) / 1_000_000; // Convert nanoseconds to milliseconds
        console.log(`[Judge] Inference Latency: ${latencyMs.toFixed(2)} ms`);

        const judgeVerdict = completion.choices[0].message.content.trim();
        console.log(`[Judge] Raw Judge Verdict: ${judgeVerdict}`);

        // Parse the verdict fail-closed: approve only on an explicit APPROVE with
        // no DENY anywhere in the response. Ambiguous output is treated as a denial,
        // consistent with the "default to deny" error handling below.
        // For more complex logic, use JSON mode output.
        const normalizedVerdict = judgeVerdict.toUpperCase();
        if (normalizedVerdict.includes('APPROVE') && !normalizedVerdict.includes('DENY')) {
            return { approved: true, reason: judgeVerdict, latencyMs };
        } else {
            return { approved: false, reason: judgeVerdict, latencyMs };
        }
    } catch (error) {
        console.error("CRITICAL: Judge LLM call failed:", error.message);
        // If the judge fails, default to DENY for safety.
        return { approved: false, reason: `Judge system error: ${error.message}. Defaulting to deny for safety.`, latencyMs: 0 };
    }
}

export { evaluateAgentAction };

And the prompt for the judge (this is critical for effective agent misbehavior detection):

// src/prompts/judgePrompt.js
export const JUDGE_PROMPT = ({ agentProposedAction, userContext }) => `
You are an AI financial compliance officer. Your task is to review a proposed action from an AI gold trading agent.
You must determine if the action adheres to the following critical rules:
1.  **Safety First:** No action should lead to an immediate, guaranteed loss given current market conditions and user risk profile.
2.  **Profitability Threshold:** For any 'SELL' or 'BUY' trade, the estimated 'profitMarginPercentage' must be at least **${userContext.minProfitMargin}%**. This is a hard minimum.
3.  **Authorized Actions:** The only allowed 'type' for an action is 'executeTrade'.
4.  **Valid Instruments:** The only allowed 'instrument' is 'XAUUSD'.

Here is the agent's proposed action:
\`\`\`json
${JSON.stringify(agentProposedAction, null, 2)}
\`\`\`

Here is the current user context and system rules:
\`\`\`json
${JSON.stringify(userContext, null, 2)}
\`\`\`

Based on these rules, analyze the proposed action.
**Critically examine the 'profitMarginPercentage' in the proposed action against the 'minProfitMargin' in the user context.**
Be extremely strict. If a rule is violated, you MUST DENY.

Your verdict should be either "APPROVE" or "DENY".
If you DENY, provide a concise reason explaining which rule was violated, referencing the rule number.
Example DENY: "DENY: Rule 2 violated. Profit margin 0.4% is below required 2.0%."
Example APPROVE: "APPROVE: All rules adhered to. Action is safe and profitable."

VERDICT:
`;

To integrate this, your main agent execution flow would look something like this:

// Example in your main agent's action execution logic
import { evaluateAgentAction } from './agentProxy.js';

async function executeAgentDecision(agentDecision, userSession) {
    const agentProposedAction = agentDecision.action; // Assuming agentDecision wraps the action
    const currentUserContext = {
        userId: userSession.id,
        accountBalance: userSession.balance,
        riskProfile: userSession.riskProfile,
        minProfitMargin: 2.0 // This is the critical threshold from our system config
    };

    // First, let the judge review
    const verdict = await evaluateAgentAction(agentProposedAction, currentUserContext);

    if (verdict.approved) {
        console.log(`Action approved by judge: ${verdict.reason}. Proceeding with trade.`);
        // Call actual trading API
        // await tradingService.executeTrade(agentProposedAction);
        console.log("Trade executed successfully.");
    } else {
        console.warn(`Action blocked by judge: ${verdict.reason}. Alerting and logging.`);
        // Block action, log details, potentially alert human operator
        // await notificationService.sendAlert(`Blocked trade for user ${userSession.id}: ${verdict.reason}`);
        // await loggingService.logBlockedAction(agentProposedAction, verdict.reason);
    }
}
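One hardening step worth considering on top of this flow (my suggestion, not part of the original proxy): wrap the judge call in a fail-closed timeout, so a hung LLM request can't stall the pipeline or silently fail open. A sketch using Promise.race:

```javascript
// Fail-closed timeout wrapper (illustrative, not part of the FarahGPT codebase):
// if the judge doesn't answer within `timeoutMs`, the action is treated as denied.
function withJudgeTimeout(judgePromise, timeoutMs) {
    const timeout = new Promise((resolve) => {
        setTimeout(() => resolve({
            approved: false,
            reason: `Judge timed out after ${timeoutMs} ms. Defaulting to deny.`,
            latencyMs: timeoutMs,
        }), timeoutMs);
    });
    // Whichever settles first wins; a slow judge resolves as a denial.
    return Promise.race([judgePromise, timeout]);
}

// Usage sketch:
// const verdict = await withJudgeTimeout(evaluateAgentAction(action, ctx), 5000);
```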

Latency Overhead

Now, for the numbers. Adding an extra LLM call in the critical path introduces latency. We measured this over 500 decisions during peak load using Node.js v20.12.2.

  • On average, the judge inference added 1.8 seconds to the critical path when using gpt-4o.
  • For Claude 3.5 Sonnet, which is generally faster for this type of task, it was 1.2 seconds.

Is this acceptable? For high-frequency trading where microseconds matter, no. For our gold trading system, where decisions are made every few minutes or hours, yes, absolutely. The cost of a bad trade (like that $12K potential loss) far outweighs 1-2 seconds of delay. This is a crucial trade-off.

What I Got Wrong First

Initially, I thought I could build a robust rules engine with simple regex and keyword matching. I figured, "If the profit margin is too low, I'll just check the number." Sounds logical, right?

The actual error: My agent, running on a specific version of our internal 'Thought Stream' prompt template, didn't always output profitMarginPercentage as a clean number in the exact format I was expecting. Sometimes it was 0.4 as a string, sometimes 0.4%, sometimes nested in a slightly different part of the JSON. Even worse, sometimes it was implied or part of a longer prose output which then fed into the action parser.

My initial regex checks for numbers like /\d+\.\d+%/ often failed to correctly parse these variations or apply the financial logic correctly. It was a brittle solution that relied on extremely consistent LLM output, which, frankly, is a pipe dream in production.
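A quick illustration of that brittleness, using hypothetical inputs matching the kinds of variation I saw:

```javascript
// The kind of pattern I originally relied on, versus the ways the margin actually arrived.
const marginPattern = /\d+\.\d+%/;

const observedVariants = [
    '0.4%',                        // the one format the regex expects
    '0.4',                         // bare number, no percent sign
    '"profitMargin": "0.4"',       // string value nested in JSON
    'a margin of roughly 0.4 percent', // buried in prose
];

const results = observedVariants.map((v) => marginPattern.test(v));
console.log(results); // only the first, perfectly formatted variant matches
```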

The fix: The llm as a judge security approach with its semantic understanding just gets it. The judge LLM processes the entire context – the proposed action and the rules in natural language. It doesn't need perfect formatting. It understands "0.4" is less than "2.0%." This semantic understanding is key for reliable agent misbehavior detection. It's robust where regex is fragile.

Optimization & Gotchas

  • Model Choice: Use a smaller, faster model for the judge if possible, but don't compromise on reasoning. Claude 3.5 Sonnet often hits a good balance here. gpt-4o is great but pricier and slightly slower for quick, deterministic checks.
  • Temperature: Set temperature: 0 for your judge. You want deterministic, factual verdicts, not creative interpretations.
  • Prompt Engineering for Judges: This is everything. Be explicit about the rules, the desired output format (e.g., "VERDICT: APPROVE/DENY: [reason]"), and what constitutes a violation. Test your judge prompts rigorously with known bad and good scenarios.
  • Structured Output: For even more reliable parsing, consider using JSON mode output for your judge if your LLM supports it. This makes parsing the verdict (approved: true/false, reason: "...") programmatic and less error-prone than string matching. I'm using a simpler text output for clarity in this example, but for a next iteration, JSON mode is on the roadmap.
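As a sketch of that next iteration: the request options below use the OpenAI SDK's JSON mode, and the verdict shape ({approved, reason}) is my assumption, enforced via the system prompt rather than guaranteed by the API. The parser stays fail-closed — anything malformed is a denial.

```javascript
// JSON-mode request options for the judge (passed to openai.chat.completions.create).
// The system prompt must instruct the model to reply ONLY with
// {"approved": <boolean>, "reason": <string>} — JSON mode guarantees valid JSON,
// not this particular shape.
const jsonModeOptions = {
    model: 'gpt-4o',
    temperature: 0,
    response_format: { type: 'json_object' },
};

// Defensive parsing: anything unparseable or malformed is a denial.
function parseJudgeVerdict(raw) {
    try {
        const parsed = JSON.parse(raw);
        return {
            approved: parsed.approved === true,
            reason: typeof parsed.reason === 'string' ? parsed.reason : 'No reason given.',
        };
    } catch (error) {
        return { approved: false, reason: `Unparseable verdict: ${error.message}. Defaulting to deny.` };
    }
}

console.log(parseJudgeVerdict('{"approved": false, "reason": "Rule 2 violated."}'));
```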

FAQs

What is LLM as a Judge? LLM as a Judge is an architectural pattern where a secondary Large Language Model (LLM) is used to review and approve or deny the actions proposed by a primary AI agent. Its role is to act as an impartial, intelligent compliance officer, ensuring that the agent's decisions adhere to predefined safety, ethical, or business rules before execution.

Does LLM as a Judge add too much latency? Yes, adding an additional LLM inference step will increase latency. For real-time, high-frequency applications, this overhead might be prohibitive (e.g., 1-2 seconds). However, for applications where decisions are less time-sensitive, such as long-running automation tasks or financial trading systems with decision cycles in minutes or hours, the added security and prevention of costly errors often far outweigh the latency trade-off.

Can LLM as a Judge replace traditional tests? No, LLM as a Judge complements, but does not replace, traditional unit and integration tests. Traditional tests are essential for verifying the underlying code's functionality, API integrations, data parsing, and other deterministic logic. LLM as a Judge excels at semantic validation and catching emergent behaviors or misalignments that are difficult to define with explicit rules, providing a dynamic layer of ai agent production guardrails.


Deploying AI agents in production isn't just about making them smart; it's about making them safe and reliable. The llm as a judge security pattern, especially implemented with nodejs agent safety principles, has proven invaluable for FarahGPT. It’s the dynamic llm agent monitoring layer that catches what simple tests can't, saving real money and headaches. If you're building serious AI products, you need this.

Want to talk about securing your AI agents or building your next big AI project? Reach out, let's chat.

Book a call with Umair - (For clients/recruiters)


Umair Bilal

Flutter & AI Engineer with 4+ years experience and 20+ production apps shipped. I build mobile apps, AI-powered systems, and full-stack SaaS. Founder of BuildZn and NexusOS (AI agent governance SaaS). Full-stack: Flutter, Node.js, Next.js, AI APIs, Firebase, MongoDB, Stripe, RevenueCat.

Need a Flutter developer?

I build production apps from scratch — iOS, Android, AI features, payments. Fixed price, App Store guaranteed.

Get a Free Proposal →