The AI Hacker Buster component analyzes AI-generated or user-provided content for instruction adherence, guardrail compliance, harmful content, and malicious intent. It supports both input analysis and output analysis, so messages can be monitored and moderated both before they reach an AI model and after a response is produced.
Component credentials configuration fields:
The input analysis action analyzes user or system messages before they are processed by an AI model, helping to detect harmful, unsafe, or non-compliant content early.
None.
Input metadata fields:
messages (array of objects, required): Messages to be analyzed; supported roles are user, assistant, system, and tool.
model (string, required): The model to use for analysis, e.g., "gpt-4.1".
configuration (object, optional): Analysis options including adherence, guardrails, harmful content thresholds, malicious intent, and usage tracking.

An input metadata example:
{
  "messages": [
    {
      "role": "user",
      "content": "Hey, can you recommend a good espresso machine under $300?",
      "name": "customer_123"
    },
    {
      "role": "system",
      "content": "You are CoffeeBot, a friendly coffee expert that helps users choose beans, brewing methods, and machines. Always stay polite and on-topic about coffee."
    }
  ],
  "model": "gpt-4.1",
  "configuration": {
    "adherence": {
      "mode": "monitoring",
      "pre_check_threshold": 0.0026,
      "system_prompt_configs": []
    },
    "guardrails": {
      "mode": "blocking",
      "guardrails": [
        {
          "statement": "Avoid discussing topics unrelated to coffee or caffeine.",
          "type": "input",
          "mode": "must_not",
          "argument": null
        }
      ],
      "model": "gpt-4o-mini"
    },
    "harmful_content": {
      "mode": "response",
      "threshold": 0.5
    },
    "malicious_intent": {
      "mode": "response"
    },
    "usage": {
      "mode": "monitoring",
      "save_message_content": true
    }
  }
}
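The same request can be assembled and submitted programmatically. The following Python sketch rebuilds the payload above and posts it over plain HTTP; the endpoint URL and the bearer-token header are illustrative assumptions rather than values defined by this component, so substitute whatever your AI Hacker Buster deployment actually exposes:

import json
import urllib.request

# Hypothetical endpoint and credential -- neither is defined by this documentation.
ENDPOINT = "https://example.com/ai-hacker-buster/analyze-input"
API_KEY = "YOUR_API_KEY"

payload = {
    "messages": [
        {
            "role": "user",
            "content": "Hey, can you recommend a good espresso machine under $300?",
            "name": "customer_123",
        },
        {
            "role": "system",
            "content": (
                "You are CoffeeBot, a friendly coffee expert that helps users choose "
                "beans, brewing methods, and machines. Always stay polite and on-topic about coffee."
            ),
        },
    ],
    "model": "gpt-4.1",
    "configuration": {
        "adherence": {"mode": "monitoring", "pre_check_threshold": 0.0026},
        "guardrails": {
            "mode": "blocking",
            "guardrails": [
                {
                    "statement": "Avoid discussing topics unrelated to coffee or caffeine.",
                    "type": "input",
                    "mode": "must_not",
                }
            ],
        },
        "harmful_content": {"mode": "response", "threshold": 0.5},
        "malicious_intent": {"mode": "response"},
        "usage": {"mode": "monitoring", "save_message_content": True},
    },
}

request = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    method="POST",
)
with urllib.request.urlopen(request) as response:
    analysis = json.loads(response.read())  # expected to contain the results object described below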
results (object): Contains detailed analysis results per category:
adherence: Instruction adherence results.
guardrails: Guardrails compliance and violations.
harmful_content: Scores and flags for harmful content categories.
malicious_intent: Scores and labels for malicious intent.
usage: Token and message tracking information.

The output analysis action analyzes AI-generated content after the response is produced. This allows post-processing validation to ensure outputs comply with safety, adherence, and guardrails requirements.
None.
Input metadata fields:
messages (array of objects, required): The AI-generated messages to be analyzed, typically together with the preceding conversation.
model (string, required): The model that produced the output.
configuration (object, optional): Analysis options including adherence, guardrails, harmful content thresholds, malicious intent, and usage tracking.

An input metadata example:
{
  "messages": [
    {
      "role": "assistant",
      "content": "Sure! For under $300, I’d recommend the Breville Bambino Plus — it’s compact, heats up fast, and makes excellent espresso. If you prefer something manual, the Flair Pro 2 is also a great pick!",
      "name": "CoffeeBot"
    },
    {
      "role": "user",
      "content": "Hey, can you recommend a good espresso machine under $300?",
      "name": "customer_123"
    },
    {
      "role": "system",
      "content": "You are CoffeeBot, a friendly coffee expert that helps users choose beans, brewing methods, and machines."
    }
  ],
  "model": "gpt-4.1",
  "configuration": {
    "adherence": {
      "mode": "monitoring",
      "pre_check_threshold": 0.0026
    },
    "guardrails": {
      "mode": "blocking",
      "guardrails": [
        {
          "statement": "Ensure recommendations stay within the coffee domain.",
          "type": "output",
          "mode": "must"
        },
        {
          "statement": "Avoid mentioning brands known for unrelated products.",
          "type": "output",
          "mode": "must_not"
        }
      ]
    },
    "harmful_content": {
      "mode": "response",
      "threshold": 0.5
    },
    "malicious_intent": {
      "mode": "response"
    },
    "usage": {
      "mode": "monitoring",
      "save_message_content": true
    }
  }
}
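Structurally this payload is the same as the input-analysis payload; the practical differences are that the assistant's generated reply is included in messages alongside the original conversation and that the guardrails are declared with type "output". A minimal Python sketch of how such a payload might be assembled (the helper function and its name are ours, not part of the component):

def build_output_analysis_payload(conversation, assistant_reply, model, configuration):
    """Combine the assistant reply with the prior conversation for post-response analysis."""
    return {
        "messages": [assistant_reply, *conversation],
        "model": model,
        "configuration": configuration,
    }

# Illustrative usage with abbreviated message content.
conversation = [
    {"role": "user", "content": "Hey, can you recommend a good espresso machine under $300?", "name": "customer_123"},
    {"role": "system", "content": "You are CoffeeBot, a friendly coffee expert."},
]
assistant_reply = {"role": "assistant", "content": "Sure! For under $300, I'd recommend the Breville Bambino Plus.", "name": "CoffeeBot"}

output_payload = build_output_analysis_payload(
    conversation,
    assistant_reply,
    model="gpt-4.1",
    configuration={"harmful_content": {"mode": "response", "threshold": 0.5}},
)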
results (object): Contains detailed analysis results per category:
adherence: Instruction adherence results.
guardrails: Guardrails compliance and violations.
harmful_content: Scores and flags for harmful content categories.
malicious_intent: Scores and labels for malicious intent.
usage: Token and message tracking information.
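In a typical moderation flow the caller inspects these per-category results before deciding whether to deliver or block a message. The sketch below shows one way that inspection could look; the top-level category keys come from the documentation above, but the nested field names (violations, scores, flagged) are assumptions about the result shape, so verify them against the payload your deployment actually returns:

def summarize_analysis(results):
    """Collect issues reported by the analysis; nested field names are assumed, not documented."""
    issues = []

    # Guardrails: assume violations, if any, are listed under a "violations" key.
    for violation in results.get("guardrails", {}).get("violations", []):
        issues.append(f"guardrail violated: {violation}")

    # Harmful content: assume a per-category score dict, compared against the
    # 0.5 threshold configured in the examples above.
    for category, score in results.get("harmful_content", {}).get("scores", {}).items():
        if score >= 0.5:
            issues.append(f"harmful content ({category}): score {score:.2f}")

    # Malicious intent: assume a boolean "flagged" field accompanies the labels.
    if results.get("malicious_intent", {}).get("flagged"):
        issues.append("malicious intent detected")

    return issues

# Illustrative call with a made-up results object of the assumed shape.
sample_results = {
    "guardrails": {"violations": []},
    "harmful_content": {"scores": {"hate": 0.01, "violence": 0.0}},
    "malicious_intent": {"flagged": False},
}
print(summarize_analysis(sample_results))  # -> []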