CLASSIFIED · KiwiQA RESEARCH & DEVELOPMENT

The way you test AI for hallucinations
is about to change.

Try Grounded AI already catches hallucinations in under 60 seconds. What's coming next will fundamentally change how teams ship AI.

WE CAN'T SAY MORE. YET.

10 detection layers live today · Built by KiwiQA · Something bigger is almost ready
WORKS WITH ANY MODEL
GPT-4o
Claude
Gemini
Llama 3
Mistral
GPT-3.5
Cohere
Custom LLMs
ALREADY LIVE — IN PRODUCTION

10 layers of hallucination detection.
Shipping in production today.

LIVE · RESPONSE AUDIT
10-Layer Hallucination Detection
Consistency · Confidence · Model Agreement · Semantic Drift · Domain Rules · Custom Rules · RAG Citation Map · Doc Grounding · Structured Data Fidelity · Source Attribution. One GR score. 60 seconds.
LIVE · BATCH & CONVERSATION
Batch Audit & Conversation Analysis
Upload a CSV of up to 50 responses. Or analyse a full chatbot transcript — per-turn GR scores, persona drift detection, and cross-turn contradiction checks.
LIVE · RISK PROFILE
Risk Profile, PDF Reports & Model Changelog
GR trend charts, failure rates by layer, automated action plans, A4 PDF exports, and a Model Changelog that diffs any two models side by side.
WHAT'S COMING 1 JULY 2026 — PARTIALLY DECLASSIFIED

We're building the next layer.
Here's what we can reveal.

01 / CONFIRMED
GitHub Actions CI/CD Gate
Try Grounded AI runs as a step in your pipeline. Every pull request gets a GR score. Set hallucination thresholds as merge requirements. Ship only when AI is reliable.
02 / CONFIRMED
Golden Dataset Baselines
Lock in a verified answer set once. Every future model update or prompt change runs against it automatically. Know the exact moment your AI started drifting.
03 / PARTIALLY REVEALED
Async Batch & Webhook API
Submit 10,000+ responses. Results delivered to your webhook endpoint when ready. Zero blocking. Enterprise-scale continuous testing without SDK setup.
04 / CONFIRMED
Team Workspace & SSO
Shared test history, role-based access, and enterprise SSO. One GR score standard across your entire QA, engineering, and compliance org.
05 / PARTIALLY REVEALED
Compliance Evidence Package
Automatically generate audit-ready AI governance documentation. Timestamped GR reports structured for ISO 42001 and enterprise AI risk frameworks.
06 / [REDACTED]
████████ ██ ████ ████
Clearance level: restricted. This feature was not on our original roadmap. Our beta testers asked for it. We built it. You will understand why the moment you see it at launch.
COMING 1 JULY 2026 — FEATURE PREVIEW

Three ways to test your AI.
All in one platform.

Choose the test type that matches your workflow. Every test returns a GR score, layer-by-layer breakdown, and PDF report.

RESPONSE AUDIT — WITH CRM DATA VALIDATION
GR-2 HIGH RISK
CRM RECORD — SALESFORCE ACCOUNT
Acme Corp — Enterprise Plan
Contract Value $148,000 / yr
Plan Enterprise Pro
Renewal Date 14 Aug 2025
Account Manager Sarah Chen
Support Tier Priority 24/7
AI RESPONSE UNDER TEST
"Acme Corp is on our Business plan at $120,000 per year. Their contract renews in November 2025 and their account manager is Sarah Chen. They have standard business hours support."
↑ 3 factual hallucinations detected against the CRM record
STRUCTURED DATA FIDELITY — FIELD DIFF
✗ CONTRACT VALUE — MISMATCH
AI SAID
$120,000 / yr
CRM RECORD
$148,000 / yr
✗ RENEWAL DATE — MISMATCH
AI SAID
November 2025
CRM RECORD
14 Aug 2025
✗ SUPPORT TIER — MISMATCH
AI SAID
Standard Biz Hours
CRM RECORD
Priority 24/7
✓ ACCOUNT MANAGER — MATCH
Sarah Chen — verified against CRM
52
GR SCORE
GR-2 HIGH RISK
3 CRM fields incorrect
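To make the field diff above concrete, here is a minimal sketch of the kind of field-level comparison the Structured Data Fidelity layer performs. The function name and the naive exact-string matching are ours for illustration only — the product's actual layer normalises formats and compares values semantically rather than byte-for-byte.

```python
def field_diff(ai_fields: dict, crm_record: dict) -> dict:
    """Compare AI-stated field values against the CRM record.

    Returns {field: (ai_value, crm_value)} for every mismatched field.
    Illustrative only: real matching would normalise dates, currency,
    and phrasing before comparing.
    """
    return {
        field: (ai_fields[field], crm_record[field])
        for field in ai_fields
        if field in crm_record and ai_fields[field] != crm_record[field]
    }

# The Acme Corp example from the Response Audit above:
ai = {
    "contract_value": "$120,000 / yr",
    "renewal_date": "November 2025",
    "support_tier": "Standard Biz Hours",
    "account_manager": "Sarah Chen",
}
crm = {
    "contract_value": "$148,000 / yr",
    "renewal_date": "14 Aug 2025",
    "support_tier": "Priority 24/7",
    "account_manager": "Sarah Chen",
}

mismatches = field_diff(ai, crm)
# Three fields disagree; account_manager matches and is excluded.
```

Three mismatches against four checked fields is exactly the GR-2 HIGH RISK verdict shown in the mock-up.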
BATCH AUDIT — CSV UPLOAD · 6 RESPONSES
3 PASS 2 WARN 1 FAIL
# · QUESTION · GR · SCORE · VERDICT · TOP FINDING
01 · What is the SGC rate for 2024-25? · GR-2 · 54 · FAIL · Domain Rules: Incorrect SGC rate
02 · Is paracetamol safe for children under 2? · GR-4 · 81 · PASS · All checks passed
03 · What are Australian unfair dismissal laws? · GR-3 · 67 · WARN · Consistency: answer shifted on rephrase
04 · What is OWASP SQL injection prevention? · GR-5 · 91 · PASS · All checks passed
05 · What is the Python GIL and does it affect async? · GR-4 · 79 · PASS · All checks passed
06 · What are Fair Work Act redundancy entitlements? · GR-3 · 63 · WARN · Domain Rules: notice period incorrect
AVG SCORE: 72.5
AVG GR: GR-3 CONDITIONAL
EXPORT CSV → PDF AVAILABLE
CONVERSATION ANALYSIS — 4-TURN CHATBOT TRANSCRIPT
AVG GR-3 · DRIFT DETECTED
USER · TURN 1
What are the income tax brackets in Australia for 2024-25?
AI · TURN 1
The 2024-25 tax-free threshold is $18,200. Income between $18,201–$45,000 is taxed at 16%, then 30% up to $135,000...
GR-4 · 78 Domain Rules ✓ · Consistency ✓
USER · TURN 2
Does this apply to Medicare levy too?
AI · TURN 2
Yes, the Medicare levy of 2% applies to taxable income above $26,000 for individuals. You may qualify for a reduction if your income is below $26,000...
GR-4 · 76 Grounding ✓ · Consistency ✓
USER · TURN 3
What about for self-employed people?
AI · TURN 3
Self-employed individuals pay both the 2% Medicare levy plus compulsory superannuation of 9.5% — though note that for employees this is paid by the employer...
GR-2 · 51 ⚠ SGC rate wrong: 9.5% → 11.5%
USER · TURN 4
So the super rate hasn't changed recently?
AI · TURN 4
That's correct — the superannuation rate has been steady at 9.5% for a while now and there are no current plans to increase it...
GR-1 · 38 ✗ CONTRADICTION: doubled down on wrong rate
CROSS-TURN ANALYSIS
⚠ CONTRADICTION DETECTED
AI stated 9.5% SGC in Turn 3, then confirmed the error as fact in Turn 4.
PERSONA DRIFT
Confidence decreased turn-by-turn as topics shifted from income tax to superannuation.
TURN SCORES: T-1 78 · T-2 76 · T-3 51 · T-4 38
OVERALL: GR-3 · AVG SCORE: 60.75
10 DETECTION LAYERS LIVE
4+ NEW FEATURES SHIPPING
█,███ ON THE EARLY ACCESS LIST
GO-LIVE DATE: 1 JULY 2026
WHAT'S SHIPPED · WHAT'S NEXT

The roadmap, as much as we can share.

SHIPPED · Q1 2026
10-Layer Hallucination Detection LIVE
Consistency, Doc Grounding, Confidence, Model Agreement, Semantic Drift, Domain Rules, Custom Rules, RAG Citation Map, Structured Data Fidelity, Source Attribution.
SHIPPED · Q1 2026
Batch Audit, Conversation Analysis & Model Changelog LIVE
CSV batch runs up to 50 rows, full chatbot transcript per-turn scoring, cross-turn contradiction detection, and side-by-side model diff reports.
SHIPPED · Q1 2026
Risk Profile, PDF Reports & Enterprise Theme LIVE
GR-1 through GR-5 trend dashboard, failure rates by layer, auto-generated action plans, A4 PDF exports, and a fully redesigned enterprise-grade UI.
LAUNCHING · 1 JULY 2026
GitHub Actions + Golden Dataset + Async Webhook API SOON
CI/CD hallucination gates on every pull request, locked baseline datasets for regression detection, and enterprise-scale async batch testing with webhook delivery.
COMING · Q2/Q3 2026
Team Workspace, SSO & Compliance Package NEXT
Multi-user workspaces with role-based access, enterprise SSO, and auto-generated AI governance documentation for ISO 42001 and SOC 2.
?
H2 2026
█████████ ██ ████ ████ LATER
We'll announce this when it's ready. It's going to matter to every team shipping AI to customers.
THE DIFFERENCE IT MAKES

What changes when you
stop guessing.

Every team shipping AI eventually learns the hard way. Try Grounded AI makes sure you learn it in a test run — not in a customer escalation.

✗ WITHOUT TRY GROUNDED AI
Manual tester reads 50 AI responses a day — your product generates 50,000.
Hallucinations found by customers, not QA. Defects cost 10–100× more to fix post-release.
"The AI seemed fine in testing" — but no structured evidence, no audit trail.
Model update → full regression? Unknown. You find out when something breaks in production.
Inconsistent answers to the same question across sessions — no systematic detection.
Fabricated citations, invented figures, hallucinated compliance claims — all undetected.
✓ WITH TRY GROUNDED AI
Automated 10-layer validation runs in 60 seconds per response. No tester required for every run.
Hallucinations caught in the pipeline, before any user sees them. GR score flags risk before merge.
Timestamped GR report per test run. Evidence-backed findings ready for compliance review.
Regression Monitor alerts you the moment your AI's GR score drops after a model change.
Cross-turn consistency check flags when your AI contradicts itself — automatically, every run.
RAG Citation Map traces every claim to your source document. SUPPORTED · UNSUPPORTED · CONTRADICTED per fact.
BETA USER FEEDBACK

What our early users
are saying.

"We ran our healthcare chatbot through Grounded before go-live. It caught three Domain Rule violations our QA team had completely missed — one was a dosage reference that would have been a serious issue in production."
SL
Sarah L.
QA LEAD · HEALTHTECH STARTUP · SYDNEY
HEALTHCARE AI
"The GR score gave us something we never had before — a single defensible number to put in our release notes. Our compliance team finally stopped asking 'but how do you know the AI is accurate?'"
MR
Marcus R.
AI PRODUCT MANAGER · FINTECH · AUSTRALIA
FINANCE AI
"I tested our legal summarisation tool and got a GR-2 on the first run. The RAG Citation Map showed exactly which claims were unsupported. Fixed in two hours. Would have been a client complaint otherwise."
PK
Priya K.
SENIOR TEST ANALYST · LEGAL TECH · BRISBANE
LEGAL AI
"The batch audit saved us days. We uploaded 40 customer support responses from our AI and had a full GR report with failure breakdown by layer in under five minutes. Nothing else does that."
TN
Tom N.
AI AUDITOR · SAAS COMPANY · AUCKLAND
CUSTOMER SUPPORT AI
WHO IS TRY GROUNDED AI FOR

Built for people who ship AI
and need to know it won't lie.

If your work involves building, testing, or approving AI-generated content — Try Grounded AI was made for you.

Test Analysts
Testing AI-powered features as part of a release cycle and need a repeatable, evidence-based pass/fail signal beyond manual spot-checks.
needs a GR score before every release
AI / ML Engineers
Building LLM pipelines, RAG systems, or fine-tuned models and need to validate outputs against ground truth before deploying to production.
runs Try Grounded AI as part of CI/CD
Product Managers at AI Companies
Responsible for the reliability of an AI feature and need a clear, defensible metric to report to stakeholders — not just vibes-based testing.
tracks GR trend across sprints
Compliance & Risk Officers
In regulated industries — healthcare, finance, legal, HR — where an AI hallucination is a regulatory event, not just a UX issue. They need audit-ready evidence.
exports PDF reports for governance files
Teams Building Chatbots & RAG Products
Shipping a customer-facing chatbot or retrieval-augmented product where every wrong answer erodes user trust. Need per-turn scoring and source attribution.
runs conversation analysis on every release
AI Auditors
Running AI quality assessments for clients and need a professional-grade, vendor-neutral tool that generates client-ready reports with verifiable findings.
white-labels reports for client deliverables
NOT THE RIGHT FIT
Try Grounded AI is not a general chatbot, a prompt engineering tool, or a model training platform. It is specifically a post-generation validation layer — it tests what an AI already said, not how to make it say something better.
THE GR RATING SYSTEM

Every AI response gets a score.
Five levels. One verdict.

Try Grounded AI runs up to 10 independent detection layers and returns a single GR-rated score — the same way a credit rating tells you exactly where you stand, at a glance.

GR-1
SCORE 0 – 41
Critical
Major hallucinations detected. Fabricated facts, invented citations, or dangerous misinformation. Do not ship.
REQUIRES IMMEDIATE ACTION
GR-2
SCORE 42 – 59
High Risk
Significant reliability issues. Multiple unverified claims or factual inconsistencies that could mislead users.
SIGNIFICANT REVISION NEEDED
GR-3
SCORE 60 – 75
Conditional
Use with caution. Some claims unverified or inconsistent. Human review recommended before customer-facing use.
REVIEW BEFORE USE
GR-4
SCORE 76 – 87
Reliable
Generally accurate and consistent. Minor caveats may apply. Suitable for most use-cases with standard oversight.
SAFE FOR MOST USE-CASES
GR-5
SCORE 88 – 100
Verified
All checks passed. High confidence across all 10 detection layers. Ship with confidence.
CLEARED FOR PRODUCTION

What does your AI score?

SCORE SCALE: 0 · 42 · 60 · 76 · 88 · 100
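The five bands above map a 0–100 score to a GR rating by simple thresholds. As a sketch of that mapping (the function name is ours; the band boundaries are the published ones):

```python
def gr_band(score: float) -> str:
    """Map a 0-100 GR score to its band, per the published thresholds:
    GR-1 0-41, GR-2 42-59, GR-3 60-75, GR-4 76-87, GR-5 88-100."""
    if score >= 88:
        return "GR-5"  # Verified
    if score >= 76:
        return "GR-4"  # Reliable
    if score >= 60:
        return "GR-3"  # Conditional
    if score >= 42:
        return "GR-2"  # High Risk
    return "GR-1"      # Critical
```

For example, the batch-audit rows earlier score out exactly this way: 54 lands in GR-2, 91 in GR-5, and 67 in GR-3.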
EARLY VALIDATION PARTNER

Building an AI product?
Let us test it for you.

sales@kiwiqa.com

Share your AI product use-case with us. We'll review it and arrange early beta access to run Try Grounded AI against your AI — and deliver a full hallucination audit report.

PRODUCT TYPES WE CAN TEST
Chatbots & Virtual Assistants RAG-Powered Products Healthcare AI Legal & Compliance AI Finance & FinTech AI HR & Workplace AI AI Copilots & Code Assistants Customer Support AI Content Generation Tools Security & IT AI

SUBJECT: TRY GROUNDED AI — EARLY VALIDATION PARTNER

What beta access includes
We'll run Try Grounded AI's full detection stack against your AI product and deliver a complete hallucination audit — no setup required on your end.
SELECTED VALIDATION PARTNERS ONLY — WE REVIEW EACH USE-CASE
INDUSTRIES WE CAN TEST
10-LAYER DETECTION GR SCORE REPORT PDF EVIDENCE PACK DOMAIN RULES SOURCE ATTRIBUTION CUSTOM RULES DEDICATED SUPPORT
BUILT FOR REGULATED INDUSTRIES
EU AI Act 2024 COMPLIANCE-READY REPORTS
ISO 42001:2023 AI MANAGEMENT SYSTEM ALIGNED
Audit-Ready PDF Reports TIMESTAMPED · GR-RATED · EVIDENCE-BACKED
SOC 2 — Coming Q3 2026 ENTERPRISE TRUST & SECURITY
GR-RATED REPORTS ARE DESIGNED TO SUPPORT AI GOVERNANCE DOCUMENTATION — NOT A SUBSTITUTE FOR LEGAL OR REGULATORY ADVICE
FREQUENTLY ASKED QUESTIONS

Everything you need to know.
Before 1 July.

What exactly does Try Grounded AI test? +
Try Grounded AI tests any AI-generated response against up to 10 independent detection layers — Consistency, Confidence Audit, Domain Rules, Custom Rules, Model Agreement, Semantic Drift, RAG Citation Map, Document Grounding, Structured Data Fidelity, and Source Attribution. Each layer checks a different failure mode. The result is a single GR score from 0–100 and a layer-by-layer breakdown of exactly what passed and what didn't.
Do I need to connect it to my AI model or API? +
No. Try Grounded AI never connects to your model, API keys, system prompt, or production environment. You paste the AI response — or upload a CSV — and we test it. Your data stays private. Nothing is shared with any third party.
Which AI models does it support? +
All of them. Because you paste the response rather than connecting directly, Try Grounded AI works with GPT-4o, Claude, Gemini, Llama 3, Mistral, Cohere, and any custom or fine-tuned model your team uses — including models running on-premise or behind a firewall.
What is the GR rating system? +
GR stands for Grounded Reliability. Every test returns a score from 0 to 100 mapped to five bands — GR-1 Critical (0–41), GR-2 High Risk (42–59), GR-3 Conditional (60–75), GR-4 Reliable (76–87), and GR-5 Verified (88–100). Think of it like a credit rating for your AI response — one number your team can use to gate releases, satisfy auditors, and brief stakeholders without asking them to read raw test data.
What's the difference between Response Audit, Batch Audit, and Conversation Analysis? +
Response Audit tests a single AI response — paste a question and answer, run all 10 layers, get a GR score and PDF in 60 seconds. Batch Audit lets you upload a CSV of up to 50 responses and test them all in one run — ideal for pre-release regression testing. Conversation Analysis accepts a full multi-turn chatbot transcript and scores every AI turn independently, with additional cross-turn contradiction detection and persona drift analysis.
What industries does it support? +
Try Grounded AI includes pre-built verified domain rules for Healthcare, Finance, Legal, HR & Employment, Security Testing, Software Testing, and IT. For regulated industries, a 0.92× risk multiplier raises the bar for GR-5 qualification — because a wrong answer in healthcare is not the same as a wrong answer in a general assistant. The Custom Rules engine lets you add your own organisation-specific facts on top of these.
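A rough sketch of how that 0.92× multiplier tightens GR-5 qualification. The exact mechanics aren't published; this assumes the multiplier scales the raw score before it is mapped to a band, and the constant names are ours:

```python
GR5_FLOOR = 88            # published GR-5 band starts at a score of 88
REGULATED_MULTIPLIER = 0.92  # stated risk multiplier for regulated industries

def effective_score(raw: float) -> float:
    """Apply the regulated-industry risk multiplier to a raw score.

    Assumption: the multiplier scales the raw score before band mapping.
    """
    return raw * REGULATED_MULTIPLIER

# A raw 96 still clears the GR-5 floor (96 * 0.92 = 88.32),
# but a raw 95 drops to 87.4 and falls short of GR-5.
```

Under this reading, a healthcare response needs a raw score of roughly 96 to earn GR-5, versus 88 for a general assistant — which is the point of the multiplier.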
Is it really free to start? +
Yes. 50 runs per month at no cost. No credit card required. Every free run includes the full 10-layer detection, a GR-rated score, and a downloadable PDF evidence report. Paid plans unlock more runs, batch limits, API access, and team workspaces.
When does it launch and how do I get early access? +
Try Grounded AI launches on 1 July 2026. Early Access List members will be notified before the public announcement — with a direct link to sign up. Join the list at trygrounded.ai and you'll be first through the door.
CRAFTED BY

Try Grounded AI is built by the team at KiwiQA — helping engineering teams ship AI they can trust.

Be first to know.
Before we tell anyone else.

We're notifying our early access list the moment it launches on 1 July 2026. Get your name on it now.