How to know the value of AI tools
- Tom Hansen
- Aug 13
- 6 min read
Updated: Aug 28

The massive AI hype has given me a (permanent) allergy. I keep hitting 'don't show me this again', but I still need to cut through the noise and work out the actual value. So here is the tool.
Below the first prompt, I've included a bonus: an AI credibility assessment to call out people who present themselves as experts but are really just repurposing what someone else wrote (or repurposed).
Evaluate AI Tools
This prompt is built to force a disciplined, fact-checked evaluation of an AI tool so you can tell the difference between actual value and market hype. It makes the AI work in two stages: first, it runs a structured investigation across nine categories like performance, safety, integration, and pricing, using only reputable public sources. Every claim must be linked to its source, and anything unverified is clearly marked. The output comes in a fixed JSON format that includes a verdict, evidence, risks, and a pilot plan. Once that JSON is ready, you give the second instruction — asking for a leadership briefing — so the findings are rephrased for decision-makers without losing any detail.
Instructions: There are two steps. First, run the prompt below. Second, when it replies with the JSON answer, ask your AI: “Please write a briefing for leadership without losing any details.” (A quick sketch for sanity-checking the JSON before step two follows the prompt.)

# Role and Objective
- Provide a rigorous, evidence-based evaluation of a specified AI tool to distinguish substantiated value from market or media hype, using only verifiable and reputable sources.

# Instructions
- Begin with a concise checklist (3-7 bullets) of your planned tasks for this evaluation; keep items conceptual, not implementation-level.
- Conduct a thorough analysis using public documentation, reputable industry or academic sources, and peer-reviewed literature. All claims must be directly supported by hyperlinked citations.
- Structure outputs as specified and highlight any information gaps or unverifiable claims.
- Ensure transparency by marking unavailable evidence both in the evidence table and as red flags.
- After assembling the output, verify that all required fields are present, all links function, and any information gaps are clearly indicated per instructions.

## Assessment Lenses (Evidence Table Sections)
- Problem-solution fit (including comparison to baseline workflows and tangible benefits)
- Measurable performance (benchmarks, transparency, failures, hallucination safeguards)
- Product maturity (uptime, changelogs, incident/security records, enterprise features)
- Data and privacy (retention, training data, PII handling, certifications, deployment methods)
- Safety and governance (content moderation, bias controls, security against attacks)
- Integration and operations (SDK/API, latency, rate limits, compatibility, support)
- Economics (pricing breakdown, cost-effectiveness, ROI substantiation)
- Community and credibility (documentation, open-source activity, reviews, enterprise references)
- Competitive landscape (alternatives, relative strengths/weaknesses, risk of lock-in)

# Context
- Use only information for which public, reputable sources are available and cite directly (with hyperlinks).
- Clearly indicate information gaps or unverifiable claims as required.

# Reasoning Steps
- Internally: approach the assessment systematically by evaluating each lens for the tool, seeking multiple independent sources where possible. Prioritize objectivity and transparency.

# Planning and Verification
- Map all required fields to tool components, documentation, and evidence.
- Identify any missing or unverifiable information and handle per the instructions.
- Verify formatting and source links before submission.
- Optimize for clear, actionable output matching the requested JSON schema.

# Output Format
- Provide a single JSON object with the structure as specified:
  - "ai_tool": {"name", "website", "docs", "github", "status_page", "security_page", "pricing", "case_studies"}
  - "baseline_workflow": string
  - "success_metrics": string
  - "verdict": {"label": "Real" | "Hype" | "Mixed", "confidence_percent": integer 0-100}
  - "evidence": [ {"lens": string, "summary": string, "sources": [string URLs]} ]
  - "red_flags": [string]
  - "pilot_plan": {"scope": string, "dataset_task": string, "metrics": string, "guardrails": string, "timeline_weeks": integer, "pass_fail_thresholds": string}
  - "risks_and_mitigations": [ {"risk": string, "mitigation": string, "source": string} ]

# Verbosity
- Be precise and concise in summaries; structure technical and code-adjacent content clearly.

# Stop Conditions
- End when all output requirements are met and evidence gaps have been marked as directed; flag ambiguity or unverifiable claims as red flags.
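Before asking for the leadership briefing, it helps to confirm the model actually returned the full schema. Here is a minimal sketch, assuming you saved the reply to a file named evaluation.json; the filename and the check itself are my own additions, not part of the prompt.

```python
import json

# Top-level keys the prompt's Output Format section requires.
REQUIRED_KEYS = {
    "ai_tool", "baseline_workflow", "success_metrics", "verdict",
    "evidence", "red_flags", "pilot_plan", "risks_and_mitigations",
}

# Load the model's reply (assumed to be saved as evaluation.json).
with open("evaluation.json") as f:
    report = json.load(f)

missing = REQUIRED_KEYS - report.keys()
if missing:
    print(f"Incomplete report, re-run the prompt. Missing: {sorted(missing)}")
else:
    verdict = report["verdict"]
    print(f"Verdict: {verdict['label']} ({verdict['confidence_percent']}% confidence)")
    # Every evidence entry should carry at least one source link.
    unsourced = [e["lens"] for e in report["evidence"] if not e.get("sources")]
    if unsourced:
        print(f"Lenses without sources: {unsourced}")
```

If anything is missing or unsourced, re-run the prompt before moving to step two.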
AI Credibility Assessment of the Influencer Type
This prompt is designed to determine whether a person working in AI is genuinely credible in their domain or mostly hype. It does so through a structured, score-based framework that only accepts verifiable public evidence. The process examines six areas: identity and domain clarity, operational credibility, discourse quality, community signals, cross-platform consistency, and disclosure practices. It also enforces strict rules for testimonials and client proof, marking unverifiable claims or mismatched details as red flags. Scoring is weighted, then adjusted by two multipliers: domain alignment and content originality (a worked example of the arithmetic follows the prompt). The output is a verdict with confidence, an evidence list with source links, identified red flags, and concrete next steps for due diligence. The aim is to give a defensible, transparent evaluation that leadership can trust.
# Role and Objective
Assess whether the following person is ‘hype or real’ in AI for the domain they operate in. Produce: (1) a verdict with confidence, (2) evidence summary, (3) red flags, (4) due-diligence next steps. Use only verifiable signals and cite sources.

# Scope and Domain Discipline
• State the person’s primary domain in which they have worked for at least three years.
• Label each claim as inside domain, adjacent, or outside. Give full weight to inside-domain claims. Discount adjacent and outside claims unless there are verifiable outcomes.

# Framework
• Identity and domain clarity: Verify role titles, employers or clients, dates, and responsibilities. Confirm audience focus within the stated domain.
• Operational credibility: Shipped AI work for real customers; measurable impact and metrics; case studies; postmortems; customer references; evidence of handling constraints such as safety, evaluation practice, cost and performance tradeoffs, data governance, privacy, and reliability.
• Discourse quality: Nuanced discussion of limitations, failure modes, evaluation methodology, data lineage, and guardrails; distinguishes capability from reliability; avoids sweeping claims without benchmarks; cites sources consistently.
• Community signals: Peer endorsements from credible practitioners in the same domain; invited tutorials or panels; maintained public methods or playbooks; responsible disclosure history.
• Influencer red flags: Over-indexing on follower counts; uniformly positive engagement, comment pods, or purchased growth; generic “how to blow up with AI” content; vague claims such as “10x with AI” with no specifics and no benchmarks; media kits with glossy metrics but no raw analytics; hidden likes or comments; misaligned audience demographics; unverifiable client logos or testimonials.
• Cross-platform consistency and disclosure: Check that LinkedIn roles and dates, website claims, talks, press, and case pages agree. Require disclosures for sponsorships, affiliates, paid placements, ghostwriting, and AI-generated demos.

# Testimonials and Customer Proof Rules
• Give full weight only to testimonials that name the client and role title and live on a public client page or a third-party review platform.
• Treat testimonials that appear only on the person’s site without a name as weak signals unless corroborated.
• If logos are shown without links or source, mark as a red flag and require verification.

# Scoring
• Score each framework dimension from 0 to 5, then apply the weights below.
  – Identity and domain clarity: 15
  – Operational credibility: 30
  – Method and operations within “Operational credibility” may be scored as part of that 30 or as a sub-score if you separate it.
  – Discourse quality: 15
  – Community signals: 10
  – Cross-platform consistency and disclosure: 10
• Apply the domain alignment multiplier:
  – Inside primary domain: 1.0 to 1.1 depending on depth.
  – Adjacent domain with verifiable outcomes: 0.8 to 1.0.
  – Outside domain without outcomes: 0.3 to 0.6.
• Apply the content originality and depth multiplier based on what they actually publish and others share:
  – Automation-based reposts, copied microblogging, or link dumps: 0.6 to 0.7.
  – LinkedIn carousels with shallow lists and little method detail: 0.7 to 0.8.
  – Short-form posts with original analysis and links to owned work artifacts: 0.9 to 1.0.
  – Long-form original articles on LinkedIn or Substack with concrete examples, sources, or client references: 1.05 to 1.1.
  – Detailed case write-ups or how-to guides with metrics and lessons learned, hosted on a website or third-party venue: 1.1.
• Final score = weighted sum × domain alignment multiplier × content originality and depth multiplier.
• Map to verdicts: Real is 80 and above. Mixed is 50 to 79. Hype is below 50.

# Cross-Checks
• For every hard claim, attach a public source. If a claim cannot be linked, mark it as unverifiable and exclude it from scoring.
• Note any discrepancies between channels and include them under red flags.

# Output Format
1) Verdict: Real, Hype, or Mixed, with confidence percent.
2) Evidence: bulleted, one claim per line with a source link at the end of the line.
3) Red flags: bulleted, each tied to a source link.
4) What to verify next: 3 to 5 concrete actions with named sources or contacts.

Person to assess: [copy-paste what you have of name and links to LinkedIn, website, talks (optional), product or service pages (optional), customer case pages (optional), and any relevant press (optional)].
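To make the scoring arithmetic concrete, here is a minimal sketch of how the final score and verdict could be computed. The example scores and multipliers are made up, and normalizing each 0-5 score against its weight is my own reading of the prompt (the verdict bands of 80 and 50 only make sense on that scale); the prompt itself leaves the exact normalization to the model.

```python
# Illustrative scoring sketch for the credibility prompt above.
# Assumption: each 0-5 dimension score is normalized (score / 5) before
# applying its weight, so a perfect profile tops out near the weights' sum.

WEIGHTS = {
    "identity_and_domain_clarity": 15,
    "operational_credibility": 30,
    "discourse_quality": 15,
    "community_signals": 10,
    "cross_platform_consistency_and_disclosure": 10,
}

def final_score(scores: dict, domain_mult: float, originality_mult: float) -> float:
    """Weighted sum of normalized 0-5 scores, scaled by the two multipliers."""
    weighted_sum = sum(WEIGHTS[dim] * (scores[dim] / 5) for dim in WEIGHTS)
    return weighted_sum * domain_mult * originality_mult

def verdict(score: float) -> str:
    # Verdict bands from the prompt: Real >= 80, Mixed 50-79, Hype < 50.
    if score >= 80:
        return "Real"
    if score >= 50:
        return "Mixed"
    return "Hype"

# Example: a solid operator inside their domain publishing long-form original work.
example_scores = {
    "identity_and_domain_clarity": 5,
    "operational_credibility": 4,
    "discourse_quality": 4,
    "community_signals": 3,
    "cross_platform_consistency_and_disclosure": 5,
}
s = final_score(example_scores, domain_mult=1.1, originality_mult=1.05)
print(f"Final score: {s:.1f} -> {verdict(s)}")  # prints roughly 77.4 -> Mixed
```

Note that the “Influencer red flags” bullet carries no weight in the scoring table; it feeds the red-flags section of the output instead.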


