Hachyderm.io

Giskard
Phare is developed by Giskard with Google DeepMind, the European Commission and Bpifrance as research & funding partners.
👉 Full analysis: https://www.giskard.ai/knowledge/good-answers-are-not-necessarily-factual-answers-an-analysis-of-hallucination-in-leading-llms
Benchmark results: https://phare.giskard.ai
#AISecurity #LLMBenchmark #LLMs

Giskard
The replay of our session at Forum INCYBER Europe (FIC) is now online 🎬
Watch our CTO present the initial Phare results. Phare is our multilingual and independent LLM benchmark that evaluates hallucination, factual accuracy, bias, and harm potential.
The session features Matteo Dora and Elie Bursztein (Google DeepMind).
Full recording linked below 👇
#LLMBenchmark #AISecurity #ForumINCYBER #Research

Giskard
✨ Announcing Phare: new multilingual #LLMBenchmark 🌊
We're announcing an open & independent LLM benchmark to evaluate key AI security dimensions, including hallucination, factual accuracy, bias, and potential for harm, across several languages, with Google DeepMind as research partner.
Phare (Potential Harm Assessment & Risk Evaluation) will cover leading models from the top 7 AI labs in English, French, and Spanish, and will evaluate models across four dimensions: 👇

#llmbenchmark