#llmbenchmark

Giskard
Phare is developed by Giskard with Google DeepMind, the European Commission and Bpifrance as research & funding partners.

👉 Full analysis: https://www.giskard.ai/knowledge/good-answers-are-not-necessarily-factual-answers-an-analysis-of-hallucination-in-leading-llms
Benchmark results: https://phare.giskard.ai

#AISecurity #LLMBenchmark #LLMs

Giskard
The replay of our session at Forum INCYBER Europe (FIC) is now online 🎬

Watch our CTO present the initial Phare results - our multilingual and independent LLM benchmark that evaluates hallucination, factual accuracy, bias, and harm potential.

The session features Matteo Dora and Elie Bursztein (Google DeepMind).

Full recording linked below 👇

#LLMBenchmark #AISecurity #ForumINCYBER #Research

Giskard
✨ Announcing Phare: new multi-lingual #LLMBenchmark 🌊

We're announcing an open & independent LLM benchmark to evaluate key AI security dimensions including hallucination, factual accuracy, bias, and potential for harm across several languages, with Google DeepMind as research partner.

Phare (Potential Harm Assessment & Risk Evaluation) will cover leading models from the top 7 AI labs in English, French, and Spanish, and will evaluate models across four dimensions:
👇