#MLsec

Gary McGraw<p>Want to know what risk every <a href="https://sigmoid.social/tags/LLM" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLM</span></a> vendor ignores? Recursive pollution. </p><p><a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a></p>
Gary McGraw<p>Let's let AI make up security policy...not. A hint of far worse things to come. <a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a> <a href="https://sigmoid.social/tags/ML" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ML</span></a> <a href="https://sigmoid.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> </p><p><a href="https://arstechnica.com/ai/2025/04/cursor-ai-support-bot-invents-fake-policy-and-triggers-user-uproar/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">arstechnica.com/ai/2025/04/cur</span><span class="invisible">sor-ai-support-bot-invents-fake-policy-and-triggers-user-uproar/</span></a></p>
Gary McGraw<p>BIML on Cybersecurity Today. Have a watch...<br><a href="https://www.youtube.com/watch?v=QrACu3u2F5A" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">youtube.com/watch?v=QrACu3u2F5</span><span class="invisible">A</span></a></p><p><a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a> <a href="https://sigmoid.social/tags/ML" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ML</span></a> <a href="https://sigmoid.social/tags/security" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>security</span></a> <a href="https://sigmoid.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://sigmoid.social/tags/infosec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>infosec</span></a></p>
Gary McGraw<p>NEW BIML Bibliography entry</p><p><a href="https://arxiv.org/pdf/2503.16861" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/pdf/2503.16861</span><span class="invisible"></span></a></p><p>In-House Evaluation Is Not Enough:<br>Towards Robust Third-Party Flaw Disclosure for General-Purpose AI</p><p>Shayne Longpre et al</p><p>A very pollyanna view based on a poor understanding of the software security solution. Building security in IS NOT penetrate and patch. </p><p><a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a> </p><p><a href="https://berryvilleiml.com/references/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">berryvilleiml.com/references/</span><span class="invisible"></span></a></p>
Gary McGraw<p>NEW BIML Bibliography entry</p><p><a href="https://arxiv.org/pdf/2204.05862" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/pdf/2204.05862</span><span class="invisible"></span></a></p><p>Training a Helpful and Harmless Assistant with<br>Reinforcement Learning from Human Feedback</p><p>Yuntao Bai et al</p><p>Alignment with basic RL. Overemphasis on scaling. RL butter spread very thin over a big network. </p><p><a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a> </p><p><a href="https://berryvilleiml.com/references/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">berryvilleiml.com/references/</span><span class="invisible"></span></a></p>
Gary McGraw<p>This coverage of COT in ML is misleadingly anthropomorphic. Have we really lost track of how these things work? Just because we call something "chain of thought" that doesn't make it ACTUAL chain of thought. Anthropic has always done this. <a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a></p><p>And this is a usually excellent reporter falling prey to the nomenclature. </p><p><a href="https://arstechnica.com/ai/2025/04/researchers-concerned-to-find-ai-models-hiding-their-true-reasoning-processes/?utm_brand=arstechnica&amp;utm_social-type=owned&amp;utm_source=mastodon&amp;utm_medium=social" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">arstechnica.com/ai/2025/04/res</span><span class="invisible">earchers-concerned-to-find-ai-models-hiding-their-true-reasoning-processes/?utm_brand=arstechnica&amp;utm_social-type=owned&amp;utm_source=mastodon&amp;utm_medium=social</span></a></p>
Gary McGraw<p>NEW BIML Bibliography entry</p><p><a href="https://arxiv.org/pdf/2408.00761" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/pdf/2408.00761</span><span class="invisible"></span></a></p><p>Tamper-Resistant Safeguards for Open-Weight LLMs</p><p>Rishub Tamirisa et al</p><p>Pretty dumb. Screw up the model to make it better under stupid testing. This approach devolves to an arms race.</p><p><a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a> </p><p><a href="https://berryvilleiml.com/references/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">berryvilleiml.com/references/</span><span class="invisible"></span></a></p>
Gary McGraw<p>NEW BIML Bibliography entry</p><p><a href="https://arxiv.org/pdf/2307.15043" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/pdf/2307.15043</span><span class="invisible"></span></a></p><p>Universal and Transferable Adversarial Attacks on Aligned Language Models</p><p>Andy Zou et al</p><p>Suffix-based prompt injection is really interesting. This is once again “empirical” work without much science behind it. Security philosophy is not well thought out.</p><p><a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a> </p><p><a href="https://berryvilleiml.com/references/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">berryvilleiml.com/references/</span><span class="invisible"></span></a></p>
Gary McGraw<p>NEW BIML Bibliography entry</p><p><a href="https://arxiv.org/pdf/2406.11717" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/pdf/2406.11717</span><span class="invisible"></span></a></p><p>Refusal in Language Models<br>Is Mediated by a Single Direction</p><p>Andy Arditi et al</p><p>Very preliminary weight tweaking shows how to avoid alignment. Proof of concept work with many caveats. The economics are in favor of this approach.</p><p><a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a> </p><p><a href="https://berryvilleiml.com/references/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">berryvilleiml.com/references/</span><span class="invisible"></span></a></p>
Gary McGraw<p>NEW BIML Bibliography entry</p><p><a href="https://arxiv.org/pdf/2502.02260v1" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/pdf/2502.02260v1</span><span class="invisible"></span></a></p><p>Adversarial ML Problems Are Getting<br>Harder to Solve and to Evaluate<br>Javier Rando et al</p><p>A well-reasoned and obvious paper. Entirely misses the case of recursive pollution. </p><p><a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a> </p><p><a href="https://berryvilleiml.com/references/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">berryvilleiml.com/references/</span><span class="invisible"></span></a></p>
Gary McGraw<p>NEW BIML Bibliography entry</p><p><a href="https://arxiv.org/pdf/2502.03461" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/pdf/2502.03461</span><span class="invisible"></span></a></p><p>Do Large Language Model Benchmarks Test Reliability?</p><p>Joshua Vendrow et al</p><p>Pretty stupid. Focusing in on minor problems with trees does almost nothing to advance the forest conversation. Boiled down: calling something something does not make it something. </p><p><a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a> </p><p><a href="https://berryvilleiml.com/references/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">berryvilleiml.com/references/</span><span class="invisible"></span></a></p>
Gary McGraw<p>NEW BIML Bibliography entry</p><p><a href="https://arxiv.org/pdf/2407.21792" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/pdf/2407.21792</span><span class="invisible"></span></a></p><p>Safetywashing: Do AI Safety Benchmarks<br>Actually Measure Safety Progress?</p><p>Richard Ren et al</p><p>A dull paper full of the obvious. See the section on security if you read it. </p><p><a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a> </p><p><a href="https://berryvilleiml.com/references/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">berryvilleiml.com/references/</span><span class="invisible"></span></a></p>
Gary McGraw<p>NEW BIML Bibliography entry</p><p><a href="https://arxiv.org/pdf/2401.15897" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/pdf/2401.15897</span><span class="invisible"></span></a></p><p>Red-Teaming for Generative AI:<br>Silver Bullet or Security Theater?</p><p>Michael Feffer et al</p><p>The pen testing diatribe refried. Guess what? Badnessometers are no security meters! This coheres with BIML’s view. </p><p><a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a> </p><p><a href="https://berryvilleiml.com/references/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">berryvilleiml.com/references/</span><span class="invisible"></span></a></p>
Gary McGraw<p>NEW BIML Bibliography entry</p><p><a href="https://arxiv.org/pdf/2402.01391" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/pdf/2402.01391</span><span class="invisible"></span></a></p><p>StepCoder: Improve Code Generation<br>with Reinforcement Learning from Compiler Feedback</p><p>Shihan Dou, et al</p><p>It is striking and hilarious how much this mirrors symbolic AI from the ’90s. One of those “empirical” studies that is not worth much thinking. Misuse of “novelty” pretty much says it all. </p><p><a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a> </p><p><a href="https://berryvilleiml.com/references/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">berryvilleiml.com/references/</span><span class="invisible"></span></a></p>
Gary McGraw<p>NEW BIML Bibliography entry</p><p><a href="https://arxiv.org/pdf/2501.00663" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/pdf/2501.00663</span><span class="invisible"></span></a></p><p>Titans: Learning to Memorize at Test Time</p><p>Ali Behrouz, Peilin Zhong, Vahab Mirrokni (Google)</p><p>An alternative to the transformer architecture. Lots of screwing around with math pieces. Lots of engineering. Emphasizes the importance of long term memory. </p><p><a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a> </p><p><a href="https://berryvilleiml.com/references/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">berryvilleiml.com/references/</span><span class="invisible"></span></a></p>
Gary McGraw<p>NEW BIML Bibliography entry</p><p><a href="https://arxiv.org/pdf/2501.12948" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/pdf/2501.12948</span><span class="invisible"></span></a></p><p>DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via<br>Reinforcement Learning</p><p>DeepSeek-AI </p><p>A convoluted explanation at best. [See the excellent commentary on the bibliography page itself for more.]</p><p><a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a> </p><p><a href="https://berryvilleiml.com/references/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">berryvilleiml.com/references/</span><span class="invisible"></span></a></p>
Gary McGraw<p>NEW BIML Bibliography entry</p><p><a href="https://arxiv.org/pdf/2412.03572" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/pdf/2412.03572</span><span class="invisible"></span></a></p><p>Navigation World Models</p><p>Amit Bar (Meta) et al</p><p>Single world model for use across environments and embodiments. </p><p><a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a> </p><p><a href="https://berryvilleiml.com/references/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">berryvilleiml.com/references/</span><span class="invisible"></span></a></p>
Gary McGraw<p>NEW BIML Bibliography entry</p><p><a href="https://arxiv.org/pdf/2412.14135" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/pdf/2412.14135</span><span class="invisible"></span></a></p><p>Scaling of Search and Learning: A Roadmap to Reproduce o1<br>from Reinforcement Learning Perspective</p><p>Zhiyuan Zeng et al</p><p>Chinese outsider view of o1 reinforcement learning. Credulous bordering on silly.</p><p><a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a> </p><p><a href="https://berryvilleiml.com/references/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">berryvilleiml.com/references/</span><span class="invisible"></span></a></p>
Gary McGraw<p>NEW BIML Bibliography entry</p><p><a href="https://arxiv.org/abs/2412.06966" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/abs/2412.06966</span><span class="invisible"></span></a></p><p>Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice</p><p>A Feder Cooper et al</p><p>The idea of unlearning is not sufficient to address European regulation. Censorship isn’t a good solution either. The policy wonks are really confused about this. Good paper by committee. </p><p><a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a> </p><p><a href="https://berryvilleiml.com/references/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">berryvilleiml.com/references/</span><span class="invisible"></span></a></p>
Gary McGraw<p>Welp, I caught the BIML Bibliography up to 2024. LOL. A labor of love, that's for sure. Only three more months of papers to enter...</p><p>We keep track of the <a href="https://sigmoid.social/tags/MLsec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLsec</span></a> field by reading the science so you don't have to. </p><p>See our "top 5 papers" list to get started.</p><p><a href="https://sigmoid.social/tags/ML" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ML</span></a> <a href="https://sigmoid.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://sigmoid.social/tags/security" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>security</span></a> <a href="https://sigmoid.social/tags/infosec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>infosec</span></a> </p><p><a href="https://berryvilleiml.com/bibliography/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">berryvilleiml.com/bibliography</span><span class="invisible">/</span></a></p>