Mission Viejo, California, United States
4K followers
500+ connections
About
Activity
-
Mike Beaumier reposted this:
🔍 We’re hiring a Security Researcher & Software Engineer! At Delphos Labs, we’re building AI systems that understand software — using AI itself to reason about binaries, uncover hidden behaviors, and make codebases more transparent and secure. We’re looking for someone who loves the intersection of security research, reverse engineering, and AI-driven automation — someone who can dive deep into compiled code and build tools that scale those insights.
💻 You’ll work on:
• Reverse engineering and binary analysis (PE/ELF/Mach-O)
• Designing and implementing AI-powered systems for reverse engineering
• Building tooling in Python for large-scale security automation
• Turning cutting-edge research into production-grade systems
🧠 You are:
• A security researcher or software engineer with a hacker’s curiosity
• Experienced with low-level systems, RE, or ML-for-security approaches
• Excited to build AI that builds AI — systems that make reasoning about software scalable
Join us as we redefine how machines understand code.
👉 Apply here: https://lnkd.in/gE-NAkAn
#Hiring #AI #SecurityResearch #ReverseEngineering #SoftwareEngineering #MachineLearning #StartupJobs
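A rough sketch of the kind of Python binary-analysis tooling that role involves, classifying a file as PE, ELF, or Mach-O from its leading magic bytes (illustrative only; the function below is not part of the posting):

```python
# Illustrative only -- not from the Delphos Labs post. A tiny example of the kind of
# Python tooling the role describes: classifying a binary as PE, ELF, or Mach-O by
# its leading magic bytes. Real tooling (LIEF, pefile, etc.) inspects far more.
import struct

MACHO_MAGICS = {
    0xFEEDFACE, 0xFEEDFACF,  # 32-/64-bit Mach-O, native byte order
    0xCEFAEDFE, 0xCFFAEDFE,  # 32-/64-bit Mach-O, swapped byte order
    0xCAFEBABE, 0xBEBAFECA,  # fat/universal binaries (0xCAFEBABE also starts Java class files)
}

def guess_binary_format(path: str) -> str:
    """Best-effort guess of the executable format of the file at `path`."""
    with open(path, "rb") as f:
        head = f.read(4)
    if len(head) < 4:
        return "unknown"
    if head[:2] == b"MZ":        # DOS stub; a real check follows e_lfanew to the 'PE\0\0' signature
        return "PE (probable)"
    if head == b"\x7fELF":
        return "ELF"
    if struct.unpack(">I", head)[0] in MACHO_MAGICS:
        return "Mach-O"
    return "unknown"
```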
-
Mike Beaumier posted this:
I am stoked to be taking the next step in my career as a Staff Data Scientist at Delphos Labs. I can't wait to make foundational contributions to data infrastructure, modeling, and superpowering reversing with AI!
-
Mike Beaumier posted this:
After a fulfilling 7-year journey at Google, I am now embarking on a new chapter, eager to explore fresh opportunities in Data Science and Machine Learning. Want to work with me? Ping me!
Reflecting on my technical milestones over the past decade, here's a glimpse into my journey:
- PhD in Particle Physics: Utilized data science and machine learning methodologies to analyze massive datasets, unraveling the essence of matter's fundamental properties.
- Machine Learning Engineer at Mercedes Research: Crafted probabilistic graphical models for real-time on-device predictive analysis of driver behavior.
- Data Science at Google: Spearheaded causal inference projects and deployed cutting-edge machine learning models, pinpointing revenue prospects for Google Ads clientele.
- Machine Learning/Data Science at Google: Engineered expansive content classification algorithms safeguarding Google's Ad Revenue & Reputation. Introduced Gemini-powered agents enhancing content classification efficiency, managing 40 thousand reviews daily.
Excited for the future and eager to connect for potential opportunities! If you know someone who might be interested in working with me, please share this post or my profile!
#DataScience #MachineLearning #NewBeginnings
-
Mike Beaumier shared this:
Sharing my colleague Josh's profile: Josh is an extremely talented engineer who taught me a lot of what I know about using code to solve problems. My immediate team is not hiring, otherwise I'd hire him in a heartbeat. Hoping others in my network have a look!
The shared post:
Anyone looking to hire an automation gremlin?
- skill: Ansible (kubernetes, container image creation, linux security and admin (UEFI, bootloader, OSes, services, userspace, networking), IT infrastructure configuration, development environment configuration management, Execution Environment management), deprecated skills: (Jenkins, Bash scripting, Windows)
- skill: Rust (3 years non-professionally), deprecated skills: (Python, C++)
- skill: Physics (Doctorate in Nuclear Physics (petabyte-scale data collection and analysis))
total experience coding: 20+ years (15 professionally)
total experience automating at scale: 12+ years (9 professionally)
preference for Staff Engineer, Principal Engineer, Research Engineer, or similar (my knowledge far outstrips my actual ability, stereotypical ADHD 'jack of all trades' (?), so I work best teaching/mentoring and collaborating)
-
Mike Beaumier shared this:
Good opportunity to hire top data science candidates. I myself was an insight fellow, and can vouch for how effective the program is at providing a platform for PhD researchers to transition into careers in data science. #insightdatascience #datascience #hiredatascience
The shared post:
We’re waiving the Insight hiring fee this summer to surface more job opportunities for Insight Fellows given COVID. We’re doing no payment to hire Insight Fellows until Aug 31, and would love to connect with great teams. If you’re hiring for Data Science, Data Engineering, ML Engineering, DevOps / SRE, Blockchain Engineering or Security roles, we’d love to help you find the right person. Please reach out at: partner@insightfellows.com or visit https://lnkd.in/gKYsQxV
-
Mike Beaumier shared this:
The Insight Fellowship program I participated in is accepting applications for the upcoming January session. #training
Learn more and apply at insightdatascience.com/apply by October 22nd:
* 7-week, full-time training fellowship
* Tuition-free with need-based scholarships available to help cover living costs
* Learn cutting-edge open-source technologies with mentorship from industry leaders
* Meet with teams from leading companies
* Join an active community of over 1600 Insight alumni
* Self-directed, project-based learning followed by interviews at top companies
#datascience
-
Mike Beaumier reacted to this:
Proud of how the team showed up at RSAC. Grateful to the security leaders who spent time with us, to the Institute for Humane Studies for co-sponsoring the Machine Stops breakfast, and to LastPass for having Caleb on The Phish Bowl Podcast. Compiled software runs critical systems everywhere. Delphos Labs exists so you can understand what's inside it.
If you hear “agentic” three times on the RSAC floor, a startup gets funded. Just kidding. One thing that is true is security is moving from hype to implementation. Filmed live from RSAC, this episode of The Phish Bowl breaks down what’s signal vs. noise in security today. From how AI is actually being used, to why identity keeps emerging as the new perimeter, to why analyzing finished software is key to finding modern vulnerabilities. Featuring Caleb Fenton, Co‑Founder of Delphos Labs. Full episode link in the comments. 🎙️
-
Mike Beaumier reacted to this:
Look at me, I have hot takes and click bait titles. I'm a serious person. https://lnkd.in/gvP_Xu52
I don't think Python is the language for the agentic future. It was great for humans, great to prototype and get started, but why bother when your agent can just write Rust. You're not reading it anyway. It's 100000x faster and if it compiles it mostly works. No more fighting a bolted on type system. Rust might not be the winner in 10 years but it'll be closer to Rust than it is to Python.
-
Mike Beaumier reacted to this:
I had the pleasure and privilege of discussing AI and the future of work with the brilliant Ghada Richani, Lara Kallab, and Romeo Elias at the LebNet LA/OC panel co-hosted with IEEE and USC last month. Check out LebNet's other great events! https://lnkd.in/g_VAksnT
@lebnetorg on Instagram: "🚀 Join LebNet LA/OC on February 28th at 2PM for an engaging panel discussion on how AI and data are reshaping industries, redefining careers, and transforming the future of work. Hear from industry leaders sharing real-world insights on emerging trends, evolving skill sets, and how professionals and organizations can prepare for what’s next. In partnership with USC and I
-
Mike Beaumier reacted to this:
How I thought my job would be: "I'll just rewrite the infrastructure this week with a design that can handle all of our possible future constraints."
How it's going: "We're going to strategize around this problem together and solve ambiguous problems within the next quarter"
-
Mike Beaumier liked this:
I wanted to think through the implications of reverse engineering skills getting more accessible with AI and what it means that it's so much easier to take compiled code apart and understand it. So I wrote this: https://lnkd.in/gqPreucP
How long until someone takes some expensive closed source software and uses an LLM to "translate" the disassembly to C/C++ or even Rust? And just put it on github or pastebin or something. We're here. It's possible.
The Barrier Between Source Code and Compiled Code Has Dissolved
-
Mike Beaumier liked this:
Looking for my next role. Full-time or contract, automation engineer or AI engineer, hybrid or remote out of Washington state. I run DataNova Consulting. Most of the work has been Python automation and Claude Code agent systems: browser automation, data pipelines, background jobs that run without supervision. I want to build things, not talk about building them. andrewtcrooks@gmail.com | https://lnkd.in/gq_yrNNg
-
Mike Beaumier reacted to this:
Research is one of the most collaborative human endeavors. Every scientific result builds on thousands of others. But the system designed to track this (journals, citations, peer review) was built for a world where papers traveled by mail. What if publishing started from the atomic unit of knowledge, not the paper?
Today, a negative result sits in a notebook because there's no journal for it. A calibration trick stays local knowledge. A hypothesis waits months for peer review while someone in another field has the exact missing piece. And when work does get published, matching happens on titles and abstracts: fuzzy, slow, and limited to what keyword search can find.
NanoPub is the first application built on our CoreTx cross-domain research intelligence platform. It lets you capture atomic research findings (a method, a negative result, a half-formed hypothesis) as structured Knowledge Objects in 30 seconds, then matches them to complementary work across every field. Three things change:
1. Matching at the level of knowledge, not papers. A single negative result can match to a method that explains it. A calibration trick can find the field that needs it. More precise, more serendipitous, and much faster than waiting for publication.
2. Your knowledge stays yours. With GPT, Gemini, and Claude, your expertise trains their next model unless you opt out. NanoPub reverses that. You opt in to sharing, on your terms. Content stays anonymous until both sides choose to connect. Cryptographically timestamped from the moment you capture.
3. Connections no search engine finds. We don't match on keywords or similarity. We map structural dependencies between findings: what a result depends on, what it enables, what it contradicts. A materials scientist's polymer failure mode matches to a marine biologist's coral stress response because they share underlying mechanisms, not because they share words.
The architecture behind this is modeled after how human memory actually works: structured, contextual, and grounded in who discovered what, when, and under what conditions. That same structure is what makes cross-domain discovery possible. The more researchers participate, the more valuable each person's knowledge becomes.
The traditional publication model was ripe for an update. Not because papers are bad, but because they are too coarse and too closed for how science can work today. Have fun exploring a new era for scientific knowledge sharing and collaboration! And please give us your feedback! https://nanopub.coretx.ai
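Purely as an illustration of the structure that post describes (not NanoPub or CoreTx code; all names below are assumptions), a Knowledge Object with explicit structural links might be sketched like this:

```python
# Purely illustrative -- not NanoPub or CoreTx code, and every name below is an
# assumption. One simple way to model the "atomic Knowledge Object" with explicit
# structural links (depends on / enables / contradicts) rather than keyword metadata.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class KnowledgeObject:
    claim: str            # the finding itself: a method, a negative result, a hypothesis
    author: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    depends_on: list = field(default_factory=list)    # ids of findings this one relies on
    enables: list = field(default_factory=list)       # ids of findings this one makes possible
    contradicts: list = field(default_factory=list)   # ids of findings this one is in tension with

ko = KnowledgeObject(
    claim="Polymer X delaminates above 40 °C in saline (negative result)",
    author="anonymous-until-matched",
    contradicts=["ko-2318"],
)
```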
Experience & Education
-
Delphos Labs
***** **** *********
-
********** ********* ***********
******** ********
-
******
****** ************* ******** ***** ******* * ******* *********
-
********** ** *********** *********
****** ** ******* ****** *******
-
********** ** *********** *********
****** ** ********** ******* ******** *******
Licenses & Certifications
Organizations
-
Insight Data Science
Fellow
- Present. Accepted as an Insight Data Science Fellow for the Fall session in 2015 in Silicon Valley
Recommendations received
2 people have recommended Mike
Other similar profiles
-
Amir Erfanian, PhD
Ending Community Homelessness Coalition (ECHO)
3K followers • Austin, Texas Metropolitan Area
Explore more posts
-
Colleen Farrelly
Post Urban • 13K followers
Some recent, exciting news on compressed tokenizers and how individual tokens modify images in encoder-decoder models. I hope this opens up a new avenue of research with smaller models across frameworks (perhaps JEPA?). https://lnkd.in/evFw8WBu
69
2 Comments
-
Rahul Dharankar
MarcoPolo • 9K followers
“Ambiguity is genAI’s kryptonite.” -- Dave McComb, CEO of Semantic Arts
Most enterprises are stuck in AI proof-of-concept purgatory—not because their models aren't working, but because their data infrastructure wasn't built for AI-scale operations. Travis Van (InfoWorld) synthesizes insights from leading data infrastructure and AI experts on the critical gap between AI experimentation and production deployment.
Key Voices:
Dave McComb (Semantic Arts) - CEO
Philip Rathle (Neo4j) - CTO
Grant Miller (Replicated) - CEO
Brian Gruttadauria (HPE) - CTO of Hybrid Cloud
Anthony J. Annunziata (IBM) - Director of AI and Open Innovation
Miqdad Jaffer (OpenAI) - Product Lead
Three barriers blocking AI production deployment:
Data Accessibility: Legacy systems create silos where AI agents can't reach the context they need—traditional data catalogs weren't designed for the real-time context retrieval AI agents require
Semantic Integration: McComb and Rathle emphasize knowledge graphs as critical infrastructure for connecting disparate data sources with business meaning and relationships at scale
Infrastructure Complexity: Gruttadauria, Miller, and Annunziata highlight how enterprises struggle operationalizing AI workloads across hybrid environments—the infrastructure layer itself becomes a bottleneck
The shift happening now: Organizations are realizing they need a new layer—not replacing existing data warehouses or lakes, but adding an AI-native context layer that bridges the gap between where data lives and where AI agents need to access it. Jaffer's insights from OpenAI emphasize rethinking data architecture around agent workflows rather than forcing AI to work within human-designed query patterns.
The companies winning at AI aren't necessarily those with the best models—they're the ones who solved the data access problem first. At Immersa, we are working diligently at solving this precise problem.
#EnterpriseAI #DataInfrastructure #AIAdoption #ContextEngineering #DataArchitecture #AgenticAI #KnowledgeGraphs #DataStrategy
8
-
Guo-Jun Qi
Westlake University • 4K followers
We recently released a preprint paper demonstrating how a consensus group of individual prompts with evolved contexts by large language models (LLMs) can significantly enhance diversity and overall performance in challenging agentic tasks. You can read more about our findings at https://lnkd.in/gmMw-d2W. We call it C-Evolve (Consensus Evolve), and it is done without needing to train LLMs' weights. #c-evolve #CompoundAISystems #ContextOptimization
23
-
Stephen Wolfram
Wolfram Research • 51K followers
If you do functional programming (like in Wolfram Language) you've probably used lots of pure functions, or lambdas. But what are lambdas like in the wild? Things I'm doing in CS, bio and ML converged to make me curious to find out... And as seems to happen whenever I go exploring in the computational universe ... they surprised me ... https://lnkd.in/eEY2euBD
735
23 Comments
-
Guy Nadav
Booking.com • 3K followers
In my previous post I wrote about fine-tuning an in-house LLM for travel recommendations. This post is about why owning the model stack turns latency into an advantage. (previous post - https://lnkd.in/dZS6934P) One of our recent papers, “Speed Without Sacrifice” (link in comments 👇), won the Industry Track Best Paper Award at ACL 2025. For me, the most interesting part is not the award (though kudos to our amazing team including Moran Beladev, Manos Stergiadis, Ilya Gusev and Eran Fainman), but what it says about building GenAI systems at scale. When you rely on third-party models, latency is a fixed parameter. You pick a model. You pick a tier. You live with the response time. With open-source models, latency becomes something you can actively design for. Together with our long-time partners at AWS, Daniel Zagyva, Laurens van der Maas, and Aleksandra Dokic, we explored how owning the full stack lets you optimize not just prompts or infrastructure, but the model architecture itself. Huge thanks to them (and the AWS team including our partner in crime Eran Bachar) for an outstanding collaboration over the last three to four years. This paper shows how we did that in practice. We used knowledge distillation to transfer quality from very large teacher models into much smaller students, preserving task performance while dramatically reducing cost and memory footprint. We then applied speculative decoding, using Medusa-style heads to parallelize token generation. Instead of treating inference as a black box, we changed how decoding works under the hood, achieving large speedups without sacrificing output quality. The key insight is that these techniques compound. Distillation makes models smaller and cheaper. Speculative decoding makes them faster. Combined, they delivered 10–20x latency improvements in real production systems like AI Trip Planner, Smart Filters, and large-scale content generation. The bigger takeaway is not a single optimization. Over time, we have built an internal recipe for latency control across open-source LLMs. Distillation. Speculative decoding. Quantization. Task-specific training. KV caching. Careful serving. Once you own this toolbox, you can apply it to almost any task, and tune the speed-quality-cost tradeoff intentionally instead of accepting it as given. This is why open-source models are so powerful in practice. Latency is no longer a constraint you work around. It is a dimension you control. And it is yet another great example of the kind of deep, hands-on GenAI work the team is doing to build fast, reliable, AI-first travel experiences for our customers. (and yes we are hiring in TLV and AMS - ping Moran Beladev for open roles)
45
1 Comment
-
Michael Ryaboy
inference.net • 5K followers
Thanks to the resurgence of RL, LLMs are finally able to reliably coordinate tools and reasoning to do high-precision retrieval. Companies like Happenstance, Clado, and Mintlify have already shifted to agentic search, and it's only a matter of time until anything less feels broken to users. Link to full blog in comments.
20
2 Comments
-
James Brand
Microsoft • 2K followers
Wrote a second blog post! This time it's about using LLMs for imputing missing data. I've seen a few papers about imputation with LLMs, but most seem to either train custom imputation transformers or horserace flagship models against standard methods. Mert Demirer and I recently played with an alternative idea which instead uses LLMs as part of an ensemble imputation approach, relying on their "knowledge" of the world to provide additional prediction signal. I haven't seen a paper about it yet, so we thought it'd be fun to make a short post: https://lnkd.in/gMzC6iaP
232
8 Comments
-
Emmanouil Antonios Platanios
Scaled Cognition • 2K followers
Check out our new blog post “Prompt Trees: Training-time Prefix Caching,” by the research team at Scaled Cognition. https://lnkd.in/dDYF2VXz TL;DR: Training speedups of up to 70x on tree-structured data. Not 70%. _70x_. How do we do this? Let's consider the case of RL rollouts for a conversational agent (this is just one possible example of many). You typically have many rollouts at each turn - this is a tree. In standard transformer training, you need to duplicate the prompt prefix for each rollout, compute gradients, and take a step (e.g., trl does this). That's a lot of duplicate encoding, especially for long conversations or many rollouts. How do we remove the duplicate encoding? The key insight is that token encoding only depends on other tokens by means of the position IDs and attention mask - if we can set those appropriately for a tree structure, we can use standard transformers to encode trees. It turns out you can use pytorch's flex attention to construct a tree-structured attention mask for a prompt tree. It's very efficient if you use a binary encoding of a node's path to compute whether one token is in the prompt prefix of another token. With this attention mask and correctly-offset position ids, the encodings that you get (and gradients that you compute) are _exactly_ the same as if you had encoded each path through the tree as a separate linear prompt (modulo numerical stability). How well does this work? That depends on a lot of things, but the biggest factor is how dense your prompt trees are. If your data is intentionally constructed to make good use of prompt trees, the speedup can be very large. We ran timing experiments on our cluster, measuring the relative speedup as a function of how dense the prompt tree is. The speedup is roughly linear in the proportion of duplicated tokens in the tree, with empirical speedups of up to 70x. It's not exactly linear, for various reasons we discuss in the post, but it's empirically nearly so. Note that this is a speedup for the gradient computation only, and it's one that potentially changes optimization dynamics (e.g., you're packing more information into each batch) As we're coming out of stealth, we're excited to be sharing more with the community! (We'll be starting with projects which are at the periphery of our tech for now and saving our core agentic modeling tech for later.) We'll be at NeurIPS, happy to chat, and we're hiring 🙂
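As a loose illustration of the prefix test the post describes (not Scaled Cognition's code; plain path tuples stand in for the bit-packed encoding and flex attention mask), a pure-Python tree attention mask could look like:

```python
# A loose, pure-Python illustration of the ancestor test the post describes -- not
# Scaled Cognition's implementation. Each token records the tree node it belongs to
# as a path tuple (child indices from the root); a token may attend to another token
# only if that token's node lies on its root-to-node path and is not in its future.
# The real system packs paths into bit patterns and feeds the mask to flex attention.

def is_on_path(ancestor: tuple, node: tuple) -> bool:
    """True if `ancestor` is the same node as `node` or one of its ancestors."""
    return node[:len(ancestor)] == ancestor

def tree_attention_mask(token_nodes: list, token_positions: list) -> list:
    """mask[q][k] is True when query token q may attend to key token k."""
    n = len(token_nodes)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            shared_branch = is_on_path(token_nodes[k], token_nodes[q])
            causal = token_positions[k] <= token_positions[q]
            mask[q][k] = shared_branch and causal
    return mask

# A shared two-token prompt (root node ()) with two rollouts branching off it.
nodes     = [(), (), (0,), (0,), (1,), (1,)]
positions = [0, 1, 2, 3, 2, 3]          # positions continue from the shared prefix
mask = tree_attention_mask(nodes, positions)
assert mask[3][1] and not mask[3][4]    # rollout 0 attends to the prompt, not to rollout 1
```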
35
-
Vivekpandian V.
UPS • 8K followers
The more I experiment with RAG, the more I realize smarter chunking matters far more than fancy models. One thing that surprised me recently is how much chunking influences retrieval quality. Most of the issues we blame on embeddings or model choice or prompt actually start with how we split our documents. Sometimes chunking improves precision. Other times it breaks valuable context and hurts performance.
My takeaway from hands-on learning:
🔹 Chunk when documents are long, multi-topic, or noisy
🔹 Avoid chunking when documents already fit cleanly into the model’s context window
This small shift made my RAG experiments noticeably more accurate and easier to debug.
#RAG #LLM #AIEngineering #VectorSearch #EnterpriseAI #AI
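A toy sketch of that decision rule (an illustration, not the author's pipeline; the token budget, overlap, and function name are assumptions):

```python
# A toy version of the decision rule above (editorial illustration, not the author's
# code): only split documents that exceed the retrieval/context budget; short,
# single-topic documents go in whole. Word counts stand in for real tokenization.

def prepare_for_indexing(doc: str, max_tokens: int = 512, overlap: int = 50) -> list:
    words = doc.split()
    if len(words) <= max_tokens:
        return [doc]                      # fits cleanly: don't split and risk breaking context
    step = max_tokens - overlap
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), step)]
```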
29
3 Comments
-
Sameera Ramasinghe
Pluralis Research • 4K followers
Decentralized (model parallel) training, to this date, has been considered infeasible, primarily because of the extreme sensitivity of large-scale model training to communication latency. Training any serious model involves a very long sequence of forward and backward passes, where activations and gradients must be exchanged rapidly across devices. Normally, this is only feasible because the GPUs are co-located in tightly synchronized clusters with high-speed interconnects. Even small bottlenecks such as 100ms delays between consumer devices on Wi-Fi can be really catastrophic. Today we share a new paper that shows, for the first time, the ability to train billion-scale models over the open internet using low-end GPUs. Our compression algorithm can compress both activations and gradients by over 100x with no degradation in convergence or final performance. For the first time, this allows billion parameter models to be trained across globally distributed low end GPUs, connected only via standard internet, and match the performance of centralized training using high-bandwidth interconnects. In the coming days, we’ll launch the first ever fully volunteer-driven decentralized model training run, where anyone around the world can join and co-train a foundation model over internet. Preprint: https://lnkd.in/gUryp3Bs Register for the run → https://lnkd.in/dEKVEesZ
57
12 Comments
-
Hernán Vivas
AlixPartners • 447 followers
📄 It's paper Friday! 🧠 Is there an underlying semantic structure common to all word embeddings? A recent paper from a group of researchers at Cornell University makes the case that the "Platonic Representation Hypothesis" (the idea that embeddings have a common underlying representation space) holds true.
Closer Look: The Challenge
* Given two embedding models, the corresponding vector spaces are usually completely incompatible with each other
* Even if they are both encoding the same text, you can't just match them vector-to-vector or apply tools designed for one model's embeddings to another model's embeddings
The Idea: Universal Latent Representation
Inspired by a paper in image processing that argues that "representations in AI models, particularly deep networks, are converging", the authors propose a Strong Platonic Representation Hypothesis where there's an underlying universal semantic structure that all text embedding models converge toward, and build an adversarial network (a machine learning method where two networks compete) that allows translating unknown embeddings so that information extraction tools designed for known embedding spaces can be applied to them. The results are very telling: their method achieves 0.92 cosine similarity and perfect matching on over 8,000 shuffled embeddings.
A Call for Some Concern: Security Implications
An adversary with access to just a vector database could use this method to extract sensitive information about the original documents - like medical records or private emails - without even knowing which model created the embeddings. Though the authors don't dive deep into the security implications, this opens up some questions about vector database security.
My Take
When I first read this I thought about Chomsky's universal grammar - the idea that despite surface differences, all human languages share deep structural principles. Could embeddings be revealing something similar about how meaning itself is structured? What are your thoughts? Let me know in the comments!
#AI #MachineLearning #LLM #Research
11
2 Comments
-
marwan abdelaziz
Microsoft • 604 followers
A major step forward for AI and fair use. A U.S. District Court just ruled that training LLMs on copyrighted books can constitute fair use, likening it to teaching students to become better writers by learning from published works, without copying them verbatim. This decision, if upheld, helps remove a major point of legal ambiguity around data access, one of the most critical inputs in modern AI development. While pirated data is still off-limits (as it should be), legally obtained texts, including books, are now safer ground. Access to high-quality data = progress. Let’s keep innovating responsibly. #AI #FairUse #LLMs #AIethics #OpenSourceAI #DataCentricAI #AIRegulation
-
Steve Bromley
gamesuserresearch.com • 6K followers
Hugely important issue covered here, especially as teams often perceive surveys as the cheap or ‘easy’ method. If you can’t trust the quality of the data you are gathering, all of the time, money (and political capital) invested into the study is wasted. Screening & recruitment are some of the most important parts of the research process - part of the reason I favour qualitative moderated methods when it aligns with the team’s objectives.
42
3 Comments
-
Sam Denton
Applied Compute • 2K followers
In Anthropic's Agentic Coding Trends Report, they mention "perhaps the most valuable capability developments in 2026 will be agents learning when to ask for help, rather than blindly attempting every task, and humans stepping into the loop only when required." That's why we are releasing our latest research at Scale AI: Long Horizon Augmented Workflows (LHAW).
LHAW is a synthetic data generation pipeline for creating underspecification on *any* dataset and evaluating how agents react. LHAW transforms well-specified long-horizon tasks into controllably underspecified variants using a three-phase pipeline: segment extraction, candidate generation and empirical validation. We generate & validate 285 ambiguous task variants across MCP-Atlas, TAC, and SWE-Bench Pro.
Finding #1: Clarification recovers meaningful performance, but not fully. Access to a simulated user significantly improves success on underspecified tasks (+31% Pass@3 for Opus4.5 on MCP-Atlas), yet agents are not able to fully recover original performance.
Finding #2: Models vary widely in clarification strategy: GPT-5.2 spams, Gemini models underask. Some models extract high value information per question. Others ask far more frequently, achieving gains but with lower value per interaction. We measure this with Gain/Question.
Finding #3: Clarification behavior adapts to cost. As expected, when interaction is “cheap”, agents ask more but gain less per question. When interaction is “expensive”, agents ask less but extract more value per question at higher risk of failure.
Finding #4: Clarification failure-modes vary from widespread to model-specific. Certain failure-modes like poor question quality, underclarification, and question targeting apply across models. Some models show particularly bad tendencies to overclarify or misinterpret a response.
As agents take on longer tasks, we want to know how they act under uncertainty and how much they burden us with their questions :) LHAW provides a way to create these tasks, evaluate clarification strategies, and (soon) train agents for reliability under real-world ambiguity.
This work was led by George Pu and Mike Lee with contributions from Udari Madhushani Sehwag, David Lee, Bryan Zhu, Yash Maurya, Mohit Raghavendra, and Yuan (Emily) Xue
Blog: https://lnkd.in/gp768At9
Full Paper: https://lnkd.in/gVTjemmv
Dataset: Hugging Face https://lnkd.in/gTjVrszU
154
5 Comments -
John Olafenwa
Microsoft • 4K followers
An important question I get asked is, why is RL important and is supervised finetuning enough for LLMs? I spent the weekend putting together a substack and a youtube video explaining the difference between RL and SFT, and the theoretical foundations for why RL works in challenging domains. I highly recommend this if you are working with LLMs or just curious to understand how they are trained. You can also find the substack version here: https://lnkd.in/eKewpccT https://lnkd.in/eNjYbWzh
7
2 Comments
-
Sravan Bodapati
Google • 12K followers
Anthropic's recent blogpost discusses the important shift from evaluating simple LLMs ---> testing complex AI agents. It highlights why standard benchmarks fall short for agentic workflows and offers a practical roadmap for building robust, multi-turn evaluations.
Packed with insights, but a couple of takeaways that stood out to me:
🎯 1. 𝐂𝐨𝐦𝐩𝐨𝐮𝐧𝐝𝐢𝐧𝐠 𝐯𝐚𝐥𝐮𝐞 𝐨𝐟 𝐭𝐡𝐞𝐬𝐞 𝐬𝐲𝐬𝐭𝐞𝐦𝐬: where costs are visible upfront but their benefits accumulate over time (quicker iterations, faster deployment).
🎯 𝟐. 𝐓𝐡𝐞 𝐂𝐨𝐧𝐬𝐢𝐬𝐭𝐞𝐧𝐜𝐲 𝐅𝐢𝐱 / 𝐃𝐞𝐭𝐞𝐫𝐦𝐢𝐧𝐢𝐬𝐦 𝐢𝐧 𝐄𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧𝐬: Regardless of agent type, agent behavior varies between runs. Sometimes we want to measure how often (what proportion of the trials) an agent succeeds at a task, while for others, we want to measure if the agent succeeds at least once.
🚀 Single success is enough across runs? Use pass@k.
🚨 Need reliability across runs? Use pass^k.
The post contains many more valuable insights worth exploring. Read more here: https://lnkd.in/gwCxqhDj
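As a rough illustration of the distinction (the helpers below are not from Anthropic's post; they assume n recorded trials per task with c successes):

```python
# Rough illustration of the two metrics (not from Anthropic's post). Both helpers
# assume n recorded trials of the same task, of which c succeeded.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimated probability that at least one of k sampled runs succeeds."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimated probability that all k independent runs succeed (pass^k)."""
    return (c / n) ** k

# 10 recorded runs, 7 successes:
print(pass_at_k(10, 7, 3))   # ~0.99: fine when a single success is enough
print(pass_hat_k(10, 7, 3))  # ~0.34: what matters when every run must be reliable
```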
59
2 Comments
-
Jessica Leight
International Food Policy… • 10K followers
ICYMI: neat new analysis from Charles Yang uses ORCID profiles and GitHub links, along with public Claude Code commits, to track scientific adoption of Claude Code. It's clearly incomplete, but as a first indication of early adopters, he finds a 2% adoption rate of Claude Code. There is a U-shaped adoption curve, and economists seem to be adopting at roughly double the rate of other scientists. (Link in comment.)
These are neat findings, and I look forward to more and better data sources that allow us to track agentic coding adoption. Right now, the conversation is heavily driven by social media. The adopters and those who see large benefits are undoubtedly the most visible. Less represented are those who haven't adopted because of up-front costs (time, money, or both), organizational constraints or organizational policies, or other reasons. Moreover, it's a reasonable hypothesis that there are strong complementarities between preexisting coding skills and use of Claude Code, in which case the earliest adopters may have the highest returns. More evidence is urgently needed!
31
1 Comment
-
Sarthak Rastogi
One New Zealand • 25K followers
Anthropic's technical blog is a great resource for production-ready AI systems. I went through some of their best articles lately (imo) -- they're really good at explaining the thought process behind anticipating challenges in a prod scenario and designing robust systems. I used Miskies AI to create visual and hands-on explanations for all articles.
- Equipping agents for the real world with Agent Skills: Claude is powerful, but real work requires procedural knowledge and organizational context. Introducing Agent Skills, a new way to build specialized agents using files and folders. https://lnkd.in/gCrFPFFV
- Effective context engineering for AI agents: Context is a critical but finite resource for AI agents. In this post, they explore strategies for effectively curating and managing the context that powers them. https://lnkd.in/gbetNUVW
- Writing effective tools for agents — with agents: Agents are only as effective as the tools we give them. They share how to write high-quality tools and evaluations, and how you can boost performance by using Claude to optimize its tools for itself. https://lnkd.in/gGzpctYF
- How we built our multi-agent research system: Research feature uses multiple Claude agents to explore complex topics more effectively. They share the engineering challenges and the lessons learned from building this system. https://lnkd.in/gw4C4r-9
Links to the original articles are also in the comments.
♻️ Share with someone building production-ready AI systems :)
#AI #LLMs #GenAI
31
3 Comments
Others named Mike Beaumier in United States
-
Mike Beaumier
Denver, CO
-
Michael Beaumier
Osprey, FL
-
Michael Beaumier
Standish, ME
-
Michael Beaumier
Palm Springs, CA
8 others named Mike Beaumier in United States are on LinkedIn