Cambridge, Massachusetts, United States
3K followers
500+ connections
Activity
-
Max K. reposted this
In 2016, Geoffrey Hinton said we should stop training radiologists. "It's just completely obvious that within 5 years, deep learning is going to do better than radiologists."
It's 2026 now. Ten years later. There are over 1,000 FDA-cleared AI tools for radiology. The technology works. Hinton wasn't wrong about the capability. But here's what happened to radiologists:
→ Salaries grew ~40%
→ There's a global shortage
→ Demand increased, not decreased
→ AI handles the routine scans so radiologists focus on complex cases
The AI didn't replace them. It made imaging cheaper and faster, which meant doctors ordered more scans, which meant more work for radiologists, not less.
Economists have a name for this. The Jevons paradox: when technology makes a resource more efficient to use, total consumption of that resource goes up, not down. Coal got more efficient in the 1800s. We used more coal, not less. Computing got cheaper. We didn't use fewer computers. AI makes diagnosis faster. We don't need fewer diagnosticians. Every time we automate something, we discover new uses for it that we couldn't afford before.
This pattern matters right now because the same confident predictions are being made about software engineers, designers, writers, and analysts. "AI will replace X within Y years." Maybe. But we've heard that before. And so far, the track record of these predictions is terrible.
Pay attention to the Jevons paradox. Now that building software is 10x cheaper, we don't build the same amount of software with fewer people. We'll build 100x more software.
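A quick way to see the arithmetic behind that last claim: under a constant-elasticity demand curve, consumption grows faster than cost falls whenever elasticity exceeds 1. A minimal Python sketch, purely illustrative, with a hypothetical elasticity of 2 chosen only to match the post's 10x-cheaper / 100x-more framing:

    # Jevons arithmetic, assuming constant-elasticity demand Q = k * P**(-eps).
    # eps = 2.0 is a hypothetical value, not a measured figure.
    cost_drop = 10                                       # software becomes 10x cheaper
    eps = 2.0                                            # assumed demand elasticity (>1 => Jevons)
    quantity_multiplier = cost_drop ** eps               # 100x more software gets built
    spend_multiplier = quantity_multiplier / cost_drop   # total spend still rises 10x
    print(quantity_multiplier, spend_multiplier)         # 100.0 10.0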
-
Max K. reposted this
AI assistants have changed the way we use computers to work and search for information. As LLMs become more powerful, what's next? Agents 🤖
I'm very excited to introduce Windows Agent Arena, a benchmark for evaluating AI models that can reason, plan and act to solve tasks on your PC.
🔗 Blog: https://lnkd.in/ezsdhS3N
🌐 Webpage: https://lnkd.in/eqtgMn2M
📃 Paper: https://lnkd.in/ezrrybij
💻 Code: https://lnkd.in/eEKa4pw9
🚀 Windows Agent Arena comprises 150+ tasks across a diverse range of 11 programs/domains that test how an AI model can act in a real OS using the same applications, tools, and browsers available to us. Researchers can test and develop agents that can browse the web, do online booking/purchasing, manipulate and plot spreadsheets, edit code and settings in an IDE, fiddle with Windows GUI settings to customize PC experiences, and more.
⏰ A major feature of our benchmark is cloud parallelization. While most agent benchmarks today take days to evaluate an agent by running tasks in series on a development machine, we allow easy integration with the Azure cloud. A researcher can deploy hundreds of agents in parallel, producing results in as little as 20 minutes, not days.
🧠 Alongside the benchmark we also introduce Navi, a multi-modal agent for Windows navigation. We open-source a version of our screen parsing models to serve as a template for the research community. We benchmark several base models, ranging from the small local Phi3-V all the way to large cloud models like GPT-4o.
✨ I am super excited about this release, and all the innovations for generalist computer agents that Windows Agent Arena will unlock. For the first time, agent developers can start exploring large-scale autonomous data collection in a real OS domain, and train action models using reinforcement learning as opposed to costly human demonstrations.
This work was done with a group of fantastic collaborators at Microsoft (Dan Zhao, Francesco Bonacci, Dillon DuPont, Sara Abdali, Yinheng Li, Justin W., Kazuhito Koishida), as well as our superstar interns from CMU (Arthur Fender Bucker, Lawrence Jang) and Columbia (Zack Hui).
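The cloud-parallelization idea above - fanning benchmark tasks out to many VMs instead of running them in series - can be sketched in a few lines of Python. This is illustrative only: run_task_on_vm is a hypothetical stand-in, not the benchmark's actual Azure client API.

    import concurrent.futures

    def run_task_on_vm(task_id: str) -> dict:
        # Hypothetical stand-in: dispatch one benchmark task to a cloud VM
        # and collect its outcome. The real integration will differ.
        return {"task": task_id, "success": True}

    task_ids = [f"task-{i:03d}" for i in range(150)]
    # 150 tasks in series can take days; with ~50 parallel workers the
    # wall-clock time is bounded by the slowest few tasks.
    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
        results = list(pool.map(run_task_on_vm, task_ids))
    print(sum(r["success"] for r in results), "of", len(task_ids), "tasks passed")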
-
Max K. reposted this
Come build and evaluate AI agents on Windows! Thrilled to share Windows Agent Arena - a framework for you to test and develop agents that can reason, plan and act on a PC using language models.
Windows Agent Arena | Applied Sciences | Microsoft
-
Max K. reposted this
Check out the online demo for #OpenNeRF ✨ Open-vocabulary 3D scene search - directly in your browser! 🚀
🛋️ Demo: https://lnkd.in/d5wwPXph
👩‍💻 Code: https://lnkd.in/dgsG8UFP
📄 Paper: arxiv.org/abs/2404.03650
📽️ Project: opennerf.github.io
with Federico Tombari, Marc Pollefeys, Michael Niemeyer, Keisuke Tateno, Fabian Manhardt @ ICLR, ETH AI Center, Department of Computer Science (D-INFK), ETH Zürich
-
Max K. reposted this
A great project demonstrating that Florence-2, with only 0.23B parameters, can run in real time on Jetson to power applications for robotics and embedded edge AI.
We are excited to announce the release of a new NVIDIA #HoloScan application that showcases the real-time inference capabilities of Microsoft's #Florence-2 vision foundation model. Florence-2 excels in both vision and vision-language tasks with a compact size of just 0.23 billion parameters, making it ideal for running on iGPUs on #Jetson and #IGX. Experience its power firsthand using a simple webcam on #HoloHub!
-
Max K. shared this
I'm excited to receive my patent award cube from Microsoft for making robots do things that human bodies take for granted. PS: still patiently waiting for my Huawei award from 2017.
-
Max K. shared this
For those of my friends, coworkers and their community affected by the recent layoffs in #mixedreality, I wanted to pass on some referrals to companies working in non-virtual reality - specifically the precision agriculture, #agtech, and carbon removal fields, which also need people used to doing what was previously thought impossible. Take a look at https://lnkd.in/giSGA8UH, https://www.innov8.ag/team, https://kodama.ai/ or any of the many startups in https://lnkd.in/ggH4EZR8
And thanks to Jim Dooley for passing on the leads. 👍
Carbon Robotics: First & Only Commercial LaserWeeder™
-
Max K. shared this
I was part of the Mixed Reality layoffs that happened last week - here's a post reflecting on my time at Microsoft and what could be next for me :)
I knew head tracking at Microsoft HoloLens was my dream role the moment I tried Roboraid on HoloLens 1 and tried to break its tracking as a student at CMU (tracking was and still is very very robust!). I tapped into every contact I had to land an interview with the team and was struck by how warm, friendly and talented my interviewers were. Moving to the Greater Seattle Area to build with these wonderful people was a no-brainer.
Head tracking in Mixed Reality at Microsoft was a utopia where challenging problems could be discussed, prioritized, prototyped and shipped, all while maintaining a strong and empathetic team culture. In a period of 5 formative years, my mind and heart have been transformed - I have deep confidence in my ability to tackle challenging computer vision and deep learning problems, and to work collaboratively with people, with a high degree of autonomy, to build great software and hardware. I also come away with an enduring love for the outdoors and a sense of stewardship and community that is a big part of living and working here.
I look forward to using my skills in computer vision and deep learning, C++, Python, visual-inertial odometry and SLAM in another cool place. I'd appreciate any leads and intros from my network. It may be a tougher job market out there but I'm sure my colleagues and I will land on our feet ❤️
#computervision #deeplearning #robotics #perception
-
Max K. shared this
Yesterday my adventure in the Mixed Reality division of Microsoft ended too soon, as I was laid off. I spent 1.5 years on the Head Tracking team, working on augmented reality projects. I still believe AR is a technology of the future and I'm grateful for the opportunity to have contributed to it.
I had a ton of fun experimenting and building products, but most of all, I met amazing people who helped me grow as an engineer. Along the way I had a chance to explore many areas, such as computer vision, optimization, networking, sensor fusion, calibration, visualization and synthetic ground truth data, all while using modern, fast C++.
If you're looking for somebody with a similar skill set, I'm #opentowork. I'm looking for positions in Canada or Europe. I'd also be glad to explore new areas of software engineering. Good luck to everyone else affected!
-
Max K. liked this
Some legends live up to the hype! Seriously impressive listening to Jensen Huang talk to a cohort of Accel companies and give his thoughts about a unique moment in history for all companies. Thanks NVIDIA and Accel!
-
Max K. liked this
Our journey toward becoming an AI-centric organization didn't start with a model. It started with a massive cleanup of our technical debt.
Marco Bill and I just kicked off a new blog series to pull back the curtain on our internal AI experience. In Part 1, we're digging into the "boring" but essential work of standardization. A few years ago, we were struggling with a fragmented landscape of virtual machines and containers across multiple platforms. It was slow, inconsistent, and created constant operational friction. We realized that you simply can't build a reliable AI strategy on top of unreliable infrastructure and siloed data.
If you're navigating the complexities of your own AI journey, I hope our trial and error helps you find a smoother path. Check out the full post here: https://red.ht/4m52o7u and stay tuned for Part 2, where we'll talk about moving from restrictive policies to associate empowerment.
-
Max K. liked this
The AI industry is moving at a pace that traditional IT can barely comprehend, and honestly, the only way to keep up is through open collaboration. I've spent my career deep in the Linux kernel, and what we're seeing now with Generative AI feels exactly like those early days of open source - a massive, community-driven shift that's reshaping everything.
We're working closely with NVIDIA to build a stable foundation for the next decade of innovation. Check out the Red Hat newsroom from NVIDIA GTC last week to dive deeper into these updates and see how we're industrializing AI together: https://red.ht/46VUnLC.
-
Max K. liked this
Red Hat is expanding our work with NVIDIA to deliver Day 0 support for the new Rubin platform! As organizations move AI from experimentation into full production, they need a stable, high-performance foundation. We're meeting this shift head-on by optimizing our hybrid cloud and AI portfolios (including RHEL, OpenShift, and Red Hat AI) for NVIDIA's latest architectural breakthroughs.
Chris Wright breaks down the importance of our expanded collaboration with NVIDIA, including how we are engineering a complete AI stack optimized for NVIDIA's next-gen AI superchip platform, Vera Rubin. https://red.ht/49lf6cd
-
Max K. liked this
It's great to see Massachusetts being recognized as such a powerhouse for AI innovation. I see firsthand the incredible energy coming from our local ecosystem of talent, universities, and tech leaders. Really cool to see my home state leading the way in building an open future for AI.
Massachusetts is a hub of outstanding AI talent, and The Open Accelerator exists to help amazing ideas evolve into prototypes, launch companies, become products, and be deployed in production… faster! The passion and expertise of Red Hat, IBM, and the Massachusetts AI Hub come together as The Open Accelerator to support Governor Maura Healey's vision of Massachusetts as a global leader in AI. We're engaging with a diverse community of founders, investors, academics, nonprofits, and researchers to ensure AI ideas and companies can grow and thrive in the Commonwealth. The Open Accelerator is all about radical collaboration. We're not just building technology; we're building a community dedicated to solving the hardest problems in AI. The future of AI is open, collaborative, and happening right here, in Massachusetts.
Office of Massachusetts Governor Maura Healey Massachusetts Technology Collaborative
-
Max K. liked this
This is my origin story. This is my quest to solve parallel programming.
Experience & Education
-
Microsoft
********* ****** ** ********* *****
-
****** ************
****** **** ********** ****** ******** ******
-
*** *********
******* ******** **********
-
********** ** *******
****** ** ******* ********* ******** * ******** ******* A+ and NSERC CGS-M scholarships
-
********** ** ******* * ********** *******
******* ******** ** ******* *********** ******** ******** *******
-
Licenses & Certifications
Volunteer Experience
-
Mentor
Massachusetts Institute of Technology
- 5 months
Education
MIT Post-Doctoral Association Mentor @CSAIL
-
Mentor - School of Graduate Studies, Departments of Physics & Statistics
University of Toronto
- 1 year 9 months
Science and Technology
Mentoring graduate and undergraduate students in the physical sciences regarding career and job opportunities outside of academia.
Publications
-
DeepSeismic: a Deep Learning Library for Seismic Interpretation
European Association of Geoscientists & Engineers
We introduce DeepSeismic, an open-source GitHub repository (https://github.com/microsoft/seismic-deeplearning) that provides implementations of deep learning algorithms for seismic facies interpretation. The repository provides composable machine learning pipelines that enable data scientists and geophysicists to use state-of-the-art segmentation algorithms for seismic interpretation (e.g. UNet: Ronneberger et al. (2015), SEResNet: Hu et al. (2018), HRNet: Sun et al. (2019)). We provide scripts to reproduce benchmark results from running these algorithms on various public seismic datasets (Dutch F3 and Penobscot). Finally, the repository provides documentation and quick-start Jupyter notebooks and Python scripts to enable the community to get started with seismic interpretation projects quickly. We believe the results in this paper provide a strong baseline on which others can build. To the best of our knowledge, these are state-of-the-art results on the Dutch F3 dataset. We have released the code and the models in an open-source GitHub repository with a permissive MIT license.
-
Numerical strategies for quantum tomography: Alternatives to full optimization. [Quantum State Estimation]
Physical Review A (PRA)
We examine a variety of strategies for numerical quantum-state estimation from data of the sort commonly measured in experiments involving quantum-state tomography. We find that, in some important circumstances, an elaborate and time-consuming numerical optimization to obtain the optimum density matrix corresponding to a given data set is not necessary and that cruder, faster numerical techniques may well be sufficient; in other words, “the best” is the enemy of “good enough.”
Patents
-
System and method for prediction using synthetic features and gradient boosted decision tree
DE 17743691.2 - 1217 PCT/CN2017072082
Recommendations received
5 people have recommended Max
Explore more posts
-
NVIDIA AI
2M followers
Get faster and smarter MoE inference straight out of the box. 👇 Deep dive on scaling expert parallelism with TensorRT-LLM.
LLMs with MoE promise higher model capacity without linearly increasing compute costs - but they introduce new challenges (more conditional computation, dynamic routing, and non-uniform GPU utilization), solved by TensorRT-LLM.
✨ New: TensorRT-LLM has native support for expert parallelism - designed for fast, efficient inference with MoE models like Mixtral (Mistral AI) and DeepSeek (DeepSeek AI). This gives you:
✅ Dynamic expert routing: Automatically route tokens to the top-k experts with minimal overhead.
✅ Efficient expert scheduling: Balance expert loads across GPUs using smart sharding and token bucketization.
✅ Memory-aware execution: Maximize hardware utilization while respecting memory budgets.
✅ Drop-in support: Use @HuggingFace models with minimal code changes via TensorRT-LLM's #Python API.
🧠 How it works: MoE models activate only a subset of "experts" for each token. This dynamic nature is powerful - but hard to optimize. It's all done under the hood using custom #CUDA kernels and NCCL-based communication primitives, giving you low latency, high throughput, and better GPU scaling.
✨ TensorRT-LLM handles:
✅ Token-expert mapping using the gating network.
✅ Token sorting to batch same-expert tokens together.
✅ Expert parallel execution across GPUs.
✅ Merging outputs for final predictions.
🛠️ Developer workflow - here is the code to get started.
# Clone the repo
git clone https://lnkd.in/g-GiDX23
# Use included examples to load and run a Mixtral model
cd TensorRT-LLM/examples/mixtral
From there, the Python API lets you load the model, convert it with TensorRT, and run expert parallel inference - all with a few lines of code.
Results? 📈 Performance at scale. Tests show up to 2.3x faster inference throughput compared to standard tensor parallelism when using 8 GPUs and top-2 experts per token. Even better, TensorRT-LLM keeps efficiency high across increasing batch sizes.
Want to see it in action or contribute?
👉 Read the full tech blog: https://lnkd.in/g_7YV3vV
👉 Explore the code on GitHub: https://lnkd.in/gNjQ5W2U
👉 Follow updates in the TensorRT-LLM repo: https://lnkd.in/gqSHYQ4u
Share your experiences with us.
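To make "token-expert mapping" and "token sorting" concrete, here is a minimal numpy sketch of top-2 gating with per-expert bucketization. It shows the routing idea only; TensorRT-LLM does this in fused CUDA kernels with NCCL communication, not Python.

    import numpy as np

    rng = np.random.default_rng(0)
    n_tokens, d_model, n_experts, top_k = 16, 32, 8, 2
    x = rng.standard_normal((n_tokens, d_model))        # token activations
    w_gate = rng.standard_normal((d_model, n_experts))  # gating network

    # Token-expert mapping: softmax over expert logits, keep the top-k experts.
    logits = x @ w_gate
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    topk = np.argsort(probs, axis=1)[:, -top_k:]        # (n_tokens, top_k)

    # Token bucketization: group token indices by expert so each expert
    # processes all of its assigned tokens as one contiguous batch.
    buckets = {e: np.where((topk == e).any(axis=1))[0] for e in range(n_experts)}
    for e, idx in buckets.items():
        print(f"expert {e}: {len(idx)} tokens")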
-
Julius Kusuma
Meta • 3K followers
We developed an open-source AI tool to design concrete mixes that are stronger, more sustainable, and ready to build with faster - speeding up construction while reducing environmental impact. https://lnkd.in/gPCk8tCM
But the impact of this AI tool is not just hypothetical! Amrize used Meta's AI-based technologies to design a new low-carbon mix, and successfully deployed it in an at-scale slab-on-grade application at Meta's new data center in Rosemont, MN. Compared to the legacy mix, this new AI-designed mix is:
🦁 Stronger
⏱ Faster
🍃 Lower carbon
⏱️ The ideal set time
All this was achieved without needing any new materials, nor special equipment. Best of all, the AI is open-sourced. https://lnkd.in/g2KA7KZW
This work was featured in a Meta engineering blog article published today! https://lnkd.in/gBU9HY8H
-
Krishna Rupanagunta
2K followers
There is this paper from Stanford that posits that the path to AGI runs through LLMs - I think the more interesting part it discusses is LLMs as a System-1 substrate, and Agentic Systems as the System-2 coordination layer that selects, constrains and binds these LLMs to perform meaningful tasks. They highlight four areas of focus:
1/ Semantic Anchoring - binding the LLMs through examples, metaphors, RAG, tuning etc.
2/ Socratic Filtering - use multiple models to 'debate' to pass the reasonableness test as part of the reasoning process
3/ Independent Judges - AI judges anchored in the domain that block out weak responses that are not aligned to the overall goal
4/ Transactional Memory - that persists state over time.
While all these are part of most Agentic Systems that we build today, there are some interesting ideas under each of them that can enrich systems design. One interesting concept in the paper is 'Reasoning as a Phased Transition', where a tiny amount of context can override the pre-trained context, producing an abrupt flip in behavior. Something we all observe empirically in AI systems today - the paper develops a formal definition for identifying the threshold and provides an important insight: small boosts in the above-listed areas can shift results from generic to targeted, anchored in the domain context. This is not new; we see this in many physical and biological systems where smooth changes in a control variable yield sharp changes in the system. Lots of interesting ideas with direct applications to building Enterprise AI systems. https://lnkd.in/gvCDXMtg
-
PyTorch
318K followers
NVIDIA TensorRT LLM provides a high-level Python LLM API, with its PyTorch-native architecture enabling developers to experiment with the runtime or extend functionality. Learn how these latest TensorRT-LLM optimizations boost reasoning inference performance in our recent blog post. 🖇️ https://lnkd.in/gx52zwTA #PyTorch #OpenSourceAI #AI #Inference #Innovation
-
Colleen Farrelly
Post Urban • 13K followers
Positional encoding and persistent homology have been powerful tools for graph neural networks. This very recent ICML paper combines the two methodologies to leverage both algorithm strengths. It's an interesting new direction for TDA + neural network architectures. https://lnkd.in/eskZZfsW
-
John Pepin
International Capitalist Party • 112 followers
Two regimes. Two platforms. Two different shapes.
I've been testing my Two State Ontology (TSO) framework on quantum hardware - IBM Marrakesh (superconducting qubits) and Quandela (photonic). TSO predicts that the shape of decoherence depends on whether a system is being actively driven through a critical interaction threshold (Γ_c). Below Γ_c (idle qubits, no interaction): standard exponential decay. At Γ_c (driven dynamics, active coupling): sigmoid/tanh transition.
This week's results:
• IBM Marrakesh idle decay: Both superposition (X2) and entanglement (X1) circuits decay exponentially. ΔAIC > 30 - decisive. This is what TSO predicts when no path rotation is occurring.
• Photonic bunching cascade (Quandela SLOS): tanh preferred over exponential by ΔAIC > 25. Measured κ = 1.325 against TSO's prediction of 4/3 ≈ 1.333 - a 0.6% deviation.
• Kim et al. Rydberg re-analysis (129,791 shots): tanh beats exponential 2.75×.
The pattern: standard QM below the threshold, TSO-specific behavior at it. Same relationship statistical mechanics has with thermodynamics - ordinary behavior in most regimes, critical phenomena only at phase transitions.
TSO has zero free parameters. All constants derived from established physics. The decisive test remains: Rydberg atom arrays with tunable interaction strength through Γ_c, where TSO predicts ~35% deviation from exponential.
All notebooks public. All failures documented (including 3+ killed versions). The framework evolved from first principles before any hardware testing. https://incapp.org/TSO/
#QuantumComputing #Physics #QuantumFoundations #OpenScience
-
Alfonso R. Reyes
Oil Gains Analytics LLC • 11K followers
This is a fantastic article on using Claude Code for complex #computationalphysics problems.
The typical way of using #GenAI for #research is basically you sitting there, holding the agent's hand through every single step and then #validating. But now that models have gotten a lot better at long, complex tasks, there's this sort of new paradigm emerging - you define the high-level goal, set a team of agents loose, and go grab several coffees throughout the days of the experiment.
The article walks through a concrete example: a researcher at Anthropic (not even a cosmologist) using Claude Opus to build a differentiable Boltzmann solver - essentially code that models the #physics of the early universe right after the Big Bang. The secret sauce is basically a handful of #engineering patterns that are worth sharing:
1. A clear "mission file" (CLAUDE.md) that lives in the project root and tells the agent exactly what it needs to build, what the success criteria are, and any important design decisions. The agent can also update this file as it learns things.
2. A progress/changelog file that acts as long-term #memory across sessions - what worked, what didn't, why certain approaches were abandoned. Without this, every new session may end up repeating the same dead ends.
3. A test oracle - in this case, the reference implementation (the Cosmic Linear Anisotropy Solving System, a well-established #cosmology code) - so the agent always has a way to check whether it's actually making progress or just pretending.
4. Using #Git as a coordination mechanism: commit and push after every meaningful chunk of work, so we have a full #recoverable history and can check progress remotely.
5. The "Ralph loop" - a wrapper that pings the agent when it claims it's done and asks "are you really done?" This fights a known #failure mode where agents give up a bit early on complex tasks.
The result? Sub-percent #accuracy against the reference implementation, running on an HPC cluster over a few days. Not production-grade, but remarkably close for something built autonomously with minimal steering from someone outside the field.
It includes the #repository with code and details of the implementation, similar in style to Andrej Karpathy reporting his experiments during the last few weeks. https://lnkd.in/gsD4ESDx
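Pattern 5 is simple enough to sketch. A minimal Python version of the "Ralph loop" idea, where ask_agent is a hypothetical stand-in for one agent session (the article's actual tooling is Claude Code):

    def ask_agent(prompt: str) -> str:
        # Hypothetical: run one agent session, return its final message.
        return "DONE"  # placeholder so the sketch runs

    def ralph_loop(goal: str, max_rounds: int = 10) -> None:
        for _ in range(max_rounds):
            reply = ask_agent(goal)
            if "DONE" not in reply:
                continue  # agent still working; start another session
            # Agent claims completion: push back once before accepting.
            check = ask_agent("Are you really done? Re-run the tests and "
                              "check the success criteria in CLAUDE.md.")
            if "DONE" in check:
                break

    ralph_loop("Build a differentiable Boltzmann solver; see CLAUDE.md.")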
-
Mario Larcher
Canva • 5K followers
It had been on my list for a while to read the SID-1 technical report after seeing a post about how using OpenAI-style messages in RL can be surprisingly dangerous in multi-turn settings with many tool calls.
The first insight is about the messages abstraction. Converting a raw token stream into messages and then back into tokens is lossy because it changes how the exact byte sequence is tokenized. A concrete example is when the model generates a sequence of bytes that the tokenizer splits into two very common tokens with normal probabilities. After parsing and reformatting through the chat template, those same bytes can be re-serialized in a way that the tokenizer now maps to a single, very rare token that the model almost never produces. The reward is computed on the original tokens the model actually generated, but the log-probs used for the update correspond to this new token that was never sampled. Since this token has an extremely low probability, its log-prob has a large magnitude, and in policy gradient this translates into a disproportionately large gradient. That single artificial token can end up dominating the update, creating a feedback loop that gradually destabilizes training and eventually leads to collapse.
There is a second, distinct effect happening at the same time. Malformed tool calls or slightly wrong formatting can get "repaired" by the parser and chat template, so bad rollouts end up looking syntactically valid before they ever reach the trainer. The environment sees a correct tool call, the reward is good, and the trainer also sees a correct sequence. The model never receives signal that it actually produced something malformed. Stability is preserved, but the model does not learn tool correctness because the fixing layer steals the learning signal.
Their fix is conceptually simple. Use a strict Tokens-In Tokens-Out pipeline, where the trainer sees exactly the token sequence the model generated. No parsing, no chat templates, no message abstraction in between.
Another very interesting part of the report is their analysis of the "length debiasing" proposed in Dr. GRPO and similar works, where the per-token advantage is no longer normalized by the rollout length. This assumes that rollout length and quality are unrelated. In tool-use and reasoning-heavy environments this is not true. Bad rollouts tend to be longer than good ones. In that case, removing the length bias makes the average per-token advantage negative. Over long runs, this slowly pushes down the logits of all sampled tokens while pushing up the logits of tokens the model never uses, until the model starts emitting garbage or out-of-vocabulary tokens and collapses.
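The length-debiasing failure mode in that last paragraph reproduces with toy numbers. A minimal sketch, assuming a GRPO-style group of four rollouts where the bad ones are longer (all numbers made up):

    import numpy as np

    rewards = np.array([1.0, 1.0, -1.0, -1.0])  # two good, two bad rollouts
    lengths = np.array([50, 60, 300, 400])      # bad rollouts run longer
    adv = rewards - rewards.mean()              # group-relative advantage

    # Per-rollout length normalization: every rollout contributes equally.
    normalized = np.mean(adv)                             # 0.0, balanced signal
    # Without it, each token carries the full advantage, so the long (bad)
    # rollouts dominate and the mean per-token signal turns negative:
    unnormalized = np.sum(adv * lengths) / lengths.sum()  # ~ -0.73
    print(normalized, unnormalized)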
-
Martin Ryner
Linköpings universitet • 892 followers
Transformer attention layers operate by projecting high-dimensional token representations into low-rank subspaces defined by per-head key/query/value projections. This inherently places a rank constraint on how relationships between items can be represented, and it forces the model to approximate rich interactions via combinations of rank-limited components.
Our paper "Orthogonalization of data via Gromov-Wasserstein type feedback for clustering and visualization" provides a theoretical and algorithmic framework for dealing with full and low-rank structure in relational data representations by adapting transition probabilities and iteratively refining cluster orthogonality via Gromov-Wasserstein-inspired feedback. It shows how a low-rank Markov transition representation can be refined to better reflect true structure and interpretability, and how such refinements converge to meaningful solutions with spectral gap enhancement.
This connects directly to issues in multi-head attention:
- Both involve representing pairwise relationships (attention scores vs. transition affinities) in a full-rank or approximately low-rank form.
- Low-rank factorization can produce spurious grouping or similarity structure that doesn't reflect the true geometry, unless guided by careful feedback or regularization.
- The GW-feedback orthogonalization that we exemplify shows some ideas for how to improve, stabilize, and interpret the geometry of such low-rank relational structures, offering insight into how optimization can go astray in ill-conditioned, low-rank parameter spaces, but also how the implications of the approximation can be assessed.
Give it a read. Low-rank constraints are not just computational conveniences - they have real geometric and optimization implications, and without appropriate mechanisms (feedback, orthogonalization, spectral refinement), learned low-rank representations (like attention maps or transition matrices) can be misleading or unstable.
I'd like to hear your thoughts on this. Is it needed? Should we care? What's a good future architecture? What should I read? Merry Christmas!
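The rank constraint in the first paragraph can be stated in one line. In standard notation, with token matrix X in R^{n×d} and per-head projections W_Q, W_K in R^{d×d_k}:

\[
S = (X W_Q)(X W_K)^{\top} \in \mathbb{R}^{n \times n},
\qquad
\operatorname{rank}(S) \le d_k \ll n,
\]

so each head expresses at most a rank-d_k relation over the n tokens, and richer interactions must be approximated by sums of such rank-limited components across heads.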
-
AI Market Watch
2K followers
⚡ 10s AI News ⚡
Soumith Chintala, co-creator of PyTorch, is now CTO of Mira Murati's Thinking Machines Lab. This move pairs 11 years of Meta infrastructure expertise with OpenAI's product legacy to solve AI scaling limits. With PyTorch used in over 80% of research, Chintala will drive vertical integration between frameworks and frontier models. This consolidation of elite talent aims to optimize compute efficiency as the startup targets a $50 billion valuation. 🚀
#AI #PyTorch #TechLeadership #ThinkingMachineLabs #AIMarketNews #WeeklyVentures
-
Ramin Mehran
Google DeepMind • 4K followers
In this episode, we discuss Reinforcement Learning for Reasoning in Large Language Models with One Training Example by Yiping Wang, Qing Yang, Zhiyuan Zeng, Liliang Ren, Lucas Liu, Baolin Peng, Hao Cheng, Xuehai He, Kuan Wang, Jianfeng Gao, Weizhu Chen, Shuohang Wang, Simon Shaolei Du, Yelong Shen. The paper demonstrates that reinforcement learning with verifiable reward using only one or two training examples (1-shot RLVR) substantially improves mathematical reasoning in large language models, nearly doubling performance on benchmarks like MATH500. This method generalizes across different models, algorithms, and examples, showing unique phenomena such as post-saturation generalization and the importance of policy gradient loss and exploration encouragement. The authors provide open-source code and data, highlighting the potential for more data-efficient RLVR approaches in improving LLM capabilities.
-
Nishantha Ruwan
IWROBOTX Software Inc. • 2K followers
This paper investigates how reinforcement learning (RL) enhances large language models' (LLMs) ability to recall and navigate structured knowledge hierarchies (such as medical codes). The authors show that RL-enhanced models outperform both base and supervised fine-tuned (SFT) versions on hierarchical recall tasks. Crucially, they argue the improvement arises not from new factual memorization but from better procedural skill in knowledge traversal: SFT models, when given structured prompting to guide hierarchical navigation, recover much of the gap (from ~24pp to ~7pp). They further analyze internal model activations to reveal that RL modifies how the model queries knowledge rather than what it knows: factual-representation activations stay similar, but query representations diverge. https://lnkd.in/ggNa4nqD
-
Felix Xinnan Yu
Google DeepMind • 3K followers
New paper on improving LLMs' native capabilities of retrieval/ranking, to be presented at NeurIPS 2025: https://lnkd.in/eSX8vKPm Can we give an LLM a list of items in context and ask it to retrieve/rank the relevant ones? How does an LLM solve such a problem? Is attention doing something similar to embedding-based retrieval? How do we make it efficient with long context? Can we leverage embedding-learning techniques to improve quality?
-
Kamal Garg
Freelance • 3K followers
This is the fourth post in my series on PTC 2026. The last post outlined the architecture contrast between centralized training campuses and distributed inference networks. This post covers the toolchain changes that make distributed inference practical. Three themes came up repeatedly.
First, disaggregated serving. Multiple teams described separating prefill and decode across different GPU pools. These phases have different resource profiles, so treating them as one monolithic job leaves performance on the table. The NVIDIA Dynamo framework is making this operational.
Second, KV cache mobility. This was the topic I found most interesting. Engineers described work to externalize context state so it can move between nodes or sites. The goal is failover without full recomputation. Projects like LMCache and Mooncake act as the glue layers enabling this across inference engines like vLLM and SGLang.
Third, cache-aware routing. Instead of naive load balancing, inference requests route to nodes that already hold relevant context. It's the difference between treating inference as stateless compute versus a service with session affinity.
The consistency across conversations was striking. The software to treat inference as a routed, recoverable service is maturing fast. That changes who shows up to build. In the next post, I will talk about the ecosystem assembling around distributed inference and the conversations that made it feel real at PTC.
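The cache-aware routing theme is easy to sketch: hash a prompt prefix so requests that share context (and hence reusable KV cache) land on the same node. Illustrative only - production routers, including the projects named above, track actual cache state rather than hashing blindly:

    import hashlib

    NODES = ["node-a", "node-b", "node-c"]  # hypothetical inference pool

    def route(prompt: str, prefix_chars: int = 32) -> str:
        # Hash a fixed-length prefix (characters stand in for tokens here);
        # identical prefixes always map to the same node, giving prefix/
        # session affinity instead of naive round-robin balancing.
        key = hashlib.sha256(prompt[:prefix_chars].encode()).hexdigest()
        return NODES[int(key, 16) % len(NODES)]

    shared = "You are a support agent for ACME. Context: ..."
    print(route(shared + " question 1"))  # same node both times,
    print(route(shared + " question 2"))  # because the hashed prefix matches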
-
ScitiX
2K followers
Are you still measuring your LLM inference throughput solely by FLOPS? If you are scaling production-grade LLMs like Llama-3-405B or running complex RAG systems, optimizing for raw compute might be the wrong strategy.
In large-scale inference, we often see teams trying to run these models on standard H100 clusters - only to run into a memory wall. To fit the model weights and the massive KV cache footprint, you are often forced to rely on aggressive tensor parallelism, sharding the model across 8 to 16 GPUs. Essentially, you are paying for the compute of 16 units just to acquire the HBM capacity of 8.
📉 The technical reality:
⚠️ Drastically reduced Model FLOPS Utilization (MFU).
⚠️ High Time-to-First-Token (TTFT) due to interconnect bottlenecks.
⚠️ Skyrocketing inference unit economics.
To escape this trap, the architecture must shift from pure compute to memory density. Architectures with significantly higher HBM capacity, such as 141GB-class HBM3e designs, enable this transition:
1️⃣ Reduced complexity: Minimize sharding and orchestration overhead.
2️⃣ Lower latency: Keep data on-chip and reduce network dependency.
3️⃣ Better TCO: Stop paying the "memory tax" on legacy infrastructure.
As models continue to scale, the bottleneck has undeniably shifted. In the current era of #AIInfrastructure, memory bandwidth is the new compute.
#LLM #HPC #HBM3e #NVIDIAH200 #Scitix #TensorParallelism #AIInfra
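The post's "memory wall" checks out with back-of-envelope math. A sketch, assuming the published Llama 3.1 405B configuration (126 layers, 8 KV heads via GQA, head dim 128) and 2-byte (BF16) weights and cache:

    params = 405e9
    weights_gb = params * 2 / 1e9        # ~810 GB of weights in BF16
    print(weights_gb / 80)               # ~10.1 H100-80GB GPUs before any KV cache

    # Per-token KV cache: K and V, per layer, per KV head, per head dim.
    layers, kv_heads, head_dim, bytes_per = 126, 8, 128, 2
    kv_per_token = 2 * layers * kv_heads * head_dim * bytes_per
    print(kv_per_token / 1e6)            # ~0.52 MB per token per sequence

    # 100 concurrent sequences at 8k context add ~420 GB of cache, pushing
    # the total past an 8x80GB node and toward 16-way tensor parallelism.
    print(kv_per_token * 100 * 8192 / 1e9)   # ~423 GB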