What specific customer or product inflection convinced you to move from Recognite’s autonomy focus to a new data-center inference mission?
Foundationally, we started from first principles: how do you do AI processing more efficiently per square millimetre of silicon and per watt? AI is trillions of multiplies and linear-algebra ops, and in silicon a multiply costs ~20× the area and ~20× the energy of an add. By moving computation into the log domain—because log(A×B) = log A + log B—those trillions of multiplies become adds, radically shrinking area and power.
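To make the log-domain identity concrete, here is a minimal Python sketch of a dot product in which every multiply is replaced by an add of base-2 logarithms. The helper names (`to_log`, `log_mul`, `to_linear`, `log_dot`) are hypothetical, and so is the choice to convert each product back to linear before accumulating. The source does not describe how the chip encodes values or accumulates sums, so this illustrates only the arithmetic idea, not the hardware data path.

```python
import math

def to_log(x):
    """Encode a nonzero real as (sign, log2|x|). Illustrative encoding only."""
    if x == 0:
        raise ValueError("zero needs a special encoding in a log number system")
    return (1 if x > 0 else -1, math.log2(abs(x)))

def log_mul(a, b):
    """Multiply two log-encoded values: signs multiply, exponents add."""
    return (a[0] * b[0], a[1] + b[1])

def to_linear(v):
    """Decode (sign, log2|x|) back to a linear-domain float."""
    sign, exponent = v
    return sign * 2.0 ** exponent

def log_dot(xs, ys):
    """Dot product whose multiplies are adds in the log domain.

    Assumption: each product is mapped back to linear before summing.
    """
    return sum(to_linear(log_mul(to_log(x), to_log(y))) for x, y in zip(xs, ys))

xs = [1.5, -2.0, 0.25, 8.0]
ys = [4.0, 0.5, -16.0, 0.125]
print(log_dot(xs, ys), "vs linear:", sum(x * y for x, y in zip(xs, ys)))
```

Here the logarithm and the `2.0 ** exponent` step are computed exactly in floating point; the cost and accuracy of that conversion in hardware is the part the rest of the interview is about.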
We built and proved a chip using this log-math architecture for video, then broadened our architecture for all AI workloads just as the ChatGPT moment hit. We realised we had the right engine to design an inference-focused data-center chip—with the right interfaces, memory, and interconnect—and partnered across the stack to deliver a full system: think of a seven-foot rack that slots in as an inference processor. With application complexity and user counts surging, we see the market trending toward ~90% inference / ~10% training by 2030, so we’ve architected purely for inference economics: tokens per CapEx dollar and tokens per watt.
Walk us through the LNS (log-number-system) data path—addition in log, crossing back to linear—and how you achieved >99.9% relative accuracy on LLAMA and diffusion models.
Log number systems are established; the question is how to do them most efficiently. All of this is abstracted away from the user. You bring a trained model and graph, and our SDK quantises into the log domain, compiles, and deploys to hardware. The end user never “sees” the log math; it happens inside our toolchain.
The key steps are quantising from linear to log, performing the computation, and mapping back. Traditional approaches lean on lookup tables, but those erase the efficiency gains you made. Over several years we developed optimisations, going beyond, for example, a basic Mitchell approximation, to convert from log back to linear with very low error, and we implemented them directly in silicon. Because a log representation has a very high dynamic range (our sensory systems for sound and light also operate effectively on log scales), we can process a large spectrum efficiently and maintain high accuracy across text, image, and video models.
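For reference, the basic Mitchell approximation mentioned above can be sketched in a few lines of Python. It approximates log2(1 + m) by m for 0 <= m < 1, and 2^f by 1 + f for 0 <= f < 1, so both conversions need only shifts and adds. This is the baseline the answer says the company goes beyond; the proprietary optimisations are not described in the source and are not shown here. The function names and the error sweep are illustrative assumptions.

```python
import math

def mitchell_log2(x):
    """Approximate log2(x) for x > 0.

    Write x = 2**k * (1 + m) with 0 <= m < 1, then log2(x) ~ k + m.
    """
    mantissa, exp = math.frexp(x)   # x = mantissa * 2**exp, 0.5 <= mantissa < 1
    k = exp - 1
    m = mantissa * 2.0 - 1.0        # rescaled so that 0 <= m < 1
    return k + m

def mitchell_exp2(y):
    """Approximate 2**y: split y = k + f with 0 <= f < 1, then 2**y ~ 2**k * (1 + f)."""
    k = math.floor(y)
    f = y - k
    return math.ldexp(1.0 + f, k)

def mitchell_mul(a, b):
    """Multiply positive a and b using approximate log, add, approximate antilog."""
    return mitchell_exp2(mitchell_log2(a) + mitchell_log2(b))

# Sweep operands over [1, 2) and record the worst relative error.
worst = 0.0
steps = 200
for i in range(steps):
    for j in range(steps):
        a = 1.0 + i / steps
        b = 1.0 + j / steps
        rel = abs(mitchell_mul(a, b) - a * b) / (a * b)
        worst = max(worst, rel)

print(f"worst relative error on this grid: {worst:.4f}")
```

On this grid the worst case occurs at a = b = 1.5, where the uncorrected approximation is off by one ninth (about 11%). That gap is why an unrefined Mitchell scheme is treated only as a starting point for a low-error conversion.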
You claim 8× power efficiency versus NVIDIA. What’s the benchmarking protocol, and when will third-party benchmarks be available on production silicon or simulators?
MLPerf, the benchmark suite from MLCommons, is the standard, and our comparisons reference publicly released results on workloads such as LLAMA-70B and LLAMA-405B. Today, we measure on a bit-accurate simulator that mirrors the exact math path we implement in silicon; we built that discipline with our prior chip, so we have high confidence that the simulator results are representative of silicon.
We publish numbers conservatively and will validate on hardware. We’ll have hardware out next year, run the MLPerf benchmarks on silicon, and release the product next year as well. We’ll share dates and results when we announce, and ensure independent, third-party visibility.
What’s the developer path from a PyTorch model to a production service on your system—tooling, supported ops, fallbacks—and who will validate simulator accuracy on silicon across LLAMA and diffusion?
We’re already working with customer partners. Our SDK ingests trained models from PyTorch (and supports Triton and other formats), quantises to the log domain, compiles, and deploys. An initial SDK went to partners; a fuller customer build will be available for evaluation and feedback in roughly six to eight weeks.
We support the operators used in publicly released models, across multimodal workloads. The plan is to release the customer SDK in November, enabling partners to validate our simulator-predicted accuracy and then correlate on silicon as hardware becomes available next year.
Taking a wider view, have you ever seen anything like today’s AI moment—in promise and in risk?
The closest parallel is the internet boom of the 1990s, which transformed communication and brought the world closer. I believe the AI age will surpass it in overall societal impact. You can already see it in everyday tasks such as search and report writing, and in domains like finance and law, where AI accelerates people’s work and expands their capacity.
We’ve moved from large language models and chatbots to the “age of reasoning,” with slower, context-rich models that demand ~100× more compute. This year is the “age of agents,” which multiplies inference demand as agents act on those reasoning models. Next comes video: moving from offline creation to real-time HD video generation, enabling natural, live conversations with AI agents. That will drive another step-function increase in compute and applications, especially on the inference side.
Were you ever scared about rebranding from Recognite to Tensordyne? And to confirm: SDK in November; hardware and product release next year?
No moments of terror—this evolution matches where the company and market are heading. We’ve moved from building chips to building full systems—accelerators plus racks—squarely targeted at data-center inference. The name Tensordyne fits what we’re doing, the timing is right, and feedback has been very positive.
Yes, to summarise: the customer SDK goes out in November. Hardware, benchmarking on silicon, and the product release are planned for next year. We’ll share specifics as we announce.
You’re headquartered in Silicon Valley and Munich. Do the hubs bring different strengths to Tensordyne?
The dual-hub model has been fantastic. In the U.S., the team focuses on chip design, hardware and systems, and software; in Munich, we’ve built a world-class AI team. AI is math at its core, and Europe—TUM in particular—produces outstanding math talent that works hand-in-glove with our chip designers.
Code, models, and requirements flow continuously between Munich and Silicon Valley. Two locations, one team: the collaboration ensures workloads are optimised end-to-end to run efficiently on our architecture.