
Euclyd promises an exascale token factory with minimal power draw

Eindhoven-based Euclyd's inference architecture promises the lowest power and cost per token for the next wave of agentic AI.

Published on September 21, 2025


Euclyd's Ingolf Held at the AI Infra Summit in Santa Clara, California © Euclyd

Bart, co-founder of Media52 and Professor of Journalism, oversees IO+, events, and Laio. A journalist at heart, he keeps writing as many stories as possible.

A new Eindhoven-born chip startup, Euclyd, has emerged from stealth with CRAFTWERK, an inference architecture that it claims delivers the lowest power and cost per token for the next wave of agentic AI. The company unveiled the design at the Kisaco AI Infra Summit in Santa Clara.

At the heart of the system is a palm-sized SiP (System-in-Package) that carries 16,384 custom SIMD processors and 1 TB of “Ultra Bandwidth Memory,” with a claimed bandwidth of 8,000 TB/s. Euclyd says a single SiP peaks at 8 PFLOPS (FP16) or 32 PFLOPS (FP4) and slots into a rack product, CRAFTWERK STATION CWS 32, that aggregates 32 SiPs to 1.024 exaFLOPS (FP4) and 32 TB of on-package memory. In multi-user mode, the rack is projected to generate 7.68 million tokens per second, “a 100× improvement” in power efficiency and cost per token versus leading alternatives.
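The rack-level numbers follow directly from the per-SiP figures. A few lines of Python make the aggregation explicit (illustrative only; every figure here is Euclyd's own projection, not measured silicon):

```python
# Per-SiP figures as stated by Euclyd (modeled projections).
SIPS_PER_RACK = 32
PFLOPS_FP4_PER_SIP = 32   # peak FP4 compute per SiP
MEMORY_TB_PER_SIP = 1     # "Ultra Bandwidth Memory" per SiP

# Aggregate figures for the CRAFTWERK STATION CWS 32 rack.
rack_pflops_fp4 = SIPS_PER_RACK * PFLOPS_FP4_PER_SIP   # 1,024 PFLOPS
rack_exaflops_fp4 = rack_pflops_fp4 / 1000             # 1.024 exaFLOPS
rack_memory_tb = SIPS_PER_RACK * MEMORY_TB_PER_SIP     # 32 TB on-package

print(rack_exaflops_fp4, rack_memory_tb)  # 1.024 32
```

The quoted 1.024 exaFLOPS and 32 TB are thus straight multiplications of the per-package spec by 32 packages.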

“Our Crafted Compute philosophy reimagines inference from the ground up, with custom processors, custom memory, and advanced 2.5D/3D packaging,” said Bernardo Kastrup, founder & CEO. Peter Wennink, Euclyd investor and former ASML CEO, added: “AI inference will dominate datacenter silicon. CRAFTWERK’s breakthrough economics will accelerate agentic AI adoption.”


What’s new and what’s still a model

Euclyd stresses CRAFTWERK is “in advanced design”, not shipping silicon yet; the headline figures are modeled projections against Meta’s Llama 4 Maverick family. That matters because inference metrics are notoriously apples-to-oranges (throughput per user vs aggregate, model size, quantization, and concurrency all change the story). NVIDIA, for example, recently highlighted over 1,000 tokens/sec per user for Llama 4 Maverick on a DGX B200 (Blackwell) node, while Cerebras claimed 2,522 tokens/sec on its wafer-scale system; these are impressive single-user speeds that measure a different dimension than Euclyd’s multi-user rack-level total.

Independent coverage from Bits&Chips adds useful color: several Silicon Hive veterans (a famed Eindhoven DSP/IP team later acquired by Intel) are in Euclyd's leadership, and the publication reiterates the SiP specs and the 7.68M tok/s at 125 kW system projection.

Why this matters: Energy is the new bottleneck

The timing is sharp. The IEA projects data centre electricity demand to more than double by 2030, with AI-specific workloads quadrupling, a backdrop that makes tokens-per-kilowatt a board-level KPI rather than a nerdy statistic. If Euclyd's numbers hold up in silicon, ~125 kW for exascale-class FP4 could be notable, especially as rivals chase speed by piling on ever-hotter accelerators.
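What "tokens-per-kilowatt" means for Euclyd's projection can be made concrete with a quick back-of-the-envelope calculation (using the company's own modeled figures of 7.68 million tokens per second at 125 kW, neither yet demonstrated in silicon):

```python
# Euclyd's projected rack-level figures (modeled, not measured).
TOKENS_PER_SECOND = 7_680_000  # multi-user aggregate throughput
POWER_KW = 125                 # projected rack power draw

# Tokens produced per joule of energy consumed.
tokens_per_joule = TOKENS_PER_SECOND / (POWER_KW * 1000)

# Tokens per kilowatt-hour (1 kWh = 3.6 million joules).
tokens_per_kwh = tokens_per_joule * 3_600_000

print(tokens_per_joule)  # 61.44 tokens per joule
print(tokens_per_kwh)    # 221184000.0, i.e. ~221 million tokens per kWh
```

On those projections, one kilowatt-hour would buy roughly 221 million tokens, the kind of unit economics the "100×" claim would need to survive independent benchmarking.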


Euclyd’s pitch leans into that: custom SIMD compute, tightly coupled memory (“UBM”), and advanced 2.5D/3D packaging to shorten data paths, all classic levers for lowering joules per token. The company frames this as “compute with purpose,” marrying performance, cost, and environmental footprint, very much in line with the Brainport narrative that useful AI must also be efficient AI.

The Eindhoven angle — and heavyweight mentors

Euclyd is headquartered in Eindhoven with a San Jose office, and lists Peter Wennink, Federico Faggin (microprocessor pioneer; Intel 4004, Zilog, Synaptics) and Steven Schuurman (Elastic founder) as mentors/backers, names that signal both semiconductor craft and scale-up savvy. For a region that already houses ASML, NXP, and a robust packaging ecosystem, a purpose-built inference startup is right on brand.

For now, Euclyd has put down an audacious marker. As Kastrup puts it, “We’ve engineered every gate for maximum efficiency and minimal power draw.” If that ethos survives first silicon, CRAFTWERK could make Brainport-crafted inference a talking point well beyond Eindhoven.

Watt Matters in AI


Watt Matters in AI is a conference exploring the potential of AI with significantly improved energy efficiency. In the run-up to the conference, IO+ is publishing a series of articles describing the current situation and potential solutions.
