Synaptic Tip-Offs

Tip-Offs | Edition #46

Each week, we surface promising startups & founders through data signals. 

This week, we're spotlighting the inference economy →

Every token is a P&L line now

The money in AI has shifted from training models to running them.

Inference is becoming the biggest cost in AI. Deloitte projects inference will make up roughly two-thirds of all AI compute in 2026, up from one-third just three years ago. At the same time, per-token costs have fallen ~1,000x since late 2022. This sharp drop has made AI agents commercially viable for the first time.

However, simply running AI models is fast becoming a commodity with shrinking margins. The real opportunity lies in infrastructure that makes inference faster and cheaper, through software techniques such as caching, KV cache reuse, and speculative decoding, and through custom chips.
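
To make KV cache reuse concrete: when many requests share a prompt prefix (a long system prompt, an agent's tool definitions), an engine can keep the attention keys and values it already computed for that prefix and prefill only the new tokens. The toy sketch below stores token prefixes in a trie and counts how many tokens each request still has to compute. It tracks no real tensors, and the class and method names are hypothetical. Production engines such as SGLang use a more elaborate radix-tree cache with eviction.

```python
class PrefixCache:
    """Toy KV-cache reuse model: a trie of token prefixes already prefilled.

    A real engine would store key/value tensors at each node; this sketch
    only records which prefixes exist so it can count the work saved.
    """

    def __init__(self) -> None:
        self.root: dict = {}

    def prefill(self, tokens: list[str]) -> int:
        """Return how many tokens still need computing, then cache the prompt."""
        node = self.root
        reused = 0
        # Walk the trie along the prompt for as long as tokens are cached.
        for tok in tokens:
            if tok not in node:
                break
            node = node[tok]
            reused += 1
        # Extend the trie with the uncached suffix so later requests reuse it.
        for tok in tokens[reused:]:
            node = node.setdefault(tok, {})
        return len(tokens) - reused


system = ["You", "are", "a", "helpful", "agent", "."]
cache = PrefixCache()
print(cache.prefill(system + ["Summarize", "this"]))  # 8: cold cache
print(cache.prefill(system + ["Translate", "this"]))  # 2: 6-token prefix reused
print(cache.prefill(system + ["Translate", "this"]))  # 0: whole prompt reused
```

Only the unshared suffix costs prefill compute, which is why long shared system prompts and multi-turn agent loops gain the most from this kind of reuse.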

As Theory Ventures illustrates, optimizing inference can cut delivery costs to around $0.70 while maintaining the same 30% margin.
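
The margin arithmetic can be checked in a few lines. This is a minimal sketch that assumes the $0.70 is the delivery cost per unit sold (per request or per bundle of tokens) and the 30% is gross margin, i.e. (price - cost) / price. Both readings are assumptions, since the figure is quoted without units. Under them the implied price is $1.00, and because price = cost / (1 - margin), any further cut in delivery cost can be passed through to price without eroding the margin. The function names are hypothetical.

```python
def gross_margin(price: float, cost: float) -> float:
    """Gross margin as a fraction of price: (price - cost) / price."""
    return (price - cost) / price


def price_for_margin(cost: float, margin: float) -> float:
    """Price that earns `margin` on a unit costing `cost` to deliver."""
    return cost / (1 - margin)


# Assumption: $0.70 delivery cost per unit, 30% target gross margin.
price = price_for_margin(0.70, 0.30)
print(f"implied price: ${price:.2f}")                    # implied price: $1.00
print(f"check margin: {gross_margin(price, 0.70):.0%}")  # check margin: 30%

# Halving the delivery cost halves the price at the same margin.
print(f"price at $0.35 cost: ${price_for_margin(0.35, 0.30):.2f}")  # price at $0.35 cost: $0.50
```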

Whoever captures inference usage over the next 3–5 years will hold the pricing power after that.

Inference spend growth over twelve quarters by Theory Ventures

Inference Infrastructure Startups

Startups cutting the cost, latency & memory overhead of running AI models in production. Ranked by website visit growth over the past 3 months.

RunAnywhere | 287% growth in 3 months 
An on-device inference layer that builds engines to run AI models directly on the hardware companies already own. Raised a $500K Pre-Seed in October 2025. 
Redwood City, US

InferX | 168% growth in 3 months  
A serverless inference engine that loads models instantly and packs GPUs to high utilization, cutting cost and latency for AI deployments. Raised an undisclosed Seed round in December 2025.
Seattle, US

RightNow AI | 119% growth in 3 months 
An AI-powered code editor that optimizes CUDA kernels to squeeze more performance out of NVIDIA GPUs. Raised a $540K Seed in May 2026. 
Amman, Jordan

Cactus Compute | 115% growth in 3 months 
An on-device SDK that deploys AI models straight onto mobile hardware, with low latency, offline support, and no data leaving the phone. Raised a $500K Pre-Seed in September 2025. 
San Francisco, US

Tensormesh | 59% growth in 3 months  
An AI-native caching layer that sits in front of LLM inference, cutting cost & latency for enterprises running models at scale. Raised a $20M Seed in May 2026.
Foster City, US

Inference Infrastructure Founders

Recently funded founders building the serving, caching & GPU-optimization infrastructure that runs AI models in production.

Mike Henry 
Founder of AI-chip maker Mythic ($165M raised). Later built the cloud business at Groq. Now co-founder and CEO of Parasail, a pay-per-token "AI Supercloud" for inference. Raised a $32M Series A in April 2026.
San Francisco, US

Ying Sheng 
Stanford PhD. Co-creator of SGLang, the open-source inference engine. AI infrastructure veteran from xAI and Nvidia. Now co-founder of RadixArk, commercializing SGLang for frontier-scale serving. Raised a $100M Seed at a $400M valuation in May 2026.
Stanford, US

Zain Asgar
Adjunct professor of computer science at Stanford. Former Google engineer. Co-founded Pixie Labs, acquired by New Relic. Now co-founder and CEO of Gimlet Labs, a multi-silicon inference cloud that runs AI across CPUs, GPUs, and custom chips. Raised an $80M Series A in March 2026.
San Francisco, US

Steven Arellano
Previously a software engineer at Google, 2σ, and Sei Labs. Now co-founder of Wafer, a high-performance inference engine for deploying custom AI models with optimized throughput and latency. Raised a $3.4M Seed in April 2026.
San Francisco, US

Andrea Pili 
Former CMO at Queryo & CIO at Apply, where the Xference project was born. Now founder and CEO of Xference, a cross-platform inference engine for deploying models efficiently and securely in production. Raised an $824K Pre-Seed in February 2026.
Cagliari, Italy 

Found some interesting startups? Get the next edition in your inbox.