Infernet Protocol

We're doing to AI what Bitcoin did to money.

A peer-to-peer GPU compute marketplace — inference and distributed training. Run one CLI command, point it at the hardware you have, and start earning crypto. No native token, no rent extraction, no permission required.

For operators

Earn crypto for the GPU you already have

Run any model you can serve — Qwen, Llama, Mistral, your own. The control plane routes paying jobs to you and pays out in whichever coin you pasted in. No native token to hold, no platform spread above market.

For clients

Pay in any chain you want

Submit chat, training, or embedding jobs and pay in BTC, ETH, SOL, USDC on multiple chains, Lightning, or whatever CoinPayPortal supports. The provider you got routed to gets paid; you don't have to learn a new asset to use the network.

For the protocol

The control plane is convenience, not dependency

Operators authenticate with a Nostr keypair — never a database credential. Discovery bootstraps from infernetprotocol.com, then peers gossip via libp2p Kademlia + Nostr relays. The site can go dark and the network keeps working.
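As a sketch of what that keypair identity looks like on the wire: Nostr events are identified by a hash defined in NIP-01, and an operator signs that id with its key. The signing step (a BIP-340 Schnorr signature) is omitted below, and the key and content values are illustrative, not real Infernet specifics.

```python
import hashlib
import json

def nostr_event_id(pubkey_hex, created_at, kind, tags, content):
    # NIP-01: an event's id is the sha256 of this canonical JSON array,
    # serialized with no extra whitespace.
    payload = json.dumps(
        [0, pubkey_hex, created_at, kind, tags, content],
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Hypothetical operator pubkey (real ones derive from a secp256k1 key).
pk = "ab" * 32
eid = nostr_event_id(pk, 1700000000, 1, [], "hello infernet")
print(eid)  # 64 lowercase hex chars
```

Because the id is content-addressed and the signature covers it, any relay or peer can verify an operator's announcement without ever consulting a central database.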

The honest version

Not a decentralized hyperscaler. The economic substrate for jobs that don't need NVLink.

Hyperscalers earn their interconnect moat on one specific workload: synchronous tensor-parallel training of frontier-scale models, where microsecond-level GPU-to-GPU latency matters. We don't try to compete there. The math doesn't work, and we'd be selling a fiction. Here's what does work on a peer-to-peer GPU network — and why it's most of the market.

Live today

Single-GPU & CPU inference

Run any chat model your hardware can serve — Qwen, Llama, Mistral — through Ollama on whatever GPU or CPU you have. One model, one box, one request. Zero NVLink penalty. The dominant inference pattern by request volume, and the one a peer network is actually best at.

Coming next

vLLM + ComfyUI endpoints

OpenAI-compatible chat/completions via vLLM (drops into every existing tool that already speaks OpenAI), and image generation via ComfyUI. Both slot into the same engine-adapter pattern Ollama uses today. Embarrassingly parallel batch + LoRA fine-tunes + federated/DiLoCo-style training round out the workload set.
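"OpenAI-compatible" concretely means accepting the standard chat/completions request body, so existing tools only need a different base URL. A minimal sketch of that body; the base URL and model tag below are placeholders, not real Infernet values.

```python
import json

# Hypothetical provider-node endpoint; any OpenAI client can point here.
BASE_URL = "https://provider.example/v1"

# The standard OpenAI chat/completions request shape, POSTed to
# f"{BASE_URL}/chat/completions".
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # whatever the node serves
    "messages": [{"role": "user", "content": "what is 2+2?"}],
    "stream": True,  # ask for tokens as they are generated
}

print(json.dumps(payload, indent=2))
```

Because the shape is unchanged, swapping a provider node in for api.openai.com is a one-line config change in most SDKs and tools.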

Not our market

Hyperscaler-only workloads

Tight-sync 100B+ training, 64-node Slurm clusters, rent-a-Linux-box pod hosting. Maybe 20 orgs on Earth do those at scale; they own their fleets and they're not our customers. Conceding them costs nothing and frees us to build the protocol the rest of the market actually wants.

On the ASIC future. Every flagship phone already ships with an inference accelerator. Apple Neural Engine, Qualcomm Hexagon, Tenstorrent, Groq's LPU, Cerebras, AWS Trainium — silicon optimized for matmul keeps getting cheaper, weirder, and more ubiquitous. Bitcoin's real lesson isn't that ASICs killed CPU mining. It's that the protocol survived three hardware generations because it didn't depend on any of them.

Infernet is the protocol layer. Whatever silicon shows up next, the matchmaking, escrow, reputation (CPR), and payment routing don't change.

Get started

Run a node in two commands.

Linux, macOS, or Windows (via WSL2). Re-run the installer anytime to update. Operators with their own hardware just point Infernet at it and configure payouts; the rest of the stack (Ollama, firewall, daemon) is handled by infernet setup.

  • Install Ollama, pull a model, open the firewall
  • Configure your payout addresses (BYO wallet or generated)
  • Register with the control plane and start the daemon
  • Windows? wsl --install -d Ubuntu, then run the same one-liner inside it

Install

curl -fsSL https://infernetprotocol.com/install.sh | sh

Then

infernet setup            # bootstrap Ollama + model + firewall
infernet "what is 2+2?"   # default verb is chat

Works on

One installer, host-agnostic. The script auto-detects whichever volume your GPU box mounts and puts the install there.

Try it now

The public playground at /chat routes through real provider nodes.

When the network has live providers, your prompt goes to one of them and tokens stream back over SSE. When it doesn't (early-launch reality), it falls back to NVIDIA NIM so the demo never breaks. Either way, it's the real wire.
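Streaming over SSE means each token arrives on a `data:` line, terminated by a `[DONE]` sentinel in the OpenAI streaming convention. A minimal client-side parsing sketch; the field name in the fake payload is illustrative, not the network's actual wire format.

```python
def iter_sse_data(lines):
    """Yield the payload of each `data:` line in an SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and `:` comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield payload

# Fake stream standing in for a real HTTP response body.
fake = ['data: {"token": "4"}', "", "data: [DONE]"]
print(list(iter_sse_data(fake)))  # ['{"token": "4"}']
```

The same loop works whether the tokens came from a provider node or the NIM fallback, which is the point: the client never needs to know which one answered.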

Open /chat →