What this is
AllArkive bundles three things that already exist into something one person can install in an afternoon.
- An offline knowledge archive — Wikipedia, Stack Exchange, Project Gutenberg, iFixit, and more, packaged as Kiwix ZIM files.
- A local AI — an open-weight LLM via Ollama and Open WebUI, running entirely on your machine. No cloud calls.
- A retrieval pipeline — RAG that lets the AI answer questions using the local archive, with citations back to the source.
Run it on a laptop, a home server, or a Raspberry Pi. Use it as a private research assistant. Keep it as a fallback for when the open web gets worse.
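To make the retrieval step concrete, here is a minimal sketch of "retrieve, then answer with citations". It is not AllArkive's pipeline: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and the `CHUNKS` data, source labels, and function names are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]


def build_prompt(question, hits):
    """Number each retrieved chunk so the model can cite it as [n]."""
    sources = "\n".join(
        f"[{i}] ({h['source']}) {h['text']}" for i, h in enumerate(hits, 1)
    )
    return (
        "Answer using only the sources below and cite them as [n].\n\n"
        f"{sources}\n\nQuestion: {question}"
    )


# Hypothetical archive chunks; real ones would come from the ZIM files.
CHUNKS = [
    {"source": "wikipedia/Solder", "text": "Solder is a fusible metal alloy used to join metal workpieces."},
    {"source": "ifixit/Battery", "text": "Disconnect the battery before you replace the phone screen."},
    {"source": "gutenberg/Moby-Dick", "text": "Call me Ishmael."},
]
```

Calling `build_prompt(q, retrieve(q, CHUNKS))` yields the text that would be handed to the local model. Because every source carries a number and a label, a reader can trace each claim in the answer back to the archive.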
Why
The infrastructure of shared knowledge is more centralised, more surveilled, and more hostile to users than it has ever been. The tools to run a useful piece of it on cheap hardware already exist — they just hadn't been packaged together. We packaged them.
This is not a survival kit, a prepper bunker, or a doomsday cache. It's a library and a search tool, built on open weights and open content, that runs on your own machine. The framing is censorship resistance, privacy, and educational access.
What's in the default bundle
The balanced bundle is recommended for most laptops: roughly 23 GB of ZIM data plus a 5 GB model, about 28 GB in total.
- Wikipedia (English, text-only) — about 12 GB
- WikiMed — medical reference, for general lookups, not medical advice
- iFixit — repair guides
- SuperUser — general tech Q&A
- Project Gutenberg — a public-domain book selection
Smaller (minimal) and larger (comprehensive) bundles are documented in the bundle docs.
Install
git clone https://github.com/clupai8o0/allarkive.git
cd allarkive
cp compose/.env.example compose/.env
./scripts/bootstrap.sh
After install, open http://localhost:8080 for the local landing page.
Heads-up: after the stack starts, the RAG indexer keeps running in the background, embedding every ZIM chunk through your local Ollama instance. On CPU, expect several hours for the balanced bundle and less for the minimal one. Leave it running: it is resumable and idempotent, so an interrupted run picks up where it stopped and re-running it does not redo finished work. Kiwix browsing at http://localhost:8081 works immediately; RAG answers improve as index coverage grows.
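For readers curious what "resumable and idempotent" can look like in practice, here is a minimal sketch. It assumes chunks are identified by a hash of their text and that finished work is recorded in SQLite; the real indexer may do this differently, and the table and function names are hypothetical.

```python
import hashlib
import json
import sqlite3


def chunk_id(text):
    """Stable ID derived from the chunk's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def open_index(path=":memory:"):
    """Open (or create) the store that records finished chunks."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embedded (chunk_id TEXT PRIMARY KEY, vector TEXT)"
    )
    return conn


def index_chunks(conn, chunks, embed_fn):
    """Embed only chunks not yet recorded; return how many were new."""
    new = 0
    for text in chunks:
        cid = chunk_id(text)
        done = conn.execute(
            "SELECT 1 FROM embedded WHERE chunk_id = ?", (cid,)
        ).fetchone()
        if done:
            continue  # already embedded in an earlier run
        vector = embed_fn(text)  # the slow step, e.g. a call to a local model
        conn.execute(
            "INSERT OR IGNORE INTO embedded VALUES (?, ?)", (cid, json.dumps(vector))
        )
        conn.commit()  # commit per chunk so an interrupted run loses little work
        new += 1
    return new
```

Running `index_chunks` twice over the same chunks embeds nothing the second time, and stopping it midway only costs the chunk that was in flight.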
System requirements
- Minimum: 8 GB RAM, 30 GB disk, x86_64 or arm64
- Recommended: 16 GB RAM, 50 GB SSD, modern CPU
- Pi text-only build: Raspberry Pi 4 (4 GB+) with a USB SSD
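A quick preflight check against the minimum line above might look like the sketch below. It is not part of the project's scripts. It assumes "GB" means 10^9 bytes, that the disk figure refers to free space in the current directory, and that `aarch64` and `AMD64` are alternative names for the two supported architectures.

```python
import os
import platform
import shutil

GB = 10**9  # assumption: decimal gigabytes
MIN_RAM_GB = 8
MIN_DISK_GB = 30
SUPPORTED_ARCHES = {"x86_64", "arm64"}
ARCH_ALIASES = {"amd64": "x86_64", "aarch64": "arm64"}


def normalise_arch(machine):
    """Map platform-specific names onto the two names used in the requirements."""
    name = machine.lower()
    return ARCH_ALIASES.get(name, name)


def check_minimum(ram_gb, free_disk_gb, machine):
    """Return a list of human-readable problems; empty means the minimum is met."""
    problems = []
    if ram_gb is not None and ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f} GB < {MIN_RAM_GB} GB")
    if free_disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {free_disk_gb:.1f} GB free < {MIN_DISK_GB} GB")
    if normalise_arch(machine) not in SUPPORTED_ARCHES:
        problems.append(f"Architecture: {machine} is not x86_64 or arm64")
    return problems


def detect():
    """Best-effort readings for this machine; RAM is None where it cannot be read."""
    try:
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GB
    except (AttributeError, ValueError, OSError):
        ram_gb = None  # e.g. Windows, which has no os.sysconf
    free_disk_gb = shutil.disk_usage(".").free / GB
    return ram_gb, free_disk_gb, platform.machine()
```

Calling `check_minimum(*detect())` from the directory you plan to install into returns an empty list when the machine clears the minimum.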
Architecture
Three layers, each replaceable.
┌──────────────────────────────────────────┐
│ Glue: landing page, docker-compose, │
│ bootstrap, RAG pipeline, docs │
├──────────────────────────────────────────┤
│ Local AI: Ollama + Open WebUI + RAG │
├──────────────────────────────────────────┤
│ Archive: Kiwix serving ZIM files │
└──────────────────────────────────────────┘
Full breakdown in the architecture doc.
Documentation
Pick your starting point. Each doc has its own page with a sidebar and on-page table of contents.
Read first
- README — one-line pitch, install snippet, license
- Architecture — three-layer design, components, data flow
- Threat model — what this does and does not protect against
- Roadmap — what's in v0.1, what's deferred
Install
Deployment patterns
- Raspberry Pi (full stack)
- Raspberry Pi as a dedicated archive node
- Split AI and archive across two machines
- Opt-in LAN access
Contributing and governance
- Contributing — DCO sign-off, PR checklist
- Code of conduct
- Governance — how decisions get made
- Security — how to report a vulnerability
- Full doc index
Security and honesty
- Default-local binding. Nothing listens on the public internet unless you opt in.
- No telemetry. Anywhere. Ever.
- Pinned dependencies. Docker images pinned by digest. ZIM files verified by checksum.
- Signed releases. Verify the artefacts you download.
- Honest disclaimers. The AI can be wrong. RAG with citations makes its output checkable, not infallible.
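Checksum verification of a downloaded file can be done by hand along the lines of the sketch below. It assumes the published checksum is a SHA-256 hex digest, which this page does not state, and the function names are hypothetical.

```python
import hashlib
import hmac


def sha256_of(path, block_size=1024 * 1024):
    """Hash a file in blocks so multi-gigabyte ZIM files never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA-256 matches the published hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, `verify("some-archive.zim", published_digest)` returns `True` only when the local file matches the digest, where both the filename and the digest variable are placeholders.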
Status
v0.1 is alpha. First public release lands alongside our BSides Melbourne talk in May 2026.
Co-built by Sam and Sham. Standing on the shoulders of Kiwix, Ollama, Open WebUI, and the people who maintain the open archives we bundle.