Deployment: Raspberry Pi text-only stack (pattern B)
This guide runs the full AllArkive stack—kiwix-serve, Ollama, the RAG
service, Open WebUI, and the landing page—on a Raspberry Pi 4 or Pi 5
using a USB SSD for storage. The "text-only" label means the recommended
bundle (minimal) uses text-only ZIM archives; the AI and
RAG layers are present and fully functional.
Time estimate: two to four hours, mostly downloading images and model weights.
Hardware requirements
| Component | Minimum | Recommended |
|---|---|---|
| Board | Raspberry Pi 4 (4 GB RAM) | Pi 4 (8 GB) or Pi 5 (8 GB) |
| Storage | USB SSD, 32 GB free | USB SSD, 64 GB+ |
| Power | Official 5 V / 3 A adapter | Official 5 V / 5 A (Pi 5) |
| Network | Ethernet or Wi-Fi | Ethernet for downloads |
Do not use an SD card for AllArkive data. SD cards
fail under the sustained random-write load of model inference and vector
indexing. Boot from the SD card, but store everything under /mnt/ssd on
the USB SSD.
RAM guidance by model
| Pi RAM | Recommended chat model | Notes |
|---|---|---|
| 4 GB | qwen2.5:1.5b | Default. Leaves ~1.5 GB headroom. |
| 8 GB | qwen2.5:3b or phi3.5:3.8b | Better answers, still fits. |
| 8 GB (Pi 5) | qwen2.5:7b | Works with OLLAMA_MEMORY_LIMIT=6G. |
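The table can be turned into a small helper for provisioning scripts. This is only a sketch: recommend_model is a hypothetical function, not part of AllArkive, and the 7,000,000 kB cut-off for an "8 GB" board is an assumption (the kernel reports slightly less than the nominal RAM size).

```shell
# Map total RAM (in kB, as reported by /proc/meminfo) and the board model
# string to the chat model suggested in the table above.
# Assumption: anything reporting >= 7,000,000 kB counts as an 8 GB board.
recommend_model() {
  mem_kb=$1
  board=$2
  if [ "$mem_kb" -ge 7000000 ]; then
    case "$board" in
      *"Raspberry Pi 5"*) echo "qwen2.5:7b" ;;
      *) echo "qwen2.5:3b" ;;
    esac
  else
    echo "qwen2.5:1.5b"
  fi
}

# Usage on the Pi itself:
#   mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
#   board=$(tr -d '\0' < /proc/device-tree/model)
#   recommend_model "$mem_kb" "$board"
```

If the helper picks qwen2.5:7b, remember that the table pairs it with OLLAMA_MEMORY_LIMIT=6G.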
Step 1: Image the OS
Download Raspberry Pi OS Lite (64-bit) from
https://www.raspberrypi.com/software/operating-systems/ —
choose the "Lite" variant (no desktop). Bookworm (Debian 12) or
later.
Flash it with Raspberry Pi Imager. Before writing, click the gear icon and:
- Set a hostname (allarkive-pi works)
- Enable SSH
- Set a username and password
- Configure Wi-Fi if you are not using Ethernet
Boot the Pi and confirm SSH access (replace pi with the username you set in Imager):
ssh pi@allarkive-pi.local
Update the system:
sudo apt update && sudo apt full-upgrade -y
sudo reboot
Step 2: Mount the USB SSD
Identify the SSD device:
lsblk
Look for a disk (sda, sdb, etc.) whose partitions show nothing in the
MOUNTPOINTS column. Your SSD is typically
/dev/sda with a partition at /dev/sda1.
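If several disks are attached, the transport column of lsblk helps single out the USB one. A minimal sketch, assuming a util-linux lsblk that supports the TRAN column; usb_disks is a hypothetical helper name.

```shell
# Print the device path of every whole disk attached over USB.
# Reads "NAME TRAN" pairs on stdin, for example from: lsblk -dn -o NAME,TRAN
usb_disks() {
  awk '$2 == "usb" { print "/dev/" $1 }'
}

# Usage:
#   lsblk -dn -o NAME,TRAN | usb_disks
```

Double-check the result against the disk size shown by plain lsblk before formatting anything.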
If the SSD is new and has no partition, format it:
sudo fdisk /dev/sda # create a new GPT partition table and one partition
sudo mkfs.ext4 /dev/sda1
Mount it:
sudo mkdir -p /mnt/ssd
sudo mount /dev/sda1 /mnt/ssd
Make the mount survive reboots. First get the UUID of the partition:
sudo blkid /dev/sda1
Then add a line to /etc/fstab (replace YOUR-UUID
with the value from blkid):
UUID=YOUR-UUID /mnt/ssd ext4 defaults,noatime 0 2
Verify:
sudo mount -a
df -h /mnt/ssd
Expected: /mnt/ssd mounted with available space matching
your SSD size.
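The fstab edit can also be scripted so that it is safe to run twice. The following is a sketch under the same assumptions as the steps above (ext4, mount point /mnt/ssd); fstab_line and add_fstab_entry are hypothetical helper names.

```shell
# Build the /etc/fstab entry for the SSD partition from its UUID.
fstab_line() {
  printf 'UUID=%s /mnt/ssd ext4 defaults,noatime 0 2\n' "$1"
}

# Append the entry only if that UUID is not already listed, so a second run
# does not create a duplicate mount line.
add_fstab_entry() {
  uuid=$1
  fstab=${2:-/etc/fstab}
  grep -q "^UUID=${uuid}[[:space:]]" "$fstab" 2>/dev/null ||
    fstab_line "$uuid" >> "$fstab"
}

# Usage (as root):
#   add_fstab_entry "$(blkid -s UUID -o value /dev/sda1)"
```

Run sudo mount -a afterwards, as in the Verify step, to catch a malformed line before the next reboot.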
Step 3: Install Docker
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker "$USER"
newgrp docker
Verify:
docker version
docker compose version
Expected: Docker Engine 24+ and Compose 2.x.
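For unattended setups, the version expectation can be checked in a script instead of by eye. A minimal sketch; major_at_least is a hypothetical helper that only compares the major version component.

```shell
# Succeed if a dotted version string has a major component >= the minimum.
major_at_least() {
  min=$1
  version=${2#v}        # tolerate a leading "v", as in "v2.27.0"
  major=${version%%.*}  # keep everything before the first dot
  [ "$major" -ge "$min" ] 2>/dev/null
}

# Usage:
#   major_at_least 24 "$(docker version --format '{{.Server.Version}}')" \
#     || echo "Docker Engine is older than 24"
#   major_at_least 2 "$(docker compose version --short)" \
#     || echo "Compose is older than 2.x"
```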
Step 4: Clone the repository
sudo mkdir -p /mnt/ssd/allarkive-repo
sudo chown "$USER" /mnt/ssd/allarkive-repo
git clone https://github.com/allarkive/allarkive /mnt/ssd/allarkive-repo
cd /mnt/ssd/allarkive-repo
Step 5: Configure the environment
cp compose/.env.example compose/.env
Edit compose/.env:
nano compose/.env
Required changes:
# Generate a secret key:
# openssl rand -hex 32
WEBUI_SECRET_KEY=<paste the output here>
# Point all data at the USB SSD
ALLARKIVE_DATA_DIR=/mnt/ssd/allarkive
# Set the smaller Pi model
OLLAMA_DEFAULT_MODEL=qwen2.5:1.5b
CHAT_MODEL=qwen2.5:1.5b
# Tune RAM limit to your board (4G for 4 GB Pi, 6G for 8 GB Pi)
OLLAMA_MEMORY_LIMIT=4G
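Before running bootstrap, it can help to confirm that none of these settings was left empty or still holds the placeholder. The check below is a sketch: check_env is a hypothetical helper, and the 64-hex-character rule simply assumes the key was generated with openssl rand -hex 32 as suggested above.

```shell
# Verify that an env file defines the settings listed above and that the
# secret looks like the output of `openssl rand -hex 32` (64 hex characters).
check_env() {
  env_file=$1
  status=0
  for key in WEBUI_SECRET_KEY ALLARKIVE_DATA_DIR OLLAMA_DEFAULT_MODEL \
             CHAT_MODEL OLLAMA_MEMORY_LIMIT; do
    if ! grep -q "^${key}=." "$env_file"; then
      echo "missing or empty: $key"
      status=1
    fi
  done
  if ! grep -Eq '^WEBUI_SECRET_KEY=[0-9a-f]{64}$' "$env_file"; then
    echo "WEBUI_SECRET_KEY is not 64 hex characters"
    status=1
  fi
  return "$status"
}

# Usage:
#   check_env compose/.env && echo "compose/.env looks complete"
```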
Step 6: Run bootstrap
scripts/bootstrap.sh --pi --bundle minimal
This will:
- Check prerequisites (Docker, disk space)
- Create data directories under /mnt/ssd/allarkive
- Download the minimal bundle ZIMs (~3.5 GB) and verify checksums
- Start the Compose stack (compose/docker-compose.pi.yml)
- Pull qwen2.5:1.5b (~900 MB) and nomic-embed-text (~270 MB) into Ollama
- Run the RAG indexer over the downloaded ZIMs
Wall-clock time: 30–90 minutes for download + image
build on a reasonable connection, depending on Pi generation. After
that, the RAG indexer keeps running in the background —
on a Pi 4 expect roughly 1–3 hours for the minimal
bundle, longer for balanced. Leave it running: the indexer is
resumable and idempotent, and Kiwix browsing at
http://<pi-ip>:8081 works immediately. RAG answers in
Open WebUI improve as coverage grows — "no sources found" early on is
expected for topics not yet indexed. Watch progress:
docker compose -f compose/docker-compose.pi.yml logs -f rag
For Pi-specific gotchas (cap tuning, OOM during indexing, search-only
mode, re-running bootstrap to expand coverage), see docs/TROUBLESHOOTING.md.
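Bootstrap checks disk space itself, but when you plan to re-run it with a larger bundle, a quick manual check avoids starting a download that cannot finish. A minimal sketch; has_free_gb is a hypothetical helper, and the 32 GB figure is the minimum from the hardware table.

```shell
# Succeed if the filesystem holding PATH has at least NEEDED_GB free.
has_free_gb() {
  path=$1
  needed_gb=$2
  # df -P prints one line per filesystem; column 4 is available space in kB.
  avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $((needed_gb * 1024 * 1024)) ]
}

# Usage:
#   has_free_gb /mnt/ssd 32 || echo "less than 32 GB free on the SSD"
```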
Recipe: Pi 5 with a large external SSD + tiny model + comprehensive ZIMs
If you have a Pi 5 with 8 GB of RAM, a USB SSD with 500 GB+
free, and no urgency, you can run the full comprehensive
bundle (411 GB of ZIMs: Wikipedia with images, Project Gutenberg, Stack
Overflow, …) with the smallest chat model. Bundle and model are
independent flags — the platform default for Pi is minimal,
but explicit flags override it:
scripts/bootstrap.sh --pi --bundle comprehensive --model qwen2.5:1.5b
Expect:
- Download: 411 GB over your connection. Hours to days.
- Indexing: on Pi 5 CPU, roughly 2–8 chunks/sec. Millions of chunks. Days to a couple of weeks for full coverage. The indexer is resumable; partial answers work as coverage grows.
- Disk: 411 GB ZIMs + ~10–20 GB vector index. Comfortably under 500 GB.
- Chat: qwen2.5:1.5b at ~5–15 tokens/sec on Pi 5 CPU. Usable for short answers; lower quality than 7B.
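To see how the "days to a couple of weeks" figure follows from the throughput, divide the chunk count by the rate. The sketch below assumes a hypothetical corpus of 3 million chunks; the real count depends on the ZIMs and the chunking settings, and index_days is an illustrative helper.

```shell
# Estimate indexing time in days from a chunk count and a rate in chunks/sec.
index_days() {
  awk -v chunks="$1" -v rate="$2" \
    'BEGIN { printf "%.1f\n", chunks / rate / 86400 }'
}

# Hypothetical 3,000,000 chunks at both ends of the quoted 2–8 chunks/sec:
index_days 3000000 8   # fast end: prints 4.3
index_days 3000000 2   # slow end: prints 17.4
```

The disk budget works the same way: 411 GB of ZIMs plus at most 20 GB of index is 431 GB, which leaves roughly 70 GB spare on a 500 GB volume.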
Bump Docker / Ollama RAM limits in compose/.env if
needed:
OLLAMA_MEMORY_LIMIT=6G # default 4G, fine to raise on an 8 GB Pi
Recipe: search-only mode (no chat model, smallest footprint)
If the Pi has too little RAM to run a chat model alongside indexing,
or you just want fast keyword + semantic search without LLM
summarisation, pass --no-model. The chat-model pull step is
skipped; the embedding model is still pulled (indexing needs it). RAG
queries return retrieved passages with citations, and the landing page
hides the "Ask AI" UI automatically.
scripts/bootstrap.sh --pi --bundle comprehensive --no-model
This is the lightest configuration: roughly 270 MB of model on disk
(just nomic-embed-text), no chat-completion latency, and only the
embedder loaded into RAM on either a Pi 4 or a Pi 5. The "search" half of the
stack works fully; the "chat" half is disabled.
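One way to judge whether the board has room for a chat model is to look at the memory the kernel reports as available. This is only a sketch: mem_available_mb is a hypothetical helper, and the 2000 MB threshold in the usage line is an illustrative assumption, not a figure from AllArkive.

```shell
# Print MemAvailable in MB from a meminfo-style file (default: /proc/meminfo).
mem_available_mb() {
  awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Usage, with a hypothetical 2000 MB threshold:
#   [ "$(mem_available_mb)" -ge 2000 ] || echo "low memory: consider --no-model"
```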
If the indexer OOMs during first run
The Pi 4 with 4 GB RAM can struggle to run Ollama and the indexer at the same time. If the indexer crashes, run it after Ollama is idle:
docker compose -f compose/docker-compose.pi.yml exec rag \
  python indexer.py --zim-dir /data --index-dir /index --ollama-url http://ollama:11434
Step 7: Smoke test
From the Pi (or any machine on the same network with the right port forwarded):
# Check the landing page
curl -s http://127.0.0.1:8080/ | grep -o '<title>[^<]*</title>'
# Check kiwix is serving
curl -sf http://127.0.0.1:8081/ > /dev/null && echo "kiwix OK"
# Check Ollama
curl -s http://127.0.0.1:11434/api/version
# Check RAG health
curl -s http://127.0.0.1:8000/health
Open http://127.0.0.1:8080 in a browser (or forward the
port via SSH):
ssh -L 8080:127.0.0.1:8080 pi@allarkive-pi.local
Then open http://127.0.0.1:8080 on your laptop.
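The four checks above can be wrapped in one function that reports every service and returns non-zero if any of them fails. A sketch using the same URLs; probe and smoke_test are hypothetical helper names.

```shell
# Succeed if the URL answers with a non-error HTTP status within 5 seconds.
probe() {
  curl -sf --max-time 5 -o /dev/null "$1"
}

# Probe each service once; print OK/FAIL per line and return 1 on any failure.
smoke_test() {
  failed=0
  while read -r name url; do
    if probe "$url"; then
      echo "OK   $name"
    else
      echo "FAIL $name"
      failed=1
    fi
  done <<'EOF'
landing http://127.0.0.1:8080/
kiwix http://127.0.0.1:8081/
ollama http://127.0.0.1:11434/api/version
rag http://127.0.0.1:8000/health
EOF
  return "$failed"
}

# Usage:
#   smoke_test || echo "at least one service is not responding"
```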
Port summary
| Service | Port | Bound to |
|---|---|---|
| Landing page | 8080 | 127.0.0.1 |
| kiwix-serve | 8081 | 127.0.0.1 |
| RAG service | 8000 | 127.0.0.1 |
| Open WebUI | 3000 | 127.0.0.1 |
| Ollama | 11434 | 127.0.0.1 |
Nothing is exposed to the LAN by default. See
docs/deployment/lan-access.md for the opt-in remote-access
path.
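To confirm that nothing listens beyond loopback, filter the output of ss for the ports in the table. A minimal sketch; exposed_ports is a hypothetical helper that expects the column layout of ss -tln (local address in the fourth column).

```shell
# Read `ss -tln` output on stdin and print every listener on an AllArkive
# port whose local address is not loopback.
exposed_ports() {
  awk '$1 == "LISTEN" {
    n = split($4, parts, ":")
    port = parts[n]
    addr = substr($4, 1, length($4) - length(port) - 1)
    if (port ~ /^(8080|8081|8000|3000|11434)$/ &&
        addr != "127.0.0.1" && addr != "[::1]")
      print $4
  }'
}

# Usage (no output means every service is bound to loopback only):
#   ss -tln | exposed_ports
```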
Troubleshooting
Docker daemon not running after reboot
sudo systemctl enable docker
sudo systemctl start docker
Ollama OOM-killed
Reduce OLLAMA_MEMORY_LIMIT in compose/.env
and recreate the container so the new limit is applied (a plain restart does not re-read compose/.env):
docker compose -f compose/docker-compose.pi.yml up -d ollama
Or switch to a smaller model (qwen2.5:1.5b →
llama3.2:1b at ~600 MB).
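To confirm that the kernel OOM killer was responsible, rather than a crash for some other reason, search the kernel log for its kill messages. A sketch, assuming the message format used by recent Linux kernels; oom_victims is a hypothetical helper.

```shell
# Read kernel log lines on stdin and print the name of each process that the
# OOM killer terminated (matches both global and cgroup-limit kills).
oom_victims() {
  sed -n 's/.*[Oo]ut of memory: Killed process [0-9]* (\([^)]*\)).*/\1/p'
}

# Usage:
#   sudo dmesg | oom_victims | sort | uniq -c
```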
ZIM download stalls
fetch-bundle.sh uses wget --continue.
Re-run the script; it resumes from where it left off:
scripts/fetch-bundle.sh minimal --dest /mnt/ssd/allarkive/zim
Kiwix exits immediately ("no ZIM files found")
Confirm ZIM files exist and the path is correct:
ls -lh /mnt/ssd/allarkive/zim/*.zim
If the listing is empty, fetch the bundle before starting the stack.
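Both ZIM problems can be handled by a guard that fetches only when the directory is empty and retries on a flaky connection. A sketch: zim_count and retry are hypothetical helpers, and the 30-second default pause is an arbitrary assumption. It relies on fetch-bundle.sh resuming partial downloads, as described above.

```shell
# Count the .zim files directly inside a directory.
zim_count() {
  find "$1" -maxdepth 1 -type f -name '*.zim' 2>/dev/null | wc -l | tr -d ' '
}

# Run a command up to MAX times, pausing RETRY_DELAY seconds between attempts.
retry() {
  max=$1
  shift
  attempt=1
  until "$@"; do
    [ "$attempt" -ge "$max" ] && return 1
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-30}"
  done
}

# Usage:
#   zim_dir=/mnt/ssd/allarkive/zim
#   [ "$(zim_count "$zim_dir")" -gt 0 ] ||
#     retry 5 scripts/fetch-bundle.sh minimal --dest "$zim_dir"
```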
RAG indexer is slow
Pi 4 with 4 GB RAM processes roughly 5–15 articles per second during
embedding. For the minimal bundle (~400 MB of text), expect 15–30
minutes. Use RAG_MAX_ARTICLES in compose/.env to cap
indexing time during testing:
RAG_MAX_ARTICLES=500
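If you change the cap often while testing, a small helper can update the value in place instead of opening an editor. A minimal sketch; set_env_var is a hypothetical function that replaces an existing assignment or appends a new one.

```shell
# Set KEY=VALUE in an env file, replacing any existing assignment of KEY.
set_env_var() {
  key=$1
  value=$2
  file=$3
  tmp=$(mktemp) || return 1
  # Copy every line except an existing assignment of the key.
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Overwrite in place so the original file permissions are kept.
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage:
#   set_env_var RAG_MAX_ARTICLES 500 compose/.env
```

With the 5–15 articles per second quoted above, a cap of 500 corresponds to roughly 35–100 seconds of embedding work.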
Port already bound
sudo ss -tlnp | grep ':8080'
Find the PID and stop the conflicting process, or change
LANDING_PORT in compose/.env.
Check all service logs
docker compose -f compose/docker-compose.pi.yml logs -f
Updating
To pull new images and restart:
cd /mnt/ssd/allarkive-repo
git pull
docker compose -f compose/docker-compose.pi.yml pull
docker compose -f compose/docker-compose.pi.yml up -d
Re-index after adding new ZIM files:
docker compose -f compose/docker-compose.pi.yml exec rag \
  python indexer.py --zim-dir /data --index-dir /index --ollama-url http://ollama:11434
What is next
- To run a Pi as a dedicated archive node (no AI), see docs/deployment/pi-archive-only.md.
- To connect this Pi to a remote Kiwix archive node, see docs/deployment/split.md.
- To expose the stack to the LAN, see docs/deployment/lan-access.md.