We rebuilt project logging at Zerops on VictoriaLogs

Honza
May 5, 2026 · 9 min read

Why every project gets its own logging stack

Zerops has one rule we don't bend: users share hardware, not software. Every project on Zerops runs on a dedicated technology stack — its own database, its own runtime, its own logs. The benefit users get from being on Zerops is access to bare-metal performance and operational know-how, not "your data sits in the same Postgres cluster as everyone else's." The shared bits are the iron underneath.

That principle applies to logs too. There is no shared logging cluster that all projects ship into. Every Zerops project gets its own logging infrastructure, deployed inside the project itself. Which means every choice we make about what that infrastructure is gets multiplied by the number of running projects on the platform. We can't ship something that needs an SRE on call.

The journey to here

InfluxDB (briefly)

Our first attempt was InfluxDB. It didn't last. RAM-hungry, CPU-hungry, fragile under our access patterns, and operationally painful at our scale. We abandoned it fast and moved on.

A custom Go service on SQLite

The replacement was a small Go service we wrote ourselves, backed by a local SQLite database per project. For a long time, this worked. SQLite is famously good at being unobtrusive, the Go service was simple to operate, and the whole thing fit comfortably inside every project. Most users never had any reason to think about it.

But we kept running into the same set of ceilings:

  • 5 GB of storage per project, roughly 5 million log lines. Once you hit that, older logs roll off. For a quiet web app, that's months of history. For anything chatty — a busy worker pool, a debug-heavy deploy, a service mid-incident — that's hours.
  • Query latency around 10 seconds on broader scans. Tolerable for "show me the last 100 lines," painful for "find every error in the last week."
  • Concurrent reads, writes, and tails contending with each other. SQLite is not designed for many simultaneous readers tailing a file that's also being written to as fast as a busy app can produce lines.

For a while we papered over this with log forwarding. We shipped a first-class recipe for ELK on Zerops — users could spin up an ELK stack inside their account and forward logs from any number of projects into it. That recipe is still there, and it's still the right answer for serious production observability. But it doesn't fix the in-project default. The default still needed to be good.

Looking for a replacement

We had a clear shopping list:

  • Battle-tested. No exotic experiments running inside thousands of user projects.
  • Lightweight. Has to fit comfortably alongside the user's actual application, not crowd it out.
  • Operationally calm. Something we can deploy and forget — no babysitting cluster.
  • Same job description as the SQLite solution, with the ceilings raised by orders of magnitude.

We tested Loki. We considered the field. We landed on VictoriaLogs — partly because it ticked every box, and partly because we were already running it.

We were already using it on ourselves

VictoriaLogs has been our internal logging backend for Zerops' own application and infrastructure logs for some time. That's how it ended up at the top of the shortlist for user projects. We weren't gambling on a new technology — we were promoting one we'd already been operating in production against our own workload, and one that consistently outperformed Loki in our side-by-side testing on the kinds of queries we and our users actually run. ELK stayed in the picture, but as the forwarding target, not as the embedded default — excellent software, just too heavy to run inside every single project on the platform.

The stack we ended up with

VictoriaLogs solves storage and querying. To replace everything our custom service was doing, and to do it well, we needed a few more pieces:

  • VictoriaLogs — storage engine and query language (LogsQL). The compression and the query speed are the headline features. The UI it ships with is a bonus we pass straight through to users.
  • Vector — ingestion, transformation, and enrichment. Vector looks like a small tool until you start exploring what it can do; we keep finding new things to use it for.
  • syslog-ng — log forwarding to external destinations.
  • zlogproxy — a small in-house Go service that exposes a backward-compatible log API so the Zerops UI and existing tooling keep working without changes.
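To make the wiring concrete, here is a minimal sketch of how a Vector pipeline can connect these pieces. The service addresses, port numbers, and the static enrichment field are illustrative assumptions, not Zerops' actual configuration (VictoriaLogs does expose a jsonline ingestion endpoint, which the HTTP sink targets here):

```toml
# Sketch of a Vector pipeline: syslog in, enrichment, VictoriaLogs out.
# Addresses and field names are placeholders.

[sources.project_syslog]
type    = "syslog"
mode    = "udp"
address = "0.0.0.0:514"

[transforms.enrich]
type   = "remap"
inputs = ["project_syslog"]
# In the real setup this lookup comes from an in-memory metadata table
# fed by the containers themselves; a static tag stands in here.
source = '''
.service_name = "api"
'''

[sinks.victorialogs]
type           = "http"
inputs         = ["enrich"]
uri            = "http://victorialogs:9428/insert/jsonline"
encoding.codec = "json"
```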

The piece worth dwelling on is Vector, because it's doing work that isn't obvious from the outside. Each running container in a Zerops project reports its own metadata — service name, service ID, container ID, and so on — to Vector over UDP. Vector keeps these in an in-memory table. As log lines flow through, Vector tags every single line with the correct container's metadata before forwarding to VictoriaLogs. The result is that users can run rich, structured queries across an entire project — "all error-level logs from this service across all its containers in the last hour" — and have that just work, without anyone having had to manually wire log labels.
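The enrichment idea described above can be sketched in Go: a concurrency-safe in-memory table keyed by container ID, updated as metadata reports arrive, and consulted to tag each log line before forwarding. The struct fields and tag names here are assumptions for illustration, not Zerops' actual wire format.

```go
package main

import (
	"fmt"
	"sync"
)

// ContainerMeta mirrors what each container reports to Vector over UDP.
// Field names are illustrative.
type ContainerMeta struct {
	ServiceName, ServiceID, ContainerID string
}

// MetaTable is a concurrency-safe, in-memory lookup table keyed by
// container ID — the same shape as the table Vector keeps.
type MetaTable struct {
	mu   sync.RWMutex
	rows map[string]ContainerMeta
}

func NewMetaTable() *MetaTable {
	return &MetaTable{rows: make(map[string]ContainerMeta)}
}

// Update records (or refreshes) one container's metadata.
func (t *MetaTable) Update(m ContainerMeta) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.rows[m.ContainerID] = m
}

// Tag enriches a raw log line with the sending container's metadata
// before it is forwarded to storage.
func (t *MetaTable) Tag(containerID, line string) map[string]string {
	t.mu.RLock()
	m, ok := t.rows[containerID]
	t.mu.RUnlock()
	tagged := map[string]string{"_msg": line}
	if ok {
		tagged["service_name"] = m.ServiceName
		tagged["service_id"] = m.ServiceID
		tagged["container_id"] = m.ContainerID
	}
	return tagged
}

func main() {
	table := NewMetaTable()
	table.Update(ContainerMeta{ServiceName: "api", ServiceID: "svc-1", ContainerID: "c-42"})
	fmt.Println(table.Tag("c-42", "GET /health 200"))
}
```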

The other nice surprise: because VictoriaLogs ships with a capable web UI of its own, we could expose it directly to users instead of building yet another log viewer.
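For a sense of what that looks like in practice, a project-wide query like the one described above could be written in LogsQL roughly as follows — the field names are assumptions about how the metadata ends up labeled, not guaranteed to match Zerops' schema:

```
_time:1h level:error _stream:{service_name="api"}
```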

What changed in numbers

| | Custom Go + SQLite | VictoriaLogs stack |
| --- | --- | --- |
| Storage ceiling per project | 5 GB / ~5M log lines | ~100× more lines for the same disk budget |
| Typical query latency | ~3 seconds | ~1 second |
| Concurrent reads + writes + tails | Contention | Non-issue |
| Query language | None worth the name | LogsQL |
| UI | Custom Zerops UI (still works, via zlogproxy) | VictoriaLogs UI exposed to users |

The compression number deserves more than a sentence, because "100× more lines on the same disk" sounds like marketing until you look at where it comes from.

SQLite stores log entries row by row — a timestamp, a level, a service name, a container ID, a message body and any structured fields, all interleaved on disk in the order they arrived. That layout is fine for general-purpose data, but it's almost the worst case for a generic compressor: each row is a constantly-changing mix of types and the compressor never gets a long enough run of similar bytes to do real work.

VictoriaLogs stores each field in its own column, so timestamps sit next to timestamps and levels next to levels — and that kind of repetition compresses down to almost nothing. A column of INFO / WARN / ERROR strings, a column of a handful of service names, a column of monotonically-increasing timestamps, a column of container IDs that repeat thousands of times — these are dream inputs for a modern compression algorithm. Combined with VictoriaLogs' block-based encoding tuned specifically for log workloads, the real-world outcome on Zerops projects is that the same physical disk that held about 5M lines under SQLite holds roughly 100× more under VictoriaLogs. For most projects, in-project log retention simply stops being a thing you have to think about.
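You can demonstrate the row-versus-column effect with nothing more than gzip — a stand-in here for VictoriaLogs' actual codecs, so this illustrates the principle rather than its real-world ratios. The sketch below compresses the same synthetic log entries twice: once interleaved row by row, once grouped into per-field columns.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"math/rand"
	"strings"
)

// gzipSize returns the compressed size of data in bytes.
func gzipSize(data string) int {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	w.Write([]byte(data))
	w.Close()
	return buf.Len()
}

// buildLayouts produces the same synthetic log entries twice: once
// interleaved row by row (the SQLite-style layout) and once grouped
// into per-field columns (the VictoriaLogs-style layout).
func buildLayouts(n int) (rowLayout string, columns []string) {
	rng := rand.New(rand.NewSource(1))
	levels := []string{"INFO", "WARN", "ERROR"}
	services := []string{"api", "worker", "db"}

	var rows, tsCol, lvlCol, svcCol, msgCol strings.Builder
	for i := 0; i < n; i++ {
		ts := fmt.Sprintf("17149%08d", i) // monotonically increasing timestamps
		lvl := levels[rng.Intn(len(levels))]
		svc := services[rng.Intn(len(services))]
		msg := fmt.Sprintf("request %d handled", rng.Intn(100))

		rows.WriteString(ts + " " + lvl + " " + svc + " " + msg + "\n")
		tsCol.WriteString(ts + "\n")
		lvlCol.WriteString(lvl + "\n")
		svcCol.WriteString(svc + "\n")
		msgCol.WriteString(msg + "\n")
	}
	return rows.String(), []string{tsCol.String(), lvlCol.String(), svcCol.String(), msgCol.String()}
}

func main() {
	rowLayout, columns := buildLayouts(50000)
	rowSize := gzipSize(rowLayout)
	colSize := 0
	for _, c := range columns {
		colSize += gzipSize(c)
	}
	fmt.Printf("row layout: %d bytes compressed\n", rowSize)
	fmt.Printf("columnar:   %d bytes compressed (%.1fx smaller)\n",
		colSize, float64(rowSize)/float64(colSize))
}
```

Even with a generic compressor and toy data, the columnar layout comes out meaningfully smaller; purpose-built block encoding widens the gap much further.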

How we migrated thousands of running projects

The new stack was the easy part. Getting every existing project onto it without downtime was the interesting part.

For each project, the cutover went like this:

  1. Provision the new infrastructure inside the project — VictoriaLogs, Vector, and zlogproxy, deployed alongside the existing custom service.
  2. Stop the old service.
  3. Switch port 514 to Vector. From this moment, every new log line flows into the new stack.
  4. Backfill the old SQLite database into VictoriaLogs in the background, so historical logs aren't lost.
  5. Tear down the old service once the backfill completes.

Run this in parallel across thousands of projects, automated end to end, and the whole platform migration takes about two days. No downtime windows. No "we'll be doing maintenance on Saturday between 2 and 4 AM UTC" emails. As far as we can tell, almost no users noticed anything happened.
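The orchestration pattern — the same five steps, fanned out across projects with bounded parallelism — can be sketched in Go. The step functions and the worker count are hypothetical stand-ins; on the real platform each step drives actual infrastructure.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-ins for the five cutover steps.
func provisionNewStack(id string) error  { return nil } // VictoriaLogs, Vector, zlogproxy
func stopOldService(id string) error     { return nil }
func switchPortToVector(id string) error { return nil } // port 514 cutover
func backfillSQLite(id string) error     { return nil } // historical logs
func tearDownOldService(id string) error { return nil }

// migrateProject runs the five steps in order for one project.
func migrateProject(id string) error {
	for _, step := range []func(string) error{
		provisionNewStack, stopOldService, switchPortToVector,
		backfillSQLite, tearDownOldService,
	} {
		if err := step(id); err != nil {
			return fmt.Errorf("project %s: %w", id, err)
		}
	}
	return nil
}

// migrateAll fans the per-project cutover out across a bounded pool
// of workers and reports how many projects completed.
func migrateAll(projects []string, workers int) int {
	sem := make(chan struct{}, workers) // semaphore bounding concurrency
	var wg sync.WaitGroup
	var mu sync.Mutex
	done := 0
	for _, p := range projects {
		wg.Add(1)
		sem <- struct{}{}
		go func(p string) {
			defer wg.Done()
			defer func() { <-sem }()
			if migrateProject(p) == nil {
				mu.Lock()
				done++
				mu.Unlock()
			}
		}(p)
	}
	wg.Wait()
	return done
}

func main() {
	projects := make([]string, 1000)
	for i := range projects {
		projects[i] = fmt.Sprintf("project-%04d", i)
	}
	fmt.Println("migrated:", migrateAll(projects, 50))
}
```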

Why we still recommend forwarding for production

The new in-project stack is dramatically better than what came before — but it doesn't change one fundamental fact: logs matter most exactly when your infrastructure is on fire, and the logs sitting inside that infrastructure are exactly the ones you can't reliably reach when something is badly wrong. A storage volume problem, a network partition, a regional outage — any of these can take down the very thing you need to read to figure out what's going on.

That's why we ship syslog-ng as part of the in-project stack from day one. The point isn't that VictoriaLogs is too small or that the in-project logs are unreliable — they're neither. The point is that real production observability lives outside the blast radius of the thing being observed. Forward your logs to a fully-managed external service (Better Stack, Grafana Cloud, Datadog, an external ELK), or to a dedicated logging project on Zerops separate from the application it's collecting from. Treat the in-project logs as the convenient default for development and routine operation, and the forwarded logs as the production-grade backup that's still readable when everything else isn't.
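A forwarding rule in syslog-ng is short. The sketch below shows the general shape — the hostname, port, and source name are placeholders, not Zerops' shipped configuration:

```
# Forward everything from the project's log source to an external
# collector over TLS. All names here are illustrative.
destination d_external {
  syslog("logs.example.com"
    transport("tls")
    port(6514)
  );
};

log {
  source(s_project);
  destination(d_external);
};
```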

Closing thoughts

There isn't a clever moral to this story. We had a custom solution that worked well enough for a long time. It stopped working well enough for the more demanding cases. We replaced it with a stack that's an order of magnitude better on every axis we measured, built mostly out of software we were already running ourselves and had grown to trust. The migration was undramatic. The new defaults are, hopefully, what users would have built for themselves if they'd had to build their own logging stack.

If you're on Zerops, the embedded VictoriaLogs UI rolls out today. Open any project, head to logs, and it's there.

Try Zerops in < 5 minutes without installing or forking anything — deploy one of our curated recipes, which are examples covering the full development lifecycle from local dev through staging to production, with built-in seeding, migrations, backups and scaling.

Your favorite language / framework not covered by our recipes? Deploy the Showcase recipe, or join Zerops Discord and we'll help you out.


Production-ready image processing pipeline showcasing distributed architecture on Zerops — a Bun + React frontend with real-time WebSocket updates and live architecture visualization, a Python worker for async image processing via NATS, backed by PostgreSQL, Valkey, and S3-compatible object storage.
