The app concept

In Flui, “app” is a first-class concept. It is the unit you deploy, observe, scale and retire — the object every CLI command under flui app * and every Apps page in the dashboard operates on. An app is not the same as a running process or a single piece of cluster infrastructure: it is the bundle of name and slug, source, runtime configuration, endpoints, certificates, builds, releases, logs and metrics that together describe one deployable thing and everything that surrounds it during its life on the cluster.

This chapter is the conceptual definition. The manifest that describes an app for Flui has its own chapter (The flui.yaml manifest), and the distinction between Application and CatalogApp has its own chapter too (Catalog vs application).

What an app carries

An app, regardless of where it comes from, carries the same set of things:

A name and a slug. A stable slug unique within the installation and a human-friendly name. The slug is what every other surface refers to.
A cluster. Every app lives on one cluster — the cluster whose dashboard you opened, or the one you pointed the CLI at. Moving an app between clusters is not a single in-place operation; it is a redeploy on the other cluster.
A source. Where the workload comes from. Today there are three: a container image pulled from a registry, a source repository that Flui builds for you against a flui.yaml manifest, or an entry picked from the Flui catalog of ready-made building blocks. The source decides what happens before the workload runs; the rest of the app shape is the same regardless of which one it is.
A runtime configuration. Replica count and scaling limits, healthcheck, ports, volumes — the operational shape of the workload once it is up.
An endpoint, or none at all. An app can declare that it should be reachable from outside the cluster, in which case the deploy flow gives it a hostname and a TLS certificate to match. An app that does not need to be reached — a background worker, a job — runs without one. The endpoint itself is its own concept, covered in DNS as part of the deploy contract. How that reachability is shaped — public on its own domain, or internal behind the dashboard — is the exposure axis, below.
A release history. Every deploy and every rollback is recorded as a new release entry, append-only. The current release is a pointer into that history.

The sections below take the points that need more than a line — resources, configuration and secrets, observability, automatic behaviours — and give each one a paragraph of its own.

Resources: CPU and memory

For each resource, every app declares two numbers per instance: the request, which the cluster reserves and guarantees, and the limit, the ceiling the instance cannot go past. The scheduler places an instance only if the node can honour the request; the runtime enforces the limit at peak.

The dashboard shows both numbers and the current usage side by side, and flags the app visually when the request is much higher than what the workload uses (over-allocated, paying for capacity that sits idle) or when the usage runs close to the limit (under-allocated, one bad minute away from being killed). The operator sees these signals without having to go looking for them.
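The two signals can be sketched as a pair of threshold checks. This is a minimal illustration, not the dashboard's actual logic: the text only says "much higher" and "close to", so both ratios below, and the names ResourceReading and allocation_flags, are assumptions.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the real values used by the dashboard are not documented here.
OVER_ALLOCATED_RATIO = 0.5  # usage below 50% of the request
NEAR_LIMIT_RATIO = 0.9      # usage above 90% of the limit


@dataclass(frozen=True)
class ResourceReading:
    request: float  # what the cluster reserves and guarantees
    limit: float    # ceiling the instance cannot go past
    usage: float    # current usage, in the same unit as request and limit


def allocation_flags(reading: ResourceReading) -> list[str]:
    """Return the visual flags a reading would raise under the assumed thresholds."""
    flags = []
    if reading.usage < reading.request * OVER_ALLOCATED_RATIO:
        flags.append("over-allocated")   # reserved capacity sits idle
    if reading.usage > reading.limit * NEAR_LIMIT_RATIO:
        flags.append("under-allocated")  # usage runs close to the ceiling
    return flags


idle = allocation_flags(ResourceReading(request=1000, limit=2000, usage=200))
tight = allocation_flags(ResourceReading(request=256, limit=512, usage=500))
healthy = allocation_flags(ResourceReading(request=500, limit=1000, usage=400))
```

With these assumed ratios, the first reading is flagged over-allocated, the second under-allocated, and the third raises no flag.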

There is also an automatic safety net for the very first launch of an app: if the workload runs out of memory in its early minutes, before its steady-state behaviour is even known, Flui can nudge the allocation upward on its own so the app gets a chance to start. The behaviour is meant for the bootstrap window, not for routine operation.

The capacity gate

Before an app is installed from the catalog, Flui adds up the resource needs of the app — and of every component it brings, if it is a composed stack — and refuses the install up front if the cluster does not have room, keeping roughly a 10% safety margin in reserve. When it refuses, the message names the shortfall: how much CPU and memory the install needs versus how much is free.

The gate is deliberately asymmetric, and it is worth knowing why:

CPU is gated on requests. CPU is compressible — oversubscribing it only throttles a workload, it does not kill it — so the gate only needs the scheduling floor (the request) to fit.
Memory is gated on limits. Memory is incompressible — a pod holds the RAM it touches up to its limit, and a node that runs past what it can allocate starts killing pods — so the gate uses the peak (the limit, falling back to the request when a component declares no limit).

In short: CPU you can crowd, memory you cannot, and the gate reflects exactly that. This is the same arithmetic the dashboard deploy wizard runs, so the two surfaces agree on what will fit.
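The asymmetry is easy to see in a small sketch. Everything here is illustrative: the Component type, the capacity_gate function, the units, and the way the margin is applied (shaved off the free capacity, using a flat 10%) are assumptions, not Flui's implementation.

```python
from dataclasses import dataclass
from typing import Optional

# The text says "roughly a 10% safety margin"; the exact figure and how it is
# applied are assumptions made for this sketch.
SAFETY_MARGIN_PERCENT = 10


@dataclass(frozen=True)
class Component:
    name: str
    cpu_request_m: int                      # CPU request, in millicores
    memory_request_mib: int                 # memory request, in MiB
    memory_limit_mib: Optional[int] = None  # memory limit, if one is declared


def capacity_gate(components, free_cpu_m, free_memory_mib):
    """Return (fits, message) for installing all components together."""
    # CPU is compressible: only the scheduling floor (the request) must fit.
    need_cpu = sum(c.cpu_request_m for c in components)
    # Memory is incompressible: use the limit, falling back to the request.
    need_mem = sum(
        c.memory_request_mib if c.memory_limit_mib is None else c.memory_limit_mib
        for c in components
    )
    usable_cpu = free_cpu_m * (100 - SAFETY_MARGIN_PERCENT) // 100
    usable_mem = free_memory_mib * (100 - SAFETY_MARGIN_PERCENT) // 100
    fits = need_cpu <= usable_cpu and need_mem <= usable_mem
    message = (
        f"needs {need_cpu}m CPU and {need_mem}Mi memory; "
        f"{usable_cpu}m CPU and {usable_mem}Mi memory usable"
    )
    return fits, message


stack = [
    Component("web", cpu_request_m=500, memory_request_mib=512, memory_limit_mib=1024),
    Component("worker", cpu_request_m=250, memory_request_mib=256),  # no limit declared
]
fits, message = capacity_gate(stack, free_cpu_m=1000, free_memory_mib=1400)
```

In this example the summed CPU requests (750m) fit, but the summed memory peak (1024 + 256 = 1280Mi) exceeds the usable 1260Mi, so the install would be refused and the message would name both figures.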

Volumes and storage

An app that needs to keep data across restarts declares one or more volumes — a name, a mount path inside the workload, a size. Where the data physically lives — the cluster’s shared storage, where any node can reach it and the app can be moved freely, or a node’s local disk, where access is faster but the app is pinned to that node — is decided by the cluster’s storage configuration. That choice carries implications for placement and for what happens during cluster operations like a node resize. The full story is in Storage classes and dedicated placement.

Port protocol

Every app’s main port carries a protocol, http or tcp, defaulting to http. It is a small field with one large consequence: it gates observability. An http app is scraped by Prometheus for platform metrics such as request rates; a tcp app (a database, a message broker, anything that does not speak HTTP on its port) is deliberately not HTTP-scraped, because there is nothing meaningful to scrape and probing it would only add noise. Declaring tcp is how you tell Flui “this is a wire protocol, not a web service”.

Readiness probes can also carry HTTP headers when the bare path-and-port check is not enough — most often a Host: header, so an app that only answers for a specific virtual host still passes its health check during a rolling restart instead of being failed by a request its router does not recognise.

How an app is reached: public vs internal

An app that has an endpoint is reached in one of two ways:

Public. The app gets a public endpoint: a DNS record on its own hostname, a TLS certificate, and routing from the open internet to the app. This is the default and the right answer for anything meant to be reached from outside.
Internal. The app gets only a cluster-internal address — no public hostname, no certificate, no DNS record to manage. It is reachable solely through the dashboard’s authenticating proxy. This is the right answer for an admin tool or a console you want behind the platform’s own login rather than on the public internet.

Apps the catalog marks privatizable can choose between the two at install time; others are fixed at whatever their manifest declares.

Use the URL the app gives you

Each app’s API representation carries the authoritative links it should be reached by. A public app exposes a real url of the form https://<fqdn><entrypointPath>; an internal app exposes an internalUrl instead. The important rule — stated here because it is easy to get wrong — is that any consumer of these (the dashboard’s “Open” button, the assistant) must use the value verbatim and must not reconstruct a URL from the slug. The slug is not the hostname; the real link accounts for the endpoint’s actual domain and path, and only the stored url / internalUrl is guaranteed to be correct.
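A consumer that follows this rule can be sketched in a few lines. The field names url and internalUrl come from the text above; the dictionary shape, the open_link helper and the example values are hypothetical.

```python
def open_link(app: dict) -> str:
    """Return the link to open, taken verbatim from the app's API representation."""
    # Use the stored value as-is; never rebuild a URL from the slug.
    link = app.get("url") or app.get("internalUrl")
    if not link:
        raise ValueError("app exposes neither url nor internalUrl")
    return link


# Hypothetical payloads: note that the slug does not appear in the public hostname.
public_app = {"slug": "billing", "url": "https://pay.example.org/console"}
internal_app = {"slug": "admin-tool", "internalUrl": "https://dashboard.example.org/apps/admin-tool/"}

public_link = open_link(public_app)
internal_link = open_link(internal_app)
```

The first payload shows why reconstruction fails: a guess built from the slug would point at a hostname that does not exist.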

Configuration and secrets

Plain configuration reaches the workload as environment variables and lives with the app — the dashboard shows both names and values. Secrets follow the same name-value shape but are stored encrypted and write-only in the user interface: the dashboard shows the names so the operator knows what is set, but the values are masked. A secret can be overwritten with a new value at any time; it cannot be read back once stored.

Both are editable from the dashboard and the CLI without a fresh build or a redeploy: the platform applies the change through a rolling restart, and the new value is in effect once the restart completes.

How an app is classified

The dashboard groups apps along two orthogonal axes:

Category. Two values: system and user. System apps are the components of the platform itself — the API, the dashboard, the identity provider, the database, the observability stack — installed and reconciled by Flui as part of the control cluster. User apps are everything you deploy. Both kinds live in the same table and are operated through the same surfaces; the difference is that system apps are protected from accidental deletion (a deletion command refuses unless forced).
Kind. A coarser grouping by purpose: database, tool, application, system. Used by the dashboard to organise the Apps page in a way that matches how operators think about what is running. It does not change the lifecycle.

These two are distinct from the manifest’s own kind: field (Application vs CatalogApp), which describes where the app comes from, not what it does. See Catalog vs application for that side.

The lifecycle

An app sits in one of a handful of steady states:

Awaiting build — for apps built from source, while the build is in flight and the platform is waiting for an image.
Running — the app is up and serving its work.
Degraded — the app is up but not at full health (some replicas down, a healthcheck failing intermittently).
Stopped — scaled to zero on purpose; the configuration is still there, no instances are running.
Failed — a terminal failure (the last build failed, the last deploy failed, an unrecoverable condition was hit).
Deleted — the app has been torn down; the row remains for audit.

During an operation — a deploy, a rollback, a tear-down — the app briefly passes through transient states that the dashboard surfaces as progress indicators rather than steady values.

Operations

Every operation an operator can run on an app is available on both surfaces, with the same semantics:

Deploy / redeploy. Apply (or re-apply) the app’s configuration. For image-based apps a redeploy can also pin a different image version; for source-built apps a redeploy reuses an existing build instead of building again. Every deploy appends a new release entry.
Rollback. Move the current release pointer back to a previous entry. Importantly, this adds a new release entry (marked as a rollback) rather than rewinding history — the trail of what happened stays intact.
Start / stop. Bring the app’s replicas back from zero, or scale them to zero while keeping the configuration in place. Stopping is reversible without redeploying.
Restart. Replace the running instances without changing the release. See the rolling-restart section below for the shape of this operation.
Scale. Set the replica count explicitly within the limits the manifest declares.
Delete. Remove the app’s resources from the cluster, free its endpoint and certificate, and soft-delete the row. System apps require an explicit force flag for this operation.
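The append-only behaviour of deploy and rollback can be sketched as follows. The Release and ReleaseHistory types are hypothetical, and the sketch simplifies the "current release" pointer to the latest entry; it only illustrates that a rollback adds an entry rather than removing one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Release:
    number: int
    image: str
    rollback_of: Optional[int] = None  # set when this entry records a rollback


@dataclass
class ReleaseHistory:
    entries: list = field(default_factory=list)

    def current(self) -> Release:
        # Simplification: the current release is the most recent entry.
        return self.entries[-1]

    def deploy(self, image: str) -> Release:
        entry = Release(len(self.entries) + 1, image)
        self.entries.append(entry)
        return entry

    def rollback(self, to_number: int) -> Release:
        if not 1 <= to_number <= len(self.entries):
            raise ValueError(f"no release {to_number}")
        target = self.entries[to_number - 1]
        # Append a new entry marked as a rollback; earlier entries are untouched.
        entry = Release(len(self.entries) + 1, target.image, rollback_of=target.number)
        self.entries.append(entry)
        return entry


history = ReleaseHistory()
history.deploy("registry.example.org/shop:1.0")
history.deploy("registry.example.org/shop:1.1")
history.rollback(1)
```

After these three operations the history holds three entries: the two deploys are still there, and the third entry runs the first image while recording that it is a rollback of release 1.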

Rolling restart

When the running instances of an app have to be replaced — to pick up new configuration or new secrets, or just to clear a wedged state — Flui rolls them: new instances come up first, and only when they are healthy do the old ones go away. The replica count never drops below the declared number, so the app keeps serving throughout. The platform also triggers a rolling restart on its own whenever a change it can apply in place needs to reach the running instances.
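A small simulation shows why the replica count never drops below the declared number. It assumes instances are replaced one at a time, which the text does not specify; the rolling_restart function is purely illustrative.

```python
def rolling_restart(declared: int) -> list[int]:
    """Simulate a surge-first restart; return the ready count after each step."""
    old, new = declared, 0
    ready_counts = []
    while old > 0:
        new += 1  # a new instance comes up and becomes healthy first
        ready_counts.append(old + new)
        old -= 1  # only then is one old instance retired
        ready_counts.append(old + new)
    return ready_counts


steps = rolling_restart(3)
```

For three declared replicas, the ready count alternates between four and three throughout, and ends at three instances that are all new.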

Logs and metrics

Every app has the same observability surfaces from day one, on both the dashboard and the CLI:

Logs. Streamed from every instance, queryable by level and free-text search.
Metrics. CPU, memory, network and replica counts over time — the baseline Flui collects for every app it manages, with no setup on the app side.
Live status. Replicas ready, current release, health at a glance.
Crashes. A summarised view of recent crashes: exit codes, out-of-memory events, restart counts, last log lines before the crash.

All four read from the same central backend on the control cluster. The role observability plays in the platform is covered in Observability is not an afterthought.

Going deeper when something is wrong

For the cases where the everyday views are not enough — a first release that won’t start, a sudden burst of crashes, a configuration that subtly breaks the workload — the dashboard groups two deeper tools under an Advanced menu on every app:

Diagnoses. An automated read of recent failures that goes beyond the raw crash list: it correlates exit codes, out-of-memory events, healthcheck failures and the log lines around them, and surfaces a probable cause and a suggested remediation (a missing environment variable to add, a memory limit to raise). Unresolved diagnoses are also shown as a banner on the app’s overview, so they cannot be missed.
Debug pods. An on-demand snapshot of every running instance of the app, with the cluster-level details an operator needs when the higher-level views fall short. Refreshed on request rather than continuously, so the cluster is not paying for a constant detailed read of every app.

These surfaces are designed for the moments when the everyday ones — logs, metrics, status, crashes — have stopped being enough. They are not part of the routine flow.

Autoscaling, on the roadmap

Automatic scaling on load, that is, adjusting the replica count within the manifest’s min/max range from the app’s own metrics, is the next step. The pieces are in place (every app already declares the range and produces the metrics), but the loop that closes them is not yet in production. Today the replica count is operator-driven.

System apps are apps

A specific point worth being explicit about: the platform’s own components — the API, the dashboard, the identity provider, the database, the observability stack — are themselves apps in this model. They appear in the app list, they have releases, they expose logs and metrics, they go through the same lifecycle as user apps. What sets them apart is the system category and the protection it carries: their delete operation refuses unless the operator explicitly forces it. They are not special-cased in any other way.
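The single special case can be sketched as one guard in an otherwise uniform delete path. The delete_app function, the dictionary shape and the ProtectedAppError type are hypothetical; only the system and user categories and the force requirement come from the text.

```python
class ProtectedAppError(Exception):
    """Raised when a system app is deleted without an explicit force."""


def delete_app(app: dict, force: bool = False) -> dict:
    # The only branch that depends on the category.
    if app["category"] == "system" and not force:
        raise ProtectedAppError(f"{app['slug']} is a system app; deletion must be forced")
    # Soft delete: the record is kept for audit and marked as deleted.
    return {**app, "state": "deleted"}


user_result = delete_app({"slug": "shop", "category": "user", "state": "running"})
forced_result = delete_app({"slug": "dashboard", "category": "system", "state": "running"}, force=True)
```

Calling delete_app on a system app without force=True raises ProtectedAppError; every other path is identical for both categories.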

This homogeneity matters. Operators do not learn one set of commands for “Flui’s own bits” and another for “the apps I deployed”. The way you read the dashboard’s logs is the way you read your app’s logs.

Where this chapter goes from here

The flui.yaml manifest — the declarative shape that produces an app.
Catalog vs application — the two kinds of manifest that produce apps, and what distinguishes them.
Observability is not an afterthought — the surfaces that read from the cluster’s metrics and logs for every app you run.
Storage classes and dedicated placement — where an app’s volumes physically live and how dedicated apps land on a worker.
Composed catalog apps — multi-component installs and how the capacity gate sums them.