failover
Read and write the primary adapter, and fall back to one or more secondary adapters when a backend is down. A live, per-operation failover chain - body-transparent, no native dependencies.
The built-in failover() plugin keeps a Files instance serving while a backend is down. Every operation tries the primary first; if it throws because the backend is unreachable, the plugin retries against a secondary - and the next, and the next - until one succeeds. The primary is the instance's own adapter (reached through the rest of the pipeline); the secondaries are backup adapters you pass in.
It's body-transparent: it never buffers or transforms bytes, so streaming, range downloads, url(), and signedUploadUrl() all keep working. It has no native dependencies, adds no methods (wrap only, so plain new Files() is enough), and works on any set of adapters.
import { Files } from "files-sdk";
import { s3 } from "files-sdk/s3";
import { failover } from "files-sdk/failover";
const files = new Files({
  adapter: s3({ bucket: "primary", region: "us-east-1" }), // primary
  plugins: [
    failover({
      secondaries: s3({ bucket: "backup", region: "us-west-2" }),
      onFailover: ({ operation, failed }) =>
        console.warn(`failover: ${operation} fell off backend ${failed}`),
    }),
  ],
});
await files.download("report.pdf"); // primary, or the backup if it's down
await files.upload("invoice.pdf", body); // lands on the first reachable backend
Failover vs replication vs tiering
These three Tier B plugins all take a second adapter, but they do different jobs:
| Plugin | What it does |
|---|---|
| failover() | Try the primary; fall back to a secondary only when a backend is down. |
| replication() | Write every mutation to all backends (fan-out). |
| tiering() | Partition objects across backends by key / size / age. |
failover() treats each secondary as a full replica of one namespace, so it never splits or merges data across backends. It's the availability lever: keep serving against whatever backend is up.
The failover chain
Pass one secondary or several. They're tried in order after the primary, forming the chain [primary, ...secondaries]:
failover({
  secondaries: [
    s3({ bucket: "backup-eu" }), // tried after the primary
    s3({ bucket: "backup-us" }), // tried after that
  ],
});
Each operation walks the chain until one backend succeeds. If every backend is down, the last error is thrown.
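
The walk described above can be sketched as a small loop. This is an illustration under stated assumptions, not the plugin's source: `walkChain`, `Attempt`, and `ChainEvent` are hypothetical names, and each backend is modelled as a plain async function.

```typescript
// Sketch only: hypothetical names, not files-sdk internals.
type Attempt<T> = () => Promise<T>;

interface ChainEvent {
  failed: number; // index of the backend that just failed
  next: number; // index of the backend tried next
  error: unknown;
}

async function walkChain<T>(
  chain: Attempt<T>[], // [primary, ...secondaries]
  shouldFailover: (error: unknown) => boolean,
  onFailover?: (event: ChainEvent) => void,
): Promise<T> {
  let lastError: unknown = new Error("empty failover chain");
  for (let i = 0; i < chain.length; i++) {
    try {
      return await chain[i](); // first backend to succeed wins
    } catch (error) {
      lastError = error;
      // Stop on a definitive answer, or when no backend is left to try.
      if (!shouldFailover(error) || i === chain.length - 1) break;
      try {
        onFailover?.({ failed: i, next: i + 1, error });
      } catch {
        // A throwing handler is swallowed so it cannot break the operation.
      }
    }
  }
  throw lastError; // every backend failed: surface the last error
}
```

In this sketch the predicate decides whether the loop continues, so a rule that returns false for a given error surfaces that error immediately without touching the secondaries.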
When does it fail over?
By default, the plugin fails over only on a Provider error - a network failure, timeout, or 5xx, i.e. "the backend is down" - and never on an aborted request. A definitive answer from a healthy backend is surfaced as-is, not masked by probing a replica:
- a NotFound stays a NotFound - a genuine 404 isn't turned into a slow scan of every replica;
- an Unauthorized / Conflict / ReadOnly is likewise a real answer, not a reason to try elsewhere.
This keeps reads honest: the primary is the source of truth, and the secondary only answers when the primary can't.
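
As a rough sketch, the default rule amounts to a predicate like the one below. `FilesErrorLike` and `defaultShouldFailover` are hypothetical names; only the `code` and `aborted` fields are taken from the text, and the real default may check more than this.

```typescript
// Hypothetical stand-in for the normalized error; only `code` and `aborted` come from the docs.
interface FilesErrorLike {
  code: string;
  aborted: boolean;
}

// Fail over only when the backend itself is down, never on an aborted request.
function defaultShouldFailover(error: FilesErrorLike): boolean {
  return error.code === "Provider" && !error.aborted;
}

const outage = defaultShouldFailover({ code: "Provider", aborted: false });
const miss = defaultShouldFailover({ code: "NotFound", aborted: false });
const cancelled = defaultShouldFailover({ code: "Provider", aborted: true });
```

Here `outage` is true, while `miss` and `cancelled` are both false: a 404 and a caller-initiated abort are answers, not outages.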
Customising the predicate
Pass shouldFailover to change the rule. It receives the error normalized to a FilesError (so code and aborted are always set). For example, to read through to a replica on a miss - useful when the secondary is a live mirror that may be ahead of the primary:
failover({
  secondaries: replica,
  shouldFailover: (error) =>
    error.code === "NotFound" || error.code === "Provider",
});
Failing over on NotFound means a delete that only reached one backend can be "resurrected" on the next read from another. Reach for it only when your secondaries are genuine replicas kept in sync.
What each verb does
- download / head / url / exists read from the first reachable backend.
- upload / delete / copy / move run against the first reachable backend. They are not fanned out to every backend - that's replication().
- list returns the first reachable backend's page. It is not merged across backends (each secondary is a full replica, so there's nothing to interleave).
- signedUploadUrl signs against the first reachable backend.
Bulk calls fan out to one operation per item, so each element fails over independently.
Streaming uploads
A ReadableStream body is read-once - once the primary has consumed it, there's nothing left to replay against a backup. So a streaming upload runs against the primary alone and isn't failed over; if the primary is down, the upload fails. Every other body (a string, Blob, File, ArrayBuffer, or typed array) can be re-read, so it fails over normally. Buffer a stream up front if you need a streaming upload to survive a primary outage.
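
One way to buffer a stream up front is to drain it into a typed array with the standard Response API before uploading. `bufferStream` is a hypothetical helper, not part of files-sdk, and it assumes a runtime with web streams and fetch globals (modern browsers, Node 18+, Deno, Bun).

```typescript
// Hypothetical helper: drains a read-once stream into a replayable Uint8Array.
async function bufferStream(
  stream: ReadableStream<Uint8Array>,
): Promise<Uint8Array> {
  // Response consumes the stream and hands back all of its bytes at once.
  const buffer = await new Response(stream).arrayBuffer();
  return new Uint8Array(buffer);
}
```

The result can then be passed as the upload body in place of the stream. The trade-off is memory: the whole object is held in RAM, so this suits bodies of bounded size rather than arbitrarily large uploads.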
Observing failovers
onFailover fires (fire-and-forget) each time an operation moves to the next backend - wire it to your metrics or alerting to learn a backend is degraded:
failover({
  secondaries: [backupA, backupB],
  onFailover: ({ operation, failed, next, error }) => {
    metrics.increment("storage.failover", { operation, from: failed });
    log.warn(`backend ${failed} failed ${operation}: ${error.message}`);
  },
});
failed and next are indices into [primary, ...secondaries] - 0 is the primary, 1 the first secondary, and so on. A throw from the handler is swallowed, so it can never break the operation.
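
Because the handler receives indices rather than names, it can help to keep a label list in chain order for logs and dashboards. A minimal sketch, assuming an event with `operation`, `failed`, and `next` as described above; `backendLabels`, `FailoverEventLike`, and `describeFailover` are hypothetical.

```typescript
// Hypothetical event shape, limited to the fields this helper needs.
interface FailoverEventLike {
  operation: string;
  failed: number;
  next: number;
}

// Labels in chain order: index 0 is the primary, 1 the first secondary, and so on.
const backendLabels = ["primary", "backup-a", "backup-b"];

function describeFailover(
  event: FailoverEventLike,
  labels: string[] = backendLabels,
): string {
  // Fall back to the raw index when no label is registered for it.
  const name = (index: number): string => labels[index] ?? `backend ${index}`;
  return `${event.operation}: ${name(event.failed)} -> ${name(event.next)}`;
}

const line = describeFailover({ operation: "download", failed: 0, next: 1 });
// line === "download: primary -> backup-a"
```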
Consistency: availability, not convergence
Failover buys availability, not consistency. An object written to a secondary while the primary was down lives only on that secondary; once the primary recovers, a read hits the primary first and gets a NotFound. Failover doesn't reconcile that gap for you. To converge:
- keep the secondary current with replication() (write-through to both), or
- reconcile after an outage with sync / transfer, or
- pass a shouldFailover that also fails over on NotFound so reads fall through to the replica.
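
The gap, and the effect of the last option, can be reproduced with a toy in-memory model. Nothing below is files-sdk API: `ToyStore`, `toyRead`, `toyWrite`, and `viaChain` are hypothetical stand-ins that mimic the behaviour described above.

```typescript
// Toy model only: hypothetical names, synchronous for brevity.
interface ToyStore {
  up: boolean;
  objects: Map<string, string>;
}

function toyError(code: string): Error & { code: string } {
  return Object.assign(new Error(code), { code });
}

function toyRead(store: ToyStore, key: string): string {
  if (!store.up) throw toyError("Provider"); // backend unreachable
  const value = store.objects.get(key);
  if (value === undefined) throw toyError("NotFound"); // definitive miss
  return value;
}

function toyWrite(store: ToyStore, key: string, value: string): void {
  if (!store.up) throw toyError("Provider");
  store.objects.set(key, value);
}

// Walk the chain, moving on only when the predicate accepts the error code.
function viaChain<T>(
  chain: ToyStore[],
  op: (store: ToyStore) => T,
  shouldFailover: (code: string) => boolean = (code) => code === "Provider",
): T {
  let last: unknown = toyError("Provider");
  for (const store of chain) {
    try {
      return op(store);
    } catch (error) {
      last = error;
      if (!shouldFailover((error as { code: string }).code)) break;
    }
  }
  throw last;
}

const toyPrimary: ToyStore = { up: false, objects: new Map() };
const toyBackup: ToyStore = { up: true, objects: new Map() };
const toyChain = [toyPrimary, toyBackup];

// During the outage the write lands on the backup only.
viaChain(toyChain, (s) => toyWrite(s, "invoice.pdf", "v1"));
toyPrimary.up = true; // the primary recovers, still without the object

// Default rule: the healthy primary's NotFound is final.
let defaultOutcome: string;
try {
  defaultOutcome = viaChain(toyChain, (s) => toyRead(s, "invoice.pdf"));
} catch (error) {
  defaultOutcome = (error as { code: string }).code;
}

// Also failing over on NotFound lets the read fall through to the backup.
const readThrough = viaChain(
  toyChain,
  (s) => toyRead(s, "invoice.pdf"),
  (code) => code === "Provider" || code === "NotFound",
);
```

In this model `defaultOutcome` ends up as "NotFound" and `readThrough` as "v1". The read-through predicate hides the gap for reads but does not copy the object back to the primary, so one of the first two options is still needed to actually converge.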
Ordering and prefixes
- Place it last (innermost). Body-transforming plugins like encryption() and compression() wrap failover() and transform the op on the way in, so the same bytes reach every backend: plugins: [encryption(key), failover({ secondaries: backup })].
- Address objects by caller-facing keys. Each secondary does not receive the instance prefix, so give it its own bucket / container and avoid a client prefix on a failover instance.
Things to keep in mind
- Secondaries are real stores. A failed-over read pays the secondary's latency, and it must actually hold the object (keep it in sync with replication() / sync).
- The primary is the source of truth. With the default predicate, a healthy primary's NotFound is returned without consulting any replica.
- Streaming uploads don't fail over. A ReadableStream can't be replayed; buffer it first if it must survive a primary outage.