failover

Read from and write to the primary adapter, and fall back to one or more secondary adapters when a backend is down. A live, per-operation failover chain - body-transparent, no native dependencies.

The built-in failover() plugin keeps a Files instance serving while a backend is down. Every operation tries the primary first; if it throws because the backend is unreachable, the plugin retries against a secondary - and the next, and the next - until one succeeds. The primary is the instance's own adapter (reached through the rest of the pipeline); the secondaries are backup adapters you pass in.

It's body-transparent: it never buffers or transforms bytes, so streaming, range downloads, url(), and signedUploadUrl() all keep working. It has no native dependencies, adds no methods (wrap only, so plain new Files() is enough), and works on any set of adapters.

import { Files } from "files-sdk";
import { s3 } from "files-sdk/s3";
import { failover } from "files-sdk/failover";

const files = new Files({
  adapter: s3({ bucket: "primary", region: "us-east-1" }), // primary
  plugins: [
    failover({
      secondaries: s3({ bucket: "backup", region: "us-west-2" }),
      onFailover: ({ operation, failed }) =>
        console.warn(`failover: ${operation} fell off backend ${failed}`),
    }),
  ],
});

await files.download("report.pdf"); // primary, or the backup if it's down
await files.upload("invoice.pdf", body); // lands on the first reachable backend

Failover vs replication vs tiering

These three Tier B plugins all take a second adapter, but they do different jobs:

Plugin          What it does
failover()      Try the primary; fall back to a secondary only when a backend is down.
replication()   Write every mutation to all backends (fan-out).
tiering()       Partition objects across backends by key / size / age.

failover() treats each secondary as a full replica of one namespace, so it never splits or merges data across backends. It's the availability lever: keep serving against whatever backend is up.

The failover chain

Pass one secondary or several. They're tried in order after the primary, forming the chain [primary, ...secondaries]:

failover({
  secondaries: [
    s3({ bucket: "backup-eu" }), // tried after the primary
    s3({ bucket: "backup-us" }), // tried after that
  ],
});

Each operation walks the chain until one backend succeeds. If every backend is down, the last error is thrown.
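The chain walk can be pictured with a minimal sketch. This is not the plugin's source: walkChain, Attempt, and the isDown parameter are hypothetical names used only for illustration, and each backend is modelled as a plain async function.

```typescript
// Hypothetical shape: one async attempt per backend, in chain order
// ([primary, ...secondaries]). Not the files-sdk adapter interface.
type Attempt<T> = () => Promise<T>;

async function walkChain<T>(
  chain: Attempt<T>[],
  // Assumption: decides whether an error means "backend is down".
  // Defaults to treating every error that way, to keep the sketch small.
  isDown: (error: unknown) => boolean = () => true,
): Promise<T> {
  let lastError: unknown = new Error("empty failover chain");
  for (const attempt of chain) {
    try {
      return await attempt(); // first backend that succeeds wins
    } catch (error) {
      lastError = error;
      if (!isDown(error)) throw error; // definitive answer: stop walking
    }
  }
  throw lastError; // every backend failed: surface the last error
}
```

With a two-element chain whose first attempt rejects, the result comes from the second attempt; if both reject, the caller sees the second rejection.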

When does it fail over?

By default, the plugin fails over only on a Provider error - a network failure, timeout, or 5xx, i.e. "the backend is down" - and never on an aborted request. A definitive answer from a healthy backend is surfaced as-is, not masked by probing a replica:

  • a NotFound stays a NotFound - a genuine 404 isn't turned into a slow scan of every replica;
  • an Unauthorized / Conflict / ReadOnly is likewise a real answer, not a reason to try elsewhere.

This keeps reads honest: the primary is the source of truth, and the secondary only answers when the primary can't.
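As a rough sketch, the default rule described above amounts to a small predicate. FilesErrorLike and defaultShouldFailover are hypothetical names, and the field types are assumptions; the real FilesError may carry more fields.

```typescript
// Hypothetical minimal error shape (an assumption, not the SDK's type).
interface FilesErrorLike {
  code: string; // e.g. "Provider", "NotFound", "Unauthorized"
  aborted: boolean; // true when the caller cancelled the request
}

// Only a "backend is down" error that was not caused by an abort
// moves the operation on to the next backend.
function defaultShouldFailover(error: FilesErrorLike): boolean {
  return error.code === "Provider" && !error.aborted;
}
```

Under this rule a NotFound, Unauthorized, Conflict, or ReadOnly from a healthy backend is returned to the caller unchanged.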

Customising the predicate

Pass shouldFailover to change the rule. It receives the error normalized to a FilesError (so code and aborted are always set). For example, to read through to a replica on a miss - useful when the secondary is a live mirror that may be ahead of the primary:

failover({
  secondaries: replica,
  shouldFailover: (error) =>
    error.code === "NotFound" || error.code === "Provider",
});

Failing over on NotFound means a delete that only reached one backend can be "resurrected" on the next read from another. Reach for it only when your secondaries are genuine replicas kept in sync.

What each verb does

  • download / head / url / exists read from the first reachable backend.
  • upload / delete / copy / move run against the first reachable backend. They are not fanned out to every backend - that's replication().
  • list returns the first reachable backend's page. It is not merged across backends (each secondary is a full replica, so there's nothing to interleave).
  • signedUploadUrl signs against the first reachable backend.

Bulk calls fan out to one operation per item, so each element fails over independently.

Streaming uploads

A ReadableStream body is read-once - once the primary has consumed it, there's nothing left to replay against a backup. So a streaming upload runs against the primary alone and isn't failed over; if the primary is down, the upload fails. Every other body (a string, Blob, File, ArrayBuffer, or typed array) can be re-read, so it fails over normally. Buffer a stream up front if you need a streaming upload to survive a primary outage.
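One way to buffer a stream up front is to drain it into a single typed array before uploading. The helper below is a sketch that uses only the standard Web Streams API; bufferStream is a hypothetical name, not part of files-sdk.

```typescript
// Drain a ReadableStream into memory so the body can be re-read on failover.
// Assumption: the stream yields Uint8Array chunks.
async function bufferStream(
  stream: ReadableStream<Uint8Array>,
): Promise<Uint8Array> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    chunks.push(result.value);
    total += result.value.byteLength;
  }
  // Concatenate the chunks into one contiguous array.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
```

The resulting typed array can then be passed as the upload body in place of the stream. The trade-off is memory: the whole object is held at once, so this suits bodies of bounded size.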

Observing failovers

onFailover fires (fire-and-forget) each time an operation moves to the next backend - wire it to your metrics or alerting to learn a backend is degraded:

failover({
  secondaries: [backupA, backupB],
  onFailover: ({ operation, failed, next, error }) => {
    metrics.increment("storage.failover", { operation, from: failed });
    log.warn(`backend ${failed} failed ${operation}: ${error.message}`);
  },
});

failed and next are indices into [primary, ...secondaries] - 0 is the primary, 1 the first secondary, and so on. A throw from the handler is swallowed, so it can never break the operation.
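Because failed and next are bare indices, a small lookup can make log lines easier to read. This is a sketch; backendLabel and the list of names are hypothetical and would be maintained by you alongside the secondaries array.

```typescript
// Hypothetical helper: turn a chain index into a human-readable label.
// Index 0 is the primary; index n maps to secondaryNames[n - 1].
function backendLabel(index: number, secondaryNames: string[]): string {
  if (index === 0) return "primary";
  // Fall back to a generic label if no name was supplied for this slot.
  return secondaryNames[index - 1] ?? `secondary-${index}`;
}
```

Inside an onFailover handler, backendLabel(failed, ["backup-eu", "backup-us"]) would yield "primary" for index 0 and "backup-us" for index 2.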

Consistency: availability, not convergence

Failover buys availability, not consistency. An object written to a secondary while the primary was down lives only on that secondary; once the primary recovers, reads go to the primary first and get a NotFound, which the default predicate does not fail over on. Failover doesn't reconcile that gap for you. To converge:

  • keep the secondary current with replication() (write-through to both), or
  • reconcile after an outage with sync / transfer, or
  • pass a shouldFailover that also fails over on NotFound so reads fall through to the replica.
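The gap can be reproduced with a toy model. Everything below is illustrative: ToyBackend and withFailover are hypothetical in-memory stand-ins, not the files-sdk adapter interface, and errors are plain Error objects whose message plays the role of the error code.

```typescript
// Toy in-memory backend (an assumption for illustration only).
class ToyBackend {
  up = true;
  private objects = new Map<string, string>();

  put(key: string, value: string): void {
    if (!this.up) throw new Error("Provider"); // backend unreachable
    this.objects.set(key, value);
  }

  get(key: string): string {
    if (!this.up) throw new Error("Provider");
    const value = this.objects.get(key);
    if (value === undefined) throw new Error("NotFound");
    return value;
  }
}

// Fail over only on "Provider"; any other error is a definitive answer.
function withFailover<T>(chain: ToyBackend[], op: (b: ToyBackend) => T): T {
  let last: unknown = new Error("empty chain");
  for (const backend of chain) {
    try {
      return op(backend);
    } catch (error) {
      last = error;
      if (!(error instanceof Error) || error.message !== "Provider") throw error;
    }
  }
  throw last;
}

const primary = new ToyBackend();
const backup = new ToyBackend();

primary.up = false; // outage: the write lands on the backup only
withFailover([primary, backup], (b) => b.put("invoice.pdf", "v1"));

primary.up = true; // recovery: the read hits the primary first
let outcome: string;
try {
  outcome = withFailover([primary, backup], (b) => b.get("invoice.pdf"));
} catch (error) {
  outcome = (error as Error).message;
}
// outcome is "NotFound": the object exists only on the backup
```

Each of the three options above closes this gap in a different way: by writing to both backends, by copying the missing object back, or by letting the read fall through.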

Ordering and prefixes

  • Place it last (innermost). Body-transforming plugins like encryption() and compression() wrap failover() and transform the op on the way in, so the same bytes reach every backend:

    plugins: [encryption(key), failover({ secondaries: backup })];
  • Address objects by caller-facing keys. Secondaries do not receive the instance prefix, so give each one its own bucket / container and avoid a client prefix on a failover instance.

Things to keep in mind

  • Secondaries are real stores. A failed-over read pays the secondary's latency, and it must actually hold the object (keep it in sync with replication() / sync).
  • The primary is the source of truth. With the default predicate, a healthy primary's NotFound is returned without consulting any replica.
  • Streaming uploads don't fail over. A ReadableStream can't be replayed; buffer it first if it must survive a primary outage.
