dedup

Content-address object bodies by SHA-256 hash so identical content is stored only once: re-uploads skip the byte upload, and copies share one blob.

The built-in dedup() plugin stores each distinct body once. On upload it hashes the bytes (SHA-256), writes them a single time to a content-addressed blob under a store prefix (.dedup/ by default), and leaves a tiny pointer at your logical key. Upload the same content again — under any key — and the byte upload is skipped; only the pointer is written.

import { createFiles } from "files-sdk";
import { s3 } from "files-sdk/s3";
import { dedup } from "files-sdk/dedup";

const files = createFiles({
  adapter: s3({ bucket: "uploads" }),
  plugins: [dedup()],
});

await files.upload("a.png", bytes);
await files.upload("b.png", bytes); // same content — no second byte upload
await files.copy("a.png", "c.png"); // shares the one stored blob

How it works

The logical key (a.png) becomes an empty object whose metadata records the content hash; the bytes live at .dedup/<sha256>. Two keys with identical content point at the same blob, so the content is stored once no matter how many keys reference it.

upload hashes the body, writes the blob only if that hash isn’t already stored (exists()), then writes the pointer.
download follows the pointer to the blob and returns it under your key — ranges included, because blobs are stored verbatim (unlike compression()).
head / list report the logical content size with the internal fields stripped, without fetching the blob. Normal listings also hide the blob store.
copy / move relocate the small pointer, so duplicating a de-duplicated file is near-free and the copy shares the original blob.

Bulk upload([...]) / download([...]) are de-duplicated per item. Objects without this plugin’s marker — pre-existing, or written by another tool — pass straight through on read, so it’s safe to enable on a bucket that already has data.
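The flow described above can be modelled in a few lines. This is a minimal in-memory sketch, not the plugin's implementation: the `DedupSketch` class, the `blobWrites` counter, and the `dedup-sha256` / `dedup-size` metadata field names are hypothetical, and a `Map` stands in for the bucket.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in for a stored object: a body plus string metadata.
type StoredObject = { body: Uint8Array; metadata: Record<string, string> };

class DedupSketch {
  readonly objects = new Map<string, StoredObject>();
  blobWrites = 0; // how many times body bytes were actually written
  private readonly prefix: string;

  constructor(prefix = ".dedup") {
    this.prefix = prefix;
  }

  private blobKey(hash: string): string {
    return `${this.prefix}/${hash}`;
  }

  upload(key: string, body: Uint8Array): void {
    const hash = createHash("sha256").update(body).digest("hex");
    const blobKey = this.blobKey(hash);
    // Write the bytes only if this hash is not stored yet.
    if (!this.objects.has(blobKey)) {
      this.objects.set(blobKey, { body, metadata: {} });
      this.blobWrites++;
    }
    // The logical key is an empty object whose metadata records the hash.
    // These metadata field names are invented for the sketch.
    this.objects.set(key, {
      body: new Uint8Array(0),
      metadata: { "dedup-sha256": hash, "dedup-size": String(body.length) },
    });
  }

  download(key: string): Uint8Array {
    const obj = this.objects.get(key);
    if (!obj) throw new Error(`not found: ${key}`);
    const hash = obj.metadata["dedup-sha256"];
    // Objects without the marker pass straight through.
    if (hash === undefined) return obj.body;
    const blob = this.objects.get(this.blobKey(hash));
    if (!blob) throw new Error(`missing blob for ${key}`);
    return blob.body;
  }

  copy(from: string, to: string): void {
    const obj = this.objects.get(from);
    if (!obj) throw new Error(`not found: ${from}`);
    // Only the pointer is duplicated; both keys share one blob.
    this.objects.set(to, { body: obj.body, metadata: { ...obj.metadata } });
  }

  list(): string[] {
    // Hide the blob store from normal listings.
    const blobRoot = `${this.prefix}/`;
    return Array.from(this.objects.keys())
      .filter((k) => !k.startsWith(blobRoot))
      .sort();
  }
}

const store = new DedupSketch();
const bytes = new TextEncoder().encode("same content");
store.upload("a.png", bytes);
store.upload("b.png", bytes); // same hash, so no second blob write
store.copy("a.png", "c.png"); // pointer copy only
console.log(store.blobWrites); // 1
console.log(store.list()); // [ 'a.png', 'b.png', 'c.png' ]
```

Three logical keys end up backed by a single stored blob, which is the property the plugin is described as providing.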

Options

| Option | Default | What it does |
| --- | --- | --- |
| `prefix` | `".dedup"` | Where the content-addressed blobs live. Hidden from `list()`. |

Objects under prefix are never themselves de-duplicated and are hidden from list() (unless you list within the prefix). Don’t store your own data there.

Ordering

Put dedup() first, before any body-transforming plugin. Encrypted bytes don’t de-duplicate — a random per-object key makes identical inputs encrypt to different bytes — so de-duplication has to see the original content:

plugins: [dedup(), compression(), encryption(key)];

With this order the one stored blob is itself compressed and encrypted, and reads unwind the onion automatically (decrypt → decompress → follow the pointer).
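To see why de-duplication has to run on the original content, the sketch below hashes the same bytes before and after encryption. It assumes an encrypting step that uses AES-256-GCM with a fresh random key and IV per object; `encryptWithRandomKey` is a hypothetical stand-in, not the SDK's encryption() plugin.

```typescript
import { createCipheriv, createHash, randomBytes } from "node:crypto";

const sha256 = (data: Uint8Array): string =>
  createHash("sha256").update(data).digest("hex");

// Hypothetical stand-in for an encrypting plugin:
// a fresh random key and IV for every object.
function encryptWithRandomKey(plaintext: Uint8Array): Buffer {
  const key = randomBytes(32);
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  return Buffer.concat([cipher.update(plaintext), cipher.final()]);
}

const body = new TextEncoder().encode("identical content");

// Hashing the original content is stable, so duplicates are detectable.
const plainHashesMatch = sha256(body) === sha256(body);

// Hashing after encryption is not: each ciphertext is different.
const cipherHashesMatch =
  sha256(encryptWithRandomKey(body)) === sha256(encryptWithRandomKey(body));

console.log(plainHashesMatch); // true
console.log(cipherHashesMatch); // false: different key and IV each time
```

If the hashing step ran after encryption, every upload would look like new content and nothing would be shared.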

Things to keep in mind

It buffers the whole body to hash it, so — like compression() — it’s unsuitable for unknown-length streaming uploads and resumable uploads.
Reads cost a second fetch. A download reads the pointer, then the blob; a ranged download does a head first. head and list add nothing — they only read the pointer.
It needs adapter metadata support. The hash is written to and read back from object metadata, so an adapter that rejects uploads carrying metadata will reject de-duplicated uploads for the same reason.
url() and signedUploadUrl() fail closed. A presigned GET would hand out the empty pointer, not the content, and a presigned PUT would write directly and bypass content-addressing — so both throw. Download through the Files instance instead.
Blobs aren’t garbage-collected. delete (and an overwrite) remove the pointer but leave the content-addressed blob in place, so it is reused if the same bytes reappear. Reclaim unreferenced blobs with a storage lifecycle rule or a periodic sweep.
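A periodic sweep could follow a mark-and-sweep pattern: collect every hash still referenced by a pointer, then remove blobs that no pointer references. The sketch below works on an in-memory model only. `sweepUnreferencedBlobs`, `BlobInfo`, and the `minAgeMs` grace period are assumptions made for illustration; how you enumerate pointers and blobs against a real bucket depends on your adapter and is not shown.

```typescript
// Hypothetical record of when a blob was written, in milliseconds.
type BlobInfo = { writtenAt: number };

// pointers: logical key -> content hash
// blobs:    content hash -> blob info (the contents of the store prefix)
function sweepUnreferencedBlobs(
  pointers: Map<string, string>,
  blobs: Map<string, BlobInfo>,
  now: number,
  minAgeMs: number,
): string[] {
  // Mark: every hash that at least one pointer still references.
  const referenced = new Set(pointers.values());
  const removed: string[] = [];
  // Sweep over a snapshot so deleting during the loop is safe.
  for (const [hash, info] of Array.from(blobs.entries())) {
    const oldEnough = now - info.writtenAt >= minAgeMs;
    if (!referenced.has(hash) && oldEnough) {
      blobs.delete(hash);
      removed.push(hash);
    }
  }
  return removed;
}

const pointers = new Map<string, string>([
  ["a.png", "hash-1"],
  ["c.png", "hash-1"],
]);
const blobs = new Map<string, BlobInfo>([
  ["hash-1", { writtenAt: 0 }], // still referenced
  ["hash-2", { writtenAt: 0 }], // orphaned and old
  ["hash-3", { writtenAt: 9_500 }], // orphaned but recent
]);

const removed = sweepUnreferencedBlobs(pointers, blobs, 10_000, 1_000);
console.log(removed); // [ 'hash-2' ]
```

The grace period is a design choice in this sketch: an upload writes the blob before its pointer, so a sweep that ignored blob age could remove a blob whose pointer is about to be written.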