dedup
Content-address object bodies by their hash so identical content is stored only once. Re-uploading the same bytes skips the upload, and copies share a single stored blob. No native dependencies; works on any adapter that supports metadata.
The built-in dedup() plugin stores each distinct body once. On upload it hashes the bytes (SHA-256), writes them a single time to a content-addressed blob under a store prefix (.dedup/ by default), and leaves a tiny pointer at your logical key. Upload the same content again — under any key — and the byte upload is skipped; only the pointer is written.
import { createFiles } from "files-sdk";
import { s3 } from "files-sdk/s3";
import { dedup } from "files-sdk/dedup";
const files = createFiles({
  adapter: s3({ bucket: "uploads" }),
  plugins: [dedup()],
});
await files.upload("a.png", bytes);
await files.upload("b.png", bytes); // same content — no second byte upload
await files.copy("a.png", "c.png"); // shares the one stored blob
How it works
The logical key (a.png) becomes an empty object whose metadata records the content hash; the bytes live at .dedup/<sha256>. Two keys with identical content point at the same blob, so the content is stored once no matter how many keys reference it.
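The key layout described above can be sketched in a few lines. This is a minimal illustration, not the plugin's actual code: `blobKey` is a hypothetical helper, and it assumes the hash is written as lowercase hex after the prefix.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper (not part of files-sdk): derive the content-addressed
// key for a body, assuming the layout <prefix>/<sha256 as lowercase hex>.
function blobKey(body: Uint8Array, prefix = ".dedup"): string {
  const hash = createHash("sha256").update(body).digest("hex");
  return `${prefix}/${hash}`;
}

const first = blobKey(Buffer.from("hello"));
const second = blobKey(Buffer.from("hello"));
console.log(first === second); // identical bytes map to the same blob key
```

Because the key depends only on the bytes, any two uploads with the same content resolve to the same blob regardless of their logical keys.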
- upload hashes the body, writes the blob only if that hash isn't already stored (exists()), then writes the pointer.
- download follows the pointer to the blob and returns it under your key, ranges included, because blobs are stored verbatim (unlike compression()).
- head / list report the logical content size with the internal fields stripped, without fetching the blob. list also hides the blob store from normal listings.
- copy / move relocate the small pointer, so duplicating a de-duplicated file is near-free and the copy shares the original blob.
Bulk upload([...]) / download([...]) are de-duplicated per item. Objects without this plugin's marker — pre-existing, or written by another tool — pass straight through on read, so it's safe to enable on a bucket that already has data.
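To make the pointer-and-blob flow concrete, here is a small in-memory model of the operations described above. It is a sketch under stated assumptions, not the files-sdk implementation: the `DedupModel` class and the metadata field names `dedup-sha256` and `dedup-size` are invented for illustration, and a `Map` stands in for the storage adapter.

```typescript
import { createHash } from "node:crypto";

// Illustrative model only. Field names "dedup-sha256" and "dedup-size" are
// made up; the real plugin's marker fields are not documented here.
type StoredObject = { body: Uint8Array; metadata: Record<string, string> };

class DedupModel {
  readonly objects = new Map<string, StoredObject>();
  readonly prefix: string;
  blobWrites = 0; // counts how many times body bytes were actually written

  constructor(prefix = ".dedup") {
    this.prefix = prefix;
  }

  upload(key: string, body: Uint8Array): void {
    const hash = createHash("sha256").update(body).digest("hex");
    const blobKey = `${this.prefix}/${hash}`;
    // Write the blob only if this content is not stored yet.
    if (!this.objects.has(blobKey)) {
      this.objects.set(blobKey, { body, metadata: {} });
      this.blobWrites++;
    }
    // The logical key becomes an empty object whose metadata records the hash.
    this.objects.set(key, {
      body: new Uint8Array(0),
      metadata: { "dedup-sha256": hash, "dedup-size": String(body.length) },
    });
  }

  download(key: string): Uint8Array {
    const obj = this.objects.get(key);
    if (!obj) throw new Error(`not found: ${key}`);
    const hash = obj.metadata["dedup-sha256"];
    if (hash === undefined) return obj.body; // no marker: pass straight through
    const blob = this.objects.get(`${this.prefix}/${hash}`);
    if (!blob) throw new Error(`missing blob for ${key}`);
    return blob.body;
  }

  copy(from: string, to: string): void {
    const obj = this.objects.get(from);
    if (!obj) throw new Error(`not found: ${from}`);
    // Only the small pointer is duplicated; the blob is shared.
    this.objects.set(to, { body: obj.body, metadata: { ...obj.metadata } });
  }

  list(): string[] {
    // Hide the blob store from normal listings.
    return Array.from(this.objects.keys()).filter(
      (k) => !k.startsWith(`${this.prefix}/`),
    );
  }
}

const store = new DedupModel();
const bytes = Buffer.from("same content");
store.upload("a.png", bytes);
store.upload("b.png", bytes); // blob already exists, only the pointer is written
store.copy("a.png", "c.png");
console.log(store.blobWrites); // 1
console.log(store.list()); // [ 'a.png', 'b.png', 'c.png' ]
```

Three logical keys end up referencing one stored blob, and an object written without the marker would be returned unchanged by `download`.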
Options
| Option | Default | What it does |
|---|---|---|
| prefix | ".dedup" | Where the content-addressed blobs live. Hidden from list(). |
Objects under prefix are never themselves de-duplicated and are hidden from list() (unless you list within the prefix). Don't store your own data there.
Ordering
Put dedup() first, before any body-transforming plugin. Encrypted bytes don't de-duplicate — a random per-object key makes identical inputs encrypt to different bytes — so de-duplication has to see the original content:
plugins: [dedup(), compression(), encryption(key)];
With this order the one stored blob is itself compressed and encrypted, and reads unwind the onion automatically (decrypt → decompress → follow the pointer).
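The reason de-duplication has to run before encryption can be checked with Node's built-in crypto module. The sketch below is a stand-in, not the encryption() plugin itself: `encryptWithRandomKey` is a hypothetical function that uses AES-256-GCM with a fresh random key and IV on every call, which is enough to show the effect.

```typescript
import { createCipheriv, createHash, randomBytes } from "node:crypto";

const sha256 = (data: Uint8Array): string =>
  createHash("sha256").update(data).digest("hex");

// Hypothetical stand-in for an encrypting plugin: a fresh random key and IV
// per call, so the same plaintext never encrypts to the same bytes twice.
function encryptWithRandomKey(plain: Uint8Array): Buffer {
  const key = randomBytes(32);
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  return Buffer.concat([cipher.update(plain), cipher.final()]);
}

const body = Buffer.from("identical content");

// Hashing the original bytes: two uploads of the same content agree.
const plainHashesMatch =
  sha256(body) === sha256(Buffer.from("identical content"));

// Hashing after encryption: the two ciphertexts differ, so their hashes do too.
const cipherHashesMatch =
  sha256(encryptWithRandomKey(body)) === sha256(encryptWithRandomKey(body));

console.log(plainHashesMatch); // true
console.log(cipherHashesMatch); // false
```

If the hash were taken after encryption, every upload would produce a new blob key and nothing would ever be shared.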
Things to keep in mind
- It buffers the whole body to hash it, so, like compression(), it's unsuitable for unknown-length streaming uploads and resumable uploads.
- Reads cost a second fetch. A download reads the pointer, then the blob; a ranged download does a head first. head and list add nothing: they only read the pointer.
- It needs adapter metadata support. The hash round-trips through metadata, the same gate a direct metadata upload hits.
- url() and signedUploadUrl() fail closed. A presigned GET would hand out the empty pointer, not the content, and a presigned PUT would write directly and bypass content-addressing, so both throw. Download through the Files instance instead.
- Blobs aren't garbage-collected. delete (and an overwrite) drop the pointer but leave the content addressed, so it's reused if the same bytes reappear. Reclaim unreferenced blobs with a storage lifecycle rule or a periodic sweep.
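A periodic sweep for unreferenced blobs could follow a simple mark-and-sweep shape. The sketch below works on an in-memory listing rather than a real adapter, and both `findUnreferencedBlobs` and the `dedup-sha256` metadata field are hypothetical names chosen for illustration.

```typescript
type Entry = { metadata: Record<string, string> };

// Hypothetical sweep: "dedup-sha256" is a made-up marker field, and the Map
// stands in for a full listing of the bucket including the blob prefix.
function findUnreferencedBlobs(
  objects: Map<string, Entry>,
  prefix = ".dedup",
): string[] {
  // Mark: collect every blob key that some pointer still references.
  const referenced = new Set<string>();
  objects.forEach((entry, key) => {
    const hash = entry.metadata["dedup-sha256"];
    if (!key.startsWith(`${prefix}/`) && hash !== undefined) {
      referenced.add(`${prefix}/${hash}`);
    }
  });
  // Sweep: any blob under the prefix that was not marked is an orphan.
  const orphans: string[] = [];
  objects.forEach((_entry, key) => {
    if (key.startsWith(`${prefix}/`) && !referenced.has(key)) orphans.push(key);
  });
  return orphans;
}

const objects = new Map<string, Entry>([
  [".dedup/aaa", { metadata: {} }],
  [".dedup/bbb", { metadata: {} }],
  ["a.png", { metadata: { "dedup-sha256": "aaa" } }],
]);
console.log(findUnreferencedBlobs(objects)); // [ '.dedup/bbb' ]
```

A real sweep would likely also need to skip recently written blobs, since an upload writes the blob before its pointer and a concurrent sweep could otherwise delete content that is about to be referenced.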
contentType
A security guard that decides each upload's Content-Type from its bytes, not the client's claim. Magic-byte sniffing stops a mislabeled .png that's really HTML or SVG from being stored as an image and served inline. No metadata, no native dependencies, and it never buffers the whole body.
encryption
Envelope-encrypt object bodies at rest with AES-256-GCM. A per-object data key encrypts the body and your master key wraps it into metadata - provider-agnostic, no native dependencies, decrypted transparently on download.