Zip Files are Everywhere
I grew up with PKZIP. It was a huge improvement over ARC and the other archive formats supported on PCs in the 1980s and 1990s. Open source implementations of Zip have been available on Linux since the early days, and today tools like 7-Zip provide a popular, friendly interface to the contents of these files. Hopefully this history is already familiar to you, because I’m not going to focus on how useful Zip is to users but on how often developers have turned to it.
Zip files are almost certainly already on your computer; they just don’t have a .zip extension. Software developers use the Zip format so they don’t have to reinvent the wheel of compression or of bundling multiple files into one, and many programming languages provide libraries for reading and adding files in Zip archives. So if you use any of the popular office suites, your documents are probably Zip files in disguise. If you develop in Java, Python, Android, or for browsers, you’ve used Zip files that did not have a .zip extension, and you probably had no idea that Zip was under the hood.
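To see this for yourself, here is a minimal Python sketch using the standard library’s `zipfile` module. It decides whether a file is a Zip archive from its contents rather than its name, then lists the members. The demo builds a throwaway file called `report.docx` whose member names only imitate a Word document’s layout; it is not a valid .docx, and `peek_inside` and `demo_docx` are hypothetical helper names.

```python
import tempfile
import zipfile
from pathlib import Path


def peek_inside(path: Path) -> list[str]:
    """List the members of any Zip-based file, whatever its extension."""
    # is_zipfile() checks the file's bytes for a Zip end-of-central-directory
    # record; the file name and extension play no part.
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a Zip archive")
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()


def demo_docx() -> list[str]:
    # Hypothetical stand-in: a Zip named .docx whose members only imitate
    # a Word document's layout. It is NOT a valid .docx file.
    with tempfile.TemporaryDirectory() as tmp:
        fake_doc = Path(tmp) / "report.docx"
        with zipfile.ZipFile(fake_doc, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.writestr("[Content_Types].xml", "<Types/>")
            zf.writestr("word/document.xml", "<document/>")
        return peek_inside(fake_doc)


print(demo_docx())  # ['[Content_Types].xml', 'word/document.xml']
```

Pointing `peek_inside` at a real .docx, .jar, or .epub should list the XML, class files, or XHTML packed inside.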
Formats that are Zip Files
I’ve tried to link to specs where I could find them. I’m sure this list is incomplete, but it is already remarkably broad, and in many cases the Zip-based format is the most widely used format for that application.
Office & Document Formats
- Microsoft Office XML: .docx, .xlsx, .pptx, .vsdx (Visio)
- OpenDocument: .odt, .ods, .odp, .odg (LibreOffice/OpenOffice)
- Apple iWork: .pages, .numbers, .key (Keynote)
- EPUB ebooks: .epub
Development & Application Files
- Java archives: .jar, .war, .ear
- Android packages: .apk
- Browser extensions: .xpi (Firefox), .crx (Chrome; the Zip data follows a header)
- Python wheels: .whl
- Sublime Text packages: .sublime-package
- PyTorch packages: .pt
Design & Graphics
- Sketch: .sketch (Mac design tool)
- OpenRaster: .ora (GIMP, Krita)
- Adobe InDesign packages: .indd (basic .indd files are not Zips, but they are when packaged)
- KMZ: .kmz (Google Earth, compressed KML)
3D & Manufacturing
Label & Specialty Formats
- Brother label files: .lbx, .pte (P-touch Editor)
- Mozilla Archive Format: .maff
- OpenDocument templates: .ott, .ots, .otp
The long tail of Zip-based formats probably has hundreds more examples.
Other Contenders for Stealth File Format Domination
Since this is the computer business, we have to be clear about our attitude toward standardization. As Andrew Tanenbaum put it:
The nice thing about standards is that you have so many to choose from; furthermore, if you do not like any of them, you can just wait for next year’s model.
So let’s look at some of the other choices.
SQLite
SQLite is an amazing database thanks to its extremely thorough test suite and small footprint. Relational databases are a handy way to access and update your application’s data, so SQLite is often embedded to provide the storage layer. Again, avoiding reinventing the wheel is a huge win. Applications you are probably already using that rely on SQLite internally include:
- Browser data: Chrome/Firefox history, cookies, bookmarks
- Apple: iMessage database, Photos library metadata, iOS backups
- Application data: Skype, Adobe Lightroom catalogs, many mobile apps
This also means you can use SQLite-compatible tools to poke around in the data layer of your applications. Go exploring. I can’t wait to hear what you build on this data that goes far beyond what the developers envisioned.
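As a starting point for that exploration, here is a minimal sketch using Python’s built-in `sqlite3` module. It opens a database file read-only through a `file:` URI with `mode=ro` and lists its tables from `sqlite_master`. Applications such as browsers may hold locks on their live databases, so it is usually safer to point this at a copy. The demo database and its `visits` and `bookmarks` tables are hypothetical stand-ins, not any real application’s schema.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def list_tables(db_path: Path) -> list[str]:
    """Return the table names in a SQLite file, opened read-only."""
    # mode=ro asks SQLite to refuse writes; uri=True enables file: URIs.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def demo_db() -> list[str]:
    # Hypothetical stand-in for an application database.
    with tempfile.TemporaryDirectory() as tmp:
        db = Path(tmp) / "app.sqlite"
        with closing(sqlite3.connect(db)) as conn:
            conn.execute("CREATE TABLE visits (url TEXT, visited_at INTEGER)")
            conn.execute("CREATE TABLE bookmarks (url TEXT, title TEXT)")
            conn.commit()
        return list_tables(db)


print(demo_db())  # ['bookmarks', 'visits']
```

From there, ordinary `SELECT` queries against whatever tables turn up are the natural next step.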
Naturally there are a slew of database products built on this foundation.
Proprietary Products Built on SQLite
- SQLCipher - Adds 256-bit AES encryption to SQLite. Has both open-source and commercial editions. Used by many commercial apps requiring encrypted local storage.
- Couchbase Lite - Mobile/embedded NoSQL database that uses SQLite for storage underneath its document model. Syncs with Couchbase Server.
- ActorDB - Commercial distributed SQL database that uses SQLite as the storage engine on each node with actor-model distribution on top.
- Oracle Berkeley DB - Oracle’s embedded database has SQL API support built on SQLite components (though Berkeley DB itself predates this).
Distributed/Replicated SQLite
- Litestream - Streaming replication to S3/Azure/GCS. Makes SQLite viable for production by continuously backing up to object storage.
- LiteFS - Distributed filesystem for SQLite by Fly.io. Provides FUSE-based replication across nodes.
- rqlite - Distributed relational database using Raft consensus. SQLite on every node with automatic replication.
- dqlite - Canonical’s distributed SQLite built on Raft. Used in LXD and other infrastructure projects.
- Turso - libSQL-based (SQLite fork) edge database with global replication. Commercial service built on SQLite foundation.
Enhanced SQLite Variants
- libSQL - Open-source SQLite fork by Turso/ChiselStrike. Adds features like WASM functions, encryption, virtual WAL interfaces.
- SQLite Cloud - Commercial distributed SQLite service with client-server architecture.
- cr-sqlite - CRDTs for SQLite, enabling multi-writer replication and local-first applications.
Application Frameworks
- Datasette - Publishes SQLite databases as interactive websites/APIs. Great for data journalism and exploration.
The distributed variants (rqlite, dqlite, LiteFS) are particularly interesting for production use on Linux servers where you want SQLite’s simplicity with multi-node reliability.
HDF5 (Hierarchical Data Format)
Scientific computing’s universal container:
- MATLAB: .mat files (version 7.3+)
- Neuroimaging: various brain imaging formats
- Climate/geospatial: NetCDF-4
- Python’s PyTables, various scientific data stores
- Tools: h5dump, HDFView
Not that stealthy, and pretty niche, but worth a mention. It would be cool to see it spread beyond scientific computing.
TAR (Tape Archive)
Beyond .tar.gz, it underlies:
- Container images: Docker/OCI layers
- Package formats: .deb (Debian packages contain tar archives)
- Backups: the traditional format for Unix backups
- The go-to for preserving Unix permissions/metadata
- Sadly, not great for programmers: tar has no index, so you have to walk through the archive header by header up to the part you want (and decompress everything before it if it’s a .tar.gz). Zip compresses each file separately and keeps a central directory, so you can seek straight to the right spot and skip gobs of wasted IOPS. As a result, fewer programming languages have built-in support for tar.
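The difference is easy to demonstrate with Python’s standard library. This sketch builds the same 200 small files as both a Zip and a tar archive in memory. For the Zip, `ZipInfo.header_offset` (read from the central directory) says exactly where a member starts; for the tar, a streaming reader has to walk member headers until it finds the one it wants. The file names, sizes, and helper names are arbitrary. Note that `ZipFile` still parses the whole central directory when it opens, but it never has to touch the other members’ data.

```python
import io
import tarfile
import zipfile

# Arbitrary test data: 200 members of 1 KiB each.
NAMES = [f"file{i:03}.txt" for i in range(200)]
PAYLOAD = b"x" * 1024


def build_archives() -> tuple[bytes, bytes]:
    """Write identical contents as a Zip and as an uncompressed tar."""
    zbuf, tbuf = io.BytesIO(), io.BytesIO()
    with zipfile.ZipFile(zbuf, "w") as zf:
        for name in NAMES:
            zf.writestr(name, PAYLOAD)
    with tarfile.open(fileobj=tbuf, mode="w") as tf:
        for name in NAMES:
            info = tarfile.TarInfo(name)
            info.size = len(PAYLOAD)
            tf.addfile(info, io.BytesIO(PAYLOAD))
    return zbuf.getvalue(), tbuf.getvalue()


def zip_offset(zip_bytes: bytes, name: str) -> int:
    # The central directory maps each name to the byte offset of its local
    # header, so a reader can seek straight there.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.getinfo(name).header_offset


def tar_headers_scanned(tar_bytes: bytes, name: str) -> int:
    # A tar file has no index: a streaming ("r|") reader walks member after
    # member until it reaches the one it wants.
    scanned = 0
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r|") as tf:
        for member in tf:
            scanned += 1
            if member.name == name:
                return scanned
    raise KeyError(name)


zip_bytes, tar_bytes = build_archives()
target = NAMES[-1]
print(zip_offset(zip_bytes, target))           # a byte offset to seek to directly
print(tar_headers_scanned(tar_bytes, target))  # 200
```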
RIFF (Resource Interchange File Format)
Microsoft’s chunk-based format:
- Audio: .wav
- Video: .avi
- Images: .webp
- MIDI: some variants
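RIFF’s structure is simple enough to walk by hand: a 12-byte header (`RIFF`, a little-endian 32-bit size, and a four-character form type such as `WAVE`), followed by chunks that each carry a four-character id, a 32-bit size, and a payload padded to an even length. This sketch uses Python’s `struct` module to list the top-level chunks and the standard `wave` module to generate a tiny test file; `riff_chunks` and `tiny_wav` are hypothetical helper names. It does not descend into nested `LIST` chunks, which formats like AVI rely on heavily.

```python
import io
import struct
import wave


def riff_chunks(data: bytes) -> tuple[str, list[tuple[str, int]]]:
    """Return the RIFF form type and the (chunk id, size) pairs that follow it."""
    riff_id, riff_size, form_type = struct.unpack_from("<4sI4s", data, 0)
    if riff_id != b"RIFF":
        raise ValueError("not a RIFF file")
    chunks = []
    # riff_size counts everything after the 8-byte "RIFF" + size prefix.
    pos, end = 12, min(len(data), 8 + riff_size)
    while pos + 8 <= end:
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        chunks.append((chunk_id.decode("ascii"), size))
        # Skip the chunk header and payload; payloads are padded to even length.
        pos += 8 + size + (size & 1)
    return form_type.decode("ascii"), chunks


def tiny_wav() -> bytes:
    """Generate 100 frames of 16-bit mono silence as an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(8000)
        w.writeframes(b"\x00\x00" * 100)
    return buf.getvalue()


print(riff_chunks(tiny_wav()))  # ('WAVE', [('fmt ', 16), ('data', 200)])
```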
Conclusion
I’m a big fan of SQLite, and I’m so glad to see it adopted in so many places, but based on my recollection and research, Zip is still the file format to beat if your goal is world domination.
Let me know if I’ve missed any gems in the stealth file format landscape. You could even file a GitHub issue or pull request if you see a gap that you’re sure should be filled.
Meta
- Cover image was generated with the Nano Banana Pro model using Galaxy.AI.
- Cross-posted on LinkedIn.