mirror of
https://github.com/dswd/zvault
synced 2025-01-09 22:07:53 +00:00
333 lines
14 KiB
Markdown
333 lines
14 KiB
Markdown
zvault(1) -- Deduplicating backup solution
|
|
==========================================
|
|
|
|
## SYNOPSIS
|
|
|
|
`zvault <SUBCOMMAND>`
|
|
|
|
|
|
|
|
## DESCRIPTION
|
|
|
|
ZVault is a deduplicating backup solution. It creates backups from data read
|
|
from the filesystem or a tar file, deduplicates it, optionally compresses and
|
|
encrypts the data and stores the data in bundles at a potentially remote storage
|
|
location.
|
|
|
|
|
|
|
|
## OPTIONS
|
|
|
|
* `-h`, `--help`:
|
|
|
|
Prints help information
|
|
|
|
|
|
* `-V`, `--version`:
|
|
|
|
Prints version information
|
|
|
|
|
|
|
|
## SUBCOMMANDS
|
|
|
|
|
|
### Main Commands
|
|
|
|
* `init` Initialize a new repository, _zvault-init(1)_
|
|
* `import` Reconstruct a repository from the remote storage, _zvault-import(1)_
|
|
* `backup` Create a new backup, _zvault-backup(1)_
|
|
* `restore` Restore a backup or subtree, _zvault-restore(1)_
|
|
* `check` Check the repository, a backup or a backup subtree, _zvault-check(1)_
|
|
* `list` List backups or backup contents, _zvault-list(1)_
|
|
* `info` Display information on a repository, a backup or a subtree, _zvault-info(1)_
|
|
* `mount` Mount the repository, a backup or a subtree, _zvault-mount(1)_
|
|
* `remove` Remove a backup or a subtree, _zvault-remove(1)_
|
|
* `prune` Remove backups based on age, _zvault-prune(1)_
|
|
* `vacuum` Reclaim space by rewriting bundles, _zvault-vacuum(1)_
|
|
|
|
|
|
### Other Commands
|
|
|
|
* `addkey` Add a key pair to the repository, _zvault-addkey(1)_
|
|
* `algotest` Test a specific algorithm combination, _zvault-algotest(1)_
|
|
* `analyze` Analyze the used and reclaimable space of bundles, _zvault-analyze(1)_
|
|
* `bundleinfo` Display information on a bundle, _zvault-bundleinfo(1)_
|
|
* `bundlelist` List bundles in a repository, _zvault-bundlelist(1)_
|
|
* `config` Display or change the configuration, _zvault-config(1)_
|
|
* `diff` Display differences between two backup versions, _zvault-diff(1)_
|
|
* `genkey` Generate a new key pair, _zvault-genkey(1)_
|
|
* `versions` Find different versions of a file in all backups, _zvault-versions(1)_
|
|
|
|
|
|
## USAGE
|
|
|
|
### Path syntax
|
|
|
|
Most subcommands work with a repository that has to be specified as a parameter.
|
|
If this repository is specified as `::`, the default repository in `~/.zvault`
|
|
will be used instead.
|
|
|
|
Some subcommands need to reference a specific backup in the repository. This is
|
|
done via the syntax `repository::backup_name` where `repository` is the path to
|
|
the repository and `backup_name` is the name of the backup in that repository
|
|
as listed by `zvault list`. In this case, `repository` can be omitted,
|
|
shortening the syntax to `::backup_name`. In this case, the default repository
|
|
is used.
|
|
|
|
Some subcommands need to reference a specific subtree inside a backup. This is
|
|
done via the syntax `repository::backup_name::subtree` where
|
|
`repository::backup_name` specifies a backup as described before and `subtree`
|
|
is the path to the subtree of the backup. Again, `repository` can be omitted,
|
|
yielding the shortened syntax `::backup_name::subtree`.
|
|
|
|
Some subcommands can take either a repository, a backup or a backup subtree. In
|
|
this case it is important to note that if a path component is empty, it is
|
|
regarded as not set at all.
|
|
|
|
Examples:
|
|
|
|
- `~/.zvault` references the repository in `~/.zvault` and is identical with
|
|
`::`.
|
|
- `::backup1` references the backup `backup1` in the default repository
|
|
- `::backup1::/` references the root folder of the backup `backup1` in the
|
|
default repository
|
|
|
|
|
|
## CONFIGURATION OPTIONS
|
|
ZVault offers some configuration options that affect the backup speed, storage
|
|
space, security and RAM usage. Users should select them carefully for their
|
|
scenario. The performance of different combinations can be compared using
|
|
_zvault-algotest(1)_.
|
|
|
|
|
|
### Bundle size
|
|
The target bundle size affects how big bundles written to the remote storage
|
|
will become. The configured size is not a hard maximum, as headers and some
|
|
effects of compression can cause bundles to become slightly larger than this
|
|
size. Also since bundles will be closed at the end of a backup run, some bundles
|
|
can also be smaller than this size. However most bundles will end up with
|
|
approximately the specified size.
|
|
|
|
The configured value for the bundle size has some practical consequences.
|
|
Since the whole bundle is compressed as a whole (a so-called *solid archive*),
|
|
the compression ratio is impacted negatively if bundles are small. Also the
|
|
remote storage could become inefficient if too many small bundle files are
|
|
stored. On the other side, since the whole bundle has to be fetched and
|
|
decompressed to read a single chunk from that bundle, bigger bundles increase
|
|
the overhead of reading the data.
|
|
|
|
The recommended bundle size is 25 MiB, but values between 5 MiB and 100 MiB
|
|
should also be feasable.
|
|
|
|
|
|
### Chunker
|
|
The chunker is the component that splits the input data into so-called *chunks*.
|
|
The main goal of the chunker is to produce as many identical chunks as possible
|
|
when only small parts of the data changed since the last backup. The identical
|
|
chunks do not have to be stored again, thus the input data is deduplicated.
|
|
To achieve this goal, the chunker splits the input data based on the data
|
|
itself, so that identical parts can be detected even when their position
|
|
changed.
|
|
|
|
ZVault offers different chunker algorithms with different properties to choose
|
|
from:
|
|
|
|
- The **rabin** chunker is a very common algorithm with a good quality but a
|
|
mediocre speed (about 350 MB/s).
|
|
- The **ae** chunker is a novel approach that can reach very high speeds
|
|
(over 750 MB/s) at a cost of deduplication rate.
|
|
- The **fastcdc** algorithm reaches a similar deduplication rate as the rabin
|
|
chunker but is faster (about 550 MB/s).
|
|
|
|
The recommended chunker is **fastcdc**.
|
|
|
|
Besides the chunker algorithm, an important setting is the target chunk size,
|
|
i.e. the planned average chunk size. Since the chunker splits the data on
|
|
data-dependent criteria, it will not achieve the configured size exactly.
|
|
The chunk size has a number of practical implications. Since deduplication works
|
|
by identifying identical chunks, smaller chunk sizes will be able to find more
|
|
identical chunks and thereby reduce the overall storage space.
|
|
|
|
On the other side, the index needs to store 24 bytes per chunk, so many small
|
|
chunks will take more space than few big chunks. Since the index of all chunks
|
|
in the repository needs to be loaded into memory during the backup, huge
|
|
repositories can get a problem with memory usage. Since the index could be only
|
|
40% filled and the chunker could yield smaller chunks than configured, 100 bytes
|
|
per chunk should be a safe value to calculate with.
|
|
|
|
The configured value for chunk size needs to be a power of 2. Here is a
|
|
selection of chunk sizes and their estimated RAM usage:
|
|
|
|
- Chunk size 4 KiB => ~40 GiB data stored in 1 GiB RAM
|
|
- Chunk size 8 KiB => ~80 GiB data stored in 1 GiB RAM
|
|
- Chunk size 16 KiB => ~160 GiB data stored in 1 GiB RAM
|
|
- Chunk size 32 KiB => ~325 GiB data stored in 1 GiB RAM
|
|
- Chunk size 64 KiB => ~650 GiB data stored in 1 GiB RAM
|
|
- Chunk size 128 KiB => ~1.3 TiB data stored in 1 GiB RAM
|
|
- Chunk size 256 KiB => ~2.5 TiB data stored in 1 GiB RAM
|
|
- Chunk size 512 KiB => ~5 TiB data stored in 1 GiB RAM
|
|
- Chunk size 1024 KiB => ~10 TiB data stored in 1 GiB RAM
|
|
|
|
The recommended chunk size for normal computers is 16 KiB. Servers with lots of
|
|
data might want to use 128 KiB or 1024 KiB instead.
|
|
|
|
The chunker algortihm and chunk size are configured together in the format
|
|
`algorithm/size` where algorithm is one of `rabin`, `ae` and `fastcdc` and size
|
|
is the size in KiB e.g. `16`. So the recommended configuration is `fastcdc/16`.
|
|
|
|
Please not that since the chunker algorithm and chunk size affect the chunks
|
|
created from the input data, any change to those values will make existing
|
|
chunks inaccessible for deduplication purposes. The old data is still readable
|
|
but new backups will have to store all data again.
|
|
|
|
|
|
### Compression
|
|
ZVault offers different compression algorithms that can be used to compress the
|
|
stored data after deduplication. The compression ratio that can be achieved
|
|
mostly depends on the input data (test data can be compressed well and media
|
|
data like music and videos are already compressed and can not be compressed
|
|
significantly).
|
|
|
|
Using a compression algorithm is a trade-off between backup speed and storage
|
|
space. Higher compression takes longer and saves more space while low
|
|
compression is faster but needs more space.
|
|
|
|
ZVault supports the following compression methods:
|
|
|
|
- **deflate** (also called *zlib* and *gzip*) is the most common algorithm today
|
|
and guarantees that backups can be decompressed in future centuries. Its
|
|
speed and compression ratio are acceptable but other algorithms are better.
|
|
This is a rather conservative choice. This algorithm supports the levels 1
|
|
(fastest) to 9 (best).
|
|
- **lz4** is a very fast compression algorithm that does not impact backup speed
|
|
very much. It does not compress as good as other algorithms but is much faster
|
|
than all other algorithms. This algorithm supports levels 1 (fastest) to 14
|
|
(best) but levels above 7 are significantly slower and not recommended.
|
|
- **brotli** is a modern compression algorithm that is both faster and
|
|
compresses better than deflate. It offers a big range of compression ratios
|
|
and speeds via its levels. This algorithm supports levels 1 (fastest) to 10
|
|
(best).
|
|
- **lzma** is about the algorithm with the best compression today. That comes
|
|
at the cost of speed. LZMA is rather slow at all levels so it can slow down
|
|
the backup speed significantly. This algorithm supports levels 1 (fastest) to
|
|
9 (best).
|
|
|
|
The recommended combinations are:
|
|
|
|
- Focusing speed: lz4 with level between 1 and 7
|
|
- Balanced focus: brotli with levels between 1 and 10
|
|
- Focusing storage space: lzma with levels between 1 and 9
|
|
|
|
The compression algorithm and level are configured together via the syntax
|
|
`algorithm/level` where `algorithm` is either `deflate`, `lz4`, `brotli` or
|
|
`lzma` and `level` is a number.
|
|
|
|
The default compression setting is **brotli/3**.
|
|
|
|
Since the compression ratio and speed hugely depend on the input data,
|
|
_zvault-algotest(1)_ should be used to compare algorithms with actual input
|
|
data.
|
|
|
|
|
|
|
|
### Encryption
|
|
When enabled, zVault uses modern encryption provided by *libsodium* to encrypt
|
|
the bundles that are stored remotely. This makes it impossible for anyone with
|
|
access to the remote bundles to read their contents or to modify them.
|
|
|
|
zVault uses asymmetric encryption, which means that encryption uses a so called
|
|
*public key* and decryption uses a different *secret key*. This makes it
|
|
possible to setup a backup configuration where the machine can only create
|
|
backups but not read them. Since lots of subcommands need to read the backups,
|
|
this setup is not recommended in general.
|
|
|
|
The key pairs used by zVault can be created by _zvault-genkey(1)_ and added to a
|
|
repository via _zvault-addkey(1)_ or upon creation via the `--encryption` flag
|
|
in _zvault-init(1)_.
|
|
|
|
**Important: The key pair is needed to read and restore any encrypted backup.
|
|
Loosing the secret key means that all data in the backups is lost forever.
|
|
There is no backdoor, even the developers of zVault can not recover a lost key
|
|
pair. So it is important to store the key pair in a safe location. The key pair
|
|
is small enough to be printed on paper for example.**
|
|
|
|
|
|
### Hash method
|
|
ZVault uses hash fingerprints to identify chunks. It is critically important
|
|
that no two chunks have the same hash value (a so-called hash collision) as this
|
|
would cause one chunk to overwrite the other chunk. For this purpose zVault uses
|
|
128 bit hashes, that have a collision probability of less than 1.5e-15 even for
|
|
1 trillion stored chunks (about 15.000 TiB stored data in 16 KiB chunks).
|
|
|
|
ZVault offers two different hash algorithms: **blake2** and **murmur3**.
|
|
|
|
Murmur3 is blazingly fast but is not cryptographically secure. That means that
|
|
while random hash collisions are negligible, an attacker with access to files
|
|
could manipulate a file so that it will cause a hash collision and affects other
|
|
data in the repository. **This hash should only be used when the security
|
|
implications of this are fully understood.**
|
|
|
|
Blake2 is slower than murmur3 but also pretty fast and this hash algorithm is
|
|
cryptographically secure, i.e. even an attacker can not cause hash collisions.
|
|
|
|
The recommended hash algorithm is **blake2**.
|
|
|
|
|
|
|
|
## EXAMPLES
|
|
|
|
This command will initialize a repository in the default location with
|
|
encryption enabled:
|
|
|
|
$> zvault init :: -e --remote /mnt/remote/backups
|
|
|
|
Before using this repository, the key pair located at `~/.zvault/keys` should be
|
|
backed up in a safe location (e.g. printed to paper).
|
|
|
|
This command will create a backup of the whole system tagged by date:
|
|
|
|
$> zvault backup / ::system/$(date +%F)
|
|
|
|
If the home folders are mounted on /home, the following command can be used to
|
|
backup them separatly (zVault will not backup mounted folders by default):
|
|
|
|
$> zvault backup /home ::homes/$(date +%F)
|
|
|
|
The backups can be listed by this command:
|
|
|
|
$> zvault list ::
|
|
|
|
and inspected by this command (the date needs to be adapted):
|
|
|
|
$> zvault info ::homes/2017-04-06
|
|
|
|
To restore some files from a backup, the following command can be used:
|
|
|
|
$> zvault restore ::homes/2017-04-06::bob/file.dat /tmp
|
|
|
|
Alternatively the repository can be mounted with this command:
|
|
|
|
$> zvault mount ::homes/2017-04-06 /mnt/tmp
|
|
|
|
A single backup can be removed with this command:
|
|
|
|
$> zvault remove ::homes/2017-04-06
|
|
|
|
Multiple backups can be removed based on their date with the following command
|
|
(add `-f` to actually remove backups):
|
|
|
|
$> zvault prune :: --prefix system --daily 7 --weekly 5 --monthly 12
|
|
|
|
To reclaim storage space after removing some backups vacuum needs to be run
|
|
(add `-f` to actually remove bundles):
|
|
|
|
$> zvault vacuum ::
|
|
|
|
|
|
|
|
## COPYRIGHT
|
|
|
|
Copyright (C) 2017 Dennis Schwerdel
|
|
This software is licensed under GPL-3 or newer (see LICENSE.md)
|