mirror of https://github.com/dswd/zvault
Repository readme
parent
883c4c1c24
commit
4145160660
|
@ -1,79 +0,0 @@
|
|||
% Bundle file format
|
||||
## Bundle file format
|
||||
|
||||
The bundle file format consists of 4 parts:
|
||||
- A magic header with version
|
||||
- An encoded header structure
|
||||
- An encoded chunk list
|
||||
- The chunk data
|
||||
|
||||
The main reason for having those multiple parts is that it is expected that the
|
||||
smaller front parts can be read much faster than the whole file. So
|
||||
information that is needed more frequently is put into earlier parts and the
|
||||
data that is needed least frequently is put into the last part so that it does
|
||||
not slow down reading the front parts. Keeping those parts in separate files
|
||||
was also considered but rejected to increase the reliability of the storage.
|
||||
|
||||
|
||||
### Magic header with version
|
||||
The first part of a bundle file contains an 8 byte magic header with version
|
||||
information.
|
||||
|
||||
The first 6 bytes of the header consist of the fixed string "zvault", followed
|
||||
by one byte with the fixed value 0x01. Those 7 bytes make up the magic header of
|
||||
the file and serve to identify the file type as a zvault bundle file.
|
||||
|
||||
The 8th byte of the first file part is the version of the file format. This
|
||||
value is currently 0x01 and is expected to be increased for any breaking changes
|
||||
in the file format.
|
||||
|
||||
|
||||
### Encoded header structure
|
||||
The encoded header structure is the second part of the bundle file format and
|
||||
follows directly after the 8 bytes of the magic header.
|
||||
|
||||
The header structure is defined in `bundle.rs` as `BundleInfo` and contains
|
||||
general information on the bundle's contents and on how to decode the other two
|
||||
parts of the bundle file.
|
||||
|
||||
This header structure is encoded using the *MsgPack* format. It is neither
|
||||
compressed (since its size is pretty small) nor encrypted (since it only
|
||||
contains general information and no user data) in any way.
|
||||
|
||||
|
||||
### Encoded chunk list
|
||||
The chunk list is the third part of the bundle file and follows directly after
|
||||
the encoded header structure.
|
||||
|
||||
The chunk list contains hashes and sizes of all chunks stored in this bundle in
|
||||
the order they are stored. The list is encoded efficiently as 20 bytes per chunk
|
||||
(16 for the hash and 4 for the size) as defined in `../util/chunk.rs`.
|
||||
|
||||
Since the chunk list contains confidential information (the chunk hashes and
|
||||
sizes can be used to identify files) the encoded chunk list is encrypted using
|
||||
the encryption method specified in the header structure. The header structure
|
||||
also contains the full size of the encoded and encrypted chunk list which is
|
||||
needed since the encryption could add some bytes for a nonce or an
|
||||
authentication code.
|
||||
|
||||
The chunk list is not compressed since the hashes have a very high entropy and
|
||||
do not compress significantly.
|
||||
|
||||
The chunk list is not stored in the header structure because it contains
|
||||
confidential data and the encryption method is stored in the header. Also the
|
||||
chunk list can be pretty big compared to the header which needs to be read more
|
||||
often.
|
||||
|
||||
|
||||
### Chunk data
|
||||
The chunk data is the final part of a bundle file and follows after the encoded
|
||||
chunk list. The starting position can be obtained from the header as the encoded
|
||||
size of the chunk list is stored there.
|
||||
|
||||
The chunk data part consists of the content data of the chunks contained in this
|
||||
bundle simply concatenated without any separator. The actual size (and by
|
||||
summing up the sizes also the starting position) of each chunk can be obtained
|
||||
from the chunk list.
|
||||
|
||||
The chunk data is compressed as a whole (solid archive) and encrypted with the
|
||||
methods specified in the bundle header structure.
|
|
@ -15,6 +15,9 @@ The repository contains the following components:
|
|||
is running for sure.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
## Repository format
|
||||
|
||||
In case the zVault software is not available for restoring the backups included
|
||||
|
@ -23,11 +26,12 @@ so that its contents can be read without zVault.
|
|||
|
||||
|
||||
### Bundle files
|
||||
The bundle file format consists of 4 parts:
|
||||
The bundle file format consists of 5 parts:
|
||||
- A magic header with version
|
||||
- An encoded header structure
|
||||
- An encoded chunk list
|
||||
- The chunk data
|
||||
- A tiny header with encryption information
|
||||
- An encoded and encrypted bundle information structure
|
||||
- An encoded and encrypted chunk list
|
||||
- The chunk data (compressed and encrypted)
|
||||
|
||||
The main reason for having those multiple parts is that it is expected that the
|
||||
smaller front parts can be read much faster than the whole file. So
|
||||
|
@ -50,87 +54,463 @@ value is currently 0x01 and is expected to be increased for any breaking changes
|
|||
in the file format.
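
The check described above can be sketched as follows. This is a minimal illustration and not code from the zVault sources; the names `BUNDLE_MAGIC` and `parse_bundle_magic` are hypothetical.

```rust
/// Magic bytes of a bundle file: the string "zvault" followed by 0x01.
/// (Hypothetical constant name, chosen for this sketch.)
const BUNDLE_MAGIC: [u8; 7] = *b"zvault\x01";

/// Checks the first 7 bytes of the 8-byte magic header and returns the
/// file format version stored in the 8th byte.
fn parse_bundle_magic(header: &[u8; 8]) -> Result<u8, String> {
    if header[..7] != BUNDLE_MAGIC[..] {
        return Err("not a zvault bundle file".to_string());
    }
    Ok(header[7])
}
```

A reader would typically compare the returned version against the versions it supports (currently only 0x01) before decoding the rest of the file.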
|
||||
|
||||
|
||||
#### Encoded header structure
|
||||
The encoded header structure is the second part of the bundle file format and
|
||||
follows directly after the 8 bytes of the magic header.
|
||||
#### Encryption header
|
||||
The encryption header is the second part of the bundle file format and follows
|
||||
directly after the 8 bytes of the magic header.
|
||||
|
||||
The header structure is defined in the appendix as `BundleInfo` and contains
|
||||
general information on the bundle's contents and on how to decode the other two
|
||||
parts of the bundle file.
|
||||
The header structure is defined in the appendix as `BundleHeader` and contains
|
||||
information on how to decrypt the other parts of the bundle as well as the
|
||||
encrypted size of the following bundle information.
|
||||
|
||||
This header structure is encoded using the *MsgPack* format. It is neither
|
||||
compressed (since its size is pretty small) nor encrypted (since it only
|
||||
contains general information and no user data) in any way.
|
||||
Please note that this header exists even when the bundle is not encrypted (the
|
||||
header then contains no encryption method).
|
||||
|
||||
|
||||
#### Bundle information
|
||||
The bundle information structure is the third part of the bundle file format and
|
||||
follows directly after the encryption header.
|
||||
|
||||
The information structure is defined in the appendix as `BundleInfo` and
|
||||
contains general information on the bundle's contents and on how to decode the
|
||||
other two parts of the bundle file.
|
||||
|
||||
This structure is encrypted using the method described in the previous
|
||||
encryption header since it contains confidential information (the bundle id
|
||||
could be used to identify the data contained in the bundle). The size of the
|
||||
encrypted structure is also stored in the previous header. This structure is not
|
||||
compressed, as it is pretty small.
|
||||
|
||||
|
||||
#### Encoded chunk list
|
||||
The chunk list is the third part of the bundle file and follows directly after
|
||||
the encoded header structure.
|
||||
The chunk list is the fourth part of the bundle file and follows directly after
|
||||
the bundle information structure.
|
||||
|
||||
The chunk list contains hashes and sizes of all chunks stored in this bundle in
|
||||
the order they are stored. The list is encoded efficiently as 20 bytes per chunk
|
||||
(16 for the hash and 4 for the size) as defined in the appendix as `ChunkList`.
|
||||
the order they are stored. The list is encoded as defined in the appendix as
|
||||
`ChunkList`.
|
||||
|
||||
Since the chunk list contains confidential information (the chunk hashes and
|
||||
sizes can be used to identify files) the encoded chunk list is encrypted using
|
||||
the encryption method specified in the header structure. The header structure
|
||||
also contains the full size of the encoded and encrypted chunk list which is
|
||||
needed since the encryption could add some bytes for a nonce or an
|
||||
authentication code.
|
||||
the encryption method specified in the encryption header. The bundle information
|
||||
structure contains the full size of the encoded and encrypted chunk list as
|
||||
`chunk_list_size` which is needed since the encryption could add some bytes for
|
||||
a nonce or an authentication code.
|
||||
|
||||
The chunk list is not compressed since the hashes have a very high entropy and
|
||||
do not compress significantly.
|
||||
|
||||
The chunk list is not stored in the header structure because it contains
|
||||
confidential data and the encryption method is stored in the header. Also the
|
||||
chunk list can be pretty big compared to the header which needs to be read more
|
||||
often.
|
||||
The chunk list is not stored in the bundle info structure because it can be
|
||||
pretty big compared to the info structure which needs to be read more often.
|
||||
|
||||
|
||||
#### Chunk data
|
||||
The chunk data is the final part of a bundle file and follows after the encoded
|
||||
chunk list. The starting position can be obtained from the header as the encoded
|
||||
size of the chunk list is stored there.
|
||||
The chunk data is the final part of a bundle file and follows after the chunk
|
||||
list. The starting position can be obtained from the bundle info structure as
|
||||
the encoded size of the chunk list is stored there as `chunk_list_size`.
|
||||
|
||||
The chunk data part consists of the content data of the chunks contained in this
|
||||
bundle simply concatenated without any separator. The actual size (and by
|
||||
summing up the sizes also the starting position) of each chunk can be obtained
|
||||
from the chunk list.
|
||||
The chunk data part consists of the data of the chunks contained in this
|
||||
bundle simply concatenated without any separator. The individual chunk sizes can
|
||||
be obtained from the chunk list. The starting position of any chunk can be
|
||||
calculated by summing up the sizes of all previous chunks.
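
The offset calculation can be sketched as a running sum. The function name `chunk_offsets` is hypothetical, and the sketch assumes that the offsets refer to the chunk data after decryption and decompression.

```rust
/// Returns the starting offset of every chunk within the chunk data,
/// given the chunk sizes in the order in which the chunks are stored.
/// Assumption: offsets refer to the decrypted, decompressed data.
fn chunk_offsets(sizes: &[u32]) -> Vec<u64> {
    let mut offsets = Vec::with_capacity(sizes.len());
    let mut position = 0u64;
    for &size in sizes {
        // A chunk starts where all previous chunks end.
        offsets.push(position);
        position += u64::from(size);
    }
    offsets
}
```

For chunk sizes of 100, 50 and 200 bytes this yields the offsets 0, 100 and 150.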
|
||||
|
||||
The chunk data is compressed as a whole (solid archive) and encrypted with the
|
||||
methods specified in the bundle header structure.
|
||||
methods specified in the bundle information structure.
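
Putting the parts together, the byte ranges of the last three parts follow from the two size fields. The sketch below is only an illustration with hypothetical names (`BundleLayout`, `bundle_layout`). It assumes that `header_end` is the file offset directly after the MessagePack-encoded encryption header, which a reader only knows after decoding that header.

```rust
use std::ops::Range;

/// Byte ranges of the parts that follow the encryption header.
#[derive(Debug, PartialEq)]
struct BundleLayout {
    info: Range<u64>,
    chunk_list: Range<u64>,
    chunk_data_start: u64,
}

/// Computes the layout from `info_size` (stored in the encryption header)
/// and `chunk_list_size` (stored in the bundle information structure).
fn bundle_layout(header_end: u64, info_size: u64, chunk_list_size: u64) -> BundleLayout {
    let info_end = header_end + info_size;
    let chunk_list_end = info_end + chunk_list_size;
    BundleLayout {
        info: header_end..info_end,
        chunk_list: info_end..chunk_list_end,
        chunk_data_start: chunk_list_end,
    }
}
```

The chunk data extends from `chunk_data_start` to the end of the file.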
|
||||
|
||||
|
||||
### Inode metadata
|
||||
TODO
|
||||
|
||||
### Backup format
|
||||
TODO
|
||||
The repository contains multiple backups that share the data contained in the
|
||||
bundles. The individual backups are encoded in backup files as described in the
|
||||
following section. Those backup files reference a list of chunks in the bundles
|
||||
as a root inode entry. Each inode entry references lists of chunks for its data
|
||||
and potential child entries.
|
||||
|
||||
All chunks that are referenced either in the backup files or in the inode
|
||||
entries are contained in one of the bundles and are uniquely identified by their
|
||||
hash. An index, e.g. a hash table, can help to find the correct bundle quickly.
|
||||
|
||||
|
||||
#### Backup files
|
||||
Backup files contain information on one specific backup and reference the
|
||||
directory root of that backup.
|
||||
|
||||
Backup files consist of 3 parts:
|
||||
- A magic header with version
|
||||
- A tiny header with encryption information
|
||||
- An encoded and encrypted backup information structure
|
||||
|
||||
|
||||
##### Magic header with version
|
||||
The first part of a backup file contains an 8 byte magic header with version
|
||||
information.
|
||||
|
||||
The first 6 bytes of the header consist of the fixed string "zvault", followed
|
||||
by one byte with the fixed value 0x03. Those 7 bytes make up the magic header of
|
||||
the file and serve to identify the file type as a zvault backup file.
|
||||
|
||||
The 8th byte of the first file part is the version of the file format. This
|
||||
value is currently 0x01 and is expected to be increased for any breaking changes
|
||||
in the file format.
|
||||
|
||||
|
||||
##### Encryption header
|
||||
The encryption header is the second part of the backup file format and follows
|
||||
directly after the 8 bytes of the magic header.
|
||||
|
||||
The header structure is defined in the appendix as `BackupHeader` and contains
|
||||
information on how to decrypt the rest of the backup file.
|
||||
|
||||
Please note that this header exists even when the backup file is not encrypted
|
||||
(the header then contains no encryption method).
|
||||
|
||||
|
||||
##### Backup information
|
||||
The backup information structure is the final part of the backup file format and
|
||||
follows directly after the encryption header.
|
||||
|
||||
The information structure is defined in the appendix as `Backup` and
|
||||
contains general information on the backup's contents and references the
|
||||
directory root of the backup tree.
|
||||
|
||||
This structure is encrypted using the method described in the previous
|
||||
encryption header since it contains confidential information. This structure is
|
||||
not compressed, as it is pretty small.
|
||||
|
||||
|
||||
#### Directories & file data
|
||||
The inode entries are encoded as defined in the appendix as `Inode`. The inode
|
||||
structure contains all meta information on an inode entry, e.g. its file type,
|
||||
the data size, modification time, permissions and ownership, etc. Also, the
|
||||
structure contains optional information that is specific to the file type.
|
||||
For regular files, the inode structure contains the data of that file either
|
||||
inline (for very small files) or as a reference via a chunk list.
|
||||
For directories, the inode structure contains a mapping of child inode entries
|
||||
with their name as key and a chunk list referring to their encoded `Inode`
|
||||
structure as value.
|
||||
For symlinks, the inode structure contains the target in the field
|
||||
`symlink_target`.
|
||||
|
||||
Starting from the `root` of the `Backup` structure, the whole backup file tree
|
||||
can be reconstructed by traversing the children of each inode recursively.
|
||||
Since files can only be retrieved by traversing their parent directories, they
|
||||
contain no back link to their parent directory.
|
||||
|
||||
|
||||
|
||||
### Backup file
|
||||
TODO
|
||||
|
||||
|
||||
## Appendix
|
||||
|
||||
### MessagePack encoding
|
||||
|
||||
Most zvault structures are encoded using the MessagePack encoding as specified
|
||||
at http://www.msgpack.org. The version of MessagePack that is used is dated
|
||||
2013-04-21.
|
||||
|
||||
All structure encodings are based on a mapping that associates values to the
|
||||
structure's fields. In order to save space, the structure's fields are not
|
||||
referenced by name but by an assigned number. In the encoding specification,
|
||||
this is written as `FIELD: TYPE => NUMBER` where `FIELD` is the field name used
|
||||
to reference the field in the rest of the description, `TYPE` is the type of the
|
||||
field's values and `NUMBER` is the number used as key for this field in the
|
||||
mapping.
|
||||
|
||||
The simple types used are called `null`, `bool`, `int`, `float`, `string`
|
||||
and `bytes` that correspond to the MessagePack data types (`null` means `Nil`,
|
||||
`bytes` means `Binary` and the other types are lower case to distinguish them
|
||||
from custom types).
|
||||
|
||||
Complex data types are noted as `{KEY => VALUE}` for mappings and `[TYPE]`
|
||||
for arrays. Tuples of multiple types e.g. `(TYPE1, TYPE2, TYPE3)` are also
|
||||
encoded as arrays but regarded differently as they contain different types
|
||||
and have a fixed length.
|
||||
|
||||
If a field is optional, its type is listed as `TYPE?` which means that
|
||||
either `null` or the `TYPE` is expected. If a value of `TYPE` is given, the
|
||||
option is regarded as being set and if `null` is given, the option is regarded
|
||||
as not being set.
|
||||
|
||||
If a structure contains fields with structures or other complex data types, the
|
||||
values of those fields are encoded as described for those values (often again as
|
||||
a mapping on their own). The encoding specification uses the name of the
|
||||
structure as a field type in this case.
|
||||
|
||||
For some structures, there exists a set of default values for the structure's
|
||||
fields. If any field is missing in the encoded mapping, the corresponding value
|
||||
from the defaults will be taken instead.
|
||||
|
||||
|
||||
### Constants
|
||||
TODO
|
||||
The following types are used as named constants. In the encoding, simply the
|
||||
value (mostly a number) is used instead of the name but in the rest of the
|
||||
specification the name is used for clarity.
|
||||
|
||||
|
||||
#### `BundleMode`
|
||||
The `BundleMode` describes the contents of the chunks of a bundle.
|
||||
- `Data` means that the chunks contain file data
|
||||
- `Meta` means that the chunks either contain encoded chunk lists or encoded
|
||||
inode metadata
|
||||
|
||||
BundleMode {
|
||||
Data => 0,
|
||||
Meta => 1
|
||||
}
|
||||
|
||||
|
||||
#### `HashMethod`
|
||||
The `HashMethod` describes the method used to create fingerprint hashes from
|
||||
chunk data. This is not relevant for reading backups.
|
||||
- `Blake2` means the hash method `Blake2b` as described in RFC 7693 with the
|
||||
hash length set to 128 bits.
|
||||
- `Murmur3` means the hash method `MurmurHash3` as described at
|
||||
https://en.wikipedia.org/wiki/MurmurHash for the x64 architecture and with the
|
||||
hash length set to 128 bits.
|
||||
|
||||
HashMethod {
|
||||
Blake2 => 1,
|
||||
Murmur3 => 2
|
||||
}
|
||||
|
||||
|
||||
#### `EncryptionMethod`
|
||||
The `EncryptionMethod` describes the method used to encrypt (and thus also
|
||||
decrypt) data.
|
||||
- `Sodium` means the `crypto_box_seal` method of `libsodium` as specified at
|
||||
http://www.libsodium.org as a combination of `X25519` and `XSalsa20-Poly1305`.
|
||||
|
||||
EncryptionMethod {
|
||||
Sodium => 0
|
||||
}
|
||||
|
||||
|
||||
#### `CompressionMethod`
|
||||
The `CompressionMethod` describes a compression method used to compress (and
|
||||
thus also decompress) data.
|
||||
- `Deflate` means the gzip/zlib method (without header) as described in RFC 1951
|
||||
- `Brotli` means the Google Brotli method as described in RFC 7932
|
||||
- `Lzma` means the LZMA method (XZ stream format) as described at
|
||||
http://tukaani.org/xz/
|
||||
- `Lz4` means the LZ4 method as described at http://www.lz4.org
|
||||
|
||||
CompressionMethod {
|
||||
Deflate => 0,
|
||||
Brotli => 1,
|
||||
Lzma => 2,
|
||||
Lz4 => 3
|
||||
}
|
||||
|
||||
|
||||
#### `FileType`
|
||||
The `FileType` describes the type of an inode.
|
||||
- `File` means an ordinary file that contains data
|
||||
- `Directory` means a directory that does not contain data but might have
|
||||
children
|
||||
- `Symlink` means a symlink that points to a target
|
||||
|
||||
FileType {
|
||||
File => 0,
|
||||
Directory => 1,
|
||||
Symlink => 2
|
||||
}
|
||||
|
||||
|
||||
### Types
|
||||
The following types are used to simplify the encoding specifications. They can
|
||||
simply be substituted by their definitions. For simplicity, their names will be
|
||||
used in the encoding specifications instead of their definitions.
|
||||
|
||||
|
||||
#### `Encryption`
|
||||
The `Encryption` is a combination of an `EncryptionMethod` and a key.
|
||||
The method specifies how the key was used to encrypt the data.
|
||||
For the `Sodium` method, the key is the public key used to encrypt the data
|
||||
with. The secret key needed for decryption must correspond to that public key.
|
||||
|
||||
Encryption = (EncryptionMethod, bytes)
|
||||
|
||||
|
||||
#### `Compression`
|
||||
The `Compression` is a micro-structure containing the compression method and the
|
||||
compression level. The level is only used for compression.
|
||||
|
||||
Compression {
|
||||
method: CompressionMethod => 0,
|
||||
level: int => 1
|
||||
}
|
||||
|
||||
|
||||
### `BundleHeader` encoding
|
||||
The `BundleHeader` structure contains information on how to decrypt other parts
|
||||
of a bundle. The structure is encoded using the MessagePack encoding that has
|
||||
been defined in a previous section.
|
||||
The `encryption` field contains the information needed to decrypt the rest of
|
||||
the bundle parts. If the `encryption` option is set, the following parts are
|
||||
encrypted using the specified method and key, otherwise the parts are not
|
||||
encrypted. The `info_size` contains the encrypted size of the following
|
||||
`BundleInfo` structure.
|
||||
|
||||
BundleHeader {
|
||||
encryption: Encryption? => 0,
|
||||
info_size: int => 1
|
||||
}
|
||||
|
||||
|
||||
### `BundleInfo` encoding
|
||||
serde_impl!(BundleInfo(u64) {
|
||||
id: BundleId => 0,
|
||||
The `BundleInfo` structure contains information on a bundle. The structure is
|
||||
encoded using the MessagePack encoding that has been defined in a previous
|
||||
section.
|
||||
If the `compression` option is set, the chunk data is compressed with the
|
||||
specified method, otherwise it is uncompressed. The encrypted size of the
|
||||
following `ChunkList` is stored in the `chunk_list_size` field.
|
||||
|
||||
BundleInfo {
|
||||
id: bytes => 0,
|
||||
mode: BundleMode => 1,
|
||||
compression: Option<Compression> => 2,
|
||||
encryption: Option<Encryption> => 3,
|
||||
compression: Compression? => 2,
|
||||
hash_method: HashMethod => 4,
|
||||
raw_size: usize => 6,
|
||||
encoded_size: usize => 7,
|
||||
chunk_count: usize => 8,
|
||||
chunk_info_size: usize => 9
|
||||
});
|
||||
raw_size: int => 6,
|
||||
encoded_size: int => 7,
|
||||
chunk_count: int => 8,
|
||||
chunk_list_size: int => 9
|
||||
}
|
||||
|
||||
This structure is encoded with the following field default values:
|
||||
- `hash_method`: `Blake2`
|
||||
- `mode`: `Data`
|
||||
- All other fields: `0`, `null` or an empty byte sequence depending on the type.
|
||||
|
||||
|
||||
### `ChunkList` encoding
|
||||
TODO
|
||||
The `ChunkList` contains a list of chunk hashes and chunk sizes. This list is
|
||||
NOT encoded using the MessagePack format as a simple binary format is much more
|
||||
efficient in this case.
|
||||
|
||||
For each chunk, the hash and its size are encoded in the following way:
|
||||
- The hash is encoded as 16 bytes (little-endian).
|
||||
- The size is encoded as a 32-bit value (4 bytes) in little-endian.
|
||||
The encoded hash and the size are concatenated (hash first, size second)
|
||||
yielding 20 bytes for each chunk.
|
||||
Those 20 bytes of encoded chunk information are concatenated for all chunks in
|
||||
the list in order of appearance in the list.
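
A decoder for this binary layout can be sketched as follows. The names `ChunkEntry` and `decode_chunk_list` are hypothetical and not taken from the zVault sources. The input is assumed to be already decrypted.

```rust
/// One chunk list entry: a 16-byte hash and the chunk size in bytes.
#[derive(Debug, PartialEq)]
struct ChunkEntry {
    hash: [u8; 16],
    size: u32,
}

/// Decodes a chunk list made of 20-byte entries (hash first, size second).
/// Returns `None` if the input length is not a multiple of 20.
fn decode_chunk_list(data: &[u8]) -> Option<Vec<ChunkEntry>> {
    if data.len() % 20 != 0 {
        return None;
    }
    let mut entries = Vec::with_capacity(data.len() / 20);
    for entry in data.chunks_exact(20) {
        let mut hash = [0u8; 16];
        hash.copy_from_slice(&entry[..16]);
        let mut size = [0u8; 4];
        size.copy_from_slice(&entry[16..]);
        // The size is stored as a 32-bit little-endian value.
        entries.push(ChunkEntry { hash, size: u32::from_le_bytes(size) });
    }
    Some(entries)
}
```

The hash bytes are kept in stored order here, since readers only need to compare hashes and never interpret them as numbers.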
|
||||
|
||||
|
||||
### `Inode` encoding
|
||||
The `Inode` structure contains information on a backup inode, e.g. a file or
|
||||
a directory. The structure is encoded using the MessagePack encoding that has
|
||||
been defined in a previous section.
|
||||
The `name` field contains the name of this inode which can be concatenated with
|
||||
the names of all parent inodes (with a platform-dependent separator) to form the
|
||||
full path of the inode.
|
||||
The `size` field contains the raw size of the data in
|
||||
bytes (this is 0 for everything except files).
|
||||
The `file_type` specifies the type of this inode.
|
||||
The `mode` field specifies the permissions of the inode as a number which is
|
||||
normally interpreted as octal.
|
||||
The `user` and `group` fields specify the ownership of the inode in the form of
|
||||
user and group id.
|
||||
The `timestamp` specifies the modification time of the inode in whole seconds
|
||||
since the UNIX epoch (1970-01-01 00:00:00 UTC).
|
||||
The `symlink_target` specifies the target of symlink inodes and is only set for
|
||||
symlinks.
|
||||
The `data` specifies the data of a file and is only set for regular files. The
|
||||
data is specified as a tuple of `nesting` and `bytes`. If `nesting` is `0`,
|
||||
`bytes` contains the data of the file. This "inline" format is only used for
|
||||
small files. If `nesting` is `1`, `bytes` is an encoded `ChunkList` (as
|
||||
described in a previous section). The concatenated data of those chunks make up
|
||||
the data of the file. If `nesting` is `2`, `bytes` is also an encoded
|
||||
`ChunkList`, but the concatenated data of those chunks form again an encoded
|
||||
`ChunkList` which in turn contains the chunks with the file data. Thus `nesting`
|
||||
specifies the number of indirection steps via `ChunkList`s.
|
||||
The `children` field specifies the child inodes of a directory and is only set
|
||||
for directories. It is a mapping from the name of the child entry to the bytes
|
||||
of the encoded chunklist of the encoded `Inode` structure of the child. It is
|
||||
important that the names in the mapping correspond with the names in the
|
||||
respective child `Inode`s and that the mapping is stored in alphabetic order of
|
||||
the names.
|
||||
The `cum_size`, `cum_dirs` and `cum_files` are cumulative values for the inode
|
||||
as well as the whole subtree (including all children recursively). `cum_size` is
|
||||
the sum of all inode data sizes plus 1000 bytes for each inode (for encoded
|
||||
metadata). `cum_dirs` and `cum_files` are the counts of directories and
|
||||
non-directories (symlinks and regular files).
|
||||
|
||||
Inode {
|
||||
name: string => 0,
|
||||
size: int => 1,
|
||||
file_type: FileType => 2,
|
||||
mode: int => 3,
|
||||
user: int => 4,
|
||||
group: int => 5,
|
||||
timestamp: int => 7,
|
||||
symlink_target: string? => 9,
|
||||
data: (int, bytes)? => 10,
|
||||
children: {string => bytes}? => 11,
|
||||
cum_size: int => 12,
|
||||
cum_dirs: int => 13,
|
||||
cum_files: int => 14
|
||||
}
|
||||
|
||||
This structure is encoded with the following field default values:
|
||||
- `file_type`: `File`
|
||||
- `mode`: `0o644`
|
||||
- `user` and `group`: `1000`
|
||||
- All other fields: `0`, `null` or an empty string depending on the type.
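
The `nesting` indirection of the `data` field can be sketched as a small recursive function. This is an illustration only: `resolve_data` and the `fetch` callback are hypothetical, and `fetch` stands in for whatever mechanism returns the decoded data of the chunk with a given hash (for example an index lookup followed by reading the bundle).

```rust
/// Resolves the `data` tuple `(nesting, bytes)` of a file inode to the raw
/// file contents. `fetch` is a hypothetical callback that returns the data
/// of the chunk with the given 16-byte hash, or `None` if it is unknown.
fn resolve_data<F>(nesting: u8, bytes: &[u8], fetch: &F) -> Option<Vec<u8>>
where
    F: Fn(&[u8]) -> Option<Vec<u8>>,
{
    if nesting == 0 {
        // Inline format: the bytes are the file data itself.
        return Some(bytes.to_vec());
    }
    if bytes.len() % 20 != 0 {
        return None;
    }
    // The bytes are an encoded ChunkList (16-byte hash + 4-byte size each).
    // The size field is not needed here since `fetch` returns whole chunks.
    let mut joined = Vec::new();
    for entry in bytes.chunks_exact(20) {
        joined.extend(fetch(&entry[..16])?);
    }
    // One level of indirection has been removed; resolve the remaining ones.
    resolve_data(nesting - 1, &joined, fetch)
}
```

With `nesting` set to `2`, the first pass yields another encoded `ChunkList` and the second pass yields the file data, matching the description above.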
|
||||
|
||||
|
||||
### `BackupHeader` encoding
|
||||
The `BackupHeader` structure contains information on how to decrypt the rest of
|
||||
the backup file. The structure is encoded using the MessagePack encoding that
|
||||
has been defined in a previous section.
|
||||
The `encryption` field contains the information needed to decrypt the rest of
|
||||
the backup file. If the `encryption` option is set, the rest of the backup file
|
||||
is encrypted using the specified method and key, otherwise the rest is not
|
||||
encrypted.
|
||||
|
||||
BackupHeader {
|
||||
encryption: Encryption? => 0
|
||||
}
|
||||
|
||||
|
||||
### `Backup` encoding
|
||||
The `Backup` structure contains information on one specific backup and
|
||||
references the root of the backup file tree. The structure is encoded using the
|
||||
MessagePack encoding that has been defined in a previous section.
|
||||
The `root` field contains an encoded `ChunkList` that references the root of the
|
||||
backup file tree.
|
||||
The fields `total_data_size`, `changed_data_size`, `deduplicated_data_size` and
|
||||
`encoded_data_size` list the sizes of the backup in various stages in bytes.
|
||||
- `total_data_size` gives the cumulative sizes of all entries in the backup.
|
||||
- `changed_data_size` gives the size of only those entries that changed since
|
||||
the reference backup.
|
||||
- `deduplicated_data_size` gives the cumulative raw size of all new chunks in
|
||||
this backup that have not been stored in the repository yet.
|
||||
- `encoded_data_size` gives the cumulative encoded (and compressed) size of all
|
||||
new bundles that have been written specifically to store this backup.
|
||||
The fields `bundle_count` and `chunk_count` contain the number of new bundles
|
||||
and chunks that had to be written to store this backup. `avg_chunk_size` is the
|
||||
average size of new chunks in this backup.
|
||||
The field `date` specifies the start of the backup run in seconds since the UNIX
|
||||
epoch and the field `duration` contains the duration of the backup run in
|
||||
seconds as a floating point number containing also fractions of seconds.
|
||||
The fields `file_count` and `dir_count` contain the total number of
|
||||
non-directories and directories in this backup.
|
||||
The `host` and `path` fields contain the host name and the path on that host
|
||||
where the root of the backup was located.
|
||||
The field `config` contains the configuration of zVault during the backup run.
|
||||
|
||||
Backup {
|
||||
root: bytes => 0,
|
||||
total_data_size: int => 1,
|
||||
changed_data_size: int => 2,
|
||||
deduplicated_data_size: int => 3,
|
||||
encoded_data_size: int => 4,
|
||||
bundle_count: int => 5,
|
||||
chunk_count: int => 6,
|
||||
avg_chunk_size: float => 7,
|
||||
date: int => 8,
|
||||
duration: float => 9,
|
||||
file_count: int => 10,
|
||||
dir_count: int => 11,
|
||||
host: string => 12,
|
||||
path: string => 13,
|
||||
config: Config => 14
|
||||
}
|
||||
|
|
|
@ -7,11 +7,14 @@ lost+found
|
|||
/proc
|
||||
/run
|
||||
/snap
|
||||
/media
|
||||
|
||||
# Cache data that does not need to be backed up
|
||||
/home/*/.cache
|
||||
/home/*/.zvault
|
||||
/var/cache
|
||||
/tmp
|
||||
/home/**/Trash
|
||||
|
||||
# Avoid backing up zvault remote backups
|
||||
remote/bundles
|
||||
|
|
|
@ -61,7 +61,7 @@ pub fn bundle_path(bundle: &BundleId, mut folder: PathBuf, mut count: usize) ->
|
|||
}
|
||||
folder = folder.join(&file[0..2]);
|
||||
file = file[2..].to_string();
|
||||
count /= 100;
|
||||
count /= 250;
|
||||
}
|
||||
(folder, file.into())
|
||||
}
|
||||
|
|
|
@ -63,10 +63,10 @@ impl fmt::Debug for BundleId {
|
|||
|
||||
#[derive(Eq, Debug, PartialEq, Clone, Copy)]
|
||||
pub enum BundleMode {
|
||||
Content, Meta
|
||||
Data, Meta
|
||||
}
|
||||
serde_impl!(BundleMode(u8) {
|
||||
Content => 0,
|
||||
Data => 0,
|
||||
Meta => 1
|
||||
});
|
||||
|
||||
|
@ -92,7 +92,7 @@ pub struct BundleInfo {
|
|||
pub raw_size: usize,
|
||||
pub encoded_size: usize,
|
||||
pub chunk_count: usize,
|
||||
pub chunk_info_size: usize
|
||||
pub chunk_list_size: usize
|
||||
}
|
||||
serde_impl!(BundleInfo(u64?) {
|
||||
id: BundleId => 0,
|
||||
|
@ -103,7 +103,7 @@ serde_impl!(BundleInfo(u64?) {
|
|||
raw_size: usize => 6,
|
||||
encoded_size: usize => 7,
|
||||
chunk_count: usize => 8,
|
||||
chunk_info_size: usize => 9
|
||||
chunk_list_size: usize => 9
|
||||
});
|
||||
|
||||
impl Default for BundleInfo {
|
||||
|
@ -116,8 +116,8 @@ impl Default for BundleInfo {
|
|||
raw_size: 0,
|
||||
encoded_size: 0,
|
||||
chunk_count: 0,
|
||||
mode: BundleMode::Content,
|
||||
chunk_info_size: 0
|
||||
mode: BundleMode::Data,
|
||||
chunk_list_size: 0
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
|
@ -106,7 +106,7 @@ impl BundleReader {
|
|||
let mut info: BundleInfo = try!(msgpack::decode(&info_data).context(path));
|
||||
info.encryption = header.encryption;
|
||||
debug!("Load bundle {}", info.id);
|
||||
let content_start = file.seek(SeekFrom::Current(0)).unwrap() as usize + info.chunk_info_size;
|
||||
let content_start = file.seek(SeekFrom::Current(0)).unwrap() as usize + info.chunk_list_size;
|
||||
Ok((info, version, content_start))
|
||||
}
|
||||
|
||||
|
@ -124,11 +124,11 @@ impl BundleReader {
|
|||
fn load_chunklist(&mut self) -> Result<(), BundleReaderError> {
|
||||
debug!("Load bundle chunklist {} ({:?})", self.info.id, self.info.mode);
|
||||
let mut file = BufReader::new(try!(File::open(&self.path).context(&self.path as &Path)));
|
||||
let len = self.info.chunk_info_size;
|
||||
let len = self.info.chunk_list_size;
|
||||
let start = self.content_start - len;
|
||||
try!(file.seek(SeekFrom::Start(start as u64)).context(&self.path as &Path));
|
||||
let mut chunk_data = Vec::with_capacity(len);
|
||||
chunk_data.resize(self.info.chunk_info_size, 0);
|
||||
chunk_data.resize(self.info.chunk_list_size, 0);
|
||||
try!(file.read_exact(&mut chunk_data).context(&self.path as &Path));
|
||||
if let Some(ref encryption) = self.info.encryption {
|
||||
chunk_data = try!(self.crypto.lock().unwrap().decrypt(&encryption, &chunk_data).context(&self.path as &Path));
|
||||
|
|
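The reader hunks above locate the chunk list by subtracting `chunk_list_size` from `content_start`. A minimal sketch of that arithmetic follows, assuming a hypothetical free function `chunk_list_range` that is not part of the diff. It uses `checked_sub` so that a corrupt header with an oversized `chunk_list_size` yields `None` rather than an underflow; that guard is an addition for illustration, not something the diff shows.

```rust
/// Hypothetical helper (not in the diff): returns the byte range
/// `[start, end)` of the encoded chunk list inside a bundle file.
///
/// `content_start` is the offset where the chunk data begins and
/// `chunk_list_size` is the value stored in the bundle header.
pub fn chunk_list_range(content_start: usize, chunk_list_size: usize) -> Option<(usize, usize)> {
    // The chunk list sits directly in front of the chunk data, so its
    // start is `content_start - chunk_list_size`. `checked_sub` returns
    // `None` if the header claims a list larger than the space before
    // the content.
    let start = content_start.checked_sub(chunk_list_size)?;
    Some((start, content_start))
}
```

A caller would seek to the first element of the returned pair and read `chunk_list_size` bytes.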
|
@ -114,7 +114,7 @@ impl BundleWriter {
|
|||
id: id.clone(),
|
||||
raw_size: self.raw_size,
|
||||
encoded_size: encoded_size,
|
||||
chunk_info_size: chunk_data.len()
|
||||
chunk_list_size: chunk_data.len()
|
||||
};
|
||||
let mut info_data = try!(msgpack::encode(&info).context(&path as &Path));
|
||||
if let Some(ref encryption) = self.encryption {
|
||||
|
|
|
@ -90,7 +90,7 @@ pub struct Backup {
|
|||
pub config: Config,
|
||||
}
|
||||
serde_impl!(Backup(u8?) {
|
||||
root: Vec<Chunk> => 0,
|
||||
root: ChunkList => 0,
|
||||
total_data_size: u64 => 1,
|
||||
changed_data_size: u64 => 2,
|
||||
deduplicated_data_size: u64 => 3,
|
||||
|
|
|
@ -36,7 +36,7 @@ impl Repository {
|
|||
let next_free_bundle_id = self.next_free_bundle_id();
|
||||
// Select a bundle writer according to the mode and...
|
||||
let writer = match mode {
|
||||
BundleMode::Content => &mut self.content_bundle,
|
||||
BundleMode::Data => &mut self.data_bundle,
|
||||
BundleMode::Meta => &mut self.meta_bundle
|
||||
};
|
||||
// ...allocate one if needed
|
||||
|
@ -60,7 +60,7 @@ impl Repository {
|
|||
raw_size = writer_obj.raw_size();
|
||||
}
|
||||
let bundle_id = match mode {
|
||||
BundleMode::Content => self.next_content_bundle,
|
||||
BundleMode::Data => self.next_data_bundle,
|
||||
BundleMode::Meta => self.next_meta_bundle
|
||||
};
|
||||
// Finish bundle if over maximum size
|
||||
|
@ -72,8 +72,8 @@ impl Repository {
|
|||
if self.next_meta_bundle == bundle_id {
|
||||
self.next_meta_bundle = next_free_bundle_id
|
||||
}
|
||||
if self.next_content_bundle == bundle_id {
|
||||
self.next_content_bundle = next_free_bundle_id
|
||||
if self.next_data_bundle == bundle_id {
|
||||
self.next_data_bundle = next_free_bundle_id
|
||||
}
|
||||
// Not saving the bundle map, this will be done by flush
|
||||
}
|
||||
|
|
|
@ -48,10 +48,10 @@ impl Repository {
|
|||
}
|
||||
|
||||
fn check_repository(&self) -> Result<(), RepositoryError> {
|
||||
if self.next_content_bundle == self.next_meta_bundle {
|
||||
if self.next_data_bundle == self.next_meta_bundle {
|
||||
return Err(RepositoryIntegrityError::InvalidNextBundleId.into())
|
||||
}
|
||||
if self.bundle_map.get(self.next_content_bundle).is_some() {
|
||||
if self.bundle_map.get(self.next_data_bundle).is_some() {
|
||||
return Err(RepositoryIntegrityError::InvalidNextBundleId.into())
|
||||
}
|
||||
if self.bundle_map.get(self.next_meta_bundle).is_some() {
|
||||
|
|
|
@ -150,7 +150,7 @@ serde_impl!(Inode(u8?) {
|
|||
//__old_create_time: i64 => 8,
|
||||
symlink_target: Option<String> => 9,
|
||||
data: Option<FileData> => 10,
|
||||
children: BTreeMap<String, ChunkList> => 11,
|
||||
children: Option<BTreeMap<String, ChunkList>> => 11,
|
||||
cum_size: u64 => 12,
|
||||
cum_dirs: usize => 13,
|
||||
cum_files: usize => 14
|
||||
|
@ -255,7 +255,7 @@ impl Repository {
|
|||
try!(file.read_to_end(&mut data));
|
||||
inode.data = Some(FileData::Inline(data.into()));
|
||||
} else {
|
||||
let mut chunks = try!(self.put_stream(BundleMode::Content, &mut file));
|
||||
let mut chunks = try!(self.put_stream(BundleMode::Data, &mut file));
|
||||
if chunks.len() < 10 {
|
||||
inode.data = Some(FileData::ChunkedDirect(chunks));
|
||||
} else {
|
||||
|
|
|
@ -41,10 +41,10 @@ pub struct Repository {
|
|||
index: Index,
|
||||
crypto: Arc<Mutex<Crypto>>,
|
||||
bundle_map: BundleMap,
|
||||
next_content_bundle: u32,
|
||||
next_data_bundle: u32,
|
||||
next_meta_bundle: u32,
|
||||
bundles: BundleDb,
|
||||
content_bundle: Option<BundleWriter>,
|
||||
data_bundle: Option<BundleWriter>,
|
||||
meta_bundle: Option<BundleWriter>,
|
||||
chunker: Chunker,
|
||||
locks: LockFolder
|
||||
|
@ -82,10 +82,10 @@ impl Repository {
|
|||
config: config,
|
||||
index: index,
|
||||
bundle_map: bundle_map,
|
||||
next_content_bundle: 1,
|
||||
next_data_bundle: 1,
|
||||
next_meta_bundle: 0,
|
||||
bundles: bundles,
|
||||
content_bundle: None,
|
||||
data_bundle: None,
|
||||
meta_bundle: None,
|
||||
crypto: crypto,
|
||||
locks: locks
|
||||
|
@ -113,10 +113,10 @@ impl Repository {
|
|||
index: index,
|
||||
crypto: crypto,
|
||||
bundle_map: bundle_map,
|
||||
next_content_bundle: 0,
|
||||
next_data_bundle: 0,
|
||||
next_meta_bundle: 0,
|
||||
bundles: bundles,
|
||||
content_bundle: None,
|
||||
data_bundle: None,
|
||||
meta_bundle: None,
|
||||
locks: locks
|
||||
};
|
||||
|
@ -128,7 +128,7 @@ impl Repository {
|
|||
}
|
||||
try!(repo.save_bundle_map());
|
||||
repo.next_meta_bundle = repo.next_free_bundle_id();
|
||||
repo.next_content_bundle = repo.next_free_bundle_id();
|
||||
repo.next_data_bundle = repo.next_free_bundle_id();
|
||||
Ok(repo)
|
||||
}
|
||||
|
||||
|
@ -177,7 +177,7 @@ impl Repository {
|
|||
|
||||
#[inline]
|
||||
fn next_free_bundle_id(&self) -> u32 {
|
||||
let mut id = max(self.next_content_bundle, self.next_meta_bundle) + 1;
|
||||
let mut id = max(self.next_data_bundle, self.next_meta_bundle) + 1;
|
||||
while self.bundle_map.get(id).is_some() {
|
||||
id += 1;
|
||||
}
|
||||
|
@ -185,14 +185,14 @@ impl Repository {
|
|||
}
|
||||
|
||||
pub fn flush(&mut self) -> Result<(), RepositoryError> {
|
||||
if self.content_bundle.is_some() {
|
||||
if self.data_bundle.is_some() {
|
||||
let mut finished = None;
|
||||
mem::swap(&mut self.content_bundle, &mut finished);
|
||||
mem::swap(&mut self.data_bundle, &mut finished);
|
||||
{
|
||||
let bundle = try!(self.bundles.add_bundle(finished.unwrap()));
|
||||
self.bundle_map.set(self.next_content_bundle, bundle.id.clone());
|
||||
self.bundle_map.set(self.next_data_bundle, bundle.id.clone());
|
||||
}
|
||||
self.next_content_bundle = self.next_free_bundle_id()
|
||||
self.next_data_bundle = self.next_free_bundle_id()
|
||||
}
|
||||
if self.meta_bundle.is_some() {
|
||||
let mut finished = None;
|
||||
|
@ -211,7 +211,7 @@ impl Repository {
|
|||
fn add_new_remote_bundle(&mut self, bundle: BundleInfo) -> Result<(), RepositoryError> {
|
||||
info!("Adding new bundle to index: {}", bundle.id);
|
||||
let bundle_id = match bundle.mode {
|
||||
BundleMode::Content => self.next_content_bundle,
|
||||
BundleMode::Data => self.next_data_bundle,
|
||||
BundleMode::Meta => self.next_meta_bundle
|
||||
};
|
||||
let chunks = try!(self.bundles.get_chunk_list(&bundle.id));
|
||||
|
@ -219,8 +219,8 @@ impl Repository {
|
|||
if self.next_meta_bundle == bundle_id {
|
||||
self.next_meta_bundle = self.next_free_bundle_id()
|
||||
}
|
||||
if self.next_content_bundle == bundle_id {
|
||||
self.next_content_bundle = self.next_free_bundle_id()
|
||||
if self.next_data_bundle == bundle_id {
|
||||
self.next_data_bundle = self.next_free_bundle_id()
|
||||
}
|
||||
for (i, (hash, _len)) in chunks.into_inner().into_iter().enumerate() {
|
||||
try!(self.index.set(&hash, &Location{bundle: bundle_id as u32, chunk: i as u32}));
|
||||
|
|
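The `next_free_bundle_id` hunk above picks the first unused id after the larger of the two pending ids. As a standalone sketch of that search, assuming a `HashSet<u32>` stands in for the repository's `BundleMap` (the real type and its `get` method are not reproduced here):

```rust
use std::cmp::max;
use std::collections::HashSet;

/// Sketch of the id search shown in the diff. `used` is a stand-in
/// (an assumption) for the ids already present in the bundle map.
pub fn next_free_bundle_id(next_data_bundle: u32, next_meta_bundle: u32, used: &HashSet<u32>) -> u32 {
    // Start one past the larger of the two ids reserved for open writers.
    let mut id = max(next_data_bundle, next_meta_bundle) + 1;
    // Skip every id that is already mapped to a bundle.
    while used.contains(&id) {
        id += 1;
    }
    id
}
```

Starting above both pending ids means the result can never collide with either open writer.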
|
@ -36,32 +36,32 @@ quick_error!{
|
|||
}
|
||||
|
||||
#[derive(Clone, Debug, Copy, Eq, PartialEq)]
|
||||
pub enum CompressionAlgo {
|
||||
pub enum CompressionMethod {
|
||||
Deflate, // Standardized
|
||||
Brotli, // Good speed and ratio
|
||||
Lzma2, // Very good ratio, slow
|
||||
Lzma, // Very good ratio, slow
|
||||
Lz4 // Very fast, low ratio
|
||||
}
|
||||
serde_impl!(CompressionAlgo(u8) {
|
||||
serde_impl!(CompressionMethod(u8) {
|
||||
Deflate => 0,
|
||||
Brotli => 1,
|
||||
Lzma2 => 2,
|
||||
Lzma => 2,
|
||||
Lz4 => 3
|
||||
});
|
||||
|
||||
|
||||
#[derive(Clone, Debug, Eq, PartialEq)]
|
||||
pub struct Compression {
|
||||
algo: CompressionAlgo,
|
||||
method: CompressionMethod,
|
||||
level: u8
|
||||
}
|
||||
impl Default for Compression {
|
||||
fn default() -> Self {
|
||||
Compression { algo: CompressionAlgo::Brotli, level: 3 }
|
||||
Compression { method: CompressionMethod::Brotli, level: 3 }
|
||||
}
|
||||
}
|
||||
serde_impl!(Compression(u64) {
|
||||
algo: CompressionAlgo => 0,
|
||||
method: CompressionMethod => 0,
|
||||
level: u8 => 1
|
||||
});
|
||||
|
||||
|
@ -81,23 +81,23 @@ impl Compression {
|
|||
} else {
|
||||
(name, 5)
|
||||
};
|
||||
let algo = match name {
|
||||
"deflate" | "zlib" | "gzip" => CompressionAlgo::Deflate,
|
||||
"brotli" => CompressionAlgo::Brotli,
|
||||
"lzma2" => CompressionAlgo::Lzma2,
|
||||
"lz4" => CompressionAlgo::Lz4,
|
||||
let method = match name {
|
||||
"deflate" | "zlib" | "gzip" => CompressionMethod::Deflate,
|
||||
"brotli" => CompressionMethod::Brotli,
|
||||
"lzma" | "lzma2" | "xz" => CompressionMethod::Lzma,
|
||||
"lz4" => CompressionMethod::Lz4,
|
||||
_ => return Err(CompressionError::UnsupportedCodec(name.to_string()))
|
||||
};
|
||||
Ok(Compression { algo: algo, level: level })
|
||||
Ok(Compression { method: method, level: level })
|
||||
}
|
||||
|
||||
#[inline]
|
||||
pub fn name(&self) -> &'static str {
|
||||
match self.algo {
|
||||
CompressionAlgo::Deflate => "deflate",
|
||||
CompressionAlgo::Brotli => "brotli",
|
||||
CompressionAlgo::Lzma2 => "lzma2",
|
||||
CompressionAlgo::Lz4 => "lz4",
|
||||
match self.method {
|
||||
CompressionMethod::Deflate => "deflate",
|
||||
CompressionMethod::Brotli => "brotli",
|
||||
CompressionMethod::Lzma => "lzma",
|
||||
CompressionMethod::Lz4 => "lz4",
|
||||
}
|
||||
}
|
||||
|
||||
|
|
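The compression hunks above rename `Lzma2` to `Lzma` and accept `lzma`, `lzma2` and `xz` as aliases, while `name()` now reports `lzma`. A minimal sketch of that mapping in isolation follows. The free functions `parse_method` and `method_name` are hypothetical names for illustration; the diff implements this logic inside `impl Compression` and returns `CompressionError::UnsupportedCodec` rather than `None`.

```rust
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum CompressionMethod {
    Deflate, // Standardized
    Brotli,  // Good speed and ratio
    Lzma,    // Very good ratio, slow
    Lz4,     // Very fast, low ratio
}

/// Hypothetical helper: maps a user-supplied name to a method,
/// accepting the same aliases as the new side of the diff.
pub fn parse_method(name: &str) -> Option<CompressionMethod> {
    match name {
        "deflate" | "zlib" | "gzip" => Some(CompressionMethod::Deflate),
        "brotli" => Some(CompressionMethod::Brotli),
        "lzma" | "lzma2" | "xz" => Some(CompressionMethod::Lzma),
        "lz4" => Some(CompressionMethod::Lz4),
        _ => None,
    }
}

/// Hypothetical helper: the canonical name reported for each method.
pub fn method_name(method: CompressionMethod) -> &'static str {
    match method {
        CompressionMethod::Deflate => "deflate",
        CompressionMethod::Brotli => "brotli",
        CompressionMethod::Lzma => "lzma",
        CompressionMethod::Lz4 => "lz4",
    }
}
```

Because several aliases collapse onto one variant, parsing `lzma2` and printing the name again yields `lzma`, not the original spelling.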