mirror of https://github.com/dswd/zvault
Repository readme
This commit is contained in: parent 883c4c1c24, commit 4145160660
@@ -1,79 +0,0 @@
-% Bundle file format
-## Bundle file format
-
-The bundle file format consists of 4 parts:
-- A magic header with version
-- An encoded header structure
-- An encoded chunk list
-- The chunk data
-
-The main reason for having those multiple parts is that it is expected that the
-smaller front parts can be read much faster than the the whole file. So
-information that is needed more frequently is put into earlier parts and the
-data that is need the least frequent is put into the latter part so that it does
-not slow down reading the front parts. Keeping those parts in separate files
-was also considered but rejected to increase the reliability of the storage.
-
-
-### Magic header with version
-The first part of a bundle file contains an 8 byte magic header with version
-information.
-
-The first 6 bytes of the header consist of the fixed string "zvault", followed
-by one byte with the fixed value 0x01. Those 7 bytes make up the magic header of
-the file and serve to identify the file type as a zvault bundle file.
-
-The 8th byte of the first file part is the version of the file format. This
-value is currently 0x01 and is expected to be increased for any breaking changes
-in the file format.
-
-
-### Encoded header structure
-The encoded header structure is the second part of the bundle file format and
-follows directly after the 8 bytes of the magic header.
-
-The header structure is defined in `bundle.rs` as `BundleInfo` and contains
-general information on the bundle's contents and on how to decode the other two
-parts of the bundle file.
-
-This header structure is encoded using the *MsgPack* format. It is neither
-compressed (since its size is pretty small) nor encrypted (since it only
-contains general information and no user data) in any way.
-
-
-### Encoded chunk list
-The chunk list is the third part of the bundle file and follows directly after
-the encoded header structure.
-
-The chunk list contains hashes and sizes of all chunks stored in this bundle in
-the order they are stored. The list is encoded efficiently as 20 bytes per chunk
-(16 for the hash and 4 for the size) as defined in `../util/chunk.rs`.
-
-Since the chunk list contains confidential information (the chunk hashes and
-sized can be used to identify files) the encoded chunk list is encrypted using
-the encryption method specified in the header structure. The header structure
-also contains the full size of the encoded and encrypted chunk list which is
-needed since the encryption could add some bytes for a nonce or an
-authentication code.
-
-The chunk list is not compressed since the hashes have a very high entropy and
-do not compress significantly.
-
-The chunk list is not stored in the header structure because it contains
-confidential data and the encryption method is stored in the header. Also the
-chunk list can be pretty big compared to the header which needs to be read more
-often.
-
-
-### Chunk data
-The chunk data is the final part of a bundle file and follows after the encoded
-chunk list. The starting position can be obtained from the header as the encoded
-size of the chunk list is stored there.
-
-The chunk data part consists of the content data of the chunks contained in this
-bundle simply concatenated without any separator. The actual size (and by
-summing up the sizes also the starting position) of each chunk can be obtained
-from the chunk list.
-
-The chunk data is compressed as whole (solid archive) and encrypted with the
-methods specified in the bundle header structure.
@@ -15,6 +15,9 @@ The repository contains the following components:
 is running for sure.
 
+
+
+
 ## Repository format
 
 In case the zVault software is not available for restoring the backups included
@@ -23,11 +26,12 @@ so that its contents can be read without zVault.
 
 
 ### Bundle files
-The bundle file format consists of 4 parts:
+The bundle file format consists of 5 parts:
 - A magic header with version
-- An encoded header structure
-- An encoded chunk list
-- The chunk data
+- A tiny header with encryption information
+- An encoded and encrypted bundle information structure
+- An encoded and encrypted chunk list
+- The chunk data (compressed and encrypted)
 
 The main reason for having those multiple parts is that it is expected that the
 smaller front parts can be read much faster than the the whole file. So
@@ -50,87 +54,463 @@ value is currently 0x01 and is expected to be increased for any breaking changes
 in the file format.
 
 
-#### Encoded header structure
-The encoded header structure is the second part of the bundle file format and
-follows directly after the 8 bytes of the magic header.
+#### Encryption header
+The encryption header is the second part of the bundle file format and follows
+directly after the 8 bytes of the magic header.
 
-The header structure is defined in the appendix as `BundleInfo` and contains
-general information on the bundle's contents and on how to decode the other two
-parts of the bundle file.
+The header structure is defined in the appendix as `BundleHeader` and contains
+information on how to decrypt the other parts of the bundle as well as the
+encrypted size of the following bundle information.
 
-This header structure is encoded using the *MsgPack* format. It is neither
-compressed (since its size is pretty small) nor encrypted (since it only
-contains general information and no user data) in any way.
+Please note that this header even exists when the bundle is not encrypted (the
+header then contains no encryption method).
 
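The magic header check described above can be sketched in a few lines of Rust. This is only an illustration of the layout, not the zvault implementation, and the error handling is simplified.

    use std::io::{self, Read};

    /// Checks the 8-byte magic header of a bundle file and returns the format
    /// version (a sketch of the layout described above, not the zvault code).
    fn check_bundle_magic<R: Read>(reader: &mut R) -> io::Result<u8> {
        let mut header = [0u8; 8];
        reader.read_exact(&mut header)?;
        // Bytes 0..7: the fixed string "zvault" followed by 0x01 (bundle file type).
        if &header[..7] != b"zvault\x01" {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "not a zvault bundle file"));
        }
        // Byte 7: format version, currently 0x01.
        Ok(header[7])
    }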
+#### Bundle information
+The bundle information structure is the third part of the bundle file format and
+follows directly after the encryption header.
+
+The information structure is defined in the appendix as `BundleInfo` and
+contains general information on the bundle's contents and on how to decode the
+other two parts of the bundle file.
+
+This structure is encrypted using the method described in the previous
+encryption header since it contains confidential information (the bundle id
+could be used to identify the data contained in the bundle). The size of the
+encrypted structure is also stored in the previous header. This structure is not
+compressed, as it is pretty small.
 
 
 #### Encoded chunk list
-The chunk list is the third part of the bundle file and follows directly after
-the encoded header structure.
+The chunk list is the fourth part of the bundle file and follows directly after
+the bundle information structure.
 
 The chunk list contains hashes and sizes of all chunks stored in this bundle in
-the order they are stored. The list is encoded efficiently as 20 bytes per chunk
-(16 for the hash and 4 for the size) as defined in the appendix as `ChunkList`.
+the order they are stored. The list is encoded as defined in the appendix as
+`ChunkList`.
 
 Since the chunk list contains confidential information (the chunk hashes and
 sized can be used to identify files) the encoded chunk list is encrypted using
-the encryption method specified in the header structure. The header structure
-also contains the full size of the encoded and encrypted chunk list which is
-needed since the encryption could add some bytes for a nonce or an
-authentication code.
+the encryption method specified in the encryption header. The bundle information
+structure contains the full size of the encoded and encrypted chunk list as
+`chunk_list_size` which is needed since the encryption could add some bytes for
+a nonce or an authentication code.
 
 The chunk list is not compressed since the hashes have a very high entropy and
 do not compress significantly.
 
-The chunk list is not stored in the header structure because it contains
-confidential data and the encryption method is stored in the header. Also the
-chunk list can be pretty big compared to the header which needs to be read more
-often.
+The chunk list is not stored in the bundle info structure because it can be
+pretty big compared to the info structure which needs to be read more often.
 
 
 #### Chunk data
-The chunk data is the final part of a bundle file and follows after the encoded
-chunk list. The starting position can be obtained from the header as the encoded
-size of the chunk list is stored there.
+The chunk data is the final part of a bundle file and follows after the chunk
+list. The starting position can be obtained from the bundle info structure as
+the encoded size of the chunk list is stored there as `chunk_list_size`.
 
-The chunk data part consists of the content data of the chunks contained in this
-bundle simply concatenated without any separator. The actual size (and by
-summing up the sizes also the starting position) of each chunk can be obtained
-from the chunk list.
+The chunk data part consists of the data of the chunks contained in this
+bundle simply concatenated without any separator. The individual chunk sizes can
+be obtained from the chunk list. The starting position of any chunk can be
+calculated by summing up the sizes of all previous chunks.
 
 The chunk data is compressed as whole (solid archive) and encrypted with the
 methods specified in the bundle information structure.
 
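As a sketch of the offset rule described above (assuming the chunk list has already been decrypted and decoded into hash/size pairs), the start of each chunk inside the decompressed chunk data is the running sum of the preceding sizes:

    /// Returns (start offset, size) for every chunk, a sketch of the rule
    /// described above: each chunk starts where the previous one ended.
    fn chunk_offsets(chunks: &[([u8; 16], u32)]) -> Vec<(u64, u32)> {
        let mut offset = 0u64;
        chunks.iter().map(|&(_hash, size)| {
            let start = offset;
            offset += u64::from(size);
            (start, size)
        }).collect()
    }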
-### Inode metadata
-TODO
-
 ### Backup format
-TODO
+The repository contains multiple backups that share the data contained in the
+bundles. The individual backups are encoded in backup files as described in the
+following section. Those backup files reference a list of chunks in the bundles
+as a root inode entry. Each inode entry references lists of chunks for its data
+and potential child entries.
+
+All chunks that are referenced either in the backup files or in the inode
+entries are contained in one of the bundles and are uniquely identified by their
+hash. An index, e.g. a hash table, can help to find the correct bundle quickly.
+
+
+#### Backup files
+Backup files contain information on one specific backup and reference the
+directory root of that backup.
+
+Backup files consist of 3 parts:
+- A magic header with version
+- A tiny header with encryption information
+- An encoded and encrypted backup information structure
+
+
+##### Magic header with version
+The first part of a backup file contains an 8 byte magic header with version
+information.
+
+The first 6 bytes of the header consist of the fixed string "zvault", followed
+by one byte with the fixed value 0x03. Those 7 bytes make up the magic header of
+the file and serve to identify the file type as a zvault backup file.
+
+The 8th byte of the first file part is the version of the file format. This
+value is currently 0x01 and is expected to be increased for any breaking changes
+in the file format.
+
+
+##### Encryption header
+The encryption header is the second part of the backup file format and follows
+directly after the 8 bytes of the magic header.
+
+The header structure is defined in the appendix as `BackupHeader` and contains
+information on how to decrypt the rest of the backup file.
+
+Please note that this header even exists when the backup file is not encrypted
+(the header then contains no encryption method).
+
+
+##### Backup information
+The backup information structure is the final part of the backup file format and
+follows directly after the encryption header.
+
+The information structure is defined in the appendix as `Backup` and
+contains general information on the backup's contents and references the
+directory root of the backup tree.
+
+This structure is encrypted using the method described in the previous
+encryption header since it contains confidential information. This structure is
+not compressed, as it is pretty small.
+
+
+#### Directories & file data
+The inode entries are encoded as defined in the appendix as `Inode`. The inode
+structure contains all meta information on an inode entry, e.g. its file type,
+the data size, modification time, permissions and ownership, etc. Also, the
+structure contains optional information that is specific to the file type.
+For regular files, the inode structure contains the data of that file either
+inline (for very small files) or as a reference via a chunk list.
+For directories, the inode structure contains a mapping of child inode entries
+with their name as key and a chunk list referring to their encoded `Inode`
+structure as value.
+For symlinks, the inode structure contains the target in the field
+`symlink_target`.
+
+Starting from the `root` of the `Backup` structure, the whole backup file tree
+can be reconstructed by traversing the children of each inode recursively.
+Since files can only be retrieved by traversing their parent directories, they
+contain no back link to their parent directory.
-### Backup file
-TODO
 
 
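The recursive traversal described above can be sketched as follows. This is an illustration only: `Inode` is reduced to the two fields needed here, and `read_inode` stands for a hypothetical helper (not defined in this document) that fetches the referenced chunks and decodes the child's `Inode` structure.

    use std::collections::BTreeMap;

    struct Inode {
        name: String,
        // name of the child => bytes of the encoded chunk list of its `Inode`
        children: Option<BTreeMap<String, Vec<u8>>>,
    }

    /// Walks the backup tree depth-first, calling `visit` with the full path of
    /// every inode (a sketch of the traversal described above).
    fn walk(inode: &Inode, parent: &str, read_inode: &dyn Fn(&[u8]) -> Inode, visit: &mut dyn FnMut(&str)) {
        let path = format!("{}/{}", parent, inode.name);
        visit(&path);
        if let Some(children) = &inode.children {
            for encoded_chunk_list in children.values() {
                let child = read_inode(encoded_chunk_list.as_slice());
                walk(&child, &path, read_inode, visit);
            }
        }
    }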
 ## Appendix
 
+### MessagePack encoding
+
+Most zvault structures are encoded using the MessagePack encoding as specified
+at http://www.msgpack.org. The version of MessagePack that is used is dated to
+2013-04-21.
+
+All structure encodings are based on a mapping that associates values to the
+structure's fields. In order to save space, the structure's fields are not
+referenced by name but by an assigned number. In the encoding specification,
+this is written as `FIELD: TYPE => NUMBER` where `FIELD` is the field name used
+to reference the field in the rest of the description, `TYPE` is the type of the
+field's values and `NUMBER` is the number used as key for this field in the
+mapping.
+
+The simple types used are called `null`, `bool`, `int`, `float`, `string`
+and `bytes` that correspond to the MessagePack data types (`null` means `Nil`,
+`bytes` means `Binary` and the other types are lower case to distinguish them
+from custom types).
+
+Complex data types are noted as `{KEY => VALUE}` for mappings and `[TYPE]`
+for arrays. Tuples of multiple types e.g. `(TYPE1, TYPE2, TYPE3)` are also
+encoded as arrays but regarded differently as they contain different types
+and have a fixed length.
+
+If a field is optional, its type is listed as `TYPE?` which means that
+either `null` or the `TYPE` is expected. If a value of `TYPE` is given, the
+option is regarded as being set and if `null` is given, the option is regarded
+as not being set.
+
+If a structure contains fields with structures or other complex data types, the
+values of those fields are encoded as described for those values (often again as
+a mapping on their own). The encoding specification uses the name of the
+structure as a field type in this case.
+
+For some structures, there exists a set of default values for the structure's
+fields. If any field is missing in the encoded mapping, the corresponding value
+from the defaults will be taken instead.
 
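As a worked illustration of this field-number encoding (this example is not taken from the specification itself): the `Compression` structure defined below, with `method` set to `Brotli` and `level` set to `3`, would be encoded as the MessagePack map `{0 => 1, 1 => 3}`, i.e. five bytes.

    // Hypothetical worked example (not part of the specification): the
    // `Compression` structure { method: Brotli, level: 3 } encodes as the
    // MessagePack map {0 => 1, 1 => 3}, which is these five bytes.
    const COMPRESSION_BROTLI_3: [u8; 5] = [
        0x82,       // fixmap with 2 entries
        0x00, 0x01, // key 0 (method) => 1 (Brotli)
        0x01, 0x03, // key 1 (level)  => 3
    ];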
 ### Constants
-TODO
+The following types are used as named constants. In the encoding, simply the
+value (mostly a number) is used instead of the name but in the rest of the
+specification the name is used for clarity.
+
+
+#### `BundleMode`
+The `BundleMode` describes the contents of the chunks of a bundle.
+- `Data` means that the chunks contain file data
+- `Meta` means that the chunks either contain encoded chunk lists or encoded
+inode metadata
+
+    BundleMode {
+        Data => 0,
+        Meta => 1
+    }
+
+
+#### `HashMethod`
+The `HashMethod` describes the method used to create fingerprint hashes from
+chunk data. This is not relevant for reading backups.
+- `Blake2` means the hash method `Blake2b` as described in RFC 7693 with the
+hash length set to 128 bits.
+- `Murmur3` means the hash method `MurmurHash3` as described at
+https://en.wikipedia.org/wiki/MurmurHash for the x64 architecture and with the
+hash length set to 128 bits.
+
+    HashMethod {
+        Blake2 => 1,
+        Murmur3 => 2
+    }
+
+
+#### `EncryptionMethod`
+The `EncryptionMethod` describes the method used to encrypt (and thus also
+decrypt) data.
+- `Sodium` means the `crypto_box_seal` method of `libsodium` as specified at
+http://www.libsodium.org as a combination of `X25519` and `XSalsa20-Poly1305`.
+
+    EncryptionMethod {
+        Sodium => 0
+    }
+
+
+#### `CompressionMethod`
+The `CompressionMethod` describes a compression method used to compress (and
+thus also decompress) data.
+- `Deflate` means the gzip/zlib method (without header) as described in RFC 1951
+- `Brotli` means the Google Brotli method as described in RFC 7932
+- `Lzma` means the LZMA method (XZ stream format) as described at
+http://tukaani.org/xz/
+- `Lz4` means the LZ4 method as described at http://www.lz4.org
+
+    CompressionMethod {
+        Deflate => 0,
+        Brotli => 1,
+        Lzma => 2,
+        Lz4 => 3
+    }
+
+
+#### `FileType`
+The `FileType` describes the type of an inode.
+- `File` means an ordinary file that contains data
+- `Directory` means a directory that does not contain data but might have
+children
+- `Symlink` means a symlink that points to a target
+
+    FileType {
+        File => 0,
+        Directory => 1,
+        Symlink => 2
+    }
 
 
 ### Types
+The following types are used to simplify the encoding specifications. They can
+simply be substituted by their definitions. For simplicity, their names will be
+used in the encoding specifications instead of their definitions.
+
+
+#### `Encryption`
+The `Encryption` is a combination of an `EncryptionMethod` and a key.
+The method specifies how the key was used to encrypt the data.
+For the `Sodium` method, the key is the public key used to encrypt the data
+with. The secret key needed for decryption must correspond to that public key.
+
+    Encryption = (EncryptionMethod, bytes)
+
+
+#### `Compression`
+The `Compression` is a micro-structure containing the compression method and the
+compression level. The level is only used for compression.
+
+    Compression {
+        method: CompressionMethod => 0,
+        level: int => 1
+    }
+
+
+### `BundleHeader` encoding
+The `BundleHeader` structure contains information on how to decrypt other parts
+of a bundle. The structure is encoded using the MessagePack encoding that has
+been defined in a previous section.
+The `encryption` field contains the information needed to decrypt the rest of
+the bundle parts. If the `encryption` option is set, the following parts are
+encrypted using the specified method and key, otherwise the parts are not
+encrypted. The `info_size` contains the encrypted size of the following
+`BundleInfo` structure.
+
+    BundleHeader {
+        encryption: Encryption? => 0,
+        info_size: int => 1
+    }
 
 ### `BundleInfo` encoding
-serde_impl!(BundleInfo(u64) {
-    id: BundleId => 0,
+The `BundleInfo` structure contains information on a bundle. The structure is
+encoded using the MessagePack encoding that has been defined in a previous
+section.
+If the `compression` option is set, the chunk data is compressed with the
+specified method, otherwise it is uncompressed. The encrypted size of the
+following `ChunkList` is stored in the `chunk_list_size` field.
+
+    BundleInfo {
+        id: bytes => 0,
         mode: BundleMode => 1,
-        compression: Option<Compression> => 2,
-        encryption: Option<Encryption> => 3,
+        compression: Compression? => 2,
         hash_method: HashMethod => 4,
-        raw_size: usize => 6,
-        encoded_size: usize => 7,
-        chunk_count: usize => 8,
-        chunk_info_size: usize => 9
-});
+        raw_size: int => 6,
+        encoded_size: int => 7,
+        chunk_count: int => 8,
+        chunk_list_size: int => 9
+    }
+
+This structure is encoded with the following field default values:
+- `hash_method`: `Blake2`
+- `mode`: `Data`
+- All other fields: `0`, `null` or an empty byte sequence depending on the type.
 
 ### `ChunkList` encoding
-TODO
+The `ChunkList` contains a list of chunk hashes and chunk sizes. This list is
+NOT encoded using the MessagePack format as a simple binary format is much more
+efficient in this case.
+
+For each chunk, the hash and its size are encoded in the following way:
+- The hash is encoded as 16 bytes (little-endian).
+- The size is encoded as a 32-bit value (4 bytes) in little-endian.
+The encoded hash and the size are concatenated (hash first, size second)
+yielding 20 bytes for each chunk.
+Those 20 bytes of encoded chunk information are concatenated for all chunks in
+the list in order of appearance in the list.
 
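A minimal sketch of this 20-bytes-per-chunk layout (an illustration of the encoding described above, not the zvault implementation):

    /// Encodes (hash, size) pairs as 16 hash bytes followed by the size as a
    /// little-endian u32, 20 bytes per chunk.
    fn encode_chunk_list(chunks: &[([u8; 16], u32)]) -> Vec<u8> {
        let mut out = Vec::with_capacity(chunks.len() * 20);
        for &(hash, size) in chunks {
            out.extend_from_slice(&hash);
            out.extend_from_slice(&size.to_le_bytes());
        }
        out
    }

    /// Decodes the same layout; returns None if the length is not a multiple of 20.
    fn decode_chunk_list(data: &[u8]) -> Option<Vec<([u8; 16], u32)>> {
        if data.len() % 20 != 0 {
            return None;
        }
        Some(data.chunks(20).map(|entry| {
            let mut hash = [0u8; 16];
            hash.copy_from_slice(&entry[..16]);
            let size = u32::from_le_bytes([entry[16], entry[17], entry[18], entry[19]]);
            (hash, size)
        }).collect())
    }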
+### `Inode` encoding
+The `Inode` structure contains information on a backup inode, e.g. a file or
+a directory. The structure is encoded using the MessagePack encoding that has
+been defined in a previous section.
+The `name` field contains the name of this inode which can be concatenated with
+the names of all parent inodes (with a platform-dependent separator) to form the
+full path of the inode.
+The `size` field contains the raw size of the data in
+bytes (this is 0 for everything except files).
+The `file_type` specifies the type of this inode.
+The `mode` field specifies the permissions of the inode as a number which is
+normally interpreted as octal.
+The `user` and `group` fields specify the ownership of the inode in the form of
+user and group id.
+The `timestamp` specifies the modification time of the inode in whole seconds
+since the UNIX epoch (1970-01-01 12:00 am).
+The `symlink_target` specifies the target of symlink inodes and is only set for
+symlinks.
+The `data` specifies the data of a file and is only set for regular files. The
+data is specified as a tuple of `nesting` and `bytes`. If `nesting` is `0`,
+`bytes` contains the data of the file. This "inline" format is only used for
+small files. If `nesting` is `1`, `bytes` is an encoded `ChunkList` (as
+described in a previous section). The concatenated data of those chunks make up
+the data of the file. If `nesting` is `2`, `bytes` is also an encoded
+`ChunkList`, but the concatenated data of those chunks form again an encoded
+`ChunkList` which in turn contains the chunks with the file data. Thus `nesting`
+specifies the number of indirection steps via `ChunkList`s.
+The `children` field specifies the child inodes of a directory and is only set
+for directories. It is a mapping from the name of the child entry to the bytes
+of the encoded chunk list of the encoded `Inode` structure of the child. It is
+important that the names in the mapping correspond with the names in the
+respective child `Inode`s and that the mapping is stored in alphabetic order of
+the names.
+The `cum_size`, `cum_dirs` and `cum_files` are cumulative values for the inode
+as well as the whole subtree (including all children recursively). `cum_size` is
+the sum of all inode data sizes plus 1000 bytes for each inode (for encoded
+metadata). `cum_dirs` and `cum_files` are the counts of directories and
+non-directories (symlinks and regular files).
+
+    Inode {
+        name: string => 0,
+        size: int => 1,
+        file_type: FileType => 2,
+        mode: int => 3,
+        user: int => 4,
+        group: int => 5,
+        timestamp: int => 7,
+        symlink_target: string? => 9,
+        data: (int, bytes)? => 10,
+        children: {string => bytes}? => 11,
+        cum_size: int => 12,
+        cum_dirs: int => 13,
+        cum_files: int => 14
+    }
+
+This structure is encoded with the following field default values:
+- `file_type`: `File`
+- `mode`: `0o644`
+- `user` and `group`: `1000`
+- All other fields: `0`, `null` or an empty string depending on the type.
 
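A sketch of how the `nesting` indirection in the `data` field can be resolved. `fetch_chunks` is a hypothetical helper (not defined in this document) that looks up every referenced chunk in its bundle and returns their concatenated data; `decode_chunk_list` is the sketch from the `ChunkList` section above.

    /// Resolves the (nesting, bytes) tuple of an `Inode`'s `data` field into
    /// the raw file data by following `nesting` levels of `ChunkList` indirection.
    fn resolve_file_data(
        nesting: u8,
        bytes: Vec<u8>,
        fetch_chunks: &dyn Fn(&[([u8; 16], u32)]) -> Vec<u8>,
    ) -> Option<Vec<u8>> {
        let mut data = bytes; // nesting == 0: `bytes` already is the file data
        for _ in 0..nesting {
            let chunks = decode_chunk_list(data.as_slice())?;
            data = fetch_chunks(chunks.as_slice());
        }
        Some(data)
    }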
+### `BackupHeader` encoding
+The `BackupHeader` structure contains information on how to decrypt the rest of
+the backup file. The structure is encoded using the MessagePack encoding that
+has been defined in a previous section.
+The `encryption` field contains the information needed to decrypt the rest of
+the backup file. If the `encryption` option is set, the rest of the backup file
+is encrypted using the specified method and key, otherwise the rest is not
+encrypted.
+
+    BackupHeader {
+        encryption: Encryption? => 0
+    }
+
+
+### `Backup` encoding
+The `Backup` structure contains information on one specific backup and
+references the root of the backup file tree. The structure is encoded using the
+MessagePack encoding that has been defined in a previous section.
+The `root` field contains an encoded `ChunkList` that references the root of the
+backup file tree.
+The fields `total_data_size`, `changed_data_size`, `deduplicated_data_size` and
+`encoded_data_size` list the sizes of the backup in various stages in bytes.
+- `total_data_size` gives the cumulative sizes of all entries in the backup.
+- `changed_data_size` gives the size of only those entries that changed since
+the reference backup.
+- `deduplicated_data_size` gives the cumulative raw size of all new chunks in
+this backup that have not been stored in the repository yet.
+- `encoded_data_size` gives the cumulative encoded (and compressed) size of all
+new bundles that have been written specifically to store this backup.
+The fields `bundle_count` and `chunk_count` contain the number of new bundles
+and chunks that had to be written to store this backup. `avg_chunk_size` is the
+average size of new chunks in this backup.
+The field `date` specifies the start of the backup run in seconds since the UNIX
+epoch and the field `duration` contains the duration of the backup run in
+seconds as a floating point number containing also fractions of seconds.
+The fields `file_count` and `dir_count` contain the total number of
+non-directories and directories in this backup.
+The `host` and `path` field contain the host name and the path on that host
+where the root of the backup was located.
+The field `config` contains the configuration of zVault during the backup run.
+
+    Backup {
+        root: bytes => 0,
+        total_data_size: int => 1,
+        changed_data_size: int => 2,
+        deduplicated_data_size: int => 3,
+        encoded_data_size: int => 4,
+        bundle_count: int => 5,
+        chunk_count: int => 6,
+        avg_chunk_size: float => 7,
+        date: int => 8,
+        duration: float => 9,
+        file_count: int => 10,
+        dir_count: int => 11,
+        host: string => 12,
+        path: string => 13,
+        config: Config => 14
+    }
 
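As a hedged illustration of how these size fields relate (an assumption, not stated in the specification): the average chunk size should be close to the raw size of the new chunks divided by their count.

    // Hypothetical relation between the fields described above; the exact
    // definition used by zVault may differ.
    fn approx_avg_chunk_size(deduplicated_data_size: u64, chunk_count: u64) -> f32 {
        if chunk_count == 0 {
            0.0
        } else {
            deduplicated_data_size as f32 / chunk_count as f32
        }
    }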
@@ -7,11 +7,14 @@ lost+found
 /proc
 /run
 /snap
+/media
 
 # Cache data that does not need to be backed up
 /home/*/.cache
+/home/*/.zvault
 /var/cache
 /tmp
+/home/**/Trash
 
 # Avoid backing up zvault remote backups
 remote/bundles
@@ -61,7 +61,7 @@ pub fn bundle_path(bundle: &BundleId, mut folder: PathBuf, mut count: usize) ->
         }
         folder = folder.join(&file[0..2]);
         file = file[2..].to_string();
-        count /= 100;
+        count /= 250;
     }
     (folder, file.into())
 }
@@ -63,10 +63,10 @@ impl fmt::Debug for BundleId {
 
 #[derive(Eq, Debug, PartialEq, Clone, Copy)]
 pub enum BundleMode {
-    Content, Meta
+    Data, Meta
 }
 serde_impl!(BundleMode(u8) {
-    Content => 0,
+    Data => 0,
     Meta => 1
 });
 
@@ -92,7 +92,7 @@ pub struct BundleInfo {
     pub raw_size: usize,
     pub encoded_size: usize,
     pub chunk_count: usize,
-    pub chunk_info_size: usize
+    pub chunk_list_size: usize
 }
 serde_impl!(BundleInfo(u64?) {
     id: BundleId => 0,
@@ -103,7 +103,7 @@ serde_impl!(BundleInfo(u64?) {
     raw_size: usize => 6,
     encoded_size: usize => 7,
     chunk_count: usize => 8,
-    chunk_info_size: usize => 9
+    chunk_list_size: usize => 9
 });
 
 impl Default for BundleInfo {
@@ -116,8 +116,8 @@ impl Default for BundleInfo {
             raw_size: 0,
             encoded_size: 0,
             chunk_count: 0,
-            mode: BundleMode::Content,
-            chunk_info_size: 0
+            mode: BundleMode::Data,
+            chunk_list_size: 0
         }
     }
 }
 
@@ -106,7 +106,7 @@ impl BundleReader {
         let mut info: BundleInfo = try!(msgpack::decode(&info_data).context(path));
         info.encryption = header.encryption;
         debug!("Load bundle {}", info.id);
-        let content_start = file.seek(SeekFrom::Current(0)).unwrap() as usize + info.chunk_info_size;
+        let content_start = file.seek(SeekFrom::Current(0)).unwrap() as usize + info.chunk_list_size;
         Ok((info, version, content_start))
     }
 
@@ -124,11 +124,11 @@ impl BundleReader {
     fn load_chunklist(&mut self) -> Result<(), BundleReaderError> {
         debug!("Load bundle chunklist {} ({:?})", self.info.id, self.info.mode);
         let mut file = BufReader::new(try!(File::open(&self.path).context(&self.path as &Path)));
-        let len = self.info.chunk_info_size;
+        let len = self.info.chunk_list_size;
         let start = self.content_start - len;
         try!(file.seek(SeekFrom::Start(start as u64)).context(&self.path as &Path));
         let mut chunk_data = Vec::with_capacity(len);
-        chunk_data.resize(self.info.chunk_info_size, 0);
+        chunk_data.resize(self.info.chunk_list_size, 0);
         try!(file.read_exact(&mut chunk_data).context(&self.path as &Path));
         if let Some(ref encryption) = self.info.encryption {
             chunk_data = try!(self.crypto.lock().unwrap().decrypt(&encryption, &chunk_data).context(&self.path as &Path));
@@ -114,7 +114,7 @@ impl BundleWriter {
             id: id.clone(),
             raw_size: self.raw_size,
             encoded_size: encoded_size,
-            chunk_info_size: chunk_data.len()
+            chunk_list_size: chunk_data.len()
         };
         let mut info_data = try!(msgpack::encode(&info).context(&path as &Path));
         if let Some(ref encryption) = self.encryption {
@@ -90,7 +90,7 @@ pub struct Backup {
     pub config: Config,
 }
 serde_impl!(Backup(u8?) {
-    root: Vec<Chunk> => 0,
+    root: ChunkList => 0,
     total_data_size: u64 => 1,
     changed_data_size: u64 => 2,
     deduplicated_data_size: u64 => 3,
@@ -36,7 +36,7 @@ impl Repository {
         let next_free_bundle_id = self.next_free_bundle_id();
         // Select a bundle writer according to the mode and...
         let writer = match mode {
-            BundleMode::Content => &mut self.content_bundle,
+            BundleMode::Data => &mut self.data_bundle,
             BundleMode::Meta => &mut self.meta_bundle
         };
         // ...alocate one if needed
@@ -60,7 +60,7 @@ impl Repository {
             raw_size = writer_obj.raw_size();
         }
         let bundle_id = match mode {
-            BundleMode::Content => self.next_content_bundle,
+            BundleMode::Data => self.next_data_bundle,
             BundleMode::Meta => self.next_meta_bundle
         };
         // Finish bundle if over maximum size
@@ -72,8 +72,8 @@ impl Repository {
         if self.next_meta_bundle == bundle_id {
             self.next_meta_bundle = next_free_bundle_id
         }
-        if self.next_content_bundle == bundle_id {
-            self.next_content_bundle = next_free_bundle_id
+        if self.next_data_bundle == bundle_id {
+            self.next_data_bundle = next_free_bundle_id
         }
         // Not saving the bundle map, this will be done by flush
     }
@@ -48,10 +48,10 @@ impl Repository {
     }
 
     fn check_repository(&self) -> Result<(), RepositoryError> {
-        if self.next_content_bundle == self.next_meta_bundle {
+        if self.next_data_bundle == self.next_meta_bundle {
             return Err(RepositoryIntegrityError::InvalidNextBundleId.into())
         }
-        if self.bundle_map.get(self.next_content_bundle).is_some() {
+        if self.bundle_map.get(self.next_data_bundle).is_some() {
             return Err(RepositoryIntegrityError::InvalidNextBundleId.into())
         }
         if self.bundle_map.get(self.next_meta_bundle).is_some() {
@@ -150,7 +150,7 @@ serde_impl!(Inode(u8?) {
     //__old_create_time: i64 => 8,
     symlink_target: Option<String> => 9,
     data: Option<FileData> => 10,
-    children: BTreeMap<String, ChunkList> => 11,
+    children: Option<BTreeMap<String, ChunkList>> => 11,
     cum_size: u64 => 12,
     cum_dirs: usize => 13,
     cum_files: usize => 14
@@ -255,7 +255,7 @@ impl Repository {
             try!(file.read_to_end(&mut data));
             inode.data = Some(FileData::Inline(data.into()));
         } else {
-            let mut chunks = try!(self.put_stream(BundleMode::Content, &mut file));
+            let mut chunks = try!(self.put_stream(BundleMode::Data, &mut file));
             if chunks.len() < 10 {
                 inode.data = Some(FileData::ChunkedDirect(chunks));
             } else {
@@ -41,10 +41,10 @@ pub struct Repository {
     index: Index,
     crypto: Arc<Mutex<Crypto>>,
     bundle_map: BundleMap,
-    next_content_bundle: u32,
+    next_data_bundle: u32,
     next_meta_bundle: u32,
     bundles: BundleDb,
-    content_bundle: Option<BundleWriter>,
+    data_bundle: Option<BundleWriter>,
     meta_bundle: Option<BundleWriter>,
     chunker: Chunker,
     locks: LockFolder
@@ -82,10 +82,10 @@ impl Repository {
             config: config,
             index: index,
             bundle_map: bundle_map,
-            next_content_bundle: 1,
+            next_data_bundle: 1,
             next_meta_bundle: 0,
             bundles: bundles,
-            content_bundle: None,
+            data_bundle: None,
             meta_bundle: None,
             crypto: crypto,
             locks: locks
@@ -113,10 +113,10 @@ impl Repository {
             index: index,
             crypto: crypto,
             bundle_map: bundle_map,
-            next_content_bundle: 0,
+            next_data_bundle: 0,
             next_meta_bundle: 0,
             bundles: bundles,
-            content_bundle: None,
+            data_bundle: None,
             meta_bundle: None,
             locks: locks
         };
@@ -128,7 +128,7 @@ impl Repository {
         }
         try!(repo.save_bundle_map());
         repo.next_meta_bundle = repo.next_free_bundle_id();
-        repo.next_content_bundle = repo.next_free_bundle_id();
+        repo.next_data_bundle = repo.next_free_bundle_id();
         Ok(repo)
     }
 
@@ -177,7 +177,7 @@ impl Repository {
 
     #[inline]
     fn next_free_bundle_id(&self) -> u32 {
-        let mut id = max(self.next_content_bundle, self.next_meta_bundle) + 1;
+        let mut id = max(self.next_data_bundle, self.next_meta_bundle) + 1;
         while self.bundle_map.get(id).is_some() {
             id += 1;
         }
@@ -185,14 +185,14 @@ impl Repository {
     }
 
     pub fn flush(&mut self) -> Result<(), RepositoryError> {
-        if self.content_bundle.is_some() {
+        if self.data_bundle.is_some() {
             let mut finished = None;
-            mem::swap(&mut self.content_bundle, &mut finished);
+            mem::swap(&mut self.data_bundle, &mut finished);
             {
                 let bundle = try!(self.bundles.add_bundle(finished.unwrap()));
-                self.bundle_map.set(self.next_content_bundle, bundle.id.clone());
+                self.bundle_map.set(self.next_data_bundle, bundle.id.clone());
             }
-            self.next_content_bundle = self.next_free_bundle_id()
+            self.next_data_bundle = self.next_free_bundle_id()
         }
         if self.meta_bundle.is_some() {
             let mut finished = None;
@@ -211,7 +211,7 @@ impl Repository {
     fn add_new_remote_bundle(&mut self, bundle: BundleInfo) -> Result<(), RepositoryError> {
         info!("Adding new bundle to index: {}", bundle.id);
         let bundle_id = match bundle.mode {
-            BundleMode::Content => self.next_content_bundle,
+            BundleMode::Data => self.next_data_bundle,
             BundleMode::Meta => self.next_meta_bundle
         };
         let chunks = try!(self.bundles.get_chunk_list(&bundle.id));
@@ -219,8 +219,8 @@ impl Repository {
         if self.next_meta_bundle == bundle_id {
             self.next_meta_bundle = self.next_free_bundle_id()
         }
-        if self.next_content_bundle == bundle_id {
-            self.next_content_bundle = self.next_free_bundle_id()
+        if self.next_data_bundle == bundle_id {
+            self.next_data_bundle = self.next_free_bundle_id()
         }
         for (i, (hash, _len)) in chunks.into_inner().into_iter().enumerate() {
             try!(self.index.set(&hash, &Location{bundle: bundle_id as u32, chunk: i as u32}));
@@ -36,32 +36,32 @@ quick_error!{
 }
 
 #[derive(Clone, Debug, Copy, Eq, PartialEq)]
-pub enum CompressionAlgo {
+pub enum CompressionMethod {
     Deflate, // Standardized
     Brotli, // Good speed and ratio
-    Lzma2, // Very good ratio, slow
+    Lzma, // Very good ratio, slow
     Lz4 // Very fast, low ratio
 }
-serde_impl!(CompressionAlgo(u8) {
+serde_impl!(CompressionMethod(u8) {
     Deflate => 0,
     Brotli => 1,
-    Lzma2 => 2,
+    Lzma => 2,
     Lz4 => 3
 });
 
 
 #[derive(Clone, Debug, Eq, PartialEq)]
 pub struct Compression {
-    algo: CompressionAlgo,
+    method: CompressionMethod,
     level: u8
 }
 impl Default for Compression {
     fn default() -> Self {
-        Compression { algo: CompressionAlgo::Brotli, level: 3 }
+        Compression { method: CompressionMethod::Brotli, level: 3 }
     }
 }
 serde_impl!(Compression(u64) {
-    algo: CompressionAlgo => 0,
+    method: CompressionMethod => 0,
     level: u8 => 1
 });
 
@@ -81,23 +81,23 @@ impl Compression {
         } else {
             (name, 5)
         };
-        let algo = match name {
-            "deflate" | "zlib" | "gzip" => CompressionAlgo::Deflate,
-            "brotli" => CompressionAlgo::Brotli,
-            "lzma2" => CompressionAlgo::Lzma2,
-            "lz4" => CompressionAlgo::Lz4,
+        let method = match name {
+            "deflate" | "zlib" | "gzip" => CompressionMethod::Deflate,
+            "brotli" => CompressionMethod::Brotli,
+            "lzma" | "lzma2" | "xz" => CompressionMethod::Lzma,
+            "lz4" => CompressionMethod::Lz4,
             _ => return Err(CompressionError::UnsupportedCodec(name.to_string()))
         };
-        Ok(Compression { algo: algo, level: level })
+        Ok(Compression { method: method, level: level })
     }
 
     #[inline]
     pub fn name(&self) -> &'static str {
-        match self.algo {
-            CompressionAlgo::Deflate => "deflate",
-            CompressionAlgo::Brotli => "brotli",
-            CompressionAlgo::Lzma2 => "lzma2",
-            CompressionAlgo::Lz4 => "lz4",
+        match self.method {
+            CompressionMethod::Deflate => "deflate",
+            CompressionMethod::Brotli => "brotli",
+            CompressionMethod::Lzma => "lzma",
+            CompressionMethod::Lz4 => "lz4",
         }
     }