mirror of https://github.com/dswd/zvault
Some format changes
This commit is contained in:
parent 70e6026e5a
commit d11c791652
@ -103,6 +103,15 @@ Recommended: Brotli/2-7
- File attributes
  - xattrs https://crates.io/crates/xattr

### Formats
- Bundles
  - Encrypted bundle header
  - Random bundle name
- Metadata
  - Arbitrarily nested chunk lists
  - Cumulative size, chunk count, dir/file count
- Permissive msgpack mode

### CLI functionality
- list --tree

@ -0,0 +1,136 @@
# ZVault repository

This folder is a zVault remote repository and contains backup data.

The repository contains the following components:
* The backup bundles in the subfolder `bundles`. The individual files are
  organized in subfolders and named after their bundle ids. The structure and
  names of the files are not important, as the files include the bundle id in
  their headers. The files can therefore be renamed and reorganized.
* The backup anchor files in the subfolder `backups`. The names and locations
  of these files determine the backup names but are not used otherwise.
* Active locks in the subfolder `locks`. This folder only contains lock files
  while the repository is in use. If a zVault process crashes, a stale lock
  file might be left behind. Those files can be safely removed once it is
  certain that no zVault process is running.


## Repository format

In case the zVault software is not available for restoring the backups included
in this repository, the following sections describe the format of the repository
so that its contents can be read without zVault.


### Bundle files
The bundle file format consists of 4 parts:
- A magic header with version
- An encoded header structure
- An encoded chunk list
- The chunk data

The main reason for splitting the file into these parts is that the smaller
front parts can be read much faster than the whole file. Information that is
needed more frequently is therefore placed in the earlier parts, and the data
that is needed least frequently is placed in the last part so that it does not
slow down reading the front parts. Keeping those parts in separate files was
also considered but rejected in order to increase the reliability of the
storage.


#### Magic header with version
The first part of a bundle file contains an 8 byte magic header with version
information.

The first 6 bytes of the header consist of the fixed string "zvault", followed
by one byte with the fixed value 0x01. Those 7 bytes make up the magic header of
the file and serve to identify the file type as a zvault bundle file.

The 8th byte of the first file part is the version of the file format. This
value is currently 0x01 and is expected to be increased for any breaking changes
in the file format.
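
As an illustration, here is a minimal Rust sketch (not part of zVault itself)
that checks this magic header and returns the format version byte:

```rust
use std::fs::File;
use std::io::{self, Read};

/// Reads the first 8 bytes of a bundle file, validates the magic header
/// ("zvault" followed by 0x01) and returns the format version byte.
fn read_bundle_version(path: &str) -> io::Result<u8> {
    let mut file = File::open(path)?;
    let mut header = [0u8; 8];
    file.read_exact(&mut header)?;
    if &header[0..7] != &b"zvault\x01"[..] {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a zvault bundle file"));
    }
    Ok(header[7]) // currently expected to be 0x01
}
```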


#### Encoded header structure
The encoded header structure is the second part of the bundle file format and
follows directly after the 8 bytes of the magic header.

The header structure is defined in the appendix as `BundleInfo` and contains
general information on the bundle's contents and on how to decode the other two
parts of the bundle file.

This header structure is encoded using the *MsgPack* format. It is neither
compressed (since it is quite small) nor encrypted (since it only contains
general information and no user data).
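
As a sketch of how this header could be read without zVault, the following
example uses the third-party `rmpv` crate to decode the structure into a
generic MsgPack value; zVault itself uses its own serde-based decoder, so the
choice of crate here is only an assumption about a convenient replacement:

```rust
use std::fs::File;
use std::io::{Seek, SeekFrom};

use rmpv::Value; // generic MsgPack value type from the `rmpv` crate

/// Decodes the BundleInfo header structure that follows the 8 byte magic
/// header. The result is expected to be keyed by the integer field ids
/// listed in the appendix (0 = id, 1 = mode, 2 = compression, ...).
fn read_bundle_header(path: &str) -> Result<Value, Box<dyn std::error::Error>> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(8))?; // skip the magic header
    let header = rmpv::decode::read_value(&mut file)?;
    Ok(header)
}
```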


#### Encoded chunk list
The chunk list is the third part of the bundle file and follows directly after
the encoded header structure.

The chunk list contains the hashes and sizes of all chunks stored in this bundle
in the order they are stored. The list is encoded efficiently as 20 bytes per
chunk (16 for the hash and 4 for the size) as defined in the appendix as
`ChunkList`.

Since the chunk list contains confidential information (the chunk hashes and
sizes can be used to identify files), the encoded chunk list is encrypted using
the encryption method specified in the header structure. The header structure
also contains the full size of the encoded and encrypted chunk list, which is
needed since the encryption may add some bytes for a nonce or an authentication
code.

The chunk list is not compressed since the hashes have very high entropy and do
not compress significantly.

The chunk list is not stored in the header structure because it contains
confidential data while the encryption method is stored in the header. Also, the
chunk list can be fairly big compared to the header, which needs to be read more
often.
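
Assuming the chunk list has already been decrypted, the 20-byte entries could be
parsed with a sketch like the following (the byte order of the size field is an
assumption here and is not specified above):

```rust
/// Parses a decrypted chunk list into (hash, size) pairs.
/// Each entry is 20 bytes: 16 bytes of hash followed by a 4 byte size
/// (assumed little-endian in this sketch).
fn parse_chunk_list(data: &[u8]) -> Vec<([u8; 16], u32)> {
    let mut chunks = Vec::with_capacity(data.len() / 20);
    for entry in data.chunks_exact(20) {
        let mut hash = [0u8; 16];
        hash.copy_from_slice(&entry[..16]);
        let size = u32::from_le_bytes([entry[16], entry[17], entry[18], entry[19]]);
        chunks.push((hash, size));
    }
    chunks
}
```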


#### Chunk data
The chunk data is the final part of a bundle file and follows after the encoded
chunk list. Its starting position can be derived from the header, since the
encoded size of the chunk list is stored there.

The chunk data part consists of the content data of the chunks contained in this
bundle, simply concatenated without any separator. The actual size (and, by
summing up the sizes, also the starting position) of each chunk can be obtained
from the chunk list.

The chunk data is compressed as a whole (solid archive) and encrypted with the
methods specified in the bundle header structure.
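
Building on the chunk list sketch above, the position of an individual chunk
inside the decrypted and decompressed chunk data could be computed like this:

```rust
/// Returns the byte range of chunk `index` within the decoded chunk data,
/// given the (hash, size) pairs from `parse_chunk_list`.
/// The offset of a chunk is simply the sum of the sizes of all chunks
/// stored before it, since the chunks are concatenated without separators.
fn chunk_range(chunks: &[([u8; 16], u32)], index: usize) -> std::ops::Range<usize> {
    let offset: usize = chunks[..index].iter().map(|&(_, size)| size as usize).sum();
    offset..offset + chunks[index].1 as usize
}
```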


### Inode metadata
TODO

### Backup format
TODO

### Backup file
TODO


## Appendix

### Constants
TODO

### Types

### `BundleInfo` encoding

    serde_impl!(BundleInfo(u64) {
        id: BundleId => 0,
        mode: BundleMode => 1,
        compression: Option<Compression> => 2,
        encryption: Option<Encryption> => 3,
        hash_method: HashMethod => 4,
        raw_size: usize => 6,
        encoded_size: usize => 7,
        chunk_count: usize => 8,
        chunk_info_size: usize => 9
    });


### `ChunkList` encoding
TODO

@ -78,7 +78,7 @@ pub struct BundleInfo {
pub chunk_count: usize,
pub chunk_info_size: usize
}
-serde_impl!(BundleInfo(u64) {
+serde_impl!(BundleInfo(u64?) {
id: BundleId => 0,
mode: BundleMode => 1,
compression: Option<Compression> => 2,

@ -78,7 +78,7 @@ impl IChunker for Chunker {
}


-#[derive(Debug, Clone, Copy)]
+#[derive(Debug, Clone, Copy, Eq, PartialEq)]
pub enum ChunkerType {
Ae(usize),
Rabin((usize, u32)),

@ -89,7 +89,7 @@ fn print_backup(backup: &Backup) {
pub fn format_inode_one_line(inode: &Inode) -> String {
match inode.file_type {
FileType::Directory => format!("{:25}\t{} entries", format!("{}/", inode.name), inode.children.as_ref().unwrap().len()),
-FileType::File => format!("{:25}\t{:>10}\t{}", inode.name, to_file_size(inode.size), Local.timestamp(inode.modify_time, 0).to_rfc2822()),
+FileType::File => format!("{:25}\t{:>10}\t{}", inode.name, to_file_size(inode.size), Local.timestamp(inode.timestamp, 0).to_rfc2822()),
FileType::Symlink => format!("{:25}\t -> {}", inode.name, inode.symlink_target.as_ref().unwrap()),
}
}

@ -101,7 +101,7 @@ fn print_inode(inode: &Inode) {
println!("Permissions: {:3o}", inode.mode);
println!("User: {}", inode.user);
println!("Group: {}", inode.group);
-println!("Modification time: {}", Local.timestamp(inode.modify_time, 0).to_rfc2822());
+println!("Timestamp: {}", Local.timestamp(inode.timestamp, 0).to_rfc2822());
if let Some(ref target) = inode.symlink_target {
println!("Symlink target: {}", target);
}

@ -87,8 +87,8 @@ impl FuseInode {
ino: self.num,
size: self.inode.size,
blocks: self.inode.size / 512,
-atime: Timespec::new(self.inode.modify_time, 0),
-mtime: Timespec::new(self.inode.modify_time, 0),
+atime: Timespec::new(self.inode.timestamp, 0),
+mtime: Timespec::new(self.inode.timestamp, 0),
ctime: Timespec::new(0, 0),
crtime: Timespec::new(0, 0),
kind: convert_file_type(self.inode.file_type),

@ -290,14 +290,14 @@ impl Repository {
for (name, backup) in try!(self.get_backups()) {
match self.get_backup_inode(&backup, path) {
Ok(inode) => {
-versions.insert((inode.file_type, inode.modify_time, inode.size), (name, inode));
+versions.insert((inode.file_type, inode.timestamp, inode.size), (name, inode));
},
Err(RepositoryError::NoSuchFileInBackup(..)) => continue,
Err(err) => return Err(err)
}
}
let mut versions: Vec<_> = versions.into_iter().map(|(_, v)| v).collect();
-versions.sort_by_key(|v| v.1.modify_time);
+versions.sort_by_key(|v| v.1.timestamp);
Ok(versions)
}

@ -89,7 +89,7 @@ pub struct Backup {
pub path: String,
pub config: Config,
}
-serde_impl!(Backup(u8) {
+serde_impl!(Backup(u8?) {
root: Vec<Chunk> => 0,
total_data_size: u64 => 1,
changed_data_size: u64 => 2,

@ -149,7 +149,7 @@ serde_impl!(ConfigYaml(String) {



-#[derive(Debug, Clone)]
+#[derive(Debug, Clone, Eq, PartialEq)]
pub struct Config {
pub compression: Option<Compression>,
pub encryption: Option<Encryption>,

@ -112,7 +112,7 @@ pub struct Inode {
pub user: u32,
pub group: u32,
pub __old_access_time: i64,
-pub modify_time: i64,
+pub timestamp: i64,
pub __old_create_time: i64,
pub symlink_target: Option<String>,
pub contents: Option<FileContents>,

@ -128,7 +128,7 @@ impl Default for Inode {
user: 1000,
group: 1000,
__old_access_time: 0,
-modify_time: 0,
+timestamp: 0,
__old_create_time: 0,
symlink_target: None,
contents: None,

@ -136,7 +136,7 @@ impl Default for Inode {
}
}
}
-serde_impl!(Inode(u8) {
+serde_impl!(Inode(u8?) {
name: String => 0,
size: u64 => 1,
file_type: FileType => 2,

@ -144,7 +144,7 @@ serde_impl!(Inode(u8) {
user: u32 => 4,
group: u32 => 5,
__old_access_time: i64 => 6,
-modify_time: i64 => 7,
+timestamp: i64 => 7,
__old_create_time: i64 => 8,
symlink_target: Option<String> => 9,
contents: Option<FileContents> => 10,

@ -177,7 +177,7 @@ impl Inode {
inode.mode = meta.st_mode();
inode.user = meta.st_uid();
inode.group = meta.st_gid();
-inode.modify_time = meta.st_mtime();
+inode.timestamp = meta.st_mtime();
Ok(inode)
}

@ -204,11 +204,8 @@ impl Inode {
&full_path,
Permissions::from_mode(self.mode)
).map_err(|e| InodeError::SetPermissions(e, full_path.clone(), self.mode)));
-try!(filetime::set_file_times(
-&full_path,
-FileTime::from_seconds_since_1970(self.modify_time as u64, 0),
-FileTime::from_seconds_since_1970(self.modify_time as u64, 0)
-).map_err(|e| InodeError::SetTimes(e, full_path.clone())));
+let time = FileTime::from_seconds_since_1970(self.timestamp as u64, 0);
+try!(filetime::set_file_times(&full_path, time, time).map_err(|e| InodeError::SetTimes(e, full_path.clone())));
try!(chown(&full_path, self.user, self.group).map_err(|e| InodeError::SetOwnership(e, full_path.clone())));
Ok(file)
}

@ -216,11 +213,11 @@ impl Inode {
pub fn is_same_meta(&self, other: &Inode) -> bool {
self.file_type == other.file_type && self.size == other.size && self.mode == other.mode
&& self.user == other.user && self.group == other.group && self.name == other.name
-&& self.modify_time == other.modify_time && self.symlink_target == other.symlink_target
+&& self.timestamp == other.timestamp && self.symlink_target == other.symlink_target
}

pub fn is_same_meta_quick(&self, other: &Inode) -> bool {
-self.modify_time == other.modify_time
+self.timestamp == other.timestamp
&& self.file_type == other.file_type
&& self.size == other.size
}

@ -29,6 +29,7 @@ pub use self::info::{RepositoryInfo, BundleAnalysis};
use self::bundle_map::BundleMap;


+const REPOSITORY_README: &'static [u8] = include_bytes!("../../docs/repository_readme.md");
const DEFAULT_EXCLUDES: &'static [u8] = include_bytes!("../../excludes.default");


@ -59,6 +60,8 @@ impl Repository {
try!(fs::create_dir(path.join("keys")));
let crypto = Arc::new(Mutex::new(try!(Crypto::open(path.join("keys")))));
try!(symlink(remote, path.join("remote")));
+let mut remote_readme = try!(File::create(path.join("remote/README.md")));
+try!(remote_readme.write_all(REPOSITORY_README));
try!(fs::create_dir_all(path.join("remote/locks")));
let locks = LockFolder::new(path.join("remote/locks"));
let bundles = try!(BundleDb::create(

@ -35,7 +35,7 @@ quick_error!{
}
}

-#[derive(Clone, Debug, Copy)]
+#[derive(Clone, Debug, Copy, Eq, PartialEq)]
pub enum CompressionAlgo {
Deflate, // Standardized
Brotli, // Good speed and ratio

@ -50,7 +50,7 @@ serde_impl!(CompressionAlgo(u8) {
});


-#[derive(Clone, Debug)]
+#[derive(Clone, Debug, Eq, PartialEq)]
pub struct Compression {
algo: CompressionAlgo,
level: u8

@ -94,7 +94,7 @@ impl Deserialize for Hash {
}


-#[derive(Debug, Clone, Copy)]
+#[derive(Debug, Clone, Copy, Eq, PartialEq)]
pub enum HashMethod {
Blake2,
Murmur3