Mirror of https://github.com/deepseek-ai/3FS (synced 2025-06-26 18:16:45 +00:00)
Initial commit
2
src/storage/chunk_engine/.gitignore
vendored
Normal file
@@ -0,0 +1,2 @@
/target
/lcov.info
40
src/storage/chunk_engine/Cargo.toml
Normal file
@@ -0,0 +1,40 @@
[package]
name = "chunk_engine"
version = "0.1.11"
edition = "2021"

[lib]
crate-type = ["lib", "staticlib"]

[dependencies]
anyhow = "1"
byteorder = "1"
crc32c = "0"
cxx = "1"
dashmap = "6"
derse = { version = ">=0.1.32", features = ["tinyvec"] }
lazy_static = "1"
libc = "0"
lockmap = "0.1.6"
rand = "0"
rocksdb = "0"
rolling-file = "0"
serde = { version = "1", features = ["derive"] }
static_assertions = "1"
tinyvec = { version = "1", features = ["alloc"] }
toml = "0"
tracing = "0"
tracing-appender = "0"
tracing-subscriber = { version = "0", features = ["fmt"] }

[dev-dependencies]
clap = { version = "4", features = ["derive"] }
tempfile = "3"
criterion = "0"

[build-dependencies]
cxx-build = "1"

[[bench]]
name = "bench_allocator"
harness = false
62
src/storage/chunk_engine/README.md
Normal file
@@ -0,0 +1,62 @@
# chunk-engine

### Design

1. The entire Chunk Engine can be divided into two components:
   1. **Allocator**: Responsible for allocating/reclaiming chunks and modifying in-memory state.
   2. **MetaStore**: Responsible for persisting allocation/reclamation events.
2. Workflow for writing a new chunk:
   1. The **Allocator** assigns a new chunk position that points to a region of disk space (a purely in-memory operation).
   2. Write data to this chunk position. If a power failure or write failure occurs at this stage, no existing data is affected.
   3. Generate the corresponding chunk metadata and persist it together with the allocation event to the **MetaStore**. Using RocksDB's WriteBatch makes the update **atomic**: the entire write operation either succeeds or fails, with no intermediate states.
3. Maintaining the Allocator's in-memory state:
   1. At startup, the Allocator **quickly** loads all allocation information from RocksDB.
   2. Allocation is performed in memory first, followed by persistence. If a failure occurs before persistence, the allocation event is lost.
   3. Reclamation first persists the event to disk, then modifies the in-memory state. Even if a chunk deletion event has been persisted, the chunk remains readable as long as a reference to it is held in memory.
   4. This ensures conflict-free read/write operations: a read operation acquires a chunk reference, guaranteeing the chunk's validity until the read completes.
4. Use `Arc` to manage ownership of chunk positions:
   1. Allocation returns an `Arc<ChunkPos>`. If persistence fails, the position is automatically released when the `Arc` is dropped.
   2. Read operations also return an `Arc<ChunkPos>`, ensuring safe data access even during concurrent writes or deletions.
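
The `Arc`-based ownership rule above can be sketched in isolation. This is a minimal illustration, not the crate's code: `SlotPool`, this simplified `ChunkPos`, and `allocate` are hypothetical stand-ins, and the only thing modelled is that a position goes back to the free pool when the last `Arc` pointing at it is dropped.

```rust
use std::collections::BTreeSet;
use std::sync::{Arc, Mutex};

/// Hypothetical free-slot pool standing in for the real allocator state.
#[derive(Default)]
struct SlotPool {
    free: Mutex<BTreeSet<u32>>,
}

/// Hypothetical stand-in for `ChunkPos`: it returns its slot to the pool on drop.
struct ChunkPos {
    slot: u32,
    pool: Arc<SlotPool>,
}

impl Drop for ChunkPos {
    fn drop(&mut self) {
        // Runs only when the last `Arc<ChunkPos>` goes away.
        self.pool.free.lock().unwrap().insert(self.slot);
    }
}

/// Takes the lowest free slot, or returns `None` when the pool is exhausted.
fn allocate(pool: &Arc<SlotPool>) -> Option<Arc<ChunkPos>> {
    let slot = pool.free.lock().unwrap().pop_first()?;
    Some(Arc::new(ChunkPos { slot, pool: Arc::clone(pool) }))
}

/// Usage: a reader keeps the position alive after the owner lets go of it.
fn example() {
    let pool = Arc::new(SlotPool::default());
    pool.free.lock().unwrap().extend(0..4);

    let pos = allocate(&pool).unwrap();
    let reader = Arc::clone(&pos);
    drop(pos); // e.g. the chunk was deleted, or persistence failed
    assert_eq!(pool.free.lock().unwrap().len(), 3); // still held by the reader
    assert_eq!(reader.slot, 0);
    drop(reader); // the read completes, so the slot is released
    assert_eq!(pool.free.lock().unwrap().len(), 4);
}
```

Tying the release to `Drop` means a failed persistence step needs no explicit rollback path: the caller simply lets the handle go out of scope.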

### Allocator

Storage hierarchy:

1. **Chunk**: The basic data unit; the currently proposed sizes are 64KB, 512KB, and 4MB.
2. **Group**: Each group contains 256 chunks (16MB, 128MB, or 1GB depending on chunk size).
3. **File**: For 512KB chunks, a single file (~120GB) contains ~960 groups.
4. **Disk**: Single disk capacity of 30TB, divided into 256 files per chunk size.
5. **Node**: A single node contains 10–20 disks.

This configuration supports up to ~1.2 billion chunks and ~5 million groups per machine (20 disks filled with 512KB chunks).
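
The capacity figures above follow from multiplying out the hierarchy. The sketch below reproduces that arithmetic for the 512KB case with 20 disks; the `Layout` struct and its field names are illustrative only and do not exist in the crate.

```rust
const KIB: u64 = 1024;

/// Illustrative description of one storage layout (not a type from the crate).
struct Layout {
    chunk_size: u64,
    chunks_per_group: u64,
    groups_per_file: u64,
    files_per_disk: u64,
    disks_per_node: u64,
}

impl Layout {
    fn group_size(&self) -> u64 {
        self.chunk_size * self.chunks_per_group
    }
    fn file_size(&self) -> u64 {
        self.group_size() * self.groups_per_file
    }
    fn disk_size(&self) -> u64 {
        self.file_size() * self.files_per_disk
    }
    fn groups_per_node(&self) -> u64 {
        self.groups_per_file * self.files_per_disk * self.disks_per_node
    }
    fn chunks_per_node(&self) -> u64 {
        self.groups_per_node() * self.chunks_per_group
    }
}

/// The 512KB configuration from the list above, at the upper bound of 20 disks.
fn normal_layout() -> Layout {
    Layout {
        chunk_size: 512 * KIB,
        chunks_per_group: 256,
        groups_per_file: 960,
        files_per_disk: 256,
        disks_per_node: 20,
    }
}
```

With these inputs a group is 128 MiB, a file is 120 GiB, a disk is 30 TiB, and a node holds 4,915,200 groups and 1,258,291,200 chunks, which matches the rounded "~5 million" and "~1.2 billion" figures.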

Implementation details:
1. Each group uses a 256-bit bitset (4 `uint64_t`) to track allocation status.
2. Maintain three in-memory structures:
   - `allocated_groups`: Groups with allocated space but no chunks assigned.
   - `unallocated_groups`: Groups without allocated space.
   - `active_groups`: Map of `<group_id, group_state>` tracking allocation status.
3. Chunk allocation workflow:
   1. First look for a free slot in `active_groups`, using **`__builtin_ctz`** for fast bit scans.
   2. If `active_groups` is empty, acquire a new group from `allocated_groups`.
   3. If `allocated_groups` is empty, fetch a group from `unallocated_groups` and allocate disk space synchronously.
4. Background threads:
   - **`allocate_thread`**: Keeps `active_groups` within a target size range so that allocation can stay in memory.
   - **`compact_thread`**: Periodically scans `active_groups`, migrates all chunks from selected groups, releases space, and returns groups to `allocated_groups`.
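
The 256-bit bitset and the count-trailing-zeros search can be sketched as follows. This is a minimal illustration under stated assumptions: `GroupBits` is a hypothetical name, a set bit is assumed to mean "allocated", and Rust's `u64::trailing_zeros` plays the role of `__builtin_ctz`. The crate's actual `GroupState` may differ.

```rust
/// Hypothetical 256-bit group state: bit `i` set means chunk `i` is allocated.
#[derive(Default, Clone, Copy, PartialEq, Eq, Debug)]
struct GroupBits([u64; 4]);

impl GroupBits {
    /// Claims the lowest free slot, or returns `None` when the group is full.
    fn allocate(&mut self) -> Option<u8> {
        for (w, word) in self.0.iter_mut().enumerate() {
            let free = !*word; // 1-bits mark free slots in this word
            if free != 0 {
                let bit = free.trailing_zeros(); // index of the lowest free slot
                *word |= 1u64 << bit;
                return Some((w as u32 * 64 + bit) as u8);
            }
        }
        None
    }

    /// Marks a slot as free again.
    fn release(&mut self, index: u8) {
        let (w, bit) = (index as usize / 64, index as u32 % 64);
        self.0[w] &= !(1u64 << bit);
    }

    /// Number of allocated slots in the group.
    fn count(&self) -> u32 {
        self.0.iter().map(|w| w.count_ones()).sum()
    }
}
```

Scanning at most four words keeps an allocation attempt within a group at a handful of instructions, and always taking the lowest free bit means a slot that was just released is the next one handed out.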

### MetaStore

Persists three mappings:
1. **`chunk_id -> chunk_meta`**: Metadata includes chunk location, length, hash, version, etc., serialized using **`derse`**.
2. **`group_id -> group_state`**: Tracks chunk allocation status within groups, leveraging RocksDB's **MergeOp** for atomic updates.
3. **`chunk_pos -> chunk_id`**: Maps physical positions to chunk IDs, used by `compact_thread` during chunk migration.
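
To illustrate why a merge operator suits the `group_id -> group_state` mapping, the sketch below models merge semantics in plain Rust: small operands are folded into the stored value, so a writer never has to read the state, modify it, and write it back. Everything here is an assumption for illustration; `GroupOp`, the operand encoding, and `merge` are hypothetical and are not the RocksDB API or the crate's actual operand format.

```rust
/// Hypothetical merge operand: a single-slot change to a group's bitset.
#[derive(Clone, Copy, Debug)]
enum GroupOp {
    Allocate(u8),
    Release(u8),
}

/// Folds operands, in order, into the existing 256-bit state.
/// A missing value is treated as an empty group.
fn merge(existing: Option<[u64; 4]>, operands: &[GroupOp]) -> [u64; 4] {
    let mut state = existing.unwrap_or([0; 4]);
    for op in operands {
        match *op {
            GroupOp::Allocate(i) => state[i as usize / 64] |= 1u64 << (i % 64),
            GroupOp::Release(i) => state[i as usize / 64] &= !(1u64 << (i % 64)),
        }
    }
    state
}
```

For example, applying `Allocate(0)`, `Allocate(65)`, `Release(0)` to an empty group leaves only bit 65 set, that is, bit 1 of the second word.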

### Chunk Engine

1. **MetaCache**: Maintains an in-memory `chunk_id -> chunk_info` mapping, where `chunk_info` includes `chunk_meta` and `Arc<ChunkPos>`.
2. **Read operation**: Returns `chunk_info`. The `Arc<ChunkPos>` ensures safe data access until the read completes.
3. **Write operation workflow**:
   1. Query `MetaCache` to retrieve the current `chunk_info`.
   2. Invoke `Allocator::allocate()` to obtain a new chunk position.
   3. Read existing chunk data, write it to the new chunk position, append the new write request, and generate `new_chunk_info`.
   4. Persist `new_chunk_info` to the **MetaStore** along with a release record for the original chunk position.
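
The write workflow above is a copy-on-write update. The following is a minimal in-memory sketch of it, assuming a `Vec<u8>` in place of disk space and omitting persistence entirely; `Engine`, `ChunkInfo`, and this simplified `ChunkPos` are hypothetical stand-ins rather than the crate's types.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for an on-disk chunk position.
struct ChunkPos {
    data: Vec<u8>,
}

/// Simplified `chunk_info`: the length plus a shared handle to the position.
#[derive(Clone)]
struct ChunkInfo {
    len: usize,
    pos: Arc<ChunkPos>,
}

#[derive(Default)]
struct Engine {
    meta_cache: HashMap<u64, ChunkInfo>,
}

impl Engine {
    /// Readers get a clone of `chunk_info`, which keeps the position alive.
    fn read(&self, chunk_id: u64) -> Option<ChunkInfo> {
        self.meta_cache.get(&chunk_id).cloned()
    }

    fn write(&mut self, chunk_id: u64, offset: usize, data: &[u8]) {
        // 1. Look up the current chunk_info, if any.
        let old = self.meta_cache.get(&chunk_id).cloned();
        // 2. "Allocate" a new position (here: a fresh zero-filled buffer).
        let old_len = old.as_ref().map_or(0, |c| c.len);
        let new_len = old_len.max(offset + data.len());
        let mut buf = vec![0u8; new_len];
        // 3. Copy the existing data, then apply the new write on top of it.
        if let Some(old) = &old {
            buf[..old.len].copy_from_slice(&old.pos.data[..old.len]);
        }
        buf[offset..offset + data.len()].copy_from_slice(data);
        // 4. Publish new_chunk_info; the old position is released once the
        //    last reader drops its Arc.
        let pos = Arc::new(ChunkPos { data: buf });
        self.meta_cache.insert(chunk_id, ChunkInfo { len: new_len, pos });
    }
}
```

A reader that obtained its `ChunkInfo` before the write continues to see the old bytes, since the write never touches the old position.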
42
src/storage/chunk_engine/benches/bench_allocator.rs
Normal file
@@ -0,0 +1,42 @@
use chunk_engine::*;
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use std::sync::Arc;

fn allocate(allocator: &Arc<Allocator>, n: usize) {
    for _ in 0..n {
        drop(allocator.allocate(true).unwrap());
    }
}

fn criterion_benchmark(c: &mut Criterion) {
    let dir = tempfile::tempdir().unwrap();

    let cluster_config = ClustersConfig {
        path: dir.path().into(),
        chunk_size: CHUNK_SIZE_NORMAL,
        create: true,
    };
    let clusters = Clusters::open(&cluster_config).unwrap();

    let meta_store_config = MetaStoreConfig {
        rocksdb: RocksDBConfig {
            path: dir.path().join("meta"),
            create: true,
            ..Default::default()
        },
        ..Default::default()
    };
    let meta_store = std::sync::Arc::new(MetaStore::open(&meta_store_config).unwrap());

    let allocator = chunk_engine::Allocator::load(clusters, meta_store.iterator()).unwrap();
    allocator.do_allocate_task(1, 1, &meta_store).unwrap();

    let count: usize = 1 << 16;

    c.bench_with_input(BenchmarkId::new("allocate", count), &count, |b, &c| {
        b.iter(|| allocate(&allocator, c))
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
4
src/storage/chunk_engine/build.rs
Normal file
@@ -0,0 +1,4 @@
fn main() {
    let _ = cxx_build::bridge("src/cxx.rs");
    println!("cargo:rerun-if-changed=src/cxx.rs");
}
182
src/storage/chunk_engine/docs/architecture.drawio.svg
Normal file
File diff suppressed because one or more lines are too long
Size: 77 KiB
94
src/storage/chunk_engine/examples/chunk_viewer.rs
Normal file
@@ -0,0 +1,94 @@
use chunk_engine::*;
use clap::Parser;
use derse::Deserialize;
use std::{
    collections::{BTreeMap, HashMap},
    path::PathBuf,
    sync::Arc,
};

/// A distributed copy/move tool.
#[derive(Parser, Debug, Clone)]
#[command(version, about, long_about = None)]
pub struct Args {
    /// Path to rocksdb.
    pub path: PathBuf,
}

fn main() -> Result<()> {
    let args = Args::parse();

    let meta_config = MetaStoreConfig {
        rocksdb: RocksDBConfig {
            path: args.path,
            create: false,
            read_only: true,
        },
        prefix_len: 4,
    };
    let meta_store = MetaStore::open(&meta_config)?;

    let mut chunk_allocators = HashMap::new();
    let mut used_map = BTreeMap::new();
    let mut reversed_map = BTreeMap::new();
    let mut group_count = BTreeMap::new();
    let mut chunk_size = CHUNK_SIZE_SMALL;
    let mut real_map = BTreeMap::new();
    loop {
        let counter = Arc::new(AllocatorCounter::new(chunk_size));
        let it = meta_store.iterator();
        let chunk_allocator = ChunkAllocator::load(it, counter.clone(), chunk_size)?;
        let allocated_chunks = counter.allocated_chunks();
        let reserved_chunks = counter.reserved_chunks();
        used_map.insert(chunk_size, allocated_chunks - reserved_chunks);
        reversed_map.insert(chunk_size, reserved_chunks);
        group_count.insert(
            chunk_size,
            (
                chunk_allocator.full_groups.len(),
                chunk_allocator.active_groups.len(),
            ),
        );
        real_map.insert(chunk_size, 0u64);
        chunk_allocators.insert(chunk_size, chunk_allocator);

        if chunk_size >= CHUNK_SIZE_ULTRA {
            break;
        }
        chunk_size *= 2;
    }

    let mut it = meta_store.iterator();
    let end_key = MetaKey::chunk_meta_key_prefix();
    it.seek(&end_key)?;

    if it.key() == Some(end_key.as_ref()) {
        it.next(); // [begin, end)
    }

    loop {
        if !it.valid() {
            break;
        }

        if it.key().unwrap()[0] != MetaKey::CHUNK_META_KEY_PREFIX {
            break;
        }

        let chunk_meta =
            ChunkMeta::deserialize(it.value().unwrap()).map_err(Error::SerializationError)?;

        let chunk_size = chunk_meta.pos.chunk_size();
        let allocator = chunk_allocators.get_mut(&chunk_size).unwrap();
        allocator.reference(chunk_meta.pos, true);
        real_map.entry(chunk_size).and_modify(|v| *v += 1);

        it.next();
    }
    println!("{:#?}", used_map);
    println!("{:#?}", reversed_map);
    println!("{:#?}", group_count);
    assert_eq!(used_map, real_map);

    Ok(())
}
258
src/storage/chunk_engine/src/alloc/allocator.rs
Normal file
@@ -0,0 +1,258 @@
use super::super::*;
use std::sync::{Arc, Mutex};

pub struct Allocator {
    allocator: Mutex<ChunkAllocator>,
    pub counter: Arc<AllocatorCounter>,
    pub clusters: Clusters,
}

impl Allocator {
    pub fn load(clusters: Clusters, it: RocksDBIterator) -> Result<Arc<Allocator>> {
        let counter = Arc::new(AllocatorCounter::new(clusters.chunk_size));
        Ok(Arc::new(Self {
            allocator: Mutex::new(ChunkAllocator::load(
                it,
                counter.clone(),
                clusters.chunk_size,
            )?),
            counter,
            clusters,
        }))
    }

    pub fn allocate(self: &Arc<Self>, allow_to_allocate: bool) -> Result<Chunk> {
        let this = self.as_ref();
        let mut allocator = this.allocator.lock().unwrap();
        allocator
            .allocate(&this.clusters, allow_to_allocate)
            .map(|pos| {
                Chunk::new(
                    ChunkMeta {
                        pos,
                        ..Default::default()
                    },
                    self.clone(),
                )
            })
    }

    pub fn reference(self: &Arc<Self>, meta: ChunkMeta, first_ref: bool) -> Chunk {
        let mut allocator = self.allocator.lock().unwrap();
        allocator.reference(meta.pos, first_ref);
        Chunk::new(meta, self.clone())
    }

    pub fn dereference(&self, pos: Position) {
        let mut allocator = self.allocator.lock().unwrap();
        allocator.dereference(pos)
    }

    pub fn get_allocate_task(&self, min_remain: usize, max_remain: usize) -> AllocateTask {
        let mut allocator = self.allocator.lock().unwrap();
        allocator
            .group_allocator
            .get_allocate_task(min_remain, max_remain)
    }

    pub fn finish_allocate_task(&self, task: AllocateTask, succ: bool) {
        let mut allocator = self.allocator.lock().unwrap();
        allocator.group_allocator.finish_allocate_task(task, succ);
    }

    pub fn do_allocate_task(
        &self,
        min_remain: usize,
        max_remain: usize,
        meta_store: &MetaStore,
    ) -> Result<AllocateTask> {
        let task = self.get_allocate_task(min_remain, max_remain);

        let result = match task {
            AllocateTask::None => return Ok(task),
            AllocateTask::Allocate(group_id) => (|| {
                self.clusters.allocate(group_id)?;
                meta_store.allocate_group(group_id)
            })(),
            AllocateTask::Deallocate(group_id) => (|| {
                tracing::warn!("deallocate group: {:?}", group_id);
                meta_store.remove_group(group_id)?;
                self.clusters.deallocate(group_id)
            })(),
        };

        self.finish_allocate_task(task, result.is_ok());

        result?;
        Ok(task)
    }

    pub fn get_compact_task(&self, max_reserved: u64) -> Option<GroupId> {
        let mut allocator = self.allocator.lock().unwrap();
        allocator.get_compact_task(max_reserved)
    }

    pub fn finish_compact_task(&self, group_id: GroupId) {
        let mut allocator = self.allocator.lock().unwrap();
        allocator.finish_compact_task(group_id)
    }
}

impl Drop for Allocator {
    fn drop(&mut self) {
        tracing::info!("Allocator {:?} is dropping...", self.clusters);
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_allocator() {
        use rand::seq::SliceRandom;
        let dir = tempfile::tempdir().unwrap();

        let cluster_config = ClustersConfig {
            path: dir.path().into(),
            chunk_size: CHUNK_SIZE_NORMAL,
            create: true,
        };
        let clusters = Clusters::open(&cluster_config).unwrap();

        let meta_store_config = MetaStoreConfig {
            rocksdb: RocksDBConfig {
                path: dir.path().join("meta"),
                create: true,
                ..Default::default()
            },
            ..Default::default()
        };
        let meta_store = Arc::new(MetaStore::open(&meta_store_config).unwrap());

        let allocator = Allocator::load(clusters, meta_store.iterator()).unwrap();

        for _ in 0..10000 {
            let chunk = allocator.allocate(true).unwrap();
            assert_eq!(chunk.meta().pos, Position::new(GroupId::default(), 0));
        }

        const N: usize = 1000;
        let mut chunks = vec![];
        for _ in 0..N {
            let chunk = allocator.allocate(true).unwrap();
            chunks.push(std::sync::Arc::new(chunk));
        }

        {
            let allocator = allocator.allocator.lock().unwrap();
            assert_eq!(allocator.full_groups.len(), N / 256);
            assert_eq!(allocator.active_groups.len(), 1);
            assert_eq!(
                allocator.active_groups.iter().next().unwrap().1.count() as usize,
                N % 256
            );
        }

        const T: usize = 8;
        (0..T)
            .map(|i| {
                let chunks = chunks.clone();
                std::thread::spawn(move || {
                    let mut vec = create_aligned_vec(ALIGN_SIZE);
                    vec.fill(0);
                    for chunk in chunks.iter() {
                        if chunk.meta().pos.index() as usize % T == i {
                            vec.fill(chunk.meta().pos.index());
                            chunk.pwrite(&vec[..], 0).unwrap();
                        }
                    }
                })
            })
            .collect::<Vec<_>>()
            .into_iter()
            .for_each(|t| t.join().unwrap());

        chunks.shuffle(&mut rand::thread_rng());

        (0..T)
            .map(|i| {
                let chunks = chunks.clone();
                std::thread::spawn(move || {
                    let mut buf = [0u8; 8];
                    for chunk in chunks.iter() {
                        if chunk.meta().pos.index() as usize % T == i {
                            assert!(chunk.pread(&mut buf, 0).is_ok());
                            assert_eq!(buf, [chunk.meta().pos.index(); 8]);
                        }
                    }
                })
            })
            .collect::<Vec<_>>()
            .into_iter()
            .for_each(|t| t.join().unwrap());

        chunks.clear();

        {
            let allocator = allocator.allocator.lock().unwrap();
            assert!(allocator.full_groups.is_empty());
            assert!(allocator.active_groups.is_empty());
        }
    }

    #[test]
    fn test_allocator_do_allocate_task() {
        let dir = tempfile::tempdir().unwrap();
        const S: Size = CHUNK_SIZE_NORMAL;

        let cluster_config = ClustersConfig {
            path: dir.path().into(),
            chunk_size: S,
            create: true,
        };
        let clusters = Clusters::open(&cluster_config).unwrap();

        let meta_store_config = MetaStoreConfig {
            rocksdb: RocksDBConfig {
                path: dir.path().join("meta"),
                create: true,
                ..Default::default()
            },
            ..Default::default()
        };
        let meta_store = Arc::new(MetaStore::open(&meta_store_config).unwrap());

        let allocator = Allocator::load(clusters, meta_store.iterator()).unwrap();

        for _ in 0..4 {
            assert!(matches!(
                allocator.do_allocate_task(4, 8, &meta_store).unwrap(),
                AllocateTask::Allocate(_)
            ));
        }
        assert!(matches!(
            allocator.do_allocate_task(4, 8, &meta_store).unwrap(),
            AllocateTask::None
        ));

        let s = allocator.counter.used_size();
        assert_eq!(s.allocated_size, S * GroupState::TOTAL_BITS as u64 * 4);
        assert_eq!(s.reserved_size, S * GroupState::TOTAL_BITS as u64 * 4);

        for _ in 2..4 {
            assert!(matches!(
                allocator.do_allocate_task(1, 2, &meta_store).unwrap(),
                AllocateTask::Deallocate(_)
            ));
        }
        assert!(matches!(
            allocator.do_allocate_task(1, 2, &meta_store).unwrap(),
            AllocateTask::None
        ));

        let s = allocator.counter.used_size();
        assert_eq!(s.allocated_size, S * GroupState::TOTAL_BITS as u64 * 2);
        assert_eq!(s.reserved_size, S * GroupState::TOTAL_BITS as u64 * 2);
    }
}
88
src/storage/chunk_engine/src/alloc/allocator_counter.rs
Normal file
@@ -0,0 +1,88 @@
use super::super::*;
use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Default)]
pub struct AllocatorCounter {
    pub chunk_size: Size,
    pub allocated_chunks: AtomicU64,
    pub reserved_chunks: AtomicU64,
    pub position_count: AtomicU64,
    pub position_rc: AtomicU64,
}

#[derive(Default, Clone, Copy, PartialEq, Eq, Debug)]
#[repr(C)]
pub struct UsedSize {
    pub allocated_size: Size,
    pub reserved_size: Size,
    pub position_count: u64,
    pub position_rc: u64,
}

impl std::iter::Sum for UsedSize {
    fn sum<I: Iterator<Item = Self>>(iter: I) -> Self {
        let mut s = UsedSize::default();
        for i in iter {
            s.allocated_size += i.allocated_size;
            s.reserved_size += i.reserved_size;
            s.position_count += i.position_count;
            s.position_rc += i.position_rc;
        }
        s
    }
}

impl AllocatorCounter {
    pub fn new(chunk_size: Size) -> Self {
        Self {
            chunk_size,
            ..Default::default()
        }
    }

    pub fn allocated_chunks(&self) -> u64 {
        self.allocated_chunks.load(Ordering::Acquire)
    }

    pub fn reserved_chunks(&self) -> u64 {
        self.reserved_chunks.load(Ordering::Acquire)
    }

    pub fn used_size(&self) -> UsedSize {
        UsedSize {
            allocated_size: self.allocated_chunks() * self.chunk_size,
            reserved_size: self.reserved_chunks() * self.chunk_size,
            position_count: self.position_count.load(Ordering::Acquire),
            position_rc: self.position_rc.load(Ordering::Acquire),
        }
    }

    pub fn init(&self, allocated_count: u64, reserved_count: u64) {
        self.allocated_chunks
            .store(allocated_count, Ordering::Release);
        self.reserved_chunks
            .store(reserved_count, Ordering::Release);
    }

    pub fn allocate_group(&self) {
        self.allocated_chunks
            .fetch_add(GroupState::TOTAL_BITS as u64, Ordering::SeqCst);
        self.reserved_chunks
            .fetch_add(GroupState::TOTAL_BITS as u64, Ordering::SeqCst);
    }

    pub fn deallocate_group(&self) {
        self.allocated_chunks
            .fetch_sub(GroupState::TOTAL_BITS as u64, Ordering::SeqCst);
        self.reserved_chunks
            .fetch_sub(GroupState::TOTAL_BITS as u64, Ordering::SeqCst);
    }

    pub fn allocate_chunk(&self) {
        self.reserved_chunks.fetch_sub(1, Ordering::SeqCst);
    }

    pub fn deallocate_chunk(&self) {
        self.reserved_chunks.fetch_add(1, Ordering::SeqCst);
    }
}
200
src/storage/chunk_engine/src/alloc/allocators.rs
Normal file
@@ -0,0 +1,200 @@
use super::super::*;
use std::path::Path;
use std::sync::Arc;

#[derive(Clone)]
pub struct Allocators {
    pub vec: [Arc<Allocator>; CHUNK_SIZE_NUMBER],
    meta_store: Arc<MetaStore>,
}

impl Allocators {
    pub fn new(path: &Path, create: bool, meta_store: Arc<MetaStore>) -> Result<Self> {
        let mut allocators = vec![];
        for i in 0..CHUNK_SIZE_NUMBER {
            let chunk_size = CHUNK_SIZE_SMALL * (1 << i);
            let allocator = Self::create(path, create, &meta_store, chunk_size)?;
            allocators.push(allocator);
        }

        Ok(Self {
            vec: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10].map(|i| allocators[i].clone()),
            meta_store,
        })
    }

    fn create(
        path: &Path,
        create: bool,
        meta_store: &Arc<MetaStore>,
        chunk_size: Size,
    ) -> Result<Arc<Allocator>> {
        let cluster_config = ClustersConfig {
            path: path.join(chunk_size.to_string()),
            chunk_size,
            create,
        };
        let clusters = Clusters::open(&cluster_config)?;
        let allocator = Allocator::load(clusters, meta_store.iterator())?;
        tracing::info!("Allocator {:?} is created...", allocator.clusters);
        Result::Ok(allocator)
    }

    pub fn select_by_pos(&self, pos: Position) -> Result<&Arc<Allocator>> {
        let chunk_size = pos.chunk_size();
        if chunk_size.is_power_of_two()
            && CHUNK_SIZE_SMALL <= chunk_size
            && chunk_size <= CHUNK_SIZE_ULTRA
        {
            Ok(&self.vec[chunk_size.trailing_zeros() as usize - CHUNK_SIZE_SHIFT])
        } else {
            Err(Error::InvalidArg(format!(
                "select allocator invalid pos: {pos:?}"
            )))
        }
    }

    pub fn select_by_size(&self, size: Size) -> Result<&Arc<Allocator>> {
        if size <= CHUNK_SIZE_SMALL {
            Ok(&self.vec[0])
        } else if size <= CHUNK_SIZE_ULTRA {
            Ok(&self.vec[size.next_power_of_two().trailing_zeros() as usize - CHUNK_SIZE_SHIFT])
        } else {
            Err(Error::InvalidArg(format!(
                "select allocator invalid size: {size:?}"
            )))
        }
    }

    pub fn allocate(&self, size: Size, allow_to_allocate: bool) -> Result<Chunk> {
        let allocator = self.select_by_size(size)?;
        allocator.allocate(allow_to_allocate)
    }

    pub fn allocate_groups(
        &self,
        min_remain: usize,
        max_remain: usize,
        batch_size: usize,
        allocate_ultra_groups: bool,
    ) -> usize {
        let mut finish = 0usize;
        for allocator in &self.vec {
            let is_ultra = allocator.clusters.chunk_size > CHUNK_SIZE_LARGE;
            if is_ultra != allocate_ultra_groups {
                continue;
            }
            for _ in 0..batch_size {
                match allocator.do_allocate_task(min_remain, max_remain, &self.meta_store) {
                    Ok(AllocateTask::None) => break,
                    Ok(_) => {
                        finish += 1;
                        continue;
                    }
                    Err(_) => break,
                }
            }
        }
        finish
    }

    pub fn used_size(&self) -> UsedSize {
        self.vec
            .iter()
            .map(|allocator| allocator.counter.used_size())
            .sum()
    }

    pub fn get_allocate_tasks(&self, max_reserved: u64) -> tinyvec::ArrayVec<[GroupId; 3]> {
        self.vec
            .iter()
            .filter_map(|allocator| allocator.get_compact_task(max_reserved))
            .collect()
    }

    pub fn finish_compact_task(&self, group_id: GroupId) {
        self.select_by_pos(Position::new(group_id, 0))
            .unwrap()
            .finish_compact_task(group_id);
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_allocators() {
        let dir = tempfile::tempdir().unwrap();
        let path = dir.path();

        let meta_config = MetaStoreConfig {
            rocksdb: RocksDBConfig {
                path: path.join("meta"),
                create: true,
                ..Default::default()
            },
            ..Default::default()
        };

        let meta_store = Arc::new(MetaStore::open(&meta_config).unwrap());
        let allocators = Allocators::new(path, true, meta_store).unwrap();

        assert_eq!(
            allocators
                .select_by_pos(Position::new(GroupId::new(CHUNK_SIZE_NORMAL, 0, 0), 0))
                .unwrap()
                .clusters
                .chunk_size,
            CHUNK_SIZE_NORMAL
        );

        assert_eq!(
            allocators
                .select_by_size(CHUNK_SIZE_SMALL)
                .unwrap()
                .clusters
                .chunk_size,
            CHUNK_SIZE_SMALL
        );

        assert_eq!(
            allocators
                .select_by_size(CHUNK_SIZE_SMALL + 1)
                .unwrap()
                .clusters
                .chunk_size,
            CHUNK_SIZE_SMALL * 2,
        );

        assert_eq!(
            allocators
                .select_by_size(CHUNK_SIZE_NORMAL)
                .unwrap()
                .clusters
                .chunk_size,
            CHUNK_SIZE_NORMAL
        );

        assert_eq!(
            allocators
                .select_by_size(CHUNK_SIZE_NORMAL + 1)
                .unwrap()
                .clusters
                .chunk_size,
            CHUNK_SIZE_NORMAL * 2,
        );

        let used_size = allocators.used_size();
        assert_eq!(used_size.allocated_size, 0);
        assert_eq!(used_size.reserved_size, 0);

        assert!(allocators
            .select_by_pos(Position::new(GroupId::new(CHUNK_SIZE_ULTRA, 0, 0), 0))
            .is_ok());
        assert!(allocators
            .select_by_pos(Position::new(GroupId::new(Size::gibibyte(1), 0, 0), 0))
            .is_err());
        assert!(allocators.select_by_size(Size::gibibyte(1)).is_err());
    }
}
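
The size-class lookup performed by `select_by_size` in the file above can be illustrated on plain integers. This is a sketch, not the crate's code: the constant values are assumptions (a 64 KiB smallest class and 11 doubling classes, which would put the largest at 64 MiB), and `class_index` is a hypothetical helper that returns an index instead of an allocator.

```rust
// Assumed values; the crate defines its own constants and a `Size` type.
const CHUNK_SIZE_SHIFT: u32 = 16;
const CHUNK_SIZE_SMALL: u64 = 1 << CHUNK_SIZE_SHIFT; // 64 KiB
const CHUNK_SIZE_NUMBER: u32 = 11; // matches the 11-entry array above
const CHUNK_SIZE_ULTRA: u64 = CHUNK_SIZE_SMALL << (CHUNK_SIZE_NUMBER - 1); // 64 MiB

/// Maps a requested size to the index of the smallest class that fits it,
/// or `None` if the request exceeds the largest class.
fn class_index(size: u64) -> Option<usize> {
    if size <= CHUNK_SIZE_SMALL {
        Some(0)
    } else if size <= CHUNK_SIZE_ULTRA {
        // Round up to a power of two, then take log2 relative to the smallest class.
        Some((size.next_power_of_two().trailing_zeros() - CHUNK_SIZE_SHIFT) as usize)
    } else {
        None
    }
}
```

Under these assumptions a request one byte larger than a class boundary lands in the next class up, which is the behaviour `test_allocators` checks with `CHUNK_SIZE_SMALL + 1` and `CHUNK_SIZE_NORMAL + 1`.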
|
||||
312
src/storage/chunk_engine/src/alloc/chunk.rs
Normal file
312
src/storage/chunk_engine/src/alloc/chunk.rs
Normal file
@@ -0,0 +1,312 @@
|
||||
use super::super::*;
|
||||
use lazy_static::lazy_static;
|
||||
use rand::Rng;
|
||||
use std::cell::RefCell;
|
||||
|
||||
use std::sync::atomic::Ordering;
|
||||
use std::sync::Arc;
|
||||
|
||||
pub struct Chunk {
|
||||
meta: ChunkMeta,
|
||||
allocator: Arc<Allocator>,
|
||||
}
|
||||
|
||||
pub type ChunkArc = Arc<Chunk>;
|
||||
|
||||
lazy_static! {
|
||||
static ref ZERO: Vec<u8> = {
|
||||
let mut vec = create_aligned_vec(CHUNK_SIZE_ULTRA);
|
||||
vec.fill(0);
|
||||
vec
|
||||
};
|
||||
}
|
||||
|
||||
impl Chunk {
|
||||
thread_local! {
|
||||
static BUFFER: RefCell<Vec<u8>> = RefCell::new(create_aligned_vec(CHUNK_SIZE_ULTRA));
|
||||
}
|
||||
|
||||
pub fn new(meta: ChunkMeta, allocator: Arc<Allocator>) -> Self {
|
||||
Self { meta, allocator }
|
||||
}
|
||||
|
||||
pub fn meta(&self) -> &ChunkMeta {
|
||||
&self.meta
|
||||
}
|
||||
|
||||
pub fn capacity(&self) -> u32 {
|
||||
self.meta.pos.chunk_size().into()
|
||||
}
|
||||
|
||||
pub fn update_meta(&mut self, req: &UpdateReq) {
|
||||
self.meta.chunk_ver = req.out_commit_ver;
|
||||
self.meta.chain_ver = req.chain_ver;
|
||||
self.meta.last_request_id = req.last_request_id;
|
||||
self.meta.last_client_low = req.last_client_low;
|
||||
self.meta.last_client_high = req.last_client_high;
|
||||
if req.desired_tag.is_empty() {
|
||||
let r: u64 = rand::thread_rng().gen();
|
||||
self.meta.etag = ETag::from(format!("{:X}", r).as_bytes());
|
||||
} else {
|
||||
self.meta.etag = req.desired_tag.into();
|
||||
}
|
||||
self.meta.uncommitted = true;
|
||||
self.meta.timestamp = ChunkMeta::now();
|
||||
}
|
||||
|
||||
pub fn set_chain_ver(&mut self, chain_ver: u32) {
|
||||
self.meta.chain_ver = chain_ver;
|
||||
}
|
||||
|
||||
pub fn set_committed(&mut self) {
|
||||
self.meta.uncommitted = false;
|
||||
}
|
||||
|
||||
pub fn copy_chunk(&self) -> Result<Chunk> {
|
||||
// 1. allocate new chunk.
|
||||
let mut new_chunk = self.allocator.allocate(true)?;
|
||||
|
||||
// 2. copy meta.
|
||||
new_chunk.meta = ChunkMeta {
|
||||
pos: new_chunk.meta.pos,
|
||||
etag: Default::default(),
|
||||
..self.meta
|
||||
};
|
||||
|
||||
// 3. copy data.
|
||||
Self::BUFFER.with(|v| {
|
||||
let mut vec = v.borrow_mut();
|
||||
let len = self.meta.len.next_multiple_of(ALIGN_SIZE.into());
|
||||
let buf = &mut vec[..len as usize]; // aligned.
|
||||
self.pread(buf, 0)?;
|
||||
new_chunk.pwrite(buf, 0)?;
|
||||
Result::Ok(())
|
||||
})?;
|
||||
|
||||
Ok(new_chunk)
|
||||
}
|
||||
    pub fn copy_on_write(
        &self,
        data: &[u8],
        offset: u32,
        checksum: u32,
        is_syncing: bool,
        allow_to_allocate: bool,
        allocators: &Allocators,
        metrics: &Metrics,
    ) -> Result<Chunk> {
        // 1. allocate new chunk.
        let new_len = std::cmp::max(self.meta.len, offset + data.len() as u32);
        let begin = std::time::Instant::now();
        let mut new_chunk = allocators.allocate(Size::from(new_len), allow_to_allocate)?;
        let begin2 = std::time::Instant::now();
        let latency = begin2.duration_since(begin).as_micros() as _;
        metrics.allocate_times.fetch_add(1, Ordering::AcqRel);
        metrics
            .allocate_latency
            .fetch_add(latency, Ordering::AcqRel);
        metrics.copy_on_write_times.fetch_add(1, Ordering::AcqRel);

        // 2. write data.
        let skip_read = is_syncing || (offset == 0 && data.len() >= self.meta.len as usize);
        let checksum = Self::BUFFER.with(|v| {
            let mut vec = v.borrow_mut();
            if !skip_read {
                // aligned read.
                let len = self.meta.len.next_multiple_of(ALIGN_SIZE.into());
                let begin = std::time::Instant::now();
                self.pread(&mut vec[..len as usize], 0)?;
                let latency = std::time::Instant::now().duration_since(begin).as_micros() as _;
                metrics
                    .copy_on_write_read_times
                    .fetch_add(1, Ordering::AcqRel);
                metrics
                    .copy_on_write_read_bytes
                    .fetch_add(len as _, Ordering::AcqRel);
                metrics
                    .copy_on_write_read_latency
                    .fetch_add(latency, Ordering::AcqRel);
            }

            // aligned write.
            if skip_read && is_aligned_io(data, offset) {
                let begin = std::time::Instant::now();
                new_chunk.pwrite(data, offset)?;
                let latency = std::time::Instant::now().duration_since(begin).as_micros() as _;
                metrics.pwrite_times.fetch_add(1, Ordering::AcqRel);
                metrics.pwrite_latency.fetch_add(latency, Ordering::AcqRel);
            } else {
                if self.meta.len < offset {
                    vec[self.meta.len as usize..offset as usize].fill(0);
                }
                vec[offset as usize..][..data.len()].copy_from_slice(data);
                let len = new_len.next_multiple_of(ALIGN_SIZE.into());
                let begin = std::time::Instant::now();
                new_chunk.pwrite(&vec[..len as usize], 0)?;
                let latency = std::time::Instant::now().duration_since(begin).as_micros() as _;
                metrics.pwrite_times.fetch_add(1, Ordering::AcqRel);
                metrics.pwrite_latency.fetch_add(latency, Ordering::AcqRel);
            };

            Result::Ok(if skip_read {
                metrics.checksum_reuse.fetch_add(1, Ordering::AcqRel);
                checksum
            } else {
                metrics.checksum_recalculate.fetch_add(1, Ordering::AcqRel);
                crc32c::crc32c(&vec[..new_len as usize])
            })
        })?;
        let latency = std::time::Instant::now().duration_since(begin2).as_micros() as _;
        metrics
            .copy_on_write_latency
            .fetch_add(latency, Ordering::AcqRel);

        // 3. copy meta.
        new_chunk.meta.len = if is_syncing {
            offset + data.len() as u32
        } else {
            new_len
        };
        new_chunk.meta.checksum = checksum;

        Ok(new_chunk)
    }

|
||||
pub fn safe_write(
|
||||
&mut self,
|
||||
data: &[u8],
|
||||
offset: u32,
|
||||
checksum: u32,
|
||||
truncate: bool,
|
||||
metrics: &Metrics,
|
||||
) -> Result<()> {
|
||||
if truncate && offset < self.meta.len {
|
||||
metrics
|
||||
.safe_write_truncate_shorten
|
||||
.fetch_add(1, Ordering::AcqRel);
|
||||
metrics.checksum_recalculate.fetch_add(1, Ordering::AcqRel);
|
||||
return Self::BUFFER.with(|v| {
|
||||
// aligned read.
|
||||
let mut vec = v.borrow_mut();
|
||||
let len = offset.next_multiple_of(ALIGN_SIZE.into());
|
||||
self.pread(&mut vec[..len as usize], 0)?;
|
||||
self.meta.len = offset;
|
||||
self.meta.checksum = crc32c::crc32c(&vec[..offset as usize]);
|
||||
Result::Ok(())
|
||||
});
|
||||
}
|
||||
|
||||
if is_aligned_len(self.meta.len)
|
||||
&& is_aligned_len(offset)
|
||||
&& (data.is_empty() || is_aligned_buf(data))
|
||||
{
|
||||
// already aligned.
|
||||
if offset > self.meta.len {
|
||||
let padding = (offset - self.meta.len) as usize;
|
||||
let begin = std::time::Instant::now();
|
||||
self.pwrite(&ZERO[..padding], self.meta.len)?;
|
||||
let latency = std::time::Instant::now().duration_since(begin).as_micros() as _;
|
||||
metrics.pwrite_times.fetch_add(1, Ordering::AcqRel);
|
||||
metrics.pwrite_latency.fetch_add(latency, Ordering::AcqRel);
|
||||
self.meta.len = offset;
|
||||
self.meta.checksum = crc32c::crc32c_append(self.meta.checksum, &ZERO[..padding]);
|
||||
metrics
|
||||
.safe_write_truncate_extend
|
||||
.fetch_add(1, Ordering::AcqRel);
|
||||
metrics.checksum_combine.fetch_add(1, Ordering::AcqRel);
|
||||
}
|
||||
|
||||
if !data.is_empty() {
|
||||
assert!(offset == self.meta.len);
|
||||
let begin = std::time::Instant::now();
|
||||
self.pwrite(data, offset)?;
|
||||
let latency = std::time::Instant::now().duration_since(begin).as_micros() as u64;
|
||||
metrics.pwrite_times.fetch_add(1, Ordering::AcqRel);
|
||||
metrics.pwrite_latency.fetch_add(latency, Ordering::AcqRel);
|
||||
self.meta.len = offset + data.len() as u32;
|
||||
self.meta.checksum =
|
||||
crc32c::crc32c_combine(self.meta.checksum, checksum, data.len());
|
||||
metrics
|
||||
.safe_write_direct_append
|
||||
.fetch_add(1, Ordering::AcqRel);
|
||||
metrics.checksum_combine.fetch_add(1, Ordering::AcqRel);
|
||||
}
|
||||
} else if self.meta.len < offset + data.len() as u32 {
|
||||
// copy to buffer and write.
|
||||
assert!(self.meta.len <= offset);
|
||||
Self::BUFFER.with(|v| {
|
||||
let mut vec = v.borrow_mut();
|
||||
let start = self.meta.len & !(ALIGN_SIZE.0 as u32 - 1);
|
||||
if start != self.meta.len {
|
||||
metrics
|
||||
.safe_write_read_tail_times
|
||||
.fetch_add(1, Ordering::AcqRel);
|
||||
metrics
|
||||
.safe_write_read_tail_bytes
|
||||
.fetch_add(ALIGN_SIZE.0, Ordering::AcqRel);
|
||||
self.pread(&mut vec[start as usize..][..ALIGN_SIZE.into()], start)?;
|
||||
}
|
||||
if self.meta.len < offset {
|
||||
metrics
|
||||
.safe_write_truncate_extend
|
||||
.fetch_add(1, Ordering::AcqRel);
|
||||
vec[self.meta.len as usize..offset as usize].fill(0);
|
||||
}
|
||||
vec[offset as usize..][..data.len()].copy_from_slice(data);
|
||||
let new_len = offset as usize + data.len();
|
||||
let begin = std::time::Instant::now();
|
||||
self.pwrite(
|
||||
&vec[start as usize..new_len.next_multiple_of(ALIGN_SIZE.into())],
|
||||
start,
|
||||
)?;
|
||||
let latency = std::time::Instant::now().duration_since(begin).as_micros() as _;
|
||||
metrics.pwrite_times.fetch_add(1, Ordering::AcqRel);
|
||||
metrics.pwrite_latency.fetch_add(latency, Ordering::AcqRel);
|
||||
self.meta.checksum = crc32c::crc32c_append(
|
||||
self.meta.checksum,
|
||||
&vec[self.meta.len as usize..new_len],
|
||||
);
|
||||
metrics
|
||||
.safe_write_indirect_append
|
||||
.fetch_add(1, Ordering::AcqRel);
|
||||
metrics.checksum_combine.fetch_add(1, Ordering::AcqRel);
|
||||
self.meta.len = new_len as u32;
|
||||
Result::Ok(())
|
||||
})?;
|
||||
} else {
|
||||
assert!(data.is_empty());
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
    pub fn pread(&self, buf: &mut [u8], offset: u32) -> Result<()> {
        self.allocator.clusters.pread(self.meta.pos, buf, offset)
    }

    pub(super) fn pwrite(&self, buf: &[u8], offset: u32) -> Result<()> {
        self.allocator.clusters.pwrite(self.meta.pos, buf, offset)
    }

    pub fn fd_and_offset(&self) -> FdAndOffset {
        self.allocator.clusters.fd_and_offset(self.meta.pos)
    }
}

impl Clone for Chunk {
    fn clone(&self) -> Self {
        self.allocator.reference(self.meta.clone(), false)
    }
}

impl Drop for Chunk {
    fn drop(&mut self) {
        self.allocator.dereference(self.meta.pos);
    }
}

impl std::fmt::Debug for Chunk {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        std::fmt::Debug::fmt(&self.meta, f)
    }
}
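The aligned I/O arithmetic used by `copy_on_write` and `safe_write` above can be sketched in isolation. This is only a minimal sketch, assuming a 512-byte alignment: the crate's real `ALIGN_SIZE` is defined elsewhere and may differ. The helper names `align_up`, `align_down`, and `stage_write` are hypothetical and do not exist in the crate.

```rust
/// Assumed alignment for this sketch only; the crate's `ALIGN_SIZE` may differ.
const ALIGN: u32 = 512;

/// Round `len` up to the next multiple of `ALIGN`
/// (the role played by `next_multiple_of` in the code above).
fn align_up(len: u32) -> u32 {
    len.next_multiple_of(ALIGN)
}

/// Round `len` down to a multiple of `ALIGN`
/// (the `len & !(ALIGN - 1)` mask; valid because `ALIGN` is a power of two).
fn align_down(len: u32) -> u32 {
    len & !(ALIGN - 1)
}

/// Stage a write of `data` at `offset` into `buf`, which is assumed to already
/// hold the first `old_len` bytes of the chunk. A gap between `old_len` and
/// `offset` is zero-filled. Returns `(new logical length, aligned I/O length)`.
fn stage_write(buf: &mut [u8], old_len: u32, data: &[u8], offset: u32) -> (u32, u32) {
    if old_len < offset {
        buf[old_len as usize..offset as usize].fill(0);
    }
    buf[offset as usize..][..data.len()].copy_from_slice(data);
    let new_len = old_len.max(offset + data.len() as u32);
    (new_len, align_up(new_len))
}
```

The logical length and the I/O length are kept separate here, much as the code above stores the unaligned length in the chunk metadata but issues reads and writes on aligned buffers.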
304
src/storage/chunk_engine/src/alloc/chunk_allocator.rs
Normal file
@@ -0,0 +1,304 @@
use super::super::*;
use std::collections::hash_map::Entry;

use std::sync::atomic::Ordering;
use std::sync::Arc;

pub struct ChunkAllocator {
    pub full_groups: ShardsSet<GroupId>,
    pub active_groups: ShardsMap<GroupId, GroupState>,
    pub(super) active_levels: [ShardsSet<GroupId>; GroupState::LEVELS],
    pub(super) frozen_groups: ShardsMap<GroupId, GroupState>,
    pub(super) group_allocator: GroupAllocator,
    pub(super) position_rc: ShardsMap<Position, u32>,
    pub(super) counter: Arc<AllocatorCounter>,
}

impl ChunkAllocator {
    pub fn with_chunk_size(chunk_size: Size) -> Self {
        let counter = Arc::new(AllocatorCounter::new(chunk_size));
        Self {
            full_groups: Default::default(),
            active_groups: Default::default(),
            active_levels: Default::default(),
            frozen_groups: Default::default(),
            group_allocator: GroupAllocator::init(counter.clone()),
            position_rc: Default::default(),
            counter,
        }
    }

|
||||
pub fn load(
|
||||
mut it: RocksDBIterator,
|
||||
counter: Arc<AllocatorCounter>,
|
||||
chunk_size: Size,
|
||||
) -> Result<Self> {
|
||||
let mut full_groups = ShardsSet::with_capacity(4096);
|
||||
let mut active_groups = ShardsMap::with_capacity(4096);
|
||||
let frozen_groups = ShardsMap::with_capacity(4096);
|
||||
let mut active_levels = std::array::from_fn(|_| ShardsSet::with_capacity(4096));
|
||||
|
||||
let mut allocated_groups = ShardsSet::with_capacity(4096);
|
||||
let mut unallocated_groups = ShardsSet::with_capacity(4096);
|
||||
let mut current = GroupId::new(chunk_size, 0, 0);
|
||||
|
||||
let mut allocated_count: u64 = 0;
|
||||
let mut reserved_count: u64 = 0;
|
||||
|
||||
let prefix = MetaKey::group_bits_chunk_size_prefix(current);
|
||||
it.iterate(prefix, |key, value| {
|
||||
let group_id = MetaKey::parse_group_bits_key(key)?;
|
||||
let group_state = GroupState::from(value)?;
|
||||
|
||||
assert!(
|
||||
current <= group_id,
|
||||
"current {current:?} > next {group_id:?}"
|
||||
);
|
||||
while current < group_id {
|
||||
unallocated_groups.insert(current);
|
||||
current.next();
|
||||
}
|
||||
current.next();
|
||||
|
||||
allocated_count += GroupState::TOTAL_BITS as u64;
|
||||
if group_state.is_empty() {
|
||||
allocated_groups.insert(group_id);
|
||||
reserved_count += GroupState::TOTAL_BITS as u64;
|
||||
} else if group_state.is_full() {
|
||||
full_groups.insert(group_id);
|
||||
} else {
|
||||
reserved_count += GroupState::TOTAL_BITS as u64 - group_state.count() as u64;
|
||||
active_levels[group_state.level() as usize].insert(group_id);
|
||||
active_groups.insert(group_id, group_state);
|
||||
}
|
||||
|
||||
Ok(())
|
||||
})?;
|
||||
|
||||
counter.init(allocated_count, reserved_count);
|
||||
let chunk_allocator = ChunkAllocator {
|
||||
full_groups,
|
||||
active_groups,
|
||||
active_levels,
|
||||
frozen_groups,
|
||||
counter: counter.clone(),
|
||||
group_allocator: GroupAllocator {
|
||||
allocated_groups,
|
||||
unallocated_groups,
|
||||
next_group_id: current,
|
||||
counter,
|
||||
},
|
||||
position_rc: ShardsMap::with_capacity(1 << 20),
|
||||
};
|
||||
|
||||
Ok(chunk_allocator)
|
||||
}
|
||||
|
||||
pub fn allocate(&mut self, clusters: &Clusters, allow_to_allocate: bool) -> Result<Position> {
|
||||
if !self.active_groups.is_empty() {
|
||||
for level in (0..GroupState::LEVELS).rev() {
|
||||
let set = &mut self.active_levels[level];
|
||||
if let Some(&group_id) = set.iter().next() {
|
||||
let state = self.active_groups.get_mut(&group_id).unwrap();
|
||||
let index = state.allocate().unwrap();
|
||||
if state.is_full() {
|
||||
self.full_groups.insert(group_id);
|
||||
self.active_groups.remove(&group_id);
|
||||
set.remove(&group_id);
|
||||
} else if state.level() != level as u32 {
|
||||
set.remove(&group_id);
|
||||
self.active_levels[level + 1].insert(group_id);
|
||||
}
|
||||
let pos = Position::new(group_id, index);
|
||||
self.reference(pos, true);
|
||||
self.counter.allocate_chunk();
|
||||
return Ok(pos);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
let group_id = self.group_allocator.allocate(clusters, allow_to_allocate)?;
|
||||
self.counter.allocate_chunk();
|
||||
let state = match self.active_groups.entry(group_id) {
|
||||
Entry::Occupied(_) => panic!("should not be active groups: {:?}", group_id),
|
||||
Entry::Vacant(entry) => entry.insert(GroupState::empty()),
|
||||
};
|
||||
let index = state.allocate().unwrap();
|
||||
self.active_levels[state.level() as usize].insert(group_id);
|
||||
let pos = Position::new(group_id, index);
|
||||
self.reference(pos, true);
|
||||
Ok(pos)
|
||||
}
|
||||
    pub fn reference(&mut self, pos: Position, first_ref: bool) {
        let group_id = pos.group_id();
        if let Some(state) = self.active_groups.get_mut(&group_id) {
            assert!(state.check(pos.index()), "ref pos failed: {:?}", pos);
        } else if let Some(state) = self.frozen_groups.get_mut(&group_id) {
            assert!(state.check(pos.index()), "ref pos failed: {:?}", pos);
        } else {
            assert!(self.full_groups.contains(&group_id));
        }

        let rc = match self.position_rc.entry(pos) {
            Entry::Occupied(mut occupied_entry) => {
                let rc = occupied_entry.get_mut();
                *rc += 1;
                *rc
            }
            Entry::Vacant(vacant_entry) => {
                self.counter.position_count.fetch_add(1, Ordering::AcqRel);
                vacant_entry.insert(1);
                1
            }
        };
        self.counter.position_rc.fetch_add(1, Ordering::AcqRel);

        if first_ref {
            assert!(rc == 1, "should be first ref to pos {:?}, rc {}", pos, rc);
        }
    }

    pub fn dereference(&mut self, pos: Position) {
        self.counter.position_rc.fetch_sub(1, Ordering::AcqRel);
        let count = self.position_rc.get_mut(&pos).unwrap();
        *count -= 1;
        if *count == 0 {
            self.counter.position_count.fetch_sub(1, Ordering::AcqRel);
            self.position_rc.remove(&pos);
            self.deallocate(pos);
        }
    }

    pub fn deallocate(&mut self, pos: Position) {
        let group_id = pos.group_id();
        if let Some(state) = self.active_groups.get_mut(&group_id) {
            let level = state.level();
            state.deallocate(pos.index()).unwrap();
            if state.is_empty() {
                self.active_groups.remove(&group_id);
                self.active_levels[level as usize].remove(&group_id);
                self.group_allocator.deallocate(group_id);
            } else if state.level() != level {
                self.active_levels[level as usize].remove(&group_id);
                self.active_levels[level as usize - 1].insert(group_id);
            }
        } else if let Some(state) = self.frozen_groups.get_mut(&group_id) {
            state.deallocate(pos.index()).unwrap();
            if state.is_empty() {
                self.frozen_groups.remove(&group_id);
                self.group_allocator.deallocate(group_id);
            }
        } else if self.full_groups.contains(&group_id) {
            let mut state = GroupState::full();
            state.deallocate(pos.index()).unwrap();
            self.active_levels[state.level() as usize].insert(group_id);
            self.active_groups.insert(group_id, state);
            self.full_groups.remove(&group_id);
        } else {
            unreachable!(
                "deallocate position failed! not found this position: {:?}",
                pos
            );
        }
        self.counter.deallocate_chunk();
    }

    pub fn get_compact_task(&mut self, max_reserved: u64) -> Option<GroupId> {
        let reserved = self.counter.reserved_chunks();
        if reserved <= max_reserved {
            return None;
        }

        for set in &mut self.active_levels {
            if let Some(&group_id) = set.iter().next() {
                set.remove(&group_id);
                let state = self.active_groups.remove(&group_id).unwrap();
                self.frozen_groups.insert(group_id, state);
                return Some(group_id);
            }
        }

        None
    }

    pub fn finish_compact_task(&mut self, group_id: GroupId) {
        if let Some(state) = self.frozen_groups.remove(&group_id) {
            self.active_levels[state.level() as usize].insert(group_id);
            self.active_groups.insert(group_id, state);
            tracing::info!("finish compact task and move back {:?}", group_id);
        } else {
            tracing::info!("finish compact task successful!");
        }
    }
}

|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_chunk_allocator() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
|
||||
let config = ClustersConfig {
|
||||
path: dir.path().into(),
|
||||
chunk_size: CHUNK_SIZE_NORMAL,
|
||||
create: true,
|
||||
};
|
||||
|
||||
let clusters = Clusters::open(&config).unwrap();
|
||||
let mut chunk_allocator = ChunkAllocator::with_chunk_size(CHUNK_SIZE_NORMAL);
|
||||
assert!(chunk_allocator.active_groups.is_empty());
|
||||
assert!(chunk_allocator
|
||||
.active_levels
|
||||
.iter()
|
||||
.all(|set| set.is_empty()));
|
||||
assert!(chunk_allocator.full_groups.is_empty());
|
||||
|
||||
let one_level_count = GroupState::TOTAL_BITS / GroupState::LEVELS;
|
||||
for i in 0..(one_level_count - 1) {
|
||||
let pos = chunk_allocator.allocate(&clusters, true).unwrap();
|
||||
assert_eq!(pos, Position::new(GroupId::default(), i as _));
|
||||
}
|
||||
assert_eq!(chunk_allocator.active_groups.len(), 1);
|
||||
assert_eq!(chunk_allocator.active_levels[0].len(), 1);
|
||||
|
||||
let pos = chunk_allocator.allocate(&clusters, true).unwrap();
|
||||
assert_eq!(
|
||||
pos,
|
||||
Position::new(GroupId::default(), one_level_count as u8 - 1)
|
||||
);
|
||||
assert_eq!(chunk_allocator.active_groups.len(), 1);
|
||||
assert_eq!(chunk_allocator.active_levels[0].len(), 0);
|
||||
assert_eq!(chunk_allocator.active_levels[1].len(), 1);
|
||||
|
||||
let used_size = chunk_allocator.counter.used_size();
|
||||
assert_eq!(
|
||||
used_size.allocated_size,
|
||||
CHUNK_SIZE_NORMAL * GroupState::TOTAL_BITS
|
||||
);
|
||||
assert_eq!(
|
||||
used_size.reserved_size,
|
||||
CHUNK_SIZE_NORMAL * (GroupState::TOTAL_BITS - one_level_count)
|
||||
);
|
||||
|
||||
for i in one_level_count..GroupState::TOTAL_BITS {
|
||||
let pos = chunk_allocator.allocate(&clusters, true).unwrap();
|
||||
assert_eq!(pos, Position::new(GroupId::default(), i as _));
|
||||
}
|
||||
assert!(chunk_allocator.active_groups.is_empty());
|
||||
assert!(chunk_allocator
|
||||
.active_levels
|
||||
.iter()
|
||||
.all(|set| set.is_empty()));
|
||||
assert_eq!(chunk_allocator.full_groups.len(), 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
#[should_panic(expected = "not found this position")]
|
||||
fn test_chunk_invalid_deallocate() {
|
||||
let mut allocator = ChunkAllocator::with_chunk_size(CHUNK_SIZE_NORMAL);
|
||||
allocator.deallocate(Position::default());
|
||||
}
|
||||
}
|
||||
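The level-bucket policy in `ChunkAllocator::allocate` above (scan the levels from highest to lowest so that the fullest active group is filled first) can be sketched with a much smaller model. This is a minimal sketch under stated assumptions: `TOTAL`, `LEVELS`, `level_of`, and `LevelAllocator` are hypothetical stand-ins, and the formula `used * LEVELS / TOTAL` is only a guess at how `GroupState::level()` behaves, chosen to be consistent with the test above. Slot indices, reference counts, and the group pool are left out.

```rust
use std::collections::{BTreeSet, HashMap};

const TOTAL: usize = 8; // slots per group (assumed stand-in for GroupState::TOTAL_BITS)
const LEVELS: usize = 4; // fill levels (assumed stand-in for GroupState::LEVELS)

/// Assumed level formula: groups move up one level every TOTAL / LEVELS slots.
fn level_of(used: usize) -> usize {
    used * LEVELS / TOTAL
}

#[derive(Default)]
struct LevelAllocator {
    used: HashMap<u32, usize>,       // active group id -> used slots
    levels: [BTreeSet<u32>; LEVELS], // active groups bucketed by fill level
    full: BTreeSet<u32>,             // groups with no free slot
    next_group: u32,                 // next fresh group id
}

impl LevelAllocator {
    /// Allocate one slot and return the group it came from,
    /// preferring the fullest active group.
    fn allocate(&mut self) -> u32 {
        let picked = (0..LEVELS)
            .rev()
            .find_map(|l| self.levels[l].iter().next().copied());
        let group = picked.unwrap_or_else(|| {
            // No active group: open a fresh, empty one.
            let g = self.next_group;
            self.next_group += 1;
            self.used.insert(g, 0);
            self.levels[0].insert(g);
            g
        });
        let used = self.used.get_mut(&group).unwrap();
        let old_level = level_of(*used);
        *used += 1;
        let now = *used;
        self.levels[old_level].remove(&group);
        if now == TOTAL {
            // The group is full: it leaves the active set.
            self.used.remove(&group);
            self.full.insert(group);
        } else {
            self.levels[level_of(now)].insert(group);
        }
        group
    }
}
```

Filling the fullest group first tends to leave sparsely used groups untouched, so they can drain and be handed back to the group pool. That is presumably also why `get_compact_task` walks the levels in the opposite direction.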
190
src/storage/chunk_engine/src/alloc/group_allocator.rs
Normal file
@@ -0,0 +1,190 @@
use std::sync::Arc;

use super::super::*;

pub struct GroupAllocator {
    // Groups with space already allocated that currently hold no chunks.
    pub(super) allocated_groups: ShardsSet<GroupId>,
    // Known group ids with no space allocated; reused before a new id is minted.
    pub(super) unallocated_groups: ShardsSet<GroupId>,
    // The next group id that has never been handed out.
    pub(super) next_group_id: GroupId,
    pub(super) counter: Arc<AllocatorCounter>,
}

#[derive(Debug, Clone, Copy)]
pub enum AllocateTask {
    None,
    Allocate(GroupId),
    Deallocate(GroupId),
}

impl GroupAllocator {
    pub fn init(counter: Arc<AllocatorCounter>) -> Self {
        Self {
            allocated_groups: Default::default(),
            unallocated_groups: Default::default(),
            next_group_id: Default::default(),
            counter,
        }
    }

    pub fn allocate(&mut self, clusters: &Clusters, allow_to_allocate: bool) -> Result<GroupId> {
        if let Some(&group_id) = self.allocated_groups.iter().next() {
            // Fast path: reuse a group that is already allocated.
            self.allocated_groups.remove(&group_id);
            Ok(group_id)
        } else if allow_to_allocate {
            // Slow path: allocate space for a new group.
            let group_id = self.get_unallocated_group_id();
            tracing::info!("allocate group slow path {:?}", group_id);
            let result = clusters.allocate(group_id);
            if let Err(err) = result {
                // Give the id back so that a later attempt can reuse it.
                self.unallocated_groups.insert(group_id);
                return Err(err);
            }
            self.counter.allocate_group();
            Ok(group_id)
        } else {
            Err(Error::NoSpace)
        }
    }

    pub fn deallocate(&mut self, group_id: GroupId) {
        self.allocated_groups.insert(group_id);
    }

    fn get_unallocated_group_id(&mut self) -> GroupId {
        if let Some(&group_id) = self.unallocated_groups.iter().next() {
            self.unallocated_groups.remove(&group_id);
            group_id
        } else {
            let group_id = self.next_group_id;
            self.next_group_id = self.next_group_id.plus_one();
            group_id
        }
    }

    // Keep the number of ready groups between `min_remain` and `max_remain`.
    pub fn get_allocate_task(&mut self, min_remain: usize, max_remain: usize) -> AllocateTask {
        if self.allocated_groups.len() < min_remain {
            AllocateTask::Allocate(self.get_unallocated_group_id())
        } else if self.allocated_groups.len() > max_remain {
            let group_id = *self.allocated_groups.iter().next().unwrap();
            self.allocated_groups.remove(&group_id);
            AllocateTask::Deallocate(group_id)
        } else {
            AllocateTask::None
        }
    }

    // Commit a finished task, or roll the group id back to its previous set on failure.
    pub fn finish_allocate_task(&mut self, task: AllocateTask, succ: bool) {
        match (task, succ) {
            (AllocateTask::Allocate(group_id), true) => {
                self.counter.allocate_group();
                self.allocated_groups.insert(group_id)
            }
            (AllocateTask::Deallocate(group_id), true) => {
                self.counter.deallocate_group();
                self.unallocated_groups.insert(group_id)
            }
            (AllocateTask::Allocate(group_id), false) => self.unallocated_groups.insert(group_id),
            (AllocateTask::Deallocate(group_id), false) => self.allocated_groups.insert(group_id),
            _ => false,
        };
    }
}

|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_group_allocator() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
|
||||
let config = ClustersConfig {
|
||||
path: dir.path().into(),
|
||||
chunk_size: CHUNK_SIZE_NORMAL,
|
||||
create: true,
|
||||
};
|
||||
|
||||
let clusters = Clusters::open(&config).unwrap();
|
||||
let counter = Arc::new(AllocatorCounter::new(CHUNK_SIZE_NORMAL));
|
||||
let mut group_allocator = GroupAllocator::init(counter);
|
||||
|
||||
let group_id_1 = group_allocator.allocate(&clusters, true).unwrap();
|
||||
assert_eq!(group_id_1, GroupId::default());
|
||||
assert_eq!(group_allocator.next_group_id, group_id_1.plus_one());
|
||||
assert!(group_allocator.allocated_groups.is_empty());
|
||||
assert!(group_allocator.unallocated_groups.is_empty());
|
||||
|
||||
let group_id_2 = group_allocator.allocate(&clusters, true).unwrap();
|
||||
assert_eq!(group_id_1.plus_one(), group_id_2);
|
||||
assert_eq!(group_allocator.next_group_id, group_id_2.plus_one());
|
||||
assert!(group_allocator.allocated_groups.is_empty());
|
||||
assert!(group_allocator.unallocated_groups.is_empty());
|
||||
|
||||
group_allocator.deallocate(group_id_1);
|
||||
assert_eq!(group_allocator.next_group_id, group_id_2.plus_one());
|
||||
assert_eq!(group_allocator.allocated_groups.len(), 1);
|
||||
assert!(group_allocator.unallocated_groups.is_empty());
|
||||
|
||||
let group_id_3 = group_allocator.allocate(&clusters, true).unwrap();
|
||||
assert_eq!(group_id_1, group_id_3);
|
||||
assert_eq!(group_allocator.next_group_id, group_id_2.plus_one());
|
||||
assert!(group_allocator.allocated_groups.is_empty());
|
||||
assert!(group_allocator.unallocated_groups.is_empty());
|
||||
|
||||
group_allocator.allocate(&clusters, false).unwrap_err();
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_group_allocator_task() {
|
||||
let counter = Arc::new(AllocatorCounter::new(CHUNK_SIZE_NORMAL));
|
||||
let mut group_allocator = GroupAllocator::init(counter);
|
||||
assert!(group_allocator.allocated_groups.is_empty());
|
||||
assert!(group_allocator.unallocated_groups.is_empty());
|
||||
assert_eq!(group_allocator.next_group_id.cluster(), 0);
|
||||
|
||||
let task = group_allocator.get_allocate_task(2, 4);
|
||||
assert!(matches!(task, AllocateTask::Allocate(_)));
|
||||
assert!(group_allocator.allocated_groups.is_empty());
|
||||
assert!(group_allocator.unallocated_groups.is_empty());
|
||||
assert_eq!(group_allocator.next_group_id.cluster(), 1);
|
||||
|
||||
group_allocator.finish_allocate_task(task, false);
|
||||
let task = group_allocator.get_allocate_task(2, 4);
|
||||
assert!(matches!(task, AllocateTask::Allocate(_)));
|
||||
assert!(group_allocator.allocated_groups.is_empty());
|
||||
assert!(group_allocator.unallocated_groups.is_empty());
|
||||
assert_eq!(group_allocator.next_group_id.cluster(), 1);
|
||||
|
||||
group_allocator.finish_allocate_task(task, true);
|
||||
assert_eq!(group_allocator.allocated_groups.len(), 1);
|
||||
assert_eq!(group_allocator.unallocated_groups.len(), 0);
|
||||
assert_eq!(group_allocator.next_group_id.cluster(), 1);
|
||||
|
||||
let task = group_allocator.get_allocate_task(2, 4);
|
||||
assert!(matches!(task, AllocateTask::Allocate(_)));
|
||||
group_allocator.finish_allocate_task(task, true);
|
||||
assert_eq!(group_allocator.allocated_groups.len(), 2);
|
||||
assert_eq!(group_allocator.unallocated_groups.len(), 0);
|
||||
assert_eq!(group_allocator.next_group_id.cluster(), 2);
|
||||
|
||||
let task = group_allocator.get_allocate_task(2, 4);
|
||||
assert!(matches!(task, AllocateTask::None));
|
||||
group_allocator.finish_allocate_task(task, true);
|
||||
assert_eq!(group_allocator.allocated_groups.len(), 2);
|
||||
assert_eq!(group_allocator.unallocated_groups.len(), 0);
|
||||
assert_eq!(group_allocator.next_group_id.cluster(), 2);
|
||||
|
||||
let task = group_allocator.get_allocate_task(3, 4);
|
||||
assert!(matches!(task, AllocateTask::Allocate(_)));
|
||||
group_allocator.finish_allocate_task(task, false);
|
||||
assert_eq!(group_allocator.allocated_groups.len(), 2);
|
||||
assert_eq!(group_allocator.unallocated_groups.len(), 1);
|
||||
assert_eq!(group_allocator.next_group_id.cluster(), 3);
|
||||
|
||||
let task = group_allocator.get_allocate_task(1, 1);
|
||||
assert!(matches!(task, AllocateTask::Deallocate(_)));
|
||||
group_allocator.finish_allocate_task(task, false);
|
||||
assert_eq!(group_allocator.allocated_groups.len(), 2);
|
||||
assert_eq!(group_allocator.unallocated_groups.len(), 1);
|
||||
assert_eq!(group_allocator.next_group_id.cluster(), 3);
|
||||
}
|
||||
}
|
||||
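The watermark logic of `get_allocate_task` and `finish_allocate_task` above can be modelled without any disk I/O. This is a minimal sketch with hypothetical names (`Pool`, `Task`, `ready`, `unused`), plain `u32` ids in place of `GroupId`, and `BTreeSet` in place of `ShardsSet`. One deliberate difference: the sketch always picks the smallest id, whereas the code above takes whichever element its set iterator yields first.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq, Clone, Copy)]
enum Task {
    None,
    Allocate(u32),
    Deallocate(u32),
}

#[derive(Default)]
struct Pool {
    ready: BTreeSet<u32>,  // groups with space allocated, waiting to be used
    unused: BTreeSet<u32>, // ids whose space was released or never allocated
    next_id: u32,          // next id that has never been handed out
}

impl Pool {
    /// Prefer a recycled id; otherwise mint a fresh one.
    fn take_unused_id(&mut self) -> u32 {
        if let Some(id) = self.unused.pop_first() {
            id
        } else {
            let id = self.next_id;
            self.next_id += 1;
            id
        }
    }

    /// Decide what a background worker should do to keep
    /// `min_remain <= ready.len() <= max_remain`.
    fn get_task(&mut self, min_remain: usize, max_remain: usize) -> Task {
        if self.ready.len() < min_remain {
            Task::Allocate(self.take_unused_id())
        } else if self.ready.len() > max_remain {
            // len > max_remain implies the set is non-empty.
            Task::Deallocate(self.ready.pop_first().unwrap())
        } else {
            Task::None
        }
    }

    /// Commit a finished task, or put the id back where it came from on failure.
    fn finish_task(&mut self, task: Task, succ: bool) {
        match (task, succ) {
            (Task::Allocate(id), true) | (Task::Deallocate(id), false) => {
                self.ready.insert(id);
            }
            (Task::Allocate(id), false) | (Task::Deallocate(id), true) => {
                self.unused.insert(id);
            }
            (Task::None, _) => {}
        }
    }
}
```

Because the id leaves both sets while a task is in flight, a second worker asking for a task cannot be handed the same group.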
27
src/storage/chunk_engine/src/alloc/metrics.rs
Normal file
@@ -0,0 +1,27 @@
use std::sync::atomic::AtomicU64;

#[derive(Debug, Default)]
#[repr(C)]
pub struct Metrics {
    pub copy_on_write_times: AtomicU64,
    pub copy_on_write_latency: AtomicU64,
    pub copy_on_write_read_bytes: AtomicU64,
    pub copy_on_write_read_times: AtomicU64,
    pub copy_on_write_read_latency: AtomicU64,

    pub checksum_reuse: AtomicU64,
    pub checksum_combine: AtomicU64,
    pub checksum_recalculate: AtomicU64,

    pub safe_write_direct_append: AtomicU64,
    pub safe_write_indirect_append: AtomicU64,
    pub safe_write_truncate_shorten: AtomicU64,
    pub safe_write_truncate_extend: AtomicU64,
    pub safe_write_read_tail_times: AtomicU64,
    pub safe_write_read_tail_bytes: AtomicU64,

    pub allocate_times: AtomicU64,
    pub allocate_latency: AtomicU64,
    pub pwrite_times: AtomicU64,
    pub pwrite_latency: AtomicU64,
}
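The `*_times` and `*_latency` pairs in `Metrics` are updated by hand around each operation in `chunk.rs`. The pattern can be sketched as a small wrapper. This is only an illustration, not part of the crate: `OpMetrics`, `record`, and `mean_latency_us` are hypothetical names, and the sketch uses `Ordering::Relaxed` for independent counters where the crate itself uses `Ordering::AcqRel`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

/// One call counter and one accumulated-latency counter, both lock-free.
#[derive(Debug, Default)]
struct OpMetrics {
    times: AtomicU64,
    latency_us: AtomicU64,
}

impl OpMetrics {
    /// Run `op`, then add one call and its elapsed microseconds to the counters.
    fn record<T>(&self, op: impl FnOnce() -> T) -> T {
        let begin = Instant::now();
        let out = op();
        let micros = begin.elapsed().as_micros() as u64;
        self.times.fetch_add(1, Ordering::Relaxed);
        self.latency_us.fetch_add(micros, Ordering::Relaxed);
        out
    }

    /// Mean latency in microseconds, or `None` before the first call.
    fn mean_latency_us(&self) -> Option<u64> {
        let n = self.times.load(Ordering::Relaxed);
        (n > 0).then(|| self.latency_us.load(Ordering::Relaxed) / n)
    }
}
```

Storing a running sum and a count keeps the hot path down to two atomic additions. A reader can derive an average from the pair, at the cost of losing the latency distribution.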
17
src/storage/chunk_engine/src/alloc/mod.rs
Normal file
@@ -0,0 +1,17 @@
mod allocator;
mod allocator_counter;
mod allocators;
mod chunk;
mod chunk_allocator;
mod group_allocator;
mod metrics;
mod writing_chunk;

pub use allocator::*;
pub use allocator_counter::*;
pub use allocators::*;
pub use chunk::*;
pub use chunk_allocator::*;
pub use group_allocator::*;
pub use metrics::*;
pub use writing_chunk::*;
124
src/storage/chunk_engine/src/alloc/writing_chunk.rs
Normal file
@@ -0,0 +1,124 @@
use crate::{Bytes, Chunk, ChunkArc, ChunkMeta};
use dashmap::DashMap;
use std::{collections::HashMap, sync::Arc};

pub struct WritingHolder {
    pub chunk: Chunk,
    pub abort: bool,
}

pub type WritingList = DashMap<Bytes, HashMap<Bytes, WritingHolder>>;

pub struct WritingChunk {
    pub chunk_id: Bytes,
    pub chunk: Chunk,
    pub list: Arc<WritingList>,
    pub prefix_len: u32,
    pub is_remove: bool,
    pub commit_succ: bool,
}

impl WritingChunk {
    pub fn meta(&self) -> &ChunkMeta {
        self.chunk.meta()
    }

    pub fn set_committed(&mut self) {
        self.chunk.set_committed();
    }

    pub fn commit_succ(&mut self) {
        self.commit_succ = true;
    }
}

impl Drop for WritingChunk {
    fn drop(&mut self) {
        let prefix = &self.chunk_id[..self.prefix_len as usize];
        if let Some(mut map) = self.list.get_mut(prefix) {
            if self.commit_succ {
                // Committed: the write is finished, so remove its entry.
                if map.remove(&self.chunk_id).is_some() {
                    return;
                }
            } else if let Some(holder) = map.get_mut(&self.chunk_id) {
                // Not committed: keep the entry but mark it as aborted.
                holder.abort = true;
                return;
            }
        }
        // Either the prefix bucket or the chunk entry is missing.
        panic!("chunk id {:?} is not in the writing list!", self.chunk_id);
    }
}

impl From<&WritingChunk> for ChunkArc {
    fn from(chunk: &WritingChunk) -> Self {
        Arc::new(chunk.chunk.clone())
    }
}

impl std::fmt::Debug for WritingChunk {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("WritingChunk")
            .field("chunk_id", &self.chunk_id)
            .field("chunk", &self.chunk)
            .field("is_remove", &self.is_remove)
            .finish()
    }
}

|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use crate::*;
|
||||
use std::sync::Arc;
|
||||
|
||||
fn test_writing_chunk_not_in_list(has_list: bool, commit_succ: bool) {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let path = dir.path();
|
||||
|
||||
let meta_config = MetaStoreConfig {
|
||||
rocksdb: RocksDBConfig {
|
||||
path: path.join("meta"),
|
||||
create: true,
|
||||
..Default::default()
|
||||
},
|
||||
..Default::default()
|
||||
};
|
||||
|
||||
let meta_store = Arc::new(MetaStore::open(&meta_config).unwrap());
|
||||
let allocators = Allocators::new(path, true, meta_store).unwrap();
|
||||
let chunk = allocators.allocate(CHUNK_SIZE_NORMAL, true).unwrap();
|
||||
|
||||
let writing_list: Arc<WritingList> = Default::default();
|
||||
if has_list {
|
||||
writing_list
|
||||
.entry(Bytes::from(b"te".as_slice()))
|
||||
.or_default();
|
||||
}
|
||||
let writing_chunk = WritingChunk {
|
||||
chunk_id: b"test".as_ref().into(),
|
||||
chunk,
|
||||
list: writing_list.clone(),
|
||||
prefix_len: 2,
|
||||
is_remove: false,
|
||||
commit_succ,
|
||||
};
|
||||
println!("{:#?}", writing_chunk);
|
||||
}
|
||||
|
||||
#[test]
|
||||
#[should_panic(expected = "chunk id [116, 101, 115, 116] is not in the writing list!")]
|
||||
fn test_writing_chunk_not_in_list_1() {
|
||||
test_writing_chunk_not_in_list(false, false);
|
||||
}
|
||||
|
||||
#[test]
|
||||
#[should_panic(expected = "chunk id [116, 101, 115, 116] is not in the writing list!")]
|
||||
fn test_writing_chunk_not_in_list_2() {
|
||||
test_writing_chunk_not_in_list(true, false);
|
||||
}
|
||||
|
||||
#[test]
|
||||
#[should_panic(expected = "chunk id [116, 101, 115, 116] is not in the writing list!")]
|
||||
fn test_writing_chunk_not_in_list_3() {
|
||||
test_writing_chunk_not_in_list(true, true);
|
||||
}
|
||||
}
|
||||
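The `Drop` implementation of `WritingChunk` above is a commit-or-abort guard: a committed write removes its entry from the shared list, and anything else leaves the entry behind flagged as aborted. A minimal sketch of that pattern using only the standard library follows. `WriteGuard`, `Holder`, and `Writing` are hypothetical names, a `Mutex<HashMap>` stands in for the `DashMap`, and, unlike the code above, the sketch silently ignores a missing entry instead of panicking.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Debug, Default, PartialEq)]
struct Holder {
    abort: bool,
}

/// Shared table of in-flight writes, keyed by chunk id.
type Writing = Arc<Mutex<HashMap<String, Holder>>>;

struct WriteGuard {
    id: String,
    list: Writing,
    committed: bool,
}

impl WriteGuard {
    /// Register an in-flight write and return the guard that tracks it.
    fn begin(id: &str, list: &Writing) -> Self {
        list.lock().unwrap().insert(id.to_string(), Holder::default());
        Self {
            id: id.to_string(),
            list: Arc::clone(list),
            committed: false,
        }
    }

    /// Mark the write as successfully committed.
    fn commit(&mut self) {
        self.committed = true;
    }
}

impl Drop for WriteGuard {
    fn drop(&mut self) {
        let mut map = self.list.lock().unwrap();
        if self.committed {
            // Committed: the entry is no longer needed.
            map.remove(&self.id);
        } else if let Some(holder) = map.get_mut(&self.id) {
            // Dropped without commit: leave a tombstone for whoever cleans up.
            holder.abort = true;
        }
    }
}
```

Putting the cleanup in `Drop` means early returns and `?` propagation cannot leave an in-flight entry behind unmarked.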
89
src/storage/chunk_engine/src/bin/bench.rs
Normal file
@@ -0,0 +1,89 @@
|
||||
use std::sync::{
|
||||
atomic::{AtomicUsize, Ordering},
|
||||
Arc,
|
||||
};
|
||||
|
||||
use anyhow::{Context, Result};
|
||||
use chunk_engine::*;
|
||||
use serde::Deserialize;
|
||||
|
||||
#[derive(Debug, Default, Deserialize)]
|
||||
struct Config {
|
||||
engine: EngineConfig,
|
||||
threads: usize,
|
||||
count: usize,
|
||||
level: String,
|
||||
}
|
||||
|
||||
fn main() -> Result<()> {
|
||||
let mut iter = std::env::args();
|
||||
iter.next();
    let config_path = iter
        .next()
        .ok_or_else(|| anyhow::anyhow!("get config path failed"))?;
|
||||
|
||||
let content = std::fs::read_to_string(&config_path)
|
||||
.with_context(|| format!("failed to open config file {:?}", config_path))?;
|
||||
|
||||
let config: Config = toml::from_str(&content)
|
||||
.with_context(|| format!("failed to parse config file {:?}", config_path))?;
|
||||
|
||||
let level = match config.level.as_str() {
|
||||
"info" => tracing::Level::INFO,
|
||||
"debug" => tracing::Level::DEBUG,
|
||||
_ => tracing::Level::WARN,
|
||||
};
|
||||
tracing_subscriber::fmt().with_max_level(level).init();
|
||||
tracing::info!("config content: {:#?}", config);
|
||||
|
||||
let engine = chunk_engine::Engine::open(&config.engine).unwrap();
|
||||
engine.start_allocate_workers(2);
|
||||
std::thread::sleep(std::time::Duration::from_millis(100));
|
||||
let bytes = Arc::new(AtomicUsize::default());
|
||||
let running = Arc::new(AtomicUsize::default());
|
||||
|
||||
let threads = (0..config.threads)
|
||||
.map(|i| {
|
||||
let engine = engine.clone();
|
||||
let bytes = bytes.clone();
|
||||
let running = running.clone();
|
||||
|
||||
let mut vec = create_aligned_vec(CHUNK_SIZE_NORMAL);
|
||||
vec.fill(i as u8);
|
||||
let checksum = crc32c::crc32c(&vec);
|
||||
running.fetch_add(1, Ordering::SeqCst);
|
||||
|
||||
Ok(std::thread::spawn(move || {
|
||||
let mut chunk_id: usize = i << 32;
|
||||
for _ in 0..config.count {
|
||||
engine
|
||||
.write(&chunk_id.to_be_bytes(), &vec, 0, checksum)
|
||||
.unwrap();
|
||||
chunk_id += 1;
|
||||
bytes.fetch_add(vec.len(), Ordering::SeqCst);
|
||||
}
|
||||
running.fetch_sub(1, Ordering::SeqCst);
|
||||
}))
|
||||
})
|
||||
.collect::<Result<Vec<_>>>()?;
|
||||
|
||||
while running.load(Ordering::Acquire) > 0 {
|
||||
std::thread::sleep(std::time::Duration::from_secs(1));
|
||||
let bytes = bytes.swap(0, Ordering::Acquire);
|
||||
let used_size = engine.used_size();
|
||||
tracing::info!(
|
||||
"throughput: {:?}/s, allocated: {:?}, reserved: {:?}",
|
||||
Size::from(bytes),
|
||||
used_size.allocated_size,
|
||||
used_size.reserved_size,
|
||||
);
|
||||
}
|
||||
|
||||
for thread in threads {
|
||||
thread.join().unwrap();
|
||||
}
|
||||
|
||||
engine.stop_and_join();
|
||||
engine.speed_up_quit();
|
||||
Ok(())
|
||||
}
|
||||
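The benchmark in `bench.rs` measures throughput by letting worker threads add to a shared counter while the main thread periodically swaps it back to zero. The following is a minimal std-only sketch of that pattern, not the benchmark itself: `run_workers` and its parameters are hypothetical names introduced for illustration, and a plain `fetch_add` stands in for the real `engine.write` call.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawns `threads` workers that each "write" `count` items of `item_len`
/// bytes, and returns the total number of bytes drained by the sampler.
/// Hypothetical helper: the real benchmark calls `engine.write` instead.
pub fn run_workers(threads: usize, count: usize, item_len: usize) -> usize {
    let bytes = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let bytes = Arc::clone(&bytes);
            thread::spawn(move || {
                for _ in 0..count {
                    // Stand-in for one successful chunk write.
                    bytes.fetch_add(item_len, Ordering::SeqCst);
                }
            })
        })
        .collect();

    let mut drained = 0;
    // swap(0) reads and resets the counter in one atomic step, so no
    // increment can be lost between the read and the reset.
    while handles.iter().any(|h| !h.is_finished()) {
        drained += bytes.swap(0, Ordering::SeqCst);
        thread::yield_now();
    }
    for handle in handles {
        handle.join().unwrap();
    }
    // Collect whatever was added after the last sample.
    drained + bytes.swap(0, Ordering::SeqCst)
}
```

Because every sample uses `swap` rather than a separate load and store, the sum of all samples equals the total amount written, whatever the interleaving.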
1655
src/storage/chunk_engine/src/core/engine.rs
Normal file
File diff suppressed because it is too large
3
src/storage/chunk_engine/src/core/mod.rs
Normal file
@@ -0,0 +1,3 @@
mod engine;

pub use engine::*;
598
src/storage/chunk_engine/src/cxx.rs
Normal file
@@ -0,0 +1,598 @@
use std::collections::BTreeSet;
use std::path::PathBuf;
use std::pin::Pin;
use std::sync::atomic::Ordering;
use std::sync::Arc;

use crate::*;
pub use ::cxx::CxxString;

fn create(path: &str, create: bool, prefix_len: usize, error: Pin<&mut CxxString>) -> Box<Engine> {
    let config = EngineConfig {
        path: PathBuf::from(path),
        create,
        prefix_len,
    };
    match Engine::open(&config) {
        Ok(engine) => Box::new(engine),
        Err(e) => {
            // Failure is reported through `error`; the Box returned here wraps
            // a null pointer, so the caller has to check `error` before using it.
            error.push_str(&e.to_string());
            unsafe { Box::from_raw(std::ptr::null_mut()) }
        }
    }
}

#[allow(dead_code)]
struct LogGuard(tracing_appender::non_blocking::WorkerGuard);

fn init_log(path: &str, error: Pin<&mut CxxString>) -> Box<LogGuard> {
    match rolling_file::BasicRollingFileAppender::new(
        path,
        rolling_file::RollingConditionBasic::new().max_size(Size::mebibyte(500).into()),
        20,
    ) {
        Ok(file_appender) => {
            let (non_blocking, guard) = tracing_appender::non_blocking(file_appender);
            tracing_subscriber::fmt()
                .with_max_level(tracing::Level::INFO)
                .with_writer(non_blocking)
                .with_ansi(false)
                .init();
            Box::new(LogGuard(guard))
        }
        Err(e) => {
            // Same convention as `create`: null Box plus a message in `error`.
            error.push_str(&e.to_string());
            unsafe { Box::from_raw(std::ptr::null_mut()) }
        }
    }
}

impl Chunk {
    fn raw_meta(&self) -> &ffi::RawMeta {
        unsafe { std::mem::transmute(self.meta()) }
    }

    fn raw_etag(&self) -> &[u8] {
        &self.meta().etag
    }

    fn uncommitted(&self) -> bool {
        self.meta().uncommitted
    }
}

impl WritingChunk {
    fn raw_meta(&self) -> &ffi::RawMeta {
        self.chunk.raw_meta()
    }

    fn raw_etag(&self) -> &[u8] {
        self.chunk.raw_etag()
    }

    fn uncommitted(&self) -> bool {
        self.chunk.uncommitted()
    }

    fn raw_chunk(&self) -> *const Chunk {
        &self.chunk
    }

    fn set_chain_ver(&mut self, chain_ver: u32) {
        self.chunk.set_chain_ver(chain_ver);
    }
}

impl Engine {
    fn raw_used_size(&self) -> ffi::RawUsedSize {
        unsafe { std::mem::transmute(self.used_size()) }
    }

    fn get_raw_chunk(&self, chunk_id: &[u8], error: Pin<&mut CxxString>) -> *const Chunk {
        match self.get(chunk_id) {
            Ok(None) => {
                error.clear();
                std::ptr::null()
            }
            Ok(Some(c)) => {
                error.clear();
                Arc::into_raw(c)
            }
            Err(e) => {
                error.push_str(&e.to_string());
                std::ptr::null()
            }
        }
    }

    fn get_raw_chunks(&self, reqs: &mut [GetReq], error: Pin<&mut CxxString>) {
        let chunk_ids = reqs
            .iter()
            .map(|r| Bytes::from(r.chunk_id))
            .collect::<BTreeSet<_>>();
        match self.batch_get(&chunk_ids) {
            Ok(chunks) => {
                for req in reqs {
                    match chunks.get(req.chunk_id) {
                        Some(c) => req.chunk_ptr = Arc::into_raw(c.clone()),
                        None => req.chunk_ptr = std::ptr::null_mut(),
                    }
                }
                error.clear();
            }
            Err(e) => {
                error.push_str(&e.to_string());
            }
        }
    }

    unsafe fn release_raw_chunk(&self, chunk: *const Chunk) {
        if !chunk.is_null() {
            Arc::from_raw(chunk);
        }
    }

    unsafe fn release_writing_chunk(&self, chunk: *mut WritingChunk) {
        if !chunk.is_null() {
            let _ = Box::from_raw(chunk);
        }
    }

    fn update_raw_chunk(
        &self,
        chunk_id: &[u8],
        mut req: Pin<&mut ffi::UpdateReq>,
        error: Pin<&mut CxxString>,
    ) -> *mut WritingChunk {
        match self.update_chunk(chunk_id, &mut req) {
            Ok(chunk) => Box::into_raw(Box::new(chunk)),
            Err(e) => {
                error.push_str(&e.to_string());
                req.out_error_code = match e {
                    Error::IoError(_) => 4011, // ChunkWriteFailed
                    Error::RocksDBError(_) => 4003, // ChunkMetadataSetError
                    Error::MetaError(_) => 4002, // ChunkMetadataGetError
                    Error::InvalidArg(_) => 3, // InvalidArg
                    Error::SerializationError(_) => 4002, // ChunkMetadataGetError
                    Error::ChecksumMismatch(_) => 4080, // ChecksumMismatch
                    Error::ChainVersionMismatch(_) => 4081, // ChainVersionMismatch
                    Error::ChunkETagMismatch(_) => 4083, // ChunkETagMismatch
                    Error::ChunkAlreadyExists => 4084, // ChunkAlreadyExists
                    Error::ChunkCommittedUpdate(_) => 4008, // ChunkCommittedUpdate
                    Error::ChunkMissingUpdate(_) => 4007, // ChunkMissingUpdate
                    Error::NoSpace => 7021, // NoSpace
                };
                std::ptr::null_mut()
            }
        }
    }

    unsafe fn commit_raw_chunk(
        &self,
        new_chunk: *mut WritingChunk,
        sync: bool,
        error: Pin<&mut CxxString>,
    ) {
        let new_chunk = Box::from_raw(new_chunk);
        match self.commit_chunk(*new_chunk, sync) {
            Ok(_) => (),
            Err(e) => error.push_str(&e.to_string()),
        }
    }

    unsafe fn commit_raw_chunks(
        &self,
        reqs: &[*mut WritingChunk],
        sync: bool,
        error: Pin<&mut CxxString>,
    ) {
        let chunks = reqs.iter().map(|c| *Box::from_raw(*c)).collect::<Vec<_>>();
        match self.commit_chunks(chunks, sync) {
            Ok(_) => (),
            Err(e) => error.push_str(&e.to_string()),
        }
    }

    fn query_raw_chunks(
        &self,
        begin: &[u8],
        end: &[u8],
        max_count: u64,
        error: Pin<&mut CxxString>,
    ) -> Box<RawChunks> {
        match self.query_chunks(begin, end, max_count) {
            Ok(vec) => Box::new(RawChunks { vec }),
            Err(e) => {
                error.push_str(&e.to_string());
                Default::default()
            }
        }
    }

    fn query_all_raw_chunks(&self, prefix: &[u8], error: Pin<&mut CxxString>) -> Box<RawChunks> {
        match self.query_all_chunks(prefix) {
            Ok(vec) => Box::new(RawChunks { vec }),
            Err(e) => {
                error.push_str(&e.to_string());
                Default::default()
            }
        }
    }

    fn query_raw_chunks_by_timestamp(
        &self,
        prefix: &[u8],
        begin: u64,
        end: u64,
        max_count: u64,
        error: Pin<&mut CxxString>,
    ) -> Box<RawChunks> {
        match self.query_chunks_by_timestamp(prefix, begin, end, max_count) {
            Ok(vec) => Box::new(RawChunks { vec }),
            Err(e) => {
                error.push_str(&e.to_string());
                Default::default()
            }
        }
    }

    fn raw_batch_remove(
        &self,
        begin: &[u8],
        end: &[u8],
        max_count: u64,
        error: Pin<&mut CxxString>,
    ) -> u64 {
        match self.batch_remove(begin, end, max_count) {
            Ok(cnt) => cnt,
            Err(e) => {
                error.push_str(&e.to_string());
                0
            }
        }
    }

    fn query_raw_used_size(&self, prefix: &[u8], error: Pin<&mut CxxString>) -> u64 {
        match self.meta_store.query_used_size(prefix) {
            Ok(size) => size,
            Err(e) => {
                error.push_str(&e.to_string());
                0
            }
        }
    }

    fn get_metrics(&self) -> ffi::Metrics {
        // Every counter is read and reset in a single atomic swap; latency
        // fields are reported as per-call averages (total / max(1, times)).
        let metrics = self.metrics.as_ref();
        let copy_on_write_times = metrics.copy_on_write_times.swap(0, Ordering::AcqRel);
        let copy_on_write_latency = metrics.copy_on_write_latency.swap(0, Ordering::AcqRel);
        let copy_on_write_read_times = metrics.copy_on_write_read_times.swap(0, Ordering::AcqRel);
        let copy_on_write_read_latency =
            metrics.copy_on_write_read_latency.swap(0, Ordering::AcqRel);
        let allocate_total_latency = metrics.allocate_latency.swap(0, Ordering::AcqRel);
        let allocate_total_times = metrics.allocate_times.swap(0, Ordering::AcqRel);
        let pwrite_total_latency = metrics.pwrite_latency.swap(0, Ordering::AcqRel);
        let pwrite_total_times = metrics.pwrite_times.swap(0, Ordering::AcqRel);
        ffi::Metrics {
            copy_on_write_times,
            copy_on_write_latency: copy_on_write_latency / std::cmp::max(1, copy_on_write_times),
            copy_on_write_read_bytes: metrics.copy_on_write_read_bytes.swap(0, Ordering::AcqRel),
            copy_on_write_read_times,
            copy_on_write_read_latency: copy_on_write_read_latency
                / std::cmp::max(1, copy_on_write_read_times),
            checksum_reuse: metrics.checksum_reuse.swap(0, Ordering::AcqRel),
            checksum_combine: metrics.checksum_combine.swap(0, Ordering::AcqRel),
            checksum_recalculate: metrics.checksum_recalculate.swap(0, Ordering::AcqRel),
            safe_write_direct_append: metrics.safe_write_direct_append.swap(0, Ordering::AcqRel),
            safe_write_indirect_append: metrics
                .safe_write_indirect_append
                .swap(0, Ordering::AcqRel),
            safe_write_truncate_shorten: metrics
                .safe_write_truncate_shorten
                .swap(0, Ordering::AcqRel),
            safe_write_truncate_extend: metrics
                .safe_write_truncate_extend
                .swap(0, Ordering::AcqRel),
            safe_write_read_tail_times: metrics
                .safe_write_read_tail_times
                .swap(0, Ordering::AcqRel),
            safe_write_read_tail_bytes: metrics
                .safe_write_read_tail_bytes
                .swap(0, Ordering::AcqRel),
            allocate_latency: allocate_total_latency / std::cmp::max(1, allocate_total_times),
            allocate_times: allocate_total_times,
            pwrite_latency: pwrite_total_latency / std::cmp::max(1, pwrite_total_times),
            pwrite_times: pwrite_total_times,
        }
    }

    fn query_uncommitted_raw_chunks(
        &self,
        prefix: &[u8],
        error: Pin<&mut CxxString>,
    ) -> Box<RawChunks> {
        match self.query_uncommitted_chunks(prefix) {
            Ok(chunks) => Box::new(RawChunks { vec: chunks }),
            Err(e) => {
                error.push_str(&e.to_string());
                Default::default()
            }
        }
    }

    fn handle_uncommitted_raw_chunks(
        &self,
        prefix: &[u8],
        chain_ver: u32,
        error: Pin<&mut CxxString>,
    ) -> Box<RawChunks> {
        match self.handle_uncommitted_chunks(prefix, chain_ver) {
            Ok(chunks) => Box::new(RawChunks { vec: chunks }),
            Err(e) => {
                error.push_str(&e.to_string());
                Box::default()
            }
        }
    }
}

#[derive(Default)]
struct RawChunks {
    vec: Vec<(Bytes, ChunkMeta)>,
}

impl RawChunks {
    fn len(&self) -> usize {
        self.vec.len()
    }

    fn chunk_id(&self, pos: usize) -> &[u8] {
        self.vec[pos].0.as_ref()
    }

    fn chunk_meta(&self, pos: usize) -> &ffi::RawMeta {
        unsafe { std::mem::transmute(&self.vec[pos].1) }
    }

    fn chunk_etag(&self, pos: usize) -> &[u8] {
        &self.vec[pos].1.etag
    }

    fn chunk_uncommitted(&self, pos: usize) -> bool {
        self.vec[pos].1.uncommitted
    }
}

#[::cxx::bridge(namespace = "hf3fs::chunk_engine")]
pub mod ffi {
    #[derive(Default, Clone, Copy, PartialEq, Eq, Debug)]
    struct UpdateReq {
        without_checksum: bool,
        is_truncate: bool,
        is_remove: bool,
        is_syncing: bool,
        update_ver: u32,
        chain_ver: u32,
        checksum: u32,
        length: u32,
        offset: u32,
        data: u64,
        last_request_id: u64,
        last_client_low: u64,
        last_client_high: u64,
        expected_tag: &'static [u8],
        desired_tag: &'static [u8],
        create_new: bool,

        out_non_existent: bool,
        out_error_code: u16,
        out_commit_ver: u32,
        out_chain_ver: u32,
        out_checksum: u32,
    }

    #[derive(Clone, Copy, PartialEq, Eq, Debug)]
    struct GetReq<'a> {
        chunk_id: &'a [u8],
        chunk_ptr: *const Chunk,
    }

    #[derive(Default, Clone, Copy, PartialEq, Eq, Debug)]
    struct RawMeta {
        pos: u64,
        chain_ver: u32,
        chunk_ver: u32,
        len: u32,
        checksum: u32,
        timestamp: u64,
        last_request_id: u64,
        last_client_low: u64,
        last_client_high: u64,
    }

    #[derive(Default, Clone, Copy, PartialEq, Eq, Debug)]
    struct RawUsedSize {
        allocated_size: u64,
        reserved_size: u64,
        position_count: u64,
        position_rc: u64,
    }

    #[derive(Default, Clone, Copy, PartialEq, Eq, Debug)]
    struct FdAndOffset {
        fd: i32,
        offset: u64,
    }

    #[derive(Default, Clone, Copy, PartialEq, Eq, Debug)]
    pub struct Metrics {
        pub copy_on_write_times: u64,
        pub copy_on_write_latency: u64,
        pub copy_on_write_read_bytes: u64,
        pub copy_on_write_read_times: u64,
        pub copy_on_write_read_latency: u64,

        pub checksum_reuse: u64,
        pub checksum_combine: u64,
        pub checksum_recalculate: u64,

        pub safe_write_direct_append: u64,
        pub safe_write_indirect_append: u64,
        pub safe_write_truncate_shorten: u64,
        pub safe_write_truncate_extend: u64,
        pub safe_write_read_tail_times: u64,
        pub safe_write_read_tail_bytes: u64,

        pub allocate_times: u64,
        pub allocate_latency: u64,
        pub pwrite_times: u64,
        pub pwrite_latency: u64,
    }

    extern "Rust" {
        type Engine;
        fn create(
            path: &str,
            create: bool,
            prefix_len: usize,
            error: Pin<&mut CxxString>,
        ) -> Box<Engine>;

        fn raw_used_size(&self) -> RawUsedSize;
        fn allocate_groups(&self, min_remain: usize, max_remain: usize, batch_size: usize)
            -> usize;
        fn allocate_ultra_groups(
            &self,
            min_remain: usize,
            max_remain: usize,
            batch_size: usize,
        ) -> usize;
        fn compact_groups(&self, max_reserved: u64) -> usize;

        fn set_allow_to_allocate(&self, val: bool);
        fn speed_up_quit(&self);

        fn get_raw_chunk(&self, chunk_id: &[u8], error: Pin<&mut CxxString>) -> *const Chunk;
        fn get_raw_chunks(&self, reqs: &mut [GetReq], error: Pin<&mut CxxString>);
        unsafe fn release_raw_chunk(&self, chunk: *const Chunk);
        unsafe fn release_writing_chunk(&self, chunk: *mut WritingChunk);

        fn update_raw_chunk(
            &self,
            chunk_id: &[u8],
            req: Pin<&mut UpdateReq>,
            error: Pin<&mut CxxString>,
        ) -> *mut WritingChunk;

        unsafe fn commit_raw_chunk(
            &self,
            new_chunk: *mut WritingChunk,
            sync: bool,
            error: Pin<&mut CxxString>,
        );

        unsafe fn commit_raw_chunks(
            &self,
            reqs: &[*mut WritingChunk],
            sync: bool,
            error: Pin<&mut CxxString>,
        );

        fn query_raw_chunks(
            &self,
            begin: &[u8],
            end: &[u8],
            max_count: u64,
            error: Pin<&mut CxxString>,
        ) -> Box<RawChunks>;

        fn query_all_raw_chunks(&self, prefix: &[u8], error: Pin<&mut CxxString>)
            -> Box<RawChunks>;

        fn query_raw_chunks_by_timestamp(
            &self,
            prefix: &[u8],
            begin: u64,
            end: u64,
            max_count: u64,
            error: Pin<&mut CxxString>,
        ) -> Box<RawChunks>;

        fn raw_batch_remove(
            &self,
            begin: &[u8],
            end: &[u8],
            max_count: u64,
            error: Pin<&mut CxxString>,
        ) -> u64;

        fn query_raw_used_size(&self, prefix: &[u8], error: Pin<&mut CxxString>) -> u64;

        fn get_metrics(&self) -> Metrics;

        fn query_uncommitted_raw_chunks(
            &self,
            prefix: &[u8],
            error: Pin<&mut CxxString>,
        ) -> Box<RawChunks>;

        fn handle_uncommitted_raw_chunks(
            &self,
            prefix: &[u8],
            chain_ver: u32,
            error: Pin<&mut CxxString>,
        ) -> Box<RawChunks>;
    }

    extern "Rust" {
        type LogGuard;
        fn init_log(path: &str, error: Pin<&mut CxxString>) -> Box<LogGuard>;
    }

    extern "Rust" {
        type Chunk;
        fn raw_meta(&self) -> &RawMeta;
        fn raw_etag(&self) -> &[u8];
        fn uncommitted(&self) -> bool;
        fn fd_and_offset(&self) -> FdAndOffset;
    }

    extern "Rust" {
        type WritingChunk;
        fn raw_meta(&self) -> &RawMeta;
        fn raw_etag(&self) -> &[u8];
        fn uncommitted(&self) -> bool;
        fn raw_chunk(&self) -> *const Chunk;
        fn set_chain_ver(&mut self, chain_ver: u32);
    }

    extern "Rust" {
        type RawChunks;
        fn len(&self) -> usize;
        fn chunk_id(&self, pos: usize) -> &[u8];
        fn chunk_meta(&self, pos: usize) -> &RawMeta;
        fn chunk_etag(&self, pos: usize) -> &[u8];
        fn chunk_uncommitted(&self, pos: usize) -> bool;
    }
}

static_assertions::const_assert_eq!(
    std::mem::align_of::<ChunkMeta>(),
    std::mem::align_of::<ffi::RawMeta>()
);
static_assertions::const_assert_eq!(
    std::mem::size_of::<UsedSize>(),
    std::mem::size_of::<ffi::RawUsedSize>()
);
static_assertions::const_assert_eq!(
    std::mem::align_of::<UsedSize>(),
    std::mem::align_of::<ffi::RawUsedSize>()
);
static_assertions::const_assert_eq!(
    std::mem::size_of::<Metrics>(),
    std::mem::size_of::<ffi::Metrics>()
);
static_assertions::const_assert_eq!(
    std::mem::align_of::<Metrics>(),
    std::mem::align_of::<ffi::Metrics>()
);
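`get_raw_chunk` and `release_raw_chunk` in `cxx.rs` pass ownership of one `Arc` reference across the FFI boundary as a raw pointer. The sketch below shows that hand-off in isolation using only the standard library. The `lend` and `release` functions and the `String` payload are hypothetical stand-ins for illustration; they are not part of the chunk engine API.

```rust
use std::sync::Arc;

/// Hands one strong reference to the caller as a raw pointer, in the same
/// way `get_raw_chunk` does for a chunk it found. (Hypothetical helper.)
pub fn lend(value: &Arc<String>) -> *const String {
    // The clone bumps the strong count; into_raw leaks that reference
    // on purpose so the pointer stays valid until it is released.
    Arc::into_raw(Arc::clone(value))
}

/// Takes the reference back and drops it, mirroring `release_raw_chunk`.
///
/// # Safety
/// `ptr` must be null or a pointer returned by `lend`, released only once.
pub unsafe fn release(ptr: *const String) {
    if !ptr.is_null() {
        // from_raw rebuilds the Arc; dropping it decrements the count.
        unsafe { drop(Arc::from_raw(ptr)) };
    }
}
```

Each pointer produced by `Arc::into_raw` has to be matched by exactly one `Arc::from_raw`; skipping the release leaks the value, and releasing twice is undefined behaviour.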
176
src/storage/chunk_engine/src/file/cluster.rs
Normal file
@@ -0,0 +1,176 @@
use std::fs::File;
use std::os::fd::AsRawFd;
use std::os::unix::fs::FileExt;
use std::path::Path;
use std::{fs::OpenOptions, os::unix::fs::OpenOptionsExt};

use super::super::*;

const PUNCH_HOLE_FLAGS: i32 = libc::FALLOC_FL_PUNCH_HOLE | libc::FALLOC_FL_KEEP_SIZE;

pub struct Cluster {
    pub normal_fd: File,
    pub direct_fd: File,
}

impl Cluster {
    pub fn open(path: &Path, create: bool, support_direct_io: bool) -> Result<Self> {
        let normal_fd = OpenOptions::new()
            .read(true)
            .write(true)
            .create(create)
            .custom_flags(libc::O_SYNC)
            .open(path)
            .map_err(|err| Error::IoError(format!("open {:?} failed: {:?}", path, err)))?;

        let direct_fd = OpenOptions::new()
            .read(true)
            .write(true)
            .custom_flags(if support_direct_io {
                libc::O_DIRECT
            } else {
                libc::O_SYNC
            })
            .open(path)
            .map_err(|err| Error::IoError(format!("open {:?} failed: {:?}", path, err)))?;

        Ok(Self {
            normal_fd,
            direct_fd,
        })
    }

    pub fn fallocate(&self, group_id: GroupId, punch_hole: bool) -> Result<()> {
        let res = unsafe {
            libc::fallocate(
                self.direct_fd.as_raw_fd(),
                if punch_hole { PUNCH_HOLE_FLAGS } else { 0 },
                group_id.offset().into(),
                group_id.size().into(),
            )
        };
        if res == -1 {
            Err(Error::IoError(format!(
                "fallocate {} error: {:?}",
                self.direct_fd.as_raw_fd(),
                std::io::Error::last_os_error()
            )))
        } else {
            Ok(())
        }
    }

    pub fn pread(&self, pos: Position, mut buf: &mut [u8], offset: u32) -> Result<()> {
        let aligned = is_aligned_io(buf, offset);
        let mut offset = pos.offset() + offset;
        while !buf.is_empty() {
            // Use the direct fd only while the remaining range is still aligned.
            let fd = if aligned && is_aligned_len(buf.len() as u32) {
                &self.direct_fd
            } else {
                &self.normal_fd
            };

            match fd.read_at(buf, offset.into()) {
                Ok(0) => return Err(Error::IoError(format!("read {:?} return 0", fd))),
                Ok(n) => {
                    buf = &mut buf[n..];
                    offset += n;
                }
                Err(e) => Self::handle_error(e)?,
            }
        }
        Ok(())
    }

    pub fn pwrite(&self, pos: Position, mut buf: &[u8], offset: u32) -> Result<()> {
        let aligned = is_aligned_io(buf, offset);
        let mut offset = pos.offset() + offset;
        while !buf.is_empty() {
            // Use the direct fd only while the remaining range is still aligned.
            let fd = if aligned && is_aligned_len(buf.len() as u32) {
                &self.direct_fd
            } else {
                &self.normal_fd
            };

            match fd.write_at(buf, offset.into()) {
                Ok(0) => return Err(Error::IoError(format!("write {:?} return 0", fd))),
                Ok(n) => {
                    buf = &buf[n..];
                    offset += n as u64;
                }
                Err(e) => Self::handle_error(e)?,
            }
        }
        Ok(())
    }

    fn handle_error(e: std::io::Error) -> Result<()> {
        // EINTR is not a failure: returning Ok lets the caller's loop retry
        // the same range.
        if e.kind() == std::io::ErrorKind::Interrupted {
            Ok(())
        } else {
            Err(Error::IoError(format!("io error: {:?}", e)))
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::os::fd::FromRawFd;

    #[test]
    fn test_cluster_open() {
        let dir = tempfile::tempdir().unwrap();
        let support_direct_io = FsType::check(&dir).support_direct_io();

        for chunk_size in [CHUNK_SIZE_NORMAL, CHUNK_SIZE_SMALL, CHUNK_SIZE_LARGE] {
            let file_path = dir.path().join(format!("test.cluster.{}", chunk_size));
            assert!(Cluster::open(&file_path, false, support_direct_io).is_err());

            let cluster = Cluster::open(&file_path, true, support_direct_io).unwrap();
            let meta = cluster.normal_fd.metadata().unwrap();
            assert_eq!(meta.len(), 0);

            let cluster = Cluster::open(&file_path, false, support_direct_io).unwrap();
            let group_id = GroupId::new(chunk_size, 0, 0);

            let mut buf = [0u8; 5];
            let pos = Position::new(group_id, 0);
            assert!(cluster.pread(pos, &mut buf, 0).is_err());

            cluster.fallocate(group_id, false).unwrap();
            let meta = cluster.normal_fd.metadata().unwrap();
            assert_eq!(meta.len(), group_id.size());

            let bytes = "hello world!".as_bytes();
            assert!(cluster.pwrite(pos, bytes, 0).is_ok());

            assert!(cluster.pread(pos, &mut buf, 0).is_ok());
            assert_eq!(&buf, &bytes[0..buf.len()]);

            cluster.fallocate(group_id, true).unwrap();
            let meta = cluster.normal_fd.metadata().unwrap();
            assert_eq!(meta.len(), group_id.size());
        }

        assert!(Cluster::open(Path::new("/dev/null"), false, support_direct_io).is_err());

        let cluster = Cluster {
            normal_fd: File::open("/dev/null").unwrap(),
            direct_fd: File::open("/dev/null").unwrap(),
        };
        assert!(cluster.fallocate(GroupId::default(), false).is_err());
        assert!(cluster.fallocate(GroupId::default(), true).is_err());
        assert!(cluster.pwrite(Position::from(0), &[1], 0).is_err());

        let cluster = Cluster {
            normal_fd: unsafe { File::from_raw_fd(23333) },
            direct_fd: unsafe { File::from_raw_fd(23333) },
        };
        let mut buf = [0u8; 32];
        assert!(cluster.pread(Position::from(0), &mut buf, 0).is_err());
        std::mem::forget(cluster);

        assert!(Cluster::handle_error(std::io::Error::from_raw_os_error(libc::EINTR)).is_ok());
    }
}
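`Cluster::pread` loops until the buffer is full, advancing past short reads and retrying when the call is interrupted. The sketch below reproduces only that loop on a plain `std::fs::File`, leaving out the direct-I/O fd selection and the engine's own `Error` type. `read_full_at` is a hypothetical name, and mapping a zero-length read to `UnexpectedEof` is an assumption made for this example.

```rust
use std::fs::File;
use std::io::{Error, ErrorKind, Result};
use std::os::unix::fs::FileExt;

/// Fills `buf` from `file` starting at `offset`, continuing after short
/// reads and retrying on EINTR. (Hypothetical helper for illustration.)
pub fn read_full_at(file: &File, mut buf: &mut [u8], mut offset: u64) -> Result<()> {
    while !buf.is_empty() {
        match file.read_at(buf, offset) {
            // Nothing more to read but the buffer is not full yet.
            Ok(0) => return Err(Error::new(ErrorKind::UnexpectedEof, "read returned 0")),
            Ok(n) => {
                // Advance past the bytes that were delivered.
                let rest = buf;
                buf = &mut rest[n..];
                offset += n as u64;
            }
            // Interrupted before any data arrived: retry the same range.
            Err(e) if e.kind() == ErrorKind::Interrupted => {}
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```

The standard library offers `FileExt::read_exact_at` with the same semantics; the explicit loop is shown here because it is the shape the engine uses to pick a file descriptor on every iteration.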
118
src/storage/chunk_engine/src/file/clusters.rs
Normal file
@@ -0,0 +1,118 @@
use super::super::*;
use std::{fmt::Debug, os::fd::AsRawFd, path::PathBuf};

#[derive(Debug, Default)]
pub struct ClustersConfig {
    pub path: PathBuf,
    pub chunk_size: Size,
    pub create: bool,
}

pub struct Clusters {
    pub path: PathBuf,
    pub chunk_size: Size,
    files: Vec<Cluster>,
}

impl Clusters {
    const COUNT: u32 = 256;

    pub fn open(config: &ClustersConfig) -> Result<Self> {
        let mut files: Vec<Cluster> = vec![];

        if config.create {
            std::fs::create_dir_all(&config.path)
                .map_err(|e| Error::IoError(format!("create dir {:?} fail: {e:?}", config.path)))?;
        }

        let support_direct_io = FsType::check(&config.path).support_direct_io();
        for cluster_id in 0..Self::COUNT {
            let file_path = config.path.join(format!("{:02X}", cluster_id));
            files.push(Cluster::open(&file_path, config.create, support_direct_io)?);
        }

        Ok(Clusters {
            path: config.path.clone(),
            chunk_size: config.chunk_size,
            files,
        })
    }

    pub fn allocate(&self, group_id: GroupId) -> Result<()> {
        self.files[group_id.cluster() as usize].fallocate(group_id, false)
    }

    pub fn deallocate(&self, group_id: GroupId) -> Result<()> {
        self.files[group_id.cluster() as usize].fallocate(group_id, true)
    }

    pub fn pread(&self, pos: Position, buf: &mut [u8], offset: u32) -> Result<()> {
        self.files[pos.cluster() as usize].pread(pos, buf, offset)
    }

    pub fn pwrite(&self, pos: Position, buf: &[u8], offset: u32) -> Result<()> {
        self.files[pos.cluster() as usize].pwrite(pos, buf, offset)
    }

    pub fn fd_and_offset(&self, pos: Position) -> FdAndOffset {
        FdAndOffset {
            fd: self.files[pos.cluster() as usize].direct_fd.as_raw_fd(),
            offset: pos.offset().into(),
        }
    }
}

impl Debug for Clusters {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("Clusters")
            .field("path", &self.path)
            .finish()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_clusters() {
        let dir = tempfile::tempdir().unwrap();

        let config = ClustersConfig {
            path: dir.path().into(),
            chunk_size: CHUNK_SIZE_NORMAL,
            create: true,
        };

        let clusters = Clusters::open(&config).unwrap();

        let group_id = GroupId::new(CHUNK_SIZE_NORMAL, 0, 0);
        let cluster = &clusters.files[0];
        let meta = cluster.normal_fd.metadata().unwrap();
        assert_eq!(meta.len(), 0);

        clusters.allocate(group_id).unwrap();
        let meta = cluster.normal_fd.metadata().unwrap();
        assert_eq!(meta.len(), group_id.size());

        let group_id_3 = GroupId::new(CHUNK_SIZE_NORMAL, 0, 3);
        clusters.allocate(group_id_3).unwrap();
        let meta = cluster.normal_fd.metadata().unwrap();
        assert_eq!(meta.len(), group_id.size() * 4);

        clusters.deallocate(group_id).unwrap();
        let meta = cluster.normal_fd.metadata().unwrap();
        assert_eq!(meta.len(), group_id.size() * 4);

        clusters.deallocate(group_id_3).unwrap();
        let meta = cluster.normal_fd.metadata().unwrap();
        assert_eq!(meta.len(), group_id.size() * 4);

        let config = ClustersConfig {
            path: std::path::Path::new("/proc/test").into(),
            chunk_size: CHUNK_SIZE_NORMAL,
            create: true,
        };
        assert!(Clusters::open(&config).is_err());
    }
}
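`Clusters::open` creates a fixed set of 256 files per directory and names each one after its cluster id as two uppercase hex digits. A small sketch of just the naming scheme follows; `cluster_paths` and `CLUSTER_COUNT` are hypothetical names for this example, and no files are opened.

```rust
use std::path::{Path, PathBuf};

/// Number of cluster files per directory, mirroring `Clusters::COUNT`.
pub const CLUSTER_COUNT: u32 = 256;

/// Builds the path of every cluster file under `dir`: "00", "01", ..., "FF".
/// (Hypothetical helper; the engine opens each path as it goes.)
pub fn cluster_paths(dir: &Path) -> Vec<PathBuf> {
    (0..CLUSTER_COUNT)
        // {:02X} pads to two digits and uses uppercase hex.
        .map(|id| dir.join(format!("{:02X}", id)))
        .collect()
}
```

Since the vector index equals the cluster id, a lookup such as `files[pos.cluster() as usize]` needs no further mapping.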
33
src/storage/chunk_engine/src/file/fs_type.rs
Normal file
@@ -0,0 +1,33 @@
use std::{ffi::CString, os::unix::ffi::OsStrExt, path::Path};

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum FsType {
    EXT4,
    NFS,
    XFS,
    ZFS,
    OTHER,
}

impl FsType {
    pub fn check(path: impl AsRef<Path>) -> Self {
        let path_cstr = CString::new(path.as_ref().as_os_str().as_bytes()).unwrap();
        let mut stat: libc::statfs = unsafe { std::mem::zeroed() };
        let result = unsafe { libc::statfs(path_cstr.as_ptr(), &mut stat) };
        if result != 0 {
            Self::OTHER
        } else {
            match stat.f_type {
                libc::EXT4_SUPER_MAGIC => Self::EXT4,
                libc::NFS_SUPER_MAGIC => Self::NFS,
                libc::XFS_SUPER_MAGIC => Self::XFS,
                0x2FC12FC1 => Self::ZFS, // https://github.com/openzfs/zfs/blob/33174af15112ed5c53299da2d28e763b0163f428/include/sys/fs/zfs.h#L1339
                _ => Self::OTHER,
            }
        }
    }

    pub fn support_direct_io(&self) -> bool {
        !matches!(self, FsType::ZFS)
    }
}
7
src/storage/chunk_engine/src/file/mod.rs
Normal file
@@ -0,0 +1,7 @@
mod cluster;
mod clusters;
mod fs_type;

pub use cluster::*;
pub use clusters::*;
pub use fs_type::*;
18
src/storage/chunk_engine/src/lib.rs
Normal file
@@ -0,0 +1,18 @@
mod alloc;
mod core;
mod cxx;
mod file;
mod meta;
mod types;
mod utils;

pub use alloc::*;
pub use core::*;
pub use cxx::{
    ffi::{FdAndOffset, GetReq, UpdateReq},
    CxxString,
};
pub use file::*;
pub use meta::*;
pub use types::*;
pub use utils::*;
217
src/storage/chunk_engine/src/meta/meta_key.rs
Normal file
@@ -0,0 +1,217 @@
|
||||
use super::super::{Bytes, Error, GroupId, Position, Result};
|
||||
use byteorder::{BigEndian, ByteOrder};
|
||||
|
||||
pub struct MetaKey(Bytes);
|
||||
|
||||
impl MetaKey {
|
||||
pub const CHUNK_META_KEY_PREFIX: u8 = 1;
|
||||
pub const GROUP_BITS_KEY_PREFIX: u8 = 2;
|
||||
pub const POS_TO_CHUNK_KEY_PREFIX: u8 = 3;
|
||||
pub const USED_SIZE_KEY_PREFIX: u8 = 4;
|
||||
pub const USED_SIZE_PREFIX_LEN_KEY: u8 = 5;
|
||||
pub const TIMESTAMP_KEY_PREFIX: u8 = 6;
|
||||
// pub const WRITING_CHUNK_KEY_PREFIX: u8 = 7;
|
||||
pub const VERSION_KEY: u8 = 8;
|
||||
pub const WRITING_CHUNK_KEY_PREFIX: u8 = 9;
|
||||
pub const TEST_KEY_PREFIX: u8 = b'm';
|
||||
|
||||
fn prefix(mark: u8) -> Self {
|
||||
let mut vec = Bytes::new();
|
||||
vec.push(mark);
|
||||
Self(vec)
|
||||
}
|
||||
|
||||
pub fn chunk_meta_key_prefix() -> Self {
|
||||
Self::prefix(Self::CHUNK_META_KEY_PREFIX)
|
||||
}
|
||||
|
||||
pub fn chunk_meta_key(chunk_id: &[u8]) -> Self {
|
||||
let mut out = Self::chunk_meta_key_prefix();
|
||||
for num in chunk_id {
|
||||
out.0.push(!num)
|
||||
}
|
||||
out
|
||||
}
|
||||
|
||||
pub fn parse_chunk_meta_key(key: &[u8]) -> Bytes {
|
||||
let mut out = Bytes::new();
|
||||
for num in &key[1..] {
|
||||
out.push(!num);
|
||||
}
|
||||
out
|
||||
}
|
||||
|
||||
pub fn group_bits_key_prefix() -> Self {
|
||||
Self::prefix(Self::GROUP_BITS_KEY_PREFIX)
|
||||
}
|
||||
|
||||
pub fn group_bits_chunk_size_prefix(group_id: GroupId) -> Self {
|
||||
let mut out = Self::group_bits_key_prefix();
|
||||
out.0.extend_from_slice(&group_id.to_be_bytes()[..4]);
|
||||
out
|
||||
}
|
||||
|
||||
pub fn group_bits_key(group_id: GroupId) -> Self {
|
||||
let mut out = Self::group_bits_key_prefix();
|
||||
out.0.extend_from_slice(&group_id.to_be_bytes());
|
||||
out
|
||||
}
|
||||
|
||||
pub fn parse_group_bits_key(key: &[u8]) -> Result<GroupId> {
|
||||
if key.len() == std::mem::size_of::<u8>() + std::mem::size_of::<u64>() {
|
||||
let group_id = BigEndian::read_u64(&key[1..]);
|
||||
Ok(GroupId::from(group_id))
|
||||
} else {
|
||||
Err(Error::MetaError(format!(
|
||||
"parse group bits key fail: {:?}",
|
||||
key
|
||||
)))
|
||||
}
|
||||
}
|
||||
|
||||
pub fn pos_to_chunk_key_prefix() -> Self {
|
||||
Self::prefix(Self::POS_TO_CHUNK_KEY_PREFIX)
|
||||
}
|
||||
|
||||
pub fn group_to_chunks_key_prefix(group_id: GroupId) -> Self {
|
||||
let mut out = Self::pos_to_chunk_key_prefix();
|
||||
out.0
|
||||
.extend_from_slice(&Position::new(group_id, 0).to_be_bytes());
|
||||
out.0.pop();
|
||||
out
|
||||
}
|
||||
|
||||
pub fn pos_to_chunk_key(pos: Position) -> Self {
|
||||
let mut out = Self::pos_to_chunk_key_prefix();
|
||||
out.0.extend_from_slice(&pos.to_be_bytes());
|
||||
out
|
||||
}
|
||||
|
||||
pub fn parse_pos_to_chunk_key(key: &[u8]) -> Result<Position> {
|
||||
if key.len() == std::mem::size_of::<u8>() + std::mem::size_of::<u64>() {
|
||||
Ok(Position::from(BigEndian::read_u64(&key[1..])))
|
||||
} else {
|
||||
Err(Error::MetaError(format!(
|
||||
"parse pos to chunk key fail: {:?}",
|
||||
key
|
||||
)))
|
||||
}
|
||||
}
|
||||
|
||||
pub fn used_size_key_prefix() -> Self {
|
||||
Self::prefix(Self::USED_SIZE_KEY_PREFIX)
|
||||
}
|
||||
|
||||
pub fn used_size_key(prefix: &[u8]) -> Self {
|
||||
let mut out = Self::used_size_key_prefix();
|
||||
out.0.extend_from_slice(prefix);
|
||||
out
|
||||
}
|
||||
|
||||
pub fn used_size_prefix_len_key() -> Self {
|
||||
Self::prefix(Self::USED_SIZE_PREFIX_LEN_KEY)
|
||||
}
|
||||
|
||||
pub fn timestamp_key_prefix() -> Self {
|
||||
Self::prefix(Self::TIMESTAMP_KEY_PREFIX)
|
||||
}
|
||||
|
||||
pub fn timestamp_key_filter(prefix: &[u8], timestamp: u64) -> Self {
|
||||
let mut out = Self::timestamp_key_prefix();
|
||||
out.0.extend_from_slice(prefix);
|
||||
out.0.extend_from_slice(&timestamp.to_be_bytes());
|
||||
out
|
||||
}
|
||||
|
||||
pub fn timestamp_key(timestamp: u64, chunk_id: &[u8], prefix_len: usize) -> Self {
|
||||
let mut out = Self::timestamp_key_filter(&chunk_id[..prefix_len], timestamp);
|
||||
out.0.extend_from_slice(&chunk_id[prefix_len..]);
|
||||
out
|
||||
}
|
||||
|
||||
pub fn parse_timestamp_key(key: &[u8], prefix_len: usize) -> Result<(u64, Bytes)> {
|
||||
const L: usize = std::mem::size_of::<u8>() + std::mem::size_of::<u64>();
|
||||
if key.len() > L + prefix_len {
|
||||
let mut chunk_id = Bytes::from(&key[1..1 + prefix_len]);
|
||||
let timestamp = BigEndian::read_u64(&key[1 + prefix_len..]);
|
||||
chunk_id.extend_from_slice(&key[L + prefix_len..]);
|
||||
Ok((timestamp, chunk_id))
|
||||
} else {
|
||||
Err(Error::MetaError(format!(
|
||||
"parse timestamp key fail: {:?}",
|
||||
key
|
||||
)))
|
||||
}
|
||||
}
|
||||
|
||||
pub fn version_key() -> Self {
|
||||
Self::prefix(Self::VERSION_KEY)
|
||||
}
|
||||
|
||||
pub fn writing_chunk_key_prefix() -> Self {
|
||||
Self::prefix(Self::WRITING_CHUNK_KEY_PREFIX)
|
||||
}
|
||||
|
||||
pub fn writing_chunk_key(chunk_id: &[u8]) -> Self {
|
||||
let mut out = Self::writing_chunk_key_prefix();
|
||||
out.0.extend_from_slice(chunk_id);
|
||||
out
|
||||
}
|
||||
|
||||
pub fn parse_writing_chunk_key(key: &[u8]) -> Result<Bytes> {
|
||||
if key.len() > 1 {
|
||||
Ok(Bytes::from(&key[1..]))
|
||||
} else {
|
||||
Err(Error::MetaError(format!(
|
||||
"parse writing chunk key fail: {:?}",
|
||||
key
|
||||
)))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl AsRef<[u8]> for MetaKey {
|
||||
fn as_ref(&self) -> &[u8] {
|
||||
&self.0
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
#[test]
|
||||
fn test_meta_key_create() {
|
||||
use super::super::super::*;
|
||||
|
||||
let prefix = MetaKey::chunk_meta_key_prefix();
|
||||
assert_eq!(prefix.as_ref(), [MetaKey::CHUNK_META_KEY_PREFIX]);
|
||||
|
||||
let meta_key = MetaKey::chunk_meta_key(&[1, 2, 3, 4]);
|
||||
assert_eq!(
|
||||
meta_key.as_ref(),
|
||||
[MetaKey::CHUNK_META_KEY_PREFIX, !1, !2, !3, !4]
|
||||
);
|
||||
|
||||
let group_id = GroupId::new(CHUNK_SIZE_NORMAL, 1, 2);
|
||||
let pos = Position::new(group_id, 3);
|
||||
let pos_to_chunk_key = MetaKey::pos_to_chunk_key(pos);
|
||||
assert_eq!(pos_to_chunk_key.as_ref().len(), 1 + 8);
|
||||
let parsed_pos = MetaKey::parse_pos_to_chunk_key(pos_to_chunk_key.as_ref()).unwrap();
|
||||
assert_eq!(pos, parsed_pos);
|
||||
|
||||
let group_to_chunks_key_prefix = MetaKey::group_to_chunks_key_prefix(group_id);
|
||||
assert_eq!(group_to_chunks_key_prefix.as_ref().len(), 8);
|
||||
|
||||
assert!(MetaKey::parse_group_bits_key(&[]).is_err());
|
||||
assert!(MetaKey::parse_pos_to_chunk_key(&[]).is_err());
|
||||
|
||||
let timestamp_key = MetaKey::timestamp_key(1024, &[1, 2, 3, 4], 2);
|
||||
let (timestamp, chunk) = MetaKey::parse_timestamp_key(&timestamp_key.0, 2).unwrap();
|
||||
assert_eq!(timestamp, 1024);
|
||||
assert_eq!(chunk, [1, 2, 3, 4].as_slice());
|
||||
|
||||
MetaKey::parse_timestamp_key(&[MetaKey::TIMESTAMP_KEY_PREFIX, 0, 1, 2, 3, 4, 5, 6, 7], 0)
|
||||
.unwrap_err();
|
||||
|
||||
MetaKey::parse_writing_chunk_key(MetaKey::writing_chunk_key_prefix().as_ref()).unwrap_err();
|
||||
}
|
||||
}
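`chunk_meta_key` stores every chunk-id byte bitwise-inverted, so a forward scan over the sorted key space visits chunk ids from largest to smallest. The sketch below illustrates that effect under stated assumptions: a `BTreeMap` stands in for the RocksDB key space, and the free functions are simplified, hypothetical counterparts of the `MetaKey` and `MetaStore` methods, not the crate's API.

```rust
use std::collections::BTreeMap;

const CHUNK_META_KEY_PREFIX: u8 = 1;

/// Build a chunk-meta key: tag byte followed by the inverted chunk id.
fn chunk_meta_key(chunk_id: &[u8]) -> Vec<u8> {
    let mut key = vec![CHUNK_META_KEY_PREFIX];
    key.extend(chunk_id.iter().map(|b| !b));
    key
}

/// Recover the chunk id by inverting every byte after the tag.
fn parse_chunk_meta_key(key: &[u8]) -> Vec<u8> {
    key[1..].iter().map(|b| !b).collect()
}

/// Return up to `max_count` chunk ids in `[begin, end)`, largest first.
/// `store` is a hypothetical in-memory stand-in for the sorted key space.
fn query_chunks(
    store: &BTreeMap<Vec<u8>, ()>,
    begin: &[u8],
    end: &[u8],
    max_count: usize,
) -> Vec<Vec<u8>> {
    let end_key = chunk_meta_key(end);
    store
        .range(end_key.clone()..) // "seek" to the inverted end key
        .map(|(key, _)| key)
        .filter(|key| **key != end_key) // `end` itself is excluded
        .take_while(|key| key[0] == CHUNK_META_KEY_PREFIX)
        .map(|key| parse_chunk_meta_key(key))
        .take_while(|id| begin <= id.as_slice()) // stop once below `begin`
        .take(max_count)
        .collect()
}
```

With ids `0..128` stored as big-endian `u32`, querying `[10, 20)` yields ten ids starting at 19 and ending at 10, which matches the expectation in `test_meta_get_set`.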
|
||||
152
src/storage/chunk_engine/src/meta/meta_merge.rs
Normal file
@@ -0,0 +1,152 @@
|
||||
use byteorder::{ByteOrder, LittleEndian};
|
||||
use derse::{DownwardBytes, Serialize};
|
||||
|
||||
use super::super::{GroupState, MergeState, MetaKey};
|
||||
|
||||
pub struct MetaMergeOp;
|
||||
|
||||
impl super::MergeOp for MetaMergeOp {
|
||||
fn full_merge<'a>(
|
||||
key: &[u8],
|
||||
value: Option<&[u8]>,
|
||||
operands: impl Iterator<Item = &'a [u8]>,
|
||||
) -> Option<Vec<u8>> {
|
||||
match key[0] {
|
||||
MetaKey::GROUP_BITS_KEY_PREFIX => {
|
||||
let mut merge_bits = MergeState::empty();
|
||||
for op in operands {
|
||||
merge_bits.merge(&MergeState::from(op).ok()?);
|
||||
}
|
||||
|
||||
let mut bits = if let Some(group_bits) = value {
|
||||
GroupState::from(group_bits).ok()?
|
||||
} else {
|
||||
GroupState::empty()
|
||||
};
|
||||
bits.update(&merge_bits);
|
||||
Some(Vec::from(bits.as_bytes()))
|
||||
}
|
||||
MetaKey::USED_SIZE_KEY_PREFIX => {
|
||||
let mut total = 0i64;
|
||||
for op in operands {
|
||||
if op.len() != std::mem::size_of_val(&total) {
|
||||
return None;
|
||||
}
|
||||
total += LittleEndian::read_i64(op);
|
||||
}
|
||||
if let Some(value) = value {
|
||||
if value.len() != std::mem::size_of_val(&total) {
|
||||
return None;
|
||||
}
|
||||
total += LittleEndian::read_i64(value);
|
||||
}
|
||||
let mut vec = Vec::with_capacity(std::mem::size_of_val(&total));
|
||||
vec.extend_from_slice(&total.to_le_bytes());
|
||||
Some(vec)
|
||||
}
|
||||
MetaKey::TEST_KEY_PREFIX => {
|
||||
let mut out = Vec::<u8>::new();
|
||||
if let Some(value) = value {
|
||||
out.extend_from_slice(value);
|
||||
}
|
||||
for op in operands {
|
||||
out.extend_from_slice(op);
|
||||
}
|
||||
Some(out)
|
||||
}
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
|
||||
fn partial_merge<'a>(key: &[u8], operands: impl Iterator<Item = &'a [u8]>) -> Option<Vec<u8>> {
|
||||
match key[0] {
|
||||
MetaKey::GROUP_BITS_KEY_PREFIX => {
|
||||
let mut merge_bits = MergeState::empty();
|
||||
for op in operands {
|
||||
merge_bits.merge(&MergeState::from(op).ok()?);
|
||||
}
|
||||
|
||||
if let Ok(bytes) = merge_bits.serialize::<DownwardBytes>() {
|
||||
Some(Vec::from(bytes.as_slice()))
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
MetaKey::USED_SIZE_KEY_PREFIX => {
|
||||
let mut total = 0i64;
|
||||
for op in operands {
|
||||
if op.len() != std::mem::size_of_val(&total) {
|
||||
return None;
|
||||
}
|
||||
total += LittleEndian::read_i64(op);
|
||||
}
|
||||
let mut vec = Vec::with_capacity(std::mem::size_of_val(&total));
|
||||
vec.extend_from_slice(&total.to_le_bytes());
|
||||
Some(vec)
|
||||
}
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::*;
|
||||
|
||||
#[test]
|
||||
fn test_meta_merge_op() {
|
||||
let slice = [233u8].as_slice();
|
||||
assert_eq!(
|
||||
MetaMergeOp::partial_merge(&[233], vec![slice].into_iter()),
|
||||
None
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_used_size_merge() {
|
||||
let mut ops = Vec::new();
|
||||
for i in 0..10 {
|
||||
let mut vec = Vec::with_capacity(std::mem::size_of::<i64>());
|
||||
vec.extend_from_slice(&(i as i64).to_le_bytes());
|
||||
ops.push(vec);
|
||||
}
|
||||
|
||||
let merged = MetaMergeOp::partial_merge(
|
||||
&[MetaKey::USED_SIZE_KEY_PREFIX],
|
||||
ops.iter().map(|v| v.as_slice()),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(LittleEndian::read_i64(&merged), (0..10).sum::<i64>());
|
||||
|
||||
// test full merge.
|
||||
let mut ops = Vec::new();
|
||||
for i in 0..10 {
|
||||
let mut vec = Vec::with_capacity(std::mem::size_of::<i64>());
|
||||
vec.extend_from_slice(&(i as i64).to_le_bytes());
|
||||
ops.push(vec);
|
||||
}
|
||||
|
||||
let value = 10i64;
|
||||
let merged = MetaMergeOp::full_merge(
|
||||
&[MetaKey::USED_SIZE_KEY_PREFIX],
|
||||
Some(value.to_le_bytes().as_slice()),
|
||||
ops.iter().map(|v| v.as_slice()),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(
|
||||
LittleEndian::read_i64(&merged),
|
||||
(0..10).sum::<i64>() + value
|
||||
);
|
||||
|
||||
// test invalid ops.
|
||||
let invalid_ops = [vec![1, 2, 3]];
|
||||
assert_eq!(
|
||||
MetaMergeOp::partial_merge(
|
||||
&[MetaKey::USED_SIZE_KEY_PREFIX],
|
||||
invalid_ops.iter().map(|v| v.as_slice()),
|
||||
),
|
||||
None
|
||||
);
|
||||
}
|
||||
}
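The `USED_SIZE_KEY_PREFIX` branch treats every operand as an 8-byte little-endian `i64` delta: a partial merge folds several deltas into one, and a full merge adds the deltas to the stored value. A minimal standard-library sketch of that arithmetic follows. The function names are hypothetical, slices replace the iterator arguments, and nothing here calls into RocksDB.

```rust
use std::convert::TryInto;

/// Decode one 8-byte little-endian value; `None` if the length is wrong.
fn decode(bytes: &[u8]) -> Option<i64> {
    Some(i64::from_le_bytes(bytes.try_into().ok()?))
}

/// Partial merge: fold several deltas into a single delta.
fn partial_merge(operands: &[&[u8]]) -> Option<Vec<u8>> {
    let mut total = 0i64;
    for op in operands {
        total += decode(op)?;
    }
    Some(total.to_le_bytes().to_vec())
}

/// Full merge: apply the deltas to the stored value, which counts as 0
/// when the key does not exist yet.
fn full_merge(value: Option<&[u8]>, operands: &[&[u8]]) -> Option<Vec<u8>> {
    let base = match value {
        Some(v) => decode(v)?,
        None => 0,
    };
    let delta = decode(&partial_merge(operands)?)?;
    Some((base + delta).to_le_bytes().to_vec())
}
```

Because addition is associative, merging the operands in batches and then applying the partial results gives the same total as applying them all at once, which is the property a merge operator needs to stay correct across compactions.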
|
||||
873
src/storage/chunk_engine/src/meta/meta_store.rs
Normal file
@@ -0,0 +1,873 @@
|
||||
use std::{cell::RefCell, collections::HashMap, ops::DerefMut};
|
||||
|
||||
use super::super::*;
|
||||
use byteorder::{ByteOrder, LittleEndian};
|
||||
use derse::{Deserialize, DownwardBytes, Serialize};
|
||||
|
||||
#[derive(Debug, Default, Clone)]
|
||||
pub struct MetaStoreConfig {
|
||||
pub rocksdb: RocksDBConfig,
|
||||
pub prefix_len: usize,
|
||||
}
|
||||
|
||||
pub struct MetaStore {
|
||||
rocksdb: RocksDB,
|
||||
config: MetaStoreConfig,
|
||||
}
|
||||
|
||||
impl MetaStore {
|
||||
thread_local! {
|
||||
static BYTES: RefCell<DownwardBytes> = RefCell::new(DownwardBytes::with_capacity(Size::MB.into()));
|
||||
}
|
||||
|
||||
pub fn open(config: &MetaStoreConfig) -> Result<Self> {
|
||||
let rocksdb = RocksDB::open::<MetaMergeOp>(&config.rocksdb)?;
|
||||
|
||||
let mut this = MetaStore {
|
||||
rocksdb,
|
||||
config: config.clone(),
|
||||
};
|
||||
|
||||
this.update_used_size_if_need()?;
|
||||
|
||||
Ok(this)
|
||||
}
|
||||
|
||||
pub fn get_chunk_meta(&self, chunk_id: &[u8]) -> Result<Option<ChunkMeta>> {
|
||||
let chunk_meta_key = MetaKey::chunk_meta_key(chunk_id);
|
||||
let value = self.rocksdb.get(chunk_meta_key)?;
|
||||
|
||||
if let Some(value) = value {
|
||||
Ok(Some(
|
||||
ChunkMeta::deserialize(value.as_ref()).map_err(Error::SerializationError)?,
|
||||
))
|
||||
} else {
|
||||
Ok(None)
|
||||
}
|
||||
}
|
||||
|
||||
pub fn query_chunks(
|
||||
&self,
|
||||
begin: impl AsRef<[u8]>,
|
||||
end: impl AsRef<[u8]>,
|
||||
max_count: u64,
|
||||
) -> Result<Vec<(Bytes, ChunkMeta)>> {
|
||||
let it = self.iterator();
|
||||
self.query_chunks_from_iterator(it, begin, end, max_count)
|
||||
}
|
||||
|
||||
pub fn query_chunks_from_iterator(
|
||||
&self,
|
||||
mut it: RocksDBIterator,
|
||||
begin: impl AsRef<[u8]>,
|
||||
end: impl AsRef<[u8]>,
|
||||
max_count: u64,
|
||||
) -> Result<Vec<(Bytes, ChunkMeta)>> {
|
||||
let mut out = Vec::<(Bytes, ChunkMeta)>::with_capacity(4096);
|
||||
|
||||
let end_key = MetaKey::chunk_meta_key(end.as_ref());
|
||||
it.seek(&end_key)?;
|
||||
|
||||
if it.key() == Some(end_key.as_ref()) {
|
||||
it.next(); // skip `end` itself: the range is [begin, end)
|
||||
}
|
||||
|
||||
for _ in 0..max_count {
|
||||
if !it.valid() {
|
||||
break;
|
||||
}
|
||||
|
||||
if it.key().unwrap()[0] != MetaKey::CHUNK_META_KEY_PREFIX {
|
||||
break;
|
||||
}
|
||||
|
||||
let chunk_id = MetaKey::parse_chunk_meta_key(it.key().unwrap());
|
||||
if begin.as_ref() <= chunk_id.as_ref() {
|
||||
let chunk_meta = ChunkMeta::deserialize(it.value().unwrap())
|
||||
.map_err(Error::SerializationError)?;
|
||||
out.push((chunk_id, chunk_meta))
|
||||
} else {
|
||||
break;
|
||||
}
|
||||
|
||||
it.next();
|
||||
}
|
||||
|
||||
Ok(out)
|
||||
}
|
||||
|
||||
pub fn query_chunks_by_timestamp(
|
||||
&self,
|
||||
prefix: &[u8],
|
||||
begin: u64,
|
||||
end: u64,
|
||||
max_count: u64,
|
||||
) -> Result<Vec<Bytes>> {
|
||||
let mut it = self.iterator();
|
||||
let mut out = Vec::<Bytes>::with_capacity(4096);
|
||||
|
||||
let begin_key = MetaKey::timestamp_key_filter(prefix, begin);
|
||||
it.seek(&begin_key)?;
|
||||
|
||||
for _ in 0..max_count {
|
||||
if !it.valid() {
|
||||
break;
|
||||
}
|
||||
|
||||
let key = it.key().unwrap();
|
||||
if key[0] != MetaKey::TIMESTAMP_KEY_PREFIX {
|
||||
break;
|
||||
}
|
||||
if key.get(1..1 + self.config.prefix_len) != Some(prefix) {
|
||||
break;
|
||||
}
|
||||
|
||||
let (timestamp, chunk_id) = MetaKey::parse_timestamp_key(key, self.config.prefix_len)?;
|
||||
if timestamp < end {
|
||||
out.push(chunk_id)
|
||||
} else {
|
||||
break;
|
||||
}
|
||||
|
||||
it.next();
|
||||
}
|
||||
|
||||
Ok(out)
|
||||
}
|
||||
|
||||
#[inline(always)]
|
||||
pub fn write(&self, write_batch: rocksdb::WriteBatch, sync: bool) -> Result<()> {
|
||||
self.rocksdb.write(write_batch, sync)
|
||||
}
|
||||
|
||||
pub fn add_chunk(&self, chunk_id: &[u8], chunk_meta: &ChunkMeta, sync: bool) -> Result<()> {
|
||||
let mut write_batch = RocksDB::new_write_batch();
|
||||
self.add_chunk_mut(chunk_id, chunk_meta, &mut write_batch)?;
|
||||
self.write(write_batch, sync)
|
||||
}
|
||||
|
||||
pub fn add_chunk_mut(
|
||||
&self,
|
||||
chunk_id: &[u8],
|
||||
chunk_meta: &ChunkMeta,
|
||||
write_batch: &mut rocksdb::WriteBatch,
|
||||
) -> Result<()> {
|
||||
// 1. add chunk meta.
|
||||
let chunk_meta_key = MetaKey::chunk_meta_key(chunk_id);
|
||||
Self::with_tls_bytes(|bytes| {
|
||||
chunk_meta
|
||||
.serialize_to(bytes)
|
||||
.map_err(Error::SerializationError)?;
|
||||
write_batch.put(chunk_meta_key, &bytes[..]);
|
||||
Ok(())
|
||||
})?;
|
||||
|
||||
// 2. add pos->chunk map.
|
||||
let pos_to_chunk_key = MetaKey::pos_to_chunk_key(chunk_meta.pos);
|
||||
write_batch.put(pos_to_chunk_key, chunk_id);
|
||||
|
||||
// 3. update group bits.
|
||||
let group_bits_key = MetaKey::group_bits_key(chunk_meta.pos.group_id());
|
||||
Self::with_tls_bytes(|bytes| {
|
||||
MergeState::acquire(chunk_meta.pos.index())
|
||||
.serialize_to(bytes)
|
||||
.map_err(Error::SerializationError)?;
|
||||
write_batch.merge(group_bits_key, &bytes[..]);
|
||||
Ok(())
|
||||
})?;
|
||||
|
||||
// 4. update used size.
|
||||
self.update_used_size(chunk_id, chunk_meta.pos.chunk_size().0 as i64, write_batch)?;
|
||||
|
||||
// 5. add timestamp->chunk map.
|
||||
let timestamp_key =
|
||||
MetaKey::timestamp_key(chunk_meta.timestamp, chunk_id, self.config.prefix_len);
|
||||
write_batch.put(timestamp_key, chunk_id);
|
||||
|
||||
// 6. remove writing chunk log.
|
||||
self.remove_writing_chunk_mut(chunk_id, write_batch);
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
pub fn move_chunk(
|
||||
&self,
|
||||
chunk_id: &[u8],
|
||||
old_meta: &ChunkMeta,
|
||||
new_meta: &ChunkMeta,
|
||||
sync: bool,
|
||||
) -> Result<()> {
|
||||
let mut write_batch = RocksDB::new_write_batch();
|
||||
self.move_chunk_mut(chunk_id, old_meta, new_meta, &mut write_batch)?;
|
||||
self.write(write_batch, sync)
|
||||
}
|
||||
|
||||
pub fn move_chunk_mut(
|
||||
&self,
|
||||
chunk_id: &[u8],
|
||||
old_meta: &ChunkMeta,
|
||||
new_meta: &ChunkMeta,
|
||||
write_batch: &mut rocksdb::WriteBatch,
|
||||
) -> Result<()> {
|
||||
// 1. change chunk meta.
|
||||
let chunk_meta_key = MetaKey::chunk_meta_key(chunk_id);
|
||||
Self::with_tls_bytes(|bytes| {
|
||||
new_meta
|
||||
.serialize_to(bytes)
|
||||
.map_err(Error::SerializationError)?;
|
||||
write_batch.put(chunk_meta_key, bytes.as_slice());
|
||||
Ok(())
|
||||
})?;
|
||||
|
||||
if old_meta.pos != new_meta.pos {
|
||||
// 2. remove old pos->chunk map.
|
||||
let old_pos = old_meta.pos;
|
||||
let pos_to_chunk_key = MetaKey::pos_to_chunk_key(old_pos);
|
||||
write_batch.delete(pos_to_chunk_key);
|
||||
|
||||
let group_bits_key = MetaKey::group_bits_key(old_pos.group_id());
|
||||
Self::with_tls_bytes(|bytes| {
|
||||
MergeState::release(old_pos.index())
|
||||
.serialize_to(bytes)
|
||||
.map_err(Error::SerializationError)?;
|
||||
write_batch.merge(group_bits_key, &bytes[..]);
|
||||
Ok(())
|
||||
})?;
|
||||
|
||||
// 3. add new pos->chunk map.
|
||||
let pos_to_chunk_key = MetaKey::pos_to_chunk_key(new_meta.pos);
|
||||
write_batch.put(pos_to_chunk_key, chunk_id);
|
||||
|
||||
let group_bits_key = MetaKey::group_bits_key(new_meta.pos.group_id());
|
||||
Self::with_tls_bytes(|bytes| {
|
||||
MergeState::acquire(new_meta.pos.index())
|
||||
.serialize_to(bytes)
|
||||
.map_err(Error::SerializationError)?;
|
||||
write_batch.merge(group_bits_key, &bytes[..]);
|
||||
Ok(())
|
||||
})?;
|
||||
|
||||
// 4. update used size.
|
||||
self.update_used_size(
|
||||
chunk_id,
|
||||
new_meta.pos.chunk_size().0 as i64 - old_pos.chunk_size().0 as i64,
|
||||
write_batch,
|
||||
)?;
|
||||
}
|
||||
|
||||
// 5. update timestamp->chunk map.
|
||||
self.check_chunk_id(chunk_id)?;
|
||||
let timestamp_key =
|
||||
MetaKey::timestamp_key(new_meta.timestamp, chunk_id, self.config.prefix_len);
|
||||
write_batch.put(timestamp_key, []);
|
||||
let timestamp_key =
|
||||
MetaKey::timestamp_key(old_meta.timestamp, chunk_id, self.config.prefix_len);
|
||||
write_batch.delete(timestamp_key);
|
||||
|
||||
// 6. remove writing chunk log.
|
||||
self.remove_writing_chunk_mut(chunk_id, write_batch);
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
pub fn remove(&self, chunk_id: &[u8], chunk_meta: &ChunkMeta, sync: bool) -> Result<()> {
|
||||
let mut write_batch = RocksDB::new_write_batch();
|
||||
self.remove_mut(chunk_id, chunk_meta, &mut write_batch)?;
|
||||
self.write(write_batch, sync)
|
||||
}
|
||||
|
||||
pub fn remove_mut(
|
||||
&self,
|
||||
chunk_id: &[u8],
|
||||
chunk_meta: &ChunkMeta,
|
||||
write_batch: &mut rocksdb::WriteBatch,
|
||||
) -> Result<()> {
|
||||
// 1. delete chunk meta.
|
||||
let chunk_meta_key = MetaKey::chunk_meta_key(chunk_id);
|
||||
write_batch.delete(chunk_meta_key);
|
||||
|
||||
// 2. delete pos->chunk map.
|
||||
let pos_to_chunk_key = MetaKey::pos_to_chunk_key(chunk_meta.pos);
|
||||
write_batch.delete(pos_to_chunk_key);
|
||||
|
||||
// 3. release position.
|
||||
let group_bits_key = MetaKey::group_bits_key(chunk_meta.pos.group_id());
|
||||
Self::with_tls_bytes(|bytes| {
|
||||
MergeState::release(chunk_meta.pos.index())
|
||||
.serialize_to(bytes)
|
||||
.map_err(Error::SerializationError)?;
|
||||
write_batch.merge(group_bits_key, &bytes[..]);
|
||||
Ok(())
|
||||
})?;
|
||||
|
||||
// 4. update used size.
|
||||
self.update_used_size(
|
||||
chunk_id,
|
||||
-(chunk_meta.pos.chunk_size().0 as i64),
|
||||
write_batch,
|
||||
)?;
|
||||
|
||||
// 5. delete timestamp->chunk map.
|
||||
let timestamp_key =
|
||||
MetaKey::timestamp_key(chunk_meta.timestamp, chunk_id, self.config.prefix_len);
|
||||
write_batch.delete(timestamp_key);
|
||||
|
||||
// 6. remove writing chunk log.
|
||||
self.remove_writing_chunk_mut(chunk_id, write_batch);
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
pub fn allocate_group(&self, group_id: GroupId) -> Result<()> {
|
||||
let group_bits_key = MetaKey::group_bits_key(group_id);
|
||||
self.rocksdb
|
||||
.put(group_bits_key, GroupState::empty().as_bytes(), true)
|
||||
}
|
||||
|
||||
pub fn remove_group(&self, group_id: GroupId) -> Result<()> {
|
||||
let group_bits_key = MetaKey::group_bits_key(group_id);
|
||||
self.rocksdb.delete(group_bits_key, true)
|
||||
}
|
||||
|
||||
pub fn iterator(&self) -> RocksDBIterator {
|
||||
self.rocksdb.new_iterator()
|
||||
}
|
||||
|
||||
fn update_used_size(
|
||||
&self,
|
||||
chunk_id: &[u8],
|
||||
diff: i64,
|
||||
write_batch: &mut rocksdb::WriteBatch,
|
||||
) -> Result<()> {
|
||||
self.check_chunk_id(chunk_id)?;
|
||||
let used_size_key = MetaKey::used_size_key(&chunk_id[..self.config.prefix_len]);
|
||||
write_batch.merge(used_size_key, diff.to_le_bytes());
|
||||
Ok(())
|
||||
}
|
||||
|
||||
pub fn persist_writing_chunk(&self, chunk_id: &[u8], chunk_meta: &ChunkMeta) -> Result<()> {
|
||||
let chunk_meta_key = MetaKey::writing_chunk_key(chunk_id);
|
||||
Self::with_tls_bytes(|bytes| {
|
||||
chunk_meta
|
||||
.serialize_to(bytes)
|
||||
.map_err(Error::SerializationError)?;
|
||||
self.rocksdb.put(chunk_meta_key, &bytes[..], true)
|
||||
})
|
||||
}
|
||||
|
||||
pub fn remove_writing_chunk_mut(&self, chunk_id: &[u8], write_batch: &mut rocksdb::WriteBatch) {
|
||||
write_batch.delete(MetaKey::writing_chunk_key(chunk_id));
|
||||
}
|
||||
|
||||
pub fn occupy_uncommitted_positions(&mut self) -> Result<Vec<(Bytes, ChunkMeta, bool)>> {
|
||||
let mut prefix_len = 0;
|
||||
std::mem::swap(&mut self.config.prefix_len, &mut prefix_len);
|
||||
let list = self.query_uncommitted_chunks(&[])?;
|
||||
std::mem::swap(&mut self.config.prefix_len, &mut prefix_len);
|
||||
|
||||
let mut uncommitted_chunks = vec![];
|
||||
let mut write_batch = RocksDB::new_write_batch();
|
||||
let mut count = 0;
|
||||
for (chunk_id, writing_meta) in list {
|
||||
let pos = writing_meta.pos;
|
||||
match self.get_chunk_meta(&chunk_id)? {
|
||||
Some(meta) if meta.pos == writing_meta.pos => {
|
||||
uncommitted_chunks.push((chunk_id, writing_meta, false));
|
||||
}
|
||||
_ => {
|
||||
uncommitted_chunks.push((chunk_id.clone(), writing_meta, true));
|
||||
|
||||
count += 1;
|
||||
let pos_to_chunk_key = MetaKey::pos_to_chunk_key(pos);
|
||||
write_batch.put(pos_to_chunk_key, chunk_id);
|
||||
|
||||
let group_bits_key = MetaKey::group_bits_key(pos.group_id());
|
||||
Self::with_tls_bytes(|bytes| {
|
||||
MergeState::acquire(pos.index())
|
||||
.serialize_to(bytes)
|
||||
.map_err(Error::SerializationError)?;
|
||||
write_batch.merge(group_bits_key, &bytes[..]);
|
||||
Ok(())
|
||||
})?;
|
||||
}
|
||||
}
|
||||
}
|
||||
if !uncommitted_chunks.is_empty() {
|
||||
self.write(write_batch, true)?;
|
||||
tracing::info!("occupy {} positions for writing chunks", count);
|
||||
}
|
||||
Ok(uncommitted_chunks)
|
||||
}
|
||||
|
||||
pub fn vacate_uncommitted_positions(
|
||||
&self,
|
||||
uncommitted_chunks: Vec<(Bytes, ChunkMeta, bool)>,
|
||||
) -> Result<()> {
|
||||
let mut write_batch = RocksDB::new_write_batch();
|
||||
let mut count = 0;
|
||||
for (_, chunk_meta, occupied) in uncommitted_chunks {
|
||||
if !occupied {
|
||||
continue;
|
||||
}
|
||||
|
||||
count += 1;
|
||||
let pos_to_chunk_key = MetaKey::pos_to_chunk_key(chunk_meta.pos);
|
||||
write_batch.delete(pos_to_chunk_key);
|
||||
|
||||
let group_bits_key = MetaKey::group_bits_key(chunk_meta.pos.group_id());
|
||||
Self::with_tls_bytes(|bytes| {
|
||||
MergeState::release(chunk_meta.pos.index())
|
||||
.serialize_to(bytes)
|
||||
.map_err(Error::SerializationError)?;
|
||||
write_batch.merge(group_bits_key, &bytes[..]);
|
||||
Ok(())
|
||||
})?;
|
||||
}
|
||||
self.write(write_batch, true)?;
|
||||
tracing::info!("vacate {} positions for writing chunks", count);
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn query_uncommitted_chunks(&self, prefix: &[u8]) -> Result<Vec<(Bytes, ChunkMeta)>> {
|
||||
self.check_prefix(prefix)?;
|
||||
|
||||
let mut it = self.iterator();
|
||||
let mut out = Vec::<(Bytes, ChunkMeta)>::with_capacity(4096);
|
||||
|
||||
let end_key = MetaKey::writing_chunk_key(prefix);
|
||||
it.seek(&end_key)?;
|
||||
|
||||
if it.key() == Some(end_key.as_ref()) {
|
||||
it.next(); // [begin, end)
|
||||
}
|
||||
|
||||
loop {
|
||||
if !it.valid() {
|
||||
break;
|
||||
}
|
||||
|
||||
if it.key().unwrap()[0] != MetaKey::WRITING_CHUNK_KEY_PREFIX {
|
||||
break;
|
||||
}
|
||||
|
||||
let chunk_id = MetaKey::parse_writing_chunk_key(it.key().unwrap())?;
|
||||
if prefix <= chunk_id.as_ref() {
|
||||
let chunk_meta = ChunkMeta::deserialize(it.value().unwrap())
|
||||
.map_err(Error::SerializationError)?;
|
||||
out.push((chunk_id, chunk_meta))
|
||||
} else {
|
||||
break;
|
||||
}
|
||||
|
||||
it.next();
|
||||
}
|
||||
|
||||
Ok(out)
|
||||
}
|
||||
|
||||
fn check_chunk_id(&self, chunk_id: &[u8]) -> Result<()> {
|
||||
let prefix_len = self.config.prefix_len;
|
||||
if chunk_id.len() < prefix_len {
|
||||
return Err(Error::InvalidArg(format!(
|
||||
"chunk_id.len() < prefix len: {:?}, {}",
|
||||
chunk_id, prefix_len
|
||||
)));
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn check_prefix(&self, prefix: &[u8]) -> Result<()> {
|
||||
let prefix_len = self.config.prefix_len;
|
||||
if prefix.len() != prefix_len {
|
||||
return Err(Error::InvalidArg(format!(
|
||||
"prefix.len() != prefix len: {:?}, {}",
|
||||
prefix, prefix_len
|
||||
)));
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
pub fn query_used_size(&self, prefix: &[u8]) -> Result<u64> {
|
||||
self.check_prefix(prefix)?;
|
||||
|
||||
let used_size_key = MetaKey::used_size_key(prefix);
|
||||
let value = self.rocksdb.get(used_size_key)?;
|
||||
if let Some(size) = value {
|
||||
if size.len() != std::mem::size_of::<u64>() {
|
||||
Err(Error::InvalidArg(format!(
|
||||
"invalid size length: {:?}",
|
||||
size.as_ref()
|
||||
)))
|
||||
} else {
|
||||
Ok(LittleEndian::read_u64(size.as_ref()))
|
||||
}
|
||||
} else {
|
||||
Ok(0)
|
||||
}
|
||||
}
|
||||
|
||||
fn with_tls_bytes<F, R>(f: F) -> R
|
||||
where
|
||||
F: FnOnce(&mut DownwardBytes) -> R,
|
||||
{
|
||||
Self::BYTES.with(|v| {
|
||||
let mut bytes = v.borrow_mut();
|
||||
let result = f(bytes.deref_mut());
|
||||
bytes.clear_and_shrink_to(Size::MB.into());
|
||||
result
|
||||
})
|
||||
}
|
||||
|
||||
fn update_used_size_if_need(&mut self) -> Result<()> {
|
||||
let old_len = match self.rocksdb.get(MetaKey::used_size_prefix_len_key())? {
|
||||
Some(value) => {
|
||||
if value.len() != std::mem::size_of::<u32>() {
|
||||
return Err(Error::InvalidArg(format!(
|
||||
"invalid used size prefix length: {:?}",
|
||||
value.as_ref()
|
||||
)));
|
||||
}
|
||||
LittleEndian::read_u32(value.as_ref()) as usize
|
||||
}
|
||||
None => 0,
|
||||
};
|
||||
|
||||
let prefix_len = self.config.prefix_len;
|
||||
if old_len == prefix_len {
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
let mut map = HashMap::<Bytes, u64>::new();
|
||||
if prefix_len == 0 {
|
||||
map.insert(Bytes::new(), 0);
|
||||
}
|
||||
let mut it = self.iterator();
|
||||
it.iterate(MetaKey::chunk_meta_key_prefix(), |key, value| {
|
||||
let mut chunk_id = MetaKey::parse_chunk_meta_key(key);
|
||||
chunk_id.resize(prefix_len, 0);
|
||||
let chunk_meta = ChunkMeta::deserialize(value).map_err(Error::SerializationError)?;
|
||||
let chunk_size = chunk_meta.pos.chunk_size().0;
|
||||
map.entry(chunk_id)
|
||||
.and_modify(|v| *v += chunk_size)
|
||||
.or_insert(chunk_size);
|
||||
Ok(())
|
||||
})?;
|
||||
|
||||
let mut write_batch = RocksDB::new_write_batch();
|
||||
write_batch.put(
|
||||
MetaKey::used_size_prefix_len_key(),
|
||||
(prefix_len as u32).to_le_bytes(),
|
||||
);
|
||||
for (prefix, size) in map {
|
||||
write_batch.put(MetaKey::used_size_key(&prefix), size.to_le_bytes())
|
||||
}
|
||||
self.write(write_batch, true)
|
||||
}
|
||||
|
||||
pub const V1_FIX_TIMESTAMP: u8 = 1;
|
||||
pub const LATEST_VERSION: u8 = Self::V1_FIX_TIMESTAMP;
|
||||
|
||||
pub fn get_version(&self) -> Result<u8> {
|
||||
match self.rocksdb.get(MetaKey::version_key())? {
|
||||
Some(value) if !value.is_empty() => Ok(value[0]),
|
||||
Some(value) => Err(Error::InvalidArg(format!(
|
||||
"invalid version: {:?}",
|
||||
value.as_ref()
|
||||
))),
|
||||
None => Ok(0),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn set_version(&self, version: u8) -> Result<()> {
|
||||
self.rocksdb.put(MetaKey::version_key(), [version], true)
|
||||
}
|
||||
|
||||
pub fn remove_range_mut(
|
||||
&self,
|
||||
prefix: u8,
|
||||
write_batch: &mut rocksdb::WriteBatch,
|
||||
) -> Result<()> {
|
||||
if prefix == MetaKey::CHUNK_META_KEY_PREFIX
|
||||
|| prefix == MetaKey::GROUP_BITS_KEY_PREFIX
|
||||
|| prefix == MetaKey::POS_TO_CHUNK_KEY_PREFIX
|
||||
|| prefix == MetaKey::USED_SIZE_KEY_PREFIX
|
||||
|| prefix == MetaKey::USED_SIZE_PREFIX_LEN_KEY
|
||||
{
|
||||
return Err(Error::InvalidArg(format!(
|
||||
"invalid remove range: {}",
|
||||
prefix
|
||||
)));
|
||||
}
|
||||
|
||||
write_batch.delete_range(&[prefix], &[prefix + 1]);
|
||||
Ok(())
|
||||
}
|
||||
}
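`query_chunks_by_timestamp` depends on the layout produced by `MetaKey::timestamp_key`: the tag byte, the first `prefix_len` bytes of the chunk id, the big-endian timestamp, then the rest of the chunk id. For a fixed prefix the keys therefore sort by timestamp, so a seek to the `begin` timestamp followed by a forward scan visits entries in time order. The sketch below reproduces the layout with hypothetical free functions that return `Vec<u8>` and `Option` instead of the crate's `MetaKey`, `Bytes` and `Result` types.

```rust
const TIMESTAMP_KEY_PREFIX: u8 = 6;

/// Layout: [tag][chunk_id[..prefix_len]][timestamp, big-endian][chunk_id[prefix_len..]].
/// Panics if `chunk_id` is shorter than `prefix_len`; the original guards
/// against that case with `check_chunk_id`.
fn timestamp_key(timestamp: u64, chunk_id: &[u8], prefix_len: usize) -> Vec<u8> {
    let mut key = vec![TIMESTAMP_KEY_PREFIX];
    key.extend_from_slice(&chunk_id[..prefix_len]);
    key.extend_from_slice(&timestamp.to_be_bytes());
    key.extend_from_slice(&chunk_id[prefix_len..]);
    key
}

/// Inverse of `timestamp_key`. Returns `None` when nothing follows the
/// timestamp, matching the length check in `parse_timestamp_key`.
fn parse_timestamp_key(key: &[u8], prefix_len: usize) -> Option<(u64, Vec<u8>)> {
    let ts_start = 1 + prefix_len;
    let ts_end = ts_start + 8;
    if key.len() <= ts_end {
        return None;
    }
    let mut ts = [0u8; 8];
    ts.copy_from_slice(&key[ts_start..ts_end]);
    // Reassemble the chunk id from the part before and after the timestamp.
    let mut chunk_id = key[1..ts_start].to_vec();
    chunk_id.extend_from_slice(&key[ts_end..]);
    Some((u64::from_be_bytes(ts), chunk_id))
}
```

The big-endian encoding is what makes byte-wise key comparison agree with numeric timestamp order; a little-endian timestamp would not sort correctly.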
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_meta_store_normal() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
|
||||
let config = MetaStoreConfig {
|
||||
rocksdb: RocksDBConfig {
|
||||
path: dir.path().into(),
|
||||
create: true,
|
||||
..Default::default()
|
||||
},
|
||||
..Default::default()
|
||||
};
|
||||
|
||||
let meta_store = MetaStore::open(&config).unwrap();
|
||||
|
||||
let chunk_id = "1000".as_bytes();
|
||||
let chunk_meta = meta_store.get_chunk_meta(chunk_id).unwrap();
|
||||
assert!(chunk_meta.is_none());
|
||||
|
||||
let chunk_meta_in = ChunkMeta {
|
||||
chunk_ver: 1,
|
||||
..Default::default()
|
||||
};
|
||||
meta_store
|
||||
.add_chunk(chunk_id, &chunk_meta_in, false)
|
||||
.unwrap();
|
||||
|
||||
let chunk_id = "1000".as_bytes();
|
||||
let chunk_meta_out = meta_store.get_chunk_meta(chunk_id).unwrap().unwrap();
|
||||
assert_eq!(chunk_meta_in, chunk_meta_out);
|
||||
assert_eq!(meta_store.query_chunks([], "100", 10).unwrap().len(), 1);
|
||||
|
||||
meta_store.remove(chunk_id, &chunk_meta_out, false).unwrap();
|
||||
|
||||
let mut write_batch = RocksDB::new_write_batch();
|
||||
meta_store
|
||||
.remove_range_mut(MetaKey::CHUNK_META_KEY_PREFIX, &mut write_batch)
|
||||
.unwrap_err();
|
||||
|
||||
meta_store
|
||||
.rocksdb
|
||||
.put(MetaKey::version_key(), &[], false)
|
||||
.unwrap();
|
||||
meta_store.get_version().unwrap_err();
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_meta_get_set() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
|
||||
let config = MetaStoreConfig {
|
||||
rocksdb: RocksDBConfig {
|
||||
path: dir.path().into(),
|
||||
create: true,
|
||||
..Default::default()
|
||||
},
|
||||
..Default::default()
|
||||
};
|
||||
|
||||
let meta_store = MetaStore::open(&config).unwrap();
|
||||
|
||||
let group_id = GroupId::default();
|
||||
let mut chunk_meta = ChunkMeta::default();
|
||||
for i in 0..128u32 {
|
||||
chunk_meta.pos = Position::new(group_id, 2 * i as u8);
|
||||
meta_store
|
||||
.add_chunk(&i.to_be_bytes(), &chunk_meta, false)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
let vec = meta_store
|
||||
.query_chunks(10u32.to_be_bytes(), 20u32.to_be_bytes(), 30)
|
||||
.unwrap();
|
||||
assert_eq!(vec.len(), 10);
|
||||
assert_eq!(vec.first().unwrap().0.as_ref(), &19u32.to_be_bytes());
|
||||
assert_eq!(vec.last().unwrap().0.as_ref(), &10u32.to_be_bytes());
|
||||
|
||||
let vec = meta_store
|
||||
.query_chunks(80u32.to_be_bytes(), 100u32.to_be_bytes(), 30)
|
||||
.unwrap();
|
||||
assert_eq!(vec.len(), 20);
|
||||
|
||||
let mut it = meta_store.iterator();
|
||||
let mut count = 0;
|
||||
it.iterate(MetaKey::group_bits_key_prefix(), |_key, value| {
|
||||
count += 1;
|
||||
let bits = GroupState::from(value)?;
|
||||
assert_eq!(bits.count(), 128);
|
||||
for i in 0..128 {
|
||||
assert!(bits.check(i * 2));
|
||||
assert!(!bits.check(i * 2 + 1));
|
||||
}
|
||||
Ok(())
|
||||
})
|
||||
.unwrap();
|
||||
assert_eq!(count, 1);
|
||||
|
||||
for i in 0..128u32 {
|
||||
chunk_meta.pos = Position::new(group_id, 1 + 2 * i as u8);
|
||||
meta_store
|
||||
.add_chunk(&i.to_be_bytes(), &chunk_meta, false)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
let mut it = meta_store.iterator();
|
||||
let mut count = 0;
|
||||
it.iterate(MetaKey::group_bits_key_prefix(), |_key, value| {
|
||||
count += 1;
|
||||
let bits = GroupState::from(value)?;
|
||||
assert_eq!(bits.count(), 256);
|
||||
assert!(bits.is_full());
|
||||
Ok(())
|
||||
})
|
||||
.unwrap();
|
||||
assert_eq!(count, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_meta_store_open_failed() {
|
||||
let config = MetaStoreConfig {
|
||||
rocksdb: RocksDBConfig {
|
||||
path: "/proc/test".into(),
|
||||
create: true,
|
||||
..Default::default()
|
||||
},
|
||||
..Default::default()
|
||||
};
|
||||
|
||||
assert!(MetaStore::open(&config).is_err());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_meta_store_update_used_size() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
|
||||
let config = MetaStoreConfig {
|
||||
rocksdb: RocksDBConfig {
|
||||
path: dir.path().into(),
|
||||
create: true,
|
||||
..Default::default()
|
||||
},
|
||||
prefix_len: 4,
|
||||
};
|
||||
|
||||
let meta_store = MetaStore::open(&config).unwrap();
|
||||
|
||||
let chunk_id = [0, 1, 2, 3];
|
||||
let group_id = GroupId::default();
|
||||
let chunk_meta = ChunkMeta {
|
||||
pos: Position::new(group_id, 0_u8),
|
||||
..Default::default()
|
||||
};
|
||||
meta_store
|
||||
.add_chunk(&chunk_id[..3], &chunk_meta, false)
|
||||
.unwrap_err();
|
||||
meta_store.add_chunk(&chunk_id, &chunk_meta, false).unwrap();
|
||||
|
||||
meta_store.query_used_size(&chunk_id[..3]).unwrap_err();
|
||||
assert_eq!(
|
||||
meta_store.query_used_size(&chunk_id).unwrap(),
|
||||
CHUNK_SIZE_NORMAL
|
||||
);
|
||||
assert_eq!(meta_store.query_used_size(&0u32.to_le_bytes()).unwrap(), 0);
|
||||
|
||||
meta_store
|
||||
.query_chunks_by_timestamp(&0u32.to_le_bytes(), 0, u64::MAX, u64::MAX)
|
||||
.unwrap();
|
||||
|
||||
meta_store.remove(&chunk_id, &chunk_meta, false).unwrap();
|
||||
assert_eq!(meta_store.query_used_size(&chunk_id).unwrap(), 0);
|
||||
|
||||
let key = MetaKey::used_size_key(&chunk_id);
|
||||
meta_store.rocksdb.put(key, [], false).unwrap();
|
||||
meta_store.query_used_size(&chunk_id).unwrap_err();
|
||||
|
||||
meta_store
|
||||
.rocksdb
|
||||
.put(MetaKey::used_size_prefix_len_key(), [233], false)
|
||||
.unwrap();
|
||||
drop(meta_store);
|
||||
assert!(MetaStore::open(&config).is_err());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_meta_store_update_used_size_prefix_len() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
|
||||
let mut config = MetaStoreConfig {
|
||||
rocksdb: RocksDBConfig {
|
||||
path: dir.path().into(),
|
||||
create: true,
|
||||
..Default::default()
|
||||
},
|
||||
prefix_len: 0,
|
||||
};
|
||||
|
||||
const N: u64 = 1024;
|
||||
let start = ChunkMeta::now();
|
||||
let meta_store = MetaStore::open(&config).unwrap();
|
||||
for i in 0..N {
|
||||
let chunk_id = i.to_le_bytes();
|
||||
let id = i as u8;
|
||||
let chunk_size = if id % 2 == 0 {
|
||||
CHUNK_SIZE_NORMAL
|
||||
} else {
|
||||
CHUNK_SIZE_SMALL
|
||||
};
|
||||
|
||||
let pos = Position::new(GroupId::new(chunk_size, 0, 0), id);
|
||||
let meta = ChunkMeta {
|
||||
pos,
|
||||
..Default::default()
|
||||
};
|
||||
meta_store.add_chunk(&chunk_id, &meta, false).unwrap();
|
||||
}
|
||||
|
||||
let size = meta_store.query_used_size(&[]).unwrap();
|
||||
assert_eq!(size, N / 2 * (CHUNK_SIZE_NORMAL.0 + CHUNK_SIZE_SMALL.0));
|
||||
|
||||
let mut write_batch = RocksDB::new_write_batch();
|
||||
write_batch.put("m", "m");
|
||||
meta_store.write(write_batch, false).unwrap();
|
||||
|
||||
let end = ChunkMeta::now();
|
||||
let vec = meta_store
|
||||
.query_chunks_by_timestamp(&[], 0, start, u64::MAX)
|
||||
.unwrap();
|
||||
assert!(vec.is_empty());
|
||||
let vec = meta_store
|
||||
.query_chunks_by_timestamp(&[], start, end + 1, u64::MAX)
|
||||
.unwrap();
|
||||
assert_eq!(vec.len(), N as usize);
|
||||
|
||||
drop(meta_store);
|
||||
|
||||
config.prefix_len = 1;
|
||||
let meta_store = MetaStore::open(&config).unwrap();
|
||||
for i in 0..=u8::MAX {
|
||||
let size = meta_store.query_used_size(&[i]).unwrap();
|
||||
if i % 2 == 0 {
|
||||
assert_eq!(size, N / 256 * CHUNK_SIZE_NORMAL.0);
|
||||
} else {
|
||||
assert_eq!(size, N / 256 * CHUNK_SIZE_SMALL.0);
|
||||
}
|
||||
}
|
||||
|
||||
for i in 0..N {
|
||||
let chunk_id = i.to_le_bytes();
|
||||
let meta = meta_store.get_chunk_meta(&chunk_id).unwrap().unwrap();
|
||||
meta_store.remove(&chunk_id, &meta, false).unwrap();
|
||||
}
|
||||
for i in 0..=u8::MAX {
|
||||
let size = meta_store.query_used_size(&[i]).unwrap();
|
||||
assert_eq!(size, 0);
|
||||
}
|
||||
|
||||
drop(meta_store);
|
||||
config.prefix_len = 0;
|
||||
let meta_store = MetaStore::open(&config).unwrap();
|
||||
let size = meta_store.query_used_size(&[]).unwrap();
|
||||
assert_eq!(size, 0);
|
||||
}
|
||||
}
|
||||
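The used-size tests above suggest that sizes are accounted per chunk-id prefix of `prefix_len` bytes, and that ids shorter than `prefix_len` are rejected. The following is a minimal in-memory sketch of that bookkeeping, not the RocksDB-backed implementation. `UsedSizeIndex` and its methods are hypothetical names, and truncating longer ids to `prefix_len` is an assumption.

```rust
use std::collections::HashMap;

/// In-memory stand-in for per-prefix used-size accounting (hypothetical type).
struct UsedSizeIndex {
    prefix_len: usize,
    used: HashMap<Vec<u8>, u64>,
}

impl UsedSizeIndex {
    fn new(prefix_len: usize) -> Self {
        Self { prefix_len, used: HashMap::new() }
    }

    /// Returns the accounting bucket for an id, or an error if the id is too short.
    fn bucket(&self, id: &[u8]) -> Result<Vec<u8>, String> {
        id.get(..self.prefix_len)
            .map(|prefix| prefix.to_vec())
            .ok_or_else(|| format!("id shorter than prefix_len {}", self.prefix_len))
    }

    /// Adds `chunk_size` bytes to the bucket of `chunk_id`.
    fn add(&mut self, chunk_id: &[u8], chunk_size: u64) -> Result<(), String> {
        let bucket = self.bucket(chunk_id)?;
        *self.used.entry(bucket).or_insert(0) += chunk_size;
        Ok(())
    }

    /// Subtracts `chunk_size` bytes from the bucket of `chunk_id`, never going below zero.
    fn remove(&mut self, chunk_id: &[u8], chunk_size: u64) -> Result<(), String> {
        let bucket = self.bucket(chunk_id)?;
        let used = self.used.entry(bucket).or_insert(0);
        *used = used.saturating_sub(chunk_size);
        Ok(())
    }

    /// Returns the bytes accounted to `prefix`; unknown prefixes report zero.
    fn query(&self, prefix: &[u8]) -> Result<u64, String> {
        let bucket = self.bucket(prefix)?;
        Ok(self.used.get(&bucket).copied().unwrap_or(0))
    }
}
```

With `prefix_len` set to 4, a 3-byte id is rejected while a 4-byte id is counted, which mirrors the behaviour asserted in `test_meta_store_update_used_size`.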
9
src/storage/chunk_engine/src/meta/mod.rs
Normal file
@@ -0,0 +1,9 @@
mod meta_key;
mod meta_merge;
mod meta_store;
mod rocksdb;

pub use meta_key::*;
pub use meta_merge::*;
pub use meta_store::*;
pub use rocksdb::*;
314
src/storage/chunk_engine/src/meta/rocksdb.rs
Normal file
@@ -0,0 +1,314 @@
|
||||
use crate::{Error, Result, Size};
|
||||
use std::path::PathBuf;
|
||||
|
||||
#[derive(Debug, Default, Clone)]
|
||||
pub struct RocksDBConfig {
|
||||
pub path: PathBuf,
|
||||
pub create: bool,
|
||||
pub read_only: bool,
|
||||
}
|
||||
|
||||
pub struct RocksDB {
|
||||
db: rocksdb::DB,
|
||||
write_options: [rocksdb::WriteOptions; 2], // 0 for non-sync, 1 for sync.
|
||||
}
|
||||
|
||||
pub trait MergeOp {
|
||||
fn full_merge<'a>(
|
||||
key: &[u8],
|
||||
value: Option<&[u8]>,
|
||||
operands: impl Iterator<Item = &'a [u8]>,
|
||||
) -> Option<Vec<u8>>;
|
||||
|
||||
fn partial_merge<'a>(key: &[u8], operands: impl Iterator<Item = &'a [u8]>) -> Option<Vec<u8>>;
|
||||
}
|
||||
|
||||
impl RocksDB {
|
||||
pub fn open<T: MergeOp + 'static>(config: &RocksDBConfig) -> Result<Self> {
|
||||
let mut db_options = rocksdb::Options::default();
|
||||
db_options.create_if_missing(config.create);
|
||||
db_options.set_merge_operator(
|
||||
"merge",
|
||||
|key, value, operands| T::full_merge(key, value, operands.iter()),
|
||||
|key, _value, operands| T::partial_merge(key, operands.iter()),
|
||||
);
|
||||
|
||||
let mut table_options = rocksdb::BlockBasedOptions::default();
|
||||
table_options.set_bloom_filter(10.0, true);
|
||||
db_options.set_block_based_table_factory(&table_options);
|
||||
|
||||
let db = if config.read_only {
|
||||
rocksdb::DB::open_for_read_only(&db_options, &config.path, false)
|
||||
} else {
|
||||
rocksdb::DB::open(&db_options, &config.path)
|
||||
}
|
||||
.map_err(|err| Error::RocksDBError(format!("open rocksdb fail: {:?}", err)))?;
|
||||
|
||||
let mut sync_write_options = rocksdb::WriteOptions::new();
|
||||
sync_write_options.set_sync(true);
|
||||
|
||||
Ok(Self {
|
||||
db,
|
||||
write_options: [rocksdb::WriteOptions::new(), sync_write_options],
|
||||
})
|
||||
}
|
||||
|
||||
    pub fn get(&self, key: impl AsRef<[u8]>) -> Result<Option<rocksdb::DBPinnableSlice>> {
        self.db
            .get_pinned(key)
            .map_err(|e| Error::RocksDBError(format!("RocksDB fail: {e:?}")))
    }

    pub fn put(&self, key: impl AsRef<[u8]>, value: impl AsRef<[u8]>, sync: bool) -> Result<()> {
        // `sync as usize` selects write_options[0] (non-sync) or write_options[1] (sync).
        self.db
            .put_opt(key, value, &self.write_options[sync as usize])
            .map_err(|e| Error::RocksDBError(format!("RocksDB fail: {e:?}")))
    }

    pub fn delete(&self, key: impl AsRef<[u8]>, sync: bool) -> Result<()> {
        self.db
            .delete_opt(key, &self.write_options[sync as usize])
            .map_err(|e| Error::RocksDBError(format!("RocksDB fail: {e:?}")))
    }

    pub fn new_write_batch() -> rocksdb::WriteBatch {
        rocksdb::WriteBatch::default()
    }

    pub fn write(&self, batch: rocksdb::WriteBatch, sync: bool) -> Result<()> {
        self.db
            .write_opt(batch, &self.write_options[sync as usize])
            .map_err(|e| Error::RocksDBError(format!("RocksDB fail: {e:?}")))
    }
|
||||
|
||||
pub fn new_iterator(&self) -> RocksDBIterator {
|
||||
let mut read_options = rocksdb::ReadOptions::default();
|
||||
read_options.set_readahead_size(Size::mebibyte(4).into());
|
||||
RocksDBIterator(self.db.raw_iterator_opt(read_options))
|
||||
}
|
||||
}
|
||||
|
||||
impl Drop for RocksDB {
|
||||
fn drop(&mut self) {
|
||||
tracing::info!("RocksDB {:?} is closing...", self.db);
|
||||
}
|
||||
}
|
||||
|
||||
pub struct RocksDBIterator<'a>(rocksdb::DBRawIterator<'a>);
|
||||
|
||||
impl RocksDBIterator<'_> {
|
||||
    pub fn iterate<P, F>(&mut self, prefix: P, mut func: F) -> Result<u32>
    where
        P: AsRef<[u8]>,
        F: FnMut(&[u8], &[u8]) -> Result<()>,
    {
        let prefix = prefix.as_ref();
        let it = &mut self.0;
        it.seek(prefix);
        let mut count = 0;
        // A valid iterator always has a key, so the `unwrap` calls below cannot panic.
        while it.valid() && it.key().unwrap().starts_with(prefix) {
            func(it.key().unwrap(), it.value().unwrap_or(&[]))?;
            it.next();
            count += 1;
        }
        self.status()?;
        Ok(count)
    }
|
||||
|
||||
pub fn seek<P>(&mut self, prefix: P) -> Result<()>
|
||||
where
|
||||
P: AsRef<[u8]>,
|
||||
{
|
||||
self.0.seek(prefix.as_ref());
|
||||
self.status()
|
||||
}
|
||||
|
||||
pub fn valid(&self) -> bool {
|
||||
self.0.valid()
|
||||
}
|
||||
|
||||
pub fn status(&self) -> Result<()> {
|
||||
self.0
|
||||
.status()
|
||||
.map_err(|e| Error::RocksDBError(e.to_string()))
|
||||
}
|
||||
|
||||
pub fn next(&mut self) {
|
||||
self.0.next();
|
||||
}
|
||||
|
||||
pub fn key(&self) -> Option<&[u8]> {
|
||||
self.0.key()
|
||||
}
|
||||
|
||||
pub fn value(&self) -> Option<&[u8]> {
|
||||
self.0.value()
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
#[test]
|
||||
fn test_rocksdb_create_get_set() {
|
||||
use super::super::*;
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
|
||||
let config = RocksDBConfig {
|
||||
path: dir.path().into(),
|
||||
create: true,
|
||||
..Default::default()
|
||||
};
|
||||
|
||||
let rocksdb = RocksDB::open::<MetaMergeOp>(&config).unwrap();
|
||||
|
||||
let value = rocksdb.get("merry".as_bytes()).unwrap();
|
||||
assert!(value.is_none());
|
||||
|
||||
rocksdb
|
||||
.put("merry".as_bytes(), "world".as_bytes(), false)
|
||||
.unwrap();
|
||||
|
||||
let value = rocksdb.get("merry".as_bytes()).unwrap();
|
||||
assert_eq!(value.as_deref(), Some("world".as_bytes()));
|
||||
|
||||
let mut batch = RocksDB::new_write_batch();
|
||||
batch.put("merry", "RocksDB");
|
||||
batch.put("peace", "love");
|
||||
rocksdb.write(batch, false).unwrap();
|
||||
|
||||
let value = rocksdb.get("merry".as_bytes()).unwrap();
|
||||
assert_eq!(value.as_deref(), Some("RocksDB".as_bytes()));
|
||||
let value = rocksdb.get("peace".as_bytes()).unwrap();
|
||||
assert_eq!(value.as_deref(), Some("love".as_bytes()));
|
||||
|
||||
let mut batch = RocksDB::new_write_batch();
|
||||
batch.merge("merry", "1");
|
||||
batch.merge("merry", "2");
|
||||
for i in 0..16 {
|
||||
batch.merge("merge", format!("{i}"));
|
||||
}
|
||||
rocksdb.write(batch, false).unwrap();
|
||||
|
||||
let value = rocksdb.get("merry".as_bytes()).unwrap();
|
||||
assert_eq!(value.as_deref(), Some("RocksDB12".as_bytes()));
|
||||
let value = rocksdb.get("merge".as_bytes()).unwrap();
|
||||
assert_eq!(value.as_deref(), Some("0123456789101112131415".as_bytes()));
|
||||
|
||||
let mut it = rocksdb.new_iterator();
|
||||
let mut count = 0;
|
||||
let mut runner = |_: &[u8], _: &[u8]| {
|
||||
count += 1;
|
||||
crate::Result::Ok(())
|
||||
};
|
||||
|
||||
assert_eq!(it.iterate([], &mut runner).unwrap(), 3);
|
||||
assert_eq!(it.iterate("m", &mut runner).unwrap(), 2);
|
||||
assert_eq!(it.iterate("a", &mut runner).unwrap(), 0);
|
||||
assert_eq!(it.iterate("z", &mut runner).unwrap(), 0);
|
||||
|
||||
let config = RocksDBConfig {
|
||||
path: std::path::Path::new("/proc/test").into(),
|
||||
create: true,
|
||||
..Default::default()
|
||||
};
|
||||
assert!(RocksDB::open::<MetaMergeOp>(&config).is_err());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_rocksdb_parallel_write() {
|
||||
use super::super::*;
|
||||
use std::sync::Arc;
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
|
||||
let config = RocksDBConfig {
|
||||
path: dir.path().into(),
|
||||
create: true,
|
||||
..Default::default()
|
||||
};
|
||||
|
||||
let rocksdb = Arc::new(RocksDB::open::<MetaMergeOp>(&config).unwrap());
|
||||
|
||||
const T: usize = 16;
|
||||
const N: usize = 1000;
|
||||
let mut threads = vec![];
|
||||
for i in 0..T {
|
||||
let rocksdb = rocksdb.clone();
|
||||
threads.push(
|
||||
std::thread::Builder::new()
|
||||
.name(format!("test-{i}"))
|
||||
.spawn(move || {
|
||||
for j in 0..N {
|
||||
let value = [j as u8; 32];
|
||||
let mut batch = RocksDB::new_write_batch();
|
||||
batch.put(format!("a{}atesta", i * N + j), value);
|
||||
batch.put(format!("b{}btestb", i * N + j), value);
|
||||
batch.merge(format!("m{}mtestm", i * N + j), value);
|
||||
rocksdb.write(batch, false).unwrap();
|
||||
}
|
||||
})
|
||||
.unwrap(),
|
||||
)
|
||||
}
|
||||
|
||||
for thread in threads {
|
||||
thread.join().unwrap();
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_rocksdb_invalid_merge() {
|
||||
use super::super::*;
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
|
||||
let config = RocksDBConfig {
|
||||
path: dir.path().into(),
|
||||
create: true,
|
||||
..Default::default()
|
||||
};
|
||||
|
||||
let rocksdb = RocksDB::open::<MetaMergeOp>(&config).unwrap();
|
||||
|
||||
let mut batch = RocksDB::new_write_batch();
|
||||
batch.merge("invalid_merge", "");
|
||||
rocksdb.write(batch, false).unwrap();
|
||||
|
||||
assert!(rocksdb.get("invalid_merge").is_err());
|
||||
|
||||
let mut runner = |_: &[u8], _: &[u8]| crate::Result::Ok(());
|
||||
let mut it = rocksdb.new_iterator();
|
||||
assert!(it.iterate("invalid_merge", &mut runner).is_err());
|
||||
|
||||
assert!(it.seek("invalid_merge").is_err());
|
||||
|
||||
assert!(rocksdb.put("invalid_merge", "ok", false).is_ok());
|
||||
assert!(rocksdb.get("invalid_merge").is_ok());
|
||||
drop(it);
|
||||
|
||||
let mut it = rocksdb.new_iterator();
|
||||
assert_eq!(it.iterate("invalid_merge", &mut runner), Ok(1));
|
||||
|
||||
it.seek("invalid_merge").unwrap();
|
||||
assert!(it.valid());
|
||||
assert_eq!(it.key().unwrap(), "invalid_merge".as_bytes());
|
||||
assert_eq!(it.value().unwrap(), "ok".as_bytes());
|
||||
|
||||
it.next();
|
||||
assert!(!it.valid());
|
||||
assert!(it.status().is_ok());
|
||||
|
||||
drop(it);
|
||||
drop(rocksdb);
|
||||
let config = RocksDBConfig {
|
||||
path: dir.path().into(),
|
||||
create: false,
|
||||
read_only: true,
|
||||
};
|
||||
RocksDB::open::<MetaMergeOp>(&config).unwrap();
|
||||
}
|
||||
}
|
||||
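`RocksDBIterator::iterate` seeks to the first key at or after the prefix and stops at the first key that no longer starts with it. The same scan can be sketched over an ordered in-memory map. This is only an illustration: `iterate_prefix` is a hypothetical helper and a `BTreeMap` stands in for the database.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Visits every entry whose key starts with `prefix`, in key order,
/// and returns how many entries were visited.
fn iterate_prefix<F>(map: &BTreeMap<Vec<u8>, Vec<u8>>, prefix: &[u8], mut visit: F) -> u32
where
    F: FnMut(&[u8], &[u8]),
{
    // The lower bound plays the role of `seek`: start at the first key >= prefix.
    let bounds: (Bound<&[u8]>, Bound<&[u8]>) = (Bound::Included(prefix), Bound::Unbounded);
    let mut count = 0;
    for (key, value) in map.range::<[u8], _>(bounds) {
        // Keys are sorted, so the first non-matching key ends the scan.
        if !key.starts_with(prefix) {
            break;
        }
        visit(key.as_slice(), value.as_slice());
        count += 1;
    }
    count
}
```

With the keys `merge`, `merry`, and `peace`, scanning the prefixes `""`, `"m"`, `"a"`, and `"z"` visits 3, 2, 0, and 0 entries, the same counts that `test_rocksdb_create_get_set` asserts.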
84
src/storage/chunk_engine/src/types/chunk_meta.rs
Normal file
@@ -0,0 +1,84 @@
|
||||
use super::super::*;
|
||||
|
||||
pub type ETag = tinyvec::TinyVec<[u8; 14]>;
|
||||
|
||||
#[derive(derse::Serialize, derse::Deserialize, Clone, PartialEq, Eq, Debug)]
|
||||
#[repr(C)]
|
||||
pub struct ChunkMeta {
|
||||
pub pos: Position,
|
||||
pub chain_ver: u32,
|
||||
pub chunk_ver: u32,
|
||||
pub len: u32,
|
||||
pub checksum: u32,
|
||||
pub timestamp: u64,
|
||||
pub last_request_id: u64,
|
||||
pub last_client_low: u64,
|
||||
pub last_client_high: u64,
|
||||
pub etag: ETag,
|
||||
pub uncommitted: bool,
|
||||
}
|
||||
|
||||
impl ChunkMeta {
|
||||
pub fn now() -> u64 {
|
||||
std::time::SystemTime::now()
|
||||
.duration_since(std::time::UNIX_EPOCH)
|
||||
.unwrap()
|
||||
.as_micros() as _
|
||||
}
|
||||
|
||||
pub fn set_default_etag_if_need(&mut self) {
|
||||
if self.etag.is_empty() {
|
||||
self.etag = ETag::from(format!("{:X}", self.checksum).as_bytes());
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl Default for ChunkMeta {
|
||||
fn default() -> Self {
|
||||
Self {
|
||||
pos: Position::new(GroupId::new(Size::GB, 0, 0), 0),
|
||||
chain_ver: 0,
|
||||
chunk_ver: 0,
|
||||
len: 0,
|
||||
checksum: 0,
|
||||
timestamp: Self::now(),
|
||||
last_request_id: 0,
|
||||
last_client_low: 0,
|
||||
last_client_high: 0,
|
||||
etag: Default::default(),
|
||||
uncommitted: false,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use derse::{Deserialize, Serialize};
|
||||
|
||||
#[test]
|
||||
fn test_chunk_meta_serialization() {
|
||||
let ser = ChunkMeta {
|
||||
pos: Position::new(GroupId::default(), 88),
|
||||
chain_ver: 1,
|
||||
chunk_ver: 1,
|
||||
len: 2,
|
||||
timestamp: 0,
|
||||
etag: ETag::from(b"hello".as_slice()),
|
||||
..Default::default()
|
||||
};
|
||||
|
||||
let bytes: derse::DownwardBytes = ser.serialize().unwrap();
|
||||
assert_eq!(
|
||||
bytes.as_slice(),
|
||||
&[
|
||||
63, 88, 0, 0, 0, 0, 0, 8, 0, 1, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||
0, 0, 5, b'h', b'e', b'l', b'l', b'o', 0,
|
||||
]
|
||||
);
|
||||
|
||||
let der = ChunkMeta::deserialize(&bytes[..]).unwrap();
|
||||
assert_eq!(ser, der);
|
||||
}
|
||||
}
|
||||
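`set_default_etag_if_need` fills an empty etag with the upper-case hexadecimal form of the checksum. A stand-alone sketch of that rule follows. `default_etag` and `etag_or_default` are hypothetical helpers, and `Vec<u8>` stands in for the `tinyvec`-based `ETag` type.

```rust
/// Upper-case hex of the checksum as bytes; a u32 yields at most 8 characters,
/// which fits within the 14-byte inline capacity declared for `ETag`.
fn default_etag(checksum: u32) -> Vec<u8> {
    format!("{:X}", checksum).into_bytes()
}

/// Keeps a caller-supplied etag and falls back to the checksum-derived one when it is empty.
fn etag_or_default(etag: &[u8], checksum: u32) -> Vec<u8> {
    if etag.is_empty() {
        default_etag(checksum)
    } else {
        etag.to_vec()
    }
}
```

Note that `{:X}` does not pad with zeros, so a checksum of 0 produces the one-byte etag `0`.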
8
src/storage/chunk_engine/src/types/constants.rs
Normal file
@@ -0,0 +1,8 @@
use super::super::Size;

pub const CHUNK_SIZE_SMALL: Size = Size::kibibyte(64);
pub const CHUNK_SIZE_NORMAL: Size = Size::kibibyte(512);
pub const CHUNK_SIZE_LARGE: Size = Size::mebibyte(4);
pub const CHUNK_SIZE_ULTRA: Size = Size::mebibyte(64);
pub const CHUNK_SIZE_SHIFT: usize = 16; // 64KiB is 2^16
pub const CHUNK_SIZE_NUMBER: usize = 11; // from 64KiB to 64MiB
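The two comments in `constants.rs` imply eleven size classes running from 64 KiB (2^16) to 64 MiB (2^26). Assuming the classes are consecutive powers of two, which the constants suggest but this file does not state, the mapping between a class index and a byte size can be sketched as below. `size_of_class` and `size_class_index` are hypothetical helpers, and plain `u64` byte counts stand in for the crate's `Size` type.

```rust
const CHUNK_SIZE_SHIFT: usize = 16; // 64 KiB is 2^16
const CHUNK_SIZE_NUMBER: usize = 11; // 64 KiB up to and including 64 MiB

/// Size in bytes of class `index` (0 is 64 KiB, 10 is 64 MiB).
fn size_of_class(index: usize) -> Option<u64> {
    if index < CHUNK_SIZE_NUMBER {
        Some(1u64 << (CHUNK_SIZE_SHIFT + index))
    } else {
        None
    }
}

/// Class index for an exact power-of-two chunk size, or `None` when out of range.
fn size_class_index(bytes: u64) -> Option<usize> {
    if !bytes.is_power_of_two() {
        return None;
    }
    let shift = bytes.trailing_zeros() as usize;
    shift
        .checked_sub(CHUNK_SIZE_SHIFT)
        .filter(|index| *index < CHUNK_SIZE_NUMBER)
}
```

Under this assumption `CHUNK_SIZE_NORMAL` (512 KiB) is class 3 and `CHUNK_SIZE_LARGE` (4 MiB) is class 6.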
114
src/storage/chunk_engine/src/types/group_id.rs
Normal file
@@ -0,0 +1,114 @@
|
||||
use super::super::*;
|
||||
|
||||
#[derive(Copy, Clone, Eq, PartialEq, Hash, PartialOrd, Ord)]
|
||||
pub struct GroupId(pub u64);
|
||||
|
||||
impl Default for GroupId {
|
||||
fn default() -> Self {
|
||||
GroupId::new(CHUNK_SIZE_NORMAL, 0, 0)
|
||||
}
|
||||
}
|
||||
|
||||
impl GroupId {
|
||||
// 32bit chunk size + 24bit group + 8bit cluster
|
||||
const SHIFT: u32 = 8;
|
||||
pub const COUNT: u32 = (1 << Self::SHIFT);
|
||||
|
||||
pub const fn new(chunk_size: Size, cluster: u8, group: u32) -> Self {
|
||||
Self(chunk_size.0 << 32 | (group << Self::SHIFT | cluster as u32) as u64)
|
||||
}
|
||||
|
||||
pub const fn chunk_size(&self) -> Size {
|
||||
Size::new(self.0 >> 32)
|
||||
}
|
||||
|
||||
pub const fn cluster(&self) -> u8 {
|
||||
self.0 as u8
|
||||
}
|
||||
|
||||
pub const fn group(&self) -> u32 {
|
||||
(self.0 as u32) >> Self::SHIFT
|
||||
}
|
||||
|
||||
pub fn offset(&self) -> Size {
|
||||
const MARKS: u64 = !(GroupId::COUNT - 1) as u64;
|
||||
self.chunk_size() * (self.0 & MARKS)
|
||||
}
|
||||
|
||||
pub fn size(&self) -> Size {
|
||||
self.chunk_size() * GroupId::COUNT as u64
|
||||
}
|
||||
|
||||
pub fn plus_one(&self) -> Self {
|
||||
Self(self.0 + 1)
|
||||
}
|
||||
|
||||
pub fn next(&mut self) {
|
||||
self.0 += 1
|
||||
}
|
||||
|
||||
pub const fn inner(&self) -> u64 {
|
||||
self.0
|
||||
}
|
||||
}
|
||||
|
||||
impl From<u64> for GroupId {
|
||||
fn from(value: u64) -> Self {
|
||||
Self(value)
|
||||
}
|
||||
}
|
||||
|
||||
impl From<GroupId> for u64 {
|
||||
fn from(val: GroupId) -> Self {
|
||||
val.0
|
||||
}
|
||||
}
|
||||
|
||||
impl std::ops::Deref for GroupId {
|
||||
type Target = u64;
|
||||
|
||||
fn deref(&self) -> &Self::Target {
|
||||
&self.0
|
||||
}
|
||||
}
|
||||
|
||||
impl std::fmt::Debug for GroupId {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
write!(
|
||||
f,
|
||||
"GroupId {{ chunk_size: {}, cluster: {}, group: {} }}",
|
||||
self.chunk_size(),
|
||||
self.cluster(),
|
||||
self.group(),
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_group_id_next() {
|
||||
let mut group_id = GroupId::default();
|
||||
|
||||
for _ in 0..1000 {
|
||||
for i in 0..=255 {
|
||||
let next = group_id.plus_one();
|
||||
if i == 255 {
|
||||
assert_eq!(group_id.chunk_size(), next.chunk_size());
|
||||
assert_eq!(0, next.cluster());
|
||||
assert_eq!(group_id.group() + 1, next.group());
|
||||
} else {
|
||||
assert_eq!(group_id.chunk_size(), next.chunk_size());
|
||||
assert_eq!(group_id.cluster() + 1, next.cluster());
|
||||
assert_eq!(group_id.group(), next.group());
|
||||
}
|
||||
group_id = next;
|
||||
}
|
||||
}
|
||||
|
||||
let value = u64::from(group_id);
|
||||
assert_eq!(value, group_id.0);
|
||||
}
|
||||
}
|
||||
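`GroupId` packs three fields into one `u64`: a 32-bit chunk size, a 24-bit group number, and an 8-bit cluster. The stand-alone sketch below reproduces that layout to show why `plus_one` carries from the cluster into the group. `PackedGroupId` is a hypothetical name, and a plain `u64` replaces the crate's `Size` type.

```rust
/// Layout (high to low): 32-bit chunk size | 24-bit group | 8-bit cluster.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
struct PackedGroupId(u64);

impl PackedGroupId {
    const SHIFT: u32 = 8;

    fn new(chunk_size: u64, cluster: u8, group: u32) -> Self {
        // The u32 shift drops the top 8 bits of `group`, keeping it to 24 bits.
        let low = (group << Self::SHIFT) | cluster as u32;
        Self((chunk_size << 32) | low as u64)
    }

    fn chunk_size(self) -> u64 {
        self.0 >> 32
    }

    fn cluster(self) -> u8 {
        self.0 as u8
    }

    fn group(self) -> u32 {
        (self.0 as u32) >> Self::SHIFT
    }

    /// Incrementing the raw value advances the cluster and carries into the group.
    fn plus_one(self) -> Self {
        Self(self.0 + 1)
    }
}
```

Because the cluster occupies the lowest byte, incrementing an id whose cluster is 255 wraps the cluster to 0 and bumps the group by one, as `test_group_id_next` checks.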
163
src/storage/chunk_engine/src/types/group_state.rs
Normal file
@@ -0,0 +1,163 @@
|
||||
use super::super::*;
|
||||
use std::num::NonZeroU64;
|
||||
|
||||
type Item = u64;
|
||||
type Bits = [Item; 4];
|
||||
|
||||
#[derive(Debug, PartialEq, Copy, Clone)]
|
||||
pub struct GroupState {
|
||||
bits: Bits,
|
||||
count: u32,
|
||||
}
|
||||
|
||||
impl GroupState {
|
||||
const TOTAL_BYTES: usize = std::mem::size_of::<Bits>();
|
||||
pub const TOTAL_BITS: usize = 8 * Self::TOTAL_BYTES;
|
||||
pub const ITEM_BITS: u8 = 8 * std::mem::size_of::<Item>() as u8;
|
||||
pub const LEN: usize = Self::TOTAL_BYTES / std::mem::size_of::<Item>();
|
||||
pub const LEVELS: usize = 4;
|
||||
|
||||
pub fn from(value: &[u8]) -> Result<Self> {
|
||||
let mut out = Self::empty();
|
||||
if value.len() != Self::TOTAL_BYTES {
|
||||
return Err(Error::MetaError(format!(
|
||||
"group state load bytes {} != {}",
|
||||
value.len(),
|
||||
Self::TOTAL_BYTES
|
||||
)));
|
||||
}
|
||||
out.as_mut_bytes().copy_from_slice(value);
|
||||
out.count = out.bits.iter().map(|b| b.count_ones()).sum();
|
||||
Ok(out)
|
||||
}
|
||||
|
||||
pub const fn empty() -> Self {
|
||||
Self {
|
||||
bits: [0; Self::LEN],
|
||||
count: 0,
|
||||
}
|
||||
}
|
||||
|
||||
pub fn full() -> Self {
|
||||
Self {
|
||||
bits: [!0; Self::LEN],
|
||||
count: Self::TOTAL_BITS as u32,
|
||||
}
|
||||
}
|
||||
|
||||
pub fn is_empty(&self) -> bool {
|
||||
self.count == 0
|
||||
}
|
||||
|
||||
pub fn is_full(&self) -> bool {
|
||||
self.count == Self::TOTAL_BITS as u32
|
||||
}
|
||||
|
||||
    pub fn allocate(&mut self) -> Option<u8> {
        for (i, v) in self.bits.iter_mut().enumerate() {
            // `!*v` has a set bit for every free slot in this word.
            if let Some(mark) = NonZeroU64::new(!*v) {
                let idx = mark.trailing_zeros();
                *v |= 1 << idx;
                self.count += 1;
                return Some(i as u8 * Self::ITEM_BITS + idx as u8);
            }
        }
        None
    }
|
||||
|
||||
pub fn count(&self) -> u32 {
|
||||
self.count
|
||||
}
|
||||
|
||||
pub fn level(&self) -> u32 {
|
||||
self.count() / (Self::TOTAL_BITS / Self::LEVELS) as u32
|
||||
}
|
||||
|
||||
pub fn check(&self, index: u8) -> bool {
|
||||
let x = index / Self::ITEM_BITS;
|
||||
let y = index % Self::ITEM_BITS;
|
||||
self.bits[x as usize] & (1 << y) != 0
|
||||
}
|
||||
|
||||
pub fn deallocate(&mut self, index: u8) -> Result<()> {
|
||||
let x = index / Self::ITEM_BITS;
|
||||
let y = index % Self::ITEM_BITS;
|
||||
let mark = &mut self.bits[x as usize];
|
||||
if *mark & (1 << y) != 0 {
|
||||
*mark ^= 1 << y;
|
||||
self.count -= 1;
|
||||
Ok(())
|
||||
} else {
|
||||
Err(Error::MetaError(format!(
|
||||
"group state deallocate fail: index {}",
|
||||
index
|
||||
)))
|
||||
}
|
||||
}
|
||||
|
||||
pub fn update(&mut self, merge_bits: &MergeState) {
|
||||
for pos in &merge_bits.acquire {
|
||||
let x = pos / Self::ITEM_BITS;
|
||||
let y = pos % Self::ITEM_BITS;
|
||||
self.bits[x as usize] |= 1 << y;
|
||||
}
|
||||
for pos in &merge_bits.release {
|
||||
let x = pos / Self::ITEM_BITS;
|
||||
let y = pos % Self::ITEM_BITS;
|
||||
self.bits[x as usize] &= !(1 << y);
|
||||
}
|
||||
self.count = self.bits.iter().map(|b| b.count_ones()).sum();
|
||||
}
|
||||
|
||||
pub fn as_bytes(&self) -> &[u8; Self::TOTAL_BYTES] {
|
||||
unsafe { std::mem::transmute(&self.bits) }
|
||||
}
|
||||
|
||||
pub fn as_mut_bytes(&mut self) -> &mut [u8; Self::TOTAL_BYTES] {
|
||||
unsafe { std::mem::transmute(&mut self.bits) }
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_group_bits_normal() {
|
||||
use rand::seq::SliceRandom;
|
||||
|
||||
let mut group_state = GroupState::empty();
|
||||
assert_eq!(group_state.count(), 0);
|
||||
|
||||
for i in 0..=255 {
|
||||
assert_eq!(i, group_state.allocate().unwrap());
|
||||
}
|
||||
assert!(group_state.allocate().is_none());
|
||||
assert_eq!(group_state.count(), 256);
|
||||
|
||||
let mut vec = (0..=255).collect::<Vec<u8>>();
|
||||
vec.shuffle(&mut rand::thread_rng());
|
||||
for i in vec {
|
||||
group_state.deallocate(i).unwrap();
|
||||
group_state.deallocate(i).unwrap_err();
|
||||
|
||||
let j = group_state.allocate().unwrap();
|
||||
group_state.deallocate(j).unwrap();
|
||||
group_state.deallocate(j).unwrap_err();
|
||||
}
|
||||
assert_eq!(group_state.count(), 0);
|
||||
|
||||
group_state.allocate().unwrap();
|
||||
group_state.allocate().unwrap();
|
||||
group_state.deallocate(0).unwrap();
|
||||
assert!(group_state.check(1));
|
||||
|
||||
let bytes = group_state.as_bytes();
|
||||
let another_state = GroupState::from(bytes).unwrap();
|
||||
assert_eq!(another_state, group_state);
|
||||
|
||||
assert!(GroupState::from(&bytes[1..]).is_err());
|
||||
}
|
||||
}
|
||||
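`GroupState` tracks 256 slots in four 64-bit words and always hands out the lowest free slot. The sketch below isolates that first-fit search. `Bitmap256` is a hypothetical name, and `deallocate` returns a `bool` here rather than the crate's `Result`.

```rust
/// 256 slots tracked as four 64-bit words; a set bit means "allocated".
#[derive(Default)]
struct Bitmap256 {
    bits: [u64; 4],
    count: u32,
}

impl Bitmap256 {
    /// Claims the lowest free slot, or returns `None` when all 256 are taken.
    fn allocate(&mut self) -> Option<u8> {
        for (word_index, word) in self.bits.iter_mut().enumerate() {
            let free = !*word; // set bits mark free slots in this word
            if free != 0 {
                let bit = free.trailing_zeros();
                *word |= 1u64 << bit;
                self.count += 1;
                return Some((word_index as u32 * 64 + bit) as u8);
            }
        }
        None
    }

    /// Frees a slot; returns `false` if the slot was not allocated.
    fn deallocate(&mut self, index: u8) -> bool {
        let (word, bit) = ((index / 64) as usize, index % 64);
        let mask = 1u64 << bit;
        if self.bits[word] & mask == 0 {
            return false;
        }
        self.bits[word] &= !mask;
        self.count -= 1;
        true
    }
}
```

Inverting a word turns the search for the first zero bit into a `trailing_zeros` call, so each allocation inspects at most four words.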
89
src/storage/chunk_engine/src/types/merge_state.rs
Normal file
@@ -0,0 +1,89 @@
|
||||
use std::collections::HashSet;
|
||||
|
||||
use super::super::*;
|
||||
use derse::Deserialize;
|
||||
|
||||
#[derive(Clone, Debug, Default, derse::Deserialize, derse::Serialize, PartialEq)]
|
||||
pub struct MergeState {
|
||||
pub acquire: HashSet<u8>,
|
||||
pub release: HashSet<u8>,
|
||||
}
|
||||
|
||||
impl MergeState {
|
||||
pub fn empty() -> Self {
|
||||
Self::default()
|
||||
}
|
||||
|
||||
pub fn from(value: &[u8]) -> Result<Self> {
|
||||
Self::deserialize(value).map_err(Error::SerializationError)
|
||||
}
|
||||
|
||||
pub fn acquire(pos: u8) -> Self {
|
||||
let mut b = Self::empty();
|
||||
b.acquire.insert(pos);
|
||||
b
|
||||
}
|
||||
|
||||
pub fn release(pos: u8) -> Self {
|
||||
let mut b = Self::empty();
|
||||
b.release.insert(pos);
|
||||
b
|
||||
}
|
||||
|
||||
pub fn merge(&mut self, right: &Self) {
|
||||
for pos in &right.acquire {
|
||||
self.acquire.insert(*pos);
|
||||
self.release.remove(pos);
|
||||
}
|
||||
for pos in &right.release {
|
||||
self.acquire.remove(pos);
|
||||
self.release.insert(*pos);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_merge_bits() {
|
||||
fn group_bits_apply(mut bits: GroupState, merge_bits: &MergeState) -> GroupState {
|
||||
bits.update(merge_bits);
|
||||
bits
|
||||
}
|
||||
|
||||
let state = GroupState::empty();
|
||||
|
||||
assert_eq!(group_bits_apply(state, &MergeState::empty()), state);
|
||||
|
||||
let acquire_first_bit = MergeState::acquire(0);
|
||||
let state_after_acquire = group_bits_apply(state, &acquire_first_bit);
|
||||
assert_eq!(state_after_acquire.as_bytes()[0], 1);
|
||||
assert_eq!(state_after_acquire.as_bytes()[1..], state.as_bytes()[1..]);
|
||||
|
||||
let release_first_bit = MergeState::release(0);
|
||||
let state_after_release = group_bits_apply(state_after_acquire, &release_first_bit);
|
||||
assert_eq!(state_after_release, state);
|
||||
|
||||
let mut merge_bits = acquire_first_bit;
|
||||
merge_bits.merge(&release_first_bit);
|
||||
assert_eq!(merge_bits, release_first_bit);
|
||||
assert_eq!(state, group_bits_apply(state, &merge_bits));
|
||||
|
||||
let mut merge_bits = MergeState::empty();
|
||||
for i in 0..=255 {
|
||||
merge_bits.merge(&MergeState::acquire(i));
|
||||
}
|
||||
let full_state = group_bits_apply(state, &merge_bits);
|
||||
assert!(full_state.is_full());
|
||||
|
||||
for i in 0..=255 {
|
||||
merge_bits.merge(&MergeState::release(i));
|
||||
}
|
||||
let empty_state = group_bits_apply(full_state, &merge_bits);
|
||||
assert_eq!(empty_state, state);
|
||||
|
||||
assert!(MergeState::from(&[]).is_err());
|
||||
}
|
||||
}
|
||||
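`MergeState::merge` folds a newer delta into an older one so that the latest operation on each slot wins, and `GroupState::update` then applies the folded delta to the bitmap. A stand-alone sketch of both steps follows. `Delta` and `apply` are hypothetical names, and a bare `[u64; 4]` stands in for `GroupState`.

```rust
use std::collections::HashSet;

/// Pending slot changes: positions to mark allocated and positions to mark free.
#[derive(Clone, Debug, Default, PartialEq)]
struct Delta {
    acquire: HashSet<u8>,
    release: HashSet<u8>,
}

impl Delta {
    fn acquire(pos: u8) -> Self {
        let mut delta = Self::default();
        delta.acquire.insert(pos);
        delta
    }

    fn release(pos: u8) -> Self {
        let mut delta = Self::default();
        delta.release.insert(pos);
        delta
    }

    /// Folds `newer` into `self`; the later operation on a slot replaces the earlier one.
    fn merge(&mut self, newer: &Self) {
        for &pos in &newer.acquire {
            self.acquire.insert(pos);
            self.release.remove(&pos);
        }
        for &pos in &newer.release {
            self.acquire.remove(&pos);
            self.release.insert(pos);
        }
    }

    /// Applies the delta to a 256-bit bitmap stored as four 64-bit words.
    fn apply(&self, bits: &mut [u64; 4]) {
        for &pos in &self.acquire {
            bits[(pos / 64) as usize] |= 1u64 << (pos % 64);
        }
        for &pos in &self.release {
            bits[(pos / 64) as usize] &= !(1u64 << (pos % 64));
        }
    }
}
```

Since a position never sits in both sets after `merge`, the order of the two loops in `apply` does not affect the result.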
13
src/storage/chunk_engine/src/types/mod.rs
Normal file
@@ -0,0 +1,13 @@
mod chunk_meta;
mod constants;
mod group_id;
mod group_state;
mod merge_state;
mod position;

pub use chunk_meta::*;
pub use constants::*;
pub use group_id::*;
pub use group_state::*;
pub use merge_state::*;
pub use position::*;
117
src/storage/chunk_engine/src/types/position.rs
Normal file
@@ -0,0 +1,117 @@
|
||||
use super::super::*;
|
||||
|
||||
use derse::{Deserialize, Deserializer, Serialize, Serializer};
|
||||
|
||||
#[derive(Copy, Clone, Eq, PartialEq, Hash, PartialOrd, Ord)]
|
||||
#[repr(C)]
|
||||
pub struct Position(pub u64);
|
||||
|
||||
impl Position {
|
||||
const SHIFT: u32 = 8;
|
||||
|
||||
    // 24bit chunk size + 8bit cluster + 24bit group + 8bit index
    pub const fn new(group_id: GroupId, index: u8) -> Self {
        const CLEAN: u64 = !((GroupId::COUNT - 1) as u64);
        Self(group_id.inner() & CLEAN | index as u64 | (group_id.cluster() as u64) << 32)
    }
|
||||
|
||||
pub fn group_id(&self) -> GroupId {
|
||||
const MARKS: u64 = (GroupId::COUNT - 1) as u64;
|
||||
const CLEAN: u64 = !(MARKS | MARKS << 32);
|
||||
GroupId::from(self.0 & CLEAN | self.cluster() as u64)
|
||||
}
|
||||
|
||||
pub fn chunk_size(&self) -> Size {
|
||||
Size::new(self.0 >> 40 << 8)
|
||||
}
|
||||
|
||||
pub fn cluster(&self) -> u8 {
|
||||
(self.0 >> 32) as u8
|
||||
}
|
||||
|
||||
pub fn group(&self) -> u32 {
|
||||
(self.0 as u32) >> Self::SHIFT
|
||||
}
|
||||
|
||||
pub fn index(&self) -> u8 {
|
||||
self.0 as u8
|
||||
}
|
||||
|
||||
pub fn offset(&self) -> Size {
|
||||
self.chunk_size() * self.0 as u32 as u64
|
||||
}
|
||||
}
|
||||
|
||||
impl Default for Position {
|
||||
fn default() -> Self {
|
||||
Position::new(GroupId::default(), 0)
|
||||
}
|
||||
}
|
||||
|
||||
impl From<u64> for Position {
|
||||
fn from(value: u64) -> Self {
|
||||
Self(value)
|
||||
}
|
||||
}
|
||||
|
||||
impl std::ops::Deref for Position {
|
||||
type Target = u64;
|
||||
|
||||
fn deref(&self) -> &Self::Target {
|
||||
&self.0
|
||||
}
|
||||
}
|
||||
|
||||
impl std::fmt::Debug for Position {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
write!(
|
||||
f,
|
||||
"Position {{ chunk_size: {}, cluster: {}, group: {}, index: {} }}",
|
||||
self.chunk_size(),
|
||||
self.cluster(),
|
||||
self.group(),
|
||||
self.index(),
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
impl Serialize for Position {
|
||||
fn serialize_to<T: Serializer>(&self, serializer: &mut T) -> derse::Result<()> {
|
||||
self.0.serialize_to(serializer)
|
||||
}
|
||||
}
|
||||
|
||||
impl<'a> Deserialize<'a> for Position {
|
||||
fn deserialize_from<T: Deserializer<'a>>(buf: &mut T) -> derse::Result<Self> {
|
||||
Ok(Self(u64::deserialize_from(buf)?))
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_group_id_and_position() {
|
||||
let group_id = GroupId::new(64 * Size::KB, 23, 233);
|
||||
assert_eq!(group_id.chunk_size(), 64 * Size::KB);
|
||||
assert_eq!(group_id.cluster(), 23);
|
||||
assert_eq!(group_id.group(), 233);
|
||||
assert_eq!(
|
||||
format!("{:?}", group_id),
|
||||
"GroupId { chunk_size: 64KiB, cluster: 23, group: 233 }"
|
||||
);
|
||||
|
||||
let position = Position::new(group_id, 223);
|
||||
assert_eq!(position.chunk_size(), 64 * Size::KB);
|
||||
assert_eq!(position.cluster(), 23);
|
||||
assert_eq!(position.group(), 233);
|
||||
assert_eq!(position.index(), 223);
|
||||
assert_eq!(position.group_id(), group_id);
|
||||
assert_eq!(position.to_be_bytes().len(), 8);
|
||||
assert_eq!(
|
||||
format!("{:?}", position),
|
||||
"Position { chunk_size: 64KiB, cluster: 23, group: 233, index: 223 }"
|
||||
);
|
||||
}
|
||||
}
|
||||
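The bit layout used by `Position` can be tried out in isolation. The following is a minimal sketch, assuming the field widths read off the accessors above (24-bit chunk size stored in 256-byte units, 8-bit cluster, 24-bit group, 8-bit index). `PackedPosition` and `pack` are hypothetical names for illustration and are not part of the crate.

```rust
/// Hypothetical stand-in for the packed 64-bit position word.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PackedPosition(pub u64);

impl PackedPosition {
    /// Packs the four fields into one word.
    /// Returns `None` if `chunk_size` is not a multiple of 256, does not fit
    /// in 24 bits after dividing by 256, or if `group` does not fit in 24 bits.
    pub fn pack(chunk_size: u64, cluster: u8, group: u32, index: u8) -> Option<Self> {
        if chunk_size % 256 != 0 || chunk_size >> 32 != 0 || group >> 24 != 0 {
            return None;
        }
        // The chunk size is stored in units of 256 bytes in the top 24 bits.
        let word = ((chunk_size >> 8) << 40)
            | ((cluster as u64) << 32)
            | ((group as u64) << 8)
            | (index as u64);
        Some(Self(word))
    }

    pub fn chunk_size(self) -> u64 {
        self.0 >> 40 << 8
    }

    pub fn cluster(self) -> u8 {
        (self.0 >> 32) as u8
    }

    pub fn group(self) -> u32 {
        (self.0 as u32) >> 8
    }

    pub fn index(self) -> u8 {
        self.0 as u8
    }

    /// Byte offset: chunk size times the 32-bit slot number (group and index).
    pub fn offset(self) -> u64 {
        self.chunk_size() * (self.0 as u32 as u64)
    }
}
```

Because group and index occupy the low 32 bits contiguously, the slot number used for the offset is simply `group * 256 + index`.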
21
src/storage/chunk_engine/src/utils/aligned.rs
Normal file
@@ -0,0 +1,21 @@
use super::super::Size;

pub const ALIGN_SIZE: Size = Size::new(512);

pub fn create_aligned_vec(size: Size) -> Vec<u8> {
    let s: usize = size.into();
    let layout = std::alloc::Layout::from_size_align(s, ALIGN_SIZE.into()).unwrap();
    // The memory is zero-initialized because the returned Vec exposes all `s` bytes.
    // NOTE: the buffer is allocated with 512-byte alignment but is later freed by
    // `Vec<u8>` with alignment 1, which `Vec::from_raw_parts` documents as a
    // requirement violation; callers should be aware of this caveat.
    unsafe { Vec::from_raw_parts(std::alloc::alloc_zeroed(layout), s, s) }
}

pub fn is_aligned_buf(data: &[u8]) -> bool {
    data.as_ptr() as u64 % ALIGN_SIZE.0 == 0 && data.len() as u64 % ALIGN_SIZE.0 == 0
}

pub fn is_aligned_len(len: u32) -> bool {
    len % ALIGN_SIZE.0 as u32 == 0
}

pub fn is_aligned_io(data: &[u8], offset: u32) -> bool {
    is_aligned_buf(data) && is_aligned_len(offset)
}
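One way to hold a 512-byte-aligned buffer without handing it to `Vec<u8>` is a small owning type that remembers its `Layout` and frees with it. This is a sketch under stated assumptions, not the crate's approach: `AlignedBuf` and `ALIGN` are hypothetical names, and zero-length buffers are simply rejected.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ops::{Deref, DerefMut};
use std::ptr::NonNull;

/// Alignment assumed for direct I/O in this sketch.
pub const ALIGN: usize = 512;

/// Hypothetical owned, zero-initialized buffer aligned to `ALIGN` bytes.
pub struct AlignedBuf {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl AlignedBuf {
    /// Allocates `len` zeroed bytes; panics if `len` is zero.
    pub fn new(len: usize) -> Self {
        assert!(len > 0, "zero-sized allocations are not supported");
        let layout = Layout::from_size_align(len, ALIGN).expect("invalid layout");
        // SAFETY: the layout has a non-zero size.
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        Self { ptr, layout }
    }
}

impl Deref for AlignedBuf {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        // SAFETY: `ptr` points to `layout.size()` initialized bytes owned by `self`.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl DerefMut for AlignedBuf {
    fn deref_mut(&mut self) -> &mut [u8] {
        // SAFETY: as above, and `&mut self` guarantees exclusive access.
        unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: the pointer was allocated with exactly this layout.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}
```

Since the type dereferences to `[u8]`, it can be passed anywhere a slice is expected, including checks such as `is_aligned_buf`.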
1
src/storage/chunk_engine/src/utils/bytes.rs
Normal file
@@ -0,0 +1 @@
pub type Bytes = tinyvec::TinyVec<[u8; 28]>;
15
src/storage/chunk_engine/src/utils/mod.rs
Normal file
@@ -0,0 +1,15 @@
mod aligned;
mod bytes;
mod result;
mod shards_map;
mod shards_set;
mod size;
mod worker;

pub use aligned::*;
pub use bytes::*;
pub use result::*;
pub use shards_map::*;
pub use shards_set::*;
pub use size::*;
pub use worker::*;
34
src/storage/chunk_engine/src/utils/result.rs
Normal file
@@ -0,0 +1,34 @@
#[derive(Debug, PartialEq)]
pub enum Error {
    IoError(String),
    RocksDBError(String),
    MetaError(String),
    InvalidArg(String),
    SerializationError(derse::Error),
    ChecksumMismatch(String),
    ChainVersionMismatch(String),
    ChunkETagMismatch(String),
    ChunkAlreadyExists,
    ChunkCommittedUpdate(String),
    ChunkMissingUpdate(String),
    NoSpace,
}

pub type Result<T> = std::result::Result<T, Error>;

impl std::fmt::Display for Error {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        std::fmt::Debug::fmt(self, f)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_error_display() {
        let error = Error::InvalidArg("invalid pos".into());
        assert_eq!(error.to_string(), r#"InvalidArg("invalid pos")"#);
    }
}
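An error enum whose `Display` forwards to `Debug`, as above, can also be made usable with the `?` operator by adding `From` conversions. The sketch below assumes a cut-down enum named `EngineError` and a helper `read_len`, both hypothetical. It only illustrates the pattern and does not claim the crate converts errors this way.

```rust
use std::fmt;

/// Hypothetical, reduced version of the error enum for illustration.
#[derive(Debug, PartialEq)]
pub enum EngineError {
    IoError(String),
    InvalidArg(String),
    NoSpace,
}

impl fmt::Display for EngineError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Same choice as the original: Display reuses the Debug rendering.
        fmt::Debug::fmt(self, f)
    }
}

impl std::error::Error for EngineError {}

impl From<std::io::Error> for EngineError {
    fn from(e: std::io::Error) -> Self {
        // Keep only the message so the enum can stay `PartialEq`.
        EngineError::IoError(e.to_string())
    }
}

/// Returns the length of the file at `path`, mapping failures into `EngineError`.
pub fn read_len(path: &str) -> Result<u64, EngineError> {
    if path.is_empty() {
        return Err(EngineError::InvalidArg("empty path".into()));
    }
    // `?` converts `std::io::Error` through the `From` impl above.
    Ok(std::fs::metadata(path)?.len())
}
```

Storing the message as a `String` rather than the `std::io::Error` itself is what allows tests to compare errors with `assert_eq!`.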
153
src/storage/chunk_engine/src/utils/shards_map.rs
Normal file
@@ -0,0 +1,153 @@
use std::borrow::Borrow;
use std::collections::{
    hash_map::{DefaultHasher, Entry},
    HashMap,
};
use std::hash::{Hash, Hasher};

pub struct ShardsMap<K, V, const S: usize = 64> {
    shards: [HashMap<K, V>; S],
}

pub struct ShardsMapIter<'a, K, V> {
    array_it: std::slice::Iter<'a, HashMap<K, V>>,
    inner_it: std::collections::hash_map::Iter<'a, K, V>,
}

impl<K, V, const S: usize> ShardsMap<K, V, S>
where
    K: Eq + Hash,
{
    pub fn new() -> Self {
        Self {
            shards: [(); S].map(|_| Default::default()),
        }
    }

    pub fn with_capacity(capacity: usize) -> Self {
        let cap = (capacity / S).next_power_of_two();
        Self {
            shards: [(); S].map(|_| HashMap::with_capacity(cap)),
        }
    }

    fn shard<Q>(key: &Q) -> usize
    where
        K: Borrow<Q>,
        Q: Eq + Hash + ?Sized,
    {
        let mut s = DefaultHasher::new();
        key.hash(&mut s);
        s.finish() as usize % S
    }

    pub fn get<Q>(&self, k: &Q) -> Option<&V>
    where
        K: Borrow<Q>,
        Q: Eq + Hash + ?Sized,
    {
        self.shards[Self::shard(k)].get(k)
    }

    pub fn get_mut<Q>(&mut self, k: &Q) -> Option<&mut V>
    where
        K: Borrow<Q>,
        Q: Eq + Hash + ?Sized,
    {
        self.shards[Self::shard(k)].get_mut(k)
    }

    pub fn is_empty(&self) -> bool {
        self.shards.iter().all(|m| m.is_empty())
    }

    pub fn len(&self) -> usize {
        self.shards.iter().map(|m| m.len()).sum()
    }

    pub fn iter(&self) -> ShardsMapIter<'_, K, V> {
        ShardsMapIter {
            array_it: self.shards[1..].iter(),
            inner_it: self.shards[0].iter(),
        }
    }

    pub fn insert(&mut self, k: K, v: V) -> Option<V> {
        self.shards[Self::shard(&k)].insert(k, v)
    }

    pub fn entry(&mut self, k: K) -> Entry<'_, K, V> {
        self.shards[Self::shard(&k)].entry(k)
    }

    pub fn remove<Q>(&mut self, k: &Q) -> Option<V>
    where
        K: Borrow<Q>,
        Q: Eq + Hash + ?Sized,
    {
        self.shards[Self::shard(k)].remove(k)
    }
}

impl<K, V, const S: usize> Default for ShardsMap<K, V, S>
where
    K: Eq + Hash,
{
    fn default() -> Self {
        Self::new()
    }
}

impl<'a, K, V> Iterator for ShardsMapIter<'a, K, V> {
    type Item = (&'a K, &'a V);

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            if let Some(value) = self.inner_it.next() {
                return Some(value);
            } else if let Some(map) = self.array_it.next() {
                self.inner_it = map.iter();
            } else {
                return None;
            }
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_shards_map() {
        let mut map = ShardsMap::<usize, usize, 4>::with_capacity(1024);
        assert!(map.is_empty());
        assert_eq!(map.len(), 0);

        const N: usize = 1024;
        for i in 0..N {
            assert!(map.get(&i).is_none());
            map.insert(i, i * i);
        }
        assert!(!map.is_empty());
        assert_eq!(map.len(), N);

        assert_eq!(
            map.iter()
                .map(|(k, v)| {
                    assert_eq!(k * k, *v);
                })
                .count(),
            N
        );

        for i in 0..N {
            let value = map.get_mut(&i).unwrap();
            assert_eq!(i * i, *value);
            map.entry(i).and_modify(|v| *v += 1);
            assert_eq!(map.remove(&i).unwrap(), i * i + 1);
        }

        assert!(ShardsMap::<usize, usize, 4>::default().is_empty());
    }
}
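The shard selection used by `ShardsMap` is a hash of the key reduced modulo the shard count. The sketch below isolates that step so the distribution can be inspected. `shard_of` and `shard_histogram` are hypothetical helper names, and the shard count is a runtime argument here rather than a const generic.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Maps a key to one of `shards` buckets by hashing it with `DefaultHasher`.
/// `shards` must be non-zero.
pub fn shard_of<K: Hash + ?Sized>(key: &K, shards: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish() as usize % shards
}

/// Counts how many of the keys `0..n` land in each shard.
pub fn shard_histogram(n: usize, shards: usize) -> Vec<usize> {
    let mut counts = vec![0; shards];
    for key in 0..n {
        counts[shard_of(&key, shards)] += 1;
    }
    counts
}
```

`DefaultHasher::new()` always starts from the same state, so a given key maps to the same shard on every call. The `Borrow` contract requires a borrowed key to hash like the owned key, which is why a lookup through `&Q` reaches the shard chosen at insertion.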
128
src/storage/chunk_engine/src/utils/shards_set.rs
Normal file
@@ -0,0 +1,128 @@
use std::borrow::Borrow;
use std::collections::{hash_map::DefaultHasher, HashSet};
use std::hash::{Hash, Hasher};

pub struct ShardsSet<T, const S: usize = 64> {
    shards: [HashSet<T>; S],
}

pub struct ShardsSetIter<'a, T> {
    array_it: std::slice::Iter<'a, HashSet<T>>,
    inner_it: std::collections::hash_set::Iter<'a, T>,
}

impl<T, const S: usize> ShardsSet<T, S>
where
    T: Eq + Hash,
{
    pub fn new() -> Self {
        Self {
            shards: [(); S].map(|_| Default::default()),
        }
    }

    pub fn with_capacity(capacity: usize) -> Self {
        let cap = (capacity / S).next_power_of_two();
        Self {
            shards: [(); S].map(|_| HashSet::with_capacity(cap)),
        }
    }

    fn shard<Q>(key: &Q) -> usize
    where
        T: Borrow<Q>,
        Q: Eq + Hash + ?Sized,
    {
        let mut s = DefaultHasher::new();
        key.hash(&mut s);
        s.finish() as usize % S
    }

    pub fn contains<Q>(&self, value: &Q) -> bool
    where
        T: Borrow<Q>,
        Q: Eq + Hash + ?Sized,
    {
        self.shards[Self::shard(value)].contains(value)
    }

    pub fn is_empty(&self) -> bool {
        self.shards.iter().all(|m| m.is_empty())
    }

    pub fn len(&self) -> usize {
        self.shards.iter().map(|m| m.len()).sum()
    }

    pub fn iter(&self) -> ShardsSetIter<'_, T> {
        ShardsSetIter {
            array_it: self.shards[1..].iter(),
            inner_it: self.shards[0].iter(),
        }
    }

    pub fn insert(&mut self, value: T) -> bool {
        self.shards[Self::shard(&value)].insert(value)
    }

    pub fn remove<Q>(&mut self, value: &Q) -> bool
    where
        T: Borrow<Q>,
        Q: Eq + Hash + ?Sized,
    {
        self.shards[Self::shard(value)].remove(value)
    }
}

impl<T, const S: usize> Default for ShardsSet<T, S>
where
    T: Eq + Hash,
{
    fn default() -> Self {
        Self::new()
    }
}

impl<'a, T> Iterator for ShardsSetIter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            if let Some(value) = self.inner_it.next() {
                return Some(value);
            } else if let Some(map) = self.array_it.next() {
                self.inner_it = map.iter();
            } else {
                return None;
            }
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_shards_set() {
        let mut set = ShardsSet::<usize, 4>::with_capacity(1024);
        assert!(set.is_empty());
        assert_eq!(set.len(), 0);

        const N: usize = 1024;
        for i in 0..N {
            assert!(!set.contains(&i));
            assert!(set.insert(i));
        }
        assert!(!set.is_empty());
        assert_eq!(set.len(), N);

        for i in 0..N {
            assert!(set.contains(&i));
            assert!(set.remove(&i));
            assert!(!set.remove(&i));
        }

        assert!(ShardsSet::<usize>::default().is_empty());
    }
}
283
src/storage/chunk_engine/src/utils/size.rs
Normal file
@@ -0,0 +1,283 @@
#[derive(Default, Copy, Clone, Eq, PartialEq, Hash, PartialOrd, Ord)]
#[repr(C)]
pub struct Size(pub u64);

impl Size {
    pub const B: Size = Size::byte(1);
    pub const KB: Size = Size::kibibyte(1);
    pub const MB: Size = Size::mebibyte(1);
    pub const GB: Size = Size::gibibyte(1);
    pub const TB: Size = Size::tebibyte(1);

    pub const fn new(v: u64) -> Size {
        Size(v)
    }

    pub const fn zero() -> Size {
        Size::new(0)
    }

    pub const fn byte(value: u64) -> Size {
        Size::new(value)
    }

    pub const fn kibibyte(value: u64) -> Size {
        Size::new(value << 10)
    }

    pub const fn mebibyte(value: u64) -> Size {
        Size::new(value << 20)
    }

    pub const fn gibibyte(value: u64) -> Size {
        Size::new(value << 30)
    }

    pub const fn tebibyte(value: u64) -> Size {
        Size::new(value << 40)
    }

    pub fn around(&self) -> String {
        if self.0 == 0 {
            "0B".to_string()
        } else if *self * 2 >= Self::TB {
            format!("{:.2}TiB", (self.0 as f64 / Self::TB.0 as f64))
        } else if *self * 2 >= Self::GB {
            format!("{:.2}GiB", (self.0 as f64 / Self::GB.0 as f64))
        } else if *self * 2 >= Self::MB {
            format!("{:.2}MiB", (self.0 as f64 / Self::MB.0 as f64))
        } else if *self * 2 >= Self::KB {
            format!("{:.2}KiB", (self.0 as f64 / Self::KB.0 as f64))
        } else {
            format!("{}B", self.0)
        }
    }

    pub fn is_power_of_two(&self) -> bool {
        self.0.is_power_of_two()
    }

    pub fn next_power_of_two(&self) -> Size {
        Size(self.0.next_power_of_two())
    }

    pub fn trailing_zeros(&self) -> u32 {
        self.0.trailing_zeros()
    }
}

macro_rules! impl_trait_for_size {
    ($($t:ty),*) => {
        $(impl From<$t> for Size {
            fn from(value: $t) -> Self {
                Self::new(value as _)
            }
        }

        impl From<Size> for $t {
            fn from(val: Size) -> Self {
                val.0 as _
            }
        }

        impl PartialEq<$t> for Size {
            fn eq(&self, other: &$t) -> bool {
                self.0 == *other as u64
            }
        }

        impl PartialEq<Size> for $t {
            fn eq(&self, other: &Size) -> bool {
                *self as u64 == other.0
            }
        }

        impl std::ops::Add<$t> for Size {
            type Output = Size;

            fn add(self, rhs: $t) -> Self::Output {
                Size::new(self.0 + rhs as u64)
            }
        }

        impl std::ops::Add<Size> for $t {
            type Output = Size;

            fn add(self, rhs: Size) -> Self::Output {
                Size::new(self as u64 + rhs.0)
            }
        }

        impl std::ops::AddAssign<$t> for Size {
            fn add_assign(&mut self, rhs: $t) {
                self.0 += rhs as u64;
            }
        }

        impl std::ops::Mul<$t> for Size {
            type Output = Size;

            fn mul(self, rhs: $t) -> Self::Output {
                Size::new(self.0 * rhs as u64)
            }
        }

        impl std::ops::Mul<Size> for $t {
            type Output = Size;

            fn mul(self, rhs: Size) -> Self::Output {
                Size::new(self as u64 * rhs.0)
            }
        }

        impl std::ops::MulAssign<$t> for Size {
            fn mul_assign(&mut self, rhs: $t) {
                self.0 *= rhs as u64;
            }
        }

        impl std::ops::Rem<$t> for Size {
            type Output = Size;

            fn rem(self, rhs: $t) -> Self::Output {
                Size::new(self.0 % rhs as u64)
            }
        }

        impl std::ops::Rem<Size> for $t {
            type Output = Size;

            fn rem(self, rhs: Size) -> Self::Output {
                Size::new(self as u64 % rhs.0)
            }
        }
        )*
    };
}

impl_trait_for_size! {i32, i64, u32, u64, usize}

impl std::ops::Add<Self> for Size {
    type Output = Size;

    fn add(self, rhs: Self) -> Self::Output {
        Size::new(self.0 + rhs.0)
    }
}

impl std::ops::AddAssign<Self> for Size {
    fn add_assign(&mut self, rhs: Self) {
        self.0 += rhs.0;
    }
}

impl std::ops::Sub<Self> for Size {
    type Output = Size;

    fn sub(self, rhs: Self) -> Self::Output {
        Size::new(self.0 - rhs.0)
    }
}

impl std::ops::Mul<Self> for Size {
    type Output = Size;

    fn mul(self, rhs: Self) -> Self::Output {
        Size::new(self.0 * rhs.0)
    }
}

impl std::ops::Div<Self> for Size {
    type Output = Size;

    fn div(self, rhs: Self) -> Self::Output {
        Size::new(self.0 / rhs.0)
    }
}

impl std::ops::Rem<Self> for Size {
    type Output = Size;

    fn rem(self, rhs: Self) -> Self::Output {
        Size::new(self.0 % rhs.0)
    }
}

impl std::fmt::Display for Size {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        if self.0 == 0 {
            write!(f, "0B")
        } else if *self % Self::TB == 0 {
            write!(f, "{}TiB", (*self / Self::TB).0)
        } else if *self % Self::GB == 0 {
            write!(f, "{}GiB", (*self / Self::GB).0)
        } else if *self % Self::MB == 0 {
            write!(f, "{}MiB", (*self / Self::MB).0)
        } else if *self % Self::KB == 0 {
            write!(f, "{}KiB", (*self / Self::KB).0)
        } else {
            write!(f, "{}B", self.0)
        }
    }
}

impl std::fmt::Debug for Size {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{}", self.around())
    }
}

#[cfg(test)]
mod tests {
    #[test]
    fn test_size() {
        use super::Size;

        let size = Size::zero();
        assert_eq!(size, Size::new(0));
        assert_eq!(size.to_string(), "0B".to_string());

        let size = Size::kibibyte(64);
        assert_eq!(size, Size::new(65536));
        assert_eq!(size.to_string(), "64KiB".to_string());

        let size: Size = Size::MB * 23;
        assert_eq!(size, Size::new(23 << 20));
        assert_eq!(size.to_string(), "23MiB".to_string());

        let size: Size = 233 * Size::GB;
        assert_eq!(size, Size::new(233 << 30));
        assert_eq!(size.to_string(), "233GiB".to_string());

        assert_eq!(format!("{}", Size::zero()), "0B".to_string());
        assert_eq!(format!("{}", Size::byte(233)), "233B".to_string());
        assert_eq!(format!("{}", Size::byte(512)), "512B".to_string());
        assert_eq!(format!("{}", Size::kibibyte(512)), "512KiB".to_string());
        assert_eq!(format!("{}", Size::mebibyte(512)), "512MiB".to_string());
        assert_eq!(format!("{}", Size::gibibyte(512)), "512GiB".to_string());
        assert_eq!(format!("{}", Size::tebibyte(512)), "512TiB".to_string());

        assert_eq!(format!("{:?}", Size::zero()), "0B".to_string());
        assert_eq!(format!("{:?}", Size::byte(233)), "233B".to_string());
        assert_eq!(format!("{:?}", Size::byte(512)), "0.50KiB".to_string());
        assert_eq!(format!("{:?}", Size::kibibyte(512)), "0.50MiB".to_string());
        assert_eq!(format!("{:?}", Size::mebibyte(512)), "0.50GiB".to_string());
        assert_eq!(format!("{:?}", Size::gibibyte(512)), "0.50TiB".to_string());
        assert_eq!(format!("{:?}", Size::tebibyte(512)), "512.00TiB".to_owned());

        let r = rand::random::<u64>() % 1024;
        assert_eq!(0 + Size::kibibyte(r), Size::from(r << 10));
        assert_eq!(Size::mebibyte(r) + 0, Size::from(r << 20));
        assert_eq!(1 * Size::gibibyte(r), Size::from(r << 30));
        assert_eq!(Size::tebibyte(r) * 1, Size::from(r << 40));

        assert_eq!(Size::KB * Size::KB, Size::MB);
        let mut size = Size::B;
        size *= 1024;
        assert_eq!(size, Size::KB);
        assert_eq!(size % 1000, 24);

        assert_eq!(Size::KB + Size::KB, Size::kibibyte(2));
        assert_eq!(Size::KB % 1000, Size(24));
    }
}
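`Size` renders in two ways: `Display` picks the largest binary unit that divides the value exactly, while `Debug` (through `around`) picks the largest unit that the value reaches at least half of and prints two decimals. The sketch below restates both rules on a plain `u64`. `exact_unit` and `rounded_unit` are hypothetical names, and `saturating_mul` is an addition that the original does not use.

```rust
/// Binary units from largest to smallest.
const UNITS: [(u64, &str); 4] = [
    (1 << 40, "TiB"),
    (1 << 30, "GiB"),
    (1 << 20, "MiB"),
    (1 << 10, "KiB"),
];

/// Exact rule: the largest unit that divides `bytes` without remainder.
pub fn exact_unit(bytes: u64) -> String {
    if bytes == 0 {
        return "0B".to_string();
    }
    for (unit, name) in UNITS {
        if bytes % unit == 0 {
            return format!("{}{}", bytes / unit, name);
        }
    }
    format!("{}B", bytes)
}

/// Rounded rule: the largest unit for which `bytes` is at least half a unit,
/// printed with two decimals.
pub fn rounded_unit(bytes: u64) -> String {
    if bytes == 0 {
        return "0B".to_string();
    }
    for (unit, name) in UNITS {
        // saturating_mul avoids overflow for values above u64::MAX / 2.
        if bytes.saturating_mul(2) >= unit {
            return format!("{:.2}{}", bytes as f64 / unit as f64, name);
        }
    }
    format!("{}B", bytes)
}
```

The two rules differ for the same input: 512 bytes is `512B` under the exact rule and `0.50KiB` under the rounded rule, which matches the expectations in `test_size`.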
136
src/storage/chunk_engine/src/utils/worker.rs
Normal file
@@ -0,0 +1,136 @@
use std::{
    sync::{
        atomic::{AtomicBool, Ordering},
        Arc, Condvar, Mutex,
    },
    thread::JoinHandle,
};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WorkerState {
    Continue,
    Pause,
    Wait(std::time::Duration),
    Stop,
}

#[derive(Default)]
pub struct WorkerBuilder {
    name: Option<String>,
    condvar: Option<Arc<Condvar>>,
}

impl WorkerBuilder {
    pub fn name(mut self, str: String) -> Self {
        self.name = Some(str);
        self
    }

    pub fn cond(mut self, condvar: Arc<Condvar>) -> Self {
        self.condvar = Some(condvar);
        self
    }

    pub fn spawn<F>(self, f: F) -> Worker
    where
        F: FnMut() -> WorkerState + Send + 'static,
    {
        Worker::new(f, self.name, self.condvar)
    }
}

pub struct Worker {
    stopping: Arc<AtomicBool>,
    condvar: Arc<Condvar>,
    handle: Option<JoinHandle<()>>,
}

impl Worker {
    pub fn new<F>(mut f: F, name: Option<String>, condvar: Option<Arc<Condvar>>) -> Worker
    where
        F: FnMut() -> WorkerState + Send + 'static,
    {
        let stopping = Arc::new(AtomicBool::default());
        let stopping_clone = stopping.clone();
        let condvar = condvar.unwrap_or_default();
        let condvar_clone = condvar.clone();

        let builder = if let Some(name) = name {
            std::thread::Builder::new().name(name)
        } else {
            std::thread::Builder::new()
        };
        let handle = Some(
            builder
                .spawn(move || {
                    let mutex = Mutex::new(());
                    while !stopping_clone.load(Ordering::Acquire) {
                        match f() {
                            WorkerState::Continue => continue,
                            WorkerState::Pause => {
                                drop(condvar_clone.wait(mutex.lock().unwrap()).unwrap());
                            }
                            WorkerState::Wait(duration) => {
                                drop(
                                    condvar_clone
                                        .wait_timeout(mutex.lock().unwrap(), duration)
                                        .unwrap(),
                                );
                            }
                            WorkerState::Stop => break,
                        }
                    }
                })
                .unwrap(),
        );

        Worker {
            stopping,
            condvar,
            handle,
        }
    }

    pub fn stop_and_join(&mut self) {
        self.stopping.store(true, Ordering::Release);
        self.condvar.notify_all();
        if let Some(handle) = self.handle.take() {
            handle.join().unwrap();
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_worker() {
        let count = Arc::new(std::sync::atomic::AtomicUsize::default());
        let condvar = Default::default();
        let count_clone = count.clone();
        let mut worker = WorkerBuilder::default()
            .name("Worker".into())
            .cond(condvar)
            .spawn(move || {
                if count_clone.fetch_add(1, Ordering::SeqCst) + 1 < 10 {
                    WorkerState::Continue
                } else {
                    WorkerState::Pause
                }
            });

        while count.load(Ordering::Acquire) < 10 {
            std::thread::sleep(std::time::Duration::from_millis(10));
        }

        worker.stop_and_join();
        assert_eq!(count.load(Ordering::Acquire), 10);
    }

    #[test]
    fn test_worker_2() {
        let worker = WorkerBuilder::default().spawn(move || WorkerState::Stop);
        let _ = worker.handle.unwrap().join();
    }
}
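In the worker above, the stop flag is an atomic and the mutex passed to the condition variable is created inside the thread. A `notify_all` issued after the thread has checked `stopping` but before it starts waiting is therefore not observed by that wait. A common alternative keeps the flag under the same mutex the condition variable uses. The sketch below shows that variant. `PausableWorker` is a hypothetical type and is not a drop-in replacement for `Worker`.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread::JoinHandle;

/// Hypothetical worker that runs `step`, then sleeps until woken or stopped.
pub struct PausableWorker {
    shared: Arc<(Mutex<bool>, Condvar)>,
    handle: Option<JoinHandle<u64>>,
}

impl PausableWorker {
    pub fn spawn<F>(mut step: F) -> Self
    where
        F: FnMut() + Send + 'static,
    {
        let shared = Arc::new((Mutex::new(false), Condvar::new()));
        let shared_clone = Arc::clone(&shared);
        let handle = std::thread::spawn(move || {
            let (lock, cvar) = &*shared_clone;
            let mut rounds = 0u64;
            loop {
                step();
                rounds += 1;
                let stopped = lock.lock().unwrap();
                if *stopped {
                    break;
                }
                // `wait` releases the mutex atomically, so a stop request cannot
                // slip in between the check above and the start of the wait.
                // A spurious wakeup only causes one extra `step` call.
                let stopped = cvar.wait(stopped).unwrap();
                if *stopped {
                    break;
                }
            }
            rounds
        });
        Self { shared, handle: Some(handle) }
    }

    /// Wakes the worker so that it runs `step` again.
    pub fn wake(&self) {
        self.shared.1.notify_all();
    }

    /// Requests a stop and returns the number of completed rounds,
    /// or `None` if the worker was already joined.
    pub fn stop_and_join(&mut self) -> Option<u64> {
        *self.shared.0.lock().unwrap() = true;
        self.shared.1.notify_all();
        self.handle.take().map(|h| h.join().unwrap())
    }
}
```

The cost of this design is one mutex acquisition per round; in exchange, `stop_and_join` cannot block on a wakeup that was sent too early.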