	3FS Setup Guide
This section provides a manual deployment guide for setting up a six-node cluster with the cluster ID stage.
Installation prerequisites
Hardware specifications
| Node | OS | IP | Memory | SSD | RDMA | 
|---|---|---|---|---|---|
| meta | Ubuntu 22.04 | 192.168.1.1 | 128GB | - | RoCE | 
| storage1 | Ubuntu 22.04 | 192.168.1.2 | 512GB | 14TB × 16 | RoCE | 
| storage2 | Ubuntu 22.04 | 192.168.1.3 | 512GB | 14TB × 16 | RoCE | 
| storage3 | Ubuntu 22.04 | 192.168.1.4 | 512GB | 14TB × 16 | RoCE | 
| storage4 | Ubuntu 22.04 | 192.168.1.5 | 512GB | 14TB × 16 | RoCE | 
| storage5 | Ubuntu 22.04 | 192.168.1.6 | 512GB | 14TB × 16 | RoCE | 
RDMA Configuration
- Assign IP addresses to RDMA NICs. Multiple RDMA NICs (InfiniBand or RoCE) are supported on each node.
- Check RDMA connectivity between nodes using ib_write_bw.
Third-party dependencies
In a production environment, it is recommended to install FoundationDB and ClickHouse on dedicated nodes. In this guide, both run on the meta node.
| Service | Node | 
|---|---|
| ClickHouse | meta | 
| FoundationDB | meta | 
FoundationDB
- Ensure that the version of the FoundationDB client matches the server version, or copy the corresponding version of libfdb_c.so to maintain compatibility.
- Find the fdb.cluster file and libfdb_c.so at /etc/foundationdb/fdb.cluster and /usr/lib/libfdb_c.so on nodes with FoundationDB installed.
Step 0: Build 3FS
Follow the instructions to build 3FS. Binaries can be found in build/bin.
Services and clients
The following steps show how to install 3FS services in /opt/3fs/bin and the config files in /opt/3fs/etc.
| Service | Binary | Config files | NodeID | Node | 
|---|---|---|---|---|
| monitor | monitor_collector_main | monitor_collector_main.toml | - | meta | 
| admin_cli | admin_cli | admin_cli.toml, fdb.cluster | - | meta, storage1, storage2, storage3, storage4, storage5 | 
| mgmtd | mgmtd_main | mgmtd_main_launcher.toml, mgmtd_main.toml, mgmtd_main_app.toml, fdb.cluster | 1 | meta | 
| meta | meta_main | meta_main_launcher.toml, meta_main.toml, meta_main_app.toml, fdb.cluster | 100 | meta | 
| storage | storage_main | storage_main_launcher.toml, storage_main.toml, storage_main_app.toml | 10001~10005 | storage1, storage2, storage3, storage4, storage5 | 
| client | hf3fs_fuse_main | hf3fs_fuse_main_launcher.toml, hf3fs_fuse_main.toml | - | meta | 
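The node ID assignment in the table can be written down as a small lookup, which may help when templating the *_app.toml files. This is a minimal Python sketch: the name node_id_plan is hypothetical, and the assumption that storage IDs follow host order (storage1 gets 10001, storage2 gets 10002, and so on) is illustrative rather than something 3FS requires.

```python
# Hypothetical helper: maps each service instance to the node ID used in this guide.
def node_id_plan(num_storage_nodes=5, storage_id_base=10001):
    """Return {(service, host): node_id} for the layout in the table above."""
    plan = {("mgmtd", "meta"): 1, ("meta", "meta"): 100}
    for i in range(num_storage_nodes):
        # Assumption: storage<i+1> receives storage_id_base + i.
        plan[("storage", f"storage{i + 1}")] = storage_id_base + i
    # This sketch rejects a plan in which two instances share an ID.
    if len(set(plan.values())) != len(plan):
        raise ValueError("duplicate node ID in plan")
    return plan


if __name__ == "__main__":
    for (service, host), node_id in sorted(node_id_plan().items(), key=lambda kv: kv[1]):
        print(f"{host:10s} {service:8s} node_id = {node_id}")
```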
Step 1: Create ClickHouse tables for metrics
Import the SQL file into ClickHouse:
clickhouse-client -n < ~/3fs/deploy/sql/3fs-monitor.sql
Step 2: Monitor service
Install monitor_collector service on the meta node.
- Copy monitor_collector_main to /opt/3fs/bin and config files to /opt/3fs/etc, and create log directory /var/log/3fs.
mkdir -p /opt/3fs/{bin,etc}
mkdir -p /var/log/3fs
cp ~/3fs/build/bin/monitor_collector_main /opt/3fs/bin
cp ~/3fs/configs/monitor_collector_main.toml /opt/3fs/etc
- Update monitor_collector_main.toml to add a ClickHouse connection:
[server.monitor_collector.reporter]
type = 'clickhouse'
[server.monitor_collector.reporter.clickhouse]
db = '3fs'
host = '<CH_HOST>'
passwd = '<CH_PASSWD>'
port = '<CH_PORT>'
user = '<CH_USER>'
- Start monitor service:
cp ~/3fs/deploy/systemd/monitor_collector_main.service /usr/lib/systemd/system
systemctl start monitor_collector_main
Note that
- Multiple instances of monitor services can be deployed behind a virtual IP address to share the traffic.
- Other services communicate with the monitor service over a TCP connection.
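Because other services reach the monitor over TCP, a quick reachability probe can rule out address or firewall mistakes before the later steps. The following is a minimal Python sketch, assuming the monitor listens on 192.168.1.1:10000 (the address used for remote_ip elsewhere in this guide). The helper name can_connect is hypothetical, and a successful connection only shows that the port is open, not that the collector is healthy.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # Address assumed from the remote_ip setting used in this guide.
    print("monitor reachable:", can_connect("192.168.1.1", 10000))
```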
Step 3: Admin client
Install admin_cli on all nodes.
- Copy admin_cli to /opt/3fs/bin and config files to /opt/3fs/etc.
mkdir -p /opt/3fs/{bin,etc}
rsync -avz meta:~/3fs/build/bin/admin_cli /opt/3fs/bin
rsync -avz meta:~/3fs/configs/admin_cli.toml /opt/3fs/etc
rsync -avz meta:/etc/foundationdb/fdb.cluster /opt/3fs/etc
- Update admin_cli.toml to set cluster_id and clusterFile:
cluster_id = "stage"
[fdb]
clusterFile = '/opt/3fs/etc/fdb.cluster'
The full help documentation for admin_cli can be displayed by running the following command:
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml help
Step 4: Mgmtd service
Install mgmtd service on meta node.
- Copy mgmtd_main to /opt/3fs/bin and config files to /opt/3fs/etc.
cp ~/3fs/build/bin/mgmtd_main /opt/3fs/bin
cp ~/3fs/configs/{mgmtd_main.toml,mgmtd_main_launcher.toml,mgmtd_main_app.toml} /opt/3fs/etc
- Update config files:
  - Set mgmtd node_id = 1 in mgmtd_main_app.toml.
  - Edit mgmtd_main_launcher.toml to set the cluster_id and clusterFile:
  cluster_id = "stage"
  [fdb]
  clusterFile = '/opt/3fs/etc/fdb.cluster'
  - Set monitor address in mgmtd_main.toml:
  [common.monitor.reporters.monitor_collector]
  remote_ip = "192.168.1.1:10000"
- Initialize the cluster:
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml "init-cluster --mgmtd /opt/3fs/etc/mgmtd_main.toml 1 1048576 16"
  The positional parameters of init-cluster:
  - 1: the chain table ID
  - 1048576: the chunk size in bytes
  - 16: the file stripe size
  Run help init-cluster for full documentation.
- Start mgmtd service:
cp ~/3fs/deploy/systemd/mgmtd_main.service /usr/lib/systemd/system
systemctl start mgmtd_main
- Run list-nodes command to check if the cluster has been successfully initialized:
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "list-nodes"
If multiple instances of the mgmtd service are deployed, one of them is elected as the primary and the others are secondaries. Automatic failover occurs when the primary fails.
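Returning to the init-cluster arguments in this step, the arithmetic below shows how a byte offset would map to a chunk and a stripe slot for a chunk size of 1048576 bytes (1 MiB) and a stripe size of 16. It is a sketch only: it assumes chunks are placed round-robin across the stripe, which this guide does not state, and locate is a hypothetical name.

```python
CHUNK_SIZE = 1048576   # bytes, the chunk size passed to init-cluster (1 MiB)
STRIPE_SIZE = 16       # the file stripe size passed to init-cluster


def locate(offset, chunk_size=CHUNK_SIZE, stripe_size=STRIPE_SIZE):
    """Return (chunk_index, stripe_slot, offset_in_chunk) for a byte offset.

    Assumption: chunk i lands in stripe slot i % stripe_size (round-robin).
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    chunk_index, offset_in_chunk = divmod(offset, chunk_size)
    return chunk_index, chunk_index % stripe_size, offset_in_chunk


if __name__ == "__main__":
    print(locate(0))                    # first byte of the file
    print(locate(17 * CHUNK_SIZE + 5))  # a chunk past one full stripe
```

Under this assumption, one full stripe covers 16 chunks, so the slot index repeats every 16 MiB of file data.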
Step 5: Meta service
Install meta service on meta node.
- Copy meta_main to /opt/3fs/bin and config files to /opt/3fs/etc.
cp ~/3fs/build/bin/meta_main /opt/3fs/bin
cp ~/3fs/configs/{meta_main_launcher.toml,meta_main.toml,meta_main_app.toml} /opt/3fs/etc
- Update config files:
  - Set meta node_id = 100 in meta_main_app.toml.
  - Set cluster_id, clusterFile and mgmtd address in meta_main_launcher.toml:
  cluster_id = "stage"
  [mgmtd_client]
  mgmtd_server_addresses = ["RDMA://192.168.1.1:8000"]
  - Set mgmtd and monitor addresses in meta_main.toml.
  [server.mgmtd_client]
  mgmtd_server_addresses = ["RDMA://192.168.1.1:8000"]
  [common.monitor.reporters.monitor_collector]
  remote_ip = "192.168.1.1:10000"
  [server.fdb]
  clusterFile = '/opt/3fs/etc/fdb.cluster'
- Config file of meta service is managed by mgmtd service. Use admin_cli to upload the config file to mgmtd:
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "set-config --type META --file /opt/3fs/etc/meta_main.toml"
- Start meta service:
cp ~/3fs/deploy/systemd/meta_main.service /usr/lib/systemd/system
systemctl start meta_main
- Run list-nodes command to check if meta service has joined the cluster:
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "list-nodes"
If multiple instances of the meta service are deployed, meta requests are evenly distributed across all instances.
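The admin_cli invocations in this guide repeat the same binary path, config file, and mgmtd address. One way to avoid copy-paste mistakes is to assemble the argument list in a script. The sketch below is an illustration, not part of 3FS: admin_cli_argv and the constants are hypothetical names, and only flags that appear in this guide are used (the token flag is the one used in Step 7).

```python
import shlex

ADMIN_CLI = "/opt/3fs/bin/admin_cli"
ADMIN_CFG = "/opt/3fs/etc/admin_cli.toml"
MGMTD_ADDRESSES = '["RDMA://192.168.1.1:8000"]'


def admin_cli_argv(subcommand, token=None):
    """Build the argument list for one admin_cli invocation."""
    argv = [
        ADMIN_CLI,
        "-cfg", ADMIN_CFG,
        "--config.mgmtd_client.mgmtd_server_addresses", MGMTD_ADDRESSES,
    ]
    if token is not None:
        # Only the commands that need admin rights pass a token in this guide.
        argv += ["--config.user_info.token", token]
    argv.append(subcommand)
    return argv


if __name__ == "__main__":
    # Print shell-quoted commands instead of running them.
    print(shlex.join(admin_cli_argv("list-nodes")))
    print(shlex.join(admin_cli_argv(
        "set-config --type META --file /opt/3fs/etc/meta_main.toml")))
```

The returned list can be passed to subprocess.run once the printed command looks right.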
Step 6: Storage service
Install the storage service on each storage node.
- Format the attached 16 SSDs as XFS and mount at /storage/data{1..16}, then create data directories /storage/data{1..16}/3fs and log directory /var/log/3fs.
mkdir -p /storage/data{1..16}
mkdir -p /var/log/3fs
for i in {1..16};do mkfs.xfs -L data${i} /dev/nvme${i}n1;mount -o noatime,nodiratime -L data${i} /storage/data${i};done
mkdir -p /storage/data{1..16}/3fs
- Increase the maximum number of concurrent asynchronous I/O (aio) requests:
sysctl -w fs.aio-max-nr=67108864
- Copy storage_main to /opt/3fs/bin and config files to /opt/3fs/etc.
rsync -avz meta:~/3fs/build/bin/storage_main /opt/3fs/bin
rsync -avz meta:~/3fs/configs/{storage_main_launcher.toml,storage_main.toml,storage_main_app.toml} /opt/3fs/etc
- Update config files:
  - Set node_id in storage_main_app.toml. Each storage service is assigned a unique id between 10001 and 10005.
  - Set cluster_id and mgmtd address in storage_main_launcher.toml.
  cluster_id = "stage"
  [mgmtd_client]
  mgmtd_server_addresses = ["RDMA://192.168.1.1:8000"]
  - Add target paths in storage_main.toml:
  [server.mgmtd]
  mgmtd_server_address = ["RDMA://192.168.1.1:8000"]
  [common.monitor.reporters.monitor_collector]
  remote_ip = "192.168.1.1:10000"
  [server.targets]
  target_paths = ["/storage/data1/3fs","/storage/data2/3fs","/storage/data3/3fs","/storage/data4/3fs","/storage/data5/3fs","/storage/data6/3fs","/storage/data7/3fs","/storage/data8/3fs","/storage/data9/3fs","/storage/data10/3fs","/storage/data11/3fs","/storage/data12/3fs","/storage/data13/3fs","/storage/data14/3fs","/storage/data15/3fs","/storage/data16/3fs",]
- Config file of storage service is managed by mgmtd service. Use admin_cli to upload the config file to mgmtd:
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "set-config --type STORAGE --file /opt/3fs/etc/storage_main.toml"
- Start storage service:
rsync -avz meta:~/3fs/deploy/systemd/storage_main.service /usr/lib/systemd/system
systemctl start storage_main
- Run list-nodes command to check if storage service has joined the cluster:
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "list-nodes"
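Typing sixteen target paths by hand is error-prone, so the target_paths line for storage_main.toml can be generated instead. The following is a minimal Python sketch under the directory layout used in this step; the function names are hypothetical, and the output should still be reviewed before it is pasted into the config file.

```python
def target_paths(num_disks=16, root="/storage/data", subdir="3fs"):
    """Return the data directories /storage/data1/3fs ... /storage/data<N>/3fs."""
    return [f"{root}{i}/{subdir}" for i in range(1, num_disks + 1)]


def target_paths_toml(paths):
    """Render the paths as a single-line TOML array assignment."""
    quoted = ",".join(f'"{p}"' for p in paths)
    return f"target_paths = [{quoted}]"


if __name__ == "__main__":
    print("[server.targets]")
    print(target_paths_toml(target_paths()))
```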
Step 7: Create admin user, storage targets and chain table
- Create an admin user:
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "user-add --root --admin 0 root"
  Save the admin token to /opt/3fs/etc/token.txt.
- Generate admin_cli commands to create storage targets on 5 storage nodes (16 SSDs per node, 6 targets per SSD).
  - Follow the data placement instructions to install the required Python packages.
python ~/3fs/deploy/data_placement/src/model/data_placement.py \
   -ql -relax -type CR --num_nodes 5 --replication_factor 3 --min_targets_per_disk 6
python ~/3fs/deploy/data_placement/src/setup/gen_chain_table.py \
   --chain_table_type CR --node_id_begin 10001 --node_id_end 10005 \
   --num_disks_per_node 16 --num_targets_per_disk 6 \
   --target_id_prefix 1 --chain_id_prefix 9 \
   --incidence_matrix_path output/DataPlacementModel-v_5-b_10-r_6-k_3-λ_2-lb_1-ub_1/incidence_matrix.pickle
  The following 3 files will be generated in the output directory: create_target_cmd.txt, generated_chains.csv, and generated_chain_table.csv.
- Create storage targets:
/opt/3fs/bin/admin_cli --cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' --config.user_info.token $(<"/opt/3fs/etc/token.txt") < output/create_target_cmd.txt
- Upload chains to mgmtd service:
/opt/3fs/bin/admin_cli --cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' --config.user_info.token $(<"/opt/3fs/etc/token.txt") "upload-chains output/generated_chains.csv"
- Upload chain table to mgmtd service:
/opt/3fs/bin/admin_cli --cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' --config.user_info.token $(<"/opt/3fs/etc/token.txt") "upload-chain-table --desc stage 1 output/generated_chain_table.csv"
- List chains and chain tables to check if they have been correctly uploaded:
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "list-chains"
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "list-chain-tables"
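When checking the uploaded chains, it helps to know roughly how many targets and chains to expect. The sketch below works this out from the numbers used in this step, assuming every chain holds exactly replication_factor targets and every target belongs to exactly one chain (an assumption, not something this guide states). It also pulls the integer fields out of the incidence matrix directory name. The helper names are hypothetical, and reading v, r, and k as node count, targets per disk, and replication factor is an inference from the matching command-line values.

```python
import re


def parse_model_dir(name):
    """Extract integer fields from a name like 'DataPlacementModel-v_5-b_10-...'."""
    return {key: int(value) for key, value in re.findall(r"-([^-_]+)_(\d+)", name)}


def expected_counts(num_nodes, disks_per_node, targets_per_disk, replication_factor):
    """Return (total_targets, total_chains) under the one-target-per-replica assumption."""
    total_targets = num_nodes * disks_per_node * targets_per_disk
    if total_targets % replication_factor:
        raise ValueError("targets do not divide evenly into chains")
    return total_targets, total_targets // replication_factor


if __name__ == "__main__":
    fields = parse_model_dir("DataPlacementModel-v_5-b_10-r_6-k_3-λ_2-lb_1-ub_1")
    print(fields)
    print(expected_counts(5, 16, 6, 3))
```

For the values in this guide, the second call returns 480 targets and 160 chains, which can be compared with the row counts in create_target_cmd.txt and generated_chains.csv.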
Step 8: FUSE client
For simplicity, the FUSE client is deployed on the meta node in this guide. However, we strongly advise against deploying clients on service nodes in a production environment.
- Copy hf3fs_fuse_main to /opt/3fs/bin and config files to /opt/3fs/etc.
cp ~/3fs/build/bin/hf3fs_fuse_main /opt/3fs/bin
cp ~/3fs/configs/{hf3fs_fuse_main_launcher.toml,hf3fs_fuse_main.toml,hf3fs_fuse_main_app.toml} /opt/3fs/etc
- Create the mount point:
mkdir -p /3fs/stage
- Set cluster ID, mountpoint, token file and mgmtd address in hf3fs_fuse_main_launcher.toml:
cluster_id = "stage"
mountpoint = '/3fs/stage'
token_file = '/opt/3fs/etc/token.txt'
[mgmtd_client]
mgmtd_server_addresses = ["RDMA://192.168.1.1:8000"]
- Set mgmtd and monitor address in hf3fs_fuse_main.toml.
[mgmtd]
mgmtd_server_addresses = ["RDMA://192.168.1.1:8000"]
[common.monitor.reporters.monitor_collector]
remote_ip = "192.168.1.1:10000"
- Config file of FUSE client is also managed by mgmtd service. Use admin_cli to upload the config file to mgmtd:
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "set-config --type FUSE --file /opt/3fs/etc/hf3fs_fuse_main.toml"
- Start FUSE client:
cp ~/3fs/deploy/systemd/hf3fs_fuse_main.service /usr/lib/systemd/system
systemctl start hf3fs_fuse_main
- Check if 3FS has been mounted at /3fs/stage:
mount | grep '/3fs/stage'
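The mount check can also be scripted, for example in a health probe. The sketch below parses /proc/mounts-style text rather than grepping, so that /3fs/stage does not accidentally match a longer path. It assumes a Linux host, and is_mounted is a hypothetical helper name.

```python
import os


def is_mounted(mountpoint, mounts_text):
    """Return True if mountpoint is a mount target in /proc/mounts-style text."""
    wanted = mountpoint.rstrip("/") or "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 2 and fields[1] == wanted:
            return True
    return False


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            print("mounted:", is_mounted("/3fs/stage", f.read()))
    else:
        print("/proc/mounts not available on this system")
```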
FAQ
How to troubleshoot admin_cli init-cluster error?
If mgmtd fails to start after running init-cluster, the most likely cause is an error in mgmtd_main.toml. Any change to this file requires clearing all FoundationDB data and re-running init-cluster.
How to build a single-node cluster?
A minimum of two storage services is required for data replication. If --num-nodes=1 is set, the gen_chain_table.py script will fail. In a test environment, this limitation can be bypassed by deploying multiple storage services on a single machine.
How to update config files?
All config files are managed by mgmtd. If any *_main.toml is updated, such as storage_main.toml, the modified file should be uploaded using admin_cli set-config.
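The set-config commands in this guide pair each config file with a type. A small lookup makes that pairing explicit when scripting updates. This is a sketch only: the mapping covers just the three file and type pairs shown in this guide, and set_config_subcommand is a hypothetical helper name.

```python
import os

# Mapping taken from the set-config commands shown in this guide.
CONFIG_TYPES = {
    "meta_main.toml": "META",
    "storage_main.toml": "STORAGE",
    "hf3fs_fuse_main.toml": "FUSE",
}


def set_config_subcommand(path):
    """Return the admin_cli subcommand string that uploads the given config file."""
    name = os.path.basename(path)
    if name not in CONFIG_TYPES:
        raise ValueError(f"no set-config type known for {name}")
    return f"set-config --type {CONFIG_TYPES[name]} --file {path}"


if __name__ == "__main__":
    print(set_config_subcommand("/opt/3fs/etc/storage_main.toml"))
```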
How to troubleshoot common deployment issues?
When encountering any error during deployment:
- Check the log messages in stdout/stderr using journalctl, especially during service startup.
- Check log files stored in /var/log/3fs/ on service and client nodes.
- Ensure that the directory /var/log/3fs/ exists before starting any service.
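Some of these checks can be run before starting a service. The sketch below reports missing directories and reads the current fs.aio-max-nr value on Linux. The helper names are hypothetical, and the directory list and the 67108864 threshold are simply the values used in this guide, not requirements confirmed by 3FS.

```python
import os


def missing_dirs(paths):
    """Return the subset of paths that are not existing directories."""
    return [p for p in paths if not os.path.isdir(p)]


def read_aio_max_nr(path="/proc/sys/fs/aio-max-nr"):
    """Return the current fs.aio-max-nr value, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    for d in missing_dirs(["/var/log/3fs", "/opt/3fs/bin", "/opt/3fs/etc"]):
        print("missing directory:", d)
    value = read_aio_max_nr()
    if value is not None and value < 67108864:
        print("fs.aio-max-nr is", value, "(this guide sets 67108864 on storage nodes)")
```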