Configuration
This document explains Celline's configuration system, from project-specific settings to global execution environments.
🎯 Configuration System Overview
Celline manages configurations in the following hierarchy:
- Project Configuration (setting.toml) - Project-specific settings
- Sample Configuration (samples.toml) - Management of analysis target samples
- Runtime Configuration - Temporary settings via command-line arguments
📁 Project Configuration (setting.toml)
This is the configuration file created in the project's root directory.
Basic Structure
[project]
name = "my-scrna-project"
version = "1.0.0"
description = "Single cell RNA-seq analysis project"
[execution]
system = "multithreading"
nthread = 4
pbs_server = ""
[R]
r_path = "/usr/bin/R"
[fetch]
wait_time = 4
[analysis]
target_species = "homo_sapiens"
reference_genome = "GRCh38"
Detailed Configuration Sections
[project] - Project Information
Parameter | Type | Default | Description |
---|---|---|---|
name | string | directory name | Project name |
version | string | "0.01" | Project version |
description | string | "" | Project description |
[execution] - Execution Environment
Parameter | Type | Default | Description |
---|---|---|---|
system | string | "multithreading" | Execution system (multithreading or PBS) |
nthread | integer | 1 | Number of parallel execution threads |
pbs_server | string | "" | PBS cluster server name |
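The constraints implied by this table can be checked before a run. The following is a minimal sketch, not Celline's own validation code: the function name `validate_execution` is hypothetical, and the rule that `pbs_server` must be non-empty when `system` is "PBS" is an assumption the table does not state.

```python
VALID_SYSTEMS = {"multithreading", "PBS"}


def validate_execution(section: dict) -> list[str]:
    """Return a list of problems found in an [execution] section (empty if none)."""
    problems = []
    # Defaults follow the table above.
    system = section.get("system", "multithreading")
    nthread = section.get("nthread", 1)
    pbs_server = section.get("pbs_server", "")

    if system not in VALID_SYSTEMS:
        problems.append(f"system must be one of {sorted(VALID_SYSTEMS)}, got {system!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(nthread, bool) or not isinstance(nthread, int) or nthread < 1:
        problems.append(f"nthread must be a positive integer, got {nthread!r}")
    # Assumption: a PBS run needs a server name; the table does not say so explicitly.
    if system == "PBS" and not pbs_server:
        problems.append("pbs_server must be set when system is 'PBS'")
    return problems


if __name__ == "__main__":
    print(validate_execution({"system": "PBS", "nthread": 16, "pbs_server": ""}))
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in one pass.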
[R] - R Environment Configuration
Parameter | Type | Default | Description |
---|---|---|---|
r_path | string | auto-detect | R execution path |
[fetch] - Data Acquisition Settings
Parameter | Type | Default | Description |
---|---|---|---|
wait_time | integer | 4 | Wait time between API calls (seconds) |
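To illustrate what a wait time between API calls means in practice, here is a minimal sketch of a throttled fetch loop. It is not Celline's implementation: `fetch_with_wait` and the `fetch_one` callback are hypothetical names, and the choice to pause only between calls (not before the first one) is an assumption.

```python
import time


def fetch_with_wait(sample_ids, fetch_one, wait_time=4, sleep=time.sleep):
    """Call fetch_one for each sample ID, pausing wait_time seconds between calls."""
    results = {}
    for index, sample_id in enumerate(sample_ids):
        if index > 0:
            # Pause before every call except the first one.
            sleep(wait_time)
        results[sample_id] = fetch_one(sample_id)
    return results


if __name__ == "__main__":
    # A stand-in fetcher; a real one would query a remote API.
    print(fetch_with_wait(["GSM1234567", "GSM1234568"], lambda s: f"metadata for {s}", wait_time=1))
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.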
[analysis] - Analysis Settings
Parameter | Type | Default | Description |
---|---|---|---|
target_species | string | "" | Target species (homo_sapiens, mus_musculus, etc.) |
reference_genome | string | "" | Reference genome (GRCh38, GRCm39, etc.) |
🧬 Sample Configuration (samples.toml)
This file manages information about samples to be analyzed. It's usually auto-generated by the celline run add command.
Basic Structure
# Simple format
GSM1234567 = "Control sample 1"
GSM1234568 = "Treatment sample 1"
# Detailed information format
[GSM1234569]
title = "Control sample 2"
condition = "control"
replicate = 2
tissue = "brain"
cell_type = "mixed"
[GSM1234570]
title = "Treatment sample 2"
condition = "treatment"
replicate = 2
tissue = "brain"
cell_type = "mixed"
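Because the file can mix the simple and detailed formats, code that reads it has to handle both a plain string and a table per sample. The sketch below shows one way to normalize them. It assumes the file has already been parsed into a dictionary (for example with Python 3.11's `tomllib.load`), and `normalize_samples` is a hypothetical helper, not part of Celline.

```python
def normalize_samples(parsed: dict) -> dict:
    """Map every sample ID to a dict that always has a 'title' key."""
    normalized = {}
    for sample_id, value in parsed.items():
        if isinstance(value, str):
            # Simple format: GSM1234567 = "Control sample 1"
            normalized[sample_id] = {"title": value}
        elif isinstance(value, dict):
            # Detailed format: a [GSM...] table with arbitrary fields.
            entry = dict(value)
            entry.setdefault("title", "")
            normalized[sample_id] = entry
        else:
            raise TypeError(f"unexpected value for {sample_id}: {value!r}")
    return normalized


if __name__ == "__main__":
    parsed = {
        "GSM1234567": "Control sample 1",
        "GSM1234569": {"title": "Control sample 2", "condition": "control", "replicate": 2},
    }
    for sample_id, info in normalize_samples(parsed).items():
        print(sample_id, info["title"])
```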
Manual Sample Addition
# Direct entry in samples.toml
GSM5555555 = "Custom sample"
# Or with detailed information
[GSM6666666]
title = "Custom detailed sample"
condition = "experimental"
batch = "batch1"
notes = "Special processing required"
⚙️ CLI Configuration Commands
Basic Configuration
# Interactive configuration
celline config
# Display current configuration
celline config --show
Execution System Configuration
# Configure multithreading execution
celline config --system multithreading --nthread 8
# Configure PBS cluster execution
celline config --system PBS --pbs-server my-cluster --nthread 16
R Environment Configuration
# Manually set R path
celline config --r-path /opt/R/4.3.0/bin/R
# Use auto-detection
celline config --r-path auto
🔧 Detailed Execution Environment Configuration
Multithreading Execution
Suitable for parallel execution on local machines.
[execution]
system = "multithreading"
nthread = 4 # Adjust according to CPU cores
# Recommended settings examples
# 4-core CPU: nthread = 2-3
# 8-core CPU: nthread = 4-6
# 16-core CPU: nthread = 8-12
Applied Processes:
- Data downloading
- File processing
- Parallelizable analysis processing
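The recommended nthread values above correspond to roughly 50-75% of the available cores. A small sketch of that rule follows; the function name `recommended_nthread_range` is hypothetical, and extending the rule to core counts not listed above is an assumption.

```python
import os


def recommended_nthread_range(cores: int) -> tuple[int, int]:
    """Return (low, high) nthread values covering about 50-75% of the cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    low = max(1, cores // 2)
    high = max(low, (cores * 3) // 4)
    return low, high


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to 1.
    cores = os.cpu_count() or 1
    print(cores, recommended_nthread_range(cores))
```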
PBS Cluster Execution
Suitable for large-scale analysis in HPC environments.
[execution]
system = "PBS"
nthread = 16
pbs_server = "your-cluster-name"
[pbs]
queue = "normal"
walltime = "24:00:00"
memory = "64GB"
PBS Job Template Example:
#!/bin/bash
#PBS -l select=1:ncpus=16:mem=64GB
#PBS -l walltime=24:00:00
#PBS -q normal
#PBS -N celline-job
cd $PBS_O_WORKDIR
module load R/4.3.0
celline run count
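The template above lines up with the [execution] and [pbs] settings: nthread becomes ncpus, memory becomes mem, and walltime and queue carry over directly. The following sketch renders such a script from those values. It only illustrates the mapping, not how Celline generates jobs; `render_pbs_script` and its parameters are hypothetical, and the `module load` line is left out for brevity.

```python
PBS_TEMPLATE = """#!/bin/bash
#PBS -l select=1:ncpus={nthread}:mem={memory}
#PBS -l walltime={walltime}
#PBS -q {queue}
#PBS -N {job_name}
cd $PBS_O_WORKDIR
{command}
"""


def render_pbs_script(execution: dict, pbs: dict, command: str,
                      job_name: str = "celline-job") -> str:
    """Fill the job template from the [execution] and [pbs] sections."""
    return PBS_TEMPLATE.format(
        nthread=execution["nthread"],
        memory=pbs["memory"],
        walltime=pbs["walltime"],
        queue=pbs["queue"],
        job_name=job_name,
        command=command,
    )


if __name__ == "__main__":
    execution = {"system": "PBS", "nthread": 16, "pbs_server": "your-cluster-name"}
    pbs = {"queue": "normal", "walltime": "24:00:00", "memory": "64GB"}
    print(render_pbs_script(execution, pbs, "celline run count"))
```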
🌍 Environment Variable Configuration
Celline recognizes the following environment variables:
# R environment
export R_HOME=/opt/R/4.3.0
export R_LIBS_USER=/home/user/R/library
# Cell Ranger
export CELLRANGER_PATH=/opt/cellranger-7.0.0
# Temporary directory
export TMPDIR=/scratch/tmp
# Memory limit
export CELLINE_MAX_MEMORY=32G
Configuration in .bashrc/.zshrc
# Add to ~/.bashrc or ~/.zshrc
export CELLINE_CONFIG_DIR=$HOME/.config/celline
export CELLINE_CACHE_DIR=$HOME/.cache/celline
export CELLINE_R_PATH=/opt/R/bin/R
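A script that wraps Celline may want to inspect these variables itself. The sketch below reads the CELLINE_* variables listed above and converts a value such as `32G` into bytes. The helper names are hypothetical, and treating the suffix as a binary unit (1G = 1024**3 bytes) is an assumption; Celline may interpret it differently.

```python
import os
import re

# Assumption: binary units. Celline's own interpretation is not documented here.
_UNIT_BYTES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_memory(value: str) -> int:
    """Convert a size such as '32G' or '64GB' into a number of bytes."""
    match = re.fullmatch(r"(\d+)\s*([KMGT])B?", value.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized memory size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_BYTES[unit]


def read_celline_env(environ=None) -> dict:
    """Collect the CELLINE_* variables; unset variables come back as None."""
    env = os.environ if environ is None else environ
    max_memory = env.get("CELLINE_MAX_MEMORY")
    return {
        "r_path": env.get("CELLINE_R_PATH"),
        "config_dir": env.get("CELLINE_CONFIG_DIR"),
        "cache_dir": env.get("CELLINE_CACHE_DIR"),
        "max_memory_bytes": parse_memory(max_memory) if max_memory else None,
    }


if __name__ == "__main__":
    print(read_celline_env())
```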
📊 Profile Management
You can manage multiple configuration profiles.
Creating Profiles
# Development profile
mkdir -p ~/.config/celline/profiles/development
cat > ~/.config/celline/profiles/development/setting.toml << EOF
[execution]
system = "multithreading"
nthread = 2
[analysis]
debug_mode = true
verbose = true
EOF
# Production profile
mkdir -p ~/.config/celline/profiles/production
cat > ~/.config/celline/profiles/production/setting.toml << EOF
[execution]
system = "PBS"
nthread = 32
pbs_server = "production-cluster"
[analysis]
debug_mode = false
verbose = false
EOF
Using Profiles
# Execute with specified profile
celline --profile development run preprocess
celline --profile production run count
🔒 Security Configuration
Access Control
[security]
# API access restrictions
api_allowed_hosts = ["localhost", "127.0.0.1"]
api_port = 8000
# Data directory permissions
data_permissions = "750"
result_permissions = "755"
Authentication Settings
[auth]
# Future feature
enable_auth = false
auth_provider = "none" # "ldap", "oauth", etc.
📈 Performance Configuration
Memory Management
[performance]
# Memory usage limits
max_memory_gb = 32
temp_dir = "/tmp/celline"
# Cache settings
enable_cache = true
cache_size_gb = 10
cache_dir = "~/.cache/celline"
Parallel Processing Adjustment
[parallel]
# I/O intensive tasks
io_workers = 4
# CPU intensive tasks
cpu_workers = 8
# Memory intensive tasks
memory_workers = 2
🔍 Debug Configuration
Log Levels
[logging]
level = "INFO" # DEBUG, INFO, WARNING, ERROR
log_file = "celline.log"
max_log_size_mb = 100
backup_count = 5
# Module-specific log levels
[logging.modules]
"celline.functions" = "DEBUG"
"celline.database" = "INFO"
"celline.api" = "WARNING"
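The keys max_log_size_mb and backup_count resemble the parameters of Python's standard `RotatingFileHandler`. Assuming that is what they control (the source does not confirm it), a [logging] section could be applied roughly as follows. `configure_logging` is a hypothetical helper that takes the already parsed section as a dictionary.

```python
import logging
from logging.handlers import RotatingFileHandler


def configure_logging(cfg: dict) -> logging.Logger:
    """Apply a parsed [logging] section to the 'celline' logger hierarchy."""
    logger = logging.getLogger("celline")
    logger.setLevel(cfg.get("level", "INFO"))

    handler = RotatingFileHandler(
        cfg.get("log_file", "celline.log"),
        maxBytes=cfg.get("max_log_size_mb", 100) * 1024 * 1024,
        backupCount=cfg.get("backup_count", 5),
        delay=True,  # do not create the file until the first record is written
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    logger.addHandler(handler)

    # Module-specific levels, e.g. {"celline.api": "WARNING"}.
    for name, level in cfg.get("modules", {}).items():
        logging.getLogger(name).setLevel(level)
    return logger


if __name__ == "__main__":
    log = configure_logging({"level": "DEBUG", "modules": {"celline.api": "WARNING"}})
    log.debug("logging configured")
```

Because "celline.functions", "celline.database", and "celline.api" are children of the "celline" logger, their records reach the same rotating file while each keeps its own level.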
Debug Mode
# Execute with debug information
celline --debug run preprocess
# Verbose log output
celline --verbose run download
🔄 Configuration Inheritance and Priority
Configurations are applied in the following priority order (highest first):
- Command-line arguments
- Environment variables
- Project configuration (setting.toml)
- User configuration (~/.config/celline/config.toml)
- System configuration (/etc/celline/config.toml)
- Default values
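The priority order above can be pictured as a layered merge in which each higher-priority source overrides the ones below it. The sketch below assumes the merge happens per key within each section; whether Celline merges per key or replaces whole sections is not stated here, and `merge_layers` is a hypothetical name.

```python
def merge_layers(*layers: dict) -> dict:
    """Merge layers given lowest priority first; later layers override earlier ones."""
    merged: dict = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are sections: merge key by key.
                merged[key] = merge_layers(merged[key], value)
            else:
                merged[key] = value
    return merged


if __name__ == "__main__":
    defaults = {"execution": {"system": "multithreading", "nthread": 1}}
    project = {"execution": {"nthread": 4}}  # from setting.toml
    cli = {"execution": {"nthread": 8}}      # from command-line arguments
    # Lowest priority first, so the command line wins.
    print(merge_layers(defaults, project, cli))
```

In this example the effective nthread is 8 (command line), while system keeps its default because no higher layer sets it.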
Configuration Verification
# Display currently effective configuration
celline config --show-effective
# Display configuration file locations
celline config --show-files
# Detailed configuration analysis
celline config --debug
🚨 Configuration Validation
Configuration Check
# Configuration file syntax check
celline config --validate
# Execution environment verification
celline config --test-environment
# Dependency check
celline config --check-dependencies
Auto-repair
# Automatic configuration repair
celline config --repair
# Restore default configuration
celline config --reset
📦 Configuration Export/Import
Configuration Backup
# Export current configuration
celline config --export > my-config.toml
# Import configuration to another project
celline config --import my-config.toml
Configuration Templates
# Generate configuration template
celline config --generate-template > template.toml
# Create new project from template
celline init --template template.toml new-project
Info: For more advanced configuration, refer to Advanced Usage.