Configuration

This document explains Celline's configuration system in detail, from project-specific settings to global execution environments, so that you can set up your analyses efficiently.

🎯 Configuration System Overview

Celline manages configurations in the following hierarchy:

  1. Project Configuration (setting.toml) - Project-specific settings
  2. Sample Configuration (samples.toml) - Management of analysis target samples
  3. Runtime Configuration - Temporary settings via command-line arguments

📁 Project Configuration (setting.toml)

setting.toml is created in the project's root directory and holds the project-wide settings.

Basic Structure

      [project]
name = "my-scrna-project"
version = "1.0.0"
description = "Single cell RNA-seq analysis project"

[execution]
system = "multithreading"
nthread = 4
pbs_server = ""

[R]
r_path = "/usr/bin/R"

[fetch]
wait_time = 4

[analysis]
target_species = "homo_sapiens"
reference_genome = "GRCh38"

    

Detailed Configuration Sections

[project] - Project Information

| Parameter | Type | Default | Description |
|---|---|---|---|
| name | string | directory name | Project name |
| version | string | "0.01" | Project version |
| description | string | "" | Project description |

[execution] - Execution Environment

| Parameter | Type | Default | Description |
|---|---|---|---|
| system | string | "multithreading" | Execution system (multithreading / PBS) |
| nthread | integer | 1 | Number of parallel execution threads |
| pbs_server | string | "" | PBS cluster server name |

[R] - R Environment Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| r_path | string | auto-detect | Path to the R executable |

[fetch] - Data Acquisition Settings

| Parameter | Type | Default | Description |
|---|---|---|---|
| wait_time | integer | 4 | Wait time between API calls (seconds) |

[analysis] - Analysis Settings

| Parameter | Type | Default | Description |
|---|---|---|---|
| target_species | string | "" | Target species (homo_sapiens, mus_musculus, etc.) |
| reference_genome | string | "" | Reference genome (GRCh38, GRCm39, etc.) |

🧬 Sample Configuration (samples.toml)

This file lists the samples to be analyzed. It is usually generated automatically by the celline run add command.

Basic Structure

      # Simple format
GSM1234567 = "Control sample 1"
GSM1234568 = "Treatment sample 1"

# Detailed information format
[GSM1234569]
title = "Control sample 2"
condition = "control"
replicate = 2
tissue = "brain"
cell_type = "mixed"

[GSM1234570]
title = "Treatment sample 2"
condition = "treatment"
replicate = 2
tissue = "brain"
cell_type = "mixed"

    

Manual Sample Addition

      # Direct entry in samples.toml
GSM5555555 = "Custom sample"

# Or with detailed information
[GSM6666666]
title = "Custom detailed sample"
condition = "experimental"
batch = "batch1"
notes = "Special processing required"

    

⚙️ CLI Configuration Commands

Basic Configuration

      # Interactive configuration
celline config

# Display current configuration
celline config --show

    

Execution System Configuration

      # Configure multithreading execution
celline config --system multithreading --nthread 8

# Configure PBS cluster execution
celline config --system PBS --pbs-server my-cluster --nthread 16

    

R Environment Configuration

      # Manually set R path
celline config --r-path /opt/R/4.3.0/bin/R

# Use auto-detection
celline config --r-path auto

    

🔧 Detailed Execution Environment Configuration

Multithreading Execution

Use multithreading for parallel execution on a local machine.

      [execution]
system = "multithreading"
nthread = 4  # Adjust according to CPU cores

# Recommended settings examples
# 4-core CPU: nthread = 2-3
# 8-core CPU: nthread = 4-6
# 16-core CPU: nthread = 8-12
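
The recommended values above correspond to roughly half to three quarters of the available cores. A small Python sketch of that rule follows; recommended_nthread_range is a hypothetical helper, and the 50-75% rule is inferred from the three examples rather than stated by Celline.

```python
import os


def recommended_nthread_range(cores: int) -> tuple[int, int]:
    """Return (low, high) thread counts: about 50% to 75% of the cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    low = max(1, cores // 2)
    high = max(low, (cores * 3) // 4)
    return low, high


# os.cpu_count() may return None on some platforms, so fall back to 1.
cores = os.cpu_count() or 1
low, high = recommended_nthread_range(cores)
print(f"{cores} cores detected: try nthread between {low} and {high}")
```

Leaving some cores free keeps the machine responsive while long-running jobs execute.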

    

Multithreading applies to:

  • Data downloading
  • File processing
  • Parallelizable analysis processing

PBS Cluster Execution

Use PBS for large-scale analysis in HPC environments.

      [execution]
system = "PBS"
nthread = 16
pbs_server = "your-cluster-name"

[pbs]
queue = "normal"
walltime = "24:00:00"
memory = "64GB"

    

PBS Job Template Example:

#!/bin/bash
#PBS -l select=1:ncpus=16:mem=64GB
#PBS -l walltime=24:00:00
#PBS -q normal
#PBS -N celline-job

# Run from the directory where the job was submitted
cd "$PBS_O_WORKDIR" || exit 1
module load R/4.3.0
celline run count
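
To show how the [execution] and [pbs] values relate to the job template, here is a Python sketch that renders such a script from a settings dictionary. It is illustrative only: render_job and JOB_TEMPLATE are hypothetical names, and whether Celline generates its job scripts this way is an assumption.

```python
from string import Template

# "$$" is an escaped dollar sign, so "$$PBS_O_WORKDIR" is written out
# as the shell variable "$PBS_O_WORKDIR" instead of being substituted.
JOB_TEMPLATE = Template("""\
#!/bin/bash
#PBS -l select=1:ncpus=${nthread}:mem=${memory}
#PBS -l walltime=${walltime}
#PBS -q ${queue}
#PBS -N ${job_name}

cd "$$PBS_O_WORKDIR"
module load R/4.3.0
celline run count
""")


def render_job(execution: dict, pbs: dict, job_name: str = "celline-job") -> str:
    """Fill the PBS template from the [execution] and [pbs] tables."""
    return JOB_TEMPLATE.substitute(
        nthread=execution["nthread"],
        memory=pbs["memory"],
        walltime=pbs["walltime"],
        queue=pbs["queue"],
        job_name=job_name,
    )


script = render_job(
    {"system": "PBS", "nthread": 16, "pbs_server": "your-cluster-name"},
    {"queue": "normal", "walltime": "24:00:00", "memory": "64GB"},
)
print(script)
```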

    

🌍 Environment Variable Configuration

Celline recognizes the following environment variables:

      # R environment
export R_HOME=/opt/R/4.3.0
export R_LIBS_USER=/home/user/R/library

# Cell Ranger
export CELLRANGER_PATH=/opt/cellranger-7.0.0

# Temporary directory
export TMPDIR=/scratch/tmp

# Memory limit
export CELLINE_MAX_MEMORY=32G

    

Configuration in .bashrc/.zshrc

      # Add to ~/.bashrc or ~/.zshrc
export CELLINE_CONFIG_DIR="$HOME/.config/celline"
export CELLINE_CACHE_DIR="$HOME/.cache/celline"
export CELLINE_R_PATH=/opt/R/bin/R

    

📊 Profile Management

You can manage multiple configuration profiles.

Creating Profiles

      # Development profile
mkdir -p ~/.config/celline/profiles/development
cat > ~/.config/celline/profiles/development/setting.toml << EOF
[execution]
system = "multithreading"
nthread = 2

[analysis]
debug_mode = true
verbose = true
EOF

# Production profile  
mkdir -p ~/.config/celline/profiles/production
cat > ~/.config/celline/profiles/production/setting.toml << EOF
[execution]
system = "PBS"
nthread = 32
pbs_server = "production-cluster"

[analysis]
debug_mode = false
verbose = false
EOF

    

Using Profiles

      # Execute with specified profile
celline --profile development run preprocess
celline --profile production run count

    

🔒 Security Configuration

Access Control

      [security]
# API access restrictions
api_allowed_hosts = ["localhost", "127.0.0.1"]
api_port = 8000

# Data directory permissions
data_permissions = "750"
result_permissions = "755"
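
Permission strings such as "750" are octal modes. As a sketch of what applying them could look like on a POSIX system, the Python snippet below converts the string and sets it on a directory; apply_permissions is a hypothetical helper, not Celline's implementation.

```python
import os
import stat
import tempfile


def apply_permissions(path: str, mode_string: str) -> int:
    """Apply an octal mode string such as "750" and return the resulting mode."""
    mode = int(mode_string, 8)  # "750" -> 0o750 (rwxr-x---)
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)


# Demonstrate on a temporary directory so no real data is touched.
with tempfile.TemporaryDirectory() as data_dir:
    applied = apply_permissions(data_dir, "750")
    print(oct(applied))
```

With "750" the owner has full access, the group can read and enter the directory, and other users have no access; "755" additionally lets other users read and enter it.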

    

Authentication Settings

      [auth]
# Planned feature (not yet available)
enable_auth = false
auth_provider = "none"  # "ldap", "oauth", etc.

    

📈 Performance Configuration

Memory Management

      [performance]
# Memory usage limits
max_memory_gb = 32
temp_dir = "/tmp/celline"

# Cache settings
enable_cache = true
cache_size_gb = 10
cache_dir = "~/.cache/celline"

    

Parallel Processing Adjustment

      [parallel]
# I/O-intensive tasks
io_workers = 4

# CPU-intensive tasks
cpu_workers = 8

# Memory-intensive tasks
memory_workers = 2
memory_workers = 2

    

🔍 Debug Configuration

Log Levels

      [logging]
level = "INFO"  # DEBUG, INFO, WARNING, ERROR
log_file = "celline.log"
max_log_size_mb = 100
backup_count = 5

# Module-specific log levels
[logging.modules]
"celline.functions" = "DEBUG"
"celline.database" = "INFO"
"celline.api" = "WARNING"

    

Debug Mode

      # Execute with debug information
celline --debug run preprocess

# Verbose log output
celline --verbose run download
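
The keys of the [logging] table in the Log Levels section map closely onto Python's standard logging module: max_log_size_mb and backup_count correspond to the maxBytes and backupCount arguments of RotatingFileHandler. The sketch below shows that mapping under the assumption that Celline's loggers live under the "celline" namespace; configure_logging is a hypothetical helper, not Celline's API.

```python
import logging
from logging.handlers import RotatingFileHandler


def configure_logging(cfg: dict) -> RotatingFileHandler:
    """Configure the "celline" logger tree from a [logging]-style dict."""
    handler = RotatingFileHandler(
        cfg.get("log_file", "celline.log"),
        maxBytes=int(cfg.get("max_log_size_mb", 100)) * 1024 * 1024,
        backupCount=int(cfg.get("backup_count", 5)),
        delay=True,  # do not create the file until the first record is written
    )
    base = logging.getLogger("celline")
    base.setLevel(cfg.get("level", "INFO"))
    base.addHandler(handler)
    # Module-specific levels from [logging.modules] override the base level.
    for name, level in cfg.get("modules", {}).items():
        logging.getLogger(name).setLevel(level)
    return handler


log_cfg = {
    "level": "INFO",
    "log_file": "celline.log",
    "max_log_size_mb": 100,
    "backup_count": 5,
    "modules": {"celline.functions": "DEBUG", "celline.api": "WARNING"},
}
handler = configure_logging(log_cfg)
```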

    

🔄 Configuration Inheritance and Priority

When the same setting is defined in more than one place, the source with the highest priority wins (highest first):

  1. Command-line arguments
  2. Environment variables
  3. Project configuration (setting.toml)
  4. User configuration (~/.config/celline/config.toml)
  5. System configuration (/etc/celline/config.toml)
  6. Default values
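
The priority order above amounts to a layered merge in which higher-priority sources overwrite lower ones key by key. A minimal Python sketch of that idea follows; merge_layers and the sample dictionaries are hypothetical and only illustrate the principle, not Celline's actual resolution code.

```python
def merge_layers(*layers: dict) -> dict:
    """Merge config layers given in priority order, highest first."""
    merged: dict = {}
    # Apply the lowest-priority layer first so later layers overwrite it.
    for layer in reversed(layers):
        for section, values in layer.items():
            merged.setdefault(section, {}).update(values)
    return merged


defaults = {"execution": {"system": "multithreading", "nthread": 1}}
project = {"execution": {"nthread": 4}}   # from setting.toml
cli = {"execution": {"nthread": 8}}       # from command-line arguments

effective = merge_layers(cli, project, defaults)
print(effective)
```

Here nthread comes from the command line, while system falls through to the default because no higher layer sets it.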

Configuration Verification

      # Display currently effective configuration
celline config --show-effective

# Display configuration file locations
celline config --show-files

# Detailed configuration analysis
celline config --debug

    

🚨 Configuration Validation

Configuration Check

      # Configuration file syntax check
celline config --validate

# Execution environment verification
celline config --test-environment

# Dependency check
celline config --check-dependencies

    

Auto-repair

      # Automatic configuration repair
celline config --repair

# Restore default configuration
celline config --reset

    

📦 Configuration Export/Import

Configuration Backup

      # Export current configuration
celline config --export > my-config.toml

# Import configuration to another project
celline config --import my-config.toml

    

Configuration Templates

      # Generate configuration template
celline config --generate-template > template.toml

# Create new project from template
celline init --template template.toml new-project

    

Info: For more advanced configuration, refer to Advanced Usage.