CreateSeuratObject Function

Create Seurat objects from Cell Ranger count matrices for downstream analysis.

Overview

The CreateSeuratObject function creates Seurat objects from Cell Ranger output matrices. Seurat objects are the standard data structure for single-cell RNA-seq analysis in R, providing a unified framework for data storage and analysis.

Class Information

  • Module: celline.functions.create_seurat
  • Class: CreateSeuratObject
  • Base Class: CellineFunction

Parameters

Constructor Parameters

Parameter | Type | Required | Description
useqc_matrix | bool | Yes | Whether to use the quality-controlled (filtered) matrix
then | Optional[Callable[[str], None]] | No | Callback function executed on successful completion
catch | Optional[Callable[[subprocess.CalledProcessError], None]] | No | Callback function executed on error
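
The callback contract in the table can be illustrated with a small, self-contained sketch. `run_with_callbacks` is a hypothetical helper and not part of Celline. It assumes only what the type hints above suggest: `then` receives a sample identifier and `catch` receives the `subprocess.CalledProcessError` raised by a failed job. How Celline dispatches the callbacks internally may differ.

```python
import subprocess
import sys
from typing import Callable, List, Optional


def run_with_callbacks(
    sample_id: str,
    command: List[str],
    then: Optional[Callable[[str], None]] = None,
    catch: Optional[Callable[[subprocess.CalledProcessError], None]] = None,
) -> bool:
    """Run `command`; call `then` on success and `catch` on a non-zero exit.

    Hypothetical helper for illustration only (not Celline's implementation).
    """
    try:
        # check=True raises CalledProcessError when the exit status is non-zero
        subprocess.run(command, check=True)
    except subprocess.CalledProcessError as error:
        if catch is not None:
            catch(error)
        return False
    if then is not None:
        then(sample_id)
    return True


if __name__ == "__main__":
    # A trivially successful child process stands in for the real job script.
    run_with_callbacks(
        "SAMPLE_ID",
        [sys.executable, "-c", "pass"],
        then=lambda sid: print(f"done: {sid}"),
        catch=lambda err: print(f"failed with exit code {err.returncode}"),
    )
```

Both callbacks are optional in this sketch, matching the "No" entries in the Required column.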

JobContainer Structure

The CreateSeuratObject.JobContainer contains job configuration:

Field | Type | Description
nthread | str | Number of threads (fixed to 1)
cluster_server | str | Cluster server name (if applicable)
jobname | str | Job identifier
logpath | str | Path to log file
r_path | str | Path to R script directory
exec_root | str | Execution root directory
input_h5_path | str | Path to input HDF5 matrix file
data_dir_path | str | Output data directory
proj_name | str | Project name
useqc_matrix | str | QC matrix flag ("true" or "false")
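
Because every field is a string, the boolean constructor argument has to be converted before it reaches the job configuration. The sketch below mirrors the field table as a `NamedTuple`. The class layout, the `bool_to_flag` helper, and all example values are assumptions made for illustration, not Celline's source code.

```python
from typing import NamedTuple


class JobContainer(NamedTuple):
    """Illustrative mirror of the field table; every field is a string."""

    nthread: str
    cluster_server: str
    jobname: str
    logpath: str
    r_path: str
    exec_root: str
    input_h5_path: str
    data_dir_path: str
    proj_name: str
    useqc_matrix: str


def bool_to_flag(value: bool) -> str:
    """Convert a Python bool to the "true"/"false" string the table describes."""
    return "true" if value else "false"


# All values below are placeholders chosen for the example.
job = JobContainer(
    nthread="1",
    cluster_server="",
    jobname="CreateSeurat",
    logpath="data/SAMPLE_ID/log/create_seurat_example.log",
    r_path="/path/to/r/scripts",
    exec_root="/path/to/exec/root",
    input_h5_path="resources/SAMPLE_ID/counted/outs/filtered_feature_bc_matrix.h5",
    data_dir_path="data/SAMPLE_ID",
    proj_name="MyProject",
    useqc_matrix=bool_to_flag(True),
)
```

Keeping all fields as strings makes it straightforward to substitute them into a text template for a job script.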

Usage Examples

Python API

Basic Usage

from celline import Project
from celline.functions.create_seurat import CreateSeuratObject

# Create project
project = Project("./my-project")

# Create Seurat objects with filtered matrices
seurat_function = CreateSeuratObject(useqc_matrix=True)

# Execute function
result = project.call(seurat_function)

    

Using Raw Matrices

from celline import Project
from celline.functions.create_seurat import CreateSeuratObject

# Create project
project = Project("./my-project")

# Create Seurat objects with raw matrices (no Cell Ranger filtering)
seurat_function = CreateSeuratObject(useqc_matrix=False)

# Execute function
result = project.call(seurat_function)

    

With Callbacks

from celline import Project
from celline.functions.create_seurat import CreateSeuratObject
import subprocess

def on_success(sample_id: str):
    print(f"Successfully created Seurat object for: {sample_id}")

def on_error(error: subprocess.CalledProcessError):
    print(f"Seurat object creation failed: {error}")

# Create project
project = Project("./my-project")

# Create function with callbacks
seurat_function = CreateSeuratObject(
    useqc_matrix=True,
    then=on_success,
    catch=on_error
)

# Execute function
result = project.call(seurat_function)

    

CLI Usage

Basic Usage

# Create Seurat objects (filtered matrices)
celline run createseuratobject

# Create with raw matrices
celline run createseuratobject --raw

# Verbose output
celline run createseuratobject --verbose

    

Implementation Details

Prerequisites

The function requires samples to be:

  1. Counted: Cell Ranger count must be completed
  2. Cell Type Predicted: Cell type prediction must be finished
  3. Preprocessed: Quality control preprocessing must be done

R Script Integration

The function executes R scripts to create Seurat objects:

# Load required libraries
library(Seurat)
library(hdf5r)

# Read Cell Ranger output
matrix_path <- "/path/to/filtered_feature_bc_matrix.h5"
expression_matrix <- Read10X_h5(matrix_path)

# Create Seurat object
seurat_obj <- CreateSeuratObject(
  counts = expression_matrix,
  project = "MyProject",
  min.cells = 3,
  min.features = 200
)

# Save Seurat object
saveRDS(seurat_obj, file = "seurat.rds")

    

File Processing

The function processes Cell Ranger outputs:

Input:  resources/SAMPLE_ID/counted/outs/filtered_feature_bc_matrix.h5
Output: data/SAMPLE_ID/seurat.rds
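
The mapping from a sample ID to its input and output paths can be sketched as a small helper. `seurat_io_paths` is a hypothetical function, not a Celline API. The filtered path and the output path come from the listing above. The raw-matrix file name `raw_feature_bc_matrix.h5` is Cell Ranger's standard name, and placing it in the same `outs` directory is an assumption.

```python
from pathlib import PurePosixPath
from typing import Tuple


def seurat_io_paths(
    project_root: str, sample_id: str, useqc_matrix: bool
) -> Tuple[PurePosixPath, PurePosixPath]:
    """Return (input_h5, output_rds) for one sample.

    Hypothetical helper; the raw-matrix location is an assumption.
    """
    matrix_name = (
        "filtered_feature_bc_matrix.h5"
        if useqc_matrix
        else "raw_feature_bc_matrix.h5"
    )
    root = PurePosixPath(project_root)
    input_h5 = root / "resources" / sample_id / "counted" / "outs" / matrix_name
    output_rds = root / "data" / sample_id / "seurat.rds"
    return input_h5, output_rds


filtered_in, rds_out = seurat_io_paths("project_root", "SAMPLE_ID", True)
raw_in, _ = seurat_io_paths("project_root", "SAMPLE_ID", False)
```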

    

Directory Structure

Output organization:

project_root/
├── data/
│   └── SAMPLE_ID/
│       ├── seurat.rds           # Seurat object
│       ├── src/
│       │   └── create_seurat.sh # Generated job script (shell)
│       └── log/
│           └── create_seurat_*.log

    

Matrix Selection

Filtered vs Raw Matrices

Matrix Type | Description | Use Case
Filtered | Only barcodes that Cell Ranger called as cells | Standard analysis
Raw | All detected barcodes, including those not called as cells | Custom filtering workflows

Quality Control Impact

Using useqc_matrix=True:

  • Uses the barcodes retained by Cell Ranger's cell-calling algorithm
  • Excludes barcodes not called as cells, such as empty droplets
  • Reduces computational burden
  • Recommended for most analyses

Using useqc_matrix=False:

  • Preserves all detected barcodes
  • Allows custom quality control
  • Increases data size and processing time
  • Useful for specialized analyses

Seurat Object Structure

Standard Components

Created Seurat objects contain:

# Seurat object structure
# Slot access below applies to Seurat v3/v4 Assay objects; Seurat v5 stores
# these as layers, e.g. LayerData(seurat_obj, layer = "counts")
seurat_obj@assays$RNA@counts        # Raw count matrix
seurat_obj@assays$RNA@data          # Normalized data (initially same as counts)
seurat_obj@meta.data               # Cell metadata
seurat_obj@reductions              # Dimensionality reductions (empty initially)
seurat_obj@graphs                  # Cell-cell graphs (empty initially)

    

Metadata Integration

The function automatically adds metadata:

Column | Description
orig.ident | Sample identifier
nCount_RNA | Total UMI count per cell
nFeature_RNA | Number of detected genes per cell
sample_id | Original sample accession ID
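
The two count-derived columns can be made concrete with a toy example: `nCount_RNA` is the column sum of the count matrix for a cell, and `nFeature_RNA` is the number of genes with a non-zero count in that cell. The function below is a plain-Python illustration on a made-up 3-gene by 3-cell matrix. Seurat computes these values itself, and the barcodes and counts here are invented for the example.

```python
from typing import Dict, List


def per_cell_qc(
    counts: List[List[int]], barcodes: List[str]
) -> Dict[str, Dict[str, int]]:
    """Compute nCount_RNA and nFeature_RNA from a genes x cells count matrix."""
    metadata: Dict[str, Dict[str, int]] = {}
    for col, barcode in enumerate(barcodes):
        # Collect this cell's counts across all genes (one matrix column)
        column = [row[col] for row in counts]
        metadata[barcode] = {
            "nCount_RNA": sum(column),
            "nFeature_RNA": sum(1 for value in column if value > 0),
        }
    return metadata


# Rows are genes, columns are cells (toy data).
toy_counts = [
    [3, 0, 1],
    [0, 0, 2],
    [5, 1, 0],
]
toy_barcodes = ["AAAC", "AAAG", "AAAT"]
toy_metadata = per_cell_qc(toy_counts, toy_barcodes)
```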

Error Handling

Prerequisite Checking

The function validates prerequisites:

# Inside the per-sample loop: skip samples that are not yet predicted and
# preprocessed (counting is assumed to have completed before these steps)
if not (sample.path.is_predicted_celltype and sample.path.is_preprocessed):
    print(f"Sample {sample.schema.key} is not ready. Skip")
    continue

    

Existing Object Detection

# Skip if Seurat object already exists
if os.path.isfile(f"{sample.path.data_sample}/seurat.rds"):
    print(f"Sample {sample.schema.key} already processed. Skip")
    continue

    

Common Issues

  1. Missing R Dependencies: Ensure Seurat and hdf5r packages are installed
  2. Insufficient Memory: Large datasets require substantial RAM
  3. File Permissions: Check read/write permissions
  4. Corrupted Matrices: Validate Cell Ranger output integrity
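
Several of these issues can be caught before any job is launched. The following preflight check is a minimal sketch, assuming the job needs `Rscript` on the `PATH`, a readable non-empty input matrix, and a writable output directory. `preflight_problems` is a hypothetical helper, not part of Celline, and it does not verify that the R packages are installed or that the HDF5 file is internally valid.

```python
import os
import shutil
from typing import List


def preflight_problems(input_h5: str, output_dir: str) -> List[str]:
    """Return human-readable problems found before running the job.

    Hypothetical helper; an empty list means no problem was detected.
    """
    problems: List[str] = []

    # R must be callable from the shell that runs the generated script
    if shutil.which("Rscript") is None:
        problems.append("Rscript not found on PATH")

    # The input matrix must exist, be readable, and be non-empty
    if not os.path.isfile(input_h5):
        problems.append(f"Input matrix not found: {input_h5}")
    elif not os.access(input_h5, os.R_OK):
        problems.append(f"Input matrix not readable: {input_h5}")
    elif os.path.getsize(input_h5) == 0:
        problems.append(f"Input matrix is empty: {input_h5}")

    # The output directory must exist and be writable
    if not os.path.isdir(output_dir):
        problems.append(f"Output directory not found: {output_dir}")
    elif not os.access(output_dir, os.W_OK):
        problems.append(f"Output directory not writable: {output_dir}")

    return problems
```

Returning a list rather than raising on the first failure lets a caller report every detected problem at once.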

R Environment Requirements

Required Packages

# Install required R packages
install.packages(c("Seurat", "hdf5r", "Matrix"))

# For Bioconductor dependencies
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install()

    

Memory Configuration

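# Increase memory limit if needed (Windows only, R < 4.2.0;
# memory.limit() is a no-op stub in R 4.2.0 and later)
memory.limit(size = 32000)  # size is in MB, so about 32 GB

# For large datasets
options(future.globals.maxSize = 8000 * 1024^2)  # 8000 MiB, roughly 8 GB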

    

Performance Considerations

Memory Requirements

Dataset Size | Memory Needed
<5K cells | 8 GB
5K-20K cells | 16 GB
20K-50K cells | 32 GB
>50K cells | 64+ GB
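
When sizing jobs programmatically, the table can be encoded as a lookup. The function below is a sketch with a hypothetical name. The table does not say which tier the shared boundaries (20K and 50K cells) belong to, so assigning them to the lower tier is an assumption, and the top tier returns the table's minimum of 64 GB.

```python
def recommended_memory_gb(n_cells: int) -> int:
    """Map a cell count to the memory guideline from the table above.

    Hypothetical helper; boundary values are assigned to the lower tier.
    """
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    if n_cells < 5_000:
        return 8
    if n_cells <= 20_000:
        return 16
    if n_cells <= 50_000:
        return 32
    # The table lists "64+ GB"; 64 is the minimum for this tier
    return 64
```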

Processing Time

Typical processing times:

Dataset Size | Processing Time
<5K cells | 1-2 minutes
5K-20K cells | 2-5 minutes
20K-50K cells | 5-15 minutes
>50K cells | 15+ minutes

Cluster Computing

Job Submission

For cluster environments:

#!/bin/bash
#PBS -N CreateSeurat
#PBS -l nodes=1:ppn=1
#PBS -l mem=32gb
#PBS -l walltime=2:00:00

module load R/4.1.0
Rscript create_seurat.R

    

Resource Allocation

Recommended cluster resources:

  • CPU: 1 core (R is primarily single-threaded for this task)
  • Memory: 32-64 GB depending on dataset size
  • Walltime: 2-4 hours for large datasets

Methods

call(project: Project) -> Project

Main execution method that creates Seurat objects for all eligible samples.

Parameters:

  • project: The Celline project instance

Returns: Updated project instance

Process:

  1. Iterates through all samples in the project
  2. Checks prerequisites (counted, predicted, preprocessed)
  3. Skips samples with existing Seurat objects
  4. Generates R scripts for Seurat object creation
  5. Executes scripts using ThreadObservable

Output Validation

Quality Checks

The function validates successful creation:

# Validate Seurat object
# inherits() is safe even when class() returns more than one element
if (inherits(seurat_obj, "Seurat")) {
  cat("Successfully created Seurat object\n")
  cat("Number of cells:", ncol(seurat_obj), "\n")
  cat("Number of features:", nrow(seurat_obj), "\n")
} else {
  stop("Failed to create valid Seurat object")
}

    

File Verification

import os

# Check that the output file exists and is non-empty
output_file = f"{sample.path.data_sample}/seurat.rds"
if os.path.isfile(output_file) and os.path.getsize(output_file) > 0:
    print("Seurat object created successfully")
else:
    raise FileNotFoundError("Failed to create Seurat object")

    

Integration with Pipeline

Typical Workflow

from celline import Project
from celline.functions.count import Count
from celline.functions.preprocess import Preprocess
from celline.functions.predict_celltype import PredictCelltype
from celline.functions.create_seurat import CreateSeuratObject

# Complete pipeline
project = Project("./my-project")

# Process data
project.call(Count(nthread=8))
project.call(Preprocess())
project.call(PredictCelltype())

# Create Seurat objects
project.call(CreateSeuratObject(useqc_matrix=True))

    

Troubleshooting

Common Issues

  1. R Package Missing: Install required R packages
  2. Memory Error: Increase system memory or use filtered matrices
  3. File Not Found: Ensure Cell Ranger count completed successfully
  4. Permission Denied: Check file system permissions

Debug Mode

Enable detailed R logging:

# Add to the R script: print the call stack on error, then exit non-zero
options(error = function() { traceback(2); quit(save = "no", status = 1) })
sessionInfo()

    

Manual Execution

For debugging, run the generated script manually:

# Navigate to sample directory
cd data/SAMPLE_ID/src/

# Execute the generated script manually (it is a shell script, so use bash)
bash create_seurat.sh