Skip to content

Troubleshooting

This guide helps resolve common issues when using U-Probe.

Installation Issues

Command not found: uprobe

Problem: After installation, the uprobe command is not recognized.

Solutions:

  1. Check if installed correctly:
bash
   pip list | grep uprobe
   python -c "import uprobe; print(uprobe.__version__)"
  1. Try using Python module syntax:
bash
   python -m uprobe --help
  1. Check PATH (for --user installs):
bash
   # Add to ~/.bashrc or ~/.zshrc
   export PATH="$HOME/.local/bin:$PATH"
  1. Reinstall in a virtual environment:
bash
   python -m venv uprobe_env
   source uprobe_env/bin/activate
   pip install uprobe

ImportError: No module named 'uprobe'

Problem: Python cannot find the uprobe module.

Solutions:

  1. Verify installation:
bash
   pip show uprobe
  1. Check Python environment:
bash
   which python
   which pip
   # Ensure both point to the same environment
  1. Reinstall:
bash
   pip uninstall uprobe
   pip install uprobe

Missing dependencies errors

Problem: Errors about missing packages like pandas, click, etc.

Solutions:

  1. Install all requirements:
bash
   pip install -r requirements.txt
  1. Update pip and try again:
bash
   pip install --upgrade pip
   pip install uprobe
  1. For development installs:
bash
   pip install -e ".[dev]"

Configuration Issues

FileNotFoundError: [Errno 2] No such file or directory

Problem: U-Probe cannot find specified files.

Solutions:

  1. Use absolute paths:
yaml
   # Instead of relative paths
   fasta: "genome.fa"
   # Use absolute paths  
   fasta: "/full/path/to/genome.fa"
  1. Check file permissions:
bash
   ls -la /path/to/genome.fa
   # Ensure files are readable
  1. Verify file existence:
bash
   file /path/to/genome.fa
   head -n 5 /path/to/genome.fa

Target validation failed

Problem: Error message "Invalid targets found" or no targets pass validation.

Solutions:

  1. Check gene names in GTF:
bash
   # Search for your gene in GTF
   grep -i "GAPDH" /path/to/annotation.gtf
   
   # Check available gene names
   awk '$3=="gene"' /path/to/annotation.gtf | \
   grep -o 'gene_name "[^"]*"' | sort | uniq | head -20
  1. Try different gene identifiers:
yaml
   targets:
     - "GAPDH"           # Gene symbol
     - "ENSG00000111640" # Ensembl ID
     - "2597"            # Entrez ID
  1. Use continue-invalid flag for testing:
bash
   uprobe validate-targets -p protocol.yaml -g genomes.yaml --continue-invalid
  1. Check GTF format:
bash
   # GTF should have these columns:
   # seqname source feature start end score strand frame attribute
   head -n 5 /path/to/annotation.gtf

Invalid YAML syntax

Problem: YAML parsing errors.

Solutions:

  1. Check indentation (use spaces, not tabs):
yaml
   # Correct
   probes:
     main_probe:          # 2 spaces
       template: "{seq}"  # 4 spaces
   
   # Wrong (tabs or inconsistent spacing)
   probes:
   	main_probe:        # tab character
      template: "{seq}"   # 3 spaces
  1. Validate YAML syntax:
bash
   python -c "import yaml; yaml.safe_load(open('protocol.yaml'))"
  1. Quote strings with special characters:
yaml
   # Quote expressions and conditions
   expr: "rc(target_region[0:20])"
   condition: "gc_content >= 0.4 & gc_content <= 0.6"

Runtime Issues

No target sequences generated

Problem: The generate-targets step produces an empty result.

Solutions:

  1. Check extraction parameters:
yaml
   extracts:
     target_region:
       source: "exon"  # Try "gene" if exons are too short
       length: 50      # Reduce if regions are smaller
       overlap: 10     # Reduce overlap
  1. Verify targets exist:
bash
   uprobe validate-targets -p protocol.yaml -g genomes.yaml -v
  1. Check for gene annotation issues:
bash
   # Look for your gene in GTF
   grep "GAPDH" /path/to/annotation.gtf | head -5

No probes constructed

Problem: The construct-probes step fails or produces no output.

Solutions:

  1. Check probe expressions:
yaml
   probes:
     test_probe:
       template: "{simple_part}"
       parts:
         simple_part:
           length: 20
           expr: "target_region[0:20]"  # Simple expression
  1. Verify encoding mappings:
yaml
   # Ensure all target genes have encoding entries
   encoding:
     GAPDH:  # Must match target name exactly
       BC1: "ACGTACGTACGT"
  1. Test with minimal probe:
yaml
   probes:
     minimal:
       expr: "target_region[0:25]"

All probes filtered out

Problem: Post-processing removes all probes.

Solutions:

  1. Use --raw flag to see unfiltered probes:
bash
   uprobe run -p protocol.yaml -g genomes.yaml --raw
  1. Relax filtering conditions:
yaml
   post_process:
     filters:
       gc_content:
         condition: "gc_content >= 0.2 & gc_content <= 0.8"  # Very relaxed
  1. Check attribute calculations:
yaml
   # Remove problematic attributes temporarily
   attributes:
     basic_gc:
       target: main_probe
       type: gc_content
     # Comment out complex attributes:
     # off_targets: ...
  1. Examine raw results:
python
   import pandas as pd
   df = pd.read_csv('results/experiment_raw.csv')
   print(df.describe())  # Check attribute distributions
   print(df[df['gc_content'].isna()])  # Find failed calculations

Performance Issues

Slow execution

Problem: U-Probe runs very slowly.

Solutions:

  1. Increase thread count:
bash
   uprobe run -p protocol.yaml -g genomes.yaml -t 16
  1. Use faster extraction:
yaml
   extracts:
     target_region:
       source: "exon"  # Faster than "gene"
       length: 100     # Shorter regions
  1. Reduce expensive attributes:
yaml
   attributes:
     # Keep fast attributes
     gc_content:
       target: main_probe
       type: gc_content
     # Remove slow ones temporarily:
     # fold_score: ...
     # kmer_count: ...
  1. Process in batches:
bash
   # Split large target lists
   uprobe run -p small_batch.yaml -g genomes.yaml

Memory issues

Problem: Out of memory errors or system becomes unresponsive.

Solutions:

  1. Process smaller batches:
yaml
   targets:
     - "GAPDH"
     - "ACTB"
     # Process 5-10 genes at a time for large genomes
  1. Reduce sequence length:
yaml
   extracts:
     target_region:
       length: 80   # Shorter sequences use less memory
       overlap: 15
  1. Skip memory-intensive attributes:
yaml
   # Avoid these for large datasets:
   # - n_mapped_genes with blast
   # - kmer_count
   # - complex fold_score calculations

Index building fails

Problem: Genome index building fails or crashes.

Solutions:

  1. Check available disk space:
bash
   df -h /path/to/genome/directory
  1. Verify genome file integrity:
bash
   file /path/to/genome.fa
   head -n 10 /path/to/genome.fa
   tail -n 10 /path/to/genome.fa
  1. Build indices manually:
bash
   # Bowtie2
   bowtie2-build /path/to/genome.fa /path/to/indices/genome
   
   # BLAST
   makeblastdb -in /path/to/genome.fa -dbtype nucl -out /path/to/indices/genome
  1. Use pre-built indices:
yaml
   # Point to existing indices
   human_hg38:
     fasta: "/data/hg38.fa"
     gtf: "/data/hg38.gtf"
     out: "/data/existing_indices"  # Pre-built indices location

Attribute Calculation Issues

Melting temperature calculation fails

Problem: Tm calculation produces NaN values or errors.

Solutions:

  1. Check sequence validity:
python
   # Sequences should only contain ATCG
   import re
   def check_sequence(seq):
       return bool(re.match('^[ATCG]*$', seq))
  1. Handle short sequences:
yaml
   # Ensure minimum sequence length
   probes:
     main_probe:
       parts:
         binding:
           length: 15  # Minimum for reliable Tm calculation

Off-target calculation fails

Problem: Alignment-based attributes fail.

Solutions:

  1. Verify indices exist:
bash
   ls -la /path/to/indices/
   # Should contain .bt2 files for bowtie2
  1. Test aligner manually:
bash
   # Test bowtie2
   echo "ATCGATCGATCGATCG" | bowtie2 -x /path/to/indices/genome -
  1. Use alternative aligner:
yaml
   attributes:
     off_targets:
       target: main_probe
       type: n_mapped_genes
       aligner: blast  # Try blast if bowtie2 fails

K-mer counting fails

Problem: kmer_count attributes produce errors.

Solutions:

  1. Check Jellyfish database:
bash
   jellyfish info genome.jf
  1. Build Jellyfish database:
bash
   jellyfish count -m 15 -s 1000000000 -t 8 -o genome.jf genome.fa
  1. Use alternative complexity measures:
yaml
   # Instead of kmer_count, use:
   attributes:
     sequence_complexity:
       target: main_probe
       type: complexity_score

Data Format Issues

Unexpected output format

Problem: Output CSV has unexpected columns or values.

Solutions:

  1. Check probe names match:
yaml
   # Probe names become column names
   probes:
     my_probe:  # Creates column 'my_probe'
       template: "{seq}"
  1. Verify attribute names:
yaml
   attributes:
     probe_gc:     # Creates column 'probe_gc'
       target: my_probe
       type: gc_content
  1. Examine raw output:
bash
   uprobe run -p protocol.yaml -g genomes.yaml --raw
   # Check _raw.csv file for all calculated values

Missing sequences in output

Problem: Some expected probes are missing from results.

Solutions:

  1. Check filtering criteria:
yaml
   # Very permissive filters for debugging
   post_process:
     filters:
       anything_goes:
         condition: "True"  # Passes everything
  1. Look for errors in logs:
bash
   uprobe --verbose run -p protocol.yaml -g genomes.yaml 2>&1 | tee log.txt
  1. Check intermediate files:
bash
   ls -la results/
   wc -l results/*.csv  # Count lines in each file

Getting Help

Check Logs

Always run with verbose output for troubleshooting:

bash
uprobe --verbose run -p protocol.yaml -g genomes.yaml 2>&1 | tee uprobe.log

Minimal Test Case

Create a minimal test to isolate issues:

yaml
# minimal_test.yaml
name: "minimal_test"
genome: "human_hg38"
targets: ["GAPDH"]  # Just one target

extracts:
  target_region:
    source: "exon"
    length: 50
    overlap: 10

probes:
  simple:
    expr: "target_region[0:20]"

# No attributes or filters initially

Report Issues

When reporting issues, include:

  1. U-Probe version: uprobe version
  2. Full error message and traceback
  3. Configuration files (anonymized)
  4. System information: OS, Python version
  5. Steps to reproduce

Where to Get Help

  1. Documentation: Check this documentation first
  2. GitHub Issues: Report bugs
  3. GitHub Discussions: Ask questions
  4. Examples: Review working examples in the repository

Common Error Messages

Error MessageSolution
"Genome 'X' not found"Check genome name matches genomes.yaml key
"No targets specified"Add targets list to protocol.yaml
"Invalid expression: X"Check probe expression syntax
"Attribute calculation failed"Verify required files and indices exist
"No data to concatenate"Check that previous steps generated output
"YAML parsing error"Check indentation and syntax
"Permission denied"Check file permissions and disk space
"Index not found"Run build-index command first

Prevention Tips

  1. Start simple: Begin with basic configurations and add complexity gradually
  2. Validate early: Use validate-targets before full runs
  3. Test with subsets: Use small target lists for initial testing
  4. Use version control: Track configuration changes
  5. Document decisions: Comment your configuration files
  6. Regular backups: Keep backups of working configurations

Next Steps

If you're still having issues:

  1. Review the examples for working configurations
  2. Check the configuration guide for detailed option descriptions
  3. Ask for help on GitHub Discussions

Released under the MIT License.