There is a Summer School currently running at SHARCNET. I enrolled in some courses, and today the Bioinformatics course starts. Below I share my notes from this 2-session course.

AM Session

How to submit a job to the Compute Canada scheduler (Slurm). Example my_submission.sh file:

#!/bin/bash
#SBATCH --account=def-training-wa
#SBATCH --reservation=snss-wr_cpu
#SBATCH --time=0-0:10:00 # d-hh:mm:ss

echo hello
sleep 10

Run in terminal: sbatch --job-name=test my_submission.sh
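A slightly fuller version of the submission script might look like the sketch below. The --cpus-per-task, --mem and --output values are assumptions chosen for illustration, and def-training-wa is the course account from the notes (outside the course you would use your own def-... account and drop the course reservation). Because #SBATCH lines are ordinary comments to bash, the script also runs with plain bash, in which case the SLURM_* variables are unset and the fallback values are used.

```shell
#!/bin/bash
# Course account from the notes; replace with your own def-<PI> account outside the course
#SBATCH --account=def-training-wa
# Same effect as passing --job-name=test to sbatch on the command line
#SBATCH --job-name=test
# Walltime in d-hh:mm:ss
#SBATCH --time=0-00:10:00
# Assumed resource requests, for illustration only
#SBATCH --cpus-per-task=2
#SBATCH --mem=1G
# Write output to <job-name>-<job-id>.out (%x = job name, %j = job ID)
#SBATCH --output=%x-%j.out

# Slurm sets these variables inside a job; the defaults apply when run with plain bash
NCPUS=${SLURM_CPUS_PER_TASK:-1}
JOB_LABEL="${SLURM_JOB_NAME:-test}-${SLURM_JOB_ID:-local}"

echo "Running ${JOB_LABEL} on ${HOSTNAME} with ${NCPUS} CPU(s)"
```

Since the job name is set inside the script, it can be submitted simply with sbatch my_submission.sh and followed with sq; when it finishes, its output ends up in a file such as test-<jobid>.out.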

Useful terminal commands on Compute Canada:

  • module purge: unloads all non-standard modules.
  • module restore: restores the default set of modules.

Nix 101: Nix is a package manager that is available on Compute Canada.

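To create a Python virtual environment (different from a conda/Anaconda environment); --no-index tells pip to install only from the wheels that Compute Canada provides locally instead of downloading from PyPI: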

module load python/3.8
virtualenv --no-download ~/ENV
source ~/ENV/bin/activate
pip install --upgrade pip
pip install --no-index <PACKAGE_NAME>
deactivate
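To see what activating an environment actually does, here is a small sketch that runs both on the cluster and on a laptop. It swaps in the standard-library python3 -m venv for virtualenv (an assumption, so nothing has to be installed first) and passes --without-pip so it does not depend on ensurepip. Inside a Compute Canada job it builds the environment in $SLURM_TMPDIR, the node-local scratch directory; elsewhere it falls back to a temporary directory.

```shell
#!/bin/bash
set -e

# Node-local scratch inside a Compute Canada job, otherwise a fresh temporary directory
WORKDIR=${SLURM_TMPDIR:-$(mktemp -d)}

# Stand-in for "virtualenv --no-download"; --without-pip avoids needing ensurepip
python3 -m venv --without-pip "$WORKDIR/env"

BEFORE=$(command -v python3)
source "$WORKDIR/env/bin/activate"
# activate prepends $WORKDIR/env/bin to PATH, so python3 now resolves to the env's copy
INSIDE=$(command -v python3)
deactivate
# deactivate restores the previous PATH
AFTER=$(command -v python3)

echo "before: $BEFORE"
echo "inside: $INSIDE"
echo "after:  $AFTER"

# Remove the environment (but never $SLURM_TMPDIR itself)
rm -rf "$WORKDIR/env"
```

The same idea is behind the commands above: nothing is copied into your shell, activation only changes PATH (and sets VIRTUAL_ENV) until deactivate is called.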

PM Session

Some useful commands in terminal:

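  • top: shows the processes running on the machine.
  • htop: interactive, more visual version of top.
  • uptime: shows how long the machine has been running, the number of logged-in users and the load averages.
  • free: shows used and available memory.
  • sq: lists your own jobs in the queue (Compute Canada shortcut for squeue -u $USER).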
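When only a quick, non-interactive look is needed (for example at the start of a job script), the same tools can be run in one go, as in this sketch. top -b -n 1 runs top in batch mode for a single iteration, and the squeue call is skipped on machines without Slurm. It assumes the procps versions of free, uptime and top found on typical Linux systems.

```shell
#!/bin/bash
# Load averages and how long the machine has been up
uptime
# Memory usage in human-readable units
free -h
# top is normally interactive; -b -n 1 prints one batch-mode frame instead
top -b -n 1 | head -n 15

# sq exists only on Compute Canada clusters; it is equivalent to squeue for your user
if command -v squeue >/dev/null 2>&1; then
    squeue -u "${USER:-$(id -un)}"
fi

# Keep just the "Mem:" row of free, e.g. to write it into a job log
MEM_LINE=$(free -h | grep '^Mem:')
echo "$MEM_LINE"
```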

Work with BLAST

BLAST = Basic Local Alignment Search Tool. BLAST finds regions of similarity between biological sequences. It is a heuristic method for pairwise local sequence alignment.

IOData & Databases - Compute Canada Environment Setup

  1. Make a virtual environment called env_database, activate it, and install Python libraries:
module load python/3.6
mkdir ~/envs
virtualenv --no-download ~/envs/env_database
source ~/envs/env_database/bin/activate
pip install --no-index --upgrade pip
  2. Install IOData in the virtual environment and run its tests (warnings are fine):
mkdir ~/modules
cd ~/modules
git clone https://github.com/theochem/iodata.git
cd iodata
python3 -m pip install -e .
pytest -v iodata
  3. Install Databases from source to get access to the databases command-line interface (you need to provide your GitHub username and password when cloning the repository):
cd ~/modules
git clone https://github.com/QuantumElephant/databases.git
cd databases
python3 -m pip install -e .
  4. Updating the Databases scripts: the scripts are actively being developed, so you will most likely need to update the Databases source code on Compute Canada (make sure the env_database environment is active):
cd ~/modules/databases
git pull origin master
pip install -e .

The same commands can be used to update IOData source code; just use cd ~/modules/iodata.
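Since both repositories are updated the same way, the steps can be wrapped in a small shell function. update_repo is a hypothetical helper, not part of IOData or Databases; it assumes the clones live in ~/modules and refuses to run unless the env_database environment is active, so the editable install does not end up in the wrong Python.

```shell
#!/bin/bash
# Hypothetical helper: pull the latest master branch of a clone in ~/modules
# and reinstall it in editable mode inside the active virtual environment.
update_repo() {
    local repo="$1"
    # Guard: only proceed if the env_database environment is active
    case "${VIRTUAL_ENV:-}" in
        */env_database) ;;
        *)
            echo "activate ~/envs/env_database before updating $repo" >&2
            return 1
            ;;
    esac
    # Run in a subshell so the caller's working directory is unchanged
    (
        cd "$HOME/modules/$repo" &&
        git pull origin master &&
        python3 -m pip install -e .
    )
}
```

Usage, after source ~/envs/env_database/bin/activate: update_repo databases, then update_repo iodata.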

Note: for some reason I am having problems with these instructions on Graham, so I was not able to install the corresponding packages and start working with the database examples. This is something I was already aware of: I suspect that somewhere between 40 and 50% of a computational chemist's time is spent dealing with broken code, looking through StackExchange threads and troubleshooting software that doesn't want to run. So for today I am done dealing with Compute Canada; I posted a discussion thread in the group and hope that someone's wisdom might help me solve this problem.

I will try to install the packages locally to see whether I run into the same problems as on Compute Canada, since it may be that I am messing something up on my side.

Installing IOData and Databases locally with conda

  1. Create a local environment with conda:
conda create --name loc_env_database
conda activate loc_env_database
conda install scipy numpy
  2. Clone the repository from GitHub and install it with pip:
mkdir -p ~/modules
cd ~/modules
git clone https://github.com/theochem/iodata.git
cd iodata
python3 -m pip install --upgrade pip
python3 -m pip install -e .
  3. Update the package when the GitHub repository is updated:
cd ~/modules/iodata
git pull origin master
pip install -e .