June 22 - SHARCNET Summer School: Bioinformatics
SHARCNET is currently running a Summer School. I enrolled in some courses, and the Bioinformatics course starts today. Below are my notes from the two-session course.
AM Session
How to submit a job to the Compute Canada scheduler. Example my_submission.sh file:
#!/bin/bash
#SBATCH --account=def-training-wa
#SBATCH --reservation=snss-wr_cpu
#SBATCH --time=0-0:10:00 # d-hh:mm:ss
echo hello
sleep 10
Run in terminal: sbatch --job-name=test my_submission.sh
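The submission steps above can be sketched end to end as follows. The heredoc recreates the same my_submission.sh in a scratch directory (the directory is my assumption, any writable path works), checks it for syntax errors, and submits it only if Slurm is present. The account and reservation names are the ones from the course and will most likely not work outside it.

```shell
#!/bin/bash
# Write the example submission script to a scratch directory
# (assumption: any writable location would do).
workdir="$(mktemp -d)"
script="$workdir/my_submission.sh"

cat > "$script" <<'EOF'
#!/bin/bash
#SBATCH --account=def-training-wa
#SBATCH --reservation=snss-wr_cpu
#SBATCH --time=0-0:10:00   # d-hh:mm:ss
echo hello
sleep 10
EOF

# Check the script for shell syntax errors before submitting.
bash -n "$script" && echo "syntax OK"

# Submit only if Slurm is available on this machine.
if command -v sbatch >/dev/null 2>&1; then
    sbatch --job-name=test "$script" || echo "submission failed"
else
    echo "sbatch not found; skipping submission"
fi
```

The guard around sbatch is only there so the snippet can also be tried on a laptop; on the cluster the plain sbatch line is enough.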
Useful terminal commands in Compute Canada:
module purge
: will unload anything that is non-standard.
module restore
: will restore the default modules.
Nix 101: Nix is a package manager that is available on Compute Canada.
To create a Python virtual environment (different from a conda/Anaconda environment):
module load python/3.8
virtualenv --no-download ~/ENV
source ~/ENV/bin/activate
pip install --upgrade pip
pip install --no-index <PACKAGE_NAME>
deactivate
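A quick way to confirm whether the environment is active is to look at the VIRTUAL_ENV variable, which the activate script exports and deactivate unsets. The helper name venv_status below is hypothetical, just a minimal sketch of that check:

```shell
#!/bin/bash
# Report whether a virtual environment is currently active.
# Relies on VIRTUAL_ENV, which activate exports and deactivate unsets.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $VIRTUAL_ENV"
    else
        echo "no virtual environment active"
    fi
}

venv_status
```

Running it once after source ~/ENV/bin/activate and once after deactivate should show the two different messages.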
PM Session
Some useful commands in terminal:
top
: see the running processes on the computer.
htop
: visual version of top.
uptime
: see the current state of the computer's resources (uptime and load averages).
free
: available memory.
sq
: to see your user's jobs.
Work with BLAST
BLAST = Basic Local Alignment Search Tool. BLAST finds regions of similarity between biological sequences. It is a heuristic pairwise sequence alignment method.
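As a rough illustration of what a pairwise search looks like on the command line, here is a minimal sketch using the BLAST+ tools makeblastdb and blastn. The two sequences and the file names are invented for illustration only, and the snippet assumes the BLAST+ executables are already on the PATH (on the cluster that would normally mean loading a module first).

```shell
#!/bin/bash
# Toy nucleotide sequences, invented for illustration only.
workdir="$(mktemp -d)"

cat > "$workdir/subject.fasta" <<'EOF'
>subject_1
ATGGCGTACGTTAGCTAGCTAGGATCCGATCGATCGTAGCTAGCTAGCATCGATCGTACG
EOF

cat > "$workdir/query.fasta" <<'EOF'
>query_1
GCTAGCTAGGATCCGATCGATCGTAGCTAGC
EOF

if command -v makeblastdb >/dev/null 2>&1 && command -v blastn >/dev/null 2>&1; then
    # Build a nucleotide database from the subject sequence.
    makeblastdb -in "$workdir/subject.fasta" -dbtype nucl -out "$workdir/subject_db"
    # Search the query against it; -outfmt 6 prints tab-separated hits.
    blastn -query "$workdir/query.fasta" -db "$workdir/subject_db" -outfmt 6
else
    echo "BLAST+ tools not found on PATH; skipping the search"
fi
```

Each line of the tabular output describes one local alignment between the query and a database sequence.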
IOData & Databases - Compute Canada Environment Setup
- Make a virtual environment called env_database, activate it, and install Python libraries:
module load python/3.6
mkdir ~/envs
virtualenv --no-download ~/envs/env_database
source ~/envs/env_database/bin/activate
pip install --no-index --upgrade pip
- Install IOData in the virtual environment and run the tests (warnings are fine):
mkdir ~/modules
cd ~/modules
git clone https://github.com/theochem/iodata.git
cd iodata
python3 -m pip install -e .
pytest -v iodata
- Install Databases from source (you need to provide your GitHub username and password when cloning the repository) to get access to the Databases command-line interface:
cd ~/modules
git clone https://github.com/QuantumElephant/databases.git
cd databases
python3 -m pip install -e .
- Databases Scripts Updates: the scripts are actively being developed, so you will most likely need to update the Databases source code on Compute Canada. With the env_database environment active, run:
cd ~/modules/databases
git pull origin master
pip install -e .
The same commands can be used to update IOData source code; just use cd ~/modules/iodata.
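Since the update steps are identical for both repositories, they can be wrapped in a small function. This is only a sketch: update_repo is a hypothetical helper name, the ~/modules layout is the one used in these notes, and the environment is assumed to be active already.

```shell
#!/bin/bash
# Pull the latest source and reinstall an editable package.
# Usage: update_repo <path-to-clone> [branch]
update_repo() {
    local repo_dir="$1"
    local branch="${2:-master}"   # master is the branch used in the notes
    if [ ! -d "$repo_dir/.git" ]; then
        echo "not a git clone: $repo_dir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$repo_dir" &&
        git pull origin "$branch" &&
        python3 -m pip install -e .
    )
}

# Only touch the clones that actually exist on this machine.
for name in iodata databases; do
    if [ -d "$HOME/modules/$name/.git" ]; then
        update_repo "$HOME/modules/$name"
    fi
done
```

The function stops at the first failing step, so a failed git pull will not be followed by a reinstall of stale code.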
Note: for some reason I am having problems with these instructions on Graham, so I was not able to install the packages and start working with the database examples. This is something I was already aware of: I suspect that somewhere between 40% and 50% of a computational chemist's time is spent dealing with broken code, looking through StackExchange threads and troubleshooting software that doesn't want to run. So for today I am done dealing with Compute Canada. I posted a discussion thread in the group and hope that someone's wisdom might help me solve this problem.
I will try to install the packages locally to see whether I run into the same problems as on Compute Canada, as it may be that I am messing something up on my side.
Installing IOData and Databases locally with conda
- Create local virtual environment with conda:
conda create --name loc_env_database
conda activate loc_env_database
conda install scipy numpy
- Clone the repository from GitHub and install it with pip:
mkdir -p ~/modules
cd ~/modules
git clone https://github.com/theochem/iodata.git
cd iodata
python3 -m pip install --upgrade pip
python3 -m pip install -e .
- Update the package when the GitHub repository is updated:
cd ~/modules/iodata
git pull origin master
pip install -e .