Today I continued working on the databases. While using the Linux terminal to access Compute Canada, I also needed an efficient way to rename folders, subfolders and files. Here are some of my notes from these days.

Notes

Copy the files from the original directory to my folder in rrg-ayers-ab so I can work on the database files (issue_14 and Na_13):

cd ~/projects/projects/rrg-ayers-ab/juansa
cp -r ../databases/share_geometry/issue_14/ .
cp -r ../databases/share_geometry/Na_13/ .

Rename folders

Inside issue_14 there were duplicate folders for CH<sub>4</sub>. After I deleted the duplicates, some numbers were missing from the sequence of folder names, for example:

├── 0001_CH4
│   ├── 0001_CH4.chk
├── 0003_CH2OH
│   ├── 0003_CH2OH.chk

where 0002_CH4 was the duplicate folder. To renumber the folders, I ran the following command:

database number 00*

The command prepends a new sequential number to each folder name, so the folders now look like this:

├── 0001_0001_CH4
│   ├── 0001_CH4.chk
├── 0002_0003_CH2OH
│   ├── 0003_CH2OH.chk

To strip the old number from the folder names, I spent a good while looking for a practical solution in the Linux terminal, but those hours were fruitless. I was almost convinced I would have to learn regular expressions just to accomplish a task that, in my view, should be very easy. Here are some of the commands I tried:

rename -n -v 's/^\d{4}_\K\d{5}//' *

rename -n -v 's/^\d{4}_\K.....//' *

rename -n 's/^\d{4}_\K.{5}(.*)/$1/' *

rename -n 's/^\d{5}\K\d{5}//' *

rename -n 's/\_\d{4}//' *

's/^\d{5}//[a-zA-Z0-9]$'

rename -n 's/_d{4}//' *
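For reference, the substitution I was after can be written with Python's `re` module. One possible reason the commands above did nothing is that the `s/.../.../` syntax is only understood by the Perl version of `rename`. Many Linux distributions ship the util-linux `rename` instead, which replaces a plain string rather than applying a regular expression. I have not checked which version is installed on the cluster. Also, in the first attempt `\d{5}` asks for five digits, but after `0001_` there are only four digits before the next underscore. The sketch below assumes names of the form `<new index>_<old index>_<molecule>`, as produced by `database number`. The function name `strip_old_number` is my own.

```python
import re

# Assumed layout "<new>_<old>_<rest>": capture the new 4-digit index and drop "<old>_".
OLD_NUMBER = re.compile(r"^(\d{4})_\d{4}_")


def strip_old_number(name: str) -> str:
    """Turn '0002_0003_CH2OH' into '0002_CH2OH'; names without an old number are returned unchanged."""
    return OLD_NUMBER.sub(r"\1_", name, count=1)


if __name__ == "__main__":
    for example in ["0001_0001_CH4", "0002_0003_CH2OH", "0001_CH4"]:
        print(example, "->", strip_old_number(example))
```

With the Perl `rename`, the same pattern would read `s/^(\d{4})_\d{4}_/$1_/` (untested on the cluster).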

Unfortunately, none of these worked, so I asked @fwmeng88 for help. With some notes he shared with me, I was able to write a short Python script to solve the problem. Here is my code, which is actually my first Python script run on a computing cluster:

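import os

# Rename every subfolder of the current directory from
# "<new>_<old>_<name>" (e.g. 0002_0003_CH2OH) to "<new>_<name>" (0002_CH2OH).
for old_folder in next(os.walk("."))[1]:
    str_list = old_folder.split("_", 2)  # at most three pieces: new, old, name
    if len(str_list) != 3:
        continue  # skip folders that do not follow this pattern
    new_folder = str_list[0] + "_" + str_list[2]
    os.rename(old_folder, new_folder)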
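Since `rename -n` is handy for previewing changes, here is a minimal sketch of a "dry run" variant of the script. It first builds the list of planned renames, prints it, and only touches the disk when `dry_run` is set to `False`. It also skips any rename whose target already exists, so nothing is overwritten. It assumes the same `<new>_<old>_<name>` folder names as above. The names `plan_renames`, `apply_renames` and `dry_run` are my own.

```python
import os


def plan_renames(names):
    """Map '<new>_<old>_<rest>' names to '<new>_<rest>', skipping anything else."""
    plan = {}
    for name in names:
        parts = name.split("_", 2)  # at most three pieces: new, old, rest
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            plan[name] = parts[0] + "_" + parts[2]
    return plan


def apply_renames(plan, dry_run=True):
    """Print each planned rename; only call os.rename when dry_run is False."""
    for old, new in plan.items():
        if os.path.exists(new):
            print(f"skip {old}: {new} already exists")
            continue
        print(f"{old} -> {new}")
        if not dry_run:
            os.rename(old, new)


if __name__ == "__main__":
    folders = next(os.walk("."))[1]  # subfolders of the current directory
    apply_renames(plan_renames(folders), dry_run=True)  # set to False to rename
```

Running it once with `dry_run=True`, checking the printed list, and then switching to `False` gives roughly the same safety net as `rename -n`.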

After this, the directories have the desired names:

├── 0001_CH4
│   ├── 0001_CH4.chk
├── 0002_CH2OH
│   ├── 0003_CH2OH.chk

It was not until the next day that I noticed that the files inside the folders had not been renamed (for example, 0002_CH2OH still contains 0003_CH2OH.chk), but that will be covered in a future blog entry.

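Finally, since the script has to be run on my files stored on Compute Canada (CC), this is how to run it from a terminal. The script works on the current directory, so first move into the directory that contains the folders to rename: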

cd ~/.../my_directory
python my_script.py