June 29 - Databases and Python Scripts
Today I continued working on the databases. While using the Linux terminal to access Compute Canada, I also needed an efficient way to rename folders, subfolders, and files. Here are some of my notes from these days.
Notes
Copy files from original directory to my folder in rrg-ayers-ab to work on the database files (issue_14 and Na_13):
cd ~/projects/projects/rrg-ayers-ab/juansa
cp -r ../databases/share_geometry/issue_14/ .
cp -r ../databases/share_geometry/Na_13/ .
Rename folders
Inside issue_14 there were repeated folders for CH₄. After deleting the repeated folders, some numbers were missing among the first folders, for example:
├── 0001_CH4
│ ├── 0001_CH4.chk
├── 0003_CH2OH
│ ├── 0003_CH2OH.chk
where 0002_CH4 was the repeated folder. To renumber the folders, I ran the following command:
database number 00*
Now the folders are numbered as follows:
├── 0001_0001_CH4
│ ├── 0001_CH4.chk
├── 0002_0003_CH2OH
│ ├── 0003_CH2OH.chk
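The `database number` command is not explained further here. As a rough sketch only, assuming it simply prefixes each folder with a new four-digit sequential number in sorted order, a hypothetical Python equivalent could look like this (the function name `number_folders` is made up for illustration):

```python
import os


def number_folders(root="."):
    """Prefix every folder directly under root with a 4-digit running number.

    Assumption: folders are numbered in sorted (alphabetical) order,
    starting at 0001. Returns the list of new folder names.
    """
    new_names = []
    folders = sorted(next(os.walk(root))[1])
    for index, name in enumerate(folders, start=1):
        new_name = "{:04d}_{}".format(index, name)
        os.rename(os.path.join(root, name), os.path.join(root, new_name))
        new_names.append(new_name)
    return new_names
```

With the two folders shown above, this would produce 0001_0001_CH4 and 0002_0003_CH2OH. Files inside the folders are left untouched.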
To rename the folders, I spent a good while looking for a practical solution using the Linux terminal, but the hours spent on that task were fruitless. I was almost convinced to learn regular expressions just to accomplish this task, which in my view should be very easy. Here is some of the code that I tried:
rename -n -v 's/^\d{4}_\K\d{5}//' *
rename -n -v 's/^\d{4}_\K.....//' *
rename -n 's/^\d{4}_\K.{5}(.*)/$1/' *
rename -n 's/^\d{5}\K\d{5}//' *
rename -n 's/\_\d{4}//' *
's/^\d{5}//[a-zA-Z0-9]$'
rename -n 's/_d{4}//' *
Unfortunately, none of these worked, so I asked @fwmeng88 for help with this task. With some notes that he shared with me, I was able to write a short Python script to solve it. Here is my code, which is actually my first Python script run on a computing cluster:
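import os

# Loop over the folders directly inside the current directory.
for old_folder in next(os.walk("."))[1]:
    # "0001_0001_CH4" -> ["0001", "0001", "CH4"]
    str_list = old_folder.split("_")
    # Keep the new number (first part) and the label (last part).
    new_folder = str_list[0] + "_" + str_list[-1]
    os.rename(old_folder, new_folder)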
After this, the directories have the desired names:
├── 0001_CH4
│ ├── 0001_CH4.chk
├── 0002_CH2OH
│ ├── 0003_CH2OH.chk
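The split-based script renames every folder it finds, so a folder without an underscore (say, a hypothetical `scripts` folder) would become `scripts_scripts`. As a more defensive sketch, the same renaming can be done with Python's `re` module and a dry-run option. The names `strip_old_number` and `rename_folders` are my own, and the pattern assumes folder names of the form NNNN_NNNN_label:

```python
import os
import re

# Matches names such as "0001_0001_CH4": new number, old number, label.
PATTERN = re.compile(r"^(\d{4})_\d{4}_(.+)$")


def strip_old_number(name):
    """Return name without the second 4-digit group, or None if it does not match."""
    match = PATTERN.match(name)
    if match is None:
        return None
    return match.group(1) + "_" + match.group(2)


def rename_folders(root=".", dry_run=True):
    """Rename matching folders under root and return the (old, new) pairs.

    With dry_run=True nothing is renamed; the pairs are only reported.
    """
    planned = []
    for old_name in sorted(next(os.walk(root))[1]):
        new_name = strip_old_number(old_name)
        if new_name is None:
            continue  # leave folders that do not fit the pattern alone
        planned.append((old_name, new_name))
        if not dry_run:
            os.rename(os.path.join(root, old_name),
                      os.path.join(root, new_name))
    return planned
```

Calling `rename_folders(".")` first and inspecting the returned pairs, then calling `rename_folders(".", dry_run=False)`, plays the same role as the `-n` flag in the `rename` attempts above.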
It was not until the next day that I noticed that the files inside the folders had not been renamed, but that will be covered in a future blog entry.
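Until then, here is one possible approach, as a sketch only. It assumes that each file should take the numeric prefix of its parent folder (so 0003_CH2OH.chk inside 0002_CH2OH would become 0002_CH2OH.chk). The function name `rename_files_to_match_folder` is hypothetical and this is not necessarily the final fix:

```python
import os


def rename_files_to_match_folder(root="."):
    """Give each numbered file the numeric prefix of its parent folder.

    Assumption: folders and files are both named "<digits>_<rest>".
    Returns a list of (folder, old_file, new_file) for every rename done.
    """
    renamed = []
    for folder in sorted(next(os.walk(root))[1]):
        prefix = folder.split("_")[0]
        if not prefix.isdigit():
            continue  # folder does not follow the numbering scheme
        folder_path = os.path.join(root, folder)
        for filename in sorted(os.listdir(folder_path)):
            old_path = os.path.join(folder_path, filename)
            parts = filename.split("_", 1)
            if not os.path.isfile(old_path) or len(parts) != 2:
                continue
            if not parts[0].isdigit() or parts[0] == prefix:
                continue  # not numbered, or already consistent
            new_name = prefix + "_" + parts[1]
            os.rename(old_path, os.path.join(folder_path, new_name))
            renamed.append((folder, filename, new_name))
    return renamed
```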
Finally, since the script has to be applied to my files on Compute Canada (CC), this is how to run it in a terminal:
cd ~/.../my_directory
python my_script.py