July 12 - Progress in B3DB
Mondays are the days when I have recurring meetings with Fanwang. He is a PhD candidate at AyersLab and is currently working on his thesis. I am glad to be helping with some of the work he has to do, especially the part that involves B3DB and imbalanced learning. I had some pending work from last week, so today I hope to finish it.
In summary, at this stage we are trying to train different learning models to classify the blood-brain barrier permeability of drug molecules using chemical descriptors. This is a supervised learning approach, as there is a label for each molecule in the dataset.
Today I wrote the scripts to implement the following models: SVC, KNN, DecisionTrees, LogisticRegression, and GPC.
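To give an idea of what such scripts can look like, here is a minimal sketch that fits the five model families on a toy problem. It assumes the scikit-learn estimators with default hyperparameters and reads GPC as `GaussianProcessClassifier`; the data comes from `make_classification` as a synthetic, mildly imbalanced stand-in for the descriptor matrix, not from B3DB, and the names `models` and `scores` are only illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the chemical descriptors and BBB labels (NOT B3DB data).
X, y = make_classification(
    n_samples=200, n_features=10, weights=[0.7, 0.3], random_state=0
)

# Stratify the split so both classes keep their proportions in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumed mapping from the model names in this post to scikit-learn classes.
models = {
    "SVC": SVC(),
    "KNN": KNeighborsClassifier(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "GPC": GaussianProcessClassifier(random_state=0),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Plain accuracy is used here only to keep the sketch short. With imbalanced classes, a metric such as balanced accuracy is probably a better basis for comparing the models.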
Besides working on the ML models, I wrote a script to encode the categorical labels BBB+ and BBB- into a format the ML algorithms can use. Some of them accept the labels as strings without any problem, while others, such as DecisionTrees and LogisticRegression, need the labels as integer values. This process is called label encoding. The main command is the following:
df_X.replace({'category': {'BBB+': 1, 'BBB-': 0}}, inplace=True)
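As a self-contained illustration of the same encoding, here is a small sketch that uses `Series.map` instead of `replace`. The column name `category` and the 1/0 mapping come from the command above; the four rows of the toy DataFrame and the name `label_map` are made up for the example.

```python
import pandas as pd

# Toy frame with made-up rows; only the column name matches the real script.
df_X = pd.DataFrame({"category": ["BBB+", "BBB-", "BBB+", "BBB-"]})

# Same mapping as in the replace() call: BBB+ -> 1, BBB- -> 0.
label_map = {"BBB+": 1, "BBB-": 0}
df_X["category"] = df_X["category"].map(label_map)

# map() yields NaN for any label missing from label_map, so fail early
# instead of silently passing unlabeled rows to a model.
assert df_X["category"].notna().all(), "found a label outside BBB+/BBB-"

df_X["category"] = df_X["category"].astype(int)
print(df_X["category"].tolist())  # prints [1, 0, 1, 0]
```

One difference worth keeping in mind: `replace` leaves unexpected labels untouched as strings, while `map` turns them into NaN, which makes typos in the label column easier to spot.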