Mondays are the days when I have recurring meetings with Fanwang. He is a PhD candidate at AyersLab and is currently working on his thesis. I am glad to be helping with some of his work, especially the parts that involve B3DB and imbalanced learning. I had some pending work from last week, so today I hope to finish it.

In summary, at this stage we are training different machine learning models to classify the blood-brain barrier (BBB) permeability of drug molecules from their chemical descriptors. This is a supervised learning approach, since every molecule in the dataset has a label (BBB+ or BBB-).
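A minimal sketch of this supervised setup, assuming the descriptors and the label sit in one pandas DataFrame. The descriptor columns `MW`, `TPSA` and `logP` and their random values are hypothetical stand-ins, not the real B3DB columns; the label column is called `category`, as in the encoding command further down. Because the project deals with imbalanced data, the split is stratified so both subsets keep the same BBB+/BBB- ratio.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy table: three made-up descriptor columns plus a BBB label.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "MW": rng.normal(350, 80, 100),
    "TPSA": rng.normal(70, 25, 100),
    "logP": rng.normal(2.5, 1.0, 100),
    "category": ["BBB+"] * 70 + ["BBB-"] * 30,  # deliberately imbalanced 70/30
})

# Features (X) are the descriptors; the target (y) is the permeability label.
X = df.drop(columns="category")
y = df["category"]

# stratify=y keeps the 70/30 class ratio in both the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
print(y_train.value_counts().to_dict())
```

With 100 toy molecules and a 20% test split, the stratified split gives 56 BBB+ and 24 BBB- molecules for training and 14 and 6 for testing.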

Today I wrote the scripts to implement the following models: support vector classifier (SVC), k-nearest neighbors (KNN), decision trees, logistic regression, and Gaussian process classifier (GPC).
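As an illustration, the five models could be set up with scikit-learn roughly as follows. This is a sketch that assumes the scikit-learn classes `SVC`, `KNeighborsClassifier`, `DecisionTreeClassifier`, `LogisticRegression` and `GaussianProcessClassifier` correspond to the models listed; the data comes from `make_classification` as a hypothetical stand-in for the descriptor matrix, and all hyperparameters are library defaults rather than tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a descriptor matrix: 300 molecules, 10 descriptors,
# roughly 30/70 class imbalance (assumed: 0 = BBB-, 1 = BBB+).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    weights=[0.3, 0.7], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Default hyperparameters; scale-sensitive models get a StandardScaler first.
models = {
    "SVC": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "LogisticRegression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "GPC": make_pipeline(
        StandardScaler(), GaussianProcessClassifier(random_state=0)
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Balanced accuracy averages the recall of each class, so it does not
    # reward a model for simply predicting the majority class.
    scores[name] = balanced_accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: balanced accuracy = {scores[name]:.3f}")
```

Scaling is wrapped in a pipeline for the models that are sensitive to feature scale (kernel, distance and regularized linear models); decision trees split on one feature at a time and are unaffected by scaling, so they are used directly. Balanced accuracy is reported instead of plain accuracy because of the class imbalance.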

Besides working on the ML models, I wrote a script to encode the categorical labels BBB+ and BBB- into a format the ML algorithms can use. Some of the models accepted the string labels without problems, but others, such as DecisionTrees and LogisticRegression, needed the labels as integer values in my setup. This process is called label encoding. The main command is the following:

df_X.replace({'category': {'BBB+': 1, 'BBB-': 0}}, inplace=True)
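A small sketch of the same encoding on a toy column, comparing an explicit mapping with scikit-learn's `LabelEncoder`. The toy DataFrame is hypothetical, and `Series.map` is a swapped-in alternative to the `replace` call above, not a different result.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy labels; in practice this would be the 'category' column of df_X.
df_X = pd.DataFrame({"category": ["BBB+", "BBB-", "BBB+", "BBB+", "BBB-"]})

# Explicit mapping: BBB+ (permeable) becomes the positive class 1.
mapping = {"BBB+": 1, "BBB-": 0}
encoded = df_X["category"].map(mapping)
# map() turns any label missing from the mapping into NaN, so check for that.
assert encoded.notna().all(), "unexpected label in 'category'"
print(encoded.tolist())  # [1, 0, 1, 1, 0]

# LabelEncoder sorts the labels; since '+' comes before '-' in ASCII,
# it assigns BBB+ -> 0 and BBB- -> 1, the opposite of the mapping above.
le = LabelEncoder()
auto = le.fit_transform(df_X["category"])
print({str(c): i for i, c in enumerate(le.classes_)})  # {'BBB+': 0, 'BBB-': 1}
print(auto.tolist())  # [0, 1, 0, 0, 1]
```

The explicit mapping keeps BBB+ as class 1, which matters because scikit-learn metrics such as `precision_score` and `recall_score` use `pos_label=1` by default. Another small difference: `replace` leaves unmatched values untouched, whereas `map` turns them into NaN, so a typo in the labels shows up instead of passing through silently.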