July 29 - Data collection for imbalanced algorithms and meeting with Fanwang
Today I spent a good while reading articles about applications of ML in drug discovery, with the goal of finding out whether a significant share of the research applying these kinds of tools uses balanced datasets. Our feeling is that most real-world datasets aren't balanced, while common algorithms and approaches assume balanced data. Our survey of the current literature may give us a glimpse of what's happening in the industry.
Data collection for imbalanced algorithms
Here are some notes taken from J. Chem. Inf. Model. 2020, 60, 9, 4180–4190 about evaluation metrics for imbalanced algorithms:
To evaluate the performance of the network, we calculated six performance metrics, including balanced accuracy, precision, recall, F1 score, Matthews correlation coefficient, and area under the ROC curve.
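To make the six metrics concrete for myself, here is a minimal pure-Python sketch of how they can be computed for a binary classifier. This is not the paper's code: the function names (`imbalance_metrics`, `roc_auc`, `confusion_counts`, `safe_div`), the 0.5 threshold, and the toy labels and scores are all my own assumptions for illustration. In practice `sklearn.metrics` provides equivalents (`balanced_accuracy_score`, `precision_score`, `recall_score`, `f1_score`, `matthews_corrcoef`, `roc_auc_score`).

```python
import math


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as half. O(n_pos * n_neg), fine for notes."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def imbalance_metrics(y_true, y_score, threshold=0.5):
    """Compute the six metrics; threshold=0.5 is an assumed default."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    recall = safe_div(tp, tp + fn)        # sensitivity
    specificity = safe_div(tn, tn + fp)
    precision = safe_div(tp, tp + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
        "roc_auc": roc_auc(y_true, y_score),
    }


# Made-up toy data: 8 inactive (0) and 2 active (1) compounds.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.1, 0.2, 0.1, 0.3, 0.2, 0.6, 0.4, 0.1, 0.9, 0.4]

for name, value in imbalance_metrics(y_true, y_score).items():
    print(f"{name}: {value:.4f}")
```

On this toy set, a model that always predicts the majority class would reach 80% plain accuracy while finding no actives, which is why metrics such as balanced accuracy and MCC are preferred for imbalanced data.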