QM descriptors

Best models were consistently KNN (best MAE: 0.871, hyperparams={'metric': 'manhattan', 'weights': 'distance'} with 'n_neighbors' of 5 and 10), followed by KernelRidge (best MAE: 0.921, hyperparams={'alpha': 0.1, 'gamma': 0.01, 'kernel': 'rbf'})

Fingerprints

  • Linear regression gave VERY BAD results

ECFP6: Best models were consistently KernelRidge (best MAE: 0.742, hyperparams={'alpha': 0.1, 'gamma': 0.01, 'kernel': 'rbf'}); all Ridge and SGDReg models had MAE test scores below 0.80

MaCCSKeys: Best models were consistently KernelRidge (best MAE: 0.731, hyperparams={'alpha': 0.1, 'gamma': 0.01, 'kernel': 'rbf'}); KNN, Ridge, and SGDReg had MAE test scores below 0.80

Morgan: Best models were consistently KernelRidge (best MAE: 0.715, hyperparams={'alpha': 0.1, 'gamma': 0.01, 'kernel': 'rbf'}); Ridge and SGDReg reported consistent MAE test scores below 0.75

SECFP6: Best models were consistently KernelRidge (best MAE: 0.733, hyperparams={'alpha': 0.1, 'gamma': 0.01, 'kernel': 'rbf'}); SGDReg and Ridge reported consistent MAE test scores below 0.76

RDK: Best models were split between Ridge and ElasticNet (best MAE: 0.741, hyperparams for Ridge={'alpha': 100}, hyperparams for ElasticNet={'alpha': 0.1, 'fit_intercept': True, 'l1_ratio': 0}); SGDReg, KernelRidge and KNN reported consistent MAE test scores between 0.75 and 0.80
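A minimal sketch of how test MAE values like the ones above could be computed for two of the reported configurations. The random bit matrix and the target are placeholders standing in for real fingerprints and labels (the actual dataset and split are not given in these notes), so the numbers it produces are not comparable to the scores reported here.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Placeholder data: random bit vectors standing in for fingerprints (assumption).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 64)).astype(float)
y = X[:, :8].sum(axis=1) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters copied from the best configurations reported in the notes.
models = {
    "KernelRidge": KernelRidge(alpha=0.1, gamma=0.01, kernel="rbf"),
    "Ridge": Ridge(alpha=100),
}

# Fit each model on the training split and score it on the held-out split.
test_mae = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_mae[name] = mean_absolute_error(y_test, model.predict(X_test))

for name, mae in test_mae.items():
    print(f"{name}: test MAE = {mae:.3f}")
```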

Chemical Features

  • 20: Best model, KNN (best MAE: 0.870) > KernelRidge
  • 30: Best model, KNN (best MAE: 0.843) > KernelRidge
  • 50: Best model, KNN (best MAE: 0.780) > KernelRidge
  • 100: Best model, KNN (best MAE: 0.723) > KernelRidge
  • 150: Best model, KNN (best MAE: 0.693) > KernelRidge
  • 200: Best model, KNN (best MAE: 0.686) > ElasticNet
  • 250: Best model, KNN (best MAE: 0.710) > Ridge & ElasticNet
  • 300: Best model, KNN (best MAE: 0.697) > Ridge & ElasticNet
  • 400: Best model, KNN (best MAE: 0.714) > Ridge & ElasticNet
  • 500: Best model, KNN (best MAE: 0.738) ≈ Ridge & ElasticNet
  • 600: Best model, KNN (best MAE: 0.722) > ElasticNet & Ridge
  • 700: Best model, Ridge (best MAE: 0.701) ≈ KNN, ElasticNet
  • 757: Best model, Ridge (best MAE: 0.693) > ElasticNet, KNN

NOTES: Add the best CV score to the stored results, to see which optimization metric GridSearchCV used when choosing the best hyperparameters: 'GridSearchCV_score': gridobject.best_score_
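A minimal sketch of recording that score, assuming the grid search is run with scoring='neg_mean_absolute_error' (the notes do not say which scoring was used). The data, the parameter grid, and the 'GridSearchCV_MAE' key are illustrative assumptions; only 'GridSearchCV_score' and gridobject come from the note above.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Placeholder regression data (assumption, not the dataset from these notes).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=120)

# Illustrative grid; the real search space is not listed in the notes.
param_grid = {
    "metric": ["manhattan", "euclidean"],
    "n_neighbors": [5, 10],
    "weights": ["uniform", "distance"],
}

# Setting scoring explicitly makes clear what best_score_ measures.
gridobject = GridSearchCV(
    KNeighborsRegressor(),
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=5,
)
gridobject.fit(X, y)

record = {
    "best_params": gridobject.best_params_,
    # Mean cross-validated score of the best candidate (negated MAE here).
    "GridSearchCV_score": gridobject.best_score_,
    # Sign flipped so it can be compared against test MAE values.
    "GridSearchCV_MAE": -gridobject.best_score_,
}
print(record)
```

If scoring is left at its default, GridSearchCV falls back to the estimator's own score method, which for scikit-learn regressors is R², so best_score_ would then not be directly comparable to the MAE values reported above.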