Cancer is the second leading cause of death globally after heart disease, and its incidence is growing rapidly. Discovering new drugs with very high activity against the most frequently diagnosed types of cancer, and against those that most often lead to death, is therefore one of the biggest challenges facing scientists today. Thiazole derivatives are currently among the most frequently studied compounds in anticancer drug discovery because their structure is easy to optimize and a wide range of derivatives can be synthesized. The present study aimed to apply a machine learning approach, implemented as an open-source Python data analysis script, to build a QSAR model for discovering anticancer leads among 53 thiazole derivatives. The dataset of 53 compounds was collected from the reported literature and divided into training and test sets using the "Rule of Thumb" method. A total of 82 CDK molecular descriptors were obtained from the ChemDes online web server and used in the study. After training, we assessed model performance by cross-validation and used the model to predict the bioactivity of the test set compounds. Three algorithms were applied to evaluate model performance: multiple linear regression (MLR), support vector machine (SVM), and partial least squares (PLS) regression. The ordinary least squares (OLS) regression model, developed with the statsmodels.formula module, gave R² = 0.542, F = 8.773, adjusted R² = 0.481, and standard error = 0.061, with regression coefficients (reg.coef_) of -0.00064 (PC1), -0.07753 (PC2), -0.09078 (PC3), -0.08986 (PC4), and 0.05044 (PC5) and an intercept (reg.intercept_) of 4.79279. Test set predictions made with the MLR, SVM, and PLS regressors of the sklearn module gave model scores of 0.5424, 0.6422, and 0.6422, respectively.
In addition, a linear regression line was plotted between the predicted and actual pIC50 values, and most data points fell on or close to the regression line. The R² values (i.e., the model scores) obtained with the three algorithms were in good agreement, differing only slightly, and the model may be useful in designing similar thiazole-derivative anticancer agents.
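The predicted-versus-actual comparison can also be summarized numerically, which makes it easier to compare algorithms than by eye. The sketch below is an illustration, not the authors' code: `parity_stats` is a hypothetical helper, and the pIC50 values are made-up numbers for five imaginary test compounds, not data from the study. It reports the least-squares line through the points, the coefficient of determination of the predictions, and the mean absolute vertical distance of the points from the ideal y = x line.

```python
import numpy as np


def parity_stats(actual, predicted):
    """Hypothetical helper summarizing a predicted-vs-actual (parity) plot.

    Returns the slope and intercept of the least-squares line of predicted on
    actual values, the coefficient of determination R^2 of the predictions,
    and the mean absolute error (vertical distance from the y = x line).
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    slope, intercept = np.polyfit(actual, predicted, deg=1)
    ss_res = np.sum((actual - predicted) ** 2)        # residual sum of squares
    ss_tot = np.sum((actual - actual.mean()) ** 2)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    mae = np.mean(np.abs(predicted - actual))
    return {"slope": slope, "intercept": intercept, "r2": r2, "mae": mae}


# Made-up pIC50 values for five test compounds (illustrative only, not from the study).
actual = [4.2, 4.8, 5.1, 5.6, 6.0]
predictions = {
    "MLR": [4.4, 4.7, 5.3, 5.4, 6.1],
    "SVM": [4.3, 4.9, 5.0, 5.7, 5.8],
}
results = {name: parity_stats(actual, pred) for name, pred in predictions.items()}
for name, s in results.items():
    print(f"{name}: slope={s['slope']:.2f} intercept={s['intercept']:.2f} "
          f"R2={s['r2']:.3f} MAE={s['mae']:.3f}")
```

The `r2` computed here is the same quantity as sklearn's `r2_score(actual, predicted)`, which is what a regressor's `.score()` returns. It is not the squared correlation of the fitted line: points can lie tightly along a line that is offset from y = x, which is why the slope and intercept are reported alongside it.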
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Copyright (c) 2022 Journal of Pharmaceutical Chemistry