July 27 - Getting geometries
Today I am going back to some of the pending work in the Database workflow. Here are the notes of the day.
Create a json file with charges and multiplicities.
database json 1099:SO3--:-2:3 1117:SO4--:-2:1 1117:SO4--:-2:3 140772:S2O:0:1 24682:SO3:0:1 6857671:SO3-:-1:2 5460595:SO4-:-1:2 159922:SO5--:-2:1 159922:SO5--:-2:3 107879:S2O8--:-2:1 107879:S2O8--:-2:3 177717:S2O7--:-2:1 177717:S2O7--:-2:3 3082075:S2O6--:-2:1 3082075:S2O6--:-2:3 159940:S2O5--:-2:1 159940:S2O5--:-2:3 1086:S2O4--:-2:1 1086:S2O4--:-2:3 1084:S2O3--:-2:1 1084:S2O3--:-2:3 491:S3O6--:-2:1 491:S3O6--:-2:3 4657547:S4O6--:-2:1 4657547:S4O6--:-2:3 -o sulfur_oxides.json
database download sulfur_oxides.json
Some of the geometries could not be created because PubChemQC lacks geometry information for them. The following error message appeared for these molecules:
Failed to download file for:
0 .//PBQC_CID000001086_S2O4--.gjf -> dithionite
1 .//PBQC_CID000159922_SO5--.gjf -> peroxymonosulfate
2 .//PBQC_CID004657547_S4O6--.gjf -> tetrathionate
3 .//PBQC_CID005460595_SO4-.gjf -> SO4+radical+anion
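To follow up on these failures, the CIDs can be pulled back out of the log with a short script. This is only a sketch: the file name pattern (PBQC_CID, zero-padded CID, formula, .gjf) is assumed from the four lines above, and the function name parse_failed is hypothetical, not part of the database scripts.

```python
import re

# Assumed pattern, inferred from the log lines above:
# PBQC_CID<zero-padded CID>_<formula>.gjf
FAILED_RE = re.compile(r"PBQC_CID(\d+)_([^\s.]+)\.gjf")


def parse_failed(log_text):
    """Return a list of (cid, formula) pairs for each failed file in the log."""
    return [(int(cid), formula) for cid, formula in FAILED_RE.findall(log_text)]


log = """\
0 .//PBQC_CID000001086_S2O4--.gjf -> dithionite
1 .//PBQC_CID000159922_SO5--.gjf -> peroxymonosulfate
2 .//PBQC_CID004657547_S4O6--.gjf -> tetrathionate
3 .//PBQC_CID005460595_SO4-.gjf -> SO4+radical+anion
"""

failed = parse_failed(log)
for cid, formula in failed:
    print(cid, formula)
```

The resulting list could then be used to decide which geometries have to be built by another route.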
Note: The current scripts can't create a json file containing different charges/multiplicities for the same molecule; the input is overwritten when it is entered this way. Additionally, the code that automatically generates molecular geometries from a txt file could be improved by allowing it to take CID codes.
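One possible way around the overwriting would be to store a list of charge/multiplicity states under each CID instead of a single value. The sketch below assumes the CID:name:charge:multiplicity input format used in the command above; the JSON layout and the function name build_entries are hypothetical and are not what the current scripts produce.

```python
import json


def build_entries(specs):
    """Group 'CID:name:charge:multiplicity' strings by CID without overwriting."""
    entries = {}
    for spec in specs:
        cid, name, charge, mult = spec.split(":")
        # Hypothetical layout: one record per CID holding a list of states.
        entry = entries.setdefault(cid, {"name": name, "states": []})
        state = {"charge": int(charge), "multiplicity": int(mult)}
        if state not in entry["states"]:  # skip exact duplicates
            entry["states"].append(state)
    return entries


specs = ["1117:SO4--:-2:1", "1117:SO4--:-2:3", "24682:SO3:0:1"]
entries = build_entries(specs)
print(json.dumps(entries, indent=2))
```

With this layout both the singlet and the triplet of SO4-- stay under CID 1117, and the download step would loop over the states of each entry.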
Issue: I found that the SMILES generated from a TXT file of compound names are not compatible with the RDKit structure-generating command.