Big Datasets
All the files below are shared without any restrictions, but I will gradually move them to GitHub to improve accessibility.
1. Data for delta-ML of 134 kilo (GDB1-GDB9) molecules
This dataset was used to generate Figs. 8 and 9 of JCTC, 11 (2015) 2087–2096. It comprises the structures (in Angstroms) of 133885 (134k) molecules relaxed at the PM7 level, along with heats of atomization (in kcal/mol) at the PM7 and B3LYP/6-31G(2df,p) levels. Note that the B3LYP heats were computed for structures relaxed at the B3LYP level, as reported in Scientific Data, 1 (2014) 140022.
Reference: Quantum chemistry structures and properties of 134 kilo molecules
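The idea behind delta-ML with this dataset is to learn only the correction Δ = H(B3LYP) − H(PM7) and add it back to the cheap PM7 value at prediction time. The sketch below illustrates that composition under stated assumptions: heats of atomization are already parsed into floats (the file layout is not reproduced here), each molecule has a hypothetical numeric descriptor vector, and a k-nearest-neighbor average is swapped in as the regressor for brevity. A kernel method would be the more usual choice; `fit_delta` and `predict_target` are illustrative names, not part of the dataset.

```python
import math
from typing import List, Sequence, Tuple

# One training record: (hypothetical descriptor vector, Δ label in kcal/mol).
Record = Tuple[Sequence[float], float]


def fit_delta(descriptors: Sequence[Sequence[float]],
              h_baseline: Sequence[float],
              h_target: Sequence[float]) -> List[Record]:
    """Pair each descriptor with its correction Δ = H_target - H_baseline."""
    if not (len(descriptors) == len(h_baseline) == len(h_target)):
        raise ValueError("descriptors and energies must have the same length")
    return [(x, t - b) for x, b, t in zip(descriptors, h_baseline, h_target)]


def predict_target(model: List[Record], x: Sequence[float],
                   h_baseline_x: float, k: int = 1) -> float:
    """Estimate H_target(x) as H_baseline(x) plus the mean Δ of the k nearest
    training molecules (k-NN stand-in for a proper kernel regressor)."""
    nearest = sorted(model, key=lambda rec: math.dist(rec[0], x))[:k]
    delta = sum(d for _, d in nearest) / len(nearest)
    return h_baseline_x + delta
```

For example, with PM7 heats `[10.0, 20.0, 30.0]` and B3LYP heats `[12.0, 23.0, 34.0]` on one-dimensional toy descriptors `[[0.0], [1.0], [2.0]]`, a new molecule at `[1.1]` with a PM7 heat of `21.0` is predicted at `24.0` with `k=1`. Only the difference is learned, so the model never has to reproduce the large absolute heats of atomization.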
2. Data for delta-ML of atomization energies of 6 kilo C7H10O2 constitutional isomers
The dataset comprises the structures (in Angstroms) of 6095 (6k) constitutional isomers with stoichiometry C7H10O2, relaxed at the B3LYP/6-31G(2df,p) level (from Scientific Data, 1 (2014) 140022), along with atomization energies (in kcal/mol) at the HF/6-31G(d), MP2/6-31G(d), CCSD/6-31G(d), and CCSD(T)/6-31G(d) levels.
Reference: Quantum chemistry structures and properties of 134 kilo molecules
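If you recompute atomization energies yourself from total energies at one of these levels, the arithmetic is the sum of isolated-atom energies weighted by the stoichiometry minus the molecular energy, converted from hartree to kcal/mol. The sketch below is a minimal illustration assuming total energies in hartree and a hypothetical table of atomic reference energies. Sign conventions differ between sources; here a bound molecule gives a positive value, which is not necessarily the convention used in the files.

```python
import re
from typing import Dict

# 1 hartree in kcal/mol (from CODATA 2018 constants, thermochemical calorie).
HARTREE_TO_KCAL_PER_MOL = 627.5094740631


def parse_formula(formula: str) -> Dict[str, int]:
    """Count atoms in a simple formula such as 'C7H10O2' (no parentheses)."""
    counts: Dict[str, int] = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def atomization_energy_kcal(e_molecule_ha: float,
                            atomic_energies_ha: Dict[str, float],
                            formula: str) -> float:
    """Sum of atomic energies minus the molecular energy, in kcal/mol.

    atomic_energies_ha is a hypothetical lookup of isolated-atom energies
    computed at the same level of theory as e_molecule_ha.
    """
    counts = parse_formula(formula)
    e_atoms = sum(n * atomic_energies_ha[el] for el, n in counts.items())
    return (e_atoms - e_molecule_ha) * HARTREE_TO_KCAL_PER_MOL
```

Because every isomer in this set shares the stoichiometry C7H10O2, the atomic-energy term is the same constant for all 6095 molecules at a given level, so differences between isomers' atomization energies equal differences in their total energies.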
3. Data for delta-ML of atomization energies of 9 kilo diastereomers of C7H10O2 constitutional isomers
The dataset comprises the structures (in Angstroms) of 9868 (10k) “additional” diastereomers of the 6095 parent C7H10O2 isomers, relaxed at the B3LYP/6-31G(2df,p) level. Note that the original 6095 isomers are part of the GDB9 dataset published in Scientific Data, 1 (2014) 140022. The 10k diastereomers were generated separately to validate the transferability of the delta-ML model trained on the parent isomers.
Reference: Quantum chemistry structures and properties of 134 kilo molecules
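A transferability check of this kind amounts to comparing a model's error on held-out parent isomers (in-domain) with its error on the diastereomers (out-of-domain). The sketch below assumes the trained model is available as a plain Python callable on hypothetical descriptor vectors and uses the mean absolute error as the metric; both choices are illustrative, and the function names are not part of the dataset.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# (hypothetical descriptor vector, reference atomization energy in kcal/mol)
Sample = Tuple[Sequence[float], float]
Model = Callable[[Sequence[float]], float]


def mean_absolute_error(model: Model, samples: List[Sample]) -> float:
    """Average |prediction - reference| over the samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(abs(model(x) - y) for x, y in samples) / len(samples)


def transferability_report(model: Model, parent_holdout: List[Sample],
                           diastereomers: List[Sample]) -> Dict[str, float]:
    """Compare in-domain error (held-out parents) with out-of-domain error
    (diastereomers never seen during training)."""
    return {
        "parent_holdout_mae": mean_absolute_error(model, parent_holdout),
        "diastereomer_mae": mean_absolute_error(model, diastereomers),
    }
```

A diastereomer MAE close to the parent hold-out MAE would indicate that the model transfers to the new stereoisomers; a much larger value would point to a descriptor or training-set limitation.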
4. Data for delta-ML of electronic spectra of 22 kilo (GDB1-GDB8) molecules
This dataset was used in JCP, 143 (2015) 084111. It comprises the structures (in Angstroms) of 21786 (22k) molecules relaxed at the B3LYP/6-31G(2df,p) level (from Scientific Data, 1 (2014) 140022), along with valence electronic excitation energies (in hartree) and oscillator strengths (in atomic units, length representation) computed at the following levels: RICC2/def2TZVP, LR-TD-PBE0/def2SVP, LR-TD-PBE0/def2TZVP, and LR-TD-CAMB3LYP/def2TZVP.
Here is the link to the dataset on GitHub:
Reference: Quantum chemistry structures and properties of 134 kilo molecules
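Since the excitation energies are given in hartree, comparing them with UV/Vis data usually means converting to eV or to a wavelength. The sketch below shows those conversions plus a helper that picks the transition with the largest oscillator strength; it assumes the transitions for one molecule have already been parsed into `(energy_ha, oscillator_strength)` pairs, which is a hypothetical in-memory layout rather than the file format.

```python
from typing import List, Tuple

HARTREE_TO_EV = 27.211386245988   # CODATA 2018 value
HC_EV_NM = 1239.841984            # Planck constant times speed of light, in eV*nm


def hartree_to_ev(energy_ha: float) -> float:
    """Convert an excitation energy from hartree to eV."""
    return energy_ha * HARTREE_TO_EV


def wavelength_nm(energy_ha: float) -> float:
    """Photon wavelength (nm) corresponding to an excitation energy in hartree."""
    return HC_EV_NM / hartree_to_ev(energy_ha)


def brightest_transition(transitions: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return the (energy_ha, oscillator_strength) pair with the largest
    oscillator strength, i.e. the most intense band among those given."""
    if not transitions:
        raise ValueError("no transitions given")
    return max(transitions, key=lambda t: t[1])
```

For example, an excitation energy of 0.25 hartree corresponds to about 6.80 eV, or roughly 182 nm.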
5. Data for “many properties one kernel” of 112 kilo molecules (GDB9 only, i.e., exactly 9 heavy atoms)
This dataset was used in Chimia, 69 (2015) 182–186. It comprises the structures (in Angstroms) of 111597 (112k) molecules relaxed at the B3LYP/6-31G(2df,p) level (from Scientific Data, 1 (2014) 140022), along with 13 properties computed at the same level. See the first two lines of the file “112k_properties.dat” for the definitions and units of the properties. See also the supporting information archived in Basel.
Reference: Quantum chemistry structures and properties of 134 kilo molecules
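One way to read “many properties one kernel” in kernel ridge regression: the kernel matrix depends only on the molecular structures, so with shared hyperparameters it can be factorized once and reused to fit every property column. The sketch below shows that reuse with a Gaussian kernel on hypothetical descriptor vectors and a hand-written Cholesky solve; the kernel, `sigma`, `lam`, and all function names are assumptions for illustration, not the settings used in the Chimia paper.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def gaussian_kernel(a: Vector, b: Vector, sigma: float) -> float:
    return math.exp(-math.dist(a, b) ** 2 / (2.0 * sigma ** 2))


def cholesky(m: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L @ L^T == m (m symmetric positive definite)."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def cho_solve(L: List[List[float]], b: Vector) -> List[float]:
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_many_properties(X: List[Vector], Y: Dict[str, List[float]],
                        sigma: float = 1.0, lam: float = 1e-8) -> Dict[str, List[float]]:
    """Build (K + lam*I) once, factorize it once, then solve for each property."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)  # the expensive step, shared by all properties
    return {name: cho_solve(L, y) for name, y in Y.items()}


def predict_properties(X_train: List[Vector], alphas: Dict[str, List[float]],
                       x: Vector, sigma: float = 1.0) -> Dict[str, float]:
    """Predict every property at x from one shared kernel row."""
    k = [gaussian_kernel(xt, x, sigma) for xt in X_train]
    return {name: sum(ki * ai for ki, ai in zip(k, alpha))
            for name, alpha in alphas.items()}
```

The factorization costs O(n^3) while each additional property costs only an O(n^2) solve, which is why sharing one kernel across all 13 properties is attractive at the scale of this dataset.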