Big Datasets

All the files below are shared without any restrictions. I will gradually move them to GitHub to improve accessibility.

1. Data for delta-ML of 134 kilo (GDB1-GDB9) molecules

This dataset was used to generate Fig. 8 and Fig. 9 of JCTC, 11 (2015) 2087–2096. The dataset comprises structures (in Angstroms) of 133885 (134k) molecules relaxed at the PM7 level, along with heats of atomization (in kcal/mol) at the PM7 and B3LYP/6-31G(2df,p) levels. Note that the B3LYP heats were computed for structures relaxed at the B3LYP level, as reported in Scientific Data, 1 (2014) 140022.
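As a rough illustration of how two columns of energies like these can be used, the sketch below fits a model to the difference between a higher-level and a lower-level energy, then adds the predicted difference back to the lower-level value. It is a minimal sketch on synthetic data, not the procedure of the cited paper: the random two-dimensional descriptor, the Laplacian kernel, the hyperparameters `sigma` and `lam`, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def laplacian_kernel(A, B, sigma):
    """K[i, j] = exp(-||A[i] - B[j]||_1 / sigma)."""
    l1 = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=2)
    return np.exp(-l1 / sigma)

def fit_delta_model(X_train, e_low, e_high, sigma=10.0, lam=1e-8):
    """Kernel ridge regression on the difference e_high - e_low."""
    delta = e_high - e_low
    K = laplacian_kernel(X_train, X_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X_train)), delta)

def predict_with_delta(X_query, e_low_query, X_train, alpha, sigma=10.0):
    """Low-level energy plus the learned correction."""
    return e_low_query + laplacian_kernel(X_query, X_train, sigma) @ alpha

# Synthetic stand-ins (assumption): X plays the role of a molecular
# descriptor, e_low of a cheap baseline energy, e_high of the target energy.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
e_low = -1000.0 + 50.0 * X[:, 0]
e_high = e_low + 3.0 * np.sin(X[:, 1]) + 1.5

train, test = slice(0, 250), slice(250, 300)
alpha = fit_delta_model(X[train], e_low[train], e_high[train])
pred = predict_with_delta(X[test], e_low[test], X[train], alpha)

mae_baseline = np.mean(np.abs(e_high[test] - e_low[test]))
mae_delta = np.mean(np.abs(e_high[test] - pred))
print(f"baseline MAE: {mae_baseline:.3f}, baseline + delta MAE: {mae_delta:.3f}")
```

With real data, `e_low` and `e_high` would be the PM7 and B3LYP heats of atomization, and `X` a descriptor computed from the structures; the choice of descriptor is left open here.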

Big data meets quantum chemistry approximations: The delta-machine learning approach
Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld
Journal of Chemical Theory and Computation, 11 (2015) 2087–2096.

Quantum chemistry structures and properties of 134 kilo molecules
Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld
Scientific Data, 1 (2014) 140022.

2. Data for delta-ML of atomization energies of 6 kilo C7H10O2 constitutional isomers

The dataset comprises structures (in Angstroms) of 6095 (6k) constitutional isomers of stoichiometry C7H10O2 relaxed at the B3LYP/6-31G(2df,p) level (from Scientific Data, 1 (2014) 140022), along with atomization energies (in kcal/mol) at the HF/6-31G(d), MP2/6-31G(d), CCSD/6-31G(d), and CCSD(T)/6-31G(d) levels.

Big data meets quantum chemistry approximations: The delta-machine learning approach
Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld
Journal of Chemical Theory and Computation, 11 (2015) 2087–2096.

Quantum chemistry structures and properties of 134 kilo molecules
Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld
Scientific Data, 1 (2014) 140022.


3. Data for delta-ML of atomization energies of 9 kilo diastereomers of C7H10O2 constitutional isomers

The dataset comprises structures (in Angstroms) of 9868 (10k) “additional” diastereomers of the parent 6095 C7H10O2 isomers, relaxed at the B3LYP/6-31G(2df,p) level. Note that the original 6095 isomers are part of the GDB9 dataset published in Scientific Data, 1 (2014) 140022. The 10k diastereomers were generated separately in order to validate the transferability of the delta-ML model trained on the parent isomers.

Big data meets quantum chemistry approximations: The delta-machine learning approach
Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld
Journal of Chemical Theory and Computation, 11 (2015) 2087–2096.

Quantum chemistry structures and properties of 134 kilo molecules
Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld
Scientific Data, 1 (2014) 140022.

4. Data for delta-ML of electronic spectra of 22 kilo (GDB1-GDB8) molecules

This dataset was used in JCP, 143 (2015) 084111. The dataset comprises structures (in Angstroms) of 21786 (22k) molecules relaxed at the B3LYP/6-31G(2df,p) level (from Scientific Data, 1 (2014) 140022), along with valence electronic excitation energies (in hartree) and oscillator strengths (in atomic units, length representation) computed at the RICC2/def2TZVP, LR-TD-PBE0/def2SVP, LR-TD-PBE0/def2TZVP, and LR-TD-CAMB3LYP/def2TZVP levels.
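Since the excitation energies are given in hartree, a small conversion helper may be handy when comparing with spectra reported in eV or nm. The following is a minimal sketch; the function names are hypothetical, and the constants are the CODATA 2018 value of the hartree in eV and the standard value of hc in eV nm.

```python
HARTREE_TO_EV = 27.211386245988   # 1 hartree in eV (CODATA 2018)
HC_EV_NM = 1239.841984            # h*c in eV*nm

def hartree_to_ev(energy_hartree):
    """Convert an energy from hartree to electronvolts."""
    return energy_hartree * HARTREE_TO_EV

def excitation_wavelength_nm(energy_hartree):
    """Wavelength (nm) of a photon carrying the given excitation energy."""
    if energy_hartree <= 0:
        raise ValueError("excitation energy must be positive")
    return HC_EV_NM / hartree_to_ev(energy_hartree)

# Example with an arbitrary excitation energy of 0.25 hartree
# (an illustrative number, not a value taken from the dataset).
e_h = 0.25
print(f"{hartree_to_ev(e_h):.4f} eV, {excitation_wavelength_nm(e_h):.2f} nm")
```

Oscillator strengths are dimensionless, so they need no conversion.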

Here is the link to the dataset on GitHub:


Electronic spectra from TDDFT and machine learning in chemical space
Raghunathan Ramakrishnan, Mia Hartmann, Enrico Tapavicza, O. Anatole von Lilienfeld
Journal of Chemical Physics, 143 (2015) 084111 (1-8).

Quantum chemistry structures and properties of 134 kilo molecules
Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld
Scientific Data, 1 (2014) 140022.

5. Data for “many properties one kernel” of 112 kilo (GDB9 only, i.e. exactly 9 heavy atoms) molecules

This dataset was used in Chimia, 69 (2015) 182-186. The dataset comprises structures (in Angstroms) of 111597 (112k) molecules relaxed at the B3LYP/6-31G(2df,p) level (from Scientific Data, 1 (2014) 140022), along with 13 properties computed at the same level. Please see the top two lines of the file “112k_properties.dat” for definitions and units of the various properties. See also the supporting information archived in Basel.
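Because the definitions and units live in the top two lines of the file, it helps to separate those lines from the data rows when loading. The sketch below does this for any line iterable; it assumes a plain-text file with whitespace-separated columns, which is a guess about the layout and not something stated above. The function name and the demo content are hypothetical.

```python
import io

def read_with_two_line_header(lines):
    """Split an iterable of text lines into (two header lines, data rows).

    Assumption: columns are whitespace-separated. Fields are kept as
    strings because the actual column types are not known here.
    """
    it = iter(lines)
    header = [next(it).rstrip("\n"), next(it).rstrip("\n")]
    rows = [line.split() for line in it if line.strip()]
    return header, rows

# Made-up stand-in for the real file, only to show the call.
demo = io.StringIO(
    "# hypothetical definitions line\n"
    "# hypothetical units line\n"
    "1 0.5 -40.1\n"
    "2 0.7 -55.3\n"
)
header, rows = read_with_two_line_header(demo)
print(header)
print(rows)
```

For the real file, the same function can be called on an open file handle, for example `with open("112k_properties.dat") as fh: header, rows = read_with_two_line_header(fh)`.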

Many molecular properties from one kernel in chemical space
Raghunathan Ramakrishnan, O. Anatole von Lilienfeld
Chimia, 69 (2015) 182-186.

Quantum chemistry structures and properties of 134 kilo molecules
Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld
Scientific Data, 1 (2014) 140022.