GOOD.data.good_datasets.good_zinc

The GOOD-ZINC dataset, adapted from the ZINC database.

Classes

DomainGetter()

A class containing methods for data domain extraction.

GOODZINC(root, domain[, shift, subset, ...])

The GOOD-ZINC dataset, adapted from the ZINC database.

class GOOD.data.good_datasets.good_zinc.DomainGetter[source]

Bases: object

A class containing methods for data domain extraction.

get_nodesize(smile: str) → int[source]
Parameters

smile (str) – A SMILES string for a molecule.

Returns

The number of nodes in the molecule.

get_scaffold(smile: str) → str[source]
Parameters

smile (str) – A SMILES string for a molecule.

Returns

The scaffold string of the SMILES string.
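
As a minimal usage sketch, the two methods can be combined to collect both domain attributes for one molecule. The helper name describe_molecule and the example SMILES string are illustrative assumptions, not part of GOOD. The import is guarded so the block still runs where GOOD is not installed.

```python
def describe_molecule(smile, getter):
    """Return the size and scaffold domain attributes for one SMILES string.

    `getter` is any object exposing get_nodesize and get_scaffold,
    such as a DomainGetter instance.
    """
    return {
        "smile": smile,
        "nodesize": getter.get_nodesize(smile),
        "scaffold": getter.get_scaffold(smile),
    }


try:
    # Import path taken from the class entry above.
    from GOOD.data.good_datasets.good_zinc import DomainGetter
except ImportError:
    DomainGetter = None  # GOOD (or one of its dependencies) is unavailable

if DomainGetter is not None:
    # Phenol, chosen only as an example input.
    print(describe_molecule("c1ccccc1O", DomainGetter()))
```

Passing the getter as an argument keeps the helper independent of how DomainGetter is constructed.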

class GOOD.data.good_datasets.good_zinc.GOODZINC(root: str, domain: str, shift: str = 'no_shift', subset: str = 'train', transform=None, pre_transform=None, generate: bool = False)[source]

Bases: InMemoryDataset

The GOOD-ZINC dataset, adapted from the ZINC database.

Parameters
  • root (str) – The dataset saving root.

  • domain (str) – The domain selection. Allowed: ‘scaffold’ and ‘size’.

  • shift (str) – The distributional shift we pick. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • subset (str) – The split set. Allowed: ‘train’, ‘id_val’, ‘id_test’, ‘val’, and ‘test’. When shift=’no_shift’, ‘id_val’ and ‘id_test’ are not applicable.

  • generate (bool) – The flag for regenerating dataset. True: regenerate. False: download.
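
The allowed values listed above can be checked before constructing the dataset. The following is a small sketch of such a check. The function check_goodzinc_args and the three constants are hypothetical names introduced here, and GOODZINC itself may validate its arguments differently.

```python
# Allowed values copied from the parameter list above.
ALLOWED_DOMAINS = ("scaffold", "size")
ALLOWED_SHIFTS = ("no_shift", "covariate", "concept")
ALLOWED_SUBSETS = ("train", "id_val", "id_test", "val", "test")


def check_goodzinc_args(domain, shift="no_shift", subset="train"):
    """Raise ValueError if the arguments fall outside the documented values."""
    if domain not in ALLOWED_DOMAINS:
        raise ValueError(f"domain must be one of {ALLOWED_DOMAINS}, got {domain!r}")
    if shift not in ALLOWED_SHIFTS:
        raise ValueError(f"shift must be one of {ALLOWED_SHIFTS}, got {shift!r}")
    if subset not in ALLOWED_SUBSETS:
        raise ValueError(f"subset must be one of {ALLOWED_SUBSETS}, got {subset!r}")
    # 'id_val' and 'id_test' are documented as not applicable under 'no_shift'.
    if shift == "no_shift" and subset in ("id_val", "id_test"):
        raise ValueError(f"subset {subset!r} is not applicable when shift='no_shift'")
    return domain, shift, subset
```

For example, check_goodzinc_args("size", "covariate", "id_val") passes, while the same subset with shift="no_shift" raises ValueError.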

download()[source]

Downloads the dataset to the self.raw_dir folder.

static load(dataset_root: str, domain: str, shift: str = 'no_shift', generate: bool = False)[source]

A static method for dataset loading. It instantiates the dataset class and constructs the train, id_val, id_test, ood_val (val), and ood_test (test) splits. It also collects dataset meta information for later use.

Parameters
  • dataset_root (str) – The dataset saving root.

  • domain (str) – The domain selection. Allowed: ‘scaffold’ and ‘size’.

  • shift (str) – The distributional shift we pick. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • generate (bool) – The flag for regenerating dataset. True: regenerate. False: download.

Returns

The dataset or dataset splits, together with the dataset meta information.
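
A hedged sketch of calling load follows. It assumes the two documented return items arrive as a two-element tuple, which the description above suggests but does not state outright. The wrapper load_good_zinc and its loader argument are hypothetical; loader exists only so the call shape can be tried without installing GOOD or downloading data. The 'scaffold' domain used as the default is one of the values listed for the GOODZINC constructor.

```python
def load_good_zinc(dataset_root, domain="scaffold", shift="covariate", loader=None):
    """Call GOODZINC.load and return (dataset, meta_info).

    `loader` defaults to GOODZINC.load; a stand-in callable can be
    supplied for a dry run.
    """
    if loader is None:
        # Imported lazily so the module can be used without GOOD installed.
        from GOOD.data.good_datasets.good_zinc import GOODZINC
        loader = GOODZINC.load
    # Assumption: load returns exactly two items (splits, meta info).
    dataset, meta_info = loader(
        dataset_root, domain=domain, shift=shift, generate=False
    )
    return dataset, meta_info
```

With GOOD installed, load_good_zinc("datasets") would download the data into the given root on first use, since generate=False selects download rather than regeneration.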

process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files in the self.processed_dir folder that must be present in order to skip processing.