GOOD.data.good_datasets.good_webkb

The GOOD-WebKB dataset, adapted from Geom-GCN: Geometric Graph Convolutional Networks.

Classes

DataInfo(idx, y)

The class for data point storage.

DomainGetter()

A class containing methods for data domain extraction.

GOODWebKB(root, domain[, shift, transform, ...])

The GOOD-WebKB dataset, adapted from Geom-GCN: Geometric Graph Convolutional Networks.

class GOOD.data.good_datasets.good_webkb.DataInfo(idx, y)[source]

Bases: object

The class for data point storage. It lets each node data point be handled like a graph data point, which simplifies data splitting.
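The idea of treating each node as its own record can be sketched as follows. This is a minimal illustration, not the library's implementation: `NodeRecord`, its `domain_id` field, and `split_by_domain` are hypothetical names introduced here, and only the `idx` and `y` fields mirror the documented `DataInfo(idx, y)` signature.

```python
from dataclasses import dataclass


@dataclass
class NodeRecord:
    """Hypothetical stand-in for DataInfo: one record per node."""
    idx: int        # position of the node in the graph
    y: int          # node label
    domain_id: int  # illustrative extra attribute (assumption), e.g. a university index


def split_by_domain(records, n_train):
    """Sort records by domain, then cut the sorted list into two index lists."""
    ordered = sorted(records, key=lambda r: r.domain_id)  # stable sort
    train_idx = [r.idx for r in ordered[:n_train]]
    ood_idx = [r.idx for r in ordered[n_train:]]
    return train_idx, ood_idx


# Toy node labels and domain ids (made up for this example).
labels = [0, 1, 0, 2, 1]
domains = [2, 0, 1, 0, 2]
records = [
    NodeRecord(idx=i, y=y, domain_id=d)
    for i, (y, d) in enumerate(zip(labels, domains))
]

train_idx, ood_idx = split_by_domain(records, n_train=3)
print(train_idx, ood_idx)
```

Because every node is an independent record, a split reduces to sorting and slicing a list, and the resulting index lists can then be turned into node masks on the original graph.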

class GOOD.data.good_datasets.good_webkb.DomainGetter[source]

Bases: object

A class containing methods for data domain extraction.

get_university(graph: Data) → int[source]
Parameters

graph (Data) – The PyG Data object.

Returns

The university that the webpages belong to.

class GOOD.data.good_datasets.good_webkb.GOODWebKB(root: str, domain: str, shift: str = 'no_shift', transform=None, pre_transform=None, generate: bool = False)[source]

Bases: InMemoryDataset

The GOOD-WebKB dataset, adapted from Geom-GCN: Geometric Graph Convolutional Networks.

Parameters
  • root (str) – The dataset saving root.

  • domain (str) – The domain selection. Allowed: ‘university’.

  • shift (str) – The distributional shift we pick. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • generate (bool) – Whether to regenerate the dataset. True: regenerate. False: download.

download()[source]

Downloads the dataset to the self.raw_dir folder.

static load(dataset_root: str, domain: str, shift: str = 'no_shift', generate: bool = False)[source]

A static method for loading the dataset. It instantiates the dataset class and constructs the train, id_val, id_test, ood_val (val), and ood_test (test) splits. It also collects dataset meta information for later use.

Parameters
  • dataset_root (str) – The dataset saving root.

  • domain (str) – The domain selection. Allowed: ‘university’.

  • shift (str) – The distributional shift we pick. Allowed: ‘no_shift’, ‘covariate’, and ‘concept’.

  • generate (bool) – Whether to regenerate the dataset. True: regenerate. False: download.

Returns

The dataset or the dataset splits, together with the dataset meta information.
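A call to `load` might look like the sketch below. The wrapper `load_webkb` and the two `ALLOWED_*` tuples are hypothetical helpers introduced for illustration. The allowed values are taken from the constructor's parameter list, and the two-value return is an assumption based on the Returns entry above. The import is deferred so the argument checks can run without the GOOD package installed.

```python
# Allowed values as documented for the GOODWebKB constructor.
ALLOWED_DOMAINS = ("university",)
ALLOWED_SHIFTS = ("no_shift", "covariate", "concept")


def load_webkb(dataset_root, domain="university", shift="no_shift", generate=False):
    """Hypothetical wrapper: validate arguments, then call GOODWebKB.load."""
    if domain not in ALLOWED_DOMAINS:
        raise ValueError(f"domain must be one of {ALLOWED_DOMAINS}, got {domain!r}")
    if shift not in ALLOWED_SHIFTS:
        raise ValueError(f"shift must be one of {ALLOWED_SHIFTS}, got {shift!r}")

    # Imported lazily: requires the GOOD package to be installed.
    from GOOD.data.good_datasets.good_webkb import GOODWebKB

    # Assumption: load returns (dataset or splits, meta info) as two values.
    dataset, meta_info = GOODWebKB.load(
        dataset_root, domain=domain, shift=shift, generate=generate
    )
    return dataset, meta_info
```

With GOOD installed, `dataset, meta_info = load_webkb("datasets", shift="covariate")` would request the covariate-shift splits. The directory name `"datasets"` is only a placeholder.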

process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files in the self.processed_dir folder that must be present in order to skip processing.