benchmark.dataset

class Yandex(root: str, name: str, transform: Callable | None = None, pre_transform: Callable | None = None, force_reload: bool = False) → None

Bases: InMemoryDataset

Paper:

A critical look at the evaluation of GNNs under heterophily: are we really making progress?

Ref:

https://github.com/yandex-research/heterophilous-graphs

property raw_dir: str
property processed_dir: str
property raw_file_names: str

The names of the files in the self.raw_dir folder that must be present in order to skip downloading.

property processed_file_names: str

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

download() → None

Downloads the dataset to the self.raw_dir folder.

process() → None

Processes the dataset and saves it to the self.processed_dir folder.
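
The caching behavior described by raw_file_names and processed_file_names can be illustrated with a minimal, standard-library sketch. It is not the implementation of Yandex or InMemoryDataset: the class CachedDatasetSketch, the helper files_exist, and the placeholder file names are hypothetical, and the reading that force_reload re-runs the processing step is an assumption.

```python
import os
import tempfile
from typing import List


def files_exist(folder: str, names: List[str]) -> bool:
    """Return True only if every listed file is present in `folder`."""
    return len(names) > 0 and all(
        os.path.exists(os.path.join(folder, name)) for name in names
    )


class CachedDatasetSketch:
    """Hypothetical stand-in that mimics the download/process skip checks."""

    raw_file_names = ["graph.raw"]       # placeholder name, not a real dataset file
    processed_file_names = ["data.bin"]  # placeholder name, not a real dataset file

    def __init__(self, root: str, force_reload: bool = False) -> None:
        self.raw_dir = os.path.join(root, "raw")
        self.processed_dir = os.path.join(root, "processed")
        self.calls: List[str] = []  # records which steps actually ran

        # Download only when some raw file is missing.
        if not files_exist(self.raw_dir, self.raw_file_names):
            os.makedirs(self.raw_dir, exist_ok=True)
            self.download()

        # Assumption: force_reload makes the processing step run again.
        if force_reload or not files_exist(
            self.processed_dir, self.processed_file_names
        ):
            os.makedirs(self.processed_dir, exist_ok=True)
            self.process()

    def download(self) -> None:
        self.calls.append("download")
        for name in self.raw_file_names:
            # Create empty files in place of a real download.
            open(os.path.join(self.raw_dir, name), "w").close()

    def process(self) -> None:
        self.calls.append("process")
        for name in self.processed_file_names:
            # Create empty files in place of real processed output.
            open(os.path.join(self.processed_dir, name), "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(CachedDatasetSketch(root).calls)                     # first run
        print(CachedDatasetSketch(root).calls)                     # cached run
        print(CachedDatasetSketch(root, force_reload=True).calls)  # forced run
```

In this sketch, a second construction over the same root performs no work because both file lists are already satisfied.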

class LINKX(root: str, name: str, transform: Callable | None = None, pre_transform: Callable | None = None, force_reload: bool = False) → None

Bases: InMemoryDataset

Paper:

Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Ref:

https://github.com/CUAI/Non-Homophily-Large-Scale/

_dataset_drive_url = {
    'pokec.mat': '1dNs5E7BrWJbgcHeQ_zuy5Ozp2tRCWG0y',
    'snap-patents.mat': '1ldh23TSY1PwXia6dU0MYcpyEgX-w3Hia',
    'twitch-gamer_edges.csv': '1XLETC6dG3lVl7kDmytEJ52hvDMVdxnZ0',
    'twitch-gamer_features.csv': '1fA9VIIEI8N0L27MSQfcBzJgRQLvSbrvR',
    'wiki_edges.pt': '14X7FlkjrlUgmnsYtPwdh-gGuFla4yb5u',
    'wiki_features.pt': '1ySNspxbK-snNoAZM7oxiWGvOnTRdSyEK',
    'wiki_views.pt': '1p5DlVHrnFgYm3VsNIzahSsvCD424AyvP',
    'yelp-chi.mat': '1fAXtTVQS4CfEk4asqrFw9EPmlUPGbGtJ',
}
_splits_drive_url = {
    'pokec_splits.npy': '1ZhpAiyTNc0cE_hhgyiqxnkKREHK7MK-_',
    'snap-patents_splits.npy': '12xbBRqd8mtG_XkNLH8dRRNZJvVM4Pw-N',
}
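
The attribute names suggest that the values are Google Drive file IDs keyed by raw file name. As a rough illustration, an ID can be turned into a direct-download URL as sketched below. The helper drive_download_url is hypothetical, and the URL pattern is a commonly used Google Drive form rather than something stated in this reference; the class itself may rely on a different helper or library.

```python
from urllib.parse import urlencode


def drive_download_url(file_id: str) -> str:
    """Build a Google Drive direct-download URL for a file ID (assumed pattern)."""
    query = urlencode({"export": "download", "id": file_id})
    return "https://drive.google.com/uc?" + query


# Example with the ID listed for 'pokec.mat' above.
pokec_url = drive_download_url("1dNs5E7BrWJbgcHeQ_zuy5Ozp2tRCWG0y")
```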
property raw_dir: str
property processed_dir: str
property raw_file_names: str

The names of the files in the self.raw_dir folder that must be present in order to skip downloading.

property processed_file_names: str

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

download() → None

Downloads the dataset to the self.raw_dir folder.

process() → None

Processes the dataset and saves it to the self.processed_dir folder.

class FB100(root: str, name: str, transform: Callable | None = None, pre_transform: Callable | None = None, force_reload: bool = False) → None

Bases: InMemoryDataset

property raw_dir: str
property processed_dir: str
property raw_file_names: str

The names of the files in the self.raw_dir folder that must be present in order to skip downloading.

property processed_file_names: str

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

download() → None

Downloads the dataset to the self.raw_dir folder.

process() → None

Processes the dataset and saves it to the self.processed_dir folder.

class Grid2D(root, name, transform=None, pre_transform=None)

Bases: InMemoryDataset

property raw_dir: str
property processed_dir: str
property raw_file_names

The names of the files in the self.raw_dir folder that must be present in order to skip downloading.

property processed_file_names

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

download()

Downloads the dataset to the self.raw_dir folder.

process()

Processes the dataset and saves it to the self.processed_dir folder.

T_insert(transform, new_t: BaseTransform, index=-1) → Compose
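
This reference gives no description for T_insert, so the following is only one plausible reading of the signature: insert new_t into an existing transform pipeline at position index and return a composed transform, with the default index=-1 taken to mean "append at the end". ComposeSketch and t_insert_sketch are hypothetical stand-ins and do not use the real Compose or BaseTransform classes.

```python
from typing import Callable, List, Optional


class ComposeSketch:
    """Hypothetical Compose-like container that applies callables in order."""

    def __init__(self, transforms: List[Callable]) -> None:
        self.transforms = list(transforms)

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x


def t_insert_sketch(
    transform: Optional[Callable], new_t: Callable, index: int = -1
) -> ComposeSketch:
    """Insert `new_t` into `transform` (assumed semantics, see lead-in)."""
    if transform is None:
        # No existing pipeline: the result holds only the new transform.
        return ComposeSketch([new_t])
    if not isinstance(transform, ComposeSketch):
        # Wrap a single transform so it can be extended.
        transform = ComposeSketch([transform])
    n = len(transform.transforms)
    # Assumption: a negative index counts from the end, so -1 appends.
    pos = max(0, n + index + 1) if index < 0 else min(index, n)
    transform.transforms.insert(pos, new_t)
    return transform
```

Under these assumptions, t_insert_sketch(None, f) returns a pipeline holding only f, and passing index=0 places the new transform first.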
resolve_data(args: Namespace, dataset: Dataset) → Data

Acquire data and properties from dataset.

Parameters:
  • args (Namespace) –

    Namespace of run parameters.

    • args.multi (bool): True for multi-label classification.

  • dataset (Dataset) – PyG dataset object.

Returns:

data (Data) – The resolved PyG data object from the dataset.

Updates:
  • args.num_features (int) – Number of input features.

  • args.num_classes (int) – Number of output classes.
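
The contract above (return the data object and update args in place) can be sketched as follows. This is not the library's implementation: resolve_data_sketch, StubData, and StubDataset are hypothetical, the sketch assumes the data object is the first item of the dataset, and it does not model how args.multi affects the result.

```python
from argparse import Namespace


class StubData:
    """Hypothetical placeholder for a PyG Data object."""


class StubDataset:
    """Hypothetical dataset exposing the attributes the sketch reads."""

    num_features = 16
    num_classes = 4

    def __init__(self) -> None:
        self._data = StubData()

    def __getitem__(self, idx: int) -> StubData:
        return self._data


def resolve_data_sketch(args: Namespace, dataset: StubDataset) -> StubData:
    """Return the data object and record dataset properties on `args`."""
    data = dataset[0]  # assumption: a single graph stored at index 0
    # The namespace is mutated in place, matching the "Updates" section.
    args.num_features = dataset.num_features
    args.num_classes = dataset.num_classes
    return data


args = Namespace(multi=False)
dataset = StubDataset()
data = resolve_data_sketch(args, dataset)
```

After the call, args carries num_features and num_classes alongside the values it already held, so later code can size model layers from the same namespace.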

resolve_split(data_split: str, data: Data) → Data

Apply data split masks.

Parameters:
  • data_split (str) –

    Identifier of the dataset split, formatted as scheme_split or scheme_split_seed.

    • scheme='Random': random split; split is the train/val/test ratio.

    • scheme='Stratify': stratified split; split is the train/val/test ratio.

    • scheme='Original': original split; split is the index of the split to use.

  • data (Data) – PyG data object containing the dataset and its attributes.

Returns:

data (Data) – The updated PyG data object with split masks (train/val/test).
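
The data_split format can be illustrated with a small parser. This is a sketch under stated assumptions, not the code behind resolve_split: SplitSpec, parse_data_split, and split_ratios are hypothetical names, and writing the ratio as slash-separated integers (for example 60/20/20) is an assumption, since this reference does not spell out how the ratio is encoded.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

SCHEMES = ("Random", "Stratify", "Original")


@dataclass
class SplitSpec:
    scheme: str
    split: str
    seed: Optional[int] = None


def parse_data_split(data_split: str) -> SplitSpec:
    """Parse 'scheme_split' or 'scheme_split_seed' into its parts."""
    parts = data_split.split("_")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected scheme_split[_seed], got {data_split!r}")
    scheme, split = parts[0], parts[1]
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme {scheme!r}")
    seed = int(parts[2]) if len(parts) == 3 else None
    return SplitSpec(scheme, split, seed)


def split_ratios(split: str) -> Tuple[float, ...]:
    """Normalize an assumed 'train/val/test' string to fractions summing to 1."""
    counts = [int(p) for p in split.split("/")]
    total = sum(counts)
    return tuple(c / total for c in counts)


spec = parse_data_split("Random_60/20/20_1")
ratios = split_ratios(spec.split)
```

For scheme='Original', the split field would instead be read as an integer index, for example int(parse_data_split("Original_0").split).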