benchmark.dataset

class Yandex(root: str, name: str, transform: Callable | None = None, pre_transform: Callable | None = None, force_reload: bool = False) → None

Bases: InMemoryDataset

Paper: A critical look at the evaluation of GNNs under heterophily: are we really making progress?
Ref: https://github.com/yandex-research/heterophilous-graphs

property raw_dir: str

property processed_dir: str

property raw_file_names: str

The names of the files in the self.raw_dir folder that must be present in order to skip downloading.

property processed_file_names: str

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

download() → None

Downloads the dataset to the self.raw_dir folder.

process() → None

Processes the dataset and saves it to the self.processed_dir folder.
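
The skip rule behind raw_file_names and processed_file_names can be sketched in plain Python. This is a simplified illustration of the contract described above, not the actual InMemoryDataset code; the helper name files_exist and the file name used below are hypothetical.

```python
import os
import tempfile


def files_exist(folder: str, names) -> bool:
    """Return True only if every expected file is present in `folder`.

    `names` may be a single file name or a list/tuple of names.
    An empty list counts as "not present" in this sketch (an assumption).
    """
    if isinstance(names, str):
        names = [names]
    names = list(names)
    return len(names) > 0 and all(
        os.path.isfile(os.path.join(folder, n)) for n in names
    )


# A temporary folder stands in for self.raw_dir.
with tempfile.TemporaryDirectory() as raw_dir:
    expected = ["example_graph.npz"]  # hypothetical raw file name
    # Nothing has been written yet, so a download would be triggered.
    before = files_exist(raw_dir, expected)
    # Create the expected file; the download step could now be skipped.
    open(os.path.join(raw_dir, expected[0]), "w").close()
    after = files_exist(raw_dir, expected)
```

The same check applied to self.processed_dir and processed_file_names would decide whether process() needs to run.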

class LINKX(root: str, name: str, transform: Callable | None = None, pre_transform: Callable | None = None, force_reload: bool = False) → None

Bases: InMemoryDataset

Paper: Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods
Ref: https://github.com/CUAI/Non-Homophily-Large-Scale/

_dataset_drive_url = {'pokec.mat': '1dNs5E7BrWJbgcHeQ_zuy5Ozp2tRCWG0y', 'snap-patents.mat': '1ldh23TSY1PwXia6dU0MYcpyEgX-w3Hia', 'twitch-gamer_edges.csv': '1XLETC6dG3lVl7kDmytEJ52hvDMVdxnZ0', 'twitch-gamer_features.csv': '1fA9VIIEI8N0L27MSQfcBzJgRQLvSbrvR', 'wiki_edges.pt': '14X7FlkjrlUgmnsYtPwdh-gGuFla4yb5u', 'wiki_features.pt': '1ySNspxbK-snNoAZM7oxiWGvOnTRdSyEK', 'wiki_views.pt': '1p5DlVHrnFgYm3VsNIzahSsvCD424AyvP', 'yelp-chi.mat': '1fAXtTVQS4CfEk4asqrFw9EPmlUPGbGtJ'}

_splits_drive_url = {'pokec_splits.npy': '1ZhpAiyTNc0cE_hhgyiqxnkKREHK7MK-_', 'snap-patents_splits.npy': '12xbBRqd8mtG_XkNLH8dRRNZJvVM4Pw-N'}
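
Both mappings pair a file name with what appears to be a Google Drive file ID. A minimal sketch of turning such an ID into a download link follows. The URL pattern is the commonly used Drive direct-download form and is an assumption here, as is the helper name drive_url; the class's own download() may work differently.

```python
from urllib.parse import urlencode

# Subset of the mapping shown above (file name -> Google Drive file ID).
dataset_drive_url = {
    "pokec.mat": "1dNs5E7BrWJbgcHeQ_zuy5Ozp2tRCWG0y",
    "yelp-chi.mat": "1fAXtTVQS4CfEk4asqrFw9EPmlUPGbGtJ",
}


def drive_url(file_id: str) -> str:
    """Build a direct-download link for a Drive file ID (assumed URL pattern)."""
    query = urlencode({"export": "download", "id": file_id})
    return f"https://drive.google.com/uc?{query}"


url = drive_url(dataset_drive_url["pokec.mat"])
```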

property raw_dir: str

property processed_dir: str

property raw_file_names: str

The names of the files in the self.raw_dir folder that must be present in order to skip downloading.

property processed_file_names: str

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

download() → None

Downloads the dataset to the self.raw_dir folder.

process() → None

Processes the dataset and saves it to the self.processed_dir folder.

class FB100(root: str, name: str, transform: Callable | None = None, pre_transform: Callable | None = None, force_reload: bool = False) → None

Bases: InMemoryDataset

property raw_dir: str

property processed_dir: str

property raw_file_names: str

The names of the files in the self.raw_dir folder that must be present in order to skip downloading.

property processed_file_names: str

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

download() → None

Downloads the dataset to the self.raw_dir folder.

process() → None

Processes the dataset and saves it to the self.processed_dir folder.

class Grid2D(root, name, transform=None, pre_transform=None)

Bases: InMemoryDataset

property raw_dir: str

property processed_dir: str

property raw_file_names

The names of the files in the self.raw_dir folder that must be present in order to skip downloading.

property processed_file_names

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

download()

Downloads the dataset to the self.raw_dir folder.

process()

Processes the dataset and saves it to the self.processed_dir folder.

T_insert(transform, new_t: BaseTransform, index=-1) → Compose
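
The signature suggests that T_insert places new_t into an existing transform pipeline and returns the combined Compose. Since no description is given, the following is only a sketch of that idea under stated assumptions: the Compose class is a minimal pure-Python stand-in, t_insert_sketch is a hypothetical name, and treating index=-1 as "append at the end" is a guess, not confirmed behavior.

```python
class Compose:
    """Minimal stand-in for a transform pipeline: applies callables in order."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, data):
        for t in self.transforms:
            data = t(data)
        return data


def t_insert_sketch(transform, new_t, index=-1):
    """Insert `new_t` into `transform` and return a Compose (hypothetical)."""
    # Normalise the input into a flat list of transforms.
    if transform is None:
        items = []
    elif isinstance(transform, Compose):
        items = list(transform.transforms)
    else:
        items = [transform]
    # Assumption: index=-1 means "append at the end of the pipeline".
    pos = len(items) if index == -1 else index
    items.insert(pos, new_t)
    return Compose(items)


def add_one(x):
    return x + 1


def double(x):
    return x * 2


pipeline = t_insert_sketch(add_one, double)         # add_one, then double
front = t_insert_sketch(pipeline, double, index=0)  # double, add_one, double
```

Plain numbers stand in for data objects here so that the order of application is easy to check.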

resolve_data(args: Namespace, dataset: Dataset) → Data

Acquire the data object and its properties from a dataset.

Parameters:

args (Namespace) – Configuration parameters.
    - args.multi (bool): True for multi-label classification.
dataset (Dataset) – PyG dataset object.

Returns:

data (Data) – The resolved PyG data object from the dataset.

Updates:

args.in_channels (int) – Number of input features.
args.out_channels (int) – Number of output classes.
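
The way args is updated in place can be illustrated with a small pure-Python sketch. It assumes that in_channels is the feature dimension and that out_channels is the number of label columns for multi-label data or the largest class index plus one otherwise. The function name resolve_channels_sketch and the list-based inputs are hypothetical; the real function works on PyG objects.

```python
from argparse import Namespace


def resolve_channels_sketch(args: Namespace, x, y) -> Namespace:
    """Set args.in_channels and args.out_channels from features and labels.

    x: list of per-node feature rows.
    y: list of integer class labels, or list of 0/1 label rows
       when args.multi is True.
    """
    # Number of input features = length of one feature row.
    args.in_channels = len(x[0])
    if args.multi:
        # Multi-label: one output per label column.
        args.out_channels = len(y[0])
    else:
        # Single-label: classes are assumed to be indexed 0..C-1.
        args.out_channels = max(y) + 1
    return args


x = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]

single = resolve_channels_sketch(Namespace(multi=False), x, [0, 2, 1, 2])
multi = resolve_channels_sketch(
    Namespace(multi=True), x, [[1, 0], [0, 1], [1, 1], [0, 0]]
)
```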

resolve_split(data_split: str, data: Data) → Data

Apply data split masks.

Parameters:

data_split (str) – Identifier of the dataset split, formatted as scheme_split or scheme_split_seed.
    - scheme='Random': random split; split is the train/val/test ratio.
    - scheme='Stratify': stratified split; split is the train/val/test ratio.
    - scheme='Original': original split; split is the index of the split.
data (Data) – PyG data object containing the dataset and its attributes.

Returns:

data (Data) – The updated PyG data object with split masks (train/val/test).
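
The data_split format and the random scheme can be sketched in plain Python. The exact way the ratio is written is not specified above, so the "60/20/20" form used here is an assumption, and parse_data_split and random_masks are hypothetical helpers. Masks are plain boolean lists rather than tensors.

```python
import random


def parse_data_split(data_split: str):
    """Split 'scheme_split' or 'scheme_split_seed' into its parts."""
    parts = data_split.split("_")
    if len(parts) == 2:
        scheme, split = parts
        seed = None
    elif len(parts) == 3:
        scheme, split, seed = parts[0], parts[1], int(parts[2])
    else:
        raise ValueError(f"Unexpected data_split format: {data_split!r}")
    if scheme not in ("Random", "Stratify", "Original"):
        raise ValueError(f"Unknown scheme: {scheme!r}")
    return scheme, split, seed


def random_masks(num_nodes: int, ratios, seed=None):
    """Boolean train/val/test masks for a random split with given ratios."""
    train_r, val_r, _ = ratios
    total = sum(ratios)
    order = list(range(num_nodes))
    random.Random(seed).shuffle(order)  # seeded shuffle for reproducibility
    n_train = int(num_nodes * train_r / total)
    n_val = int(num_nodes * val_r / total)
    masks = {k: [False] * num_nodes for k in ("train", "val", "test")}
    for pos, node in enumerate(order):
        if pos < n_train:
            masks["train"][node] = True
        elif pos < n_train + n_val:
            masks["val"][node] = True
        else:
            masks["test"][node] = True  # remainder goes to the test set
    return masks


scheme, split, seed = parse_data_split("Random_60/20/20_1")
ratios = tuple(int(r) for r in split.split("/"))  # assumed ratio notation
masks = random_masks(10, ratios, seed)
```

A stratified variant would apply the same ratios within each class, and the 'Original' scheme would instead select a predefined split by its index.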