datasets

GATNE dataset

class cogdl.datasets.gatne.AmazonDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.gatne.GatneDataset(root, name)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

The network datasets “Amazon”, “Twitter” and “YouTube” from the “Representation Learning for Attributed Multiplex Heterogeneous Network” paper.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("Amazon", "Twitter", "YouTube").

download()[source]

Downloads the dataset to the self.raw_dir folder.

get(idx)[source]

Gets the data object at index idx.

process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

property raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.
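The skip rule described for raw_file_names and processed_file_names can be sketched as below. This is a minimal illustration of the idea, not cogdl's implementation: files_exist and ensure_downloaded are hypothetical helper names, and the file name data.pkl is made up for the demo.

```python
import os
import tempfile


def files_exist(folder, file_names):
    """Return True only if every expected file is present in folder."""
    return len(file_names) > 0 and all(
        os.path.exists(os.path.join(folder, name)) for name in file_names
    )


def ensure_downloaded(raw_dir, raw_file_names, download):
    """Call download() only when some expected raw file is missing.

    Returns True if download() was invoked, False if it was skipped.
    """
    if files_exist(raw_dir, raw_file_names):
        return False
    os.makedirs(raw_dir, exist_ok=True)
    download()
    return True


# Demo with a temporary directory and a fake download step.
calls = []
with tempfile.TemporaryDirectory() as root:
    raw_dir = os.path.join(root, "raw")
    names = ["data.pkl"]  # hypothetical file name

    def fake_download():
        calls.append("download")
        with open(os.path.join(raw_dir, names[0]), "w") as f:
            f.write("placeholder")

    first = ensure_downloaded(raw_dir, names, fake_download)   # files missing
    second = ensure_downloaded(raw_dir, names, fake_download)  # files present
```

The same check applied to the processed directory would decide whether process() needs to run.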

url = 'https://github.com/THUDM/GATNE/raw/master/data'

class cogdl.datasets.gatne.TwitterDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.gatne.YouTubeDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

cogdl.datasets.gatne.read_gatne_data(folder)[source]

GCC dataset

class cogdl.datasets.gcc_data.Edgelist(root, name)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

download()[source]

Downloads the dataset to the self.raw_dir folder.

get(idx)[source]

Gets the data object at index idx.

property num_classes

The number of classes in the dataset.

process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

property raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.

url = 'https://github.com/cenyk1230/gcc-data/raw/master'

class cogdl.datasets.gcc_data.GCCDataset(root, name)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

download()[source]

Downloads the dataset to the self.raw_dir folder.

get(idx)[source]

Gets the data object at index idx.

preprocess(root, name)[source]
property processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

property raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.

url = 'https://github.com/cenyk1230/gcc-data/raw/master'

class cogdl.datasets.gcc_data.KDD_ICDM_GCCDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.gcc_data.SIGIR_CIKM_GCCDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.gcc_data.SIGMOD_ICDE_GCCDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.gcc_data.USAAirportDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

GTN dataset

class cogdl.datasets.gtn_data.ACM_GTNDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.gtn_data.DBLP_GTNDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.gtn_data.GTNDataset(root, name)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

The network datasets “ACM”, “DBLP” and “IMDB” from the “Graph Transformer Networks” paper.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("gtn-acm", "gtn-dblp", "gtn-imdb").

apply_to_device(device)[source]
download()[source]

Downloads the dataset to the self.raw_dir folder.

get(idx)[source]

Gets the data object at index idx.

property num_classes

The number of classes in the dataset.

process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

property raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.

read_gtn_data(folder)[source]

class cogdl.datasets.gtn_data.IMDB_GTNDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

HAN dataset

class cogdl.datasets.han_data.ACM_HANDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.han_data.DBLP_HANDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.han_data.HANDataset(root, name)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

The network datasets “ACM”, “DBLP” and “IMDB” from the “Heterogeneous Graph Attention Network” paper.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("han-acm", "han-dblp", "han-imdb").

apply_to_device(device)[source]
download()[source]

Downloads the dataset to the self.raw_dir folder.

get(idx)[source]

Gets the data object at index idx.

property num_classes

The number of classes in the dataset.

process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

property raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.

read_gtn_data(folder)[source]

class cogdl.datasets.han_data.IMDB_HANDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

cogdl.datasets.han_data.sample_mask(idx, length)[source]

Create a mask of the given length with the positions listed in idx set.
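A mask helper of this kind is commonly used to mark train, validation, and test nodes. The sketch below assumes the mask is True at each position in idx and False elsewhere; sample_mask_sketch is a hypothetical stand-in written with plain lists, and the library function may return an array type instead.

```python
def sample_mask_sketch(idx, length):
    """Return a list of `length` booleans that is True at each index in idx."""
    mask = [False] * length
    for i in idx:
        mask[i] = True
    return mask


# Mark nodes 0, 2 and 3 out of 6 as belonging to a split.
train_mask = sample_mask_sketch([0, 2, 3], 6)
```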

KG dataset

class cogdl.datasets.kg_data.BidirectionalOneShotIterator(dataloader_head, dataloader_tail)[source]

Bases: object

static one_shot_iterator(dataloader)[source]

Transform a PyTorch DataLoader into a Python iterator.
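One common way to turn a loader into an iterator is a generator that restarts the loader whenever it is exhausted, so callers can keep requesting batches with next(). The sketch below assumes that behaviour and uses a plain list as a stand-in for a DataLoader; it is an illustration, not the library's code.

```python
from itertools import islice


def one_shot_iterator(iterable):
    """Yield items forever, restarting the iterable each time it runs out.

    Note: an empty iterable would make this loop spin without yielding.
    """
    while True:
        for item in iterable:
            yield item


# A list of fake batches stands in for a DataLoader here.
batches = [["b0"], ["b1"], ["b2"]]
iterator = one_shot_iterator(batches)
first_five = list(islice(iterator, 5))
```

A bidirectional variant could hold one such iterator for head batches and one for tail batches and alternate between them on each step.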

class cogdl.datasets.kg_data.FB13Datset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.kg_data.FB13SDatset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.kg_data.FB15k237Datset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.kg_data.FB15kDatset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.kg_data.KnowledgeGraphDataset(root, name)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

download()[source]

Downloads the dataset to the self.raw_dir folder.

get(idx)[source]

Gets the data object at index idx.

property num_entities
property num_relations
process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

property raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.

property test_start_idx
property train_start_idx
url = 'https://cloud.tsinghua.edu.cn/d/d1c733373b014efab986/files/?p=%2F{}%2F{}&dl=1'
property valid_start_idx

class cogdl.datasets.kg_data.TestDataset(triples, all_true_triples, nentity, nrelation, mode)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

static collate_fn(data)[source]

class cogdl.datasets.kg_data.TrainDataset(triples, nentity, nrelation, negative_sample_size, mode)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

static collate_fn(data)[source]
static count_frequency(triples, start=4)[source]

Get the frequency of a partial triple such as (head, relation) or (relation, tail). The frequency is used for subsampling, as in word2vec.
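The counting step can be sketched as follows. The key layout (a tag plus the two ids) and the inverse-square-root weight are assumptions chosen for illustration; the library may encode the keys and compute the weight differently. Each partial triple starts at `start` on first sight and grows by one per repeat.

```python
import math


def count_frequency(triples, start=4):
    """Count how often each (head, relation) and (relation, tail) pair occurs.

    The first occurrence of a pair is recorded as `start`; each later
    occurrence adds one.
    """
    count = {}
    for head, relation, tail in triples:
        for key in (("hr", head, relation), ("rt", relation, tail)):
            count[key] = count.get(key, start - 1) + 1
    return count


def subsampling_weight(count, head, relation, tail):
    """Assumed word2vec-style weight: frequent partial triples get less weight."""
    total = count[("hr", head, relation)] + count[("rt", relation, tail)]
    return 1.0 / math.sqrt(total)


triples = [(0, 0, 1), (0, 0, 2), (3, 1, 1)]
freq = count_frequency(triples)
weight = subsampling_weight(freq, 0, 0, 1)
```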

static get_true_head_and_tail(triples)[source]

Build dictionaries of the true triples, which are used to filter out true triples during negative sampling.
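A minimal sketch of this lookup, assuming triples are (head, relation, tail) tuples of integer ids. The function body and the filter_tail_candidates helper are hypothetical illustrations, not the library's implementation; sets are used here for fast membership tests.

```python
from collections import defaultdict


def get_true_head_and_tail(triples):
    """Map (relation, tail) -> known heads and (head, relation) -> known tails."""
    true_head = defaultdict(set)
    true_tail = defaultdict(set)
    for head, relation, tail in triples:
        true_tail[(head, relation)].add(tail)
        true_head[(relation, tail)].add(head)
    return true_head, true_tail


def filter_tail_candidates(candidates, head, relation, true_tail):
    """Drop candidate tails that would form a known true triple."""
    known = true_tail.get((head, relation), set())
    return [c for c in candidates if c not in known]


triples = [(0, 0, 1), (0, 0, 2), (3, 0, 1)]
true_head, true_tail = get_true_head_and_tail(triples)
# Tails 1 and 2 are true for (head=0, relation=0), so only 3 and 4 remain.
negatives = filter_tail_candidates([1, 2, 3, 4], 0, 0, true_tail)
```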

class cogdl.datasets.kg_data.WN18Datset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.kg_data.WN18RRDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

cogdl.datasets.kg_data.read_triplet_data(folder)[source]

Matlab matrix dataset

class cogdl.datasets.matlab_matrix.BlogcatalogDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.matlab_matrix.DblpNEDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.matlab_matrix.FlickrDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.matlab_matrix.MatlabMatrix(root, name, url)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

Networks from http://leitang.net/code/social-dimension/data/ or http://snap.stanford.edu/node2vec/

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("Blogcatalog").

download()[source]

Downloads the dataset to the self.raw_dir folder.

get(idx)[source]

Gets the data object at index idx.

property num_classes

The number of classes in the dataset.

property num_nodes
process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

property raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.

class cogdl.datasets.matlab_matrix.NetworkEmbeddingCMTYDataset(root, name, url)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

download()[source]

Downloads the dataset to the self.raw_dir folder.

get(idx)[source]

Gets the data object at index idx.

property num_classes

The number of classes in the dataset.

property num_nodes
process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

property raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.

class cogdl.datasets.matlab_matrix.PPIDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.matlab_matrix.WikipediaDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.matlab_matrix.YoutubeNEDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

OGB dataset

class cogdl.datasets.ogb.OGBArxivDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.ogb.OGBCodeDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.ogb.OGBGDataset(root, name)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

get(idx)[source]

Gets the data object at index idx.

get_loader(args)[source]
get_subset(subset)[source]
property num_classes

The number of classes in the dataset.

class cogdl.datasets.ogb.OGBLCitation2Dataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.ogb.OGBLCollabDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.ogb.OGBLDataset(root, name)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

get(idx)[source]

Gets the data object at index idx.

get_edge_split()[source]
get_evaluator()[source]
get_loss_fn()[source]
property processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

class cogdl.datasets.ogb.OGBLDdiDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.ogb.OGBLPpaDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.ogb.OGBMolbaceDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.ogb.OGBMolhivDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.ogb.OGBMolpcbaDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.ogb.OGBNDataset(root, name, transform=None)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

get(idx)[source]

Gets the data object at index idx.

get_evaluator()[source]
get_loss_fn()[source]
process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

class cogdl.datasets.ogb.OGBPapers100MDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.ogb.OGBPpaDataset[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.ogb.OGBProductsDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.ogb.OGBProteinsDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

property edge_attr_size
get_evaluator()[source]
get_loss_fn()[source]
process()[source]

Processes the dataset to the self.processed_dir folder.

TU dataset

class cogdl.datasets.tu_data.CollabDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.tu_data.ENZYMES(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.tu_data.ImdbBinaryDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.tu_data.ImdbMultiDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.tu_data.MUTAGDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.tu_data.NCI109Dataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.tu_data.NCI1Dataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.tu_data.PTCMRDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.tu_data.ProteinsDataset(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.tu_data.RedditBinary(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.tu_data.RedditMulti12K(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.tu_data.RedditMulti5K(data_path='data')[source]

Bases: Generic[torch.utils.data.dataset.T_co]

class cogdl.datasets.tu_data.TUDataset(root, name)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

download()[source]

Downloads the dataset to the self.raw_dir folder.

property num_classes

The number of classes in the dataset.

process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

property raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.

url = 'https://www.chrsmrrs.com/graphkerneldatasets'

cogdl.datasets.tu_data.cat(seq)[source]
cogdl.datasets.tu_data.coalesce(index, value, m, n)[source]
cogdl.datasets.tu_data.normalize_feature(data)[source]
cogdl.datasets.tu_data.num_edge_attributes(edge_attr=None)[source]
cogdl.datasets.tu_data.num_edge_labels(edge_attr=None)[source]
cogdl.datasets.tu_data.num_node_attributes(x=None)[source]
cogdl.datasets.tu_data.num_node_labels(x=None)[source]
cogdl.datasets.tu_data.parse_txt_array(src, sep=None, start=0, end=None, dtype=None, device=None)[source]
cogdl.datasets.tu_data.read_file(folder, prefix, name, dtype=None)[source]
cogdl.datasets.tu_data.read_tu_data(folder, prefix)[source]
cogdl.datasets.tu_data.read_txt_array(path, sep=None, start=0, end=None, dtype=None, device=None)[source]
cogdl.datasets.tu_data.segment(src, indptr)[source]

Module contents

cogdl.datasets.build_dataset(args)[source]
cogdl.datasets.build_dataset_from_name(dataset, split=0)[source]
cogdl.datasets.build_dataset_from_path(data_path, dataset=None)[source]
cogdl.datasets.register_dataset(name)[source]

New dataset types can be added to cogdl with the register_dataset() function decorator.

For example:

@register_dataset('my_dataset')
class MyDataset():
    (...)

Parameters
  • name (str) – The name of the dataset.
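A name-based registry behind a decorator like this can be sketched as follows. It is a generic illustration of the pattern and not cogdl's actual implementation: DATASET_REGISTRY, build_from_registry, and the body of MyDataset are hypothetical.

```python
# Hypothetical module-level registry mapping dataset names to classes.
DATASET_REGISTRY = {}


def register_dataset(name):
    """Return a class decorator that stores the class under `name`."""

    def decorator(cls):
        if name in DATASET_REGISTRY:
            raise ValueError(f"Cannot register duplicate dataset ({name})")
        DATASET_REGISTRY[name] = cls
        return cls  # the class itself is left unchanged

    return decorator


@register_dataset("my_dataset")
class MyDataset:
    def __init__(self, data_path="data"):
        self.data_path = data_path


def build_from_registry(name, **kwargs):
    """Look up a registered class by name and instantiate it."""
    return DATASET_REGISTRY[name](**kwargs)


dataset = build_from_registry("my_dataset", data_path="/tmp/data")
```

Keeping the lookup in a dictionary lets a builder function resolve a dataset from a plain string, for example one taken from command-line arguments.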

cogdl.datasets.try_adding_dataset_args(dataset, parser)[source]