cogdl.data

Package Contents

Classes

Data

A plain old python object modeling a single graph with various (optional) attributes.

Batch

A plain old python object modeling a batch of graphs as one big (disconnected) graph.

Dataset

Dataset base class for creating graph datasets.

DataLoader

Data loader which merges data objects from a cogdl.data.dataset to a mini-batch.

DataListLoader

Data loader which merges data objects from a cogdl.data.dataset to a python list.

DenseDataLoader

Data loader which merges data objects from a cogdl.data.dataset to a mini-batch.

Functions

download_url(url, folder, name=None, log=True)

Downloads the content of a URL to a specific folder.

extract_tar(path, folder, mode='r:gz', log=True)

Extracts a tar archive to a specific folder.

extract_zip(path, folder, log=True)

Extracts a zip archive to a specific folder.

extract_bz2(path, folder, log=True)

extract_gz(path, folder, log=True)

class cogdl.data.Data(x=None, edge_index=None, edge_attr=None, y=None, pos=None)[source]

Bases: object

A plain old python object modeling a single graph with various (optional) attributes:

Args:

x (Tensor, optional): Node feature matrix with shape [num_nodes, num_node_features]. (default: None)

edge_index (LongTensor, optional): Graph connectivity in COO format with shape [2, num_edges]. (default: None)

edge_attr (Tensor, optional): Edge feature matrix with shape [num_edges, num_edge_features]. (default: None)

y (Tensor, optional): Graph or node targets with arbitrary shape. (default: None)

pos (Tensor, optional): Node position matrix with shape [num_nodes, num_dimensions]. (default: None)

The data object is not restricted to these attributes and can be extended by any other additional data.
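
Example

A minimal sketch; the concrete tensors below are illustrative and not taken from the library:

import torch
from cogdl.data import Data

# Three nodes with one feature each, connected by four directed edges.
x = torch.tensor([[-1.0], [0.0], [1.0]])                     # [num_nodes, num_node_features]
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)  # [2, num_edges] in COO format
data = Data(x=x, edge_index=edge_index)

print(data.num_nodes)     # 3
print(data.num_edges)     # 4
print(data.num_features)  # 1
data.train_mask = torch.tensor([True, True, False])          # extra attributes are allowed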

static from_dict(dictionary)

Creates a data object from a python dictionary.

__getitem__(self, key)

Gets the data of the attribute key.

__setitem__(self, key, value)

Sets the attribute key to value.

property keys(self)

Returns all names of graph attributes.

__len__(self)

Returns the number of all present attributes.

__contains__(self, key)

Returns True, if the attribute key is present in the data.

__iter__(self)

Iterates over all present attributes in the data, yielding their attribute names and content.

__call__(self, *keys)

Iterates over all attributes *keys in the data, yielding their attribute names and content. If *keys is not given, this method will iterate over all present attributes.

cat_dim(self, key, value)

Returns the dimension in which the attribute key with content value gets concatenated when creating batches.

Note

This method is for internal use only, and should only be overridden if the batch concatenation process is corrupted for a specific data attribute.

__inc__(self, key, value)

Returns the incremental count to cumulatively increase the value of the next attribute of key when creating batches.

Note

This method is for internal use only, and should only be overridden if the batch concatenation process is corrupted for a specific data attribute.

property num_edges(self)

Returns the number of edges in the graph.

property num_features(self)

Returns the number of features per node in the graph.

property num_nodes(self)

Returns the number of nodes in the graph.

is_coalesced(self)

Returns True, if edge indices are ordered and do not contain duplicate entries.

apply(self, func, *keys)

Applies the function func to all attributes *keys. If *keys is not given, func is applied to all present attributes.

contiguous(self, *keys)

Ensures a contiguous memory layout for all attributes *keys. If *keys is not given, all present attributes are ensured to have a contiguous memory layout.

to(self, device, *keys)

Performs tensor dtype and/or device conversion to all attributes *keys. If *keys is not given, the conversion is applied to all present attributes.

cuda(self, *keys)
clone(self)
__repr__(self)

Return repr(self).

class cogdl.data.Batch(batch=None, **kwargs)[source]

Bases: cogdl.data.Data

A plain old python object modeling a batch of graphs as one big (disconnected) graph. With cogdl.data.Data being the base class, all its methods can also be used here. In addition, single graphs can be reconstructed via the assignment vector batch, which maps each node to its respective graph identifier.

static from_data_list(data_list, follow_batch=[])

Constructs a batch object from a python list holding cogdl.data.Data objects. The assignment vector batch is created on the fly. Additionally, creates assignment batch vectors for each key in follow_batch.
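
Example

A minimal sketch of batching two small graphs; the graphs themselves are illustrative:

import torch
from cogdl.data import Data, Batch

g1 = Data(x=torch.randn(2, 4), edge_index=torch.tensor([[0, 1], [1, 0]], dtype=torch.long))
g2 = Data(x=torch.randn(3, 4), edge_index=torch.tensor([[0, 1, 2], [1, 2, 0]], dtype=torch.long))

batch = Batch.from_data_list([g1, g2])
print(batch.num_graphs)        # 2
print(batch.batch)             # tensor([0, 0, 1, 1, 1]) -- node-to-graph assignment vector
graphs = batch.to_data_list()  # recovers the original list of Data objects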

cumsum(self, key, item)

If True, the attribute key with content item should be added up cumulatively before being concatenated together.

Note

This method is for internal use only, and should only be overridden if the batch concatenation process is corrupted for a specific data attribute.

to_data_list(self)

Reconstructs the list of cogdl.data.Data objects from the batch object. The batch object must have been created via from_data_list() in order to be able to reconstruct the initial objects.

property num_graphs(self)

Returns the number of graphs in the batch.

class cogdl.data.Dataset(root, transform=None, pre_transform=None, pre_filter=None)[source]

Bases: torch.utils.data.Dataset

Dataset base class for creating graph datasets. See here for the accompanying tutorial.

Args:

root (string): Root directory where the dataset should be saved.

transform (callable, optional): A function/transform that takes in a cogdl.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

pre_transform (callable, optional): A function/transform that takes in a cogdl.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

pre_filter (callable, optional): A function that takes in a cogdl.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
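
Example

A minimal sketch of a subclass, assuming the conventional self.raw_dir / self.processed_dir attributes and the pre_transform / pre_filter hooks described above; the class name, file names, and graph content are hypothetical:

import os.path as osp
import torch
from cogdl.data import Data, Dataset

class ToyGraphDataset(Dataset):   # hypothetical example class
    @property
    def raw_file_names(self):
        # If these files exist in self.raw_dir, download() is skipped.
        return ["edges.txt"]

    @property
    def processed_file_names(self):
        # If these files exist in self.processed_dir, process() is skipped.
        return ["data.pt"]

    def download(self):
        # e.g. download_url(<dataset url>, self.raw_dir); omitted in this sketch.
        pass

    def process(self):
        # Real code would parse self.raw_paths; here a single toy graph is built.
        data = Data(x=torch.randn(4, 8),
                    edge_index=torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]], dtype=torch.long))
        if self.pre_filter is not None and not self.pre_filter(data):
            return
        if self.pre_transform is not None:
            data = self.pre_transform(data)
        torch.save([data], osp.join(self.processed_dir, "data.pt"))

    def __len__(self):
        return len(torch.load(osp.join(self.processed_dir, "data.pt")))

    def get(self, idx):
        return torch.load(osp.join(self.processed_dir, "data.pt"))[idx]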

property raw_file_names(self)

The names of the files to find in the self.raw_dir folder in order to skip the download.

property processed_file_names(self)

The names of the files to find in the self.processed_dir folder in order to skip the processing.

abstract download(self)

Downloads the dataset to the self.raw_dir folder.

abstract process(self)

Processes the dataset to the self.processed_dir folder.

abstract __len__(self)

The number of examples in the dataset.

abstract get(self, idx)

Gets the data object at index idx.

property num_features(self)

Returns the number of features per node in the graph.

property raw_paths(self)

The filepaths to find in order to skip the download.

property processed_paths(self)

The filepaths to find in the self.processed_dir folder in order to skip the processing.

_download(self)
_process(self)
__getitem__(self, idx)

Gets the data object at index idx and transforms it (in case a self.transform is given).

__repr__(self)

class cogdl.data.DataLoader(dataset, batch_size=1, shuffle=True, **kwargs)[source]

Bases: torch.utils.data.DataLoader

Data loader which merges data objects from a cogdl.data.dataset to a mini-batch.

Args:

dataset (Dataset): The dataset from which to load the data.

batch_size (int, optional): How many samples per batch to load. (default: 1)

shuffle (bool, optional): If set to True, the data will be reshuffled at every epoch. (default: True)
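
Example

A minimal sketch; it assumes the loader also accepts a plain Python list of cogdl.data.Data objects in place of a full Dataset instance:

import torch
from cogdl.data import Data, DataLoader

graphs = [
    Data(x=torch.randn(3, 8), edge_index=torch.tensor([[0, 1, 2], [1, 2, 0]], dtype=torch.long)),
    Data(x=torch.randn(4, 8), edge_index=torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]], dtype=torch.long)),
]
loader = DataLoader(graphs, batch_size=2, shuffle=False)
for batch in loader:
    # Each mini-batch is a single cogdl.data.Batch, i.e. one big disconnected graph.
    print(batch.num_graphs, batch.num_nodes)  # 2 7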

class cogdl.data.DataListLoader(dataset, batch_size=1, shuffle=True, **kwargs)[source]

Bases: torch.utils.data.DataLoader

Data loader which merges data objects from a cogdl.data.dataset to a python list.

Note

This data loader should be used for multi-GPU support via cogdl.nn.DataParallel.

Args:

dataset (Dataset): The dataset from which to load the data.

batch_size (int, optional): How many samples per batch to load. (default: 1)

shuffle (bool, optional): If set to True, the data will be reshuffled at every epoch. (default: True)
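
Example

A minimal sketch under the same list-as-dataset assumption as above; it only shows that each mini-batch comes back as a plain Python list of Data objects:

import torch
from cogdl.data import Data, DataListLoader

graphs = [Data(x=torch.randn(n, 8),
               edge_index=torch.tensor([[0, 1], [1, 0]], dtype=torch.long))
          for n in (3, 4, 5, 6)]
loader = DataListLoader(graphs, batch_size=2, shuffle=False)
for data_list in loader:
    # A plain list of Data objects, left unmerged so a DataParallel-style
    # wrapper can scatter whole graphs across GPUs.
    print(len(data_list))  # 2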

class cogdl.data.DenseDataLoader(dataset, batch_size=1, shuffle=True, **kwargs)[source]

Bases: torch.utils.data.DataLoader

Data loader which merges data objects from a cogdl.data.dataset to a mini-batch.

Note

To make use of this data loader, all graphs in the dataset need to have the same shape for each of their attributes. Therefore, this data loader should only be used when working with dense adjacency matrices.

Args:

dataset (Dataset): The dataset from which to load the data.

batch_size (int, optional): How many samples per batch to load. (default: 1)

shuffle (bool, optional): If set to True, the data will be reshuffled at every epoch. (default: True)
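
Example

A minimal sketch, assuming (as is common for dense batching) that identically shaped attributes are stacked along a new leading batch dimension; the dense adjacency attribute adj is added by hand, since the Data constructor only covers the standard attributes:

import torch
from cogdl.data import Data, DenseDataLoader

graphs = []
for _ in range(6):
    g = Data(x=torch.randn(4, 8))                # every graph: 4 nodes, 8 features
    g.adj = torch.randint(0, 2, (4, 4)).float()  # dense adjacency, identical shape everywhere
    graphs.append(g)

loader = DenseDataLoader(graphs, batch_size=3, shuffle=False)
for batch in loader:
    # Expected shapes under the stacking assumption: x -> [3, 4, 8], adj -> [3, 4, 4].
    print(batch.x.shape, batch.adj.shape)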

cogdl.data.download_url(url, folder, name=None, log=True)[source]

Downloads the content of a URL to a specific folder.

Args:

url (string): The URL.

folder (string): The folder.

log (bool, optional): If False, will not print anything to the console. (default: True)
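
Example

A minimal sketch of the typical download-then-extract flow; the URL and folder are hypothetical placeholders, and it assumes download_url returns the path of the downloaded file:

from cogdl.data import download_url, extract_zip

path = download_url("https://example.com/toy_dataset.zip", "data/raw")  # hypothetical URL
extract_zip(path, "data/raw")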

cogdl.data.extract_tar(path, folder, mode='r:gz', log=True)[source]

Extracts a tar archive to a specific folder.

Args:

path (string): The path to the tar archive.

folder (string): The folder.

mode (string, optional): The compression mode. (default: "r:gz")

log (bool, optional): If False, will not print anything to the console. (default: True)

cogdl.data.extract_zip(path, folder, log=True)[source]

Extracts a zip archive to a specific folder.

Args:

path (string): The path to the zip archive.

folder (string): The folder.

log (bool, optional): If False, will not print anything to the console. (default: True)

cogdl.data.extract_bz2(path, folder, log=True)[source]
cogdl.data.extract_gz(path, folder, log=True)[source]