cogdl.datasets.pyg_strategies_data

This file is borrowed from https://github.com/snap-stanford/pretrain-gnns/

Module Contents

Functions

nx_to_graph_data_obj(g, center_id, allowable_features_downstream=None, allowable_features_pretrain=None, node_id_to_go_labels=None)

graph_data_obj_to_nx(data)

graph_data_obj_to_nx_simple(data)

Converts a graph Data object required by the PyTorch Geometric package to a NetworkX graph object.

nx_to_graph_data_obj_simple(G)

Converts an nx graph to a PyTorch Geometric Data object. Assumes node indices are numbered from 0 to num_nodes - 1.

reset_idxes(G)

Resets node indices such that they are numbered from 0 to num_nodes - 1

cogdl.datasets.pyg_strategies_data.nx_to_graph_data_obj(g, center_id, allowable_features_downstream=None, allowable_features_pretrain=None, node_id_to_go_labels=None)[source]
cogdl.datasets.pyg_strategies_data.graph_data_obj_to_nx(data)[source]
cogdl.datasets.pyg_strategies_data.graph_data_obj_to_nx_simple(data)[source]

Converts a graph Data object required by the PyTorch Geometric package to a NetworkX graph object.

NB: Uses simplified atom and bond features, represented as indices.

NB: Relative stereochemistry may not be recapitulated, since the edges in the nx object are unordered.

Parameters
  • data – PyTorch Geometric Data object

Returns
NetworkX graph object

cogdl.datasets.pyg_strategies_data.nx_to_graph_data_obj_simple(G)[source]

Converts an nx graph to a PyTorch Geometric Data object. Assumes node indices are numbered from 0 to num_nodes - 1.

NB: Uses simplified atom and bond features, represented as indices.

NB: Relative stereochemistry may not be recapitulated, since the edges in the nx object are unordered.

Parameters
  • G – nx graph object

Returns
PyTorch Geometric Data object
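The relationship between an undirected nx-style edge list and a directed edge_index can be sketched in plain Python. This is not the library's implementation: the helper names edge_index_to_undirected and undirected_to_edge_index are hypothetical, lists stand in for tensors, and the sketch assumes the paired edge ordering documented for MaskAtom.__call__ (both directions of an edge stored next to each other).

```python
def edge_index_to_undirected(edge_index):
    """Collapse a paired, directed edge_index into undirected (u, v) edges."""
    sources, targets = edge_index
    # Both directions of an edge are assumed to be adjacent columns,
    # so reading every second column yields each edge exactly once.
    return [(sources[j], targets[j]) for j in range(0, len(sources), 2)]


def undirected_to_edge_index(edges):
    """Expand undirected (u, v) edges into a paired, directed edge_index."""
    sources, targets = [], []
    for u, v in edges:
        # Store u -> v immediately followed by v -> u.
        sources += [u, v]
        targets += [v, u]
    return [sources, targets]


edge_index = [[0, 1, 1, 2, 2, 3],
              [1, 0, 2, 1, 3, 2]]
edges = edge_index_to_undirected(edge_index)
round_trip = undirected_to_edge_index(edges)
print(edges)
print(round_trip == edge_index)
```

Because the undirected edge list carries no direction, a round trip only reproduces the original edge_index when the paired ordering holds.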

class cogdl.datasets.pyg_strategies_data.NegativeEdge[source]

Borrowed from https://github.com/snap-stanford/pretrain-gnns/

__call__(self, data)[source]
class cogdl.datasets.pyg_strategies_data.MaskEdge(mask_rate)[source]

Borrowed from https://github.com/snap-stanford/pretrain-gnns/

__call__(self, data, masked_edge_indices=None)[source]
class cogdl.datasets.pyg_strategies_data.MaskAtom(num_atom_type, num_edge_type, mask_rate, mask_edge=True)[source]

Borrowed from https://github.com/snap-stanford/pretrain-gnns/

__call__(self, data, masked_atom_indices=None)[source]
Parameters
  • data – PyTorch Geometric data object. Assumes that the edge ordering is the default PyTorch Geometric ordering, where the two directions of a single edge occur in pairs, e.g.

    data.edge_index = tensor([[0, 1, 1, 2, 2, 3],
                              [1, 0, 2, 1, 3, 2]])

  • masked_atom_indices – If None, randomly samples num_atoms * mask_rate atom indices. Otherwise, a list of atom indices that sets the atoms to be masked (for debugging only).

Returns
None. Creates new attributes in the original data object: data.mask_node_idx, data.mask_node_label, data.mask_edge_idx, data.mask_edge_label.
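A minimal sketch of the sampling step and of locating the edges that touch masked atoms, in plain Python rather than tensors. The helper names sample_masked_atoms and connected_edge_columns are hypothetical, and the rounding rule (round down, but mask at least one atom) is an assumption made for illustration; the actual transform may round differently.

```python
import random


def sample_masked_atoms(num_atoms, mask_rate, rng):
    """Pick roughly num_atoms * mask_rate distinct atom indices at random."""
    # Assumption: round down, but always mask at least one atom.
    sample_size = max(1, int(num_atoms * mask_rate))
    return sorted(rng.sample(range(num_atoms), sample_size))


def connected_edge_columns(edge_index, masked_atoms):
    """Return the edge_index columns that touch any masked atom."""
    masked = set(masked_atoms)
    sources, targets = edge_index
    return [j for j, (u, v) in enumerate(zip(sources, targets))
            if u in masked or v in masked]


edge_index = [[0, 1, 1, 2, 2, 3],
              [1, 0, 2, 1, 3, 2]]

# Random sampling, as when masked_atom_indices is None.
masked_atoms = sample_masked_atoms(4, 0.25, random.Random(0))

# Explicit indices, as in the debugging path.
columns = connected_edge_columns(edge_index, [1])
one_per_edge = columns[::2]  # paired ordering: keep one direction per edge
print(masked_atoms, columns, one_per_edge)
```

With the paired ordering, taking every second connected column gives one entry per undirected edge, which is why the ordering assumption matters for the edge-related attributes.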

__repr__(self)[source]

Return repr(self).

cogdl.datasets.pyg_strategies_data.reset_idxes(G)[source]

Resets node indices such that they are numbered from 0 to num_nodes - 1.

Parameters
  • G – nx graph object

Returns
Copy of G with relabelled node indices, and the mapping
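The relabelling can be illustrated without NetworkX. In this sketch a dict of adjacency lists stands in for the nx graph, the function name reset_indices is hypothetical, and the mapping is assumed to go from old node index to new node index in iteration order.

```python
def reset_indices(adjacency):
    """Return a relabelled copy of the graph and the old-to-new mapping."""
    # Number the nodes 0..num_nodes-1 in the order they are iterated.
    mapping = {old: new for new, old in enumerate(adjacency)}
    relabelled = {
        mapping[node]: [mapping[neighbor] for neighbor in neighbors]
        for node, neighbors in adjacency.items()
    }
    return relabelled, mapping


# A path graph 7 - 3 - 9 with non-contiguous node indices.
adjacency = {7: [3], 3: [7, 9], 9: [3]}
relabelled, mapping = reset_indices(adjacency)
print(mapping)
print(relabelled)
```

The input is left unchanged, matching the documented behaviour of returning a copy.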

class cogdl.datasets.pyg_strategies_data.ExtractSubstructureContextPair(l1, center=True)[source]
__call__(self, data, root_idx=None)[source]
__repr__(self)[source]

Return repr(self).

class cogdl.datasets.pyg_strategies_data.ChemExtractSubstructureContextPair(k, l1, l2)[source]
__call__(self, data, root_idx=None)[source]
Parameters
  • data – PyTorch Geometric data object

  • root_idx – If None, randomly samples an atom index. Otherwise, sets the atom index of the root (for debugging only).

Returns
None. Creates new attributes in the original data object: data.center_substruct_idx, data.x_substruct, data.edge_attr_substruct, data.edge_index_substruct, data.x_context, data.edge_attr_context, data.edge_index_context, data.overlap_context_substruct_idx.
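The node sets behind these attributes can be sketched in plain Python. The sketch assumes, following the upstream pretrain-gnns code, that k is the hop radius of the substructure around the root and that the context consists of the nodes lying between l1 and l2 hops from the root. The helper names nodes_within and substructure_and_context are hypothetical, and a dict of adjacency lists stands in for the graph.

```python
from collections import deque


def nodes_within(adjacency, root, max_hops):
    """Return the nodes at most max_hops away from root (breadth-first search)."""
    distance = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)


def substructure_and_context(adjacency, root, k, l1, l2):
    """Return (substructure nodes, context nodes, nodes shared by both)."""
    substruct = nodes_within(adjacency, root, k)
    # Nodes reachable within exactly one of the two radii form the context ring.
    context = nodes_within(adjacency, root, l1) ^ nodes_within(adjacency, root, l2)
    return substruct, context, substruct & context


# Path graph 0 - 1 - 2 - 3 - 4 - 5, rooted at node 0.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
substruct, context, overlap = substructure_and_context(adjacency, 0, k=2, l1=1, l2=4)
print(substruct, context, overlap)
```

The overlap set corresponds to the role played by data.overlap_context_substruct_idx: nodes that belong to both the substructure and its context.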

__repr__(self)[source]

Return repr(self).

class cogdl.datasets.pyg_strategies_data.BatchFinetune(batch=None, **kwargs)[source]

Bases: torch_geometric.data.Data

static from_data_list(data_list)[source]

Constructs a batch object from a python list holding torch_geometric.data.Data objects. The assignment vector batch is created on the fly.

property num_graphs(self)[source]

Returns the number of graphs in the batch.

class cogdl.datasets.pyg_strategies_data.BatchMasking(batch=None, **kwargs)[source]

Bases: torch_geometric.data.Data

static from_data_list(data_list)[source]

Constructs a batch object from a python list holding torch_geometric.data.Data objects. The assignment vector batch is created on the fly.

cumsum(self, key, item)[source]

If True, the attribute key with content item should be added up cumulatively before being concatenated together.

Note: This method is for internal use only, and should only be overridden if the batch concatenation process is corrupted for a specific data attribute.

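Cumulative addition matters for index-valued attributes: when graphs are concatenated into one batch, node indices from later graphs have to be shifted by the number of nodes that came before them. The following plain-Python sketch illustrates the idea only; batch_graphs is a hypothetical helper, lists stand in for tensors, and deriving the graph count from the last entry of the assignment vector is an assumption about how num_graphs could be computed.

```python
def batch_graphs(graphs):
    """Concatenate (num_nodes, edge_index) pairs into a single batch.

    Returns the combined edge_index, the assignment vector that maps each
    node to its graph, and the number of graphs.
    """
    sources, targets, batch = [], [], []
    offset = 0  # running total of nodes seen so far
    for graph_id, (num_nodes, edge_index) in enumerate(graphs):
        # Index-valued attribute: shift by the cumulative node count.
        sources += [u + offset for u in edge_index[0]]
        targets += [v + offset for v in edge_index[1]]
        # Assignment vector, created on the fly.
        batch += [graph_id] * num_nodes
        offset += num_nodes
    num_graphs = batch[-1] + 1 if batch else 0
    return [sources, targets], batch, num_graphs


graphs = [
    (2, [[0, 1], [1, 0]]),              # one edge between nodes 0 and 1
    (3, [[0, 1, 1, 2], [1, 0, 2, 1]]),  # path graph on three nodes
]
edge_index, batch, num_graphs = batch_graphs(graphs)
print(edge_index)
print(batch, num_graphs)
```

Attributes that hold features rather than indices would be concatenated without any offset, which is the distinction a cumsum-style hook is there to express.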
property num_graphs(self)[source]

Returns the number of graphs in the batch.

class cogdl.datasets.pyg_strategies_data.BatchAE(batch=None, **kwargs)[source]

Bases: torch_geometric.data.Data

static from_data_list(data_list)[source]

Constructs a batch object from a python list holding torch_geometric.data.Data objects. The assignment vector batch is created on the fly.

property num_graphs(self)[source]

Returns the number of graphs in the batch.

cat_dim(self, key)[source]
class cogdl.datasets.pyg_strategies_data.BatchSubstructContext(batch=None, **kwargs)[source]

Bases: torch_geometric.data.Data

static from_data_list(data_list)[source]

Constructs a batch object from a python list holding torch_geometric.data.Data objects. The assignment vector batch is created on the fly.

cat_dim(self, key)[source]
cumsum(self, key, item)[source]

If True, the attribute key with content item should be added up cumulatively before being concatenated together.

Note: This method is for internal use only, and should only be overridden if the batch concatenation process is corrupted for a specific data attribute.

property num_graphs(self)[source]

Returns the number of graphs in the batch.

class cogdl.datasets.pyg_strategies_data.DataLoaderFinetune(dataset, batch_size=1, shuffle=True, **kwargs)[source]

Bases: torch.utils.data.DataLoader

class cogdl.datasets.pyg_strategies_data.DataLoaderMasking(dataset, batch_size=1, shuffle=True, **kwargs)[source]

Bases: torch.utils.data.DataLoader

class cogdl.datasets.pyg_strategies_data.DataLoaderAE(dataset, batch_size=1, shuffle=True, **kwargs)[source]

Bases: torch.utils.data.DataLoader

class cogdl.datasets.pyg_strategies_data.DataLoaderSubstructContext(dataset, batch_size=1, shuffle=True, **kwargs)[source]

Bases: torch.utils.data.DataLoader

class cogdl.datasets.pyg_strategies_data.TestBioDataset(data_type='unsupervised', root=None, transform=None, pre_transform=None, pre_filter=None)[source]

Bases: torch_geometric.data.InMemoryDataset

class cogdl.datasets.pyg_strategies_data.TestChemDataset(data_type='unsupervised', root=None, transform=None, pre_transform=None, pre_filter=None)[source]

Bases: torch_geometric.data.InMemoryDataset

get(self, idx)[source]
class cogdl.datasets.pyg_strategies_data.BioDataset(data_type='unsupervised', empty=False, transform=None, pre_transform=None, pre_filter=None)[source]

Bases: torch_geometric.data.InMemoryDataset

property raw_file_names(self)[source]
property processed_file_names(self)[source]
download(self)[source]
process(self)[source]
class cogdl.datasets.pyg_strategies_data.MoleculeDataset(data_type='unsupervised', transform=None, pre_transform=None, pre_filter=None, empty=False)[source]

Bases: torch_geometric.data.InMemoryDataset

get(self, idx)[source]
property raw_file_names(self)[source]
property processed_file_names(self)[source]
download(self)[source]
process(self)[source]
class cogdl.datasets.pyg_strategies_data.BACEDataset(transform=None, pre_transform=None, pre_filter=None, empty=False)[source]

Bases: torch_geometric.data.InMemoryDataset

get(self, idx)[source]
property raw_file_names(self)[source]
property processed_file_names(self)[source]
download(self)[source]
process(self)[source]
class cogdl.datasets.pyg_strategies_data.BBBPDataset(transform=None, pre_transform=None, pre_filter=None, empty=False)[source]

Bases: torch_geometric.data.InMemoryDataset

get(self, idx)[source]
property raw_file_names(self)[source]
property processed_file_names(self)[source]
download(self)[source]
process(self)[source]