Graphs

InteractionGraph

class pybiographs.graphs.InteractionGraph(directed: bool = False)[source]

Represents a graph of protein-protein interactions.

It can load a directed or undirected graph, and wraps the corresponding networkx graph. If it represents a directed graph, it behaves as a networkx.DiGraph. Otherwise it behaves as a networkx.Graph.

Node attributes:
  • label : UniProt ID of the protein (https://www.uniprot.org/).
  • string node_type : either metabolome_graph (pathways and metabolites are associated with the protein) or other_protein (not referenced as a metabolome protein: no metabolites and no pathway on SMPDB, https://smpdb.ca/).
  • string info : text describing the products of the mRNA that codes for the protein, from the STRING database (https://string-db.org/).
  • list cellular_components : list of GO IDs of the cellular components the protein belongs to in QuickGO (see Gene Ontology: https://www.ebi.ac.uk/QuickGO/). Mappings.go_to_name maps GO IDs to names.
  • list molecular_functions : list of GO IDs as above, but for molecular functions.
  • list biological_processes : list of GO IDs as above, but for biological processes.
  • list expression_data : vector of 308 floats giving the expression ranks of the initial mRNA coding for the protein, renormalized to lie between 0 and 1, in 308 tissues (see https://bgee.org/). index_tissue is a dict mapping an index in the vector to the tissue name (a string).
  • list metabolites : list of HMDB IDs of the metabolites associated with the protein if it is a metabolome_protein (see https://hmdb.ca/). Mappings.metabolite_id_to_name contains the mapping from ID to metabolite name.
  • list pathways : list of names of the pathways the metabolome_protein belongs to. For more information on a pathway, search for it on SMPDB (it might not be referenced).
  • string sequence : amino acid sequence of the protein.

Directed edge attributes:

Undirected edge attributes:

__init__(directed: bool = False)[source]

Initialize an InteractionGraph.

Parameters:
  • directed – If True, the instance will represent a directed graph of interactions. If False, the instance will represent an undirected graph of interactions.

classify_tissue_by_node_expression(nodes, limit=30)[source]

Takes a list of nodes, then prints the tissues in which the set of nodes is most expressed.

Parameters:
  • nodes – nodes to be searched for.
  • limit – maximum number of tissues to print.
Returns:

None.
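
The ranking described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: it assumes that "most expressed" means the highest mean expression rank over the given nodes (the aggregation is not stated here), and the helper `rank_tissues`, the three-tissue `index_tissue` dict and all values are invented stand-ins for the real 308-entry vectors.

```python
from statistics import mean

# Hypothetical stand-ins: 3 tissues instead of 308, invented values.
index_tissue = {0: "lung", 1: "liver", 2: "heart"}
expression = {
    "P1": [0.9, 0.1, 0.4],
    "P2": [0.7, 0.3, 0.2],
}


def rank_tissues(nodes, expression, index_tissue, limit=30):
    """Return (tissue, mean expression) pairs, highest mean first.

    Assumption: the node set is scored by its mean expression per tissue.
    """
    scores = []
    for idx, tissue in index_tissue.items():
        scores.append((tissue, mean(expression[n][idx] for n in nodes)))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:limit]


ranking = rank_tissues(["P1", "P2"], expression, index_tissue, limit=2)
for tissue, score in ranking:
    print(f"{tissue}: {score:.2f}")
```

Returning the ranking rather than printing inside the helper is a choice made for this sketch only; the documented method prints and returns None.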

get_nodes_by_sequence_regex(sequence_regex)[source]

Search for a regex in the amino acid sequences stored in the nodes and return the matching nodes.

Parameters:
  • sequence_regex – regex to search for in sequences.
Returns:

a list of nodes (UniProt IDs).

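
As an illustration of this kind of search, here is a minimal sketch using only the standard `re` module. The `sequences` dict, the IDs and the sequences are invented, `nodes_matching_sequence` is a hypothetical helper, and the use of `re.search` (match anywhere in the sequence) is an assumption about the matching mode.

```python
import re

# Hypothetical node -> sequence mapping; IDs and sequences are invented.
sequences = {
    "P00001": "MKTAYIAKQR",
    "P00002": "MSSNGTLLKK",
    "P00003": "MKKNGSAAQR",
}


def nodes_matching_sequence(sequences, sequence_regex):
    """Return the nodes whose sequence contains a match for the regex."""
    pattern = re.compile(sequence_regex)
    return [node for node, seq in sequences.items() if pattern.search(seq)]


# Motif: N, then any residue except P, then S or T.
hits = nodes_matching_sequence(sequences, r"N[^P][ST]")
print(hits)
```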
info_sequence_regex(res, reg, attribute)[source]

Depending on the chosen node attribute, “info” or “sequence”, search for a regex in that attribute of all nodes and return the union of the query results so far and the nodes that match.

Parameters:
  • res – input node list corresponding to the query results so far.
  • reg – regex to be searched for, as a string.
  • attribute – “info” or “sequence”.
Returns:

list containing the union of the matching nodes and res.

is_directed

Return True if it represents a directed graph. Otherwise return False.

metabolites_regex(res, reg)[source]

Search for a regex in the metabolite names in the graph node attributes and return the union of the matching nodes with the query results so far.

Parameters:
  • res – input node list corresponding to the query results so far.
  • reg – regex to be searched for, as a string.
Returns:

union list of matching nodes and res.
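
The pattern shared by the `*_regex` methods (match, then merge with the results so far) can be sketched as follows. This is an illustration under stated assumptions: `metabolites_union` is a hypothetical helper, the IDs and names are placeholders rather than real HMDB records, and keeping first-seen order in the union is a choice made here, not something the source specifies.

```python
import re

# Hypothetical data; IDs and names are placeholders, not real HMDB records.
metabolite_id_to_name = {"M1": "glucose", "M2": "pyruvate", "M3": "lactate"}
node_metabolites = {
    "P1": ["M1", "M2"],
    "P2": ["M3"],
    "P3": [],
}


def metabolites_union(res, reg, node_metabolites, id_to_name):
    """Union of `res` and the nodes having a metabolite name matching `reg`."""
    pattern = re.compile(reg)
    matched = [
        node
        for node, ids in node_metabolites.items()
        if any(pattern.search(id_to_name.get(i, "")) for i in ids)
    ]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(list(res) + matched))


result = metabolites_union(["P3", "P1"], r"ate$", node_metabolites, metabolite_id_to_name)
print(result)
```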

most_present_biological_processes(graph, tissue, bp_size_thresh=0, limit=10)[source]

After sub_graph_from_node_propagation, this function can be used to print the most affected biological processes.

Parameters:
  • graph – subgraph whose most affected biological processes are printed.
  • tissue – string, the tissue in which to analyze the biological processes.
  • bp_size_thresh – a threshold on the size of biological processes, measured in number of proteins.
  • limit – maximum number of entries to print.
Returns:

None.

most_present_cellular_components(graph, tissue, cc_size_thresh=0, limit=10)[source]

Similar to most_present_biological_processes, but for cellular components.

Parameters:
  • graph – subgraph whose most affected cellular components are printed.
  • tissue – string, the tissue in which to analyze the cellular components.
  • cc_size_thresh – a threshold on the size of cellular components, measured in number of proteins.
  • limit – maximum number of entries to print.
Returns:

None.

ontology_regex(res, reg)[source]

Search for a regex in all ontological attributes of the nodes in the graph (“cellular_components”, “biological_processes”, “molecular_functions”) and return the union of the matching nodes with the query results so far.

Parameters:
  • res – input node list corresponding to the query results so far.
  • reg – regex to be searched for, as a string.
Returns:

union list of matching nodes and res.

pathway_regex(res, reg)[source]

Search for a regex in the “pathways” attribute of the nodes in the graph and return the union of the argument res with the matching nodes.

Parameters:
  • res – input node list corresponding to the query results so far.
  • reg – regex to be searched for, as a string.
Returns:

union list of matching nodes and res.

print_sub_graph_nodes(graph, print_spec='i_o_p_m', limit=30)[source]

Print the nodes in graph, up to a limit, with specs similar to those of sub_graph_by_node_regex_search.

Parameters:
  • graph – graph whose nodes are printed.
  • print_spec – a string specifying what to print: a combination of “i” (for info), “p” (for pathways), “m” (for metabolites) and “o” (for ontologies), separated by underscores “_”. As a split is applied, the order is not important.
  • limit – maximum number of nodes to print.
Returns:

None.
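
To show why the order of letters in the spec string does not matter, here is a minimal sketch of splitting such a string on underscores. The helper `parse_spec` and the `SPEC_TO_ATTRIBUTES` table are hypothetical; in particular, mapping “o” to the three ontology attributes is an assumption based on the attribute names listed earlier.

```python
# Letter -> node attributes, following the description above.
# Assumption: "o" covers the three ontology attributes.
SPEC_TO_ATTRIBUTES = {
    "i": ["info"],
    "o": ["cellular_components", "molecular_functions", "biological_processes"],
    "p": ["pathways"],
    "m": ["metabolites"],
}


def parse_spec(spec):
    """Split a spec such as 'i_o_p_m' into a set of letters."""
    letters = set(spec.split("_"))
    unknown = letters - set(SPEC_TO_ATTRIBUTES)
    if unknown:
        raise ValueError(f"unknown spec letters: {sorted(unknown)}")
    return letters


# A set has no order, so both spellings select the same attributes.
same = parse_spec("i_o_p_m") == parse_spec("m_p_o_i")
print(same, sorted(parse_spec("p_i")))
```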

propagate_node(node, diameter)[source]

Recursive part of sub_graph_from_node_propagation.

Parameters:
  • node – node to propagate.
  • diameter – diameter that still needs to be propagated.
Returns:

node results.

recurrent_ontology_query(sub_search, nodes)[source]

Recursively evaluate the queries in an ontology request and return the result as a list.

Parameters:
  • sub_search – the request, as a list.
  • nodes – nodes to be searched.
Returns:

a list containing the proteins that satisfy the request.

remove_edges_by_threshold(graph, score_threshold=0.0)[source]

Remove from the graph all edges whose score is below the threshold. Removing edges can create orphan nodes (nodes with no edges); those nodes are also removed from the graph.

Parameters:
  • graph – graph whose edges are cleaned.
  • score_threshold – threshold on the edge score attribute; should be between 0 and 1, as edge scores are.
Returns:

cleaned graph.
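
The edge-then-orphan cleanup can be sketched on a plain edge dictionary. This is an illustration only: it does not use networkx or the library's code, `prune_edges` is a hypothetical helper, and it assumes that "inferior to" means strictly below the threshold, so an edge whose score equals the threshold is kept.

```python
def prune_edges(nodes, edges, score_threshold=0.0):
    """Drop edges scoring below the threshold, then drop nodes left without edges.

    `edges` maps (u, v) pairs to a score between 0 and 1.
    Note: nodes that had no edge to begin with are dropped too.
    """
    kept_edges = {pair: s for pair, s in edges.items() if s >= score_threshold}
    connected = {node for pair in kept_edges for node in pair}
    kept_nodes = [node for node in nodes if node in connected]
    return kept_nodes, kept_edges


# Invented toy graph: a chain A-B-C-D with decreasing scores.
nodes = ["A", "B", "C", "D"]
edges = {("A", "B"): 0.9, ("B", "C"): 0.4, ("C", "D"): 0.2}

kept_nodes, kept_edges = prune_edges(nodes, edges, score_threshold=0.5)
print(kept_nodes, kept_edges)
```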

restrict_by_tissue_threshold(nodes, tissue, threshold)[source]

Remove from the input nodes all nodes whose expression in the given tissue is not above a threshold.

Parameters:
  • nodes – list of nodes.
  • tissue – a string key for tissue.
  • threshold – float between 0 and 1.
Returns:

new list containing nodes that satisfy threshold properties.
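
A minimal sketch of this filter, assuming the tissue name is first converted to an index in the expression vector and that "superior to" means strictly greater than the threshold. The helper `restrict_by_tissue`, the two-tissue `index_tissue` dict and all values are invented for illustration.

```python
# Hypothetical stand-ins: 2 tissues instead of 308, invented values.
index_tissue = {0: "lung", 1: "liver"}
tissue_index = {name: idx for idx, name in index_tissue.items()}
expression = {"P1": [0.9, 0.1], "P2": [0.2, 0.8], "P3": [0.6, 0.6]}


def restrict_by_tissue(nodes, tissue, threshold, expression, tissue_index):
    """Keep the nodes whose expression in `tissue` is strictly above `threshold`."""
    idx = tissue_index[tissue]
    return [node for node in nodes if expression[node][idx] > threshold]


kept = restrict_by_tissue(["P1", "P2", "P3"], "lung", 0.5, expression, tissue_index)
print(kept)
```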

This is the public method to use to query for a subgraph by searching for a set expression query in the ontology. As for the method above, it returns a subgraph cleaned by threshold.

Ontology query language:
  • simple query: “goid” -> returns the subgraph with nodes in goid.
  • basic query: [“and”, “goid1”, “goid2”, …] -> returns the subgraph with nodes in goid1 and goid2 and …
  • basic query: [“not”, “goid1”, …] -> returns the subgraph with nodes not in goid1, …
  • basic query: [“or”, “goid1”, “goid2”] -> returns the subgraph with nodes in goid1 or goid2 or …
  • complex query: [“and”, [“or”, “g1”, “g2”, [“and”, “g3”, “g4”]], [“not”, “g5”], “g6”] -> returns the subgraph with nodes satisfying (g1 or g2 or (g3 and g4)) and (not g5) and g6.

Parameters:
  • ontology_query – the query list.
  • tissue – restrict the search by tissue (example “lung”). Default None and ignored.
  • score_threshold – threshold to apply to edges in the subgraph, between 0 and 1 as the scores.
  • expression_threshold – threshold to apply to the expression score in the tissue. Ignored if tissue is None.
Returns:

New subgraph.
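
The nested query language lends itself to a short recursive evaluator over sets. The sketch below is an illustration, not the library's implementation: `evaluate` and `members` are hypothetical names, the term-to-protein table is invented, and a “not” with several operands is assumed to mean "in none of the listed terms".

```python
# Hypothetical GO term -> member proteins; identifiers are placeholders.
members = {
    "g1": {"A", "B"},
    "g2": {"C"},
    "g3": {"D", "E"},
    "g4": {"E", "F"},
    "g5": {"B"},
    "g6": {"A", "B", "C", "E"},
}
all_nodes = set().union(*members.values())


def evaluate(query, members, all_nodes):
    """Evaluate a nested ["and" | "or" | "not", ...] query into a set of nodes."""
    if isinstance(query, str):
        # Simple query: a single term identifier.
        return set(members.get(query, set()))
    operator, *operands = query
    if not operands:
        raise ValueError("an operator needs at least one operand")
    results = [evaluate(operand, members, all_nodes) for operand in operands]
    if operator == "and":
        return set.intersection(*results)
    if operator == "or":
        return set.union(*results)
    if operator == "not":
        # Assumption: nodes that belong to none of the listed terms.
        return all_nodes - set.union(*results)
    raise ValueError(f"unknown operator: {operator!r}")


query = ["and", ["or", "g1", "g2", ["and", "g3", "g4"]], ["not", "g5"], "g6"]
matching = sorted(evaluate(query, members, all_nodes))
print(matching)
```

Here the “or” branch yields A, B, C and E, the “not” branch removes B, and intersecting with g6 leaves A, C and E.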

This is the public method to use to query for a subgraph of the graph by searching for a regex in the node attributes.
  • Step 1: search for nodes with a matching regex in their attributes.
  • Step 2: remove the nodes that are below the expression threshold.
  • Step 3: create a subgraph from the parent graph and remove the edges that are below the score threshold.

Parameters:
  • regex – regex to be searched for in node attributes.
  • spec – a string specifying where to search: a combination of “i” (for info), “p” (for pathways), “m” (for metabolites) and “o” (for ontologies), separated by underscores “_”. As a split is applied, the order is not important.
  • tissue – restrict the search by tissue (example “lung”). Default None and ignored.
  • score_threshold – threshold to apply to edges in the subgraph, between 0 and 1 as the scores.
  • expression_threshold – threshold to apply to the expression score in the tissue. Ignored if tissue is None.
Returns:

New sub graph.

sub_graph_from_node_propagation(nodes, diameter=1, tissue=None, score_threshold=0.0, expression_threshold=0.0)[source]

Takes nodes and returns the subgraph generated by neighbor propagation up to a diameter. Starting from the input nodes, it recursively takes all their neighbors, then the neighbors of those neighbors, and so on. The returned subgraph is optionally thresholded by tissue expression and by edge scores.

Parameters:
  • nodes – nodes to propagate.
  • diameter – diameter of the resulting subgraph around the nodes.
  • tissue – restrict the search by tissue (example “lung”). Default None and ignored.
  • score_threshold – threshold to apply to edges in the subgraph, between 0 and 1 as the scores.
  • expression_threshold – threshold to apply to the expression score in the tissue. Ignored if tissue is None.
Returns:

New graph with results.
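
Neighbor propagation up to a diameter can be sketched on a plain adjacency dictionary. Note that this sketch uses an iterative breadth-first expansion rather than the recursion described above, and it assumes that the diameter counts hops from the starting nodes; `propagate` and the toy `adjacency` data are hypothetical, and the tissue and score thresholds are left out.

```python
def propagate(adjacency, seeds, diameter=1):
    """Return the nodes reachable from `seeds` in at most `diameter` hops."""
    visited = set(seeds)
    frontier = set(seeds)
    for _ in range(diameter):
        # Neighbors of the current frontier that have not been seen yet.
        frontier = {
            neighbor
            for node in frontier
            for neighbor in adjacency.get(node, ())
        } - visited
        if not frontier:
            break
        visited |= frontier
    return visited


# Invented toy graph: a chain A-B-C-D.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

one_hop = propagate(adjacency, ["A"], diameter=1)
two_hops = propagate(adjacency, ["A"], diameter=2)
print(sorted(one_hop), sorted(two_hops))
```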

OntologyGraph

class pybiographs.graphs.OntologyGraph(name: str)[source]

Covid Data

CovidData

class pybiographs.covid_data.CovidData[source]
__init__()[source]

Initialize a CovidData.

Mappings

Mappings

class pybiographs.mappings.Mappings[source]

Deep Learning

PPInteractionDataset

pybiographs.dl_models.torch_datasets.PPInteractionDataset

PPGCN

pybiographs.dl_models.graph_dl_model.PPGCN