Graphs
InteractionGraph
class pybiographs.graphs.InteractionGraph(directed: bool = False)
Represents a graph of protein-protein interactions.
It can load a directed or undirected graph and wraps the corresponding networkx graph. If it represents a directed graph, it behaves as a networkx.DiGraph; otherwise it behaves as a networkx.Graph.
Node attributes:
- label : UniProt ID of the protein (https://www.uniprot.org/).
- string node_type : metabolome_graph (a protein with associated pathways and metabolites) or other_protein (a protein not referenced as a metabolome protein: no metabolites and no pathway in SMPDB, https://smpdb.ca/).
- string info : text describing the products of the mRNA that codes for the protein, from the STRING database (https://string-db.org/).
- list cellular_components : list of GO IDs of the cellular components the protein belongs to in QuickGO (see Gene Ontology: https://www.ebi.ac.uk/QuickGO/). Mappings.go_to_name maps GO IDs to names.
- list molecular_functions : list of GO IDs as above, but for molecular functions.
- list biological_processes : list of GO IDs as above, but for biological processes.
- list expression_data : vector of 308 floats giving the expression ranks of the mRNA coding for the protein in 308 tissues, renormalized to the range 0 to 1 (see https://bgee.org/). index_tissue is a dict mapping each index in the vector to a tissue name.
- list metabolites : list of HMDB IDs of the metabolites associated with the protein, if it is a metabolome protein (see https://hmdb.ca/). Mappings.metabolite_id_to_name maps metabolite IDs to metabolite names.
- list pathways : list of names of the pathways the metabolome protein belongs to. For more information on a pathway, search for it on SMPDB (it might not be referenced there).
- string sequence : amino acid sequence of the protein.
Directed edge attributes:
Undirected edge attributes:
__init__(directed: bool = False)
Initialize an InteractionGraph.
Parameters: directed – If True, the instance represents a directed graph of interactions. If False, it represents an undirected graph of interactions.
classify_tissue_by_node_expression(nodes, limit=30)
Takes a list of nodes and prints the tissues in which this set of nodes is most expressed.
Parameters: - nodes – nodes to search for.
- limit – maximum number of tissues to print.
Returns: None.
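The docs do not say how "most expressed" is measured for a set of nodes. The sketch below shows one plausible reading, assuming expression is averaged per tissue over the given nodes. rank_tissues_by_expression and the toy node_attrs and index_tissue dicts are hypothetical stand-ins for the node attributes described above, not the library's internals; unlike the documented method, the sketch also returns the ranking so it can be inspected.

```python
from statistics import mean


def rank_tissues_by_expression(nodes, node_attrs, index_tissue, limit=30):
    """Hypothetical sketch: rank tissues by the mean expression of `nodes`.

    node_attrs maps a node id to its attribute dict ("expression_data" is a
    list of floats in [0, 1]); index_tissue maps a vector index to a tissue name.
    Averaging is an assumption, not necessarily what the library does.
    """
    vectors = [node_attrs[node]["expression_data"] for node in nodes]
    scores = {
        index_tissue[i]: mean(vector[i] for vector in vectors)
        for i in range(len(vectors[0]))
    }
    # Highest mean expression first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for tissue, score in ranked[:limit]:
        print(f"{tissue}\t{score:.2f}")
    return ranked[:limit]


# Toy data: 3 tissues instead of the real 308.
index_tissue = {0: "liver", 1: "lung", 2: "brain"}
node_attrs = {
    "P1": {"expression_data": [0.9, 0.2, 0.6]},
    "P2": {"expression_data": [0.7, 0.4, 0.2]},
}
ranked = rank_tissues_by_expression(["P1", "P2"], node_attrs, index_tissue, limit=2)
```

Other aggregations (maximum, median, count above a cutoff) would fit the description equally well; the mean is used here only for simplicity.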
get_nodes_by_sequence_regex(sequence_regex)
Searches for a regex in the amino acid sequences stored in the nodes and returns the matching nodes.
Parameters: sequence_regex – regex to search for in the sequences.
Returns: a list of nodes (UniProt IDs).
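A minimal sketch of searching the "sequence" node attribute with Python's re module. The function name and the toy node_attrs dict are hypothetical; whether the library uses re.search (match anywhere) or re.match (match at the start) is an assumption, and this sketch uses re.search.

```python
import re


def nodes_matching_sequence(node_attrs, sequence_regex):
    """Hypothetical sketch: nodes whose "sequence" attribute matches a regex.

    re.search is used, so the pattern may match anywhere in the sequence.
    """
    pattern = re.compile(sequence_regex)
    return [
        node
        for node, attrs in node_attrs.items()
        if pattern.search(attrs.get("sequence", ""))
    ]


# Toy sequences; "N[^P][ST]" is the N-X-S/T N-glycosylation sequon (X is not P).
node_attrs = {
    "P1": {"sequence": "MKNLTAYIAK"},
    "P2": {"sequence": "MGSSHHHHHH"},
    "P3": {"sequence": "MNPSK"},
}
glyco = nodes_matching_sequence(node_attrs, "N[^P][ST]")  # ['P1']
```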
info_sequence_regex(res, reg, attribute)
Searches for a regex in the given node attribute, “info” or “sequence”, of all nodes and returns the union of the query results so far and the matching nodes.
Parameters: - res – input node list holding the query results so far.
- reg – regex to search for, as a string.
- attribute – “info” or “sequence”.
Returns: union of the matching nodes and res, as a list.
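The "union with the results so far" pattern can be sketched as follows. attribute_regex_union and the toy data are hypothetical; keeping res first and preserving insertion order is a choice of this sketch, since the docs only say a union is returned.

```python
import re


def attribute_regex_union(node_attrs, res, reg, attribute):
    """Hypothetical sketch: union of `res` with nodes whose attribute matches `reg`."""
    if attribute not in ("info", "sequence"):
        raise ValueError('attribute must be "info" or "sequence"')
    pattern = re.compile(reg)
    matches = [
        node
        for node, attrs in node_attrs.items()
        if pattern.search(attrs.get(attribute, ""))
    ]
    # dict.fromkeys keeps first-seen order and drops duplicates.
    return list(dict.fromkeys([*res, *matches]))


node_attrs = {
    "P1": {"info": "Serine/threonine-protein kinase"},
    "P2": {"info": "Glucose transporter"},
    "P3": {"info": "Tyrosine-protein kinase receptor"},
}
print(attribute_regex_union(node_attrs, ["P2"], "kinase", "info"))  # ['P2', 'P1', 'P3']
```

Passing the previous result back in as res is, presumably, how successive searches accumulate into one query result.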
is_directed
Returns True if the instance represents a directed graph, otherwise False.
metabolites_regex(res, reg)
Searches for a regex in the metabolite names stored in the node attributes and returns the union of the matching nodes and the query results so far.
Parameters: - res – input node list holding the query results so far.
- reg – regex to search for, as a string.
Returns: union of the matching nodes and res, as a list.
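Because the "metabolites" attribute stores HMDB IDs while this method searches metabolite names, the IDs have to be mapped to names first. A minimal sketch, assuming a plain dict plays the role of Mappings.metabolite_id_to_name; the function name and the placeholder IDs "M1" and "M2" are hypothetical.

```python
import re


def metabolites_regex_union(node_attrs, metabolite_id_to_name, res, reg):
    """Hypothetical sketch: match `reg` against metabolite names, not IDs.

    metabolite_id_to_name stands in for Mappings.metabolite_id_to_name.
    """
    pattern = re.compile(reg)
    matches = [
        node
        for node, attrs in node_attrs.items()
        if any(
            pattern.search(metabolite_id_to_name.get(metabolite_id, ""))
            for metabolite_id in attrs.get("metabolites", [])
        )
    ]
    # Keep res first, then new matches, without duplicates.
    return list(dict.fromkeys([*res, *matches]))


# Placeholder IDs ("M1", "M2") instead of real HMDB IDs.
id_to_name = {"M1": "D-Glucose", "M2": "L-Lactic acid"}
node_attrs = {"P1": {"metabolites": ["M1"]}, "P2": {"metabolites": ["M2"]}, "P3": {}}
glucose_nodes = metabolites_regex_union(node_attrs, id_to_name, [], "(?i)glucose")
```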
most_present_biological_processes(graph, tissue, bp_size_thresh=0, limit=10)
After sub_graph_from_node_propagation, this method can be used to print the most affected biological processes.
Parameters: - graph – subgraph whose most affected biological processes are printed.
- tissue – string, the tissue in which to analyze the biological processes.
- bp_size_thresh – threshold on the size (number of proteins) of the biological processes.
- limit – maximum number of biological processes to print.
Returns: None.
most_present_cellular_components(graph, tissue, cc_size_thresh=0, limit=10)
Similar to most_present_biological_processes, but for cellular components.
Parameters: - graph – subgraph whose most affected cellular components are printed.
- tissue – string, the tissue in which to analyze the cellular components.
- cc_size_thresh – threshold on the size (number of proteins) of the cellular components.
- limit – maximum number of cellular components to print.
Returns: None.
ontology_regex(res, reg)
Searches for a regex in all ontology attributes of the nodes in the graph (“cellular_components”, “molecular_functions”, “biological_processes”) and returns the union of the matching nodes and the query results so far.
Parameters: - res – input node list holding the query results so far.
- reg – regex to search for, as a string.
Returns: union of the matching nodes and res, as a list.
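A minimal sketch of a search across the three ontology attributes. The docs do not say whether the regex is matched against GO IDs or GO names, so this hypothetical ontology_regex_union tries the ID and, when a mapping standing in for Mappings.go_to_name is supplied, also the name. The toy data uses three GO terms (GO:0005634 nucleus, GO:0005739 mitochondrion, GO:0016301 kinase activity).

```python
import re

ONTOLOGY_ATTRIBUTES = ("cellular_components", "molecular_functions", "biological_processes")


def ontology_regex_union(node_attrs, res, reg, go_to_name=None):
    """Hypothetical sketch: union of `res` with nodes having a matching GO term."""
    pattern = re.compile(reg)
    go_to_name = go_to_name or {}

    def node_matches(attrs):
        for attribute in ONTOLOGY_ATTRIBUTES:
            for go_id in attrs.get(attribute, []):
                # Try the GO ID itself, then its name if a mapping is provided.
                if pattern.search(go_id) or pattern.search(go_to_name.get(go_id, "")):
                    return True
        return False

    matches = [node for node, attrs in node_attrs.items() if node_matches(attrs)]
    return list(dict.fromkeys([*res, *matches]))


go_to_name = {"GO:0005634": "nucleus", "GO:0005739": "mitochondrion", "GO:0016301": "kinase activity"}
node_attrs = {
    "P1": {"cellular_components": ["GO:0005634"]},
    "P2": {"cellular_components": ["GO:0005739"], "molecular_functions": ["GO:0016301"]},
}
kinases = ontology_regex_union(node_attrs, [], "kinase", go_to_name)  # ['P2']
```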
pathway_regex(res, reg)
Searches for a regex in the “pathways” attribute of the nodes in the graph and returns the union of res and the matching nodes.
Parameters: - res – input node list holding the query results so far.
- reg – regex to search for, as a string.
Returns: union of the matching nodes and res, as a list.
print_sub_graph_nodes(graph, print_spec='i_o_p_m', limit=30)
Prints the nodes of a graph, up to a limit, with specs similar to sub_graph_by_node_regex_search.
Parameters: - graph – graph whose nodes are printed.
- print_spec – a string specifying what to print: a combination of "i" (for info), "p" (for pathways), "m" (for metabolites) and "o" (for ontologies), separated by underscores "_". Since the string is split, the order is not important.
- limit – maximum number of nodes to print.
Returns: None.
propagate_node(node, diameter)
Recursive part of sub_graph_from_node_propagation.
Parameters: - node – node to propagate from.
- diameter – diameter that still needs to be propagated.
Returns: the resulting nodes.
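Recursive neighbor propagation can be sketched on a plain adjacency dict. propagate and nodes_within are hypothetical names, and "diameter" is treated here as the number of hops from the seed nodes. The best dict records the largest remaining diameter each node was reached with, so a node is re-expanded only when it is reached with more hops left. A simple visited set would wrongly stop expansion at a node first reached through a longer path.

```python
def propagate(adjacency, node, diameter, best):
    """Record every node within `diameter` hops of `node` in `best`.

    best maps node -> largest remaining diameter it was reached with; a node is
    expanded again only if it is now reached with more hops left.
    """
    if best.get(node, -1) >= diameter:
        return
    best[node] = diameter
    if diameter > 0:
        for neighbour in adjacency.get(node, ()):
            propagate(adjacency, neighbour, diameter - 1, best)


def nodes_within(adjacency, seeds, diameter=1):
    """Union of the neighborhoods of all seed nodes."""
    best = {}
    for seed in seeds:
        propagate(adjacency, seed, diameter, best)
    return set(best)


# Toy path graph A - B - C - D as an undirected adjacency dict.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(sorted(nodes_within(adjacency, ["A"], diameter=2)))  # ['A', 'B', 'C']
```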
recurrent_ontology_query(sub_search, nodes)
Recursively evaluates an ontology query and returns the result as a list.
Parameters: - sub_search – the query list.
- nodes – nodes to search.
Returns: a list of the proteins satisfying the query.
remove_edges_by_threshold(graph, score_threshold=0.0)
Removes from the graph all edges whose score is lower than the threshold. Since removing edges can create orphan nodes (nodes without edges), those nodes are also removed from the graph.
Parameters: - graph – graph whose edges are cleaned.
- score_threshold – threshold on the score edge attribute; it should be between 0 and 1, as edge scores are.
Returns: cleaned graph.
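The two steps (drop low-score edges, then drop nodes left without edges) can be sketched with plain containers. prune_low_score_edges is hypothetical, and keeping edges whose score equals the threshold is an assumption. On an actual networkx graph the same idea would typically use G.remove_edges_from(...) followed by G.remove_nodes_from(list(nx.isolates(G))).

```python
def prune_low_score_edges(nodes, edges, score_threshold=0.0):
    """Hypothetical sketch: drop edges scored below the threshold, then orphans.

    nodes: iterable of node ids; edges: dict (u, v) -> score in [0, 1].
    Nodes that had no edges to begin with are dropped too, not only new orphans.
    """
    kept_edges = {pair: score for pair, score in edges.items() if score >= score_threshold}
    connected = {node for pair in kept_edges for node in pair}
    kept_nodes = [node for node in nodes if node in connected]
    return kept_nodes, kept_edges


nodes = ["A", "B", "C", "D"]
edges = {("A", "B"): 0.9, ("B", "C"): 0.3, ("C", "D"): 0.2}
kept_nodes, kept_edges = prune_low_score_edges(nodes, edges, score_threshold=0.5)
```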
restrict_by_tissue_threshold(nodes, tissue, threshold)
Removes from the input nodes all nodes whose expression in the tissue is not greater than the threshold.
Parameters: - nodes – list of nodes.
- tissue – a string key for the tissue.
- threshold – float between 0 and 1.
Returns: new list containing the nodes that satisfy the threshold.
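Filtering by tissue needs the position of the tissue in expression_data, which index_tissue gives in the opposite direction (index to name). A minimal sketch that inverts the mapping; filter_nodes_by_tissue and the toy data are hypothetical, and the strict "greater than" follows the wording above.

```python
def filter_nodes_by_tissue(nodes, node_attrs, index_tissue, tissue, threshold):
    """Hypothetical sketch: keep nodes whose expression in `tissue` exceeds `threshold`."""
    # index_tissue maps index -> tissue name; invert it to find the tissue's index.
    tissue_to_index = {name: index for index, name in index_tissue.items()}
    index = tissue_to_index[tissue]
    return [node for node in nodes if node_attrs[node]["expression_data"][index] > threshold]


index_tissue = {0: "liver", 1: "lung"}
node_attrs = {
    "P1": {"expression_data": [0.9, 0.1]},
    "P2": {"expression_data": [0.2, 0.8]},
    "P3": {"expression_data": [0.5, 0.5]},
}
lung_nodes = filter_nodes_by_tissue(["P1", "P2", "P3"], node_attrs, index_tissue, "lung", 0.4)
```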
sub_graph_by_node_ontology_search(ontology_query=None, tissue=None, score_threshold=0.0, expression_threshold=0.0)
This is the public method to use to query for a subgraph by searching for a set expression over ontology terms. As with sub_graph_by_node_regex_search, it returns a subgraph cleaned by thresholds.
Ontology query language:
- simple query: "goid" -> returns a subgraph with the nodes in goid.
- basic query: ["and", "goid1", "goid2", …] -> returns a subgraph with the nodes in goid1 and goid2 and …
- basic query: ["not", "goid1", …] -> returns a subgraph with the nodes not in goid1, …
- basic query: ["or", "goid1", "goid2", …] -> returns a subgraph with the nodes in goid1 or goid2 or …
- complex query: ["and", ["or", "g1", "g2", ["and", "g3", "g4"]], ["not", "g5"], "g6"] -> returns a subgraph with the nodes satisfying (g1 or g2 or (g3 and g4)) and (not g5) and g6.
Parameters: - ontology_query – the query list.
- tissue – restricts the search to a tissue (for example “lung”). Defaults to None, in which case it is ignored.
- score_threshold – threshold to apply to edge scores in the subgraph, between 0 and 1 like the scores.
- expression_threshold – threshold to apply to the expression score in the tissue. Ignored if tissue is None.
Returns: New sub graph.
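The query language maps naturally onto a recursive evaluator over sets of nodes, which is presumably the role of recurrent_ontology_query. The sketch below is hypothetical: it assumes an index from each GO ID to the set of nodes annotated with it, and it reads ["not", "g1", …] as "in none of the operands", which requires knowing the full set of candidate nodes.

```python
def evaluate_query(query, annotations, universe):
    """Hypothetical sketch: return the set of nodes satisfying an ontology query.

    annotations: dict GO id -> set of nodes annotated with it (assumed index).
    universe: set of all candidate nodes (needed to evaluate "not").
    """
    if isinstance(query, str):
        # Simple query: the nodes annotated with this GO id.
        return set(annotations.get(query, set()))
    op, *operands = query
    results = [evaluate_query(sub, annotations, universe) for sub in operands]
    if op == "and":
        return set.intersection(*results) if results else set(universe)
    if op == "or":
        return set.union(*results) if results else set()
    if op == "not":
        # Nodes belonging to none of the operands.
        return set(universe) - set().union(*results)
    raise ValueError(f"unknown operator: {op!r}")


# Toy annotations for the complex query from the documentation.
annotations = {
    "g1": {"A"}, "g2": {"B"}, "g3": {"C", "D"},
    "g4": {"D", "E"}, "g5": {"B"}, "g6": {"A", "B", "D", "F"},
}
universe = {"A", "B", "C", "D", "E", "F"}
complex_query = ["and", ["or", "g1", "g2", ["and", "g3", "g4"]], ["not", "g5"], "g6"]
print(sorted(evaluate_query(complex_query, annotations, universe)))  # ['A', 'D']
```

Here (g1 or g2 or (g3 and g4)) gives {A, B, D}, removing g5 drops B, and intersecting with g6 leaves {A, D}.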
sub_graph_by_node_regex_search(regex, spec='i_p_m_o', tissue=None, score_threshold=0.0, expression_threshold=0.0)
This is the public method to use to query for a subgraph of the graph by searching for a regex in the node attributes.
Step 1: search for nodes whose attributes match the regex.
Step 2: remove nodes whose expression in the tissue is below the expression threshold.
Step 3: create the subgraph from the parent graph and remove edges whose score is below the score threshold.
Parameters: - regex – regex to search for in the node attributes.
- spec – a string specifying where to search: a combination of "i" (for info), "p" (for pathways), "m" (for metabolites) and "o" (for ontologies), separated by underscores "_". Since the string is split, the order is not important.
- tissue – restricts the search to a tissue (for example “lung”). Defaults to None, in which case it is ignored.
- score_threshold – threshold to apply to edge scores in the subgraph, between 0 and 1 like the scores.
- expression_threshold – threshold to apply to the expression score in the tissue. Ignored if tissue is None.
Returns: New sub graph.
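The three steps and the spec string can be sketched end to end on plain dicts. Everything here is hypothetical: regex_subgraph, the SPEC_TO_ATTRIBUTES table, and passing the tissue as a vector index (tissue_index) instead of a name. The sketch also matches metabolite IDs directly rather than names, and does not remove nodes orphaned by step 3.

```python
import re

# Assumed mapping from spec letters to node attributes.
SPEC_TO_ATTRIBUTES = {
    "i": ("info",),
    "p": ("pathways",),
    "m": ("metabolites",),
    "o": ("cellular_components", "molecular_functions", "biological_processes"),
}


def parse_spec(spec):
    """Split "i_p_m_o"-style specs; order does not matter."""
    return {letter for letter in spec.split("_") if letter}


def _values(value):
    # Attributes are either a single string ("info") or a list of strings.
    return [value] if isinstance(value, str) else list(value)


def regex_subgraph(node_attrs, edges, regex, spec="i_p_m_o", tissue_index=None,
                   score_threshold=0.0, expression_threshold=0.0):
    pattern = re.compile(regex)
    attributes = [a for letter in parse_spec(spec) for a in SPEC_TO_ATTRIBUTES[letter]]
    # Step 1: nodes with a regex match in any selected attribute.
    nodes = {
        node for node, attrs in node_attrs.items()
        if any(pattern.search(v) for a in attributes for v in _values(attrs.get(a, [])))
    }
    # Step 2: optional expression filter for one tissue.
    if tissue_index is not None:
        nodes = {n for n in nodes
                 if node_attrs[n]["expression_data"][tissue_index] > expression_threshold}
    # Step 3: induced subgraph, keeping only sufficiently scored edges.
    kept_edges = {(u, v): s for (u, v), s in edges.items()
                  if u in nodes and v in nodes and s >= score_threshold}
    return nodes, kept_edges


node_attrs = {
    "P1": {"info": "Hexokinase", "pathways": ["Glycolysis"], "expression_data": [0.9]},
    "P2": {"info": "Glucose transporter", "pathways": ["Glycolysis"], "expression_data": [0.1]},
    "P3": {"info": "Actin", "pathways": [], "expression_data": [0.8]},
}
edges = {("P1", "P2"): 0.9, ("P1", "P3"): 0.4}
nodes, kept = regex_subgraph(node_attrs, edges, "Glycolysis", spec="p")
```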
sub_graph_from_node_propagation(nodes, diameter=1, tissue=None, score_threshold=0.0, expression_threshold=0.0)
Takes nodes and returns the subgraph generated by neighbor propagation up to a diameter. The method recursively takes all neighbors of the input nodes, then the neighbors of those neighbors, and so on. The returned subgraph is optionally thresholded by tissue expression and by edge scores.
Parameters: - nodes – nodes to propagate from.
- diameter – diameter of the resulting subgraph around the nodes.
- tissue – restricts the search to a tissue (for example “lung”). Defaults to None, in which case it is ignored.
- score_threshold – threshold to apply to edge scores in the subgraph, between 0 and 1 like the scores.
- expression_threshold – threshold to apply to the expression score in the tissue. Ignored if tissue is None.
Returns: New graph with results.