spras package

Subpackages

Submodules

spras.allpairs module

class spras.allpairs.AllPairs

Bases: PRM[Empty]

dois: list[str] = []
static generate_inputs(data: Dataset, filename_map)

Access fields from the dataset and write the required input files
@param data: dataset
@param filename_map: a dict mapping file types in the required_inputs to the filename for that type

Associated files will be written with:
- nodetypes: node types with sources and targets
- network: network file containing edges and their weights
- directed_flag: contains true if the network is fully directed

static parse_output(raw_pathway_file, standardized_pathway_file, params)

Convert a predicted pathway into the universal format
@param raw_pathway_file: pathway file produced by an algorithm’s run function
@param standardized_pathway_file: the same pathway written in the universal format

required_inputs: list[str] = ['nodetypes', 'network', 'directed_flag']
static run(inputs, output_file, args=None, container_settings=None)

Runs an algorithm with the specified inputs, algorithm params (T), the designated output_file, and the desired container_settings.

See the algorithm-specific generate_inputs and parse_output for information about the input and output format.

spras.btb module

class spras.btb.BowTieBuilder

Bases: PRM[Empty]

dois: list[str] = ['10.1186/1752-0509-3-67']
static generate_inputs(data, filename_map)

Access fields from the dataset and write the required input files
@param data: dataset
@param filename_map: a dict mapping file types in the required_inputs to the filename for that type

Associated files will be written with:
- sources: NODEID-headered list of sources
- targets: NODEID-headered list of targets
- edges: node pairs with associated edge weights

static parse_output(raw_pathway_file, standardized_pathway_file, params)

Convert a predicted pathway into the universal format
@param raw_pathway_file: pathway file produced by an algorithm’s run function
@param standardized_pathway_file: the same pathway written in the universal format

required_inputs: list[str] = ['sources', 'targets', 'edges']
static run(inputs, output_file, args=None, container_settings=None)

Runs an algorithm with the specified inputs, algorithm params (T), the designated output_file, and the desired container_settings.

See the algorithm-specific generate_inputs and parse_output for information about the input and output format.

spras.containers module

exception spras.containers.ContainerError(message: str, error_code: int, stdout: str | None, stderr: str | None, *args)

Bases: RuntimeError

Raised when anything goes wrong inside a container

error_code: int
stderr: str | None
stdout: str | None
streams_contain(needle: str)

Checks (case-sensitively) whether any of the stdout/stderr streams contain the provided needle.

spras.containers.convert_docker_path(src_path: PurePath, dest_path: PurePath, file_path: str | PurePath) PurePosixPath

Convert a file_path that is in src_path to be in dest_path instead. For example, src_path /usr/mydir, file_path /usr/mydir/myfile, and dest_path /tmp yield /tmp/myfile.
@param src_path: source path that is a parent of file_path
@param dest_path: destination path
@param file_path: filename that is under the source path
@return: a new path with the filename relative to the destination path
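The conversion amounts to re-rooting the part of file_path below src_path at dest_path. A minimal sketch of that logic (not the SPRAS implementation itself):

```python
from pathlib import PurePath, PurePosixPath

def reroot(src_path: PurePath, dest_path: PurePath, file_path: str) -> PurePosixPath:
    # Strip the source prefix, then attach the relative remainder to the destination
    relative = PurePath(file_path).relative_to(src_path)
    return PurePosixPath(dest_path) / relative

print(reroot(PurePath("/usr/mydir"), PurePath("/tmp"), "/usr/mydir/myfile"))
# → /tmp/myfile
```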

spras.containers.download_gcs(gcs_path: str, local_path: str, is_dir: bool)
spras.containers.env_to_items(environment: dict[str, str]) Iterator[str]

Turns an environment variable dictionary into KEY=VALUE pairs.
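A plausible one-line implementation of this helper, shown as an illustrative sketch rather than the actual source:

```python
from typing import Iterator

def env_to_items(environment: dict[str, str]) -> Iterator[str]:
    # One KEY=VALUE string per environment variable, e.g. for container --env flags
    for key, value in environment.items():
        yield f"{key}={value}"

print(list(env_to_items({"TERM": "xterm", "OMP_NUM_THREADS": "2"})))
# → ['TERM=xterm', 'OMP_NUM_THREADS=2']
```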

spras.containers.prepare_dsub_cmd(flags: dict[str, str | list[str]])
spras.containers.prepare_path_docker(orig_path: PurePath) str

Prepare an absolute path for mounting as a Docker volume. Converts Windows file separators to posix separators. Converts Windows drive letters in absolute paths.

spras.containers.prepare_volume(filename: str | PathLike, volume_base: str | PurePath, config: ProcessedContainerSettings) Tuple[Tuple[PurePath, PurePath], str]

Makes a file on the local file system accessible within a container by mapping the local (source) path to a new container (destination) path and renaming the file to be relative to the destination path. The destination path will be a new path relative to the volume_base that includes a hash identifier derived from the original filename. An example mapped filename looks like ‘/spras/MG4YPNK/oi1-edges.txt’.
@param filename: The file on the local file system to map
@param volume_base: The base directory in the container, which must be an absolute directory
@param config: the processed container settings to use
@return: the first returned object is a tuple (source path, destination path) and the second returned object is the updated filename relative to the destination path
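The hash identifier keeps files with the same base name from colliding inside the container. A hypothetical sketch of the mapping (the hash scheme and helper name here are illustrative, not SPRAS's actual implementation):

```python
import hashlib
from pathlib import PurePath, PurePosixPath

def map_into_container(filename: str, volume_base: str):
    # Short deterministic identifier derived from the local path, so two
    # different local files both named 'edges.txt' get distinct container paths
    digest = hashlib.sha1(filename.encode()).hexdigest()[:7].upper()
    src = PurePath(filename).parent
    dest = PurePosixPath(volume_base) / digest
    return (src, dest), str(dest / PurePath(filename).name)

(src, dest), container_name = map_into_container("out/oi1-edges.txt", "/spras")
print(container_name)  # something like '/spras/<hash>/oi1-edges.txt'
```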

spras.containers.run_container(container_suffix: str, command: List[str], volumes: List[Tuple[PurePath, PurePath]], working_dir: str, out_dir: str | PathLike, container_settings: ProcessedContainerSettings, environment: dict[str, str] | None = None, network_disabled=False)

Runs a command in the container using Singularity or Docker
@param container_suffix: name of the DockerHub container without the ‘docker://’ prefix
@param command: command to run in the container
@param volumes: a list of volumes to mount where each item is a (source, destination) tuple
@param working_dir: the working directory in the container
@param out_dir: output directory for the rule’s artifacts. Only passed into run_container_singularity for the purpose of profiling.
@param container_settings: the settings to use to run the container
@param environment: environment variables to set in the container
@param network_disabled: disables the network on the container. Only works for Docker for now. This acts as a ‘runtime assertion’ that a container works without networking.
@return: output from Singularity execute or Docker run

spras.containers.run_container_and_log(name: str, container_suffix: str, command: List[str], volumes: List[Tuple[PurePath, PurePath]], working_dir: str, out_dir: str | PathLike, container_settings: ProcessedContainerSettings, environment: dict[str, str] | None = None, network_disabled=False)

Runs a command in the container using Singularity or Docker with associated pretty-printed messages.
@param name: the display name of the running container for logging purposes
@param container_suffix: name of the DockerHub container without the ‘docker://’ prefix
@param command: command to run in the container
@param volumes: a list of volumes to mount where each item is a (source, destination) tuple
@param working_dir: the working directory in the container
@param container_settings: the container settings to use
@param environment: environment variables to set in the container
@param network_disabled: disables the network on the container. Only works for Docker for now. This acts as a ‘runtime assertion’ that a container works without networking.
@return: output from Singularity execute or Docker run

spras.containers.run_container_docker(container: str, command: List[str], volumes: List[Tuple[PurePath, PurePath]], working_dir: str, environment: dict[str, str] | None = None, network_disabled=False)

Runs a command in the container using Docker. Attempts to automatically correct the file owner and group for new files created by the container, setting them to the current owner and group IDs. Does not modify the owner or group of existing files modified by the container.
@param container: name of the DockerHub container without the ‘docker://’ prefix
@param command: command to run in the container
@param volumes: a list of volumes to mount where each item is a (source, destination) tuple
@param working_dir: the working directory in the container
@param environment: environment variables to set in the container
@return: output from Docker run; raises an error if the container errored

spras.containers.run_container_dsub(container: str, command: List[str], volumes: List[Tuple[PurePath, PurePath]], working_dir: str, environment: dict[str, str] | None = None) str

Runs a command in the Google Cloud using dsub.
@param container: name of the container in the Google Cloud Container Registry
@param command: command to run
@param volumes: a list of volumes to mount where each item is a (source, destination) tuple
@param working_dir: the working directory in the container
@param environment: environment variables to set in the container
@return: path of the output from dsub

spras.containers.run_container_singularity(container: str, command: List[str], volumes: List[Tuple[PurePath, PurePath]], working_dir: str, out_dir: str, config: ProcessedContainerSettings, environment: dict[str, str] | None = None)

Runs a command in the container using Singularity. Only available on Linux.
@param container: name of the DockerHub container without the ‘docker://’ prefix
@param command: command to run in the container
@param volumes: a list of volumes to mount where each item is a (source, destination) tuple
@param working_dir: the working directory in the container
@param out_dir: output directory for the rule’s artifacts, used here to store profiling data
@param config: the processed container settings to use
@param environment: environment variables to set in the container
@return: output from Singularity execute

spras.containers.upload_gcs(local_path: str, gcs_path: str, is_dir: bool)

spras.dataset module

class spras.dataset.Dataset(dataset_dict)

Bases: object

NODE_ID = 'NODEID'
contains_node_columns(col_names: list[str] | str)
classmethod from_file(file_name: str)

Loads dataset object from a pickle file. Usage: dataset = Dataset.from_file(pickle_file)

get_interactome() DataFrame | None
get_node_columns(col_names: list[str]) DataFrame

returns: A table containing the requested column names and node IDs for all nodes with at least one of the requested values non-empty

get_other_files()
load_files_from_dict(dataset_dict)

Loads data files from dataset_dict, which is one dataset dictionary from the list in the config file with the fields in the config file. Populates node_table and interactome.

node_table is a single merged pandas table.

When loading data files, files of only a single column with node identifiers are assumed to be a binary feature where all listed nodes are True.

We might want to eventually add an additional “algs” argument so only subsets of the entire config file are loaded; alternatively, this could be handled outside this class.

returns: none
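The single-column convention above can be illustrated with a small pandas sketch (the feature name and file contents are hypothetical):

```python
from io import StringIO
import pandas as pd

# A one-column node file: every listed node is True for this binary feature
sources_file = StringIO("NODEID\nA\nB\n")
feature = pd.read_csv(sources_file, sep="\t")
feature["sources"] = True  # 'sources' is a hypothetical feature name

# Merge into the node table; unlisted nodes become False
node_table = pd.DataFrame({"NODEID": ["A", "B", "C"]})
node_table = node_table.merge(feature, on="NODEID", how="left")
node_table["sources"] = node_table["sources"].fillna(False)
print(node_table)
```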

to_file(file_name: str)

Saves dataset object to pickle file

warning_threshold = 0.05

spras.domino module

class spras.domino.DOMINO

Bases: PRM[DominoParams]

dois: list[str] = ['10.15252/msb.20209593']
static generate_inputs(data, filename_map)

Access fields from the dataset and write the required input files
@param data: dataset
@param filename_map: a dict mapping file types in the required_inputs to the filename for that type

Associated files will be written with:
- network: list of edges
- active_genes: list of active genes

static parse_output(raw_pathway_file, standardized_pathway_file, params)

Convert the merged HTML modules into the universal pathway format
@param raw_pathway_file: the merged HTML modules file
@param standardized_pathway_file: the edges from the modules written in the universal format

required_inputs: list[str] = ['network', 'active_genes']
static run(inputs, output_file, args=None, container_settings=None)

Runs an algorithm with the specified inputs, algorithm params (T), the designated output_file, and the desired container_settings.

See the algorithm-specific generate_inputs and parse_output for information about the input and output format.

class spras.domino.DominoParams(*, module_threshold: float | None = None, slice_threshold: float | None = None)

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

module_threshold: float | None

the p-value threshold for considering a putative module as a final module (optional)

slice_threshold: float | None

the p-value threshold for considering a slice as relevant (optional)

spras.domino.post_domino_id_transform(node_id)

Remove ID_PREFIX from the beginning of the node id if it is present.
@param node_id: the node id to transform
@return: the node id without the prefix if it was present, otherwise the original node id

spras.domino.pre_domino_id_transform(node_id)

DOMINO requires module edges to have the ‘ENSG0’ string as a prefix for visualization. Prepend each node id with this ID_PREFIX.
@param node_id: the node id to transform
@return: the node id with the prefix added
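Taken together, the two transforms form a reversible prefix wrapper. A minimal sketch of that behavior:

```python
ID_PREFIX = "ENSG0"  # the prefix DOMINO expects, per the docstrings above

def pre_domino(node_id: str) -> str:
    # Prepend the prefix before handing nodes to DOMINO
    return ID_PREFIX + node_id

def post_domino(node_id: str) -> str:
    # Strip the prefix only when present; otherwise return the id unchanged
    return node_id[len(ID_PREFIX):] if node_id.startswith(ID_PREFIX) else node_id

assert post_domino(pre_domino("TP53")) == "TP53"
assert post_domino("TP53") == "TP53"  # ids without the prefix pass through
```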

spras.evaluation module

class spras.evaluation.Evaluation(gold_standard_dict: GoldStandardDict)

Bases: object

NODE_ID = 'NODEID'
datasets: list[str]
directed_edge_table: DataFrame
static edge_dummy_function(mixed_edge_table: DataFrame, undirected_edge_table: DataFrame, directed_edge_table: DataFrame, dummy_file: str)

Temporary function to test edge file implementation. Will be removed from SPRAS’s evaluation code in the future.

Takes in the different edge table versions (mixed, fully directed, fully undirected) for a specific edge gold standard dataset and writes them to a file.

@param mixed_edge_table: Edge gold standard treated as mixed directionality.
@param undirected_edge_table: Edge gold standard treated as fully undirected.
@param directed_edge_table: Edge gold standard treated as fully directed.
@param dummy_file: Filename to save the edge tables.

static edge_frequency_node_ensemble(node_table: DataFrame, ensemble_files: Iterable[str | PathLike], dataset_file: str) dict

Generates a dictionary of node ensembles using edge frequency data from a list of ensemble files. A list of ensemble files can contain an aggregated ensemble or algorithm-specific ensembles per dataset.

1. Prepare a set of default nodes (from the interactome and gold standard) with frequency 0, ensuring all nodes are represented in the ensemble.

   • Answers “Did the algorithm(s) select the correct nodes from the entire network?”

   • It measures whether the algorithm(s) can distinguish relevant gold standard nodes from the full “universe” of possible nodes present in the input network.

2. For each edge ensemble file:

   1. Read edges and their frequencies.

   2. Convert edge frequencies into node-level frequencies for Node1 and Node2.

   3. Merge with the default node set and group by node, taking the maximum frequency per node.

3. Store the resulting node-frequency ensemble under the corresponding ensemble source (label).

If the interactome or gold standard table is empty, a ValueError is raised.

@param node_table: DataFrame of gold standard nodes (column: NODEID)
@param ensemble_files: list of file paths containing edge ensemble outputs
@param dataset_file: path to the dataset file used to load the interactome
@return: dictionary mapping each ensemble source to its node ensemble DataFrame
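The edge-to-node frequency conversion in steps 1-3 can be sketched with pandas (column names mirror the docstring; the values and the exact implementation details are illustrative):

```python
import pandas as pd

# Hypothetical edge ensemble: per-edge selection frequencies
edges = pd.DataFrame({"Node1": ["A", "A"], "Node2": ["B", "C"],
                      "Frequency": [0.8, 0.5]})

# Default nodes (e.g. the whole interactome) start at frequency 0
defaults = pd.DataFrame({"Node": ["A", "B", "C", "D"], "Frequency": 0.0})

# Spread each edge's frequency onto both endpoints, then keep the max per node
node_freqs = pd.concat([
    edges[["Node1", "Frequency"]].rename(columns={"Node1": "Node"}),
    edges[["Node2", "Frequency"]].rename(columns={"Node2": "Node"}),
    defaults,
])
ensemble = node_freqs.groupby("Node", as_index=False)["Frequency"].max()
print(ensemble)  # A: 0.8, B: 0.8, C: 0.5, D: 0.0
```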

static from_file(file_name: str | PathLike)

Loads gold standard object from a pickle file. Usage: gold_standard = Evaluation.from_file(pickle_file)

label: str
static merge_gold_standard_input(gs_dict: GoldStandardDict, gs_file: str | PathLike)

Merge files listed for this gold standard dataset and write the dataset to disk
@param gs_dict: gold standard dataset to process
@param gs_file: output filename

mixed_edge_table: DataFrame
static node_precision_and_recall(file_paths: Iterable[str | PathLike], node_table: DataFrame) DataFrame

Computes node-level precision and recall for each pathway reconstruction output file.

This function takes a list of file paths corresponding to pathway reconstruction algorithm outputs, each formatted as a tab-separated file with columns ‘Node1’, ‘Node2’, ‘Rank’, and ‘Direction’. It compares the set of predicted nodes (from both columns Node1 and Node2) to a provided gold standard node table and computes precision and recall per file.

@param file_paths: list of file paths of pathway reconstruction algorithm outputs
@param node_table: the gold standard nodes
@return: A DataFrame with the following columns:

  • ‘Pathway’: Path object corresponding to each pathway file

  • ‘Precision’: Precision of predicted nodes vs. gold standard nodes

  • ‘Recall’: Recall of predicted nodes vs. gold standard nodes
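The underlying computation is ordinary set precision and recall over node identifiers, roughly:

```python
# Predicted nodes: the union of the Node1 and Node2 columns of a pathway file
predicted = {"A", "B", "C"}
# Gold standard nodes from node_table
gold = {"B", "C", "D"}

true_positives = len(predicted & gold)
precision = true_positives / len(predicted) if predicted else 0.0
recall = true_positives / len(gold) if gold else 0.0
assert (precision, recall) == (2 / 3, 2 / 3)
```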

node_table: DataFrame
static pca_chosen_pathway(coordinates_files: Iterable[str | PathLike], pathway_summary_file: str, output_dir: str)

Identifies the pathway closest to a specified highest kernel density estimate (KDE) peak based on PCA coordinates. Calculates the Euclidean distance from each data point to the KDE peak, then selects the closest pathway as the representative pathway. If there is more than one representative pathway, a tiebreaker is used:

  1. choose the smallest pathway (smallest number of edges and nodes)

  2. as a last resort, choose the first one based on name

Returns a list of file paths for the representative pathway associated with the closest data point to the KDE peak.

@param coordinates_files: a list of PCA coordinates files for a dataset or a specific algorithm in a dataset
@param pathway_summary_file: a file of network statistics for each pathway in the dataset
@param output_dir: the main reconstruction directory

static precision_and_recall_pca_chosen_pathway(pr_df: DataFrame, output_file: str | PathLike, output_png: str | PathLike, aggregate_per_algorithm: bool = False)

Function for visualizing the precision and recall of the single parameter combination selected via PCA, either for each algorithm individually or one combination shared across all algorithms. Each point represents a pathway reconstruction corresponding to the PCA-selected parameter combination. If aggregate_per_algorithm is True, the plot includes one PCA-chosen pathway per algorithm and is titled accordingly.

@param pr_df: Dataframe of calculated precision and recall for each pathway file
@param output_file: the filename to save the precision and recall of each pathway
@param output_png: the filename to plot the precision and recall of each pathway (not a PRC)
@param aggregate_per_algorithm: Boolean indicating if the function is used per algorithm (default False)

static precision_and_recall_per_pathway(pr_df: DataFrame, output_file: str | PathLike, output_png: str | PathLike, aggregate_per_algorithm: bool = False)

Function for visualizing per pathway precision and recall across all algorithms. Each point in the plot represents a single pathway reconstruction. If aggregate_per_algorithm is set to True, the plot is restricted to a single algorithm and titled accordingly.

@param pr_df: Dataframe of calculated precision and recall for each pathway file
@param output_file: the filename to save the precision and recall of each pathway
@param output_png: the filename to plot the precision and recall of each pathway (not a PRC)
@param aggregate_per_algorithm: Boolean indicating if the function is used per algorithm (default False)

static precision_recall_curve_node_ensemble(node_ensembles: dict, node_table: DataFrame, output_png: str | PathLike, output_file: str | PathLike, aggregate_per_algorithm: bool = False)

Plots precision-recall (PR) curves for a set of node ensembles evaluated against a gold standard.

Takes in a dictionary containing either algorithm-specific node ensembles or an aggregated node ensemble for a given dataset, along with the corresponding gold standard node table. Computes PR curves for each ensemble and plots all curves on a single figure.

@param node_ensembles: dict of the pre-computed node ensemble(s)
@param node_table: gold standard nodes
@param output_png: filename to save the precision and recall curves as a .png image
@param output_file: filename to save the precision, recall, threshold values, average precision, and baseline average precision
@param aggregate_per_algorithm: Boolean indicating if the function is used per algorithm (default False)

to_file(file_name: str | PathLike)

Saves gold standard object to pickle file

undirected_edge_table: DataFrame
static visualize_precision_and_recall_plot(pr_df: DataFrame, output_file: str | PathLike, output_png: str | PathLike, title: str)

Generates a scatter plot of precision and recall values for each pathway and saves both the plot and the data.

This function is intended for visualizing how different pathway reconstructions perform (it is not a precision-recall curve), showing the precision and recall of each parameter combination for each algorithm.

@param pr_df: Dataframe of calculated precision and recall for each pathway file. Must include a preprocessed ‘Algorithm’ column.
@param output_file: the filename to save the precision and recall of each pathway
@param output_png: the filename to plot the precision and recall of each pathway (not a PRC)
@param title: the title to use for the plot

class spras.evaluation.GoldStandardDict

Bases: TypedDict

data_dir: str
dataset_labels: list[str]
edge_files: list[str]
label: str
node_files: list[str]

spras.interactome module

Author: Neha Talluri 07/19/23

Methods for converting from the universal network input format and to the universal network output format

spras.interactome.add_constant(df: DataFrame, new_col_name: str, const) DataFrame

adds a new column at the end of the input dataframe with a constant value in all rows

@param df: input network df of edges, weights, and directionality
@param new_col_name: the name of the new column
@param const: some type of constant needed in the df
@return: a df with a new constant added to every row

spras.interactome.add_directionality_constant(df: DataFrame, col_name: str, dir_const, undir_const) DataFrame

adds directionality constants for mixed graphs that aren’t using the universal input directly

@param df: input network df of edges, weights, and directionality
@param col_name: the name of the new column
@param dir_const: the directed edge constant
@param undir_const: the undirected edge constant
@return: a df converted to show directionality differently

spras.interactome.convert_directed_to_undirected(df: DataFrame) DataFrame

turns a graph into a fully undirected graph
- turns all the directed edges directly into undirected edges
- we will lose any sense of directionality and the graph won’t be inherently accurate, but the basic relationship between the two connected nodes will still remain intact

@param df: input network df of edges, weights, and directionality
@return: a dataframe with no directed edges in the Direction column

spras.interactome.convert_undirected_to_directed(df: DataFrame) DataFrame

turns a graph into a fully directed graph
- turns every undirected edge into a pair of directed edges
- with the pair of directed edges, we are not losing too much information because the relationship of the undirected edge is still preserved

@param df: input network df of edges, weights, and directionality
@return: a dataframe with no undirected edges in the Direction column
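The expansion of each undirected edge into a directed pair can be sketched with pandas (column names follow the universal format described above; this is an illustration, not the SPRAS source):

```python
import pandas as pd

df = pd.DataFrame({"Interactor1": ["A", "B"], "Interactor2": ["B", "C"],
                   "Direction": ["U", "D"]})

# Duplicate every undirected edge with its endpoints swapped...
undirected = df[df["Direction"] == "U"]
flipped = undirected.rename(columns={"Interactor1": "Interactor2",
                                     "Interactor2": "Interactor1"})
# ...then mark every edge as directed
result = pd.concat([df, flipped], ignore_index=True)
result["Direction"] = "D"
print(result)  # A->B, B->C, and the new B->A
```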

spras.interactome.has_direction(df: DataFrame) bool

Checks if a graph has any directed edge.

spras.interactome.reinsert_direction_col_directed(df: DataFrame) DataFrame

adds back a ‘Direction’ column that puts a column of ‘D’s at the end of the provided dataframe

@param df: input network df that contains a directionality column
@return: a df with a Direction column of ‘D’s added back

spras.interactome.reinsert_direction_col_mixed(df: DataFrame, existing_direction_column: str, dir_const: str, undir_const: str) DataFrame

adds back a ‘Direction’ column that puts a ‘U’ or ‘D’ at the end of the provided dataframe based on the dir/undir constants in the existing direction column

@param df: input network df that contains a directionality column
@param existing_direction_column: the name of the existing directionality column
@param dir_const: the directed edge constant
@param undir_const: the undirected edge constant
@return: a df with the universal Direction column added back

spras.interactome.reinsert_direction_col_undirected(df: DataFrame) DataFrame

adds back a ‘Direction’ column that puts a column of ‘U’s at the end of the provided dataframe

@param df: input network df that contains a directionality column
@return: a df with a Direction column of ‘U’s added back

spras.interactome.sort_and_deduplicate_undirected(df: DataFrame) DataFrame

Sorts and removes duplicated undirected edges; directed edges are left unchanged.

For each undirected edge, the nodes are sorted so that Interactor1 is always lexicographically (or numerically) less than Interactor2. Duplicate undirected edges are then removed.

@param df: input network df of edges, weights, and directionality
@return: a dataframe with sorted and deduplicated undirected edges and unchanged directed edges
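The canonicalization step can be sketched as follows: order the endpoints of each undirected edge, then drop exact duplicates (an illustration under the universal column names, not the SPRAS source):

```python
import pandas as pd

df = pd.DataFrame({"Interactor1": ["B", "A", "C"],
                   "Interactor2": ["A", "B", "D"],
                   "Direction": ["U", "U", "D"]})

# Canonicalize undirected edges so Interactor1 <= Interactor2
for idx in df.index[df["Direction"] == "U"]:
    a, b = df.at[idx, "Interactor1"], df.at[idx, "Interactor2"]
    if a > b:
        df.at[idx, "Interactor1"], df.at[idx, "Interactor2"] = b, a

# (B, A) and (A, B) are now identical rows, so one copy is removed
df = df.drop_duplicates(ignore_index=True)
print(df)  # (A, B, U) and (C, D, D)
```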

spras.logging module

Utility functions for logging.

spras.logging.indent(string: str, space_count: int = 4)
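The signature suggests a thin wrapper over stdlib text indenting; a plausible sketch (the actual implementation may differ):

```python
import textwrap

def indent(string: str, space_count: int = 4) -> str:
    # Prefix every line of the string with space_count spaces
    return textwrap.indent(string, " " * space_count)

print(indent("line one\nline two"))
```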

spras.meo module

class spras.meo.MEO

Bases: PRM[MEOParams]

dois: list[str] = ['10.1093/nar/gkq1207']
static generate_inputs(data, filename_map)

Access fields from the dataset and write the required input files
@param data: dataset
@param filename_map: a dict mapping file types in the required_inputs to the filename for that type

Associated files will be written with:
- sources: list of sources
- targets: list of targets
- edges: list of edges

static parse_output(raw_pathway_file, standardized_pathway_file, params)

Convert a predicted pathway into the universal format
@param raw_pathway_file: pathway file produced by an algorithm’s run function
@param standardized_pathway_file: the same pathway written in the universal format

required_inputs: list[str] = ['sources', 'targets', 'edges']
static run(inputs, output_file=None, args=None, container_settings=None)

Run Maximum Edge Orientation in the Docker image with the provided parameters. The properties file is generated from the provided arguments. Only supports the Random orientation algorithm. Does not support MINSAT or MAXCSP. Only the edge output file is retained. All other output files are deleted.

class spras.meo.MEOParams(*, max_path_length: int | None = None, local_search: bool | None = None, rand_restarts: int | None = None)

Bases: BaseModel

local_search: bool | None

a boolean parameter that enables MEO’s local search functionality. See “Improving approximations with local search” in the associated paper for more information.

max_path_length: int | None

the maximal length of a path from sources to targets to orient.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

rand_restarts: int | None

The number of random restarts to use.

spras.meo.write_properties(filename=PosixPath('properties.txt'), edges=None, sources=None, targets=None, edge_output=None, path_output=None, max_path_length=None, local_search=None, rand_restarts=None, framework='docker')

Write the properties file for Maximum Edge Orientation.
See https://github.com/agitter/meo/blob/master/sample.props for property descriptions and the default values at https://github.com/agitter/meo/blob/master/src/alg/EOMain.java#L185-L199.
All file and directory names, except the filename argument, should be converted to container-friendly filenames with util.prepare_volume before passing them to this function.

filename: the name of the properties file to write on the local file system

spras.mincostflow module

class spras.mincostflow.MinCostFlow

Bases: PRM[MinCostFlowParams]

dois: list[str] = ['10.1038/ng.337']
static generate_inputs(data, filename_map)

Access fields from the dataset and write the required input files
@param data: dataset
@param filename_map: a dict mapping file types in the required_inputs to the filename for that type

Associated files will be written with:
- sources: list of sources
- targets: list of targets
- edges: list of edges

static parse_output(raw_pathway_file, standardized_pathway_file, params)

Convert a predicted pathway into the universal format

Although the algorithm constructs a directed network, the resulting network is treated as undirected. This is because the flow within the network doesn’t imply causal relationships between nodes. The primary goal of the algorithm is node identification, not the identification of directional edges.

@param raw_pathway_file: pathway file produced by an algorithm’s run function
@param standardized_pathway_file: the same pathway written in the universal format

required_inputs: list[str] = ['sources', 'targets', 'edges']
static run(inputs, output_file, args=None, container_settings=None)

Runs an algorithm with the specified inputs, algorithm params (T), the designated output_file, and the desired container_settings.

See the algorithm-specific generate_inputs and parse_output for information about the input and output format.

class spras.mincostflow.MinCostFlowParams(*, flow: int | None = None, capacity: int | None = None)

Bases: BaseModel

capacity: int | None

amount of capacity allowed on each edge

flow: int | None

amount of flow going through the graph

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

spras.omicsintegrator1 module

class spras.omicsintegrator1.DummyMode(value)

Bases: CaseInsensitiveEnum

all = 'all'

connect the dummy node to all nodes in the interactome, i.e., the full set of nodes in the graph

file = 'file'

connect the dummy node to a specific list of nodes provided in a file

others = 'others'

connect the dummy node to all nodes that are not terminal nodes, i.e., nodes without prizes

terminals = 'terminals'

connect the dummy node to all nodes that have been assigned prizes

class spras.omicsintegrator1.OmicsIntegrator1

Bases: PRM[OmicsIntegrator1Params]

Omics Integrator 1 works with partially directed graphs - it takes in the universal input directly

Expected raw input format: Interactor1 Interactor2 Weight Direction
- the expected raw input file should have node pairs in the 1st and 2nd columns, with a weight in the 3rd column and directionality in the 4th column
- it can include repeated and bidirectional edges
- it uses ‘U’ for undirected edges and ‘D’ for directed edges

dois: list[str] = ['10.1371/journal.pcbi.1004879']
static generate_inputs(data, filename_map)

Access fields from the dataset and write the required input files
@param data: dataset
@param filename_map: a dict mapping file types in the required_inputs to the filename for that type

Associated files will be written with:
- prizes: list of nodes associated with their prize
- edges: list of edges associated with their weight and directionality
- dummy_nodes: list of dummy nodes

static parse_output(raw_pathway_file, standardized_pathway_file, params)

Convert a predicted pathway into the universal format
@param raw_pathway_file: pathway file produced by an algorithm’s run function
@param standardized_pathway_file: the same pathway written in the universal format

required_inputs: list[str] = ['prizes', 'edges', 'dummy_nodes']
static run(inputs, output_file, args, container_settings=None)

Runs an algorithm with the specified inputs, algorithm params (T), the designated output_file, and the desired container_settings.

See the algorithm-specific generate_inputs and parse_output for information about the input and output format.

class spras.omicsintegrator1.OmicsIntegrator1Params(*, dummy_mode: DummyMode | None = None, mu_squared: bool = False, exclude_terms: bool = False, noisy_edges: int = 0, shuffled_prizes: int = 0, random_terminals: int = 0, seed: int | None = None, w: float, b: float, d: int, mu: float = 0.0, noise: float | None = None, g: float = 0.001, r: float = 0)

Bases: BaseModel

b: float

The trade-off between including more prizes and using less reliable edges

d: int

Controls the maximum path-length from root to terminal nodes

dummy_mode: DummyMode | None
exclude_terms: bool
g: float

(gamma) msgsteiner reinforcement parameter that affects the convergence of the solution and runtime, with larger values leading to faster convergence but potentially suboptimal results.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

mu: float

Controls the degree-based negative prizes (default 0.0)

mu_squared: bool
noise: float | None

Standard deviation of the Gaussian noise added to edges in Noisy Edges randomizations

noisy_edges: int

How many times you would like to add noise to the given edge values and re-run the algorithm.

r: float

msgsteiner parameter that adds random noise to edges, which is rarely needed.

random_terminals: int

How many times to apply the given prizes to random nodes in the interactome

seed: int | None

The randomness seed to use.

shuffled_prizes: int

How many times the algorithm should shuffle the prizes and re-run

w: float

Float that affects the number of connected components, with higher values leading to more components

spras.omicsintegrator1.write_conf(filename=PosixPath('config.txt'), w=None, b=None, d=None, mu=None, noise=None, g=None, r=None)

Write the configuration file for Omics Integrator 1 See https://github.com/fraenkel-lab/OmicsIntegrator#required-inputs filename: the name of the configuration file to write

spras.omicsintegrator2 module

class spras.omicsintegrator2.DummyMode(value)

Bases: CaseInsensitiveEnum

all = 'all'
others = 'others'
terminals = 'terminals'
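A sketch of the case-insensitive lookup that CaseInsensitiveEnum is assumed to provide, so that configuration values like “Terminals” resolve to the same member as “terminals” (the _missing_ hook here is our illustration, not SPRAS code):

```python
from enum import Enum

class CaseInsensitiveEnum(str, Enum):
    @classmethod
    def _missing_(cls, value):
        # Fall back to a lowercase comparison when the exact value
        # is not found among the members.
        if isinstance(value, str):
            for member in cls:
                if member.value == value.lower():
                    return member
        return None

class DummyMode(CaseInsensitiveEnum):
    all = "all"
    others = "others"
    terminals = "terminals"
```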
class spras.omicsintegrator2.OmicsIntegrator2

Bases: PRM[OmicsIntegrator2Params]

dois: list[str] = ['10.1371/journal.pcbi.1004879']
static generate_inputs(data: Dataset, filename_map)

Access fields from the dataset and write the required input files @param data: dataset @param filename_map: a dict mapping file types in the required_inputs to the filename for that type. Associated files will be written with: - prizes: list of nodes associated with their prize - edges: list of edges associated with their cost (transformed from the original Dataset weights)

static parse_output(raw_pathway_file, standardized_pathway_file, params)

Convert a predicted pathway into the universal format @param raw_pathway_file: pathway file produced by an algorithm’s run function @param standardized_pathway_file: the same pathway written in the universal format

required_inputs: list[str] = ['prizes', 'edges']
static run(inputs, output_file, args=None, container_settings=None)

Runs an algorithm with the specified inputs, algorithm params (T), the designated output_file, and the desired container_settings.

See the algorithm-specific generate_inputs and parse_output for information about the input and output format.

class spras.omicsintegrator2.OmicsIntegrator2Params(*, w: float = 5, b: float = 1, g: float = 3, noise: float | None = None, noisy_edges: int | None = None, random_terminals: int | None = None, dummy_mode: DummyMode | None = None, seed: int | None = None)

Bases: BaseModel

b: float

Beta: scaling factor of prizes

dummy_mode: DummyMode | None

Tells the program which nodes in the interactome to connect the dummy node to (default: terminals). “terminals” = connect to all terminals; “others” = connect to all nodes except for terminals; “all” = connect to all nodes in the interactome.

g: float

Gamma: multiplicative edge penalty from degree of endpoints

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

noise: float | None

Standard deviation of the Gaussian noise added to edges in Noisy Edges randomizations.

noisy_edges: int | None

An integer specifying how many times to add noise to the given edge values and re-run.

random_terminals: int | None

An integer specifying how many times to apply your given prizes to random nodes in the interactome and re-run

seed: int | None

The random seed to use for this run.

w: float

Omega: the weight of the edges connecting the dummy node to the nodes selected by dummyMode

spras.pathlinker module

class spras.pathlinker.PathLinker

Bases: PRM[PathLinkerParams]

dois: list[str] = ['10.1038/npjsba.2016.2', '10.1089/cmb.2012.0274']
static generate_inputs(data, filename_map)

Access fields from the dataset and write the required input files @param data: dataset @param filename_map: a dict mapping file types in the required_inputs to the filename for that type. Associated files will be written with: - nodetypes: list of nodes tagged with whether they are a source or a target - network: list of edges

static parse_output(raw_pathway_file, standardized_pathway_file, params)

Convert a predicted pathway into the universal format @param raw_pathway_file: pathway file produced by an algorithm’s run function @param standardized_pathway_file: the same pathway written in the universal format

required_inputs: list[str] = ['nodetypes', 'network']
static run(inputs, output_file, args=None, container_settings=None)

Runs an algorithm with the specified inputs, algorithm params (T), the designated output_file, and the desired container_settings.

See the algorithm-specific generate_inputs and parse_output for information about the input and output format.

class spras.pathlinker.PathLinkerParams(*, k: int = 100)

Bases: BaseModel

k: int

Number of paths

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

spras.prm module

class spras.prm.PRM

Bases: ABC, Generic[T]

The PRM (Pathway Reconstruction Module) class, which defines the interface that runner.py uses to handle algorithms.

dois: list[str] = None
abstractmethod static generate_inputs(data: Dataset, filename_map: dict[str, str])

Access fields from the dataset and write the required input files @param data: dataset @param filename_map: a dict mapping file types in the required_inputs to the filename for that type

classmethod get_params_generic() → type[T]

Gets the class instance of the parameter type passed in as T, allowing us to use the underlying pydantic model associated with it.

For example, on class PathLinker(PRM[PathLinkerParams]), calling PathLinker.get_params_generic() returns PathLinkerParams.
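The recovery of T from PRM[T] can be done with typing introspection. This is an illustrative sketch of the technique, not SPRAS's actual implementation:

```python
from typing import Generic, TypeVar, get_args, get_origin

T = TypeVar("T")

class PRM(Generic[T]):
    @classmethod
    def get_params_generic(cls) -> type:
        # Walk the original (unerased) bases to find PRM[<SomeParams>]
        # and pull the parameter type out of it.
        for base in getattr(cls, "__orig_bases__", ()):
            if get_origin(base) is PRM:
                return get_args(base)[0]
        raise TypeError(f"{cls.__name__} does not parameterize PRM")

class PathLinkerParams:  # stand-in for the real pydantic model
    pass

class PathLinker(PRM[PathLinkerParams]):
    pass
```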

abstractmethod static parse_output(raw_pathway_file: str, standardized_pathway_file: str, params: dict[str, Any])
required_inputs: list[str] = []
abstractmethod static run(inputs: dict[str, str | PathLike], output_file: str | PathLike, args: T, container_settings: ProcessedContainerSettings)

Runs an algorithm with the specified inputs, algorithm params (T), the designated output_file, and the desired container_settings.

See the algorithm-specific generate_inputs and parse_output for information about the input and output format.

classmethod run_typeless(inputs: dict[str, str | PathLike], output_file: str | PathLike, args: dict[str, Any], container_settings: ProcessedContainerSettings)

This is similar to PRM.run, but it performs pydantic validation internally to re-validate the argument parameters.

classmethod validate_required_inputs(filename_map: dict[str, str])
classmethod validate_required_run_args(inputs: dict[str, str | PathLike], relax: list[str] | None = None)

Validates the inputs parameter for PRM#run.

@param inputs: See PRM#run. @param relax: list of inputs that aren’t required; if they are specified, they should be valid paths

spras.responsenet module

class spras.responsenet.ResponseNet

Bases: PRM[ResponseNetParams]

dois: list[str] = ['10.1038/ng.337']
static generate_inputs(data, filename_map)

Access fields from the dataset and write the required input files @param data: dataset @param filename_map: a dict mapping file types in the required_inputs to the filename for that type. Associated files will be written with: - sources: list of sources - targets: list of targets - edges: list of edges

static parse_output(raw_pathway_file, standardized_pathway_file, params)

Convert a predicted pathway into the universal format

Although the algorithm constructs a directed network, the resulting network is treated as undirected. This is because the flow within the network doesn’t imply causal relationships between nodes. The primary goal of the algorithm is node identification, not the identification of directional edges. See “Directionality of ResponseNet output” in the supplement of “Bridging high-throughput genetic and transcriptional data reveals cellular responses to alpha-synuclein toxicity” (https://www.nature.com/articles/ng.337)

@param raw_pathway_file: pathway file produced by an algorithm’s run function @param standardized_pathway_file: the same pathway written in the universal format

required_inputs: list[str] = ['sources', 'targets', 'edges']
static run(inputs, output_file, args=None, container_settings=None)

Runs an algorithm with the specified inputs, algorithm params (T), the designated output_file, and the desired container_settings.

See the algorithm-specific generate_inputs and parse_output for information about the input and output format.

class spras.responsenet.ResponseNetParams(*, gamma: int = 10)

Bases: BaseModel

gamma: int

The ‘size’ of the graph. The higher gamma is, the more flow is encouraged to start from the source nodes.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

spras.runner module

spras.runner.get_algorithm(algorithm: str) → type[PRM]
spras.runner.get_required_inputs(algorithm: str)

Get the input files required to run this algorithm @param algorithm: algorithm name @return: a list of strings of input file types

spras.runner.merge_input(dataset_dict, dataset_file: str)

Merge files listed for this dataset and write the dataset to disk @param dataset_dict: dataset to process @param dataset_file: output filename

spras.runner.parse_output(algorithm: str, raw_pathway_file: str, standardized_pathway_file: str, params: dict[str, Any])

Convert a predicted pathway into the universal format @param algorithm: algorithm name @param raw_pathway_file: pathway file produced by an algorithm’s run function @param standardized_pathway_file: the same pathway written in the universal format

spras.runner.prepare_inputs(algorithm: str, data_file: str, filename_map: dict[str, str])

Prepare general dataset files for this algorithm @param algorithm: algorithm name @param data_file: dataset @param filename_map: a dict mapping file types in the required_inputs to the filename for that type

spras.runner.run(algorithm: str, inputs, output_file, args, container_settings)

A generic interface to the algorithm-specific run functions

spras.rwr module

class spras.rwr.RWR

Bases: PRM[RWRParams]

dois: list[str] = []
static generate_inputs(data, filename_map)

Access fields from the dataset and write the required input files @param data: dataset @param filename_map: a dict mapping file types in the required_inputs to the filename for that type. Associated files will be written with: - nodes: list of active nodes - network: list of edges

static parse_output(raw_pathway_file, standardized_pathway_file, params)
required_inputs: list[str] = ['network', 'nodes']
static run(inputs, output_file, args, container_settings=None)

Runs an algorithm with the specified inputs, algorithm params (T), the designated output_file, and the desired container_settings.

See the algorithm-specific generate_inputs and parse_output for information about the input and output format.

class spras.rwr.RWRParams(*, threshold: int, alpha: float | None = None)

Bases: BaseModel

alpha: float | None

The chance of a restart during the random walk

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

threshold: int

The number of nodes to return
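A tiny pure-Python sketch of random walk with restart, the process alpha governs: at each step the walker restarts at a seed node with probability alpha, otherwise it follows a random outgoing edge. Illustrative only; SPRAS runs RWR inside a container.

```python
def rwr(neighbors, seeds, alpha=0.15, iters=100):
    """neighbors: dict node -> list of out-neighbors; seeds: set of seed nodes."""
    nodes = list(neighbors)
    # Start (and restart) distribution is uniform over the seed nodes.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(restart)
    for _ in range(iters):
        # With probability alpha, teleport back to a seed.
        nxt = {n: alpha * restart[n] for n in nodes}
        # Otherwise, distribute each node's mass evenly over its out-edges.
        for n in nodes:
            outs = neighbors[n]
            if outs:
                share = (1.0 - alpha) * p[n] / len(outs)
                for m in outs:
                    nxt[m] += share
        p = nxt
    return p
```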

spras.strwr module

class spras.strwr.ST_RWR

Bases: PRM[ST_RWRParams]

dois: list[str] = []
static generate_inputs(data, filename_map)

Access fields from the dataset and write the required input files @param data: dataset @param filename_map: a dict mapping file types in the required_inputs to the filename for that type. Associated files will be written with: - sources: list of sources - targets: list of targets - network: list of edges

static parse_output(raw_pathway_file, standardized_pathway_file, params)
required_inputs: list[str] = ['network', 'sources', 'targets']
static run(inputs, output_file, args, container_settings=None)

Runs an algorithm with the specified inputs, algorithm params (T), the designated output_file, and the desired container_settings.

See the algorithm-specific generate_inputs and parse_output for information about the input and output format.

class spras.strwr.ST_RWRParams(*, threshold: int, alpha: float | None = None)

Bases: BaseModel

alpha: float | None

The chance of a restart during the random walk

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

threshold: int

The number of nodes to return

spras.util module

Utility functions for pathway reconstruction

class spras.util.NpHashEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: JSONEncoder

A NumPy-compatible JSON encoder meant to be passed as the cls for hashing; this encoder only encodes and does not decode the other way around.

default(obj)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return super().default(o)
spras.util.add_rank_column(df: DataFrame) → DataFrame

Add a column of 1s to the dataframe @param df: the dataframe to add the rank column of 1s to

spras.util.duplicate_edges(df: DataFrame) → tuple[DataFrame, bool]

Removes duplicate edges from the input DataFrame. Run within every pathway reconstruction algorithm’s parse_output. - For duplicate edges (based on Node1, Node2, and Direction), the one with the smallest Rank is kept. - For undirected edges, the node pair is sorted (e.g., “B-A” becomes “A-B”) before removing duplicates.

@param df: a DataFrame from a raw pathway file. @return pd.DataFrame: a DataFrame with duplicate edges removed. @return bool: True if duplicate edges were found and removed, False otherwise.
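The deduplication rules above can be sketched in pure Python over edge tuples (the real spras.util.duplicate_edges operates on a pandas DataFrame; this version only illustrates the logic):

```python
def dedup_edges(rows):
    """rows: list of (node1, node2, rank, direction) tuples, 'U' or 'D'."""
    best = {}
    for n1, n2, rank, direction in rows:
        if direction == "U":
            # Canonicalize undirected pairs so B-A and A-B collide.
            n1, n2 = sorted((n1, n2))
        key = (n1, n2, direction)
        # Keep the duplicate with the smallest Rank.
        if key not in best or rank < best[key][2]:
            best[key] = (n1, n2, rank, direction)
    had_duplicates = len(best) < len(rows)
    return list(best.values()), had_duplicates
```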

spras.util.hash_filename(filename: str | PathLike, length: int | None = None) → str

Hash of a filename using hash_params_sha1_base32 @param filename: filename to hash @param length: the length of the returned hash, which is ignored if it is None, < 1, or > the full hash length @return: hash

spras.util.hash_params_sha1_base32(params_dict: Dict[str, Any], length: int | None = None, cls=None) → str

Hash of a dictionary. Derived from https://www.doc.ic.ac.uk/~nuric/coding/how-to-hash-a-dictionary-in-python.html by Nuri Cingillioglu, adapted to use SHA-1 instead of MD5 and to encode in base32. Can be truncated to the desired length. @param params_dict: the algorithm parameters dictionary @param length: the length of the returned hash, which is ignored if it is None, < 1, or > the full hash length
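The scheme described above can be sketched with the standard library: serialize the dictionary with sorted keys, SHA-1 the bytes, and base32-encode the digest, truncating per the rule stated for length. Illustrative, not the exact SPRAS implementation.

```python
import base64
import hashlib
import json

def hash_params_sha1_base32_sketch(params_dict, length=None):
    # Sorted keys make the hash independent of dict insertion order.
    payload = json.dumps(params_dict, sort_keys=True).encode()
    digest = base64.b32encode(hashlib.sha1(payload).digest()).decode()
    # length is ignored if it is None, < 1, or exceeds the full hash length.
    if length is None or length < 1 or length > len(digest):
        return digest
    return digest[:length]
```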

spras.util.make_required_dirs(path: str | PathLike)

Create the directory and parent directories required before an output file can be written to the specified path. Existing directories will not raise an error. @param path: the filename that is to be written

spras.util.raw_pathway_df(raw_pathway_file: str, sep: str = '\t', header: int = None) → DataFrame

Creates a dataframe from the contents of a raw pathway file, or returns an empty dataframe with the standard output column names if the file has no contents @param raw_pathway_file: path to raw_pathway_file @param sep: separator used when loading the dataframe, default tab character @param header: the row containing the header in raw_pathway_file, default None

Module contents