CLI Reference

cooltools

Type -h or --help after any subcommand for more information.

cooltools [OPTIONS] COMMAND [ARGS]...

Options

-v, --verbose: Verbose logging

-d, --debug: Post mortem debugging

-V, --version: Show the version and exit.

coverage

Calculate the sums of cis and genome-wide contacts (aka coverage aka marginals) for a sparse Hi-C contact map in Cooler HDF5 format. Note that the sum(tot_cov) from this function is two times the number of reads contributing to the cooler, as each side contributes to the coverage.

COOL_PATH : The path to a .cool file with a balanced Hi-C map.

cooltools coverage [OPTIONS] COOL_PATH

Options

-o, --output <output>: Specify output file name to store the coverage in a tsv format.

--ignore-diags <ignore_diags>: The number of diagonals to ignore. By default, equals the number of diagonals ignored during IC balancing.

--store: Append columns with coverage (cov_cis_raw, cov_tot_raw), or (cov_cis_clr_weight_name, cov_tot_clr_weight_name) if calculating balanced coverage, to the cooler bin table. If clr_weight_name=None, also stores total cis counts in the cooler info

--chunksize <chunksize>

Split the contact matrix pixel records into equally sized chunks to save memory and/or parallelize. Default is 10^7

Default: 10000000.0

--bigwig: Also save output as bigWig files for cis and total coverage with the names <output>.<cis/tot>.bw

--clr_weight_name <clr_weight_name>: Name of the weight column. Specify to calculate coverage of balanced cooler.

-p, --nproc <nproc>: Number of processes to split the work between. [default: 1, i.e. no process pool]

Arguments

COOL_PATH: Required argument

dots

Call dots on a Hi-C heatmap that are not larger than max_loci_separation.

COOL_PATH : The path to a .cool file with a balanced Hi-C map.

EXPECTED_PATH : The path to a tsv-like file with expected signal, including a header. Use the ‘::’ syntax to specify a column name.

Analysis will be performed for chromosomes referred to in EXPECTED_PATH, and therefore these chromosomes must be a subset of chromosomes referred to in COOL_PATH. Also, chromosomes referred to in EXPECTED_PATH must be non-trivial, i.e., contain non-NaN signal. Thus, make sure to prune your EXPECTED_PATH before applying this script.

COOL_PATH and EXPECTED_PATH must be binned at the same resolution.

EXPECTED_PATH must contain at least the following columns for cis contacts: ‘region1/2’, ‘dist’, ‘n_valid’, value_name. value_name is controlled using options. Header must be present in a file.

cooltools dots [OPTIONS] COOL_PATH EXPECTED_PATH

Options

--view, --regions <view>: Path to a BED file with the definition of viewframe (regions) used in the calculation of EXPECTED_PATH. Dot-calling will be performed for these regions independently e.g. chromosome arms. Note that ‘–regions’ is the deprecated name of the option. Use ‘–view’ instead.

--clr-weight-name <clr_weight_name>

Use cooler balancing weight with this name.

Default: 'weight'

-p, --nproc <nproc>: Number of processes to split the work between. [default: 1, i.e. no process pool]

--max-loci-separation <max_loci_separation>

Limit loci separation for dot-calling, i.e., do not call dots for loci that are further than max_loci_separation base pairs apart. 2-20 Mb is reasonable and would capture most CTCF dots.

Default: 2000000

--max-nans-tolerated <max_nans_tolerated>

Maximum number of NaNs tolerated in the footprint of every used filter. Must be controlled with caution, as a large max-nans-tolerated might allow pixels scored in the padding area of the tiles to “penetrate” into the list of scored pixels for the statistical testing. [max-nans-tolerated <= 2*w]

Default: 1

--tile-size <tile_size>

Tile size for the Hi-C heatmap tiling. Typically on the order of several megabases, and <= max_loci_separation.

Default: 6000000

--num-lambda-bins <num_lambda_bins>

Number of log-spaced bins to divide your adjusted expected between. Same as HiCCUPS_W1_MAX_INDX (40) in the original HiCCUPS.

Default: 45

--fdr <fdr>

False discovery rate (FDR) to control in the multiple hypothesis testing BH-FDR procedure.

Default: 0.02

--clustering-radius <clustering_radius>

Radius for clustering dots that have been called too close to each other. Typically on the order of 40 kilobases, and >= binsize.

Default: 39000

-v, --verbose: Enable verbose output

-o, --output <output>: Required Specify output file name to store called dots in a BEDPE-like format

Arguments

COOL_PATH: Required argument

EXPECTED_PATH: Required argument

eigs-cis

Perform eigenvalue decomposition on a cooler matrix to calculate compartment signal by finding the eigenvector that correlates best with the phasing track.

COOL_PATH : The path to a .cool file with a balanced Hi-C map. Use the ‘::’ syntax to specify a group path in a multicooler file.

TRACK_PATH : The path to a BedGraph-like file that stores the phasing track in a column named track-name.

BedGraph-like format assumes tab-separated columns chrom, start, stop and track-name.

cooltools eigs-cis [OPTIONS] COOL_PATH

Options

--phasing-track <TRACK_PATH>: Phasing track for orienting and ranking eigenvectors, provided as /path/to/track::track_value_column_name.

--view, --regions <view>: Path to a BED file which defines which regions of the chromosomes to use (only implemented for cis contacts). Note that ‘--regions’ is the deprecated name of the option. Use ‘--view’ instead.

--n-eigs <n_eigs>

Number of eigenvectors to compute.

Default: 3

--clr-weight-name <clr_weight_name>

Use balancing weight with this name. Using raw unbalanced data is not currently supported for eigenvectors.

Default: 'weight'

--ignore-diags <ignore_diags>: The number of diagonals to ignore. By default, equals the number of diagonals ignored during IC balancing.

-v, --verbose: Enable verbose output

-o, --out-prefix <out_prefix>: Required Save compartment track as a BED-like file. Eigenvectors and corresponding eigenvalues are stored in out_prefix.contact_type.vecs.tsv and out_prefix.contact_type.lam.txt

--bigwig: Also save compartment track (E1) as a bigWig file with the name out_prefix.contact_type.bw

Arguments

COOL_PATH: Required argument

eigs-trans

Perform eigenvalue decomposition on a cooler matrix to calculate compartment signal by finding the eigenvector that correlates best with the phasing track.

COOL_PATH : The path to a .cool file with a balanced Hi-C map. Use the ‘::’ syntax to specify a group path in a multicooler file.

TRACK_PATH : The path to a BedGraph-like file that stores the phasing track in a column named track-name.

BedGraph-like format assumes tab-separated columns chrom, start, stop and track-name.

cooltools eigs-trans [OPTIONS] COOL_PATH

Options

--phasing-track <TRACK_PATH>: Phasing track for orienting and ranking eigenvectors, provided as /path/to/track::track_value_column_name.

--view, --regions <view>: Path to a BED file which defines which regions of the chromosomes to use (only implemented for cis contacts). Note that ‘--regions’ is the deprecated name of the option. Use ‘--view’ instead.

--n-eigs <n_eigs>

Number of eigenvectors to compute.

Default: 3

--clr-weight-name <clr_weight_name>

Use balancing weight with this name. Using raw unbalanced data is not supported for eigenvectors.

Default: 'weight'

-v, --verbose: Enable verbose output

-o, --out-prefix <out_prefix>: Required Save compartment track as a BED-like file. Eigenvectors and corresponding eigenvalues are stored in out_prefix.contact_type.vecs.tsv and out_prefix.contact_type.lam.txt

--bigwig: Also save compartment track (E1) as a bigWig file with the name out_prefix.contact_type.bw

Arguments

COOL_PATH: Required argument

expected-cis

Calculate expected Hi-C signal for cis regions of a chromosomal interaction map: the average of interactions separated by the same genomic distance, i.e. lying on the same diagonal of the cis heatmap.

When balancing weights are not applied to the data, no masking of bad bins is performed.

COOL_PATH : The path to a .cool file with a balanced Hi-C map.

cooltools expected-cis [OPTIONS] COOL_PATH

Options

-p, --nproc <nproc>: Number of processes to split the work between.[default: 1, i.e. no process pool]

-c, --chunksize <chunksize>

Control the number of pixels handled by each worker process at a time.

Default: 10000000

-o, --output <output>: Specify output file name to store the expected in a tsv format.

--view, --regions <view>: Path to a 3 or 4-column BED file with genomic regions to calculated cis-expected on. When region names are not provided (no 4th column), UCSC-style region names are generated. Cis-expected is calculated for all chromosomes, when this is not specified. Note that ‘–regions’ is the deprecated name of the option. Use ‘–view’ instead.

--smooth: If set, cis-expected is smoothed and result stored in an additional column e.g. balanced.avg.smoothed

--aggregate-smoothed: If set, cis-expected is averaged over all regions and then smoothed. Result is stored in an additional column, e.g. balanced.avg.smoothed.agg. Ignored without smoothing

--smooth-sigma <smooth_sigma>

Control smoothing with the standard deviation of the smoothing Gaussian kernel, ignored without smoothing.

Default: 0.1

--clr-weight-name <clr_weight_name>

Use balancing weight with this name stored in cooler. Provide empty argument to calculate cis-expected on raw data.

Default: 'weight'

--ignore-diags <ignore_diags>

Number of diagonals to neglect for cis contact type

Default: 2

Arguments

COOL_PATH: Required argument

expected-trans

Calculate expected Hi-C signal for trans regions of chromosomal interaction map: average of interactions in a rectangular block defined by a pair of regions, e.g. inter-chromosomal blocks.

When balancing weights are not applied to the data, there is no masking of bad bins performed.

COOL_PATH : The path to a .cool file with a balanced Hi-C map.

cooltools expected-trans [OPTIONS] COOL_PATH

Options

-p, --nproc <nproc>: Number of processes to split the work between.[default: 1, i.e. no process pool]

-c, --chunksize <chunksize>

Control the number of pixels handled by each worker process at a time.

Default: 10000000

-o, --output <output>: Specify output file name to store the expected in a tsv format.

--view, --regions <view>: Path to a 3 or 4-column BED file with genomic regions. Trans-expected is calculated on all pairwise combinations of these regions. When region names are not provided (no 4th column), UCSC-style region names are generated. Trans-expected is calculated for all inter-chromosomal pairs, when view is not specified. Note that ‘–regions’ is the deprecated name of the option. Use ‘–view’ instead.

--clr-weight-name <clr_weight_name>

Use balancing weight with this name stored in cooler. Provide empty argument to calculate trans-expected on raw data.

Default: 'weight'

Arguments

COOL_PATH: Required argument

genome

Utilities for binned genome assemblies.

cooltools genome [OPTIONS] COMMAND [ARGS]...

binnify

cooltools genome binnify [OPTIONS] CHROMSIZES_PATH BINSIZE

Options

--all-names: Parse all chromosome names from file, not only default r”^chr[0-9]+$”, r”^chr[XY]$”, r”^chrM$”.

Arguments

CHROMSIZES_PATH: Required argument

BINSIZE: Required argument

digest

cooltools genome digest [OPTIONS] CHROMSIZES_PATH FASTA_PATH ENZYME_NAME

Arguments

CHROMSIZES_PATH: Required argument

FASTA_PATH: Required argument

ENZYME_NAME: Required argument

fetch-chromsizes

cooltools genome fetch-chromsizes [OPTIONS] DB

Arguments

DB: Required argument

gc

cooltools genome gc [OPTIONS] BINS_PATH FASTA_PATH

Options

--mapped-only

Arguments

BINS_PATH: Required argument

FASTA_PATH: Required argument

genecov

BINS_PATH is the path to the bin table.

DB is the name of the genome assembly. The gene locations will be automatically downloaded from the UCSC goldenPath.

cooltools genome genecov [OPTIONS] BINS_PATH DB

Arguments

BINS_PATH: Required argument

DB: Required argument

insulation

Calculate the diamond insulation scores and call insulating boundaries.

IN_PATH : The path to a .cool file with a balanced Hi-C map.

WINDOW : The window size for the insulation score calculations. Multiple space-separated values can be provided. By default, the window size must be provided in units of bp. When the flag --window-pixels is set, the window sizes must be provided in units of pixels instead.

cooltools insulation [OPTIONS] IN_PATH WINDOW

Options

-p, --nproc <nproc>: Number of processes to split the work between.[default: 1, i.e. no process pool]

-o, --output <output>: Specify output file name to store the insulation in a tsv format.

--view, --regions <view>: Path to a BED file containing genomic regions for which insulation scores will be calculated. Region names can be provided in a 4th column and should match regions and their names in expected. Note that ‘–regions’ is the deprecated name of the option. Use ‘–view’ instead.

--ignore-diags <ignore_diags>: The number of diagonals to ignore. By default, equals the number of diagonals ignored during IC balancing.

--clr-weight-name <clr_weight_name>

Use balancing weight with this name. Provide empty argument to calculate insulation on raw data (no masking bad pixels).

Default: 'weight'

--min-frac-valid-pixels <min_frac_valid_pixels>

The minimal fraction of valid pixels in a sliding diamond. Used to mask bins during boundary detection.

Default: 0.66

--min-dist-bad-bin <min_dist_bad_bin>

The minimal allowed distance to a bad bin. Use to mask bins after insulation calculation and during boundary detection.

Default: 0

--threshold <threshold>

Rule used to threshold the histogram of boundary strengths to exclude weak boundaries. ‘Li’ or ‘Otsu’ use corresponding methods from skimage.thresholding. Providing a float value will filter by a fixed threshold.

Default: 0

--window-pixels: If set then the window sizes are provided in units of pixels.

--append-raw-scores: Append columns with raw scores (sum_counts, sum_balanced, n_pixels) to the output table.

--chunksize <chunksize>

Default: 20000000

--verbose: Report real-time progress.

--bigwig: Also save insulation tracks as a bigWig files for different window sizes with the names output.<window-size>.bw

Arguments

IN_PATH: Required argument

WINDOW: Optional argument(s)

pileup

Retrieve snippets from a .cool file.

COOL_PATH : The path to a .cool file with a balanced Hi-C map. Use the ‘::’ syntax to specify a group path in a multicooler file.

FEATURES_PATH : the path to a BED or BEDPE-like file that contains features for snipping windows. If BED, then the features are on-diagonal. If BEDPE, then the features can be off-diagonal (but not in trans or between different regions in the view).

cooltools pileup [OPTIONS] COOL_PATH FEATURES_PATH

Options

--view, --regions <view>: Path to a BED file which defines which regions of the chromosomes to use. Required if EXPECTED_PATH is provided Note that ‘–regions’ is the deprecated name of the option. Use ‘–view’ instead.

--expected <expected>: Path to the expected table. If provided, outputs OOE pileup. if not provided, outputs regular pileup.

--flank <flank>

Size of flanks.

Default: 100000

--features-format <features_format>

Input features format.

Options: auto | BED | BEDPE

--clr-weight-name <clr_weight_name>

Use balancing weight with this name.

Default: 'weight'

-o, --out <out>: Required Save output pileup as NPZ/HDF5 file.

--out-format <out_format>

Type of output.

Default: 'NPZ'
Options: NPZ | HDF5

--store-snips: Flag indicating whether snips should be stored.

-p, --nproc <nproc>: Number of processes to split the work between. [default: 1, i.e. no process pool]

--ignore-diags <ignore_diags>: The number of diagonals to ignore. By default, equals the number of diagonals ignored during IC balancing.

--aggregate <aggregate>

Function for calculating aggregate signal.

Default: 'none'
Options: none | mean | median | std | min | max

-v, --verbose: Enable verbose output

Arguments

COOL_PATH: Required argument

FEATURES_PATH: Required argument

random-sample

Pick a random sample of contacts from a Hi-C map.

IN_PATH : Input cooler path or URI.

OUT_PATH : Output cooler path or URI.

Specify the target sample size with either --count or --frac.

cooltools random-sample [OPTIONS] IN_PATH OUT_PATH

Options

-c, --count <count>: The target number of contacts in the sample. The resulting sample size will not match it precisely. Mutually exclusive with –frac and –cis-count

--cis-count <cis_count>: The target number of cis contacts in the sample. The resulting sample size will not match it precisely. Mutually exclusive with –count and –frac

-f, --frac <frac>: The target sample size as a fraction of contacts in the original dataset. Mutually exclusive with –count and –cis-count

--exact: If specified, use exact sampling that guarantees the size of the output sample. Otherwise, binomial sampling will be used and the sample size will be distributed around the target value.

-p, --nproc <nproc>: Number of processes to split the work between.[default: 1, i.e. no process pool]

--chunksize <chunksize>

The number of pixels loaded and processed per step of computation.

Default: 10000000

Arguments

IN_PATH: Required argument

OUT_PATH: Required argument

rearrange

Rearrange data from a cooler according to a new genomic view.

Parameters

IN_PATH (str): .cool file (or URI) with data to rearrange.
OUT_PATH (str): .cool file (or URI) to save the rearranged data.
view (str): Path to a BED-like file which defines which regions of the chromosomes to use and in what order. Has to be a valid viewframe (columns corresponding to region coordinates followed by the region name), with potential additional columns. Using --new-chrom-col and --orientation-col you can specify the new chromosome names and whether to invert each region (optional). If the file has no header with column names, --new-chrom-col is assumed to be the fifth column and --orientation-col the sixth, if they exist.
new_chrom_col (str): Column name in the view with new chromosome names. If not provided and there is no column named ‘new_chrom’ in the view file, uses original chromosome names.
orientation_col (str): Column name in the view with orientations of each region (+ or -). - means the region will be inverted. If not provided and there is no column named ‘strand’ in the view file, assumes all are forward oriented.
assembly (str): The name of the assembly for the new cooler. If None, uses the same as in the original cooler.
chunksize (int): The number of pixels loaded and processed per step of computation.
mode (str): (w)rite or (a)ppend to the output file (default: w)

cooltools rearrange [OPTIONS] IN_PATH OUT_PATH

Options

--view <view>: Required Path to a BED-like file which defines which regions of the chromosomes to use and in what order. Using –new-chrom-col and –orientation-col you can specify the new chromosome names and whether to invert each region (optional)

--new-chrom-col <new_chrom_col>: Column name in the view with new chromosome names. If not provided and there is no column named ‘new_chrom’ in the view file, uses original chromosome names

--orientation-col <orientation_col>: Columns name in the view with orientations of each region (+ or -). If not providedand there is no column named ‘strand’ in the view file, assumes all are forward oriented

--assembly <assembly>: The name of the assembly for the new cooler. If None, uses the same as in the original cooler.

--chunksize <chunksize>

The number of pixels loaded and processed per step of computation.

Default: 10000000

--mode <mode>

(w)rite or (a)ppend to the output file (default: w)

Options: w | a

Arguments

IN_PATH: Required argument

OUT_PATH: Required argument

saddle

Calculate saddle statistics and generate saddle plots for an arbitrary signal track on the genomic bins of a contact matrix.

COOL_PATH : The path to a .cool file with a balanced Hi-C map. Use the ‘::’ syntax to specify a group path in a multicooler file.

TRACK_PATH : The path to a bedGraph-like file with a binned compartment track (eigenvector), including a header. Use the ‘::’ syntax to specify a column name.

EXPECTED_PATH : The path to a tsv-like file with expected signal, including a header. Use the ‘::’ syntax to specify a column name.

Analysis will be performed for chromosomes referred to in TRACK_PATH, and therefore these chromosomes must be a subset of chromosomes referred to in COOL_PATH and EXPECTED_PATH.

COOL_PATH, TRACK_PATH and EXPECTED_PATH must be binned at the same resolution (except for EXPECTED_PATH in case of trans contact type).

EXPECTED_PATH must contain at least the following columns for cis contacts: ‘chrom’, ‘diag’, ‘n_valid’, value_name, and the following columns for trans contacts: ‘chrom1’, ‘chrom2’, ‘n_valid’, value_name. value_name is controlled using options. Header must be present in a file.

cooltools saddle [OPTIONS] COOL_PATH TRACK_PATH EXPECTED_PATH

Options

-t, --contact-type <contact_type>

Type of the contacts to aggregate

Default: 'cis'
Options: cis | trans

--min-dist <min_dist>

Minimal distance between bins to consider, bp. If negative, removes the first two diagonals of the data. Ignored with --contact-type trans.

Default: -1

--max-dist <max_dist>

Maximal distance between bins to consider, bp. Ignored, if negative. Ignored with --contact-type trans.

Default: -1

-n, --n-bins <n_bins>

Number of bins for digitizing track values.

Default: 50

--vrange <vrange>: Low and high values used for binning genome-wide track values, e.g. if range`=(-0.05, 0.05), `n-bins equidistant bins would be generated. Use to prevent extreme track values from exploding the bin range and to ensure consistent bins across several runs of compute_saddle command using different track files.

--qrange <qrange>

Low and high values used for quantile bins of genome-wide track values,e.g. if `qrange`=(0.02, 0.98) the lower bin would start at the 2nd percentile and the upper bin would end at the 98th percentile of the genome-wide signal. Use to prevent the extreme track values from exploding the bin range.

Default: None, None

--clr-weight-name <clr_weight_name>

Use balancing weight with this name.

Default: 'weight'

--strength, --no-strength: Compute and save compartment ‘saddle strength’ profile

--view, --regions <view>: Path to a BED file containing genomic regions for which saddleplot will be calculated. Region names can be provided in a 4th column and should match regions and their names in expected. Note that ‘–regions’ is the deprecated name of the option. Use ‘–view’ instead.

-o, --out-prefix <out_prefix>: Required Dump ‘saddledata’, ‘binedges’ and ‘hist’ arrays in a numpy-specific .npz container. Use numpy.load to load these arrays into a dict-like object. The digitized signal values are saved to a bedGraph-style TSV.

--fig <fig>

Generate a figure and save to a file of the specified format. If not specified - no image is generated. Repeat for multiple output formats.

Options: png | jpg | svg | pdf | ps | eps

--scale <scale>

Value scale for the heatmap

Default: 'log'
Options: linear | log

--cmap <cmap>

Name of matplotlib colormap

Default: 'coolwarm'

--vmin <vmin>: Low value of the saddleplot colorbar. Note: value in original units irrespective of used scale, and therefore should be positive for both vmin and vmax.

--vmax <vmax>: High value of the saddleplot colorbar

--hist-color <hist_color>: Face color of histogram bar chart

-v, --verbose: Enable verbose output

Arguments

COOL_PATH: Required argument

TRACK_PATH: Required argument

EXPECTED_PATH: Required argument

virtual4c

Generate virtual 4C profile from a contact map by extracting all interactions of a given viewpoint with the rest of the genome.

COOL_PATH : The path to a .cool file with a Hi-C map. Use the ‘::’ syntax to specify a group path in a multicooler file.

VIEWPOINT : the viewpoint to use for the virtual 4C profile. Provide as a UCSC-string (e.g. chr1:1-1000)

Note: this is a new (experimental) tool, the interface or output might change in a future version.

cooltools virtual4c [OPTIONS] COOL_PATH VIEWPOINT

Options

--clr-weight-name <clr_weight_name>

Use balancing weight with this name. Provide empty argument to calculate the virtual 4C profile on raw data (no masking bad pixels).

Default: 'weight'

-o, --out-prefix <out_prefix>: Required Save virtual 4C track as a BED-like file. Contact frequency is stored in out_prefix.v4C.tsv

--bigwig: Also save virtual 4C track as a bigWig file with the name out_prefix.v4C.bw

-p, --nproc <nproc>: Number of processes to split the work between. [default: 1, i.e. no process pool]

Arguments

COOL_PATH: Required argument

VIEWPOINT: Required argument