CLI Reference

cooltools

Type -h or --help after any subcommand for more information.

cooltools [OPTIONS] COMMAND [ARGS]...

Options

-v, --verbose

Verbose logging

-d, --debug

Post mortem debugging

-V, --version

Show the version and exit.

coverage

Calculate the sums of cis and genome-wide contacts (aka coverage aka marginals) for a sparse Hi-C contact map in Cooler HDF5 format. Note that the sum(tot_cov) from this function is two times the number of reads contributing to the cooler, as each side contributes to the coverage.

COOL_PATH : The path to a .cool file with a balanced Hi-C map.

cooltools coverage [OPTIONS] COOL_PATH
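
The note above about sum(tot_cov) can be illustrated with a small sketch. It is not the cooltools implementation: the bin table, the pixel values and the function name are made up, no diagonals are ignored, and counting a diagonal pixel twice in its bin is an assumption that follows from the statement that each side of a contact contributes.

```python
# Toy upper-triangle pixel table: (bin1_id, bin2_id, count).
# Bins 0-1 belong to "chr1", bins 2-3 to "chr2" (hypothetical data).
bin_chrom = ["chr1", "chr1", "chr2", "chr2"]
pixels = [(0, 0, 2), (0, 1, 3), (0, 2, 1), (1, 3, 4), (2, 3, 5)]


def coverage(pixels, bin_chrom):
    """Return (cis, total) per-bin coverage; each contact counts once per side."""
    n = len(bin_chrom)
    cov_cis = [0] * n
    cov_tot = [0] * n
    for b1, b2, count in pixels:
        is_cis = bin_chrom[b1] == bin_chrom[b2]
        for b in (b1, b2):  # both sides of the contact contribute
            cov_tot[b] += count
            if is_cis:
                cov_cis[b] += count
    return cov_cis, cov_tot


cov_cis, cov_tot = coverage(pixels, bin_chrom)
n_contacts = sum(count for _, _, count in pixels)
print(cov_cis, cov_tot, n_contacts)
```

In this toy table the total coverage sums to twice the number of contacts, which is the relationship stated in the description.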

Options

-o, --output <output>

Specify output file name to store the coverage in a tsv format.

--ignore-diags <ignore_diags>

The number of diagonals to ignore. By default, equals the number of diagonals ignored during IC balancing.

--store

Append columns with coverage (cov_cis_raw, cov_tot_raw), or (cov_cis_clr_weight_name, cov_tot_clr_weight_name) if calculating balanced coverage, to the cooler bin table. If clr_weight_name=None, also stores total cis counts in the cooler info

--chunksize <chunksize>

Split the contact matrix pixel records into equally sized chunks to save memory and/or parallelize. Default is 10^7

Default

10000000.0

--bigwig

Also save output as bigWig files for cis and total coverage with the names <output>.<cis/tot>.bw

--clr_weight_name <clr_weight_name>

Name of the weight column. Specify to calculate coverage of balanced cooler.

-p, --nproc <nproc>

Number of processes to split the work between. [default: 1, i.e. no process pool]

Arguments

COOL_PATH

Required argument

dots

Call dots on a Hi-C heatmap that are not larger than max_loci_separation.

COOL_PATH : The path to a .cool file with a balanced Hi-C map.

EXPECTED_PATH : The path to a tsv-like file with expected signal, including a header. Use the ‘::’ syntax to specify a column name.

Analysis will be performed for chromosomes referred to in EXPECTED_PATH, and therefore these chromosomes must be a subset of chromosomes referred to in COOL_PATH. Chromosomes referred to in EXPECTED_PATH must also be non-trivial, i.e., contain non-NaN signal. Make sure to prune your EXPECTED_PATH before applying this script.

COOL_PATH and EXPECTED_PATH must be binned at the same resolution.

EXPECTED_PATH must contain at least the following columns for cis contacts: ‘region1/2’, ‘dist’, ‘n_valid’, value_name. value_name is controlled using options. A header must be present in the file.

cooltools dots [OPTIONS] COOL_PATH EXPECTED_PATH

Options

--view, --regions <view>

Path to a BED file with the definition of the viewframe (regions) used in the calculation of EXPECTED_PATH. Dot-calling will be performed for these regions independently, e.g. for chromosome arms. Note that ‘--regions’ is the deprecated name of the option. Use ‘--view’ instead.

--clr-weight-name <clr_weight_name>

Use cooler balancing weight with this name.

Default

weight

-p, --nproc <nproc>

Number of processes to split the work between. [default: 1, i.e. no process pool]

--max-loci-separation <max_loci_separation>

Limit loci separation for dot-calling, i.e., do not call dots for loci that are further than max_loci_separation base pairs apart. 2-20 Mb is reasonable and would capture most CTCF dots.

Default

2000000

--max-nans-tolerated <max_nans_tolerated>

Maximum number of NaNs tolerated in the footprint of every filter used. Use with caution: a large max-nans-tolerated may allow pixels scored in the padding area of the tiles to “penetrate” into the list of scored pixels used for statistical testing. [max-nans-tolerated <= 2*w]

Default

1

--tile-size <tile_size>

Tile size for the Hi-C heatmap tiling. Typically on the order of several megabases, and <= max_loci_separation.

Default

6000000

--num-lambda-bins <num_lambda_bins>

Number of log-spaced bins to divide your adjusted expected between. Same as HiCCUPS_W1_MAX_INDX (40) in the original HiCCUPS.

Default

45

--fdr <fdr>

False discovery rate (FDR) to control in the multiple hypothesis testing BH-FDR procedure.

Default

0.02
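
For readers unfamiliar with the BH-FDR procedure named in the --fdr help text, here is a minimal sketch of the generic Benjamini-Hochberg step. It is not taken from cooltools, which may apply the procedure separately per group of pixels; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, fdr):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k / m * fdr.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


pvals = [0.001, 0.5, 0.004, 0.03, 0.2]  # hypothetical per-pixel p-values
calls = benjamini_hochberg(pvals, fdr=0.02)
print(calls)
```

With these made-up p-values and the default FDR of 0.02, only the two smallest p-values pass the threshold.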

--clustering-radius <clustering_radius>

Radius for clustering dots that have been called too close to each other. Typically on the order of 40 kilobases, and >= binsize.

Default

39000

-v, --verbose

Enable verbose output

-o, --output <output>

Required Specify output file name to store called dots in a BEDPE-like format

Arguments

COOL_PATH

Required argument

EXPECTED_PATH

Required argument

eigs-cis

Perform eigenvalue decomposition on a cooler matrix to calculate compartment signal by finding the eigenvector that correlates best with the phasing track.

COOL_PATH : the path to a .cool file with a balanced Hi-C map. Use the ‘::’ syntax to specify a group path in a multicooler file.

TRACK_PATH : the path to a BedGraph-like file that stores the phasing track in a column named track-name.

BedGraph-like format assumes tab-separated columns chrom, start, stop and track-name.

cooltools eigs-cis [OPTIONS] COOL_PATH
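
The idea of orienting and ranking eigenvectors by a phasing track can be sketched as follows. This is an illustration only: the helper names, the toy vectors and the choice of Pearson correlation as the ranking metric are assumptions, not the cooltools code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def orient_and_rank(eigvecs, phasing_track):
    """Flip each vector so it correlates positively with the track,
    then sort the vectors by decreasing correlation."""
    oriented = []
    for vec in eigvecs:
        r = pearson(vec, phasing_track)
        if r < 0:
            vec, r = [-v for v in vec], -r
        oriented.append((r, vec))
    oriented.sort(key=lambda pair: pair[0], reverse=True)
    return oriented


track = [0.35, 0.40, 0.55, 0.60]  # hypothetical phasing track, e.g. GC content
eigvecs = [
    [0.5, -0.5, 0.5, -0.5],  # weakly related to the track
    [0.6, 0.4, -0.4, -0.6],  # strongly anti-correlated with the track
]
ranked = orient_and_rank(eigvecs, track)
best_r, e1 = ranked[0]
print(best_r, e1)
```

In this toy case the second vector is flipped and ranked first, because after the sign change it follows the track most closely.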

Options

--phasing-track <TRACK_PATH>

Phasing track for orienting and ranking eigenvectors, provided as /path/to/track::track_value_column_name.

--view, --regions <view>

Path to a BED file which defines which regions of the chromosomes to use (only implemented for cis contacts). Note that ‘--regions’ is the deprecated name of the option. Use ‘--view’ instead.

--n-eigs <n_eigs>

Number of eigenvectors to compute.

Default

3

--clr-weight-name <clr_weight_name>

Use balancing weight with this name. Using raw unbalanced data is not currently supported for eigenvectors.

Default

weight

--ignore-diags <ignore_diags>

The number of diagonals to ignore. By default, equals the number of diagonals ignored during IC balancing.

-v, --verbose

Enable verbose output

-o, --out-prefix <out_prefix>

Required Save compartment track as a BED-like file. Eigenvectors and corresponding eigenvalues are stored in out_prefix.contact_type.vecs.tsv and out_prefix.contact_type.lam.txt

--bigwig

Also save compartment track (E1) as a bigWig file with the name out_prefix.contact_type.bw

Arguments

COOL_PATH

Required argument

eigs-trans

Perform eigenvalue decomposition on a cooler matrix to calculate compartment signal by finding the eigenvector that correlates best with the phasing track.

COOL_PATH : the path to a .cool file with a balanced Hi-C map. Use the ‘::’ syntax to specify a group path in a multicooler file.

TRACK_PATH : the path to a BedGraph-like file that stores the phasing track in a column named track-name.

BedGraph-like format assumes tab-separated columns chrom, start, stop and track-name.

cooltools eigs-trans [OPTIONS] COOL_PATH

Options

--phasing-track <TRACK_PATH>

Phasing track for orienting and ranking eigenvectors, provided as /path/to/track::track_value_column_name.

--view, --regions <view>

Path to a BED file which defines which regions of the chromosomes to use (only implemented for cis contacts). Note that ‘--regions’ is the deprecated name of the option. Use ‘--view’ instead.

--n-eigs <n_eigs>

Number of eigenvectors to compute.

Default

3

--clr-weight-name <clr_weight_name>

Use balancing weight with this name. Using raw unbalanced data is not supported for eigenvectors.

Default

weight

-v, --verbose

Enable verbose output

-o, --out-prefix <out_prefix>

Required Save compartment track as a BED-like file. Eigenvectors and corresponding eigenvalues are stored in out_prefix.contact_type.vecs.tsv and out_prefix.contact_type.lam.txt

--bigwig

Also save compartment track (E1) as a bigWig file with the name out_prefix.contact_type.bw

Arguments

COOL_PATH

Required argument

expected-cis

Calculate expected Hi-C signal for cis regions of a chromosomal interaction map: the average of interactions separated by the same genomic distance, i.e. lying on the same diagonal of the cis-heatmap.

When balancing weights are not applied to the data, there is no masking of bad bins performed.

COOL_PATH : The path to a .cool file with a balanced Hi-C map.

cooltools expected-cis [OPTIONS] COOL_PATH
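
The diagonal averaging described above can be sketched on a small dense matrix. This is a simplified illustration rather than the cooltools implementation, which works on sparse pixel chunks: the function name and the matrix values are hypothetical, and masked bins are represented here by None.

```python
def cis_expected(matrix, ignore_diags=2):
    """Average the values on each diagonal of a square cis matrix.

    Returns {diagonal offset in bins: mean value}, skipping the first
    `ignore_diags` diagonals and any None (masked) entries.
    """
    n = len(matrix)
    expected = {}
    for dist in range(ignore_diags, n):
        values = [
            matrix[i][i + dist]
            for i in range(n - dist)
            if matrix[i][i + dist] is not None
        ]
        if values:
            expected[dist] = sum(values) / len(values)
    return expected


# Toy balanced matrix for one region (hypothetical values), 5 bins.
m = [
    [9, 8, 4, 2, 1],
    [8, 9, 8, 6, 2],
    [4, 8, 9, 8, 4],
    [2, 6, 8, 9, 8],
    [1, 2, 4, 8, 9],
]
exp = cis_expected(m, ignore_diags=2)
print(exp)
```

With ignore_diags=2, the main diagonal and the first off-diagonal are left out, mirroring the default of the --ignore-diags option below.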

Options

-p, --nproc <nproc>

Number of processes to split the work between. [default: 1, i.e. no process pool]

-c, --chunksize <chunksize>

Control the number of pixels handled by each worker process at a time.

Default

10000000

-o, --output <output>

Specify output file name to store the expected in a tsv format.

--view, --regions <view>

Path to a 3 or 4-column BED file with genomic regions to calculate cis-expected on. When region names are not provided (no 4th column), UCSC-style region names are generated. Cis-expected is calculated for all chromosomes when this is not specified. Note that ‘--regions’ is the deprecated name of the option. Use ‘--view’ instead.

--smooth

If set, cis-expected is smoothed and the result is stored in an additional column, e.g. balanced.avg.smoothed

--aggregate-smoothed

If set, cis-expected is averaged over all regions and then smoothed. Result is stored in an additional column, e.g. balanced.avg.smoothed.agg. Ignored without smoothing

--smooth-sigma <smooth_sigma>

Control smoothing with the standard deviation of the smoothing Gaussian kernel, ignored without smoothing.

Default

0.1

--clr-weight-name <clr_weight_name>

Use balancing weight with this name stored in cooler. Provide empty argument to calculate cis-expected on raw data

Default

weight

--ignore-diags <ignore_diags>

Number of diagonals to neglect for cis contact type

Default

2

Arguments

COOL_PATH

Required argument

expected-trans

Calculate expected Hi-C signal for trans regions of chromosomal interaction map: average of interactions in a rectangular block defined by a pair of regions, e.g. inter-chromosomal blocks.

When balancing weights are not applied to the data, there is no masking of bad bins performed.

COOL_PATH : The path to a .cool file with a balanced Hi-C map.

cooltools expected-trans [OPTIONS] COOL_PATH
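
The block averaging described above can be sketched in the same spirit. The function name, the bin ranges and the matrix values are hypothetical, masked pixels are represented by None, and the real tool operates on sparse pixel chunks rather than a dense matrix.

```python
def trans_expected(matrix, region1, region2):
    """Mean of the rectangular block defined by two half-open bin ranges.

    Returns (mean value, number of valid pixels); None entries are skipped.
    """
    (lo1, hi1), (lo2, hi2) = region1, region2
    values = [
        matrix[i][j]
        for i in range(lo1, hi1)
        for j in range(lo2, hi2)
        if matrix[i][j] is not None
    ]
    return sum(values) / len(values), len(values)


# Toy map with two regions: bins 0-1 and bins 2-3 (hypothetical values).
matrix = [
    [9, 7, 1, 3],
    [7, 9, None, 2],
    [1, None, 9, 6],
    [3, 2, 6, 9],
]
value, n_valid = trans_expected(matrix, (0, 2), (2, 4))
print(value, n_valid)
```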

Options

-p, --nproc <nproc>

Number of processes to split the work between. [default: 1, i.e. no process pool]

-c, --chunksize <chunksize>

Control the number of pixels handled by each worker process at a time.

Default

10000000

-o, --output <output>

Specify output file name to store the expected in a tsv format.

--view, --regions <view>

Path to a 3 or 4-column BED file with genomic regions. Trans-expected is calculated on all pairwise combinations of these regions. When region names are not provided (no 4th column), UCSC-style region names are generated. Trans-expected is calculated for all inter-chromosomal pairs when view is not specified. Note that ‘--regions’ is the deprecated name of the option. Use ‘--view’ instead.

--clr-weight-name <clr_weight_name>

Use balancing weight with this name stored in cooler. Provide empty argument to calculate trans-expected on raw data

Default

weight

Arguments

COOL_PATH

Required argument

genome

Utilities for binned genome assemblies.

cooltools genome [OPTIONS] COMMAND [ARGS]...

binnify

cooltools genome binnify [OPTIONS] CHROMSIZES_PATH BINSIZE

Options

--all-names

Parse all chromosome names from file, not only default r"^chr[0-9]+$", r"^chr[XY]$", r"^chrM$".
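
To show which names the default patterns quoted above would keep, here is a small sketch. The function name and the example chromosome names are hypothetical, and the real tool may also reorder the names, which this sketch does not do.

```python
import re

# Default patterns quoted in the --all-names help text.
DEFAULT_PATTERNS = [r"^chr[0-9]+$", r"^chr[XY]$", r"^chrM$"]


def filter_chromnames(names, all_names=False):
    """Keep every name if all_names is set, else only the default matches."""
    if all_names:
        return list(names)
    return [
        name
        for name in names
        if any(re.match(pattern, name) for pattern in DEFAULT_PATTERNS)
    ]


names = ["chr1", "chr10", "chrX", "chrM", "chrUn_gl000220", "chr1_random"]
kept = filter_chromnames(names)
print(kept)
```

Unplaced and random scaffolds are dropped by the default patterns and are only retained when --all-names is given.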

Arguments

CHROMSIZES_PATH

Required argument

BINSIZE

Required argument

digest

cooltools genome digest [OPTIONS] CHROMSIZES_PATH FASTA_PATH ENZYME_NAME

Arguments

CHROMSIZES_PATH

Required argument

FASTA_PATH

Required argument

ENZYME_NAME

Required argument

fetch-chromsizes

cooltools genome fetch-chromsizes [OPTIONS] DB

Arguments

DB

Required argument

gc

cooltools genome gc [OPTIONS] BINS_PATH FASTA_PATH

Options

--mapped-only

Arguments

BINS_PATH

Required argument

FASTA_PATH

Required argument

genecov

BINS_PATH is the path to the bin table.

DB is the name of the genome assembly. The gene locations will be automatically downloaded from the UCSC goldenPath.

cooltools genome genecov [OPTIONS] BINS_PATH DB

Arguments

BINS_PATH

Required argument

DB

Required argument

insulation

Calculate the diamond insulation scores and call insulating boundaries.

IN_PATH : The path to a .cool file with a balanced Hi-C map.

WINDOW : The window size for the insulation score calculations.

Multiple space-separated values can be provided. By default, the window size must be provided in units of bp. When the flag --window-pixels is set, the window sizes must be provided in units of pixels instead.

cooltools insulation [OPTIONS] IN_PATH WINDOW
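
The diamond insulation score can be pictured with a minimal sketch on a dense toy matrix. It uses one common indexing convention (the diamond at position i spans the `window` bins upstream against the `window` bins starting at i) and leaves out ignored diagonals, bad-bin masking, normalization and boundary-strength thresholding. The function name and data are hypothetical, and the window is given in pixels.

```python
def diamond_insulation(matrix, window):
    """Mean contact value inside the diamond sliding along the diagonal.

    The score at position i averages matrix[a][b] for the `window` bins
    upstream of i (a) against the `window` bins starting at i (b).
    """
    n = len(matrix)
    scores = {}
    for i in range(window, n - window + 1):
        values = [
            matrix[a][b]
            for a in range(i - window, i)
            for b in range(i, i + window)
        ]
        scores[i] = sum(values) / len(values)
    return scores


# Two toy domains (bins 0-3 and 4-7): strong contacts within, weak between.
n = 8
matrix = [[10 if (a < 4) == (b < 4) else 1 for b in range(n)] for a in range(n)]
scores = diamond_insulation(matrix, window=2)
boundary = min(scores, key=scores.get)
print(scores, boundary)
```

The score dips where the diamond straddles the two domains, so the minimum marks the boundary at bin 4 in this toy map.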

Options

-p, --nproc <nproc>

Number of processes to split the work between. [default: 1, i.e. no process pool]

-o, --output <output>

Specify output file name to store the insulation in a tsv format.

--view, --regions <view>

Path to a BED file containing genomic regions for which insulation scores will be calculated. Region names can be provided in a 4th column and should match regions and their names in expected. Note that ‘--regions’ is the deprecated name of the option. Use ‘--view’ instead.

--ignore-diags <ignore_diags>

The number of diagonals to ignore. By default, equals the number of diagonals ignored during IC balancing.

--clr-weight-name <clr_weight_name>

Use balancing weight with this name. Provide empty argument to calculate insulation on raw data (no masking bad pixels).

Default

weight

--min-frac-valid-pixels <min_frac_valid_pixels>

The minimal fraction of valid pixels in a sliding diamond. Used to mask bins during boundary detection.

Default

0.66

--min-dist-bad-bin <min_dist_bad_bin>

The minimal allowed distance to a bad bin. Use to mask bins after insulation calculation and during boundary detection.

Default

0

--threshold <threshold>

Rule used to threshold the histogram of boundary strengths to exclude weak boundaries. ‘Li’ or ‘Otsu’ use corresponding methods from skimage.thresholding. Providing a float value will filter by a fixed threshold

Default

0

--window-pixels

If set then the window sizes are provided in units of pixels.

--append-raw-scores

Append columns with raw scores (sum_counts, sum_balanced, n_pixels) to the output table.

--chunksize <chunksize>

Default

20000000

--verbose

Report real-time progress.

--bigwig

Also save insulation tracks as bigWig files for different window sizes with the names output.<window-size>.bw

Arguments

IN_PATH

Required argument

WINDOW

Optional argument(s)

pileup

Retrieve snippets from a .cool file.

COOL_PATH : The path to a .cool file with a balanced Hi-C map. Use the ‘::’ syntax to specify a group path in a multicooler file.

FEATURES_PATH : the path to a BED or BEDPE-like file that contains features for snipping windows. If BED, then the features are on-diagonal. If BEDPE, then the features can be off-diagonal (but not in trans or between different regions in the view).

cooltools pileup [OPTIONS] COOL_PATH FEATURES_PATH
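
As a rough picture of what a pileup of on-diagonal (BED-style) features involves, the sketch below cuts square windows around feature bins and averages them. It is an illustration under simplifying assumptions: the flank is given in bins rather than bp, features too close to the matrix edge are skipped, and the helper names and data are hypothetical.

```python
def snip(matrix, center, flank):
    """Square window of (2 * flank + 1) bins centered on a diagonal bin."""
    lo, hi = center - flank, center + flank + 1
    return [row[lo:hi] for row in matrix[lo:hi]]


def mean_pileup(matrix, centers, flank):
    """Element-wise mean of the snippets around each feature center."""
    size = 2 * flank + 1
    snips = [
        snip(matrix, c, flank)
        for c in centers
        if c - flank >= 0 and c + flank < len(matrix)  # skip edge features
    ]
    return [
        [sum(s[i][j] for s in snips) / len(snips) for j in range(size)]
        for i in range(size)
    ]


# Toy map where the value depends only on the distance between bins.
n = 7
matrix = [[abs(a - b) for b in range(n)] for a in range(n)]
pile = mean_pileup(matrix, centers=[1, 3, 5, 6], flank=1)
print(pile)
```

Averaging corresponds to choosing mean for the --aggregate option; keeping the individual windows instead corresponds to storing the snips.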

Options

--view, --regions <view>

Path to a BED file which defines which regions of the chromosomes to use. Required if EXPECTED_PATH is provided. Note that ‘--regions’ is the deprecated name of the option. Use ‘--view’ instead.

--expected <expected>

Path to the expected table. If provided, outputs OOE pileup. If not provided, outputs regular pileup.

--flank <flank>

Size of flanks.

Default

100000

--features-format <features_format>

Input features format.

Options

auto | BED | BEDPE

--clr-weight-name <clr_weight_name>

Use balancing weight with this name.

Default

weight

-o, --out <out>

Required Save output pileup as NPZ/HDF5 file.

--out-format <out_format>

Type of output.

Default

NPZ

Options

NPZ | HDF5

--store-snips

Flag indicating whether snips should be stored.

-p, --nproc <nproc>

Number of processes to split the work between. [default: 1, i.e. no process pool]

--ignore-diags <ignore_diags>

The number of diagonals to ignore. By default, equals the number of diagonals ignored during IC balancing.

--aggregate <aggregate>

Function for calculating aggregate signal.

Default

none

Options

none | mean | median | std | min | max

-v, --verbose

Enable verbose output

Arguments

COOL_PATH

Required argument

FEATURES_PATH

Required argument

random-sample

Pick a random sample of contacts from a Hi-C map.

IN_PATH : Input cooler path or URI.

OUT_PATH : Output cooler path or URI.

Specify the target sample size with either --count or --frac.

cooltools random-sample [OPTIONS] IN_PATH OUT_PATH

Options

-c, --count <count>

The target number of contacts in the sample. The resulting sample size will not match it precisely. Mutually exclusive with --frac and --cis-count

--cis-count <cis_count>

The target number of cis contacts in the sample. The resulting sample size will not match it precisely. Mutually exclusive with --count and --frac

-f, --frac <frac>

The target sample size as a fraction of contacts in the original dataset. Mutually exclusive with --count and --cis-count

--exact

If specified, use exact sampling that guarantees the size of the output sample. Otherwise, binomial sampling will be used and the sample size will be distributed around the target value.
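
The difference between the two sampling modes can be sketched on a toy list of pixel counts. This is an illustration only, with hypothetical function names and data: the binomial variant keeps each contact independently, so its total only approximates the target, while the exact variant draws a fixed number of contacts without replacement.

```python
import random
from collections import Counter


def binomial_sample(counts, frac, rng):
    """Keep each contact independently with probability `frac`."""
    return [sum(rng.random() < frac for _ in range(c)) for c in counts]


def exact_sample(counts, target, rng):
    """Draw exactly `target` contacts without replacement."""
    # Expand every pixel into one entry per contact, then sample entries.
    contacts = [pixel for pixel, c in enumerate(counts) for _ in range(c)]
    picked = Counter(rng.sample(contacts, target))
    return [picked.get(pixel, 0) for pixel in range(len(counts))]


rng = random.Random(0)
counts = [40, 25, 20, 10, 5]  # hypothetical pixel counts, 100 contacts in total
approx = binomial_sample(counts, frac=0.3, rng=rng)
exact = exact_sample(counts, target=30, rng=rng)
print(sum(approx), sum(exact))
```

Expanding every contact, as done here, is only practical for toy data; it is shown because it makes the guarantee on the output size easy to see.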

-p, --nproc <nproc>

Number of processes to split the work between. [default: 1, i.e. no process pool]

--chunksize <chunksize>

The number of pixels loaded and processed per step of computation.

Default

10000000

Arguments

IN_PATH

Required argument

OUT_PATH

Required argument

rearrange

Rearrange data from a cooler according to a new genomic view

Parameters

IN_PATH : str

.cool file (or URI) with data to rearrange.

OUT_PATH : str

.cool file (or URI) to save the rearranged data.

view : str

Path to a BED-like file which defines which regions of the chromosomes to use and in what order. Has to be a valid viewframe (columns corresponding to region coordinates followed by the region name), with potential additional columns. Using --new-chrom-col and --orientation-col you can specify the new chromosome names and whether to invert each region (optional). If the file has no header with column names, the new-chrom-col is assumed to be the fifth column and the orientation-col the sixth, if they exist.

new_chrom_col : str

Column name in the view with new chromosome names. If not provided and there is no column named ‘new_chrom’ in the view file, uses original chromosome names.

orientation_col : str

Column name in the view with orientations of each region (+ or -). - means the region will be inverted. If not provided and there is no column named ‘strand’ in the view file, assumes all are forward oriented.

assembly : str

The name of the assembly for the new cooler. If None, uses the same as in the original cooler.

chunksize : int

The number of pixels loaded and processed per step of computation.

mode : str

(w)rite or (a)ppend to the output file (default: w)

cooltools rearrange [OPTIONS] IN_PATH OUT_PATH

Options

--view <view>

Required Path to a BED-like file which defines which regions of the chromosomes to use and in what order. Using --new-chrom-col and --orientation-col you can specify the new chromosome names and whether to invert each region (optional)

--new-chrom-col <new_chrom_col>

Column name in the view with new chromosome names. If not provided and there is no column named ‘new_chrom’ in the view file, uses original chromosome names

--orientation-col <orientation_col>

Column name in the view with orientations of each region (+ or -). If not provided and there is no column named ‘strand’ in the view file, assumes all are forward oriented

--assembly <assembly>

The name of the assembly for the new cooler. If None, uses the same as in the original cooler.

--chunksize <chunksize>

The number of pixels loaded and processed per step of computation.

Default

10000000

--mode <mode>

(w)rite or (a)ppend to the output file (default: w)

Options

w | a

Arguments

IN_PATH

Required argument

OUT_PATH

Required argument

saddle

Calculate saddle statistics and generate saddle plots for an arbitrary signal track on the genomic bins of a contact matrix.

COOL_PATH : The path to a .cool file with a balanced Hi-C map. Use the ‘::’ syntax to specify a group path in a multicooler file.

TRACK_PATH : The path to a bedGraph-like file with a binned compartment track (eigenvector), including a header. Use the ‘::’ syntax to specify a column name.

EXPECTED_PATH : The path to a tsv-like file with expected signal, including a header. Use the ‘::’ syntax to specify a column name.

Analysis will be performed for chromosomes referred to in TRACK_PATH, and therefore these chromosomes must be a subset of chromosomes referred to in COOL_PATH and EXPECTED_PATH.

COOL_PATH, TRACK_PATH and EXPECTED_PATH must be binned at the same resolution (except for EXPECTED_PATH in case of trans contact type).

EXPECTED_PATH must contain at least the following columns for cis contacts: ‘chrom’, ‘diag’, ‘n_valid’, value_name and the following columns for trans contacts: ‘chrom1’, ‘chrom2’, ‘n_valid’, value_name. value_name is controlled using options. A header must be present in the file.

cooltools saddle [OPTIONS] COOL_PATH TRACK_PATH EXPECTED_PATH

Options

-t, --contact-type <contact_type>

Type of the contacts to aggregate

Default

cis

Options

cis | trans

--min-dist <min_dist>

Minimal distance between bins to consider, bp. If negative, removes the first two diagonals of the data. Ignored with --contact-type trans.

Default

-1

--max-dist <max_dist>

Maximal distance between bins to consider, bp. Ignored, if negative. Ignored with --contact-type trans.

Default

-1

-n, --n-bins <n_bins>

Number of bins for digitizing track values.

Default

50

--vrange <vrange>

Low and high values used for binning genome-wide track values, e.g. if `vrange`=(-0.05, 0.05), `n-bins` equidistant bins would be generated. Use to prevent extreme track values from exploding the bin range and to ensure consistent bins across several runs of compute_saddle command using different track files.
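
How a value range and a bin count turn into equidistant bins can be sketched as follows. The helper names are hypothetical, and sending out-of-range values to two extra outlier bins is an assumption that mirrors the default behaviour of numpy.digitize rather than a documented detail of this command.

```python
from bisect import bisect_right


def equidistant_edges(vrange, n_bins):
    """Return n_bins + 1 equally spaced edges between the low and high value."""
    lo, hi = vrange
    step = (hi - lo) / n_bins
    return [lo + k * step for k in range(n_bins + 1)]


def digitize(value, edges):
    """Bin label: 0 below the range, 1..n_bins inside it,
    n_bins + 1 at or above the top edge."""
    return bisect_right(edges, value)


edges = equidistant_edges((-1.0, 1.0), n_bins=4)
labels = [digitize(v, edges) for v in (-2.0, -0.75, 0.25, 3.0)]
print(edges, labels)
```

Fixing the range this way keeps the bin edges identical across runs, whatever extreme values a particular track contains.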

--qrange <qrange>

Low and high values used for quantile bins of genome-wide track values, e.g. if `qrange`=(0.02, 0.98) the lower bin would start at the 2nd percentile and the upper bin would end at the 98th percentile of the genome-wide signal. Use to prevent the extreme track values from exploding the bin range.

Default

None, None

--clr-weight-name <clr_weight_name>

Use balancing weight with this name.

Default

weight

--strength, --no-strength

Compute and save compartment ‘saddle strength’ profile

--view, --regions <view>

Path to a BED file containing genomic regions for which saddleplot will be calculated. Region names can be provided in a 4th column and should match regions and their names in expected. Note that ‘--regions’ is the deprecated name of the option. Use ‘--view’ instead.

-o, --out-prefix <out_prefix>

Required Dump ‘saddledata’, ‘binedges’ and ‘hist’ arrays in a numpy-specific .npz container. Use numpy.load to load these arrays into a dict-like object. The digitized signal values are saved to a bedGraph-style TSV.

--fig <fig>

Generate a figure and save to a file of the specified format. If not specified, no image is generated. Repeat for multiple output formats.

Options

png | jpg | svg | pdf | ps | eps

--scale <scale>

Value scale for the heatmap

Default

log

Options

linear | log

--cmap <cmap>

Name of matplotlib colormap

Default

coolwarm

--vmin <vmin>

Low value of the saddleplot colorbar. Note: value in original units irrespective of used scale, and therefore should be positive for both vmin and vmax.

--vmax <vmax>

High value of the saddleplot colorbar

--hist-color <hist_color>

Face color of histogram bar chart

-v, --verbose

Enable verbose output

Arguments

COOL_PATH

Required argument

TRACK_PATH

Required argument

EXPECTED_PATH

Required argument

virtual4c

Generate virtual 4C profile from a contact map by extracting all interactions of a given viewpoint with the rest of the genome.

COOL_PATH : the path to a .cool file with a Hi-C map. Use the ‘::’ syntax to specify a group path in a multicooler file.

VIEWPOINT : the viewpoint to use for the virtual 4C profile. Provide as a UCSC-string (e.g. chr1:1-1000)

Note: this is a new (experimental) tool, the interface or output might change in a future version.

cooltools virtual4c [OPTIONS] COOL_PATH VIEWPOINT

Options

--clr-weight-name <clr_weight_name>

Use balancing weight with this name. Provide empty argument to calculate the virtual 4C profile on raw data (no masking bad pixels).

Default

weight

-o, --out-prefix <out_prefix>

Required Save virtual 4C track as a BED-like file. Contact frequency is stored in out_prefix.v4C.tsv

--bigwig

Also save virtual 4C track as a bigWig file with the name out_prefix.v4C.bw

-p, --nproc <nproc>

Number of processes to split the work between. [default: 1, i.e. no process pool]

Arguments

COOL_PATH

Required argument

VIEWPOINT

Required argument