CLI Reference

cooltools

Type -h or --help after any subcommand for more information.

cooltools [OPTIONS] COMMAND [ARGS]...

Options

-v, --verbose

Verbose logging

-d, --debug

Post mortem debugging

-V, --version

Show the version and exit.

call-compartments

Perform eigenvalue decomposition on a cooler matrix to calculate the compartment signal by finding the eigenvector that correlates best with the phasing track.

COOL_PATH : The path to a .cool file with a balanced Hi-C map.

TRACK_PATH : The path to a BedGraph-like file that stores the phasing track in a column named track-name.

The BedGraph-like format assumes tab-separated columns chrom, start, stop and track-name.
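As an illustration of the BedGraph-like layout described above, the following sketch reads such a table with the Python standard library. It assumes the file starts with a header row naming the four columns; the function name read_track, the column name GC and the sample rows are made up for the example.

```python
import csv
import io


def read_track(handle, track_name):
    """Read a tab-separated BedGraph-like table that has a header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for rec in reader:
        raw = rec[track_name]
        # Treat an empty cell as a missing value.
        value = float(raw) if raw else float("nan")
        rows.append((rec["chrom"], int(rec["start"]), int(rec["stop"]), value))
    return rows


# Hypothetical in-memory file with a track column called "GC".
example = io.StringIO(
    "chrom\tstart\tstop\tGC\n"
    "chr1\t0\t100000\t0.41\n"
    "chr1\t100000\t200000\t0.38\n"
)
track = read_track(example, "GC")
print(track)
```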

cooltools call-compartments [OPTIONS] COOL_PATH

Options

--reference-track <TRACK_PATH>

Reference track for orienting and ranking eigenvectors

--contact-type <contact_type>

Type of the contacts to perform eigenvalue decomposition on.

Default:cis
Options:cis | trans
--n-eigs <n_eigs>

Number of eigenvectors to compute.

Default:3
-v, --verbose

Enable verbose output

-o, --out-prefix <out_prefix>

Required. Save the compartment track as a BED-like file.

--bigwig

Also save compartment track as a bigWig file.

Arguments

COOL_PATH

Required argument

call-dots

Call dots on a Hi-C heatmap that are not larger than max_loci_separation.

COOL_PATH : The path to a .cool file with a balanced Hi-C map.

EXPECTED_PATH : The path to a tsv-like file with expected signal.

Analysis will be performed for chromosomes referred to in EXPECTED_PATH, and therefore these chromosomes must be a subset of chromosomes referred to in COOL_PATH. Also, chromosomes referred to in EXPECTED_PATH must be non-trivial, i.e., contain non-NaN signal. Thus, make sure to prune your EXPECTED_PATH before applying this script.

COOL_PATH and EXPECTED_PATH must be binned at the same resolution.

EXPECTED_PATH must contain at least the following columns for cis contacts: ‘chrom’, ‘diag’, ‘n_valid’, value_name. value_name is controlled using options. A header must be present in the file.

cooltools call-dots [OPTIONS] COOL_PATH EXPECTED_PATH

Options

--expected-name <expected_name>

Name of value column in EXPECTED_PATH

Default:balanced.avg
--weight-name <weight_name>

Use balancing weight with this name.

Default:weight
-p, --nproc <nproc>

Number of processes to split the work between. [default: 1, i.e. no process pool]

--max-loci-separation <max_loci_separation>

Limit loci separation for dot-calling, i.e., do not call dots for loci that are further than max_loci_separation base pairs apart. 2-20 Mb is reasonable and would capture most CTCF dots.

Default:2000000
--max-nans-tolerated <max_nans_tolerated>

Maximum number of NaNs tolerated in a footprint of every used filter. Must be controlled with caution, as a large max-nans-tolerated might allow pixels scored in the padding area of the tiles to “penetrate” the list of scored pixels for the statistical testing. [max-nans-tolerated <= 2*w ]

Default:1
--tile-size <tile_size>

Tile size for the Hi-C heatmap tiling. Typically on the order of several megabases, and <= max_loci_separation.

Default:6000000
--kernel-width <kernel_width>

Outer half-width of the convolution kernel in pixels, i.e. the outer size (w) of the ‘donut’ kernel, with the 2*w+1 overall footprint of the ‘donut’.

--kernel-peak <kernel_peak>

Inner half-width of the convolution kernel in pixels, i.e. the inner size (p) of the ‘donut’ kernel, with the 2*p+1 overall footprint of the punch-hole.
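To make the two half-widths concrete, here is a minimal sketch of how w and p determine the footprints of a donut-shaped mask. It only illustrates the 2*w+1 outer footprint and the 2*p+1 punch-hole; the kernel actually used by the dot caller may exclude additional pixels (for example along the central row and column), and the function name donut_kernel is an assumption made for this example.

```python
def donut_kernel(w, p):
    """Return a (2*w+1) x (2*w+1) grid of 0/1 with a (2*p+1)-wide central hole."""
    if not 0 <= p < w:
        raise ValueError("expected 0 <= p < w")
    kernel = []
    for y in range(-w, w + 1):
        row = []
        for x in range(-w, w + 1):
            # Pixels inside the inner square are punched out of the kernel.
            in_hole = abs(x) <= p and abs(y) <= p
            row.append(0 if in_hole else 1)
        kernel.append(row)
    return kernel


kernel = donut_kernel(w=3, p=1)
for row in kernel:
    print("".join(str(v) for v in row))
```

With w=3 and p=1 the grid is 7 pixels wide and the hole is 3 pixels wide.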

--num-lambda-chunks <num_lambda_chunks>

Number of log-spaced bins to divide your adjusted expected between. Same as HiCCUPS_W1_MAX_INDX in the original HiCCUPS.

Default:45
--fdr <fdr>

False discovery rate (FDR) to control in the multiple hypothesis testing BH-FDR procedure.

Default:0.02
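For orientation, the textbook Benjamini-Hochberg step that an FDR threshold controls can be sketched as follows. This is a generic illustration and not the dot caller's own code, which may apply the procedure differently (for example separately per lambda-chunk); the function name and the p-values are made up.

```python
def benjamini_hochberg(pvalues, fdr):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k / m * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


pvals = [0.0001, 0.004, 0.019, 0.03, 0.2, 0.5]
calls = benjamini_hochberg(pvals, fdr=0.02)
print(calls)
```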
--dots-clustering-radius <dots_clustering_radius>

Radius for clustering dots that have been called too close to each other. Typically on the order of 40 kilobases, and >= binsize.

Default:39000
-v, --verbose

Enable verbose output

-s, --output-scores <output_scores>

At the moment this is a redundant option that does nothing. It is reserved for a better dump of convolved scores.

--output-hists <output_hists>

Specify output file name to store lambda-chunked histograms. [Not implemented yet]

-o, --output-calls <output_calls>

Specify output file name where to store the results of dot-calling, in a BEDPE format. Pre-processed dots are stored in that file. Post-processed dots are stored in the .postproc one.

--score-dump-mode <score_dump_mode>

Specify file format for the dump of convolved scores. This dump is used for the downstream processing and is read twice. Currently ‘parquet’ is the only supported format; ‘cooler’ and ‘hdf’ may be supported in the future.

Default:parquet
--temp-dir <temp_dir>

Create temporary files in specified directory.

Default:.
--no-delete-temp

Do not delete temporary files when finished.

Arguments

COOL_PATH

Required argument

EXPECTED_PATH

Required argument

compute-expected

Calculate expected Hi-C signal either for cis or for trans regions of a chromosomal interaction map.

When balancing weights are not applied to the data, there is no masking of bad bins performed.

COOL_PATH : The path to a .cool file with a balanced Hi-C map.

cooltools compute-expected [OPTIONS] COOL_PATH

Options

-p, --nproc <nproc>

Number of processes to split the work between. [default: 1, i.e. no process pool]

-c, --chunksize <chunksize>

Control the number of pixels handled by each worker process at a time.

Default:10000000
-o, --output <output>

Specify output file name to store the expected in a tsv format.

--hdf

Use hdf5 format instead of tsv. Output file name must be specified [Not Implemented].

-t, --contact-type <contact_type>

Compute expected for cis or trans regions of a Hi-C map. Trans-expected is calculated for pairwise combinations of specified regions.

Default:cis
Options:cis | trans
--regions <regions>

Path to a BED file containing genomic regions for which expected will be calculated. Region names can be provided in a 4th column, otherwise UCSC notation is used. When not specified, expected is calculated for all chromosomes.
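The naming rule for regions can be sketched as follows: a name is taken from the 4th BED column when present, otherwise a UCSC-style chrom:start-end string is built. This is only an illustration of the rule under that reading of the option; the function name read_regions and the sample lines are assumptions.

```python
def read_regions(lines):
    """Parse BED lines into (chrom, start, end, name) tuples."""
    regions = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if len(fields) > 3 and fields[3]:
            name = fields[3]
        else:
            # Fall back to UCSC notation when no name column is given.
            name = f"{chrom}:{start}-{end}"
        regions.append((chrom, start, end, name))
    return regions


regions = read_regions(["chr1\t0\t1000\tp_arm", "chr1\t1000\t5000"])
print(regions)
```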

--balance, --no-balance

Apply balancing weights to data before calculating expected. Bins masked in the balancing weights are ignored in calculations.

Default:True
--weight-name <weight_name>

Use balancing weight with this name.

Default:weight
--blacklist <blacklist>

Path to a 3-column BED file containing genomic regions to mask out during calculation of expected. Overwrites inference of ‘bad’ regions from balancing weights. [Not Implemented]

--ignore-diags <ignore_diags>

Number of diagonals to neglect for cis contact type

Default:2

Arguments

COOL_PATH

Required argument

compute-saddle

Calculate saddle statistics and generate saddle plots for an arbitrary signal track on the genomic bins of a contact matrix.

COOL_PATH : The path to a .cool file with a balanced Hi-C map. Use the ‘::’ syntax to specify a group path in a multicooler file.

TRACK_PATH : The path to a bedGraph-like file with a binned compartment track (eigenvector), including a header. Use the ‘::’ syntax to specify a column name.

EXPECTED_PATH : The path to a tsv-like file with expected signal, including a header. Use the ‘::’ syntax to specify a column name.
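The ‘::’ syntax used by these three arguments separates a file path from a name inside the file (a group path or a column name). A minimal sketch of how such a specification can be split is shown below; the helper split_path_spec and the file names are hypothetical, and the tool's own parsing may differ in details.

```python
def split_path_spec(spec):
    """Split 'path::name' into (path, name); name is None when '::' is absent."""
    path, sep, name = spec.partition("::")
    return path, (name if sep else None)


print(split_path_spec("sample.mcool::resolutions/10000"))
print(split_path_spec("eigs.tsv::E1"))
print(split_path_spec("sample.cool"))
```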

Analysis will be performed for chromosomes referred to in TRACK_PATH, and therefore these chromosomes must be a subset of chromosomes referred to in COOL_PATH and EXPECTED_PATH.

COOL_PATH, TRACK_PATH and EXPECTED_PATH must be binned at the same resolution (except for EXPECTED_PATH in case of trans contact type).

EXPECTED_PATH must contain at least the following columns for cis contacts: ‘chrom’, ‘diag’, ‘n_valid’, value_name, and the following columns for trans contacts: ‘chrom1’, ‘chrom2’, ‘n_valid’, value_name. value_name is controlled using options. A header must be present in the file.
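Before running the command, the header requirement above can be checked with a few lines of Python. The sketch assumes a tab-separated file; the helper missing_columns and the sample header are made up for illustration.

```python
import csv
import io

# Required columns per contact type, as listed in the description above.
REQUIRED = {
    "cis": ["chrom", "diag", "n_valid"],
    "trans": ["chrom1", "chrom2", "n_valid"],
}


def missing_columns(handle, contact_type, value_name):
    """Return the required column names that are absent from the header row."""
    header = next(csv.reader(handle, delimiter="\t"))
    required = REQUIRED[contact_type] + [value_name]
    return [col for col in required if col not in header]


text = "chrom\tdiag\tn_valid\tbalanced.avg\nchr1\t2\t1200\t0.013\n"
print(missing_columns(io.StringIO(text), "cis", "balanced.avg"))
print(missing_columns(io.StringIO(text), "trans", "balanced.avg"))
```

An empty list means the header satisfies the requirement for that contact type.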

cooltools compute-saddle [OPTIONS] COOL_PATH TRACK_PATH EXPECTED_PATH

Options

-t, --contact-type <contact_type>

Type of the contacts to aggregate

Default:cis
Options:cis | trans
--min-dist <min_dist>

Minimal distance between bins to consider, bp. If negative, removes the first two diagonals of the data. Ignored with --contact-type trans.

Default:-1
--max-dist <max_dist>

Maximal distance between bins to consider, bp. Ignored, if negative. Ignored with --contact-type trans.

Default:-1
-n, --n-bins <n_bins>

Number of bins for digitizing track values.

Default:50
--quantiles

Bin the signal track into quantiles rather than by value.

--range <range_>

Low and high values used for binning genome-wide track values, e.g. if range=(-0.05, 0.05), n-bins equidistant bins would be generated between -0.05 and 0.05. Use to prevent the extreme track values from exploding the bin range and to ensure consistent bins across several runs of the compute-saddle command using different track files.

--qrange <qrange>

The fraction of the genome-wide range of the track values used to generate bins. E.g., if qrange=(0.02, 0.98), the lower bin would start at the 2nd percentile and the upper bin would end at the 98th percentile of the genome-wide signal. Use to prevent the extreme track values from exploding the bin range.

Default:0.0, 1.0
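The difference between --range and --qrange can be illustrated with a small sketch: the first fixes the outer bin edges directly, the second derives them from percentiles of the track values. This is an assumption-laden illustration, not the tool's implementation; the helper names and the linear-interpolation percentile are choices made for the example.

```python
def linear_edges(lo, hi, n_bins):
    """Return n_bins + 1 equidistant bin edges between lo and hi."""
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


def quantile(sorted_vals, q):
    """Quantile of pre-sorted values using linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    i = int(pos)
    frac = pos - i
    if i + 1 >= len(sorted_vals):
        return sorted_vals[-1]
    return sorted_vals[i] * (1 - frac) + sorted_vals[i + 1] * frac


def edges_from_qrange(values, qlo, qhi, n_bins):
    """Derive the outer edges from percentiles, then bin equidistantly."""
    clean = sorted(v for v in values if v == v)  # v == v drops NaNs
    return linear_edges(quantile(clean, qlo), quantile(clean, qhi), n_bins)


range_edges = linear_edges(-0.05, 0.05, 4)
qrange_edges = edges_from_qrange(list(range(101)), 0.02, 0.98, 2)
print(range_edges)
print(qrange_edges)
```

Fixed edges from --range stay identical across runs with different tracks, whereas edges derived from --qrange depend on the values in each track.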
--weight-name <weight_name>

Use balancing weight with this name.

Default:weight
--strength, --no-strength

Compute and save compartment ‘saddle strength’ profile

--regions <regions>

Path to a BED file containing genomic regions for which saddleplot will be calculated. Region names can be provided in a 4th column and should match regions and their names in expected.

-o, --out-prefix <out_prefix>

Required. Dump ‘saddledata’, ‘binedges’ and ‘hist’ arrays in a numpy-specific .npz container. Use numpy.load to load these arrays into a dict-like object. The digitized signal values are saved to a bedGraph-style TSV.

--fig <fig>

Generate a figure and save to a file of the specified format. If not specified, no image is generated. Repeat for multiple output formats.

Options:png | jpg | svg | pdf | ps | eps
--scale <scale>

Value scale for the heatmap

Default:log
Options:linear | log
--cmap <cmap>

Name of matplotlib colormap

Default:coolwarm
--vmin <vmin>

Low value of the saddleplot colorbar. Note: value in original units irrespective of used scale, and therefore should be positive for both vmin and vmax.

--vmax <vmax>

High value of the saddleplot colorbar

--hist-color <hist_color>

Face color of histogram bar chart

-v, --verbose

Enable verbose output

Arguments

COOL_PATH

Required argument

TRACK_PATH

Required argument

EXPECTED_PATH

Required argument

diamond-insulation

Calculate the diamond insulation scores and call insulating boundaries.

IN_PATH : The path to a .cool file with a balanced Hi-C map.

WINDOW : The window size for the insulation score calculations. Multiple space-separated values can be provided. By default, the window size must be provided in units of bp. When the flag --window-pixels is set, the window sizes must be provided in units of pixels instead.

cooltools diamond-insulation [OPTIONS] IN_PATH WINDOW

Options

-o, --output <output>

Specify output file name to store the insulation in a tsv format.

--ignore-diags <ignore_diags>

The number of diagonals to ignore. By default, equals the number of diagonals ignored during IC balancing.

--min-frac-valid-pixels <min_frac_valid_pixels>

The minimal fraction of valid pixels in a sliding diamond. Used to mask bins during boundary detection.

Default:0.66
--min-dist-bad-bin <min_dist_bad_bin>

The minimal allowed distance to a bad bin. Used to mask bins during boundary detection.

Default:0
--window-pixels

If set then the window sizes are provided in units of pixels.

--append-raw-scores

Append columns with raw scores (sum_counts, sum_balanced, n_pixels) to the output table.

--chunksize <chunksize>
Default:20000000
--verbose

Report real-time progress.

Arguments

IN_PATH

Required argument

WINDOW

Optional argument(s)

dump-cworld

Convert a cooler or a group of coolers into the Dekker lab’s CWorld text format.

COOL_PATHS : Paths to one or multiple .cool files.

OUT_PATH : Output CWorld file path.

cooltools dump-cworld [OPTIONS] COOL_PATHS OUT_PATH

Options

--cworld-type <cworld_type>

The format of the CWorld output. ‘matrix’ converts a single .cool file into the .matrix.txt.gz tab-separated format. ‘tar’ dumps all specified cooler files into a single .tar archive containing multiple .matrix.txt.gz files (use to make multi-resolution archives).

Default:matrix
Options:matrix | tar
--region <region>

The coordinates of a genomic region to dump, in the UCSC format. If empty (by default), dump a genome-wide matrix. This option can be used only when dumping a single cooler file.

Default:
--balancing-type <balancing_type>

The type of the matrix balancing. ‘IC_unity’ - iteratively corrected for the total number of contacts per locus=1.0; ‘IC’ - same, but preserving the average total number of contacts; ‘raw’ - no balancing

Default:IC_unity
Options:IC_unity | IC | raw

Arguments

COOL_PATHS

Optional argument(s)

OUT_PATH

Required argument

genome

Utilities for binned genome assemblies.

cooltools genome [OPTIONS] COMMAND [ARGS]...

binnify

cooltools genome binnify [OPTIONS] CHROMSIZES_PATH BINSIZE

Options

--all-names

Parse all chromosome names from file, not only default r"^chr[0-9]+$", r"^chr[XY]$", r"^chrM$".
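The effect of the default name patterns can be sketched with the standard re module. The helper filter_chromnames and the sample names are illustrative, and the real command may also reorder the matched names, which this sketch does not attempt.

```python
import re

# The three default patterns quoted in the option description.
DEFAULT_PATTERNS = (r"^chr[0-9]+$", r"^chr[XY]$", r"^chrM$")


def filter_chromnames(names, all_names=False):
    """Keep names matching a default pattern, or everything if all_names is set."""
    if all_names:
        return list(names)
    return [n for n in names if any(re.match(p, n) for p in DEFAULT_PATTERNS)]


names = ["chr1", "chr10", "chrX", "chrM", "chrUn_gl000220", "chr1_random"]
print(filter_chromnames(names))
print(filter_chromnames(names, all_names=True))
```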

Arguments

CHROMSIZES_PATH

Required argument

BINSIZE

Required argument

digest

cooltools genome digest [OPTIONS] CHROMSIZES_PATH FASTA_PATH ENZYME_NAME

Arguments

CHROMSIZES_PATH

Required argument

FASTA_PATH

Required argument

ENZYME_NAME

Required argument

fetch-chromsizes

cooltools genome fetch-chromsizes [OPTIONS] DB

Arguments

DB

Required argument

gc

cooltools genome gc [OPTIONS] BINS_PATH FASTA_PATH

Options

--mapped-only

Arguments

BINS_PATH

Required argument

FASTA_PATH

Required argument

genecov

BINS_PATH is the path to bintable.

DB is the name of the genome assembly. The gene locations will be automatically downloaded from the UCSC goldenPath.

cooltools genome genecov [OPTIONS] BINS_PATH DB

Arguments

BINS_PATH

Required argument

DB

Required argument

logbin-expected

Logarithmically bin expected values generated using compute_expected for cis data.

This smooths the data, resulting in clearer plots and more robust analysis results. Also calculates the derivative after Gaussian smoothing. For a very detailed description, see https://github.com/open2c/cooltools/blob/51b95c3bed8d00a5f1f91370fc5192d9a7face7c/cooltools/expected.py#L988

EXPECTED_PATH : The path to a .tsv file with output of compute_expected. Must include a header. Use the ‘::’ syntax to specify a summary column name.

OUTPUT_PREFIX : Output file name prefix to store the logbinned expected (prefix.log.tsv) and derivative (prefix.der.tsv) in the tsv format.

cooltools logbin-expected [OPTIONS] EXPECTED_PATH OUTPUT_PREFIX

Options

--bins-per-order-magnitude <bins_per_order_magnitude>

How many bins per order of magnitude. Default of 10 has a ratio of neighboring bins of about 1.25

Default:10
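The ratio quoted above follows from spacing bin edges evenly in log10: with 10 bins per order of magnitude, neighboring edges differ by a factor of 10^(1/10). A minimal sketch, assuming real-valued edges (the tool itself may round edges to whole diagonals) and a hypothetical helper name log_bin_edges:

```python
import math


def log_bin_edges(start, stop, bins_per_order_magnitude=10):
    """Return edges spaced evenly in log10 that cover [start, stop]."""
    ratio = 10 ** (1 / bins_per_order_magnitude)
    n_steps = math.ceil(math.log10(stop / start) * bins_per_order_magnitude)
    return [start * ratio ** i for i in range(n_steps + 1)]


edges = log_bin_edges(1, 1000)
ratios = [b / a for a, b in zip(edges, edges[1:])]
print(round(ratios[0], 4))  # 1.2589
```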
--bin-layout <bin_layout>

‘fixed’ means that bins are exactly the same for different datasets, and only depend on bins_per_order_magnitude. ‘longest_region’ means that the last bin will end at the size of the longest region. GOOD: the last bin will have as much data as possible. BAD: bin edges will end up different for different datasets, so you can’t divide them by each other.

Default:fixed
Options:fixed | longest_region
--min-nvalid <min_nvalid>

For each region, throw out bins (log-spaced) that have fewer than min_nvalid valid pixels. This will ensure that each entry in Pc by region has at least min_nvalid valid pixels. Don’t set it to zero, or it will introduce bugs. Setting it to 1 is OK, but not recommended.

Default:200
--min-count <min_count>

If counts are found in the data, then for each region, throw out bins (log-spaced) that have fewer than min_count of counts.sum (raw Hi-C counts). This will ensure that each entry in P(s) by region has at least min_count raw Hi-C reads.

Default:50
--spread-funcs <spread_funcs>

A way to estimate the spread of the P(s) curves between regions.

  • ‘minmax’ - the minimum/maximum of by-region P(s)
  • ‘std’ - weighted standard deviation of P(s) curves (may produce negative results)
  • ‘logstd’ (recommended) - weighted standard deviation in logspace

Default:logstd
Options:minmax | std | logstd
--spread-funcs-slope <spread_funcs_slope>

Same as spread-funcs, but for slope (derivative) rather than P(s).

Default:std
Options:minmax | std | logstd
--resolution <resolution>

Data resolution in bp. If provided, an additional column of separation in bp (s_bp) will be added to the outputs.

Arguments

EXPECTED_PATH

Required argument

OUTPUT_PREFIX

Required argument

random-sample

Pick a random sample of contacts from a Hi-C map, without replacement.

IN_PATH : Input cooler path or URI.

OUT_PATH : Output cooler path or URI.

Specify the target sample size with either --count or --frac.

cooltools random-sample [OPTIONS] IN_PATH OUT_PATH

Options

-c, --count <count>

The target number of contacts in the sample. The resulting sample size will not match it precisely. Mutually exclusive with --frac.

-f, --frac <frac>

The target sample size as a fraction of contacts in the original dataset. Mutually exclusive with --count.

--exact

If specified, use exact sampling that guarantees the size of the output sample. Otherwise, binomial sampling will be used and the sample size will be distributed around the target value.
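The two sampling modes can be contrasted with a toy sketch on a list of per-pixel contact counts. It illustrates the general idea only and is not how the command is implemented; the function names and the counts are invented, and expanding every contact into a label, as done here, would be far too slow for a real Hi-C map.

```python
import random


def binomial_sample(counts, frac, rng):
    """Keep each contact independently with probability frac."""
    return [sum(rng.random() < frac for _ in range(c)) for c in counts]


def exact_sample(counts, n_keep, rng):
    """Draw exactly n_keep contacts without replacement."""
    # Label every contact with the index of the pixel it belongs to.
    labels = [i for i, c in enumerate(counts) for _ in range(c)]
    sampled = [0] * len(counts)
    for i in rng.sample(labels, n_keep):
        sampled[i] += 1
    return sampled


rng = random.Random(0)
counts = [5, 0, 12, 3]
exact = exact_sample(counts, 10, rng)
binom = binomial_sample(counts, 0.5, rng)
print(sum(exact), sum(binom))
```

The exact sample always sums to the requested size, while the binomial sample only lands near frac times the total.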

-p, --nproc <nproc>

Number of processes to split the work between. [default: 1, i.e. no process pool]

--chunksize <chunksize>

The number of pixels loaded and processed per step of computation.

Default:10000000

Arguments

IN_PATH

Required argument

OUT_PATH

Required argument