{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Compartments & Saddleplots" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Welcome to the compartments and saddleplot notebook! \n", "\n", "This notebook illustrates cooltools functions used for investigating chromosomal compartments, visible as plaid patterns in mammalian interphase contact frequency maps.\n", "\n", "These plaid patterns reflect the tendency of chromosome regions to make more frequent contacts with regions of the same type: active regions have increased contact frequency with other active regions, and inactive regions tend to contact other inactive regions more frequently. The strength of compartmentalization has been shown to vary through the cell cycle, across cell types, and after degradation of components of the cohesin complex. \n", "\n", "In this notebook we:\n", "\n", "* obtain compartment profiles using eigendecomposition\n", "* calculate and visualize the strength of compartmentalization using saddleplots" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# import standard python libraries\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "import pandas as pd\n", "import os, subprocess" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Import python packages for working with cooler files and tools for analysis\n", "import cooler\n", "import cooltools.lib.plotting" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from packaging import version\n", "if version.parse(cooltools.__version__) < version.parse('0.5.4'):\n", "    raise AssertionError(\"tutorials rely on cooltools version 0.5.4 or higher, \"+\n", "        \"please check your cooltools version and update to the latest\")" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", 
"text": [ "./data/test.mcool\n" ] } ], "source": [ "# download test data\n", "# this file is 145 Mb, and may take a few seconds to download\n", "import cooltools\n", "cool_file = cooltools.download_data(\"HFF_MicroC\", cache=True, data_dir='./data/') \n", "print(cool_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculating per-chromosome compartmentalization\n", "\n", "We first load the Hi-C data at 100 kbp resolution. \n", "\n", "Note that the current implementation of eigendecomposition in cooltools assumes that individual regions can be held in memory-- for hg38 at 100kb this is either a 2422x2422 matrix for chr2, or a 3255x3255 matrix for the full cooler here." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "clr = cooler.Cooler('./data/test.mcool::resolutions/100000')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the orientation of eigenvectors is determined up to a sign, the convention for Hi-C data anaylsis is to orient eigenvectors to be positively correlated with a binned profile of GC content as a 'phasing track'. \n", "\n", "In humans and mice, GC content is useful for phasing because it typically has a strong correlation at the 100kb-1Mb bin level with the eigenvector. In other organisms, other phasing tracks have been used to orient\n", "eigenvectors from Hi-C data. \n", "\n", "For other data analyses, different conventions are used to consistently orient eigenvectors. For example, spectral clustering as implemented in [scikit-learn](\n", "https://github.com/scikit-learn/scikit-learn/blob/03245ee3afe5ee9e2ff626e2290f02748d95e497/sklearn/utils/extmath.py#L1041) orients vectors such that the absolute maximum element of each vector is positive. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "## fasta sequence is required for calculating binned profile of GC conent\n", "if not os.path.isfile('./hg38.fa'):\n", " ## note downloading a ~1Gb file can take a minute\n", " subprocess.call('wget --progress=bar:force:noscroll https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.fa.gz', shell=True)\n", " subprocess.call('gunzip hg38.fa.gz', shell=True)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | chrom | start | end | GC |
|---|---|---|---|---|
| 0 | chr2 | 0 | 100000 | 0.435867 |
| 1 | chr2 | 100000 | 200000 | 0.409530 |
| 2 | chr2 | 200000 | 300000 | 0.421890 |
| 3 | chr2 | 300000 | 400000 | 0.431870 |
| 4 | chr2 | 400000 | 500000 | 0.458610 |
| ... | ... | ... | ... | ... |
| 3250 | chr17 | 82800000 | 82900000 | 0.528210 |
| 3251 | chr17 | 82900000 | 83000000 | 0.518530 |
| 3252 | chr17 | 83000000 | 83100000 | 0.561450 |
| 3253 | chr17 | 83100000 | 83200000 | 0.535119 |
| 3254 | chr17 | 83200000 | 83257441 | 0.473451 |
3255 rows × 4 columns
|  | chrom | start | end | name |
|---|---|---|---|---|
| 0 | chr2 | 0 | 242193529 | chr2 |
| 1 | chr17 | 0 | 83257441 | chr17 |