
The advent of chromosome conformation capture (3C)-based techniques coupled with next-generation sequencing (NGS) has dramatically improved our understanding of chromosomal organization (Fig. 1). A series of 3C-derived sequencing methods, which involve crosslinking, enzymatic digestion of chromosomes, and proximity ligation, have enabled high-throughput, genome-wide detection of contact frequencies between genomic loci. These include Circular Chromosome Conformation Capture (4C) [5] (one to all), Chromosome Conformation Capture Carbon Copy (5C) [6] (many to many), capture Hi-C (cHi-C) [7], Proximity Ligation-Assisted ChIP-Seq (PLAC-seq) [8], Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) [9] (many to all), Hi-C [10] and Micro-C [11] (all to all). All of these 3C-based techniques, except for Micro-C, use restriction enzymes to achieve the desired resolution. In principle, a more frequent cutter will yield higher resolution. However, even with a 4-cutter enzyme, which recognizes a tetranucleotide sequence, finer contacts between genomic loci, such as E-P interactions, are rarely captured. In Micro-C, resolution is improved to the level of individual nucleosomes (147 bp) by using micrococcal nuclease (MNase) in the digestion step. Accordingly, structural information from Micro-C is broadly consistent with that from Hi-C but offers greater power for detecting short-range associations, including E-P interactions. Several studies have used 3C-based techniques to interrogate host and, when applicable, viral chromosomal architecture upon infection, but none have reported data at nucleosomal resolution.
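The relationship between cutter frequency and resolution can be illustrated with a back-of-the-envelope calculation. The sketch below assumes an idealized random sequence with equal base frequencies, in which a recognition site of length k occurs on average once every 4^k bp; the function name is hypothetical, and real genomes deviate from this expectation.

```python
def expected_fragment_length(site_length: int) -> int:
    """Mean spacing (bp) between recognition sites of the given length.

    Assumption: random sequence with equal A/C/G/T frequencies, so a
    specific k-mer occurs on average once every 4**k bp.
    """
    if site_length < 1:
        raise ValueError("site_length must be a positive integer")
    return 4 ** site_length


four_cutter_bp = expected_fragment_length(4)  # tetranucleotide site
six_cutter_bp = expected_fragment_length(6)   # hexanucleotide site
print(four_cutter_bp, six_cutter_bp)
```

Under this idealization, a 4-cutter yields fragments averaging 256 bp and a 6-cutter 4096 bp, so even the more frequent cutter produces fragments larger than a single 147-bp nucleosome.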

Fig. 1. Schematic representation of assays for studying chromosome architecture. For 3C-based methods, nuclei are first treated with appropriate fixatives (e.g., formaldehyde or DSG). In GAM, cryosections are cut from paraformaldehyde-fixed and sucrose-embedded samples. In 3C, 4C and 5C, fixed nuclei are treated with restriction enzymes and ligated, and the ligation frequency is measured by PCR or NGS. For Hi-C, PLAC-seq, and cHi-C, chromosomal DNA is digested by restriction enzymes, whereas Micro-C uses MNase for finer resolution. The digested DNA ends are repaired with biotin-labeled nucleotides, followed by blunt-end ligation. The ligated biotin-labeled contacts are sheared and purified with streptavidin beads prior to NGS. In PLAC-seq and cHi-C, antibody pull-down or RNA oligo-mediated DNA pull-down, respectively, is performed for target enrichment. In the SPRITE method, crosslinked chromatin is fragmented by sonication, each interacting complex is uniquely tagged by multiple rounds of split-pool barcoding, and the final material is sequenced. In the GAM method, the DNA content of cryosections is extracted, fragmented, and sequenced. Appropriate computational analysis of the sequencing data from each approach is necessary to detect physical interactions between genomic loci.

Ligation-independent methods that retain a crosslinking step have also been introduced to assess 3D genomic organization, including split-pool recognition of interactions by tag extension (SPRITE) [12] and genome architecture mapping (GAM) [13]. SPRITE relies on the sequencing of barcoded DNA following multiple rounds of splitting and pooling, such that each interacting chromatin complex is expected to carry a unique barcode. The interaction map is then determined from the DNA segments that share the same barcode. In the GAM technique, interaction information is derived from micro-dissected slices of nuclei in fixed cells. Because DNA loci in close proximity have a higher probability of being captured in the same slice, the frequency with which genomic regions are detected together across many slices is used to build the interaction map. Currently, there are no published studies using SPRITE or GAM to interrogate viral and host genomic organization.
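The logic of both ligation-independent approaches can be sketched in a few lines. This is a minimal illustration rather than either method's published pipeline: the function names, input formats, and toy data are hypothetical, and the normalization steps that real analyses apply are omitted.

```python
from collections import Counter
from itertools import combinations


def sprite_contacts(reads):
    """Count pairwise contacts from (barcode, locus) pairs.

    Assumption: loci sharing a barcode belonged to one crosslinked complex.
    """
    clusters = {}
    for barcode, locus in reads:
        clusters.setdefault(barcode, set()).add(locus)
    contacts = Counter()
    for loci in clusters.values():
        # Every pair of loci within one cluster counts as one contact.
        for a, b in combinations(sorted(loci), 2):
            contacts[(a, b)] += 1
    return contacts


def gam_cosegregation(slices):
    """Fraction of slices in which each pair of loci is detected together."""
    counts = Counter()
    for loci in slices:
        for a, b in combinations(sorted(set(loci)), 2):
            counts[(a, b)] += 1
    n_slices = len(slices)
    return {pair: count / n_slices for pair, count in counts.items()}


# Toy data (illustrative only): loci are labeled A, B, C.
toy_reads = [("bc1", "A"), ("bc1", "B"),
             ("bc2", "A"), ("bc2", "B"), ("bc2", "C"),
             ("bc3", "C")]
toy_slices = [["A", "B"], ["A", "B", "C"], ["C"], ["A"]]

sprite_map = sprite_contacts(toy_reads)
gam_map = gam_cosegregation(toy_slices)
print(sprite_map[("A", "B")], gam_map[("A", "B")])
```

In the toy data, loci A and B share two barcodes and co-occur in two of four slices, so both maps rank the A-B pair above the A-C and B-C pairs.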

In some cases, the viral components of NGS datasets have not yet been analyzed. For example, lymphoblastoid cell lines (LCLs), which are widely used immortalized B cells, are established by infection with the human herpesvirus (HHV) EBV. Because of the convenience of immortalization, the HapMap project [14] used LCLs to amass genotype and gene expression data. In addition, the extensively characterized tier 1 ENCODE cell lines include an LCL designated GM12878 [15]. In LCLs, ~1% of mappable sequencing reads from RNA-seq, ChIP-seq, and other NGS datasets can be aligned to the EBV genome [16]. Thus, a substantial amount of NGS data pertaining to epigenetic profiles and chromosome organization has been collected for the EBV genome through HapMap, ENCODE, and subsequent studies [17, 18]. By virtue of these publicly available datasets, EBV chromatin biology has been relatively well studied compared to that of other viruses.
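As a rough illustration of how the viral share of an existing dataset might be estimated, the sketch below computes the fraction of mapped reads assigned to viral contigs from a table of per-contig read counts. The function name, the contig label "chrEBV", and the counts are hypothetical placeholders, not values from any cited dataset.

```python
def viral_read_fraction(mapped_counts, viral_contigs):
    """Return the fraction of mapped reads that align to viral contigs.

    mapped_counts: dict of contig name -> number of mapped reads.
    viral_contigs: set of contig names treated as viral (an assumption
    about how the reference was built).
    """
    total = sum(mapped_counts.values())
    if total == 0:
        return 0.0
    viral = sum(count for contig, count in mapped_counts.items()
                if contig in viral_contigs)
    return viral / total


# Made-up counts chosen so that the viral share is 1%.
example_counts = {"chr1": 600_000, "chr2": 390_000, "chrEBV": 10_000}
fraction = viral_read_fraction(example_counts, {"chrEBV"})
print(f"{fraction:.2%}")
```

Per-contig counts of this kind can be obtained from an alignment file with standard tools, provided that the viral genome was included in the reference used for mapping.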