Improving the Accuracy of 3D Chromosome Structure Inference and Analyzing the Organization of Genome in Early Embryogenesis Using Single Cell Hi-C Data

Author Tarak Shisode
Release 2021
Genre Applied mathematics

This dissertation summarizes my graduate work on the structure and organization of the mouse genome during preimplantation development. My research is divided into three areas, which I will discuss in turn. To begin, I will discuss my collaborative work on the parental-to-embryo switch of chromosome organization during critical stages of early development. Notably, both paternal and maternal epigenomes undergo significant modifications following fertilization. Recent epigenomic studies have revealed the extraordinary chromatin landscapes found in oocytes, sperm, and early preimplantation embryos, including atypical histone modification patterns and differences in chromosome organization and accessibility. However, these studies reached opposite conclusions: the global absence of local topologically associating domains (TADs) in gametes and their appearance in the embryo, versus the pre-existence of TADs and loops in the zygote. Whether parental structures can be inherited in the newly formed embryo, and how these structures may relate to allele-specific gene regulation, remains unresolved. To address these questions, we use an optimized single-cell high-throughput chromosome conformation capture (Hi-C) protocol to map genomic interactions for each parental genome (including the X chromosome) during mouse preimplantation development. We integrate chromosome organization with allelic expression states and chromatin marks and demonstrate that after fertilization, higher-order chromatin structure is associated with an allele-specific enrichment of histone H3 lysine 27 methylation. These early parental-specific domains are associated with gene repression and contribute to parentally biased gene expression, including at newly described transiently imprinted loci. Additionally, we observe that these domains emerge in a non-parental-specific manner during the second wave of genome assembly.
Finally, we discover that these domains are lost as genes are silenced on the paternal X chromosome but persist in regions that escape X-chromosome inactivation. These findings highlight the complexities of three-dimensional genome organization and gene expression dynamics during early development. Second, I will discuss my work on common and cell type-specific themes of higher-order chromatin arrangements during mouse preimplantation development. Mapping the spatial organization of the genome is critical for comprehending its regulatory function in health, disease, and development. Our findings demonstrate an extraordinary amount of parent-specific chromosome choreography as the two genomes come together. After fertilization, we observe an abrupt emergence of a Rabl-like configuration and a high degree of head-to-head and tail-to-tail alignment of the chromosomes, both of which are gradually lost by the 64-cell stage. Additionally, the characteristics and marks of active and inactive chromatin exhibit a distinct radial profile across developmental stages and the genome. Finally, in addition to the well-known hallmarks of genome organization, we observe a preferential organization of chromosome territories, which we call the "Territome". We were able to distinguish cell types based on the radial and relative positioning of the chromosomes in the 3D reconstructions. This suggests that interchromosomal interactions are just as critical for defining chromatin architecture and cellular identity as intrachromosomal interactions. Our findings establish a novel criterion for classifying cells when other hallmarks are difficult to quantify or when transcriptomics data are unavailable, opening a new way of looking at cells and learning how they function.
Finally, with advances in experimental and theoretical approaches for generating single-cell chromatin conformation capture assays, elucidating the genome's structure-function relationship has become a highly active area of research. Numerous computational methods have been developed to infer the genome's three-dimensional organization using Hi-C data from single cells. Formally, this is referred to as the three-dimensional genome reconstruction problem (3D-GRP). While numerous methods exist for predicting the three-dimensional structure of a single genomic region, chromosome, or genome, the reconstructed models do not satisfy all of the input constraints. To address this, we present CUT & GROW, a method for improving the accuracy of three-dimensional chromosome structure inference using an iterative importance sampling strategy. CUT & GROW refines the structure of a three-dimensional chromosome (or genome) model by regrowing fragments of varying sizes locally, satisfying the majority of input constraints and providing a more precise view of the structure-function relationship.
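
The abstract's point that reconstructed models do not satisfy all input constraints can be made concrete by scoring a model against its contacts. The following is a minimal sketch, not the CUT & GROW algorithm: it only measures the fraction of observed contacts whose two beads end up close together in a 3D model. The function names, the bead representation, and the `max_dist` threshold are all hypothetical choices made for illustration.

```python
import math

def satisfied_fraction(coords, contacts, max_dist=1.5):
    """Fraction of contact pairs whose beads lie within max_dist in the model.

    coords   : list of (x, y, z) tuples, one per bead (genomic bin)
    contacts : list of (i, j) bead-index pairs observed in contact
    max_dist : assumed distance cutoff for calling a contact satisfied
    """
    if not contacts:
        return 1.0
    hits = sum(1 for i, j in contacts
               if math.dist(coords[i], coords[j]) <= max_dist)
    return hits / len(contacts)

def violated_contacts(coords, contacts, max_dist=1.5):
    """Contact pairs that the model places farther apart than max_dist."""
    return [(i, j) for i, j in contacts
            if math.dist(coords[i], coords[j]) > max_dist]

# Toy model: four beads on a line, the last one placed far away.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
contacts = [(0, 1), (1, 2), (0, 3)]

print(satisfied_fraction(model, contacts))
print(violated_contacts(model, contacts))
```

A refinement method of the kind described above would aim to raise this fraction, for example by re-sampling the region around the violated pair (0, 3).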

Hi-C Data Analysis

Author Silvio Bicciato
Publisher Humana
Release 2022-09-04
Genre Science
ISBN 9781071613924

This volume details a comprehensive set of methods and tools for Hi-C data processing, analysis, and interpretation. Chapters cover applications of Hi-C to address a variety of biological problems, with a specific focus on state-of-the-art computational procedures adopted for the data analysis. Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, lists of the necessary materials and reagents, step-by-step, readily reproducible laboratory protocols, and tips on troubleshooting and avoiding known pitfalls. Authoritative and cutting-edge, Hi-C Data Analysis: Methods and Protocols aims to help computational and molecular biologists working in the field of chromatin 3D architecture and transcription regulation.

HiC-Pro: an Optimized and Flexible Pipeline for Hi-C Data Processing

Author Oldenburg Oldenburg Press
Pages 40
Release 2016-01-29
ISBN 9781523764426

HiC-Pro is an optimized and flexible pipeline for processing Hi-C data from raw reads to normalized contact maps. HiC-Pro maps reads, detects valid ligation products, performs quality controls and generates intra- and inter-chromosomal contact maps. It includes a fast implementation of the iterative correction method and is based on a memory-efficient data format for Hi-C contact maps. In addition, HiC-Pro can use phased genotype data to build allele-specific contact maps. We applied HiC-Pro to different Hi-C datasets, demonstrating its ability to easily process large data in a reasonable time. Source code and documentation are available at http://github.com/nservant/HiC-Pro.
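
The iterative correction step mentioned above can be illustrated with a small matrix-balancing routine. This is a plain-Python sketch of the general idea (repeatedly dividing a symmetric contact matrix by its normalized row sums until all rows have equal coverage); it is not HiC-Pro's implementation, and the function name, parameters, and toy data are assumptions made for this example.

```python
def iterative_correction(counts, max_iter=200, tol=1e-10):
    """Balance a symmetric contact matrix so that all row sums are equal.

    Returns (balanced_matrix, bias), where counts[i][j] is approximately
    bias[i] * bias[j] * balanced_matrix[i][j].
    """
    n = len(counts)
    m = [[float(v) for v in row] for row in counts]
    bias = [1.0] * n
    for _ in range(max_iter):
        sums = [sum(row) for row in m]
        covered = [s for s in sums if s > 0]
        if not covered:
            break  # empty matrix, nothing to balance
        mean = sum(covered) / len(covered)
        # Rows with no reads keep a neutral factor of 1.
        factors = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= factors[i] * factors[j]
        bias = [b * f for b, f in zip(bias, factors)]
        if max(abs(f - 1.0) for f in factors) < tol:
            break  # row sums are already (nearly) equal
    return m, bias

# Toy data: a balanced "true" map distorted by per-bin biases 1, 2 and 4.
true_map = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
b = [1, 2, 4]
observed = [[b[i] * b[j] * true_map[i][j] for j in range(3)] for i in range(3)]

balanced, bias = iterative_correction(observed)
row_sums = [sum(row) for row in balanced]
print(row_sums)
print([x / bias[0] for x in bias])
```

On this toy input the balanced matrix is proportional to `true_map`, and the recovered bias vector is proportional to the biases that were applied. Production tools add filtering of low-coverage bins and sparse storage, which this sketch omits.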

High-resolution Computational Analysis of Chromatin Architecture and Function

Author Christopher Cameron
Release 2019

Since sequencing the human genome in the early 2000s, researchers have been determined to define the genetic pathways that regulate cellular activity or lead to disease. With the recent advent of Chromosome Conformation Capture (3C) technologies, the ability to observe chromatin’s three-dimensional (3D) structure became a possibility. It quickly became apparent that the genome is not regulated in one dimension, but in 3D, where chromatin loops are formed between an enhancer(s) and promoter to regulate a gene’s transcription. While 3C technology is quite useful, most protocols are limited in their resolution and availability across cell types and genomes. This limited resolution is a common concern for many technologies that study the regulation of genomes, such as Chromatin Immunoprecipitation (ChIP), and typically results from low-coverage sequencing. The objective of this thesis is to develop computational and biochemical methodologies that provide accurate, high-resolution genomic data for deciphering the organization and regulation of genomes. The first contribution in this thesis is Hi-C Interaction Frequency Inference (HIFI), a collection of density estimation algorithms for High-throughput 3C (Hi-C) data. Hi-C is a particularly useful 3C technology that identifies chromatin contacts genome-wide. HIFI allows Hi-C data to be analyzed at the highest possible resolution (restriction fragments) while providing the most accurate estimation of chromatin contact frequency when compared to other techniques in the field. The higher resolution afforded by HIFI has led to the discovery of a potential role for active promoters and enhancers at the boundaries of Topologically Associating Domains (TADs). Next, we developed machine learning approaches to predict chromatin interaction frequencies from the reference genome sequence alone.
While some machine learning work has been done to predict Hi-C data, all these models rely on biochemical input to make their predictions, which makes them impossible to use in cases where this data is unavailable (e.g., computationally inferred ancestral genomes). By limiting model input to features derived from sequence only, our predictions enable us to identify sequence determinants of 3D genome organization. Finally, we present a targeted and affordable ChIP methodology, called ‘Carbon Copy-ChIP’ (2C-ChIP), that continues our foray into high-resolution chromatin assays. 2C-ChIP provides quantifiable measures of bound protein across the genome at a cost that makes it very attractive for studies involving multiple experimental conditions (e.g., drug design). We also describe a computational tool for processing 2C-ChIP products called the Ligation-mediated Amplified, Multiplexed Paired-end Sequence (LAMPS) analysis pipeline. Taken together, the work in this thesis provides new ways to study genome function and organization affordably and at high resolution.

Investigating Chromosome Dynamics Through Hi-C Assembly

Author Lyam Baudry
Release 2019

The advent of high-throughput DNA sequencing technologies has set off an expanding trend in genome assembly and scaffolding. High-quality assemblies are an essential prerequisite for understanding interactions between and among chromosomes. We built upon a computational and technological framework that lets us tackle genome assembly problems of increasing complexity. Our methods are mainly based on chromosome conformation capture technologies such as Hi-C. In a Hi-C experiment, DNA molecules are cross-linked with the surrounding proteins and form a large, static protein-DNA complex. This captures the spatial conformation by trapping together molecules that are physically close to each other. Therefore, Hi-C is very suitable for 3D genome structure analysis, which lets us infer a wealth of information about the genome. It was indeed shown that the three-dimensional structure of the genome can be unambiguously linked to its 1D structure thanks to the physical properties of DNA polymers. Moreover, such 3D proximity also gives access to cell compartment information, thus opening the way for an additional approach to metagenomic binning, known as meta3C. In this work, we expand upon these methods and apply them to use cases of increasing complexity. We first improve on tools for genome assembly and demonstrate their validity with the scaffolding of Ectocarpus sp., then unveil rearrangements in joint scaffoldings of Trichoderma reesei and Cataglyphis hispanica. Lastly, we use the same approach with metagenomic binning on live mouse microbiome samples to reconstruct hundreds of genomes.

Inference of 3D Structure of Diploid Chromosomes

Author Lawrence J. Sun
Pages 62
Release 2018

The spatial organization of DNA in the cell nucleus plays an important role in gene regulation, DNA replication, and genomic integrity. Through the development of chromosome conformation capture experiments (such as 3C, 4C, Hi-C) it is now possible to obtain the contact frequencies of the DNA at the whole-genome level. In this thesis, we study the problem of reconstructing the 3D organization of the genome from whole-genome contact frequencies. A standard approach is to transform the contact frequencies into noisy distance measurements and then apply semidefinite programming (SDP) formulations to obtain the 3D configurations. However, neglected in such reconstructions is the fact that most eukaryotes, including humans, are diploid and therefore contain two copies of each genomic locus that are indistinguishable in the available data. Because of this, the standard approach performs very poorly on diploid organisms. We prove that the 3D organization of the DNA is not identifiable from chromosome conformation capture data alone for diploid organisms. In fact, there are infinitely many solutions even in the noise-free setting. We then discuss various additional biologically relevant constraints (including distances between neighboring genomic loci and to the nucleus center, or higher-order interactions). Under these conditions we prove that there are finitely many solutions and conjecture that the structure is in fact identifiable. Finally, we provide SDP formulations for computing the 3D embedding of the DNA with these additional constraints and show that we can recover the true 3D embedding with high accuracy even under noise.
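
Two ideas in this abstract lend themselves to a toy illustration. The sketch below first converts contact frequencies into distances using an inverse power law, which is a common modelling assumption and not necessarily the transformation used in the thesis, and then shows how summing over indistinguishable homologs can make two different diploid contact patterns look identical. It does not reproduce the thesis's proof or its SDP formulation, and every function name, the exponent `alpha`, and the matrices are hypothetical.

```python
def contacts_to_distances(contacts, alpha=1.0):
    """Map contact counts to distances with d = c ** (-alpha) (assumed model).

    Pairs with zero contacts get None, meaning no distance estimate.
    """
    n = len(contacts)
    dist = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                dist[i][j] = 0.0
            elif contacts[i][j] > 0:
                dist[i][j] = contacts[i][j] ** (-alpha)
    return dist

def collapse_homologs(diploid, n_loci):
    """Sum a 2n x 2n homolog-resolved matrix into the n x n observed matrix.

    Index i is one copy of locus i and index i + n_loci is the other copy.
    """
    observed = [[0] * n_loci for _ in range(n_loci)]
    for i in range(n_loci):
        for j in range(n_loci):
            observed[i][j] = sum(diploid[i + a][j + b]
                                 for a in (0, n_loci)
                                 for b in (0, n_loci))
    return observed

# Two loci, two homologs each. In `cis`, contacts stay within a homolog;
# in `trans`, the same counts occur between the two homologs.
cis = [[0, 3, 0, 0],
       [3, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 0]]
trans = [[0, 0, 0, 3],
         [0, 0, 1, 0],
         [0, 1, 0, 0],
         [3, 0, 0, 0]]

observed_cis = collapse_homologs(cis, 2)
observed_trans = collapse_homologs(trans, 2)
print(observed_cis, observed_trans)
print(contacts_to_distances(observed_cis, alpha=0.5))
```

Both configurations collapse to the same observed matrix, so any distance estimate derived from it cannot tell them apart without extra constraints of the kind the abstract lists.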

Introduction to Single Cell Omics

Author Xinghua Pan
Publisher Frontiers Media SA
Pages 129
Release 2019-09-19
ISBN 2889459209

Single-cell omics is an advancing frontier that stems from the sequencing of the human genome and the development of omics technologies, particularly genomics, transcriptomics, epigenomics and proteomics, with sensitivity now improved to the single-cell level. The new generation of methodologies, especially next-generation sequencing (NGS) technology, plays a leading role in genomics-related fields; however, conventional omics techniques require a large number of cells, usually on the order of millions, which is hardly attainable in some cases. More importantly, harnessing the power of omics technologies and applying them at the single-cell level is crucial, since every cell is unique and almost every cell population in every system, whether derived in vivo or in vitro, is heterogeneous. Deciphering the heterogeneity of the cell population hence becomes critical for understanding the mechanism and significance of the system. However, without an extensive examination of individual cells, a bulk analysis of a cell population gives only an average output of the cells and neglects the differences among them. Single-cell omics seeks to study many individual cells in parallel across different dimensions of their molecular profiles on a genome-wide scale, providing unprecedented resolution for interpreting both the structure and function of an organ, tissue or other system, as well as the interaction (and communication) and dynamics of single cells or subpopulations of cells and their lineages. Importantly, single-cell omics enables the identification of a minor subpopulation of cells that may play a more critical role than the dominant subpopulation in a biological process, such as a cancer or a developing organ. It provides an ultra-sensitive tool for clarifying specific molecular mechanisms and pathways and revealing the nature of cell heterogeneity.
Besides, it also empowers clinical investigation when only a very small number of cells is available for analysis, such as noninvasive cancer screening with circulating tumor cells (CTC), noninvasive prenatal diagnostics (NIPD) and preimplantation genetic testing (PGT) for in vitro fertilization. Single-cell omics greatly promotes the understanding of life at a more fundamental level, bringing vast applications in medicine. Accordingly, single-cell omics is also called single-cell analysis or single-cell biology. Within only a couple of years, single-cell omics, especially transcriptomic sequencing (scRNA-seq) and whole-genome and exome sequencing (scWGS, scWES), has become robust and broadly accessible. Beyond the existing technologies, multiplexed barcode design and combinatorial indexing, either combined with a microfluidic platform such as Drop-seq or independent of microfluidics and using a regular PCR plate, have recently given us a greater capacity for single-cell analysis, moving from one single cell to thousands of single cells in a single test. Unique molecular identifiers (UMIs) allow the amplification bias among the original molecules to be corrected faithfully, resulting in reliable quantitative measurement of omics in single cells. Of late, a variety of single-cell epigenomic analyses have become sophisticated, particularly single-cell chromatin accessibility (scATAC-seq) and CpG methylation profiling (scBS-seq, scRRBS-seq). High-resolution single-molecule fluorescence in situ hybridization (smFISH) and its more advanced versions (e.g., seqFISH and MERFISH), in addition to spatial transcriptome sequencing, allow the native relationships among the individual cells of a tissue to be clarified visually and quantitatively in 3D or 4D. On the other hand, CRISPR/Cas9 editing-based in vivo lineage tracing methods enable the dynamic profile of a whole developmental process to be accurately displayed.
Multi-omics analysis facilitates the study of multi-dimensional regulation and the relationships among different elements of the central dogma in a single cell, and permits a clear dissection of the complicated omics heterogeneity of a system. Last but not least, technical noise, biological noise, sequence dropout, and batch effects pose a huge challenge to the bioinformatics of single-cell omics. While significant progress in data analysis has been made, revolutionary theories and algorithms for single-cell omics are still anticipated. Indeed, single-cell analysis exerts considerable impact on fields of biological study, particularly cancer, neurons and the nervous system, stem cells, embryonic development and the immune system; beyond that, it also strongly motivates pharmaceutical R&D, clinical diagnosis and monitoring, as well as precision medicine. This book summarizes the recent developments and general considerations of single-cell analysis, with a detailed presentation of selected technologies and applications. Starting with experimental design for single-cell omics, the book then emphasizes the consideration of heterogeneity in cancer and other systems. It also gives an introduction to the basic methods and key facts for bioinformatic analysis. Second, the book provides a summary of two types of popular technologies, the fundamental tools for single-cell isolation and the developments in single-cell multi-omics, followed by descriptions of FISH technologies; other popular technologies are not covered because they have recently been described extensively elsewhere. Finally, the book illustrates an elastomer-based integrated fluidic circuit that connects single-cell functional studies, combining stimulation, response, imaging and measurement, with the corresponding single-cell sequencing. This is a model system for single-cell functional genomics.
In addition, it reports a pipeline for single-cell proteomics with an analysis of the early development of the Xenopus embryo, a single-cell qRT-PCR application that defined the subpopulations related to cell cycling, and a new method for synergistic assembly of single-cell genomes with sequencing of the amplification products of phi29 DNA polymerase. Because of the tremendous progress of single-cell omics in recent years, the topics covered here are incomplete, but each individual topic is thoroughly addressed and should be of interest and benefit to scientists working in or affiliated with this field.
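
The description above notes that unique molecular identifiers (UMIs) allow amplification bias to be corrected. A minimal sketch of that idea follows: reads that share the same cell, gene, and UMI are treated as PCR copies of one original molecule and counted once. The function name and the read tuples are hypothetical, and the sketch assumes error-free UMI sequences, which real pipelines do not.

```python
from collections import defaultdict

def umi_counts(reads):
    """Count distinct UMIs per (cell, gene) pair.

    reads: iterable of (cell_barcode, gene, umi) tuples, one per sequenced read.
    Duplicate (cell, gene, umi) triples are collapsed to a single molecule.
    """
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}

reads = [
    ("cell1", "GeneA", "AACG"),
    ("cell1", "GeneA", "AACG"),  # PCR duplicate of the read above
    ("cell1", "GeneA", "AACG"),  # another duplicate
    ("cell1", "GeneA", "TTGC"),  # a second original molecule
    ("cell1", "GeneB", "GGAT"),
    ("cell2", "GeneA", "AACG"),  # same UMI, different cell: counted separately
]

counts = umi_counts(reads)
print(counts)
```

Here four reads for GeneA in cell1 reduce to two molecules, so a heavily amplified molecule no longer inflates the expression estimate.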