While a popular strategy in de novo transcriptome assembly algorithms is to assemble the reads by obtaining a de Bruijn graph that represents the transcriptome, an additional step is needed to obtain predicted transcripts from the de Bruijn graph. A similarity search algorithm is then applied to a related organism to obtain information about possible function of these predicted transcripts. We observe...
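The de Bruijn graph mentioned above can be illustrated with a minimal sketch. This is not the authors' assembler: the function name `de_bruijn` and the toy reads are assumptions chosen for illustration, and real assemblers additionally handle sequencing errors, reverse complements, and k-mer coverage.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers.

    Hypothetical helper for illustration only. Each k-mer in a read adds an
    edge from its (k-1)-length prefix to its (k-1)-length suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Toy reads (made up for this example), overlapping by three bases.
toy_graph = de_bruijn(["ACGT", "CGTA"], k=3)
```

Predicted transcripts would then correspond to paths through such a graph, which is the additional step the abstract refers to.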
The advent of single-cell RNA sequencing (scRNA-seq) has given researchers the ability to study transcriptomic activity within individual cells, rather than across hundreds or thousands of cells as with bulk RNA-seq techniques. The greater precision afforded by scRNA-seq identifies mutations and gene expression landscapes private to individual cells or subpopulations, enabling us to determine novel...
Introduction: There exist a number of methods that attempt to reconstruct a genome from a set of scaffolds. To do so, they (i) determine the order of scaffolds; and (ii) determine the orientation (i.e., strand of origin) of scaffolds. Some methods attempt to solve these subproblems jointly by using various types of additional data including jumping libraries, long error-prone reads, homology relationships...
We have developed linear space algorithms to compute the Damerau-Levenshtein (DL) distance [1], [2] between two strings and also to find a sequence of edit operations of length equal to the DL distance (optimal trace). Our algorithms require O(s min{m, n} + m + n) space, where s is the size of the alphabet and m and n are, respectively, the lengths of the two strings. Previously known algorithms require...
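For reference, the DL distance itself can be computed with the classic dynamic-programming table. The sketch below is the textbook quadratic-space formulation, not the linear-space algorithm the abstract describes; the function name `damerau_levenshtein` is an assumption for illustration.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Unrestricted Damerau-Levenshtein distance using an O(m*n)-space table.

    Textbook formulation for illustration; NOT the linear-space method
    from the abstract above.
    """
    m, n = len(a), len(b)
    inf = m + n  # larger than any achievable distance
    # d carries one extra border row and column filled with the sentinel.
    d = [[0] * (n + 2) for _ in range(m + 2)]
    d[0][0] = inf
    for i in range(m + 1):
        d[i + 1][0] = inf
        d[i + 1][1] = i
    for j in range(n + 1):
        d[0][j + 1] = inf
        d[1][j + 1] = j

    last_row = {}  # last 1-based position in a where each character occurred
    for i in range(1, m + 1):
        last_col = 0  # last 1-based position in b that matched a[i-1]
        for j in range(1, n + 1):
            i1 = last_row.get(b[j - 1], 0)
            j1 = last_col
            cost = 1
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_col = j
            d[i + 1][j + 1] = min(
                d[i][j] + cost,      # match or substitution
                d[i + 1][j] + 1,     # insertion
                d[i][j + 1] + 1,     # deletion
                # transposition, plus edits between the swapped characters
                d[i1][j1] + (i - i1 - 1) + 1 + (j - j1 - 1),
            )
        last_row[a[i - 1]] = i
    return d[m + 1][n + 1]
```

The pair "ca" and "abc" shows the difference from the restricted (optimal string alignment) variant: the unrestricted distance is 2, via one transposition followed by one insertion.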
A crucial task for metagenomic analysis is to annotate the function and taxonomy of the sequencing reads generated from a microbiome sample. In general, the reads can either be assembled into contigs and searched against reference databases, or individually searched without assembly. The first approach may suffer due to the fragmentary and incomplete nature of nucleotide sequence assembly, while the...
Predicting drug-target interaction through simulation is an immensely important problem. It has a huge impact on drug discovery in the pharmaceutical industry. The FDA reports that it takes close to five billion dollars to introduce a new drug to the market. A slight improvement in prediction accuracy in this domain may save millions of dollars in investment, thereby lowering the cost of production...
Many biological analysis techniques require measurement of similarity between sequences from large genomic datasets, which often involves extraction of all pairs of close DNA or RNA sequences. We present a k-mer-based tool to efficiently perform such sequence similarity queries for large viral datasets produced by next-generation sequencing.
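One common way to compare sequences by their k-mers is the Jaccard index of their k-mer sets. The sketch below is a naive all-pairs version under that assumption; it is not the tool presented in the abstract, and the names `kmers`, `jaccard`, and `close_pairs`, the threshold, and the toy sequences are all hypothetical.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def close_pairs(seqs, k, threshold):
    """List (name1, name2, similarity) for pairs at or above the threshold.

    Naive O(n^2) comparison for illustration; a tool built for large
    datasets would index k-mers to avoid comparing every pair.
    """
    profiles = {name: kmers(seq, k) for name, seq in seqs.items()}
    result = []
    for x, y in combinations(sorted(profiles), 2):
        sim = jaccard(profiles[x], profiles[y])
        if sim >= threshold:
            result.append((x, y, sim))
    return result


# Toy sequences (made up for this example).
toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTTTTTTT"}
pairs = close_pairs(toy, k=4, threshold=0.5)
```

Here `s1` and `s2` share four of five distinct 4-mers, so they are the only pair reported.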
Closing gaps in draft genomes is an important post-processing step in genome assembly. At present, most assembled genomes contain gaps. Usually, genomes assembled from short reads or from hybrid data (with both short and long reads) have many more gaps than genomes assembled purely from long reads (with high coverage). A more complete genome is highly desirable since it leads to better annotation, less genotyping...
Pigs not only supply a large portion of worldwide meat consumption but also serve as a model organism in biomedical studies because of their physiological and genetic similarity to humans. However, as a diploid organism, a normal pig carries two versions of its genetic code simultaneously, creating an obstacle for many studies in the related field. For the first time in history, we...
Calibrating stochastic biochemical models against experimental insights remains a critical challenge in biological design automation. Stochastic biochemical models incorporate the uncertainty inherent in the system being modeled, thus demanding meticulous calibration techniques. We present an approach for calibrating stochastic biochemical models such that the calibrated model satisfies a given behavioral...
RNA-seq is a mature and well-established method for studying the complexity of the transcriptome in the research setting. As this method moves from the research realm to the clinical context, new opportunities for the development of bioinformatics methods arise. During this talk I will present some of the challenges we have found during our work to release a clinical test for tumor samples using RNA-seq...
Computing centrality involves finding the most “central” or important nodes in a network. Although potentially useful for biological networks, this can be challenging if the definition of importance is not obvious [1]. There are many different centrality algorithms with different importance definitions that return different results. This is immediately obvious in Figure 1(a), which shows the results...
Hepatitis C virus (HCV) usually establishes chronic infection, which is often asymptomatic at the early stages of disease. Unfortunately, no diagnostic criteria that can distinguish between recent and chronic HCV infections are available. Error-prone replication of HCV causes each patient to host a heterogeneous population of genetically related HCV variants. Therefore, it is usually supposed that...
Identification of Hepatitis C virus (HCV) infections is crucial in determining viral outbreaks. Because of its highly mutable nature, HCV tends to progress to chronic infection over time. This leads to an increase in the heterogeneous population of genetically related HCV variants in the affected individuals. To our knowledge, there are no reliable diagnostic assays for distinguishing acute and chronic...
Ribo-seq is a popular technique for studying translation and its regulation. Various software tools for data preprocessing, quality assessment, analysis, and visualization of Ribo-seq data have been developed. However, many of them are inaccessible to users without a thorough practical knowledge of software applications, and often multiple different tools have to be used in combination with each other...
The limitations of traditional Computer Aided Detection (CAD) systems for mammography, the extreme importance of early detection of breast cancer, and the high impact of false diagnoses on patients have led to the investigation of Deep Learning (DL) methods for mammograms. Deep Learning, in particular Convolutional Neural Networks (CNNs), has recently been used for object localization and detection, risk...
Intra-tumor heterogeneity is believed to be a major source of confounding analysis and treatment resistance. In this research we introduce BAMSE, a Bayesian model-based tool for intra-tumor heterogeneity analysis of bulk tumor sequencing results across multiple samples. BAMSE takes as input a list of somatic mutations and their corresponding reference and variant read counts, clusters these mutations...
Recent advances in whole-genome sequencing and SNP array data have led to the generation of large amounts of genotype data. Large volumes of genotype data will require faster and more efficient methods for storing and searching the data. The Positional Burrows-Wheeler Transform (PBWT) [1] provides an appropriate data structure for bi-allelic data. With the increasing sample sizes, more multi-allelic sites...
Reconstructing the energy landscape of a protein holds the key to characterizing its structural dynamics and function [1]. While the disparate spatio-temporal scales spanned by the slow dynamics challenge reconstruction in wet and dry laboratories, computational efforts have had recent success on proteins where a wealth of experimentally-known structures can be exploited to extract modes of motion...
Motifs are crucial patterns that have numerous applications including the identification of transcription factors and their binding sites, composite regulatory patterns, similarity between families of proteins, etc. Several motif models have been proposed in the literature. The (l, d)-motif model is one that has been widely studied. In this model, there are n input sequences and each has...