New Algorithm and Software (BNOmics) for Inferring and Visualizing Bayesian Networks from Heterogeneous Big Biological and Genetic Data.

Bayesian network (BN) reconstruction is a prototypical systems biology data analysis approach that has been successfully used to reverse engineer and model networks reflecting different layers of biological organization (ranging from genetic to epigenetic to cellular pathway to metabolomic).

This is particularly relevant in the context of modern (ongoing and prospective) studies that generate heterogeneous, high-throughput omics datasets. However, there are both theoretical and practical obstacles to the seamless application of BN modeling to such big data, including the computational inefficiency of optimal BN structure search algorithms, ambiguity in data discretization, mixed data types, imputation and validation, and, in general, limited scalability in both the reconstruction and the visualization of BNs.

To overcome these and other obstacles, we present BNOmics, an improved algorithm and software toolkit for inferring and analyzing BNs from omics datasets. BNOmics aims at comprehensive systems biology-style data exploration, including both generating new biological hypotheses and testing and validating existing ones. Novel aspects of the algorithm center on improving scalability and on applicability to varying data types (with different explicit and implicit distributional assumptions) within the same analysis framework.
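
As background for the scalability discussion, a minimal Python sketch of the general idea behind score-based BN structure learning (greedy hill climbing with a BIC-penalized local score over discretized data) is given below. This is not the BNOmics algorithm or code; the data layout, scoring function, and search strategy are illustrative assumptions only.

    # Minimal, illustrative sketch of score-based BN structure search over
    # discretized data (NOT the BNOmics implementation; names are hypothetical).
    import math

    def bic_family_score(data, child, parents, arity):
        """BIC-style local score of `child` given a candidate parent set,
        for fully discretized data (rows are dicts: variable -> category)."""
        n = len(data)
        counts = {}
        for row in data:
            key = (tuple(row[p] for p in parents), row[child])
            counts[key] = counts.get(key, 0) + 1
        parent_totals = {}
        for (pa, _), c in counts.items():
            parent_totals[pa] = parent_totals.get(pa, 0) + c
        loglik = sum(c * math.log(c / parent_totals[pa]) for (pa, _), c in counts.items())
        n_params = (arity[child] - 1) * math.prod(arity[p] for p in parents)
        return loglik - 0.5 * math.log(n) * n_params

    def creates_cycle(parent_sets, child, new_parent):
        """True if adding new_parent -> child would close a directed cycle
        (i.e. child is already an ancestor of new_parent)."""
        stack, seen = [new_parent], set()
        while stack:
            node = stack.pop()
            if node == child:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(parent_sets[node])
        return False

    def greedy_structure_search(data, variables, arity):
        """Greedy hill climbing: repeatedly add the single edge that most
        improves the penalized score, until no addition helps."""
        parent_sets = {v: [] for v in variables}
        while True:
            best = None
            for child in variables:
                base = bic_family_score(data, child, parent_sets[child], arity)
                for cand in variables:
                    if cand == child or cand in parent_sets[child]:
                        continue
                    if creates_cycle(parent_sets, child, cand):
                        continue
                    gain = bic_family_score(data, child, parent_sets[child] + [cand], arity) - base
                    if gain > 0 and (best is None or gain > best[0]):
                        best = (gain, child, cand)
            if best is None:
                return parent_sets
            _, child, cand = best
            parent_sets[child].append(cand)

The exhaustive scan over candidate edges in this toy version is exactly the kind of step whose cost grows quickly with the number of variables, which is why scalable search heuristics matter for omics-sized datasets.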

An output and visualization interface to widely available graph-rendering software is also included. Three diverse applications are detailed. BNOmics was originally developed in the context of genetic epidemiology data and is being continuously optimized to keep pace with the increasing influx of available large-scale omics datasets.

Thus, scalability and usability of the software on less-than-exotic computer hardware are a priority, as is the applicability of the algorithm and software to heterogeneous datasets containing many data types: single-nucleotide polymorphisms and other genetic/epigenetic/transcriptomic variables, metabolite levels, epidemiological variables, endpoints, phenotypes, and so on.
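
To make the mixed-data-type point concrete, the sketch below shows one common way to bring continuous measurements (e.g., metabolite levels) and already-discrete genotypes into a shared discrete representation via quantile binning. The binning scheme and variable names are assumptions for illustration, not the discretization strategy used by BNOmics.

    # Illustrative quantile binning of a continuous variable so it can sit
    # alongside discrete genotypes in one analysis (hypothetical names/values).

    def quantile_bins(values, n_bins=3):
        """Assign each continuous value to a quantile-based bin (0 .. n_bins-1)."""
        ranked = sorted(values)
        cutpoints = [ranked[int(len(ranked) * k / n_bins)] for k in range(1, n_bins)]
        return [sum(v > c for c in cutpoints) for v in values]

    # Toy mixed-type data: SNP genotypes are already discrete (0/1/2 minor-allele
    # counts); a metabolite level is continuous and is binned into low/mid/high.
    snp_genotypes = [0, 1, 2, 1, 0, 2, 1, 1]
    metabolite = [0.2, 1.5, 3.1, 0.9, 0.4, 2.8, 1.1, 1.9]
    discretized = {
        "SNP_rs0000": snp_genotypes,                 # hypothetical SNP identifier
        "metabolite_X": quantile_bins(metabolite),   # hypothetical metabolite
    }
    print(discretized)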

Predicting gene regulatory networks by combining spatial and temporal gene expression data in Arabidopsis root stem cells.

Identifying the transcription factors (TFs) and associated networks involved in stem cell regulation is key to understanding the initiation and growth of plant tissues and organs. Although many TFs have been shown to have a role in the Arabidopsis root stem cells, a comprehensive view of the transcriptional signature of the stem cells is lacking. In this work, we use spatial and temporal transcriptomic data to predict interactions among the genes involved in stem cell regulation.

To accomplish this, we transcriptionally profiled several stem cell populations and developed a gene regulatory network inference algorithm that combines clustering with dynamic Bayesian network inference. We leveraged the topology of our networks to infer potential major regulators.
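
A minimal sketch of the lag-based dependence assumption underlying first-order dynamic Bayesian network inference is shown below, assuming time-series expression profiles. It is not the authors' algorithm (which also involves clustering); the gene names, toy values, and lag-1 correlation scoring are illustrative assumptions only.

    # Illustrative lag-1 regulator scoring for time-series expression data
    # (a simplification of the dependence structure a first-order dynamic
    # Bayesian network assumes; not the published inference algorithm).

    def pearson(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        vx = sum((a - mx) ** 2 for a in x) ** 0.5
        vy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (vx * vy) if vx and vy else 0.0

    def lagged_regulator_scores(expression, target, candidates):
        """Score each candidate regulator of `target` by how well its expression
        at time t tracks the target's expression at time t+1."""
        target_next = expression[target][1:]          # target at times 1..T
        scores = {}
        for reg in candidates:
            reg_now = expression[reg][:-1]            # regulator at times 0..T-1
            scores[reg] = abs(pearson(reg_now, target_next))
        return scores

    # Toy usage: expression of three hypothetical genes over five time points.
    expression = {
        "PAN":   [0.1, 0.4, 0.9, 1.2, 1.3],
        "geneA": [0.0, 0.2, 0.5, 1.0, 1.4],
        "geneB": [1.0, 0.9, 0.8, 0.7, 0.6],
    }
    print(lagged_regulator_scores(expression, "geneA", ["PAN", "geneB"]))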

Specifically, through mathematical modeling and experimental validation, we identified PERIANTHIA (PAN) as an important molecular regulator of quiescent center function. The results presented in this work show that our combination of molecular biology, computational biology, and mathematical modeling is an efficient approach to identifying candidate factors that function in the stem cells.

Training bioinformaticians in High Performance Computing.

In recent decades, bioinformatics has become an indispensable branch of modern scientific research, experiencing an explosion in funding, application development, and data generation. The growth of datasets emerging from research laboratories, industry, the healthcare sector, and elsewhere increases the demand for computational power and storage.

Processing large-scale biological datasets often requires the use of High Performance Computing (HPC) resources, especially when dealing with certain types of omics data, such as genomic and metagenomic data. Such computational resources not only require substantial investment, but they also involve high maintenance costs.

More importantly, to maintain a good return on investment, specialized training should be put in place to ensure that waste is minimized. Moreover, given that bioinformatics is a highly interdisciplinary field in which several other domains intersect (such as biology, chemistry, physics, and computer science), researchers from these areas also require training in HPC bioinformatics in order to fully exploit supercomputing centers.

In this paper, we describe our experience in training researchers from several different disciplines in HPC as applied to bioinformatics, in the context of Europe’s leading bioinformatics platform, ELIXIR, and we analyze both the content and the outcome of the course.

Functional classification of protein structures by local structure matching in graph representation.

As a result of high-throughput protein structure initiatives, over 14,400 protein structures have been solved by Structural Genomics (SG) centers and participating research groups. While the totality of SG data represents an outstanding contribution to genomics and structural biology, reliable functional information for these proteins is generally lacking. Better functional predictions for SG proteins would add great value to the structural information already obtained.

Our method described herein, Graph Representation of Active Sites for Prediction of Function (GRASP-Func), quickly and accurately predicts the biochemical function of proteins by representing residues at the predicted local active site as a graph rather than in Cartesian coordinates.
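
The sketch below illustrates, in Python, the general idea of encoding an active site as a graph of residue contacts and comparing two sites without reference to absolute Cartesian coordinates. It is not the GRASP-Func implementation; the 8 Å distance cutoff and the Jaccard-style similarity are assumptions made purely for illustration.

    # Illustrative graph encoding of an active site: nodes are residue types,
    # edges connect residues closer than a distance cutoff. NOT GRASP-Func code;
    # cutoff and similarity measure are assumptions for illustration.
    from math import dist

    def site_graph(residues, cutoff=8.0):
        """Build a labeled edge set from (residue_name, (x, y, z)) tuples:
        two residues are connected if their coordinates lie within `cutoff`."""
        edges = set()
        for i, (name_i, xyz_i) in enumerate(residues):
            for name_j, xyz_j in residues[i + 1:]:
                if dist(xyz_i, xyz_j) <= cutoff:
                    edges.add(tuple(sorted((name_i, name_j))))   # order-independent pair
        return edges

    def site_similarity(edges_a, edges_b):
        """Jaccard similarity of the two labeled edge sets (coordinate-free)."""
        if not edges_a and not edges_b:
            return 1.0
        return len(edges_a & edges_b) / len(edges_a | edges_b)

    # Toy usage: two hypothetical active sites (residue type and position in Å).
    site1 = [("HIS", (0.0, 0.0, 0.0)), ("ASP", (3.0, 0.0, 0.0)), ("SER", (0.0, 4.0, 0.0))]
    site2 = [("HIS", (1.0, 1.0, 0.0)), ("ASP", (4.5, 1.0, 0.0)), ("GLY", (9.0, 9.0, 0.0))]
    print(site_similarity(site_graph(site1), site_graph(site2)))

Because the comparison operates on the contact graph rather than raw coordinates, it is insensitive to rigid-body rotation and translation of the site, which is the main appeal of a graph representation.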

We compared the GRASP-Func method to our previously reported method, Structurally Aligned Local Sites of Activity (SALSA), using the Ribulose Phosphate Binding Barrel (RPBB), 6-Hairpin Glycosidase (6-HG), and Concanavalin A-like Lectin/Glucanase (CAL/G) superfamilies as test cases. In each superfamily, SALSA and the faster GRASP-Func method produce similar correct classifications of previously characterized proteins, providing a validated benchmark for the new method. In addition, we analyzed SG proteins using our SALSA and GRASP-Func methods to predict function.

Forty-one SG proteins in the RPBB superfamily, nine SG proteins in the 6-HG superfamily, and an SG protein in the CAL/G superfamily were successfully classified into one of the functional families within their respective superfamily by both methods. This improved, faster, validated computational method can yield more reliable predictions of function that can be used for a wide variety of applications by the community.

Iron Hack – A symposium/hackathon focused on porphyrias, Friedreich’s ataxia, and other rare iron-related diseases.

Background: The basic science and clinical research communities at the University of South Florida (USF) intersected to support a multifaceted approach around a common goal on rare iron-related diseases.

We proposed a modified version of the National Center for Biotechnology Information (NCBI) hackathon model to take full advantage of local expertise in building “Iron Hack,” a hackathon focused on rare diseases. As the collaborative, problem-solving nature of hackathons tends to attract participants from very diverse backgrounds, the organizers also hosted a symposium on rare iron-related diseases, specifically porphyrias and Friedreich’s ataxia, that was open to the general public.

Methods: The hackathon was structured to begin each day with presentations by expert clinicians, genetic counselors, and researchers focused on molecular and cellular biology, public/global health, genetics/genomics, computational biology, bioinformatics, biomolecular science, bioengineering, and computer science, as well as guest speakers from the American Porphyria Foundation (APF) and the Friedreich’s Ataxia Research Alliance (FARA), to inform participants of the human impact of these diseases.

Results: As a result of this hackathon, we developed resources that are relevant not only to these specific disease models, but also to other rare diseases and to general bioinformatics problems. Within two and a half days, the Iron Hack participants successfully completed collaborative projects to visualize data, build databases, improve rare disease diagnosis, and study rare disease inheritance.

Conclusions: The purpose of this manuscript is to demonstrate the utility of the hackathon model for generating prototypes of generalizable tools for a given disease and for training clinicians and data scientists to interact more effectively.

DataPackageR: Reproducible data preprocessing, standardization and sharing using R/Bioconductor for collaborative data analysis.

A central tenet of reproducible research is that published scientific results should be accompanied by the underlying data and the software code needed to reproduce and verify the findings. A number of tools and software packages have been released that facilitate such work-flows, and scientific journals increasingly demand that code and primary data be made available with publications.

There has been little practical advice, however, on implementing reproducible research work-flows for large ‘omics’ or systems biology data sets used by teams of analysts working together. In such cases, it is important to ensure that all analysts use the same version of a data set for their analyses. However, instantiating relational databases and standard operating procedures can be unwieldy, with high “startup” costs and poor adherence when the procedures deviate substantially from an analyst’s regular workflow.

Ideally, a reproducible research workflow should fit naturally into an individual’s existing work-flow, with minimal disruption. Here, we provide an overview of how we have leveraged popular open source tools, including Bioconductor, Rmarkdown, git version control, and R, and in particular R’s package system combined with a new tool, DataPackageR, to implement a lightweight reproducible research workflow for preprocessing large data sets, suitable for sharing among small-to-medium sized teams of computational scientists.

Our main contribution is the DataPackageR tool, which decouples time-consuming data processing from data analysis while leaving a traceable record of how raw data are processed into analysis-ready data sets. The tool ensures that the packaged data objects are documented, performs checksum verification of the data objects along with basic package version management, and, importantly, leaves a record of the data processing code in the form of package vignettes. Our group has been using this workflow to manage, analyze, and report on pre-clinical immunological trial data from multi-center, multi-assay studies for the last three years.
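
DataPackageR itself is an R package, so the following Python sketch is only a language-agnostic illustration of the checksum-verification idea described above (a digest recorded when the analysis-ready data are built, then re-checked by analysts later); the file names and manifest format are assumptions and do not reflect DataPackageR’s actual API.

    # Illustrative checksum manifest for processed data files, so that a team can
    # confirm everyone is analyzing the same data version. Hypothetical file names;
    # not DataPackageR code.
    import hashlib
    import json
    from pathlib import Path

    def file_checksum(path):
        """MD5 fingerprint of a processed data file."""
        return hashlib.md5(Path(path).read_bytes()).hexdigest()

    def write_manifest(data_files, manifest="data_manifest.json"):
        """Record a checksum per analysis-ready data file when the data are built."""
        digest = {str(p): file_checksum(p) for p in data_files}
        Path(manifest).write_text(json.dumps(digest, indent=2))
        return digest

    def verify_manifest(manifest="data_manifest.json"):
        """Confirm that the data files on disk still match the recorded checksums."""
        digest = json.loads(Path(manifest).read_text())
        mismatches = [p for p, h in digest.items() if file_checksum(p) != h]
        if mismatches:
            raise ValueError(f"Data objects changed since the recorded build: {mismatches}")
        return True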