Changes in the genomic sequence may influence gene expression and protein function and, consequently, the phenotype of the organism. Next Generation Sequencing provides a large amount of data that can be used to predict single nucleotide variants between an analyzed genome and a reference genome. Herein we compare three variant prediction tools: Freebayes, the GATK toolkit and DeepVariant. Predictions were made with each program on cucumber lines and the results were compared. Our analysis indicates that, to obtain more precise and reliable variant predictions, it is worthwhile to use more than one program for detecting polymorphisms and to cross-check the results.
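The cross-checking idea can be illustrated with a minimal sketch. It assumes that each caller's output has already been reduced to (chromosome, position, reference allele, alternative allele) tuples; the function name `consensus_variants`, the `min_callers` threshold and the toy records are hypothetical and are not taken from the study.

```python
from collections import Counter


def consensus_variants(callsets, min_callers=2):
    """Return variants reported by at least `min_callers` call sets.

    `callsets` maps a caller name to an iterable of
    (chrom, pos, ref, alt) tuples (an assumed, simplified representation).
    """
    support = Counter()
    for variants in callsets.values():
        # Converting to a set lets each caller vote at most once per variant.
        support.update(set(variants))
    return {variant for variant, n in support.items() if n >= min_callers}


# Toy, made-up records for illustration only.
calls = {
    "freebayes": [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")],
    "gatk": [("chr1", 100, "A", "G"), ("chr2", 40, "G", "A")],
    "deepvariant": [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")],
}
shared = consensus_variants(calls, min_callers=2)
```

Raising `min_callers` to the number of callers keeps only variants on which all programs agree, which trades sensitivity for precision.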
Comparative genomics is a rapidly evolving science, driven by the growing amount of genome sequence information available in databases. A simple comparison of general genome features, such as genome size, number of genes and chromosome number, provides an entry point into comparative genomic analysis. Here we present the utility of the new tool genomecmp for finding rearrangements between compared sequences and its applications in plant comparative genomics.
Genome sequencing is at the core of genomic research. With the development of NGS and the falling cost of the procedure, another bottleneck has emerged: genome assembly. Developing a proper tool for this task is essential, as the quality of the genome has an important impact on further research. Here we present a comparison of several de Bruijn assemblers tested on C. sativus genomic reads. The assessment shows that the newly developed software, dnaasm, provides better results in terms of quantity and quality. The number of generated sequences is lower by 5-33%, with an N50 up to two-fold higher. A quality check showed that dnaasm generated reliable results. This provides a very strong basis for future genomic analysis.
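For readers unfamiliar with the N50 statistic, the following is a minimal sketch of the usual definition: the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the total assembly length. The function name `n50` and the example lengths are illustrative and are not data from the comparison.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half of the
        # total assembly length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Made-up contig lengths: total 290, half 145, reached at the 70 bp contig.
example = n50([80, 70, 50, 40, 30, 20])
```

Fewer, longer contigs raise the N50, which is why it is commonly read alongside the number of generated sequences.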
Laser Capture Microdissection (LCM) is a microscopic sample preparation method that enables isolation of a cell or cell population of interest from human, animal or plant tissue. This technique makes it possible to obtain a pure sample from a heterogeneous mixture. Material of appropriate quality for genomic, transcriptomic, proteomic and metabolomic research can be obtained from the isolated cells. We used LCM to study flower morphogenesis and the organization and development of specific bud organs. Gene expression levels in developing flower buds of male (B10) and female (2gg) lines were analyzed with qPCR. Expression was checked for stamen and carpel primordia obtained with LCM and for whole flower buds at successive stages of growth.
The development of next generation sequencing has opened the possibility of using sequencing in various plant studies, such as finding structural changes and small polymorphisms between and within species. Most analyses rely on genomic sequences, so it is crucial to use well-assembled genomes of high quality and completeness. Herein we compare commonly available programs for genome assembly with newly developed software, dnaasm. Assemblies were tested on cucumber (<i>Cucumis sativus</i> L.) lines obtained by <i>in vitro</i> regeneration (somaclones) and showing different phenotypes. The results show that the dnaasm assembler is a good tool for short-read assembly, yielding genomes of high quality and completeness.
The application of genomic approaches may serve as an initial step in understanding the complexity of the biochemical networks and cellular processes responsible for the regulation and execution of many developmental tasks. The molecular mechanism of sex expression in cucumber has not yet been elucidated. A differential expression study was conducted to identify genes involved in sex determination and floral organ morphogenesis. Herein, we present the generation of expressed sequence tags (ESTs) obtained by differential hybridization (DH) and a subtraction technique (cDNA-DSC), together with their characteristic features, such as molecular function, involvement in biological processes, expression and mapping position in the genome.
Integrated CuGene is an easy-to-use, open-source, on-line tool that can be used to browse, analyze, and query genomic data and annotations. It places annotation tracks beneath genome coordinate positions, allowing rapid visual correlation of different types of information. It also allows users to upload and display their own experimental results or annotation sets. An important feature of the application is the ability to find similarities between sequences using four algorithms of differing accuracy. The presented tool was tested on real genomic data and is extensively used by the Polish Consortium of Cucumber Genome Sequencing.
Three cDNA clones were used to screen the cucumber genome in order to find genes and proteins. Functional annotation reveals that they are associated with ubiquitination pathways. Various bioinformatics tools were used to examine protein sequence features such as the presence of specific domains, transmembrane regions, cleavage sites and cellular localization. Computational analysis of the promoter regions shows many binding sites for transcription factors that could regulate the expression of the genes. To check gene expression levels in developing flower buds of monoecious (B10) and gynoecious (2gg) cucumber lines, the real-time PCR technique was applied. Expression was checked for whole buds and separately for the 3<sup>rd</sup> and 4<sup>th</sup> whorls of the bud, where the generative organs form, which were obtained by the Laser Capture Microdissection (LCM) technique.
Real-time quantitative polymerase chain reaction (qPCR) is considered the most reliable method for gene expression studies. However, the expression of a target gene can be misinterpreted because of improper normalization. Therefore, a crucial step in the analysis of qPCR data is the selection of suitable reference genes, which should be validated experimentally. To choose genes with stable expression in the designed experiment, we performed a reference gene expression analysis. This study used genes described in the literature and novel genes predicted as control genes on the basis of <i>in silico</i> analysis of transcriptome data. Analysis with the geNorm and NormFinder algorithms made it possible to rank the candidate genes and indicate the best references for the flower morphogenesis study. According to the results, <i>CACS</i> and <i>CYCL</i> showed the most stable expression, whereas <i>TUA</i> and <i>EF</i> were the least suitable.
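How a stability ranking of this kind can be derived is sketched below. The sketch assumes the gene-stability measure M as it is commonly described for geNorm: for each candidate gene, the average over all other candidates of the standard deviation of the log2 expression ratios across samples, with lower M meaning more stable expression. It is not the geNorm software itself, it does not cover NormFinder, and the function name `genorm_m`, the gene labels and the values are hypothetical.

```python
import math
from statistics import mean, stdev


def genorm_m(expr):
    """Compute a geNorm-style stability value M for each candidate gene.

    `expr` maps a gene name to relative expression quantities on a linear
    scale, one value per sample, with samples in the same order for all genes.
    """
    m_values = {}
    for gene, values in expr.items():
        pairwise_sd = []
        for other, other_values in expr.items():
            if other == gene:
                continue
            # Log2 ratio of the two genes in every sample.
            log_ratios = [math.log2(a / b) for a, b in zip(values, other_values)]
            # Pairwise variation: sample standard deviation of those ratios.
            pairwise_sd.append(stdev(log_ratios))
        m_values[gene] = mean(pairwise_sd)
    return m_values


# Made-up quantities for three hypothetical candidates in three samples.
expression = {
    "gene_a": [1.0, 2.0, 4.0],
    "gene_b": [2.0, 4.0, 8.0],
    "gene_c": [1.0, 1.0, 8.0],
}
stability = genorm_m(expression)
ranking = sorted(stability, key=stability.get)  # most stable first
```

In this toy input `gene_a` and `gene_b` keep a constant ratio across samples, so they receive the lowest M, while `gene_c` varies relative to both and ranks last.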
Two <i>Arabidopsis thaliana</i> genes from the PP2C family of protein phosphatases (<i>AtABI1</i> and <i>AtABI2</i>) were used to find orthologous genes in the <i>Cucumis sativus</i> L. cv. Borszczagowski (cucumber) genome. Cucumber has been used as a model plant for sex expression studies because although it has been defined as a monoecious species, numerous genotypes are known to produce only female, only male, or hermaphroditic flowers. We identified two new orthologous genes of <i>AtABI1</i> and <i>AtABI2</i> in the cucumber genome and named them <i>CsABI1</i> and <i>CsABI2</i>. To determine the relationships between the regulation of <i>CsABI1</i> and <i>CsABI2</i> and flower morphogenesis in cucumber, we performed various computational analyses to define the structure of the genes, and to predict regulatory elements and protein motifs in their sequences. We also performed an expression analysis to identify differences in the expression levels of <i>CsABI1</i> and <i>CsABI2</i> in vegetative and generative tissues (leaf, shoot apex, and flower buds) of monoecious (B10) and gynoecious (2gg) cucumber lines. We found that the expressions of <i>CsABI1</i> and <i>CsABI2</i> differed in male and female floral buds, and correlated these findings with the abscisic acid signaling pathways in male and female flowers.
An important computational challenge is finding regulatory elements in promoter regions. In this work we present the advantages and disadvantages of applying different bioinformatics programs to localize transcription factor binding sites in the upstream regions of genes connected with sex determination in cucumber. We use PlantCARE, PlantPAN and SignalScan to find motifs in the promoter regions. The results are compared and the possible functions of selected motifs are described.
New sequencing methods, collectively called Next Generation Sequencing, make it possible to obtain a vast amount of data in a short time. These data require structural and functional annotation. Functional identification and characterization of predicted proteins can be done by in silico approaches, thanks to the numerous computational tools now available. However, predictions of protein function need to be confirmed, either by comparing the results of different programs or experimentally. Here we present a bioinformatics pipeline for structural and functional annotation of proteins.
Identifying structural variations is crucial for obtaining comprehensive knowledge of genomic differentiation. The massive
amounts of data generated by present technologies compel researchers to use computational methods for variation
discovery in genomes. Here we review state-of-the-art methods and software used for structural variation discovery,
focusing on results and on identifying the remaining challenges and possible future solutions.
The recent rapid development of next generation sequencing (NGS) technologies has had a significant impact on the field
of genomics, enabling many <i>de novo</i> sequencing projects for new species that were previously
limited by technological costs. The advancement of NGS also required adjustments to assembly programs.
New algorithms must cope with massive amounts of data within reasonable time limits, and processing power
and hardware are also important factors. In this paper, we address the pipeline for de novo genome
assembly offered by programs presently available to scientists as both commercial and open-source software. The
main focus of our discussion is the implementation of four different approaches (Greedy, Overlap-Layout-Consensus (OLC),
De Bruijn and Integrated) and the resulting variation in performance, with additional insight into the correction of
short and long reads.
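As an illustration of the De Bruijn approach named above, the following minimal sketch builds a graph in which every k-mer found in the reads becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. It is a teaching example under simplifying assumptions (error-free reads, a single strand, no graph simplification) and does not reproduce any particular assembler; the function name `de_bruijn_graph` and the toy reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a De Bruijn graph as an adjacency list.

    Nodes are (k-1)-mers; each k-mer occurrence in a read adds one edge
    from its prefix to its suffix, so repeated k-mers give repeated edges.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph


# Two overlapping toy reads; with k=3 the shared k-mer CGT appears twice.
graph = de_bruijn_graph(["ACGT", "CGTA"], k=3)
```

An assembler would then look for paths through such a graph to reconstruct contigs; the edge multiplicities kept here are one simple source of coverage information.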
A major focus of a sequencing project is to identify genes in the genome. However, it is first necessary to define the types of genes and the criteria for identifying them. In this work we present the discrepancies and dependencies arising from the application of different bioinformatics programs for structural annotation to the cucumber data set from the Polish Consortium of Cucumber Genome Sequencing. We used <i>Fgenesh</i>, <i>GenScan</i> and <i>GeneMark</i> for automated structural annotation, and the results were compared with the reference annotation.