From a computational point of view, deoxyribonucleic acid (DNA) is a long string over a four-letter alphabet, {A, C, G, T}. From a biological point of view, DNA is the hereditary molecule in almost all living things.

It contains the instructions for the growth, functioning and reproduction of an organism. DNA is made up of four nucleotides: adenine (A), guanine (G), cytosine (C) and thymine (T). DNA has a double-stranded structure in which the two strands are held together by pairing between their bases, A with T and C with G. Each such pair is called a base pair, and a series of base pairs makes up a DNA fragment. The procedure of determining the sequence of nucleotides in a DNA fragment is called sequencing.
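The pairing rule above (A with T, C with G) means that one strand fully determines the other. The following is a minimal sketch of that idea; the names `COMPLEMENT`, `complement` and `reverse_complement` are hypothetical helpers chosen for illustration, and the input is assumed to contain only the uppercase letters A, C, G and T.

```python
# Pairing rule from the text: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the complement read in the opposite direction.

    The two strands of DNA run in opposite directions, so the partner
    strand is conventionally written as the reversed complement.
    """
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check on the pairing table.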

It is not feasible to read the entire sequence of a genome at once using present technologies, which can only sequence small DNA fragments of a few hundred nucleotides. In 1977, Frederick Sanger developed the sequencing technology now known as Sanger sequencing. Although this technology has been steadily improved over 30 years, it can only sequence about 500 to 1000 bp of DNA at once, and producing reads with it is slow and costly. The Sanger technique dominated genome sequencing for over two decades and made several milestones possible, including the completion of the human genome sequence. Sanger sequencing is generally known as the first-generation sequencing technique (Sanger et al., 1977).

The Human Genome Project (HGP) was an international research project that started in 1990 with the goal of sequencing the entire human genome. The first draft of the human genome was announced in 2000, and the project was finished in 2003, when the first complete human genome was published. About one third of the HGP sequences were produced using the Sanger sequencing technique. Sanger sequencing also played an important role in obtaining the DNA sequence of the mouse (Collins et al., 2001).

DNA sequencing has driven major advances in science and technology. It is widely used in applied fields such as medicine, genetic engineering and food science (Sperber et al., 2008).

At present, Next Generation Sequencing (NGS) is the most advanced DNA sequencing technology, offering greater accuracy and speed than the earlier Sanger sequencing (Buermans et al., 2014). Paired-end sequencing in NGS, which sequences each DNA fragment from both ends in the forward and reverse directions, has further increased accuracy and made it possible to detect indels (insertions and deletions) that single-end sequencing could not (Grimm et al., 2013).

Next generation sequencing produces millions of short reads, and assembling them without a reference genome is one of the most challenging tasks for de novo assemblers (Shendure et al., 2008). In the past few years, several de novo sequence assembly algorithms have been developed to handle this large amount of data and assemble it into contigs, but choosing the appropriate assembler for paired-end or single-end data is still a challenging job (Baker et al., 2012). The currently available assembly algorithms include de Bruijn graph (DBG), overlap-layout-consensus (OLC), string graph, greedy and hybrid algorithms (Miller et al., 2010). DBG is a graph algorithm based on k-mers: it splits the short reads into shorter k-mers, each of which overlaps the next k-mer by k-1 nucleotides. Dividing the sequences into smaller pieces also helps to mitigate the problem of differing initial read lengths. OLC is also a graph-based algorithm; it builds an overlap graph by finding overlaps between similar sequences (Kang et al., 2013). Finding the overlapping sequences is usually the slowest part of the assembly; the overlaps found are then used to bundle stretches of the overlap graph into contigs.
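To make the k-mer idea concrete, here is a minimal sketch of how a de Bruijn graph can be built from reads. It assumes one common formulation in which each k-mer becomes an edge from its (k-1)-letter prefix to its (k-1)-letter suffix; the text does not say which formulation the cited assemblers use. The function names `kmers` and `de_bruijn_graph` are hypothetical, and the sketch ignores sequencing errors, reverse complements and k-mer counts.

```python
from collections import defaultdict


def kmers(read: str, k: int) -> list[str]:
    """Split a read into its overlapping k-mers, in order.

    Consecutive k-mers share k-1 letters. A read shorter than k
    yields an empty list.
    """
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, set[str]]:
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            # Each k-mer is one edge: prefix node -> suffix node.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


print(kmers("ATGGC", 3))  # ['ATG', 'TGG', 'GGC']

# Two overlapping reads contribute to one shared path AT -> TG -> GG -> GC -> CA.
graph = de_bruijn_graph(["ATGGC", "TGGCA"], 3)
for node, successors in graph.items():
    print(node, "->", sorted(successors))
```

Because every read is reduced to k-mers of the same length, reads of different lengths all feed into the same graph, which is the property the paragraph above refers to.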

The DBG algorithm is fast, while the OLC algorithm performs better on longer sequence reads. The string graph algorithm is a variant of OLC that simplifies the global overlap graph by eliminating unnecessary sequences (Li et al., 2012). Greedy algorithms start by joining the short reads that overlap best to produce contigs. Most greedy assemblers use heuristics designed to prevent misassembly of repetitive sequences (Pop et al., 2002). Hybrid assembly refers to combining various assembly algorithms. It is used to reduce the number of contigs and errors produced by the other algorithms (Koren et al., 2012).
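The greedy strategy described above, joining the reads that overlap best, can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the method of any cited assembler: `overlap` and `greedy_assemble` are hypothetical names, only exact suffix-prefix matches of at least `min_len` letters count as overlaps, and none of the repeat-handling heuristics mentioned in the text are included. The all-pairs overlap search is quadratic in the number of reads, which is why overlap finding is the slow step in overlap-based assembly.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no exact overlap of at least min_len letters exists.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the best-overlapping pair until no pair overlaps."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        # Join the pair, writing the shared letters only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


print(greedy_assemble(["ATGGCA", "GCATTC", "TTCGGA"]))  # ['ATGGCATTCGGA']
```

On repetitive input the greedy choice can join reads that come from different copies of a repeat, which is the misassembly problem the heuristics in real greedy assemblers are meant to address.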