From Infancy to Adolescence: Transformation of Bioinformatics towards Systems Biology

16 August 2013

Bioinformatics, an interdisciplinary analytical field that was born to play a supportive or complementary role in molecular experimental analyses, has finally emerged as a discipline in its own right.

The rapid increase in biological data generated by high-throughput techniques created an urgent need for fast, accurate and reliable analytical methods.

Has bioinformatics moved beyond its role as a merely supportive discipline over the past decades? We would say, definitely yes.

The volume of literature regularly published on core bioinformatics work is proof of its emergence as an important discipline in its own right. In this context, some areas of bioinformatics that have grown tremendously are discussed below.

Dynamic database management systems with workbench and analytical tool support:
Biological databases have moved beyond merely depositing and retrieving experimentally or theoretically derived scientific records. Thousands of biologically important, specialized and dynamic databases come online regularly; they can be tracked in the annual database issue of Nucleic Acids Research and in other database-related journals1.

NCBI2 has also changed drastically from its initial form, together with allied databases and tools such as BioProject, BioSystems, ClinVar, Epigenomics, GEO, dbSNP and SRA. Present-day databases are also coupled with analytical tools that support comparative analysis and the retrieval of valuable information. For instance, the HIV database is a highly specialized resource on HIV that is associated with an array of analytical servers and tools.

Workbenches and software:
Bioinformatics now spans several important subdisciplines, such as sequence analysis, phylogenetic analysis and comparative genomics. To support them, many software packages have been developed for better and more robust analysis. Broadly, these tools are of two types: online servers and offline executables (programs that are installed and run locally on a computer). They can be further categorized as workbenches and specific programs. Workbenches, or suites, are large collections of tools that serve as a platform for several related analyses, for example GCG, CLC and Galaxy for sequence analysis. Specific programs serve particular analyses; for instance, PHYLIP, MEGA and PAUP are used for phylogenetic analysis.
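To make the phylogenetic case concrete: distance-based tree methods such as neighbor-joining, available in packages like PHYLIP and MEGA, start from a matrix of pairwise evolutionary distances. The following is a minimal sketch, not the code of any of those packages, that computes the proportion of differing sites (p-distance) between already aligned sequences and applies the Jukes-Cantor (JC69) correction, d = -3/4 ln(1 - 4p/3). The taxon names and sequences are hypothetical toy data.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length
    sequences; columns with a gap ('-') in either sequence are skipped."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    diffs = sum(1 for x, y in pairs if x != y)
    return diffs / len(pairs)


def jukes_cantor(p):
    """JC69 correction for multiple substitutions at the same site."""
    if p >= 0.75:
        return math.inf  # saturated: the JC69 distance is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(seqs):
    """All pairwise JC69 distances, keyed by (name1, name2)."""
    names = list(seqs)
    return {(m, n): jukes_cantor(p_distance(seqs[m], seqs[n]))
            for m in names for n in names}


# Hypothetical aligned toy sequences (not real data).
seqs = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "ACGTACGTTC",
    "taxonC": "ACCTAGGTTC",
}
dist = distance_matrix(seqs)
for (m, n), d in sorted(dist.items()):
    print(f"{m}\t{n}\t{d:.4f}")
```

The resulting matrix is the kind of input a neighbor-joining step would consume; real analyses would also choose a substitution model suited to the data rather than assuming JC69.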

High-throughput data generation in genomics, transcriptomics, proteomics, metabolomics and other "omics" fields compelled the bioinformatics community to rapidly develop large-scale, accurate and robust analytical tools with better visualization and data interpretation capabilities. Such tools include microarray and NGS data analysis platforms, machine learning platforms and systems biology workbenches.

Microarray data analysis:
Large-scale gene expression data generated by microarray techniques require extensive analytical capacity and computing efficiency. Generating data with Affymetrix or other chips is only one half of the work; the other half is the extensive and sensitive computational analysis, in which rigorous statistical calculations are performed step by step using tools such as R and Bioconductor packages. Detecting disease-causing genes or proteins among thousands of candidates is now feasible through this technology3,4. Microarray methodologies also allow large-scale genomic and proteomic analyses to be performed in less time.
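One core step in such a pipeline is ranking genes by how strongly their expression differs between conditions. The sketch below is illustrative only: it assumes intensities that are already normalized and log2-transformed, and it uses a plain Welch t-statistic. Bioconductor packages such as limma instead use moderated t-statistics and add multiple-testing correction, which this sketch omits. The gene names and values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(expr, n_case):
    """Rank genes by |t|. `expr` maps gene -> log2 intensities, where the
    first `n_case` values are case samples and the rest are controls."""
    results = []
    for gene, values in expr.items():
        case, ctrl = values[:n_case], values[n_case:]
        log2_fc = mean(case) - mean(ctrl)  # difference of log2 means = log2 fold change
        results.append((gene, log2_fc, welch_t(case, ctrl)))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


# Hypothetical normalized log2 intensities: 3 case arrays, then 3 control arrays.
expr = {
    "GENE_A": [8.1, 8.3, 8.0, 6.0, 6.2, 5.9],
    "GENE_B": [7.0, 7.1, 6.9, 7.0, 7.2, 6.8],
    "GENE_C": [5.0, 5.5, 4.8, 6.1, 6.4, 6.0],
}
ranked = rank_genes(expr, n_case=3)
for gene, fc, t in ranked:
    print(f"{gene}\tlog2FC={fc:+.2f}\tt={t:+.2f}")
```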

Next-generation sequencing (NGS) data analysis:
Like microarray analysis, NGS data analysis is a recent addition to the arsenal of bioinformatics techniques: gigabases of DNA are sequenced in parallel and analyzed using sophisticated bioinformatics protocols. This technology has laid the groundwork for implementing personalized medicine in the future. Although it is not yet cost-effective, it is now possible to sequence a patient's genome and identify disease trends through such analysis5. We hope these techniques will become available to the general public at lower cost in the near future.
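A basic quantity in almost every NGS workflow is the depth of coverage: how many aligned reads overlap each reference position, which bears on how confidently a variant can be called there. Real pipelines read alignments from BAM files with dedicated tools; this sketch instead assumes a hypothetical, simplified input of (start, length) tuples with 0-based start positions. It computes per-base depth with a difference array, so the cost is linear in the number of reads plus the reference length.

```python
from itertools import accumulate


def coverage(ref_length, alignments):
    """Per-base read depth from (start, length) alignments.

    Assumes 0-based starts inside the reference; reads running past the
    end of the reference are clipped.
    """
    diff = [0] * (ref_length + 1)
    for start, length in alignments:
        end = min(start + length, ref_length)
        diff[start] += 1  # read starts covering here
        diff[end] -= 1    # ...and stops covering here (half-open interval)
    return list(accumulate(diff[:-1]))


# Hypothetical toy alignments on a 10-base reference.
reads = [(0, 4), (2, 4), (5, 5)]
depth = coverage(10, reads)
mean_depth = sum(depth) / len(depth)
breadth = sum(1 for d in depth if d > 0) / len(depth)  # fraction of bases covered
print(depth, mean_depth, breadth)
```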

Impact of modern machine learning techniques on biological data analysis:
Machine learning techniques have emerged as powerful tools for the classification and clustering of complex and highly overlapping biological datasets6,7, and for assessing the relationships among important parameters8. Bio-inspired algorithms such as artificial neural networks (ANN)9 and ant colony optimization (ACO) are extensively used for classification and prediction tasks involving genes, proteins and other biological entities. Other techniques, such as support vector machines (SVM), decision-tree-based approaches and self-organizing maps (SOM), are also used to classify and cluster important biological data, including disease-causing genes and proteins. Integrated with experimental methods, these computational techniques may help identify previously unknown genes and proteins in humans and in disease-causing pathogens.
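As an illustration of the ANN family, the sketch below trains a single-layer perceptron, the simplest neural network, to separate two classes described by numeric features. It is not the method of the cited studies, which used other architectures and real sequence-derived features; the two features (imagined as, say, a hydrophobic-residue fraction and a normalized charge score) and all values are hypothetical, and labels are assumed to be -1 or +1.

```python
def train_perceptron(samples, labels, epochs=50, lr=0.1):
    """Classic perceptron rule: on each misclassified sample, move the
    weights towards the correct side. Labels must be -1 or +1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            pred = 1 if activation > 0 else -1
            if pred != y:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # a full pass with no mistakes: training has converged
            break
    return w, b


def predict(w, b, x):
    """Threshold the weighted sum to a class label (-1 or +1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical two-feature vectors for two protein classes (toy data).
X = [(0.9, 0.8), (0.8, 0.6), (0.7, 0.9), (0.2, 0.1), (0.3, 0.3), (0.1, 0.4)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_perceptron(X, y)
print(w, b, predict(w, b, (0.85, 0.75)), predict(w, b, (0.15, 0.2)))
```

A perceptron converges only when the classes are linearly separable, as they are in this toy set; the highly overlapping datasets mentioned above are why multilayer networks, SVMs with kernels and other nonlinear methods are used in practice.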

Systems biology: towards understanding whole organisms:
Exploring an organism completely remains out of reach even for today's sophisticated methodologies and protocols. The strength of predictive or computational approaches lies in their ability to explore questions that experimental techniques either cannot reach or can address only at great expense of money and time. Systems biology is one such discipline, raising hope of understanding a complete biochemical pathway, or even a small pathogenic organism, as a whole. Several systems biology workbenches have been developed, along with dedicated model description languages such as SBML (Systems Biology Markup Language).
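A pathway model of the kind encoded in SBML is commonly simulated as a system of ordinary differential equations. The sketch below hand-codes a hypothetical two-step pathway S → I → P with mass-action kinetics (dS/dt = -k1·S, dI/dt = k1·S - k2·I, dP/dt = k2·I) and integrates it with forward Euler. It does not read SBML, and the species names and rate constants are illustrative assumptions; SBML-aware simulators typically use adaptive, more accurate integrators.

```python
import math


def simulate_pathway(s0, k1, k2, t_end, dt=0.001):
    """Forward-Euler integration of S -> I -> P with mass-action rates."""
    s, i, p = s0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        v1 = k1 * s  # flux of S -> I
        v2 = k2 * i  # flux of I -> P
        s, i, p = s - v1 * dt, i + (v1 - v2) * dt, p + v2 * dt
    return s, i, p


# Hypothetical parameters: initial substrate 10 units, k1 = 0.5, k2 = 0.2 (per time unit).
s, i, p = simulate_pathway(s0=10.0, k1=0.5, k2=0.2, t_end=5.0)

# For this linear model S(t) has a closed form, S0 * exp(-k1 * t), which gives
# a quick check on the numerical solution.
s_exact = 10.0 * math.exp(-0.5 * 5.0)
print(f"S={s:.4f} (exact {s_exact:.4f})  I={i:.4f}  P={p:.4f}  total={s + i + p:.4f}")
```

Because every unit of flux leaving one species enters the next, S + I + P stays constant, which is a useful sanity check when building larger models.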

The journey continues:
In its present state and direction, bioinformatics provides extensive support and analytical tools to molecular bioscience while establishing itself as an independent discipline. The shift from previous decades reflects the very fast growth and strong tool-development capability of this interdisciplinary applied science. Although it is not yet possible to bring the real-time complexity of biological or medical systems into a desktop computer and simulate it in full detail, the growing availability and capability of specialized hardware, software, internet connectivity and algorithmic advances may make this possible in the near future.

1. Fernández-Suárez XM, Galperin MY. The 2013 Nucleic Acids Research Database Issue and the online Molecular Biology Database Collection. Nucleic Acids Res. 2013;41(D1):D1-D7. DOI: 10.1093/nar/gks1297.
2. Bayat A. Bioinformatics. BMJ. 2002;324:1018-22.
3. Olson NE. The microarray data analysis process: from raw data to biological significance. NeuroRx. 2006;3(3):373-83.
4. Allison DB, Cui X, Page GP, Sabripour M. Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 2006;7(1):55-65.
5. Rizzo JM, Buck MJ. Key principles and clinical applications of "next-generation" DNA sequencing. Cancer Prev Res. 2012;5:887-900.
6. Banerjee AK, Ravi V, Murty US, Shanbhag AP, Prasanna VL. Keratin protein property based classification of mammals and non-mammals using machine learning techniques. Comput Biol Med. 2013;43(7):889-99. DOI: 10.1016/j.compbiomed.2013.04.007.
7. Banerjee AK, Ravi V, Murty US, Sengupta N, Karuna B. Application of intelligent techniques for classification of bacteria using protein sequence-derived features. Appl Biochem Biotechnol. 2013; 170(6):1263-81. DOI: 10.1007/s12010-013-0268-1.
8. Banerjee AK, Manasa BP, Murty US. Assessing the relationship among physicochemical properties of proteins with respect to hydrophobicity: a case study on AGC kinase superfamily. Indian J Biochem Biophys. 2010;47(6):370-377.
9. Banerjee AK, Kiran K, Murty US, Venkateswarlu Ch. Classification and identification of mosquito species using artificial neural networks. Comput Biol Chem. 2008;32(6):442-7. DOI: 10.1016/j.compbiolchem.2008.07.020.

Competing interests: None declared

Amit Kumar Banerjee, Research Fellow

CSIR-Indian Institute of Chemical Technology, Hyderabad-500007, AP, India
