Introduction:
In recent times, the amount of data generated and collected by the Human Genome Project and by sequencing projects for other organisms has grown rapidly. This growth has in turn increased the demand for methods and tools to analyze and interpret biological data. The evolving science of bioinformatics provides the processes and tools needed to conduct this analysis (Bayat, 2002). A number of dedicated bioinformatics tools and successful applications already exist. Despite these applications, bioinformatics is still considered a science in its infancy, and the process of integrating it extensively into disciplines such as computer science, biology, chemistry, medicine, mathematics and physics has only just begun (Jena, Aqel, Srivastava & Mahanti, 2009).
Bioinformatics is an emerging field of study that applies computational and analytical tools to capture and interpret biological data. According to Luscombe, Greenbaum and Gerstein (2001), bioinformatics is the application of computational techniques to understanding and organizing information on biological macromolecules. The field is interdisciplinary, drawing on biology, computer science, mathematics and physics, and it is essential for data management in medicine and biology. Because of the volume of data and analysis involved, computers have become indispensable in biological research: they can manipulate large quantities of data seamlessly and can also be used to examine the complex dynamics observed in nature (Bayat, 2002).
The seemingly unexpected relationship between biology and computer science stems from the fact that life is itself an information technology. Genes determine an organism's physiology, so at the most basic level the genetic information in an organism can be compared to digitally encoded information in a computer. There have also been major technological advances in the areas that supply the raw data; it has been estimated that a single experimental laboratory can easily produce over 100 gigabytes of data in a day. The information technology field has kept pace with this output through major advances in Internet technologies, central processing unit (CPU) power and storage capacity, which have led to faster computing, more efficient data access and transmission, and better storage (Luscombe, Greenbaum, & Gerstein, 2001).
This paper presents a brief history of bioinformatics, the aims and objectives of bioinformatics research, the current state of the field, its processes, tools and applications, industry challenges, and emerging trends and future expectations.
Historical perspective of Bioinformatics:
Bioinformatics has been widely associated with large databases containing structural, sequence and functional information on genes and proteins. It can be argued that the history of bioinformatics can be traced back to Gregor Mendel's discovery of genetic inheritance in 1865. In a stricter sense, bioinformatics research began in the 1960s, symbolized by Margaret Oakley Dayhoff's early modeling analyses of RNA and protein structures and by her Atlas of Protein Sequence and Structure. These early works represent the two origins of bioinformatics, namely evolution and biochemistry, which still define its core topics today (Jena, Aqel, Srivastava & Mahanti, 2009).
Bioinformatics continued to evolve and emerged as a field in its own right in the mid-1970s, after automated DNA and protein sequencing became possible. Its practical application, however, began in the mid-to-late 1980s, when computers started to serve as centralized information repositories that could be accessed remotely (Nature Biotechnology, 2000).
Early implementations of bioinformatics included the Staden Package, which was used for DNA sequence analysis, and PROPHET in the United States. PROPHET was conceived as a national computing resource for life-science research, tailored to the data analysis and management needs of life scientists in fields ranging from molecular biology to pharmacology. It provided an integrated graphical environment with functions for data manipulation and analysis, handling of molecular structures, biological simulation models and graphing, as well as facilities for comparing and contrasting nucleic acid and protein sequences (Nature Biotechnology, 2000).
In the late 1980s, IntelliGenetics developed the PC/GENE software package, which could translate a given gene sequence into the protein sequence it encodes and offered database comparisons and secondary structure predictions. In the early 1990s, Amos Bairoch launched PROSITE, a database of protein sequence and structure correlations. In 1991, Bairoch launched SWISS-PROT, the first full version of a protein sequence databank. Elsewhere in the world, new databases were being created, together with software for accessing and analyzing their data. The development of these systems was driven mainly by academic scientists, researchers and large research centers such as the NIH in the United States. Also in the early 1990s, SWISS-2DPAGE, a proteomics-oriented database, was developed; it contained data on 2D PAGE (two-dimensional polyacrylamide gel electrophoresis) protein maps from both diseased and healthy tissues. In the early and mid-1990s, the Internet became the main means of remote communication, largely owing to the development of web servers such as ExPASy and networking tools such as WAIS and Gopher. Through these technological advances, bioinformatics took root as a field and grew into what it is today (Nature Biotechnology, 2000).
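The core operation attributed to PC/GENE above, translating a gene's DNA sequence into the protein it encodes, can be sketched in a few lines. The example below is a minimal illustration that assumes the standard genetic code and a clean A/C/G/T sequence read in a single forward frame; it is not PC/GENE's implementation, and the function name `translate` is hypothetical.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...),
# with '*' marking the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, stop_at_stop: bool = True) -> str:
    """Translate a DNA coding sequence (forward frame 0) into one-letter amino acids.

    Assumes only A/C/G/T characters; a trailing partial codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*" and stop_at_stop:
            break  # a stop codon ends translation
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # MAIVMGR
```

Real tools also handle alternative genetic codes, ambiguous bases and all six reading frames; the sketch deliberately omits these.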
Current state of bioinformatics research:
A wide variety of bioinformatics tools is currently freely accessible over the Internet. According to the BioCat listing run by the European Bioinformatics Institute (EBI), based near Cambridge, UK, more than 500 bioinformatics tools had been listed by the year 2000. Over the last decade, the number of tools has grown rapidly, owing both to the influx of software developers into the life sciences and to the growing number of life scientists who have begun to appreciate and acquire software development skills.
The ability to collect, store, classify, analyze, interpret and convey biological information derived from functional analysis and sequencing projects is so significant in modern biotechnology that stakeholders, including companies, governments, funding bodies and scientists in general, are collaborating in unprecedented ways to carry out this task. They do so because it has long been widely held that the long-term value of bioinformatics lies more in converting knowledge into better therapeutic applications than in the tools themselves. Consequently, there are concerted efforts to standardize bioinformatics tools and to make them as usable as possible.
A classic example of this standardization approach is the Bio-standards project developed by the EBI. The project, jointly funded by the European Commission, the EBI and several pharmaceutical companies, covers the development and adoption of software tools and databases that meet both existing and new standards. The multiyear project also provides education and training.
The National University of Singapore's Bioinformatics Centre illustrates that bioinformatics is strategic in nature and can be practiced anywhere in the world as long as there is Internet access. This flexibility allows contributions from places that would have been unlikely participants in a purely lab-based technology. In 2000, the university reported on its efforts to develop an integrated database tool able to access different, mutually incompatible databases simultaneously. This is a clear example of how bioinformatics has grown into a globally acknowledged discipline.
The United States National Institutes of Health (NIH) is known for its large, long-term financial support of bioinformatics projects such as the Entrez search and retrieval system, which receives more than 50,000 queries a day. The NIH is under constant pressure to increase its bioinformatics funding, both because it recognizes the importance of bioinformatics research and because of pressure from competing entities.
The market size and potential of bioinformatics have become difficult to assess, mainly because most resources and tools are available free of charge. However, strategic alliances between companies that focus on biochips, pharmacogenomics and genomics have spurred significant market growth, because these firms rely heavily on efficient applications of bioinformatics in their operations. Indeed, major biotechnology and pharmaceutical companies whose operations involve genomics have established their own internal bioinformatics programs and capabilities.
Aims and Objectives of bioinformatics:
Bioinformatics has three main aims. The first is to organize biological data in ways that allow researchers to access existing information easily and to submit new entries as they are generated. For example, the Protein Data Bank (PDB) stores 3D structural data on biological macromolecules. While data storage is essential, the stored information is of little use until it is analyzed and interpreted. The second aim is therefore to develop methods, tools and other resources that support data analysis. For example, after sequencing a chosen protein, a researcher will want to compare the new sequence with previously determined sequences. This task requires advanced search methods and software tools such as FASTA ("FAST-All") and the Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST), which must encode a definition of what constitutes a biologically meaningful match. Developing such tools requires both substantial computing expertise and a deep understanding of biological concepts. The third aim of bioinformatics is to use these tools and resources to analyze data and interpret the results in a biologically meaningful way (Luscombe, Greenbaum, & Gerstein, 2001).
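Comparing a new sequence against earlier results rests on sequence alignment. FASTA and PSI-BLAST rely on fast heuristics and statistical scoring; the sketch below instead shows exact Smith-Waterman local alignment, the rigorous method that such heuristics approximate. It is a minimal illustration with an assumed, deliberately simple scoring scheme (match +2, mismatch -1, linear gap penalty -1), not the scoring used by either tool, and the stored sequences in the usage lines are hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1] and b[j-1];
    # row 0 and column 0 stay 0 so an alignment may start anywhere.
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap     # gap in b
            left = H[i][j - 1] + gap   # gap in a
            # Clamping at 0 is what makes the alignment local rather than global.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


# Usage: rank a small, hypothetical set of stored sequences against a new query.
stored = {"seq1": "TTTACGTTT", "seq2": "GGGGGGGG", "seq3": "ACGTACGT"}
query = "ACGT"
ranking = sorted(stored, key=lambda name: smith_waterman(query, stored[name]), reverse=True)
print(ranking)
```

The quadratic table is why exhaustive search of large databases is slow, and why heuristic tools such as FASTA and BLAST were developed.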
Before the advent of bioinformatics, biological studies involved detailed examination of individual systems, often compared with only a few related systems. Bioinformatics now allows researchers to analyze data globally, uncover principles that are common across different systems and highlight novel features (Luscombe, Greenbaum, & Gerstein, 2001).
Tasks and Applications of bioinformatics:
Many biological problems fall within the scope of bioinformatics research, and most of them involve the study of genes, nucleic acid structure prediction, molecular design with docking, and protein sequence analysis. Broadly classified, bioinformatics tasks include mapping genes to chromosomes, finding genes and identifying promoters in DNA sequences, aligning and comparing protein, RNA and DNA sequences, predicting DNA and RNA structures, and predicting and classifying protein structures. Other tasks include interpreting gene expression and microarray data, identifying gene regulatory networks, molecular design with docking, and constructing the phylogenetic trees used to study evolutionary relationships (Gaurav, Kumar & Nigam, 2012).
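Distance-based phylogenetic methods start from a matrix of pairwise distances between aligned sequences. The sketch below is a minimal illustration that assumes the sequences are already aligned to equal length and uses the simple p-distance (the fraction of differing sites) rather than a model-corrected evolutionary distance; the sequence names are hypothetical. A clustering method such as UPGMA would join the closest pair first and then repeat on the reduced matrix.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for every unordered pair of named sequences."""
    return {
        (m, n): p_distance(seqs[m], seqs[n])
        for m, n in combinations(sorted(seqs), 2)
    }


def closest_pair(seqs: dict[str, str]) -> tuple[str, str]:
    """The pair with the smallest distance, i.e. the first pair UPGMA would merge."""
    d = distance_matrix(seqs)
    return min(d, key=d.get)


# Hypothetical aligned sequences.
aligned = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TCGAACTT"}
print(distance_matrix(aligned))
print(closest_pair(aligned))
```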
Bioinformatics has many applications because it provides practical tools for exploring DNA and proteins. In particular, it provides recognition techniques that compare sequences by detecting their similarities and thereby relate structure to function. Another major application is the direct prediction of three-dimensional protein structures from linear amino acid sequences. Bioinformatics also helps researchers understand complex genomes more easily by first analyzing simple organisms and then applying the same principles to more complex life forms. This approach makes it easier to identify potential drug targets by finding similarities among basic microbial proteins. Apart from identifying targets, bioinformatics can also be used to design drugs based on genomic information (Jena, Aqel, Srivastava & Mahanti, 2009).
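One simple way to detect similarity between sequences without computing a full alignment is to compare their short substrings, or k-mers; BLAST, for instance, begins by looking for short word matches. The sketch below is an assumed, simplified alignment-free measure (the Jaccard similarity of two k-mer sets), not the method of any particular tool, and k = 3 is an arbitrary illustrative choice.

```python
def kmers(seq: str, k: int = 3) -> set[str]:
    """All distinct substrings of length k in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the k-mer sets: 0 means nothing shared, 1 means identical sets."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:  # both sequences are shorter than k
        return 0.0
    return len(ka & kb) / len(union)


print(kmer_similarity("ACGTA", "ACGTT"))  # 0.5
```

Set-based measures like this are cheap enough to screen many sequences quickly, after which a proper alignment can be run on the most similar candidates.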
Challenges in bioinformatics:
The field of bioinformatics faces several challenges. A key one is the need both to collect relevant data and to provide a platform for data mining and integration; this can be addressed by implementing fully integrated systems for data capture and analysis. Another key challenge is the running cost of databases. As noted earlier, most tools and databases are free to access, but developers are finding it increasingly hard to maintain them without charge. For example, in 1998 SWISS-PROT began charging its corporate users an annual access fee ranging from $2,500 to $90,000; the fee did not apply to academic users. SWISS-PROT is one of the few success stories: it has over 200,000 users worldwide and contains information on more than 70,000 proteins.
On the technical side, the challenges are numerous, with genome studies revealing how difficult it is to define genes solely by their start and stop codons. Genes in higher eukaryotes are often nested within one another and have complex transcription and splicing patterns. In alternative splicing, for example, several transcripts often share one or more exons. As a result, no rules yet exist for unambiguously assigning genes within genomes, which remains a key concern in the field.
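The naive rule mentioned above, calling a gene from a start codon to the next in-frame stop codon, is easy to express, which is exactly why it is tempting and why it breaks down for spliced, nested eukaryotic genes. The sketch below is a minimal open reading frame (ORF) scan that assumes the standard start codon (ATG) and stop codons (TAA, TAG, TGA) and examines only the three forward reading frames of an unspliced sequence; `find_orfs` and its `min_codons` threshold are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) coordinates of ORFs on the forward strand.

    An ORF is taken to run from an ATG to the first in-frame stop codon; `end` is
    exclusive and includes the stop codon. ATGs inside an open ORF are not
    reported separately. `min_codons` counts codons before the stop, ATG included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)]
```

Because introns interrupt the coding sequence, an exon-intron gene will not appear as one continuous ORF in a scan like this, which illustrates why start and stop codons alone cannot define genes in higher eukaryotes.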
On the business side, companies that wish to profit from bioinformatics face the problem of value creation, since most of the technology is offered free of charge. Value can still be created, however, through business models such as separate academic and corporate licensing, premium memberships, and distinct personal and enterprise editions.
The final challenge is to create data analysis and mining tools that can retrieve the desired information from the vast number of cross-platform databases while maintaining the highest possible "signal-to-noise" ratio. Efforts in this area should be directed towards standardization, semantic ontologies, or language representation techniques that create the necessary annotations and identify the desired information independently of the underlying database. The aim is to make information access as transparent as possible without introducing additional complexity.
Future work in bioinformatics:
Bioinformatics is an interdisciplinary field that uses different technologies, multifaceted approaches and numerous applications. Its future is therefore hard to predict, but judging from the current state of affairs, the post-genomic era looks promising.
As with every broad field of study, the emergence of smaller, specialized branches within bioinformatics is inevitable. One such emerging branch has come to be known as functional bioinformatics. It is concerned with developing ontologies, or concept classifications, that algorithms use when performing computations that take biomolecular functions as input or produce them as output. Such computations include functional similarity queries, database queries for sequences with specific functions, and algorithms that predict structure and function from sequence data. The EcoCyc database is a classic example of a database that uses a functional ontology. In EcoCyc, the ontology encodes a variety of events and processes, such as regulation of gene expression, signal transduction, transport events and control of enzyme reactions.
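A functional similarity query can be sketched with a toy ontology. In the minimal example below, each function term points to its more general parent terms, and two annotations are compared after each is expanded with all of its ancestor terms, so related but non-identical functions still score above zero. The hierarchy, term labels and the Jaccard-style score are illustrative assumptions, not EcoCyc's actual ontology or query mechanism.

```python
# Hypothetical is-a hierarchy: each function term maps to its parent (more general) terms.
PARENTS: dict[str, set[str]] = {
    "glucose transport": {"carbohydrate transport"},
    "fructose transport": {"carbohydrate transport"},
    "amino acid transport": {"transport"},
    "carbohydrate transport": {"transport"},
    "transport": set(),
}


def with_ancestors(terms: set[str]) -> set[str]:
    """Expand a set of annotation terms with every ancestor term in PARENTS."""
    closed: set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(PARENTS.get(term, ()))
    return closed


def functional_similarity(terms_a: set[str], terms_b: set[str]) -> float:
    """Fraction of shared terms after both annotations are expanded with their ancestors."""
    a, b = with_ancestors(terms_a), with_ancestors(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(functional_similarity({"glucose transport"}, {"fructose transport"}))    # 0.5
print(functional_similarity({"glucose transport"}, {"amino acid transport"}))  # 0.25
```

Without the ontology, both pairs would share no terms and score 0; the hierarchy is what lets the query recognize that two sugar transporters are functionally closer than a sugar transporter and an amino acid transporter.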
Automated sequence preprocessing is another expected future application of bioinformatics, particularly for handling the raw sequences obtained in large-scale laboratories such as those involved in the Human Genome Project.
It is expected that most genomes will be sequenced in the near future, after which bioinformatics can be used to model metabolic and genetic networks. Some initial developments already exist in the form of qualitative models that combine qualitative reasoning and Boolean networks derived from artificial intelligence methods. If successfully adopted, these techniques promise to deliver accurate computer-based models for predicting biological function.
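In a Boolean network model, each gene is either on or off, and its next state is a logical function of the current states of its regulators. The sketch below simulates a hypothetical three-gene loop (C represses A, A activates B, B activates C) under the assumption of synchronous updates, where all genes change at once; asynchronous update schemes are also common. It runs until a state repeats and returns the attractor, the recurring cycle of states that such models use to predict stable behavior. The gene names and rules are illustrative only.

```python
from typing import Callable

State = dict[str, bool]
Rules = dict[str, Callable[[State], bool]]

# Hypothetical three-gene loop: C represses A, A activates B, B activates C.
RULES: Rules = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}


def step(state: State, rules: Rules) -> State:
    """Synchronous update: every gene's next value is computed from the current state."""
    return {gene: rule(state) for gene, rule in rules.items()}


def find_attractor(state: State, rules: Rules, max_steps: int = 1000) -> list[State]:
    """Simulate until a state repeats, then return the repeating cycle (the attractor)."""
    seen: dict[tuple, int] = {}
    history: list[State] = []
    for t in range(max_steps):
        key = tuple(sorted(state.items()))
        if key in seen:
            return history[seen[key]:]
        seen[key] = t
        history.append(state)
        state = step(state, rules)
    raise RuntimeError("no repeated state within max_steps")


cycle = find_attractor({"A": True, "B": False, "C": False}, RULES)
for s in cycle:
    print({gene: int(on) for gene, on in s.items()})
```

Because the number of states is finite (here 2^3 = 8), every trajectory of a deterministic Boolean network must eventually fall into a cycle or a fixed point, which is what makes attractor analysis tractable for small networks.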
Finally, perhaps the most interesting application to watch in bioinformatics is biological computing (biocomputing). This sub-discipline uses genes as “devices” for storing, retrieving and manipulating information in computational tasks.
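The storage idea can be illustrated by mapping digital data onto nucleotides. With four bases, each nucleotide can carry two bits, so one byte fits into four bases. The sketch below assumes an arbitrary mapping (00 to A, 01 to C, 10 to G, 11 to T) and ignores the error correction and sequence constraints, such as avoiding long runs of a single base, that practical DNA-storage schemes require.

```python
# Assumed 2-bit mapping: index in this string = bit pair (00->A, 01->C, 10->G, 11->T).
BITS_TO_BASE = "ACGT"
BASE_TO_BITS = {base: i for i, base in enumerate(BITS_TO_BASE)}


def encode(data: bytes) -> str:
    """Encode bytes as a DNA string, four bases per byte, most significant bits first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases)


def decode(dna: str) -> bytes:
    """Invert encode(): read four bases at a time back into one byte."""
    if len(dna) % 4:
        raise ValueError("DNA length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASE_TO_BITS[base]
        out.append(byte)
    return bytes(out)


strand = encode(b"A")    # 'A' is 65 = 0b01000001
print(strand)            # CAAC
print(decode(strand))    # b'A'
```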
Conclusion:
Within a few years, bioinformatics has taken root as a major field in modern biotechnology. Its main objective is to manage biological information regardless of the data sources and modes of representation involved, with the broader aim of accelerating progress in the life sciences and the discovery of new drugs. While the progress made so far is considerable for a relatively young field, significant challenges remain. At the forefront of research, however, are emerging techniques that aim to improve data acquisition, manipulation, storage, analysis, interpretation and dissemination across different platforms. Promising advances expected in the future include the unambiguous assignment of genes within genomes and the development of novel ontology- and language-based approaches that provide transparent access to information across platforms. The various stakeholders in bioinformatics have also come together around the common aim of applying bioinformatics therapeutically. The ultimate goal of bioinformatics is thus to empower researchers with computational tools and techniques that enable them to work effortlessly with various forms of biological data in real time, and to provide solutions for the benefit of all organisms.
References:
Bayat, A. (2002). Science, medicine, and the future: Bioinformatics. BMJ, 324(7344), 1018-1022. doi:10.1136/bmj.324.7344.1018
Gaurav, A., Kumar, V., & Nigam, D. (2012). New Applications of Soft Computing in Bioinformatics: A Review. Journal of Pure and Applied Science & Technology, 2(2), 12-22.
Jena, R., Aqel, M., Srivastava, P., & Mahanti, P. (2009). Soft computing methodologies in bioinformatics. European Journal of Scientific Research, 26(2), 189-203.
Luscombe, N., Greenbaum, D., & Gerstein, M. (2001). What is Bioinformatics? A Proposed Definition and Overview of the Field. Methods of Information in Medicine, 40(4), 346-358.
Nature Biotechnology. (2000). Bioinformatics. Nature Biotechnology, 18, IT31-IT34. doi:10.1038/80068