
Machine learning in bioinformatics

Last published: 2023-04-13 21:21:15

Introduction


The exponential growth in the amount of available biological data raises two problems: on the one hand, efficient storage and management of information and, on the other hand, the extraction of useful information from these data. The second problem is one of the main challenges in computational biology. It requires tools and methods capable of transforming these heterogeneous data into biological knowledge about the underlying mechanisms. Such tools and methods should allow us to go beyond a mere description of the data and provide knowledge in the form of testable models. A model is a simplifying abstraction of the system, and it is through this abstraction that we can obtain predictions about the system.




Machine learning techniques are applied to extract knowledge from data in several biological domains. Figure 1 shows a scheme of the main biological problems to which computational methods are being applied. We have classified these problems into six domains: genomics, proteomics, microarrays, systems biology, evolution, and text mining. The category named ‘other applications’ groups together the remaining problems. These categories should be understood in a very general way, especially genomics and proteomics, which in this review are taken to mean the study of nucleotide chains and of proteins, respectively.

