Introduction

Bioinformatics

Crossover between Computer Science and Molecular Biology.

  1. Biological information can be studied with approaches typical of information theory (Computer Science)
  2. Understanding Biology requires Computer Science (huge data)
  3. Computer Science is useful to Biology

DNA sequencing was initially a very slow and extremely costly process: traditional sequencing was very precise, but slow and expensive. The advent of genome sequencing changed this, although it remained costly. Sequencing then grew faster than Moore's Law (100 genome project). Examples include animal genome sequencing (mammals or anything useful for food, etc.) and bacterial genome sequencing (to understand resistances, etc.).

  • UniProtKB (knowledge database)
    • Swiss-Prot (manually annotated and reviewed)
    • TrEMBL (Translated EMBL) (automatically annotated and not reviewed => may contain sequencing errors)

The number of entries in the database grows exponentially.

The basic paradigm of molecular biology is the following:

  • DNA => RNA => Protein

Nucleotides

DNA and RNA can be seen as a "4 letter alphabet" (A, C, G, plus T in DNA or U in RNA) used for information storage

Each nucleotide = Phosphate + Sugar + Nitrogenous base

Biological Sequences

How to turn a "4 letter alphabet" into a "20 letter alphabet" of the proteins?

You need triplets of nucleotides to encode a letter of the "protein alphabet"
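The reason triplets are needed follows from simple counting: with 4 letters, words of length n can encode 4^n distinct symbols, and the shortest length that covers 20 amino acids is 3. A minimal sketch of that arithmetic (the function name and its parameters are illustrative, not from the notes):

```python
def min_codon_length(alphabet_size: int = 4, symbols_needed: int = 20) -> int:
    """Smallest word length n such that alphabet_size ** n >= symbols_needed."""
    n = 1
    while alphabet_size ** n < symbols_needed:
        n += 1
    return n


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"words of length {n}: {4 ** n} combinations")
    print("minimum length for 20 amino acids:", min_codon_length())
```

Pairs give only 16 combinations, which is too few, while triplets give 64, so several triplets can map to the same amino acid.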

RNA copies only a part of the whole DNA: the part needed to produce the required protein

A coding RNA sequence is terminated by one of the three "stop" codons (UAA, UAG, UGA)
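Reading a sequence triplet by triplet until a stop codon is reached can be sketched as follows. This is a simplified illustration, not a full model: it assumes the standard genetic code, treats transcription as replacing T with U on the coding strand, and starts reading at position 0 instead of searching for a start codon. The names `transcribe` and `translate` are chosen here for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order U, C, A, G.
# "*" marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Simplified transcription: copy the coding strand, replacing T with U."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Translate codon by codon from position 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    rna = transcribe("ATGGCCAAATGA")
    print(rna)             # AUGGCCAAAUGA
    print(translate(rna))  # MAK
```

The table has 64 entries for 20 amino acids plus the stop signal, which shows the redundancy of the code.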

Proteins are where "things" become three-dimensional

Amino Acids: Proteins

  • Carbon in the center
  • Hydrogen to the left
  • Acidic carboxyl group to the right
  • R group to the bottom
  • Amino group to the top

Proteins

Sequence => Structure => Function (!?)

Where:

  • Sequence = list of "letters"
  • Structure = Three-dimensional structure
  • Function = the cavity of the structure (generated by the way it folds) is called the "active site", the place where the "chemical magic" happens (for example, adding or breaking a chemical bond) (enzymes)

In silico methods

Term for computational methods (as opposed to in vitro, etc.)

Prediction on structures

Bioinformatics to understand disease-associated aspects of protein structure and function

  • Alignment and identification of homologous sequences
  • Characterization of the primary protein architecture
  • Prediction of secondary and tertiary structure
  • Exploration of protein interaction network
  • Analysis of binding sites for proteins and ligands
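As an illustration of the first item in the list above, the score of a global pairwise alignment can be computed with dynamic programming. The sketch below uses the classical Needleman-Wunsch recurrence, which the notes do not name. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions chosen for the example; real tools use substitution matrices and more refined gap penalties.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Score of the best global alignment of a and b (Needleman-Wunsch).

    Only the previous row of the dynamic programming table is kept,
    since the score alone does not require a traceback.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # Column 0: prefix of a aligned against nothing.
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] aligned to a gap
            gap_in_a = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("ACGT", "AGT"))
```

A higher score between two sequences is one simple signal that they may be homologous, although in practice statistical significance is also assessed.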

Alternative methods


Alternative viewpoints on proteins

Starting from a known protein sequence

  • Evolutionary model
    • descriptive
    • knowledge-based
  • Protein-folding model
    • predictive
    • Optimization ("Ab initio")

Arriving at the prediction of a structure


  • A lot of methods are knowledge-based (they require background knowledge of the field of study, with training sets)
  • Alternatively, it is possible to build predictions based on simple observations and sets of rules. These methods are called optimization methods and are able to generate new, not yet observed solutions.

Elixir

Most major databases are projected to double in size every 12 months
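Doubling every 12 months means exponential growth: after t months the size is size_0 * 2^(t/12). A small sketch of that projection, using a hypothetical starting size (the numbers are illustrative, not measurements):

```python
def projected_size(initial_size: float, months: float,
                   doubling_months: float = 12.0) -> float:
    """Size after `months`, assuming one doubling every `doubling_months`."""
    return initial_size * 2 ** (months / doubling_months)


if __name__ == "__main__":
    start = 100.0  # hypothetical size in arbitrary units
    for years in range(6):
        print(f"year {years}: {projected_size(start, 12 * years):.0f}")
```

Under this assumption a database grows 32-fold in five years, which is why storage funding becomes a concern.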

The funding devoted to storing the data is less than 1% of what is spent to generate it

Elixir coordinates, integrates and sustains bioinformatics resources across its member states and enables users in academia and industry to access services that are vital for their research.

Sites:

Elixir Hub (located in Hinxton, near Cambridge, UK):

  • Technical coordination across Nodes
  • Standards

Elixir Node (e.g. Padua)

  • Research and development of bioinformatics services
  • Management of core resources

Vision

  1. Establish a distributed infrastructure that scales with the challenge
  2. Secure and deliver the core data resources underpinning life-science research
  3. Provide discoverable tools, services and connectors to drive data access and exploitation
  4. Provide robust technical platforms and clouds for secure data access and compute
  5. Develop and maintain standards for data management, reuse and integration
  6. Partner with user communities in a sustainable manner to ensure high and lasting impact
  7. Close the computational biology skills gap through a comprehensive training programme for professionals
  8. Support innovation in "big-data biology"