Representing the structure and function of biological systems via formal languages,
for description, simulation, analysis and (eventually) compilation.

What is Systems Biology? Wikipedia; Chemical & Engineering News
What is NOT Systems Biology? Yuri Lazebnik; Olaf Wolkenhauer
Computational Techniques: Executable Biology
Harvard movie: The Inner Life of the Cell


The Centre for Computational and Systems Biology, Microsoft Research - University of Trento
Annual Scientific Reports
SPiM Player release (stochastic π-calculus simulator with GUI)
The BBSRC Centre for Integrative Systems Biology at Imperial College London
The Microsoft Biology Foundation

BioComputing at Microsoft

Collaboration opportunities
Computational Biology
Cambridge Bioinformatics




Molecular Programming - Tutorial (Microsoft Research Cambridge, Feb 11 '10)
Molecules as Automata - Open Lectures for PhD Students in Computer Science (Warsaw, March & May '09)
Molecules as Automata - International Summer School on Natural Computing (BNC'08)
Artificial Biochemistry - Graduate Course (Trento, May 22-26 '06)
Membrane Interactions - International School on Computational Sciences for Complex Systems in Biology (Rovereto, April 23 '04)

Position Papers and Notes

 Visualization in Process Algebra Models of Biological Systems (Data Intensive Computing)
 Biological Systems as Complex Systems (IST FET Complex Systems)
 Process Calculi and Biology (IST FET)
 Languages for Systems Biology (Grand Challenges UK, GC1 InVivo<=>InSilico)
 InVivo<=>InSilico (Ronan Sleep Ed.) (Grand Challenges UK, GC1 InVivo<=>InSilico)

Student Projects

A Stochastic π-Calculus Model of MHC Class I Antigen Presentation
Leonard Goldstein (with Luca Cardelli and Andrew Phillips), Computational Biology MPhil Project, Cambridge, August 26 2005.
Generating Definitions of Cell Cycles in π-Calculus from Mathematical Models
Rosemary Francis (with Pietro Lio'), Part II Computer Science Tripos, Cambridge, May 16 2005.

Publication Venues

The HFSP journal
The Transactions on Computational Systems Biology journal [Editorial Board Member]
The Synthetic and Systems Biology journal
The Journal of the Royal Society Interface
The CMSB Conferences


  • DNA Strand Displacement Simulator

    We present a programming language for designing and simulating DNA circuits in which strand displacement is the main computational mechanism.
  • Stochastic π-Calculus Simulators

    The underlying model for stochastic π-Calculus is continuous-time Markov chains, including infinite-state systems.
    Executing a stochastic π-Calculus program involves, at each step, computing the set of possible communication events
    between pairs of processes, and choosing one of the events (and its specific process continuations) according to
    exponential distributions, e.g. by Gillespie's algorithm. In traditional (non-stochastic) π-calculus the choice of events is
    nondeterministic: the original motivation was the study of nondeterministic systems.
    Each stochastic π-Calculus process can be seen as a state automaton, which coordinates its state transitions by
    communicating with other automata in a compositional way (i.e. there is not necessarily a single gigantic automaton
    for the whole system). The various automata might not be finite-state and may evolve in unbounded ways, and
    can even create new communication channels and new automata. The characteristic feature of π-Calculus, as opposed
    to previous process calculi, is the ability to dynamically generate new communication channels, which can be used
    in a number of ways and turns out to be an extremely flexible modeling device.
    • SPiM (Andrew Phillips)
    • SPiM Example: MAPK Cascade
      (Chi-Ying F. Huang and James E. Ferrell, Jr., Ultrasensitivity in the mitogen-activated protein kinase cascade, PNAS 93, 10078-10083, 1996).
    • SPiM Example: Evolved Gene-Protein Networks
      (Paul François and Vincent Hakim, Design of genetic networks with specified functions by evolution in silico, PNAS 101(2), 580-585, 2004).
    • BioSPI (Regev, Shapiro, Silverman)
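The event-selection step described above (compute the possible communications, then pick one according to exponential distributions) can be sketched with Gillespie's direct method. This is a minimal sketch under simplifying assumptions, not how SPiM or BioSPI are implemented: processes are counted by kind rather than tracked individually, each hypothetical "channel" tuple lets one sender kind synchronize with a different receiver kind at a fixed rate, and new channels are never created, so the characteristic π-calculus feature of channel generation is deliberately left out. The function name `gillespie` and the tuple layout are illustrative choices.

```python
import random

# Hypothetical channel encoding: (rate, sender, receiver, sender_next, receiver_next).
# One `sender` process synchronizes with one `receiver` process; afterwards they
# continue as `sender_next` and `receiver_next`. Assumes sender != receiver.
Channel = tuple[float, str, str, str, str]


def gillespie(state: dict[str, int], channels: list[Channel], t_end: float, seed: int = 0):
    """Simulate the continuous-time Markov chain with Gillespie's direct method."""
    rng = random.Random(seed)
    state = dict(state)  # work on a copy of the initial population
    t = 0.0
    trace = [(t, dict(state))]
    while True:
        # Propensity of each channel: rate times the number of sender/receiver pairs.
        props = [rate * state.get(s, 0) * state.get(r, 0) for rate, s, r, _, _ in channels]
        total = sum(props)
        if total == 0:
            break  # no communication is possible: the system is stuck
        # Waiting time to the next event is exponentially distributed with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Choose one event with probability proportional to its propensity;
        # fall back to the last enabled channel to guard against rounding.
        x = rng.random() * total
        chosen = None
        for ch, a in zip(channels, props):
            if a > 0:
                chosen = ch
                if x < a:
                    break
                x -= a
        _, s, r, s_next, r_next = chosen
        # Both partners consume their current state and move to their continuations.
        state[s] -= 1
        state[r] -= 1
        state[s_next] = state.get(s_next, 0) + 1
        state[r_next] = state.get(r_next, 0) + 1
        trace.append((t, dict(state)))
    return trace


# Illustrative run: an enzyme E communicates with substrate S, turning it into P.
trace = gillespie({"E": 10, "S": 100, "P": 0}, [(0.01, "E", "S", "E", "P")], t_end=1e6, seed=1)
print(trace[-1][1])  # {'E': 10, 'S': 0, 'P': 100}
```

Counting processes by kind keeps each step cheap, at the cost of losing process identity; a full stochastic π-calculus simulator must also track fresh channel names as they are created.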

Essential Bibliography


    Biochemistry Glossary | Prefixes and Suffixes

Quegli che pigliavano per altore altro che la natura
maestra de' maestri s'affaticavano invano.
Those who took inspiration from other than nature,
the master of masters, were laboring in vain.
[Leonardo da Vinci - 1500]

Abstract Machines of Systems Biology

An abstract machine is a fictional information-processing device that can, in principle, have a number of different physical realizations (mechanical, electronic, biological, or software). An abstract machine is characterized by a collection of discrete states, and by a collection of operations (or events) that cause discrete transitions between states, possibly concurrently. The adequacy of this generic model for describing complex systems is argued, e.g., in D. Harel, "Statecharts: a visual formalism for complex systems," Science of Computer Programming 8:231-274, North-Holland, 1987.
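The definition above (discrete states, plus operations that cause discrete transitions) can be made concrete with a small sketch. This is an illustrative assumption, not a formalism from the page: a single machine is a current state plus a transition table keyed by (state, operation), and the two-state gate protein used as an example is hypothetical. Concurrency is not modeled here; several such machines would run side by side and coordinate.

```python
from dataclasses import dataclass


@dataclass
class AbstractMachine:
    """Discrete states plus operations (events) that cause transitions between them."""
    state: str
    transitions: dict[tuple[str, str], str]  # (state, operation) -> next state

    def enabled(self) -> list[str]:
        """Operations that can fire in the current state."""
        return sorted(op for (s, op) in self.transitions if s == self.state)

    def fire(self, op: str) -> str:
        """Perform one discrete transition, or fail if `op` is not enabled."""
        key = (self.state, op)
        if key not in self.transitions:
            raise ValueError(f"{op!r} is not enabled in state {self.state!r}")
        self.state = self.transitions[key]
        return self.state


# Hypothetical example: a membrane gate protein that can open and close.
gate = AbstractMachine("closed", {("closed", "open"): "opened",
                                  ("opened", "close"): "closed"})
print(gate.enabled())     # ['open']
print(gate.fire("open"))  # opened
```

The same table-driven shape works whatever the physical realization, which is the point of calling the machine abstract.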

Biochemical toolkits in cellular biology (nucleotides, amino acids, and phospholipids) can be seen as abstract machines with appropriate sets of states and operations. Each abstract machine corresponds to a different kind of informal algorithmic notation that biologists have developed (inside bubbles). These machines operate in concert and are highly interdependent. Genes instruct the production of proteins and membranes, and direct the embedding of proteins within membranes. Some proteins act as messengers between genes, and others perform various gating and signaling tasks when embedded in a membrane. Membranes confine cellular materials and bear proteins on their surfaces. In eukaryotes, membranes confine the genome, so that local conditions are suitable for regulation, and confine other reactions carried out by proteins in specialized vesicles.

To understand the functioning of a cell, one must understand (at least) how the various machines interact. This involves considerable difficulties in modeling and simulation because of the drastic differences in the "programming model" of each machine, and in the time and size scales involved.
Source: Abstract Machines of Systems Biology (TCSB) [@Springer]

The Membrane Machine

The basic operations on membranes, implemented by a variety of molecular mechanisms, are local fusion (two patches merging) and local fission (one patch splitting in two). In two dimensions, at the local scale of membrane patches, fusion and fission become a single operation, called switch. A switch is a fusion when it decreases the number of whole membranes, and a fission when it increases that number.

When seen on the global scale of whole 2D membranes, switch induces four operations: in addition to the obvious splitting (Mito) and merging (Mate) of membranes, there are also operations, quite common in reality, that cause a membrane to eat (Endo) or spit (Exo) another subsystem (P). There are common special cases of Mito and Endo, when the subsystem P consists of zero (Drip, Pino) or one (Bud, Phago) membranes.
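The four global operations can be illustrated as rewrites of a tree of nested membranes, which also shows why each one is a fusion or a fission: Mate and Exo remove one whole membrane, while Mito and Endo add one. This is a toy sketch under our own simplified reading of the operations, not the formal calculus behind this section: contents are only membranes (no molecules or proteins), a "soup" is a plain list of sibling membranes, and the function and label names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare membranes by identity, not by contents
class Membrane:
    label: str
    children: list["Membrane"] = field(default_factory=list)


def count(soup):
    """Number of whole membranes in a soup, including nested ones."""
    return sum(1 + count(m.children) for m in soup)


def mate(soup, a, b):
    """Fusion: sibling membranes a and b merge and pool their contents (-1 membrane)."""
    soup.remove(a)
    soup.remove(b)
    soup.append(Membrane(f"{a.label}+{b.label}", a.children + b.children))


def mito(soup, a, p, label):
    """Fission: a splits in two, and subsystem p (taken from a's contents) ends up
    in the new sibling membrane (+1). Drip: p is empty. Bud: p is one membrane."""
    a.children = [c for c in a.children if c not in p]
    soup.append(Membrane(label, list(p)))


def endo(soup, a, p, label):
    """Fission: a eats the sibling subsystem p, wrapped in a new inner membrane (+1).
    Pino: p is empty. Phago: p is one membrane."""
    for m in p:
        soup.remove(m)
    a.children.append(Membrane(label, list(p)))


def exo(soup, a, v):
    """Fusion: inner vesicle v merges with its parent a, spitting v's contents out of a (-1)."""
    a.children.remove(v)
    soup.extend(v.children)


cell, food = Membrane("cell"), Membrane("food")
soup = [cell, food]
print(count(soup))                   # 2
endo(soup, cell, [food], "vacuole")  # Phago: the cell engulfs the food particle
print(count(soup))                   # 3
exo(soup, cell, cell.children[0])    # the vacuole fuses back and releases the food
print(count(soup))                   # 2
```

Tracking only the membrane count is enough to classify each operation, which matches the switch rule stated above.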

Although this is an unusual computational model, the membrane machine supports the execution of real algorithms. In fact, some sets of operations, such as {Pino, Phago, Exo}, are Turing-complete and can encode the other membrane operations.

Source: Abstract Machines of Systems Biology (TCSB) [@Springer]

Biological Systems as Reactive Systems


Stochastic π-Calculus Simulation

Source: A Compositional Approach to the Stochastic Dynamics of Gene Networks (TCSB) [@Springer]

Impromptu Research Statement

Systems Biology is the emerging interdisciplinary study of complex biological systems from the point of view of relationships and interactions between different components. Many of these interactions are based on digital information (e.g. DNA) and on sophisticated information processing.

Biologists, especially now with the help of bioinformatics, are assembling huge databases of information about various biological structures (e.g. the several Genome projects). This data helps, first of all, in understanding how various biological processes and molecular mechanisms work; biologists are making great and actually absolutely amazing progress in fundamental data gathering and understanding. But understanding the high-level behavior of living organisms seems to require more than understanding the individual nanomachines they are made of: there are things to be understood that are not found in any single piece of molecular hardware. Biological systems are "complex systems" pretty much in the same sense as "complex software systems": layer upon layer of complex control mechanisms that in essence do a lot of very sophisticated information processing. I do not want to minimize the molecular work, but it is like trying to reverse-engineer your PocketPC by spending all your time figuring out the processor and the bus protocols: that's not going to tell you anything about how it synchs Outlook! Because of that, system behaviors cannot be modeled "analytically" very well by the usual techniques of continuous mathematics. Some people (including some biologists) now claim that we need to achieve a better understanding of this "systems" level, in order to understand how these things truly work, and hence how to fix them other than by testing all possible chemicals on them.

That kind of biological data is now being collected and made available on the web. So there is, first, a big problem of choosing a common representation for the data. For genetic data, well, it's fundamentally all strings of AGCT letters, but even that is a pretty hard problem because of all the auxiliary information that goes with it. Similar database and representation problems exist for more complex structures, such as proteins. So there are plenty of research problems already here. The next level up, and this is where I get into the picture a bit, is to build databases of biological *processes*, e.g. of the various signaling pathways that describe sequences of events that cause something to happen in an organism. These are generally very concurrent pathways, and some kind of process language is needed to describe them. It could even be BPEL, or something similar, but it is going to be quite different from storing AGCT data. If one had such a library of interoperable biological processes, one could pick and choose a bunch of these processes, combine them in a simulator or in a symbolic analyzer (not unlike software analyzers), and perform "in silico" experiments, e.g. to test drugs without having to synthesize them. Moreover, if these biological control processes are so complex, with lots of selected-in redundancy, it seems pretty unlikely that any single chemical could have much effect on them without also affecting lots of unrelated things. Rather, an effective drug should itself be a multi-stage process that reacts to the organism, and one would need to study how these processes interact.

This kind of research activity is already going on in preliminary forms in various academic and government projects, and one of my research goals is to keep track of those. At the same time I am trying to contribute to the idea of describing biological processes, that is, of finding "programming languages" that can code them up effectively, for the various purposes of archival, analysis, and simulation.

A long-term vision in this field is to one day put all those processes together, and simulate an entire organism, say a very small cell (which is itself astoundingly complex). Such a project is at the level of a "grand challenge" that would require an effort way beyond the Human Genome Project. Some such projects have been proposed in the past, but mostly in the area of differential-equation simulation of these systems. Well, we don't use differential equations to code up algorithms, and there is reason to believe that they are just as unsuitable for describing critical aspects of decision processes in living organisms.

So, this is where the things we regularly do, computer software, may have something new to contribute: at the "system software" level of understanding. Biology is not just exciting science, but exciting "computer" science, in the broad sense of the study of information. Biological systems (of all kinds) are fundamentally information processing systems that happen to run on wet hardware. Even at the lowest level, the genetic code is digital information (about 1 megabyte in each E. coli bacterium, 800 megabytes in each human cell), and reproduction is almost by definition making copies of information. And it goes all the way up from there. I am sure that software will play a critical role, one way or another, either as a new paradigm or as a tool to assist biologists in ways that are yet unthinkable. And I am also sure that, as soon as we learn how to build these wet machines for ourselves, there will be a lot of programming to do.
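The megabyte figures for genomes follow from simple arithmetic: each base is one of four letters (A, C, G, T), so it carries 2 bits, and four bases fit in one byte. The sketch below assumes approximate genome lengths that do not come from this page (the E. coli K-12 MG1655 chromosome, and roughly 3.1 billion base pairs for one haploid human genome); the 800 MB figure matches a single haploid copy, while a diploid human cell carries two.

```python
def genome_bytes(base_pairs: int) -> int:
    """Bytes needed to store a DNA sequence at 2 bits per base (A, C, G, T)."""
    return base_pairs * 2 // 8


# Approximate genome lengths, assumed here for illustration.
ECOLI_BP = 4_641_652       # E. coli K-12 MG1655 chromosome
HUMAN_BP = 3_100_000_000   # one haploid human genome, roughly 3.1 billion bp

print(genome_bytes(ECOLI_BP))  # 1160413   (about 1 MB)
print(genome_bytes(HUMAN_BP))  # 775000000 (about 800 MB)
```

This counts only the raw sequence; real file formats add annotations and overhead, which is part of the representation problem mentioned earlier.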