AlphaFold Revolution for Crystallographic Software
23 Aug 2022



Dr. Eugene Krissinel, leader of the STFC core team for the Collaborative Computational Project No. 4 (CCP4), foresees changes ahead for the structural biology community




AlphaFold-2 is an AI system for predicting protein structures from a given sequence of amino acids, released recently by DeepMind Inc., part of Alphabet Inc. [1]. It solves one of the central and long-standing problems in structural biology with an efficiency that has a significant, if not devastating, effect on structure determination software, the scale and consequences of which are yet to be understood. Let us outline what has happened.

One of the main motivations behind structural biology research is understanding protein function through its structure, which is closely related to the question of the relationship between protein sequence and structure. Over the past 40 years, significant progress has been made in this direction, both experimentally and in terms of computational methods for protein homology analysis. The Smith and Waterman algorithm, proposed in 1981 [2], laid the foundation for homology studies. By 2002, algorithms for secondary structure prediction had reached ~80% accuracy, giving more structural insight into sequence-based analysis [3]. Profile matching techniques (see, e.g., [4,5]) boosted the sensitivity of sequence matching so that structural homology could be detected deep in the twilight zone. Computing three-dimensional protein folds directly from sequence remained a challenging task for a long time. It was largely solved by AlphaFold-2 and RoseTTAFold by 2021 [1,6].

Each of the above achievements has had a significant impact on structural biology research, computational methods and software for structure determination. In this context, the emergence of AlphaFold-2, which produces structures that are incredibly accurate by computational standards (cf. the figure below), is perceived as a revolution, since it solves the very problem of the sequence-structure relationship. Up until now, finding protein folds was possible only by using sophisticated experimental techniques, primarily X-ray diffraction on macromolecular crystals, which required massive software support and gave rise to major software projects in the field: CCP4 in the UK [7] and Phenix in the USA [8]. Does AlphaFold-2 now make these types of experiments and software packages obsolete?
Figure: Comparison of experimentally determined and AlphaFold-predicted structures of ribonuclease, a medium-size protein that catalyzes the degradation of RNA into smaller components. The overall difference is less than 0.5 Å r.m.s.d. and can be seen as a doubling of the covalent bonds between atoms in the picture. For reference, the experimental structure resolution is 1.8 Å. Given the experimentally determined electron density (not shown in the figure), very little effort is required to get from the predicted to the experimentally confirmed structure, compared with the conventional procedure of phasing and building structures “from scratch”. The structure prediction and comparison were done in CCP4 Cloud, a new graphical system for remote, in-browser structure solution that gained considerable popularity during the pandemic due to the overall shift to remote working patterns [9].
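
For readers unfamiliar with the r.m.s.d. figure quoted in the caption above, the following is a minimal sketch of how a root-mean-square deviation between two models can be computed. It assumes the two coordinate sets are already superposed and that atoms are paired one-to-one; the article does not say how the superposition or atom matching was done in CCP4 Cloud. The function name `rmsd` and the toy coordinates are hypothetical and chosen only for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumption: both lists are already superposed and list the same atoms
    in the same order. No fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy example with made-up coordinates in angstroms (not real ribonuclease data)
experimental = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
predicted = [(0.0, 0.0, 0.3), (1.5, 0.4, 0.0), (1.5, 1.5, 0.0)]
print(f"{rmsd(experimental, predicted):.3f}")  # prints 0.289
```

In practice the two models would first be superposed (for example by a least-squares fit over matched C-alpha atoms) before a value like this is reported.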

The answer is yes and no. Probably the best analogy here is the discovery of Pluto, which was heavily assisted by the tremendous achievements of Newtonian mechanics by the end of the 19th century, so that the planet was found “at the tip of a pen”, and yet all the might of astronomy was needed to actually locate it. So it is with AlphaFold-2, which provides astonishingly accurate navigation to protein structures, the planets of the molecular biology world, but one still needs a vehicle (read “experiment and software”) to actually land on them. However, the vehicle now has a much shorter and easier way to travel and may therefore be lighter, faster and automated to a much higher degree. In this sense, CCP4 will need to change. AlphaFold does not eliminate the need for CCP4, but it makes one of the many ways of solving protein structures, called “molecular replacement” (MR), truly dominant. An estimated 95% or more of structures can now be solved by MR with AlphaFold predictions as guidance, and the solution can be almost completely automated. This fact is already exploited by CCP4 and its rival Phenix, with significant changes already coming for their users. In many, and probably most, cases, solving a protein structure becomes routine, in striking contrast with the situation of just a few years ago, when crystallographic and software expertise was required in about half of all cases.

However exciting all this may sound, the situation creates not only benefits but also new challenges peculiar to CCP4's role as a keeper of crystallographic knowledge and as a software developer and maintainer. Now, probably around 90% of the package will see a rather low (yet non-zero) use rate, which raises the question of how economical it is to continue supporting old code, testing it, and adjusting it to ever-evolving file formats, new types of data and so on. Likewise, should CCP4 invest in the future development of methods alternative to MR, which are likely to see drastically decreased (yet non-zero) interest? “The new way” is so much simpler and more robust (yet not a 100%-sure shot) that little prior knowledge is required from a molecular biologist to solve a structure. Will this affect the level of relevant expertise in the field? Probably yes. And how should CCP4 now revise its extensive educational program, if only a rather thin slice of its functionality is expected to be used by 90% of researchers? Given AlphaFold's clear demonstration of what AI technologies can achieve, is there scope for a deep, AI-based revision of many other CCP4 components and structure solution methods?

We do not have answers to all these questions now, but they will surely come in the next few years at the latest. CCP4 has been delivering crystallographic software for more than 40 years and has been through various challenges, only to become better and stronger. One thing is clear: we are crossing a ridge at the moment, with a vision of new challenges and opportunities on the other side. The CCP4 project needs to change in the new “AlphaFold, AI” era, and it will do so.

“CCP4 exists to produce and support a world-leading, integrated suite of programs that allows researchers to determine macromolecular structures by X-ray crystallography, and other biophysical techniques.”

[1] Jumper, J., Evans, R., Pritzel, A. et al. (2021) Nature 596, 583–589.

[2] Smith, T.F. & Waterman, M.S. (1981) J. Mol. Biol. 147(1) 195-197.

[3] Aloy, P., Stark, A., Hadley, C. & Russell, R.B. (2003) Proteins: Struct. Funct. Bioinf. 53(S6) 436-456.

[4] Yona, G. & Levitt, M. (2002) J. Mol. Biol. 315(5) 1257-1275.

[5] Söding, J. (2005) Bioinformatics 21, 951-960.

[6] Baek, M., DiMaio, F., Anishchenko, I. et al. (2021) Science 373(6557) 871-876.

[7] Winn, M. D., Ballard, C. C., Cowtan, K. D., Dodson, E. J., Emsley, P., Evans, P. R., Keegan, R. M., Krissinel, E. B., Leslie, A. G. W., McCoy, A., McNicholas, S. J., Murshudov, G. N., Pannu, N. S., Potterton, E. A., Powell, H. R., Read, R. J., Vagin, A. & Wilson, K. S. (2011). Acta Cryst. D67, 235-242.

[8] Liebschner, D., Afonine, P. V., Baker, M. L., Bunkoczi, G., Chen, V. B., Croll, T. I., Hintze, B., Hung, L.-W., Jain, S., McCoy, A. J., Moriarty, N. W., Oeffner, R. D., Poon, B. K., Prisant, M. G., Read, R. J., Richardson, J. S., Richardson, D. C., Sammito, M. D., Sobolev, O. V., Stockwell, D. H., Terwilliger, T. C., Urzhumtsev, A. G., Videau, L. L., Williams, C. J. & Adams, P. D. (2019). Acta Cryst. D75, 861-877.

[9] Krissinel, E., Lebedev, A., Uski, V., Ballard, C., Keegan, R., Kovalevskiy, O., Nicholls, R., Pannu, N., Skubák, P., Berrisford, J., Fando, M., Lohkamp, B., Wojdyr, M., Simpkin, A., Thomas, J., Oliver, C., Vonrhein, C., Chojnowski, G., Basle, A., Purkiss, A., Isupov, M., McNicholas, S., Lowe, E., Triviño, J., Cowtan, K., Agirre, J., Rigden, D., Uson, I., Lamzin, V., Tews, I., Bricogne, G., Leslie, A. & Brown, D. (2022) Acta Cryst. D, under review.



Contact: Geatches, Dawn (STFC,DL,SC)