What is the Dark Proteome

Over the past decade and a half, we have moved from an era in which protein function was equated with a well-defined 3D structure to the recognition that disordered proteins are highly abundant in eukaryotic cells and central to biological function. Recent studies have established that approximately one-third of the proteins in the human proteome are either entirely disordered (intrinsically disordered proteins, IDPs) or contain long intrinsically disordered regions (IDRs). IDPs and IDRs do not fold spontaneously into a single, well-defined 3D structure; instead, they fluctuate over an ensemble of conformations. As a result, they cannot be characterized by the traditional methods of structural biology, and studying them will require new integrative approaches and technologies. Currently, our understanding of this dark portion of the human proteome is very limited.
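Whether a protein counts toward the “one-third” figure depends on how a long IDR is defined operationally. The sketch below assumes one common convention, a run of at least 30 consecutive residues predicted to be disordered, and assumes that per-residue disorder calls (True for disordered) have already been produced by some prediction tool. The 30-residue cutoff, the function names, and the toy inputs are illustrative assumptions, not the method used in the studies mentioned above.

```python
from typing import Sequence

# Assumed convention (not from the source): a "long" IDR is a run of
# at least 30 consecutive residues flagged as disordered.
LONG_IDR_MIN_LENGTH = 30


def longest_disordered_run(disorder_flags: Sequence[bool]) -> int:
    """Return the length of the longest run of residues flagged as disordered."""
    longest = current = 0
    for is_disordered in disorder_flags:
        current = current + 1 if is_disordered else 0
        longest = max(longest, current)
    return longest


def classify_protein(disorder_flags: Sequence[bool],
                     min_run: int = LONG_IDR_MIN_LENGTH) -> str:
    """Toy rule: fully disordered -> 'IDP'; long disordered run -> 'contains long IDR'."""
    if disorder_flags and all(disorder_flags):
        return "IDP"
    if longest_disordered_run(disorder_flags) >= min_run:
        return "contains long IDR"
    return "ordered"


def fraction_with_disorder(proteome: Sequence[Sequence[bool]]) -> float:
    """Fraction of proteins that are fully disordered or carry a long IDR."""
    if not proteome:
        raise ValueError("proteome must contain at least one protein")
    hits = sum(classify_protein(p) != "ordered" for p in proteome)
    return hits / len(proteome)


if __name__ == "__main__":
    # Hypothetical per-residue calls for three made-up proteins.
    folded = [False] * 200
    with_idr = [False] * 100 + [True] * 40 + [False] * 60
    idp = [True] * 120
    print(classify_protein(with_idr))  # contains long IDR
    print(fraction_with_disorder([folded, with_idr, idp]))  # 0.6666666666666666
```

Changing `min_run` shows how sensitive any proteome-wide fraction is to the chosen definition of “long.”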

Figure 1. Historical timeline. World-wide research activity on the dark proteome has outpaced support from the US NIH.

IDPs function in the regulation and organization of the cell. They regulate key signaling pathways, transcription, translation, the cell cycle, and numerous other cellular regulatory programs, and they are directly implicated in debilitating diseases such as cancer, diabetes, cardiovascular disease, infectious disease, and neurodegenerative disorders (1,2). In keeping with their regulatory functions, the cellular abundance of IDPs is tightly controlled, and mutations or changes in abundance are associated with disease (3,4). Indeed, more than 20% of human disease mutations occur within intrinsically disordered protein regions (5). It has been estimated that, when posttranslational modifications are taken into account, disordered regions of the human proteome may contain as many as a million binding motifs that mediate protein-protein interactions (6). Viruses commonly employ IDPs that mimic cellular binding motifs in order to subvert cellular signaling networks and gain control of the host cell's regulatory machinery (7,8).

IDPs are a central component of the control circuitry of the cell. Cellular signals are encoded and propagated by enzyme action, protein-protein interactions, posttranslational modification, and cellular localization. IDPs have unique characteristics that allow them to function as central hubs in signaling networks. They frequently contain multiple binding motifs that mediate dynamic protein-protein interactions, propagating signals and forming complex information-processing circuits. Disordered protein regions are subject to combinatorial posttranslational modifications, which fine-tune their interactions and add complexity to signaling networks. IDPs are enriched in phosphorylation sites (9) and frequently undergo multisite phosphorylation to generate molecular switches and rheostats. Alternative splicing of IDPs is common and forms a basis for tissue-specific signaling (10).
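Two consequences of multisite phosphorylation can be made concrete with a toy model. First, N modifiable sites give 2^N possible on/off modification patterns, one source of the combinatorial complexity described above. Second, if a downstream interaction requires at least k of the N sites to be phosphorylated, the response to kinase activity becomes switch-like rather than graded. The sketch assumes independent, identical sites, each phosphorylated with probability p; the function names and the 9-site, 6-site-threshold parameters are hypothetical and chosen only for illustration, not taken from the cited work.

```python
from math import comb


def phosphoform_count(n_sites: int) -> int:
    """Number of distinct on/off phosphorylation patterns for n_sites sites."""
    return 2 ** n_sites


def fraction_active(p: float, n_sites: int, threshold: int) -> float:
    """Fraction of molecules with at least `threshold` of `n_sites` phosphorylated.

    Toy assumption: each site is phosphorylated independently with
    probability p, so the count of modified sites is binomial(n_sites, p).
    """
    return sum(
        comb(n_sites, i) * p**i * (1 - p) ** (n_sites - i)
        for i in range(threshold, n_sites + 1)
    )


if __name__ == "__main__":
    print(phosphoform_count(9))  # 512
    for p in (0.2, 0.4, 0.5, 0.6, 0.8):
        # Single site: the active fraction simply tracks p (graded, rheostat-like).
        graded = fraction_active(p, n_sites=1, threshold=1)
        # Hypothetical threshold of 6 of 9 sites: a steeper, sigmoidal response.
        switch = fraction_active(p, n_sites=9, threshold=6)
        print(f"p={p:.1f}  graded={graded:.2f}  switch-like={switch:.2f}")
```

Under these assumptions, raising the threshold relative to the number of sites steepens the transition, which is one simple way a multisite protein can behave as a switch while a single-site protein behaves as a rheostat.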

In addition to their regulatory roles, IDPs play important parts in the ordered assembly of cellular machines, in the organization of chromatin, in the assembly and disassembly of cytoskeletal filaments and microtubules, in the functioning of chaperones, and as flexible “entropic” linkers that connect functional protein domains. Low-complexity sequences and prion-like domains in IDPs can drive phase separation to form membraneless compartments within the cell. These structures behave as liquid droplets, and their components are in dynamic exchange with the surrounding cytoplasm or nucleoplasm. Mutations within these intrinsically disordered regions can drive the pathological fibrillization associated with amyotrophic lateral sclerosis (ALS), frontotemporal dementia, and other debilitating diseases (11,12).

Approximately 50 human diseases are associated with protein misfolding and the formation of aggregates and amyloid fibrils (13). Many of the proteins involved in protein misfolding diseases are intrinsically disordered, such as Aβ and tau (Alzheimer’s disease), α-synuclein (Parkinson’s disease), TDP-43 (ALS), and amylin (type II diabetes). Other misfolding diseases are associated with folded globular proteins that, through dynamic fluctuations in their structure, transiently populate “invisible” states that are partially unfolded and aggregation-prone. Examples include transthyretin, which is associated with systemic and familial cardiomyopathies and familial amyloid neuropathy, and superoxide dismutase, which forms cytotoxic aggregates in ALS. Although proteins like transthyretin and superoxide dismutase perform their physiological functions as native folded proteins, their association with disease arises from the transient formation of invisible, aggregation-prone states, and they therefore constitute part of the human dark proteome.


  1. Wright, P. E., and Dyson, H. J. (2015) Intrinsically disordered proteins in cellular signalling and regulation. Nature Reviews Molecular Cell Biology 16, 18-29.
  2. Uversky, V. N., Oldfield, C. J., and Dunker, A. K. (2008) Intrinsically disordered proteins in human diseases: introducing the D2 concept. Annual Review of Biophysics 37, 215-246.
  3. Gsponer, J., Futschik, M. E., Teichmann, S. A., and Babu, M. M. (2008) Tight regulation of unstructured proteins: from transcript synthesis to protein degradation. Science 322, 1365-1368.
  4. Babu, M. M., van der Lee, R., de Groot, N. S., and Gsponer, J. (2011) Intrinsically disordered proteins: regulation and disease. Current Opinion in Structural Biology 21, 432-440.
  5. Uyar, B., Weatheritt, R. J., Dinkel, H., Davey, N. E., and Gibson, T. J. (2014) Proteome-wide analysis of human disease mutations in short linear motifs: neglected players in cancer? Molecular BioSystems 10, 2626-2642.
  6. Tompa, P., Davey, N. E., Gibson, T. J., and Babu, M. M. (2014) A million peptide motifs for the molecular biologist. Molecular Cell 55, 161-169.
  7. Hagai, T., Azia, A., Babu, M. M., and Andino, R. (2014) Use of host-like peptide motifs in viral proteins is a prevalent strategy in host-virus interactions. Cell Reports 7, 1729-1739.
  8. Davey, N. E., Travé, G., and Gibson, T. J. (2011) How viruses hijack cell regulation. Trends in Biochemical Sciences 36, 159-169.
  9. Iakoucheva, L. M., Radivojac, P., Brown, C. J., O’Connor, T. R., Sikes, J. G., Obradovic, Z., and Dunker, A. K. (2004) The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Research 32, 1037-1049.
  10. Buljan, M., Chalancon, G., Eustermann, S., Wagner, G. P., Fuxreiter, M., Bateman, A., and Babu, M. M. (2012) Tissue-specific splicing of disordered segments that embed binding motifs rewires protein interaction networks. Molecular Cell 46, 871-883.
  11. Patel, A., Lee, H. O., Jawerth, L., Maharana, S., Jahnel, M., Hein, M. Y., Stoynov, S., Mahamid, J., Saha, S., Franzmann, T. M., Pozniakovski, A., Poser, I., Maghelli, N., Royer, L. A., Weigert, M., Myers, E. W., Grill, S., Drechsel, D., Hyman, A. A., and Alberti, S. (2015) A liquid-to-solid phase transition of the ALS protein FUS accelerated by disease mutation. Cell 162, 1066-1077.
  12. Molliex, A., Temirov, J., Lee, J., Coughlin, M., Kanagaraj, A. P., Kim, H. J., Mittag, T., and Taylor, J. P. (2015) Phase separation by low complexity domains promotes stress granule assembly and drives pathological fibrillization. Cell 163, 123-133.
  13. Knowles, T. P. J., Vendruscolo, M., and Dobson, C. M. (2014) The amyloid state and its association with protein misfolding diseases. Nature Reviews Molecular Cell Biology 15, 384-396.