Background (Supplementary material 1)

The annotation of many human chromosomes is largely supported by computational predictions, including predictions for genes that show no similarity to known proteins or expressed sequence tag (EST) sequences. Genes of unknown function, called orphan genes, code for proteins that are annotated as "hypothetical proteins". Hence, there is a need to begin constructing and analyzing protein families clustered as "hypothetical proteins", with the aim of elucidating their functions and protein subunit interactions [2-5].

(Figure 1a [6]; Figure 1b, Homo sapiens [7].)

When we began mining these proteins, we noticed a few hypothetical proteins that contain the amino acid residues histidine, tyrosine and proline (HYP) in succession. These motifs might have become established through mutations introduced into the proteins at one or more predicted non-essential residues [5, 8].

Observations and challenges

We also observe that a few hypothetical proteins located at different chromosomal loci are known to have the same putative function. There is now a pressing need to categorize approaches that go beyond traditional sequence similarity and exploit the very large amounts of data available for the computational prediction of function. With such approaches in hand, the subset of proteins whose assignments agree across several of the experimental approaches could be used as a predictor, circumventing the use of wet-laboratory experiments in the near future [7].

Another issue is to assess whether any of the hypothetical proteins have been given proper functional annotations, attributed to the sequence:structure:function relationship in the case of ordered proteins, or to the sequence:un-structure:function relationship in the case of intrinsically disordered proteins.

In conclusion, current methods could play an important role in establishing functions for proteins annotated as hypothetical in the genome.

Note: In the title of the article, "hypo" is an abbreviation for hypothetical proteins.
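The search for histidine, tyrosine and proline in succession described above can be sketched as a simple scan over protein sequences. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the proteins are available as FASTA text, that hypothetical proteins are recognized by the phrase "hypothetical protein" in the FASTA header, and the function names `parse_fasta`, `motif_positions` and `scan_hypothetical` as well as the example sequences are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            # Sequence lines may be wrapped; join them and normalize case.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def motif_positions(sequence: str, motif: str = "HYP") -> List[int]:
    """Return 0-based start positions of every (possibly overlapping) occurrence of motif."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


def scan_hypothetical(text: str, motif: str = "HYP") -> Dict[str, List[int]]:
    """Map each 'hypothetical protein' record that contains the motif to its motif positions.

    Assumption: hypothetical proteins are identified by their header text only.
    """
    hits: Dict[str, List[int]] = {}
    for header, seq in parse_fasta(text):
        if "hypothetical protein" not in header.lower():
            continue
        positions = motif_positions(seq, motif)
        if positions:
            seq_id = header.split()[0]  # first token of the header as identifier
            hits[seq_id] = positions
    return hits


# Hypothetical example records, for illustration only.
example = """>protA hypothetical protein
MKTHYPLLAGHYPQ
>protB hypothetical protein
MSTAVLKEEGW
>protC kinase
MHYPRRST
"""

print(scan_hypothetical(example))  # prints {'protA': [3, 10]}
```

Here protC also contains HYP but is skipped because its header does not mark it as hypothetical. A plain substring scan is used rather than a regular expression because the motif is a fixed tripeptide; a pattern-based search would be a natural extension for degenerate motifs.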
Supplementary material Data 1