First, we classified the entire sequences of
into 20^4 (= 160,000) patterns according to their four C-terminal amino acids.
Then, we defined the function of each pattern as the frequency of
terms annotated to the sequences sharing that pattern.
Finally, we calculated the dissimilarity between the functions of the patterns
using the Tanimoto coefficient, extended to handle real-valued vectors.
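
The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names, the input format (pairs of a sequence and its annotated terms), and the normalization of term counts by the number of sequences per pattern are all hypothetical. The text does not specify which real-valued extension of the Tanimoto coefficient was used; the sketch assumes the common form T(u, v) = u·v / (|u|^2 + |v|^2 - u·v) and takes 1 - T as the dissimilarity.

```python
from collections import Counter, defaultdict


def c_terminal_pattern(sequence, k=4):
    """Return the last k residues of a sequence, or None if it is shorter than k."""
    return sequence[-k:] if len(sequence) >= k else None


def pattern_functions(records, k=4):
    """Map each C-terminal pattern to a term-frequency vector.

    records: iterable of (sequence, terms) pairs (hypothetical input format).
    The frequency of a term is the fraction of sequences with that pattern
    that carry the term (an assumed normalization).
    """
    term_counts = defaultdict(Counter)
    n_sequences = Counter()
    for sequence, terms in records:
        pattern = c_terminal_pattern(sequence, k)
        if pattern is None:
            continue  # sequence too short to have a k-residue C-terminus
        n_sequences[pattern] += 1
        term_counts[pattern].update(set(terms))  # count each term once per sequence
    return {
        pattern: {term: count / n_sequences[pattern] for term, count in counts.items()}
        for pattern, counts in term_counts.items()
    }


def tanimoto_dissimilarity(u, v):
    """1 - T(u, v) for sparse real-valued vectors stored as dicts.

    Assumes T(u, v) = u.v / (|u|^2 + |v|^2 - u.v).
    """
    dot = sum(value * v.get(term, 0.0) for term, value in u.items())
    norm_u = sum(value * value for value in u.values())
    norm_v = sum(value * value for value in v.values())
    denominator = norm_u + norm_v - dot
    if denominator == 0:
        return 0.0  # both vectors are empty; treated as identical (assumption)
    return 1.0 - dot / denominator


# Toy data: sequences and term labels are made up for illustration.
records = [
    ("MKTAYSKDEL", ["T1", "T2"]),
    ("AAAAKDEL", ["T1"]),
    ("GGGGSKL", ["T3"]),
]
functions = pattern_functions(records)
distance = tanimoto_dissimilarity(functions["KDEL"], functions["GSKL"])
```

Storing each pattern's function as a sparse dictionary avoids allocating a dense vector over every term for all 160,000 possible patterns, most of which will carry only a few terms.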