Get N sequential lines matching a pattern
I have this file (more than 200K lines):
##gff-version 2
##source-version bepipred-1.0b
##date 2017-03-02
##Type Protein EgrG_000076200.1
##Protein EgrG_000076200.1 cat seq.1.fsa
##MSPRGCLLLLMLVVILGISIQWTEAQGHRSDGQAEEFAVAKEMEEEDDDDEGEDYDDDDEEEEKEVVANRESKLLKHCLNLQNALKEKMESVVNQMKDCSKILALA
##end-Protein
# seqname source feature start end score N/A ?
# ---------------------------------------------------------------------------
EgrG_000076200.1 bepipred-1.0b epitope 1 1 0.920 . . M|E
EgrG_000076200.1 bepipred-1.0b epitope 2 2 0.544 . . S|.
EgrG_000076200.1 bepipred-1.0b epitope 3 3 -0.070 . . P|.
EgrG_000076200.1 bepipred-1.0b epitope 12 12 -3.747 . . L|.
EgrG_000076200.1 bepipred-1.0b epitope 13 13 -3.223 . . V|.
EgrG_000076200.1 bepipred-1.0b epitope 14 14 -2.999 . . V|.
EgrG_000076200.1 bepipred-1.0b epitope 15 15 -2.401 . . I|.
EgrG_000076200.1 bepipred-1.0b epitope 16 16 -2.271 . . L|.
EgrG_000076200.1 bepipred-1.0b epitope 17 17 -1.701 . . G|.
EgrG_000076200.1 bepipred-1.0b epitope 18 18 -1.569 . . I|.
EgrG_000076200.1 bepipred-1.0b epitope 19 19 -1.072 . . S|.
EgrG_000076200.1 bepipred-1.0b epitope 20 20 -0.532 . . I|.
EgrG_000076200.1 bepipred-1.0b epitope 21 21 -0.055 . . Q|.
EgrG_000076200.1 bepipred-1.0b epitope 22 22 0.128 . . W|.
EgrG_000076200.1 bepipred-1.0b epitope 23 23 0.553 . . T|.
EgrG_000076200.1 bepipred-1.0b epitope 24 24 0.541 . . E|.
EgrG_000076200.1 bepipred-1.0b epitope 25 25 0.923 . . A|E
EgrG_000076200.1 bepipred-1.0b epitope 26 26 0.992 . . Q|E
EgrG_000076200.1 bepipred-1.0b epitope 27 27 1.480 . . G|E
EgrG_000076200.1 bepipred-1.0b epitope 28 28 1.540 . . H|E
EgrG_000076200.1 bepipred-1.0b epitope 29 29 1.564 . . R|E
EgrG_000076200.1 bepipred-1.0b epitope 30 30 1.591 . . S|E
EgrG_000076200.1 bepipred-1.0b epitope 31 31 1.582 . . D|E
EgrG_000076200.1 bepipred-1.0b epitope 32 32 1.599 . . G|E
EgrG_000076200.1 bepipred-1.0b epitope 33 33 1.280 . . Q|E
EgrG_000076200.1 bepipred-1.0b epitope 34 34 1.101 . . A|E
EgrG_000076200.1 bepipred-1.0b epitope 35 35 0.777 . . E|.
EgrG_000076200.1 bepipred-1.0b epitope 36 36 0.516 . . E|.
EgrG_000076200.1 bepipred-1.0b epitope 37 37 0.353 . . F|.
EgrG_000076200.1 bepipred-1.0b epitope 38 38 0.273 . . A|.
EgrG_000076200.1 bepipred-1.0b epitope 39 39 0.068 . . V|.
EgrG_000076200.1 bepipred-1.0b epitope 40 40 0.086 . . A|.
EgrG_000076200.1 bepipred-1.0b epitope 41 41 0.124 . . K|.
EgrG_000076200.1 bepipred-1.0b epitope 42 42 0.648 . . E|.
EgrG_000076200.1 bepipred-1.0b epitope 43 43 1.026 . . M|E
EgrG_000076200.1 bepipred-1.0b epitope 44 44 1.520 . . E|E
EgrG_000076200.1 bepipred-1.0b epitope 45 45 1.842 . . E|E
EgrG_000076200.1 bepipred-1.0b epitope 46 46 2.132 . . E|E
EgrG_000076200.1 bepipred-1.0b epitope 47 47 2.271 . . D|E
EgrG_000076200.1 bepipred-1.0b epitope 48 48 2.605 . . D|E
EgrG_000076200.1 bepipred-1.0b epitope 49 49 2.669 . . D|E
EgrG_000076200.1 bepipred-1.0b epitope 50 50 2.778 . . D|E
EgrG_000076200.1 bepipred-1.0b epitope 51 51 2.544 . . E|E
EgrG_000076200.1 bepipred-1.0b epitope 52 52 2.506 . . G|E
EgrG_000076200.1 bepipred-1.0b epitope 53 53 2.464 . . E|E
EgrG_000076200.1 bepipred-1.0b epitope 54 54 2.464 . . D|E
##Protein EgrG_000524000.1 cat seq.3.fsa
##MATAQRLLTASLLLISVLIPLISARRPSYYVHGLKFSRPCENNTYDEMTGNFKCTVPTGAECFQLCQQYGCYEWSFSSFMPSTDMHVRDHFRCRCIQDICLYNYVRVRDRDYE
##end-Protein
# seqname source feature start end score N/A ?
# ---------------------------------------------------------------------------
EgrG_000524000.1 bepipred-1.0b epitope 1 1 0.143 . . M|.
EgrG_000524000.1 bepipred-1.0b epitope 2 2 0.068 . . A|.
EgrG_000524000.1 bepipred-1.0b epitope 3 3 -0.340 . . T|.
EgrG_000524000.1 bepipred-1.0b epitope 4 4 -0.654 . . A|.
EgrG_000524000.1 bepipred-1.0b epitope 5 5 -0.563 . . Q|.
EgrG_000524000.1 bepipred-1.0b epitope 6 6 -0.500 . . R|.
EgrG_000524000.1 bepipred-1.0b epitope 7 7 -0.448 . . L|.
EgrG_000524000.1 bepipred-1.0b epitope 8 8 -0.904 . . L|.
EgrG_000524000.1 bepipred-1.0b epitope 41 41 1.129 . . E|E
EgrG_000524000.1 bepipred-1.0b epitope 42 42 1.135 . . N|E
EgrG_000524000.1 bepipred-1.0b epitope 43 43 1.223 . . N|E
EgrG_000524000.1 bepipred-1.0b epitope 48 48 0.557 . . M|.
EgrG_000524000.1 bepipred-1.0b epitope 49 49 0.415 . . T|.
EgrG_000524000.1 bepipred-1.0b epitope 50 50 0.269 . . G|.
EgrG_000524000.1 bepipred-1.0b epitope 51 51 0.188 . . N|.
EgrG_000524000.1 bepipred-1.0b epitope 52 52 -0.024 . . F|.
EgrG_000524000.1 bepipred-1.0b epitope 53 53 0.184 . . K|.
EgrG_000524000.1 bepipred-1.0b epitope 54 54 0.280 . . C|.
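etc.
I need to select only the lines that end with '|E', and only where at least 6 such lines occur in a row. Then, wherever the next line ends with '|.', a separator should be inserted. Like: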
EgrG_000076200.1 bepipred-1.0b epitope 25 25 0.923 . . A|E
EgrG_000076200.1 bepipred-1.0b epitope 26 26 0.992 . . Q|E
EgrG_000076200.1 bepipred-1.0b epitope 27 27 1.480 . . G|E
EgrG_000076200.1 bepipred-1.0b epitope 28 28 1.540 . . H|E
EgrG_000076200.1 bepipred-1.0b epitope 29 29 1.564 . . R|E
EgrG_000076200.1 bepipred-1.0b epitope 30 30 1.591 . . S|E
EgrG_000076200.1 bepipred-1.0b epitope 31 31 1.582 . . D|E
EgrG_000076200.1 bepipred-1.0b epitope 32 32 1.599 . . G|E
EgrG_000076200.1 bepipred-1.0b epitope 33 33 1.280 . . Q|E
EgrG_000076200.1 bepipred-1.0b epitope 34 34 1.101 . . A|E
-----
EgrG_000094950.1 bepipred-1.0b epitope 146 146 1.277 . . I|E
EgrG_000094950.1 bepipred-1.0b epitope 147 147 1.443 . . N|E
EgrG_000094950.1 bepipred-1.0b epitope 148 148 1.593 . . G|E
EgrG_000094950.1 bepipred-1.0b epitope 149 149 1.740 . . E|E
EgrG_000094950.1 bepipred-1.0b epitope 150 150 1.752 . . D|E
EgrG_000094950.1 bepipred-1.0b epitope 151 151 2.206 . . E|E
EgrG_000094950.1 bepipred-1.0b epitope 152 152 2.243 . . E|E
EgrG_000094950.1 bepipred-1.0b epitope 153 153 2.194 . . E|E
EgrG_000094950.1 bepipred-1.0b epitope 154 154 1.840 . . A|E
EgrG_000094950.1 bepipred-1.0b epitope 155 155 1.451 . . D|E
EgrG_000094950.1 bepipred-1.0b epitope 156 156 1.298 . . E|E
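I need this because, after selecting these clusters of lines ending with '|E', I want to count how many groups (= runs of >= 6 consecutive '|E' lines) I got for each ID (ID = EgrG_*).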
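The per-ID count described above could be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions: the function names `hit_id` and `count_runs_per_id` are made up for illustration, a run is assumed to end at any line that does not end with '|E' (including '#' comment lines) or when the ID in the first column changes, and `SAMPLE` is an abridged excerpt of the rows shown above.

```python
from collections import Counter
from itertools import groupby

# Abridged excerpt of the rows shown above (positions are skipped on purpose).
SAMPLE = """\
# seqname source feature start end score N/A ?
EgrG_000076200.1 bepipred-1.0b epitope 1 1 0.920 . . M|E
EgrG_000076200.1 bepipred-1.0b epitope 2 2 0.544 . . S|.
EgrG_000076200.1 bepipred-1.0b epitope 25 25 0.923 . . A|E
EgrG_000076200.1 bepipred-1.0b epitope 26 26 0.992 . . Q|E
EgrG_000076200.1 bepipred-1.0b epitope 27 27 1.480 . . G|E
EgrG_000076200.1 bepipred-1.0b epitope 28 28 1.540 . . H|E
EgrG_000076200.1 bepipred-1.0b epitope 29 29 1.564 . . R|E
EgrG_000076200.1 bepipred-1.0b epitope 30 30 1.591 . . S|E
EgrG_000076200.1 bepipred-1.0b epitope 35 35 0.777 . . E|.
EgrG_000524000.1 bepipred-1.0b epitope 41 41 1.129 . . E|E
EgrG_000524000.1 bepipred-1.0b epitope 42 42 1.135 . . N|E
EgrG_000524000.1 bepipred-1.0b epitope 43 43 1.223 . . N|E
EgrG_000524000.1 bepipred-1.0b epitope 48 48 0.557 . . M|.
"""


def hit_id(line):
    """Return the ID (first column) of a line ending with '|E', else None."""
    line = line.strip()
    if line.startswith("#") or not line.endswith("|E"):
        return None
    return line.split()[0]


def count_runs_per_id(lines, min_run=6):
    """Count, per ID, the runs of at least min_run consecutive '|E' lines."""
    counts = Counter()
    # groupby starts a new group whenever hit_id changes, so a '|.' line,
    # a comment line, or a different ID each end the current run.
    for seq_id, group in groupby(lines, key=hit_id):
        if seq_id is not None and sum(1 for _ in group) >= min_run:
            counts[seq_id] += 1
    return counts


if __name__ == "__main__":
    for seq_id, n in count_runs_per_id(SAMPLE.splitlines()).items():
        print(seq_id, n)
```

For the real file, the same function can be fed an open file object instead of `SAMPLE.splitlines()`, since it reads its input line by line. IDs without any qualifying run simply do not appear in the result.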
I tried doing the selection with: grep 'EgrG' bepipred_eg_es_final_rev.txt | sed '/^#/d;s/.*|\./-----/' | uniq
Here is an example of what I got:
-----
EgrG_000076200.1 bepipred-1.0b epitope 1 1 0.920 . . M|E
-----
EgrG_000076200.1 bepipred-1.0b epitope 25 25 0.923 . . A|E
EgrG_000076200.1 bepipred-1.0b epitope 26 26 0.992 . . Q|E
EgrG_000076200.1 bepipred-1.0b epitope 27 27 1.480 . . G|E
EgrG_000076200.1 bepipred-1.0b epitope 28 28 1.540 . . H|E
EgrG_000076200.1 bepipred-1.0b epitope 29 29 1.564 . . R|E
EgrG_000076200.1 bepipred-1.0b epitope 30 30 1.591 . . S|E
EgrG_000076200.1 bepipred-1.0b epitope 31 31 1.582 . . D|E
EgrG_000076200.1 bepipred-1.0b epitope 32 32 1.599 . . G|E
EgrG_000076200.1 bepipred-1.0b epitope 33 33 1.280 . . Q|E
EgrG_000076200.1 bepipred-1.0b epitope 34 34 1.101 . . A|E
-----
EgrG_000076200.1 bepipred-1.0b epitope 43 43 1.026 . . M|E
EgrG_000076200.1 bepipred-1.0b epitope 44 44 1.520 . . E|E
EgrG_000076200.1 bepipred-1.0b epitope 45 45 1.842 . . E|E
EgrG_000076200.1 bepipred-1.0b epitope 46 46 2.132 . . E|E
EgrG_000076200.1 bepipred-1.0b epitope 47 47 2.271 . . D|E
EgrG_000076200.1 bepipred-1.0b epitope 48 48 2.605 . . D|E
EgrG_000076200.1 bepipred-1.0b epitope 49 49 2.669 . . D|E
EgrG_000076200.1 bepipred-1.0b epitope 50 50 2.778 . . D|E
EgrG_000076200.1 bepipred-1.0b epitope 51 51 2.544 . . E|E
EgrG_000076200.1 bepipred-1.0b epitope 52 52 2.506 . . G|E
EgrG_000076200.1 bepipred-1.0b epitope 53 53 2.464 . . E|E
EgrG_000076200.1 bepipred-1.0b epitope 54 54 2.464 . . D|E
EgrG_000076200.1 bepipred-1.0b epitope 55 55 2.455 . . Y|E
EgrG_000076200.1 bepipred-1.0b epitope 56 56 2.442 . . D|E
EgrG_000076200.1 bepipred-1.0b epitope 57 57 2.457 . . D|E
EgrG_000076200.1 bepipred-1.0b epitope 58 58 2.451 . . D|E
EgrG_000076200.1 bepipred-1.0b epitope 59 59 2.390 . . D|E
EgrG_000076200.1 bepipred-1.0b epitope 60 60 2.477 . . E|E
EgrG_000076200.1 bepipred-1.0b epitope 61 61 2.295 . . E|E
EgrG_000076200.1 bepipred-1.0b epitope 62 62 1.861 . . E|E
EgrG_000076200.1 bepipred-1.0b epitope 63 63 1.400 . . E|E
EgrG_000076200.1 bepipred-1.0b epitope 64 64 1.014 . . K|E
-----
EgrG_000131300.1 bepipred-1.0b epitope 37 37 0.984 . . N|E
-----
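The thing is, I don't know how to remove the groups that contain fewer than 6 lines ending with '|E'. I also tried with Python, but I got almost the same result.
By the way, I am working on Linux Mint 18.1 and Ubuntu 16.04.
I hope I have explained it well.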
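One possible way to drop the short groups is to collect each run of consecutive '|E' lines first and keep it only if it is long enough. The following Python is a minimal sketch, not a definitive solution: the names `hit_id`, `long_runs` and `format_runs` are hypothetical, a run is assumed to break at any line not ending with '|E' (comment lines included) or when the ID changes, and `SAMPLE` is an abridged excerpt of the data above.

```python
from itertools import groupby

# Abridged excerpt of the rows shown above (positions are skipped on purpose).
SAMPLE = """\
EgrG_000076200.1 bepipred-1.0b epitope 1 1 0.920 . . M|E
EgrG_000076200.1 bepipred-1.0b epitope 2 2 0.544 . . S|.
EgrG_000076200.1 bepipred-1.0b epitope 25 25 0.923 . . A|E
EgrG_000076200.1 bepipred-1.0b epitope 26 26 0.992 . . Q|E
EgrG_000076200.1 bepipred-1.0b epitope 27 27 1.480 . . G|E
EgrG_000076200.1 bepipred-1.0b epitope 28 28 1.540 . . H|E
EgrG_000076200.1 bepipred-1.0b epitope 29 29 1.564 . . R|E
EgrG_000076200.1 bepipred-1.0b epitope 30 30 1.591 . . S|E
EgrG_000076200.1 bepipred-1.0b epitope 35 35 0.777 . . E|.
EgrG_000524000.1 bepipred-1.0b epitope 41 41 1.129 . . E|E
EgrG_000524000.1 bepipred-1.0b epitope 42 42 1.135 . . N|E
EgrG_000524000.1 bepipred-1.0b epitope 43 43 1.223 . . N|E
EgrG_000524000.1 bepipred-1.0b epitope 48 48 0.557 . . M|.
"""


def hit_id(line):
    """Return the ID (first column) of a line ending with '|E', else None."""
    line = line.strip()
    if line.startswith("#") or not line.endswith("|E"):
        return None
    return line.split()[0]


def long_runs(lines, min_run=6):
    """Yield each run of at least min_run consecutive '|E' lines as a list."""
    stripped = (line.rstrip() for line in lines)
    # A new group starts whenever hit_id changes: on a '|.' line,
    # on a comment line, or when the ID in the first column changes.
    for seq_id, group in groupby(stripped, key=hit_id):
        block = list(group)
        if seq_id is not None and len(block) >= min_run:
            yield block


def format_runs(runs, separator="-----"):
    """Join the kept runs, placing the separator line between them."""
    return f"\n{separator}\n".join("\n".join(run) for run in runs)


if __name__ == "__main__":
    print(format_runs(long_runs(SAMPLE.splitlines())))
```

On the sample, the single 'M|E' line at position 1 and the three-line group of EgrG_000524000.1 are discarded, and only the six-line run at positions 25 to 30 is kept. To process the real file, an open file object can be passed to `long_runs` in place of `SAMPLE.splitlines()`.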
Thanks @klashxx, it helped me a lot. Would you mind explaining to me what this awk code does? Like, why/what is 'i=1;i++', etc. Thanks in advance.
Sure @TiagoMinuzzi, done xP – klashxx