"1.6) Parse the FASTA file entry by entry. An entry consists of a header and the corresponding sequence (which may span multiple lines). The result should be a list of tuples of the form (header, sequence)."
],
"metadata": {
"id": "ItrnPkVE5fsv"
}
},
{
"cell_type": "code",
"source": [
"\"\"\"\n",
"Parse a FASTA file entry-wise into a list of tuples of the form (header, sequence).\n",
"\"\"\"\n",
"def read_fasta(file):\n",
"    result = []\n",
"\n",
"    header = None\n",
"    sequence = []\n",
"    for line in file:\n",
"        # remove all whitespace from both ends\n",
"        line = line.strip()\n",
"        if line.startswith('>'):\n",
"            # on a new header, store the previous FASTA block as a tuple after\n",
"            # concatenating its sequence lines (if there is a previous block)\n",
"            if header:\n",
"                result.append((header, ''.join(sequence)))\n",
"\n",
"            header = line\n",
"            sequence = []\n",
"        else:\n",
"            # current line is not a header\n",
"            # add line to the list of sequence lines of the current FASTA block after removing all whitespace from it\n",
1.6) Parse the FASTA file entry by entry. An entry consists of a header and the corresponding sequence (which may span multiple lines). The result should be a list of tuples of the form (header, sequence).
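
To make the target data structure concrete, here is a minimal sketch using a hypothetical two-entry input. The names `example_lines`, `expected`, and `group_entries` are illustrative assumptions and not part of the exercise, and the sketch assumes the leading `>` is kept as part of the header.

```python
# Hypothetical FASTA content, given as a list of lines (illustration only).
example_lines = [
    ">seq1 first example entry",
    "MKTAYIAK",
    "QRQISFVK",
    ">seq2 second example entry",
    "GATTACA",
]

# Target shape: one (header, sequence) tuple per entry,
# with all sequence lines of an entry joined into a single string.
expected = [
    (">seq1 first example entry", "MKTAYIAKQRQISFVK"),
    (">seq2 second example entry", "GATTACA"),
]


def group_entries(lines):
    """Group FASTA lines into (header, sequence) tuples (illustrative sketch)."""
    entries = []  # list of (header, [sequence lines])
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # a header line opens a new entry
            entries.append((line, []))
        elif entries:
            # a sequence line belongs to the most recent entry;
            # lines before the first header are ignored
            entries[-1][1].append(line)
    return [(header, "".join(parts)) for header, parts in entries]


print(group_entries(example_lines) == expected)
```

Collecting the sequence lines in a list and joining them once per entry avoids repeated string concatenation for long sequences.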
%% Cell type:code id: tags:
```
"""
Parse a FASTA file entry-wise into a list of tuples of the form (header, sequence).
"""
def read_fasta(file):
    result = []
    header = None
    sequence = []
    for line in file:
        # remove all whitespace from both ends
        line = line.strip()
        if line.startswith('>'):
            # on a new header, store the previous FASTA block as a tuple after
            # concatenating its sequence lines (if there is a previous block)
            if header:
                result.append((header, ''.join(sequence)))
            header = line
            sequence = []
        else:
            # current line is not a header
            # add line to the list of sequence lines of the current FASTA block after removing all whitespace from it