Commit bef72f98 authored by Franziska Niemeyer

Upload solutions for exercises_C

parent 2a410e29
%% Cell type:markdown id: tags:
# Python course 2021 - Exercises C
%% Cell type:markdown id: tags:
## Part 1 - File handling
%% Cell type:markdown id: tags:
---
1.1) Count the number of sequences (i.e. the number of headers) in AtCol0_Exons.fasta!
%% Cell type:code id: tags:
```
from google.colab import drive
drive.mount('/content/drive')
```
%% Output
Mounted at /content/drive
%% Cell type:code id: tags:
```
# read all lines of the FASTA file; the with block closes the file automatically
with open("/content/drive/MyDrive/PythonProgramming/AtCol0_Exons.fasta", "r") as datei:
    lines = datei.readlines()
```
%% Cell type:code id: tags:
```
def get_num_headers(lines):
    """Count the header lines, i.e. the lines starting with '>'."""
    num_headers = 0
    for line in lines:
        if line:
            if line[0] == ">":
                num_headers += 1
    return num_headers


print(get_num_headers(lines))
```
%% Output
217183
%% Cell type:markdown id: tags:
---
1.2) Count the number of sequence lines!
%% Cell type:code id: tags:
```
def get_num_sequence_lines(lines):
    """Count the non-empty lines that do not start with '>'."""
    num_sequence_lines = 0
    for line in lines:
        if line:
            if line[0] != ">":
                num_sequence_lines += 1
    return num_sequence_lines


print(get_num_sequence_lines(lines))
```
%% Output
916024
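%% Cell type:markdown id: tags:
Exercises 1.1 and 1.2 each loop over all lines once. As a minimal sketch, both counts could also be collected in a single pass. The function name `count_line_types` and the small `sample` list are made up for illustration. Unlike the solutions above, this variant skips blank lines, so its numbers may differ on files that contain empty lines.
%% Cell type:code id: tags:
```python
def count_line_types(lines):
    """Return (number of header lines, number of sequence lines)."""
    num_headers = 0
    num_sequence_lines = 0
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines count as neither type
        if line.startswith(">"):
            num_headers += 1
        else:
            num_sequence_lines += 1
    return num_headers, num_sequence_lines


# hypothetical mini FASTA content, not taken from AtCol0_Exons.fasta
sample = [">exon1\n", "ACGT\n", "GGCC\n", ">exon2\n", "TTAA\n"]
print(count_line_types(sample))
```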
%% Cell type:markdown id: tags:
---
1.3) Count the number of characters in the document! (How many are there per line on average?)
%% Cell type:code id: tags:
```
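def get_num_characters(lines):
    """Return (total number of characters, average number of characters per line)."""
    num_characters = 0
    num_lines = 0
    for line in lines:
        # len(line) includes the trailing newline character
        num_characters += len(line)
        num_lines += 1
    return (num_characters, num_characters / num_lines)


print(get_num_characters(lines))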
```
%% Output
(81803755, 72.18783064347467)
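%% Cell type:markdown id: tags:
The lines returned by `readlines()` keep their line break, so the character count in 1.3 includes one newline per line. The following sketch shows the difference on a made-up `sample` list (not data from the exercise file); whether newlines should be counted depends on how the exercise is read.
%% Cell type:code id: tags:
```python
# hypothetical lines as readlines() would return them
sample = [">exon1\n", "ACGT\n", "GG\n"]

# count every character, including the trailing newline of each line
with_newlines = sum(len(line) for line in sample)

# count only the visible characters by removing the trailing newline first
without_newlines = sum(len(line.rstrip("\n")) for line in sample)

print(with_newlines, without_newlines)
```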
%% Cell type:markdown id: tags:
---
1.4) How long are all contained sequences combined?
%% Cell type:code id: tags:
```
def get_sequence_length(lines):
    """Sum the lengths of all sequence lines, ignoring surrounding whitespace."""
    total_sequence_length = 0
    for line in lines:
        if line:
            if line[0] != ">":
                line = line.strip()
                total_sequence_length += len(line)
    return total_sequence_length


print(get_sequence_length(lines))
```
%% Output
64867051
%% Cell type:markdown id: tags:
---
1.5) Calculate the average sequence length in this file!
%% Cell type:code id: tags:
```
def get_average_sequence_length(lines):
    """Divide the combined sequence length by the number of sequences (headers)."""
    return get_sequence_length(lines) / get_num_headers(lines)


print(get_average_sequence_length(lines))
```
%% Output
298.67462462531597
%% Cell type:markdown id: tags:
**Additional exercises**
%% Cell type:markdown id: tags:
1.6) Parse the fasta file entry-wise. An entry consists of a header and the corresponding sequence (which may comprise multiple lines). The result should be a list of tuples of the form (header, sequence).
%% Cell type:code id: tags:
```
from string import whitespace


def read_fasta(file):
    """
    Parse a fasta file entry-wise as a list of tuples of the form (header, sequence).
    """
    result = []
    header = None
    sequence = []
    for line in file:
        # remove all whitespace from the ends
        line = line.strip()
        if line.startswith('>'):
            # when a header is found, store the previous FASTA block as a tuple
            # after concatenating its sequence lines (if there is a previous block)
            if header:
                result.append((header, ''.join(sequence)))
            header = line
            sequence = []
        else:
            # current line is not a header:
            # add it to the sequence lines of the current FASTA block
            # after removing all whitespace from it
            sequence.append(line.translate(str.maketrans('', '', whitespace)))
    # store the last block, which is not followed by another header
    if header:
        result.append((header, ''.join(sequence)))
    return result
```
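%% Cell type:markdown id: tags:
For a file of this size it can help to avoid building the full list in memory. The sketch below is one possible generator-based variant of the entry-wise parsing idea from 1.6, not the course's reference solution. The name `iter_fasta` and the in-memory `sample` are assumptions made for illustration; `io.StringIO` stands in for an opened file.
%% Cell type:code id: tags:
```python
import io


def iter_fasta(handle):
    """Yield (header, sequence) tuples one entry at a time."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # a new header closes the previous entry, if there is one
            if header is not None:
                yield header, "".join(chunks)
            header = line
            chunks = []
        else:
            # drop any whitespace inside the sequence line
            chunks.append("".join(line.split()))
    # the last entry is not followed by another header
    if header is not None:
        yield header, "".join(chunks)


# hypothetical mini FASTA content, not taken from AtCol0_Exons.fasta
sample = io.StringIO(">exon1\nACGT\nGG CC\n\n>exon2\nTTAA\n")
entries = list(iter_fasta(sample))
print(entries)
```

Wrapping the generator in `list(...)` gives the same list-of-tuples shape that exercise 1.6 asks for, while a plain `for` loop over `iter_fasta(...)` processes one entry at a time.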