Commit 2a410e29 authored by Franziska Niemeyer
Upload solutions for exercises_B

parent 49f1bfc2
%% Cell type:markdown id: tags:
# Python course 2021 - Exercises B
%% Cell type:markdown id: tags:
## Part1 - control structures
%% Cell type:markdown id: tags:
---
1.1) Write a script for guessing numbers!
%% Cell type:markdown id: tags:
---
1.2) Add tips (smaller/larger) during the guessing process!
%% Cell type:code id: tags:
```
import random


def guessing_game(num_tries, upper_limit):
    # randrange(upper_limit) draws an integer from 0 to upper_limit - 1
    true_number = random.randrange(upper_limit)
    for i in range(num_tries):
        user_input = int(input("Enter a number: "))
        if user_input == true_number:
            print("Correct! You win the game")
            return
        elif user_input < true_number:
            print("Too low! Guess a higher number")
        else:
            print("Too high! Guess a lower number")
    print("You are out of attempts. Better luck next time")
    print(f"The correct number was {true_number}")


guessing_game(3, 10)
```
%% Output
Enter a number: 5
Too high! Guess a lower number
Enter a number: 2
Correct! You win the game
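%% Cell type:markdown id: tags:
In the solution above, `int(input(...))` raises a `ValueError` when the user types something that is not a whole number. A minimal sketch of one way to guard against that is shown below. The helper names `parse_guess` and `give_hint` are assumptions for this illustration and are not part of the original solution; they separate parsing and hint logic from the interactive loop so that both can be tried without keyboard input.
%% Cell type:code id: tags:
```python
def parse_guess(text):
    """Return text converted to int, or None if it is not a whole number."""
    try:
        return int(text)
    except ValueError:
        return None


def give_hint(guess, true_number):
    """Return the same feedback messages as guessing_game, without printing."""
    if guess == true_number:
        return "Correct! You win the game"
    if guess < true_number:
        return "Too low! Guess a higher number"
    return "Too high! Guess a lower number"


# Non-interactive demonstration with a fixed target number of 2
for text in ["five", "5", "2"]:
    guess = parse_guess(text)
    if guess is None:
        print(f"'{text}' is not a whole number, please try again")
    else:
        print(give_hint(guess, 2))
```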
%% Cell type:markdown id: tags:
## Part2 - loops
%% Cell type:markdown id: tags:
---
2.1) Write a function counting to 100 and printing all numbers which can be divided by 4 without a remainder!
* Info: `10 % 2` is the modulo operation in Python (it returns the remainder of the division)
%% Cell type:code id: tags:
```
def get_multiples_of_four(limit):
    multiples = []
    # start at 0 and step by 4; range excludes the limit itself
    for i in range(0, limit, 4):
        multiples.append(i)
    print(multiples)


get_multiples_of_four(100)
```
%% Output
[0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64, 68, 72, 76, 80, 84, 88, 92, 96]
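%% Cell type:markdown id: tags:
The solution above steps through the multiples directly and, because `range` excludes its upper bound, stops at 96. A possible alternative that follows the modulo hint from the exercise and includes the limit itself is sketched below. The name `get_multiples_of_four_modulo` is made up for this illustration.
%% Cell type:code id: tags:
```python
def get_multiples_of_four_modulo(limit):
    """Return all numbers from 0 to limit (inclusive) divisible by 4."""
    multiples = []
    for i in range(limit + 1):
        # i % 4 is the remainder of i divided by 4
        if i % 4 == 0:
            multiples.append(i)
    return multiples


print(get_multiples_of_four_modulo(100))
```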
%% Cell type:markdown id: tags:
---
2.2) Write a function counting down from 1000 to 0 and printing all numbers!
%% Cell type:code id: tags:
```
def countdown(start):
    # count from start down to 0 (the stop value -1 is excluded)
    for i in range(start, -1, -1):
        print(i)


countdown(10)
```
%% Output
10
9
8
7
6
5
4
3
2
1
0
%% Cell type:markdown id: tags:
---
2.3) Generate a list of species names! Write a function printing all species names starting with "E"!
%% Cell type:code id: tags:
```
species = ["D. melanogaster", "M. musculus", "E. coli", "C. elegans", "H. sapiens", "B. napus", "B. vulgaris", "E. multilocularis", "E. a"]
def filter_species_0(species):
    # keep only the names whose first character is "E"
    filtered_species = [name for name in species if name[0] == "E"]
    return filtered_species


print(filter_species_0(species))
```
%% Output
['E. coli', 'E. multilocularis', 'E. a']
%% Cell type:markdown id: tags:
---
2.4) Expand this function to limit the printing to species names which are additionally shorter than 10 characters!
%% Cell type:code id: tags:
```
def filter_species_1(species):
    filtered_species = filter_species_0(species)
    filtered_species = [name for name in filtered_species if len(name) < 10]
    return filtered_species


print(filter_species_1(species))
```
%% Output
['E. coli', 'E. a']
%% Cell type:markdown id: tags:
---
2.5) Expand this function to limit the printing to species names that additionally end with "a".
%% Cell type:code id: tags:
```
def filter_species_2(species):
    filtered_species = filter_species_1(species)
    filtered_species = [name for name in filtered_species if name[-1] == "a"]
    return filtered_species


print(filter_species_2(species))
```
%% Output
['E. a']
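%% Cell type:markdown id: tags:
The three filters can also be combined into a single pass. The sketch below uses `str.startswith` and `str.endswith`, which, unlike `name[0]` and `name[-1]`, do not raise an `IndexError` for empty strings. The function name `filter_species_combined` and its parameters are assumptions for illustration only.
%% Cell type:code id: tags:
```python
def filter_species_combined(species, prefix="E", suffix="a", max_length=10):
    """Keep names that start with prefix, end with suffix and are shorter than max_length."""
    return [
        name
        for name in species
        # "name and ..." skips None and empty strings before the string checks run
        if name
        and name.startswith(prefix)
        and name.endswith(suffix)
        and len(name) < max_length
    ]


species = ["D. melanogaster", "M. musculus", "E. coli", "C. elegans", "H. sapiens",
           "B. napus", "B. vulgaris", "E. multilocularis", "E. a", "", None]
print(filter_species_combined(species))
```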
%% Cell type:markdown id: tags:
**Additional exercises**
%% Cell type:markdown id: tags:
2.6) Load 4-6 protein sequences into a list and search them for a specific motif, e.g. "VAL". You should only return those sequences that contain the motif. Additional: where does the motif lie?
%% Cell type:code id: tags:
```
"""
Protein sequences are taken from UniProt.
P01308 (insulin, H. sapiens)
P68871 (hemoglobin subunit beta, H. sapiens)
O22264 (transcription factor MYB12, A. thaliana)
P19821 (DNA polymerase I, thermostable, Thermus aquaticus)
"""
proteins = [
"MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGA"
+ "GSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN",
"MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSD"
+ "GLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH",
"MGRAPCCEKVGIKRGRWTAEEDQILSNYIQSNGEGSWRSLPKNAGLKRCGKSCRLRWINYLRSDLKRGNITPEE"
+ "EELVVKLHSTLGNRWSLIAGHLPGRTDNEIKNYWNSHLSRKLHNFIRKPSISQDVSAVIMTNASSAPPPPQA"
+ "KRRLGRTSRSAMKPKIHRTKTRKTKKTSAPPEPNADVAGADKEALMVESSGAEAELGRPCDYYGDDCNKNLM"
+ "SINGDNGVLTFDDDIIDLLLDESDPGHLYTNTTCGGDGELHNIRDSEGARGFSDTWNQGNLDCLLQSCPSVE"
+ "SFLNYDHQVNDASTDEFIDWDCVWQEGSDNNLWHEKENPDSMVSWLLDGDDEATIGNSNCENFGEPLDHDDE"
+ "SALVAWLLS",
"MRGMLPLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKALKEDGDAVIVVFDAKAPSFR"
+ "HEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLARLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLY"
+ "QLLSDRIHVLHPEGYLITPAWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTARKLLEEWGSLEALL"
+ "KNLDRLKPAIREKILAHMDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFLERLEFGSLLHEFGLLES"
+ "PKALEEAPWPPPEGAFVGFVLSRKEPMWADLLALAAARGGRVHRAPEPYKALRDLKEARGLLAKDLSVLALR"
+ "EGLGLPPGDDPMLLAYLLDPSNTTPEGVARRYGGEWTEEAGERAALSERLFANLWGRLEGEERLLWLYREVE"
+ "RPLSAVLAHMEATGVRLDVAYLRALSLEVAEEIARLEAEVFRLAGHPFNLNSRDQLERVLFDELGLPAIGKT"
+ "EKTGKRSTSAAVLEALREAHPIVEKILQYRELTKLKSTYIDPLPDLIHPRTGRLHTRFNQTATATGRLSSSD"
+ "PNLQNIPVRTPLGQRIRRAFIAEEGWLLVALDYSQIELRVLAHLSGDENLIRVFQEGRDIHTETASWMFGVP"
+ "REAVDPLMRRAAKTINFGVLYGMSAHRLSQELAIPYEEAQAFIERYFQSFPKVRAWIEKTLEEGRRRGYVET"
+ "LFGRRRYVPDLEARVKSVREAAERMAFNMPVQGTAADLMKLAMVKLFPRLEEMGARMLLQVHDELVLEAPKE"
+ "RAEAVARLAKEVMEGVYPLAVPLEVEVGIGEDWLSAKE"
]
```
%% Cell type:code id: tags:
```
def find_motive(proteins, motive):
    """
    Find the first occurrence of a motif in each protein.
    If an occurrence exists, yield the position of the first occurrence
    of the motif together with the protein.
    Implemented as a generator for more flexibility.
    """
    for protein in proteins:
        # find returns the index of the first occurrence of the search string
        # and -1 if no occurrence can be found
        occurrence = protein.find(motive)
        if occurrence > -1:
            yield (occurrence, protein)


motive = "MA"
for entry in find_motive(proteins, motive):
    print(entry)
```
%% Output
(0, 'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN')
(746, 'MRGMLPLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKALKEDGDAVIVVFDAKAPSFRHEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLARLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLYQLLSDRIHVLHPEGYLITPAWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTARKLLEEWGSLEALLKNLDRLKPAIREKILAHMDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFLERLEFGSLLHEFGLLESPKALEEAPWPPPEGAFVGFVLSRKEPMWADLLALAAARGGRVHRAPEPYKALRDLKEARGLLAKDLSVLALREGLGLPPGDDPMLLAYLLDPSNTTPEGVARRYGGEWTEEAGERAALSERLFANLWGRLEGEERLLWLYREVERPLSAVLAHMEATGVRLDVAYLRALSLEVAEEIARLEAEVFRLAGHPFNLNSRDQLERVLFDELGLPAIGKTEKTGKRSTSAAVLEALREAHPIVEKILQYRELTKLKSTYIDPLPDLIHPRTGRLHTRFNQTATATGRLSSSDPNLQNIPVRTPLGQRIRRAFIAEEGWLLVALDYSQIELRVLAHLSGDENLIRVFQEGRDIHTETASWMFGVPREAVDPLMRRAAKTINFGVLYGMSAHRLSQELAIPYEEAQAFIERYFQSFPKVRAWIEKTLEEGRRRGYVETLFGRRRYVPDLEARVKSVREAAERMAFNMPVQGTAADLMKLAMVKLFPRLEEMGARMLLQVHDELVLEAPKERAEAVARLAKEVMEGVYPLAVPLEVEVGIGEDWLSAKE')
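%% Cell type:markdown id: tags:
The solution above reports only the first position of the motif in each protein. To answer "where does the motif lie?" for every occurrence, `str.find` can be called repeatedly with a start offset. The following is a sketch under that assumption; `find_all_occurrences` is a hypothetical helper name, and the example sequence is made up rather than taken from UniProt.
%% Cell type:code id: tags:
```python
def find_all_occurrences(sequence, motif):
    """Return the start indices of all (possibly overlapping) occurrences of motif."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # continue the search one position after the last hit
        start = sequence.find(motif, start + 1)
    return positions


# Short made-up example sequence
example = "MALWMAMA"
print(find_all_occurrences(example, "MA"))
```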
%% Cell type:markdown id: tags:
2.7) What is the amino acid composition of the proteins? Which amino acid occurs most rarely?
%% Cell type:code id: tags:
```
def get_sequence_composition(sequence):
    # dictionary saving the number of observed occurrences of each character
    num_occurrences = {}
    for character in sequence:
        # setdefault returns the value of the key if the key is already in the dictionary
        # otherwise it returns the default value (here 0) and adds the (key, default) pair
        # to the dictionary
        num_occurrences[character] = num_occurrences.setdefault(character, 0) + 1
    return num_occurrences


def get_rarest_symbol(sequence):
    num_occurrences = get_sequence_composition(sequence)
    # no symbol can occur more often than the sequence is long
    min_occurrences = len(sequence) + 1
    rarest_symbol = ""
    for symbol, occurrences in num_occurrences.items():
        if occurrences < min_occurrences:
            min_occurrences = occurrences
            rarest_symbol = symbol
    return (rarest_symbol, min_occurrences)


print(get_sequence_composition(proteins[0]))
print(get_rarest_symbol(proteins[0]))
```
%% Output
{'M': 2, 'A': 10, 'L': 20, 'W': 2, 'R': 5, 'P': 6, 'G': 12, 'D': 2, 'F': 3, 'V': 6, 'N': 3, 'Q': 7, 'H': 2, 'C': 6, 'S': 5, 'E': 8, 'Y': 4, 'T': 3, 'K': 2, 'I': 2}
('M', 2)
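%% Cell type:markdown id: tags:
The same counting can be delegated to `collections.Counter` from the standard library. The sketch below is one possible variant, not the original solution; `get_rarest_symbol_counter` is a hypothetical name and the example sequence is made up. As in the solution above, ties are resolved in favour of the symbol that appears first in the sequence.
%% Cell type:code id: tags:
```python
from collections import Counter


def get_rarest_symbol_counter(sequence):
    """Return (symbol, count) for the rarest symbol; sequence must not be empty."""
    counts = Counter(sequence)
    # min returns the first item with the smallest count (insertion order)
    return min(counts.items(), key=lambda item: item[1])


# Short made-up example sequence
example = "MALWMRLLPL"
print(dict(Counter(example)))
print(get_rarest_symbol_counter(example))
```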
%% Cell type:markdown id: tags:
## Part3 - range & enumerate
%% Cell type:markdown id: tags:
---
3.1) Write a script to print 50x "here" and the current value of the control variable!
%% Cell type:code id: tags:
```
def print_here(iterations):
    for i in range(iterations):
        print(i, "here")


print_here(50)
```
%% Output
0 here
1 here
2 here
3 here
4 here
5 here
6 here
7 here
8 here
9 here
10 here
11 here
12 here
13 here
14 here
15 here
16 here
17 here
18 here
19 here
20 here
21 here
22 here
23 here
24 here
25 here
26 here
27 here
28 here
29 here
30 here
31 here
32 here
33 here
34 here
35 here
36 here
37 here
38 here
39 here
40 here
41 here
42 here
43 here
44 here
45 here
46 here
47 here
48 here
49 here
%% Cell type:markdown id: tags:
---
3.2) Write a script that walks through the species list and prints, for each species, the character whose index equals the current value of the control variable!
%% Cell type:code id: tags:
```
species = ["D. melanogaster", "M. musculus", "", "A", None, "E. coli", "C. elegans", "H. sapiens", "B. napus", "B. vulgaris", "E. multilocularis", "E. a"]
def print_index_char(species):
    for index, name in enumerate(species):
        # ignore empty names and None
        if name:
            # if the index is larger than the largest possible index for this name
            # we need to correct it by setting it to the last valid index
            corrected_index = min(index, len(name) - 1)
            print(index, name[corrected_index])


# note that indices 2 and 4 are ignored because these species names contain no characters
print_index_char(species)
```
%% Output
0 D
1 .
3 A
5 l
6 g
7 e
8 s
9 i
10 c
11 a
%% Cell type:markdown id: tags:
**Additional exercises**
%% Cell type:markdown id: tags:
3.3) Given two arbitrary sequences *x* and *y*, find a longest common substring of *x* and *y*.
Example: *x* = ACGCTA, *y* = CGCGTA yields the result CGC.
%% Cell type:code id: tags:
```
"""
Let x and y be two sequences over the same alphabet with lengths |x| = n, |y| = m.
"""


def longest_common_substring_naive(x, y):
    """
    Finds a longest common substring of x and y naively.
    Time complexity: O(n * m^2)
    Auxiliary space complexity: O(1)
    """
    length_x = len(x)
    length_y = len(y)
    longest_match_length = 0
    longest_match_start = 0
    for i in range(length_x):
        for j in range(length_y):
            current_position_x = i
            current_position_y = j
            current_match_length = 0
            # extend the match starting at (i, j) as far as possible
            while current_position_x < length_x and current_position_y < length_y:
                if x[current_position_x] == y[current_position_y]:
                    current_position_x += 1
                    current_position_y += 1
                    current_match_length += 1
                else:
                    break
            if current_match_length > longest_match_length:
                longest_match_length = current_match_length
                longest_match_start = i
    return x[longest_match_start:longest_match_start+longest_match_length]


def longest_common_substring_dp(x, y):
    """
    Find a longest common substring of x and y using dynamic programming
    without any space optimizations.
    Essentially we compute the longest common suffix of each combination of prefixes
    of x and y. The largest of such longest common suffixes of prefixes is a
    longest common substring.
    The recursion formula used is
        longest_common_suffix[i][j] =
            longest_common_suffix[i-1][j-1] + 1,  if x[i] = y[j]
            0,                                    otherwise
    for 1 <= i <= n, 1 <= j <= m (x and y are indexed from 1 in this formula).
    The recursion anchor is
        longest_common_suffix[i][0] = 0
        longest_common_suffix[0][j] = 0
    for 0 <= i <= n, 0 <= j <= m.
    Time complexity: O(n * m)
    Auxiliary space complexity: O(n * m)
    """
    length_x = len(x)
    length_y = len(y)
    # initialize longest common suffix table
    # longest_common_suffix[i][j] is the length
    # of the longest common suffix of x[0:i] and y[0:j]
    longest_common_suffix = [[0 for _ in range(length_y + 1)] for _ in range(length_x + 1)]
    longest_match_length = 0
    longest_match_end = 0
    # compute the longest_common_suffix array row-wise
    for i in range(1, length_x + 1):
        for j in range(1, length_y + 1):
            if x[i-1] == y[j-1]:
                longest_common_suffix[i][j] = longest_common_suffix[i-1][j-1] + 1
                if longest_common_suffix[i][j] > longest_match_length:
                    longest_match_length = longest_common_suffix[i][j]
                    longest_match_end = i
            else:
                longest_common_suffix[i][j] = 0
    return x[longest_match_end-longest_match_length:longest_match_end]


x = "ACGCTA"
x_2 = "ACGCTAC"
y = "CGCGTA"
y_2 = "CGCGTAG"
print(longest_common_substring_naive(x, y))
print(longest_common_substring_dp(x, y))
print(longest_common_substring_dp(x_2, y))
print(longest_common_substring_dp(x, y_2))
```
%% Output
CGC
CGC
CGC
CGC
%% Cell type:markdown id: tags:
The auxiliary space complexity of the dynamic programming solution presented above can be reduced substantially.
First, note that each row of the table depends only on the previously computed row. It therefore suffices to store two rows at a time; if the rows run along the shorter sequence, this reduces the auxiliary space complexity to O(min(n, m)).
There is still room for improvement. If the computation is performed diagonal-wise instead of row-wise, only the most recently computed element of the current diagonal needs to be stored. This way, O(1) auxiliary space suffices.
A completely different solution to the longest common substring problem revolves around a data structure called the generalized suffix tree. With the help of this data structure, it is possible to obtain a solution with O(n + m) time and auxiliary space complexity. However, that solution is far more difficult to implement, and the relatively high constant factors in its space usage may make it prohibitive for large inputs.
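The two space optimizations of the dynamic programming solution described above could look roughly as follows. This is a sketch rather than a reference implementation: the function names `lcs_two_rows` and `lcs_diagonals` are assumptions for illustration, and when several longest common substrings exist, the two functions may return different ones.
%% Cell type:code id: tags:
```python
def lcs_two_rows(x, y):
    """Row-wise DP keeping only two rows: O(min(n, m)) auxiliary space."""
    # make y the shorter sequence so that each row has min(n, m) + 1 entries
    if len(y) > len(x):
        x, y = y, x
    previous_row = [0] * (len(y) + 1)
    best_length = 0
    best_end = 0  # exclusive end position of the best match in x
    for i in range(1, len(x) + 1):
        current_row = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                current_row[j] = previous_row[j - 1] + 1
                if current_row[j] > best_length:
                    best_length = current_row[j]
                    best_end = i
        previous_row = current_row
    return x[best_end - best_length:best_end]


def lcs_diagonals(x, y):
    """Diagonal-wise DP keeping a single running value: O(1) auxiliary space."""
    n, m = len(x), len(y)
    best_length = 0
    best_end = 0
    # a diagonal is identified by offset = i - j, from -(m - 1) to n - 1
    for offset in range(-(m - 1), n):
        i = max(offset, 0)
        j = max(-offset, 0)
        run_length = 0  # the only table entry that has to be remembered
        while i < n and j < m:
            if x[i] == y[j]:
                run_length += 1
                if run_length > best_length:
                    best_length = run_length
                    best_end = i + 1
            else:
                run_length = 0
            i += 1
            j += 1
    return x[best_end - best_length:best_end]


print(lcs_two_rows("ACGCTA", "CGCGTA"))
print(lcs_diagonals("ACGCTA", "CGCGTA"))
```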