" filtered_species = [name for name in filtered_species if name[-1] == \"a\"]\n",
" return filtered_species\n",
"\n",
"print(filter_species_2(species))"
],
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"['E. a']\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"**Additional exercises**"
],
"metadata": {
"id": "Al6Bv0jlTTfG"
}
},
{
"cell_type": "markdown",
"source": [
"2.6) Load 4-6 protein sequences into a list and search them for specific motive, e.g. \"VAL\". You should only return those sequences that contain the motive. Additional: where does the motive lie?"
],
"metadata": {
"id": "PSJds5r8NqmP"
}
},
{
"cell_type": "code",
"source": [
"\"\"\"\n",
"Protein sequences are taken from UniProt.\n",
"P01308 (insulin, H. sapiens)\n",
"P68871 (hemoglobin subunit beta, H. sapiens)\n",
"O22264 (transcription factor MYB12, A. thaliana)\n",
"P19821 (DNA polymerase I, thermostable, Thermus aquaticus)\n",
"3.2) Write a script to walk through the species list and to print the character from the species where the index corresponds to the current control variable value!\n"
"The auxiliary space complexity of the dynamic programming solution presented above can be optimized substantially.\n",
"\n",
"First note that the computation in that solution always only depends on the last row already computed. Therefore, it suffices to only store two rows at once decreasing the auxiliary space complexity to O(min(n,m)).\n",
"\n",
"There is still room for improvement. If you perform the computation diagonal-wise instead of row-wise, we will only need to store the last already computed element of that diagonal. This way, we can get away with O(1) auxiliary space usage.\n",
"\n",
"Another totally different solution of the longest common substring problem resolves around a data structure named generalized suffix tree. With the help of this data structure it is possibly to obtain a solution with O(n + m) time and auxiliary space complexity. However, that solution is far more difficult to implement and the relatively high constant factors in the space usage may make it prohibitive for large inputs.\n"
],
"metadata": {
"id": "F9tF15SdxSws"
}
}
]
}
\ No newline at end of file
%% Cell type:markdown id: tags:
# Python course 2021 - Exercises B
%% Cell type:markdown id: tags:
## Part1 - control structures
%% Cell type:markdown id: tags:
---
1.1) Write a script for guessing numbers!
%% Cell type:markdown id: tags:
---
1.2) Add tips (smaller/larger) during the guessing process!
%% Cell type:code id: tags:
```
import random
def guessing_game(num_tries, upper_limit):
true_number = random.randrange(upper_limit)
for i in range(num_tries):
user_input = int(input("Enter a number: "))
if (user_input == true_number):
print("Correct! You win the game")
return
elif (user_input < true_number):
print("Too low! Guess a higher number")
else:
print("Too high! Guess a lower number")
print("You are out of attempts. Better luck next time")
print(f"The correct number was {true_number}")
guessing_game(3, 10)
```
%% Output
Enter a number: 5
Too high! Guess a lower number
Enter a number: 2
Correct! You win the game
%% Cell type:markdown id: tags:
## Part2 - loops
%% Cell type:markdown id: tags:
---
2.1) Write a function counting to 100 and printing all numbers which can be divided by 4 without any residue!
filtered_species = [name for name in species if name[0] == "E"]
return filtered_species
print(filter_species_0(species))
```
%%Output
['E. coli', 'E. multilocularis', 'E. a']
%% Cell type:markdown id: tags:
---
2.4) Expand this function to limit the printing to species names which are additionally shorter than 10 characters!
%% Cell type:code id: tags:
```
def filter_species_1(species):
filtered_species = filter_species_0(species)
filtered_species = [name for name in filtered_species if len(name) < 10]
return filtered_species
print(filter_species_1(species))
```
%% Output
['E. coli', 'E. a']
%% Cell type:markdown id: tags:
---
2.5) Expand this function to limit the printing to species names which are additionally ending with "a".
%% Cell type:code id: tags:
```
def filter_species_2(species):
filtered_species = filter_species_1(species)
filtered_species = [name for name in filtered_species if name[-1] == "a"]
return filtered_species
print(filter_species_2(species))
```
%%Output
['E. a']
%% Cell type:markdown id: tags:
**Additionalexercises**
%% Cell type:markdown id: tags:
2.6) Load 4-6 protein sequences into a list and search them for specific motive, e.g. "VAL". You should only return those sequences that contain the motive. Additional:where does the motive lie?
3.1) Write a script to print 50x "here" and the current value of the control variable!
%% Cell type:code id: tags:
```
def print_here(iterations):
for i in range(iterations):
print(i, "here")
print_here(50)
```
%% Output
0 here
1 here
2 here
3 here
4 here
5 here
6 here
7 here
8 here
9 here
10 here
11 here
12 here
13 here
14 here
15 here
16 here
17 here
18 here
19 here
20 here
21 here
22 here
23 here
24 here
25 here
26 here
27 here
28 here
29 here
30 here
31 here
32 here
33 here
34 here
35 here
36 here
37 here
38 here
39 here
40 here
41 here
42 here
43 here
44 here
45 here
46 here
47 here
48 here
49 here
%% Cell type:markdown id: tags:
---
3.2) Write a script to walk through the species list and to print the character from the species where the index corresponds to the current control variable value!
The auxiliary space complexity of the dynamic programming solution presented above can be optimized substantially.
First note that the computation in that solution always only depends on the last row already computed. Therefore, it suffices to only store two rows at once decreasing the auxiliary space complexity to O(min(n,m)).
There is still room for improvement. If you perform the computation diagonal-wise instead of row-wise, we will only need to store the last already computed element of that diagonal. This way, we can get away with O(1) auxiliary space usage.
Another totally different solution of the longest common substring problem resolves around a data structure named generalized suffix tree. With the help of this data structure it is possibly to obtain a solution with O(n + m) time and auxiliary space complexity. However, that solution is far more difficult to implement and the relatively high constant factors in the space usage may make it prohibitive for large inputs.