|
|
# Session 7: Practice 1
|
|
|
|
|
|
* **Time**: 2h
|
|
|
* **Date**: Wednesday, Feb-16th-2022
|
|
|
* **Date**: XXXX, XX-XX-2023
|
|
|
* **Goals**:
|
|
|
* Creating our first Class Module for working with DNA sequences
|
|
|
|
|
|
|
|
|
## Introduction
|
|
|
|
|
|
The goal of this assignment is to develop the **Seq Class** for working with **DNA sequences**. This class will live in a module called **Seq1** (the python file should be called Seq1.py). It is quite similar to the one we already worked on (**Seq0**), but it uses an **Object Oriented approach**. We will also include several improvements.
|
|
|
|
|
|
These are all the regular **methods** that should be implemented in the **Seq Class**:
|
|
|
| Method | Parameters | Return value | Description |
|--------|------------|--------------|-------------|
| **len** | None | Integer | Calculates the total number of bases in the sequence |
| **count_base**(base) | base: character | Integer | Calculates the number of times the given base appears in the sequence |
| **count** | None | Dictionary | Calculates the number of each base in the sequence. A dictionary with the results is returned: the keys are the bases and the values are their counts |
| **reverse** | None | String | Returns the reverse sequence |
| **complement** | None | String | Returns the complement sequence |
| **read_fasta**(filename) | filename: string | String | Opens a DNA file in FASTA format and stores the sequence within the object |
|
|
|
|
|
|
In addition, the **Seq Class** will have the **special methods** that we already know: **\_\_init\_\_()** for initializing the object and **\_\_str\_\_()** for printing the object as a sequence.
|
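Here is a minimal sketch of how the two special methods work together. It assumes the bases are stored in an attribute called **strbases** (the name used later in this practice). It does not yet handle null or invalid sequences; that is added in the following exercises.

```python3
class Seq:
    """A DNA sequence (minimal sketch: no null/invalid handling yet)."""

    def __init__(self, strbases):
        # -- Store the bases inside the object
        self.strbases = strbases

    def __str__(self):
        # -- print() and str() call this method to get the text to show
        return self.strbases


s = Seq("ACTGA")
print(s)  # -- Prints: ACTGA
```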
|
|
|
|
|
## Finish your previous practices
|
|
|
|
|
|
**Before** starting this new assignment, please **finish** all the previous practices, since everything we will do here builds on that work.
|
|
|
|
|
|
## Exercises
|
|
|
|
|
|
We will develop the **Seq class** **incrementally**, starting from the work done in Session 6.
|
|
|
|
|
|
### Exercise 1: Creating the Seq1 module
|
|
|
|
|
|
Create a new python file called **Seq1.py** in the **P01 folder**. This file is our **module**, which we will import in all the exercises. Remember that, for doing so, you first have to **mark** the P01 folder as **Sources Root**.
|
|
|
|
|
|
Copy the **Seq Class** that you already developed in the exercises of **Session 6** into the **Seq1.py** file.
|
|
|
|
|
|
* **Filename:** P01/Seq1.py

* **Description**: This is the file where the Seq Class will be implemented. It is our Seq1 module.
|
|
|
|
|
|
The goal of this first exercise is to make sure that you can access the Seq class from **external files**.
|
|
|
|
|
|
* **Filename:** P01/e1.py

* **Description**: Write a python program that creates an object with the sequence "ACTGA" and prints both its length and the sequence itself. The output should be like this:
|
|
|
|
|
|
```
-----| Exercise 1 |------
...
Sequence 1: (Length: 5) ACTGA

Process finished with exit code 0
```
|
|
|
|
|
|
* **Considerations**: The first thing you have to do is **import** the **Seq Class** from the **Seq1 module**:
|
|
|
|
|
|
```python3
from Seq1 import Seq
```

We will manage **three types** of sequences: **Valid**, **Invalid** and **Null**:

* **Null**: Empty sequence "". It has no bases at all
* **Valid**: A sequence composed only of the four valid bases: 'A', 'T', 'C', 'G'. Example: "ATTACG"
* **Invalid**: A sequence that has one or more characters that are not valid bases. Example: "ATTXXG"

In this exercise we will implement the **null sequences**.

The **null sequences** are created by calling the **Seq() class** with **no** arguments:

```python3
# -- Null sequence: no arguments
s = Seq()

# -- Sequence created from a string of bases
s = Seq("TATAC")
```
|
|
|
|
|
|
The difference between the creation of the previous two objects is that the first one passes no arguments when calling Seq, and the second one passes one (a sequence). This means that the **argument** of the **\_\_init\_\_()** method is **optional**.
|
|
|
|
|
|
For creating null sequences, the definition of the **\_\_init\_\_()** method should be like this:
|
|
|
|
|
|
```python3
def __init__(self, strbases="NULL"):
```
|
|
|
|
|
|
|
|
|
Or like this:
|
|
|
|
|
|
|
|
|
```python3
def __init__(self, strbases=None):
```
|
|
|
|
|
|
This is how **optional arguments** are created in python: if no argument is given, python automatically uses the default value. With the first option, that default is already "NULL". With the second option, the default is None, so we need to add a check such as _if strbases is None_ and then assign "NULL" to the corresponding attribute.
|
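The two options can be compared side by side. The class names **SeqOption1** and **SeqOption2** are made up for this comparison; in your module, both would simply be the **Seq** class.

```python3
class SeqOption1:
    # -- Option 1: the default value is directly the string "NULL"
    def __init__(self, strbases="NULL"):
        self.strbases = strbases


class SeqOption2:
    # -- Option 2: the default value is None, translated to "NULL" by hand
    def __init__(self, strbases=None):
        if strbases is None:
            strbases = "NULL"
        self.strbases = strbases


print(SeqOption1().strbases)         # -- NULL
print(SeqOption2().strbases)         # -- NULL
print(SeqOption2("TATAC").strbases)  # -- TATAC
```

Using None as the default is a common python idiom for "no argument given", because None can never be confused with a real string of bases.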
|
|
|
|
|
|
|
|
When a null sequence is created, the **\_\_init\_\_()** method will print the message: "NULL Seq Created"
|
|
|
|
|
|
* **Filename:** P01/e2.py
|
|
|
* **Description**: Write a python program that first creates a null sequence and then a valid sequence. It should print both objects. The output of the program should be:
|
|
|
|
|
|
```
-----| Practice 1, Exercise 2 |------
...
Sequence 1: NULL
Sequence 2: ACTGA
```
|
|
|
|
|
|
* **Considerations**: The first thing you should do in the **\_\_init\_\_()** method is check whether it is a null sequence. If so, print the message on the console, assign the value to the **self.strbases** attribute and return. If it is not null, continue with the other checks.
|
|
|
|
|
|
### Exercise 3: Null, valid and invalid sequences
|
|
|
|
|
|
In this exercise we will make sure that our Seq class works correctly with the **three types** of sequences. We will create these **three sequences**:
|
|
|
|
|
|
```python3
# -- Create a Null sequence
# ...
s2 = Seq("ACTGA")
s3 = Seq("Invalid sequence")
```
|
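Here is one possible way of telling the three types apart inside **\_\_init\_\_()**. It relies on two design choices that the exercise does not impose. First, the strings "NULL" and "ERROR" are stored in **self.strbases**, matching what the expected outputs print. Second, the console messages are copied from the Exercise 5 output.

```python3
VALID_BASES = "ACGT"


class Seq:
    """Sketch of Seq.__init__() for null, valid and invalid sequences."""

    def __init__(self, strbases="NULL"):
        if strbases == "NULL":
            # -- Null sequence: Seq() was called with no arguments
            print("NULL Seq created")
            self.strbases = "NULL"
            return

        # -- Valid only if every character is one of the four bases
        if all(base in VALID_BASES for base in strbases):
            print("New sequence created!")
            self.strbases = strbases
        else:
            print("INVALID Seq!")
            self.strbases = "ERROR"

    def __str__(self):
        return self.strbases


s1 = Seq()
s2 = Seq("ACTGA")
s3 = Seq("Invalid sequence")
print(f"Sequence 1: {s1}")  # -- Sequence 1: NULL
print(f"Sequence 2: {s2}")  # -- Sequence 2: ACTGA
print(f"Sequence 3: {s3}")  # -- Sequence 3: ERROR
```

Note that **all()** returns True for an empty string, so with this sketch Seq("") would be accepted as a valid sequence with no bases. Decide explicitly how you want to treat that case.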
|
|
|
|
|
* **Filename:** P01/e3.py

* **Description**: Write a python program that creates three sequences (one null, one valid and one invalid) and then prints the objects. This is what we should see:
|
|
|
|
|
|
```
-----| Practice 1, Exercise 3 |------
...
Sequence 3: ERROR
```

### Exercise 4: len() method

The **len(self)** method should work with the **three types** of sequences. If the sequence is **null** or **invalid**, the length should always be **0**. Implement this in the Seq Class.

* **Filename**: P01/e4.py

* **Description**: Write a python program that creates three sequences (one null, one valid and one invalid) and then prints their lengths and sequences. This is what we should see:

```
-----| Practice 1, Exercise 4 |------
...

Process finished with exit code 0
```
|
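Here is a minimal sketch of **len()** under the same assumption as before: null and invalid sequences keep the strings "NULL" and "ERROR" in **self.strbases**, so the method has to check for them before counting characters. The console messages of **\_\_init\_\_()** are left out to keep it short.

```python3
class Seq:
    """Sketch: null/invalid sequences store the strings "NULL"/"ERROR"."""

    def __init__(self, strbases="NULL"):
        if strbases != "NULL" and not all(b in "ACGT" for b in strbases):
            strbases = "ERROR"
        self.strbases = strbases

    def len(self):
        # -- "NULL" and "ERROR" are not real bases: their length is 0
        if self.strbases in ("NULL", "ERROR"):
            return 0
        # -- Inside the method, len() is still python's built-in function
        return len(self.strbases)


for i, s in enumerate([Seq(), Seq("ACTGA"), Seq("Invalid sequence")]):
    print(f"Sequence {i}: (Length: {s.len()}) {s.strbases}")
```

This prints "Sequence 0: (Length: 0) NULL", "Sequence 1: (Length: 5) ACTGA" and "Sequence 2: (Length: 0) ERROR".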
|
|
|
|
### Exercise 5: count_base() method
|
|
|
|
|
|
Implement the **count_base(self, base)** method. If the sequence is either null or invalid, the return value should be 0.
|
|
|
|
|
|
* **Filename**: P01/e5.py

* **Description**: Write a python program that creates three sequences (one null, one valid and one invalid) and then prints their lengths, their sequences and the number of each base. This is what we should see:
|
|
|
|
|
|
```
-----| Practice 1, Exercise 5 |------
NULL Seq created
New sequence created!
INVALID Seq!
Sequence 0: (Length: 0) NULL
A: 0, C: 0, T: 0, G: 0
Sequence 1: (Length: 5) ACTGA
A: 2, C: 1, T: 1, G: 1
Sequence 2: (Length: 0) ERROR
A: 0, C: 0, T: 0, G: 0

Process finished with exit code 0
```
|
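Once the "NULL" and "ERROR" values are ruled out, **count_base()** can rely on python's **str.count()**. The sketch below uses the same simplified **\_\_init\_\_()** as before (no console messages) and prints one line per sequence in the format of the output above.

```python3
class Seq:
    """Sketch focused on count_base(); null/invalid handling is simplified."""

    def __init__(self, strbases="NULL"):
        if strbases != "NULL" and not all(b in "ACGT" for b in strbases):
            strbases = "ERROR"
        self.strbases = strbases

    def count_base(self, base):
        # -- Null and invalid sequences contain no real bases
        if self.strbases in ("NULL", "ERROR"):
            return 0
        # -- str.count() returns how many times the substring appears
        return self.strbases.count(base)


for s in [Seq(), Seq("ACTGA"), Seq("Invalid sequence")]:
    counts = [f"{base}: {s.count_base(base)}" for base in "ACTG"]
    print(", ".join(counts))
```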
|
|
|
|
|
### Exercise 6: count() method
|
|
|
|
|
|
Implement the **count(self)** method. If the sequence is null or invalid, all the values of the returned dictionary should be 0.
|
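Here is a possible sketch of **count()**. Because it starts from a dictionary with every base set to 0, null and invalid sequences are handled simply by returning that dictionary untouched. It keeps the simplified **\_\_init\_\_()** from the previous sketches.

```python3
class Seq:
    """Sketch focused on count(): returns a {base: number} dictionary."""

    def __init__(self, strbases="NULL"):
        if strbases != "NULL" and not all(b in "ACGT" for b in strbases):
            strbases = "ERROR"
        self.strbases = strbases

    def count(self):
        # -- Every base starts at 0, so null/invalid sequences keep zeros
        counts = {"A": 0, "C": 0, "T": 0, "G": 0}
        if self.strbases in ("NULL", "ERROR"):
            return counts
        for base in self.strbases:
            counts[base] += 1
        return counts


print(Seq("ACTGA").count())  # -- {'A': 2, 'C': 1, 'T': 1, 'G': 1}
print(Seq().count())         # -- {'A': 0, 'C': 0, 'T': 0, 'G': 0}
```

Another option is to build the dictionary from **count_base()** (for example `{base: self.count_base(base) for base in "ACTG"}`), so that the null/invalid check lives in a single method.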
|
|
|
|
|
* **Filename**: P01/e6.py

* **Description**: Write a python program that creates three sequences (one null, one valid and one invalid) and then prints their lengths, their sequences and the dictionary returned by the count() method. This is what we should see:
|
|
|
|
|
|
```
-----| Practice 1, Exercise 6 |------
...

Process finished with exit code 0
```

### Exercise 7: reverse() method

Implement the **reverse(self)** method. If the sequence is null or invalid, it should return the string "NULL" or "ERROR" respectively (and not its reverse!).

* **Filename**: P01/e7.py

* **Description**: Write a python program that creates three sequences (one null, one valid and one invalid) and then prints their lengths, their sequences, the dictionary with the bases, and the reverse sequence. This is what we should see:

```
-----| Practice 1, Exercise 7 |------
...
Sequence 2: (Length: 0) ERROR
...
```
|
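Here is a minimal sketch of **reverse()**, again assuming the "NULL"/"ERROR" strings stored in **self.strbases**. Those strings are returned unchanged, and real sequences are reversed with a slice whose step is -1.

```python3
class Seq:
    """Sketch focused on reverse()."""

    def __init__(self, strbases="NULL"):
        if strbases != "NULL" and not all(b in "ACGT" for b in strbases):
            strbases = "ERROR"
        self.strbases = strbases

    def reverse(self):
        # -- "NULL" and "ERROR" are returned as they are, never reversed
        if self.strbases in ("NULL", "ERROR"):
            return self.strbases
        # -- A slice with step -1 walks the string backwards
        return self.strbases[::-1]


print(Seq("ACTGA").reverse())             # -- AGTCA
print(Seq().reverse())                    # -- NULL
print(Seq("Invalid sequence").reverse())  # -- ERROR
```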
|
|
|
|
### Exercise 8: complement() method
|
|
|
|
|
|
Implement the **complement(self)** method. If the sequence is null or invalid, it should return the string "NULL" or "ERROR" respectively (and not its complement!).
|
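Here is a sketch of **complement()** that uses a dictionary mapping each base to its pair (A with T, C with G). The name **COMPLEMENT** is just an illustrative module-level constant, and the "NULL"/"ERROR" convention is the same one assumed in the previous sketches.

```python3
# -- Base pairs: A <-> T and C <-> G
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


class Seq:
    """Sketch focused on complement()."""

    def __init__(self, strbases="NULL"):
        if strbases != "NULL" and not all(b in "ACGT" for b in strbases):
            strbases = "ERROR"
        self.strbases = strbases

    def complement(self):
        # -- "NULL" and "ERROR" are returned unchanged
        if self.strbases in ("NULL", "ERROR"):
            return self.strbases
        # -- Replace every base by its pair and join the result
        return "".join(COMPLEMENT[base] for base in self.strbases)


print(Seq("ACTGA").complement())  # -- TGACT
```

An equivalent alternative uses built-in string methods: `self.strbases.translate(str.maketrans("ACGT", "TGCA"))`.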
|
|
|
|
|
* **Filename**: P01/e8.py

* **Description**: Write a python program that creates three sequences (one null, one valid and one invalid) and then prints their lengths, their sequences, the dictionary with the bases, the reverse sequence, and the complement. This is what we should see:
|
|
|
|
|
|
```
-----| Practice 1, Exercise 8 |------
...

Process finished with exit code 0
```
|
|
|
|
|
### Exercise 9: read_fasta() method
|
|
|
|
|
|
So far, we have been creating sequences by manually providing a string with the bases. We also want to create sequences from **files** in **FASTA format**. That is the purpose of the read_fasta(self, filename) method.
|
|
|
|
|
|
To use the read_fasta() method, we first need to create a **null sequence** and then call read_fasta() to populate the object:
|
|
|
|
|
|
```python3
# -- Create a Null sequence
s = Seq()
# ...
s.read_fasta(FILENAME)
```
|
|
|
|
|
|
After that, you can use the object in exactly the same way as the sequences initialized manually.
|
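Here is a sketch of **read_fasta()**. It assumes a single-record file like U5.txt, where the first line is the FASTA header (starting with '>') and the bases are spread over the following lines. Returning the sequence as well as storing it follows the **String** return value listed in the method table. The temporary file and its contents in the usage part are made up for the example.

```python3
import os
import tempfile
from pathlib import Path


class Seq:
    """Sketch focused on read_fasta(); validation is left out for brevity."""

    def __init__(self, strbases="NULL"):
        self.strbases = strbases

    def read_fasta(self, filename):
        # -- Read the whole file as a list of lines
        lines = Path(filename).read_text().splitlines()
        # -- Skip the header line (it starts with '>') if there is one
        if lines and lines[0].startswith(">"):
            lines = lines[1:]
        # -- Join the body lines into a single string of bases
        self.strbases = "".join(line.strip() for line in lines)
        return self.strbases


# -- Usage: write a tiny made-up FASTA file and read it back
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(">made-up header\nATAGACC\nAAACATG\n")
    filename = f.name

s = Seq()
s.read_fasta(filename)
print(s.strbases)  # -- ATAGACCAAACATG
os.remove(filename)
```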
|
|
|
|
|
* **Filename**: P01/e9.py

* **Description**: Write a python program that reads a sequence from the U5.txt file. Then it should print its length, the sequence itself, the dictionary with the bases, the reverse sequence and the complement. This is what we should see:
|
|
|
|
|
|
```
-----| Practice 1, Exercise 9 |------
...
Sequence : (Length: 1314) ATAGACCAAACATGAGAGGCTGTGAATGGTATAATCTTCGCCGT...(not sh
...

Process finished with exit code 0
```
|
|
|
|
|
|
As U5 is very long, only its first bases are shown here (but your program will print all of them).
|
|
|
|
|
|
### Exercise 10: processing the genes
|
|
|
|
|
|
Write a python program that automatically calculates the answer to this question:
|
|
|
|
|
|
* Which is the most frequent base in each gene?
|
|
|
|
|
|
The genes are stored in the FASTA files we downloaded in a previous session.
|
|
|
|
|
|
This exercise is the same as exercise 8 of Practice 0, but it uses the **Seq Class** instead of the functions developed in the Seq0 module.
|
|
|
|
|
|
* **Filename**: P01/e10.py

* **Output**: This is what should be seen after running the script:
|
|
|
|
|
|
```
-----| Practice 1, Exercise 10 |------
...

Process finished with exit code 0
```
|
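Once **count()** returns a dictionary, the most frequent base can be obtained with the built-in **max()** and its **key** argument. In this sketch, **count_bases()** is a hypothetical stand-in for **Seq.count()** so that the code runs on its own, and the two sequences are made up. In e10.py you would read each gene with **read_fasta()** instead.

```python3
def count_bases(strbases):
    # -- Hypothetical stand-in for Seq.count(): {base: number of times}
    return {base: strbases.count(base) for base in "ACTG"}


def most_frequent_base(counts):
    # -- max() iterates over the keys; key=counts.get compares their values
    return max(counts, key=counts.get)


toy_genes = {"gene_a": "ACTGA", "gene_b": "GGGCAT"}  # -- Made-up sequences
for name, bases in toy_genes.items():
    print(f"{name}: {most_frequent_base(count_bases(bases))}")
```

If two bases share the highest count, **max()** returns the first one in the dictionary's order.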
|
|
|
|
## END of the session
|
|
|
|
|
|
The session is finished. Make sure, during this week, that everything in this list is checked!
|
|
|
|
|
|
* [ ] You have all the items of Session 6 checked!
|
|
|
|
|
|
* [ ] Your working repo contains the **P01 Folder** with the following files:
|
|
|
* [ ] e1.py
|
|
|
* [ ] e2.py
|
|
|
* [ ] e3.py
|
|
|
* [ ] e4.py
|
|
|
* [ ] e5.py
|
|
|
* [ ] e6.py
|
|
|
* [ ] e7.py
|
|
|
* [ ] e8.py
|
|
|
* [ ] e9.py
|
|
|
* [ ] e10.py
|
|
|
* [ ] Seq1.py
|
|
|
* [ ] All the previous files have been pushed to your remote Gitlab repo
|
|
|
|
|
|
# Credits
|
|
|
|
|
|
* [Juan González-Gómez](https://github.com/Obijuan) (Obijuan)
|
|
|
* [Alvaro del Castillo](https://github.com/acs). He designed and created the original content of this subject. Thanks a lot :-)
|
|
|
* Rodrigo Pérez Rodríguez
|
|
|
|
|
|
# License
|
|
|
|
|
|

|
|
|

|
|
|
|
|
|
# Links
|
|
|
|