... | @@ -3,8 +3,8 @@ |
... | @@ -3,8 +3,8 @@ |
|
# Session 7: Practice 1
|
|
# Session 7: Practice 1
|
|
|
|
|
|
* **Time**: 2h
|
|
* **Time**: 2h
|
|
* **Goals**:
|
|
* **Goal**:
|
|
* Creating our first Class Module for working with DNA sequences
|
|
* To create our first Class Module to work with DNA sequences
|
|
|
|
|
|
## Contents
|
|
## Contents
|
|
|
|
|
... | @@ -27,23 +27,23 @@ |
... | @@ -27,23 +27,23 @@ |
|
|
|
|
|
## Introduction
|
|
## Introduction
|
|
|
|
|
|
The goal of this assignment is to developt the **Seq Class** for working with **DNA sequences**. This class will be in a module called **Seq1** (The python file should be called Seq1.py). It is quite similar to the one we already worked on (**Seq0**), but using an **Object Oriented approach**. We will also include several improvements.
|
|
The goal of this assignment is to develop the **Seq Class** to work with **DNA sequences**. This class will be in a module called **Seq1** (thus, the Python file should be called Seq1.py). It is quite similar to the one we already worked on (**Seq0**), but using an **Object Oriented approach**. We will also include several improvements.
|
|
|
|
|
|
These are all the normal **methods** that should be implemented in the **Seq Class**
|
|
These are all the normal **methods** that should be implemented in the **Seq Class**
|
|
| Method | Parameters | Return value | Description |
|
|
| Method | Parameters | Return value | Description |
|
|
|--------|------------|--------------|-------------|
|
|
|--------|------------|--------------|-------------|
|
|
| **len** | None | Integer | Calculates the total number of bases in the sequence |
|
|
| **len** | None | Integer | Returns the total number of bases in the sequence |
|
|
| **count_base**(base) | base: character | Integer | Calculate the number of the given base in the Sequence |
|
|
| **count_base**(base) | base: character | Integer | Returns the number of the given base in the Sequence |
|
|
| **count** | None | A dicctionary | Calculates the number of all the bases in the sequence. A dicctionary with the results is returned. The keys are the bases and the values their number |
|
|
| **count** | None | A dictionary | Returns the number of all the bases in the sequence as a dictionary. The keys are the bases and the values the number of times they appear |
|
|
| **reverse** | None | String | Return the reverse sequence |
|
|
| **reverse** | None | String | Returns the reverse sequence |
|
|
| **complement** | None | String | Return the complement sequence |
|
|
| **complement** | None | String | Returns the complement sequence |
|
|
| **read_fasta**(filename) | filename: string | String | Opens a DNA file in FASTA format and store it within the object |
|
|
| **read_fasta**(filename) | filename: string | String | Opens a DNA file in FASTA format and stores it in the object (as an attribute) |
|
|
|
|
|
|
In addition, the **Class Seq** will have the **special methods** that we already know: **\__init_\_()** for initializing the object, and **\__str_\_()** for printing the object as a sequence.
|
|
In addition, the **Class Seq** will have the **special methods** that we already know: **\__init_\_()** for initializing the object, and **\__str_\_()** for printing the object as a sequence.
|
|
|
|
|
|
## Finish your previous practices
|
|
## Finish your previous practices
|
|
|
|
|
|
**Before** starting this new assignment, please **finish** all previous work, since all the stuff we will be working on in this practive is related to that.
|
|
**Before** starting this new assignment, please **finish** all work in previous practices, since everything we will be working on in this practice is related to that.
|
|
|
|
|
|
## Exercises
|
|
## Exercises
|
|
|
|
|
... | @@ -51,7 +51,7 @@ We will develop the **Seq class** **incrementally**, starting from the work done |
... | @@ -51,7 +51,7 @@ We will develop the **Seq class** **incrementally**, starting from the work done |
|
|
|
|
|
### Exercise 1: Creating the Seq1 module
|
|
### Exercise 1: Creating the Seq1 module
|
|
|
|
|
|
Create a new python file, called **Seq1.py** in a the **P01 folder**. This file is our **module**, that we will import in all the exercises. Remember that for doing so, you have first to **mark** the P1 folder as **Sources Root**.
|
|
Create a new Python file called **Seq1.py** in the **P01 folder**. This file is our **module**, which we will import in all the exercises. Remember that to do so, you first have to **mark** the P01 folder as **Sources Root** in PyCharm (PyCharm uses the source roots as the starting point for resolving imports).
|
|
|
|
|
|
Copy the **Seq Class** that you have already developed in the exercises of **Session 6** into the **Seq1.py** file.
|
|
Copy the **Seq Class** that you have already developed in the exercises of **Session 6** into the **Seq1.py** file.
|
|
|
|
|
... | @@ -61,7 +61,7 @@ Copy the **Seq Class** that you have already developed in the exercises of **Ses |
... | @@ -61,7 +61,7 @@ Copy the **Seq Class** that you have already developed in the exercises of **Ses |
|
The goal of this first exercise is to make sure that you can access the Seq class from **external files**.
|
|
The goal of this first exercise is to make sure that you can access the Seq class from **external files**.
|
|
|
|
|
|
* **Filename:** P01/e1.py
|
|
* **Filename:** P01/e1.py
|
|
* **Description**: Write a python program that creates an object with the sequence "ACTGA" and prints both its length and the sequence itself. The output should be like this:
|
|
* **Description**: Write a Python program that creates an object with the sequence "ACTGA" and prints both its length and the sequence itself. The output should be like this:
|
|
|
|
|
|
```
|
|
```
|
|
-----| Practice 1, Exercise 1 |------
|
|
-----| Practice 1, Exercise 1 |------
|
... | @@ -82,7 +82,7 @@ from Seq1 import Seq |
... | @@ -82,7 +82,7 @@ from Seq1 import Seq |
|
We will manage **three types** of sequences: **Valid**, **Invalid** and **Null**:
|
|
We will manage **three types** of sequences: **Valid**, **Invalid** and **Null**:
|
|
|
|
|
|
* **Null**: Empty sequence "". It has no bases at all
|
|
* **Null**: Empty sequence "". It has no bases at all
|
|
* **Valid**: A sequence composed of the union of only the four valid bases: 'A', 'T', 'C', 'G'. Example: "ATTACG"
|
|
* **Valid**: A sequence composed exclusively of the union of the four valid bases: 'A', 'T', 'C', 'G'. Example: "ATTACG"
|
|
* **Invalid**: A sequence that has one or more characters that are not valid bases. Example: "ATTXXG"
|
|
* **Invalid**: A sequence that has one or more characters that are not valid bases. Example: "ATTXXG"
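
A minimal sketch of how these three categories could be told apart, assuming the bases arrive as a plain Python string. The function name `classify` and the labels it returns are hypothetical and chosen only for illustration:

```python
VALID_BASES = "ATCG"


def classify(strbases):
    """Return 'Null', 'Valid' or 'Invalid' for the given string of bases."""
    if strbases == "":
        return "Null"
    # all() stops at the first character that is not one of the four bases
    if all(base in VALID_BASES for base in strbases):
        return "Valid"
    return "Invalid"


print(classify(""))        # Null
print(classify("ATTACG"))  # Valid
print(classify("ATTXXG"))  # Invalid
```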
|
|
|
|
|
|
In this exercise we will implement the **null sequences**.
|
|
In this exercise we will implement the **null sequences**.
|
... | @@ -98,24 +98,18 @@ s2 = Seq("TATAC") |
... | @@ -98,24 +98,18 @@ s2 = Seq("TATAC") |
|
|
|
|
|
The difference between the creation of the previous two objects is that the first one has no arguments when calling Seq, and the second one has one (a sequence). This means that the **argument** provided to the **\__init_\_()** method is **optional**.
|
|
The difference between the creation of the previous two objects is that the first one has no arguments when calling Seq, and the second one has one (a sequence). This means that the **argument** provided to the **\__init_\_()** method is **optional**.
|
|
|
|
|
|
For creating null sequences the definition of the **\__init()\_\_** method should be like this:
|
|
To create null sequences, the definition of the **\__init_\_()** method should be like this:
|
|
|
|
|
|
```python3
|
|
|
|
def __init__(self, strbases="NULL"):
|
|
|
|
```
|
|
|
|
|
|
|
|
Or like this:
|
|
|
|
|
|
|
|
```python3
|
|
```python3
|
|
def __init__(self, strbases=None):
|
|
def __init__(self, strbases=None):
|
|
```
|
|
```
|
|
|
|
|
|
It is used in python for creating **optional arguments**. If no argument is given, python automatically will create one with the default value to "NULL" in case of the first option (if we go for the second option, we need to code an _if strbases == None_ and then assigne "NULL" to the corresponding attribute.
|
|
A parameter with a default value (strbases=None in our case) is how Python defines **optional arguments**: if no argument is passed when calling the method, Python automatically uses the default value, None. In other words, calling Seq() creates a null sequence, because no argument is passed to the initialization method.
|
|
|
|
|
|
When a null sequence is created, the **\__init_\_()** method will print the message: "NULL sequence Created"
|
|
When a null sequence is created, the **\__init_\_()** method will print the message: "NULL sequence Created"
|
|
|
|
|
|
* **Filename:** P01/e2.py
|
|
* **Filename:** P01/e2.py
|
|
* **Description**: Write a python program that first creates a null sequence and then a valid sequence. It should prints the two objects. The output of the program should be:
|
|
* **Description**: Write a Python program that first creates a null sequence and then a valid sequence. It should print the two objects. The output of the program should be:
|
|
|
|
|
|
```
|
|
```
|
|
-----| Practice 1, Exercise 2 |------
|
|
-----| Practice 1, Exercise 2 |------
|
... | @@ -125,11 +119,11 @@ Sequence 1: NULL |
... | @@ -125,11 +119,11 @@ Sequence 1: NULL |
|
Sequence 2: TATAC
|
|
Sequence 2: TATAC
|
|
```
|
|
```
|
|
|
|
|
|
* **Considerations**: The first you should do in the **\__init()\_\_** method is checking if it is a null sequence. If so, print the message on the console, assign the value to the **self.strbases** attribute and return. If it is not null, continue with the other checks.
|
|
* **Considerations**: The first thing you should do in the **\__init_\_()** method is to check whether the sequence is null. If so, print the message on the console, assign the value to the **self.strbases** attribute and return. If it is not null, continue with the other checks.
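
The steps in the considerations above could be sketched as follows. This is only one possible shape, under some assumptions: null and invalid sequences are stored as the strings "NULL" and "ERROR" (as the expected outputs suggest), an empty string is treated like a missing argument, and nothing is printed for valid or invalid sequences because the wording of those messages is not given here.

```python
class Seq:
    """Sketch of a DNA sequence class with an optional constructor argument."""

    def __init__(self, strbases=None):
        # 1. Null sequence: no argument (or an empty string) was passed
        if strbases is None or strbases == "":
            print("NULL sequence Created")
            self.strbases = "NULL"
            return

        # 2. Invalid sequence: at least one character is not a valid base
        if not all(base in "ATCG" for base in strbases):
            self.strbases = "ERROR"
            return

        # 3. Valid sequence: store the bases as given
        self.strbases = strbases

    def __str__(self):
        return self.strbases


s1 = Seq()
s2 = Seq("TATAC")
print(f"Sequence 1: {s1}")
print(f"Sequence 2: {s2}")
```

Checking with `is None` rather than `== None` is the usual Python idiom for this comparison.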
|
|
|
|
|
|
### Exercise 3: Null, valid and invalid sequences
|
|
### Exercise 3: Null, valid and invalid sequences
|
|
|
|
|
|
In this exercise we will make sure that our Seq class works ok with the **three types** of sequences. We will create these **three sequences**:
|
|
In this exercise we will make sure that our Seq class works correctly with the **three types** of sequences. We will create these **three sequences**:
|
|
|
|
|
|
```python3
|
|
```python3
|
|
# -- Create a Null sequence
|
|
# -- Create a Null sequence
|
... | @@ -143,7 +137,7 @@ s3 = Seq("Invalid sequence") |
... | @@ -143,7 +137,7 @@ s3 = Seq("Invalid sequence") |
|
```
|
|
```
|
|
|
|
|
|
* **Filename:** P01/e3.py
|
|
* **Filename:** P01/e3.py
|
|
* **Description**: Write a python program that creates three sequences: one null, one valid, and one that is invalid and then it prints the objects. This is what we should see:
|
|
* **Description**: Write a Python program that creates three sequences: one null, one valid, and one that is invalid, and then prints the objects. This is what we should see after running the program:
|
|
|
|
|
|
```
|
|
```
|
|
-----| Practice 1, Exercise 3 |------
|
|
-----| Practice 1, Exercise 3 |------
|
... | @@ -155,12 +149,12 @@ Sequence 2: ACTGA |
... | @@ -155,12 +149,12 @@ Sequence 2: ACTGA |
|
Sequence 3: ERROR
|
|
Sequence 3: ERROR
|
|
```
|
|
```
|
|
|
|
|
|
### Exercise 4: len() method
|
|
### Exercise 4: The len() method
|
|
|
|
|
|
The **len(self)** method should work with the **three types** of sequences. In case the sequence is **null** or **invalid**, the length should always be **0**. Implement this in the Seq Class.
|
|
The **len(self)** method should work with the **three types** of sequences. In case the sequence is **null** or **invalid**, the length should always be **0**. Implement this in the Seq Class.
|
|
|
|
|
|
* **Filename**: P01/e4.py
|
|
* **Filename**: P01/e4.py
|
|
* **Desription**: Write a python program that creates three sequences: a null one, a valid, and one that isinvalid and then it prints their lengths and sequences. This is what we should see:
|
|
* **Description**: Write a Python program that creates three sequences: one null, one valid, and one invalid, and then prints their lengths and sequences. This is what we should see after running this program:
|
|
|
|
|
|
```
|
|
```
|
|
-----| Practice 1, Exercise 4 |------
|
|
-----| Practice 1, Exercise 4 |------
|
... | @@ -179,7 +173,7 @@ Process finished with exit code 0 |
... | @@ -179,7 +173,7 @@ Process finished with exit code 0 |
|
Implement the **count_base(self, base)** method. If the sequence is either null or invalid, the return value should be 0.
|
|
Implement the **count_base(self, base)** method. If the sequence is either null or invalid, the return value should be 0.
|
|
|
|
|
|
* **Filename**: P01/e5.py
|
|
* **Filename**: P01/e5.py
|
|
* **Desription**: Write a python program that creates three sequences: one null, one valid, and one invalid and then it prints their lengths, sequences and the number of bases on the console. This is what we should see:
|
|
* **Description**: Write a Python program that creates three sequences: one null, one valid, and one invalid, and then prints their lengths, sequences and the number of bases on the console. This is what we should see after running the program:
|
|
|
|
|
|
```
|
|
```
|
|
-----| Practice 1, Exercise 5 |------
|
|
-----| Practice 1, Exercise 5 |------
|
... | @@ -196,12 +190,12 @@ Sequence 2: (Length: 0) ERROR |
... | @@ -196,12 +190,12 @@ Sequence 2: (Length: 0) ERROR |
|
Process finished with exit code 0
|
|
Process finished with exit code 0
|
|
```
|
|
```
|
|
|
|
|
|
### Exercise 6: count() method
|
|
### Exercise 6: The count() method
|
|
|
|
|
|
Implement the **count(self)** method. If the sequence is null or invalid, the returned dictionary should have all its values set to 0.
|
|
Implement the **count(self)** method. If the sequence is null or invalid, the returned dictionary should have all its values set to 0.
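
The rule shared by the counting methods (null and invalid sequences always count as zero) could be sketched as below. This is a reduced illustration and not the full class: it assumes that validation has already left "NULL" or "ERROR" in the attribute, and `is_usable` is a hypothetical helper introduced only for this example.

```python
class Seq:
    """Sketch: only the parts needed to illustrate the counting methods."""

    BASES = ["A", "T", "C", "G"]

    def __init__(self, strbases):
        # Assumption: validation already happened elsewhere and stored
        # "NULL" or "ERROR" for null and invalid sequences
        self.strbases = strbases

    def is_usable(self):
        # Hypothetical helper: True only for valid sequences
        return self.strbases not in ("NULL", "ERROR")

    def len(self):
        return len(self.strbases) if self.is_usable() else 0

    def count_base(self, base):
        return self.strbases.count(base) if self.is_usable() else 0

    def count(self):
        # One entry per base, so null and invalid sequences yield all zeros
        return {base: self.count_base(base) for base in self.BASES}


print(Seq("ACTGA").count())  # {'A': 2, 'T': 1, 'C': 1, 'G': 1}
print(Seq("ERROR").count())  # {'A': 0, 'T': 0, 'C': 0, 'G': 0}
```

Building `count()` on top of `count_base()` keeps the zero rule in a single place.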
|
|
|
|
|
|
* **Filename**: P01/e6.py
|
|
* **Filename**: P01/e6.py
|
|
* **Desription**: Write a python program that creates three sequences: one null, one valid, and one that is invalid and then it prints their lengths, sequences and the dictionary returned by the count() method. This is what we should see;
|
|
* **Description**: Write a Python program that creates three sequences: one null, one valid, and one that is invalid, and then prints their lengths, sequences and the dictionary returned by the count() method. This is what we should see after running the program:
|
|
|
|
|
|
```
|
|
```
|
|
-----| Practice 1, Exercise 6 |------
|
|
-----| Practice 1, Exercise 6 |------
|
... | @@ -220,10 +214,10 @@ Process finished with exit code 0 |
... | @@ -220,10 +214,10 @@ Process finished with exit code 0 |
|
|
|
|
|
### Exercise 7: reverse() method
|
|
### Exercise 7: reverse() method
|
|
|
|
|
|
Implement the **reverse(self)** method. If the sequence is either null or invalid, it should return the string "NULL" or "ERROR" (and not its reverse!).
|
|
Implement the **reverse(self)** method. If the sequence is either null or invalid, it should return the "ERROR" string (and not its reverse!).
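
A short sketch of the reversal itself, written as a plain function to keep it brief; inside the class the same logic would read `self.strbases`. It assumes that null and invalid sequences are stored as "NULL" and "ERROR".

```python
def reverse(strbases):
    """Return the bases in reverse order, or "ERROR" for null/invalid input."""
    if strbases in ("NULL", "ERROR"):
        return "ERROR"
    # A slice with step -1 walks the string from the last character to the first
    return strbases[::-1]


print(reverse("ACTGA"))  # AGTCA
print(reverse("NULL"))   # ERROR
```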
|
|
|
|
|
|
* **Filename**: P01/e7.py
|
|
* **Filename**: P01/e7.py
|
|
* **Desription**: Write a python program that creates three sequences: one null, one valid, and one invalid and then prints their lengths, sequences, the dictionary with the bases, and the reverse sequence. This is what we should see:
|
|
* **Description**: Write a Python program that creates three sequences: one null, one valid, and one invalid, and then prints their lengths, sequences, the dictionary with the bases, and the reverse sequence. This is what we should see after running the program:
|
|
|
|
|
|
```
|
|
```
|
|
-----| Practice 1, Exercise 7 |------
|
|
-----| Practice 1, Exercise 7 |------
|
... | @@ -243,10 +237,10 @@ Sequence 2: (Length: 0) ERROR |
... | @@ -243,10 +237,10 @@ Sequence 2: (Length: 0) ERROR |
|
|
|
|
|
### Exercise 8: complement() method
|
|
### Exercise 8: complement() method
|
|
|
|
|
|
Implement the **complement(self)** method. If the sequence is either null or invalid, it should return the string "NULL" or "ERROR" (and not its complement!)
|
|
Implement the **complement(self)** method. If the sequence is either null or invalid, it should return the "ERROR" string (and not its complement!)
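
One way to compute the complement is a lookup table that pairs A with T and C with G. The sketch below is a plain function (inside the class it would read `self.strbases`) and assumes that null and invalid sequences are stored as "NULL" and "ERROR". The names `COMPLEMENT` and `complement` are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strbases):
    """Return the complement sequence, or "ERROR" for null/invalid input."""
    if strbases in ("NULL", "ERROR"):
        return "ERROR"
    # Replace every base with its partner, keeping the original order
    return "".join(COMPLEMENT[base] for base in strbases)


print(complement("ACTGA"))  # TGACT
print(complement("ERROR"))  # ERROR
```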
|
|
|
|
|
|
* **Filename**: P01/e8.py
|
|
* **Filename**: P01/e8.py
|
|
* **Desription**: Write a python program that creates three sequences: one null, one valid, and one invalid and then prints their lengths, sequences, the dictionary with the bases, the reverse sequence, and the complement. This is what we should see:
|
|
* **Description**: Write a Python program that creates three sequences: one null, one valid, and one invalid, and then prints their lengths, sequences, the dictionary with the bases, the reverse sequence, and the complement. This is what we should see after running the program:
|
|
|
|
|
|
```
|
|
```
|
|
-----| Practice 1, Exercise 8 |------
|
|
-----| Practice 1, Exercise 8 |------
|
... | @@ -271,9 +265,9 @@ Process finished with exit code 0 |
... | @@ -271,9 +265,9 @@ Process finished with exit code 0 |
|
|
|
|
|
### Exercise 9: read_fasta() method
|
|
### Exercise 9: read_fasta() method
|
|
|
|
|
|
We have been creating sequences by manually providing a string with the bases. We also want to create sequences that are stored **files** that are **fasta format**. That is the purpose of the read_fasta(self) method.
|
|
We have been creating sequences by manually providing a string with the bases. We now want to create sequences that are stored in **files** in the **fasta format**. That is the purpose of the read_fasta(self, filename) method.
|
|
|
|
|
|
For using the read_fasta() method we first need to create a **null sequence**, and then call the read_fasta() method to populate the object.
|
|
To use the read_fasta() method, we first need to create a **null sequence**, and then call the read_fasta() method to populate the object.
|
|
|
|
|
|
```python3
|
|
```python3
|
|
# -- Create a Null sequence
|
|
# -- Create a Null sequence
|
... | @@ -286,30 +280,28 @@ s.read_fasta(FILENAME) |
... | @@ -286,30 +280,28 @@ s.read_fasta(FILENAME) |
|
After that, you can use the object in exactly the same way as the other sequences initialized manually.
|
|
After that, you can use the object in exactly the same way as the other sequences initialized manually.
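
The file-reading step could look roughly like the sketch below. It assumes a FASTA file with a single record, where the first line is the header (starting with `>`) and every following line holds bases. The helper name `read_fasta_bases` is hypothetical; in the class, the method would store the resulting string in the object's attribute.

```python
from pathlib import Path


def read_fasta_bases(filename):
    """Return the bases stored in a single-record FASTA file as one string."""
    text = Path(filename).read_text()
    # Skip the header line and keep only the lines that contain bases
    lines = text.split("\n")[1:]
    # Join the lines, dropping line breaks and surrounding whitespace
    return "".join(line.strip() for line in lines)
```

Joining the lines matters because the bases of a long gene are usually split across many lines in the file.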
|
|
|
|
|
|
* **Filename**: P01/e9.py
|
|
* **Filename**: P01/e9.py
|
|
* **Desription**: Write a python program that reads a sequence from the U5.txt file. Then it should print its length, the sequence itself, the dictionary with the bases, the reverse sequence and the complement. This is what we should see:
|
|
* **Description**: Write a Python program that reads a sequence from the U5.txt file. Then the program should print the length of the sequence, the sequence itself, the dictionary with the bases, the reverse sequence and the complement. This is what we should see after running the program:
|
|
|
|
|
|
```
|
|
```
|
|
-----| Practice 1, Exercise 9 |------
|
|
-----| Practice 1, Exercise 9 |------
|
|
NULL Seq created
|
|
NULL Seq created
|
|
Sequence : (Length: 1314) ATAGACCAAACATGAGAGGCTGTGAATGGTATAATCTTCGCCGT...(not shown)...
|
|
Sequence : (Length: 1314) ATAGACCAAACATGAGAGGCTGTGAATGGTATAATCTTCGCCGT...
|
|
Bases: {'A': 360, 'T': 491, 'C': 229, 'G': 234}
|
|
Bases: {'A': 360, 'T': 491, 'C': 229, 'G': 234}
|
|
Rev: ATAATGACAAGTTTAAAATAATGCCAGACTCTATTACGATTACTACATTCAAGGTAAATAC...(not shown)...
|
|
Rev: ATAATGACAAGTTTAAAATAATGCCAGACTCTATTACGATTACTACATTCAAGGTAAATAC...
|
|
Comp: TATCTGGTTTGTACTCTCCGACACTTACCATATTAGAAGCGGCAAGCTGTCCATTCCAATA...(not shown)...
|
|
Comp: TATCTGGTTTGTACTCTCCGACACTTACCATATTAGAAGCGGCAAGCTGTCCATTCCAATA...
|
|
|
|
|
|
Process finished with exit code 0
|
|
Process finished with exit code 0
|
|
```
|
|
```
|
|
|
|
|
|
As the U5 is very long, only the first characters are shown in this figure (but your exercise will show all the bases).
|
|
As the U5 sequences are very long, only the first characters are shown in the previous snippet, but your program should show the complete sequences.
|
|
|
|
|
|
### Exercise 10: processing the genes
|
|
### Exercise 10: processing the genes
|
|
|
|
|
|
Write a python program that automatically calculates the answer to this question:
|
|
Write a Python program that automatically calculates the answer to this question:
|
|
|
|
|
|
* Which is the most frequent base in each gene?
|
|
* Which is the most frequent base in each gene?
|
|
|
|
|
|
The genes are stored in the fasta files we downladed in a previous session.
|
|
The genes are stored in the FASTA files we downloaded in a previous session. This exercise is the same as exercise 8 of practice 0, but now we are using the **Seq Class** instead of the functions developed in the Seq0 module.
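
Once the dictionary of base counts is available, picking the most frequent base can be a single call to `max()`. The sketch below uses the U5 counts shown in Exercise 9. The function name `most_frequent_base` is hypothetical.

```python
def most_frequent_base(counts):
    """Return the base with the highest count in a dictionary of base counts."""
    # max() walks the keys and compares them by their associated value
    return max(counts, key=counts.get)


u5_counts = {"A": 360, "T": 491, "C": 229, "G": 234}
print(most_frequent_base(u5_counts))  # T
```

If two bases are tied, `max()` returns the one that appears first in the dictionary.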
|
|
|
|
|
|
This exercise is the same than the exercise 8 of practice 0, but we are using the **Seq Class** instead of the functions developed in the Seq0 module
|
|
|
|
|
|
|
|
* **Filename**: P01/e10.py
|
|
* **Filename**: P01/e10.py
|
|
* **Output**: This is what should be seen after running the script:
|
|
* **Output**: This is what should be seen after running the script:
|
... | @@ -347,11 +339,12 @@ The session is finished. Make sure during this week that everything in this list |
... | @@ -347,11 +339,12 @@ The session is finished. Make sure during this week that everything in this list |
|
* [ ] e9.py
|
|
* [ ] e9.py
|
|
* [ ] e10.py
|
|
* [ ] e10.py
|
|
* [ ] Seq1.py
|
|
* [ ] Seq1.py
|
|
* [ ] All the previous files have been pushed to your remote Github repo
|
|
* [ ] All the previous files have been pushed to your remote GitHub repository.
|
|
|
|
|
|
# Credits
|
|
# Credits
|
|
|
|
|
|
* Rodrigo Pérez Rodríguez
|
|
* Rodrigo Pérez Rodríguez (main author)
|
|
|
|
* Gregorio Robles (revision and minor fixes and clarifications)
|
|
|
|
|
|
# License
|
|
# License
|
|
|
|
|
... | @@ -360,4 +353,4 @@ The session is finished. Make sure during this week that everything in this list |
... | @@ -360,4 +353,4 @@ The session is finished. Make sure during this week that everything in this list |
|
# Links
|
|
# Links
|
|
|
|
|
|
* [Universidad Rey Juan Carlos de Madrid](https://www.urjc.es/)
|
|
* [Universidad Rey Juan Carlos de Madrid](https://www.urjc.es/)
|
|
* [Escuela Técnica Superior de Ingeniería de Telecomunicaciones (URJC)](https://www.urjc.es/universidad/facultades/escuela-tecnica-superior-de-ingenieria-de-las-telecomunicaciones/content/etsit-escuela-tecnica-superior-de-ingenieria-de-telecomunicacion) |
|
* [Escuela de Ingeniería de Fuenlabrada (URJC)](https://www.urjc.es/eif) |
|
\ No newline at end of file |
|
\ No newline at end of file |