... | ... | @@ -3,7 +3,7 @@ |
|
|
# Session 12: HTTP protocol
|
|
|
|
|
|
* **Time**: 2h
|
|
|
* **Date**: Tuesday, March-15th-2022
|
|
|
* **Date**: XXXX, XX-XX-2023
|
|
|
* **Goals**:
|
|
|
* Learn about the HTTP protocol
|
|
|
* Write our first web server using sockets
|
... | ... | @@ -38,8 +38,8 @@ |
|
|
|
|
|
# Introduction to the HTTP protocol
|
|
|
|
|
|
* [HTTP protocol](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol) is the language spoken between a **browser** (client) and a **web server**
|
|
|
* This is our **general scenario**, in which there is a communication between one client and one server. As we already know, there are two kinds of sockets: one just for listening to new connection on the server (Red dot), and others for interchanging data between the client and the server (blue dots)
|
|
|
* [HTTP protocol](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol) is the language spoken between **browsers** (clients) and **web servers**
|
|
|
* This is our **general scenario**, where communication between one client and one server happens. As we already know, there are two types of sockets: one just for listening to new connections on the server, and others to exchange data between the client and the server
|
|
|
|
|
|

|
|
|
|
... | ... | @@ -50,62 +50,62 @@ |
|
|
|
|
|
## Requesting a web page
|
|
|
|
|
|
Let's understand what is happening when a **browser** connects to a **web server** for viewing a **web page**. This is the **initial scenario**:
|
|
|
Let's understand what happens when a **browser** connects to a **web server** to download a **web page**. This is the **initial scenario**:
|
|
|
|
|
|

|
|
|
|
|
|
The client is the **browser** running in our device (computer, mobile, tablet...). the server is running in another computer on the internet. It is waiting for the clients to connect
|
|
|
The client is the **browser** running on our device (computer, mobile, tablet...); the server is running on another computer on the internet, and it is waiting for clients to connect.
|
|
|
|
|
|
### Step 1: Connection establishment
|
|
|
|
|
|
When we write an **URL** in the browser, we are requesting a web page from the server. The client creates a **socket** and **establish a connection** with the server. The server creates a new **socket** (clientsocket) for **interchanging data** with the **client** (in both directions). The original sockets continues listening for new connections
|
|
|
When we write a **URL** in the browser, we are requesting a web page from the server. The client creates a **socket** and **establishes a connection** with the server. The server creates a new **socket** (the client socket) to **exchange data** with the **client** (in both directions). The original socket continues listening for new connections.
|
|
|
|
|
|

|
|
|
|
|
|
Now the client and server can **communicate** by means of the "blue" sockets. When they **write** to the sockets, the data is **sent**. When they **read** from them, the data is **received**. There is a **bidirectional communication** channel established
|
|
|
Now the client and the server can **communicate** by means of the blue sockets. When they **write** to the sockets, data is **sent**; when they **read** from the sockets, the data is **received**. There is a **bidirectional communication** channel established.
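
As a minimal sketch of this step (not the code used in the session), the following Python function runs a tiny server and a tiny client on the local machine. The names `demo_connection`, `ls`, `cs` and the messages `ping`/`pong` are illustrative assumptions; port 0 simply asks the operating system for any free port.

```python
import socket
import threading


def demo_connection():
    """Run one client/server exchange locally and return what each side received."""
    # Listening socket: it is only used for accepting new connections
    ls = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    ls.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    ls.bind(("127.0.0.1", 0))  # Port 0: let the OS choose a free port
    ls.listen()
    port = ls.getsockname()[1]

    received = {}

    def server():
        # accept() returns a NEW socket for exchanging data with this client
        cs, client_address = ls.accept()
        with cs:
            received["server"] = cs.recv(2048).decode()  # Read: data is received
            cs.sendall(b"pong")                          # Write: data is sent

    thread = threading.Thread(target=server)
    thread.start()

    # Client side: create a socket and establish the connection
    with socket.create_connection(("127.0.0.1", port)) as c:
        c.sendall(b"ping")
        received["client"] = c.recv(2048).decode()

    thread.join()
    ls.close()
    return received


print(demo_connection())
```

The listening socket `ls` never carries application data: all reading and writing happens on `cs` (server side) and `c` (client side), which matches the bidirectional channel described above.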
|
|
|
|
|
|
### Step 2: The client sends a request message for a web page
|
|
|
|
|
|
The client takes the initiative (always) and sends a **request message** for obtaining the **web page** that the user wants to see
|
|
|
The client takes the initiative (always) and sends a **request message** to obtain the **web page** that the user wants to see.
|
|
|
|
|
|

|
|
|
|
|
|
### Step 3: The server reads the page from the disk
|
|
|
|
|
|
The server receives the **request message** and reads the **html file** from the **hard disk**
|
|
|
The server receives the **request message** and reads the **html file** from the **hard disk** to provide it to the client.
|
|
|
|
|
|

|
|
|
|
|
|
### Step 4: The server sends a response message
|
|
|
|
|
|
The server builds a **response message**, composed of different fields. The HTML contents are located in the end of the message
|
|
|
The server builds a **response message** composed of different fields. The HTML contents are located at the end of the message.
|
|
|
|
|
|

|
|
|
|
|
|
### Step 5: The browser renders the page on the screen
|
|
|
|
|
|
The client receive the **html content** and shows it on the screen
|
|
|
The client receives the **html content** and shows it to the user.
|
|
|
|
|
|

|
|
|
|
|
|
## HTTP messages
|
|
|
|
|
|
There are two types of messages in HTTP: **Request** and **response**. They both have the **same format**: They consist of **Lines in plain text** (strings) separated by the **special character** '\\n'
|
|
|
There are two types of messages in HTTP: **requests** and **responses**. They both have the **same format**: They consist of **lines in plain text** (strings) separated by the **special character** '\\n'
|
|
|
|
|
|
The lines are divided into two parts: the **heather** and the **body**. There is a **blank line** for separating both elements
|
|
|
The lines are divided into two parts: **header** and **body**. There is a **blank line** for separating both elements.
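
To make the header/body split concrete, here is a small sketch in Python. The function name `split_message` is a hypothetical helper, not part of the session's code. It also accepts `\r\n` line endings, which is what the HTTP/1.1 standard specifies, while these notes use `\n` for simplicity.

```python
def split_message(message: str):
    """Split an HTTP message into its header lines and its body."""
    # Normalize line endings: the standard uses "\r\n", these notes use "\n"
    text = message.replace("\r\n", "\n")

    # The first blank line separates the header from the body
    header, _, body = text.partition("\n\n")

    return header.split("\n"), body


header_lines, body = split_message("GET / HTTP/1.1\nHost: 127.0.0.1:8080\n\n")
print(header_lines)
print(repr(body))
```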
|
|
|
|
|
|
### Request messages
|
|
|
|
|
|
This is the **format** of the Request messages
|
|
|
This is the **format** of request messages:
|
|
|
|
|
|

|
|
|
|
|
|
The **request line** is the most important part. Here is where the client tells the server the service it needs. Consist of **three parts** separated by one space:
|
|
|
The **request line** is the most important part. Here is where the client tells the server the service it needs. It consists of **three parts** separated by one space:
|
|
|
|
|
|
* **Method**: Command name. There are three: GET, POST, HEAD
|
|
|
* **GET**: Request an **object** to the server. The client wants the server to send it an object. The object id is given in the Path argument
|
|
|
* **POST**: The client wants to send data to the server. They are placed in the message body
|
|
|
* **HEAD**: Similar to GET, but only the object's headers are requested. It is used by the client to know if the object has been modified without having to transfer the whole object
|
|
|
* **Method**: Command name. The most common ones are GET, POST, and HEAD
|
|
|
* **GET**: Requests an **object** from the server. The object id is provided using the argument _path_
|
|
|
* **POST**: The client wants to send data to the server. These data are placed in the message body
|
|
|
* **HEAD**: Similar to GET, but only the object's header is requested. It is used by the client to know if the object has been modified without having to transfer the whole object
|
|
|
* **Path**: It is the name of the object that the client wants to get from the server, or the object which will receive the data the client is sending
|
|
|
* **Version**: the HTTP version used. The syntax is like this: **HTTP/x.y**, where x and y are integer numbers
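
The three parts of the request line can be extracted with a couple of string operations. This is only a sketch: `parse_request_line` is a hypothetical helper name, and it assumes a well-formed request line with exactly one space between the parts.

```python
def parse_request_line(request: str):
    """Return the (method, path, version) of an HTTP request message."""
    # The request line is the first line of the message
    first_line = request.split("\n")[0].strip()

    # Three parts separated by one space
    method, path, version = first_line.split(" ")
    return method, path, version


method, path, version = parse_request_line("GET /favicon.ico HTTP/1.1\nHost: 127.0.0.1:8080\n\n")
print(method, path, version)
```

A malformed request line raises a `ValueError` in this sketch; a real server would need to handle that case.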
|
|
|
|
... | ... | @@ -119,11 +119,11 @@ And this is an **example** of a real message: |
|
|
|
|
|

|
|
|
|
|
|
In this example, there is no body (it is empty)
|
|
|
In this example, there is no body (it is empty).
|
|
|
|
|
|
### Response messages
|
|
|
|
|
|
This is the **format** of the response message. It is the same than for the request message
|
|
|
This is the **format** of the response message. It is the same as for the request message:
|
|
|
|
|
|

|
|
|
|
... | ... | @@ -148,13 +148,13 @@ This is an **example** of a response message: |
|
|
|
|
|
# Creating our first HTTP server
|
|
|
|
|
|
Let's create our **first HTTP server**, step by step, learning while doing
|
|
|
Let's create our **first HTTP server**, step by step, learning while doing.
|
|
|
|
|
|
## Starting point: The echo server
|
|
|
|
|
|
We start from a **simple server**, from the previous week, that just **receives the request message** and print it on the console: The **echo server**. It does no generates a response yet
|
|
|
We start from a **simple server** that just **receives a request message** and prints it on the console: the **echo server** we have already developed. It does not generate a response yet.
|
|
|
|
|
|
**Create** the **Session 12 folder** and the **new** python file **echo-server.py**. Copy & paste the following code
|
|
|
**Create** the **S12 folder** and the **new** python file **echo-server.py**. Copy & paste the following code:
|
|
|
|
|
|
```python3
|
|
|
import socket
|
... | ... | @@ -209,21 +209,21 @@ while True: |
|
|
cs.close()
|
|
|
```
|
|
|
|
|
|
First, let's check that our **server** is **working fine**. From the **linux console** we send a message to the server using the **printf** and **nc commands**:
|
|
|
First, let's check that our server works as expected. From the **linux console**, let's send it a message using the **echo** and **nc** commands:
|
|
|
|
|
|
```
|
|
|
printf "Hello!" | nc 127.0.0.1 8080
|
|
|
echo "Hello!" | nc 127.0.0.1 8080
|
|
|
```
|
|
|
|
|
|
We should see this message on the **server's console**, in **green** color
|
|
|
We should see this message on the **server's console**:
|
|
|
|
|
|

|
|
|
|
|
|
## Reading the browser's request message
|
|
|
|
|
|
**Internet browsers** (like Firefox or Chrome) speak the **HTTP protocol**. It means that they send a request message with the format we have already seen. Let's check it
|
|
|
**Internet browsers** (like Firefox, Chrome or any other) "speak" the **HTTP protocol**. It means that they send a request message with the format we have already seen. Let's check it.
|
|
|
|
|
|
Open a **new tab** in your **browser** and type it:
|
|
|
Open a **new tab** in your **browser** and type this:
|
|
|
|
|
|
```
|
|
|
http://127.0.0.1:8080/
|
... | ... | @@ -232,21 +232,21 @@ http://127.0.0.1:8080/ |
|
|
This is the **URL** of the **main page** of our server:
|
|
|
|
|
|
* **"http://"**: It means that we want to use the HTTP protocol
|
|
|
* **127.0.0.1**: Server's IP (in this case is the server in our local machine)
|
|
|
* **:8080**: The Server's Port. It is separated by the caracter **:** from the IP
|
|
|
* **127.0.0.1**: Server's IP (in this case, it is the server in our local machine)
|
|
|
* **:8080**: The server's port. It is separated by the character **:** from the IP address
|
|
|
* **/**: This slash indicates that we want to access the server's **main page**
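
Python's standard `urllib.parse` module can split a URL into these same parts. This is only an illustration of the URL structure (the server in this session does not need it), and `describe_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts described in the list above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,  # "http": we want to use the HTTP protocol
        "ip": parts.hostname,      # Server's IP address
        "port": parts.port,        # Server's port (an integer)
        "path": parts.path,        # "/" means the main page
    }


print(describe_url("http://127.0.0.1:8080/"))
```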
|
|
|
|
|
|
In the **browser** we will see something like this:
|
|
|
|
|
|

|
|
|
|
|
|
As our server does NOT speak HTTP yet, the **browser** could **not** establish the connection with the web server. An error message is shown
|
|
|
This is because our server **does NOT speak HTTP yet**: the browser never receives a valid response from the web server, so an error message is displayed.
|
|
|
|
|
|
But... our server **has received** the **request messages** from the **browser**. If we have a look at the **server's console**, we will see something like this:
|
|
|
But... our server **has received** the **request message** from the **browser**. If we have a look at the **server's console**, we will see something like this:
|
|
|
|
|
|

|
|
|
|
|
|
**Notice** that there appear **many** request messages (all the same). This is because we have not generate a response to the client's request messages. The browser **re-sends** the request messages many times, until there is a **timeout** and the browser writes an **error message**
|
|
|
**Notice** that there are **many** request messages (all the same). This is because we have not generated a response to the client's request messages, so the browser **re-sends** the request message many times, until there is a **timeout** and the browser shows an **error message**.
|
|
|
|
|
|
This is the **request message** received from the browser:
|
|
|
|
... | ... | @@ -267,11 +267,11 @@ Have a look at the first line: |
|
|
GET / HTTP/1.1
|
|
|
```
|
|
|
|
|
|
The browser is asking our server for the **/** object. It means the \*\*main page. The HTTP version used is 1.1
|
|
|
The browser is asking our server for the **/** object. It means the **main page**. The HTTP version used is 1.1.
|
|
|
|
|
|
## Sending a simple response message
|
|
|
|
|
|
Let's modify our server for generating a **valid response message** in **HTTP format**. Use the file **Session-12/webserver1.py**
|
|
|
Let's modify our server to generate a **valid response message** in **HTTP format**. Use the file **S12/webserver1.py**.
|
|
|
|
|
|
Our **response message** should have the following format:
|
|
|
|
... | ... | @@ -282,11 +282,13 @@ HTTP/1.1 200 OK\n |
|
|
```
|
|
|
|
|
|
* The **header** should contain at least two elements:
|
|
|
* **Content-Type:** This is for indicating the type of content return by the server. It will be typically **text/html** (but can also be image/png in the case of sending back an image in png format)
|
|
|
* **Content-Length:** It indicates the **total length** of the information sent in the **body** of the response
|
|
|
* The **body** with the **contents** we are sending to the browser
|
|
|
* **Content-Type:** This is to indicate the type of content returned by the server. It will be typically **text/html** (but it can also be image/png in case of sending an image in png format)
|
|
|
* **Content-Length:** It indicates the **total length** of the information sent in the **body** of the response message
|
|
|
* The **body** with the **contents** the server is sending to the browser/client
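
The pieces listed above can be assembled with a small function. This is a minimal sketch under the conventions of these notes (lines separated by `\n`, always a `200 OK` status); `build_response` is a hypothetical helper name and not necessarily how the session's server is organized.

```python
def build_response(body: str, content_type: str = "text/plain") -> bytes:
    """Build a simple HTTP response message, ready to be sent through a socket."""
    # Content-Length counts the BYTES of the body, so encode it first
    body_bytes = body.encode()

    # Status line
    status_line = "HTTP/1.1 200 OK\n"

    # Header: the two elements described above
    header = f"Content-Type: {content_type}\n"
    header += f"Content-Length: {len(body_bytes)}\n"

    # A blank line separates the header from the body
    return (status_line + header + "\n").encode() + body_bytes


response = build_response("Hello from my first web server!")
print(response.decode())
```

Computing the length from the encoded bytes rather than from the string matters as soon as the body contains non-ASCII characters, because one character may then take more than one byte.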
|
|
|
|
|
|
In our server we will generate a **simple response** in which body we will store the string: "Hello from my first web server!"
|
|
|
In our server we will generate a **simple response** whose body will only contain the string: "Hello from my first web server!"
|
|
|
|
|
|
Here is the code:
|
|
|
|
|
|
```python
|
|
|
import socket
|
... | ... | @@ -372,11 +374,11 @@ while True: |
|
|
cs.close()
|
|
|
```
|
|
|
|
|
|
Run the server and connect with the browser again. Now we can see the **answer**. Our first **mini-web server** is working!!! :-)
|
|
|
Run the server and try to connect again from the browser. Now we can see the **answer**. Our first **mini-web server** is working!!! :-).
|
|
|
|
|
|

|
|
|
|
|
|
In the **server's console** in pycharm we see that there are **two request messages**:
|
|
|
In the **server's console**, in PyCharm, we see that there are **two request messages**:
|
|
|
|
|
|

|
|
|
|
... | ... | @@ -398,27 +400,27 @@ The **second request message** is this one: |
|
|
GET /favicon.ico HTTP/1.1
|
|
|
```
|
|
|
|
|
|
The server is asking for the resource **/favicon.ico**. The [favicon](https://en.wikipedia.org/wiki/Favicon) is a short image file that stores the **icon** of the webpage you are accessing. We are ignoring this request
|
|
|
The browser is asking for the resource **/favicon.ico**. The [favicon](https://en.wikipedia.org/wiki/Favicon) is a small image file that stores the **icon** of the webpage you are accessing. We are ignoring this request.
|
|
|
|
|
|
## curl: Watching the http messages
|
|
|
|
|
|
The Linux command **curl** allow us to watch both the http **request** and **response messages**. Run the **web-server-1.py** and **execute** the following command on the **Linux Console**:
|
|
|
The Linux command **curl** allows us to watch both the http **request** and the **response messages**. Run **web-server-1.py** and **execute** this command on the **Linux Console**:
|
|
|
|
|
|
```
|
|
|
curl 127.0.0.1:8080 -v
|
|
|
```
|
|
|
|
|
|
The messages that start with the **>** symbol are the **requests:** from the client to the server. The messages with the **<** symbol are the responses: from the server to the client
|
|
|
The messages that start with the **>** symbol are the **requests**: from the client to the server. The messages with the **<** symbol are the **responses**: from the server to the client.
|
|
|
|
|
|

|
|
|
|
|
|
## Response with HTML contents
|
|
|
|
|
|
Let's response with our first web page written in [HTML](https://en.wikipedia.org/wiki/HTML). We know nothing about HTML yet. It is the **language** used for creating **web pages**,that describes the structure of the document
|
|
|
Let's make the server respond with our first web page written in [HTML](https://en.wikipedia.org/wiki/HTML). We know nothing about HTML yet. It is the **language** used for creating **web pages**, and it describes the structure of the document.
|
|
|
|
|
|
In our server we are changing the contents. Instead of responding with a **string**, we will send a **message in HTML**. It is important to change the **Content-type** header from **text/plain** to **text/html** for indicating that we are sending HTML code instead of plain text
|
|
|
In our server we will be changing the contents. Instead of responding with a **string**, we will send a **message in HTML** format. It is important to change the **Content-type** header from **text/plain** to **text/html** to indicate that we are sending HTML code instead of plain text (isn't it obvious?).
|
|
|
|
|
|
Write the following server in the **Sesion-12/web-server-2.py** file
|
|
|
Write the following server in the **S12/web-server-2.py** file:
|
|
|
|
|
|
```python
|
|
|
import socket
|
... | ... | @@ -531,7 +533,7 @@ We will see that now the \*\*\*\*Content-Type header of the response message is |
|
|
|
|
|
# HTML
|
|
|
|
|
|
[HTML](https://en.wikipedia.org/wiki/HTML) is a special language used for defining the **structure** and the contents of the **web pages**. It consist of **text** inside **tags**. There is always an **opening tag** and a **closing tag**. This is the HTML code for the green server we used in the previous example
|
|
|
[HTML](https://en.wikipedia.org/wiki/HTML) is a special (markup) language used to define the **structure** and the contents of **web pages**. It consists of **text** within **tags**. There is usually an **opening tag** and a **closing tag**. This is the HTML code for the green server we used in the previous example:
|
|
|
|
|
|
```html
|
|
|
<!DOCTYPE html>
|
... | ... | @@ -547,15 +549,15 @@ We will see that now the \*\*\*\*Content-Type header of the response message is |
|
|
</html>
|
|
|
```
|
|
|
|
|
|
* **HTML documents** should always start with the special tag: </li>
|
|
|
* The rest of the html code is inside the and tags
|
|
|
* **HTML documents** should always start with the special tag: <!DOCTYPE html>
|
|
|
* The rest of the html code is within the boundaries of the **<html>** and **</html>** tags. The attributes specified along with the **<html>** tag specify, in this case, that we are working in English, reading from left to right
|
|
|
* Every html document consists of two parts: the **head** and the **body**
|
|
|
* The **head** contains information for the browser, about the document
|
|
|
* The **head** contains information for the browser, about the document it is receiving
|
|
|
* The actual content is located in the **body**
|
|
|
* In this example there are two elements inside the **body**:
|
|
|
* The **heading**: GREEN SERVER. It is a bigger text
|
|
|
* A **paragraph**: "I am the green server"
|
|
|
* The **background** color of the elements in the body is set inside the **style attribute**
|
|
|
* The **background** color of the elements in the body is set with the **style attribute**
|
|
|
* You can learn more about html by following these [tutorials from W3Schools](https://www.w3schools.com/html/)
|
|
|
* You can also learn more HTML in these [notes that I prepared for the CSAAI subject](https://github.com/myTeachingURJC/2019-2020-CSAAI/wiki/S2:-HTML) (in Spanish)
|
|
|
|
... | ... | |