Author avatar

Pavneet Singh

Web Requests with Python

Pavneet Singh

  • Nov 26, 2018
  • 10 Min read
  • 238,693 Views
  • Nov 26, 2018
  • 10 Min read
  • 238,693 Views
Python

Introduction

The Internet is an enormous source of data and, often, websites will offer a RESTful API endpoints (URLs, URIs) to share data via HTTP requests. HTTP requests are composed of methods like GET, POST, PUT, DELETE, etc. to manipulate and access resources or data. Often, websites require a registration process to access RESTful APIs or offer no API at all. So, to simplify the process, we can also download the data as raw text and format it. For instance, downloading content from a personal blog or profile information of a GitHub user without any registration. This guide will explain the process of making web requests in python using Requests package and its various features.

Prerequisites

  1. Python setup: Download and install the python setup from python.org or you can run python in browser with jupyter notebook.
  2. Request Package: Use python package manager (pip) command in the terminal (command prompt) to install packages.
1pip3 install requests
python

Use pip for python 2 (until python 3.4). Python also offers Virtualenv to manage the dependencies and development environments separately, across multiple applications.

Making a Get Request

In order to make a REST call, the first step is to import the python requests module in the current environment.

1import requests                                 # To use request package in current program 
2response = requests.get("www.dummyurl.com")     # To execute get request 
python

Python also provides a way to create alliances using the as keyword.

1import requests as reqs 
2response = reqs.get('https://www.google.com') 
python

To make the first request, we will be using JSONPlaceholder API which provides JSON response for specific item like posts, todos, and albums. So, the /todos/1 API will respond with the details of a TODO item.

1url = 'https://jsonplaceholder.typicode.com/todos/1' 
2response = requests.get(url)        # To execute get request 
3print(response.status_code)     # To print http response code  
4print(response.text)            # To print formatted JSON response 
python

The execution of above snippet will provide the result:

1200 
2{ 
3  "userId": 1, 
4  "id": 1, 
5  "title": "delectus aut autem", 
6  "completed": false 
7} 
json

The status code 200 means a successful execution of request and response.content will return the actual JSON response of a TODO item.

There are many public APIs available to test REST calls. You can also use Postman Echo or mocky to return customized responses and headers as well as adding a delay to the generated dummy link.

POST Request

Post requests are more secure because they can carry data in an encrypted form as a message body. Whereas GET requests append the parameters in the URL, which is also visible in the browser history, SSL/TLS and HTTPS connections encrypt the GET parameters as well. If you are not using HTTPs or SSL/TSL connections, then POST requests are the preference for security.
A dictionary object can be used to send the data, as a key-value pair, as a second parameter to the post method.

1data = {'title':'Python Requests','body':'Requests are awesome','userId':1} 
2response = requests.post('https://jsonplaceholder.typicode.com/posts', data) 
3print(response.status_code) 
4print(response.text) 
python

This dummy post request will return the attached data as response body:

1201 
2{ 
3  "title": "Python Requests", 
4  "body": "Requests are awesome", 
5  "userId": "1", 
6  "id": 101 
7} 
json

POST requests have no restriction on data length, so they’re more suitable for files and images. Whereas GET requests have a limit of 2 kilobytes (some servers can handle 64 KB data) and GET only allows ASCII values.

Just like post, requests also support other methods like put, delete, etc. Any request can be sent without any data and can define empty placeholder names to enhance code clarity.

1response = req.post('https://jsonplaceholder.typicode.com/posts', data = None, json = dictionaryObject) 
2print(response.json())      # output: {'id': 101} 
python

In this case where data is set as None, this can be skipped because it happened automatically due to default values.

Response Types

The response object can be parsed as string, bytes, JSON, or raw as:

1print(response.content)           # To print response bytes 
2print(response.text)              # To print unicode response string 
3jsonRes = response.json()         # To get response dictionary as JSON 
4print(jsonRes['title'] , jsonRes['body'], sep = ' : ')  # output: Python Requests : Requests are awesome 
python

Reading the response as a raw value allows us to read specific number of bytes and to enable this, set stream = True as a parameter in the request method.

1data = {'title':'Pyton Requests','body':'Requests are qwesome','userId':1} 
2response = req.post('https://jsonplaceholder.typicode.com/posts', data, stream = True) 
3print(response.raw.read(30))     # output: b'{\n  "title": "Python Requests"' 
python

To enable stream, the stream placeholder has to be mentioned specifically because it is not a required argument.

You can also use iter_content method which automatically decodes gzip files.

1response.iter_content(chunk_size=1024) 
python

Authentication

The process of authentication is required by many APIs to allow access to user specific details. Requests support various types of authentication, such as:

  • Basic Auth: This transfers the authentication details as base64 encoding (text as bytes), meaning there is no encryption and security. It is suitable for HTTPs or SSL/TSL enabled connections where security is inbuilt.
1# Open github API to test authentication 
2from requests.auth import HTTPBasicAuth     
3requests.get('https://api.github.com/user', auth=HTTPBasicAuth('userName', 'password')) 
4 
5# or shortcut method 
6requests.get('https://api.github.com/user', auth=('user', 'pass'))  
python
  • Digest Auth: This transfers the credentials in an encrypted form by applying a hash function on credentials, HTTP method, nonce (one-time number, provided by server), and the requested URI. Hence, it is more secured while making HTTP calls.
1from requests.auth import HTTPDigestAuth 
2response = reqs.get('https://postman-echo.com/digest-auth', auth=HTTPDigestAuth('postman', 
3'password')) 
python

Digest Auth can still be hacked and HTTPs or SSL/TSL security should be preferred over digest authentication.

Headers

A header contains information about the client (type of browser), server, accepted response type, IP address, etc. Headers can be customized for the source browser (user-agent) and content-type. They can be viewed using headers property as:

1headers = {'user-agent': 'customize header string', 'Content-Type': 'application/json; charset=utf-8'}  
2response = requests.get(url, headers=headers)   # modify request headers 
3print(response.headers)                         # print response headers 
4print(response.headers['Content-Type'])         # output: application/json; charset=utf-8 
python

Cookies

Cookies are small pieces of data stored on the client (browser) side and are often used to maintain a login session or to store user IDs. Both the client and server can send cookies. Use the cookies property to send and access cookies.

1cookie = {'username':'Pavneet'} 
2response = reqs.get('https://postman-echo.com/cookies/set',cookies = cookie)   # send cookie 
3print(response.text)    # output: {"cookies":{"username":"Pavneet"}} 
python

Use response.cookies to access the cookies from server response

Timeout and Redirection

  • Timeout: It allows requests to terminate any request, if there is no response within the set timeout duration. This will avoid any indefinite waiting state, in case there's no response from server.
1requests.get('https://github.com/', timeout=0.50) 
python
  • Redirection: Requests provide the url property to track redirected URLs.
1response = requests.get('http://github.com/', allow_redirects=True) 
2response.url
python

To disable redirection, set the allow_redirects parameter to False. By default it is set toTrue.

Key Points

  • Sending sensitive data, such as password, over GET requests with HTTPs or SSL/TSL is considered very poor practice. While it cannot be intercepted, the data would be logged in serverlogs as plain text on the receiving HTTPS server and quite possibly also in browser history. It is probably also available to browser plugins and, possibly, other applications on the client computer.

  • Lists of other supported parameters like proxies, cert, and verify are supported by Requests.

  • Always mention specific exceptions first over general exceptions, to catch any specific exception:
1try: 
2    response = requests.get(url,timeout=3) 
3    response.raise_for_status()                 # Raise error in case of failure 
4except requests.exceptions.HTTPError as httpErr: 
5    print ("Http Error:",httpErr) 
6except requests.exceptions.ConnectionError as connErr: 
7    print ("Error Connecting:",connErr) 
8except requests.exceptions.Timeout as timeOutErr: 
9    print ("Timeout Error:",timeOutErr) 
10except requests.exceptions.RequestException as reqErr: 
11    print ("Something Else:",reqErr) 
python