The HTTPX library offers the core features of the Requests library along with support for both HTTP/1.1 and HTTP/2. You can visit HTTPX's website for a full list of its features. In this guide, you will explore the following HTTPX topics with practical examples:
All of the code in this guide assumes that the HTTPX library has been installed and imported into your current working environment with import httpx.
Suppose you want to fetch the stock stats of a company from the Yahoo! Finance website. The associated URL will be https://finance.yahoo.com/quote/ABXY/key-statistics?p=ABXY, where "ABXY" is the company's ticker symbol. Notice that the URL's query string consists of a key-value pair whose key is "p" and whose value is "ABXY". So to fetch stock stats for another company, all you need to do is replace "ABXY" with that company's ticker symbol.
To see this concept in practice, fetch the stock stats of Pluralsight (NASDAQ: PS). Start by forming a dictionary as shown:
eq = {'p': 'PS'}
Finally, call the httpx.get method, building the URL from the eq variable and passing eq to the params argument:
r = httpx.get('https://finance.yahoo.com/quote/'+eq['p']+'/key-statistics', params=eq)
You can view the resulting URL using the url attribute as shown:
print(r.url)
# https://finance.yahoo.com/quote/PS/key-statistics?p=PS
HTTP headers let the client and server pass additional information with an HTTP request, and you can set custom headers using the HTTPX library. For instance, consider YouTube, whose response headers can be extracted as shown:
url = 'https://youtube.com'
r = httpx.get(url)
print(r.headers)
The result of the above code is the following YouTube headers:
Headers([('x-frame-options', 'SAMEORIGIN'), ('strict-transport-security', 'max-age=31536000'), ('expires', 'Tue, 27 Apr 1971 19:44:06 GMT'), ('x-content-type-options', 'nosniff'), ('content-encoding', 'gzip'), ('p3p', 'CP="This is not a P3P policy! See http://support.google.com/accounts/answer/151657?hl=en-GB for more info."'), ('content-type', 'text/html; charset=utf-8'), ('cache-control', 'no-cache'), ('date', 'Sat, 27 Jun 2020 07:17:08 GMT'), ('server', 'YouTube Frontend Proxy'), ('x-xss-protection', '0'), ('set-cookie', 'YSC=QxCnrEi_49U; path=/; domain=.youtube.com; secure; httponly; samesite=None'), ('set-cookie', 'VISITOR_INFO1_LIVE=L4qLJ2-3U18; path=/; domain=.youtube.com; secure; expires=Thu, 24-Dec-2020 07:17:08 GMT; httponly; samesite=None'), ('set-cookie', 'GPS=1; path=/; domain=.youtube.com; expires=Sat, 27-Jun-2020 07:47:08 GMT'), ('alt-svc', 'h3-27=":443"; ma=2592000,h3-25=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"'), ('transfer-encoding', 'chunked')])
Notice that the value of the content-encoding header is gzip. What if you want an alternative such as br?
To request a different encoding, set the Accept-Encoding request header: store the new value in a dictionary and pass the dictionary to the headers argument of the get method as shown:
headers = {'accept-encoding': 'br'}
print(httpx.get(url, headers=headers).headers)  # The updated content encoding is br
This updated header is shown below:
Headers([
...
('content-encoding', 'br'),
...
])
You can use the HTTPX library to fetch text, image, and JSON data. In this section, you will learn to fetch text and image data, both small and large. For larger data you can use streaming, which is explained later in this section.
Suppose you want to fetch HTML data from a webpage that consists of no more than 5-10 lines of text, as shown in the image:
To do so, pass the URL to the httpx.get method and then use the text attribute as shown:
# https://archive.ics.uci.edu/ml/datasets/Container+Crane+Controller+Data+Set
fetch = httpx.get('https://archive.ics.uci.edu/ml/machine-learning-databases/00436/')
print(fetch.text)
Output:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
 <head>
  <title>Index of /ml/machine-learning-databases/00436</title>
 </head>
 <body>
<h1>Index of /ml/machine-learning-databases/00436</h1>
<ul><li><a href="/ml/machine-learning-databases/"> Parent Directory</a></li>
<li><a href="Container_Crane_Controller_Data_Set.csv"> Container_Crane_Controller_Data_Set.csv</a></li>
</ul>
<address>Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips SVN/1.7.14 Phusion_Passenger/4.0.53 mod_perl/2.0.11 Perl/v5.16.3 Server at archive.ics.uci.edu Port 443</address>
</body></html>
To fetch large text data, you can use streaming, i.e., fetch a limited amount of data at a time until you reach the end of the page. To understand this topic, suppose you want to extract all the text data from the Wikipedia page about Ghostbusters. To do so, fetch the data line by line in a for loop nested inside a with block opened by the httpx.stream method as shown:
with httpx.stream("GET", "https://en.wikipedia.org/wiki/Ghostbusters") as r:
    count = 0
    for line in r.iter_lines():
        # print(line)
        count += 1
print(count)
# 1166
In the above code, the print(line) statement is commented out. You can uncomment it to observe the output as it streams. The last line, print(count), reports the total number of lines (1166) fetched during the stream.
The process of fetching image data is similar to that of text data; the only difference is how you read it once it arrives in your Python environment. Suppose you need to fetch a penguin image from the Pixabay website. To do so, pass the image URL to the httpx.get method and then use the PIL and io modules to read and display the data as shown:
from PIL import Image
from io import BytesIO

# https://pixabay.com/photos/penguin-figure-christmas-santa-hat-1843544/
g = httpx.get('https://cdn.pixabay.com/photo/2016/11/20/19/02/penguin-1843544_1280.jpg')
Image.open(BytesIO(g.content)).show()
Output image
You can use the httpx.stream method to fetch large image data as well. The only difference from streaming text data is that this time the for loop iterates over iter_bytes as shown:
with httpx.stream("GET", "https://cdn.pixabay.com/photo/2016/11/20/19/02/penguin-1843544_1280.jpg") as r:
    for data in r.iter_bytes():
        print(data)
HTTPX can be used to post data inside a form available on a webpage. In this section, you will learn how to upload text and file data to a webpage.
Suppose you want to write "Hello, World!" on a webpage using HTTPX. The input area would look like the one shown here in green:
To do so, use the post method and provide the information in the data argument as shown:
# Locate the correct field name; here it is "htmlString"
data = {'htmlString': 'Hello, World!'}
r = httpx.post("<URL>", data=data)
print(str(r.content).replace('\\n', '').replace('\\t', '').replace('\\r', ''))
The above code returns HTML output. Since the output is too long to show in full, below is the rendered webpage with the "Hello, World!" string in the desired box:
There are many websites where you have to upload a file to their server. Here's one instance, highlighted in the green box:
To upload a complete file rather than just a string, use the files argument of the post method. Assume you have an HTML file, upload.html, stored on your drive. You can upload it to this webpage using the given code:
# Locate the correct field name; here it is "htmlFile"
file = {'htmlFile': open('upload.html', 'rb')}
r = httpx.post("<URL>", files=file)
print(str(r.content).replace('\\n', '').replace('\\t', '').replace('\\r', ''))
You can verify that the data has been successfully uploaded by checking the console result.
Have you ever noticed that when you type "https://msn.com" in a browser, the URL may not stay exactly as you typed it? Chances are you will be redirected based on the geographical region from which you are accessing the website. For instance, run the following code and check your output:
r = httpx.get('https://msn.com')
print(r.url)
For me it is https://www.msn.com/en-in/, but for you the last two characters may differ. Either way, the original URL has changed and you have been redirected to a new one. The good news is that you can control whether redirection takes place. If you don't want to be redirected from your original URL, pass allow_redirects=False to the get method as shown (note that in HTTPX 0.20 and later this parameter is named follow_redirects, and redirects are no longer followed by default):
r = httpx.get('https://msn.com', allow_redirects=False)
print(r.url)
# https://msn.com
You can also control how long a request may take before the program throws a timeout error. Specify a custom timeout value using the timeout parameter as shown:
r = httpx.get('https://msn.com', allow_redirects=False, timeout=1)
print(r.url)
# https://msn.com
In the above code, if the URL is not reached within one second, the program will raise a timeout error such as httpx.ConnectTimeout (a subclass of httpx.TimeoutException).
You have now learned the basics of the HTTPX library by exploring its get, post, and stream methods.