春江暮客

春江暮客's personal website for sharing what I learn

python3 requests module usage examples

2018-12-28 Technology

Making HTTP requests in Python 3 is far more pleasant than it once was, and requests is still one of the most practical entry points for everyday HTTP scripting. It covers nearly everything a beginner needs first: sending GET requests, posting data, downloading files, setting headers, and reusing cookies.

This article walks through those common cases with small examples and adds a few habits that are useful in real scripts.

requests is not part of the Python standard library, so install it first with pip install requests. The examples below cover its most common uses.


GET: fetch a webpage

import requests
r = requests.get("https://www.bobobk.com/vip_parse", timeout=10)
print("Status code:\n%d" % r.status_code)  # HTTP status code, e.g. 200
print(r.headers['content-type'])  # Content-Type header: media type and, usually, the charset
print("Webpage content:\n%s" % r.content.decode("utf8")[:100])  # content is raw bytes, decoded here as UTF-8
print("Webpage content:\n%s" % r.text[:100])  # text is already a str, decoded using r.encoding

A more practical version usually also adds:

r.raise_for_status()

That way, any 4xx or 5xx status, such as 404 or 500, raises requests.exceptions.HTTPError immediately instead of letting the script carry on silently with a bad response.
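Combining raise_for_status() with a timeout and exception handling, a small helper might look like the sketch below. The names fetch_text and safe_fetch and the 10-second default timeout are illustrative choices, not part of requests. Catching requests.exceptions.RequestException covers HTTP errors as well as connection failures and timeouts.

```python
import requests


def fetch_text(url, params=None, timeout=10):
    """Fetch url and return the decoded page text, failing loudly on HTTP errors."""
    r = requests.get(url, params=params, timeout=timeout)
    r.raise_for_status()  # raises requests.exceptions.HTTPError on any 4xx/5xx status
    return r.text


def safe_fetch(url, timeout=10):
    """Like fetch_text, but print the problem and return None instead of raising."""
    try:
        return fetch_text(url, timeout=timeout)
    except requests.exceptions.RequestException as exc:
        # HTTPError, ConnectionError and Timeout all subclass RequestException
        print(f"Request to {url} failed: {exc}")
        return None
```

Typical use is html = safe_fetch("https://www.bobobk.com/vip_parse") followed by a check that html is not None before working with it.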

What if the URL has query parameters? Rather than building the query string by hand, pass a dictionary through the params argument, for example params={"url":"https://www.bobobk.com","id":"1"}, and requests URL-encodes it and appends it to the URL for you.
Writing the full URL yourself still works if you prefer.
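To see what params actually produces without sending anything, you can build the request with requests.Request(...).prepare() and print its url. This is a minimal sketch; the second request shows that a list value becomes a repeated key.

```python
import requests

# Same parameters as in the text; requests URL-encodes each value
params = {"url": "https://www.bobobk.com", "id": "1"}

# Build (but do not send) the request to inspect the final URL
prepared = requests.Request("GET", "https://www.bobobk.com/vip_parse", params=params).prepare()
print(prepared.url)
# → https://www.bobobk.com/vip_parse?url=https%3A%2F%2Fwww.bobobk.com&id=1

# A list value is sent as the same key repeated
multi = requests.Request("GET", "https://httpbin.org/get", params={"tag": ["a", "b"]}).prepare()
print(multi.url)  # → https://httpbin.org/get?tag=a&tag=b

# Sending it for real is simply:
# r = requests.get("https://www.bobobk.com/vip_parse", params=params, timeout=10)
```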

POST data

Posting data is just as simple: pass a dictionary of the fields to send through the data parameter, and requests encodes it as an HTML form body.

import requests
r = requests.post('https://httpbin.org/post', data={'key': 'value'}, timeout=10)
print(r.json()['form'])  # httpbin echoes the form fields back: {'key': 'value'}
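The data parameter sends a form-encoded body. If an API expects JSON instead, requests also accepts a json parameter. The sketch below only prepares the two requests, so nothing goes over the network, and lets you compare the Content-Type header and body each one produces.

```python
import json

import requests

url = "https://httpbin.org/post"

# data=dict: form-encoded body, like an HTML <form> submission
form_req = requests.Request("POST", url, data={"key": "value"}).prepare()
print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)                      # key=value

# json=dict: the dict is serialized to a JSON body instead
json_req = requests.Request("POST", url, json={"key": "value"}).prepare()
print(json_req.headers["Content-Type"])  # application/json
print(json.loads(json_req.body))          # {'key': 'value'}
```

Sent to httpbin.org, form fields are echoed under "form" in the response JSON and a JSON body under "json".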

Add headers

Some websites restrict which user agents (UA) they serve. By default requests identifies itself with a User-Agent such as python-requests/2.x, which is easy to spot and block, so to send a different UA, pass your own headers:

import requests
headers = {'user-agent': 'Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10'}
url = "https://httpbin.org/get"
# timeout=5: raise requests.exceptions.Timeout if connecting, or waiting for data, takes longer than 5 seconds
r = requests.get(url, headers=headers, timeout=5)
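To check which User-Agent is sent by default and confirm that your override takes effect, you can prepare requests through a Session without sending them. This is a sketch; my-script/1.0 is a hypothetical UA string.

```python
import requests

url = "https://httpbin.org/get"

# What requests sends when you do not override it, e.g. "python-requests/2.x.y"
print(requests.utils.default_user_agent())

with requests.Session() as session:
    # prepare_request merges the session's default headers with per-request ones;
    # requests.get() goes through the same step internally.
    default_req = session.prepare_request(requests.Request("GET", url))
    custom_req = session.prepare_request(
        requests.Request("GET", url, headers={"user-agent": "my-script/1.0"})  # hypothetical UA
    )

print(default_req.headers["User-Agent"])  # same as default_user_agent()
print(custom_req.headers["User-Agent"])   # my-script/1.0 (header names are case-insensitive)

# When sending for real, timeout can also be a (connect, read) tuple in seconds:
# r = requests.get(url, headers={"user-agent": "my-script/1.0"}, timeout=(3.05, 27))
```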

Basic authentication

If the website uses basic authentication, just add the auth parameter.

import requests
from requests.auth import HTTPBasicAuth
r = requests.get(url, headers=headers, timeout=5, auth=HTTPBasicAuth('username', 'password'))
# Because Basic auth is so common, requests also accepts a plain (username, password) tuple:
r = requests.get(url, headers=headers, timeout=5, auth=('username', 'password'))
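Both auth forms produce the same Authorization header. The sketch below prepares the requests without sending them, so you can see that Basic authentication is only base64 encoding, not encryption, which is why it should be used over HTTPS.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

url = "https://httpbin.org/get"

# Prepare (but do not send) the same request with each form of auth
with_tuple = requests.Request("GET", url, auth=("username", "password")).prepare()
with_class = requests.Request("GET", url, auth=HTTPBasicAuth("username", "password")).prepare()

header = with_tuple.headers["Authorization"]
print(header)                                        # Basic dXNlcm5hbWU6cGFzc3dvcmQ=
print(header == with_class.headers["Authorization"])  # True

# Anyone who sees the header can reverse it, so only send Basic auth over HTTPS
decoded = base64.b64decode(header.split(" ", 1)[1])
print(decoded)  # b'username:password'
```

For a live check, httpbin.org exposes /basic-auth/<user>/<passwd>, which returns 200 only when the supplied credentials match the path.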

GET: download a file

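import requests
url = 'https://www.bobobk.com/wp-content/uploads/2018/12/wizard.webp'
# stream=True defers reading the body until it is iterated, so it is never held in memory all at once
with requests.get(url, stream=True, timeout=30) as r:
    r.raise_for_status()
    with open('download.webp', 'wb') as f:
        for chunk in r.iter_content(chunk_size=512 * 1024):  # 512 KB per chunk
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)

Because stream=True is set, the file is written to disk in 512 KB chunks instead of being loaded into memory first, which is what makes this approach suitable for large files.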
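If you download files often, the same loop can be wrapped in a reusable function. This is a sketch under a few assumptions: download and write_chunks are hypothetical names, and writing to a temporary .part file before renaming it is an extra precaution that is not part of the snippet above.

```python
import os

import requests


def write_chunks(chunks, fileobj):
    """Write non-empty byte chunks to an open binary file and return the byte count."""
    total = 0
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            fileobj.write(chunk)
            total += len(chunk)
    return total


def download(url, path, chunk_size=512 * 1024, timeout=30):
    """Stream url into path without holding the whole body in memory.

    The body goes to path + ".part" first and is renamed only after every
    chunk was written, so a failed download never leaves a truncated file
    under the final name (a leftover .part file may remain).
    """
    tmp_path = path + ".part"
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()  # fail before creating the .part file
        with open(tmp_path, "wb") as f:
            total = write_chunks(r.iter_content(chunk_size=chunk_size), f)
    os.replace(tmp_path, path)  # replaces any existing file at path
    return total
```

For the image above, that would be download('https://www.bobobk.com/wp-content/uploads/2018/12/wizard.webp', 'download.webp'), which returns the number of bytes written.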

POST: upload a file

You can also post files directly by adding the files parameter, like this:

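import requests
url = 'https://httpbin.org/post'
with open('myfile.xls', 'rb') as fh:  # the with block closes the file after the upload
    # (filename, file object, content type, extra headers for this part)
    files = {'file': ('myfile.xls', fh, 'application/vnd.ms-excel', {'Expires': '0'})}
    r = requests.post(url, files=files)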
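To check what the files parameter builds without uploading anything, you can prepare the request with an in-memory file. The name report.csv, its contents, and the extra note form field are hypothetical; passing data= together with files= sends ordinary form fields alongside the file in one multipart body.

```python
import io

import requests

url = "https://httpbin.org/post"

# Hypothetical in-memory file, so the example does not depend on a file on disk
fake_file = io.BytesIO(b"name,score\nalice,90\n")
files = {"file": ("report.csv", fake_file, "text/csv")}

# Prepare (but do not send) the request to inspect the multipart body
prepared = requests.Request("POST", url, files=files, data={"note": "weekly"}).prepare()
print(prepared.headers["Content-Type"])           # multipart/form-data; boundary=...
print(b'filename="report.csv"' in prepared.body)  # True
print(b'name="note"' in prepared.body)            # True
```

Sent to httpbin.org for real, the response JSON echoes the file contents under "files" and the extra field under "form".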

Send cookies

To send cookies, pass a dictionary through the cookies parameter:

import requests
url = 'https://httpbin.org/cookies'
r = requests.get(url, cookies={"username": "bobobk"})
# Cookies the server sets in its response (via Set-Cookie) are collected in r.cookies, a RequestsCookieJar:
print(r.cookies.get_dict())
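The sketch below prepares requests without sending them to show the Cookie header that results. The scoped-jar part follows the RequestsCookieJar example in the requests quickstart: a cookie set for the path /cookies is only sent to that path.

```python
import requests

url = "https://httpbin.org/cookies"

# A plain dict becomes a Cookie header on this one request
simple = requests.Request("GET", url, cookies={"username": "bobobk"}).prepare()
print(simple.headers["Cookie"])  # username=bobobk

# A RequestsCookieJar lets you scope each cookie to a domain and path
jar = requests.cookies.RequestsCookieJar()
jar.set("tasty_cookie", "yum", domain="httpbin.org", path="/cookies")
jar.set("gross_cookie", "blech", domain="httpbin.org", path="/elsewhere")
scoped = requests.Request("GET", url, cookies=jar).prepare()
print(scoped.headers["Cookie"])  # tasty_cookie=yum (the /elsewhere cookie is not sent)
```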

Use a Session for repeated requests

If you are making multiple requests to the same site, especially with shared headers, login state, or cookies, requests.Session() is often cleaner:

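import requests
with requests.Session() as session:  # the with block closes pooled connections at the end
    session.headers.update({"user-agent": "Mozilla/5.0"})  # sent with every request in this session
    response = session.get("https://httpbin.org/get")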

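The advantage is that the underlying TCP connection to the host is reused (connection pooling), and cookies set by earlier responses are sent automatically with later requests in the same session.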
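To see what a Session merges into each request, you can prepare one without sending it. In this sketch the username cookie stands in for a hypothetical login cookie. Session-level headers and cookies are combined with per-request headers, and a per-request value wins on conflict.

```python
import requests

with requests.Session() as session:
    # Defaults applied to every request made through this session
    session.headers.update({"user-agent": "Mozilla/5.0"})
    session.cookies.set("username", "bobobk")  # hypothetical login cookie

    # Per-request headers are merged on top of the session defaults
    prepared = session.prepare_request(
        requests.Request("GET", "https://httpbin.org/headers",
                         headers={"accept": "application/json"})
    )

print(prepared.headers["User-Agent"])  # Mozilla/5.0
print(prepared.headers["Accept"])      # application/json (overrides the session default */*)
print(prepared.headers["Cookie"])      # username=bobobk
```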

