Concurrent Downloads - Bash (xargs, parallel) Vs Python (ThreadPoolExecutor)

I just found one more free Telugu book, "Graded readings in modern literary Telugu" by Golla Narayanaswami Reddy and Dan M. Matson, in the Digital South Asia Library.
Unfortunately they didn't provide it as an ebook but as a set of 221 tif images.
I wrote a simple for loop in shell which downloaded all images one by one using wget.
$ base_url="http://dsal.uchicago.edu"
$ url="$base_url/digbooks/images/PL4775.R4_1967/PL4775.R4_1967_%03g.gif"
$ time -p sh -c "for i in \$(seq -f $url 1 221); do wget \$i; done"
It took 375 seconds. That was too slow, so I tried downloading the images in parallel using xargs.
$ time echo $(seq -f $url 1 221) | xargs -n 1 -P 36 wget
My laptop has a quad-core processor, so I tried running 20, 24, 28 and 32 processes at a time by changing the -P value.
With wget+xargs, the best timing was 13 seconds (CPU: 15%, Processes: 28).
Next, I tried downloading them in parallel again, this time with GNU parallel.
$ time seq -f $url 1 221 | parallel -j36 wget {}
With wget+parallel, the best timing was 12 seconds (CPU: 48%, Processes: 24).
Here is the CPU consumption and time taken at each step.
[Chart: paralle_python_bash2]
Once I was done with bash, I decided to try the same thing with Python and see how it goes.
I wrote a simple script using requests to download images.
import sys
from concurrent import futures

import requests


def download_image(url):
    r = requests.get(url)
    file_name = url.split('/')[-1]
    with open(file_name, 'wb') as fh:
        fh.write(r.content)


base_url = 'http://dsal.uchicago.edu'
book_url = base_url + '/digbooks/images/PL4775.R4_1967/PL4775.R4_1967_{}.gif'
urls = [book_url.format(str(i).zfill(3)) for i in range(1, 222)]  # pages 1 to 221 inclusive

def download_serially():
    for url in urls:
        download_image(url)

download_serially()
This took 244 seconds.
To download the images in parallel, I used ThreadPoolExecutor from the concurrent.futures module.
def download_parallely():
    workers = int(sys.argv[1])

    with futures.ThreadPoolExecutor(max_workers=workers) as executor:
        result = executor.map(download_image, urls)

download_parallely()
I reused the previous script and just added one more function, which queues the downloads on a thread pool. Then I ran the script with several worker counts.
$ time python down.py 28
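One thing to watch with executor.map: an exception raised inside a worker only surfaces when the results are iterated, and download_parallely never reads result, so a failed download would pass unnoticed. Below is a minimal sketch of one way to collect failures with submit and as_completed. The names run_all and flaky are hypothetical; flaky stands in for download_image so that the example runs without a network.

```python
from concurrent import futures


def run_all(task, items, workers):
    """Run task(item) for every item in a thread pool.

    Returns (succeeded, failed): a list of the items that completed and
    a dict mapping each failed item to the exception it raised.
    """
    succeeded, failed = [], {}
    with futures.ThreadPoolExecutor(max_workers=workers) as executor:
        # Remember which item each future belongs to.
        future_to_item = {executor.submit(task, item): item for item in items}
        for future in futures.as_completed(future_to_item):
            item = future_to_item[future]
            try:
                future.result()  # re-raises any exception from the worker
            except Exception as exc:
                failed[item] = exc
            else:
                succeeded.append(item)
    return succeeded, failed


def flaky(n):
    # Hypothetical stand-in for download_image: fails on multiples of 5.
    if n % 5 == 0:
        raise IOError('could not fetch item {}'.format(n))


ok, bad = run_all(flaky, range(1, 11), workers=4)
print(len(ok), 'succeeded;', sorted(bad), 'failed')
```

In the real script, the failed items could be passed through run_all a second time to retry them.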
When max_workers is not specified, ThreadPoolExecutor defaults to 5 times the number of processors (as of Python 3.5; since Python 3.8 the default is min(32, os.cpu_count() + 4)). I tried the same worker counts that I had used for bash.
With requests+ThreadPoolExecutor, the best timing was 12 seconds (CPU: 36%, Threads: 28).
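Rather than rerunning the script by hand for each worker count, the comparison can be done in one go. This is only a sketch of the idea: time_pool and fake_download are hypothetical names, and fake_download just sleeps for 20 ms in place of a real request, so the numbers it prints say nothing about the actual downloads.

```python
import time
from concurrent import futures


def time_pool(task, items, workers):
    """Return the seconds taken to run task over items with `workers` threads."""
    start = time.perf_counter()
    with futures.ThreadPoolExecutor(max_workers=workers) as executor:
        # list() drains the iterator so worker exceptions are raised here.
        list(executor.map(task, items))
    return time.perf_counter() - start


def fake_download(_):
    # Hypothetical stand-in for download_image: pretend each request
    # spends 20 ms waiting on the network.
    time.sleep(0.02)


items = range(20)
timings = {w: time_pool(fake_download, items, w) for w in (1, 5, 20)}
for workers, seconds in timings.items():
    print('{:2d} workers: {:.2f} s'.format(workers, seconds))
```

Swapping fake_download for download_image and items for urls would give the real measurements, although CPU usage would still have to be observed separately.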
Here is the overall comparison.
[Chart: paralle_python_bash]
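For a simple concurrent download, xargs+wget seems to be the best option: it was within a second of the other two (13 vs. 12 seconds) while using the least CPU (15% vs. 48% and 36%).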
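To put the timings in perspective, here is a small sketch that works out the speedup of each parallel run over its serial counterpart. The numbers are copied from the measurements above; the speedup helper is a hypothetical name. Each figure comes from a single best run, so treat the ratios as rough.

```python
# Timings in seconds, copied from the measurements reported in this post.
serial = {'wget loop': 375, 'requests loop': 244}
parallel_runs = {
    'wget+xargs': ('wget loop', 13),
    'wget+parallel': ('wget loop', 12),
    'requests+ThreadPoolExecutor': ('requests loop', 12),
}


def speedup(serial_seconds, parallel_seconds):
    """How many times faster the parallel run is than the serial one."""
    return serial_seconds / parallel_seconds


for name, (baseline, seconds) in parallel_runs.items():
    factor = speedup(serial[baseline], seconds)
    print('{}: {:.1f}x faster than the {}'.format(name, factor, baseline))
```

That works out to roughly 29x and 31x for the two wget variants and about 20x for Python. The Python ratio is smaller only because its serial baseline was already faster (244 vs. 375 seconds).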