In the era of short videos and live streaming, video formats are often no longer traditional ones like mp4 or mkv, but streaming formats such as m3u8. If you want to download your favorite videos, special handling is needed. This article provides a detailed example of how to download m3u8 streaming data using Python and embed it into an AMP webpage for playback. Let’s dive into the full process.

1. What is m3u8?

The M3U8 format is based on HTTP Live Streaming (HLS) technology, which splits the entire video into a series of small files, then creates an index file (.m3u8). Through this index file, streaming media can be loaded. This means that when playing a video, the client only needs to load the index file and then download the small video files one by one according to the index file for playback, enabling streaming playback while downloading.

2. Downloading all video files from an m3u8 file

First, we need to obtain the m3u8 file and parse the URLs of the video segments (.TS files) inside it. To handle network requests, we use Python’s requests library. Here is a snippet of code:

import m3u8
import requests
import os

def check_path(path):
    if not os.path.exists(path):
        os.makedirs(path)

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0"
}

domain = 'www.bobobk.com'

def download_m3u8(m3u8url):
    print(m3u8url)
    playlist_filename = os.path.join(*m3u8url.split('/')[3:])
    if os.path.exists(playlist_filename):
        pass
    check_path(os.path.dirname(playlist_filename))

    host = m3u8url.split("://")[1].split('/')[0]
    baseurl, m3u8url = m3u8url.split("://")
    if not os.path.exists(playlist_filename):
        text = requests.get(m3u8url, headers=headers).text
        open(playlist_filename, 'w').write(text.replace(host, domain))

    m3u8_obj = m3u8.load(playlist_filename)
    print(f"playlist:{playlist_filename}")
    baseurl += "://" + m3u8url.split("/")[0]
    print("start check playlist")
    if len(m3u8_obj.playlists) != 0:
        m3u8url = m3u8_obj.playlists[0].uri
        if '://' not in m3u8url:
            if m3u8url[0] == '/':
                m3u8url = baseurl + m3u8_obj.playlists[0].uri
            else:
                m3u8url = baseurl + '/' + os.path.dirname(playlist_filename)+ '/' + m3u8_obj.playlists[0].uri
        print(m3u8url)
        text = requests.get(m3u8url, headers=headers).text
        playlist_filename = os.path.join(*m3u8url.split('/')[3:])
        check_path(os.path.dirname(playlist_filename))
        open(playlist_filename, 'w').write(text.replace(host, domain))
        m3u8_obj = m3u8.load(playlist_filename)
        baseurl, m3u8url = m3u8url.split("://")
        host = m3u8url.split('/')[0]
        print(f"playlist:{playlist_filename}")

    output = os.path.join(*m3u8_obj.segments[0].absolute_uri.split('/')[3:])
    check_path(os.path.dirname(output))
    print("check subdir")
    for segment in m3u8_obj.segments:
        segment_url = segment.absolute_uri
        segment_file_name = os.path.join(*segment_url.split('/')[3:])
        r = requests.get(segment_url, headers=headers, stream=True)
        print(f"{segment_url}t{segment_file_name}")
        if os.path.exists(segment_file_name):
            continue

        with open(segment_file_name, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:
                    f.write(chunk)


m3u8url = "https://pptv.sd-play.com/202308/01/PfdfHWiJa43/video/index.m3u8"
download_m3u8(m3u8url)

3. Embedding m3u8 video into an AMP webpage

AMP (Accelerated Mobile Pages) is an open-source framework optimized for mobile devices. Using this framework, you can create fast and visually appealing web applications. It standardizes but also flexibly defines HTML usage, making responsive web development easier.

Directly embedding m3u8 videos in AMP has a limitation: AMP currently does not support direct playback of m3u8 files. Therefore, we need to convert the video into MP4 format before embedding it into the AMP webpage.

We can use the ffmpeg tool to merge .ts files and convert them into .mp4 files. Here is the code:

for tsfile in "202308/01/PfdfHWiJa43/video/900k_0X480_64k_25/hls/"*.ts;do
echo "file '${tsfile}'" >> tsfile.txt
done

ffmpeg -f concat -safe 0 -i tsfile.txt -c copy 202308/01/PfdfHWiJa43/video/900k_0X480_64k_25/hls/202308.mp4

Then, we can use the amp-video component in AMP webpages to play the mp4 video:

https://www.bobobk.com/m3u8.html

<!doctype html> 
<html amp> 
<head> 
<meta charset="utf-8"> 
<script async src="https://cdn.ampproject.org/v0.js"></script> 
<title>Downloading m3u8 Streaming Media Using Python and Embedding into AMP Webpages (amp video page)</title> 
<link rel="canonical" href="self.html"> 
<meta name="viewport" content="width=device-width,minimum-scale=1,initial-scale=1"> 
<style amp-boilerplate>body{-webkit-animation:-amp-start 8s steps(1,end) 0s 1 normal both;-moz-animation:-amp-start 8s steps(1,end) 0s 1 normal both;-ms-animation:-amp-start 8s steps(1,end) 0s 1 normal both;animation:-amp-start 8s steps(1,end) 0s 1 normal both}@-webkit-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@-moz-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@-ms-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@-o-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}</style>
<noscript><style amp-boilerplate>body{-webkit-animation:none;-moz-animation:none;-ms-animation:none;animation:none}</style></noscript> 
<script async custom-element="amp-video" src="https://cdn.ampproject.org/v0/amp-video-0.1.js"></script> 
</head> 
<body> 
<amp-video controls width="640" height="360" layout="responsive" > <source src="/202308/01/PfdfHWiJa43/video/900k_0X480_64k_25/hls/202308.mp4" type="video/mp4"> </amp-video> 
</body> 
</html>

4. Conclusion

This article provided a detailed walkthrough of downloading m3u8 streaming media using Python and embedding it into an AMP webpage. I hope it helps you. In practical use, please ensure you fully comply with the streaming service’s terms and respect relevant copyright laws.