Reverse Diary 0x1 - ming5ming

Recently I've been watching "The Daily Life of a Female High School Student", but Bilibili compresses the video heavily, so I looked for a video-parsing website. Its JavaScript turned out to be obfuscated, so I decided to reverse engineer it.

Reverse Engineering Target#

aHR0cHM6Ly9qeC5wbGF5ZXJqeS5jb20v

Analysis#

Pasted image 20230214195931

The video has clearly loaded, but at first glance none of the requests looks related to it.

Pasted image 20230214200144

The response data for these requests cannot be previewed, but they are presumably video segments. Opening the file shows the following:

Pasted image 20230214200551

So, the approximate request process is:

request

An M3U8 file is a plain-text playlist that lists the URLs of the video segments. With it, we can obtain the URL of every segment, and therefore the entire video.
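To make the playlist idea concrete, here is a minimal sketch of pulling segment URLs out of M3U8 text. The sample playlist, the `example.invalid` host, and the function name `segment_urls` are made up for illustration; they are not captured from the target site.

```python
from urllib.parse import urljoin


def segment_urls(playlist_text, playlist_url):
    """Return absolute URLs for every segment listed in an M3U8 playlist."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments; blank lines carry nothing.
        if not line or line.startswith("#"):
            continue
        # Segment URIs may be relative to the playlist's own URL.
        urls.append(urljoin(playlist_url, line))
    return urls


# Hypothetical playlist text, used only to exercise the function.
SAMPLE = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
seg-0.ts
#EXTINF:8.5,
seg-1.ts
#EXT-X-ENDLIST
"""

if __name__ == "__main__":
    for url in segment_urls(SAMPLE, "https://example.invalid/video/index.m3u8"):
        print(url)
```

Resolving each entry against the playlist URL handles both relative and absolute segment paths, which is why `urljoin` is used instead of plain string concatenation.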

Tracing back, I found the request that returns the M3U8 file:

Pasted image 20230214201749
Pasted image 20230214201851

https://(omitted)/1676376933/4d140af4cb5d1c5466a7491918b43e5b/b3489f29121e97e6dd1dcb1957e5788c-20230214.m3u8?from=https://banyung.pw

The composition of the request URL is:
https://domain/timestamp/unknown/unknown.m3u8 + fixed parameters
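As a quick check of that breakdown, the sketch below splits the URL into its parts with the standard library. The host was omitted in the capture, so `example.invalid` stands in for it; the function name `describe_m3u8_url` and the field names are my own labels, and what the two 32-character values actually mean is still unknown.

```python
import re
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

HEX32 = re.compile(r"[0-9a-f]{32}")


def describe_m3u8_url(url):
    """Split a domain/timestamp/token/name.m3u8 URL into labeled parts."""
    parts = urlsplit(url)
    timestamp, token, filename = parts.path.strip("/").split("/")
    name = filename.rsplit(".", 1)[0]  # drop the ".m3u8" extension
    return {
        "issued_at": datetime.fromtimestamp(int(timestamp), tz=timezone.utc),
        "token": token,
        "token_is_hex32": HEX32.fullmatch(token) is not None,
        "name": name,
        "query": parse_qs(parts.query),
    }


# Placeholder host: the real domain was omitted from the captured URL.
URL = (
    "https://example.invalid/1676376933/"
    "4d140af4cb5d1c5466a7491918b43e5b/"
    "b3489f29121e97e6dd1dcb1957e5788c-20230214.m3u8"
    "?from=https://banyung.pw"
)

if __name__ == "__main__":
    for field, value in describe_m3u8_url(URL).items():
        print(field, "=", value)
```

The first path component decodes to a time on 2023-02-14 (UTC), which matches the date suffix in the file name and supports reading it as a timestamp.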

At this point it is tempting to set an XHR breakpoint and trace the JavaScript that builds this URL. I fell into that trap myself at the start of the analysis and spent a long time going back and forth through complex code and various encryption routines.

Pasted image 20230214202531

However, we don't need to construct this request ourselves. Its URL is returned in the response to the previous request, and even in plain text!

Pasted image 20230214202715

Let's take a look at the payload of this request:

Pasted image 20230214202821

There are two parameters, time and key. time is a timestamp, and key looks like an MD5 hash. After reverse engineering the JavaScript, it turned out to be produced by CBC-mode encryption instead. What left me speechless is that, just as I finished analyzing the JavaScript and started replaying the request in Python, I found that key and time are also delivered in plain text, in the response to the request before this one!!!

Pasted image 20230214203247

Writing Code#

# getPram.py
import requests as rq
from lxml import html

headers = {

    "referer": "https://jx.playerjy.com/",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36",
    "Cache-Control": "no-cache",
    "Host": "jy.we-vip.com:5433"
}

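def getHtml(ep):
    # Fetch the parser page for the given Bilibili episode URL.
    url = "https://jy.we-vip.com:5433/?url=" + ep
    # verify=False skips TLS certificate verification for this host.
    return rq.get(url=url, headers=headers, verify=False).content

# get time and key
def getData(bili_url):
    tree = html.fromstring(getHtml(bili_url))
    # The values live in the first inline <script> inside <body>.
    script = str(tree.xpath('//body/script/text()')[0])
    # get time: the offset skips the name and the punctuation up to the opening quote
    timebegin = script.index('time') + 8
    timeend = script.index('",', timebegin)
    time = script[timebegin:timeend]
    # get key: same approach, with a shorter name and therefore a smaller offset
    keybegin = script.index('key') + 7
    keyend = script.index('",', keybegin)
    key = script[keybegin:keyend]
    return time, key

if __name__ == "__main__":
    # getData needs the episode URL, so ask for it here.
    time, key = getData(input("bilibili_url:"))
    print(f"time is {time}, key is {key}")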

import requests as rq
import getPram

# Use the same episode URL for the parameter lookup and for the API call.
bili_url = input("bilibili_url:")
time, key = getPram.getData(bili_url)

# Content-Length is not set by hand: requests computes it from the form body.
headers = {
    "authority" : "jy.we-vip.com:5433",
    "accept" : "application/json, text/javascript, */*; q=0.01",
    "content-type" : "application/x-www-form-urlencoded; charset=UTF-8",
    "user-agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36",
    "Host" : "jy.we-vip.com:5433",
    "Cache-Control": "no-cache"
}

url = "https://jy.we-vip.com:5433/API.php"
data = {
    "url": bili_url,
    "time": time,
    "key": key
}
# The response to this request carries the M3U8 URL in plain text.
rsp = rq.post(url, headers=headers, data=data, verify=False)
print(rsp.content)

Reflection#

My analysis was a mess: I went round in circles without properly working out the request flow first, and wasted a lot of time. Reverse engineering is like a high school math problem: you start with only a vague direction and need patience to trace backward and forward.
The code is a mess too, and its logic is not very clear. There are many ways to extract substrings, but I chose the simplest and least readable one...
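One of the more readable alternatives is a regular expression. The sketch below is an illustration, not the code I actually ran: the layout of the inline script in `SAMPLE_SCRIPT` is an assumption (the real page may format it differently), the key value is a placeholder, and `extract_field` is a hypothetical helper name.

```python
import re


def extract_field(script_text, name):
    """Return the double-quoted value assigned to `name`, or None if absent."""
    # Accepts both `"name": "value"` and `name = "value"` styles.
    pattern = r'\b' + re.escape(name) + r'"?\s*[:=]\s*"([^"]*)"'
    match = re.search(pattern, script_text)
    return match.group(1) if match else None


# Assumed script layout with placeholder values, for demonstration only.
SAMPLE_SCRIPT = (
    'var config = {"url": "https://www.bilibili.com/bangumi/play/ep276690", '
    '"time": "1676376933", "key": "0123456789abcdef0123456789abcdef"};'
)

if __name__ == "__main__":
    print(extract_field(SAMPLE_SCRIPT, "time"))
    print(extract_field(SAMPLE_SCRIPT, "key"))
```

Compared with fixed offsets, the pattern states what is being matched and returns None instead of raising when a field is missing.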