Update http URLs from an HTML file via script

Rafid · December 7, 2022, 3:16pm

I'm trying to replace the URLs in my conf file with the ones from an HTML page, because the URLs sometimes get updated or changed. I made a simple script that is supposed to fetch the HTML and update/replace the URLs in my conf file. The HTML file is changed by the server admin, and I don't have any access to it. I can only visit the up-to-date site at the Html Location below and update my conf file manually.

Html Location: https://10.10.10.1

Part of HTML file:

<li><a href="#" class="dropdown-toggle hvr-bounce-to-bottom" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Movies<span class="caret"></span></a>
									<ul class="dropdown-menu">
										
<li><a class="hvr-bounce-to-bottom" href="http://10.10.10.7/MY-FTP-2/English%20Movies/">English Movies</a></li>
<li><a class="hvr-bounce-to-bottom" href="http://10.10.10.8/MY-FTP-1/English%20Movies%20%281080p%29/">English Movies -1080p </a></li>
										
										<li><a class="hvr-bounce-to-bottom" href="http://10.10.10.9/MY-FTP-1/Hindi%20Movies/">Hindi Movies</a></li>
										<li><a class="hvr-bounce-to-bottom" href="http://10.10.10.8/MY-FTP-1/SOUTH%20INDIAN%20MOVIES/Hindi%20Dubbed/">South-Movie Hindi Dubbed</a></li>
										<li><a class="hvr-bounce-to-bottom" href="http://10.10.10.10/MY-FTP-3/Animation%20Movies/">Animation Movies</a></li>
										<li><a class="hvr-bounce-to-bottom" href="http://10.10.10.10/MY-FTP-3/Animation%20Movies%20%281080p%29/">Animation Movies -1080p</a></li>

After getting the HTML, it will replace/update the links in my rclone.conf file.

rclone.conf file preview:

[Hindi Movies]
type = http
url = http://10.10.10.9/MY-FTP-1/Hindi%20Movies/

[English Movies]
type = http
url = http://10.10.10.7/MY-FTP-2/English%20Movies/

[English Movies -1080p]
type = http
url = http://10.10.10.9/MY-FTP-1/English%20Movies%20%281080p%29/

[South-Movie Hindi Dubbed]
type = http
url = http://10.10.10.9/MY-FTP-1/SOUTH%20INDIAN%20MOVIES/Hindi%20Dubbed/

[Animation Movies]
type = http
url = http://10.10.10.10/MY-FTP-3/Animation%20Movies/

[Animation Movies -1080p]
type = http
url = http://10.10.10.10/MY-FTP-3/Animation%20Movies%20%281080p%29/

So I've written a noob script to do the work, but it's giving me an error!

import re
import requests
from bs4 import BeautifulSoup

# Fetch the HTML from the website
html = requests.get("http://10.10.10.1/")

# Parse the HTML
soup = BeautifulSoup(html.text, 'html.parser')

# The location of the rclone.conf file
rclone_conf_file = '/home/user/tmp/rclone.conf'

# Open the rclone.conf file
with open(rclone_conf_file, 'r') as f:
    # Read the file into a list of lines
    lines = f.readlines()

# Iterate over the <a> tags in the HTML
for a in soup.find_all('a'):
    # Get the text of the <a> tag (e.g. "Hindi Movies")
    section_name = a.text.strip().lower()

    # Check if the section name exists in the rclone.conf file
    if any(section_name in line.lower() for line in lines):
        # Get the URL of the <a> tag
        new_url = a['href']

        # Use a regular expression to match the URL in the rclone.conf file
        regex = r'^(\[%s\]\n.*\n.*http.*)' % re.escape(section_name)

        # Update the URL in the rclone.conf file
        for i, line in enumerate(lines):
            if section_name in line.lower():
                print(lines[i])  # <-- Add this line
                lines[i] = re.sub(regex, r'\1', line, flags=re.IGNORECASE)
                lines[i] = lines[i].replace(lines[i].split()[2], new_url)

# Open the rclone.conf file for writing
with open(rclone_conf_file, 'w') as f:
    # Write the updated lines to the file
    for line in lines:
        f.write(line)

The error it's showing:

File "/home/plex/tmp/script.py", line 37, in <module>
    lines[i] = lines[i].replace(lines[i].split()[2], new_url)
IndexError: list index out of range

Have A Octotastic Day !!

Ole · December 7, 2022, 8:40pm

Hi Rafid,

The error message:

IndexError: list index out of range

tells you that one of the list/array indexes refers to an element that doesn't exist.

It could be that i in lines[i] is out of range, which is less likely when I read the code.

It could also be that 2 in .split()[2] is out of range, meaning the content of lines[i] doesn't split as you expect, which is more likely.
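To make the second possibility concrete, here is a minimal sketch using three lines copied from the rclone.conf preview in the question. The variable names header_parts and url_parts are only for illustration. It checks which lines contain the section name and how many parts each line splits into:

```python
# Sample lines copied from the rclone.conf preview in the question.
lines = [
    "[Hindi Movies]\n",
    "type = http\n",
    "url = http://10.10.10.9/MY-FTP-1/Hindi%20Movies/\n",
]

# This is what the script derives from the link text "Hindi Movies".
section_name = "hindi movies"

for line in lines:
    parts = line.split()
    matches = section_name in line.lower()
    print(repr(line), "matches" if matches else "no match", len(parts), "parts")

# Only the "[Hindi Movies]" header contains the section name (the url line
# has "Hindi%20Movies", not "Hindi Movies"), and the header splits into
# just two parts, so index 2 does not exist on that line.
header_parts = lines[0].split()  # two items: '[Hindi' and 'Movies]'
url_parts = lines[2].split()     # three items: 'url', '=', and the URL
```

If that is what happens in the real file, the script is trying to replace the URL on the section header line instead of on the url line that follows it.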

You can find out by adding some extra (debug) print statements just before the error, that is something like this:

print(i)
print(lines[i])
print(lines[i].split())
print(lines[i].split()[0])
print(lines[i].split()[1])
print(lines[i].split()[2])
lines[i] = lines[i].replace(lines[i].split()[2], new_url)

If you want to learn the more advanced approach then try placing a breakpoint just before the failing line and then watch the variables directly in the debugger without adding print statements.

Rafid · December 8, 2022, 2:55pm

@Ole Somehow the script is now running, but it is not updating the values. Also, there is no output at all. Can you help me?

Ole · December 8, 2022, 3:44pm

Perhaps, but it seems like you will be able to make it work yourself.

The code you already have indicates a good flair for programming and you seem to be almost there; you just need to be patient and systematic with the remaining testing and debugging.

What is your programming background?
What parts of the code have you already tested?
What development tools are you using?

Rafid · December 8, 2022, 3:57pm

No, I do not know Python at all. The script was created using OpenAI's ChatGPT. But the AI can't seem to fix this issue because it can't browse the internet and also can't receive the full HTML file. So I need help from an expert.

Ole · December 8, 2022, 4:37pm

Ah, that explains the very basic "out of range" error in seemingly professional-style programming.

You are being bluffed just like I was initially bluffed by you:

Rafid · December 8, 2022, 4:52pm

Ahh!! That was written by it too. Sorry, I forgot to edit it. Btw, can you help me fix it? My whole Plex server depends on rclone, and when the admin of the page changes a URL I need to extract it and update rclone manually. I really need help!

Ole · December 8, 2022, 5:11pm

Most likely, but I do not make custom programs for free and doubt you want to pay the price I would take for this.

Rafid · December 8, 2022, 5:26pm

I do not want you to create it from scratch. If mine can be made usable with a little bit of tweaking, you can help me if you want.

Ole · December 8, 2022, 7:07pm

I suspect a complete rewrite is the easiest based on the origin of the script and characteristics of the first error.

Perhaps other forum members would like to give it a try?

Perhaps you find it fun/useful to learn programming yourself?
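For anyone who wants to give it a try, a rewrite could start from something like the sketch below. It is not the original script: it swaps BeautifulSoup for the standard library's html.parser and edits the conf with configparser instead of regular expressions. It also assumes that rclone.conf can be read as a plain INI file, that each section name equals the link text on the page (compared case-insensitively), and that each section has one url key. LinkCollector and update_conf are hypothetical names made up for this example.

```python
import configparser
import io
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect {lowercased link text: href} for <a> tags with an http(s) href."""

    def __init__(self):
        super().__init__()
        self.links = {}
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so "English Movies -1080p " still matches.
            name = " ".join("".join(self._text).split())
            if name and self._href.startswith("http"):
                self.links[name.lower()] = self._href
            self._href = None


def update_conf(html_text, conf_text):
    """Return conf_text with each section's url replaced by the matching link."""
    collector = LinkCollector()
    collector.feed(html_text)

    # interpolation=None stops configparser from treating "%20" as a placeholder.
    conf = configparser.ConfigParser(interpolation=None)
    conf.read_string(conf_text)
    for section in conf.sections():
        new_url = collector.links.get(section.lower())
        if new_url:
            conf[section]["url"] = new_url

    out = io.StringIO()
    conf.write(out)
    return out.getvalue()


# Small sample based on the HTML and rclone.conf excerpts in the question.
html_text = """
<li><a href="#" class="dropdown-toggle">Movies<span class="caret"></span></a>
<li><a class="hvr-bounce-to-bottom" href="http://10.10.10.8/MY-FTP-1/English%20Movies%20%281080p%29/">English Movies -1080p </a></li>
"""
conf_text = """[English Movies -1080p]
type = http
url = http://10.10.10.9/MY-FTP-1/English%20Movies%20%281080p%29/
"""
updated = update_conf(html_text, conf_text)
print(updated)
```

In real use, html_text would come from requests.get(...).text as in the original script, and conf_text would be read from and written back to the rclone.conf file. Note that writing the file through configparser drops any comments in it, so it is worth keeping a backup copy before overwriting.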
