Script: bulk star Readeck entries by URL (with URL cleaning)

A better version of the script I wrote for Wallabag

Posted by Curiositry on January 26th, 2026 Tagged 100DaysToOffload, Read-it-later, Readeck, Python, API

Note: this post is part of #100DaysToOffload, a challenge to publish 100 posts in 365 days. These posts are generally shorter and less polished than our normal posts; expect typos and unfiltered thoughts! View more posts in this series.

I have adapted the script I wrote for Wallabag to work with the Readeck API, and made it vastly better in the process.

Please refer to the previous article for basic information on usage, jq pipelines for bulk starring, configuration, and dependencies; that stuff is mostly the same.

In this post, I’ll just cover what’s different.

This script authenticates with an API token, sent as a bearer token in the Authorization header of each request. This means that, in your environment variables, you can skip the username, password, etc., and provide just two: READECK_BASE_URL and READECK_API_TOKEN.
The dependencies differ slightly. As listed below, the script imports only one third-party package, requests; json, os, sys, and urllib.parse all ship with Python and don't need installing. So the uv pip install command is: uv pip install requests
Readeck doesn’t provide an “exists” endpoint. I tried using the search parameter, but it didn’t work, so I ended up fetching all bookmarks from the (paginated) API on the first run and caching them in /tmp/readeck-bookmarks.json. Subsequent runs always use the cached version if it’s there, so delete that file to pick up bookmarks added since the cache was written.
While Wallabag seemed to store the URLs imported from the Pocket CSV basically unchanged, Readeck URLs don’t always match the original URL that was imported. So this script has a long chain of URL cleaning and redirect-following logic. It first tries the original URL, a cleaned version with the scheme, query parameters, and fragment removed, and a percent-decoded version. If none of those match, it fetches the headers, checks where the URL currently redirects to, and tries that, along with cleaned and percent-decoded versions of the redirected URL. Then it gives up.
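
The cleaning step can be sketched on its own. This is a minimal sketch rather than the script's exact code: candidate_urls is a hypothetical helper name, and the example URL in the usage note is made up. It builds the original, cleaned, and percent-decoded variants with urllib.parse and drops duplicates.

```python
from urllib.parse import urlparse, urlunparse, unquote


def candidate_urls(url):
    """Return de-duplicated variants of `url` to try against stored bookmark URLs.

    Hypothetical helper, not part of the original script.
    """
    # Blank out the scheme, query string, and fragment
    cleaned = urlunparse(urlparse(url)._replace(scheme="", query="", fragment=""))
    variants = [url, cleaned, unquote(url), unquote(cleaned)]
    # dict.fromkeys keeps the first occurrence of each variant, in order
    return list(dict.fromkeys(variants))
```

For example, candidate_urls("https://example.com/a%20b?utm=1#top") includes the cleaned form //example.com/a%20b. The leading // is there because urlunparse keeps the network-location marker when the scheme is blank, which is fine here only because the script matches by substring rather than by equality.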

Though this script is way fancier than my Wallabag script, it still doesn’t work as well. The Wallabag script was able to star 654 articles (i.e., it missed one).

This script starred 621 articles. It admits it failed, for good reason, on two. I’m still sorting out the remaining 31. One of them was https://en.wikipedia.org/wiki/M%C3%BCnchhausen_trilemma…
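
For sorting out misses like this, one option is to normalize both sides and ask for the nearest stored URLs. This is a rough sketch and not part of the script: normalize and nearest_bookmarks are hypothetical names, the stored URLs in the usage lines are invented for illustration, and the 0.6 cutoff is just the difflib default.

```python
import difflib
from urllib.parse import unquote, urlparse


def normalize(url):
    """Lowercased host plus percent-decoded path, without a trailing slash."""
    parts = urlparse(url)
    return (parts.netloc.lower() + unquote(parts.path)).rstrip("/")


def nearest_bookmarks(url, bookmark_urls, n=3, cutoff=0.6):
    """Return up to `n` stored URLs whose normalized form is closest to `url`."""
    by_normalized = {normalize(b): b for b in bookmark_urls}
    hits = difflib.get_close_matches(
        normalize(url), list(by_normalized), n=n, cutoff=cutoff
    )
    return [by_normalized[h] for h in hits]


# Invented stored URLs, for illustration only
stored = [
    "https://en.wikipedia.org/wiki/Münchhausen_trilemma",
    "https://example.com/other",
]
print(nearest_bookmarks(
    "https://en.wikipedia.org/wiki/M%C3%BCnchhausen_trilemma", stored
))
```

Printing the nearest candidates next to each failed URL makes it easier to see whether the stored URL differs by encoding, host, or path.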

Here’s the script:

import requests
from urllib.parse import urlparse, urlunparse, unquote
import json
import os
import os.path
import sys

class ReadeckAPI:
    def __init__(self, BASE_URL, API_TOKEN):
        self.BASE_URL = BASE_URL 
        self.API_TOKEN = API_TOKEN 
        self.access_token = None
    
    def fetch_bookmarks(self):
        """Fetch every bookmark from the paginated API into self.data."""
        params = {'limit': 100}
        headers = {'Authorization': f'Bearer {self.API_TOKEN}'}
        endpoint = f"{self.BASE_URL}/api/bookmarks"
        self.data = []

        def fetch_page(endpoint):
            response = requests.get(endpoint, headers=headers, params=params)
            response.raise_for_status()
            newdata = response.json()
            self.data = self.data + newdata
            print(len(newdata))
            print(len(self.data))
            # Follow the "next" link until the last page is reached
            if response.headers['Current-Page'] != response.headers['Total-Pages']:
                print(f"Page {response.headers['Current-Page']} of {response.headers['Total-Pages']}")
                fetch_page(response.links['next']['url'])

        fetch_page(endpoint)

    def get_redirect_destination(self, url):
        """Return the final URL after following redirects, or False on failure."""
        try:
            headers = {}
            # headers = {'User-Agent':'If you get 403 errors or redirect following, try putting something in here'}
            # A timeout keeps a bulk run from hanging on an unresponsive host
            r = requests.head(url, headers=headers, allow_redirects=True, timeout=10)
            if r.status_code == 301 or r.status_code == 200:
                return r.url
            else:
                print(r.status_code)
                return False
        except Exception as e:
            print(f"Redir err:{e}")
            return False
    
    def star_article_by_url(self, urls):
        """Star the first cached bookmark whose URL matches any of the given URLs."""
        for bookmark in self.data:
            for url in urls:
                # Substring match in either direction, so cleaned URLs
                # (no scheme, query, or fragment) still match stored ones
                if url in bookmark['url'] or bookmark['url'] in url:
                    print(f"Starring {bookmark['id']} ({url})")

                    # Star the article
                    headers = {'Authorization': f'Bearer {self.API_TOKEN}'}
                    response = requests.patch(
                        f"{self.BASE_URL}/api/bookmarks/{bookmark['id']}",
                        headers=headers,
                        data={'is_marked': True}
                    )
                    response.raise_for_status()

                    return True
        print(f"URL not found in bookmarks: {urls}")
        return False
        
# Initialize API client
Readeck = ReadeckAPI(
    BASE_URL=os.environ["READECK_BASE_URL"],
    API_TOKEN=os.environ["READECK_API_TOKEN"],
)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python script.py <article_url>")
        sys.exit(1)
    
    url = sys.argv[1]

    # Cache all bookmarks for bulk starring
    if not os.path.isfile('/tmp/readeck-bookmarks.json'):
        Readeck.fetch_bookmarks()
        with open('/tmp/readeck-bookmarks.json', 'w+', encoding='utf-8') as f:
            json.dump(Readeck.data, f, ensure_ascii=False, indent=4)
    else:
        with open('/tmp/readeck-bookmarks.json') as f:          
            Readeck.data = json.load(f)
        
    # First pass: the URL as given, a cleaned version (no scheme, query,
    # or fragment), and a percent-decoded version
    u = urlparse(url)
    newu = u._replace(scheme="", fragment="", query="")
    cleaned_url = urlunparse(newu)
    cleaned_decoded_url = unquote(url)
    urls = [url, cleaned_url, cleaned_decoded_url]
    success = Readeck.star_article_by_url(urls)
    if success:
        sys.exit(0)
      
    # Second pass: follow redirects and try variants of the destination URL
    redirected_url = Readeck.get_redirect_destination(url)
    if redirected_url and redirected_url != url:
        u = urlparse(redirected_url)
        newu = u._replace(scheme="", fragment="", query="")
        cleaned_redirected_url = urlunparse(newu)
        cleaned_redirected_decoded_url = unquote(cleaned_redirected_url)
        redirected_urls = [redirected_url, cleaned_redirected_url, cleaned_redirected_decoded_url]
        urls = urls + redirected_urls
        success = Readeck.star_article_by_url(redirected_urls)
        if success:
            sys.exit(0)

    print(f"Total fail for url: {url} ({urls})")
    sys.exit(1)
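
The recursive fetch_page in the script could also be written as a loop, which avoids growing the call stack on a large library. A sketch under assumptions: fetch_all and get_page are hypothetical names, and it assumes the last page's response carries no next link. If your instance sends one anyway, keep the Current-Page / Total-Pages check from the script instead.

```python
def fetch_all(get_page, first_url):
    """Collect items from every page by following `next` links in a loop.

    `get_page` is any callable that takes a URL and returns a
    requests-style response (with raise_for_status(), json(), and links).
    """
    items = []
    url = first_url
    while url:
        response = get_page(url)
        response.raise_for_status()
        items.extend(response.json())
        # requests exposes the parsed Link header as `response.links`;
        # assumption: there is no "next" entry on the final page
        url = response.links.get("next", {}).get("url")
    return items
```

With requests, get_page could be something like lambda u: requests.get(u, headers=headers), where headers holds the bearer token as in the script, and first_url is the /api/bookmarks endpoint.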

The Autodidacts

Exploring the universe from the inside out