The Autodidacts

Exploring the universe from the inside out

Script: bulk star Readeck entries by URL (with URL cleaning)

A better version of the script I wrote for Wallabag

Note: this post is part of #100DaysToOffload, a challenge to publish 100 posts in 365 days. These posts are generally shorter and less polished than our normal posts; expect typos and unfiltered thoughts! View more posts in this series.

I have adapted the script I wrote for Wallabag to work with the Readeck API, and made it vastly better in the process.

Please refer to the previous article for basic information on usage, jq pipelines for bulk starring, configuration, and dependencies; that stuff is mostly the same.

In this post, I’ll just cover what’s different.

  1. This script uses authlib for OAuth bearer token authentication. This means, in your environment variables, you can skip username, password, etc, and just provide two environment variables: READECK_BASE_URL and READECK_API_TOKEN.
  2. There are two additional dependencies, so the uv pip install command is: uv pip install json requests authlib urlparse
  3. Readeck doesn’t provide an “exists” endpoint. I tried using the search parameter, but it didn’t work, so what I ended up doing is fetching all bookmarks from the (paginated) API the first time it runs, and caching them in /tmp/readeck-bookmarks.json. Subsequent runs always use the cached version if it’s there.
  4. While Wallabag seemed to store the URLs imported from the Pocket CSV basically unchanged, Readeck URLs don’t always match the original URL that was imported. So this script has a long chain of redirect-following and URL cleaning logic. If it fails to match a URL, it then fetches the headers and checks where the URL currently redirects to, and tries that. If that doesn’t work, it cleans the original URL, removing the scheme, query parameters, and fragment, and tries that. Then it tries a cleaned version of the redirected URL. Then it gives up.

Though this script is way fancier than my Wallabag script, it still doesn’t work as well. The Wallabag script was able to star 654 articles (ie, it missed one).

This script starred 621 articles. It admits it failed, for good reason, on two. I’m still sorting out the remaining 31. One of them was https://en.wikipedia.org/wiki/M%C3%BCnchhausen_trilemma

Here’s the script:

import requests
from urllib.parse import urlparse, urlunparse
import json
import os
import os.path
import sys
from authlib.integrations.requests_client import OAuth2Auth

class ReadeckAPI:
    def __init__(self, BASE_URL, API_TOKEN):
        self.BASE_URL = BASE_URL 
        self.API_TOKEN = API_TOKEN 
        self.access_token = None
    
    def fetch_bookmarks(self):
        params = {'limit': 100}
        headers = {'Authorization': f'Bearer {self.API_TOKEN}'}
        # params = {}
        endpoint = f"{self.BASE_URL}/api/bookmarks"            
        self.data = []
        def fetch_page(endpoint):
            response = requests.get(endpoint, 
                                headers=headers, 
                                params=params)
            response.raise_for_status()
            newdata = response.json()
            # print(newdata)
            self.data = self.data + newdata
            # print(self.data)
            print(len(newdata))
            print(len(self.data))
            if  response.headers['Current-Page'] != response.headers['Total-Pages']:
                print(f"Page {response.headers['Current-Page']} of {response.headers['Total-Pages']}")              
                # print(f"Fetching {response.links['next']['url']}")
                fetch_page(response.links['next']['url'])           

        fetch_page(endpoint)

    def get_redirect_destination(self,url):
        r = requests.head(url, allow_redirects=True)
        return r.url
    
    def star_article_by_url(self, url):
        """Star an article by its URL"""
        for bookmark in self.data:
            if url in bookmark['url'] or bookmark['url'] in url:
                print(f"Starring {bookmark['id']} ({url})")
        
                # Star the article
                headers = {'Authorization': f'Bearer {self.API_TOKEN}'}
                response = requests.patch(
                    f"{self.BASE_URL}/api/bookmarks/{bookmark['id']}",
                    headers=headers,
                    data={'is_marked': True}
                )
                response.raise_for_status()
    
                return True
            else:
                # print(f"URL doesn't match ({url} != {bookmark['url']}")
                continue
        print(f"URL not found in bookmarks: {url}")
        return False
        
# Initialize API client
Readeck = ReadeckAPI(
    BASE_URL=os.environ["READECK_BASE_URL"],
    API_TOKEN=os.environ["READECK_API_TOKEN"],
)

if __name__ == "__main__":
    if len(sys.argv) != 2:
      print("Usage: python script.py <article_url>")
      sys.exit(1)
    
    url = sys.argv[1]

    # Cache all bookmarks for bulk starring
    if os.path.isfile('/tmp/readeck-bookmarks.json') != True:
        Readeck.fetch_bookmarks()
        with open('/tmp/readeck-bookmarks.json', 'w+', encoding='utf-8') as f:
            json.dump(Readeck.data, f, ensure_ascii=False, indent=4)
    else:
        with open('/tmp/readeck-bookmarks.json') as f:
            
            Readeck.data = json.load(f)
        
    success = Readeck.star_article_by_url(url)
    if success != True:
        redirected_url =  Readeck.get_redirect_destination(url)
        if redirected_url != url:
            try:
                answer = input(f"URL redirects to {redirected_url}. Use that URL to star?")
                if answer.lower() in ["y","yes"]:
                    success = Readeck.star_article_by_url(redirected_url)
                    if success == True:
                        exit(0)
                elif answer.lower() in ["n","no"]:
                    exit(1)
            # if running in bulk, follow redirects without prompting
            except:
                print(f"URL redirects to {redirected_url}. Trying that URL.")
                success = Readeck.star_article_by_url(redirected_url)
                if success == True:
                    exit(0)
                else:
                    print(f"Failed to star both URL ({url} and redirected URL ({redirected_url})")
                    u = urlparse(url)
                    # print(u)
                    newu = u._replace(scheme="",fragment="",query="")
                    # print(newu)
                    cleaned_url= urlunparse(newu)
                    print(f"Trying with cleaned URL {cleaned_url}")
                    success = Readeck.star_article_by_url(cleaned_url)
                    if success == True:
                        exit(0)
                    else:
                        print(f"Failed to star cleaned URL ({cleaned_url}). Cleaning redirected URL and trying that.")
                        u = urlparse(redirected_url)
                        # print(u)
                        newu = u._replace(scheme="",fragment="",query="")
                        # print(newu)
                        cleaned_redirected_url = urlunparse(newu)
                        print(f"Trying with cleaned redirected URL {cleaned_redirected_url}")
                        success = Readeck.star_article_by_url(cleaned_redirected_url)
                        if success == True:
                            exit(0)
                        else:
                            print(f"Total fail for url: {url}")
                            exit(1)

Sign up for updates

Join the newsletter for curious and thoughtful people.
No Thanks

Great! Check your inbox and click the link to confirm your subscription.