Any way to archive external URLs transparently?

Hi,

I refer to external URLs, e.g. web pages or tweets. Over time, some of these URLs will become obsolete because the source page might move or cease to exist.

Is there a way to create an archival copy of the source page so that the reference in Logseq isn’t broken someday in the future?

Thanks in advance.


You can use the Wayback Machine to take a snapshot of any page:

https://archive.org/web/

You can save the link to the snapshot or just the original link; if the original ever breaks, you will be able to find snapshots of it by simply visiting the page above.

In general, when you have broken links, check the Wayback Machine: often the snapshots are already there, taken automatically or by other people 🙂


Using the Wayback Machine for this is a great idea. Thanks.

Now to figure out how to write a plugin that’ll do this automatically in the background:

  1. Take any URL pasted into Logseq and submit it to the Wayback Machine if a snapshot doesn’t already exist. They have public APIs for this (sketched below).
  2. Rewrite the link in Logseq to point at the snapshot URL from the Wayback Machine.

Or leave the original URL as-is and use a browser that integrates Wayback Machine lookup.
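
A minimal sketch of what those two steps could look like, using the Wayback Machine’s public Availability API and the Save Page Now endpoint. The function and type names are my own, and error handling, rate limiting, and the authenticated SPN2 API are left out:

```typescript
// Check the Wayback Machine for an existing snapshot of a URL and, if there
// is none, ask Save Page Now to create one.

interface AvailabilityResponse {
  archived_snapshots?: {
    closest?: { available: boolean; url: string; timestamp: string };
  };
}

async function getOrCreateSnapshot(url: string): Promise<string | null> {
  // 1. The public Availability API returns the closest existing snapshot.
  const res = await fetch(
    `https://archive.org/wayback/available?url=${encodeURIComponent(url)}`
  );
  const data = (await res.json()) as AvailabilityResponse;
  const closest = data.archived_snapshots?.closest;
  if (closest?.available) {
    return closest.url;
  }

  // 2. No snapshot yet: request one via Save Page Now. Captures can take a
  //    while; after redirects the response URL usually points at the new
  //    snapshot, but a real plugin should poll rather than assume it does.
  const save = await fetch(`https://web.archive.org/save/${url}`);
  return save.ok ? save.url : null;
}
```

A paste handler could call this and then rewrite the block so the snapshot link sits next to the original URL.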


Not everything is archived by the Wayback Machine, though, so you’d still need an interface to submit the URL to it.

It’s not in Logseq, but for bookmarking I use nb, a command-line and local-web plain-text note-taking, bookmarking, archiving, and knowledge base application. It creates a Markdown copy of the page internally. Another option could be ArchiveBox: https://docs.archivebox.io/en/latest/index.html


Both of these tools are excellent suggestions, and both seem to have built-in support for the Wayback Machine. I’ll definitely be looking at them more closely, as I just found a second reference in my Logseq notes that had gone bad because the original source website changed.

Not the best solution, but when the linked content is critical I do a Print to PDF and attach the PDF to the content I’m linking from. Since there’s a great PDF viewer and annotator built in, this is sufficient most of the time. But I understand this doesn’t scale unless it’s automated as well.


If you’re on macOS, there’s also Brett Terpstra’s impressive (and free; he accepts donations and sponsorship) Gather command-line tool. It takes a web page and intelligently parses it into Markdown. Various arguments for customizing the output are built in.

I use it in an Alfred workflow triggered by a text snippet, so I just type `..pmd` and the text content from my frontmost browser tab is pasted in Markdown format wherever I happen to be typing.

Happy to share how I set up the Alfred workflow if there’s interest!


I wrote a plugin which saves a URL as a static webpage locally. It also checks whether the URL already exists in the Internet Archive and, if so, adds a reference link to the snapshot.
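
To give a feel for how it hooks into Logseq, here is a simplified sketch of the wiring using the slash-command API from @logseq/libs. This is not the actual plugin source, and `getOrCreateSnapshot` stands in for the archiving logic:

```typescript
// Simplified wiring sketch, not the plugin's actual source. Registers a
// slash command that finds a URL in the current block and appends a link
// to an archived copy.
import "@logseq/libs";

// Hypothetical helper, e.g. the Wayback Machine lookup sketched earlier.
declare function getOrCreateSnapshot(url: string): Promise<string | null>;

async function main() {
  logseq.Editor.registerSlashCommand("Archive URL", async () => {
    const block = await logseq.Editor.getCurrentBlock();
    const url = block?.content.match(/https?:\/\/\S+/)?.[0];
    if (!url) return;

    const snapshot = await getOrCreateSnapshot(url);
    if (snapshot) {
      await logseq.Editor.insertAtEditingCursor(` ([archived](${snapshot}))`);
    }
  });
}

logseq.ready(main).catch(console.error);
```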


Awesome! I’ll take this for a spin. We don’t need the Bun dependency when installing it as a plugin through Logseq, right?


Correct, that’s only needed for development.