Logseq like Jupyter Notebook et similia

RichardJActon · March 20, 2023, 4:48pm

There are a number of very different approaches that could be taken to implementing a feature like this. The choice of approach has quite consequential implications, so the Logseq community should probably give it careful consideration if it decides to move forward with something like this, particularly regarding what scope would be desirable for such a feature. Jupyter and Quarto/Rmarkdown take slightly different approaches to the problem. The article "The First Notebook War" by Yihui Xie (the main developer of knitr, amongst other things) does a good job of contrasting them and of pointing out the strengths and weaknesses of the ‘computational notebook’ more generally.
- Jupyter
  - Jupyter’s approach is to have an external web server (a Jupyter server, or JupyterHub for multi-user setups) provide you with an interactive, IPython-REPL-like session, using Jupyter’s kernels for specific languages.
  - I view Jupyter as more of an interactive scratch session for prototyping and playing around than a serious document authoring tool or a tool for performing reproducible computational analyses.
  - Tools like Jupyter Book (executablebooks/jupyter-book on GitHub) and now Quarto can make it better from the document authoring side, but I’m not sure the compute environment management tools are as clean and generalisable as the Quarto markdown approach.
- Quarto / Rmarkdown
  - In this approach the notebook is usually authored in some slight superset of pandoc markdown, in an IDE like RStudio or VS Code with some plugins. Code blocks are executed in a session on the system’s install of whatever language you are using, with the outputs displayed interactively in the editor.
  - This approach in my view scales better, by which I mean it is easier to transition from the prototyping and exploratory phase to more serious and larger-scale projects, either developing software or authoring technical or academic works, without having to switch up your tooling.
  - The components in this approach are less tightly coupled than in the Jupyter approach, making things more modular and extensible.
- WebAssembly (WASM)
  - Another way entirely could be to have the code you want to run compiled to WebAssembly and run ‘serverlessly’, with Electron providing the runtime, so you can keep it entirely self-contained and not depend on any external runtimes (see the quarto-ext/shinylive Quarto extension on GitHub, which embeds Shinylive for Python applications, for an example of what can be done with this). Pyodide (mentioned above) also takes this approach.
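To make the Quarto/Rmarkdown-style option a bit more concrete, here is a minimal Python sketch of the core idea: pull the fenced code blocks out of a markdown page and run them in order in one shared session, capturing each block's output. The names `extract_blocks`, `run_blocks` and the example `PAGE` are made up for illustration, and this is not how knitr, Quarto or Logseq actually work; a real implementation would hand the code to a separate kernel or process rather than call `exec` in-process.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # built programmatically so this example can itself sit inside a fence

# Matches a fenced block tagged "python" and captures its body.
BLOCK_RE = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def extract_blocks(markdown):
    """Return the source of every python code block, in document order."""
    return [match.group(1) for match in BLOCK_RE.finditer(markdown)]


def run_blocks(blocks):
    """Run blocks one after another in a shared namespace, capturing stdout."""
    session = {}  # shared state: later blocks can see names defined by earlier ones
    outputs = []
    for source in blocks:
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(source, session)
        outputs.append(buffer.getvalue())
    return outputs


# Hypothetical page content, loosely in the shape of a Logseq outline.
PAGE = "\n".join([
    "- Load some numbers",
    FENCE + "python",
    "values = [1, 2, 3]",
    FENCE,
    "- Summarise them",
    FENCE + "python",
    "print(sum(values))",
    FENCE,
])

outputs = run_blocks(extract_blocks(PAGE))
print(outputs)
```

The second block only works because it shares a session with the first, which is the property that makes this style feel interactive while the document itself stays plain text.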
A big gap in this thread so far is thinking about defining, preserving and making portable the compute environment in which any code is run. For a portable and reproducible analysis you need the data, the code and the compute environment to all be adequately defined. My current favourite tool for this is Renku https://renkulab.io/, an open source project from the Swiss Data Science Center built on GitLab, Kubernetes and Docker, which allows you to run RStudio, JupyterHub, or a full Linux desktop from a Docker container, either in the cloud or locally, and to store data in a shared knowledge graph.
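As a rough illustration of what "adequately defined" could mean at the smallest scale, the Python sketch below records a manifest next to a run: hashes of the code and data, plus the interpreter and package versions. The function `build_manifest` and its field names are my own assumptions, not part of Renku or any other tool, and a manifest like this only describes an environment; it cannot recreate one the way a container image or a Nix/Guix definition can.

```python
import hashlib
import json
import platform
import sys
from importlib import metadata


def build_manifest(code, data, packages):
    """Describe one run: what code, what data, and which environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # record that the package was absent
    return {
        "code_sha256": hashlib.sha256(code.encode("utf-8")).hexdigest(),
        "data_sha256": hashlib.sha256(data).hexdigest(),
        "python": platform.python_version(),
        "platform": sys.platform,
        "packages": versions,
    }


# Hypothetical inputs: a one-line analysis and a tiny CSV payload.
manifest = build_manifest("print(1 + 1)", b"a,b\n1,2\n", ["pip"])
print(json.dumps(manifest, indent=2))
```

If either hash or any version differs between two runs, that is a hint that the results may not be comparable.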
Do we want to be able to come back to our code and re-run the analysis after making some tweaks? Or do we want to freeze the output of a given run, keeping the results as they were at the time but without necessarily preserving the ability to re-run it and get the same results?
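The two options can be sketched side by side in Python. In this hypothetical `run_block` helper (the name, the `freeze` flag and the in-memory `store` are all assumptions for illustration), a frozen block returns its stored output without executing anything, while an edited or unfrozen block is re-run and its output stored again.

```python
import contextlib
import hashlib
import io


def run_block(source, frozen_outputs, freeze=True):
    """Return (output, was_executed) for a block of Python source."""
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    if freeze and key in frozen_outputs:
        # Frozen: reuse the stored result, no code is executed.
        return frozen_outputs[key], False
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})  # fresh namespace per block keeps the sketch simple
    frozen_outputs[key] = buffer.getvalue()
    return frozen_outputs[key], True


store = {}  # stand-in for outputs saved alongside the page
first = run_block("print(2 * 21)", store)
second = run_block("print(2 * 21)", store)   # unchanged source: served from store
edited = run_block("print(2 * 22)", store)   # tweaked source: executed again
print(first, second, edited)
```

Keying on a hash of the source means a frozen result survives for as long as the code is untouched, but it says nothing about whether the same code would still give the same answer in today's environment.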
In addition to Renku, Stencila https://stenci.la/ may be worth watching. Their approach is very ambitious and they seem to have become less active lately, but I like their interest in integrating with existing linked data formats like schema.org.
Beyond the containerisation approaches, which have certain weaknesses for computational reproducibility because container image builds are not necessarily fully reproducible, the Nix and Guix approach to package management will, I think, become the more popular approach to reproducible compute environments in the long run, especially as people start taking software bills of materials more seriously for security reasons.