Today, we’re going to use Pandoc and Clojure to produce a nice EDN file with all the links from a Markdown file.
I strive to learn general tools. I want to be able to mix and combine the tools I already have and apply them to new problems. To achieve that, I’m willing to sacrifice some clarity and some control.
Pandoc and Clojure are general tools. Pandoc supports a wide range of document formats. Clojure is a great tool for general purpose programming.
Here’s a list of more specific ways of solving adjacent problems:
Specific tools are often easier to get started with than general tools. Doing something specific is also a great way to learn. By minimizing indirection in what you do, you minimize your chance of getting lost.
That’s not what we’re going to do today! Today, we’re aiming for general.
Let’s get to it.
First, let’s define our language.
Term | Definition | More details |
---|---|---|
walk | A way to transform recursive data structures | https://clojuredocs.org/clojure.walk |
Pandoc | Document converter | https://pandoc.org/ |
Pandoc filter | A program that can transform Pandoc JSON | https://pandoc.org/filters.html |
Babashka | Clojure runtime for scripting | https://babashka.org/ |
Note: we could use plain Clojure instead of Babashka. But Babashka is a good fit here because of its fast startup time.
Pandoc provides a common document format abstraction, and transformation from/to a wide range of formats. Let’s look at an example.
Given doc.md:
# Pandoc converts

> /pan/
>
> involving all of a (specified) group or region

"Pan-doc" like "pan-Atlantic", get it?

It supports lots of formats.
We can call pandoc:
#!/usr/bin/env bash
pandoc doc.md -o doc.html
To produce doc.html:
<h1 id="pandoc-converts">Pandoc converts</h1>
<blockquote>
<p>/pan/</p>
<p>involving all of a (specified) group or region</p>
</blockquote>
<p>“Pan-doc” like “pan-Atlantic”, get it?</p>
<p>It supports lots of formats.</p>
Given link.md:

See [teod.eu][teod]

[teod]: https://teod.eu
We can call pandoc:
#!/usr/bin/env bash
# pandoc link.md -t json | jq > link.json # pretty
pandoc link.md -o link.json # compact
To produce link.json:
{"pandoc-api-version":[1,22,2],"meta":{},"blocks":[{"t":"Para","c":[{"t":"Str","c":"See"},{"t":"Space"},{"t":"Link","c":[["",[],[]],[{"t":"Str","c":"teod.eu"}],["https://teod.eu",""]]}]}]}
Man, that’s a long line. Here:
{
  "pandoc-api-version": [1, 22, 2],
  "meta": {},
  "blocks": [
    {
      "t": "Para",
      "c": [
        {"t": "Str", "c": "See"},
        {"t": "Space"},
        {
          "t": "Link",
          "c": [
            ["", [], []],
            [{"t": "Str", "c": "teod.eu"}],
            ["https://teod.eu", ""]
          ]
        }
      ]
    }
  ]
}
See? It’s just data 🙂
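Since it's just data, ordinary map and vector functions are enough to poke at it. Here's a minimal sketch, assuming the JSON has already been read into Clojure data with keyword keys. The names `doc`, `link`, `link-url` and `link-text` are only for illustration.

```clojure
;; The link.json document from above, as Clojure data with keyword keys.
(def doc
  {:pandoc-api-version [1 22 2]
   :meta {}
   :blocks [{:t "Para"
             :c [{:t "Str" :c "See"}
                 {:t "Space"}
                 {:t "Link"
                  :c [["" [] []]
                      [{:t "Str" :c "teod.eu"}]
                      ["https://teod.eu" ""]]}]}]})

;; First block -> its content -> third inline element (the link).
(def link (get-in doc [:blocks 0 :c 2]))

;; A Link's content has three parts: attributes, link text, and [url title].
(def link-url (get-in link [:c 2 0]))     ; => "https://teod.eu"
(def link-text (get-in link [:c 1 0 :c])) ; => "teod.eu"
```

The path `[:c 2 0]` is the part worth remembering: it points at the URL inside any Link element.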
First, recap.
So, by leveraging Pandoc, we can create arbitrary transformations on anything*!
(*anything: https://pandoc.org/index.html)
So, how do we want to do this? A Pandoc filter takes JSON on stdin and produces JSON on stdout.
We can use jet and bb to do this:
# --keywordize turns JSON keys into keywords, so (:args *input*) finds the vector
echo '{"args": [1, 2]}' | \
jet --from json --keywordize | \
bb '(assoc *input* :sum (reduce + (:args *input*)))' | \
jet --to json
{"args":[1,2],"sum":3}
Nice!
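The middle step of that pipeline is really just a function from one EDN value to another. As a rough sketch of that shape in plain Clojure (no jet or bb involved), here is a hypothetical `edn-filter` helper that wraps any transform so it reads an EDN string and prints EDN back. It also shows why keyword keys matter: `(:args m)` finds nothing when the key is the string `"args"`.

```clojure
(require '[clojure.edn :as edn])

(defn edn-filter
  "Wrap `transform` so it reads one EDN value from a string
  and returns the transformed value printed as an EDN string.
  Hypothetical helper, only here to illustrate the shape of a filter."
  [transform]
  (fn [s]
    (pr-str (transform (edn/read-string s)))))

(def add-sum
  (edn-filter (fn [m] (assoc m :sum (reduce + (:args m))))))

;; Keyword keys: the lookup finds [1 2].
(:sum (edn/read-string (add-sum "{:args [1 2]}")))     ; => 3

;; String keys: (:args m) is nil, and (reduce + nil) is 0.
(:sum (edn/read-string (add-sum "{\"args\" [1 2]}")))  ; => 0
```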
rickroll.clj
I wanted to work on the Pandoc filter incrementally, writing up each step. That didn’t happen. I got in the zone and wrote everything at once. So you’ll get after-the-fact commentary instead. The source code renderer on play.teod.eu currently (2022-07-15) demands very short source code lines, so you can view rickroll.clj as a raw file or on GitHub if you’d like. Otherwise, keep on scrolling.
Here comes a full listing for the babashka script. We continue below the code listing!
(ns rickroll
  (:require
   [clojure.walk :refer [prewalk]] ; recursive transformation
   [clojure.edn]                   ; read pandoc JSON as EDN
   ))
(comment
  ;; a nice pattern for recursive transformation in Clojure:
  ;;
  ;; 1. walk
  ;; 2. change element if (predicate?)
  ;; 3. otherwise, leave it be.
  ;;
  ;; Example:
  (prewalk (fn [el]
             (if (string? el) ; touch strings
               (keyword el)   ; do this to strings
               el))           ; otherwise let it be
           {:big ["nested" "structure"]}) ; big thing in here
  )
;; Here's the predicate we're going to use later:
(defn pandoc-link?
  "Is this a valid Pandoc link?"
  [pandoc]
  (= "Link" (:t pandoc)))

;; I choose to pull "this is an empty element" out of the walk logic:
(defn pandoc-empty
  "Empty Pandoc element"
  []
  {})
;; What's the simplest link transform we could do?
;; Removing links is easy.
;; Let's start there.
;;
;; For the interested reader, Geepaw Hill provides some
;; great commentary on why you should take small steps.
;;
;; https://www.geepawhill.org/2021/09/29/many-more-much-smaller-steps-first-sketch/
;;
;; But I digress. Back to our totally serious project.
(defn remove-links [pandoc]
  (prewalk (fn [el]
             (if (pandoc-link? el)
               (pandoc-empty)
               el))
           pandoc))
;; To try, set `transform` to `remove-links` below :)
;; Finally, here's a rickroll pandoc filter:
(defn rickroll [pandoc]
  (let [;; I like to see an example of the data
        ;; structure I'm working with
        _link-example {:t "Link",
                       :c [["" [] []]
                           [{:t "Str", :c "teod.eu"}]
                           ["https://www.youtube.com/watch?v=dQw4w9WgXcQ" ""]]}
        ;; which made the assoc-in okay to write:
        rick-link (fn [el]
                    (assoc-in el [:c 2 0]
                              "https://www.youtube.com/watch?v=dQw4w9WgXcQ"))]
    ;; now, same (if predicate change no-change) pattern
    (prewalk (fn [el]
               (if (pandoc-link? el)
                 (rick-link el)
                 el))
             pandoc)))
;; I first tried running it all at once:
;;
;; pandoc -i doc.md --filter "bash -c \"jet --from json --keywordize | bb rickroll.clj | jet --to json\"" -o doc-no-links.md
;;
;; But it turns out, pandoc doesn't support this.
;; A filter must be a single script.
;; Filters can't take arguments.
;; So we need a wrapper.
;; More on the wrapper later.
;; I hard-code some example data so that "just running" gives me feedback:
(def example
  {:pandoc-api-version [1 22 2], :meta {},
   :blocks [{:t "Para",
             :c [{:t "Str", :c "See"}
                 {:t "Space"}
                 {:t "Link",
                  :c [["" [] []]
                      [{:t "Str", :c "teod.eu"}]
                      ["https://teod.eu" ""]]}]}]})
;; ... but if *in* looks right, use that.
(def input
  (try
    (clojure.edn/read *in*)
    (catch RuntimeException _
      ())))

(let [transform rickroll] ; choose rickroll or remove-links here
  (if (map? input)
    (transform input)
    (transform example)))
;; How to run without pandoc:
;;
;; cat link.json \
;; | jet --from json --keywordize \
;; | bb rickroll.clj \
;; | jet --to json --keywordize
;;
rickroll.sh
Pandoc’s --filter requires a single script. So here’s rickroll.sh:
#!/usr/bin/env bash
jet --from json --keywordize \
| bb rickroll.clj \
| jet --to json --keywordize
You can use rickroll.sh like this:
./rickroll.sh < link.json
{
  "pandoc-api-version": [1, 22, 2],
  "meta": {},
  "blocks": [
    {
      "t": "Para",
      "c": [
        {"t": "Str", "c": "See"},
        {"t": "Space"},
        {
          "t": "Link",
          "c": [
            ["", [], []],
            [{"t": "Str", "c": "teod.eu"}],
            ["https://www.youtube.com/watch?v=dQw4w9WgXcQ", ""]
          ]
        }
      ]
    }
  ]
}
Look at all those closing parens! 😄 Hiccup is quite compact when you think about it.
[:p "See " [:a {:href "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
            "teod.eu"]]
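To make the comparison a bit more concrete, here's a rough sketch of turning the Pandoc paragraph into Hiccup. `inline->hiccup` and `para->hiccup` are hypothetical helpers of my own, and they only know about `Str`, `Space` and `Link`. Any other element type throws, so treat it as a toy and not as how Pandoc or Hiccup do things.

```clojure
(defn inline->hiccup
  "Convert one Pandoc inline element (keyword keys) to Hiccup.
  Only handles Str, Space and Link; anything else throws."
  [{:keys [t c]}]
  (case t
    "Str"   c
    "Space" " "
    "Link"  (let [[_attr inlines [url _title]] c]
              (into [:a {:href url}] (map inline->hiccup inlines)))))

(defn para->hiccup
  "Convert a Pandoc Para block to a Hiccup :p vector."
  [{:keys [c]}]
  (into [:p] (map inline->hiccup c)))

(def rickrolled-para
  {:t "Para"
   :c [{:t "Str" :c "See"}
       {:t "Space"}
       {:t "Link"
        :c [["" [] []]
            [{:t "Str" :c "teod.eu"}]
            ["https://www.youtube.com/watch?v=dQw4w9WgXcQ" ""]]}]})

(para->hiccup rickrolled-para)
;; => [:p "See" " " [:a {:href "https://www.youtube.com/watch?v=dQw4w9WgXcQ"} "teod.eu"]]
```

The strings come out as separate `"See"` and `" "` pieces because Pandoc keeps words and spaces as separate elements.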
#!/usr/bin/env bash
# We could rickroll from the org file:
#
# pandoc --standalone \
# --from=org+smart \
# --shift-heading-level-by=1 \
# --toc \
# -i index.org \
# --filter rickroll.sh \
# -o rickroll-ourselves.html
# But that's too easy, so let's use the HTML file instead.
pandoc \
--standalone \
-V title:"" \
-i index.html \
--filter rickroll.sh \
-o rickroll-ourselves.html
Please head over to rickroll-ourselves.html!
extract_links.clj is rickroll.clj with some edits:
(ns extract-links
  (:require [clojure.walk :refer [prewalk]]
            [clojure.edn]))
(defn link?
  "Is this a valid Pandoc link?"
  [pandoc]
  (= "Link" (:t pandoc)))

(defn link-href [el]
  (when (link? el)
    (get-in el [:c 2 0])))
;; Keeping the old =rickroll= function for comparison.
(defn rickroll [pandoc]
  (let [;; I just copied in an example of what I was going to generate
        _pandoc-link-example {:t "Link",
                              :c [["" [] []]
                                  [{:t "Str", :c "teod.eu"}]
                                  ["https://www.youtube.com/watch?v=dQw4w9WgXcQ" ""]]}
        ;; which made the assoc-in okay to write
        link-to-rick (fn [el]
                       (assoc-in el [:c 2 0] "https://www.youtube.com/watch?v=dQw4w9WgXcQ"))]
    ;; now, just follow the walk pattern from above.
    (prewalk (fn [el]
               (if (link? el)
                 (link-to-rick el)
                 el))
             pandoc)))
(defn links [pandoc]
  (let [links-found (atom [])]
    (prewalk (fn [el]
               (if (link? el)
                 (do (swap! links-found conj
                            {:href (link-href el)})
                     el)
                 el))
             pandoc)
    @links-found))
(def example
  {:pandoc-api-version [1 22 2], :meta {},
   :blocks [{:t "Para", :c [{:t "Str", :c "See"}
                            {:t "Space"}
                            {:t "Link",
                             :c [["" [] []]
                                 [{:t "Str", :c "teod.eu"}]
                                 ["https://teod.eu" ""]]}]}]})
(def input
  (try
    (clojure.edn/read *in*)
    (catch RuntimeException _
      ())))

(if (map? input)
  (links input)
  (links example))
And here’s how to run it:
#!/usr/bin/env bash
pandoc index.org --to json \
| jet --from json --keywordize \
| bb extract_links.clj \
| bb '(clojure.pprint/pprint *input*)'
That produces:
[{:href "./.."}
 {:href "https://pandoc.org/MANUAL.html#general-options"}
 {:href "https://clojuredocs.org/clojure.walk"}
 {:href "https://pandoc.org/filters.html"}
 {:href "https://babashka.org/"}
 {:href "rickroll.clj"}
 {:href
  "https://github.com/teodorlu/play.teod.eu/blob/master/document-transform-pandoc-clojure/rickroll.clj"}
 {:href "https://github.com/weavejester/hiccup"}
 {:href "rick.html"}]
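Collecting into an atom during a walk works fine, but the same result can be had without mutation. Here's a sketch that swaps `prewalk` for `tree-seq`. That's my substitution and not what extract_links.clj does, and `links-seq` is a made-up name. `tree-seq` lazily lists every nested value in document order, so finding links becomes a plain `filter`.

```clojure
(defn link?
  "Is this a Pandoc Link element?"
  [el]
  (= "Link" (:t el)))

(defn links-seq
  "Return a vector of {:href ...} maps for every Link in `pandoc`,
  in document order, without using an atom."
  [pandoc]
  (->> (tree-seq coll? seq pandoc) ; every nested map, vector and leaf value
       (filter link?)
       (mapv (fn [el] {:href (get-in el [:c 2 0])}))))

(def example
  {:pandoc-api-version [1 22 2], :meta {},
   :blocks [{:t "Para", :c [{:t "Str", :c "See"}
                            {:t "Space"}
                            {:t "Link",
                             :c [["" [] []]
                                 [{:t "Str", :c "teod.eu"}]
                                 ["https://teod.eu" ""]]}]}]})

(links-seq example)
;; => [{:href "https://teod.eu"}]
```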
That’s all for now.
🙌