Array programming


https://en.wikipedia.org/wiki/Array_programming

Array programming enables performant, ergonomic handling of tabular data.

My definition:

When we program with row-oriented data, we typically want to work on one row at a time. A row can be a customer, a document, or a user. Memory and speed are rarely a problem.

When we program with column-oriented data, we typically want to work on one column at a time. A column can be an array of points in time, or a sample from a probability distribution.

Array programming is a rich discipline centered around column-oriented data.
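To make the row/column distinction concrete, here is a minimal plain-Java sketch, assuming Java 16+ for the `record`. `Customer` and `CustomerColumns` are hypothetical names. The same three customers are stored once as a list of row objects and once as one primitive array per field, and a column-at-a-time operation (the mean of `spend`) becomes a loop over a single `double[]`.

```java
import java.util.List;

/** Contrasts row-oriented and column-oriented layouts of the same small table. */
public class RowsVsColumns {

    /** Row-oriented: one object per customer; natural when handling one customer at a time. */
    record Customer(String name, int age, double spend) {}

    /** Column-oriented: one array per field; index i across the arrays is row i. */
    static final class CustomerColumns {
        final String[] name;
        final int[] age;
        final double[] spend;

        CustomerColumns(List<Customer> rows) {
            int n = rows.size();
            name = new String[n];
            age = new int[n];
            spend = new double[n];
            for (int i = 0; i < n; i++) {
                Customer c = rows.get(i);
                name[i] = c.name();
                age[i] = c.age();
                spend[i] = c.spend();
            }
        }
    }

    /** Whole-column operation: a tight loop over one contiguous double[]. */
    static double mean(double[] column) {
        double sum = 0.0;
        for (double x : column) sum += x;
        return sum / column.length;
    }

    public static void main(String[] args) {
        List<Customer> rows = List.of(
                new Customer("ada", 36, 120.0),
                new Customer("bob", 41, 80.0),
                new Customer("cy", 29, 100.0));

        // Row at a time: look at one customer.
        System.out.println(rows.get(0).name()); // prints "ada"

        // Column at a time: summarise one field over every customer.
        CustomerColumns cols = new CustomerColumns(rows);
        System.out.println(mean(cols.spend)); // prints 100.0
    }
}
```

The column layout is what makes whole-column work cheap: the mean touches one contiguous primitive array instead of following a pointer to each row object.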

References

Sequences And Arrays

By Chris Nuernberger.

Array languages for Clojurians

By Dave Liepmann.

Journal

2023-03-19

why

I’d like to make a Clerk notebook that demonstrates how Latin hypercube sampling improves on plain random sampling for Monte Carlo methods. But… I don’t know where to start.

  1. I’d like to be very lazy when sampling – ideally work directly from readers (tech.ml style) and allocate as little as possible. It looks like we might have to sample the initial data, though. Hmm.
  2. I want nice tabular data display in Clerk.
  3. I want nice histograms in Clerk.
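Item 1 asks for sampling that allocates as little as possible. For plain random sampling that is possible if the random numbers come from a counter-based generator: the value at index i is a pure function of (seed, i), so a "column" can be read lazily and in any order. The sketch below assumes that approach and uses the SplitMix64 mixing function; `UniformReader` and `read` are hypothetical names in the spirit of a tech.ml reader, not its actual API. The shuffle step of Latin hypercube sampling is harder to keep lazy, since a plain permutation array costs O(n) memory.

```java
/**
 * A lazy, random-access column of uniform doubles in [0, 1).
 * Nothing is stored per element: read(i) derives the value from (seed, i).
 * Hypothetical sketch, not tech.ml's reader API.
 */
public final class UniformReader {
    private static final long GOLDEN_GAMMA = 0x9E3779B97F4A7C15L;

    private final long seed;
    private final long size;

    public UniformReader(long seed, long size) {
        this.seed = seed;
        this.size = size;
    }

    public long size() {
        return size;
    }

    /** Value at index i; the same (seed, i) always yields the same value. */
    public double read(long i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException(Long.toString(i));
        // i-th output of SplitMix64 started at `seed`: advance the state by (i + 1) gammas, then mix.
        long z = seed + (i + 1) * GOLDEN_GAMMA;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        z = z ^ (z >>> 31);
        // Keep the top 53 bits and scale them into [0, 1).
        return (z >>> 11) * 0x1.0p-53;
    }

    public static void main(String[] args) {
        UniformReader u = new UniformReader(42L, 1_000_000_000L);
        // Random access into a billion-element "column" without materialising it.
        System.out.println(u.read(0));
        System.out.println(u.read(999_999_999L));
    }
}
```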

j?

Both Dave Liepmann and Chris Nuernberger quickly bring up J, the programming language.

Perhaps I should just start with plain Java and arrays?

Latin hypercube sampling:

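  1. choose the number of dimensions of the cube (d)
  2. choose the sample size (N)
  3. for each dimension,
    1. split that dimension's range into N equally wide buckets
    2. draw one random value inside each bucket
  4. finally, shuffle each dimension's vector of N values independently; sample point i is then made of the i-th value from every dimension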
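These steps translate almost line by line into plain Java. This is a minimal sketch, assuming the cube is the unit cube [0, 1)^d (rescale afterwards for other ranges), with `n` as the sample size N, and using `java.util.Random` plus a Fisher-Yates shuffle for the final step. `LatinHypercube.sample` is a hypothetical name. The result is column-oriented: one `double[]` per dimension.

```java
import java.util.Arrays;
import java.util.Random;

/** Latin hypercube sampling on the unit cube [0, 1)^d. */
public final class LatinHypercube {

    /**
     * Returns d columns of n values each: samples[k][i] is coordinate k of point i.
     * Column-oriented on purpose: each dimension is one double[].
     */
    public static double[][] sample(int d, int n, Random rng) {
        double[][] samples = new double[d][n];
        for (int k = 0; k < d; k++) {
            double[] column = samples[k];
            // Split [0, 1) into n buckets of width 1/n and draw one value inside bucket b.
            for (int b = 0; b < n; b++) {
                column[b] = (b + rng.nextDouble()) / n;
            }
            // Shuffle this dimension independently (Fisher-Yates) so buckets pair up at random across dimensions.
            for (int i = n - 1; i > 0; i--) {
                int j = rng.nextInt(i + 1);
                double tmp = column[i];
                column[i] = column[j];
                column[j] = tmp;
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        double[][] s = sample(2, 5, new Random(42));
        for (double[] column : s) {
            System.out.println(Arrays.toString(column));
        }
    }
}
```

Sorting any one column puts exactly one value in each bucket [b/n, (b+1)/n). Without the per-dimension shuffle, point i would have every coordinate in bucket i, so all points would sit on the cube's diagonal.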

buuut, starting simpler, let’s just sample “normally”: plain independent uniform draws, no buckets.
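For comparison, a plain random sampler with the same column layout, again a sketch on [0, 1)^d with hypothetical names. The helper `emptyBuckets` counts how many of the n equal-width buckets in one column received no value. Latin hypercube sampling guarantees zero per dimension by construction, whereas with n independent draws the expected number of empty buckets is n(1 - 1/n)^n, roughly n/e, or about 37%.

```java
import java.util.Random;

/** Plain (non-stratified) random sampling on [0, 1)^d, column-oriented. */
public final class PlainSampling {

    /** samples[k][i] is coordinate k of point i; every value is an independent uniform draw. */
    public static double[][] sample(int d, int n, Random rng) {
        double[][] samples = new double[d][n];
        for (int k = 0; k < d; k++) {
            for (int i = 0; i < n; i++) {
                samples[k][i] = rng.nextDouble();
            }
        }
        return samples;
    }

    /** Number of the n = column.length equal-width buckets on [0, 1) that received no value. */
    public static int emptyBuckets(double[] column) {
        int n = column.length;
        boolean[] hit = new boolean[n];
        for (double x : column) {
            // Clamp so a product that rounds up to n still lands in the last bucket.
            hit[Math.min(n - 1, (int) (x * n))] = true;
        }
        int empty = 0;
        for (boolean h : hit) {
            if (!h) empty++;
        }
        return empty;
    }

    public static void main(String[] args) {
        double[][] s = sample(1, 100, new Random(42));
        System.out.println(emptyBuckets(s[0]) + " of 100 buckets empty");
    }
}
```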

Then we can define a uniform->dice transformer: a function that turns each uniform sample into a die roll.
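A minimal uniform->dice transformer, assuming a fair six-sided die: split [0, 1) into six sub-intervals of width 1/6 and map sub-interval k to face k + 1. `Dice`, `fromUniform`, `fromUniforms` and `histogram` are hypothetical names. The per-face counts are the raw material for the histograms wished for in the list above.

```java
import java.util.Arrays;
import java.util.Random;

/** Turns uniform samples in [0, 1) into fair six-sided dice rolls (assumption: a d6). */
public final class Dice {
    static final int FACES = 6;

    /** Maps one uniform value u in [0, 1) to a face 1..6; each face covers a sub-interval of width 1/6. */
    public static int fromUniform(double u) {
        if (!(u >= 0.0 && u < 1.0)) throw new IllegalArgumentException("u must be in [0, 1): " + u);
        // Defensive clamp so the face never exceeds 6.
        return 1 + Math.min(FACES - 1, (int) (u * FACES));
    }

    /** Applies the transformer to a whole column at once. */
    public static int[] fromUniforms(double[] us) {
        int[] rolls = new int[us.length];
        for (int i = 0; i < us.length; i++) rolls[i] = fromUniform(us[i]);
        return rolls;
    }

    /** Counts per face: counts[f - 1] is how often face f came up. */
    public static int[] histogram(int[] rolls) {
        int[] counts = new int[FACES];
        for (int r : rolls) counts[r - 1]++;
        return counts;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[] us = new double[6000];
        for (int i = 0; i < us.length; i++) us[i] = rng.nextDouble();
        System.out.println(Arrays.toString(histogram(fromUniforms(us))));
    }
}
```

Because the transformer only looks at which sixth of [0, 1) a value falls in, it applies equally to plain random samples and to Latin hypercube samples. With a Latin hypercube column whose N is a multiple of 6, each face covers exactly N/6 buckets, so every face comes up N/6 times.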