Apr 17, 2013

I will never CNAME my root domain again.

I will never CNAME my root domain again.

I will never CNAME my root domain again.

NEVER. EVER.

Apr 13, 2013

Hello Tumblr

Good bye, Posterous. Rest in peace.

Jun 6, 2012

A simple callback chain macro for elisp

The Problem

As usual, it started with a tiny piece of ugly code:

(bd-create-stage datafile-id
                 (lambda (stage-id)
                   (bd-insert-rows stage-id 
                                   [[10 20 30] [40 50 60]]
                                   (lambda (stage-id)
                                     (bd-commit-stage stage-id 
                                                      #'ignore)))))

The snippet above is basically a callback chain. When bd-create-stage finishes its work, it calls the first lambda, which calls bd-insert-rows with the second lambda as its callback argument and so on, until it all stops at the ignore function.

I wanted to rewrite it as something like this:

(=> datafile-id
    (bd-create-stage it next)
    (bd-insert-rows  it [[1 2 3 4 5] [6 7 8 9 0]] next)
    (bd-commit-stage it next))

Where the it variable would represent the current callback’s parameter and next would refer to the next callback in the chain. As with the -> macro, I wanted explicit anaphoric variables.

The Idea

Each line in the snippet above could be wrapped in a lambda, lust like this:

(=> datafile-id
    (lambda (next it)
      (bd-create-stage it next))
    (lambda (next it)
      (bd-insert-rows it [[1 2 3 4 5] [6 7 8 9 0]] next)
    (lambda (next it)
      (bd-commit-stage it next))))

Then it should somehow call each function in the list with the consequent function as the first parameter and the result of execution of the previous function as the second parameter.

The Solution

This function chaining thing looks a lot like a binary function fold:

(defun chain2 (f1 f2)
  (apply-partially f1 f2))

(defun chain (&rest fns)
  (if fns
      (reduce #'chain2 fns :from-end t)
    #'identity))

Applying chain to a function list creates a new function taking one parameter and passing it through the whole function list, much like the -> macro does.

In fact, this is enough to start working on the macro.

(defmacro => (initial &rest forms)
  `(funcall ,(build-form-chain forms) ,initial))

The build-form-chain function wraps each form into a lambda and then chains them together:

(defun build-form-chain (forms)
  `(apply #'chain 
          (list ,@(mapcar #'build-form-link forms) #'ignore)))

At the end it adds ignore as a terminator. The terminator is necessary because the last callback’s result is almost always ignored.

The build-form-link’s implementation is trivial:

(defun build-form-link (form)
  `(lambda (next it) ,form))

Done! Here’s the full source for your convenience:

(defun chain2 (f1 f2)
  (apply-partially f1 f2))

(defun chain (&rest fns)
  (if fns
      (reduce #'chain2 fns :from-end t)
    #'identity))

(defun build-form-link (form)
  `(lambda (next it) ,form))

(defun build-form-chain (forms)
  `(apply #'chain 
          (list ,@(mapcar #'build-form-link forms) #'ignore)))

(defmacro => (initial &rest forms)
  `(funcall ,(build-form-chain forms) ,initial))

Now let’s see how the macro expands:

ELISP> (macroexpand
     '(=> datafile-id
          (bd-create-stage it next)
          (bd-insert-rows  it [[1 2 3 4 5] [6 7 8 9 0]] next)
          (bd-commit-stage it next)))

(funcall (apply (function chain) 
                (list (lambda (next it) 
                        (bd-create-stage it next))
                      (lambda (next it)
                        (bd-insert-rows it [[1 2 3 4 5] [6 7 8 9 0]] next))
                      (lambda (next it)
                        (bd-commit-stage it next))
                      (function ignore)))
          datafile-id)

Exactly as intended.

This macro covers 95% of my callback chaining needs. For the rest 5% there is the all-powerful deferred.el library.

May 9, 2012

Thread operator in Elisp

TL;DR

ELISP> (-> 1
           (+ 2 it)
           (* 3 it))
9
ELISP> (macroexpand
           '(-> 1
                (+ 2 it)
                (* 3 it)))
(let* ((it 1) (it (+ 2 it)) (it (* 3 it))) it)

Implementation:

(defmacro -> (arg &rest forms)
  `(let* ((it ,arg) .
      ,(mapcar (lambda (form) `(it ,form))
           forms))
     it))

The Long Story

When I see code like this, I frown:

(defun bd-search (api-key query callback)
  (send-request "GET"
            (format "search?%s"
                (make-query-string `(("api_key" . ,api-key)
                             ("query" . ,query))))
        callback))

It’s a very simple case, yet the parameter list is already at the fourth level of indentation. When it gets really ugly I usually wrap the whole thing into a let statement and start moving inner parts into variables.

What I have noticed, however, is that almost always constructs like this are sequential by their nature, in other words the output of the innermost statement serves as input for the statement one level up, and so on and so forth. This is the very reason why Clojure had its thread operator macro since beginning.

Remembering that, I started literally morphing my bd-search function into something more prettier. I came up with this variant:

  (-> `(("api_key" . ,api-key)
    ("query" . ,query))
      (make-query-string it)
      (format "search?%s" it)
      (send-request "GET" it callback)))

Then I put together the -> macro and that was it.

I decided to make the macro anaphoric instead of implicitly injecting an extra parameter as in Clojure. This allowed me to put the threaded parameter at any place, not just at the beginning or at the end of the parameter list.

Aug 27, 2011

How much can be done in four hours

Today I had an awesome day at the first OpenDataBC hackathon which took place at Mozilla Labs Vancouver.

Tara Gibbs pitched this wonderful idea of consolidating shelter availability data and displaying it on a few window displays, so the homeless people living DTES would not waste their time going from one shelter to another just to find a free spot.

This doesn’t solve all the problems of course, but it does solve a little yet very annoying one.

So… At 11:30 we had nothing but an idea. We discussed possible approaches for a while, then came David Eaves and suggested using Twitter as a message queue service.

At approximately 12:00 we still had nothing but a piece of paper covered with boxes and arrows, then we started coding. Tara did the frontend, I was busy hacking the backend and the Twitter stuff.

Four hours later we had a fully functional, production ready system - https://github.com/mikeivanov/vanshelter

How it is supposed to work:

  1. Shelters tweet their availability data (they all have internet access)
  2. VanShelter monitors — each of them independently — receive Twitter updates and
  3. Refresh their displays when something changes.

For displays we can use cheap LCD monitors, probably even donated. The software will run on those amazing Raspberry thingies - http://www.raspberrypi.org/, $25 each. This brings the full cost of installing 10 displays down to $250+.

Thank you Tara and David. Also, thank you Jeff and all the people who made this hackathon possible.

Jun 28, 2011

Pure Python Paillier Homomorphic Cryptosystem Implementation

What

This is a very basic Paillier Homomorphic Cryptosystem implemented in pure Python.

The idea is, in short, to encrypt two numbers, perform an “add” operation on cyphertexts, decrypt the result and find it to be the sum of the original plaintext numbers.

How

The code is loosely based on the thep project and a few ActiveState recipes. The code is pure Python and all objects are serializable.

Where

Here: https://github.com/mikeivanov/paillier

Why

I was bored.

Jun 9, 2011

How to mount an NTFS-formatted USB drive in read-write mode on Mac OS X

Actually, it’s very easy. No additional software is required. Just seven easy steps:

  1. Attach your USB drive
  2. Open the Terminal app (Command-Space, then type “Terminal”, hit Enter)
  3. Type or copy/paste these commands:

    sudo sh -c "mkdir -p /mnt \
    $(mount | grep ntfs | head -n 1 \
       | awk '{ print "&& umount " $3 \
                   " && mount_ntfs -o nosuid,rw " $1 " /mnt" }')"
    
  4. Locate your drive in Finder

  5. Drag/drop files there
  6. Unmount the drive as usual
  7. DONE!

The command breakdown, if you’re interested:

  • mkdir -p /mnt creates a mount point — a place in the file system where the drive is going to be attached
  • The mount command without parameters gives you a list of the currently attached drives
  • grep ntfs filters non-ntfs drives out the list
  • head -n 1 grabs the first line (we’re assuming only one ntfs drive can be attached at a time)
  • The awk part produces two commands:
    • umount /Volumes/ — unmounts the drive from its original place
    • mount_ntfs -o nosuid,rw /dev/ /mnt — mounts the drive again, but this time in the read-write mode
  • Now, the sudo sh -c "..." thing allows code execution with superuser privileges.

That’s it.

Aug 20, 2010

Tail recursion without TCO

Emacs lisp has no Tail Call Optimization (TCO), neither do many other lisp dialects. The lack of TCO is not a big deal—it’s always possible to transform a tail recursive algorithm into a loop. However, it makes functions look uglier. Here is a very simple method of enabling Clojure-style tail call recursion in Emacs lisp:

;; A very simple linearized Y combinator.
;; All the state management stuff is incapsulated here.
;; Don't call it directly.
(defun rloop- (body &rest args)
  (let ((res nil))
    (while (progn
             ;; here's the idea: we keep calling body 
             ;; while it returns the recursion marker
             (setq res (apply body args))
             (when (and (consp res)
                        (eq :loop-recur-marker (car res)))
               (progn (setq args (cdr res))
                      t))))
    res))

;; Recursion marker factory
(defun recur (&rest args)
  ;; instead of a real recursive call,
  ;; just signal an intention to make one
  (cons :loop-recur-marker args))

;; The form macro
(defmacro rloop (init body)
  (let ((args (mapcar 'car init)))
    ;; a little courtesy to the macro users
    `(let* ,init
       ;; make a lambda from the body and pass it 
       ;; to the combinator function
       (rloop- (function (lambda (,@args) ,body))
               ,@args))))

Here’s how to use it:

(defun factorial (x)
  ;; this is the recursion entry point
  (rloop ((x   x) 
          (acc 1))
         (if (< x 1)
             acc ;; done, just return the result
           ;; not done, start the whole rloop block again
           (recur (1- x) 
                  (* x acc)))))

ELISP> (factorial 10)
3628800

The funny part is defun is not necessary. You can have as many sequential inlined rloops as you want. I like this approach: all the state management stuff is off the sight. The function code is almost identical to the underlying algorithm. Another classic example:

(defun fibo (x)
  (rloop ((x    x)
          (curr 0)
          (next 1))
         (if (= x 0)
             curr
           (recur (1- x) 
                   next 
                  (+ curr next)))))

ELISP> (fibo 10)
55

Nice, eh? Of course, this kind of beauty comes with a price. Here is how the rloop macro expands:

ELISP> (macroexpand '(rloop ((n 0)) (if (> n 5) n (recur (1+ n)))))

(let*
    ((n 0))
  (rloop-
   #'(lambda
       (n)
       (if
           (> n 5)
           n
         (recur
          (1+ n))))
   n))

…which means two extra function calls on each iteration. But realistically, it’s not such a big deal. Clarity of the code is way more important.

Apr 6, 2010

Clouds and entropy

In a post titled A Trusted Cloud Entropy Authority Reuven Cohen writes:

…maybe there an opportunity to create a trusted cloud authority to provide signed verified and certified entropy. Think of it like a certificate authority (CA) but for chaos. Actually, Amazon Web Service itself could act as this entropy authority via a simple encrypted web service call. I even have a name for it, Simple Entropy Service (SES).”

This is really a good idea. Amazon should have provided such a service long time ago.

When an SSL connection is being established, a browser and a server perform the Handshake protocol. This protocol involves exchanging random bits between the parties. The important thing is that security depends on how random those bits are. If they are not, the connection is effectively insecure.

In the case of AWS, there is no source of true randomness, therefore SSL on AWS is inherently insecure. Moreover, instances running on the same physical machine can affect each other’s security by draining the shared random pool in the host system.

Further he writes:

a website called http://random.org [is] a true random number service that generates randomness via atmospheric noise. Looks cool, maybe this may help solve the problem.”

I don’t think that random.org is a good choice.

One problem is a connection to such a service. It should be as secure as the most secure secret handled on your system. If the random bit connection is encrypted with 256 bit AES (and it actually is), this is the highest level of security your system can provide. Plus, there should be guarantee that no unencryped random bits are stored anywhere. The same is true for the proposed SAS service, too.

Another problem with random.org is… well, randomness is perceptive. What you see as “random” can be quite deterministic to the people who run the random.org service. Even though they might not store anything, their present is your future—just think about relativistic effects. A temptation to tamper with someone’s future can be, you know, very strong.

The overall quality of the service is not known. There is no guarantee it is random at all. A quote from their FAQ: “Q1.2: Is the source code for the generator available? — Not currently, no. Maybe I’ll make it available as open source some day.”

Even though the Whois database indicates the domain name’s registrant is located in France, the SSL certificate owner is not specified. I have no reasons for not believing the guy running the service, but I would not entrust my customers’ data into a total stranger’s hands, even though he or she seems to be a nice person.

So the conclusion is: while there is no trusted entropy generator on the AWS side, we, the AWS customers, are on our own.

Here is a hint: entropy seeds can be generated in-house and smuggled into instances over a secure channel. Then those seeds could be fed to a cryptographically secure RNG like Isaac to produce actual “random” bits. I think there should be a way of injecting those into the instance’s random pool.

Jul 17, 2009

Dynamic queries: the Postgres way

Have you ever been in a situation when you needed a query to be generated and executed dynamically as a result of another query?

If yes, read further: there is a simple and elegant way to achieve that.

First, create this function:

CREATE OR REPLACE FUNCTION dselect(varchar)
  RETURNS SETOF record AS $$
    DECLARE rec record;
    BEGIN
      FOR rec IN EXECUTE $1 LOOP
        RETURN NEXT rec;
      END LOOP;
    END
  $$ LANGUAGE 'plpgsql';

Second, umm.. well, that’s it.

Anything you pass as a parameter will be interpreted and executed as an SQL statement.

The function is SELECT-able. That is, you can use it in SELECTs:

SELECT * FROM dselect('SELECT id, name FROM users') AS t(id int, name varchar);

Note the AS t(id int, name varchar) part. Postgres has no idea about what this function returns, so a column definition list should be provided. If not, Postgres will complain:

SELECT * FROM dselect('SELECT id, name FROM users');
ERROR:  a column definition list is required for functions returning "record"

Of course, the column definition list depends on the query and should match the actual query results.

So why this function is needed at all?

Because of situations like this:

CREATE SCHEMA sc_foo;
CREATE TABLE sc_foo.activity (date date, descr text);
INSERT INTO sc_foo.activity (date, descr) VALUES ('2009-08-07', 'went there');
INSERT INTO sc_foo.activity (date, descr) VALUES ('2009-06-04', 'hanging around');

CREATE SCHEMA sc_bar;
CREATE TABLE sc_bar.activity (date date, descr text);
INSERT INTO sc_bar.activity (date, descr) VALUES ('2009-10-11', 'came here');

It allows you to do this:

SELECT 
    nspname 
FROM
    pg_namespace 
WHERE
    nspname LIKE 'sc_%'  AND
    (SELECT date FROM 
        dselect('SELECT max(date) FROM ' || nspname || '.activity') AS t(date date)) &lt; '2009-09-09';

 nspname 
---------
 sc_bar
(1 row)

In one pass this query goes over all the schemas whose names start with sc_, grabs the latest date from the schema’s activity tables and matches the result against the provided date.

Of course, this approach is quite ineffective. Each time the function is called, a query is parsed and executed using a separate plan. A simple UNION of two queries would do the same, but… what if there are ten schemas? How about a hundred? I’m actually working with a database containing thousands of them.

I use this function when I need to collect statistics or do some db administration tasks, it saves me a lot of time.

Navigate
« To the past Page 1 of 2
About