I've worked through the derivation of the Y combinator several times and have always found it a slippery concept. I'll "get it" for a while, but then I move on to other things, and by the time I come back I have to think it through again. One reason it's so hard to grip is the amount of syntactic noise necessitated by peripheral issues like applicative-order versus normal-order evaluation, even in very clean programming languages like Scheme.
Here, I will work an exercise (9.9, in fact) from Paul Hudak's lovely book, "The Haskell School of Expression," which results in the cleanest derivation of the Y combinator I've yet encountered. I haven't plumbed all the depths of Haskell yet, so I am not entirely sure how it sails around the applicative vs. normal issues that make Y slippery in Scheme, though lazy (non-strict) evaluation is surely a large part of it: a recursive unfolding only happens when something actually demands its value. However, I'll just "take it," because it does sail around.
First, here is the normal, recursive definition of mod (for non-negative a and positive b), expressed as Euclid's antenaresis procedure (repeated subtraction). You should be able to read this even if you don't know a thing about Haskell:
antenaresis a b = if a < b then a
                  else antenaresis (a-b) b
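As a quick sanity check on the claim that this is mod, here is a minimal sketch that compares the definition against the Prelude's mod. The type signature and the helper agreesWithMod are my additions, not part of the exercise, and the check only covers non-negative a and strictly positive b. For b <= 0 with a >= b the subtraction never brings a below b, so the recursion would not terminate.

```haskell
-- Same definition as in the text; the type signature is an addition.
antenaresis :: Integer -> Integer -> Integer
antenaresis a b = if a < b then a
                  else antenaresis (a - b) b

-- Hypothetical helper (not from the book): compare with the Prelude's mod
-- on a small grid of non-negative a and strictly positive b.
agreesWithMod :: Bool
agreesWithMod =
  and [ antenaresis a b == a `mod` b | a <- [0 .. 50], b <- [1 .. 12] ]
```

Loaded into GHCi, agreesWithMod should evaluate to True, and antenaresis 17 5 to 2.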
Now, here is the miraculously small definition of Y in Haskell:
y f = f (y f)
and, the answer to Paul's exercise,
ant = y (\g a b -> if a < b then a else g (a-b) b)
Painful, ain't it? Here's why it works. Let f be (\g a b -> ...), so (y f) is f (y f), that is, ((\g a b -> ...) (y (\g a b -> ...))). This is a function of two arguments, because the first argument, g, got sucked up by the fixed-point calculation: g gets bound to (y f), the fixed point, which is itself a function of two arguments. The remaining two arguments get bound to the variables a and b. Each call through g unfolds (y f) one more step, and the unfolding stops as soon as a < b. This is hardcore black magic, but it works. And it seems less slippery to me: I might be able to hold on to it longer.
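To make the unfolding concrete, here is a small sketch that can be loaded into GHCi. The name step and the type signatures are my own labels for the pieces described above. fix is the standard library's version of the same combinator, from Data.Function, included only for comparison.

```haskell
import Data.Function (fix)

-- The fixed-point combinator from the text, with a type signature added.
y :: (a -> a) -> a
y f = f (y f)

-- The non-recursive body: g stands for "the rest of the recursion".
-- (The name `step` is an assumption for this sketch.)
step :: (Integer -> Integer -> Integer) -> Integer -> Integer -> Integer
step g a b = if a < b then a else g (a - b) b

ant :: Integer -> Integer -> Integer
ant = y step

-- The same thing using the library combinator.
ant' :: Integer -> Integer -> Integer
ant' = fix step

-- Unfolding ant 7 3 by hand, using y f = f (y f) at each step:
--   ant 7 3
-- = y step 7 3
-- = step (y step) 7 3   -- g is now bound to (y step)
-- = y step 4 3          -- 7 < 3 is False, so call g (7 - 3) 3
-- = step (y step) 4 3
-- = y step 1 3          -- 4 < 3 is False, so call g (4 - 3) 3
-- = step (y step) 1 3
-- = 1                   -- 1 < 3 is True, so g is never touched again
```

The last line is where laziness earns its keep: once a < b holds, the argument (y step) is never evaluated, so the unfolding simply stops.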
For you Scheme-heads amongst us, here is the applicative-order Y combinator I use in regression tests for my Scheme implementations. I don't even want to look at this for very long, let alone dissect it like the one above, but knock yourselves out if you like:
(define appY
  (lambda (f)
    ((lambda (x) (f (lambda (a) ((x x) a))))
     (lambda (x) (f (lambda (a) ((x x) a)))))))
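For comparison, here is a rough Haskell rendering of the same self-application trick, offered as a sketch and not as a translation anyone should rely on. Haskell's type checker rejects a bare (x x), so the sketch routes it through a wrapper type. The names Self, applySelf, ySelf, and antSelf are all hypothetical. The extra (lambda (a) ...) wrapper from the Scheme version is not needed here, because lazy evaluation already delays the self-application. The NOINLINE pragma is a precaution: the GHC user's guide notes that its inliner can fail to terminate on recursion encoded through a data type like this.

```haskell
-- Wrapper that lets a value be applied to itself.
newtype Self a = Self { applySelf :: Self a -> a }

-- Self-application version of Y, mirroring the shape of appY:
-- both halves are the same function, and one is applied to the other.
ySelf :: (a -> a) -> a
ySelf f = half (Self half)
  where
    half x = f (applySelf x x)
    {-# NOINLINE half #-}

-- The same lambda as in the exercise, now tied off with ySelf.
antSelf :: Integer -> Integer -> Integer
antSelf = ySelf (\g a b -> if a < b then a else g (a - b) b)
```

In GHCi, antSelf 7 3 should give 1, the same answer as the version built on y f = f (y f).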