What are the trade-offs between functional programming and imperative programming?

Composition is at the heart of writing software well. 

One reason is that software must be written for people to read and understand.  When this is not a priority, a system takes on additional technical debt as the software deteriorates into legacy code.  Bear in mind that code would never earn that label if it remained well understood and easy to change and extend.

Composition is first about well-defined responsibilities.  A component should know how to perform a particular task and do it well; it shouldn't even be aware that there are other kinds of tasks.  Composing by its very nature is about isolating concerns.  When components are designed this way it makes them easy to glue together to perform more complex tasks.  Few properties matter more when it comes to writing systems that stand up to the pressure of ongoing change.

Take a list of employees.  Each year we run a process that finds every employee who has worked 5 years or more and gives them an extra paid day off.


DB.GetEmployees().
  Where(emp => emp.getYearsEmployed() >= 5).
  Select(emp => emp.getBenefits()).
  Each(bene => bene.addPTODay(1))


For Each emp in DB.GetEmployees()
  If emp.getYearsEmployed() >= 5 Then
    var bene = emp.getBenefits()
    bene.addPTODay(1)
  End If
Next

Note the differences. 

Each step in the functional approach isolates a concern whereas the imperative approach intermingles them into a more complex unit of work.  The functional program at one moment is concerned with filtering (Where) and a moment later projecting (Select).  It deals with these concerns one at a time composing them into a greater concern.  The imperative approach balls them up.

Consider the practicality.  Using the functional approach one has an easier means of executing just part of the composition.  In this situation it might be useful, in a REPL for example, to select the targeted employees and confirm we have the right ones before executing the side effects.  You might liken this to running a SELECT before you actually run your UPDATE in T-SQL.  This isn't quite as easy with imperative code.
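To make that concrete, here's a rough Python analogue of the example, with hypothetical Employee and Benefits classes standing in for the real ones.  The query half of the pipeline runs and is inspected first; the side effects happen only afterward.

```python
# Hypothetical stand-ins for the Employee/Benefits objects above.
class Benefits:
    def __init__(self):
        self.pto_days = 0

    def add_pto_day(self, n):
        self.pto_days += n


class Employee:
    def __init__(self, name, years_employed):
        self.name = name
        self.years_employed = years_employed
        self.benefits = Benefits()


employees = [Employee("Ada", 7), Employee("Grace", 3), Employee("Alan", 5)]

# Step 1: run just the filter and inspect the result, as you would in a REPL.
eligible = [e for e in employees if e.years_employed >= 5]
print([e.name for e in eligible])  # confirm we have the right people

# Step 2: only then execute the side effects.
for e in eligible:
    e.benefits.add_pto_day(1)
```

The filtering step is a value you can look at, pass around, and test; the mutation is confined to the final loop.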

The functional way is more modular and human parsable, if you will.  It's easier to insert other components into the pipeline because the concerns are not intermingled.  It's equally easy to drop a concern from the pipeline.  Let's give everyone an extra day off: just drop the Where.  Doing the same to the imperative code would involve dropping the If, removing the If block terminator, and unindenting the code.  This lifts code from being conditional to unconditional, but the manner is less straightforward and thus more likely to be done incorrectly.  If many more concerns had been involved this would have been even more perilous.

The functional approach pushes the side effects (state changes) to the end more neatly separating them from the semantics of finding the right data on which to act.  The imperative approach performs the actions somewhere in the middle of the block.  The work is performed in a larger, more-deeply-nested context which has references to variables that are not necessarily relevant to the work happening on any given line. 

Bear in mind how trivial this example is.  A real process can be far more involved, potentially having dozens of variables and dealing with several conditional branches.  One could imagine nesting 3, 5, even 7 levels deep.  The functional model holds up better under these circumstances because it works hard to isolate concerns and treats them as components.  When it's all said and done, composition is at the very heart of functional programming.  Programs are built up compositionally from bottom to top.

So what does imperative programming do better?

Well, intermingling a process rather than composing it allows it to be more aggressively optimized, meaning faster — though the functional style does in some instances outperform the imperative style.

I have generally found that programmers find it easier to write programs imperatively.  I chalk this up partly to institutional training (we've been taught this way of thinking from early on) and partly to the fact that I think it better matches how people naturally reason about solving real-world problems.

Because the imperative way freely permits mutation, it's generally easier to effect state changes (save data).  In the above functional example, I performed a mutation when I added the paid day off.  If I had been more strictly functional I would have used an even more isolated method of effecting the change.  This is because the functional paradigm wants to deal primarily in immutable data. 

Having immutable data by default forces a programmer to effect state changes in a more constrained way.  This is a double-edged sword.  In one respect it forces programs to be written in a style that makes them easier to reason about.  In another, it adds a layer of complexity in that you have to devise a strategy for handling state change.  This often means that, as demonstrated in the example above, one generally defers dealing with state change until the last possible moment.  Only practice will enable one to understand how to do this well.  While imperative programmers don't have to deal with this complexity because they can modify state willy-nilly, there's benefit in learning how to isolate state changes from all the other work a program needs to do.
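One common strategy for deferring state change can be sketched in a few lines of Python (the scenario and names are mine, purely for illustration): compute a description of the updates purely, then apply them in one small impure function at the end.

```python
def plan_raises(salaries, pct):
    """Pure: returns (name, new_salary) pairs without touching anything."""
    return [(name, round(s * (1 + pct), 2)) for name, s in salaries.items()]


def apply_raises(salaries, plan):
    """Impure: the only function that mutates state."""
    for name, new_salary in plan:
        salaries[name] = new_salary


salaries = {"Ada": 100.0, "Grace": 90.0}
plan = plan_raises(salaries, 0.10)  # easy to inspect and test in isolation
apply_raises(salaries, plan)        # side effects confined to one call
```

Everything up to the last line is ordinary immutable-style code you can test without any setup; the mutation is squeezed into a single place.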

When I think about designing a functional program I think of a tube of toothpaste, where the toothpaste is dealing with state change.  You squeeze the toothpaste from the back of the tube toward the front and out the nozzle as best you can.  Inevitably, it gets harder and harder to get out that last bit, but you try.  And finally, a small bit remains that just won't come out.  Writing a functional program is like that.  While you do everything possible to squeeze all of it out, a bit of it must remain.

What are the advantages of Haskell over other functional programming languages?

With Haskell, I can produce correct code significantly faster than in other languages. Here’s why:

  • An expressive type system promotes agility better than the so-called “agile programming languages” because of how types support radical refactoring. When I decide to change an interface deep in some large system, the type checker subsequently tells me everywhere else in the program I need to change to get it up-to-date. Other languages’ type systems can sort of do this, but it requires more expressiveness than you get from Java to make types more helpful than annoying.
  • No amount of testing can deliver the confidence that passing the type checker does in proving the absence of some varieties of bugs. In the iterative refinement cycle, once a program types again, the vast majority of the time it passes the regression suite as well. Types are the first line of defense.
  • Purity makes a program easier to reason about. Side-effects are often used for non-local communication, which leads to a variety of difficult bugs. We need side-effects sometimes, and Haskell’s design makes it very easy to see where effectful computation is happening and where it isn’t. It may also turn out that referential transparency is the key to allowing mere mortals to do parallel programming.
  • Type classes provide ad-hoc overloading, which makes some things more pleasant than in other members of the ML family. More importantly, advanced type class techniques provide type-level metaprogramming, which often allows for type-directed synthesis of what would otherwise be large amounts of boilerplate. Boilerplate is annoying and, because it requires maintenance, expensive.
  • Monad transformers are a great way of layering effects in otherwise pure code. Very, very little of my code is in the IO monad; in my current project, it’s confined to two files that together comprise less than 5% of the total line count. On the other hand, most major system components run in a monad specialized for what they need to do, which makes it very easy to see what kind of information flows where.
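The ad-hoc overloading that type classes provide has a loose dynamic analogue in Python's functools.singledispatch, which picks an implementation by the argument's type.  This is only a sketch of the idea, since type classes are resolved statically and are considerably more powerful; the pretty function here is a made-up example.

```python
from functools import singledispatch


@singledispatch
def pretty(x):
    # Default case: fall back to str.
    return str(x)


@pretty.register(list)
def _(xs):
    # Lists render recursively, element by element.
    return "[" + ", ".join(pretty(x) for x in xs) + "]"


@pretty.register(bool)
def _(b):
    # Booleans get their own rendering.
    return "yes" if b else "no"


result = pretty([1, True, [2, False]])
```

Each call site stays the same (`pretty(x)`) while the implementation chosen varies with the type, which is the essence of ad-hoc overloading.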

As far as I’m concerned, Haskell has one big downside:

  • It has no macro system. I use Template Haskell, and it helps in some circumstances, but it’s no substitute for a real macro facility such as syntax-case.

I wouldn’t claim that Haskell is a silver bullet (http://en.wikipedia.org/wiki/No_…) — it may not turn a 10X into a 100X — but my experience is that if you can learn it well, it will increase your productivity. It’s more fun, too.

What are the trade-offs between effect typing, monads, and uniqueness typing?

It depends, of course, on what you’re trying to accomplish — they certainly aren’t interchangeable. If your goal is to get a handle on state in some way, consider:

  • A type-and-effect system usually allows you to write in direct style — the ‘natural’ style that you’re used to — whereas monads (usually) require explicit sequencing, and uniqueness types require explicit threading.
  • If your goal is to model effects in a pure language, an effect system won’t do it. You can generally embed uniqueness types into an effect system, and you can embed an effect system into an (indexed) monad, but then you’re back in explicit-sequencing-land. Likewise, you can use a state monad in a uniqueness-type system to avoid explicit threading, but then you’re stuck with explicit sequencing.
  • Uniqueness types and effects are both features that have to be built into your type system, whereas monads are a programming technique that can be applied in any higher-order language, or provided as a library.
  • Monads can encapsulate a variety of effects (such as non-determinism) that don’t fall into a uniqueness-type framework, and probably don’t work out as neatly in an effect system. However, no one knows how to compose these things in general — it usually has to be worked out on a case-by-case basis.
  • Uniqueness types and effect systems can make it clear when some code accesses or affects only a portion of some larger state, which may be more difficult to see using a monad.
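To make the explicit-sequencing point concrete, here is a toy state monad in Python (all names are mine, and real monadic code would hide the nesting behind syntactic sugar such as Haskell's do-notation).  Each computation is a function from a state to a (value, new state) pair, and bind wires the state through by hand.

```python
def unit(x):
    # Wrap a plain value without touching the state.
    return lambda s: (x, s)


def bind(m, f):
    # Run m, then feed its value to f, threading the state through.
    def run(s):
        v, s2 = m(s)
        return f(v)(s2)
    return run


def get():
    # Read the current state as the value.
    return lambda s: (s, s)


def put(new):
    # Replace the state.
    return lambda s: (None, new)


# Increment the counter and return its old value.  The sequencing
# (get, then put, then return) is spelled out via nested binds, where
# direct style would just be `old = n; n += 1; return old`.
tick = bind(get(), lambda n:
       bind(put(n + 1), lambda _:
       unit(n)))

value, state = tick(41)
```

The nesting is exactly the explicit sequencing the bullet points describe; an effect system would let you write the direct-style version and still track the state access in the types.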

Many of the above generalizations are not strictly true, but I hope they’re close enough. I’d also suggest looking at linear type systems as an alternative to uniqueness types — they can express many of the same things, but I think linear types are cleaner.

What are the main advantages and use-cases of currying?

If your functions are in curried form, that makes it more convenient to partially apply them. It's never necessary, but it's often syntactically lighter. For example, suppose that you have a list of integers xs, and you'd like to add 3 to each of them.

In Standard ML, the operator + has type int * int -> int. If you want to add 3 to each element of a list using map, you need a one-argument function that adds 3, which you can get by partially applying +:

map (fn y => 3 + y) xs

In Ocaml, on the other hand, + is curried — that is, it has type int -> int -> int. (The arrow right-associates, so that's the type of a function that consumes an integer and produces a function that consumes an integer and produces an integer.)  This makes partially applying + a bit shorter:

map ((+) 3) xs

These same techniques work in untyped, higher-order languages such as Perl and Python as well, but it's easier to explain in a typed language, where you can see the difference between curried and uncurried + in their types.
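In Python, for instance, functions are not curried by default, but functools.partial plays the role that currying plays in OCaml; here both spellings of "add 3 to every element" are shown side by side.

```python
from functools import partial
from operator import add

xs = [1, 2, 3]

# Like the SML version: build the one-argument function by hand.
explicit = list(map(lambda y: 3 + y, xs))

# Like OCaml's ((+) 3): partially apply the two-argument add.
partial_applied = list(map(partial(add, 3), xs))
```

Both produce the same list; partial application just saves you from writing the wrapper lambda.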

If you write your functions in a curried style, the order of the arguments affects which arguments you can partially apply conveniently. For example, given a list of floats in Ocaml, it's syntactically lighter to find the reciprocal of every number than to divide every number by 2:

map ((/.) 1) xs             (* reciprocal of every number in xs *)
map (fun x -> x /. 2) xs    (* half of every number in xs *)

There are a few solutions to this problem. One, employed by Haskell, is operator sections, which allow partially applying either operand:

map (1 /) xs    -- reciprocal of every number in xs
map (/ 2) xs    -- half of every number in xs

More generally, the Scheme cut macro (http://srfi.schemers.org/srfi-26…) facilitates arbitrary partial application. For example, given a vector v and a list of indices into that vector indices, we can set each of those indices in the vector to 0:

(for-each (cut vector-set! v <> 0) indices)

In this example, <> is the placeholder for the argument that isn't partially applied; cut actually allows an arbitrary number of placeholders. Note that this is not an example of currying, but an attempt to provide a more flexible way to do partial application than currying provides.
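The same idea can be imitated in Python with a small, hypothetical cut-like helper: a sentinel object stands in for Scheme's <> placeholder, and the returned function fills the holes with its arguments at call time.

```python
HOLE = object()  # stands in for Scheme's <> placeholder


def cut(f, *slots):
    """Fix some argument slots now; HOLEs are filled at call time."""
    def call(*args):
        it = iter(args)
        filled = [next(it) if s is HOLE else s for s in slots]
        return f(*filled, *it)  # any extra args go on the end
    return call


v = [10, 20, 30, 40]
indices = [1, 3]

# Like (cut vector-set! v <> 0): fix v and 0, leave the index open.
set_zero = cut(lambda vec, i, x: vec.__setitem__(i, x), v, HOLE, 0)
for i in indices:
    set_zero(i)
```

As with the Scheme version, this is general partial application rather than currying: any combination of slots can be fixed or left open, in any order.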

How long does it take to master Erlang?

Try finding an online tutorial, but don't be afraid to ask a friend for help — if you're fortunate enough to have one that knows Erlang.  The core language isn't too bad if you're familiar with FP, but the associated concepts like gen_*, supervision hierarchies, and error handling are tricky.  Don't be ashamed to ask for help … it's not as straightforward as most of the languages you've learned.

Implement the things you're learning.  After you learn a concept, try building something that uses it.  If nothing comes to mind, ask someone for an assignment or find a tutorial with homework.  Reading tutorials and manuals will tell you how a feature works, but you won't really understand where it's appropriate to use it until you've actually used it incorrectly a few times.  Experience is key; you'll probably find that after a while of programming Erlang, everything you wrote more than two weeks ago is complete crap.

Once you get into the real world use http://erldocs.com for reference.  It's the same material as http://www.erlang.org/doc but presented better.

What is Functional Reactive Programming?

Fundamentally, functional reactive programming (FRP) is programming declaratively with time-varying values. The idea is to model things like user input and animations in a more direct, declarative way by making their behavior over time more explicit.

In a normal imperative program for something like a GUI, you model change over time implicitly in two ways: using mutable state and using callbacks. FRP models the same notions by introducing two data types representing values that change over time, events and behaviors:

  • events represent a value at a particular time. For example, a keypress would be represented as an Event Char. Pressing the a key at time t would be represented as the pair (t, 'a'). In practice, we never program directly with events; instead, we deal with potentially infinite streams of events ordered by time. So a keyboard input would be a stream of Char events. You can think of streams as representing a value which only exists sometimes: the typed character only exists when the key is struck.
  • behaviors represent values that vary continually over time. These are values that always exist but can be constantly changing. The mouse position is a perfect example: it would be a Behavior Point. The mouse always has a position, but when it's moving that position is constantly changing.

Conceptually, you can think of event streams as infinite lists of time value pairs and behaviors as functions from a time to a value. However, time is not usually explicit; instead, it is managed by the FRP runtime system. This means that your entire program is made by combining behaviors and events in different ways–you never ask for the value of something now but instead define how it behaves based on other time-varying values.

The advantage of this sort of programming is that it abstracts over managing values that change with time. It makes dependencies between different values more explicit and gives you some very nice combinators. This makes it fairly easy to express your logic at a higher level instead of dealing with the implementation details of mutable state and callbacks.

My favorite combinator for illustrating the declarative nature of FRP is when. The idea is simple: given a Behavior Bool–a continuously varying boolean–we can use it to filter a stream of events. Here's an example adapted from a very simple game of life applet I wrote:

when (not <$> paused) (step <$ time)

The idea here is very simple. paused is a behavior that tells you whether the game is paused or not. time is a stream of events from a timer with one event every n seconds. step <$ time just creates a stream of functions that advance the simulation by one turn every n seconds. The entire expression creates a stream of step functions every n seconds as long as the game isn't paused.
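For something executable, here is a toy Python model of the same idea, under the simplifying assumption that a behavior is a function from time to a value and an event stream is a finite list of (time, value) pairs; real FRP systems manage time for you, so this is only a mental model.

```python
def when(behavior, events):
    """Keep only the events that fire while the boolean behavior is True."""
    return [(t, v) for (t, v) in events if behavior(t)]


# The game is paused between t=3 and t=6.
paused = lambda t: 3 <= t < 6
not_paused = lambda t: not paused(t)

# A timer event every tick, each carrying a step function that
# advances the simulation by one turn.
step = lambda world: world + 1
time_events = [(t, step) for t in range(10)]

# Like `when (not <$> paused) (step <$ time)`: step events only
# while the game is unpaused.
steps = when(not_paused, time_events)
```

The events at t = 3, 4, 5 are dropped, so the simulation simply stops advancing while paused, with no callbacks or mutable pause flag checked inside a loop.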

I like this particular example because it's simple and easy to relate to. Most of the other combinators are in a similar vein. 

Your entire program in FRP ends up being a combination of events and behaviors like this–you start with some primitive ones like mouse actions and combine them to create your own. Finally, you just plug the events/behaviors into outputs as appropriate. For example, if you have a Behavior String, you can just plug that into a label and have the label update to the correct value automatically.

While FRP is probably most often used for GUIs, it can also be used for things like games, synthesizing music or even controllers for robots. It's a very nice technique for specifying reactive systems in general, which are systems that can continually respond to inputs.

What is a monad?

In functional programming one typically only deals with one category, the category of types. This is commonly called Hask when we're dealing with Haskell.

Keep in mind that Hask is not just a collection (or class) of types. It also contains morphisms between the types. These are just functions.

A functor is sort of like a function between two categories. A functor from a category to itself is an endofunctor — scary name, simple concept.

An endofunctor in Haskell is basically a type constructor — a function that takes a type and produces a new type. Note, though, that a functor doesn't just deal with the objects of the category (types in our case) but also with the morphisms. That's the job of fmap in Haskell. (So what is called a functor in Haskell parlance is perhaps more accurately called an endofunctor.)

A monad is an endofunctor — again, just a Functor in Haskell — along with two other special operations (called natural transformations, but let's not worry about that too much). Remember that the endofunctor here comes with a type constructor, say T.

One of these operations, called return or lift, takes an object of type x and produces an object of type T x. That's simple enough. So, for instance, in the list monad this operation takes an object, say of type Int, and it produces an object of type [Int]. (Which object, specifically, might it produce? Of course, just the singleton list whose element is precisely the provided integer.)

The other special operation, called join, is a bit more interesting. It takes an object of type T (T x) and produces an object of the simpler type T x — it flattens the two successive applications of the type constructor. Again, as with all this stuff, it's actually very simple. In the list monad, for instance, this join operation is just list concatenation — it takes a list of lists and produces a single list.
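The two list-monad operations just described can be written out in a few lines of Python (unit and join are my names for return/lift and join; bind is the derived operation Haskell calls >>=).

```python
def unit(x):
    # return/lift: an Int becomes a singleton [Int].
    return [x]


def join(xss):
    # join: flatten [[a]] to [a] -- for lists this is just concatenation.
    return [x for xs in xss for x in xs]


def bind(xs, f):
    # bind can be derived: map f over the list, then join the result.
    return join([f(x) for x in xs])
```

For example, bind([1, 2], lambda x: [x, x * 10]) maps to [[1, 10], [2, 20]] and then joins to [1, 10, 2, 20], which is exactly the substitute-then-flatten behavior described above.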

What are monads used for in software engineering?

Monads are used virtually everywhere in software engineering. We are not always aware of them. But wherever you are using lists, errors, state, parsers, futures, I/O, or random number generators, chances are good that you are making use of the fact that these things form monads.

The basic functionality provided by all monads (and therefore the reason to use them qua monads) is substitution and normalization. The basic thing that a monad allows you to do is variable assignment and to substitute subexpressions for variables. The monad then re-normalizes the expression to a new one that has your subexpressions grafted where the variables used to be. Each monad carries some effect and the renormalization procedure (the monadic join, or the natural transformation µ) determines how the effects are combined.

For example, in a list of integers you can view each of them as a variable. Then if you have a function from integers to lists (of type Integer -> List x), you can substitute a list of values of type `x` for each of the integer variables which gives you a list of lists. Then you remove the inner brackets (normalize) to end up with a plain list where each of your integers has been replaced by zero or more elements of type `x`. Here's that in action:

for {
  i <- integers
  r <- numberwang(i)
} yield r

With an error monad, you will have some expression `e` that is either an error or a value. You can view that value as a variable and substitute another expression `f` (which could also be an error) for the variable. When the monad renormalizes you get a single expression `g`. If `e` would have failed, then `g` fails with that error. Otherwise it fails with `f`'s error, or succeeds if there were no errors. Example:

for {
  i <- mightError
  r <- mightAlsoError(i)
} yield r

We could repeat this pattern for all monads. But the real power comes from the fact that we can abstract over the monad and write programs that don't just work with lists, or just with futures, or just with errors, but with any monad. They all share a common interface that we can exploit and program against. In practice there is an enormous library of functions that can be written once and for all, and reused for any given monad. And since monads are everywhere, this is very useful.
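As a sketch of what programming against that common interface looks like, here is a mapM-like function written once against a (unit, bind) pair and then reused with a toy list monad and a toy Maybe monad (which uses None for failure, so it breaks down if None is ever a legitimate value; names are mine).

```python
def map_m(unit, bind, f, xs):
    """Apply the monadic function f to each element, collecting results."""
    if not xs:
        return unit([])
    return bind(f(xs[0]), lambda y:
           bind(map_m(unit, bind, f, xs[1:]), lambda ys:
           unit([y] + ys)))


# The list monad: unit is singleton, bind is flat-map.
list_monad = (lambda x: [x],
              lambda m, f: [y for x in m for y in f(x)])

# A toy Maybe monad: None is failure, anything else is a success value.
maybe_monad = (lambda x: x,
               lambda m, f: None if m is None else f(m))

# One definition, two very different effects:
choices = map_m(*list_monad, lambda x: [x, -x], [1, 2])     # all sign choices
checked = map_m(*maybe_monad, lambda x: x if x > 0 else None, [1, 2])
failed  = map_m(*maybe_monad, lambda x: x if x > 0 else None, [1, -2])
```

With the list monad, map_m enumerates every combination of results; with the Maybe monad, the same code validates a list and fails as a whole if any element fails.  That is the "write once, reuse for any monad" payoff in miniature.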

For the category theorists, the monads that are used in software engineering are specifically (usually) the monads on Set.