What are the advantages of Erlang over other programming languages?

There's a lot of runtime introspection above and beyond hot code swapping. You can attach a console to a running node and inspect and modify state interactively. I'm a big fan of debugging with printfs, but if you've got a stateful server in production and you want to debug it, it's a lot easier to connect with a real REPL shell than with, say, gdb.

How long does it take to master Erlang?

Try finding an online tutorial, but don't be afraid to ask a friend for help if you're fortunate enough to have one who knows Erlang. The core language isn't too bad if you're familiar with FP, but the associated concepts like gen_*, supervision hierarchies, and error handling are tricky. Don't be ashamed to ask for help; it's not as straightforward as most of the languages you've learned.

Implement the things you're learning. After you learn a concept, try building something that uses it. If nothing comes to mind, ask someone for an assignment or find a tutorial with homework. Reading tutorials and manuals will tell you how a feature works, but you won't really understand where it's appropriate until you've actually used it incorrectly a few times. Experience is key; you'll probably find, after programming Erlang for a while, that everything you wrote more than two weeks ago is complete crap.

Once you get into real-world work, use http://erldocs.com for reference. It's the same material as http://www.erlang.org/doc, but presented better.

What are some well-known applications/websites utilizing Erlang? 

The following companies have used Erlang where its features fit best:

  • ProcessOne, which built loads of XMPP, messaging, and streaming stuff on Erlang
  • Mochi Media, which built MochiWeb, used by projects like Riak, ErlyWeb, etc.
  • Yahoo, within Delicious and for content extraction
  • Amazon, to build SimpleDB *
  • Facebook, for chat (as mentioned in another answer) and as part of Thrift RPC
  • GitHub, for data serialization, among several other things
  • 37signals (socket-related work at some point)
  • Heroku *
  • Geodesic, whose Erlang code is used in Mundu products such as radio, messenger, and TV apps popular in some parts of the world
  • [plug] hover.in, btw, runs on the LYME stack (Linux, Yaws, Mnesia, Erlang)
  • Basho, which has one of the most talented Erlang teams, with products like Riak, Riak Search, rebar, Nitrogen Web, erlang_js, etc.
  • Membase (a caching/NoSQL DB, which used Erlang even before acquiring CouchDB, itself built with plenty of Erlang)
  • Showyou, an iPad video discovery app

What are the easiest to use, most robust solutions for creating highly-available, non-blocking webservices?

Easy to use and robust rarely come together in one package, particularly when talking about something like web services. It's amazing how people always think these are so simple to set up; here lie many daemons (spelling intentional).

The simplest I've found so far is probably node.js, but it's not perfect… yet. The last time I checked, it still ran on only a single core, so to take advantage of any modern machine you'll need to run it behind something that can partition requests across multiple node.js processes, which already starts adding complications. Also, while many backend service integrations have been written for it, this is still a developing area, so YMMV.
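The "partition requests across multiple processes" step usually comes down to a round-robin dispatcher sitting in front of several single-core servers. The following is a minimal Python sketch of that policy, assuming hypothetical backend addresses (the ports are made up). In practice a reverse proxy or load balancer does this job, not hand-rolled code.

```python
from itertools import cycle


class RoundRobinDispatcher:
    """Hands each incoming request to the next backend in turn.

    The backend addresses are hypothetical placeholders for several
    single-core server processes running on one multi-core machine.
    """

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._backends = cycle(backends)

    def pick(self):
        # Each call returns the next backend, wrapping around after the last one.
        return next(self._backends)


dispatcher = RoundRobinDispatcher(["127.0.0.1:8001", "127.0.0.1:8002", "127.0.0.1:8003"])
assignments = [dispatcher.pick() for _ in range(5)]
print(assignments)
# ['127.0.0.1:8001', '127.0.0.1:8002', '127.0.0.1:8003', '127.0.0.1:8001', '127.0.0.1:8002']
```

Round-robin keeps the sketch short, but it ignores how busy each backend is. Any per-user state also has to live outside the workers, or be pinned to one of them, which is part of the complication mentioned above.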

Scala (in particular with Akka) looks like possibly the best up-and-coming bet, but getting started with it can be a bit difficult. After you get over the initial bumps (mostly just getting your environment set up for building projects with it), it's mostly smooth sailing in my experience so far. You'll have access to all the solid Java-based libraries, so you won't want for access to backend services.

What aspect of the Erlang language makes applications in Erlang scalable? Why can't they be ported to other languages?

Writing an application in Erlang (or any other language or platform) does not automatically make that application scalable.  However, some features of the Erlang language and runtime system do make writing scalable systems easier.

The lightweight processes and first-class message passing encourage writing in a style that minimizes shared state.  You can do this well in other languages, and you can force yourself to do it wrong in Erlang… but the building blocks in Erlang make it feel a bit more natural to get it right.
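Here is what that style looks like when imitated in another language, with more ceremony than Erlang needs. This is a minimal Python sketch in which a thread plus a queue stands in for an Erlang process and its mailbox. `CounterActor` and its message names are hypothetical. The count lives only inside the worker loop, and every other thread changes or reads it by sending messages.

```python
import queue
import threading


class CounterActor:
    """A tiny actor: its state lives only inside the worker thread's loop.

    Other threads never touch `count` directly; they put messages in the
    mailbox, loosely imitating Erlang processes and message passing.
    """

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        count = 0  # private state, owned by this loop only
        while True:
            msg, reply_to = self._mailbox.get()
            if msg == "incr":
                count += 1
            elif msg == "get":
                reply_to.put(count)
            elif msg == "stop":
                break

    def send(self, msg, reply_to=None):
        # Asynchronous, fire-and-forget message.
        self._mailbox.put((msg, reply_to))

    def call(self, msg):
        # Synchronous request/reply, loosely like gen_server:call/2.
        reply = queue.Queue()
        self.send(msg, reply)
        return reply.get(timeout=5)

    def stop(self):
        self.send("stop")
        self._thread.join()


def bump(actor, n):
    for _ in range(n):
        actor.send("incr")


actor = CounterActor()
senders = [threading.Thread(target=bump, args=(actor, 1000)) for _ in range(4)]
for t in senders:
    t.start()
for t in senders:
    t.join()
total = actor.call("get")
print(total)  # 4000
actor.stop()
```

The mailbox processes messages in order, and only one loop ever touches `count`, so no lock is needed. Erlang gives you that same property with every process, without the boilerplate.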

Many other things that are often needed in scalable systems are straightforward in Erlang, such as implementing state machines (generic FSM behavior), packing and unpacking protocol messages (great binary syntax), remote failure detection (process monitors) and much more.
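To show what "great binary syntax" buys you, here is a hedged Python comparison using the standard `struct` module. It assumes a hypothetical frame format: a 1-byte message type, a 2-byte big-endian payload length, then the payload. The Erlang equivalent of the whole decoder is roughly one binary pattern, shown in the comment.

```python
import struct

# Hypothetical frame layout: type (1 byte), payload length (2 bytes, big-endian), payload.
# In Erlang the decoder is roughly the single pattern:
#   <<Type:8, Len:16, Payload:Len/binary, Rest/binary>>
HEADER = struct.Struct(">BH")


def pack_frame(msg_type, payload):
    # struct.error is raised if the payload is longer than 65535 bytes.
    return HEADER.pack(msg_type, len(payload)) + payload


def unpack_frame(data):
    """Return (msg_type, payload, rest), or None if the frame is incomplete."""
    if len(data) < HEADER.size:
        return None
    msg_type, length = HEADER.unpack_from(data)
    end = HEADER.size + length
    if len(data) < end:
        return None
    return msg_type, data[HEADER.size:end], data[end:]


frame = pack_frame(7, b"hello") + pack_frame(2, b"hi")
first = unpack_frame(frame)
print(first)  # (7, b'hello', b'\x02\x00\x02hi')
```

The leftover `rest` is what you feed back in for the next frame. In Erlang the same recursion falls out naturally from matching on `Rest`.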

There is no silver bullet for scalability, but Erlang can be a very valuable tool.

UPDATE to reflect JChris' comment:  Erlang's per-process garbage collection is a huge benefit when building systems with soft-realtime constraints, and other effects of the process model are also beneficial.  I left that out not because it is not valuable, but only because it seemed less relevant to the specific question about scalability.

Under what conditions/needs would Erlang be the most appropriate language to build a product/service?

Erlang is a good fit for:

  • Irregular concurrency: Task-level, Fine-grained parallelism
  • Network servers
  • Distributed systems
  • Middleware: Parallel databases, Message Queue servers
  • Soft Realtime / Embedded applications
  • Monitoring, control and testing tools

Not so good fit for:

  • Concurrency more appropriate to synchronized parallel execution: Data Parallelism
  • Floating-point intensive code (HPC)
  • Text Processing / Unicode (Unicode support now much better)
  • Traditional GUI (some improvement with wx now)
  • Hard Realtime applications
  • Extensive interop with other languages/VMs (some improvement here with NIFs and Erjang – Erlang on JVM)

I use Erlang for the things listed under "not so good fit" too, because its advantages outweigh some inconvenience.

What are some examples of startups based primarily on a functional programming language such as LISP, Erlang, Haskell, etc.?

Erlang (programming language): hover.in (90% of our code at hover.in from 2008 to 2011 relied on Erlang)

You might also want to see What are some well-known applications/websites utilizing Erlang?

Here's an idea of what we built and how we used Erlang (adapted from my Y Combinator thread on a similar question):

  • A caching system modeled on a key-value store, which can run functions during set/get and supports expiry, counters, counters that buffer N updates before running a transaction, and machine learning, all as primitives. The tail-recursive processes act as mini-servers holding state, and the best thing is that you get semi-automatic scaling if you combine this well with expiring processes: more of them get spawned when load increases, and they die when there's no more need or traffic drops.
  • A distributed crawler that works with Python. We abandoned the typical RDBMS approach and brought inverted-index creation down from more than an hour to a few seconds. The eureka moment was when we stopped porting other languages' sequential for-loop style and embraced processes and message passing. I asked myself, "Why didn't I do this a few years ago!"
  • A website and dashboard for the ad/content network, using Yaws on the LYME stack.
  • Btw, rumours about Unicode are misleading. It was actually a blessing for us that content (i.e., strings) is just lists of integers. Converted to binaries, they not only take less space and pattern match like a dream, but all of a sudden your system can handle any language, because to it, it's all just integers, English or Swahili. That was a huge bonus even when pitching investors, since it opened up new, untapped market opportunities.
  • An ad network (a content network, it goes without saying).
  • A fire-and-forget thumbnail/screenshot system with queues, talking to ImageMagick. I love writing fire-and-forget tools, and with Erlang it's very easy to write processes that have a life of their own. At some point you have to choose between leaning on Erlang's strength at keeping data in RAM and researching how to handle large Mnesia tables (Mnesia is the bundled database, and it has a limit on table size), but it's worth the tradeoff.
  • Case in point: our idea of scaling was "reducing nodes", i.e., increasing efficiency:
    – 2008: a million hovers in the year, 4 nodes
    – 2009: a million hovers every month, 3 nodes
    – 2010: a million hovers every week, 2 nodes
    I could bring it down to a single node with 8 GB of RAM for the same tasks as well.

    Reducing your costs brings you that much closer to profitability. (We're now profitable and pretty much on auto-pilot on the backend, with the 3-4 member team now concentrating on increasing topline sales.)
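The "counters that buffer N updates before running a transaction" and "processes that expire" ideas from the caching-system bullet can be sketched outside Erlang too. The following is a hypothetical, single-threaded Python approximation. `BufferedCounter`, `flush`, and `ttl` are illustrative names, and `flush` stands in for whatever expensive write the real system performed. In Erlang each counter would be its own process, and expiry would be a `receive ... after Timeout` clause that lets the process exit.

```python
import time


class BufferedCounter:
    """Counter that batches N increments before running a 'transaction'.

    `flush` stands in for the expensive write (e.g. a database transaction);
    `ttl` mimics a process that exits after sitting idle for too long.
    Both are hypothetical parameters for illustration.
    """

    def __init__(self, n, flush, ttl=60.0, clock=time.monotonic):
        self.n = n
        self.flush = flush
        self.ttl = ttl
        self.clock = clock
        self.pending = 0
        self.last_touch = clock()

    def incr(self, by=1):
        self.pending += by
        self.last_touch = self.clock()
        if self.pending >= self.n:
            self.flush(self.pending)  # one write per batch instead of per hit
            self.pending = 0

    def expired(self):
        # True once the counter has been idle for longer than ttl seconds.
        return self.clock() - self.last_touch > self.ttl

    def close(self):
        # Flush whatever is left before the counter goes away.
        if self.pending:
            self.flush(self.pending)
            self.pending = 0


now = [0.0]  # fake clock so the example is deterministic
writes = []
c = BufferedCounter(n=3, flush=writes.append, ttl=10.0, clock=lambda: now[0])
for _ in range(7):
    c.incr()
print(writes)       # [3, 3]
now[0] = 11.0
print(c.expired())  # True
c.close()
print(writes)       # [3, 3, 1]
```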
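The crawler's "eureka moment" (dropping the sequential for-loop and letting independent workers build partial indexes that get merged) follows a map-then-merge shape. The following is a minimal Python sketch of that shape, not the original crawler. It assumes in-memory documents keyed by id and uses a thread pool to stand in for Erlang processes. Because of Python's GIL, this shows the structure rather than real CPU parallelism.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def index_document(doc_id, text):
    """Worker: build a partial inverted index for one document."""
    partial = defaultdict(set)
    for word in text.lower().split():
        partial[word].add(doc_id)
    return partial


def merge(partials):
    """Collector: fold the partial indexes into one."""
    index = defaultdict(set)
    for partial in partials:
        for word, ids in partial.items():
            index[word] |= ids
    return index


def build_index(docs, workers=4):
    # Each document is indexed independently, like one process per document
    # sending its result back as a message; no shared table is locked.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(index_document, docs.keys(), docs.values())
        return merge(partials)


docs = {
    1: "Erlang processes share nothing",
    2: "processes talk by message passing",
    3: "message passing scales",
}
index = build_index(docs)
print(sorted(index["processes"]))  # [1, 2]
print(sorted(index["passing"]))    # [2, 3]
```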
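Annualizing the node counts from the "reducing nodes" list makes the efficiency gain concrete. The following is a small Python calculation that uses only those figures, plus the assumption of 12 months and 52 weeks per year.

```python
# Figures from the "reducing nodes" list; 12 months and 52 weeks per year are assumptions.
HOVERS = 1_000_000
periods_per_year = {2008: 1, 2009: 12, 2010: 52}
nodes = {2008: 4, 2009: 3, 2010: 2}

per_node = {
    year: HOVERS * periods_per_year[year] // nodes[year]
    for year in periods_per_year
}
for year, load in per_node.items():
    print(year, f"{load:,} hovers per node per year")
# 2008 250,000 hovers per node per year
# 2009 4,000,000 hovers per node per year
# 2010 26,000,000 hovers per node per year

print(per_node[2010] / per_node[2008])  # 104.0
```

By these numbers, each node was handling roughly 100 times as many hovers per year in 2010 as in 2008.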

In addition:

  • A wrapper around iUI for the Nitrogen framework, for easily building iPhone (product) web apps
  • medici, a wrapper for Tokyo Cabinet that I've contributed to
  • Wrappers for other things like ImageMagick, RRD, AWS utils, etc.

I've tried to document them at http://slideshare.net/bosky101