Are there other examples of NLP algorithms writing published books?

Sure, how about the 1993 book, "Just This Once", subtitled "A novel written by a computer programmed to think like the world's best-selling author, as told to Scott French." The book was said to be written by French's computer program "Hal" with considerable human intervention; the best-selling author it imitates is Jacqueline Susann. When the book was published it made it to the front page of The New York Times:…

Here's the book on Amazon:…

The reviews are not really complementary ("Shallow, beautiful-people characters are flatly conceived and randomly accessed in a formulaic plot…") but I'm sure one of these days we'll have some great American machine-written literature. My toaster was just telling me the other day he's been working on a novel about appliances in turn-of-the-century London.

What are the advantages of having a degree in math and working as a programmer?

  • Understanding function convexity very deeply.  While most "computer science" eschews arbitrary functions, real-world algorithmic work involves a lot of magic function creation.  In some contexts, what that comes down to is understanding convexity: how should your output change as the scale of your input changes? For example, the log factor in IDF is crucial to ensuring that extremely rare words don't overpower medium-rare words or that common words like "restaurant" aren't completely discarded.  This isn't a difficult concept, but deep intuition comes only with years of mathematical thinking.
  • It makes a lot computer science classes/concepts easier.  Non-math-majors in, say, an algorithms class will have to spend significant amount of brain power simply on the process of a rigorous proof; if you've already mastered that as a math major, then you can devote all your brain power to the specific problem at hand.
  • Specific concepts applicable to computer science are covered arguably more deeply by mathematics classes. Along with the statistics and machine learning mentioned by Adam D'Angelo, there's graph theory (e.g. PageRank), game theory (AdWords), formal logic (static analysis), number theory (cryptography), &c.
  • Diversity of paradigms.  Whatever your major, it will give you paradigm(s) for thought.  If you assume you'll already be exposed to computer science paradigms (see below), then it may be useful to have other frameworks for working with ideas.
  • Online computer science resources are really good. Computer science is not programming, and so it's still important as a programmer to get exposure to computer science ideas.  However, programming does facilitate that relatively organically ("wait, how does that hash table thing work?"), and the abundance of good resources online (videotaped lectures, stack overflow, etc.) make this easy to do.
  • Precision and skepticism.  These are two skills you hone deeply as a mathematician, and apply to essentially any thought or discussion.
  • Math majors are really smart; It's good to have a network of really smart people, and in some ways better to have a network of really smart people that all the other programmers won't necessarily know as well.

What is recursion?

Recursion is a wonderful programming tool.  It provides a simple,  powerful way of approaching a variety of problems.  It is often hard,  however, to see how a problem can be approached recursively; it can be  hard to "think" recursively.  It is also easy to write a recursive  program that either takes too long to run or doesn't properly terminate  at all.  In this article we'll go over the basics of recursion and  hopefully help you develop, or refine, a very important programming  skill.

What is Recursion?

In order to say exactly what recursion is, we first have to  answer "What is recursion?"  Basically, a function is said to be  recursive if it calls itself.  Below is pseudocode for a recursive  function that prints the phrase "Hello World" a total of count times:

function HelloWorld(count) {     
return     print("Hello World!");
HelloWorld(count - 1);

It might not be immediately clear what we're doing here – so let's follow through what happens if we call our function with count set to 10.  Since count is not less than 1, we do nothing on the first line.  On the next, we  print "Hello World!" once.  At this point we need to print our phrase 9  more times.  Since we now have a HelloWorld function that can do just  that, we simply call HelloWorld (this time with count set to 9)  to print the remaining copies.  That copy of HelloWorld will print the  phrase once, and then call another copy of HelloWorld to print the  remaining 8.  This will continue until finally we call HelloWorld with count set to zero.  HelloWorld(0) does nothing; it just returns.  Once  HelloWorld(0) has finished, HelloWorld(1) is done too, and it returns.   This continues all the way back to our original call of HelloWorld(10),  which finishes executing having printed out a total of 10 "Hello  World!"s.

You may be thinking this is not terribly exciting, but  this function demonstrates some key considerations in designing a  recursive algorithm:

  1. It handles a simple "base case" without using recursion.
                In this example, the base case is "HelloWorld(0)"; if the  function is asked to print zero times then it returns without spawning  any more "HelloWorld"s.
  2. It avoids cycles.
                Imagine if "HelloWorld(10)" called "HelloWorld(10)" which  called "HelloWorld(10)."  You'd end up with an infinite cycle of calls,  and this usually would result in a "stack overflow" error while running.   In many recursive programs, you can avoid cycles by having each  function call be for a problem that is somehow smaller or simpler than  the original problem.  In this case, for example, count will be  smaller and smaller with each call.  As the problem gets simpler and  simpler (in this case, we'll consider it "simpler" to print something  zero times rather than printing it 5 times) eventually it will arrive at  the "base case" and stop recursing.  There are many ways to avoid  infinite cycles, but making sure that we're dealing with progressively  smaller or simpler problems is a good rule of thumb.
  3. Each call of the function represents a complete handling of the given task.
                Sometimes recursion can seem kind of magical in the way it  breaks down big problems.  However, there is no such thing as a free  lunch.  When our function is given an argument of 10, we print "Hello  World!" once and then we print it 9 more times.  We can pass a part of  the job along to a recursive call, but the original function still has  to account for all 10 copies somehow.

Check out use case scenarios at….

1) Hierarchies, Networks, or Graphs

2)  Multiple Related Decisions

3)  Explicit Recursive Relationships

Good luck !!!

What is a monad?

In functional programming one typically only deals with one category, the category of types. This is commonly called Hask when we're dealing with Haskell.

Keep in mind that Hask is not just a collection (or class) of types. It also contains morphisms between the types. These are just functions.

A functor is sort of like a function between two categories. A functor from a category to itself is an endofunctor — scary name, simple concept.

An endofunctor in Haskell is basically a type constructor — a function that takes a type and produces a new type. Note, though, that a functor doesn't just deal with the objects of the category (types in our case) but also with the morphisms. That's the job of fmap in Haskell. (So what is called a functor in Haskell parlance is perhaps more accurately called an endofunctor.)

A monad is an endofunctor — again, just a Functor in Haskell — along with two other special operations (called natural transformations, but let's not worry about that too much). Remember that the endofunctor here comes with a type constructor, say T.

One of these operations, called return or lift, takes an object of type x and produces an object of type T x. That's simple enough. So, for instance, in the list monad this operation takes an object, say of type Int, and it produces an object of type [Int]. (Which object, specifically, might it produce? Of course, just the singleton list whose element is precisely the provided integer.)

The other special operation, called join, is a bit more interesting. It takes an object of type T (T x) and produces an object of the simpler type T x — it flattens the two successive applications of the type constructor. Again, as with all this stuff, it's actually very simple. In the list monad, for instance, this join operation is just list concatenation — it takes a list of lists and produces a single list.

What are monads used for in software engineering?

Monads are used virtually everywhere in software engineering. We are not always aware of them. But wherever you are using lists, errors, state, parsers, futures, I/O, or random number generators, chances are good that you are making use of the fact that these things form monads.

The basic functionality provided by all monads (and therefore the reason to use them qua monads) is substitution and normalization. The basic thing that a monad allows you to do is variable assignment and to substitute subexpressions for variables. The monad then re-normalizes the expression to a new one that has your subexpressions grafted where the variables used to be. Each monad carries some effect and the renormalization procedure (the monadic join, or the natural transformation ยต) determines how the effects are combined.

For example, in a list of integers you can view each of them as a variable. Then if you have a function from integers to lists (of type Integer -> List x), you can substitute a list of values of type `x` for each of the integer variables which gives you a list of lists. Then you remove the inner brackets (normalize) to end up with a plain list where each of your integers has been replaced by zero or more elements of type `x`. Here's that in action:

for {
  i <- integers
  r <- numberwang(i)
} yield r

With an error monad, you will have some expression `e` that is either an error or a value. You can view that value as a variable and substitute another expression `f` (which could also be an error) for the variable. When the monad renormalizes you get a single expression `g`. If `e` would have failed, then `g` fails with that error. Otherwise it fails with `f`'s error, or succeeds if there were no errors. Example:

for {
  i <- mightError
  r <- mightAlsoError(i)
} yield r

We could repeat this pattern for all monads. But the real power comes from the fact that we can abstract over the monad and write programs that don't just work with lists, or just with futures, or just with errors, but with any monad. They all share a common interface that we can exploit and program against. In practice there is an enormous library of functions that can be written once and for all, and reused for any given monad. And since monads are everywhere, this is very useful.

For the category theorists, the monads that are used in software engineering are specifically (usually) the monads on Set.

Who decided that, in a list of scientific principles, the numbering starts with zero rather than one?

No one decided that.  In most cases I have encountered, it is not true. 

There is a zeroth law of thermodynamics, but that numbering is a sort of a joke, because at first there were the first, second, and third laws.  When they decided one more law was important, they called it the zeroth law because it was a very basic idea that logically comes first.  See… 

Sometimes books, especially math books, have a "chapter 0" that contains background material that isn't the subject of the text, but is prerequisite knowledge. 

Most of the time, though, lists of scientific principles start with 1, or else the list has no standardized numbering.  The most famous example is Newton's laws, which start with 1.  When people put together their own lists, I usually see them start with 1.

some examples:………

Is the traveling salesman (optimization version) NP Complete?

A problem is NP-Complete if it is NP-Hard and belongs to NP.

For a problem to belong to NP, it has to be:

  1. a decision problem.
  2. we should be able to verify the answer as yes/no given a polynomial length certificate (also called a clue or solution).
  3. The possible number of clues should be finite and of polynomial length.

Now, the optimization version of TSP is not a decision problem, hence it is not NP-Complete but is NP-hard.

However, in the decision version of TSP, we have to determine whether a tour with length <= K exists or not. This is an NP-Complete problem.

Optimization problems in general (except linear programs) does not belong to NP, and are thus not NP-Complete.

How many random bits does it take to generate a uniformly random integer between 0 and 9?

Let's restate the question a bit. You're looking for a function that takes as input an integer number of uniformly random bits — in other words, a sample from a discrete probability distribution with 2^n events. Each event has the same probability: 1/(2^n).

The output of this function is supposed to be samples from a different uniform probability mass function — in this case, one with exactly ten outcomes, and a probability of 1/10 for each one.

The function is deterministic — our only source of randomness is the input. In other words, every possible input event (there are 2^n of them) is mapped to exactly one of the ten possible output events.

Looked at this way, the answer is clear: you can't do it exactly with any finite n. However you map the 2^n inputs to the ten outputs, some of the outputs will end up with more than a tenth of the inputs and some will end up with fewer than a tenth. This is because 2^n is never evenly divisible by 10.

However, you can get arbitrary close to uniform as you increase the value of n (the number of bits you take in). So for whatever tolerance you're willing to allow the output distribution to diverge from exactly uniform, there is a number of bits (an n) that guarantees you'll meet that constraint.

For fun, let's do the math. Say our goal is for each output to have exactly one tenth of the inputs (and therefore a probability of 1/10), but we're willing to let that go as low as 1/10-e or as high as 1/10+e. One simple way to do the mapping is to find the floor of 2^n / 10, i.e. the largest integer less than or equal to 2^n / 10. We assign this number of inputs to the first nine outputs, and the remainder to the tenth output.

The first nine guys end up "low" — by how much? By what's left over when you divide 2^20 by 10: i.e. by (2^n % 10)/10 inputs out of 2^n total — so by (2^n % 10) / (10 * 2^n)) probability. The last guy's probability ends up "high" — by 9x that same value. As long as that's less than e (the tolerance to diverge from exact uniformity), we're good to go. In other words, you can match a uniform decimal distribution with tolerance +/- e by using no more than n bits, where (9/10) * (2^-n) * (2^n % 10) < e. Since this is an upper bound, we can relax the first factor to be 1 and the third factor to be 9 (that's its value at the worst case), so really you are still good as long as n > -lg(e/9). (There are definitely some tighter bounds.)

To sum up: if you want to match a uniform distribution of the digits 0 through 9 with uniform probability 0.1 and tolerance e, and you have more than -lg(e/9) bits, you can do it. If you want to match a uniform distribution exactly — i.e. if e is zero — there is no finite n that can do it.