What are P, NP, NP-complete, and NP-hard?

Complexity classes are one way to talk about how difficult or easy a problem is. Complexity theory gets very technical but the basics are actually extraordinarily intuitive, and it's possible to understand the P versus NP issue with very little math background.

The kinds of problems we deal with in complexity theory come in pairs: a "search" version and a "verification" version. For example —

  • Problem: sorting.
    Search version: input a list of numbers X and output the same list in sorted order (call it Y).
    Verification version: input a list of numbers X and another list Y, and output "YES" if Y is the sorted version of X and "NO" otherwise.
  • Problem: graph coloring.
    Search version: input a network of nodes and edges X, and output colors Y for each node such that no adjacent nodes have the same color.
    Verification version: input a network X and a set of colors Y and output "YES" if all adjacent nodes have different colors and "NO" otherwise.
  • Problem: partition.
    Search version: input some numbers X and divide the numbers into two groups that add up to exactly the same value (call the assignment of numbers to their group Y).
    Verification version: input some numbers X and the groupings Y and output "YES" if the two groups add up to the same value, or "NO" otherwise.

This is the P versus NP problem:

Are there any problems for which the verification version can be solved efficiently but for which there is no efficient solution to the search version?

If there is a fast solution to the search version of a problem then the problem is said to be Polynomial-time, or P for short. If there is a fast solution to the verification version of a problem then the problem is said to be Nondeterministic Polynomial-time, or NP for short. The question of "P=NP" is then the question of whether these sets are identical.

(The "nondeterministic polynomial-time" terminology is terribly counter-intuitive in my opinion. It was used originally because whenever a Turing machine can efficiently solve the verification version of a problem, a non-deterministic Turing machine can efficiently solve the corresponding search problem. But this is really not at all important to understanding P vs NP.)

In the case of the sorting problem above, there are fast algorithms for both the search and verification versions. But for the other two problems, the verification versions are easy (heck, my grandmother could probably write a computer program to check that two lists of numbers add up to the same value) but the search versions are difficult, and indeed there are no fast solutions known. So all three problems are in NP, but only the first is (known to be) in P.

Some problems can be translated into one another in such a way that a fast solution to one problem would automatically give us a fast solution to the other. There are some problems that every single problem in NP can be translated into, and a fast solution to such a problem would automatically give us a fast solution to every problem in NP. This group of problems are known as NP-Hard. Some problems in NP-Hard are actually not themselves in NP; the group of problems that are in both NP and NP-Hard is called NP-Complete.

You start to see the far-reaching implications of a fast solution to any one problem in NP-Hard: we would automatically get a fast solution to every problem in NP, which would mean that whenever there is a fast solution to the verification version of a problem then there is always a fast solution to the corresponding search version.

Remember how the verification versions of those problems seemed easy but the search versions seemed hard? A fast solution to any NP-Complete problem would mean that as long as you can verify proposed solutions to a problem you would never need to search through a substantial fraction of the search space to find solutions; there would always be a faster way. This seems implausible to most mathematicians (and for deeper reasons that I've listed here) and that is why most mathematicians think that there are no fast solutions to NP-complete problems. But we haven't proved it yet.

19 Replies to “What are P, NP, NP-complete, and NP-hard?”

  1. There are two classes of problems, P, and NP (there are many, many more, but we will ignore the rest here).  These stand for "Polynomial" and "Non-deterministic Polynomial".

    Assuming knowledge of big-O notation, we'll move on from there.  Request an explanation and you shall have it.

    Problems in P are solvable by a polynomial-time algorithm (runs in [math]O(f(n))[/math], where [math]f[/math] is polynomial).  This includes things like sorting and triangulation.

    Problems in NP are checkable by a polynomial-time algorithm, but not necessarily solvable.  Generally, we mean NP minus P, when we say NP (so they are only checkable).  This means that, given the solution to an instance of the problem, we can verify that it is the correct solution in polynomial time, but that we couldn't have come up with it on our own, in polynomial time.  An easy example of this is subset sum: given a set of numbers, does there exist a subset whose sum is zero?  There is no known way to solve this problem in polynomial time, but given the answer, we can easily check whether it is right (the decision problem is a little harder to see, but assume you are given the actual subset).

    Now, we have to discuss NP-Complete and NP-Hard, which requires a discussion of reducibility.  Reducibility is the notion that, given an instance of some hard problem, we can quickly construct an instance of some other problem, whose answer would help us solve the first problem.

    The typical logic is this: you claim to have a fast algorithm to, say, solve subset sum.  I create a machine that, in polynomial time, takes a traveling salesman problem instance, and creates from it, an instance of subset sum.  If you can give my machine the answer to that question, it will, again in polynomial time, change that back into the answer for my original traveling salesman problem.  Therefore, if your subset sum algorithm is really polynomial, then I have just added some extra bits to it and created a polynomial time algorithm for solving traveling salesman problems.  If I can create such a machine, then it creates a notion of reducibility, that is, traveling salesman is reducible to subset sum, which indicates that subset sum is at least as hard as traveling salesman.  Intuitively, one might imagine that with all the universal truth out there, one could say that traveling salesman has no polynomial-time solution.  In that case, the above would be a proof by contradiction, that subset sum also has no polynomial time solution.

    Now, we can pretty easily state what NP-Complete and NP-Hard are:

    If two problems can be reduced to each-other, they are in some sense, "equivalent".  All problems in NP that are so equivalent, are called NP-Complete.

    If there is an NP-Complete problem that can be reduced to some problem H, then H is NP-Hard.  This means that, in some sense, H is at least as hard as all the NP-Complete problems, but note that there does not need to be a reduction from H to any NP-Complete problem, so H might be harder than all the NP problems.

    One proves that a problem is NP-Hard by taking an existing NP-Complete problem (3-SAT and Traveling Salesman are my favorites, since they work well for geometry), and using it to generate input to your problem, such that an answer to your problem is equivalent to solving the NP-Complete problem you chose.

  2. I would like to add to the existing answers and also focus strictly on NP-hard vs NP-complete class of problems.
    P and NP-complete class of problems are subsets of the NP class of problems. A problem is said to be in complexity class P if there exists a PTIME (polynomial running time) algorithm that *solves* the problem.
    A problem is said to be NP-complete if there exists a PTIME algorithm that *verifies* the solution. However, there is no PTIME algorithm that guarantees to *solve* the problem. Such problems are the hardest problems in NP and are called NP-complete problems.

    There is another class of problems called the NP-hard. What is the relation between NP-hard and NP-complete?
    NP-hard problems are at least as hard as the hardest problem in NP-complete. Let us quantify the statement:
    Assume that I have a problem (H) which I *believe* is NP-complete (solution is verifiable in PTIME). Let there be an intermediate step (subroutine S) which must be solved in order to solve for H. If S is said to be NP-complete, then H is at least as hard as S. In fact H is *harder* than S. Thus H is *harder* than a NP-complete problem. So it needs to be in a complexity class of its own and hence H belongs to a class known as the NP-hard class. However note that H is not *strictly* NP-hard. This leads us to conclude that NP-complete is a class of problems that fall under both NP and NP-hard – hardest of the NP problems is NP-complete and the hardest of NP-complete is NP-hard.

    Example 1: Let us consider the classical version of TSP (Travelling Sales Person) problem – find the shortest distance between two cities on a map such that the sales person visits each city only once. This is a constrained optimization problem. To solve this one needs to solve for Hamiltonian Cycles first. In 1972, Prof. Richard Karp proved that Hamiltonian cycles are NP-complete. Now, the classical TSP is at least as hard as a NP-complete problem. Prof. Karp thus proved the NP-hardness of the classical TSP. So classical TSP is a NP-hard problem. Note that it is not *strictly* NP-hard.
    (There is a decision version of this problem – given distance L between two cities, can you give a better l < L ? This requires a yes/no answer. This decision version of classical TSP is NP-complete and not NP-hard.)

    Example 2: The halting problem is NP-hard. Given an input, will the Turing machine halt? A Turing machine can be given a constraint to halt – when it solves the SAT problem it can halt. The solution to SAT can be verified in PTIME and hence once the machine outputs the correct solution it halts. However, SAT is a NP-complete problem. So the *general* halting problem should be at least as hard as the SAT problem. Hence it is a NP-hard problem.

    What is *strictly* a NP-hard problem?NP-complete problems are either optimization, search or decision problems.The classical TSP is an optimization problem. For a problem to be striclty NP-hard it should be neither an optimization/decision/search problem.

  3. These refer to how long it takes a program to run.  Problems in class P can be solved with algorithms that run in polynomial time.

    Say you have an algorithm that finds the smallest integer in an array.  One way to do this is by iterating over all the integers of the array and keeping track of the smallest number you've seen up to that point.  Every time you look at an element, you compare it to the current minimum, and if it's smaller, you update the minimum.

    How long does this take?  Let's say there are n elements in the array.  For every element the algorithm has to perform a constant number of operations.  Therefore we can say that the algorithm runs in O(n) time, or that the runtime is a linear function of how many elements are in the array.*  So this algorithm runs in linear time.

    You can also have algorithms that run in quadratic time (O(n^2)), exponential time (O(2^n)), or even logarithmic time (O(log n)).  Binary search (on a balanced tree) runs in logarithmic time because the height of the binary search tree is a logarithmic function of the number of elements in the tree.

    If the running time is some polynomial function of the size of the input**, for instance if the algorithm runs in linear time or quadratic time or cubic time, then we say the algorithm runs in polynomial time and the problem it solves is in class P.


    Now there are a lot of programs that don't (necessarily) run in polynomial time on a regular computer, but do run in polynomial time on a nondeterministic Turing machine.  These programs solve problems in NP, which stands for nondeterministic polynomial time.  A nondeterministic Turing machine can do everything a regular computer can and more.***  This means all problems in P are also in NP.

    An equivalent way to define NP is by pointing to the problems that can be verified in polynomial time.  This means there is not necessarily a polynomial-time way to find a solution, but once you have a solution it only takes polynomial time to verify that it is correct.

    Some people think P = NP, which means any problem that can be verified in polynomial time can also be solved in polynomial time and vice versa.  If they could prove this, it would revolutionize computer science because people would be able to construct faster algorithms for a lot of important problems.


    What does NP-hard mean?  A lot of times you can solve a problem by reducing it to a different problem.  I can reduce Problem B to Problem A if, given a solution to Problem A, I can easily construct a solution to Problem B.  (In this case, "easily" means "in polynomial time.")

    If a problem is NP-hard, this means I can reduce any problem in NP to that problem.  This means if I can solve that problem, I can easily solve any problem in NP.  If we could solve an NP-hard problem in polynomial time, this would prove P = NP.


    A problem is NP-complete if the problem is both

    • NP-hard, and
    • in NP.

    * A technical point: O(n) actually means the algorithm runs in asymptotically linear time, which means the time complexity approaches a line as n gets very large.  Also, O(n) is technically an upper bound, so if the algorithm ran in sublinear time you could still say it's O(n), even if that's not the best description of it.

    ** Note that if the input has many different parameters, like n and k, it might be polynomial in n and exponential in k

    *** Per Xuan Luo's comment, deterministic and nondeterministic Turing machines can compute exactly the same things, since every nondeterministic Turing machine can be simulated by a deterministic Turing machine (a "regular computer").  However, they may compute things in different amounts of time.

  4. P (Polynomial time decidable problems) is a class of problems which can be decided in polynomial time i.e., there's an algorithm for such a problem which tells whether the solution of a given instance of a problem is true/false in O(n^k) time for some constant k.
    For example, deciding whether there exists a path of at most k links between two vertices u and v is in P.

    NP (Non-deterministic polynomial time decidable problems) is a class of problems which have short accepting certificates and efficient verification algorithms i.e., given a certificate of a solution, it is easy to check if the problem is satisfied.
    Accepting certificate is information which can be used to decide if the answer is true or false.
    Formally, a problem is in NP if there exists a verification algorithm A such that for any input x for which the answer is "true", there is a certificate c such that |c| is polynomial and A(x,c) = true. For any x for which the answer is true, there is no certificate c such that |c| is polynomial and A(x,c) is true.
    For example, given a graph G and a number k, deciding whether there exists a clique (complete subgraph) of size k, is in NP.

    P is a subset of NP i.e., any problem that is in P is also in NP. It is easy to see that for any problem that can be decided in polynomial time, there exists a verification algorithm that given any certificate just ignores the certificate and returns the result of the polynomial-time algorithm.

    Whether NP is also a subset of P (i.e., P = NP) is an open question. No one knows yet.

    NP-hard is a class of problems which are hardest of all problems in NP. These are the problems such that if we can solve them fast, then we can solve all problems in NP fast and P would equal NP. NP-hard problems may be of any type: decision problems, search problems, or optimization problems so they may not even be in NP.
    For example, the clique problem discussed above is NP-hard.

    NP-complete is a class of problems which are in NP and are NP-hard. NP-complete problems transform to each-other by polynomial-time many-one reductions so if a polynomial-time algorithm exists for any one of them, then polynomial algorithms exist for all of them. However, no polynomial algorithm for any NP-complete problem has yet been found.
    For example, the clique problem discussed above is NP-complete.

    To delve deeper into the complexity theory, refer to this course on udacity:
    Introduction to Theoretical Computer Science Course (CS313)

  5. The concept has been very well explained by Jessica Su.

    Adding to her answer, just to help visualize this better:

    As mentioned, P is a subset of NP, and the intersection of NP and NP-hard is NP-complete.

    The oval-shaped box would be one if P= NP= NP-complete.

    If P=NP were proved true, there would be many serious real-world consequences. All known encryption schemes rely on the fact that prime factors of large numbers are something that can be feasibly verified but not calculated. If P=NP, that means there would also be feasible ways to calculate prime factors, and hence de-crypt codes without their private keys.

    Related to this, quoting MIT Computer Scientist Scott Aaronson:

    If P=NP, then the world would be a profoundly different place than we usually assume it to be. There would be no special value in “creative leaps,” no fundamental gap between solving a problem and recognizing the solution once it’s found. Everyone who could appreciate a symphony would be Mozart; everyone who could follow a step-by-step argument would be Gauss; everyone who could recognize a good investment strategy would be Warren Buffett. It’s possible to put the point in Darwinian terms: if this is the sort of universe we inhabited, why wouldn’t we already have evolved to take advantage of it?

  6. Here is the best layperson explanation I've ever come across.

    Suppose we have a large group of students that we need to pair up to work on projects. We know which students are compatible with each other and we want to put them in compatible groups of two. We could search all possible pairings but even for 40 students we would have more than 300 billion trillion possible pairings.

    In 1965, Jack Edmonds gave an efficient algorithm to solve this matching problem and suggested a formal definition of "efficient computation" (runs in time a fixed polynomial of the input size). The class of problems with efficient solutions would later become known as P for "Polynomial Time."

    But many related problems do not seem to have such an efficient algorithm. What if we wanted to make groups of three students with each pair of students in each group compatible (Partition into Triangles)? What if we wanted to find a large group of students all of whom are compatible with each other (Clique)? What if we wanted to sit students around a large round table with no incompatible students sitting next to each other (Hamiltonian Cycle)? What if we put the students into three groups so that each student is in the same group with only his or her compatibles (3-Coloring)?

    All these problems have a similar flavor: Given a potential solution, for example, a seating chart for the round table, we can validate that solution efficiently. The collection of problems that have efficiently verifiable solutions is known as NP (for "Nondeterministic Polynomial-Time,").

    We call the very hardest NP problems (which include Partition into Triangles, Clique, Hamiltonian Cycle and 3-Coloring) "NP-complete," that is, given an efficient algorithm for one of them, we can find an efficient algorithm for all of them and in fact any problem in NP. Steve Cook, Leonid Levin, and Richard Karp developed the initial theory of NP-completeness that generated multiple ACM Turing Awards. In the 1970s, theoretical computer scientists Garey and Johnson showed hundreds more problems as NP-complete.

    NP-hard is a class of problems that are at least as hard as the hardest problems in NP.

    Now, a bit of a background about the holy-grail (the $1 million question) — Is P=NP?

    • True (the $6 million answer): P = NP means that for every problem that has an efficiently verifiable solution, we can find that solution efficiently as well.
    • False: P ≠ NP means that it is impossible to find a general algorithm that will correctly and accurately solve an NP-complete problem all the time.

    Lance Fortnow, "The Status of the P Versus NP Problem," Communications of the ACM, Vol. 52 No. 9, Pages 78-86, September 2009.

  7. If you are still confused, it might be due to some assumptions that you are making incorrectly/unknowingly.

    To clarify:

    NP: Doesn't stand for Non-Polynomial but Non-deterministic Polynomial
    NP & NP Complete are decision problem (Yes or No)
    NP: Given a problem that non deterministic turing machine solves in polynomial time, and its solution, can you verify in polynomial time if the solution is correct?
    NP Complete: Is this an NP problem in disguise? That is, given an NP problem, can I transform it into another NP problem in polynomial time? The significance of this is, if I can solve an NP Complete problem in polynomial time, then I can solve all NP Complete problems in polynomial time.
    NP Hard: Not necessarily decision problems. 'Harder' than NP problem. Why? Because there might be some NP Hard problem whose solution you cannot verify in polynomial time (See NP above). These are the problems that can be transformed from an NP Complete problem in polynomial time. The significance of this is that if I can solve NP Hard problem in polynomial time, then I can 1) solve all NP Hard problems in polynomial time and 2) solve all NP Complete problems in polynomial time (but reverse is not true). In this way, I can solve all NP problems in polynomial time. However, if I can solve all NP problems in polynomial time, then it doesn't necessarily mean I can solve all NP Hard problems in polynomial time. Why? because a NP Hard problem can be such that given its solution, there might not be a polynomial verification (a requirement for NP)

  8. First, let's understand what NP is. Suppose you stumble upon a Sudoku puzzle that you just can't seem to solve. Then, a friend walks up to you and boasts, "Haha! I know the answer to the problem!" Amazed, you tell your friend, "Prove it!" Your friend fills out the empty spaces in your Sudoku puzzle, and behold, upon a quick glance, you've verified that the answer is correct. "How did you do that?" you asked your friend. He responds, "It doesn't matter ;)"

    Sudoku is an NP problem because it you can "quickly" verify a solution to a Sudoku problem. (By "quickly", I really mean "in polynomial time.") It doesn't matter if you can actually figure out the solution quickly or not. All that matters for a problem to be in NP is if a solution can be verified quickly.

    Now, let's move on to NP-Hard problems. Looking at your Sudoku puzzle, you wonder, "Well, I can't solve Sudoku puzzles quickly… But what if, I had a magical ability to instantly solve any Sudoku puzzle in the blink of an eye!!!" The answer to this "what if" is pretty cool. It turns out that if you did have the magical ability to solve any Sudoku puzzle instantly, you can use this ability to solve (not just verify) any NP problem quickly (in polynomial time) as well! For example, given the ability to solve Sudoku puzzles instantly, you can use this magical ability to help you quickly figure out if a FreeCell game is actually completable. (Note that this may not be very "easy" to do, but computer scientists have proven that it is possible, which is all that matters.)

    Thus, Sudoku is in NP-Hard. In summary, an NP-Hard problem is a problem such that, given the magical ability to instantly solve it, you can use this ability to solve any NP problem quickly.

    So this is the layman's version of the definition of NP-Hard. But why "Hard"? Well, if you think about it, NP-Hard problems are quite powerful in the sense that if you can magically solve them, the possibilities are endless. Of course, if a problem is very "powerful", then you wouldn't expect it to be easy to solve, right? In contrast, easy problems, like addition, don't grant you a lot of power in the sense that being able to do addition instantly doesn't really help with solving other NP problems quickly. So powerful problems, like Sudoku, are at the same time very hard to solve, hence the term NP-Hard.

    Footnote: Sudoku is also in NP-Complete. The only difference between NP-Hard and NP-Complete is that NP-Complete problems must also be NP problems, but NP-Hard problems do not have this restriction.

  9. My aim is to provide the intuition for this question as other answers did a good job describing the technical terms.

    Consider a scenario in which you walk into the exam hall and have the following three questions:

    1. Does there exist [math] x [/math] in set  [math]  \{ 1, 2, \dots, 1000 \} [/math] such that [math] x * x = 300 [/math] ?
    2. Do there exist [math] x, y, z [/math] in set [math] \{ 1, 2, \dots, 10\} [/math]  such that [math] x * y * z + x * y ^ 3 * z + 17 * x * y * z ^ 2 =  4564[/math] ?
    3. Do there exist [math] a, b, c [/math] in set [math] \{ 1, 2, \dots, 10\} [/math]  such that [math] 17 * a * b * c ^ 2  + a * b * c + a * b ^ 3 * c =  4564[/math] ? 

    While skimming through these questions, you feel that the first question is easy but the last two are difficult. What exactly do you mean by easy or difficult? Both are relative and vague terms, right? Can we label the problems which are indeed easy (or doable) and which are difficult (not doable)?

    Before we start, lets assume two things —

    1) You are no less than Arthur T. Benjamin[1] and can do any calculation in one sec no matter however complicated that is.
    2) You don't know about complicated stuff like square-root smare-root!

    With this you start to solve the first question. Your approach is simple and robust: trying all the possible values of [math] x[/math]. So there are 1000 possible values to try. Estimated time to solve this problem — 1000 secs. But you realize that [math] x [/math] has to be even if you hope to satisfy the equation. Cool! Now you have only 500 possibilities to try from. Estimated time to solve the problem — 500 secs. Another realization is that [math]x^2 [/math] increases as you keep increasing [math]x[/math] and you remember that 20 * 20 = 400. Hence, if at all there exists a solution, it should lie within the first 20 numbers. Now you only need to check 20 possibilities. Estimated time to solve problem — 20 secs. Wow! This is 50 times improvement over the naive approach.

    Now let's turn to the second question. Consider [math](x, y, z)[/math] as a tuple. Since every element in the tuple can take any value between 1 and 10, we have 1000 such possible tuples. You still need only one sec to check whether [math]x * y * z + x * y ^ 3 * z + 17 * x * y * z ^ 2[/math] is indeed equal to 4564 for given [math](x, y, z)[/math]. Estimated time to solve problem — 1000 sec. There is hardly any hope to reduce this time. There is no special property or structure that you can exploit to reduce the number of possibilities of the tuple (and hence the time to solve) significantly.

    Not willing to waste your 1000 secs on the second question, you turn to the third one. You are again stuck with the same difficulties but you realize that if you could solve the second question, you will very well be able to solve the third one too. You can just rename some variables, rearrange some terms and it looks exactly like the second problem.

    So having realized that now you have only one problem to solve, you turn your attention to the second question. Since the clock is ticking, you can't explore all the possibilities. Now you have two options — You can ask the guy/girl sitting next to you for the solution and s/he does tell you the solution, say [math](x, y, z) = (3, 5, 2)[/math]. Instead of merely copying his/her solution, you check it in one sec. Notice that you can't solve the problem in one sec but if someone provides you the solution, you can check it in one sec. This is possible because s/he has done all the hard work (creative part) and you are just verifying it.

    If your moral values don't allow you to copy, you can take the following approach. Guess the values of [math](x, y, z)[/math] and check the solution. So your scratch paper for Question 2 looks like

    • (1st possibility) — (4, 8, 2)
    • (2st possibility) — (3, 9, 13)

    and so on and so forth, unlike for Question 1 where it looks like

    • (1st possibility) — 1
    • (2nd possibility) — 2

    You can predict with certainty what comes as the next possibility for Question 1 (and hence deterministic approach) but not for Question 2 (non-deterministic approach). These two words are used in the same context while defining P and NP.

    We broadly label (not divide!) all problems with two tags. First, where you can find the solution in reasonable time, we call it class P. Second, where you can only check solution in reasonable time provided by someone else. We call that class NP. Notice that class P is contained in class NP and hence we say label over divide.

    So Question 1 is in P (and also in NP) and Questions 2 and 3 are in NP (but may not be in P). This is because of second assumption. Please see footnote.

    Now it is an open challenge is to solve all problems in NP in reasonable time. And in the quest of one million dollars [2], you and me start to attack every problem which is known in NP. But it is foolish to attack hundreds of problems at once, right? So we decide to pick the hardest problem in this world and put all our energy in solving that problem. The set of such hard problems is called NP-Hard. But wait, we are making this matter more complicated than necessary. If we could pick the hardest problem in the class NP and solve it, we would be able to solve many problems which are less harder than that and are in NP. The class of such problems is called NP-Complete. So, I can vaguely say, Question 2 is in NP-Complete as once you solve that, Question 3 is easy.

    But why should you worry about classification in the first place?

    Since resources are limited, you don't want to assign them to some problem which takes lots of time. If you don't care about this, you could get lost trying to solve the second problem losing points for all three questions.

    Note –
    Technically, 2 and 3 are NOT in NP if you include algebraic tools. The P vs NP approach in Russian Math circle was to check whether a given problem can be solved without exploring (almost) all the possibilities. In this sense, we can classify Questions 2 and 3 in NP.

    References –
    1 Arthur Benjamin's Home Page
    2 Clay Mathematics Institute

  10. NP: Non-deterministic polynomial: Means that a problem in NP can be solved using a non-deterministic machine in polynomial time.
    This roughly means that if you have an infinite number of machines (computers [more rigorously, Turing Machines]), then you can guess the solution for each possible solution and ask each machine to verify that solution (which takes polynomial time for this class of problems).

    Note that all problems in P are in NP.

    NP-Complete: If you have a solution to a (typically decision) problem and the solution can be verified in polynomial time, but you can NOT come up with a solution in polynomial time, then the problem is said to be NP-complete.

    For example, suppose ask you if a value of V can be achieved in the knapsack problem with a given weight W of the knapsack, you can't quickly come up with a solution, but if I give you a possible set of items { … }, you can verify if it is a valid solution that has value V and fits in weight W. Hence, the decision knapsack problem is NP-complete.

    NP-Hard: If you have a solution to a (typically optimization) problem and the solution can not even be verified in polynomial time (let alone solving it in polynomial time), then the problem is said to be NP-hard.

    For example, suppose I claim that { … } is the best solution to the knapsack problem. Is it possible for you to verify it quickly? No, since you would have to solve knapsack to see if you can find a better solution, and solving optimization knapsack is NP-hard.

  11. P problems are questions that have a yes/no answer and can be easily  solved by a computer.
    For example-determining whether a number is prime  is a relatively easy problem to solve.

    NP problems are questions that have yes/no answers that are easy to  verify, but are hard to solve.  And by hard, I mean it would take years,  decades or centuries for your computer to come up with an answer.
    For example-the travelling salesman problem is trying to figure out the shortest trip through a bunch of cities, visiting each city only once.

    NP-complete problems are special kinds of NP problems.  You can take  any kind of NP problem and twist and contort it until it looks like an  NP-complete problem.
    For example-the knapsack problem is NP.  It can ask what's the best way to stuff a knapsack if you had  lots of different sized pieces of different precious metals lying on the  ground , and that you can't carry all of them in the bag.
    (Knapsack is also NP-complete, so you can do the reverse as well!)

    NP-Hard problems are worst than NP problems.  Even if someone  suggested you a solution to a NP-Hard problem, it'd still take forever  to verify if they were right.
    For example-in travelling salesman, trying to figure out the  absolute shortest path through 500 cities in your state would take  forever to solve.  Even if someone walked up to you and gave you an  itinerary and claimed it was the absolute shortest path, it'd still take  you forever to figure out whether he was a liar or not.

  12. P : problems that can be solved in polynomial time. for example  Sorting you can solve it in O(n log n).

    NP : decision problems that can be verified in polynomial time. for example Hamiltonian Path(Cycle). For Hamiltonian Path problem assume that you have solution as either Yes or No, then verifying whether it is correct , can be done in polynomial time.

    NP-Hard: If all the problems in NP can be reduced to problem X, then X is NP Hard.

    NP-Complete: problems that are NP Hard and in NP.

    Its still unknown whether P=NP or P!=NP. Proving Either wins $1000000
    cash prize. But it is widely believed that P!=NP.

    As a Software Engineer knowing this information is useful for following reason : Suppose your given a problem, and you don't know its NP-C you will WASTE lot of time trying to find a polynomial time or good solution !!

  13. They’re all sets of sets.

    In fact, they’re sets of sets of natural numbers.

    In the theory of computations, a problem is a set of natural numbers. To solve a problem means there’s an algorithm that provides member(s) of that set. To verify a solution means given a natural number there’s an algorithm that tells if it belongs to that set or not. For example,

    SQUARES = { n [math]\epsilon[/math] N : n is a perfect square }

    PRIMES = { n [math]\epsilon[/math] N : n is a prime number }

    are problems. Notice something? I’ve denoted the 3 sets of natural numbers with bold CAPITALS.

    A way to compare algorithms is to see how many computational steps (that’s time for computers and robots) the algorithms need to make vs. how many digits the input(s) have. If we were to asked to compete in multiplying two numbers of size (# of digits) m and k resp. and yours took m + k steps and mine needed m raised to k steps then obviously yours is superior! We’d say your algorithm scales linearly w.r.t input size and mine is exponential. (So there are algorithms which scale polynomially and then there are algorithms which are just proofs of existence 😉

    Definitions time:

    (Sets of sets of natural numbers in bold italic CAPS.)

    A problem that can be solved in polynomial time is said to be in P.

    A problem whose solution can be verified to be correct or wrong in polynomial time, once the tentative solution has already been provided by someone, is said to belong to NP. Of course, if you can solve a problem, you can verify a solution to it, so P [math]\subset[/math] NP.

    A problem whose complement is in NP is CONP.

    A problem is in NPHARD if every problem in NP can be “reduced” to that problem. (This seems to be the opposite of “reduction”, plus you’d have to take computational steps to reduce it – so it’s a requirement that the reduction itself is polynomial). This includes problems which might be in NP and up.

    The intersection of NP and NPHARD is called NPCOMPLETE, ‘cos they’re that cool. Every problem in NP can be converted to any problem in NPCOMPLETE, including themselves. Solve just one of them in polytime and you’ll have proved P=NP.

    Now for some examples:

    PRIMES is in P.

    The goal to find the shortest route through every city and return to the starting city, called the Traveling Salesperson Problem, is in NP.

    Proving tautologies in propositional logic is CONP. See propositional proof systems.

    Super Mario Bros. is NPCOMPLETE, and so is The Legend of Zelda!

    Reducing Wikipedia and arxiv articles to a Quora answer: NPOE. [citation needed]

    PS. I hope my more pedantic readers will excuse my lack of formalisms!

  14. P : problems that can be solved in polynomial time. for example  Sorting you can solve it in O(n log n).

    NP : decision problems that can be verified in polynomial time. for example Hamiltonian Path(Cycle). For Hamiltonian Path problem assume that you have solution as either Yes or No, then verifying whether it is correct , can be done in polynomial time.

    NP-Hard: If all the problems in NP can be reduced to problem X, then X is NP Hard.

    NP-Complete: problems that are NP Hard and in NP.

    Its still unknown whether P=NP or P!=NP. Proving Either wins $1000000
    cash prize. But it is widely believed that P!=NP.

    As a Software Engineer knowing this information is useful for following reason : Suppose your given a problem, and you don't know its NP-C you will WASTE lot of time trying to find a polynomial time or good solution !!

  15. To solve the above recurrence relation, you just have to use Master theorem. You can see that
    a = 4, b = 4, f(n) = nlogn = (n^c)(log^k n), where c = k = 1
    log a to base b = log4 to base 4 = 1 = c
    So its case 2 of Master theorem, which gives
    T(n) =  Θ(n^c)(log^k+1 n) = Θ(n^1)(log^2 n) = Θ(n log^2 n)

    To understand NP Complete and NP Hard classes, you need to have a basic understanding of a couple of things.

    A   decision problem is a problem that you can always answer either "yes"   or "no". For example, "Do you have a brother?". You can always answer   "yes" or "no" for such questions. Such problems are what we call   decision problems.

    A deterministic Turing machine is the machine   that we are used to normally. A computer is a deterministic Turing   machine. A non-deterministic Turing machine is a machine that comes  with  unlimited parallelism. For example, if you come to a fork on a  road,  you can either take the left road or the right road. That is how   deterministic Turing machine operates. But since non-deterministic   Turing machine has unlimited parallelism, it can take both the roads.   Its similar to running multiple threads on a computer.  Non-deterministic  Turing machines cannot be realized in practice.

    A decision problem is in class P,  if we can solve the problem in polynomial time using a deterministic  Turing machine. It  means that we can solve the problem very quickly. It  shall finish the  problem in some time n^k, where k is some constant.  For example, finding  the max element in an array, checking whether a  string is palindrome or  not, checking whether a number is prime or not,  and so on.

    A decision problem  is in class NP, if we can  solve it in polynomial time using a  non-deterministic Turing machine to  get the answer "yes" to your  problem. (The answer "no" is considered  to be in co-NP class). That  means that we cannot solve the problem in polynomial time using a deterministic Turing machine. But we can always check whether our solution is  right in polynomial time. So if someone gives  you an NP problem and  the answer as "yes", we can check whether the  answer is right or not in  polynomial time. But keep in mind that we  cannot find the answer in  polynomial time (only check whether an answer  is right).

    A problem X is in NP-Complete if we can prove that it’s in NP and that we can reduce a known NP  problem Y to X in polynomial time, i.e., we can quickly solve Y  if we  know how to quickly solve X. It means that if anyone finds out a  polynomial time solution to any NP-Complete problem, then every NP  problem can be solved in polynomial time. This shall prove that P = NP.  Eg: 3-SAT (conjunction of 3-clause disjunctions), Minesweeper problem.  Every NP problem can be reduced to 3-SAT problem (Cook’s Theorem)

    A problem is in NP-Hard if and only if its “atleast as” hard as an NP problem. The formal  definition is that a problem X is in NP-Hard if there is an NP-Complete  problem Y such that Y can be reduced to X in polynomial time. But since  any NP-Complete problem can be reduced to any other NP-Complete problem  in polynomial time, all NP-Complete problems can be reduced to any  NP-Hard problem in polynomial time. So if there is solution for one  NP-Hard problem in polynomial time, there is a solution for all NP  problems in polynomial time. NP-Complete problems are also NP hard. Also  NP-Hard problems may not be in NP, meaning that they may not have  solutions that can be verified in polynomial time. Eg: Halting problem  (given a program P and input I, will it halt? It is not in NP),  optimization version of TSP (we need to find an actual schedule; harder  than decision version of TSP). NP-Hard problems may not be decision  problem

  16. I think the main basis of understanding the "hardness" of the problem is to understand the goal behind the need for categorization.
    Our goal is:
     (a) Given a problem, I want to know all the solutions
     (b) Given a problem, and given a solution, verify if the solution is correct.

    Now, assume there is an algorithm devised for a problem.
    An algorithm can be defined as a set of instructions that are executed (conditionally) based on a set of variables and rules (image a flow chart)

    If this algorithm is run and it solves (a), then the output of the algorithm is a solution, if the algorithm is for (b) then the output is 'yes' or 'no'.

    A turing machine is a representation of this algorithm, which (may or may not) takes input, runs the algorithm and gives output (has a final state) i.e it ends.

    Now the algorithm that runs and produce a solution, its called a Deterministic Turing Machine (DTM) as the final state of providing solution is deterministic.

    But, if a algorithm could take multiple paths while arriving at a solution(s) and the path that is chosen to arrive at solution cannot be stated deterministically, it is a Non-deterministic Turing machine (NTM). There are a class of problems which can be solved if we exhaust all the possible paths (imagine a tree of possible states) and arrive at a solution. These are known to be solved using a NTM.
    Its easy to see that a NTM can be expressed as a DTM.

    Now that the machine is ready, we need to worry about how much time do we take to get a solution.
    If the time it takes is based on input size, and the time can be expressed as a polynomial equation involving input size, it can be said that the algorithm can be solved in polynomial time.
    And the set of problems that can be solved using a DTM in polynomial time are grouped in class P.

    The problems, that can be solved using NTM in polynomial time, are grouped NP (Non-deterministic turing machine in Polynomial time). As we discussed about the algorithms that fit in the NTM, are not easy to be solved in polynomial time. And hence P = NP asks a question if a NP problems can actually be solved in polynomial time like P.

    As others explained above, NP-complete problems are the ones which can be solved using the solutions of other NP problems in polynomial time. Hence if we have solution for one, we can use it to solve the other (in polynomial time). And given this property, if we find a the max 'hardest' problem that is NP-complete, we will be able to solve all NP problems. And these are called the NP-hard problems.
    Its easy to see that if we can solve a NP-hard problem, then all NP problems can be solved in polynomial time, and hence P = NP.

    Note: Most use NP class to denote decision problems, but it is not limited to that.

  17. I think this would be a very concise answer, Please go through:

    Notice how difficulty increases top to bottom: any NP can be reduced to NP-Complete, and any NP-Complete can be reduced to NP-Hard, all in P (polynomial) time.

    If you can solve a more difficult class of problem in P time, that will mean you found how to solve all easier problems in P time (for example, proving P = NP, if you figure out how to solve any NP-Complete problem in P time).

    | Problem Type | Verifiable in P time | Solvable in P time | Increasing Difficulty
    ___________________________________________________________|           |
    | P            |        Yes           |        Yes         |           |
    | NP           |        Yes           |     Yes or No *    |           |
    | NP-Complete  |        Yes           |      Unknown       |           |
    | NP-Hard      |     Yes or No **     |      Unknown ***   |           |
    ____________________________________________________________           V

    Notes on Yes or No entries:

    • * An NP problem that is also P is solvable in P time.
    • ** An NP-Hard problem that is also NP-Complete is verifiable in P time.
    • *** NP-Complete problems (all of which form a subset of NP-hard) might be. The rest of NP hard is not.

    I also found this diagram quite useful in seeing how all these types correspond to each other (pay more attention to the left half of the diagram).

  18. Many people have already clarified the difference in other related questions. But, in case those haven't helped you, here's an explanation in laymen terms:

    A problem is said to be in class P if there is a polynomial-time algorithm to solve the problem. Additionally, you can also verify* a solution in polynomial-time for all problems in class P.

    A problem is said to be in class NP if you can verify* a solution in polynomial-time but you don't know yet if you can solve it with a polynomial time algorithm.

    *By verify I mean – if an Oracle presents a solution to a problem, you can verify if the solution is correct or not in polynomial-time.

  19. Let me try to put it this way : Problems in the class NP are mostly natural problems whose intellectually “laziest” solution involves going through all possible value combinations of involved decision variables. This laziest solution for man is of course the most difficult for his slave, the machine ! While each one of those problems have intrinsic structure, their structure has not been explored thoroughly enough (except in some special cases) to be used in general as a basis for building smarter-than-lazy algorithms. We mean by : “smarter-than-lazy” those Algorithms which take a polynomial number of steps with respect to the length of their input(s). This is not the end of the story, however : An interesting phenomenon called completeness (hardness is similar, but not exactly the same) has been discovered which basically says that all those problems have the same structure of a more basic one, called SAT. SAT poses an eternal question of whether the truth value of a sentence in the most basic logic system known in modern ages (the propositional calculus) can be deduced without recurring through all possibilities of truth values of variables.

    Where in all this is Determinism (i.e. causal behavior) and Non Determinism (i.e. no-single-cause behavior) of Algorithms ? It is actually easy to understand. When people failed to explore the structure of those natural problems mentioned above and had therefore to live with the idea that the lazy solution might be the only possible one on normal machines, they imagined a theoretical machine model in which a solution is calculated “somehow” without the necessity of explaining why. Actual machines are of course causal, i.e. any step is caused by one and only one previous step providing hence always an explanation of what is happening and why. Calculating the solution “somehow” had a very important advantage : It could be used to drop the requirement of going through all combination values without explaining why or how this is done 🙂 … So a machine working in a “no-single-cause” mode solves the natural problems above in a smart way, “somehow” ! This explains why they are also called problems in NP (the non deterministic, polynomial Algorithms class) while usual smart Algorithms are in DP (the deterministic polynomial Algorithms class).

    Why did i say “natural” problems ? Not only, because they are occurring in nature, but also because they are occasionally solved by nature in a fast way. So when math asks the question whether NP=P it is actually asking also : “Is it non determinism or determinism which nature uses when it solves those problems?” taking this question to a whole other level of discussion, since – as you may know- a very similar question is at the heart of modern physics and many big scientists – including Einstein – refused to accept any non deterministic model describing natures brain.

    What is the difference between “no single cause” and “no cause at all”? The former includes the possibility of many things causing what is happening (without us knowing really which one of them is responsible) while the latter : “No cause at all” means – in theory at least – true randomness. Are there classes of random Algorithms ? What do we mean by a “random” Algorithm ? Isn’t true randomness something which defeats the mere idea of an Algorithm ? And even if we admit that random Algorithms exist (by requiring coins to be tossed before decisions are taken for example): Are they more powerful than non deterministic or deterministic ones ? Does nature use random Algorithms when it solves the above problems ? This is formalized in the question whether P=BPP or not (BPP being the big class of random Algorithms). It is actually also known that if P=NP then also P=BPP which basically says that if we break the above puzzle of “structure”, causal and non causal worlds collapse in one. So as you see the NP problem is linked to some deep stuff. :)))

    Amazingly enough the consensus among CS people is now that both : P<>NP and P=BPP, meaning : Although many seem to believe there is no point exploring structure of above natural problems because non determinism is more powerful than determinism, they see randomness, which is the best way to “somehow” do things, to be only as powerful as determinism. Weird … no ? :))))) – regards

Leave a Reply

Your email address will not be published. Required fields are marked *