Unifying Ideas in Computer Science
Spring 2015
MP Johnson

Problem Set 4
Due by email by 11:59pm, Wednesday, May 19th

You're free to work with a partner, but you must each submit five different problems.

  1. Add your five NP-Complete problems from the previous problem set to the Compendium. When you go there and click the Submit link at the bottom of the page, you'll log in with your Google account. Then for each problem you add, specify the name of the problem (e.g. MIS), what constitutes a problem instance (e.g., an unweighted graph $G$), what constitutes a solution (e.g., a subset $S$ of $G$'s nodes sharing no edges among them), and how the quality of a solution is measured (e.g., the size of the subset $S$). For "Problem Good" and "Problem Bad" add anything you found in researching this problem about how well it can be (approximately) solved, or negative results saying it can't be solved better than some bound. For some problems there might be little known one way or the other, but add whatever you can find. See existing problems in the Compendium for examples. Finally, add tags describing the problem (e.g., "graph" or "game" or "scheduling", etc.) and your name. And if you can, add an IP formulation of the problem.

    When you submit your homework, include the links to the five new pages you've added to the Compendium.

    Of course, before you add, search it to make sure your problems don't already exist!

    1. Construct an example of the inverse fallacy, whereby, e.g., $P(A|B)$ is very high and yet $P(B|A)$ is very small. That is, create a population of individuals (1000, say), where $A$ and $B$ are each properties that an individual may or may not possess, so that $P(A|B)$ = # who are both $A$ and $B$ / # who are $B$, and $P(B|A)$ = # who are both $B$ and $A$ / # who are $A$. (For example, think of $A$ = "got good grades in high school" and $B$ = "was accepted by MIT".) Choose how many within the population satisfy just $A$, just $B$, both, and neither, in order to make the conditional probabilities come out as desired.

    2. Write a paragraph on the importance of understanding that the values of $P(A|B)$ and $P(B|A)$ can be totally different from one another.
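       One concrete population of the kind part 1 asks for can be checked in a few lines. (The counts below are invented for illustration; any split of 1000 people with the same shape works.)

       ```python
       # Hypothetical 1000-person population illustrating the inverse fallacy.
       # A = "got good grades in high school", B = "was accepted by MIT".
       both = 10       # A and B
       only_a = 890    # A but not B
       only_b = 0      # B but not A: everyone accepted had good grades
       neither = 100
       assert both + only_a + only_b + neither == 1000

       # P(A|B) = # both / # B, and P(B|A) = # both / # A.
       p_a_given_b = both / (both + only_b)   # high: grades ~guaranteed given acceptance
       p_b_given_a = both / (both + only_a)   # tiny: good grades rarely yield acceptance
       print(p_a_given_b, p_b_given_a)
       ```

       Here $P(A|B) = 1$ while $P(B|A) \approx 0.011$, even though both probabilities condition on the same joint counts.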

  2. In discussing induction, we saw that machine learning's modern version of the assumption that "the future will be like the past" is that "samples will be drawn from the same distribution we trained on." Valiant's Theorem tells us that as long as the distribution remains unchanged, if we train on a certain number of samples then with probability $1-\delta$ the learned rule will be accurate on at least a $1-\epsilon$ fraction of future samples. We saw that one source of vulnerability in such learning is very rare events: if we're unlucky, they might not occur in our training data.
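
    To make the quantities concrete, here is a sketch of a standard PAC sample-complexity bound for a finite hypothesis class (this particular formula and the example numbers are a common textbook form, assumed here rather than taken from lecture):

    ```python
    import math

    def pac_sample_bound(eps, delta, hypothesis_count):
        """Number of i.i.d. training samples sufficient so that, with
        probability at least 1 - delta, a consistent learned rule errs on
        at most an eps fraction of future draws from the SAME distribution
        (finite-hypothesis-class bound)."""
        return math.ceil((1.0 / eps) * (math.log(hypothesis_count) + math.log(1.0 / delta)))

    # E.g., 95% accuracy with 99% confidence over ~a million hypotheses:
    m = pac_sample_bound(eps=0.05, delta=0.01, hypothesis_count=2**20)
    print(m)
    ```

    Note that the guarantee is conditional through and through: every term assumes future samples come from the training distribution, which is exactly the assumption the next paragraph asks you to question.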

    Write a couple of paragraphs reflecting on the situation you're in when, after training, some totally unexpected event occurs. Think of the day when the chicken finally gets its head cut off, or the day when the meteor comes and kills off the dinosaurs, or when 9/11 happens and everyone concludes that now there's a "new normal" where planes will be hijacked every other day. When this happens, how do you distinguish between this being the first sample drawn from a totally new distribution versus the stars having happened to align a certain way, allowing the freak accident to occur?