Foundations of CS II

Probability, Statistics, and Algorithms

Two relevant papers

Posted by James on March 10, 2012

First, I was asked about what could be computed if time travel were possible. See this paper for some answers.

We saw a few polynomial-time reductions in class. Here’s a paper which shows that 3SAT reduces to Super Mario Bros.

Algorithms and NP-completeness

Posted by James on March 6, 2012

Here are the last three lectures:

  1. Examples of computational problems

  2. Polynomial-time reductions

  3. NP and NP-completeness

Final Exam: What to know

Posted by James on March 5, 2012

The final exam will take place Monday, March 12th, in class, from 2:30-4:20pm. Just like lecture, it’s in room MGH 241.

The final exam will be open notes. Please limit yourself to one notebook's worth of notes.

Here is a list of what you should know/be able to do for the final exam.

  • Counting: Permutations and Combinations
  • The axioms of discrete probability and how they can be used
    to derive various elementary properties (e.g. inclusion-exclusion)

  • Conditional probabilities, the chain rule
  • Bayes' Theorem, basic Bayesian inference (e.g. the robot problem)
  • The definition of independence, and the properties of independent events
  • The definition of a random variable and the probability mass function
  • Expectation of a random variable
  • Variance of a random variable
  • Various types of discrete random variables:
    Uniform, Binomial, Poisson, Geometric

  • What tail bounds are useful for; Markov’s inequality and Chebyshev’s inequality
  • The basics of continuous random variables; the probability density function and the cumulative distribution function
  • The exponential and normal distributions
  • The statement of the Central Limit Theorem; how it can be used
  • Parameter estimation
  • The method of moments estimator
  • Maximum-Likelihood estimation
  • Hypothesis testing
  • The definition of polynomial-time, NP, and NP-completeness

Lecture 20: Hypothesis testing

Posted by James on March 3, 2012

Here are the slides.

Homework #8

Posted by James on February 28, 2012

Due, in class: March 7th

For this homework, you will need to turn in your solutions in two parts. This is for ease of grading by the TAs. Problems 1-2 and 3-5 should be turned in on different sheets of paper (there will be two different piles).

In solving the problem sets, you are allowed to collaborate with fellow students taking the class, but remember that you are required to write up the solutions by yourself or with a single partner. If you do collaborate in any way, you must acknowledge, for each problem, the people you worked with on that problem.

The problems have been carefully chosen for their pedagogical value, and hence might be similar to those given in past offerings of this course at UW, or similar to other courses at other schools. Using any pre-existing solutions from these sources, or from the web, constitutes a violation of the academic integrity you are expected to exemplify, and is strictly prohibited.

Most of the problems only require one or two key ideas for their solution.  It will help you a lot to spell out these main ideas so that you can get most of the credit for a problem even if you err on the finer details.

A final piece of advice:  Start working on the problem sets early!  Don’t wait until the day (or few days) before they’re due.

Problems

Part I

  1. Consistent estimates. An estimate \hat \theta of some parameter \theta is called consistent if, for every \varepsilon > 0, we have P(|\theta - \hat \theta| > \varepsilon) \to 0 as n \to \infty, where n is the number of samples.

    • Show that if E[|\theta-\hat \theta|^2] \to 0 as n \to \infty, then the estimator \hat \theta is consistent. (Hint: Use a proof by contradiction.)

    • Consider X_1, X_2, \ldots, X_n \sim \mathrm{Uni}(0,\theta). Find the MLE \hat \theta_n after these n samples.

    • Give an exact mathematical expression for the cumulative distribution function of \hat \theta_n. Also compute the probability density function.

    • Use this to prove that \hat \theta_n is consistent. (Hint: Use the first part of the problem.)

  2. Power Laws. It is generally believed that the number of incoming links to a web page satisfies a power law. A Power Law Distribution with parameter \theta has density f(x) = \theta x^{-\theta-1} for x \geq 1. Given samples X_1, X_2, \ldots, X_n from such a distribution, determine the MLE for \theta.

Part II

  1. Web server requests. Suppose that k_1, k_2, \ldots, k_n were the number of requests arriving at a web server during each of n successive minutes. Under the assumption that these counts follow a Poisson distribution with unknown parameter \lambda, what is the maximum likelihood estimator for \lambda?

  2. Look at the fair/biased coin example in the hypothesis testing slides.

    1. As claimed on the slides, at the decision threshold of 60, the alternate hypothesis (that the coin is biased) is more likely. At what threshold is the likelihood ratio nearest to 10? (I.e., the data strongly favor the alternate hypothesis.)

    2. Estimate α, β, and the likelihood ratio if n=200 coin flips were gathered, and the decision rule were generalized slightly to read “accept the null hypothesis if the number of Heads is less than or equal to 0.6*n.” Are these answers different from the answers for n=100? Briefly explain why you should expect this. (You may use the normal approximation to the binomial for the calculations.)

  3. The Big Boss at Amazoogle.com says that, based on long experience, the best model for errors in software is that experienced programmers make errors at a rate of p errors per line of code (say, p = 0.001), no matter what you do. The salesperson for Ecliptic insists their IDE will save megabucks by reducing the error rate per line to q = p/2 (“… and just for you, this week only, there’s a special sale price, don’t miss it, blah, blah, …”). The Big Boss is a skeptic, but willing to experiment, and arranges for m different programs to be written using the new IDE, with lengths n_1, n_2, \ldots, n_m, and the QC department finds, respectively, x_1, x_2, \ldots, x_m errors in them.

    The Boss asks you to analyze and present the results to Management. Being a whiz at hypothesis testing, you know just what to do.

    1. Explain what hypotheses you would examine, what decision rule you would use, etc.
    2. Suppose m=3, the programs are of length 2, 4, and 6 thousand lines, and the numbers of errors found were 2, 1, and 5, respectively. How would you summarize the results of the experiment, what recommendations would you make, and what are the uncertainties associated with them? Is your null hypothesis more or less likely than your alternative? Is it 5 times more or less likely?
    3. After your presentation, the Big Boss says “Why didn’t you just average the error rates for the 3 programs: (2/2000 + 1/4000 + 5/6000)/3 and compare that to p?” What will you answer and why?
    4. B.B. also says “For the same cost, I could have had more shorter programs, say two dozen of 500 lines each, or fewer longer ones, maybe even just one of 12000 lines. Which would have been better?” What will you answer and why?

Lecture 19: Maximum-likelihood estimation

Posted by James on February 28, 2012

Here are the slides.

For an example of how MLE calculations are done when the likelihood equations cannot be solved exactly, see the well-known Expectation-Maximization (EM) algorithm.
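
To get a feel for the iterative flavor of such calculations, here is a minimal Python sketch of the EM idea for a mixture of two unit-variance Gaussians with unknown means and mixing weight. This is my own illustrative example (the function name, initialization, and parameters are invented for the illustration), not something from the slides.

    import numpy as np

    def em_two_gaussians(x, iters=100):
        # EM for a mixture of two unit-variance Gaussians with unknown means and weight.
        mu1, mu2, w = x.min(), x.max(), 0.5      # crude initialization
        for _ in range(iters):
            # E-step: posterior probability that each sample came from component 1.
            p1 = w * np.exp(-0.5 * (x - mu1) ** 2)
            p2 = (1 - w) * np.exp(-0.5 * (x - mu2) ** 2)
            r = p1 / (p1 + p2)
            # M-step: re-estimate the parameters from the soft assignments.
            mu1 = np.sum(r * x) / np.sum(r)
            mu2 = np.sum((1 - r) * x) / np.sum(1 - r)
            w = np.mean(r)
        return mu1, mu2, w

    # Example: samples from an equal mixture of N(-2, 1) and N(3, 1).
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])
    print(em_two_gaussians(x))   # the estimated means should come out near -2 and 3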

Schedule for the rest of the quarter

Posted by James on February 26, 2012

  1. 2/27: Maximum-likelihood estimation
  2. 2/29: Hypothesis testing (BT, Section 9.3), HW #7 due, HW #8 assigned
  3. 3/2: Algorithms introduction (DPV, Chapter 0)
  4. 3/5: A theory of efficiency
  5. 3/7: NP and polynomial-time reductions (DPV, Chapter 8), HW #8 due
  6. 3/9: NO CLASS, time for studying

Final Exam: Monday, March 12, 2012, 2:30-4:20 pm

Lecture 18: Parameter estimation

Posted by James on February 24, 2012

Reading: BT, Section 9.1.

Many families of distributions can be specified using a parameter, e.g. \mathsf{Bin}(p) (binomial with success probability p), \mathsf{Poi}(\lambda), \mathsf{Exp}(\lambda), N(\mu, \sigma^2), \mathsf{Uni}(a,b), etc. We write the parameter(s) as \theta, which could be a vector. E.g. for a normal random variable with distribution N(\mu,\sigma^2), we would have \theta = (\mu, \sigma^2). These are called parametric models, and our general goal is: Given many independent samples from such a model, determine \theta as best we can.

Confidence intervals

Let X_1, X_2, \ldots, X_n be i.i.d. samples from some distribution with mean \mu and variance \sigma^2. Recall the sample mean \overline{X} = \frac{1}{n} \sum_{i=1}^n X_i. We defined the sample variance by:

\displaystyle S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \overline{X})^2

Now the 100(1-a)% confidence interval is the interval given by

\displaystyle \left(\overline{X} - z_{a/2} \frac{S}{\sqrt{n}}, \overline{X} + z_{a/2} \frac{S}{\sqrt{n}}\right),

where z_{a/2} is such that \Phi(z_{a/2}) = 1-a/2, and \Phi is the CDF of an N(0,1) random variable. (See here for the CDF table.)

Recall the intuition behind this interval: for large n, the probability that the true value of \mu falls in this interval is approximately 100(1-a)%. This would be true if our sample mean \overline{X} were truly distributed as an N(\mu,\frac{\sigma^2}{n}) random variable. Here, we make the implicit assumption that for large n, the Central Limit Theorem has kicked in enough to justify a normal approximation.
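
As a concrete illustration (my own example, not from lecture), here is a short Python sketch that computes a 95% confidence interval for the mean from simulated data, using z_{0.025} \approx 1.96:

    import numpy as np

    # Simulated data: n i.i.d. samples from an exponential distribution with mean 2.
    rng = np.random.default_rng(1)
    n = 400
    x = rng.exponential(scale=2.0, size=n)

    xbar = x.mean()                # sample mean
    s = np.sqrt(x.var(ddof=1))     # square root of the sample variance S^2 (n-1 in the denominator)

    z = 1.96                       # z_{a/2} for a = 0.05, since Phi(1.96) is about 0.975
    lo, hi = xbar - z * s / np.sqrt(n), xbar + z * s / np.sqrt(n)
    print("95% confidence interval for the mean:", (lo, hi))

In roughly 95% of independent repetitions of this simulation, the printed interval should contain the true mean 2.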

Method of moments

Recall that the k-th moment of a random variable X is m_k = E[X^k]. The corresponding k-th sample moment is the random variable \hat m_k = \frac{1}{n} \sum_{i=1}^n X_i^k, where X_1, X_2, \ldots, X_n are i.i.d. samples from the distribution.

For instance, the MOM estimate for the variance would be \hat m_2 - (\hat m_1)^2, following the formula \mathrm{Var}[X] = E[X^2] - (E[X])^2.
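
In code (again a sketch with made-up data), the sample moments and the resulting MOM variance estimate look like this:

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.poisson(lam=3.0, size=1000)   # i.i.d. samples; Poi(3) has mean 3 and variance 3

    m1_hat = np.mean(x)          # first sample moment, estimates E[X]
    m2_hat = np.mean(x ** 2)     # second sample moment, estimates E[X^2]

    var_mom = m2_hat - m1_hat ** 2   # MOM estimate of Var[X] = E[X^2] - (E[X])^2
    print(m1_hat, var_mom)           # both should be close to 3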

Bias of the estimator

If our estimator (which, remember, is a random variable!) is \hat \theta, then the bias of the estimator is defined by E[\hat \theta] - \theta, where \theta is the true value of the underlying parameter.

An estimator is called unbiased if it has bias = 0. In particular, the sample mean and sample variance are unbiased, while the MOM estimate for the variance is biased (though its bias goes to 0 as n \to \infty).
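
A quick simulation (my own sketch, with arbitrary parameters) makes this concrete: averaging each estimator over many independent trials approximates its expectation, and the average MOM variance estimate falls below \sigma^2 while the average of S^2 does not.

    import numpy as np

    rng = np.random.default_rng(3)
    sigma2 = 4.0            # true variance
    n, trials = 5, 20000    # a small n makes the bias easy to see

    mom_estimates, s2_estimates = [], []
    for _ in range(trials):
        x = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=n)
        mom_estimates.append(np.mean(x ** 2) - np.mean(x) ** 2)   # MOM variance (divides by n)
        s2_estimates.append(x.var(ddof=1))                        # sample variance S^2 (divides by n-1)

    print("average MOM variance estimate:", np.mean(mom_estimates))   # roughly (n-1)/n * sigma2 = 3.2
    print("average sample variance S^2:  ", np.mean(s2_estimates))    # roughly sigma2 = 4.0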

Estimating parameters using the method of moments

We ended by showing how the MOM could be used to estimate parameters for a number of distributions. For instance, if we have i.i.d. samples X_1, X_2, \ldots, X_n from a \mathsf{Ber}(p) distribution, then the MOM estimate for p would be

\displaystyle \hat p = \frac{1}{n} \sum_{i=1}^n X_i.

For a more interesting example, consider samples from a \mathsf{Uni}(a,b) distribution, i.e. a distribution which is uniform on the interval [a,b]. We know from previous lectures that this distribution has \mu = \frac{a+b}{2} and \sigma^2 = \frac{(b-a)^2}{12}. Thus we can use the MOM to find estimates \hat a and \hat b by solving the system of two equations: \hat m_1 = \frac{\hat a+\hat b}{2} and \hat m_2 - (\hat m_1)^2 = \frac{(\hat b-\hat a)^2}{12}. The result is \hat a = \hat m_1 - \sqrt{3} \hat \sigma and \hat b = \hat m_1 + \sqrt{3} \hat \sigma, where (\hat \sigma)^2 = \hat m_2 - (\hat m_1)^2.
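
In code, the recipe above looks like this (a sketch; the true values a = 1 and b = 5 are arbitrary choices for the simulation):

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.uniform(low=1.0, high=5.0, size=2000)   # samples from Uni(1, 5)

    m1_hat = np.mean(x)
    m2_hat = np.mean(x ** 2)
    sigma_hat = np.sqrt(m2_hat - m1_hat ** 2)

    a_hat = m1_hat - np.sqrt(3) * sigma_hat
    b_hat = m1_hat + np.sqrt(3) * sigma_hat
    print(a_hat, b_hat)   # should be close to 1 and 5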

Homework #7

Posted by James on February 23, 2012

Due, in class: February 29th

For this homework, you will need to turn in your solutions in two parts. This is for ease of grading by the TAs. Problems 1-3 and 4-7 should be turned in on different sheets of paper (there will be two different piles).

In solving the problem sets, you are allowed to collaborate with fellow students taking the class, but remember that you are required to write up the solutions by yourself or with a single partner. If you do collaborate in any way, you must acknowledge, for each problem, the people you worked with on that problem.

The problems have been carefully chosen for their pedagogical value, and hence might be similar to those given in past offerings of this course at UW, or similar to other courses at other schools. Using any pre-existing solutions from these sources, or from the web, constitutes a violation of the academic integrity you are expected to exemplify, and is strictly prohibited.

Most of the problems only require one or two key ideas for their solution.  It will help you a lot to spell out these main ideas so that you can get most of the credit for a problem even if you err on the finer details.

A final piece of advice:  Start working on the problem sets early!  Don’t wait until the day (or few days) before they’re due.

Problems

Part I

  1. Let X be a random variable with probability density function

    \displaystyle f(x)=\begin{cases} c(1-|x|^3) & -1 < x < 1 \\ 0 & \textrm{otherwise} \end{cases}

    • What is the value of c?

    • What is the cumulative distribution function of X?

  2. Let X be a random variable whose density function is defined by

    \displaystyle f(x) = \frac{\lambda}{2} e^{-\lambda |x|},

    where \lambda > 0 is some positive parameter.
    Verify that f is a proper density function. Evaluate the mean and variance of X in terms of \lambda.

  3. Suppose you throw a dart at a circular target dart board with radius 1, and your throw is equally likely to hit any point of the target. Let X be the distance from the dart to the outer border of the board.

    • Find the density function of X, as well as its mean and its variance.
    • Suppose the “bull's eye” on the dart board is an inner circle of radius \varepsilon \leq 1. If you hit inside the bull's eye, you get Y=1/(1-X) points and otherwise you get Y=0 points. What’s the cumulative distribution function of Y?

Part II

  1. Police response times.

    • A police station is to be located along a road of length L. If crimes occur at points uniformly chosen on the interval (0,L), where should the station be located so as to minimize the expected distance from the crime? That is, choose P so as to minimize E[|X-P|], when X is uniformly distributed over (0,L). Explain your answer.

    • Now suppose that the road is of infinite length, stretching from point 0 outward to infinity. If the distance of a crime from point 0 is exponentially distributed with rate \lambda, where should the police station now be located? That is, we want to minimize E[|X-P|], where X is now exponential with rate \lambda.

  2. The random variable X is said to be a discrete uniform random variable on {1,2,…,n} if

    \displaystyle P(X=i) = \frac{1}{n}\quad i=1,2,\ldots,n

    For any nonnegative real number x, let Int(x) be the largest integer that is less than or equal to x. Show that if U is a uniform random variable on [0,1], then the variable X = Int(nU)+1 is a discrete uniform random variable on {1,2,…,n}.

  3. You can use this table for the CDF of an N(0,1) random variable.

    Suppose you have 20 independent random variables U_1, U_2, \ldots, U_{20} which are all uniformly distributed on [0,1]. Define:

    X = U_1 + U_2 + \cdots + U_{20}.

    Use the Central Limit Theorem to estimate the probability that X is NOT in the range [8,14]. Show your calculation!

  4. Let X be an N(0, \sigma^2) random variable. Use the table above to compute the probabilities of the events \{X \geq k \sigma \} and \{ (k-1)\sigma \leq |X| \leq k \sigma \} for k=1,2,3.

Extra Credit

We know that if X_1, X_2, \ldots, X_N are random variables then

\displaystyle E(X_1 + X_2 + \cdots + X_N) = E(X_1) + E(X_2) + \cdots + E(X_N).

But what happens if N is itself a random number?

  1. Give an example where N is a random variable and the linearity of expectation condition no longer holds.

  2. What if all the random variables X_1, X_2, \ldots, X_N are independent and identically distributed (i.i.d.), and also independent of N? Does linearity of expectation hold? Either prove it or give a counterexample.

  3. Suppose again that X_1, X_2, \ldots, X_N are i.i.d. But now N is a random variable that can depend on X_1, X_2, \ldots, X_N.

    Give an example which shows that linearity of expectation does not hold.

    Now assume that for every n=1,2,3,… whether the event \{N = n\} occurs can be determined by looking only at the values X_1, X_2, \ldots, X_n. Show that in this case, linearity of expectation once again holds.

Lecture 17: The central limit theorem

Posted by James on February 22, 2012

Here are the slides. Reading assignment: BT, Sections 5.2-5.5.

 