BUY-ORIGINAL ESSAYS ONLINE

Summary of Computational Learning Theory

Summary of Computational Learning Theory

Kui Tang December 2012

WRITE THIS ESSAY FOR ME

Tell us about your assignment and we will find the best writer for your paper.

Get Help Now!

Prepared from notes and homeworks from Professor Rocco Servedio’s Fall 2012 course in preparation for the final. No guarantees of completeness or correctness. Contents 1 Online Learning 2 1.1 Elementary Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 Perceptron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Winnow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.4 (Randomized) Weighted Majority . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.5 General Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2 PAC Learning 5 2.1 Probability Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.2 OLMB to PAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.3 Consistent Hypothesis Finder; Occam’s Razor . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.4 Hardness of 3-Term DNF Learning; Proper and Improper Learning . . . . . . . . . . . . . . . 7 2.5 VC Dimension Lower-Bounds PAC Sample Complexity . . . . . . . . . . . . . . . . . . . . . . 8 3 Bounds for Infinite Hypothesis Classes 8 3.1 Growth Functions and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.2 Sauer’s Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.3 Infinite CHF (Method of Double Sampling) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4 Boosting 10 4.1 Three-Stage Boosting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 4.2 Recursive Boosting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 4.3 AdaBoost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 5 Noisy PAC 12 5.1 Statistical Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 5.2 Example: Monotone Disjunctions Noisy PAC . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 5.3 Lower Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 5.3.1 Sample Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 5.3.2 Noise Rate and Adversarial Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 6 Representation-Independent (Cryptographic) Hardness 13 6.1 Pseudorandom Functions are Hard to Learn . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 6.2 Discrete Cube Roots are Hard to Learn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 1
Background image of page 1
Info iconThis preview has intentionally blurred sections. Sign up to view the full version.
View Full DocumentRight Arrow Icon
1 Online Learning 1.1 Elementary Algorithms To learn a one-decision list, use extended decision (multiple rules; if multiple antecedents are satisfied, arbitrarily choose one) lists are your hypotheses. Begin with all 4 n + 2 rules. Each time a rule fires and returns a wrong answer, such that that rule is responsible for the error, we push the rule down to the next level. A rule is pushed iff a mistake is made (show the converse side), so we have at most O ( n ) mistakes. Decision lists can be encoded as LTFs with exponentially decaying weights. Recall that a matching rule short-circuits the rest of the decision list evaluation. In an LTF with exponentially-decaying weights, the first matching rule overwhelm all subsequent coefficients [HW1.1b]. [HW1.2] Any OLMB algorithm can be made conservative: Suppose algorithm A has mistake bound M . Algorithm A0 predicts with its current hypothesis. If it turned out to be a mistake, A0 gives the current example to A and takes A0 s output hypothesis. The two algorithms make the same number of mistakes on the subsequence that causes mistakes. This is because A only sees this sequence of mistake examples. Therefore, A0 can make no more mistakes than A itself would have. Randomized halving algorithm: if we remove the oblivious adversary, a more efficient algorithm (in expectation) can be obtained: maintain the set of consistent-so-far hypothesis, and on each mistake, pick h uniformly at random from the new set of consistent hypotheses. Then for any fixed target c and training sequence ± x i ² (this is the oblivious adversary assumption), we have E [ M ] ≤ ln |C| + O (1) . Proof idea: order the concepts in CONSIST by the order in which the fixed training sequence will eliminate them

Introducing our Online Essay Writing Services Agency, where you can confidently place orders for a wide range of academic assignments. Our reputable homework writing company specializes in crafting essays, term papers, research papers, capstone projects, movie reviews, presentations, annotated bibliographies, reaction papers, research proposals, discussions, and various other assignments. Rest assured, our content is guaranteed to be 100% original, as every piece is meticulously written from scratch. Say goodbye to concerns about plagiarism and trust us to deliver authentic and high-quality work.

WRITE MY ESSAY NOW

PLACE YOUR ORDER