diff --git a/docs/source/overview.md b/docs/source/overview.md
index 9db4fc4..4e1fc1a 100644
--- a/docs/source/overview.md
+++ b/docs/source/overview.md
@@ -46,7 +46,7 @@ architecture search to grow an ensemble of subnetworks:
 
 Each AdaNet **iteration** has the given lifecycle:
 
-![AdaNet iteration lifecucle](./assets/lifecycle.svg "The lifecycle of an AdaNet iteration.")
+![AdaNet iteration lifecycle](./assets/lifecycle.svg "The lifecycle of an AdaNet iteration.")
 
 Each of these concepts has an associated Python object:
 
diff --git a/docs/source/theory.md b/docs/source/theory.md
index 393ee60..1969927 100644
--- a/docs/source/theory.md
+++ b/docs/source/theory.md
@@ -43,7 +43,7 @@ rigorous manner.
     learner's weight is inversely proportional to the Rademacher complexity of its
     function class, and all the weights in the logits layer sum to 1.
     Additionally, at training time, we don't have to discourage the trainer from
-    learning complex models -- it is only when we consider the how much the
+    learning complex models -- it is only when we consider how much the
     model should contribute to the ensemble do we take the complexity of the
     model into account.
 * **Complexity is not just about the weights.** The Rademacher complexity of a
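
For context, the theory.md passage touched by the second hunk paraphrases the complexity-penalized objective from the AdaNet paper (Cortes et al., 2017). Below is a minimal sketch of that objective, assuming the paper's formulation; the symbol names are ours, not from the docs being edited. Here w_k is the mixture weight of subnetwork h_k, r_k is the empirical Rademacher complexity of its hypothesis class, Phi is a surrogate loss, and lambda, beta >= 0 are hyperparameters:

% Sketch of the AdaNet objective (Cortes et al., 2017), assuming the
% paper's formulation; symbol names are illustrative.
\[
F(\mathbf{w}) \;=\; \frac{1}{m}\sum_{i=1}^{m}
  \Phi\!\Bigl(1 - y_i \sum_{k=1}^{N} w_k\, h_k(x_i)\Bigr)
  \;+\; \sum_{k=1}^{N} \bigl(\lambda\, r_k + \beta\bigr)\, \lvert w_k \rvert
\]

Complexity enters only through the per-subnetwork penalty (lambda r_k + beta)|w_k| on the mixture weights, which is the point the corrected sentence makes: each subnetwork can be trained without a complexity penalty, and r_k matters only when deciding how much that subnetwork contributes to the ensemble.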