math: add graphs

math: add frequency data structures
math: add bias and types of data, expand descriptions
2020-09-23 15:41:35 -04:00 · 2020-09-22 22:19:46 -04:00 · 2020-09-21 21:10:22 -04:00
5 changed files with 140 additions and 144 deletions
@@ -1,2 +0,0 @@
-
-## Introduction
@@ -133,7 +133,3 @@ The course code for this page is **ENG3UZ**.
 	- e.g., *"I don't want to! That candy is MINE, and no one is going to take it from ME! Mine, mine, MINE!"*
 - Theme: The "main idea" or underlying meaning of a literary work, which can be given directly or indirectly.
 	- e.g., *"Never forget that* you are royalty, *and that hundreds of thousands of souls have suffered and perished so you could become what you are. By their sacrifices, you have been given the comforts you take for granted. Always remember them, so that their sacrifices shall never be without meaning."* (*Eon Fable*, ScytheRider)
-
-## Resources
-
- - [Analysis of a Poem](/resources/g11/central-asserion-1.pdf)
@@ -15,7 +15,3 @@ If you would like to contribute by submitting fixes, requesting pages, and/or co
 ## Source

 The source for Eifueo is available [here](https://git.eggworld.tk/eggy/eifueo).
-
-## Acknowledgements
-
-Thanks to James Su, Lakshy Gupta, and Vincent Guo for providing supplementary data for accuracy and conciseness.
@@ -5,16 +5,150 @@ The course code for this page is **MHF4U7**.
 ## 4 - Statistics and probability

 !!! note "Definition"
-    - **Descriptive statistics:** The use of methods to organise, display, and describe data by using various charts and summary methods to reduce data to a manageable size.
+    - **Statistics:** The techniques and procedures to analyse, interpret, display, and make decisions based on data.
+    - **Descriptive statistics:** The use of methods to work with and describe the **entire** data set.
    - **Inferential statistics:** The use of samples to make judgements about a population.
    - **Data set:** A collection of data with elements and observations, typically in the form of a table. It is similar to a map or dictionary in programming.
    - **Element:** The name of an observation(s), similar to a key to a map/dictionary in programming.
    - **Observation:** The collected data linked to an element, similar to a value to a map/dictionary in programming.
+    - **Population**: A collection of all elements of interest within a data set.
+    - **Sample**: The selection of a few elements within a population to represent that population.
    - **Raw data:** Data collected prior to processing or ranking.

+### Sampling
+
+A good sample:
+
+ - represents the relevant features of the full population,
+ - is as large as reasonably possible so that it decently represents the full population,
+ - and is random.
+
+Common sampling methods include:
+
+ - **Simple**: Choosing a sample completely randomly.
+ - **Convenience**: Choosing a sample based on ease of access to the data.
+ - **Systematic**: Choosing a random starting point, then choosing the rest of the sample at a consistent interval in a list.
+ - **Quota**: Choosing a sample whose members have specific characteristics.
+ - **Stratified**: Choosing a sample so that the proportion of specific characteristics matches that of the population.
+
+??? example
+    - Simple: Using a random number generator to pick items from a list.
+    - Convenience: Asking the first 20 people met to answer a survey.
+    - Systematic: Rolling a die and getting a 6, so choosing the 6th element and every 10th element after that.
+    - Quota: Ensuring that all members of the sample wear red jackets.
+    - Stratified: The population is 45% male and 55% female, so the proportion of the sample is also 45% male and 55% female.
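
The simple and systematic methods above can be sketched in Python. This is a minimal illustration, not part of the course material: the function names, the fixed seed, and the population of 100 numbered elements are all assumptions made for the example.

```python
import random

def simple_sample(items, size, rng):
    """Simple sampling: pick `size` distinct items completely at random."""
    return rng.sample(items, size)

def systematic_sample(items, interval, rng):
    """Systematic sampling: random starting point, then every `interval`-th item."""
    start = rng.randrange(interval)  # random index in [0, interval)
    return items[start::interval]

rng = random.Random(2020)         # fixed seed (an assumption) so runs are repeatable
population = list(range(1, 101))  # hypothetical list of 100 element IDs

simple = simple_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
print(simple)
print(systematic)
```

Both calls return 10 elements here; the systematic sample always has a constant gap of 10 between consecutive picks, while the simple sample has no such pattern.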
+
+### Types of data
+
+!!! note "Definition"
+    - **Quantitative variable**: A variable that is numerical and can be sorted.
+    - **Discrete variable**: A quantitative variable that is countable.
+    - **Continuous variable**: A quantitative variable that can contain an infinite number of values between any two values.
+    - **Qualitative variable**: A variable that is not numerical and cannot be sorted.
+    - **Bias**: An unfair influence in data during the collection process, causing the data to be not truly representative of the population.
+
 ### Frequency distribution

+A **frequency distribution** is a table that lists categories/ranges and the number of values in each category/range.

+A frequency distribution table includes:
+
+ - A number of classes, all of the same width.
+	- This number is arbitrarily chosen, but a commonly used formula is $\lceil\sqrt{\text{number of elements}}\rceil$.
+	- The width (size) of each class is $\lceil\frac{\text{max value} - \text{min value}}{\text{number of classes}}\rceil$.
+	- Each class includes its lower bound and excludes its upper bound ($\text{lower} ≤ x < \text{upper}$).
+	- The **relative frequency** of a class is the proportion of the whole data set that falls in that class, written in decimal form.
+ - The number of values that fall under each class.
+	- The largest value can either be included in the final class (changing its range to $\text{lower} ≤ x ≤ \text{highest}$), or put in a completely new class above the largest class.
+
+??? example
+    | Height $x$ (cm) | Frequency |
+    | --- | --- |
+    | $1≤x<5$ | 2 |
+    | $5≤x<9$ | 3 |
+    | $9≤x≤14$ | 1 |
+
+For a given class $i$, the midpoint of that class is as follows:
+$$x_{i} = \frac{\text{lower bound} + \text{upper bound}}{2}$$
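
The rules above (number of classes, class width, class bounds, relative frequency, and midpoint) can be combined into a short Python sketch. The function name `frequency_table`, the sample heights, and the choice to place the largest value in the final class are assumptions made for illustration; the `max(1, ...)` guard is also an addition so that the width is never zero.

```python
import math

def frequency_table(values):
    """Build equal-width classes and count the values in each class."""
    n_classes = math.ceil(math.sqrt(len(values)))
    lowest, highest = min(values), max(values)
    # Guard (an assumption): keep the width at least 1 if all values are equal.
    width = max(1, math.ceil((highest - lowest) / n_classes))

    table = []
    for i in range(n_classes):
        lower = lowest + i * width
        upper = lower + width
        is_last = i == n_classes - 1
        # lower <= x < upper, except the final class also includes its upper bound.
        count = sum(
            1 for v in values
            if lower <= v < upper or (is_last and v == upper)
        )
        table.append({
            "lower": lower,
            "upper": upper,
            "frequency": count,
            "relative": count / len(values),
            "midpoint": (lower + upper) / 2,
        })
    return table

heights = [1, 3, 5, 6, 8, 14]  # hypothetical heights in cm
table = frequency_table(heights)
for row in table:
    print(row)
```

With six values, the sketch uses $\lceil\sqrt{6}\rceil = 3$ classes of width $\lceil\frac{14 - 1}{3}\rceil = 5$.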
+
+### Quartiles
+
+A **percentile** is a value that indicates the percentage of a data set that is below it. To find the location of a given percentile, use $P_k = \frac{kn}{100}$, where $k$ denotes the percentile number and $n$ represents the sample size.
+
+A **decile** indicates that $d×10$% of data in the data set is below it, where $d$ is the decile number.
+
+!!! example
+    A score equal to or greater than 97% of all scores in a test is said to be in the *97th percentile*, or in the *9th decile*.
+
+Quartiles split a data set into four equal sections.
+
+ - The **minimum** is the lowest value of a data set.
+ - The **first quartile** ($Q_1$) is at the 25th percentile.
+ - The **median** is at the 50th percentile.
+ - The **third quartile** ($Q_3$) is at the 75th percentile.
+ - The **maximum** is the highest value of a data set.
+
+The first and third quartiles are the medians of the data in **[minimum, median)** and **(median, maximum]** respectively.
+
+!!! warning
+    When the median is equal to a data point in a set, it *cannot* be used to find $Q_1$ or $Q_3$. Only use the data below or above the median.
+
+!!! warning
+    When working with grouped data given in ranges, the actual data is unavailable. The five numbers above are instead:
+
+    - The minimum value is now the lower class boundary of the lowest class.
+    - The first quartile is the midpoint of the class it resides in.
+    - The median is the midpoint of the class it resides in.
+    - The third quartile is the midpoint of the class it resides in.
+    - The maximum value is now the upper class boundary of the highest class. If the highest value is excluded (e.g., $90≤x<100$), it also must be excluded when representing data (e.g., open dot instead of filled dot).
+
+The **interquartile range (IQR)** is equal to $Q_3 - Q_1$ and represents the range in which the middle 50% of the data lies.
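
For ungrouped data, the quartile rules in this section can be sketched as follows. The function name and sample data are hypothetical, and the sketch assumes at least two data points. It splits the sorted data by position, leaving the middle value out of both halves when the number of values is odd.

```python
from statistics import median

def five_number_summary(values):
    """Return the minimum, Q1, median, Q3, and maximum of ungrouped data."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # With an odd count the median is a data point, so it is skipped here.
    upper_half = data[half + 1:] if n % 2 else data[half:]
    return {
        "min": data[0],
        "q1": median(lower_half),
        "median": median(data),
        "q3": median(upper_half),
        "max": data[-1],
    }

summary = five_number_summary([7, 1, 9, 3, 12, 4, 6])  # hypothetical data
iqr = summary["q3"] - summary["q1"]
print(summary, iqr)
```

Here the sorted data is 1, 3, 4, 6, 7, 9, 12, so the median is 6, $Q_1$ is the median of 1, 3, 4, and $Q_3$ is the median of 7, 9, 12.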
+
+### Outliers
+
+Outliers are data values that significantly differ from the rest of the data set. They may be caused by:
+
+ - a random natural occurrence, or
+ - abnormal circumstances.
+
+Outliers can be ignored once identified.
+
+There are various methods to identify outliers. For **single-variable** data sets, the **lower and upper fences** may be used. Any data below the lower fence or above the upper fence can be considered outliers.
+
+ - The lower fence is equal to $Q_1 - 1.5×\text{IQR}$
+ - The upper fence is equal to $Q_3 + 1.5×\text{IQR}$
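
The fence check can be written as a small Python sketch. The function names and the data set are hypothetical, and the quartiles are passed in as arguments rather than computed, so any quartile method can be used with it.

```python
def fences(q1, q3):
    """Return the (lower, upper) fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Return the values lying below the lower fence or above the upper fence."""
    lower, upper = fences(q1, q3)
    return [v for v in values if v < lower or v > upper]

data = [1, 3, 4, 6, 7, 9, 12, 30]  # hypothetical single-variable data
# For this data, Q1 = 3.5 and Q3 = 10.5 (medians of the lower and upper halves).
lower_fence, upper_fence = fences(3.5, 10.5)
outliers = find_outliers(data, 3.5, 10.5)
print(lower_fence, upper_fence, outliers)
```

With an IQR of 7, the fences sit at $3.5 - 10.5$ and $10.5 + 10.5$, so only the value 30 falls outside them.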
+
+### Representing frequency
+
+A **stem and leaf plot** can list out all the data points while grouping them simultaneously.
+
+A **frequency histogram** can be used to represent frequency distribution, with the x-axis containing class boundaries, and the y-axis representing frequency. 
+
+<img src="/resources/images/frequency-discrete.png" width=700>(Source: Kognity)</img>
+
+!!! note
+    If data is discrete, a gap must be left between the bars. If data is continuous, there must *not* be a gap between the bars.
+
+A **cumulative frequency table** can be used to find the number of data values below a certain class boundary. It involves the addition of a **cumulative frequency** column which represents the sum of the frequency of the current class as well as every class before it. It is similar to a prefix sum array in computer science.
+
+??? example
+    | Height $h$ (cm) | Frequency | Cumulative frequency |
+    | --- | --- | --- |
+    | $1≤h<10$ | 2 | 2 |
+    | $10≤h<19$ | 5 | 7 |
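
Because the cumulative frequency column is a prefix sum, it can be computed in one step. The sketch below uses the two classes from the example table; the variable names are illustrative only.

```python
from itertools import accumulate

classes = ["1≤h<10", "10≤h<19"]  # class labels from the example table
frequencies = [2, 5]

# Each entry is the frequency of its class plus every class before it.
cumulative = list(accumulate(frequencies))

for label, freq, total in zip(classes, frequencies, cumulative):
    print(label, freq, total)
```

The last cumulative value always equals the total number of data values, which is a quick way to check the table.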
+
+A **cumulative frequency curve** consists of an independent variable on the x-axis, and the cumulative frequency on the y-axis. In grouped data, the values on the x-axis correspond to the upper bound of a given class. This graph is useful for interpolation (e.g., the value of a given percentile).
+
+<img src="/resources/images/cumulative-frequency-curve.png" width=700>(Source: Kognity)</img> 
+
+A **box-and-whisker plot** is a visual representation of the **"5-number summary"** of a data set. These five numbers are the minimum and maximum values, the median, and the first and third quartiles.
+
+<img src="/resources/images/box-and-whisker.png" width=700>(Source: Kognity)</img>
+
+!!! warning
+    In the image above, the maximum and minimum dots are filled. If these values were to be excluded (e.g., the upper class boundary in grouped data is excluded), they should be unfilled instead.
+
+### Measures of central tendency

 ## Resources

@@ -18,10 +18,6 @@ Every other SI unit is derived from the fundamental SI units. Memorise these!
 | Amount of substance | Mole | mol |
 | Luminous intensity | Candela | cd |

-!!! info "Reminder"
-    Note that on an assesment, you are expected to derive the SI unit expression given the equation of unit.
-    For example Force = mass x acceleration = mass x distance / time / time = $kg \times m \times s^{-2}$. 
-
 ### Metric prefixes

 Every SI unit can be expanded with metric prefixes.
@@ -40,9 +36,6 @@ Every SI unit can be expanded with metric prefixes.
 | peta- | P | $10^{15}$ | f | femto- |
 | exa- | E | $10^{18}$ | a | atto- |

-!!! note
-    For easier memorization, notice that most of these prefixes are in multiples of $3$. 
-
 ### Significant figures

 - The leftmost non-zero digit is the **most significant digit**.
@@ -119,7 +112,7 @@ Uncertainties are stated in the form of [value] ± [uncertainty]. A value is onl
 To determine a measurement's absolute uncertainty, if:

 - the instrument states its uncertainty, use that.
- - an analog instrument is used, the uncertainty is half of the smallest scale.
+ - an analog instrument is used, the last digit is estimated and appended to the end of the reported value. The estimated digit is uncertain by 5 at its order of magnitude.
 - a digital instrument is used, the last reported digit is uncertain by 1 at its order of magnitude.

 !!! example
@@ -140,8 +133,8 @@ Error bars represent the uncertainty of the data, typically representing that da
 ### Uncertainty of gradient and intercepts

 !!! note "Definition"
-    - The **line of best fit** is the line that passes through **as many error bars as possible** while passing as closely as possible to all data points.
-    - The **minimum and maximum lines** are lines that minimise/maximise their slopes while passing through the first and last **error bars**.
+    - The **line of best fit** is the line that passes through **all error bars** while passing as closely as possible to all data points.
+    - The **minimum and maximum lines** are lines that minimise/maximise their slopes while still passing through **all error bars.**

 !!! warning
    - Use solid lines for lines representing **continuous data** and dotted lines for **discrete data**.
@@ -149,134 +142,13 @@ Error bars represent the uncertainty of the data, typically representing that da
 <img src="/resources/images/error-slopes.png" width=700>(Source: Kognity)</img>

 The uncertainty of the **slope** of the line of best fit is the difference between the maximum and minimum slopes.
-$$m_{\text{best fit}} ± \frac{m_{\max}-m_{\min}}{2}$$
+$$m_{\text{best fit}} ± (m_{\max} - m_{\min})$$
 The uncertainty of the **intercepts** is the difference between the intercepts of the maximum and minimum lines.
-$$\text{intercept}_{\text{best fit}} ± \frac{\text{intercept}_{\max} - \text{intercept} _{\min}}{2}$$
+$$\text{intercept}_{\text{best fit}} ± (\text{intercept}_{\max} - \text{intercept}_{\min})$$


 ## 1.3 - Vectors and scalars

-!!! note "Definition"
-    - **Scalar:** A physical quantity with a numerical value (magnitude) and a unit.
-    - **Vector:** A physical quantity with a numerical value (magnitude), a unit, and a **direction.**
-        - The **minimum magnitude** for any vector must be $`\ge 0`$
-
-??? example
-    - Scalar quantities include speed, distance, mass, temperature, pressure, time, frequency, current, voltage, and more.
-    - Vector quantities include velocity, displacement, acceleration, force (e.g., weight), momentum, impulse, and more.
-
-Vectors are drawn as arrows whose length represents their scale/magnitude and their orientation refer to their direction. A variable representing a vector is written with a right-pointing arrow above it.
-
- - The **standard form** of a vector is expressed as its magnitude followed by its unit followed by its direction in square brackets.
-$$\vec{a} = 1\text{ m }[N 45° E]$$
- - The **component form** of a vector is expressed as the location of its head on a cartesian plane if its tail were at $(0, 0)$.
-$$\vec{a} = (1, 1)$$
- - The **magnitude** of a vector can be expressed as the absolute value of a vector.
-$$|\vec{a}| = 1 \text{ m}$$
-
-
-### Adding/subtracting vectors diagrammatically
-
-1. Draw the first vector.
-2. Draw the second vector with its *tail* at the *head* of the first vector.
-3. Repeat step 2 as necessary for as many vectors as you want by attaching them to the *head* of the last vector.
-4. Draw a new (**resultant**) vector from the *tail* of the first vector to the *head* of the last vector.
-
-<img src="/resources/images/vector-add-direction.png" width=700>(Source: Kognity)</img>
-
-When subtracting a vector, **negate** the vector being subtracted by giving it an opposite direction and then add the vectors.
-
-<img src="/resources/images/vector-subtract-direction.png" width=700>(Source: Kognity)</img>
-
-!!! note
-    Notice that when we are subtracting vectors ($\vec{a} - \vec{b}$), we have **tail** to **tail** and the difference vector has a direction from ($\vec{b} to \vec{a}$)
-
-### Adding/subtracting vectors algebraically
-
-Vectors can be broken up into two vectors (**"components"**) laying on the x- and y-axes via trigonometry such that the resultant of the two components is the original vector. This is especially helpful when adding larger (3+) numbers of vectors.
-$$\vec{F}_x + \vec{F}_y = \vec{F}$$
-
-!!! info "Reminder"
-    The **component form** of a vector is expressed as $(|\vec{a}_x|, |\vec{a}_y|)$
-
-<img src="/resources/images/vector-simple-adding.png" width=700>(Source: Kognity)</img>
-
-By using the primary trignometric identities:
-$$
-|\vec{a}_{x}| = |\vec{a}|\cos\theta_{a} \\
-|\vec{a}_{y}| = |\vec{a}|\sin\theta_{a}
-$$
-
-<img src="/resources/images/vector-decomposition.png" width=700>(Source: Kognity)</img>
-
-Using their component forms, to:
-
- - add two vectors, add their x- and y-coordinates together.
- - subtract two vectors, subtract their x- and y-coordinates together.
-
-$$
-(a_{x}, a_{y}) + (b_{x}, b_{y}) = (a_{x} + b_{x}, a_{y} + b_{y}) \\
-(a_{x}, a_{y}) - (b_{x}, b_{y}) = (a_{x} - b_{x}, a_{y} - b_{y})
-$$
-
-The length of resultant vector can then be found using the Pythagorean theorem.
-
-$$
-|\vec{c}|=\sqrt{c_{x}^2 + c_{y}^2}
-$$
-
-To find the resultant direction, use inverse tan to calculate the angle of the vector using the lengths of its components.
-
-$$
-\theta_{c} = \tan^{-1}(\frac{c_y}{c_x})
-$$
-
-### Multiplying vectors and scalars
-
-The product of a vector multiplied by a scalar is a vector with a magnitude of the vector multiplied by the scalar with the same direction as the original vector.
-
-$$\vec{v} × s = (|\vec{v}|×s)[\theta_{v}]$$
-
-!!! example
-    $$3 \text{ m} · 47 \text{ ms}^{-1}[N20°E] = 141 \text{ ms}^{-1}[N20°E]$$
-
-## 2.1 - Motion
-
-### Models
-
-A **scientific model** is a simplification of a system based on assumptions used to explain or make predictions for that system.
-
-!!! note "Definition"
-    - **System**: An object or a connected group of objects.
-    - **Point particle assumption**: An assumption that models a system as a blob of matter. It is more reliable if the size and shape of the object(s) do not matter much.
-    - **Uniform motion**: The type of motion in which the speed of an object is constant.
-
-### Displaying motion
-
-Motion can be expressed visually using a **motion diagram** or a **position-time graph**.
-
-// TODO: insert motion diagram here because kognity bad
-
-A **position-time graph** expands on the motion diagram by specifying a precise **position** value on the vertical axis in addition to time on the horizontal axis. The line of best fit indicates the object's speed, as well as if it is accelerating or decelerating.
-
-<img src="/resources/images/position-time-graph.png" width=700>(Source: Kognity)</img>
-
-When the slope is:
-
- - linear, the object is moving at a constant speed.
- - exponential, the object is accelerating.
- - logarithmic, the object is decelerating.
-
-## 2.2 - Forces
-
-## 2.3 - Work, energy, and power
-
-## 2.4 - Momentum and impulse
-
-## 3.1 - Thermal concepts
-
-## 3.2 - Modelling a gas
-
 ## Resources

 - [IB Physics Data Booklet](/resources/g11/ib-physics-data-booklet.pdf)
Author	SHA1	Message	Date
eggy	7a168dac62	math: add graphs	2020-09-23 15:41:35 -04:00
eggy	de940be05e	math: add frequency data structures	2020-09-22 22:19:46 -04:00
eggy	d060ad4322	math: add bias and types of data, expand descriptions	2020-09-21 21:10:22 -04:00