Learning notes on Data Mining: Part 1

What is Data Mining

Knowledge Discovery from Data.

Iterative step:

Tasks

Descriptive

Data characterization: summarize the main features of a target class.

Data discrimination: compare the main features of a target class with those of one or more contrasting classes.

Frequent pattern discovery: frequent itemsets

Ordered: sequential patterns

Structured: structured patterns
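Below is a minimal brute-force sketch of frequent itemset discovery. It assumes transactions are Python sets and that the support threshold is a fraction. The function name `frequent_itemsets` and the toy basket data are hypothetical. The sketch enumerates every candidate itemset and borrows only the Apriori property to stop early, so it is meant for tiny examples. Practical miners use Apriori or FP-growth instead.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Brute force over all candidate itemsets; only suitable for tiny inputs.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, k):
            cset = frozenset(candidate)
            # Support = fraction of transactions containing every item of the candidate.
            count = sum(1 for t in transactions if cset <= t)
            if count / n >= min_support:
                result[cset] = count / n
                found_any = True
        # Apriori property: if no k-itemset is frequent, no (k+1)-itemset can be.
        if not found_any:
            break
    return result

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
found = frequent_itemsets(transactions, 0.5)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a threshold of 0.5, all three single items and the pairs {bread, butter} and {bread, milk} are frequent. The pair {butter, milk} appears in only one of four transactions, so it is not.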

Predictive

Classification & Regression

Clustering: class labels are not known in advance.

Outlier analysis

What makes a pattern interesting

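$$ \text{support}(X \Rightarrow Y) := P(X \cup Y) $$

Here X ∪ Y is the union of the two itemsets, so support is the fraction of transactions that contain both X and Y (not either one).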

$$ \text{confidence}(X \Rightarrow Y) := P(Y | X) $$

Besides objective measures like these, there are also subjective ones.
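The two objective measures can be computed directly from a list of transactions. This is a small sketch that assumes each transaction is a Python set. The helper names and the basket data are hypothetical. Confidence is estimated as support(X ∪ Y) / support(X), which is the empirical form of P(Y | X).

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, x, y):
    """Estimate P(Y | X) as support(X ∪ Y) / support(X).

    Raises ZeroDivisionError if X never occurs.
    """
    return support(transactions, set(x) | set(y)) / support(transactions, x)

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(transactions, {"bread", "milk"}))       # 0.5
print(confidence(transactions, {"bread"}, {"milk"}))  # about 0.667 (2 of the 3 bread baskets)
```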

Challenges

Data attributes

Categorical

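Nominal: mode only

Binary

Symmetric: both states carry the same weight

Asymmetric: the two states are not equally important

Ordinal: mode and median

Numeric

Mean, mode and median

Interval-scaled: ratios are meaningless here (e.g., 20 °C is not twice as hot as 10 °C)

Ratio-scaled: unlike interval-scaled attributes, ratios are meaningful because there is a true zero point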

Data description

Central tendency measurement

Mean

Values of unequal importance: weighted mean

Sensitive to outliers: trimmed mean
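Here is a short sketch of both variants of the mean. The function names and the salary list (in thousands) are illustrative assumptions. The trimmed mean drops the same proportion of values from each end of the sorted data, so `proportion` must be below 0.5.

```python
def weighted_mean(values, weights):
    """sum(w_i * x_i) / sum(w_i): values with larger weights count more."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end (proportion < 0.5)."""
    data = sorted(values)
    k = int(len(data) * proportion)  # number of values cut from each end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]  # illustrative data
print(weighted_mean([1, 2, 3], [1, 1, 2]))  # 2.25
print(sum(salaries) / len(salaries))        # 58.0, pulled up by the outlier 110
print(trimmed_mean(salaries, 0.1))          # 55.6, after dropping 30 and 110
```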

Median: better suited to skewed (asymmetric) data.

For a large dataset that is only available as a frequency table of intervals, the median can be approximated by interpolation:

$$ \text{median} = L + \left( \frac{N/2 - \left(\sum freq\right)_l}{freq_{\text{median}}} \right) \times width $$

where L is the lower boundary of the median interval (the first interval whose cumulative frequency reaches N/2), N is the number of values in the dataset, sum(freq)_l is the sum of the frequencies of all intervals below the median interval, freq_median is the frequency of the median interval, and width is the width of the median interval.
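The interpolation formula can be sketched as follows. The sketch assumes the frequency table is a list of `(low, high, freq)` tuples for sorted, contiguous, non-overlapping intervals, and that values are spread evenly inside each interval. The function name and the table are hypothetical.

```python
def grouped_median(intervals):
    """Approximate the median from (low, high, freq) intervals.

    Assumes sorted, contiguous, non-overlapping intervals with values
    spread evenly inside each one.
    """
    n = sum(freq for _, _, freq in intervals)
    if n == 0:
        raise ValueError("frequency table is empty")
    below = 0  # sum(freq)_l: total frequency of the intervals below the current one
    for low, high, freq in intervals:
        # The median interval is the first one whose cumulative frequency reaches N/2.
        if below + freq >= n / 2:
            return low + (n / 2 - below) / freq * (high - low)
        below += freq

# Hypothetical table: 5 values in [0, 10), 10 in [10, 20), 5 in [20, 30).
print(grouped_median([(0, 10, 5), (10, 20, 10), (20, 30, 5)]))  # 15.0
```

In the example, N/2 = 10, the median interval is [10, 20), and 5 values lie below it. The result is 10 + (10 - 5) / 10 × 10 = 15.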

Mode: a dataset may have more than one mode (e.g., bimodal or multimodal data).
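Python's standard library returns every mode at once through `statistics.multimode` (Python 3.8+). The values below are made up. Because the mode needs only equality between values, the same call also works for nominal data.

```python
from statistics import multimode

print(multimode([1, 2, 2, 3, 3, 4]))      # [2, 3] -> bimodal
print(multimode(["red", "blue", "red"]))  # ['red'] -> unimodal, nominal values
```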

Data distribution

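Quantiles: points taken at regular intervals that split the sorted data into subsets of equal size.

The median is a special case: the 2-quantile. Quartiles (Q1, Q2 = median, Q3) are the 4-quantiles.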

Interquartile range (IQR): Q3 - Q1

Five-number summary: min, Q1, median, Q3, max (in that order), visualized as a boxplot
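Here is a sketch of the five-number summary and the IQR. It assumes quartiles are computed with `statistics.quantiles(..., method="inclusive")`. Textbooks and tools use slightly different interpolation rules, so Q1 and Q3 may differ a little from hand-computed values. The function name is hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using inclusive quartile interpolation."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)                 # (1, 3.0, 5.0, 7.0, 9)
iqr = summary[3] - summary[1]  # IQR = Q3 - Q1
print(iqr)                     # 4.0
```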

Variance (population):

$$ \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} x_i^2 - \bar{x}^2 $$

Standard deviation: the square root of the variance.
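Here is a direct sketch of the shortcut formula, the mean of the squares minus the square of the mean, which gives the population variance (division by N). The function name and data are illustrative. For very large values with a small spread, this formula can lose precision to cancellation, so library routines typically use a more stable method.

```python
import math

def variance(values):
    """Population variance via the shortcut formula: mean of squares minus squared mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) / n - mean ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]
var = variance(data)
std = math.sqrt(var)  # standard deviation = square root of the variance
print(var, std)       # 4.0 2.0
```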

Graph

Univariate

x: percentage

y: value

Bivariate

Bar chart: categorical

Histogram: Numerical
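The data behind the two charts differs. A bar chart counts each distinct category. A histogram counts numeric values falling into equal-width buckets. This sketch assumes equal-width bins over a closed range `[low, high]`, with the top edge folded into the last bucket. The function names and data are hypothetical.

```python
from collections import Counter

def bar_chart_counts(categories):
    """Bar chart data for a categorical attribute: one bar per distinct value."""
    return dict(Counter(categories))

def histogram_counts(values, low, high, bins):
    """Histogram data for a numeric attribute: counts in equal-width buckets over [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for x in values:
        if low <= x <= high:
            # A value equal to `high` is folded into the last bucket.
            i = min(int((x - low) / width), bins - 1)
            counts[i] += 1
    return counts

print(bar_chart_counts(["red", "blue", "red", "green"]))   # {'red': 2, 'blue': 1, 'green': 1}
print(histogram_counts([1, 2, 2, 5, 7, 9, 10], 0, 10, 5))  # [1, 2, 1, 1, 2]
```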

Visualization

Not suitable for high-dimensional data

Space-filling curves

Hilbert curve, Gray code, Z-curve
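The Z-curve is the simplest of these curves to compute. A cell's position along the curve comes from interleaving the bits of its x and y coordinates (Morton order). This sketch assumes non-negative integer coordinates that fit in `bits` bits. The function name is hypothetical. In a pixel-oriented layout, consecutive data values would then go into consecutive cells of this order.

```python
def z_order_index(x, y, bits=16):
    """Position of cell (x, y) along a Z-order (Morton) curve.

    Bit i of x goes to bit 2*i of the result; bit i of y goes to bit 2*i + 1.
    """
    index = 0
    for i in range(bits):
        index |= ((x >> i) & 1) << (2 * i)
        index |= ((y >> i) & 1) << (2 * i + 1)
    return index

# Visit the cells of a 4x4 grid in Z-curve order.
cells = sorted(((x, y) for x in range(4) for y in range(4)), key=lambda c: z_order_index(*c))
print(cells[:4])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```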

Detailed explanation

Circle segment technique

TODO

Scatter-plot matrix: only half of it is needed, because panel (a, b) shows the same points as panel (b, a) with the axes swapped.
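For n attributes, the distinct off-diagonal panels are exactly the n(n-1)/2 unordered pairs. This sketch lists them with `itertools.combinations`. The attribute names are hypothetical.

```python
from itertools import combinations

def scatter_matrix_panels(attributes):
    """Distinct off-diagonal panels of a scatter-plot matrix (one per unordered pair)."""
    return list(combinations(attributes, 2))

panels = scatter_matrix_panels(["age", "income", "height", "weight"])
print(len(panels))  # 6, versus 4 * 4 - 4 = 12 off-diagonal cells in the full matrix
```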

Parallel coordinates

TODO

Too weird

Chernoff faces

Stick figure



Except where otherwise noted, this site is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License