In artificial intelligence and machine learning, everything starts and ends with data. In today's world, everyday use of platforms like Facebook and Instagram produces enormous amounts of data that is available for analysis, but this is raw data.
When it comes to machine learning, not all data can be fed to an algorithm. For example, suppose we have data about an employee such as height, weight, temperature, blood pressure, heart rate, age, sleep hours, weather, and office in-time. Each of these attributes is called a feature.
Features are a crucial factor for any algorithm's training. A dataset may have hundreds of features, but not all of them can be used to train the algorithm, as doing so may increase the computation cost. Another factor is that not all features have an impact on the output.
In the example above, if we want to predict whether a user is a diabetic patient based on this information, some features such as weather and office in-time are quite irrelevant: they have minimal or no impact on the prediction. Before training, irrelevant or low-impact features need to be identified and removed from the training dataset. This is called feature selection.
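As a minimal sketch of this idea, the snippet below builds a toy version of the employee dataset described above (all column names and values are illustrative assumptions) and manually drops the features judged irrelevant:

```python
import pandas as pd

# Hypothetical employee-health dataset (toy values for illustration only)
df = pd.DataFrame({
    "height": [170, 160, 180],
    "weight": [70, 85, 60],
    "bp": [120, 140, 110],
    "age": [30, 45, 25],
    "weather": ["sunny", "rainy", "cloudy"],
    "office_in_time": ["09:00", "09:30", "08:45"],
})

# Manually remove features judged irrelevant to diabetes prediction
selected = df.drop(columns=["weather", "office_in_time"])
print(list(selected.columns))  # → ['height', 'weight', 'bp', 'age']
```

The feature-selection methods discussed below automate this judgment instead of relying on manual inspection.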
There are multiple methods for feature selection:
1. Univariate Selection
2. Recursive Feature Elimination
3. Principal Component Analysis
Univariate Selection
Statistical tests can be used to select the features that have the strongest relationship with the output variable. This helps us eliminate features with insignificant or weak correlation with the output, removing rarely useful or irrelevant features from the dataset. The Python library scikit-learn provides these tests in its feature_selection module (for example, SelectKBest), which can be used to remove unused or noisy features from a dataset.
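A short sketch of univariate selection with scikit-learn, using the chi-squared test on the well-known Iris dataset (chosen here only as a convenient built-in example):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Score each feature independently against the target with the chi2 test
# and keep only the 2 highest-scoring features.
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

print(X.shape, X_new.shape)      # → (150, 4) (150, 2)
print(selector.get_support())    # boolean mask of the kept features
```

Note that chi2 requires non-negative feature values; for real-valued data, score functions such as f_classif can be used instead.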
Recursive Feature Elimination
RFE is a wrapper method that uses another model to evaluate and eliminate features. It starts with all the features and removes the least significant one at each step until the desired number of features remains.
The model is fit on the provided dataset, the weakest feature is removed first, and elimination continues until the desired feature count is reached.
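The loop described above can be sketched with scikit-learn's RFE class. The synthetic dataset and the choice of logistic regression as the wrapped estimator are assumptions made just for this demo:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic dataset: 10 features, of which only 3 are informative (by construction)
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=42)

# RFE fits the estimator, drops the weakest feature (smallest coefficient),
# and repeats until only n_features_to_select features remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3)
rfe.fit(X, y)

print(rfe.support_)   # True for the 3 features RFE kept
print(rfe.ranking_)   # 1 = selected; larger numbers were eliminated earlier
```

Any estimator that exposes feature importances or coefficients (e.g. a tree-based model) can be plugged in as the wrapped model.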
Please refer to this link for more details and a practical implementation:
https://machinelearningmastery.com/rfe-feature-selection-in-python/
Principal Component Analysis
Principal Component Analysis is a technique in which the multiple factors that influence the output are combined to generate new factors, which in turn are called principal components. This is done in such a way that even if we consider only the first few components, we still get the same or a very similar output. PCA changes the perspective from which we visualize the data by combining or reducing the components (features), while the major patterns and trends in the data are preserved.
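A brief sketch of this with scikit-learn, again using the Iris dataset purely as a convenient example: the four original features are combined into two principal components, and the explained-variance ratio shows how much of the data's variation the first few components retain.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Combine the 4 original features into 2 principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # → (150, 2)
print(pca.explained_variance_ratio_)   # fraction of variance each component keeps
```

In practice, features are usually standardized before applying PCA so that components are not dominated by features with large scales.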
Please refer to this page for more details:
https://builtin.com/data-science/step-step-explanation-principal-component-analysis