Aug 2, 2018

Feature selection

Questions:

a.   How to automatically identify all the features needed to classify the data?

b.   How to automatically identify the features given limited sources of data?

c.   How much data is needed (quantify it) to be able to generate N different classes?

d.   What confidence measures are assigned to each class identified by the feature-selection robots?

e.   What attributes may affect the confidence measures for the classes?   (e.g., how dynamic or constant the data is, stagnancy, time dependence or future dependence, probabilities that change dynamically, the importance and relative importance of the data, the ease of collecting the data, the availability of the data distributed by time or area)

f.   Data efficiency of feature identification:   how much data is needed to determine the classes?   How much is needed for training?   If the data is highly time-varying, then more training is needed.

g.   Game-theoretic mechanism for suggesting new features vs. destroying features (does destroying features decrease energy consumption?), etc.   Is it possible to have a GAN framework for this?

h.   Subsampling, and sub-subsampling, in many different ways, to identify the features that cut across different slices of representation.   Randomize the grouping mechanism.

i.   Stochastic back-and-forth search, including undoing search steps.

j.   Introducing new data for fake features, and contrasting them with the existing features.   It may also be possible to separate two different groups of data by emphasizing the importance of the features.
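
A rough sketch of how items h and j could be combined, under several assumptions that are not in the notes above: the data is a list of numeric rows with 0/1 labels, a feature's usefulness is scored by its absolute Pearson correlation with the label, and a "fake" feature is just a shuffled copy of a real one. The function names (selection_frequency, make_demo) and all parameters are hypothetical. On each random subsample, a real feature gets a hit only if it scores above the best fake feature; features that keep getting hits across many subsamples are the ones that cut across the slices.

```python
import random


def abs_corr(xs, ys):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return abs(sxy) / (sxx * syy) ** 0.5


def selection_frequency(rows, labels, n_rounds=200, frac=0.5, seed=0):
    """Fraction of random subsamples in which each real feature
    scores above the best shuffled (fake) feature."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    hits = [0] * d
    for _ in range(n_rounds):
        # Item h: draw a fresh random subsample of the rows.
        idx = rng.sample(range(n), max(2, int(frac * n)))
        y = [labels[i] for i in idx]
        real, fake = [], []
        for j in range(d):
            col = [rows[i][j] for i in idx]
            shuffled = col[:]
            rng.shuffle(shuffled)  # Item j: fake feature, link to label broken
            real.append(abs_corr(col, y))
            fake.append(abs_corr(shuffled, y))
        threshold = max(fake)
        for j in range(d):
            if real[j] > threshold:
                hits[j] += 1
    return [h / n_rounds for h in hits]


def make_demo(n=200, seed=1):
    """Synthetic data: feature 0 follows the label, features 1 and 2 are noise."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n)]
    rows = [[y + rng.gauss(0, 0.5), rng.gauss(0, 1), rng.gauss(0, 1)]
            for y in labels]
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_demo()
    for j, f in enumerate(selection_frequency(rows, labels)):
        print(f"feature {j}: selected in {f:.0%} of subsamples")
```

The selection frequency is only a crude stand-in for the confidence measure asked about in item d, and correlation with the label is just one possible score; a classifier-based importance could be swapped in without changing the subsample-and-contrast loop.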


Simple method:







Steps for feature selection:



 




