U.S. Department of Energy

Pacific Northwest National Laboratory

Extract Features

Feature extraction maps processed data to a form that classification methods, statistical models, or other machine learning algorithms can consume. The goal is to capture the salient information about the phenomenon of interest while leaving out superfluous information. The result is typically a multidimensional feature vector for each observational unit, whose elements may be any combination of quantitative and qualitative values. For example, if the observational unit is a person, the feature vector might include age, height, weight, gender, and hair color. The feature vectors are often arranged in a table, with features as columns and observations as rows.
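
As a minimal sketch of the tabular layout described above, the following Python snippet turns a few records into one feature vector per person. The records, the field names (`height_cm`, `weight_kg`, `hair_color`), and the helper `build_feature_table` are hypothetical, and one-hot encoding is only one of several possible ways to represent a qualitative value numerically.

```python
# Hypothetical raw records; field names and values are illustrative only.
people = [
    {"age": 34, "height_cm": 170.0, "weight_kg": 65.0, "hair_color": "brown"},
    {"age": 51, "height_cm": 182.0, "weight_kg": 90.0, "hair_color": "black"},
    {"age": 27, "height_cm": 160.0, "weight_kg": 52.0, "hair_color": "brown"},
]

QUANTITATIVE = ["age", "height_cm", "weight_kg"]
QUALITATIVE = ["hair_color"]


def build_feature_table(records):
    """Return (columns, rows): one row per observation, one column per feature."""
    # Collect the sorted set of levels seen for each qualitative feature.
    levels = {name: sorted({r[name] for r in records}) for name in QUALITATIVE}

    columns = list(QUANTITATIVE)
    for name in QUALITATIVE:
        columns += [f"{name}={level}" for level in levels[name]]

    rows = []
    for r in records:
        row = [float(r[name]) for name in QUANTITATIVE]
        for name in QUALITATIVE:
            # One-hot encode: 1.0 for the observed level, 0.0 for the others.
            row += [1.0 if r[name] == level else 0.0 for level in levels[name]]
        rows.append(row)
    return columns, rows


columns, rows = build_feature_table(people)
for row in rows:
    print(dict(zip(columns, row)))
```

Each printed dictionary corresponds to one row of the table, and the keys correspond to its columns.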

Feature extraction may be accomplished using one or more of the following common techniques:

  • Transformation of one or more variables.  Examples include:
    • Functions (such as products or ratios) of one or several variables
    • Logarithmic transformations
    • Fourier transforms
  • Mapping unstructured data to features.  Examples include:
    • Natural Language Processing to extract numerical features from text
    • Extracting numerical features from imagery
    • Extracting numerical features from audio or other signals
  • Dimensionality reduction methods.  Examples include:
    • Principal component analysis
    • Singular value decomposition
    • Manifold embedding
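
Several of the techniques listed above can be sketched in a few lines of Python with NumPy. The example below is illustrative only: the data are synthetic, the sampling rate and variable names are assumptions, and principal component analysis is computed here through a singular value decomposition of the centered data, which is one common way to do it rather than the only one.

```python
import numpy as np

rng = np.random.default_rng(0)  # synthetic data, for illustration only

# 1. Transformations of one or more variables.
weight_kg = np.array([65.0, 90.0, 52.0])
height_m = np.array([1.70, 1.82, 1.60])
ratio_feature = weight_kg / height_m**2   # a function of two variables
log_feature = np.log(weight_kg)           # a logarithmic transformation

# 2. Fourier transform: use the dominant frequency of a signal as a feature.
fs = 100.0                                # assumed sampling rate in Hz
t = np.arange(200) / fs                   # 2 seconds of samples
signal = np.sin(2.0 * np.pi * 5.0 * t)    # synthetic 5 Hz sine wave
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
dominant_freq = freqs[np.argmax(spectrum)]

# 3. Dimensionality reduction: PCA via SVD of the column-centered data.
X = rng.normal(size=(50, 4))              # 50 observations, 4 features
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2                                     # number of components to keep
scores = Xc @ Vt[:k].T                    # reduced features, shape (50, k)
explained = S**2 / np.sum(S**2)           # fraction of variance per component

print(ratio_feature, log_feature, dominant_freq, scores.shape, explained[:k])
```

The reduced matrix `scores` keeps observations on the rows, so it can be used in place of the original feature table.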

For time-series data, smoothing techniques such as splines and local regression can be useful for extracting features.
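
The snippet below is a rough sketch of local regression smoothing for a time series, followed by two example features read off the smoothed curve. The function name `local_linear_smooth`, the tricube weighting, the `frac` parameter, and the synthetic series are all assumptions made for illustration; an established smoothing library may be preferable in practice.

```python
import numpy as np


def local_linear_smooth(x, y, frac=0.2):
    """Smooth y(x) with locally weighted linear fits (a LOESS-style sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    k = min(n, max(3, int(np.ceil(frac * n))))  # points used in each local fit
    smoothed = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]              # the k nearest neighbors
        h = dist[idx].max()                     # local bandwidth
        w = (1.0 - (dist[idx] / h) ** 3) ** 3   # tricube weights
        # Weighted least squares for the intercept and slope around x[i].
        A = np.column_stack([np.ones(k), x[idx] - x[i]])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y[idx] * sw, rcond=None)
        smoothed[i] = coef[0]                   # fitted value at x[i]
    return smoothed


# Synthetic noisy series, for illustration only.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 100)
truth = np.sin(x)
y = truth + rng.normal(scale=0.3, size=x.size)

smoothed = local_linear_smooth(x, y, frac=0.2)

# Example features taken from the smoothed curve rather than the raw series.
peak_time = x[np.argmax(smoothed)]
max_slope = np.max(np.gradient(smoothed, x))
print(peak_time, max_slope)
```

Features such as the peak location or the steepest slope tend to be more stable when computed on the smoothed curve, because the raw series carries point-to-point noise.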
