There are two key steps required to specify a signature problem. The first is to determine the phenomenon of interest and the second is to specify the purpose.
The phenomenon of interest is the observable fact or event targeted by the signature. Examples include human disease, network intrusion, the presence of pathogens, or the presence of explosives.
The purpose is the intent or goal of the signature: is it intended to detect, characterize, or predict? The purpose is further defined by the relationship in time between the signature target and the detection of the signature. The purpose is prognostic if the signature is used to detect or characterize targets in the future, diagnostic if it is used to detect or characterize targets in the present, and forensic if it is used to detect or characterize targets in the past.
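The two parts of the problem specification can be recorded as a small data structure. The sketch below is illustrative only: the names `Intent`, `Timing`, `timing_of`, and `SignatureProblem` are hypothetical, and treating "present" as exact equality of two time stamps is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    DETECT = "detect"
    CHARACTERIZE = "characterize"
    PREDICT = "predict"


class Timing(Enum):
    PROGNOSTIC = "prognostic"  # target lies in the future
    DIAGNOSTIC = "diagnostic"  # target lies in the present
    FORENSIC = "forensic"      # target lies in the past


def timing_of(target_time, detection_time):
    """Classify the purpose by when the target occurs relative to detection."""
    if target_time > detection_time:
        return Timing.PROGNOSTIC
    if target_time < detection_time:
        return Timing.FORENSIC
    return Timing.DIAGNOSTIC  # assumption: "present" means equal time stamps


@dataclass(frozen=True)
class SignatureProblem:
    phenomenon: str  # the phenomenon of interest
    intent: Intent
    timing: Timing


# Hypothetical example: an intrusion at t=0 examined at t=5 is a forensic problem.
problem = SignatureProblem(
    phenomenon="network intrusion",
    intent=Intent.DETECT,
    timing=timing_of(target_time=0.0, detection_time=5.0),
)
print(problem)
```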
When the problem has been specified, generate hypotheses about the types of observables that would be necessary and sufficient to detect or infer the presence of the phenomenon of interest. Methods might include brainstorming or structured elicitation. Building cross-disciplinary teams to participate in hypothesis generation and subsequent steps will facilitate a collaborative approach applicable across multiple domains.
Identify Potential Observables
From the hypotheses generated, identify all potential observables. Consider selecting multiple observables; useful signatures are often composed of features from multiple observables. This step also includes identifying the observational unit, which is the entity or instance that will be measured: for example, a person, a vehicle, a window of time, or a scanned item.
Select Observables for Measurement
Select the observables for measurement that will best differentiate the signature. If the resulting measurements are insufficient to detect or characterize the signature, this step may be revisited later in the process to add other observables.
Specify Measurement Principle
The measurement principle is the phenomenon serving as the basis of a measurement. It is used to identify the measurands (the quantities to be measured) that form the basis of the data and could be used in the signature discovery process. Specifying a measurement principle is often an iterative process.
Specify Measurement Procedure
The measurement procedure is a detailed description of a measurement according to 1) one or more measurement principles and 2) a given measurement method, based on a measurement model and including any calculation to obtain a measurement result. The measurement procedure comprises:
- Measurement method - generic description of a logical organization of operations used in a measurement
- Measurement model - provides the mathematical framework to interpret the output of the method
- Measurement result - the set of quantity values being attributed to a measurement together with any other available relevant information. This is generally expressed as a single measured quantity value and a measurement uncertainty.
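As a minimal sketch of a measurement result expressed as one value plus an uncertainty, the example below assumes a simple measurement model in which repeated readings of the same quantity are averaged, and it reports the standard error of the mean as the uncertainty. The class `MeasurementResult`, the function `summarize_readings`, and the readings themselves are hypothetical.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementResult:
    value: float        # single measured quantity value
    uncertainty: float  # standard uncertainty, in the same unit as value
    unit: str


def summarize_readings(readings, unit):
    """Assumed measurement model: average repeated readings of one quantity."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings to estimate uncertainty")
    mean = statistics.fmean(readings)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    standard_uncertainty = statistics.stdev(readings) / math.sqrt(n)
    return MeasurementResult(mean, standard_uncertainty, unit)


readings_g = [10.1, 9.9, 10.0, 10.2, 9.8]  # hypothetical mass readings, grams
result = summarize_readings(readings_g, "g")
print(f"{result.value:.2f} +/- {result.uncertainty:.2f} {result.unit}")
```

Other measurement models would combine readings differently; the point is only that the value and its uncertainty travel together.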
Measure or Collect Data
Once the measurement procedure is clearly defined, begin measuring and collecting data. Measurements may be quantitative or qualitative.
- Quantitative measurement - Quantity values that can be reasonably attributed to a quantity associated with an observational unit.
- Qualitative measurement - Qualitative or symbolic values that can be reasonably attributed to an observational unit.
Feature extraction involves mapping processed data to a form consumable by classification methods, statistical models, or other types of machine learning algorithms. The goal is to extract salient information about the phenomenon of interest from the processed data while leaving out superfluous information. The form is typically a multidimensional feature vector for each observational unit. Elements of the vector can be any combination of quantitative and qualitative values. For example, if the observational unit is a person, the feature vector might include age, height, weight, gender, and hair color. The resulting feature vectors are often represented in tabular form, with features in the columns and observations in the rows.
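The tabular form can be sketched with plain Python containers. The people and their values below are invented for illustration. The helper `one_hot` is a hypothetical function showing one common way (indicator columns) to make qualitative values consumable by a numerical classifier; the process described here does not prescribe it.

```python
FEATURES = ["age", "height_cm", "weight_kg", "gender", "hair_color"]

# One row per observational unit (a person), one column per feature.
people = {
    "person_1": [34, 170.0, 68.0, "F", "brown"],
    "person_2": [51, 182.0, 90.0, "M", "black"],
    "person_3": [27, 165.0, 59.0, "F", "blond"],
}


def one_hot(table, features, column):
    """Replace one qualitative column with 0/1 indicator columns."""
    idx = features.index(column)
    levels = sorted({row[idx] for row in table.values()})
    new_features = (
        features[:idx] + features[idx + 1:]
        + [f"{column}={level}" for level in levels]
    )
    new_table = {}
    for unit, row in table.items():
        indicators = [1 if row[idx] == level else 0 for level in levels]
        new_table[unit] = row[:idx] + row[idx + 1:] + indicators
    return new_table, new_features


encoded, encoded_features = one_hot(people, FEATURES, "hair_color")
encoded, encoded_features = one_hot(encoded, encoded_features, "gender")

print(encoded_features)
for unit, row in encoded.items():
    print(unit, row)
```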
Feature extraction may be accomplished using one or more of the following common techniques:
- Transformation of one or more variables. Examples include:
  - Functions (such as products, ratios, etc.) of one, or several, variables
  - Logarithmic transformations
  - Fourier transforms
- Mapping unstructured data to features. Examples include:
  - Natural Language Processing to extract numerical features from text
  - Extracting numerical features from imagery
  - Extracting numerical features from audio or other signals
- Dimensionality reduction methods. Examples include:
  - Principal component analysis
  - Singular value decomposition
  - Manifold embedding
For time-series data, smoothing techniques like splines and localized regression can be useful for extracting features.
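Three of the variable transformations listed above (a ratio of several variables, a logarithm, and a Fourier transform) can be sketched with the standard library alone. The function names and input values are hypothetical, and the direct O(n^2) discrete Fourier transform is chosen for readability, not speed; a numerical library would normally be used instead.

```python
import cmath
import math


def body_mass_index(weight_kg, height_m):
    """A ratio feature built from two measured variables."""
    return weight_kg / height_m ** 2


def log_feature(x):
    """Logarithmic transformation of a positive variable."""
    return math.log(x)


def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform for bins 0..n/2."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def dominant_frequency_bin(signal):
    """Index of the strongest non-constant frequency component."""
    mags = dft_magnitudes(signal)
    return max(range(1, len(mags)), key=lambda k: mags[k])  # skip bin 0 (mean)


# Hypothetical time series: a sine wave completing 3 cycles over 32 samples.
signal = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]

features = {
    "bmi": body_mass_index(68.0, 1.70),
    "log_weight": log_feature(68.0),
    "dominant_bin": dominant_frequency_bin(signal),
}
print(features)
```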
Evaluate whether the extracted features are suitable for identifying the phenomenon of interest. Do the observables yield feature vectors that provide complete and sufficient information to create a signature?
Signature construction is the creation of a classifier that maps the feature vector of each observational unit to a set of labels. The term classifier refers to any type of classification method, statistical model, or machine learning algorithm that maps features to labels. Labels can come from a discrete set (e.g., Threat, Not a Threat) or from a quantitative interval (e.g., the predicted mass of an object). Ideally, each prediction also provides a measure of uncertainty.
There are two steps in constructing the classifier:
- Train the classifier
  - Estimate classifier parameters by optimizing one or more objective functions.
- Test the classifier
  - Use the random subsets of data not used for training a particular instance of the classifier to test the fidelity of the classifier predictions.
  - The measure of classifier fidelity will likely apply principles from "Signature quality assessment" (see below).
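The train and test steps can be sketched with k-fold cross-validation, in which each random subset is held out once while a fresh instance of the classifier is trained on the rest. The nearest-centroid classifier, the accuracy measure, and the synthetic two-feature data are assumptions chosen to keep the example small; any classifier and fidelity measure could take their place.

```python
import random


def train_nearest_centroid(rows, labels):
    """Estimate one mean vector per label. The class means minimize the
    total squared distance from training rows to their class centroid."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}


def predict(centroids, row):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_validate(rows, labels, k=5, seed=0):
    """Return the held-out accuracy for each of k random folds."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        model = train_nearest_centroid([rows[i] for i in train_idx],
                                       [labels[i] for i in train_idx])
        correct = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


# Synthetic, well-separated feature vectors for two labels.
rows = ([[0.1 * i, 0.1 * (i % 3)] for i in range(20)]
        + [[10 + 0.1 * i, 10 + 0.1 * (i % 3)] for i in range(20)])
labels = ["Not a Threat"] * 20 + ["Threat"] * 20

accuracies = cross_validate(rows, labels)
print(accuracies)
```

Because the two synthetic groups are far apart, the held-out accuracy here says nothing about real data; it only shows where the training and testing subsets enter the procedure.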
Signature detection is the application of the signature system (resulting from the previously defined steps) to actual problem datasets: the series of transformations of events to labels and probabilities. This process often reveals challenges within the dataset that require additional consideration.
Signature Quality Assessment
Assessing the quality of a signature is key to determining whether the signature is accomplishing its goal or whether the classifier needs to be revised to better identify, characterize, or detect the signature.
A formal assessment of signature quality involves constructing a multi-attribute utility function to compare two or more signature systems. The utility function accounts for all the measurable criteria, or attributes, that will be used to evaluate the systems. The multi-attribute utility function is the weighted average of the single-attribute utility functions.
A variety of methods can be used to construct a multi-attribute utility function. A traditional approach is to:
- Identify and measure attributes of the quality of the signature system. Example attributes include:
  - Fidelity: How well does the classifier predict, detect, or characterize the phenomenon of interest?
  - Cost: How much does it cost to make a prediction for a single observational unit?
  - Risk: What are the likelihood and consequences of misclassification?
  - Other Attributes: Are there other measurable attributes or criteria that would be used to distinguish one signature system from another, especially in an operational environment? Examples include weight of the sensor, battery life, risk to personnel when using the system, and the amount of a forensic sample consumed during measurement.
- Construct single-attribute utility functions that map each attribute to the [0, 1] interval, such that 1 is highest utility and 0 is lowest utility. The curvature of the single attribute utility function is chosen to reflect the relative value of the attribute as it moves through its observable range.
- Identify relative weights for each attribute that reflect the tradeoffs among these attributes. For example, the weights must reflect our preferences for a signature system that performs well with respect to one attribute (e.g., fidelity) at the expense of another attribute (e.g., cost).
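The three steps above can be sketched as follows. The exponential form of the single-attribute utility is one common way to control curvature and is an assumption here, as are the attribute ranges, the weights, and the two hypothetical systems being compared.

```python
import math


def single_attribute_utility(value, worst, best, curvature=0.0):
    """Map an attribute value to [0, 1]: 0 at `worst`, 1 at `best`.

    curvature = 0 gives a straight line; positive values give a concave
    curve (early improvements count most), negative values a convex one.
    """
    z = (value - worst) / (best - worst)
    z = min(1.0, max(0.0, z))  # clamp values outside the stated range
    if abs(curvature) < 1e-12:
        return z
    return (1.0 - math.exp(-curvature * z)) / (1.0 - math.exp(-curvature))


def multi_attribute_utility(utilities, weights):
    """Weighted average of the single-attribute utilities."""
    total = sum(weights.values())
    return sum(weights[name] * utilities[name] for name in weights) / total


# Hypothetical (worst, best) ranges; for cost, lower values are better.
RANGES = {"fidelity": (0.5, 1.0), "cost": (100.0, 0.0)}
WEIGHTS = {"fidelity": 0.7, "cost": 0.3}  # assumed preference for fidelity

systems = {
    "system_a": {"fidelity": 0.95, "cost": 50.0},
    "system_b": {"fidelity": 0.85, "cost": 10.0},
}

scores = {}
for name, attributes in systems.items():
    utilities = {attr: single_attribute_utility(value, *RANGES[attr])
                 for attr, value in attributes.items()}
    scores[name] = multi_attribute_utility(utilities, WEIGHTS)

best_system = max(scores, key=scores.get)
print(scores, best_system)
```

Changing the weights or the curvature can reverse the ranking, which is why both should be elicited deliberately and not left at defaults.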
Evaluate the signature for suitability in identifying, characterizing, or detecting the phenomenon of interest. Signature development is an iterative process; if the signature does not perform suitably, revisit previous steps to assess where modifications could be made.