Match Data to Designs
You can use the Design Match view for matching data to experimental designs for global models. Here you can select data for modeling. You can use an iterative process: make a design, collect some data, match that data with your design points, modify your design accordingly, then collect more data, and so on. You can use this process to optimize your data collection process in order to obtain the most robust models possible with the minimum amount of data.
Use the Design Match view to select data for modeling. All data you select is also added to a new design called Actual Design. You can use the matching process to produce an Actual Design that accurately reflects your current data. You can then use this new design to decide the best points to use if you want to augment your current design in order to collect more data.
For instructions, see the following section, How to Use the Design Match View.
How to Use the Design Match View
Tip
For a step-by-step guide to matching data to a design using an example project, see Match Data to Designs in the Getting Started documentation.
You can Shift+click (or center+click) and drag to zoom in on clusters of interest. Double-click the plot to return to full size.
Use the following sequence as a guideline for matching data to designs using the Design Match plot:
It is unlikely that you will get the tolerances right immediately. Open the Tolerance Editor using the context menu and try different values for different variables. These values determine the size of clusters centered on each design point. Data points that lie within tolerance of any design point in a cluster are matched to that cluster. See The Tolerance Editor for cluster definitions.
When matching data to designs, you might want to clear the check box in the Design Match view for green clusters (clusters with equal numbers of data and design points). These clusters are already matched; you are more likely to be interested in unmatched points and in clusters with unequal numbers of data and design points. Removing the green clusters from the display allows you to focus on these points of interest. If you want your new Actual Design to accurately reflect your current data, your aim is to get as many data points matched to design points as possible, that is, as few red clusters as possible. See Red Clusters.
You can see the values of variables at different points by clicking and holding. Selected points have a pink border. Once points are selected, you can change the plot variables using the X- and Y-axis factor drop-down menus to track those points through the different dimensions. This can give you a good idea of which tolerances to change in order to match points. Remember that points that do not form a cluster can appear to be perfectly matched when viewed in one pair of dimensions; you must view them in other dimensions to find out where they are separated beyond the tolerance value. Use this tracking process to decide whether you want particular pairs of points to be matched, and then change the tolerances until they form part of a cluster.
Remember that points you select in the Design Match view are selected across the Data Editor, so if you have other data plots or a table view open, you can investigate the same points in different views.
Once you have found useful values for the tolerances by trial and error, you can make selections of points within clusters that have uneven numbers of data and design points. These clusters are blue (more data than design) or red (more design than data). Select any cluster by clicking it. The details of every data and design point contained in the selected cluster appear in the Cluster Information list. Choose the points you want to keep or discard by selecting or clearing the check boxes next to each point. Notice that your selections can cause clusters to change color as you adjust the numbers of data and design points within them.
You can also select unmatched points by right-clicking and selecting Select Unmatched Data. All unmatched points then appear in the list view. You can decide whether to include or exclude them in the same way as points within clusters, by using the check boxes in the list. If you decide to exclude data points (within clusters or not) they appear on the plot as black crosses (if the Excluded Data check box is selected for display).
Note that it is faster to select multiple points and then set their check boxes in a single operation than to select or clear the check box of each point individually. To do this, use Shift+click to select multiple points, and hold the Shift key when clicking one of the check boxes.
You can right-click and select Show Labels to see design and data point numbers on the plot (also in the View menu).
Continue this process of altering tolerances and making selections of points until you are satisfied that you have selected all the data you want for modeling. All selected data is also added to your new Actual Design, except data in red clusters.
Red Clusters
Red clusters contain more design points than data points. The data points in red clusters are not added to your design, because the algorithm cannot choose which design points to replace. You must make selections manually to deal with red clusters if you want to use these data points in your design. If you do not need the Actual Design (for example, if you do not intend to collect more data) and you are just selecting data for modeling, then you can ignore red clusters. The data points in red clusters are still selected for modeling. For information about the effects of your selections, see What Will Happen to My Data and Design?
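The color rules described above can be summarized in a short sketch. This is only an illustration of the counting rule stated in this section, not the tool's own code; the function name `cluster_color` and the example clusters are hypothetical.

```python
def cluster_color(n_data: int, n_design: int) -> str:
    """Classify a cluster by its counts of selected data and design points."""
    if n_data == n_design:
        return "green"  # equal numbers of data and design points: matched
    if n_data > n_design:
        return "blue"   # more data than design points
    return "red"        # more design than data points: needs a manual decision


# Hypothetical clusters: name -> (selected data points, selected design points)
clusters = {"A": (2, 2), "B": (3, 1), "C": (1, 2)}
colors = {name: cluster_color(d, s) for name, (d, s) in clusters.items()}

# Red clusters are the ones whose data is not added to the Actual Design.
needs_attention = [name for name, color in colors.items() if color == "red"]
```

Because the color depends only on the counts, clearing a check box in the Cluster Information list can move a cluster from one color to another.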
The Tolerance Editor
Open the Tolerance Editor by selecting Tolerances in the context menu.
Here you can edit the tolerance for selecting data points. You can choose values for each variable to determine the size of tolerance in each dimension.
Data points within the tolerance of a design point are included in that cluster.
Data points that fall inside the tolerance of more than one design point form a single cluster containing all those design and data points.
Excluded data (shown as black crosses) that lies within tolerance appears in the list when that cluster is selected. You can then choose whether to use it or continue to exclude it.
Data in Design (pink crosses) is the only type of data that is not included in clusters.
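The cluster rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm the tool actually runs: it assumes a data point matches a design point only when it lies within tolerance in every dimension, that the tolerance boundary is inclusive, and that merged clusters can be tracked with a small union-find. The variable names (`speed`, `load`) and all values are hypothetical.

```python
from collections import defaultdict


def within_tolerance(data_pt, design_pt, tol):
    """True only if the points agree within tolerance in every dimension."""
    return all(abs(data_pt[k] - design_pt[k]) <= tol[k] for k in tol)


def find_clusters(design, data, tol):
    """Group design and data points into clusters using union-find."""
    n_design = len(design)
    # Nodes 0..n_design-1 are design points; the rest are data points.
    parent = list(range(n_design + len(data)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    matched = set()
    for j, d in enumerate(data):
        for i, p in enumerate(design):
            if within_tolerance(d, p, tol):
                # A data point near several design points merges them all.
                parent[find(n_design + j)] = find(i)
                matched.add(j)

    groups = defaultdict(lambda: {"design": [], "data": []})
    for j in sorted(matched):
        groups[find(n_design + j)]["data"].append(j)
    for i in range(n_design):
        root = find(i)
        if root in groups:
            groups[root]["design"].append(i)

    clusters = [groups[r] for r in sorted(groups)]
    unmatched_data = [j for j in range(len(data)) if j not in matched]
    unmatched_design = [i for i in range(n_design) if find(i) not in groups]
    return clusters, unmatched_data, unmatched_design


tol = {"speed": 50, "load": 0.05}
design = [
    {"speed": 1000, "load": 0.2},
    {"speed": 1050, "load": 0.2},
    {"speed": 3000, "load": 0.6},
]
data = [
    {"speed": 1020, "load": 0.21},  # within tolerance of design points 0 and 1
    {"speed": 2990, "load": 0.9},   # close in speed, but too far in load
]
clusters, unmatched_data, unmatched_design = find_clusters(design, data, tol)
```

In this example the first data point joins design points 0 and 1 into a single cluster with more design than data points, while the second data point stays unmatched even though it looks matched when only `speed` is considered.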
Note
For grouped data, tolerances are set for global variables. Data used for matching uses operating point means of global variables, not individual records, unlike other Data Editor views. Click points to inspect values of global variables.
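As a rough illustration of what matching on operating point means implies, the sketch below averages the global variables over the records of each test. The record layout, the `test` key, and the variable names are assumptions made for this example only.

```python
from collections import defaultdict
from statistics import mean


def operating_point_means(records, test_key, global_vars):
    """Average each global variable over the records of each test."""
    by_test = defaultdict(list)
    for rec in records:
        by_test[rec[test_key]].append(rec)
    return {
        test: {v: mean(r[v] for r in recs) for v in global_vars}
        for test, recs in by_test.items()
    }


# Hypothetical grouped data: two tests, with "spark" as a local variable.
records = [
    {"test": 1, "speed": 1000, "load": 0.20, "spark": 10},
    {"test": 1, "speed": 1010, "load": 0.22, "spark": 20},
    {"test": 2, "speed": 3000, "load": 0.60, "spark": 10},
]
means = operating_point_means(records, "test", ["speed", "load"])
```

Each test contributes one point to the matching, located at the mean of its global variables, rather than one point per record.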
Using the Tolerance Editor is the same process as setting tolerances within the Data Wizard. In the Data Wizard you can also choose in advance what to do with unmatched data and clusters with uneven numbers of data and design points. These choices affect how the cluster algorithm is first run; you can always change selections later in the Data Editor. See Step 4: Set Tolerances.
Note
If you modify the data in any way while the Design Match view is open (e.g., by applying a filter) the cluster algorithm will be rerun. You might lose your design point selections.
See the next section, What Will Happen to My Data and Design?, for information about what happens to your data set and design when you close the Data Editor after data selection in the Design Match view.
What Will Happen to My Data and Design?
As with everywhere else in the Data Editor, the changes you make are only applied to the data set when you exit. When you close the Data Editor, your choices in the Design Match plot are applied to the data set and a new design called Actual Design is created. All the changes are determined by your check box selections for data and design points.
Note
All data points with a selected check box are selected for modeling. All data points with a cleared check box are excluded from the data set.
All data points with a selected check box are put into the new Actual Design except those in red clusters. See below.
When you close the Data Editor, these changes are applied:
Green clusters — equal number of data and design points
The design points are replaced by the equal number of data points. These points become fixed design points (red in the Design Editor table) and appear as Data in Design (pink crosses) when you reopen the Data Editor.
This means that these points are not included in clusters when matching again. These fixed points are also not changed in the Design Editor when you add points, although you can unlock fixed points if you want. This can be very useful if you want to optimally augment a design, taking into account the data you have already obtained.
Blue clusters — more data than design points
The design points are replaced by all the data points.
Note
Design points with selected check boxes in green or blue clusters are the points that will be replaced by your selected data points. You may have cleared the check boxes of other design points in these clusters, and these points will be left unchanged.
Red Clusters — more design than data points
Red clusters indicate that you should make a decision if you want your new Actual Design to reflect your most current data. The algorithm cannot choose the design points to replace with the data points, so no action is taken. Red clusters do not make any changes to the design when you close the Data Editor. The existing design points remain in the design. The data points are included or excluded from the data set depending on your selections in the Cluster Information list, but they are not added to the design.
Unmatched Design Points
These remain in the design.
Unmatched Data Points
If you have selected the check boxes for unmatched data, they become new fixed design points, which are red in the Design Editor. When you reopen the Data Editor these points are Data in Design, which appear as pink crosses. Note that in the Data Wizard you could choose Use to select all these initially, or you could choose Do not use, which clears all their check boxes. See Step 4: Set Tolerances.
Data in Design
These remain in the design.
Excluded Data
These data points are removed from the data set and are not displayed in any other views. If you want to return them to the data set you can only do so by selecting them in the Design Match view.
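The rules in this section can be condensed into a small sketch. It is a simplified model for illustration only, not the tool's implementation: points are plain identifiers paired with a check box state, existing Data in Design points are left out, and every name here is hypothetical.

```python
def apply_on_close(clusters, unmatched_data, unmatched_design):
    """Return (modeling_data, actual_design) from check box selections.

    Points in clusters and in unmatched_data are (id, checked) pairs.
    Entries in actual_design are ("design", id) for design points that are
    kept and ("fixed", id) for data points that become fixed design points.
    """
    modeling_data = []
    actual_design = []

    for c in clusters:
        data = [p for p, checked in c["data"] if checked]
        design = [p for p, checked in c["design"] if checked]
        untouched = [p for p, checked in c["design"] if not checked]

        modeling_data += data  # selected data is always used for modeling
        actual_design += [("design", p) for p in untouched]
        if len(data) >= len(design):
            # Green or blue: selected design points are replaced by the data.
            actual_design += [("fixed", p) for p in data]
        else:
            # Red: no change to the design; the design points remain.
            actual_design += [("design", p) for p in design]

    for p, checked in unmatched_data:
        if checked:
            modeling_data.append(p)
            actual_design.append(("fixed", p))  # new fixed design point
        # Unchecked data is excluded from the data set.

    actual_design += [("design", p) for p in unmatched_design]
    return modeling_data, actual_design


clusters = [
    {"data": [("d1", True)], "design": [("p1", True)]},                # green
    {"data": [("d2", True), ("d3", True)], "design": [("p2", True)]},  # blue
    {"data": [("d4", True)], "design": [("p3", True), ("p4", True)]},  # red
]
modeling_data, actual_design = apply_on_close(
    clusters,
    unmatched_data=[("d5", True), ("d6", False)],
    unmatched_design=["p5"],
)
```

In this example `d4` is used for modeling but does not enter the design, because its cluster is red, and `d6` is dropped from the data set because its check box is cleared.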