hnswSearcher
Hierarchical Navigable Small Worlds (HNSW) approximate nearest neighbor search
Since R2024a
Description
An hnswSearcher model object stores the training data, distance metric, and parameter values of the distance metric for a Hierarchical Navigable Small Worlds (HNSW) approximate nearest neighbor search. For an algorithm overview, see Approximate KNN Search Using Hierarchical Navigable Small Worlds (HNSW) Algorithm.
After you create an hnswSearcher model object, you can use it to find points in the training data that are close to the query data by performing a nearest neighbor search using knnsearch. The HNSW search algorithm is faster than other KNN search algorithms. However, HNSW search results might not be exact.
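For example, this minimal sketch (the data sizes, query point, and number of neighbors are illustrative) creates a searcher with default settings and queries it with knnsearch:
rng(1) % For reproducibility (illustrative seed)
X = randn(500,10);                          % 500 observations, 10 features
Mdl = hnswSearcher(X);                      % HNSW searcher with default settings
[idx,D] = knnsearch(Mdl,randn(1,10),K=3);   % indices into X and distances for 3 approximate nearest neighbors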
Creation
Description
Mdl = hnswSearcher(X) creates an HNSW searcher object based on the data in the matrix X. Each row of X represents one observation, and each column represents one feature.
Mdl = hnswSearcher(X,Name=Value) sets Properties and additional options using name-value arguments. For example, to use the Minkowski distance, set Distance="minkowski".
Input Arguments
X
— Input data
numeric matrix
Input data, specified as a numeric matrix. The rows of X correspond to observations, and the columns correspond to variables. The value of X sets the value of the X property in the model object.
Data Types: single | double
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: Distance="minkowski",P=3 specifies to use the Minkowski distance metric with an exponent of 3.
Distance
— Distance metric
"euclidean"
(default) | character vector or string scalar
Distance metric used when you call knnsearch
to find nearest neighbors for future query points, specified
as a character vector or string scalar of the distance metric name.
The value of Distance
sets the value of the Distance
property in the model object.
Value | Description |
---|---|
"chebychev" | Chebychev distance (maximum coordinate difference) |
"cityblock" | City block distance |
"correlation" | One minus the sample linear correlation between observations (treated as sequences of values) |
"cosine" | One minus the cosine of the included angle between observations (treated as row vectors) |
"euclidean" | Euclidean distance |
"hamming" | Hamming distance, which is the percentage of coordinates that differ |
"jaccard" | One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ |
"mahalanobis" | Mahalanobis distance |
"minkowski" | Minkowski distance. The default exponent is 2. |
"seuclidean" | Standardized Euclidean distance |
"spearman" | One minus the sample Spearman's rank correlation between observations (treated as sequences of values) |
Example: Distance="minkowski"
Data Types: char | string
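For instance, this sketch (with illustrative data) stores the cosine metric in the searcher; knnsearch then uses that metric when you query the object:
X = abs(randn(200,5));                      % illustrative training data
Mdl = hnswSearcher(X,Distance="cosine");
[idx,D] = knnsearch(Mdl,abs(randn(1,5)));   % D is one minus the cosine of the included angle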
Cov
— Covariance matrix for Mahalanobis distance metric
cov(X,"omitrows")
(default) | positive definite matrix
Covariance matrix for the Mahalanobis distance metric, specified as a
K-by-K positive definite matrix, where
K is the number of columns in X
. This
argument is valid only when Distance
is
"mahalanobis"
.
The value of Cov
sets the value of the
DistParameter
property in the model object.
Example: Cov=eye(3)
Data Types: single
| double
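For example, this sketch (the data and covariance values are illustrative) supplies a custom covariance matrix and confirms that it is stored in DistParameter:
X = randn(300,3);                            % illustrative training data
C = cov(X) + 0.1*eye(3);                     % illustrative positive definite covariance
Mdl = hnswSearcher(X,Distance="mahalanobis",Cov=C);
Mdl.DistParameter                            % returns C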
P
— Exponent for Minkowski distance metric
2 (default) | positive scalar
Exponent for the Minkowski distance metric, specified as a positive scalar. This argument is valid only when Distance is "minkowski".
The value of P sets the value of the DistParameter property in the model object.
Example: P=3
Data Types: single | double
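For example, this sketch (with illustrative data) uses an exponent of 3:
X = randn(300,4);                            % illustrative training data
Mdl = hnswSearcher(X,Distance="minkowski",P=3);
Mdl.DistParameter                            % returns 3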
Scale
— Scale parameter value for standardized Euclidean distance metric
std(X,"omitnan")
(default) | nonnegative numeric vector
Scale parameter value for the standardized Euclidean distance metric, specified as a
nonnegative numeric vector of length K, where K is
the number of columns in X
. The software scales each difference
between the training and query data using the corresponding element of
Scale
. This argument is valid only when
Distance
is "seuclidean"
.
The value of Scale
sets the value of the
DistParameter
property in the model object.
Example: Scale=quantile(X,0.75) - quantile(X,0.25)
Data Types: single
| double
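For example, this sketch (with illustrative data) scales each coordinate by the interquartile range of the corresponding column:
X = randn(300,4);                                 % illustrative training data
iqrScale = quantile(X,0.75) - quantile(X,0.25);   % one scale value per column
Mdl = hnswSearcher(X,Distance="seuclidean",Scale=iqrScale);
Mdl.DistParameter                                 % returns iqrScale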
Properties
You can set the Distance, MaxNumLinksPerNode, and TrainSetSize properties by using the name-value argument syntax when you call hnswSearcher. However, you cannot directly specify the DistParameter property. As noted earlier, DistParameter is set by the Cov, P, or Scale name-value argument.
Distance
— Distance metric
'euclidean' (default) | character vector
This property is read-only.
Distance metric used when you call knnsearch to find nearest neighbors for future query points, specified as a character vector of the distance metric name.
Value | Description |
---|---|
'chebychev' | Chebychev distance (maximum coordinate difference) |
'cityblock' | City block distance |
'correlation' | One minus the sample linear correlation between observations (treated as sequences of values) |
'cosine' | One minus the cosine of the included angle between observations (treated as row vectors) |
'euclidean' | Euclidean distance |
'hamming' | Hamming distance, which is the percentage of coordinates that differ |
'jaccard' | One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ |
'mahalanobis' | Mahalanobis distance |
'minkowski' | Minkowski distance. The default exponent is 2. |
'seuclidean' | Standardized Euclidean distance |
'spearman' | One minus the sample Spearman's rank correlation between observations (treated as sequences of values) |
Data Types: char
DistParameter
— Distance metric parameter values
[] | positive scalar | numeric vector | numeric matrix
This property is read-only.
Distance metric parameter values, specified as a positive scalar, numeric vector, or numeric matrix. This property is nonempty only when you specify Distance as "seuclidean", "minkowski", or "mahalanobis".
- If Distance is "seuclidean", DistParameter is a vector of scaling factors for each dimension, specified as a positive vector. The value is set by the Scale name-value argument. The default value is std(X,"omitnan").
- If Distance is "minkowski", DistParameter is the exponent of the Minkowski distance, specified as a positive scalar. The value is set by the P name-value argument. The default value is 2.
- If Distance is "mahalanobis", DistParameter is a covariance matrix, specified as a numeric matrix. The value is set by the Cov name-value argument. The default value is cov(X,"omitrows").
Data Types: single | double
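For example, this sketch (with illustrative data) shows that DistParameter is empty for the default Euclidean metric but holds the per-column scaling for the standardized Euclidean metric:
X = randn(100,3);                             % illustrative training data
MdlE = hnswSearcher(X);                       % Euclidean metric
MdlE.DistParameter                            % returns []
MdlS = hnswSearcher(X,Distance="seuclidean");
MdlS.DistParameter                            % returns std(X,"omitnan")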
MaxNumLinksPerNode
— Number of connections created for each node
min(16,N), where N is the number of rows in X (default) | positive scalar
This property is read-only.
Number of connections created for each node, specified as a positive scalar. Typically, set a number from about 5 to 48. MaxNumLinksPerNode cannot exceed TrainSetSize.
Data Types: double
TrainSetSize
— Size of list for potential nearest neighbors
200 (default) | positive integer
This property is read-only.
Size of the list for potential nearest neighbors, specified as a positive integer. Typically, set a number from about 100 to 2000. TrainSetSize must be at least MaxNumLinksPerNode and no more than N, the number of rows of X.
You can search for a good value of TrainSetSize as follows (see the sketch after this property description):
- Specify a large enough value to achieve a very accurate search result (close to an exact search).
- Run query tests with decreased values until the accuracy reaches a reasonably good range (recall rate of 0.95–0.99).
Data Types: double
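The following sketch illustrates this tuning procedure. It assumes you already have training data X and query data Y; the candidate TrainSetSize values and K are illustrative, and each candidate value must not exceed the number of rows of X:
K = 10;
idxExact = knnsearch(X,Y,K=K);                    % exact neighbors for reference
for ts = [2000 1000 500 200]                      % candidate TrainSetSize values, largest first
    Mdl = hnswSearcher(X,TrainSetSize=ts);
    idxApprox = knnsearch(Mdl,Y,K=K);
    hits = 0;
    for i = 1:size(Y,1)                           % count how many true neighbors each query recovers
        hits = hits + numel(intersect(idxApprox(i,:),idxExact(i,:)));
    end
    recall = hits/(K*size(Y,1));                  % fraction of exact neighbors found
    fprintf("TrainSetSize = %4d, recall = %.3f\n",ts,recall)
end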
X
— Point data
real matrix
This property is read-only.
Point data, specified as a real matrix. Each row of X represents one point. The value of this property is set by the X input argument.
Data Types: single | double
Object Functions
knnsearch | Find k-nearest neighbors using searcher object |
Examples
Create HNSW Searcher
Create an HNSW searcher object using the data in a pseudorandom input matrix X.
rng default % For reproducibility
X = 50*randn(1000,20);
Mdl = hnswSearcher(X)
Mdl = 
  hnswSearcher with properties:

    MaxNumLinksPerNode: 16
          TrainSetSize: 200
              Distance: 'euclidean'
         DistParameter: []
                     X: [1000x20 double]
Mdl is an hnswSearcher model object with default properties.
Find the closest data point to the point ones(1,20) and display its distance.
[idx,D] = knnsearch(Mdl,ones(1,20))
idx = 174
D = 118.4583
Perform the same search using an exact KNN search, and compare the results.
[idx2,D2] = knnsearch(X,ones(1,20))
idx2 = 174
D2 = 118.4583
The approximate hnswSearcher algorithm gives the same results as the exact KNN search algorithm.
Create HNSW Searcher with Nondefault Properties
Create an HNSW searcher object using the "cityblock" distance and twice as many links per node as the default.
rng default % For reproducibility
X = 50*randn(1000,20);
Mdl = hnswSearcher(X,Distance="cityblock",MaxNumLinksPerNode=32)
Mdl = 
  hnswSearcher with properties:

    MaxNumLinksPerNode: 32
          TrainSetSize: 200
              Distance: 'cityblock'
         DistParameter: []
                     X: [1000x20 double]
Mdl has the specified nondefault properties.
Find the closest data point to the point ones(1,20) and display its distance.
[idx,D] = knnsearch(Mdl,ones(1,20))
idx = 738
D = 420.2990
Perform the same search using an exact KNN search, and compare the results.
[idx,D] = knnsearch(X,ones(1,20),Distance="cityblock")
idx = 738
D = 420.2990
hnswSearcher gives the same results as the exact searcher.
HNSW Speed and Accuracy for Large Data
Create a large data set and compare the speed of an HNSW searcher to the speed of an exhaustive searcher for 1000 query points.
Create the data.
rng default % For reproducibility
A = diag(1:100);
B = randn(100,10);
K = kron(A,B);
ss = size(K)
ss = 1×2
10000 1000
The data K has 1e4 rows and 1e3 columns.
Create an HNSW searcher object h from the data K.
tic; h = hnswSearcher(K); chnsw = toc
chnsw = 10.6544
Create 1000 random query points with 1000 features (columns). Specify to search for five nearest neighbors.
Y = randn(1000, 1000); tic; [idx, D] = knnsearch(h,Y,K=5); thnsw = toc
thnsw = 0.0797
Create an exhaustive searcher object e from the data K.
tic; e = ExhaustiveSearcher(K); ces = toc
ces = 0.0021
Creating an exhaustive searcher is much faster than creating an HNSW searcher.
Time the search using e and compare the result to the time using the HNSW searcher h.
tic; [idx0, D0] = knnsearch(e,Y,K=5); tes = toc
tes = 1.4841
In this case, the HNSW searcher computes about 20 times faster than the exhaustive searcher.
Determine how many results of the HNSW search differ in some way from the results of the exhaustive search.
v = abs(idx - idx0);  % Count any difference in the five found entries
vv = sum(v,2);        % vv = 0 means no difference
nz = nnz(vv)          % Out of 1000 rows, how many differ at all?
nz = 118
Here, 118 of 1000 HNSW search results differ from the exhaustive search results.
Try to improve the accuracy of the HNSW searcher by training with nondefault parameters. Specifically, use larger values for MaxNumLinksPerNode and TrainSetSize. These settings affect the speed of training and finding nearest neighbors.
tic; h2 = hnswSearcher(K,MaxNumLinksPerNode=48,TrainSetSize=2000); chnsw2 = toc
chnsw2 = 78.4836
With these parameters, the searcher takes about seven times as long to train.
tic; [idx2, D2] = knnsearch(h2,Y,K=5); thnsw2 = toc
thnsw2 = 0.1049
The speed of finding nearest neighbors using HNSW decreases, but is still more than ten times faster than the exhaustive search.
v = abs(idx2 - idx0); vv = sum(v,2); nz = nnz(vv)
nz = 57
For the slower but more accurate HNSW search, only 57 of 1000 results differ in any way from the exact results. Summarize the timing results in a table.
timet = table([chnsw;ces;chnsw2],[thnsw;tes;thnsw2],'RowNames',["HNSW";"Exhaustive";"HNSW2"],'VariableNames',["Creation" "Search"])
timet =

  3×2 table

                  Creation      Search  
                  _________    ________

    HNSW             10.654    0.079741
    Exhaustive    0.0021304      1.4841
    HNSW2            78.484     0.10492
Algorithms
The HNSW algorithm is described in Malkov and Yashunin [1]. A brief description of the algorithm appears in Approximate KNN Search Using Hierarchical Navigable Small Worlds (HNSW) Algorithm.
Alternative Functionality
You can also create an HNSW searcher object by using createns with the specification NSMethod="hnsw". The resulting object is the same as the object you create using the hnswSearcher function.
References
[1] Malkov, Yu. A., and D. A. Yashunin. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. Available at https://arxiv.org/abs/1603.09320.
Version History
Introduced in R2024a