Getting started
POST /v1/people/match
The Match API (v1) uses fuzzy matching, driven by a set of matching rules, to match your records to people in the Data Axle database and append data to them.
The difference between the Search API and the Match API is that the Search API returns exact results from filters, whereas the Match API matches your entities to records in the Data Axle database.
curl https://qa.api.data-axle.com/v1/people/match -d '{
"identifiers": {
"first_name": "Joe",
"last_name": "Smith"
}
}'
The response includes a persistent person_id along with appended attributes:
{
"document": {
"person_id": "002293031973",
"rules": [
"name",
"address"
],
"score": 0.82,
"attributes": {
"person_id": "002293031973",
"family_id": "800047540468",
"first_name": "Joe",
"last_name": "Smith",
"city": "Seattle",
...
}
}
}
- Either GET or POST can be used.
- By default, one match is returned. Set the limit option to retrieve multiple matches.
- All available fields are included in the output. This can be changed with the fields option.
- If a match is not found, a null document is returned.
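Because the response shape depends on whether limit is set, client code has to handle both a single (possibly null) document and an array of documents. The following Python sketch assumes only the response shapes shown in this document; the function name extract_matches is hypothetical and not part of any Data Axle SDK.

```python
from __future__ import annotations

from typing import Any


def extract_matches(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the matched documents from a Match API response as a list.

    Assumes the two documented shapes: a single "document" (null when
    nothing matched) by default, or a "documents" array when limit is set.
    """
    if "documents" in response:
        # limit was set: the API returns an array of documents.
        return list(response["documents"] or [])
    document = response.get("document")
    # No match: the API returns a null document.
    return [document] if document is not None else []


# Usage with a response like the one shown above (attributes trimmed).
matches = extract_matches({"document": {"person_id": "002293031973", "score": 0.82}})
no_matches = extract_matches({"document": None})
```

Normalizing to a list lets the rest of your code treat single and multi-match requests the same way.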
Parameters
Parameter | Description | Default |
identifiers | The set of fields used to find a match for your record. | |
required_rule_groups | The set of rules the result must match. | |
match_rule_groups_exactly | When true, a result must match one of the required_rule_groups exactly, with no additional rules. | false |
filter | Reduce potential results with the Filter DSL. | |
fields | The fields returned with the matched document. | All Data Dictionary fields |
include_labels | Get the labels for encoded fields. | false |
limit | Return multiple results up to the given limit. Max of 400 results. | 1 |
minimum_match_score | Exclude results with a match score below this value (0-1). | |
packages | Select packages of fields returned in the results. | |
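A request body combines identifiers with any of the optional parameters above. The Python sketch below builds one and checks the documented bounds (limit of at most 400, scores between 0 and 1) before sending. The helper name build_match_request is hypothetical, and rejecting out-of-range values client-side is a design choice of this sketch, not documented API behavior.

```python
from __future__ import annotations

import json
from typing import Any

MAX_LIMIT = 400  # documented maximum for the limit parameter


def build_match_request(
    identifiers: dict[str, str],
    *,
    limit: int | None = None,
    minimum_match_score: float | None = None,
    fields: list[str] | None = None,
) -> str:
    """Serialize a Match API request body, checking documented bounds first."""
    if not identifiers:
        raise ValueError("identifiers must not be empty")
    body: dict[str, Any] = {"identifiers": identifiers}
    if limit is not None:
        # Per the Limit section, specifying limit returns a documents array.
        if not 1 <= limit <= MAX_LIMIT:
            raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
        body["limit"] = limit
    if minimum_match_score is not None:
        # Scores range from 0 to 1, so other thresholds are rejected here.
        if not 0.0 <= minimum_match_score <= 1.0:
            raise ValueError("minimum_match_score must be between 0 and 1")
        body["minimum_match_score"] = minimum_match_score
    if fields:
        # Drop duplicate field names while keeping their order.
        body["fields"] = list(dict.fromkeys(fields))
    return json.dumps(body)


payload = build_match_request(
    {"first_name": "Joe", "last_name": "Smith"},
    minimum_match_score=0.8,
    fields=["first_name", "last_name", "city"],
)
```

Leaving limit unset keeps the default single-document response shape.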
Identifiers
The fields you can submit for matching are:
Field | Description | Example |
reference_id | Optional field used to reference IDs from your system. | 1cd345b729, User 12567 |
person_id | The unique identifier for the person. | 123456789000 |
first_name | The first name of the person. | John, Mary Ann, ... |
last_name | The last name of the person. | Smith |
street | The location address of the person. | 123 N Main St |
mailing_street | The mailing address of the person or family. | PO Box 123 |
suite | The unit or apartment number of the person. | A |
city | The city name for the person. | Seattle |
state | The state of the person. | WA |
postal_code | The postal code for the person's address. | 98134 |
phone | Telephone numbers associated with the family. | (206) 555-1212, 2065551212, ... |
email | The email address of the person or family. | someone@example.com, example.com, ... |
email_md5 | MD5 hash of the email address. | 16d113840f999444259f73bac9ab8b10 |
email_sha256 | SHA-256 hash of the email address. | 72497f475e4f76d0b28f57c73a084ece576... |
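The email_md5 and email_sha256 identifiers take hex digests of an email address. A minimal Python sketch using the standard hashlib module follows. The trim-and-lowercase normalization is an assumption, not something this document specifies; confirm it with Data Axle, because any difference in normalization produces a different hash that will not match.

```python
import hashlib


def hashed_email_identifiers(email: str) -> dict:
    """Build email_md5 and email_sha256 identifiers for an email address."""
    # ASSUMPTION: addresses are trimmed and lowercased before hashing.
    normalized = email.strip().lower().encode("utf-8")
    return {
        "email_md5": hashlib.md5(normalized).hexdigest(),
        "email_sha256": hashlib.sha256(normalized).hexdigest(),
    }


identifiers = hashed_email_identifiers("  Someone@Example.com ")
# Send either hash; one email identifier is enough for the email rule.
request_body = {"identifiers": {"email_sha256": identifiers["email_sha256"]}}
```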
See the rule descriptions for which fields are required for each rule to match.
reference_id is not used for matching. It is an optional field for referencing requests with an ID from your own system, and it is returned with the results of batch requests.
Rules
Rules are automatically selected based on the data provided.
Rule | Description | Notes |
person_id | Match by person_id. Use this by itself if you already have a person_id and would like data appended. | high confidence |
name | Match by name. The last_name field is required. | |
address | Match by street and region (city, state, or postal_code). | |
phone | Match by phone number. Cell phone numbers will also be matched. | high confidence |
email | Match by email address. Any one of email, email_md5, or email_sha256 can be provided. | high confidence |
A record that matches only a single rule is returned only if that rule is marked "high confidence". The exception is when only one rule was run, for example when only an address is provided: in that case, address matches are included in the result.
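To reason about which results to expect, the Rules table and the single-rule policy can be approximated client-side. This Python sketch is illustrative only: rules_for and is_returned are hypothetical helpers derived from the table's field requirements, and the service's actual rule-selection logic may differ.

```python
HIGH_CONFIDENCE_RULES = {"person_id", "phone", "email"}  # "high confidence" in the table


def rules_for(identifiers: dict) -> set:
    """Approximate which rules the given identifiers can trigger."""
    rules = set()
    if "person_id" in identifiers:
        rules.add("person_id")
    if "last_name" in identifiers:  # the name rule requires last_name
        rules.add("name")
    if "street" in identifiers and any(
        key in identifiers for key in ("city", "state", "postal_code")
    ):
        rules.add("address")  # street plus at least one region field
    if "phone" in identifiers:
        rules.add("phone")
    if any(key in identifiers for key in ("email", "email_md5", "email_sha256")):
        rules.add("email")
    return rules


def is_returned(matched_rules: set, rules_run: set) -> bool:
    """Apply the single-rule policy: one rule counts only in two cases."""
    if len(matched_rules) >= 2:
        return True
    if len(matched_rules) == 1:
        (rule,) = matched_rules
        # Kept if the rule is high confidence or it was the only rule run.
        return rule in HIGH_CONFIDENCE_RULES or len(rules_run) == 1
    return False
```

For example, an address-only request runs only the address rule, so an address-only match is kept; the same match is dropped when a name was also supplied but did not match.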
Score
Match scores are provided to make it easier to compare the similarity of the input to the matched record. Records are scored 0-1, with a higher score indicating greater similarity.
The right score threshold depends on your use case. We recommend using matches with a score of 0.8 or above that matched on at least two rules. Matching on multiple rules increases match quality.
To exclude low-confidence matches from your results, use the minimum_match_score parameter.
{
"minimum_match_score": 0.8
}
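minimum_match_score filters by score on the server, but the recommendation to keep matches made on at least two rules has to be applied to the returned documents. A minimal Python sketch, assuming the score and rules fields shown in the response examples; recommended_matches is a hypothetical helper name.

```python
def recommended_matches(documents, min_score=0.8, min_rules=2):
    """Keep documents meeting the score and rule-count recommendation."""
    return [
        doc
        for doc in documents
        if doc is not None  # a null document means no match was found
        and doc.get("score", 0.0) >= min_score
        and len(doc.get("rules", [])) >= min_rules
    ]
```

Note that a single high-confidence rule such as email can score 1.0 and still be dropped with min_rules=2; lower min_rules if those matches suit your use case.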
Required Rule Groups
Use the required_rule_groups parameter to specify groups of rules. Only results that match one or more of the groups are returned.
{
"required_rule_groups": [
["address", "name"],
["address", "phone"]
]
}
In this example, a record matches in the following cases:
- The record matches the "address" and "name" rules.
- The record matches the "address" and "phone" rules.
- The record matches the "address", "name", and "phone" rules.
Match Rule Groups Exactly
When match_rule_groups_exactly is true, a record is returned only if the set of rules it matched is exactly one of the groups in required_rule_groups.
{
"required_rule_groups": [
["address", "name"],
["address", "phone"]
],
"match_rule_groups_exactly": true
}
In this example, a record matches in the following cases:
- The record matches the "address" and "name" rules.
- The record matches the "address" and "phone" rules.
A record that matches the rule set ["address", "name", "phone"] is not returned, because that set is not exactly equal to either group.
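The difference between the default and exact behavior comes down to a subset test versus an equality test on the matched rules. This Python sketch models the semantics described in the two sections above; satisfies_rule_groups is a hypothetical helper for checking results client-side, not an API function.

```python
from __future__ import annotations


def satisfies_rule_groups(
    matched_rules: list[str],
    required_rule_groups: list[list[str]],
    exactly: bool = False,
) -> bool:
    """Return True if the matched rules satisfy at least one group."""
    matched = set(matched_rules)
    for group in required_rule_groups:
        required = set(group)
        # Exact mode needs identical sets; otherwise extra rules are fine.
        if matched == required if exactly else required <= matched:
            return True
    return False


groups = [["address", "name"], ["address", "phone"]]
```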
Filters
The filter parameter reduces results to records that match the specified criteria, using the Filter DSL.
Fields
By default, all fields in the Data Dictionary are included in the output. Use the fields parameter to limit the fields returned:
{
"fields": ["city", "date_of_birth", "street", "zip", "first_name", "last_name"]
}
Some contact information, including email addresses, is used for matching but is not included in the matched record. Some data may also be suppressed for certain records and is then not returned.
Include Labels
The fields returned within records frequently contain encoded values that reference lookup data. To retrieve the labels for these lookups, add the include_labels option:
{
"include_labels": true
}
Read the Lookups API documentation for more information.
Limit
By default, one match is returned. Specify the limit parameter to retrieve multiple matches. When limit is specified, an array of documents is returned.
{
"identifiers": {
"email": "smith-family@example.com"
},
"limit": 3
}
The response contains an array of documents:
{
"documents": [
{
"person_id": "002293031973",
"rules": [
"email"
],
"score": 1.0,
"attributes": {
"person_id": "002293031973",
"family_id": "800047540468",
"first_name": "Joe",
"last_name": "Smith",
"city": "Seattle"
}
},
{
"person_id": "200089221963",
"rules": [
"email"
],
"score": 1.0,
"attributes": {
"person_id": "200089221963",
"family_id": "800047540468",
"first_name": "Jamie",
"last_name": "Smith",
"city": "Seattle"
}
}
]
}
Packages
Select packages of fields by passing each package's param_key to the packages parameter. By default, every field in a package is returned. When packages is combined with the fields parameter, only the specified fields are returned.
{
"packages": ["base_v1", "emails_v2"]
}
Bulk Match
Use bulk requests to process large volumes of match requests in a batch:
- Create a batch
- Add match requests
- Retrieve results
Create a Batch
POST /v1/people/match/batch
Start by creating a batch. The initial request can include up to 1,000 match requests:
curl -XPOST https://qa.api.data-axle.com/v1/people/match/batch -d '{
"identifiers": [
{
"reference_id": "123",
"first_name": "John",
"last_name": "Smith",
"street": "123 Main St",
"city": "Seattle",
"state": "WA",
"postal_code": "98103"
},
{
"reference_id": "125",
"email": "johnson-family@example.com"
}
]
}'
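For more than 1,000 identifiers, the first chunk creates the batch and the rest are added with PUT /v1/people/match/batch/:batch_id. This Python sketch only plans the request bodies; it sends nothing. Applying the same 1,000-item cap to PUT requests is an assumption (only the initial request's cap is documented here), and chunked and plan_batch_requests are hypothetical helpers.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator

MAX_PER_REQUEST = 1000  # documented for the initial POST; assumed for PUT


def chunked(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def plan_batch_requests(identifiers: Iterable[dict], size: int = MAX_PER_REQUEST):
    """Return (method, body) pairs: one POST to create the batch, then PUTs."""
    plan = []
    for index, chunk in enumerate(chunked(identifiers, size)):
        method = "POST" if index == 0 else "PUT"
        plan.append((method, {"identifiers": chunk}))
    return plan


plan = plan_batch_requests(
    [{"reference_id": str(i), "last_name": "Smith"} for i in range(2500)]
)
```

The PUT bodies need the batch_id from the POST response, so send the POST first and the PUTs afterward.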
The immediate response includes a batch_id and an array of objects, each containing a match_id. The match_id values are returned in the same order in which the match request identifiers were submitted:
{
"batch_id": "8a140451fe3f095f1c205cf185efffec",
"matches": [
{
"match_id": "5cbb3b15c8bee0f706239b45cb763fed"
},
{
"match_id": "5f0b1070c3e4761cabf116bbed2b49c4"
}
]
}
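Because match_id values come back in submission order, they can be paired with the submitted identifiers by position. This is mainly useful when you do not set reference_id, which is already echoed in batch results. A minimal Python sketch; pair_match_ids is a hypothetical helper, and it refuses to pair when the counts differ.

```python
def pair_match_ids(identifiers: list, create_response: dict) -> dict:
    """Map each match_id to the identifiers submitted at the same position."""
    matches = create_response["matches"]
    if len(matches) != len(identifiers):
        # Positional pairing is only safe when the counts agree.
        raise ValueError(
            f"expected {len(identifiers)} match_ids, got {len(matches)}"
        )
    return {match["match_id"]: ident for ident, match in zip(identifiers, matches)}
```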
Adding Requests
PUT /v1/people/match/batch/:batch_id
Add match requests to an existing batch_id:
curl -XPUT https://qa.api.data-axle.com/v1/people/match/batch/:batch_id -d '{
"identifiers": [...]
}'
Getting Bulk Match Results
GET /v1/people/match/batch/:batch_id
Use the Match Results API with the batch_id to fetch completed results for the batch. Pending matches do not appear in the results.
Use the "status" object to determine batch progress. The batch is complete when processed equals requests.
{
"next_token": "13835315192676945401741312",
"status": {
"requests": 500,
"processed": 223
},
"documents": [
{
"match_id": "5cbb3b15c8bee0f706239b45cb763fed",
"reference_id": "123",
"document": {
"person_id": "002293031973",
"rules": ["name", "address"],
"score": 0.75,
"attributes": {
"person_id": "002293031973",
"family_id": "800047540468",
"first_name": "Joe",
"last_name": "Smith",
"city": "Seattle"
}
}
},
{
"match_id": "5f0b1070c3e4761cabf116bbed2b49c4",
"reference_id": "125",
"document": null
}
]
}
Scrolling Through Match Results
Each request returns up to 1,000 results. To read the next set of results, take the next_token from the previous response and pass it in the request URL as the since parameter:
curl "https://qa.api.data-axle.com/v1/people/match/batch/:batch_id?since=13835315192676945401741312"
Repeat this process until an empty list of documents is returned. Store the final next_token for use in future requests for the same batch:
{
"next_token": "13835315192676945401741312",
"status": {
"requests": 500,
"processed": 500
},
"documents": []
}
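Reading a batch combines scrolling with progress tracking: keep following next_token while pages contain documents, and on an empty page either stop (processed equals requests) or wait and retry from the same token, since pending matches appear later. This Python sketch leaves the HTTP call to a caller-supplied fetch_page function (hypothetical; it should perform GET /v1/people/match/batch/:batch_id, adding ?since=<token> when a token is given, and return the decoded JSON). The polling interval and max_polls cap are assumptions, not documented values.

```python
from __future__ import annotations

import time
from typing import Any, Callable, Optional

Page = dict  # decoded JSON body of a batch results response


def fetch_all_results(
    fetch_page: Callable[[Optional[str]], Page],
    since: Optional[str] = None,
    *,
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], Any] = time.sleep,
) -> tuple[list[dict], Optional[str]]:
    """Collect all batch results; return them with the final next_token."""
    results: list[dict] = []
    polls = 0
    while True:
        page = fetch_page(since)
        since = page.get("next_token") or since
        documents = page.get("documents") or []
        results.extend(documents)
        if documents:
            continue  # more pages may follow immediately
        status = page["status"]
        if status["processed"] >= status["requests"]:
            return results, since  # batch complete and fully read
        polls += 1
        if polls > max_polls:
            raise TimeoutError("batch did not finish processing in time")
        sleep(poll_seconds)  # wait for pending matches, then retry
```

Storing the returned token lets a later run resume from where this one stopped instead of rereading the batch.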
Bulk Match Parameters
Parameter | Description | Default |
identifiers | The set of fields that are used to find a match for your record. | |
required_rule_groups | The set of rules the result must match. We recommend setting this either when creating the batch or when fetching results, not both. | |
filter | Reduce potential results with the Filter DSL. | |
limit | Return multiple results up to the given limit. Max of 400 results. | 1 |
Bulk Match Result Parameters
Parameter | Description | Default |
fields | The fields that are returned with the matched document. This overrides any fields specified when the batch was created. | All fields in the Data Dictionary |
include_labels | Get the labels for encoded fields. | false |
since | The token of the earliest result you would like to receive. | |
required_rule_groups | The set of rules the result must match. We recommend setting this either when creating the batch or when fetching results, not both. | |
match_rule_groups_exactly | When true, a result must match one of the required_rule_groups exactly. We recommend setting this either when creating the batch or when fetching results, not both. | false |
packages | Select packages of fields returned in the results. | |
Bulk Match Result Response
The match result includes the following fields:
Field | Description |
next_token | The next token to use when requesting more results. |
documents | The documents that matched your identifiers. If there were no matches, this is null, or an empty array if a limit is provided. |
status.requests | The count of requests in the batch. |
status.processed | The count of requests that have been processed. The batch is complete when this number equals the requests count. |
match_id | The ID of the match request. |
rules | The rules on which your identifiers matched the record. |
score | The score of the match result. Use this to further filter your results. |
person_id | The ID of the matched person record. |
attributes | The fields specified by fields in the request or those specified by your contract. |