Match API v1

Match and enhance your data v1

Getting started

POST /v1/people/match

The Match API v1 uses fuzzy logic with various sets of rules to match and append to your data.

The difference between Search API and Match API is that Search API returns exact results from filters, whereas Match API matches your entities to the Data Axle database.

curl https://qa.api.data-axle.com/v1/people/match -d '{
  "contract": "a86ef85a2ec",
  "identifiers": {
    "first_name": "Joe",
    "last_name": "Smith"
  }
}'

Note: The contract parameter is required if your account has multiple contracts. Omit it if your account has only one contract.

The response includes a persistent person_id along with appended attributes:

{
  "count": 1,
  "document": {
    "person_id": "002293031973",
    "rules": [
      "name",
      "address"
    ],
    "score": 0.82,
    "attributes": {
      "person_id": "002293031973",
      "family_id": "800047540468",
      "first_name": "Joe",
      "last_name": "Smith",
      "city": "Seattle",
      ...
    }
  }
}

GET or POST can be used.
By default, one match is returned. Set the limit option to retrieve multiple matches.
All available fields are included in the output. This can be changed with the fields option.
If a match is not found, a null document is returned.
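
The request above can also be built in Python. This is a minimal sketch, not an official client: `build_match_request` and `extract_document` are hypothetical helper names, the `Content-Type: application/json` header is an assumption, and authentication is left out because this page does not show it.

```python
import json
import urllib.request

# Host taken from the curl example above.
MATCH_URL = "https://qa.api.data-axle.com/v1/people/match"


def build_match_request(identifiers, contract=None, **options):
    """Build (but do not send) a POST request for the Match API."""
    payload = {"identifiers": identifiers, **options}
    if contract is not None:
        # Only needed when the account has multiple contracts.
        payload["contract"] = contract
    return urllib.request.Request(
        MATCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the API expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_document(response_body):
    """Return the matched document, or None when no match was found."""
    return response_body.get("document")


request = build_match_request({"first_name": "Joe", "last_name": "Smith"})
# Sending is left to the caller, for example urllib.request.urlopen(request),
# once the credentials your account requires have been added.
```

Options such as `limit` or `fields` can be passed as keyword arguments and end up as top-level keys in the request body.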

Parameters

Parameter	Description	Default
identifiers	The set of fields used to find a match for your record.
required_rule_groups	The set of rules the result must match.
match_rule_groups_exactly	Whether the rules matched by a result must exactly equal one of the `required_rule_groups`.	`false`
filter	Reduce potential results with the Filter DSL.
fields	The fields returned with the matched document.	Any Data Dictionary fields
include_labels	Get the labels for encoded fields.	`false`
limit	Return multiple results up to the given limit. Max of 400 results.	1
minimum_match_score	Exclude results below a certain match score.
contract	If multiple contracts are present on an account, you must specify a contract ID.	User's current contract if only one exists.
packages	Select packages of fields returned in the results.

Identifiers

The fields you can submit for matching are:

Field	Description	Example
reference_id	Optional field used to reference IDs from your system.	1cd345b729, User 12567
`person_id`	The unique identifier for the person.	123456789000
`first_name`	The first name of the person.	John, Mary Ann, ...
`last_name`	The last name of the person.	Smith
`street`	The location address of the person.	123 N Main St
`mailing_street`	The mailing address of the person or family.	PO Box 123
`suite`	The unit or apartment number of the person.	A
`city`	The city name for the person.	Seattle
`state`	The state of the person.	WA
`postal_code`	The postal code for the person's address.	98134
`phone`	Telephone numbers associated with the family.	(206) 555-1212, 2065551212, ...
`email`	The email address of the person or family.	someone@example.com, example.com, ...
`email_md5`	MD5 hash of the email address.	16d113840f999444259f73bac9ab8b10
`email_sha256`	SHA-256 hash of the email address.	72497f475e4f76d0b28f57c73a084ece576...

See the rule descriptions for which fields are required for each rule to match.

reference_id is not used for matching, but is an optional field that can be used to reference requests using an ID from your own system. It will be returned with the results of batch requests.
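
If you only want to send hashed email addresses, the `email_md5` and `email_sha256` identifiers can be computed locally. The sketch below uses Python's standard `hashlib`; trimming and lowercasing the address before hashing is an assumption, since this page does not say which normalization is expected. `hashed_email_identifiers` is a hypothetical helper name.

```python
import hashlib


def hashed_email_identifiers(email):
    """Return email_md5 and email_sha256 identifiers for an address."""
    # Assumption: trim and lowercase before hashing. Confirm the expected
    # normalization before relying on this.
    normalized = email.strip().lower().encode("utf-8")
    return {
        "email_md5": hashlib.md5(normalized).hexdigest(),
        "email_sha256": hashlib.sha256(normalized).hexdigest(),
    }


identifiers = hashed_email_identifiers(" Someone@Example.com ")
```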

Rules

Rules are automatically selected based on the data provided.

Rule	Description	Notes
person_id	Match by `person_id`. Use this by itself if you already have a `person_id` and would like data appended.	high confidence
name	Match by name. The last_name field is required.
address	Match by street and region (city, state, or postal_code).
phone	Match by phone number. Cell phone numbers will also be matched.	high confidence
email	Match by email address. Any one of `email`, `email_md5`, or `email_sha256` can be provided.	high confidence

A record that matches only a single rule is returned only if that rule is marked "high confidence". The exception is when only one rule was run: for example, if only an address is provided, address matches are included in the result.
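
The rule table can be read as a client-side check of which rules a set of identifiers makes possible. This is a rough sketch, assuming the service selects rules exactly as the table describes and that any one of `city`, `state`, or `postal_code` is enough for the region; `candidate_rules` is a hypothetical helper, not part of the API.

```python
EMAIL_FIELDS = ("email", "email_md5", "email_sha256")
REGION_FIELDS = ("city", "state", "postal_code")


def candidate_rules(identifiers):
    """List the rules that the given identifiers could trigger."""
    rules = []
    if identifiers.get("person_id"):
        rules.append("person_id")
    if identifiers.get("last_name"):
        # The name rule requires last_name.
        rules.append("name")
    has_region = any(identifiers.get(field) for field in REGION_FIELDS)
    if identifiers.get("street") and has_region:
        rules.append("address")
    if identifiers.get("phone"):
        rules.append("phone")
    if any(identifiers.get(field) for field in EMAIL_FIELDS):
        rules.append("email")
    return rules


rules = candidate_rules(
    {"last_name": "Smith", "street": "123 N Main St", "postal_code": "98134"}
)
```

A check like this can flag records that would run only one non-high-confidence rule before they are submitted.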

Score

Match scores are provided to make it easier to compare the similarity of the input to the matched record. Records are scored 0-1, with a higher score indicating greater similarity.

Different use cases may call for different score thresholds. We recommend using matches that have a score of 0.8 or above and were matched using at least two rules. Matching on multiple rules increases match quality.

To exclude low-confidence matches from your results, use the minimum_match_score parameter.

{
  "minimum_match_score": 0.8
}
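
The recommendation above (a score of 0.8 or higher and at least two matching rules) can also be applied on the client after results come back. A minimal sketch, assuming documents shaped like the example responses on this page; `meets_recommendation` is a hypothetical helper name.

```python
def meets_recommendation(document, min_score=0.8, min_rules=2):
    """Check a matched document against the recommended thresholds."""
    if document is None:
        # A null document means no match was found.
        return False
    return document["score"] >= min_score and len(document["rules"]) >= min_rules


example = {"person_id": "002293031973", "rules": ["name", "address"], "score": 0.82}
accepted = meets_recommendation(example)
```

Note that `minimum_match_score` only covers the score part of the recommendation; the rule count has to be checked separately or enforced with `required_rule_groups`.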

Required Rule Groups

Use the required_rule_groups parameter to specify groups of rules. Only results that match one or more of the groups are returned.

{
  "required_rule_groups": [
    ["address", "name"],
    ["address", "phone"]
  ]
}

In this example, a record matches in the following cases:

The record matches the "address" and "name" rules.
The record matches the "address" and "phone" rules.
The record matches the "address", "name", and "phone" rules.

Match Rule Groups Exactly

When match_rule_groups_exactly is true, groups of rules specified in required_rule_groups must match exactly in order to be returned.

{
  "required_rule_groups": [
    ["address", "name"],
    ["address", "phone"]
  ],
  "match_rule_groups_exactly": true
}

In this example, a record matches in the following cases:

The record matches the "address" and "name" rules.
The record matches the "address" and "phone" rules.

A record that matches all three rules, ["address", "name", "phone"], is not returned, because that set does not exactly equal either group.
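
The difference between the two modes can be expressed with set operations. The sketch below mirrors the behavior described in this section and the previous one; it is a client-side illustration, not the service's implementation, and `satisfies_rule_groups` is a hypothetical name.

```python
def satisfies_rule_groups(matched_rules, required_rule_groups, exactly=False):
    """Return True if the matched rules satisfy at least one required group."""
    matched = set(matched_rules)
    for group in required_rule_groups:
        required = set(group)
        if exactly:
            # match_rule_groups_exactly: the rule sets must be identical.
            if matched == required:
                return True
        elif required <= matched:
            # Default: every rule in the group must be among the matched rules.
            return True
    return False


groups = [["address", "name"], ["address", "phone"]]
all_three = ["address", "name", "phone"]
default_mode = satisfies_rule_groups(all_three, groups)
exact_mode = satisfies_rule_groups(all_three, groups, exactly=True)
```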

Filters

The filter parameter restricts results to records that match the specified criteria, using the Filter DSL.

Fields

By default, all fields in the Data Dictionary are included in the output. Use the fields parameter to reduce the number of elements returned:

{
  "fields": ["city", "date_of_birth", "street", "zip", "first_name", "last_name"]
}

Some contact information, including email addresses, will be used for matching, but will not be included with the matched record. Some data may be suppressed for certain records and will not be returned.

Include Labels

The fields returned within records frequently contain encoded values that reference lookup data. To retrieve the labels for lookups, add the include_labels option:

{
  "include_labels": true
}

Read the Lookups API documentation for more information.

Limit

By default, one match is returned. Specify a limit parameter to retrieve multiple matches. When limit is specified, an array of documents is returned.

{
  "identifiers": {
    "email": "smith-family@example.com"
  },
  "limit": 3
}

{
  "count": 2,
  "documents": [
    {
      "person_id": "002293031973",
      "rules": [
        "email"
      ],
      "score": 1.0,
      "attributes": {
        "person_id": "002293031973",
        "family_id": "800047540468",
        "first_name": "Joe",
        "last_name": "Smith",
        "city": "Seattle"
      }
    },
    {
      "person_id": "200089221963",
      "rules": [
        "email"
      ],
      "score": 1.0,
      "attributes": {
        "person_id": "200089221963",
        "family_id": "800047540468",
        "first_name": "Jamie",
        "last_name": "Smith",
        "city": "Seattle"
      }
    }
  ]
}
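
Because the response shape changes with `limit` (a single `document` without it, a `documents` array with it), client code may want to normalize both into a list. A minimal sketch, assuming the response shapes shown on this page; `matched_documents` is a hypothetical helper name.

```python
def matched_documents(response_body):
    """Return matches as a list, whichever response shape was received."""
    if "documents" in response_body:
        # Shape returned when `limit` is specified.
        return response_body["documents"] or []
    document = response_body.get("document")
    # Shape returned by default; a null document means no match.
    return [] if document is None else [document]


single = matched_documents({"count": 1, "document": {"person_id": "002293031973"}})
```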

Packages

Select packages of fields by providing a package param_key to the packages parameter. By default, every field in a package is returned. Combine the packages parameter with the fields parameter to return only the specified fields.

{
  "packages": ["base_v1", "emails_v2"]
}

Bulk Match

Use bulk requests to process large volumes of match requests in a batch:

Create a batch
Add match requests
Retrieve results

Create a Batch

POST /v1/people/match/batch

Start by creating a batch. The initial request can include up to 1,000 match requests:

curl -XPOST https://qa.api.data-axle.com/v1/people/match/batch -d '{
  "identifiers": [
    {
      "reference_id": "123",
      "first_name": "John",
      "last_name": "Smith",
      "street": "123 Main St",
      "city": "Seattle",
      "state": "WA",
      "postal_code": "98103"
    },
    {
      "reference_id": "125",
      "email": "johnson-family@example.com"
    }
  ]
}'

The immediate response includes a batch_id and an array of objects containing a match_id. Each match_id is returned in the order the match request identifiers were submitted:

{
  "batch_id": "8a140451fe3f095f1c205cf185efffec",
  "matches": [
    {
      "match_id": "5cbb3b15c8bee0f706239b45cb763fed"
    },
    {
      "match_id": "5f0b1070c3e4761cabf116bbed2b49c4"
    }
  ]
}

Adding Requests

PUT /v1/people/match/batch/:batch_id

Add match requests to an existing batch_id:

curl -XPUT https://qa.api.data-axle.com/v1/people/match/batch/:batch_id -d '{
  "identifiers": [...]
}'

Millions of match requests can be added to a batch. However, only 1,000 match requests are allowed per API request.
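
To respect the limit of 1,000 match requests per API request, a large list of identifiers can be split into chunks before it is submitted. A minimal sketch; `chunked` is a hypothetical helper name.

```python
def chunked(identifiers, size=1000):
    """Yield consecutive slices of at most `size` identifiers."""
    for start in range(0, len(identifiers), size):
        yield identifiers[start:start + size]


# Illustrative records only; reference_id values come from your own system.
records = [{"reference_id": str(n), "last_name": "Smith"} for n in range(2500)]
chunk_sizes = [len(chunk) for chunk in chunked(records)]
```

The first chunk would go in the POST that creates the batch, and each remaining chunk in a PUT to the same batch_id.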

Getting Bulk Match Results

GET /v1/people/match/batch/:batch_id

Use the Match Results API with the batch_id to fetch completed results for the batch. Matches that are pending will not appear in the results.

Use the "status" object to determine batch progress. The batch has completed when processed is the same as requests.

{
  "next_token": "13835315192676945401741312",
  "status": {
    "requests": 500,
    "processed": 223
  },
  "documents": [
    {
      "match_id": "5cbb3b15c8bee0f706239b45cb763fed",
      "reference_id": "123",
      "document": {
        "person_id": "002293031973",
        "rules": ["name", "address"],
        "score": 0.75,
        "attributes": {
          "person_id": "002293031973",
          "family_id": "800047540468",
          "first_name": "Joe",
          "last_name": "Smith",
          "city": "Seattle"
        }
      }
    },
    {
      "match_id": "5f0b1070c3e4761cabf116bbed2b49c4",
      "reference_id": "125",
      "document": null
    }
  ]
}
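
The completion check on the status object can be written as a small helper. A sketch assuming the status shape shown above; `batch_progress` is a hypothetical name.

```python
def batch_progress(status):
    """Return (is_complete, fraction_processed) for a batch status object."""
    requests = status["requests"]
    processed = status["processed"]
    # Guard against division by zero for a batch with no requests.
    fraction = processed / requests if requests else 1.0
    return processed == requests, fraction


complete, fraction = batch_progress({"requests": 500, "processed": 223})
```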

Scrolling Through Match Results

Each request returns up to 1,000 results. To read the next set of results, use the next_token from the previous request and append it to the request URL via the since parameter:

curl 'https://qa.api.data-axle.com/v1/people/match/batch/:batch_id?since=13835315192676945401741312'

Repeat this process until an empty list of documents is returned. Store the final next_token for use in future requests from the same batch:

{
  "next_token": "13835315192676945401741312",
  "status": {
    "requests": 500,
    "processed": 500
  },
  "documents": []
}
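
The scrolling loop can be sketched as follows. To keep the example free of network and authentication details, the HTTP call is passed in as `fetch_page`, a hypothetical caller-supplied function that performs the GET request (adding the since parameter when a token is given) and returns the decoded JSON body. `collect_batch_results` is also a hypothetical name.

```python
def collect_batch_results(fetch_page, since=None):
    """Read result pages until an empty documents list is returned.

    Returns the collected results and the last next_token seen, which can
    be stored and passed as `since` in a later call.
    """
    results = []
    while True:
        page = fetch_page(since)
        # Keep the previous token if a page does not include one.
        since = page.get("next_token", since)
        documents = page.get("documents") or []
        if not documents:
            return results, since
        results.extend(documents)


# Stand-in for the real HTTP call, keyed by the `since` token.
fake_pages = {
    None: {"next_token": "t1", "documents": [{"match_id": "a"}, {"match_id": "b"}]},
    "t1": {"next_token": "t2", "documents": [{"match_id": "c"}]},
    "t2": {"next_token": "t2", "documents": []},
}
results, last_token = collect_batch_results(lambda since: fake_pages[since])
```

An empty page ends the loop even if the batch is still processing, since pending matches do not appear in the results. Compare status.processed with status.requests to decide whether to call again later with the stored token.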

Bulk Match Parameters

Parameter	Description	Default
identifiers	The set of fields that are used to find a match for your record.
required_rule_groups	The set of rules the result must have to match. We recommend doing this either at batch creation time or result time, not both.
filter	Reduce potential results with a filter.
limit	Return multiple results up to the given limit. Max of 400 results.	1

Bulk Match Result Parameters

Parameter	Description	Default
fields	The fields that are returned with the matched document. This overrides any fields that were created with the batch.	All fields in the Data Dictionary
include_labels	Get the labels for encoded fields.	`false`
since	The token of the earliest result you would like to receive.
required_rule_groups	The set of rules the result must match. We recommend doing this either at batch creation time or result time, not both.
match_rule_groups_exactly	Whether the rules matched by a result must exactly equal one of the `required_rule_groups`. We recommend doing this either at batch creation time or result time, not both.	`false`
packages	Select packages of fields returned in the results.

Bulk Match Result Response

The match result includes the following fields:

Field	Description
next_token	The next token to use when requesting more results.
documents	The documents that matched your identifiers. If there were no matches, this will be null, or an empty array if a `limit` is provided.
status.requests	The count of requests in the batch.
status.processed	The count of requests that have been processed. The batch is complete when this number equals the `requests` count.
match_id	The ID of the match request.
rules	The rule that provided the best match to your identifiers.
score	The score of the match result. Use this to further filter your results.
person_id	The ID of the matched record, included with each match.
attributes	The fields specified by `fields` in the request or those specified by your contract.