Importing Data

Understanding Client Matching Using the Data Import Tool (DIT)

The Data Import Tool (DIT) matching algorithm uses the Jaro-Winkler string metric to measure the similarity, or "distance," between two strings. During an import, the DIT compares string data from the source file with the data currently stored in a Clarity Human Services instance to determine whether a record matches an existing client. Because Jaro-Winkler tolerates minor spelling differences and gives extra weight to strings that begin with the same characters, it is highly effective at identifying logical matches during the import process.
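To make the metric concrete, here is a minimal Python sketch of the standard Jaro-Winkler similarity, where 1.0 means identical and 0.0 means no characters in common. It uses the commonly published defaults of a 0.1 prefix scale and a maximum prefix length of 4. Whether the DIT uses these exact parameters, or normalizes case and whitespace before comparing, is an assumption this sketch does not settle.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings (0.0 to 1.0)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they lie within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    half_transpositions = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    transpositions = half_transpositions / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (up to max_prefix chars)."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
print(round(jaro_winkler("DWAYNE", "DUANE"), 4))   # 0.84
```

The prefix boost is what makes the Winkler variant suit names: two spellings that agree on their first few letters score higher than two that differ at the start.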

When a file is loaded into the DIT and client matching is not turned off completely (No Matching), the DIT takes the external_id of each record in the imported file(s) and tries to match it against the external_ids stored in the Clarity Human Services database import mapping tables.

If the external_id exists in the Clarity Human Services database mapping tables, the record was already matched in a previous import, so the importer updates that record within Clarity Human Services.

If the external_id is not found, it will look at the client record contents to determine a match, as explained below.
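The external_id check can be pictured as a dictionary lookup that runs before any content-based matching. In this sketch, `mapping` is a hypothetical stand-in for the DIT import mapping tables (external_id to Clarity client ID), and the mode names `"none"`, `"full"`, and `"regular"` are illustrative labels for the matching settings, not actual DIT configuration values.

```python
from typing import Dict, Optional, Tuple


def route_record(record: Dict[str, str], mapping: Dict[str, int],
                 matching_mode: str) -> Tuple[str, Optional[int]]:
    """Decide what to do with one imported record (illustrative only).

    Returns ("create", None), ("update", client_id), or
    ("match_on_contents", None).
    """
    # With No Matching, every record becomes a new client.
    if matching_mode == "none":
        return ("create", None)
    # An external_id seen in a previous import maps straight to its client.
    client_id = mapping.get(record["external_id"])
    if client_id is not None:
        return ("update", client_id)
    # Otherwise fall through to Full or Regular Matching on record contents.
    return ("match_on_contents", None)


mapping = {"SRC-001": 1017}  # hypothetical external_id -> client ID
print(route_record({"external_id": "SRC-001"}, mapping, "regular"))  # ('update', 1017)
print(route_record({"external_id": "SRC-999"}, mapping, "full"))     # ('match_on_contents', None)
```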

Full Matching 

If Full Matching is enabled, the matching algorithm uses the following fields from the imported file to search for an existing match within the Clarity Human Services database:

  • ssn_quality
  • ssn
  • name_quality
  • first_name
  • last_name
  • dob_quality
  • birth_date

If every one of the above fields matches exactly (100%) what is currently stored in Clarity Human Services for a client record, that client record is updated with the client information being imported. If no match is found, a new client record is created from the file's information.
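Full Matching amounts to an all-fields equality test. The sketch below assumes records are plain dictionaries keyed by the field names listed above and that values are compared exactly as stored, with no case or whitespace normalization. `clients` is a hypothetical in-memory list standing in for the Clarity Human Services database.

```python
from typing import Dict, List, Optional

FULL_MATCH_FIELDS = (
    "ssn_quality", "ssn", "name_quality", "first_name",
    "last_name", "dob_quality", "birth_date",
)


def find_full_match(record: Dict[str, object],
                    clients: List[Dict[str, object]]) -> Optional[Dict[str, object]]:
    """Return the first stored client whose seven fields all equal the record's."""
    for client in clients:
        if all(record.get(f) == client.get(f) for f in FULL_MATCH_FIELDS):
            return client
    return None  # no exact match: the importer would create a new client


stored = [{"id": 1, "ssn_quality": 1, "ssn": "123456789", "name_quality": 1,
           "first_name": "John", "last_name": "Smith", "dob_quality": 1,
           "birth_date": "1980-05-17"}]
incoming = {k: v for k, v in stored[0].items() if k != "id"}
print(find_full_match(incoming, stored)["id"])  # 1: update this client
incoming["first_name"] = "Jon"
print(find_full_match(incoming, stored))        # None: create a new client
```

A single differing character, as in "John" versus "Jon", is enough to defeat Full Matching, which is the gap the similarity-based Regular Matching is meant to cover.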

Regular Matching 

Regular Matching runs a multi-step comparison to determine a match, using the Jaro-Winkler string metric for some of the fields. The steps are outlined below, starting from when the file is introduced.

1. The importer first looks for the following conditions in the imported file:

ssn_quality = 1
AND
ssn is not NULL

If the above conditions are met and a matching Clarity Human Services client is found, that client record is updated with the client information from the file. If not, the process continues.

2. If step 1 does not produce a match, either because its conditions are not met or because no matching client is found, the following conditions are checked:

ssn is not NULL
AND
last_name is not NULL
AND
dob_quality = 1
AND
birth_date is not NULL

If these conditions are met, the importer performs a Clarity Human Services client search on the following fields, using the Jaro-Winkler metric to determine a match:

  • last four digits of SSN
  • last_name
  • birth_date

If the above conditions are met and a matching Clarity Human Services client is found, that client record is updated with the client information from the file. If not, the process continues.

3. If step 2 does not produce a match, either because its conditions are not met or because no matching client is found, the following conditions are checked:

ssn is not NULL
AND
last_name is not NULL
AND
first_name is not NULL

If these conditions are met, the importer performs a Clarity Human Services client search on the following fields, using the Jaro-Winkler metric to determine a match:

  • last four digits of SSN
  • last_name
  • first_name

If the above conditions are met and there is a Clarity Human Services match, the Clarity Human Services client record is updated with the client information from the file.  If not, a new client record is created.
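Taken together, the three steps form a cascade that stops at the first match. The Python sketch below is a hypothetical model of that cascade, not the DIT's actual code. It assumes that step 1 compares full SSNs exactly, that steps 2 and 3 require every compared value (last four SSN digits, last_name, and either birth_date or first_name) to reach a hypothetical Jaro-Winkler score of 0.9, and that names are lowercased before comparison. Records are plain dictionaries, with None standing in for NULL.

```python
from typing import Any, Dict, List, Optional, Sequence

Record = Dict[str, Any]
THRESHOLD = 0.9  # hypothetical cut-off; the DIT's real threshold is not stated here


def jaro_winkler(s1: str, s2: str) -> float:
    """Standard Jaro-Winkler similarity (prefix scale 0.1, max prefix 4)."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used = [False] * len(s2)
    matched1 = []  # matched characters of s1, in s1 order
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len(s2))):
            if not used[j] and s2[j] == ch:
                used[j] = True
                matched1.append(ch)
                break
    m = len(matched1)
    if m == 0:
        return 0.0
    matched2 = [s2[j] for j in range(len(s2)) if used[j]]
    t = sum(a != b for a, b in zip(matched1, matched2)) / 2
    jaro = (m / len(s1) + m / len(s2) + (m - t) / m) / 3
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * 0.1 * (1 - jaro)


def last4(ssn: Any) -> str:
    """Last four digits of an SSN, ignoring dashes or spaces."""
    return "".join(c for c in str(ssn) if c.isdigit())[-4:]


def fuzzy_search(record: Record, clients: List[Record],
                 fields: Sequence[str]) -> Optional[Record]:
    """Assumed rule: every compared value must score >= THRESHOLD; best wins."""
    best, best_score = None, 0.0
    for client in clients:
        if client.get("ssn") is None or any(client.get(f) is None for f in fields):
            continue
        scores = [jaro_winkler(last4(record["ssn"]), last4(client["ssn"]))]
        scores += [jaro_winkler(str(record[f]).lower(), str(client[f]).lower())
                   for f in fields]
        score = min(scores)
        if score >= THRESHOLD and score > best_score:
            best, best_score = client, score
    return best


def regular_match(record: Record, clients: List[Record]) -> Optional[Record]:
    """Steps 1-3 of Regular Matching; None means a new client is created."""
    ssn = record.get("ssn")
    # Step 1: ssn_quality = 1 AND ssn is not NULL (exact SSN comparison assumed).
    if record.get("ssn_quality") == 1 and ssn is not None:
        for client in clients:
            if client.get("ssn") == ssn:
                return client
    # Step 2: ssn, last_name, birth_date not NULL AND dob_quality = 1.
    if (ssn is not None and record.get("last_name") is not None
            and record.get("dob_quality") == 1
            and record.get("birth_date") is not None):
        found = fuzzy_search(record, clients, ("last_name", "birth_date"))
        if found is not None:
            return found
    # Step 3: ssn, last_name, first_name not NULL.
    if (ssn is not None and record.get("last_name") is not None
            and record.get("first_name") is not None):
        return fuzzy_search(record, clients, ("last_name", "first_name"))
    return None


clients = [{"id": 1, "ssn": "123456789", "first_name": "William",
            "last_name": "Smith", "birth_date": "1980-05-17"}]
incoming = {"ssn_quality": 2, "ssn": "987-65-6789", "dob_quality": 2,
            "first_name": "Will", "last_name": "Smith", "birth_date": None}
print(regular_match(incoming, clients)["id"])  # 1 (matched in step 3)
incoming["first_name"] = "Bill"
print(regular_match(incoming, clients))        # None (new client)
```

Taking the minimum field score, rather than an average, is a conservative choice for this sketch: one weak field is enough to reject a candidate.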

No Matching Enabled

With No Matching enabled, the entire matching process outlined above is skipped, and every client record in the file is written to the Clarity Human Services database as a new client record.

Note: the Jaro-Winkler metric detects most minor differences in client names. For example, comparing first names such as "John" to "Jon" or "Will" to "William" produces a match. However, pairs such as "Bill" and "William" or "Jim" and "James" do not match, because the strings differ too much. Keep in mind that when other fields (SSN, DOB, etc.) are compared in addition to the client name, matching accuracy increases substantially, and the DIT is far more reliable at finding existing matches within the system.
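The name examples in the note can be checked with a short script. It uses a standard Jaro-Winkler implementation (prefix scale 0.1, maximum prefix 4) on lowercase names. The DIT's actual parameters and match threshold are not stated here, so the scores illustrate relative similarity rather than the DIT's exact decisions.

```python
def jaro_winkler(s1: str, s2: str) -> float:
    """Standard Jaro-Winkler similarity (prefix scale 0.1, max prefix 4)."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used = [False] * len(s2)
    matched1 = []  # matched characters of s1, in s1 order
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len(s2))):
            if not used[j] and s2[j] == ch:
                used[j] = True
                matched1.append(ch)
                break
    m = len(matched1)
    if m == 0:
        return 0.0
    matched2 = [s2[j] for j in range(len(s2)) if used[j]]
    t = sum(a != b for a, b in zip(matched1, matched2)) / 2
    jaro = (m / len(s1) + m / len(s2) + (m - t) / m) / 3
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * 0.1 * (1 - jaro)


pairs = [("john", "jon"), ("will", "william"), ("bill", "william"), ("jim", "james")]
for a, b in pairs:
    print(f"{a} vs {b}: {jaro_winkler(a, b):.2f}")
# john vs jon: 0.93
# will vs william: 0.91
# bill vs william: 0.73
# jim vs james: 0.72
```

The pairs the note describes as matching score about 0.91 to 0.93, while "Bill"/"William" and "Jim"/"James" fall to about 0.72 to 0.73: nicknames that change the leading letters share fewer characters in position and earn little or none of the prefix bonus.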