Saturday, January 26, 2013

Problems with leading and trailing blank spaces in CRM data

I recently ran into a number of problems that ended all up being due to blankspaces in some CRM data. I thought of sharing where these problems come from, how blank spaces can affect CRM functionality and how you can resolve them.

You might wonder: how can you end up with leading or trailing blank spaces in text fields in CRM? While it is true that CRM forms are smart enough to remove these blank spaces upon form save, it might not be the case when the data comes in from data import or during integrations with external systems. It sounds like a small problem, but if you are not careful, blank spaces can become a data integrity nightmare for the following reasons:

1. You might end up with duplicate data that is hard to resolve later on. For example you might have account “123” and account “ 123 “ which reference to your same customer. Now all your activities, invoices and all related records are potentially spread across 2 different records. Fortunately, duplicate detection ignores leading/trailing blank spaces so these 2 records would be considered duplicate if you have a published rule on account number field. However, if these accounts are created from the SDK (or through an integration layer), chances are that the duplicate detection is not enforced because when making web service calls duplicate detection is off by default unless specified in the create/update message. Fixing duplicate parent records can be a really long and boring task.

2. Assume you have multiple contacts related to account “123” that you want to import using the data import wizard. However, your account appears in CRM as “ 123 “ (with blank spaces). It will be impossible for import wizard to match the contact’s parent account and import will fail because the parent account was not found. You might say: I can just change my import spreadsheet to reference account “ 123 “ instead of “123”. The answer is: it does not matter if in your import XML file you enter “123” or “ 123 “ as the parent account, CRM will not be able to resolve the parent account!! In short: You cannot use the out-of-the-box data import feature to import related records to that account until you fix the blank spaces, argh.

3. Once you fix all your integration points to trim all text fields before committing a create / update to CRM, how do you clean the thousands of records that already exist? My first guess was: Export to Excel for re-import, use some Excel functions to remove blank spaces and import back to CRM. Again to my surprise, removing leading/trailing blankspaces in text fields is not considered a change for CRM so during import, the records were not even processed! When you export records for re-import, CRM is smart enough to know that if you did not update a row, it will not re-import that row. However, it is not smart enough to figure out that you removed blank spaces so it also ignores your update.

4. Sometimes the CRM forms trim blank spaces for text fields. This can become a problem because if your data has trailing blank spaces, the CRM form will remove then on load, and automatically the form is marked as dirty! Therefore, just by opening the record your form is dirty and you get a warning about pending changes when you close the form even if you did not update the record. It also prevents some functionality since some of our ribbon buttons require the form to not be dirty. It is also not evident what was happening and took us a while to figure out why some forms where always getting marked as dirty.

 

We let our production CRM get really messy because we did not notice this problem until too late (we never noticed the external systems were sending data padded with blank spaces) so cleansing the data became quite challenging since it was spread to multiple fields in multiple entities, many duplicate parent and child records. As I mentioned earlier, fortunately duplicate detection does consider these 2 cases as duplicates (“123” and “ 123 “) so you can execute multiple duplicate detection jobs to find and resolve duplicates. However, it can be a really long task to do. In our case, because of the magnitude of the data cleanse that was required we had to invest in a data cleansing tool that would iterate through multiple records resolving the blank spaces and the duplicates which reduced considerably the amount of manual work required to solve the problem. The “tool” in a nutshell consisted simply in a custom workflow activity that takes as input a CRM query (Rollup Query) and then retrieves and cleans all the records specified as input. Then we can use on-demand workflows and provide different queries to resolve duplicates in multiple entities. In any case our conclusion is that we should have thought about this from the beginning and I hope this post helps other avoid the same problem before it is too late.

3 comments:

  1. Excellent, but if there are already duplicates in my system then its really a huge task. Thanks for the information

    ReplyDelete