Monday, June 1, 2015

Scalability considerations for CRM / SharePoint integration

SharePoint is probably the most natural and easiest-to-implement document management solution for CRM records. However, what happens when you have millions of records in CRM that might have documents?

I recently worked on a fairly large document migration solution to associate documents with CRM records (e.g. cases, opportunities). There were a few hundred thousand files amounting to about 50GB of data. Here are a few things to consider.


With the out-of-the-box integration you can either structure your folders based on accounts/contacts or simply create a new library for each entity at the root of the site. SharePoint might be more convenient to navigate if you can start from an account and then find its related cases/opportunities, rather than starting with a global list of cases and trying to find which ones are related to a specific account. In general, the account/contact-based structure is more convenient, until you consider security segregation.

Suppose you need to segregate SharePoint security so that some users can access documents of one entity type (e.g. cases) but not documents of another (e.g. opportunities). In that case the account/contact hierarchy becomes a problem, because you cannot set the security at the library level (all documents would be in the same library). You would have to do some acrobatics with folder security and broken inheritance, which would be a nightmare to maintain. If instead each entity type has its own SharePoint library, you can easily grant or deny access to documents of a given entity type.

Nonetheless, remember that with the out-of-the-box CRM/SharePoint integration there is no security synchronization between CRM and SharePoint. You need to keep this in mind throughout your design. For example, a user might not be able to access any opportunities in CRM, but that user (if malicious) can still see the documents associated with CRM opportunities by going directly to SharePoint, unless you block the user completely from the corresponding SharePoint library.

Also consider that changing the folder hierarchy afterwards would be a major data migration effort, so you should really think about which hierarchy makes the most sense for your situation, taking current and future security requirements into account.


Imagine that you have 100K cases in your system and you have enabled SharePoint integration for the case entity, and that every case has at least one document. Your case library/folder will then contain 100K items, flat, in the same list. This goes well beyond what SharePoint recommends for scalability and performance: it is not recommended to go beyond 5,000 items (and even that is already quite high). Of course, you can always implement some sort of archiving, or use multiple SharePoint sites selected by some criteria, so that you split this load. Another reasonably easy solution is to further structure your folders by year, quarter or month (or all of these). This way you will not end up with 100K folders under the “cases” folder. Instead, the maximum number of folders under a single folder will be the maximum number of cases that can be opened in a given month or quarter, which might be a more reasonable number.
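Before choosing a split, it can help to check how full the busiest folder would actually get. The following is only a minimal sketch, assuming you can export the dates on which your cases were opened. The function name, the granularity values, and the constant are illustrative and not part of CRM or SharePoint.

```python
from collections import Counter
from datetime import date

# The figure quoted in the text; treat it as a guideline, not a hard rule.
RECOMMENDED_MAX_ITEMS = 5000


def largest_bucket(opened_dates, granularity="month"):
    """Return (bucket_key, count) for the fullest folder under a given split.

    Returns (None, 0) when there are no dates at all.
    """
    def key(d):
        if granularity == "year":
            return (d.year,)
        if granularity == "quarter":
            return (d.year, (d.month - 1) // 3 + 1)
        if granularity == "month":
            return (d.year, d.month)
        raise ValueError(f"unknown granularity: {granularity}")

    counts = Counter(key(d) for d in opened_dates)
    if not counts:
        return (None, 0)
    return counts.most_common(1)[0]


if __name__ == "__main__":
    # Tiny hypothetical sample; in practice this would be an export from CRM.
    sample = [date(2015, 1, 5), date(2015, 1, 20), date(2015, 2, 1), date(2014, 12, 31)]
    bucket, count = largest_bucket(sample, "month")
    print(bucket, count, "OK" if count <= RECOMMENDED_MAX_ITEMS else "too many")
```

If even the monthly split produces buckets near the limit, that is a hint that you need an extra layer or a different criterion altogether.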

In this example (the cases folder, then a year folder, then a month folder, then the folder of the case itself), we have two new layers: Year and Month. These correspond to the date on which the case was opened. By adding these layers, we can be confident that no folder will have more than a few hundred sub-folders, since we know that we only open a few hundred cases per month.
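The path for such a structure can be derived from the case itself. Here is a minimal sketch, assuming the case folder is named after a case number and that the root folder is called "cases". Both are illustrative choices, and the function names are hypothetical.

```python
from datetime import date


def case_folder_path(case_number: str, opened_on: date, root: str = "cases") -> str:
    """Build root/YYYY/MM/case_number from the date the case was opened."""
    return f"{root}/{opened_on.year:04d}/{opened_on.month:02d}/{case_number}"


def folders_to_create(path: str) -> list:
    """List every folder level, parents first, so each one can be created in order."""
    parts = path.split("/")
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


if __name__ == "__main__":
    path = case_folder_path("CAS-01234", date(2015, 6, 1))
    for folder in folders_to_create(path):
        print(folder)
```

Zero-padding the month keeps the folders sorted chronologically when SharePoint lists them by name.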

The downside is that this cannot be done by simple configuration or with the out-of-the-box integration. This structure requires that you register a plugin on the creation of a case, which will create the document locations and the SharePoint folder for the case being created. Some other disadvantages of this approach are:

- Every case will now have a folder, even if the case has no documents. This should not be a problem if you know that all cases have documents anyway.

- You are creating the SharePoint folder at the same time as you create the case, instead of the OOB behavior, which is to create the folder on demand the first time the document library is accessed in CRM. This is not necessarily a bad thing though.

- If your plugin is synchronous and it fails (e.g. SharePoint is down), it will prevent the creation of the case in CRM. If your plugin is asynchronous and it fails, you need a way to recover and create the correct folder when the user tries to access the document library for this record.
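The recovery path for the asynchronous case can be sketched as follows. This is not CRM plugin code: `FolderStore` is a hypothetical in-memory stand-in for SharePoint, and the `pending` dictionary stands in for whatever place you would use to remember failed creations. It only illustrates the logic of "never block case creation, retry when the user opens the documents".

```python
class FolderStore:
    """Hypothetical stand-in for SharePoint; creating a folder is idempotent."""

    def __init__(self):
        self.folders = set()
        self.available = True

    def ensure_folder(self, path):
        if not self.available:
            raise ConnectionError("SharePoint unavailable")
        self.folders.add(path)


def on_case_created(store, pending, case_id, path):
    """Async-style handler: a failure is recorded instead of blocking the case."""
    try:
        store.ensure_folder(path)
        return True
    except ConnectionError:
        pending[case_id] = path  # remember what still has to be created
        return False


def on_documents_opened(store, pending, case_id):
    """Recovery: retry the folder creation when the user opens the documents."""
    path = pending.get(case_id)
    if path is None:
        return True  # nothing pending, the folder was created earlier
    store.ensure_folder(path)  # may raise again if SharePoint is still down
    del pending[case_id]
    return True
```

A synchronous plugin would simply let the exception propagate from the creation handler, which is exactly what blocks the case from being created.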


For sites with a large volume of documents, large document sizes or rapidly growing volumes, you might also need to consider how long you have until you run into a performance or limitation problem with things like the maximum size of your content database, the maximum number of files in the library or simply the maximum size of your site. Office 365 (O365) also has some limitations that you need to review. If you identify this as a potential problem, you should consider implementing some sort of archival solution that keeps a link between CRM records and SharePoint files while optimizing for current and most frequently used records. I have heard of people simply pointing CRM to a new site every now and then, once the old site is getting too large. I guess any of these strategies would work as long as you have defined a process to scale your site and you monitor the volumes regularly. The most important thing is to consider this during your design phase and to have identified your approach to handling scalability, even if you do not implement a long-term maintenance strategy right away.
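A rough projection of how long you have can be done with simple arithmetic. The sketch below assumes linear growth, which is a simplification, and the numbers in the usage example other than the 50GB starting point are placeholders. Substitute the limits that apply to your own SharePoint or O365 environment.

```python
import math


def months_until_limit(current, monthly_growth, limit):
    """Whole months until `current` reaches `limit`, assuming linear growth.

    Returns 0 if the limit is already reached, and None if there is no growth.
    All three arguments must use the same unit (GB, number of files, ...).
    """
    if current >= limit:
        return 0
    if monthly_growth <= 0:
        return None
    return math.ceil((limit - current) / monthly_growth)


if __name__ == "__main__":
    # 50 GB today (as in the migration above), with a placeholder growth of
    # 5 GB per month and a placeholder limit of 200 GB.
    print(months_until_limit(50, 5, 200))
```

Running the same calculation for each limit you care about (database size, file count, site size) shows which one you will hit first.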