Author: Igor Petelin, Software Engineer
When migrating data to SugarCRM from another system one can frequently face the problem of incorrect data (for example, website, email, telephone number):
The reason of the issue is that the control over CRM data input is insufficient. A user may enter irrelevant email address in a hurry and a customer will never get an email sent to him/her. So, what is the way out?
Let me describe the way of checking the data correctness, as an example we’ll take email address, website and telephone number. Data check can occur either during daily (weekly, monthly) data import to SugarCRM from an old system or in the existing database. So, let’s begin.
Verification of an email address correctness in the existing SugarCRM database by the example of Accounts module
For the beginning we need to pick up data from email_addresses table of SugarCRM database, since it’s the place where all email addresses are stored. Here we need to apply tMysqlInput component. It picks up data from MySQL database.
Let’s move on to its configuring. We need to set database connection parameters, table name and scheme. Since retrieve only 2 fields, the scheme can be created manually. After you have created the scheme and the table’s name, click Guess Query. Needed request will be generated in Query field.
Below are the results of our actions:
Then we place tMap and tMysqlOutput on the palette and combine them. You have probably already understood, tMysqlOutput is a component that records data into MySQL database. Let’s configure it:
We will set the same connection parameters as for tMysqlInput, in particular the name of email_addresses table. Then we need to indicate that we will update data, so we select «Update» form the drop-down list of parameter «Action on data». Next, we will configure the scheme: here we have only 2 fields – id and invalid_email. The invalid_email field is used in SugarCRM for marking an incorrect email address. An incorrect email address is displayed in the interface as follows:
When updating data in the table we need to indicate a key. It’s a filed in which an update will take place. In our case it’s the id filed.
Let’s move on to the most interesting part. It’s time to create a particular function in Talend. We will use it to verify email address correctness. So, we can create functions using Routines.
Routines is parts of code that can be used many times. They are written in java language and are used for Talend functionality extension. For creating Routines we need to go to «Code» in the menu on the left, click «Routines» with the right button and select «Create routine».
Then we indicate the name and click «Finish». A java class with one method helloExample will be created. It is of no interest to us, so you can delete this part of the code, instead we are going to create a new method, which will check the correctness of an email address by the pattern.
Here is what we’ve got:
I’m not going to discuss in detail the code part, but I have to note: here we verify the first argument of email_address with a set pattern in the second email_pattern parameter, or with a standard pattern (if the second parameter is empty), which we can set in the function’s body. The function returns true, if the email address is relevant and false – if not. Which pattern to choose is up to you.
Now it’s time to apply the function we created. For this we need to open tMap. Click «Activate expression filter» over the output flow and create a condition on which we will update data in the table.
Let me explain, what happens at this stage: we set value true in invalid_email filed, if an email address (email_address) does not suit the provided pattern. After running the utility we get the following result:
As you can see, the process is quite simple. Now we can go to verifying the correctness of a customer’s website.
Verifying website correctness in the existing SugarCRM database on the example of Accounts module
First of all, we need to place three components on the design window, which we used for email addresses verification. The only thing we have to change is the table’s name and the column’s name, which we are going to retrieve. In SugarCRM Accounts module website address is stored in website field.
We will not check the correctness of the email address, but we will look if such website exists. In order to check a website’s existence, we will use nslookup utility. Nslookup allows a user turn to DNS system. The utility is usually is built in the OS.
You are likely to wonder now: “How can we run this utility from Talend?”. It’s a piece of cake! For this purpose tSystem component is provided in Talend. tSystem’s purpose is to call one or more system programs. Let’s place it on the design window of the components. But it’s not all. There is another component to be added to our software, it is tFlowToIterate. tFlowToIterate generates global variables from input data flow.
Now when all the components are on the design window, we can combine them:
tFlowToIterate needs no configuring, but in tSystem we need to set parameters, so we open the component’s parameters and type the call of nslookup utility. We type a website address as a parameter for this utility:
((String)globalMap.get(“row3.website”)) is our parameter of a website. It is in fact our global variable generated due to tFlowToIterate.
The result of nslookup utility call for a website somebad.site is:
can’t find somebad.site: Non-existent domain
Let’s move on to tMap and configure the output expression filter:
Let me explain:
tSystem_1_ERROROUPUT is a global variable which contains an error of system utility run. Here we check if the message about the error is not empty, and in case it is not empty, we look whether it contains «Non-existent domain» phrase with the help of java – contains function.
If there is an error, we set null value for the website field and log it into the database. Thus, our SugarCRM database will include only virtually existing sites.
You can see from the screenshot that everything has worked out as planned. Our utility has cleared off the non-existing website. Now let’s go to phone number verification.
Verifying phone nymber correctness in the existing SugarCRM database on the example of Accounts module
Our task is to either scrub incorrect phone numbers or to convert a phone number to an international format and to record it in the database instead of an existing one. Let’s see how this task can be done with the help of libphonenumber library. The library is not built into Talend, so it can be found on the Internet.
So, let’s begin! At first, we need to connect the library via tLibraryLoad component. tLibraryLoad allows importing outside libraries to the project. We need to place the component on design window and will set the way to the library. Next we will put all familiar components for data reading and recording to database and tMap. Then we need to place tJavaRow on the design window. tJavaRow is a component which allows integrating user’s code and Job. Let’s combine the components as shown below:
Note: if you use tLibraryLoad component, it is recommended to place it from the very beginning.
Now it’s turn to configure tMysqlInput and tMysqlOutput components for our database, detail the accounts table and the phone number filed. Let’s move to tJavaRow.
Here we deal with the input phone number and adjust it according to the set country, in our case its UA (Ukraine). Next we go to tMap. This part is quite simple:
The isValid variable can contain 2 values: namely, true, if the phone number is correct and false, if the phone number is incorrect. So we check whether the phone number is correct, and if so, we record it in the international format to the database, if not – we delete it.
After running the program, we can see that the phone number was deleted from the database because of incorrect format:
Now we will add into Accounts a phone number typical for a certain country and will slightly spoil it (we’ll add gaps and dashes). You can do experiments as for what values will this or that library process.
After running the program:
That’s it. These are the last sentences of my article, which describes some methods of data processing when migrating to SugarCRM from another system. The disadvantage of the suggested methods is the thing that there is no place to store the deleted data. But the issue can be solved by logging data into a file.
Let’s conclude: Talend helps process SugarCRM data quickly and efficiently. Talend offers convenient methods to work with databases, and its functionality can be extended by means of a user’s code.