Data Correctness Verification During Talend-Based Migration to SugarCRM

Author: Igor Petelin, Software Engineer

When migrating data to SugarCRM from another system one can frequently face the problem of incorrect data (for example, website, email, telephone number):

correctness verification during Talend-based migration to SugarCRM 1

The reason of the issue is that the control over CRM data input is insufficient. A user may enter irrelevant email address in a hurry and a customer will never get an email sent to him/her. So, what is the way out?

Let me describe the way of checking the data correctness, as an example we’ll take email address, website and telephone number. Data check can occur either during daily (weekly, monthly) data import to SugarCRM from an old system or in the existing database. So, let’s begin.

Verification of an email address correctness in the existing SugarCRM database by the example of Accounts module

For the beginning we need to pick up data from email_addresses table of SugarCRM database, since it’s the place where all email addresses are stored. Here we need to apply tMysqlInput component. It picks up data from MySQL database.

Let’s move on to its configuring. We need to set database connection parameters, table name and scheme. Since retrieve only 2 fields, the scheme can be created manually. After you have created the scheme and the table’s name, click Guess Query. Needed request will be generated in Query field.

Below are the results of our actions:

correctness verification during Talend-based migration to SugarCRM 2

Then we place tMap and tMysqlOutput on the palette and combine them. You have probably already understood, tMysqlOutput is a component that records data into MySQL database. Let’s configure it:

correctness verification during Talend-based migration to SugarCRM 3

We will set the same connection parameters as for tMysqlInput, in particular the name of email_addresses table. Then we need to indicate that we will update data, so we select «Update» form the drop-down list of parameter «Action on data». Next, we will configure the scheme: here we have only 2 fields – id and invalid_email. The invalid_email field is used in SugarCRM for marking an incorrect email address. An incorrect email address is displayed in the interface as follows:

correctness verification during Talend-based migration to SugarCRM 4

When updating data in the table we need to indicate a key. It’s a filed in which an update will take place. In our case it’s the id filed.

Let’s move on to the most interesting part. It’s time to create a particular function in Talend. We will use it to verify email address correctness. So, we can create functions using Routines.

Routines is parts of code that can be used many times. They are written in java language and are used for Talend functionality extension. For creating Routines we need to go to «Code» in the menu on the left, click «Routines» with the right button and select «Create routine».

correctness verification during Talend-based migration to SugarCRM 5

Then we indicate the name and click «Finish». A java class with one method helloExample will be created. It is of no interest to us, so you can delete this part of the code, instead we are going to create a new method, which will check the correctness of an email address by the pattern.

Here is what we’ve got:

correctness verification during Talend-based migration to SugarCRM 6

I’m not going to discuss in detail the code part, but I have to note: here we verify the first argument of email_address with a set pattern in the second email_pattern parameter, or with a standard pattern (if the second parameter is empty), which we can set in the function’s body. The function returns true, if the email address is relevant and false – if not. Which pattern to choose is up to you.

Now it’s time to apply the function we created. For this we need to open tMap. Click «Activate expression filter» over the output flow and create a condition on which we will update data in the table.

correctness verification during Talend-based migration to SugarCRM 7

Let me explain, what happens at this stage: we set value true in invalid_email filed, if an email address (email_address) does not suit the provided pattern. After running the utility we get the following result:

correctness verification during Talend-based migration to SugarCRM 8

As you can see, the process is quite simple. Now we can go to verifying the correctness of a customer’s website.

Verifying website correctness in the existing SugarCRM database on the example of Accounts module

First of all, we need to place three components on the design window, which we used for email addresses verification. The only thing we have to change is the table’s name and the column’s name, which we are going to retrieve. In SugarCRM Accounts module website address is stored in website field.

We will not check the correctness of the email address, but we will look if such website exists. In order to check a website’s existence, we will use nslookup utility. Nslookup allows a user turn to DNS system. The utility is usually is built in the OS.

You are likely to wonder now: “How can we run this utility from Talend?”. It’s a piece of cake! For this purpose tSystem component is provided in Talend. tSystem’s purpose is to call one or more system programs. Let’s place it on the design window of the components. But it’s not all. There is another component to be added to our software, it is tFlowToIterate. tFlowToIterate generates global variables from input data flow.

Now when all the components are on the design window, we can combine them:

correctness verification during Talend-based migration to SugarCRM 9

tFlowToIterate needs no configuring, but in tSystem we need to set parameters, so we open the component’s parameters and type the call of nslookup utility. We type a website address as a parameter for this utility:

correctness verification during Talend-based migration to SugarCRM 10

((String)globalMap.get(“row3.website”)) is our parameter of a website. It is in fact our global variable generated due to tFlowToIterate.

The result of nslookup utility call for a website somebad.site is:

can’t find somebad.site: Non-existent domain

Let’s move on to tMap and configure the output expression filter:

correctness verification during Talend-based migration to SugarCRM 11

Let me explain:

tSystem_1_ERROROUPUT is a global variable which contains an error of system utility run. Here we check if the message about the error is not empty, and in case it is not empty, we look whether it contains «Non-existent domain» phrase with the help of java – contains function.

If there is an error, we set null value for the website field and log it into the database. Thus, our SugarCRM database will include only virtually existing sites.

correctness verification during Talend-based migration to SugarCRM 12

You can see from the screenshot that everything has worked out as planned. Our utility has cleared off the non-existing website. Now let’s go to phone number verification.

Verifying phone nymber correctness in the existing SugarCRM database on the example of Accounts module

Our task is to either scrub incorrect phone numbers or to convert a phone number to an international format and to record it in the database instead of an existing one. Let’s see how this task can be done with the help of libphonenumber library. The library is not built into Talend, so it can be found on the Internet.

So, let’s begin! At first, we need to connect the library via tLibraryLoad component. tLibraryLoad allows importing outside libraries to the project. We need to place the component on design window and will set the way to the library. Next we will put all familiar components for data reading and recording to database and tMap. Then we need to place tJavaRow on the design window. tJavaRow is a component which allows integrating user’s code and Job. Let’s combine the components as shown below:

correctness verification during Talend-based migration to SugarCRM 13

Note: if you use tLibraryLoad component, it is recommended to place it from the very beginning.

Now it’s turn to configure tMysqlInput and tMysqlOutput components for our database, detail the accounts table and the phone number filed. Let’s move to tJavaRow.

correctness verification during Talend-based migration to SugarCRM 14

Here we deal with the input phone number and adjust it according to the set country, in our case its UA (Ukraine). Next we go to tMap. This part is quite simple:

correctness verification during Talend-based migration to SugarCRM 15

The isValid variable can contain 2 values: namely, true, if the phone number is correct and false, if the phone number is incorrect. So we check whether the phone number is correct, and if so, we record it in the international format to the database, if not – we delete it.

After running the program, we can see that the phone number was deleted from the database because of incorrect format:

correctness verification during Talend-based migration to SugarCRM 16

Now we will add into Accounts a phone number typical for a certain country and will slightly spoil it (we’ll add gaps and dashes). You can do experiments as for what values will this or that library process.

correctness verification during Talend-based migration to SugarCRM 17

After running the program:

correctness verification during Talend-based migration to SugarCRM 18

That’s it. These are the last sentences of my article, which describes some methods of data processing when migrating to SugarCRM from another system. The disadvantage of the suggested methods is the thing that there is no place to store the deleted data. But the issue can be solved by logging data into a file.

Let’s conclude: Talend helps process SugarCRM data quickly and efficiently. Talend offers convenient methods to work with databases, and its functionality can be extended by means of a user’s code.

Leave a Reply

Your email address will not be published. Required fields are marked *