Aug 9, 2017

Version 3.6 is out

Here is what's new and exciting in version 3.6:

Multiple file loading

Previously, loading multiple uniform files (i.e. files with the same type and similar set of columns) required using iterations. This could be challenging for new users unfamiliar with the concept of iterations. Starting from this release, all file import transformations allow loading multiple uniform files at once without using iterations. Loaded files are automatically concatenated into one table. it makes is convenient to load several daily or monthly reports at once.

New mode for loading multiple uniform files at once.


The files to load can be specified either explicitly by ticking necessary files in list, or by providing a search criterion (substring, wildcard, or regular expression). The source folder path can be specified using a parameter.

Publishing to Tableau Server
Starting from this release the "Export to Tableau" transformation can publish Tableau data extracts right to Tableau Server as data sources.

Publishing to Tableau Server


Support for fixed width text data

We've added two transformations to deal with fixed width text: "Import fixed width text" and "Split fixed width text". The first transformation loads a text file that contains columns of fixed width. It has a preview mode for convenient "slicing".

Import fixed width text (click to zoom).

The other transformation, "Split fixed width text" allows splitting columns with fixed width text that were already loaded from other data sources (e.g. a database). It has the same preview mode for easy, visual "slicing".

7 other new transformations

In total, this release adds 9 new transformations -- more than any previous release ever.
  • Fill Right transformation is similar to the “Fill down” transformation. It fills empty cells to the right, in specified row. This transformation helps dealing with spreadsheets where table headers contain merged cells.
  • Download File transformation fetches a file over the internet using the HTTP(S) or the FTP network protocol. It supports basic login/password authorization.
  • Split Delimited Text transformation is similar to the “Import delimited text file” transformation (introduced long time ago), but splits an already loaded column into new columns. It can be used for instance for dealing with CSV data imported from a spreadsheet, or a database.
  • Status transformation is a workflow transformation. When reached it displays a custom message. Typically, it’s used for signaling on project execution progress. For desktop versions of EasyMorph the message text appears in the “Run progress” dialog window. For EasyMorph Server the message appears as task status in the web-console, and also is logged into the task log. Custom statuses of called/iterated projects pop up into their calling projects, which allows displaying iteration progress (e.g. “Processed 15 files out of 132”).
  • Regular Expression transformation find matches for specified regular expression. The transformation works in two modes: find 1st match, and find all matches. In the latter case, new rows can be inserted into the table to store matches beyond 1st. Most common use cases for the transformation include data quality checks and text parsing.
  • Select By Lookup transformation select columns which names exist in another (lookup) table. It is intended for advances scenarios in which a resulting set of columns depends on user input or calculation logic.
  • Convert Data Type transformation converts text dates in one or more columns into number dates. The date format in selected columns can be auto-detected by pressing “Detect” button, if it matches one of 1,000+ formats known to EasyMorph. If no format detected then it should be entered manually in transformation properties. The format specification is the same as for the format() function.

Changes to existing transformations

  • The Concatenate function of Aggregate now allows specifying a separator and concatenating only distinct values. 
  • Running Total transformation has got a competition ranking mode in addition to the default dense ranking. 
  • Select Matching Database Rows and Delete Matching Database Rows transformations now allow matching on dates (without times). 
  • Iterate transformation now runs faster and is more memory-efficient.

The system function

A new function has been added: system(keyword). This function allows obtaining various system values. The keyword must be a text constant. Currently supported keywords:

  • projectpath — The full path to the current project including file name.
  • projectfilename — Current project’s file name only.
  • programpath — The full path to morph.exe.
  • currentpath — The working directory.
  • user — The current user. Replaces function user().
  • callerpath — The full path of the calling project including file name.
  • utcoffset — The UTC offset to the system time zone ( e.g. "-04:00:00").
Example:
system('projectfilepath') returns ‘C:\Documents\myproject.morph’.

Custom HTML in generated documentation

Auto-documentation now allows including custom HTML into the head and body of generated HTML page. This can be used for inserting corporate logos, links to other HTML pages, change logs, custom project metadata.

Miscellaneous

  • The visual condition builder in Query Editor now has undo/redo buttons. 
  • Added support for SAP HANA SQL dialect (for ODBC connections). 
  • Query preview now have a context menu invoked by right-clicking on datagrid cells. 
  • Expression editor has got a button for inserting date constants. 
  • Project tabs can be switched back and forth by pressing Ctrl+Tab and Ctrl+Shift+Tab
  • The “Input” transformation can now be enabled in Plus license as an add-on feature.

Aug 2, 2017

Fuzzy matching and merging

Sometimes there are cases when it's hard to match (or merge) two tables because the matched values (e.g. names) are not precisely identical -- for instance when a name has a typo, or extra space between first and last name. Most frequently it happens when one of the tables is a result of a process that involved manual data entry at some point. It could be web-site forms, CRM systems, or spreadsheets. In such cases fuzzy matching comes to rescue. This article explains what this is and how to do it in EasyMorph.

Fuzzy matching can reliably match two words (or in a broader sense, text strings) that are similar but not exactly the same. For instance, "EasyMorph Inc" and "EasyMoprh Inc" clearly mean the same company, but the two names are not identical because the latter has a typo and therefore they won't be matched when doing a VLOOKUP in Excel, or data blending in Tableau.

In order to deal with such discrepancies the edit distance between two text strings can be calculated. This distance shows the number of edits required in one text in order to obtain the other.

The Levenshtein distance is the number of inserts, deletions and character replacements required to produce the other word. In our case, the Levenshtein distance between "EasyMoprh Inc" and "EasyMorph Inc" equals 2, because it requires removing "p" before "r" and inserting it after "r" -- so 2 edits in total.

Levenshtein

There is a modified variant of the Levenshtein distance which is called the Damerau-Levenshtein distance. This variant adds swapping two adjacent characters as one more possible edit operation. The Damerau-Levenshtein distance between "EasyMoprh Inc" and "EasyMorph Inc" equals 1 because it only requires swapping "p" and "r" which is 1 edit operation.

Damerau-Levenshtein


When fuzzy matching is done using the edit distance two text strings are considered matched when the distance is less than certain threshold (typically 2 or 3). The higher the threshold, the "fuzzier" the matching becomes. Therefore high thresholds are usually not practical as they produce too many false matches. Also, the shorter are matched values, the lower the threshold should be. For instance, the edit distance between "cat" and "dog" is just 3, but these are completely different words.

In EasyMorph, fuzzy matching is arranged using the "Match" transformation in the "Fuzzy" mode. It calculates the Damerau-Levenshtein distance for text values in two tables.



The transformation matches text values in one column (with mistyped values) in current table with a column in another table (with correct values). When the edit distance is less than or equal to the specified threshold values are considered matched. If merging other columns is required it can be done using additionally the "Merge" transformation by linking successfully matched values.

It is also possible to calculate the edit distance in expressions using distance(). The function can be used in simple cases cases when there is only one or two correct values to match with, thus avoiding creating a lookup table.

Jul 14, 2017

How to trigger data transformation tasks from Tableau

Since EasyMorph Server is built using web technologies, it makes triggering data transformation (ETL) tasks from analytical systems like Tableau especially simple. Triggering tasks right from dashboards is convenient, because you can refresh underlying data sources from a familiar UI in just one click, plus you get additional bonus feature (explained later in this article).

Here is how you can do it:

Step 1: Enable HTTP GET requests for the data transformation task

While it might sound cryptic it's actually a simple thing -- we just tell EasyMorph Server that particular task may be triggered by opening specific URL (hyperlink). This option is disabled by default in order to avoid accidental task triggering, because a task can potentially perform an irreversible action, e.g. overwrite data in a database table or file.

To enable triggering by opening a hyperlink, go to the task properties in EasyMorph Server web-console and tick "Enable HTTP GET".

Enable triggering EasyMorph Server task by URL (click to zoom).


Then copy the generated URL. In the example above the server address is localhost, in your case it would be your EasyMorph Server address. Don't forget to save the change in task properties -- it won't work until you save it.

Step 2: Create URL action in Tableau

Now go to Tableau and create a new URL action (menu Dashboard -> Actions -> Add Action -> URL). Give the action a name and paste the URL copied in Step 1.

Create URL Action in Tableau (click to zoom).


Step 3: Trigger the task from Tableau

Now right-click anywhere in your dashboard and trigger the EasyMorph Server task from the right-click menu (actions appear at the bottom). This will open default web-browser showing task execution log updated in real time. Once the task is finished you can go back and refresh the dashboard (in some scenarios it updates automatically). Mission accomplished.

Trigger EasyMorph task from Tableau.


Bonus feature: auto-documentation

EasyMorph Server can generate a human-readable description of a task's data transformation logic. This is convenient because it explains in plain English how the source data was processed into the dataset displayed in the dashboard. You can see all the transformations, expressions and filtering conditions, as well as helpful annotations.

Since generated task documentation has a permanent URL it also can be added as an URL action in Tableau.

To obtain the auto-documentation link go to the task properties on EasyMorph Server web-console, and switch to Documentation. Then copy the page URL from the browser address bar.

Auto-generated ETL documentation (click to zoom).


A URL action for documentation is created in the same way as described in the Step 2 above.

Once the URL action is created, you can open the documentation right from the Tableau dashboard.

Open task documentation from Tableau.


Jun 29, 2017

Our mid-term roadmap (2017H2-2018H1)

We're getting quite a few feature requests recently (which is a great thing!) so it looks like it would be useful to shed some light on our future development plans. We do publish our short term goals on the download page. However, our mid-term roadmap has never been disclosed. So here is what we would like to do in the next 12 months:

Persistent editable datasets in projects

This is supposed to be a very significant change to what EasyMorph is. At this point no data is stored in EasyMorph projects -- they load it from files and databases and output processed data into other files and databases. Once processing is done no data is saved in project. We're going to change that. Similarly to spreadsheets, EasyMorph will be able to have both data AND calculation logic in one file. From a technical standpoint it will be implemented as follows:

  • Sandboxes will be able to keep datasets permanently -- i.e. data in sandboxes will be stored in projects and survive re-opening projects.
  • Sandboxes will be editable -- it will be possible to edit data in sandboxes like in spreadsheets (but without formulas).
  • All import transformations will get the ability to remember (cache) the last loaded dataset, which means that a project can be sent to someone without access to the project's data sources.
While it might be not immediately obvious what the benefits are, they are actually quite significant:
  • Portability -- no need to keep mapping tables and reference tables in external files. They can be stored and edited right in project.
  • Shareability -- EM projects become a means to share data. Like with spreadsheets, it allows sending not just some calculation results to someone, but also the source data and the logic that produced the results.
  • Versatility -- the ability to edit datasets right in EasyMorph means that you can mix manually entered data with calculated data. For instance, edit planned numbers and compare them with actuals in one project. Or update planned numbers based on some actual data. Or perform what-if analysis for different scenarios.
The change would effectively make EasyMorph a hybrid between a spreadsheet and an ETL, or a relational spreadsheet, if you will.

Querybooks

We're going to keep improving queries which will transform into querybooks. A querybook is a set of queries that answer particular questions (one query per tab). Querybooks will be portable, meaning that a querybook can be sent to another user with all necessary data connectors embedded for the queries to work.

Queries themselves will become multi-table, supporting joins and unions (won't be available in the free edition).

Application data sources

So far EasyMorph has been able to import data from databases, files and spreadsheets. However, a new category of data sources will be added -- applications, including cloud applications like Google Analytics, Twitter, Wikipedia, etc.

Application data source won't be available in the free edition.

Financial and statistical transformations

With the release of version 3.6 EasyMorph has 80+ transformations that cover almost all possible basic calculations. But we're not done yet! We will be adding special transformations for financial calculations (payment schedules, compound interest rates, bond yields, etc.) and statistical calculations (e.g. variances, deviations, clustering) with elements of predictive analytics (e.g. linear regressions, extrapolations).

Visual conditions and business rules

Visual conditions that were first introduced for queries in v3.5 will make their way into transformations with conditions that currently use expressions -- Rule, Filter by condition, Trim by condition, Halt and a few more transformations.

PS. Lots of things are planned for EasyMorph Server, but that's a subject for another post.
PPS. All written above is subject to change without notice.

May 21, 2017

Meet EasyMorph Server

Today we're releasing EasyMorph Server -- a new product in EasyMorph family. Here I'd like to talk more about why we've made it and what to expect in the future.

Why

The need for a server edition appeared pretty quickly once our customers started performing more transformations on larger datasets. Basically, it was demanded by the switch of usage pattern from personal to team use:

  • Multiple projects run by multiple users
  • Data volumes required more RAM and CPU power than a typical personal computer has
  • Scheduled long-running transformations
  • Shared source files and data sources

What's this

EasyMorph Server is a Windows service with a web console and the same columnar in-memory data transformation engine which is employed by desktop editions of EasyMorph. It runs on schedule projects created in any desktop edition (including the free edition, yay!). Project parameters can be assigned (overridden) in Server tasks. Project logs can be viewed right in the web console and are updated in real-time.

Task list (click to zoom).


The Server engine performs slightly better than the engine of desktop editions. It runs 10-20% faster and utilizes 10-30% less memory due to server-specific optimizations. For instance, unlike the desktop editions, Server doesn't store some intermediate results since there is no need to display them in UI.

EasyMorph Server is designed for analytical teams with limited or no IT support. Therefore, it's very easy to install and maintain. The installer package also includes EasyMorph Server Monitor -- a simple utility to manage the service: start it, stop it, view the server log, and change HTTP port for that rare case when the default one is not available.

Documentation

While the Server doesn't allow creating or editing projects (projects are designed in desktop editions) it still can be helpful for business users that only run projects (created by someone else) because they can view auto-generated documentation on data transformation logic in projects, and understand what happens under the hood when they trigger a data transformation task. For instance, they can see expressions used to calculate columns, filtering conditions, data sources, parameters, etc.

Since the documentation has a permanent URL it can be accessed (opened) from external applications such as Tableau dashboards or Qlik applications.

Integration with Tableau dashboard through Actions.

API

Right from the beginning EasyMorph Server is built around a public RESTful API that can be used by external applications in order to create, modify and trigger Server tasks programmatically. The API enables tight integration with 3rd party tools.

Customization / White-labeling

The Server's UI is customizable. You can change colors and styles, use your own logo and modify links in the web console's header to match your corporate style. Even more than that -- if you're developing and selling an analytical application you can use EasyMorph Server as native component of your application under an OEM license. Note that desktop EasyMorph (required for creating/editing projects) is not customizable.


What's next

The version 1.0 is, as it's supposed to be for a very first release, somewhat basic :) But it doesn't mean it's going to stay like that -- we have a lot of things on the roadmap:

Soon, we will be introducing events -- a means to arrange task cascading and non-schedule triggers (e.g. running a task when a file appears/changes). Private groups will allow restricting access to particular tasks. Email integration will appear in one of the future releases and will make automated receiving and processing email attachments possible. Impact tracking will allow logging and analyzing what project created, consumed or modified what file or database table.

Licensing
EasyMorph Server is licensed per server with no limit on users which means that any number of users can access the Server, create and run tasks, and view documentation on them. Together with the ability to run projects created in the free desktop edition this makes EasyMorph Server one of the most affordable department-level ETL systems on the market. More on the licensing on our web-site: http://easymorph.com/server.html.

Dmitry Gudkov