Oct 13, 2017

Version 3.6.1 is out

Version 3.6.1 has just been released. While it's designated as a minor release, it introduces so many changes that it could've been a major version. The release is synchronized with the release of EasyMorph Server v1.2 as they both share the same in-memory engine.


.NET requirement upgraded to .NET 4.5.1

Previously the minimum version of .NET required for EasyMorph was 4.5. Starting from this release it's 4.5.1. The change won't affect the vast majority of EasyMorph users because the 4.5.1 was installed on their computers years ago with one of Windows system updates. For those who for some mysterious reasons missed it here is the link to .NET 4.5.1 installer.

Support for quoted line breaks in delimited text files

We've rewritten the parser for loading text files and now it supports quoted line breaks -- a long awaited feature that allows EasyMorph read correctly some files that it couldn't read previously, and comply with the RFC 4180 standard.

However, it also means that EasyMorph may now import some text files differently, because now it imports them correctly. To help detect files that can now be imported differently, we included with the installer a command line utility that parses given text file(s) using the old parser and the new parser. If the parsers produce different results it prints a warning. The utility supports wildcards, so you can test with it multiple files at once.

Improved Export to Tableau

Previously timestamps (i.e. dates with time part) were exported into Tableau as only dates, without the time part. This required implementing a workaround such as exporting the time part as text in a separate column, and then appending it back to dates in Tableau using expressions.

Starting from this release, if a column contains at least 1 date with time and has type “Date” in the “Export to Tableau” transformation then its Tableau data type set to DATETIME and its values are exported as timestamps.

Similarly, numbers previously were always exported as floats. Starting from this release, if a column contains only integer numbers its Tableau data type is set to INTEGER and all its values are exported as integers. If the column has at least 1 non-integer value, or value that is outside of the integer range, then entire column is exported as floats.

Integers and timestamps exported to Tableau (click to zoom).

Append mode for Export to Delimited Text File

Another long-needed feature is the ability to append data to existing text files in the "Export to delimited text file" transformation. Now it can be used for various kinds of logging. For instance, for collecting records that didn't pass data quality checks.

In the Append mode a text file will be created automatically if it didn't exist.

Publishing projects to EasyMorph Server

All desktop versions of EasyMorph now can publish projects directly to EasyMorph Server v1.2 and up. New button “Publish to server” has been added to the Project menu.

Publishing to EasyMorph Server (click to zoom).

Comments and date literals in expressions

Now it is possible to insert comments in expressions. Start a comment with a double-slash (//). Everything between the double-slash and the end of line will be considered a comment and ignored during expression evaluation. A comment can’t span across multiple rows.

Date literals start with the pound sign (#) and must have the ISO format only, i.e. #YYYY-MM-DD. Time is not supported in date literals. Note that there must be no space between # and year. See the screenshot below:

Date literals and comments in expressions.


Links:

Oct 12, 2017

Web File Manager in EasyMorph Server v1.2

The Web File Manager is the headline feature in EasyMorph Server v1.2 released a few days ago. Previously, publishing files to the Server had to be done solely by copying projects and data files to a shared network folder. However, in some cases it can be inconvenient -- the Server can reside on a different network segment or in a cloud where shared folders are inaccessible. Also copying files to a network folder requires setting correct permissions for the folder in order to allow the Server service to access the files.

Starting from version 1.2 EasyMorph Server has browser-based Web File Manager which allows performing the following file operations:
  • Browse folders
  • Upload one or multiple files by dragging them into browser
  • Upload one or multiple files by pressing the “Upload file” button
  • Download a file by clicking its name
  • Delete selected file(s)
Watch the video below to see the Web File Manager in action:



Multiple files can be uploaded at once. There is no limit on file size.

Note that in version 1.2 creating/renaming/deleting folders is not supported yet. This will be available in future releases.

The Web File Manager has three access modes:

  • Full access -- all file operations are enabled.
  • Only upload -- uploading/deleting is enabled. Downloading files not possible.
  • Only download -- only downloading is possible.
  • Disabled -- the Files tab is hidden, no file operations possible.
All the file operations are also available through the EasyMorph Server SDK and the command-line API client.

From a practical perspective, the Web File Manager can complement 3rd party Business Intelligence applications (e.g. Tableau Server or Qlik Server) that are browser-based but don't have built-in capabilities for uploading/downloading data files.

Aug 9, 2017

Version 3.6 is out

Here is what's new and exciting in version 3.6:

Multiple file loading

Previously, loading multiple uniform files (i.e. files with the same type and similar set of columns) required using iterations. This could be challenging for new users unfamiliar with the concept of iterations. Starting from this release, all file import transformations allow loading multiple uniform files at once without using iterations. Loaded files are automatically concatenated into one table. it makes is convenient to load several daily or monthly reports at once.

New mode for loading multiple uniform files at once.


The files to load can be specified either explicitly by ticking necessary files in list, or by providing a search criterion (substring, wildcard, or regular expression). The source folder path can be specified using a parameter.

Publishing to Tableau Server
Starting from this release the "Export to Tableau" transformation can publish Tableau data extracts right to Tableau Server as data sources.

Publishing to Tableau Server


Support for fixed width text data

We've added two transformations to deal with fixed width text: "Import fixed width text" and "Split fixed width text". The first transformation loads a text file that contains columns of fixed width. It has a preview mode for convenient "slicing".

Import fixed width text (click to zoom).

The other transformation, "Split fixed width text" allows splitting columns with fixed width text that were already loaded from other data sources (e.g. a database). It has the same preview mode for easy, visual "slicing".

7 other new transformations

In total, this release adds 9 new transformations -- more than any previous release ever.
  • Fill Right transformation is similar to the “Fill down” transformation. It fills empty cells to the right, in specified row. This transformation helps dealing with spreadsheets where table headers contain merged cells.
  • Download File transformation fetches a file over the internet using the HTTP(S) or the FTP network protocol. It supports basic login/password authorization.
  • Split Delimited Text transformation is similar to the “Import delimited text file” transformation (introduced long time ago), but splits an already loaded column into new columns. It can be used for instance for dealing with CSV data imported from a spreadsheet, or a database.
  • Status transformation is a workflow transformation. When reached it displays a custom message. Typically, it’s used for signaling on project execution progress. For desktop versions of EasyMorph the message text appears in the “Run progress” dialog window. For EasyMorph Server the message appears as task status in the web-console, and also is logged into the task log. Custom statuses of called/iterated projects pop up into their calling projects, which allows displaying iteration progress (e.g. “Processed 15 files out of 132”).
  • Regular Expression transformation find matches for specified regular expression. The transformation works in two modes: find 1st match, and find all matches. In the latter case, new rows can be inserted into the table to store matches beyond 1st. Most common use cases for the transformation include data quality checks and text parsing.
  • Select By Lookup transformation select columns which names exist in another (lookup) table. It is intended for advances scenarios in which a resulting set of columns depends on user input or calculation logic.
  • Convert Data Type transformation converts text dates in one or more columns into number dates. The date format in selected columns can be auto-detected by pressing “Detect” button, if it matches one of 1,000+ formats known to EasyMorph. If no format detected then it should be entered manually in transformation properties. The format specification is the same as for the format() function.

Changes to existing transformations

  • The Concatenate function of Aggregate now allows specifying a separator and concatenating only distinct values. 
  • Running Total transformation has got a competition ranking mode in addition to the default dense ranking. 
  • Select Matching Database Rows and Delete Matching Database Rows transformations now allow matching on dates (without times). 
  • Iterate transformation now runs faster and is more memory-efficient.

The system function

A new function has been added: system(keyword). This function allows obtaining various system values. The keyword must be a text constant. Currently supported keywords:

  • projectpath — The full path to the current project including file name.
  • projectfilename — Current project’s file name only.
  • programpath — The full path to morph.exe.
  • currentpath — The working directory.
  • user — The current user. Replaces function user().
  • callerpath — The full path of the calling project including file name.
  • utcoffset — The UTC offset to the system time zone ( e.g. "-04:00:00").
Example:
system('projectfilepath') returns ‘C:\Documents\myproject.morph’.

Custom HTML in generated documentation

Auto-documentation now allows including custom HTML into the head and body of generated HTML page. This can be used for inserting corporate logos, links to other HTML pages, change logs, custom project metadata.

Miscellaneous

  • The visual condition builder in Query Editor now has undo/redo buttons. 
  • Added support for SAP HANA SQL dialect (for ODBC connections). 
  • Query preview now have a context menu invoked by right-clicking on datagrid cells. 
  • Expression editor has got a button for inserting date constants. 
  • Project tabs can be switched back and forth by pressing Ctrl+Tab and Ctrl+Shift+Tab
  • The “Input” transformation can now be enabled in Plus license as an add-on feature.

Aug 2, 2017

Fuzzy matching and merging

Sometimes there are cases when it's hard to match (or merge) two tables because the matched values (e.g. names) are not precisely identical -- for instance when a name has a typo, or extra space between first and last name. Most frequently it happens when one of the tables is a result of a process that involved manual data entry at some point. It could be web-site forms, CRM systems, or spreadsheets. In such cases fuzzy matching comes to rescue. This article explains what this is and how to do it in EasyMorph.

Fuzzy matching can reliably match two words (or in a broader sense, text strings) that are similar but not exactly the same. For instance, "EasyMorph Inc" and "EasyMoprh Inc" clearly mean the same company, but the two names are not identical because the latter has a typo and therefore they won't be matched when doing a VLOOKUP in Excel, or data blending in Tableau.

In order to deal with such discrepancies the edit distance between two text strings can be calculated. This distance shows the number of edits required in one text in order to obtain the other.

The Levenshtein distance is the number of inserts, deletions and character replacements required to produce the other word. In our case, the Levenshtein distance between "EasyMoprh Inc" and "EasyMorph Inc" equals 2, because it requires removing "p" before "r" and inserting it after "r" -- so 2 edits in total.

Levenshtein

There is a modified variant of the Levenshtein distance which is called the Damerau-Levenshtein distance. This variant adds swapping two adjacent characters as one more possible edit operation. The Damerau-Levenshtein distance between "EasyMoprh Inc" and "EasyMorph Inc" equals 1 because it only requires swapping "p" and "r" which is 1 edit operation.

Damerau-Levenshtein


When fuzzy matching is done using the edit distance two text strings are considered matched when the distance is less than certain threshold (typically 2 or 3). The higher the threshold, the "fuzzier" the matching becomes. Therefore high thresholds are usually not practical as they produce too many false matches. Also, the shorter are matched values, the lower the threshold should be. For instance, the edit distance between "cat" and "dog" is just 3, but these are completely different words.

In EasyMorph, fuzzy matching is arranged using the "Match" transformation in the "Fuzzy" mode. It calculates the Damerau-Levenshtein distance for text values in two tables.



The transformation matches text values in one column (with mistyped values) in current table with a column in another table (with correct values). When the edit distance is less than or equal to the specified threshold values are considered matched. If merging other columns is required it can be done using additionally the "Merge" transformation by linking successfully matched values.

It is also possible to calculate the edit distance in expressions using distance(). The function can be used in simple cases cases when there is only one or two correct values to match with, thus avoiding creating a lookup table.

Jul 14, 2017

How to trigger data transformation tasks from Tableau

Since EasyMorph Server is built using web technologies, it makes triggering data transformation (ETL) tasks from analytical systems like Tableau especially simple. Triggering tasks right from dashboards is convenient, because you can refresh underlying data sources from a familiar UI in just one click, plus you get additional bonus feature (explained later in this article).

Here is how you can do it:

Step 1: Enable HTTP GET requests for the data transformation task

While it might sound cryptic it's actually a simple thing -- we just tell EasyMorph Server that particular task may be triggered by opening specific URL (hyperlink). This option is disabled by default in order to avoid accidental task triggering, because a task can potentially perform an irreversible action, e.g. overwrite data in a database table or file.

To enable triggering by opening a hyperlink, go to the task properties in EasyMorph Server web-console and tick "Enable HTTP GET".

Enable triggering EasyMorph Server task by URL (click to zoom).


Then copy the generated URL. In the example above the server address is localhost, in your case it would be your EasyMorph Server address. Don't forget to save the change in task properties -- it won't work until you save it.

Step 2: Create URL action in Tableau

Now go to Tableau and create a new URL action (menu Dashboard -> Actions -> Add Action -> URL). Give the action a name and paste the URL copied in Step 1.

Create URL Action in Tableau (click to zoom).


Step 3: Trigger the task from Tableau

Now right-click anywhere in your dashboard and trigger the EasyMorph Server task from the right-click menu (actions appear at the bottom). This will open default web-browser showing task execution log updated in real time. Once the task is finished you can go back and refresh the dashboard (in some scenarios it updates automatically). Mission accomplished.

Trigger EasyMorph task from Tableau.


Bonus feature: auto-documentation

EasyMorph Server can generate a human-readable description of a task's data transformation logic. This is convenient because it explains in plain English how the source data was processed into the dataset displayed in the dashboard. You can see all the transformations, expressions and filtering conditions, as well as helpful annotations.

Since generated task documentation has a permanent URL it also can be added as an URL action in Tableau.

To obtain the auto-documentation link go to the task properties on EasyMorph Server web-console, and switch to Documentation. Then copy the page URL from the browser address bar.

Auto-generated ETL documentation (click to zoom).


A URL action for documentation is created in the same way as described in the Step 2 above.

Once the URL action is created, you can open the documentation right from the Tableau dashboard.

Open task documentation from Tableau.