DataAnalytics Help Center

version 0.1.8

Importing data

Version 0.1.8 supports two data sources and formats: selected HTML containing TABLE elements, and Google spreadsheets. The wizard explained below is the second iteration of the import facility, but it still cannot import clean and meaningful data from arbitrary web sites. I found that it is really hard to extract data when the software knows nothing about the structure of the information. This wizard works for simple tables with simple formatting. It does not cope well with colspan, missing information, text physically spread over many table rows, multiple tbody elements in a single table, etc. DataAnalytics currently supports five data types: string, number, date, currency and boolean.
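To give a feel for what detecting one of the five data types involves, here is a minimal sketch in Python. It is not the code DataAnalytics uses: the function names, the accepted boolean words, the currency symbols and the two date formats are all assumptions chosen for illustration.

```python
from datetime import datetime
import re

# Hypothetical rules; the real wizard's detection rules are not documented here.
_BOOLEANS = {"true", "false", "yes", "no"}
_NUMBER = re.compile(r"^[+-]?\d+(\.\d+)?$")
_CURRENCY = re.compile(r"^[+-]?[$€£]\s?\d+(\.\d+)?$")
_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def guess_type(text):
    """Guess the data type of one cell; fall back to "string"."""
    # Drop thousands separators before matching (assumes "," is the separator).
    value = text.strip().replace(",", "")
    if value.lower() in _BOOLEANS:
        return "boolean"
    if _NUMBER.match(value):
        return "number"
    if _CURRENCY.match(value):
        return "currency"
    for fmt in _DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return "date"
        except ValueError:
            pass
    return "string"


def guess_column_type(cells):
    """A column gets a specific type only if every non-empty cell agrees."""
    types = {guess_type(c) for c in cells if c.strip()}
    return types.pop() if len(types) == 1 else "string"
```

For example, guess_column_type(["$1,200", "$35", ""]) yields "currency", while a column mixing "12" and "n/a" falls back to "string". That fallback is one reason missing information makes imports messy.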

Stay tuned for a better import facility in the 0.3 branch. The wizard will be complemented by a table view in the main window with tools to clean the data.

How to import data (Google spreadsheet)

Open the spreadsheet you want to import. From the Tools menu, select Analyze Google Spreadsheet. In the wizard, select Current page and Google spreadsheet, then continue through the wizard as explained below.

How to import data (Html table)

Find a table to import, such as statistics on countries. If you want only a subset of the table, select the rows you want. They do not have to be completely selected; a single selected letter on each row will do. If you want the whole table, select anything in the table. Next, right-click and select "Analyze selection." This window will open:

As you can see, DataAnalytics will eventually support importing from various data sources. The current selection is correct, so click Next. If Selection is disabled, the software did not detect any selected content. Here's the next screen:

A single data format (TABLE elements) is currently supported. Eventually, this is also where you will be able to append the data to an existing table. If Html table is disabled, no tables were detected in the selected region. Click Next.
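As an illustration of what reading TABLE elements from a selection means, here is a minimal sketch using Python's standard html.parser module. It is not how DataAnalytics is implemented, and the class and function names are made up for this example. Like the wizard, it only copes with simple tables: it ignores colspan and nested tables, and it assumes every cell is properly closed.

```python
from html.parser import HTMLParser


class SimpleTableParser(HTMLParser):
    """Collects the text of each td/th cell, one list per tr row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table rows found in an HTML fragment as lists of strings."""
    parser = SimpleTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

An empty result from extract_rows corresponds to the case where no table was detected in the selected region.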

The wizard now shows a preview of the first 25 rows of data and displays an estimate of the number of records. You need to provide a name for the table and validate the column headings. Double-click any row to make it the table heading. The import will then process only the rows that follow it. If the table does not have a heading, check the "No heading" option.
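The effect of picking a heading row can be sketched in a few lines of Python. This is only an illustration: split_heading and its parameters are hypothetical names, and the generated "Column N" names are an assumption about what a tool might do when there is no heading.

```python
def split_heading(rows, heading_index=0, no_heading=False):
    """Split rows into (heading, records).

    Rows before the heading row are dropped; only the rows after it
    are kept as records. With no_heading=True every row is a record
    and placeholder column names are generated.
    """
    if no_heading:
        width = max((len(r) for r in rows), default=0)
        return [f"Column {i + 1}" for i in range(width)], list(rows)
    return list(rows[heading_index]), list(rows[heading_index + 1:])
```

Choosing the second row as the heading, for instance, discards a caption row sitting above it.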

The list of fields is displayed next, along with the detected data type of each. You can modify all these values and skip columns by unchecking them. This is also where you can specify the parsing and formatting options by clicking the Format button. More details about formats are available here.

At this point you decide whether you want the whole table or just the selected rows. You can also choose to skip any number of records from the start or the end of the table. Finally, you can skip any record in which one of the fields has invalid data.
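These skipping options amount to trimming the list of records and then filtering it. Here is a minimal sketch in Python, with hypothetical names (select_records, skip_start, skip_end, is_valid) that do not come from DataAnalytics itself. It assumes the skip counts are not negative.

```python
def select_records(records, skip_start=0, skip_end=0, is_valid=None):
    """Drop records from both ends, then drop records that fail validation.

    is_valid, if given, is called with one record and returns True
    when all of its fields hold acceptable data.
    """
    end = len(records) - skip_end
    # max() keeps the slice empty when the two skips overlap.
    kept = records[skip_start:max(end, skip_start)]
    if is_valid is not None:
        kept = [r for r in kept if is_valid(r)]
    return kept
```

For example, skipping one record at each end of a four-record table and rejecting records whose second field is not made of digits could leave a single record.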

Some optimization options are displayed next. Click Start to process the table. The import code is not yet optimized: importing large tables, such as the one in the third tutorial (32000 rows), can take more than one minute. Consider raising your dom.max_script_run_time preference (in about:config). If the Next button does not become enabled, there was a problem; please send me the URL of the table.

You're done. These options are not implemented yet. Click Finish.