The dataimport framework

The dataimport framework allows us to use a simple configuration setting to create predefined import mappings which will map uploaded data from CSV, XLSX, DBF and other formats to data structures in a fileobject.

The config for each dataimport map lies in TBL dataimportmap. It has a number of fields:

dataimport_target
This is target fileobject type which the data will be inserted into. Eg "document" or "user"
dataimport_name
A unique text identifier for this dataimportmap. Lowercase, no whitespace. eg "document_import_tester"
dataimport_title
A unique human readable title for this dataimportmap. Eg "Document Import Test"
dataimport_insert
Will this import map create new records? If this is set to true, the data import process will create a new records for rows in the import data which don't map to an existing record.
dataimport_update
Will this import map update existing records? If this is set to true, the data import process will update records for rows in the import data which map to existing records. If set to false, it will skip them.
dataimport_replaceall
Will this import map replace all existing records? If this is set to true, the importer will EMPTY the database table for the object type, and create new records for every row in the import data.
dataimport_firstrow
Which is the first row of data in the import data? Allows the importer to skip header rows. The importer starts the count at row[0]. If the data starts at the first row, this setting should be set to 0
dataimport_colcount
How many columns in total are there in the import data? This counter is used in import validation, and any rows which don't have the expected number of columns will be skipped.
dataimport_map
A JSON blob describing the import mapping rules. More on this later.
dataimport_custom
A custom import php class which runs during the import, and can be used to perform more complex data manipulations during the import process. More on this later.

Dataimport Map JSON

Data import map JSON tells the importer to do with each column it finds in the import data.

In its simplest form, this could just be s list of fields to map to a field in the fileobject. In this form, fields will only be updated when a new record is created:

[
	"", 				// Nothing to map in col 1
	"document_title", 		// map col 2 to the document title
	"document_content", 		// map col 3 to the document content
	"", 				// nothing to map in col 4
	"", 				// nothing to map in col 5
	"document_publicationdate" 	// map col 6 to the publication data
]

If we wish to update records, we need to indicate which fields in the import data is used as an "index" field to locate records in the dotAdmin database, and what index field in dotAdmin it maps to. We also need to indicate which fields can be updated when the importer finds an existing record:

[
	{
		"name": "document_id",		// Map col 1 to the document_id
		"canupdate": false ,		// We won't update this value on import
		"is_index": true		// We use this imported field to find existing record in dotAdmin
	},
	{
		"name": "document_title",	// Map col 2 to the document_title
		"canupdate": true		// If we find an existing record, we WILL update this value dotAdmin
	},
	{
		"name": "document_content",	// Map col 3 to the document_content
		"canupdate": false		// If we find an existing record, we WON'T update this value dotAdmin
	},
	"", 					// nothing to map in col 4
	"", 					// nothing to map in col 5
	{
		"name": "document_publicationdate",
		"canupdate": true ,
		"type": "date"			// This is a date. Validate it as such before updating the data.	
						// Skip if it is not valid.	
	},
]

Finally, you will notice above that the document_publicationdate field has a "type" parameter which indicates what format of data we are expecting. If the data coming in does not match the expected format, the importer will skip that row of data until the error is corrected in the import dataset.

Valid types are:

  • date
  • time
  • datetime
  • integer
  • float
  • text

Dataimport Map Custom Classes

We can also specify a custom class. The class will reside in the site path at /model/dataimport.[classname].class.php and it should have a classname of [Classname]DataImport. eg: DocumentTestDataImport

Various methods can exist in that class which attach to various event hooks in the import process, allowing a different behaviour to be added:

recordExists($row)

This method can be used to locate an existing record to match the import data in row $row. It replaces the standard check for an existing dotAdmin record. It should return true if it finds a matching record, and false if not.

$row is a simple numerical array of field values like the result from the fgetcsv() method in PHP.

getFileForUpdate($row)

This method returns an existing record for updating if one exists. it should return a dotAdmin file object for the matching row if one existis, or NULL if not.

$row is a simple numerical array of field values like the result from the fgetcsv() method in PHP.

afterRowInsert($obj_file, $row)

After the importer has finished creating and setting the inital values on $obj_file with the data in $row, it will be passed to this method, where further operations can be carried out.

$row is a simple numerical array of field values like the result from the fgetcsv() method in PHP.

NOTE: if you make changes to $obj_file, you need to save / publish those changes within this method.

afterRowUpdate($obj_file, $row)

After the importer has finished updating $obj_file with the data in $row, it will be passed to this method, where further operations can be carried out.

$row is a simple numerical array of field values like the result from the fgetcsv() method in PHP.

NOTE: if you make changes to $obj_file, you need to save / publish those changes within this method.

getInputFields()

If the importer needs to ask some more questions of the user before importing the data, we can use this method to return those input fields to the import interface so they are presented to the user. For example:

public function getInputFields(){
	$form_html = '';
	$form_html .= FormViewHelper::getSelectFieldAndLabel('document_author' , 'Author' , AdminUsersEnum::getInstance()->getSelectValues() , '');	
	$form_html .= FormViewHelper::getTextareaAndLabel('document_notes' , 'Notes' , '');	
	return $form_html;
}

The values posted in by this form will be available in the other import functions in the PHP global $_POST array.

Using the dataimport framework

The data import framework can be instantiated and accessed as follows:

1) Place the file you wish to access in the site /tmp/ directory

2) Instantiate an importer for it by calling the following code:

// Include the dataimport class
require_once(MODEL_PATH . 'dataimport.class.php');
// Get an importer object for the target dataset
$obj_importer = InvDataImport::getImporterForFile('example_import.csv');
// Get validation data for the import before importing it. This can allow you to display a list of errors, for example
// (returns true if import source is valid, false it it isn't)
$validation_data = $obj_importer->validateData();
// ... or just import the data 
$obj_importer->importData();

A note on validation data

The validation data returned by the import class can be used to display warnings and errors before the user commits to import their data. For example:

<?php if(count($this->obj_importer->getBadRows()) > 0){ ?>
<h3>Row Errors</h3>
<p>The following rows have errors and will not be imported/updated.</p>
<table class="table table-bordered table-condensed">
	<tr>
		<th>NUM</th>
		<?php foreach($this->obj_importer->getImportHeadings() as $name){ ?>
		<th><?php echo htmlspecialchars($name) ?></th>
		<?php } ?>
	</tr>
	<?php 
	$rcount = 0;
	foreach($this->obj_importer->getBadRows() as $error){ 
		if($rcount < 30){
	?>
	<tr>
		<td colspan="<?php echo count($error['row']) + 1 ?>">
			<?php if($error['reason'] == 'WRONG_COLUMN_COUNT'){ ?>
			WRONG NUMBER OF COLUMNS (<?php echo count($error['row']) ?>)
			<?php } else if($error['reason'] == 'CELL_ERRORS'){ ?>
			INVALID DATA IN CELLS
			<?php } else if($error['reason'] == 'RECORD_NOT_FOUND'){ ?>
			CANNOT FIND RECORD TO UPDATE
			<?php } else if($error['reason'] == 'DUPLICATE_RECORD'){ ?>
			DUPLICATE RECORD
			<?php } ?>
		</td>
	</tr>
	<tr>
		<td><?php echo htmlspecialchars($error['rownum']) ?></td>
		<?php 
		$count = 0;
		foreach($error['row'] as $cell){ 
		?>
		<td <?php if(is_array($error) && !empty($error['cell_errors']) && in_array($count , $error['cell_errors'])){ echo 'style="background-color: #d83c3b; color: #FFF;"'; } ?>><?php echo htmlspecialchars($cell) ?></td>
		<?php 
			$count ++;
		} 
		?>
	</tr>
	<?php } else { ?>
	<tr>
		<td colspan="<?php echo count($error['row']) ?>">
			TOO MANY ROWS TO REPORT. ERROR REPORT TRUNCATED.
		</td>
	</tr>
	<?php
			break;
		}
		$rcount ++;
	} 
	?>
</table>
<?php } else { ?>
<div class="alert alert-success">No errors in uploaded data.</div>
<?php } ?>

Contact Us

Address: 22a Fishergate York, YO10 4AB · Tel: 01904 636677 · Email: info@dotadmin.com