
Files: Process Top File

This article provides a starter solution for processing the top file in a directory.

What Does This Article Cover?

Some workflows need to process only the first file in a directory. This article provides a data pipeline solution that reads one file from a directory. In this case the file has an unconventional format, and the pipeline converts it to CSV.
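
Outside of Intelligence Hub, the conversion step can be pictured with a short Python sketch. The article does not describe the actual file layout, so the format below (one record per line, `key=value` pairs separated by semicolons, `#` for comments) is purely hypothetical, as is the function name `records_to_csv`. It illustrates the idea of flattening a non-tabular text file into CSV and is not the pipeline shipped with the project file.

```python
import csv
import io


def records_to_csv(raw_text: str) -> str:
    """Convert a hypothetical 'key=value; key=value' text format to CSV."""
    rows = []
    for line in raw_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (assumed convention).
        if not line or line.startswith("#"):
            continue
        row = {}
        for pair in line.split(";"):
            if not pair.strip():
                continue
            key, _, value = pair.partition("=")
            row[key.strip()] = value.strip()
        rows.append(row)

    # Build the header from keys in first-seen order so that records
    # with extra or missing fields still line up in columns.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing fields are written as empty cells
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "# device export\nid=1; temp=20.5\nid=2; temp=21.0; status=ok\n"
    print(records_to_csv(sample))
```

Collecting the header from all records before writing keeps the columns stable even when individual records omit a field.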

Intelligence Hub design considerations for processing the top file in a directory

The solution provided in this article is intentionally simple and independent of any destination system. The following considerations apply when processing the last file written to a directory.

  • The destination system should be considered, and an Intelligence Hub solution should be created that exchanges data directly with that system. For example, if the desired destination is a Snowflake data warehouse, Intelligence Hub can be used to create a solution that parses a file and writes directly to a Snowflake table.
  • A regular expression (regex) can be used to filter files by name. If the directory contains many files, the regex can exclude those that do not pertain to the solution.
  • After it is read, the file can be removed from the directory and automatically moved to a designated processed-files directory. Alternatively, the Start time filter or Creation Time filter can be used to exclude old files.
  • The Include Metadata option can be enabled to include the file name and path, file creation time, file update time, and file size in the payload.
  • The trigger interval should be short enough that no more than one new file arrives between triggers; otherwise files may be skipped.
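
The filtering, metadata, and archiving considerations above can be sketched in plain Python. This is an illustration of the logic only, not Intelligence Hub configuration: the function names (`pick_top_file`, `file_metadata`, `archive`) are hypothetical, and "top file" is assumed here to mean the most recently modified file that matches the filter.

```python
import re
import shutil
from pathlib import Path
from typing import Optional


def pick_top_file(directory: Path, pattern: str, min_mtime: float = 0.0) -> Optional[Path]:
    """Return the newest file whose name matches `pattern`, or None.

    `min_mtime` plays the role of a start-time filter: files modified
    before that timestamp (seconds since the epoch) are ignored.
    """
    name_filter = re.compile(pattern)
    candidates = [
        p for p in directory.iterdir()
        if p.is_file()
        and name_filter.search(p.name)
        and p.stat().st_mtime >= min_mtime
    ]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def file_metadata(path: Path) -> dict:
    """Collect the kind of metadata that could accompany the payload."""
    stat = path.stat()
    return {
        "name": path.name,
        "path": str(path.resolve()),
        "modified": stat.st_mtime,
        "size": stat.st_size,
    }


def archive(path: Path, processed_dir: Path) -> Path:
    """Move a handled file into a processed-files directory."""
    processed_dir.mkdir(parents=True, exist_ok=True)
    target = processed_dir / path.name
    shutil.move(str(path), str(target))
    return target
```

Calling `pick_top_file` once per trigger and then `archive` on the result means that a second file arriving in the same interval is not lost: it stays in the directory and becomes the top file on the next trigger. Relying on a time filter alone does not give that guarantee, which is why the trigger interval matters.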

A project file may be downloaded [here].

Summary

The file input can be read at regular intervals to watch for new files or changes to existing files, and filters can be set to narrow the set of files Intelligence Hub will read. With the considerations above, careful filtering, and an appropriate polling interval, Intelligence Hub can act on new files as they appear and transform them into the desired format with a purpose-built pipeline.

Additional Resources