SFTP Server Guidelines

Guidelines for using Aggregation SFTP Server


Introduction

Indeed’s SFTP Server is a shared resource that allows clients to upload their XML files to Indeed. After a client uploads XML files, Indeed processes them so that the jobs appear on our website.

Because the SFTP Server is a shared resource, it is important for all clients to follow these guidelines. Following these guidelines enables Indeed to process all XML files in a timely manner and ensures that all jobs are posted to Indeed. If a client violates these guidelines, they risk having their access to the server revoked.

Guidelines

1) Use the minimal number of connections needed

Indeed limits the number of connections to mitigate DDoS attacks. Use only as many connections as you need so that every client can successfully upload their files.

Ideally, each client should use only one connection. Indeed knows that some clients upload multiple files concurrently, which might mean multiple connections. However, the SFTP protocol allows a single connection to multiplex operations, so a client can perform multiple separate operations over one connection.
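As a rough sketch of the single-connection approach, the Python below sends several files one after another over one already-open SFTP session instead of opening a session per file. The `upload_all` helper and the `SftpSession` shape are hypothetical names used for illustration; any client object exposing a `put(localpath, remotepath)` method would fit (paramiko's `SFTPClient` offers a method of that shape). Nothing here is an Indeed-provided API.

```python
from typing import Iterable, Protocol, Tuple


class SftpSession(Protocol):
    """Minimal shape of an open SFTP session (assumed, for illustration)."""

    def put(self, localpath: str, remotepath: str) -> object: ...


def upload_all(session: SftpSession, files: Iterable[Tuple[str, str]]) -> int:
    """Upload every (local, remote) pair over the one session passed in.

    Returns the number of files sent. The caller opens the connection once
    before calling this function and closes it once afterwards.
    """
    count = 0
    for local_path, remote_path in files:
        session.put(local_path, remote_path)
        count += 1
    return count
```

The point of the design is that the connection is created outside the loop, so the number of files does not affect the number of connections.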

2) Just upload the file with the original filename

Indeed starts to process the file with the original filename as soon as it is uploaded. Changing the filename can delay processing by up to 10 minutes.

Please do not:

  • Change the filename after uploading the file.
  • Delete the file after uploading it.
  • Manipulate the file metadata, such as the modification time or permissions.
  • Constantly query/poll the state of the files.

Indeed provides a 2-day look-back in your directory under the processed subdirectory. This will tell you if the file uploaded successfully.

For example: If the file was uploaded to ftp/ftpexample123/file.xml on July 19, 2022, at 10:31:00 Central Daylight Time, Indeed will upload an empty file at ftp/ftpexample123/processed/file.xml/20220719_103100_CDT.xml. Use this file as the acknowledgement that your file was uploaded and is being processed. It remains there for 2 days before being deleted.

Screenshot of the file directory where Indeed uploads an empty file.
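The path arithmetic in the example above can be sketched in Python. The `acknowledgement_path` function is a hypothetical helper, and the layout it builds (processed/<filename>/<YYYYMMDD_HHMMSS_TZ>.xml) is inferred from the single example in this guide rather than from a published specification, so treat it as an assumption.

```python
from datetime import datetime, timedelta, timezone
from posixpath import basename, dirname, join


def acknowledgement_path(upload_path: str, uploaded_at: datetime) -> str:
    """Build the expected marker path under the `processed` subdirectory.

    The naming pattern is inferred from the one example in this guide.
    `uploaded_at` must be timezone-aware so that %Z yields an abbreviation.
    """
    stamp = uploaded_at.strftime("%Y%m%d_%H%M%S_%Z")
    return join(dirname(upload_path), "processed", basename(upload_path), stamp + ".xml")


# Central Daylight Time is UTC-5; the zone is built by hand for this example.
CDT = timezone(timedelta(hours=-5), "CDT")
example = acknowledgement_path(
    "ftp/ftpexample123/file.xml", datetime(2022, 7, 19, 10, 31, 0, tzinfo=CDT)
)
# example == "ftp/ftpexample123/processed/file.xml/20220719_103100_CDT.xml"
```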

3) Determine upload frequency

There are two ways to determine how frequently to upload: on a time basis, or based on significant changes in the XML.

  • Time-based:
    • Guideline: Every 6 hours
    • Minimum amount of time between uploads: 2 hours
  • Significant-changes-based:
    • Guideline: Update when changing at least 1% of jobs.
    • Note: Changing the URL for many jobs does not constitute a significant change.

Our goal is to serve jobseekers with the most accurate and timely job updates possible. However, there are a number of steps that jobs go through before they appear on Indeed.com. Uploading every 15 minutes with minor changes will not make jobs appear faster: we have to reprocess the entire XML file, and each job still has to continue through the rest of Indeed’s job pipeline before surfacing. Our systems are configured to limit accounts that upload frequently and to pick up updates only every 3 hours (subject to change).
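One way a client might encode the frequency guidelines above is sketched below in Python. The `should_upload` function and its constants are hypothetical, and combining the time-based and change-based rules into a single check is an assumption made for illustration; the guidelines present them as two alternatives.

```python
from datetime import datetime, timedelta

MIN_GAP = timedelta(hours=2)      # minimum time between uploads
REGULAR_GAP = timedelta(hours=6)  # time-based guideline
CHANGE_PERCENT = 1                # significant change: at least 1% of jobs


def should_upload(last_upload: datetime, now: datetime,
                  changed_jobs: int, total_jobs: int) -> bool:
    """Return True when an upload looks justified under the guidelines.

    `changed_jobs` should exclude jobs whose only change is the URL,
    since URL-only changes do not count as significant.
    """
    elapsed = now - last_upload
    if elapsed < MIN_GAP:
        return False  # never upload more often than every 2 hours
    if elapsed >= REGULAR_GAP:
        return True   # regular 6-hour refresh
    if total_jobs <= 0:
        return False
    # Integer comparison avoids floating-point edge cases at exactly 1%.
    return changed_jobs * 100 >= total_jobs * CHANGE_PERCENT
```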

4) Upload necessary files only

After each file is uploaded, Indeed checks the filename to see if we know how to handle it. If we don’t know what to do with the filename, the file is considered unnecessary and moved to an inaccessible directory.

Uploading unnecessary files adds load to the system and takes resources away from processing files that matter.

If requested, Indeed can query your uploaded files to see which files are not being processed.

5) Follow naming conventions

5.1) Keep the same filename between uploads

Use a consistent name between uploads to make it easier to keep track of files (e.g., my_jobs.xml).

The SFTP server can handle filenames that change between uploads as long as the only part that changes is the date and time.

For example, if you upload a file named my_jobs_20220816_064321.xml and then, 6 hours later, upload a file named my_jobs_20220816_124321.xml, they will be stored as:

  • my_jobs_20220816_064321.xml/20220816_064353.xml
  • my_jobs_20220816_124321.xml/20220816_124403.xml
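A minimal Python sketch of generating such names follows. The `timestamped_name` helper is hypothetical; it keeps a fixed prefix and varies only the date and time portion, matching the pattern of the example filenames above.

```python
from datetime import datetime


def timestamped_name(prefix: str, when: datetime) -> str:
    """Return a name such as my_jobs_20220816_064321.xml for prefix 'my_jobs'."""
    return f"{prefix}_{when:%Y%m%d_%H%M%S}.xml"


first = timestamped_name("my_jobs", datetime(2022, 8, 16, 6, 43, 21))
# first == "my_jobs_20220816_064321.xml"
```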

5.2) Include a file extension

Indeed only processes XML files at this time, so all files are assumed to be XML files. In the future, there might be other file types that Indeed could process. To reduce future issues, please include a file extension.

  • Incorrect filename examples:
    • MY_JOBS
    • INDEED_JOBS_XML
  • Correct filename examples:
    • my_jobs.XML
    • INDEED_jobs.xml
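A client-side check before uploading could look like the Python sketch below. The `has_xml_extension` helper is hypothetical, and requiring specifically `.xml` (in any letter case) is an assumption based on XML being the only file type processed at this time.

```python
from pathlib import PurePosixPath


def has_xml_extension(filename: str) -> bool:
    """True when the name ends in .xml, in any letter case."""
    return PurePosixPath(filename).suffix.lower() == ".xml"


# The examples from the lists above:
results = {name: has_xml_extension(name)
           for name in ["MY_JOBS", "INDEED_JOBS_XML", "my_jobs.XML", "INDEED_jobs.xml"]}
# results == {"MY_JOBS": False, "INDEED_JOBS_XML": False,
#             "my_jobs.XML": True, "INDEED_jobs.xml": True}
```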