How We Discovered a 1TB Bug While Testing Interoperability

Interoperability Testing and Its Purpose

Interoperability testing means testing software to verify that it can interoperate with other software components, applications, or systems.

The purpose of interoperability testing is to prove that end-to-end functionality between (at least) two communicating systems is as required by the standard(s) on which those systems are based. It is performed against the standards or specifications on which the involved systems are based, so the availability of such standards or specifications is a prerequisite. In interoperability testing, the end-to-end functionality of the interoperating systems is verified: the systems are tested together as one functioning system and observed from the user's perspective.

Now let's see how we discovered this bug.

Specific Test Background

The customer is a leader in computer storage and data management, dealing in various storage products. The product under test is a purpose-built storage appliance, optimized for data protection and archiving, that seamlessly integrates cloud storage into the infrastructure to deliver instant recovery, end-to-end security, and the industry's most cost-effective storage for backup and archive data.

The interoperability testing covers a wide range of backup applications on one end, the storage appliance in the middle, and the cloud service providers at the other end. It is a crucial phase in the product release because it qualifies the appliance against a wide range of industry backup applications and cloud providers.

How did we discover the issue?

The typical test case for a new cloud provider qualification used a data size of ~250 GB. This was based on a few inputs such as the most frequently used data set size, the time required for execution, and so on, but it did not test the limits of the product. To explore those limits further with customer-centric use cases, where the data size can go well beyond 1 TB, we ran the test with a larger data size of ~1 TB. When we triggered a backup of the 1 TB data set, we hit the issue.

Interoperability test details:

  • Deployed the latest appliance build on an ESXi server (5.5 or later). Completed further setup: an additional hard disk to serve as the local cache, cloud provider configuration, creation of an SMB share, and joining the appliance to the AD domain.
  • Deployed a client system (Windows Server 2012 R2 preferred) and added it to the same AD domain. Configured the backup application, NBU 7.6.1, with both client and server on the same Windows system.

The test steps were as follows:

  • The SMB share from the appliance was mapped on the Windows system using the FQDN (Fully Qualified Domain Name).
  • NBU 7.6.1 was configured with a storage unit and a backup policy.
  • A backup was triggered with a 1 TB data set.
  • The activity was tracked in the activity monitor.
  • The data was expected to replicate to the cloud without any issue.

Expected Outcome: The 1 TB backup completes successfully and all the data is replicated to the cloud.

Actual Outcome: The 1 TB backup failed.

Debugging and Root Cause of the Failure

The initial failure came with the error “media write error (84)”. It was quickly identified as an issue with the local cache, so the quick workaround was to increase the cache size to 1 TB, with the intent to debug further and get to the root cause.

A further backup attempt failed, and the service on the appliance was found to have stopped on its own. The service was restarted manually, but the backup job did not resume or restart, and the 1 TB backup failed again.

Upon further investigation of the system logs and Wireshark traces, with help from the development team, we found that the service on the appliance stopped because the backend code in the appliance supported only a 16 KB buffer for receiving the JSON response. In this case, the response was larger than 16 KB and got truncated, resulting in an incomplete JSON buffer. The issue was finally fixed by adding support for responses larger than 16 KB.
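To make the failure mode concrete, here is a minimal Python sketch. It is purely illustrative; we did not see the appliance's actual backend code, and the sketch only assumes that the JSON reply was read with a single fixed-size receive into a 16 KB buffer. With that pattern the reply gets truncated and the JSON parse fails, while a reader that loops until the sender finishes gets the complete document.

```python
# Illustrative sketch of the truncation bug (not the appliance's real code).
import json
import socket
import threading

BUF_SIZE = 16 * 1024  # the fixed 16 KB receive buffer at the heart of the bug


def cloud_provider(sock: socket.socket) -> None:
    """Stand-in cloud endpoint: sends a JSON listing well over 16 KB, then closes."""
    listing = {"objects": [{"key": f"chunk-{i:06d}", "size": 4096} for i in range(2000)]}
    try:
        sock.sendall(json.dumps(listing).encode())
    except OSError:
        pass  # the buggy reader may close early; ignore the resulting broken pipe
    finally:
        sock.close()


def recv_fixed(sock: socket.socket) -> bytes:
    """Buggy pattern: a single recv() into a fixed buffer truncates larger replies."""
    return sock.recv(BUF_SIZE)


def recv_all(sock: socket.socket) -> bytes:
    """Fixed pattern: keep reading until the peer finishes, so the JSON arrives whole."""
    chunks = []
    while True:
        chunk = sock.recv(BUF_SIZE)
        if not chunk:  # empty read means the sender closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)


def run(reader) -> None:
    client, server = socket.socketpair()
    sender = threading.Thread(target=cloud_provider, args=(server,))
    sender.start()
    data = reader(client)
    client.close()
    sender.join()
    try:
        json.loads(data)
        print(f"{reader.__name__}: parsed {len(data)} bytes of JSON successfully")
    except json.JSONDecodeError as exc:
        print(f"{reader.__name__}: received {len(data)} bytes, parse failed: {exc}")


if __name__ == "__main__":
    run(recv_fixed)  # truncated response -> JSONDecodeError, like the appliance failure
    run(recv_all)    # complete response -> parses cleanly
```

Whether the real fix enlarged the buffer or accumulated chunks until the response was complete, the lesson is the same: never assume a JSON response fits in one fixed-size read.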

That's all from us about interoperability testing. Subscribe to our blog for more actionable content, delivered straight to your inbox.


