File data in the cloud: What’s holding us back?

This is the second post in a series. Read the first post here.

“Every action has an equal and opposite reaction.”

We all learned Newton’s 3rd law at some point in our childhood. But how often do we actually recognize it happening to us? When we’re walking on the street, the thought of the pavement pushing back on every step never occurs to us. But for much larger efforts, like moving your company’s file data to the cloud, those opposing forces can be far more appreciable – and powerful.

This undertaking is becoming unavoidable, as businesses around the world look to phase out legacy data centers and modernize their operations. The cloud is often seen as a catalyst that enables the people and processes behind successful business transformation. And like our increasingly distributed workforce, it’s available from virtually anywhere. For nearly all enterprises, integrating public cloud services is no longer a question of “if” but “how?”

You may find yourself caught in the middle of two forces: the organizational mandate to move “everything” to the cloud and the gravitational pull of petabytes, or even exabytes, of data, supporting dozens of applications and workflows, anchoring your apps to a perpetual on-prem future.

We all know that moving to the cloud doesn’t come without risk and costs. After all, your on-prem applications are the result of years, maybe even decades, of investment. Some are so business critical that they must be kept available at all times. There’s a lot at stake and lots of reasons to be nervous about a migration.

In our previous post, we talked about some of the reasons lifting and shifting data and workloads to the cloud will benefit your organization. Now, we’ll explore some of the counter-forces – the risks, the issues, and the trade-offs – that can complicate your organization’s journey to the cloud.

Cost predictability

Imagine paying upfront for a truck rental only to receive a surprise bill upon return for every mile driven. If you missed that note in the fine print, that sticker shock will hurt. For many, the cost of running workloads and storing data in the cloud is hard to prepare for simply because 1) it may not be clear what you are and aren’t being billed for, and 2) cost modeling for even a single application is difficult. The problem is compounded if you are trying to model costs for multiple on-prem workloads at a time. For example, you’d have to calculate your spend for data-at-rest, sustained throughput, and transactional costs while ensuring you’ve selected the right tier for each specific workload. Such a delicate process leads most organizations to move one workload at a time to minimize the risk of surprises and cost overruns.

Not only that, but file storage in the cloud can be 10x to 20x more expensive than on-prem for the same function or use case. For workloads that are particularly challenging to model accurately, provisioning a higher performance tier upfront as a buffer can become expensive if you overestimate, because you won’t be able to come down to a less expensive tier in the future. Lastly, not being able to “guardrail” your costs with the help of data visibility or optimization tools adds another element of unpredictability to your monthly invoice.
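To make the modeling problem concrete, here is a minimal sketch of the three cost components mentioned above (data-at-rest, sustained throughput, and transactions) combined into one monthly estimate. The tier names and every rate below are hypothetical placeholders, not any provider’s actual pricing, and real bills usually include more line items than this.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """Hypothetical pricing for one storage tier (all rates are invented)."""
    name: str
    usd_per_tb_month: float      # data-at-rest
    usd_per_mbps_month: float    # sustained throughput, per MB/s
    usd_per_million_ops: float   # transactions


@dataclass
class Workload:
    """What one workload is expected to consume in a month."""
    name: str
    capacity_tb: float
    throughput_mbps: float
    million_ops: float


def monthly_cost(workload: Workload, tier: Tier) -> float:
    """Sum the three cost components for one workload on one tier."""
    return (
        workload.capacity_tb * tier.usd_per_tb_month
        + workload.throughput_mbps * tier.usd_per_mbps_month
        + workload.million_ops * tier.usd_per_million_ops
    )


# Hypothetical tiers: one cheap to store on, one cheap to operate on.
standard = Tier("standard", usd_per_tb_month=20.0,
                usd_per_mbps_month=0.5, usd_per_million_ops=0.5)
premium = Tier("premium", usd_per_tb_month=60.0,
               usd_per_mbps_month=0.25, usd_per_million_ops=0.25)

# Hypothetical workload: 500 TB, 1,000 MB/s sustained, 2 billion ops a month.
archive = Workload("archive", capacity_tb=500,
                   throughput_mbps=1000, million_ops=2000)

for tier in (standard, premium):
    print(f"{archive.name} on {tier.name}: ${monthly_cost(archive, tier):,.2f}/month")
```

Even in this toy version, the estimate moves with three independent inputs per workload, and each tier weights them differently. Repeating the exercise across many workloads at once is where the uncertainty adds up.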

Management complexity

New tools will always come with a learning curve. But when your margin for error is small and the chances for something to go wrong are high, few will sign up for that task, especially with business-critical operations at stake. Turning to the cloud for file storage means your teams will have to contend with unfamiliar management interfaces, feature disparities, and completely different toolsets, and the steep learning curve is only part of the challenge. Team structures need to be re-envisioned, new roles created, workflows re-architected, mindsets shifted, and third parties (with more tools to learn) may be involved for migration. With so many moving pieces all at once, something is bound to go wrong along the way. It’ll just be a matter of when and where. Those looking to hedge their bets with a hybrid model will likely deal with similar issues, but on multiple platforms simultaneously, all the while trying to maintain continuity of operations for their end-users and customers.

Massive data sets

IDC estimates the global datasphere will grow to 175 zettabytes in just a couple of years. Yes, a zettabyte is a 1 followed by 21 zeros! So, if your organization is sitting on a ton of data right now, you’re not alone. It’s not uncommon for organizations to be storing and managing multiple petabytes, if not exabytes, of data, and that data is growing faster than ever. Therefore, taking advantage of the virtually endless resources in the cloud makes perfect sense. Getting your mountain of data there, however, is another story.

You might first look at some inexpensive options to migrate data and quickly realize they’re too slow and will take too long. Faster tools, however, will cost you more than you hoped for and may still take considerable time depending on how much data you have to move. On top of that, even if you’re able to spend the time and money to move all that data over, many of the cloud-native file storage solutions available today have scale limitations. For example, if you have multiple petabytes of data to move to the cloud but are limited to 100TB per volume, you’ll have to split that data across many volumes, which means more of the complexity mentioned above to manage the same amount of data in the cloud.
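A rough back-of-the-envelope sketch shows how both problems scale. The 100TB per-volume limit comes from the example above; the 5PB data set, the 10 Gbps link, and the 80% link efficiency are illustrative assumptions, and real migrations also depend on file counts, latency, and tool overhead that this ignores.

```python
TB = 10**12  # decimal terabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes


def transfer_days(data_bytes: int, link_gbps: float, efficiency: float = 0.8) -> float:
    """Days needed to push data_bytes over a link, assuming only a
    fraction (efficiency) of the nominal bandwidth is usable."""
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    seconds = data_bytes * 8 / usable_bits_per_second
    return seconds / 86_400


def volumes_needed(data_bytes: int, volume_limit_bytes: int = 100 * TB) -> int:
    """Number of volumes required when each volume has a capacity cap
    (integer ceiling division, so a partial volume counts as one)."""
    return -(-data_bytes // volume_limit_bytes)


data = 5 * PB  # assumed data set size
print(f"Transfer time at 10 Gbps: {transfer_days(data, 10):.1f} days")
print(f"Volumes at 100TB each:    {volumes_needed(data)}")
```

Under these assumptions, the transfer alone runs for roughly two months, and the data lands in 50 separate volumes that each need to be provisioned, monitored, and managed.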

Skills gap

Arguably the most important key to the success of any transformation effort is the people behind it. But what happens when there simply aren’t enough proficient workers to meet the surging demand for cloud skills? Some IT leaders are going so far as to call this shortage an “existential crisis” for their organization. A recent survey by IDC reported that 70% of respondents are experiencing a cloud skills gap in their organization, with nearly half saying it’s “severely impacting” their delivery, performance, and growth. Nearly one in ten survey respondents admitted to fearing for their company’s very survival.

Like most IT leaders, you may be faced with a dilemma: upskill your existing workforce, or outsource or hire for new talent? While upskilling allows you to preserve the human capital you likely fought hard to acquire and retain through the pandemic, it will take time and money to formally train your staff or allow them the “tinker time” to reskill themselves.

On the other hand, outsourcing to an MSP or recruiting a new team might accelerate your in-house cloud proficiency and help you execute on your strategy faster. However, involving 3rd parties or building new teams can lead to interdepartmental tension, operational complexity, and institutional resistance from employees who feel marginalized. On top of that, it can also get pretty pricey.

The right approach for your organization may not be an “either/or” decision, but more of an integration. Some new talent will need to come onboard and collaborate with existing staff. Fostering this partnership allows your organization to benefit from the institutional knowledge of veteran workers and to instill new cloud proficiencies, strategies, and mindsets across the team, all while minimizing the need for formal training and downtime.

Control, security, and compliance

Transferring control and residence of your data from owned infrastructure to a 3rd party vendor may feel like sending your child away to boarding school. Not having physical control over your data can raise security worries, especially now that you’re having to re-architect years of judiciously planned and executed security and governance policies. You now have to trust (and verify!) that your cloud vendor has a hardened security posture that meets or exceeds your business requirements. You may also be in a highly regulated industry like healthcare or financial services (or any vertical in the EU for that matter), requiring you to follow strict rules and policies when it comes to data storage and management.

In the U.S., regulatory frameworks like the Health Insurance Portability and Accountability Act (HIPAA) set strict requirements to prevent the exposure or theft of Protected Health Information (PHI). The onus is on you to ensure all PHI is being stored on a HIPAA-compliant solution. In the EU, the General Data Protection Regulation (GDPR) requires you to maintain full control over how and where someone’s personal data is used.

When it’s in the cloud, data locality, ownership, and control must be clearly defined in contractual agreements, consistent with applicable laws, between you and the vendor. Many cloud platforms offer compliance certifications and attestations for multiple laws, standards, and frameworks, but it’s still on you to verify the security and integrity of your systems and data when you move to the cloud.

Does file have any place in the cloud?

Too costly, too complex, too large, no cloud proficiency, and loss of control: is file data forever grounded in the core data center? Will file-based workloads remain only on-prem for the foreseeable future?

Not a chance.

As we laid out in our previous post, the tailwinds pushing all unstructured data to the cloud are just too strong. Stacking more racks and renting more floor space in the data center alone won’t let us keep pace with the explosion and spread of data today. Workloads like generative AI that rely on file data are proliferating, and the growing prevalence of hybrid workflows is pushing data to be more mobile and distributed than ever before. With the ongoing rise of cloud-native applications, the cloud will move closer and closer to the center of the modern data lifecycle.

In due time, the hurdles we listed here will be lowered and more file-based workloads will find their second home in the cloud. As it is, Azure Native Qumulo (ANQ) already makes it simple and fast to provision an enterprise-scale file system on Microsoft Azure. ANQ is also more feature-rich, scalable, and competitively priced than just about every other available service, and it offers the same management experience in the cloud as on-prem. And we’re not stopping there. Stay tuned. The best is yet to come.

Originally published on the Qumulo blog, by Aaron Oshita, July 27, 2023