A reader asks:
Do you have any tool to move External Files (nearly 70 TB) from Celerra & Centera to Isilon faster?
The StorageMojo take
I know EMC made it difficult to leave their Centera system for competitive systems, but making it difficult to leave for another EMC product seems perverse. Or maybe they don’t know how to do it either.
Readers, or Isiloners, any suggestions?
Courteous comments welcome, of course.
Hi, I was part of the original Centera development team. Together with some colleagues, I started a company called Datadobi.
Our first product DobiMiner provides detailed analysis of the data in your Centera based on the various metadata associated with it.
We also offer a tailored Centera migration service. Although it does not yet say so on our website, we are currently working on supporting Isilon as a migration target. We will have this available really soon.
Have a look at our website (http://www.datadobi.com) and send us a mail or call us if you would like to know more.
Hi, I work for EMC and can try to bring some insight into this.
Migrations can be tricky, and it’s best to try to understand all aspects of them: technology, process and business.
I have a hard time understanding what is so perverse in that this isn’t a click-click process.
Migrating from Centera can be complex because of the nature of the system. It is an object storage system first and foremost, which means that the applications that store their objects on it need to be adapted during the migration so that they know where to look for the objects upon recall.
Also, Centera is frequently used as a compliance storage system, which in turn means that there might be chain of custody for objects involved as well.
All of it is rather multi-dimensional and not really easy to give an answer to off the bat.
For Celerra, on the other hand, it’s “just” files, and they can be migrated using more standardized practices, generally by copying the data over the network(s) involved and then repointing the exports/file shares to the new location. Again, this depends on the applications and use cases for the systems.
I would urge the reader that posed the question to approach their EMC account team and/or partner with the questions and they can then work together to make this a successful migration.
EMC has recently introduced a supported Hadoop distribution integrated with Isilon. Hadoop, as far as I remember, has dedicated import tools that parallelize the process. That should be optimal if they support multiple (Isilon) nodes as targets.
Centera faces an uncertain future given today’s large disks, so it needs at least an alternative: a more scalable system that can store filesystem and content-addressed data together. The idea of putting these things together into Isilon is seductive 😉 Unfortunately, data written into Centera has one specific feature: Centera returns a content address (CA) to the application, which is the reference for future access to the data.
If a migration from Centera to a filesystem like Isilon is planned, the CA must be replaced by a pathname or URL so that an application can access its earlier data as if it had always written it to a filesystem.
The alternative is that the application needs to read each stored object from Centera and rewrite it to a filesystem. What a pain!
There’s a small company in Belgium called Datadobi who are the real experts in Centera. They have written a powerful administration and migration tool with which data can be migrated between Centeras. It’s imaginable that they could develop a migration tool for extracting data out of Centera into a filesystem 😉
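To illustrate the idea of replacing a content address with a pathname, here is a minimal sketch. It assumes the content address can be treated as an opaque string and that the target lives under a hypothetical root such as /ifs/archive; the function names and the two-level directory sharding are illustrative choices, not how any Centera migration product actually works.

```python
from pathlib import PurePosixPath


def ca_to_path(content_address: str, root: str = "/ifs/archive") -> PurePosixPath:
    """Derive a deterministic file path from a content address.

    The first four characters are used as two directory levels so that
    no single directory ends up holding every migrated object.
    (Hypothetical layout, chosen only for illustration.)
    """
    ca = content_address.strip()
    if len(ca) < 4:
        raise ValueError("content address too short to shard")
    return PurePosixPath(root) / ca[:2] / ca[2:4] / ca


def build_lookup(addresses):
    """Map each content address to its new path, so an application
    that still holds the old reference can be redirected."""
    return {ca: str(ca_to_path(ca)) for ca in addresses}


# Example with made-up addresses:
lookup = build_lookup(["ABCDEF123", "ZZYYXX987"])
```

An application (or a shim in front of it) would consult such a lookup table instead of asking the Centera for the object.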
Hi Robin,
My company, Interlock Technology, is an EMC service partner located in Waltham MA that specializes in CAS-to-NAS migrations. We have a sweet spot in migrating data from EMC Centera to Isilon (along with other EMC targets such as VNX and Data Domain).
Our migrations are designed to be fast (3TB to 6+TB transferred per day, or even more for high-volume migrations), compliant (retention propagation to FLR/SmartLock and full chain-of-custody reporting), safe (no production impact or outage required during data transfer) and application transparent (we handle reconfiguring the Centera applications to ‘see’ the migrated NAS data as if it had always lived on NAS). In addition, there is no long-term commitment to our data migration system; once the data is copied and validated on the target and the applications have been reconfigured and tested our software is removed from the environment and the Centera can be retired.
We have performed dozens of successful CAS-to-NAS migrations at customer sites around the world over the last 2.5 years with EMC and other storage OEMs. More information can be found on our website (www.interlock-tech.com) or contact us directly via info@interlock-tech.com.
Thank you,
Mike Horgan
CTO, Interlock Technology
There are a variety of Centera migration technologies out there as others have posted (another one to check out is Seven10). At the end of the day, none of them can architect around the fact that Centera is a dog when it comes to performance. It’s going to be slow and there’s not much you can do about it.
@ Calle Liljeholm,
Migrating from object-based storage shouldn’t be hard if your applications use open protocols and open access to metadata. It’s just your implementation. Period.
The Dude.
@HoosierStorageGuy,
I realize this thread has been quiet for over 2 months but I just saw your comment and wanted to respond.
Centera isn’t really that slow, but it does have high per-transaction overhead which presents as much-higher-than-normal response latency. If you know how, it is possible to get quite decent mass migration performance out of it. Interlock’s system is very successful in extracting top performance during a migration; it’s not uncommon for us to see 90+ MB/s sustained average migration performance. In fact, around the time of your comment we were wrapping up a 300 TB Centera -> Isilon migration during which the data was transferred over the course of 21 days at almost 15 TB/day. During this migration the source system remained in production and the (very sophisticated) users of the Centera data did not notice any significant impact.
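The figures above are easier to compare when converted to the same unit. The sketch below does that arithmetic, assuming decimal units (1 TB = 1,000,000 MB) and a constant rate around the clock; both are simplifying assumptions, and the function names are made up for this example.

```python
SECONDS_PER_DAY = 86_400
MB_PER_TB = 1_000_000  # decimal units, an assumption


def tb_per_day(mb_per_s: float) -> float:
    """Convert a sustained rate in MB/s to TB per day."""
    return mb_per_s * SECONDS_PER_DAY / MB_PER_TB


def mb_per_s(total_tb: float, days: float) -> float:
    """Average rate in MB/s needed to move total_tb in the given days."""
    return total_tb * MB_PER_TB / (days * SECONDS_PER_DAY)


def days_needed(total_tb: float, rate_mb_per_s: float) -> float:
    """Days required to move total_tb at a constant rate."""
    return total_tb / tb_per_day(rate_mb_per_s)


print(round(tb_per_day(90), 2))      # 90 MB/s expressed in TB/day
print(round(mb_per_s(300, 21), 1))   # average rate for 300 TB in 21 days
print(round(days_needed(70, 90), 1)) # the original 70 TB at 90 MB/s
```

Under these assumptions, 90 MB/s works out to roughly 7.8 TB per day, and 300 TB in 21 days to an average of roughly 165 MB/s.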
Any readers of this thread interested in similar Centera migration performance (or just wondering how we are able to do this!), please reach out to us at info@interlock-tech.com.
Thank you,
Mike Horgan
CTO, Interlock Technology
Hello, I had the same question as the ‘reader’. I realize the post was created 8 months ago, and the original question may have been resolved through some professional services contract by this point. However, as far as I can read, none of the responses here have even remotely attempted to answer or provide insight into the question: “Do you have any tool to move External Files (nearly 70 TB) from Celerra & Centera to Isilon faster?”
Unless of course the reader was trying to migrate raw data out of a Centera then there were a couple plugs for some utility/paid service that might help them.
I feel like the reader might have been in a similar situation to myself where I am staring down the bowels of a Celerra w/ CA/FMA/CTA appliance offloading old/archive data to a Centera. I have ~30TB to migrate.
This post is the most relevant search result I’ve been able to find along the lines of ‘migrate celerra to isilon’. As I am going through the same task at the moment, I wanted to add some of what I have come across to help others in the future. I truly hope the original reader found his answer and could maybe hop back on your blog at some point to provide more insight into the solution, or has chosen a new career path. Maybe marketing: most of your responders appear to be in pre-sales, with not so many post-sales type actual technical answers here…
1. There is a flag that needs to be set on the Celerra that either prevents or allows the data on Centera to be pulled back across instead of migrating archived file ‘stubs’. Find that setting before your migration to ensure you are really pulling everything over to the Isilon as FMA/CTA is only supported w/ Isilon by using Isilon as an Archive Target (replacing Centera). So you must make sure you do this if the final intent is to replace Celerra/Centera solution with an Isilon cluster.
(is this flag set at the DataMover, VDM, or Filesystem?-I’ll contribute more when I get to that answer)
2. ndmpcopy running on a separate Linux server (still messing around with this, trying to get the session to actually copy data; currently my connections are established but have not transferred my intended test data yet)
3. robocopy or emcopy to run incremental syncs of the data (copy/purge) until your cut-over
Master this process and save yourself tens of thousands in migration services from EMC…
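The copy/purge idea in step 3 can be sketched in a few lines of Python. This is only an illustration of the logic (copy what is new or changed, remove what no longer exists on the source), not a replacement for robocopy or emcopy: it ignores ACLs, locked files and stubs, and the function names are made up for this example.

```python
import os
import shutil


def needs_copy(src: str, dst: str) -> bool:
    """A file is copied when it is missing on the target, differs in
    size, or is newer on the source."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) > int(d.st_mtime)


def mirror(src_root: str, dst_root: str, purge: bool = True):
    """One incremental pass. Returns (copied, purged) relative paths."""
    copied, purged = [], []
    for dirpath, _dirs, files in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if needs_copy(src, dst):
                shutil.copy2(src, dst)  # copy2 keeps the modification time
                copied.append(os.path.normpath(os.path.join(rel, name)))
    if purge:
        # Walk bottom-up so emptied directories can be removed as well.
        for dirpath, _dirs, files in os.walk(dst_root, topdown=False):
            rel = os.path.relpath(dirpath, dst_root)
            src_dir = os.path.normpath(os.path.join(src_root, rel))
            for name in files:
                if not os.path.exists(os.path.join(src_dir, name)):
                    os.remove(os.path.join(dirpath, name))
                    purged.append(os.path.normpath(os.path.join(rel, name)))
            if rel != "." and not os.path.isdir(src_dir):
                os.rmdir(dirpath)
    return copied, purged
```

Running such a pass repeatedly before the cut-over means the final pass only has to move whatever changed since the previous one.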
This environment sounds like Rainfinity/CTA. The challenge is that the original file reference is on Celerra, however the data may be there or just stubbed and actually be on Centera.
The migration choices are:
1. Pull everything back to Celerra and then use an open-source copy tool. The problem is that Celerra was typically used as a cache and thus is many times smaller than what is archived on Centera, so it takes some intelligence to manage the space. This also plays havoc with tools that time out waiting for recalls.
2. Use software that is smart enough to migrate only unstubbed data from Celerra and then get the remaining data from Centera. Of course this data can be cross referenced with the originating application and cryptographic hashes are used to verify files migrated from both Celerra and Centera. We have developed software to solve this problem and as far as I know are the only company that does this.
http://www.datatrustsolutions.com
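Verifying migrated files with cryptographic hashes, as mentioned in option 2, can be sketched as follows. This is a generic illustration using SHA-256 from the Python standard library; it is not the vendor's software, and the choice of hash and the function names are assumptions.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pairs(pairs):
    """Given (source_path, target_path) pairs, return the pairs whose
    contents differ. An empty list means every copy matched."""
    return [(s, t) for s, t in pairs if sha256_of(s) != sha256_of(t)]
```

The list of pairs would come from wherever the migration records which source file became which target file.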
To respond to Dan Feilner, the flag he is referencing is the “recall” type. It is set on a per file system basis. There are three recall flag types: passthru recall, partial recall, full recall.
In the scenario where a client (user or application) requests a file that has been archived and a stub file is present, pass-thru recall will provide the file back to the client but leave the stub file and archive data in place on the Centera. Partial recall will provide only the portion of the file back to the client needed to satisfy the client request, promote the partial contents back to the primary share (off/from the archive repository) and update the stub file to point only to archive data that was not promoted. Full recall will write the archive data back to its original location and delete the stub.
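The difference between the three recall types can be shown with a toy model. The sketch below only mirrors the behaviour as described in this comment; it is not the Celerra implementation, and the class and field names are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class RecallType(Enum):
    PASSTHRU = "passthru"
    PARTIAL = "partial"
    FULL = "full"


@dataclass
class ArchivedFile:
    size: int                  # total bytes held in the archive
    bytes_on_primary: int = 0  # bytes promoted back to the primary share
    stub_present: bool = True


def client_read(f: ArchivedFile, nbytes: int, recall: RecallType) -> int:
    """Serve a read of nbytes and update the file state per recall type.
    Returns the number of bytes handed to the client."""
    served = min(nbytes, f.size)
    if recall is RecallType.PARTIAL:
        # Only the requested portion is promoted; the stub remains.
        f.bytes_on_primary = max(f.bytes_on_primary, served)
    elif recall is RecallType.FULL:
        # Everything is written back and the stub is deleted.
        f.bytes_on_primary = f.size
        f.stub_present = False
    # PASSTHRU: data is served, but nothing changes on the primary.
    return served
```

In this model a copy tool reading through the share receives the data under all three settings; what differs is the state left behind on the source.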
-Brandon Sanders
bsanders@data-strategy.com
The command used to set recall type is fs_dhsm.
The biggest issue during that initial migration was the offline flag remaining after the migration (passthru recall on Celerra). We wrote a PowerShell script that spent days traversing the migrated data to update those fully available files that still had the offline attribute bit turned on. Within the same month of completing this massive effort, EMCopy was updated to remove this attribute during the copy… The Celerra and CTA w/ Centera had already been decommissioned, and since then we have only been working on migrating the remaining scattered, much smaller Windows file servers. I know there are more options now, like isi_vol_copy, which leverages NDMP to send data faster, but I’m out of Celerra and VNX systems to migrate. I’m still most comfortable using EMCopy from Windows, with log output in case someone ever asks to validate the data copied, files skipped, etc.
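For readers facing the same leftover flag, the detection half of such a traversal could look roughly like this. It is a sketch, not the PowerShell script mentioned above: it assumes the "offline attribute bit" is the Windows FILE_ATTRIBUTE_OFFLINE flag (0x1000) and relies on the st_file_attributes field that Python's os.stat only provides on Windows. It reports files and does not change them.

```python
import os

FILE_ATTRIBUTE_OFFLINE = 0x1000  # Windows file attribute bit (assumed to be the flag in question)


def is_offline(attrs: int) -> bool:
    """True when the offline bit is set in a file-attribute bitmask."""
    return bool(attrs & FILE_ATTRIBUTE_OFFLINE)


def without_offline(attrs: int) -> int:
    """Return the bitmask with only the offline bit cleared."""
    return attrs & ~FILE_ATTRIBUTE_OFFLINE


def find_offline(root: str):
    """Walk a tree and list files still flagged offline.

    On non-Windows systems st_file_attributes is absent, so the
    result is simply an empty list.
    """
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            attrs = getattr(os.stat(path), "st_file_attributes", 0)
            if is_offline(attrs):
                hits.append(path)
    return hits
```

Writing the list to a log before changing anything keeps a record that can be checked later if someone asks which files were affected.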