
How much free space to leave on an external HDD used for storage? + Corruption multiple backups?

Discussion in 'SSD and HDD storage' started by 321Boom, Nov 21, 2017.

  1. A2Razor

    A2Razor Master Guru

    ASUS R9 Fury X
    Might be a problem depending on the flat owner, what your electrical box is like, and where power comes into the house (a mini-split will usually have its own dedicated breaker installed). Since it's your parents' house and the breaker box probably isn't filled to capacity already, there's probably no consideration other than money. Noise-wise, if you can go for a mini-split it's going to be the quietest and cool the best of all possible options.

    Costs where you live sound a lot more reasonable than here for the install job. If you can get it that low, then it's definitely worth it to go for a mini-split.
    ^ I'd go for the mini-split considering you plan to be there for many years more.

    -Out here I'm using PACs to cool my computer rooms, since I'm trying to be economical and avoid a permanent installation. Another good thing about PACs is that they're mobile: if you ever have central air break, or any smaller unit break, you can roll in a PAC and use it temporarily rather than having the room cook for a few days.

    InfiniBand is an alternative to Ethernet -- check out this blog post. It can be cheaper ... but configuration is also not as straightforward as Ethernet. You don't just plug it in and it works.

    Sometimes modern motherboards come with 10gbit Ethernet integrated, though only on the more expensive models. I'd say that the only tradeoff is cost, since at a minimum you'd have to buy extra hardware for the computers you're using right now to get them on a faster network.

    Only the motherboard influences it, yep. You don't need to be that careful with the brand of ECC memory that you buy beyond just the unbuffered or buffered choices usually.

    EDIT: I take that back, just don't buy "BlackDiamond" memory, for motherboard compatibility reasons. Crucial, Samsung, Kingston, etc, are fine (any big well known brand name).
    ---Board makers usually post a QVL (qualified vendor list) in their documentation, though the only off-QVL memory that has ever failed to work for me was BlackDiamond.
    Last edited: Jan 12, 2018
  2. 321Boom

    321Boom Member Guru

    GTX980 Ti
    Thanks for the information, I'll go for a mini-split then. Besides, I have experience with these in the past, and they really are awesome xD

    Haha that's a first, usually everything is really expensive here compared to other countries.

    That is a very good point about the PAC as a back up for the AC.

    Ok, I read the blog post, and I'm sorry to say that some of that stuff seems a bit out of my league to set up I reckon :/ Even the writer of the blog post said it took him a good few days to set it up, and he knows what he's doing, rather than me being a first time user :/ Let's try keeping it a bit simpler, as long as I won't feel any negative impacts with a standard 1gbit link (you know as long as gameplay recordings at 1080p 60fps, around 45,000 average bitrate, don't have trouble to be streamed from the server, don't want them to get interrupted with stopping to buffer while viewing them though, that would be mega annoying and disappointing).
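For a rough sense of scale, here is a back-of-the-envelope check of that recording bitrate against a 1gbit link. It assumes the "45,000" figure is in kbit/s (the usual unit in recording software, but not stated above) and ignores protocol overhead and disk speed. The function name is made up for illustration.

```python
def link_headroom(stream_kbps: float, link_mbps: float = 1000.0) -> float:
    """Return how many times the link's raw capacity exceeds the stream bitrate."""
    stream_mbps = stream_kbps / 1000.0  # kbit/s -> Mbit/s
    return link_mbps / stream_mbps


if __name__ == "__main__":
    # Assumption: the 45,000 average bitrate quoted above is in kbit/s.
    stream_kbps = 45_000
    print(f"Stream: {stream_kbps / 1000:.0f} Mbit/s (~{stream_kbps / 8000:.1f} MB/s)")
    print(f"Raw 1gbit capacity is roughly {link_headroom(stream_kbps):.0f}x the stream bitrate")
```

Under that assumption a single stream uses only a small fraction of the link, so buffering would more likely come from something other than raw network capacity.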

    That's awesome that the only tradeoff is cost, thought it would be something like heat or noise.

    Noted about BlackDiamond, thanks for warning me about your negative experience :/

    A couple of questions from some more research I did:
    1. 'ZIL (ZFS Intent Log) can be moved to an SSD to drastically boost performance of your ZFS pools, but it is obviously write intensive, and requires a small (<32GB should be fine) but robust SSD if you want to do that. Again though, if this fails, the system simply goes back to writing the ZIL on the same drives that are already in the pool.'
    Will this benefit me for what I plan to be doing with the server? (remember it's basically just going to be my storage drive, and watching videos from it, nothing really more intensive than that)

    2. Interaction of TLER with the advanced ZFS filesystem
    'The ZFS filesystem was written to immediately write data to a sector that reports as bad or takes an excessively long time to read (such as non TLER drives); this will usually force an immediate sector remap on a weak sector in most drives.'
    Is this something I should take note of, or something that is instantly set up? Will setting a lower TLER be better? If I'm understanding correctly it's better to have TLER disabled if you have a RAID set up with 3 mirrors.
  3. A2Razor

    A2Razor Master Guru

    ASUS R9 Fury X
    Yep, cost only.

    Especially if you go the Ethernet way and not with some exotic setup (you're right that InfiniBand would require some days of study to get working, especially on BSD) -- the more expensive 5gbit and 10gbit NICs are pretty well plug and play compatible with FreeBSD out of the box. However then you need the NICs, potentially the switch, sometimes higher grade cable (or a repeater if a long run), etc, and that's a few hundred bucks easily.

    (of course as always read online first to check up on BSD compatibility, though in my experience the faster Ethernet NICs tend to just-work out of the box [plug them in and that's it for setup] )

    You could do that, but I don't think you're going to see much of an improvement realistically.

    The only situation that using an SSD is going to help with is very intensive writing to the array. To hit a sustained speed where the buffer can't be committed to the array as fast as you're transmitting data (over the network) would require that you upgraded to 10gbe and were reading from an SSD. Your array (due to having many striped disks in it to get up to 24+TB capacity) will already be pretty quick and capable of sustaining several hundred MB/s of writes (faster than 1gbit can deliver). The CPU in the NAS is also dedicated to ZFS' chore of committing data and it won't be fighting against any other roles.

    The defaults I suspect won't cause you any problem -- you could reduce this if you want to, but again I don't think you'll need to do any tuning there realistically. The fact that you have three-way mirroring does provide protection already. Although early detection can give you a performance hit, I really doubt there'll be any noticeable hit for your use. That is, given this is for write-performance and the array is primarily intended for archival and long term storage of lots of data.
    Last edited: Jan 19, 2018
  4. 321Boom

    321Boom Member Guru

    GTX980 Ti
    Good to know in case, always good to have options, but will I really need more than a 1gbit NIC for what I intend to use the server for?

    Awesome, so no need for the SSD then.

    This 3 way mirroring really is a blessing huh with all the stuff it protects against. Yep you're right, not going to be doing too many intensive writes, main use is to use it as my storage drive and to watch/stream videos from it.

    You know how we talked about Robocopy before? Well I was going to give it a shot today (to avoid more needless non-ECC writes in future back ups till I get the server), and I read that there is a GUI version called Robocopy GUI (https://en.wikipedia.org/wiki/Robocopy#GUI), and also an updated version of the GUI called RichCopy. Any opinions on these GUI versions vs the standard command line version?

    RichCopy: [screenshot not preserved]
    Robocopy GUI: [screenshot not preserved]
    If they're as good as the command line version, what checkboxes to check? The Robocopy GUI has loads of copy options and filters which I really don't understand unfortunately :(. What I'm aiming for is to mirror my storage drive that's in the gaming rig onto 2 external drives as back ups (till I get the server). In about a month, when it comes round to the next routine back up, I'll just want it to update/sync the 2 back ups only with the newly added files, and files that had any changes made to them.

    On a separate note but still related to Robocopy and file syncing software with the point I'm getting to: I don't understand why, if I simply open a Word document (not edit it or save, just open and view), the folder the Word doc is in gets its 'date modified' field changed to the date I opened the Word doc. Pic attached to show what I mean, look at the folder called 'To watch':
    [Before and After screenshots not preserved]
    Important to note that I didn't save or edit anything, just opened it. Also what's even weirder is that opening an Excel file or Notepad doesn't change the modified date like the Word doc does... Anyway, point I'm getting at is, with Robocopy and Rsync, since these programs sync according to a time stamp and filesize, how will they behave to this change? Will they detect the folder as a new one and just resync it?

  5. A2Razor

    A2Razor Master Guru

    ASUS R9 Fury X
    Robocopy and rsync will only synchronize files within the folder that have changed, "even if" Windows Explorer shows a newer timestamp on the folder itself (access, modify, creation). So, in other words, even if the folder shows a different (newer) timestamp, when robocopy descends into that folder and recursively looks at each file, it'll make its decision totally independently on a file-by-file basis.
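The file-by-file decision can be approximated with a short sketch. This is a simplified stand-in, not the actual logic of robocopy or rsync: it assumes the comparison is just "size and modified time", and the function name and the 2-second tolerance are made up for illustration. Note that the folder's own timestamp never enters the decision.

```python
import os


def needs_copy(src_file: str, dst_file: str, tolerance_s: float = 2.0) -> bool:
    """Decide per file: copy if missing, or if size or modified time differs."""
    if not os.path.exists(dst_file):
        return True
    src, dst = os.stat(src_file), os.stat(dst_file)
    if src.st_size != dst.st_size:
        return True
    # Small tolerance because some filesystems store coarse timestamps.
    return abs(src.st_mtime - dst.st_mtime) > tolerance_s
```

Opening a Word document that only bumps the folder's date would leave every file's size and modified time unchanged, so a check like this would skip them all.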

    -This is a gray area for me on the GUIs, since I haven't played with robocopy GUIs much. I can tell you the argument that you want here though: "/MIR" -- which stands for mirror.
    Take a look here at what each of those options does. In short, /MIR is /E coupled with /PURGE.

    /E: Copy subdirectories, including Empty ones. (aka, recursive copy)
    /PURGE: Delete dest files/dirs that no longer exist in source.

    The reason that you want /PURGE is that if you rename a file, /E will copy a new file under the new name, but the old one won't be erased afterwards. The same goes if you move the file somewhere else, such as making folders for organization and shuffling items around. You'll wind up with a lot of duplicates from /E, which /PURGE will remove for you automatically.

    So, as an example here your command could be as simple as:
    robocopy <source> <target> /MIR

    ^ This doesn't have a retry count or delay specified. The /R and /W values in the GUI look fine. I assume the GUI will give you a readout at the end, just like robocopy does from a command line, so after the copy finishes you just read the report and check if any transfers failed.
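As a sketch of how the pieces fit together, the helper below assembles a robocopy command line with /MIR plus explicit /R (retries) and /W (wait in seconds) values, and adds /L by default so the first run only lists what would happen. The helper name, the retry values, and the drive paths are all made up for illustration; only the switches themselves are robocopy's.

```python
def robocopy_mirror_args(source: str, target: str, retries: int = 3,
                         wait_s: int = 5, list_only: bool = True) -> list[str]:
    """Build a robocopy mirror command; nothing is executed here."""
    args = ["robocopy", source, target, "/MIR", f"/R:{retries}", f"/W:{wait_s}"]
    if list_only:
        args.append("/L")  # list what would be copied or deleted, change nothing
    return args


if __name__ == "__main__":
    # Hypothetical paths: source first, destination second.
    print(" ".join(robocopy_mirror_args(r"D:\Storage", r"E:\Backup")))
```

Checking the /L listing first, and only then dropping `list_only`, is one way to confirm the source and destination are in the right order before anything gets purged.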

    You might also want to add the /MT argument ESPECIALLY if you have a ton of tiny files rather than big-files. This doesn't sound like it'll help much for the bulk of the content you're storing, so you'd have to experiment and test there. Multithreading helps where there's a choke on seek-time (eg, small files), yet can actually hurt sustained transfer for large files.

    [also as a caution, using /MT usually does increase fragmentation in the copy -- this can be a big deal like if you're cloning one disk to another from scratch]

    ^ Try with /MT:1 and also with a higher number if you do this more than once, and time them both. It really depends on the type of files you're copying which will do better. /MT:1 from a disk to an empty disk will create a copy that's almost fragmentation free, which is why alot of people do this for a two-birds with one stone type deal.

    With /MT:1, robocopy will behave similarly to xcopy.

    This is an odd one. Has to be program specific though, likely a 'feature' of Word Documents to record the last access of the file or write some transaction log of any user that's touched it.
    Last edited: Jan 19, 2018
  6. 321Boom

    321Boom Member Guru

    GTX980 Ti
    Hmm, ok, so one thing concerns me about this, sorry if it's going to get confusing x_X: When I get the server, I'm planning on connecting my 4TB storage drive that's currently in my gaming rig to the server, and mirroring everything from the 4TB drive using rsync, so the server will have all of the data that's on the 4TB drive, and I will start saving new stuff directly on the server from then on. So the server will become my primary storage location, and I'll be using the 4TB storage drive that was in the gaming rig as one of the 2 external drives in the back up method I already use (I don't need it in my non-ECC gaming rig anymore since I'm saving everything to the server now). The data I rsynced from the 4TB drive will have a new creation date I assume (for example a file on the 4TB drive will show 2016, which is the date I originally saved it, but 2018 on the server, when I copied it there). When I come to rsync again to the 4TB storage drive, this time the other way around, so from the server to the 4TB drive (using it as an external drive in an enclosure), will it delete all of the data that was originally on the 4TB drive because of the different creation date (this time seeing it as 2018 instead of 2016) and recopy it with the new creation date? The reason I ask is that I would like to keep the data that was originally saved on the 4TB drive since it's the original copy of the data (it wasn't moved around with copies, in case of errors), rather than that data being copied to the server, then the server deleting it and recopying it.

    Yes the /MIR command sounds like what I need. I agree on the PURGE option, otherwise it wouldn't be a complete mirror without it, and end up with lots of duplicates and more confusion.

    So, I found this:
    'The /mir option is equivalent to the /e plus /purge options with one small difference in behavior:

    With the /e plus /purge options, if the destination directory exists, the destination directory security settings are not overwritten.

    With the /mir option, if the destination directory exists, the destination directory security settings are overwritten.'

    It sounds like what you told me above, but what is it referring to with 'destination directory security settings'? Is it better that these security settings are overwritten using the /MIR command, or better that they're not by using the /e plus /purge options?

    That's good if it could be something that simple (I'm not really a fan of DOS and command prompts)

    Is that a good thing or bad that the command prompt won't have a retry count or delay specified?

    I know this will sound very amateur as a way to check compared to your methods, but what I do when taking back ups to the external drives is right click -> Properties on the folders I backed up and make sure the Size and number of Files and Folders are the same as the source. (On a sidenote just for knowledge, any idea why Size and Size on Disk is different in the Properties tab?)
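The Properties comparison described here can be scripted as a quick sanity check. The sketch below counts files, folders, and total bytes under a root so two trees can be compared; the second helper illustrates one common reason "Size on disk" is larger than "Size", namely that space is allocated in whole clusters. The function names and the 4096-byte cluster size are assumptions for illustration.

```python
import os


def tree_summary(root: str) -> tuple[int, int, int]:
    """Return (file_count, folder_count, total_bytes) for everything under root."""
    files = folders = total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        folders += len(dirnames)
        for name in filenames:
            files += 1
            total += os.path.getsize(os.path.join(dirpath, name))
    return files, folders, total


def size_on_disk(size_bytes: int, cluster_bytes: int = 4096) -> int:
    """Round a file size up to whole clusters (4096 is an assumed cluster size)."""
    return -(-size_bytes // cluster_bytes) * cluster_bytes
```

If `tree_summary(source) == tree_summary(backup)` the counts and sizes match, though this says nothing about whether the contents are identical.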

    Will this /MT command make the files copy more securely, or is it just a matter of speed? I have a mix of files, ranging from small files (like anime art) to large files (gameplay videos).

    '[also as a caution, using /MT usually does increase fragmentation in the copy -- this can be a big deal like if you're cloning one disk to another from scratch]'
    So what would be the benefit apart from speed, for using the /MT command if it has this drawback?

    '^ Try with /MT:1 and also with a higher number if you do this more than once, and time them both. It really depends on the type of files you're copying which will do better. /MT:1 from a disk to an empty disk will create a copy that's almost fragmentation free, which is why alot of people do this for a two-birds with one stone type deal.'
    Only advantage of this is speed, and nothing else? I'm in no rush, especially if I know everything is getting copied more safely. So to set /MT:0, I just don't type in the /MT command, or do I have to specify /MT:0? This would be the safest route?

    'When specifying the /MT[:n] option to enable multithreaded copying, the /NP option to disable reporting of the progress percentage for files is ignored. By default the MT switch provides 8 threads. The n is the amount of threads you specify if you do not want to use the default.'
    Regarding this statement, does it mean /MT is enabled by default, as /MT:8?

    Sorry for all the questions, I know some of them seem very noobish and probably stupid, but I'm really not familiar with command prompt and the like, and pairing this with taking a back up of my data sends my ocd a bit on the fritz, knowing I'm using such a powerful tool on something so delicate. To me it feels like I'm attempting a glass sculpture with a jackhammer, as you told me, robocopy and rsync are very powerful tools, especially with their destructive capability, and with great power comes great responsibility.

    Indeed haha, had me running my head in circles for a while till I figured out what was going on. Seeing updated folder dates when I know I didn't save anything new lol.

    Thanks again for all your help, and patience. It's very nice of you taking the time for someone you never met.
  7. A2Razor

    A2Razor Master Guru

    ASUS R9 Fury X
    Fortunately rsync and robocopy have the ability to preserve timestamps in the transfer, so this won't be a problem for future re-syncs.
    --This is why these tools are better than just doing a drag-and-drop copy (attributes, rights, ownership, security, preservation). ;)

    /MIR will maintain timestamps in robocopy and also sync folder rights. In rsync, use "-av"

    -v is verbose mode (more output to read, aka "tell me what you're doing" mode)
    -a is archive mode (equal to specifying -rlptgoD)
    -r is recursive (that's like /E in robocopy)
    -l is maintain symlinks rather than copying the symlinked target
    -p is maintain permissions (eg, keep rights)
    -t is maintain timestamps
    -o is maintain owner
    -g is maintain group (also ownership related)
    -D (--devices, --specials)
    --devices, preserve device files (super-user only)
    --specials, preserve special files
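To illustrate why preserving timestamps matters for later re-syncs, here is a small sketch using Python's standard library rather than rsync or robocopy themselves: `shutil.copyfile` copies contents only, while `shutil.copy2` also carries over the modified time. The wrapper names and the file name are made up for illustration.

```python
import os
import shutil
import tempfile


def copy_plain(src: str, dst: str) -> None:
    shutil.copyfile(src, dst)  # contents only; destination gets a fresh modified time


def copy_preserving(src: str, dst: str) -> None:
    shutil.copy2(src, dst)  # contents plus modified time and permission bits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "saved_in_2016.txt")
        with open(src, "w") as f:
            f.write("example")
        old = 1451606400  # 2016-01-01 00:00:00 UTC
        os.utime(src, (old, old))
        copy_plain(src, os.path.join(tmp, "plain.txt"))
        copy_preserving(src, os.path.join(tmp, "kept.txt"))
        for name in ("plain.txt", "kept.txt"):
            print(name, int(os.stat(os.path.join(tmp, name)).st_mtime))
```

A copy that keeps the 2016 modified time looks unchanged to a later timestamp-and-size comparison, so it would not be re-copied when syncing back in the other direction.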

    Entirely has to do with speed.

    --The OS will read-ahead of what software requests in a file as long as it's able to do so (a contiguous read), each copy is reading a file start-to-end and writing it start-to-end to the destination. The issue with this is small-files where the OS doesn't know what the copying program (in this case robocopy or rsync) needs to read beyond the open file. eg, the OS can't predict what the next file to be opened is (that's unknown).

    With an SSD that's no problem, because SSDs have fast random access speeds. Yet for a hard disk this is just not the case. So, if the OS doesn't know the next file, then the next file in the transfer will for sure be delayed until the next rotation of the disk (this is time that the drive is 'idle'). It's in your favor speed-wise to queue up many files at once (try to open more than one), so that the OS can more intelligently schedule reading these files in fewer passes across the surface of the disk. Unfortunately this is a tradeoff, since with writing multiple files at once data isn't going out serially one file at a time, and most OSes (Linux and Windows alike) will make the decision to space out files in writing them to the disk, rather than packing them together (which increases free-space fragmentation to try to keep down file fragmentation).

    The fragmentation is no big deal by the way (for archival), but the transfer itself may also go slower. All modern OS's prioritize reading many files quicker (eg, prioritize perceived "responsiveness of the system") rather than focusing on single files (the big videos). NCQ's orbit will be optimized to maximize the number of files opened rather than maximizing sequential-read speed.

    --It's usually a tradeoff and there'll be some "magic value" for thread count where a certain number of threads in that MT argument will result in the fastest transfer. Past a certain point (too many threads), speed will go down.
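One way to hunt for that "magic value" is simply to time the same copy with different worker counts. The sketch below does this with Python threads as a stand-in for robocopy's /MT; it is not how robocopy is implemented, and the function name is made up. For robocopy itself, `robocopy /?` is the place to confirm exactly how /MT behaves when given or omitted.

```python
import shutil
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_files(src_dir: str, dst_dir: str, workers: int = 1) -> float:
    """Copy every file under src_dir into dst_dir using `workers` threads; return seconds."""
    src_root, dst_root = Path(src_dir), Path(dst_dir)
    files = [p for p in src_root.rglob("*") if p.is_file()]
    # Create destination folders up front so the worker threads never race on mkdir.
    for p in files:
        (dst_root / p.relative_to(src_root)).parent.mkdir(parents=True, exist_ok=True)

    def copy_one(p: Path) -> None:
        shutil.copy2(p, dst_root / p.relative_to(src_root))

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(copy_one, files))  # list() surfaces any copy errors
    return time.perf_counter() - start
```

Running it with `workers=1` and then a higher number against a fresh, empty destination each time gives a rough comparison; the second run will often benefit from the OS file cache, so the numbers are only indicative.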

    No other benefits, the parallelization is just for speed. The default '8' is a middle ground ballpark guess assuming a mix of small files and big files.

    Yes, it'll be enabled by default. You may want to even go higher depending the situation.

    For example if I wanted to robocopy my MinGW folder off an HDD to a RAMDISK. [87,953 Files, 2,060 Folders, 1.35 GB]
    ^ This is the perfect case for bumping the MT count to 20+, as those are primarily very small text files (headers).

    Just think of it like the "With Great Power Comes Great Responsibility", Spider-Man, Winston Churchill, etc. All risk of batch operations can be avoided by testing the waters first. Start small scale, apply large scale later only after verifying success first. (just use them responsibly)

    I personally think of this as different than a glass statue. A glass statue is chiseled apart with small successive operations, but the automation of this would be more like using a CNC engraving tool that has a detailed blueprint of its cutting path. robocopy and rsync are simply batch operations, repeating the same job over and over.

    So in this case it'd be like if you wanted to move a mound of dirt yourself, handful by handful, vs hiring a thousand employees to do the same task as you.
    --One job is complex and requires central control / leadership, the other is just repetition. As long as you know that one small operation is performed right, you can assert that repeating that operation will get the job done.
    Last edited: Jan 21, 2018
  8. 321Boom

    321Boom Member Guru

    GTX980 Ti
    That's very good to know that rsync and robocopy keep the original timestamp. Good to know that the original copy of the data on the 4TB storage drive will stay untouched and not overwritten. Thanks for the clarification :) It does sound like using these tools has its benefits over the casual drag-and-drop method.

    I'm in no rush believe me, all that matters to me is that the data gets copied across safely. So I could just leave it set to the default value of 8, or would lowering it make it safer due to less fragmentation?

    By 'enabled by default', you mean if I had to do: robocopy <source> <target> /MIR, it will have the same result as if I typed in robocopy <source> <target> /MIR /MT:8 ?

    Yep, definitely agree on testing the waters first. I'll create a few dummy files on my laptop and try a couple of syncs before I actually go for the real attempt on my gaming rig.

    Yes I know these tools will be very helpful in the future once I'm more familiar with them, but till I actually do it the first time and see that it turned out right is where all the paranoia comes in haha.

    So I read a couple of things about the /MIR command in robocopy:
    1. 'Use the /MIR option with caution - it has the ability to delete a file from both the source and destination under certain conditions.

    This typically occurs if a file/folder in the destination has been deleted, causing ROBOCOPY to mirror the source to the destination. The result is that the same files in the source folder are also deleted. To avoid this situation, never delete any files/folders from the destination - delete them from the source, and then run the backup to mirror the destination to the source.'

    Ever heard of this? It's quite terrifying and worrying. This site confirms that other users also experienced the same problem: https://social.technet.microsoft.co...elete-files-from-source-?forum=w7itproinstall

    2. In the 2nd link above, in one of the comments at the end, this guy states 'Mirroring is not safe when involving NTFS reparse points', what exactly are these reparse points? Am I using them? (I tried googling what they are but I couldn't understand it unfortunately, some computer terms are alien to me x_X) The same guy also linked to an article with his findings (you might find this interesting for yourself, it concerns symbolic links): https://mlvnt.com/2018/01/where-robocopy-fails/

    3. I read an article that 'Robocopy fails to mirror file permissions – but works for folder permissions.' (https://blogs.technet.microsoft.com...bocopy-mir-switch-mirroring-file-permissions/) with a work around to also copy security info:
    > ROBOCOPY /Mir <Source> <Target>
    > ROBOCOPY /E /Copy:S /IS /IT <Source> <Target>
    Another command is mentioned in relation to the above two commands: > ROBOCOPY <source> <target> /MIR /SEC /SECFIX

    What exactly are these file permissions and security info? Will I need the above code, or just the /MIR <Source> <Target> in my case?

    4. Due to these hiccups I'm reading about robocopy (especially the /MIR deleting source and destination in questions 1 and 2), would I be safer implementing a method of using one of the external drives with TeraCopy (that is, deleting the previous back up that was on it, and copying everything across again with TeraCopy, similar to the current drag-and-drop method I already use), and using the 2nd external drive for the /MIR command with robocopy? (both externals should have the same data in them at the end of the transfer this way right? and it might be a safeguard against the /MIR issue). Your thoughts?
  9. A2Razor

    A2Razor Master Guru

    ASUS R9 Fury X
    Yep, you got it. If not specified then 8 copy jobs are used.

    I've never heard of this nor seen it, at least not unless I made a mistake myself [with junction points or swapping the source and destination order]. (just like my warning of doing that earlier in the thread) I'd strongly suspect as other people are concluding in those discussions that these folks (who are claiming missing files or folders in the source) are making the error of flipping their source and destination. Or as it was also put, it really wouldn't make sense to have a mirror command that has two-way deletion. (this would be ambiguous and like playing Russian roulette or rolling dice)

    Put another way, how would robocopy know whether or not a file has been deleted "or" if it was just a newly created file in the source? From this premise you really have to know a direction of the mirror operation as there's no accessible log to show otherwise -- a mirror operation would literally be rolling-dice or relying on some voodoo to make an educated guess on each deletion.
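The one-way nature of a mirror can be sketched as a simple plan over two folder listings. This is an illustration of the idea, not robocopy's implementation: it compares file names only (changed-file checks are left out), and the function names are made up.

```python
import os


def relative_files(root: str) -> set[str]:
    """Collect every file path under root, relative to root."""
    found = set()
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def plan_mirror(source: str, dest: str) -> tuple[set[str], set[str]]:
    """Return (to_copy, to_purge). Only dest is ever modified; source is read-only."""
    src, dst = relative_files(source), relative_files(dest)
    to_copy = src - dst   # present in source, missing from dest
    to_purge = dst - src  # present only in dest: removed there, never from source
    return to_copy, to_purge
```

With this rule, a file deleted from the destination simply shows up in `to_copy` again. The dangerous case is calling it with the arguments swapped, because then the real source's newer files land in `to_purge`.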

    Very very unlikely if you don't know what they are. You'll "probably" wind up using them after you have a NAS, yet by this point you'll have a better understanding of them too.

    People typically call these junction points, symbolic links, etc. Most modern filesystems support them (including on Windows, FreeBSD, and Linux) ... an easier way to think of this is that it's a filesystem feature to have 'pointers' or 'links' stored rather than the actual file. You've used shortcuts before (for launching programs or going to a folder), junctions are the same concept, yet for software rather than the user. By using these special links, you can store a file (the actual file) anywhere that you wish (even another volume), and substitute in the location that a program is looking for files, with instead a link.

    This has all sorts of purposes, but the gist is that junctions provide a mechanism to reduce space consumption (explicitly get rid of duplicates) or to shuffle and move around data without software having support added for this. For example: if I plug a new disk drive into my machine and I decide that I want to move a game to another drive (because I'm running out of space), I could create a junction point from my main disk's partition(s) to the new drive's, move the files over to the new disk and voila (done). <= easier than reinstalling the game right? (same can be done with save-folders, etc, that otherwise would be hard to relocate / hard-coded in the game)

    --Without any registry changes or otherwise the game is now running with the files stored at the new location. The game also probably has no knowledge that it has been moved. Although it's technically possible to determine this and to check for junction points, almost nothing makes an effort to do this [other than some anti-hack software such as GameGuard, and of course file-copy and FTP tools].
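The relocation trick can be sketched with a symbolic link, which is a close cousin of an NTFS junction rather than the same thing. This is a minimal illustration with made-up folder names; on Windows, creating symlinks may require elevated rights or Developer Mode, and the new location is assumed not to exist yet.

```python
import os
import shutil
import tempfile


def relocate_with_link(original_dir: str, new_dir: str) -> None:
    """Move a folder elsewhere and leave a symbolic link at the old path."""
    shutil.move(original_dir, new_dir)  # new_dir is assumed not to exist yet
    os.symlink(new_dir, original_dir, target_is_directory=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        game = os.path.join(tmp, "game")              # hypothetical install folder
        other_drive = os.path.join(tmp, "new_disk_game")
        os.makedirs(game)
        with open(os.path.join(game, "save.dat"), "w") as f:
            f.write("progress")
        relocate_with_link(game, other_drive)
        # The old path still works, but the data now lives at the new location.
        print(os.path.islink(game), os.path.realpath(game))
```

Software opening `game/save.dat` keeps working without knowing the folder moved, which is also why copy tools have to decide whether to follow such links or copy them as links.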

    Modern filesystems (NTFS, ReFS, ZFS, EXT, etc, etc) have a concept of permissions and ownership. Each file on your drive(s) is owned by some user, and there are various access rights assigned to each file on an account by account basis.
    -You can actually see these pretty easily under Windows: right-click on a folder (or file), select Properties, and then click on the Security tab.

    For the types of files that you have (such as your movies and documents), the rights on these files will probably not matter (there wouldn't be a reason to preserve them). For installed software, it "can" matter if programs create a special account that they run under for the sake of security. (... usually doesn't for things like games -- this is more specialty) Just like on Linux, some file servers on Windows will set up an isolated user-account with dropped privileges to limit the sections of your drive that they can access. It's an effort to limit what the software could write to "if" the server were to be compromised.

    Whatever you're most comfortable with doing is probably best to do. I'll repeat though that I've never personally seen robocopy delete files from the source unless it was my fault.

    --Theoretically you can do "bad stuff" such as creating a junction from the destination back to the source. Yet I'm going to classify that as user error in the use of rsync or robocopy too. rsync has some features to protect against this, such as restricting a copy to a single device (not letting the copy operation hop disk to disk). I don't know if rsync implementations on Windows can do this, yet the Linux versions can.

    Teracopy is great in its own ways, and that's really my opinion on it (it's a different type of tool). Teracopy is fantastic for doing a copy where you want to be 100% sure that the copy has arrived at the destination: it can be configured to do a read-back and checking of hashes on each file. I'd say it has a different purpose than robocopy, and I usually go with Teracopy when I want to clone a full drive to another or just write files to my NAS.

    eg, I used Teracopy as my go to when I migrated my storage volume from NTFS to ReFS on this desktop (copying everything to an empty drive, for the verify feature).
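The "copy, then read back and compare hashes" idea can be sketched in a few lines. This shows the general technique only, not TeraCopy's actual implementation; SHA-256 and the function names are choices made for illustration.

```python
import hashlib
import shutil


def sha256_of(path: str, chunk_bytes: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large videos don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_bytes), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src: str, dst: str) -> bool:
    """Copy src to dst, then re-read both and compare their hashes."""
    shutil.copy2(src, dst)
    return sha256_of(src) == sha256_of(dst)
```

One caveat with any read-back check: right after a copy, the operating system may serve the destination from its cache rather than from the disk, so a later re-verification is a stronger test.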
    Last edited: Jan 22, 2018
  10. 321Boom

    321Boom Member Guru

    GTX980 Ti
    Got it. What I still don't understand though, is if I'm better off using /MT:1 or just leaving the default 8. Below are 2 quotes from your earlier post regarding fragmentation, which keeps circling in my head:

    '[also as a caution, using /MT usually does increase fragmentation in the copy -- this can be a big deal like if you're cloning one disk to another from scratch]'

    '^ Try with /MT:1 and also with a higher number if you do this more than once, and time them both. It really depends on the type of files you're copying which will do better. /MT:1 from a disk to an empty disk will create a copy that's almost fragmentation free, which is why alot of people do this for a two-birds with one stone type deal.'

    Don't the bold and underlined parts sound like what I'm going to be doing, since I'm going to be mirroring? (building an exact replica from scratch, so the fragmentation wouldn't be good to have). So wouldn't I be better off with turning off /MT? (Is there even a way to turn it off, since it's on and set to 8 even if you don't type in the command /MT?)

    Glad to know you've never had a problem with this :)

    Thanks for explaining it. It sounds like a very good concept for those that need it, but for what I intend to do with the server, I highly doubt I will need this feature. Yes while it will be quicker than re-installing a game, the games I play are retro and shmups (very small file sizes), so installation is lightning fast, so it wouldn't really be an issue. (it will probably take me longer to learn how to set up the reparse point and implement it than re-installing a shmup haha :p) Thanks for explaining it though, always good to learn more.

    Got it, so it's not something for me to take note of for my use?

    This isolated user-account you mentioned, the server will set that up on its own or I have to do that? How will it know what sections of the drive it will lock off if everything is going to have the same security privileges?

    'Whatever you're most comfortable with doing is probably best to do' lol you're the data expert, I'm just an extremely paranoid user, don't just leave it up to me :p If it's recommended, a copy with TeraCopy on one external, and a copy with Robocopy (/MIR) on another external I think would be the best course of action.

    'Teracopy is fantastic for doing a copy where you want to be 100% sure that the copy has arrived at the destination' I always want to be 100% sure, that's the point of a back up :/ So, robocopy isn't as 100% sure as TeraCopy? (what would be the point of having a mirror if you're not 100% sure everything is readable in it?)

    'for the verify feature' Isn't /L in robocopy like dry-run in rsync? If so isn't this a verify?

    Well, I bought TeraCopy Pro last night, and toyed around briefly with it, and I'm happy to say I really like it from the brief moving and copying I did (testing the program out). I also noticed it preserves the time stamps which is awesome. 'it can be configured to do a read-back and checking of hashes on each file', are my settings below for TeraCopy doing this?:

    What's up with all the grayed out boxes btw? Anything else I could do from those settings to get the most out of this program? (I unchecked confirm drag and drop, it would be tedious getting a message every time I want to copy or move something). Why would anyone want to 'disable asynchronous file transfer', it speeds up transfers doesn't it?

    Why is Verify grayed out here? Anything I should do with Save checksums (source, target, also why is MD5 grayed out too, isn't this one of the most important features of the program)?

    Thanks once again for all your assistance and time, I really can't stress it enough, I've learnt so much these last 2 months thanks to you.
    Last edited: Jan 23, 2018

  11. A2Razor

    A2Razor Master Guru

    Sounds like you understood this quite well.
    On a first time copy to the server where files will thereafter live permanently, it might be worth doing it with a single serial copy operation [to pack everything tightly together]. You can pretty well assume that over half your files in archival (*probably more*) have been untouched in many months to years.

    ^ So, after these files are packed tightly, they won't move and won't come apart. Meaning they won't impact writes to the rest of the disk as much. (eg, having a nice large contiguous chunk of free disk space for sequential writes)

    --For copying to an external drive where the drive is just for an emergency backup (catastrophic failure), there's not much incentive to do this though. That is, given you'd only use the drive if your main storage were to fail, and in that case you're only using it for a one-time copy back of the data (to restore those files on the main storage). Whereas the freespace fragmentation on a server is something the server OS will be dealing with over time as it builds up, and you have a choice to start with a more clean slate right away. (with that large chunk of files that probably won't ever move)

    From what you mentioned earlier in the thread, you might want score files or savegames saved directly to the NAS rather than your local drive. (to protect against loss on the gaming machine's disk failing) This is where you'll probably wind up using at least symbolic links. (and potentially junctions if a folder will have some files local, and newly created files remotely stored)
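
    A minimal sketch of the symbolic link idea, in Python purely for illustration. The folder names are made-up placeholders (a temp folder stands in for the NAS share), and on Windows creating symlinks normally needs admin rights or Developer Mode, so treat this as the concept rather than a recipe:

```python
import os
import tempfile


def redirect_folder(local_path, remote_path):
    """Make local_path a symbolic link that points at remote_path."""
    os.makedirs(remote_path, exist_ok=True)
    # target_is_directory only matters on Windows; it is ignored elsewhere.
    os.symlink(remote_path, local_path, target_is_directory=True)


# Hypothetical stand-ins for a mounted NAS share and a game's save folder.
base = tempfile.mkdtemp()
nas_saves = os.path.join(base, "nas_share", "saves")
game_saves = os.path.join(base, "game", "saves")
os.makedirs(os.path.dirname(game_saves))

redirect_folder(game_saves, nas_saves)

# The game writes to its usual folder, but the file lands on the "NAS".
with open(os.path.join(game_saves, "hiscore.dat"), "w") as handle:
    handle.write("1234567")

landed_on_nas = os.path.exists(os.path.join(nas_saves, "hiscore.dat"))
```

    The game keeps using its usual folder, while the data physically lives wherever the link points.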

    For security it's a good idea to be aware of it, yes.

    Task Managers on Linux, Windows, and BSD will show you the user that a program is running as. By its nature, if you see programs running under other user accounts, that usually means they've created another user for their own files. In all software that I've used over the years this is something far more likely to see in the Linux and BSD space than on Windows. (although "some" Windows servers will do it, not that many do -- usually you'll see it in ports of unix style services to Windows)

    I'll cite probably the most popular example here in the OpenSource world for hosting, apache (HTTP Server):

    x@www:~$ ps -ef | grep apache
    root      1038     1  0 Jan19 ?        00:00:18 /usr/sbin/apache2 -k start
    www-data 12041  1038  0 01:44 ?        00:00:00 /usr/sbin/apache2 -k start
    www-data 12134  1038  0 02:31 ?        00:00:05 /usr/sbin/apache2 -k start
    www-data 12677  1038  0 07:57 ?        00:00:04 /usr/sbin/apache2 -k start
    www-data 12941  1038  0 10:22 ?        00:00:04 /usr/sbin/apache2 -k start
    www-data 12942  1038  0 10:22 ?        00:00:02 /usr/sbin/apache2 -k start
    www-data 12943  1038  0 10:22 ?        00:00:04 /usr/sbin/apache2 -k start
    www-data 12983  1038  0 10:56 ?        00:00:03 /usr/sbin/apache2 -k start
    www-data 13013  1038  0 11:00 ?        00:00:02 /usr/sbin/apache2 -k start
    www-data 13015  1038  0 11:00 ?        00:00:01 /usr/sbin/apache2 -k start
    www-data 13082  1038  0 11:46 ?        00:00:02 /usr/sbin/apache2 -k start
    www-data 13083  1038  0 11:46 ?        00:00:01 /usr/sbin/apache2 -k start
    x        13733 13694  0 17:39 pts/1    00:00:00 grep --color=auto apache
    x@www:~$ cat /etc/passwd | grep www-data
    www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
    x@www:~$ ls -al /contents
    www-data           www-data    4096 Jan 12 23:10
    ^ Notice that special account "www-data" which all of the website files belong to? You may also notice that it doesn't have any shell login-privileges. This is a special reduced rights account that apache assigns itself to immediately after startup and binding its TCP port. The idea is to leverage filesystem permissions in an effort to reduce the risk of apache touching things it probably shouldn't. (anything it's not hosting) Of course this is by no means foolproof defense, but it's better than nothing and can protect against some types of scripts doing harm to a server.

    --When you install software, if they're going to do this the setup will usually automatically do so. (eg, completely done for you in the case of apache, etc) The reason it's worth being aware of file-permissions is to match those permissions if you're making changes to files in folders the programs created. (changing privileges without realizing it might create a security hole)

    I like to think of it as baby-steps (a bit more each time).

    It's like demanding that someone just getting in to computers (who's not that computer literate or comfortable with machines) build one on their own. That sounds "scary" to most people and they're usually not willing to make that leap immediately (despite that it's not so bad), thus you start small. Edge them to install a RAM upgrade, maybe a videocard. After they get their hands inside a machine that's pre-built for them, then you step it up whenever they're more comfortable (and after they've installed an OS on their own) until soon they feel it's no big deal to venture further. (just a little bit further each time building up self-confidence / comfort)

    Fortunately your situation is different (you can call yourself a power-user and give yourself more credit). You've installed OS's (though Windows to this point), use computers daily, are already doing backups of your data, and sound like an eager enthusiast wanting to do more and willing to adventure with things you've not used. (like FreeBSD for the NAS) Though yes, the bottom line is that it should always be fun and not frustrating.

    --I think you'll find TeraCopy quite easy since it's GUI based, and then after using robocopy you'll find rsync is easy by the time you build a NAS. (as well as getting more console exposure, which will be good for stepping in to BSD)

    The grayed out boxes are because TeraCopy is split in to a service (service does the copying to avoid rights-issues) and the main GUI program (which runs as a standard user). Those grayed out boxes are executed from the client, so if you launch the client as administrator then some of these will be ungrayed.
    --Copy handler and association would've already been setup by the installer though, so there's no need to launch it as admin unless you wanted to remove those.

    Asynchronous transfer is considered redundant by some people (due to the OS sequential file read-ahead prediction, eg the OS already trying to speed this up). It can result in buffering parts of files in memory twice: the OS buffer, and TeraCopy's read buffer for files. Mainly though I'd think of this as a troubleshooting switch as I've never had it cause problems.

    -These are grayed out as you've not performed a transfer.

    After you've done a transfer, then you can save the hashes of the transfer for a later verify. This can be useful because after you've performed a copy (which will automatically trigger a verify per your checkbox choice to always-verify), you might wonder later on if the copy has been tampered with or otherwise touched by some program. (or if the source was still in use, if the source changed during the transfer)
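
    To make the save-hashes-then-re-verify idea concrete, here's a rough Python sketch. This is not how TeraCopy does it internally (I don't know its internals); the function names and the JSON file are my own placeholders:

```python
import hashlib
import json
import os
import tempfile


def hash_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks to keep RAM use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its hash."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            manifest[os.path.relpath(full_path, root)] = hash_file(full_path)
    return manifest


def changed_files(root, saved_manifest):
    """List files added, removed, or altered since the manifest was saved."""
    current = build_manifest(root)
    all_paths = set(saved_manifest) | set(current)
    return sorted(p for p in all_paths if saved_manifest.get(p) != current.get(p))


# Demo on a throwaway folder standing in for a backup drive.
backup = tempfile.mkdtemp()
with open(os.path.join(backup, "save.dat"), "wb") as handle:
    handle.write(b"stage 5, 3 lives")

manifest_path = os.path.join(tempfile.mkdtemp(), "hashes.json")
with open(manifest_path, "w") as handle:
    json.dump(build_manifest(backup), handle)  # the "hash dump" kept for later

with open(manifest_path) as handle:
    saved = json.load(handle)
before = changed_files(backup, saved)  # nothing has been touched yet

with open(os.path.join(backup, "save.dat"), "ab") as handle:
    handle.write(b"!")  # simulate some program modifying the copy
after = changed_files(backup, saved)
```

    An empty list from the re-check means the copy still matches what was hashed at transfer time; anything listed has been altered, added, or removed since.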

    robocopy is lacking some features such as dumping the hashes after a copy for a re-verify. A dry run can be used for this, but it's much slower and will not help in checking if the source has changed over time. Importantly though: robocopy is very verbose and doesn't sort the output for errors after a run. So after you've seen there's an error in the summary, you have to dig through the output to find exactly what went wrong.

    That said, I think the reason I like TeraCopy so much is that it just-works. The developer has dealt with the rights problem with the service design, presents all data in a readable form / sorts the output, and logs all recent copy operations for review. It even automatically switches to detail-view on errors and doesn't close the window if they happen after a copy.

    If a copy fails, you're immediately thrown a what-went-wrong on your screen which focuses just on that.

    No problem, it's actually pretty fun to talk about storage and redundant copies, backups, etc, with someone else who appreciates them. ;)
    Last edited: Jan 24, 2018
  12. 321Boom

    321Boom Member Guru

    By 'single serial copy' you mean /MT:1?

    'On a first time copy to the server', earlier in the thread, didn't we say we were going to move my stuff across onto the server by doing rsync though, not robocopy?

    I understand that fragmentation is a bigger issue for something that has an OS rather than just an external drive for emergency backups, but wouldn't it still be worth doing the external backups as fragmentation-free as possible anyway? Only thing to lose is time right if it's copying at /MT:1 instead of /MT:8?

    Is there a way to completely disable /MT? Any benefits to doing this?

    The more I think about this, the more I think I'd prefer to just move the whole game folder across to the server once when I finish the game and I'm done playing it. The concept of having the save file on the server needing a network connection to be read and written to rather than being on the local machine keeps bugging me (network lag/choke). I don't know how often the game will be accessing it through the server and maybe causing slight performance drops in the process (shmups usually have a slowdown/dropped frames percentage at the end of the game, and I'd like to keep this as low as possible rather than doing things that might increase that percentage). Besides, you suggested I could do a copy of my gaming rig's OS stored on the server in case of the SSD failure in the gaming rig. Well the games I play are on the SSD, so it's like I'm taking an automatic back up of them like that right?

    www-data has no shell login-privileges due to the 'www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin'? I didn't quite understand what I'd have to do though if you're telling me the setup is usually automatic. What changes would I be making to my folders that could create a security hole? I don't tamper with the security of the folders and the like, I just store stuff (games, images, music) in them and play.

    Haha thanks for the compliment, but believe me I'm no power-user, just an addicted gamer that will go to extreme lengths (even jumping into the unknown) to make sure his virtual world is safe :) While I can't say I don't enjoy it (the potential a pc holds is fascinating! such a marvelous piece of equipment), I am nowhere near as knowledgeable as you or the other gurus. Some of the threads and terms I read here on guru3d are just a foreign language to me (especially when it comes to editing registry values, command prompts, and stuff like that).

    Yes so far my experience with TeraCopy has been pleasant :) Yeah I consider myself a GUI person, so I really appreciate how the program is set up. Besides, it basically replaced my standard copy and move in Windows Explorer with an automatic verify after each copy/move, so it's VERY simple to use.

    You're right, I launched as Admin and most of the grayed out boxes are selectable now. Yes the copy handler is set up, but I don't think the association is:

    Isn't the MD5 check one of TeraCopy's best features, so I think I should click on the Associate button? What exactly does it do (associate .sfv and .md5 I mean)? (I didn't click it yet, don't want to be doing associations if I don't know what they're really doing)

    So I should keep 'Disable Asynchronous transfer' unchecked, and only check it if I'm having trouble with TeraCopy?

    Yep, I did a copy through the TeraCopy program rather than through Explorer and the Verify option became available. This is the same as the 'Always test after copy' feature I assume.

    I didn't quite get why I would need to re-verify the hashes afterwards, if I moved or copied something and it got verified already. Like what programs could tamper with the file? 'or if the source was still in use, if the source changed during the transfer' I would never move or copy something while it's open, I'm very cautious on these things.

    I still don't understand what the Save checksums -> Source/Target/grayed out MD5 does :/ Is this to do with the saving of hashes for later re-verifies you mentioned?

    Hmmm that's some negative stuff about robocopy. So can I save a log somewhere out of the command prompt for later viewing to make sure everything mirrored correctly?

    Yes from the testing I did with TeraCopy I'm happy with it too. I've yet to try it on an actual backup yet, but I'm expecting very good results. It looks like a very well made program.

    Btw, in TeraCopy Options, there is a drop down menu, with several options, some of which MD5 (default), SHA-1, Panama, Whirlpool etc, should I leave this on MD5?

    If moving or copying a file (with TeraCopy installed) by doing copy and paste rather than opening the TeraCopy program and using it to locate the Target, I'm still getting the same full benefits (like the MD5 check)? (I'm still getting the verify bar after each transfer doing it this way)

    Yes it really is, albeit time consuming (I've been almost 2 hours typing this out, it's 3am here lol), but all good things require effort they say. Lots to appreciate in something so wonderful, especially 3 way mirroring paired with the protective features ZFS provides, I simply can't get over how amazing that is and can't wait to have one xD
  13. A2Razor

    A2Razor Master Guru

    Most likely with rsync to that machine, yes, since the disk will be directly connected to the FreeNAS box (by SATA, temporarily). Though you'll still be using robocopy with the external HDD's on your desktop for now until then.

    --Thankfully the two (robocopy and rsync) are pretty similar, and it'll probably only take you a few minutes to make the leap to using rsync.

    /MT:1 is effectively disabling it.

    The cost is time, yet it's also just a trade in time to perform the backup vs the time to restore it (mainly in the initial backup). In the ideal case you don't use the external drives for a restore once you have the NAS, so from this stance it just doesn't matter what the state of the disk is in fragmentation.

    I'd assumed a full-disk backup of the gaming rig from time to time [which might be offline (manual) ], with something like Acronis TrueImage or Paragon HDM (from their LiveCD bootable environment, over network to the NAS). Automated (online) backups carry some risk in that you're not sure what the machine is accessing at the time [while the backup is in process the OS can still be writing data to the disk]. Essentially with Windows Update and automatic maintenance tasks going on in the background, it's possible to capture a partial write unless the state of the OS partition has been completely frozen.

    EasyBCD is free for personal use and can automate creating RAMDISK bootmenu options (suitable for both restoring and backing up). It works for all WinPE based disks, and it's possible to create WinPE bootables of both of those above mentioned backup tools. So, annoying as it may sound with loading a LiveCD -- you can get that process down to just rebooting the machine and choosing a startup menu choice (without inserting the CD). Paragon can even be set up to automate the network-drive mounting of the NAS.


    ^ Just to give an example here of what I mean by this. That third boot menu choice goes straight in to Paragon, and the Paragon image on this machine is built / setup to auto mount a network volume just for this machine's backups. **When I run backups of this machine, I pretty much reboot -- boot to Paragon, select backup for the SSD boot-mirror (Windows Dynamic Disks), and go to sleep.

    --You could probably setup an automatic sync of your savegame folders (this should be very safe), as a supplement to doing a full-backup of the OS (offline snapshot) whenever you make significant changes (installing new software or updates).

    "www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin" -- yeah, this is the bit that prevents a shell login.
    --If I were to try to login to that user, even from root, with "su www-data" ... then nologin is launched, prints out text that login is not allowed, and immediately closes. It's a password-less account with its default shell set to "nologin", which is a program that just closes itself on startup.
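
    For anyone curious how that line breaks down, here's a small Python sketch that splits it into the standard seven passwd fields. The helper name and the "nologin or false means no shell" check are my own simplification, not an official rule:

```python
def parse_passwd_line(line):
    """Split one /etc/passwd entry into its seven colon-separated fields."""
    keys = ("name", "password", "uid", "gid", "gecos", "home", "shell")
    return dict(zip(keys, line.strip().split(":")))


entry = parse_passwd_line("www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin")

# Simplified assumption: a shell ending in "nologin" or "false" blocks logins.
can_login = not entry["shell"].endswith(("nologin", "false"))
```

    The last field is the login shell, which is why pointing it at nologin is enough to turn away an interactive session.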

    From the apache server case on files: /var/www/html is the default location for the web content (or in my case /content). This folder is setup (by the installation) to be owned by www-data with very restricted rights. If I were to just move files in to that folder, I would have to be careful that I don't just sudo mv (or cp) <files> /var/www/html (sudo is essentially granting that copy / move admin-rights). Files migrated with admin rights either keep whatever ownership and permissions they already had (mv) or end up owned by root (cp), possibly with executable or other privileges that they probably shouldn't have. The same goes for folders that I create, which may have high privileges. (higher than is appropriate)

    --When in doubt and without a reason to do otherwise it's a good idea to match what was set on that folder. (eg, drop the privileges of everything you put in there, and match file ownership to "www-data") For instance, we could just give permission for everyone to write and read those files and leave them owned by other users ... but we'd also have to think through why we're doing that and if it's creating any security risk in doing so.
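
    As a hedged sketch of "match what was set on that folder", in Python: copy a file in, hand it to the folder's owner where that's allowed, and strip execute bits. The function name is made up, the temp folders stand in for /var/www/html, and the ownership change only succeeds for another user when run as root:

```python
import os
import shutil
import stat
import tempfile


def copy_matching_folder(src_file, dest_dir):
    """Copy src_file into dest_dir, align owner/group with the folder,
    and remove execute permission from the copy."""
    dest_file = shutil.copy(src_file, dest_dir)  # also copies permission bits
    folder = os.stat(dest_dir)
    try:
        os.chown(dest_file, folder.st_uid, folder.st_gid)
    except (PermissionError, AttributeError):
        # Needs root to give files to another user; os.chown is Unix-only.
        pass
    mode = stat.S_IMODE(os.stat(dest_file).st_mode)
    os.chmod(dest_file, mode & ~(stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
    return dest_file


# Demo: a file that is (needlessly) executable gets copied into the "site" folder.
source_dir = tempfile.mkdtemp()
site_dir = tempfile.mkdtemp()
page = os.path.join(source_dir, "index.html")
with open(page, "w") as handle:
    handle.write("hello")
os.chmod(page, 0o755)

copied = copy_matching_folder(page, site_dir)
copied_mode = stat.S_IMODE(os.stat(copied).st_mode)
```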

    Nowadays everyone seems to think that complex is better, yet there's an elegance to simplicity. The "just-works" aspect should never be overlooked. :D

    Interesting that this isn't set right after an install. Always possible that TeraCopy's installation has changed a bit over time, or I may also just not remember as it's been awhile.

    Anyway, you'll want to make that association. What the association will do for you, is make TeraCopy the default program for opening those file types and thus to re-check copied-data after you've saved some form of a hash dump [from an already performed move or copy operation].

    Textual shell programs can have their output piped with: > "filepath"
    ^ This redirects what would be written to the screen (standard output) to a file instead.

    Another way is robocopy's log command: /log:"filepath"

    --Certainly not as straight forward as TeraCopy to have to manually produce logs, and then read them if you see any summary issues.
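
    If you'd rather script it, here's a rough Python sketch of the same thing: run a command, send everything it prints to a log file, then pull out the lines worth reading. The command below is only a stand-in that prints two lines (swap in your real copy command), and searching for the word "ERROR" is my assumption about what to look for:

```python
import os
import subprocess
import sys
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "copy.log")

# Stand-in command for illustration; not a real copy tool.
command = [sys.executable, "-c",
           "print('copied 3 files'); print('ERROR: 1 file failed')"]

with open(log_path, "w") as log:
    # stderr is merged into stdout so error messages end up in the same file.
    result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)

with open(log_path) as log:
    problem_lines = [line.rstrip() for line in log if "ERROR" in line]
```

    That at least saves scrolling through a long log by eye just to find the one file that failed.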

    MD5 is probably good-enough and definitely the most compatible with other tools. However, SHA is arguably superior.
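
    To put "arguably superior" in rough numbers, here's a tiny Python sketch using the standard hashlib module. It only compares digest sizes; the sample bytes are arbitrary. MD5 has known collision attacks, which matters against deliberate tampering, while for spotting accidental corruption in a copy any of these should do the job:

```python
import hashlib

data = b"same bytes, different algorithms"

# Hash the same data with each algorithm and record the digest size in bits.
digests = {name: hashlib.new(name, data).hexdigest()
           for name in ("md5", "sha1", "sha256")}
bits = {name: len(hex_digest) * 4 for name, hex_digest in digests.items()}
```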

    You got it, leave it checked.

    This checkbox just makes TeraCopy do more work (use a thread for reading, and a thread for writing -- maintain its own read-ahead buffer), which may or may not actually speed up transfers ... yet may also result in TeraCopy doing some work redundantly [that the OS would've already been doing transparently for it]. It's a matter of preference: Whether or not it actually helps depends on the OS's caching settings, yet at the same time it won't hurt transfer-speeds and in the worst case it should just cost a bit more RAM during a copy or move.
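
    The reader-thread / writer-thread idea looks roughly like this in Python. It's my own minimal sketch of the general technique, not TeraCopy's actual code, and the chunk and buffer sizes are arbitrary:

```python
import os
import queue
import tempfile
import threading


def async_copy(src, dst, chunk_size=64 * 1024, max_chunks=8):
    """Copy src to dst with one thread reading ahead while another writes."""
    buffer = queue.Queue(maxsize=max_chunks)  # bounded read-ahead buffer

    def reader():
        with open(src, "rb") as source:
            while True:
                chunk = source.read(chunk_size)
                buffer.put(chunk)  # an empty chunk signals end of file
                if not chunk:
                    break

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    with open(dst, "wb") as target:
        while True:
            chunk = buffer.get()
            if not chunk:
                break
            target.write(chunk)
    thread.join()


# Demo with about 300 KB of random data.
folder = tempfile.mkdtemp()
source_path = os.path.join(folder, "source.bin")
copy_path = os.path.join(folder, "copy.bin")
original = os.urandom(300 * 1024)
with open(source_path, "wb") as handle:
    handle.write(original)

async_copy(source_path, copy_path)
with open(copy_path, "rb") as handle:
    copied = handle.read()
```

    The queue is the "extra RAM" mentioned above: up to max_chunks chunks sit in memory on top of whatever the OS is already caching.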

    -Long as you see TeraCopy pop up instead of the default Windows copy handler, then you're benefiting from the verification it provides.

    Should be the same as the automatically triggered verify.

    It's hard to come up with legit situations that aren't user-error (forgetting to close a program). Definitely forgetting to close something and realizing at the last minute that a file might still be open is going to be the main reason to re-verify. [That is, rather than re-copying everything]

    The best example that I could give is if you're backing up a Virtual Machine LIVE (online), and you're testing a setup of OverlayFS on the guest installation, or if you've shut down all services on the machine (stopped all writes). You can perform a copy, then re-run a verification of the source to see if the disk image has continued being modified afterwards (which it's not supposed to be). If everything has stopped and all files are closed, or if the main partition is read-only, a re-test can be used to prove that this snapshot (of the VM's disk) was successful (nothing was altered afterwards), ruling out any potential for a write mid-backup.

    ^ Or in other words, if the re-verify passes then you know your backup is sane and can be safely restored. With VM's this is a big deal, as the file system of the VM is stored as a single large file on the host. A write mid-backup could corrupt more than just the file being written.

    Yeah, the irony is that a lot of people (including gamers) don't seem to think of their data as that important [even if it's literally their life's work]. Most people are just storing everything on the Cloud with Google Drive, OneDrive, DropBox, etc, or don't think about it at all, as in the case of Steam and Steam Cloud savegame sync. Yet that's putting your precious data in the hands of someone else and offloading the responsibility of taking care of it. I definitely don't like the thought of putting my precious files (which for me is programming projects, account passwords, game backups, and scans) in the hands of someone when I don't even know what technologies they're using, or what they're doing to keep it safe. Especially in the case of a free cloud service or cheap service, I doubt they even feel they have much obligation to us.

    Oddly I used to be that way, and I've actually lost several months of work numerous times -- before doing a 180 and getting anal about data. I think it's one of those things that you just don't start to take seriously until you actually lose something important and it sinks in what this is going to cost to replace. It's amazing how much exists out there to protect data, and also how few people actually use all these tools at their disposal. (despite that they could, even small scale for ultra precious data, without much money invested)
    Last edited: Jan 28, 2018
  14. 321Boom

    321Boom Member Guru

    Awesome, plus rsync also has the 'ro' command (read only) which will guarantee I don't accidentally delete something from the main storage drive in the process till I move everything over.

    Glad to know /MT:1 effectively disables it, makes sense since it will only be using 1 thread.

    Yes ideally the external drives will be there in case of catastrophic failure to the server, kind of like emergency back ups, that is why I would like them to be in the best state possible because they'll be my last line of defense if something goes wrong with the server.

    You assumed correctly, I'd prefer the manual back ups, as you said, I don't want it to be doing something as sensitive as that while I'm gaming and recording at the same time. I manually check for Windows Updates too, don't want them going off while I'm gaming (the less background processes, the better).

    Is this the same as Acronis TrueImage? That's what I read users usually use to do backups of their systems.

    About the RAMDISK bootmenu, will this be sitting in my RAM consuming a certain amount of my space constantly?

    Thanks for the explanation with the screenshots, most of these terms are new to me though. Like what do you mean by 'auto mount a network volume', I'd prefer doing my back ups manually rather than scheduled. How long will it take to do this then if you go to sleep in the process? My gaming rig's SSD only has 100 GB of data in it including the OS, I assumed it would be done in about an hour.

    I'd also prefer doing this (save-game files) manually, wouldn't want the scheduled syncs hitting while I'm doing something intensive. Yep don't worry, before doing anything critical I always take a full back up, that's one of the times when to be most cautious as something could go wrong.

    Thanks I really appreciate the complex explanation, but how will this really be useful for what I'm doing? I'm the only one that's going to be accessing the server, so I'll want rights and access to all my files at all times :/ Am I going to be using apache? Or are these security privileges on all data stored on the server regardless of the program?

    Indeed, it also increases workflow the less complex something is. TeraCopy really is awesome so far, thanks for suggesting it.

    Awesome, thanks, I made the association :) Yes it truly is weird that something like that wouldn't be enabled by default, when it's one of the core features of the program. (the verify feature is what made me buy it)

    Yes certainly not as straightforward, but since I'm going to be using 2 back up drives for the same data, it's worth doing one as a mirror (rsync/robocopy) and one with TeraCopy wouldn't it? This safeguards against any issues one program might encounter.

    'most compatible with other tools', what other tools would these be?

    Yep I did some quick reading on SHA, and it does seem that it is superior, so why would I want to leave it on MD5? What about the others (Panama, Whirlpool, xxHash etc). Which one do you use?

    You mean leave it unchecked? Checking it will Disable Asynchronous transfer.

    I could see where this could be an issue for someone running VMs or live software, but I don't think it's something I should be concerned about. Believe me I quadruple check that everything is safely closed and prepared before taking a back up, it's worse than if I had to launch a shuttle. I bet that not even NASA have so many checks and safety procedures haha :p

    Yes it really is surprising to see how many people really don't give it the importance it deserves, especially when it's your life's work! So many hours, be it on gaming, or work. I completely agree about your argument against cloud storage. I'm really not a fan of it, 1. it's my precious data in the hands of others, 2. they won't take the same preventive measures I would, my data to them doesn't hold the same importance it does to me.

    Ouch I'm terribly sorry to hear you went through months-worth loss of data :( You seem like your data is very important to you too. Glad you're taking the necessary precautions now at least, you've really learnt from your negative experiences :) Haha perfect description 'getting anal about data'.
  15. A2Razor

    A2Razor Master Guru

    This was mounting a drive (different command) as read-only, per the "ro". rsync's protection is that it can be set to obey volume / device boundaries in a copy. (eg, no risk of following symbolic links between disks and the mess that can happen if the destination crosses paths with the source)
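
    In rsync itself the switch for this is -x (long form --one-file-system). To show what "obeying device boundaries" means underneath, here's a rough Python sketch of the concept (my own illustration, not rsync's code): while walking a folder tree, skip any subfolder that sits on a different device than the starting folder.

```python
import os
import tempfile


def walk_one_filesystem(root):
    """Yield file paths under root without descending into other devices."""
    root_device = os.stat(root).st_dev
    # os.walk does not follow symbolic links to folders by default.
    for folder, dirs, files in os.walk(root):
        # Prune subfolders that live on another device (e.g. a mount point).
        dirs[:] = [d for d in dirs
                   if os.lstat(os.path.join(folder, d)).st_dev == root_device]
        for name in files:
            yield os.path.join(folder, name)


# Demo on a throwaway tree; everything here is on one device, so all files show up.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "music", "album"))
for relative in ("a.txt", os.path.join("music", "album", "b.flac")):
    with open(os.path.join(root, relative), "w") as handle:
        handle.write("x")

found = sorted(os.path.relpath(path, root) for path in walk_one_filesystem(root))
```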

    -Though yes, in general I'd say that rsync is more powerful than robocopy.

    Yeah, fortunately though fragmentation is natural. Nothing wrong with heavy fragmentation other than that it reduces performance a bit, yet it doesn't jeopardize file integrity in any way.

    A bit different, EasyBCD is a Windows boot-loader customization tool. You can use it to make it a simpler process to manually load a bootable CD image (rather than inserting a CD), such as for part of your routine backup process. EasyBCD can be used to make it simpler to load Acronis or Paragon HDM's LiveCD's.

    Far as RAMDISK usage, only when you're using that bootmenu option (when inside TrueImage or Paragon).
    --The reason that you want a RAMDISK in this case is because otherwise it'd be unsafe to restore an image that overwrites the partition the backup software was loaded from (it's in use). Or in other words, reading the LiveCD to RAMDISK, then loading it from RAMDISK allows the partition containing the CD-Image to be written over without disrupting the backup software.

    When you run your backups after you have a NAS, you'll be storing them to the NAS rather than an external drive. Most likely that means an IP address to connect to (the NAS' address), network setup (for the LiveCD to connect to your LAN), potentially a mount-point (default directory), and login credentials for the volume (some user / password / account). Paragon's LiveCD creator can embed those details in the LiveCD, meaning one less thing to be re-entered every time that you manually produce a backup. (load the LiveCD, and automatically your NAS network volume is mounted, ready for storing the new disk image)

    Backup process of the SSD mirror that I run usually completes in ~40 minutes. That's an OS partition on Windows Dynamic Disks mixed with an ReFS mirror that contains all installed software (programming tools and a few games that I'm playing). This is well over 100GB of data, so I think it's very fair to assume you could finish a backup in around an hour.

    -I could just as easily start a backup and then go out for food, yet I just do it at the end of the day generally.

    Ah, well, in running a file-server somewhere down the road you're probably going to want to have easy access to videos on it. (for TV playback, on laptops, tablets, etc) When you do that you'll probably configure a web-server (like apache), and might move on to configuring some PHP scripts with an embedded video-player [on the NAS itself]. (that'd be hand-editing territory, so at this point you'd be dealing with that stuff like rights / permissions and the security implications of any changes you make)

    That might be a good idea just to get a feel for using them all, though they're fairly trustworthy (all of them, provided you have them check data once copied). You can expect they'll report an error if something does go wrong.

    -md5summer is one of the popular free ones, though I personally use SHA for everything.

    Yeah, that was a brain fart, my bad. I probably meant to say to leave asynchronous transfer enabled. :)

    Anal is really the only proper word I can think of to describe it. Both in how you feel when losing a project (that you have to maintain, meaning you have to rewrite the entire thing) and how you get about it thereafter, lol.

  16. 321Boom

    321Boom Member Guru

    Right, got it for the 'ro' command. It's not actually an rsync feature, it's the way I'll be mounting the storage drive till I move everything across for the first time. That protection about the boundaries sounds useful. What would I do to activate this feature?

    Glad to know, especially since rsync will replace robocopy once I have the server.

    As long as it doesn't affect the integrity of the files that's all that matters. So it doesn't matter what /MT:n is if copying to an external drive, but if copying from the internal drive onto something that will use an OS (like the server) it makes a difference because it could cause a performance hit, so a lower (preferably /MT:1) is recommended. If I'm going to be using rsync to move my data across though, I'm not going to be using these different types of /MT: on robocopy. (I'll only be using it a couple of times to mirror the 4TB drive onto an external, but not the other way around).

    That's cool about EasyBCD, I'll definitely take note of it and look into it when everything is up and running! :)

    Wow that RAMDISK sounds more important than I thought. If it's only while using the bootmenu in TrueImage or Paragon then that's awesome cause it will leave the RAM free for when I do need it :D

    Yep, definitely storing the OS backups on the NAS. *gulp* that sounds like a lot of work, good thing the LiveCD takes care of it for future use (I'll still have to set it up myself for the first time I assume though x_X). So, will the Paragon LiveCD come included in EasyBCD, or something I have to download separately? Will I need Acronis TrueImage if I'm doing this procedure with EasyBCD instead?

    That's great, I'd prefer it being done before I sleep in my case, in case something had to go wrong, I wouldn't want to wake up to find an error the first thing in the morning, horrible way to start the day :p

    I think that's going a bit too in depth for what I intend to use it for :/ (remember to not take simplicity for granted as we said a couple of posts ago :p) Seriously though, all I need is a shortcut to the server on my ECC desktop and gaming rig, and I could just choose a video player from there by going on a video file -> right-click -> Open With (VLC, MPC-HC, Splash, PotPlayer etc). This will work right?

    '(for TV playback, on laptops, tablets, etc)', I don't really like flaunting my data around like that with multiple things that could access it, I'm more of an everything-has-dedicated-roles kinda guy. Besides I'll always have one of my PCs on (gaming rig, or ECC desktop), so I won't need to access the server from a tablet or any other device, could just do it from the PC straightaway. Tell me if I'm missing something here, I know some of my ideas are primitive.

    By 'have them check data once copied' you mean:
    1. TeraCopy does it on its own with the auto-verify I have checked, correct? (nothing else to do apart from this?)
    2. Rsync by running the --dry-run command?
    3. Robocopy? No idea.

    I don't get what the purpose of -md5summer is. Doesn't TeraCopy have an option to do this in itself?

    Awesome, care to elaborate more? I read that it is more complex/difficult for something to break in SHA rather than md5 (due to the 160 bit output vs the 128 bit md5 provides), so it seems like SHA is like an advanced version of md5. So, SHA-1, SHA-256, SHA-512? Common sense would say SHA-512 cause it seems more recent, but common sense isn't always correct :p

    Lol no worries, happens to the best of us ;)

    Lol yeah, paranoia 101... horrible even thinking about it, brrr. So many sleepless nights would follow.
  17. A2Razor

    A2Razor Master Guru

    From the rsync man-page for FreeBSD:

    Check out "-x, --one-file-system"

             This tells rsync to avoid crossing a filesystem boundary when
             recursing. This does not limit the user's ability to specify
             items to copy from multiple filesystems, just rsync's recursion
             through the hierarchy of each directory that the user specified,
             and also the analogous recursion on the receiving side during
             deletion. Also keep in mind that rsync treats a "bind" mount to
             the same device as being on the same filesystem.

             If this option is repeated, rsync omits all mount-point
             directories from the copy. Otherwise, it includes an empty
             directory at each mount-point it encounters (using the attributes
             of the mounted directory because those of the underlying
             mount-point directory are inaccessible).

             If rsync has been told to collapse symlinks (via --copy-links or
             --copy-unsafe-links), a symlink to a directory on another device
             is treated like a mount-point. Symlinks to non-directories are
             unaffected by this option.
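
    To make the "filesystem boundary" idea concrete, here is a minimal Python sketch of the same principle. It is not rsync's code; the function name and the device_of parameter are made up for illustration. It assumes a boundary can be detected by comparing the device ID that the OS reports for a directory with the device ID of the starting directory.

```python
import os


def walk_one_filesystem(root, device_of=lambda p: os.stat(p).st_dev):
    """Yield file paths under root without descending into other filesystems.

    device_of is a hypothetical hook (defaulting to the OS device ID) so the
    behaviour can be tried out without real mount points.
    """
    root_dev = device_of(root)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune subdirectories that live on a different device (a mount point).
        # Editing dirnames in place stops os.walk from entering them.
        dirnames[:] = [
            d for d in dirnames
            if device_of(os.path.join(dirpath, d)) == root_dev
        ]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

    Unlike rsync with a single "-x", this sketch simply skips the mount-point directory instead of recreating it as an empty directory.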
    Right, as long as you don't use parsync or msrsync, you should be fine (since rsync usually writes a single file at a time). rsync is more focused on network transfer than robocopy is, and it usually sends files one by one over a single stream to the receiving end. There are, though, several rsync-based tools for BSD and Linux that run multiple sessions and transfer multiple files at once (to speed the process up over a long-distance / high-latency link).

    For anything that needs to touch all partitions on the machine they're pretty awesome (like in a restore-job). LiveCD's of partition-tools and backup-software are one of the best uses of a RAMDISK.

    Nah, not a lot of work to enter everything in (since it's like signing in to a website). The problem is that it's like signing in over and over, and it's just very repetitive if you're doing this every week or more.

    EasyBCD is not backup software, so you'll need either TrueImage or Paragon in addition to this. Acronis' bootable media can mount the NAS similar to Paragon's, yet I am not sure if there's any easy way to embed login credentials with Acronis in their bootable media. I use HDM (Paragon) instead of TrueImage as the latest versions of TrueImage cannot back up and restore partitions to / from Windows Dynamic Disks (neither WinPE nor Linux bootable). This part probably won't affect you unless you decide to do a software-RAID "boot partition" on Windows; both HDM and TrueImage handle Storage Spaces correctly as long as you make a WinPE media. (ReFS partition support from my experience is better in HDM too)

    --I should mention that there's another popular backup tool, Clonezilla, which is totally free. Clonezilla has no support for WinPE bootable media though, and it doesn't do automated Peer2Peer adjustment of Windows OS's. (In short, HDM and TrueImage have much better Windows support [which is what makes them worth paying for] )

    1) Yep, TeraCopy is automatic.
    2) rsync by using the "-c" command after a copy is done. (repeat the copy process with "-c", and optionally --dry-run)

    rsync will not read-back data after a copy under any condition (it doesn't perform a check and assumes the OS has written the data successfully to disk). In a re-copy without "-c", it will not check for changes to a file as long as the timestamp, name, and size match [which they will after the first run of rsync]. That "-c" forces the file to be read on the destination (and source) for a hash-comparison to be made to test if files have changed. (it skips the timestamp and size comparison)

    ** rsync will at least (over network) verify that the transferred data matches the expected data from the source, and protects against corruption in this regard. It just doesn't check that the disk has successfully written the file. (just trusts the OS has written it correctly)

    ^ Or in other words, if "-c" is used and a re-copy is attempted, nothing will be re-copied as long as the files match. (so the dry-run option isn't even needed)
    eg, look for a blank output and no files copied on the second run as meaning success.

    3) for robocopy, use "/Z" and a second copy attempt (similar idea to with rsync). In restartable mode, it should perform additional integrity checks for potentially terminated connections.

    --Of the three, I trust TeraCopy and rsync the most, although in principle you can force them all to get the job done if that makes sense. Just it's much much easier with TeraCopy.
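
    To illustrate the difference between rsync's default comparison and "-c", here is a rough Python sketch of the two kinds of check. It is not rsync's implementation; the function names are invented, and it assumes SHA-256 from Python's standard library rather than the hash rsync itself uses.

```python
import hashlib
import os


def quick_check_same(src, dst):
    """Default-style check: same size and same modification time."""
    a, b = os.stat(src), os.stat(dst)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def checksum_same(src, dst, algo="sha256"):
    """Checksum-style check: read both files fully and compare digests."""
    def digest(path):
        h = hashlib.new(algo)
        with open(path, "rb") as f:
            # Read in 1 MiB pieces so large files do not fill memory.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.digest()
    return digest(src) == digest(dst)
```

    A destination file that was damaged after the copy, but kept its size and timestamp, would pass quick_check_same and fail checksum_same. That is the case the second run with "-c" is meant to catch.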

    Yep, this functionality is built in to TeraCopy. Md5summer is a standalone implementation of just that one aspect of TeraCopy: the hash checking, generation, and saving. Its general purpose is checking if directories and files match, comparison of folders (potentially between remote computers), or saving hashes of files for a later comparison. (all of which of course, TeraCopy can do on its own)

    For an example though: You could use it for generating hashes of a game folder before running updates, then checking afterwards to see what the update process modified. Or, to see if your friend's game folder matches your copy (if he's having problems logging in to a game, where you're not).
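
    The before/after idea can be sketched in a few lines of Python. This is only an illustration of the principle, not how md5summer or TeraCopy work internally; snapshot and compare are made-up names, and SHA-256 is an arbitrary choice.

```python
import hashlib
import os


def snapshot(folder):
    """Map each file's path (relative to folder) to its SHA-256 hex digest."""
    hashes = {}
    for dirpath, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            hashes[os.path.relpath(path, folder)] = h.hexdigest()
    return hashes


def compare(before, after):
    """Return (added, removed, changed) relative paths between two snapshots."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(
        p for p in before.keys() & after.keys() if before[p] != after[p]
    )
    return added, removed, changed
```

    Take one snapshot before the update and one after, then pass both to compare. Saving the first snapshot to disk would be the equivalent of a saved checksum file.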

    There's far less data in a hash than in the real thing (the set of data that it was generated from). The key purpose of a hash is to give a "unique representation" of a set of data, without being able to reconstruct the original, and importantly in significantly less space. These hashes are used to verify data, either for proper transfer, against tampering, damage, etc... The problem is that there can (and almost assuredly will) exist duplicates (people tend to call these collisions) as the size of the data you're hashing is increased. Likewise, the larger the hash used, naturally the lower the likelihood of duplicates.

    Your assumption is correct on that SHA-512 would be the most secure of the ones you listed. Take a look through the wikipedia page here if you've not, and note the output size of each hash algorithm.
    --In general though (and it's an oversimplification) it's usually safe to assume that the larger the hash, the more security it'll provide.

    Don't get me wrong here, of course all of these hashes will catch "most" file-corruption. We're just talking of some edge situations that might slip through the cracks where if the data was damaged in a very specific way, it "might" pass as valid (be a rare duplicate case).
  18. 321Boom

    321Boom Member Guru

    Thanks for the link to rsync's page, it's very useful detailing what all the commands do (though quite difficult for a newbie to understand). Earlier in the topic (post 67) when we were discussing mirroring, you told me 'In rsync, use "-av"', how come we switched to -x, --one-file-system? Is -x, --one-file-system an additional command I include with -av? I noticed from the rsync page you linked that -av does not include -x in it:
    -a, --archive            archive mode; equals -rlptgoD (no -H,-A,-X)
    Sorry this is getting very confusing x_X

    I read up a bit on parsync and msrsync and it looks like they're the 'multi-session & multi-file at once' tools you are referring to? Speed really isn't an issue for me, integrity, that's all that matters. I'd prefer going for a single file at a time to make sure there's the highest possible chance the file stays intact.

    Define 'long distance'. (It could be perceived differently, like, not in the same room, a corridor away, a link over a multistory office?)

    Got this, I definitely think of them in higher regard now :)

    Oh ok, just sounded like it was more of a tedious process to set up.

    Got it, so I'll need EasyBCD and Paragon.

    Thanks for the info about Clonezilla, but think it's best to go with the EasyBCD + Paragon route. I'd prefer having the support for the bootable media, and I've heard lots of good things about Paragon.

    1. Hooray for TeraCopy, simplicity at its best xD
    2. Omg rsync doesn't look easy to use. . . so if I got this right, the -c command does an md5 check (like TeraCopy's verify), over the whole mirrored data? (rsync uses md5 according to the rsync page you linked)

    What's the point of a dry run then?

    3. Sounds good, thanks, and I am starting to see similarities to rsync. So how come robocopy is the least preferred of the 3 programs if it could still perform these integrity checks?

    Yep I definitely agree that TeraCopy is the simplest to use. It's as simple as Copy and Paste xD

    Ah, that's an interesting example, while not beneficial for my use, I could see the use of it for someone needing to troubleshoot certain stuff. Thanks for clearing it up a bit more :)

    That's very interesting, so, what would happen if a duplicate hash/collision is detected? (For example, in order of how they were saved/created, we have 2 files: 'file 1' and 'file 2 million' get the same hash. Would 'file 2 million' get deleted and copied with the data that's in 'file 1'?

    Regarding the duplicate where you state 'almost assuredly will', it sounds almost unbelievable that a collision WOULD happen from a hash with 128 bits, see how many different combinations of hashes that could generate, but you really seem sure of it. That's quite worrying. So as per my example above (file 1 and file 2 million), what would happen if 2 different files get the same hash?

    So, I should just set TeraCopy to use SHA-512 and forget about it? (not consider the other options like Panama, Whirlpool, etc)

    Yep I read through the wiki, but not sure if I got it right. I looked at the output size, and the difference between md5 and SHA-512 is that md5 is 128 bits, when SHA is 512 bits, does this mean that each file is being split into that amount of bits (128 or 512, depending on the algorithm used)? Or are these 512 bits being saved somewhere on the system so it could hash from? Sorry if my question is daft or doesn't really make sense, I get quite lost with the in depth stuff (that's why I told you I'm not a power user before :p I know my way around a pc and stuff to get games working and some other programs (thankfully with GUIs), but never had to go in depth on how something works and all the technicalities (like checksums, bits, and code, etc))

    Thanks once again for all the time and information :) I showed this topic to a friend of mine, and he said you're like a mentor lol ;)
  19. A2Razor

    A2Razor Master Guru

    Yep, this is yet another command line argument that you'd need to tack on there.
    eg, "-avx", you could also write it as "-xva", or any other ordering ("-xav", etc) -- arguments can be clumped together when writing them out.

    "-a" is a collection of arguments short-form (just to be easier to type out) for the common process of cloning files between locations. (while preserving their attributes) Yeah, it can be confusing (it really is), but that's why I looked up each of those contained arguments in -a whenever trying to list it.
    (it's a whole alphabet soup, and you just have to take it slow "letter by letter" to grasp the meaning)

    --One thing to be careful of here is the case sensitivity. Note that "-x" is lower-case, and lower and upper case arguments in rsync can (often do) have different meanings. This is very different from robocopy, and more of a unix-thing since everyone is used to case sensitivity on the unix side.

    (*something to keep in mind when you're using FreeBSD or Linux*)
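
    Both points (clumping and case sensitivity) follow a general Unix option convention, which can be tried out safely with a toy parser. The sketch below uses Python's argparse and only borrows four switch letters from rsync for illustration; it is not rsync's parser and copies no files.

```python
import argparse


def parse(args):
    """Parse a list of switches with a toy parser and return them as a dict."""
    p = argparse.ArgumentParser()
    p.add_argument("-a", action="store_true")  # stands in for archive mode
    p.add_argument("-v", action="store_true")  # stands in for verbose
    p.add_argument("-x", action="store_true")  # lower-case: one-file-system
    p.add_argument("-X", action="store_true")  # upper-case: a different switch
    return vars(p.parse_args(args))


# Any ordering of the clumped letters gives the same result.
same = parse(["-avx"]) == parse(["-xva"]) == parse(["-x", "-a", "-v"])

# Changing only the case of one letter selects a different switch.
different = parse(["-avx"]) != parse(["-avX"])
```

    Here both same and different come out as True, which is the behaviour to keep in mind when typing rsync commands.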

    Nah, not a multi-story office building.

    Think of this more like if you have a regional office in London, and one in Singapore, or HK, or Japan, or in Texas. As latency increases (into the hundreds of milliseconds), rsync will slow down tremendously (so do FTP and most other protocols, really); you can get back some of that speed by splitting the transfer into multiple segments. We're talking about cross-continent latencies here, where things start to get really sluggish, so over the Internet between sites you might consider one of those other extensions to rsync.

    -It's something that you won't have to be concerned with on a LAN ever.
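
    A rough back-of-the-envelope model shows why latency matters so much and why parallel streams help. It assumes each stream can only keep a fixed amount of data "in flight" per round trip; the 64 KiB figure and the round-trip times below are illustrative assumptions, and real network stacks adapt their windows, so treat the numbers as a sketch only.

```python
def max_throughput(window_bytes, rtt_seconds, streams=1):
    """Upper bound in bytes/second if each stream sends one window per round trip."""
    return streams * window_bytes / rtt_seconds


WINDOW = 64 * 1024  # assumed data in flight per stream (illustrative)

lan = max_throughput(WINDOW, 0.0005)          # 0.5 ms round trip: about 131 MB/s
wan = max_throughput(WINDOW, 0.250)           # 250 ms round trip: about 0.26 MB/s
wan_split = max_throughput(WINDOW, 0.250, 8)  # 8 parallel streams: about 2.1 MB/s
```

    Under this model the same transfer is hundreds of times slower across continents than on a LAN, and splitting it into several streams recovers part of the loss.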

    Paragon is awesome. A lot of the "little stuff" is done for you with it, like disallowing display-timeout by default. That's a big deal if your monitor is connected by DisplayPort and won't recover from a timeout without proper drivers (it stays black). Acronis has this problem, even in the 2017 / 2018 edition (on WinPE images).

    It's by far the least-hassle backup software since they've thought everything through for you, kind of like the TeraCopy of the backup solutions. Seriously, I can't praise it enough for just working.

    "-c" disables skipping files if they're equal size and timestamp between source and destination. The fallback is to do a hash compare which in turn forces all files to be read-back on the receiving end.
    "--dry-run" is for if you don't want to correct the differences that're found. So you'd specify -c coupled with dry-run if you just want to know if there's differences and nothing else.

    ^ However, in the case of a sync, if there was a difference you'd probably both want to know and correct it, so it's safe to omit here if that makes sense.

    -Yep, rsync will use md5.
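
    The relationship between a hash compare and a dry run can be shown with a small Python sketch. This is a simplified stand-in for what rsync does, not its real logic; mirror and file_hash are made-up names, SHA-256 is an arbitrary choice, and deletion of extra files on the destination is left out.

```python
import hashlib
import os
import shutil


def file_hash(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB pieces."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def mirror(src_dir, dst_dir, dry_run=False):
    """Copy files whose content differs; return the relative paths involved.

    With dry_run=True the differences are only reported, nothing is written.
    """
    differing = []
    for dirpath, _dirs, files in os.walk(src_dir):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_dir)
            dst = os.path.join(dst_dir, rel)
            if os.path.exists(dst) and file_hash(src) == file_hash(dst):
                continue  # contents match, nothing to do
            differing.append(rel)
            if not dry_run:
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copy2 also carries over timestamps
    return sorted(differing)
```

    Running mirror a second time after a real copy should return an empty list, which corresponds to the "blank output on the second run" sign of success.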

    rsync is built ground-up to be network-wise a client and server setup run over SSH as its backend (although it can also do local-copies disk to disk, or within the same disk). It has security, integrity checking (over its network communication), etc, pretty much "built-in". robocopy doesn't have that network focus, it can use a mounted network drive, yet it lacks pretty much any integrity checking and security on network transfers.

    For local syncs drive to drive robocopy may be about as trustworthy as rsync, yet it lacks a ton of the network features that rsync has.

    ^ I find that when I want to perform some mirror / sync operation, it's usually to a remote server (such as for our website on a datacenter machine) rather than to an external drive or the NAS (hence my preference for TeraCopy for most everything else). I have very restricted SSH access to our website for example, and I never would open up a Samba share (for Windows file-sharing) on there (due to the security risk), so rsync is the only real option. (as rsync can run over SSH, whereas robocopy cannot)

    If you have these two files, let's elaborate on the example a bit:

    Source: pizza.dat
    Destination: pizza.dat

    Let's say both of these files have identical hashes (under whatever hash we've selected to compare them with). Let's also say that the files contain different data that just happens to hash to the same value. So in other words, we have two different sets of data that happen (very, very rarely in practice) to result in the same hash being generated.

    For what happens:
    Since we're relying on the hash to determine if those files are identical, they in fact pass as identical (despite that they're not), and the changes are not copied by the sync operation. So, after the sync, we have two files that are thought to be the same, and still aren't.
    ----As you may imagine this would be very bad if it happened, yet it's fortunately also insanely unlikely to.
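
    A collision is easy to produce if the hash is made deliberately tiny. The Python sketch below keeps only the first 2 bytes (16 bits) of SHA-256, so two different pieces of data with the same short hash turn up after a short search. The names and the "pizza.dat version N" contents are invented for the demonstration; with a full-length hash the same search would not finish in any practical time.

```python
import hashlib


def tiny_hash(data):
    """A deliberately weak 16-bit hash: the first 2 bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:2]


def find_collision():
    """Return two different byte strings that share the same tiny_hash."""
    seen = {}
    i = 0
    while True:
        data = b"pizza.dat version %d" % i
        tag = tiny_hash(data)
        if tag in seen:
            return seen[tag], data
        seen[tag] = data
        i += 1


source, destination = find_collision()
looks_identical = tiny_hash(source) == tiny_hash(destination)  # hash check passes
really_identical = source == destination                       # contents differ
```

    A sync tool that trusted only tiny_hash would treat source and destination as the same file and skip the copy, which is exactly the failure described above.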

    When I say I'm sure of it, I mean that a collision "can" happen -- that collisions exist is something you can count on. In practice, even if you used MD5 for file comparison, odds are quite strong you won't run into one of these cases. Why it's a big deal from the stance of security is if someone can find out how to generate (procedurally) the sets of differences for a hash that will pass... Essentially it's not so much a concern of this "naturally" occurring, but rather of someone exploiting the hash algorithm to make undetectable changes to some files (in terms of software protection) in such a way that they bypass the hash check.
    eg, bypassing a signature while modifying the files
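
    For a sense of scale on the "naturally occurring" case, here is a small worked estimate in Python. It assumes an ideal hash whose outputs behave like random values and uses the standard pair-counting upper bound; the 2 million file count is just the figure from the earlier question.

```python
def collision_chance(num_files, hash_bits):
    """Upper bound on the chance that any two of num_files share a hash.

    There are n*(n-1)/2 pairs of files, and each pair collides with
    probability 2**-hash_bits under the ideal-hash assumption.
    """
    pairs = num_files * (num_files - 1) // 2
    return pairs / 2 ** hash_bits


md5_like = collision_chance(2_000_000, 128)     # roughly 6e-27
sha512_like = collision_chance(2_000_000, 512)  # smaller again by many orders
```

    In a sync only the source and destination copy of the same file are compared, so the chance per file is smaller still, around 2**-128 for a 128-bit hash under the same assumption.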

    Best way to see it first hand would be to play with each of the outputs with a tool such as 7za (it's the part of the extras / command line version).

    "-scrc" sets hash type, "h" generates hash.

    Here's an example:

    C:\users\x\desktop>7za h -scrcsha1 YAP_BETA_Setup.exe
    7-Zip (a) [32] 16.04 : Copyright (c) 1999-2016 Igor Pavlov : 2016-10-04
    1 file, 4330927 bytes (4230 KiB)
    SHA1                                              Size  Name
    ---------------------------------------- -------------  ------------
    5EFAB3AC1EDEAB46E181E6A82D9E998D70C1CE5F       4330927  YAP_BETA_Setup.exe
    ---------------------------------------- -------------  ------------
    5EFAB3AC1EDEAB46E181E6A82D9E998D70C1CE5F       4330927
    Size: 4330927
    SHA1   for data:              5EFAB3AC1EDEAB46E181E6A82D9E998D70C1CE5F
    Everything is Ok
    C:\users\x\desktop>7za h -scrcSHA256 YAP_BETA_Setup.exe
    7-Zip (a) [32] 16.04 : Copyright (c) 1999-2016 Igor Pavlov : 2016-10-04
    1 file, 4330927 bytes (4230 KiB)
    SHA256                                                                    Size  Name
    ---------------------------------------------------------------- -------------  ------------
    6D1E719378E77793C5FC5DCCE9A484664FFF871A23C8F6DB0B8BDF6DA439B9BB       4330927  YAP_BETA_Setup.exe
    ---------------------------------------------------------------- -------------  ------------
    6D1E719378E77793C5FC5DCCE9A484664FFF871A23C8F6DB0B8BDF6DA439B9BB       4330927
    Size: 4330927
    SHA256 for data:              6D1E719378E77793C5FC5DCCE9A484664FFF871A23C8F6DB0B8BDF6DA439B9BB
    Everything is Ok
    ^ Note that the output length is actually longer from SHA-256. So we're talking literally the lengths of the generated-result of running the hash algorithm on each file.
    With more possibilities for an output, usually that implies less chance of an overlap simply due to the larger set.
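
    The same observation can be reproduced without 7za. This short Python sketch uses the standard hashlib module to report the output size of each algorithm; digest_bits is a made-up helper name, and the input data is arbitrary because the output size does not depend on it.

```python
import hashlib


def digest_bits(algorithm, data=b"any input at all"):
    """Return the output size in bits for the named hashlib algorithm."""
    return hashlib.new(algorithm, data).digest_size * 8


sizes = {name: digest_bits(name) for name in ("md5", "sha1", "sha256", "sha512")}
```

    The sizes come out as 128, 160, 256 and 512 bits, whether the input is a few bytes or several gigabytes.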

    All hashes are designed to run start to end of some data-set. You can break a file in to chunks and generate many hashes for a single file (parts of it), which is what rsync does in its network transmission (to know earlier if there's a problem rather than wait till the end). The file comparisons of tools between destination and source are usually single-hash though. (for an entire file)
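
    Here is a minimal Python sketch of both approaches: one hash over the whole data, fed piece by piece, and one hash per fixed-size block. The helper names and block sizes are assumptions for illustration, and the per-block scheme only shows the general idea, not rsync's actual protocol.

```python
import hashlib


def split(data, block_size):
    """Cut data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def whole_digest(chunks):
    """One hash for the entire data, fed piece by piece from start to end."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def block_digests(data, block_size):
    """One hash per block, so a mismatch can be pinned to a specific block."""
    return [hashlib.sha256(block).hexdigest() for block in split(data, block_size)]
```

    Feeding the pieces through whole_digest gives the same result as hashing the data in one go, so reading a file in pieces never changes its hash. The hash size (128 or 512 bits) is only the length of the result, not the size of the pieces the file is read in.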

    --Though yes on what you asked, SHA-512 is a very solid choice. Selecting it won't do you wrong.

    Ah, maybe you too can get him hooked on storage and convince him that he also wants to protect his data. :D
    Last edited: Feb 2, 2018
  20. 321Boom

    321Boom Member Guru

    Sorry for the delay in my reply, it's been a really rough week.

    That's good to know about rsync, thanks, especially for the advice where case sensitivity matters.

    Yep I tried reading through all of the different commands, but some sound so similar to each other/could be duplicates that I wouldn't know which set exactly to go for (kinda like how in Robocopy /MIR is /E + /PURGE), then some other commands are in an alien language to me haha, I'm a gamer not a programmer :p Plus this is my data at stake, so not something I would particularly want to test on, and would prefer a foolproof/already tried-and-tested method.

    Ah ok that's good to know. So having it in the same room or a room a corridor across won't make a difference in my case then. Thanks :)

    Lots of praise for Paragon, especially with it being the 'least hassle'. The more something works out of the box, the better. Noted :)

    So to verify that the data got copied across safely, would it be best doing a dry-run (based on size and time-stamp), or with a hash compare (from -c coupled with dry-run)?

    That's quite disappointing knowing rsync uses md5 :( Why haven't they implemented better methods like SHA if it's the go-to-software for mirroring on FreeBSD?

    Thanks for a more detailed explanation on some different features between rsync and Robocopy, interesting to know it was built for more than just mirroring.

    Wow sounds like you're really into the complex stuff and very knowledgeable with workarounds. I could see how rsync is more beneficial in your case.

    As long as one file won't overwrite the other different file (seeing as they both have the same hash) that's fine with me. I know the odds of a collision are low, but on 24TB it might be a higher risk?

    Thanks again for the detailed explanation, it is clear why SHA-256 is superior to SHA-1 (and md5 for that matter), a much longer hash will definitely decrease the chances of a collision. So I could see where SHA-512 will be even better!

    Awesome, I set TeraCopy to SHA-512 :)

    So, some more questions about TeraCopy:
    1. When moving documents across all timestamps match, but when moving video files the Date Accessed changes to the date I moved it :/

    [Screenshots: Original File | Copied File]

    2. I set SHA-512 in the Options tab in TeraCopy, both for Transfer, and also in Test:
    It's good that it's set for both, right?

    Also, what is 'Save checksum file on finish' under Test? Mine is currently unchecked. (beneficial for my use?)

    3. Regarding the prompt window that comes up if it detects a duplicate file:
    I would like to keep both, but the problem is, 'Keep both' is not an option in the 'Prompt on file name collision' tab in the Options tab:
    It would be convenient to not have the prompt pop up all the time when doing routine backups. I'm thinking all the Overwrite, Skip, and Replace options are not what I'm looking for since I want to keep both files, so will 'Rename all copied files' act like 'Keep both'? What would be the differences? It shouldn't perform exactly the same because 'Rename all copied files' is also an option under 'Keep both' in the prompt window. I assume there are differences between them, yet testing this out, both seemed to have the same effect (the new copied file ends up with a (2) after its file name, example New Text Document (2)).

    Haha yep, he already gave me the task of building his server for him once I'm done with mine :p Glad to help out a good bud though once I've got the knowledge (and paranoia) to do it right! ;)

    Thanks once again for your time and informative replies, have a great weekend :)
