Ever since Windows NT, when Microsoft first released NTFS, a journaling file system, I've been fascinated by the USN Journal. I worked on the Primos operating system at Prime Computer and have always been interested in file systems, scheduling, virtual memory management, dynamic linking, ring-oriented security, and so on. In those days, Multics ran on really big machines. When we implemented Primos, we called it 'Multics in a Matchbox'.
I have been particularly fascinated by the fact that so few commercial products have taken advantage of NTFS's USN Journal. For backup products especially, since typically less than 10% of the content on a volume changes, you can reduce the time to identify new, deleted or changed files to a fraction of the time it takes to enumerate the entire volume.
As for the Master File Table (MFT), you can enumerate all of the files and directories on a volume in about a tenth of the time it takes using FindFirst() and FindNext(), with one qualification. If you reboot the system and then enumerate, the two approaches take about the same time. But if the system has been up and running for some time, my timing studies show that using the MFT takes one tenth of the time of FindFirst() and FindNext().
I've only run into a couple of folks who have really used the USN Journal in a commercial product. I haven't run into any who have done the work to use the MFT and USN Journal from C#.
Here is one example of the kind of performance improvement you can achieve using the USN Journal. I worked on a de-duplicated file archiving product that discarded old data by enforcing a ‘data retention’ policy. You could specify something like ‘keep this data until’ some date. I wrote a function to purge the data store of all ‘expired’ data. In an actual customer account, my first attempt at purge took 30+ days to complete. I then wrote a version that used the Master File Table to enumerate the data store. That cut the enumeration time by 90%, but enumeration was only a small portion of the total, so the result was still unacceptable. Finally, I wrote a service that used the USN Journal to keep all of the data about the data store up-to-date. The total time it took to purge the data store of ‘stale’ data was reduced from over 30 days to less than 4 hours.
So, here goes.
The DeviceIoControl function gets you down into the device code of Windows. And since you don’t want just anyone messing around in the file system, the volume-level operations described here only work from a process running with ‘administrative’ rights. At a minimum, you’ll have to use the ‘Run as administrator’ menu item to launch an application that uses the USN Journal. If you are running on Vista you could write the code to elevate your process, but your login must still have admin rights.
Two things help us here: P/Invoke and DllImport. There are some great articles that describe Platform Invocation Services (P/Invoke) and the DllImport attribute, and how they allow you to access native Win32 API functions that are not currently accessible in .NET. PInvoke.net is a great resource.
Using the USN Journal requires access to several native Win32 API functions, the most notable of which is DeviceIoControl. This discussion is about the USN Journal and the Master File Table, but DeviceIoControl opens the door to many device-level operations. You can see the entire set of FSCTL control codes and the structures you need to use them in the winioctl.h file.
#define FSCTL_REQUEST_OPLOCK_LEVEL_1
#define FSCTL_REQUEST_OPLOCK_LEVEL_2
#define FSCTL_REQUEST_BATCH_OPLOCK
#define FSCTL_OPLOCK_BREAK_ACKNOWLEDGE
#define FSCTL_OPBATCH_ACK_CLOSE_PENDING
#define FSCTL_OPLOCK_BREAK_NOTIFY
#define FSCTL_LOCK_VOLUME
#define FSCTL_UNLOCK_VOLUME
#define FSCTL_DISMOUNT_VOLUME
// decommissioned fsctl value 9
#define FSCTL_IS_VOLUME_MOUNTED
#define FSCTL_IS_PATHNAME_VALID
// decommissioned fsctl value 13
#define FSCTL_QUERY_RETRIEVAL_POINTERS
#define FSCTL_GET_COMPRESSION
#define FSCTL_SET_COMPRESSION
// decommissioned fsctl value 17
// decommissioned fsctl value 18
#define FSCTL_MARK_AS_SYSTEM_HIVE
#define FSCTL_OPLOCK_BREAK_ACK_NO_2
#define FSCTL_INVALIDATE_VOLUMES
#define FSCTL_QUERY_FAT_BPB
#define FSCTL_FILESYSTEM_GET_STATISTICS
#if(_WIN32_WINNT >= 0x0400)
#define FSCTL_GET_NTFS_VOLUME_DATA
#define FSCTL_GET_NTFS_FILE_RECORD
#define FSCTL_GET_VOLUME_BITMAP
#define FSCTL_GET_RETRIEVAL_POINTERS
#define FSCTL_MOVE_FILE
// decommissioned fsctl value 31
#define FSCTL_ALLOW_EXTENDED_DASD_IO
#endif /* _WIN32_WINNT >= 0x0400 */
#if(_WIN32_WINNT >= 0x0500)
// decommissioned fsctl value 33
// decommissioned fsctl value 34
#define FSCTL_FIND_FILES_BY_SID
// decommissioned fsctl value 36
// decommissioned fsctl value 37
#define FSCTL_SET_OBJECT_ID
#define FSCTL_GET_OBJECT_ID
#define FSCTL_DELETE_OBJECT_ID
#define FSCTL_SET_REPARSE_POINT
#define FSCTL_GET_REPARSE_POINT
#define FSCTL_DELETE_REPARSE_POINT
#define FSCTL_ENUM_USN_DATA
#define FSCTL_SECURITY_ID_CHECK
#define FSCTL_READ_USN_JOURNAL
#define FSCTL_SET_OBJECT_ID_EXTENDED
#define FSCTL_CREATE_OR_GET_OBJECT_ID
#define FSCTL_SET_SPARSE
#define FSCTL_SET_ZERO_DATA
#define FSCTL_QUERY_ALLOCATED_RANGES
#define FSCTL_ENABLE_UPGRADE
#define FSCTL_SET_ENCRYPTION
#define FSCTL_ENCRYPTION_FSCTL_IO
#define FSCTL_WRITE_RAW_ENCRYPTED
#define FSCTL_READ_RAW_ENCRYPTED
#define FSCTL_CREATE_USN_JOURNAL
#define FSCTL_READ_FILE_USN_DATA
#define FSCTL_WRITE_USN_CLOSE_RECORD
#define FSCTL_EXTEND_VOLUME
#define FSCTL_QUERY_USN_JOURNAL
#define FSCTL_DELETE_USN_JOURNAL
#define FSCTL_MARK_HANDLE
#define FSCTL_SIS_
#define FSCTL_SIS_LINK_FILES
#define FSCTL_HSM_MSG
// decommissioned fsctl value 67
#define FSCTL_HSM_DATA
#define FSCTL_RECALL_FILE
// decommissioned fsctl value 70
#define FSCTL_READ_FROM_PLEX
#define FSCTL_FILE_PREFETCH
#endif /* _WIN32_WINNT >= 0x0500 */
#if(_WIN32_WINNT >= 0x0600)
#define FSCTL_MAKE_MEDIA_
#define FSCTL_SET_DEFECT_MANAGEMENT
#define FSCTL_QUERY_SPARING_INFO
#define FSCTL_QUERY_ON_DISK_VOLUME_INFO
#define FSCTL_SET_VOLUME_COMPRESSION_STATE
#define FSCTL_TXFS_MODIFY_RM
#define FSCTL_TXFS_QUERY_RM_INFORMATION
// decommissioned fsctl value 83
#define FSCTL_TXFS_ROLLFORWARD_REDO
#define FSCTL_TXFS_ROLLFORWARD_UNDO
#define FSCTL_TXFS_START_RM
#define FSCTL_TXFS_SHUTDOWN_RM
#define FSCTL_TXFS_READ_BACKUP_INFORMATION
#define FSCTL_TXFS_WRITE_BACKUP_INFORMATION
#define FSCTL_TXFS_CREATE_SECONDARY_RM
#define FSCTL_TXFS_GET_METADATA_
#define FSCTL_TXFS_GET_TRANSACTED_VERSION
// decommissioned fsctl value 93
// decommissioned fsctl value 94
#define FSCTL_TXFS_CREATE_MINIVERSION
// decommissioned fsctl value 96
// decommissioned fsctl value 97
// decommissioned fsctl value 98
#define FSCTL_TXFS_TRANSACTION_ACTIVE
#define FSCTL_SET_ZERO_ON_DEALLOCATION
#define FSCTL_SET_REPAIR
#define FSCTL_GET_REPAIR
#define FSCTL_WAIT_FOR_REPAIR
// decommissioned fsctl value 105
#define FSCTL_INITIATE_REPAIR
#define FSCTL_CSC_INTERNAL
#define FSCTL_SHRINK_VOLUME
#define FSCTL_SET_SHORT_NAME_BEHAVIOR
#define FSCTL_DFSR_SET_GHOST_HANDLE_STATE
//
// Values 111 - 119 are reserved for FSRM.
//
#define FSCTL_TXFS_LIST_TRANSACTION_LOCKED_FILES
#define FSCTL_TXFS_LIST_TRANSACTIONS
#define FSCTL_QUERY_PAGEFILE_ENCRYPTION
Just a couple of notes: all the 'DllImport' functions in the Win32Api class you’ll find below are declared 'public static', which makes sense since we don't want to create an object just to access these native functions.
There are some interesting parameters in the [DllImport()] attribute lines. Many of the native Win32 functions set a 'last error' code when they fail. SetLastError=true tells Platform Invoke Services to preserve that code after the call. This allows us to retrieve it (from managed code, with Marshal.GetLastWin32Error() rather than by calling GetLastError() directly) and get more detail if something went wrong.
Many simple data types, such as System.Byte and System.Int32, have a single representation in unmanaged code and do not need their marshaling behavior specified; the common language runtime automatically supplies the correct behavior. UnmanagedType.Bool is the Win32 BOOL type, which is always 4 bytes. [return: MarshalAs(UnmanagedType.Bool)] marshals the unmanaged bool as a managed bool.
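To make the attribute discussion concrete, here is a minimal sketch of two such declarations. It is not the Win32Api class from this post; the NativeMethods class name and the OpenVolume helper are made up for illustration, and the constants are the standard values from the Windows headers.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Hypothetical wrapper class, for illustration only.
public static class NativeMethods
{
    public const uint GENERIC_READ = 0x80000000;
    public const uint FILE_SHARE_READ = 0x00000001;
    public const uint FILE_SHARE_WRITE = 0x00000002;
    public const uint OPEN_EXISTING = 3;

    // Opens a file, directory, or volume (for example @"\\.\C:") and returns its handle.
    // SetLastError = true preserves the Win32 error code for Marshal.GetLastWin32Error().
    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    public static extern SafeFileHandle CreateFile(
        string lpFileName,
        uint dwDesiredAccess,
        uint dwShareMode,
        IntPtr lpSecurityAttributes,
        uint dwCreationDisposition,
        uint dwFlagsAndAttributes,
        IntPtr hTemplateFile);

    // The Win32 BOOL return value is 4 bytes; MarshalAs maps it to a managed bool.
    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool DeviceIoControl(
        SafeFileHandle hDevice,
        uint dwIoControlCode,
        IntPtr lpInBuffer,
        uint nInBufferSize,
        IntPtr lpOutBuffer,
        uint nOutBufferSize,
        out uint lpBytesReturned,
        IntPtr lpOverlapped);

    // Opens the volume for a drive letter such as "C".
    // Throws a Win32Exception carrying the last error if the open fails.
    public static SafeFileHandle OpenVolume(string driveLetter)
    {
        SafeFileHandle handle = CreateFile(
            @"\\.\" + driveLetter + ":",
            GENERIC_READ,
            FILE_SHARE_READ | FILE_SHARE_WRITE,
            IntPtr.Zero,
            OPEN_EXISTING,
            0,
            IntPtr.Zero);
        if (handle.IsInvalid)
        {
            throw new Win32Exception(Marshal.GetLastWin32Error());
        }
        return handle;
    }
}
```

Returning a SafeFileHandle instead of a raw IntPtr is a design choice here: it lets a 'using' block close the volume handle even if an exception is thrown.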
On all of the structures you'll see an attribute line of '[StructLayout(LayoutKind.Sequential, Pack = 1)]'. A 'Pack=1' indicates that data alignment occurs on byte boundaries. There are no gaps between fields with a packing value of 1.
The USN in USN Journal stands for "Update Sequence Number". The USN Journal provides a persistent log of all changes made to files/directories on a volume. If a file or directory changes, an entry is placed in the USN Journal denoting the file and the exact change or changes.
Jeffrey Cooperstein and Jeffrey Richter have co-authored a great article, "Keeping an Eye on Your NTFS Drives: the Windows 2000 Change Journal Explained" (http://www.microsoft.com/msj/0999/journal/journal.aspx). This article discusses using the USN Journal in C++ and is well worth your time.
Each entry in the USN Journal is assigned a 64-bit number, which is the offset of the entry into the actual USN Journal file. USN Journal entries can be different sizes since, at the very least, the filenames are different lengths.
Each file or directory on a volume has a unique 64-bit File Reference Number. Files can have the same name (excluding the path), but no two files or directories can have the same File Reference Number.
My goal here is to share C# code that uses the USN Journal. I know I can write C++ code that executes faster. Certainly, having many years of C++ experience helped me to understand how to form the memory for the calling parameters and how to parse through the resultant buffers to process the USN Journal entries.
The DeviceIoControl function, to quote the documentation, "sends a control code directly to a specified device driver, causing the corresponding device to perform the corresponding operation." Given a device handle, a control code, an input buffer, and an output buffer, you can get all kinds of information about the system with DeviceIoControl. The trouble is that none of it is available to C# code without a little work.
A USN Journal is specific to a given NTFS volume. When the journal is started, it is essentially an empty 'sparse' file on the volume. When a change is made to the volume, a record is added to the USN Journal file.
Quoting from NTFS.com:
A sparse file has an attribute that causes the I/O subsystem to allocate only meaningful (nonzero) data. Nonzero data is allocated on disk, and non-meaningful data (large strings of data composed of zeros) is not. When a sparse file is read, allocated data is returned as it was stored; non-allocated data is returned, by default, as zeros.
So to free up space on a 'sparse' file you need only to zero out the data.
One of the problems with the USN Journal is that any user or process with administrative rights can create, start, or stop the USN Journal for any volume. They can reformat the volume. They can do many things that would cause the USN Journal to be ‘invalid’, so that some or all of the changes to the volume would be lost. So the first obstacle to overcome is to determine if there is a USN Journal for a volume and verify that it is valid. Also, USN Journals can overrun their bounds and become invalid simply because no one has removed the entries.
Each USN Journal has data associated with it that allows us to determine if it is 'valid'. Valid means three things: we haven't missed any entries in the USN Journal (and consequently missed changes to files or directories), the journal hasn't been stopped and restarted since we last interrogated it, and it hasn't overrun its boundaries. When you query the USN Journal you get back a USN_JOURNAL_DATA structure.
typedef struct {
    DWORDLONG UsnJournalID;
    USN FirstUsn;
    USN NextUsn;
    USN LowestValidUsn;
    USN MaxUsn;
    DWORDLONG MaximumSize;
    DWORDLONG AllocationDelta;
} USN_JOURNAL_DATA, *PUSN_JOURNAL_DATA;
The USN type is a 64-bit integer; USN stands for ‘update sequence number’. FirstUsn, NextUsn, LowestValidUsn, and MaxUsn are all of type USN. They represent byte offsets from the beginning of the USN Journal file. The names are self-explanatory. FirstUsn is the byte offset of the first USN Journal entry. NextUsn is the byte offset of the first byte of the next USN Journal entry to be written to the USN Journal.
The MaxUsn is the byte offset of the last possible byte in the USN Journal file. The AllocationDelta is the number of bytes to allocate at the end of the USN Journal file when they are needed. You can set this number so that you don’t have to allocate data each time you want to add an entry to the Journal.
The trick is to position the file to a byte position, read some data into some buffer and parse out one or more USN Journal Entries.
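A minimal sketch of that parsing step, assuming the buffer came back from FSCTL_READ_USN_JOURNAL and holds version 2 USN_RECORD entries. The field offsets are taken from the USN_RECORD layout in winioctl.h and are not spelled out in this post; the UsnEntry and UsnBufferParser names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical managed view of one USN_RECORD (version 2 layout assumed).
public sealed class UsnEntry
{
    public ulong FileReferenceNumber;
    public ulong ParentFileReferenceNumber;
    public long Usn;
    public uint Reason;
    public string FileName;
}

public static class UsnBufferParser
{
    // Size of the fixed part of a version 2 USN_RECORD, up to the file name.
    private const int FixedRecordSize = 60;

    // The output buffer starts with an 8-byte USN (where to resume reading),
    // followed by zero or more variable-length records.
    public static List<UsnEntry> Parse(byte[] buffer, int bytesReturned, out long nextUsn)
    {
        List<UsnEntry> entries = new List<UsnEntry>();
        nextUsn = 0;
        if (buffer == null || bytesReturned < sizeof(long) || bytesReturned > buffer.Length)
        {
            return entries;
        }

        nextUsn = BitConverter.ToInt64(buffer, 0);
        int offset = sizeof(long);

        while (offset + FixedRecordSize <= bytesReturned)
        {
            // RecordLength covers the fixed fields, the name, and any padding.
            int recordLength = (int)BitConverter.ToUInt32(buffer, offset);
            if (recordLength < FixedRecordSize || offset + recordLength > bytesReturned)
            {
                break; // truncated or corrupt record; stop rather than guess
            }

            ushort nameLength = BitConverter.ToUInt16(buffer, offset + 56); // in bytes
            ushort nameOffset = BitConverter.ToUInt16(buffer, offset + 58); // from record start
            if (nameOffset + nameLength > recordLength)
            {
                break;
            }

            UsnEntry entry = new UsnEntry();
            entry.FileReferenceNumber = BitConverter.ToUInt64(buffer, offset + 8);
            entry.ParentFileReferenceNumber = BitConverter.ToUInt64(buffer, offset + 16);
            entry.Usn = BitConverter.ToInt64(buffer, offset + 24);
            entry.Reason = BitConverter.ToUInt32(buffer, offset + 40);
            entry.FileName = Encoding.Unicode.GetString(buffer, offset + nameOffset, nameLength);
            entries.Add(entry);

            offset += recordLength; // the next record starts right after this one
        }
        return entries;
    }
}
```

Working on a plain byte[] with BitConverter keeps this free of pointers, so it should not need the ‘unsafe’ keyword.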
These are the control codes you’ll need to use the USN Journal.
FSCTL_ENUM_USN_DATA
FSCTL_READ_USN_JOURNAL
FSCTL_CREATE_USN_JOURNAL
FSCTL_QUERY_USN_JOURNAL
FSCTL_DELETE_USN_JOURNAL
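C# has no winioctl.h to include, so the numeric values have to be supplied by hand. The sketch below rebuilds them with the same bit layout as the CTL_CODE macro. The function numbers (44, 46, 57, 61, 62) and transfer methods come from winioctl.h, not from this post, and the UsnControlCodes class name is made up for illustration.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class UsnControlCodes
{
    private const uint FILE_DEVICE_FILE_SYSTEM = 0x00000009;
    private const uint METHOD_BUFFERED = 0;
    private const uint METHOD_NEITHER = 3;
    private const uint FILE_ANY_ACCESS = 0;

    // Same bit layout as the CTL_CODE macro in winioctl.h.
    public static uint CtlCode(uint deviceType, uint function, uint method, uint access)
    {
        return (deviceType << 16) | (access << 14) | (function << 2) | method;
    }

    public static readonly uint FSCTL_ENUM_USN_DATA =
        CtlCode(FILE_DEVICE_FILE_SYSTEM, 44, METHOD_NEITHER, FILE_ANY_ACCESS);

    public static readonly uint FSCTL_READ_USN_JOURNAL =
        CtlCode(FILE_DEVICE_FILE_SYSTEM, 46, METHOD_NEITHER, FILE_ANY_ACCESS);

    public static readonly uint FSCTL_CREATE_USN_JOURNAL =
        CtlCode(FILE_DEVICE_FILE_SYSTEM, 57, METHOD_NEITHER, FILE_ANY_ACCESS);

    public static readonly uint FSCTL_QUERY_USN_JOURNAL =
        CtlCode(FILE_DEVICE_FILE_SYSTEM, 61, METHOD_BUFFERED, FILE_ANY_ACCESS);

    public static readonly uint FSCTL_DELETE_USN_JOURNAL =
        CtlCode(FILE_DEVICE_FILE_SYSTEM, 62, METHOD_BUFFERED, FILE_ANY_ACCESS);
}
```

The resulting values are 0x000900B3, 0x000900BB, 0x000900E7, 0x000900F4 and 0x000900F8 respectively, so plain constants would work just as well.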
FSCTL_CREATE_USN_JOURNAL may be a misnomer because if the USN Journal already exists, this only updates the MaximumSize and AllocationDelta properties if they are larger than those used when the USN Journal was created. It doesn’t change the UsnJournalID.
Since anyone with admin rights can start or stop the USN Journal or it can exhaust its space or the volume might be reformatted, any application that needs to depend on the USN Journal needs to be able to determine if the USN Journal is ‘valid’. A ‘valid’ journal has captured all changes to the file system on the volume since the last time we’ve queried it and all of the change entries we need are still available in the USN Journal.
If we miss even one entry, we have to fall back and determine the state of the USN Journal all over again and then begin to collect changes. Why is not missing even one USN Journal entry so important? It depends on the application. If I am trying to write an application that tracks each and every image file and I miss one USN Journal entry, the worst case is I miss one image file. But take a backup application that uses the USN Journal to track all new, changed and deleted files. If it misses one USN Journal entry, it will miss a change to the volume. The backup application can no longer guarantee that it has protected all the files/directories on the volume.
So how do we determine if the USN Journal is valid? First we need to query the USN Journal and get its state. The state of the USN Journal is defined by the USN_JOURNAL_DATA data structure defined in the Win32Api.cs file below. If you call DeviceIoControl with the device handle, the FSCTL_QUERY_USN_JOURNAL control code, and an ‘out’ parameter to receive the USN_JOURNAL_DATA, you’ll get the UsnJournalID, the FirstUsn, the NextUsn, the LowestValidUsn, the MaxUsn, the MaximumSize and the AllocationDelta.
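Here is a minimal sketch of that query, assuming a volume handle has already been opened. It is not the code from Win32Api.cs; the UsnJournalQuery class and its Query method are made up for illustration, and the control code value 0x000900F4 comes from winioctl.h.

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Managed mirror of the native USN_JOURNAL_DATA: seven 64-bit fields, 56 bytes.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct USN_JOURNAL_DATA
{
    public ulong UsnJournalID;
    public long FirstUsn;
    public long NextUsn;
    public long LowestValidUsn;
    public long MaxUsn;
    public ulong MaximumSize;
    public ulong AllocationDelta;
}

// Hypothetical helper class, for illustration only.
public static class UsnJournalQuery
{
    private const uint FSCTL_QUERY_USN_JOURNAL = 0x000900F4;

    // Overload of DeviceIoControl whose output buffer is the journal state itself.
    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool DeviceIoControl(
        SafeFileHandle hDevice,
        uint dwIoControlCode,
        IntPtr lpInBuffer,
        uint nInBufferSize,
        out USN_JOURNAL_DATA lpOutBuffer,
        uint nOutBufferSize,
        out uint lpBytesReturned,
        IntPtr lpOverlapped);

    // Returns 0 on success, otherwise the Win32 error code.
    // 'data' is only meaningful when the return value is 0.
    public static int Query(SafeFileHandle volume, out USN_JOURNAL_DATA data)
    {
        uint bytesReturned;
        bool ok = DeviceIoControl(
            volume,
            FSCTL_QUERY_USN_JOURNAL,
            IntPtr.Zero,
            0,
            out data,
            (uint)Marshal.SizeOf(typeof(USN_JOURNAL_DATA)),
            out bytesReturned,
            IntPtr.Zero);
        return ok ? 0 : Marshal.GetLastWin32Error();
    }
}
```

Declaring the output buffer as an ‘out’ struct lets the marshaler handle the memory, so no pointer code is needed for this call.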
Each time you create or start the USN Journal it gets a new UsnJournalID. If you keep the previous USN_JOURNAL_DATA, you can compare it with the ‘new’ USN_JOURNAL_DATA and determine if you have missed any USN Journal entries. Note the LowestValidUsn field. The USN Journal discards ‘old’ change entries as it runs out of space and new entries arrive. If an application doesn’t frequently access the USN Journal and capture the change entries, the USN Journal may free the space occupied by the entries the application needs. To do this it simply needs to zero out that part of the sparse file.
The validity check first compares the current USN_JOURNAL_DATA’s UsnJournalID field against the previous USN_JOURNAL_DATA’s UsnJournalID field. If they are different, the USN Journal is invalid. The application cannot assume that the USN Journal in this state will provide the change entries for all of the changes that have taken place on the volume. In fact, you can almost guarantee that change entries will be lost.
Next it checks that the NextUsn saved the last time it collected the changes on the volume, which is where it needs to resume processing, is still available in the USN Journal. If both these conditions are met, the USN Journal is valid.
All we need to do is keep a persistent copy of the previous state of the USN Journal around and then we can determine with a quick QueryUsnJournal function whether we can rely on the USN Journal to supply us all changes that have occurred on the volume.
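The two checks can be sketched as a small pure function. This is one reading of the rules above, not the code from this post: the JournalState and UsnJournalValidator names are made up, and treating ‘still available’ as ‘not below the current FirstUsn and not beyond the current NextUsn’ is an assumption.

```csharp
using System;

// Hypothetical stand-in for the fields of a saved USN_JOURNAL_DATA that the check needs.
public sealed class JournalState
{
    public ulong UsnJournalID;
    public long FirstUsn;
    public long NextUsn;
}

public static class UsnJournalValidator
{
    // 'previous' is the state persisted after the last successful read;
    // 'current' is the state just returned by a fresh query.
    public static bool IsValid(JournalState previous, JournalState current)
    {
        if (previous == null || current == null)
        {
            return false;
        }

        // A different ID means the journal was deleted and re-created in between.
        if (previous.UsnJournalID != current.UsnJournalID)
        {
            return false;
        }

        // The point where we need to resume must not have been discarded yet
        // (assumption: anything at or after FirstUsn is still readable).
        if (previous.NextUsn < current.FirstUsn)
        {
            return false;
        }

        // A resume point beyond the journal's write position makes no sense.
        return previous.NextUsn <= current.NextUsn;
    }
}
```

When this returns false, the application has to fall back to a full enumeration of the volume and save a fresh state before it trusts the journal again.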
If you are not careful, the development environment will complain about using native structures. It will demand that you set the project property ‘Allow unsafe code’ and you’ll have to use the ‘unsafe’ keyword in front of those functions that use the native structures. If you are careful, you can avoid forcing the ‘Allow unsafe code’ property. The code I’ve provided does not require the ‘Allow unsafe code’ property.
When you create or query a USN Journal, you can get error codes like ‘ERROR_JOURNAL_DELETE_IN_PROGRESS’. It makes more sense to me to handle the error codes instead of throwing exceptions. For example, if we try to create a USN Journal while a delete is still in progress, we should wait until the delete finishes and then create the USN Journal. The application should not have to deal with these issues.
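A minimal sketch of that idea: a retry helper that keeps calling an operation while it reports ERROR_JOURNAL_DELETE_IN_PROGRESS (Win32 error 1178). The JournalRetry class, its method name, and the attempt and delay parameters are made up for illustration; the operation is assumed to return 0 on success or a Win32 error code.

```csharp
using System;
using System.Threading;

// Hypothetical helper class, for illustration only.
public static class JournalRetry
{
    public const int ERROR_SUCCESS = 0;
    public const int ERROR_JOURNAL_DELETE_IN_PROGRESS = 1178;

    // Calls 'operation' until it returns something other than
    // ERROR_JOURNAL_DELETE_IN_PROGRESS, or until maxAttempts is used up.
    // Returns the last error code seen (0 means success). If maxAttempts is
    // zero or negative, the operation is never called.
    public static int RunWhenDeleteFinishes(Func<int> operation, int maxAttempts, int delayMilliseconds)
    {
        int error = ERROR_JOURNAL_DELETE_IN_PROGRESS;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            error = operation();
            if (error != ERROR_JOURNAL_DELETE_IN_PROGRESS)
            {
                return error;
            }
            if (attempt < maxAttempts)
            {
                Thread.Sleep(delayMilliseconds); // give the delete time to finish
            }
        }
        return error;
    }
}
```

A create call would be passed in as the operation, so the caller only sees the final result and never the intermediate ‘delete in progress’ state.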
I’ve added a function, DeleteUsnJournal, which deletes a USN Journal. You’ll notice that I had to add 4 bytes to the end of the structure in Win32Api.cs to get it to work. In the documentation, the structure only calls for a UInt64 JournalID and a UInt32 DeleteFlags. But if you try that structure you’ll get ERROR_INVALID_USER_BUFFER every time. If you search hard enough and long enough, you’ll find a comment somewhere that says someone got it to work by padding an extra 4 bytes on the end of the structure.
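One possible explanation for the extra 4 bytes, offered as a guess rather than something confirmed here: a native structure with a 64-bit field followed by a 32-bit field is normally padded to 16 bytes, while the same fields with Pack = 1 occupy only 12. The sketch below just compares the two sizes; both struct names are made up for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// The documented fields with Pack = 1: no padding, so 8 + 4 = 12 bytes.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct DeleteUsnJournalDataPacked
{
    public ulong UsnJournalID;
    public uint DeleteFlags;
}

// The same fields with default packing: the 64-bit member forces 8-byte
// alignment, so the marshaled size is rounded up to 16 bytes.
[StructLayout(LayoutKind.Sequential)]
public struct DeleteUsnJournalDataNatural
{
    public ulong UsnJournalID;
    public uint DeleteFlags;
}

public static class DeleteStructSizes
{
    public static int PackedSize()
    {
        return Marshal.SizeOf(typeof(DeleteUsnJournalDataPacked));
    }

    public static int NaturalSize()
    {
        return Marshal.SizeOf(typeof(DeleteUsnJournalDataNatural));
    }
}
```

If that is the cause, the 4 bytes of padding added by hand simply bring the packed structure up to the size the file system expects for its input buffer.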
This post is too long to submit, I'll post the code in a reply...
StCroixSkipper