In my previous article, I have described record structure and traced 3 records including 1 deleted record in sample database (4n6ist_sample.zip) . This time I will focus on slot array & how to identify deleted record.
In MDF, each page maintains the offset to each record at slot array which is located at the end of the page. I show a layout of a page again.
Let's check slot array of page 368 in the database.
Each entry of slot array consists of 2 bytes and first entry starts at the end of the page.
The figure shows that first entry (0x2E1FFE-0x2E1FFF) indicates the offset (0x0060) of first record. Second entry (0x2E1FFC-0x2E1FFD) indicates the offset (0x00C5) of second record. Third entry (0x2E1FFA-0x2E1FFB) is 0 (0x0000), because the third record has already been deleted. It would be 0x012A before the record is deleted. Usual tools rely on the information of slot array to identify each position of allocated record.
What will happen to slot array when a new record is inserted? I have inserted new sample record in the table with the following query (You can download second sample here).
INSERT INTO pictures(id, date, category, filename, data) SELECT 4, '20200223', 'Lights', 'PIXNIO-2363819', BulkColumn FROM OPENROWSET(BULK N'E:\dataset\PIXNIO-2363819-1200x800.jpg', SINGLE_BLOB) AS image
Here is page 368 in the database after insertion.
Now third entry (0x2E1FFA-0x2E1FFB) has changed from 0 to 0x018F, which points to new record (4th). We can confirm that data of third record still remains.
Parsing deleted records
Considering these fasts, I assume that deleted record can be identified and recovered when we parse each record from the start of records area and verify corresponding entry of slot array. I have written a python script based on the assumption. The script parses each record of specific data page, indicates [DELETED] when its record has no reference in slot array.
The script has -d option to show only deleted records. It is useful when we want to focus on only deleted records and a data page has a lot of records.
Applying table schema to records
The script is able to find deleted records. However, output is not easy to read for us. If we provide corresponding table schema to parse records, we can extract each column with human-readable format. Again, here is columns of pictures table.
- id - Fixed-size integer (4 bytes)
- date - Fixed-size ASCII string (8 bytes)
- category - Fixed-size Unicode string (32 bytes)
- filename - Variable-size Unicode string (up to 510 bytes)
- data - Variable-size data (no specific limits)
I have added a function to the script for the schema of pictures table in accordance with record structure as follows:
When I enable the function and run the script, output also shows human-readable information.
We can understand deleted information easily if we provide specific table schema. I assume such schema information is defined somewhere in MDF. If we are able to parse table schema and added as code to the script, we will view human-readable information without such manual work.
I haven't mentioned data column (Data Type: image) which contains large object data (LOB) yet. I will cover structure of LOB next time.