When I investigate Unix-like systems (mostly Linux) in intrusion cases, I always check utmp, wtmp, and btmp to track suspicious logins and logouts. These files are not plain text, so I have been reading them with the last/lastb commands and the -f option. However, sometimes these files are empty, because an attacker with administrative privileges can delete or wipe them.

That is why I created a utmp scanner for bulk_extractor-rec.
Format
The format varies between operating systems such as Linux and FreeBSD. For now I have focused on the Linux format, because I have little experience with other systems.
There are three types of files: utmp, wtmp, and btmp. utmp holds only the current login state, wtmp is a historical log of utmp records, and btmp records failed login attempts. All three share the same record format, which is described in 'man 5 utmp'.
The key definitions are as follows:
```c
#define EMPTY 0 /* Record does not contain valid info (formerly known as UT_UNKNOWN on Linux) */
#define RUN_LVL 1 /* Change in system run-level (see init(8)) */
#define BOOT_TIME 2 /* Time of system boot (in ut_tv) */
#define NEW_TIME 3 /* Time after system clock change (in ut_tv) */
#define OLD_TIME 4 /* Time before system clock change (in ut_tv) */
#define INIT_PROCESS 5 /* Process spawned by init(8) */
#define LOGIN_PROCESS 6 /* Session leader process for user login */
#define USER_PROCESS 7 /* Normal process */
#define DEAD_PROCESS 8 /* Terminated process */
#define ACCOUNTING 9 /* Not implemented */
#define UT_LINESIZE 32
#define UT_NAMESIZE 32
#define UT_HOSTSIZE 256

struct exit_status {              /* Type for ut_exit, below */
    short int e_termination;      /* Process termination status */
    short int e_exit;             /* Process exit status */
};

struct utmp {
    short ut_type;                /* Type of record */
    pid_t ut_pid;                 /* PID of login process */
    char ut_line[UT_LINESIZE];    /* Device name of tty - "/dev/" */
    char ut_id[4];                /* Terminal name suffix, or inittab(5) ID */
    char ut_user[UT_NAMESIZE];    /* Username */
    char ut_host[UT_HOSTSIZE];    /* Hostname for remote login, or kernel version for run-level messages */
    struct exit_status ut_exit;   /* Exit status of a process marked as DEAD_PROCESS; not used by Linux init(1) */
    /* The ut_session and ut_tv fields must be the same size when compiled 32- and 64-bit.
       This allows data files and shared memory to be shared between 32- and 64-bit applications. */
#if __WORDSIZE == 64 && defined __WORDSIZE_COMPAT32
    int32_t ut_session;           /* Session ID (getsid(2)), used for windowing */
    struct {
        int32_t tv_sec;           /* Seconds */
        int32_t tv_usec;          /* Microseconds */
    } ut_tv;                      /* Time entry was made */
#else
    long ut_session;              /* Session ID */
    struct timeval ut_tv;         /* Time entry was made */
#endif
    int32_t ut_addr_v6[4];        /* Internet address of remote host; IPv4 address uses just ut_addr_v6[0] */
    char __unused[20];            /* Reserved for future use */
};
```
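To make the C definition concrete, here is a minimal Python sketch that decodes one record with the struct module. It assumes the little-endian x86/x86_64 glibc layout, where ut_session and ut_tv are 32-bit; the names `UTMP`, `TYPE_NAMES`, `cstr`, and `parse_record` are hypothetical helpers for illustration, not part of bulk_extractor-rec or my Gist parser.

```python
import struct
from datetime import datetime, timedelta, timezone

# Assumption: little-endian x86/x86_64 glibc layout with 32-bit ut_session
# and ut_tv (__WORDSIZE_COMPAT32). "xx" is the 2-byte padding after ut_type.
UTMP = struct.Struct("<hxxi32s4s32s256shhiii16s20s")

# ut_type values from the #define list
TYPE_NAMES = {0: "EMPTY", 1: "RUN_LVL", 2: "BOOT_TIME", 3: "NEW_TIME",
              4: "OLD_TIME", 5: "INIT_PROCESS", 6: "LOGIN_PROCESS",
              7: "USER_PROCESS", 8: "DEAD_PROCESS", 9: "ACCOUNTING"}


def cstr(raw: bytes) -> str:
    """Decode a fixed-size char array up to its first NUL (if any)."""
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")


def parse_record(buf: bytes) -> dict:
    """Unpack exactly one 384-byte record into a dict of named fields."""
    (ut_type, pid, line, ut_id, user, host, e_term, e_exit,
     session, tv_sec, tv_usec, addr_v6, _unused) = UTMP.unpack(buf)
    return {
        "type": TYPE_NAMES.get(ut_type, str(ut_type)),
        "pid": pid,
        "line": cstr(line),
        "id": cstr(ut_id),
        "user": cstr(user),
        "host": cstr(host),
        "exit": (e_term, e_exit),
        "session": session,
        # ut_tv is seconds + microseconds since the Unix epoch (UTC)
        "time": datetime.fromtimestamp(tv_sec, tz=timezone.utc)
                + timedelta(microseconds=tv_usec),
        "addr_v6": addr_v6,
    }
```

For a whole file, slice the bytes into 384-byte chunks and call parse_record on each chunk.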
bulk_extractor-rec utmp scanner
Based on the definition above, each field has the following size.
| field | size (bytes) |
|---|---|
| ut_type | 4 |
| ut_pid | 4 |
| ut_line | 32 |
| ut_id | 4 |
| ut_user | 32 |
| ut_host | 256 |
| ut_exit | 4 |
| ut_session | 4 |
| tv_sec | 4 |
| tv_usec | 4 |
| ut_addr_v6 | 16 |
| unused | 20 |
| Total | 384 |
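ut_type is declared as a short (2 bytes), but I have confirmed that it occupies 4 bytes in actual data. This is due to alignment: 2 bytes of padding follow ut_type so that the 4-byte ut_pid starts on a 4-byte boundary.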
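A scanner needs byte offsets, not just sizes. This small Python sketch accumulates the sizes from the table into offsets and also shows the padding effect with native alignment. The helper name `field_offsets` is hypothetical, and the `@hi` result of 8 is what common platforms such as x86/x86_64 produce, not a guarantee for every ABI.

```python
import struct

# (field, size in bytes) exactly as in the table; ut_type's 4 bytes
# include 2 bytes of alignment padding.
FIELDS = [("ut_type", 4), ("ut_pid", 4), ("ut_line", 32), ("ut_id", 4),
          ("ut_user", 32), ("ut_host", 256), ("ut_exit", 4),
          ("ut_session", 4), ("tv_sec", 4), ("tv_usec", 4),
          ("ut_addr_v6", 16), ("unused", 20)]


def field_offsets(fields):
    """Accumulate sizes into {name: (offset, size)} plus the total size."""
    offsets, pos = {}, 0
    for name, size in fields:
        offsets[name] = (pos, size)
        pos += size
    return offsets, pos


OFFSETS, RECORD_SIZE = field_offsets(FIELDS)

# With native alignment, a short followed by an int typically takes
# 2 (short) + 2 (padding) + 4 (int) = 8 bytes; hence ut_type "is" 4 bytes.
SHORT_THEN_INT = struct.calcsize("@hi")

for name, (off, size) in OFFSETS.items():
    print(f"{name:<11} offset {off:>3}  size {size:>3}")
print("total", RECORD_SIZE, "| short+int natively:", SHORT_THEN_INT)
```

For example, tv_sec starts at offset 340 and the unused bytes occupy offsets 364 to 383, which are the positions a scanner would check in a candidate 384-byte window.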
To make the utmp scanner more precise, I created the following rules based on actual utmp records.
- ut_type: 1-8 (I have never seen 0 or 9 in real data)
- ut_line, ut_user, and ut_host: printable ASCII characters terminated by \x00
- tv_sec: a positive number
- tv_usec: 0-999999 (1000000 microseconds would be a whole second)
- unused: all 20 bytes are \x00
The bulk_extractor-rec utmp scanner searches for byte patterns that meet these requirements and carves the matches out to a file named utmp.
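The rules above can be sketched as a record validator plus a naive byte-by-byte carving loop. This is a minimal Python illustration assuming the little-endian 384-byte layout from the table; `text_ok`, `looks_like_utmp`, and `carve` are hypothetical names, the reading of "terminated by \x00" (printable text up to the first NUL, empty allowed) is my assumption, and this is not how bulk_extractor-rec's scanner is actually implemented.

```python
import struct

RECORD_SIZE = 384


def text_ok(field: bytes) -> bool:
    """Printable ASCII up to a terminating NUL (assumed reading of the rule)."""
    text, nul, _rest = field.partition(b"\x00")
    return nul == b"\x00" and all(0x20 <= b < 0x7F for b in text)


def looks_like_utmp(rec: bytes) -> bool:
    """Apply the rules to one candidate window."""
    if len(rec) != RECORD_SIZE:
        return False
    ut_type = struct.unpack_from("<h", rec, 0)[0]          # padding bytes 2-3 ignored
    tv_sec, tv_usec = struct.unpack_from("<ii", rec, 340)  # ut_tv
    return (1 <= ut_type <= 8
            and text_ok(rec[8:40])      # ut_line
            and text_ok(rec[44:76])     # ut_user
            and text_ok(rec[76:332])    # ut_host
            and tv_sec > 0
            and 0 <= tv_usec <= 999999
            and rec[364:384] == bytes(20))  # unused


def carve(data: bytes):
    """Yield (offset, record) for every window in data that passes the rules."""
    pos = 0
    while pos + RECORD_SIZE <= len(data):
        rec = data[pos:pos + RECORD_SIZE]
        if looks_like_utmp(rec):
            yield pos, rec
            pos += RECORD_SIZE  # records are stored back to back, so jump ahead
        else:
            pos += 1
```

Concatenating the yielded records produces a file in utmp format. A production scanner would avoid slicing at every byte offset, but the linear scan keeps the rules easy to see.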
Example:
```
bulk_extractor -E utmp -o output input
bulk_extractor -x all -e gzip -e utmp -o output input
```

To search for records inside gzip-compressed data, the gzip scanner must also be enabled, as in the second command.

In this instance, both wtmp and btmp were empty (zero bytes), but bulk_extractor-rec still recovered a number of utmp records.
Parser
The records that bulk_extractor-rec finds are not in chronological order, so last/lastb may show misleading information.
I uploaded a simple Python 3 parser to Gist that converts utmp records to TSV. A Windows executable can be downloaded from the following link.
Windows: utmp_parser.zip
(SHA-256: 60d14a3af0d5c0bf87c836a7c31b0f7c952c28c4916345df35c5c9208b79613f)
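The following Python sketch is a hypothetical stand-in for that parser. It is not the code on Gist, and its column set is my own choice. It shows the idea: decode each 384-byte record, sort by (tv_sec, tv_usec) to restore chronological order, and write TSV. It assumes the same little-endian x86/x86_64 layout as the table and formats IPv4 addresses from ut_addr_v6[0] as the man page describes.

```python
import csv
import io
import ipaddress
import struct
from datetime import datetime, timezone

# Assumed little-endian x86/x86_64 layout (384 bytes, 32-bit ut_tv).
UTMP = struct.Struct("<hxxi32s4s32s256shhiii16s20s")


def cstr(raw: bytes) -> str:
    """Decode a fixed-size char array up to its first NUL (if any)."""
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")


def addr_str(raw: bytes) -> str:
    """IPv4 uses only ut_addr_v6[0]; an all-zero field means no address."""
    if raw == bytes(16):
        return ""
    if raw[4:] == bytes(12):
        return str(ipaddress.ip_address(raw[:4]))
    return str(ipaddress.ip_address(raw))


def to_tsv(data: bytes) -> str:
    rows = []
    # Walk the carved file in 384-byte steps; a trailing partial record is ignored.
    for off in range(0, len(data) - UTMP.size + 1, UTMP.size):
        f = UTMP.unpack_from(data, off)
        # indices: 0 type, 1 pid, 2 line, 4 user, 5 host, 9 tv_sec, 10 tv_usec, 11 addr
        rows.append((f[9], f[10], f[0], f[1], cstr(f[2]), cstr(f[4]),
                     cstr(f[5]), addr_str(f[11])))
    rows.sort(key=lambda r: (r[0], r[1]))  # chronological by (tv_sec, tv_usec)

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["time_utc", "type", "pid", "line", "user", "host", "addr"])
    for tv_sec, tv_usec, ut_type, pid, line, user, host, addr in rows:
        ts = datetime.fromtimestamp(tv_sec, tz=timezone.utc)
        writer.writerow([f"{ts:%Y-%m-%d %H:%M:%S}.{tv_usec:06d}",
                         ut_type, pid, line, user, host, addr])
    return out.getvalue()
```

Usage: `print(to_tsv(open("output/utmp", "rb").read()))`, where the path is only an example of wherever the carved file was written.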

Deleted utmp records in particular may reveal the root cause of an intrusion. I hope bulk_extractor-rec helps your work!
