Meditations on Redesigning the Database Viewer
Recently I’ve been involved in the redesign of Cellebrite’s Database Viewer. The new viewer has a right pane that displays cell data more clearly in a specifically tailored view for each data type. Also, there is a new button that displays SQLite recovered records. In this blog post, I will share some thoughts about the process that went into designing both features.
A New Contender Has Entered the Race
Here is a sentence I’ve been wanting to say for a while, ever since I first encountered a .realm file: “The Realm Database is becoming increasingly more popular and has been establishing itself as a worthy opponent of SQLite in smartphone apps.”
I must admit that my rooting for this format had more to do with its intriguing structure than its actual usefulness. However, when I started decoding, I discovered that it is indeed a very powerful tool used by mobile app developers, which emerged from a very real necessity.
Realm is a database format in the sense that it has tables, rows, and columns, but it is not an SQL database. The practice of foreign keys is rendered completely obsolete because one or more rows of one table can be “stored” (actually referenced) in a cell of another table. It has seven different none-null primitive types instead of the four in SQLite, and with performance improvement when querying, and built-in libraries to handle objects, it is rather easy to see the appeal.
And so, our team at Cellebrite had to make a switch from thinking of a database (DB) viewer as an SQLite viewer to a more general understanding that the DB viewer should support all kinds of database formats. The problem, of course, was that we had no ability to display complex objects inside a cell. Instead, a string proclaiming that the type was not supported in that view was displayed as the cell’s content. The user had to switch to File Format Viewer in order to see the missing content. The new viewer solves this problem.
Identify → Deserialize → View → Search
Even before the rise of Realm, developers and third-party tools had been storing complex objects inside SQLite cells. In the case of cache databases, where different types of objects are stored in the developer’s preferred method of serialization this is particularly useful, and we can get a glimpse into what went into modeling these objects.
We have a powerful tool for identifying data-formats, which we utilize to identify the type of data in each BLOB. After the initial identification, we use a second tool to take a stream of data and a declared data-format, to output a deserialized object. These serialization formats include iOS’s Property-Lists, Java Object Serializations, JSONs, Google’s Protocol-Buffers, and some other lesser known formats.
All these data types have a corresponding viewing-component and can now be viewed in the new DB viewer in the right pane of the screen when a cell is selected. Our team at Cellebrite started by adding a Text and a Hex tab to the right pane and kept on adding more and more tabs each time a developer had a new idea: Serialized-Data, Images, HTML, Date & Time, and Decoded base64.
Since, in the course of an investigation, some strings are of special interest, it is paramount that one can select and copy a specific string or sub-string of a text, and search where else it was referenced.
A Window to What Was
The new DB viewer supports recovered records previously unseen. One of the main purposes of Cellebrite Physical Analyzer is to scan unallocated memory and search for specific patterns of bytes called “signatures.” These are remnants of data that were once linked or otherwise referenced, in order to infer a start point and a length in bytes, and by those means to extract a stream of recovered data. This process is called “carving.”
A common example of this occurs when a user decides to delete a photo that they took. The memory manager usually does not take the time to zero-fill the file, but instead marks its memory block as “unallocated.” However, this common example can distort a much more complicated reality: the recovered data was not necessarily deleted by the user.
So how else could we potentially interpret this data?
First, depending on the strength of the signature, (meaning the inverse probability of the pattern to appear for any other reason other than the indication of once-allocated data) the match could simply be a false positive – just an arbitrary stream of bytes.
Second, this could be a case where the data was re-allocated. Therefore, it is either a duplicate of allocated data or a previous version of it.
Third, this data could have been deleted. Not by the user, but by an application or operating system that deemed the data unnecessarily stored. In a typical case, data is stored both in the cloud and on the device for offline fetching. However, when the data is not accessed for a while, the memory manager could decide to free that space on the device to allocate it for better uses, such as recent message or contact data.
The SQLite format marks unallocated blocks within pages of table-data. Since SQLite almost always saves all data of a specific record closely together, we can scan those blocks for record signatures and extract them as recovered records.
It is important to note that the recovered data is not necessarily records that were deleted by the user. Some record signatures are relatively weak and are very prone to false positives, while others are just previous versions of allocated records. Still, this is a powerful tool to find deleted messages a user might have sent and decided to remove, and as such we want to be able to display them to investigators. In that sense a DB viewer can be both a window to what is, and a window to what was.
This project was not a one-man operation, and I would like to credit the teammates that worked with me to make sure these features saw the light of day: Nathan Pankowsky, Dror Rimon, Shlomi Hoota, Or Begam and Sharon Berel. Other features will soon be released as well, and we will continue to strive and make digital evidence more easily accessible.