Context#
While working on this project (TODO LINK) involving ARM64 register programming I quickly felt the need to have a small command line tool that would give me some info on a given register, to avoid constantly looking in the ARM64 spec.
Luckily for me, ARM folks did a great job (this time) and they nowadays release “machine-readable” descriptors of their ISA, in the form of gigantic JSON files describing among other things every register available.
A while ago I had to implement an ARM32 emulator for my uKasan, but at that time I only had found the XML human-readable version which was a real pain to parse.
This JSON spec is MUCH better and I can’t thank ARM folks enough for it. It can be found alongside all the exploration tools for the AARCH64 ISA here, and the json database can be downloaded at the Open Source Machine Readable Data
section or using this link (if still functional).
But simplicity never lasts long : when I can, I prefer re-coding my libraries myself and avoid relying on third party code, as well as programming in C.
Back then for uKasan, after looking for 2 long minutes at the immediately defaulted to python after looking at the XML’s EBNF I quickly decided to doubly hold my nose and to :
- rely on a third party XML parsing library.
- use python.
which kinda hurt, let’s be honest.
But JSON is a whole simpler kind of beast. A glance a the JSON EBNF will show you how simple it is compared to XML.
I have already implemented a couple of JSON parsers for past use cases, but those were small JSON files, and I cared more about the functionality itself than about its performance.
This time, it is different : the ARM64 Register descriptor is 82MB wide, so the parsing time is non-neglectible.
My wish was to have my little command line database load as quickly as possible, and the more it progressed, the more it looked like it could deserve its own article.
First, we will start by some general considerations on the JSON format and its parsing. Then, we’ll discuss some algorithmic optimizations that can increase the performance. Then, we’ll go a level below and try to optimize some steps of the parsing algorithm at the assembly level to increase the performance even more.
The JSON format#
Anyone interested in the exact JSON syntax spec should definitely go checkout this dedicated website. It is great, and very clearly defines all the basic JSON data structures, so clearly that I was just tempted to post screenshots of it in this article. But I refrained.
Long story very short, the JSON file defines the basic value structure which can be any of :
- null : equivalent to void or none. Indicates the value of the XML format these days.
- boolean : false or true.
- number : decimal positive or negative integer or float.
- string : sequence of chars delimited by double quotes, backslashed characters have special meanings.
- array : sequence of values separated by commas and delimited by brackets.
- object : sequence of
key (string) : value
separated by commas and delimited by brackets.
Note that the beauty of the JSON is that the array and object structures contain values and hence make this format recursive. That will matter when we’ll discuss parsing.
The JSON spec essentially defines the low level features of our parser. It will need to recognize every entity supported by the spec and parse it correctly.
Skipping, and why it matters#
A parser is a piece of code able to extract and process data structured in a given format.
A counter-intuitive fact is that in practice, a (linear) parser is essentially a skip engine : at any given time, it knows :
- the position in the parsed file where it is at;
- what entity is at the current position.
- how to skip the current entity, aka move to the following.
Let’s see this with an example.
In all this section, we will ignore whitespaces to keep the explanation simple.
Let’s suppose that our parser is at the start of an object, and that it wants to extract some information contained in this object. The parser needs to skip the object start. The JSON format defines ‘{’ as a single-character array start, so the parser can simply move to the next character. The parser is now at a string start. As per the JSON format, the current char should be ‘"’. After checking this fact, the parser can move to the next character. The parser is now in a string body. The JSON format defines that the string will end with a ‘"’ not preceded by a ‘/’. It can move to the next character until it detects it. If the string should be stored somewhere, it can copy it directly. The parser is now at a string end ‘"’. It can move to the next character. The parser is now at a key-value separator. After checking that the current character is ‘:’, it can move to the next character. The parser is now at a value start. It can now call the generic value parser which will :
- detect the value’s first character; and
- call the relevant substructure parser based on it; and
- return the location at which parsing should resume (i.e : the location of the first character after the value. The parser is now at either :
- an array end ‘}’, in which case the parsing is done and the parser returns the location of the character after the ‘}’; or
- a key:value separator ‘,’, in which case the parser reiterates the key-value parsing routine.
So far, you can see that our parser is just a character skip engine with benefits : sometimes it may copy those characters or process them (for integer parsing), but what really matters is that it knows how to skip the current entity, in order to move to the next.
But you should also realize that skipping an entity is non-trivial. In order for our object to be fully parsed (skipped), we will have needed to skip _ every single entity that composed it _.
This fact alone gives us the first golden rule of JSON parsing :
We previously saw that the JSON spec itself defines almost entirely the low level aspect of our parser. But there is one question that is still pending : how are we going to use those low level features ? How are we going to parse our JSON’s components ?
So before elaborating on the implications of the first rule, we need to define the high level features of our parser.
Parsing a data exchange format#
A JSON file is essentially a data container. But are we always interested in all the information that it contains ?
In our case, the answer is no. The ARM64 JSON spec describes everything about the registers and instruction, but there are many things that we are not interested in (like “Accessors” ?!).
The job of our parser will be to extract targetted information.
For example, in our case, the ARM64 register spec is a gigantic array of register descriptors, where each register descriptor has a lot more info that we care about.
In the following section, [.] means “all entries of the array”, and [“xxx”] means the value of the current object indexed by “xxx”.
What we care, and the way to access it in the JSON is :
- root : array : register descriptor.
- [.] : object : register descriptor.
- ["name"] : str : register name.
- ["purpose"] : str : register purpose (null 99% of the time).
- ["title"] : str : register title (null 99% of the time).
- ["fieldets"] : array : layouts for this register (bad naming).
- [.] : object : layout descriptor.
- ["width"] : int : layout size in bits.
- ["values"] : array : bitfield descriptor (any suggestion for a more confusing name ?).
- [.] : object : bitfield descriptor.
- ["name"] : str : bitfield name.
- ["rangeset"] : array : of bit range descriptors.
- [.] : object : bitfield descriptor.
- ["start"] : int : bitfield start.
- ["width"] : int : bitfield size.
Our parser’s high level features hence sound pretty simple :
- iterate over every element of an array and parse each element with a dedicated function.
- iterate over a certain set of object elements indexed by specific keys and parse them with a dedicated function.
Our previously established golden rule will have a consequence on the second feature : since we must avoid parsing anything twice, we can’t afford to lookup an object once for each key that we are looking for. We will need to parse the object once and extract all elements in the order in which they appear.
Sample headers, macros and parsing code#
Here are the declarations of both the low level and high level parts of the parser :
Spec of the JSON parser.
#ifndef NS_LIB_JS_H
#define NS_LIB_JS_H
/************
* Skip API *
************/
/*
* Skip functions receive the location of the start of a json entity,
* and if it is valid, they return a pointer to the first character after
* the said entity.
* If it is invalid, they return 0.
*/
/*
* Skip whitespaces.
* Never fails.
*/
const char *ns_js_skp_whs(
const char *start
);
#define NS_JS_SKP_WHS(x) ({x = (typeof(x)) ns_js_skp_whs(x); check(x); x;})
/*
* Skip "true".
*/
const char *ns_js_skp_true(
const char *start
);
/*
* Skip "false".
*/
const char *ns_js_skp_false(
const char *start
);
/*
* Skip "null".
*/
const char *ns_js_skp_null(
const char *start
);
/*
* Skip characters.
*/
const char *ns_js_skp_chars(
const char *start
);
/*
* Skip a string.
*/
const char *ns_js_skp_str(
const char *start
);
/*
* Skip a integer.
*/
const char *ns_js_skp_int(
const char *start
);
/*
* Skip a fraction.
*/
const char *ns_js_skp_frc(
const char *start
);
/*
* Skip an exponent.
*/
const char *ns_js_skp_exp(
const char *start
);
/*
* Skip a number.
*/
const char *ns_js_skp_nb(
const char *start
);
/*
* Skip a member.
*/
const char *ns_js_skp_mmb(
const char *start
);
/*
* Skip members.
*/
const char *ns_js_skp_mmbs(
const char *start
);
/*
* Skip members until the elment at @key.
*/
const char *ns_js_skp_mmbs_to(
const char *pos,
const char *key
);
/*
* Skip an object.
*/
const char *ns_js_skp_obj(
const char *start
);
/*
* Skip an element.
*/
const char *ns_js_skp_elm(
const char *start
);
/*
* Skip elements.
*/
const char *ns_js_skp_elms(
const char *start
);
/*
* Skip an array.
*/
const char *ns_js_skp_arr(
const char *start
);
/*
* Skip a value.
*/
const char *ns_js_skp_val(
const char *start
);
/*****************
* Array parsing *
*****************/
/*
* Start the parsing of a json array starting at @start.
* Return the location of the first element if a valid array starts
* at @start, 0 if no valid array starts at @start.
* If the returned value is the end of the arra, set @end.
* If the returned array is a valid element, clear @end.
*/
const char *ns_js_arr_prs_stt(
const char *start,
u8 *end
);
/*
* Return the start of the next element of the currently parsed array.
* @start is the value returned by either @ns_js_arr_parse_*.
* If the skip fails, return 0.
* If the returned value is the end of the array, set @end.
* If the returned array is a valid element, clear @end.
*/
const char *ns_js_arr_prs_skp(
const char *start,
u8 *end
);
/*
* Return the start of the next element of the currently parsed array.
* @pos is the value returned by either @ns_js_arr_parse_*.
* If the skip fails, return 0.
* If the returned value is the end of the array, set @end.
* If the returned array is a valid element, clear @end.
*/
const char *ns_js_arr_prs_nxt(
const char *pos,
u8 *end
);
/*
* Finish the parsing of a json array ending at @start.
* Return the location of the next character after it, if a valid array ends
* at @start, 0 otherwise.
*/
const char *ns_js_arr_prs_end(
const char *start
);
/******************
* Object parsing *
******************/
/*
* Start the parsing of a json object starting at @pos.
* Return the location of the first element if a valid object starts
* at @pos, 0 if no valid object starts at @pos.
* If the returned value is the end of the object, set @end.
* If the returned object is a valid element, clear @end.
*/
const char *ns_js_obj_prs_stt(
const char *pos,
u8 *end
);
/*
* Return the start of the next element of the currently parsed object.
* @pos is the value returned by either @ns_js_obj_parse_*.
* If the skip fails, return 0.
* If the returned value is the end of the object, set @end.
* If the returned object is a valid element, clear @end.
*/
const char *ns_js_obj_prs_skp(
const char *pos,
u8 *end
);
/*
* Return the start of the next element of the currently parsed object.
* @pos is the value returned by either @ns_js_obj_parse_*.
* If the skip fails, return 0.
* If the returned value is the end of the object, set @end.
* If the returned object is a valid element, clear @end.
*/
const char *ns_js_obj_prs_nxt(
const char *pos,
u8 *end
);
/*
* Finish the parsing of a json object ending at @pos.
* Return the location of the next character after it, if a valid object ends
* at @pos, 0 otherwise.
*/
const char *ns_js_obj_prs_end(
const char *pos
);
/*************
* Iteration *
*************/
/*
* Iterate over all elements of a container.
*/
#define ns_js_cnt_fe(typ, idt, jsn) \
for ( \
char __ns_js_itr_end_##idt = ({jsn = (typeof(jsn)) ns_js_##typ##_prs_stt(jsn, (u8 *) &__ns_js_itr_end_##idt); __ns_js_itr_end_##idt; }); \
({if (__ns_js_itr_end_##idt) {jsn = (typeof(jsn)) ns_js_##typ##_prs_end(jsn);}; (__ns_js_itr_end_##idt == 0);}); \
jsn = (typeof(jsn)) ns_js_##typ##_prs_nxt(jsn, (u8 *) &__ns_js_itr_end_##idt) \
)
/*
* Iterate over all elements of an object.
*/
#define ns_js_obj_fe_(jsn, i) ns_js_cnt_fe(obj, i, jsn)
#define ns_js_obj_fe(jsn) ns_js_cnt_fe(obj, 0, jsn)
/*
* Iterate over all elements of an array.
*/
#define ns_js_arr_fe_(jsn, i) ns_js_cnt_fe(arr, i, jsn)
#define ns_js_arr_fe(jsn) ns_js_cnt_fe(arr, 0, jsn)
/**************
* Extraction *
**************/
#define EXPAND(...) __VA_ARGS__
#define REMOVE(...)
/*
* Iterate over all members of the
* objecty starting at @jsn, and parse
* members that match the list of fields
* using dedicated extractors, storing them
* in locally defined variables.
*/
#define NS_JS_XTR_DEF(nam, prs, str, len, ...) \
_unused_ typeof(ns_js_prs_##prs(0 NS_PRP_CDN_EMP(__VA_ARGS__)(, __VA_ARGS__))) nam = 0;
#define NS_JS_XTR_PRS(nam, prs, str, len, ...) \
if ((nam_siz == len NS_PRP_CDT_EMP(len) (sizeof(str) - 1)) && (!strn_cmp(str, nam_stt, len NS_PRP_CDT_EMP(len) (sizeof(str) - 1)))) { \
nam = ns_js_prs_##prs(&__jsn NS_PRP_CDN_EMP(__VA_ARGS__)(, __VA_ARGS__)); \
goto __found; \
}
#define NS_JS_XTR(jsn, ...) \
\
/* Define local variables to contain parsing results. */ \
NS_PRP_CAL(NS_JS_XTR_DEF, EMPTY, __VA_ARGS__) \
{ \
typeof(jsn)__jsn = jsn; \
ns_js_obj_fe_(__jsn, xtr) { \
\
/* Skip the name, compute its length. */ \
check(*__jsn == '"'); \
const char *nam_stt = __jsn + 1; \
__jsn = (typeof(__jsn)) ns_js_skp_str(__jsn); \
const char *nam_end = __jsn - 1; \
check(nam_end >= nam_stt); \
const uad nam_siz = psub(nam_end, nam_stt); \
\
/* Find the element. */ \
NS_JS_SKP_WHS(__jsn); \
check((*__jsn) == ':'); \
__jsn++; \
NS_JS_SKP_WHS(__jsn); \
\
/* Compare all candidates. If found, jump to __found. */ \
NS_PRP_CAL(NS_JS_XTR_PRS, EMPTY, __VA_ARGS__) \
\
/* If not found, skip the colon and element. */ \
__jsn = (typeof(__jsn)) ns_js_skp_val(__jsn); \
\
/* Once parsed, focus on the next element or stop. */ \
__found:; \
} \
jsn = __jsn; \
}
#endif /* NS_LIB_JS_H */
A few notes :
- the
Skip API
section defines the low level part of the parser. As we covered previously, it is mostly composed of functions that receive a pointer to the start of a particular entity, and that return the location of the next character after this entity. - the
Array parsing
andObject parsing
sections define functions that allow a granular parsing of those containers, substructure by substructure. - the
Iteration
sections define macros that expand into for statements which use the container parsing utilities defined by the section above, they allow iterating over all elements of a container in a very compact way. - the
NS_JS_XTR
macro is the toplevel primitive to use to parse an object. In one single object processing sequence, it allows the simple extraction of a specified set ofkey:value
. Since the types of the results of the parsing of those values depends on what we are parsing, it cannot be implemented as a function. Rather, it is a macro that expands into the dedicated object parsing sequence. It also automatically defines the variables that contain the result of the parsing. - If one takes a closer look at the
NS_JS_XTR
macro, they will notice that it internally uses macros prefixed byNS_PRP
that it does not define. Those are part of my preprocessor tricks lib, which is out of the scope of this topic, but in particular :NS_PRP_CAL(name, separator, ...)
expands into a call toname
with every value passed in__VA_ARGS__
.NS_PRP_CDT_EMP(cdt) (code)
will expand tocode
ifcdt
is empty and to nothing ifcdt
is not empty.NS_PRP_CDT_EMP(cdt) (code)
will expand to nothing ifcdt
is empty and tocode
ifcdt
is not empty.
Now let’s take a look at the code to extract (and simply print) the data that we need from the register spec.
ARM64 register spec extraction code using the JSON parser.
/*
* Parse and return an u64.
*/
u64 ns_js_prs_u64(
char **jsnp
)
{
char *jsn = *jsnp;
uerr err = 0;
u64 val = 0;
char *end = (char *) str_to_u64_auto(jsn, &val, &err);
check(!err);
*jsnp = end;
return val;
}
/*
* Parse and return a string or a 'null'.
*/
char *ns_js_prs_str_or_nul(
char **jsnp
)
{
char *jsn = *jsnp;
if (*jsn == 'n') {
*jsnp = jsn + 4;
return 0;
}
char *end = (char *) ns_js_skp_str(jsn);
*(end - 1) = 0;
*jsnp = end;
return jsn + 1;
}
/*
* Parse a range set.
*/
char *ns_js_prs_rng_set(
char **jsnp
)
{
char *jsn = *jsnp;
/* Iterate over each fieldset. */
ns_js_arr_fe(jsn) {
NS_JS_XTR(
jsn,
(stt, u64, "start",),
(wid, u64, "width",)
);
info("\t\t\trng :%U:%U\n", stt, wid);
}
*jsnp = jsn;
return 0;
}
/*
* Parse a field value.
*/
char *ns_js_prs_fld_val(
char **jsnp
)
{
char *jsn = *jsnp;
/* Iterate over each fieldset. */
ns_js_arr_fe(jsn) {
NS_JS_XTR(
jsn,
(nam, str_or_nul, "name",),
(set, rng_set, "rangeset",)
);
info("\t\t\tnam :%s\n", nam);
}
*jsnp = jsn;
return 0;
}
/*
* Parse register fieldset.
*/
char *ns_js_prs_reg_flds(
char **jsnp
)
{
char *jsn = *jsnp;
/* Iterate over each fieldset. */
ns_js_arr_fe(jsn) {
NS_JS_XTR(
jsn,
(siz, u64, "width",),
(val, fld_val, "values",)
);
info("\tfieldset :\n");
info("\t\tsiz :%U\n", siz);
}
*jsnp = jsn;
return 0;
}
/*
* Parse the ARM64 register database.
*/
char *ns_js_prs_rdb(
char **jsnp
)
{
char *jsn = *jsnp;
/* Iterate over each register. */
ns_js_arr_fe(jsn) {
char *jsnm = (char *) jsn;
NS_JS_XTR(
jsnm,
(flds, reg_flds, "fieldsets",),
(nam, str, "name",),
(prp, str_or_nul, "purpose",),
(ttl, str_or_nul, "title",)
);
jsn = jsnm;
info("%s :\n\t%s\n\t%s\n", nam, ttl, prp);
}
*jsnp = jsn;
return 0;
}
/********
* Main *
********/
/*
* rdb main.
*/
u32 prc_rdb_main(
u32 argc,
char **argv
)
{
/* Open and map the reg db. */
ns_res stg_res;
ns_stg *stg = nsk_stg_opn(&stg_res, "/home/bt/Downloads/armdb/Registers.json");
assert(stg);
char *jsn = ns_stg_map(stg, 0, 0, ns_stg_siz(stg), NS_STG_ATT_RED | NS_STG_ATT_WRT);
assert(jsn);
/* Parse. */
ns_js_prs_rdb(&jsn);
return 0;
}
A few notes :
- this relies heavily on my own standard library (all the ns_ prefixed stuff). In particular :
- printing is made via
info
, format specifiers may look weird (%U
for 64 bits integers). That’s normal, since I use my own format decoder I took the liberty to reassign a few specifiers to cover up the stdlib’s madness. In particular,%u
and%U
are unsigned ints,%i
and%I
are for signed ints,%f
is for float, and%d
is for double (in practice both are the same, given how the C handles float promotion in variadic args). - since we just print, parsers do not return anything. If we want to not just print, we can allocate structures on the fly with
malloc
, initialize them with the parsed data and return them. As long as the parser signatures are changed to reflect this, and that the parent callers do not lose track of them, no problem.
Actually, let’s do it.
Here is the code that parses the arm register db and that stores it in a tree-like data structure.
ARM64 register spec extraction code using the JSON parser.
/**************
* Structures *
**************/
/*
* Bitfield.
*/
typedef struct {
/* Bitfields of the same register. */
ns_dls bfls;
/* Name. */
const char *nam;
/* Start. */
u8 stt;
/* Size. */
u8 siz;
} rdb_bfl;
/*
* Register.
*/
typedef struct {
/* Registers of the same db. */
ns_mapn_str regs;
/* Bitfields. */
ns_dls bfls;
/* Number of bits. */
u8 siz;
} rdb_reg;
/*
* System.
*/
typedef struct {
/* Registers sorted by name. */
ns_map_str regs;
} rdb_sys;
/***********
* Parsers *
***********/
/*
* Parse and return an u8.
*/
u8 ns_js_prs_u8(
char **jsnp
)
{
char *jsn = *jsnp;
uerr err = 0;
u8 val = 0;
char *end = (char *) str_to_u8_auto(jsn, &val, &err);
check(!err);
*jsnp = end;
return val;
}
/*
* Parse and return a string or a 'null'.
*/
char *ns_js_prs_str_or_nul(
char **jsnp
)
{
char *jsn = *jsnp;
if (*jsn == 'n') {
*jsnp = jsn + 4;
return 0;
}
char *end = (char *) ns_js_skp_str(jsn);
*(end - 1) = 0;
*jsnp = end;
return jsn + 1;
}
/*
* Parse a range set, return the covered range.
*/
u16 ns_js_prs_rng_set(
char **jsnp
)
{
char *jsn = *jsnp;
/* Iterate over each bit range, enlarge if needed. */
u8 stt = 0;
u8 end = 0;
u8 mult = 0;
ns_js_arr_fe(jsn) {
NS_JS_XTR(
jsn,
(_stt, u8, "start",),
(_siz, u8, "width",)
);
u8 _end = _stt + _siz;
/* Enlarge if needed. */
if (mult) {
if (_stt < stt) stt = _stt;
if (_end > end) end = _end;
} else {
stt = _stt;
end = _end;
}
stt = _stt;
mult = 1;
}
*jsnp = jsn;
return (u16) stt | (u16) (end - stt) << 8;
}
/*
* Parse a bit field array.
* Store allocated bitfield descriptors in @dst.
*/
u8 ns_js_prs_bfl(
char **jsnp,
ns_dls *dst
)
{
char *jsn = *jsnp;
/* Iterate over each bitfield.
* Expect one in practice. */
u8 mul = 0;
ns_js_arr_fe(jsn) {
assert(!mul, "multi fieldset register.\n");
NS_JS_XTR(
jsn,
(nam, str_or_nul, "name",),
(set, rng_set, "rangeset",)
);
if (nam) {
ns_alloc__(rdb_bfl, bfl);
ns_dls_ib(dst, &bfl->bfls);
bfl->nam = nam;
bfl->stt = (u8) (set & 0xff);
bfl->siz = (u8) ((set >> 8) & 0xff);
}
}
*jsnp = jsn;
return 0;
}
/*
* Parse a register layout array expectedly containing
* only one layout.
* Store bitfields in @dst.
* Return the register size.
*/
u8 ns_js_prs_reg_lyts(
char **jsnp,
ns_dls *dst
)
{
char *jsn = *jsnp;
/* Iterate over each fieldset. */
u8 siz = 0;
u8 mult = 0;
ns_js_arr_fe(jsn) {
assert(!mult, "multi-layout register.\n");
NS_JS_XTR(
jsn,
(_siz, u8, "width",),
(_val, bfl, "values",,dst)
);
siz = _siz;
}
*jsnp = jsn;
return siz;
}
/*
* Parse the ARM64 register database.
* Store register content in @sys.
*/
char *ns_js_prs_rdb(
char **jsnp,
rdb_sys *sys
)
{
char *jsn = *jsnp;
/* Iterate over each register. */
ns_js_arr_fe(jsn) {
/* Parse the register descriptor,
* store bitfields in a local list. */
ns_dls_def(bfls);
NS_JS_XTR(
jsn,
(siz, reg_lyts, "fieldsets",,bfls),
(nam, str, "name",),
(prp, str_or_nul, "purpose",),
(ttl, str_or_nul, "title",)
);
assert(nam, "unnamed register.\n");
/* Create a register descriptor,
* save the name, transfer bitfields. */
ns_alloc__(rdb_reg, reg);
uerr err = ns_map_str_put(&sys->regs, ®->regs, nam);
if (err) {
//debug("duplicate register '%s'.\n", nam); // There are a few of those...
rdb_bfl *bfl;
ns_dls_fes(bfl, bfls, bfls) {
ns_free_(bfl);
}
ns_dls_init(bfls);
ns_free_(reg);
} else {
ns_dls_rp(bfls, ®->bfls);
reg->siz = siz;
}
/* Ensure nothing dangling. */
ns_dls_del(bfls);
}
*jsnp = jsn;
return 0;
}
Notes :
- while writing this code I renamed a few things, so the reader will notice changes in the parser names. No functional change is implemented.
- allocation is performed using
ns_alloc__
which performs variable def, allocation and size computation at once for less C code. - the code uses some functions of my own (non-)standard library, namely, my linked list lib (
ns_dls
) and my map (ns_map
) lib, so as their dedicated iterators.
Complete JSON parsing time#
First, to have a vague idea of what is considered an acceptable decoding time, let’s have the ARM64 register file decoded by another library.
Here I chose python for simplicity.
import json
json.load(open('/tmp/Registers.json'))
Let’s run it and check how much time it takes.
$ time python3 /tmp/test.py
real 0m0.403s
user 0m0.326s
sys 0m0.055s
Here, let’s remember that we’re asking python to :
- decode the whole file, which our case study code has to also do, because of what we covered earlier.
- construct a python object tree to contain all the information contained in the JSON file.
Hence, we’re here asking python to do things that we are not doing on our side, so the python execution time must be considered just as a point of reference, and not as a relevant performance metric.
Just out of curiosity, I took a shot at doing the same thing (decoding the whole file and generating a C-like containerized version) with a library (ct
) that I wrote a few years ago. This library is not optimized (it likely does some parsing multiple times) and its perf most certainly sucks but you’ll see that the execution time (which is worse than python) stays in the same order of magnitude.
/*
* rdb main.
*/
u32 prc_rdb_main(
u32 argc,
char **argv
)
{
/* Open and map the reg db. */
ns_res stg_res;
ns_stg *stg = nsk_stg_opn(&stg_res, "/tmp/Registers.json");
assert(stg);
char *jsn = ns_stg_map(stg, 0, 0, ns_stg_siz(stg), NS_STG_ATT_RED | NS_STG_ATT_WRT);
assert(jsn);
/* Parse the whole file and generate a tree-like representation. */
ct_val *v = ct_js_parse_value((const char **) &jsn);
/* Yeah I know I didn't delete @v, it's just for show, who cares... (valgrind does...) */
return 0;
}
Then let’s run it :
$ time build/prc/prc
real 0m0.591s
user 0m0.573s
sys 0m0.017s
The reader may wonder as I did why the python side spends so much system time compared to mine. As you can see in the C code, I’m mmap-ing the whole JSON file in RAM for simplicity, which also has the benefit of not relying on
fread
constantly. It’s possible that the python side does rely on constantfread
s which could explain this. Bad job on my side for doing worse than this if that’s so.
I’m taking around 50% more time to do it because my library is bad, but again, it’s just to give you a rough estimate of how much time it takes on my machine to decode that large file.
Well, actually it may not be just because my lib is bad. I checked my build command and here it was :
gcc -Ibuild/prc/inc -Ibuild/lib/inc -MMD -O0 -DDEBUG -Wall -Wextra -Wconversion -Wshadow -Wundef -fno-common -Wall -Wno-unused-parameter -Wno-type-limits -g -o build/prc/obj/rdb/rdb.c.o -c prc/rdb.c
Fixing the brain and the build mode gives a better result.
time build/prc/prc
real 0m0.189s
user 0m0.174s
sys 0m0.014s
So I’m running quicker than python but again, both times have close orders of magnitude.
Targetted JSON parsing time.#
Now, let’s run the code that I showed you at the end of the previous section, which actually extracts register info and does something with it.
This JSON decoder is relatively performant, in the sense that it does not do anything stupid like parse the same content multiple times (which my previous example certainly did). It also does not create a full in-memory representation of all the data in the JSON file.
I also placed the register file in /tmp/ which is ram-backed to avoid any form of storage access.
$ build/prc/prc -rdb /tmp/regs.json -c 10
stp 0 : 72.442.
stp 1 : 49.858.
stp 2 : 49.996.
stp 3 : 48.785.
stp 4 : 48.795.
stp 5 : 49.943.
stp 6 : 48.750.
stp 7 : 50.366.
stp 8 : 50.716.
stp 9 : 51.777.
Average : 52.142.
The first iteration is slower. This can be due to many things, here are the two that come to my mind :
- CPU power transition : before I ran the code on my system, it is mostly idle. A first run will cause the CPU to burn power which will make it transition to a higer perf state, and potentially cause linux to move our process to a performant core.
- branch predictor training : this first iteration will be exactly the same as the other, and it will give the CPU an exact statistical representation of how branches are taken. Training the branch prediction based on this will effectively cause a better prediction on every next run.
The next iterations are statistically closer and this 50ms
execution time will be our base performance metric for the next chapters.
Let’s see how much we can shrink it !
Conclusion#
This article should have stated the base notions required to understand what JSON is, how it is parsed, and the potential difficulties and perf problems that arise when doing so.
The next part will show some high level algorithmic tricks to make this process more efficient.
Bonus : why I made this command line tool#
With that JSON database, a simple command line tool can do a lot of things for you. Here are the features that matter to me as a machine-level programmer :
$ build/prc/prc -h
Available rdb commands :
lst : list registers.
reg : show info about a register.
dec : decode the value of a register.
lyt : generate a C register layout struct.
List registers starting with a prefix.
$ build/prc/prc lst -n PMU
PMU
PMUACR_EL1
PMUSERENR
PMUSERENR_EL0
Decode the values of a register’s bitfield given the register name and a value for it.
$ build/prc/prc dec -n PMUSERENR -v 10
PMUSERENR : 32
- [0] : EN : 0x0
- [1] : SW : 0x1
- [2] : CR : 0x0
- [3] : ER : 0x1
Generate a C struct to directly manipulate bitfields in an easy way. Let’s see it in action with the ARM CPSR.
$ build/prc/prc lyt -n CPSR
typedef union {
uint32_t bits;
struct {
uint32_t M : 4;
uint32_t __res0 : 2;
uint32_t F : 1;
uint32_t I : 1;
uint32_t A : 1;
uint32_t E : 1;
uint32_t __res1 : 6;
uint32_t GE : 4;
uint32_t __res2 : 7;
uint32_t Q : 1;
uint32_t V : 1;
uint32_t C : 1;
uint32_t Z : 1;
uint32_t N : 1;
};
} CPSR_t;