This shows you the differences between two versions of the page.
| — |
gnucap:user:istring [2018/09/24 07:42] (current) felixs created |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | == case sensitivity in maps == | ||
| + | gnucap provides a flag for case sensitivity that affects the parsing and | ||
| + | interpretation of identifiers. It is user controlled. The main use case is the | ||
| + | support for mostly obsolete but still widely used input languages such as | ||
| + | SPICE. the case sensitivity flag affects declaration and lookup. | ||
| + | |||
| + | In case insensitive mode, an identifier is cast to lower case when it is | ||
| + | entered, and when it is looked up in a dictionary. This can be seen when | ||
| + | switching to the SPICE language, which turns case sensitivity off. | ||
| + | |||
| + | gnucap> spice | ||
| + | gnucap-spice>.param a=1. | ||
| + | gnucap-spice>.param B=2. | ||
| + | gnucap-spice>.param | ||
| + | a= 1. b= 2. | ||
| + | |||
| + | internally, these names are stored in lower case, not case preserving, similar | ||
| + | to names in FAT12, but lower case. back to verilog mode, things look sensitive | ||
| + | again. | ||
| + | |||
| + | gnucap-spice>.verilog | ||
| + | gnucap-verilog>param C=2. | ||
| + | gnucap-verilog>param c=3. | ||
| + | gnucap-verilog>param b=4. | ||
| + | gnucap-verilog>param | ||
| + | .C( 2.), .a( 1.), .b( 4.), .c(3.) | ||
| + | |||
| + | in particular, the actual case is lost, and the parameter B is gone. | ||
| + | |||
| + | gnucap-verilog>eval C | ||
| + | C= 1. | ||
| + | gnucap-verilog>eval B | ||
| + | B=B | ||
| + | gnucap-verilog>eval b | ||
| + | b= 4. | ||
| + | |||
| + | similarly, when a parameter "X" is declared in case sensitive mode, it is | ||
| + | invisible in case insensitive mode. This can have surprising consequences, when | ||
| + | running simulations (e.g. parameter sweeps) in spice mode. | ||
| + | |||
| + | Newer case insensitive file systems (e.g. NTFS, VFAT, HFS+) are case | ||
| + | preserving, not without odd side effects. on NTFS, readme.txt and a Readme.txt | ||
| + | can coexist, according to wikipedia. Since file systems usually fix the | ||
| + | sensitivity mode, the side effects are limited. But the approaches do not work | ||
| + | very well in gnucap. | ||
| + | |||
| + | gnucap should be case preserving internally, but must implement a lookup that | ||
| + | depends on the current mode. | ||
| + | |||
| + | gnucap> options noinsensitive | ||
| + | gnucap> param A=1. | ||
| + | gnucap> options insensitive | ||
| + | gnucap> eval A | ||
| + | A= 1. | ||
| + | gnucap> eval a | ||
| + | a= 1. | ||
| + | gnucap> param A=2. | ||
| + | gnucap> eval A | ||
| + | A= 2. | ||
| + | gnucap> eval a | ||
| + | a= 2. | ||
| + | gnucap> param A=3. | ||
| + | gnucap> eval A | ||
| + | A= 3. | ||
| + | gnucap> eval a | ||
| + | a= 3. | ||
| + | |||
| + | With an approriate lookup, no mangling will be required in case insensitive | ||
| + | mode. But how to implement an appropriate lookup? Gnucap stores parameters in | ||
| + | an std::map<std::string, PARAMETER>. The problem with that is, if "AAAA" is a | ||
| + | key, a case insensitive lookup for "aaaa" will be tricky. The expectation is, | ||
| + | that "AAAA" will be found. however, Maps store keys in binary trees, and the | ||
| + | search is implemented as a bisection, for performance reasons. | ||
| + | In a linear search, we could simply cast the search string to upper, as well as | ||
| + | all the keys that we compare it to. In the map above, we'd have to do a lookup | ||
| + | for all strings that "aaaa" could possibly match. Sixteen in this case, but | ||
| + | much worse for longer strings. | ||
| + | |||
| + | The issue with the lookup boils down to the fact that the insensitive matches | ||
| + | could be anywhere in the map, which is sorted by the order induced by the ASCII | ||
| + | encoding, 1 < 2 < ... < 9 < A < B ... < Z < a < b < ... < z. The following | ||
| + | example illustrates the pattern, which easily scales to any size. | ||
| + | |||
| + | AAAA * | ||
| + | ABBB | ||
| + | AAaa * | ||
| + | AABB | ||
| + | Aaaa * | ||
| + | Bbbb | ||
| + | aaaa * | ||
| + | |||
| + | In contrast, if the keys were sorted insensitively first and blocks with | ||
| + | congruent keys sorted individually, case sensitive, the order would be | ||
| + | |||
| + | AAAA * | ||
| + | AAaa * | ||
| + | Aaaa * | ||
| + | aaaa * | ||
| + | AABB | ||
| + | ABBB | ||
| + | Bbbb. | ||
| + | |||
| + | Putting all possible matches to our query next to each other. In an std::map | ||
| + | sorted like this, lookups and inserts with either case (in)sensitivity mode | ||
| + | will work, while the mode may change in between. | ||
| + | |||
| + | - implement the string comparison operator | ||
| + | - encapsulate it in a custom string type | ||
| + | - provide case sensitive and insensitive lookup, by default these should use OPT::case_insensitive. | ||
| + | |||
| + | (this is implemented in the paramap-* branch) | ||
| + | |||
| + | == CARD_LIST == | ||
| + | |||
| + | With this kind of string in place, it will be possible to also implement an | ||
| + | efficient CARD_LIST::find, supporting both case sensitivities. CARD_LISTs | ||
| + | implement the lists of CARDs, which represent all user input, including the | ||
| + | order, for each individual scope. The CARDs could be models or components or | ||
| + | subcircuit declarations or just comments. | ||
| + | |||
| + | Unlike in PARAM_LISTs there can be many cards with the same key (short_label) | ||
| + | in one CARD_LIST, which is implemented as a linked list. The lookup is | ||
| + | implemented as a linear search, and hence slow. And if there was no case | ||
| + | sensitivity flag, this could be easily fixed. an std::multimap<string, | ||
| + | list_iterator> or std::multiset<list_iterator> right next to the std::list | ||
| + | could be used to lookup the position. | ||
| + | |||
| + | So the list needs to stay the same, and instead of the multiset, we need a | ||
| + | custom multiset that works transparently through case sensitivity changes. | ||
| + | One way of doing this, is to nest sets. One set, with elements sorted case | ||
| + | insensitively holds (at most) one multiset for each string class, a "bucket". | ||
| + | a bucket may then hold arbitrarily many CARDs, but in case sensitive ordering. | ||
| + | |||
| + | For example, if there are cards A A a B b b c, they will end up in three | ||
| + | buckets. | ||
| + | |||
| + | A - A | ||
| + | - A | ||
| + | - a | ||
| + | B - B | ||
| + | - b | ||
| + | - b | ||
| + | c - c | ||
| + | |||
| + | A case insensitive lookup for b will then try and find it in the B bucket, then | ||
| + | pick any of the cards there. similarly, a case sensitive lookup will pick one | ||
| + | of the lower case cards. since std::multiset preserves the insertion order, it | ||
| + | will be possible to fine tune the lookup. | ||
| + | |||
| + | (this is implemented in the multimap-* branch) | ||