Timestamp:
2014-06-09T18:08:08+02:00
Author:
donvip
Message:

[josm_pbf] upgrade to protoc 2.5.0 and crosby.binary 1.3.3

Location:
applications/editors/josm/plugins/pbf/src/crosby/binary
Files:
2 added
2 edited

  • applications/editors/josm/plugins/pbf/src/crosby/binary/BinaryParser.java

    r26961 r30490
    55     55         }
    56     56
    57                //@Override
           57         @Override
    58     58         public void handleBlock(FileBlock message) {
    59     59             // TODO Auto-generated method stub
    …      …
    77     77
    78     78
    79                //@Override
           79         @Override
    80     80         public boolean skipBlock(FileBlockPosition block) {
    81     81             // System.out.println("Seeing block of type: "+block.getType());
  • applications/editors/josm/plugins/pbf/src/crosby/binary/StringTable.java

    r26961 r30490
    2      2
    3      3         This program is free software: you can redistribute it and/or modify
    4                it under the terms of the GNU Lesser General Public License as 
    5                published by the Free Software Foundation, either version 3 of the 
           4         it under the terms of the GNU Lesser General Public License as
           5         published by the Free Software Foundation, either version 3 of the
    6      6         License, or (at your option) any later version.
    7      7
    …      …
    46     46
    47     47         /** After the stringtable has been built, return the offset of a string in it.
    48                 * 
           48          *
    49     49          * Note, value '0' is reserved for use as a delimiter and will not be returned.
    50     50          * @param s
    …      …
    57     57         public void finish() {
    58     58             Comparator<String> comparator = new Comparator<String>() {
    59                        //@Override
           59                 @Override
    60     60                 public int compare(final String s1, String s2) {
    61     61                     int diff = counts.get(s2) - counts.get(s1);
    …      …
    63     63                 }
    64     64             };
           65
           66             /* Sort the stringtable */
           67
           68             /*
           69             When a string is referenced, strings in the stringtable with indices:
           70                    0                : Is reserved (used as a delimiter in tags
           71              A:  1 to 127          : Uses can be represented with 1 byte
           72              B: 128 to 128**2-1 : Uses can be represented with 2 bytes,
           73              C: 128*128  to X    : Uses can be represented with 3 bytes in the unlikely case we have >16k strings in a block. No block will contain enough strings that we'll need 4 bytes.
           74
           75             There are goals that will improve compression:
           76               1. I want to use 1 bytes for the most frequently occurring strings, then 2 bytes, then 3 bytes.
           77               2. I want to use low integers as frequently as possible (for better
           78                  entropy encoding out of deflate)
           79               3. I want the stringtable to compress as small as possible.
           80
           81             Condition 1 is obvious. Condition 2 makes deflate compress stringtable references more effectively.
           82             When compressing entities, delta coding causes small positive integers to occur more frequently
           83             than larger integers. Even though a stringtable references to indices of 1 and 127 both use one
           84             byte in a decompressed file, the small integer bias causes deflate to use fewer bits to represent
           85             the smaller index when compressed. Condition 3 is most effective when adjacent strings in the
           86             stringtable have a lot of common substrings.
           87
           88             So, when I decide on the master stringtable to use, I put the 127 most frequently occurring
           89             strings into A (accomplishing goal 1), and sort them by frequency (to accomplish goal 2), but
           90             for B and C, which contain the less progressively less frequently encountered strings, I sort
           91             them lexiconographically, to maximize goal 3 and ignoring goal 2.
           92
           93             Goal 1 is the most important. Goal 2 helped enough to be worth it, and goal 3 was pretty minor,
           94             but all should be re-benchmarked.
           95
           96
           97             */
           98
           99
    65     100
    66     101            set = counts.keySet().toArray(new String[0]);
    …      …
    71     106              // sorted lexiconographically.
    72     107              // to maximize deflate compression.
    73
           108
    74     109              // Don't sort the first array. There's not likely to be much benefit, and we want frequent values to be small.
    75     110              //Arrays.sort(set, Math.min(0, set.length-1), Math.min(1 << 7, set.length-1));
    76
           111
    77     112              Arrays.sort(set, Math.min(1 << 7, set.length-1), Math.min(1 << 14,
    78     113                  set.length-1));
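The ordering strategy described in the new comment block in StringTable.java (the 127 most frequent strings at indices 1 to 127 in frequency order, the two-byte range re-sorted lexicographically) can be illustrated with a small, self-contained Java sketch. It is not the crosby.binary code: `StringTableSketch`, `varintSize` and `buildIndex` are hypothetical names. The sketch assumes that table indices are written as protobuf varints (7 payload bits per byte) and that the string at array position `i` receives index `i + 1`, because index 0 is reserved. The alphabetical tie-break among equally frequent strings is an addition for determinism, and strings past index 16383 are simply left in frequency order here.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the string-table layout described in the StringTable.java comment.
 * Not the crosby.binary implementation; all names are illustrative.
 */
final class StringTableSketch {

    /** Bytes a non-negative int occupies as a protobuf varint (7 payload bits per byte). */
    static int varintSize(int value) {
        int bytes = 1;
        while ((value >>>= 7) != 0) {
            bytes++;
        }
        return bytes;
    }

    /**
     * Returns string -> table index. Assumption: index 0 is reserved, so set[i] gets index i + 1.
     * Group A (indices 1..127) keeps descending-frequency order; group B (indices 128..16383)
     * is re-sorted lexicographically; anything beyond stays in frequency order.
     */
    static Map<String, Integer> buildIndex(Map<String, Integer> counts) {
        String[] set = counts.keySet().toArray(new String[0]);
        // Most frequent first; ties broken alphabetically so the result is deterministic.
        Comparator<String> byFrequency =
                Comparator.comparing((String s) -> counts.get(s)).reversed()
                        .thenComparing(Comparator.naturalOrder());
        Arrays.sort(set, byFrequency);

        int n = set.length;
        // set[127..16382] receive indices 128..16383, i.e. the two-byte varint range.
        Arrays.sort(set, Math.min(127, n), Math.min(16383, n));

        Map<String, Integer> index = new HashMap<>(2 * n);
        for (int i = 0; i < n; i++) {
            index.put(set[i], i + 1);
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(varintSize(1));      // 1
        System.out.println(varintSize(127));    // 1
        System.out.println(varintSize(128));    // 2
        System.out.println(varintSize(16383));  // 2
        System.out.println(varintSize(16384));  // 3
    }
}
```

With 200 strings whose counts are all distinct, the 127 most frequent keep their frequency order at indices 1 to 127, while the remaining 73 occupy indices 128 to 200 in alphabetical order, so adjacent entries in that range tend to share prefixes that deflate can exploit.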