Changeset 30490 in osm for applications/editors/josm/plugins/pbf/src
- Timestamp:
- 2014-06-09T18:08:08+02:00
- Location:
- applications/editors/josm/plugins/pbf/src/crosby/binary
- Files:
- 2 added
- 2 edited
applications/editors/josm/plugins/pbf/src/crosby/binary/BinaryParser.java
--- r26961
+++ r30490
@@ -55,5 +55,5 @@
     }
 
-    //@Override
+    @Override
     public void handleBlock(FileBlock message) {
         // TODO Auto-generated method stub
@@ -77,5 +77,5 @@
 
 
-    //@Override
+    @Override
     public boolean skipBlock(FileBlockPosition block) {
         // System.out.println("Seeing block of type: "+block.getType());
applications/editors/josm/plugins/pbf/src/crosby/binary/StringTable.java
--- r26961
+++ r30490
@@ -2,6 +2,6 @@
 
    This program is free software: you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation, either version 3 of the
+   it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation, either version 3 of the
    License, or (at your option) any later version.
 
@@ -46,5 +46,5 @@
 
     /** After the stringtable has been built, return the offset of a string in it.
-     *
+     *
      * Note, value '0' is reserved for use as a delimiter and will not be returned.
      * @param s
@@ -57,5 +57,5 @@
     public void finish() {
         Comparator<String> comparator = new Comparator<String>() {
-            //@Override
+            @Override
             public int compare(final String s1, String s2) {
                 int diff = counts.get(s2) - counts.get(s1);
@@ -63,4 +63,39 @@
             }
         };
+
+        /* Sort the stringtable */
+
+        /*
+        When a string is referenced, strings in the stringtable with indices:
+        0 : Is reserved (used as a delimiter in tags
+        A: 1 to 127 : Uses can be represented with 1 byte
+        B: 128 to 128**2-1 : Uses can be represented with 2 bytes,
+        C: 128*128 to X : Uses can be represented with 3 bytes in the unlikely case we have >16k strings in a block. No block will contain enough strings that we'll need 4 bytes.
+
+        There are goals that will improve compression:
+        1. I want to use 1 bytes for the most frequently occurring strings, then 2 bytes, then 3 bytes.
+        2. I want to use low integers as frequently as possible (for better
+        entropy encoding out of deflate)
+        3. I want the stringtable to compress as small as possible.
+
+        Condition 1 is obvious. Condition 2 makes deflate compress stringtable references more effectively.
+        When compressing entities, delta coding causes small positive integers to occur more frequently
+        than larger integers. Even though a stringtable references to indices of 1 and 127 both use one
+        byte in a decompressed file, the small integer bias causes deflate to use fewer bits to represent
+        the smaller index when compressed. Condition 3 is most effective when adjacent strings in the
+        stringtable have a lot of common substrings.
+
+        So, when I decide on the master stringtable to use, I put the 127 most frequently occurring
+        strings into A (accomplishing goal 1), and sort them by frequency (to accomplish goal 2), but
+        for B and C, which contain the less progressively less frequently encountered strings, I sort
+        them lexiconographically, to maximize goal 3 and ignoring goal 2.
+
+        Goal 1 is the most important. Goal 2 helped enough to be worth it, and goal 3 was pretty minor,
+        but all should be re-benchmarked.
+
+
+        */
+
+
 
         set = counts.keySet().toArray(new String[0]);
@@ -71,8 +106,8 @@
         // sorted lexiconographically.
         // to maximize deflate compression.
-
+
         // Don't sort the first array. There's not likely to be much benefit, and we want frequent values to be small.
         //Arrays.sort(set, Math.min(0, set.length-1), Math.min(1 << 7, set.length-1));
-
+
         Arrays.sort(set, Math.min(1 << 7, set.length-1), Math.min(1 << 14,
                 set.length-1));
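The ordering strategy described in the comment added to StringTable.java can be illustrated with a small standalone sketch. This is not the plugin's code: the class `StringTableOrderSketch` and the methods `varintSize` and `order` are hypothetical names introduced here, and the group boundaries (indices 1 to 127, 128 to 16383, 16384 and up) are assumed from the 7-bits-per-byte varint sizes the comment refers to, with index 0 reserved.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the pbf plugin.
class StringTableOrderSketch {

    /** Bytes needed to store a non-negative index as a varint (7 payload bits per byte). */
    static int varintSize(int index) {
        int size = 1;
        while ((index >>>= 7) != 0) {
            size++;
        }
        return size;
    }

    /**
     * Returns the strings in table order. Position p in the array corresponds
     * to stringtable index p + 1, because index 0 is assumed to be reserved.
     */
    static String[] order(final Map<String, Integer> counts) {
        String[] set = counts.keySet().toArray(new String[0]);

        // Most frequent strings first, so they receive the smallest indices.
        // Ties are broken by string order to keep the result deterministic.
        Arrays.sort(set, new Comparator<String>() {
            @Override
            public int compare(String s1, String s2) {
                int diff = Integer.compare(counts.get(s2), counts.get(s1));
                return diff != 0 ? diff : s1.compareTo(s2);
            }
        });

        // Group A (indices 1..127) keeps its frequency order.
        int endA = Math.min(127, set.length);
        // Group B (indices 128..16383) and group C (the rest) are each
        // sorted lexicographically so that neighbouring strings share prefixes.
        int endB = Math.min(16383, set.length);
        Arrays.sort(set, endA, endB);
        Arrays.sort(set, endB, set.length);
        return set;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        counts.put("highway", 40);
        counts.put("name", 25);
        counts.put("residential", 12);
        System.out.println(Arrays.toString(order(counts)));
        System.out.println(varintSize(127) + " " + varintSize(128) + " " + varintSize(16384));
    }
}
```

With fewer than 128 distinct strings, the result is simply the frequency ordering; the lexicographic pass only affects tables large enough to spill into the two-byte range.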