Paul, some interesting new methods. The BX parser (sorry about the name) seems to be significantly faster than the current one when there are NO qualifiers. If there are qualifiers, it is reasonably faster. But the interesting bits started when I created a new method that uses a StringBuffer rather than "chunks".
The new method flies when there are qualifiers all over the place... but it is slower (albeit not slower than the current one) when there are no qualifiers... and that is a bit of a mystery... anyway... getting too late... Have a look and let me know.
126 lines of code changed in:
do not use null for empty fields, use an empty String.
11 lines of code changed in:
modified the last check in testSomeExtremeCases()
it appears that it should have 2 " in the result of the parse
Benoit, please lmk if this is incorrect.
3 lines of code changed in:
- handle nulls for lTrim(), lTrimKeepTabs(), and splitLine()
- splitLine now returns nulls for elements which are empty and have not been qualified
- Added a trimToNull method.
- All current tests pass with these changes
39 lines of code changed in:
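For reference, a minimal sketch of what a trimToNull helper along these lines could look like. The class name `TrimSketch` and the method body are assumptions for illustration, not the committed code:

```java
// Hypothetical sketch; the real method lives in the project's parser utilities.
final class TrimSketch {
    private TrimSketch() { }

    /** Returns the trimmed value, or null when the input is null or blank. */
    static String trimToNull(final String value) {
        if (value == null) {
            return null;
        }
        final String trimmed = value.trim();
        return trimmed.length() == 0 ? null : trimmed;
    }

    public static void main(final String[] args) {
        System.out.println(trimToNull("  abc "));
        System.out.println(trimToNull("   "));
    }
}
```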
Added an option (17) to compare the BX parser and the current parser.
It should be noted that the current parser fails on some tests (Paul, could you fix?)
Just select the number of repeats, the number of columns and whether the columns should be qualified or not... tell me your results. ta
40 lines of code changed in:
Paul, I've added some basic tests for null, empty, ",,," kind of things. I've also had a go at a parser, the regular expression is a dead-end or will become **extremely** complex due to our special and whacky cases... The basic tests make quite a few things break in the current version. I'll run a couple of speed tests to see where we're going...
28 lines of code changed in:
One more funny test....
1 lines of code changed in:
I have added a few very basic tests and they all seem to fail... Paul, could you investigate? thanks.
9 lines of code changed in:
converted to factory classes
74 lines of code changed in:
converted to IDataSet interface
3 lines of code changed in:
converted to factory classes
9 lines of code changed in:
added a few more tests.
69 lines of code changed in:
Added a few examples for the website.
116 lines of code changed in:
first fixed width test
36 lines of code changed in:
first cut at reg expressions...
3 lines of code changed in:
added a couple more tests
3 lines of code changed in:
added 2 more extreme tests. Possible bug on the last test.
Needs more discussion.
9 lines of code changed in:
removed code that was commented out since the new code is now the accepted version.
46 lines of code changed in:
Just to keep note.
11 lines of code changed in:
documented addition to splitLine
1 lines of code changed in:
added to javadoc description for splitLine
3 lines of code changed in:
Trim left and right space for unqualified elements.
7 lines of code changed in:
General clean Up by Eclipse (cleanup, organise imports and format).
523 lines of code changed in:
First cut at re-org to use Factory mechanisms. Converted 2 unit tests and they seem happy...
Still using IDataSet for the interface.
LargeSet not covered at this stage.
258 lines of code changed in:
the DataError should be immutable (i.e. no Set method)
15 lines of code changed in:
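A rough sketch of the immutable shape meant here: final fields set once in the constructor, getters only. The class name `ImmutableDataError` and its two fields are made up for illustration and are not the actual DataError members:

```java
// Hypothetical stand-in for DataError; field names are assumptions.
final class ImmutableDataError {
    private final String description;
    private final int lineNumber;

    ImmutableDataError(final String description, final int lineNumber) {
        this.description = description;
        this.lineNumber = lineNumber;
    }

    String getDescription() {
        return description;
    }

    int getLineNumber() {
        return lineNumber;
    }

    public static void main(final String[] args) {
        final ImmutableDataError error = new ImmutableDataError("too many columns", 12);
        System.out.println(error.getLineNumber() + ": " + error.getDescription());
    }
}
```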
Removed some throw Exception
Only non-runtime exceptions should be declared, and never at the 'Exception' level, which is far too generic and forces every caller to deal with something which is 'unknown'.
23 lines of code changed in:
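To illustrate the rule, a small sketch with invented method names (`firstLine` and `parseColumnCount` are not from the code base): the checked IOException is declared precisely, while the runtime NumberFormatException is left undeclared.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class, for illustration only.
final class ExceptionSketch {
    private ExceptionSketch() { }

    // Too generic would be: static String firstLine(Reader in) throws Exception
    static String firstLine(final Reader in) throws IOException {
        final BufferedReader reader = new BufferedReader(in);
        return reader.readLine();
    }

    // May throw NumberFormatException, a runtime exception, so it is not declared.
    static int parseColumnCount(final String text) {
        return Integer.parseInt(text.trim());
    }

    public static void main(final String[] args) throws IOException {
        System.out.println(firstLine(new StringReader("a,b\nc,d")));
        System.out.println(parseColumnCount(" 3 "));
    }
}
```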
starting to go through extreme tests
13 lines of code changed in:
removed system out
1 lines of code changed in:
fixed bug, should not trim off qualifier unless the element
began with a qualifier
3 lines of code changed in:
ParserUtils for fixed width files
15 lines of code changed in:
moved parse to FixedWidthParserUtils
10 lines of code changed in:
added a new method to add a collection of columns to the row
14 lines of code changed in:
- fixed line count bug
- moved parse to FixedWidthParserUtils
- moved constants
28 lines of code changed in:
moved fixed width method to FixedWidthParserUtils
deprecated method
2 lines of code changed in:
moved some constants in from LargeDataSet
34 lines of code changed in:
added a better comment to getColumns()
3 lines of code changed in:
Added a heuristic test that proves that using a StringBuffer delete is better than creating a new one...
60 lines of code changed in:
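The idea behind the StringBuffer test, sketched under simplifying assumptions: one buffer is emptied with delete() after each element instead of allocating a new one. The class `BufferReuseSketch` and its `split` method are hypothetical, and qualifiers are ignored entirely:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the project's splitLine and no qualifier handling.
final class BufferReuseSketch {
    private BufferReuseSketch() { }

    static List<String> split(final String line, final char delimiter) {
        final List<String> elements = new ArrayList<String>();
        final StringBuffer buffer = new StringBuffer();
        for (int i = 0; i < line.length(); i++) {
            final char c = line.charAt(i);
            if (c == delimiter) {
                elements.add(buffer.toString());
                // Reuse the same buffer rather than creating a new StringBuffer.
                buffer.delete(0, buffer.length());
            } else {
                buffer.append(c);
            }
        }
        elements.add(buffer.toString());
        return elements;
    }

    public static void main(final String[] args) {
        System.out.println(split("a,,b", ','));
    }
}
```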
Try to reduce the number of trimmings but Paul, could you check the comments //+ as I believe that those tests are redundant...
7 lines of code changed in:
try to reduce memory requirements by trimming to size the list.
5 lines of code changed in:
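A tiny sketch of the trimming idea, assuming the list is an ArrayList that was filled with spare capacity. The class and method names are invented for illustration:

```java
import java.util.ArrayList;

// Hypothetical sketch of releasing unused list capacity once loading is done.
final class TrimToSizeSketch {
    private TrimToSizeSketch() { }

    static ArrayList<String> load(final String[] rows) {
        // Deliberately oversized initial capacity, as a parser might use.
        final ArrayList<String> list = new ArrayList<String>(1000);
        for (int i = 0; i < rows.length; i++) {
            list.add(rows[i]);
        }
        // Shrink the backing array to the actual number of elements.
        list.trimToSize();
        return list;
    }

    public static void main(final String[] args) {
        System.out.println(load(new String[] {"row1", "row2"}));
    }
}
```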
Added a couple of whacky tests, some fail (on purpose); Paul could you check what results you expect and create a few more?
Thanks
32 lines of code changed in:
First cut at some interfaces. Paul, could you review and tell me if you think that they are well separated.
I think that PZParserFactory.java and PZParser.java are ok but have I put everything that is required for the manipulation
of a DataSet in IDataSet.java?
54 lines of code changed in:
Removed System.out
0 lines of code changed in:
removed freeMemory() call and updated constructor
I have not tested the changes yet. I will go back through and make sure they are still okay.
11 lines of code changed in:
removed freeMemory call and updated constructor
2 lines of code changed in:
added missing char version of constructor
31 lines of code changed in:
Optimised the ParserUtils to use char for delimiter and qualifier.
I have added deprecated methods for Strings (using only the first character). Could you find out where these are used, remove the calls to those and use the char versions instead?
All tests pass but we should add more... especially with regards to the multi line one...
Time to hit the sack!
381 lines of code changed in:
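The char/String arrangement described in this entry could look roughly like the sketch below. `DelimiterSketch` and `countDelimiters` are invented names, not ParserUtils methods; the deprecated String overload simply delegates using the first character:

```java
// Hypothetical sketch of a char-based method with a deprecated String overload.
final class DelimiterSketch {
    private DelimiterSketch() { }

    static int countDelimiters(final String line, final char delimiter) {
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == delimiter) {
                count++;
            }
        }
        return count;
    }

    /**
     * @deprecated use the char version; only the first character is used.
     */
    static int countDelimiters(final String line, final String delimiter) {
        return countDelimiters(line, delimiter.charAt(0));
    }

    public static void main(final String[] args) {
        System.out.println(countDelimiters("a,b,c", ','));
    }
}
```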
Forgot to append the actual element.
2 lines of code changed in:
expanded upon the tests. Made a little more generic. There is an
array of delimiters and qualifiers which we can fill in for whatever
we want to test. Implemented Benoit's formatting suggestions.
127 lines of code changed in:
started splitLine test. Publishing so I can work on it further from work
20 lines of code changed in:
Renamed to follow naming convention "ClassNameMethodToTest"
20 lines of code changed in:
added CSV with hdr and trailer file to make #5 work okay
0 lines of code changed in:
Added code to print the errors found in the file if there were any.
Pointed to a text file with no header and trailer to correspond with
the mapping.
9 lines of code changed in:
added header and trailer checks
20 lines of code changed in:
Keep the header and trailer in the same order when moving to the bottom
9 lines of code changed in:
Fixed the test.
1 lines of code changed in:
Took the liberty of making the tests more explicit in order to detect any potential side effects. Say the lTrim correctly removed the leading space and left the last one but mangled the text in between: the original tests would not have spotted that.
I have also added a space in the middle of the word to detect more potential issues.
Finally, I have added a method at the bottom and this has raised a question about the exact spec... Paul, please have a look.
i.e. lTrimKeepTabs, what if the string starts with a tab and then a space and then some text "\t blabla", what should the result be??? "\t blabla" (now) or "\tblabla" ???
19 lines of code changed in:
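To make the lTrimKeepTabs question concrete, here is a sketch of the two candidate behaviours. Both methods are assumptions written for illustration: the first stops at the first character that is not a space (which would give the current result), the second also drops spaces that follow a leading tab.

```java
// Hypothetical sketches of the two readings of "keep tabs"; not the real code.
final class LTrimSketch {
    private LTrimSketch() { }

    // Reading 1: strip leading spaces only, stop at anything else (tabs included).
    static String lTrimKeepTabs(final String value) {
        int start = 0;
        while (start < value.length() && value.charAt(start) == ' ') {
            start++;
        }
        return value.substring(start);
    }

    // Reading 2: drop every space in the leading whitespace, keep the tabs.
    static String lTrimKeepTabsDropSpaces(final String value) {
        final StringBuffer kept = new StringBuffer();
        int i = 0;
        while (i < value.length()
                && (value.charAt(i) == ' ' || value.charAt(i) == '\t')) {
            if (value.charAt(i) == '\t') {
                kept.append('\t');
            }
            i++;
        }
        return kept.append(value.substring(i)).toString();
    }

    public static void main(final String[] args) {
        System.out.println("[" + lTrimKeepTabs("\t blabla") + "]");
        System.out.println("[" + lTrimKeepTabsDropSpaces("\t blabla") + "]");
    }
}
```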
Fixed link to download page.
1 lines of code changed in:
bad test on keep leading tabs...corrected
1 lines of code changed in:
Test cases for lTrim and lTrimKeepTabs
11 lines of code changed in:
added checks for header and trailer records
17 lines of code changed in:
added default system type to get around JDOM parse error. More notes in task manager
7 lines of code changed in:
backed out a 1.5 only method Integer.valueOf
2 lines of code changed in:
Uploaded new site.
2 lines of code changed in:
Some serious kicking...
1/ use a map for finding the column index; this makes the fetch of the first or last column consistent
2/ removed SOME of the substring calls which are causing dramatic performance degradation when one has a fair amount of columns.
3/ optimised some string manipulation code (getDelimiterOffset, lTrim, lTrimKeepTabs, removeChar)
4/ I would suggest the creation of a suite of unit tests for all those methods.... Paul, do you want to take this on?
320 lines of code changed in:
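Point 1/ in sketch form: build a name-to-position map once, so that fetching the first or the last column costs the same. `ColumnIndexSketch` and its methods are hypothetical names, not the project's API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch of a column-name lookup backed by a map.
final class ColumnIndexSketch {
    private ColumnIndexSketch() { }

    static Map<String, Integer> buildIndex(final String[] columnNames) {
        final Map<String, Integer> index = new HashMap<String, Integer>();
        for (int i = 0; i < columnNames.length; i++) {
            index.put(columnNames[i], Integer.valueOf(i));
        }
        return index;
    }

    static int indexOf(final Map<String, Integer> index, final String name) {
        final Integer position = index.get(name);
        if (position == null) {
            throw new NoSuchElementException("Unknown column: " + name);
        }
        return position.intValue();
    }

    public static void main(final String[] args) {
        final Map<String, Integer> index =
                buildIndex(new String[] {"FIRSTNAME", "LASTNAME", "CITY"});
        System.out.println(indexOf(index, "CITY"));
    }
}
```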
javadoc package
0 lines of code changed in:
package javadoc
0 lines of code changed in:
javadoc package docs
0 lines of code changed in:
Fix the links to Word and PDF doco.
6 lines of code changed in:
updated package structure to net.sf on bat file
1 lines of code changed in:
tiny amount of formatting.
0 lines of code changed in:
scoping and using PreparedStatement (always better).
81 lines of code changed in:
Reduce the scope of some variables that can now be declared 'final'; this in turn should help the JVM to optimize the runtime code, as well as keeping the memory requirement to a minimum.
38 lines of code changed in:
Fix the homepage and reduced the scope of some variables.
39 lines of code changed in:
Link to the documentation (somehow it had been removed...)
6 lines of code changed in:
Avoid a loop with string addition.
6 lines of code changed in:
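A generic sketch of the pattern: append to a single StringBuffer inside the loop instead of concatenating Strings, which creates a new String on every pass. `JoinSketch.join` is an invented example, not the code that was changed:

```java
// Hypothetical example of replacing "result += value" inside a loop.
final class JoinSketch {
    private JoinSketch() { }

    static String join(final String[] values, final char separator) {
        final StringBuffer buffer = new StringBuffer();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                buffer.append(separator);
            }
            buffer.append(values[i]);
        }
        return buffer.toString();
    }

    public static void main(final String[] args) {
        System.out.println(join(new String[] {"a", "b", "c"}, ','));
    }
}
```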
Couple of changes for website.
40 lines of code changed in:
Few site changes, also preparing a press release.
171 lines of code changed in:
Final items for move to net.sf.pzfilereader.
15 lines of code changed in:
Moved to net.sf.pzfilereader
7 lines of code changed in: