The following changes (change numbers refer to Perforce) were
made from version 3.1.1 to 3.1.2.

Runtime
-------

Change 5641 on 2009/02/20 by jimi@jimi.jimi.antlr3

	Release version 3.1.2 of the ANTLR C runtime.

	Updated documents and release notes will have to follow later.

Change 5639 on 2009/02/20 by jimi@jimi.jimi.antlr3

	Fixed: ANTLR-356

	Ensure that code generation for C++ does not require casts.

Change 5577 on 2009/02/12 by jimi@jimi.jimi.antlr3

	C Runtime - Bug fixes.

	 o Moving to extract tokens directly from a vector when returning
	   them exposed a bug whereby the EOF boundary calculation in tokLT
	   was incorrectly checking > rather than >=.
	 o Changing to API initialization of tokens rather than memcmp()
	   incorrectly forgot to set the input stream pointer for the
	   manufactured tokens in the token factory.
	 o Rewrite streams for rewriting tree parsers did not check whether the
	   rewrite stream was ever assigned before trying to free it; it is now
	   in line with the ordinary parser code.

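	The first fix in change 5577 is a classic off-by-one. A minimal sketch
	of the idea follows; the Token and TokenStream types and the tok_lt
	helper are hypothetical stand-ins, not the runtime's actual tokLT code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified token and stream types for illustration only. */
typedef struct { int type; } Token;

typedef struct {
    const Token *tokens;  /* buffered tokens */
    size_t       count;   /* number of buffered tokens */
    size_t       pos;     /* index of the current token */
    Token        eof;     /* sentinel returned past the end */
} TokenStream;

/* Return the k-th lookahead token (k >= 1), or the EOF sentinel. */
static const Token *tok_lt(const TokenStream *ts, size_t k)
{
    assert(k >= 1);
    size_t i = ts->pos + k - 1;
    /* Valid indices are 0..count-1, so the test must be >=.
       With > instead, i == count would read one element past the buffer. */
    if (i >= ts->count) {
        return &ts->eof;
    }
    return &ts->tokens[i];
}
```

	With two buffered tokens and pos at 0, a lookahead of 3 lands on index
	2, which is exactly count, and so must yield the EOF sentinel.
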
Change 5576 on 2009/02/11 by jimi@jimi.jimi.antlr3

	C Runtime: Ensure that when we manufacture a new token for a missing
	token, the user-supplied custom information (if any) is copied
	from the current token.

Change 5575 on 2009/02/08 by jimi@jimi.jimi.antlr3

	C Runtime - Vastly improve the reuse of allocated memory for nodes in
	  tree rewriting.

	A problem for all targets at the moment is that the rewrite logic
	generated by ANTLR makes no attempt to reuse any resources; it merely
	guarantees that the tree shape at the end is correct. To some extent
	this is mitigated by the garbage collection systems of Java and .Net,
	even though it is still an overhead to keep creating so many nodes.

	This change implements the first of two C runtime changes that make
	best efforts to track when a node has become orphaned and will never
	be reused, based on inherent knowledge of the rewrite logic (which in
	the long term is not a great solution).

	Much of the rewrite logic consists of creating a nilNode into which
	child nodes are appended. At rulePost processing time, when a rewrite
	stream is closed, and when becomeRoot is called, there are many
	situations in which the root of the tree that will be manipulated, or
	that is finished with (in the case of rewrite streams), is a nilNode
	that was just a temporary creation for the sake of the rewrite itself.

	In these cases we can see that the nilNode would just be left to rot in
	the node factory that tracks all the tree nodes.
	Rather than leave these in the factory to rot, we now keep a reuse
	stack and always reuse any node on this stack before claiming a new
	node from the factory pool.

	This single change alone reduces memory usage in the test case (20,604
	line C program and a GNU C parser) from nearly a GB to 276MB. This is
	still way more memory than we should need to do this operation, even
	on such a large input file, but the reduction results in a huge
	performance increase and greatly reduced system time spent on
	allocations.

	After this optimization, comparison with gcc yields:

	time gcc -S a.c
	a.c:1026: warning: conflicting types for built-in function ‘vsprintf’
	a.c:1030: warning: conflicting types for built-in function ‘vsnprintf’
	a.c:1041: warning: conflicting types for built-in function ‘vsscanf’
	0.21user 0.01system 0:00.22elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+240outputs (0major+8345minor)pagefaults 0swaps

	and

	time ./jimi
	Reading a.c
	0.28user 0.11system 0:00.39elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+0outputs (0major+66609minor)pagefaults 0swaps

	And we can now infer that the only major difference is the huge
	disparity in memory allocations. A future optimization of vector
	pooling, to separate node reuse from vector reuse, currently looks
	promising for further reuse of memory.

	Finally, a static analysis of the rewrite code, plus a realtime analysis
	of the heap at runtime, may well give us a reasonable memory usage
	pattern. In reality though, it is the generated rewrite logic
	that must become optimal at not continuously rewriting things that it
	need not, as it ascends the rule chain.

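	The reuse stack described in change 5575 can be sketched roughly as
	below. This is an illustration under stated assumptions, not the
	runtime's code: Node, NodeFactory, factory_new_node and
	factory_recycle are hypothetical names, and the sketch calls malloc
	per node where the real factory draws from a pool.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tree node; the real runtime's node type is richer. */
typedef struct Node {
    int          type;
    struct Node *next_free;   /* link used only while on the reuse stack */
} Node;

typedef struct {
    Node  *reuse_top;   /* stack of orphaned nodes available for reuse */
    size_t allocated;   /* number of nodes obtained from malloc */
} NodeFactory;

/* Claim a node: prefer the reuse stack, fall back to a fresh allocation. */
static Node *factory_new_node(NodeFactory *f)
{
    Node *n = f->reuse_top;
    if (n != NULL) {
        f->reuse_top = n->next_free;   /* pop a recycled node */
    } else {
        n = malloc(sizeof *n);         /* assumption: stands in for the pool */
        if (n == NULL) return NULL;
        f->allocated++;
    }
    n->type = 0;
    n->next_free = NULL;
    return n;
}

/* Hand back a node known to be orphaned (e.g. a temporary nil node). */
static void factory_recycle(NodeFactory *f, Node *n)
{
    assert(n != NULL);
    n->next_free = f->reuse_top;
    f->reuse_top = n;
}
```

	The point of the design is that a temporary nil node recycled at the
	end of one rewrite is the first thing handed out by the next claim, so
	the allocation count tracks the peak number of live nodes rather than
	the total number ever created.
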
Change 5563 on 2009/01/28 by jimi@jimi.jimi.antlr3

	Allow rewrite streams to use the base adaptor's vector factory and not
	try to malloc new vectors themselves.

Change 5562 on 2009/01/28 by jimi@jimi.jimi.antlr3

	Don't use CALLOC to allocate tree pools; use malloc as there is no need
	for calloc.

Change 5561 on 2009/01/28 by jimi@jimi.jimi.antlr3

	Prevent warnings about retval.stop not being initialized when a rule
	returns early because it is in backtracking mode.

Change 5558 on 2009/01/28 by jimi@jimi.jimi.antlr3

	Lots of optimizations (though the next one to be checked in is the huge
	win) for AST building and vector factories.

	A large part of tree rewriting was the creation of vectors to hold AST
	nodes. Although I had created a vector factory, for some reason I never got
	around to creating a proper one that pre-allocated the vectors in chunks and
	so on. I guess I just forgot to. Hence a big win here is prevention of calling
	malloc lots and lots of times to create vectors.

	A second improvement was to change the vector definition such that it
	holds a certain number of elements within the vector structure itself, rather
	than mallocing and freeing these. Currently this is set to 8, but may increase.
	For AST construction, this is generally a big win because AST nodes don't often
	have many individual children unless there has not been any shaping going on in
	the parser. But if you are not shaping, then you don't really need a tree.

	Other performance improvements here include not calling functions
	indirectly within token stream and common token stream. Hence tokens are
	claimed directly from the vectors. Users can override these functions of course
	and all this means is that if you override tokenstreams then you pretty much
	have to provide all the methods, but then I think you would have to anyway (and
	I don't know of anyone that has wanted to do this as you can carry your own
	structure around with the tokens anyway and that is much easier).

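	The second improvement in change 5558 (a vector that keeps its first
	few elements inside the structure itself) might look something like
	the sketch below. SmallVec, vec_init, vec_add and vec_free are
	hypothetical names chosen for illustration; only the figure of 8
	inline slots comes from the notes above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_SLOTS 8   /* the "8" mentioned in the change description */

/* Hypothetical vector of pointers with inline storage for small sizes. */
typedef struct {
    void  **items;                     /* inline_buf or a heap block */
    size_t  count;
    size_t  capacity;
    void   *inline_buf[INLINE_SLOTS];
} SmallVec;

static void vec_init(SmallVec *v)
{
    v->items = v->inline_buf;
    v->count = 0;
    v->capacity = INLINE_SLOTS;
}

/* Append an element; returns 0 on success, -1 if allocation fails. */
static int vec_add(SmallVec *v, void *item)
{
    assert(v->count <= v->capacity);
    if (v->count == v->capacity) {
        size_t new_cap = v->capacity * 2;
        void **p;
        if (v->items == v->inline_buf) {
            /* First spill: copy the inline elements to a heap block. */
            p = malloc(new_cap * sizeof *p);
            if (p == NULL) return -1;
            memcpy(p, v->inline_buf, v->count * sizeof *p);
        } else {
            p = realloc(v->items, new_cap * sizeof *p);
            if (p == NULL) return -1;
        }
        v->items = p;
        v->capacity = new_cap;
    }
    v->items[v->count++] = item;
    return 0;
}

/* Release any heap block and return to the empty inline state. */
static void vec_free(SmallVec *v)
{
    if (v->items != v->inline_buf) free(v->items);
    vec_init(v);
}
```

	A vector holding eight or fewer children never touches malloc at all.
	One caveat of this particular sketch: because items may point into the
	structure itself, a SmallVec must not be copied by plain assignment.
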
Change 5555 on 2009/01/26 by jimi@jimi.jimi.antlr3

	Fixed: ANTLR-288
	Correct the interpretation of the skip token such that channel, start
	index, char pos in line, start line and text are correctly reset to the start of
	the new token when the one that we just traversed was marked as being skipped.

	This correctly excludes the text that was matched as part of the
	SKIP()ed token from the next token in the token stream and so has the side
	effect that asking for $text of a rule no longer includes the text that should
	be skipped, but DOES include the text of tokens that were merely placed off the
	default channel.

Change 5551 on 2009/01/25 by jimi@jimi.jimi.antlr3

	Fixed: ANTLR-287
	Most of the source files did not include the BSD license. This might
	not be that big a deal given that I don't care what people do with it
	other than take my name off it, but having the license reproduced
	everywhere at least makes things perfectly clear. Hence this mass change of
	sources and templates to include the license.

Change 5550 on 2009/01/25 by jimi@jimi.jimi.antlr3

	Fixed: ANTLR-365
	Ensure that, as soon as we know about an input stream on the lexer,
	we borrow its string factory and use it in our EOF token in case
	anyone tries to make it a string, such as in error messages for
	instance.

Change 5548 on 2009/01/25 by jimi@jimi.jimi.antlr3

	Fixed: ANTLR-363
	At some point the Java runtime default changed from discarding off-channel
	tokens to preserving them. The fix is to make the C runtime also
	default to preserving off-channel tokens.

Change 5544 on 2009/01/24 by jimi@jimi.jimi.antlr3

	Fixed: ANTLR-360
	Ensure that the fillBuffer function does not call any methods
	that require the cached buffer size to be recorded before we
	have actually recorded it.

Change 5543 on 2009/01/24 by jimi@jimi.jimi.antlr3

	Fixed: ANTLR-362
	Some users have started using string factories themselves and
	exposed a flaw in the destroy method, which is intended to remove
	a string that was created by the factory and is no longer needed.
	The string was correctly removed from the vector that tracks them
	but after the first one, all the remaining strings were then numbered
	incorrectly. Hence the destroy method has been recoded to reindex
	the strings in the factory after one is removed and everything is once
	more hunky dory.
	User suggested fix rejected.

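	The reindexing fix in change 5543 can be pictured with the sketch
	below. It assumes, purely for illustration, that each string records
	its own slot number in the factory's tracking array; FString,
	StringFactory and factory_destroy_string are hypothetical names and
	not the runtime's actual types.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical factory-owned string that remembers its slot. */
typedef struct {
    char  *chars;
    size_t index;    /* position in StringFactory.strings */
} FString;

typedef struct {
    FString **strings;
    size_t    count;
} StringFactory;

/* Remove s from the factory and renumber the strings that followed it. */
static void factory_destroy_string(StringFactory *f, FString *s)
{
    size_t i = s->index;
    assert(i < f->count && f->strings[i] == s);

    /* Close the gap; the regions overlap, so memmove rather than memcpy. */
    memmove(&f->strings[i], &f->strings[i + 1],
            (f->count - i - 1) * sizeof f->strings[0]);
    f->count--;

    /* Without this loop, every later string keeps a stale index. */
    for (; i < f->count; i++) {
        f->strings[i]->index = i;
    }

    free(s->chars);
    free(s);
}
```

	Skipping the renumbering loop reproduces the reported symptom: the
	first destroy works, but any later destroy looks in the wrong slot.
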
Change 5542 on 2009/01/24 by jimi@jimi.jimi.antlr3

	Fixed: ANTLR-366
	The recognizer state now ensures that all fields are set to NULL upon
	creation and the reset does not overwrite the tokenname array.

Change 5527 on 2009/01/15 by jimi@jimi.jimi.antlr3

	Add the C runtime for 3.1.2 beta2 to Perforce

Change 5526 on 2009/01/15 by jimi@jimi.jimivista.antlr3

	Correctly define the MEMMOVE macro which was inadvertently left to be
	memcpy.

Change 5503 on 2008/12/12 by jimi@jimi.jimi.antlr3

	Change C runtime release number to 3.1.2 beta

Change 5473 on 2008/12/01 by jimi@jimi.jimivista.antlr3

	Fixed: ANTLR-350 - C runtime use of memcpy
	Prior change to use memcpy instead of memmove in all cases missed the
	fact that the string factory can be in a situation where overlaps occur. We now
	have ANTLR3_MEMCPY and ANTLR3_MEMMOVE and use the two appropriately.

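	The distinction behind change 5473 is the standard C one: memcpy has
	undefined behavior when source and destination overlap, while memmove
	handles overlap correctly. A small sketch using the standard library
	calls directly (buf_insert is a hypothetical helper, not part of the
	runtime) shows where each belongs:

```c
#include <assert.h>
#include <string.h>

/* Insert src (len bytes) into buf at offset pos. buf holds `used` bytes
   of content plus a terminating NUL and has room for cap bytes in total.
   src must not point into buf. Returns 0 on success, -1 if it won't fit. */
static int buf_insert(char *buf, size_t cap, size_t used,
                      size_t pos, const char *src, size_t len)
{
    assert(pos <= used);
    if (used + len + 1 > cap) return -1;

    /* Shift the tail (including the NUL) to the right. Source and
       destination overlap within buf, so memmove is required. */
    memmove(buf + pos + len, buf + pos, used - pos + 1);

    /* src is a separate object, so memcpy is safe here. */
    memcpy(buf + pos, src, len);
    return 0;
}
```

	Inserting ", " at offset 5 of "helloworld" shifts "world" two bytes
	to the right within the same buffer, which is exactly the overlapping
	case that a blanket switch to memcpy gets wrong.
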
Change 5471 on 2008/12/01 by jimi@jimi.jimivista.antlr3

	Fixed: ANTLR-361
	 - Ensure that ANTLR3_BOOLEAN is typedef'ed correctly when building for
	   MinGW

Templates
---------

Change 5637 on 2009/02/20 by jimi@jimi.jimi.antlr3

	C runtime - make sure that ADAPTOR results are cast to the tree type on
	a rewrite

Change 5620 on 2009/02/18 by jimi@jimi.jimi.antlr3

	Rename/Move:
	From: //depot/code/antlr/main/src/org/antlr/codegen/templates/...
	To: //depot/code/antlr/main/src/main/resources/org/antlr/codegen/templates/...

	Relocate the code generating templates to exist in the directory set
	that Maven expects.

	When checking in your templates, you may find it easiest to make a copy
	of what you have, revert the change in Perforce, then just check out the
	template in the new location, and copy the changes back over. Nobody has more
	than two files open at the moment.

Change 5578 on 2009/02/12 by jimi@jimi.jimi.antlr3

	Correct the string template escape sequences for generating scope
	code in the C templates.

Change 5577 on 2009/02/12 by jimi@jimi.jimi.antlr3

	C Runtime - Bug fixes.

	 o Moving to extract tokens directly from a vector when returning
	   them exposed a bug whereby the EOF boundary calculation in tokLT
	   was incorrectly checking > rather than >=.
	 o Changing to API initialization of tokens rather than memcmp()
	   incorrectly forgot to set the input stream pointer for the
	   manufactured tokens in the token factory.
	 o Rewrite streams for rewriting tree parsers did not check whether the
	   rewrite stream was ever assigned before trying to free it; it is now
	   in line with the ordinary parser code.

Change 5567 on 2009/01/29 by jimi@jimi.jimi.antlr3

	C Runtime - Further Optimizations

	Within grammars that used scopes and were intended to parse large
	inputs with many rule nests, the creation and deletion of the scopes
	themselves became significant. Careful analysis shows that for most
	grammars, while a parse could create and delete 20,000 scopes, the
	maximum depth of any scope was only 8.

	This change therefore changes the scope implementation so that it does
	not free scope memory when it is popped but just tracks it in a C
	runtime stack, eventually freeing it when the stack is freed. This
	change caused the allocation of only 12 scope structures instead of
	20,000 for the extreme example case.

	This change means that scope users must be careful (as ever in C) to
	initialize their scope elements correctly as:

	1) If not, you may inherit values from a prior use of the scope
	    structure;
	2) Scope structures are now allocated with malloc and not calloc.

	Also, when using a custom free function to clean a scope when it is
	popped, it is probably a good idea to set any free'd pointers to NULL
	(this is generally good C programming practice in any case).

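	The scope handling described in change 5567 amounts to a stack that
	keeps popped structures around for the next push. The sketch below
	illustrates the idea and the initialization caveat; Scope, ScopeStack
	and the scope_* functions are hypothetical, and the fixed MAX_SCOPES
	bound is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user scope; real fields come from the grammar's scope. */
typedef struct {
    int   depth;
    char *name;
} Scope;

#define MAX_SCOPES 64   /* arbitrary bound for this sketch */

typedef struct {
    Scope *slots[MAX_SCOPES];  /* every scope ever allocated */
    size_t allocated;          /* how many slots hold a malloc'ed scope */
    size_t top;                /* current nesting depth */
} ScopeStack;

/* Push: reuse a previously popped structure when one exists. */
static Scope *scope_push(ScopeStack *s)
{
    if (s->top == MAX_SCOPES) return NULL;
    if (s->top == s->allocated) {
        Scope *fresh = malloc(sizeof *fresh);   /* malloc, not calloc */
        if (fresh == NULL) return NULL;
        s->slots[s->allocated++] = fresh;
    }
    /* Contents are indeterminate or stale: the caller must initialize. */
    return s->slots[s->top++];
}

/* Pop: keep the memory for the next push at this depth. */
static void scope_pop(ScopeStack *s)
{
    assert(s->top > 0);
    s->top--;
}

/* Release everything when the stack itself is freed. */
static void scope_stack_free(ScopeStack *s)
{
    for (size_t i = 0; i < s->allocated; i++) free(s->slots[i]);
    s->allocated = 0;
    s->top = 0;
}
```

	In this sketch the number of allocations equals the deepest nesting
	reached, not the number of pushes. It also shows why initialization
	matters: a push after a pop returns the same structure, still holding
	whatever the previous user left in it.
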
Change 5566 on 2009/01/29 by jimi@jimi.jimi.antlr3

	Remove redundant BACKTRACK checking so that MSVC9 does not get confused
	about possibly uninitialized variables

Change 5565 on 2009/01/28 by jimi@jimi.jimi.antlr3

	Use malloc rather than calloc to allocate memory for new scopes. Note
	that this means users will have to be careful to initialize any values in their
	scopes that they expect to be 0 or NULL and I must document this.

Change 5564 on 2009/01/28 by jimi@jimi.jimi.antlr3

	Use malloc rather than calloc for copying list label tokens for
	rewrites.

Change 5561 on 2009/01/28 by jimi@jimi.jimi.antlr3

	Prevent warnings about retval.stop not being initialized when a rule
	returns early because it is in backtracking mode.

Change 5560 on 2009/01/28 by jimi@jimi.jimi.antlr3

	Add a NULL check before freeing rewrite streams used in AST rewrites
	rather than auto-rewrites.

	While the NULL check is redundant as the free cannot be called unless
	it is assigned, Visual Studio C 2008 gets it wrong and thinks that
	there is a path that can arrive at the free without it being assigned
	and that is too annoying to ignore.

Change 5559 on 2009/01/28 by jimi@jimi.jimi.antlr3

	C target Tree rewrite optimization

	There is only one optimization in this change, but it is a huge one.

	The code generation templates were set up so that at the start of a rule,
	any rewrite streams mentioned in the rule were pre-created. However, this
	is a massive overhead for rules where only one or two of the streams are
	actually used, as we create them then free them without ever using them.
	This was copied from the Java templates basically.
	This caused literally millions of extra calls and vector allocations
	in the case of the GNU C parser given to me for testing with a 20,000
	line program.

	After this change, the following comparison is available against the gcc
	compiler:

	Before (different machines here so use the relative difference for
	comparison):

	gcc:

	real    0m0.425s
	user    0m0.384s
	sys     0m0.036s

	ANTLR C:

	real    0m1.958s
	user    0m1.284s
	sys     0m0.656s

	After the previous optimizations for vector pooling via a factory,
	plus this huge win in removing redundant code, we have the following
	(different machine to the one above):

	gcc:

	0.21user 0.01system 0:00.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+328outputs (0major+9922minor)pagefaults 0swaps

	ANTLR C:

	0.37user 0.26system 0:00.64elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+0outputs (0major+130944minor)pagefaults 0swaps

	The extra system time comes from the fact that although the tree
	rewriting is now optimal in terms of not allocating things it does
	not need, there is still a lot more overhead in a parser that is generated
	for generic use, including much more use of structures for tokens and extra
	copying and so on. I will continue to work on improving things where I
	can, but the next big improvement will come from Ter's optimization of
	the actual code structures we generate including not doing things with
	rewrite streams that we do not need to do at all.

	The second machine I used is about twice as fast CPU wise as the system
	that was used originally by the user that asked about this performance.

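	The optimization in change 5559 is essentially lazy creation: a
	stream exists only once something is added to it. The sketch below
	shows one way to express that in C. It is not the generated template
	code; RewriteStream, stream_add, stream_free and the streams_created
	counter are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a rewrite stream. */
typedef struct {
    int    *elements;
    size_t  count;
    size_t  capacity;
} RewriteStream;

static size_t streams_created = 0;   /* instrumentation for this sketch */

static RewriteStream *stream_new(void)
{
    RewriteStream *s = calloc(1, sizeof *s);
    if (s != NULL) streams_created++;
    return s;
}

/* Add to a stream, creating it only on first use. */
static int stream_add(RewriteStream **slot, int element)
{
    assert(slot != NULL);
    if (*slot == NULL) {
        *slot = stream_new();
        if (*slot == NULL) return -1;
    }
    RewriteStream *s = *slot;
    if (s->count == s->capacity) {
        size_t cap = s->capacity ? s->capacity * 2 : 4;
        int *p = realloc(s->elements, cap * sizeof *p);
        if (p == NULL) return -1;
        s->elements = p;
        s->capacity = cap;
    }
    s->elements[s->count++] = element;
    return 0;
}

/* Free a stream if it was ever created; NULL means it was never used. */
static void stream_free(RewriteStream **slot)
{
    if (*slot != NULL) {
        free((*slot)->elements);
        free(*slot);
        *slot = NULL;
    }
}
```

	A rule would start with all of its stream pointers set to NULL, so a
	rule that mentions several streams but uses one pays for one. The
	NULL test in stream_free is the same kind of guard that change 5560
	adds before freeing.
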
Change 5558 on 2009/01/28 by jimi@jimi.jimi.antlr3

	Lots of optimizations (though the next one to be checked in is the huge
	win) for AST building and vector factories.

	A large part of tree rewriting was the creation of vectors to hold AST
	nodes. Although I had created a vector factory, for some reason I never got
	around to creating a proper one that pre-allocated the vectors in chunks and
	so on. I guess I just forgot to. Hence a big win here is prevention of calling
	malloc lots and lots of times to create vectors.

	A second improvement was to change the vector definition such that it
	holds a certain number of elements within the vector structure itself, rather
	than mallocing and freeing these. Currently this is set to 8, but may increase.
	For AST construction, this is generally a big win because AST nodes don't often
	have many individual children unless there has not been any shaping going on in
	the parser. But if you are not shaping, then you don't really need a tree.

	Other performance improvements here include not calling functions
	indirectly within token stream and common token stream. Hence tokens are
	claimed directly from the vectors. Users can override these functions of course
	and all this means is that if you override tokenstreams then you pretty much
	have to provide all the methods, but then I think you would have to anyway (and
	I don't know of anyone that has wanted to do this as you can carry your own
	structure around with the tokens anyway and that is much easier).

Change 5554 on 2009/01/26 by jimi@jimi.jimi.antlr3

	Fixed: ANTLR-379
	For some reason in the past, the ruleMemoization() template had required
	that the name parameter be set to the rule name. This does not seem to be a
	requirement any more. The name=xxx override when invoking the template was
	causing all the scope names derived when cleaning up in memoization to be
	called after the rule name, which was not correct. However, this only affected
	the output when in output=AST mode.

	This template invocation is now corrected.

Change 5553 on 2009/01/26 by jimi@jimi.jimi.antlr3

	Fixed: ANTLR-330
	Managed to get the one rule that could not see the ASTLabelType to call
	back in to the super template C.stg and ask it to construct the name. I am not
	100% sure that this fixes all cases, but I cannot find any that fail. Please
	let me know if you find any examples of being unable to default the
	ASTLabelType option in the C target.

Change 5552 on 2009/01/25 by jimi@jimi.jimi.antlr3

	Progress: ANTLR-327
	Fix debug code generation templates when output=AST such that code
	can at least be generated and I can debug the output code correctly.
	Note that this checkin does not implement the debugging requirements
	for tree generating parsers.

Change 5551 on 2009/01/25 by jimi@jimi.jimi.antlr3

	Fixed: ANTLR-287
	Most of the source files did not include the BSD license. This might
	not be that big a deal given that I don't care what people do with it
	other than take my name off it, but having the license reproduced
	everywhere at least makes things perfectly clear. Hence this mass change of
	sources and templates to include the license.

Change 5549 on 2009/01/25 by jimi@jimi.jimi.antlr3

	Fixed: ANTLR-354
	Using 0.0D as the default initialize value for a double caused the
	VS 2003 C compiler to bomb out. There seems to be no reason other
	than force of habit to set this to 0.0D so I have dropped the D so
	that older compilers do not complain.

Change 5547 on 2009/01/25 by jimi@jimi.jimi.antlr3

	Fixed: ANTLR-282
	All references are now unadorned with any type of NULL check for the
	following reasons:

		1) A NULL reference means that there is a problem with the
		   grammar and we need the program to fail immediately so
		   that the programmer can work out where the problem occurred;
		2) Most of the time, the only sensible value that can be
		   returned is NULL or 0 which
		   obviates the NULL check in the first place;
		3) If we replace a NULL reference with some value such as 0,
		   then the program may blithely continue but just do something
		   logically wrong, which will be very difficult for the
		   grammar programmer to detect and correct.

Change 5545 on 2009/01/24 by jimi@jimi.jimi.antlr3

	Fixed: ANTLR-357
	The bug report was correct in that the types of references to things
	like $start were being incorrectly cast as they were not changed from
	Java style casts (and the casts are unnecessary). This is now fixed
	and references are referencing the correct, uncast, types.
	However, the bug report was wrong in that the reference in the book to
	$start.pos will only work for Java and really, it is incorrect in the
	book because it should not access the .pos member directly but should
	be using $start.getCharPositionInLine().
	Because there is no access qualification in C, one could use
	$start.charPosition, however really this should be
	$start->getCharPositionInLine($start);

Change 5541 on 2009/01/24 by jimi@jimi.jimi.antlr3

	Fixed: ANTLR-367
	The code generation for the free method of a recognizer was not
	distinguishing tree parsers from parsers when it came to calling delegate free
	functions.
	This is now corrected.

Change 5540 on 2009/01/24 by jimi@jimi.jimi.antlr3

	Fixed: ANTLR-355
	Ensure that we do not attempt to free any memory that we did not
	actually allocate because the parser rule was being executed in
	backtracking mode.

Change 5539 on 2009/01/24 by jimi@jimi.jimivista.antlr3

	Fixed: ANTLR-355
	When a C targeted parser is producing in backtracking mode, then the
	creation of new stream rewrite structures should not happen if the rule is
	currently backtracking

Change 5502 on 2008/12/11 by jimi@jimi.jimi.antlr3

	Fixed: ANTLR-349 Ensure that all marker labels in the lexer are 64-bit
	compatible

Change 5473 on 2008/12/01 by jimi@jimi.jimivista.antlr3

	Fixed: ANTLR-350 - C runtime use of memcpy
	Prior change to use memcpy instead of memmove in all cases missed the
	fact that the string factory can be in a situation where overlaps occur. We now
	have ANTLR3_MEMCPY and ANTLR3_MEMMOVE and use the two appropriately.

Change 5387 on 2008/11/05 by parrt@parrt.spork

	Fixed x+=. issue with tree grammars; added unit test

Change 5325 on 2008/10/23 by parrt@parrt.spork

	We were all ref'ing backtracking==0 hardcoded instead of checking the
	@synpredgate action.