>> parse [x 1 2 x 3 4 x] [collect some [collect keep ['x () | integer!]]]
== [[x] [1] [2] [x] [3] [4] [x]]
COIParser: make object! [
    ; handle whitespace
    ws: [any [space | tab | cr | lf]]
    ; the statements
    ThisStatement: ["this" keep ('this)]
    ThatStatement: ["that" keep ('that)]
    Statement: [ThisStatement | ThatStatement]
    ; grouping into rules and libraries
    Rule: [collect [
        "rule" keep ('rule)
        any [ws Statement] ws
        "endrule" keep ('endrule)]
    ]
    Library: [collect [
        "library" keep ('library)
        collect [any [ws Rule]] ws
        "endlibrary" ws keep ('endlibrary)]
    ]
]
>> parse lib COIParser/Library
== [library [[rule this that endrule] [rule that this endrule] []] endlibrary]
input: load {
LIBRARY
RULE
this
that
ENDRULE
RULE
that
this
ENDRULE
ENDLIBRARY
}
library: [keep 'LIBRARY (rules: make block! 10) any rule keep (rules) keep 'ENDLIBRARY]
rule: [collect into rules keep ['RULE some statement 'ENDRULE]]
statement: ['this | 'that]
probe parse input [collect library]
() here. This is what you need:
Rule: [
    "rule"
    collect [keep ('rule)
        any [ws Statement] ws
        "endrule" keep ('endrule)]
]
keep with two wrapping collects, I wonder why it even works that way.
parse [x 1 2 x 3 4 x] [collect some [keep copy whatever ['x | integer!]]]
== [[x] [1] [2] [x] [3] [4] [x]]
collect works too.
>> parse [x 1 2 x 3 4 x][collect some [keep ahead collect keep skip skip]]
== [[x] [1] [2] [x] [3] [4] [x]]
collect subtlety than a bug. Need to think about it deeply some time.
/dialects, or to create another room for discussions concerned with DSL creation in Red/Rebol?
collect and is *very* subtle indeed:
>> parse [x][some [(print '.) 'x]]
.
.
== true
>> parse [x][some [(print '.) 'x ()]]
.
== true
>> parse [x][some [(print '.) word!]]
.
== true
/bugs then?
end already
>> parse [x][collect some [collect keep 'x]]
== [[x] []]
>> parse [x][collect some [collect some keep 'x]]
== [[x]]
>> parse [x][collect some collect any keep 'x]
== [[x]]
>> parse [x][collect some collect 1 2 keep 'x]
== [[x]]
>> parse [x][collect some keep copy dang! 'x]
== [[x]]
>> parse [x][collect some collect keep copy dang! 'x]
== [[[x]] []]
>> parse [x][collect some collect [keep copy dang! 'x ()]]
== [[[x]]]
>> parse [x][collect some collect keep 1 10 'x]
== [[x]]
parse topics. Still all related, and likely to run in batches on themes that come up.
parse-trace already does what ?? does?
parse/trace (parse-trace wraps that) is more powerful, but not as convenient for ad-hoc rule checking.
>> parse "dog" [ ?? "d" ?? [ "i" | "o" ] ?? "g" ?? ]
"d": "dog"
["i" | "o"]: "og"
"g": "g"
end!: ""
== true
>> parse-trace "dog" [ "d" [ "i" | "o" ] "g" ]
-->
match: ["d" ["i" | "o"] "g"]
input: "dog"
==> matched
match: [["i" | "o"] "g"]
input: "og"
-->
match: ["i" | "o"]
input: "og"
==> not matched
match: ["o"]
input: "og"
==> matched
<--
match: ["g"]
input: "g"
==> matched
<--
return: true
== true
parse/trace wrapper and callback to start.
p-indent being global (leaking) is documented somewhere? It's shared for parse tracing and the callback, likely global because of a compiler limitation. Don't know if that still holds.
parse-??: func [
    {Wrapper for parse/trace using ?? hook}
    input [series!]
    rules [block!]
    /case
    /part
        limit [integer!]
    return: [logic! block!]
][
    clear p-indent
    either case [
        parse/case/trace input rules :on-parse-??
    ] [
        either part [
            parse/part/trace input rules limit :on-parse-??
        ] [
            parse/trace input rules :on-parse-??
        ]
    ]
]
on-parse-??: func [
    event [word!] {Trace events: push, pop, fetch, match, iterate, paren, end}
    match? [logic!] "Result of last matching operation"
    rule [block!] "Current rule at current position"
    input [series!] "Input series at next position to match"
    stack [block!] "Internal parse rules stack"
    return: [logic!] {TRUE: continue parsing, FALSE: stop and exit parsing}
][
    switch event [
        fetch [
            if '?? = rule/1 [
                print [mold rule/2 ":" mold input]
                remove rule ; produces R3-like output, but mods the rule
            ]
        ]
    ]
    true
]
parse-?? "dog" [ ?? "d" ?? [ "i" | "o" ] ?? "g" ?? ]
?? in the rule.
parse, I'm going to try to make a typography program. The idea is quite simple. Every document element you see below is a Red function which takes a string, parses it, then creates a document in a target output. I intend to have the function definitions for different target outputs in separate Red files, and load the appropriate typography functions file based on the target specified at the command line.
use %typography.red
document-style {default}
document-orientation {landscape}
page-orientation {portrait}
page-size {letter}
document {
title: My First document
author: Yves Cloutier
isbn: 970-123-456
publication-date: Nov-30-1978
publisher: Birch and Aspen
format: epub
}
chapter { My First Chapter }
section { First Section }
p {
hang-indent: 1cm
drop-cap: 2 2
Before there was light in the world, there was darkness.
Before there was light in the world, there was darkness[*].
}
footnote {
Some text for the footnote, identified above as [*].
}
list {
1. Item 1
2. Item 2
a) Sub item
b) Sub item
3. Item 3
}
sum: function [ a b ] [ a + b ] ; A user defined function
a: 5 b: 6 ; Some user defined values
; A paragraph function call, with red code to be evaluated and interpolated into the string.
p {
The sum of a and b is @sum [ @a @b ].
}
@, this means I need to evaluate Red code and append the result, as a string, to my function output. For example, if the target output is HTML, the output would be:
<p>The sum of a and b is 11</p>
p: func [blk /local output][output: "<p></p>" head insert skip copy output 3 form reduce blk]
>> p ["The sum of a and b is" sum a b]
== "<p>The sum of a and b is 11</p>"
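The p function above only works on a block that is reduced as a whole. For the @-prefixed syntax inside the paragraph string, parse itself could locate the markers. A minimal sketch, assuming only plain @word references (not calls such as @sum [ @a @b ]); interpolate is a hypothetical helper name, not part of Red:

```red
Red []

; Hypothetical helper: replace each @word in STR with the formed value of that
; word. Only plain word references are handled, not function calls.
interpolate: function [str [string!]][
    name: none
    word-char: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9" #"-"]
    out: copy str
    parse out [
        any [
            start: #"@" copy name some word-char finish: (
                ; CHANGE/PART returns the position just past the replacement
                finish: change/part start form get to-word name finish
            ) :finish
            | skip
        ]
    ]
    out
]

a: 5
b: 6
print interpolate "a is @a and b is @b."   ; a is 5 and b is 6.
```

Supporting @sum [ @a @b ] would additionally require loading and evaluating a whole expression after the @, which this sketch does not attempt.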
into
parse [...] rule: [some [word! | into rule]]
into will parse both string! and block!, so it's better to check for a block first: ahead block! into rule.
blk: [name1 123]
item: ["Mike"] ; ok
; item: [Mike] ;-- parse error: usage of: Mike
rule1: [word! insert only item integer!]
print parse blk rule1
print mold blk ;-- [name1 ["Mike"] 123] OK
;-- but how do I get: [name1 [Mike] 123] ?
>> block: [name 42]
== [name 42]
>> item: [|9214|]
== [|9214|]
>> also block parse block [word! insert only (item) integer!]
== [name [|9214|] 42]
blk: [name1 123]
item: ['Mike]
rule1: [word! insert only item integer!]
print parse blk rule1
print mold blk
mike: []
blk: [name1 123]
item: [Mike] ;-- parse error: usage of: Mike
rule1: [word! insert only item integer!]
print parse blk rule1
print mold blk ;-- [name1 ["Mike"] 123] OK
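The earlier advice to guard into with ahead block! can be seen in a small recursive rule. This sketch counts words at any nesting depth; word-count and nested are illustrative names only:

```red
Red []

; Count the words in a nested block. AHEAD BLOCK! makes sure INTO only
; descends into blocks: without it, INTO would also enter the string "str".
word-count: 0
nested: [
    any [
        word! (word-count: word-count + 1)
        | ahead block! into nested
        | skip
    ]
]
print parse [a [b "str" [c]] 1 d] nested   ; true
print word-count                            ; 4
```

Using any rather than some keeps the rule working on empty sub-blocks such as [].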
>> parse "2" [ ["2" end] | "25654446"]
== true
>> parse "23" [ ["2" end] | "25654446"]
== false
>> parse "25654446" [ ["2" end] | "25654446"]
== true
>> parse "518989888888" ["51" to end]
== true
; check if this is true, then do something
; or
>> parse "518989888888" ["51" (print 'do-something) ]
do-something
16 = length? card
parse "518989888888" ["51" to end end: (print length? head end)]
valid-entry?: func [val] [16 = length? val]
mastercard: [["51" | "52" | "53" | "54" | "55"] to end]
visa: ["2" to end]
valid-card: [mastercard | visa]
valid-card-type. There's an algorithm for valid card numbers that I'll do later.
dig: charset "0123456789"
parse input [16 dig]
visa?: ["4226" 12 dig]
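The card-number algorithm mentioned above is presumably the Luhn (mod 10) checksum, which the Wikipedia payment card number page linked further down describes. A minimal sketch in Red, assuming the input has already been checked to contain only digits (for example with a rule like [16 dig]); luhn-valid? is a hypothetical name:

```red
Red []

; Hypothetical helper: Luhn (mod 10) check on a string of decimal digits.
; Assumes NUMBER holds digits only.
luhn-valid?: function [number [string!]][
    total: 0
    double?: false                       ; the rightmost digit is not doubled
    foreach ch reverse copy number [
        d: (to integer! ch) - 48         ; #"0" has code point 48
        if double? [
            d: d * 2
            if d > 9 [d: d - 9]          ; same as adding the two digits
        ]
        total: total + d
        double?: not double?
    ]
    zero? remainder total 10
]

print luhn-valid? "4111111111111111"   ; true
print luhn-valid? "4111111111111112"   ; false
```

A full check would then be the prefix/length parse rule followed by this checksum.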
digits: charset ["0123456789"]
mastercard: [["51" | "52" | "53" | "54" | "55"] to end]
visa: ["2" to end]
valid-company: [mastercard | visa]
valid-length: [16 digits]
valid-entry: [valid-length | valid-company]
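In valid-entry above, | means either rule is enough: any 16-digit string passes even with an unknown prefix, and anything starting with 51 passes whatever its length. Requiring both can be sketched with ahead, which checks a rule without consuming input. The rule names are reused from above and the visa prefix is left as written there:

```red
Red []

digits: charset "0123456789"
mastercard: [["51" | "52" | "53" | "54" | "55"] to end]
visa: ["2" to end]                  ; prefix kept exactly as in the rules above
valid-company: [mastercard | visa]
valid-length: [16 digits end]

; Check the length with AHEAD (no input consumed), then match the prefix
; from the start of the input.
valid-entry: [ahead valid-length valid-company]

print parse "5112345678901234" valid-entry   ; true: 16 digits, prefix 51
print parse "51123" valid-entry              ; false: too short
print parse "6112345678901234" valid-entry   ; false: unknown prefix
```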
to end with the number of remaining digits, then you don't need a separate rule for valid length.
; Based on https://en.wikipedia.org/wiki/Payment_card_number
;
; The maximum length of a credit card number is 19 digits,
; with the maximum length of the account number field being 12 digits
; (initial six-digit issuer identifier minus the final digit check number).
digits: charset ["0123456789"]
amex: [["34" | "35" | "36" | "37"] 13 digits] ; Length of 15 is valid.
discover: [["6106" | "64" | "65"] 15 digits] ; Length of 16 to 19 is valid.
mastercard: [["51" | "52" | "53" | "54" | "55"] 14 digits] ; Also the range: 2221–2720, Length of 16 is valid.
visa: ["4" 15 digits] ; ??? Length of 13, 16, or 19 is valid
valid-entry: [amex | discover | mastercard | visa]
validate-entry: func [n][parse n valid-entry]
view [
    size 400x200 title "CC parse thing"
    f: field 120x20 on-change [
        f/color: either validate-entry trim/all copy f/text [green][white]
    ]
]
>> digits: charset "0123456789"
== make bitset! #{000000000000FFC0}
>> visa: ["4" [19 digits | 16 digits | 13 digits]]
== ["4" [19 digits | 16 digits | 13 digits]]
>> parse "41234567890123" visa
== true
>> parse "412345678901234" visa
== false
>> parse "4123456789012345" visa
== false
>> parse "41234567890123456" visa
== true
>> parse "412345678901234567" visa
== false
>> parse "4123456789012345678" visa
== false
>> parse "41234567890123456789" visa
== true
>> parse "412345678901234567890" visa
== false
parse rules as greedy. Put your longest matches first. Otherwise the shorter rule may match first, and miss part of what would have matched a longer rule.
discover: [["6106" | "64" | "65"] [15 digits | 14 digits]] ;Length of 16 to 19 is valid.
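The "longest match first" advice comes from alternatives being tried in order: the first one that matches wins, and parse does not go back to try a later alternative when the rest of the input then fails. A small sketch; the Visa counts are adjusted here (an assumption about the intent) so that the total lengths are 19, 16 and 13, the lengths listed in the comment on the visa rule:

```red
Red []

digits: charset "0123456789"

; The first matching alternative wins and is never revisited.
print parse "ab" [["a" | "ab"]]   ; false: "a" matched, "b" is left over
print parse "ab" [["ab" | "a"]]   ; true

; Counts are the digits after the leading "4", longest first.
visa: ["4" [18 digits | 15 digits | 12 digits]]
print parse "4123456789012345" visa   ; true  (16 digits in total)
print parse "41234567890123" visa     ; false (14 digits in total)
```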
reject worked, quite a pity I can't exit the outer loops somehow; I have to design a workaround with variable assigning/checking everywhere etc... it becomes messy
"6106" and ["64" | "65"]. The first will have a shorter range of digits following it than the second. A range of digits can be given by a minimum and maximum count before digits.
discover: [["6106" 12 15 digits | "64" | "65"] [14 16 digits]]
discover: ["6106" 12 15 digits | "64" 14 16 digits | "65" 14 16 digits]
>> discover: ["6106" 12 15 digits | ["64" | "65"] 14 17 digits]
>> parse "6412345678901234567" discover
== true
>> parse "64123456789012345678" discover
== false
parse features, so seeing them applied may lead to more useful... rules.
plan: [do smth while checking (conditions) and stop if some fail]
plan and parse plan [while [at: ... :at]] would go over it indefinitely, only quitting from while once some condition fails
plan and exec that code), but as a quick and dirty test it would've been OK - if only break/reject worked as they are documented, and better if there was a way to quit from the outer loop in situations like [while [at: any [... reject or smth] :at]]
reject in and call it a day :) Since multi-level breaks are probably too much pain to design, and they would require a multitude of concrete use cases, which I do not possess.
break, and you imagine a symmetrical reject which does the same thing *except* for failing the entire loop. But... this has the big danger of coming up with a design that tries to cover everything regardless of what's actually useful.
parse improvements for R3 came about), not imagined ones.
parse seems to be that a lot of things were put in because it was annoying not to have them in R2; however, some of them don't seem to make that much sense together. I want to see actual usage, but I'm not sure there's enough R3 code out there to gain that perspective.)
parse rule, and you *wish* something was different or something else was possible etc., then post it here. If your use case is esoteric, I don't think Red should worry about it. But if it's a reasonably common thing then we need to address it.
reject comment was "Hmmm, could I use throw for that?".
parse to make things easier, but we should focus on common use cases, and make sure we cover less common ones in some way, but not necessarily with ad-hoc solutions for obscure cases.
break and reject because they make sense... but are they so useful in practice that they should be there, as opposed to being able to do the same thing with just break and if (for example)? We need code to look at. :)
reject is useful it might be enough reason to have it ☻, since this instinct comes from lots of experience. A few more arguments in favor of reject:
break, supporting reject is absolutely trivial, it's not an effort
p: loop [... :p break now ...] - works with now, but just break (in the proposed model) will go through the alternatives, and they probably don't expect the :p thing to happen. Plus this whole rule will succeed, which might be undesirable
(backtrack: no) loop [... break (backtrack: yes) ...] if (not backtrack) ? ugh.. so ugly
reject comment was "Hmmm, could I use throw for that?".
throw can be used for that. throw gets out of the parse expression completely and you'll have a hard time restoring the position where it stopped. No?
while to *reparse* these loops over and over, executing the respective code and checking conditions. Based on these conditions, in both cases I might want to backtrack the whole loop or accept it. It makes both *deep* break and reject useful. What held me back is that I couldn't reason about my own code with these constructs behaving the way they do.
break. I don't know if we *need* reject, because that's very subtle, and it may be that code turns out to be better without it (you have to think more about your code, which people don't want to do, but it's a very good thing to do). Not having break in R2 has been quite painful in the past. But we also have other constructs that we didn't have back then, so it's not trivial for me to make a conclusion without trying to solve problems and thinking about it for a long time.
topaz-parse, to allow people to play with this more.
So if you stumble upon a problem, you can go ahead and make a change and submit a PR, then we can discuss both the PR and the problem it came from, and decide if it should go upstream to Red or not.
end skip all the time, you'd get fail, even though they are the same thing.
reject is non-trivial to emulate if you don't have it. So if it's something that can pop up from time to time, it's worth having it. If it never pops up because we can show that you can always achieve your goal in a different, cleaner way, then it's not worth having it.
fail, that is easy to achieve with just end skip, and is only used in obscure cases, then it's not worth having. It might very well be that once you have if, for example, things like fail don't make much sense anymore. Even none has perhaps been made obsolete by opt (and was just kept in R2 for the "can't take things away" reason). I.e., I believe the original intent was things like a | b | none, which is the same as opt [a | b].
fail and none can both be expressed easily. No big deal if we won't have them. On the other hand, @9214's vision of a break that does not terminate the loop right away gives a sudden meaning to fail, so maybe it's worth having instead of reinventing it every now and then...
fail none and reject and see where it leads, right?
reject, fail or whatever.
parse/trace for AST/CST building, to some extent (haven't done it personally, just speculating). I don't think the API is fine-grained enough, but you can already shoot your feet off.
[definition IsAdult
    [expression
        [simpleexpression
            [term
                [factor
                    [fieldreference
                        [record "Customer" endrecord]
                        [field "Age" endfield]
                    endfieldreference]
                endfactor]
            endterm]
        endsimpleexpression]
        [relation ">" endrelation]
        [simpleexpression
            [term
                [factor
                    [number 15 endnumber]
                endfactor]
            endterm]
        endsimpleexpression]
    endexpression]
enddefinition]
>> unset 'x unset 'y unset 'z
>> parse [a b c][(x: 1) 'g | (y: 2) 'h | (z: 3) 'f]
== false
>> print [x y z]
1 2 3
[none (x: 1) [fail] | none (y: 2) [fail] | ...]
paren before the failure is evaluated.
parse is not doing anything special - just one thing at a time, like any other procedural language.
print "Hello" if a > 10 [print "World"]
if, that happens "later", to affect the print happening "earlier".
[(print "Hello") integer! (print "World")], you need to think of it in terms of:
print "Hello" if integer? current-value [print "World"]
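The same sequential model can be checked directly: a paren is just a step that runs when parse reaches it, whether or not the step after it then fails. A small sketch; trail is an illustrative name:

```red
Red []

trail: copy []

; Each paren runs when PARSE reaches it, before the value after it is tried,
; so both parens run even though neither alternative matches.
print parse [a b c] [(append trail 'tried-g) 'g | (append trail 'tried-h) 'h]   ; false
probe trail   ; [tried-g tried-h]

; A paren placed after the value it depends on only runs once that value matched.
clear trail
print parse [a b c] ['g (append trail 'got-g) | 'h (append trail 'got-h) | to end]   ; true
probe trail   ; []
```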
parse would try to magically determine what the programmer intended to happen.
collect and keep. So *where* you put them in the rules becomes very important. (And this is part of why things like ahead were introduced.)
parse in Topaz was indeed to do stuff like create ASTs while parsing. It's not that hard with REBOL and Red's parse, but it's not as easy as it should be IMHO.
|. In the latter view, a paren can start a rule, initializing it.
some code... \
fence: ["```" newline]
parse x [copy f to fence]
parse x [some [thru "```" insert "red" thru "```"]]
parse x [some [thru "```" [ahead newline insert "red" | skip] thru "```"]]
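As pointed out later in this thread, the rule above modifies x in place while parse itself only returns a logic! value, so writing parse's result to a file stores true or false. A sketch of a wrapper that returns the modified string instead; tag-fences and fence are hypothetical names, and the fence string is built from single characters only to keep this listing readable:

```red
Red []

fence: append/dup copy "" #"`" 3    ; a string of three backticks

; Hypothetical wrapper around the rule above: PARSE modifies TEXT in place
; but returns only a logic! value, so return the modified string instead.
tag-fences: function [text [string!]][
    parse text [
        any [thru fence [ahead newline insert "red" | skip] thru fence]
    ]
    text
]

sample: rejoin ["a^/" fence "^/x^/" fence "^/b"]
probe tag-fences sample

; For files (not run here): read into a word, change it, then write it back.
; foreach file read %. [if %.md = suffix? file [write file tag-fences read file]]
```

Fences that already carry a language tag are skipped by the skip alternative.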
insert find text " " "```red^/" insert next find find/last text " " newline "```^/"
foreach f read %. [write f parse read f [some [thru "```" insert "red" thru "```"]]]
text: {Beginning
red>> "a" = "A"
== false
red>> "a" = "A"
== false
red>> "a" = "A"
== true
red>> "a" = "A"
== false
Some other text.
red>> "b" = "B"
== false
red>> "b" = "B"
== false
red>> "b" = "B"
== true
red>> "b" = "B"
== false
}
parse text [some [to " " insert "```red^/" thru [newline not [newline | " "] insert "```^/"]]]
print text
red>> "a" = "A"
== false
red>> "a" = "A"
== false
red>> "a" = "A"
== true
red>> "a" = "A"
== false
red>> "b" = "B"
== false
red>> "b" = "B"
== false
red>> "b" = "B"
== true
red>> "b" = "B"
== false
thru part may need modification, e.g. detecting end if needed.
parse to file. In the above case it will always be false. Read the file into a word, parse that word, then write it back to the file.
parse/trace, or a combination of both.
logic! value, I believe, otherwise parsing will either fail or get stuck in a loop. Also, there were a couple of tickets on GitHub that pinpoint differences between parse-trace and parse.
[[a] b] instead of [a b]).
trace for a moment and follow @moliad's approach, as it's simple and flexible enough to tailor to your exact domain.
>> chars: object [digit: charset [#"0" - #"9"]]
== make object! [
digit: make bitset! #{000000000000FFC0}
]
>> parse "123" [some chars/digit]
== true
[[#"a"][#"b"]] or [["a"]["b"]]
>> parse "xx(a)xx(b)xx" [some [collect ["(" keep skip ")"] | skip]]
== [[] [#"a"] [] [] [#"b"] [] []]
keep is triggered twice, as it should be, but the result makes no sense.
>> z: [] parse "xx(a)xx(b)xx" [some [collect [#"(" keep skip (print "keep") ")"] | skip]]
keep
keep
== [[] [#"a"] [] [] [#"b"] [] []]
keep in parse behaves like keep/only, but that's already been reported.
( to keep one value and it does it twice, not seven times.
keep behavior, and it looks like [there are some ways to control it](https://github.com/red/red/wiki/[DOC]-Guru-Meditations#parse-collectkeep-options-combined-with-tothru-rules) now with pick and copy, but it didn't explain what I was seeing here.
collect
>> parse "xx(a)xx(b)xx" [collect some ["(" copy match skip keep (reduce [match]) ")" | skip]]
== [["a"] ["b"]]
>> parse "xx(a)xx(b)xx" [collect some ["(" set match skip keep (reduce [match]) ")" | skip]]
== [[#"a"] [#"b"]]
>> also x: "" parse "xx(a)xx(b)xx" [collect into x some ["(" keep skip ")" | skip]]
== "ab"
collect will return a block in any case; if nothing was kept, the block will be empty.
collect just parse
collect?
x.
parse-trace crashes on b:
*** Script Error: PARSE - KEEP is used without a wrapping COLLECT
*** Where: parse
*** Stack: parse-trace
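A quick check of the claim that collect always returns a block, using a single collect around the whole loop, which also avoids the empty blocks produced when collect sits inside some. The result matches the collect into x variant above, as a block of chars rather than a string:

```red
Red []

; One COLLECT wrapped around the whole loop: each KEEP adds its value to that
; single block, and the block is returned even when nothing was kept.
probe parse "xxxx" [collect any ["(" keep skip ")" | skip]]           ; []
probe parse "xx(a)xx(b)xx" [collect any ["(" keep skip ")" | skip]]   ; [#"a" #"b"]
```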
>> parse "xx(a)xx(b)xx" [some [collect ["(" (prin #"?") keep skip (prin #"!") ")"] | skip (prin #".")] (print "")]
..?!..?!..
== [[] [#"a"] [] [] [#"b"] [] []]
>> parse "xx(" [some [collect ["(" keep skip ")"] | skip]]
== [[] []]
>> parse "xx(a" [some [collect ["(" keep skip ")"] | skip]]
== [[] [#"a"] []]
>> parse [x][some collect skip]
== []
>> parse [x x][some collect skip]
== [[]]
>> parse [x x x][some collect skip]
== [[] []]
keeps supposed to backtrack if a later part of the rule is not matched?
>> parse [x y z][collect [some [[keep 'x 'z] | keep skip]]]
== [x x y z]
killer-robot-rule: ['kill 'everybody (kill/everybody now) /s (message "just joking")] is a bad idea.
parse "xx(a)xx(b)xx" [collect some [#"(" collect [keep skip] #")" | skip]]
;== [[#"a"] [#"b"]]
parse "xx(a)xx(b)xx" [collect some [#"(" collect [keep copy x skip] #")" | skip]]
;== [["a"] ["b"]]
collect into some, then on each step it returns a block or a kept entity.
collect doesn't match, it still starts a new 'collection' for each iteration of the some rule. Though as @9214 pointed out, the missing first empty block is still strange...
collect creates the initial block, so the first "x" is responsible for the outer block in your example. But consider this:
parse "xx(a)xx(b)xx" [collect some [collect ["(" keep skip ")"] | skip]]
;== [[] [] [#"a"] [] [] [#"b"] [] []]
collect is removed, [#a] will become the collection block:
parse "xx(a)xx(b)xx" [some [#"(" collect [keep skip] #")" | skip]]
;== [#"a" [#"b"]]parse x: "xx(a)xx(b)xx" [some [remove #"(" skip remove #")" | remove skip]] x
;== "ab"parse "xx(a" [some [collect ["(" keep skip [end | ")"]] | skip]]
== [[] [#"a"]]
collect within a single parse run should be forbidden without an explicit outer collect block.
>> parse "abcd" [collect [keep #"a" keep #"b"] collect [keep #"c" keep #"d"]]
== [#"a" #"b" [#"c" #"d"]]
== [#"a" #"b" #"c" #"d"]
>> parse "abcd" [collect [collect [keep #"a" keep #"b"] collect [keep #"c" keep #"d"]]]
== [[#"a" #"b"] [#"c" #"d"]]
parse "abcd" [collect [keep #"a" keep #"b" collect [keep #"c" keep #"d"]]]
collect in parse works.
collect with following rule is seen
parse "a" [collect collect keep "a"] is OK, because collect keep "a" is a normal rule, and we can rewrite the example as:
rule: [collect keep "a"] parse "a" [collect rule]
p: []
prefix: [collect into p [any keep ["+" | "-"]]]
digits: [collect [any keep ["1" | "2" | "3"]]]
letters: [collect [any keep ["a" | "b" | "c"]]]
>> parse "123a" [collect [digits letters]]
== [[#"1" #"2" #"3"] [#"a"]]
>> parse "123a" [digits letters]
== [#"1" #"2" #"3" [#"a"]]
>> parse "123a" [prefix digits letters]
== true
>> parse "123a" [[prefix |] digits letters]
== true
>> parse "123a" [[digits | prefix digits] letters]
== [#"1" #"2" #"3" [#"a"]]
parse s [rule] returns b1 (block! = type? b1)
parse s [collect [rule]] to return b2 = [b1]
>> parse "123" [digits letters]
== [#"1" #"2" #"3" []]
>> parse "123" [collect [digits letters]]
== [[#"1" #"2" #"3"] []]
>> parse "abc" ["abc"]
== true
>> parse "abc" [["abc"]]
== true
>> parse "abc" [[["abc"]]]
== true
true [true] [[true]] here?
block! = type? b1 :)
[prefix digits letters] case what did you expect?
[#"1" #"2" #"3" [#"a"]], although I would rather like it to be [[#"1" #"2" #"3"] [#"a"]] or [#"1" #"2" #"3" #"a"]. Definitely not just true, as all my collects were lost.
>> parse "123a" [prefix digits letters]
== true
>> head p
== [[#"1" #"2" #"3"] [#"a"]]
into destination?
prefix initiated the collection block
collect. If you can figure out a better trade-off (ease of use vs ease of reasoning vs implementation complexity) for collect semantics, by all means write a REP so we can discuss it.
/collect refinement for parse be a better option to trigger the collecting mode, instead of waiting to encounter the first collect keyword and causing some harder-to-predict results? As collecting mode changes the type of output for parse, it would make sense to make it more explicit in the form of a refinement. Though it then loses the ability to dynamically switch between the validation/collecting modes from the rules themselves, I guess that's a very rare use case (I doubt anyone ever relied on that feature).
parse/collect looks good to me.
parse/collect acts like it already found a collect inside the rule block, just like parse blk [collect ...]
collect rules