the-path: "#theserver:theDB/thetable/thefield"
the-path: "theDB/thetable/thefield"
caratteri-path: charset [#"A" - #"Z" #"a" - #"z" #"0" - #"9"]
parse the-path [
    copy first-char 0 1 ["!" | "#"]
    copy first-word some caratteri-path (print ["First Word" first-word])
    any [1 ["/" | ":"]]
    copy bword some [some caratteri-path (print ["bword" bword]) | 1 "/"]
]
First Word theserver
*** Script Error: bword has no value
*** Where: print
*** Stack: decode-path-parse
>> split "#theserver:theDB/thetable/thefield" charset "#:/"
== ["" "theserver" "theDB" "thetable" "thefield"]
>> split "theDB/thetable/thefield" charset "#:/"
== ["theDB" "thetable" "thefield"]
chars: charset [#"A" - #"Z" #"a" - #"z" #"0" - #"9"]
element: [keep some chars]
prefix: [keep ["#" | "!"] element ":"]
delimiter: ["/" | end]
path: [some [element delimiter]]
foreach test [
"#this:that/end"
"!this:that/end"
"this/that/end"
][
probe new-line/all parse test [collect [opt prefix path]] off
]
copy bword some [some caratteri-path (print ["bword" bword]) | 1 "/"]
"/" at the very end. My rule fails for the same reason, so path is never actually parsed. But each element of path is matched successfully, so copy and print work as expected for the rest of the input.
some [some caratteri-path (print ["bword" bword]) | 1 "/" | end]
some caratteri-path (print ["bword" bword])
1 "/"
"end"
some [c
aword: copy []
parse the-path [
    copy first-char 0 1 ["!" | "#"] (append aword first-char)
    copy first-word some caratteri-path (print ["First Word" first-word] (append aword first-word))
    any [1 ["/" | ":"]]
    some [
        copy bword some caratteri-path (print ["bword" bword] append aword bword)
        | 1 "/" (append aword "/")
        | end
    ]
    (probe aword)
]
First Word theserver
bword theDB
bword thetable
bword thefield
["#" "theserver" "theDB" "/" "thetable" "/" "thefield"]
parse the-path [
    0 1 [copy first-char "!" (append aword first-char) | copy first-char "#" (append aword first-char)]
    copy first-word some caratteri-path (print ["First Word" first-word] (append aword first-word))
    any [copy bword [1 ["/" | ":"]] (append aword bword)]
    some [
        copy bword some caratteri-path (print ["bword" bword] append aword bword)
        | 1 "/" (append aword "/")
        | end
    ]
    (probe aword)
]
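For comparison, here is a minimal sketch (not from the thread) that avoids the `bword has no value` error by wrapping `copy` around each individual word inside the loop, so the word already has a value when the paren that uses it runs. The names `path-chars` and `split-path` are hypothetical, and the separators are assumed to be only `:` and `/`.

```red
Red [Title: "Sketch: splitting #server:db/table/field with parse"]

path-chars: charset [#"A" - #"Z" #"a" - #"z" #"0" - #"9"]

split-path: function [
    "Return the pieces of a path like #server:db/table/field, or NONE if it doesn't match"
    str [string!]
    /local mark piece
][
    parts: copy []
    ok: parse str [
        ; optional leading marker, kept as its own piece
        opt [copy mark ["#" | "!"] (append parts mark)]
        some [
            ; COPY finishes before the paren runs, so PIECE is already set here
            copy piece some path-chars (append parts piece)
            opt [":" | "/"]
        ]
    ]
    either ok [parts] [none]
]

probe split-path "#theserver:theDB/thetable/thefield"
probe split-path "theDB/thetable/thefield"
```

Keeping the separators out of the result is a design choice here; if you want them, wrap the `opt [":" | "/"]` in a `copy` as the last version above does.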
Extraction section.
Does parse support binary! input, so I can iterate through it byte by byte? I am currently struggling to grok it by parsing a file read as binary, and it doesn't seem to work as I would expect. From the looks of it, you can match the input only as a whole, not iterate through it. Or am I mistaken here?
>> parse #{00100101} [some [s: (probe s/1) skip]]
0
16
1
1
== true
>> rejoin parse #{497420776F726B7321} [collect some [s: keep (to-char s/1) skip]]
== "It works!"
Parse has supported binary! input for a while, so perhaps you're doing something wrong. What binary format are you trying to parse? You don't need Parse to iterate through a series, though.
>> get also 'match parse #{AA AA DEADBEEF FF FF} [2 #{AA} thru #{DEAD} copy match 2 skip to end]
== #{BEEF}
parse returns false, even if the match is found before the end of the input, due to rule failure afterwards. I was expecting true on a match and therefore assumed something was not working as expected. Using parse-trace helped me get a better understanding of how parse works.
parse rules? I have code like this:
print parse some-bin-file [
collect set x [n: keep (to integer! to binary! reverse reduce [n/1 n/2]) 2 skip]
x collect [o: keep (to integer! to binary! reverse reduce [o/1 o/2 o/3 o/4]) 4 skip]
]
x number of times. However, this code throws an error:
*** Script Error: PARSE - invalid rule or usage of rule:
>> parse mold quote ((( o ))) [copy match some "(" (many: length? match) skip many ")"]
== true
>> parse [3 a b c][set count integer! count word!]
== true
>> parse [3 word! a b c 2 refinement! /d /e][some [set count integer! set type! word! (type!: get type!) count type!]]
== true
set takes only the first value from the matched portion of the input, while copy takes the whole portion from start to end. So, if your length is specified by 2 bytes, you need to use copy, because set takes only the first byte (as an integer! value).
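Applying that to the original question, a minimal sketch: read the 2-byte count with `copy`, convert it, then use the resulting integer as a repetition count. The layout (a little-endian 2-byte count followed by that many little-endian 4-byte integers) is an assumption inferred from the `reverse` calls in the question, and `le-int` / `read-records` are hypothetical helper names.

```red
Red [Title: "Sketch: length-prefixed binary records"]

le-int: func [
    "Convert a little-endian binary! chunk to an integer!"
    bin [binary!]
][
    to integer! reverse copy bin    ; COPY so the parsed input is not reversed in place
]

read-records: function [
    "Parse a 2-byte LE count, then that many 4-byte LE integers; NONE on failure"
    data [binary!]
    /local chunk
][
    n: 0
    values: copy []
    ok: parse data [
        copy chunk 2 skip (n: le-int chunk)    ; COPY keeps both bytes, SET would keep only the first
        n [copy chunk 4 skip (append values le-int chunk)]
    ]
    either ok [values] [none]
]

probe read-records #{0200 01000000 02000000}
```

Using `append` in a paren instead of `collect set x` keeps the count as a plain integer word, which is what the repetition rule expects.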
>> binary: append/dup #{ABCD} 0FFh 0ABCDh ; 43981 number of FFh
== #{
ABCDFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFF...
>> parse binary [copy count 2 skip (count: to integer! count) count #{FF}]
== true
parse read input through the end?
>> parse "abc" [some [end (probe "end") break | s: (probe s) skip]]
"abc"
"bc"
"c"
"end"
== true
>> parse "abc" [some [end (probe "end") break | s: (probe s) skip]]
"abc"
"bc"
"c"
== true
>> parse "abc" [some [s: (probe s) skip [end (probe "end") | none]]]
"abc"
"bc"
"c"
"end"
== true
[end ... | none] or [end ... |] is needed, otherwise it will fail.
parse "abc" [some [s: (probe s) skip] end (probe "end")]
then keyword? R3 examples are fine, just really hard to search on Google.
then in Red is broken IIRC, since no one could really understand its semantics.
either, but most probably I am missing something...
then is different from normal parsing:
parse "ab" [#"a" then #"b" | #"c"]
parse "ab" [#"a" #"b" | #"c"]
then to work.
then, as it is described in the Rebol docs, might be a by-product of the recursive-descent Parse implementation that R3 uses.
topaz-parse is recursive-descent as well? (Not that I see how implementing it as a state machine makes any difference - it's just a matter of whether you use the stack implicitly or explicitly.)
then removal, but jumping over the alternate rule that it makes (or is supposed to make) is a nice feature.
then. :P
rule1 then rule2 | rule3, but I may of course be wrong, the docs are not very clear.
then. I have to look if I ever used it.
[rule1 (cont: rule2) | (cont: rule3)] cont
[rule1 rule2 | rule3]
If rule1 matched, match rule2. If rule1 failed, backtrack and match rule3.
then tries to mimic the A ← B[C,D] rule, which is different from A ← BC/D only in some obscure detail related to recursive calls.
[rule1 then rule2 | rule3 | rule4]
- rule1 fails: backtracks to rule3
- rule1 succeeds, but rule2 fails: backtracks to rule4 instead of rule3
- rule1 and rule2 succeed: then of course it does not backtrack
ahead
parse old-timers stumble to figure it out.)
topaz-parse in preparation for much bigger changes to come: https://github.com/giesse/red-topaz-parse
then actually works on W10:
parse [x y d 1][['x 'y then word! "comment here" | (probe "2") 'k 'l 'm | (probe "3") 'o 'p] then integer! "and final remarks"]
;== true
parse [k l m 1][['x 'y then word! 'comment 'here | (probe "2") 'k 'l 'm | (probe "3") 'o 'p] then integer! "and final remarks"]
;"2"
;== true
parse [o p 1][['x 'y then word! comment here | (probe "2") 'k 'l 'm | (probe "3") 'o 'p] then integer! "and final remarks"]
;"2"
;"3"
;== true
parse [x y ta-daa! 42][['x 'y then 'x (do something) | (probe "2") 'k 'l 'm | (probe "3") 'o 'p] then integer! "and final remarks"]
;"2"
;"3"
;== false
parse [x y ta-daa! 42][['x 'y then word! and whatever else | (probe "2") 'k 'l 'm | (probe "3") 'o 'p] then integer! "and final remarks"]
;== true
rule1 then rule2 rule3 | rule4: if either rule1 or rule2 fails, rule4 is tried; rule3 is always ignored.
then rule2 you may write anything that is loadable. Seems usable for in-rule comments only?
then: regardless of the failure or success of what follows, skip the next alternate rule.
word! in your examples is not an alternate rule.
rule1 then rule2 | rule3 | rule4: in case rule1 fails, rule3 is tried; in case rule1 succeeds but rule2 fails, rule3 is skipped and rule4 is tried.
r1 then r2 r3 | r4 | r5: if both r1 and r2 succeed but r3 fails, r4 is skipped?
r3 is never checked.
>> parse "ac" ["a" "q" | "ac"]
== true
>> parse "ac" ["a" then "q" | "ac"]
== false
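The `cont` idiom suggested in the thread (`[rule1 (cont: rule2) | (cont: rule3)] cont`) can be wrapped up as below to reproduce the `then` result on `"ac"` without relying on `then`. This is a minimal sketch and `committed-choice` is a hypothetical name: once `rule1` matches, the rule is committed to `rule2`, and `rule3` is never tried.

```red
Red [Title: "Sketch: emulating THEN with a continuation word"]

committed-choice: function [
    "Match RULE1 then RULE2; try RULE3 only if RULE1 itself fails"
    str [string!] rule1 [block!] rule2 [block!] rule3 [block!]
][
    cont: none
    ; the inner alternative only decides which rule CONT refers to;
    ; once it has succeeded, Parse does not come back to it
    parse str [[rule1 (cont: rule2) | (cont: rule3)] cont]
]

probe parse "ac" ["a" "q" | "ac"]                 ; plain alternation backtracks to "ac"
probe committed-choice "ac" ["a"] ["q"] ["ac"]   ; committed after "a", so "ac" is never tried
probe committed-choice "ab" ["a"] ["b"] ["c"]
probe committed-choice "c" ["a"] ["b"] ["c"]
```

The difference from `then` is that the committed branch is explicit in the rule, which is why some in the thread find it easier to follow.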
[rule1 (cont: rule2) | (cont: rule3)] cont) is way cleaner and easier to follow than then.
[either rule1 [rule2] [rule3]], or perhaps either rule1 [rule2 | rule3], though I find that just needlessly confusing.
parse, like ahead or not, which didn't exist in R2.
ahead has a bit of a [performance] problem: double matching.
then as well.
foo: func [x] [thru x keep to x x], x either needs to be bound to a context (either an object! or a function's call frame, which you need to explicitly keep intact) or substituted with a literal argument. Or you need to mimic the call stack and function call yourself, doing all the associated bookkeeping.
parse/trace and some black magick trickery.
topaz-parse I'd have to do it myself.
parse dialect anyway, the compiler could directly substitute x for its value. So long as it's not a paren or something like that.
then, I just want it to make sense.
then as a keyword can be useful. I'm not sure for what right now, but it certainly can.
then is a redundant operator in PARSE since it's the default operator.
function!) for Parse rules is really worth it.
foo: func [x] [thru x keep to x x] would be equally confused by view [...] or draw [...] IMO, or any other dialect.
view or draw you're using commands inside the dialect. If foo: func [x] [thru x keep to x x] can be typed outside parse then it can be confusing.
parse/with blabla bleble [foo], then it's not so confusing.
parse/with [source] [rules] [function1: func [...] [...] function2: func [...][...]] wouldn't be a problem.
x in set x). I have built it especially for parse.
function!), because they can be just macros. @giesse does Topaz have anything else interesting in this regard?
function! type just like we are just using block! for rules, as @rebolek says. There is no technical problem with it, it's just a matter of whether you care about possible confusion.
Admittedly, most of what we do is confusing to newcomers anyway.
parse, as well as not letting you accidentally use a regular function inside parse (both would lead to error messages that may not be very descriptive of the real problem, not to mention that both may accidentally not error out but work in some way unexpected to the programmer, who then has to figure out where the bug is). This can also be solved with a flag in the function spec, and, as a way to do it right now, with a refinement like @nedzadarek suggests.
function! because parse is an interpreter, like in Red. But, topaz-parse being a compiler to parse right now, they can be just "macros" indeed.
/with, we could also do:
parse [...] [
foo: rule [x] [keep to x x]
... foo 'c ...
]
paren! expression.
then example above demonstrates)
parse and most of the confusion goes away. But, of course, if you have multiple rule blocks, where do those definitions appear? Are they global or local? Do we want /with as @rebolek suggests? etc.
do interpreter and thus cause an error.
parse-function! can be called from a rule but not from a paren! expression, and the reverse holds for function! (any-function! in fact).
function! right now I would probably use a refinement like @nedzadarek suggested, so that it's unlikely that it would be called accidentally from the wrong "context".
topaz-parse quite a lot, so perhaps we can play with the idea and see if it works in practice or not?
function! calls. If it isn't there then function! simply bails out. This call frame provides enough "contextual" info, and there's no need for crude hacks.
amb come to mind) or switching interpreter altogether.
func-rule! type is ..."
topaz-parse so we can have a concrete basis for discussion.
then)
some-word: [...]
parse [some-word] [block! | set x word! if (block? get/any x)]
block! value, but a word referring to a block! value should be accepted too.
block! above, for example, it's not a datatype! but a word referring to a datatype!. So if I write ['into opt datatype! block!] it doesn't work in practice; it needs to be ['into opt [datatype! | set x word! if (datatype? get x)] block!], and so on. In fact, almost everywhere I'm matching a value directly that's not a keyword, I want to do a word lookup.
parse?
[get block!] which will match a word! value so long as it refers to a block! value. In general, [get rule] would match a word! as long as the value it refers to matches rule; but the generalization gets tricky because rule could match more than one value, and so on. So perhaps it should only allow datatypes and typesets.
[into word! rule], but I don't know if that is confusing.
integer! > 5 or something like that would match all integers greater than 5.
word!/block! to indicate a word that refers to a block. This would allow something like [into word!/block! rule] to mean [into word! [into block! rule]], but that sounds like I'm going too far. :)
[set x integer! if (x > 5)] is a good enough solution for that.
if (x > 5) ever, but I have to look up words basically every time I encounter one when compiling a topaz-parse rule.
%ast-tools.red also have to do the same thing
parse in R2 was the do keyword that solved a generalization of this problem.
parse-function! idea.
do does full evaluation, and it cannot backtrack. I love it because you don't need compose on dialects anymore, but it seems that most people don't feel that way.
if is a good general mechanism for it.
parse ['collect | 'copy | 'object] [any [not '| skip | (x: none)]]
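Back on the word-lookup point: until something like `get block!` exists, the `set x word! if (...)` idiom from the discussion can at least live in a named rule. A minimal sketch of the `['into opt ... block!]` case, assuming the parsed block is loaded at the top level so its words are bound to globally set values; `datatype-ref`, `into-spec` and `foo` are hypothetical names.

```red
Red [Title: "Sketch: matching a word that refers to a datatype"]

x: none
; a literal datatype! value, or a word whose value is a datatype!
datatype-ref: [datatype! | set x word! if (datatype? get/any x)]
into-spec: ['into opt datatype-ref block!]

foo: 42

probe parse [into block! [a b]] into-spec   ; BLOCK! here is a word, resolved via GET/ANY
probe parse [into [a b]] into-spec          ; the type is optional
probe parse [into foo [a b]] into-spec      ; FOO refers to an integer!, not a datatype!
```

`get/any` is used so that an unset word simply fails the `if` check instead of raising an error.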
any subtlety.
while. :)
| (x: none) fail
block? word!, more generally [ | ], being a rule that looks at the next value and, if it is a word, queries the type of the value it refers to (datatype! or typeset!)?
get-datatype! syntax (:block!), but if we agree on a ref! type (@ref syntax), that could be pressed into service in parse when it's added.
then go, in REP or in the main repo?
:block!, because in practice you'll never find an actual datatype in a rule, but a word like block! instead; so using a get-word to indicate a level of indirection might work.
get block! vs :block! vs not sure what else (sorry @toomasv, I don't think I like block? word!, especially because it would force us to keyword-match all the possibilities like block? and string? and so on and so forth.)
topaz-parse though, at least not yet :) But, again, I don't know what this feature should look like. I'm using get right now, just hoping to find something better.
--== Red 0.6.4 ==--
Type HELP for starting information.
>> do %topaz-parse.red
>> foo: rule [a b] [a ("A matched") | b ("B matched") | ("Neither matched")]
>> topaz-parse [1 2 3] [foo word! integer!]
== "B matched"
>> topaz-parse [a b c] [foo word! integer!]
== "A matched"
>> topaz-parse [a b c] [foo block! paren!]
== "Neither matched"
>> foo: rule [n [integer!]] [n integer!]
>> topaz-parse [1 2 3] [foo "String"]
*** Script Error: foo does not allow string! for its n argument
*** Where: do
*** Stack: topaz-parse cause-error
>> topaz-parse [1 2 3] [foo 3]
== 3
>> foo: rule [] [n: integer! (n * 3)]
>> topaz-parse [4] [foo]
== 12
>> n
*** Script Error: n has no value
*** Where: catch
*** Stack:
- foo (normal rule/block or 0-argument rule), a word! and an integer!
- foo rule that takes 1 argument (a word!) and an integer!, or
- foo rule that takes 2 arguments - a word! and an integer!
topaz-parse itself: end should match the whole thing as per the readme:
topaz-parse [a b c] ['a word! end] ; none
foo: ['a] topaz-parse [a b c] [foo word! end] ; a
rule that takes a block!/paren! and returns e.g. a first and a second element (something like r: rule [bl] [bl/1 bl/2] - topaz-parse [a a] [r [word! word!]] => equivalent of the 2-argument rule r2: rule [a b] [a b] - topaz-parse [a a] [r2 word! word!])
>> topaz-parse [a b] [word! word!]
== b
>> foo: [word!] topaz-parse [a b] [foo word!]
== a
parse does not handle paths, so the compiler has to work around it
bl/1 and bl/2 are, so it can only guess. This will make it fail if you try to use this for anything more complicated than just your example above. It may be possible to get most cases to work with enough static analysis, but that's a lot of work; maybe one day. :)
bl/1 example is to have the compiler generate code for all possible cases and then select at runtime, or just spit out the code for an interpreter, so when something can't be determined at compile time it just gets interpreted at runtime.
disabled="false". Here are some relevant parts:
close-element: function [name][
rejoin ["</" name ">"]
]
disabled-element: function [name][
compose/deep [
"<" (:name) to [{enabled=} | ">" | "/>" ]
{enabled="false"} to [">" | "/>"]
["/>" | ">" thru (close-element :name)]
]
]
disabled-filter: disabled-element "Filter"
disabled-stream: disabled-element "Stream"
disabled-component: disabled-element "Component"
parse action [some [
remove [
manual
| disabled-filter
| disabled-stream
| disabled-component
| param-option
]
| skip]
]
to/thru behavior right were the really tricky parts for me.
[ahead block! into return-parse-rule] => so I would like to type this:
into-block: rule [...] [...]
[into-block return-parse-rule]
r: rule [some-rule] [ahead block! into some-rule]
rule1: ['b (print "found b")]
topaz-parse [a [b]] ['a r rule1]
; found b
; == b
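In plain Red `parse` (not topaz-parse) there is no `rule` function, but a similar shorthand can be had by building the rule block before parsing. A minimal sketch: `into-block` is a hypothetical helper, and splicing it in with `compose/only` is an assumption about how you'd want to use it.

```red
Red [Title: "Sketch: an AHEAD BLOCK! INTO shorthand for plain parse"]

into-block: func [
    "Return a rule that descends into the next value only if it is a block!"
    rule [block!]
][
    compose/only [ahead block! into (rule)]
]

rule1: ['b (print "found b")]
probe parse [a [b]] compose/only ['a (into-block rule1)]   ; descends into [b]
probe parse [a "b"] compose/only ['a (into-block rule1)]   ; a string! is not entered
```

The cost of this approach is that the rule is built once up front rather than expanded by Parse itself, much like the `#macro` approach shown later in the thread.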
set refinement refinement! (append refinements to-word refinement) => the name refinement is not really important. What is important is that I append the matched type to some collection. I would like to do something like this:
create-name: does [to-word append copy "temp" random 9999]
r: rule [type collection] [set (c: create-name) type (append collection c)]
arr: copy []
topaz-parse [42] [r integer! arr]
arr should have 42 in it.
rule?
rule function. And, no, I don't buy the argument about it being confusing - either way it's just a function that takes some arguments and returns a block that looks like a Parse rule; what's confusing about that?
foo: rule [][... |] down to Parse, e.g.:
[a b foo c]
[a b [... |] c]
[a b ... | c]
foo: [... |] [a b foo c]
; vs.
foo: rule [blah] [... |] [a b foo c]
topaz-parse does not have ahead or set. You can use into block! [...] to specify what type into should accept.
into, as I briefly mentioned yesterday, you can use into block! some-rule.
>> topaz-parse [42] [collect [keep integer!]]
== [42]
>> input: [word /refinement some-other-word /some-other-refinement "string"]
== [word /refinement some-other-word /some-other-refinement "string"]
>> topaz-parse input [collect any [keep refinement! | skip]]
== [/refinement /some-other-refinement]
ahead & into, in Red, are used together, so why not create a rule to make it shorter (as you have already done with into block! some-rule in topaz; I would make it even shorter... well, by just a few characters). There might be other keywords used together, so why not join them into one rule.
to-word).
create-name: does [to-word append copy "temp" random 9999]
r: rule [type collection] [set (c: create-name) type (append collection to-float c)]
arr: copy []
topaz-parse [42] [r integer! arr]
[42.0]
>> input: [word /refinement some-other-word /some-other-refinement "string"]
== [word /refinement some-other-word /some-other-refinement "string"]
>> topaz-parse input [collect any [ref: refinement! keep (to word! ref) | skip]]
== [refinement some-other-refinement]
collect + keep (fun X) is a nice combination. It's shorter, but I still need to make a name for the word. I want to avoid creating unnecessary words while keeping it "readable enough". Parse's functions might deal with naming such words, leaving the "core" to the user. It's something like [tacit programming](https://en.wikipedia.org/wiki/Tacit_programming).
binary!? In general (in harder cases), would you (or anyone) prefer to use parse's (future) functions or some external utility? For example:
do https://raw.githubusercontent.com/nedzadarek/cold.red/master/main.red
foo: func [bl init] [
cold/fun/into bl func [key value] [
init/(key): init/(key) + value
] init
]
foo [
parse [1] [set i integer! (keep 1 i)]
] #{00}
== #{01}
collect binary! [...]. Haven't thought much about this yet.
function! values should be used in topaz-parse:
topaz-parse input [collect any [keep to-word refinement!]]
[on-word word! ...] where on-word is your interpreter's function for dealing with words. Combined with collect and object it would make the parse dialect almost functional.
[x: some-rule (do-something-with x)] you just write [do-something-with some-rule]. It could be hard to read, though.
[collect any [keep to-word refinement!]] and a version with your topaz's object might be a good addition to core parse.
*) but it's just a naming convention.
data: {
<s>
<d>15</d>
<o>
<f1>
</o>
</s>
<s>
<d>25</d>
<o>
<f2>
</o>
<o>
<f3>
</o>
</s>
<s>
<d>37</d>
<o>
<f4>
</o>
</s>
}
parse data [
collect set res [
some [
thru {<s>}
thru {<d>}
keep to {</d>}
some [
thru {<o>}
thru {<f}
keep to {>}
to {</o>}
]
to {</s>}
]
]
]
probe res
the parser is still in the some block, so the next thing it looks for is thru {}, not to, and that will jump you further down in the document than you want to go. to / thru are pretty tricky to get right. Ideally, you should try to build up rules that define the whole document, but I don't know how much more complex it is than your example
collect but here is what you ask:
ws: charset "^/^- "
aws: [any ws]
digit: charset "0987654321"
probe parse data [
    collect [
        some [
            aws "<s>" aws "<d>"
            collect [
                keep any digit "</d>" aws
                some ["<o>" aws "<f" keep some digit ">" aws "</o>" aws]
                "</s>"
            ]
        ]
    ]
]
[[
"15" #"1"
] [
"25" #"2" #"3"
] [
"37" #"4"
]]
collect if you remove the first one.
data: {
<ul class="genres">
<li title="Family">Family</li>
<li title="Drama">Drama</li>
<li title="Adventure">Adventure</li>
</ul>
<ul class="countries">
<li title="The Netherlands">The Netherlands</li>
</ul>
}
parse data [
collect set temp [thru {<ul class="genres">} keep to {</ul>}]
]
parse to-string temp [
collect set genres [some [thru {<li title="} thru {">} keep to {</li>}]]
]
parse data [
collect set temp [thru {<ul class="countries">} keep to {</ul>}]
]
parse to-string temp [
collect set countries [some [thru {<li title="} thru {">} keep to {</li>}]
]
]
probe genres
;==["Family" "Drama" "Adventure"]
probe countries
;==["The Netherlands"]
data: {
<ul class="genres">
<li title="Family">Family</li>
<li title="Drama">Drama</li>
<li title="Adventure">Adventure</li>
</ul>
<ul class="countries">
<li title="The Netherlands">The Netherlands</li>
</ul>
}
bl: charset reduce [space tab cr lf]
ul: [thru "<ul" thru ">" some [some bl </ul> break | li] | thru </ul>]
li: [thru "<li" thru ">" keep to </li> </li>]
probe load mold/flat parse data [collect some [to "<ul" collect ul]]
block: parse data rule: [
collect [some [
[{"genres"} | {"countries"}] rule
| #"^"" keep to #"^"" skip
| skip
]]
]
move back tail block/1 tail block
load mold/flat block
;== [["Family" "Drama" "Adventure"] ["The Netherlands"]]
topaz-parse:
>> foo: [word!] topaz-parse [a b] [foo word!]
== b
>> foo: ['a] topaz-parse [a b c] [foo word! end]
== none
>> input: [word /refinement some-other-word /some-other-refinement "string"]
== [word /refinement some-other-word /some-other-refinement "string"]
>> topaz-parse input [collect any [keep to-word refinement! | skip]]
== [refinement some-other-refinement]
parse cannot switch to a different input series with :word syntax? Are there any known hacks to do that?
parse wheels
parse - thus leveraging some of its stuff, but it will certainly involve some dirty hacks I won't be proud of
goto and longjmp would probably like it. ;^)
>> parse r: [[(change/only r [4 5 6]) (r: next r) into r end] [copy o collect some keep integer!]] r ? o
O is a block! value. length: 3 [4 5 6]
[4 5 6]
into it, then using into on that input, matching it against the rest of the rule.
into allowed to specify a custom series to go into.
>> series1: [a b c]
== [a b c]
>> series2: [1 2 3]
== [1 2 3]
>> series: series1
== [a b c]
>> parse series [some word! p: (insert p series2) some integer!]
== true
>> parse series [some word! (insert clear series series2) :series some integer!]
== true
opt rule is acting weird here?
>> parse "1234" [a: skip opt to "." a:] ? a
A is a string! value: ""
>> parse "1234" [a: skip opt [to "."] a:] ? a
A is a string! value: "234"
>> parse "1234" [a: skip [to "." |] a:] ? a
A is a string! value: "234"
>> parse "1234.5" [a: skip opt to "." a:] ? a
A is a string! value: ".5"
a ("1234" and "234" respectively).
>> parse x: "-" [change not space #"."]
== false
>> x
== ".-"
not does not advance, IMO. It's a look-ahead rule. (Hence, I corrected my code: ahead is not needed.)
parse x: "-" [not [2 space] rule]
rule continue from, if not from the same place as before trying the subrule?
change should change, not insert, no?
change/part it makes some sense.
while over any - wanna mention another real-world use case: I'm parsing a dialect and preprocessing it in place using logic like
parse input [while [input:
...
| (found a macro) ... (change/part input new-code 1) :input
| ...
]]
any just stops since the input did not advance after the substitution (by design). while gives me more control here.
ahead that there's a colon on the line. Something like:
ahead [copy value to newline if (find value colon)]
ahead [to #":" to newline]?
to makes some things unnecessarily hard :)
to #":" part may skip a few newlines on its way ☺
to is right again! :smiley:
to and thru - it's hard to delineate up to which rightmost boundary the search should be made.
key: value
to:
line-contains-colon: [to [":" | newline | end] ":"]
rule: [ahead line-contains-colon]
to, it is a no-no. :smile:
to/thru is that they are deceptively simple looking
[ahead [to [":" | newline] ":"]] ~ [ahead [some not-colon colon]] ; not-colon: charset [not ":^/"]
to [":" | newline] ~ some not-colon
to rule is basically equivalent to any not rule ahead rule
[some not-colon colon] should maybe be [any not-colon colon], but it probably doesn't matter for his example case
any would be better.
to rule ~ any not rule ahead rule is not correct. Consider this:
rule: ["a"]
parse "bcda" [to rule rule] ;== true
parse "bcda" [any not rule ahead rule rule] ;== false
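The counterexample fails because `not rule` never consumes input, so `any` stops immediately. A hedged sketch of an expansion that does behave like `to rule` on this input pairs `not` with `skip`; this is one common formulation, not a claim about how `to` is implemented internally. The `has-colon?` helper is hypothetical and uses the `not-colon` charset from the colon discussion.

```red
Red [Title: "Sketch: TO vs an explicit look-ahead loop"]

rule: ["a"]
probe parse "bcda" [to rule rule]                          ; as in the thread
probe parse "bcda" [any not rule ahead rule rule]          ; ANY stops at once: NOT never advances
probe parse "bcda" [any [not rule skip] ahead rule rule]   ; SKIP makes each iteration advance

not-colon: charset [not ":^/"]
has-colon?: func [
    "True if LINE starts with a non-empty key followed by a colon, before any newline"
    line [string!]
][
    parse line [some not-colon ":" to end]
]
probe has-colon? "key: value"
probe has-colon? "no colon here"
probe has-colon? "first^/second: line"
```

The charset version cannot run past a newline, which is exactly the problem `to #":"` had in the colon example.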
some not-colon colon makes more sense, because the key before the colon should have some value, but you're right.
to - another reason not to use it :)
to shouldn't be ruled out, IMO. In cases where performance does not matter, it can make rules simpler, although it must be used with much care.
to is great for simple stuff: if you're looking for one value in an HTML page, it's the best solution. But converting some xMB document in a random format to Red cannot be done properly with to.
not rule doesn't advance? So is this a case where parse detects an infinite loop and fails out?
to and thru are useful *very* rarely, but sometimes they are (aside from quick parsing I mean) - i.e. to #">" is more readable than any non-greater-than (which also requires defining a charset) or any not #">" etc. It is also easier to optimize. That being said, I don't know how to stop people from using them too much.
not cannot advance, as it succeeds when the rule does *not* match. :)
any not rule? It seems like that would match and not advance forever, but is the loop detected and interrupted?
to and thru, you probably have little need for them.
[not "b" | skip] is always failing after the 1st iteration, yet some continues to match it.
>> parse "xx" [any [not "b" | skip]]
== false
>> parse "bb" [any [not "b" | skip]]
== true
>> parse "xx" [some [not "b" | skip]]
== false
>> parse "bb" [some [not "b" | skip]]
== true
>> parse "bx" [some [not "b" | skip]]
(hangs)
- not "b" on bx -> false, move to skip
- skip on bx -> true, run rule again
- not "b" on x -> true, run rule again
- not does not advance
reject of course.
parse "x" [some [not "b"]]
>> parse-trace "xx" [3 [not "b" | skip]]
-->
match: [3 [not "b" | skip]]
input: "xx"
-->
-->
match: [not "b" | skip]
input: "xx"
-->
==> not matched
<--
match: ["b" | skip]
input: "xx"
<--
<--
<--
return: false
== false
>> parse-trace "bx" [3 [not "b" | skip]]
-->
match: [3 [not "b" | skip]]
input: "bx"
-->
-->
match: [not "b" | skip]
input: "bx"
-->
==> matched
<--
match: [| skip]
input: "bx"
==> matched
<--
match: [[not "b" | skip]]
input: "x"
-->
match: [not "b" | skip]
input: "x"
-->
==> not matched
<--
match: ["b" | skip]
input: "x"
<--
match: [[not "b" | skip]]
input: "x"
-->
match: [not "b" | skip]
input: "x"
-->
==> not matched
<--
match: ["b" | skip]
input: "x"
<--
<--
<--
return: false
== false
not "b" doesn't advance, neither does some, so how does it stop?
>> parse "x" [some [not "b" p1:] p2:]
== false
>> p1
== "x"
>> p2
== "x"
parse "bx" [some [not "b" | skip]] should not hang; parse cannot detect that the input does not advance in this case. Worth a ticket?
keep pick be extended to expressions? Currently it works on matches only:
>> parse [a b c][collect some [keep ['b 'c] | skip]]
== [[b c]]
>> parse [a b c][collect some [keep pick ['b 'c] | skip]]
== [b c]
>> parse [a b c][collect some ['b keep ([add some stuff]) | skip]]
== [[add some stuff]]
>> parse [a b c][collect some ['b keep pick ([add some stuff]) | skip]]
== []
keep copy.
>> u ["mamazo" "mamaxo" "hama"]
== "ayah, ibu dan makanan"
>> uno ["mamazo" "mamaxo" "hama"]
== "ayah, ibu atau makanan"
>> u [(ia[word: "mama" determiner: "ni" adjective: "hehaha"])(ia[word: "qaja"])]
== "orang tua lucu ini dan terjamah"
Red >> ieoa[[text: (oa[word: "o"])][text: (ia[word: "qima" determiner: "na"])]]["o" "i"]
== "iya, jeruk lemun itu"
; %qslsamples/sample.red
ieoa[
[
text: (
ia[
word: "mamazo"
determiner: "ni"
]
)
]
[
text: (
aa[
word: "hama"
tense: "za"
negative: true
]
)
]
]["i" "a"]>> do %qaja/qslsamples/sample2.red == "ayah saya belum terjamah bahasa ini"
i mamazom pa no qaja e qisa ni
>> do %qaja/qslsamples/sample2.red
== "ayah saya belum terjamah bahasa ini dirumah"
i mamazom pa no qaja e qisa ni ze me
>> print do %qaja/qslsamples/su-zu.red
ayah saya belum terjamah bahasa ini ketika kita jalan jalan dijalan
i mamazom pa no qaja e qisa ni zu i kisa a jaja ze jaja
text: copy {<p></p> <p>6</p>}
res: copy []
parse text [
some [
thru "<p>" copy between to "</p>" (append res between)
]
]
probe res/1
;==""
probe res/2
;=="6"
text: copy {<p></p> <p>6</p>}
parse text [
collect set res [
some [
thru "<p>" keep to "</p>"
]
]
]
probe res/1
;==#"6"
probe res/2
;==none
res: parse text [collect some [...]]
copy and collect with keep. The former copies the matched input in a straightforward way; the latter is smarter and will ignore empty matches and coerce one-character strings to char! values (because, really, what you're parsing is a series of characters).
>> parse "abc" [collect some [keep skip]]
== [#"a" #"b" #"c"]
>> collect [parse "abc" [some [copy match skip (keep match)]]]
== ["a" "b" "c"]
>> collect [parse "abc" [some [set match skip (keep match)]]]
== [#"a" #"b" #"c"]
>> parse "xxx" [collect some [thru #"x" keep to #"x"]]
== []
>> collect [parse "xxx" [some [thru #"x" copy match to #"x" (keep match)]]]
== ["" ""]
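If every `<p>` body is wanted as a string, empty ones included, one option that relies only on the `copy` behaviour shown in these examples is to put the `collect` function around `parse` instead of using Parse's own `collect`/`keep`. A minimal sketch; `p-contents` is a hypothetical name, and it assumes plain `<p>...</p>` pairs with no attributes or nesting.

```red
Red [Title: "Sketch: keeping empty matches as strings"]

p-contents: function [
    "Return the text of each <p>...</p> pair as a string, empty ones included"
    html [string!]
    /local match
][
    collect [
        parse html [
            ; COPY always yields a string!, even for an empty match
            some [thru "<p>" copy match to "</p>" "</p>" (keep match)]
        ]
    ]
]

probe p-contents {<p></p> <p>6</p>}
```

Matching the closing `"</p>"` explicitly keeps each iteration anchored to one paragraph, rather than relying on the next `thru "<p>"` to skip it.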
copy to keep the "always return a series" invariant, which it does in the empty-string case. Ditto for keep - I don't expect it to collect empty garbage, and I don't think that this is useful at all in the use cases where collect typically applies.
keep returning char! instead of string! myself, and agree that it can be more consistent and return string! in either case. Also see https://github.com/red/REP/issues/8
>> parse "<a> <>" [collect some [thru #"<" keep to #">" | keep ("")]]
== [#"a" ""]
res: parse "<p></p><p>6</p>" [
collect some [
thru "<p>" keep copy x to "</p>" thru "</p>"
]]
>> parse "<a> <>" [collect some [thru #"<" keep to #">" | keep ("")]]
== [#"a" ""]
>> about
Red 0.6.4 for Windows built 31-Aug-2019/17:47:43+05:00 commit #b28d8f5
#do [match: none]
#macro grab: func [rule][
compose/only [keep copy match (rule)]
]
probe parse "<p></p><p>6</p>" [
collect some [thru <p> grab to </p>]
]
>> do %topaz-parse.red
== func [
{Parse BLOCK according to RULES; return last result from RULES if it matches, NONE otherwise}
block [binary! any-block! an...
>> topaz-parse "<p></p> <p>text</p>" [collect some [thru <p> keep to </p> </p>]]
== [#"<" #"<"]
>> topaz-parse "<p></p> <p>text</p>" [collect some [thru <p> keep copy to </p> </p>]]
== ["" "text"]
keep treats an empty match - without copy it fails, with copy it keeps an empty string. One could think that in both cases what was matched is an empty string (between and ).
copy it is forced to treat the match as a string, and it matches an empty string, but with simple keep or keep pick it doesn't match anything there.
copy affect anything? The only side effect of a matched copy IMO should be the extraction of the matched input into a word. "Anything there" in both cases is an empty string; it's actually equivalent to keep none.
>> parse "<p></p><p>x</p><p>abc</p>" [collect some [thru "<p>" keep to "</p>" "</p>"]]
== [#"x" "abc"]
>> parse "<p></p><p>x</p><p>abc</p>" [collect some [thru "<p>" keep pick to "</p>" "</p>"]]
== [#"x" #"a" #"b" #"c"]
>> parse "<p></p><p>x</p><p>abc</p>" [collect some [thru "<p>" keep copy _ to "</p>" "</p>"]]
== ["" "x" "abc"]
keep options "collect *matched value(s)*". How come the first two examples haven't matched anything but the third one did, even though there's no *value* between the < and > characters?
keep and copy are separate keywords, but now it turns out that keep copy is a dedicated keep option.
string! is a series of char! *values*, and there's no such thing as #"". So why does keep copy "collect matched values as a single series" when no *values* are actually being matched?
(of same type as input).
keep variants match values, of course, but of different types, i.e. strings or chars *in this case*. keep matches a char! if there is one element or a string! if there are several. keep pick always matches chars, and keep copy always matches strings.
keep doesn't match; it collects values from already matched input, but it can do so in a variety of ways: either one by one in a block or together in a series of the respective datatype. Anyway, since #"" doesn't exist, it might make sense for keep and keep pick to ignore the first match but for keep copy to yield an empty string. And keep copy indeed preserves the "copy always returns a series" invariant (which it isn't, by the way).
>> parse [][copy match skip] :match
>>
collect / keep idiosyncrasies and the need for their further improvement (as indicated by my comment in https://github.com/red/REP/issues/8 and elsewhere). Setting a dummy word to the matched input just to keep an empty string is a hacky solution.
skip fails.
>> parse [][copy match none] :match
== []
keep copy follows copy semantics faithfully, but keep keeping empty strings is questionable IMO. keep only or something like that would be OK.
>> a: "hello"
== "hello"
>> probe first a
#"h"
== #"h"
>> probe type? first a
char!
== char!
collect is already confusing enough as it is, and there are a couple of REP tickets pending in this regard. I see no harm in discussing this and coming to at least some form of consensus.
first "hello" shouldn't be a string?
first returns the first value in a series. A string is a series of char! values.
keep behaving differently depending on the number of matched elements.
>> parse "ab" [collect keep 1 skip]
== [#"a"]
>> parse "ab" [collect keep 2 skip]
== ["ab"]
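The asymmetry above can be compared side by side. The following is a minimal sketch, assuming Red 0.6.4 or later, the version quoted in this thread. A plain keep collects a char! when exactly one element matched and a string! otherwise. keep copy, used with a throwaway word (here _, as in the thread), always collects a string!:

```red
Red []

; Plain KEEP: one matched element gives a char!, several give a string!
probe parse "ab" [collect keep 1 skip]      ; [#"a"]
probe parse "ab" [collect keep 2 skip]      ; ["ab"]

; KEEP COPY with a dummy word (_) always collects a string!
probe parse "ab" [collect keep copy _ skip] ; ["a"]

; The same trick keeps the one-letter word "a" as a string when splitting
probe parse "He is a good man" [collect any [sp | keep copy _ to [sp | end]]]
; ["He" "is" "a" "good" "man"]
```

This is the workaround people in the thread call hacky. It does give a uniform result type, though.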
That string! is a series of char! values is explicitly stated in both the [Red](https://doc.red-lang.org/en/datatypes/string.html) and [Rebol](http://www.rebol.com/docs/core23/rebolcore-6.html) documentation. The mechanics of series are consistent and uniform across the whole language: if you ask for the first element of a series, you get the first element (or none, if there's none).
>> res: parse @-+-a+-bc+ [collect some [thru "-" keep copy _ to "+" "+"]]
== [ a bc]
>> length? res
== 3
>> _?: first res
==
>> type? _?
== email!
>> length? _?
== 0
_?: next @
>> i: to issue! "a b c"
== #a
>> mold i
== "#a"
>> length? i
== 5
>> last i
== #"c"
charset, not, and union are your friends here.
>> dig=: charset [#"0" - #"9"]
== make bitset! #{000000000000FFC0}
>> alnum=: make bitset! [#"0" - #"9" #"a" - #"z"]
== make bitset! #{000000000000FFC0000000007FFFFFE0}
>> non-dig=: charset [not #"0" - #"9"]
== make bitset! [not #{000000000000FFC0}]
>> hex=: union dig= charset "ABCDEFabcdef"
== make bitset! #{000000000000FFC07E0000007E}
do %ebnf.rule
ebnf: read %ebnf.ebnf
parse ebnf remove-gaps
parse ebnf remove-comments
parse ebnf syntax ;== true
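The charset, not and union advice from a few lines up can be tried on concrete parse rules. This is a small sketch, assuming Red 0.6.4 or later; the names digit, non-digit and hex are illustrative. It builds the character classes and uses them directly as parse rules:

```red
Red []

digit:     charset [#"0" - #"9"]
non-digit: charset [not #"0" - #"9"]      ; complement: everything except 0-9
hex:       union digit charset "ABCDEFabcdef"

probe parse "1fA0" [some hex]             ; true
probe parse "1fG0" [some hex]             ; false: #"G" is not in hex
probe parse "abc" [some non-digit]        ; true
probe parse "ab1" [some non-digit]        ; false: some stops at #"1", input not at end
```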
b instead of a:
>> parse [a a: b][quote a: set value skip to end]
== true
>> value
== a:
skip
IMO, a instead of a: is the point.
b:
>> value: none parse [c a a: b][some [quote a: set value skip | skip]] value
== a:
>> value: none parse [c a a: b][some [ahead set-word! quote a: set value skip | skip]] value == b
set works. Never used block parsing. Does it set the value at a certain (matched) position? Then IMO it should return b, as you suggest.
quote does: it matches the value that follows it literally. Except that in this case it's a lax any-word! match.
>> parse [a:][quote a]
== true
>> parse/case [a:][quote a]
== false
>> parse/case [a:][quote a:]
== true
any-word! values to word! that messes this up.
word! parsing:
>> parse [a][a]
*** Script Error: PARSE - invalid rule or usage of rule: a
*** Where: parse
*** Stack:
>> parse [a]['a]
== true
path! parsing:
>> parse [a/b][a/b]
== true
>> parse [a/b]['a/b]
== false
path! parsing, but shouldn't the second one return true!?
In parse, words (word! and lit-word!) are substituted by their values, but paths are not. Otherwise a: 1 parse [a/b][a/b] would return an error and we would need to write parse [a][quote a] every time, so I think it is a reasonable design choice.
[a] will match the value of a (usually a sub-rule, but it could be anything), while ['a] will match the word a (equivalent to [quote a]); in the same way, [a/b] should match the value of a/b, while ['a/b] should match the path a/b.
REBOL/Core 2.7.6.4.2 (14-Mar-2008) ...
>> parse [a/b] [a/b]
** Script Error: a has no value
** Near: parse [a/b] [a/b]
>> parse [a/b] ['a/b]
== true
lit-path! rule for matching path!).
If parse [a/b] [a/b] fails (and resolves the value of a/b), then I agree that parse [a/b] ['a/b] should behave like parse [a] ['a].
parse-c-header. It's unfinished because C headers are just C in disguise. For Rebol, I did Reb-C, a translator for a subset of Rebol to C.
parse. In fact, @dockimbel wrote a scheduling dialect just for that, which I hacked on quite a bit for production use.
gpio [
    hardware odroid-c2 ;defines ids and features of pins for target hw
    legs: [5 4 7 6]
    servo-vals: read legs
    pin legs pwm rate 60
    on-timer [
        write legs (gyro-correct/x)
    ]
]
do-parse-events
split "The trip will take 21 days" sp
parse?
parse "The trip will take 21 days" [collect any [sp | keep to [sp | end]]]
a: "<img src='test.png'>" parse a [to "="]
>> parse "He is a good man" [collect any [sp | keep to sp | end] ] == ["He" "is" #"a" "good"]
char!, not as string!. Also, it misses the last word:
>> parse "He is a good man" [collect any [sp | keep to [sp | end]]]
== ["He" "is" #"a" "good" "man"]
char! to string!:
>> parse "He is a good man" [collect any [sp | copy value to [sp | end] keep (form value)]]
== ["He" "is" "a" "good" "man"]
any sp matches any number of spaces, including none. You have more than just spaces in your input.
any space matches both an empty string and a single space:
>> parse "" [any space]
== true
>> parse " " [any space]
== true
any matches zero or more occurrences. If you want to match at least one space, you need to use some.
>> parse "" [some space]
== false
>> parse " " [some space]
== true
>> parse " " [1 3 space]
== true
>> parse "    " [1 3 space]
== false
parse "He is a good man" [collect any [sp | keep copy _ to [sp | end]]] ☻
>> parse "He is a good man" [collect any [sp | keep copy tmp to [sp | end]] ]
== ["He" "is" "a" "good" "man"]
With the copy command you can force a series! as the collected value.
>> series!
== make typeset! [block! paren! string! file! url! path! lit-path! set-path! get-path! vector! ha...
? in console?
>> ? sp
SP is a char! value: #" "
a, not with space.
>> parse "aaa bbb ccc ddd" [1 skip]
== false
>> parse ["aaa bbb ccc ddd"] [1 skip]
== true
[] mean in read?
1 not needed). In block parsing you skip the string and are at the end of the input.
parse "aaa bbb ccc ddd" [collect [2 skip]] is wrong
a bbb ccc ddd is what I expected
collect is meaningful only together with keep:
>> first parse "aaa bbb ccc ddd" [collect [2 skip keep to end]]
== "a bbb ccc ddd"
skip:
>> skip "aaa bbb ccc ddd" 2
== "a bbb ccc ddd"
parse "He is a good man" [collect any [ keep to sp ] ]
In parse you can use set-word! values to mark a position in the input, and you can use paren! values as actions. With these two simple tools you can set markers and print them out in the console to see where parsing stopped, how rules like some or any move through an input, and more.
set-word! examples? I mean "set-word! values to mark a position in the input, and you can use paren! values as actions. With these two simple tools you can set markers and print them out in the console"
>> parse load "first second third" [some [s: skip (print s/1)]]
first
second
third
== true
s refers to the whole input series at a certain index; s/1 picks the first element at this index.
parse "first (second third) fourth" [
some ["(" s: | ")" e: (print copy/part s back e) | skip]
]
;second third
n: 0
parse str: copy matrjonushka: "[beware (matrjonushka) here]" [
some [
"(" s:
| ")" e: if (5 > n: n + 1)(
change/part back s matrjonushka e
) :s
| skip]
] str
== {[beware [beware [beware [beware [beware (matrjonushka) here] here] here] here] here]}
get-word! because it's more advanced, and @bubnenkoff is just starting out.
page: {
<html>
<title> My Great Page</title>
<h1>Big Heading A</h1>
<p>Stuff in A</p>
<h1>Big Heading B</h1>
<p>Stuff in B</p>
</html>
}
parse page [ collect any [thru <h1> | keep to "<" ]]
== ["Big Heading B"]
| means "or". Step by step:
thru <h1> succeeds and brings you just behind the first <h1>
any starts the next round from the new position
thru <h1>, let's try this again: it succeeds and brings you just behind the second <h1>
next round: thru <h1> fails, so the first subrule fails
| keep to "<" is tried instead: success, "Big Heading B" is kept
In a parse IDE we can show matches and backtracking (even now, thanks to parse/trace). Figuring out how to visualize things so that they are as clear as this kind of explanation will be a fun challenge.
>> parse [a 1 b 2 c 3 d 4 e 5] [collect [keep any [keep word! | number! ]]]
== [a b c d e [a 1 b 2 c 3 d 4 e 5]]
>> parse [a b c d e f] [to 'b collect [keep any word!]]
== [[b c d e f]]
>> parse [a b c d e f] [to 'b collect [ any keep word!]]
== [b c d e f]
any keep word! working
collect: initialises a new collection block
keep: now, what shall we keep?
any: enter the block
word!: keep it; skip number! (but both are successfully matched)
keep receives and keeps the matched input from any, i.e. all of it
pick after the first keep.
any. pick would pick elements of the block, but there is no block.
>> rule: [any [keep word! | number!]] parse [a 1 b 2 c 3 d 4 e 5] [collect [s: rule :s rule]]
== [a b c d e a b c d e]
any and you keep it as a block.
any loop. (As keep word! is a rule, and any operates on the following rule, there is no reason why it should not work.)
pick with keep:
>> parse [a b c d e f] [to 'b collect [keep pick any word!]]
== [b c d e f]
pick: I didn't know it returned values as a block inside a block when the rule returns multiple values.
KEEP which receives multiple values (or is it the any management mechanism creating the block and passing it to KEEP)?
keep is a keyword which makes things happen in Red's belly, so that whatever is matched next is appended to the collection: as a series (if a series is matched, or if the copy keyword together with a word is used after keep), as a single value (if a single non-series value is matched without copy), or as multiple single values (if a series is matched but pick is used after keep). To see how it *exactly* happens, look at [parse.reds](https://github.com/red/red/blob/master/runtime/parse.reds).
KEEP: [parse.reds](https://github.com/red/red/blob/master/runtime/parse.reds) for future reading, as the topic is too advanced and I have not learnt R/S yet.
view [
    button "open" [
        if file: request-file [
            do [ x: parse file/text [ collect any [thru "<purchaseObject>" keep to "</purchaseObject" ]] ]
        ]
    ]
    area 900x700 x
]
x that has the parse result data
view [ button "open" [ if file: request-file [ a/text: read file ] ] a: area 900x700 ]
view [ button "open" [ if file: request-file [ x: parse read file/text ....
After file: request-file, file is of type file!, i.e. it is a filename. You have to read or load it.
view [ button "open" [ if file: read request-file [ x: parse file [ collect any [thru "<purchaseObject>" keep to "</purchaseObject" ]] ] ] area 900x700 x ]
mold?
collect in parse returns a block; mold will turn it into a string. You can try form too, if it suits you.
view [ button "open" [ if file: request-file [ a/text: mold parse read file [ collect any [thru "<purchaseObject>" keep to "</purchaseObject" ]] ] ] a: area 900x700 ]
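The parse-then-mold step can be tried without a GUI or a file. This is a minimal sketch, assuming an inline string stands in for `read file`; the sample XML is made up for illustration. It uses keep copy _ so that a one-character match still comes back as a string! rather than a char!:

```red
Red []

; Hypothetical input standing in for the content of the requested file
xml: {<root><purchaseObject>A</purchaseObject><purchaseObject>BC</purchaseObject></root>}

res: parse xml [collect any [thru "<purchaseObject>" keep copy _ to "</purchaseObject>"]]

probe res          ; ["A" "BC"]   a block!
print mold res     ; ["A" "BC"]   now a string!, suitable for a/text
print form res     ; A BC
```

In the View code, the result of `mold` (or `form`) is what gets assigned to `a/text`, because the area's text facet expects a string.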
a: area 900x700 wrap if it is long enough.
area? Or are there other ways?
a/text:? And why then do we write a: area to display text? a: is creating a word. Why do we need to create a word and set the area to it?
With a: area, an area face is created with its facets (see [doc](https://doc.red-lang.org/en/view.html#_area)), and a is referring to this face. By using the set-path a/text: you can set the text facet of this area, and with the path a/text you are accessing that facet. If you just want to display the text and don't want to edit it, you can use text instead of area, e.g. t: text 900x700 wrap. Still, to set the text you use t/text: ... and to access it t/text. You need to set a word to the area (or any face/style) if you want to access it directly later. But there are ways to access it indirectly too.
a/text) like dot notation (a.text) in other languages.
parse file [
any [
to separator start:
to heading stop:
( change/part start stop "" )
:stop
]
]parse file [
any [
to separator start: (n1: index? start)
to heading stop: (n2: index? stop)
(n: n2 - n1)
(remove/part start n)
:stop
]
]
stop: records the position when hitting heading, and when you continue from :stop, parse continues from the index? of the recorded stop:, which (after the removal) generally is not where you want it to continue. You can check this by inserting (probe stop) just after :stop.
parse file [any [ to separator start: to heading stop: (remove/part start stop) :start ]]
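The `:start` fix can be checked on an in-memory string. This is a small sketch, assuming hypothetical separator and heading markers ("==" and "##") and a made-up input in place of the real file. After `remove/part start stop`, the text that followed `stop` has shifted left to the `start` index, so resuming at `:start` lands exactly on the next heading:

```red
Red []

; Hypothetical markers and input, standing in for the real file contents
separator: "=="
heading:   "##"
doc: "intro==junk##Heading 1==more junk##Heading 2"

parse doc [
    any [
        to separator start:        ; mark where the unwanted part begins
        to heading stop:           ; mark where the next heading begins
        (remove/part start stop)   ; cut the text between the two marks
        :start                     ; resume at start, which now points at the heading
    ]
]
probe doc   ; "intro##Heading 1##Heading 2"
```

parse itself returns false here because the input does not end at a separator. That does not matter, since only the side effect on `doc` is wanted.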
z: charset [#"a" - #"z"]
>> foreach d z [print d]
*** Script Error: foreach does not allow bitset! for its series argument
>> z: charset [#"a" - #"z"]
== make bitset! #{0000000000000000000000007FFFFFE0}
>> repeat c 256 [if find z c: c - 1 [prin to-char c]] prin lf
abcdefghijklmnopqrstuvwxyz
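The `repeat` trick above can be wrapped in a small helper. This is a sketch, assuming the layout discussed in this thread, where bit n of a charset stands for the character with code point n. The name chars-of is hypothetical:

```red
Red []

; FOREACH does not accept bitset!, so walk the bit indexes instead.
chars-of: function [cs [bitset!]][
    out: copy ""
    repeat i length? cs [                       ; length? = number of bits
        n: i - 1                                ; bits are numbered from 0
        if find cs c: to char! n [append out c] ; keep the chars whose bit is set
    ]
    out
]

print chars-of charset [#"a" - #"e"]            ; abcde
```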
bits: make bitset! [#"a" - #"b"]: is it 8 bits for a plus 8 bits for b? length? shows me a strange number.
0100000100. Here I have bits 2 and 8 set (counting from 1), so it can match, for example, the numbers 2 and 8, or the letters **b** and **h**.
>> enbase/base to binary! charset #"1" 2
== {00000000000000000000000000000000000000000000000001000000}
>> enbase/base to binary! charset #"2" 2
== {00000000000000000000000000000000000000000000000000100000}
>> enbase/base to binary! charset #"3" 2
== {00000000000000000000000000000000000000000000000000010000}
>> enbase/base to binary! charset #"4" 2
== {00000000000000000000000000000000000000000000000000001000}
>> enbase/base to binary! charset #"5" 2
== {00000000000000000000000000000000000000000000000000000100}
(etc)
>> enbase/base to binary! charset to char! 0 2
== "10000000"
>> enbase/base to binary! charset to char! 1 2
== "01000000"
>> enbase/base to binary! charset to char! 2 2
== "00100000"
>> enbase/base to binary! charset to char! 3 2
== "00010000"
(etc)
For parse, or charsets in general, those bits are flags indicating whether a particular character is part of the bitset. But bitsets are a powerful and space-efficient structure that can be used for other purposes too.
tag: ["<" alpahabet ">" skip thru "</" alpahabet ">"]
tag: "<" alpahabet ">" skip thru "</" alpahabet ">"
>> halo: charset "halo"
== make bitset! #{0000000000000000000000004089}
>> parse "halo" [4 halo]
== true
>> parse "ahol" [4 halo]
== true
>> parse "aaaa" [4 halo]
== true
alphabet: union charset [#"a" - #"z"] charset [#"A" - #"Z"]
>> alphabet: charset [#"a" - #"z" #"A" - #"Z"]
== make bitset! #{00000000000000007FFFFFE07FFFFFE0}
>> length? alphabet
== 128
>> append alphabet #"ř"
== make bitset! #{00000000000000007FFFFFE07FFFFFE000000000000000000000000000000000000000000000000000000040}
>> length? alphabet
== 352
>> repeat i length? alphabet [ if find alphabet i [print [i to-char i]] ]
65 A
66 B
...
345 ř
break: "break out of a matching loop, returning success."
a: "<app><div>Hello</div></app>"
>> parse a ["<" thru ">" break]
== false
as true, and then break would stop evaluation and return true.
to-char is just a shortcut... don't use it in tight loops ;-)
>> ?? to-char
to-char: func ["Convert to char! value" value][to char! :value]
Use to end instead of break if you want to end and return true:
>> parse a ["<" thru ">" to end]
== true
break user. But it makes sense that, as there is a way to end with true, there should be a way to end with false too.
>> parse "aabb" [some [#"a" break] copy rest to end] rest
== "abb"
>> parse "aabb" [some [#"a" ] copy rest to end] rest
== "bb"
>> charset ["a"]
== make bitset! #{00000000000000000000000040}
>> enbase/base to binary! charset ["a"] 2
== {00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000}
a mapped to Hx0040?
>> enbase/base to binary! charset ["a"] 16
== "00000000000000000000000040"
>> enbase/base to binary! charset ["a"] 64 == "AAAAAAAAAAAAAAAAQA=="
>> cs: charset "a"
== make bitset! #{00000000000000000000000040}
>> repeat i length? cs [if cs/:i [print i]]
97
>> to-integer #"a"
== 97
>> b: enbase/base to binary! charset ["a"] 2
== {00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000}
>> index? find b "1"
== 98 ;<--- 1-based indexing when using string series!
a converts to 97?
U+0061 a 61 LATIN SMALL LETTER A [Source](https://www.utf8-chartable.de/)
a is the 98th character
With to integer! #"a", a is decimal 97, while in the charset the bit is ON in position 98 because the UTF-8 table starts from 00...
a in the UTF-8 table, I could imagine Red converts the letter to its absolute position and then checks that flag.
>> alphabet/97
== true
>> to-char 0
== #"^@"
>> alphabet/0: true
== true
>> find alphabet #"^@"
== true
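The "97 vs 98" confusion resolves once you compute where the bit sits. This is a small sketch, assuming the layout visible in the outputs above: byte 0 comes first, and inside each byte the most significant bit stands for the lowest code point. The variable names are illustrative:

```red
Red []

n: to integer! #"a"                ; 97
bin: to binary! charset "a"        ; #{00000000000000000000000040}
byte-index: n / 8                  ; integer division: 12 (0-based byte)
bit-in-byte: n % 8                 ; 1 (0-based, counted from the left)

print [n byte-index bit-in-byte]   ; 97 12 1
print length? bin                  ; 13 bytes
print pick bin byte-index + 1      ; 64, i.e. #40 = 01000000 (pick is 1-based)
print byte-index * 8 + bit-in-byte + 1   ; 98: the 1-based position of the "1" in the enbase string
```

So bit number 97 (counting from 0) is the 98th character of the base-2 string. Both numbers describe the same bit.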
bitset is. It is so easy.
charset and look at it just as a collection of bits:
>> b: make bitset! []
== make bitset! #{00}
>> length? b
== 8
>> b/0: true
== true
>> enbase/base to-binary b 2
== "10000000"
1 characters.
>> z: charset [#"a" - #"z"]
== make bitset! #{0000000000000000000000007FFFFFE0}
>> charset ["a"]
== make bitset! #{00000000000000000000000040}
a and not A )
>> enbase/base to binary! charset to char! 3 2
== "00010000"
>> enbase/base to binary! charset ["a"] 2
== {00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000}
>> to-integer #"a"
== 97
>> index? find b "1" == 98 ;<--- 1-based indexing when using string series!
charset [#"a" - #"b"]
a in the UTF-8 table: it is 97, but consider its position to be 98, as the UTF-8 table starts from 0. Then add 97 zeroes and one 1 in position 98.
== {00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000}
b is position 98 in the UTF-8 table, so it is 99 in our bitset: add 98 zeroes and set the 99th bit to 1.
== {00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000}
== {00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001100000}
a will look at the 98th bit and, if it is set to 1, as it is, return it as a matching character in parse.
>> to-binary "a"
== #{61}
>> to-binary to-bitset "a"
== #{00000000000000000000000040}
> >> to-binary "a"
> == #{61}
> >> to-binary to-bitset "a"
> == #{00000000000000000000000040}
>
#{61} and #{00000000000000000000000040}
to-binary results in a binary series represented in base 16. To convert it to base 2, use enbase/base 2.
>> print head insert back tail "2#{}" enbase/base to-binary "a" 2
2#{01100001}
>> print head insert back tail "2#{}" enbase/base to-binary charset "a" 2
2#{00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000}
binary! in the [doc](https://doc.red-lang.org/en/datatypes/binary.html) or in the [spec](https://github.com/meijeru/red.specs-public/blob/master/specs.adoc#binary).
> page: {
> <html>
> <title> My Great Page</title>
> <h1>Big Heading A</h1>
> <p>Stuff in A</p>
> <h1>Big Heading B</h1>
> <p>Stuff in B</p>
> </html>
> }
>
> parse page [ collect any [thru <h1> | keep to "<" ]]
> == ["Big Heading B"]
>
break:
>> parse page [ collect any [thru <h1> keep to "<" break]]
== ["Big Heading A"]
;Let's convert a string to binary
bn: to-binary "Březovský"
;== #{42C599657A6F76736BC3BD}
;Now try to convert each byte back to characters
foreach b bn [prin to-char b]
;BÅezovský
;But if we convert it to string
to-string bn
;== "Březovský"
;Let's see the chars
to-binary "B"
;== #{42}
to-binary #"ř"
;== #{C599}
;Ha!
forall bn [
either bn/1 < 128 [
prin to-char bn/1
][ prin to-char copy/part bn 2 bn: next bn]
]()
;Březovský
;Now charset
cs: charset "Březovský"
;== make bitset! #{0000000000000000200000000411122000000000000000000000000000000004000000000000000000000040}
;Here they are in "alphabetic" (or rather "utfic") order
repeat i length? cs [if cs/:i [prin to-char i]]()
;Bekosvzýř
;Nice thing about charsets (well, bitsets actually) is that we can do set operations with these
cs2: charset "Boleslav"
;== make bitset! #{000000000000000020000000440912}
cs-union: union cs cs2
;== make bitset! #{0000000000000000200000004419122000000000000000000000000000000004000000000000000000000040}
repeat i length? cs-union [if cs-union/:i [prin to-char i]]()
;Baeklosvzýř
cs-excl: exclude cs cs2
;== make bitset! #{0000000000000000000000000010002000000000000000000000000000000004000000000000000000000040}
repeat i length? cs-excl [if cs-excl/:i [prin to-char i]]()
;kzýř
cs-diff: difference cs cs2
;== make bitset! #{0000000000000000000000004018002000000000000000000000000000000004000000000000000000000040}
repeat i length? cs-diff [if cs-diff/:i [prin to-char i]]()
;aklzýř
cs-compl: complement cs2
;== make bitset! [not #{000000000000000020000000440912}]
repeat i length? cs-compl [if cs-compl/:i [prin to-char i]]()
;!"#$%&'()*+,-./0123456789:;<=>?@ACDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`bcdfghijkmnpqrtuwx
to-string #{e28880 78 e28883 79 20 28 42 78 20 e288a7 20 47 79 29 20 e28a83 20 4C 78 79}
;== "∀x∃y (Bx ∧ Gy) ⊃ Lxy"
;You can use unicode code points to get characters if you convert these to integers first:
to-char to-integer #{01F609}
;== #"😉"
to-char to-integer #{01F475}
;== #"👵"
;But to get multibyte characters (also chars with decimal value > 127) in string you should use utf encoding
print to-string #{f09f91b5 20 f09f9889}
👵 😉
Without the any loop:
>> parse page [ collect [thru <h1> keep to "<"]]
== ["Big Heading A"]
| with the meaning **or**. As long as the first alternative succeeds, the second one with keep will not be processed.
>> parse page [ collect any [thru <h1> keep to "<" ]]
== ["Big Heading A" "Big Heading B"]
repeat i length? cs [either cs/:i [prin to-char i] [prin "."]] ...................B..................................e.....k...o...s..v...z.........................................................ý...................................................ř.......
bitset!s and add those conversation notes + Toomas' explanations there?
"find simply checks if the field exists, returning true or none"
>> b
== ["name" "price" "age"]
>> find b ["age"]
== ["age"]
obj: object [a: 44] print find obj 'a
> "find simply checks if the field exists, returning true or none"
> >> b
> == ["name" "price" "age"]
> >> find b ["age"]
> == ["age"]
find used to work on objects and returned true/false, but not anymore; that feature was removed in recent versions.
*** Script Error: find does not allow object! for its series argument error.
find returns the given block at the position where it found the searched value (or none if it can't find it). So find b ["age"] ; == ["age"] is correct.
b: ["name" "price" "age"]
not none? find b "age" ; returns true/false
; or
if find b "age" [print "found!"]
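The difference between "position found" and "true" can be seen directly. This is a small sketch, assuming Red 0.6.4 or later, of what find returns on a block and how that result behaves as a condition:

```red
Red []

b: ["name" "price" "age"]

probe find b "age"                 ; ["age"]: the series at the found position
probe index? find b "age"          ; 3
probe find b "zip"                 ; none
probe to-logic find b "age"        ; true
if find b "age" [print "found!"]   ; no cast needed: a series position is truthy
```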
to-logic
>> to-logic find ["name" "price" "age"] 'a
== false
>> to-logic find ["name" "price" "age"] "name"
== true
logic! value false (use help logic! to see all its aliases), and none. Everything else, including unset!, is considered true. So you don't have to cast to logic! if all you care about is truthiness.
tg: [any ["<" thru ">" opt lf "<" opt "/" thru ">"]]
>> foreach tg "<div><title></title></div>" [print "hello"]
hello hello hello hello hello hello hello hello hello hello hello hello hello hello hello hello hello
hello printed.
foreach tg "<div><title></title></div>" [print tg] and you will understand.
tg to contain tags, but actually you are getting each char from the given string series.
>> tg: [#"<" opt #"/" thru #">"]
== [#"<" opt #"/" thru #">"]
>> parse "<div><title></title></div>" [any [s: tg e: (probe copy/part s e)]]
"<div>"
"<title>"
"</title>"
"</div>"
parse "<div><title>blabla</title></div>" [any [s: tg e: (probe copy/part s e) | 1 skip]]
>> parse "<div><title>blabla</title></div>" [collect any [keep tg | 1 skip]] == ["<div>" "<title>" "</title>" "</div>"]
s and e hold series at positions (start and end) of the input. Just add probe s probe e and see.
copy/part s e creates a new substring from the input.
get-word!... for example:
>> parse str: "aabbaa" [any [some #"a" | s: some #"b" e: (e: change/part s #"c" e) :e]]
== true
>> str
== "aacaa"
>> tg: [#"<" opt #"/" thru #">"]
tg: [any ["<" thru ">" opt lf "<" opt "/" thru ">"]]
# for single chars as in your example?
> s and e hold series at positions (start and end) of the input. Just add probe s probe e and see.
any behavior? Could you provide another example of handling start and end positions with any?
>> parse "aabbcc" [any [s: "aa" (print s) "bb" e: "cc" (print e) ] ]
aabbcc
cc
s and e values:
>> string: "This is some text."
== "This is some text."
>> start: skip string 4
== " is some text."
>> end: skip string 8
== "some text."
>> copy/part start end
== " is "
s and e values. Just remember: those are just a kind of pointer or marker at a certain string position.
any. Try this:
>> str: "abc" (parse str [1 skip bc:] bc) = skip str 1
== true
> s and e hold series at positions (start and end) of the input. Just add probe s probe e and see.
> >> string: "This is some text."
> == "This is some text."
> >> start: skip string 4
> == " is some text."
> >> end: skip string 8
> == "some text."
> >> copy/part start end
> == " is "
>
>> x
== "aaabbbcccddd"
>> parse x [any [s: "aaa" (print s) e: "ccc" (print e) ] ]
aaabbbcccddd
aaabbbccc
s: will point to x at index 1. When the rule reaches (print s), it prints s, which is basically the same as x. As it doesn't match anything further, it fails.
>> s = x
== true
>> parse x [s: "aaa" thru "ccc" e: (print copy/part s e)]
aaabbbccc
== false
>> parse x ["aaa" thru "ccc" e: (print copy/part x e)]
aaabbbccc
== false
>> parse x ["aaa" thru "ccc" e:] print copy/part x e
aaabbbccc
Leave parse aside for the moment and play with series values directly. Use navigation funcs like [head tail next back skip at index? pick poke] and things like copy/part. Get a feel for those, then come back to parse, and I think it will make more sense.
>> parse x ["aaa" thru "ccc" e:] clear e print x
aaabbbccc
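To practise series navigation outside parse, here is a minimal sketch, assuming the same x as in the console session above. It shows that a position such as e is just x at another index, which is exactly what a set-word marker holds inside parse:

```red
Red []

x: "aaabbbcccddd"
e: skip x 9                 ; a position inside x, just after "ccc"

print index? e              ; 10
print copy/part x e         ; aaabbbccc  (from x up to, not including, e)
print copy/part next x 2    ; aa
print first back tail x     ; d
clear e                     ; cuts x at e, like the parse example above
print x                     ; aaabbbccc
```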
tg: [#"<" opt #"/" thru #">"]
I thought opt is skip one element. So I should write something else, like: opt any alpahabet. Why is your code working? And also... yes, there is the thru, so I could avoid it, but there could also be something more sophisticated instead of the thru.
parse "<div><title></title></div>" [any [s: tg e: (probe copy/part s e)]]
"s and e hold series at positions (start and end) of the input. Just add probe s probe e and see."
parse "aabbcc"
cc. So the parsing rule will be:
parse "aabbcc" [s: "aa" e: thru "cc"]
s will hold the start, e the end.
With any [...] there can be more than two positions:
parse "<div><title></title></div>" [any [s: tg e: (probe copy/part s e)]]
>> parse "aabbcc" [s: "aa" e: thru "cc"] == true >> s == "aabbcc" >> index? s == 1 >> e == "bbcc" >> index? e == 3 >> copy/part s e == "aa" >> copy/part "aabbcc" (3 - 1) == "aa"
set-word! as a parse rule, it stores the current input position.
e: (probe copy/part s e)
e, and what happens then? What value does the last e (before the closing paren) have?
e will automatically point to the end of the match? And then we will do copy/part from start to end?
parse "aaabbbccc" [s: "aaa" thru "ccc" e: ]
e is getting the ccc index!
parse a [any [s: tg e: (probe (index? s index? e) ) ]]
join and rejoin? Is it good to write it like this:
>> parse a [any [s: tg e: (probe rejoin [index? s " - " index? e] ) ]]
"1 - 6"
"6 - 11"
"11 - 18"
"18 - 26"
"26 - 32"
"32 - 38"
join is vestigial?
>> ? join
rejoin function! Reduces and joins a block of values.
>> print ["aa" "bb"]
aa bb
("aa" "bb") is the same as do ["aa" "bb"], which gives the last value.
d: charset [#"0" - #"9"]
>> repeat c length? d [if find to-string d c [ print to-char c]] == none
repeat c length? d [if find d c [ print to-char c]]
Set-word!s work differently in the parse dialect than in normal Red. In Red's standard evaluator, e: (probe copy/part s e) would set e to the result of the paren evaluation, but in parse, e: marks a location in the input, which you can refer to (e.g. in the paren), and the result of evaluating the paren is ignored. Everything *inside* the paren is evaluated normally, not as parse dialect.
a: "<app><div><title>Hello</title></div></app>"
alphabet: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]
>> parse a [ any [s: tg e: (print copy/part e s) | collect keep alphabet ] ]
<app>
<div>
<title>
</title>
</div>
</app>
== [#"H" [#"e"] [#"l"] [#"l"] [#"o"]]
How do I get Hello as a whole, not as separate chars?
>> parse a [ any [s: tg e: (print copy/part e s) | collect into q keep alphabet ] ]
<app>
<div>
<title>
</title>
</div>
</app>
== true
>> q
== "olleHaaab"
>> first parse a [collect any ["><" | #">" keep to #"<" | skip]]
== "Hello"
>> first parse a [collect any [#">" not #"<" keep to #"<" | skip]]
== "Hello"
>> form first parse load a [collect any [keep word! | skip]]
== "Hello"
parse a [ collect any [s: tg e: (print copy/part e s) | keep copy tmp [any alphabet]]]
parse a [ collect any [s: tg e: (print copy/part e s) | keep thru [any alphabet]]]
although with parse-trace it actually happens before the point where it would happen with parse. I thought that this could be memory related, but then I tried to cut the file and go at it in chunks, and it always breaks at the same points. Does anyone have any thoughts?
The actual file is about 60000 lines long, but it doesn't get past line 279... and it breaks on this line:
'Sonographic diagnosis of thyroid cancer with support of AI. '
--cli I see *** Script Error: reset-buffer does not allow vector! for its argument
--cli I get parse to finish with false, and the last words it prints are " Sonographic diagnosis of Sonographic diagnosis of"
match: [some [chars] #"." any [#" "]]
input: { Mahmoudzadeh AP, Malkov S, Fan B, Greenwood H, J
-->
*** Script Error: reset-buffer does not allow vector! for its <anon> argument
*** Where: reset-buffer
*** Stack: parse-trace
--cli the behavior is different.
read would have failed.
parse-trace is a bit different. It fails on the 1st line, returning false. parse stops with the vector thing again on line 289.
a: "<app><div><title>Hello</title></div></app>"
alphabet: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]
tg: [any [#"<" opt "/" thru #">"]]
>> parse a [any tg | skip any alphabet]
== false
parse a [any tg | skip [any alphabet] ]
parse a [any tg | skip any alphabet]
[any tg | skip any alphabet] <- how do you understand this rule? What should it do, in your opinion?
Put a: just before the skip and b: after the alphabet, then see what a & b show you to understand where parsing gets stuck.
>> parse a [any tg | a: skip [any alphabet] b: ]
== false
>> a
== "<app><div><title>Hello</title></div></app>"
>> b
*** Script Error: b has no value
*** Where: catch
*** Stack:
[any tg | whatever] will accept any number of tags. any means zero or more, so basically anything is fine for this rule. That means the alternative rule is never checked.
parse-trace a [any tg skip [any alphabet] ]
skip doesn't work that way. Hint: 5 skip skips 5 characters.
Does any alphabet return true on the first symbol and not move forward because of "Repeat rule zero or more times until failure or if input does not advance"?
>> parse a [[any tg q: skip thru alphabet] ]
== false
>> q
== "Hello</title></div></app>"
>> parse a [[any tg q: skip thru any alphabet] ] == false >> q == "Hello</title></div></app>"
tg: [any [#"<" opt "/" thru #">"]]
The way any works is: if the input matches the rule, it advances the input until the rule stops matching, then it moves on to the next rule. If you have any rule1 rule2, any rule1 always succeeds, then parse advances to rule2. If you have any rule1 | rule2, rule2 will never be hit because any rule1 never fails.
parse a [[any tg q: skip thru alphabet] ]
parse-trace a [any tg skip [any alphabet] ]
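The "any never fails" point can be checked on the page from this thread. This is a minimal sketch, assuming the tg and alphabet definitions used above, with tg reduced to a single tag. Putting the alternatives inside the loop lets parse try the text branch whenever a tag does not match:

```red
Red []

tg:       [#"<" opt #"/" thru #">"]
alphabet: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]
page:     "<app><div><title>Hello</title></div></app>"

; "any tg" always succeeds, so the alternative after | is never tried
probe parse page [any tg | some alphabet]          ; false: stuck at "Hello"

; Put both alternatives inside the loop instead
probe parse page [any [tg | some alphabet]]        ; true

; ...and collect the text as one string rather than separate chars
probe parse page [collect any [tg | keep copy _ some alphabet]]   ; ["Hello"]
```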
any tg ;<--- right associative. Its argument is "tg"
skip   ;<--- takes nothing on its right (an optional count goes on its left): skip = skip one position; 5 skip = skip 5 positions.
skip [any alphabet] ; skip one position; [any alphabet] is not a SKIP argument
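A minimal sketch of why any rule1 | rule2 never reaches rule2, assuming a recent Red release (0.6.x). The input "<a>x" and the words r1/r2 are made up for illustration. In the first case it is the leftover input, not a failed rule, that makes PARSE return false.

```red
Red [Title: "ANY never fails"]

tg: [#"<" opt #"/" thru #">"]

;; ANY matches zero or more times, so it always succeeds. The alternative
;; after | is therefore never tried; the leftover "x" makes PARSE return false.
r1: parse "<a>x" [any tg | skip]

;; With the alternative inside the loop, each iteration can pick a branch,
;; so the whole input is consumed and PARSE returns true.
r2: parse "<a>x" [any [tg | skip]]

print [r1 r2]   ; false true
```

Moving the | inside the ANY block is the usual fix: the choice is then made once per iteration instead of once for the whole input.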
So any will match until the END position, and the next rules will receive END as their start position? Hello?
>> probe a
"<app><div><title>Hello</title></div></app>"
>> tg == [any [#"<" opt "/" thru #">"]]
>> a: "<app><div><title></title></div></app>"
== "<app><div><title></title></div></app>"
>> probe tg
[any [#"<" opt "/" thru #">"]]
== [any [#"<" opt "/" thru #">"]]
parse a [ skip tg ] is skipping just the first letter.
ARGUMENTS:
     series [series! port!]
     offset [integer! pair!]
There is a skip function which you can use in "normal" Red programming, and there is a skip word in the _parse dialect_. These are not to be confused. Concerning the use of Red values (including words) in dialects, may I refer to the spec document, [section 2.5](https://github.com/meijeru/red.specs-public/blob/master/specs.adoc#25-dialects).
skip (correct me if I am wrong) in the parse dialect does not work as I expected. Should I create my own new word with the proper behavior?
skip in parse applies to elements of the string being parsed. Please stop confusing the two.
PARSE has its own language. It has similarities with Red, but words work differently. Skip in parse is [n] SKIP, where n is an integer and optional.
>> a: "<app><div><title></title></div></app>"
== "<app><div><title></title></div></app>"
>> tg: [#"<" thru #">"]
== [#"<" thru #">"]
>> parse a [tg mark:]
== false
>> mark
== "<div><title></title></div></app>"
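A small sketch contrasting the two skips discussed above, assuming Red 0.6.x; the sample string and the words moved/ok are illustrative only.

```red
Red [Title: "skip function vs skip keyword"]

s: "<app><div>"

;; SKIP the Red function: returns the same series at another position.
moved: skip s 5
probe moved                       ; "<div>"

;; SKIP the PARSE keyword: consumes input. The optional integer in front
;; is the count; "<app>" is exactly 5 characters long.
ok: parse s [5 skip "<div>"]
probe ok                          ; true
```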
IF *evaluate the Red expression*: could the success of a rule be evaluated instead, with or without moving the input?
ahead rule
ahead usage, though I'm not sure if it's the sort of example you are looking for.
IF [RULE] then [RULE] or THEN (code)?
opt [rule (code)] or opt [ahead rule (code)], depending on your needs.
What is the difference between parse "a b c" [e: any ["a b c"]] and parse "a b c" [any e: ["a b c"]]?
any in the first example, but if you put the set-word between ANY and the rule, is e set multiple times? Or is it set just once?
:word : resume input at the position referenced by the word
>> parse str: "This is get-word example" [some [" " s: | "get-word" :s change "g" "s" | skip]]
== true
>> str
== "This is set-word example"
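One way to read the opt [ahead rule (code)] suggestion, as a sketch assuming Red 0.6.x: AHEAD checks a rule and runs the paren without moving the input, while a plain rule inside OPT consumes what it matches. The words peeked and consumed are hypothetical flags for the demo.

```red
Red [Title: "opt + ahead sketch"]

;; AHEAD matches without advancing the input, so the paren runs and
;; "abc" can still match from the beginning.
peeked: no
r1: parse "abc" [opt [ahead "ab" (peeked: yes)] "abc"]

;; Without AHEAD, "ab" is consumed and "abc" then fails at "c".
;; The paren has already run by then; PARSE does not undo it.
consumed: no
r2: parse "abc" [opt ["ab" (consumed: yes)] "abc"]

print [r1 peeked r2 consumed]   ; true true false true
```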
>> f: t: func [w][print ["--> " w " <--"]]
>> parse "aabbcc" ["aa" s: "bb" (f s) "cc"]
--> bbcc <--
--> bb <--
s: is just a marker of the position in the whole string. Try using something like parse "aabbcc" ["aa" s: "bb" e: (f copy/part s e) "cc"]
f: func [a b][print ["--> " a b " <--"]]?
skip for that, as an alternate rule. But parse can also move "faster", using the to or thru keywords. In R3 and Red (in contrast to R2), something like to [a | b | c] should work too. But beware of what you get. Do some printing like you already do; that keeps you learning...
Are a and b the start and end (like s, e)?
>> parse "aabbcc" ["aa" s: "bb" e: (f copy/part s e) "cc"]
*** Script Error: f is missing its b argument
f: func [a b][print ["--> " copy/part a b " <--"]]
parse "aabbcc" ["aa" s: "bb" e: (f s e) "cc"]
>> parse "aabbcc" ["aa" copy b "bb" (probe b) "cc"]
"bb"
== true
>> b
== "bb"
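A short sketch of the difference between a set-word marker and COPY in a rule, using the same "aabbcc" input as above (Red 0.6.x assumed; s, e and b are just the names used in this thread).

```red
Red [Title: "marker vs copy"]

;; A set-word inside a rule only records a position in the input series.
parse "aabbcc" ["aa" s: "bb" e: "cc"]
print mold s                  ; "bbcc"  (everything from the marker on)
print mold copy/part s e      ; "bb"    (only what lies between the markers)

;; COPY word rule extracts the matched part in one step.
parse "aabbcc" ["aa" copy b "bb" "cc"]
print mold b                  ; "bb"
```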
page: read https://medium.com/topic/visual-design
parse page [
some [
thru {<h3 class="ap} thru {>} [
{<a href="} copy url to {"}
thru {>} copy text to {</a}
( print [text lf url lf] )
| 1 skip
]
]
]
{}?
parse "aabbcc" ["aa" s: "bb" e: (print copy/part e s)]
e is data and s is a digit?
>> copy/part "aabbcc" 4 ; first argument is the data, second is how many items to copy
== "aabb"
>> s: "abc"
== "abc"
>> index? s
== 1
>> e: next s
== "bc"
>> index? e
== 2
>> copy/part s e
== "a"
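A sketch of the two /part forms of copy, assuming Red 0.6.x semantics as shown in the console sessions in this thread: an integer means "how many", a series position means "up to there". The word names are illustrative.

```red
Red [Title: "copy/part with a count or a position"]

s: "aabbcc"

;; An integer means "this many items".
by-count: copy/part s 4            ; "aabb"

;; A series position means "up to here".
e: skip s 4
by-position: copy/part s e         ; "aabb"

;; The two positions can be given in either order.
reversed: copy/part e s            ; "aabb"

print [by-count by-position reversed]
```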
copy/part "aabbcc" 4
parse. You're doing well, but learning how series work is key, because parse operates on them (though in a special way).
>> d: [aa bb cc]
== [aa bb cc]
>> tail d
== []
>> head d
== [aa bb cc]
copy/part syntaxes as shortcuts, either way. They are two *ways* to do it, either of which may be more convenient in a given circumstance. That is, sometimes you know how *many* items you want to copy, and sometimes you know a position in a series you want to copy *to*.
s1: "aabbccddee"
s2: next next s1
>> copy/part s2 s1
== "aa"
>> copy/part s1 s2
== "aa"
s1: "abcdef"
list keyword that enables a shortcut for writing, e.g., element any [separator element] as list element separator.
>> do %topaz-parse.red
>> list: rule [element separator] [element any [separator element]]
>> topaz-parse "a,a,a,a" [list #"a" #","]
== #"a"
>> topaz-parse "a,a,a,a" [copy list #"a" #","]
== "a,a,a,a"
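Without Topaz parse, the same list shape can be written directly in Red PARSE. A sketch under the assumption that elements are single lowercase letters separated by commas; element, separator and list-rule are hypothetical names, not built-ins.

```red
Red [Title: "element any [separator element]"]

element:   charset [#"a" - #"z"]
separator: #","
list-rule: [element any [separator element]]   ; one element, then (separator element) pairs

r1: parse "a,b,c" list-rule          ; true
r2: parse "a,,b" list-rule           ; false: empty element between commas
r3: parse "a,b," list-rule           ; false: trailing separator

;; The same shape with COLLECT/KEEP extracts the elements.
items: parse "a,b,c" [collect [keep element any [separator keep element]]]
probe items                          ; [#"a" #"b" #"c"]
```

When an ANY iteration fails halfway (a separator with no element after it), the input goes back to where that iteration started, which is why the trailing comma is left unconsumed and PARSE returns false.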
alphabet: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]
tg: [{<} opt {/} thru {>} opt lf]
parse-trace a [any tg some alphabet any tg]
a: {<root>
<lots>
<lot>
<name>Foo1</name>
<price>100</price>
</lot>
<lot>
<name>Bar1</name>
<price>202</price>
</lot>
</lots>
</root>}
Bar1
parse a [any [s: tg e: (print ["tag:" copy/part s e]) | s: some alphabet e: (print ["txt: " copy/part s e]) | skip]]
>> tg: [{<} opt {/} thru {>}]
== ["<" opt "/" thru ">"]
>> parse a [any [s: tg e: (print ["tag:" copy/part s e]) | s: some alphabet e: (print ["txt: " copy/part s e])| skip]]
tag: <root>
tag: <lots>
tag: <lot>
tag: <name>
txt: Foo1
tag: </name>
tag: <price>
txt: 100
tag: </price>
tag: </lot>
tag: <lot>
tag: <name>
txt: Bar1
tag: </name>
tag: <price>
txt: 202
tag: </price>
tag: </lot>
tag: </lots>
tag: </root>
== true
parse a [any [tg | some alphabet | skip]]
parse a [any tg some alphabet any tg ]
tg is described as:
["<" opt "/" thru ">" opt lf]
a: {<root>
<lots>
</lots>
</root>}
parse-trace, because I do not see how print can help me here.
parse-trace a [any [tg | some alphabet | tg ] ]
With skip at the end of the rule, it all works. But I do not understand what tg does there. And this part of the rule should not work: some alphabet | tg
>> a: {<root>
{ <lots>
{ </lots>
{ </root>}
== {<root>^/<lots> ^/</lots>^/</root>}
parse a [any [tg | some alphabet | skip]]
skip, but while I am learning, I want to declare everything explicitly.
a: { ... }?
>> a: {<root>^/<lots> ^/</lots>^/</root>}
== {<root>^/<lots> ^/</lots>^/</root>}
>> b: load a
== [<root>
<lots>
</lots>
</root>
]
>> parse b [some tag!]
== true
load any random input and have it work.
tg definition, unless it really is part of the tag. Then, in the rule which contains tgs, you can define where your whitespace can appear. For example, instead of using | skip (which is like "something else can appear here, but I don't know what"), you could have | some whitespace
whitespace: [crlf | lf | cr (print "how old is this text file?!") | tab | space]
before a #"^/" followed by a string!; b) insert a </p> after a string! followed by a #"^/".
doctree: [
[<h3> ["Preparation for Code Generation"] </h3>]
#"^/"
#"^/"
{Before code can be generated, it is generally necessary to manipulate and change the internal program in some way. Runtime storage must be allocated to variables. In FORTRAN, COMMON and EQUIVALENCE statements must be processed. One important point included here is the optimization of the program in order to reduce the execution time of the object program.}
#"^/"
#"^/"
[<h3> ["Code Generation"] </h3>]
#"^/"
#"^/"
{This is the actual translation of the internal source program into assembly language or machine language. This is perhaps the messiest and most detailed part, but the easiest to understand. Assuming we have an internal form of quadruples as outlined above, we generate code for each quadruple in order. For the three quadruples listed above we could generate, on the IBM 360, the assembly language}
#"^/"
#"^/"
]
parse doctree [
any [
to #"^/" string! insert <p>
| thru string! #"^/" insert </p> insert <p>
| skip ]
]
to .. then it does not skip thru it, so the first line always fails. @cloutiy to [#"^/" string!].
parse doctree [ any [
to [#"^/" string!] insert <p> to [string! #"^/"] thru [string! #"^/"] insert </p>
| skip ] ]
remove-each item doctree [ item = #"^/" ]
to [string! #"^/"].
parse a [any [s: tg e: (checktag copy/part e s ) | some alphabet | tg ] ]
checktag: func [tag] [
if find tag "<lots>" [print [tag " - " length? tag]]
]
>> parse a [any [s: tg e: (checktag copy/part e s ) | some alphabet | tg ] ]
<lots> - 7
== true
- 7 on one line.
trim, but did not get the result:
parse a [any [s: tg e: (checktag trim copy/part e s ) | some alphabet | tg ] ]
? trim ;)
lf
for simplified debugging?
print mold
probe
parse a [thru "<lots>" s: to "</lots>" e: (checktag copy/part s e)]
checktag: func [tag] [
tag: trim/lines tag
any [tg (print "hello")]
]
a: {<root>
<id>19160099</id>
<purchaseNumber>0373200101018000262</purchaseNumber>
<lots>
<lot>
<name>Foo1</name>
<price>101</price>
</lot>
<lot>
<name>Bar2</name>
<price>201</price>
</lot>
<lot>
<name>Baz3</name>
<price>302</price>
</lot>
</lots>
</root>}
In your checktag function there is regular Red code, not the parse dialect. checktag returns the result of the any (native!) function, not of the parse dialect's keyword.
checktag: [some [not </lots> [remove [#" " | #"^/"] | skip]]]
parse a [thru <lots> checktag </lots>]
print a
<root>
<id>19160099</id>
<purchaseNumber>0373200101018000262</purchaseNumber>
<lots><lot><name>Foo1</name><price>101</price></lot><lot><name>Bar2</name><price>201</price></lot><lot><name>Baz3</name><price>302</price></lot></lots>
</root>
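Following the earlier advice to say where whitespace may appear instead of ending a rule with | skip, here is a sketch with a whitespace charset. It assumes space, tab, LF and CR are the only whitespace of interest; ws, r1 and r2 are illustrative names. With no catch-all SKIP, stray text makes PARSE fail rather than being silently swallowed.

```red
Red [Title: "explicit whitespace instead of skip"]

ws: charset " ^-^/^M"                ; space, tab, LF, CR
tg: [#"<" opt #"/" thru #">"]

;; Only tags and whitespace are allowed anywhere in the input.
r1: parse {<root>^/<lots> ^/</lots>^/</root>} [any [tg | some ws]]   ; true

;; "oops" matches neither branch, ANY stops there, and the input
;; is not exhausted, so PARSE returns false.
r2: parse {<root>^/oops^/</root>} [any [tg | some ws]]               ; false

print [r1 r2]
```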
>> parse "<bb><aa><bb><aa><aa>" [ any [to "<aa>" s: thru "<aa>" e: (print copy/part e s) ] ]
<aa>
<aa>
<aa>
tags, but not ""
>> print parse "<bb><aa><bb><aa><aa>" [collect any [keep <aa> | skip] ]
<aa> <aa> <aa>
>> parse "<bb><aa><bb><aa><aa>" [any [copy _ <aa> (print _) | skip] ]
<aa>
<aa>
<aa>
== true
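The collect/keep approach above also works for the earlier tag/text loop. A sketch, assuming Red 0.6.x and a cut-down sample of the lots XML; tokens is an illustrative name.

```red
Red [Title: "collect tags and text"]

alphabet: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]
tg: [#"<" opt #"/" thru #">"]

xml: {<lot>
<name>Foo1</name>
</lot>}

;; Each iteration tries a tag, then a run of text, then falls back to SKIP
;; (only the newlines get there). KEEP stores the text its rule matched.
tokens: parse xml [collect [any [keep tg | keep some alphabet | skip]]]
probe tokens   ; ["<lot>" "<name>" "Foo1" "</name>" "</lot>"]
```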
NAMES: copy []
[] probably. Have you [read](https://github.com/red/red/wiki/%5BDOC%5D-Why-you-have-to-copy-series-values) about the importance of copy?
list: ["Abel" "Cain" "Seth"]
add-names: func [/local names][names: [] append names list]
add-names ;== ["Abel" "Cain" "Seth"]
add-names ;== ["Abel" "Cain" "Seth" "Abel" "Cain" "Seth"]
add-names ;== ["Abel" "Cain" "Seth" "Abel" "Cain" "Seth" "Abel" "Cain" "Seth"]
;-----------
add-names: func [/local names][names: copy [] append names list]
add-names ;== ["Abel" "Cain" "Seth"]
add-names ;== ["Abel" "Cain" "Seth"]
add-names ;== ["Abel" "Cain" "Seth"]
parse "<bb><aa>123</aa><bb><aa>642</aa>" [ any [ thru "<aa>" copy x to "</aa>" (append vals x) ] ]
>> to-json vals
== {["123","642"]}
{
"vals": [123, 642]
}
>> data: [name [123 642]]
== [name [123 642]]
>> data/name
== [123 642]
>> data/name/1
== 123
>> select data 'name
== [123 642]
>> pick select data 'name 1
== 123
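The desired JSON above has numbers rather than strings. A sketch of one way to get there, assuming Red 0.6.4 or later (for to-json): KEEP with a paren keeps the paren's value, so LOAD can convert each captured string; to-integer would also do if only integers are expected. compose/only is used because inside an object spec the word vals would be bound to the new object's own field, not to the outer block.

```red
Red [Title: "numbers for JSON"]

xml: "<bb><aa>123</aa><bb><aa>642</aa>"

;; COPY captures the text between the tags; KEEP (paren) keeps the paren's
;; result, so LOAD turns "123" into the integer 123 before it is collected.
vals: parse xml [collect [any ["<aa>" copy n to "</aa>" keep (load n) | skip]]]
probe vals                                          ; [123 642]

;; COMPOSE/ONLY puts the block itself into the object spec.
print to-json object compose/only [vals: (vals)]    ; {"vals":[123,642]}
```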
print json: rejoin collect [
vals: []
parse "<bb><aa>123</aa><bb><aa>642</aa>" [
collect into vals any [ thru "<aa>" keep to "</aa>" ]
]
keep {^{^/ "vals": [}
forall vals [
if not head? vals [
insert vals comma
vals: next vals
]
]
keep rejoin vals
keep "]^/}"
]
{
"vals": [123,642]
}
? to-json
>> print to-json object [vals: [123 345]]
{"vals":[123,345]}
>> print to-json/pretty object [vals: [123 345]] " "
{
"vals": [
123,
345
]
}
xml: "<bb><aa>123</aa><bb><aa>642</aa>"
print to-json object [vals: parse xml [collect [any [<aa> keep to </aa> | skip]]]]
{"vals":["123","642"]}
copy _ to </aa> keep (load _)
What is load doing here?
to-integer might be used too, of course.
prices: []
names: []
parse a [thru "<lots>"
collect [
any [
ws |
some [
collect set prices any [ thru "<price>" keep to "</price>" | skip ]
collect set names any [ thru "<name>" keep to "</name>" | skip ]
]
]
]
"</lots>" ]
>> to-json object [_prices: :prices]
== {{"_prices":["101","201","302"]}}
>> to-json object [_names: :names]
== {{"_names":[]}}
a: {<root>
<id>19160099</id>
<purchaseNumber>0373200101018000262</purchaseNumber>
<lots>
<lot>
<name>Foo1</name>
<price>101</price>
</lot>
<lot>
<name>Bar2</name>
<price>201</price>
</lot>
<lot>
<name>Baz3</name>
<price>302</price>
</lot>
</lots>
</root>}
some and nested rules should process both sub-rules.
What does any [ thru "" keep to " " | skip ] do?
parse a [
thru <lots>
some [
</lots> to end
| collect into prices [ "<price>" keep to "</price>" ]
| collect into names [ "<name>" keep to "</name>" ]
| skip
]
]
to-json object [_prices: :prices]
;== {{"_prices":["302","201","101"]}}
to-json object [_names: :names]
;== {{"_names":["Baz3","Bar2","Foo1"]}}
The thru-to pair is tricky and treacherous, especially combined with the any or some quantifiers.
>> o: object [digit: charset [#"0" - #"9"]]
== make object! [
digit: make bitset! #{000000000000FFC0}
]
>> oo: make o [alpha: charset "abc..."]
== make object! [
digit: make bitset! #{000000000000FFC0}
alpha: make bitset! #{00000000000200000000000070}
]
collect into to collect set: will set create a new variable that is cleared on every iteration (because it is inside some)?
>> parse "aabbccddaa" [some [collect into b keep "aa" | skip ] ]
== true
>> b
== ["aa" "aa"]
>> parse "aabbccddaa" [some [collect set b keep "aa" | skip ] ]
== true
>> b
== ["aa"]
Inside the some loop, your b is being re-set to the actual value. I am not used to using set; I prefer more freedom in terms of the paren (code).
thru is breaking the collecting of both price and names (without thru both of them are collected).
id: []
prices: []
names: []
parse a [
; collect into id [thru "<id>" keep to "</id>"]
thru "<lots>"
any [
collect into prices [thru "<price>" keep to "</price>"] |
collect into names [ thru "<name>" keep to "</name>" ] |
skip
]
"</lots>" ]
to-json object [_prices: :prices]
to-json object [_names: :names]
>> to-json object [_prices: :prices]
== {{"_prices":["302","201","101"]}}
>> to-json object [_names: :names]
== {{"_names":[]}}
data: object [
id: []
prices: []
names: []
]
parse a [
collect into id [thru "<id>" keep to "</id>"]
thru "<lots>"
any [
collect into data/prices [thru "<price>" keep to "</price>"] |
collect into data/names [ thru "<name>" keep to "</name>" ] |
skip
]
"</lots>" ]
to-json object [data]
Script Error: PARSE - unexpected end of rule after: collect
thru above should be removed, because they are part of a different question
thru is breaking collecting
With thru you are jumping to the next price on each iteration, and the second rule gets its chance only when there are no more price tags (alas, no more name tags either), but without thru the rules advance orderly, by little steps.
thru... If I understand Toomas correctly, keep to is working, but thru forces a jump to the next price block.
> data: object [
>     id: []
>     prices: []
>     names: []
> ]
> parse a [
>     collect into id [thru "<id>" keep to "</id>"]
>     thru "<lots>"
>     any [
>         collect into data/prices [thru "<price>" keep to "</price>"] |
>         collect into data/names [ thru "<name>" keep to "</name>" ] |
>         skip
>     ]
>     "</lots>" ]
>
> to-json object [data]
>
Script Error: PARSE - unexpected end of rule after: collect
collect into object, I will be thankful.
[ "<price>" copy price to "</price>"] (append data/prices price ) |
collect
I, myself, am mentally stuck with the R2 parse and doing stuff in parens, like you just did :-), but here's the code, which seems to work:
id: []
prices: []
names: []
parse a [
collect into id [thru "<id>" keep to "</id>"]
thru "<lots>"
any [
thru [
"<price>" collect into prices keep to "</price>"
| "<name>" collect into names keep to "</name>"
] | skip
]
to end
]
== true
>> id
== ["19160099"]
>> prices
== ["302" "201" "101"]
>> names
== ["Baz3" "Bar2" "Foo1"]
thru an alternating rule. Not two of them, just one with options. 2) The problem also seems to be your objects. It seems parse can't use something like collect into data/prices; most probably it considers it a path. When I moved those subobjects out of the data object, it seems to work...
parse yet.
{{"id":["19160099"],"lots":[ {name: "", price: "" } ] }}
data: object [
id: []
lots: object [
prices: []
names: []
]
]
but `to-json` generates `lots` not as an array, but as a dict:
id: []
coll: []
parse a [
collect into id [
thru <id> keep to </id>
]
thru <lots>
collect into coll any [
</lots> to end
| <price> copy p to </price> thru <name> copy n to </name>
keep (object compose [price: (p) name: (n)])
| skip
]
]
to-json object compose/only [id: (id) lots: (coll)]
;== {{"id":["19160099"],"lots":[{"price":"101","name":"Bar2"},{"price":"201","name":"Baz3"}]}}
parse on a text file using collect to give me a block!. The output is as follows:
[
    chapter "Title"
    p ["Some text with @i[inline @b[formatting]]"]
    p ["Another paragraph"]
    h2 "A Level 2 Heading"
]
parse, using different rules. Specifically, the block! following 'p is just a string!. What I want to do is find every sequence of 'p block!, parse the string! that is in the block, then replace that block with the result of parsing its contents (collected using collect).
into? Something along the lines of:
parse doctree rules: [
'p mark: into block! (
poke mark parse mark collect [
the.rules.to.parse.the.paragraph.string ] )
]
ahead block! change into [set s string!] (do something with s, return new result)
ahead block! into [set s string! (modify s buffer in place)] if you still need the block
data: object [
id: []
lots: [
]
]
parse a [
thru "<lots>"
collect any [
"<price>" copy p to "</price>" thru "<name>" copy n to "</name>" ( append data/lots object compose [ price: (p) name: (n) ] ) | skip
]
"</lots>"
]
to-json data
{{"id":[],"lots":[{"price":"101","name":"Bar2"},{"price":"201","name":"Baz3"}]}}, which is not the correct result. Homework exercise still not done :) You should get [{"price":"101","name":"Foo1"},{"price":"201","name":"Bar2"},{"price":"302","name":"Baz3"}]
No need for collect if you are not keeping anything.
data: object [
id: []
lots: [
maxPrice: []
purchaseObjects: [
]
]
]
append data/lots/purchaseObjects object compose [ price: (2 + 2) name: ("Mike") ]
make object! [
id: []
lots: [
maxPrice: []
purchaseObjects: [make object! [
price: 4
name: "Mike"
]]
]
]
make object! here...
Is to-json to-block data a good idea?
{["id:",[],"lots:",["maxPrice:",[],"purchaseObjects:",[{"price":4,"name":"Mike"}]]]}
object. Everything else is the same.
to-block in my code, because in my variant of the code I use object compose for creating {"price":4,"name":"Mike"}. But now it's not a problem for me. This function is doing what I want.
data as a block, if you don't want it to be an object:
data: [
id: []
lots: [
maxPrice: []
purchaseObjects: [
]
]
]
a: {<root>
<id>19160099</id>
<purchaseNumber>0373200101018000262</purchaseNumber>
<lot>
<maxPrice>8186313.66</maxPrice>
<purchaseObjects>
<purchaseObject>
<name>Foo1</name>
<price>111</price>
</purchaseObject>
<purchaseObject>
<name>Bar2</name>
<price>222</price>
</purchaseObject>
<purchaseObject>
<name>Baz3</name>
<price>333</price>
</purchaseObject>
</purchaseObjects>
</lot>
</root>}
data: [
id: []
lots: [
maxPrice: []
purchaseObjects: [
]
]
]
parse a [
thru "<id>" copy id to "</id>" (append data/id id )
thru "<purchaseObjects>"
collect any [
"<price>" copy p to "</price>" thru "<name>" copy n to "</name>" ( append data/lots/purchaseObjects object compose [ price: (p) name: (n) ] ) | skip
]
"</purchaseObjects>"
]
write %file.txt to-json data
["id:",["19160099"],"lots:",["maxPrice:",[],"purchaseObjects:",[{"price":"111","name":"Bar2"},{"price":"222","name":"Baz3"}]]]
"id:",["19160099"], instead of "id:": ["19160099"], etc.
to-block, the result would be exactly as if you had made data into a block in the first place.
data: object [
id: []
lots: [
maxPrice: []
purchaseObjects: []
]
]
parse a [
(clear data/id clear data/lots/purchaseObjects)
thru "<id>" copy id to "</id>" (append data/id id )
thru "<purchaseObjects>"
collect any [
"<name>" copy n to "</name>" thru "<price>" copy p to "</price>" (
append data/lots/purchaseObjects object compose [ price: (p) name: (n) ]
)
| skip
]
"</purchaseObjects>"
]
probe to-json data
parse "<bb><aa><bb><aa><aa>" [any [copy _ <aa> (print _) | skip] ]
parse "<bb><aa><bb><aa><aa>" [any [to _ <aa> (print _) ] ]
_ is nothing special, just an ordinary word, so to _ is like to my-word-without-a-value, hence it will fail.
>> parse "<bb><aa><bb><aa><aa>" [any [to _ <aa> (print _) ] ]
*** Script Error: PARSE - invalid rule or usage of rule: _
*** Where: parse
*** Stack:
_ is unset (and PARSE expects this word to hold a rule).
unset! but "", but if you haven't (in a fresh console), then it has no value.
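Since to and thru need a rule after them, a word only works there if it refers to one. A sketch assuming Red 0.6.x; aa, at-match, past-match, t and found are illustrative names. It also shows that TO leaves the input just before a match and THRU just after it.

```red
Red [Title: "to / thru with a word"]

aa: "<aa>"          ; a word used after TO or THRU must hold a rule; here a string

;; TO stops just before the match, THRU just after it.
parse "<bb><aa>" [to aa at-match: thru aa past-match:]
probe at-match                  ; "<aa>"
probe tail? past-match          ; true

;; Collect every occurrence: TO finds the next one, COPY takes it.
found: copy []
parse "<bb><aa><bb><aa><aa>" [any [to aa copy t aa (append found t)] to end]
probe found                     ; ["<aa>" "<aa>" "<aa>"]
```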