Normal Api » History » Version 5
iri, 02/24/2011 12:23 AM
1 | 1 | iri | h1. Normal Api |
---|---|---|---|
2 | |||
3 | These functions are more complex. If your work does not require them, do not use them. |
||
4 | |||
5 | First, the following flags are available: |
||
6 | |||
7 | h3. CompileFlags : |
||
8 | |||
9 | * PCRE_REGEX_CASELESS |
||
10 | Letters in the pattern match both upper- and lowercase letters. This option can be changed within a pattern by a "(?i)" option setting. |
||
11 | * PCRE_REGEX_MULTILINE |
||
12 | By default, GRegex treats the string as consisting of a single line of characters (even if it actually contains newlines). The "start of line" metacharacter ("^") matches only at the start of the string, while the "end of line" metacharacter ("$") matches only at the end of the string, or before a terminating newline (unless PCRE_REGEX_DOLLAR_ENDONLY is set). When PCRE_REGEX_MULTILINE is set, the "start of line" and "end of line" constructs match immediately following or immediately before any newline in the string, respectively, as well as at the very start and end. This can be changed within a pattern by a "(?m)" option setting. |
||
13 | * PCRE_REGEX_DOTALL |
||
14 | A dot metacharacter (".") in the pattern matches all characters, including newlines. Without it, newlines are excluded. This option can be changed within a pattern by a "(?s)" option setting. |
||
15 | * PCRE_REGEX_EXTENDED |
||
16 | Whitespace data characters in the pattern are totally ignored except when escaped or inside a character class. Whitespace does not include the VT character (code 11). In addition, characters between an unescaped "#" outside a character class and the next newline character, inclusive, are also ignored. This can be changed within a pattern by a "(?x)" option setting. |
||
17 | * PCRE_REGEX_ANCHORED |
||
18 | The pattern is forced to be "anchored", that is, it is constrained to match only at the first matching point in the string that is being searched. This effect can also be achieved by appropriate constructs in the pattern itself such as the "^" metacharacter. |
||
19 | * PCRE_REGEX_DOLLAR_ENDONLY |
||
20 | A dollar metacharacter ("$") in the pattern matches only at the end of the string. Without this option, a dollar also matches immediately before the final character if it is a newline (but not before any other newlines). This option is ignored if PCRE_REGEX_MULTILINE is set. |
||
21 | * PCRE_REGEX_UNGREEDY |
||
22 | Inverts the "greediness" of the quantifiers so that they are not greedy by default, but become greedy if followed by "?". It can also be set by a "(?U)" option setting within the pattern. |
||
23 | * PCRE_REGEX_RAW |
||
24 | Usually strings must be valid UTF-8; with this flag they are treated as a raw sequence of bytes. |
||
25 | * PCRE_REGEX_NO_AUTO_CAPTURE |
||
26 | Disables the use of numbered capturing parentheses in the pattern. Any opening parenthesis that is not followed by "?" behaves as if it were followed by "?:" but named parentheses can still be used for capturing (and they acquire numbers in the usual way). |
||
27 | * PCRE_REGEX_OPTIMIZE |
||
28 | Optimize the regular expression. If the pattern will be used many times, then it may be worth the effort to optimize it to improve the speed of matches. |
||
29 | * PCRE_REGEX_DUPNAMES |
||
30 | Names used to identify capturing subpatterns need not be unique. This can be helpful for certain types of pattern when it is known that only one instance of the named subpattern can ever be matched. |
||
31 | * PCRE_REGEX_NEWLINE_CR |
||
32 | Usually any newline character is recognized; if this option is set, the only recognized newline character is '\r'. |
||
33 | * PCRE_REGEX_NEWLINE_LF |
||
34 | Usually any newline character is recognized; if this option is set, the only recognized newline character is '\n'. |
||
35 | * PCRE_REGEX_NEWLINE_CRLF |
||
36 | Usually any newline character is recognized; if this option is set, the only recognized newline character sequence is '\r\n'. |
||
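To make the behaviour of the most common compile flags concrete, here is a minimal sketch. It uses Python's standard @re@ module purely as an analogy: @re.IGNORECASE@, @re.MULTILINE@, @re.DOTALL@ and @re.VERBOSE@ are assumed to behave like PCRE_REGEX_CASELESS, PCRE_REGEX_MULTILINE, PCRE_REGEX_DOTALL and PCRE_REGEX_EXTENDED for these simple patterns. It is not the Scol API described on this page, and the sample text is invented.

```python
import re

text = "Hello\nworld"

# Caseless: "hello" also matches "Hello" (compare PCRE_REGEX_CASELESS / "(?i)").
caseless = re.search(r"hello", text, re.IGNORECASE)

# Multiline: "^" also matches right after a newline (compare PCRE_REGEX_MULTILINE).
single_line = re.findall(r"^\w+", text)
multi_line = re.findall(r"^\w+", text, re.MULTILINE)

# Dotall: "." also matches a newline (compare PCRE_REGEX_DOTALL).
without_dotall = re.search(r"Hello.world", text)
with_dotall = re.search(r"Hello.world", text, re.DOTALL)

# Extended: unescaped whitespace and "#" comments inside the pattern
# are ignored (compare PCRE_REGEX_EXTENDED).
extended = re.search(
    r"""
    (\w+)   # first word
    \n      # the newline between the words
    (\w+)   # second word
    """,
    text,
    re.VERBOSE,
)

print(single_line, multi_line, extended.groups())
```

Without the multiline flag only the first word is found by "^"; with it, the word after the newline is found as well.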
37 | |||
38 | h3. MatchFlags : |
||
39 | |||
40 | * PCRE_MATCH_ANCHORED |
||
41 | The pattern is forced to be "anchored", that is, it is constrained to match only at the first matching point in the string that is being searched. This effect can also be achieved by appropriate constructs in the pattern itself such as the "^" metacharacter. |
||
42 | * PCRE_MATCH_NOTBOL |
||
43 | Specifies that the first character of the string is not the beginning of a line, so the circumflex metacharacter should not match before it. Setting this without PCRE_REGEX_MULTILINE (at compile time) causes circumflex never to match. This option affects only the behaviour of the circumflex metacharacter; it does not affect "\A". |
||
44 | * PCRE_MATCH_NOTEOL |
||
45 | Specifies that the end of the subject string is not the end of a line, so the dollar metacharacter should not match it nor (except in multiline mode) a newline immediately before it. Setting this without PCRE_REGEX_MULTILINE (at compile time) causes dollar never to match. This option affects only the behaviour of the dollar metacharacter; it does not affect "\Z" or "\z". |
||
46 | * PCRE_MATCH_NOTEMPTY |
||
47 | An empty string is not considered to be a valid match if this option is set. If there are alternatives in the pattern, they are tried. If all the alternatives match the empty string, the entire match fails. For example, if the pattern "a?b?" is applied to a string not beginning with "a" or "b", it matches the empty string at the start of the string. With this flag set, this match is not valid, so GRegex searches further into the string for occurrences of "a" or "b". |
||
48 | * PCRE_MATCH_PARTIAL |
||
49 | Turns on the partial matching feature. For more information on partial matching, see the GLib documentation. |
||
50 | * PCRE_MATCH_NEWLINE_CR |
||
51 | Overrides the newline definition set when creating a new GRegex, setting the '\r' character as line terminator. |
||
52 | * PCRE_MATCH_NEWLINE_LF |
||
53 | Overrides the newline definition set when creating a new GRegex, setting the '\n' character as line terminator. |
||
54 | * PCRE_MATCH_NEWLINE_CRLF |
||
55 | Overrides the newline definition set when creating a new GRegex, setting the '\r\n' characters as line terminator. |
||
56 | * PCRE_MATCH_NEWLINE_ANY |
||
57 | Overrides the newline definition set when creating a new GRegex; any newline character or character sequence is recognized. |
||
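The "a?b?" example given for PCRE_MATCH_NOTEMPTY, and the effect of PCRE_MATCH_ANCHORED, can be sketched as follows. Python's @re@ module has no NOTEMPTY flag, so the helper @search_not_empty@ is a hypothetical approximation that simply skips empty matches; @match()@ versus @search()@ stands in for anchored versus unanchored matching. None of this is the Scol API itself.

```python
import re

pattern = re.compile(r"a?b?")
subject = "xxab"

# Default behaviour: the empty string at position 0 is a valid match.
first = pattern.search(subject)

def search_not_empty(compiled, string):
    """Approximate NOTEMPTY: return the first non-empty match, or None."""
    for match in compiled.finditer(string):
        if match.end() > match.start():
            return match
    return None

not_empty = search_not_empty(pattern, subject)

# Anchored matching: the match must begin at the starting position.
b_pattern = re.compile(r"b")
anchored = b_pattern.match("ab")     # None: "b" is not at position 0
unanchored = b_pattern.search("ab")  # found at position 1

print(first.span(), not_empty.span(), anchored, unanchored.span())
```

The default search stops at the empty match at the start of the string, while the NOTEMPTY-style search continues until it reaches "ab".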
58 | |||
59 | 2 | iri | The explanations of the flags above are taken from the GLib documentation. |
60 | 1 | iri | |
61 | 2 | iri | h3. AlgoFlag |
62 | 1 | iri | |
63 | 2 | iri | Matching algorithm: |
64 | 3 | iri | * PCRE_MATCH_STANDARD |
65 | 4 | iri | Standard algorithm (the default) |
66 | 3 | iri | * PCRE_MATCH_DFA |
67 | DFA (Deterministic Finite Automaton) algorithm |
||
68 | 2 | iri | |
69 | The functions are listed below. |
||
70 | |||
71 | 1 | iri | h2. _pcreNormalMatch |
72 | |||
73 | Scans the string for a match of the pattern. |
||
74 | |||
75 | 2 | iri | Prototype : *fun [S S I I I I] [[S I I] r1]* |
76 | |||
77 | # S : pattern |
||
78 | # S : string |
||
79 | # I : compileflag (see above), or nil for none |
||
80 | # I : start, the index in the string at which matching starts |
||
81 | # I : matchflag (see above), or nil for none |
||
82 | # I : algoflag (see above) |
||
83 | 5 | iri | |
84 | 2 | iri | +Return+ : [[S I I] r1] : a list in which each element is a tuple: the matched text, the start position where it was found, and the end position; |
85 | nil on error. |
||
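The shape of the return value can be pictured with a small Python model. The function @normal_match_sketch@ is hypothetical and only mimics the documented result (a list of tuples, or nil on error) with Python's @re@ module. It assumes that every successive match is returned and that the end position is exclusive; neither point is stated above. The matchflag and algoflag arguments are left out.

```python
import re

def normal_match_sketch(pattern, string, flags=0, start=0):
    """Return [(matched_text, start_pos, end_pos), ...], or None on error."""
    try:
        compiled = re.compile(pattern, flags)
    except re.error:
        # Models the documented "nil on error" for an invalid pattern.
        return None
    return [
        (match.group(0), match.start(), match.end())
        for match in compiled.finditer(string, start)
    ]

print(normal_match_sketch(r"\d+", "a12 b345"))
print(normal_match_sketch(r"\d+", "a12 b345", start=3))
print(normal_match_sketch(r"(", "a12 b345"))
```

The second call starts scanning at index 3, so the earlier match is skipped; the third call passes an invalid pattern and yields the error value.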
86 | |||
87 | h2. _pcreNormalSplit |
||
88 | |||
89 | Breaks the string on the pattern, and returns a list of the tokens. |
||
90 | |||
91 | Prototype : *fun [S S I I I I] [S r1]* |
||
92 | |||
93 | # S : pattern |
||
94 | # S : string |
||
95 | # I : compileflag (see above), or nil for none |
||
96 | # I : start, the index in the string at which matching starts |
||
97 | # I : matchflag (see above), or nil for none |
||
98 | # I : max, the maximum number of tokens to split the string into. If this is less than 1, the string is split completely |
||
99 | 5 | iri | |
100 | 2 | iri | +Return+ : [S r1] : the list of tokens, or nil on error. |
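The meaning of the max argument can be sketched in Python. The function @normal_split_sketch@ is hypothetical and built on Python's @re@ module, not on the Scol API. Note that Python's @maxsplit@ counts splits rather than tokens, so it is set to max - 1 here. The start and matchflag arguments are left out of the sketch.

```python
import re

def normal_split_sketch(pattern, string, flags=0, max_tokens=0):
    """Split string on pattern into at most max_tokens tokens.

    A max_tokens value below 1 splits the string completely.
    Returns None if the pattern is invalid.
    """
    try:
        compiled = re.compile(pattern, flags)
    except re.error:
        return None
    if max_tokens < 1:
        return compiled.split(string)
    if max_tokens == 1:
        # A single token means no split at all (maxsplit=0 would mean "no limit").
        return [string]
    return compiled.split(string, maxsplit=max_tokens - 1)

print(normal_split_sketch(r",\s*", "a, b,c, d"))
print(normal_split_sketch(r",\s*", "a, b,c, d", max_tokens=2))
```

With max set to 2, the string is cut once and the remainder is kept whole as the second token.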
101 | |||
102 | h2. _pcreNormalReplace |
||
103 | |||
104 | Replaces all occurrences of the pattern in string with a replacement text. |
||
105 | |||
106 | Prototype : |
||
107 | |||
108 | Return [[API]] |