PCRE2CONVERT(3)            Library Functions Manual            PCRE2CONVERT(3)

NAME
       PCRE2 - Perl-compatible regular expressions (revised API)

EXPERIMENTAL PATTERN CONVERSION FUNCTIONS

       This document describes a set of functions that can be used to convert
       "foreign" patterns into PCRE2 regular expressions. This facility is
       currently experimental, and may be changed in future releases. Two
       kinds of pattern, globs and POSIX patterns, are supported.

THE CONVERT CONTEXT

       pcre2_convert_context *pcre2_convert_context_create(
         pcre2_general_context *gcontext);

       pcre2_convert_context *pcre2_convert_context_copy(
         pcre2_convert_context *cvcontext);

       void pcre2_convert_context_free(pcre2_convert_context *cvcontext);

       int pcre2_set_glob_escape(pcre2_convert_context *cvcontext,
         uint32_t escape_char);

       int pcre2_set_glob_separator(pcre2_convert_context *cvcontext,
         uint32_t separator_char);

       A convert context is used to hold parameters that affect the way that
       pattern conversion works. Like all PCRE2 contexts, you need to use a
       context only if you want to override the defaults. There are the usual
       create, copy, and free functions. If custom memory management functions
       are set in a general context that is passed to
       pcre2_convert_context_create(), they are used for all memory management
       within the conversion functions.

       There are only two parameters in the convert context at present. Both
       apply only to glob conversions. The escape character defaults to grave
       accent under Windows, otherwise backslash.
       It can be set to zero, meaning no escape character, or to any
       punctuation character with a code point less than 256. The separator
       character defaults to backslash under Windows, otherwise forward slash.
       It can be set to forward slash, backslash, or dot.

       The two setting functions return zero on success, or PCRE2_ERROR_BADDATA
       if their second argument is invalid.

THE CONVERSION FUNCTION

       int pcre2_pattern_convert(PCRE2_SPTR pattern, PCRE2_SIZE length,
         uint32_t options, PCRE2_UCHAR **buffer,
         PCRE2_SIZE *blength, pcre2_convert_context *cvcontext);

       void pcre2_converted_pattern_free(PCRE2_UCHAR *converted_pattern);

       The first two arguments of pcre2_pattern_convert() define the foreign
       pattern that is to be converted. The length may be given as
       PCRE2_ZERO_TERMINATED. The options argument defines how the pattern is
       to be processed. If the input is UTF, the PCRE2_CONVERT_UTF option
       should be set. PCRE2_CONVERT_NO_UTF_CHECK may also be set if you are
       sure the input is valid. One or more of the glob options, or one of the
       following POSIX options must be set to define the type of conversion
       that is required:

         PCRE2_CONVERT_GLOB
         PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
         PCRE2_CONVERT_GLOB_NO_STARSTAR
         PCRE2_CONVERT_POSIX_BASIC
         PCRE2_CONVERT_POSIX_EXTENDED

       Details of the conversions are given below. The buffer and blength
       arguments define how the output is handled:

       If buffer is NULL, the function just returns the length of the
       converted pattern via blength. This is one less than the length of
       buffer needed, because a terminating zero is always added to the
       output.

       If buffer points to a NULL pointer, an output buffer is obtained using
       the allocator in the context or malloc() if no context is supplied. A
       pointer to this buffer is placed in the variable to which buffer
       points. When no longer needed the output buffer must be freed by
       calling pcre2_converted_pattern_free().

       If buffer points to a non-NULL pointer, blength must be set to the
       actual length of the buffer provided (in code units).

       In all cases, after successful conversion, the variable pointed to by
       blength is updated to the length actually used (in code units),
       excluding the terminating zero that is always added.

       If an error occurs, the length (via blength) is set to the offset
       within the input pattern where the error was detected. Only gross
       syntax errors are caught; there are plenty of errors that will get
       passed on for pcre2_compile() to discover.

       The return from pcre2_pattern_convert() is zero on success or a
       non-zero PCRE2 error code. Note that PCRE2 error codes may be positive
       or negative: pcre2_compile() uses mostly positive codes and
       pcre2_match() negative ones; pcre2_pattern_convert() uses existing
       codes of both kinds. A textual error message can be obtained by calling
       pcre2_get_error_message().

CONVERTING GLOBS

       Globs are used to match file names, and consequently have the concept
       of a "path separator", which defaults to backslash under Windows and
       forward slash otherwise. If PCRE2_CONVERT_GLOB is set, the wildcards *
       and ? are not permitted to match separator characters, but the
       double-star (**) feature (which does match separators) is supported.

       PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR matches globs with wildcards
       allowed to match separator characters. PCRE2_CONVERT_GLOB_NO_STARSTAR
       matches globs with the double-star feature disabled. These options may
       be given together.

CONVERTING POSIX PATTERNS

       POSIX defines two kinds of regular expression pattern: basic and
       extended. These can be processed by setting PCRE2_CONVERT_POSIX_BASIC
       or PCRE2_CONVERT_POSIX_EXTENDED, respectively.

       In POSIX patterns, backslash is not special in a character class.
       Unmatched closing parentheses are treated as literals.

       In basic patterns, ? + | {} and () must be escaped to be recognized as
       metacharacters outside a character class. If the first character in
       the pattern is * it is treated as a literal. ^ is a metacharacter only
       at the start of a branch.

       In extended patterns, a backslash not in a character class always makes
       the next character literal, whatever it is. There are no
       backreferences.

       Note: POSIX mandates that the longest possible match at the first
       matching position must be found. This is not what pcre2_match() does;
       it yields the first match that is found. An application can use
       pcre2_dfa_match() to find the longest match, but that does not support
       backreferences (but then neither do POSIX extended patterns).

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge, England.

REVISION

       Last updated: 12 July 2017
       Copyright (c) 1997-2017 University of Cambridge.

PCRE2 10.30                      12 July 2017                  PCRE2CONVERT(3)