Board Thread:Lua Help/@comment-4405550-20160124224015/@comment-4405550-20160207144836

Do you mean something like what is described in this thread?: https://www.mediawiki.org/w/index.php?title=Topic:Stsdy2duez1d92zd&action=history

Yup. So something like |templates&meta=siteinfo&titles=Wookieepedia:Star_Wars:_Uprising_Super_Walkthrough/Stats&rvprop=content|user|comment&rvgeneratexml=1 would reliably identify all the templates called directly, as well as those called indirectly via transclusion.

But this doesn't identify #invoke calls, so they still need to be extracted manually.
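A minimal sketch of what that manual extraction could look like, assuming the wikitext has already been fetched as a string. The pattern and the name findInvokedModules are only illustrative (not necessarily the regex used elsewhere in this thread), and treating "#invoke" as case-insensitive is an assumption. It does not skip calls inside nowiki/pre sections or handle module names built from template parameters.

```javascript
// Matches "{{#invoke:ModuleName" and captures the module name.
// Whitespace around "#invoke" and ":" is tolerated.
// Assumption: "#invoke" is matched case-insensitively.
const INVOKE_PATTERN = /\{\{\s*#invoke\s*:\s*([^|{}]+)/gi;

// Return the unique module names invoked in a wikitext string,
// in order of first appearance.
function findInvokedModules(wikitext) {
  const names = new Set();
  for (const match of wikitext.matchAll(INVOKE_PATTERN)) {
    names.add(match[1].trim());
  }
  return [...names];
}
```

For example, findInvokedModules("{{#invoke:Stats|main}} and {{#invoke:Stats|other}}") returns ["Stats"].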

Parsing templates from pages can be tricky, though. I wonder if there is an equivalent of mwparserfromhell for Node.js. The above regex is just a quick and dirty solution. Since we would just be sifting through wikitext from saved pages, we can assume it is valid. This greatly reduces the complexity by eliminating the need for most error-checking/validation.

Additionally, we don't really care about HTML or extensions that use XML/HTML syntax (like ). So the "parser" would do nothing but:


 * Create an array for all the different block elements - let's call it "blocks"
 * Create an array for the opening block sequences and their associated block pointers - let's call it "stack"
   * Sequences will be either "{{{", "{{", "<nowiki>" or "<pre>" (could add more if we wanted to)
     * We could make it just a single "{", but that would cause issues when "{" is used as a literal bracket
   * Each element will resemble { open: "{{", block: BLOCK_POINTER } (could be extended to provide additional info as it is parsed, like the template/parser function name)
 * Create a variable to reference the stack element currently being worked on (the "current stack")
 * Loop through the string looking for opening and closing sequences
   * Ignore ALL opening sequences while inside a "pre" block
   * Ignore NON PRE opening sequences while inside a "nowiki" block
   * Ignore NON PRE closing sequences while inside a "pre" block
   * Ignore NON PRE/NOWIKI closing sequences while inside a "nowiki" block
 * If an opening sequence is matched:
   * Create a new array and push it onto the current stack's block
   * Add a new element to the stack with the previously created array as "block"
   * Set the current stack to the element just created
   * Push the matched characters to the current stack's block
 * If a closing sequence is matched:
   * Check that it matches the current stack's opening sequence
     * If it doesn't match (or the stack's length is 0), send a warning to the console and ignore the sequence
   * Push the matched characters to the end of the current stack's block
   * Remove the last element from the stack
   * Set the current stack to the last element in the stack
 * If there is no match:
   * Push the current character to the end of the current stack's block
   * If the last element in the block is a string, append to that instead
 * At the end of the string:
   * If the stack's length is 0, return "blocks"
   * If the stack doesn't contain any "{{{" or "{{" elements, return "blocks" (we don't care about XML/HTML blocks not being closed)
   * Otherwise, throw an error
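
The steps above could be sketched roughly like this in JavaScript. It is only one reading of the description, not a tested parser. The names splitWikitext, appendText and matchAt are made up for illustration. It assumes lowercase, attribute-less "<pre>" and "<nowiki>" tags, always tries "{{{" before "{{", and keeps an unmatched closing sequence as plain text after warning about it.

```javascript
// Closing sequence for each opening sequence. Key order matters:
// "{{{" is listed before "{{" so the longer opener is tried first.
const CLOSERS = {
  "{{{": "}}}",
  "{{": "}}",
  "<nowiki>": "</nowiki>",
  "<pre>": "</pre>",
};
const OPENERS = Object.keys(CLOSERS);

// Append text to the last element if it is a string, otherwise push it.
function appendText(block, text) {
  const last = block.length - 1;
  if (last >= 0 && typeof block[last] === "string") {
    block[last] += text;
  } else {
    block.push(text);
  }
}

// Return the first sequence in `list` found at position `i`, or undefined.
function matchAt(text, i, list) {
  return list.find((seq) => text.startsWith(seq, i));
}

function splitWikitext(text) {
  const blocks = [];
  const stack = [];
  const root = { open: null, block: blocks };
  let current = root;
  let i = 0;

  while (i < text.length) {
    const inPre = current.open === "<pre>";
    const inNowiki = current.open === "<nowiki>";

    // 1. Closing sequence that matches the current opener.
    const expected = current.open ? CLOSERS[current.open] : null;
    if (expected && text.startsWith(expected, i)) {
      appendText(current.block, expected);
      stack.pop();
      current = stack.length ? stack[stack.length - 1] : root;
      i += expected.length;
      continue;
    }

    // 2. Opening sequence: none inside <pre>, only "<pre>" inside <nowiki>.
    const allowedOpeners = inPre ? [] : inNowiki ? ["<pre>"] : OPENERS;
    const opener = matchAt(text, i, allowedOpeners);
    if (opener) {
      const block = [opener];
      current.block.push(block);
      current = { open: opener, block };
      stack.push(current);
      i += opener.length;
      continue;
    }

    // 3. Closing sequence that does not match: warn, keep it as plain text.
    const allowedClosers = inPre ? [] : inNowiki ? ["</pre>"] : Object.values(CLOSERS);
    const stray = matchAt(text, i, allowedClosers);
    if (stray) {
      console.warn(`Unmatched "${stray}" at offset ${i}; treating it as text`);
      appendText(current.block, stray);
      i += stray.length;
      continue;
    }

    // 4. Ordinary character.
    appendText(current.block, text[i]);
    i += 1;
  }

  // Unclosed <pre>/<nowiki> is tolerated; unclosed braces are not.
  if (stack.some((entry) => entry.open === "{{{" || entry.open === "{{")) {
    throw new Error("Unclosed {{ or {{{ block");
  }
  return blocks;
}
```

With this sketch, splitWikitext("a {{foo|{{{1}}}}} b") returns ["a ", ["{{foo|", ["{{{1}}}"], "}}"], " b"]. Because text is appended to the previous string, the opening sequence ends up merged with the text that follows it. Keeping it as a separate element would be an easy change if the template name needs to be read off directly.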

It may be oversimplified, but it accomplishes the simple task of breaking apart wikitext (plus I'm pretty sure my description is longer than the actual function would be :P ). You can replace any block or insert/remove text, then stitch it back together before sending it off to be parsed by the server.
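
Stitching it back together could be as small as the sketch below. It assumes the parser output is a nested array of strings and arrays, as described above. The name stitch and the sample data are hypothetical.

```javascript
// Recursively join nested block arrays back into one wikitext string.
function stitch(blocks) {
  return blocks
    .map((part) => (Array.isArray(part) ? stitch(part) : part))
    .join("");
}

// Hypothetical parser output for "a {{foo|{{{1}}}}} b".
const blocks = ["a ", ["{{foo|", ["{{{1}}}"], "}}"], " b"];

// Replace the inner {{{1}}} block with literal text, then stitch.
blocks[1][1] = "bar";
const wikitext = stitch(blocks);
console.log(wikitext); // a {{foo|bar}} b
```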