Fix sanitizer config - multiple rules (#11133)
In #9888, it was reported that my earlier pull request #9075 didn't quite function as expected. I was quite hopeful the `ValuesWithShadow()` worked as expected (and, I thought my testing showed it did) but I guess not. @zeripath proposed an alternative syntax which I like: ```ini [markup.sanitizer.1] ELEMENT=a ALLOW_ATTR=target REGEXP=something [markup.sanitizer.2] ELEMENT=a ALLOW_ATTR=target REGEXP=something ``` This was quite easy to adopt into the existing code. I've done so in a semi-backwards-compatible manner: - The value from `.Value()` is used for each element. - We parse `[markup.sanitizer]` and all `[markup.sanitizer.*]` sections and add them as rules. This means that existing configs will load one rule (not all rules). It also means people can use string identifiers (`[markup.sanitiser.KaTeX]`) if they prefer, instead of numbered ones. Co-authored-by: Andrew Thornton <art27@cantab.net> Co-authored-by: guillep2k <18600385+guillep2k@users.noreply.github.com>
This commit is contained in:
		
							parent
							
								
									6b6f20b6d4
								
							
						
					
					
						commit
						1bf9e44bda
					
				
					 4 changed files with 39 additions and 40 deletions
				
			
		|  | @ -976,8 +976,10 @@ SHOW_FOOTER_VERSION = true | ||||||
| ; Show template execution time in the footer | ; Show template execution time in the footer | ||||||
| SHOW_FOOTER_TEMPLATE_LOAD_TIME = true | SHOW_FOOTER_TEMPLATE_LOAD_TIME = true | ||||||
| 
 | 
 | ||||||
| [markup.sanitizer] | [markup.sanitizer.1] | ||||||
| ; The following keys can be used multiple times to define sanitation policy rules. | ; The following keys can appear once to define a sanitation policy rule. | ||||||
|  | ; This section can appear multiple times by adding a unique alphanumeric suffix to define multiple rules. | ||||||
|  | ; e.g., [markup.sanitizer.1] -> [markup.sanitizer.2] -> [markup.sanitizer.TeX] | ||||||
| ;ELEMENT = span | ;ELEMENT = span | ||||||
| ;ALLOW_ATTR = class | ;ALLOW_ATTR = class | ||||||
| ;REGEXP = ^(info|warning|error)$ | ;REGEXP = ^(info|warning|error)$ | ||||||
|  |  | ||||||
|  | @ -658,7 +658,7 @@ Two special environment variables are passed to the render command: | ||||||
| Gitea supports customizing the sanitization policy for rendered HTML. The example below will support KaTeX output from pandoc. | Gitea supports customizing the sanitization policy for rendered HTML. The example below will support KaTeX output from pandoc. | ||||||
| 
 | 
 | ||||||
| ```ini | ```ini | ||||||
| [markup.sanitizer] | [markup.sanitizer.TeX] | ||||||
| ; Pandoc renders TeX segments as <span>s with the "math" class, optionally | ; Pandoc renders TeX segments as <span>s with the "math" class, optionally | ||||||
| ; with "inline" or "display" classes depending on context. | ; with "inline" or "display" classes depending on context. | ||||||
| ELEMENT = span | ELEMENT = span | ||||||
|  | @ -670,7 +670,7 @@ REGEXP = ^\s*((math(\s+|$)|inline(\s+|$)|display(\s+|$)))+ | ||||||
|  - `ALLOW_ATTR`: The attribute this policy allows. Must be non-empty. |  - `ALLOW_ATTR`: The attribute this policy allows. Must be non-empty. | ||||||
|  - `REGEXP`: A regex to match the contents of the attribute against. Must be present but may be empty for unconditional whitelisting of this attribute. |  - `REGEXP`: A regex to match the contents of the attribute against. Must be present but may be empty for unconditional whitelisting of this attribute. | ||||||
| 
 | 
 | ||||||
| You may redefine `ELEMENT`, `ALLOW_ATTR`, and `REGEXP` multiple times; each time all three are defined is a single policy entry. | Multiple sanitisation rules can be defined by adding unique subsections, e.g. `[markup.sanitizer.TeX-2]`. | ||||||
| 
 | 
 | ||||||
| ## Time (`time`) | ## Time (`time`) | ||||||
| 
 | 
 | ||||||
|  |  | ||||||
|  | @ -73,7 +73,7 @@ IS_INPUT_FILE = false | ||||||
| If your external markup relies on additional classes and attributes on the generated HTML elements, you might need to enable custom sanitizer policies. Gitea uses the [`bluemonday`](https://godoc.org/github.com/microcosm-cc/bluemonday) package as our HTML sanitizier. The example below will support [KaTeX](https://katex.org/) output from [`pandoc`](https://pandoc.org/). | If your external markup relies on additional classes and attributes on the generated HTML elements, you might need to enable custom sanitizer policies. Gitea uses the [`bluemonday`](https://godoc.org/github.com/microcosm-cc/bluemonday) package as our HTML sanitizier. The example below will support [KaTeX](https://katex.org/) output from [`pandoc`](https://pandoc.org/). | ||||||
| 
 | 
 | ||||||
| ```ini | ```ini | ||||||
| [markup.sanitizer] | [markup.sanitizer.TeX] | ||||||
| ; Pandoc renders TeX segments as <span>s with the "math" class, optionally | ; Pandoc renders TeX segments as <span>s with the "math" class, optionally | ||||||
| ; with "inline" or "display" classes depending on context. | ; with "inline" or "display" classes depending on context. | ||||||
| ELEMENT = span | ELEMENT = span | ||||||
|  | @ -86,6 +86,11 @@ FILE_EXTENSIONS = .md,.markdown | ||||||
| RENDER_COMMAND  = pandoc -f markdown -t html --katex | RENDER_COMMAND  = pandoc -f markdown -t html --katex | ||||||
| ``` | ``` | ||||||
| 
 | 
 | ||||||
| You may redefine `ELEMENT`, `ALLOW_ATTR`, and `REGEXP` multiple times; each time all three are defined is a single policy entry. All three must be defined, but `REGEXP` may be blank to allow unconditional whitelisting of that attribute. | You must define `ELEMENT`, `ALLOW_ATTR`, and `REGEXP` in each section. | ||||||
|  | 
 | ||||||
|  | To define multiple entries, add a unique alphanumeric suffix (e.g., `[markup.sanitizer.1]` and `[markup.sanitizer.something]`). | ||||||
| 
 | 
 | ||||||
| Once your configuration changes have been made, restart Gitea to have changes take effect. | Once your configuration changes have been made, restart Gitea to have changes take effect. | ||||||
|  | 
 | ||||||
|  | **Note**: Prior to Gitea 1.12 there was a single `markup.sanitiser` section with keys that were redefined for multiple rules, however, | ||||||
|  | there were significant problems with this method of configuration necessitating configuration through multiple sections. | ||||||
|  | @ -44,7 +44,7 @@ func newMarkup() { | ||||||
| 			continue | 			continue | ||||||
| 		} | 		} | ||||||
| 
 | 
 | ||||||
| 		if name == "sanitizer" { | 		if name == "sanitizer" || strings.HasPrefix(name, "sanitizer.") { | ||||||
| 			newMarkupSanitizer(name, sec) | 			newMarkupSanitizer(name, sec) | ||||||
| 		} else { | 		} else { | ||||||
| 			newMarkupRenderer(name, sec) | 			newMarkupRenderer(name, sec) | ||||||
|  | @ -67,44 +67,36 @@ func newMarkupSanitizer(name string, sec *ini.Section) { | ||||||
| 		return | 		return | ||||||
| 	} | 	} | ||||||
| 
 | 
 | ||||||
| 	elements := sec.Key("ELEMENT").ValueWithShadows() | 	elements := sec.Key("ELEMENT").Value() | ||||||
| 	allowAttrs := sec.Key("ALLOW_ATTR").ValueWithShadows() | 	allowAttrs := sec.Key("ALLOW_ATTR").Value() | ||||||
| 	regexps := sec.Key("REGEXP").ValueWithShadows() | 	regexpStr := sec.Key("REGEXP").Value() | ||||||
| 
 | 
 | ||||||
| 	if len(elements) != len(allowAttrs) || | 	if regexpStr == "" { | ||||||
| 		len(elements) != len(regexps) { | 		rule := MarkupSanitizerRule{ | ||||||
| 		log.Error("All three keys in markup.%s (ELEMENT, ALLOW_ATTR, REGEXP) must be defined the same number of times! Got %d, %d, and %d respectively.", name, len(elements), len(allowAttrs), len(regexps)) | 			Element:   elements, | ||||||
|  | 			AllowAttr: allowAttrs, | ||||||
|  | 			Regexp:    nil, | ||||||
|  | 		} | ||||||
|  | 
 | ||||||
|  | 		ExternalSanitizerRules = append(ExternalSanitizerRules, rule) | ||||||
| 		return | 		return | ||||||
| 	} | 	} | ||||||
| 
 | 
 | ||||||
| 	ExternalSanitizerRules = make([]MarkupSanitizerRule, 0, len(elements)) | 	// Validate when parsing the config that this is a valid regular
 | ||||||
| 
 | 	// expression. Then we can use regexp.MustCompile(...) later.
 | ||||||
| 	for index, pattern := range regexps { | 	compiled, err := regexp.Compile(regexpStr) | ||||||
| 		if pattern == "" { | 	if err != nil { | ||||||
| 			rule := MarkupSanitizerRule{ | 		log.Error("In module.%s: REGEXP (%s) at definition %d failed to compile: %v", regexpStr, name, err) | ||||||
| 				Element:   elements[index], | 		return | ||||||
| 				AllowAttr: allowAttrs[index], |  | ||||||
| 				Regexp:    nil, |  | ||||||
| 			} |  | ||||||
| 			ExternalSanitizerRules = append(ExternalSanitizerRules, rule) |  | ||||||
| 			continue |  | ||||||
| 		} |  | ||||||
| 
 |  | ||||||
| 		// Validate when parsing the config that this is a valid regular
 |  | ||||||
| 		// expression. Then we can use regexp.MustCompile(...) later.
 |  | ||||||
| 		compiled, err := regexp.Compile(pattern) |  | ||||||
| 		if err != nil { |  | ||||||
| 			log.Error("In module.%s: REGEXP at definition %d failed to compile: %v", name, index+1, err) |  | ||||||
| 			continue |  | ||||||
| 		} |  | ||||||
| 
 |  | ||||||
| 		rule := MarkupSanitizerRule{ |  | ||||||
| 			Element:   elements[index], |  | ||||||
| 			AllowAttr: allowAttrs[index], |  | ||||||
| 			Regexp:    compiled, |  | ||||||
| 		} |  | ||||||
| 		ExternalSanitizerRules = append(ExternalSanitizerRules, rule) |  | ||||||
| 	} | 	} | ||||||
|  | 
 | ||||||
|  | 	rule := MarkupSanitizerRule{ | ||||||
|  | 		Element:   elements, | ||||||
|  | 		AllowAttr: allowAttrs, | ||||||
|  | 		Regexp:    compiled, | ||||||
|  | 	} | ||||||
|  | 
 | ||||||
|  | 	ExternalSanitizerRules = append(ExternalSanitizerRules, rule) | ||||||
| } | } | ||||||
| 
 | 
 | ||||||
| func newMarkupRenderer(name string, sec *ini.Section) { | func newMarkupRenderer(name string, sec *ini.Section) { | ||||||
|  |  | ||||||
		Loading…
	
		Reference in a new issue