mirror of https://codeberg.org/forgejo/forgejo.git (synced 2025-05-25 11:22:16 +00:00)
Restrict repository indexing by glob match (#7767)
* Restrict repository indexing by file extension
* Use REPO_EXTENSIONS_LIST_INCLUDE instead of REPO_EXTENSIONS_LIST_EXCLUDE and have a more flexible extension pattern
* Corrected to pass lint gosimple
* Add wildcard support to REPO_INDEXER_EXTENSIONS
* This reverts commit 72a650c8e4.
* Add wildcard support to REPO_INDEXER_EXTENSIONS (no make vendor)
* Simplify isIndexable() for better clarity
* Add gobwas/glob to vendors
* manually set appengine new release
* Implement better REPO_INDEXER_INCLUDE and REPO_INDEXER_EXCLUDE
* Add unit and integration tests
* Update app.ini.sample and reword config-cheat-sheet
* Add doc page and correct app.ini.sample
* Some polish on the doc
* Simplify code as suggested by @lafriks
parent 3fd0eec900
commit 72f6d5c882
38 changed files with 920 additions and 17 deletions
@@ -181,6 +181,8 @@ Values containing `#` or `;` must be quoted using `` ` `` or `"""`.
- `REPO_INDEXER_ENABLED`: **false**: Enables code search (uses a lot of disk space, about 6 times more than the repository size).
- `REPO_INDEXER_PATH`: **indexers/repos.bleve**: Index file used for code search.
- `REPO_INDEXER_INCLUDE`: **empty**: A comma separated list of glob patterns (see https://github.com/gobwas/glob) to **include** in the index. Use `**.txt` to match any files with .txt extension. An empty list means include all files.
- `REPO_INDEXER_EXCLUDE`: **empty**: A comma separated list of glob patterns (see https://github.com/gobwas/glob) to **exclude** from the index. Files that match this list will not be indexed, even if they match in `REPO_INDEXER_INCLUDE`.
- `UPDATE_BUFFER_LEN`: **20**: Buffer length of index request.
- `MAX_FILE_SIZE`: **1048576**: Maximum size in bytes of files to be indexed.

58
docs/content/doc/advanced/repo-indexer.en-us.md
Normal file
@@ -0,0 +1,58 @@
---
date: "2019-09-06T01:35:00-03:00"
title: "Repository indexer"
slug: "repo-indexer"
weight: 45
toc: true
draft: false
menu:
  sidebar:
    parent: "advanced"
    name: "Repository indexer"
    weight: 45
    identifier: "repo-indexer"
---

# Repository indexer

## Setting up the repository indexer

Gitea can search through the files of your repositories once code search is enabled in your [`app.ini`](https://docs.gitea.io/en-us/config-cheat-sheet/):

```
[indexer]
; ...
REPO_INDEXER_ENABLED = true
REPO_INDEXER_PATH = indexers/repos.bleve
UPDATE_BUFFER_LEN = 20
MAX_FILE_SIZE = 1048576
REPO_INDEXER_INCLUDE =
REPO_INDEXER_EXCLUDE = resources/bin/**
```

Please bear in mind that indexing the contents can consume a lot of system resources, especially when the index is created for the first time or globally updated (e.g. after upgrading Gitea).

### Choosing the files for indexing by size

The `MAX_FILE_SIZE` option makes the indexer skip all files larger than the specified value (in bytes).

### Choosing the files for indexing by path

Gitea applies glob pattern matching from the [`gobwas/glob` library](https://github.com/gobwas/glob) to choose which files will be included in the index.

Limiting the list of files prevents the indexes from becoming polluted with derived or irrelevant files (e.g. lss, sym, map, etc.), so the search results are more relevant. It can also help reduce the index size.

`REPO_INDEXER_INCLUDE` (default: empty) is a comma separated list of glob patterns to **include** in the index. An empty list means "_include all files_".
`REPO_INDEXER_EXCLUDE` (default: empty) is a comma separated list of glob patterns to **exclude** from the index. Files that match this list will not be indexed. `REPO_INDEXER_EXCLUDE` takes precedence over `REPO_INDEXER_INCLUDE`.
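
The interaction between the two lists can be sketched as below. This is a minimal illustration in Python, not Gitea's actual Go implementation: `is_indexable` and the `Matcher` type are hypothetical names, and plain suffix/prefix checks stand in for compiled glob patterns.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for a compiled glob: returns True when a path matches.
Matcher = Callable[[str], bool]


def is_indexable(path: str, include: Sequence[Matcher], exclude: Sequence[Matcher]) -> bool:
    """Decide whether a file should be indexed (illustrative sketch only)."""
    path = path.lower()
    # Exclusion is checked first, so it wins over any include match.
    if any(match(path) for match in exclude):
        return False
    # An empty include list means "include all files".
    if not include:
        return True
    return any(match(path) for match in include)


# Stand-ins for the patterns `**.txt` and `resources/bin/**`.
include = [lambda p: p.endswith(".txt")]
exclude = [lambda p: p.startswith("resources/bin/")]

print(is_indexable("docs/readme.txt", include, exclude))          # True
print(is_indexable("resources/bin/notes.txt", include, exclude))  # False
print(is_indexable("main.go", include, exclude))                  # False
print(is_indexable("main.go", [], exclude))                       # True
```

Checking the exclude list before the include list is one simple way to get the stated precedence; other orderings that give the same result are equally valid.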

Pattern matching works as follows:

* To match all files with a `.txt` extension no matter what directory, use `**.txt`.
* To match all files with a `.txt` extension _only at the root level of the repository_, use `*.txt`.
* To match all files inside `resources/bin` and below, use `resources/bin/**`.
* To match all files _immediately inside_ `resources/bin`, use `resources/bin/*`.
* To match all files named `Makefile`, use `**Makefile`.
* Matching a directory has no effect; the pattern `resources/bin` will not include/exclude files inside that directory; `resources/bin/**` will.
* All files and patterns are normalized to lower case, so `**Makefile`, `**makefile` and `**MAKEFILE` are equivalent.
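
The rules in the list above can be approximated with a small translation from glob to regular expression. This is only a sketch in Python under stated assumptions: it does not use `gobwas/glob`, it handles just `**`, `*` and `?` with `/` as the separator, and `glob_to_regex` and `matches` are hypothetical helper names rather than anything in Gitea.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a small glob subset (`**`, `*`, `?`) into a regex (sketch only)."""
    pattern = pattern.lower()  # patterns are normalized to lower case
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")      # `**` may cross directory separators
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")   # `*` stays within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")    # `?` is one non-separator character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def matches(pattern: str, path: str) -> bool:
    """Return True when the whole (lower-cased) path matches the pattern."""
    return glob_to_regex(pattern).fullmatch(path.lower()) is not None


print(matches("**.txt", "docs/a/readme.txt"))               # True
print(matches("*.txt", "docs/readme.txt"))                  # False
print(matches("resources/bin/**", "resources/bin/x/y.so"))  # True
print(matches("resources/bin/*", "resources/bin/x/y.so"))   # False
print(matches("resources/bin", "resources/bin/tool"))       # False
print(matches("**MAKEFILE", "src/Makefile"))                # True
```

Requiring a full match of the path is what makes a bare directory pattern such as `resources/bin` ineffective: no file path is exactly equal to its parent directory.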