summary refs log tree commit diff
path: root/fs/f2fs/segment.c
AgeCommit message (Collapse)Author
2018-08-20f2fs: readahead encrypted block during GCChao Yu
During GC, for each encrypted block, we will read block synchronously into meta page, and then submit it into current cold data log area. So this block read model with 4k granularity can make poor performance, like migrating non-encrypted block, let's readahead encrypted block as well to improve migration performance. To implement this, we choose meta page that its index is old block address of the encrypted block, and readahead ciphertext into this page, later, if readaheaded page is still updated, we will load its data into target meta page, and submit the write IO. Note that for OPU, truncation, deletion, we need to invalid meta page after we invalid old block address, to make sure we won't load invalid data from target meta page during encrypted block migration. for ((i = 0; i < 1000; i++)) do { xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync"; } done for ((i = 0; i < 1000; i+=2)) do { rm /mnt/f2fs/dir/$i; } done ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0); Before: gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc] gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1 gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc] gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc] gc-6549 [001] .... 214682.213902: block_plug: [gc] gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc] gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1 gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc] gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc] gc-6549 [001] .... 214682.226414: block_plug: [gc] gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc] gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1 gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc] gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc] gc-6549 [001] .... 214682.226911: block_plug: [gc] gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc] gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1 After: gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc] gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc] gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc] gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc] gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc] gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc] gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc] gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc] gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc] gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc] gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc] gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc] gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc] gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc] gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc] gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc] gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc] gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc] gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc] gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1 gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc] gc-5678 [003] .... 214327.026046: block_plug: [gc] Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-20f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gcJaegeuk Kim
The f2fs_gc() called by f2fs_balance_fs() requires to be called outside of fi->i_gc_rwsem[WRITE], since f2fs_gc() can try to grab it in a loop. If it hits the miximum retrials in GC, let's give a chance to release gc_mutex for a short time in order not to go into live lock in the worst case. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-20f2fs: fix performance issue observed with multi-thread sequential readJaegeuk Kim
This reverts the commit - "b93f771 - f2fs: remove writepages lock" to fix the drop in sequential read throughput. Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L device: UFS Before - read throughput: 185 MB/s total read requests: 85177 (of these ~80000 are 4KB size requests). total write requests: 2546 (of these ~2208 requests are written in 512KB). After - read throughput: 758 MB/s total read requests: 2417 (of these ~2042 are 512KB reads). total write requests: 2701 (of these ~2034 requests are written in 512KB). Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14f2fs: rework fault injection handling to avoid a warningArnd Bergmann
When CONFIG_F2FS_FAULT_INJECTION is disabled, we get a warning about an unused label: fs/f2fs/segment.c: In function '__submit_discard_cmd': fs/f2fs/segment.c:1059:1: error: label 'submit' defined but not used [-Werror=unused-label] This could be fixed by adding another #ifdef around it, but the more reliable way of doing this seems to be to remove the other #ifdefs where that is easily possible. By defining time_to_inject() as a trivial stub, most of the checks for CONFIG_F2FS_FAULT_INJECTION can go away. This also leads to nicer formatting of the code. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-13f2fs: fix to return success when trimming meta areaChao Yu
generic/251 --- tests/generic/251.out 2016-05-03 20:20:11.381899000 +0800 QA output created by 251 Running the test: done. +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument ... Ran: generic/251 Failures: generic/251 The reason is coverage of fstrim locates in meta area, previously we just return -EINVAL for such case, making generic/251 failed, to fix this problem, let's relieve restriction to return success with no block discarded. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-13f2fs: fix use-after-free of dicard command entryChao Yu
As Dan Carpenter reported: The patch 20ee4382322c: "f2fs: issue small discard by LBA order" from Jul 8, 2018, leads to the following Smatch warning: fs/f2fs/segment.c:1277 __issue_discard_cmd_orderly() warn: 'dc' was already freed. See also: fs/f2fs/segment.c:2550 __issue_discard_cmd_range() warn: 'dc' was already freed. In order to fix this issue, let's get error from __submit_discard_cmd(), and release current discard command after we referenced next one. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-13f2fs: support discard submission error injectionChao Yu
This patch adds to support discard submission error injection for testing error handling of __submit_discard_cmd(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-13f2fs: split discard command in prior to block layerChao Yu
Some devices has small max_{hw,}discard_sectors, so that in __blkdev_issue_discard(), one big size discard bio can be split into multiple small size discard bios, result in heavy load in IO scheduler and device, which can hang other sync IO for long time. Now, f2fs is trying to control discard commands more elaboratively, in order to make less conflict in between discard IO and user IO to enhance application's performance, so in this patch, we will split discard bio in f2fs in prior to in block layer to reduce issuing multiple discard bios in a short time. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-13f2fs: fix incorrect range->len in f2fs_trim_fs()Chao Yu
generic/260 reported below error: [+] Default length with start set (should succeed) [+] Length beyond the end of fs (should succeed) [+] Length beyond the end of fs with start set (should succeed) +./tests/generic/260: line 94: [: 18446744073709551615: integer expression expected +./tests/generic/260: line 104: [: 18446744073709551615: integer expression expected Test done ... In f2fs_trim_fs(), if there is no discard being trimmed, we need to correct range->len before return. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01f2fs: let checkpoint flush dnode page of regularChao Yu
Fsyncer will wait on all dnode pages of regular writeback before flushing, if there are async dnode pages blocked by IO scheduler, it may decrease fsync's performance. In this patch, we choose to let f2fs_balance_fs_bg() to trigger checkpoint to flush these dnode pages of regular, so async IO of dnode page can be elimitnated, making fsyncer only need to wait for sync IO. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01f2fs: issue discard align to section in LFS modeYunlong Song
For the case when sbi->segs_per_sec > 1 with lfs mode, take section:segment = 5 for example, if the section prefree_map is ...previous section | current section (1 1 0 1 1) | next section..., then the start = x, end = x + 1, after start = start_segno + sbi->segs_per_sec, start = x + 5, then it will skip x + 3 and x + 4, but their bitmap is still set, which will cause duplicated f2fs_issue_discard of this same section in the next write_checkpoint: round 1: section bitmap : 1 1 1 1 1, all valid, prefree_map: 0 0 0 0 0 then rm data block NO.2, block NO.2 becomes invalid, prefree_map: 0 0 1 0 0 write_checkpoint: section bitmap: 1 1 0 1 1, prefree_map: 0 0 0 0 0, prefree of NO.2 is cleared, and no discard issued round 2: rm data block NO.0, NO.1, NO.3, NO.4 all invalid, but prefree bit of NO.2 is set and cleared in round 1, then prefree_map: 1 1 0 1 1 write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1, no valid blocks of this section, so discard issued, but this time prefree bit of NO.3 and NO.4 is skipped due to start = start_segno + sbi->segs_per_sec; round 3: write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1 -> 0 0 0 0 0, no valid blocks of this section, so discard issued, this time prefree bit of NO.3 and NO.4 is cleared, but the discard of this section is sent again... To fix this problem, we can align the start and end value to section boundary for fstrim and real-time discard operation, and decide to issue discard only when the whole section is invalid, which can issue discard aligned to section size as much as possible and avoid redundant discard. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01f2fs: clean up with f2fs_is_{atomic,volatile}_file()Chao Yu
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01f2fs: fix to propagate error from __get_meta_page()Chao Yu
If caller of __get_meta_page() can handle error, let's propagate error from __get_meta_page(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01f2fs: issue small discard by LBA orderChao Yu
For small granularity discard which size is smaller than 64KB, if we issue those kind of discards orderly by size, their IOs will be spread into entire logical address, so that in FTL, L2P table will be updated randomly, result bad wear rate in the table. In this patch, we choose to issue small discard by LBA order, by this way, we can expect that L2P table updates from adjacent discard IOs can be merged in the cache, so it can reduce lifetime wearing of flash. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01f2fs: stop issuing discard immediately if there is queued IOChao Yu
For background discard policy, even if there is queued user IO, still we will check max_requests times for next discard entry, it is unneeded, let's just stop this round submission immediately. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01f2fs: detect bug_on in f2fs_wait_discard_biosChao Yu
Add bug_on to detect potential non-empty discard wait list. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-27f2fs: fix to correct return value of f2fs_trim_fsChao Yu
We should account trimmed block number from __wait_all_discard_cmd in __issue_discard_cmd_range, otherwise trimmed blocks returned by f2fs_trim_fs will be wrong, this patch fixes it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-27f2fs: disable f2fs_check_rb_tree_consistenceChao Yu
If there is millions of discard entries cached in rb tree, each sanity check of it can cause very long latency as held cmd_lock blocking other lock grabbers. In other aspect, we have enabled the check very long time, as we see, there is no such inconsistent condition caused by bugs. But still we do not choose to kill it directly, instead, adding an flag to disable the check now, if there is related code change, we can reuse it to detect bugs. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-27f2fs: introduce and spread verify_blkaddrChao Yu
This patch introduces verify_blkaddr to check meta/data block address with valid range to detect bug earlier. In addition, once we encounter an invalid blkaddr, notice user to run fsck to fix, and let the kernel panic. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-27f2fs: don't issue discard commands in online discard is onJaegeuk Kim
Actually, we don't need to issue discard commands, if discard is on, as mentioned in the comment. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-06-12treewide: Use array_size in f2fs_kvzalloc()Kees Cook
The f2fs_kvzalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: f2fs_kvzalloc(handle, a * b, gfp) with: f2fs_kvzalloc(handle, array_size(a, b), gfp) as well as handling cases of: f2fs_kvzalloc(handle, a * b * c, gfp) with: f2fs_kvzalloc(handle, array3_size(a, b, c), gfp) This does, however, attempt to ignore constant size factors like: f2fs_kvzalloc(handle, 4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ expression HANDLE; type TYPE; expression THING, E; @@ ( f2fs_kvzalloc(HANDLE, - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | f2fs_kvzalloc(HANDLE, - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression HANDLE; expression COUNT; typedef u8; typedef __u8; @@ ( f2fs_kvzalloc(HANDLE, - sizeof(u8) * (COUNT) + COUNT , ...) | f2fs_kvzalloc(HANDLE, - sizeof(__u8) * (COUNT) + COUNT , ...) | f2fs_kvzalloc(HANDLE, - sizeof(char) * (COUNT) + COUNT , ...) | f2fs_kvzalloc(HANDLE, - sizeof(unsigned char) * (COUNT) + COUNT , ...) | f2fs_kvzalloc(HANDLE, - sizeof(u8) * COUNT + COUNT , ...) | f2fs_kvzalloc(HANDLE, - sizeof(__u8) * COUNT + COUNT , ...) | f2fs_kvzalloc(HANDLE, - sizeof(char) * COUNT + COUNT , ...) | f2fs_kvzalloc(HANDLE, - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ expression HANDLE; type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( f2fs_kvzalloc(HANDLE, - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) | f2fs_kvzalloc(HANDLE, - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) | f2fs_kvzalloc(HANDLE, - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | f2fs_kvzalloc(HANDLE, - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | f2fs_kvzalloc(HANDLE, - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) | f2fs_kvzalloc(HANDLE, - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) | f2fs_kvzalloc(HANDLE, - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) | f2fs_kvzalloc(HANDLE, - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ expression HANDLE; identifier SIZE, COUNT; @@ f2fs_kvzalloc(HANDLE, - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression HANDLE; expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( f2fs_kvzalloc(HANDLE, - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | f2fs_kvzalloc(HANDLE, - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | f2fs_kvzalloc(HANDLE, - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | f2fs_kvzalloc(HANDLE, - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | f2fs_kvzalloc(HANDLE, - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | f2fs_kvzalloc(HANDLE, - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | f2fs_kvzalloc(HANDLE, - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | f2fs_kvzalloc(HANDLE, - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression HANDLE; expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( f2fs_kvzalloc(HANDLE, - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | f2fs_kvzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | f2fs_kvzalloc(HANDLE, - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | f2fs_kvzalloc(HANDLE, - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | f2fs_kvzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | f2fs_kvzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ expression HANDLE; identifier STRIDE, SIZE, COUNT; @@ ( f2fs_kvzalloc(HANDLE, - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | f2fs_kvzalloc(HANDLE, - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | f2fs_kvzalloc(HANDLE, - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | f2fs_kvzalloc(HANDLE, - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | f2fs_kvzalloc(HANDLE, - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | f2fs_kvzalloc(HANDLE, - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | f2fs_kvzalloc(HANDLE, - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | f2fs_kvzalloc(HANDLE, - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression HANDLE; expression E1, E2, E3; constant C1, C2, C3; @@ ( f2fs_kvzalloc(HANDLE, C1 * C2 * C3, ...) | f2fs_kvzalloc(HANDLE, - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression HANDLE; expression E1, E2; constant C1, C2; @@ ( f2fs_kvzalloc(HANDLE, C1 * C2, ...) | f2fs_kvzalloc(HANDLE, - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12treewide: Use array_size() in f2fs_kzalloc()Kees Cook
The f2fs_kzalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: f2fs_kzalloc(handle, a * b, gfp) with: f2fs_kzalloc(handle, array_size(a, b), gfp) as well as handling cases of: f2fs_kzalloc(handle, a * b * c, gfp) with: f2fs_kzalloc(handle, array3_size(a, b, c), gfp) This does, however, attempt to ignore constant size factors like: f2fs_kzalloc(handle, 4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ expression HANDLE; type TYPE; expression THING, E; @@ ( f2fs_kzalloc(HANDLE, - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | f2fs_kzalloc(HANDLE, - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression HANDLE; expression COUNT; typedef u8; typedef __u8; @@ ( f2fs_kzalloc(HANDLE, - sizeof(u8) * (COUNT) + COUNT , ...) | f2fs_kzalloc(HANDLE, - sizeof(__u8) * (COUNT) + COUNT , ...) | f2fs_kzalloc(HANDLE, - sizeof(char) * (COUNT) + COUNT , ...) | f2fs_kzalloc(HANDLE, - sizeof(unsigned char) * (COUNT) + COUNT , ...) | f2fs_kzalloc(HANDLE, - sizeof(u8) * COUNT + COUNT , ...) | f2fs_kzalloc(HANDLE, - sizeof(__u8) * COUNT + COUNT , ...) | f2fs_kzalloc(HANDLE, - sizeof(char) * COUNT + COUNT , ...) | f2fs_kzalloc(HANDLE, - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ expression HANDLE; type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( f2fs_kzalloc(HANDLE, - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) | f2fs_kzalloc(HANDLE, - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) | f2fs_kzalloc(HANDLE, - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | f2fs_kzalloc(HANDLE, - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | f2fs_kzalloc(HANDLE, - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) | f2fs_kzalloc(HANDLE, - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) | f2fs_kzalloc(HANDLE, - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) | f2fs_kzalloc(HANDLE, - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ expression HANDLE; identifier SIZE, COUNT; @@ f2fs_kzalloc(HANDLE, - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression HANDLE; expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( f2fs_kzalloc(HANDLE, - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | f2fs_kzalloc(HANDLE, - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | f2fs_kzalloc(HANDLE, - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | f2fs_kzalloc(HANDLE, - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | f2fs_kzalloc(HANDLE, - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | f2fs_kzalloc(HANDLE, - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | f2fs_kzalloc(HANDLE, - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | f2fs_kzalloc(HANDLE, - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression HANDLE; expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( f2fs_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | f2fs_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | f2fs_kzalloc(HANDLE, - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | f2fs_kzalloc(HANDLE, - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | f2fs_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | f2fs_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ expression HANDLE; identifier STRIDE, SIZE, COUNT; @@ ( f2fs_kzalloc(HANDLE, - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | f2fs_kzalloc(HANDLE, - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | f2fs_kzalloc(HANDLE, - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | f2fs_kzalloc(HANDLE, - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | f2fs_kzalloc(HANDLE, - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | f2fs_kzalloc(HANDLE, - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | f2fs_kzalloc(HANDLE, - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | f2fs_kzalloc(HANDLE, - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression HANDLE; expression E1, E2, E3; constant C1, C2, C3; @@ ( f2fs_kzalloc(HANDLE, C1 * C2 * C3, ...) | f2fs_kzalloc(HANDLE, - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression HANDLE; expression E1, E2; constant C1, C2; @@ ( f2fs_kzalloc(HANDLE, C1 * C2, ...) | f2fs_kzalloc(HANDLE, - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-04f2fs: fix to update mtime correctlyChao Yu
If we change system time to the past, get_mtime() will return a overflowed time, and SIT_I(sbi)->max_mtime will be udpated incorrectly, this patch fixes the two issues. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: clean up symbol namespaceChao Yu
As Ted reported: "Hi, I was looking at f2fs's sources recently, and I noticed that there is a very large number of non-static symbols which don't have a f2fs prefix. There's well over a hundred (see attached below). As one example, in fs/f2fs/dir.c there is: unsigned char get_de_type(struct f2fs_dir_entry *de) This function is clearly only useful for f2fs, but it has a generic name. This means that if any other file system tries to have the same symbol name, there will be a symbol conflict and the kernel would not successfully build. It also means that when someone is looking f2fs sources, it's not at all obvious whether a function such as read_data_page(), invalidate_blocks(), is a generic kernel function found in the fs, mm, or block layers, or a f2fs specific function. You might want to fix this at some point. Hopefully Kent's bcachefs isn't similarly using genericly named functions, since that might cause conflicts with f2fs's functions --- but just as this would be a problem that we would rightly insist that Kent fix, this is something that we should have rightly insisted that f2fs should have fixed before it was integrated into the mainline kernel. acquire_orphan_inode add_ino_entry add_orphan_inode allocate_data_block allocate_new_segments alloc_nid alloc_nid_done alloc_nid_failed available_free_memory ...." This patch adds "f2fs_" prefix for all non-static symbols in order to: a) avoid conflict with other kernel generic symbols; b) to indicate the function is f2fs specific one instead of generic one; Reported-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: fix to let caller retry allocating block addressChao Yu
Configure io_bits with 2 and enable LFS mode, generic/013 reports below dmesg: BUG: unable to handle kernel NULL pointer dereference at 00000104 *pdpt = 0000000029b7b001 *pde = 0000000000000000 Oops: 0002 [#1] PREEMPT SMP Modules linked in: crc32_generic zram f2fs(O) rfcomm bnep bluetooth ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq pcbc joydev snd_seq_device aesni_intel snd_timer aes_i586 snd crypto_simd cryptd soundcore i2c_piix4 serio_raw mac_hid video parport_pc ppdev lp parport hid_generic psmouse usbhid hid e1000 CPU: 0 PID: 11161 Comm: fsstress Tainted: G O 4.17.0-rc2 #38 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 EIP: f2fs_submit_page_write+0x28d/0x550 [f2fs] EFLAGS: 00010206 CPU: 0 EAX: e863dcd8 EBX: 00000000 ECX: 00000100 EDX: 00000200 ESI: e863dcf4 EDI: f6f82768 EBP: e863dbb0 ESP: e863db74 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 CR0: 80050033 CR2: 00000104 CR3: 29a62020 CR4: 000406f0 Call Trace: do_write_page+0x6f/0xc0 [f2fs] write_data_page+0x4a/0xd0 [f2fs] do_write_data_page+0x327/0x630 [f2fs] __write_data_page+0x34b/0x820 [f2fs] __f2fs_write_data_pages+0x42d/0x8c0 [f2fs] f2fs_write_data_pages+0x27/0x30 [f2fs] do_writepages+0x1a/0x70 __filemap_fdatawrite_range+0x94/0xd0 filemap_write_and_wait_range+0x3d/0xa0 __generic_file_write_iter+0x11a/0x1f0 f2fs_file_write_iter+0xdd/0x3b0 [f2fs] __vfs_write+0xd2/0x150 vfs_write+0x9b/0x190 ksys_write+0x45/0x90 sys_write+0x16/0x20 do_fast_syscall_32+0xaa/0x22c entry_SYSENTER_32+0x4c/0x7b EIP: 0xb7fc8c51 EFLAGS: 00000246 CPU: 0 EAX: ffffffda EBX: 00000003 ECX: 09cde000 EDX: 00001000 ESI: 00000003 EDI: 00001000 EBP: 00000000 ESP: bfbded38 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b Code: e8 f9 77 34 c9 8b 45 e0 8b 80 b8 00 00 00 39 45 d8 0f 84 bb 02 00 00 8b 45 e0 8b 80 b8 00 00 00 8d 50 d8 8b 08 89 55 f0 8b 50 04 <89> 51 04 89 0a c7 00 00 01 00 00 c7 40 04 00 02 00 00 8b 45 dc EIP: f2fs_submit_page_write+0x28d/0x550 [f2fs] SS:ESP: 0068:e863db74 CR2: 0000000000000104 ---[ end trace 4cac79c0d1305ee6 ]--- allocate_data_block will submit all sequential pending IOs sorted by a FIFO list, If we failed to submit other user's IO due to unaligned write, we will retry to allocate new block address for current IO, then it will initialize fio.list again, if fio was in the list before, it can break FIFO list, result in above panic. Thread A Thread B - do_write_page - allocate_data_block - list_add_tail : fioA cached in FIFO list. - do_write_page - allocate_data_block - list_add_tail : fioB cached in FIFO list. - f2fs_submit_page_write : fail to submit IO - allocate_data_block - INIT_LIST_HEAD - f2fs_submit_page_write - list_del <-- NULL pointer dereference This patch adds fio.retry parameter to indicate failure status for each IO, and avoid bailing out if there is still pending IO in FIFO list for fixing. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: fix to don't trigger writeback during recoveryChao Yu
- f2fs_fill_super - recover_fsync_data - recover_data - del_fsync_inode - iput - iput_final - write_inode_now - f2fs_write_inode - f2fs_balance_fs - f2fs_balance_fs_bg - sync_dirty_inodes With data_flush mount option, during recovery, in order to avoid entering above writeback flow, let's detect recovery status and do skip in f2fs_balance_fs_bg. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: clear discard_wake earlierSheng Yong
If SBI_NEED_FSCK is set, discard_wake will never be cleared. As a result, the condition of wait_event_interruptible_timeout() is always true, which gets discard thread run too frequently. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: let discard thread wait a little longer if dev is busyYunlei He
This patch modify discard thread wait policy as below: issued io_interrupted wait time(ms) 1. 8 0 50 2. (0,8) 1 50 3. 0 1 500 (dev is busy) 4. 0 0 60000 (no candidates) Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: avoid stucking GC due to atomic writeChao Yu
f2fs doesn't allow abuse on atomic write class interface, so except limiting in-mem pages' total memory usage capacity, we need to limit atomic-write usage as well when filesystem is seriously fragmented, otherwise we may run into infinite loop during foreground GC because target blocks in victim segment are belong to atomic opened file for long time. Now, we will detect failure due to atomic write in foreground GC, if the count exceeds threshold, we will drop all atomic written data in cache, by this, I expect it can keep our system running safely to prevent Dos attack. In addition, his patch adds to show GC skip information in debugfs, now it just shows count of skipped caused by atomic write. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: introduce sbi->gc_mode to determine the policyJaegeuk Kim
This is to avoid sbi->gc_thread pointer access. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: keep migration IO order in LFS modeChao Yu
For non-migration IO, we will keep order of data/node blocks' submitting as allocation sequence by sorting IOs in per log io_list list, but for migration IO, it could be out-of-order. In LFS mode, we should keep all IOs including migration IO be ordered, so that this patch fixes to add an additional lock to keep submitting order. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: fix to wait page writeback during revoking atomic writeChao Yu
After revoking atomic write, related LBA can be reused by others, so we need to wait page writeback before reusing the LBA, in order to avoid interference between old atomic written in-flight IO and new IO. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: clean up with is_valid_blkaddr()Chao Yu
- rename is_valid_blkaddr() to is_valid_meta_blkaddr() for readability. - introduce is_valid_blkaddr() for cleanup. No logic change in this patch. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: fix to initialize min_mtime with ULLONG_MAXChao Yu
Since sit_i.min_mtime's type is unsigned long long, so we should initialize it with max value of the type ULLONG_MAX instead of LLONG_MAX. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: treat volatile file's data as hot oneChao Yu
Volatile file's data will be updated oftenly, so it'd better to place its data into hot data segment. In addition, for atomic file, we change to check FI_ATOMIC_FILE instead of FI_HOT_DATA to make code readability better. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: introduce release_discard_addr() for cleanupChao Yu
Introduce release_discard_addr() to include common codes for cleanup. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Fengguang Wu: declare static function, reported by kbuild test robot] Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: fix potential overflowChao Yu
In build_sit_entries(), if valid_blocks in SIT block is smaller than valid_blocks in journal, for below calculation: sbi->discard_blks += old_valid_blocks - se->valid_blocks; There will be two times potential overflow: - old_valid_blocks - se->valid_blocks will overflow, and be a very large number. - sbi->discard_blks += result will overflow again, comes out a correct result accidently. Anyway, it should be fixed. Fixes: d600af236da5 ("f2fs: avoid unneeded loop in build_sit_entries") Fixes: 1f43e2ad7bff ("f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: sanity check for total valid node blocksJaegeuk Kim
This patch enhances sanity check for SIT entries. syzbot hit the following crash on upstream commit 83beed7b2b26f232d782127792dd0cd4362fdc41 (Fri Apr 20 17:56:32 2018 +0000) Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=bf9253040425feb155ad syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=5692130282438656 Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5095924598571008 Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118 compiler: gcc (GCC) 8.0.1 20180413 (experimental) IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+bf9253040425feb155ad@syzkaller.appspotmail.com It will help syzbot understand when the bug is fixed. See footer for details. If you forward the report, please keep this part and the footer. F2FS-fs (loop0): invalid crc value F2FS-fs (loop0): Try to recover 1th superblock, ret: 0 F2FS-fs (loop0): Mounted with checkpoint version = d F2FS-fs (loop0): Bitmap was wrongly cleared, blk:9740 ------------[ cut here ]------------ kernel BUG at fs/f2fs/segment.c:1884! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 4508 Comm: syz-executor0 Not tainted 4.17.0-rc1+ #10 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:update_sit_entry+0x1215/0x1590 fs/f2fs/segment.c:1882 RSP: 0018:ffff8801af526708 EFLAGS: 00010282 RAX: ffffed0035ea4cc0 RBX: ffff8801ad454f90 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff82eeb87e RDI: ffffed0035ea4cb6 RBP: ffff8801af526760 R08: ffff8801ad4a2480 R09: ffffed003b5e4f90 R10: ffffed003b5e4f90 R11: ffff8801daf27c87 R12: ffff8801adb8d380 R13: 0000000000000001 R14: 0000000000000008 R15: 00000000ffffffff FS: 00000000014af940(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f06bc223000 CR3: 00000001adb02000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: allocate_data_block+0x66f/0x2050 fs/f2fs/segment.c:2663 do_write_page+0x105/0x1b0 fs/f2fs/segment.c:2727 write_node_page+0x129/0x350 fs/f2fs/segment.c:2770 __write_node_page+0x7da/0x1370 fs/f2fs/node.c:1398 sync_node_pages+0x18cf/0x1eb0 fs/f2fs/node.c:1652 block_operations+0x429/0xa60 fs/f2fs/checkpoint.c:1088 write_checkpoint+0x3ba/0x5380 fs/f2fs/checkpoint.c:1405 f2fs_sync_fs+0x2fb/0x6a0 fs/f2fs/super.c:1077 __sync_filesystem fs/sync.c:39 [inline] sync_filesystem+0x265/0x310 fs/sync.c:67 generic_shutdown_super+0xd7/0x520 fs/super.c:429 kill_block_super+0xa4/0x100 fs/super.c:1191 kill_f2fs_super+0x9f/0xd0 fs/f2fs/super.c:3030 deactivate_locked_super+0x97/0x100 fs/super.c:316 deactivate_super+0x188/0x1b0 fs/super.c:347 cleanup_mnt+0xbf/0x160 fs/namespace.c:1174 __cleanup_mnt+0x16/0x20 fs/namespace.c:1181 task_work_run+0x1e4/0x290 kernel/task_work.c:113 tracehook_notify_resume include/linux/tracehook.h:191 [inline] exit_to_usermode_loop+0x2bd/0x310 arch/x86/entry/common.c:166 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline] syscall_return_slowpath arch/x86/entry/common.c:265 [inline] do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457d97 RSP: 002b:00007ffd46f9c8e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000457d97 RDX: 00000000014b09a3 RSI: 0000000000000002 RDI: 00007ffd46f9da50 RBP: 00007ffd46f9da50 R08: 0000000000000000 R09: 0000000000000009 R10: 0000000000000005 R11: 0000000000000246 R12: 00000000014b0940 R13: 0000000000000000 R14: 0000000000000002 R15: 000000000000658e RIP: update_sit_entry+0x1215/0x1590 fs/f2fs/segment.c:1882 RSP: ffff8801af526708 ---[ end trace f498328bb02610a2 ]--- Reported-and-tested-by: syzbot+bf9253040425feb155ad@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+7d6d31d3bc702f566ce3@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+0a725420475916460f12@syzkaller.appspotmail.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: sanity check on sit entryJaegeuk Kim
syzbot hit the following crash on upstream commit 87ef12027b9b1dd0e0b12cf311fbcb19f9d92539 (Wed Apr 18 19:48:17 2018 +0000) Merge tag 'ceph-for-4.17-rc2' of git://github.com/ceph/ceph-client syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=83699adeb2d13579c31e C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5805208181407744 syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=6005073343676416 Raw console output: https://syzkaller.appspot.com/x/log.txt?id=6555047731134464 Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118 compiler: gcc (GCC) 8.0.1 20180413 (experimental) IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+83699adeb2d13579c31e@syzkaller.appspotmail.com It will help syzbot understand when the bug is fixed. See footer for details. If you forward the report, please keep this part and the footer. F2FS-fs (loop0): Magic Mismatch, valid(0xf2f52010) - read(0x0) F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock F2FS-fs (loop0): invalid crc value BUG: unable to handle kernel paging request at ffffed006b2a50c0 PGD 21ffee067 P4D 21ffee067 PUD 21fbeb067 PMD 0 Oops: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 4514 Comm: syzkaller989480 Not tainted 4.17.0-rc1+ #8 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:build_sit_entries fs/f2fs/segment.c:3653 [inline] RIP: 0010:build_segment_manager+0x7ef7/0xbf70 fs/f2fs/segment.c:3852 RSP: 0018:ffff8801b102e5b0 EFLAGS: 00010a06 RAX: 1ffff1006b2a50c0 RBX: 0000000000000004 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8801ac74243e RBP: ffff8801b102f410 R08: ffff8801acbd46c0 R09: fffffbfff14d9af8 R10: fffffbfff14d9af8 R11: ffff8801acbd46c0 R12: ffff8801ac742a80 R13: ffff8801d9519100 R14: dffffc0000000000 R15: ffff880359528600 FS: 0000000001e04880(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffed006b2a50c0 CR3: 00000001ac6ac000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: f2fs_fill_super+0x4095/0x7bf0 fs/f2fs/super.c:2803 mount_bdev+0x30c/0x3e0 fs/super.c:1165 f2fs_mount+0x34/0x40 fs/f2fs/super.c:3020 mount_fs+0xae/0x328 fs/super.c:1268 vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037 vfs_kern_mount fs/namespace.c:1027 [inline] do_new_mount fs/namespace.c:2517 [inline] do_mount+0x564/0x3070 fs/namespace.c:2847 ksys_mount+0x12d/0x140 fs/namespace.c:3063 __do_sys_mount fs/namespace.c:3077 [inline] __se_sys_mount fs/namespace.c:3074 [inline] __x64_sys_mount+0xbe/0x150 fs/namespace.c:3074 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x443d6a RSP: 002b:00007ffd312813c8 EFLAGS: 00000297 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 0000000020000c00 RCX: 0000000000443d6a RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffd312813d0 RBP: 0000000000000003 R08: 0000000020016a00 R09: 000000000000000a R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000004 R13: 0000000000402c60 R14: 0000000000000000 R15: 0000000000000000 RIP: build_sit_entries fs/f2fs/segment.c:3653 [inline] RSP: ffff8801b102e5b0 RIP: build_segment_manager+0x7ef7/0xbf70 fs/f2fs/segment.c:3852 RSP: ffff8801b102e5b0 CR2: ffffed006b2a50c0 ---[ end trace a2034989e196ff17 ]--- Reported-and-tested-by: syzbot+83699adeb2d13579c31e@syzkaller.appspotmail.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: clean up commit_inmem_pages()Chao Yu
This patch moves error handling from commit_inmem_pages() into __commit_inmem_page() for cleanup, no logic change. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: stop issue discard if something wrong with f2fsYunlei He
v4->v5: move data corruption check to __submit_discard_cmd, in order to control discard io submitted more accurately, besides, increase async thread wait time if data corruption detected. This patch stop async thread and umount process to issue discard if something wrong with f2fs, which is similar to fstrim. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: check if inmem_pages list is empty correctlySheng Yong
`cur' will never be NULL, we should check inmem_pages list instead. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: check cur_valid_map_mir & raw_sit block count when flush sit entriesZhikang Zhang
We should check valid_map_mir and block count to ensure the flushed raw_sit is correct. Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com> Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: correct return value of f2fs_trim_fsChao Yu
Correct return value in two cases: - return EINVAL if end boundary is out-of-range. - return EIO if fs needs off-line check. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: don't use GFP_ZERO for page cachesChao Yu
Related to https://lkml.org/lkml/2018/4/8/661 Sometimes, we need to write meta data to new allocated block address, then we will allocate a zeroed page in inner inode's address space, and fill partial data in it, and leave other place with zero value which means some fields are initial status. There are two inner inodes (meta inode and node inode) setting __GFP_ZERO, I have just checked them, for both of them, we can avoid using __GFP_ZERO, and do initialization by ourselves to avoid unneeded/redundant zeroing from mm. Cc: <stable@vger.kernel.org> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: issue all big range discards in umount processYunlei He
This patch modify max_requests to UINT_MAX, to issue all big range discards in umount. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: run fstrim asynchronously if runtime discard is onJaegeuk Kim
We don't need to wait for whole bunch of discard candidates in fstrim, since runtime discard will issue them in idle time. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30f2fs: turn down IO priority of discard from backgroundChao Yu
In order to avoid interfering normal r/w IO, let's turn down IO priority of discard issued from background. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30f2fs: don't split checkpoint in fstrimChao Yu
Now, we issue discard asynchronously in separated thread instead of in checkpoint, after that, we won't encounter long latency in checkpoint due to huge number of synchronous discard command handling, so, we don't need to split checkpoint to do trim in batch, merge it and obsolete related sysfs entry. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30f2fs: issue discard commands proactively in high fs utilizationJaegeuk Kim
In the high utilization like over 80%, we don't expect huge # of large discard commands, but do many small pending discards which affects FTL GCs a lot. Let's issue them in that case. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>