Was doing some passive recon on bentleymotors.com and stumbled onto a subdomain I hadn't seen before, id.bentleymotors.com. Running NGINX, clearly some kind of identity / login portal. Looked custom-built, which is always interesting because custom auth flows tend to have weird edges.
The first thing that caught my eye was the CSP header. Buried in img-src there was a wildcard pointing at https://*.vwgroup.io. That's a pretty strong signal this isn't actually a Bentley-owned stack. It's VW Group's shared "Identity Kit" platform that Bentley just happens to be a tenant on. Filed that away for later because it ends up mattering.
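To make the CSP observation concrete, here is a minimal sketch of pulling wildcard sources out of one directive. The function name and the sample header are my own illustrations; the header is not the literal value the portal returned, only the `https://*.vwgroup.io` entry under img-src comes from the recon above.

```python
def csp_wildcard_hosts(csp_header: str, directive: str = "img-src") -> list[str]:
    """Return the wildcard host sources listed under one CSP directive."""
    for part in csp_header.split(";"):
        tokens = part.split()
        # The first token of each segment is the directive name.
        if tokens and tokens[0].lower() == directive:
            return [t for t in tokens[1:] if "*." in t]
    return []


# Illustrative header (assumption), shaped like a typical CSP value.
SAMPLE_CSP = (
    "default-src 'self'; "
    "img-src 'self' data: https://*.vwgroup.io; "
    "script-src 'self'"
)

wildcards = csp_wildcard_hosts(SAMPLE_CSP)
print(wildcards)
```

Wildcard sources are worth a second look because they tend to name the parent organisation or platform vendor rather than the brand on the login page.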
Started mapping routes. Most of them behaved. Hit /api/profile unauthenticated, got bounced with a 401, no big surprise. Then I tried /landing-page.
200 OK. Full internal application shell. HTML, JS routing, the API map, all of it, served to me with no session, no token, nothing. The frontend auth guard was clearly doing its job client-side, but the server wasn't enforcing anything on this route. Classic case of trusting the SPA to handle auth.
Remembering the VW Group CSP tie-in, I pointed the same request at id.vwgroup.io. Identical behavior, same route, same 200, same dump.
So this isn't a Bentley misconfiguration. It's a misconfiguration in the underlying Identity Kit platform that every VW Group tenant is inheriting. That raised the scope question real fast.
While poking at authenticated routes, I noticed something weirder. I hit the OAuth callback with a bogus redirect and looked closely at the 401 that came back.
Two things wrong here. First, the www-authenticate header tells me exactly where the app expects the access token to live ("in cookies"). That's free intel for an attacker: no need to guess where to stuff a stolen token.
Second, and this is the weird one, a request that gets rejected with a 401 is still mutating state. The server issues a valid session cookie (identity-kit-profile-session) on a request it just refused. Error responses shouldn't be writing session state. At minimum it's sloppy; at worst it could be abused, depending on what downstream code trusts that cookie's presence.
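The "error responses shouldn't write session state" rule is easy to turn into a check. A minimal sketch, assuming you already have a status code and a list of response headers from whatever HTTP client you use. The function name and the sample header values are placeholders of mine; only the cookie name comes from the finding.

```python
from http.cookies import SimpleCookie


def cookies_set_on_error(status: int, headers: list[tuple[str, str]]) -> list[str]:
    """Return names of cookies that a 4xx/5xx response tries to set."""
    if status < 400:
        return []
    names: list[str] = []
    for key, value in headers:
        if key.lower() == "set-cookie":
            jar = SimpleCookie()
            jar.load(value)  # parse one Set-Cookie header value
            names.extend(jar.keys())
    return names


# Placeholder response (assumption): the cookie value and attributes are made up.
sample_headers = [
    ("Content-Type", "application/json"),
    ("Set-Cookie", "identity-kit-profile-session=PLACEHOLDER; Path=/; HttpOnly; Secure"),
]

print(cookies_set_on_error(401, sample_headers))
```

Anything this returns for a 401 is worth a closer look, since a rejected request has no business leaving new session state behind.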
The moment I confirmed this was VW Group shared infrastructure and not just Bentley, I stopped. Bentley's bug bounty scope is Bentley's IT systems, and going further on a platform that handles identity for the entire VW Group would've blown past that line. Reported both findings to Bentley, noted the VW Group tie-in, and let them handle escalation internally.
As of writing, both issues are still unpatched. The landing page route still serves the internal app shell to unauthenticated users, and the 401 response still issues a session cookie and discloses where the access token lives.
Was hunting for LPE primitives that don't need setuid. Spent a while on race conditions in the VFS layer, kept hitting dead ends, and pivoted to logic bugs in the idmapping chain. The idmapped mount feature is relatively new and the interaction with OverlayFS felt underexplored, so I started reading fs/overlayfs/inode.c.
The setup that matters: a host running a container with a user namespace, where container UID 0 maps to something like host UID 100000. Inside that container, you mount an idmapped mount that does the UID translation, and then you stack OverlayFS on top of it. The lower layer is the idmapped mount, the upper layer is container-local. The kernel is supposed to use the idmap for every permission check on the OverlayFS files so the container's UID 0 never gets confused with host UID 0.
The functions in inode.c receive the correct idmap argument from the VFS, then throw it away and use &nop_mnt_idmap instead. nop_mnt_idmap is the identity mapping, meaning no UID/GID translation happens at all.
So every permission check on OverlayFS files goes through the identity map. Container UID 0, which the host is supposed to translate to UID 100000, never gets translated. It's just UID 0 all the way down.
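To see why the identity map matters, here is a toy model of the translation. This is not kernel code: the `IdMapRange` type, the function name and the 65536-ID range are assumptions for illustration. Only the 0 → 100000 mapping and the "identity means no translation" behaviour come from the description above.

```python
from typing import NamedTuple, Optional


class IdMapRange(NamedTuple):
    inside_start: int   # first ID as seen inside the container
    outside_start: int  # first ID as seen on the host
    count: int          # number of consecutive IDs covered


def map_to_host(uid: int, ranges: list[IdMapRange]) -> Optional[int]:
    """Translate a container UID to a host UID, or None if it is unmapped."""
    for r in ranges:
        if r.inside_start <= uid < r.inside_start + r.count:
            return r.outside_start + (uid - r.inside_start)
    return None


# Mapping the container is supposed to get (range size is an assumption).
CONTAINER_MAP = [IdMapRange(0, 100000, 65536)]
# Toy stand-in for the identity mapping: every ID maps to itself.
IDENTITY_MAP = [IdMapRange(0, 0, 2**32 - 1)]

print(map_to_host(0, CONTAINER_MAP))  # translated
print(map_to_host(0, IDENTITY_MAP))   # untranslated
```

With the real map, container UID 0 lands on an unprivileged host UID. With the identity map it stays 0, which is the confusion the rest of this write-up is about.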
The attack chain is straightforward once the bug is clear. Inside the container, you create a file through the OverlayFS mount. Then you chmod or chown it. Because ovl_setattr uses nop_mnt_idmap, the ownership change happens in the host's UID space directly, with no translation. Container UID 0 becomes host UID 0. The permission check passes because, as far as the kernel is concerned, you ARE UID 0 on the host.
No setuid file involved. No race condition. No kernel address leak needed. The whole thing is a clean logic bug.
A minimal PoC looks like this:
While auditing around the same area I found a handful of other permission gaps. None of them are as clean as the OverlayFS one, but they're worth noting.
lookup_noperm and friends do a dcache lookup with no inode_permission() call. They're exported to filesystem modules. A malicious or buggy module can use them to walk paths without DAC checks. Severity is high but you need a filesystem module, which limits the attack surface.
Opening a file with O_PATH sets f_op = &empty_fops and returns early, before security_file_open() and fsnotify_open_perm() get called. You can then reopen the fd via /proc/self/fd/ and the LSM never sees the original open. Medium severity, but useful for flying under monitoring.
Setting xattrs with the security. or system. prefix skips the inode_permission() call entirely. You can write security xattrs without proper DAC checks. Medium severity.
If ATTR_FORCE is set in the iattr mask, the code jumps straight to kill_priv and skips every permission check. Internal callers or buggy filesystems can use this to bypass chown/chmod checks entirely. Medium, internal-triggered.
Two more threads I want to pull on but haven't confirmed yet.
FUSE as OverlayFS lower layer. If the kernel allows a FUSE filesystem as the lower layer of an OverlayFS mount, you could craft a FUSE fs that returns files owned by the attacker with capability xattrs set. When OverlayFS does copy-up, the upper layer ends up with attacker-owned files with capabilities. That would be critical if it works. I haven't verified whether the kernel actually allows FUSE as a lower layer yet.
Idmapped mount edge cases. When a mapping fails, the kernel falls back to INVALID_UID, which is (uid_t)-1 and shows up in userspace as the overflow UID (65534 / nobody by default). That fallback could grant access to files owned by unmapped UIDs. There's also a potential capability namespace mismatch in capable_wrt_inode_uidgid(), which uses mnt_userns while the inode UID might be in a different namespace. Nested namespaces with overlapping mappings could cause privilege confusion. All of these need verification.
| vulnerability | severity | exploitable | works without setuid | priority |
|---|---|---|---|---|
| nop_mnt_idmap | critical | yes | yes | #1 |
| FUSE lower layer | critical | tbd | yes | #2 |
| lookup_noperm | high | needs module | yes | #3 |
| idmapped mount bugs | high | tbd | yes | #4 |
| O_PATH bypass | med | yes | yes | #5 |
| xattr DAC skip | med | yes | yes | #6 |
| ATTR_FORCE bypass | med | internal | yes | #7 |
The fix for the main bug is a one-liner per call site. Pass the idmap that was already passed in.
The same fix applies in ovl_permission() at line 309 and ovl_set_acl() at lines 537 and 546. Reported and patched upstream.
Was looking at the futex subsystem on Linux 6.12.74+deb13+1-amd64. Originally chasing race conditions in the private hash code, but CONFIG_FUTEX_PRIVATE_HASH isn't compiled into this kernel and the FUTEX2_NUMA TOCTOU path isn't implemented. Dead ends. Pivoted to the robust futex list, which is older code and gets walked every time a thread exits.
The relevant path is exit_robust_list() calling into handle_futex_death(). The list head lives in userspace. The kernel reads it, walks the entries, and for each entry computes the futex address as entry + futex_offset.
futex_offset is a signed 64-bit long read straight from userspace with get_user. No bounds check, no VMA validation, no magnitude check, no verification that the computed address is inside the process's own memory. Positive offset goes forward from entry, negative offset goes backward, full 64-bit range.
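The address arithmetic is worth spelling out, since the sign handling is the whole point. A minimal model in plain Python, not kernel code: the helper names and the example addresses are made up for illustration.

```python
import struct

MASK64 = (1 << 64) - 1


def as_signed64(raw: int) -> int:
    """Reinterpret a raw 64-bit pattern as a signed long."""
    return struct.unpack("<q", struct.pack("<Q", raw & MASK64))[0]


def futex_address(entry: int, futex_offset: int) -> int:
    """Model entry + futex_offset with 64-bit wraparound."""
    return (entry + futex_offset) & MASK64


entry = 0x7F0000001000   # hypothetical list entry address
target = 0x7F0000000000  # hypothetical address below the entry

offset = target - entry  # negative: the walk goes backwards
print(offset, hex(futex_address(entry, offset)))
print(as_signed64(0xFFFFFFFFFFFFF000))  # the same offset as a raw 64-bit pattern
```

A negative offset is just a large unsigned pattern reinterpreted as signed, so any address is one subtraction away from any entry.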
The compat path (compat_exit_robust_list()) has the same bug, with the added weirdness that 32-bit compat mode truncates pointers, so the address calculation can land somewhere unexpected. Same root cause, slightly different shape.
Inside handle_futex_death(), the kernel does a get_user on the computed address. That's an arbitrary userspace read. Then it masks the value with FUTEX_TID_MASK (0x3fffffff) and checks whether it equals the dying thread's TID. If it matches, it does a cmpxchg that sets bit 30 (FUTEX_OWNER_DIED).
So the primitive is: read any userspace address, and if you can place the dying thread's TID in the low 30 bits at that address, set bit 30 on it. There's also a secondary path: if FUTEX_WAITERS (bit 31) is set and it's not a PI futex, the kernel calls futex_wake() on the arbitrary address. That gives you a cross-process futex wake primitive on addresses the dying task never actually held.
| capability | details |
|---|---|
| read | arbitrary userspace read via get_user() |
| write condition | (value & 0x3fffffff) == dying_thread_tid |
| write effect | value = (value & 0x80000000) | 0x40000000 |
| bit modified | bit 30 (OWNER_DIED) |
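The write condition and write effect rows in the table reduce to a few lines of bit arithmetic. This is a model of the value transformation only, not the kernel function; the constant values are the standard futex bit definitions, and the function name is mine.

```python
FUTEX_WAITERS = 0x80000000     # bit 31
FUTEX_OWNER_DIED = 0x40000000  # bit 30
FUTEX_TID_MASK = 0x3FFFFFFF    # low 30 bits


def futex_word_after_owner_death(value: int, dying_tid: int) -> int:
    """Return the futex word after the robust-list walk has looked at it."""
    if (value & FUTEX_TID_MASK) != dying_tid:
        return value  # TID mismatch: the word is left alone
    # Keep only the waiters bit, then set OWNER_DIED.
    return (value & FUTEX_WAITERS) | FUTEX_OWNER_DIED


print(hex(futex_word_after_owner_death(1234, 1234)))
print(hex(futex_word_after_owner_death(FUTEX_WAITERS | 1234, 1234)))
print(hex(futex_word_after_owner_death(1235, 1234)))
```

Note that the TID bits are cleared in the process: the result is always 0x40000000 or 0xc0000000, never the old value with one extra bit set.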
A few scenarios that look interesting.
Cross-process futex wake. Set futex_offset to point at a futex word another process is waiting on. Since get_user only reaches the dying task's own address space, that word has to live in memory both processes map, a shared mapping for example. Place the dying thread's TID there. On exit, the kernel sets OWNER_DIED and triggers futex_wake() on that address. You've now woken waiters on a futex the dying task never held. That's state confusion in the target process, potentially useful for locking bugs.
Info leak via fault handling. The fault_in_user_writeable() path calls fixup_user_fault() with FAULT_FLAG_WRITE. That can allocate new pages during the exit path and may trigger COW on read-only mappings. Page table modifications during exit are inherently sketchy.
setuid state confusion. Find a setuid binary that uses robust futexes. Manipulate the robust list before exec(). When the thread exits, the kernel writes OWNER_DIED to a controlled address. Confuse the setuid binary's locking state. This is the most interesting one for actual privilege escalation but needs the right target binary.
The primitive has real limits. The write only fires if the value at the target address has the dying thread's TID in the low 30 bits, though since you control the memory layout you can just place the right TID there. The write only sets bit 30 and, as a side effect, clears the TID bits, so it's nowhere near a full write. Kernel addresses fault, so you're stuck in userspace. And ROBUST_LIST_LIMIT caps you at 2048 iterations, which limits how many addresses you can hit in one exit.
| limitation | impact |
|---|---|
| must match TID | attacker controls memory, can place correct TID |
| only sets bit 30 | limited write primitive |
| needs valid userspace addr | kernel addresses will fault |
| ROBUST_LIST_LIMIT=2048 | limited iterations per exit |
The shape of the exploit is simple. mmap a target page and set up a single-entry robust list where futex_offset is calculated so that entry + futex_offset lands on the target. Place the exiting thread's TID at the target (for the main thread that's the same number as the PID) so the TID check passes. Register the robust list with set_robust_list. Exit. The kernel walks the list, reads the target, matches the TID, sets bit 30.
Next step is finding a structure in userspace memory where flipping bit 30 actually means something, since kernel addresses fault. Permission bits, lock state bits, refcount bits if bit 30 happens to be significant. That's the hard part and it's where the real work is. The primitive itself is confirmed.
As of the kernel I tested, none of the four checks (bounds on futex_offset, VMA validation, a process boundary check, a maximum magnitude check) is present in the code. The bug is unpatched. Coordinating disclosure.