Compare commits

..

161 Commits

Author SHA1 Message Date
Masamichi Takagi
8f117cc0dc configure.ac: Update version number to 1.5.1-knl+hfi
Change-Id: Icbd08c9c5f65b22d007ec479a34acd20062e0e90
2019-05-14 17:22:33 +09:00
Balazs Gerofi
0b9a657a01 HFI: support IFS 10.8-0
Change-Id: Iebc0e2b50faf464efcc5134cc40dc52e0bd6eea7
2019-04-15 11:26:39 +09:00
Balazs Gerofi
c2d6651cd2 mcreboot: remove MCDRAM offline/online
Change-Id: Ia30180b4890508d041fc64ca35e1a9c58d903ddf
2019-04-15 11:26:39 +09:00
Dominique Martinet
d979444049 file_ops: add missing break statement (harmless)
Change-Id: I97982c96623b571d94348fd4a3df6bb0aeb515e9
2018-07-26 05:06:16 +00:00
Masamichi Takagi
faa357d5a6 Merge "configure.ac: Update version number to 1.5.0-knl+hfi" into development+unimap+hfi+OFP 2018-06-21 02:39:43 +00:00
Balazs Gerofi
653aba17a1 mcreboot: load kernel modules from under /tmp
Change-Id: I81a8c451b6dd556a00699a2c8d0c7df5a99e4ea2
2018-06-20 20:53:00 +09:00
Balazs Gerofi
7736e25ca4 mpimcexec: fix empty ${COMMAND} check
Change-Id: I9e37e952fb756a4aafb4b2e218844120fe59af7b
2018-06-20 20:50:33 +09:00
Masamichi Takagi
73d16a9d79 configure.ac: Update version number to 1.5.0-knl+hfi
Change-Id: I9d36bcfe4b64a772f6492e39a1466a2e73ddd682
2018-06-20 17:07:30 +09:00
Balazs Gerofi
922bd7e6eb mpimcexec: use PJM_PROC_BY_NODE if available
Change-Id: Id8991f78e4d3bdfbb20adf202b43762a0d915c47
2018-06-20 15:18:53 +09:00
Balazs Gerofi
0d99072109 mpimcexec: man page proof-reading
Change-Id: I58223dd86e17fa896fe3e258d2dc2e5b881a0072
2018-06-18 16:31:42 +09:00
Yutaka Ishikawa
3ced3f6080 mcexec: Options -m and -M are described in man page
Change-Id: Ie4a860c8753af654ee842b16aabb9620e68f71a1
2018-06-18 15:00:29 +09:00
Yutaka Ishikawa
d9ff940528 mpimcexec: Man page
Change-Id: I99ea2821500cc1cfadc912d93c88d308b92ed9cf
2018-06-18 14:59:40 +09:00
Yutaka Ishikawa
cd63ec877d mpimcexec: Error handling is added
Change-Id: Id4e94adad2afff324b154d0c8be270ecc7568bab
2018-06-18 14:59:18 +09:00
Masamichi Takagi
6c0bb9e576 HFI1: Range-check proc->fd_priv_table[]
sockioctl01.c in LTP calls ioctl(1025, ...) and causes a kernel page fault without
the range check.

Change-Id: I4117783e20107f274c0857b09745f12a5cc5ce2f
2018-06-13 00:31:44 +09:00
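A minimal sketch of the kind of range check this commit describes, assuming a fixed-size per-process private-data table indexed by fd; the table size and names below are illustrative, not the actual McKernel definitions:

    /* Hypothetical sketch: reject out-of-range fds before indexing a
     * fixed-size private-data table, so ioctl(1025, ...) cannot walk
     * off the end of the array. */
    #define FD_PRIV_TABLE_SIZE 1024            /* assumed table size */

    struct fd_priv;
    static struct fd_priv *fd_priv_table[FD_PRIV_TABLE_SIZE];

    static struct fd_priv *lookup_fd_priv(int fd)
    {
        if (fd < 0 || fd >= FD_PRIV_TABLE_SIZE)
            return NULL;                       /* out of range: no private data */
        return fd_priv_table[fd];
    }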
Balazs Gerofi
ca9894108b OFP: mpimcexec: use MPI_LOCALNRANKS for ppn if available 2018-06-13 00:31:44 +09:00
Masamichi Takagi
3f26e44f85 mremap: Don't premap destination vm_range
mremap works in the following steps:
(1) Unmap the destination memory area
(2) Create a new vm_range with add_process_memory_range
(3) Move the PTEs of the source range to the destination range by using move_pte_range

The problem is that step (3) expects that the destination doesn't have any physical pages,
but step (2) premaps the destination when the optimization of premapping anonymous
maps is turned on.

Change-Id: Ieeebd799b7169b9a6f6f658c204c31f49817030f
2018-06-13 00:31:44 +09:00
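A rough sketch of the flow described above, with premapping suppressed in step (2); the helper names follow the commit message but the types and signatures here are invented for illustration, not the real McKernel API:

    #include <stddef.h>

    struct process_vm;                                    /* opaque here */
    int do_munmap(struct process_vm *vm, unsigned long addr, size_t len);
    int add_process_memory_range(struct process_vm *vm, unsigned long addr,
                                 size_t len, int premap);
    int move_pte_range(struct process_vm *vm, unsigned long src,
                       unsigned long dst, size_t len);

    int mremap_move_range(struct process_vm *vm,
                          unsigned long src, unsigned long dst, size_t len)
    {
        int error;

        /* (1) Unmap whatever currently occupies the destination. */
        error = do_munmap(vm, dst, len);
        if (error)
            return error;

        /* (2) Create the destination vm_range WITHOUT premapping it, even
         *     when the anonymous-map premap optimization is on, because
         *     step (3) expects the destination PTEs to be empty. */
        error = add_process_memory_range(vm, dst, len, /* premap = */ 0);
        if (error)
            return error;

        /* (3) Move the source PTEs over to the still-empty destination. */
        return move_pte_range(vm, src, dst, len);
    }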
Balazs Gerofi
bacfb0c2b9 OFP: mpimcexec wrapper around mpirun for OFP users 2018-06-13 00:31:43 +09:00
Balazs Gerofi
09f63483cc OFP: temporary ANON mmap() rewrite 2018-06-13 00:31:43 +09:00
Balazs Gerofi
2f0c2aae9e OFP: avoid drop_caches in mcreboot 2018-06-13 00:31:43 +09:00
Balazs Gerofi
f7b277a623 HFI1: use ihk_mc_pt_lookup_fault_pte() in SDMA/exp receive 2018-06-13 00:31:43 +09:00
Balazs Gerofi
a3aa96af19 MM: introduction of ihk_mc_pt_lookup_fault_pte() 2018-06-13 00:31:43 +09:00
Balazs Gerofi
91d732308d HFI1: shorten lock held spin for SDMA status changes 2018-06-13 00:31:43 +09:00
Balazs Gerofi
166c6105ef queued_spin_lock: fix compatibility with Linux 2018-06-13 00:31:43 +09:00
Balazs Gerofi
5a2f8388a6 HFI1: handle Linux queued_spin_locks in the receive path as well 2018-06-13 00:31:42 +09:00
Balazs Gerofi
8164b63fc2 HFI1: port to IFS 10.7 rpv1 and support queued_spin_lock in Linux 3.10.0-693.11.6 2018-06-13 00:31:42 +09:00
Balazs Gerofi
af22ce62d2 HFI1: clean up and eliminate dead code in user SDMA 2018-06-13 00:31:42 +09:00
Balazs Gerofi
2eca75ead8 HFI1: clean up dead code in file ops 2018-06-13 00:31:42 +09:00
Balazs Gerofi
22992780cf HFI1: use kmalloc_cache_free() in clear_tid_node() for TID nodes 2018-06-13 00:31:42 +09:00
Balazs Gerofi
3043591e9a hfi1_user_exp_rcv_overlapping(): fix return value when overlapping 2018-06-13 00:31:42 +09:00
Balazs Gerofi
7e7c0f9ed3 init_process_vm(): remove vm_range_numa_policy_list (merge fix) 2018-06-13 00:31:42 +09:00
Balazs Gerofi
7193f165cc HFI1: fix page border iteration bug in hfi1_user_exp_rcv_setup() 2018-06-13 00:31:42 +09:00
Balazs Gerofi
c8c42576fd HFI1: increase lock timeout in sdma_send_txlist() 2018-06-13 00:31:42 +09:00
Balazs Gerofi
0412e1fcc6 HFI1: add generated user_sdma_request and user_sdma_txreq headers 2018-06-13 00:31:41 +09:00
Balazs Gerofi
238e346586 HFI1: use DWARF generated headers for user_sdma_request and user_sdma_txreq 2018-06-13 00:31:41 +09:00
Balazs Gerofi
0e57c715ad HFI1: look at DW_AT_upper_bound for resolving array size from DWARF info 2018-06-13 00:31:41 +09:00
Balazs Gerofi
3facd3dcca HFI1: release lock in sdma_send_txlist() when SDMA ring is full 2018-06-13 00:31:41 +09:00
Balazs Gerofi
ec5328de69 HFI1: refactor sdma_select_user_engine() 2018-06-13 00:31:41 +09:00
Balazs Gerofi
880dd6ddb2 page_fault_handler(): enable on-demand mapping of Linux ioremap area 2018-06-13 00:31:41 +09:00
Balazs Gerofi
898708b8b4 spinlock: rewrite spinlock to use Linux ticket head/tail format 2018-06-13 00:31:41 +09:00
Balazs Gerofi
b08331b21a ihk_hfi1_common.h: use IRQ restore unlock in spin_unlock 2018-06-13 00:31:41 +09:00
Balazs Gerofi
c196c996dd HFI: add dd to generated sdma_engine 2018-06-13 00:31:41 +09:00
Balazs Gerofi
20e179f6dc sdma_select_user_engine(): refactor selection code 2018-06-13 00:31:40 +09:00
Balazs Gerofi
32fbc015f5 HFI1: eliminate lots of dead code 2018-06-13 00:31:40 +09:00
Balazs Gerofi
558c250bb3 HFI1: generate headers for sdma_state and sdma_engine structures 2018-06-13 00:31:40 +09:00
Balazs Gerofi
96ea2d3658 dwarf-extract: support enumerations 2018-06-13 00:31:40 +09:00
Balazs Gerofi
9c91298ccf do_munmap(): hook to HFI1 deferred unmap 2018-06-13 00:31:40 +09:00
Balazs Gerofi
b08da83a51 hfi1_file_ioctl(): execute HFI1_IOCTL_TID_INVAL_READ locally 2018-06-13 00:31:40 +09:00
Balazs Gerofi
fcc8310454 HFI1: track receive TIDs in a tree 2018-06-13 00:31:40 +09:00
Balazs Gerofi
96b8b30516 MM: facility for deferred munmap()
Conflicts:
	kernel/process.c
2018-06-13 00:31:40 +09:00
Balazs Gerofi
521e0dc707 HFI1: add a bunch of fields to hfi1_devdata and hfi1_filedata for receive TID handling, do necessary mappings in hfi1_map_device_addresses() 2018-06-13 00:31:40 +09:00
Balazs Gerofi
e2e773d883 HFI: fix tidinfo and length calculation in program_rcvarray() 2018-06-13 00:31:39 +09:00
Balazs Gerofi
04d22d90a3 do_mmap(): debug message cosmetics 2018-06-13 00:31:39 +09:00
Balazs Gerofi
f6405081a6 page_fault_handler(): map Linux ioremap addresses on demand (disabled) 2018-06-13 00:31:39 +09:00
Balazs Gerofi
5bea237581 HFI1: make kmalloc caches per-CPU and pre-allocate at boot time 2018-06-13 00:31:39 +09:00
Balazs Gerofi
33ad55e72b kmalloc_cache_prealloc(): specify nr_elems as argument 2018-06-13 00:31:39 +09:00
Balazs Gerofi
6848c2ecf7 HFI1: move tid_rb_node to header 2018-06-13 00:31:39 +09:00
Balazs Gerofi
79f9a2d31a HFI1: don't print at open() time 2018-06-13 00:31:39 +09:00
Balazs Gerofi
2900ce20f7 HFI1: hfi1_unmap_device_addresses() at process terminate time 2018-06-13 00:31:39 +09:00
Balazs Gerofi
002b78372d open(): ignore /proc/sys/vm/overcommit_memory 2018-06-13 00:31:38 +09:00
Dominique Martinet
5fce5e4e3c hfi1 generated headers: add missing filedata file 2018-06-13 00:31:38 +09:00
Balazs Gerofi
7a1ad31183 HFI: call hfi1_map_device_addresses() at initialization time
Conflicts:
	kernel/syscall.c
2018-06-13 00:31:38 +09:00
Dominique Martinet
54bdb3419d hfi1 generated headers:
- split headers into one file per struct
- add filedata
- fix s/modprobe/modinfo/ for guessed .ko path
2018-06-13 00:31:38 +09:00
Dominique Martinet
03fed4d1c8 automatically generate hfi structs from dwarf info 2018-06-13 00:31:38 +09:00
Dominique Martinet
6279f69f5c compiler.h: take in recent linux updates for newer gcc support
Had to remove from the original compiler-gcc:
 - things that deal with types, e.g. the READ_ONCE macro and friends;
 - #define barrier(). This one would be better put back there at some point.

hfi1: remove ACCESS_ONCE from hfi1 header
2018-06-13 00:31:38 +09:00
Balazs Gerofi
6959d5ead4 HFI: port to SFI driver version 10.5.1.0.2 2018-06-13 00:31:38 +09:00
Balazs Gerofi
a5aa68744f hfi1: use kmalloc_cache for tid_rb_node allocations 2018-06-13 00:31:38 +09:00
Balazs Gerofi
89c5aaa9e9 hfi1_user_exp_rcv_setup(): rewrite main loop 2018-06-13 00:31:37 +09:00
Balazs Gerofi
15422d886f hif1_file_ioctl(): use dkprintf() 2018-06-13 00:31:37 +09:00
Balazs Gerofi
f139bef0cb mmap(): remove force large page extension (meant to be RESET) 2018-06-13 00:31:37 +09:00
Dominique Martinet
de82cf8779 hfi1/user_exp_rcv/setup: keep track of position within page
ihk_mc_pt_lookup_pte + pte_get_phys will get us the physical address
for the start of the page we're looking at.
Re-offset it by the position within the buffer.
2018-06-13 00:31:37 +09:00
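In other words, a small sketch of the re-offsetting, assuming the page-base physical address and page size come from the PTE lookup (the helper name is made up):

    /* pte_get_phys()-style lookups return the physical address of the page
     * base, so add back the offset of vaddr inside that page. */
    static inline unsigned long phys_of_vaddr(unsigned long page_base_phys,
                                              unsigned long vaddr,
                                              unsigned long page_size)
    {
        return page_base_phys + (vaddr & (page_size - 1));
    }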
Dominique Martinet
662895c020 hfi1/user_exp_rcv: explicitly call hfi1_map_device_addresses
There were cases where nobody else did this mapping for us.
2018-06-13 00:31:37 +09:00
Dominique Martinet
d23939da8c process/vm: fix lookup_process_memory_range (again)
Optimistically going left was a more serious bug than just the
last iteration: we could pass right by a match and continue down
the tree if the match was not a leaf.

Fix the actual algorithm issue.

Conflicts:
	kernel/process.c
2018-06-13 00:31:37 +09:00
Dominique Martinet
67529f21ff hfi1: replace true/false defines by stddef include 2018-06-13 00:31:37 +09:00
Dominique Martinet
5c11ff0950 process/vm: fix lookup_process_memory_range with small start address
Cherry-picked from 6370520e

Conflicts:
	kernel/process.c
2018-06-13 00:31:37 +09:00
Dominique Martinet
ce4eb0d409 hfi1/user_exp_rcv/setup: add access_ok check 2018-06-13 00:31:36 +09:00
Dominique Martinet
04434320fc hfi1/user_exp_rcv/setup: do not skip over pages
If the vaddr we consider is not at the start of a page, we could skip
over (smaller, not contiguous) areas.

For example, consider this segment of virtual memory:
[ 2MB | 4k | 4k | ... ]
Starting at a 1MB offset, we would get a pgsize of 2MB and would skip
straight over 1MB worth of 4k pages.
2018-06-13 00:31:36 +09:00
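A sketch of the corrected advance, assuming the iteration knows the page size returned for the current vaddr (names are illustrative):

    /* Advance only to the end of the page that contains vaddr, not by a
     * full pgsize, so smaller neighbouring pages are never skipped.
     * In the example above: vaddr at a 1MB offset inside a 2MB page
     * yields a 1MB step, not a 2MB one. */
    static inline unsigned long step_within_page(unsigned long vaddr,
                                                 unsigned long pgsize,
                                                 unsigned long remaining)
    {
        unsigned long to_page_end = pgsize - (vaddr & (pgsize - 1));

        return remaining < to_page_end ? remaining : to_page_end;
    }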
Dominique Martinet
50fafa6d71 hfi1/user_exp_rcv/setup: use cache_alloc for tidlist 2018-06-13 00:31:36 +09:00
Dominique Martinet
f5ced648ef hfi1/user_exp_rcv: rework main loop
The new loop now takes into account pages that are not physically contiguous.
Also some minor improvements, e.g. making the spin_lock usage more local and
reusing a group when we already had one.
2018-06-13 00:31:36 +09:00
Dominique Martinet
0f8f88ca46 hfi1/user_exp_rcv/invalid: Remove function
user_exp_rcv_invalid is only used together with the mmu cache
(its purpose is the delayed freeing of tids that were invalidated in the cache).

Since we do not use that cache, the function can go.
2018-06-13 00:31:36 +09:00
Dominique Martinet
e99f19e812 hfi1/user_exp_rcv/setup: set length in tidinfo
This was dropped early on by mistake/excessive haste; it's actually
pretty useful.
2018-06-13 00:31:36 +09:00
Dominique Martinet
9a36e5d213 hfi1/user_exp_rcv/setup: increment phys appropriately
The old code was always registering the same section with different sizes,
instead of properly covering the requested mapping.
2018-06-13 00:31:36 +09:00
Dominique Martinet
4816f27639 hfi1/user_exp_rcv/setup: split into multiple tids
Do not round up to the next power of two, but issue multiple requests
if necessary (e.g. 260k becomes 256k + 4k in two registrations).
2018-06-13 00:31:36 +09:00
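A standalone sketch of that splitting rule; the helper name is made up, and for a 260k request it produces one 256k chunk followed by one 4k chunk:

    #include <stdio.h>
    #include <stddef.h>

    /* Split a registration into power-of-two multiples of page_size
     * instead of rounding the whole length up to the next power of two. */
    static void split_registration(size_t len, size_t page_size)
    {
        while (len) {
            size_t chunk = page_size;

            /* Largest power-of-two multiple of page_size that still fits. */
            while ((chunk << 1) <= len)
                chunk <<= 1;

            printf("register chunk of %zu bytes\n", chunk);
            len -= chunk;
        }
    }

    int main(void)
    {
        split_registration(260 * 1024, 4096);  /* 256k, then 4k */
        return 0;
    }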
Dominique Martinet
9c0b8aa812 mcctrl/control.c: fix debug print types 2018-06-13 00:31:36 +09:00
Dominique Martinet
23f178d718 hfi1/user_exp_rcv/clear: implement TID_FREE ioctl 2018-06-13 00:31:36 +09:00
Dominique Martinet
159c18b98b hfi1/ioctl: only forward ioctl if hfi1_file_ioctl didn't handle it
Conflicts:
	kernel/syscall.c
2018-06-13 00:31:35 +09:00
Dominique Martinet
1847a3ac11 hfi1/user_exp_rcv/setup: cleanup locks/groups usage 2018-06-13 00:31:35 +09:00
Dominique Martinet
15b16ffbbb hfi1/user_exp_rcv/setup: map is noop, skip it
In the original driver's dma.c, hfi1_dma_map_single just passes
the physical address back, so use that directly.
2018-06-13 00:31:35 +09:00
Dominique Martinet
e64d89cd48 hfi: bases for user_exp_rcv
This implements a skeleton setup function and calls it on ioctl.

Many points are still missing:
 - missing PCI mapping to make setup work
 - no clear operation (passed to Linux, so it will likely bug out)
 - missing locks/safeguards

Conflicts:
	kernel/Makefile.build.in
2018-06-13 00:31:35 +09:00
Dominique Martinet
7366da4390 Fix other warnings
Most were harmless, but the change from a volatile cast to ACCESS_ONCE
is probably useful.
Expanding the macro, we basically went from:
    m = (volatile struct sdma_vl_map *)dd->sdma_map;
to
    m = *(volatile struct sdma_vl_map **)&(dd->sdma_map);
i.e. the explicit lookup is at a different level.
2018-06-13 00:31:35 +09:00
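For reference, the Linux-style ACCESS_ONCE() being adopted here is essentially the classic compiler.h one-liner:

    /* Read x exactly once by going through a volatile-qualified lvalue. */
    #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))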
Dominique Martinet
2dc85ee417 user_sdma: fix use of uninitialized variable (vl)
This defines a single field in hfi1_pportdata, getting its offset
from the DWARF headers -- that needs to be computed at configure time.
2018-06-13 00:31:35 +09:00
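A sketch of what "a single field, offset from DWARF headers" means in practice; the macro name and the 0x100 value are placeholders for the constant that would be generated at configure time, not real driver definitions:

    /* Hypothetical: HFI1_PPD_VL_OFFSET stands in for a configure-time
     * generated offset into struct hfi1_pportdata. */
    #define HFI1_PPD_VL_OFFSET 0x100   /* placeholder value */

    static inline unsigned char ppd_read_vl(const void *ppd)
    {
        return *(const unsigned char *)((const char *)ppd + HFI1_PPD_VL_OFFSET);
    }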
Balazs Gerofi
73cc07f98e ioctl() investigation - TO RESET 2018-06-13 00:31:35 +09:00
Balazs Gerofi
815e2244ca HFI1: minor change of declarations 2018-06-13 00:31:34 +09:00
Balazs Gerofi
163af73554 HFI1: properly iterate iovecs according to underlying page sizes 2018-06-13 00:31:34 +09:00
Balazs Gerofi
fd316f3ca3 HFI1: pass per-CPU txreq_cache to user_sdma_send_pkts() 2018-06-13 00:31:34 +09:00
Balazs Gerofi
122588bc4d mcexec: --enable-hfi1 to runtime enable/disable HFI1 driver
Conflicts:
	executer/user/mcexec.c
2018-06-13 00:31:34 +09:00
Balazs Gerofi
70238982c2 HFI1: use embedded kmalloc cache for req->tids (fixes AllReduce hang) 2018-06-13 00:31:34 +09:00
Balazs Gerofi
5b5191ef64 HFI1: move txreq kmalloc cache header into CPU local variable 2018-06-13 00:31:34 +09:00
Balazs Gerofi
a65faeaed4 kmalloc cache: embed cache pointer into kmalloc_header
Conflicts:
	kernel/mem.c
2018-06-13 00:31:34 +09:00
Balazs Gerofi
4dea1842e0 kmalloc cache: embed cache pointer into kmalloc_header
Conflicts:
	kernel/mem.c
2018-06-13 00:31:34 +09:00
Balazs Gerofi
5353b11f90 HFI1: disable kmalloc cache for req->tids (AllReduce fails otherwise) 2018-06-13 00:31:34 +09:00
Balazs Gerofi
abdbf96254 HFI1: use process rank for SDMA engine selection 2018-06-13 00:31:33 +09:00
Balazs Gerofi
bd170e63ba kmalloc cache refactor and pre-alloc in HFI1 open() 2018-06-13 00:31:33 +09:00
Balazs Gerofi
d35fa16417 HFI1: more detailed profiling (disabled by default) 2018-06-13 00:31:33 +09:00
Balazs Gerofi
6406a0df6b HFI1: compute SDMA pkt length taking large pages into account 2018-06-13 00:31:33 +09:00
Balazs Gerofi
52e8f03b4b HFI1: store base physical address in iovec if physically contiguous 2018-06-13 00:31:33 +09:00
Balazs Gerofi
b071a3f32c HFI1: use fast_memcpy() in header fillings
Conflicts:
	kernel/user_sdma.c
2018-06-13 00:31:33 +09:00
Balazs Gerofi
90258f00bd HFI1: use generic kmalloc cache for user_sdma_txreqs and req tids 2018-06-13 00:31:33 +09:00
Balazs Gerofi
28eb649056 Generic lock-free kmalloc cache implementation
Conflicts:
	kernel/mem.c
2018-06-13 00:31:33 +09:00
Balazs Gerofi
744ebacf65 HFI1: more pre-allocation in txreq cache 2018-06-13 00:31:33 +09:00
Balazs Gerofi
62e438a0aa HFI1: do device ioremap() mappings in per-process fashion 2018-06-13 00:31:32 +09:00
Balazs Gerofi
5ac582a678 user_sdma_send_pkts(): unlikely() around slow path condition 2018-06-13 00:31:32 +09:00
Balazs Gerofi
51bc28acca sdma_select_user_engine(): hash on CPU number 2018-06-13 00:31:32 +09:00
Balazs Gerofi
c43654d69b user_sdma_send_pkts(): handle page sizes correctly 2018-06-13 00:31:32 +09:00
Aram Santogidis
c1d2db6a73 fixed sdma_vl_map, just in case it will be used in the future 2018-06-13 00:31:32 +09:00
Balazs Gerofi
aeef55d1b0 kmalloc(): try to get from remote_free list when regular is empty 2018-06-13 00:31:32 +09:00
Balazs Gerofi
6e289e8d9f HFI1: txreq cache and profiling 2018-06-13 00:31:32 +09:00
Balazs Gerofi
3b5363c533 HFI1: use original length calculation in sdma_send_pkts()
Conflicts:
	kernel/include/hfi1/sdma.h
2018-06-13 00:31:32 +09:00
Balazs Gerofi
60f6862db2 HFI1: use local write if private data is present; fix length alignment 2018-06-13 00:31:31 +09:00
Balazs Gerofi
39deff4e10 HFI1: working but a bit slow 2018-06-13 00:31:31 +09:00
Aram Santogidis
7f03c18d4d Real run test version (update_tail, kregbase+offset crash) 2018-06-13 00:31:31 +09:00
Aram Santogidis
640dba627f Added debugging output. Bugfixes in user_sdma_send_pkts() and sdma_send_txreq(). 2018-06-13 00:31:31 +09:00
Aram Santogidis
ae368d97d4 Implemented a replacement for sdma_txadd_page()
Conflicts:
	kernel/user_sdma.c
2018-06-13 00:31:31 +09:00
Balazs Gerofi
99c216d91e HFI1: fix kregbase/piobase types to avoid warnings 2018-06-13 00:31:31 +09:00
Balazs Gerofi
3c357dc30a HFI1: fix completion mapping 2018-06-13 00:31:31 +09:00
Balazs Gerofi
37866e61ab HFI1: map completion queues 2018-06-13 00:31:31 +09:00
Aram Santogidis
076e6b9b12 Enabled _sdma_txadd_daddr() 2018-06-13 00:31:30 +09:00
Aram Santogidis
fa6db686b4 Corrected spin_lock_irqsave() spin_unlock_irqrestore() definitions
Conflicts:
	kernel/include/hfi1/ihk_hfi1_common.h
2018-06-13 00:31:30 +09:00
Aram Santogidis
74a636a612 Updated structs to use completion{} and wait_queue_head_t{} and added struct size checks in hfi1_aio_write() 2018-06-13 00:31:30 +09:00
Aram Santogidis
1c4a6568e6 Updated sdma.h (fixed struct sdma_engine size) 2018-06-13 00:31:30 +09:00
Balazs Gerofi
7d2e2f93b0 HFI1: map piobase and rcvarray_wc 2018-06-13 00:31:30 +09:00
Aram Santogidis
7005110697 Updated and confirmed struct iowait{} and struct hfi1_user_sdma_pkt_q {}
Conflicts:
	kernel/include/hfi1/ihk_hfi1_common.h
2018-06-13 00:31:30 +09:00
Aram Santogidis
c4ca4ae3ab Updated struct hfi1_devdata and confirmed its size 2018-06-13 00:31:30 +09:00
Aram Santogidis
b024a486b9 Updated hfi1_filedata {} and confirmed its size against the original on Linux
Conflicts:
	kernel/include/hfi1/hfi.h
2018-06-13 00:31:30 +09:00
Aram Santogidis
fe4c461f2f Updated kcalloc/kmalloc calls and enabled sdma_select_user_engine dependencies
Conflicts:
	kernel/include/hfi1/ihk_hfi1_common.h
2018-06-13 00:31:29 +09:00
Balazs Gerofi
b60a980088 hfi1_user_sdma_process_request(): map HFI1 kregbase 2018-06-13 00:31:29 +09:00
Balazs Gerofi
ec66229063 HFI1: adjust sdma_select_user_engine()
Conflicts:
	kernel/user_sdma.c
2018-06-13 00:31:29 +09:00
Balazs Gerofi
b875b5186f spinlock: make increment compatible with XPPSL Linux (v3.10) 2018-06-13 00:31:29 +09:00
Aram Santogidis
5cf884ef41 Updated TODO tags and struct hfi1_user_sdma_pkt_q 2018-06-13 00:31:29 +09:00
Aram Santogidis
64e2639adc * The relevant files have been modified in order to compile with McKernel.
Conflicts:
	kernel/Makefile.build.in
2018-06-13 00:31:29 +09:00
Aram Santogidis
14b360e867 * Added the original files of the driver as a basis for comparison
Conflicts:
	kernel/include/hfi1/sdma.h
	kernel/sdma.c
	kernel/user_sdma.c
2018-06-13 00:31:29 +09:00
Balazs Gerofi
4a0e389953 HFI1: comments to keep in mind
Conflicts:
	kernel/include/hfi1/sdma.h
	kernel/sdma.c
	kernel/user_sdma.c
2018-06-13 00:31:28 +09:00
Balazs Gerofi
34363c2b68 close(): clear fd_priv_table 2018-06-13 00:31:28 +09:00
Aram Santogidis
8a1d756cb1 Added private_data structure in process
Conflicts:
	executer/user/mcexec.c
	kernel/include/process.h
	kernel/process.c
2018-06-13 00:31:28 +09:00
Balazs Gerofi
e36abe57e7 open(): check on private_data for /dev/hfi 2018-06-13 00:31:28 +09:00
Balazs Gerofi
b2c8cc50dc open(): record private_data
Conflicts:
	kernel/syscall.c
2018-06-13 00:31:28 +09:00
Balazs Gerofi
b9b4a4fe36 search_free_space(): manage region->map_end internally
Cherry-pick of 87f72548a232a1626f2ca103da7f1ce62d139359

Conflicts:
	kernel/syscall.c
2018-06-13 00:31:28 +09:00
Balazs Gerofi
4b652c9353 atobytes(): restore postfix before return 2018-06-13 00:31:28 +09:00
Dominique Martinet
60ac94cbb9 process/vm/access_ok: fix edge checks.
Add a check for start/end being larger than the range we're checking.
Fix a corner case where the access_check() was done on the last vm range and
we would be looking beyond the last element (null deref).
2018-06-13 00:31:28 +09:00
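A minimal sketch of this kind of edge check (overflow of start+len and containment in the current vm range); types and names are illustrative, not the actual access_ok() code:

    /* Return 1 only if [start, start+len) fits entirely inside
     * [range_start, range_end) and does not wrap around. */
    static int range_access_ok(unsigned long start, unsigned long len,
                               unsigned long range_start,
                               unsigned long range_end)
    {
        unsigned long end = start + len;

        if (end < start)               /* start/end larger than any range */
            return 0;
        return range_start <= start && end <= range_end;
    }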
Dominique Martinet
42bbf5f2a4 process/vm: implement access_ok() 2018-06-13 00:31:27 +09:00
Balazs Gerofi
e29a40331d partitioned execution: pass process rank to LWK
Cherry-pick of d2d134d5e6a4b16a34d55d31b14614a2a91ecf47

Conflicts:
	kernel/include/process.h
2018-06-13 00:31:27 +09:00
Balazs Gerofi
655de2cd82 ihk_mc_get_linux_kernel_pgt(): add declaration
Cherry-pick of caff967a442907dd75f8cd878b9f2ea7608c77b2
2018-06-13 00:31:27 +09:00
Katsuya Horigome
205747594b Exclude areas not assigned to McKernel from direct map of all phys. memory
It's enabled by adding -s to mcreboot.sh.

Cherry-pick of the following commit:

commit b5c13ce51a5a4926c2cf11c817cd0d369ac4402d
Author: Katsuya Horigome <katsuya.horigome.rj@ps.hitachi-solutions.com>
Date:   Mon Nov 20 09:40:41 2017 +0900

    Include measures to prevent memory destruction on Linux side (This is rebase commit for merging to development+hfi)
2018-06-13 00:31:27 +09:00
Masamichi Takagi
21f9a1ea33 eclair: fix MAP_KERNEL_START and apply Fujitsu's proposals
(1) Cherry-pick of 644afd8b45fc253ad7b90849e99aae354bac5b17
(2) Pass length to functions with arguments of variable length
    * POSTK_DEBUG_ARCH_DEP_38
(3) Separate architecture dependent functions/structures
    * POSTK_DEBUG_ARCH_DEP_34
(4) Fix include path
    * POSTK_DEBUG_ARCH_DEP_76
(5) Include config.h
    * POSTK_DEBUG_ARCH_DEP_33
2018-06-13 00:31:27 +09:00
Balazs Gerofi
aed099fbcb kmalloc_header: use signed integer for target CPU id
Cherry-pick of bdb2d4d8fa94f9c0268cdfdb21af1a2a5c2bcae5
2018-06-13 00:31:27 +09:00
Balazs Gerofi
48515970a0 ihk_mc_get_processor_id(): return -1 for non-McKernel CPUs
Cherry-pick of c45641e97add9fde467844d9272f2626cf4317de
2018-06-13 00:31:27 +09:00
Balazs Gerofi
b888f31b30 Map LWK TEXT to the end of Linux modules section (0xFFFFFFFFFE800000)
Cherry-pick of b9827b25883a9622058cb78006e705f09eaf9a84
2018-06-13 00:31:27 +09:00
Balazs Gerofi
7982008b5b virt_to_phys(): fix debug messages
Cherry-pick of 46eb3b73dac75b28ead62476f017ad0f29ec4b0a
2018-06-13 00:31:26 +09:00
Balazs Gerofi
f658173269 init_normal_area(): fix mapping start physical address
Cherry-pick of 2d3006818473af50c38a3d0e33595b4e74588004
2018-06-13 00:31:26 +09:00
Balazs Gerofi
ca7edf1df8 mem: make McKernel kernel heap virtual addresses Linux compatible
Cherry-pick of e5334c646d2dc6fb11d419918d8139a0de583fde
2018-06-13 00:31:26 +09:00
Balazs Gerofi
9a5f3ad4e6 mem: map Linux kernel virtual addresses properly
Cherry-pick of 5f37e846c3d70e5d5c0baea5b8eb8ceee3411c88
2018-06-13 00:31:26 +09:00
Balazs Gerofi
cfbab0ee82 move McKernel out of Linux kernel virtual
Cherry-pick of 88a8277f17da62d349b4340b66d37482344db649
2018-06-13 00:31:26 +09:00
576 changed files with 19734 additions and 45911 deletions

.gitignore (vendored): 2 changes

@@ -1,4 +1,3 @@
*~
*.o
*.elf
*.bin
@@ -15,3 +14,4 @@ elfboot/elfboot_test
linux/executer/mcexec
linux/mod_test*
linux/target
kernel/script/dwarf-extract-struct

.gitmodules (vendored): 3 changes

@@ -1,3 +0,0 @@
[submodule "ihk"]
path = ihk
url = https://github.com/RIKEN-SysSoft/ihk.git


@@ -1,5 +1,6 @@
TARGET = @TARGET@
SBINDIR = @SBINDIR@
BINDIR = @BINDIR@
INCDIR = @INCDIR@
ETCDIR = @ETCDIR@
MANDIR = @MANDIR@
@@ -47,6 +48,7 @@ install:
mkdir -p -m 755 $(SBINDIR); \
install -m 755 arch/x86_64/tools/mcreboot-smp-x86.sh $(SBINDIR)/mcreboot.sh; \
install -m 755 arch/x86_64/tools/mcstop+release-smp-x86.sh $(SBINDIR)/mcstop+release.sh; \
install -m 755 arch/x86_64/tools/mpimcexec $(BINDIR)/mpimcexec; \
install -m 755 arch/x86_64/tools/mcoverlay-destroy-smp-x86.sh $(SBINDIR)/mcoverlay-destroy.sh; \
install -m 755 arch/x86_64/tools/mcoverlay-create-smp-x86.sh $(SBINDIR)/mcoverlay-create.sh; \
install -m 755 arch/x86_64/tools/eclair-dump-backtrace.exp $(SBINDIR)/eclair-dump-backtrace.exp;\
@@ -57,6 +59,7 @@ install:
install -m 644 kernel/include/swapfmt.h $(INCDIR); \
mkdir -p -m 755 $(MANDIR)/man1; \
install -m 644 arch/x86_64/tools/mcreboot.1 $(MANDIR)/man1/mcreboot.1; \
install -m 644 arch/x86_64/tools/mpimcexec.1 $(MANDIR)/man1/mpimcexec.1; \
;; \
*) \
echo "unknown target $(TARGET)" >&2 \


@@ -30,7 +30,6 @@
#include <debug-monitors.h>
#include <sysreg.h>
#include <cpufeature.h>
#include <debug.h>
#ifdef POSTK_DEBUG_ARCH_DEP_65
#include <hwcap.h>
#endif /* POSTK_DEBUG_ARCH_DEP_65 */
@@ -40,10 +39,16 @@
#include "postk_print_sysreg.c"
#ifdef DEBUG_PRINT_CPU
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define dkprintf kprintf
#define ekprintf kprintf
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf kprintf
#endif
#define BUG_ON(condition) do { if (condition) { kprintf("PANIC: %s: %s(line:%d)\n",\
__FILE__, __FUNCTION__, __LINE__); panic(""); } } while(0)
struct cpuinfo_arm64 cpuinfo_data[NR_CPUS]; /* index is logical cpuid */
static unsigned int per_cpu_timer_val[NR_CPUS] = { 0 };
@@ -1278,6 +1283,7 @@ int ihk_mc_interrupt_cpu(int cpu, int vector)
return 0;
}
#ifdef POSTK_DEBUG_ARCH_DEP_22
/*
* @ref.impl linux-linaro/arch/arm64/kernel/process.c::tls_thread_switch()
*/
@@ -1303,13 +1309,14 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
extern void perf_start(struct mc_perf_event *event);
extern void perf_reset(struct mc_perf_event *event);
struct thread *last;
#ifdef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
struct mcs_rwlock_node_irqsave lock;
#endif /* POSTK_DEBUG_TEMP_FIX_41 */
/* Set up new TLS.. */
dkprintf("[%d] arch_switch_context: tlsblock_base: 0x%lX\n",
ihk_mc_get_processor_id(), next->tlsblock_base);
#ifdef ENABLE_PERF
/* Performance monitoring inherit */
if(next->proc->monitoring_event) {
if(next->proc->perf_status == PP_RESET)
@@ -1319,10 +1326,10 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
perf_start(next->proc->monitoring_event);
}
}
#endif /*ENABLE_PERF*/
if (likely(prev)) {
tls_thread_switch(prev, next);
#ifdef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
mcs_rwlock_writer_lock(&prev->proc->update_lock, &lock);
if (prev->proc->status & (PS_DELAY_STOPPED | PS_DELAY_TRACED)) {
switch (prev->proc->status) {
@@ -1336,12 +1343,11 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
break;
}
mcs_rwlock_writer_unlock(&prev->proc->update_lock, &lock);
/* Wake up the parent who tried wait4 and sleeping */
waitq_wakeup(&prev->proc->parent->waitpid_q);
} else {
mcs_rwlock_writer_unlock(&prev->proc->update_lock, &lock);
}
#endif /* POSTK_DEBUG_TEMP_FIX_41 */
last = ihk_mc_switch_context(&prev->ctx, &next->ctx, prev);
}
@@ -1351,6 +1357,7 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
return last;
}
#endif /* POSTK_DEBUG_ARCH_DEP_22 */
/*@
@ requires \valid(thread);
@@ -1432,7 +1439,8 @@ void copy_fp_regs(struct thread *from, struct thread *to)
}
}
void clear_fp_regs(void)
void
clear_fp_regs(struct thread *thread)
{
if (likely(elf_hwcap & (HWCAP_FP | HWCAP_ASIMD))) {
#ifdef CONFIG_ARM64_SVE
@@ -1469,7 +1477,7 @@ restore_fp_regs(struct thread *thread)
if (likely(elf_hwcap & (HWCAP_FP | HWCAP_ASIMD))) {
if (!thread->fp_regs) {
// only clear fpregs.
clear_fp_regs();
clear_fp_regs(thread);
return;
}
thread_fpsimd_load(thread);


@@ -9,16 +9,20 @@
#include <prctl.h>
#include <cpufeature.h>
#include <kmalloc.h>
#include <debug.h>
#include <process.h>
//#define DEBUG_PRINT_FPSIMD
#ifdef DEBUG_PRINT_FPSIMD
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define dkprintf kprintf
#define ekprintf kprintf
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf kprintf
#endif
#define BUG_ON(condition) do { if (condition) { kprintf("PANIC: %s: %s(line:%d)\n",\
__FILE__, __FUNCTION__, __LINE__); panic(""); } } while(0)
#ifdef CONFIG_ARM64_SVE
/* Maximum supported vector length across all CPUs (initially poisoned) */
@@ -69,6 +73,9 @@ static int get_nr_threads(struct process *proc)
return nr_threads;
}
extern void save_fp_regs(struct thread *thread);
extern void clear_fp_regs(struct thread *thread);
extern void restore_fp_regs(struct thread *thread);
/* @ref.impl arch/arm64/kernel/fpsimd.c::sve_set_vector_length */
int sve_set_vector_length(struct thread *thread,
unsigned long vl, unsigned long flags)
@@ -122,7 +129,7 @@ int sve_set_vector_length(struct thread *thread,
/* for self at prctl syscall */
if (thread == cpu_local_var(current)) {
save_fp_regs(thread);
clear_fp_regs();
clear_fp_regs(thread);
thread_sve_to_fpsimd(thread, &fp_regs);
sve_free(thread);


@@ -7,7 +7,6 @@
#include <process.h>
#include <string.h>
#include <elfcore.h>
#include <debug.h>
#define align32(x) ((((x) + 3) / 4) * 4)
#define alignpage(x) ((((x) + (PAGE_SIZE) - 1) / (PAGE_SIZE)) * (PAGE_SIZE))
@@ -15,8 +14,11 @@
//#define DEBUG_PRINT_GENCORE
#ifdef DEBUG_PRINT_GENCORE
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#endif
/*


@@ -6,8 +6,6 @@
#include <ihk/cpu.h>
#include <ihk/atomic.h>
#include "affinity.h"
#include <lwk/compiler.h>
//#define DEBUG_SPINLOCK
//#define DEBUG_MCS_RWLOCK
@@ -154,8 +152,6 @@ typedef struct mcs_lock_node {
unsigned long irqsave;
} __attribute__((aligned(64))) mcs_lock_node_t;
typedef mcs_lock_node_t mcs_lock_t;
static void mcs_lock_init(struct mcs_lock_node *node)
{
node->locked = 0;
@@ -606,16 +602,4 @@ __mcs_rwlock_reader_unlock(struct mcs_rwlock_lock *lock, struct mcs_rwlock_node_
#endif
}
static inline int irqflags_can_interrupt(unsigned long flags)
{
#ifdef CONFIG_HAS_NMI
#warning irqflags_can_interrupt needs testing/fixing on such a target
return flags > ICC_PMR_EL1_MASKED;
#else
// PSTATE.DAIF I bit clear means interrupt is possible
return !(flags & (1 << 7));
#endif
}
#endif /* !__HEADER_ARM64_COMMON_ARCH_LOCK_H */


@@ -35,4 +35,38 @@ void arm64_disable_pmu(void);
int armv8pmu_init(struct arm_pmu* cpu_pmu);
/* TODO[PMU]: This could also be defined in the common code; remove the definition here depending on how things evolve. */
/*
* Generalized hardware cache events:
*
* { L1-D, L1-I, LLC, ITLB, DTLB, BPU, NODE } x
* { read, write, prefetch } x
* { accesses, misses }
*/
enum perf_hw_cache_id {
PERF_COUNT_HW_CACHE_L1D = 0,
PERF_COUNT_HW_CACHE_L1I = 1,
PERF_COUNT_HW_CACHE_LL = 2,
PERF_COUNT_HW_CACHE_DTLB = 3,
PERF_COUNT_HW_CACHE_ITLB = 4,
PERF_COUNT_HW_CACHE_BPU = 5,
PERF_COUNT_HW_CACHE_NODE = 6,
PERF_COUNT_HW_CACHE_MAX, /* non-ABI */
};
enum perf_hw_cache_op_id {
PERF_COUNT_HW_CACHE_OP_READ = 0,
PERF_COUNT_HW_CACHE_OP_WRITE = 1,
PERF_COUNT_HW_CACHE_OP_PREFETCH = 2,
PERF_COUNT_HW_CACHE_OP_MAX, /* non-ABI */
};
enum perf_hw_cache_op_result_id {
PERF_COUNT_HW_CACHE_RESULT_ACCESS = 0,
PERF_COUNT_HW_CACHE_RESULT_MISS = 1,
PERF_COUNT_HW_CACHE_RESULT_MAX, /* non-ABI */
};
#endif


@@ -9,11 +9,6 @@
#define _NSIG_BPW 64
#define _NSIG_WORDS (_NSIG / _NSIG_BPW)
static inline int valid_signal(unsigned long sig)
{
return sig <= _NSIG ? 1 : 0;
}
typedef unsigned long int __sigset_t;
#define __sigmask(sig) (((__sigset_t) 1) << ((sig) - 1))


@@ -114,18 +114,14 @@ SYSCALL_HANDLED(236, get_mempolicy)
SYSCALL_HANDLED(237, set_mempolicy)
SYSCALL_HANDLED(238, migrate_pages)
SYSCALL_HANDLED(239, move_pages)
#ifdef PERF_ENABLE
SYSCALL_HANDLED(241, perf_event_open)
#endif // PERF_ENABLE
SYSCALL_HANDLED(260, wait4)
SYSCALL_HANDLED(270, process_vm_readv)
SYSCALL_HANDLED(271, process_vm_writev)
#ifdef PERF_ENABLE
SYSCALL_HANDLED(601, pmc_init)
SYSCALL_HANDLED(602, pmc_start)
SYSCALL_HANDLED(603, pmc_stop)
SYSCALL_HANDLED(604, pmc_reset)
#endif // PERF_ENABLE
SYSCALL_HANDLED(700, get_cpu_id)
#ifdef PROFILE_ENABLE
SYSCALL_HANDLED(__NR_profile, profile)


@@ -7,13 +7,15 @@
#include <arch/cpu.h>
#include <memory.h>
#include <syscall.h>
#include <debug.h>
// #define DEBUG_GICV2
#ifdef DEBUG_GICV2
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...)
#define ekprintf(...) kprintf(__VA_ARGS__)
#endif
void *dist_base;


@@ -7,15 +7,17 @@
#include <cputype.h>
#include <process.h>
#include <syscall.h>
#include <debug.h>
//#define DEBUG_GICV3
#define USE_CAVIUM_THUNDER_X
#ifdef DEBUG_GICV3
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...)
#define ekprintf(...) kprintf(__VA_ARGS__)
#endif
#ifdef USE_CAVIUM_THUNDER_X


@@ -14,7 +14,9 @@
#include <context.h>
#include <kmalloc.h>
#include <vdso.h>
#include <debug.h>
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#define NOT_IMPLEMENTED() do { kprintf("%s is not implemented\n", __func__); while(1);} while(0)
@@ -2922,12 +2924,17 @@ int read_process_vm(struct process_vm *vm, void *kdst, const void *usrc, size_t
return error;
}
if (!is_mckernel_memory(pa, pa + cpsize)) {
#ifdef POSTK_DEBUG_TEMP_FIX_52 /* NUMA support(memory area determination) */
if (!is_mckernel_memory(pa)) {
#else
if (pa < ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0) ||
pa >= ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0)) {
#endif /* POSTK_DEBUG_TEMP_FIX_52 */
dkprintf("%s: pa is outside of LWK memory, to: %p, pa: %p,"
"cpsize: %d\n", __FUNCTION__, to, pa, cpsize);
va = ihk_mc_map_virtual(pa, 1, PTATTR_ACTIVE);
memcpy(to, va, cpsize);
ihk_mc_unmap_virtual(va, 1);
ihk_mc_unmap_virtual(va, 1, 1);
}
else {
va = phys_to_virt(pa);
@@ -3000,12 +3007,17 @@ int write_process_vm(struct process_vm *vm, void *udst, const void *ksrc, size_t
return error;
}
if (!is_mckernel_memory(pa, pa + cpsize)) {
#ifdef POSTK_DEBUG_TEMP_FIX_52 /* NUMA support(memory area determination) */
if (!is_mckernel_memory(pa)) {
#else
if (pa < ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0) ||
pa >= ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0)) {
#endif /* POSTK_DEBUG_TEMP_FIX_52 */
dkprintf("%s: pa is outside of LWK memory, from: %p,"
"pa: %p, cpsize: %d\n", __FUNCTION__, from, pa, cpsize);
va = ihk_mc_map_virtual(pa, 1, PTATTR_WRITABLE|PTATTR_ACTIVE);
memcpy(va, from, cpsize);
ihk_mc_unmap_virtual(va, 1);
ihk_mc_unmap_virtual(va, 1, 1);
}
else {
va = phys_to_virt(pa);
@@ -3066,12 +3078,17 @@ int patch_process_vm(struct process_vm *vm, void *udst, const void *ksrc, size_t
return error;
}
if (!is_mckernel_memory(pa, pa + cpsize)) {
#ifdef POSTK_DEBUG_TEMP_FIX_52 /* NUMA support(memory area determination) */
if (!is_mckernel_memory(pa)) {
#else
if (pa < ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0) ||
pa >= ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0)) {
#endif /* POSTK_DEBUG_TEMP_FIX_52 */
dkprintf("%s: pa is outside of LWK memory, from: %p,"
"pa: %p, cpsize: %d\n", __FUNCTION__, from, pa, cpsize);
va = ihk_mc_map_virtual(pa, 1, PTATTR_WRITABLE|PTATTR_ACTIVE);
memcpy(va, from, cpsize);
ihk_mc_unmap_virtual(va, 1);
ihk_mc_unmap_virtual(va, 1, 1);
}
else {
va = phys_to_virt(pa);


@@ -93,50 +93,21 @@ int ihk_mc_perfctr_init(int counter, uint64_t config, int mode)
return ret;
}
int ihk_mc_perfctr_start(unsigned long counter_mask)
int ihk_mc_perfctr_start(int counter)
{
int ret = 0;
int counter;
unsigned long counter_bit;
for (counter = 0, counter_bit = 1;
counter_bit < counter_mask;
counter++, counter_bit <<= 1) {
if (!(counter_mask & counter_bit))
continue;
ret = cpu_pmu.enable_counter(counter_mask);
if (ret < 0)
break;
}
return ret < 0 ? ret : 0;
int ret;
ret = cpu_pmu.enable_counter(counter);
return ret;
}
int ihk_mc_perfctr_stop(unsigned long counter_mask)
int ihk_mc_perfctr_stop(int counter)
{
int ret = 0;
int counter;
unsigned long counter_bit;
cpu_pmu.disable_counter(counter);
for (counter = 0, counter_bit = 1;
counter_bit < counter_mask;
counter++, counter_bit <<= 1) {
if (!(counter_mask & counter_bit))
continue;
ret = cpu_pmu.disable_counter(counter);
if (ret < 0)
break;
// When ihk_mc_perfctr_start is called, the init-type functions are
// called as well, so disable this here.
ret = cpu_pmu.disable_intens(counter);
if (ret < 0)
break;
}
return ret < 0 ? ret : 0;
// When ihk_mc_perfctr_start is called, the init-type functions are
// called as well, so disable this here.
cpu_pmu.disable_intens(counter);
return 0;
}
int ihk_mc_perfctr_reset(int counter)


@@ -4,14 +4,16 @@
#include <ihk/perfctr.h>
#include <errno.h>
#include <ihk/debug.h>
#include <debug.h>
#define BIT(nr) (1UL << (nr))
//#define DEBUG_PRINT_PMU
#ifdef DEBUG_PRINT_PMU
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#endif


@@ -21,13 +21,15 @@
#include <ihk/debug.h>
#include <compiler.h>
#include <lwk/compiler.h>
#include <debug.h>
//#define DEBUG_PRINT_PSCI
#ifdef DEBUG_PRINT_PSCI
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#endif
#define PSCI_POWER_STATE_TYPE_POWER_DOWN 1


@@ -11,17 +11,22 @@
#include <hwcap.h>
#include <string.h>
#include <thread_info.h>
#include <debug.h>
//#define DEBUG_PRINT_SC
#ifdef DEBUG_PRINT_SC
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define dkprintf kprintf
#define ekprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#endif
#define NOT_IMPLEMENTED() do { kprintf("%s is not implemented\n", __func__); while(1);} while(0)
#define BUG_ON(condition) do { if (condition) { kprintf("PANIC: %s: %s(line:%d)\n",\
__FILE__, __FUNCTION__, __LINE__); panic(""); } } while(0)
extern void save_debugreg(unsigned long *debugreg);
extern unsigned long do_kill(struct thread *thread, int pid, int tid, int sig, struct siginfo *info, int ptracecont);
extern int interrupt_from_user(void *);
@@ -954,7 +959,11 @@ void ptrace_report_signal(struct thread *thread, int sig)
}
thread->exit_status = sig;
/* Transition thread state */
#ifdef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
proc->status = PS_DELAY_TRACED;
#else /* POSTK_DEBUG_TEMP_FIX_41 */
proc->status = PS_TRACED;
#endif /* POSTK_DEBUG_TEMP_FIX_41 */
thread->status = PS_TRACED;
proc->ptrace &= ~PT_TRACE_SYSCALL;
if (sig == SIGSTOP || sig == SIGTSTP ||
@@ -973,6 +982,10 @@ void ptrace_report_signal(struct thread *thread, int sig)
info._sifields._sigchld.si_pid = thread->tid;
info._sifields._sigchld.si_status = thread->exit_status;
do_kill(cpu_local_var(current), parent_pid, -1, SIGCHLD, &info, 0);
#ifndef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
/* Wake parent (if sleeping in wait4()) */
waitq_wakeup(&proc->parent->waitpid_q);
#endif /* !POSTK_DEBUG_TEMP_FIX_41 */
dkprintf("ptrace_report_signal,sleeping\n");
/* Sleep */


@@ -14,8 +14,6 @@
#include <prctl.h>
#include <limits.h>
#include <syscall.h>
#include <uio.h>
#include <debug.h>
extern void ptrace_report_signal(struct thread *thread, int sig);
extern void clear_single_step(struct thread *thread);
@@ -29,12 +27,18 @@ static void __check_signal(unsigned long rc, void *regs, int num, int irq_disabl
//#define DEBUG_PRINT_SC
#ifdef DEBUG_PRINT_SC
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define dkprintf kprintf
#define ekprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#endif
#define NOT_IMPLEMENTED() do { kprintf("%s is not implemented\n", __func__); while(1);} while(0)
#define BUG_ON(condition) do { if (condition) { kprintf("PANIC: %s: %s(line:%d)\n",\
__FILE__, __FUNCTION__, __LINE__); panic(""); } } while(0)
uintptr_t debug_constants[] = {
sizeof(struct cpu_local_var),
offsetof(struct cpu_local_var, current),
@@ -55,7 +59,7 @@ static int cpuid_head = 1;
extern int num_processors;
int obtain_clone_cpuid(cpu_set_t *cpu_set, int use_last) {
int obtain_clone_cpuid(cpu_set_t *cpu_set) {
int min_queue_len = -1;
int i, min_cpu = -1;
@@ -1173,10 +1177,19 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
/* Reap and set new signal_flags */
proc->signal_flags = SIGNAL_STOP_STOPPED;
#ifdef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
proc->status = PS_DELAY_STOPPED;
#else /* POSTK_DEBUG_TEMP_FIX_41 */
proc->status = PS_STOPPED;
#endif /* POSTK_DEBUG_TEMP_FIX_41 */
thread->status = PS_STOPPED;
mcs_rwlock_writer_unlock(&proc->update_lock, &lock);
#ifndef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
/* Wake up the parent who tried wait4 and sleeping */
waitq_wakeup(&proc->parent->waitpid_q);
#endif /* !POSTK_DEBUG_TEMP_FIX_41 */
dkprintf("do_signal(): pid: %d, tid: %d SIGSTOP, sleeping\n",
proc->pid, thread->tid);
/* Sleep */
@@ -1193,10 +1206,19 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
/* Update thread state in fork tree */
mcs_rwlock_writer_lock(&proc->update_lock, &lock);
thread->exit_status = SIGTRAP;
#ifdef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
proc->status = PS_DELAY_TRACED;
#else /* POSTK_DEBUG_TEMP_FIX_41 */
proc->status = PS_TRACED;
#endif /* POSTK_DEBUG_TEMP_FIX_41 */
thread->status = PS_TRACED;
mcs_rwlock_writer_unlock(&proc->update_lock, &lock);
#ifndef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
/* Wake up the parent who tried wait4 and sleeping */
waitq_wakeup(&thread->proc->parent->waitpid_q);
#endif /* !POSTK_DEBUG_TEMP_FIX_41 */
/* Sleep */
dkprintf("do_signal,SIGTRAP,sleeping\n");
@@ -1572,7 +1594,7 @@ done:
return 0;
}
if (tthread->uti_state == UTI_STATE_RUNNING_IN_LINUX) {
if (tthread->thread_offloaded) {
interrupt_syscall(tthread, sig);
release_thread(tthread);
return 0;
@@ -1707,7 +1729,7 @@ SYSCALL_DECLARE(mmap)
| MAP_NONBLOCK // 0x10000
;
const uintptr_t addr0 = ihk_mc_syscall_arg0(ctx);
const intptr_t addr0 = ihk_mc_syscall_arg0(ctx);
const size_t len0 = ihk_mc_syscall_arg1(ctx);
const int prot = ihk_mc_syscall_arg2(ctx);
const int flags0 = ihk_mc_syscall_arg3(ctx);
@@ -1716,7 +1738,7 @@ SYSCALL_DECLARE(mmap)
struct thread *thread = cpu_local_var(current);
struct vm_regions *region = &thread->vm->region;
int error;
uintptr_t addr = 0;
intptr_t addr = 0;
size_t len;
int flags = flags0;
size_t pgsize;
@@ -1779,9 +1801,8 @@ SYSCALL_DECLARE(mmap)
goto out;
}
if (addr < region->user_start
|| region->user_end <= addr
|| len > (region->user_end - region->user_start)) {
if ((flags & MAP_FIXED) && ((addr < region->user_start)
|| (region->user_end <= addr))) {
ekprintf("sys_mmap(%lx,%lx,%x,%x,%x,%lx):ENOMEM\n",
addr0, len0, prot, flags0, fd, off0);
error = -ENOMEM;


@@ -14,13 +14,15 @@
#include <ihk/debug.h>
#include <ikc/queue.h>
#include <vdso.h>
#include <debug.h>
//#define DEBUG_PRINT_VDSO
#ifdef DEBUG_PRINT_VDSO
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#endif
#ifdef POSTK_DEBUG_ARCH_DEP_52


@@ -1,7 +1,5 @@
/* gettimeofday.c COPYRIGHT FUJITSU LIMITED 2016 */
#include <affinity.h>
#include <arch-memory.h>
#include <time.h>
#include <syscall.h>
#include <registers.h>


@@ -9,29 +9,29 @@ PHDRS
SECTIONS
{
. = SIZEOF_HEADERS;
. = ALIGN(4096);
. = ALIGN(4096);
.text : {
*(.text)
*(.text)
} :text
.data : {
*(.data)
*(.data.*)
*(.data)
*(.data.*)
} :data
.rodata : {
*(.rodata .rodata.*)
*(.rodata .rodata.*)
} :data
. = ALIGN(8);
.bss : {
_bss_start = .;
*(.bss .bss.*)
_bss_end = .;
. = ALIGN(4096);
_stack_end = .;
} :data
_bss_start = .;
*(.bss .bss.*)
_bss_end = .;
. = ALIGN(4096);
_stack_end = .;
} :data
/DISCARD/ : {
*(.eh_frame)
*(.note.gnu.build-id)
*(.eh_frame)
*(.note.gnu.build-id)
}
}
}


@@ -31,7 +31,6 @@
#include <prctl.h>
#include <page.h>
#include <kmalloc.h>
#include <debug.h>
#define LAPIC_ID 0x020
#define LAPIC_TIMER 0x320
@@ -70,8 +69,11 @@
//#define DEBUG_PRINT_CPU
#ifdef DEBUG_PRINT_CPU
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define dkprintf kprintf
#define ekprintf kprintf
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf kprintf
#endif
static void *lapic_vp;
@@ -94,8 +96,6 @@ int gettime_local_support = 0;
extern int ihk_mc_pt_print_pte(struct page_table *pt, void *virt);
extern int kprintf(const char *format, ...);
extern int interrupt_from_user(void *);
extern void perf_start(struct mc_perf_event *event);
extern void perf_reset(struct mc_perf_event *event);
static struct idt_entry{
uint32_t desc[4];
@@ -847,6 +847,9 @@ void setup_x86_ap(void (*next_func)(void))
}
void arch_show_interrupt_context(const void *reg);
void set_signal(int sig, void *regs, struct siginfo *info);
void check_signal(unsigned long, void *, int);
void check_sig_pending();
extern void tlb_flush_handler(int vector);
void __show_stack(uintptr_t *sp) {
@@ -874,7 +877,7 @@ void interrupt_exit(struct x86_user_context *regs)
cpu_enable_interrupt();
check_sig_pending();
check_need_resched();
check_signal(0, regs, -1);
check_signal(0, regs, 0);
}
else {
check_sig_pending();
@@ -1007,12 +1010,6 @@ void handle_interrupt(int vector, struct x86_user_context *regs)
set_cputime(interrupt_from_user(regs)? 0: 1);
--v->in_interrupt;
/* for migration by IPI */
if (v->flags & CPU_FLAG_NEED_MIGRATE) {
schedule();
check_signal(0, regs, 0);
}
}
void gpe_handler(struct x86_user_context *regs)
@@ -1228,6 +1225,13 @@ void cpu_pause(void)
asm volatile("pause" ::: "memory");
}
/* From: kernel-xppsl_1.5.2/arch/x86/include/asm/processor.h */
/* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
void cpu_relax(void)
{
asm volatile("rep; nop" ::: "memory");
}
/*@
@ assigns \nothing;
@ ensures \interrupt_disabled > 0;
@@ -1647,10 +1651,12 @@ int ihk_mc_interrupt_cpu(int cpu, int vector)
return 0;
}
#ifdef POSTK_DEBUG_ARCH_DEP_22
extern void perf_start(struct mc_perf_event *event);
extern void perf_reset(struct mc_perf_event *event);
struct thread *arch_switch_context(struct thread *prev, struct thread *next)
{
struct thread *last;
struct mcs_rwlock_node_irqsave lock;
dkprintf("[%d] schedule: tlsblock_base: 0x%lX\n",
ihk_mc_get_processor_id(), next->tlsblock_base);
@@ -1669,7 +1675,7 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
}
#ifdef PROFILE_ENABLE
if (prev && prev->profile && prev->profile_start_ts != 0) {
if (prev->profile && prev->profile_start_ts != 0) {
prev->profile_elapsed_ts +=
(rdtsc() - prev->profile_start_ts);
prev->profile_start_ts = 0;
@@ -1681,28 +1687,6 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
#endif
if (prev) {
mcs_rwlock_writer_lock(&prev->proc->update_lock, &lock);
if (prev->proc->status & (PS_DELAY_STOPPED | PS_DELAY_TRACED)) {
switch (prev->proc->status) {
case PS_DELAY_STOPPED:
prev->proc->status = PS_STOPPED;
break;
case PS_DELAY_TRACED:
prev->proc->status = PS_TRACED;
break;
default:
break;
}
mcs_rwlock_writer_unlock(&prev->proc->update_lock,
&lock);
/* Wake up the parent who tried wait4 and sleeping */
waitq_wakeup(&prev->proc->parent->waitpid_q);
} else {
mcs_rwlock_writer_unlock(&prev->proc->update_lock,
&lock);
}
last = ihk_mc_switch_context(&prev->ctx, &next->ctx, prev);
}
else {
@@ -1710,6 +1694,7 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
}
return last;
}
#endif
/*@
@ requires \valid(thread);
@@ -1784,6 +1769,14 @@ void copy_fp_regs(struct thread *from, struct thread *to)
}
}
#ifdef POSTK_DEBUG_TEMP_FIX_19
void
clear_fp_regs(struct thread *thread)
{
return;
}
#endif /* POSTK_DEBUG_TEMP_FIX_19 */
/*@
@ requires \valid(thread);
@ assigns thread->fp_regs;
@@ -1791,11 +1784,8 @@ void copy_fp_regs(struct thread *from, struct thread *to)
void
restore_fp_regs(struct thread *thread)
{
if (!thread->fp_regs) {
// only clear fpregs.
clear_fp_regs();
if (!thread->fp_regs)
return;
}
if (xsave_available) {
unsigned int low, high;
@@ -1814,13 +1804,6 @@ restore_fp_regs(struct thread *thread)
//release_fp_regs(thread);
}
void clear_fp_regs(void)
{
struct cpu_local_var *v = get_this_cpu_local_var();
restore_fp_regs(&v->idle);
}
ihk_mc_user_context_t *lookup_user_context(struct thread *thread)
{
ihk_mc_user_context_t *uctx = thread->uctx;


@@ -6,7 +6,6 @@
#include <process.h>
#include <string.h>
#include <elfcore.h>
#include <debug.h>
#define align32(x) ((((x) + 3) / 4) * 4)
#define alignpage(x) ((((x) + (PAGE_SIZE) - 1) / (PAGE_SIZE)) * (PAGE_SIZE))
@@ -14,16 +13,13 @@
//#define DEBUG_PRINT_GENCORE
#ifdef DEBUG_PRINT_GENCORE
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#endif
/* Exclude reserved (mckernel's internal use), device file,
* hole created by mprotect
*/
#define GENCORE_RANGE_IS_INACCESSIBLE(range) \
((range->flag & (VR_RESERVED | VR_MEMTYPE_UC | VR_DONTDUMP)))
/*
* Generate a core file image, which consists of many chunks.
* Returns an allocated table, an entry of which is a pair of the address
@@ -313,10 +309,12 @@ int gencore(struct thread *thread, void *regs,
dkprintf("start:%lx end:%lx flag:%lx objoff:%lx\n",
range->start, range->end, range->flag, range->objoff);
if (GENCORE_RANGE_IS_INACCESSIBLE(range)) {
/* We omit reserved areas because they are only for
mckernel's internal use. */
if (range->flag & VR_RESERVED)
continue;
if (range->flag & VR_DONTDUMP)
continue;
}
/* We need a chunk for each page for a demand paging area.
This can be optimized for spacial complexity but we would
lose simplicity instead. */
@@ -405,9 +403,8 @@ int gencore(struct thread *thread, void *regs,
unsigned long flag = range->flag;
unsigned long size = range->end - range->start;
if (GENCORE_RANGE_IS_INACCESSIBLE(range)) {
if (range->flag & VR_RESERVED)
continue;
}
ph[i].p_type = PT_LOAD;
ph[i].p_flags = ((flag & VR_PROT_READ) ? PF_R : 0)
@@ -449,9 +446,8 @@ int gencore(struct thread *thread, void *regs,
unsigned long phys;
if (GENCORE_RANGE_IS_INACCESSIBLE(range)) {
if (range->flag & VR_RESERVED)
continue;
}
if (range->flag & VR_DEMAND_PAGING) {
/* Just an ad hoc kluge. */
unsigned long p, start, phys;


@@ -64,13 +64,12 @@ static inline int futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval,
return oldval;
}
static inline int futex_atomic_op_inuser(int encoded_op,
int __user *uaddr)
static inline int futex_atomic_op_inuser(int encoded_op, int __user *uaddr)
{
int op = (encoded_op >> 28) & 7;
int cmp = (encoded_op >> 24) & 15;
int oparg = (encoded_op & 0x00fff000) >> 12;
int cmparg = encoded_op & 0xfff;
int oparg = (encoded_op << 8) >> 20;
int cmparg = (encoded_op << 20) >> 20;
int oldval = 0, ret, tem;
if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))


@@ -6,7 +6,6 @@
#include <ihk/cpu.h>
#include <ihk/atomic.h>
#include <lwk/compiler.h>
//#define DEBUG_SPINLOCK
//#define DEBUG_MCS_RWLOCK
@@ -38,58 +37,6 @@ static void ihk_mc_spinlock_init(ihk_spinlock_t *lock)
}
#define SPIN_LOCK_UNLOCKED { .head_tail = 0 }
#ifdef DEBUG_SPINLOCK
#define ihk_mc_spinlock_trylock_noirq(l) { int rc; \
__kprintf("[%d] call ihk_mc_spinlock_trylock_noirq %p %s:%d\n", ihk_mc_get_processor_id(), (l), __FILE__, __LINE__); \
rc = __ihk_mc_spinlock_trylock_noirq(l); \
__kprintf("[%d] ret ihk_mc_spinlock_trylock_noirq\n", ihk_mc_get_processor_id()); rc; \
}
#else
#define ihk_mc_spinlock_trylock_noirq __ihk_mc_spinlock_trylock_noirq
#endif
static int __ihk_mc_spinlock_trylock_noirq(ihk_spinlock_t *lock)
{
ihk_spinlock_t cur = { .head_tail = lock->head_tail };
ihk_spinlock_t next = { .tickets.head = cur.tickets.head, .tickets.tail = cur.tickets.tail + 2 };
int success;
if (cur.tickets.head != cur.tickets.tail) {
return 0;
}
preempt_disable();
/* Use the same increment amount as other functions! */
success = __sync_bool_compare_and_swap((__ticketpair_t*)lock, cur.head_tail, next.head_tail);
if (!success) {
preempt_enable();
}
return success;
}
#ifdef DEBUG_SPINLOCK
#define ihk_mc_spinlock_trylock(l, result) ({ unsigned long rc; \
__kprintf("[%d] call ihk_mc_spinlock_trylock %p %s:%d\n", ihk_mc_get_processor_id(), (l), __FILE__, __LINE__); \
rc = __ihk_mc_spinlock_trylock(l, result); \
__kprintf("[%d] ret ihk_mc_spinlock_trylock\n", ihk_mc_get_processor_id()); rc;\
})
#else
#define ihk_mc_spinlock_trylock __ihk_mc_spinlock_trylock
#endif
static unsigned long __ihk_mc_spinlock_trylock(ihk_spinlock_t *lock, int *result)
{
unsigned long flags;
flags = cpu_disable_interrupt_save();
*result = __ihk_mc_spinlock_trylock_noirq(lock);
return flags;
}
#ifdef DEBUG_SPINLOCK
#define ihk_mc_spinlock_lock_noirq(l) { \
__kprintf("[%d] call ihk_mc_spinlock_lock_noirq %p %s:%d\n", ihk_mc_get_processor_id(), (l), __FILE__, __LINE__); \
@@ -652,9 +599,4 @@ __mcs_rwlock_reader_unlock(struct mcs_rwlock_lock *lock, struct mcs_rwlock_node_
#endif
}
static inline int irqflags_can_interrupt(unsigned long flags)
{
return !!(flags & 0x200);
}
#endif


@@ -40,6 +40,12 @@
#define LARGE_PAGE_MASK (~((unsigned long)LARGE_PAGE_SIZE - 1))
#define LARGE_PAGE_P2ALIGN (LARGE_PAGE_SHIFT - PAGE_SHIFT)
#define GB_PAGE_SHIFT 30
#define GB_PAGE_SIZE (1UL << GB_PAGE_SHIFT)
#define GB_PAGE_MASK (~((unsigned long)GB_PAGE_SIZE - 1))
#define GB_PAGE_P2ALIGN (GB_PAGE_SHIFT - PAGE_SHIFT)
#define USER_END 0x0000800000000000UL
#define TASK_UNMAPPED_BASE 0x00002AAAAAA00000UL


@@ -133,7 +133,7 @@ static inline void ihk_atomic64_inc(ihk_atomic64_t *v)
* Note 2: xchg has side effect, so that attribute volatile is necessary,
* but generally the primitive is invalid, *ptr is output argument. --ANK
*/
#define __xg(x) ((volatile long *)(x))
#define __xg(x) ((volatile typeof(x))(x))
#define xchg4(ptr, x) \
({ \


@@ -39,7 +39,7 @@ SYSCALL_HANDLED(15, rt_sigreturn)
SYSCALL_HANDLED(16, ioctl)
SYSCALL_DELEGATED(17, pread64)
SYSCALL_DELEGATED(18, pwrite64)
SYSCALL_DELEGATED(20, writev)
SYSCALL_HANDLED(20, writev)
SYSCALL_DELEGATED(21, access)
SYSCALL_DELEGATED(23, select)
SYSCALL_HANDLED(24, sched_yield)
@@ -114,7 +114,7 @@ SYSCALL_HANDLED(160, setrlimit)
SYSCALL_HANDLED(164, settimeofday)
SYSCALL_HANDLED(186, gettid)
SYSCALL_HANDLED(200, tkill)
SYSCALL_HANDLED(201, time)
SYSCALL_DELEGATED(201, time)
SYSCALL_HANDLED(202, futex)
SYSCALL_HANDLED(203, sched_setaffinity)
SYSCALL_HANDLED(204, sched_getaffinity)
@@ -161,7 +161,6 @@ SYSCALL_HANDLED(__NR_profile, profile)
SYSCALL_HANDLED(730, util_migrate_inter_kernel)
SYSCALL_HANDLED(731, util_indicate_clone)
SYSCALL_HANDLED(732, get_system)
SYSCALL_HANDLED(733, util_register_desc)
/* McKernel Specific */
SYSCALL_HANDLED(801, swapout)


@@ -25,13 +25,15 @@
#include <cls.h>
#include <kmalloc.h>
#include <rusage_private.h>
#include <debug.h>
//#define DEBUG
#ifdef DEBUG
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define dkprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#else
#define dkprintf(...) do { } while (0)
#define ekprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#endif
static char *last_page;
@@ -168,6 +170,34 @@ static unsigned long setup_l3(struct page_table *pt,
return virt_to_phys(pt);
}
static void init_normal_area(struct page_table *pt)
{
unsigned long map_start, map_end, phys, pt_phys;
int ident_index, virt_index;
/*
* This has to start from 0x00, see load_file() in IHK-SMP.
* For security reasons, we could skip holes in the LWK
* assigned physical memory, but Linux mappings already map
* those anyway.
*/
map_start = 0;
map_end = ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0);
ident_index = map_start >> PTL4_SHIFT;
virt_index = (MAP_ST_START >> PTL4_SHIFT) & (PT_ENTRIES - 1);
memset(pt, 0, sizeof(struct page_table));
for (phys = map_start; phys < map_end; phys += PTL4_SIZE) {
pt_phys = setup_l3(ihk_mc_alloc_pages(1, IHK_MC_AP_CRITICAL),
phys, map_start, map_end);
pt->entry[ident_index++] = pt_phys | PFL4_PDIR_ATTR;
pt->entry[virt_index++] = pt_phys | PFL4_PDIR_ATTR;
}
}
static struct page_table *__alloc_new_pt(ihk_mc_ap_flag ap_flag)
{
struct page_table *newpt = ihk_mc_alloc_pages(1, ap_flag);
@@ -235,11 +265,6 @@ static unsigned long attr_to_l1attr(enum ihk_mc_pt_attribute attr)
}
}
#define PTLX_SHIFT(index) PTL ## index ## _SHIFT
#define GET_VIRT_INDEX(virt, index, dest) \
dest = ((virt) >> PTLX_SHIFT(index)) & (PT_ENTRIES - 1)
#define GET_VIRT_INDICES(virt, l4i, l3i, l2i, l1i) \
l4i = ((virt) >> PTL4_SHIFT) & (PT_ENTRIES - 1); \
l3i = ((virt) >> PTL3_SHIFT) & (PT_ENTRIES - 1); \
@@ -706,6 +731,26 @@ static void destroy_page_table(int level, struct page_table *pt)
return;
}
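/* Destroy the page-table subtree under the L4 (PGD) entry covering 'virt':
 * tear down all lower-level tables and clear the L4 entry itself. */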
void ihk_mc_pt_destroy_pgd_subtree(struct page_table *pt, void *virt)
{
int l4idx, l3idx, l2idx, l1idx;
unsigned long v = (unsigned long)virt;
struct page_table *lower;
GET_VIRT_INDICES(v, l4idx, l3idx, l2idx, l1idx);
if (!(pt->entry[l4idx] & PF_PRESENT))
return;
lower = (struct page_table *)
phys_to_virt(pt->entry[l4idx] & PT_PHYSMASK);
destroy_page_table(3, lower);
pt->entry[l4idx] = 0;
dkprintf("%s: virt: 0x%lx, l4idx: %d subtree destroyed\n",
__FUNCTION__, virt, l4idx);
}
void ihk_mc_pt_destroy(struct page_table *pt)
{
const int level = 4; /* PML4 */
@@ -1500,12 +1545,12 @@ static int clear_range_l1(void *args0, pte_t *ptep, uint64_t base,
if (page) {
dkprintf("%s: page=%p,is_in_memobj=%d,(old & PFL1_DIRTY)=%lx,memobj=%p,args->memobj->flags=%x\n", __FUNCTION__, page, page_is_in_memobj(page), (old & PFL1_DIRTY), args->memobj, args->memobj ? args->memobj->flags : -1);
}
if (page && page_is_in_memobj(page) && pte_is_dirty(&old, PTL1_SIZE) &&
args->memobj && !(args->memobj->flags & MF_ZEROFILL)) {
if (page && page_is_in_memobj(page) && (old & PFL1_DIRTY) && (args->memobj) &&
!(args->memobj->flags & MF_ZEROFILL)) {
memobj_flush_page(args->memobj, phys, PTL1_SIZE);
}
if (!pte_is_fileoff(&old, PTL1_SIZE)) {
if (!(old & PFL1_FILEOFF)) {
if(args->free_physical) {
if (!page) {
/* Anonymous || !XPMEM attach */
@@ -1567,11 +1612,11 @@ static int clear_range_l2(void *args0, pte_t *ptep, uint64_t base,
page = phys_to_page(phys);
}
if (page && page_is_in_memobj(page) && pte_is_dirty(&old, PTL2_SIZE)) {
if (page && page_is_in_memobj(page) && (old & PFL2_DIRTY)) {
memobj_flush_page(args->memobj, phys, PTL2_SIZE);
}
if (!pte_is_fileoff(&old, PTL2_SIZE)) {
if (!(old & PFL2_FILEOFF)) {
if(args->free_physical) {
if (!page) {
/* Anonymous || !XPMEM attach */
@@ -1648,13 +1693,13 @@ static int clear_range_l3(void *args0, pte_t *ptep, uint64_t base,
page = phys_to_page(phys);
}
if (page && page_is_in_memobj(page) && pte_is_dirty(&old, PTL3_SIZE)) {
if (page && page_is_in_memobj(page) && (old & PFL3_DIRTY)) {
memobj_flush_page(args->memobj, phys, PTL3_SIZE);
}
dkprintf("%s: phys=%ld, pte_get_phys(&old),PTL3_SIZE\n", __FUNCTION__, pte_get_phys(&old));
if (!pte_is_fileoff(&old, PTL3_SIZE)) {
if (!(old & PFL3_FILEOFF)) {
if(args->free_physical) {
if (!page) {
/* Anonymous || !XPMEM attach */
@@ -1942,6 +1987,28 @@ out:
return ptep;
}
pte_t *ihk_mc_pt_lookup_fault_pte(struct process_vm *vm, void *virt,
int pgshift, void **basep, size_t *sizep, int *p2alignp)
{
int faulted = 0;
pte_t *ptep;
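/* Look up the PTE for 'virt'; if it is absent or not present, fault the
 * page in via page_fault_process_vm() and retry the lookup once. */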
retry:
ptep = ihk_mc_pt_lookup_pte(vm->address_space->page_table,
virt, pgshift, basep, sizep, p2alignp);
if (!faulted && (!ptep || !pte_is_present(ptep))) {
page_fault_process_vm(vm, virt, PF_POPULATE | PF_USER);
faulted = 1;
goto retry;
}
if (faulted && ptep && pte_is_present(ptep)) {
kprintf("%s: successfully faulted 0x%lx\n", __FUNCTION__, virt);
}
return ptep;
}
pte_t *ihk_mc_pt_lookup_pte(page_table_t pt, void *virt, int pgshift,
void **basep, size_t *sizep, int *p2alignp)
{
@@ -2241,7 +2308,7 @@ out:
int ihk_mc_pt_set_range(page_table_t pt, struct process_vm *vm, void *start,
void *end, uintptr_t phys, enum ihk_mc_pt_attribute attr,
int pgshift, struct vm_range *range)
int pgshift, struct vm_range *range)
{
int error;
struct set_range_args args;
@@ -2522,82 +2589,6 @@ static void init_fixed_area(struct page_table *pt)
return;
}
static void init_normal_area(struct page_table *pt)
{
unsigned long map_start, map_end, phys;
void *virt;
map_start = ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0);
map_end = ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0);
virt = (void *)MAP_ST_START + map_start;
kprintf("map_start = %lx, map_end = %lx, virt %lx\n",
map_start, map_end, virt);
for (phys = map_start; phys < map_end; phys += LARGE_PAGE_SIZE) {
if (set_pt_large_page(pt, virt, phys, PTATTR_WRITABLE) != 0) {
kprintf("%s: error setting mapping for 0x%lx\n",
__func__, virt);
}
virt += LARGE_PAGE_SIZE;
}
}
static void init_linux_kernel_mapping(struct page_table *pt)
{
unsigned long map_start, map_end, phys;
void *virt;
int nr_memory_chunks, chunk_id, numa_id;
/* When the safe_kernel_map option is set (safe_kernel_map == 1),
 * straight-map only the memory areas assigned to McKernel so that
 * memory on the Linux side cannot be corrupted */
if (safe_kernel_map == 0) {
kprintf("Straight-map entire physical memory\n");
/* Map 2 TB for now */
map_start = 0;
map_end = 0x20000000000;
virt = (void *)LINUX_PAGE_OFFSET;
kprintf("Linux kernel virtual: 0x%lx - 0x%lx -> 0x%lx - 0x%lx\n",
LINUX_PAGE_OFFSET, LINUX_PAGE_OFFSET + map_end, 0, map_end);
for (phys = map_start; phys < map_end; phys += LARGE_PAGE_SIZE) {
if (set_pt_large_page(pt, virt, phys, PTATTR_WRITABLE) != 0) {
kprintf("%s: error setting mapping for 0x%lx\n", __FUNCTION__, virt);
}
virt += LARGE_PAGE_SIZE;
}
} else {
kprintf("Straight-map physical memory areas allocated to McKernel\n");
nr_memory_chunks = ihk_mc_get_nr_memory_chunks();
if (nr_memory_chunks == 0) {
kprintf("%s: ERROR: No memory chunk available.\n", __FUNCTION__);
return;
}
for (chunk_id = 0; chunk_id < nr_memory_chunks; chunk_id++) {
if (ihk_mc_get_memory_chunk(chunk_id, &map_start, &map_end, &numa_id)) {
kprintf("%s: ERROR: Memory chunk id (%d) out of range.\n", __FUNCTION__, chunk_id);
continue;
}
dkprintf("Linux kernel virtual: 0x%lx - 0x%lx -> 0x%lx - 0x%lx\n",
LINUX_PAGE_OFFSET + map_start, LINUX_PAGE_OFFSET + map_end, map_start, map_end);
virt = (void *)(LINUX_PAGE_OFFSET + map_start);
for (phys = map_start; phys < map_end; phys += LARGE_PAGE_SIZE, virt += LARGE_PAGE_SIZE) {
if (set_pt_large_page(pt, virt, phys, PTATTR_WRITABLE) != 0) {
kprintf("%s: set_pt_large_page() failed for 0x%lx\n", __FUNCTION__, virt);
}
}
}
}
}
void init_text_area(struct page_table *pt)
{
unsigned long __end, phys, virt;
@@ -2661,6 +2652,61 @@ void init_low_area(struct page_table *pt)
set_pt_large_page(pt, 0, 0, PTATTR_NO_EXECUTE|PTATTR_WRITABLE);
}
static void init_linux_kernel_mapping(struct page_table *pt)
{
unsigned long map_start, map_end, phys;
void *virt;
int nr_memory_chunks, chunk_id, numa_id;
/* When the safe_kernel_map option is set (safe_kernel_map == 1),
straight-map only the memory areas assigned to McKernel so that
memory on the Linux side cannot be corrupted */
if (safe_kernel_map == 0) {
kprintf("Straight-map entire physical memory\n");
/* Map 2 TB for now */
map_start = 0;
map_end = 0x20000000000;
virt = (void *)LINUX_PAGE_OFFSET;
kprintf("Linux kernel virtual: 0x%lx - 0x%lx -> 0x%lx - 0x%lx\n",
LINUX_PAGE_OFFSET, LINUX_PAGE_OFFSET + map_end, 0, map_end);
for (phys = map_start; phys < map_end; phys += LARGE_PAGE_SIZE) {
if (set_pt_large_page(pt, virt, phys, PTATTR_WRITABLE) != 0) {
kprintf("%s: error setting mapping for 0x%lx\n", __FUNCTION__, virt);
}
virt += LARGE_PAGE_SIZE;
}
} else {
kprintf("Straight-map physical memory areas allocated to McKernel\n");
nr_memory_chunks = ihk_mc_get_nr_memory_chunks();
if (nr_memory_chunks == 0) {
kprintf("%s: ERROR: No memory chunk available.\n", __FUNCTION__);
return;
}
for (chunk_id = 0; chunk_id < nr_memory_chunks; chunk_id++) {
if (ihk_mc_get_memory_chunk(chunk_id, &map_start, &map_end, &numa_id)) {
kprintf("%s: ERROR: Memory chunk id (%d) out of range.\n", __FUNCTION__, chunk_id);
continue;
}
dkprintf("Linux kernel virtual: 0x%lx - 0x%lx -> 0x%lx - 0x%lx\n",
LINUX_PAGE_OFFSET + map_start, LINUX_PAGE_OFFSET + map_end, map_start, map_end);
virt = (void *)(LINUX_PAGE_OFFSET + map_start);
for (phys = map_start; phys < map_end; phys += LARGE_PAGE_SIZE, virt += LARGE_PAGE_SIZE) {
if (set_pt_large_page(pt, virt, phys, PTATTR_WRITABLE) != 0) {
kprintf("%s: set_pt_large_page() failed for 0x%lx\n", __FUNCTION__, virt);
}
}
}
}
}
static void init_vsyscall_area(struct page_table *pt)
{
extern char vsyscall_page[];
@@ -2682,7 +2728,7 @@ void init_page_table(void)
init_pt = ihk_mc_alloc_pages(1, IHK_MC_AP_CRITICAL);
ihk_mc_spinlock_init(&init_pt_lock);
memset(init_pt, 0, sizeof(*init_pt));
memset(init_pt, 0, sizeof(PAGE_SIZE));
/* Normal memory area */
init_normal_area(init_pt);
@@ -2916,12 +2962,17 @@ int read_process_vm(struct process_vm *vm, void *kdst, const void *usrc, size_t
return error;
}
if (!is_mckernel_memory(pa, pa + cpsize)) {
#ifdef POSTK_DEBUG_TEMP_FIX_52 /* NUMA support(memory area determination) */
if (!is_mckernel_memory(pa)) {
#else
if (pa < ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0) ||
pa >= ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0)) {
#endif /* POSTK_DEBUG_TEMP_FIX_52 */
dkprintf("%s: pa is outside of LWK memory, to: %p, pa: %p,"
"cpsize: %d\n", __FUNCTION__, to, pa, cpsize);
va = ihk_mc_map_virtual(pa, 1, PTATTR_ACTIVE);
memcpy(to, va, cpsize);
ihk_mc_unmap_virtual(va, 1);
ihk_mc_unmap_virtual(va, 1, 1);
}
else {
va = phys_to_virt(pa);
@@ -2995,12 +3046,17 @@ int write_process_vm(struct process_vm *vm, void *udst, const void *ksrc, size_t
return error;
}
if (!is_mckernel_memory(pa, pa + cpsize)) {
#ifdef POSTK_DEBUG_TEMP_FIX_52 /* NUMA support(memory area determination) */
if (!is_mckernel_memory(pa)) {
#else
if (pa < ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0) ||
pa >= ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0)) {
#endif /* POSTK_DEBUG_TEMP_FIX_52 */
dkprintf("%s: pa is outside of LWK memory, from: %p,"
"pa: %p, cpsize: %d\n", __FUNCTION__, from, pa, cpsize);
va = ihk_mc_map_virtual(pa, 1, PTATTR_ACTIVE);
memcpy(va, from, cpsize);
ihk_mc_unmap_virtual(va, 1);
ihk_mc_unmap_virtual(va, 1, 1);
}
else {
va = phys_to_virt(pa);
@@ -3061,12 +3117,17 @@ int patch_process_vm(struct process_vm *vm, void *udst, const void *ksrc, size_t
return error;
}
if (!is_mckernel_memory(pa, pa + cpsize)) {
#ifdef POSTK_DEBUG_TEMP_FIX_52 /* NUMA support(memory area determination) */
if (!is_mckernel_memory(pa)) {
#else
if (pa < ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0) ||
pa >= ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0)) {
#endif /* POSTK_DEBUG_TEMP_FIX_52 */
dkprintf("%s: pa is outside of LWK memory, from: %p,"
"pa: %p, cpsize: %d\n", __FUNCTION__, from, pa, cpsize);
va = ihk_mc_map_virtual(pa, 1, PTATTR_ACTIVE);
memcpy(va, from, cpsize);
ihk_mc_unmap_virtual(va, 1);
ihk_mc_unmap_virtual(va, 1, 1);
}
else {
va = phys_to_virt(pa);

View File

@@ -30,7 +30,7 @@ int ihk_mc_ikc_init_first_local(struct ihk_ikc_channel_desc *channel,
memset(channel, 0, sizeof(struct ihk_ikc_channel_desc));
mikc_queue_pages = ((4 * num_processors * MASTER_IKCQ_PKTSIZE)
mikc_queue_pages = ((2 * num_processors * MASTER_IKCQ_PKTSIZE)
+ (PAGE_SIZE - 1)) / PAGE_SIZE;
/* Place both sides in this side */

View File

@@ -16,16 +16,20 @@
#include <registers.h>
#include <mc_perf_event.h>
#include <config.h>
#include <debug.h>
extern unsigned int *x86_march_perfmap;
extern int running_on_kvm(void);
#ifdef POSTK_DEBUG_TEMP_FIX_31
int ihk_mc_perfctr_fixed_init(int counter, int mode);
#endif/*POSTK_DEBUG_TEMP_FIX_31*/
//#define PERFCTR_DEBUG
#ifdef PERFCTR_DEBUG
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define dkprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#else
#define dkprintf(...) do { } while (0)
#define ekprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#endif
#define X86_CR4_PCE 0x00000100
@@ -39,11 +43,11 @@ int ihk_mc_perfctr_fixed_init(int counter, int mode);
} \
} while(0)
int perf_counters_discovered;
int NUM_PERF_COUNTERS;
unsigned long PERF_COUNTERS_MASK;
int NUM_FIXED_PERF_COUNTERS;
unsigned long FIXED_PERF_COUNTERS_MASK;
int perf_counters_discovered = 0;
int X86_IA32_NUM_PERF_COUNTERS = 0;
unsigned long X86_IA32_PERF_COUNTERS_MASK = 0;
int X86_IA32_NUM_FIXED_PERF_COUNTERS = 0;
unsigned long X86_IA32_FIXED_PERF_COUNTERS_MASK = 0;
void x86_init_perfctr(void)
{
@@ -74,17 +78,17 @@ void x86_init_perfctr(void)
op = 0x0a;
asm volatile("cpuid" : "=a"(eax),"=b"(ebx),"=c"(ecx),"=d"(edx):"a"(op));
NUM_PERF_COUNTERS = ((eax & 0xFF00) >> 8);
PERF_COUNTERS_MASK = (1 << NUM_PERF_COUNTERS) - 1;
X86_IA32_NUM_PERF_COUNTERS = ((eax & 0xFF00) >> 8);
X86_IA32_PERF_COUNTERS_MASK = (1 << X86_IA32_NUM_PERF_COUNTERS) - 1;
NUM_FIXED_PERF_COUNTERS = (edx & 0x0F);
FIXED_PERF_COUNTERS_MASK =
((1UL << NUM_FIXED_PERF_COUNTERS) - 1) <<
BASE_FIXED_PERF_COUNTERS;
X86_IA32_NUM_FIXED_PERF_COUNTERS = (edx & 0x0F);
X86_IA32_FIXED_PERF_COUNTERS_MASK =
((1UL << X86_IA32_NUM_FIXED_PERF_COUNTERS) - 1) <<
X86_IA32_BASE_FIXED_PERF_COUNTERS;
perf_counters_discovered = 1;
kprintf("NUM_PERF_COUNTERS: %d, NUM_FIXED_PERF_COUNTERS: %d\n",
NUM_PERF_COUNTERS, NUM_FIXED_PERF_COUNTERS);
kprintf("X86_IA32_NUM_PERF_COUNTERS: %d, X86_IA32_NUM_FIXED_PERF_COUNTERS: %d\n",
X86_IA32_NUM_PERF_COUNTERS, X86_IA32_NUM_FIXED_PERF_COUNTERS);
}
/* Clear Fixed Counter Control */
@@ -93,20 +97,20 @@ void x86_init_perfctr(void)
wrmsr(MSR_PERF_FIXED_CTRL, value);
/* Clear Generic Counter Control */
for (i = 0; i < NUM_PERF_COUNTERS; i++) {
for(i = 0; i < X86_IA32_NUM_PERF_COUNTERS; i++) {
wrmsr(MSR_IA32_PERFEVTSEL0 + i, 0);
}
/* Enable PMC Control */
value = rdmsr(MSR_PERF_GLOBAL_CTRL);
value |= PERF_COUNTERS_MASK;
value |= FIXED_PERF_COUNTERS_MASK;
value |= X86_IA32_PERF_COUNTERS_MASK;
value |= X86_IA32_FIXED_PERF_COUNTERS_MASK;
wrmsr(MSR_PERF_GLOBAL_CTRL, value);
}
static int set_perfctr_x86_direct(int counter, int mode, unsigned int value)
{
if (counter < 0 || counter >= NUM_PERF_COUNTERS) {
if (counter < 0 || counter >= X86_IA32_NUM_PERF_COUNTERS) {
return -EINVAL;
}
@@ -145,14 +149,13 @@ static int set_pmc_x86_direct(int counter, long val)
val &= 0x000000ffffffffff; // 40bit Mask
cnt_bit = 1UL << counter;
if (cnt_bit & PERF_COUNTERS_MASK) {
if ( cnt_bit & X86_IA32_PERF_COUNTERS_MASK ) {
// set generic pmc
wrmsr(MSR_IA32_PMC0 + counter, val);
}
else if (cnt_bit & FIXED_PERF_COUNTERS_MASK) {
else if ( cnt_bit & X86_IA32_FIXED_PERF_COUNTERS_MASK ) {
// set fixed pmc
wrmsr(MSR_IA32_FIXED_CTR0 +
counter - BASE_FIXED_PERF_COUNTERS, val);
wrmsr(MSR_IA32_FIXED_CTR0 + counter - X86_IA32_BASE_FIXED_PERF_COUNTERS, val);
}
else {
return -EINVAL;
@@ -172,10 +175,10 @@ static int set_fixed_counter(int counter, int mode)
{
unsigned long value = 0;
unsigned int ctr_mask = 0xf;
int counter_idx = counter - BASE_FIXED_PERF_COUNTERS;
int counter_idx = counter - X86_IA32_BASE_FIXED_PERF_COUNTERS ;
unsigned int set_val = 0;
if (counter_idx < 0 || counter_idx >= NUM_FIXED_PERF_COUNTERS) {
if (counter_idx < 0 || counter_idx >= X86_IA32_NUM_FIXED_PERF_COUNTERS) {
return -EINVAL;
}
@@ -205,13 +208,14 @@ int ihk_mc_perfctr_init_raw(int counter, uint64_t config, int mode)
int ihk_mc_perfctr_init_raw(int counter, unsigned int code, int mode)
#endif /*POSTK_DEBUG_TEMP_FIX_29*/
{
#ifdef POSTK_DEBUG_TEMP_FIX_31
// PAPI_REF_CYC counted by fixed counter
if (counter >= BASE_FIXED_PERF_COUNTERS &&
counter < BASE_FIXED_PERF_COUNTERS + NUM_FIXED_PERF_COUNTERS) {
if (counter >= X86_IA32_BASE_FIXED_PERF_COUNTERS) {
return ihk_mc_perfctr_fixed_init(counter, mode);
}
#endif /*POSTK_DEBUG_TEMP_FIX_31*/
if (counter < 0 || counter >= NUM_PERF_COUNTERS) {
if (counter < 0 || counter >= X86_IA32_NUM_PERF_COUNTERS) {
return -EINVAL;
}
@@ -244,7 +248,7 @@ int ihk_mc_perfctr_init(int counter, enum ihk_perfctr_type type, int mode)
}
#endif /*POSTK_DEBUG_TEMP_FIX_29*/
if (counter < 0 || counter >= NUM_PERF_COUNTERS) {
if (counter < 0 || counter >= X86_IA32_NUM_PERF_COUNTERS) {
return -EINVAL;
}
if (type < 0 || type >= PERFCTR_MAX_TYPE) {
@@ -296,11 +300,18 @@ int ihk_mc_perfctr_set_extra(struct mc_perf_event *event)
extern void x86_march_perfctr_start(unsigned long counter_mask);
#endif
#ifdef POSTK_DEBUG_TEMP_FIX_30
int ihk_mc_perfctr_start(int counter)
#else
int ihk_mc_perfctr_start(unsigned long counter_mask)
#endif /*POSTK_DEBUG_TEMP_FIX_30*/
{
int ret = 0;
unsigned long value = 0;
unsigned long mask = PERF_COUNTERS_MASK | FIXED_PERF_COUNTERS_MASK;
unsigned long mask = X86_IA32_PERF_COUNTERS_MASK | X86_IA32_FIXED_PERF_COUNTERS_MASK;
#ifdef POSTK_DEBUG_TEMP_FIX_30
unsigned long counter_mask = 1UL << counter;
#endif /*POSTK_DEBUG_TEMP_FIX_30*/
PERFCTR_CHKANDJUMP(counter_mask & ~mask, "counter_mask out of range", -EINVAL);
@@ -317,11 +328,18 @@ int ihk_mc_perfctr_start(unsigned long counter_mask)
goto fn_exit;
}
#ifdef POSTK_DEBUG_TEMP_FIX_30
int ihk_mc_perfctr_stop(int counter)
#else
int ihk_mc_perfctr_stop(unsigned long counter_mask)
#endif/*POSTK_DEBUG_TEMP_FIX_30*/
{
int ret = 0;
unsigned long value;
unsigned long mask = PERF_COUNTERS_MASK | FIXED_PERF_COUNTERS_MASK;
unsigned long mask = X86_IA32_PERF_COUNTERS_MASK | X86_IA32_FIXED_PERF_COUNTERS_MASK;
#ifdef POSTK_DEBUG_TEMP_FIX_30
unsigned long counter_mask = 1UL << counter;
#endif/*POSTK_DEBUG_TEMP_FIX_30*/
PERFCTR_CHKANDJUMP(counter_mask & ~mask, "counter_mask out of range", -EINVAL);
@@ -358,10 +376,10 @@ int ihk_mc_perfctr_fixed_init(int counter, int mode)
{
unsigned long value = 0;
unsigned int ctr_mask = 0xf;
int counter_idx = counter - BASE_FIXED_PERF_COUNTERS;
int counter_idx = counter - X86_IA32_BASE_FIXED_PERF_COUNTERS ;
unsigned int set_val = 0;
if (counter_idx < 0 || counter_idx >= NUM_FIXED_PERF_COUNTERS) {
if (counter_idx < 0 || counter_idx >= X86_IA32_NUM_FIXED_PERF_COUNTERS) {
return -EINVAL;
}
@@ -402,7 +420,7 @@ int ihk_mc_perfctr_read_mask(unsigned long counter_mask, unsigned long *value)
{
int i, j;
for (i = 0, j = 0; i < NUM_PERF_COUNTERS && counter_mask;
for (i = 0, j = 0; i < X86_IA32_NUM_PERF_COUNTERS && counter_mask;
i++, counter_mask >>= 1) {
if (counter_mask & 1) {
value[j++] = rdpmc(i);
@@ -422,14 +440,13 @@ unsigned long ihk_mc_perfctr_read(int counter)
cnt_bit = 1UL << counter;
if (cnt_bit & PERF_COUNTERS_MASK) {
if ( cnt_bit & X86_IA32_PERF_COUNTERS_MASK ) {
// read generic pmc
retval = rdpmc(counter);
}
else if (cnt_bit & FIXED_PERF_COUNTERS_MASK) {
else if ( cnt_bit & X86_IA32_FIXED_PERF_COUNTERS_MASK ) {
// read fixed pmc
retval = rdpmc((1 << 30) +
(counter - BASE_FIXED_PERF_COUNTERS));
retval = rdpmc((1 << 30) + (counter - X86_IA32_BASE_FIXED_PERF_COUNTERS));
}
else {
retval = -EINVAL;
@@ -451,12 +468,12 @@ unsigned long ihk_mc_perfctr_read_msr(int counter)
cnt_bit = 1UL << counter;
if (cnt_bit & PERF_COUNTERS_MASK) {
if ( cnt_bit & X86_IA32_PERF_COUNTERS_MASK ) {
// read generic pmc
idx = MSR_IA32_PMC0 + counter;
retval = (unsigned long) rdmsr(idx);
}
else if (cnt_bit & FIXED_PERF_COUNTERS_MASK) {
else if ( cnt_bit & X86_IA32_FIXED_PERF_COUNTERS_MASK ) {
// read fixed pmc
idx = MSR_IA32_FIXED_CTR0 + counter;
retval = (unsigned long) rdmsr(idx);
@@ -489,8 +506,8 @@ int ihk_mc_perfctr_alloc_counter(unsigned int *type, unsigned long *config, unsi
}
// find avail generic counter
for (i = 0; i < NUM_PERF_COUNTERS; i++) {
if (!(pmc_status & (1 << i))) {
for(i = 0; i < X86_IA32_NUM_PERF_COUNTERS; i++) {
if(!(pmc_status & (1 << i))) {
ret = i;
break;
}

View File

@@ -31,11 +31,12 @@
#include <page.h>
#include <limits.h>
#include <syscall.h>
#include <debug.h>
void terminate_mcexec(int, int);
extern long do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact);
long syscall(int num, ihk_mc_user_context_t *ctx);
void set_signal(int sig, void *regs0, siginfo_t *info);
void check_signal(unsigned long rc, void *regs0, int num);
extern unsigned long do_fork(int, unsigned long, unsigned long, unsigned long,
unsigned long, unsigned long, unsigned long);
extern int get_xsave_size();
@@ -44,8 +45,11 @@ extern uint64_t get_xsave_mask();
//#define DEBUG_PRINT_SC
#ifdef DEBUG_PRINT_SC
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define dkprintf kprintf
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#endif
uintptr_t debug_constants[] = {
@@ -88,45 +92,33 @@ static ptrdiff_t vdso_offset;
extern int num_processors;
int obtain_clone_cpuid(cpu_set_t *cpu_set, int use_last) {
int obtain_clone_cpuid(cpu_set_t *cpu_set) {
int min_queue_len = -1;
int cpu, min_cpu = -1, uti_cpu = -1;
unsigned long irqstate;
irqstate = ihk_mc_spinlock_lock(&runq_reservation_lock);
int cpu, min_cpu = -1;
/* Find the first allowed core with the shortest run queue */
for (cpu = 0; cpu < num_processors; ++cpu) {
struct cpu_local_var *v;
unsigned long irqstate;
if (!CPU_ISSET(cpu, cpu_set)) continue;
v = get_cpu_local_var(cpu);
ihk_mc_spinlock_lock_noirq(&v->runq_lock);
dkprintf("%s: cpu=%d,runq_len=%d,runq_reserved=%d\n", __FUNCTION__, cpu, v->runq_len, v->runq_reserved);
if (min_queue_len == -1 || v->runq_len + v->runq_reserved < min_queue_len) {
min_queue_len = v->runq_len + v->runq_reserved;
irqstate = ihk_mc_spinlock_lock(&v->runq_lock);
if (min_queue_len == -1 || v->runq_len < min_queue_len) {
min_queue_len = v->runq_len;
min_cpu = cpu;
}
ihk_mc_spinlock_unlock(&v->runq_lock, irqstate);
/* Record the last tie CPU */
if (min_cpu != cpu && v->runq_len + v->runq_reserved == min_queue_len) {
uti_cpu = cpu;
}
dkprintf("%s: cpu=%d,runq_len=%d,runq_reserved=%d,min_cpu=%d,uti_cpu=%d\n", __FUNCTION__, cpu, v->runq_len, v->runq_reserved, min_cpu, uti_cpu);
ihk_mc_spinlock_unlock_noirq(&v->runq_lock);
#if 0
if (min_queue_len == 0)
break;
#endif
}
min_cpu = use_last ? uti_cpu : min_cpu;
if (min_cpu != -1) {
if (get_cpu_local_var(min_cpu)->status != CPU_STATUS_RESERVED)
get_cpu_local_var(min_cpu)->status = CPU_STATUS_RESERVED;
__sync_fetch_and_add(&get_cpu_local_var(min_cpu)->runq_reserved, 1);
}
ihk_mc_spinlock_unlock(&runq_reservation_lock, irqstate);
return min_cpu;
}
@@ -259,7 +251,7 @@ SYSCALL_DECLARE(rt_sigreturn)
info.si_code = TRAP_TRACE;
set_signal(SIGTRAP, regs, &info);
check_need_resched();
check_signal(0, regs, -1);
check_signal(0, regs, 0);
}
if(ksigsp.fpregs && xsavesize){
@@ -284,6 +276,7 @@ SYSCALL_DECLARE(rt_sigreturn)
}
extern struct cpu_local_var *clv;
extern unsigned long do_kill(struct thread *thread, int pid, int tid, int sig, struct siginfo *info, int ptracecont);
extern void interrupt_syscall(struct thread *, int sig);
extern void terminate(int, int);
extern int num_processors;
@@ -537,32 +530,23 @@ void ptrace_report_signal(struct thread *thread, int sig)
dkprintf("ptrace_report_signal, tid=%d, pid=%d\n", thread->tid, thread->proc->pid);
mcs_rwlock_writer_lock(&proc->update_lock, &lock);
if (!(thread->ptrace & PT_TRACED)) {
if(!(proc->ptrace & PT_TRACED)){
mcs_rwlock_writer_unlock(&proc->update_lock, &lock);
return;
}
/* Transition thread state */
thread->exit_status = sig;
/* Transition thread state */
proc->status = PS_TRACED;
thread->status = PS_TRACED;
thread->ptrace &= ~PT_TRACE_SYSCALL;
save_debugreg(thread->ptrace_debugreg);
proc->ptrace &= ~PT_TRACE_SYSCALL;
if (sig == SIGSTOP || sig == SIGTSTP ||
sig == SIGTTIN || sig == SIGTTOU) {
thread->signal_flags |= SIGNAL_STOP_STOPPED;
}
else {
thread->signal_flags &= ~SIGNAL_STOP_STOPPED;
}
if (thread == proc->main_thread) {
proc->status = PS_DELAY_TRACED;
parent_pid = proc->parent->pid;
}
else {
parent_pid = thread->report_proc->pid;
waitq_wakeup(&thread->report_proc->waitpid_q);
sig == SIGTTIN || sig == SIGTTOU) {
proc->signal_flags |= SIGNAL_STOP_STOPPED;
} else {
proc->signal_flags &= ~SIGNAL_STOP_STOPPED;
}
parent_pid = proc->parent->pid;
save_debugreg(thread->ptrace_debugreg);
mcs_rwlock_writer_unlock(&proc->update_lock, &lock);
memset(&info, '\0', sizeof info);
@@ -571,6 +555,8 @@ void ptrace_report_signal(struct thread *thread, int sig)
info._sifields._sigchld.si_pid = thread->tid;
info._sifields._sigchld.si_status = thread->exit_status;
do_kill(cpu_local_var(current), parent_pid, -1, SIGCHLD, &info, 0);
/* Wake parent (if sleeping in wait4()) */
waitq_wakeup(&proc->parent->waitpid_q);
dkprintf("ptrace_report_signal,sleeping\n");
/* Sleep */
@@ -583,8 +569,9 @@ ptrace_arch_prctl(int pid, long code, long addr)
{
long rc = -EIO;
struct thread *child;
struct mcs_rwlock_node_irqsave lock;
child = find_thread(pid, pid);
child = find_thread(pid, pid, &lock);
if (!child)
return -ESRCH;
if (child->proc->status & (PS_TRACED | PS_STOPPED)) {
@@ -626,7 +613,7 @@ ptrace_arch_prctl(int pid, long code, long addr)
break;
}
}
thread_unlock(child);
thread_unlock(child, &lock);
return rc;
}
@@ -648,13 +635,11 @@ arch_ptrace(long request, int pid, long addr, long data)
static int
isrestart(int num, unsigned long rc, int sig, int restart)
{
if (sig == SIGKILL || sig == SIGSTOP)
if(sig == SIGKILL || sig == SIGSTOP)
return 0;
if (num < 0 || rc != -EINTR)
if(num == 0 || rc != -EINTR)
return 0;
if (sig == SIGCHLD)
return 1;
switch (num) {
switch(num){
case __NR_pause:
case __NR_rt_sigsuspend:
case __NR_rt_sigtimedwait:
@@ -675,12 +660,14 @@ isrestart(int num, unsigned long rc, int sig, int restart)
case __NR_io_getevents:
return 0;
}
if (restart)
if(sig == SIGCHLD)
return 1;
if(restart)
return 1;
return 0;
}
int
void
do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pending *pending, int num)
{
struct x86_user_context *regs = regs0;
@@ -692,15 +679,14 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
int ptraceflag = 0;
struct mcs_rwlock_node_irqsave lock;
struct mcs_rwlock_node_irqsave mcs_rw_node;
int restart = 0;
for(w = pending->sigmask.__val[0], sig = 0; w; sig++, w >>= 1);
dkprintf("do_signal(): tid=%d, pid=%d, sig=%d\n", thread->tid, proc->pid, sig);
orgsig = sig;
if ((thread->ptrace & PT_TRACED) &&
pending->ptracecont == 0 &&
sig != SIGKILL) {
if((proc->ptrace & PT_TRACED) &&
pending->ptracecont == 0 &&
sig != SIGKILL) {
ptraceflag = 1;
sig = SIGSTOP;
}
@@ -721,7 +707,7 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
if(k->sa.sa_handler == SIG_IGN){
kfree(pending);
mcs_rwlock_writer_unlock(&thread->sigcommon->lock, &mcs_rw_node);
goto out;
return;
}
else if(k->sa.sa_handler){
unsigned long *usp; /* user stack */
@@ -771,8 +757,9 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
memcpy(&ksigsp.sigstack, &thread->sigstack, sizeof(stack_t));
ksigsp.sigrc = rc;
ksigsp.num = num;
restart = isrestart(num, rc, sig, k->sa.sa_flags & SA_RESTART);
ksigsp.restart = restart;
ksigsp.restart = isrestart(num, rc, sig, k->sa.sa_flags & SA_RESTART);
if(num != 0 && rc == -EINTR && sig == SIGCHLD)
ksigsp.restart = 1;
if(xsavesize){
uint64_t xsave_mask = get_xsave_mask();
unsigned int low = (unsigned int)xsave_mask;
@@ -785,7 +772,7 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
kfree(_kfpregs);
kprintf("do_signal,no space available\n");
terminate(0, sig);
goto out;
return;
}
kfpregs = (void *)((((unsigned long)_kfpregs) + 63) & ~63);
memset(kfpregs, '\0', xsavesize);
@@ -795,7 +782,7 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
kfree(_kfpregs);
kprintf("do_signal,write_process_vm failed\n");
terminate(0, sig);
goto out;
return;
}
ksigsp.fpregs = (void *)fpregs;
kfree(_kfpregs);
@@ -807,7 +794,7 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
mcs_rwlock_writer_unlock(&thread->sigcommon->lock, &mcs_rw_node);
kprintf("do_signal,write_process_vm failed\n");
terminate(0, sig);
goto out;
return;
}
usp = (unsigned long *)sigsp;
@@ -837,13 +824,12 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
info.si_code = TRAP_TRACE;
set_signal(SIGTRAP, regs, &info);
check_need_resched();
check_signal(0, regs, -1);
check_signal(0, regs, 0);
}
}
else {
int coredumped = 0;
siginfo_t info;
int ptc = pending->ptracecont;
if(ptraceflag){
if(thread->ptrace_recvsig)
@@ -870,37 +856,25 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
info.si_code = CLD_STOPPED;
info._sifields._sigchld.si_pid = thread->proc->pid;
info._sifields._sigchld.si_status = (sig << 8) | 0x7f;
if (ptc == 2 &&
thread != thread->proc->main_thread) {
thread->signal_flags =
SIGNAL_STOP_STOPPED;
thread->status = PS_STOPPED;
thread->exit_status = SIGSTOP;
do_kill(thread,
thread->report_proc->pid, -1,
SIGCHLD, &info, 0);
waitq_wakeup(
&thread->report_proc->waitpid_q);
}
else {
/* Update thread state in fork tree */
mcs_rwlock_writer_lock(
&proc->update_lock, &lock);
proc->group_exit_status = SIGSTOP;
do_kill(cpu_local_var(current), thread->proc->parent->pid, -1, SIGCHLD, &info, 0);
dkprintf("do_signal,SIGSTOP,changing state\n");
/* Reap and set new signal_flags */
proc->main_thread->signal_flags =
SIGNAL_STOP_STOPPED;
/* Update thread state in fork tree */
mcs_rwlock_writer_lock(&proc->update_lock, &lock);
proc->group_exit_status = SIGSTOP;
proc->status = PS_DELAY_STOPPED;
thread->status = PS_STOPPED;
mcs_rwlock_writer_unlock(
&proc->update_lock, &lock);
/* Reap and set new signal_flags */
proc->signal_flags = SIGNAL_STOP_STOPPED;
do_kill(thread,
thread->proc->parent->pid, -1,
SIGCHLD, &info, 0);
}
proc->status = PS_STOPPED;
thread->status = PS_STOPPED;
mcs_rwlock_writer_unlock(&proc->update_lock, &lock);
/* Wake up the parent who tried wait4 and sleeping */
waitq_wakeup(&proc->parent->waitpid_q);
dkprintf("do_signal(): pid: %d, tid: %d SIGSTOP, sleeping\n",
proc->pid, thread->tid);
/* Sleep */
schedule();
dkprintf("SIGSTOP(): woken up\n");
@@ -908,28 +882,19 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
break;
case SIGTRAP:
dkprintf("do_signal,SIGTRAP\n");
if (!(thread->ptrace & PT_TRACED)) {
if(!(proc->ptrace & PT_TRACED)) {
goto core;
}
/* Update thread state in fork tree */
mcs_rwlock_writer_lock(&proc->update_lock, &lock);
thread->exit_status = SIGTRAP;
proc->status = PS_TRACED;
thread->status = PS_TRACED;
if (thread == proc->main_thread) {
mcs_rwlock_writer_lock(&proc->update_lock,
&lock);
proc->group_exit_status = SIGTRAP;
proc->status = PS_DELAY_TRACED;
mcs_rwlock_writer_unlock(&proc->update_lock,
&lock);
do_kill(thread, thread->proc->parent->pid, -1,
SIGCHLD, &info, 0);
}
else {
do_kill(thread, thread->report_proc->pid, -1,
SIGCHLD, &info, 0);
waitq_wakeup(&thread->report_proc->waitpid_q);
}
mcs_rwlock_writer_unlock(&proc->update_lock, &lock);
/* Wake up the parent who tried wait4 and sleeping */
waitq_wakeup(&thread->proc->parent->waitpid_q);
/* Sleep */
dkprintf("do_signal,SIGTRAP,sleeping\n");
@@ -944,7 +909,7 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
info._sifields._sigchld.si_pid = proc->pid;
info._sifields._sigchld.si_status = 0x0000ffff;
do_kill(cpu_local_var(current), proc->parent->pid, -1, SIGCHLD, &info, 0);
proc->main_thread->signal_flags = SIGNAL_STOP_CONTINUED;
proc->signal_flags = SIGNAL_STOP_CONTINUED;
proc->status = PS_RUNNING;
dkprintf("do_signal,SIGCONT,do nothing\n");
break;
@@ -973,8 +938,6 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
break;
}
}
out:
return restart;
}
static struct sig_pending *
@@ -994,12 +957,10 @@ getsigpending(struct thread *thread, int delflag){
lock = &thread->sigcommon->lock;
head = &thread->sigcommon->sigpending;
for(;;) {
if (delflag) {
if (delflag)
mcs_rwlock_writer_lock(lock, &mcs_rw_node);
}
else {
else
mcs_rwlock_reader_lock(lock, &mcs_rw_node);
}
list_for_each_entry_safe(pending, next, head, list){
for(x = pending->sigmask.__val[0], sig = 0; x; sig++, x >>= 1);
@@ -1012,23 +973,19 @@ getsigpending(struct thread *thread, int delflag){
if(delflag)
list_del(&pending->list);
if (delflag) {
if (delflag)
mcs_rwlock_writer_unlock(lock, &mcs_rw_node);
}
else {
else
mcs_rwlock_reader_unlock(lock, &mcs_rw_node);
}
return pending;
}
}
}
if (delflag) {
if (delflag)
mcs_rwlock_writer_unlock(lock, &mcs_rw_node);
}
else {
else
mcs_rwlock_reader_unlock(lock, &mcs_rw_node);
}
if(lock == &thread->sigpendinglock)
return NULL;
@@ -1043,11 +1000,6 @@ getsigpending(struct thread *thread, int delflag){
struct sig_pending *
hassigpending(struct thread *thread)
{
if (list_empty(&thread->sigpending) &&
list_empty(&thread->sigcommon->sigpending)) {
return NULL;
}
return getsigpending(thread, 0);
}
@@ -1065,12 +1017,6 @@ void save_syscall_return_value(int num, unsigned long rc)
return;
}
/** \brief check arrived signals and processing
*
* @param rc return value of syscall
* @param regs0 context
* @param num syscall number (-1: Not called on exiting system call)
*/
void
check_signal(unsigned long rc, void *regs0, int num)
{
@@ -1104,11 +1050,6 @@ check_signal(unsigned long rc, void *regs0, int num)
goto out;
}
if (list_empty(&thread->sigpending) &&
list_empty(&thread->sigcommon->sigpending)) {
goto out;
}
for(;;){
pending = getsigpending(thread, 1);
if(!pending) {
@@ -1116,9 +1057,7 @@ check_signal(unsigned long rc, void *regs0, int num)
goto out;
}
if (do_signal(rc, regs, thread, pending, num)) {
num = -1;
}
do_signal(rc, regs, thread, pending, num);
}
out:
@@ -1198,7 +1137,7 @@ check_sig_pending_thread(struct thread *thread)
}
void
check_sig_pending(void)
check_sig_pending()
{
struct thread *thread;
struct cpu_local_var *v;
@@ -1219,7 +1158,7 @@ repeat:
continue;
}
if (thread->proc->group_exit_status & 0x0000000100000000L) {
if (thread->proc->exit_status & 0x0000000100000000L) {
continue;
}
@@ -1428,8 +1367,7 @@ done:
return 0;
}
/* Forward signal to Linux by interrupt_syscall mechanism */
if (tthread->uti_state == UTI_STATE_RUNNING_IN_LINUX) {
if (tthread->thread_offloaded) {
if (!tthread->proc->nohost) {
interrupt_syscall(tthread, sig);
}
@@ -1446,10 +1384,10 @@ done:
in check_signal */
rc = 0;
k = tthread->sigcommon->action + sig - 1;
if ((sig != SIGKILL && (tthread->ptrace & PT_TRACED)) ||
(k->sa.sa_handler != (void *)1 &&
(k->sa.sa_handler != NULL ||
(sig != SIGCHLD && sig != SIGURG)))) {
if((sig != SIGKILL && (tproc->ptrace & PT_TRACED)) ||
(k->sa.sa_handler != (void *)1 &&
(k->sa.sa_handler != NULL ||
(sig != SIGCHLD && sig != SIGURG)))){
struct sig_pending *pending = NULL;
if (sig < 33) { // SIGRTMIN - SIGRTMAX
list_for_each_entry(pending, head, list){
@@ -1533,7 +1471,7 @@ set_signal(int sig, void *regs0, siginfo_t *info)
SYSCALL_DECLARE(mmap)
{
const unsigned int supported_flags = 0
const int supported_flags = 0
| MAP_SHARED // 01
| MAP_PRIVATE // 02
| MAP_FIXED // 10
@@ -1541,7 +1479,7 @@ SYSCALL_DECLARE(mmap)
| MAP_LOCKED // 2000
| MAP_POPULATE // 8000
| MAP_HUGETLB // 00040000
| (0x3FU << MAP_HUGE_SHIFT) // FC000000
| (0x3F << MAP_HUGE_SHIFT) // FC000000
;
const int ignored_flags = 0
#ifdef USE_NOCACHE_MMAP
@@ -1560,7 +1498,7 @@ SYSCALL_DECLARE(mmap)
| MAP_NONBLOCK // 00010000
;
const uintptr_t addr0 = ihk_mc_syscall_arg0(ctx);
const intptr_t addr0 = ihk_mc_syscall_arg0(ctx);
const size_t len0 = ihk_mc_syscall_arg1(ctx);
const int prot = ihk_mc_syscall_arg2(ctx);
const int flags0 = ihk_mc_syscall_arg3(ctx);
@@ -1569,7 +1507,7 @@ SYSCALL_DECLARE(mmap)
struct thread *thread = cpu_local_var(current);
struct vm_regions *region = &thread->vm->region;
int error;
uintptr_t addr = 0;
intptr_t addr = 0;
size_t len;
int flags = flags0;
size_t pgsize;
@@ -1632,9 +1570,8 @@ SYSCALL_DECLARE(mmap)
goto out;
}
if (addr < region->user_start
|| region->user_end <= addr
|| len > (region->user_end - region->user_start)) {
if ((flags & MAP_FIXED) && ((addr < region->user_start)
|| (region->user_end <= addr))) {
ekprintf("sys_mmap(%lx,%lx,%x,%x,%x,%lx):ENOMEM\n",
addr0, len0, prot, flags0, fd, off0);
error = -ENOMEM;
@@ -1761,11 +1698,6 @@ SYSCALL_DECLARE(arch_prctl)
ihk_mc_syscall_arg1(ctx));
}
SYSCALL_DECLARE(time)
{
return time();
}
static int vdso_get_vdso_info(void)
{
int error;
@@ -2148,7 +2080,7 @@ int do_process_vm_read_writev(int pid,
range = lookup_process_memory_range(lthread->vm,
(uintptr_t)local_iov,
(uintptr_t)(local_iov + liovcnt));
(uintptr_t)(local_iov + liovcnt * sizeof(struct iovec)));
if (!range) {
ret = -EFAULT;
@@ -2157,7 +2089,7 @@ int do_process_vm_read_writev(int pid,
range = lookup_process_memory_range(lthread->vm,
(uintptr_t)remote_iov,
(uintptr_t)(remote_iov + riovcnt));
(uintptr_t)(remote_iov + riovcnt * sizeof(struct iovec)));
if (!range) {
ret = -EFAULT;
@@ -2433,6 +2365,8 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
case 0:
memcpy(mpsr->virt_addr, mpsr->user_virt_addr,
sizeof(void *) * count);
memcpy(mpsr->status, mpsr->user_status,
sizeof(int) * count);
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
memset(mpsr->ptep, 0, sizeof(pte_t) * count);
@@ -2452,38 +2386,41 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
case 0:
memcpy(mpsr->virt_addr, mpsr->user_virt_addr,
sizeof(void *) * count);
memcpy(mpsr->status, mpsr->user_status,
sizeof(int) * count);
case 1:
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
mpsr->nodes_ready = 1;
break;
case 1:
memset(mpsr->ptep, 0, sizeof(pte_t) * count);
memset(mpsr->status, 0, sizeof(int) * count);
memset(mpsr->nr_pages, 0, sizeof(int) * count);
memset(mpsr->dst_phys, 0,
sizeof(unsigned long) * count);
mpsr->nodes_ready = 1;
break;
default:
break;
}
}
else if (nr_cpus >= 4 && nr_cpus < 7) {
else if (nr_cpus >= 4 && nr_cpus < 8) {
switch (cpu_index) {
case 0:
memcpy(mpsr->virt_addr, mpsr->user_virt_addr,
sizeof(void *) * count);
break;
case 1:
memcpy(mpsr->status, mpsr->user_status,
sizeof(int) * count);
break;
case 2:
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
mpsr->nodes_ready = 1;
break;
case 2:
case 3:
memset(mpsr->ptep, 0, sizeof(pte_t) * count);
memset(mpsr->status, 0, sizeof(int) * count);
break;
case 3:
memset(mpsr->nr_pages, 0, sizeof(int) * count);
memset(mpsr->dst_phys, 0,
sizeof(unsigned long) * count);
@@ -2493,7 +2430,7 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
break;
}
}
else {
else if (nr_cpus >= 8) {
switch (cpu_index) {
case 0:
memcpy(mpsr->virt_addr, mpsr->user_virt_addr,
@@ -2505,23 +2442,28 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
sizeof(void *) * (count / 2));
break;
case 2:
memcpy(mpsr->status, mpsr->user_status,
sizeof(int) * count);
break;
case 3:
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
mpsr->nodes_ready = 1;
break;
case 3:
case 4:
memset(mpsr->ptep, 0, sizeof(pte_t) * count);
break;
case 4:
case 5:
memset(mpsr->status, 0, sizeof(int) * count);
break;
case 5:
case 6:
memset(mpsr->nr_pages, 0, sizeof(int) * count);
break;
case 6:
case 7:
memset(mpsr->dst_phys, 0,
sizeof(unsigned long) * count);
break;
default:
break;
}
@@ -2729,19 +2671,11 @@ out:
time_t time(void) {
struct syscall_request sreq IHK_DMA_ALIGN;
struct timespec ats;
time_t ret = 0;
if (gettime_local_support) {
calculate_time_from_tsc(&ats);
ret = ats.tv_sec;
}
else {
sreq.number = __NR_time;
sreq.args[0] = (uintptr_t)NULL;
ret = (time_t)do_syscall(&sreq, ihk_mc_get_processor_id());
}
struct thread *thread = cpu_local_var(current);
time_t ret;
sreq.number = __NR_time;
sreq.args[0] = (uintptr_t)NULL;
ret = (time_t)do_syscall(&sreq, ihk_mc_get_processor_id(), thread->proc->pid);
return ret;
}

View File

@@ -31,6 +31,51 @@ struct tod_data_s tod_data
.version = IHK_ATOMIC64_INIT(0),
};
static inline void cpu_pause_for_vsyscall(void)
{
asm volatile ("pause" ::: "memory");
return;
} /* cpu_pause_for_vsyscall() */
static inline void calculate_time_from_tsc(struct timespec *ts)
{
long ver;
unsigned long current_tsc;
__time_t sec_delta;
long ns_delta;
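/* Seqlock-style read: copy the time-of-day origin and retry if
 * settimeofday() changes tod_data.version in the meantime. */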
for (;;) {
while ((ver = ihk_atomic64_read(&tod_data.version)) & 1) {
/* settimeofday() is in progress */
cpu_pause_for_vsyscall();
}
rmb();
*ts = tod_data.origin;
rmb();
if (ver == ihk_atomic64_read(&tod_data.version)) {
break;
}
/* settimeofday() has intervened */
cpu_pause_for_vsyscall();
}
current_tsc = rdtsc();
sec_delta = current_tsc / tod_data.clocks_per_sec;
ns_delta = NS_PER_SEC * (current_tsc % tod_data.clocks_per_sec)
/ tod_data.clocks_per_sec;
/* calc. of ns_delta overflows if clocks_per_sec exceeds 18.44 GHz */
ts->tv_sec += sec_delta;
ts->tv_nsec += ns_delta;
if (ts->tv_nsec >= NS_PER_SEC) {
ts->tv_nsec -= NS_PER_SEC;
++ts->tv_sec;
}
return;
} /* calculate_time_from_tsc() */
int vsyscall_gettimeofday(struct timeval *tv, void *tz)
{
int error;

View File

@@ -43,8 +43,7 @@ error_exit() {
;;
esac
# Return -EINVAL
exit -22
exit 1
}
fi
@@ -145,5 +144,3 @@ for cpuid in `find /sys/bus/cpu/devices/* -maxdepth 0 -name "cpu[0123456789]*" -
rm -rf /tmp/mcos/mcos0_sys/bus/cpu/devices/$cpuid
fi
done
exit 0

View File

@@ -8,9 +8,6 @@ if grep mcoverlay /proc/modules &>/dev/null; then
if [ -e /tmp/mcos ]; then rm -rf /tmp/mcos; fi
if ! rmmod mcoverlay 2>/dev/null; then
echo "error: removing mcoverlay" >&2
# Return -EINVAL
exit -22
exit 1
fi
fi
exit 0

View File

@@ -12,7 +12,6 @@
# the same set of resources as it used previously.
# Note that the script does not output anything unless an error occurs.
ret=1
prefix="@prefix@"
BINDIR="${prefix}/bin"
SBINDIR="${prefix}/sbin"
@@ -50,7 +49,7 @@ umask_old=`umask`
idle_halt=""
allow_oversubscribe=""
while getopts stk:c:m:o:f:r:q:i:d:e:hO OPT
while getopts :stk:c:m:o:f:r:q:i:d:e:hO OPT
do
case ${OPT} in
f) facility=${OPTARG}
@@ -81,11 +80,14 @@ do
;;
O) allow_oversubscribe="allow_oversubscribe"
;;
\?) exit 1
;;
*) echo "invalid option -${OPT}" >&2
exit 1
esac
done
redirect_kmsg=0
turbo="turbo"
# Start ihkmond
pid=`pidof ihkmond`
if [ "${pid}" != "" ]; then
@@ -208,8 +210,7 @@ error_exit() {
;;
esac
# Propagate exit status if any
exit $ret
exit 1
}
ihk_ikc_irq_core=0
@@ -235,7 +236,7 @@ if [ "${ENABLE_MCOVERLAYFS}" == "yes" ]; then
enable_mcoverlay="yes"
fi
else
if [ ${linux_version_code} -eq 199168 -a ${rhel_release} -ge 327 -a ${rhel_release} -le 862 ]; then
if [ ${linux_version_code} -eq 199168 -a ${rhel_release} -ge 327 -a ${rhel_release} -le 693 ]; then
enable_mcoverlay="yes"
fi
if [ ${linux_version_code} -ge 262144 -a ${linux_version_code} -lt 262400 ]; then
@@ -260,11 +261,7 @@ fi
# Remove mcoverlay if loaded
if [ "$enable_mcoverlay" == "yes" ]; then
${SBINDIR}/mcoverlay-destroy.sh
ret=$?
if [ $ret -ne 0 ]; then
error_exit "initial"
fi
. ${SBINDIR}/mcoverlay-destroy.sh
fi
# Stop irqbalance
@@ -308,16 +305,25 @@ if ! grep -E 'ihk\s' /proc/modules &>/dev/null; then
fi
fi
# Increase swappiness so that we have better chance to allocate memory for IHK
echo 100 > /proc/sys/vm/swappiness
# Copy modules under /tmp to avoid loading from shared FS
if mkdir -p /tmp/mcos-kmod; then
cp ${KMODDIR}/* /tmp/mcos-kmod/
KMODDIR="/tmp/mcos-kmod/"
fi
# Drop Linux caches to free memory
sync && echo 3 > /proc/sys/vm/drop_caches
# Fujitsu drops caches for us in between jobs so don't do it on OFP
if [ "`hostname | grep "c[0-9][0-9][0-9][0-9].ofp"`" == "" ]; then
# Increase swappiness so that we have better chance to allocate memory for IHK
echo 100 > /proc/sys/vm/swappiness
# Merge free memory areas into large, physically contiguous ones
echo 1 > /proc/sys/vm/compact_memory 2>/dev/null
# Drop Linux caches to free memory
sync && echo 3 > /proc/sys/vm/drop_caches
sync
# Merge free memory areas into large, physically contiguous ones
echo 1 > /proc/sys/vm/compact_memory 2>/dev/null
sync
fi
# Load IHK-SMP if not loaded and reserve CPUs and memory
if ! grep ihk_smp_@ARCH@ /proc/modules &>/dev/null; then
@@ -338,41 +344,41 @@ if ! grep ihk_smp_@ARCH@ /proc/modules &>/dev/null; then
error_exit "ihk_loaded"
fi
# Offline-reonline RAM (special case for OFP SNC-4 flat mode)
if [ "`hostname | grep "c[0-9][0-9][0-9][0-9].ofp"`" != "" ] && [ "`cat /sys/devices/system/node/online`" == "0-7" ]; then
for i in 0 1 2 3; do
find /sys/devices/system/node/node$i/memory*/ -name "online" | while read f; do
echo 0 > $f 2>&1 > /dev/null;
done
find /sys/devices/system/node/node$i/memory*/ -name "online" | while read f; do
echo 1 > $f 2>&1 > /dev/null;
done
done
for i in 4 5 6 7; do
find /sys/devices/system/node/node$i/memory*/ -name "online" | while read f; do
echo 0 > $f 2>&1 > /dev/null;
done
done
for i in 4 5 6 7; do
find /sys/devices/system/node/node$i/memory*/ -name "online" | while read f; do
echo 1 > $f 2>&1 > /dev/null;
done
done
fi
# Offline-reonline RAM (special case for OFP Quadrant flat mode)
if [ "`hostname | grep "c[0-9][0-9][0-9][0-9].ofp"`" != "" ] && [ "`cat /sys/devices/system/node/online`" == "0-1" ]; then
for i in 1; do
find /sys/devices/system/node/node$i/memory*/ -name "online" | while read f; do
echo 0 > $f 2>&1 > /dev/null;
done
done
for i in 1; do
find /sys/devices/system/node/node$i/memory*/ -name "online" | while read f; do
echo 1 > $f 2>&1 > /dev/null;
done
done
fi
# # Offline-reonline RAM (special case for OFP SNC-4 flat mode)
# if [ "`hostname | grep "c[0-9][0-9][0-9][0-9].ofp"`" != "" ] && [ "`cat /sys/devices/system/node/online`" == "0-7" ]; then
# for i in 0 1 2 3; do
# find /sys/devices/system/node/node$i/memory*/ -name "online" | while read f; do
# echo 0 | tee $f 2>/dev/null 1>/dev/null
# done
# find /sys/devices/system/node/node$i/memory*/ -name "online" | while read f; do
# echo 1 | tee $f 2>/dev/null 1>/dev/null
# done
# done
# for i in 4 5 6 7; do
# find /sys/devices/system/node/node$i/memory*/ -name "online" | while read f; do
# echo 0 | tee $f 2>/dev/null 1>/dev/null
# done
# done
# for i in 4 5 6 7; do
# find /sys/devices/system/node/node$i/memory*/ -name "online" | while read f; do
# echo 1 | tee $f 2>/dev/null 1>/dev/null
# done
# done
# fi
#
# # Offline-reonline RAM (special case for OFP Quadrant flat mode)
# if [ "`hostname | grep "c[0-9][0-9][0-9][0-9].ofp"`" != "" ] && [ "`cat /sys/devices/system/node/online`" == "0-1" ]; then
# for i in 1; do
# find /sys/devices/system/node/node$i/memory*/ -name "online" | while read f; do
# echo 0 | tee $f 2>/dev/null 1>/dev/null
# done
# done
# for i in 1; do
# find /sys/devices/system/node/node$i/memory*/ -name "online" | while read f; do
# echo 1 | tee $f 2>/dev/null 1>/dev/null
# done
# done
# fi
if ! ${SBINDIR}/ihkconfig 0 reserve mem ${mem}; then
echo "error: reserving memory" >&2
@@ -467,11 +473,7 @@ fi
# Overlay /proc, /sys with McKernel specific contents
if [ "$enable_mcoverlay" == "yes" ]; then
${SBINDIR}/mcoverlay-create.sh
ret=$?
if [ $ret -ne 0 ]; then
error_exit "os_created"
fi
. ${SBINDIR}/mcoverlay-create.sh
fi
# Start irqbalance with CPUs and IRQ for McKernel banned

View File

@@ -100,13 +100,7 @@ if grep mcctrl /proc/modules &>/dev/null; then
fi
# Remove mcoverlay if loaded
${SBINDIR}/mcoverlay-destroy.sh
ret=$?
if [ $ret -ne 0 ]; then
echo "error: mcoverlay-destroy.sh" >&2
exit $ret
fi
. ${SBINDIR}/mcoverlay-destroy.sh
# Remove SMP module
if grep ihk_smp_@ARCH@ /proc/modules &>/dev/null; then

View File

@@ -0,0 +1,60 @@
.\" Man page for mpimcexec
.\"
.TH MPIMCEXEC 1 "@MCKERNEL_RELEASE_DATE@" "Version @MCKERNEL_VERSION@" "MCKERNEL @MCKERNEL_VERSION@"
.SH NAME
mpimcexec \- run an MPI application on McKernel
.\"
.\" ---------------------------- SYNOPSIS ----------------------------
.SH SYNOPSIS
.B mpimcexec \fR [\fIoptions\fR] \fI<command>\fR
.\" ---------------------------- DESCRIPTION ----------------------------
.SH DESCRIPTION
mpimcexec is a wrapper script for running MPI applications on McKernel.
It internally calls mpiexec to spawn mcexec on compute nodes, which in
turn runs \fI<command>\fR on McKernel. mpimcexec specifies a number of
mcexec arguments that enable high performance execution.
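.PP
Internally, a command of the form
.br
$ mpimcexec -ppn <P> -n <N> <command>
.br
is expanded into roughly
.br
$ mpirun -n <N> -ppn <P> mcexec -n <P> [tuning options] <command>
.br
where the exact set of mcexec tuning options depends on the installation.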
.\" ---------------------------- OPTIONS ----------------------------
.SH OPTIONS
.TP
.B -ppn N, --ppn N, --ranks-per-node N
Specify the number of MPI ranks per node. This argument is required.
.TP
.B -n N, --n N, --ranks N
Specify the total number of MPI ranks.
e.g.,
$ mpimcexec -n 32 -ppn 4 ./a.out
.br
In the above example, 32 MPI processes are invoked
on eight compute nodes, each of which runs four processes.
.TP
.B --nodes N
Specify the number of compute nodes.
By default, all nodes specified by the "PJM --mpi proc" option are used.
.TP
.B --env, -env
Pass an additional environment variable.
.TP
.B -m N, --numa N
Specify the preferred NUMA node.
.TP
.B -h <file name>, --hostfile <file name>
Specify a host file for MPI.
.TP
.B --help
Show help message.
.PP
.\" ---------------------------- SEE ALSO ----------------------------
.SH SEE ALSO
\fBmcexec\fR (1), \fBmpiexec\fR (1)
.\" ---------------------------- AUTHORS ----------------------------
.SH AUTHORS
Copyright (C) 2018 McKernel Development Team, RIKEN, Japan

147
arch/x86_64/tools/mpimcexec.in Executable file
View File

@@ -0,0 +1,147 @@
#!/bin/bash
#
# OFP McKernel MPI wrapper script
# author: Balazs Gerofi <bgerofi@riken.jp>
# Copyright (C) 2018 RIKEN R-CCS
#
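# Usage:
#   mpimcexec -ppn ranks_per_node [-n ranks] [--nodes nodes] \
#             [--env additional_environment]... [-m numa_node] [-h hostfile] command
# (see help_exit below for option details)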
prefix="@prefix@"
BINDIR="${prefix}/bin"
if [ "${BASH_VERSINFO[0]}" -lt 4 ]; then
echo "You need at least bash-4.0 to run this script." >&2
exit 1
fi
RANKS=""
NODES=""
PPN=""
MPI_ENV=""
COMMAND=""
NUMA=""
HOSTFILE=""
if [ ! -z "${PJM_PROC_BY_NODE}" ]; then
PPN=${PJM_PROC_BY_NODE}
elif [ ! -z "${MPI_LOCALNRANKS}" ]; then
PPN=${MPI_LOCALNRANKS}
fi
help_exit() {
echo ""
echo "Spawn an McKernel MPI job on Oakforest-PACS."
echo "usage: `basename $0` -ppn ranks_per_node [--nodes nodes] [-n ranks] [--env additional_environment]... command"
echo ""
echo " -ppn | --ppn | --ranks-per-node Number of MPI ranks per node (required)"
echo " -n | --n | --ranks Total number of MPI ranks in the job"
echo " --nodes Number of nodes to be used"
echo " --env | -env Pass an additional environment variable"
echo " -m | --numa Preferred NUMA node(s)"
echo " -h | --hostfile Host file for MPI"
echo " --help Show help message"
exit 1
}
# Parse options
while true; do
case $1 in
-ppn | --ppn | --ranks-per-node )
if [ $# -lt 2 ]; then
echo "error: needs an interger value for -ppn, --ppn, or --ranks-per-node option"
help_exit
fi
PPN=$2
shift 2
;;
-n | --n | --ranks )
if [ $# -lt 2 ]; then
echo "error: needs an interger value for -n, --n, or --ranks option"
help_exit
fi
RANKS=$2
shift 2
;;
-m | --numa )
if [ $# -lt 2 ]; then
echo "error: needs an interger value for -m or --numa option"
help_exit
fi
NUMA="-m $2"
shift 2
;;
--nodes )
if [ $# -lt 2 ]; then
echo "error: needs an interger value for --nodes option"
help_exit
fi
NODES=$2
shift 2
;;
--env | -env )
if [ $# -lt 2 ]; then
echo "error: needs an environment variable name for -env or --env option"
help_exit
fi
if [ -z "`echo $2 | grep I_MPI_PIN`" ]; then
MPI_ENV=`echo "${MPI_ENV} -env $2" | xargs`
fi
shift 2
;;
-h | --hostfile )
if [ $# -lt 2 ]; then
echo "error: needs a file name for -h or --hostfile option"
help_exit
fi
HOSTFILE="-hostfile $2"
shift 2
;;
--help )
help_exit
;;
* )
COMMAND=$@
break
;;
esac
done
if [ -z ${PPN} ]; then
echo "error: please specify the number of ranks per node"
help_exit
fi
# Unless explicitly specified, use Fujitsu inherited value
if [ -z ${NODES} ]; then
NODES=${PJM_VNODES}
fi
if [ -z ${RANKS} ] && [ -z ${NODES} ]; then
echo "error: please specify the total number of ranks or the number of nodes"
help_exit
fi
if [ "x${COMMAND}" = "x" ]; then
echo "error: please specify command"
help_exit
fi
# Calculate total job size if not specified
if [ -z ${RANKS} ]; then
let RANKS=(${PPN}*${NODES})
fi
# Support direct SSH when not executed from Fujitsu job system
if [ -z ${PJM_VNODES} ]; then
HOSTFILE="-launcher-exec ssh ${HOSTFILE}"
fi
export I_MPI_PIN=off
export PSM2_RCVTHREAD=0
export HFI_NO_CPUAFFINITY=1
export I_MPI_COLL_INTRANODE_SHM_THRESHOLD=4194304
export PSM2_MQ_RNDV_HFI_WINDOW=4194304
export PSM2_MQ_EAGER_SDMA_SZ=65536
export PSM2_MQ_RNDV_HFI_THRESH=200000
mpirun ${HOSTFILE} -n ${RANKS} -ppn ${PPN} ${MPI_ENV} ${BINDIR}/mcexec -n ${PPN} ${NUMA} --enable-hfi1 --mpol-threshold=1M --stack-premap=4M,4G --extend-heap-by=8M --disable-sched-yield --mpol-shm-premap ${COMMAND}

View File

@@ -54,6 +54,48 @@
/* Define to 1 if you have the <unistd.h> header file. */
#undef HAVE_UNISTD_H
/* Define to address of kernel symbol __vvar_page, or 0 if exported */
#undef MCCTRL_KSYM___vvar_page
/* Define to address of kernel symbol hpet_address, or 0 if exported */
#undef MCCTRL_KSYM_hpet_address
/* Define to address of kernel symbol hv_clock, or 0 if exported */
#undef MCCTRL_KSYM_hv_clock
/* Define to address of kernel symbol sys_mount, or 0 if exported */
#undef MCCTRL_KSYM_sys_mount
/* Define to address of kernel symbol sys_readlink, or 0 if exported */
#undef MCCTRL_KSYM_sys_readlink
/* Define to address of kernel symbol sys_umount, or 0 if exported */
#undef MCCTRL_KSYM_sys_umount
/* Define to address of kernel symbol sys_unshare, or 0 if exported */
#undef MCCTRL_KSYM_sys_unshare
/* Define to address of kernel symbol vdso_end, or 0 if exported */
#undef MCCTRL_KSYM_vdso_end
/* Define to address of kernel symbol vdso_image_64, or 0 if exported */
#undef MCCTRL_KSYM_vdso_image_64
/* Define to address of kernel symbol vdso_pages, or 0 if exported */
#undef MCCTRL_KSYM_vdso_pages
/* Define to address of kernel symbol vdso_spec, or 0 if exported */
#undef MCCTRL_KSYM_vdso_spec
/* Define to address of kernel symbol vdso_start, or 0 if exported */
#undef MCCTRL_KSYM_vdso_start
/* Define to address of kernel symbol walk_page_range, or 0 if exported */
#undef MCCTRL_KSYM_walk_page_range
/* Define to address of kernel symbol zap_page_range, or 0 if exported */
#undef MCCTRL_KSYM_zap_page_range
/* McKernel specific headers */
#undef MCKERNEL_INCDIR
@@ -86,6 +128,3 @@
/* Define to 1 if you have the ANSI C header files. */
#undef STDC_HEADERS
/* whether or not syscall_intercept library is linked */
#undef WITH_SYSCALL_INTERCEPT

650
configure vendored
View File

@@ -1,6 +1,6 @@
#! /bin/sh
# Guess values for system-dependent variables and create Makefiles.
# Generated by GNU Autoconf 2.69 for mckernel 1.6.0.
# Generated by GNU Autoconf 2.69 for mckernel 1.5.1-knl+hfi.
#
#
# Copyright (C) 1992-1996, 1998-2012 Free Software Foundation, Inc.
@@ -577,8 +577,8 @@ MAKEFLAGS=
# Identity of this package.
PACKAGE_NAME='mckernel'
PACKAGE_TARNAME='mckernel'
PACKAGE_VERSION='1.6.0'
PACKAGE_STRING='mckernel 1.6.0'
PACKAGE_VERSION='1.5.1-knl+hfi'
PACKAGE_STRING='mckernel 1.5.1-knl+hfi'
PACKAGE_BUGREPORT=''
PACKAGE_URL=''
@@ -628,12 +628,9 @@ IHK_RELEASE_DATE
DCFA_VERSION
MCKERNEL_VERSION
IHK_VERSION
WITH_SYSCALL_INTERCEPT
ENABLE_QLMPI
ENABLE_RUSAGE
ENABLE_MCOVERLAYFS
LDFLAGS_SYSCALL_INTERCEPT
CPPFLAGS_SYSCALL_INTERCEPT
MANDIR
KERNDIR
KMODDIR
@@ -705,9 +702,6 @@ enable_option_checking
with_mpi
with_mpi_include
with_mpi_lib
with_syscall_intercept
with_syscall_intercept_include
with_syscall_intercept_lib
with_kernelsrc
with_target
with_system_map
@@ -1268,7 +1262,7 @@ if test "$ac_init_help" = "long"; then
# Omit some internal or obsolete options to make the list less imposing.
# This message is too long to be a string in the A/UX 3.1 sh.
cat <<_ACEOF
\`configure' configures mckernel 1.6.0 to adapt to many kinds of systems.
\`configure' configures mckernel 1.5.1-knl+hfi to adapt to many kinds of systems.
Usage: $0 [OPTION]... [VAR=VALUE]...
@@ -1329,7 +1323,7 @@ fi
if test -n "$ac_init_help"; then
case $ac_init_help in
short | recursive ) echo "Configuration of mckernel 1.6.0:";;
short | recursive ) echo "Configuration of mckernel 1.5.1-knl+hfi:";;
esac
cat <<\_ACEOF
@@ -1352,15 +1346,6 @@ Optional Packages:
--with-mpi-include=PATH specify path where mpi include directory can be
found
--with-mpi-lib=PATH specify path where mpi lib directory can be found
--with-syscall_intercept=PATH
specify path where syscall_intercept include
directory and lib directory can be found
--with-syscall_intercept-include=PATH
specify path where syscall_intercept include
directory can be found
--with-syscall_intercept-lib=PATH
specify path where syscall_intercept lib directory
can be found
--with-kernelsrc=path Path to 'kernel src', default is
/lib/modules/uname_r/build
--with-target={attached-mic | builtin-mic | builtin-x86 | smp-x86}
@@ -1446,7 +1431,7 @@ fi
test -n "$ac_init_help" && exit $ac_status
if $ac_init_version; then
cat <<\_ACEOF
mckernel configure 1.6.0
mckernel configure 1.5.1-knl+hfi
generated by GNU Autoconf 2.69
Copyright (C) 2012 Free Software Foundation, Inc.
@@ -1744,7 +1729,7 @@ cat >config.log <<_ACEOF
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.
It was created by mckernel $as_me 1.6.0, which was
It was created by mckernel $as_me 1.5.1-knl+hfi, which was
generated by GNU Autoconf 2.69. Invocation command line was
$ $0 $@
@@ -2097,13 +2082,11 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
IHK_VERSION=1.6.0
MCKERNEL_VERSION=1.6.0
IHK_VERSION=1.5.1-knl+hfi
MCKERNEL_VERSION=1.5.1-knl+hfi
DCFA_VERSION=DCFA_VERSION_m4
IHK_RELEASE_DATE=2018-11-11
MCKERNEL_RELEASE_DATE=2018-11-11
IHK_RELEASE_DATE=2019-05-14
MCKERNEL_RELEASE_DATE=2019-05-14
DCFA_RELEASE_DATE=DCFA_RELEASE_DATE_m4
@@ -3530,195 +3513,6 @@ fi
# Check whether --with-syscall_intercept was given.
if test "${with_syscall_intercept+set}" = set; then :
withval=$with_syscall_intercept; case "$withval" in #(
yes|no|'') :
{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: --without-syscall_intercept=PATH expects a valid PATH" >&5
$as_echo "$as_me: WARNING: --without-syscall_intercept=PATH expects a valid PATH" >&2;}
with_syscall_intercept="" ;; #(
*) :
;;
esac
else
with_syscall_intercept=
fi
# Check whether --with-syscall_intercept-include was given.
if test "${with_syscall_intercept_include+set}" = set; then :
withval=$with_syscall_intercept_include; case "$withval" in #(
yes|no|'') :
{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: --without-syscall_intercept-include=PATH expects a valid PATH" >&5
$as_echo "$as_me: WARNING: --without-syscall_intercept-include=PATH expects a valid PATH" >&2;}
with_syscall_intercept_include="" ;; #(
*) :
;;
esac
fi
# Check whether --with-syscall_intercept-lib was given.
if test "${with_syscall_intercept_lib+set}" = set; then :
withval=$with_syscall_intercept_lib; case "$withval" in #(
yes|no|'') :
{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: --without-syscall_intercept-lib=PATH expects a valid PATH" >&5
$as_echo "$as_me: WARNING: --without-syscall_intercept-lib=PATH expects a valid PATH" >&2;}
with_syscall_intercept_lib="" ;; #(
*) :
;;
esac
fi
# The args have been sanitized into empty/non-empty values above.
# Now append -I/-L args to CPPFLAGS/LDFLAGS, with more specific options
# taking priority
if test -n "${with_syscall_intercept_include}"; then :
if echo "$CPPFLAGS_SYSCALL_INTERCEPT" | $FGREP -e "\<-I${with_syscall_intercept_include}\>" >/dev/null 2>&1; then :
echo "CPPFLAGS_SYSCALL_INTERCEPT(='$CPPFLAGS_SYSCALL_INTERCEPT') contains '-I${with_syscall_intercept_include}', not appending" >&5
else
echo "CPPFLAGS_SYSCALL_INTERCEPT(='$CPPFLAGS_SYSCALL_INTERCEPT') does not contain '-I${with_syscall_intercept_include}', appending" >&5
CPPFLAGS_SYSCALL_INTERCEPT="$CPPFLAGS_SYSCALL_INTERCEPT -I${with_syscall_intercept_include}"
fi
else
if test -n "${with_syscall_intercept}"; then :
if echo "$CPPFLAGS_SYSCALL_INTERCEPT" | $FGREP -e "\<-I${with_syscall_intercept}/include\>" >/dev/null 2>&1; then :
echo "CPPFLAGS_SYSCALL_INTERCEPT(='$CPPFLAGS_SYSCALL_INTERCEPT') contains '-I${with_syscall_intercept}/include', not appending" >&5
else
echo "CPPFLAGS_SYSCALL_INTERCEPT(='$CPPFLAGS_SYSCALL_INTERCEPT') does not contain '-I${with_syscall_intercept}/include', appending" >&5
CPPFLAGS_SYSCALL_INTERCEPT="$CPPFLAGS_SYSCALL_INTERCEPT -I${with_syscall_intercept}/include"
fi
fi
fi
if test -n "${with_syscall_intercept_lib}"; then :
if echo "$LDFLAGS_SYSCALL_INTERCEPT" | $FGREP -e "\<-L${with_syscall_intercept_lib} -Wl,-rpath,${with_syscall_intercept_lib}\>" >/dev/null 2>&1; then :
echo "LDFLAGS_SYSCALL_INTERCEPT(='$LDFLAGS_SYSCALL_INTERCEPT') contains '-L${with_syscall_intercept_lib} -Wl,-rpath,${with_syscall_intercept_lib}', not appending" >&5
else
echo "LDFLAGS_SYSCALL_INTERCEPT(='$LDFLAGS_SYSCALL_INTERCEPT') does not contain '-L${with_syscall_intercept_lib} -Wl,-rpath,${with_syscall_intercept_lib}', appending" >&5
LDFLAGS_SYSCALL_INTERCEPT="$LDFLAGS_SYSCALL_INTERCEPT -L${with_syscall_intercept_lib} -Wl,-rpath,${with_syscall_intercept_lib}"
fi
else
if test -n "${with_syscall_intercept}"; then :
if echo "$LDFLAGS_SYSCALL_INTERCEPT" | $FGREP -e "\<-L${with_syscall_intercept}/lib -Wl,-rpath,${with_syscall_intercept}/lib\>" >/dev/null 2>&1; then :
echo "LDFLAGS_SYSCALL_INTERCEPT(='$LDFLAGS_SYSCALL_INTERCEPT') contains '-L${with_syscall_intercept}/lib -Wl,-rpath,${with_syscall_intercept}/lib', not appending" >&5
else
echo "LDFLAGS_SYSCALL_INTERCEPT(='$LDFLAGS_SYSCALL_INTERCEPT') does not contain '-L${with_syscall_intercept}/lib -Wl,-rpath,${with_syscall_intercept}/lib', appending" >&5
LDFLAGS_SYSCALL_INTERCEPT="$LDFLAGS_SYSCALL_INTERCEPT -L${with_syscall_intercept}/lib -Wl,-rpath,${with_syscall_intercept}/lib"
fi
if test -d "${with_syscall_intercept}/lib64"; then :
if echo "$LDFLAGS_SYSCALL_INTERCEPT" | $FGREP -e "\<-L${with_syscall_intercept}/lib64 -Wl,-rpath,${with_syscall_intercept}/lib64\>" >/dev/null 2>&1; then :
echo "LDFLAGS_SYSCALL_INTERCEPT(='$LDFLAGS_SYSCALL_INTERCEPT') contains '-L${with_syscall_intercept}/lib64 -Wl,-rpath,${with_syscall_intercept}/lib64', not appending" >&5
else
echo "LDFLAGS_SYSCALL_INTERCEPT(='$LDFLAGS_SYSCALL_INTERCEPT') does not contain '-L${with_syscall_intercept}/lib64 -Wl,-rpath,${with_syscall_intercept}/lib64', appending" >&5
LDFLAGS_SYSCALL_INTERCEPT="$LDFLAGS_SYSCALL_INTERCEPT -L${with_syscall_intercept}/lib64 -Wl,-rpath,${with_syscall_intercept}/lib64"
fi
fi
fi
fi
if test -n "${with_syscall_intercept}" || test -n "${with_syscall_intercept_include}" || test -n "${with_syscall_intercept_lib}"; then :
WITH_SYSCALL_INTERCEPT=yes
else
WITH_SYSCALL_INTERCEPT=no
fi
if test "x$WITH_SYSCALL_INTERCEPT" == "xno" ; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for syscall_no_intercept in -lsyscall_intercept" >&5
$as_echo_n "checking for syscall_no_intercept in -lsyscall_intercept... " >&6; }
if ${ac_cv_lib_syscall_intercept_syscall_no_intercept+:} false; then :
$as_echo_n "(cached) " >&6
else
ac_check_lib_save_LIBS=$LIBS
LIBS="-lsyscall_intercept -lcapstone -ldl $LIBS"
cat confdefs.h - <<_ACEOF >conftest.$ac_ext
/* end confdefs.h. */
/* Override any GCC internal prototype to avoid an error.
Use char because int might match the return type of a GCC
builtin and then its argument prototype would still apply. */
#ifdef __cplusplus
extern "C"
#endif
char syscall_no_intercept ();
int
main ()
{
return syscall_no_intercept ();
;
return 0;
}
_ACEOF
if ac_fn_c_try_link "$LINENO"; then :
ac_cv_lib_syscall_intercept_syscall_no_intercept=yes
else
ac_cv_lib_syscall_intercept_syscall_no_intercept=no
fi
rm -f core conftest.err conftest.$ac_objext \
conftest$ac_exeext conftest.$ac_ext
LIBS=$ac_check_lib_save_LIBS
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_syscall_intercept_syscall_no_intercept" >&5
$as_echo "$ac_cv_lib_syscall_intercept_syscall_no_intercept" >&6; }
if test "x$ac_cv_lib_syscall_intercept_syscall_no_intercept" = xyes; then :
syscall_intercept_lib_found=yes
else
syscall_intercept_lib_found=no
fi
if test "x$syscall_intercept_lib_found" != "xyes"; then :
{ $as_echo "$as_me:${as_lineno-$LINENO}: libsyscall_intercept.so not found" >&5
$as_echo "$as_me: libsyscall_intercept.so not found" >&6;}
fi
ac_fn_c_check_header_mongrel "$LINENO" "libsyscall_intercept_hook_point.h" "ac_cv_header_libsyscall_intercept_hook_point_h" "$ac_includes_default"
if test "x$ac_cv_header_libsyscall_intercept_hook_point_h" = xyes; then :
syscall_intercept_header_found=yes
else
syscall_intercept_header_found=no
fi
if test "x$syscall_intercept_header_found" != "xyes"; then :
{ $as_echo "$as_me:${as_lineno-$LINENO}: libsyscall_intercept_hook_point.h not found" >&5
$as_echo "$as_me: libsyscall_intercept_hook_point.h not found" >&6;}
fi
if test "x$syscall_intercept_lib_found" == "xyes" && test "x$syscall_intercept_header_found" == "xyes"; then :
WITH_SYSCALL_INTERCEPT=yes
else
WITH_SYSCALL_INTERCEPT=no
fi
fi
# Check whether --with-kernelsrc was given.
if test "${with_kernelsrc+set}" = set; then :
withval=$with_kernelsrc; WITH_KERNELSRC=$withval
@@ -4602,6 +4396,399 @@ KDIR="$WITH_KERNELSRC"
UNAME_R="$WITH_UNAME_R"
TARGET="$WITH_TARGET"
MCCTRL_LINUX_SYMTAB=""
case "X$WITH_SYSTEM_MAP" in
Xyes | Xno | X)
MCCTRL_LINUX_SYMTAB=""
;;
*)
MCCTRL_LINUX_SYMTAB="$WITH_SYSTEM_MAP"
;;
esac
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for System.map" >&5
$as_echo_n "checking for System.map... " >&6; }
if test -r "$MCCTRL_LINUX_SYMTAB"; then
MCCTRL_LINUX_SYMTAB="$MCCTRL_LINUX_SYMTAB"
elif test -r "/boot/System.map-`uname -r`"; then
MCCTRL_LINUX_SYMTAB="/boot/System.map-`uname -r`"
elif test -r "$KDIR/System.map"; then
MCCTRL_LINUX_SYMTAB="$KDIR/System.map"
fi
if test "$MCCTRL_LINUX_SYMTAB" == ""; then
as_fn_error $? "could not find" "$LINENO" 5
fi
if test -z "`eval cat $MCCTRL_LINUX_SYMTAB`"; then
as_fn_error $? "could not read System.map file, no read permission?" "$LINENO" 5
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $MCCTRL_LINUX_SYMTAB" >&5
$as_echo "$MCCTRL_LINUX_SYMTAB" >&6; }
MCCTRL_LINUX_SYMTAB_CMD="cat $MCCTRL_LINUX_SYMTAB"
# MCCTRL_FIND_KSYM(SYMBOL)
# ------------------------------------------------------
# Search System.map for address of the given symbol and
# do one of three things in config.h:
# If not found, leave MCCTRL_KSYM_foo undefined
# If found to be exported, "#define MCCTRL_KSYM_foo 0"
# If found not to be exported, "#define MCCTRL_KSYM_foo 0x<value>"
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol sys_mount" >&5
$as_echo_n "checking System.map for symbol sys_mount... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " sys_mount\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_sys_mount\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_sys_mount $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol sys_umount" >&5
$as_echo_n "checking System.map for symbol sys_umount... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " sys_umount\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_sys_umount\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_sys_umount $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol sys_unshare" >&5
$as_echo_n "checking System.map for symbol sys_unshare... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " sys_unshare\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_sys_unshare\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_sys_unshare $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol zap_page_range" >&5
$as_echo_n "checking System.map for symbol zap_page_range... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " zap_page_range\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_zap_page_range\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_zap_page_range $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol vdso_image_64" >&5
$as_echo_n "checking System.map for symbol vdso_image_64... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " vdso_image_64\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_vdso_image_64\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_vdso_image_64 $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol vdso_start" >&5
$as_echo_n "checking System.map for symbol vdso_start... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " vdso_start\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_vdso_start\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_vdso_start $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol vdso_end" >&5
$as_echo_n "checking System.map for symbol vdso_end... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " vdso_end\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_vdso_end\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_vdso_end $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol vdso_pages" >&5
$as_echo_n "checking System.map for symbol vdso_pages... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " vdso_pages\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_vdso_pages\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_vdso_pages $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol __vvar_page" >&5
$as_echo_n "checking System.map for symbol __vvar_page... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __vvar_page\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab___vvar_page\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM___vvar_page $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol hpet_address" >&5
$as_echo_n "checking System.map for symbol hpet_address... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " hpet_address\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_hpet_address\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_hpet_address $mcctrl_addr
_ACEOF
fi
# POSTK_DEBUG_ARCH_DEP_50, add:find kernel symbol.
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol vdso_spec" >&5
$as_echo_n "checking System.map for symbol vdso_spec... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " vdso_spec\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_vdso_spec\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_vdso_spec $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol hv_clock" >&5
$as_echo_n "checking System.map for symbol hv_clock... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " hv_clock\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_hv_clock\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_hv_clock $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol sys_readlink" >&5
$as_echo_n "checking System.map for symbol sys_readlink... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " sys_readlink\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_sys_readlink\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_sys_readlink $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol walk_page_range" >&5
$as_echo_n "checking System.map for symbol walk_page_range... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " walk_page_range\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_walk_page_range\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_walk_page_range $mcctrl_addr
_ACEOF
fi
case $ENABLE_MEMDUMP in
yes|no|auto)
;;
@@ -4799,17 +4986,6 @@ else
$as_echo "$as_me: perf is disabled" >&6;}
fi
if test "x$WITH_SYSCALL_INTERCEPT" = "xyes" ; then
$as_echo "#define WITH_SYSCALL_INTERCEPT 1" >>confdefs.h
{ $as_echo "$as_me:${as_lineno-$LINENO}: syscall_intercept library is linked" >&5
$as_echo "$as_me: syscall_intercept library is linked" >&6;}
else
{ $as_echo "$as_me:${as_lineno-$LINENO}: syscall_intercept library isn't linked" >&5
$as_echo "$as_me: syscall_intercept library isn't linked" >&6;}
fi
if test "x$MCKERNEL_INCDIR" != "x" ; then
cat >>confdefs.h <<_ACEOF
@@ -4876,9 +5052,6 @@ fi
@@ -4887,14 +5060,9 @@ ac_config_headers="$ac_config_headers config.h"
# POSTK_DEBUG_ARCH_DEP_37
# AC_CONFIG_FILES arch dependfiles separate
ac_config_files="$ac_config_files Makefile executer/user/Makefile executer/user/mcexec.1:executer/user/mcexec.1in executer/user/vmcore2mckdump executer/user/arch/$ARCH/Makefile executer/user/arch/x86_64/Makefile executer/kernel/mcctrl/Makefile executer/kernel/mcctrl/arch/$ARCH/Makefile executer/kernel/mcoverlayfs/Makefile executer/kernel/mcoverlayfs/linux-3.10.0-327.36.1.el7/Makefile executer/kernel/mcoverlayfs/linux-4.0.9/Makefile executer/kernel/mcoverlayfs/linux-4.6.7/Makefile executer/include/qlmpilib.h kernel/Makefile kernel/Makefile.build kernel/include/swapfmt.h arch/x86_64/tools/mcreboot-attached-mic.sh arch/x86_64/tools/mcshutdown-attached-mic.sh arch/x86_64/tools/mcreboot-builtin-x86.sh arch/x86_64/tools/mcreboot-smp-x86.sh arch/x86_64/tools/mcstop+release-smp-x86.sh arch/x86_64/tools/mcoverlay-destroy-smp-x86.sh arch/x86_64/tools/mcoverlay-create-smp-x86.sh arch/x86_64/tools/eclair-dump-backtrace.exp arch/x86_64/tools/mcshutdown-builtin-x86.sh arch/x86_64/tools/mcreboot.1:arch/x86_64/tools/mcreboot.1in arch/x86_64/tools/irqbalance_mck.service arch/x86_64/tools/irqbalance_mck.in tools/mcstat/mcstat.1:tools/mcstat/mcstat.1in tools/mcstat/Makefile"
ac_config_files="$ac_config_files Makefile executer/user/Makefile executer/user/mcexec.1:executer/user/mcexec.1in executer/user/vmcore2mckdump executer/user/arch/$ARCH/Makefile executer/user/arch/x86_64/Makefile executer/kernel/mcctrl/Makefile executer/kernel/mcctrl/arch/$ARCH/Makefile executer/kernel/mcoverlayfs/Makefile executer/kernel/mcoverlayfs/linux-3.10.0-327.36.1.el7/Makefile executer/kernel/mcoverlayfs/linux-4.0.9/Makefile executer/kernel/mcoverlayfs/linux-4.6.7/Makefile executer/include/qlmpilib.h kernel/Makefile kernel/Makefile.build kernel/include/swapfmt.h arch/x86_64/tools/mcreboot-attached-mic.sh arch/x86_64/tools/mcshutdown-attached-mic.sh arch/x86_64/tools/mcreboot-builtin-x86.sh arch/x86_64/tools/mcreboot-smp-x86.sh arch/x86_64/tools/mcstop+release-smp-x86.sh arch/x86_64/tools/mcoverlay-destroy-smp-x86.sh arch/x86_64/tools/mcoverlay-create-smp-x86.sh arch/x86_64/tools/eclair-dump-backtrace.exp arch/x86_64/tools/mcshutdown-builtin-x86.sh arch/x86_64/tools/mcreboot.1:arch/x86_64/tools/mcreboot.1in arch/x86_64/tools/mpimcexec arch/x86_64/tools/mpimcexec.1:arch/x86_64/tools/mpimcexec.1in arch/x86_64/tools/irqbalance_mck.service arch/x86_64/tools/irqbalance_mck.in tools/mcstat/Makefile"
if test -e "${ABS_SRCDIR}/test"; then
ac_config_files="$ac_config_files mck_test_config.sample:test/mck_test_config.sample.in"
fi
if test "$TARGET" = "smp-x86"; then
ac_config_files="$ac_config_files arch/x86_64/kernel/Makefile.arch"
@@ -5417,7 +5585,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
# report actual input values of CONFIG_FILES etc. instead of their
# values after options handling.
ac_log="
This file was extended by mckernel $as_me 1.6.0, which was
This file was extended by mckernel $as_me 1.5.1-knl+hfi, which was
generated by GNU Autoconf 2.69. Invocation command line was
CONFIG_FILES = $CONFIG_FILES
@@ -5479,7 +5647,7 @@ _ACEOF
cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`"
ac_cs_version="\\
mckernel config.status 1.6.0
mckernel config.status 1.5.1-knl+hfi
configured by $0, generated by GNU Autoconf 2.69,
with options \\"\$ac_cs_config\\"
@@ -5627,11 +5795,11 @@ do
"arch/x86_64/tools/eclair-dump-backtrace.exp") CONFIG_FILES="$CONFIG_FILES arch/x86_64/tools/eclair-dump-backtrace.exp" ;;
"arch/x86_64/tools/mcshutdown-builtin-x86.sh") CONFIG_FILES="$CONFIG_FILES arch/x86_64/tools/mcshutdown-builtin-x86.sh" ;;
"arch/x86_64/tools/mcreboot.1") CONFIG_FILES="$CONFIG_FILES arch/x86_64/tools/mcreboot.1:arch/x86_64/tools/mcreboot.1in" ;;
"arch/x86_64/tools/mpimcexec") CONFIG_FILES="$CONFIG_FILES arch/x86_64/tools/mpimcexec" ;;
"arch/x86_64/tools/mpimcexec.1") CONFIG_FILES="$CONFIG_FILES arch/x86_64/tools/mpimcexec.1:arch/x86_64/tools/mpimcexec.1in" ;;
"arch/x86_64/tools/irqbalance_mck.service") CONFIG_FILES="$CONFIG_FILES arch/x86_64/tools/irqbalance_mck.service" ;;
"arch/x86_64/tools/irqbalance_mck.in") CONFIG_FILES="$CONFIG_FILES arch/x86_64/tools/irqbalance_mck.in" ;;
"tools/mcstat/mcstat.1") CONFIG_FILES="$CONFIG_FILES tools/mcstat/mcstat.1:tools/mcstat/mcstat.1in" ;;
"tools/mcstat/Makefile") CONFIG_FILES="$CONFIG_FILES tools/mcstat/Makefile" ;;
"mck_test_config.sample") CONFIG_FILES="$CONFIG_FILES mck_test_config.sample:test/mck_test_config.sample.in" ;;
"arch/x86_64/kernel/Makefile.arch") CONFIG_FILES="$CONFIG_FILES arch/x86_64/kernel/Makefile.arch" ;;
"kernel/config/config.smp-arm64") CONFIG_FILES="$CONFIG_FILES kernel/config/config.smp-arm64" ;;
"arch/arm64/kernel/vdso/Makefile") CONFIG_FILES="$CONFIG_FILES arch/arm64/kernel/vdso/Makefile" ;;


@@ -1,9 +1,9 @@
# configure.ac COPYRIGHT FUJITSU LIMITED 2015-2016
AC_PREREQ(2.63)
m4_define([IHK_VERSION_m4],[1.6.0])dnl
m4_define([MCKERNEL_VERSION_m4],[1.6.0])dnl
m4_define([IHK_RELEASE_DATE_m4],[2018-11-11])dnl
m4_define([MCKERNEL_RELEASE_DATE_m4],[2018-11-11])dnl
m4_define([IHK_VERSION_m4],[1.5.1-knl+hfi])dnl
m4_define([MCKERNEL_VERSION_m4],[1.5.1-knl+hfi])dnl
m4_define([IHK_RELEASE_DATE_m4],[2019-05-14])dnl
m4_define([MCKERNEL_RELEASE_DATE_m4],[2019-05-14])dnl
AC_INIT([mckernel], MCKERNEL_VERSION_m4)
@@ -77,58 +77,6 @@ AC_DEFUN([PAC_SET_HEADER_LIB_PATH],[
])
])
AC_DEFUN([PAC_SET_HEADER_LIB_PATH_SYSCALL_INTERCEPT],[
AC_ARG_WITH([$1],
[AC_HELP_STRING([--with-$1=PATH],
[specify path where $1 include directory and lib directory can be found])],
[AS_CASE(["$withval"],
[yes|no|''],
[AC_MSG_WARN([--with[out]-$1=PATH expects a valid PATH])
with_$1=""])],
[with_$1=$2])
AC_ARG_WITH([$1-include],
[AC_HELP_STRING([--with-$1-include=PATH],
[specify path where $1 include directory can be found])],
[AS_CASE(["$withval"],
[yes|no|''],
[AC_MSG_WARN([--with[out]-$1-include=PATH expects a valid PATH])
with_$1_include=""])],
[])
AC_ARG_WITH([$1-lib],
[AC_HELP_STRING([--with-$1-lib=PATH],
[specify path where $1 lib directory can be found])],
[AS_CASE(["$withval"],
[yes|no|''],
[AC_MSG_WARN([--with[out]-$1-lib=PATH expects a valid PATH])
with_$1_lib=""])],
[])
# The args have been sanitized into empty/non-empty values above.
# Now append -I/-L args to CPPFLAGS/LDFLAGS, with more specific options
# taking priority
AS_IF([test -n "${with_$1_include}"],
[PAC_APPEND_FLAG([-I${with_$1_include}],[CPPFLAGS_SYSCALL_INTERCEPT])],
[AS_IF([test -n "${with_$1}"],
[PAC_APPEND_FLAG([-I${with_$1}/include],[CPPFLAGS_SYSCALL_INTERCEPT])])])
AS_IF([test -n "${with_$1_lib}"],
[PAC_APPEND_FLAG([-L${with_$1_lib} -Wl,-rpath,${with_$1_lib}],[LDFLAGS_SYSCALL_INTERCEPT])],
[AS_IF([test -n "${with_$1}"],
dnl is adding lib64 by default really the right thing to do? What if
dnl we are on a 32-bit host that happens to have both lib dirs available?
[PAC_APPEND_FLAG([-L${with_$1}/lib -Wl,-rpath,${with_$1}/lib],[LDFLAGS_SYSCALL_INTERCEPT])
AS_IF([test -d "${with_$1}/lib64"],
[PAC_APPEND_FLAG([-L${with_$1}/lib64 -Wl,-rpath,${with_$1}/lib64],[LDFLAGS_SYSCALL_INTERCEPT])])
])
])
AS_IF([test -n "${with_$1}" || test -n "${with_$1_include}" || test -n "${with_$1_lib}"],
[WITH_SYSCALL_INTERCEPT=yes],
[WITH_SYSCALL_INTERCEPT=no])
])
IHK_VERSION=IHK_VERSION_m4
MCKERNEL_VERSION=MCKERNEL_VERSION_m4
DCFA_VERSION=DCFA_VERSION_m4
@@ -147,23 +95,6 @@ AS_IF([test "x$numa_lib_found" != "xyes"],
PAC_SET_HEADER_LIB_PATH([mpi])
PAC_SET_HEADER_LIB_PATH_SYSCALL_INTERCEPT([syscall_intercept])
if test "x$WITH_SYSCALL_INTERCEPT" == "xno" ; then
AC_CHECK_LIB([syscall_intercept],[syscall_no_intercept],[syscall_intercept_lib_found=yes],[syscall_intercept_lib_found=no],[-lcapstone -ldl])
AS_IF([test "x$syscall_intercept_lib_found" != "xyes"],
[AC_MSG_NOTICE([libsyscall_intercept.so not found])])
AC_CHECK_HEADER([libsyscall_intercept_hook_point.h],[syscall_intercept_header_found=yes],[syscall_intercept_header_found=no])
AS_IF([test "x$syscall_intercept_header_found" != "xyes"],
[AC_MSG_NOTICE([libsyscall_intercept_hook_point.h not found])])
AS_IF([test "x$syscall_intercept_lib_found" == "xyes" && test "x$syscall_intercept_header_found" == "xyes"],
[WITH_SYSCALL_INTERCEPT=yes],
[WITH_SYSCALL_INTERCEPT=no])
fi
AC_ARG_WITH([kernelsrc],
AC_HELP_STRING(
[--with-kernelsrc=path],[Path to 'kernel src', default is /lib/modules/uname_r/build]),
@@ -408,6 +339,78 @@ KDIR="$WITH_KERNELSRC"
UNAME_R="$WITH_UNAME_R"
TARGET="$WITH_TARGET"
MCCTRL_LINUX_SYMTAB=""
case "X$WITH_SYSTEM_MAP" in
Xyes | Xno | X)
MCCTRL_LINUX_SYMTAB=""
;;
*)
MCCTRL_LINUX_SYMTAB="$WITH_SYSTEM_MAP"
;;
esac
AC_MSG_CHECKING([[for System.map]])
if test -r "$MCCTRL_LINUX_SYMTAB"; then
MCCTRL_LINUX_SYMTAB="$MCCTRL_LINUX_SYMTAB"
elif test -r "/boot/System.map-`uname -r`"; then
MCCTRL_LINUX_SYMTAB="/boot/System.map-`uname -r`"
elif test -r "$KDIR/System.map"; then
MCCTRL_LINUX_SYMTAB="$KDIR/System.map"
fi
if test "$MCCTRL_LINUX_SYMTAB" == ""; then
AC_MSG_ERROR([could not find])
fi
if test -z "`eval cat $MCCTRL_LINUX_SYMTAB`"; then
AC_MSG_ERROR([could not read System.map file, no read permission?])
fi
AC_MSG_RESULT([$MCCTRL_LINUX_SYMTAB])
MCCTRL_LINUX_SYMTAB_CMD="cat $MCCTRL_LINUX_SYMTAB"
# MCCTRL_FIND_KSYM(SYMBOL)
# ------------------------------------------------------
# Search System.map for address of the given symbol and
# do one of three things in config.h:
# If not found, leave MCCTRL_KSYM_foo undefined
# If found to be exported, "#define MCCTRL_KSYM_foo 0"
# If found not to be exported, "#define MCCTRL_KSYM_foo 0x<value>"
AC_DEFUN([MCCTRL_FIND_KSYM],[
AC_MSG_CHECKING([[System.map for symbol $1]])
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " $1\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
AC_MSG_RESULT([not found])
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
m4_ifval([$2],[],[
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_$1\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
])
AC_MSG_RESULT([$mcctrl_result])
AC_DEFINE_UNQUOTED(MCCTRL_KSYM_[]$1,$mcctrl_addr,[Define to address of kernel symbol $1, or 0 if exported])
fi
])
MCCTRL_FIND_KSYM([sys_mount])
MCCTRL_FIND_KSYM([sys_umount])
MCCTRL_FIND_KSYM([sys_unshare])
MCCTRL_FIND_KSYM([zap_page_range])
MCCTRL_FIND_KSYM([vdso_image_64])
MCCTRL_FIND_KSYM([vdso_start])
MCCTRL_FIND_KSYM([vdso_end])
MCCTRL_FIND_KSYM([vdso_pages])
MCCTRL_FIND_KSYM([__vvar_page])
MCCTRL_FIND_KSYM([hpet_address])
# POSTK_DEBUG_ARCH_DEP_50, add:find kernel symbol.
MCCTRL_FIND_KSYM([vdso_spec])
MCCTRL_FIND_KSYM([hv_clock])
MCCTRL_FIND_KSYM([sys_readlink])
MCCTRL_FIND_KSYM([walk_page_range])
case $ENABLE_MEMDUMP in
yes|no|auto)
;;
@@ -486,13 +489,6 @@ else
AC_MSG_NOTICE([perf is disabled])
fi
if test "x$WITH_SYSCALL_INTERCEPT" = "xyes" ; then
AC_DEFINE([WITH_SYSCALL_INTERCEPT],[1],[whether or not syscall_intercept library is linked])
AC_MSG_NOTICE([syscall_intercept library is linked])
else
AC_MSG_NOTICE([syscall_intercept library isn't linked])
fi
if test "x$MCKERNEL_INCDIR" != "x" ; then
AC_DEFINE_UNQUOTED(MCKERNEL_INCDIR,"$MCKERNEL_INCDIR",[McKernel specific headers])
fi
@@ -530,12 +526,9 @@ AC_SUBST(KMODDIR)
AC_SUBST(KERNDIR)
AC_SUBST(MANDIR)
AC_SUBST(CFLAGS)
AC_SUBST(CPPFLAGS_SYSCALL_INTERCEPT)
AC_SUBST(LDFLAGS_SYSCALL_INTERCEPT)
AC_SUBST(ENABLE_MCOVERLAYFS)
AC_SUBST(ENABLE_RUSAGE)
AC_SUBST(ENABLE_QLMPI)
AC_SUBST(WITH_SYSCALL_INTERCEPT)
AC_SUBST(IHK_VERSION)
AC_SUBST(MCKERNEL_VERSION)
@@ -575,18 +568,13 @@ AC_CONFIG_FILES([
arch/x86_64/tools/eclair-dump-backtrace.exp
arch/x86_64/tools/mcshutdown-builtin-x86.sh
arch/x86_64/tools/mcreboot.1:arch/x86_64/tools/mcreboot.1in
arch/x86_64/tools/mpimcexec
arch/x86_64/tools/mpimcexec.1:arch/x86_64/tools/mpimcexec.1in
arch/x86_64/tools/irqbalance_mck.service
arch/x86_64/tools/irqbalance_mck.in
tools/mcstat/mcstat.1:tools/mcstat/mcstat.1in
tools/mcstat/Makefile
])
if test -e "${ABS_SRCDIR}/test"; then
AC_CONFIG_FILES([
mck_test_config.sample:test/mck_test_config.sample.in
])
fi
if test "$TARGET" = "smp-x86"; then
AC_CONFIG_FILES([
arch/x86_64/kernel/Makefile.arch
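The three-way MCCTRL_KSYM convention documented in the configure.ac hunk above (leave MCCTRL_KSYM_foo undefined when the symbol is missing, define it to 0 when the symbol is exported, and to the raw System.map address otherwise) is consumed in the mcctrl sources with paired #ifdef/#if tests, as the archdeps.c hunks further down show. A minimal sketch of that consumer pattern, using the hypothetical symbol name foo_symbol:

/* Sketch only: foo_symbol is a placeholder, not a symbol configure actually probes. */
#include "config.h"

#ifdef MCCTRL_KSYM_foo_symbol		/* found in System.map */
# if MCCTRL_KSYM_foo_symbol		/* non-zero: not exported, bind to the literal address */
void *foo_symbol = (void *)MCCTRL_KSYM_foo_symbol;
# endif					/* zero: exported, the linker resolves the kernel's own symbol */
#else
# error missing address of foo_symbol.
#endif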


@@ -55,14 +55,13 @@
#define MCEXEC_UP_SYS_UMOUNT 0x30a02915
#define MCEXEC_UP_SYS_UNSHARE 0x30a02916
#define MCEXEC_UP_UTI_GET_CTX 0x30a02920
#define MCEXEC_UP_UTI_SAVE_FS 0x30a02921
#define MCEXEC_UP_UTIL_THREAD1 0x30a02920
#define MCEXEC_UP_UTIL_THREAD2 0x30a02921
#define MCEXEC_UP_SIG_THREAD 0x30a02922
#define MCEXEC_UP_SYSCALL_THREAD 0x30a02924
#define MCEXEC_UP_TERMINATE_THREAD 0x30a02925
#define MCEXEC_UP_GET_NUM_POOL_THREADS 0x30a02926
#define MCEXEC_UP_UTI_ATTR 0x30a02927
#define MCEXEC_UP_RELEASE_USER_SPACE 0x30a02928
#define MCEXEC_UP_DEBUG_LOG 0x40000000
@@ -111,10 +110,14 @@ typedef unsigned long __cpu_set_unit;
#define MPOL_NO_BSS 0x04
#define MPOL_SHM_PREMAP 0x08
#define MCEXEC_HFI1 0x01
struct program_load_desc {
int num_sections;
int status;
int cpu;
int pid;
int err;
int stack_prot;
int pgid;
int cred[8];
@@ -137,15 +140,15 @@ struct program_load_desc {
unsigned long envs_len;
struct rlimit rlimit[MCK_RLIM_MAX];
unsigned long interp_align;
unsigned long mcexec_flags;
unsigned long mpol_flags;
unsigned long mpol_threshold;
unsigned long heap_extension;
long stack_premap;
unsigned long mpol_bind_mask;
int uti_thread_rank; /* N-th clone() spawns a thread on Linux CPU */
int uti_use_last_cpu; /* Work-around not to share CPU with OpenMP thread */
int nr_processes;
int process_rank;
char shell_path[SHELL_PATH_MAX_LEN];
__cpu_set_unit cpu_set[PLD_CPU_SET_SIZE];
int profile;
struct program_image_section sections[0];
@@ -191,6 +194,7 @@ struct syscall_response {
long ret;
unsigned long fault_address;
unsigned long fault_reason;
void *private_data;
};
struct syscall_ret_desc {
@@ -246,28 +250,6 @@ struct sys_unshare_desc {
unsigned long unshare_flags;
};
struct release_user_space_desc {
unsigned long user_start;
unsigned long user_end;
};
struct terminate_thread_desc {
int pid;
int tid;
long code;
/* 32------32 31--16 15--------8 7----0
exit_group exit-status signal */
unsigned long tsk; /* struct task_struct * */
};
struct rpgtable_desc {
uintptr_t rpgtable;
uintptr_t start;
uintptr_t len;
};
enum perf_ctrl_type {
PERF_CTRL_SET,
PERF_CTRL_GET,
@@ -277,7 +259,7 @@ enum perf_ctrl_type {
struct perf_ctrl_desc {
enum perf_ctrl_type ctrl_type;
int err;
int status;
union {
/* for SET, GET */
struct {
@@ -317,10 +299,6 @@ struct perf_ctrl_desc {
#define UTI_FLAG_HIGH_PRIORITY (1ULL<<12)
#define UTI_FLAG_NON_COOPERATIVE (1ULL<<13)
#define UTI_FLAG_PREFER_LWK (1ULL << 14)
#define UTI_FLAG_PREFER_FWK (1ULL << 15)
#define UTI_FLAG_FABRIC_INTR_AFFINITY (1ULL << 16)
/* Linux default value is used */
#define UTI_MAX_NUMA_DOMAINS (1024)
@@ -339,30 +317,6 @@ struct kuti_attr {
struct uti_attr_desc {
unsigned long phys_attr;
char *uti_cpu_set_str; /* UTI_CPU_SET environmental variable */
size_t uti_cpu_set_len;
};
struct uti_ctx {
union {
char ctx[4096]; /* TODO: Get the size from config.h */
struct {
int uti_refill_tid;
};
};
};
struct uti_get_ctx_desc {
unsigned long rp_rctx; /* Remote physical address of remote context */
void *rctx; /* Remote context */
void *lctx; /* Local context */
int uti_refill_tid;
unsigned long key; /* OUT: struct task_struct* of mcexec thread, used to search struct host_thread */
};
struct uti_save_fs_desc {
void *rctx; /* Remote context */
void *lctx; /* Local context */
};
#endif


@@ -1,31 +0,0 @@
#ifndef UTI_H_INCLUDED
#define UTI_H_INCLUDED
struct syscall_struct {
int number;
unsigned long args[6];
unsigned long ret;
unsigned long uti_clv; /* copy of a clv in McKernel */
};
#define UTI_SZ_SYSCALL_STACK 16
/* Variables accessed by mcexec.c and syscall_intercept.c */
struct uti_desc {
char lctx[4096]; /* TODO: Get the size from config.h */
char rctx[4096]; /* TODO: Get the size from config.h */
int mck_tid; /* TODO: Move this out for multiple migrated-to-Linux threads */
unsigned long key; /* struct task_struct* of mcexec thread, used to search struct host_thread */
int pid, tid; /* Used as the id of tracee when issuing MCEXEC_UP_TERMINATE_THREAD */
unsigned long uti_clv; /* copy of McKernel clv */
int fd; /* /dev/mcosX */
struct syscall_struct syscall_stack[UTI_SZ_SYSCALL_STACK]; /* stack of system call arguments and return values */
int syscall_stack_top; /* stack-pointer of syscall arguments list */
long syscalls[512], syscalls2[512]; /* Syscall profile counters */
int start_syscall_intercept; /* Used to sync between mcexec.c and syscall_intercept.c */
};
#endif
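struct uti_desc above is shared, per its field comments, between mcexec.c and syscall_intercept.c: syscall_stack together with syscall_stack_top forms a small fixed-size record of intercepted system calls, and syscalls[]/syscalls2[] are per-syscall profile counters. A hypothetical recording helper, sketched only to show how those fields fit together (the push discipline and the include name are assumptions, not taken from this diff):

/* Sketch only; the "uti.h" include name is inferred from the UTI_H_INCLUDED guard. */
#include <string.h>
#include "uti.h"

static void record_syscall(struct uti_desc *d, int nr,
			   const unsigned long args[6], unsigned long ret)
{
	if (d->syscall_stack_top < UTI_SZ_SYSCALL_STACK) {
		struct syscall_struct *s = &d->syscall_stack[d->syscall_stack_top++];

		s->number = nr;
		memcpy(s->args, args, sizeof(s->args));
		s->ret = ret;
	}
	if (nr >= 0 && nr < 512)
		d->syscalls[nr]++;	/* per-syscall profile counter */
}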


@@ -1,7 +1,6 @@
/* archdeps.c COPYRIGHT FUJITSU LIMITED 2016 */
#include <linux/version.h>
#include <linux/mm_types.h>
#include <linux/kallsyms.h>
#include <asm/vdso.h>
#include "../../../config.h"
#include "../../mcctrl.h"
@@ -18,31 +17,29 @@
#define D(fmt, ...) printk("%s(%d) " fmt, __func__, __LINE__, ##__VA_ARGS__)
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 0, 0)
void *vdso_start;
void *vdso_end;
static struct vm_special_mapping (*vdso_spec)[2];
#ifdef MCCTRL_KSYM_vdso_start
# if MCCTRL_KSYM_vdso_start
void *vdso_start = (void *)MCCTRL_KSYM_vdso_start;
# endif
#else
# error missing address of vdso_start.
#endif
int arch_symbols_init(void)
{
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 0, 0)
vdso_start = (void *) kallsyms_lookup_name("vdso_start");
if (WARN_ON(!vdso_start))
return -EFAULT;
vdso_end = (void *) kallsyms_lookup_name("vdso_end");
if (WARN_ON(!vdso_end))
return -EFAULT;
vdso_spec = (void *) kallsyms_lookup_name("vdso_spec");
if (WARN_ON(!vdso_spec))
return -EFAULT;
#ifdef MCCTRL_KSYM_vdso_end
# if MCCTRL_KSYM_vdso_end
void *vdso_end = (void *)MCCTRL_KSYM_vdso_end;
# endif
#else
# error missing address of vdso_end.
#endif
return 0;
}
#ifdef MCCTRL_KSYM_vdso_spec
# if MCCTRL_KSYM_vdso_spec
static struct vm_special_mapping (*vdso_spec)[2] = (void*)MCCTRL_KSYM_vdso_spec;
# endif
#else
# error missing address of vdso_spec.
#endif
#ifdef POSTK_DEBUG_ARCH_DEP_52
#define VDSO_MAXPAGES 1


@@ -1,6 +1,5 @@
/* archdeps.c COPYRIGHT FUJITSU LIMITED 2016 */
#include <linux/version.h>
#include <linux/kallsyms.h>
#include "../../../config.h"
#include "../../mcctrl.h"
@@ -14,46 +13,57 @@
#endif
#endif /* POSTK_DEBUG_ARCH_DEP_83 */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 16, 0)
static struct vdso_image *vdso_image_64;
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 23)
static void *vdso_start;
static void *vdso_end;
static struct page **vdso_pages;
#ifdef MCCTRL_KSYM_vdso_image_64
#if MCCTRL_KSYM_vdso_image_64
struct vdso_image *vdso_image = (void *)MCCTRL_KSYM_vdso_image_64;
#endif
static void *__vvar_page;
static long *hpet_address;
static void **hv_clock;
int arch_symbols_init(void)
{
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 16, 0)
vdso_image_64 = (void *) kallsyms_lookup_name("vdso_image_64");
if (WARN_ON(!vdso_image_64))
return -EFAULT;
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 23)
vdso_start = (void *) kallsyms_lookup_name("vdso_start");
if (WARN_ON(!vdso_start))
return -EFAULT;
vdso_end = (void *) kallsyms_lookup_name("vdso_end");
if (WARN_ON(!vdso_end))
return -EFAULT;
vdso_pages = (void *) kallsyms_lookup_name("vdso_pages");
if (WARN_ON(!vdso_pages))
return -EFAULT;
#endif
__vvar_page = (void *) kallsyms_lookup_name("__vvar_page");
if (WARN_ON(!__vvar_page))
return -EFAULT;
#ifdef MCCTRL_KSYM_vdso_start
#if MCCTRL_KSYM_vdso_start
void *vdso_start = (void *)MCCTRL_KSYM_vdso_start;
#endif
#endif
hpet_address = (void *) kallsyms_lookup_name("hpet_address");
hv_clock = (void *) kallsyms_lookup_name("hv_clock");
return 0;
}
#ifdef MCCTRL_KSYM_vdso_end
#if MCCTRL_KSYM_vdso_end
void *vdso_end = (void *)MCCTRL_KSYM_vdso_end;
#endif
#endif
#ifdef MCCTRL_KSYM_vdso_pages
#if MCCTRL_KSYM_vdso_pages
struct page **vdso_pages = (void *)MCCTRL_KSYM_vdso_pages;
#endif
#endif
#ifdef MCCTRL_KSYM___vvar_page
#if MCCTRL_KSYM___vvar_page
void *__vvar_page = (void *)MCCTRL_KSYM___vvar_page;
#endif
#endif
long *hpet_addressp
#ifdef MCCTRL_KSYM_hpet_address
#if MCCTRL_KSYM_hpet_address
= (void *)MCCTRL_KSYM_hpet_address;
#else
= &hpet_address;
#endif
#else
= NULL;
#endif
void **hv_clockp
#ifdef MCCTRL_KSYM_hv_clock
#if MCCTRL_KSYM_hv_clock
= (void *)MCCTRL_KSYM_hv_clock;
#else
= &hv_clock;
#endif
#else
= NULL;
#endif
#ifdef POSTK_DEBUG_ARCH_DEP_52
#define VDSO_MAXPAGES 2
@@ -128,7 +138,7 @@ void get_vdso_info(ihk_os_t os, long vdso_rpa)
/* VDSO pages */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,16,0)
size = vdso_image_64->size;
size = vdso_image->size;
vdso->vdso_npages = size >> PAGE_SHIFT;
if (vdso->vdso_npages > VDSO_MAXPAGES) {
@@ -138,7 +148,7 @@ void get_vdso_info(ihk_os_t os, long vdso_rpa)
for (i = 0; i < vdso->vdso_npages; ++i) {
vdso->vdso_physlist[i] = virt_to_phys(
vdso_image_64->data + (i * PAGE_SIZE));
vdso_image->data + (i * PAGE_SIZE));
}
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,23)
size = vdso_end - vdso_start;
@@ -175,36 +185,36 @@ void get_vdso_info(ihk_os_t os, long vdso_rpa)
#endif
/* HPET page */
if (hpet_address && *hpet_address) {
if (hpet_addressp && *hpet_addressp) {
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4,5,0)
vdso->hpet_is_global = 0;
vdso->hpet_virt = (void *)(-2 * PAGE_SIZE);
vdso->hpet_phys = *hpet_address;
vdso->hpet_phys = *hpet_addressp;
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(3,17,0)
vdso->hpet_is_global = 0;
vdso->hpet_virt = (void *)(-1 * PAGE_SIZE);
vdso->hpet_phys = *hpet_address;
vdso->hpet_phys = *hpet_addressp;
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(3,16,0)
vdso->hpet_is_global = 0;
vdso->hpet_virt = (void *)((vdso->vdso_npages + 1) * PAGE_SIZE);
vdso->hpet_phys = *hpet_address;
vdso->hpet_phys = *hpet_addressp;
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,23)
vdso->hpet_is_global = 1;
vdso->hpet_virt = (void *)fix_to_virt(VSYSCALL_HPET);
vdso->hpet_phys = *hpet_address;
vdso->hpet_phys = *hpet_addressp;
#endif
}
/* struct pvlock_vcpu_time_info table */
if (hv_clock && *hv_clock) {
if (hv_clockp && *hv_clockp) {
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4,5,0)
vdso->pvti_is_global = 0;
vdso->pvti_virt = (void *)(-1 * PAGE_SIZE);
vdso->pvti_phys = virt_to_phys(*hv_clock);
vdso->pvti_phys = virt_to_phys(*hv_clockp);
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(3,8,0)
vdso->pvti_is_global = 1;
vdso->pvti_virt = (void *)fix_to_virt(PVCLOCK_FIXMAP_BEGIN);
vdso->pvti_phys = virt_to_phys(*hv_clock);
vdso->pvti_phys = virt_to_phys(*hv_clockp);
#endif
}
@@ -279,14 +289,6 @@ get_fs_ctx(void *ctx)
return tctx->fs;
}
unsigned long
get_rsp_ctx(void *ctx)
{
struct trans_uctx *tctx = ctx;
return tctx->rsp;
}
#ifdef POSTK_DEBUG_ARCH_DEP_83 /* arch depend translate_rva_to_rpa() move */
int translate_rva_to_rpa(ihk_os_t os, unsigned long rpt, unsigned long rva,
unsigned long *rpap, unsigned long *pgsizep)


@@ -125,6 +125,7 @@ static int load_elf(struct linux_binprm *bprm
for(i = 0, st = 0; mode != 2;){
if(st == 0){
off = p & ~PAGE_MASK;
#ifdef POSTK_DEBUG_ARCH_DEP_41 /* HOST-Linux version switch add */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4,10,0)
rc = get_user_pages_remote(current, bprm->mm,
bprm->p, 1, FOLL_FORCE, &page, NULL, NULL);
@@ -140,6 +141,17 @@ static int load_elf(struct linux_binprm *bprm
bprm->p, 1, 0, 1,
&page, NULL);
#endif
#else /* POSTK_DEBUG_ARCH_DEP_41 */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4,6,0)
rc = get_user_pages_remote(current, bprm->mm,
bprm->p, 1, 0, 1,
&page, NULL);
#else
rc = get_user_pages(current, bprm->mm,
bprm->p, 1, 0, 1,
&page, NULL);
#endif
#endif /* POSTK_DEBUG_ARCH_DEP_41 */
if(rc <= 0) {
kfree(pbuf);
return -EFAULT;

File diff suppressed because it is too large.


@@ -28,7 +28,6 @@
#include <linux/slab.h>
#include <linux/device.h>
#include <linux/delay.h>
#include <linux/kallsyms.h>
#include "mcctrl.h"
#include <ihk/ihk_host_user.h>
@@ -44,6 +43,8 @@ extern void mcctrl_syscall_init(void);
extern void procfs_init(int);
extern void procfs_exit(int);
extern void rus_page_hash_init(void);
extern void rus_page_hash_put_pages(void);
extern void uti_attr_finalize(void);
extern void binfmt_mcexec_init(void);
extern void binfmt_mcexec_exit(void);
@@ -83,14 +84,13 @@ static struct ihk_os_user_call_handler mcctrl_uchs[] = {
{ .request = MCEXEC_UP_SYS_MOUNT, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_SYS_UMOUNT, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_SYS_UNSHARE, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_UTI_GET_CTX, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_UTI_SAVE_FS, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_UTIL_THREAD1, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_UTIL_THREAD2, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_SIG_THREAD, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_SYSCALL_THREAD, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_TERMINATE_THREAD, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_GET_NUM_POOL_THREADS, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_UTI_ATTR, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_RELEASE_USER_SPACE, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_DEBUG_LOG, .func = mcctrl_ioctl },
{ .request = IHK_OS_AUX_PERF_NUM, .func = mcctrl_ioctl },
{ .request = IHK_OS_AUX_PERF_SET, .func = mcctrl_ioctl },
@@ -178,7 +178,6 @@ int mcctrl_os_shutdown_notifier(int os_index)
mdelay(200);
}
pager_cleanup();
sysfsm_cleanup(os[os_index]);
free_topology_info(os[os_index]);
ihk_os_unregister_user_call_handlers(os[os_index], mcctrl_uc + os_index);
@@ -186,6 +185,9 @@ int mcctrl_os_shutdown_notifier(int os_index)
destroy_ikc_channels(os[os_index]);
procfs_exit(os_index);
}
#ifdef POSTK_DEBUG_TEMP_FIX_35 /* in shutdown phase, rus_page_hash_put_pages() call added. */
rus_page_hash_put_pages();
#endif /* POSTK_DEBUG_TEMP_FIX_35 */
os[os_index] = NULL;
@@ -212,68 +214,6 @@ static struct ihk_os_notifier mcctrl_os_notifier = {
.ops = &mcctrl_os_notifier_ops,
};
int (*mcctrl_sys_mount)(char *dev_name, char *dir_name, char *type,
unsigned long flags, void *data);
int (*mcctrl_sys_umount)(char *dir_name, int flags);
int (*mcctrl_sys_unshare)(unsigned long unshare_flags);
long (*mcctrl_sched_setaffinity)(pid_t pid, const struct cpumask *in_mask);
int (*mcctrl_sched_setscheduler_nocheck)(struct task_struct *p, int policy,
const struct sched_param *param);
ssize_t (*mcctrl_sys_readlink)(const char *path, char *buf,
size_t bufsiz);
void (*mcctrl_zap_page_range)(struct vm_area_struct *vma,
unsigned long start,
unsigned long size,
struct zap_details *details);
struct inode_operations *mcctrl_hugetlbfs_inode_operations;
static int symbols_init(void)
{
mcctrl_sys_mount = (void *) kallsyms_lookup_name("sys_mount");
if (WARN_ON(!mcctrl_sys_mount))
return -EFAULT;
mcctrl_sys_umount = (void *) kallsyms_lookup_name("sys_umount");
if (WARN_ON(!mcctrl_sys_umount))
return -EFAULT;
mcctrl_sys_unshare = (void *) kallsyms_lookup_name("sys_unshare");
if (WARN_ON(!mcctrl_sys_unshare))
return -EFAULT;
mcctrl_sched_setaffinity =
(void *) kallsyms_lookup_name("sched_setaffinity");
if (WARN_ON(!mcctrl_sched_setaffinity))
return -EFAULT;
mcctrl_sched_setscheduler_nocheck =
(void *) kallsyms_lookup_name("sched_setscheduler_nocheck");
if (WARN_ON(!mcctrl_sched_setscheduler_nocheck))
return -EFAULT;
mcctrl_sys_readlink =
(void *) kallsyms_lookup_name("sys_readlink");
if (WARN_ON(!mcctrl_sys_readlink))
return -EFAULT;
mcctrl_zap_page_range =
(void *) kallsyms_lookup_name("zap_page_range");
if (WARN_ON(!mcctrl_zap_page_range))
return -EFAULT;
mcctrl_hugetlbfs_inode_operations =
(void *) kallsyms_lookup_name("hugetlbfs_inode_operations");
if (WARN_ON(!mcctrl_hugetlbfs_inode_operations))
return -EFAULT;
return arch_symbols_init();
}
static int __init mcctrl_init(void)
{
int ret = 0;
@@ -287,10 +227,9 @@ static int __init mcctrl_init(void)
os[i] = NULL;
}
binfmt_mcexec_init();
rus_page_hash_init();
if ((ret = symbols_init()))
goto error;
binfmt_mcexec_init();
if ((ret = ihk_host_register_os_notifier(&mcctrl_os_notifier)) != 0) {
printk("mcctrl: error: registering OS notifier\n");
@@ -302,6 +241,7 @@ static int __init mcctrl_init(void)
error:
binfmt_mcexec_exit();
rus_page_hash_put_pages();
return ret;
}
@@ -313,6 +253,7 @@ static void __exit mcctrl_exit(void)
}
binfmt_mcexec_exit();
rus_page_hash_put_pages();
uti_attr_finalize();
printk("mcctrl: unregistered.\n");


@@ -49,125 +49,15 @@
//struct mcctrl_channel *channels;
void mcexec_prepare_ack(ihk_os_t os, unsigned long arg, int err);
static void mcctrl_ikc_init(ihk_os_t os, int cpu, unsigned long rphys, struct ihk_ikc_channel_desc *c);
int mcexec_syscall(struct mcctrl_usrdata *ud, struct ikc_scd_packet *packet);
void sig_done(unsigned long arg, int err);
void mcctrl_perf_ack(ihk_os_t os, struct ikc_scd_packet *packet);
void mcctrl_futex_wake(struct ikc_scd_packet *pisp);
void mcctrl_os_read_write_cpu_response(ihk_os_t os,
struct ikc_scd_packet *pisp);
void mcctrl_eventfd(ihk_os_t os, struct ikc_scd_packet *pisp);
/* Assumes usrdata->wakeup_descs_lock taken */
static void mcctrl_wakeup_desc_cleanup(ihk_os_t os,
struct mcctrl_wakeup_desc *desc)
{
int i;
list_del(&desc->chain);
for (i = 0; i < desc->free_addrs_count; i++) {
kfree(desc->free_addrs[i]);
}
}
static void mcctrl_wakeup_cb(ihk_os_t os, struct ikc_scd_packet *packet)
{
struct mcctrl_wakeup_desc *desc = packet->reply;
WRITE_ONCE(desc->err, packet->err);
/*
* Check if the other side is still waiting, and signal it we're done.
*
* Setting status needs to be done last because the other side could
* wake up opportunistically between this set and the wake_up call.
*
* If the other side is no longer waiting, free the memory that was
* left for us.
*/
if (cmpxchg(&desc->status, 0, 1)) {
struct mcctrl_usrdata *usrdata = ihk_host_os_get_usrdata(os);
unsigned long flags;
spin_lock_irqsave(&usrdata->wakeup_descs_lock, flags);
mcctrl_wakeup_desc_cleanup(os, desc);
spin_unlock_irqrestore(&usrdata->wakeup_descs_lock, flags);
return;
}
wake_up_interruptible(&desc->wq);
}
int mcctrl_ikc_send_wait(ihk_os_t os, int cpu, struct ikc_scd_packet *pisp,
long int timeout, struct mcctrl_wakeup_desc *desc,
int *do_frees, int free_addrs_count, ...)
{
int ret, i;
int alloc_desc = (desc == NULL);
va_list ap;
if (free_addrs_count)
*do_frees = 1;
if (alloc_desc)
desc = kmalloc(sizeof(struct mcctrl_wakeup_desc) +
(free_addrs_count + 1) * sizeof(void *),
GFP_KERNEL);
if (!desc) {
pr_warn("%s: Could not allocate wakeup descriptor", __func__);
return -ENOMEM;
}
pisp->reply = desc;
va_start(ap, free_addrs_count);
for (i = 0; i < free_addrs_count; i++) {
desc->free_addrs[i] = va_arg(ap, void*);
}
va_end(ap);
if (alloc_desc)
desc->free_addrs[free_addrs_count++] = desc;
desc->free_addrs_count = free_addrs_count;
init_waitqueue_head(&desc->wq);
WRITE_ONCE(desc->err, 0);
WRITE_ONCE(desc->status, 0);
ret = mcctrl_ikc_send(os, cpu, pisp);
if (ret < 0) {
pr_warn("%s: mcctrl_ikc_send failed: %d\n", __func__, ret);
kfree(desc);
return ret;
}
if (timeout) {
ret = wait_event_interruptible_timeout(desc->wq,
desc->status, timeout);
} else {
ret = wait_event_interruptible(desc->wq, desc->status);
}
/*
* Check if wait aborted (signal..) or timed out, and notify
* the callback it will need to free things for us
*/
if (!cmpxchg(&desc->status, 0, 1)) {
struct mcctrl_usrdata *usrdata = ihk_host_os_get_usrdata(os);
unsigned long flags;
spin_lock_irqsave(&usrdata->wakeup_descs_lock, flags);
list_add(&desc->chain, &usrdata->wakeup_descs_list);
spin_unlock_irqrestore(&usrdata->wakeup_descs_lock, flags);
if (do_frees)
*do_frees = 0;
return ret < 0 ? ret : -ETIME;
}
ret = READ_ONCE(desc->err);
if (alloc_desc)
kfree(desc);
return ret;
}
/* XXX: this runs in atomic context! */
static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
void *__packet, void *__os)
@@ -182,16 +72,25 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
break;
case SCD_MSG_PREPARE_PROCESS_ACKED:
case SCD_MSG_PERF_ACK:
case SCD_MSG_SEND_SIGNAL_ACK:
case SCD_MSG_PROCFS_ANSWER:
mcctrl_wakeup_cb(__os, pisp);
mcexec_prepare_ack(__os, pisp->arg, 0);
break;
case SCD_MSG_PREPARE_PROCESS_NACKED:
mcexec_prepare_ack(__os, pisp->arg, pisp->err);
break;
case SCD_MSG_SYSCALL_ONESIDE:
mcexec_syscall(usrdata, pisp);
break;
case SCD_MSG_PROCFS_ANSWER:
procfs_answer(usrdata, pisp->pid);
break;
case SCD_MSG_SEND_SIGNAL:
sig_done(pisp->arg, pisp->err);
break;
case SCD_MSG_SYSFS_REQ_CREATE:
case SCD_MSG_SYSFS_REQ_MKDIR:
case SCD_MSG_SYSFS_REQ_SYMLINK:
@@ -207,14 +106,17 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
case SCD_MSG_PROCFS_TID_CREATE:
case SCD_MSG_PROCFS_TID_DELETE:
procfsm_packet_handler(__os, pisp->msg, pisp->pid, pisp->arg,
pisp->resp_pa);
procfsm_packet_handler(__os, pisp->msg, pisp->pid, pisp->arg);
break;
case SCD_MSG_GET_VDSO_INFO:
get_vdso_info(__os, pisp->arg);
break;
case SCD_MSG_PERF_ACK:
mcctrl_perf_ack(__os, pisp);
break;
case SCD_MSG_CPU_RW_REG_RESP:
mcctrl_os_read_write_cpu_response(__os, pisp);
break;
@@ -224,10 +126,6 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
mcctrl_eventfd(__os, pisp);
break;
case SCD_MSG_FUTEX_WAKE:
mcctrl_futex_wake(pisp);
break;
default:
printk(KERN_ERR "mcctrl:syscall_packet_handler:"
"unknown message (%d.%d.%d.%d.%d.%#lx)\n",
@@ -259,15 +157,10 @@ static int dummy_packet_handler(struct ihk_ikc_channel_desc *c,
int mcctrl_ikc_send(ihk_os_t os, int cpu, struct ikc_scd_packet *pisp)
{
struct mcctrl_usrdata *usrdata;
struct mcctrl_usrdata *usrdata = ihk_host_os_get_usrdata(os);
if (cpu < 0 || os == NULL) {
return -EINVAL;
}
usrdata = ihk_host_os_get_usrdata(os);
if (usrdata == NULL || cpu >= usrdata->num_channels ||
!usrdata->channels[cpu].c) {
if (cpu < 0 || os == NULL || usrdata == NULL ||
cpu >= usrdata->num_channels || !usrdata->channels[cpu].c) {
return -EINVAL;
}
return ihk_ikc_send(usrdata->channels[cpu].c, pisp, 0);
@@ -461,8 +354,6 @@ int prepare_ikc_channels(ihk_os_t os)
mutex_init(&usrdata->part_exec.lock);
INIT_LIST_HEAD(&usrdata->part_exec.pli_list);
usrdata->part_exec.nr_processes = -1;
INIT_LIST_HEAD(&usrdata->wakeup_descs_list);
spin_lock_init(&usrdata->wakeup_descs_lock);
return 0;
@@ -484,9 +375,7 @@ void __destroy_ikc_channel(ihk_os_t os, struct mcctrl_channel *pmc)
void destroy_ikc_channels(ihk_os_t os)
{
int i;
unsigned long flags;
struct mcctrl_usrdata *usrdata = ihk_host_os_get_usrdata(os);
struct mcctrl_wakeup_desc *mwd_entry, *mwd_next;
if (!usrdata) {
printk("%s: WARNING: no mcctrl_usrdata found\n", __FUNCTION__);
@@ -506,12 +395,6 @@ void destroy_ikc_channels(ihk_os_t os)
ihk_ikc_destroy_channel(usrdata->ikc2linux[i]);
}
}
spin_lock_irqsave(&usrdata->wakeup_descs_lock, flags);
list_for_each_entry_safe(mwd_entry, mwd_next,
&usrdata->wakeup_descs_list, chain) {
mcctrl_wakeup_desc_cleanup(os, mwd_entry);
}
spin_unlock_irqrestore(&usrdata->wakeup_descs_lock, flags);
kfree(usrdata->channels);
kfree(usrdata->ikc2linux);


@@ -48,6 +48,7 @@
#define SCD_MSG_PREPARE_PROCESS 0x1
#define SCD_MSG_PREPARE_PROCESS_ACKED 0x2
#define SCD_MSG_PREPARE_PROCESS_NACKED 0x7
#define SCD_MSG_SCHEDULE_PROCESS 0x3
#define SCD_MSG_WAKE_UP_SYSCALL_THREAD 0x14
@@ -55,8 +56,7 @@
#define SCD_MSG_INIT_CHANNEL_ACKED 0x6
#define SCD_MSG_SYSCALL_ONESIDE 0x4
#define SCD_MSG_SEND_SIGNAL 0x7
#define SCD_MSG_SEND_SIGNAL_ACK 0x8
#define SCD_MSG_SEND_SIGNAL 0x8
#define SCD_MSG_CLEANUP_PROCESS 0x9
#define SCD_MSG_GET_VDSO_INFO 0xa
@@ -67,7 +67,6 @@
#define SCD_MSG_PROCFS_DELETE 0x11
#define SCD_MSG_PROCFS_REQUEST 0x12
#define SCD_MSG_PROCFS_ANSWER 0x13
#define SCD_MSG_PROCFS_RELEASE 0x15
#define SCD_MSG_DEBUG_LOG 0x20
@@ -102,18 +101,23 @@
#define SCD_MSG_CPU_RW_REG 0x52
#define SCD_MSG_CPU_RW_REG_RESP 0x53
#define SCD_MSG_FUTEX_WAKE 0x60
#define DMA_PIN_SHIFT 21
#define DO_USER_MODE
#define __NR_coredump 999
#ifdef POSTK_DEBUG_TEMP_FIX_61 /* Core table size and lseek return value to loff_t */
struct coretable {
loff_t len;
unsigned long addr;
};
#else /* POSTK_DEBUG_TEMP_FIX_61 */
struct coretable {
int len;
unsigned long addr;
};
#endif /* POSTK_DEBUG_TEMP_FIX_61 */
enum mcctrl_os_cpu_operation {
MCCTRL_OS_CPU_READ_REGISTER,
@@ -121,16 +125,9 @@ enum mcctrl_os_cpu_operation {
MCCTRL_OS_CPU_MAX_OP
};
/* Used to wake-up a Linux thread futex_wait()-ing */
struct uti_futex_resp {
int done;
wait_queue_head_t wq;
};
struct ikc_scd_packet {
int msg;
int err;
void *reply;
union {
/* for traditional SCD_MSG_* */
struct {
@@ -149,7 +146,7 @@ struct ikc_scd_packet {
long sysfs_arg3;
};
/* SCD_MSG_WAKE_UP_SYSCALL_THREAD */
/* SCD_MSG_SCHEDULE_THREAD */
struct {
int ttid;
};
@@ -165,17 +162,10 @@ struct ikc_scd_packet {
struct {
int eventfd_type;
};
/* SCD_MSG_FUTEX_WAKE */
struct {
void *resp;
int *spin_sleep; /* 1: waiting in linux_wait_event() 0: woken up by someone else */
} futex;
};
char padding[8];
char padding[12];
};
struct mcctrl_priv {
ihk_os_t os;
struct program_load_desc *desc;
@@ -221,12 +211,9 @@ struct mcctrl_channel {
};
struct mcctrl_per_thread_data {
struct mcctrl_per_proc_data *ppd;
struct list_head hash;
struct task_struct *task;
void *data;
int tid; /* debug */
atomic_t refcount;
};
#define MCCTRL_PER_THREAD_DATA_HASH_SHIFT 8
@@ -244,6 +231,7 @@ struct mcctrl_per_proc_data {
struct list_head wq_list_exact; /* These requests come from IKC IRQ handler targeting a particular thread */
ihk_spinlock_t wq_list_lock;
wait_queue_head_t wq_prepare;
wait_queue_head_t wq_procfs;
struct list_head per_thread_data_hash[MCCTRL_PER_THREAD_DATA_HASH_SIZE];
@@ -355,8 +343,6 @@ struct mcctrl_usrdata {
wait_queue_head_t wq_procfs;
struct list_head per_proc_data_hash[MCCTRL_PER_PROC_DATA_HASH_SIZE];
rwlock_t per_proc_data_hash_lock[MCCTRL_PER_PROC_DATA_HASH_SIZE];
struct list_head wakeup_descs_list;
spinlock_t wakeup_descs_lock;
void **keys;
struct sysfsm_data sysfsm_data;
@@ -382,60 +368,12 @@ int mcctrl_ikc_send(ihk_os_t os, int cpu, struct ikc_scd_packet *pisp);
int mcctrl_ikc_send_msg(ihk_os_t os, int cpu, int msg, int ref, unsigned long arg);
int mcctrl_ikc_is_valid_thread(ihk_os_t os, int cpu);
struct mcctrl_wakeup_desc {
int status;
int err;
wait_queue_head_t wq;
struct list_head chain;
int free_addrs_count;
void *free_addrs[];
};
/* ikc query-and-wait helper
*
* Arguments:
* - os, cpu and pisp as per mcctrl_ikc_send()
* - timeout: time to wait for reply in ms
* - desc: if set, memory area to be used for desc.
* Care must be taken to leave room for variable-length array.
* - do_free: returns a bool that specifies whether the caller should free
* its memory on error (e.g. if ikc_send failed in the first place,
* the reply has no chance of coming and the memory should be freed).
* Always true on success.
* - free_addrs_count & ...: addresses of kmalloc'd pointers that
* are referenced in the message and must be left intact if we
* abort due to a timeout/signal.
*/
int mcctrl_ikc_send_wait(ihk_os_t os, int cpu, struct ikc_scd_packet *pisp,
long int timeout, struct mcctrl_wakeup_desc *desc,
int *do_frees, int free_addrs_count, ...);
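Editor's note: a minimal, hedged usage sketch of this helper, modeled on the callers that appear later in this diff (those callers pass jiffies-based values such as 5 * HZ for the timeout, whereas the comment above says ms; the sketch follows the callers, and os, cpu, isp and r are assumed to be set up as in those procfs paths):
	int do_free, ret;
	/* listing r in the variadic free_addrs list lets the helper keep it
	 * alive (and free it) if the reply arrives after we already gave up */
	ret = mcctrl_ikc_send_wait(os, cpu, &isp, 5 * HZ,
				   NULL /* allocate desc internally */,
				   &do_free, 1, r);
	if (ret < 0) {
		if (ret == -ERESTARTSYS)
			ret = -ERESTART;
		if (!do_free)
			r = NULL;	/* do not free r here; the late-reply path will */
		goto out;
	}
	/* success: the reply fields of *r are valid; caller frees r as usual */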
ihk_os_t osnum_to_os(int n);
/* look up symbols, plus arch-specific ones */
extern int (*mcctrl_sys_mount)(char *dev_name, char *dir_name, char *type,
unsigned long flags, void *data);
extern int (*mcctrl_sys_umount)(char *dir_name, int flags);
extern int (*mcctrl_sys_unshare)(unsigned long unshare_flags);
extern long (*mcctrl_sched_setaffinity)(pid_t pid,
const struct cpumask *in_mask);
extern int (*mcctrl_sched_setscheduler_nocheck)(struct task_struct *p,
int policy,
const struct sched_param *param);
extern ssize_t (*mcctrl_sys_readlink)(const char *path, char *buf,
size_t bufsiz);
extern void (*mcctrl_zap_page_range)(struct vm_area_struct *vma,
unsigned long start,
unsigned long size,
struct zap_details *details);
extern struct inode_operations *mcctrl_hugetlbfs_inode_operations;
/* syscall.c */
void pager_add_process(void);
void pager_remove_process(struct mcctrl_per_proc_data *ppd);
void pager_cleanup(void);
int __do_in_kernel_irq_syscall(ihk_os_t os, struct ikc_scd_packet *packet);
int __do_in_kernel_syscall(ihk_os_t os, struct ikc_scd_packet *packet);
int mcctrl_add_per_proc_data(struct mcctrl_usrdata *ud, int pid,
struct mcctrl_per_proc_data *ppd);
@@ -444,18 +382,20 @@ struct mcctrl_per_proc_data *mcctrl_get_per_proc_data(
struct mcctrl_usrdata *ud, int pid);
void mcctrl_put_per_proc_data(struct mcctrl_per_proc_data *ppd);
int mcctrl_add_per_thread_data(struct mcctrl_per_proc_data *ppd, void *data);
void mcctrl_put_per_thread_data_unsafe(struct mcctrl_per_thread_data *ptd);
void mcctrl_put_per_thread_data(struct mcctrl_per_thread_data* ptd);
int mcctrl_add_per_thread_data(struct mcctrl_per_proc_data* ppd,
struct task_struct *task, void *data);
int mcctrl_delete_per_thread_data(struct mcctrl_per_proc_data* ppd,
struct task_struct *task);
#ifdef POSTK_DEBUG_ARCH_DEP_56 /* Strange how to use inline declaration fix. */
inline struct mcctrl_per_thread_data *mcctrl_get_per_thread_data(struct mcctrl_per_proc_data *ppd, struct task_struct *task)
static inline struct mcctrl_per_thread_data *mcctrl_get_per_thread_data(
struct mcctrl_per_proc_data *ppd, struct task_struct *task)
{
struct mcctrl_per_thread_data *ptd_iter, *ptd = NULL;
int hash = (((uint64_t)task >> 4) & MCCTRL_PER_THREAD_DATA_HASH_MASK);
unsigned long flags;
/* Check if data for this thread exists */
write_lock_irqsave(&ppd->per_thread_data_hash_lock[hash], flags);
/* Check if data for this thread exists and return it */
read_lock_irqsave(&ppd->per_thread_data_hash_lock[hash], flags);
list_for_each_entry(ptd_iter, &ppd->per_thread_data_hash[hash], hash) {
if (ptd_iter->task == task) {
@@ -464,27 +404,16 @@ inline struct mcctrl_per_thread_data *mcctrl_get_per_thread_data(struct mcctrl_p
}
}
if (ptd) {
if (atomic_read(&ptd->refcount) <= 0) {
printk("%s: ERROR: use-after-free detected (%d)", __FUNCTION__, atomic_read(&ptd->refcount));
ptd = NULL;
goto out;
}
atomic_inc(&ptd->refcount);
}
out:
write_unlock_irqrestore(&ppd->per_thread_data_hash_lock[hash], flags);
return ptd;
read_unlock_irqrestore(&ppd->per_thread_data_hash_lock[hash], flags);
return ptd ? ptd->data : NULL;
}
#else /* POSTK_DEBUG_ARCH_DEP_56 */
inline struct mcctrl_per_thread_data *mcctrl_get_per_thread_data(struct mcctrl_per_proc_data *ppd, struct task_struct *task);
inline struct mcctrl_per_thread_data *mcctrl_get_per_thread_data(
struct mcctrl_per_proc_data *ppd, struct task_struct *task);
#endif /* POSTK_DEBUG_ARCH_DEP_56 */
int mcctrl_clear_pte_range(uintptr_t start, uintptr_t len);
void __return_syscall(ihk_os_t os, struct ikc_scd_packet *packet,
long ret, int stid);
int clear_pte_range(uintptr_t start, uintptr_t len);
int mcctrl_os_alive(void);
@@ -496,6 +425,7 @@ struct procfs_read {
int count; /* bytes to read (request) */
int eof; /* if eof is detected, 1 otherwise 0. (answer)*/
int ret; /* read bytes (answer) */
int status; /* non-zero if done (answer) */
int newcpu; /* migrated new cpu (answer) */
int readwrite; /* 0:read, 1:write */
char fname[PROCFS_NAME_MAX]; /* procfs filename (request) */
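Editor's note: the McKernel-side responder is not part of this diff; a hedged sketch of the answer-side ordering implied by the new status field (field names are from the struct above, bytes_copied and hit_eof are placeholders):
	r->ret = bytes_copied;	/* bytes actually read or written (answer) */
	r->eof = hit_eof ? 1 : 0;
	r->status = 1;		/* published last: the waiter checks r->status != 0 */
	/* ...followed by an SCD_MSG_PROCFS_ANSWER message that reaches
	 * procfs_answer(), which wake_up_all()s the per-process or per-OS
	 * wq_procfs. */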
@@ -508,8 +438,7 @@ struct procfs_file {
};
void procfs_answer(struct mcctrl_usrdata *ud, int pid);
int procfsm_packet_handler(void *os, int msg, int pid, unsigned long arg,
unsigned long resp_pa);
int procfsm_packet_handler(void *os, int msg, int pid, unsigned long arg);
void add_tid_entry(int osnum, int pid, int tid);
void add_pid_entry(int osnum, int pid);
void delete_tid_entry(int osnum, int pid, int tid);
@@ -545,9 +474,7 @@ struct vdso {
int reserve_user_space(struct mcctrl_usrdata *usrdata, unsigned long *startp,
unsigned long *endp);
int release_user_space(uintptr_t start, uintptr_t len);
void get_vdso_info(ihk_os_t os, long vdso_pa);
int arch_symbols_init(void);
struct get_cpu_mapping_req {
int busy; /* INOUT: */

View File

@@ -103,6 +103,33 @@ getpath(struct procfs_list_entry *e, char *buf, int bufsize)
}
}
/**
* \brief Process SCD_MSG_PROCFS_ANSWER message.
*
* \param ud mcctrl_usrdata pointer
* \param pid PID of the requesting process
*/
void procfs_answer(struct mcctrl_usrdata *ud, int pid)
{
struct mcctrl_per_proc_data *ppd = NULL;
if (pid > 0) {
ppd = mcctrl_get_per_proc_data(ud, pid);
if (unlikely(!ppd)) {
kprintf("%s: ERROR: no per-process structure for PID %d\n",
__FUNCTION__, pid);
return;
}
}
wake_up_all(pid > 0 ? &ppd->wq_procfs : &ud->wq_procfs);
if (pid > 0) {
mcctrl_put_per_proc_data(ppd);
}
}
static struct procfs_list_entry *
find_procfs_entry(struct procfs_list_entry *parent, const char *name)
{
@@ -294,8 +321,6 @@ get_base_entry(int osnum)
if(!e){
e = add_procfs_entry(NULL, name, S_IFDIR | 0555,
uid, gid, NULL);
if (!e)
return NULL;
e->osnum = osnum;
}
return e;
@@ -431,8 +456,6 @@ proc_exe_link(int osnum, int pid, const char *path)
e = add_procfs_entry(parent, "exe", S_IFLNK | 0777, uid, gid,
path);
if (!e)
goto out;
e->data = kmalloc(strlen(path) + 1, GFP_KERNEL);
strcpy(e->data, path);
task = find_procfs_entry(parent, "task");
@@ -441,7 +464,6 @@ proc_exe_link(int osnum, int pid, const char *path)
uid, gid, path);
}
}
out:
up(&procfs_file_list_lock);
}
@@ -487,6 +509,7 @@ procfs_exit(int osnum)
* This function conforms to the 2) way of fs/proc/generic.c
* from linux-2.6.39.4.
*/
#ifdef POSTK_DEBUG_TEMP_FIX_43 /* Fixed an issue that failed pread / pwrite of size larger than 4MB */
static ssize_t __mckernel_procfs_read_write(
struct file *file,
char __user *buf, size_t nbytes,
@@ -497,7 +520,7 @@ static ssize_t __mckernel_procfs_read_write(
int order = 0;
volatile struct procfs_read *r = NULL;
struct ikc_scd_packet isp;
int ret, osnum, pid;
int ret, osnum, pid, retw;
unsigned long pbuf;
size_t count = nbytes;
size_t copy_size = 0;
@@ -592,11 +615,11 @@ static ssize_t __mckernel_procfs_read_write(
while (count > 0) {
int this_len = min_t(ssize_t, count, copy_size);
int do_free;
r->pbuf = pbuf;
r->eof = 0;
r->ret = -EIO; /* default */
r->status = 0;
r->offset = offset;
r->count = this_len;
r->readwrite = read_write;
@@ -606,26 +629,50 @@ static ssize_t __mckernel_procfs_read_write(
isp.arg = virt_to_phys(r);
isp.pid = pid;
ret = mcctrl_ikc_send_wait(osnum_to_os(e->osnum),
(pid > 0) ? ppd->ikc_target_cpu : 0,
&isp, HZ, NULL, &do_free, 1, r);
if (!do_free && ret >= 0) {
ret = -EIO;
}
ret = mcctrl_ikc_send(osnum_to_os(e->osnum),
(pid > 0) ? ppd->ikc_target_cpu : 0, &isp);
if (ret < 0) {
if (ret == -ETIME) {
pr_info("%s: error: timeout (1 sec)\n",
__func__);
}
else if (ret == -ERESTARTSYS) {
ret = -ERESTART;
}
if (!do_free)
r = NULL;
goto out; /* error */
}
/* Wait for a reply. */
ret = -EIO; /* default exit code */
dprintk("%s: waiting for reply\n", __FUNCTION__);
retry_wait:
/* Wait for the status field of the procfs_read structure,
* wait on per-process or OS specific data depending on
* who the request is for.
*/
if (pid > 0) {
retw = wait_event_interruptible_timeout(ppd->wq_procfs,
r->status != 0, HZ);
}
else {
retw = wait_event_interruptible_timeout(udp->wq_procfs,
r->status != 0, HZ);
}
/* Timeout? */
if (retw == 0 && r->status == 0) {
printk("%s: error: timeout (1 sec)\n", __FUNCTION__);
goto out;
}
/* Interrupted? */
else if (retw == -ERESTARTSYS) {
ret = -ERESTART;
goto out;
}
/* Were we woken up by a reply to another procfs request? */
else if (r->status == 0) {
/* TODO: r->status is not set atomically, we could be woken
* up with status == 0 and it could change to 1 while in this
* code, we could potentially miss the wake_up()...
*/
printk("%s: stale wake-up, retrying\n", __FUNCTION__);
goto retry_wait;
}
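	/* Editor's note (illustrative, not part of this change): because
	 * wait_event_interruptible_timeout() re-checks its condition before
	 * sleeping, the retry above cannot lose a wake-up for good; the
	 * remaining worry is compiler/CPU ordering on r->status versus the
	 * answer fields. One conventional way to make that explicit on the
	 * Linux side would be
	 *
	 *	retw = wait_event_interruptible_timeout(ppd->wq_procfs,
	 *			smp_load_acquire(&r->status) != 0, HZ);
	 *
	 * paired with the responder publishing status with release semantics
	 * before sending SCD_MSG_PROCFS_ANSWER.
	 */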
/* Wake up and check the result. */
dprintk("%s: woke up. ret: %d, eof: %d\n",
@@ -670,6 +717,193 @@ out:
return ret;
}
#else /* POSTK_DEBUG_TEMP_FIX_43 */
static ssize_t __mckernel_procfs_read_write(
struct file *file,
char __user *buf, size_t nbytes,
loff_t *ppos, int read_write)
{
struct inode * inode = file->f_inode;
char *kern_buffer = NULL;
int order = 0;
volatile struct procfs_read *r = NULL;
struct ikc_scd_packet isp;
int ret, osnum, pid, retw;
unsigned long pbuf;
unsigned long count = nbytes;
#if LINUX_VERSION_CODE < KERNEL_VERSION(3,10,0)
struct proc_dir_entry *dp = PDE(inode);
struct procfs_list_entry *e = dp->data;
#else
struct procfs_list_entry *e = PDE_DATA(inode);
#endif
loff_t offset = *ppos;
char pathbuf[PROCFS_NAME_MAX];
char *path, *p;
ihk_os_t os = NULL;
struct mcctrl_usrdata *udp = NULL;
struct mcctrl_per_proc_data *ppd = NULL;
if (count <= 0 || offset < 0) {
return 0;
}
path = getpath(e, pathbuf, PROCFS_NAME_MAX);
dprintk("%s: invoked for %s, offset: %lu, count: %lu\n",
__FUNCTION__, path,
(unsigned long)offset, count);
/* Verify OS number */
ret = sscanf(path, "mcos%d/", &osnum);
if (ret != 1) {
printk("%s: error: couldn't determine OS number\n", __FUNCTION__);
return -EINVAL;
}
if (osnum != e->osnum) {
printk("%s: error: OS numbers don't match\n", __FUNCTION__);
return -EINVAL;
}
/* Is this request for a specific process? */
p = strchr(path, '/') + 1;
ret = sscanf(p, "%d/", &pid);
if (ret != 1) {
pid = -1;
}
os = osnum_to_os(osnum);
if (!os) {
printk("%s: error: no IHK OS data found for OS %d\n",
__FUNCTION__, osnum);
return -EINVAL;
}
udp = ihk_host_os_get_usrdata(os);
if (!udp) {
printk("%s: error: no MCCTRL data found for OS %d\n",
__FUNCTION__, osnum);
return -EINVAL;
}
if (pid > 0) {
ppd = mcctrl_get_per_proc_data(udp, pid);
if (unlikely(!ppd)) {
printk("%s: error: no per-process structure for PID %d",
__FUNCTION__, pid);
return -EINVAL;
}
}
while ((1 << order) < count) ++order;
if (order > 12) {
order -= 12;
}
else {
order = 1;
}
/* NOTE: we need physically contiguous memory to pass through IKC */
kern_buffer = (char *)__get_free_pages(GFP_KERNEL, order);
if (!kern_buffer) {
printk("%s: ERROR: allocating kernel buffer\n", __FUNCTION__);
ret = -ENOMEM;
goto out;
}
pbuf = virt_to_phys(kern_buffer);
r = kmalloc(sizeof(struct procfs_read), GFP_KERNEL);
if (r == NULL) {
ret = -ENOMEM;
goto out;
}
r->pbuf = pbuf;
r->eof = 0;
r->ret = -EIO; /* default */
r->status = 0;
r->offset = offset;
r->count = count;
r->readwrite = read_write;
strncpy((char *)r->fname, path, PROCFS_NAME_MAX);
isp.msg = SCD_MSG_PROCFS_REQUEST;
isp.ref = 0;
isp.arg = virt_to_phys(r);
isp.pid = pid;
ret = mcctrl_ikc_send(osnum_to_os(e->osnum),
(pid > 0) ? ppd->ikc_target_cpu : 0, &isp);
if (ret < 0) {
goto out; /* error */
}
/* Wait for a reply. */
ret = -EIO; /* default exit code */
dprintk("%s: waiting for reply\n", __FUNCTION__);
retry_wait:
/* Wait for the status field of the procfs_read structure,
* wait on per-process or OS specific data depending on
* who the request is for.
*/
if (pid > 0) {
retw = wait_event_interruptible_timeout(ppd->wq_procfs,
r->status != 0, 5 * HZ);
}
else {
retw = wait_event_interruptible_timeout(udp->wq_procfs,
r->status != 0, 5 * HZ);
}
/* Timeout? */
if (retw == 0 && r->status == 0) {
printk("%s: error: timeout (1 sec)\n", __FUNCTION__);
goto out;
}
/* Interrupted? */
else if (retw == -ERESTARTSYS) {
ret = -ERESTART;
goto out;
}
/* Were we woken up by a reply to another procfs request? */
else if (r->status == 0) {
/* TODO: r->status is not set atomically, we could be woken
* up with status == 0 and it could change to 1 while in this
* code, we could potentially miss the wake_up()...
*/
printk("%s: stale wake-up, retrying\n", __FUNCTION__);
goto retry_wait;
}
/* Wake up and check the result. */
dprintk("%s: woke up. ret: %d, eof: %d\n",
__FUNCTION__, r->ret, r->eof);
if (r->ret > 0) {
if (read_write == 0) {
if (copy_to_user(buf, kern_buffer, r->ret)) {
printk("%s: ERROR: copy_to_user failed.\n", __FUNCTION__);
ret = -EFAULT;
goto out;
}
}
*ppos += r->ret;
}
ret = r->ret;
out:
if (ppd)
mcctrl_put_per_proc_data(ppd);
if (kern_buffer)
free_pages((uintptr_t)kern_buffer, order);
if (r)
kfree((void *)r);
return ret;
}
#endif /* POSTK_DEBUG_TEMP_FIX_43 */
static ssize_t mckernel_procfs_read(struct file *file,
char __user *buf, size_t nbytes, loff_t *ppos)
@@ -705,48 +939,33 @@ struct procfs_work {
int msg;
int pid;
unsigned long arg;
unsigned long resp_pa;
struct work_struct work;
};
static void procfsm_work_main(struct work_struct *work0)
{
struct procfs_work *work = container_of(work0, struct procfs_work, work);
unsigned long phys;
int *done;
switch (work->msg) {
case SCD_MSG_PROCFS_TID_CREATE:
add_tid_entry(ihk_host_os_get_index(work->os),
work->pid, work->arg);
phys = ihk_device_map_memory(ihk_os_to_dev(work->os),
work->resp_pa, sizeof(int));
done = ihk_device_map_virtual(ihk_os_to_dev(work->os),
phys, sizeof(int), NULL, 0);
*done = 1;
ihk_device_unmap_virtual(ihk_os_to_dev(work->os),
done, sizeof(int));
ihk_device_unmap_memory(ihk_os_to_dev(work->os),
phys, sizeof(int));
break;
case SCD_MSG_PROCFS_TID_CREATE:
add_tid_entry(ihk_host_os_get_index(work->os), work->pid, work->arg);
break;
case SCD_MSG_PROCFS_TID_DELETE:
delete_tid_entry(ihk_host_os_get_index(work->os),
work->pid, work->arg);
break;
case SCD_MSG_PROCFS_TID_DELETE:
delete_tid_entry(ihk_host_os_get_index(work->os), work->pid, work->arg);
break;
default:
pr_warn("%s: unknown work: msg: %d, pid: %d, arg: %lu)\n",
__func__, work->msg, work->pid, work->arg);
break;
default:
printk("%s: unknown work: msg: %d, pid: %d, arg: %lu)\n",
__FUNCTION__, work->msg, work->pid, work->arg);
break;
}
kfree(work);
return;
}
int procfsm_packet_handler(void *os, int msg, int pid, unsigned long arg,
unsigned long resp_pa)
int procfsm_packet_handler(void *os, int msg, int pid, unsigned long arg)
{
struct procfs_work *work = NULL;
@@ -760,7 +979,6 @@ int procfsm_packet_handler(void *os, int msg, int pid, unsigned long arg,
work->msg = msg;
work->pid = pid;
work->arg = arg;
work->resp_pa = resp_pa;
INIT_WORK(&work->work, &procfsm_work_main);
schedule_work(&work->work);
@@ -779,303 +997,6 @@ static const struct file_operations mckernel_forward = {
.write = mckernel_procfs_write,
};
#define PA_NULL (-1L)
struct mckernel_procfs_buffer_info {
unsigned long top_pa;
unsigned long cur_pa;
ihk_os_t os;
int pid;
char path[0];
};
struct mckernel_procfs_buffer {
unsigned long next_pa;
unsigned long pos;
unsigned long size;
char buf[0];
};
static int mckernel_procfs_buff_open(struct inode *inode, struct file *file)
{
struct mckernel_procfs_buffer_info *info;
int pid;
int ret;
char *path;
char *path_buf;
char *p;
ihk_os_t os;
#if LINUX_VERSION_CODE < KERNEL_VERSION(3, 10, 0)
struct proc_dir_entry *dp = PDE(inode);
struct procfs_list_entry *e = dp->data;
#else
struct procfs_list_entry *e = PDE_DATA(inode);
#endif
os = osnum_to_os(e->osnum);
if (!os) {
return -EINVAL;
}
path_buf = kmalloc(PROCFS_NAME_MAX, GFP_KERNEL);
if (!path_buf) {
return -ENOMEM;
}
path = getpath(e, path_buf, PROCFS_NAME_MAX);
p = strchr(path, '/') + 1;
ret = sscanf(p, "%d/", &pid);
if (ret != 1) {
pid = -1;
}
info = kmalloc(sizeof(struct mckernel_procfs_buffer_info) +
strlen(path) + 1, GFP_KERNEL);
if (!info) {
kfree(path_buf);
return -ENOMEM;
}
info->top_pa = PA_NULL;
info->cur_pa = PA_NULL;
info->os = os;
info->pid = pid;
strcpy(info->path, path);
file->private_data = info;
kfree(path_buf);
return 0;
}
static int mckernel_procfs_buff_release(struct inode *inode, struct file *file)
{
struct mckernel_procfs_buffer_info *info = file->private_data;
int rc = 0;
if (!info) {
return -EIO;
}
file->private_data = NULL;
if (info->top_pa != PA_NULL) {
int ret;
struct procfs_read *r = NULL;
struct ikc_scd_packet isp;
int do_free;
r = kmalloc(sizeof(struct procfs_read), GFP_KERNEL);
if (r == NULL) {
rc = -ENOMEM;
goto out;
}
memset(r, '\0', sizeof(struct procfs_read));
r->pbuf = info->top_pa;
r->ret = -EIO; /* default */
r->fname[0] = '\0';
isp.msg = SCD_MSG_PROCFS_RELEASE;
isp.ref = 0;
isp.arg = virt_to_phys(r);
isp.pid = 0;
rc = -EIO;
ret = mcctrl_ikc_send_wait(info->os, 0,
&isp, 5 * HZ, NULL, &do_free, 1, r);
if (!do_free && ret >= 0) {
ret = -EIO;
}
if (ret < 0) {
rc = ret;
if (ret == -ETIME) {
pr_info("%s: error: timeout (1 sec)\n",
__func__);
}
else if (ret == -ERESTARTSYS) {
rc = -ERESTART;
}
if (!do_free)
r = NULL;
goto out;
}
if (r->ret < 0) {
rc = r->ret;
goto out;
}
rc = 0;
out:
if (r)
kfree((void *)r);
}
kfree(info);
return rc;
}
static ssize_t mckernel_procfs_buff_read(struct file *file, char __user *ubuf,
size_t nbytes, loff_t *ppos)
{
struct mckernel_procfs_buffer_info *info = file->private_data;
unsigned long phys;
struct mckernel_procfs_buffer *buf;
int pos = *ppos;
ssize_t l = 0;
int done = 0;
ihk_os_t os;
if (nbytes <= 0 || *ppos < 0) {
return 0;
}
if (!info) {
return -EIO;
}
os = info->os;
if (info->top_pa == PA_NULL) {
int ret;
int pid = info->pid;
struct procfs_read *r = NULL;
struct ikc_scd_packet isp;
struct mcctrl_usrdata *udp = NULL;
struct mcctrl_per_proc_data *ppd = NULL;
int do_free;
udp = ihk_host_os_get_usrdata(os);
if (!udp) {
pr_err("%s: no MCCTRL data found for OS\n",
__func__);
return -EINVAL;
}
if (pid > 0) {
ppd = mcctrl_get_per_proc_data(udp, pid);
if (unlikely(!ppd)) {
pr_err("%s: no per-process structure for PID %d",
__func__, pid);
return -EINVAL;
}
}
r = kmalloc(sizeof(struct procfs_read), GFP_KERNEL);
if (r == NULL) {
l = -ENOMEM;
done = 1;
goto out;
}
memset(r, '\0', sizeof(struct procfs_read));
r->pbuf = PA_NULL;
r->ret = -EIO; /* default */
strncpy((char *)r->fname, info->path, PROCFS_NAME_MAX);
isp.msg = SCD_MSG_PROCFS_REQUEST;
isp.ref = 0;
isp.arg = virt_to_phys(r);
isp.pid = pid;
l = -EIO;
done = 1;
ret = mcctrl_ikc_send_wait(os,
(pid > 0) ? ppd->ikc_target_cpu : 0,
&isp, 5 * HZ, NULL, &do_free, 1, r);
if (!do_free && ret >= 0) {
ret = -EIO;
}
if (ret < 0) {
l = ret;
if (ret == -ETIME) {
pr_info("%s: error: timeout (1 sec)\n",
__func__);
}
else if (ret == -ERESTARTSYS) {
l = -ERESTART;
}
if (!do_free)
r = NULL;
goto out;
}
if (r->ret < 0) {
l = r->ret;
goto out;
}
done = 0;
l = 0;
info->top_pa = info->cur_pa = r->pbuf;
out:
if (ppd)
mcctrl_put_per_proc_data(ppd);
if (r)
kfree((void *)r);
}
if (info->cur_pa == PA_NULL) {
info->cur_pa = info->top_pa;
}
while (!done && info->cur_pa != PA_NULL) {
long bpos;
long bsize;
phys = ihk_device_map_memory(ihk_os_to_dev(os), info->cur_pa,
PAGE_SIZE);
#ifdef CONFIG_MIC
buf = ioremap_wc(phys, PAGE_SIZE);
#else
buf = ihk_device_map_virtual(ihk_os_to_dev(os), phys,
PAGE_SIZE, NULL, 0);
#endif
if (pos < buf->pos) {
info->cur_pa = info->top_pa;
goto rep;
}
if (pos >= buf->pos + buf->size) {
info->cur_pa = buf->next_pa;
goto rep;
}
bpos = pos - buf->pos;
bsize = (buf->pos + buf->size) - pos;
if (bsize > (nbytes - l)) {
bsize = nbytes - l;
}
if (copy_to_user(ubuf, buf->buf + bpos, bsize)) {
done = 1;
pos = *ppos;
l = -EFAULT;
}
else {
ubuf += bsize;
pos += bsize;
l += bsize;
if (l == nbytes) {
done = 1;
}
}
rep:
#ifdef CONFIG_MIC
iounmap(buf);
#else
ihk_device_unmap_virtual(ihk_os_to_dev(os), buf, PAGE_SIZE);
#endif
ihk_device_unmap_memory(ihk_os_to_dev(os), phys, PAGE_SIZE);
};
*ppos = pos;
return l;
}
static const struct file_operations mckernel_buff_io = {
.llseek = mckernel_procfs_lseek,
.read = mckernel_procfs_buff_read,
.write = NULL,
.open = mckernel_procfs_buff_open,
.release = mckernel_procfs_buff_release,
};
static const struct procfs_entry tid_entry_stuff[] = {
// PROC_REG("auxv", S_IRUSR, NULL),
// PROC_REG("clear_refs", S_IWUSR, NULL),
@@ -1085,10 +1006,10 @@ static const struct procfs_entry tid_entry_stuff[] = {
// PROC_LNK("exe", mckernel_readlink),
// PROC_REG("limits", S_IRUSR|S_IWUSR, NULL),
// PROC_REG("maps", S_IRUGO, NULL),
PROC_REG("mem", 0600, NULL),
PROC_REG("mem", S_IRUSR|S_IWUSR, NULL),
// PROC_REG("pagemap", S_IRUGO, NULL),
// PROC_REG("smaps", S_IRUGO, NULL),
PROC_REG("stat", 0444, &mckernel_buff_io),
PROC_REG("stat", S_IRUGO, NULL),
// PROC_REG("statm", S_IRUGO, NULL),
// PROC_REG("status", S_IRUGO, NULL),
// PROC_REG("syscall", S_IRUGO, NULL),
@@ -1097,26 +1018,26 @@ static const struct procfs_entry tid_entry_stuff[] = {
};
static const struct procfs_entry pid_entry_stuff[] = {
PROC_REG("auxv", 0400, &mckernel_buff_io),
PROC_REG("auxv", S_IRUSR, NULL),
/* Support the case where McKernel process retrieves its job-id under the Fujitsu TCS suite. */
// PROC_REG("cgroup", S_IXUSR, NULL),
// PROC_REG("clear_refs", S_IWUSR, NULL),
PROC_REG("cmdline", 0444, &mckernel_buff_io),
PROC_REG("comm", 0644, &mckernel_buff_io),
PROC_REG("cmdline", S_IRUGO, NULL),
// PROC_REG("comm", S_IRUGO|S_IWUSR, NULL),
// PROC_REG("coredump_filter", S_IRUGO|S_IWUSR, NULL),
// PROC_REG("cpuset", S_IRUGO, NULL),
PROC_REG("cpuset", S_IXUSR, NULL),
// PROC_REG("environ", S_IRUSR, NULL),
// PROC_LNK("exe", mckernel_readlink),
// PROC_REG("limits", S_IRUSR|S_IWUSR, NULL),
PROC_REG("maps", 0444, &mckernel_buff_io),
PROC_REG("mem", 0400, NULL),
PROC_REG("pagemap", 0444, NULL),
// PROC_REG("smaps", S_IRUGO, NULL),
// PROC_REG("stat", 0444, &mckernel_buff_io),
PROC_REG("maps", S_IRUGO, NULL),
PROC_REG("mem", S_IRUSR|S_IWUSR, NULL),
PROC_REG("pagemap", S_IRUGO, NULL),
PROC_REG("smaps", S_IRUGO, NULL),
// PROC_REG("stat", S_IRUGO, NULL),
// PROC_REG("statm", S_IRUGO, NULL),
PROC_REG("status", 0444, &mckernel_buff_io),
PROC_REG("status", S_IRUGO, NULL),
// PROC_REG("syscall", S_IRUGO, NULL),
PROC_DIR("task", 0555),
PROC_DIR("task", S_IRUGO|S_IXUGO),
// PROC_REG("wchan", S_IRUGO, NULL),
PROC_TERM
};
@@ -1124,14 +1045,14 @@ static const struct procfs_entry pid_entry_stuff[] = {
static const struct procfs_entry base_entry_stuff[] = {
// PROC_REG("cmdline", S_IRUGO, NULL),
#ifdef POSTK_DEBUG_ARCH_DEP_42 /* /proc/cpuinfo support added. */
PROC_REG("cpuinfo", 0444, &mckernel_buff_io),
PROC_REG("cpuinfo", S_IRUGO, NULL),
#else /* POSTK_DEBUG_ARCH_DEP_42 */
// PROC_REG("cpuinfo", S_IRUGO, NULL),
#endif /* POSTK_DEBUG_ARCH_DEP_42 */
// PROC_REG("meminfo", S_IRUGO, NULL),
// PROC_REG("pagetypeinfo",S_IRUGO, NULL),
// PROC_REG("softirq", S_IRUGO, NULL),
PROC_REG("stat", 0444, &mckernel_buff_io),
PROC_REG("stat", S_IRUGO, NULL),
// PROC_REG("uptime", S_IRUGO, NULL),
// PROC_REG("version", S_IRUGO, NULL),
// PROC_REG("vmallocinfo",S_IRUSR, NULL),

File diff suppressed because it is too large

View File

@@ -790,7 +790,6 @@ out:
return error;
} /* setup_node_files() */
#ifdef SETUP_PCI_FILES
static int read_file(void *buf, size_t size, char *fmt, va_list ap)
{
int error;
@@ -799,6 +798,7 @@ static int read_file(void *buf, size_t size, char *fmt, va_list ap)
int n;
struct file *fp = NULL;
loff_t off;
mm_segment_t ofs;
ssize_t ss;
dprintk("read_file(%p,%ld,%s,%p)\n", buf, size, fmt, ap);
@@ -824,14 +824,13 @@ static int read_file(void *buf, size_t size, char *fmt, va_list ap)
}
off = 0;
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 14, 0)
ss = kernel_read(fp, buf, size, &off);
#else
ss = kernel_read(fp, off, buf, size);
#endif
ofs = get_fs();
set_fs(KERNEL_DS);
ss = vfs_read(fp, buf, size, &off);
set_fs(ofs);
if (ss < 0) {
error = ss;
eprintk("mcctrl:read_file:kernel_read failed. %d\n", error);
eprintk("mcctrl:read_file:vfs_read failed. %d\n", error);
goto out;
}
if (ss >= size) {
@@ -893,6 +892,16 @@ out:
return error;
} /* read_long() */
#ifdef MCCTRL_KSYM_sys_readlink
static ssize_t (*mcctrl_sys_readlink)(const char *path, char *buf,
size_t bufsiz)
#if MCCTRL_KSYM_sys_readlink
= (void *)MCCTRL_KSYM_sys_readlink;
#else
= &sys_readlink;
#endif
#endif
static int read_link(char *buf, size_t bufsize, char *fmt, ...)
{
int error;
@@ -942,14 +951,30 @@ out:
return error;
} /* read_link() */
#ifdef POSTK_DEBUG_TEMP_FIX_22 /* iterate_dir() deadlock */
static int setup_one_pci(struct mcctrl_usrdata *udp, const char *name)
{
#else /* POSTK_DEBUG_TEMP_FIX_22 */
static int setup_one_pci(void *arg0, const char *name, int namlen,
loff_t offset, u64 ino, unsigned d_type)
{
struct mcctrl_usrdata *udp = arg0;
#endif /* POSTK_DEBUG_TEMP_FIX_22 */
int error;
char *buf = NULL;
long node;
struct sysfsm_bitmap_param param;
#ifdef POSTK_DEBUG_TEMP_FIX_22 /* iterate_dir() deadlock */
dprintk("setup_one_pci(%p,%s)\n", udp, name);
#else /* POSTK_DEBUG_TEMP_FIX_22 */
dprintk("setup_one_pci(%p,%s,%d,%#lx,%#lx,%d)\n",
arg0, name, namlen, (long)offset, (long)ino, d_type);
if (namlen != 12) {
error = 0;
goto out;
}
#endif /* POSTK_DEBUG_TEMP_FIX_22 */
buf = (void *)__get_free_pages(GFP_KERNEL, 0);
if (!buf) {
@@ -1001,39 +1026,26 @@ static int setup_one_pci(struct mcctrl_usrdata *udp, const char *name)
error = 0;
out:
free_pages((long)buf, 0);
#ifdef POSTK_DEBUG_TEMP_FIX_22 /* iterate_dir() deadlock */
dprintk("setup_one_pci(%p,%s): %d\n", udp, name, error);
#else /* POSTK_DEBUG_TEMP_FIX_22 */
dprintk("setup_one_pci(%p,%s,%d,%#lx,%#lx,%d): %d\n",
arg0, name, namlen, (long)offset, (long)ino, d_type,
error);
#endif /* POSTK_DEBUG_TEMP_FIX_22 */
return error;
} /* setup_one_pci() */
#ifdef POSTK_DEBUG_TEMP_FIX_22 /* iterate_dir() deadlock */
LIST_HEAD(pci_file_name_list);
struct pci_file_name {
char *name;
struct list_head chain;
};
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 11, 0) || \
(defined(RHEL_RELEASE_CODE) && RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7, 5))
struct mcctrl_filler_args {
struct dir_context ctx;
void *buf;
};
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 19, 0)
static int pci_file_name_gen(struct dir_context *ctx, const char *name,
int namlen, loff_t offset, u64 ino, unsigned int d_type)
#else
static int pci_file_name_gen(void *ctx, const char *name,
int namlen, loff_t offset, u64 ino, unsigned int d_type)
#endif
{
struct mcctrl_filler_args *args
= container_of(ctx, struct mcctrl_filler_args, ctx);
void *buf = args->buf;
#else
static int pci_file_name_gen(void *buf, const char *name, int namlen,
loff_t offset, u64 ino, unsigned d_type)
{
#endif
struct pci_file_name *p;
int error = -1;
@@ -1071,31 +1083,56 @@ out:
buf, name, namlen, (long)offset, (long)ino, d_type, error);
return error;
}
#endif /* POSTK_DEBUG_TEMP_FIX_22 */
static inline int mcctrl_vfs_readdir(struct file *file, filldir_t filler,
void *buf)
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,11,0)
typedef int (*mcctrl_filldir_t)(void *buf, const char *name, int namlen,
loff_t offset, u64 ino, unsigned d_type);
struct mcctrl_filler_args {
struct dir_context ctx;
mcctrl_filldir_t filler;
void *buf;
};
static int mcctrl_filler(struct dir_context *ctx, const char *name,
int namlen, loff_t offset, u64 ino, unsigned d_type)
{
struct mcctrl_filler_args *args
= container_of(ctx, struct mcctrl_filler_args, ctx);
return (*args->filler)(args->buf, name, namlen, offset, ino, d_type);
} /* mcctrl_filler() */
static inline int mcctrl_vfs_readdir(struct file *file,
mcctrl_filldir_t filler, void *buf)
{
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 11, 0) || \
(defined(RHEL_RELEASE_CODE) && RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7, 5))
struct mcctrl_filler_args args = {
.ctx.actor = filler,
.ctx.actor = &mcctrl_filler,
.filler = (void *)filler,
.buf = buf,
};
return iterate_dir(file, &args.ctx);
#else
return vfs_readdir(file, filler, buf);
#endif
} /* mcctrl_vfs_readdir() */
#else
static inline int mcctrl_vfs_readdir(struct file *file, filldir_t filler,
void *buf)
{
return vfs_readdir(file, filler, buf);
} /* mcctrl_vfs_readdir() */
#endif
static int setup_pci_files(struct mcctrl_usrdata *udp)
{
int error;
int er;
struct file *fp = NULL;
#ifdef POSTK_DEBUG_TEMP_FIX_22 /* iterate_dir() deadlock */
int ret = 0;
struct pci_file_name *cur;
struct pci_file_name *next;
#endif /* POSTK_DEBUG_TEMP_FIX_22 */
dprintk("setup_pci_files(%p)\n", udp);
fp = filp_open("/sys/bus/pci/devices", O_DIRECTORY, 0);
@@ -1105,13 +1142,18 @@ static int setup_pci_files(struct mcctrl_usrdata *udp)
goto out;
}
#ifdef POSTK_DEBUG_TEMP_FIX_22 /* iterate_dir() deadlock */
error = mcctrl_vfs_readdir(fp, &pci_file_name_gen, udp);
#else /* POSTK_DEBUG_TEMP_FIX_22 */
error = mcctrl_vfs_readdir(fp, &setup_one_pci, udp);
#endif /* POSTK_DEBUG_TEMP_FIX_22 */
if (error) {
eprintk("mcctrl:setup_pci_files:"
"mcctrl_vfs_readdir failed. %d\n", error);
goto out;
}
#ifdef POSTK_DEBUG_TEMP_FIX_22 /* iterate_dir() deadlock */
list_for_each_entry_safe(cur, next, &pci_file_name_list, chain) {
if (!ret) {
ret = setup_one_pci(udp, cur->name);
@@ -1120,6 +1162,7 @@ static int setup_pci_files(struct mcctrl_usrdata *udp)
kfree(cur->name);
kfree(cur);
}
#endif /* POSTK_DEBUG_TEMP_FIX_22 */
error = 0;
out:
@@ -1133,7 +1176,6 @@ out:
dprintk("setup_pci_files(%p): %d\n", udp, error);
return error;
} /* setup_pci_files() */
#endif // SETUP_PCI_FILES
void setup_sysfs_files(ihk_os_t os)
{
@@ -1173,9 +1215,7 @@ void setup_sysfs_files(ihk_os_t os)
setup_cpus_sysfs_files(udp);
setup_node_files(udp);
setup_cpus_sysfs_files_node_link(udp);
#ifdef SETUP_PCI_FILES
setup_pci_files(udp);
#endif
//setup_pci_files(udp);
/* Indicate sysfs files setup completion for boot script */
error = sysfsm_mkdirf(os, NULL, "/sys/setup_complete");

View File

@@ -21,7 +21,7 @@ endif
endif
ifeq ($(BUILD_MODULE_TMP),rhel)
ifeq ($(BUILD_MODULE),none)
BUILD_MODULE=$(shell if [ ${LINUX_VERSION_CODE} -eq 199168 -a ${RHEL_RELEASE} -ge 327 -a ${RHEL_RELEASE} -le 862 ]; then echo "linux-3.10.0-327.36.1.el7"; else echo "none"; fi)
BUILD_MODULE=$(shell if [ ${LINUX_VERSION_CODE} -eq 199168 -a ${RHEL_RELEASE} -ge 327 -a ${RHEL_RELEASE} -le 693 ]; then echo "linux-3.10.0-327.36.1.el7"; else echo "none"; fi)
endif
ifeq ($(BUILD_MODULE),none)
BUILD_MODULE=$(shell if [ ${LINUX_VERSION_CODE} -ge 262144 -a ${LINUX_VERSION_CODE} -lt 262400 ]; then echo "linux-4.0.9"; else echo "none"; fi)
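Editor's note on the magic numbers: LINUX_VERSION_CODE is KERNEL_VERSION(a,b,c) = (a<<16)+(b<<8)+c, so 199168 corresponds to 3.10.0 and the 262144-262400 window covers the 4.0.x kernels.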

View File

@@ -15,7 +15,6 @@
#include <linux/rbtree.h>
#include <linux/security.h>
#include <linux/cred.h>
#include <linux/version.h>
#include "overlayfs.h"
struct ovl_cache_entry {
@@ -35,18 +34,10 @@ struct ovl_dir_cache {
struct list_head entries;
};
/* vfs_readdir vs. iterate_dir compat */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 11, 0) || \
(defined(RHEL_RELEASE_CODE) && RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7, 5))
#define USE_ITERATE_DIR 1
#endif
#ifndef USE_ITERATE_DIR
struct dir_context {
const filldir_t actor;
//loff_t pos;
};
#endif
struct ovl_readdir_data {
struct dir_context ctx;
@@ -265,11 +256,7 @@ static inline int ovl_dir_read(struct path *realpath,
do {
rdd->count = 0;
rdd->err = 0;
#ifdef USE_ITERATE_DIR
err = iterate_dir(realfile, &rdd->ctx);
#else
err = vfs_readdir(realfile, rdd->ctx.actor, rdd);
#endif
if (err >= 0)
err = rdd->err;
} while (!err && rdd->count);
@@ -378,22 +365,6 @@ static struct ovl_dir_cache *ovl_cache_get(struct dentry *dentry)
return cache;
}
#ifdef USE_ITERATE_DIR
struct iterate_wrapper {
struct dir_context ctx;
filldir_t actor;
void *buf;
};
static int ovl_wrap_readdir(void *ctx, const char *name, int namelen,
loff_t offset, u64 ino, unsigned int d_type)
{
struct iterate_wrapper *w = ctx;
return w->actor(w->buf, name, namelen, offset, ino, d_type);
}
#endif
static int ovl_readdir(struct file *file, void *buf, filldir_t filler)
{
struct ovl_dir_file *od = file->private_data;
@@ -405,16 +376,7 @@ static int ovl_readdir(struct file *file, void *buf, filldir_t filler)
ovl_dir_reset(file);
if (od->is_real) {
#ifdef USE_ITERATE_DIR
struct iterate_wrapper w = {
.ctx.actor = ovl_wrap_readdir,
.actor = filler,
.buf = buf,
};
res = iterate_dir(od->realfile, &w.ctx);
#else
res = vfs_readdir(od->realfile, filler, buf);
#endif
file->f_pos = od->realfile->f_pos;
return res;

View File

@@ -13,8 +13,6 @@ KDIR ?= @KDIR@
ARCH=@ARCH@
CFLAGS=-Wall -O -I. -I$(VPATH)/arch/${ARCH} -I${IHKDIR} -I@abs_builddir@/../../../ihk/linux/include
LDFLAGS=@LDFLAGS@
CPPFLAGS_SYSCALL_INTERCEPT=@CPPFLAGS_SYSCALL_INTERCEPT@
LDFLAGS_SYSCALL_INTERCEPT=@LDFLAGS_SYSCALL_INTERCEPT@
RPATH=$(shell echo $(LDFLAGS)|awk '{for(i=1;i<=NF;i++){if($$i~/^-L/){w=$$i;sub(/^-L/,"-Wl,-rpath,",w);print w}}}')
VPATH=@abs_srcdir@
TARGET=mcexec libsched_yield ldump2mcdump.so
@@ -23,17 +21,12 @@ LIBS=@LIBS@
IHKDIR ?= $(VPATH)/../../../ihk/linux/include/
MCEXEC_LIBS=-lmcexec -lrt -lnuma -pthread -L@abs_builddir@/../../../ihk/linux/user -lihk -Wl,-rpath,$(MCKERNEL_LIBDIR)
ENABLE_QLMPI=@ENABLE_QLMPI@
WITH_SYSCALL_INTERCEPT=@WITH_SYSCALL_INTERCEPT@
ifeq ($(ENABLE_QLMPI),yes)
MCEXEC_LIBS += -lmpi
TARGET+= libqlmpi.so ql_server ql_mpiexec_start ql_mpiexec_finalize ql_talker libqlfort.so
endif
ifeq ($(WITH_SYSCALL_INTERCEPT),yes)
TARGET += syscall_intercept.so
endif
ifeq ($(ARCH), arm64)
CFLAGS += $(foreach i, $(shell seq 1 100), $(addprefix -DPOSTK_DEBUG_ARCH_DEP_, $(i)))
CFLAGS += $(foreach i, $(shell seq 1 100), $(addprefix -DPOSTK_DEBUG_TEMP_FIX_, $(i)))
@@ -47,7 +40,7 @@ mcexec: mcexec.c libmcexec.a
# POSTK_DEBUG_ARCH_DEP_34, eclair arch depend separate.
ifeq ($(ARCH), arm64)
eclair: eclair.c arch/$(ARCH)/arch-eclair.c
$(CC) -I.. -I. -I./arch/$(ARCH)/include -I$(VPATH)/.. -I$(VPATH) -I$(VPATH)/arch/$(ARCH)/include $(CFLAGS) -o $@ $^ $(LIBS) -ldl -lz
$(CC) -I.. -I. -I./arch/$(ARCH)/include -I$(VPATH)/.. -I$(VPATH) -I$(VPATH)/arch/$(ARCH)/include $(CFLAGS) -o $@ $^ $(LIBS)
else
eclair: eclair.c arch/$(ARCH)/arch-eclair.c
$(CC) -I.. -I$(VPATH) -I$(VPATH)/arch/$(ARCH)/include $(CFLAGS) -o $@ $^ $(LIBS)
@@ -59,12 +52,6 @@ ldump2mcdump.so: ldump2mcdump.c
libsched_yield: libsched_yield.c
$(CC) -shared -fPIC -Wl,-soname,sched_yield.so.1 -o libsched_yield.so.1.0.0 $^ -lc -ldl
syscall_intercept.so: syscall_intercept.c libsyscall_intercept_arch.a
$(CC) $(CPPFLAGS_SYSCALL_INTERCEPT) -g -O2 $(LDFLAGS_SYSCALL_INTERCEPT) -lsyscall_intercept -fpic -shared -L. -lsyscall_intercept_arch $^ -o $@
libsyscall_intercept_arch.a::
+(cd arch/${ARCH}; $(MAKE))
libmcexec.a::
+(cd arch/${ARCH}; $(MAKE))
@@ -112,9 +99,6 @@ ifeq ($(ENABLE_QLMPI),yes)
install -m 755 ql_mpiexec_start $(BINDIR)
install -m 755 ql_mpiexec_finalize $(BINDIR)
install -m 755 ql_talker $(SBINDIR)
endif
ifeq ($(WITH_SYSCALL_INTERCEPT),yes)
install -m 755 syscall_intercept.so $(MCKERNEL_LIBDIR)
endif
@uncomment_if_ENABLE_MEMDUMP@install -m 755 eclair $(BINDIR)
@uncomment_if_ENABLE_MEMDUMP@install -m 755 vmcore2mckdump $(BINDIR)

View File

@@ -4,7 +4,7 @@ BINDIR=@BINDIR@
KDIR ?= @KDIR@
CFLAGS=-Wall -O -I.
VPATH=@abs_srcdir@
TARGET=../../libmcexec.a ../../libsyscall_intercept_arch.a
TARGET=../../libmcexec.a
LIBS=@LIBS@
all: $(TARGET)
@@ -18,12 +18,6 @@ archdep.o: archdep.S
arch_syscall.o: arch_syscall.c
$(CC) -c -I${KDIR} $(CFLAGS) $(EXTRA_CFLAGS) -fPIE -pie -pthread $<
../../libsyscall_intercept_arch.a: archdep_c.o
$(AR) cr ../../libsyscall_intercept_arch.a archdep_c.o
archdep_c.o: archdep_c.c
$(CC) -c -I${KDIR} $(CFLAGS) $(EXTRA_CFLAGS) -fPIE -pie -pthread $<
clean:
$(RM) $(TARGET) *.o

View File

@@ -42,7 +42,7 @@ int print_kregs(char *rbp, size_t rbp_size, const struct arch_kregs *kregs)
}
for (i = 0; i < sizeof(regs_1)/sizeof(regs_1[0]); i++) { /* rsi, rdi, rbp, rsp */
ret = print_bin(rbp, rbp_size, regs_1 + i, sizeof(regs_1[0]));
ret = print_bin(rbp, rbp_size, (void *)regs_1[i], sizeof(regs_1[0]));
if (ret < 0) {
return ret;
}
@@ -62,7 +62,7 @@ int print_kregs(char *rbp, size_t rbp_size, const struct arch_kregs *kregs)
}
for (i = 0; i < sizeof(regs_2)/sizeof(regs_2[0]); i++) { /* r12-r15 */
ret = print_bin(rbp, rbp_size, regs_2 + i, sizeof(regs_2[0]));
ret = print_bin(rbp, rbp_size, (void *)regs_2[i], sizeof(regs_2[0]));
if (ret < 0) {
return ret;
}

View File

@@ -67,12 +67,6 @@ get_syscall_arg6(syscall_args *args)
return args->r9;
}
static inline unsigned long
get_syscall_rip(syscall_args *args)
{
return args->rip;
}
static inline void
set_syscall_number(syscall_args *args, unsigned long value)
{

View File

@@ -48,7 +48,7 @@ archdep_syscall(struct syscall_wait_desc *w, long *ret)
if (*ret >= PATH_MAX) {
*ret = -ENAMETOOLONG;
}
if (*ret < 0) {
if (ret < 0) {
return 0;
}
__dprintf("open: %s\n", pathbuf);

View File

@@ -1,22 +1,15 @@
/*
Calling convention:
arg: rdi, rsi, rdx, rcx, r8, r9
ret: rax
arg: rdi, rsi, rdx, rcx, r8, r9
ret: rax
rdi: fd
rsi: cmd
rdx: param
rcx: save area
r8: new thread context
Syscall convention:
syscall number: rax
arg: rdi, rsi, rdx, r10, r8, r9
return addr: rcx
rdi: fd
rsi: cmd
rdx: param
rax syscall number
syscall: (rax:num) rdi rsi rdx r10 r8 r9 (rcx:ret addr)
fd, cmd, param
rdi: fd
rsi: cmd
rdx: param
rcx: save area
r8: new thread context
*/
.global switch_ctx
@@ -98,7 +91,6 @@ switch_ctx:
1:
mov $0xffffffffffffffff,%eax
retq
2:
pushq %rax
movq $158,%rax /* arch_prctl */
@@ -154,3 +146,4 @@ compare_and_swap_int:
lock
cmpxchgl %edx,0(%rdi)
retq

View File

@@ -1,52 +0,0 @@
/*
function call convention
rdi, rsi, rdx, rcx, r8, r9: IN arguments
rax: OUT return value
syscall convention:
rax: IN syscall number
rdi, rsi, rdx, r10, r8, r9: IN arguments
rax: OUT return value
rcx, r11: CLOBBER
*/
long uti_syscall6(long syscall_number, long arg0, long arg1, long arg2, long arg3, long arg4, long arg5)
{
long ret;
asm volatile ("movq %[arg3],%%r10; movq %[arg4],%%r8; movq %[arg5],%%r9; syscall"
: "=a" (ret)
: "a" (syscall_number),
"D" (arg0), "S" (arg1), "d" (arg2),
[arg3] "g" (arg3), [arg4] "g" (arg4), [arg5] "g" (arg5)
: "rcx", "r11", "r10", "r8", "r9", "memory");
return ret;
}
long uti_syscall3(long syscall_number, long arg0, long arg1, long arg2)
{
long ret;
asm volatile ("syscall"
: "=a" (ret)
: "a" (syscall_number), "D" (arg0), "S" (arg1), "d" (arg2)
: "rcx", "r11", "memory");
return ret;
}
long uti_syscall1(long syscall_number, long arg0)
{
long ret;
asm volatile ("syscall"
: "=a" (ret)
: "a" (syscall_number), "D" (arg0)
: "rcx", "r11", "memory");
return ret;
}
long uti_syscall0(long syscall_number)
{
long ret;
asm volatile ("syscall"
: "=a" (ret)
: "a" (syscall_number)
: "rcx", "r11", "memory");
return ret;
}
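Editor's note: the deleted syscall_intercept.c further below was the main consumer of these wrappers; a minimal usage sketch in the same style (fd and desc are placeholders):
	int tid = uti_syscall0(__NR_gettid);		/* no-argument syscall */
	uti_syscall3(__NR_ioctl, fd,			/* three-argument syscall, */
		     MCEXEC_UP_SYSCALL_THREAD,		/* as used by the hook below */
		     (long)&desc);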

View File

@@ -1,6 +1,4 @@
#include "../include/uprotocol.h"
extern int switch_ctx(int fd, unsigned long cmd, struct uti_save_fs_desc *desc, void *lctx, void *rctx);
extern int switch_ctx(int fd, unsigned long cmd, void **param, void *lctx, void *rctx);
extern unsigned long compare_and_swap(unsigned long *addr, unsigned long old, unsigned long new);
extern unsigned int compare_and_swap_int(unsigned int *addr, unsigned int old, unsigned int new);
extern int archdep_syscall(struct syscall_wait_desc *w, long *ret);

View File

@@ -1,5 +0,0 @@
extern long uti_syscall6(long syscall_number, long arg0, long arg1, long arg2, long arg3, long arg4, long arg5);
extern long uti_syscall3(long syscall_number, long arg0, long arg1, long arg2);
extern long uti_syscall1(long syscall_number, long arg0);
extern long uti_syscall0(long syscall_number);

View File

@@ -11,9 +11,7 @@
typedef int (*int_void_fn)(void);
#if 0
static int_void_fn orig_sched_yield = 0;
#endif
int sched_yield(void)
{

View File

@@ -73,6 +73,13 @@ e.g.: 10k means 10Kibyte, 100M 100Mibyte, 1G 1Gibyte
Enable system call profiling. After the execution, profiling
information may be obtained by the ihkosctl tool.
.TP
.B -m N
Specify the NUMA memory policy. In the case of Quadrant&Flat mode, NUMA node
0 is CPU cores and NUMA node 1 is MCDRAM. Thus, option "-m 1"
means that user's memory areas are assigned in MCDRAM.
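For example, "mcexec -m 1 ./a.out" (where "./a.out" stands for the user's program) requests that the program's memory be taken from MCDRAM, i.e. NUMA node 1.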
.TP
.B --mpol-no-heap, --mpol-no-stack, --mpol-no-bss,
Disregard NUMA memory policy in the heap/stack/BSS areas.
@@ -93,7 +100,7 @@ This option eliminates potential kernel resource contention by
avoiding page faults in the shared memory region.
.TP
.B -m N, --mpol-threshold=N
.B -M N, --mpol-threshold=N
Specify the threshold of memory size for respecting the memory
allocation policy in NUMA machines. If the size of memory allocation
is smaller than the one specified in this option, the memory area is

File diff suppressed because it is too large

View File

@@ -1,139 +0,0 @@
#include <libsyscall_intercept_hook_point.h>
#include <errno.h>
#include <stdio.h>
#include <stdint.h>
#include <syscall.h>
#include <sys/time.h>
#include <sys/resource.h>
#include "../include/uprotocol.h"
#include "../include/uti.h"
#include "./archdep_uti.h"
static struct uti_desc uti_desc;
#define DEBUG_UTI
static int
hook(long syscall_number,
long arg0, long arg1,
long arg2, long arg3,
long arg4, long arg5,
long *result)
{
//return 1; /* debug */
int tid = uti_syscall0(__NR_gettid);
struct terminate_thread_desc term_desc;
unsigned long code;
int stack_top;
if (!uti_desc.start_syscall_intercept) {
return 1; /* System call isn't taken over */
}
if (tid != uti_desc.mck_tid) {
if (uti_desc.syscalls2 && syscall_number >= 0 && syscall_number < 512) {
uti_desc.syscalls2[syscall_number]++;
}
return 1;
}
#ifdef DEBUG_UTI
if (uti_desc.syscalls && syscall_number >= 0 && syscall_number < 512) {
uti_desc.syscalls[syscall_number]++;
}
#endif
switch (syscall_number) {
case __NR_gettid:
*result = uti_desc.mck_tid;
return 0;
case __NR_futex:
case __NR_brk:
case __NR_mmap:
case __NR_munmap:
case __NR_mprotect:
case __NR_mremap:
/* Overflow check */
if (uti_desc.syscall_stack_top == -1) {
*result = -ENOMEM;
return 0;
}
/* Sanity check */
if (uti_desc.syscall_stack_top < 0 || uti_desc.syscall_stack_top >= UTI_SZ_SYSCALL_STACK) {
*result = -EINVAL;
return 0;
}
/* Store the return value in the stack to prevent it from getting corrupted
when an interrupt happens just after ioctl() and before copying the return
value to *result */
stack_top = __sync_fetch_and_sub(&uti_desc.syscall_stack_top, 1);
uti_desc.syscall_stack[stack_top].number = syscall_number;
uti_desc.syscall_stack[stack_top].args[0] = arg0;
uti_desc.syscall_stack[stack_top].args[1] = arg1;
uti_desc.syscall_stack[stack_top].args[2] = arg2;
uti_desc.syscall_stack[stack_top].args[3] = arg3;
uti_desc.syscall_stack[stack_top].args[4] = arg4;
uti_desc.syscall_stack[stack_top].args[5] = arg5;
uti_desc.syscall_stack[stack_top].uti_clv = uti_desc.uti_clv;
uti_desc.syscall_stack[stack_top].ret = -EINVAL;
uti_syscall3(__NR_ioctl, uti_desc.fd, MCEXEC_UP_SYSCALL_THREAD, (long)(uti_desc.syscall_stack + stack_top));
*result = uti_desc.syscall_stack[stack_top].ret;
/* push syscall_struct list */
__sync_fetch_and_add(&uti_desc.syscall_stack_top, 1);
return 0; /* System call is taken over */
case __NR_exit_group:
code = 0x100000000;
goto make_remote_thread_exit;
case __NR_exit:
code = 0;
make_remote_thread_exit:
/* Make migrated-to-Linux thread on the McKernel side call do_exit() or terminate() */
term_desc.pid = uti_desc.pid;
term_desc.tid = uti_desc.tid; /* tid of mcexec */
term_desc.code = code | ((arg0 & 255) << 8);
term_desc.tsk = uti_desc.key;
uti_syscall3(__NR_ioctl, uti_desc.fd, MCEXEC_UP_TERMINATE_THREAD, (long)&term_desc);
return 1;
case __NR_clone:
case __NR_fork:
case __NR_vfork:
case __NR_execve:
*result = -ENOSYS;
return 0;
#if 0 /* debug */
case __NR_set_robust_list:
*result = -ENOSYS;
return 0;
#endif
case 888:
*result = (long)&uti_desc;
return 0;
default:
return 1;
}
return 0;
}
static __attribute__((constructor)) void
init(void)
{
/* Set up the callback function */
intercept_hook_point = hook;
/* Initialize uti_desc */
uti_desc.syscall_stack_top = UTI_SZ_SYSCALL_STACK - 1;
/* Pass address of uti_desc to McKernel */
uti_syscall1(733, (unsigned long)&uti_desc);
}
static __attribute__((destructor)) void
dtor(void)
{
}


Submodule ihk deleted from d9c74adf3f

View File

@@ -6,8 +6,9 @@ IHKDIR=$(IHKBASE)/$(TARGETDIR)
OBJS = init.o mem.o debug.o mikc.o listeners.o ap.o syscall.o cls.o host.o
OBJS += process.o copy.o waitq.o futex.o timer.o plist.o fileobj.o shmobj.o
OBJS += zeroobj.o procfs.o devobj.o sysfs.o xpmem.o profile.o freeze.o
OBJS += rbtree.o hugefileobj.o
OBJS += rbtree.o
OBJS += pager.o
OBJS += file_ops.o user_sdma.o sdma.o user_exp_rcv.o chip.o
# POSTK_DEBUG_ARCH_DEP_18 coredump arch separation.
DEPSRCS=$(wildcard $(SRC)/*.c)
@@ -19,7 +20,7 @@ endif
CFLAGS += -I$(SRC)/include -I@abs_builddir@/../ -I@abs_builddir@/include -D__KERNEL__ -g -fno-omit-frame-pointer -fno-inline -fno-inline-small-functions
ifneq ($(ARCH), arm64)
CFLAGS += -mcmodel=large -mno-red-zone -mno-sse
CFLAGS += -mcmodel=large -mno-red-zone
endif
LDFLAGS += -e arch_start
IHKOBJ = ihk/ihk.o

View File

@@ -29,13 +29,15 @@
#include <time.h>
#include <syscall.h>
#include <rusage_private.h>
#include <debug.h>
//#define DEBUG_PRINT_AP
#ifdef DEBUG_PRINT_AP
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define dkprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#else
#define dkprintf(...) do { } while (0)
#define ekprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#endif
int num_processors = 1;
@@ -66,6 +68,11 @@ static void ap_wait(void)
init_host_ikc2mckernel();
init_host_ikc2linux(ikc_cpu);
mcs_lock_unlock_noirq(&ap_syscall_semaphore, &mcs_node);
{
extern void hfi1_kmalloc_cache_prealloc(void);
hfi1_kmalloc_cache_prealloc();
}
}
/* one of them listens */
@@ -207,10 +214,8 @@ store_fake_cpu_info(struct sysfs_ops *ops0, void *instance, void *buf,
static struct fake_cpu_info_ops show_fci_online = {
.member = ONLINE,
.ops = {
.show = &show_fake_cpu_info,
.store = &store_fake_cpu_info,
},
.ops.show = &show_fake_cpu_info,
.ops.store = &store_fake_cpu_info,
};
void

kernel/chip.c (new file, 126 lines)
View File

@@ -0,0 +1,126 @@
/*
* Copyright(c) 2015, 2016 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
*
* GPL LICENSE SUMMARY
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* BSD LICENSE
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* - Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* - Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* - Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
*/
/*
* This file contains all of the code that is specific to the HFI chip,
* or what we use of them.
*/
#include <hfi1/hfi.h>
#include <hfi1/chip_registers.h>
#include <hfi1/chip.h>
//#define DEBUG_PRINT_CHIP
#ifdef DEBUG_PRINT_CHIP
#define dkprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if(0) kprintf(__VA_ARGS__); } while (0)
#endif
/*
* index is the index into the receive array
*/
void hfi1_put_tid(struct hfi1_devdata *dd, u32 index,
u32 type, unsigned long pa, u16 order)
{
u64 reg;
void __iomem *base = (dd->rcvarray_wc ? dd->rcvarray_wc :
(dd->kregbase1 + RCV_ARRAY));
if (!(dd->flags & HFI1_PRESENT))
goto done;
if (type == PT_INVALID) {
pa = 0;
} else if (type > PT_INVALID) {
kprintf("unexpected receive array type %u for index %u, not handled\n",
type, index);
goto done;
}
#ifdef TIDRDMA_DEBUG
hfi1_cdbg(TID, "type %s, index 0x%x, pa 0x%lx, bsize 0x%lx",
pt_name(type), index, pa, (unsigned long)order);
#endif
#define RT_ADDR_SHIFT 12 /* 4KB kernel address boundary */
reg = RCV_ARRAY_RT_WRITE_ENABLE_SMASK
| (u64)order << RCV_ARRAY_RT_BUF_SIZE_SHIFT
| ((pa >> RT_ADDR_SHIFT) & RCV_ARRAY_RT_ADDR_MASK)
<< RCV_ARRAY_RT_ADDR_SHIFT;
dkprintf("type %d, index 0x%x, pa 0x%lx, bsize 0x%lx, reg 0x%llx\n",
type, index, pa, (unsigned long)order, reg);
writeq(reg, base + (index * 8));
if (type == PT_EAGER)
/*
* Eager entries are written one-by-one so we have to push them
* after we write the entry.
*/
flush_wc();
done:
return;
}
void hfi1_clear_tids(struct hfi1_ctxtdata *rcd)
{
struct hfi1_devdata *dd = rcd->dd;
u32 i;
#if 0
/* this could be optimized */
for (i = rcd->eager_base; i < rcd->eager_base +
rcd->egrbufs.alloced; i++)
hfi1_put_tid(dd, i, PT_INVALID, 0, 0);
#endif
for (i = rcd->expected_base;
i < rcd->expected_base + rcd->expected_count; i++)
hfi1_put_tid(dd, i, PT_INVALID, 0, 0);
}

View File

@@ -1,28 +1,24 @@
PHDRS
{
text PT_LOAD FLAGS(5);
data PT_LOAD FLAGS(7);
data PT_LOAD FLAGS(7);
}
SECTIONS
{
. = 0xffffffff80001000;
_head = .;
_head = .;
.text : {
*(.text);
} : text
.text : {
*(.text);
} : text
. = ALIGN(4096);
. = ALIGN(4096);
.data : {
*(.data)
*(.data.*)
. = ALIGN(8);
__start___verbose = .;
*(__verbose);
__stop___verbose = .;
*(.data)
*(.data.*)
} :data
.rodata : {
*(.rodata .rodata.*)
*(.rodata .rodata.*)
} :data
.vsyscall : ALIGN(0x1000) {
@@ -41,14 +37,14 @@ SECTIONS
. = ALIGN(4096);
} : data = 0xf4
.bss : {
*(.bss .bss.*)
}
. = ALIGN(4096);
_end = .;
.bss : {
*(.bss .bss.*)
}
. = ALIGN(4096);
_end = .;
/DISCARD/ : {
*(.eh_frame)
*(.note.gnu.build-id)
*(.eh_frame)
*(.note.gnu.build-id)
}
}

View File

@@ -1,28 +1,24 @@
PHDRS
{
text PT_LOAD FLAGS(5);
data PT_LOAD FLAGS(7);
data PT_LOAD FLAGS(7);
}
SECTIONS
{
. = 0xffffffff80001000;
_head = .;
_head = .;
.text : {
*(.text);
} : text
.text : {
*(.text);
} : text
. = ALIGN(4096);
. = ALIGN(4096);
.data : {
*(.data)
*(.data.*)
. = ALIGN(8);
__start___verbose = .;
*(__verbose);
__stop___verbose = .;
*(.data)
*(.data.*)
} :data
.rodata : {
*(.rodata .rodata.*)
*(.rodata .rodata.*)
} :data
.vsyscall : ALIGN(0x1000) {
@@ -41,14 +37,14 @@ SECTIONS
. = ALIGN(4096);
} : data = 0xf4
.bss : {
*(.bss .bss.*)
}
. = ALIGN(4096);
_end = .;
.bss : {
*(.bss .bss.*)
}
. = ALIGN(4096);
_end = .;
/DISCARD/ : {
*(.eh_frame)
*(.note.gnu.build-id)
*(.eh_frame)
*(.note.gnu.build-id)
}
}

View File

@@ -1,28 +1,24 @@
PHDRS
{
text PT_LOAD FLAGS(5);
data PT_LOAD FLAGS(7);
data PT_LOAD FLAGS(7);
}
SECTIONS
{
. = 0xffffffff80001000;
_head = .;
_head = .;
.text : {
*(.text);
} : text
.text : {
*(.text);
} : text
. = ALIGN(4096);
. = ALIGN(4096);
.data : {
*(.data)
*(.data.*)
. = ALIGN(8);
__start___verbose = .;
*(__verbose);
__stop___verbose = .;
*(.data)
*(.data.*)
} :data
.rodata : {
*(.rodata .rodata.*)
*(.rodata .rodata.*)
} :data
.vsyscall : ALIGN(0x1000) {
@@ -41,10 +37,10 @@ SECTIONS
. = ALIGN(4096);
} : data = 0xf4
.bss : {
*(.bss .bss.*)
}
. = ALIGN(4096);
_end = .;
.bss : {
*(.bss .bss.*)
}
. = ALIGN(4096);
_end = .;
}

View File

@@ -16,10 +16,6 @@ SECTIONS
.data : {
*(.data)
*(.data.*)
. = ALIGN(8);
__start___verbose = .;
*(__verbose);
__stop___verbose = .;
} :data
.rodata : {
*(.rodata .rodata.*)

View File

@@ -16,10 +16,6 @@ SECTIONS
.data : {
*(.data)
*(.data.*)
. = ALIGN(8);
__start___verbose = .;
*(__verbose);
__stop___verbose = .;
} :data
.rodata : {
*(.rodata .rodata.*)

View File

@@ -16,10 +16,6 @@ SECTIONS
.data : {
*(.data)
*(.data.*)
. = ALIGN(8);
__start___verbose = .;
*(__verbose);
__stop___verbose = .;
} :data
.rodata : {
*(.rodata .rodata.*)

View File

@@ -16,10 +16,6 @@ SECTIONS
.data : {
*(.data)
*(.data.*)
. = ALIGN(8);
__start___verbose = .;
*(__verbose);
__stop___verbose = .;
} :data
.rodata : {
*(.rodata .rodata.*)

View File

@@ -1,28 +1,24 @@
PHDRS
{
text PT_LOAD FLAGS(5);
data PT_LOAD FLAGS(7);
data PT_LOAD FLAGS(7);
}
SECTIONS
{
. = 0xFFFFFFFFFE801000;
_head = .;
_head = .;
.text : {
*(.text);
} : text
.text : {
*(.text);
} : text
. = ALIGN(4096);
. = ALIGN(4096);
.data : {
*(.data)
*(.data.*)
. = ALIGN(8);
__start___verbose = .;
*(__verbose);
__stop___verbose = .;
*(.data)
*(.data.*)
} :data
.rodata : {
*(.rodata .rodata.*)
*(.rodata .rodata.*)
} :data
.vsyscall : ALIGN(0x1000) {
@@ -41,9 +37,9 @@ SECTIONS
. = ALIGN(4096);
} : data = 0xf4
.bss : {
*(.bss .bss.*)
}
. = ALIGN(4096);
_end = .;
.bss : {
*(.bss .bss.*)
}
. = ALIGN(4096);
_end = .;
}

View File

@@ -18,9 +18,6 @@
#include <ihk/lock.h>
#include <ihk/monitor.h>
#include <errno.h>
#include <sysfs.h>
#include <debug.h>
#include <limits.h>
struct ihk_kmsg_buf *kmsg_buf;
@@ -87,8 +84,7 @@ void kputs(char *buf)
debug_spin_unlock_irqrestore(&kmsg_buf->lock, flags_inner);
kprintf_unlock(flags_outer);
if (irqflags_can_interrupt(flags_outer) &&
DEBUG_KMSG_USED > IHK_KMSG_HIGH_WATER_MARK) {
if (DEBUG_KMSG_USED > IHK_KMSG_HIGH_WATER_MARK) {
eventfd(IHK_OS_EVENTFD_TYPE_KMSG);
ihk_mc_delay_us(IHK_KMSG_NOTIFY_DELAY);
}
@@ -127,8 +123,8 @@ int __kprintf(const char *format, ...)
}
debug_spin_unlock_irqrestore(&kmsg_buf->lock, flags_inner);
if (irqflags_can_interrupt(flags_inner) &&
DEBUG_KMSG_USED > IHK_KMSG_HIGH_WATER_MARK) {
if (DEBUG_KMSG_USED > IHK_KMSG_HIGH_WATER_MARK) {
eventfd(IHK_OS_EVENTFD_TYPE_KMSG);
ihk_mc_delay_us(IHK_KMSG_NOTIFY_DELAY);
}
@@ -169,8 +165,7 @@ int kprintf(const char *format, ...)
debug_spin_unlock_irqrestore(&kmsg_buf->lock, flags_inner);
kprintf_unlock(flags_outer);
if (irqflags_can_interrupt(flags_outer) &&
DEBUG_KMSG_USED > IHK_KMSG_HIGH_WATER_MARK) {
if (DEBUG_KMSG_USED > IHK_KMSG_HIGH_WATER_MARK) {
eventfd(IHK_OS_EVENTFD_TYPE_KMSG);
ihk_mc_delay_us(IHK_KMSG_NOTIFY_DELAY);
}
@@ -183,147 +178,3 @@ void kmsg_init()
{
ihk_mc_spinlock_init(&kmsg_lock);
}
extern struct ddebug __start___verbose[];
extern struct ddebug __stop___verbose[];
static ssize_t dynamic_debug_sysfs_show(struct sysfs_ops *ops,
void *instance, void *buf, size_t size)
{
struct ddebug *dbg;
ssize_t n = 0;
n = snprintf(buf, size, "# filename:lineno function flags format\n");
for (dbg = __start___verbose; dbg < __stop___verbose; dbg++) {
n += snprintf(buf + n, size - n, "%s:%d %s =%s\n",
dbg->file, dbg->line, dbg->func,
dbg->flags ? "p" : "_");
if (n >= size)
break;
}
return n;
}
static ssize_t dynamic_debug_sysfs_store(struct sysfs_ops *ops,
void *instance, void *buf, size_t size)
{
char *cur = buf;
char *file = NULL, *func = NULL;
long int line_start = 0, line_end = INT_MAX;
int set_flag = -1;
struct ddebug *dbg;
// assume line was new-line terminated and squash last newline
cur[size-1] = '\0';
/* basic line parsing, combinations of:
* file <file>
* func <func>
* line <line|line-line|line-|-line>
* and must end with [+-=][p_] (set/clear print flag)
*/
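/* Illustrative control lines accepted by this parser (the file and
 * function names below are examples from this tree, not a full list):
 *	"file devobj.c +p"			enable dkprintf() output in devobj.c
 *	"func devobj_get_page -p"		silence a single function again
 *	"file procfs.c line 100-200 =p"		enable only for a line range
 */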
again:
while (cur && cur < ((char *)buf) + size && *cur) {
dkprintf("looking at %.*s, size left %d\n",
size - (cur - (char *)buf), cur,
(char *)buf - cur + size);
if (strncmp(cur, "func ", 5) == 0) {
cur += 5;
func = cur;
} else if (strncmp(cur, "file ", 5) == 0) {
cur += 5;
file = cur;
} else if (strncmp(cur, "line ", 5) == 0) {
cur += 5;
if (*cur != '-') {
line_start = strtol(cur, &cur, 0);
}
if (*cur != '-') {
line_end = line_start;
} else {
cur++;
if (*cur == ' ' || *cur == '\0') {
line_end = INT_MAX;
} else {
line_end = strtol(cur, &cur, 0);
}
}
} else if (strchr("+-=", *cur)) {
switch ((*cur) + 256 * (*(cur+1))) {
case '+' + 256*'p':
case '=' + 256*'p':
set_flag = DDEBUG_PRINT;
break;
case '-' + 256*'p':
case '=' + 256*'_':
set_flag = DDEBUG_NONE;
break;
default:
kprintf("invalid flag: %.*s\n",
size - (cur - (char *)buf), cur);
return -EINVAL;
}
/* XXX check 3rd char is end of input or \n or ; */
cur += 3;
break;
} else {
kprintf("dynamic debug control: unrecognized keyword: %.*s\n",
size - (cur - (char *)buf), cur);
return -EINVAL;
}
cur = strpbrk(cur, " \n");
if (cur) {
*cur = '\0';
cur++;
}
}
dkprintf("func %s, file %s, lines %d-%d, flag %x\n",
func, file, line_start, line_end, set_flag);
if (set_flag < 0) {
kprintf("dynamic debug control: no flag set?\n");
return -EINVAL;
}
if (!func && !file) {
kprintf("at least file or func should be set\n");
return -EINVAL;
}
for (dbg = __start___verbose; dbg < __stop___verbose; dbg++) {
/* TODO: handle wildcards */
if ((!func || strcmp(func, dbg->func) == 0) &&
(!file || strcmp(file, dbg->file) == 0) &&
dbg->line >= line_start &&
dbg->line <= line_end) {
dbg->flags = set_flag;
}
}
if (cur && cur < ((char *)buf) + size && *cur)
goto again;
return size;
}
static struct sysfs_ops dynamic_debug_sysfs_ops = {
.show = &dynamic_debug_sysfs_show,
.store = &dynamic_debug_sysfs_store,
};
void dynamic_debug_sysfs_setup(void)
{
int error;
error = sysfs_createf(&dynamic_debug_sysfs_ops, NULL, 0644,
"/sys/kernel/debug/dynamic_debug/control");
if (error) {
kprintf("%s: ERROR: creating dynamic_debug/control sysfs file",
__func__);
}
}
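
A minimal user-space sketch of driving the control syntax parsed by dynamic_debug_sysfs_store() above (not part of this change; the function name in the command string and the error handling are illustrative, and it assumes the sysfs file created by dynamic_debug_sysfs_setup() is reachable at that path):
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Keyword/flag syntax follows the parser above: "func <name>",
	 * "file <name>", "line <a>-<b>", terminated by +p/-p/=p/=_. */
	const char cmd[] = "func devobj_get_page +p\n";
	int fd = open("/sys/kernel/debug/dynamic_debug/control", O_WRONLY);

	if (fd < 0) {
		perror("open control");
		return 1;
	}
	if (write(fd, cmd, strlen(cmd)) < 0)
		perror("write control");
	close(fd);
	return 0;
}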

View File

@@ -36,13 +36,15 @@
#include <syscall.h>
#include <process.h>
#include <rusage_private.h>
#include <debug.h>
//#define DEBUG_PRINT_DEVOBJ
#ifdef DEBUG_PRINT_DEVOBJ
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#endif
@@ -52,15 +54,16 @@ struct devobj {
uintptr_t handle;
off_t pfn_pgoff;
uintptr_t * pfn_table;
ihk_spinlock_t pfn_table_lock;
size_t npages;
};
static memobj_free_func_t devobj_free;
static memobj_release_func_t devobj_release;
static memobj_ref_func_t devobj_ref;
static memobj_get_page_func_t devobj_get_page;
static struct memobj_ops devobj_ops = {
.free = &devobj_free,
.release = &devobj_release,
.ref = &devobj_ref,
.get_page = &devobj_get_page,
};
@@ -85,9 +88,12 @@ int devobj_create(int fd, size_t len, off_t off, struct memobj **objp, int *maxp
int error;
struct devobj *obj = NULL;
const size_t npages = (len + PAGE_SIZE - 1) / PAGE_SIZE;
#ifdef POSTK_DEBUG_TEMP_FIX_36
const size_t uintptr_per_page = (PAGE_SIZE / sizeof(uintptr_t));
const size_t pfn_npages =
(npages + uintptr_per_page - 1) / uintptr_per_page;
const size_t pfn_npages = (npages + uintptr_per_page - 1) / uintptr_per_page;
#else
const size_t pfn_npages = (npages / (PAGE_SIZE / sizeof(uintptr_t))) + 1;
#endif /*POSTK_DEBUG_TEMP_FIX_36*/
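/* Illustration (not part of the diff): assuming 4 KiB pages and 8-byte
 * entries, uintptr_per_page is 512. For npages == 512 the round-up form
 * allocates exactly 1 page for the PFN table, while the older
 * npages/512 + 1 form would allocate 2. */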
dkprintf("%s: fd: %d, len: %lu, off: %lu \n", __FUNCTION__, fd, len, off);
@@ -116,8 +122,6 @@ int devobj_create(int fd, size_t len, off_t off, struct memobj **objp, int *maxp
ihk_mc_syscall_arg4(&ctx) = virt_to_phys(&result);
ihk_mc_syscall_arg5(&ctx) = prot | populate_flags;
memset(&result, 0, sizeof(result));
error = syscall_generic_forwarding(__NR_mmap, &ctx);
if (error) {
kprintf("%s: error: fd: %d, len: %lu, off: %lu map failed.\n",
@@ -131,7 +135,6 @@ int devobj_create(int fd, size_t len, off_t off, struct memobj **objp, int *maxp
obj->memobj.ops = &devobj_ops;
obj->memobj.flags = MF_HAS_PAGER | MF_DEV_FILE;
obj->memobj.size = len;
ihk_atomic_set(&obj->memobj.refcnt, 1);
obj->handle = result.handle;
dkprintf("%s: path=%s\n", __FUNCTION__, result.path);
@@ -145,9 +148,10 @@ int devobj_create(int fd, size_t len, off_t off, struct memobj **objp, int *maxp
strncpy(obj->memobj.path, result.path, PATH_MAX);
}
obj->pfn_pgoff = off >> PAGE_SHIFT;
obj->ref = 1;
obj->pfn_pgoff = off / PAGE_SIZE;
obj->npages = npages;
ihk_mc_spinlock_init(&obj->pfn_table_lock);
ihk_mc_spinlock_init(&obj->memobj.lock);
error = 0;
*objp = to_memobj(obj);
@@ -166,50 +170,81 @@ out:
return error;
}
static void devobj_free(struct memobj *memobj)
static void devobj_ref(struct memobj *memobj)
{
struct devobj *obj = to_devobj(memobj);
dkprintf("devobj_ref(%p %lx):\n", obj, obj->handle);
memobj_lock(&obj->memobj);
++obj->ref;
memobj_unlock(&obj->memobj);
return;
}
static void devobj_release(struct memobj *memobj)
{
struct devobj *obj = to_devobj(memobj);
struct devobj *free_obj = NULL;
uintptr_t handle;
const size_t uintptr_per_page = (PAGE_SIZE / sizeof(uintptr_t));
#ifndef POSTK_DEBUG_TEMP_FIX_36
const size_t pfn_npages =
(obj->npages + uintptr_per_page - 1) / uintptr_per_page;
int error;
ihk_mc_user_context_t ctx;
(obj->npages / (PAGE_SIZE / sizeof(uintptr_t))) + 1;
#endif /*!POSTK_DEBUG_TEMP_FIX_36*/
dkprintf("%s(%p %lx)\n", __func__, obj, obj->handle);
dkprintf("devobj_release(%p %lx)\n", obj, obj->handle);
memobj_lock(&obj->memobj);
--obj->ref;
if (obj->ref <= 0) {
free_obj = obj;
}
handle = obj->handle;
memobj_unlock(&obj->memobj);
ihk_mc_syscall_arg0(&ctx) = PAGER_REQ_UNMAP;
ihk_mc_syscall_arg1(&ctx) = handle;
ihk_mc_syscall_arg2(&ctx) = 1;
if (free_obj) {
if (!(free_obj->memobj.flags & MF_HOST_RELEASED)) {
int error;
ihk_mc_user_context_t ctx;
error = syscall_generic_forwarding(__NR_mmap, &ctx);
if (error) {
kprintf("%s(%p %lx): release failed. %d\n",
__func__, obj, handle, error);
/* through */
ihk_mc_syscall_arg0(&ctx) = PAGER_REQ_UNMAP;
ihk_mc_syscall_arg1(&ctx) = handle;
ihk_mc_syscall_arg2(&ctx) = 1;
error = syscall_generic_forwarding(__NR_mmap, &ctx);
if (error) {
kprintf("devobj_release(%p %lx):"
"release failed. %d\n",
free_obj, handle, error);
/* through */
}
}
if (obj->pfn_table) {
// Don't call memory_stat_rss_sub() because devobj related pages don't reside in main memory
#ifdef POSTK_DEBUG_TEMP_FIX_36
const size_t uintptr_per_page = (PAGE_SIZE / sizeof(uintptr_t));
const size_t pfn_npages = (obj->npages + uintptr_per_page - 1) / uintptr_per_page;
ihk_mc_free_pages(obj->pfn_table, pfn_npages);
#else
ihk_mc_free_pages(obj->pfn_table, pfn_npages);
#endif /*POSTK_DEBUG_TEMP_FIX_36*/
}
if (to_memobj(free_obj)->path) {
kfree(to_memobj(free_obj)->path);
}
kfree(free_obj);
}
if (obj->pfn_table) {
// Don't call memory_stat_rss_sub() because devobj related
// pages don't reside in main memory
ihk_mc_free_pages(obj->pfn_table, pfn_npages);
}
if (to_memobj(obj)->path) {
kfree(to_memobj(obj)->path);
}
kfree(obj);
dkprintf("%s(%p %lx):free\n", __func__, obj, handle);
dkprintf("devobj_release(%p %lx):free %p\n",
obj, handle, free_obj);
return;
}
static int devobj_get_page(struct memobj *memobj, off_t off, int p2align, uintptr_t *physp, unsigned long *flag, uintptr_t virt_addr)
{
const off_t pgoff = off >> PAGE_SHIFT;
const off_t pgoff = off / PAGE_SIZE;
struct devobj *obj = to_devobj(memobj);
int error;
uintptr_t pfn;
@@ -227,14 +262,17 @@ static int devobj_get_page(struct memobj *memobj, off_t off, int p2align, uintpt
ix = pgoff - obj->pfn_pgoff;
dkprintf("ix: %ld\n", ix);
memobj_lock(&obj->memobj);
pfn = obj->pfn_table[ix];
#ifdef PROFILE_ENABLE
profile_event_add(PROFILE_page_fault_dev_file, PAGE_SIZE);
#endif // PROFILE_ENABLE
pfn = obj->pfn_table[ix];
if (!(pfn & PFN_VALID)) {
memobj_unlock(&obj->memobj);
ihk_mc_syscall_arg0(&ctx) = PAGER_REQ_PFN;
ihk_mc_syscall_arg1(&ctx) = obj->handle;
ihk_mc_syscall_arg2(&ctx) = off & ~(PAGE_SIZE - 1);
ihk_mc_syscall_arg2(&ctx) = pgoff << PAGE_SHIFT;
ihk_mc_syscall_arg3(&ctx) = virt_to_phys(&pfn);
error = syscall_generic_forwarding(__NR_mmap, &ctx);
@@ -265,9 +303,11 @@ static int devobj_get_page(struct memobj *memobj, off_t off, int p2align, uintpt
dkprintf("devobj_get_page(%p %lx,%lx,%d):PFN_PRESENT after %#lx\n", memobj, obj->handle, off, p2align, pfn);
}
memobj_lock(&obj->memobj);
obj->pfn_table[ix] = pfn;
// Don't call memory_stat_rss_add() because devobj related pages don't reside in main memory
}
memobj_unlock(&obj->memobj);
if (!(pfn & PFN_PRESENT)) {
kprintf("devobj_get_page(%p %lx,%lx,%d):not present. %lx\n", memobj, obj->handle, off, p2align, pfn);

291 kernel/file_ops.c Normal file
View File

@@ -0,0 +1,291 @@
#include <hfi1/file_ops.h>
#include <hfi1/hfi.h>
#include <hfi1/user_sdma.h>
#include <hfi1/sdma.h>
#include <hfi1/ihk_hfi1_common.h>
#include <hfi1/user_exp_rcv.h>
#include <errno.h>
//#define DEBUG_PRINT_FOPS
#ifdef DEBUG_PRINT_FOPS
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#endif
long hfi1_file_ioctl(void *private_data, unsigned int cmd,
unsigned long arg, unsigned long t_s)
{
struct hfi1_filedata *fd = private_data;
struct hfi1_ctxtdata *uctxt = fd->uctxt;
struct hfi1_tid_info tinfo;
unsigned long addr;
int ret = -ENOTSUPP;
hfi1_cdbg(IOCTL, "IOCTL recv: 0x%x", cmd);
if (cmd != HFI1_IOCTL_ASSIGN_CTXT &&
cmd != HFI1_IOCTL_GET_VERS &&
!uctxt)
return -EINVAL;
switch (cmd) {
case HFI1_IOCTL_ASSIGN_CTXT:
#if 0
if (uctxt)
return -EINVAL;
if (copy_from_user(&uinfo,
(struct hfi1_user_info __user *)arg,
sizeof(uinfo)))
return -EFAULT;
ret = assign_ctxt(fp, &uinfo);
if (ret < 0)
return ret;
ret = setup_ctxt(fp);
if (ret)
return ret;
ret = user_init(fp);
#endif
dkprintf("%s: HFI1_IOCTL_ASSIGN_CTXT \n", __FUNCTION__);
break;
case HFI1_IOCTL_CTXT_INFO:
#if 0
ret = get_ctxt_info(fp, (void __user *)(unsigned long)arg,
sizeof(struct hfi1_ctxt_info));
#endif
dkprintf("%s: HFI1_IOCTL_CTXT_INFO \n", __FUNCTION__);
break;
case HFI1_IOCTL_USER_INFO:
#if 0
ret = get_base_info(fp, (void __user *)(unsigned long)arg,
sizeof(struct hfi1_base_info));
#endif
dkprintf("%s: HFI1_IOCTL_USER_INFO \n", __FUNCTION__);
break;
case HFI1_IOCTL_CREDIT_UPD:
#if 0
if (uctxt)
sc_return_credits(uctxt->sc);
#endif
dkprintf("%s: HFI1_IOCTL_CREDIT_UPD \n", __FUNCTION__);
break;
case HFI1_IOCTL_TID_UPDATE:
dkprintf("%s: HFI1_IOCTL_TID_UPDATE \n", __FUNCTION__);
if (copy_from_user(&tinfo,
(struct hfi1_tid_info __user *)arg,
sizeof(tinfo)))
return -EFAULT;
ret = hfi1_user_exp_rcv_setup(fd, &tinfo);
if (!ret) {
/*
* Copy the number of tidlist entries we used
* and the length of the buffer we registered.
* These fields are adjacent in the structure so
* we can copy them at the same time.
*/
addr = arg + offsetof(struct hfi1_tid_info, tidcnt);
if (copy_to_user((void __user *)addr, &tinfo.tidcnt,
sizeof(tinfo.tidcnt) +
sizeof(tinfo.length)))
ret = -EFAULT;
}
break;
case HFI1_IOCTL_TID_FREE:
dkprintf("%s: HFI1_IOCTL_TID_FREE \n", __FUNCTION__);
if (copy_from_user(&tinfo,
(struct hfi1_tid_info __user *)arg,
sizeof(tinfo)))
return -EFAULT;
ret = hfi1_user_exp_rcv_clear(fd, &tinfo);
if (ret)
break;
addr = arg + offsetof(struct hfi1_tid_info, tidcnt);
if (copy_to_user((void __user *)addr, &tinfo.tidcnt,
sizeof(tinfo.tidcnt)))
ret = -EFAULT;
break;
case HFI1_IOCTL_TID_INVAL_READ:
dkprintf("%s: HFI1_IOCTL_TID_INVAL_READ \n", __FUNCTION__);
if (copy_from_user(&tinfo,
(struct hfi1_tid_info __user *)arg,
sizeof(tinfo)))
return -EFAULT;
ret = hfi1_user_exp_rcv_invalid(fd, &tinfo);
if (ret)
break;
addr = arg + offsetof(struct hfi1_tid_info, tidcnt);
if (copy_to_user((void __user *)addr, &tinfo.tidcnt,
sizeof(tinfo.tidcnt)))
ret = -EFAULT;
break;
case HFI1_IOCTL_RECV_CTRL:
#if 0
ret = get_user(uval, (int __user *)arg);
if (ret != 0)
return -EFAULT;
ret = manage_rcvq(uctxt, fd->subctxt, uval);
#endif
dkprintf("%s: HFI1_IOCTL_RECV_CTRL \n", __FUNCTION__);
break;
case HFI1_IOCTL_POLL_TYPE:
#if 0
ret = get_user(uval, (int __user *)arg);
if (ret != 0)
return -EFAULT;
uctxt->poll_type = (typeof(uctxt->poll_type))uval;
#endif
dkprintf("%s: HFI1_IOCTL_POLL_TYPE \n", __FUNCTION__);
break;
case HFI1_IOCTL_ACK_EVENT:
#if 0
ret = get_user(ul_uval, (unsigned long __user *)arg);
if (ret != 0)
return -EFAULT;
ret = user_event_ack(uctxt, fd->subctxt, ul_uval);
#endif
dkprintf("%s: HFI1_IOCTL_ACK_EVENT \n", __FUNCTION__);
break;
case HFI1_IOCTL_SET_PKEY:
#if 0
ret = get_user(uval16, (u16 __user *)arg);
if (ret != 0)
return -EFAULT;
if (HFI1_CAP_IS_USET(PKEY_CHECK))
ret = set_ctxt_pkey(uctxt, fd->subctxt, uval16);
else
return -EPERM;
#endif
ret = -ENODEV;
dkprintf("%s: HFI1_IOCTL_SET_PKEY \n", __FUNCTION__);
break;
case HFI1_IOCTL_CTXT_RESET: {
#if 0
struct send_context *sc;
struct hfi1_devdata *dd;
if (!uctxt || !uctxt->dd || !uctxt->sc)
return -EINVAL;
/*
* There is no protection here. User level has to
* guarantee that no one will be writing to the send
* context while it is being re-initialized.
* If user level breaks that guarantee, it will break
* its own context and no one else's.
*/
dd = uctxt->dd;
sc = uctxt->sc;
/*
* Wait until the interrupt handler has marked the
* context as halted or frozen. Report error if we time
* out.
*/
wait_event_interruptible_timeout(
sc->halt_wait, (sc->flags & SCF_HALTED),
msecs_to_jiffies(SEND_CTXT_HALT_TIMEOUT));
if (!(sc->flags & SCF_HALTED))
return -ENOLCK;
/*
* If the send context was halted due to a Freeze,
* wait until the device has been "unfrozen" before
* resetting the context.
*/
if (sc->flags & SCF_FROZEN) {
wait_event_interruptible_timeout(
dd->event_queue,
!(ACCESS_ONCE(dd->flags) & HFI1_FROZEN),
msecs_to_jiffies(SEND_CTXT_HALT_TIMEOUT));
if (dd->flags & HFI1_FROZEN)
return -ENOLCK;
if (dd->flags & HFI1_FORCED_FREEZE)
/*
* Don't allow context reset if we are into
* forced freeze
*/
return -ENODEV;
sc_disable(sc);
ret = sc_enable(sc);
hfi1_rcvctrl(dd, HFI1_RCVCTRL_CTXT_ENB,
uctxt->ctxt);
} else {
ret = sc_restart(sc);
}
if (!ret)
sc_return_credits(sc);
break;
#endif
dkprintf("%s: HFI1_IOCTL_CTXT_RESET \n", __FUNCTION__);
break;
}
case HFI1_IOCTL_GET_VERS:
#if 0
uval = HFI1_USER_SWVERSION;
if (put_user(uval, (int __user *)arg))
return -EFAULT;
#endif
dkprintf("%s: HFI1_IOCTL_GET_VERS \n", __FUNCTION__);
break;
default:
return -ENOTSUPP;
}
return ret;
}
ssize_t hfi1_aio_write(void *private_data, const struct iovec *iovec, unsigned long dim)
{
struct hfi1_filedata *fd = private_data;
struct hfi1_user_sdma_pkt_q *pq = fd->pq;
struct hfi1_user_sdma_comp_q *cq = fd->cq;
int done = 0, reqs = 0;
if (!cq || !pq)
return -EIO;
if (!dim)
return -EINVAL;
hfi1_cdbg(SDMA, "SDMA request from %u:%u (%lu)",
fd->uctxt->ctxt, fd->subctxt, dim);
if (atomic_read(&pq->n_reqs) == pq->n_max_reqs)
return -ENOSPC;
while (dim) {
int ret;
unsigned long count = 0;
ret = hfi1_user_sdma_process_request(
private_data, (struct iovec *)(iovec + done),
dim, &count);
if (ret) {
reqs = ret;
break;
}
dim -= count;
done += count;
reqs++;
}
return reqs;
}

View File

@@ -27,13 +27,15 @@
#include <string.h>
#include <syscall.h>
#include <rusage_private.h>
#include <debug.h>
//#define DEBUG_PRINT_FILEOBJ
#ifdef DEBUG_PRINT_FILEOBJ
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define dkprintf(...) do { if (1) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#endif
mcs_lock_t fileobj_list_lock;
@@ -45,21 +47,24 @@ static LIST_HEAD(fileobj_list);
struct fileobj {
struct memobj memobj; /* must be first */
uint64_t sref;
long sref;
long cref;
uintptr_t handle;
struct list_head list;
struct list_head page_hash[FILEOBJ_PAGE_HASH_SIZE];
mcs_lock_t page_hash_locks[FILEOBJ_PAGE_HASH_SIZE];
};
static memobj_free_func_t fileobj_free;
static memobj_release_func_t fileobj_release;
static memobj_ref_func_t fileobj_ref;
static memobj_get_page_func_t fileobj_get_page;
static memobj_flush_page_func_t fileobj_flush_page;
static memobj_invalidate_page_func_t fileobj_invalidate_page;
static memobj_lookup_page_func_t fileobj_lookup_page;
static struct memobj_ops fileobj_ops = {
.free = &fileobj_free,
.release = &fileobj_release,
.ref = &fileobj_ref,
.get_page = &fileobj_get_page,
.copy_page = NULL,
.flush_page = &fileobj_flush_page,
@@ -165,22 +170,22 @@ static void obj_list_remove(struct fileobj *obj)
/* return NULL or locked fileobj */
static struct fileobj *obj_list_lookup(uintptr_t handle)
{
struct fileobj *obj;
struct fileobj *p;
obj = NULL;
list_for_each_entry(p, &fileobj_list, list) {
if (p->handle == handle) {
/* for the interval between last put and fileobj_free
* taking list_lock
*/
if (memobj_ref(&p->memobj) <= 1) {
ihk_atomic_dec(&p->memobj.refcnt);
continue;
memobj_lock(&p->memobj);
if (p->cref > 0) {
obj = p;
break;
}
return p;
memobj_unlock(&p->memobj);
}
}
return NULL;
return obj;
}
/***********************************************************************
@@ -195,7 +200,13 @@ int fileobj_create(int fd, struct memobj **objp, int *maxprotp, uintptr_t virt_a
struct fileobj *obj;
struct mcs_lock_node node;
dkprintf("%s(%d)\n", __func__, fd);
dkprintf("fileobj_create(%d)\n", fd);
newobj = kmalloc(sizeof(*newobj), IHK_MC_AP_NOWAIT);
if (!newobj) {
error = -ENOMEM;
kprintf("fileobj_create(%d):kmalloc failed. %d\n", fd, error);
goto out;
}
ihk_mc_syscall_arg0(&ctx) = PAGER_REQ_CREATE;
ihk_mc_syscall_arg1(&ctx) = fd;
@@ -203,41 +214,20 @@ int fileobj_create(int fd, struct memobj **objp, int *maxprotp, uintptr_t virt_a
memset(&result, 0, sizeof(result));
error = syscall_generic_forwarding(__NR_mmap, &ctx);
if (error) {
/* -ESRCH doesn't mean an error but requesting a fall
* back to treat the file as a device file
*/
if (error != -ESRCH) {
kprintf("%s(%d):create failed. %d\n",
__func__, fd, error);
}
dkprintf("fileobj_create(%d):create failed. %d\n", fd, error);
goto out;
}
if (result.flags & MF_HUGETLBFS) {
return hugefileobj_pre_create(&result, objp, maxprotp);
}
mcs_lock_lock(&fileobj_list_lock, &node);
obj = obj_list_lookup(result.handle);
if (obj)
goto found;
mcs_lock_unlock(&fileobj_list_lock, &node);
// not found: alloc new object and lookup again
newobj = kmalloc(sizeof(*newobj), IHK_MC_AP_NOWAIT);
if (!newobj) {
error = -ENOMEM;
kprintf("%s(%d):kmalloc failed. %d\n", __func__, fd, error);
goto out;
}
memset(newobj, 0, sizeof(*newobj));
newobj->memobj.ops = &fileobj_ops;
newobj->memobj.flags = MF_HAS_PAGER | MF_REG_FILE;
newobj->handle = result.handle;
newobj->sref = 1;
newobj->cref = 1;
fileobj_page_hash_init(newobj);
ihk_mc_spinlock_init(&newobj->memobj.lock);
mcs_lock_lock_noirq(&fileobj_list_lock, &node);
obj = obj_list_lookup(result.handle);
@@ -247,8 +237,6 @@ int fileobj_create(int fd, struct memobj **objp, int *maxprotp, uintptr_t virt_a
to_memobj(obj)->size = result.size;
to_memobj(obj)->flags |= result.flags;
to_memobj(obj)->status = MEMOBJ_READY;
ihk_atomic_set(&to_memobj(obj)->refcnt, 1);
obj->sref = 1;
if (to_memobj(obj)->flags & MF_PREFETCH) {
to_memobj(obj)->status = MEMOBJ_TO_BE_PREFETCHED;
}
@@ -317,17 +305,20 @@ error_cleanup:
}
newobj = NULL;
dkprintf("%s: new obj 0x%lx %s\n",
dkprintf("%s: new obj 0x%lx cref: %d, %s\n",
__FUNCTION__,
obj,
obj->cref,
to_memobj(obj)->flags & MF_ZEROFILL ? "zerofill" : "");
}
else {
found:
obj->sref++;
dkprintf("%s: existing obj 0x%lx, %s\n",
++obj->sref;
++obj->cref;
memobj_unlock(&obj->memobj); /* locked by obj_list_lookup() */
dkprintf("%s: existing obj 0x%lx cref: %d, %s\n",
__FUNCTION__,
obj,
obj->cref,
to_memobj(obj)->flags & MF_ZEROFILL ? "zerofill" : "");
}
@@ -341,111 +332,152 @@ out:
if (newobj) {
kfree(newobj);
}
dkprintf("%s(%d):%d %p %x\n", __func__, fd, error, *objp, *maxprotp);
dkprintf("fileobj_create(%d):%d %p %x\n", fd, error, *objp, *maxprotp);
return error;
}
static void fileobj_free(struct memobj *memobj)
static void fileobj_ref(struct memobj *memobj)
{
struct fileobj *obj = to_fileobj(memobj);
struct mcs_lock_node node;
int error;
ihk_mc_user_context_t ctx;
dkprintf("%s: free obj 0x%lx, %s\n", __func__,
obj, to_memobj(obj)->flags & MF_ZEROFILL ? "zerofill" : "");
mcs_lock_lock_noirq(&fileobj_list_lock, &node);
obj_list_remove(obj);
mcs_lock_unlock_noirq(&fileobj_list_lock, &node);
/* zap page_list */
for (;;) {
struct page *page;
void *page_va;
uintptr_t phys;
page = fileobj_page_hash_first(obj);
if (!page) {
break;
}
__fileobj_page_hash_remove(page);
phys = page_to_phys(page);
page_va = phys_to_virt(phys);
/* Count must be one because it is set to one by the first
 * get_page() (which invokes fileobj_do_pageio()), incremented
 * by the second get_page() reaping the pageio, and decremented
 * by clear_range().
 */
if (ihk_atomic_read(&page->count) != 1) {
kprintf("%s: WARNING: page count is %d for phys 0x%lx is invalid, flags: 0x%lx\n",
__func__, ihk_atomic_read(&page->count),
page->phys, to_memobj(obj)->flags);
}
else if (page_unmap(page)) {
ihk_mc_free_pages_user(page_va, 1);
/* Track change in page->count for !MF_PREMAP pages.
* It is decremented here or in clear_range()
*/
dkprintf("%lx-,%s: calling memory_stat_rss_sub(),phys=%lx,size=%ld,pgsize=%ld\n",
phys, __func__, phys, PAGE_SIZE, PAGE_SIZE);
rusage_memory_stat_mapped_file_sub(PAGE_SIZE,
PAGE_SIZE);
}
}
/* Pre-mapped? */
if (to_memobj(obj)->flags & MF_PREMAP) {
int i;
for (i = 0; i < to_memobj(obj)->nr_pages; ++i) {
if (to_memobj(obj)->pages[i]) {
dkprintf("%s: pages[i]=%p\n", __func__, i,
to_memobj(obj)->pages[i]);
// Track change in fileobj->pages[] for MF_PREMAP pages
// Note that page_unmap() isn't called for MF_PREMAP in
// free_process_memory_range() --> ihk_mc_pt_free_range()
dkprintf("%lx-,%s: memory_stat_rss_sub,phys=%lx,size=%ld,pgsize=%ld\n",
virt_to_phys(to_memobj(obj)->pages[i]),
__func__,
virt_to_phys(to_memobj(obj)->pages[i]),
PAGE_SIZE, PAGE_SIZE);
rusage_memory_stat_mapped_file_sub(PAGE_SIZE,
PAGE_SIZE);
ihk_mc_free_pages_user(to_memobj(obj)->pages[i],
1);
}
}
kfree(to_memobj(obj)->pages);
}
if (to_memobj(obj)->path) {
dkprintf("%s: %s\n", __func__, to_memobj(obj)->path);
kfree(to_memobj(obj)->path);
}
/* linux side
* sref is necessary because handle is used as key, so there could
* be a new mckernel pager with the same handle being created as
* this one is being destroyed
*/
ihk_mc_syscall_arg0(&ctx) = PAGER_REQ_RELEASE;
ihk_mc_syscall_arg1(&ctx) = obj->handle;
ihk_mc_syscall_arg2(&ctx) = obj->sref;
error = syscall_generic_forwarding(__NR_mmap, &ctx);
if (error) {
kprintf("%s(%p %lx): free failed. %d\n", __func__,
obj, obj->handle, error);
/* through */
}
dkprintf("%s(%p %lx):free\n", __func__, obj, obj->handle);
kfree(obj);
dkprintf("fileobj_ref(%p %lx):\n", obj, obj->handle);
memobj_lock(&obj->memobj);
++obj->cref;
memobj_unlock(&obj->memobj);
return;
}
static void fileobj_release(struct memobj *memobj)
{
struct fileobj *obj = to_fileobj(memobj);
long free_sref = 0;
uintptr_t free_handle;
struct fileobj *free_obj = NULL;
struct mcs_lock_node node;
dkprintf("fileobj_release(%p %lx)\n", obj, obj->handle);
memobj_lock(&obj->memobj);
--obj->cref;
free_sref = obj->sref - 1; /* surplus sref */
if (obj->cref <= 0) {
free_sref = obj->sref;
free_obj = obj;
}
obj->sref -= free_sref;
free_handle = obj->handle;
memobj_unlock(&obj->memobj);
if (obj->memobj.flags & MF_HOST_RELEASED) {
free_sref = 0; // don't call syscall_generic_forwarding
}
if (free_obj) {
dkprintf("%s: release obj 0x%lx cref: %d, free_obj: 0x%lx, %s\n",
__FUNCTION__,
obj,
obj->cref,
free_obj,
to_memobj(obj)->flags & MF_ZEROFILL ? "zerofill" : "");
mcs_lock_lock_noirq(&fileobj_list_lock, &node);
/* zap page_list */
for (;;) {
struct page *page;
void *page_va;
uintptr_t phys;
page = fileobj_page_hash_first(obj);
if (!page) {
break;
}
__fileobj_page_hash_remove(page);
phys = page_to_phys(page);
page_va = phys_to_virt(phys);
/* Count must be one because it is set to one by the first get_page() (which invokes
fileobj_do_pageio()), incremented by the second get_page() reaping the pageio, and
decremented by clear_range().
*/
if (ihk_atomic_read(&page->count) != 1) {
kprintf("%s: WARNING: page count is %d for phys 0x%lx is invalid, flags: 0x%lx\n",
__FUNCTION__,
ihk_atomic_read(&page->count),
page->phys,
to_memobj(free_obj)->flags);
}
else if (page_unmap(page)) {
ihk_mc_free_pages_user(page_va, 1);
/* Track change in page->count for !MF_PREMAP pages. It is decremented here or in clear_range() */
dkprintf("%lx-,%s: calling memory_stat_rss_sub(),phys=%lx,size=%ld,pgsize=%ld\n", phys, __FUNCTION__, phys, PAGE_SIZE, PAGE_SIZE);
rusage_memory_stat_mapped_file_sub(PAGE_SIZE, PAGE_SIZE);
}
#if 0
count = ihk_atomic_sub_return(1, &page->count);
if (!((page->mode == PM_WILL_PAGEIO)
|| (page->mode == PM_DONE_PAGEIO)
|| (page->mode == PM_PAGEIO_EOF)
|| (page->mode == PM_PAGEIO_ERROR)
|| ((page->mode == PM_MAPPED)
&& (count <= 0)))) {
kprintf("fileobj_release(%p %lx): "
"mode %x, count %d, off %lx\n",
obj, obj->handle, page->mode,
count, page->offset);
panic("fileobj_release");
}
page->mode = PM_NONE;
#endif
}
/* Pre-mapped? */
if (to_memobj(free_obj)->flags & MF_PREMAP) {
int i;
for (i = 0; i < to_memobj(free_obj)->nr_pages; ++i) {
if (to_memobj(free_obj)->pages[i]) {
dkprintf("%s: pages[i]=%p\n", __FUNCTION__, i, to_memobj(free_obj)->pages[i]);
// Track change in fileobj->pages[] for MF_PREMAP pages
// Note that page_unmap() isn't called for MF_PREMAP in
// free_process_memory_range() --> ihk_mc_pt_free_range()
dkprintf("%lx-,%s: memory_stat_rss_sub,phys=%lx,size=%ld,pgsize=%ld\n",
virt_to_phys(to_memobj(free_obj)->pages[i]), __FUNCTION__, virt_to_phys(to_memobj(free_obj)->pages[i]), PAGE_SIZE, PAGE_SIZE);
rusage_memory_stat_mapped_file_sub(PAGE_SIZE, PAGE_SIZE);
ihk_mc_free_pages_user(to_memobj(free_obj)->pages[i], 1);
}
}
kfree(to_memobj(free_obj)->pages);
}
if (to_memobj(free_obj)->path) {
dkprintf("%s: %s\n", __FUNCTION__, to_memobj(free_obj)->path);
kfree(to_memobj(free_obj)->path);
}
obj_list_remove(free_obj);
mcs_lock_unlock_noirq(&fileobj_list_lock, &node);
kfree(free_obj);
}
if (free_sref) {
int error;
ihk_mc_user_context_t ctx;
ihk_mc_syscall_arg0(&ctx) = PAGER_REQ_RELEASE;
ihk_mc_syscall_arg1(&ctx) = free_handle;
ihk_mc_syscall_arg2(&ctx) = free_sref;
error = syscall_generic_forwarding(__NR_mmap, &ctx);
if (error) {
kprintf("fileobj_release(%p %lx):"
"release %ld failed. %d\n",
obj, free_handle, free_sref, error);
/* through */
}
}
dkprintf("fileobj_release(%p %lx):free %ld %p\n",
obj, free_handle, free_sref, free_obj);
return;
}
struct pageio_args {
@@ -538,7 +570,7 @@ static void fileobj_do_pageio(void *args0)
out:
mcs_lock_unlock_noirq(&obj->page_hash_locks[hash],
&mcs_node);
memobj_unref(&obj->memobj); /* got fileobj_get_page() */
fileobj_release(&obj->memobj); /* got fileobj_get_page() */
kfree(args0);
dkprintf("fileobj_do_pageio(%p,%lx,%lx):\n", obj, off, pgsize);
return;
@@ -624,9 +656,7 @@ static int fileobj_get_page(struct memobj *memobj, off_t off,
npages = 1 << p2align;
virt = ihk_mc_alloc_pages_user(npages, (IHK_MC_AP_NOWAIT |
((to_memobj(obj)->flags & MF_ZEROFILL) ?
IHK_MC_AP_USER : 0)),
virt_addr);
(to_memobj(obj)->flags & MF_ZEROFILL) ? IHK_MC_AP_USER : 0), virt_addr);
if (!virt) {
error = -ENOMEM;
kprintf("fileobj_get_page(%p,%lx,%x,%x,%p):"
@@ -651,7 +681,9 @@ static int fileobj_get_page(struct memobj *memobj, off_t off,
page->mode = PM_WILL_PAGEIO;
}
memobj_ref(&obj->memobj);
memobj_lock(&obj->memobj);
++obj->cref; /* for fileobj_do_pageio() */
memobj_unlock(&obj->memobj);
args->fileobj = obj;
args->objoff = off;
@@ -712,6 +744,10 @@ static int fileobj_flush_page(struct memobj *memobj, uintptr_t phys,
return 0;
}
if (memobj->flags & MF_HOST_RELEASED) {
return 0;
}
page = phys_to_page(phys);
if (!page) {
kprintf("%s: warning: tried to flush non-existing page for phys addr: 0x%lx\n",
@@ -719,6 +755,8 @@ static int fileobj_flush_page(struct memobj *memobj, uintptr_t phys,
return 0;
}
memobj_unlock(&obj->memobj);
ihk_mc_syscall_arg0(&ctx) = PAGER_REQ_WRITE;
ihk_mc_syscall_arg1(&ctx) = obj->handle;
ihk_mc_syscall_arg2(&ctx) = page->offset;
@@ -733,6 +771,7 @@ static int fileobj_flush_page(struct memobj *memobj, uintptr_t phys,
/* through */
}
memobj_lock(&obj->memobj);
return 0;
}

View File

@@ -70,22 +70,15 @@
#include <cls.h>
#include <kmsg.h>
#include <timer.h>
#include <debug.h>
#include <syscall.h>
//#define DEBUG_PRINT_FUTEX
#ifdef DEBUG_PRINT_FUTEX
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define uti_dkprintf(...) do { ((clv_override && linux_printk) ? (*linux_printk) : kprintf)(__VA_ARGS__); } while (0)
#define dkprintf kprintf
#else
#define uti_dkprintf(...) do { } while (0)
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#endif
#define uti_kprintf(...) do { ((clv_override && linux_printk) ? (*linux_printk) : kprintf)(__VA_ARGS__); } while (0)
unsigned long ihk_mc_get_ns_per_tsc(void);
int futex_cmpxchg_enabled;
/**
@@ -115,9 +108,6 @@ struct futex_q {
union futex_key key;
union futex_key *requeue_pi_key;
uint32_t bitset;
/* Used to wake-up a thread running on a Linux CPU */
void *uti_futex_resp;
};
/*
@@ -190,12 +180,11 @@ static void drop_futex_key_refs(union futex_key *key)
* lock_page() might sleep, the caller should not hold a spinlock.
*/
static int
get_futex_key(uint32_t *uaddr, int fshared, union futex_key *key, struct cpu_local_var *clv_override)
get_futex_key(uint32_t *uaddr, int fshared, union futex_key *key)
{
unsigned long address = (unsigned long)uaddr;
unsigned long phys;
struct thread *thread = cpu_local_var_with_override(current, clv_override);
struct process_vm *mm = thread->vm;
struct process_vm *mm = cpu_local_var(current)->vm;
/*
* The futex address must be "naturally" aligned.
@@ -261,7 +250,7 @@ static int cmpxchg_futex_value_locked(uint32_t __user *uaddr, uint32_t uval, uin
* The hash bucket lock must be held when this is called.
* Afterwards, the futex_q must not be accessed.
*/
static void wake_futex(struct futex_q *q, struct cpu_local_var *clv_override)
static void wake_futex(struct futex_q *q)
{
struct thread *p = q->task;
@@ -283,31 +272,8 @@ static void wake_futex(struct futex_q *q, struct cpu_local_var *clv_override)
barrier();
q->lock_ptr = NULL;
if (q->uti_futex_resp) {
int rc;
uti_dkprintf("wake_futex(): waking up migrated-to-Linux thread (tid %d),uti_futex_resp=%p\n", p->tid, q->uti_futex_resp);
/* TODO: Add the case when a Linux thread waking up another Linux thread */
if (clv_override) {
uti_dkprintf("%s: ERROR: A Linux thread is waking up migrated-to-Linux thread\n", __FUNCTION__);
}
if (p->spin_sleep == 0) {
uti_dkprintf("%s: INFO: woken up by someone else\n", __FUNCTION__);
}
struct ikc_scd_packet pckt;
struct ihk_ikc_channel_desc *resp_channel = cpu_local_var_with_override(ikc2linux, clv_override);
pckt.msg = SCD_MSG_FUTEX_WAKE;
pckt.futex.resp = q->uti_futex_resp;
pckt.futex.spin_sleep = &p->spin_sleep;
rc = ihk_ikc_send(resp_channel, &pckt, 0);
if (rc) {
uti_dkprintf("%s: ERROR: ihk_ikc_send returned %d, resp_channel=%p\n", __FUNCTION__, rc, resp_channel);
}
} else {
uti_dkprintf("wake_futex(): waking up McKernel thread (tid %d)\n", p->tid);
sched_wakeup_thread(p, PS_NORMAL);
}
dkprintf("wake_futex(): waking up tid %d\n", p->tid);
sched_wakeup_thread(p, PS_NORMAL);
}
/*
@@ -337,7 +303,7 @@ double_unlock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2)
/*
* Wake up waiters matching bitset queued on this futex (uaddr).
*/
static int futex_wake(uint32_t *uaddr, int fshared, int nr_wake, uint32_t bitset, struct cpu_local_var *clv_override)
static int futex_wake(uint32_t *uaddr, int fshared, int nr_wake, uint32_t bitset)
{
struct futex_hash_bucket *hb;
struct futex_q *this, *next;
@@ -348,7 +314,7 @@ static int futex_wake(uint32_t *uaddr, int fshared, int nr_wake, uint32_t bitset
if (!bitset)
return -EINVAL;
ret = get_futex_key(uaddr, fshared, &key, clv_override);
ret = get_futex_key(uaddr, fshared, &key);
if ((ret != 0))
goto out;
@@ -364,7 +330,7 @@ static int futex_wake(uint32_t *uaddr, int fshared, int nr_wake, uint32_t bitset
if (!(this->bitset & bitset))
continue;
wake_futex(this, clv_override);
wake_futex(this);
if (++ret >= nr_wake)
break;
}
@@ -382,8 +348,7 @@ out:
*/
static int
futex_wake_op(uint32_t *uaddr1, int fshared, uint32_t *uaddr2,
int nr_wake, int nr_wake2, int op,
struct cpu_local_var *clv_override)
int nr_wake, int nr_wake2, int op)
{
union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
struct futex_hash_bucket *hb1, *hb2;
@@ -392,10 +357,10 @@ futex_wake_op(uint32_t *uaddr1, int fshared, uint32_t *uaddr2,
int ret, op_ret;
retry:
ret = get_futex_key(uaddr1, fshared, &key1, clv_override);
ret = get_futex_key(uaddr1, fshared, &key1);
if ((ret != 0))
goto out;
ret = get_futex_key(uaddr2, fshared, &key2, clv_override);
ret = get_futex_key(uaddr2, fshared, &key2);
if ((ret != 0))
goto out_put_key1;
@@ -429,7 +394,7 @@ retry_private:
plist_for_each_entry_safe(this, next, head, list) {
if (match_futex (&this->key, &key1)) {
wake_futex(this, clv_override);
wake_futex(this);
if (++ret >= nr_wake)
break;
}
@@ -441,7 +406,7 @@ retry_private:
op_ret = 0;
plist_for_each_entry_safe(this, next, head, list) {
if (match_futex (&this->key, &key2)) {
wake_futex(this, clv_override);
wake_futex(this);
if (++op_ret >= nr_wake2)
break;
}
@@ -504,7 +469,7 @@ void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1,
*/
static int futex_requeue(uint32_t *uaddr1, int fshared, uint32_t *uaddr2,
int nr_wake, int nr_requeue, uint32_t *cmpval,
int requeue_pi, struct cpu_local_var *clv_override)
int requeue_pi)
{
union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
int drop_count = 0, task_count = 0, ret;
@@ -512,10 +477,10 @@ static int futex_requeue(uint32_t *uaddr1, int fshared, uint32_t *uaddr2,
struct plist_head *head1;
struct futex_q *this, *next;
ret = get_futex_key(uaddr1, fshared, &key1, clv_override);
ret = get_futex_key(uaddr1, fshared, &key1);
if ((ret != 0))
goto out;
ret = get_futex_key(uaddr2, fshared, &key2, clv_override);
ret = get_futex_key(uaddr2, fshared, &key2);
if ((ret != 0))
goto out_put_key1;
@@ -550,7 +515,7 @@ static int futex_requeue(uint32_t *uaddr1, int fshared, uint32_t *uaddr2,
*/
/* RIKEN: no requeue_pi at this moment */
if (++task_count <= nr_wake) {
wake_futex(this, clv_override);
wake_futex(this);
continue;
}
@@ -609,7 +574,7 @@ queue_unlock(struct futex_q *q, struct futex_hash_bucket *hb)
* state is implicit in the state of woken task (see futex_wait_requeue_pi() for
* an example).
*/
static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb, struct cpu_local_var *clv_override)
static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
{
int prio;
@@ -630,7 +595,7 @@ static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb, str
q->list.plist.spinlock = &hb->lock;
#endif
plist_add(&q->list, &hb->chain);
q->task = cpu_local_var_with_override(current, clv_override);
q->task = cpu_local_var(current);
ihk_mc_spinlock_unlock_noirq(&hb->lock);
}
@@ -693,64 +658,46 @@ retry:
/* RIKEN: this function has been rewritten so that it returns the remaining
* time in case we are waken.
*/
static int64_t futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
uint64_t timeout, struct cpu_local_var *clv_override)
static uint64_t futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
uint64_t timeout)
{
int64_t time_remain = 0;
uint64_t time_remain = 0;
unsigned long irqstate;
struct thread *thread = cpu_local_var_with_override(current, clv_override);
struct thread *thread = cpu_local_var(current);
/*
* The task state is guaranteed to be set before another task can
* wake it.
* queue_me() calls spin_unlock() upon completion, serializing
* access to the hash list and forcing a memory barrier.
*/
xchg4(&(thread->status), PS_INTERRUPTIBLE);
xchg4(&(cpu_local_var(current)->status), PS_INTERRUPTIBLE);
/* Indicate spin sleep. Note that schedule_timeout() with
* idle_halt should use spin sleep because sleep with timeout
* is not implemented.
*/
if (!idle_halt || timeout) {
/* Indicate spin sleep */
if (!idle_halt) {
irqstate = ihk_mc_spinlock_lock(&thread->spin_sleep_lock);
thread->spin_sleep = 1;
ihk_mc_spinlock_unlock(&thread->spin_sleep_lock, irqstate);
}
queue_me(q, hb, clv_override);
queue_me(q, hb);
if (!plist_node_empty(&q->list)) {
if (clv_override) {
uti_dkprintf("%s: tid: %d is trying to sleep\n", __FUNCTION__, thread->tid);
/* Note that the unit of timeout is nsec */
time_remain = (*linux_wait_event)(q->uti_futex_resp, timeout);
/* Note that time_remain == 0 indicates the condition evaluated to false after the timeout elapsed */
if (time_remain < 0) {
if (time_remain == -ERESTARTSYS) { /* Interrupted by signal */
uti_dkprintf("%s: DEBUG: wait_event returned -ERESTARTSYS\n", __FUNCTION__);
} else {
uti_kprintf("%s: ERROR: wait_event returned %d\n", __FUNCTION__, time_remain);
}
}
uti_dkprintf("%s: tid: %d woken up\n", __FUNCTION__, thread->tid);
} else {
if (timeout) {
dkprintf("futex_wait_queue_me(): tid: %d schedule_timeout()\n", thread->tid);
dkprintf("futex_wait_queue_me(): tid: %d schedule_timeout()\n", cpu_local_var(current)->tid);
time_remain = schedule_timeout(timeout);
}
else {
dkprintf("futex_wait_queue_me(): tid: %d schedule()\n", thread->tid);
dkprintf("futex_wait_queue_me(): tid: %d schedule()\n", cpu_local_var(current)->tid);
spin_sleep_or_schedule();
time_remain = 0;
}
dkprintf("futex_wait_queue_me(): tid: %d woken up\n", thread->tid);
}
dkprintf("futex_wait_queue_me(): tid: %d woken up\n", cpu_local_var(current)->tid);
}
/* This does not need to be serialized */
thread->status = PS_RUNNING;
cpu_local_var(current)->status = PS_RUNNING;
thread->spin_sleep = 0;
return time_remain;
@@ -774,8 +721,7 @@ static int64_t futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q
* <1 - -EFAULT or -EWOULDBLOCK (uaddr does not contain val) and hb is unlocked
*/
static int futex_wait_setup(uint32_t __user *uaddr, uint32_t val, int fshared,
struct futex_q *q, struct futex_hash_bucket **hb,
struct cpu_local_var *clv_override)
struct futex_q *q, struct futex_hash_bucket **hb)
{
uint32_t uval;
int ret;
@@ -798,7 +744,7 @@ static int futex_wait_setup(uint32_t __user *uaddr, uint32_t val, int fshared,
* rare, but normal.
*/
q->key = FUTEX_KEY_INIT;
ret = get_futex_key(uaddr, fshared, &q->key, clv_override);
ret = get_futex_key(uaddr, fshared, &q->key);
if (ret != 0)
return ret;
@@ -822,59 +768,49 @@ static int futex_wait_setup(uint32_t __user *uaddr, uint32_t val, int fshared,
}
static int futex_wait(uint32_t __user *uaddr, int fshared,
uint32_t val, uint64_t timeout, uint32_t bitset, int clockrt,
struct cpu_local_var *clv_override)
uint32_t val, uint64_t timeout, uint32_t bitset, int clockrt)
{
struct futex_hash_bucket *hb;
struct futex_q q;
int64_t time_remain;
uint64_t time_remain;
int ret;
if (!bitset)
return -EINVAL;
#ifdef PROFILE_ENABLE
if (cpu_local_var_with_override(current, clv_override)->profile &&
cpu_local_var_with_override(current, clv_override)->profile_start_ts) {
cpu_local_var_with_override(current, clv_override)->profile_elapsed_ts +=
(rdtsc() - cpu_local_var_with_override(current, clv_override)->profile_start_ts);
cpu_local_var_with_override(current, clv_override)->profile_start_ts = 0;
if (cpu_local_var(current)->profile &&
cpu_local_var(current)->profile_start_ts) {
cpu_local_var(current)->profile_elapsed_ts +=
(rdtsc() - cpu_local_var(current)->profile_start_ts);
cpu_local_var(current)->profile_start_ts = 0;
}
#endif
q.bitset = bitset;
q.requeue_pi_key = NULL;
q.uti_futex_resp = cpu_local_var_with_override(uti_futex_resp, clv_override);
retry:
/* Prepare to wait on uaddr. */
ret = futex_wait_setup(uaddr, val, fshared, &q, &hb, clv_override);
if (ret) {
uti_dkprintf("%s: tid=%d futex_wait_setup returns zero, no need to sleep\n", __FUNCTION__, cpu_local_var_with_override(current, clv_override)->tid);
ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
if (ret)
goto out;
}
/* queue_me and wait for wakeup, timeout, or a signal. */
time_remain = futex_wait_queue_me(hb, &q, timeout, clv_override);
time_remain = futex_wait_queue_me(hb, &q, timeout);
/* If we were woken (and unqueued), we succeeded, whatever. */
ret = 0;
if (!unqueue_me(&q)) {
uti_dkprintf("%s: tid=%d unqueued\n", __FUNCTION__, cpu_local_var_with_override(current, clv_override)->tid);
if (!unqueue_me(&q))
goto out_put_key;
}
ret = -ETIMEDOUT;
/* RIKEN: timer expired case (indicated by !time_remain) */
if (timeout && !time_remain) {
uti_dkprintf("%s: tid=%d timer expired\n", __FUNCTION__, cpu_local_var_with_override(current, clv_override)->tid);
if (timeout && !time_remain)
goto out_put_key;
}
/* RIKEN: futex_wait_queue_me() returns -ERESTARTSYS when waiting on Linux CPU and woken up by signal */
if (hassigpending(cpu_local_var_with_override(current, clv_override)) || time_remain == -ERESTARTSYS) {
if (hassigpending(cpu_local_var(current))) {
ret = -EINTR;
uti_dkprintf("%s: tid=%d woken up by signal\n", __FUNCTION__, cpu_local_var_with_override(current, clv_override)->tid);
goto out_put_key;
}
@@ -886,22 +822,19 @@ out_put_key:
put_futex_key(fshared, &q.key);
out:
#ifdef PROFILE_ENABLE
if (cpu_local_var_with_override(current, clv_override)->profile) {
cpu_local_var_with_override(current, clv_override)->profile_start_ts = rdtsc();
if (cpu_local_var(current)->profile) {
cpu_local_var(current)->profile_start_ts = rdtsc();
}
#endif
return ret;
}
int futex(uint32_t *uaddr, int op, uint32_t val, uint64_t timeout,
uint32_t *uaddr2, uint32_t val2, uint32_t val3, int fshared,
struct cpu_local_var *clv_override)
uint32_t *uaddr2, uint32_t val2, uint32_t val3, int fshared)
{
int clockrt, ret = -ENOSYS;
int cmd = op & FUTEX_CMD_MASK;
uti_dkprintf("%s: uaddr=%p, op=%x, val=%x, timeout=%ld, uaddr2=%p, val2=%x, val3=%x, fshared=%d, clv=%p\n", __FUNCTION__, uaddr, op, val, timeout, uaddr2, val2, val3, fshared, clv_override);
clockrt = op & FUTEX_CLOCK_REALTIME;
if (clockrt && cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
return -ENOSYS;
@@ -910,21 +843,21 @@ int futex(uint32_t *uaddr, int op, uint32_t val, uint64_t timeout,
case FUTEX_WAIT:
val3 = FUTEX_BITSET_MATCH_ANY;
case FUTEX_WAIT_BITSET:
ret = futex_wait(uaddr, fshared, val, timeout, val3, clockrt, clv_override);
ret = futex_wait(uaddr, fshared, val, timeout, val3, clockrt);
break;
case FUTEX_WAKE:
val3 = FUTEX_BITSET_MATCH_ANY;
case FUTEX_WAKE_BITSET:
ret = futex_wake(uaddr, fshared, val, val3, clv_override);
ret = futex_wake(uaddr, fshared, val, val3);
break;
case FUTEX_REQUEUE:
ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL, 0, clv_override);
ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL, 0);
break;
case FUTEX_CMP_REQUEUE:
ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3, 0, clv_override);
ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3, 0);
break;
case FUTEX_WAKE_OP:
ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3, clv_override);
ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3);
break;
/* RIKEN: these calls are not supported for now.
case FUTEX_LOCK_PI:

View File

@@ -34,13 +34,13 @@
#include <sysfs.h>
#include <ihk/perfctr.h>
#include <rusage_private.h>
#include <debug.h>
//#define DEBUG_PRINT_HOST
#ifdef DEBUG_PRINT_HOST
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define dkprintf kprintf
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#endif
/* Linux channel table, indexed by Linux CPU id */
@@ -78,6 +78,7 @@ int prepare_process_ranges_args_envs(struct thread *thread,
unsigned long args_envs_p, args_envs_rp;
unsigned long s, e, up;
char **argv;
char **a;
int i, n, argc, envc, args_envs_npages;
char **env;
int range_npages;
@@ -305,7 +306,7 @@ int prepare_process_ranges_args_envs(struct thread *thread,
/* Only unmap remote address if it wasn't specified as an argument */
if (!args) {
ihk_mc_unmap_virtual(args_envs_r, args_envs_npages);
ihk_mc_unmap_virtual(args_envs_r, args_envs_npages, 0);
ihk_mc_unmap_memory(NULL, args_envs_rp, p->args_len);
}
flush_tlb();
@@ -340,7 +341,7 @@ int prepare_process_ranges_args_envs(struct thread *thread,
/* Only map remote address if it wasn't specified as an argument */
if (!envs) {
ihk_mc_unmap_virtual(args_envs_r, args_envs_npages);
ihk_mc_unmap_virtual(args_envs_r, args_envs_npages, 0);
ihk_mc_unmap_memory(NULL, args_envs_rp, p->envs_len);
}
flush_tlb();
@@ -356,13 +357,12 @@ int prepare_process_ranges_args_envs(struct thread *thread,
proc->saved_cmdline_len = 0;
}
proc->saved_cmdline_len = p->args_len - ((argc + 2) * sizeof(char **));
proc->saved_cmdline = kmalloc(proc->saved_cmdline_len,
IHK_MC_AP_NOWAIT);
proc->saved_cmdline = kmalloc(p->args_len, IHK_MC_AP_NOWAIT);
if (!proc->saved_cmdline) {
goto err;
}
proc->saved_cmdline_len = p->args_len - ((argc + 2) * sizeof(char **));
memcpy(proc->saved_cmdline,
(char *)args_envs + ((argc + 2) * sizeof(char **)),
proc->saved_cmdline_len);
@@ -370,18 +370,21 @@ int prepare_process_ranges_args_envs(struct thread *thread,
__FUNCTION__,
proc->saved_cmdline);
for (i = 0; i < argc; i++) {
// Process' address space!
argv[i] = (char *)addr + (unsigned long)argv[i];
for (a = argv; *a; a++) {
*a = (char *)addr + (unsigned long)*a; // Process' address space!
}
envc = *((long *)(args_envs + p->args_len));
dkprintf("envc: %d\n", envc);
env = (char **)(args_envs + p->args_len + sizeof(long));
for (i = 0; i < envc; i++) {
env[i] = addr + p->args_len + env[i];
while (*env) {
char **_env = env;
//dkprintf("%s\n", args_envs + p->args_len + (unsigned long)*env);
*env = (char *)addr + p->args_len + (unsigned long)*env;
env = ++_env;
}
env = (char **)(args_envs + p->args_len + sizeof(long));
dkprintf("env OK\n");
@@ -446,7 +449,7 @@ static int process_msg_prepare_process(unsigned long rphys)
if((pn = kmalloc(sizeof(struct program_load_desc)
+ sizeof(struct program_image_section) * n,
IHK_MC_AP_NOWAIT)) == NULL){
ihk_mc_unmap_virtual(p, npages);
ihk_mc_unmap_virtual(p, npages, 0);
ihk_mc_unmap_memory(NULL, phys, sz);
return -ENOMEM;
}
@@ -457,7 +460,7 @@ static int process_msg_prepare_process(unsigned long rphys)
(unsigned long *)&p->cpu_set,
sizeof(p->cpu_set))) == NULL) {
kfree(pn);
ihk_mc_unmap_virtual(p, npages);
ihk_mc_unmap_virtual(p, npages, 1);
ihk_mc_unmap_memory(NULL, phys, sz);
return -ENOMEM;
}
@@ -476,6 +479,7 @@ static int process_msg_prepare_process(unsigned long rphys)
proc->sgid = pn->cred[6];
proc->fsgid = pn->cred[7];
proc->termsig = SIGCHLD;
proc->mcexec_flags = pn->mcexec_flags;
proc->mpol_flags = pn->mpol_flags;
proc->mpol_threshold = pn->mpol_threshold;
proc->nr_processes = pn->nr_processes;
@@ -502,9 +506,6 @@ static int process_msg_prepare_process(unsigned long rphys)
vm->numa_mem_policy = MPOL_BIND;
}
proc->uti_thread_rank = pn->uti_thread_rank;
proc->uti_use_last_cpu = pn->uti_use_last_cpu;
#ifdef PROFILE_ENABLE
proc->profile = pn->profile;
thread->profile = pn->profile;
@@ -543,14 +544,14 @@ static int process_msg_prepare_process(unsigned long rphys)
kfree(pn);
ihk_mc_unmap_virtual(p, npages);
ihk_mc_unmap_virtual(p, npages, 1);
ihk_mc_unmap_memory(NULL, phys, sz);
flush_tlb();
return 0;
err:
kfree(pn);
ihk_mc_unmap_virtual(p, npages);
ihk_mc_unmap_virtual(p, npages, 1);
ihk_mc_unmap_memory(NULL, phys, sz);
destroy_thread(thread);
return -ENOMEM;
@@ -563,6 +564,7 @@ static void syscall_channel_send(struct ihk_ikc_channel_desc *c,
}
extern unsigned long do_kill(struct thread *, int, int, int, struct siginfo *, int ptracecont);
extern void process_procfs_request(struct ikc_scd_packet *rpacket);
extern void terminate_host(int pid);
extern void debug_log(long);
@@ -573,6 +575,7 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
struct ikc_scd_packet pckt;
struct ihk_ikc_channel_desc *resp_channel = cpu_local_var(ikc2linux);
int rc;
struct mcs_rwlock_node_irqsave lock;
struct thread *thread;
struct process *proc;
struct mcctrl_signal {
@@ -596,9 +599,14 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
case SCD_MSG_PREPARE_PROCESS:
pckt.err = process_msg_prepare_process(packet->arg);
pckt.msg = SCD_MSG_PREPARE_PROCESS_ACKED;
pckt.reply = packet->reply;
if((rc = process_msg_prepare_process(packet->arg)) == 0){
pckt.msg = SCD_MSG_PREPARE_PROCESS_ACKED;
pckt.err = 0;
}
else{
pckt.msg = SCD_MSG_PREPARE_PROCESS_NACKED;
pckt.err = rc;
}
pckt.ref = packet->ref;
pckt.arg = packet->arg;
syscall_channel_send(resp_channel, &pckt);
@@ -609,7 +617,7 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
case SCD_MSG_SCHEDULE_PROCESS:
thread = (struct thread *)packet->arg;
cpuid = obtain_clone_cpuid(&thread->cpu_set, 0);
cpuid = obtain_clone_cpuid(&thread->cpu_set);
if (cpuid == -1) {
kprintf("No CPU available\n");
ret = -1;
@@ -633,14 +641,14 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
* the waiting thread
*/
case SCD_MSG_WAKE_UP_SYSCALL_THREAD:
thread = find_thread(0, packet->ttid);
thread = find_thread(0, packet->ttid, &lock);
if (!thread) {
kprintf("%s: WARNING: no thread for SCD reply? TID: %d\n",
__FUNCTION__, packet->ttid);
ret = -EINVAL;
break;
}
thread_unlock(thread);
thread_unlock(thread, &lock);
dkprintf("%s: SCD_MSG_WAKE_UP_SYSCALL_THREAD: waking up tid %d\n",
__FUNCTION__, packet->ttid);
@@ -652,13 +660,12 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
pp = ihk_mc_map_memory(NULL, packet->arg, sizeof(struct mcctrl_signal));
sp = (struct mcctrl_signal *)ihk_mc_map_virtual(pp, 1, PTATTR_WRITABLE | PTATTR_ACTIVE);
memcpy(&info, sp, sizeof(struct mcctrl_signal));
ihk_mc_unmap_virtual(sp, 1);
ihk_mc_unmap_virtual(sp, 1, 0);
ihk_mc_unmap_memory(NULL, pp, sizeof(struct mcctrl_signal));
pckt.msg = SCD_MSG_SEND_SIGNAL_ACK;
pckt.msg = SCD_MSG_SEND_SIGNAL;
pckt.err = 0;
pckt.ref = packet->ref;
pckt.arg = packet->arg;
pckt.reply = packet->reply;
syscall_channel_send(resp_channel, &pckt);
rc = do_kill(NULL, info.pid, info.tid, info.sig, &info.info, 0);
@@ -667,14 +674,7 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
break;
case SCD_MSG_PROCFS_REQUEST:
case SCD_MSG_PROCFS_RELEASE:
pckt.msg = SCD_MSG_PROCFS_ANSWER;
pckt.ref = packet->ref;
pckt.arg = packet->arg;
pckt.err = process_procfs_request(packet);
pckt.reply = packet->reply;
pckt.pid = packet->pid;
syscall_channel_send(resp_channel, &pckt);
process_procfs_request(packet);
ret = 0;
break;
@@ -711,26 +711,17 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
if (!pcd->exclude_user) {
mode |= PERFCTR_USER_MODE;
}
ret = ihk_mc_perfctr_init_raw(pcd->target_cntr, pcd->config, mode);
if (ret != 0) {
break;
}
ret = ihk_mc_perfctr_stop(1 << pcd->target_cntr);
if (ret != 0) {
break;
}
ret = ihk_mc_perfctr_reset(pcd->target_cntr);
ihk_mc_perfctr_init_raw(pcd->target_cntr, pcd->config, mode);
ihk_mc_perfctr_stop(1 << pcd->target_cntr);
ihk_mc_perfctr_reset(pcd->target_cntr);
break;
case PERF_CTRL_ENABLE:
ret = ihk_mc_perfctr_start(pcd->target_cntr_mask);
ihk_mc_perfctr_start(pcd->target_cntr_mask);
break;
case PERF_CTRL_DISABLE:
ret = ihk_mc_perfctr_stop(pcd->target_cntr_mask);
ihk_mc_perfctr_stop(pcd->target_cntr_mask);
break;
case PERF_CTRL_GET:
@@ -741,15 +732,15 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
kprintf("%s: SCD_MSG_PERF_CTRL unexpected ctrl_type\n", __FUNCTION__);
}
ihk_mc_unmap_virtual(pcd, 1);
ihk_mc_unmap_virtual(pcd, 1, 0);
ihk_mc_unmap_memory(NULL, pp, sizeof(struct perf_ctrl_desc));
pckt.msg = SCD_MSG_PERF_ACK;
pckt.err = ret;
pckt.err = 0;
pckt.arg = packet->arg;
pckt.reply = packet->reply;
ihk_ikc_send(resp_channel, &pckt, 0);
ret = 0;
break;
case SCD_MSG_CPU_RW_REG:

View File

@@ -1,303 +0,0 @@
#include <memobj.h>
#include <ihk/mm.h>
#include <kmsg.h>
#include <kmalloc.h>
#include <string.h>
#include <debug.h>
#if DEBUG_HUGEFILEOBJ
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
struct hugefilechunk {
struct list_head list;
off_t pgoff;
int npages;
void *mem;
};
struct hugefileobj {
struct memobj memobj;
size_t pgsize;
uintptr_t handle;
unsigned int pgshift;
struct list_head chunk_list;
ihk_spinlock_t chunk_lock;
struct list_head obj_list;
};
static ihk_spinlock_t hugefileobj_list_lock;
static LIST_HEAD(hugefileobj_list);
static struct hugefileobj *to_hugefileobj(struct memobj *memobj)
{
return (struct hugefileobj *)memobj;
}
static struct memobj *to_memobj(struct hugefileobj *obj)
{
return &obj->memobj;
}
static struct hugefileobj *hugefileobj_lookup(uintptr_t handle)
{
struct hugefileobj *p;
list_for_each_entry(p, &hugefileobj_list, obj_list) {
if (p->handle == handle) {
/* for the interval between last put and fileobj_free
* taking list_lock
*/
if (memobj_ref(&p->memobj) <= 1) {
ihk_atomic_dec(&p->memobj.refcnt);
continue;
}
return p;
}
}
return NULL;
}
static int hugefileobj_get_page(struct memobj *memobj, off_t off,
int p2align, uintptr_t *physp,
unsigned long *pflag, uintptr_t virt_addr)
{
struct hugefileobj *obj = to_hugefileobj(memobj);
struct hugefilechunk *chunk;
off_t pgoff;
if (p2align != obj->pgshift - PTL1_SHIFT) {
kprintf("%s: p2align %d but expected %d\n",
__func__, p2align, obj->pgshift - PTL1_SHIFT);
return -ENOMEM;
}
pgoff = off >> obj->pgshift;
ihk_mc_spinlock_lock_noirq(&obj->chunk_lock);
list_for_each_entry(chunk, &obj->chunk_list, list) {
if (pgoff >= chunk->pgoff + chunk->npages)
continue;
if (pgoff >= chunk->pgoff)
break;
kprintf("%s: no segment found for pgoff %lx (obj %p)\n",
__func__, pgoff, obj);
chunk = NULL;
break;
}
ihk_mc_spinlock_unlock_noirq(&obj->chunk_lock);
if (!chunk)
return -EIO;
*physp = virt_to_phys(chunk->mem + (off - chunk->pgoff * PAGE_SIZE));
return 0;
}
static void hugefileobj_free(struct memobj *memobj)
{
struct hugefileobj *obj = to_hugefileobj(memobj);
struct hugefilechunk *chunk, *next;
dkprintf("Destroying hugefileobj %p\n", memobj);
ihk_mc_spinlock_lock_noirq(&hugefileobj_list_lock);
list_del(&obj->obj_list);
ihk_mc_spinlock_unlock_noirq(&hugefileobj_list_lock);
kfree(memobj->path);
/* don't bother with chunk_lock, memobj refcounting makes this safe */
list_for_each_entry_safe(chunk, next, &obj->chunk_list, list) {
ihk_mc_free_pages_user(chunk->mem, chunk->npages);
kfree(chunk);
}
kfree(memobj);
}
struct memobj_ops hugefileobj_ops = {
.free = hugefileobj_free,
.get_page = hugefileobj_get_page,
};
void hugefileobj_cleanup(void)
{
struct hugefileobj *obj;
int refcnt;
while (true) {
ihk_mc_spinlock_lock_noirq(&hugefileobj_list_lock);
if (list_empty(&hugefileobj_list)) {
ihk_mc_spinlock_unlock_noirq(&hugefileobj_list_lock);
break;
}
obj = list_first_entry(&hugefileobj_list, struct hugefileobj,
obj_list);
ihk_mc_spinlock_unlock_noirq(&hugefileobj_list_lock);
if ((refcnt = memobj_unref(to_memobj(obj))) != 0) {
kprintf("%s: obj %p had refcnt %ld > 1, destroying anyway\n",
__func__, obj, refcnt + 1);
hugefileobj_free(to_memobj(obj));
}
}
}
int hugefileobj_pre_create(struct pager_create_result *result,
struct memobj **objp, int *maxprotp)
{
struct hugefileobj *obj;
ihk_mc_spinlock_lock_noirq(&hugefileobj_list_lock);
obj = hugefileobj_lookup(result->handle);
if (obj)
goto out_unlock;
obj = kmalloc(sizeof(*obj), IHK_MC_AP_NOWAIT);
if (!obj)
return -ENOMEM;
obj->handle = result->handle;
obj->pgsize = result->size;
obj->pgshift = 0;
INIT_LIST_HEAD(&obj->chunk_list);
ihk_mc_spinlock_init(&obj->chunk_lock);
obj->memobj.flags = result->flags;
obj->memobj.status = MEMOBJ_TO_BE_PREFETCHED;
obj->memobj.ops = &hugefileobj_ops;
/* keep mapping around when process is gone */
ihk_atomic_set(&obj->memobj.refcnt, 2);
if (result->path[0]) {
obj->memobj.path = kmalloc(PATH_MAX, IHK_MC_AP_NOWAIT);
if (!obj->memobj.path) {
kfree(obj);
return -ENOMEM;
}
strncpy(obj->memobj.path, result->path, PATH_MAX);
}
list_add(&obj->obj_list, &hugefileobj_list);
out_unlock:
ihk_mc_spinlock_unlock_noirq(&hugefileobj_list_lock);
*maxprotp = result->maxprot;
*objp = to_memobj(obj);
return 0;
}
int hugefileobj_create(struct memobj *memobj, size_t len, off_t off,
int *pgshiftp, uintptr_t virt_addr)
{
struct hugefileobj *obj = to_hugefileobj(memobj);
struct hugefilechunk *chunk = NULL, *old_chunk = NULL;
int p2align;
unsigned int pgshift;
int npages, npages_left;
void *v;
off_t pgoff, next_pgoff;
int error;
error = arch_get_smaller_page_size(NULL, obj->pgsize + 1, NULL,
&p2align);
if (error)
return error;
pgshift = p2align + PTL1_SHIFT;
if (1 << pgshift != obj->pgsize) {
dkprintf("invalid hugefileobj pagesize: %d\n",
obj->pgsize);
return -EINVAL;
}
if (len & ((1 << pgshift) - 1)) {
dkprintf("invalid hugetlbfs mmap size %d (pagesize %d)\n",
len, 1 << pgshift);
obj->pgshift = 0;
return -EINVAL;
}
if (off & ((1 << pgshift) - 1)) {
dkprintf("invalid hugetlbfs mmap offset %d (pagesize %d)\n",
off, 1 << pgshift);
obj->pgshift = 0;
return -EINVAL;
}
ihk_mc_spinlock_lock_noirq(&obj->chunk_lock);
if (obj->pgshift && obj->pgshift != pgshift) {
kprintf("pgshift changed between two calls on same inode?! had %d now %d\n",
obj->pgshift, pgshift);
ihk_mc_spinlock_unlock_noirq(&obj->chunk_lock);
return -EINVAL;
}
obj->pgshift = pgshift;
/* Prealloc upfront, we need to fail here if not enough memory. */
if (!list_empty(&obj->chunk_list))
old_chunk = list_first_entry(&obj->chunk_list,
struct hugefilechunk, list);
pgoff = off >> PAGE_SHIFT;
npages_left = len >> PAGE_SHIFT;
npages = npages_left;
while (npages_left) {
while (old_chunk &&
pgoff >= old_chunk->pgoff + old_chunk->npages) {
if (list_is_last(&old_chunk->list, &obj->chunk_list)) {
old_chunk = NULL;
break;
}
old_chunk = list_entry(old_chunk->list.next,
struct hugefilechunk, list);
}
if (old_chunk) {
next_pgoff = old_chunk->pgoff + old_chunk->npages;
if (pgoff >= old_chunk->pgoff && pgoff < next_pgoff) {
npages_left -= next_pgoff - pgoff;
pgoff = next_pgoff;
continue;
}
}
if (!chunk) {
chunk = kmalloc(sizeof(*chunk), IHK_MC_AP_NOWAIT);
}
if (!chunk) {
kprintf("could not allocate hugefileobj chunk\n");
ihk_mc_spinlock_unlock_noirq(&obj->chunk_lock);
return -ENOMEM;
}
if (npages > npages_left)
npages = npages_left;
v = ihk_mc_alloc_aligned_pages_user(npages, p2align,
IHK_MC_AP_NOWAIT | IHK_MC_AP_USER, virt_addr);
if (!v) {
if (npages == 1) {
dkprintf("could not allocate more pages with pgshift %d\n",
pgshift);
kfree(chunk);
ihk_mc_spinlock_unlock_noirq(&obj->chunk_lock);
/* caller will clean up the rest */
return -ENOMEM;
}
/* halve the request and retry with a smaller allocation */
npages /= 2;
continue;
}
memset(v, 0, npages * PAGE_SIZE);
chunk->npages = npages;
chunk->mem = v;
chunk->pgoff = pgoff;
/* ordered list: insert before next (bigger) element */
if (old_chunk)
list_add(&chunk->list, old_chunk->list.prev);
else
list_add(&chunk->list, obj->chunk_list.prev);
pgoff += npages;
npages_left -= npages;
}
obj->memobj.size = len;
ihk_mc_spinlock_unlock_noirq(&obj->chunk_lock);
*pgshiftp = pgshift;
return 0;
}
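A hedged sketch of how a hugetlbfs mmap path might drive the two functions above; the wrapper name, the variable names and the error handling are assumptions for illustration, not code from this tree.

/* Sketch only: step 1 finds or creates the per-inode hugefileobj, step 2
 * preallocates physical chunks covering [off, off + len). 'result', 'len',
 * 'off' and 'vaddr' are assumed to come from the pager request. */
static int map_hugetlbfs_sketch(struct pager_create_result *result,
				size_t len, off_t off, uintptr_t vaddr)
{
	struct memobj *memobj;
	int maxprot, pgshift;
	int error;

	error = hugefileobj_pre_create(result, &memobj, &maxprot);
	if (error)
		return error;

	error = hugefileobj_create(memobj, len, off, &pgshift, vaddr);
	if (error) {
		/* assumption: the caller drops its reference on failure */
		memobj_unref(memobj);
		return error;
	}

	/* pgshift now tells the caller which large-page size to map with */
	return 0;
}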


@@ -19,10 +19,17 @@
* CPU Local Storage (cls)
*/
struct kmalloc_cache_header {
struct kmalloc_cache_header *next;
};
struct kmalloc_header {
unsigned int front_magic;
int cpu_id;
struct list_head list;
union {
struct list_head list;
struct kmalloc_cache_header *cache;
};
int size; /* The size of this chunk without the header */
unsigned int end_magic;
/* 32 bytes */
@@ -74,7 +81,6 @@ struct cpu_local_var {
struct thread *current;
struct list_head runq;
size_t runq_len;
size_t runq_reserved; /* Number of threads which are about to be added to runq */
struct ihk_ikc_channel_desc *ikc2linux;
@@ -101,8 +107,11 @@ struct cpu_local_var {
struct process_vm *on_fork_vm;
/* UTI */
void *uti_futex_resp;
/* HFI1 related per-core kmalloc caches */
struct kmalloc_cache_header txreq_cache;
struct kmalloc_cache_header tids_cache;
struct kmalloc_cache_header tidlist_cache;
struct kmalloc_cache_header tid_node_cache;
} __attribute__((aligned(64)));
@@ -114,6 +123,4 @@ static struct cpu_local_var *get_this_cpu_local_var(void)
#define cpu_local_var(name) get_this_cpu_local_var()->name
#define cpu_local_var_with_override(name, clv_override) (clv_override ? clv_override->name : get_this_cpu_local_var()->name)
#endif
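As a hedged illustration of the free-list pattern behind kmalloc_cache_header and the per-core HFI1 caches added above (the helper names below are made up, not the kernel's kmalloc cache API):

/* Sketch: a LIFO free list. Freed objects store the link in their first
 * word, and the cache header's 'next' points at the most recently freed
 * object; allocation pops the head if one is available. */
static inline void cache_push_sketch(struct kmalloc_cache_header *cache,
				     void *obj)
{
	struct kmalloc_cache_header *entry = obj;

	entry->next = cache->next;	/* link in front of the current head */
	cache->next = entry;
}

static inline void *cache_pop_sketch(struct kmalloc_cache_header *cache)
{
	struct kmalloc_cache_header *entry = cache->next;

	if (entry)
		cache->next = entry->next;	/* unlink the head */
	return entry;
}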


@@ -1,54 +0,0 @@
#ifndef DEBUG_H
#define DEBUG_H
#include "lwk/compiler.h"
void panic(const char *);
/* when someone has a lot of time, add attribute __printf(1, 2) to kprintf */
int kprintf(const char *format, ...);
struct ddebug {
const char *file;
const char *func;
const char *fmt;
unsigned int line:24;
unsigned int flags:8;
} __aligned(8);
#define DDEBUG_NONE 0x0
#define DDEBUG_PRINT 0x1
#define DDEBUG_DEFAULT DDEBUG_NONE
#define DDEBUG_SYMBOL() \
static struct ddebug __aligned(8) \
__attribute__((section("__verbose"))) ddebug = { \
.file = __FILE__, \
.func = __func__, \
.line = __LINE__, \
.flags = DDEBUG_DEFAULT, \
}
#define DDEBUG_TEST ddebug.flags
#define dkprintf(fmt, args...) \
do { \
DDEBUG_SYMBOL(); \
if (DDEBUG_TEST) \
kprintf(fmt, ##args); \
} while (0)
#define ekprintf(fmt, args...) kprintf(fmt, ##args)
#define BUG_ON(condition) do { \
if (condition) { \
kprintf("PANIC: %s: %s(line:%d)\n", \
__FILE__, __func__, __LINE__); \
panic(""); \
} \
} while (0)
#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
#endif


@@ -63,7 +63,7 @@
#define FUTEX_OP_ANDN 3 /* *(int *)UADDR2 &= ~OPARG; */
#define FUTEX_OP_XOR 4 /* *(int *)UADDR2 ^= OPARG; */
#define FUTEX_OP_OPARG_SHIFT 8U /* Use (1 << OPARG) instead of OPARG. */
#define FUTEX_OP_OPARG_SHIFT 8 /* Use (1 << OPARG) instead of OPARG. */
#define FUTEX_OP_CMP_EQ 0 /* if (oldval == CMPARG) wake */
#define FUTEX_OP_CMP_NE 1 /* if (oldval != CMPARG) wake */
@@ -150,7 +150,6 @@ union futex_key {
extern int futex_init(void);
struct cpu_local_var;
extern int
futex(
uint32_t __user * uaddr,
@@ -160,8 +159,7 @@ futex(
uint32_t __user * uaddr2,
uint32_t val2,
uint32_t val3,
int fshared,
struct cpu_local_var *clv_override
int fshared
);


@@ -0,0 +1,60 @@
#ifndef _CHIP_H
#define _CHIP_H
/*
* Copyright(c) 2015, 2016 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
*
* GPL LICENSE SUMMARY
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* BSD LICENSE
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* - Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* - Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* - Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
*/
/*
* This file contains all of the defines that are specific to the HFI chip
*/
#define MAX_EXPECTED_BUFFER (2048 * 1024)
void hfi1_put_tid(struct hfi1_devdata *dd, u32 index,
u32 type, unsigned long pa, u16 order);
void hfi1_clear_tids(struct hfi1_ctxtdata *rcd);
#endif /* _CHIP_H */


@@ -0,0 +1,64 @@
#ifndef DEF_CHIP_REG
#define DEF_CHIP_REG
/*
* Copyright(c) 2015, 2016 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
*
* GPL LICENSE SUMMARY
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* BSD LICENSE
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* - Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* - Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* - Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
*/
#define CORE 0x000000000000
#define RXE (CORE + 0x000001000000)
#define RCV_ARRAY (RXE + 0x000000200000)
#define RCV_ARRAY_CNT (RXE + 0x000000000018)
#define RCV_ARRAY_RT_ADDR_MASK 0xFFFFFFFFFull
#define RCV_ARRAY_RT_ADDR_SHIFT 0
#define RCV_ARRAY_RT_BUF_SIZE_SHIFT 36
#define RCV_ARRAY_RT_WRITE_ENABLE_SMASK 0x8000000000000000ull
#endif /* DEF_CHIP_REG */
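As a hedged sketch only (not the driver's code): the field masks above go up to bit 63, which implies each RcvArray entry is a 64-bit register laid out contiguously from RCV_ARRAY, so the offset of an entry can be computed as follows.

/* Sketch: offset of RcvArray entry 'index' within the chip register space.
 * The RCV_ARRAY_RT_WRITE_ENABLE_SMASK bit above suggests that entry updates
 * must also set a write-enable flag in the value written. */
static inline u64 rcv_array_entry_offset_sketch(u32 index)
{
	return RCV_ARRAY + 8ull * (u64)index;
}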


@@ -0,0 +1,411 @@
/*
* Copyright(c) 2015, 2016 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
*
* GPL LICENSE SUMMARY
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* BSD LICENSE
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* - Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* - Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* - Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
*/
#ifndef _COMMON_H
#define _COMMON_H
#ifdef __HFI1_ORIG__
#include "update/hfi1_user.h"
#else
#include <hfi1/hfi1_user.h>
#endif /* __HFI1_ORIG__ */
/*
* This file contains defines, structures, etc. that are used
* to communicate between kernel and user code.
*/
/* version of protocol header (known to chip also). In the long run,
* we should be able to generate and accept a range of version numbers;
* for now we only accept one, and it's compiled in.
*/
#define IPS_PROTO_VERSION 2
/*
* These are compile time constants that you may want to enable or disable
* if you are trying to debug problems with code or performance.
* HFI1_VERBOSE_TRACING define as 1 if you want additional tracing in
* fast path code
* HFI1_TRACE_REGWRITES define as 1 if you want register writes to be
* traced in fast path code
* _HFI1_TRACING define as 0 if you want to remove all tracing in a
* compilation unit
*/
/*
* If a packet's QP[23:16] bits match this value, then it is
* a PSM packet and the hardware will expect a KDETH header
* following the BTH.
*/
#define DEFAULT_KDETH_QP 0x80
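A minimal sketch of the check the comment above describes, assuming the 24-bit destination QP number has already been extracted from the BTH; the helper name is made up for illustration.

/* Sketch: a packet is a PSM/KDETH packet when bits 23:16 of its destination
 * QP number match DEFAULT_KDETH_QP. */
static inline int qpn_is_kdeth_sketch(u32 qpn)
{
	return ((qpn >> 16) & 0xff) == DEFAULT_KDETH_QP;
}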
/* driver/hw feature set bitmask */
#define HFI1_CAP_USER_SHIFT 24
#define HFI1_CAP_MASK ((1UL << HFI1_CAP_USER_SHIFT) - 1)
/* locked flag - if set, only HFI1_CAP_WRITABLE_MASK bits can be set */
#define HFI1_CAP_LOCKED_SHIFT 63
#define HFI1_CAP_LOCKED_MASK 0x1ULL
#define HFI1_CAP_LOCKED_SMASK (HFI1_CAP_LOCKED_MASK << HFI1_CAP_LOCKED_SHIFT)
/* extra bits used between kernel and user processes */
#define HFI1_CAP_MISC_SHIFT (HFI1_CAP_USER_SHIFT * 2)
#define HFI1_CAP_MISC_MASK ((1ULL << (HFI1_CAP_LOCKED_SHIFT - \
HFI1_CAP_MISC_SHIFT)) - 1)
#define HFI1_CAP_KSET(cap) ({ hfi1_cap_mask |= HFI1_CAP_##cap; hfi1_cap_mask; })
#define HFI1_CAP_KCLEAR(cap) \
({ \
hfi1_cap_mask &= ~HFI1_CAP_##cap; \
hfi1_cap_mask; \
})
#define HFI1_CAP_USET(cap) \
({ \
hfi1_cap_mask |= (HFI1_CAP_##cap << HFI1_CAP_USER_SHIFT); \
hfi1_cap_mask; \
})
#define HFI1_CAP_UCLEAR(cap) \
({ \
hfi1_cap_mask &= ~(HFI1_CAP_##cap << HFI1_CAP_USER_SHIFT); \
hfi1_cap_mask; \
})
#define HFI1_CAP_SET(cap) \
({ \
hfi1_cap_mask |= (HFI1_CAP_##cap | (HFI1_CAP_##cap << \
HFI1_CAP_USER_SHIFT)); \
hfi1_cap_mask; \
})
#define HFI1_CAP_CLEAR(cap) \
({ \
hfi1_cap_mask &= ~(HFI1_CAP_##cap | \
(HFI1_CAP_##cap << HFI1_CAP_USER_SHIFT)); \
hfi1_cap_mask; \
})
#define HFI1_CAP_LOCK() \
({ hfi1_cap_mask |= HFI1_CAP_LOCKED_SMASK; hfi1_cap_mask; })
#define HFI1_CAP_LOCKED() (!!(hfi1_cap_mask & HFI1_CAP_LOCKED_SMASK))
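The macros above pack kernel capabilities into the low 24 bits of hfi1_cap_mask, the user copies into the next 24 bits, and the lock flag into bit 63; a hedged sketch of splitting the mask back into its two halves (the helper names are illustrative only):

/* Sketch: decode the kernel and user capability halves per the defines above. */
static inline u64 hfi1_kernel_caps_sketch(u64 cap_mask)
{
	return cap_mask & HFI1_CAP_MASK;			/* bits 23:0 */
}

static inline u64 hfi1_user_caps_sketch(u64 cap_mask)
{
	return (cap_mask >> HFI1_CAP_USER_SHIFT) & HFI1_CAP_MASK; /* bits 47:24 */
}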
/*
* The set of capability bits that can be changed after initial load
* This set is the same for kernel and user contexts. However, for
* user contexts, the set can be further filtered by using the
* HFI1_CAP_RESERVED_MASK bits.
*/
#define HFI1_CAP_WRITABLE_MASK (HFI1_CAP_SDMA_AHG | \
HFI1_CAP_HDRSUPP | \
HFI1_CAP_MULTI_PKT_EGR | \
HFI1_CAP_NODROP_RHQ_FULL | \
HFI1_CAP_NODROP_EGR_FULL | \
HFI1_CAP_ALLOW_PERM_JKEY | \
HFI1_CAP_STATIC_RATE_CTRL | \
HFI1_CAP_PRINT_UNIMPL | \
HFI1_CAP_TID_UNMAP | \
HFI1_CAP_OPFN | \
HFI1_CAP_TID_RDMA)
/*
* A set of capability bits that are "global" and are not allowed to be
* set in the user bitmask.
*/
#define HFI1_CAP_RESERVED_MASK ((HFI1_CAP_SDMA | \
HFI1_CAP_USE_SDMA_HEAD | \
HFI1_CAP_EXTENDED_PSN | \
HFI1_CAP_PRINT_UNIMPL | \
HFI1_CAP_NO_INTEGRITY | \
HFI1_CAP_PKEY_CHECK | \
HFI1_CAP_TID_RDMA | \
HFI1_CAP_OPFN) << \
HFI1_CAP_USER_SHIFT)
/*
* Set of capabilities that need to be enabled for kernel context in
* order to be allowed for user contexts, as well.
*/
#define HFI1_CAP_MUST_HAVE_KERN (HFI1_CAP_STATIC_RATE_CTRL)
/* Default enabled capabilities (both kernel and user) */
#define HFI1_CAP_MASK_DEFAULT (HFI1_CAP_HDRSUPP | \
HFI1_CAP_NODROP_RHQ_FULL | \
HFI1_CAP_NODROP_EGR_FULL | \
HFI1_CAP_SDMA | \
HFI1_CAP_PRINT_UNIMPL | \
HFI1_CAP_STATIC_RATE_CTRL | \
HFI1_CAP_PKEY_CHECK | \
HFI1_CAP_MULTI_PKT_EGR | \
HFI1_CAP_EXTENDED_PSN | \
((HFI1_CAP_HDRSUPP | \
HFI1_CAP_MULTI_PKT_EGR | \
HFI1_CAP_STATIC_RATE_CTRL | \
HFI1_CAP_PKEY_CHECK | \
HFI1_CAP_EARLY_CREDIT_RETURN) << \
HFI1_CAP_USER_SHIFT))
/*
* A bitmask of kernel/global capabilities that should be communicated
* to user level processes.
*/
#define HFI1_CAP_K2U (HFI1_CAP_SDMA | \
HFI1_CAP_EXTENDED_PSN | \
HFI1_CAP_PKEY_CHECK | \
HFI1_CAP_NO_INTEGRITY)
#define HFI1_USER_SWVERSION ((HFI1_USER_SWMAJOR << HFI1_SWMAJOR_SHIFT) | \
HFI1_USER_SWMINOR)
#ifndef HFI1_KERN_TYPE
#define HFI1_KERN_TYPE 0
#endif
/*
* Similarly, this is the kernel version going back to the user. It's
* slightly different, in that we want to tell if the driver was built as
* part of an Intel release, or from the driver from openfabrics.org,
* kernel.org, or a standard distribution, for support reasons.
* The high bit is 0 for non-Intel and 1 for Intel-built/supplied.
*
* It's returned by the driver to the user code during initialization in the
* spi_sw_version field of hfi1_base_info, so the user code can in turn
* check for compatibility with the kernel.
*/
#define HFI1_KERN_SWVERSION ((HFI1_KERN_TYPE << 31) | HFI1_USER_SWVERSION)
/*
* Define the driver version number. This is something that refers only
* to the driver itself, not the software interfaces it supports.
*/
#ifndef HFI1_DRIVER_VERSION_BASE
#define HFI1_DRIVER_VERSION_BASE "0.9-294"
#endif
/* create the final driver version string */
#ifdef HFI1_IDSTR
#define HFI1_DRIVER_VERSION HFI1_DRIVER_VERSION_BASE " " HFI1_IDSTR
#else
#define HFI1_DRIVER_VERSION HFI1_DRIVER_VERSION_BASE
#endif
/*
* Diagnostics can send a packet by writing the following
* struct to the diag packet special file.
*
* This allows a custom PBC qword, so that special modes and deliberate
* changes to CRCs can be used.
*/
#define _DIAG_PKT_VERS 1
struct diag_pkt {
__u16 version; /* structure version */
__u16 unit; /* which device */
__u16 sw_index; /* send sw index to use */
__u16 len; /* data length, in bytes */
__u16 port; /* port number */
__u16 unused;
__u32 flags; /* call flags */
__u64 data; /* user data pointer */
__u64 pbc; /* PBC for the packet */
};
/* diag_pkt flags */
#define F_DIAGPKT_WAIT 0x1 /* wait until packet is sent */
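A hedged sketch of filling the descriptor above before writing it to the diag packet special file; the chosen field values and the surrounding write() plumbing are assumptions, and the field semantics beyond the comments in the struct are driver-defined.

/* Sketch only: populate a diag_pkt for unit 0 and ask the driver to wait
 * until the packet has been sent. The payload pointer/length and the PBC
 * value are placeholders. */
static void fill_diag_pkt_sketch(struct diag_pkt *dp,
				 const void *payload, __u16 payload_len)
{
	dp->version = _DIAG_PKT_VERS;
	dp->unit = 0;				/* which device */
	dp->sw_index = 0;			/* send sw index to use */
	dp->len = payload_len;			/* data length, in bytes */
	dp->port = 1;				/* port number */
	dp->unused = 0;
	dp->flags = F_DIAGPKT_WAIT;		/* block until the packet is sent */
	dp->data = (__u64)(unsigned long)payload;
	dp->pbc = 0;				/* custom PBC qword, if any */
}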
/*
* The next set of defines is for packet headers, and chip register
* and memory bits that are visible to and/or used by user-mode software.
*/
/*
* Receive Header Flags
*/
#define RHF_PKT_LEN_SHIFT 0
#define RHF_PKT_LEN_MASK 0xfffull
#define RHF_PKT_LEN_SMASK (RHF_PKT_LEN_MASK << RHF_PKT_LEN_SHIFT)
#define RHF_RCV_TYPE_SHIFT 12
#define RHF_RCV_TYPE_MASK 0x7ull
#define RHF_RCV_TYPE_SMASK (RHF_RCV_TYPE_MASK << RHF_RCV_TYPE_SHIFT)
#define RHF_USE_EGR_BFR_SHIFT 15
#define RHF_USE_EGR_BFR_MASK 0x1ull
#define RHF_USE_EGR_BFR_SMASK (RHF_USE_EGR_BFR_MASK << RHF_USE_EGR_BFR_SHIFT)
#define RHF_EGR_INDEX_SHIFT 16
#define RHF_EGR_INDEX_MASK 0x7ffull
#define RHF_EGR_INDEX_SMASK (RHF_EGR_INDEX_MASK << RHF_EGR_INDEX_SHIFT)
#define RHF_DC_INFO_SHIFT 27
#define RHF_DC_INFO_MASK 0x1ull
#define RHF_DC_INFO_SMASK (RHF_DC_INFO_MASK << RHF_DC_INFO_SHIFT)
#define RHF_RCV_SEQ_SHIFT 28
#define RHF_RCV_SEQ_MASK 0xfull
#define RHF_RCV_SEQ_SMASK (RHF_RCV_SEQ_MASK << RHF_RCV_SEQ_SHIFT)
#define RHF_EGR_OFFSET_SHIFT 32
#define RHF_EGR_OFFSET_MASK 0xfffull
#define RHF_EGR_OFFSET_SMASK (RHF_EGR_OFFSET_MASK << RHF_EGR_OFFSET_SHIFT)
#define RHF_HDRQ_OFFSET_SHIFT 44
#define RHF_HDRQ_OFFSET_MASK 0x1ffull
#define RHF_HDRQ_OFFSET_SMASK (RHF_HDRQ_OFFSET_MASK << RHF_HDRQ_OFFSET_SHIFT)
#define RHF_K_HDR_LEN_ERR (0x1ull << 53)
#define RHF_DC_UNC_ERR (0x1ull << 54)
#define RHF_DC_ERR (0x1ull << 55)
#define RHF_RCV_TYPE_ERR_SHIFT 56
#define RHF_RCV_TYPE_ERR_MASK 0x7ul
#define RHF_RCV_TYPE_ERR_SMASK (RHF_RCV_TYPE_ERR_MASK << RHF_RCV_TYPE_ERR_SHIFT)
#define RHF_TID_ERR (0x1ull << 59)
#define RHF_LEN_ERR (0x1ull << 60)
#define RHF_ECC_ERR (0x1ull << 61)
#define RHF_VCRC_ERR (0x1ull << 62)
#define RHF_ICRC_ERR (0x1ull << 63)
#define RHF_ERROR_SMASK 0xffe0000000000000ull /* bits 63:53 */
/* RHF receive types */
#define RHF_RCV_TYPE_EXPECTED 0
#define RHF_RCV_TYPE_EAGER 1
#define RHF_RCV_TYPE_IB 2 /* normal IB, IB Raw, or IPv6 */
#define RHF_RCV_TYPE_ERROR 3
#define RHF_RCV_TYPE_BYPASS 4
#define RHF_RCV_TYPE_INVALID5 5
#define RHF_RCV_TYPE_INVALID6 6
#define RHF_RCV_TYPE_INVALID7 7
/* RHF receive type error - expected packet errors */
#define RHF_RTE_EXPECTED_FLOW_SEQ_ERR 0x2
#define RHF_RTE_EXPECTED_FLOW_GEN_ERR 0x4
/* RHF receive type error - eager packet errors */
#define RHF_RTE_EAGER_NO_ERR 0x0
/* RHF receive type error - IB packet errors */
#define RHF_RTE_IB_NO_ERR 0x0
/* RHF receive type error - error packet errors */
#define RHF_RTE_ERROR_NO_ERR 0x0
#define RHF_RTE_ERROR_OP_CODE_ERR 0x1
#define RHF_RTE_ERROR_KHDR_MIN_LEN_ERR 0x2
#define RHF_RTE_ERROR_KHDR_HCRC_ERR 0x3
#define RHF_RTE_ERROR_KHDR_KVER_ERR 0x4
#define RHF_RTE_ERROR_CONTEXT_ERR 0x5
#define RHF_RTE_ERROR_KHDR_TID_ERR 0x6
/* RHF receive type error - bypass packet errors */
#define RHF_RTE_BYPASS_NO_ERR 0x0
/* IB - LRH header constants */
#define HFI1_LRH_GRH 0x0003 /* 1st word of IB LRH - next header: GRH */
#define HFI1_LRH_BTH 0x0002 /* 1st word of IB LRH - next header: BTH */
/* misc. */
#define SIZE_OF_CRC 1
#define LIM_MGMT_P_KEY 0x7FFF
#define FULL_MGMT_P_KEY 0xFFFF
#define DEFAULT_P_KEY LIM_MGMT_P_KEY
#define HFI1_FECN_SHIFT 31
#define HFI1_FECN_MASK 1
#define HFI1_FECN_SMASK BIT(HFI1_FECN_SHIFT)
#define HFI1_BECN_SHIFT 30
#define HFI1_BECN_MASK 1
#define HFI1_BECN_SMASK BIT(HFI1_BECN_SHIFT)
#define HFI1_PSM_IOC_BASE_SEQ 0x0
/* Number of BTH.PSN bits used for sequence number in expected rcvs */
#define HFI1_KDETH_BTH_SEQ_SHIFT 11
#define HFI1_KDETH_BTH_SEQ_MASK (BIT(HFI1_KDETH_BTH_SEQ_SHIFT) - 1)
static inline __u64 rhf_to_cpu(const __le32 *rbuf)
{
return __le64_to_cpu(*((__le64 *)rbuf));
}
static inline u64 rhf_err_flags(u64 rhf)
{
return rhf & RHF_ERROR_SMASK;
}
static inline u32 rhf_rcv_type(u64 rhf)
{
return (rhf >> RHF_RCV_TYPE_SHIFT) & RHF_RCV_TYPE_MASK;
}
static inline u32 rhf_rcv_type_err(u64 rhf)
{
return (rhf >> RHF_RCV_TYPE_ERR_SHIFT) & RHF_RCV_TYPE_ERR_MASK;
}
/* return size is in bytes, not DWORDs */
static inline u32 rhf_pkt_len(u64 rhf)
{
return ((rhf & RHF_PKT_LEN_SMASK) >> RHF_PKT_LEN_SHIFT) << 2;
}
static inline u32 rhf_egr_index(u64 rhf)
{
return (rhf >> RHF_EGR_INDEX_SHIFT) & RHF_EGR_INDEX_MASK;
}
static inline u32 rhf_rcv_seq(u64 rhf)
{
return (rhf >> RHF_RCV_SEQ_SHIFT) & RHF_RCV_SEQ_MASK;
}
/* returned offset is in DWORDS */
static inline u32 rhf_hdrq_offset(u64 rhf)
{
return (rhf >> RHF_HDRQ_OFFSET_SHIFT) & RHF_HDRQ_OFFSET_MASK;
}
static inline u64 rhf_use_egr_bfr(u64 rhf)
{
return rhf & RHF_USE_EGR_BFR_SMASK;
}
static inline u64 rhf_dc_info(u64 rhf)
{
return rhf & RHF_DC_INFO_SMASK;
}
static inline u32 rhf_egr_buf_offset(u64 rhf)
{
return (rhf >> RHF_EGR_OFFSET_SHIFT) & RHF_EGR_OFFSET_MASK;
}
#endif /* _COMMON_H */
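Putting the helpers above together, a hedged sketch of how a receive header flags qword might be classified; the dispatch itself is illustrative, not the driver's actual receive path.

/* Sketch: classify one RHF qword using the accessors above. Returns the
 * payload length in bytes for error-free eager/expected packets, 0 otherwise. */
static inline u32 classify_rhf_sketch(u64 rhf)
{
	if (rhf_err_flags(rhf))
		return 0;			/* some error bit in 63:53 is set */

	switch (rhf_rcv_type(rhf)) {
	case RHF_RCV_TYPE_EXPECTED:
	case RHF_RCV_TYPE_EAGER:
		return rhf_pkt_len(rhf);	/* already converted to bytes */
	default:
		return 0;			/* IB/bypass/error types not handled here */
	}
}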


@@ -0,0 +1,9 @@
#ifndef _HFI1_FILE_OPS_H_
#define _HFI1_FILE_OPS_H_
#include <ihk/types.h>
#include <uio.h>
ssize_t hfi1_aio_write(void *private_data, const struct iovec *iovec, unsigned long dim);
#endif
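A hedged usage sketch for the declaration above; the private_data argument is assumed to carry the per-open HFI1 file state, and the request buffer layout is a placeholder.

/* Sketch: submit one SDMA request through hfi1_aio_write() using a
 * single-element iovec. 'fd_private' and the request layout are assumptions. */
static ssize_t submit_one_request_sketch(void *fd_private,
					 void *req, size_t req_len)
{
	struct iovec iov = {
		.iov_base = req,
		.iov_len = req_len,
	};

	return hfi1_aio_write(fd_private, &iov, 1);
}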

kernel/include/hfi1/hfi.h (new file, 1232 lines): diff suppressed because it is too large.


@@ -0,0 +1,41 @@
struct hfi1_ctxtdata {
union {
char whole_struct[1160];
struct {
char padding0[144];
u16 ctxt;
};
struct {
char padding1[168];
u32 rcv_array_groups;
};
struct {
char padding2[172];
u32 eager_base;
};
struct {
char padding3[176];
u32 expected_count;
};
struct {
char padding4[180];
u32 expected_base;
};
struct {
char padding5[184];
struct exp_tid_set tid_group_list;
};
struct {
char padding6[208];
struct exp_tid_set tid_used_list;
};
struct {
char padding7[232];
struct exp_tid_set tid_full_list;
};
struct {
char padding8[392];
struct hfi1_devdata *dd;
};
};
};
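These padded unions mirror the Linux hfi1 driver's structure layout so LWK code can read fields at fixed byte offsets; a hedged compile-time check of that assumption could look like the following (C11 static asserts, not present in this tree).

/* Sketch: the anonymous padding/struct pairs above pin each field to the byte
 * offset it has in the Linux driver build; compile-time checks make layout
 * drift fail loudly instead of corrupting memory at runtime. offsetof() is
 * assumed to be available (e.g. via <stddef.h>). */
_Static_assert(offsetof(struct hfi1_ctxtdata, ctxt) == 144,
	       "ctxt offset must match the Linux driver layout");
_Static_assert(offsetof(struct hfi1_ctxtdata, expected_base) == 180,
	       "expected_base offset must match the Linux driver layout");
_Static_assert(sizeof(struct hfi1_ctxtdata) == 1160,
	       "shadow struct must span the whole Linux struct");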


@@ -0,0 +1,65 @@
struct hfi1_devdata {
union {
char whole_struct[7808];
struct {
char padding0[3368];
u8 *kregbase1;
};
struct {
char padding1[3376];
resource_size_t physaddr;
};
struct {
char padding2[3704];
u64 default_desc1;
};
struct {
char padding3[3736];
dma_addr_t sdma_pad_phys;
};
struct {
char padding4[3760];
struct sdma_engine *per_sdma;
};
struct {
char padding5[3768];
struct sdma_vl_map *sdma_map;
};
struct {
char padding6[3816];
void *piobase;
};
struct {
char padding7[3824];
void *rcvarray_wc;
};
struct {
char padding8[4040];
long unsigned int *events;
};
struct {
char padding9[4076];
u32 chip_rcv_contexts;
};
struct {
char padding10[4080];
u32 chip_rcv_array_count;
};
struct {
char padding11[7264];
struct hfi1_pportdata *pport;
};
struct {
char padding12[7296];
u16 flags;
};
struct {
char padding13[7299];
u8 first_dyn_alloc_ctxt;
};
struct {
char padding14[7368];
u64 sc2vl[4];
};
};
};


@@ -0,0 +1,49 @@
struct hfi1_filedata {
union {
char whole_struct[104];
struct {
char padding0[0];
struct hfi1_devdata *dd;
};
struct {
char padding1[8];
struct hfi1_ctxtdata *uctxt;
};
struct {
char padding2[16];
struct hfi1_user_sdma_comp_q *cq;
};
struct {
char padding3[24];
struct hfi1_user_sdma_pkt_q *pq;
};
struct {
char padding4[32];
u16 subctxt;
};
struct {
char padding5[56];
struct tid_rb_node **entry_to_rb;
};
struct {
char padding6[64];
spinlock_t tid_lock;
};
struct {
char padding7[72];
u32 tid_used;
};
struct {
char padding8[80];
u32 *invalid_tids;
};
struct {
char padding9[88];
u32 invalid_tid_idx;
};
struct {
char padding10[92];
spinlock_t invalid_lock;
};
};
};


@@ -0,0 +1,29 @@
struct hfi1_user_sdma_pkt_q {
union {
char whole_struct[352];
struct {
char padding0[4];
u16 n_max_reqs;
};
struct {
char padding1[8];
atomic_t n_reqs;
};
struct {
char padding2[16];
struct hfi1_devdata *dd;
};
struct {
char padding3[32];
struct user_sdma_request *reqs;
};
struct {
char padding4[40];
long unsigned int *req_in_use;
};
struct {
char padding5[288];
enum pkt_q_sdma_state state;
};
};
};

Some files were not shown because too many files have changed in this diff.