I was benchmarking forkIO
with the following code:
import System.Time.Extra
import Control.Concurrent
import Control.Monad
import Data.IORef

n = 200000

main :: IO ()
main = do
    bar <- newEmptyMVar
    count <- newIORef (0 :: Int)
    (d, _) <- duration $ do
        replicateM_ n $ do
            forkIO $ do
                v <- atomicModifyIORef' count $ \old -> (old + 1, old + 1)
                when (v == n) $ putMVar bar ()
        takeMVar bar
    putStrLn $ showDuration d
This spawns 200K threads, counts how many have run using an IORef, and finishes once they have all started. When run on GHC 8.10.1 on Windows with the command ghc --make -O2 Main -threaded && main +RTS -N4, the performance varies remarkably. Sometimes it takes > 1s (e.g. 1.19s) and sometimes it takes < 0.1s (e.g. 0.08s). It lands in the faster bucket about 1/6th of the time. Why the difference in performance? What causes it to go faster?
When I scale n up to 1M, the effect goes away and it's always in the 5+s range.
I can confirm the same behavior on Ubuntu as well, except that when I set n=1M this behavior does not go away, and the runtime ranges from 2 to 7 seconds for me. I believe the non-determinism of the scheduler is the reason for such a huge variance in runtime. Of course, this is not a definitive answer, since it is just my guess.
atomicModifyIORef' is implemented with CAS (compare-and-swap), so depending on how the threads are scheduled, the function old + 1 will be recomputed more or fewer times. In other words, if a thread B updates the count ref after thread A has read it but before A finishes its own update, A has to start the update operation over from the beginning, reading the newly updated value from the ref and recomputing old + 1 once again.

If you run main +RTS -N1, you will see that not only does the program take a lot less time to run, but the runtime is also pretty consistent between executions. I suspect this is because only one thread can run at any time and there is no preemption until atomicModifyIORef' is done.

Hopefully others with deeper knowledge of the Haskell RTS can provide more insight into this behavior, but that is my take on it.