Run time problems FAQ

When I run the Tobyfit GUI it warns that <startup_tobyfit> cannot find either ffind or load_spe_df.

ffind and load_spe_df are binary MEX files, compiled from C or Fortran source using the Matlab mex script. Binaries for 32-bit Windows and 64-bit Linux machines are distributed with the Tobyfit code. On any other architecture (e.g. 64-bit Windows) you must compile the file ffind.c, in the directory src/plot/mslice/mslice, using the mex script for your system; otherwise mslice will not be able to display SPE data. For best performance you should also compile the Fortran files, such as load_spe_df.f, using a mex script that is properly configured for your architecture and compiler.
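As a rough sketch of the build step for ffind.c, the commands below could be run at the Matlab prompt. The variable tobyfit_root is a placeholder for wherever Tobyfit is installed (an assumption, not a path taken from the distribution), and a C compiler is assumed to have been selected already with mex -setup.

```matlab
% Sketch only: tobyfit_root is a hypothetical placeholder for your install directory.
tobyfit_root = pwd;
src_dir  = fullfile(tobyfit_root, 'src', 'plot', 'mslice', 'mslice');
src_file = fullfile(src_dir, 'ffind.c');

if exist(src_file, 'file') == 2
    % Build the MEX file next to its source so that mslice can find it.
    mex('-outdir', src_dir, src_file);
    fprintf('Built ffind.%s in %s\n', mexext, src_dir);
else
    fprintf('ffind.c not found in %s; check tobyfit_root\n', src_dir);
end
```

Once the directory is on the Matlab path, exist('ffind') should return 3, which indicates a MEX file.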

I do not have access to some of the grid resources listed and want to avoid spending time testing them at start up, i.e. remove them from the list of machines.

The set of grid resources to be tested is given by the file Resources.properties, which is located in the directory GaptkComputeToolbox/lib. This will be either in $matlabroot/toolbox or in the e-science directory. The file contains one block of values for each grid resource, and you can remove a resource by deleting all the values associated with it. Users outside RAL may wish to remove the hosts scarf and mordred, which are usually only available to users on site.

How do I monitor the progress of a job?

The job status tables for fitting and simulation are not updated automatically. You must click the query button to check whether a job has finished and to collect the results. For local Windows jobs it is useful to keep the task manager running, even if iconised, since its CPU usage icon is then visible on the task bar.

When you query an active local fitting job, the current version of the diagnostics file is shown in the text area. The details of the last completed fitting step appear at the end of the file, so you can check whether the fit appears to be converging. Remote jobs must also be polled using the query button; this returns information on the current state of fitting jobs, but not of simulation jobs. For local fitting jobs the diagnostics file is returned during the course of the computation, while for remote grid jobs the output file is returned. The two files contain similar information.

Why are some local jobs marked ACTIVE even when the executables are finished?

This is thought to be due to a bug in the local MPICH system, which runs parallel jobs on Windows and can fail to inform the GUI that a parallel job has finished. This should be fixed in a future version of MPICH. If the system load average has fallen to zero and, after some time, the local job is still marked ACTIVE, select the job and press CANCEL. Then inspect the output file to see whether the job did finish successfully.

Why did my job fail to return any plotting files?

Possible causes of failure include that the correct model has not been compiled with the executable or that the selected parameters are not physical. In either case the job may be marked as "COMPLETE-ERR" or "FAILED" and it is necessary to inspect the output file and/or the diagnostic file to see what errors are reported.

tobyfit_update_models('localhost') does not work on Windows

Compiling and linking a new SQW model requires that you have the correct compiler available and that the compile script can find and run it. The current software requires Intel Fortran IA32 and has been tested with compiler versions 9 to 11, with MS Visual Studio 2005. If you do not have the Intel Fortran compiler available you cannot currently build new SQW models on the localhost. If you have access to grid resources such as the NGS you can build new models there.

If an appropriate Intel Fortran compiler is installed and tobyfit_update_models('localhost') still fails, you might have an older version of MS Visual Studio installed. VS2003 generates many error messages because it uses a different version of the C library. The current version of the compile script, bin\tf_build_local.bat, includes the ifort option /MT to force use of the multithreaded C library. This is the default with VS2005 but not with VS2003.

It is also possible that the Intel Fortran compiler has not been installed correctly. Check that you can compile and link a simple test program using just the ifort command. Sometimes it is necessary to uninstall and then reinstall the compiler. To get additional details of any errors reported by the compiler, use these commands at the Matlab command prompt after starting the Tobyfit GUI:

 tf_dbg
 tobyfit_update_models('localhost')
 tf_dbg


When using tobyfit with recent versions of Matlab why are there warning messages about the EventDispatchThread (EDT)?

The Java Matlab interface is evolving and the API for Matlab calls to Java Swing is not yet finally defined. The current Java implementation allows Java Swing events to occur on the main thread instead of the Java EDT. This warning message highlights this problem, which will be addressed in a future release. For the present a possible fix to suppress these messages is to create (if not already present) a file "java.opts" in your Matlab startup directory and add the value "-Dmathworks.WarnIfWrongThread=false". This has not been tested.
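A minimal sketch of creating the file from the Matlab prompt is given below. It assumes that the current directory immediately after launching Matlab is the startup directory, and, like the fix itself, it has not been tested with Tobyfit.

```matlab
% Sketch only: assumes the current directory is the Matlab startup directory.
startup_dir = pwd;
opts_file   = fullfile(startup_dir, 'java.opts');
option      = '-Dmathworks.WarnIfWrongThread=false';

% Read any existing contents so that the option is not added twice.
if exist(opts_file, 'file') == 2
    contents = fileread(opts_file);
else
    contents = '';
end

if isempty(strfind(contents, option))
    fid = fopen(opts_file, 'a');             % append, creating the file if needed
    if fid < 0
        error('Cannot open %s for writing', opts_file);
    end
    % Start a new line first if the existing file does not end with one.
    if ~isempty(contents) && contents(end) ~= char(10)
        fprintf(fid, '\n');
    end
    fprintf(fid, '%s\n', option);
    fclose(fid);
end
```

Matlab reads java.opts only when it starts, so it must be restarted before the setting can take effect.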

Why do I sometimes get plots of cut data without any axes shown?

This is thought to be an interaction between mslice and mgenie. It can usually be fixed by destroying the current graphics window and redrawing it.

Why does fitting fail to converge with Monte Carlo integration?

When using Monte Carlo based integration, the calculated chi^2 is subject to some variation that depends on the nature of the integration process and on the number of steps and tolerance specified for the method. If the random noise in chi^2 is comparable to the tolerance requested for convergence of this value, the fit can exceed the maximum number of iterations allowed without converging. The chi^2 convergence value can be adjusted in the "fit options" panel, which is accessed by the button on the fitting tab. The value is an absolute one, so it may be necessary to attempt a fit first to determine the typical value of chi^2, and hence a reasonable tolerance to use. There is a trade-off between the number of MC integration steps needed for a sufficiently accurate chi^2 and the total computation time.
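One way to judge a reasonable tolerance is to measure the scatter in chi^2 directly, by evaluating it several times at fixed parameters. The sketch below does this for a toy problem. The model, the data and the names model_mc and chi2_mc are invented for illustration and are not Tobyfit routines.

```matlab
% Toy illustration only: model_mc and chi2_mc are hypothetical stand-ins,
% not Tobyfit routines.
x     = linspace(0, 1, 20);                  % 20 "measured" points
width = 0.2;                                 % width of the toy resolution function
sigma = 0.05;                                % error bar on each point

% Exact broadened model plus one fixed realisation of measurement noise
y_obs = exp(-x) * sinh(width/2) / (width/2) + sigma * randn(size(x));

% Monte Carlo estimate of the broadened model using nmc samples per point
model_mc = @(nmc) mean(exp(-(repmat(x, nmc, 1) + ...
                   width * (rand(nmc, numel(x)) - 0.5))), 1);
chi2_mc  = @(nmc) sum(((y_obs - model_mc(nmc)) / sigma) .^ 2);

nmc_list = [100 1000 10000];
nrep     = 50;                               % repeat evaluations at fixed parameters
scatter  = zeros(size(nmc_list));
for i = 1:numel(nmc_list)
    chi2 = zeros(nrep, 1);
    for k = 1:nrep
        chi2(k) = chi2_mc(nmc_list(i));
    end
    scatter(i) = std(chi2);                  % noise in chi^2 at this number of steps
    fprintf('nmc = %6d   mean chi2 = %8.3f   scatter = %7.4f\n', ...
            nmc_list(i), mean(chi2), scatter(i));
end
```

A convergence tolerance well below the printed scatter cannot be met reliably, so either the tolerance should be loosened or the number of integration steps increased.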

To reduce the variation in chi^2 values you can specify integration mc_type=1 (Sobol sequence) or mc_type=0 (random sequence, but from a fixed seed point for each chi^2 evaluation). Both methods ensure that exactly the same set of "random" values is used in each evaluation of chi^2. However, you should be aware that these methods may hide the variability of the calculation, and it is advisable to perform at least some simulations with other mc_type values, e.g. 4, which uses random numbers seeded by the wall clock time.
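The effect of a fixed seed can be seen in the toy sketch below, which re-seeds the generator before each evaluation so that repeated calls return identical values. It uses Matlab's rng function (available from R2011a) and a made-up chi2_mc function. It does not call Tobyfit or reproduce its mc_type implementation.

```matlab
% Toy illustration only: chi2_mc is a hypothetical stand-in, not a Tobyfit routine.
x     = linspace(0, 1, 20);
sigma = 0.05;
y_obs = exp(-x);
nmc   = 1000;                  % Monte Carlo samples per point

% chi^2 from a Monte Carlo estimate of a slightly broadened model
chi2_mc = @() sum(((y_obs - mean(exp(-(repmat(x, nmc, 1) + ...
                0.2 * (rand(nmc, numel(x)) - 0.5))), 1)) / sigma) .^ 2);

seed = 12345;                  % arbitrary fixed seed
rng(seed);  c1 = chi2_mc();    % same seed ...
rng(seed);  c2 = chi2_mc();    % ... same "random" values, so c1 equals c2
c3 = chi2_mc();                % generator not reset: a different value

fprintf('fixed seed: %.6f  %.6f   no reset: %.6f\n', c1, c2, c3);
```

Identical values make convergence easier to reach, but the difference between c1 and c3 is a reminder that the underlying integration noise is still present.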
