* Observing the problem. Without being able to reproduce the problem, one cannot observe it or find any new facts.
* Check for success. How do you know that the problem is actually fixed?
2. Reproducing is Tough
* Reproducing is one of the toughest problems in debugging.
* One must recreate the environment in which the problem occurred
* One must also recreate the problem history: the steps that led to the problem
3. Reproducing the Environment
* Iterative Reproduction
* Start with your environment
* While the problem is not reproduced, adapt more and more circumstances from the user’s environment
* Iteration ends when problem is reproduced (or when environments are “identical”)
* Side effect: Learn about failure-inducing circumstances
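The iterative procedure above can be sketched as a loop. This is only a minimal illustration under stated assumptions: an environment is modeled as a dictionary of circumstances, and `reproduce_iteratively`, `fails`, and the sample settings are hypothetical names invented for this sketch.

```python
def reproduce_iteratively(local_env, user_env, fails):
    """Adopt the user's circumstances one at a time until the problem shows up.

    local_env, user_env: dicts mapping a circumstance name to its value
    (a hypothetical model of an environment).
    fails: callable that runs the program under an environment and returns
    True if the problem occurs.
    Returns (env, adopted); env is None if the problem never reproduced.
    """
    env = dict(local_env)          # start with your own environment
    adopted = []
    if fails(env):
        return env, adopted
    for key, value in user_env.items():
        if env.get(key) == value:
            continue               # already identical in this respect
        env[key] = value           # adapt one more circumstance
        adopted.append(key)
        if fails(env):
            return env, adopted    # problem reproduced
    return None, adopted           # environments are now "identical"


# Made-up example: the failure only occurs with the user's locale.
local = {"os": "linux", "locale": "en_US", "db_version": "9.6"}
user = {"os": "linux", "locale": "tr_TR", "db_version": "9.5"}
env, adopted = reproduce_iteratively(
    local, user, lambda e: e["locale"] == "tr_TR"
)
print(adopted)  # ['locale']
```

The list of adopted circumstances is the side effect mentioned above: it narrows down which differences are failure-inducing.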
4. Reproducing Execution
* Basic idea: Any execution is determined by the input (in a general sense)
* Reproducing input → reproducing execution!

4.1 Data
* Easy to transfer and replicate
* Caveat #1: Get all the data you need
* Caveat #2: Get only the data you need
* Caveat #3: Privacy issues
4.2 User interaction
* Record and replay
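A minimal sketch of record and replay for user interaction, assuming that events are plain dictionaries and that the program exposes a single event handler. `EventRecorder`, `replay`, and `Counter` are hypothetical names used only for illustration.

```python
import json


class EventRecorder:
    """Sits between the event source and the handler and logs every event."""

    def __init__(self, handler):
        self.handler = handler
        self.log = []

    def dispatch(self, event):
        self.log.append(event)
        return self.handler(event)

    def dump(self):
        # Serialize the log so it can be attached to a problem report.
        return json.dumps(self.log)


def replay(serialized_log, handler):
    """Feed the recorded events to a handler in their original order."""
    for event in json.loads(serialized_log):
        handler(event)


class Counter:
    """Toy application: 'plus' clicks increment, the 'r' key resets."""

    def __init__(self):
        self.value = 0

    def handle(self, event):
        if event["type"] == "click" and event["target"] == "plus":
            self.value += 1
        elif event["type"] == "key" and event["key"] == "r":
            self.value = 0


original = Counter()
recorder = EventRecorder(original.handle)
for event in [
    {"type": "click", "target": "plus"},
    {"type": "click", "target": "plus"},
    {"type": "key", "key": "r"},
    {"type": "click", "target": "plus"},
]:
    recorder.dispatch(event)

fresh = Counter()
replay(recorder.dump(), fresh.handle)
print(original.value, fresh.value)  # 1 1
```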
4.3 Communication
* General idea: Record and replay, like user interaction
* Bad impact on performance
* Alternative #1: Only record since last checkpoint (= reproducible state)
* Alternative #2: Only record “last” transaction
ZW comments: not sure what the alternatives mean.
4.4 Randomness
* Program behaves differently in every run
* Based on random number generator
* Pseudo-random: save the seed (and make it configurable). The same applies to time of day
* True random: record + replay sequence
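For the pseudo-random case, saving the seed might look like the following. This is a sketch only: `simulate` and its `seed` and `now` parameters are hypothetical, and a real program would write both values to its log.

```python
import random
import time


def simulate(seed=None, now=None):
    """Run a toy computation whose result depends on randomness and time.

    Both influences are configurable, so a failing run can be repeated
    by passing in the values it reported.
    """
    if seed is None:
        seed = random.SystemRandom().randrange(2**32)  # fresh seed per run
    if now is None:
        now = time.time()
    rng = random.Random(seed)  # private generator, no hidden global state
    samples = [rng.randint(1, 6) for _ in range(5)]
    return {"seed": seed, "now": now, "samples": samples}


first = simulate()                                       # differs per run
again = simulate(seed=first["seed"], now=first["now"])   # exact repeat
print(first["samples"] == again["samples"])  # True
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps other code from disturbing the sequence between runs.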
4.5 Operating System
* The OS handles the entire interaction between program and environment
* Recording and replaying OS interaction thus makes the entire program run reproducible
* Trace the program and replay the trace
* Trace Challenges
Tracing creates lots of data
Example: Web server with 10 requests/sec. A trace of 10 KB/request means about 8 GB/day
All of this must be replayed to reproduce the failure (alternative: checkpoints)
Huge performance penalty!
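The estimate in the web-server example can be checked with a quick calculation. Reading "10 k" as 10,000 bytes per request is an assumption made here.

```python
REQUESTS_PER_SECOND = 10
TRACE_BYTES_PER_REQUEST = 10_000  # assumption: "10 k" means 10,000 bytes
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400

bytes_per_day = REQUESTS_PER_SECOND * TRACE_BYTES_PER_REQUEST * SECONDS_PER_DAY
print(f"{bytes_per_day / 1e9:.2f} GB/day")  # 8.64 GB/day
```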
4.6 Scheduling
* Thread changes are induced by a scheduler
* It suffices to record the schedule (i.e. the moments in time at which thread switches occur) and to replay it
* Requires deterministic input replay
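To illustrate recording and replaying a schedule, the sketch below models threads as Python generators and a thread switch as a `yield`. This is a toy cooperative scheduler, not how an OS scheduler is actually instrumented; `worker` and `run` are hypothetical names.

```python
import random


def worker(shared):
    local = shared["counter"]        # read
    yield                            # a thread switch may happen here
    shared["counter"] = local + 1    # write based on a possibly stale read


def run(schedule=None, seed=None, n_workers=3):
    """Run the workers; return the final counter and the schedule used.

    With schedule=None the scheduler picks threads pseudo-randomly and
    records its choices. With a recorded schedule it replays them.
    """
    shared = {"counter": 0}
    threads = {i: worker(shared) for i in range(n_workers)}
    rng = random.Random(seed)
    replaying = iter(schedule) if schedule is not None else None
    recorded = []
    while threads:
        if replaying is not None:
            tid = next(replaying)
        else:
            tid = rng.choice(sorted(threads))
        recorded.append(tid)
        try:
            next(threads[tid])       # run the thread until its next switch
        except StopIteration:
            del threads[tid]         # thread finished
    return shared["counter"], recorded


result, schedule = run()                      # some arbitrary schedule
replayed, _ = run(schedule=schedule)          # same schedule, same result
print(result == replayed)                     # True

print(run(schedule=[0, 0, 1, 1, 2, 2])[0])    # 3: no interleaving
print(run(schedule=[0, 1, 2, 0, 1, 2])[0])    # 1: all reads precede all writes
```

The workers here take no external input, which is why the schedule alone determines the result; with real input, the input would have to be replayed deterministically as well.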
Constructive Solutions
* Lock resource before writing
* Check resource update time before writing
* ... or any other synchronization mechanism
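The first of these solutions, locking a resource before writing, might look like this with Python's standard `threading.Lock`. The shared counter and the `increment_many` helper are made up for illustration.

```python
import threading


def increment_many(counter, lock, times):
    for _ in range(times):
        with lock:                  # lock the resource before writing
            counter["value"] += 1   # read-modify-write is now exclusive


counter = {"value": 0}
lock = threading.Lock()
threads = [
    threading.Thread(target=increment_many, args=(counter, lock, 10_000))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter["value"])  # 40000
```

With the lock in place the result no longer depends on the schedule, so there is nothing schedule-related left to reproduce.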
4.7 Physical Influences
Rare and hard to reproduce
* Static electricity
* Alpha particles (not cosmic rays)
* Quantum effects
* Humidity
* Mechanical failures + real bugs
4.8 Debugging Tools
* Heisenbug: Code fails outside debugger only
* Bohr Bug, Mandelbug, Schrödinbug
Bohr Bug = Repeatable under well-defined conditions
Heisenbug = Changes when observed
Mandelbug = Causes are complex and chaotic; appears non-deterministic, but isn’t
Schrödinbug = Never should have worked, and promptly fails as soon as one realizes this
5. Isolating Units
* Capture + replay unit instead of program
* Needs a unit control layer to monitor input
Examples:
* Databases. Replay only the interaction with the database.
* Compilers. Record + replay intermediate data structures rather than the entire front-end.
* Networking. Record + replay communication calls.
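A unit control layer can be sketched as a proxy that logs every call into the unit, plus a stand-in that answers from the log. This is one possible design under simplifying assumptions (positional arguments only, a strictly sequential caller); `RecordingLayer`, `ReplayLayer`, `InMemoryDatabase`, and `report` are hypothetical names.

```python
class RecordingLayer:
    """Forwards calls to the real unit and logs (name, args, result)."""

    def __init__(self, unit):
        self._unit = unit
        self.log = []

    def __getattr__(self, name):
        target = getattr(self._unit, name)

        def wrapper(*args):
            result = target(*args)
            self.log.append((name, args, result))
            return result

        return wrapper


class ReplayLayer:
    """Stands in for the unit and answers calls from a recorded log."""

    def __init__(self, log):
        self._log = list(log)

    def __getattr__(self, name):
        def wrapper(*args):
            expected_name, expected_args, result = self._log.pop(0)
            if (expected_name, expected_args) != (name, args):
                raise AssertionError(f"replay diverged at {name}{args}")
            return result

        return wrapper


class InMemoryDatabase:
    """Toy unit standing in for a real database."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value
        return True

    def get(self, key):
        return self._rows.get(key)


def report(db):
    db.put("user", "alice")
    return f"hello {db.get('user')}"


recording = RecordingLayer(InMemoryDatabase())
live_output = report(recording)                      # talks to the real unit
replayed_output = report(ReplayLayer(recording.log)) # no database needed
print(live_output == replayed_output)                # True
```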
More Interaction
* Variables (hard to detect)
* Other units (break dependency if needed)
* Time (record + replay, too)