Table of contents
- Introduction
- Prerequisites and requirements
- Common issues and solutions
- Post-operation verification
- Emergency procedures
- Contacting support
If you have trouble understanding the terminology in this document, please refer to the Glossary.
NOTE: This is a fairly technical guide, and some of the concepts may require an IT professional. If in doubt, please contact our support team to help with this operation.
Introduction
The "Make Primary Computer" function in the SOS Utilities menu on your SOS Desktop transfers data and configuration files to your backup machine. If the SOS computer currently in operation runs into hardware or software issues, you can switch to your spare "backup" computer instead. This guide is designed to help with any issues you may run into during that process.
Prerequisites and requirements
System requirements
- SOS Version 6.1 or later
- Create manual backups
  - Back up current crontabs on both SOS machines:
    crontab -l > ~/crontab_backup_$(date +%Y%m%d_%H%M)
  - Back up critical data directories to a location outside the automatic backup location:
    /shared/sos/site-backup.[hostname]
  - Document the current primary/backup configuration.
- User accounts
  - Both the sos and sosdemo user accounts must exist on both machines.
    - These should have been created automatically during your initial installation.
    - Please log in as the sosdemo user once so that the necessary directories are created automatically.
  - The passwords for both users on each machine must be known to complete the SSH key setup step.
- Network configuration
  - Both machines must be on the same network.
  - The firewall must be configured to allow SSH connections.
- Directory structure
  - The /shared/sos/ directory structure must exist.
    - This should have been created automatically during your initial installation.
  - There must be sufficient disk space for data synchronization; the amount depends on the size of your custom dataset and playlist data.
- Hostname configuration
  - Machine names must end with '1' or '2' (e.g., sos1/sos2, display1/display2).
  - The /etc/hosts file must contain IP address mappings for both machines:
    - Your IP addresses, including those for your projectors, may differ from the ones shown below. This is just an example /etc/hosts file. Please enter your IP addresses for SOS1 and SOS2 in place of [IP of SOS1] and [IP of SOS2] respectively.
127.0.0.1 sos1 localhost
[IP of SOS1] sos1
[IP of SOS2] sos2
10.1.1.71 projector1
10.1.1.72 projector2
10.1.1.73 projector3
10.1.1.74 projector4
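One way to confirm the mappings before running Make Primary is a small check like the sketch below. The function name hosts_has_entry is hypothetical (it is not part of SOS), and it only confirms that a name appears in the file you pass it; it does not verify that the address is correct.

```shell
#!/bin/sh
# hosts_has_entry FILE NAME
# Succeeds if FILE has a non-comment line that maps an address to NAME.
# (Hypothetical helper; not part of SOS.)
hosts_has_entry() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # drop comments before looking at the fields
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example usage (sos1 and sos2 are the example names from this guide):
# for h in sos1 sos2; do
#     hosts_has_entry /etc/hosts "$h" || echo "missing /etc/hosts entry for $h"
# done
```

Only the second and later fields of each line are compared, so an IP address is never mistaken for a hostname.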
Common issues and solutions
Hostname configuration issues
Symptoms
- Error dialog appears immediately when starting Make Primary operation
- Cannot proceed past initial validation
- System says hostname does not follow naming convention
Error message
Resolution steps
- The Make Primary system requires machines to follow a specific naming convention where paired machines have the same base name and end with '1' and '2'.
  - Examples: sos1/sos2, machine1/machine2
- To fix this issue:
  - Contact your system administrator to rename the machine.
  - Use the command below, where [newname] is the new name of your machine that ends with 1 or 2:
    sudo hostnamectl set-hostname [newname]
  - Update the /etc/hosts file with the new hostname.
  - Reboot the system for changes to take effect.
- Verify the change by running the hostname command.
  - If it outputs your new name, it has been set correctly.
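The naming convention can be checked ahead of time with a sketch like the following. The function peer_hostname is a hypothetical helper, and the validation that Make Primary itself performs may be stricter than this simple suffix test.

```shell
#!/bin/sh
# peer_hostname NAME
# Prints the paired machine's name (sos1 -> sos2, sos2 -> sos1).
# Fails if NAME does not end with 1 or 2. (Hypothetical helper.)
peer_hostname() {
    case "$1" in
        ?*1) printf '%s2\n' "${1%1}" ;;
        ?*2) printf '%s1\n' "${1%2}" ;;
        *)   echo "hostname '$1' must end with 1 or 2" >&2; return 1 ;;
    esac
}

# Example: peer_hostname "$(hostname)"
```

If the function fails for your current hostname, the machine needs to be renamed as described in the steps above.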
SSH connectivity failures
Symptoms
- Cannot connect to peer machine during setup
- Timeouts when attempting SSH connections
- Connection refused errors
- Network unreachable messages
Error messages
Resolution steps
- Verify network connectivity
  - Test connectivity to the "peer/backup" machine:
    ping [peer-hostname]
  - If ping fails, check that the /etc/hosts file contains the correct IP address mapping.
  - Verify both machines are on the same network.
- Check SSH service status
  - On the peer machine:
    sudo systemctl status ssh
  - If not running:
    sudo systemctl start ssh
  - Enable automatic start:
    sudo systemctl enable ssh
- Verify firewall settings
  - Check if UFW is blocking connections:
    sudo ufw status
  - If active, allow SSH:
    sudo ufw allow ssh
- Test manual SSH connection
  - Try connecting manually:
    ssh [peer-hostname]
  - Note any specific error messages for further diagnosis.
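The local checks above can be run in sequence with a sketch like this one. The helpers run_step and check_peer are hypothetical, and the ConnectTimeout value is an arbitrary choice. Because the ssh step uses BatchMode so that it never prompts, a failure there can also mean that keys are not installed yet; run ssh [peer-hostname] by hand to see the actual message.

```shell
#!/bin/sh
# run_step LABEL COMMAND [ARGS...]
# Runs COMMAND quietly and reports OK or FAILED for LABEL. (Hypothetical helper.)
run_step() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "OK: $label"
    else
        echo "FAILED: $label"
        return 1
    fi
}

# check_peer PEER
# Pings the peer, then attempts a non-interactive SSH login.
# Stops at the first step that fails.
check_peer() {
    peer=$1
    run_step "ping $peer" ping -c 3 "$peer" || return 1
    run_step "ssh to $peer" ssh -o BatchMode=yes -o ConnectTimeout=5 "$peer" true || return 1
}

# Example: check_peer sos2
```

The service and firewall checks are left out because they must be run with sudo on the peer machine itself.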
SSH key setup failures
Symptoms
- Terminal window opens but key copy fails
- Password authentication continues to be required
- Permission denied errors during key generation
- SSH key files are missing or corrupted
Error messages
Resolution steps
- Verify SSH directory permissions on your current machine
  - Check directory exists:
    ls -la ~/.ssh
  - Correct permissions:
    chmod 700 ~/.ssh
  - If directory missing:
    mkdir -p ~/.ssh && chmod 700 ~/.ssh
- Check disk space availability on your current machine
  - Verify space on the /shared/ and /boot/efi/ partitions:
    df -h
  - Clean up if necessary to free space for key generation.
- Manual SSH key generation on your current machine
  - Generate key manually:
    ssh-keygen -t rsa -b 2048 -f ~/.ssh/id_rsa -N ''
  - Copy key manually:
    ssh-copy-id -i ~/.ssh/id_rsa.pub [user]@[host]
- Install required terminal emulator on your current machine
  - Install gnome-terminal:
    sudo apt-get install gnome-terminal
- Verify key installation on your current machine
  - Test connection:
    ssh -o BatchMode=yes [user]@[host] echo "success"
  - You should be able to connect without a password prompt.
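The directory and key generation steps above can be combined so that an existing key is never overwritten. This is a minimal sketch; ensure_ssh_dir and ensure_ssh_key are hypothetical helper names, and the ssh-keygen options are the ones given in the manual step.

```shell
#!/bin/sh
# ensure_ssh_dir [DIR]
# Creates DIR (default ~/.ssh) if it is missing and restricts it to the owner.
ensure_ssh_dir() {
    dir=${1:-"$HOME/.ssh"}
    mkdir -p "$dir" && chmod 700 "$dir"
}

# ensure_ssh_key [DIR]
# Generates an RSA key pair in DIR only if no id_rsa file is there yet.
ensure_ssh_key() {
    dir=${1:-"$HOME/.ssh"}
    ensure_ssh_dir "$dir" || return 1
    if [ ! -f "$dir/id_rsa" ]; then
        ssh-keygen -t rsa -b 2048 -f "$dir/id_rsa" -N ''
    fi
}

# Example: ensure_ssh_key && ssh-copy-id -i ~/.ssh/id_rsa.pub [user]@[host]
```

After copying the key, the BatchMode test above confirms whether password-free login works.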
Data synchronization errors
Symptoms
- Rsync commands fail during data transfer
- Partial file transfers or corrupted data
- Permission denied when accessing directories
- Out of disk space during synchronization
Error messages
Resolution steps
- Check disk space on both machines
  - Local machine:
    df -h /shared /home
  - Remote machine:
    ssh [peer] "df -h /shared /home"
  - Free up space if needed before retrying.
- Verify directory permissions
  - Check ownership:
    ls -la [directory_path]
  - Fix ownership if needed:
    sudo chown -R sos:sos [directory_path]
  - Ensure write permissions:
    chmod 755 [directory_path]
- Simulate manual rsync
  - Run a simulated rsync with verbose output:
    rsync -avzu --progress --dry-run --stats [source] [destination]
  - Check for specific error messages, and confirm how much storage space is necessary.
- Verify network stability
  - Test sustained connection:
    ssh [peer] "sleep 30"
  - Check for network interruptions during large transfers.
- Restart synchronization process
  - The process can be re-run safely.
  - Rsync will resume from where it left off.
  - Address underlying issues first (disk space, permissions, connectivity).
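As a rough pre-check of disk space, the size of a source directory can be compared with the free space at a destination. The sketch below works on local paths only, and the function names are hypothetical. It is a worst-case estimate, since rsync transfers only files that differ; the --dry-run --stats command above gives the more precise figure.

```shell
#!/bin/sh
# required_kb PATH: size of PATH in KiB, as reported by du.
required_kb() {
    du -sk "$1" | awk '{ print $1 }'
}

# available_kb PATH: free KiB on the filesystem that holds PATH.
available_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_room SOURCE DEST_DIR
# Succeeds if DEST_DIR's filesystem has at least as much free space
# as SOURCE currently occupies.
has_room() {
    [ "$(required_kb "$1")" -le "$(available_kb "$2")" ]
}

# Example: has_room /shared/sos/media /shared || echo "not enough free space"
```

To check the peer, the same du and df commands can be run there over ssh and the numbers compared by hand.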
Crontab swap failures
Symptoms
- Scheduled tasks not transferred between machines
- Crontab commands fail with permission errors
- Original crontabs lost during swap process
- Temporary files not cleaned up
Error messages
Resolution steps
- Backup existing crontabs before retrying
  - Local backup:
    crontab -l > ~/crontab_backup_$(date +%Y%m%d)
  - Remote backup:
    ssh [peer] "crontab -l > ~/crontab_backup_$(date +%Y%m%d)"
- Verify user account access
  - Test sosdemo access:
    ssh sosdemo@localhost "whoami"
  - Check crontab access:
    ssh sosdemo@localhost "crontab -l"
- Manual crontab restoration
  - If the swap failed partially, restore from backup files.
  - Edit crontab manually:
    crontab -e
  - Verify hostname references are correct after swap.
- Clean up temporary files
  - Remove temp files:
    rm -f ~/tmp_crontab_*
  - On peer:
    ssh [peer] "rm -f ~/tmp_crontab_*"
- Verify crontab hostname adjustments
  - Check that hostnames in cron jobs point to correct machines after swap.
  - Original references to peer machine should now point to local machine.
    - Specifically, the # Sync media and sosrc to backup computer lines should point to your new backup computer's hostname.
  - Edit manually if automatic hostname replacement failed.
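If the hostnames need to be corrected by hand, a filter like the one below can prepare an edited copy for review. This is only a sketch of a manual fix, not how Make Primary performs the replacement. The function name retarget_cron_host and the sos1/sos2 names are assumptions for illustration, and the filter only rewrites rsync destinations written as " HOST:/path".

```shell
#!/bin/sh
# retarget_cron_host OLD NEW
# Reads crontab text on stdin and rewrites destinations of the form
# " OLD:/path" to " NEW:/path". All other text passes through unchanged.
retarget_cron_host() {
    sed "s# $1:/# $2:/#g"
}

# Example (review the result before installing it):
# crontab -l | retarget_cron_host sos1 sos2 > ~/crontab_fixed
# crontab ~/crontab_fixed
```

Writing to a separate file first keeps the current crontab untouched until the edited copy has been checked.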
Terminal emulator issues
Symptoms
- SSH key setup process cannot open terminal window
- Password entry dialog doesn't appear
- Terminal window closes immediately
- No terminal programs available on system
Error messages
Resolution steps
- Install terminal emulators on your current machine
  - Install gnome-terminal:
    sudo apt-get update && sudo apt-get install gnome-terminal
  - Install xterm as fallback:
    sudo apt-get install xterm
  - Verify installation:
    which gnome-terminal xterm
- Test terminal functionality on your current machine
  - Test gnome-terminal:
    gnome-terminal --version
  - Test xterm:
    xterm -version
- Alternative setup method
  - Perform SSH key setup manually from the command line.
  - Use ssh-copy-id directly:
    ssh-copy-id -i ~/.ssh/id_rsa.pub [user]@[host]
  - Enter your password when prompted in the current terminal.
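To see which of the two terminal emulators named above is installed, a check like this sketch can be used. The helper first_available is hypothetical; which emulators Make Primary itself searches for, and in what order, is not stated in this guide.

```shell
#!/bin/sh
# first_available CMD...
# Prints the first CMD that is found on PATH; fails if none are installed.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Example:
# term=$(first_available gnome-terminal xterm) || echo "no terminal emulator found"
```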
Post-operation verification
- Move physical cables from the old primary to the new primary machine.
- Verify data integrity
  - Check that all expected files are present in synchronized directories.
  - Compare file counts and sizes between machines.
  - Test SOS functionality with synchronized data.
- Update iPad configuration
  - Update the iPad's IP address to point to the new primary machine.
  - Test iPad connectivity to the new primary system.
- Update Kiosk configuration
  - The IP address in the Kiosk Admin Panel will need to be updated to your new machine's address.
  - See our Kiosk Troubleshooting for more information.
- Verify scheduled tasks
  - Check crontab entries:
    crontab -e
  - Verify hostname references point to correct machines (confirm [BACKUP-HOSTNAME] below matches your new backup machine's hostname):
    # Sync media and sosrc to backup computer
    35 03 * * * rsync -vzrl --delete /shared/sos/media/ [BACKUP-HOSTNAME]:/shared/sos/media/ > $HOME/soslogs/media-sync.log 2>&1
    35 03 * * * rsync -vzrl --delete /home/sos/sosrc/ [BACKUP-HOSTNAME]:/home/sos/sosrc/ > $HOME/soslogs/sosrc-sync.log 2>&1
    35 04 * * * rsync -vzrl --delete /shared/sos/rt/ [BACKUP-HOSTNAME]:/shared/sos/rt/ > $HOME/soslogs/rt-sync.log 2>&1
    35 05 * * * rsync -vzrl --delete /shared/sos/database/ [BACKUP-HOSTNAME]:/shared/sos/database/ > $HOME/soslogs/db_backup.log 2>&1
  - Confirm that scheduled backup processes work correctly by checking the /shared/sos/site-backup.HOSTNAME folder on your new backup machine the next day.
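For the file count and size comparison under "Verify data integrity", a summary like the sketch below can be run on each machine and the two results compared. The function name dir_summary is hypothetical, and sizes reported by du can differ slightly between machines even when the files are identical, so the file count is the more reliable figure.

```shell
#!/bin/sh
# dir_summary DIR
# Prints "<file count> <total KiB>" for DIR.
dir_summary() {
    count=$(find "$1" -type f | wc -l | tr -d ' ')
    size=$(du -sk "$1" | awk '{ print $1 }')
    echo "$count $size"
}

# Example (sos2 stands in for your peer's hostname):
# dir_summary /shared/sos/media
# ssh sos2 "find /shared/sos/media -type f | wc -l; du -sk /shared/sos/media"
```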
Emergency procedures
- Process interruption or system failure
  - If the Make Primary process is interrupted:
    - Check for partial data transfers in target directories:
      - /shared/sos/media/site-custom
      - /shared/sos/site-config
      - /home/sos/sosrc, /home/sosdemo/sosrc
    - Verify crontabs on both machines, and restore from backup if corrupted.
      - View crontab using crontab -l
    - Re-run the Make Primary process; it is designed to be safely repeatable.
  - Document any error messages before restarting
    - Take screenshots of error dialogs
    - Note exact timing of when process failed
- Data recovery after failed synchronization
  - Check backup locations for original data:
    - Automatic backups: /shared/sos/site-backup.[hostname]/
    - Manual crontab backups in user home directories
  - Restore from backups if synchronization corrupted data
    - Stop SOS services before restoring data
    - Copy backup data to original locations
    - Restart services and verify functionality
- SSH connectivity completely lost
  - Physical access to both machines may be required
    - Reset SSH configurations on both machines
    - Regenerate host keys if necessary
    - Verify network configuration and /etc/hosts files
Contacting support
If issues persist after trying these solutions, contact SOS support at sos.support@noaa.gov with the following information:
- SOS version number
- Ubuntu version on both machines
- Hostnames of both SOS machines
- Exact error messages (screenshots preferred)
- Network configuration details (/etc/hosts contents)
- Steps already attempted to resolve the issue
- Current /shared/sos/site-config/primary.conf file contents (if it exists)
- Include output of diagnostic commands from each machine:
  - hostname
  - ssh -o BatchMode=yes [peer] echo "test"
  - ls -la ~/.ssh/
  - df -h /shared /home
  - crontab -l (if accessible)
- Describe current system state (which machine is currently primary)
- Include recent system changes or updates
- Prepare TeamViewer access capability (if available)
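The diagnostic commands listed above can be gathered into one text file to attach to a support request. This is a sketch only: section and collect_diagnostics are hypothetical helpers, the ConnectTimeout option is an addition so the ssh test cannot hang, and the output file name in the example is arbitrary.

```shell
#!/bin/sh
# section TITLE COMMAND [ARGS...]
# Prints a header, then the command's output. A failing command is noted
# in the report instead of stopping it. (Hypothetical helper.)
section() {
    title=$1
    shift
    echo "== $title =="
    "$@" 2>&1 || echo "(command failed with status $?)"
}

# collect_diagnostics PEER
# Runs the diagnostic commands from the list above. Run it on each machine.
collect_diagnostics() {
    section "hostname" hostname
    section "ssh to $1" ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" echo test
    section "~/.ssh" ls -la "$HOME/.ssh/"
    section "disk space" df -h /shared /home
    section "crontab" crontab -l
}

# Example: collect_diagnostics sos2 > ~/sos_diagnostics_$(hostname).txt
```

All of these commands only read information; none of them change the system.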
Do not change the primary.conf file or modify crontabs during active troubleshooting unless specifically instructed by support. The Make Primary process is designed to handle these operations safely.