# Multiple subject, single trial example

## Conda environment
*Always activate a conda environment before starting. This notebook uses [mkconda](https://anaconda.org/kutaslab/mkconda) 0.0.11.*

## Python Libraries

In [12]:
from pathlib import Path
from mkpy import mkh5
import pandas as pd

## Build the file
The file is now built in a loop that incorporates multiple subjects into one file. This makes the following changes:
* A variable (`sids`) at the beginning lists the subjects you want to include in the loop/file
* The data file names are created inside the loop (the h5 file names are not, since there is only one h5 file)
* The mkh5 commands that create the continuous file go inside the loop
* An `if` statement guards the `append_mkdata` command for the subject with an extra cal file
* Commands that run after the continuous file is built (`get_event_table`, `set_epochs`, `export_epochs`) operate on the entire file and are not in the loop
* If you need a refresher on any of the commands, see {doc}`singlesubject_binned`

The `%%time` cell magic has been added at the top of the cell to show that the file now takes a while to build (just over two hours in the run below).

Code Map:
* For this example we will use the .xlsx format for the code map (see the xlsx example in {doc}`codemaps`)
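
Because the build runs for a long time, it may be worth confirming that every input file exists before starting the loop. The sketch below is one way to do that. `find_missing_inputs` is a hypothetical helper (it is not part of mkpy), and it assumes the same file naming scheme as the cell that follows: the `.yhdr` file in the mkpy directory and the `.crw` and `.x.log` files in the mkdig directory.

```python
from pathlib import Path


def find_missing_inputs(sids, mkdig_dir, mkpy_dir, prefix="stm"):
    """Return the expected per-subject input files that do not exist on disk.

    Hypothetical helper, not part of mkpy; the naming scheme is an assumption
    taken from this notebook.
    """
    missing = []
    for sid in sids:
        sub = prefix + sid
        expected = [
            Path(mkpy_dir) / (sub + ".yhdr"),
            Path(mkdig_dir) / (sub + ".crw"),
            Path(mkdig_dir) / (sub + ".x.log"),
        ]
        # keep only the paths that are absent
        missing.extend(p for p in expected if not p.exists())
    return missing


# example call with this notebook's variables (commented out so the sketch runs standalone):
# missing = find_missing_inputs(sids, MKDIG_DIR, MKPY_DIR)
# assert not missing, f"missing input files: {missing}"
```

Running a check like this first means a mistyped subject number fails in seconds rather than partway through the build.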

In [13]:
%%time
# set data directories
MKDIG_DIR = Path("../../mkdig")
MKPY_DIR = Path("../../mkpy")

# set the subject number(s) to loop across
sids = ['02', '03', '04', '05', '06', '07', '08', '09', '10', 
    '12', '13', '14', '15', '16', '17', '19', '20', '21',
    '22', '23', '24', '25', '26', '27', '28', '30', '31',
    '33', '34', '35']

# name files  
h5_f = MKPY_DIR / ("stmath.h5")
epochtable = MKPY_DIR / ("stmath.epochs.h5")

# reset the .h5 file
myh5 = mkh5.mkh5(h5_f)
myh5.reset_all()

# loop over subjects to add to one h5 file
for sid in sids:
    
    # subject number
    sub = 'stm'+sid
    
    # subject file names
    yhdr = MKPY_DIR / (sub + ".yhdr")
    eeg = MKDIG_DIR / (sub + ".crw")
    log = MKDIG_DIR / (sub +".x.log")

    # load in subject files
    myh5.create_mkdata(sub, eeg, log, yhdr)
    
    # stm03 has a separate cal file, so append the data
    if sub == 'stm03':
        myh5.append_mkdata('stm03', 
                           MKDIG_DIR / ("stm03cal.crw"), 
                           MKDIG_DIR / ("stm03cal.x.log"), 
                           MKPY_DIR / ("stm03.yhdr"))    

        
    # calibrate data
    pts, pulse, lo, hi, ccode = 5, 10, -40, 40, 0
    myh5.calibrate_mkdata(sub, # specific data group
                        n_points = pts,   # pts to average
                        cal_size = pulse, # uV
                        lo_cursor = lo,   # lo_cursor ms
                        hi_cursor = hi,   # hi_cursor ms
                        cal_ccode= ccode) # condition code

# Find event codes in the data                        
event_table_raw = myh5.get_event_table(MKPY_DIR / ("stmath_code_map_new.xlsx"))

Found cals in /stm02/dblock_5
Calibrating block /stm02/dblock_0 of 6: (148224,)  
Calibrating block /stm02/dblock_1 of 6: (146944,)  
Calibrating block /stm02/dblock_2 of 6: (148224,)  
Calibrating block /stm02/dblock_3 of 6: (271104,)  
Calibrating block /stm02/dblock_4 of 6: (122624,)  
Calibrating block /stm02/dblock_5 of 6: (19200,)  


/home/astoermann/.conda/envs/test_jupyter_books_11/lib/python3.6/site-packages/mkpy/mkh5.py:3406: LogRawEventCodeMismatch: ../mkdig/stm03.crw eeg data underrun ../mkdig/stm03.x.log, dropping trailing log events [[   291 725055      0      0]
 [   291 725180      0      0]
 [   291 725305      0      0]
 [   291 725430      0      0]
 [   291 725555      0      0]
 [   291 725685      0      0]
 [   291 725810      0      0]
 [   291 725935      0      0]
 [   291 726060      0      0]
 [   291 726185      0      0]
 [   291 726311      0      0]
 [   291 726436      0      0]
 [   291 726561      0      0]
 [   291 726686      0      0]
 [   291 726811      0      0]
 [   291 726936      0      0]
 [   291 727061      0      0]
 [   291 727187      0      0]
 [   291 727312      0      0]
 [   291 727437      0      0]
 [   291 727562      0      0]
 [   291 727687      0      0]
 [   291 727812      0      0]
 [   291 727938      0      0]
 [   291 728063      0      0]
 [   291 72818

Found cals in /stm03/dblock_8
Calibrating block /stm03/dblock_0 of 9: (31232,)  
Calibrating block /stm03/dblock_1 of 9: (92160,)  
Calibrating block /stm03/dblock_2 of 9: (121600,)  
Calibrating block /stm03/dblock_3 of 9: (119040,)  
Calibrating block /stm03/dblock_4 of 9: (84992,)  
Calibrating block /stm03/dblock_5 of 9: (39680,)  
Calibrating block /stm03/dblock_6 of 9: (116992,)  
Calibrating block /stm03/dblock_7 of 9: (119296,)  
Calibrating block /stm03/dblock_8 of 9: (19200,)  
Found cals in /stm04/dblock_7
Calibrating block /stm04/dblock_0 of 8: (122368,)  
Calibrating block /stm04/dblock_1 of 8: (118784,)  
Calibrating block /stm04/dblock_2 of 8: (115968,)  
Calibrating block /stm04/dblock_3 of 8: (116992,)  
Calibrating block /stm04/dblock_4 of 8: (67328,)  
Calibrating block /stm04/dblock_5 of 8: (47360,)  
Calibrating block /stm04/dblock_6 of 8: (115200,)  
Calibrating block /stm04/dblock_7 of 8: (17920,)  
Found cals in /stm05/dblock_6
Calibrating block /stm05/dblock_0 

  len(fails)


Calibrating block /stm10/dblock_1 of 7: (116736,)  
Calibrating block /stm10/dblock_2 of 7: (114688,)  
Calibrating block /stm10/dblock_3 of 7: (119808,)  
Calibrating block /stm10/dblock_4 of 7: (115200,)  
Calibrating block /stm10/dblock_5 of 7: (114944,)  
Calibrating block /stm10/dblock_6 of 7: (18944,)  
Found cals in /stm12/dblock_8
Calibrating block /stm12/dblock_0 of 9: (125440,)  
Calibrating block /stm12/dblock_1 of 9: (4608,)  
Calibrating block /stm12/dblock_2 of 9: (123904,)  
Calibrating block /stm12/dblock_3 of 9: (121088,)  
Calibrating block /stm12/dblock_4 of 9: (123136,)  
Calibrating block /stm12/dblock_5 of 9: (38144,)  
Calibrating block /stm12/dblock_6 of 9: (83456,)  
Calibrating block /stm12/dblock_7 of 9: (117760,)  
Calibrating block /stm12/dblock_8 of 9: (18688,)  
Found cals in /stm13/dblock_7
Calibrating block /stm13/dblock_0 of 8: (37632,)  
Calibrating block /stm13/dblock_1 of 8: (67072,)  
Calibrating block /stm13/dblock_2 of 8: (48640,)  
Calibrating b

/home/astoermann/.conda/envs/test_jupyter_books_11/lib/python3.6/site-packages/mkpy/mkh5.py:3437: LogRawEventCodeMismatch: These log event codes differ from the EEG codes, make sure you know why
{mismatch_events}


Found cals in /stm24/dblock_6
Calibrating block /stm24/dblock_0 of 7: (132864,)  
Calibrating block /stm24/dblock_1 of 7: (132352,)  
Calibrating block /stm24/dblock_2 of 7: (135680,)  
Calibrating block /stm24/dblock_3 of 7: (134144,)  
Calibrating block /stm24/dblock_4 of 7: (132864,)  
Calibrating block /stm24/dblock_5 of 7: (132352,)  
Calibrating block /stm24/dblock_6 of 7: (18688,)  
Found cals in /stm25/dblock_6
Calibrating block /stm25/dblock_0 of 7: (120576,)  
Calibrating block /stm25/dblock_1 of 7: (125440,)  
Calibrating block /stm25/dblock_2 of 7: (121344,)  
Calibrating block /stm25/dblock_3 of 7: (122624,)  
Calibrating block /stm25/dblock_4 of 7: (121344,)  
Calibrating block /stm25/dblock_5 of 7: (115968,)  
Calibrating block /stm25/dblock_6 of 7: (18176,)  
Found cals in /stm26/dblock_8
Calibrating block /stm26/dblock_0 of 9: (120320,)  
Calibrating block /stm26/dblock_1 of 9: (11520,)  
Calibrating block /stm26/dblock_2 of 9: (122880,)  
Calibrating block /stm26/dblo

As of mkpy 0.2.0 to match events with a codemap regexp pattern, the
ccode column in stmath_code_map_new.xlsx must also match the log_ccode
in the datablock. If this behavior is not desired, delete or rename
the ccode column in the codemap.


searching codes in: stm02/dblock_0
searching codes in: stm02/dblock_1
searching codes in: stm02/dblock_2
searching codes in: stm02/dblock_3
searching codes in: stm02/dblock_4
searching codes in: stm02/dblock_5
searching codes in: stm03/dblock_0
searching codes in: stm03/dblock_1
searching codes in: stm03/dblock_2
searching codes in: stm03/dblock_3
searching codes in: stm03/dblock_4
searching codes in: stm03/dblock_5
searching codes in: stm03/dblock_6
searching codes in: stm03/dblock_7
searching codes in: stm03/dblock_8
searching codes in: stm04/dblock_0
searching codes in: stm04/dblock_1
searching codes in: stm04/dblock_2
searching codes in: stm04/dblock_3
searching codes in: stm04/dblock_4
searching codes in: stm04/dblock_5
searching codes in: stm04/dblock_6
searching codes in: stm04/dblock_7
searching codes in: stm05/dblock_0
searching codes in: stm05/dblock_1
searching codes in: stm05/dblock_2
searching codes in: stm05/dblock_3
searching codes in: stm05/dblock_4
searching codes in: 

CPU times: user 2h 4min 40s, sys: 1min 9s, total: 2h 5min 50s
Wall time: 2h 6min 6s


```{note}
The warnings above:

`LogRawEventCodeMismatch` asks you to check the data and confirm that it is okay for the raw and log file event codes to mismatch. This sometimes happens when you pause DIG and should be fine as long as the mismatches are trailing events (as in this case).

`dropping out of bounds` happens when the pause mark fell outside the window you set for measuring the cals (i.e., -40 ms to 40 ms).

```
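
With this many subjects the warnings are easy to lose in the output. One option, sketched below with only the standard library, is to record the warnings raised by a step so they can be reviewed per subject afterwards. `run_and_collect_warnings` and `noisy_step` are hypothetical names used for illustration; `noisy_step` merely stands in for a build step that warns.

```python
import warnings


def run_and_collect_warnings(func, *args, **kwargs):
    """Call func and return (result, list of warning messages as strings)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        result = func(*args, **kwargs)
    return result, [f"{w.category.__name__}: {w.message}" for w in caught]


# hypothetical stand-in for a build step that emits a warning
def noisy_step(sub):
    warnings.warn(f"{sub}: dropping trailing log events", UserWarning)
    return sub


result, messages = run_and_collect_warnings(noisy_step, "stm03")
print(messages)
```

Note that recorded warnings are not printed as they occur, so the collected messages need to be displayed or saved explicitly.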

### Data transformations to the event table
This step is optional, but it is useful to know that you can 'break in' to the pandas data frame to add more data or to drop rows so that the exported epochs file is smaller.
#### Add extra variables across all rows
These are new variables that you want as columns but that are not imported with the header or code map (useful if you are combining multiple experiments but want one code map).

In [14]:
# show the column names before adding variables
display(event_table_raw.columns)

Index(['data_group', 'dblock_path', 'dblock_tick_idx', 'dblock_ticks',
       'crw_ticks', 'raw_evcodes', 'log_evcodes', 'log_ccodes', 'log_flags',
       'epoch_match_tick_delta', 'epoch_ticks', 'dblock_srate', 'match_group',
       'idx', 'dlim', 'anchor_str', 'match_str', 'anchor_code', 'match_code',
       'anchor_tick', 'match_tick', 'anchor_tick_delta', 'is_anchor', 'ccode',
       'regexp', 'List', 'ThreatCondition', 'Item', 'Stimulus', 'Operand1',
       'Operand2', 'Operand3', 'ShownProduct', 'CorrectProduct', 'Condition',
       'ResponseAccuracy', 'Anticipation', 'Certainty'],
      dtype='object')

In [15]:
# code for experiment
event_table_raw['expt'] = "expt_1"
# show a few variables from the data frame with the new column added
event_table_raw[['data_group','crw_ticks','regexp','match_code','Stimulus','expt', 'anchor_tick_delta']].head(6)

|  | data_group | crw_ticks | regexp | match_code | Stimulus | expt | anchor_tick_delta |
|---|---|---|---|---|---|---|---|
| 0 | stm02 | 115262 | `1 (#2001) .* 12001 22001 2064 777 1040` | 2001 | O1 | expt_1 | 0 |
| 1 | stm02 | 115417 | `2 (#2001) .* 12001 22001 2064 777 1040` | 2001 | O2 | expt_1 | 0 |
| 2 | stm02 | 115571 | `3 (#2001) .* 12001 22001 2064 777 1040` | 2001 | O3 | expt_1 | 0 |
| 3 | stm02 | 115725 | `4 (12001) (#22001) (2064) 777 1040` | 12001 | P | expt_1 | -625 |
| 4 | stm02 | 116350 | `4 (12001) (#22001) (2064) 777 1040` | 22001 | P | expt_1 | 0 |
| 5 | stm02 | 116764 | `4 (12001) (#22001) (2064) 777 1040` | 2064 | P | expt_1 | 414 |
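
If the file combined several experiments, a single constant label would not be enough. The sketch below shows one way to derive the label from the data group name instead. It runs on a toy data frame, and both the `xyz` prefix and the `expt_by_prefix` lookup are hypothetical values invented for illustration.

```python
import pandas as pd

# toy stand-in for event_table_raw; only the data_group column matters here
events = pd.DataFrame({"data_group": ["stm02", "stm02", "xyz01"]})

# hypothetical lookup from the data_group prefix to an experiment label
expt_by_prefix = {"stm": "expt_1", "xyz": "expt_2"}

# take the first three characters of each data_group and map them to a label
events["expt"] = events["data_group"].str[:3].map(expt_by_prefix)
print(events)
```

Any prefix missing from the lookup yields a missing value in `expt`, which makes unlabeled subjects easy to spot.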


#### Get reaction times
*Note: this example has an event between the one we want to time-lock to and the one we want a reaction time for. If you do not have this problem, you only need to drop the extra capture group lines.*
* *Use code similar to the first line below (.query)*
* *Query for anchor_tick_delta > or < 0 depending on where you put the anchor (#)*

In [16]:
# pull out the rows of the dataframe with positive response ticks (i.e., after the anchor) and copy to a new dataframe
rt_df = event_table_raw.query('anchor_tick_delta > 0').copy()

# rename anchor tick delta to response ticks, so variables don't get overwritten on merging
rt_df["response_ticks"] = rt_df["anchor_tick_delta"]

# select only the variables needed for merging
rt_df = rt_df.loc[:, ['data_group','response_ticks','anchor_code']]

# set the index for merging later
rt_df.set_index(['data_group','anchor_code'],inplace=True)

# check the dataframe has the variables you want for merging later
rt_df.head()

| data_group | anchor_code | response_ticks |
|---|---|---|
| stm02 | 22001 | 414 |
| stm02 | 21003 | 899 |
| stm02 | 22007 | 417 |
| stm02 | 23014 | 374 |
| stm02 | 22015 | 402 |


In [17]:
# get clean rows of the dataframe for merging (i.e., drop the two capture groups that shouldn't have time-locked epochs)
clean_events = event_table_raw.query('log_evcodes < 20000 and anchor_tick_delta <= 0')

# set the index for merging later
clean_events.set_index(['data_group','anchor_code'],inplace=True)

# check the dataframe has the variables you want for merging later; display only a few variables for demonstration
# anchor_tick_delta is negative for the product event because the time-locked event is before the anchor code
clean_events[['crw_ticks','regexp','match_code','Stimulus','expt', 'anchor_tick_delta']].head(6)

| data_group | anchor_code | crw_ticks | regexp | match_code | Stimulus | expt | anchor_tick_delta |
|---|---|---|---|---|---|---|---|
| stm02 | 2001 | 115262 | `1 (#2001) .* 12001 22001 2064 777 1040` | 2001 | O1 | expt_1 | 0 |
| stm02 | 2001 | 115417 | `2 (#2001) .* 12001 22001 2064 777 1040` | 2001 | O2 | expt_1 | 0 |
| stm02 | 2001 | 115571 | `3 (#2001) .* 12001 22001 2064 777 1040` | 2001 | O3 | expt_1 | 0 |
| stm02 | 22001 | 115725 | `4 (12001) (#22001) (2064) 777 1040` | 12001 | P | expt_1 | -625 |
| stm02 | 1003 | 21491 | `1 (#1003) .* 11003 21003 1040 777 1040` | 1003 | O1 | expt_1 | 0 |
| stm02 | 1003 | 21649 | `2 (#1003) .* 11003 21003 1040 777 1040` | 1003 | O2 | expt_1 | 0 |


In [18]:
# merge response ticks and clean dataframes together 
clean_rt = clean_events.join(rt_df,on=['data_group','anchor_code'])

# reset the index to the default (the rest of mkpy expects a certain index structure)
clean_rt.reset_index(['anchor_code'],inplace=True)

# check the dataframe; display only a few variables for demonstration
# response ticks only merge onto the product rows since that is what the response is to
clean_rt[['crw_ticks','regexp','match_code','Stimulus', 'expt', 'anchor_tick_delta','response_ticks']].head(8)

| data_group | crw_ticks | regexp | match_code | Stimulus | expt | anchor_tick_delta | response_ticks |
|---|---|---|---|---|---|---|---|
| stm02 | 115262 | `1 (#2001) .* 12001 22001 2064 777 1040` | 2001 | O1 | expt_1 | 0 | NaN |
| stm02 | 115417 | `2 (#2001) .* 12001 22001 2064 777 1040` | 2001 | O2 | expt_1 | 0 | NaN |
| stm02 | 115571 | `3 (#2001) .* 12001 22001 2064 777 1040` | 2001 | O3 | expt_1 | 0 | NaN |
| stm02 | 115725 | `4 (12001) (#22001) (2064) 777 1040` | 12001 | P | expt_1 | -625 | 414.0 |
| stm02 | 21491 | `1 (#1003) .* 11003 21003 1040 777 1040` | 1003 | O1 | expt_1 | 0 | NaN |
| stm02 | 21649 | `2 (#1003) .* 11003 21003 1040 777 1040` | 1003 | O2 | expt_1 | 0 | NaN |
| stm02 | 21803 | `3 (#1003) .* 11003 21003 1040 777 1040` | 1003 | O3 | expt_1 | 0 | NaN |
| stm02 | 21958 | `4 (11003) (#21003) (1040) 777 1040` | 11003 | P | expt_1 | -625 | 899.0 |
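
The response values are in ticks (samples), not milliseconds. If a reaction time in milliseconds is wanted, the ticks can be scaled by the sampling rate. The sketch below uses a toy data frame and assumes that `dblock_srate` holds the sampling rate in samples per second; the 250 Hz value and the `response_ms` column name are illustrative assumptions, not values taken from this dataset.

```python
import pandas as pd

# toy rows shaped like clean_rt; the 250 Hz sampling rate is an assumed value
toy = pd.DataFrame({
    "response_ticks": [414.0, None, 899.0],
    "dblock_srate": [250.0, 250.0, 250.0],
})

# milliseconds = ticks * 1000 / (samples per second); missing values stay missing
toy["response_ms"] = toy["response_ticks"] * 1000.0 / toy["dblock_srate"]
print(toy)
```

Rows without a response keep a missing value, so the new column lines up with `response_ticks` row for row.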


#### Merge in data from Excel

In [19]:
# set columns of interest from the spreadsheet
npsych_coi = [
    'Subj', 'Gndr', 'Age', 'NativeLang', 'Bilingual', 'Major', 'GoodMath',
    'Rhand','Lhand', 'Famhand', 'WhoLH', 'ARTCorrect', 'ARTFoils', 
    'MRTCorrect','MRTFoils', 'MathAnxiety1', 'MathAnxiety2', 
    'BEM_M', 'BEM_F', 'BEM_N','DS_F', 'DS_B', 'DS_H'
]
# read the Excel spreadsheet into pandas and do some data transformation; 
# look up pd.read_excel for arguments
stmath_neuro = (
    pd.read_excel(
        "/home/kadelong/Exps/STMath/Neuro/Neuro.xlsx",
        header=2,
        nrows=30)
    .loc[:, npsych_coi]
    .rename(columns={"Subj": "data_group"})
    .set_index('data_group')
)
# check the data imported properly; display only a few variables for demo
stmath_neuro[['Rhand','Lhand','ARTCorrect','MRTCorrect', 'DS_F']].head()

| data_group | Rhand | Lhand | ARTCorrect | MRTCorrect | DS_F |
|---|---|---|---|---|---|
| stm02 | 24 | 6 | 7 | 5 | 8 |
| stm03 | 26 | 3 | 1 | 2 | 10 |
| stm04 | 21 | 5 | 6 | 5 | 13 |
| stm05 | 20 | 7 | 5 | 8 | 9 |
| stm06 | 29 | 9 | 7 | 12 | 8 |


#### Merge dataframes together 
This produces the final `event_table` dataframe for export by merging the dataframes created above.

In [20]:
# merge neuropsych dataframe with response ticks dataframe
event_table=clean_rt.join(stmath_neuro, on="data_group")

# reset the index to the default (the rest of mkpy expects a certain index structure)
event_table.reset_index(['data_group'],inplace=True)

# display columns and a few variables to check everything merged correctly
display(event_table.columns)
event_table[['data_group','crw_ticks','regexp','Stimulus','expt','response_ticks','Rhand','Lhand']].head(8)

Index(['data_group', 'anchor_code', 'dblock_path', 'dblock_tick_idx',
       'dblock_ticks', 'crw_ticks', 'raw_evcodes', 'log_evcodes', 'log_ccodes',
       'log_flags', 'epoch_match_tick_delta', 'epoch_ticks', 'dblock_srate',
       'match_group', 'idx', 'dlim', 'anchor_str', 'match_str', 'match_code',
       'anchor_tick', 'match_tick', 'anchor_tick_delta', 'is_anchor', 'ccode',
       'regexp', 'List', 'ThreatCondition', 'Item', 'Stimulus', 'Operand1',
       'Operand2', 'Operand3', 'ShownProduct', 'CorrectProduct', 'Condition',
       'ResponseAccuracy', 'Anticipation', 'Certainty', 'expt',
       'response_ticks', 'Gndr', 'Age', 'NativeLang', 'Bilingual', 'Major',
       'GoodMath', 'Rhand', 'Lhand', 'Famhand', 'WhoLH', 'ARTCorrect',
       'ARTFoils', 'MRTCorrect', 'MRTFoils', 'MathAnxiety1', 'MathAnxiety2',
       'BEM_M', 'BEM_F', 'BEM_N', 'DS_F', 'DS_B', 'DS_H'],
      dtype='object')

|  | data_group | crw_ticks | regexp | Stimulus | expt | response_ticks | Rhand | Lhand |
|---|---|---|---|---|---|---|---|---|
| 0 | stm02 | 115262 | `1 (#2001) .* 12001 22001 2064 777 1040` | O1 | expt_1 | NaN | 24 | 6 |
| 1 | stm02 | 115417 | `2 (#2001) .* 12001 22001 2064 777 1040` | O2 | expt_1 | NaN | 24 | 6 |
| 2 | stm02 | 115571 | `3 (#2001) .* 12001 22001 2064 777 1040` | O3 | expt_1 | NaN | 24 | 6 |
| 3 | stm02 | 115725 | `4 (12001) (#22001) (2064) 777 1040` | P | expt_1 | 414.0 | 24 | 6 |
| 4 | stm02 | 21491 | `1 (#1003) .* 11003 21003 1040 777 1040` | O1 | expt_1 | NaN | 24 | 6 |
| 5 | stm02 | 21649 | `2 (#1003) .* 11003 21003 1040 777 1040` | O2 | expt_1 | NaN | 24 | 6 |
| 6 | stm02 | 21803 | `3 (#1003) .* 11003 21003 1040 777 1040` | O3 | expt_1 | NaN | 24 | 6 |
| 7 | stm02 | 21958 | `4 (11003) (#21003) (1040) 777 1040` | P | expt_1 | 899.0 | 24 | 6 |
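
`DataFrame.join` performs a left join by default, so a data group that is missing from the spreadsheet does not raise an error; its spreadsheet columns are simply filled with missing values. A quick check for that case might look like the sketch below, which uses toy data frames in place of `clean_rt` and `stmath_neuro` (the `stm99` subject is made up for the demonstration).

```python
import pandas as pd

# toy stand-ins for clean_rt and stmath_neuro; stm99 is a made-up subject
events = pd.DataFrame({
    "data_group": ["stm02", "stm02", "stm99"],
    "crw_ticks": [1, 2, 3],
})
neuro = pd.DataFrame(
    {"Rhand": [24]},
    index=pd.Index(["stm02"], name="data_group"),
)

# left join on the data_group column: unmatched rows get NaN in Rhand
merged = events.join(neuro, on="data_group")

# data groups that found no matching row in the spreadsheet
unmatched = sorted(merged.loc[merged["Rhand"].isna(), "data_group"].unique())
print(unmatched)
```

The check relies on a spreadsheet column that has no missing values of its own, so pick one that is filled in for every subject.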


## Set epochs
Set the epochs using the event table with all the newly added columns. This works the same way as in the single-subject example.

In [21]:
myh5.set_epochs('stmath', event_table, -100, 900)  # tmin ms, tmax ms



Sanitizing event table data types for mkh5 epochs table ...


  + "skipping epoch {0}".format(e)
  + "skipping epoch {0}".format(e)
  + "skipping epoch {0}".format(e)
  + "skipping epoch {0}".format(e)
  + "skipping epoch {0}".format(e)
  + "skipping epoch {0}".format(e)
  + "skipping epoch {0}".format(e)


```{note}
The `data error` warnings above (`skipping epoch`) happen when the pause mark fell outside the window you set to be epoched (here, -100 ms to 900 ms).

```

## Export epochs

In [22]:
# set columns of interest for export to reduce file size; you can also reorder them this way.
# some columns are required for other software in the lab, so be careful when dropping
COI = ['epoch_id','match_time','data_group', 'Item', 'List', 
       'ThreatCondition', 'Condition', 'Stimulus', 'ResponseAccuracy', 
       'Operand1', 'Operand2', 'Operand3', 'ShownProduct', 'CorrectProduct',
       'Gndr', 'Age', 'NativeLang', 'Bilingual','Major', 'GoodMath', 
       'Rhand', 'Lhand', 'Famhand', 'WhoLH', 'ARTCorrect', 'ARTFoils', 
       'MRTCorrect', 'MRTFoils', 'MathAnxiety1', 'MathAnxiety2',
       'BEM_M', 'BEM_F', 'BEM_N', 'DS_F', 'DS_B', 'DS_H',
       'response_ticks', 'log_evcodes', 'log_flags', 
       'lle', 'lhz', 'MiPf', 'LLPf', 'RLPf', 'LMPf', 'RMPf', 'LDFr', 'RDFr', 
       'LLFr', 'RLFr', 'LMFr', 'RMFr', 'LMCe', 'RMCe', 'MiCe', 'MiPa', 'LDCe', 
       'RDCe', 'LDPa', 'RDPa', 'LMOc', 'RMOc', 'LLTe', 'RLTe', 'LLOc', 'RLOc', 
       'MiOc', 'A2', 'HEOG', 'rle', 'rhz']   

# export epochs
myh5.export_epochs('stmath', epochtable, file_format='h5', columns=COI)
    
print('done stmath1')

done stmath1
