Pandas provides Series (1D indexed arrays) and DataFrame (2D indexed arrays) for structured data (like tables with headers).
Installing Pandas
Install via pip:
python -m pip install pandas
Pandas Series
A Series is a 1D array with an index (labels for rows).
Creation:
>>> class="text-secondary">import pandas class="text-secondary">as pd
>>> ser = pd.Series([10, 20, 30, 40]) # Default index: 0–3
>>> ser = pd.Series({'a': 10, 'b': 20}) # Index class="text-secondary">from dict keys: 'a', 'b'
>>> ser = pd.Series(np.arange(10, 50, 10), index=['a', 'b', 'c', 'd']) # Custom index
Description:
ser.shape: (4,) (length 4)
ser.index: Index(['a', 'b', 'c', 'd'])
ser.values: array([10, 20, 30, 40])
ser.head(2): First 2 elements
ser.tail(2): Last 2 elements
Indexing/Slicing:
By label: ser['b'] (20), ser[['a', 'd']] (Series with labels 'a' and 'd')
By position: ser.iloc[0] (10), ser.iloc[[0, 3]] (elements at positions 0 and 3)
Slicing: ser['a':'c'] (labels 'a' to 'c'), ser.iloc[1:] (positions 1 to end)
Boolean Indexing:
>>> ser[ser.values <= 20] # Elements ≤ 20: a(10), b(20)
>>> ser[ser.index != 'a'] # Elements with index ≠ 'a': b(20), c(30), d(40)
Pandas DataFrame
A DataFrame is a 2D array with row and column indices (like a spreadsheet).
Creation:
>>> data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(data) # Default indices: 0–2(rows), 0–2(columns)
>>> df = pd.DataFrame(data, index=['a', 'b', 'c';], columns=['A', 'B', 'C']) # Custom indices
>>> data = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> df = pd.DataFrame(data) # Dict: keys = columns, values = column data
Description:
df.shape: (3, 3) (3 rows, 3 columns)
df.index: Row indices
df.columns: Column indices
df.values: 2D array of data
df.info(): Summary (data types, non-null counts)
df.describe(): Descriptive stats (count, mean, std, min, quartiles, max)
Indexing/Slicing:
By column: df['A'] (Series of column 'A'), df[['A', 'C']] (DataFrame with columns 'A' and 'C')
By row: df.loc['a'] (Series of row 'a'), df.iloc[0] (row at position 0)
Slicing: df.loc['a':'c', 'A':'B'] (rows 'a'–'c', columns 'A'–'B')
Boolean Indexing:
>>> df[df['B';] >= 3] # Rows where column 'B' ≥ 3
>>> df[(df['A'] >= 2) & (df['C'] == 9)] # Rows where 'A' ≥ 2 and 'C' = 9
>>> df[df['B'].between(4, 9)] # Rows where 'B' is between 4 and 9
Exchanging Data Between Excel and Pandas
Reading Excel Data: Use pd.read_excel(). Key parameters:
io: File path.
sheet_name: Sheet name/index (default: 0, first sheet).
header: Row number for column labels (default: 0, first row).
index_col: Column name/index for row labels.
usecols: Columns to import (e.g., usecols=['A', 'C']).
Example:
>>> df = pd.read_excel('D:\身份证号.xlsx', engine=27;openpyxl') # Read first sheet
>>> df = pd.read_excel('D:\身份证号.xlsx', sheet_name=[0, 1], index_col='工号', engine='openpyxl') # Read multiple sheets
Writing Excel Data: Use df.to_excel(). Example:
>>> df0.to_excel('D:\\new_file.xlsx') # Save DataFrame to Excel
Using xlwings with Pandas: Read/write multiple DataFrames to the same sheet:
>>> class="text-secondary">import xlwings class="text-secondary">as xw
>>> app = xw.App(visible=True, add_book=False)
>>> bk = app.books.open('D:\\身份证号.xlsx')
>>> sht1 = bk.sheets[0]
>>> sht2 = bk.sheets[1]
>>> sht3 = bk.sheets.add(after=bk.sheets.count, name='多DataFrame')
>>> df1 = sht1.range('A1:E6').options(pd.DataFrame).value # Read sheet 1 class="text-secondary">as DataFrame
>>> df2 = sht2.range('A1:E6').options(pd.DataFrame).value # Read sheet 2 class="text-secondary">as DataFrame
>>> sht3.range('A1').value = df1 # Write df1 to sheet 3
>>> sht3.range('A8').value = df2 # Write df2 below df1